Benchmarking Working Group                     M. Konstantynowicz, Ed.
Internet-Draft                                           V. Polak, Ed.
Intended status: Informational                           Cisco Systems
Expires: May 17, 2019                                November 13, 2018

   Probabilistic Loss Ratio Search for Packet Throughput (PLRsearch)
                     draft-vpolak-bmwg-plrsearch-00

Abstract

   This document addresses challenges encountered when applying the
   methodologies described in RFC 2544 to benchmarking NFV (Network
   Function Virtualization) over an extended period of time, sometimes
   referred to as "soak testing", and more specifically to benchmarking
   software-based implementations of NFV data planes.  The packet
   throughput search approach proposed by this document assumes that
   the system under test is probabilistic in nature, not deterministic.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 17, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include Simplified
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Motivation
   2.  Model
   3.  Poisson Distribution
   4.  Fitting Function Coefficients Distribution
   5.  Algorithm Formulas
     5.1.  Integration
     5.2.  Optimizations
   6.  Known Implementations
     6.1.  FD.io CSIT Implementation Specifics
   7.  IANA Considerations
   8.  Security Considerations
   9.  Acknowledgements
   10. Normative References
   Authors' Addresses

1.  Motivation

   Network providers are interested in the throughput a device can
   sustain.

   [RFC2544] assumes the loss ratio is given by a deterministic
   function of the offered load.  But NFV software devices are not
   deterministic (enough).  This leads deterministic algorithms (such
   as MLRsearch with a single trial) to return results which, when
   repeated, show a relatively high standard deviation, making it
   harder to tell what "the throughput" actually is.

   We need another algorithm, one which takes this indeterminism into
   account.

2.  Model

   Each algorithm searches for an answer to a precisely formulated
   question.
   When the question involves indeterministic systems, it has to
   specify probabilities (or prior distributions) which are tied to a
   specific probabilistic model.  Different models will have a
   different number (and meaning) of parameters.  Complicated (but
   more realistic) models have many parameters, and the math involved
   can be very complicated.  It is better to start with a simpler
   probabilistic model, and only change it when the output of the
   simpler algorithm is not stable or useful enough.

   TODO: Refer to packet forwarding terminology, such as "offered
   load" and "loss ratio".

   TODO: Mention that no packet duplication is expected (or is
   filtered out).

   TODO: Define critical load and critical region earlier.

   This document is focused on algorithms related to packet loss count
   only.  No latency (or other information) is taken into account.
   For simplicity, only one type of measurement is considered:
   dynamically computed offered load, constant within a trial
   measurement of predetermined trial duration.

   Also, running longer trials (in some situations) could be more
   efficient, but in order to perform trials at multiple offered loads
   within the critical region, trial durations should be kept as short
   as possible.

3.  Poisson Distribution

   TODO: Give a link to more officially published literature about the
   Poisson distribution.

   Note 1: The algorithm assumes that the packet traffic generator
   detects duplicate packets on receive, and reports this as an error.

   Note 2: The binomial distribution is a better fit than the Poisson
   distribution (acknowledging that the number of packets lost cannot
   be higher than the number of packets offered), but the difference
   tends to be relevant only at loads far above the critical region,
   so using the Poisson distribution helps the algorithm focus on the
   critical region better.
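   The per-trial likelihood under the Poisson assumption can be
   sketched as follows.  This is a minimal illustration, not part of
   any normative algorithm; the function name and signature are
   hypothetical.

```python
import math

def log_poisson_likelihood(loss_count, loss_rate, duration):
    """Log of the Poisson probability of observing loss_count lost
    packets in a trial of the given duration, when the model predicts
    an average loss rate (in packets per second) of loss_rate."""
    mu = loss_rate * duration  # expected number of lost packets
    # Poisson PMF is exp(-mu) * mu**k / k!; computed in log domain
    # to avoid underflow for large mu or loss_count.
    return loss_count * math.log(mu) - mu - math.lgamma(loss_count + 1)
```

   Working in the log domain from the start matters here, as Section 5
   notes: for loads within the critical region the raw probabilities
   are often too small to represent directly.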
   When comparing different offered loads, the average loss per second
   is assumed to increase, but the (deterministic) function from
   offered load to average loss rate is otherwise unknown.

   Given a loss target (configurable, by default one packet lost per
   second), there is an unknown offered load at which the average is
   exactly that.  We call that the "critical load".  If the critical
   load seems higher than the maximum offerable load, we should use
   the maximum offerable load to make the search output more stable.

   Of course, there are a great many increasing functions.  The
   offered load has to be chosen for each trial, and the computed
   posterior distribution of the critical load can change with each
   trial result.

   To make the space of possible functions more tractable, some other
   simplifying assumption is needed.  As the algorithm will be
   examining (also) loads close to the critical load, a linear
   approximation to the function (TODO: name the function) in the
   critical region is important.  But as the search algorithm needs to
   evaluate the function also far away from the critical region, the
   approximate function has to be well-behaved for every positive
   offered load; specifically, it cannot predict a non-positive packet
   loss rate.

   Within this document, "fitting function" is the name for such a
   well-behaved function which approximates the unknown function in
   the critical region.

   Results from trials far from the critical region are likely to
   affect the critical rate estimate negatively, as the fitting
   function does not need to be a good approximation there.
   Discarding some results, or "suppressing" their impact with ad-hoc
   methods (other than using the Poisson distribution instead of the
   binomial), is not done, as such methods tend to make the overall
   search unstable.
   We rely on most of the measurements being done (eventually) within
   the critical region, and on them (eventually) outweighing the
   far-off measurements for well-behaved fitting functions.

4.  Fitting Function Coefficients Distribution

   To accommodate systems with different behaviours, the fitting
   function is expected to have a few numeric parameters affecting its
   shape (mainly affecting the linear approximation in the critical
   region).

   The general search algorithm can use any increasing fitting
   function; some specific functions may be described later.

   TODO: Describe sigmoid-based and erf-based functions.

   It is up to the implementer to choose a fitting function and the
   prior distribution of its parameters.  The rest of this document
   assumes each parameter is independently and uniformly distributed
   over a common interval.  Implementers are to add non-linear
   transformations into their fitting functions if their prior is
   different.

   TODO: Move the following sentence into a more appropriate place.

   Speaking about new trials, each subsequent trial will be done at an
   offered load equal to the current average of the critical load.

   The exit condition is either the critical load stdev becoming small
   enough, or the overall search time becoming long enough.

   The algorithm should report both avg and stdev for the critical
   load.  If the reported averages follow a trend (without reaching an
   equilibrium), avg and stdev should refer to the equilibrium
   estimated based on the trend, not to the immediate posterior
   values.

   TODO: Explicitly mention the iterative character of the search.

5.  Algorithm Formulas

5.1.  Integration

   The posterior distributions for fitting function parameters will
   not be integrable in general.

   The search algorithm utilises the fact that a trial measurement
   takes some time, so this time can be used for numeric integration
   (using a suitable method, such as Monte Carlo) to achieve
   sufficient precision.
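   The Monte Carlo integration above can be sketched as follows.  This
   is an illustration only: the exponential-shaped fitting function,
   the prior intervals, and all names are assumptions made for the
   sketch, not the CSIT implementation, and the function is kept in
   the log domain so far-off parameter samples underflow to zero
   weight instead of overflowing.

```python
import math
import random

TARGET_LOSS_RATE = 1.0  # loss target: one packet lost per second

def log_loss_rate(load, mrr, spread):
    """Hypothetical fitting function, in log domain: logarithm of the
    average loss rate (packets per second) at a given offered load.
    It is increasing in load and predicts a positive loss rate for
    every positive load; mrr and spread are its two shape parameters."""
    return (load - mrr) / spread

def critical_load(mrr, spread):
    """Offered load at which the fitting function meets the target."""
    return mrr + spread * math.log(TARGET_LOSS_RATE)

def estimate_critical_load(trials, samples=10000):
    """Monte Carlo integration over the parameter prior.

    trials is a list of (offered_load, duration, loss_count) tuples.
    Each parameter sample is drawn uniformly (the assumed prior) and
    weighted by the Poisson likelihood of all trial results; the
    weighted avg and stdev of the implied critical load are the
    posterior estimates."""
    w_sum = w_sq = w_total = 0.0
    for _ in range(samples):
        mrr = random.uniform(1e6, 1e7)     # assumed prior interval
        spread = random.uniform(1e4, 1e6)  # assumed prior interval
        log_weight = 0.0
        for load, duration, losses in trials:
            log_mu = log_loss_rate(load, mrr, spread) + math.log(duration)
            if log_mu > 700.0:  # likelihood is effectively zero; skip
                log_weight = float("-inf")
                break
            # log of Poisson PMF: k*log(mu) - mu - log(k!)
            log_weight += (losses * log_mu - math.exp(log_mu)
                           - math.lgamma(losses + 1))
        weight = math.exp(log_weight)  # exp(-inf) is simply 0.0
        cl = critical_load(mrr, spread)
        w_sum += weight * cl
        w_sq += weight * cl * cl
        w_total += weight
    avg = w_sum / w_total
    var = w_sq / w_total - avg * avg
    return avg, math.sqrt(max(var, 0.0))
```

   An actual implementation would resample adaptively rather than
   drawing every sample from the full prior, but plain importance
   sampling is enough to show where the trial-measurement time goes.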
5.2.  Optimizations

   After enough trials, the posterior distribution will be
   concentrated in a narrow area of parameter space.  The integration
   method could take advantage of that.

   Even in the concentrated area, the likelihood can be quite small,
   so the integration algorithm should track the logarithm of the
   likelihood, and also avoid underflow errors by other means.

6.  Known Implementations

   The only known working implementation of Probabilistic Loss Ratio
   Search for Packet Throughput is in the Linux Foundation FD.io CSIT
   project.  https://wiki.fd.io/view/CSIT.  https://git.fd.io/csit/.

6.1.  FD.io CSIT Implementation Specifics

   In the sample implementation in the FD.io CSIT project, there is
   around a 0.5 second delay between trials due to restrictions
   imposed by the packet traffic generator in use (T-Rex); avoiding
   that delay is out of scope of this document.

   TODO: Describe how the current integration algorithm finds the
   concentrated area.

7.  IANA Considerations

   ..

8.  Security Considerations

   ..

9.  Acknowledgements

   ..

10.  Normative References

   [RFC2544]  Bradner, S. and J. McQuaid, "Benchmarking Methodology
              for Network Interconnect Devices", RFC 2544,
              DOI 10.17487/RFC2544, March 1999,
              <https://www.rfc-editor.org/info/rfc2544>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

Authors' Addresses

   Maciek Konstantynowicz (editor)
   Cisco Systems

   Email: mkonstan@cisco.com

   Vratko Polak (editor)
   Cisco Systems

   Email: vrpolak@cisco.com