ConEx Working Group                                            D. Wagner
Internet-Draft                                             M. Kuehlewind
Intended status: Informational                   University of Stuttgart
Expires: August 18, 2014                               February 14, 2014

            Auditing of Congestion Exposure (ConEx) signals
                      draft-wagner-conex-audit-01

Abstract

   Congestion Exposure (ConEx) is a mechanism by which senders inform
   the network about the congestion encountered by previous packets on
   the same flow. Reliable auditing is necessary to provide a strong
   incentive to declare ConEx information honestly. This document
   defines how the signals are handled by an audit and lists
   requirements for an audit implementation.
   This document does not mandate a particular design but identifies
   state and functions that any auditor element must provide to fulfill
   the requirements stated in [draft-ietf-conex-abstract-mech].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 18, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Meaning of the Re-Echo Signals
     1.2. Meaning of Credit Signal
     1.3. Definitions
   2. Audit Implementation
     2.1. Placing an Audit Element
     2.2. Per Flow State
     2.3. Penalty Criteria
     2.4. Appropriately Penalizing Misbehaving Flows
     2.5. Audit start for existing connections
     2.6. Handling Loss of ConEx-marked Packets
   3. Acknowledgements
   4. Security Considerations
   5. IANA Considerations
   6. References
     6.1. Normative References
     6.2. Informative References
   Authors' Addresses

1. Introduction

   In order to make ConEx information useful, reliable auditing is
   necessary to provide a strong incentive to declare ConEx information
   honestly. However, there is always a delay between congestion events
   and the respective ConEx signal at the audit.
   [draft-ietf-conex-abstract-mech] proposes using credit signals sent
   in advance to cover potential congestion during the next feedback
   delay.

   The ConEx signal is based on loss or Explicit Congestion Notification
   (ECN) marks [RFC3168] as a congestion indication. Following
   [draft-ietf-conex-abstract-mech] (Section 4.4), ConEx signaling has
   to encode ConEx capability, Re-Echo-Loss (L), Re-Echo-ECN (E) and
   credit (C).

1.1. Meaning of the Re-Echo Signals

   With the Re-Echo-Loss signal, a sender exposes to the network that
   this transport has experienced loss very recently. With the
   Re-Echo-ECN signal, a sender exposes to the network that this
   transport has experienced an ECN-CE mark very recently.
   For the audit, this means that if it detects a loss or an ECN-CE mark
   for a ConEx-enabled flow, it must observe the corresponding
   Re-Echo-Loss or Re-Echo-ECN signal from a compliant sender in the
   near future.

1.2. Meaning of Credit Signal

   The Credit signal represents potential for congestion. A
   ConEx-enabled sender should signal sufficient credit in advance of
   any congestion event. If a congestion event occurs, a corresponding
   amount of credit is consumed. If the sender intends to take the same
   risk again, it only has to replace the consumed credit, as
   non-consumed credit does not expire.

1.3. Definitions

   Congestion signal
      The occurrence of a packet loss or ECN-CE mark. We do not consider
      other signs such as increased delay as congestion signals.

   Congestion event
      One or more congestion signals within one RTT. For today's
      congestion control algorithms, all these signals trigger just one
      reaction regardless of their number.

   Connection
      A connection between transport layer endpoints that allows
      bidirectional signalling, e.g. a TCP connection.

   Flow
      The flow of packets of a connection in one direction. Therefore,
      sender and receiver are well defined for a flow. With regard to
      ConEx auditing, only one flow of any observed connection is
      audited, see Section 2.1.

2. Audit Implementation

   An audit element shall be a shadow device, i.e. its presence should
   not be detectable for well-behaving senders. The objective of an
   audit is to verify that senders correctly signal ConEx information
   and to penalize cheaters. For this, the audit element has to
   maintain state for every active ConEx-enabled flow. It may maintain
   appropriate timers to remove flows that have been idle for too long.
   Flows can be audited independently; there are no dependencies
   between them.
   There are two aspects the auditor has to check for each flow:

   o  whether the congestion reported using the ConEx mechanism matches
      the congestion actually observed by the receivers, and

   o  whether sufficient credit marks have been sent to signal the
      congestion risk in advance.

   The audit penalizes a flow if it fails either of these two criteria.

   This document does not mandate a particular design but identifies
   state and functions that any auditor element must provide to fulfill
   the requirements stated in [draft-ietf-conex-abstract-mech].

2.1. Placing an Audit Element

   An audit element should be placed so that it can observe all
   congestion signals of audited flows. If both congestion signals, ECN
   and loss, are detected directly, all auditing should take place
   beyond any potential source of congestion, i.e. any potential
   bottleneck, towards the receiver of that flow. In that case it must
   be placed beyond any potential multipath routing in order to be able
   to identify all packet losses. For unencrypted TCP and possibly
   other protocols, losses can easily be detected indirectly by
   monitoring the sender's retransmissions, which makes loss auditing
   both simple and reliable. Such a ConEx-loss audit function must be
   placed close to the sender, before any bottleneck where
   retransmissions might get lost. Therefore it might make sense to
   separate ConEx-loss auditing from ConEx-ECN auditing. Nevertheless,
   this applies to certain deployment scenarios only. In the following
   we therefore describe a combined loss and ECN ConEx auditor, which
   must be placed close to the receiver, beyond any potential
   bottleneck.

2.2. Per Flow State

   ConEx auditing must be performed per incoming ConEx-enabled flow, so
   all monitoring, assessment and penalizing is per flow.

   An audit maintains state for each active connection that is updated
   on every packet of that flow.
   Such a state entry is created when the first packet of an unknown
   flow is observed. It is deleted when either the corresponding
   connection is closed in conformance with its transport layer protocol
   or a timeout expires. This timeout should be chosen to keep false
   negatives low, i.e. to avoid timing out flows that are still active.
   In contrast, false positives, i.e. recognizing two flows as one, are
   expected to be a smaller issue, since in most cases the sender is the
   same host and either complies with the protocol or does not. We
   recommend setting this timeout to 60 seconds, a value that is also
   common e.g. in NAT middleboxes.

   An audit should maintain an RTT_MAX estimate per flow. This value
   should be as close as possible to the maximum RTT observed by the
   sender. RTT_MAX must not be chosen smaller than the RTT observed by
   the sender.

   An audit maintains the following variables per flow:

   o  ECN-CE counter (ECN-CE codepoint set)
      When an ECN-CE-marked packet is observed, the counter is
      increased.

   o  Loss counter (to be detected by the audit element, see also
      Section 5.5 in [draft-ietf-conex-abstract-mech])
      When a loss is observed, the counter is increased by the number of
      IP payload bytes.

   o  Re-Echo-ECN counter (ConEx signal E)
      When a Re-Echo-ECN-marked packet is seen, the counter is increased
      by the number of IP payload bytes.

   o  Re-Echo-Loss counter (ConEx signal L)
      When a Re-Echo-Loss-marked packet is seen, the counter is
      increased by the number of IP payload bytes.

   o  Credit state
      Whenever a ConEx-marked packet (Re-Echo-Loss or Re-Echo-ECN) is
      seen and the Credit state is greater than zero, the counter is
      decreased by the number of IP payload bytes. When a Credit-marked
      packet is seen, the counter is increased by the number of IP
      payload bytes.
      Please note that the meaning of the Credit variable differs from
      the other variables: while all other variables are lifetime
      counters for the flow and thus grow monotonically, the Credit
      state just reflects the currently signaled credit. It shrinks and
      grows as congestion is experienced and credit is sent.

   o  RTT_MAX

   o  is_in_penalty_state

   o  p, an EWMA of the congestion rate (loss and/or ECN-CE marks)

   o  x, an EWMA of the rate of Re-Echo-Loss and/or Re-Echo-ECN marks.
      If a packet carries both flags, it must be counted twice.

   o  current drop probability

   If the flow is part of a bidirectional connection, an auditor may use
   information from the return flow in order to define RTT_MAX and to
   detect packet losses.

2.3. Penalty Criteria

   Generally, a connection is judged on three criteria: one concerning
   the exposure of loss, one concerning the exposure of ECN-based
   congestion signaling, and one concerning the announcement of
   potential congestion by credit. A flow is considered misbehaving if
   at least one of the three conditions is met.

   A connection is assumed to behave abusively if

   o  the Credit state is zero,

   o  losses observed in the last 2x RTT_MAX period are not exposed by
      Re-Echo-Loss signals, or

   o  ECN-CE signals received in the last 2x RTT_MAX period are not
      exposed by Re-Echo-ECN signals.

   The first criterion should be checked each time a packet of that flow
   towards the destination is observed. If the Credit state is zero,
   is_in_penalty_state is set to true, else it is set to false.

   The other two criteria shall be checked periodically based on
   timeouts. The timeout t must be equal to or greater than 2x RTT_MAX.
   There are n timers used, and n should be equal to or greater than 2.
   To do the check, an audit element must store n snapshots of the
   ECN-CE and loss counters.
   When the timeout fires, the oldest set of values is compared with the
   current values of the Re-Echo-Loss and Re-Echo-ECN counters,
   respectively. If the saved loss counter is greater than the current
   Re-Echo-Loss counter, or the saved ECN-CE counter is greater than the
   current Re-Echo-ECN counter, is_in_penalty_state is set to true, else
   it is set to false.

   A flow may not have enough data at the time it needs to send ConEx
   markings and may thereby fall into the misbehaving state
   (is_in_penalty_state is true) during a phase of inactivity. If the
   sender then restarts sending packets that carry markings for all
   failed criteria, the sender is assumed to be well-behaving (in dubio
   pro reo). Therefore the auditor shall not drop packets which carry
   all required flags, but shall apply the normal penalty to all others.

   For example, if credit is zero and the losses experienced in the
   2x RTT_MAX period are not compensated by sufficient Re-Echo-Loss
   signals, packets carrying both the C and the L flag will not be
   subject to the penalty function. Nevertheless, packets carrying only
   the C flag or only the L flag will.

2.4. Appropriately Penalizing Misbehaving Flows

   If a flow is detected to misbehave, the audit must start penalizing
   immediately. The only penalty that is actually possible is dropping
   packets (with a certain probability). In order not to incentivize
   senders to simply start new flows when they detect that they are
   being penalized by an audit element, the penalty of a misbehaving
   flow should be proportional to the misbehavior.

   Please note that we require the sender to make sure that any ConEx
   mark will reach the receiver, so it is responsible for timely
   retransmission of any lost ConEx signal.

   The actual drop rate must provide a tangible disadvantage to the
   sender but should not make the connection unusable.
   An auditor should aim at forwarding no more packets than would have
   been successfully sent with the exposed congestion rate. Since the
   congestion rate may vary over time, the auditor should use an
   exponentially-weighted moving average (EWMA) for each flow to define
   the congestion rate p. An auditor should also maintain an EWMA for
   the rate of ConEx signals (Re-Echo-Loss and Re-Echo-ECN) x.

   TODO: what are appropriate weighting factors alpha in EWMA?

   Assume the packet rate is r and the congestion rate is p, but only a
   rate x of congestion is signaled by the sender using ConEx (x < p).
   The audit should aim at giving the flow just a rate of r*(x/p). In
   other words, it should drop (p-x)/p of the traffic, so its drop
   probability should be (p-x)/p. Therefore the audit just keeps x and
   p updated and derives the drop probability as (p-x)/p.

2.5. Audit start for existing connections

   An audit may be started with zero state information on existing
   flows, e.g. due to a (re-)started audit or re-routing of flows. As
   credits will have been sent in advance of congestion events, it is
   possible that no valid credit state is available at the audit when a
   congestion event occurs. An audit implementation should take this
   into account by ignoring the first criterion for some time. We
   recommend starting to take credit into account after one minute.

   TODO: is this the right way? This should be enough for the first
   congestion epoch, but disregards credit build-up in slow start. Or
   give some credit on Credit for the start? How much?

2.6. Handling Loss of ConEx-marked Packets

   ConEx-marked packets will be sent just after the sender has noticed a
   congestion signal, so often this sender will just have reduced its
   sending rate. Thus the loss probability for ConEx-marked packets is
   expected to be lower than for the average flow. Nevertheless,
   ConEx-marked packets can be lost.
   The sender should re-send the ConEx signal. This induces additional
   delay for that ConEx signal, but this is taken into account by using
   2x RTT_MAX as the threshold for penalties. In this way, false
   positives of the auditor's misbehavior detection are avoided. Only
   if two ConEx-marked packets are lost in consecutive RTTs will the
   auditor penalize a flow of a well-behaving sender. To avoid even
   these rare cases, at least for long-lasting connections, the audit
   may use the fraction of lost packets of that connection to allow for
   the same fraction of loss for each ConEx mark (E, L and C) for a time
   longer than 2x RTT_MAX. Nevertheless, the rare event of loss of
   ConEx-marked packets will often cause the audit to penalize the flow
   for one RTT. We deem this price acceptable for the clean and robust
   auditor design made possible by making the sender responsible for
   successful delivery of ConEx signals.

3. Acknowledgements

   We would like to thank Bob Briscoe for his input (based on research
   work of his PhD thesis).

4. Security Considerations

   Known and identified attacks will be discussed here. Bob Briscoe's
   dissertation provides good material for this. TODO.

5. IANA Considerations

   This document has no IANA considerations.

6. References

6.1. Normative References

   [draft-ietf-conex-abstract-mech]
              Mathis, M. and B. Briscoe, "Congestion Exposure (ConEx)
              Concepts and Abstract Mechanism",
              draft-ietf-conex-abstract-mech-08 (work in progress),
              October 2013.

   [draft-ietf-conex-destopt]
              Krishnan, S., Kuehlewind, M., and C. Ucendo, "IPv6
              Destination Option for ConEx", draft-ietf-conex-destopt-05
              (work in progress), March 2013.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP", RFC
              3168, September 2001.

6.2. Informative References

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

Authors' Addresses

   David Wagner
   University of Stuttgart
   Pfaffenwaldring 47
   70569 Stuttgart
   Germany

   Email: david.wagner@ikr.uni-stuttgart.de

   Mirja Kuehlewind
   University of Stuttgart
   Pfaffenwaldring 47
   70569 Stuttgart
   Germany

   Email: mirja.kuehlewind@ikr.uni-stuttgart.de
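
   As a non-normative illustration of the per-flow variables in Section
   2.2, the snapshot comparison in Section 2.3 and the drop probability
   in Section 2.4, the following Python sketch may help. It is not part
   of the draft's requirements. The class name FlowState, all method
   and parameter names, the weight alpha = 0.1 (the draft leaves alpha
   open), the per-packet sampling of the two EWMAs, counting ECN-CE in
   bytes, and clamping both the credit and the drop probability are
   assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass
class FlowState:
    """Hypothetical per-flow audit state; all names are illustrative."""
    ecn_ce: int = 0        # bytes of ECN-CE-marked packets (byte unit assumed)
    loss: int = 0          # bytes of packets detected as lost
    re_echo_ecn: int = 0   # bytes carrying ConEx signal E
    re_echo_loss: int = 0  # bytes carrying ConEx signal L
    credit: int = 0        # currently signaled, unconsumed credit in bytes
    p: float = 0.0         # EWMA of the congestion rate
    x: float = 0.0         # EWMA of the Re-Echo rate
    alpha: float = 0.1     # EWMA weight; assumed, left open in the draft

    def on_packet(self, size, ce=False, e_flag=False, l_flag=False,
                  c_flag=False):
        """Update the state for one observed packet of `size` payload bytes."""
        if ce:
            self.ecn_ce += size
        if e_flag:
            self.re_echo_ecn += size
        if l_flag:
            self.re_echo_loss += size
        if (e_flag or l_flag) and self.credit > 0:
            # Consume credit; stopping at zero is an assumption.
            self.credit = max(0, self.credit - size)
        if c_flag:
            self.credit += size
        # A packet carrying both E and L is counted twice in x.
        self._ewma(float(ce), float(e_flag) + float(l_flag))

    def on_loss(self, size):
        """Record a detected loss of `size` payload bytes."""
        self.loss += size
        self._ewma(1.0, 0.0)

    def _ewma(self, congested, echoed):
        # One sample per observed packet or loss (sampling is an assumption).
        self.p += self.alpha * (congested - self.p)
        self.x += self.alpha * (echoed - self.x)

    def snapshot(self):
        """Values to store when a timer of length >= 2x RTT_MAX is started."""
        return (self.loss, self.ecn_ce)

    def violates(self, old_snapshot):
        """True if congestion saved in `old_snapshot` is still not re-echoed."""
        saved_loss, saved_ce = old_snapshot
        return saved_loss > self.re_echo_loss or saved_ce > self.re_echo_ecn

    def drop_probability(self):
        """(p - x) / p, clamped to [0, 1]; zero while p is zero."""
        if self.p <= 0.0:
            return 0.0
        return min(1.0, max(0.0, (self.p - self.x) / self.p))
```

   In this sketch, an auditor would call snapshot() when a timer is
   started, call violates() with the oldest stored snapshot when the
   timer fires, and drop packets of a flow in penalty state with
   drop_probability(). The clamping avoids negative probabilities when
   the sender re-echoes more than the observed congestion, a case the
   text of Section 2.4 does not cover.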