Network Working Group                                          R. Miao
Internet-Draft                                                   H. Liu
Intended status: Experimental                              Alibaba Group
Expires: March 15, 2021                                           R. Pan
                                                                  J. Lee
                                                                  C. Kim
                                                        Intel Corporation
                                                                 B. Gafni
                                                            Y. Shpigelman
                                               Mellanox Technologies, Inc.
                                                       September 11, 2020

          HPCC++: Enhanced High Precision Congestion Control
                      draft-pan-tsvwg-hpccplus-02

Abstract

   Congestion control (CC) is the key to achieving ultra-low latency, high bandwidth and network stability in high-speed networks.  However, the existing high-speed CC schemes have inherent limitations for reaching these goals.

   In this document, we describe HPCC++ (High Precision Congestion Control), a new high-speed CC mechanism which achieves the three goals simultaneously.  HPCC++ leverages inband telemetry to obtain precise link load information and controls traffic precisely.  By addressing challenges such as delayed inband telemetry information during congestion and overreaction to inband telemetry information, HPCC++ can quickly converge to utilize free bandwidth while avoiding congestion, and can maintain near-zero in-network queues for ultra-low latency.  HPCC++ is also fair and easy to deploy in hardware, implementable with commodity NICs and switches.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 15, 2021.
Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.  Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  System Overview
   4.  HPCC++ Algorithm
     4.1.  Notations
     4.2.  Design Functions and Procedures
   5.  Configuration Parameters
   6.  Design Enhancement and Implementation
     6.1.  HPCC++ Guidelines
     6.2.  Receiver-based HPCC
   7.  Reference Implementations
     7.1.  Inband telemetry padding at the network elements
     7.2.  Congestion control at NICs
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgments
   11. Contributors
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   The link speed in data center networks has grown from 1Gbps to 100Gbps in the past decade, and this growth is continuing.  Ultra-low latency and high bandwidth, which are demanded by more and more applications, are two critical requirements in today's and future high-speed networks.

   Given that traditional software-based network stacks in hosts can no longer sustain the critical latency and bandwidth requirements [Zhu-SIGCOMM2015], offloading network stacks into hardware is an inevitable direction in high-speed networks.  Large-scale networks with RDMA (remote direct memory access) often use hardware-offloading solutions.  In some cases, the RDMA networks still face fundamental challenges to reconcile low latency, high bandwidth utilization, and high stability.

   This document describes a new CC mechanism, HPCC++ (Enhanced High Precision Congestion Control), for large-scale, high-speed networks.  The key idea behind HPCC++ is to leverage the precise link load information from inband telemetry to compute accurate flow rate updates.  Unlike existing approaches that often require a large number of iterations to find the proper flow rates, HPCC++ requires only one rate update step in most cases.
   Using precise information from inband telemetry enables HPCC++ to address the limitations in current CC schemes.  First, HPCC++ senders can quickly ramp up flow rates for high utilization and ramp down flow rates for congestion avoidance.  Second, HPCC++ senders can quickly adjust the flow rates to keep each link's output rate slightly lower than the link's capacity, preventing queues from building up while preserving high link utilization.  Finally, since sending rates are computed precisely based on direct measurements at switches, HPCC++ requires merely three independent parameters that are used to tune fairness and efficiency.

   The base form of HPCC++ is the original HPCC algorithm, whose full description can be found in [SIGCOMM-HPCC].  While the original design lays the foundation for inband telemetry based precision congestion control, HPCC++ is an enhanced version which takes into account system constraints, aims to reduce the design overhead, and further improves the performance.  Section 6 describes these proposed design enhancements and guidelines in detail.

   HPCC++ proposes a new architecture for congestion control in large-scale, high-speed networks.  On one hand, HPCC++ leverages inband telemetry for congestion feedback, which offers more precise link load information for congestion avoidance than conventional signals such as ECN or RTT.  This draft describes the architecture changes in switches and end-hosts to support inband telemetry and demonstrates their efficiency in handling network congestion.  On the other hand, HPCC++ is generic enough to support a wide range of transport protocols such as TCP, UDP, iWARP, etc.  It requires a window limit and congestion feedback through ACK self-clocking, which naturally conforms to the paradigm of TCP/iWARP design.  In addition, HPCC++ introduces a scheme to measure the total inflight bytes for more precise congestion control.  To run over UDP, incremental modifications are needed to enforce the window limit and to collect congestion feedback via probing packets.  Furthermore, this new architecture should work for both datacenter and WAN networks, provided that inband telemetry is supported by the network switches and end-host protocols.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3.  System Overview

   Figure 1 shows the end-to-end system that HPCC++ operates in.  As a packet traverses the path from the sender to the receiver, each switch along the path inserts inband telemetry that reports the current state of the packet's egress port, including timestamp (ts), queue length (qLen), transmitted bytes (txBytes), and the link bandwidth capacity (B), together with switch_ID and port_ID.  When the receiver gets the packet, it may copy all the inband telemetry recorded from the network into the ACK message it sends back to the sender, and the sender then decides how to adjust its flow rate each time it receives an ACK with network load information.
   Alternatively, the receiver may calculate the flow rate based on the inband telemetry information and feed the calculated rate back to the sender.  The notification packets would include delayed ACK information as well.

   Note that there also exist network nodes along the reverse (potentially uncongested) path that the notification packets/ACKs traverse.  Those network nodes are not shown in the figure for the sake of brevity.

    +---------+  pkt    +-------+ pkt+tlm +-------+ pkt+tlm +----------+
    |  Data   |-------->|       |-------->|       |-------->|   Data   |
    | Sender  |=========|Switch1|=========|Switch2|=========| Receiver |
    +---------+ Link-0  +-------+ Link-1  +-------+ Link-2  +----------+
        /|\                                                       |
         |                                                        |
         +--------------------------------------------------------+
                          Notification Packets/ACKs

             Figure 1: System Overview (tlm = inband telemetry)

   o  Data sender: responsible for controlling inflight bytes.  HPCC++ is a window-based CC scheme that controls the number of inflight bytes.  The inflight bytes are the amount of data that has been sent but not yet acknowledged at the sender.  Controlling inflight bytes has an important advantage compared to controlling rates.  In the absence of congestion, the inflight bytes and rate are interchangeable via the equation inflight = rate * T, where T is the base propagation RTT.  The rate can be calculated locally or obtained from the notification packet.  The sender may further use a hardware data pacing mechanism to limit the rate accordingly.

   o  Network nodes: responsible for inserting the inband telemetry information into the data packet.  The inband telemetry information reports the current load of the packet's egress port, including timestamp (ts), queue length (qLen), transmitted bytes (txBytes), and the link bandwidth capacity (B).  In addition, the inband telemetry contains switch_ID and port_ID to identify a link.

   o  Data receiver: responsible for either reflecting back the inband telemetry information in the data packet or calculating the proper flow rate based on the network congestion information in the inband telemetry, and for sending notification packets back to the sender.
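   To make the per-hop information and the sender feedback concrete, the following is a minimal, non-normative sketch in Python of the telemetry record and the notification/ACK contents described above.  The field names mirror this section; the class names and the encoding itself are illustrative assumptions only, since this document does not mandate a particular inband telemetry format (see Section 6.1).

      # Minimal sketch (not a normative format) of the per-hop telemetry
      # record and the feedback carried back to the sender (Section 3).
      from dataclasses import dataclass
      from typing import List

      @dataclass
      class HopTelemetry:
          switch_id: int   # switch_ID of the node that appended this record
          port_id: int     # port_ID of the egress port
          ts: float        # timestamp taken at the egress port, in seconds
          qlen: int        # egress queue length (qLen), in bytes
          tx_bytes: int    # accumulated transmitted bytes (txBytes) on the port
          link_bw: float   # link bandwidth capacity (B), in bytes per second

      @dataclass
      class Feedback:
          seq: int                   # highest sequence number being acknowledged
          hops: List[HopTelemetry]   # one record per egress port on the path

   A receiver that simply reflects telemetry would copy the HopTelemetry list from the data packet into the Feedback it sends back; a receiver running Rx-HPCC (Section 6.2) would instead carry the computed window.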
4.  HPCC++ Algorithm

   HPCC++ is a window-based congestion control algorithm.  The key design choice of HPCC++ is to rely on network nodes to provide fine-grained load information, such as queue size and accumulated tx/rx traffic, to compute precise flow rates.  This has two major benefits: (i) HPCC++ can quickly converge to proper flow rates to highly utilize bandwidth while avoiding congestion; and (ii) HPCC++ can consistently maintain a close-to-zero queue for low latency.

   This section introduces the list of notations and describes the core congestion control algorithm.

4.1.  Notations

   This section summarizes the list of variables and parameters used in the HPCC++ algorithm.  Figure 3 also includes the default values for choosing the algorithm parameters, either to represent a typical setting in practical applications or based on theoretical and simulation studies.

   +--------------+-------------------------------------------------+
   | Notation     | Variable Name                                   |
   +--------------+-------------------------------------------------+
   | W_i          | Window for flow i                               |
   | Wc_i         | Reference window for flow i                     |
   | B_j          | Bandwidth for link j                            |
   | I_j          | Estimated inflight bytes for link j             |
   | U_j          | Normalized inflight bytes for link j            |
   | qlen         | Telemetry info: link j queue length             |
   | txRate       | Telemetry info: link j output rate              |
   | ts           | Telemetry info: timestamp                       |
   | txBytes      | Telemetry info: link j total transmitted bytes  |
   |              | associated with timestamp ts                    |
   +--------------+-------------------------------------------------+

                      Figure 2: List of variables.

   +--------------+----------------------------------+----------------+
   | Notation     | Parameter Name                   | Default Value  |
   +--------------+----------------------------------+----------------+
   | T            | Known baseline RTT               | 5us            |
   | eta          | Target link utilization          | 95%            |
   | maxStage     | Maximum stages for additive      |                |
   |              | increases                        | 5              |
   | N            | Maximum number of flows          | ...            |
   | W_ai         | Additive increase amount         | ...            |
   +--------------+----------------------------------+----------------+

     Figure 3: List of algorithm parameters and their default values.

4.2.  Design Functions and Procedures

   The HPCC++ algorithm can be outlined as below:

   1:  Function MeasureInflight(ack)
   2:    u = 0;
   3:    for each link i on the path do
   4:      txRate = (ack.L[i].txBytes - L[i].txBytes) / (ack.L[i].ts - L[i].ts);
   5:      u' = min(ack.L[i].qlen, L[i].qlen) / (ack.L[i].B * T) + txRate / ack.L[i].B;
   6:      if u' > u then
   7:        u = u'; tau = ack.L[i].ts - L[i].ts;
   8:    tau = min(tau, T);
   9:    U = (1 - tau/T)*U + tau/T*u;
   10:   return U;

   11: Function ComputeWind(U, updateWc)
   12:   if U >= eta or incStage >= maxStage then
   13:     W = Wc / (U/eta) + W_ai;
   14:     if updateWc then
   15:       incStage = 0; Wc = W;
   16:   else
   17:     W = Wc + W_ai;
   18:     if updateWc then
   19:       incStage++; Wc = W;
   20:   return W;

   21: Procedure NewAck(ack)
   22:   if ack.seq > lastUpdateSeq then
   23:     W = ComputeWind(MeasureInflight(ack), True);
   24:     lastUpdateSeq = snd_nxt;
   25:   else
   26:     W = ComputeWind(MeasureInflight(ack), False);
   27:   R = W/T; L = ack.L;

   The above illustrates the overall process of CC at the sender side for a single flow.  Each newly received ACK message triggers the procedure NewAck at Line 21.  At Line 22, the variable lastUpdateSeq is used to remember the first packet sent with a new Wc, and the sequence number in the incoming ACK should be larger than lastUpdateSeq to trigger a new sync between Wc and W (Lines 14-15 and 18-19).  The sender also remembers the pacing rate and the current inband telemetry information at Line 27.  The sender computes a new window size W at Line 23 or Line 26, depending on whether to update Wc, with the functions MeasureInflight and ComputeWind.  Function MeasureInflight estimates the normalized inflight bytes with Eqn (2) at Line 5.  First, it computes the txRate of each link from the current and last accumulated transferred bytes txBytes and timestamps ts (Line 4).  It also uses the minimum of the current and last qlen to filter out noise in qlen (Line 5).  The loop from Line 3 to 7 selects max_i(U_i) in Eqn (3).  Instead of directly using max_i(U_i), an EWMA (Exponentially Weighted Moving Average) is used to filter out noise from timer inaccuracy and transient queues (Line 9).  Function ComputeWind combines multiplicative increase/decrease (MI/MD) and additive increase (AI) to balance the reaction speed and fairness.  If a sender finds it should increase the window size, it first tries AI for maxStage times with the step W_ai (Line 17).  If it still finds room to increase after maxStage times of AI, or the normalized inflight bytes is above eta, it calls Eqn (4) once to quickly ramp up or ramp down the window size (Lines 12-13).
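   The following Python sketch is a direct, non-normative transcription of Lines 1-27, reusing the HopTelemetry/Feedback sketch from Section 3.  The constants follow the Figure 3 defaults; the W_AI value and the handling of the very first ACK are illustrative assumptions, and a production implementation would run this logic in NIC hardware (Section 7.2).

      # Non-normative transcription of the pseudocode (Lines 1-27), one flow.
      T = 5e-6           # known baseline RTT (Figure 3)
      ETA = 0.95         # target link utilization eta (Figure 3)
      MAX_STAGE = 5      # maximum additive-increase stages (Figure 3)
      W_AI = 1500.0      # additive increase step W_ai in bytes (assumed value)

      class HpccSender:
          def __init__(self, w_init):
              self.W = self.Wc = float(w_init)  # current and reference windows
              self.inc_stage = 0
              self.U = 0.0                      # EWMA of normalized inflight bytes
              self.last_update_seq = 0
              self.snd_nxt = 0                  # next sequence number to send
              self.L = None                     # telemetry recorded from last ACK
              self.R = self.W / T               # pacing rate

          def measure_inflight(self, ack):      # Lines 1-10
              if self.L is None:                # first ACK: nothing to compare yet
                  return self.U
              u, tau = 0.0, T
              for cur, prev in zip(ack.hops, self.L):
                  tx_rate = (cur.tx_bytes - prev.tx_bytes) / (cur.ts - prev.ts)
                  u_prime = (min(cur.qlen, prev.qlen) / (cur.link_bw * T)
                             + tx_rate / cur.link_bw)
                  if u_prime > u:
                      u, tau = u_prime, cur.ts - prev.ts
              tau = min(tau, T)
              self.U = (1 - tau / T) * self.U + (tau / T) * u   # EWMA, Line 9
              return self.U

          def compute_wind(self, U, update_wc):  # Lines 11-20
              if U >= ETA or self.inc_stage >= MAX_STAGE:
                  W = self.Wc / (U / ETA) + W_AI  # multiplicative adjustment
                  if update_wc:
                      self.inc_stage, self.Wc = 0, W
              else:
                  W = self.Wc + W_AI              # additive increase
                  if update_wc:
                      self.inc_stage += 1
                      self.Wc = W
              return W

          def new_ack(self, ack):                 # Lines 21-27
              update_wc = ack.seq > self.last_update_seq
              self.W = self.compute_wind(self.measure_inflight(ack), update_wc)
              if update_wc:
                  self.last_update_seq = self.snd_nxt
              self.R = self.W / T                 # pacing rate
              self.L = ack.hops                   # remember telemetry for next ACK

   In the sketch, new_ack() corresponds to the per-ACK procedure; W is the window handed to the window-limiting mechanism and R = W/T is the corresponding pacing rate.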
5.  Configuration Parameters

   HPCC++ has three easy-to-set parameters: eta, maxStage, and W_ai.  eta controls a simple tradeoff between utilization and transient queue length (due to the temporary collision of packets caused by their random arrivals), so we set it to 95% by default, which only loses 5% bandwidth but achieves almost zero queue.  maxStage controls a simple tradeoff between steady-state stability and the speed to reclaim free bandwidth.  We find maxStage = 5 is conservatively large for stability, while the speed of reclaiming free bandwidth is still much faster than traditional additive increase, especially in high-bandwidth networks.  W_ai controls the tradeoff between the maximum number of concurrent flows on a link that can sustain near-zero queues and the speed of convergence to fairness.  Note that none of the three parameters are reliability-critical.

   HPCC++'s design brings advantages to short-lived flows by allowing flows to start at line rate and by separating utilization convergence from fairness convergence.  HPCC++ achieves fast utilization convergence to mitigate congestion in almost one round-trip time, while allowing flows to gradually converge to fairness.  This design feature of HPCC++ is especially helpful for the workload of datacenter applications, where flows are usually short and latency-sensitive.  Normally we set a very small W_ai to support a large number of concurrent flows on a link, because slower fairness is not critical.  A rule of thumb is to set W_ai = W_init*(1-eta)/N, where N is the expected or receiver-reported maximum number of concurrent flows on a link.  The intuition is that the total additive increase every round (N*W_ai) should not exceed the bandwidth headroom, and thus no queue forms.  Even if the actual number of concurrent flows on a link exceeds N, the CC is still stable and achieves full utilization, but just cannot maintain zero queues.
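   As a small worked example of the W_ai rule of thumb above, the following Python lines compute W_ai for one assumed setting: a 100 Gbps link, the 5us baseline RTT from Figure 3, and N = 1000 expected flows.  The numbers are illustrative only, not recommended values.

      # Worked example of W_ai = W_init*(1-eta)/N under assumed values.
      link_bw  = 100e9 / 8     # 100 Gbps link capacity, in bytes per second
      base_rtt = 5e-6          # known baseline RTT T (Figure 3 default)
      eta      = 0.95          # target link utilization (Figure 3 default)
      n_flows  = 1000          # expected maximum number of concurrent flows N

      w_init = link_bw * base_rtt               # line-rate window: 62500 bytes
      w_ai   = w_init * (1 - eta) / n_flows     # additive step: ~3.1 bytes/RTT

      # The aggregate additive increase per round, N*W_ai, equals the 5%
      # bandwidth headroom left by eta, so no standing queue is expected.
      print(f"W_init = {w_init:.0f} bytes, W_ai = {w_ai:.2f} bytes")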
6.  Design Enhancement and Implementation

   The basic design of HPCC++, i.e. HPCC, as described above adds inband telemetry information to every data packet so that a sender can respond to congestion as soon as the very first packet observes it.  This is especially helpful to reduce the risk of severe congestion in incast scenarios within the first round-trip time.  In addition, the original HPCC algorithm introduces Wc for the purpose of avoiding over-reaction when using this per-packet response.

   Alternatively, the inband telemetry information need not be added to every data packet, in order to reduce the overhead.  Switches can generate inband telemetry less frequently, e.g., once per RTT or when congestion happens.

6.1.  HPCC++ Guidelines

   To ensure network stability, HPCC++ establishes a few guidelines for different implementations:

   o  The algorithm should commit the window/rate update at most once per round-trip time, similar to the procedure of updating Wc.

   o  To support different workloads and to properly set W_ai, HPCC++ allows the option to incorporate mechanisms to speed up the fairness convergence.

   o  The switch should capture inband telemetry information that includes link load (txBytes, qlen, ts) and link spec (switch_ID, port_ID, B) at the egress port.  Note that each switch should record all of this information in a single snapshot to achieve a precise link load estimate.

   o  HPCC++ can use a probe packet to query the inband telemetry information.  In that case, the probe packets should take the same routing path and QoS queueing as the data packets.

   As long as the above guidelines are met, this document does not mandate a particular inband telemetry header format or encapsulation, which are orthogonal to the HPCC++ algorithms described in this document.  The algorithm can be implemented with a choice of inband telemetry protocols, such as in-band network telemetry [P4-INT], IOAM [I-D.ietf-ippm-ioam-data], IFA [I-D.ietf-kumar-ippm-ifa] and others.

6.2.  Receiver-based HPCC

   Note that the window/rate calculation can be implemented at either the data sender or the data receiver.  If ACK packets already exist for reliability purposes, the inband telemetry information can be echoed back to the sender via ACK self-clocking.  Not all ACK packets need to carry the inband telemetry information.  To reduce the Packet Per Second (PPS) overhead, the receiver may examine the inband telemetry information and adopt the technique of delayed ACKs that only sends out an ACK for every few received packets.  To reduce the PPS even further, one may implement the algorithm at the receiver and feed the calculated window back in the ACK packet once every RTT.

   The receiver-based algorithm, Rx-HPCC, is based on int.L, which is the inband telemetry information in the packet header.  The receiver performs the same functions except using int.L instead of ack.L.  The new procedure NewINT(int.L) replaces NewAck(ack):

   28: Procedure NewINT(int.L)
   29:   if now > (lastUpdateTime + T) then
   30:     W = ComputeWind(MeasureInflight(int), True);
   31:     send_ack(W)
   32:     lastUpdateTime = now;
   33:   else
   34:     W = ComputeWind(MeasureInflight(int), False);

   Here, since the receiver does not know the starting sequence number of a burst, it simply records the lastUpdateTime.  If time T has passed since lastUpdateTime, the algorithm recalculates Wc as in Line 30 and sends out an ACK packet which includes the W information.  Otherwise, it just updates the W information locally.  This reduces the amount of traffic that needs to be fed back to the data sender.

   Note that the receiver can also measure the number of outstanding flows, N, if the last hop is the congestion point, and use this information to dynamically adjust W_ai to achieve better fairness.  This improvement allows flows to quickly converge to fairness without causing large swings under heavy load.
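   Under the same assumptions as the sender-side sketch in Section 4.2, the receiver-based procedure (Lines 28-34) can be sketched as below.  The send_ack callback and the use of a local monotonic clock are hypothetical stand-ins for NIC-specific mechanisms.

      # Non-normative sketch of Rx-HPCC (Lines 28-34), reusing the HpccSender
      # sketch from Section 4.2.  'pkt' is assumed to expose the inband
      # telemetry list as pkt.hops; send_ack() is a placeholder for the NIC's
      # ACK generation path.
      import time

      class RxHpccReceiver(HpccSender):
          def __init__(self, w_init, send_ack):
              super().__init__(w_init)
              self.last_update_time = 0.0
              self.send_ack = send_ack       # callback carrying W to the sender

          def new_int(self, pkt):            # invoked per packet with telemetry
              now = time.monotonic()
              if now > self.last_update_time + T:
                  self.W = self.compute_wind(self.measure_inflight(pkt), True)
                  self.send_ack(self.W)      # feed the computed window back
                  self.last_update_time = now
              else:
                  self.W = self.compute_wind(self.measure_inflight(pkt), False)
              self.L = pkt.hops              # remember telemetry for next packet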
7.  Reference Implementations

   A prototype of HPCC++ has been implemented in NICs to realize the CC algorithm and in switches to realize the inband telemetry feature.

7.1.  Inband telemetry padding at the network elements

   HPCC++ only relies on packets to share information across senders, receivers, and switches.  HPCC++ is open to a variety of inband telemetry format standards.  Inside a data center, the path length is often no more than 5 hops.  The overhead of the inband telemetry padding for HPCC++ is therefore considered to be low.

7.2.  Congestion control at NICs

   Figure 4 shows the HPCC++ implementation on a NIC.  The NIC provides an HPCC++ module that resides on the data path of the NIC; HPCC++ modules realize both the sender and receiver roles.

   +------------------------------------------------------------------+
   | +---------+  window update  +-----------+  PktSend +-----------+ |
   | |         |---------------->| Scheduler |--------->|Tx pipeline|-+->
   | |         |   rate update   +-----------+          +-----------+ |
   | | HPCC++  |                                ^                     |
   | |         |              inband telemetry  |                     |
   | | module  |                                |                     |
   | |         |                          +-----+-----+               |
   | |         |<-------------------------|Rx pipeline|<--------------+--
   | +---------+ telemetry response event +-----------+               |
   +------------------------------------------------------------------+

                 Figure 4: Overview of NIC Implementation

   1.  Sender side flow

   The HPCC++ module runs the HPCC++ CC algorithm on the sender side for every flow in the NIC.  A flow can be defined by transport parameters including 5-tuples, destination QP (queue pair), etc.  The module receives per-flow inband telemetry response events generated by the RX pipeline, adjusts the sending window and rate, and updates the scheduler with the rate and window of the flow.

   The scheduler contains a pacing mechanism that determines the flow rate based on the value it receives from the algorithm.  It also maintains the current sending window size for active flows.  If the pacing mechanism and the flow's sending window permit, the scheduler issues a PktSend command for the flow to the TX pipeline.

   The TX pipeline implements packet processing.  Once it receives the PktSend event with a flow ID from the scheduler, it generates the corresponding packet and delivers it to the network.  If a sent packet should collect telemetry on its way, the TX pipeline may add indications/headers that trigger the network elements to add telemetry data according to the inband telemetry protocol in use.  The telemetry can be collected by the data packet or by dedicated probe packets generated in the TX pipeline.

   The RX pipeline parses the incoming packets from the network and identifies whether telemetry is embedded in the parsed packet.  On receiving a telemetry response packet, the RX pipeline extracts the network status from the packet and passes it to the HPCC++ module for processing.  A telemetry response packet can be an ACK containing inband telemetry, or a dedicated telemetry response probe packet.

   2.  Receiver side flow

   On receiving a packet containing inband telemetry, the RX pipeline extracts the network status and the flow parameters from the packet and passes them to the TX pipeline.  The packet can be a data packet containing inband telemetry, or a dedicated telemetry request probe packet.  The TX pipeline may process and edit the telemetry data, and then sends the data back to the sender using either an ACK packet of the flow or a dedicated telemetry response packet.
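   To relate Figure 4 to the algorithm sketches above, the following is a purely illustrative sketch of the sender-side event flow: the RX pipeline delivers a telemetry response event to the HPCC++ module, which runs the per-flow algorithm and programs the scheduler with the new rate and window.  All class and method names here are hypothetical; an actual NIC realizes this flow in hardware.

      # Illustrative sketch of the sender-side event flow in Figure 4
      # (hypothetical names; reuses the HpccSender sketch from Section 4.2).
      class Scheduler:
          def __init__(self):
              self.flows = {}                        # flow_id -> (rate, window)

          def update(self, flow_id, rate, window):   # "rate update/window update"
              self.flows[flow_id] = (rate, window)

      class HpccModule:
          def __init__(self, scheduler, w_init):
              self.scheduler = scheduler
              self.w_init = w_init
              self.per_flow = {}                     # per-flow HPCC++ state

          def on_telemetry_response(self, flow_id, feedback):
              # RX pipeline -> HPCC++ module: run the CC algorithm for the flow.
              state = self.per_flow.setdefault(flow_id, HpccSender(self.w_init))
              state.new_ack(feedback)
              # HPCC++ module -> scheduler: program the new pacing rate and
              # window; the scheduler then issues PktSend commands to the TX
              # pipeline when the pacing rate and sending window permit.
              self.scheduler.update(flow_id, state.R, state.W)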
8.  IANA Considerations

   This document makes no request of IANA.

9.  Security Considerations

   The rate adaptation mechanism in HPCC++ relies on feedback from the network.  As such, it is vulnerable to attacks where feedback messages are hijacked, replaced, or intentionally injected with misleading information, resulting in denial of service, similar to those that can affect TCP.  It is therefore RECOMMENDED that the notification feedback message be at least integrity checked.  In addition, [I-D.ietf-avtcore-cc-feedback-message] discusses the potential risk of a receiver providing misleading congestion feedback information and the mechanisms for mitigating such risks.

10.  Acknowledgments

   The authors would like to thank ... for their valuable review comments and helpful input to this specification.

11.  Contributors

   The following individuals have contributed to the implementation and evaluation of the proposed scheme, and therefore have helped to validate and substantially improve this specification: Pedro Y. Segura, Roberto P. Cebrian, Robert Southworth and Malek Musleh.

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

12.2.  Informative References

   [I-D.ietf-avtcore-cc-feedback-message]
              Sarker, Z., Perkins, C., Singh, V., and M. Ramalho, "RTP Control Protocol (RTCP) Feedback for Congestion Control", draft-ietf-avtcore-cc-feedback-message-08 (work in progress), September 2020.

   [I-D.ietf-ippm-ioam-data]
              "Data Fields for In-situ OAM", March 2020.

   [I-D.ietf-kumar-ippm-ifa]
              "Inband Flow Analyzer", February 2019.

   [P4-INT]   "In-band Network Telemetry (INT) Dataplane Specification, v2.0", February 2020.

   [SIGCOMM-HPCC]
              Li, Y., Miao, R., Liu, H., Zhuang, Y., Feng, F., Tang, L., Cao, Z., Zhang, M., Kelly, F., Alizadeh, M., and M. Yu, "HPCC: High Precision Congestion Control", ACM SIGCOMM, Beijing, China, August 2019.

   [Zhu-SIGCOMM2015]
              Zhu, Y., Eran, H., Firestone, D., Guo, C., Lipshteyn, M., Liron, Y., Padhye, J., Raindel, S., Yahia, M., and M. Zhang, "Congestion Control for Large-Scale RDMA Deployments", ACM SIGCOMM, London, United Kingdom, August 2015.

Authors' Addresses

   Rui Miao
   Alibaba Group
   525 Almanor Ave, 4th Floor
   Sunnyvale, CA 94085
   USA

   Email: miao.rui@alibaba-inc.com

   Hongqiang H. Liu
   Alibaba Group
   108th Ave NE, Suite 800
   Bellevue, WA 98004
   USA

   Email: hongqiang.liu@alibaba-inc.com

   Rong Pan
   Intel, Corp.
   2200 Mission College Blvd.
   Santa Clara, CA 95054
   USA

   Email: rong.pan@intel.com

   Jeongkeun Lee
   Intel, Corp.
   4750 Patrick Henry Dr.
   Santa Clara, CA 95054
   USA

   Email: jk.lee@intel.com

   Changhoon Kim
   Intel Corporation
   4750 Patrick Henry Dr.
   Santa Clara, CA 95054
   USA

   Email: chang.kim@intel.com

   Barak Gafni
   Mellanox Technologies, Inc.
   350 Oakmead Parkway, Suite 100
   Sunnyvale, CA 94085
   USA

   Email: gbarak@mellanox.com

   Yuval Shpigelman
   Mellanox Technologies, Inc.
   Haim Hazaz 3A
   Netanya 4247417
   Israel

   Email: yuvals@nvidia.com