idnits 2.17.1 draft-gg-udt-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 15) being 59 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (April 12, 2010) is 5100 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- -- Obsolete informational reference (is this intentional?): RFC 4960 (Obsoleted by RFC 9260) Summary: 0 errors (**), 0 flaws (~~), 2 warnings (==), 3 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Internet Engineering Task Force Yunhong Gu 3 Internet Draft University of Illinois at Chicago 4 Intended status: Informational April 12, 2010 5 Expires: October 12, 2010 7 UDT: UDP-based Data Transfer Protocol 8 draft-gg-udt-03.txt 10 Status of this Memo 12 This Internet-Draft is submitted to IETF in full conformance with the 13 provisions of BCP 78 and BCP 79. 15 Internet-Drafts are working documents of the Internet Engineering 16 Task Force (IETF), its areas, and its working groups. Note that 17 other groups may also distribute working documents as Internet- 18 Drafts. 20 Internet-Drafts are draft documents valid for a maximum of six months 21 and may be updated, replaced, or obsoleted by other documents at any 22 time. It is inappropriate to use Internet-Drafts as reference 23 material or to cite them other than as "work in progress." 25 The list of current Internet-Drafts can be accessed at 26 http://www.ietf.org/ietf/1id-abstracts.txt. 28 The list of Internet-Draft Shadow Directories can be accessed at 29 http://www.ietf.org/shadow.html. 31 This Internet-Draft will expire on October 15, 2010. 33 Copyright Notice 35 Copyright (c) 2010 IETF Trust and the persons identified as the 36 document authors. All rights reserved. 38 This document is subject to BCP 78 and the IETF Trust's Legal 39 Provisions Relating to IETF Documents 40 (http://trustee.ietf.org/license-info) in effect on the date of 41 publication of this document. Please review these documents 42 carefully, as they describe your rights and restrictions with respect 43 to this document. 45 Abstract 47 This document describes UDT, or the UDP based Data Transfer protocol. 48 UDT is designed to be an alternative data transfer protocol for the 49 situations when TCP does not work well. One of the most common cases, 50 and also the original motivation of UDT, is to overcome TCP's 51 UDT April 12, 2010 53 inefficiency in high bandwidth-delay product (BDP) networks. Another 54 important target use scenario is to allow networking researchers, 55 students, and application developers to easily implement and deploy 56 new data transfer algorithms and protocols. Furthermore, UDT can also 57 be used to better support firewall traversing. 59 UDT is completely built on top of UDP. However, UDT is connection 60 oriented, unicast, and duplex. It supports both reliable data 61 streaming and partial reliable messaging. The congestion control 62 module is an open framework that can be used to implement and/or 63 deploy different control algorithms. UDT also has a native/default 64 control algorithm based on AIMD rate control. 66 UDT April 12, 2010 68 Table of Contents 70 1. Introduction...................................................4 71 2. Packet Structures..............................................5 72 3. UDP Multiplexer................................................8 73 4. Timers.........................................................8 74 5. Connection Setup and shutdown..................................9 75 5.1 Client/Server Connection Setup............................10 76 5.2 Rendezvous Connection Setup...............................10 77 5.3 Shutdown..................................................11 78 6. Data Sending and Receiving....................................11 79 6.1 The Sender's Algorithm....................................11 80 6.2 The Receiver's Algorithm..................................12 81 6.3 Flow Control..............................................15 82 6.4 Loss Information Compression Scheme.......................15 83 7. Configurable Congestion Control (CCC).........................15 84 7.1 CCC Interface.............................................15 85 7.2 UDT's Native Control Algorithm............................16 86 Security Considerations..........................................18 87 Normative References.............................................18 88 Informative References...........................................18 89 Author's Addresses...............................................19 90 UDT April 12, 2010 92 1. 93 Introduction 95 The Transmission Control Protocol (TCP) [RFC5681] has been very 96 successful and greatly contributes to the popularity of today's 97 Internet. Today TCP still contributes the majority of the traffic on 98 the Internet. 100 However, TCP is not perfect and it is not designed for every specific 101 applications. In the last several years, with the rapid advance of 102 optical networks and rich Internet applications, TCP has been found 103 inefficient as the network bandwidth-delay product (BDP) increases. 104 Its AIMD (additive increase multiplicative decrease) algorithm 105 reduces the TCP congestion window drastically but fails to recover it 106 to the available bandwidth quickly. Theoretical flow level analysis 107 has shown that TCP becomes more vulnerable to packet loss as the BDP 108 increases higher [LM97]. 110 To overcome the TCP's inefficiency problem over high speed wide area 111 networks is the original motivation of UDT. Although there are new 112 TCP variants deployed today (for example, BiC TCP [XHR04] on Linux 113 and Compound TCP [TS06] on Windows), certain problems still exist. 114 For example, none of the new TCP variants address RTT unfairness, the 115 situation that connections with shorter RTT consume more bandwidth. 117 Moreover, as the Internet continues to evolve, new challenges and 118 requirements to the transport protocol will always emerge. 119 Researchers need a platform to rapidly develop and test new 120 algorithms and protocols. Network researchers and students can use 121 UDT to easily implement their ideas on transport protocols, in 122 particular congestion control algorithms, and conduct experiments 123 over real networks. 125 Finally, there are other situations when UDT can be found more 126 helpful than TCP. For example, UDP-based protocol is usually easier 127 for punching NAT firewalls. For another example, TCP's congestion 128 control and reliability control is not desirable in certain 129 applications of VOIP, wireless communication, etc. Application 130 developers can use (with or without modification) UDT to suit their 131 requirements. 133 Due to all those reasons and motivations described above, we believe 134 that it is necessary to design a well defined and developed UDP-based 135 data transfer protocol. 137 As its name suggest, UDT is built solely on the top of UDP [RFC768]. 138 Both data and control packets are transferred using UDP. UDT is 139 connection-oriented in order to easily maintain congestion control, 140 reliability, and security. It is a unicast protocol while multicast 141 is not considered here. Finally, data can be transferred over UDT in 142 UDT April 12, 2010 144 duplex. 146 UDT supports both reliable data streaming and partial reliable 147 messaging. The data streaming semantics is similar to that of TCP, 148 while the messaging semantics can be regarded as a subset of SCTP 149 [RFC4960]. 151 This document defines UDT's protocol specification. The detailed 152 description and performance analysis can be found in [GG07], and a 153 fully functional reference implementation can be found at [UDT]. 155 2. 156 Packet Structures 158 UDT has two kinds of packets: the data packets and the control 159 packets. They are distinguished by the 1st bit (flag bit) of the 160 packet header. 162 The data packet header structure is as following. 164 0 1 2 3 165 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 166 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 167 |0| Packet Sequence Number | 168 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 169 |FF |O| Message Number | 170 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 171 | Time Stamp | 172 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 173 | Destination Socket ID | 174 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 176 The data packet header starts with 0. Packet sequence number uses the 177 following 31 bits after the flag bit. UDT uses packet based 178 sequencing, i.e., the sequence number is increased by 1 for each sent 179 data packet in the order of packet sending. Sequence number is 180 wrapped after it is increased to the maximum number (2^31 - 1). 182 The next 32-bit field in the header is for the messaging. The first 183 two bits "FF" flags the position of the packet is a message. "10" is 184 the first packet, "01" is the last one, "11" is the only packet, and 185 "00" is any packets in the middle. The third bit "O" means if the 186 message should be delivered in order (1) or not (0). A message to be 187 delivered in order requires that all previous messages must be either 188 delivered or dropped. The rest 29 bits is the message number, similar 189 to packet sequence number (but independent). A UDT message may 190 contain multiple UDT packets. 192 Following are the 32-bit time stamp when the packet is sent and the 193 destination socket ID. The time stamp is a relative value starting 194 UDT April 12, 2010 196 from the time when the connection is set up. The time stamp 197 information is not required by UDT or its native control algorithm. 198 It is included only in case that a user defined control algorithm may 199 require the information (See Section 6). 201 The Destination ID is used for UDP multiplexer. Multiple UDT socket 202 can be bound on the same UDP port and this UDT socket ID is used to 203 differentiate the UDT connections. 205 If the flag bit of a UDT packet is 1, then it is a control packet and 206 parsed according to the following structure. 208 0 1 2 3 209 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 210 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 211 |1| Type | Reserved | 212 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 213 | | Additional Info | 214 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 215 | Time Stamp | 216 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 217 | Destination Socket ID | 218 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 219 | | 220 ~ Control Information Field ~ 221 | | 222 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 224 There are 8 types of control packets in UDT and the type information 225 is put in bit field 1 - 15 of the header. The contents of the 226 following fields depend on the packet type. The first 128 bits must 227 exist in the packet header, whereas there may be an empty control 228 information field, depending on the packet type. 230 Particularly, UDT uses sub-sequencing for ACK packet. Each ACK packet 231 is assigned a unique increasing 16-bit sequence number, which is 232 independent of the data packet sequence number. The ACK sequence 233 number uses bits 32 - 63 ("Additional Info") in the control packet 234 header. The ACK sequence number ranges from 0 to (2^31 - 1). 236 TYPE 0x0: Protocol Connection Handshake 237 Additional Info: Undefined 238 Control Info: 239 1) 32 bits: UDT version 240 2) 32 bits: Socket Type (STREAM or DGRAM) 241 3) 32 bits: initial packet sequence number 242 4) 32 bits: maximum packet size (including UDP/IP headers) 243 5) 32 bits: maximum flow window size 244 6) 32 bits: connection type (regular or rendezvous) 245 UDT April 12, 2010 247 7) 32 bits: socket ID 248 8) 32 bits: SYN cookie 249 9) 128 bits: the IP address of the peer's UDP socket 251 TYPE 0x1: Keep-alive 252 Additional Info: Undefined 253 Control Info: None 255 TYPE 0x2: Acknowledgement (ACK) 256 Additional Info: ACK sequence number 257 Control Info: 258 1) 32 bits: The packet sequence number to which all the 259 previous packets have been received (excluding) 260 [The following fields are optional] 261 2) 32 bits: RTT (in microseconds) 262 3) 32 bits: RTT variance 263 4) 32 bits: Available buffer size (in bytes) 264 5) 32 bits: Packets receiving rate (in number of packets 265 per second) 266 6) 32 bits: Estimated link capacity (in number of packets 267 per second) 269 TYPE 0x3: Negative Acknowledgement (NAK) 270 Additional Info: Undefined 271 Control Info: 272 1) 32 bits integer array of compressed loss information 273 (see section 3.9). 275 TYPE 0x4: Unused 277 TYPE 0x5: Shutdown 278 Additional Info: Undefined 279 Control Info: None 281 TYPE 0x6: Acknowledgement of Acknowledgement (ACK2) 282 Additional Info: ACK sequence number 283 Control Info: None 285 TYPE 0x7: Message Drop Request: 286 Additional Info: Message ID 287 Control Info: 288 1) 32 bits: First sequence number in the message 289 2) 32 bits: Last sequence number in the message 291 TYPE 0x7FFF: Explained by bits 16 - 31, reserved for user defined 292 Control Packet 294 Finally, Time Stamp and Destination Socket ID also exist in the 295 control packets. 297 UDT April 12, 2010 299 3. 300 UDP Multiplexer 302 A UDP multiplexer is used to handle concurrent UDT connections 303 sharing the same UDP port. The multiplexer dispatch incoming UDT 304 packets to the corresponding UDT sockets according to the destination 305 socket ID in the packet header. 307 One multiplexer is used for all UDT connections bound to the same UDP 308 port. That is, UDT sockets on different UDP port will be handled by 309 different multiplexers. 311 A multiplexer maintains two queues. The sending queue includes the 312 sockets with at least one packet scheduled for sending. The UDT 313 sockets in the sending queue are ordered by the next packet sending 314 time. A high performance timer is maintained by the sending queue and 315 when it is time for the first socket in the queue to send its packet, 316 the packet will be sent and the socket will be removed. If there are 317 more packets for that socket to be sent, the socket will be re- 318 inserted to the queue. 320 The receiving queue reads incoming packets and dispatches them to the 321 corresponding sockets. If the destination ID is 0, the packet will be 322 sent to the listening socket (if there is any), or to a socket that 323 is in rendezvous connection phase. (See Section 5.) 325 Similar to the sending queue, the receiving queue also maintains a 326 list of sockets waiting for incoming packets. The receiving queue 327 scans the list to check if any timer expires for each socket every 328 SYN (SYN = 0.01 second, defined in Section 4). 330 4. 331 Timers 333 UDT uses four timers to trigger different periodical events. Each 334 event has its own period and they are all independent. They use the 335 system time as origins and should process wrapping if the system time 336 wraps. 338 For a certain periodical event E in UDT, suppose the time variable is 339 ET and its period is p. If E is set or reset at system time t0 (ET = 340 t0), then at any time t1, (t1 - ET >= p) is the condition to check if 341 E should be triggered. 343 The four timers are ACK, NAK, EXP and SND. SND is used in the sender 344 only for rate-based packet sending (see Section 6.1), whereas the 345 other three are used in the receiver only. 347 ACK is used to trigger an acknowledgement (ACK). Its period is set by 348 the congestion control module. However, UDT will send an ACK no 349 UDT April 12, 2010 351 longer than every 0.01 second, even though the congestion control 352 does not need timer-based ACK. Here, 0.01 second is defined as the 353 SYN time, or synchronization time, and it affects many of the other 354 timers used in UDT. 356 NAK is used to trigger a negative acknowledgement (NAK). Its period 357 is dynamically updated to 4 * RTT_+ RTTVar + SYN, where RTTVar is the 358 variance of RTT samples. 360 EXP is used to trigger data packets retransmission and maintain 361 connection status. Its period is dynamically updated to N * (4 * RTT 362 + RTTVar + SYN), where N is the number of continuous timeouts. To 363 avoid unnecessary timeout, a minimum threshold (e.g., 0.5 second) 364 should be used in the implementation. 366 The recommended granularity of their periods is microseconds. 367 However, accurate time keeping is not necessary, except for SND. 369 In the rest of this document, a name of a time variable will be used 370 to represent the associated event, the variable itself, or the value 371 of its period, depending on the context. For example, ACK can mean 372 either the ACK event or the value of ACK period. 374 5. 375 Connection Setup and shutdown 377 UDT supports two different connection setup methods, the traditional 378 client/server mode and the rendezvous mode. In the latter mode, both 379 UDT sockets connect to each other at (approximately) the same time. 381 The UDT client (in rendezvous mode, both peer are clients) sends a 382 handshake request (type 0 control packet) to the server or the peer 383 side. The handshake packet has the following information (suppose UDT 384 socket A sends this handshake to B): 385 1) UDT version: this value is for compatibility purpose. The 386 current version is 4. 387 2) Socket Type: STREAM (0) or DGRAM (1). 388 3) Initial Sequence Number: It is the sequence number for the first 389 data packet that A will send out. This should be a random value. 390 4) Packet Size: the maximum size of a data packet (including all 391 headers). This is usually the value of MTU. 392 5) Maximum Flow Window Size: This value may not be necessary; 393 however, it is needed in the current reference implementation. 394 6) Connection Type. This information is used to differential the 395 connection setup modes and request/response. 396 7) Socket ID. The client UDT socket ID. 397 8) Cookie. This is a cookie value used to avoid SYN flooding attack 398 [RFC4987]. 399 9) Peer IP address: B's IP address. 401 UDT April 12, 2010 403 5.1 404 Client/Server Connection Setup 406 One UDT entity starts first as the server (listener). The server 407 accepts and processes incoming connection request, and creates new 408 UDT socket for each new connection. 410 A client that wants to connect to the server will send a handshake 411 packet first. The client should keep on sending the handshake packet 412 every constant interval until it receives a response handshake from 413 the server or a timeout timer expires. 415 When the server first receives the connection request from a client, 416 it generates a cookie value according to the client address and a 417 secret key and sends it back to the client. The client must then send 418 back the same cookie to the server. 420 The server, when receiving a handshake packet and the correct cookie, 421 compares the packet size and maximum window size with its own values 422 and set its own values as the smaller ones. The result values are 423 also sent back to the client by a response handshake packet, together 424 with the server's version and initial sequence number. The server is 425 ready for sending/receiving data right after this step is finished. 426 However, it must send back response packet as long as it receives any 427 further handshakes from the same client. 429 The client can start sending/receiving data once it gets a response 430 handshake packet from the server. Further response handshake 431 messages, if received any, should be omitted. 433 The connection type from the client should be set to 1 and the 434 response from the server should be set to -1. 436 The client should also check if the response is from the server that 437 the original request was sent to. 439 5.2 440 Rendezvous Connection Setup 442 In this mode, both clients send a connect request to each other at 443 the same time. The initial connection type is set to 0. Once a peer 444 receives a connection request, it sends back a response. If the 445 connection type is 0, then the response sends back -1; if the 446 connection type is -1, then the response sends back -2; No response 447 will be sent for -2 request. 449 The rendezvous peer does the same check on the handshake messages 450 (version, packet size, window size, etc.) as described in Section 451 5.1. In addition, the peer only process the connection request from 452 the address it has sent a connection request to. Finally, rendezvous 453 connection should be rejected by a regular UDT server (listener). 455 UDT April 12, 2010 457 A peer initializes the connection when it receives -1 response. 459 The rendezvous connection setup is useful when both peers are behind 460 firewalls. It can also provide better security and usability when a 461 listening server is not desirable. 463 5.3 464 Shutdown 466 If one of the connected UDT entities is being closed, it will send a 467 shutdown message to the peer side. The peer side, after received this 468 message, will also be closed. This shutdown message, delivered using 469 UDP, is only sent once and not guaranteed to be received. If the 470 message is not received, the peer side will be closed after 16 471 continuous EXP timeout (see section 3.5). However, the total timeout 472 value should be between a minimum threshold and a maximum threshold. 473 In our reference implementation, we use 3 seconds and 30 seconds, 474 respectively. 476 6. 477 Data Sending and Receiving 479 Each UDT entity has two logical parts: the sender and the receiver. 480 The sender sends (and retransmits) application data according to the 481 flow control and congestion control. The receiver receives both data 482 packets and control packets, and sends out control packets according 483 to the received packets and the timers. 485 The receiver is responsible for triggering and processing all control 486 events, including congestion control and reliability control, and 487 their related mechanisms. 489 UDT always tries to pack application data into fixed size packets 490 (the maximum packet size negotiated during connection setup), unless 491 there is not enough data to be sent. 493 We explained the rationale of some of the UDT data sending/receiving 494 schemes in [GHG04b]. 496 6.1 497 The Sender's Algorithm 499 Data Structures and Variables: 501 1) Sender's Loss List: The sender's loss list is used to store the 502 sequence numbers of the lost packets fed back by the receiver 503 through NAK packets or inserted in a timeout event. The numbers 504 are stored in increasing order. 506 Data Sending Algorithm: 507 1) If the sender's loss list is not empty, retransmit the first 508 UDT April 12, 2010 510 packet in the list and remove it from the list. Go to 5). 511 2) In messaging mode, if the packets has been the loss list for a 512 time more than the application specified TTL (time-to-live), send 513 a message drop request and remove all related packets from the 514 loss list. Go to 1). 515 3) Wait until there is application data to be sent. 516 4) 517 a. If the number of unacknowledged packets exceeds the 518 flow/congestion window size, wait until an ACK comes. Go to 519 1). 520 b. Pack a new data packet and send it out. 521 5) If the sequence number of the current packet is 16n, where n is an 522 integer, go to 2). 523 6) Wait (SND - t) time, where SND is the inter-packet interval 524 updated by congestion control and t is the total time used by step 525 1 to step 5. Go to 1). 527 6.2 528 The Receiver's Algorithm 530 Data Structures and Variables: 532 1) Receiver's Loss List: It is a list of tuples whose values include: 533 the sequence numbers of detected lost data packets, the latest 534 feedback time of each tuple, and a parameter k that is the number 535 of times each one has been fed back in NAK. Values are stored in 536 the increasing order of packet sequence numbers. 537 2) ACK History Window: A circular array of each sent ACK and the time 538 it is sent out. The most recent value will overwrite the oldest 539 one if no more free space in the array. 540 3) PKT History Window: A circular array that records the arrival time 541 of each data packet. 542 4) Packet Pair Window: A circular array that records the time 543 interval between each probing packet pair. 544 5) LRSN: A variable to record the largest received data packet 545 sequence number. LRSN is initialized to the initial sequence 546 number minus 1. 547 6) ExpCount: A variable to record number of continuous EXP time-out 548 events. 550 Data Receiving Algorithm: 551 1) Query the system time to check if ACK, NAK, or EXP timer has 552 expired. If there is any, process the event (as described below 553 in this section) and reset the associated time variables. For 554 ACK, also check the ACK packet interval. 555 2) Start time bounded UDP receiving. If no packet arrives, go to 1). 556 1) Reset the ExpCount to 1. If there is no unacknowledged data 557 packet, or if this is an ACK or NAK control packet, reset the EXP 558 timer. 559 3) Check the flag bit of the packet header. If it is a control 560 UDT April 12, 2010 562 packet, process it according to its type and go to 1). 563 4) If the sequence number of the current data packet is 16n + 1, 564 where n is an integer, record the time interval between this 565 packet and the last data packet in the Packet Pair Window. 566 5) Record the packet arrival time in PKT History Window. 567 6) 568 a. If the sequence number of the current data packet is greater 569 than LRSN + 1, put all the sequence numbers between (but 570 excluding) these two values into the receiver's loss list and 571 send them to the sender in an NAK packet. 572 b. If the sequence number is less than LRSN, remove it from the 573 receiver's loss list. 574 7) Update LRSN. Go to 1). 576 ACK Event Processing: 577 1) Find the sequence number prior to which all the packets have been 578 received by the receiver (ACK number) according to the following 579 rule: if the receiver's loss list is empty, the ACK number is LRSN 580 + 1; otherwise it is the smallest sequence number in the 581 receiver's loss list. 582 2) If (a) the ACK number equals to the largest ACK number ever 583 acknowledged by ACK2, or (b) it is equal to the ACK number in the 584 last ACK and the time interval between this two ACK packets is 585 less than 2 RTTs, stop (do not send this ACK). 586 3) Assign this ACK a unique increasing ACK sequence number. Pack the 587 ACK packet with RTT, RTT Variance, and flow window size (available 588 receiver buffer size). If this ACK is not triggered by ACK timers, 589 send out this ACK and stop. 590 4) Calculate the packet arrival speed according to the following 591 algorithm: 592 Calculate the median value of the last 16 packet arrival 593 intervals (AI) using the values stored in PKT History Window. 594 In these 16 values, remove those either greater than AI*8 or 595 less than AI/8. If more than 8 values are left, calculate the 596 average of the left values AI', and the packet arrival speed is 597 1/AI' (number of packets per second). Otherwise, return 0. 598 5) Calculate the estimated link capacity according to the following 599 algorithm: 600 Calculate the median value of the last 16 packet pair 601 intervals (PI) using the values in Packet Pair Window, and the 602 link capacity is 1/PI (number of packets per second). 603 6) Pack the packet arrival speed and estimated link capacity into the 604 ACK packet and send it out. 605 7) Record the ACK sequence number, ACK number and the departure time 606 of this ACK in the ACK History Window. 608 NAK Event Processing: 609 Search the receiver's loss list, find out all those sequence numbers 610 whose last feedback time is k*RTT before, where k is initialized as 2 611 UDT April 12, 2010 613 and increased by 1 each time the number is fed back. Compress 614 (according to section 6.4) and send these numbers back to the sender 615 in an NAK packet. 617 EXP Event Processing: 618 1) Put all the unacknowledged packets into the sender's loss list. 619 2) If (ExpCount > 16) and at least 3 seconds has elapsed since that 620 last time when ExpCount is reset to 1, or, 3 minutes has elapsed, 621 close the UDT connection and exit. 622 3) If the sender's loss list is empty, send a keep-alive packet to 623 the peer side. 624 4) Increase ExpCount by 1. 626 On ACK packet received: 627 1) Update the largest acknowledged sequence number. 628 2) Send back an ACK2 with the same ACK sequence number in this ACK. 629 3) Update RTT and RTTVar. 630 4) Update both ACK and NAK period to 4 * RTT + RTTVar + SYN. 631 5) Update flow window size. 632 6) If this is a Light ACK, stop. 633 7) Update packet arrival rate: A = (A * 7 + a) / 8, where a is the 634 value carried in the ACK. 635 8) Update estimated link capacity: B = (B * 7 + b) / 8, where b is 636 the value carried in the ACK. 637 9) Update sender's buffer (by releasing the buffer that has been 638 acknowledged). 639 10) Update sender's loss list (by removing all those that has been 640 acknowledged). 642 On NAK packet received: 643 1) Add all sequence numbers carried in the NAK into the sender's loss 644 list. 645 2) Update the SND period by rate control (see section 3.6). 646 3) Reset the EXP time variable. 648 On ACK2 packet received: 649 1) Locate the related ACK in the ACK History Window according to the 650 ACK sequence number in this ACK2. 651 2) Update the largest ACK number ever been acknowledged. 652 3) Calculate new rtt according to the ACK2 arrival time and the ACK 653 departure time, and update the RTT value as: RTT = (RTT * 7 + 654 rtt) / 8. 655 4) Update RTTVar by: RTTVar = (RTTVar * 3 + abs(RTT - rtt)) / 4. 656 5) Update both ACK and NAK period to 4 * RTT + RTTVar + SYN. 658 On message drop request received: 659 1) Tag all packets belong to the message in the receiver buffer so 660 that they will not be read. 661 2) Remove all corresponding packets in the receiver's loss list. 663 UDT April 12, 2010 665 On Keep-alive packet received: 666 Do nothing. 668 On Handshake/Shutdown packet received: 669 See Section 5. 671 6.3 672 Flow Control 674 The flow control window size is 16 initially. 676 On ACK packet received: 677 The flow window size is updated to the receiver's available buffer 678 size. 680 6.4 681 Loss Information Compression Scheme 683 The loss information carried in an NAK packet is an array of 32-bit 684 integers. If an integer in the array is a normal sequence number (1st 685 bit is 0), it means that the packet with this sequence number is 686 lost; if the 1st bit is 1, it means all the packets starting from 687 (including) this number to (including) the next number in the array 688 (whose 1st bit must be 0) are lost. 690 For example, the following information carried in an NAK: 691 0x00000002, 0x80000006, 0x0000000B, 0x0000000E 692 means packets with sequence number 2, 6, 7, 8, 9, 10, 11, and 14 are 693 lost. 695 7. 696 Configurable Congestion Control (CCC) 698 The congestion control in UDT is an open framework so that user- 699 defined control algorithm can be easily implemented and switched. 700 Particularly, the native control algorithm is also implemented by 701 this framework. 703 The user-defined algorithm may redefine several control routines to 704 read and adjust several UDT parameters. The routines will be called 705 when certain event occurs. For example, when an ACK is received, the 706 control algorithm may increase the congestion window size. 708 7.1 709 CCC Interface 711 UDT allow users to access two congestion control parameters: the 712 congestion window size and the inter-packet sending interval. Users 713 may adjust these two parameters to realize window-based control, 714 rate-based control, or a hybrid approach. 716 In addition, the following parameters should also be exposed. 718 UDT April 12, 2010 720 1) RTT 721 2) Maximum Segment/Packet Size 722 3) Estimated Bandwidth 723 4) The latest packet sequence number that has been sent so far 724 5) Packet arriving rate at the receiver side 726 A UDT implementation may expose additional parameters as well. This 727 information can be used in user-defined congestion control algorithms 728 to adjust the packet sending rate. 730 The following control events can be redefined via CCC (e.g., by a 731 callback function). 733 1) init: when the UDT socket is connected. 734 2) close: when the UDT socket is closed. 735 3) onACK: when ACK is received. 736 4) onLOSS: when NACK is received. 737 5) onTimeout: when timeout occurs. 738 6) onPktSent: when a data packet is sent. 739 7) onPktRecv: when a data packet is received. 741 Users can also adjust the following parameters in the user-defined 742 control algorithms. 744 1) ACK interval: An ACK may be sent every fixed number of packets. 745 User may define this interval. If this value is -1, then it means 746 no ACK will be sent based on packet interval. 747 2) ACK Timer: An ACK will also be sent every fixed time interval. 748 This is mandatory in UDT. The maximum and default ACK time 749 interval is SYN. 750 3) RTO: UDT uses 4 * RTT + RTTVar to compute RTO. Users may redefine 751 this. 753 Detailed description and discussion of UDT/CCC can be found in 754 [GG05]. 756 7.2 757 UDT's Native Control Algorithm 759 UDT has a native and default control algorithm, which will be used if 760 no user-defined algorithm is implemented and configured. The native 761 UDT algorithm should be implemented using CCC. 763 UDT's native algorithm is a hybrid congestion control algorithm, 764 hence it adjusts both the congestion window size and the inter-packet 765 interval. The native algorithm uses timer-based ACK and the ACK 766 interval is SYN. 768 The initial congestion window size is 16 packets and the initial 769 inter-packet interval is 0. The algorithm start with Slow Start phase 770 UDT April 12, 2010 772 until the first ACK or NAK arrives. 774 On ACK packet received: 775 1) If the current status is in the slow start phase, set the 776 congestion window size to the product of packet arrival rate and 777 (RTT + SYN). Slow Start ends. Stop. 778 2) Set the congestion window size (CWND) to: CWND = A * (RTT + SYN) + 779 16. 780 3) The number of sent packets to be increased in the next SYN period 781 (inc) is calculated as: 782 if (B <= C) 783 inc = 1/PS; 784 else 785 inc = max(10^(ceil(log10((B-C)*PS*8))) * Beta/PS, 1/PS); 786 where B is the estimated link capacity and C is the current 787 sending speed. All are counted as packets per second. PS is the 788 fixed size of UDT packet counted in bytes. Beta is a constant 789 value of 0.0000015. 790 4) The SND period is updated as: 791 SND = (SND * SYN) / (SND * inc + SYN). 793 These four parameters are used in rate decrease, and their initial 794 values are in the parentheses: AvgNAKNum (1), NAKCount (1), 795 DecCount(1), LastDecSeq (initial sequence number - 1). 797 We define a congestion period as the period between two NAKs in which 798 the first biggest lost packet sequence number is greater than the 799 LastDecSeq, which is the biggest sequence number when last time the 800 packet sending rate is decreased. 802 AvgNAKNum is the average number of NAKs in a congestion period. 803 NAKCount is the current number of NAKs in the current period. 805 On NAK packet received: 806 1) If it is in slow start phase, set inter-packet interval to 807 1/recvrate. Slow start ends. Stop. 808 2) If this NAK starts a new congestion period, increase inter-packet 809 interval (snd) to snd = snd * 1.125; Update AvgNAKNum, reset 810 NAKCount to 1, and compute DecRandom to a random (average 811 distribution) number between 1 and AvgNAKNum. Update LastDecSeq. 812 Stop. 813 3) If DecCount <= 5, and NAKCount == DecCount * DecRandom: 814 a. Update SND period: SND = SND * 1.125; 815 b. Increase DecCount by 1; 816 c. Record the current largest sent sequence number (LastDecSeq). 818 The native UDT control algorithm is designed for bulk data transfer 819 over high BDP networks. [GHG04a] 820 UDT April 12, 2010 822 Security Considerations 824 UDT's security mechanism is similar to that of TCP. Most of TCP's 825 approach to counter security attack should also be implemented in 826 UDT. 828 IANA Considerations 830 This document has no actions for IANA. 832 Normative References 834 [RFC768] J. Postel, User Datagram Protocol, Aug. 1980. 836 Informative References 838 [RFC4987] W. Eddy, TCP SYN Flooding Attacks and Common Mitigations. 840 [GG07] Yunhong Gu and Robert L. Grossman, UDT: UDP-based Data 841 Transfer for High-Speed Wide Area Networks, Computer Networks 842 (Elsevier). Volume 51, Issue 7. May 2007. 844 [GG05] Yunhong Gu and Robert L. Grossman, Supporting Configurable 845 Congestion Control in Data Transport Services, SC 2005, Nov 12 - 846 18, Seattle, WA, USA. 848 [GHG04b] Yunhong Gu, Xinwei Hong, and Robert L. Grossman, Experiences 849 in Design and Implementation of a High Performance Transport 850 Protocol, SC 2004, Nov 6 - 12, Pittsburgh, PA, USA. 852 [GHG04a] Yunhong Gu, Xinwei Hong, and Robert L. Grossman, An Analysis 853 of AIMD Algorithms with Decreasing Increases, First Workshop on 854 Networks for Grid Applications (Gridnets 2004), Oct. 29, San Jose, 855 CA, USA. 857 [LM97] T. V. Lakshman and U. Madhow, The Performance of TCP/IP for 858 Networks with High Bandwidth-Delay Products and Random Loss, 859 IEEE/ACM Trans. on Networking, vol. 5 no 3, July 1997, pp. 336- 860 350. 862 [RFC5681] Allman, M., Paxson, V. and E. Blanton, TCP Congestion 863 Control, September 2009. 865 [RFC4960] R. Stewart, Ed. Stream Control Transmission Protocol. 866 September 2007. 868 [TS06] K. Tan, Jingmin Song, Qian Zhang, Murari Sridharan, A Compound 869 TCP Approach for High-speed and Long Distance Networks, in IEEE 870 Infocom, April 2006, Barcelona, Spain. 872 UDT April 12, 2010 874 [UDT] UDT: UDP-based Data Transfer, URL http://udt.sf.net. 876 [XHR04] Lisong Xu, Khaled Harfoush, and Injong Rhee, Binary Increase 877 Congestion Control for Fast Long-Distance Networks, INFOCOM 2004. 879 Author's Addresses 881 Yunhong Gu 882 National Center for Data Mining 883 University of Illinois at Chicago 884 713 SEO, M/C 249, 851 S Morgan St 885 Chicago, IL 60607, USA 886 Phone: +1 (312) 413-9576 887 Email: yunhong@lac.uic.edu