Transport Area Working Group                                  S. Bailey
Internet-draft                                                Sandburst
Expires: January 2001                                      J. Pinkerton
                                                              Microsoft
                                                         C. Sapuntzakis
                                                                  Cisco
                                                             M. Wakeley
                                                                Agilent
                                                               J. Wendt
                                                                     HP
                                                            J. Williams
                                                                 Emulex

                        ULP Framing for TCP
                 draft-ietf-tsvwg-tcp-ulp-frame-00

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2001). All Rights Reserved.

Abstract

   The framing protocol accepts PDUs from a ULP (upper level protocol)
   and transports them over a TCP connection. This is done in such a
   way that the PDUs can be recovered at the receiver even if
   preceding TCP segments have not yet been received. This is useful
   when the PDUs are self describing within the context of a protocol
   TCP connection.
   In this case, the framing protocol allows incoming
   packets to be parsed (but not processed) in the order received and
   their data to be placed directly in the ultimate destination memory
   instead of TCP reassembly buffers.

Table Of Contents

   1. Introduction
   2. Theory Of Operation
   3. ULP Support For Framing
   4. Negotiating Use Of The Framing Protocol
   5. PDU Alignment Mode
      5.1. Framing-aware TCP
      5.2. PDU Alignment Mode Exception Cases
      5.3. Validity Of Framing-aware TCP Segmentation
      5.4. Receiving In PDU Alignment Mode
   6. Marker Mode
   7. Security Considerations
      7.1. Security Protocol Interactions
      7.2. Using IPSec With The Framing Protocol
      7.3. Using TLS With The Framing Protocol
         7.3.1. Using TLS In PDU Alignment Mode
         7.3.2. Using TLS In Marker Mode
      7.4. Other Security Considerations
   8. IANA Considerations
   9. References
   Authors' Addresses
   A. Sockets Support For The Framing Protocol
      A.1 Enabling The Framing Protocol
      A.2 Sending Data Atomically
      A.3 Retrieving The Current EMSS
      A.4 Disabling ULP PDU Packing
      A.5 Enabling Emergency Mode
      A.6 Setting The Sending Marker Interval
      A.7 Setting The Receiving Marker Interval
   Full Copyright Statement

1. Introduction

   Many upper layer protocols (ULPs), particularly those which perform
   bulk data transfer, permit the final location of transferred data
   (e.g. a ULP client buffer) to be known when the data is received.
   The information required to compute the final location of such data
   is contained in local protocol state and ULP protocol data unit
   (PDU) headers. In this case, ULP data can be placed directly at
   its final destination by a network interface with knowledge of the
   ULP. A direct placement network interface can offer extremely high
   performance since the host CPU does not copy the data at all, and
   the data only crosses system buses once.

   Both specific application ULPs, such as iSCSI, and generic hardware
   acceleration ULPs, such as an RDMA protocol, offer the potential
   for direct data placement. The advantage of using a generic
   acceleration ULP for direct data placement is that the same direct
   placement network interface can be used to accelerate many
   different application protocols (e.g. iSCSI on RDMA).

   PDU shall mean ULP PDU for the remainder of the document unless
   otherwise indicated.

   TCP specifies that the ULP is notified of the delivery of octets in
   the order in which they are presented to the sender. Many ULPs
   rely on this sequencing guarantee. While notification from TCP is
   required to be in-order, this does not prohibit arbitrary placement
   of TCP data received in any order. Even if data for a ULP is
   placed out-of-order, the ULP may still only be notified of such
   data in-order, in accordance with TCP semantics.
   In other words,
   direct data placement based upon ULP information is not at odds
   with TCP's stream-orientation, but rather is a natural application
   of TCP's philosophy that ULP PDU framing be performed at the layer
   above TCP. RFC 879 also points out in its discussion of layering
   and modularity that this type of behavior is completely in harmony
   with layered protocol design [RFC0879].

   Packet delay, loss and reordering are expected, common occurrences
   in IP networks. Traditionally, data in TCP segments is placed in
   an intermediate reassembly buffer to restore the sending order
   which may have been lost as a result of segment delay, loss or
   reordering. While it is possible for a direct placement network
   interface to implement a complete reassembly buffer, the cost of
   doing so is prohibitive. Such a reassembly buffer would need to
   have a size equal to the sum of the maximum window sizes of all
   active connections. On a fast network link (e.g. > 1 Gb/s), the
   window size for each connection can be very large, which would
   require a huge, very high speed reassembly buffer on the network
   interface.

   A way to find PDUs when previous PDU headers are in delayed, lost
   or reordered segments will permit data in these subsequent PDUs to
   be placed immediately by a direct placement network interface.
   This will reduce the buffer requirements for a direct placement
   network interface. Without such a mechanism, the data from
   subsequent PDUs must all be buffered in the adapter until all
   previous TCP segments are received. Initial discussion of this
   issue, and how it relates specifically to iSCSI can be found in an
   early iSCSI design team memo [Satran].

   This document specifies a protocol with two modes for efficiently
   finding PDUs in the presence of lost, delayed or reordered TCP
   segments.

2. Theory Of Operation

   One very efficient way to guarantee that subsequent PDUs can always
   be found when a previous PDU header has been lost is to ensure each
   TCP segment begins with a PDU and contains an integral number of
   PDUs. In this case, the data in each TCP segment may be placed
   independently of all other segments. No reassembly buffer is
   required at all. Guaranteeing a TCP segment begins with a PDU
   requires a modification to TCP's sending behavior. This document
   defines the behavior of a TCP with a modified sender behavior,
   called a `framing-aware TCP'. A framing-aware TCP allows a ULP
   implementation to ensure that each TCP segment begins with a PDU.
   A framing-aware TCP is fully compliant with all RFCs governing TCP
   and fully interoperable with existing, compliant, non-framing-aware
   TCP implementations. When the framing protocol can use a
   framing-aware TCP, it operates in `PDU alignment mode'. The framing
   protocol in PDU alignment mode uses a combination of a
   framing-aware TCP and an encapsulation of PDUs to permit error free
   PDU location when TCP segments are lost.

   Another way to locate PDUs in the presence of lost TCP segments is
   to insert markers at a known period in the TCP octet stream. Each
   marker points to the beginning of the next PDU. If the marker
   frequency is high relative to packet loss rate (e.g. once per TCP
   segment), the receiver can, with very high likelihood, learn the
   location of the next PDU from a marker even when a previous PDU
   header has been lost. The receiver must still buffer the octets
   between the lost TCP segment and the subsequent PDU, but this is
   likely to be a much smaller buffer than the maximum TCP window
   size. By limiting the maximum PDU size, the receiver buffering can
   be reasonably bounded. This document defines a periodic marker
   mechanism which can be used to bound receiver reassembly buffers.

   Two framing protocol modes are defined because of the substantial
   tradeoff between the modes. Both modes can bound the reassembly
   buffer on a direct placement network interface, but the modes apply
   in disjoint circumstances.

   Marker mode has the following advantage:

   1. Implementable without TCP sender modification

   The PDU alignment mode has the following advantages:

   1. No reassembly buffering required at all

   2. Placement information is always at the start of a TCP segment,
      substantially simplifying hardware processing

   PDU alignment mode is more powerful, and is preferable when
   available. Marker mode still requires some high-speed reassembly
   memory, whose size is a linear function of the number of active TCP
   connections. Furthermore, marker mode only offers a probabilistic
   bound on the reassembly buffer size per active TCP connection. In
   cases where many TCP segments with PDU headers are lost, the buffer
   size required for direct placement could approach that of a
   complete reassembly buffer.

   It is expected that ultimately PDU alignment mode will dominate
   because of compelling cost and performance scalability advantages.
   However, until framing-aware TCPs are ubiquitous, marker mode
   offers an alternative for use with an unmodified TCP
   implementation. To make transition from marker mode to PDU
   alignment mode easy, the sockets API extension defined in Appendix
   A supports both modes relatively transparently. A ULP which
   implements the behavior required for PDU alignment mode can use
   marker mode without modification.

   Framing protocol receivers MAY implement either PDU alignment mode,
   or marker mode, or both. Framing protocol senders MUST implement
   marker mode, and MUST implement PDU alignment mode if the
   underlying TCP is framing-aware.

3. ULP Support For Framing

   A ULP using the framing protocol will submit each complete PDU to
   the framing module in a single sending operation. This behavior is
   already common practice for most ULP implementations.

   When the framing protocol is in PDU alignment mode, each PDU
   submitted is limited to the smaller of 2^16-8 (65528) and the size
   that will fit entirely within a TCP segment. The framing protocol
   in PDU alignment mode MUST fail any attempt to submit a PDU that is
   larger than will fit with an 8-byte framing header in a TCP
   segment.

   The TCP maximum segment size (MSS) is defined in RFC 793 [TCP] as
   the segment size exchanged on TCP connection establishment. In
   addition, there is the segment size presently used by TCP which is
   less than or equal to the exchanged MSS, adjusted by the current
   path MTU [PathMTU]. This document calls the MSS presently in use
   the `effective maximum segment size' (EMSS). The EMSS is of
   primary concern to the framing protocol in PDU alignment mode.

   The TCP EMSS can shrink to 8 octets [PathMTU], which leaves no room
   for a PDU in PDU alignment mode. If the EMSS goes below 512 octets,
   the ULP MAY instruct the framing protocol to enter an "emergency
   mode." In this mode, the framing module MUST accept PDUs up to 512
   octets and MAY fragment a PDU across TCP segments.

   The EMSS may change during the course of the connection. The
   framing module in PDU alignment mode MUST notify the ULP sender of
   changes in the EMSS. The framing module in PDU alignment mode MUST
   provide the current value of the path EMSS to the ULP on request.

   When the framing protocol is in marker mode, each PDU submitted is
   limited to 2^16-8 minus the size of all interspersed markers. The
   framing protocol in marker mode MUST fail any attempt to submit a
   PDU larger than this limit.
   The framing module MAY impose a
   smaller, implementation-specific size limit on PDUs. In order to
   effectively bound the receiver's reassembly buffer size, the ULP
   SHOULD submit PDUs limited in size by some appropriate function of
   the receiver's reassembly buffer resources, but no specific limit
   is imposed by the framing protocol.

4. Negotiating Use Of The Framing Protocol

   Negotiating use of the framing protocol is the responsibility of
   the ULP. The use of the framing protocol MAY be negotiated
   separately for each direction on a particular connection. The
   negotiation procedure MUST ensure that when receive framing is
   enabled, the remote peer will not transmit the first TCP segment
   with framed data until it is certain that the local peer has
   actually enabled receive framing.

   If a receiver requests PDU alignment mode, and the sender supports
   PDU alignment mode, then the sender MUST enable PDU alignment mode.
   This ensures that PDU alignment mode, with its favorable hardware
   characteristics, is used when possible.

   The specific negotiation mechanism for enabling the framing
   protocol and choosing the framing mode is outside the scope of this
   document. However, note that framing protocol behavior is
   requested by the receiver and offered by the sender. Negotiation
   will probably include exchange of:

   1. the receiver's desired mode(s)

   2. the sender's framing key if PDU alignment mode is selected

   3. ULP packing behavior if PDU alignment mode is selected

   4. the receiver's desired marker period if marker mode is
      selected

   5. the receiver's desired maximum PDU size if marker mode is
      selected

5. PDU Alignment Mode

   The framing protocol in PDU alignment mode sends one or more
   complete ULP PDUs preceded by a framing header. This framing
   header and set of ULP PDUs is called a `framing PDU'.
   The framing
   protocol in PDU alignment mode is supported by a framing-aware TCP
   whose behavior is described in `Framing-Aware TCP', below.

   The format of a framing PDU is as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            Length             |              Key              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              Key                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                                                               |
   ~                                                               ~
   ~                           ULP PDUs                            ~
   |                                                               |
   |                                                               |
   |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           ULP PDUs            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The "Length" field is 16 bits and contains the length in octets of
   the set of framed ULP PDUs, excluding the framing header.

   The "Key" field is 48 bits and is selected at random by the sender,
   and signalled to the receiver in a ULP-specified way. All framing
   PDUs sent on the same connection in the same direction must use the
   same key value. A good quality random number generator MUST be
   used to generate the initial key. RFC 1750 discusses relevant
   characteristics and provides references for good quality random
   number generation [RFC1750].

   The length of the framing PDU in octets will be 8 + L, where L is
   the length of the set of framed ULP PDUs.

   Whether more than one ULP PDU may be packed into a single framing
   PDU is a controllable option of the framing module in PDU alignment
   mode. Some receivers may choose to expect exactly one ULP PDU per
   TCP segment when framing is behaving nominally. The sender MUST
   NOT pack more than one ULP PDU into a framing PDU if this behavior
   is desired by the receiver. ULP packing behavior may be negotiated
   or specified a priori by the ULP.

5.1. Framing-aware TCP

   A framing-aware TCP SHALL send one complete framing PDU per TCP
   segment whenever possible.
   Cases when it may not be possible to
   send a complete framing PDU in each TCP segment are described in
   `PDU Alignment Mode Exception Cases', below.

   A framing-aware TCP MUST NOT send any TCP segment containing octets
   from more than one sending operation. In other words, the boundary
   between data of consecutive sending operations MUST occur between
   TCP segments. By following this rule, the sender guarantees that
   in the event an exception causes PDU alignment to be lost
   temporarily, it will be regained as soon as possible.

   The use of oversize TCP segments sent by means of IP fragmentation
   is discouraged due to the limited size of the IP header
   Identification field and the potential for undetected errors due to
   wrapping of the Identification value. Framing-aware TCP
   implementations SHOULD resegment at the TCP layer according to the
   rule given in the previous paragraph when necessary to meet
   requirements of the current maximum segment size for a path. In
   this document, EMSS means the current TCP maximum segment size used
   for sending segments on a connection, which is initially negotiated
   during the connection handshake, and subsequently adjusted by path
   maximum transmission unit (PMTU) discovery behavior [PathMTU].

   A framing-aware TCP must notify the framing module of changes in
   the EMSS. The framing module must be able to retrieve the EMSS
   from the framing-aware TCP.

   If the framing-aware TCP chooses to probe for path MTU increase
   using a TCP segment larger than the path MTU, the framing-aware TCP
   MUST report an appropriate EMSS increase. The candidate path MTU
   will only be probed when the framing protocol submits a framing PDU
   larger than the current EMSS. Immediately following the probing
   segment, the framing-aware TCP MUST reduce EMSS to its previous
   value until the candidate path MTU is confirmed.
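
   As a non-normative illustration, the framing PDU layout defined in
   Section 5 (a 16-bit Length followed by a 48-bit Key) and the
   per-segment check that a receiver applies to it might be sketched
   as follows. This is only a sketch under stated assumptions: the
   function names are hypothetical, and network byte order is assumed
   for both header fields because this document does not state a byte
   order.

```python
import secrets
import struct

HEADER_LEN = 8            # 16-bit Length + 48-bit Key
MAX_PAYLOAD = 2**16 - 8   # 65528, the PDU size limit given in Section 3


def new_framing_key() -> int:
    """Pick a random 48-bit key using a cryptographic-quality generator."""
    return secrets.randbits(48)


def build_framing_pdu(key: int, ulp_pdus: bytes) -> bytes:
    """Prefix a set of ULP PDUs with the 8-octet framing header."""
    if not 0 <= key < 2**48:
        raise ValueError("key must fit in 48 bits")
    if len(ulp_pdus) > MAX_PAYLOAD:
        raise ValueError("ULP PDUs exceed the 2^16-8 limit")
    # Assumption: both fields are sent in network byte order (big-endian).
    return struct.pack("!H", len(ulp_pdus)) + key.to_bytes(6, "big") + ulp_pdus


def is_aligned_framing_pdu(segment: bytes, expected_key: int) -> bool:
    """Return True if a TCP segment payload is exactly one framing PDU.

    The payload qualifies when its Length field equals the payload
    length minus 8 and its Key field matches the connection's key.
    """
    if len(segment) < HEADER_LEN:
        return False
    (length,) = struct.unpack("!H", segment[:2])
    key = int.from_bytes(segment[2:8], "big")
    return length == len(segment) - HEADER_LEN and key == expected_key
```

   A sender would call build_framing_pdu() once per sending operation
   and hand the result to the framing-aware TCP as a single segment;
   a receiver would apply is_aligned_framing_pdu() only to segments
   that standard TCP processing has already accepted as in-window.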

   Probing for path MTU increase is optional [PathMTU], and a
   framing-aware TCP might elect not to do so unless the EMSS becomes
   `inconveniently' small. By not probing for path MTU increase when
   the current EMSS provides adequate performance, the framing
   protocol will not send the potentially unaligned PDUs that would be
   used to probe path MTU.

   Although framing-aware TCP is defined specifically to support the
   framing protocol in PDU alignment mode, it may be used by other
   clients, assuming framing validation is provided by some means.
   For example, as discussed below in `Security Considerations', a
   framing-aware TLS could use a framing-aware TCP directly without
   adding framing PDU headers, because TLS validation can serve the
   same purpose, and actually provides stronger framing validation
   guarantees than a framing PDU header.

5.2. PDU Alignment Mode Exception Cases

   Although the framing-aware TCP sender should place exactly one
   framing PDU in each TCP segment, there are exceptions when this is
   not possible. These exceptions include the following.

   1. The connection is in emergency mode and EMSS is less than 512
      octets.

   2. The EMSS has been reduced. This will result in a window
      during which the ULP is not yet aware of the reduced EMSS.
      Since some framing PDUs may already have been sent and
      possibly lost prior to being received, the same framing PDUs
      must be resent, if necessary, but in smaller TCP segments
      which conform to the new EMSS.

   3. The remote end is advertising a window smaller than the EMSS.
      If both ends manage their window as required in RFC 1122
      [RFC1122], and a reasonable amount of receive buffering is
      available, this case should not occur, but the sender, for
      robustness, must tolerate this.

   4. The sender is probing an advertised window of zero.

   5. The sender is probing to determine if the path MTU can be
      increased.

   In addition, there is another case in which the receiver will
   receive framing PDUs which are not aligned with TCP segments.

   6. There is a middle-box in the connection which is resegmenting
      the TCP data stream.

   If the framing protocol in PDU alignment mode must send an
   unaligned framing PDU, it SHALL take one of the following actions.

   1. Send the framing PDU as a single TCP segment using IP
      fragmentation. While this behavior is discouraged, it is not
      prohibited by the framing protocol, or any other applicable
      RFCs.

   2. Send the framing PDU as several TCP segments, with each
      segment guaranteed not to appear as a well-formed, complete
      framing PDU on its own, at the time the segment is sent. That
      is, the sender SHALL ensure that one of the following is true
      for every segment with a partial framing PDU:

      A. octets 0-1 do not equal the segment length minus 8

      B. octets 2-7 do not match the framing key value

      C. the total segment length is less than the framing PDU
         header of 8 octets

   These mechanisms ensure that the receiver will not misinterpret
   any piece of a framing PDU sent in several segments as
   a complete, valid framing PDU. However, if the TCP data stream is
   subjected to resegmenting by a middle-box, the sender may no longer
   control segmentation of received data. In this case the framing
   protocol must rely on probability to ensure that segments of the
   resegmented data stream will not appear as valid, complete framing
   PDUs, if they are not.

   In the case where the receiver detects a continuous stream of TCP
   segments which do not contain complete framing PDUs, the ULP SHOULD
   disable use of the framing protocol, or switch to marker mode if
   the ULP provides a means of doing this, and the end points so
   choose. Such a continuous stream of improperly framed TCP segments
   implies the presence of a resegmenting middle-box.
   Such a
   detection process SHOULD NOT mistake a temporary sequence of
   improperly framed TCP segments resulting from an EMSS change for
   the presence of a resegmenting middle-box.

5.3. Validity Of Framing-aware TCP Segmentation

   A framing-aware TCP normally sends exactly one framing PDU per TCP
   segment. This may therefore result in more segments being sent
   than would occur in a traditional TCP. However, the framing module
   is allowed to pack multiple ULP PDUs into a single framing PDU if
   ULP packing is enabled, which will give behavior approaching that
   of a traditional TCP. Even with ULP packing disabled, the behavior
   of a framing-aware TCP effectively corresponds to that of a
   traditional TCP sender with the Nagle algorithm disabled (i.e.
   TCP_NODELAY), and this is considered acceptable behavior.

   Framing-aware TCPs still respect congestion control windows, which
   are maintained as an octet count, not as a segment count.

   On retransmission, a framing-aware TCP respects the original stream
   segmentation. This is allowed by RFC1122 [RFC1122], section
   4.2.2.15.

5.4. Receiving In PDU Alignment Mode

   Because each framing PDU contains sufficient information to
   determine its length, the beginning of the next framing PDU can be
   determined. Therefore each successive PDU can be recovered.

   Conventional TCP implementations will pass received data to the ULP
   in order, so framing is easily recovered by the ULP.

   Special receive implementations which exploit PDU alignment mode,
   typically found in direct placement network interfaces, may allow
   the ULP to do direct data placement on TCP segments received out of
   order. The receiving end can safely assume that a framing PDU is
   exactly contained within TCP segment payload if the following
   conditions are met.

   1. Standard TCP processing indicates that this is a valid,
      in-window segment.

   2. The payload of the TCP segment, parsed as a framing PDU, has a
      length field which equals the TCP segment length minus 8, and
      a key field which matches the expected key for the framing
      protocol connection.

   The framing protocol passes the contained ULP PDUs to a ULP parser.
   The ULP parser performs direct placement for the PDUs. The ULP
   parser MUST NOT execute the ULP protocol (i.e. none of the ULP
   protocol state variables change), until all preceding octets in the
   TCP stream have also been received.

6. Marker Mode

   The framing protocol in marker mode inserts framing markers in the
   TCP octet stream at a period agreed upon by the framing protocol
   sender and receiver. Each framing marker points to the next PDU in
   the TCP octet stream. Marker insertion in the TCP octet stream is
   not synchronized in any way with the ULP. The ULP may use PDUs of
   any size up to 2^16-8-(4 * # of markers inserted) (determined by
   marker interval). Markers will be inserted in the resulting octet
   stream, possibly interrupting PDUs, as necessary to maintain the
   interval. Although the placement of each marker is not a function
   of the ULP PDU boundaries, the contents of each marker are.

   The format of a framing marker is as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Next PDU Offset        |        Next PDU Offset        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The "Next PDU Offset" contains the offset to the next PDU, in
   octets, from the end of the marker.

   The "Next PDU Offset" occurs twice in the marker to guarantee that
   when a marker is split across TCP segments, a complete copy of Next
   PDU Offset occurs in at least one of the two TCP segments.
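
   The duplicated offset can be illustrated with a short,
   non-normative sketch. It builds a 4-octet marker and shows how a
   receiver holding only one side of a split marker might still
   recover the offset. The function names are hypothetical, and
   network byte order is assumed for the 16-bit field because this
   document does not state a byte order.

```python
import struct
from typing import Optional

MARKER_LEN = 4  # two 16-bit copies of Next PDU Offset


def build_marker(next_pdu_offset: int) -> bytes:
    """Encode a framing marker holding two copies of the offset."""
    if not 0 <= next_pdu_offset < 2**16:
        raise ValueError("offset must fit in 16 bits")
    # Assumption: the field is sent in network byte order (big-endian).
    return struct.pack("!HH", next_pdu_offset, next_pdu_offset)


def recover_offset(head: Optional[bytes], tail: Optional[bytes]) -> Optional[int]:
    """Recover Next PDU Offset from whichever marker fragment arrived.

    head is the leading part of the marker (carried at the end of one
    TCP segment) and tail is the trailing part (carried at the start
    of the next). Pass None for a fragment whose segment is missing.
    """
    if head is not None and len(head) >= 2:
        return struct.unpack("!H", head[:2])[0]    # first copy is complete
    if tail is not None and len(tail) >= 2:
        return struct.unpack("!H", tail[-2:])[0]   # second copy is complete
    return None


def max_marker_mode_pdu(markers_inserted: int) -> int:
    """PDU size limit from the text: 2^16 - 8 - (4 * markers inserted)."""
    return 2**16 - 8 - MARKER_LEN * markers_inserted
```

   However the 4 octets are divided between two segments, one side
   always holds at least 2 contiguous octets from the start or from
   the end of the marker, which is why either segment alone is enough
   to learn the offset.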

   The framing protocol receiver must remove (or otherwise ignore) the
   periodic markers in the received TCP octet stream to reconstruct
   the PDUs from the sender.

   The first marker SHALL be sent in the TCP octet stream preceding
   any framed PDUs. This first marker will, necessarily, have a Next
   PDU Offset of 0. The first marker corresponds to the point in the
   TCP octet stream when the framing protocol is enabled.

7. Security Considerations

7.1. Security Protocol Interactions

   The ULP framing protocol may be layered on top of IPSec or TLS. A
   direct placement network interface which supports connections
   secured with IPSec or TLS must directly implement security protocol
   processing as well as framing and direct placement support.

7.2. Using IPSec With The Framing Protocol

   Since IPSec is designed to secure arbitrary IP packet streams,
   including streams where packets are lost, the framing protocol
   could run cleanly on top of IPSec without any change.

   Using IPSec end-to-end with the framing protocol in PDU alignment
   mode permits an optimization to the framing protocol. Because
   IPSec validation criteria guarantee that IP packets received are
   equivalent to the IP packets sent, it is not possible for an
   intermediary to resegment the TCP stream. If IP fragmentation
   (rather than resegmenting) is used to send committed data when the
   EMSS changes, the framing PDU validation header is not needed. In
   this case, a ULP may run directly on top of a framing-aware TCP.

7.3. Using TLS With The Framing Protocol

   Using TLS with the framing protocol is more complicated than using
   IPSec. The combination of TLS and the framing protocol must still
   provide a modest bound on reassembly buffer size to be useful.

   TLS is a record-oriented protocol. TLS records are PDUs just like
   those used by ULPs that permit direct placement.
As with other 594 ULPs, the only way to avoid a complete reassembly buffer is to be 595 able to find TLS PDUs in the presence of lost TCP segments. 596 Therefore, to permit direct placement of ULPs secured with TLS, TLS 597 should also be treated as a protocol which uses framing support. 599 Using the framing protocol with TLS requires modification of a TLS 600 implementation for the combination to perform effectively. 601 Essentially, a TLS implementation must become a client of the 602 framing protocol. 604 TLS provides an interface similar to that of TCP for sending protocol data. 605 Protocol data submitted to the TLS send interface may be coalesced 606 with other protocol data in a single TLS PDU, or it may be 607 segmented arbitrarily across more than one TLS PDU. For the 608 framing protocol to properly support direct placement with TLS, 609 a framing-aware TLS MUST provide a framing-aware interface to the 610 ULP similar to the one described in Appendix A. 612 This layering looks like: 614 Framing ULP client 615 | 616 V 617 TLS-capable framing module 618 | 619 V 620 Framing-aware TLS 621 | 622 V 623 Framing module 624 | 625 V 626 TCP (possibly framing-aware) 627 | 628 V 629 . . . 631 Although some framing information may be exposed in the clear when 632 running TLS on the framing protocol, this information does not add 633 to what is already available to an attacker. Framing only conveys 634 the location of TLS PDUs, which is already available in the clear. 636 Unfortunately, ciphers defined for use with TLS do not offer the 637 same independence of TLS PDUs that IPSec provides for IP datagrams. 638 For one thing, TLS supports the use of stream ciphers, which IPSec 639 does not. Stream ciphers typically have dependencies reaching far 640 back in the data stream for deciphering at the current point. 641 Therefore it is probably not appropriate to negotiate the use of a 642 stream cipher when securing the framing protocol.
644 Block ciphers defined for use with TLS have similar properties to 645 those defined for use with IPSec. Specifically, they all operate 646 in Cipher Block Chaining (CBC) mode. However, while IPSec provides 647 a CBC initialization vector for each IP datagram, TLS defines only 648 a single CBC initialization vector for use in the first block. All 649 subsequent blocks use the cipher-text of their predecessor. To 650 decipher the current TLS PDU, the final cipher-text block from the 651 previous TLS PDU must be available. Typically, block ciphers 652 defined for use with TLS have an 8-octet block size. This implies 653 that for ULP direct placement to be possible with TLS, data from a 654 preceding TCP segment may be needed, where it is not when using the 655 framing protocol without TLS. Note that if the preceding TCP 656 segment is missing, all cipher blocks within the current TCP 657 segment may still be processed except the first one (assuming the 658 bounds of the TLS PDU are known). 660 7.3.1. Using TLS In PDU Alignment Mode 662 To run the framing protocol on TLS in PDU alignment mode, 663 an integral number of TLS PDUs may be sent in each TCP segment the 664 same way ULP PDUs are sent in the absence of TLS. A framing-aware 665 TLS would use the framing-aware TCP. In this case, the role of the 666 framing PDU header in detecting unexpected modification of TCP 667 segmentation is subsumed by the strong integrity checks performed 668 on TLS PDUs. There is no need to encapsulate TLS PDUs in a framing 669 PDU. In fact, the vulnerability of the framing key to active 670 attack is eliminated by using TLS validation algorithms instead. 672 Use of a non-null TLS compression algorithm may interact badly with 673 a framing-aware TLS implementation. A TLS compression algorithm is 674 allowed to increase content length by up to 1024 octets, which may result 675 in the compressed TLS PDU no longer fitting within EMSS.
676 Therefore, only TLS compression algorithms which are known not to 677 increase content length, or to increase it only by a small, 678 manageable amount, should be selected. 680 The need to receive the previous TCP segment before completing TLS 681 processing of the current TCP segment means that using the framing 682 protocol in PDU alignment mode with TLS will require some high- 683 speed receive packet buffer memory. This defeats one of the 684 primary advantages of PDU alignment mode. Therefore, while it is 685 possible to use TLS to secure the framing protocol in PDU alignment 686 mode, IPSec would be a more appropriate choice for securing PDU 687 alignment mode connections because it does not require any 688 reassembly buffer memory. 690 7.3.2. Using TLS In Marker Mode 692 To use TLS on a framing protocol connection in marker mode, the TCP 693 stream must actually contain two independent sets of periodic 694 markers. Clear-text markers in the TLS PDU stream will permit TLS 695 PDUs to be found in the presence of lost TCP segments. Once a 696 portion of the original, clear-text TCP stream is recovered by TLS 697 processing, markers in the original octet stream are used to find 698 ULP PDUs and perform direct placement. 700 7.4. Other Security Considerations 701 The modification of the sender's TCP segmentation algorithm in PDU 702 alignment mode does not open any new attacks, since: 1) the 703 segmentation algorithm is not based on input from the network, and 2) 704 the segmentation algorithm may pack small ULP PDUs into a single 705 TCP segment so it does not open packet flooding attacks. 707 If an attacker can send an in-window TCP segment that is accepted, 708 on an unsecured framing protocol connection the attacker can 709 probably force the TCP receiver into a framing protocol exception 710 path, degrading service.
However, such an attacker can also place 711 arbitrary data into the stream, so merely forcing the receiver 712 onto an exception path is not a compelling attack. 714 8. IANA Considerations 716 If framing is enabled a priori for a ULP by connecting to a well- 717 known port, this well-known port would be registered for the framed 718 ULP with IANA. 720 9. References 722 [ALF] 723 D. D. Clark and D. L. Tennenhouse, "Architectural 724 considerations for a new generation of protocols," in SIGCOMM 725 Symposium on Communications Architectures and Protocols, 726 (Philadelphia, Pennsylvania), pp. 200--208, IEEE, Sept. 1990. 727 Computer Communications Review, Vol. 20(4), Sept. 1990. 729 [SOCKS] 730 Leech, M., and others, "SOCKS Protocol Version 5," RFC 1928, 731 April 1996 733 [RFC0879] 734 Postel, J., "TCP Maximum Segment Size And Related Topics", RFC 735 879, November 1983 737 [RFC1122] 738 Braden, R., ed., "Requirements for Internet Hosts -- 739 Communications Layers", RFC 1122, October 1989 741 [PathMTU] 742 Mogul, J., and Deering, S., "Path MTU Discovery", RFC 1191, 743 November 1990 745 [RFC1750] 746 Eastlake, D., Crocker, S., Schiller, J., "Randomness 747 Recommendations for Security", RFC 1750, December 1994 749 [RFC2581] 750 Allman, M. and others, "TCP Congestion Control," RFC 2581, 751 April 1999 753 [Stevens] 754 Stevens, W. Richard, "Unix Network Programming Volume 1," 755 Prentice Hall, 1998, ISBN 0-13-490012-X 757 [TCP] 758 Postel, J., "Transmission Control Protocol - DARPA Internet 759 Program Protocol Specification", RFC 793, September 1981 761 [TLS] 762 Dierks, T. and others, "The TLS Protocol, Version 1.0", RFC 763 2246, January 1999 765 [Satran] 766 Satran, J., "iSCSI - fragments, packets synchronization and 767 RDMA", http://www.haifa.il.ibm.com/satran/ips/iSCSI-RDMA- 768 memo.txt, July 2000.
770 Authors' Addresses 771 Stephen Bailey 772 Sandburst Corporation 773 600 Federal Street 774 Andover, MA 01810 775 USA 777 Phone: +1 978 689 1614 778 Email: steph@sandburst.com 780 Jim Pinkerton 781 Microsoft, Inc. 782 1 Microsoft Way 783 Redmond, WA 98052 784 USA 786 EMail: jpink@microsoft.com 788 Constantine Sapuntzakis 789 Cisco Systems 790 170 W Tasman Drive 791 San Jose, CA 95134 792 USA 794 Phone: +1 408 525 5497 795 EMail: csapuntz@cisco.com 797 Matt Wakeley 798 Agilent Technologies 799 1101 Creekside Ridge Drive 800 Suite 100, M/S RH21 801 Roseville, CA 95661 802 USA 804 Phone: +1 916 788 5670 805 EMail: matt_wakeley@agilent.com 806 Jim Wendt 807 Hewlett Packard Corporation 808 8000 Foothills Boulevard MS 5668 809 Roseville, CA 95747-5668 810 USA 812 Phone: +1 916 785 5198 813 EMail: jim_wendt@hp.com 815 Jim Williams 816 Emulex Corporation 817 580 Main Street 818 Bolton, MA 01740 819 US 821 Phone: +1 978 779 7224 822 EMail: jim.williams@emulex.com 824 Appendix A. Sockets Support For The Framing Protocol 826 The sockets support for the framing module takes the form of a set 827 of socket options which may be set or requested to enable the 828 appropriate behavior. 830 A socket may be in one of three modes in the send direction: 832 1. Framing-aware TCP mode. No data is added to the TCP octet 833 stream (neither framing PDUs nor markers), but each data 834 buffer presented in a sending operation is sent atomically as 835 a single TCP segment. This mode provides direct access to a 836 framing-aware TCP sender for purposes such as implementing a 837 framing-aware TLS. 839 2. Framing protocol PDU alignment sender mode. A framing PDU 840 header is added to data presented by an integral number of 841 sending operations, and the resulting framing PDU is sent 842 according to the rules of PDU alignment mode. 844 3. Framing protocol marker sender mode. 
Markers are inserted at 845 fixed intervals; each points to the octet past the current PDU 846 submitted by a sending operation. 848 A socket may be in one of two modes in the receive direction: 850 1. Framing protocol PDU alignment receiver mode. Framing PDUs 851 are expected in each TCP segment. 853 2. Framing protocol marker receiver mode. Markers are expected 854 at a fixed interval in the TCP stream. 856 Received TCP segments are processed as defined above. If a socket 857 receiving operation is used to retrieve received data (as opposed 858 to direct placement), framing PDU headers or markers are removed 859 before the data is returned. 861 A.1 Enabling The Framing Protocol 863 /* Pick one sending mode and one receiving mode */ 864 if (sendMode == ATOMIC) 865 mode = TCP_FRAMING_SEND_ATOMIC; 866 else if (sendMode == ALIGN) 867 mode = TCP_FRAMING_SEND_ALIGN; 868 else /* sendMode == MARKERS */ 869 mode = TCP_FRAMING_SEND_MARKERS; 871 if (recvMode == ALIGN) 872 mode |= TCP_FRAMING_RECV_ALIGN; 873 else /* recvMode == MARKERS */ 874 mode |= TCP_FRAMING_RECV_MARKERS; 876 setsockopt (s, SOL_TCP, TCP_FRAMING_MODE, &mode, 877 sizeof(mode)); 879 A framing module that does not support a requested mode MUST fail 880 the setsockopt call. Framing may be enabled on a socket before or 881 after it is connected, subject to the requirements of Section 2. 883 A.2 Sending Data Atomically 885 The standard socket sending operations, including send(), sendto(), 886 sendmsg(), writev(), and others, are used to send framed data units 887 (ULP PDUs) with the framing protocol. The EMSGSIZE error should be 888 returned if the buffer passed to the sending operation does not 889 satisfy the size requirements defined in the `ULP Support For 890 Framing' section above. 892 When the path EMSS increases, the TCP MAY return EMSGSIZE once to 893 inform the client of the change.
895 A.3 Retrieving The Current EMSS 897 socklen_t len = sizeof(emss); getsockopt (s, SOL_TCP, TCP_SEND_EMSS, &emss, &len); 899 This call returns the maximum segment size that can be submitted in 900 a sending operation without fragmentation. The number returned 901 depends upon the current socket sending mode. If the socket is in 902 framing protocol PDU alignment mode, the returned EMSS is 903 appropriately adjusted by the size of the framing header. The 904 number should not count any octets that go towards TCP options. A 905 framing protocol implementation which does not support PDU 906 alignment mode, because the underlying TCP sender is not framing- 907 aware, is not required to implement this getsockopt call. 909 A.4 Disabling ULP PDU Packing 911 flag = 0; 912 setsockopt (s, SOL_TCP, TCP_FRAMING_PACK_PDUS, &flag, 913 sizeof(flag)); 915 This call prevents the framing protocol in PDU alignment mode from 916 packing more than one ULP PDU into a framing PDU. By default, ULP 917 PDU packing is enabled. 919 A.5 Enabling Emergency Mode 921 flag = 1; 922 setsockopt (s, SOL_TCP, TCP_FRAMING_EMERGENCY, &flag, 923 sizeof(flag)); 925 This call enables emergency mode for PDU alignment mode. It may be 926 called at any time on a socket, whether connected or not, and 927 whether the current EMSS is smaller than 512 octets or not. By 928 default, emergency mode is disabled. 930 A.6 Setting The Sending Marker Interval 932 ivl = 2048; 933 setsockopt (s, SOL_TCP, TCP_FRAMING_SEND_INTERVAL, &ivl, 934 sizeof(ivl)); 936 This call sets the period at which markers will be introduced to 937 the sent TCP octet stream. The sending marker interval may be set 938 at any time, but it only has effect when sending markers is enabled 939 for the socket. 941 A.7 Setting The Receiving Marker Interval 943 ivl = 2048; 944 setsockopt (s, SOL_TCP, TCP_FRAMING_RECV_INTERVAL, &ivl, 945 sizeof(ivl)); 947 This call sets the period at which markers are expected in the 948 received TCP octet stream.
The receiving marker interval may be 949 set at any time, but it only has effect when receiving markers is 950 enabled for the socket. 952 Full Copyright Statement 954 Copyright (C) The Internet Society (2001). All Rights Reserved. 956 This document and translations of it may be copied and furnished to 957 others, and derivative works that comment on or otherwise explain 958 it or assist in its implementation may be prepared, copied, 959 published and distributed, in whole or in part, without restriction 960 of any kind, provided that the above copyright notice and this 961 paragraph are included on all such copies and derivative works. 962 However, this document itself may not be modified in any way, such 963 as by removing the copyright notice or references to the Internet 964 Society or other Internet organizations, except as needed for the 965 purpose of developing Internet standards in which case the