idnits 2.17.1 draft-ietf-rddp-applicability-07.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 17. -- Found old boilerplate from RFC 3978, Section 5.5 on line 983. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 960. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 967. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 973. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 663: '... Applications SHOULD trust that this...' RFC 2119 keyword, line 665: '...c. Applications SHOULD NOT apply addi...' RFC 2119 keyword, line 669: '... Administrators MUST NOT enable CRC32...' RFC 2119 keyword, line 734: '... messages MUST be provided by the...' 
Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (May 30, 2006) is 6512 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: 'RDMA-Security' on line 652 -- Looks like a reference, but probably isn't: 'RDMA-SEC' on line 847 == Unused Reference: '3' is defined on line 906, but no explicit reference was found in the text == Unused Reference: '4' is defined on line 909, but no explicit reference was found in the text == Unused Reference: '5' is defined on line 912, but no explicit reference was found in the text == Unused Reference: '6' is defined on line 917, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2246 (ref. '1') (Obsoleted by RFC 4346) ** Obsolete normative reference: RFC 2406 (ref. 
'2') (Obsoleted by RFC 4303, RFC 4305) == Outdated reference: A later version (-07) exists of draft-ietf-rddp-rdmap-05 == Outdated reference: A later version (-07) exists of draft-ietf-rddp-ddp-05 == Outdated reference: A later version (-07) exists of draft-ietf-rddp-sctp-02 == Outdated reference: A later version (-08) exists of draft-ietf-rddp-mpa-03 == Outdated reference: A later version (-10) exists of draft-ietf-rddp-security-09 == Outdated reference: A later version (-08) exists of draft-ietf-nfsv4-nfsdirect-02 Summary: 6 errors (**), 0 flaws (~~), 12 warnings (==), 9 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Remote Direct Data Placement C. Bestler 3 Working group Broadcom Corporation 4 Internet-Draft L. Coene 5 Expires: December 1, 2006 Siemens 6 May 30, 2006 8 Applicability of Remote Direct Memory Access Protocol (RDMA) and Direct 9 Data Placement (DDP) 10 draft-ietf-rddp-applicability-07.txt 12 Status of this Memo 14 By submitting this Internet-Draft, each author represents that any 15 applicable patent or other IPR claims of which he or she is aware 16 have been or will be disclosed, and any of which he or she becomes 17 aware will be disclosed, in accordance with Section 6 of BCP 79. 19 Internet-Drafts are working documents of the Internet Engineering 20 Task Force (IETF), its areas, and its working groups. Note that 21 other groups may also distribute working documents as Internet- 22 Drafts. 24 Internet-Drafts are draft documents valid for a maximum of six months 25 and may be updated, replaced, or obsoleted by other documents at any 26 time. It is inappropriate to use Internet-Drafts as reference 27 material or to cite them other than as "work in progress." 29 The list of current Internet-Drafts can be accessed at 30 http://www.ietf.org/ietf/1id-abstracts.txt. 
32 The list of Internet-Draft Shadow Directories can be accessed at 33 http://www.ietf.org/shadow.html. 35 This Internet-Draft will expire on December 1, 2006. 37 Copyright Notice 39 Copyright (C) The Internet Society (2006). 41 Abstract 43 This document describes the applicability of Remote Direct Memory 44 Access Protocol (RDMAP) and the Direct Data Placement Protocol (DDP). 45 It compares and contrasts the different transport options over IP 46 that DDP can use, provides guidance to ULP developers on choosing 47 between available transports and/or how to be indifferent to the 48 specific transport layer used, compares use of DDP with direct use of 49 the supporting transports, and compares DDP over IP transports with 50 non-IP transports that support RDMA functionality. 52 Table of Contents 54 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 55 2. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 5 56 3. Direct Placement . . . . . . . . . . . . . . . . . . . . . . . 6 57 3.1. Direct Placement using only the LLP . . . . . . . . . . . 6 58 3.2. Fewer Required ULP Interactions . . . . . . . . . . . . . 7 59 4. Tagged Messages . . . . . . . . . . . . . . . . . . . . . . . 8 60 4.1. Order Independent Reception . . . . . . . . . . . . . . . 8 61 4.2. Reduced ULP Notifications . . . . . . . . . . . . . . . . 9 62 4.3. Simplified ULP Exchanges . . . . . . . . . . . . . . . . . 9 63 4.4. Order Independent Sending . . . . . . . . . . . . . . . . 11 64 4.5. Untagged Messages and Tagged Buffers as ULP Credits . . . 12 65 5. RDMA Read . . . . . . . . . . . . . . . . . . . . . . . . . . 14 66 6. LLP Comparisons . . . . . . . . . . . . . . . . . . . . . . . 15 67 6.1. Multistreaming Implications . . . . . . . . . . . . . . . 15 68 6.2. Out of Order Reception Implications . . . . . . . . . . . 15 69 6.3. Header and Marker Overhead . . . . . . . . . . . . . . . . 15 70 6.4. Middlebox Support . . . . . . . . . . . . . . . . . . . . 15 71 6.5. 
Processing Overhead . . . . . . . . . . . . . . . . . . . 16 72 6.6. Data Integrity Implications . . . . . . . . . . . . . . . 16 73 6.6.1. MPA/TCP Specifics . . . . . . . . . . . . . . . . . . 16 74 6.6.2. SCTP Specifics . . . . . . . . . . . . . . . . . . . . 17 75 6.7. Non-IP Transports . . . . . . . . . . . . . . . . . . . . 17 76 6.7.1. No RDMA Layer Ack . . . . . . . . . . . . . . . . . . 17 77 6.8. Other IP Transports . . . . . . . . . . . . . . . . . . . 18 78 6.9. LLP Independent Session Establishment . . . . . . . . . . 19 79 6.9.1. RDMA-only Session Establishment . . . . . . . . . . . 19 80 6.9.2. RDMA-Conditional Session Establishment . . . . . . . . 19 81 7. Local Interface Implications . . . . . . . . . . . . . . . . . 21 82 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22 83 9. Security considerations . . . . . . . . . . . . . . . . . . . 23 84 9.1. Connection/Association Setup . . . . . . . . . . . . . . . 23 85 9.2. Tagged Buffer Exposure . . . . . . . . . . . . . . . . . . 23 86 9.3. Impact of Encrypted Transports . . . . . . . . . . . . . . 23 87 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 25 88 10.1. Normative references . . . . . . . . . . . . . . . . . . . 25 89 10.2. Informative References . . . . . . . . . . . . . . . . . . 25 90 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 26 91 Intellectual Property and Copyright Statements . . . . . . . . . . 27 93 1. Introduction 95 Remote Direct Memory Access Protocol (RDMAP) and Direct Data 96 Placement (DDP) work together to provide application independent 97 efficient placement of application payload directly into buffers 98 specified by the Upper Layer Protocol (ULP). 100 The DDP protocol is responsible for direct placement of received 101 payload into ULP specified buffers. The RDMAP protocol provides 102 completion notifications to the ULP and support for Data Sink 103 initiated fetch of advertised buffers (RDMA Reads). 
105 DDP and RDMAP are both application independent protocols which allow 106 the ULP to perform remote direct data placement. DDP can use 107 multiple standard IP transports including SCTP and TCP. 109 By clarifying the situations where the functionality of these 110 protocols is applicable, this document can guide implementers and 111 application and protocol designers in selecting which protocols to 112 use. 114 The applicability of RDMAP/DDP is driven by their unique 115 capabilities: 117 o The existence of an application independent protocol allows common 118 solutions to be implemented in hardware and/or the kernel. This 119 document will discuss when common data placement procedures are of 120 the greatest benefit to applications as contrasted with 121 application specific solutions built on top of direct use of the 122 underlying transport. 124 o DDP supports both untagged and tagged buffers. Tagged buffers 125 allow the Data Sink ULP to be indifferent to what order (or in 126 what messages) the Data Source sent the data, or what order 127 packets are received in. Typically tagged data can be used for 128 payload transfer, while untagged is best used for control 129 messages. However, each upper layer protocol can determine the 130 optimal use of tagged and untagged messages for itself. This 131 document will discuss when Data Source flexibility is of benefit 132 to applications. 134 o RDMAP consolidates ULP notifications, thereby minimizing the 135 number of required ULP interactions. 137 o RDMAP defines RDMA Reads, which allow remote access to advertised 138 buffers. This document will review the advantages of using RDMA 139 Reads as contrasted to alternate solutions. 141 Some non-IP transports, such as InfiniBand, directly integrate RDMA 142 features. This document will review the applicability of providing 143 RDMA services over ubiquitous IP transports as opposed to the use of 144 customized transport protocols.
Because DDP is defined 145 cleanly as a layer over existing IP transports, DDP has simpler 146 ordering rules than some prior RDMA protocols. This may have some 147 implications for application designers. 149 The capabilities of DDP and RDMAP can be fully realized only by 150 applications that are designed to exploit them. The co-existence of 151 RDMAP/DDP aware local interfaces with traditional socket interfaces 152 will also be explored. 154 Finally, DDP support is defined for at least two IP transports: SCTP 155 and TCP. The rationale for supporting both transports is reviewed, 156 as well as when each would be the appropriate selection. 158 2. Definitions 160 Advertisement - The act of informing a Remote Peer that a local RDMA 161 Buffer is available to it. A Node makes available an RDMA Buffer 162 for incoming RDMA Read or RDMA Write access by informing its RDMA/ 163 DDP peer of the Tagged Buffer identifiers (STag, base address, and 164 buffer length). This advertisement of Tagged Buffer information 165 is not defined by RDMA/DDP and is left to the ULP. A typical 166 method would be for the Local Peer to embed the Tagged Buffer's 167 Steering Tag, base address, and length in a Send Message destined 168 for the Remote Peer. 170 Data Sink - The peer receiving a data payload. Note that the Data 171 Sink can be required to both send and receive RDMA/DDP Messages to 172 transfer a data payload. 174 Data Source - The peer sending a data payload. Note that the Data 175 Source can be required to both send and receive RDMA/DDP Messages 176 to transfer a data payload. 178 Lower Layer Protocol (LLP) - The transport protocol that provides 179 services to DDP. This is an IP transport with any required 180 adaptation layer. Adaptation layers are defined for SCTP and TCP. 182 Steering Tag (STag) - An identifier of a Tagged Buffer on a Node, valid 183 as defined within a protocol specification.
185 Tagged Message - A DDP message that is directed to a ULP specified 186 buffer based upon embedded addressing information. In the 187 immediate sense, the destination buffer is specified by the 188 message sender. The message receiver is given no independent 189 indication that a tagged message has been received. 191 Untagged Message - A DDP message that is directed to a ULP specified 192 buffer based upon a Message Sequence Number being matched with a 193 receiver supplied buffer. The destination buffer is specified by 194 the message receiver. The message receiver is notified by some 195 mechanism that an untagged message has been received. 197 Upper Layer Protocol (ULP) - The direct user of RDMAP/DDP services. In 198 addition to protocols such as iSER [8] and NFSv4 over RDMA [9], 199 the ULP may be embedded in an application or in a middleware layer, 200 as is often the case for the Sockets Direct Protocol (SDP) and 201 Remote Procedure Call (RPC) protocols. 203 3. Direct Placement 205 Direct Data Placement optimizes the placement of ULP payload into the 206 correct destination buffers, typically eliminating intermediate 207 copying. Placement is enabled without regard to order of arrival or 208 order of transmission, and without requiring per-placement interaction with the 209 ULP. 211 RDMAP minimizes the required ULP interactions. This capability is 212 most valuable for applications that require multiple transport layer 213 packets for each required ULP interaction. 215 3.1. Direct Placement using only the LLP 217 Direct data placement can be achieved without RDMA. Pre-posting of 218 receive buffers could allow a non-RDMA network stack to place data 219 directly into user buffers. 221 The degree to which DDP improves on this depends on the transport it is 222 compared with, and on the nature of the local interface. Without 223 RDMAP/DDP, pre-posting buffers requires the receiving side to 224 accurately predict the required buffers and their sizes.
This is not 225 feasible for all ULPs. By contrast, DDP only requires the ULP to 226 predict the sequence and size of incoming untagged messages. 228 An application that could predict incoming messages and required 229 nothing more than direct placement into buffers might be able to achieve 230 this with a properly designed local interface to native SCTP or TCP 231 (without RDMA). This is easier using native SCTP because the 232 application would only have to predict the sequence of messages and 233 the maximum size of each message, not the exact size of each message. 235 The main benefit of DDP for such an application would be that pre- 236 posting of receive buffers is a mandated local interface capability, 237 and that predictions can always be made on a per-message basis (not 238 per byte). 240 The Lower Layer Protocol, LLP, can also be used directly if ULP 241 specific knowledge is built into the protocol stack to allow "parse 242 and place" handling of received packets. Such a solution requires 243 either interaction with the ULP or a protocol stack that has 244 knowledge of ULP specific syntax rules. 246 DDP achieves the benefits of directly placing incoming payload 247 without requiring tight coupling between the ULP and the protocol 248 stack. However, "parse and place" capabilities can certainly provide 249 equivalent services to a limited number of ULPs. 251 3.2. Fewer Required ULP Interactions 253 While reducing the number of required ULP interactions is in itself 254 desirable, it is critical for high speed connections. The burst 255 packet rate for a high speed interface could easily exceed the host 256 system's ability to switch ULP contexts. 258 Content access applications are primary examples of applications with 259 both high bandwidth and a high ratio of content transferred per 260 required ULP interaction.
These applications include file access 261 protocols (NAS), storage access (SAN), database access and other 262 application specific forms of content access such as HTTP, XML and 263 email. 265 4. Tagged Messages 267 This section covers the major benefits from the use of Tagged 268 Messages. 270 A more critical advantage of DDP is the ability of the Data Source to 271 use tagged buffers. Tagging messages allows the Data Source to 272 choose the ordering and packetization of its payload deliveries. 273 With direct data placement based solely upon pre-posted receives, the 274 packetization and delivery of payload must be agreed by the ULP peers 275 in advance. 277 The Upper Layer Protocol can allocate content between untagged and 278 tagged messages to maximize the potential optimizations. Placing 279 content within an untagged message can deliver the content in the 280 same packet that signals completion to the receiver. This can 281 improve latency. It can even eliminate round trips. However, it requires 282 making larger anonymous buffers available. 284 Some examples of data that typically belongs in the untagged message 285 would include: 287 o Short fixed-size control data that is inherently part of the 288 control message. This is especially true when the data is a 289 required part of the control message. 291 o Relatively short payload that is almost always needed, especially 292 when its inclusion would eliminate a round-trip to fetch the data. 293 Examples would include the initial data on a write request and 294 advertisements of tagged buffers. 296 Tagged messages standardize direct placement of data without per- 297 packet interaction with the upper layers. Even if there is an upper 298 layer protocol encoding of what is being transferred, as is common 299 with middleware solutions, this information is not understood at the 300 application independent layers.
The directions on where to place the 301 incoming data cannot be accessed without switching to the ULP first. 302 DDP provides a standardized 'packing list' which can be interpreted 303 without requiring ULP interaction. Indeed, it is designed to be 304 implementable in hardware. 306 4.1. Order Independent Reception 308 Tagged messages are directed to a buffer based on an included 309 Steering Tag. Additionally, no notice is provided to the ULP for each 310 individual Tagged Message's arrival. Together these allow tagged 311 messages received out-of-order to be processed without intermediate 312 buffering or additional notifications to the ULP. 314 4.2. Reduced ULP Notifications 316 RDMAP offers both tagged and untagged messages. No receiving side 317 ULP interactions are required for tagged messages. By optimally 318 dividing traffic between tagged and untagged messages, the ULP can 319 limit the number of events that must be dealt with at the ULP layer. 320 This typically reduces the number of context switches required and 321 improves performance. 323 RDMAP further reduces required ULP interactions by consolidating 324 completion notifications of tagged messages with the completion 325 notification of a trailing untagged message. For most ULPs this 326 reduces the number of required ULP interactions even 327 further. 329 While RDMAP consolidation of notices is beneficial to most 330 applications, it may be detrimental to some applications that benefit 331 from streamed delivery to enable ULP processing of received data as 332 promptly as possible. A ULP that uses RDMAP cannot begin processing 333 any portion of an exchange until it receives notification that the 334 entire exchange has been placed. An "exchange" here is a set of zero 335 or more tagged messages and a single terminating untagged message.
336 An application that would prefer to begin work on the received 337 payload as soon as possible, no matter what order it arrived in, 338 might prefer to work directly with the LLP. RDMAP is optimized for 339 applications that are more concerned with when the entire exchange is 340 complete. 342 An application that benefits from being able to begin processing of 343 each received packet as quickly as possible may find RDMAP interferes 344 with that goal. 346 Such an application might be able to retain most of the benefits of 347 RDMAP by using the DDP layer directly. However, in addition to 348 taking on the responsibilities of the RDMAP layer, the application 349 would likely have more difficulty finding support for a DDP-only API. 350 Many hardware implementations may choose to tightly couple RDMAP and 351 DDP, and might not provide an API directly to DDP services. 353 These features minimize the required interactions with the ULP. This 354 can be extremely beneficial for applications that use multiple 355 transport layer packets to accomplish what is a single ULP 356 interaction. 358 4.3. Simplified ULP Exchanges 360 The notification rules for Tagged Messages allow ULPs to create 361 multi-message "exchanges" consisting of zero or more tagged messages 362 and a terminating untagged message that together represent a single step in the ULP interaction. The receiving 363 ULP is notified that the untagged message has arrived, and implicitly 364 of any associated tagged messages. 366 A ULP where all exchanges would naturally be untagged messages would 367 derive virtually no benefit from the use of RDMAP/DDP as opposed to 368 SCTP directly. But while tagged buffers are the justification for 369 RDMAP/DDP, untagged buffers are still necessary. Without untagged 370 buffers the only method to exchange buffer advertisements would 371 require out-of-band communications. Most RDMA-aware ULPs use 372 untagged buffers for requests and responses.
Buffer advertisements 373 are typically done within these untagged messages. 375 More importantly, without untagged messages there would be no reliable method for the upper 376 layer peers to synchronize. The absence of any guarantees about 377 ordering within or between tagged messages is fundamental to allowing 378 the DDP layer to optimize transfer of tagged payload. 380 So no ULP can be defined entirely in terms of tagged messages. 381 Eventually a notification that confirms delivery must be generated 382 from the RDMAP/DDP layer. 384 Limiting use of untagged buffers to requests and responses by moving 385 all bulk data using tagged transfers can greatly reduce the amount 386 of prediction that the Data Sink must perform in pre-posting receive 387 buffers. For example, a typical RDMA enabled interaction would 388 consist of the following: 390 o The Client sends a transaction request to the Server as an untagged 391 message. 393 o This message includes buffer advertisements for the buffers where 394 the results are to be placed. 396 o The Server sends multiple tagged messages to the advertised 397 buffers. 399 o The Server sends a transaction reply as an untagged message to the 400 Client. 402 o The Client receives a single notification, indicating completion of the 403 interaction. 405 With this type of exchange the pacing and required size of untagged 406 buffers are highly predictable. The variability of response sizes is 407 absorbed by tagged transfers. 409 4.4. Order Independent Sending 411 Use of tagged messages is especially applicable when the Data Sink 412 does not know the actual size, structure or location of the content 413 it is requesting (or updating). 415 For example, suppose the Data Sink ULP needs to fetch four related 416 pieces of data into four separate buffers. With SCTP the Data Sink 417 ULP could receive four messages into four separate buffers, only 418 having to predict the maximum size of each.
However, it would have to 419 dictate the order in which the Data Source supplied the separate 420 pieces. If the Data Source found it advantageous to fetch them in a 421 different order, it would have to use intermediate buffering to re- 422 order the pieces into the expected order even though the application 423 only required that all four be delivered and did not truly have an 424 ordering requirement. 426 Techniques such as RAID striping and mirroring take this same 427 problem one step further. What appears to be a single resource 428 to the Data Sink is actually stored in separate locations by the Data 429 Source. Non-RDMA protocols would either require the Data Source to 430 fetch the material in the desired order or force the Data Source to 431 use its own holding buffers to assemble an image of the destination 432 buffer. 434 While sometimes referred to as a "buffer-to-buffer" solution, RDMA 435 more fundamentally enables remote buffer access. The ULP is free to 436 work with larger remote buffers than it has locally. This reduces 437 buffering requirements and the number of times the data must be 438 copied in an end-to-end transfer. 440 There are numerous reasons why the Data Sink would not know the true 441 order or location of the requested data. These include per-client 442 differences, different records selected and/or different sort orders, 443 RAID striping, file fragmentation, volume fragmentation, volume 444 mirroring and server-side dynamic compositing of content (such as 445 server side includes for HTTP). 447 In all of these cases the Data Source is free to assemble the desired 448 data in the Data Sink's buffer in whatever order the component data 449 becomes available to it. It is not constrained on ordering. It does 450 not have to assemble an image in its own memory before creating it in 451 the Data Sink's buffers.
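The order independence described above can be illustrated with a small sketch. The Python model below is only a toy under stated assumptions, not an RDMAP/DDP implementation or API: the class TaggedBuffer, its place() method, the transfer() helper and the STag value 0x1234 are hypothetical names introduced for illustration. It shows a Data Source placing four pieces into one advertised tagged buffer in whatever order they become available, and the resulting buffer being identical to an in-order transfer.

```python
class TaggedBuffer:
    """Toy model of a Data Sink buffer advertised by (STag, length).

    Hypothetical illustration only; not an RDMAP/DDP API.
    """

    def __init__(self, stag, length):
        self.stag = stag
        self.data = bytearray(length)

    def place(self, stag, offset, payload):
        # Placement uses only the addressing carried in the tagged message;
        # no per-message notification is given to the Data Sink ULP.
        if stag != self.stag:
            raise ValueError("unknown STag")
        if offset < 0 or offset + len(payload) > len(self.data):
            raise ValueError("placement outside advertised buffer")
        self.data[offset:offset + len(payload)] = payload


def transfer(pieces, order, length, stag=0x1234):
    """Place (offset, payload) pieces in the given order; return the buffer."""
    buf = TaggedBuffer(stag, length)
    for index in order:
        offset, payload = pieces[index]
        buf.place(stag, offset, payload)
    # In a real exchange, a trailing untagged message would now notify
    # the Data Sink ULP that the whole exchange is complete.
    return bytes(buf.data)


pieces = [(0, b"AAAA"), (4, b"BBBB"), (8, b"CCCC"), (12, b"DDDD")]
in_order = transfer(pieces, [0, 1, 2, 3], 16)
shuffled = transfer(pieces, [2, 0, 3, 1], 16)
print(in_order == shuffled)
```

Because each piece carries its own offset, the Data Source in this sketch needs no holding buffer to re-order the pieces before sending them.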
453 Note that while DDP enables use of tagged messages for bulk transfer, 454 there are some application scenarios where untagged messages would 455 still be used for bulk transfer. For example, a file server may not 456 expose its own memory to its clients. A client wishing to write may 457 advertise a buffer against which the server will issue RDMA Reads. 458 However, when performing a small write it may be preferable to 459 include the data in the untagged message rather than incurring an 460 additional round trip with the RDMA Read and its response. 462 Generally, the best use of an untagged message is to synchronize and 463 to deliver data that is naturally tied to the same message as the 464 synchronization. For initial data transfers this has the additional 465 benefit of avoiding the need to advertise specific tagged buffers for 466 indefinite time periods. Instead, anonymous buffers can be used for 467 initial data reception. Because anonymous buffers do not need to be 468 tied to specific messages in advance, this can be a major benefit. 470 4.5. Untagged Messages and Tagged Buffers as ULP Credits 472 The handling of end-to-end buffer credits differs considerably between 473 DDP and direct ULP use of either TCP or SCTP. 475 With both TCP and SCTP, buffer credits consist of the receiver 476 granting transmit permission for a total number of bytes. 477 These credits reflect system buffering resources and/or simple flow 478 control. They do not represent ULP resources. 480 DDP defines no standard flow control, but presumes the existence of a 481 ULP mechanism. The presumed mechanism is that the Data Sink ULP has 482 issued credits to the Data Source allowing the Data Source to send a 483 specific number of untagged messages. 485 The ULP peers must ensure that the sender is aware of the maximum 486 size that can be sent to any specific target buffer.
One method of 487 doing so is to use a standard size for all untagged buffers within a 488 given connection. For example, a ULP may specify an initial untagged 489 buffer size to be used immediately after session establishment, and 490 then optionally specify mechanisms for negotiating changes. 492 Tagged buffers are ULP resources advertised directly from ULP to ULP. 493 A DDP put to a known tagged buffer is constrained only by transport 494 level flow control, not by available system buffering. 496 Either tagged or untagged buffers allow bypassing of system buffer 497 resources. Use of tagged buffers additionally allows the Data Source 498 to choose what order to exercise the credits in. 500 To the extent allowed by the ULP, tagged buffers are also divisible 501 resources. The Data Sink can advertise a single 100 KB buffer, and 502 then receive notifications from its peer that it had written 50 KB, 503 20 KB and 30 KB to that buffer in three successive transactions. 505 ULP management of tagged buffer resources, independent of transport 506 and DDP layer credits, is an additional benefit of RDMA protocols. 507 Large bulk transfers cannot be blocked by limited general purpose 508 buffering capacity. Applications can apply flow control based upon higher 509 level abstractions, such as number of outstanding requests, 510 independent of the amount of data that must be transferred. 512 However, use of system buffering, as offered by direct use of the 513 underlying transports, can be preferable under certain circumstances. 515 One example would be when the number of target ULP buffers is 516 sufficiently large, and the rate at which any writes arrive is 517 sufficiently low, that pinning all the target ULP buffers in memory 518 would be undesirable. The maximum transfer rate, and hence the 519 maximum amount of system buffering required, may be more stable and 520 predictable than the total ULP buffer exposure.
522 Another would be when the Data Sink wishes to receive a stream of data at 523 a predictable rate, but does not know in advance what the size of 524 each data packet will be. This is common for streaming media that 525 has been encoded with a variable bit rate. With DDP the Data Sink 526 would either have to use untagged buffers large enough for the 527 largest packet, or advertise a circular buffer. If for security or 528 other reasons the Data Sink did not want the size of its buffer to be 529 publicly known, using the underlying SCTP transport directly may be 530 preferable because of its byte-oriented credits. 532 5. RDMA Read 534 RDMA Reads are a further service provided by RDMAP. RDMA Reads allow 535 the Data Sink to fetch exactly the portion of the peer ULP buffer 536 required on a "just in time" basis. This can be done without 537 requiring per-fetch support from the Data Source ULP. 539 Storage servers may wish to limit the maximum write buffer allocated 540 to any single session. The storage server may be a very minimal 541 layer between the client and the disk storage media, or the server 542 may merely wish to limit the total resources that would be required 543 if all clients could push the entire payload they wished written at 544 their own convenience. 546 In either case, there is little benefit in transferring data from the 547 Data Source far in advance of when it will be written to the 548 persistent storage media. RDMA Reads allow the Storage Server to 549 fetch the payload on a "just in time" basis. In this fashion a 550 relatively small number of block sized buffers can be used to execute 551 a single transaction that specified writing a large file, or a 552 Storage Server with numerous clients can fetch buffers from the 553 individual clients in the order that is most convenient to the 554 server. 556 This same capability can be used when the desired portion of the 557 advertised buffer is not known in advance.
For example, the 558 advertised buffer could contain performance statistics. The Data 559 Sink could request the portions of the data it required, without 560 requiring an interaction with the Data Source ULP. 562 This is applicable for many applications that publish semi-volatile 563 data that does not require transactional validity checking (i.e., 564 authorized users have read access to the entire set of data). It is 565 less applicable when there are ULP consistency checks that must be 566 performed upon the data. Such applications would be better served by 567 having the client send a request, and having the server use RDMA 568 Writes to publish the requested data. Neither RDMAP nor DDP provides 569 mechanisms for bundling multiple disjoint updates into an atomic 570 operation. Therefore use of an advertised buffer as a data resource 571 is subject to the same caveats as any randomly updated data resource, 572 such as flat files, that do not enforce their own consistency. 574 6. LLP Comparisons 576 Normally the choice of underlying IP transport is irrelevant to the 577 ULP. RDMAP and DDP provide the same services over either. There 578 may be performance impacts of the choice, however. It is the 579 responsibility of the ULP to determine which IP transport is best 580 suited to its needs. 582 SCTP provides for preservation of message boundaries. Each DDP 583 segment will be delivered within a single SCTP packet. The 584 equivalent services are only available with TCP through the use of 585 the MPA (Marker PDU Alignment) adaptation layer. 587 6.1. Multistreaming Implications 589 SCTP also provides multi-streaming. When the same pair of hosts have 590 need for multiple DDP streams this can be a major advantage. A 591 single SCTP association carries multiple DDP streams, consolidating 592 connection setup, congestion control and acknowledgements. 594 Completions are controlled by the DDP Source Sequence Number (DDP- 595 SSN) on a per stream basis.
Therefore combining multiple DDP Streams 596 into a single SCTP association cannot result in a dropped packet 597 carrying data for one stream delaying completions on others. 599 6.2. Out of Order Reception Implications 601 The use of unordered Data Chunks with SCTP guarantees that the DDP 602 layer will be able to perform placements when IP datagrams are 603 received out of order. 605 Placement of out-of-order DDP Segments carried over MPA/TCP is not 606 guaranteed, but certainly allowed. The ability of the MPA receiver 607 to process out-of-order DDP Segments may be impaired when alignment 608 of TCP segments and MPA FPDUs is lost. Using SCTP, each DDP Segment 609 is encoded in a single Data Chunk and never spread over multiple IP 610 datagrams. 612 6.3. Header and Marker Overhead 614 MPA and TCP headers together are smaller than the headers used by 615 SCTP and its adaptation layer. However, this advantage can be 616 reduced by the insertion of MPA markers. The difference in ULP 617 payload per IP Datagram is not likely to be a significant factor. 619 6.4. Middlebox Support 621 Even with the MPA adaptation layer, DDP traffic carried over MPA/TCP 622 will appear to all network middleboxes as a normal TCP connection. 623 In many environments there may be a requirement to use only TCP 624 connections to satisfy existing network elements and/or to facilitate 625 monitoring and control of connections. While SCTP is certainly just 626 as monitorable and controllable as TCP, there is no guarantee that 627 the network management infrastructure has the required support for 628 both. 630 6.5. Processing Overhead 632 A DDP stream delivered via MPA/TCP will require more processing 633 effort than one delivered over SCTP. However, this extra work may be 634 justified for many deployments where full SCTP support is unavailable 635 in the endpoints of the network, or where middleboxes impair the 636 usability of SCTP. 638 6.6.
Data Integrity Implications 640 Both the SCTP and MPA/TCP adaptations provide end-to-end CRC32c 641 protection against accidental data corruption, or its equivalent. 643 A ULP that requires a greater degree of protection may add its own. 644 However, DDP and RDMAP headers will only be guaranteed to have the 645 equivalent of end-to-end CRC32c protection. A ULP that requires data 646 integrity checking more thorough than an end-to-end CRC32c should 647 first invalidate all STags that reference a buffer before applying 648 its own integrity check. 650 CRC32c only provides protection against random corruption. To 651 protect against unauthorized alteration or forging of data packets, 652 security methods must be applied. The security draft [RDMA-Security] 653 [7] specifies usage of RFC2406 [2] for both adaptation layers. 655 6.6.1. MPA/TCP Specifics 657 It is mandatory for MPA/TCP implementations to implement CRC32c, but 658 it is NOT mandatory to use the CRC32c during an RDMA connection. The 659 activating or deactivating of the CRC in MPA/TCP is an administrative 660 configuration operation at the local and remote end. The 661 administration of the CRC (ON/OFF) is invisible to the ULP. 663 Applications SHOULD trust that this administrative option will only 664 be used when the end-to-end protection is at least as effective as a 665 transport layer CRC32c. Applications SHOULD NOT apply additional 666 protection as a guard against this administrative option being turned 667 on inadvertently. 669 Administrators MUST NOT enable CRC32c suppression unless the end-to- 670 end protection is truly equivalent. 672 If the CRC is active/used for one direction/end, then the use of the 673 CRC is mandatory in both directions/ends. 675 If both ends have been configured NOT to use the CRC, then this is 676 allowed as long as an equivalent protection (comparable to or better 677 than CRC) from undetected errors on the connection is provided. 679 6.6.2.
SCTP Specifics 681 SCTP provides CRC32c protection automatically. The adaptation to 682 SCTP provides for no option to suppress SCTP CRC32c protection. 684 6.7. Non-IP Transports 686 DDP is defined to operate over ubiquitous IP transports such as SCTP 687 and TCP. This enables a new DDP-enabled node to be added anywhere to 688 an IP network. No DDP-specific support from middle-boxes is 689 required. 691 There are non-IP transport fabrics offering RDMA capabilities. 692 Because these capabilities are integrated with the transport protocol 693 they have some technical advantages when compared to RDMA over IP. 694 For example, fencing of RDMA operations can be based upon transport 695 level acks. Because DDP is cleanly layered over an IP transport, any 696 explicit RDMA layer ack must be separate from the transport layer 697 ack. 699 There may be deployments where the benefits of RDMA/transport 700 integration outweigh the benefits of being on an IP network. 702 6.7.1. No RDMA Layer Ack 704 DDP does not provide for its own acknowledgements. The only form of 705 ack provided at the RDMAP layer is an RDMA Read Response. DDP and 706 RDMAP rely almost entirely upon other layers for flow control and 707 pacing. The LLP is relied upon to guarantee delivery and avoid 708 network congestion, and ULP level acking is relied upon for ULP 709 pacing and to avoid ULP buffer overruns. 711 Previous RDMA protocols, such as InfiniBand, have been able to use 712 their integration with the transport layer to provide stronger 713 ordering guarantees. It is important that application designers who 714 require such guarantees provide them through ULP interaction. 716 Specifically: 718 There is no ability for a local interface to "fence" outbound 719 messages to guarantee that prior tagged messages have been placed 720 prior to sending a tagged message.
The only guarantees available 721 from the other side would be an RDMA Read Response (coming from 722 the RDMAP layer) or a response from the ULP layer. Remember that 723 the normal ordering rules only guarantee when the Data Sink ULP 724 will be notified of untagged messages; they do not control when 725 data is placed into receive buffers. 727 Re-use of tagged buffers must be done with extreme care. The fact 728 that an untagged message indicates that all prior tagged messages 729 have been placed does not guarantee that no later tagged messages 730 have. The best strategy is to only change the state of any given 731 advertised buffers with untagged messages. 733 As covered elsewhere in this document, flow control of untagged 734 messages MUST be provided by the ULP itself. 736 6.8. Other IP Transports 738 Both TCP and SCTP provide DDP with reliable transport with TCP 739 friendly rate control. As currently defined, DDP works over 740 reliable transports and implicitly relies upon some form of rate 741 control. 743 DDP is fully compatible with a non-reliable protocol. Out-of-order 744 placement is obviously not dependent on whether the other DDP 745 Segments ever actually arrive. 747 However, RDMAP requires the LLP to provide reliable service. An 748 alternate completion handling protocol would be required if DDP were 749 to be deployed over an unreliable IP transport. 751 As noted in the prior section on tagged buffers as ULP credits, 752 neither RDMAP nor DDP provides any flow control for tagged messages. 753 If no transport layer flow control is provided, an RDMAP/DDP 754 application would be only limited by the link layer rate, almost 755 inevitably resulting in severe network congestion. 757 RDMAP encourages applications to be ignorant of the underlying 758 transport PMTU. The ULP is only notified when all messages ending in 759 a single untagged message have completed. The ULP is not aware of 760 the granularity or ordering of the underlying message.
This approach 761 assumes that the ULP is only interested in the complete set of 762 messages, and has no use for a subset of them. 764 6.9. LLP Independent Session Establishment 766 For an RDMAP/DDP application, the transport services provided by a 767 pair of SCTP Streams and by a TCP connection both provide the same 768 service (reliable delivery of DDP Segments between two connected 769 RDMAP/DDP endpoints). 771 6.9.1. RDMA-only Session Establishment 773 It is also possible to allow for transport neutral establishment of 774 RDMAP/DDP sessions between endpoints. Combined, these two features 775 would allow most applications to be unconcerned as to which LLP was 776 actually in use. 778 Specifically, the procedures for DDP Stream Session establishment 779 discussed in section 3 of the SCTP mapping, and section 13.3 of the 780 MPA/TCP mapping, both allow for the exchange of ULP specific data 781 ("Private Data") before enabling the exchange of DDP Segments. This 782 delay can allow for proper selection and/or configuration of the 783 endpoints based upon the exchanged data. For example, each DDP 784 Stream Session associated with a single client session might be 785 assigned to the same DDP Protection Domain. 787 To be transport neutral, the applications should exchange Private 788 Data as part of session establishment messages to determine how the 789 RDMA endpoints are to be configured. One side must be the Initiator, 790 and the other the Responder. 792 With SCTP, a pair of SCTP streams can be used for successive sessions 793 while the SCTP association remains open. With MPA/TCP each 794 connection can be used for at most one session. However, the same 795 source/destination pair of ports can be re-used sequentially subject 796 to normal TCP rules. 798 Both SCTP and MPA limit the private data size to a maximum of 512 799 bytes. 801 MPA/TCP requires the end of the TCP connection that initiated the 802 conversion to MPA mode to send the first DDP Segment. 
SCTP does not 803 have this requirement. ULPs that wish to be transport neutral 804 should require the initiating end to send the first message. A zero- 805 length RDMA Write can be used for this purpose if the ULP logic 806 itself does not naturally support this restriction. 808 6.9.2. RDMA-Conditional Session Establishment 810 It is sometimes desirable for the active side of a session to connect 811 with the passive side before knowing whether the passive side 812 supports RDMA. 814 This style of session establishment can be supported with either TCP 815 or SCTP, but not as transparently as for RDMA-only sessions. Pre- 816 existing non-RDMA servers are also far more likely to be using TCP 817 than SCTP. 819 With TCP, a normal TCP connection is established. It is then used by 820 the ULP to determine whether or not to convert to MPA mode and use 821 RDMA. This will typically be integral with other session 822 establishment negotiations. 824 With SCTP, the establishment of an association tests whether RDMA is 825 supported. If not supported, the application simply requests the 826 association without the RDMA adaptation indication. 828 One key difference is that with SCTP the determination as to whether 829 the peer can support RDMA is made before the transport layer 830 association/connection is established while with TCP the established 831 connection itself is used to determine whether RDMA is supported. 833 7. Local Interface Implications 835 Full utilization of DDP and RDMAP capabilities requires a local 836 interface that explicitly requests these services. Protocols such as 837 Sockets Direct Protocol (SDP) can allow applications to keep their 838 traditional byte-stream or message-stream interface and still enjoy 839 many of the benefits of the optimized wire level protocols. 841 8. IANA Considerations 843 There are no IANA considerations in this document. 845 9. Security considerations 847 RDMA security considerations are discussed in [RDMA-SEC] [7].
This 848 document will only deal with the more usage oriented aspects, and 849 where there are implications in the choice of underlying transport. 851 9.1. Connection/Association Setup 853 Both the SCTP and TCP adaptations allow for existing procedures to be 854 followed for the establishment of the SCTP association or TCP 855 connection. Use of DDP does not impair the use of any security 856 measures to filter, validate and/or log the remote end of an 857 association/connection. 859 9.2. Tagged Buffer Exposure 861 DDP only exposes ULP memory to the extent explicitly allowed by ULP 862 actions. These include posting of receive operations and enabling of 863 Steering Tags. 865 Neither RDMAP nor DDP places requirements on how ULPs advertise 866 buffers. A ULP may use a single Steering Tag for multiple buffer 867 advertisements. However, the ULP should be aware that enforcement on 868 STag usage is likely limited to the overall range that is enabled. 869 If the remote peer writes into the 'wrong' advertised buffer, neither 870 the DDP nor the RDMAP layer will be aware of this. Nor is there any 871 report to the ULP on how the remote peer specifically used tagged 872 buffers. 874 Unless the ULP peers have an adequate basis for mutual trust, the 875 receiving ULP might be well advised to use a distinct STag for each 876 interaction, and to invalidate it after each use or to require its 877 peer to use the RDMAP option to invalidate the STag with its 878 responding untagged message. 880 9.3. Impact of Encrypted Transports 882 While DDP is cleanly layered over the LLP, its maximum benefit may be 883 limited when the LLP Stream is secured with a streaming cipher, such 884 as Transport Layer Security (TLS) RFC2246 [1]. If the LLP must 885 decrypt in order, it cannot provide out-of-order DDP Segments to the 886 DDP layer for placement purposes. IPsec RFC2406 [2] tunnel mode 887 encrypts entire IP Datagrams. IPsec transport mode encrypts TCP 888 Segments or SCTP packets.
In neither case should IPsec preclude 889 providing out-of-order DDP Segments to the DDP layer for placement. 891 Note that end-to-end use of IPsec cryptographic integrity protection 892 may allow suppression of MPA CRC generation and checking under 893 certain circumstances. This is one example where the LLP may be 894 judged to have "or equivalent" protection to an end-to-end CRC32c. 896 10. References 898 10.1. Normative references 900 [1] Dierks, T. and C. Allen, "The TLS Protocol Version 1.0", 901 RFC 2246, January 1999. 903 [2] Kent, S. and R. Atkinson, "IP Encapsulating Security Payload 904 (ESP)", RFC 2406, November 1998. 906 [3] Recio, R., "An RDMA Protocol Specification", 907 draft-ietf-rddp-rdmap-05 (work in progress), July 2005. 909 [4] Shah, H., "Direct Data Placement over Reliable Transports", 910 draft-ietf-rddp-ddp-05 (work in progress), July 2005. 912 [5] Stewart, R., "Stream Control Transmission Protocol (SCTP) Remote 913 Direct Memory Access (RDMA) Direct Data Placement (DDP) 914 Adaptation", draft-ietf-rddp-sctp-02 (work in progress), 915 August 2005. 917 [6] Culley, P., "Marker PDU Aligned Framing for TCP Specification", 918 draft-ietf-rddp-mpa-03 (work in progress), October 2005. 920 [7] Pinkerton, J., "DDP/RDMAP Security", draft-ietf-rddp-security-09 921 (work in progress), May 2006. 923 10.2. Informative References 925 [8] Ko, M., "iSCSI Extensions for RDMA Specification", October 2005. 927 [9] Callaghan, B. and T. Talpey, "NFS Direct Data Placement", 928 draft-ietf-nfsv4-nfsdirect-02 (work in progress), October 2005. 930 Authors' Addresses 932 Caitlin Bestler 933 Broadcom Corporation 934 16215 Alton Parkway 935 P.O.
Box 57013 936 Irvine, CA 92619-7013 937 USA 939 Phone: 949-926-6383 940 Email: caitlinb@broadcom.com 942 Lode Coene 943 Siemens 944 Atealaan 26 945 Herentals, 2200 946 Belgium 948 Phone: +32-14-252081 949 Email: lode.coene@siemens.com 951 Intellectual Property Statement 953 The IETF takes no position regarding the validity or scope of any 954 Intellectual Property Rights or other rights that might be claimed to 955 pertain to the implementation or use of the technology described in 956 this document or the extent to which any license under such rights 957 might or might not be available; nor does it represent that it has 958 made any independent effort to identify any such rights. Information 959 on the procedures with respect to rights in RFC documents can be 960 found in BCP 78 and BCP 79. 962 Copies of IPR disclosures made to the IETF Secretariat and any 963 assurances of licenses to be made available, or the result of an 964 attempt made to obtain a general license or permission for the use of 965 such proprietary rights by implementers or users of this 966 specification can be obtained from the IETF on-line IPR repository at 967 http://www.ietf.org/ipr. 969 The IETF invites any interested party to bring to its attention any 970 copyrights, patents or patent applications, or other proprietary 971 rights that may cover technology that may be required to implement 972 this standard. Please address the information to the IETF at 973 ietf-ipr@ietf.org. 975 Disclaimer of Validity 977 This document and the information contained herein are provided on an 978 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 979 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 980 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 981 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 982 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 983 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 
985 Copyright Statement 987 Copyright (C) The Internet Society (2006). This document is subject 988 to the rights, licenses and restrictions contained in BCP 78, and 989 except as set forth therein, the authors retain all their rights. 991 Acknowledgment 993 Funding for the RFC Editor function is currently provided by the 994 Internet Society.