Remote Direct Data Placement                                  C. Bestler
Working group                                                   L. Coene
Internet-Draft                                             June 12, 2003
Expires: December 11, 2003

   Applicability of Remote Direct Memory Access Protocol (RDMA) and
                     Direct Data Placement (DDP)
                 draft-ietf-rddp-applicability-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 11, 2003.

Copyright Notice

   Copyright (C) The Internet Society (2003).  All Rights Reserved.
Abstract

   This document describes the applicability of Remote Direct Memory
   Access Protocol (RDMAP) and the Direct Data Placement Protocol
   (DDP).  It contrasts the different transport options over IP that DDP
   can use, compares use of DDP with direct use of the supporting
   transports, and compares DDP over IP transports with non-IP
   transports that support RDMA functionality.

Table of Contents

   1.   Introduction
   2.   Definitions
   3.   Direct Placement
   3.1  Fewer Required ULP Interactions
   3.2  Direct Placement using only the LLP
   4.   Tagged Messages
   4.1  Order Independent Reception
   4.2  Reduced ULP Notifications
   4.3  Simplified ULP Exchanges
   4.4  Order Independent Sending
   4.5  Tagged Buffers as ULP Credits
   5.   RDMA Read
   6.   LLP Comparisons
   6.1  Multistreaming Implications
   6.2  Out of Order Reception Implications
   6.3  Header and Marker Overhead
   6.4  Data Integrity Implications
   6.5  Non-IP Transports
   6.6  Other IP Transports
   7.   Local Interface Implications
   8.   Security considerations
   8.1  Connection/Association Setup
   8.2  Tagged Buffer Exposure
   8.3  Impact of Encrypted Transports
        References
        Authors' Addresses
        Full Copyright Statement

1. Introduction

   Remote Direct Memory Access Protocol (RDMAP) and Direct Data
   Placement (DDP) work together to provide application independent,
   efficient placement of application payload directly into buffers
   specified by the Upper Layer Protocol (ULP).

   The DDP protocol is responsible for direct placement of received
   payload into ULP specified buffers.  The RDMAP protocol provides
   completion notifications to the ULP and support for Data Sink
   initiated fetch of advertised buffers (RDMA Reads).

   DDP and RDMAP are both application independent protocols which allow
   the ULP to perform remote direct data placement.  DDP can use
   multiple standard IP transports including SCTP and TCP.

   By clarifying the situations where the functionality of these
   protocols is applicable, this document can guide implementers,
   application designers and protocol designers in selecting which
   protocols to use.

   The applicability of RDMAP/DDP is driven by their unique
   capabilities:

   o  The existence of an application independent protocol allows common
      solutions to be implemented in hardware and/or the kernel.  This
      document will discuss when common data placement procedures are of
      the greatest benefit to applications as contrasted with
      application specific solutions built on top of direct use of the
      underlying transport.

   o  DDP supports both untagged and tagged buffers.  Tagged buffers
      allow the Data Sink ULP to be indifferent to the order (or the
      packets) in which the Data Source sent the data, and to the order
      in which those packets are received.
      This document will discuss when Data Source
      flexibility is of benefit to applications.

   o  RDMAP consolidates ULP notifications, thereby minimizing the
      number of required ULP interactions.

   o  RDMAP defines RDMA Reads, which allow remote access to advertised
      buffers.  This document will review the advantages of using RDMA
      Reads as contrasted to alternate solutions.

   Some non-IP transports, such as InfiniBand, directly integrate RDMA
   features.  This document will review the applicability of providing
   RDMA services over ubiquitous IP transports as opposed to the use of
   customized transport protocols.

   The capabilities of DDP and RDMAP can be fully realized only by
   applications that are designed to exploit them.  The co-existence of
   RDMAP/DDP aware local interfaces with traditional socket interfaces
   will also be explored.

   Finally, DDP support is defined for at least two IP transports: SCTP
   and TCP.  The rationale for supporting both transports is reviewed,
   as well as when each would be the appropriate selection.

2. Definitions

   Advertisement - The act of informing a Remote Peer that a local RDMA
      Buffer is available to it.  A Node makes available an RDMA Buffer
      for incoming RDMA Read or RDMA Write access by informing its
      RDMA/DDP peer of the Tagged Buffer identifiers (STag, base
      address, and buffer length).  This advertisement of Tagged Buffer
      information is not defined by RDMA/DDP and is left to the ULP.  A
      typical method would be for the Local Peer to embed the Tagged
      Buffer's Steering Tag, base address, and length in a Send Message
      destined for the Remote Peer.

   Data Sink - The peer receiving a data payload.  Note that the Data
      Sink can be required to both send and receive RDMA/DDP Messages to
      transfer a data payload.

   Data Source - The peer sending a data payload.
      Note that the Data
      Source can be required to both send and receive RDMA/DDP Messages
      to transfer a data payload.

   Lower Layer Protocol (LLP) - The transport protocol that provides
      services to DDP.  This is an IP transport with any required
      adaptation layer.  Adaptation layers are defined for SCTP and TCP.

   Steering Tag (STag) - An identifier of a Tagged Buffer on a Node,
      valid as defined within a protocol specification.

   Tagged Message - A DDP message that is directed to a ULP specified
      buffer based upon embedded addressing information.  In the
      immediate sense, the destination buffer is specified by the
      message sender.

   Untagged Message - A DDP message that is directed to a ULP specified
      buffer based upon a Message Sequence Number being matched with a
      receiver supplied buffer.  The destination buffer is specified by
      the message receiver.

   Upper Layer Protocol (ULP) - The direct user of RDMAP/DDP services.
      This may be an application, or a middleware layer such as Sockets
      Direct Protocol (SDP) or Remote Procedure Calls (RPC).

3. Direct Placement

   Direct Data Placement optimizes the placement of ULP payload into the
   correct destination buffers, typically eliminating intermediate
   copying.  Placement is enabled without regard to order of arrival or
   order of transmission, and without requiring per-placement
   interaction with the ULP.

   RDMAP minimizes the required ULP interactions.  This capability is
   most valuable for applications that require multiple transport layer
   packets for each required ULP interaction.

3.1 Fewer Required ULP Interactions

   While reducing the number of required ULP interactions is in itself
   desirable, it is critical for high speed connections.  The burst
   packet rate for a high speed interface could easily exceed the host
   system's ability to switch ULP contexts.
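   As a rough illustration of the gap described above, the sketch below
   compares the packet arrival rate on a fast link with the rate of ULP
   interactions when one interaction covers many packets.  The 10 Gb/s
   link speed, 1500-byte packets and 64 KiB ULP messages are assumptions
   chosen for illustration, not figures taken from this document, and
   header overhead is ignored.

```python
# Back-of-the-envelope comparison of packet rate and ULP interaction rate.
# All three constants are illustrative assumptions.
LINK_BITS_PER_SECOND = 10_000_000_000   # assumed 10 Gb/s interface
PACKET_BYTES = 1500                     # assumed transport packet size
ULP_MESSAGE_BYTES = 64 * 1024           # assumed payload per ULP interaction


def events_per_second(link_bps: int, unit_bytes: int) -> float:
    """Number of units of `unit_bytes` the link can carry each second."""
    return link_bps / (unit_bytes * 8)


packets_per_second = events_per_second(LINK_BITS_PER_SECOND, PACKET_BYTES)
interactions_per_second = events_per_second(LINK_BITS_PER_SECOND, ULP_MESSAGE_BYTES)
packets_per_interaction = ULP_MESSAGE_BYTES / PACKET_BYTES

print(f"{packets_per_second:,.0f} packets/s")
print(f"{interactions_per_second:,.0f} ULP interactions/s")
print(f"{packets_per_interaction:.1f} packets per interaction")
```

   Under these assumptions a scheme that notifies the ULP once per
   message needs over forty times fewer ULP interactions than one that
   notifies it once per packet.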

   Content access applications are primary examples of applications with
   both high bandwidth and high ratios of content to required ULP
   interactions.  These applications include file access protocols
   (NAS), storage access (SAN), database access and other application
   specific forms of content access such as HTTP, XML and email.

   Direct data placement can be achieved without RDMA.  Pre-posting of
   receive buffers could allow a non-RDMA network stack to place data
   directly into user buffers.

3.2 Direct Placement using only the LLP

   The degree of optimization that DDP offers depends on which transport
   it is being compared with, and on the nature of the local interface.
   Without RDMAP/DDP, pre-posting buffers requires the receiving side to
   accurately predict the required buffers and their sizes.  This is not
   feasible for all ULPs.  By contrast, DDP only requires the ULP to
   predict the sequence and size of incoming untagged messages.

   An application that could predict incoming messages and required
   nothing more than direct placement into buffers might be able to
   achieve this with a properly designed local interface to SCTP or TCP.
   Doing so for TCP requires making predictions at a byte level rather
   than a message level.

   The main benefit of DDP for such an application would be that
   pre-posting of receive buffers is a mandated local interface
   capability, and that predictions can be made on a per-message basis
   (not per byte).

   The LLP can also be used directly if ULP specific knowledge is built
   into the protocol stack to allow "parse and place" handling of
   received packets.  Such a solution either requires interaction with
   the ULP, or that the protocol stack have knowledge of ULP specific
   syntax rules.

   DDP achieves the benefits of directly placing incoming payload
   without requiring tight coupling between the ULP and the protocol
   stack.
   However, "parse and place" capabilities can certainly provide
   equivalent services to a limited number of ULPs.

4. Tagged Messages

   This section covers the major benefits from the use of Tagged
   Messages.

   A more critical advantage of DDP is the ability of the Data Source to
   use tagged buffers.  Tagging transfers allows the Data Source to
   choose the ordering and packetization of its payload deliveries.
   With direct data placement based solely upon pre-posted receives, the
   packetization and delivery of payload must be agreed upon by the ULP
   peers.  Even if there is an encoding of what is being transferred, as
   is common with middleware solutions, this information is not
   understood at the application independent layers.  The directions on
   where to place the incoming data cannot be accessed without switching
   to the ULP first.  DDP provides a standardized 'packing list' which
   can be interpreted without requiring ULP interaction.  Indeed, it is
   designed to be implementable in hardware.

4.1 Order Independent Reception

   Tagged messages are directed to a buffer based on an included
   Steering Tag.  Additionally, no notice is provided to the ULP for
   each individual Tagged Message's arrival.  Together these allow
   tagged messages received out-of-order to be processed without
   intermediate buffering or additional notifications to the ULP.

4.2 Reduced ULP Notifications

   RDMAP further reduces required ULP interactions by consolidating
   completion notifications of tagged messages with the completion
   notification of a trailing untagged message.  For most ULPs this
   radically reduces the number of required ULP interactions even
   further.

   While RDMAP consolidation of notices is beneficial to most
   applications, it may be detrimental to some applications that
   benefit from streamed delivery to enable ULP processing of received
   data as promptly as possible.
   A ULP that uses RDMAP cannot begin
   processing any portion of an exchange until it receives notification
   that the entire exchange has been placed.  An "exchange" here is a
   set of zero or more tagged messages and a single terminating untagged
   message.  An application that would prefer to begin work on the
   received payload as soon as possible, no matter what order it arrived
   in, might prefer to work directly with the LLP.  RDMAP is optimized
   for applications that are more concerned with when the entire
   exchange is complete.

   An application that benefits from being able to begin processing of
   each received packet as quickly as possible may find that RDMAP
   interferes with that goal.

   Such an application might be able to retain most of the benefits of
   RDMAP by using the DDP layer directly.  However, in addition to
   taking on the responsibilities of the RDMAP layer, the application
   would likely have more difficulty finding support for a DDP-only API.
   Many hardware implementations may choose to tightly couple RDMAP and
   DDP, and might not provide an API directly to DDP services.

   These features minimize the required interactions with the ULP.  This
   can be extremely beneficial for applications that use multiple
   transport layer packets to accomplish what is a single ULP
   interaction.

4.3 Simplified ULP Exchanges

   The notification rules for Tagged Messages allow ULPs to create
   multi-message "exchanges" consisting of zero or more tagged messages,
   followed by a single untagged message, that represent a single step
   in the ULP interaction.  The receiving ULP is notified that the
   untagged message has arrived, and implicitly of any associated tagged
   messages.

   A ULP in which every exchange would naturally consist of only the
   untagged message would derive virtually no benefit from the use of
   RDMAP/DDP as opposed to SCTP.  But while tagged buffers are the
   justification for RDMAP/DDP, untagged buffers are still necessary.
   Without
   untagged buffers the only method to exchange buffer advertisements
   would involve out-of-band communications and/or sharing of compile
   time constants.  Most RDMA-aware ULPs use untagged buffers for
   requests and responses.  Buffer advertisements are typically done
   within these untagged messages.

   Limiting use of untagged buffers to requests and responses by moving
   all bulk data using tagged transfers can greatly simplify the amount
   of prediction that the Data Sink must perform in pre-posting receive
   buffers.  For example, a typical RDMA enabled interaction would
   consist of the following:

   o  The Client sends a transaction request to the Server as an
      untagged message.  This message includes buffer advertisements for
      the buffers where the results are to be placed.

   o  The Server sends multiple tagged messages to the advertised
      buffers.

   o  The Server sends a transaction reply as an untagged message to the
      Client.

   o  The Client receives a single notification, indicating completion
      of the interaction.

   With this type of exchange the pacing and required size of untagged
   buffers is highly predictable.  The variability of response sizes is
   absorbed by tagged transfers.

4.4 Order Independent Sending

   Use of tagged messages is especially applicable when the Data Sink
   does not know the actual size, structure or location of the content
   it is requesting (or updating).

   For example, suppose the Data Sink ULP needs to fetch four related
   pieces of data into four separate buffers.  With SCTP the Data Sink
   ULP could receive four messages into four separate buffers, only
   having to predict the maximum size of each.  However, it would have
   to dictate the order in which the Data Source supplied the separate
   pieces.
   If the Data Source found it advantageous to fetch them in a
   different order, it would have to use intermediate buffering to
   re-order the pieces into the expected order, even though the
   application only required that all four be delivered and did not
   truly have an ordering requirement.

   Techniques such as RAID striping and mirroring represent this same
   problem, but one step further.  What appears to be a single resource
   to the Data Sink is actually stored in separate locations by the Data
   Source.  Non-RDMA protocols would either require the Data Source to
   fetch the material in the desired order or force the Data Source to
   use its own holding buffers to assemble an image of the destination
   buffer.

   While sometimes referred to as a "buffer-to-buffer" solution, RDMA
   more fundamentally enables remote buffer access.  The ULP is free to
   work with larger remote buffers than it has locally.  This reduces
   buffering requirements and the number of times the data must be
   copied in an end-to-end transfer.

   There are numerous reasons why the Data Sink would not know the true
   order or location of the requested data.  These include per-client
   differences, differing record selections and/or sort orders, RAID
   striping, file fragmentation, volume fragmentation, volume mirroring
   and server-side dynamic compositing of content (such as server-side
   includes for HTTP).

   In all of these cases the Data Source is free to assemble the desired
   data in the Data Sink's buffer in whatever order the component data
   becomes available to it.  It is not constrained on ordering.  It does
   not have to assemble an image in its own memory before creating it in
   the Data Sink's buffers.

   Note that while DDP enables use of tagged messages for bulk transfer,
   there are some application scenarios where untagged messages would
   still be used for bulk transfer.
   For example, under the Direct
   Access File System (DAFS) protocol the file server does not expose
   its own memory to its clients.  A client wishing to write may
   advertise a buffer upon which the server will issue RDMA Reads.
   However, when performing a small write it may be preferable to
   include the data in the untagged message rather than incurring an
   additional round trip with the RDMA Read and its response.

4.5 Tagged Buffers as ULP Credits

   The handling of end-to-end buffer credits differs considerably
   between DDP and direct ULP use of either TCP or SCTP.

   With both TCP and SCTP, buffer credits are based upon the receiver
   granting transmit permission based on the total number of bytes.
   These credits reflect system buffering resources and/or simple flow
   control.  They do not represent ULP resources.

   DDP defines no standard flow control, but presumes the existence of a
   ULP mechanism.  The presumed mechanism is that the Data Sink ULP has
   issued credits to the Data Source allowing the Data Source to send a
   specific number of untagged messages.

   The ULP peers must ensure that the sender is aware of the maximum
   size that can be sent to any specific target buffer.  One method of
   doing so is to use a standard size for all untagged buffers within a
   given connection.  For example, DAFS specifies an initial size
   requirement for session establishment, during which the untagged
   buffer size for the remainder of the session is negotiated.

   Tagged buffers are ULP resources advertised directly from ULP to ULP.
   A DDP put to a known tagged buffer is constrained only by transport
   level flow control, not by available system buffering.

   Either tagged or untagged buffers allow bypassing of system buffer
   resources.  Use of tagged buffers additionally allows the Data Source
   to choose what order to exercise the credits in.
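   The distinction between the two kinds of credit can be sketched as a
   toy model.  An untagged message is matched by its Message Sequence
   Number to a buffer the Data Sink posted, so each posted buffer acts
   as one credit; a tagged message names its own destination (STag and
   offset), so the Data Source may send the pieces in any order.  The
   class and method names below are hypothetical and do not correspond
   to any RDMAP/DDP interface, and starting sequence numbers at 1 is an
   assumption of the sketch.

```python
class DataSink:
    """Toy Data Sink with posted untagged buffers and advertised tagged buffers."""

    def __init__(self):
        self.posted = []    # untagged buffers; the n-th posted serves MSN n
        self.tagged = {}    # STag -> bytearray advertised to the peer

    def post_receive(self, size):
        """Grant one untagged-message credit by posting a receive buffer."""
        self.posted.append(bytearray(size))

    def advertise(self, stag, size):
        """Expose a tagged buffer; the ULP would send (stag, size) to its peer."""
        self.tagged[stag] = bytearray(size)

    def place_untagged(self, msn, payload):
        """Untagged placement: the receiver's posting order picks the buffer."""
        index = msn - 1                      # assumed: MSN starts at 1
        if index < 0 or index >= len(self.posted):
            raise ValueError("no credit for this untagged message")
        buf = self.posted[index]
        if len(payload) > len(buf):
            raise ValueError("payload exceeds posted buffer")
        buf[:len(payload)] = payload
        return buf

    def place_tagged(self, stag, offset, payload):
        """Tagged placement: the sender names the buffer and the offset."""
        buf = self.tagged[stag]
        if offset < 0 or offset + len(payload) > len(buf):
            raise ValueError("write outside advertised buffer")
        buf[offset:offset + len(payload)] = payload


sink = DataSink()
sink.advertise(stag=7, size=12)
sink.post_receive(size=16)

# The Data Source chooses its own order for the tagged pieces.
sink.place_tagged(7, 8, b"CCCC")
sink.place_tagged(7, 0, b"AAAA")
sink.place_tagged(7, 4, b"BBBB")
reply = sink.place_untagged(1, b"done")

print(bytes(sink.tagged[7]))   # b'AAAABBBBCCCC'
```

   A second untagged message would be rejected here because only one
   receive buffer was posted, which is the sense in which posted
   buffers act as message credits.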

   To the extent allowed by the ULP, tagged buffers are also divisible
   resources.  The Data Sink can advertise a single 100 KB buffer, and
   then receive notifications from its peer that the peer had written
   50 KB, 20 KB and 30 KB to that buffer in three successive
   transactions.

   ULP management of tagged buffer resources, independent of transport
   and DDP layer credits, is an additional benefit of RDMA protocols.
   Large bulk transfers cannot be blocked by limited general purpose
   buffering capacity.  Applications can flow control based upon higher
   level abstractions, such as number of outstanding requests,
   independent of the amount of data that must be transferred.

   However, use of system buffering, as offered by direct use of the
   underlying transports, can be preferable under certain circumstances.

   One example would be when the number of target ULP buffers is
   sufficiently large, and the rate at which any writes arrive is
   sufficiently low, that pinning all the target ULP buffers in memory
   would be undesirable.  The maximum transfer rate, and hence the
   maximum amount of system buffering required, may be more stable and
   predictable than the total ULP buffer exposure.

   Another would be when the Data Sink wishes to receive a stream of
   data at a predictable rate, but does not know in advance what the
   size of each data packet will be.  This is common for streaming media
   that has been encoded with a variable bit rate.  With DDP the Data
   Sink would either have to use untagged buffers large enough for the
   largest packet, or advertise a circular buffer.  If for security or
   other reasons the Data Sink did not want the size of its buffer to be
   publicly known, using the underlying SCTP transport directly may be
   preferable because of its byte-oriented credits.

5. RDMA Read

   RDMA Reads are a further service provided by RDMAP.
   RDMA Reads allow
   the Data Sink to fetch exactly the portion of the peer ULP buffer
   required on a "just in time" basis.  This can be done without
   requiring per-fetch support from the Data Source ULP.

   Storage servers may wish to limit the maximum write buffer allocated
   to any single session.  The storage server may be a very minimal
   layer between the client and the disk storage media, or the server
   may merely wish to limit the total resources that would be required
   if all clients could push the entire payload they wished written at
   their own convenience.

   In either case, there is little benefit in transferring data from the
   Data Source far in advance of when it will be written to the
   persistent storage media.  RDMA Reads allow the Storage Server to
   fetch the payload on a "just in time" basis.  In this fashion a
   relatively small number of block sized buffers can be used to execute
   a single transaction that specified writing a large file, or a
   Storage Server with numerous clients can fetch buffers from the
   individual clients in the order that is most convenient to the
   server.

   This same capability can be used when the desired portion of the
   advertised buffer is not known in advance.  For example, the
   advertised buffer could contain performance statistics.  The Data
   Sink could request the portions of the data it required, without
   requiring an interaction with the Data Source ULP.

   This is applicable for many applications that publish semi-volatile
   data that does not require transactional validity checking (i.e.,
   authorized users have read access to the entire set of data).  It is
   less applicable when there are ULP consistency checks that must be
   performed upon the data.  Such applications would be better served by
   having the client send a request, and having the server use RDMA
   Writes to publish the requested data.
   Neither RDMAP nor DDP provides
   mechanisms for bundling multiple disjoint updates into an atomic
   operation.  Therefore use of an advertised buffer as a data resource
   is subject to the same caveats as any randomly updated data resource,
   such as a flat file, that does not enforce its own consistency.

6. LLP Comparisons

   Normally the choice of underlying IP transport is irrelevant to the
   ULP.  RDMAP and DDP provide the same services over either.  There
   may be performance impacts of the choice, however.  It is the
   responsibility of the ULP to determine which IP transport is best
   suited to its needs.

   SCTP provides for preservation of message boundaries.  Each DDP
   segment will be delivered within a single SCTP packet.  The
   equivalent services are only available with TCP through the use of
   the MPA adaptation layer.

6.1 Multistreaming Implications

   SCTP also provides multi-streaming.  When the same pair of hosts have
   need for multiple DDP streams this can be a major advantage.  A
   single SCTP association carries multiple DDP streams, consolidating
   connection setup and flow control.

   Completions are controlled by the DDP Source Sequence Number
   (DDP-SSN) on a per stream basis.  Therefore combining multiple DDP
   Streams into a single SCTP association cannot result in a dropped
   packet carrying data for one stream delaying completions on others.

6.2 Out of Order Reception Implications

   The use of unordered Data Chunks with SCTP guarantees that the DDP
   layer will be able to perform placements when IP datagrams are
   received out of order.

   Placement of out-of-order DDP Segments carried over MPA/TCP is not
   guaranteed, but certainly allowed.  The ability of the MPA receiver
   to process out-of-order DDP Segments may be impaired when TCP
   alignment is lost.  Using SCTP, each DDP Segment is encoded in a
   single Data Chunk and never spread over multiple IP datagrams.
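   The per-stream completion ordering described in Section 6.1 can be
   illustrated with a small simulation.  It is a sketch only: segments
   are reduced to (stream, sequence number) pairs, the function name is
   hypothetical, sequence numbers are assumed to start at 1 on every
   stream, and the arrival pattern is invented to model one delayed
   packet.

```python
from collections import defaultdict


def completions_in_arrival_order(arrivals):
    """Return the list of (stream, seq) completions reported after each arrival.

    Completions on a stream are reported in sequence order, but each stream
    keeps its own counter, so a gap on one stream does not hold back another.
    """
    next_expected = defaultdict(lambda: 1)   # per-stream next sequence number
    pending = defaultdict(set)               # arrived but not yet completed
    timeline = []
    for stream, seq in arrivals:
        pending[stream].add(seq)
        completed = []
        while next_expected[stream] in pending[stream]:
            pending[stream].remove(next_expected[stream])
            completed.append((stream, next_expected[stream]))
            next_expected[stream] += 1
        timeline.append(completed)
    return timeline


# Stream "A" loses its first segment, which is retransmitted last.
arrivals = [("A", 2), ("B", 1), ("B", 2), ("A", 3), ("A", 1)]
timeline = completions_in_arrival_order(arrivals)
for arrival, completed in zip(arrivals, timeline):
    print(arrival, "->", completed)
```

   Stream B completes both of its messages while stream A is still
   waiting for the retransmission.  In this model only completions are
   deferred on stream A; nothing prevents its out-of-order segments from
   being placed as they arrive.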

6.3 Header and Marker Overhead

   MPA and TCP headers together are smaller than the headers used by
   SCTP and its adaptation layer.  However, this advantage can be
   considerably reduced by the insertion of MPA markers.  In any event
   the difference in ULP payload per IP Datagram is not likely to be a
   significant factor.

   Even with the MPA adaptation layer, DDP traffic will appear to all
   network elements as a normal TCP connection.  In many environments
   there may be a requirement to use only TCP connections to satisfy
   existing network elements and/or to facilitate monitoring and control
   of connections.

   A DDP stream delivered via MPA/TCP will require more processing
   effort than one delivered over SCTP.  However, this extra work may be
   justified for many deployments where full SCTP support is unavailable
   in the intermediate network.

6.4 Data Integrity Implications

   Both the SCTP and MPA/TCP adaptations provide end-to-end CRC32c
   protection against data corruption, or its equivalent.

   A ULP that requires a greater degree of protection may add its own.
   However, DDP and RDMAP headers will only be guaranteed to have the
   equivalent of end-to-end CRC32c protection.  A ULP that requires data
   integrity checking more thorough than an end-to-end CRC32c should
   first invalidate all STags that reference a buffer before applying
   its own integrity check.

6.5 Non-IP Transports

   DDP is defined to operate over ubiquitous IP transports such as SCTP
   and TCP.  This enables a new DDP-enabled node to be added anywhere to
   an IP network.  No DDP-specific support from middle-boxes is
   required.

   There are non-IP transport fabrics that offer RDMA capabilities.
   Because these capabilities are integrated with the transport protocol
   they have some technical advantages when compared to RDMA over IP.
   For example, fencing of RDMA operations can be based upon transport
   level acks.
Because DDP is cleanly layered over an IP transport, any 580 explicit RDMA layer ack must be separate from the transport layer 581 ack. 583 There may be deployments where the benefits of RDMA/transport 584 integration outweigh the benefits of being on an IP network. 586 6.6 Other IP Transports 588 Both TCP and SCTP provide DDP with reliable transport with TCP 589 friendly rate control. As currently DDP is defined to work over 590 reliable transports and implicitly relies upon some form of rate 591 control. 593 DDP is fully compatible with a non-reliable protocol. Out-of-order 594 placement is obviously not dependent on whether the other DDP 595 Segments ever actually arrive. 597 However, RDMAP requires the LLP to provide reliable service. An 598 alternate completion handling protocol would be required if DDP were 599 to be deployed over an unreliable IP transport. 601 As noted in the prior section on tagged buffers as ULP credits, 602 neither RDMAP or DDP provide any flow control for tagged messages. 603 If no transport layer flow control is provided, an RDMAP/DDP 604 application would be only limited by the link layer rate, almost 605 inevitably resulting in severe network congestion. 607 RDMAP encourages applications to be ignorant of the underlying 608 transport PMTU. The ULP is only notified when all messages ending in 609 a single untagged message have completed. The ULP is not aware of 610 the granularity or ordering of the underlying message. This approach 611 assumes that the ULP is only interested in the complete set of 612 messages, and has no use for a subset of them. 614 7. Local Interface Implications 616 Full utilization of DDP and RDMAP capabilities requires a local 617 interface that explicitly requests these services. Protocols such as 618 Sockets Direct Protocol (SDP) can allow applications to keep their 619 traditional byte-stream or message-stream interface and still enjoy 620 many of the benefits of the optimized wire level protocols. 622 8. 
Security considerations 624 8.1 Connection/Association Setup 626 Both the SCTP and TCP adaptations allow for existing procedures to be 627 followed for the establishment of the SCTP association or TCP 628 connection. Use of DDP does not impair the use of any security 629 measures to filter, validate and/or log the remote end of an 630 association/connection. 632 8.2 Tagged Buffer Exposure 634 DDP only exposes ULP memory to the extent explicitly allowed by ULP 635 actions. These include posting of receive operations and enabling of 636 Steering Tags. 638 Neither RDMAP or DDP place requirements on how ULP's advertise 639 buffers. A ULP may use a single Steering Tag for multiple buffer 640 advertisements. However, the ULP should be aware that enforcement on 641 STag usage is likely limited to the overall range that is enabled. 642 If the remote peer writes into the 'wrong' advertised buffer, neither 643 the DDP or RDMAP layer will be aware of this. Nor is there any 644 report to the ULP on how the remote peer specifically used tagged 645 buffers. 647 Unless the ULP peers have an adequate basis for mutual trust, the 648 receiving ULP might be well advised to use a distinct STag for each 649 interaction, and to invalidate it after each use or to require its 650 peer to use the RDMAP option to invalidate the STag with its 651 responding untagged message. 653 8.3 Impact of Encrypted Transports 655 While DDP is cleanly layered over the LLP, its maximum benefit may be 656 limited when the LLP Stream is secured with a streaming cypher, such 657 as Transport Layer Security (TLS). If the LLP must decrypt in order, 658 it cannot provide out-of-order DDP Segments to the DDP layer for 659 placement purposes. IPsec tunnel mode encrypts entire IP Datagrams. 660 IPsec transport mode encrypts TCP Segments or SCTP packets. In 661 neither case should IPsec preclude providing out-of-order DDP 662 Segments to the DDP layer for placement. 
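
   The in-order constraint described above can be sketched as follows.
   This is a toy model, assuming records identified by a simple
   sequence number; it performs no cryptography and is not TLS.  The
   names InOrderGate and receive are hypothetical.  It shows only that
   when records must be processed in order, records arriving ahead of
   a missing one are held back and cannot be handed to the DDP layer
   for placement.

```python
from typing import Dict, List, Tuple


class InOrderGate:
    """Models an LLP that can only process records in stream order."""

    def __init__(self) -> None:
        self.next_seq = 0                # next sequence number that may be released
        self.held: Dict[int, bytes] = {}  # records waiting for earlier ones

    def receive(self, seq: int, record: bytes) -> List[bytes]:
        """Accept one record and return the records that can now be released."""
        self.held[seq] = record
        released: List[bytes] = []
        # Release the longest contiguous run starting at next_seq.
        while self.next_seq in self.held:
            released.append(self.held.pop(self.next_seq))
            self.next_seq += 1
        return released


def demo_in_order() -> List[List[bytes]]:
    gate = InOrderGate()
    # Records arrive in reverse order; record 0 is the delayed one.
    arrivals: List[Tuple[int, bytes]] = [(2, b"seg2"), (1, b"seg1"), (0, b"seg0")]
    return [gate.receive(seq, rec) for seq, rec in arrivals]
```

   In this run the records numbered 2 and 1 are held until record 0
   arrives, so nothing reaches the placement layer early.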

664     Note that end-to-end use of IPsec cryptographic integrity protection
665     may allow suppression of MPA CRC generation and checking under
666     certain circumstances.  This is one example where the LLP may be
667     judged to have "or equivalent" protection to an end-to-end CRC32c.

669  References

671     [1]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
672          Levels", BCP 14, RFC 2119, March 1997.

674     [2]  Dierks, T., Allen, C., Treese, W., Karlton, P., Freier, A. and
675          P. Kocher, "The TLS Protocol Version 1.0", RFC 2246, January
676          1999.

678     [3]  Kent, S. and R. Atkinson, "IP Encapsulating Security Payload
679          (ESP)", RFC 2406, November 1998.

681     [4]  Stewart, R., Xie, Q., Morneault, K., Sharp, C., Schwarzbauer,
682          H., Taylor, T., Rytina, I., Kalla, M., Zhang, L. and V. Paxson,
683          "Stream Control Transmission Protocol", RFC 2960, October 2000.

685     [5]  Coene, L., "Stream Control Transmission Protocol Applicability
686          Statement", RFC 3257, April 2002.

688     [6]  Recio, R., "An RDMA Protocol Specification", draft-ietf-rddp-
689          rdmap-00 (work in progress), February 2003.

691     [7]  Shah, H., "Direct Data Placement over Reliable Transports",
692          draft-ietf-rddp-ddp-00 (work in progress), February 2003.

694     [8]  Stewart, R., "Stream Control Transmission Protocol (SCTP) Remote
695          Direct Memory Access (RDMA) Direct Data Placement (DDP)
696          Adaption", draft-stewart-rddp-sctp-02 (work in progress),
697          February 2003.

699     [9]  Culley, P., "Marker PDU Aligned Framing for TCP Specification",
700          draft-culley-iwarp-mpa-02 (work in progress), February 2003.

702  Authors' Addresses

704     Caitlin Bestler
705     1241 W. North Shore
706     # 2G
707     Chicago, IL  60626
708     USA

710     Phone: +1-773-743-1594
711     EMail: cait@asomi.com
712     Lode Coene
713     Atealaan 26
714     Herentals,   2200
715     Belgium

717     Phone: +32-14-252081
718     EMail: lode.coene@siemens.com

720  Full Copyright Statement

722     Copyright (C) The Internet Society (2003).  All Rights Reserved.

724     This document and translations of it may be copied and furnished to
725     others, and derivative works that comment on or otherwise explain it
726     or assist in its implementation may be prepared, copied, published
727     and distributed, in whole or in part, without restriction of any
728     kind, provided that the above copyright notice and this paragraph are
729     included on all such copies and derivative works.  However, this
730     document itself may not be modified in any way, such as by removing
731     the copyright notice or references to the Internet Society or other
732     Internet organizations, except as needed for the purpose of
733     developing Internet standards in which case the procedures for
734     copyrights defined in the Internet Standards process must be
735     followed, or as required to translate it into languages other than
736     English.

738     The limited permissions granted above are perpetual and will not be
739     revoked by the Internet Society or its successors or assigns.

741     This document and the information contained herein is provided on an
742     "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
743     TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
744     BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
745     HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
746     MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

748  Acknowledgement

750     Funding for the RFC Editor function is currently provided by the
751     Internet Society.