idnits 2.17.1 draft-ietf-rddp-rdmap-01.txt: ** The Abstract section seems to be numbered -(2675): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == There is 1 instance of lines with non-ascii characters in the document. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an IANA Considerations section. (See Section 2.2 of https://www.ietf.org/id-info/checklist for how to handle the case when there are no actions for IANA.) ** There is 1 instance of too long lines in the document, the longest one being 1 character in excess of 72. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 652: '... RDMAP MUST be layered on top of ...' RFC 2119 keyword, line 883: '...the DDP Protocol MUST be used by the R...' RFC 2119 keyword, line 884: '...of the bits in the first octet MUST be...' RFC 2119 keyword, line 888: '... field MUST be used by RDMAP to c...' RFC 2119 keyword, line 904: '... the RDMAP Layer to the DDP layer MUST...' (137 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The "Author's Address" (or "Authors' Addresses") section title is misspelled. 
-- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'IPSEC' is mentioned on line 2047, but not defined == Unused Reference: 'RFC2401' is defined on line 2242, but no explicit reference was found in the text -- Possible downref: Normative reference to a draft: ref. 'DDP' == Outdated reference: A later version (-03) exists of draft-culley-iwarp-mpa-01 -- Possible downref: Normative reference to a draft: ref. 'MPA' ** Obsolete normative reference: RFC 2960 (ref. 'SCTP') (Obsoleted by RFC 4960) ** Obsolete normative reference: RFC 793 (ref. 'TCP') (Obsoleted by RFC 9293) -- Obsolete informational reference (is this intentional?): RFC 2401 (Obsoleted by RFC 4301) -- Obsolete informational reference (is this intentional?): RFC 2246 (ref. 'TLS') (Obsoleted by RFC 4346) == Outdated reference: A later version (-10) exists of draft-ietf-rddp-security-00 Summary: 7 errors (**), 0 flaws (~~), 7 warnings (==), 6 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 INTERNET DRAFT R. Recio 3 draft-ietf-rddp-rdmap-01.txt IBM Corporation 4 P. Culley 5 Hewlett-Packard Company 6 D. 
Garcia 7 Hewlett-Packard Company 8 J. Hilland 9 Hewlett-Packard Company 11 Expires: April, 2004 13 An RDMA Protocol Specification 15 1 Status of this Memo 17 This document is an Internet-Draft and is subject to all 18 provisions of Section 10 of RFC2026. 20 Internet-Drafts are working documents of the Internet Engineering 21 Task Force (IETF), its areas, and its working groups. Note that 22 other groups may also distribute working documents as Internet- 23 Drafts. 25 Internet-Drafts are draft documents valid for a maximum of six 26 months and may be updated, replaced, or obsoleted by other 27 documents at any time. It is inappropriate to use Internet-Drafts 28 as reference material or to cite them other than as "work in 29 progress." 31 The list of current Internet-Drafts can be accessed at 32 http://www.ietf.org/1id-abstracts.html The list of Internet-Draft 33 Shadow Directories can be accessed at 34 http://www.ietf.org/shadow.html. 36 2 Abstract 38 This document defines a Remote Direct Memory Access Protocol 39 (RDMAP) that operates over the Direct Data Placement Protocol (DDP 40 protocol). RDMAP provides read and write services directly to 41 applications and enables data to be transferred directly into ULP 42 Buffers without intermediate data copies. It also enables a kernel 43 bypass implementation. 
45 Table of Contents 47 1 Status of this Memo...................................1 48 2 Abstract.............................................1 49 3 Introduction.........................................4 50 3.1 Architectural Goals...................................4 51 3.2 Protocol Overview.....................................5 52 3.3 RDMAP Layering .......................................7 53 4 Glossary.............................................9 54 4.1 General..............................................9 55 4.2 LLP................................................10 56 4.3 Direct Data Placement (DDP)...........................11 57 4.4 Remote Direct Memory Access (RDMA).....................13 58 5 ULP and Transport Attributes..........................16 59 5.1 Transport Requirements & Assumptions...................16 60 5.2 RDMAP Interactions with the ULP........................17 61 6 Header Format.......................................21 62 6.1 RDMAP Control and Invalidate STag Field.................21 63 6.2 RDMA Message Definitions..............................23 64 6.3 RDMA Write Header....................................24 65 6.4 RDMA Read Request Header..............................25 66 6.5 RDMA Read Response Header.............................26 67 6.6 Send Header and Send with Solicited Event Header.........27 68 6.7 Send with Invalidate Header and Send with SE and Invalidate 69 Header..................................................27 70 6.8 Terminate Header.....................................27 71 7 Data Transfer.......................................33 72 7.1 RDMA Write Message...................................33 73 7.2 RDMA Read Operation..................................34 74 7.2.1 RDMA Read Request Message ...........................34 75 7.2.2 RDMA Read Response Message...........................35 76 7.3 Send Message Type....................................36 77 7.4 Terminate Message....................................38 78 7.5 Ordering and 
Completions..............................38 79 8 RDMAP Stream Management...............................43 80 8.1 Stream Initialization ................................43 81 8.2 Stream Teardown......................................44 82 8.2.1 RDMAP Abortive Termination...........................44 83 9 RDMAP Error Management................................46 84 9.1 RDMAP Error Surfacing ................................46 85 9.2 Errors Detected at the Remote Peer on Incoming RDMA Messages47 86 10 Security Considerations...............................49 87 10.1 Protocol-specific Security Considerations..............49 88 10.2 Using IPSec with RDMAP..............................49 89 10.3 Other Security Considerations........................49 90 11 References..........................................54 91 11.1 Normative References................................54 92 11.2 Informative References..............................54 93 12 Appendix............................................55 94 12.1 DDP Segment Formats for RDMA Messages.................55 95 12.1.1 DDP Segment for RDMA Write.........................55 96 12.1.2 DDP Segment for RDMA Read Request...................55 97 12.1.3 DDP Segment for RDMA Read Response..................56 98 12.1.4 DDP Segment for Send and Send with Solicited Event....57 99 12.1.5 DDP Segment for Send with Invalidate and Send with SE and 100 Invalidate...............................................57 101 12.1.6 DDP Segment for Terminate..........................58 102 12.2 Ordering and Completion Table........................59 103 13 Authors Addresses....................................62 104 14 Acknowledgments......................................63 105 15 Full Copyright Statement..............................66 107 Table of Figures 109 Figure 1 RDMAP Layering....................................7 110 Figure 2 Example of MPA, DDP, and RDMAP Header Alignment over TCP8 111 Figure 3 DDP Control, RDMAP Control, and Invalidate STag 
Fields.22 112 Figure 4 RDMA Usage of DDP Fields...........................23 113 Figure 5 RDMA Message Definitions...........................24 114 Figure 6 RDMA Read Request Header Format.....................25 115 Figure 7 Terminate Header Format ...........................28 116 Figure 8 Terminate Control Field ...........................28 117 Figure 9 Terminate Control Field Values .....................31 118 Figure 10 Error Type to RDMA Message Mapping.................32 119 Figure 11 RDMA Write, DDP Segment format.....................55 120 Figure 12 RDMA Read Request, DDP Segment format ..............56 121 Figure 13 RDMA Read Response, DDP Segment format..............57 122 Figure 14 Send and Send with Solicited Event, DDP Segment format57 123 Figure 15 Send with Invalidate and Send with SE and Invalidate, 124 DDP Segment..............................................58 125 Figure 16 Terminate, DDP Segment format .....................58 126 Figure 17 Operation Ordering...............................61 128 3 Introduction 130 Today, communications over TCP/IP typically require copy 131 operations, which add latency and consume significant CPU and 132 memory resources. The Remote Direct Memory Access Protocol 133 (RDMAP) enables removal of data copy operations and enables 134 reduction in latencies by allowing a local application to read or 135 write data on a remote computer's memory with minimal demands on 136 memory bus bandwidth and CPU processing overhead, while preserving 137 memory protection semantics. 139 RDMAP is layered on top of Direct Data Placement (DDP) and uses 140 the two Buffer Models available from DDP [DDP]. 142 3.1 Architectural Goals 144 RDMAP has been designed with the following high-level 145 architectural goals: 147 * Provide a data transfer operation that allows a Local Peer to 148 transfer up to 2^32 - 1 octets directly into a previously 149 advertised buffer (i.e. 
Tagged buffer) located at a Remote Peer 150 without requiring a copy operation. This is referred to as the 151 RDMA Write data transfer operation. 153 * Provide a data transfer operation that allows a Local Peer to 154 retrieve up to 2^32 - 1 octets directly from a previously 155 advertised buffer (i.e. Tagged buffer) located at a Remote Peer 156 without requiring a copy operation. This is referred to as the 157 RDMA Read data transfer operation. 159 * Provide a data transfer operation that allows a Local Peer to 160 send up to 2^32 - 1 octets directly into a buffer located at a 161 Remote Peer that has not been explicitly advertised. This is 162 referred to as the Send (Send with Invalidate, Send with 163 Solicited Event, and Send with Solicited Event and Invalidate) 164 data transfer operation. 166 * Enable the local ULP to use the Send Operation Type (includes 167 Send, Send with Invalidate, Send with Solicited Event, and Send 168 with Solicited Event and Invalidate) to signal to the remote 169 ULP the Completion of all previous Messages initiated by the 170 local ULP. 172 * Provide for all Operations on a single RDMAP Stream to be 173 reliably transmitted in the order that they were submitted. 175 * Provide RDMAP capabilities independently for each Stream when 176 the LLP supports multiple data Streams within an LLP 177 connection. 179 3.2 Protocol Overview 181 RDMAP provides seven data transfer operations. Except for the RDMA 182 Read operation, each operation generates exactly one RDMA Message. 183 Following is a brief overview of the RDMA Operations and RDMA 184 Messages: 186 1. Send - A Send operation uses a Send Message to transfer data 187 from the Data Source into a buffer that has not been 188 explicitly Advertised by the Data Sink. The Send Message uses 189 the DDP Untagged Buffer Model to transfer the ULP Message into 190 the Data Sink's Untagged Buffer. 192 2. 
Send with Invalidate - A Send with Invalidate operation uses a 193 Send with Invalidate Message to transfer data from the Data 194 Source into a buffer that has not been explicitly Advertised 195 by the Data Sink. The Send with Invalidate Message includes 196 all functionality of the Send Message, with one addition: an 197 STag field is included in the Send With Invalidate Message and 198 after the message has been Placed and Delivered at the Data 199 Sink the remote peer's buffer identified by the STag can no 200 longer be accessed remotely until the remote peer's ULP re- 201 enables access and Advertises the buffer. 203 3. Send with Solicited Event (Send with SE) - A Send with 204 Solicited Event operation uses a Send with Solicited Event 205 Message to transfer data from the Data Source into an Untagged 206 Buffer at the Data Sink. The Send with Solicited Event Message 207 is similar to the Send Message, with one addition: when the 208 Send with Solicited Event Message has been Placed and 209 Delivered, an Event may be generated at the recipient, if the 210 recipient is configured to generate such an Event. 212 4. Send with Solicited Event and Invalidate (Send with SE and 213 Invalidate) - A Send with Solicited Event and Invalidate 214 operation uses a Send with Solicited Event and Invalidate 215 Message to transfer data from the Data Source into a buffer 216 that has not been explicitly Advertised by the Data Sink. The 217 Send with Solicited Event and Invalidate Message is similar to 218 the Send with Invalidate Message, with one addition: when the 219 Send with Solicited Event and Invalidate Message has been 220 Placed and Delivered, an Event may be generated at the 221 recipient, if the recipient is configured to generate such an 222 Event. 224 5. Remote Direct Memory Access Write - An RDMA Write operation 225 uses an RDMA Write Message to transfer data from the Data 226 Source to a previously advertised buffer at the Data Sink. 
228 The ULP at the Remote Peer, which in this case is the Data 229 Sink, enables the Data Sink Tagged Buffer for access and 230 Advertises the buffer's size (length), location (Tagged 231 Offset), and Steering Tag (STag) to the Data Source through a 232 ULP specific mechanism. The ULP at the Local Peer, which in 233 this case is the Data Source, initiates the RDMA Write 234 operation. The RDMA Write Message uses the DDP Tagged Buffer 235 Model to transfer the ULP Message into the Data Sink's Tagged 236 Buffer. Note: the STag associated with the Tagged Buffer 237 remains valid until the ULP at the Remote Peer invalidates it 238 or the ULP at the Local Peer invalidates it through a Send 239 with Invalidate or Send with Solicited Event and Invalidate. 241 6. Remote Direct Memory Access Read - The RDMA Read operation 242 transfers data to a Tagged Buffer at the Local Peer, which in 243 this case is the Data Sink, from a Tagged Buffer at the Remote 244 Peer, which in this case is the Data Source. The ULP at the 245 Data Source enables the Data Source Tagged Buffer for access 246 and Advertises the buffer's size (length), location (Tagged 247 Offset), and Steering Tag (STag) to the Data Sink through a 248 ULP specific mechanism. The ULP at the Data Sink enables the 249 Data Sink Tagged Buffer for access and initiates the RDMA Read 250 operation. The RDMA Read operation consists of a single RDMA 251 Read Request Message and a single RDMA Read Response Message, 252 and the latter may be segmented into multiple DDP Segments. 254 The RDMA Read Request Message uses the DDP Untagged Buffer 255 Model to Deliver the STag, starting Tagged Offset and length 256 for both the Data Source and Data Sink Tagged Buffers to the 257 remote peer's RDMA Read Request Queue. 259 The RDMA Read Response Message uses the DDP Tagged Buffer 260 Model to Deliver the Data Source's Tagged Buffer to the Data 261 Sink, without any involvement from the ULP at the Data Source. 
263 Note: the Data Source STag associated with the Tagged Buffer 264 remains valid until the ULP at the Data Source invalidates it 265 or the ULP at the Data Sink invalidates it through a Send with 266 Invalidate or Send with Solicited Event and Invalidate. The 267 Data Sink STag associated with the Tagged Buffer remains valid 268 until the ULP at the Data Sink invalidates it.

270 7. Terminate - A Terminate operation uses a Terminate Message to 271 transfer to the Remote Peer information associated with an 272 error that occurred at the Local Peer. The Terminate Message 273 uses the DDP Untagged Buffer Model to transfer the Message 274 into the Data Sink's Untagged Buffer.

276 3.3 RDMAP Layering

278 RDMAP is dependent on DDP, subject to the requirements defined in 279 section 5.1, 281 Transport Requirements & Assumptions. Figure 1, RDMAP Layering, 282 depicts the relationship between Upper Layer Protocols (ULPs), 283 RDMAP, the DDP protocol, the framing layer, and the transport. For 284 protocol definitions of each LLP, see [MPA], [TCP], and [SCTP].
286 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
287 |                                   |
288 |    Upper Layer Protocol (ULP)     |
289 |                                   |
290 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
291 |                                   |
292 |               RDMAP               |
293 |                                   |
294 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
295 |                                   |
296 |           DDP protocol            |
297 |                                   |
298 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
299 |                 |                 |
300 |       MPA       |                 |
301 |                 |                 |
302 +-+-+-+-+-+-+-+-+-+      SCTP       |
303 |                 |                 |
304 |       TCP       |                 |
305 |                 |                 |
306 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
307 Figure 1 RDMAP Layering

309 If RDMAP is layered over DDP/MPA/TCP, then the respective headers 310 and ULP Payload are arranged as follows (Note: For clarity, MPA 311 header and CRC fields are included but MPA markers are not shown):

313  0                   1                   2                   3
314  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
315 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
316 |                                                               |
317 //                         TCP Header                          //
318 |                                                               |
319 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
320 |          MPA Header           |                               |
321 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
322 |                                                               |
323 //                         DDP Header                          //
324 |                                                               |
325 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
326 |                                                               |
327 //                         RDMA Header                         //
328 |                                                               |
329 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
330 |                                                               |
331 //                         ULP Payload                         //
332 |                   (shown with no pad bytes)                   |
333 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
334 |                            MPA CRC                            |
335 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
336 Figure 2 Example of MPA, DDP, and RDMAP Header Alignment over TCP

338 4 Glossary

340 4.1 General

342 Advertisement (Advertised, Advertise, Advertisements, Advertises) 343 - the act of informing a Remote Peer that a local RDMA Buffer 344 is available to it. A Node makes available an RDMA Buffer for 345 incoming RDMA Read or RDMA Write access by informing its 346 RDMA/DDP peer of the Tagged Buffer identifiers (STag, base 347 address, and buffer length). This advertisement of Tagged 348 Buffer information is not defined by RDMA/DDP and is left to 349 the ULP.
A typical method would be for the Local Peer to embed 350 the Tagged Buffer's Steering Tag, base address, and length in 351 a Send Message destined for the Remote Peer. 353 Data Sink - The peer receiving a data payload. Note that the Data 354 Sink can be required to both send and receive RDMA/DDP 355 Messages to transfer a data payload. 357 Data Source - The peer sending a data payload. Note that the Data 358 Source can be required to both send and receive RDMA/DDP 359 Messages to transfer a data payload. 361 Data Delivery (Delivery, Delivered, Delivers) - Delivery is 362 defined as the process of informing the ULP or consumer that a 363 particular Message is available for use. This is specifically 364 different from "Placement", which may generally occur in any 365 order, while the order of "Delivery" is strictly defined. See 366 "Data Placement". 368 Fabric - The collection of links, switches, and routers that 369 connect a set of Nodes with RDMA/DDP protocol implementations. 371 Fence (Fenced, Fences) - To block the current RDMA Operation from 372 executing until prior RDMA Operations have Completed. 374 iWARP - A suite of wire protocols comprised of RDMAP, DDP, and 375 MPA. The iWARP protocol suite may be layered above TCP, SCTP, 376 or other transport protocols. 378 Local Peer - The RDMA/DDP protocol implementation on the local end 379 of the connection. Used to refer to the local entity when 380 describing a protocol exchange or other interaction between 381 two Nodes. 383 Node - A computing device attached to one or more links of a 384 Fabric (network). A Node in this context does not refer to a 385 specific application or protocol instantiation running on the 386 computer. A Node may consist of one or more RNICs installed in 387 a host computer. 389 Remote Peer - The RDMA/DDP protocol implementation on the opposite 390 end of the connection. 
Used to refer to the remote entity when 391 describing protocol exchanges or other interactions between 392 two Nodes. 394 RNIC - RDMA Network Interface Controller. In this context, this 395 would be a network I/O adapter or embedded controller with 396 iWARP and verbs functionality. 398 RNIC Interface (RI) - The presentation of the RNIC to the verbs 399 Consumer as implemented through the combination of the RNIC 400 and the RNIC driver. 402 ULP - Upper Layer Protocol. The protocol layer above the protocol 403 layer currently being referenced. The ULP for RDMA/DDP is 404 expected to be an OS, Application, adaptation layer, or 405 proprietary device. The RDMA/DDP documents do not specify a 406 ULP - they provide a set of semantics that allow a ULP to be 407 designed to utilize RDMA/DDP. 409 ULP Payload - The ULP data that is contained within a single 410 protocol segment or packet (e.g. a DDP Segment). 412 Verbs - An abstract description of the functionality of a RNIC 413 Interface. The OS may expose some or all of this functionality 414 via one or more APIs to applications. The OS will also use 415 some of the functionality to manage the RNIC Interface. 417 4.2 LLP 419 LLP - Lower Layer Protocol. The protocol layer beneath the 420 protocol layer currently being referenced. For example, for 421 DDP the LLP is SCTP, MPA, or other transport protocols. For 422 RDMA, the LLP is DDP. 424 LLP Connection - Corresponds to an LLP transport-level connection 425 between the peer LLP layers on two nodes. 427 LLP Stream - Corresponds to a single LLP transport-level Stream 428 between the peer LLP layers on two Nodes. One or more LLP 429 Streams may map to a single transport-level LLP connection. 430 For transport protocols that support multiple Streams per 431 connection (e.g. SCTP), a LLP Stream corresponds to one 432 transport-level Stream. 434 MULPDU - Maximum ULPDU. The current maximum size of the record 435 that is acceptable for DDP to pass to the LLP for 436 transmission. 
438 ULPDU - Upper Layer Protocol Data Unit. The data record defined 439 by the layer above MPA. 441 4.3 Direct Data Placement (DDP) 443 Data Placement (Placement, Placed, Places) - For DDP, this term is 444 specifically used to indicate the process of writing to a data 445 buffer by a DDP implementation. DDP Segments carry Placement 446 information, which may be used by the receiving DDP 447 implementation to perform Data Placement of the DDP Segment 448 ULP Payload. See "Data Delivery". 450 DDP Abortive Teardown - The act of closing a DDP Stream without 451 attempting to Complete in-progress and pending DDP Messages. 453 DDP Graceful Teardown - The act of closing a DDP Stream such that 454 all in-progress and pending DDP Messages are allowed to 455 Complete successfully. 457 DDP Control Field - a fixed 16-bit field in the DDP Header. The 458 DDP Control Field contains an 8-bit field whose contents are 459 reserved for use by the ULP. 461 DDP Header - The header present in all DDP segments. The DDP 462 Header contains control and Placement fields that are used to 463 define the final Placement location for the ULP payload 464 carried in a DDP Segment. 466 DDP Message - A ULP defined unit of data interchange, which is 467 subdivided into one or more DDP segments. This segmentation 468 may occur for a variety of reasons, including segmentation to 469 respect the maximum segment size of the underlying transport 470 protocol. 472 DDP Segment - The smallest unit of data transfer for the DDP 473 protocol. It includes a DDP Header and ULP Payload (if 474 present). A DDP Segment should be sized to fit within the 475 underlying transport protocol MULPDU. 477 DDP Stream - a sequence of DDP Messages whose ordering is defined 478 by the LLP. For SCTP, a DDP Stream maps directly to an SCTP 479 Stream. For MPA, a DDP Stream maps directly to a TCP 480 connection and a single DDP Stream is supported. Note that 481 DDP has no ordering guarantees between DDP Streams. 
483 Direct Data Placement - A mechanism whereby ULP data contained 484 within DDP Segments may be Placed directly into its final 485 destination in memory without processing of the ULP. This may 486 occur even when the DDP Segments arrive out of order. Out of 487 order Placement support may require the Data Sink to implement 488 the LLP and DDP as one functional block. 490 Direct Data Placement Protocol (DDP) - Also, a wire protocol that 491 supports Direct Data Placement by associating explicit memory 492 buffer placement information with the LLP payload units. 494 Message Offset (MO) - For the DDP Untagged Buffer Model, specifies 495 the offset, in bytes, from the start of a DDP Message. 497 Message Sequence Number (MSN) - For the DDP Untagged Buffer Model, 498 specifies a sequence number that is increasing with each DDP 499 Message. 501 Queue Number (QN) - For the DDP Untagged Buffer Model, identifies 502 a destination Data Sink queue for a DDP Segment. 504 Steering Tag - An identifier of a Tagged Buffer on a Node, valid 505 as defined within a protocol specification. 507 STag - Steering Tag 509 Tagged Buffer - A buffer that is explicitly Advertised to the 510 Remote Peer through exchange of an STag, Tagged Offset, and 511 length. 513 Tagged Buffer Model - A DDP data transfer model used to transfer 514 Tagged Buffers from the Local Peer to the Remote Peer. 516 Tagged DDP Message - A DDP Message that targets a Tagged Buffer. 518 Tagged Offset (TO) - The offset within a Tagged Buffer on a Node. 520 Untagged Buffer - A buffer that is not explicitly Advertised to 521 the Remote Peer. 523 Untagged Buffer Model - A DDP data transfer model used to transfer 524 Untagged Buffers from the Local Peer to the Remote Peer. 526 Untagged DDP Message - A DDP Message that targets an Untagged 527 Buffer. 
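The two DDP buffer models defined in this section can be illustrated with a short sketch. It is not part of the protocol definition: the class names, the segment_message function, and the max_payload parameter are hypothetical, and header sizes are ignored. Under those assumptions it shows how one DDP Message is subdivided into DDP Segments, each carrying its own Placement information (STag and Tagged Offset for the Tagged Buffer Model; Queue Number, MSN, and Message Offset for the Untagged Buffer Model).

```python
from dataclasses import dataclass
from typing import Iterator, Tuple, Union


@dataclass(frozen=True)
class TaggedPlacement:
    stag: int            # Steering Tag identifying the Tagged Buffer
    tagged_offset: int   # TO: offset within the Tagged Buffer


@dataclass(frozen=True)
class UntaggedPlacement:
    queue_number: int    # QN: destination Data Sink queue
    msn: int             # MSN: increases with each DDP Message
    message_offset: int  # MO: offset in bytes from the start of the DDP Message


Placement = Union[TaggedPlacement, UntaggedPlacement]


def segment_message(payload: bytes, first: Placement,
                    max_payload: int) -> Iterator[Tuple[Placement, bytes]]:
    """Yield (placement, chunk) pairs, one per DDP Segment.

    max_payload is a hypothetical per-segment payload limit standing in
    for the space left in the MULPDU after headers.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    offset = 0
    while True:
        chunk = payload[offset:offset + max_payload]
        if isinstance(first, TaggedPlacement):
            # Tagged model: only the Tagged Offset advances per segment.
            where = TaggedPlacement(first.stag, first.tagged_offset + offset)
        else:
            # Untagged model: QN and MSN are fixed for the whole Message;
            # only the Message Offset advances per segment.
            where = UntaggedPlacement(first.queue_number, first.msn,
                                      first.message_offset + offset)
        yield where, chunk
        offset += len(chunk)
        if offset >= len(payload):
            break
```

Because every segment carries its own offset, a receiver could Place segments in any arrival order, which is the property the Direct Data Placement definition relies on. A zero-length Message still yields one segment with an empty payload.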
529 4.4 Remote Direct Memory Access (RDMA)

531 Event - An indication provided by the RDMAP Layer to the ULP to 532 indicate a Completion or other condition requiring immediate 533 attention.

535 Invalidate STag - A mechanism used to prevent the Remote Peer from 536 reusing a previously Advertised STag, until the Local 537 Peer makes it available through a subsequent explicit 538 Advertisement. The STag cannot be accessed remotely until it 539 is explicitly Advertised again.

541 RDMA Completion (Completion, Completed, Complete, Completes) - For 542 RDMA, Completion is defined as the process of informing the 543 ULP that a particular RDMA Operation has performed all 544 functions specified for the RDMA Operation, including 545 Placement and Delivery. The Completion semantic of each RDMA 546 Operation is distinctly defined.

548 RDMA Message - A data transfer mechanism used to fulfill an RDMA 549 Operation.

551 RDMA Operation - A sequence of RDMA Messages, including control 552 Messages, to transfer data from a Data Source to a Data Sink. 553 The following RDMA Operations are defined - RDMA Write, RDMA 554 Read, Send, Send with Invalidate, Send with Solicited Event, 555 Send with Solicited Event and Invalidate, and Terminate.

557 RDMA Protocol (RDMAP) - A wire protocol that supports RDMA 558 Operations to transfer ULP data between a Local Peer and the 559 Remote Peer.

561 RDMAP Abortive Termination (Termination, Terminated, Terminate, 562 Terminates) - The act of closing an RDMAP Stream without 563 attempting to Complete in-progress and pending RDMA 564 Operations.

566 RDMAP Graceful Termination - The act of closing an RDMAP Stream 567 such that all in-progress and pending RDMA Operations are 568 allowed to Complete successfully.

570 RDMA Read - An RDMA Operation used by the Data Sink to transfer 571 the contents of a source RDMA buffer from the Remote Peer to 572 the Local Peer.
An RDMA Read operation consists of a single 573 RDMA Read Request Message and a single RDMA Read Response 574 Message. 576 RDMA Read Request - An RDMA Message used by the Data Sink to 577 request the Data Source to transfer the contents of an RDMA 578 buffer. The RDMA Read Request Message describes both the Data 579 Source and Data Sink RDMA buffers. 581 RDMA Read Request Queue - The queue used for processing RDMA Read 582 Requests. The RDMA Read Request Queue has a DDP Queue Number 583 of 1. 585 RDMA Read Response - An RDMA Message used by the Data Source to 586 transfer the contents of an RDMA buffer to the Data Sink, in 587 response to an RDMA Read Request. The RDMA Read Response 588 Message only describes the data sink RDMA buffer. 590 RDMAP Stream - An association between a pair of RDMAP 591 implementations, possibly on different Nodes, which transfer 592 ULP data using RDMA Operations. There may be multiple RDMAP 593 Streams on a single Node. An RDMAP Stream maps directly to a 594 single DDP Stream. 596 RDMA Write - An RDMA Operation that transfers the contents of a 597 source RDMA Buffer from the Local Peer to a destination RDMA 598 Buffer at the Remote Peer using RDMA. The RDMA Write Message 599 only describes the Data Sink RDMA buffer. 601 Remote Direct Memory Access (RDMA) - A method of accessing memory 602 on a remote system in which the local system specifies the 603 remote location of the data to be transferred. Employing a 604 RNIC in the remote system allows the access to take place 605 without interrupting the processing of the CPU(s) on the 606 system. 608 Send - An RDMA Operation that transfers the contents of a ULP 609 Buffer from the Local Peer to an Untagged Buffer at the Remote 610 Peer. 612 Send Message Type - A Send Message, Send with Invalidate Message, 613 Send with Solicited Event Message, or Send with Solicited 614 Event and Invalidate Message. 
616 Send Operation Type - A Send Operation, Send with Invalidate 617 Operation, Send with Solicited Event Operation, or Send with 618 Solicited Event and Invalidate Operation. 620 Solicited Event (SE) - A facility by which an RDMA Operation 621 sender may cause an Event to be generated at the recipient, if 622 the recipient is configured to generate such an Event, when a 623 Send with Solicited Event or Send with Solicited Event and 624 Invalidate Message is received. Note: The Local Peer's ULP 625 can use the Solicited Event mechanism to ensure that Messages 626 designated as important to the ULP are handled in an 627 expeditious manner by the Remote Peer's ULP. The ULP at the 628 Local Peer can indicate a given Send Message Type is important 629 by using the Send with Solicited Event Message or Send with 630 Solicited Event and Invalidate Message. The ULP at the Remote 631 Peer can choose to only be notified when valid Send with 632 Solicited Event Messages and/or Send with Solicited Event and 633 Invalidate Messages arrive and handle other valid incoming 634 Send Messages or Send with Invalidate Messages at its leisure. 636 Terminate - An RDMA Message used by a Node to pass an error 637 indication to the peer Node on an RDMAP Stream. This operation 638 is for RDMAP use only. 640 ULP Buffer - A buffer owned above the RDMAP Layer and advertised 641 to the RDMAP Layer either as a Tagged Buffer or an Untagged 642 ULP Buffer. 644 ULP Message - The ULP data that is handed to a specific protocol 645 layer for transmission. Data boundaries are preserved as they 646 are transmitted through iWARP. 648 5 ULP and Transport Attributes 650 5.1 Transport Requirements & Assumptions 652 RDMAP MUST be layered on top of the Direct Data Placement Protocol 653 [DDP]. 
   RDMAP requires the following DDP support:

   *   RDMAP uses three queues for Untagged Buffers:

       *   Queue Number 0 (used by RDMAP for Send, Send with
           Invalidate, Send with Solicited Event, and Send with
           Solicited Event and Invalidate operations).

       *   Queue Number 1 (used by RDMAP for RDMA Read operations).

       *   Queue Number 2 (used by RDMAP for Terminate operations).

   *   DDP maps a single RDMA Message to a single DDP Message.

   *   DDP uses the STag and Tagged Offset provided by the RDMAP for
       Tagged Buffer Messages (i.e., RDMA Write and RDMA Read
       Response).

   *   When the DDP layer Delivers an Untagged DDP Message to the
       RDMAP layer, DDP provides the length of the DDP Message. This
       ensures that RDMAP does not have to carry a length field in its
       header.

   *   When the RDMAP layer provides an RDMA Message to the DDP Layer,
       DDP must insert the RsvdULP field value provided by the RDMAP
       Layer into the associated DDP Message.

   *   When the DDP layer Delivers a DDP Message to the RDMAP layer,
       DDP provides the RsvdULP field.

   *   The RsvdULP field must be 1 octet for DDP Tagged Messages and 5
       octets for DDP Untagged Messages.

   *   DDP propagates to RDMAP all operation or protection errors
       (used by RDMAP Terminate) and, when appropriate, the DDP Header
       fields of the DDP Segment that encountered the error.

   *   If an RDMA Operation is aborted by DDP or a lower layer, the
       contents of the Data Sink buffers associated with the operation
       are considered indeterminate.

   *   DDP, in conjunction with the lower layers, provides reliable,
       in-order Delivery.

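   As a non-normative illustration, the queue assignments and RsvdULP
   sizes listed above can be written down as a small lookup. This is
   only a sketch: the names RdmapQueue, RSVD_ULP_OCTETS, and queue_for,
   and the message-type strings used as keys, are hypothetical and are
   not defined by this specification.

```python
from enum import IntEnum
from typing import Optional


class RdmapQueue(IntEnum):
    """DDP Queue Numbers that RDMAP uses for Untagged Buffers."""
    SEND = 0          # Send, Send with Invalidate, Send with SE, Send with SE and Invalidate
    READ_REQUEST = 1  # RDMA Read Request Messages
    TERMINATE = 2     # Terminate Messages


# Size of the RsvdULP field, in octets, for each DDP buffer model.
RSVD_ULP_OCTETS = {"tagged": 1, "untagged": 5}

# Hypothetical message-type labels; the queue numbers come from the list above.
_UNTAGGED_QUEUE = {
    "Send": RdmapQueue.SEND,
    "Send with Invalidate": RdmapQueue.SEND,
    "Send with Solicited Event": RdmapQueue.SEND,
    "Send with Solicited Event and Invalidate": RdmapQueue.SEND,
    "RDMA Read Request": RdmapQueue.READ_REQUEST,
    "Terminate": RdmapQueue.TERMINATE,
}
# Tagged Buffer Messages carry an STag and Tagged Offset instead of a queue.
_TAGGED = {"RDMA Write", "RDMA Read Response"}


def queue_for(message_type: str) -> Optional[RdmapQueue]:
    """Return the DDP Queue Number, or None for Tagged Buffer Messages."""
    if message_type in _TAGGED:
        return None
    return _UNTAGGED_QUEUE[message_type]
```

   For example, queue_for("Terminate") yields queue 2, while
   queue_for("RDMA Write") yields None because Tagged Buffer Messages
   are steered by STag and Tagged Offset rather than by a queue.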
5.2  RDMAP Interactions with the ULP

   RDMAP provides the ULP with access to the following RDMA
   Operations as defined in this specification:

   *   Send

   *   Send with Solicited Event

   *   Send with Invalidate

   *   Send with Solicited Event and Invalidate

   *   RDMA Write

   *   RDMA Read

   For Send Operation Types, the following are the interactions
   between the RDMAP Layer and the ULP:

   *   At the Data Source:

       *   The ULP passes to the RDMAP Layer the following:

           *   ULP Message Length

           *   ULP Message

           *   An indication of the Send Operation Type, where the
               valid types are: Send, Send with Solicited Event, Send
               with Invalidate, or Send with Solicited Event and
               Invalidate.

           *   An Invalidate STag, if the Send Operation Type was
               Send with Invalidate or Send with Solicited Event and
               Invalidate.

       *   When the Send Operation Type Completes, an indication of
           the Completion results.

   *   At the Data Sink:

       *   If the Send Operation Type Completed successfully, the
           RDMAP Layer passes the following information to the ULP
           Layer:

           *   ULP Message Length

           *   ULP Message

           *   An Event, if the Data Sink is configured to generate
               an Event.

           *   An Invalidated STag, if the Send Operation Type was
               Send with Invalidate or Send with Solicited Event and
               Invalidate.

       *   If the Send Operation Type Completed in error, the Data
           Sink RDMAP Layer will pass up the corresponding error
           information to the Data Sink ULP and send a Terminate
           Message to the Data Source RDMAP Layer. The Data Source
           RDMAP Layer will then pass up the Terminate Message to the
           ULP.

   For RDMA Write Operations, the following are the interactions
   between the RDMAP Layer and the ULP:

   *   At the Data Source:

       *   The ULP passes to the RDMAP Layer the following:

           *   ULP Message Length

           *   ULP Message

           *   Data Sink STag

           *   Data Sink Tagged Offset

       *   When the RDMA Write Operation Completes, an indication of
           the Completion results.

   *   At the Data Sink:

       *   If the RDMA Write completed successfully, the RDMAP Layer
           does not Deliver the RDMA Write to the ULP. It does Place
           the ULP Message transferred through the RDMA Write Message
           into the ULP Buffer.

       *   If the RDMA Write completed in error, the Data Sink RDMAP
           Layer will pass up the corresponding error information to
           the Data Sink ULP and send a Terminate Message to the Data
           Source RDMAP Layer. The Data Source RDMAP Layer will then
           pass up the Terminate Message to the ULP.

   For RDMA Read Operations, the following are the interactions
   between the RDMAP Layer and the ULP:

   *   At the Data Sink:

       *   The ULP passes to the RDMAP Layer the following:

           *   ULP Message Length

           *   Data Source STag

           *   Data Sink STag

           *   Data Source Tagged Offset

           *   Data Sink Tagged Offset

       *   When the RDMA Read Operation Completes, an indication of
           the Completion results.

   *   At the Data Source:

       *   If no error occurred while processing the RDMA Read
           Request, the Data Source will not pass up any information
           to the ULP.

       *   If an error occurred while processing the RDMA Read
           Request, the Data Source RDMAP Layer will pass up the
           corresponding error information to the Data Source ULP and
           send a Terminate Message to the Data Sink RDMAP Layer. The
           Data Sink RDMAP Layer will then pass up the Terminate
           Message to the ULP.

   For STags made available to the RDMAP Layer, the following are the
   interactions between the RDMAP Layer and the ULP:

   *   If the ULP enables an STag, the ULP passes to the RDMAP Layer
       the:

       *   STag;

       *   range of Tagged Offsets that are associated with a given
           STag;

       *   remote access rights (read, write, or read and write)
           associated with a given, valid STag; and

       *   association between a given STag and a given RDMAP Stream.

   *   If the ULP disables an STag, the ULP passes to the RDMAP Layer
       the STag.

   If an error occurs at the RDMAP Layer, the RDMAP Layer may pass
   back error information (e.g., the content of a Terminate Message)
   to the ULP.

6  Header Format

   The control information of RDMA Messages is included in DDP
   protocol defined header fields, with the following exceptions:

   *   The first octet reserved for ULP usage on all DDP Messages in
       the DDP Protocol (i.e., the RsvdULP Field) is used by RDMAP to
       carry the RDMA Message Opcode and the RDMAP version. This octet
       is known as the RDMAP Control Field in this specification. For
       Send with Invalidate and Send with Solicited Event and
       Invalidate, RDMAP uses the second through fifth octets provided
       by DDP on Untagged DDP Messages to carry the STag that will be
       Invalidated.

   *   The RDMA Message length is passed by the RDMAP layer to the DDP
       layer on all outbound transfers.

   *   For RDMA Read Request Messages, the RDMA Read Message Size is
       included in the RDMA Read Request Header.

   *   The RDMA Message length is passed to the RDMAP Layer by the DDP
       layer on inbound Untagged Buffer transfers.

   *   Two RDMA Messages carry additional RDMAP headers. The RDMA Read
       Request carries the Data Sink and Data Source buffer
       descriptions, including buffer length. The Terminate carries
       additional information associated with the error that caused
       the Terminate.

6.1  RDMAP Control and Invalidate STag Field

   The version of RDMAP defined by this specification uses all 8 bits
   of the RDMAP Control Field. The first octet reserved for ULP use
   in the DDP Protocol MUST be used by the RDMAP to carry the RDMAP
   Control Field. The ordering of the bits in the first octet MUST be
   as defined in Figure 3 DDP Control, RDMAP Control, and Invalidate
   STag Fields. For Send with Invalidate and Send with Solicited Event
   and Invalidate, the second through fifth octets of the DDP RsvdULP
   field MUST be used by RDMAP to carry the Invalidate STag. Figure 3
   DDP Control, RDMAP Control, and Invalidate STag Fields depicts the
   format of the DDP Control and RDMAP Control fields. (Note: In
   Figure 3 DDP Control, RDMAP Control, and Invalidate STag Fields,
   the DDP Header is offset by 16 bits to accommodate the MPA header
   defined in [MPA]. The MPA header is only present if DDP is layered
   on top of MPA.)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |T|L| Resrv | DV| RV|Rsv| Opcode|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Invalidate STag                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     Figure 3 DDP Control, RDMAP Control, and Invalidate STag Fields

   All RDMA Messages handed by the RDMAP Layer to the DDP layer MUST
   define the value of the Tagged flag in the DDP Header. Figure 4
   RDMA Usage of DDP Fields MUST be used to define the value of the
   Tagged flag that is handed to the DDP Layer for each RDMA Message.

   Figure 4 RDMA Usage of DDP Fields defines the value of the RDMA
   Opcode field that MUST be used for each RDMA Message.

   Figure 4 RDMA Usage of DDP Fields defines when the STag, Queue
   Number, and Tagged Offset fields MUST be provided for each RDMA
   Message.

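   As a non-normative illustration, the RsvdULP contents that RDMAP
   hands to DDP can be sketched in Python. The sketch assumes the usual
   network bit numbering of the figures (bit 24 is the most significant
   bit of the RDMAP Control Field octet) and takes the version,
   reserved, and OpCode bit positions, and the OpCode values, from the
   bit list and Figure 4 in this section. The names control_octet and
   rsvd_ulp are hypothetical.

```python
import struct

RDMAP_VERSION = 0b01  # RV value required by this section (bits 24-25)

# OpCode values from Figure 4.
OP_RDMA_WRITE = 0b0000
OP_READ_REQUEST = 0b0001
OP_READ_RESPONSE = 0b0010
OP_SEND = 0b0011
OP_SEND_INV = 0b0100
OP_SEND_SE = 0b0101
OP_SEND_SE_INV = 0b0110
OP_TERMINATE = 0b0111

TAGGED_OPCODES = {OP_RDMA_WRITE, OP_READ_RESPONSE}
INVALIDATE_OPCODES = {OP_SEND_INV, OP_SEND_SE_INV}


def control_octet(opcode: int) -> int:
    """Build the RDMAP Control Field: RV (2 bits), Rsv (2 bits), OpCode (4 bits)."""
    if not 0 <= opcode <= OP_TERMINATE:
        raise ValueError("OpCodes 1000b to 1111b are reserved")
    # Reserved bits 26-27 are left as zero.
    return (RDMAP_VERSION << 6) | opcode


def rsvd_ulp(opcode: int, invalidate_stag: int = 0) -> bytes:
    """Return the RsvdULP field: 1 octet if Tagged, 5 octets if Untagged."""
    ctrl = control_octet(opcode)
    if opcode in TAGGED_OPCODES:
        return bytes([ctrl])
    # The Invalidate STag is only meaningful for the two Invalidate
    # opcodes; for every other Untagged Message it is sent as zero.
    stag = invalidate_stag if opcode in INVALIDATE_OPCODES else 0
    return struct.pack(">BI", ctrl, stag)
```

   Under these assumptions, a Send with Invalidate carries the control
   octet followed by the four STag octets in network byte order, and
   an RDMA Write carries the control octet alone.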
   For this version of the RDMAP, all RDMA Messages MUST have:

   *   Bits 24-25; RDMA Version field: 01b.

   *   Bits 26-27; Reserved. MUST be set to zero by sender, ignored by
       the receiver.

   *   Bits 28-31; OpCode field: see Figure 4 RDMA Usage of DDP
       Fields.

   *   Bits 32-63; Invalidate STag. However, this field is only valid
       for Send with Invalidate and Send with Solicited Event and
       Invalidate Messages (see Figure 4 RDMA Usage of DDP Fields).
       For Send, Send with Solicited Event, RDMA Read Request, and
       Terminate, the Invalidate STag field MUST be set to zero on
       transmit and ignored by the receiver.

   -------+-----------+-------+------+-------+-----------+--------------
   RDMA   | Message   | Tagged| STag | Queue | Invalidate| Message
   Message| Type      | Flag  | and  | Number| STag      | Length
   OpCode |           |       | TO   |       |           | Communicated
          |           |       |      |       |           | between DDP
          |           |       |      |       |           | and RDMAP
   -------+-----------+-------+------+-------+-----------+--------------
   0000b  | RDMA Write| 1     | Valid| N/A   | N/A       | Yes
   -------+-----------+-------+------+-------+-----------+--------------
   0001b  | RDMA Read | 0     | N/A  | 1     | N/A       | Yes
          | Request   |       |      |       |           |
   -------+-----------+-------+------+-------+-----------+--------------
   0010b  | RDMA Read | 1     | Valid| N/A   | N/A       | Yes
          | Response  |       |      |       |           |
   -------+-----------+-------+------+-------+-----------+--------------
   0011b  | Send      | 0     | N/A  | 0     | N/A       | Yes
   -------+-----------+-------+------+-------+-----------+--------------
   0100b  | Send with | 0     | N/A  | 0     | Valid     | Yes
          | Invalidate|       |      |       |           |
   -------+-----------+-------+------+-------+-----------+--------------
   0101b  | Send with | 0     | N/A  | 0     | N/A       | Yes
          | SE        |       |      |       |           |
   -------+-----------+-------+------+-------+-----------+--------------
   0110b  | Send with | 0     | N/A  | 0     | Valid     | Yes
          | SE and    |       |      |       |           |
          | Invalidate|       |      |       |           |
   -------+-----------+-------+------+-------+-----------+--------------
   0111b  | Terminate | 0     | N/A  | 2     | N/A       | Yes
   -------+-----------+-------+------+-------+-----------+--------------
   1000b  |           |
   to     | Reserved  |                Not Specified
   1111b  |           |
   -------+-----------+-------------------------------------------------
                    Figure 4 RDMA Usage of DDP Fields

   Note: N/A means Not Applicable.

6.2  RDMA Message Definitions

   The following figure defines which RDMA Headers MUST be used on
   each RDMA Message and which RDMA Messages are allowed to carry ULP
   payload:

   -------+-----------+-------------------+-------------------------
   RDMA   | Message   | RDMA Header Used  | ULP Message allowed in
   Message| Type      |                   | the RDMA Message
   OpCode |           |                   |
   -------+-----------+-------------------+-------------------------
   0000b  | RDMA Write| None              | Yes
   -------+-----------+-------------------+-------------------------
   0001b  | RDMA Read | RDMA Read Request | No
          | Request   | Header            |
   -------+-----------+-------------------+-------------------------
   0010b  | RDMA Read | None              | Yes
          | Response  |                   |
   -------+-----------+-------------------+-------------------------
   0011b  | Send      | None              | Yes
   -------+-----------+-------------------+-------------------------
   0100b  | Send with | None              | Yes
          | Invalidate|                   |
   -------+-----------+-------------------+-------------------------
   0101b  | Send with | None              | Yes
          | SE        |                   |
   -------+-----------+-------------------+-------------------------
   0110b  | Send with | None              | Yes
          | SE and    |                   |
          | Invalidate|                   |
   -------+-----------+-------------------+-------------------------
   0111b  | Terminate | Terminate Header  | No
   -------+-----------+-------------------+-------------------------
   1000b  |           |
   to     | Reserved  |              Not Specified
   1111b  |           |
   -------+-----------+-------------------+-------------------------
                    Figure 5 RDMA Message Definitions

6.3  RDMA Write Header

   The RDMA Write Message does not include an RDMAP header. The RDMAP
   layer passes to the DDP layer an RDMAP Control Field. The RDMA
   Write Message is fully described by the DDP Headers of the DDP
   Segments associated with the Message.

   See Section 12 Appendix for a description of the DDP Segment
   format associated with RDMA Write Messages.

6.4  RDMA Read Request Header

   The RDMA Read Request Message carries an RDMA Read Request Header
   that describes the Data Sink and Data Source Buffers used by the
   RDMA Read operation. The RDMA Read Request Header immediately
   follows the DDP header. The RDMAP layer passes to the DDP layer an
   RDMAP Control Field. The following figure depicts the RDMA Read
   Request Header that MUST be used for all RDMA Read Request
   Messages:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Data Sink STag (SinkSTag)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +               Data Sink Tagged Offset (SinkTO)                +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               RDMA Read Message Size (RDMARDSZ)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Data Source STag (SrcSTag)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +               Data Source Tagged Offset (SrcTO)               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                Figure 6 RDMA Read Request Header Format

   Data Sink Steering Tag: 32 bits.

       The Data Sink Steering Tag identifies the Data Sink's Tagged
       Buffer.
       This field MUST be copied, without interpretation, from
       the RDMA Read Request into the corresponding RDMA Read
       Response and allows the Data Sink to place the returning
       data. The STag is associated with the RDMAP Stream through a
       mechanism that is outside the scope of the RDMAP
       specification (see Section 10.3 Other Security
       Considerations).

   Data Sink Tagged Offset: 64 bits.

       The Data Sink Tagged Offset specifies the starting offset, in
       octets, from the base of the Data Sink's Tagged Buffer, where
       the data is to be written by the Data Source. This field is
       copied from the RDMA Read Request into the corresponding RDMA
       Read Response and allows the Data Sink to place the returning
       data. The Data Sink Tagged Offset MAY start at an arbitrary
       offset.

       The Data Sink STag and Data Sink Tagged Offset fields
       describe the buffer to which the RDMA Read data is written.

       Note: the DDP Layer protects against a wrap of the Data Sink
       Tagged Offset.

   RDMA Read Message Size: 32 bits.

       The RDMA Read Message Size is the amount of data, in octets,
       read from the Data Source. A single RDMA Read Request Message
       can retrieve from 0 to 2^32-1 data octets from the Data
       Source.

   Data Source Steering Tag: 32 bits.

       The Data Source Steering Tag identifies the Data Source's
       Tagged Buffer. The STag is associated with the RDMAP Stream
       through a mechanism that is outside the scope of the RDMAP
       specification (see Section 10.3 Other Security
       Considerations).

   Data Source Tagged Offset: 64 bits.

       The Data Source Tagged Offset specifies the starting offset,
       in octets, that is to be read from the Data Source's Tagged
       Buffer. The Data Source Tagged Offset MAY start at an
       arbitrary offset.

       The Data Source STag and Data Source Tagged Offset fields
       describe the buffer from which the RDMA Read data is read.

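   As a non-normative illustration, the five fields of the RDMA Read
   Request Header can be serialized in the order shown in Figure 6.
   The sketch below assumes network byte order for every field and uses
   Python's struct module; the class name ReadRequestHeader and its
   attribute names are hypothetical.

```python
import struct
from typing import NamedTuple

# SinkSTag (32), SinkTO (64), RDMARDSZ (32), SrcSTag (32), SrcTO (64),
# all big-endian with no padding: 28 octets (224 bits) in total.
_READ_REQUEST_FORMAT = ">IQIIQ"
READ_REQUEST_HEADER_OCTETS = struct.calcsize(_READ_REQUEST_FORMAT)


class ReadRequestHeader(NamedTuple):
    sink_stag: int  # Data Sink Steering Tag
    sink_to: int    # Data Sink Tagged Offset
    size: int       # RDMA Read Message Size, in octets
    src_stag: int   # Data Source Steering Tag
    src_to: int     # Data Source Tagged Offset

    def pack(self) -> bytes:
        """Serialize the header; struct.error is raised on out-of-range values."""
        return struct.pack(_READ_REQUEST_FORMAT, *self)

    @classmethod
    def unpack(cls, data: bytes) -> "ReadRequestHeader":
        """Parse the header from the first 28 octets of data."""
        fields = struct.unpack(
            _READ_REQUEST_FORMAT, data[:READ_REQUEST_HEADER_OCTETS]
        )
        return cls(*fields)
```

   Packing and then unpacking a header returns the original field
   values, and the packed length matches the 224-bit size that the
   Terminate Header description gives for the Terminated RDMA Header.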
   See Section 9.2 Errors Detected at the Remote Peer on Incoming
   RDMA Messages for a description of error checking required upon
   processing of an RDMA Read Request at the Data Source.

6.5  RDMA Read Response Header

   The RDMA Read Response Message does not include an RDMAP header.
   The RDMAP layer passes to the DDP layer an RDMAP Control Field.
   The RDMA Read Response Message is fully described by the DDP
   Headers of the DDP Segments associated with the Message.

   See Section 12 Appendix for a description of the DDP Segment
   format associated with RDMA Read Response Messages.

6.6  Send Header and Send with Solicited Event Header

   The Send and Send with Solicited Event Messages do not include an
   RDMAP header. The RDMAP layer passes to the DDP layer an RDMAP
   Control Field. The Send and Send with Solicited Event Messages are
   fully described by the DDP Headers of the DDP Segments associated
   with the Message.

   See Section 12 Appendix for a description of the DDP Segment
   format associated with Send and Send with Solicited Event
   Messages.

6.7  Send with Invalidate Header and Send with SE and Invalidate
     Header

   The Send with Invalidate and Send with Solicited Event and
   Invalidate Messages do not include an RDMAP header. The RDMAP layer
   passes to the DDP layer an RDMAP Control Field and the Invalidate
   STag field (see Section 6.1 RDMAP Control and Invalidate STag
   Field). The Send with Invalidate and Send with Solicited Event and
   Invalidate Messages are fully described by the DDP Headers of the
   DDP Segments associated with the Message.

   See Section 12 Appendix for a description of the DDP Segment
   format associated with Send with Invalidate and Send with
   Solicited Event and Invalidate Messages.

6.8  Terminate Header

   The Terminate Message carries a Terminate Header that contains
   additional information associated with the cause of the Terminate.
   The Terminate Header immediately follows the DDP header. The RDMAP
   layer passes to the DDP layer an RDMAP Control Field. The
   following figure depicts a Terminate Header that MUST be used for
   the Terminate Message:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Terminate Control          |        Reserved         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  DDP Segment Length (if any)  |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               |
   //                                                             //
   |                Terminated DDP Header (if any)                 |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   //                                                             //
   |                Terminated RDMA Header (if any)                |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                    Figure 7 Terminate Header Format

   Terminate Control: 19 bits.

       The Terminate Control field MUST have the format defined in
       Figure 8 Terminate Control Field.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Layer | EType |  Error Code   |HdrCt|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            Figure 8 Terminate Control Field

       *   Figure 9 Terminate Control Field Values defines the valid
           values that MUST be used for this field.

       *   Layer: 4 bits.

           Identifies the layer that encountered the error.

       *   EType (RDMA Error Type): 4 bits.

           Identifies the type of error that caused the Terminate.
           When the error is detected at the RDMAP Layer, the RDMAP
           Layer inserts the Error Type into this field. When the
           error is detected at an LLP layer, the LLP layer creates
           the Error Type, the DDP layer passes it up to the RDMAP
           Layer, and the RDMAP Layer inserts it into this field.

       *   Error Code: 8 bits.

           This field identifies the specific error that caused the
           Terminate. When the error is detected at the RDMAP Layer,
           the RDMAP Layer creates the Error Code. When the error is
           detected at an LLP layer, the LLP layer creates the Error
           Code, the DDP layer passes it up to the RDMAP Layer, and
           the RDMAP Layer inserts it into this field.

       *   HdrCt: 3 bits.

           Header control bits:

           *   M: 1 bit. DDP Segment Length valid. See Figure 10
               for when this bit SHOULD be set.

           *   D: 1 bit. DDP Header Included. See Figure 10 for
               when this bit SHOULD be set.

           *   R: 1 bit. RDMAP Header Included. See Figure 10 for
               when this bit SHOULD be set.

   -------+----------+-------+-------------+------+--------------------
   Layer  | Layer    | Error | Error Type  | Error| Error Code Name
          | Name     | Type  | Name        | Code |
   -------+----------+-------+-------------+------+--------------------
          |          | 0000b | Local       | None | None
          |          |       | Catastrophic|      |
          |          |       | Error       |      |
          |          +-------+-------------+------+--------------------
          |          |       |             | 00X  | Invalid STag
          |          |       |             +------+--------------------
          |          |       |             | 01X  | Base or bounds
          |          |       |             |      | violation
          |          |       | Remote      +------+--------------------
          |          | 0001b | Protection  | 02X  | Access rights
          |          |       | Error       |      | violation
          |          |       |             +------+--------------------
   0000b  | RDMA     |       |             | 03X  | STag not associated
          |          |       |             |      | with RDMAP Stream
          |          |       |             +------+--------------------
          |          |       |             | 04X  | TO wrap
          |          |       |             +------+--------------------
          |          |       |             | 09X  | STag cannot be
          |          |       |             |      | Invalidated
          |          |       |             +------+--------------------
          |          |       |             | FFX  | Miscellaneous
          |          +-------+-------------+------+--------------------
          |          |       |             | 05X  | Invalid RDMAP
          |          |       |             |      | version
          |          |       |             +------+--------------------
          |          |       |             | 06X  | Unexpected OpCode
          |          |       | Remote      +------+--------------------
          |          | 0010b | Operation   | 07X  | Catastrophic error,
          |          |       | Error       |      | localized to RDMAP
          |          |       |             |      | Stream
          |          |       |             +------+--------------------
          |          |       |             | 08X  | Catastrophic error,
          |          |       |             |      | global
          |          |       |             +------+--------------------
          |          |       |             | 09X  | STag cannot be
          |          |       |             |      | Invalidated
          |          |       |             +------+--------------------
          |          |       |             | FFX  | Miscellaneous
   -------+----------+-------+-------------+------+--------------------
   0001b  | DDP      | See DDP Specification [DDP] for a description of
          |          | the values and names.
   -------+----------+-------+-----------------------------------------
   0010b  | LLP      | For MPA, see MPA Specification [MPA] for a
          | (eg MPA) | description of the values and names.
   -------+----------+-------+-----------------------------------------
                Figure 9 Terminate Control Field Values

   Reserved: 13 bits. This field MUST be set to zero on transmit,
       ignored on receive.

   DDP Segment Length: 16 bits.

       The length handed up by the DDP Layer when the error was
       detected. It MUST be valid if the M bit is set. It MUST be
       present when the D bit is set.

   Terminated DDP Header: 112 bits for Tagged Messages and 144 bits
       for Untagged Messages.

       The DDP Header of the incoming Message that is associated
       with the Terminate. The DDP Header is not present if the
       Terminate Error Type is a Local Catastrophic Error. It MUST
       be present if the D bit is set.

   Terminated RDMA Header: 224 bits.

       The Terminated RDMA Header is only sent back if the Terminate
       is associated with an RDMA Read Request Message. It MUST be
       present if the R bit is set.

       If the Terminate occurs before the first RDMA Read Request
       byte is processed, the original RDMA Read Request Header is
       sent back.

       If the Terminate occurs after the first RDMA Read Request
       byte is processed, the RDMA Read Request Header is updated to
       reflect the current location of the RDMA Read operation that
       is in process:

       *   Data Sink STag = Data Sink STag originally sent in the
           RDMA Read Request.

       *   Data Sink Tagged Offset = Current offset into the Data
           Sink Tagged Buffer. For example, if the RDMA Read
           Request was terminated after 2048 octets were sent,
           then the Data Sink Tagged Offset = the original Data
           Sink Tagged Offset + 2048.

       *   RDMA Read Message Size = Number of octets left to
           transfer.

       *   Data Source STag = Data Source STag in the RDMA Read
           Request.

       *   Data Source Tagged Offset = Current offset into the
           Data Source Tagged Buffer. For example, if the RDMA
           Read Request was terminated after 2048 octets were
           sent, then the Data Source Tagged Offset = the
           original Data Source Tagged Offset + 2048.

   Note: if a given LLP does not define any termination codes for the
   RDMAP Termination message to use, then none would be used for that
   LLP.

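   As a non-normative illustration, the first 32-bit word of the
   Terminate Header and the update of a partially processed RDMA Read
   Request Header can be sketched as follows. The sketch assumes
   network bit numbering for Figure 8 and assumes that the M, D, and R
   bits occupy the three HdrCt bit positions in the order in which they
   are listed (M first); this section does not state their positions
   explicitly. The function names are hypothetical.

```python
import struct
from typing import Tuple


def terminate_control_word(layer: int, etype: int, error_code: int,
                           m: bool = False, d: bool = False,
                           r: bool = False) -> bytes:
    """Pack Layer, EType, Error Code, and HdrCt into the first 32-bit word.

    The bits after HdrCt are the Reserved field and stay zero.
    """
    if not (0 <= layer < 16 and 0 <= etype < 16 and 0 <= error_code < 256):
        raise ValueError("field value out of range")
    word = (layer << 28) | (etype << 24) | (error_code << 16)
    # Assumed ordering: M is the first HdrCt bit, then D, then R.
    word |= (int(m) << 15) | (int(d) << 14) | (int(r) << 13)
    return struct.pack(">I", word)


def updated_read_request(sink_stag: int, sink_to: int, size: int,
                         src_stag: int, src_to: int,
                         octets_sent: int) -> Tuple[int, int, int, int, int]:
    """Return the header fields that reflect the current read position."""
    if not 0 <= octets_sent <= size:
        raise ValueError("octets_sent must lie within the requested size")
    return (
        sink_stag,               # unchanged
        sink_to + octets_sent,   # current offset into the Data Sink buffer
        size - octets_sent,      # octets left to transfer
        src_stag,                # unchanged
        src_to + octets_sent,    # current offset into the Data Source buffer
    )
```

   With the example used above, a request for 8192 octets that is
   terminated after 2048 octets were sent has both Tagged Offsets
   advanced by 2048 and 6144 octets left to transfer.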
   Figure 10 Error Type to RDMA Message Mapping maps layer name and
   error types to each RDMA Message type:

   ---------+-------------+------------+------------+-----------------
   Layer    | Error Type  | Terminate  | Terminate  | What type of
   Name     | Name        | Includes   | Includes   | RDMA Message can
            |             | DDP Header | RDMA Header| cause the error
            |             | and DDP    |            |
            |             | Segment    |            |
            |             | Length     |            |
   ---------+-------------+------------+------------+-----------------
            | Local       | No         | No         | Any
            | Catastrophic|            |            |
            | Error       |            |            |
            +-------------+------------+------------+-----------------
            | Remote      | Yes, if    | Yes        | Only RDMA Read
   RDMA     | Protection  | possible   |            | Request, Send
            | Error       |            |            | with Invalidate,
            |             |            |            | and Send with SE
            |             |            |            | and Invalidate
            +-------------+------------+------------+-----------------
            | Remote      | Yes, if    | No         | Any
            | Operation   | possible   |            |
            | Error       |            |            |
   ---------+-------------+------------+------------+-----------------
   DDP      | See DDP Spec| Yes        | No         | Any
            | [DDP]       |            |            |
   ---------+-------------+------------+------------+-----------------
   LLP      | See LLP Spec| No         | No         | Any
            | [e.g. MPA]  |            |            |
   ---------+-------------+------------+------------+-----------------
             Figure 10 Error Type to RDMA Message Mapping

7  Data Transfer

7.1  RDMA Write Message

   An RDMA Write is used by the Data Source to transfer data to a
   previously Advertised Tagged Buffer at the Data Sink. The RDMA
   Write Message has the following semantics:

   *   An RDMA Write Message MUST reference a Tagged Buffer. That is,
       the Data Source RDMAP Layer MUST request that the DDP layer
       mark the Message as Tagged.

   *   A valid RDMA Write Message MUST NOT be delivered to the Data
       Sink's ULP (i.e., it is placed by the DDP layer).

   *   At the Remote Peer, when an invalid RDMA Write Message is
       delivered to the Remote Peer's RDMAP Layer, an error is
       surfaced (see Section 9.1 RDMAP Error Surfacing).

   *   The Tagged Offset of a Tagged Buffer MAY start at a non-zero
       value.

   *   An RDMA Write Message MAY target all or part of a previously
       Advertised buffer.

   *   The RDMAP does not define how the buffer(s) used by an outbound
       RDMA Write is defined and how it is addressed. For example, an
       implementation of RDMA may choose to allow a gather-list of
       non-contiguous data blocks to be the source of an RDMA Write.
       In this case, the data blocks would be combined by the Data
       Source and sent as a single RDMA Write Message to the Data
       Sink.

   *   The Data Source RDMAP Layer MUST issue RDMA Write Messages to
       the DDP layer in the order they were submitted by the ULP.

   *   At the Data Source, a subsequent Send (Send with Invalidate,
       Send with Solicited Event, or Send with Solicited Event and
       Invalidate) Message MAY be used to signal Delivery of previous
       RDMA Write Messages to the Data Sink, if desired by the ULP.

   *   If the Local Peer wishes to write to multiple Tagged Buffers on
       the Remote Peer, the Local Peer MUST use multiple RDMA Write
       Messages. That is, a single RDMA Write Message can only write
       to one remote Tagged Buffer.

   *   The Data Source MAY issue a zero length RDMA Write Message.

7.2  RDMA Read Operation

   The RDMA Read operation MUST consist of a single RDMA Read Request
   Message and a single RDMA Read Response Message.

7.2.1  RDMA Read Request Message

   An RDMA Read Request is used by the Data Sink to transfer data
   from a previously Advertised Tagged Buffer at the Data Source to a
   Tagged Buffer at the Data Sink. The RDMA Read Request Message has
   the following semantics:

   *   An RDMA Read Request Message MUST reference an Untagged Buffer.
       That is, the Local Peer's RDMAP Layer MUST request that the DDP
       mark the Message as Untagged.

   *   One RDMA Read Request Message MUST consume one Untagged Buffer.

   *   The Remote Peer's RDMAP Layer MUST process an RDMA Read Request
       Message. A valid RDMA Read Request Message MUST NOT be
       delivered to the Data Source's ULP (i.e., it is processed by
       the RDMAP layer).

   *   At the Remote Peer, when an invalid RDMA Read Request Message
       is delivered to the Remote Peer's RDMAP Layer, an error is
       surfaced (see Section 9.1 RDMAP Error Surfacing).

   *   An RDMA Read Request Message MUST reference the RDMA Read
       Request Queue. That is, the Local Peer's RDMAP Layer MUST
       request that the DDP layer set the Queue Number field to one.

   *   The Local Peer MUST pass to the DDP Layer RDMA Read Request
       Messages in the order they were submitted by the ULP.

   *   The Remote Peer MUST process the RDMA Read Request Messages in
       the order they were sent.

   *   If the Local Peer wishes to read from multiple Tagged Buffers
       on the Remote Peer, the Local Peer MUST use multiple RDMA Read
       Request Messages. That is, a single RDMA Read Request Message
       MUST only read from one remote Tagged Buffer.

   *   An RDMA Read Request Message MAY target all or part of a
       previously Advertised buffer.

   *   If the Data Source receives a valid RDMA Read Request Message
       it MUST respond with a valid RDMA Read Response Message.

   *   The Data Sink MAY issue a zero length RDMA Read Request
       Message, by setting the RDMA Read Message Size field to zero in
       the RDMA Read Request Header.

   *   If the Data Source receives a non-zero length RDMA Read Message
       Size, the Data Source RDMAP MUST validate the Data Source STag
       and Data Source Tagged Offset contained in the RDMA Read
       Request Header.

   *   If the Data Source receives an RDMA Read Request Header with
       the RDMA Read Message Size set to zero, the Data Source RDMAP:

       *   MUST NOT validate the Data Source STag and Data Source
           Tagged Offset contained in the RDMA Read Request Header,
           and

       *   MUST respond with a zero length RDMA Read Response
           Message.

7.2.2  RDMA Read Response Message

   The RDMA Read Response Message uses the DDP Tagged Buffer Model to
   Deliver the contents of a previously requested Data Source Tagged
   Buffer to the Data Sink, without any involvement from the ULP at
   the Remote Peer. The RDMA Read Response Message has the following
   semantics:

   *   The RDMA Read Response Message for the associated RDMA Read
       Request Message travels in the opposite direction.

   *   An RDMA Read Response Message MUST reference a Tagged Buffer.
       That is, the Data Source RDMAP Layer MUST request that the DDP
       mark the Message as Tagged.

   *   The Data Source MUST ensure that a sufficient number of
       Untagged Buffers are available on the RDMA Read Request Queue
       (Queue with DDP Queue Number 1) to support the maximum number
       of RDMA Read Requests negotiated by the ULP.

   *   The RDMAP Layer MUST Deliver the RDMA Read Response Message to
       the ULP.

   *   At the Remote Peer, when an invalid RDMA Read Response Message
       is delivered to the Remote Peer's RDMAP Layer, an error is
       surfaced (see Section 9.1 RDMAP Error Surfacing).

   *   The Tagged Offset of a Tagged Buffer MAY start at a non-zero
       value.

   *   The Data Source RDMAP Layer MUST pass RDMA Read Response
       Messages to the DDP layer in the order that the RDMA Read
       Request Messages were received by the RDMAP Layer at the Data
       Source.

1518 * The Data Sink MAY validate that the STag, Tagged Offset, and 1519 length of the RDMA Read Response Message are the same as the 1520 STag, Tagged Offset, and length included in the corresponding 1521 RDMA Read Request Message. 1523 * A single RDMA Read Response Message MUST write to one remote 1524 Tagged Buffer. If the Data Sink wishes to Read multiple Tagged 1525 Buffers, the Data Sink can use multiple RDMA Read Request 1526 Messages. 1528 7.3 Send Message Type 1530 The Send Message Type uses the DDP Untagged Buffer Model to 1531 transfer data from the Data Source into an Untagged Buffer at the 1532 Data Sink. 1534 * A Send Message Type MUST reference an Untagged Buffer. That is, 1535 the Local Peer's RDMAP Layer MUST request that the DDP layer 1536 mark the Message as Untagged. 1538 * One Send Message Type MUST consume one Untagged Buffer. 1540 * The ULP Message sent using a Send Message Type MAY be less 1541 than or equal to the size of the consumed Untagged Buffer. 1542 The RDMAP Layer communicates to the ULP the size of the 1543 data written into the Untagged Buffer. 1545 * If the ULP Message sent via Send Message Type is larger 1546 than the Data Sink's Untagged Buffer, it is an error (see 1547 section 9.1 RDMAP Error Surfacing). 1549 * At the Remote Peer, the Send Message Type MUST be Delivered to 1550 the Remote Peer's ULP in the order they were sent. 1552 * After the Send with Solicited Event or Send with Solicited 1553 Event and Invalidate Message is Delivered to the ULP, the RDMAP 1554 MAY generate an Event, if the Data Sink is configured to 1555 generate such an Event. 1557 * At the Remote Peer, when an invalid Send Message Type is 1558 Delivered to the Remote Peer's RDMAP Layer, an error is 1559 surfaced (see section 9.1 RDMAP Error Surfacing). 1561 * The RDMAP does not define how the buffer(s) used by an outbound 1562 Send Message Type is defined and how it is addressed. 
For 1563 example, an implementation of RDMA may choose to allow a 1564 gather-list of non-contiguous data blocks to be the source of a 1565 Send Message Type. In this case, the data blocks would be 1566 combined by the Data Source and sent as a single Send Message 1567 Type to the Data Sink. 1569 * For a Send Message Type, the Local Peer's RDMAP Layer MUST 1570 request that the DDP layer set the Queue Number field to zero. 1572 * The Local Peer MUST issue Send Message Type Messages in the 1573 order they were submitted by the ULP. 1575 * The Data Source MAY pass a zero length Send Message Type. A 1576 zero length Send Message Type MUST consume an Untagged Buffer 1577 at the Data Sink. * A Send with Invalidate or Send with Solicited 1578 Event and Invalidate Message MUST reference an STag. That is, 1579 the Local Peer's RDMAP Layer MUST pass the RDMA control field 1580 and the STag that will be Invalidated to the DDP layer. 1582 * When a Send with Invalidate or Send with Solicited Event and 1583 Invalidate Message is Delivered to the Remote Peer's RDMAP 1584 Layer, the RDMAP Layer MUST: 1586 * Verify that the STag is associated with the RDMAP Stream; 1587 and 1589 * Invalidate the STag if it is associated with the RDMAP 1590 Stream; or issue a Terminate Message if the STag is not 1591 associated with the RDMAP Stream (i.e. STag cannot be 1592 Invalidated Terminate Error Code). 1594 7.4 Terminate Message 1596 The Terminate Message uses the DDP Untagged Buffer Model to 1597 transfer error related information from the Data Source into an 1598 Untagged Buffer at the Data Sink and then ceases all further 1599 communications on the underlying DDP Stream. The Terminate Message 1600 has the following semantics: 1602 * A Terminate Message MUST reference an Untagged Buffer. That is, 1603 the Local Peer's RDMAP Layer MUST request that the DDP layer 1604 mark the Message as Untagged. 1606 * A Terminate Message references the Terminate Queue.
That is, 1607 the Local Peer's RDMAP Layer MUST request that the DDP layer 1608 set the Queue Number field to two. 1610 * One Terminate Message MUST consume one Untagged Buffer. 1612 * On a single RDMAP Stream, the RDMAP layer MUST guarantee 1613 placement of a single Terminate Message. 1615 * A Terminate Message MUST be Delivered to the Remote Peer's 1616 RDMAP Layer. The RDMAP Layer MUST Deliver the Terminate Message 1617 to the ULP. 1619 * At the Remote Peer, when an invalid Terminate Message is 1620 delivered to the Remote Peer's RDMAP Layer, an error is 1621 surfaced (see section 9.1 RDMAP Error Surfacing). 1623 * The RDMAP Layer Completes in error all ULP Operations that have 1624 not been provided to the DDP layer. 1626 * After sending a Terminate Message on an RDMAP Stream, the Local 1627 Peer MUST NOT send any more Messages on that specific RDMAP 1628 Stream. 1630 * After receiving a Terminate Message on an RDMAP Stream, the 1631 Remote Peer MAY stop sending Messages on that specific RDMAP 1632 Stream. 1634 7.5 Ordering and Completions 1636 It is important to understand the difference between Placement and 1637 Delivery ordering since RDMAP provides quite different semantics 1638 for the two. 1640 Note that many current protocols, both as used in the Internet and 1641 elsewhere, assume that data is both Placed and Delivered in order. 1642 This allowed applications to take a variety of shortcuts by taking 1643 advantage of this fact. For RDMAP, many of these shortcuts are no 1644 longer safe to use, and could cause application failure. 1646 The following rules apply to implementations of the RDMAP 1647 protocol. Note, in these rules Send includes Send, Send with 1648 Invalidate, Send with Solicited Event, and Send with Solicited 1649 Event and Invalidate: 1651 1. RDMAP does not provide ordering among Messages on different 1652 RDMAP Streams. 1654 2. 
RDMAP does not provide ordering between operations that are 1655 generated from the two ends of an RDMAP Stream. 1657 3. RDMA Messages that use Tagged and Untagged Buffers MAY be 1658 Placed in any order. If an application uses overlapping 1659 buffers (points different Messages or portions of a single 1660 Message at the same buffer), then it is possible that the last 1661 incoming write to the Data Sink buffer will not be the last 1662 outgoing data sent from the Data Source. 1664 4. For a Send operation, the contents of an Untagged Buffer at 1665 the Data Sink MAY be indeterminate until the Send is Delivered 1666 to the ULP at the Data Sink. 1668 5. For an RDMA Write operation, the contents of the Tagged Buffer 1669 at the Data Sink MAY be indeterminate until a subsequent Send 1670 is Delivered to the ULP at the Data Sink. 1672 6. For an RDMA Read operation, the contents of the Tagged Buffer 1673 at the Data Sink MAY be indeterminate until the RDMA Read 1674 Response Message has been Delivered at the Local Peer. 1676 Statements 4, 5, and 6 imply "no peeking" at the data to see 1677 if it is done. It is possible for some data to arrive before 1678 logically earlier data does, and peeking may cause 1679 unpredictable application failure. 1681 7. If the ULP or Application modifies the contents of Tagged or 1682 Untagged Buffers being modified by an RDMA Operation while the 1683 RDMAP is processing the RDMA Operation, the state of the 1684 Buffers is indeterminate. 1686 8. If the ULP or Application modifies the contents of Tagged or 1687 Untagged Buffers read by an RDMA Operation while the RDMAP is 1688 processing the RDMA Operation, the results of the read are 1689 indeterminate. 1691 9. The Completion of an RDMA Write or Send Operation at the Local 1692 Peer does not guarantee that the ULP Message has yet reached 1693 the Remote Peer ULP Buffer or been examined by the Remote ULP. 1695 10.
Send Messages MUST be Delivered to the ULP at the Remote Peer 1696 after they are Delivered to RDMAP by DDP and in the order that 1697 they were Delivered to RDMAP. 1699 Note that DDP ordering rules ensure that this will be the same 1700 order that they were submitted at the Local Peer and that any 1701 prior RDMA Writes have been submitted for ordered Placement at 1702 the Remote Peer. This means that when the ULP sees the 1703 Delivery of the Send, the memory buffers targeted by any 1704 preceding RDMA Writes and Sends are available to be accessed 1705 locally or remotely as authorized. If the ULP overlaps its 1706 buffers for different operations, the data from the RDMA Write 1707 or Send may be overwritten by subsequent RDMA Operations 1708 before the ULP receives and processes the Delivery. 1710 11. RDMA Read Response Messages MUST be Delivered to the ULP at 1711 the Remote Peer after they are Delivered to RDMAP by DDP and 1712 in the order that they were Delivered to RDMAP. 1714 DDP ordering rules ensure that this will be the same order 1715 that they were submitted at the Local Peer. This means that 1716 when the ULP sees the Delivery of the RDMA Read Response, the 1717 memory buffers targeted by the RDMA Read Response are 1718 available to be accessed locally or remotely as authorized. If 1719 the ULP overlaps its buffers for different operations, the 1720 data from the RDMA Read Response may be overwritten by 1721 subsequent RDMA Operations before the ULP receives and 1722 processes the Delivery. 1724 12. RDMA Read Request Messages, including zero-length RDMA Read 1725 Requests, MUST NOT start processing at the Remote Peer until 1726 they have been Delivered to RDMAP by DDP. 1728 Note the ULP is assured that data written can be read back.
1729 For example, if an RDMA Read Request is issued by the local 1730 peer, targeting the same ULP Buffer as a preceding Send or 1731 RDMA Write (in the same direction as the RDMA Read Request), 1732 and there are no other sources of update for the ULP Buffer, 1733 then the remote peer will send back the data written by the 1734 Send or RDMA Write. That is, for this example the ULP Buffer: 1735 is Advertised for use on a series of RDMA Messages, is only 1736 valid on the RDMAP Stream for which it is advertised, and is 1737 not locally updated while the series of RDMAP Messages are 1738 performed. For this example, order rule (12) assures that 1739 subsequent local or remote accesses to the ULP Buffer contain 1740 the data written by the Send or RDMA Write. 1742 RDMA Read Response Messages MAY be generated at the Remote 1743 Peer after subsequent RDMA Write Messages or Send Messages 1744 have been Placed or Delivered. Therefore, when an application 1745 does an RDMA Read Request followed by an RDMA Write (or Send) 1746 to the same buffer, it may get the data from the later RDMA 1747 Write (or Send) in the RDMA Read Response Message, even though 1748 the operations completed in order at the Local Peer. If this 1749 behavior is not desired, the Local Peer ULP must Fence the 1750 later RDMA Write (or Send) by withholding the RDMA Write 1751 Message until all outstanding RDMA Read Responses have been 1752 Delivered. 1754 13. The RDMAP Layer MUST submit RDMA Messages to the DDP layer in 1755 the order the RDMA Operations are submitted to the RDMAP Layer 1756 by the ULP. 1758 14. A Send or RDMA Write Message MUST NOT be considered Complete 1759 at the Local Peer (Data Source) until it has been successfully 1760 completed at the DDP layer. 1762 15. RDMA Operations MUST be Completed at the Local Peer in the 1763 order that they were submitted by the ULP. 1765 16.
At the Data Sink, an incoming Send Message MUST be Delivered 1766 to the ULP only after the DDP Message has been Delivered to 1767 the RDMAP Layer by the DDP layer. 1769 17. RDMA Read Response Message processing at the Remote Peer 1770 (reading the specified Tagged Buffer) MUST be started only 1771 after the RDMA Read Request Message has been Delivered by the 1772 DDP layer (thus all previous RDMA Messages have been properly 1773 submitted for ordered Placement). 1775 18. Send Messages MAY be Completed at the Remote Peer (Data Sink) 1776 before prior incoming RDMA Read Request Messages have 1777 completed their response processing. 1779 19. An RDMA Read operation MUST NOT be Completed at the Local Peer 1780 until the DDP layer Delivers the associated incoming RDMA Read 1781 Response Message. 1783 20. If more than one outstanding RDMA Read Request Message is 1784 supported by both peers, the RDMA Read Response Messages MUST 1785 be submitted to the DDP layer on the Remote Peer in the order 1786 the RDMA Read Request Messages were Delivered by DDP, but the 1787 actual read of the buffer contents MAY take place in any order 1788 at the Remote Peer. 1790 This simplifies Local Peer Completion processing for RDMA 1791 Reads in that a Delivered RDMA Read Response MUST be 1792 sufficient to Complete the RDMA Read Operation. 1794 8 RDMAP Stream Management 1796 RDMAP Stream management consists of RDMAP Stream Initialization 1797 and RDMAP Stream Termination. 1799 8.1 Stream Initialization 1801 RDMAP Stream initialization occurs after the LLP Stream has been 1802 created (e.g. for DDP/MPA over TCP the first TCP Segment after the 1803 SYN, SYN/ACK exchange). The ULP is responsible for transitioning 1804 the LLP Stream into RDMA enabled mode. The switch to RDMA mode can 1805 happen immediately at LLP Stream initialization or at any time 1806 thereafter. 
Once in RDMA enabled mode, an implementation MUST send 1807 only RDMA Messages across the transport Stream until the RDMAP 1808 Stream is torn down. 1810 For each direction of an RDMAP Stream: 1812 * For a given RDMAP Stream, the number of outstanding RDMA Read 1813 Requests is limited per RDMAP Stream direction. 1815 * It is the ULP's responsibility to set the maximum number of 1816 outstanding, inbound RDMA Read Requests per RDMAP Stream 1817 direction. 1819 * The RDMAP Layer MUST provide the maximum number of outstanding, 1820 inbound RDMA Read Requests per RDMAP Stream direction that were 1821 negotiated between the ULP and the Local Peer's RDMAP Layer. 1822 The negotiation mechanism is outside the scope of this 1823 specification. 1825 * It is the ULP's responsibility to set the maximum number of 1826 outstanding, outbound RDMA Read Requests per RDMAP Stream 1827 direction. 1829 * The RDMAP Layer MUST provide the maximum number of outstanding, 1830 outbound RDMA Read Requests for the RDMAP Stream direction that 1831 were negotiated between the ULP and the Local Peer's RDMAP 1832 Layer. The negotiation mechanism is outside the scope of this 1833 specification. 1835 * The Local Peer's ULP is responsible for negotiating with the 1836 Remote Peer's ULP the maximum number of outstanding RDMA Read 1837 Requests for the RDMAP Stream direction. It is recommended that 1838 the ULP set the maximum number of outstanding, inbound RDMA 1839 Read Requests equal to the maximum number of outstanding, 1840 outbound RDMA Read Requests for a given RDMAP Stream direction. 1842 * For outbound RDMA Read Requests, the RDMAP Layer MUST NOT 1843 exceed the maximum number of outstanding, outbound RDMA Read 1844 Requests that were negotiated between the ULP and the Local 1845 Peer's RDMAP Layer. 
1847 * For inbound RDMA Read Requests, the RDMAP Layer MUST NOT exceed 1848 the maximum number of outstanding, inbound RDMA Read Requests 1849 that were negotiated between the ULP and the Local Peer's RDMAP 1850 Layer. 1852 8.2 Stream Teardown 1854 There are three methods for terminating an RDMAP Stream: ULP 1855 Graceful Termination, RDMAP Abortive Termination, and LLP Abortive 1856 Termination. 1858 The ULP is responsible for performing ULP Graceful Termination. 1859 After a ULP Graceful Termination, either side of the Stream can 1860 initiate LLP Graceful Termination, using the graceful termination 1861 mechanism provided by the LLP. 1863 RDMAP Abortive Termination allows the RDMAP to issue a Terminate 1864 Message describing the reason the RDMAP Stream was terminated. The 1865 next section (8.2.1 RDMAP Abortive Termination) describes the 1866 RDMAP Abortive Termination in detail. 1868 LLP Abortive Termination results from an LLP error and causes the 1869 RDMAP Stream to be torn down midstream, without an RDMAP Terminate 1870 Message. While this last method is highly undesirable, it is 1871 possible and the ULP should take this into consideration. 1873 8.2.1 RDMAP Abortive Termination 1875 RDMAP defines a Terminate operation that SHOULD be invoked when 1876 either an RDMAP error is encountered or an LLP error is surfaced to 1877 the RDMAP layer by the LLP. 1879 It is not always possible to send the Terminate Message. For 1880 example, certain LLP errors may occur that cause the LLP Stream to 1881 be torn down a) before RDMAP is aware of the error, b) before 1882 RDMAP is able to send the Terminate Message, or c) after RDMAP has 1883 posted the Terminate Message to the LLP, but before it has been 1884 transmitted by the LLP. 1886 Note that an RDMAP Abortive Termination may entail loss of data.
1887 In general, when a Terminate Message is received it is impossible 1888 to tell for sure what unacknowledged RDMA Messages were Completed 1889 successfully at the Remote Peer. Thus the state of all outstanding 1890 RDMA Messages is indeterminate and the Messages SHOULD be 1891 considered Completed in error. 1893 When a peer sends or receives a Terminate Message, it MAY 1894 immediately tear down the LLP Stream. The peer SHOULD perform a 1895 graceful LLP teardown to ensure the Terminate Message is 1896 successfully Delivered. 1898 See section 6.8 Terminate Header for a description of the 1899 Terminate Message and its contents. See section 7.4 Terminate 1900 Message for a description of the Terminate Message semantics. 1902 9 RDMAP Error Management 1904 The RDMAP protocol does not have RDMAP or DDP layer error recovery 1905 operations built in. If everything is working, the LLP guarantees 1906 will ensure that the Messages are arriving at the destination. 1908 If errors are detected at the RDMAP or DDP layer, then the RDMAP, 1909 DDP and LLP Streams are Abortively Terminated (see section 6.8 1910 Terminate Header). 1912 In general, poor implementations or improper ULP programming cause 1913 the errors detected at the RDMAP and DDP layers. In these cases, 1914 returning a diagnostic termination error Message and closing the 1915 RDMAP Stream is far simpler than attempting to maintain the RDMAP 1916 Stream, particularly when the cause of the error is not known. 1918 If an LLP does not support teardown of a Stream independent of 1919 other Streams and an RDMAP error results in the Termination of a 1920 specific Stream, then the LLP MUST label the Stream as an 1921 erroneous Stream and MUST NOT allow any further data transfer on 1922 that Stream after RDMAP requests the Stream to be torn down.
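As an informal, non-normative illustration of the labeling rule in the preceding paragraph, the following minimal sketch models an LLP that cannot tear down one Stream independently of the others. The class name `LlpConnection`, its methods, and the state strings are hypothetical and appear nowhere in this specification; they only show one way the "label as erroneous, then refuse further data" behavior could look.

```python
class LlpConnection:
    """Hypothetical LLP connection carrying several Streams (illustration only)."""

    def __init__(self, stream_ids):
        # Every Stream starts open; "erroneous" is the label the text requires.
        self.state = {sid: "open" for sid in stream_ids}

    def rdmap_teardown(self, sid):
        # The LLP cannot remove this Stream on its own, so it labels it instead.
        self.state[sid] = "erroneous"

    def send(self, sid, payload):
        # No further data transfer is allowed on a Stream labeled erroneous.
        if self.state[sid] != "open":
            raise RuntimeError(f"stream {sid} is {self.state[sid]}")
        return len(payload)
```

In this sketch, other Streams on the same connection keep working after one Stream is labeled; only the labeled Stream rejects further sends.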
1924 For a specific LLP connection, when all Streams are either 1925 gracefully torn down or are labeled as erroneous Streams, the LLP 1926 connection MUST be torn down. 1928 Since errors are detected at the Remote Peer (possibly long) after 1929 RDMA Messages are passed to DDP and the LLP at the Local Peer and 1930 Completed, the sender cannot easily determine which of its 1931 Messages have been received. (RDMA Reads are an exception to this 1932 rule). 1934 For a list of errors returned to the Remote Peer as a result of an 1935 Abortive Termination, see section 6.8 Terminate Header. 1937 9.1 RDMAP Error Surfacing 1939 If an error occurs at the Local Peer, the RDMAP layer MUST attempt 1940 to inform the local ULP that the error has occurred. 1942 The Local Peer MUST send a Terminate Message for each of the 1943 following cases: 1945 1. For Errors detected while creating RDMA Write, Send, Send with 1946 Invalidate, Send with Solicited Event, Send with Solicited 1947 Event and Invalidate, or RDMA Read Requests, or other reasons 1948 not directly associated with an incoming Message, the 1949 Terminate Message and Error code are sent instead of the 1950 request. In this case, the Error Type and Error Code fields 1951 are included in the Terminate Message, but the Terminated DDP 1952 Header and Terminated RDMA Header fields are set to zero. 1954 2. For errors detected on an incoming RDMA Write, Send, Send with 1955 Invalidate, Send with Solicited Event, Send with Solicited 1956 Event and Invalidate, or Read Response Message (after the 1957 Message has been Delivered by DDP), the Terminate Message is 1958 sent at the earliest possible opportunity, preferably in the 1959 next outgoing RDMA Message. In this case, the Error Type, 1960 Error Code, ULP PDU Length, and Terminated DDP Header fields 1961 are included in the Terminate Message, but the Terminated RDMA 1962 Header field is set to zero. 1964 3.
For errors detected on an incoming RDMA Read Request Message 1965 (after the Message has been Delivered by DDP), the Terminate 1966 Message is sent at the earliest possible opportunity, 1967 preferably in the next outgoing RDMA Message. In this case, 1968 the Error Type, Error Code, ULP PDU Length, Terminated DDP 1969 Header, and Terminated RDMA Header fields are included in the 1970 Terminate Message. 1972 4. If more than one error is detected on incoming RDMA Messages, 1973 before the Terminate Message can be sent, then the first RDMA 1974 Message (and its associated DDP Segment) that experienced an 1975 error MUST be captured by the Terminate Message in accordance 1976 with rules 2 and 3 above. 1978 9.2 Errors Detected at the Remote Peer on Incoming RDMA Messages 1980 On incoming RDMA Writes, RDMA Read Response, Sends, Send with 1981 Invalidate, Send with Solicited Event, Send with Solicited Event 1982 and Invalidate, and Terminate Messages, the following must be 1983 validated: 1985 1. The DDP Layer MUST validate all DDP Segment fields. 1987 2. The RDMA OpCode MUST be valid. 1989 3. The RDMA Version MUST be valid. 1991 Additionally, on incoming Send with Invalidate and Send with 1992 Solicited Event and Invalidate Messages, the following must 1993 also be validated: 1995 4. The Invalidate STag MUST be valid. 1997 5. The STag MUST be associated with this RDMAP Stream. 1999 On incoming RDMA Read Request Messages, the following must be 2000 validated: 2002 1. The DDP Layer MUST validate all Untagged DDP Segment fields. 2004 2. The RDMA OpCode MUST be valid. 2006 3. The RDMA Version MUST be valid. 2008 4. For non-zero length RDMA Read Request Messages: 2010 a. The Data Source STag MUST be valid. 2012 b. The Data Source STag MUST be associated with this RDMAP 2013 Stream. 2015 c. The Data Source Tagged Offset MUST fall in the range of 2016 legal offsets associated with the Data Source STag. 2018 d.
The sum of the Data Source Tagged Offset and the RDMA Read 2019 Message Size MUST fall in the range of legal offsets 2020 associated with the Data Source STag. 2022 e. The sum of the Data Source Tagged Offset and the RDMA Read 2023 Message Size MUST NOT cause the Data Source Tagged Offset 2024 to wrap. 2026 10 Security Considerations 2028 This section discusses both protocol-specific considerations and 2029 the implications of using RDMAP with existing security mechanisms. 2030 A more detailed analysis of the security issues around the 2031 implementation and the use of the DDP can be found in [RDMASEC]. 2033 10.1 Protocol-specific Security Considerations 2035 The vulnerabilities of RDMAP to active third-party interference 2036 are no greater than those of any other protocol running over TCP. A third 2037 party, by injecting spoofed packets into the network that are 2038 Delivered to an RDMAP Data Sink, could launch a variety of attacks 2039 that exploit RDMAP-specific behavior. Since RDMAP directly or 2040 indirectly exposes memory addresses on the wire, the Placement 2041 information carried in each RDMA Message must be validated, 2042 including access rights and octet level granularity base and 2043 bounds check, before any data is Placed. For example, a third- 2044 party adversary could inject random packets that appear to be 2045 valid RDMA Messages and corrupt the memory on an RDMAP Data Sink. 2046 Since RDMAP is IP transport protocol independent, communication 2047 security mechanisms such as IPsec [RFC2401] or TLS [TLS] may be used 2048 to prevent such attacks. 2050 10.2 Using IPsec with RDMAP 2052 IPsec can be used to protect against the packet injection attacks 2053 outlined above. Because IPsec is designed to secure arbitrary IP 2054 packet streams, including streams where packets are lost, RDMAP 2055 can run on top of IPsec without any change.
IPsec packets are 2056 processed (e.g., integrity checked and possibly decrypted) in the 2057 order they are received, and an RDMAP Data Sink will process the 2058 decrypted RDMA Messages contained in these packets in the same 2059 manner as RDMA Messages contained in unsecured IP packets. 2061 10.3 Other Security Considerations 2063 RDMAP has several mechanisms that deal with a number of attacks. 2064 These include, but are not limited to: 2066 1. Connection to/from an unauthorized or unauthenticated 2067 endpoint. 2069 2. Hijacking of an RDMAP Stream. 2071 3. Attempts to read or write from unauthorized memory regions. 2073 4. Injection of RDMA Messages within a Stream on a multi-user 2074 operating system by another application. 2076 RDMAP relies on the LLP to establish the LLP Stream over which 2077 RDMA Messages will be carried. RDMAP itself does nothing to 2078 authenticate the validity of the LLP Stream of either of the 2079 endpoints. It is the responsibility of the ULP to validate the 2080 LLP Stream. This is highly desirable due to the nature of RDMA 2081 and DDP. 2083 Hijacking of an RDMAP channel would require that the underlying 2084 LLP connection be hijacked. This would require knowledge of 2085 Advertised buffers in order to directly Place data into a user 2086 buffer and is therefore constrained by the same techniques 2087 mentioned to guard against attempts to read or write from 2088 unauthorized memory regions. 2090 RDMAP does not require a host to open its buffers to arbitrary 2091 attacks over the RDMAP Stream. It may access ULP memory only to 2092 the extent that the ULP has enabled and authorized it to do so. 2093 The STag access control model is defined by a (forthcoming) 2094 document. Specific security operations include: 2096 1. STags are only valid over the exact byte range established by 2097 the ULP.
RDMAP MUST provide a mechanism for the ULP to 2098 establish and revoke the TO range associated with the ULP 2099 Buffer referenced by an STag. 2101 2. STags are only valid for the duration established by the ULP. 2102 The ULP may revoke them at any time, in accordance with its 2103 own upper layer protocol requirements. RDMAP MUST provide a 2104 mechanism for the ULP to establish and revoke STag validity. 2106 3. STags are only enabled for read and/or write access by 2107 explicit ULP action. RDMAP MUST provide a mechanism for the 2108 ULP to establish and revoke read, write, or read and write 2109 access to the ULP Buffer referenced by an STag. 2111 4. The implementation is free to choose the value of STags and is 2112 encouraged to sparsely populate them over the full range 2113 available. This is admittedly weak security protection against 2114 a deliberate attack, but does minimize the risk of accidental 2115 matches when an incorrect STag is used due to a ULP software 2116 error. 2118 5. RDMAP allows and encourages local interactions to restrict the 2119 usage of STags to specific Streams and/or user processes. 2120 RDMAP MUST provide a mechanism for associating an RDMAP Stream 2121 with an STag. 2123 RDMAP MAY choose several locally defined mechanisms for 2124 associating an RDMAP Stream and an STag. One such mechanism is 2125 to provide two association types: a protection domain 2126 association and an RDMAP Stream association. 2128 * Under the protection domain (PD) association, a unique 2129 protection domain identifier (PD ID) is created and used 2130 locally to associate the STag with a set of RDMAP Streams. 2131 The scope of the PD ID MUST include all of the RDMAP 2132 Streams associated with the PD ID. Under this mechanism, 2133 only RDMAP Streams that have the same PD ID as the STag 2134 are allowed to use the STag.
2136 For an incoming RDMA Read Request Message on an RDMAP 2137 Stream, if the PD ID associated with that RDMAP Stream is 2138 not the same as the PD ID associated with the Data Source 2139 STag, then no RDMA Read Response Message is returned and 2140 the RDMAP Stream MUST be Terminated with an Invalid STag 2141 error. For a Send with Invalidate or Send with SE and 2142 Invalidate on an RDMAP Stream, if the PD ID associated 2143 with that RDMAP Stream is not the same as the PD ID 2144 associated with the STag that is to be Invalidated, the 2145 Message is not delivered to the ULP and the RDMAP Stream 2146 MUST be Terminated with an STag cannot be Invalidated 2147 error. Note that the PD ID is locally defined, and cannot 2148 be directly manipulated by the Remote Peer. 2150 * Under RDMAP Stream association, a given RDMAP Stream is 2151 identified locally by a unique RDMAP Stream identifier 2152 (ID) and that RDMAP Stream ID is associated with the STag 2153 and RDMAP Stream. 2155 For an incoming RDMA Read Request Message on an RDMAP 2156 Stream, if the RDMAP Stream ID associated with that RDMAP 2157 Stream is not the same as the RDMAP Stream ID associated 2158 with the Data Source STag, then no RDMA Read Response 2159 Message is returned and the RDMAP Stream MUST be 2160 Terminated with an Invalid STag error. Finally, for a Send 2161 with Invalidate or Send with SE and Invalidate on an RDMAP 2162 Stream, if the RDMAP Stream ID associated with that RDMAP 2163 Stream is not the same as the RDMAP Stream ID associated 2164 with the STag that is to be Invalidated, then the Message 2165 is not delivered to the ULP and the RDMAP Stream MUST be 2166 Terminated with an STag cannot be Invalidated error. Note 2167 that the RDMAP Stream ID is locally defined, and cannot be 2168 directly manipulated by the Remote Peer.
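As an informal, non-normative illustration of the protection domain association described above, the following minimal sketch compares the PD ID of the receiving RDMAP Stream with the PD ID recorded for the STag. The names `STagEntry`, `check_read_request`, and `invalidate`, and the dictionary used as an STag table, are hypothetical and not defined by this specification. Treating an unknown or already invalidated STag the same way as a PD ID mismatch is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass
class STagEntry:
    """Hypothetical local record for one STag (illustration only)."""
    pd_id: int
    valid: bool = True


def check_read_request(stream_pd_id, stag_table, src_stag):
    """Return None if the Data Source STag may be used, else the Terminate error."""
    entry = stag_table.get(src_stag)
    # Assumption: unknown or invalidated STags are reported like a PD mismatch.
    if entry is None or not entry.valid or entry.pd_id != stream_pd_id:
        return "Invalid STag"  # no RDMA Read Response Message is returned
    return None


def invalidate(stream_pd_id, stag_table, stag):
    """Handle the Invalidate part of a Send with Invalidate (sketch)."""
    entry = stag_table.get(stag)
    if entry is None or entry.pd_id != stream_pd_id:
        return "STag cannot be Invalidated"  # Message is not delivered to the ULP
    entry.valid = False
    return None
```

The same shape would apply to the RDMAP Stream association, with a Stream ID compared in place of the PD ID.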
2170 Note: For an incoming RDMA Write or RDMA Read Response 2171 Message on an RDMAP Stream, the DDP layer MUST associate the 2172 STag targeted by the RDMA Write or RDMA Read Response Message 2173 (respectively) to the RDMAP Stream. If an STag targeted by 2174 the RDMA Write or RDMA Read Response Message Segment is not 2175 associated with the RDMAP Stream that received the Message 2176 Segment, DDP MUST surface an Invalid STag error to the RDMAP 2177 layer. The RDMAP layer MUST Terminate the RDMAP Stream if the 2178 DDP Layer surfaces an Invalid STag error. 2180 6. A ULP may only expose memory to remote access to the extent 2181 that it already had access to that memory itself. 2183 7. RDMAP provides operations to allow the holder of an STag to 2184 indicate when it has made its last usage of that STag. This 2185 enables automatic deregistration and/or scope reduction of 2186 STags as the implementation and ULP may see fit. 2188 8. If an STag is not valid on a connection, RDMAP provides a 2189 mechanism for terminating the RDMAP Stream (see section 6.8 2190 Terminate Header). 2192 9. An STag that is associated with an RDMAP Stream becomes 2193 invalid upon reception of a valid Send with Invalidate or Send 2194 with Solicited Event and Invalidate Message. RDMAP MUST invalidate the STag 2195 sent in a valid Send with Invalidate or Send with Solicited 2196 Event and Invalidate Message, before Completing the Send with 2197 Invalidate or Send with Solicited Event and Invalidate 2198 Message. 2200 Further, RDMAP encourages direct Placement of incoming payloads in 2201 user-mode ULP Buffers. This avoids the risks of prior solutions 2202 that relied upon exposing system buffers for incoming payloads. 2204 There is also a clean data Delivery hand-off between RDMAP and the 2205 ULP. This allows the ULP to implement additional security 2206 operations without restrictions or interference from RDMAP. 2208 In summary, RDMAP enables both ULP and LLP security.
It requires 2209 that all of its data access be enabled and authorized by the ULP. 2211 It provides no operations for the ULP to gain permissions not 2212 already granted by the host operating system. It allows and 2213 encourages local interactions to specify even more precise 2214 security checks on STag binding and data transfer operations. 2216 By remaining independent of ULP and LLP security protocols, RDMAP 2217 will benefit from continuing improvements at those layers. Users 2218 are provided flexibility to adapt to their specific security 2219 requirements and the ability to adapt to future security 2220 challenges. 2222 11 References 2224 11.1 Normative References 2226 [DDP] H. Shah et al., "Direct Data Placement over Reliable 2227 Transports", RDMA Consortium Draft Specification draft-shah- 2228 iwarp-ddp-01.txt, October 2002 2230 [MPA] P. Culley et al., "Markers with PDU Alignment", RDMA 2231 Consortium Draft Specification draft-culley-iwarp-mpa-01.txt, 2232 October 2002 2234 [SCTP] R. Stewart et al., "Stream Control Transmission Protocol", 2235 RFC 2960, October 2000. 2237 [TCP] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, 2238 September 1981. 2240 11.2 Informative References 2242 [RFC2401] Atkinson, R., Kent, S., "Security Architecture for the 2243 Internet Protocol", RFC 2401, November 1998. 2245 [TLS] Dierks, T. and C. Allen, "The TLS Protocol Version 1.0", 2246 RFC 2246, November 1998. 2248 [RDMASEC] J. Pinkerton et al., "DDP/RDMAP Security", draft-ietf- 2249 rddp-security-00.txt, October 2003. 2251 12 Appendix 2253 12.1 DDP Segment Formats for RDMA Messages 2255 This appendix is for information only and is NOT part of the 2256 standard. It simply depicts the DDP Segment format for the various 2257 RDMA Messages. 
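As an informal companion to the figures in this appendix, the following minimal sketch packs and unpacks the tagged layout they show for RDMA Write and RDMA Read Response: one octet of DDP Control, one octet of RDMA Control, a 32-bit STag, and a 64-bit Tagged Offset, followed by the ULP payload. The field widths are read from the figures; the names `TAGGED_HDR`, `pack_tagged`, and `unpack_tagged` are hypothetical, network byte order is assumed, and the bits inside the two control octets are not interpreted.

```python
import struct

# Assumed layout: DDP Control (1), RDMA Control (1), STag (4), Tagged Offset (8),
# all in network byte order with no padding.
TAGGED_HDR = struct.Struct(">BBIQ")


def pack_tagged(ddp_ctrl, rdma_ctrl, stag, tagged_offset, payload=b""):
    """Build a tagged DDP Segment: 14-octet header followed by the ULP payload."""
    return TAGGED_HDR.pack(ddp_ctrl, rdma_ctrl, stag, tagged_offset) + payload


def unpack_tagged(segment):
    """Split a tagged DDP Segment into its header fields and ULP payload."""
    ddp_ctrl, rdma_ctrl, stag, tagged_offset = TAGGED_HDR.unpack_from(segment)
    return ddp_ctrl, rdma_ctrl, stag, tagged_offset, segment[TAGGED_HDR.size:]
```

The untagged layouts (RDMA Read Request and the Send variants) carry a reserved word, Queue Number, Message Sequence Number, and Message Offset instead, so they would need a different format string.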

12.1.1 DDP Segment for RDMA Write

   The following figure depicts an RDMA Write, DDP Segment:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Data Sink STag                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Data Sink Tagged Offset                    |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    RDMA Write ULP Payload                     |
   //                                                             //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
              Figure 11 RDMA Write, DDP Segment format

12.1.2 DDP Segment for RDMA Read Request

   The following figure depicts an RDMA Read Request, DDP Segment:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Reserved (Not Used)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             DDP (RDMA Read Request) Queue Number              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        DDP (RDMA Read Request) Message Sequence Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            DDP (RDMA Read Request) Message Offset             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Data Sink STag (SinkSTag)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +               Data Sink Tagged Offset (SinkTO)                +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               RDMA Read Message Size (RDMARDSZ)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Data Source STag (SrcSTag)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +               Data Source Tagged Offset (SrcTO)               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           Figure 12 RDMA Read Request, DDP Segment format

12.1.3 DDP Segment for RDMA Read Response

   The following figure depicts an RDMA Read Response, DDP Segment:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Data Sink STag                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Data Sink Tagged Offset                    |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                RDMA Read Response ULP Payload                 |
   //                                                             //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
           Figure 13 RDMA Read Response, DDP Segment format

12.1.4 DDP Segment for Send and Send with Solicited Event

   The following figure depicts a Send and Send with Solicited
   Event, DDP Segment:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Reserved (Not Used)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Send) Queue Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Send) Message Sequence Number                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     (Send) Message Offset                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Send ULP Payload                        |
   //                                                             //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    Figure 14 Send and Send with Solicited Event, DDP Segment format

12.1.5 DDP Segment for Send with Invalidate and Send with SE and
       Invalidate

   The following figure depicts a Send with Invalidate and Send with
   Solicited Event and Invalidate, DDP Segment:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Invalidate STag                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Send) Queue Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Send) Message Sequence Number                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     (Send) Message Offset                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Send ULP Payload                        |
   //                                                             //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     Figure 15 Send with Invalidate and Send with SE and Invalidate,
                          DDP Segment format

12.1.6 DDP Segment for Terminate

   The following figure depicts a Terminate, DDP Segment:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Reserved (Not Used)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 DDP (Terminate) Queue Number                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            DDP (Terminate) Message Sequence Number            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                DDP (Terminate) Message Offset                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Terminate Control          |        Reserved         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  DDP Segment Length (if any)  |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               |
   +                                                               +
   |                Terminated DDP Header (if any)                 |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   //                                                             //
   |                Terminated RDMA Header (if any)                |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               Figure 16 Terminate, DDP Segment format

12.2 Ordering and Completion Table

   The following table summarizes the ordering relationships that are
   defined in section 7.5 Ordering and Completions from the
   standpoint of the local peer issuing the two Operations.  Note
   that, in the table that follows, Send includes Send, Send with
   Invalidate, Send with Solicited Event, and Send with Solicited
   Event and Invalidate.

   ------+-------+----------------+----------------+----------------
   First | Later | Placement      | Placement      | Ordering
   Op    | Op    | guarantee at   | guarantee at   | guarantee at
         |       | Remote Peer    | Local Peer     | Remote Peer
         |       |                |                |
   ------+-------+----------------+----------------+----------------
   Send  | Send  | No placement   | Not applicable | Completed in
         |       | guarantee. If  |                | order.
         |       | guarantee is   |                |
         |       | necessary, see |                |
         |       | footnote 1.    |                |
   ------+-------+----------------+----------------+----------------
   Send  | RDMA  | No placement   | Not applicable | Not applicable
         | Write | guarantee. If  |                |
         |       | guarantee is   |                |
         |       | necessary, see |                |
         |       | footnote 1.    |                |
   ------+-------+----------------+----------------+----------------
   Send  | RDMA  | No placement   | RDMA Read      | RDMA Read
         | Read  | guarantee      | Response       | Response
         |       | between Send   | Payload will   | Message will
         |       | Payload and    | not be placed  | not be
         |       | RDMA Read      | at the local   | generated until
         |       | Request Header | peer until the | Send has been
         |       |                | Send Payload is| Completed
         |       |                | placed at the  |
         |       |                | remote peer    |
   ------+-------+----------------+----------------+----------------
   RDMA  | Send  | No placement   | Not applicable | Not applicable
   Write |       | guarantee. If  |                |
         |       | guarantee is   |                |
         |       | necessary, see |                |
         |       | footnote 1.    |                |
   ------+-------+----------------+----------------+----------------
   RDMA  | RDMA  | No placement   | Not applicable | Not applicable
   Write | Write | guarantee. If  |                |
         |       | guarantee is   |                |
         |       | necessary, see |                |
         |       | footnote 1.    |                |
   ------+-------+----------------+----------------+----------------
   RDMA  | RDMA  | No placement   | RDMA Read      | Not applicable
   Write | Read  | guarantee      | Response       |
         |       | between RDMA   | Payload will   |
         |       | Write Payload  | not be placed  |
         |       | and RDMA Read  | at the local   |
         |       | Request Header | peer until the |
         |       |                | RDMA Write     |
         |       |                | Payload is     |
         |       |                | placed at the  |
         |       |                | remote peer    |
   ------+-------+----------------+----------------+----------------
   RDMA  | Send  | No placement   | Send Payload   | Not applicable
   Read  |       | guarantee      | may be placed  |
         |       | between RDMA   | at the remote  |
         |       | Read Request   | peer before the|
         |       | Header and Send| RDMA Read      |
         |       | payload        | Response is    |
         |       |                | generated.     |
         |       |                | If guarantee is|
         |       |                | necessary, see |
         |       |                | footnote 2.    |
   ------+-------+----------------+----------------+----------------
   RDMA  | RDMA  | No placement   | RDMA Write     | Not applicable
   Read  | Write | guarantee      | Payload may be |
         |       | between RDMA   | placed at the  |
         |       | Read Request   | remote peer    |
         |       | Header and RDMA| before the RDMA|
         |       | Write payload  | Read Response  |
         |       |                | is generated.  |
         |       |                | If guarantee is|
         |       |                | necessary, see |
         |       |                | footnote 2.    |
   ------+-------+----------------+----------------+----------------
   RDMA  | RDMA  | No placement   | No placement   | Second RDMA
   Read  | Read  | guarantee of   | guarantee of   | Read Response
         |       | the two RDMA   | the two RDMA   | will not be
         |       | Read Request   | Read Response  | generated until
         |       | Headers        | Payloads.      | first RDMA Read
         |       | Additionally,  |                | Response is
         |       | there is no    |                | generated.
         |       | guarantee that |                |
         |       | the Tagged     |                |
         |       | Buffers        |                |
         |       | referenced in  |                |
         |       | the RDMA Read  |                |
         |       | will be read in|                |
         |       | order          |                |
   ------+-------+----------------+----------------+----------------
                     Figure 17 Operation Ordering

   Footnote 1: If the guarantee is necessary, a ULP may insert an
   RDMA Read Operation and wait for it to complete to act as a Fence.

   Footnote 2: If the guarantee is necessary, a ULP may wait for the
   RDMA Read Operation to complete before performing the Send.

13 Authors' Addresses

   Paul R. Culley
   Hewlett-Packard Company
   20555 SH 249
   Houston, Tx. USA 77070-2698
   Phone: 281-514-5543
   Email: paul.culley@hp.com

   Dave Garcia
   Hewlett-Packard Company
   19333 Vallco Parkway
   Cupertino, Ca. USA 95014
   Phone: 408.285.6116
   Email: dave.garcia@hp.com

   Jeff Hilland
   Hewlett-Packard Company
   20555 SH 249
   Houston, Tx. USA 77070-2698
   Phone: 281-514-9489
   Email: jeff.hilland@hp.com

   Renato J. Recio
   IBM Corp.
   11501 Burnett Road
   Austin, Tx.
   USA 78758
   Phone: 512-838-3685
   Email: recio@us.ibm.com

14 Acknowledgments

   Dwight Barron
   Hewlett-Packard Company
   20555 SH 249
   Houston, Tx. USA 77070-2698
   Phone: 281-514-2769
   Email: dwight.barron@compaq.com

   Caitlin Bestler
   Email: cait@asomi.com

   John Carrier
   Adaptec, Inc.
   691 S. Milpitas Blvd.
   Milpitas, CA 95035 USA
   Phone: +1 (360) 378-8526
   Email: john_carrier@adaptec.com

   Ted Compton
   EMC Corporation
   Research Triangle Park, NC 27709, USA
   Phone: 919-248-6075
   Email: compton_ted@emc.com

   Uri Elzur
   Broadcom Corporation
   16215 Alton Parkway
   Irvine, California 92619-7013 USA
   Phone: +1 (949) 585-6432
   Email: Uri@Broadcom.com

   Hari Ghadia
   Adaptec, Inc.
   691 S. Milpitas Blvd.,
   Milpitas, CA 95035 USA
   Phone: +1 (408) 957-5608
   Email: hari_ghadia@adaptec.com

   Howard C. Herbert
   Intel Corporation
   MS CH7-404
   5000 West Chandler Blvd.
   Chandler, Arizona 85226
   Phone: 480-554-3116
   Email: howard.c.herbert@intel.com

   Mike Ko
   IBM
   650 Harry Rd.
   San Jose, CA 95120
   Phone: (408) 927-2085
   Email: mako@us.ibm.com

   Mike Krause
   Hewlett-Packard Company
   43LN
   19410 Homestead Road
   Cupertino, CA 95014 USA
   Phone: 408-447-3191
   Email: krause@cup.hp.com

   Dave Minturn
   Intel Corporation
   MS JF1-210
   5200 North East Elam Young Parkway
   Hillsboro, Oregon 97124
   Phone: 503-712-4106
   Email: dave.b.minturn@intel.com

   Mike Penna
   Broadcom Corporation
   16215 Alton Parkway
   Irvine, California 92619-7013 USA
   Phone: +1 (949) 926-7149
   Email: MPenna@Broadcom.com

   Jim Pinkerton
   Microsoft, Inc.
   One Microsoft Way
   Redmond, WA, USA 98052
   Email: jpink@microsoft.com

   Hemal Shah
   Intel Corporation
   MS PTL1
   1501 South Mopac Expressway, #400
   Austin, Texas 78746
   Phone: 512-732-3963
   Email: hemal.shah@intel.com

   Allyn Romanow
   Cisco Systems
   170 W Tasman Drive
   San Jose, CA 95134 USA
   Phone: +1 408 525 8836
   Email: allyn@cisco.com

   Tom Talpey
   Network Appliance
   375 Totten Pond Road
   Waltham, MA 02451 USA
   Phone: +1 (781) 768-5329
   Email: thomas.talpey@netapp.com

   Patricia Thaler
   Agilent Technologies, Inc.
   1101 Creekside Ridge Drive, #100
   M/S-RG10
   Roseville, CA 95678
   Phone: +1-916-788-5662
   Email: pat_thaler@agilent.com

   Jim Wendt
   Hewlett-Packard Company
   8000 Foothills Boulevard MS 5668
   Roseville, CA 95747-5668 USA
   Phone: +1 916 785 5198
   Email: jim_wendt@hp.com

15 Full Copyright Statement

   This document and the information contained herein is provided on
   an "AS IS" basis and ADAPTEC INC., AGILENT TECHNOLOGIES INC.,
   BROADCOM CORPORATION, CISCO SYSTEMS INC., EMC CORPORATION,
   HEWLETT-PACKARD COMPANY, INTERNATIONAL BUSINESS MACHINES
   CORPORATION, INTEL CORPORATION, MICROSOFT CORPORATION, NETWORK
   APPLIANCE INC., THE INTERNET SOCIETY, AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

   Copyright (c) 2002 ADAPTEC INC., BROADCOM CORPORATION, CISCO
   SYSTEMS INC., EMC CORPORATION, HEWLETT-PACKARD COMPANY,
   INTERNATIONAL BUSINESS MACHINES CORPORATION, INTEL CORPORATION,
   MICROSOFT CORPORATION, NETWORK APPLIANCE INC., All Rights
   Reserved.