idnits 2.17.1

draft-ietf-storm-rdmap-ext-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The abstract seems to contain references ([RFC5040]), which it
     shouldn't.  Please replace those with straight textual mentions of the
     documents in question.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 739 has weird spacing: '...e Data and ...'

  -- The document date (March 7, 2011) is 4792 days in the past.  Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RC5040' is mentioned on line 122, but not defined

  == Missing Reference: 'RFC4050' is mentioned on line 249, but not defined

  == Unused Reference: 'RFC5041' is defined on line 1002, but no explicit
     reference was found in the text

     Summary: 1 error (**), 0 flaws (~~), 5 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

  1  Storage Maintenance (storm) Working Group                   Hemal Shah
  2  Internet Draft                                    Broadcom Corporation
  3  Intended status: Standards Track                           Felix Marti
  4  Expires: September 2011                                Wael Noureddine
  5                                                        Asgeir Eiriksson
  6                                            Chelsio Communications, Inc.
  7                                                            Robert Sharp
  8                                                       Intel Corporation
  9                                                           March 7, 2011

 11                       RDMA Protocol Extensions
 12                   draft-ietf-storm-rdmap-ext-00.txt

 14  Status of this Memo

 16     This Internet-Draft is submitted to IETF in full conformance with
 17     the provisions of BCP 78 and BCP 79.

 19     Internet-Drafts are working documents of the Internet Engineering
 20     Task Force (IETF). Note that other groups may also distribute
 21     working documents as Internet-Drafts. The list of current Internet-
 22     Drafts is at http://datatracker.ietf.org/drafts/current.

 24     Internet-Drafts are draft documents valid for a maximum of six
 25     months and may be updated, replaced, or obsoleted by other documents
 26     at any time. It is inappropriate to use Internet-Drafts as
 27     reference material or to cite them other than as "work in progress."

 29     This Internet-Draft will expire on September 7, 2011.

 31  Copyright Notice

 33     Copyright (c) 2011 IETF Trust and the persons identified as the
 34     document authors. All rights reserved.

 36     This document is subject to BCP 78 and the IETF Trust's Legal
 37     Provisions Relating to IETF Documents
 38     (http://trustee.ietf.org/license-info) in effect on the date of
 39     publication of this document. Please review these documents
 40     carefully, as they describe your rights and restrictions with
 41     respect to this document. Code Components extracted from this
 42     document must include Simplified BSD License text as described in
 43     Section 4.e of the Trust Legal Provisions and are provided without
 44     warranty as described in the Simplified BSD License.

 46  Abstract

 48     This document specifies extensions to the IETF Remote Direct Memory
 49     Access Protocol (RDMAP, RFC 5040). RDMAP provides read and write
 50     services directly to applications and enables data to be transferred
 51     directly into Upper Layer Protocol (ULP) Buffers without
 52     intermediate data copies. The extensions specified in this document
 53     provide the following capabilities and/or improvements: Atomic
 54     Operations and Immediate Data.

 56  Table of Contents

 58     1. Introduction
 59     2. Requirements Language
 60     3. Glossary
 61     4. Header Format changes from RFC 5040
 62        4.1. RDMAP Control and Invalidate STag Fields
 63        4.2. RDMA Message Definitions
 64     5. Atomic Operations
 65        5.1. Atomic Operation Details
 66           5.1.1. FetchAdd
 67           5.1.2. Swap
 68           5.1.3. CmpSwap
 69        5.2. Atomic Operations
 70           5.2.1. Atomic Operation Request Message
 71           5.2.2. Atomic Operation Response Message
 72        5.3. Atomicity Guarantees
 73        5.4. Atomic Operations Ordering and Completion Rules
 74     6. Immediate Data
 75        6.1. RDMAP Interactions with the ULP for Immediate Data
 76             Operations
 77        6.2. Immediate Data Header Format
 78        6.3. Immediate Data or Immediate Data with SE Message
 79        6.4. Ordering and Completions
 80     7. Ordering and Completions Table
 81     8. Error Processing
 82        8.1. Errors Detected at the Local Peer
 83        8.2. Errors Detected at the Remote Peer
 84     9. Security Considerations
 85     10. IANA Considerations
 86     11. References
 87        11.1. Normative References
 88        11.2. Informative References
 89     12. Acknowledgments
 90     Appendix A. DDP Segment Formats for RDMA Messages
 91        A.1. DDP Segment for Atomic Operation Request
 92        A.2. DDP Segment for Atomic Response
 93        A.3. DDP Segment for Immediate Data and Immediate Data with SE

 95  1. Introduction

 97     The RDMA Protocol [RFC5040] provides capabilities for zero copy and
 98     kernel bypass data communications. This document specifies the
 99     following extensions to the RDMA Protocol standard:

101     o Atomic operations on remote memory locations. Support for atomic
102       operations enhances the usability of RDMAP in distributed shared
103       memory environments.

105     o Immediate Data messages allow the ULP at the sender to provide a
106       small amount of data following an RDMA Write payload.

108     Other RDMA transport protocols define the functionality added by
109     these extensions, leading to differences in RDMA applications and/or
110     Upper Layer Protocols. Removing these differences in the transport
111     protocols simplifies these applications and ULPs, which is the
112     main motivation for the extensions specified in this document.

114  2. Requirements Language

116     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
117     "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
118     document are to be interpreted as described in RFC 2119 [RFC2119].

120  3. Glossary

122     This document is an extension of [RFC5040] and key words are defined
123     in the glossary of the referenced document.

125     Atomic Operation - an operation that results in the execution of a
126     64-bit operation at a specific address on a remote node. The
127     consumer can use atomic operations to read, modify and write at the
128     destination address while guaranteeing that no other
129     read or write operation will occur across any other RDMAP/DDP
130     Streams on an RNIC at the Data Sink.

132     Atomic Operation Request - An RDMA Message used by the Data Source
133     to perform an atomic operation at the Data Sink.

135     Atomic Operation Response - An RDMA Message used by the Data Sink to
136     describe the completion of an atomic operation at the Data Sink.

138     CmpSwap - an Atomic Operation that is used to compare and swap a
139     value at a specific address on a remote node.

141     FetchAdd - an Atomic Operation that is used to atomically
142     increment a value at a specific address on a remote node.

144     Immediate Data - a small fixed size portion of data sent from the
145     Data Source to a Data Sink.

147     Immediate Data Message - An RDMA Message used by the Data Source to
148     send Immediate Data to the Data Sink.

150     Immediate Data with Solicited Event (SE) Message - An RDMA Message
151     used by the Data Source to send Immediate Data with Solicited Event
152     to the Data Sink.

154     Requester - the sender of an RDMA atomic operation request.

156     Responder - the receiver of an RDMA atomic operation request.

158     Swap - an Atomic Operation that is used to swap a value at a
159     specific address on a remote node.

161  4. Header Format changes from RFC 5040

163     The control information of RDMA Messages is included in DDP protocol
164     defined header fields, with the following new formats:
165     o Four new RDMA Messages carry additional RDMAP headers. The
166       Immediate Data operation and Immediate Data with Solicited Event
167       operation include 8 bytes of data following the DDP header.
168       Atomic Operations include Atomic Request or Atomic Response
169       headers following the DDP header.

171  4.1. RDMAP Control and Invalidate STag Fields

173     Figure 1 depicts the format of the DDP Control and RDMAP Control
174     fields, in the style and convention of [RFC5040]:

176      0                   1                   2                   3
177      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
178                                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
179                                     |T|L| Resrv | DV| RV|Rsv| Opcode|
180     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
181     |                        Invalidate STag                        |
182     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

184            Figure 1 DDP Control and RDMAP Control Fields

186     When the set of extensions specified in this document is implemented,
187     the RDMAP Version (RV) field in the RDMAP Control Field MUST be 01b.

189     Additionally, new RDMA Message Operation Codes are added for the
190     Atomic and Immediate Data operations as shown in Figure 2.

192     -------+-----------+-------+------+-------+-----------+--------------
193     RDMA   | Message   | Tagged| STag | Queue | Invalidate| Message
194     Message| Type      | Flag  | and  | Number| STag      | Length
195     OpCode |           |       | TO   |       |           | Communicated
196            |           |       |      |       |           | between DDP
197            |           |       |      |       |           | and RDMAP
198     -------+-----------+-------+------+-------+-----------+--------------
199     1000b  | Immediate | 0     | N/A  | 0     | N/A       | Yes
200            | Data      |       |      |       |           |
201     -------+-----------+-------+------+-------+-----------+--------------
202     1001b  | Immediate | 0     | N/A  | 0     | N/A       | Yes
203            | Data with |       |      |       |           |
204            | SE        |       |      |       |           |
205     -------+-----------+-------+------+-------+-----------+--------------
206     1010b  | Atomic    | 0     | N/A  | 1     | N/A       | Yes
207            | Request   |       |      |       |           |
208     -------+-----------+-------+------+-------+-----------+--------------
209     1011b  | Atomic    | 0     | N/A  | 3     | N/A       | Yes
210            | Response  |       |      |       |           |
211     -------+-----------+-------+------+-------+-----------+--------------

213            Figure 2 Additional RDMA Usage of DDP Fields

215     Note: N/A means Not Applicable. The Queue Number values follow the
        normative text: Section 6.3 requires Queue Number zero for both
        Immediate Data Messages, and Section 5.2.2 requires Queue Number 3
        for the Atomic Response Message.

217     All other DDP and RDMAP control fields MUST be set as described in
218     [RFC5040].

220  4.2. RDMA Message Definitions

222     The following figure defines which RDMA Headers MUST be used on each
223     new RDMA Message and which new RDMA Messages are allowed to carry
224     ULP payload:

226     -------+-----------+-------------------+-------------------------
227     RDMA   | Message   | RDMA Header Used  | ULP Message allowed in
228     Message| Type      |                   | the RDMA Message
229     OpCode |           |                   |
230            |           |                   |
231     -------+-----------+-------------------+-------------------------
232     1000b  | Immediate | Immediate Data    | No
233            | Data      | Header            |
234     -------+-----------+-------------------+-------------------------
235     1001b  | Immediate | Immediate Data    | No
236            | Data with | Header            |
237            | SE        |                   |
238     -------+-----------+-------------------+-------------------------
239     1010b  | Atomic    | Atomic Request    | No
240            | Request   | Header            |
241     -------+-----------+-------------------+-------------------------
242     1011b  | Atomic    | Atomic Response   | No
243            | Response  | Header            |
244     -------+-----------+-------------------+-------------------------
245                   Figure 3 RDMA Message Definitions

247  5. Atomic Operations

249     The RDMA Protocol Specification in [RFC5040] does not include
250     support for atomic operations, which are an important building block
251     for implementing distributed shared memory.

253     This document extends the RDMA Protocol specification with a set of
254     basic atomic operations, and specifies their resource and ordering
255     rules.

257     Atomic operations as specified in this document execute a 64-bit
258     operation at a specified destination address on a remote node. The
259     operations atomically read, modify and write back the contents of
260     the destination address and guarantee that atomic operations on this
261     address by other Queue Pairs (QPs) on the same RNIC do not occur
262     between the read and the write. Atomic operations as specified in
263     this document MAY be implemented. The discovery of whether the
264     atomic operations are implemented or not is outside the scope of
265     this specification and should be handled by the ULPs or
266     applications.

268     Implementation note: It is recommended that applications do not
269     use the buffer addresses used for atomic operations for other RDMA
270     operations.

272     Atomic operations use the same remote addressing mechanism as RDMA
273     Reads and Writes. The buffer address specified in the request is in
274     the address space of the Remote Peer that the atomic operation is
275     targeted at.

277  5.1. Atomic Operation Details

279     The following sub-sections describe the atomic operations in more
280     detail.

282  5.1.1. FetchAdd

284     The FetchAdd atomic operation requests the responder to read a
285     64-bit Original Remote Data Value at a naturally aligned buffer
286     address in the responder's memory, to perform the FetchAdd operation
287     on multiple fields of selectable length specified by the 64-bit "Add
288     Mask", and to write the result back to the same virtual address. The
289     atomic addition is performed independently on each one of these
290     fields. A bit set in the Add Mask field specifies a field boundary.
291     The FetchAdd atomic operation result is unknown when the buffer
292     address is not naturally aligned. Setting the "Add Mask" field to
293     0x0000000000000000 results in an atomic add of the 64-bit Original
294     Remote Data Value and the 64-bit "Add Data".

296     The pseudo code below describes the masked FetchAdd atomic operation.

298        bit_location = 1
300        carry = 0
302        Remote Data Value = 0
304        for bit = 0 to 63
306        {
308           if (bit != 0) bit_location = bit_location << 1
310           val1 = !(!(Original Remote Data Value & bit_location))
312           val2 = !(!(Add Data & bit_location))
313           sum = carry + val1 + val2
315           carry = !(!(sum & 2))
317           sum = sum & 1
319           if (sum)
321              Remote Data Value |= bit_location
323           carry = ((carry) && (!(Add Mask & bit_location)))
325        }

327     The FetchAdd operation is performed in the endian format of the
328     target memory. The "Original Remote Data" is converted from the
329     endian format of the target memory for return and returned to the
330     requester. The fields are in big-endian format on the wire.

332     The requester specifies:

334     o Remote STag

336     o Remote Tagged Offset

338     o Add Data

340     o Add Mask

342     The responder returns:

344     o Original Remote Data

346  5.1.2. Swap

348     The Swap Atomic Operation requires the responder to read a 64-bit
349     value at a naturally aligned buffer address in the responder's
350     memory, then to write the "Swap Data" field into the same buffer
351     address. The "Original Remote Data" is converted from the endian
352     format of the target memory for return and returned to the
353     requester. The fields are in big-endian format on the wire.

355     The requester specifies:

357     o Remote STag

358     o Remote Tagged Offset

360     o Swap Data

362     The responder returns:

364     o Original Remote Data

366     After the successful completion of the Swap operation, the
367     responder's memory at the specified buffer address contains the
368     "Swap Data" field in the header. The Swap atomic operation result is
369     unknown when the buffer address is not naturally aligned.

371  5.1.3. CmpSwap

373     The CmpSwap Atomic Operation requires the responder to read a 64-bit
374     value at a naturally aligned buffer address in the responder's
375     memory, to perform a logical AND operation using the 64-bit
376     "Compare Mask" field in the Atomic Operation Request header, then to
377     compare it with the result of a logical AND operation of the
378     "Compare Mask" and the "Compare Data" fields in the header, and, if
379     the two values are equal, to swap masked bits in the same buffer
380     address with the masked Swap Data. If the two masked compare values
381     are not equal, the contents of the responder's memory are not
382     changed. In either case, the original value read from the buffer
383     address is converted from the endian format of the target memory for
384     return and returned to the requester. The fields are in big-endian
385     format on the wire.

387     The requester specifies:

389     o Remote STag

391     o Remote Tagged Offset

393     o Swap Data

395     o Swap Mask

397     o Compare Data

399     o Compare Mask

401     The responder returns:

403     o Original Remote Data Value

405     The following pseudo code describes the masked CmpSwap operation
406     result.

408        if (!((Compare Data ^ Original Remote Data Value) & Compare Mask))
410        then
412           Remote Data Value =
414              (Original Remote Data Value & ~(Swap Mask))
416              | (Swap Data & Swap Mask)
418        else
420           Remote Data Value = Original Remote Data Value

422     After the operation, the remote data buffer SHALL contain the
423     "Original Remote Data Value" (if the comparison did not match) or the
424     masked "Swap Data" (if the comparison did match). The CmpSwap atomic
425     operation result is unknown when the buffer address is not naturally
426     aligned.

428  5.2. Atomic Operations

430     The Atomic Operation Request and Response are RDMA Messages. An
431     Atomic Operation makes use of the DDP Untagged Buffer Model. Atomic
432     Operations use the same Queue Number as RDMA Read Requests (QN=1).
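   As a cross-check of the pseudo code in Sections 5.1.1 and 5.1.3, the
   two masked operations can be sketched in Python. This is an
   illustrative sketch only, not part of the protocol definition: the
   names MASK64, masked_fetch_add and masked_cmp_swap are assumptions
   introduced here, operands are assumed to be unsigned 64-bit values,
   and endian conversion of the target memory is left out. Each function
   returns the new Remote Data Value; the value returned to the requester
   is always the original one.

```python
MASK64 = (1 << 64) - 1  # assumed helper: all operands are 64-bit values


def masked_fetch_add(original: int, add_data: int, add_mask: int) -> int:
    """Bit-serial add mirroring the FetchAdd pseudo code.

    A set bit in add_mask marks the top bit of a field, so the carry out
    of that bit is dropped instead of spilling into the next field.
    """
    result = 0
    carry = 0
    for bit in range(64):
        bit_location = 1 << bit
        val1 = 1 if original & bit_location else 0
        val2 = 1 if add_data & bit_location else 0
        total = carry + val1 + val2
        carry = total >> 1          # same as !(!(sum & 2))
        if total & 1:
            result |= bit_location
        if add_mask & bit_location:
            carry = 0               # field boundary: discard the carry
    return result


def masked_cmp_swap(original: int, swap_data: int, swap_mask: int,
                    compare_data: int, compare_mask: int) -> int:
    """Return the new remote value after a masked compare and swap."""
    if ((compare_data ^ original) & compare_mask & MASK64) == 0:
        # Masked compare matched: replace only the bits under swap_mask.
        return ((original & ~swap_mask) | (swap_data & swap_mask)) & MASK64
    # Masked compare failed: remote memory is left unchanged.
    return original & MASK64
```

   With an Add Mask of zero the first function reduces to a plain 64-bit
   add that wraps on overflow; with bit 31 set in the Add Mask the value
   is treated as two independent 32-bit counters.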
433     Reusing the same Queue Number allows the Atomic Operations to reuse
434     the same infrastructure (e.g. ORD/IRD flow control) as defined for
435     RDMA Read Requests.

437     The RDMA Message OpCode for an Atomic Request Message is 1010b. The
438     RDMA Message OpCode for an Atomic Response Message is 1011b.

440  5.2.1. Atomic Operation Request Message

442     The Atomic Operation Request Message carries an Atomic Operation
443     Header that describes the buffer address in the responder's memory.
444     The Atomic Operation Request header immediately follows the DDP
445     header. The RDMAP layer passes to the DDP layer an RDMAP Control
446     Field. The following figure depicts the Atomic Operation Request
447     Header that MUST be used for all Atomic Operation Request Messages:

449      0                   1                   2                   3
450      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
451     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
452     |                  Reserved (Not Used)                  |AOpCode|
453     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
454     |                      Request Identifier                       |
455     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
456     |                          Remote STag                          |
457     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
458     |                     Remote Tagged Offset                      |
459     +                                                               +
460     |                                                               |
461     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
462     |                       Add or Swap Data                        |
463     +                                                               +
464     |                                                               |
465     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
466     |                       Add or Swap Mask                        |
467     +                                                               +
468     |                                                               |
469     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
470     |                         Compare Data                          |
471     +                                                               +
472     |                                                               |
473     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
474     |                         Compare Mask                          |
475     +                                                               +
476     |                                                               |
477     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

479               Figure 4 Atomic Operation Request Header

481     Reserved (Not Used): 28 bits.

483        This field MUST be set to zero on transmit, ignored on
484        receive.

486     Atomic Operation Code (AOpCode): 4 bits.

488        See Figure 5.

490     Request Identifier: 32 bits.

492        The Request Identifier specifies a number that is used to
493        identify the Atomic Operation Request Message. The use of this
494        field is implementation dependent and outside the scope of
495        this specification.

497     Remote STag: 32 bits.

499        The Remote STag identifies the Remote Peer's Tagged Buffer
500        targeted by the atomic operation. The Remote STag is
501        associated with the RDMAP Stream through a mechanism that is
502        outside the scope of the RDMAP specification.

504     Remote Tagged Offset: 64 bits.

506        The Remote Tagged Offset specifies the starting offset, in
507        octets, from the base of the Remote Peer's Tagged Buffer
508        targeted by the atomic operation. The Remote Tagged Offset MAY
509        start at an arbitrary offset.

511     Add or Swap Data: 64 bits.

513        The Add or Swap Data field specifies the 64-bit "Add Data"
514        value in an Atomic FetchAdd Operation or the 64-bit "Swap
515        Data" value in an Atomic Swap or CmpSwap Operation.

517     Add or Swap Mask: 64 bits.

519        This field is used in masked atomic operations (FetchAdd and
520        CmpSwap) to perform a bitwise logical AND operation as specified
521        in the definition of these operations. For non-masked atomic
522        operations (Swap), this field MUST be set to ffffffffffffffffh on
523        transmit and ignored by the receiver.

525     Compare Data: 64 bits.

527        The Compare Data field specifies the 64-bit "Compare Data"
528        value in an Atomic CmpSwap Operation. For Atomic FetchAdd and
529        Atomic Swap operations, the Compare Data field MUST be set to
530        zero on transmit and ignored by the receiver.

532     Compare Mask: 64 bits.

534        This field is used in the masked atomic operation CmpSwap to
535        perform a bitwise logical AND operation as specified in the
536        definition of that operation. For the atomic operations
537        FetchAdd and Swap, this field MUST be set to
538        ffffffffffffffffh on transmit and ignored by the receiver.
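   The header layout of Figure 4 and the field rules above can be
   illustrated with a short Python sketch that serializes the 52-octet
   Atomic Operation Request Header in big-endian (wire) order. It is a
   minimal sketch under stated assumptions, not a reference encoder: the
   names ATOMIC_REQ, pack_atomic_request and unpack_atomic_request are
   hypothetical, the AOpCode constants are taken from Figure 5, and the
   default arguments encode the "MUST be set on transmit" values for the
   Swap operation. A FetchAdd caller is expected to pass its own Add Mask.

```python
import struct

# One 32-bit word (28 reserved bits + 4-bit AOpCode), two more 32-bit
# words (Request Identifier, Remote STag), then five 64-bit fields.
ATOMIC_REQ = struct.Struct(">IIIQQQQQ")

FETCH_ADD, SWAP, CMP_SWAP = 0b0000, 0b0001, 0b0010  # AOpCode values
ALL_ONES = 0xFFFFFFFFFFFFFFFF


def pack_atomic_request(aopcode, request_id, remote_stag, remote_to,
                        data, mask=ALL_ONES,
                        compare_data=0, compare_mask=ALL_ONES):
    """Serialize an Atomic Operation Request Header (hypothetical helper)."""
    if not 0 <= aopcode <= 0xF:
        raise ValueError("AOpCode is a 4-bit field")
    # Reserved bits are zero on transmit, so word 0 is just the AOpCode.
    return ATOMIC_REQ.pack(aopcode, request_id, remote_stag, remote_to,
                           data, mask, compare_data, compare_mask)


def unpack_atomic_request(buf):
    """Parse a header; reserved bits are ignored on receive."""
    (word0, request_id, remote_stag, remote_to,
     data, mask, compare_data, compare_mask) = ATOMIC_REQ.unpack(buf)
    return (word0 & 0xF, request_id, remote_stag, remote_to,
            data, mask, compare_data, compare_mask)
```

   For example, pack_atomic_request(SWAP, 7, 0x1234, 64, 0xDEADBEEF)
   yields a 52-octet header whose first word carries only the Swap
   AOpCode, with the mask fields set to all ones and Compare Data set to
   zero.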
540 ---------+-----------+----------+----------+---------+--------- 541 Atomic | Atomic | Add or | Add or | Compare | Compare 542 Operation| Operation | Swap | Swap | Data | Mask 543 OpCode | | Data | Mask | | 544 ---------+-----------+----------+----------+---------+--------- 545 0000b | FetchAdd | Add Data | Add Mask | N/A | N/A 546 ---------+-----------+----------+----------+---------+--------- 547 0001b | Swap | Swap Data| N/A | N/A | N/A 548 ---------+-----------+----------+----------+---------+--------- 549 0010b | CmpSwap | Swap Data| Swap Mask| Valid | Valid 550 ---------+-----------+----------+----------+---------+--------- 551 0011b | | 552 to | Reserved | Not Specified 553 1111b | | 554 ---------+-----------+----------------------------------------- 556 Figure 5 Atomic Operation Message Definitions 558 The Atomic Operation Request Message has the following semantics: 560 1. An Atomic Operation Request Message MUST reference an Untagged 561 Buffer. That is, the Local Peer's RDMAP layer MUST request that 562 the DDP mark the Message as Untagged. 564 2. One Atomic Operation Request Message MUST consume one Untagged 565 Buffer. 567 3. The Remote Peer's RDMAP layer MUST process an Atomic Operation 568 Request Message. A valid Atomic Operation Request Message MUST 569 NOT be delivered to the Data Sink's ULP (i.e., it is processed by 570 the RDMAP layer). 572 4. At the Remote Peer, when an invalid Atomic Operation Request 573 Message is delivered to the Remote Peer's RDMAP layer, an error 574 is surfaced. 576 5. An Atomic Operation Request Message MUST reference the RDMA Read 577 Request Queue. That is, the Local Peer's RDMAP layer MUST 578 request that the DDP layer set the Queue Number field to one. 580 6. The Local Peer MUST pass to the DDP layer Atomic Operation 581 Request Messages in the order they were submitted by the ULP. 583 7. The Remote Peer MUST process the Atomic Operation Request 584 Messages in the order they were sent. 586 8. 
If the Data Source receives a valid Atomic Operation Request 587 Message, it MUST respond with a valid Atomic Operation Response 588 Message. 590 5.2.2. Atomic Operation Response Message 592 The Atomic Operation Response Message carries an Atomic Operation 593 Response Header that contains the "Original Request Identifier" and 594 "Original Remote Data Value". The Atomic Operation Response Header 595 immediately follows the DDP header. The RDMAP layer passes to the 596 DDP layer a RDMAP Control Field. The following figure depicts the 597 Atomic Operation Response header that MUST be used for all Atomic 598 Operation Response Messages: 600 0 1 2 3 601 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 602 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 603 | Original Request Identifier | 604 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 605 | Original Remote Data Value | 606 + + 607 | | 608 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 610 Figure 6 Atomic Operation Response Header 612 Original Request Identifier: 32 bits. 614 The Original Request Identifier MUST be set to the value 615 specified in the Request Identifier field that was originally 616 provided in the corresponding Atomic Operation Request 617 Message. 619 Original Remote Data Value: 64 bits. 621 The Original Remote Value specifies the original 64-bit value 622 stored at the buffer address targeted by the atomic operation. 624 The Atomic Operation Response Message has the following semantics: 626 1. The Atomic Operation Response Message for the associated Atomic 627 Operation Request Message travels in the opposite direction. 629 2. An Atomic Operation Response Message MUST consume an Untagged 630 Buffer. That is, the Data Source RDMAP layer MUST request that 631 the DDP mark the Message as Untagged. 633 3. An Atomic Operation Response Message MUST reference the Queue 634 Number 3. 
That is, the Local Peer's RDMAP layer MUST request 635 that the DDP layer set the Queue Number field to 3. 637 4. The Data Source MUST ensure that a sufficient number of Untagged 638 Buffers are available on the RDMA Read Request Queue (Queue with 639 DDP Queue Number 1) to support the maximum number of Atomic 640 Operation Requests negotiated by the ULP. 642 5. The RDMAP layer MUST Deliver the Atomic Operation Response 643 Message to the ULP. 645 6. At the Remote Peer, when an invalid Atomic Operation Response 646 Message is delivered to the Remote Peer's RDMAP layer, an error 647 is surfaced. 649 7. The Data Source RDMAP layer MUST pass Atomic Operation Response 650 Messages to the DDP layer, in the order that the Atomic Operation 651 Request Messages were received by the RDMAP layer, at the Data 652 Source. 654 5.3. Atomicity Guarantees 656 Atomicity of the RMW on the responder's node by the Atomic Operation 657 SHALL be assured in the presence of concurrent atomic accesses by 658 other QPs on the same RNIC. 660 5.4. Atomic Operations Ordering and Completion Rules 662 In addition to the ordering and completion rules described in 663 RFC5040 [RFC5040], the following rules apply to implementations of 664 the Atomic operations. 666 1. For an Atomic operation, the contents of the Tagged Buffer at the 667 Data Sink MAY be indeterminate until the Atomic Operation 668 Response Message has been Delivered at the Local Peer. 670 2. Atomic Operation Request Messages MUST NOT start processing at 671 the Remote Peer until they have been Delivered to RDMAP by DDP. 673 3. Atomic Operation Response Messages MAY be generated at the Remote 674 Peer after subsequent RDMA Write Messages or Send Messages have 675 been Placed or Delivered. 677 4. 
            Atomic Operation Response Message processing at the Remote Peer
678         MUST be started only after the Atomic Operation Request Message
679         has been Delivered by the DDP layer (thus, all previous RDMA
680         Messages have been properly submitted for ordered Placement).

682      5. Send Messages MAY be Completed at the Remote Peer (Data Sink)
683         before prior incoming Atomic Operation Request Messages have
684         completed their response processing.

686      6. An Atomic Operation MUST NOT be Completed at the Local Peer until
687         the DDP layer Delivers the associated incoming Atomic Operation
688         Response Message.

690      7. If more than one outstanding Atomic Request Message is
691         supported by both peers, the Atomic Operation Request Messages
692         MUST be processed in the order they were Delivered by the DDP
693         layer on the Remote Peer. Atomic Operation Response Messages MUST
694         be submitted to the DDP layer on the Remote Peer in the order the
695         Atomic Operation Request Messages were Delivered by DDP.

697   6. Immediate Data

699      The Immediate Data operation is used in conjunction with an RDMA
700      Write operation to improve ULP processing efficiency. It carries 8
701      bytes of immediate data, which are placed in a Completion Queue Entry
702      (CQE) after the previous operation has been Delivered at the Remote
703      Peer.

705   6.1. RDMAP Interactions with the ULP for Immediate Data Operations

707      For Immediate Data operations, the following are the interactions
708      between the RDMAP Layer and the ULP:
709      . At the Data Source:

711         . The ULP passes to the RDMAP Layer the following:

713            . Eight bytes of ULP Immediate Data

715         . When the Immediate Data operation Completes, the RDMAP Layer
716           passes an indication of the Completion results to the ULP.

718      . At the Data Sink:

720         . If the Immediate Data operation is Completed successfully,
721           the RDMAP Layer passes the following information to the ULP
722           Layer:

724            . Eight bytes of Immediate Data

726            . An Event, if the Data Sink is configured to generate an
727              Event and the RDMA Message Opcode indicates Message Type
728              Immediate Data with Solicited Event.

730         . If the Immediate Data operation is Completed in error, the
731           Data Sink RDMAP Layer will pass up the corresponding error
732           information to the Data Sink ULP and send a Terminate
733           Message to the Data Source RDMAP Layer. The Data Source
734           RDMAP Layer will then pass up the Terminate Message to the
735           ULP.

737   6.2. Immediate Data Header Format

739      The Immediate Data and Immediate Data with SE Messages carry
740      immediate data as shown in Figure 7. The RDMAP layer passes to the
741      DDP layer an RDMAP Control Field and 8 bytes of Immediate Data. The
742      first 8 bytes of the data following the DDP header contain the
743      Immediate Data. See section A.3 for the DDP segment format of an
744      Immediate Data or Immediate Data with SE Message.

746       0                   1                   2                   3
747       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
748      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
749      |                        Immediate Data                         |
750      +                                                               +
751      |                                                               |
752      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

754      Figure 7 Immediate Data or Immediate Data with SE Message Header

756      Immediate Data: 64 bits.
757         Eight bytes of data transferred from the Requester to an
758         Untagged Buffer at the Responder.

760   6.3. Immediate Data or Immediate Data with SE Message

762      The Immediate Data or Immediate Data with SE Message uses the DDP
763      Untagged Buffer Model to transfer Immediate Data from the Data
764      Source to the Data Sink.
765      . An Immediate Data or Immediate Data with SE Message MUST
766        reference an Untagged Buffer. That is, the Local Peer's RDMAP
767        Layer MUST request that the DDP layer mark the Message as
768        Untagged.

770      . One Immediate Data or Immediate Data with SE Message MUST consume
771        one Untagged Buffer.

773      . At the Remote Peer, Immediate Data and Immediate Data with SE
774        Messages MUST be Delivered to the Remote Peer's ULP in the order
775        they were sent.

777      . For an Immediate Data or Immediate Data with SE Message, the
778        Local Peer's RDMAP Layer MUST request that the DDP layer set the
779        Queue Number field to zero.

781      . For an Immediate Data or Immediate Data with SE Message, the
782        Local Peer's RDMAP Layer MUST request that the DDP layer transmit
783        8 bytes of data.

785      . The Local Peer MUST issue Immediate Data and Immediate Data with
786        SE Messages in the order they were submitted by the ULP.

788      . The Remote Peer MUST check that Immediate Data and Immediate Data
789        with SE Messages include exactly 8 bytes of data from the DDP
790        layer.

792   6.4. Ordering and Completions

794      Ordering and completion rules for Immediate Data are the same as
795      those for a Send operation as described in section 5.5 of RFC 5040.

797   7. Ordering and Completions Table

799      The following table summarizes the ordering relationships for Atomic
800      and Immediate Data operations from the standpoint of the Local Peer
801      issuing the Operations. Note that in the table that follows, Send
802      includes Send, Send with Invalidate, Send with Solicited Event, and
803      Send with Solicited Event and Invalidate. Also note that in the table
804      below, Immediate Data includes Immediate Data and Immediate Data
805      with Solicited Event.
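
Informative sketch (not part of the specification): the Immediate Data rules of section 6.3 can be illustrated roughly as follows, assuming a hypothetical in-memory stand-in for the RDMAP-to-DDP interface. The names DdpUntaggedRequest, build_immediate_data, and check_immediate_data are invented for this illustration and do not correspond to any real API; the sketch only mirrors the Untagged, Queue Number zero, and exactly-8-bytes requirements.

```python
from dataclasses import dataclass

IMMEDIATE_DATA_LEN = 8  # section 6.3: exactly 8 bytes of data
SEND_QUEUE_NUMBER = 0   # section 6.3: Queue Number field set to zero


@dataclass(frozen=True)
class DdpUntaggedRequest:
    """Hypothetical request handed from RDMAP to DDP (illustration only)."""
    tagged: bool            # False: DDP is asked to mark the Message Untagged
    queue_number: int       # DDP Queue Number requested by RDMAP
    solicited_event: bool   # True for Immediate Data with SE
    payload: bytes          # the 8 bytes of Immediate Data


def build_immediate_data(immediate: bytes,
                         solicited_event: bool = False) -> DdpUntaggedRequest:
    """Data Source side: request an Untagged, QN 0, 8-byte transfer."""
    if len(immediate) != IMMEDIATE_DATA_LEN:
        raise ValueError("Immediate Data must be exactly 8 bytes")
    return DdpUntaggedRequest(
        tagged=False,
        queue_number=SEND_QUEUE_NUMBER,
        solicited_event=solicited_event,
        payload=bytes(immediate),
    )


def check_immediate_data(payload: bytes) -> bytes:
    """Data Sink side: accept only exactly 8 bytes from the DDP layer."""
    if len(payload) != IMMEDIATE_DATA_LEN:
        # Assumption: a real RDMAP layer would report the error to the ULP
        # and send a Terminate Message instead of raising an exception.
        raise ValueError("Immediate Data Message must carry exactly 8 bytes")
    return payload
```

In this sketch the 8 bytes are treated as opaque; any interpretation of their contents is left to the ULP.
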
807   ----------+------------+-------------+-------------+-------------------
808   First     | Second     | Placement   | Placement   | Ordering
809   Operation | Operation  | Guarantee at| Guarantee at| Guarantee at
810             |            | Remote Peer | Local Peer  | Remote Peer
811   ----------+------------+-------------+-------------+-------------------
812   Immediate | Send       | No Placement| Not         | Completed in
813   Data      |            | Guarantee   | Applicable  | Order
814             |            | between Send|             |
815             |            | Payload and |             |
816             |            | Immediate   |             |
817             |            | Data        |             |
818   ----------+------------+-------------+-------------+-------------------
819   Immediate | RDMA       | No Placement| Not         | Not
820   Data      | Write      | Guarantee   | Applicable  | Applicable
821             |            | between RDMA|             |
822             |            | Write       |             |
823             |            | Payload and |             |
824             |            | Immediate   |             |
825             |            | Data        |             |
826   ----------+------------+-------------+-------------+-------------------
827   Immediate | RDMA       | No Placement| RDMA Read   | RDMA Read
828   Data      | Read       | Guarantee   | Response    | Response
829             |            | between     | will not be | Message will
830             |            | Immediate   | Placed until| not be
831             |            | Data and    | Immediate   | generated
832             |            | RDMA Read   | Data is     | until
833             |            | Request     | Placed at   | Immediate Data
834             |            |             | Remote Peer | has been
835             |            |             |             | Completed
836   ----------+------------+-------------+-------------+-------------------
837   Immediate | Atomic     | No Placement| Atomic      | Atomic
838   Data      |            | Guarantee   | Response    | Response
839             |            | between     | will not be | Message will
840             |            | Immediate   | Placed until| not be
841             |            | Data and    | Immediate   | generated
842             |            | Atomic      | Data is     | until
843             |            | Request     | Placed at   | Immediate Data
844             |            |             | Remote Peer | has been
845             |            |             |             | Completed
847   ----------+------------+-------------+-------------+-------------------
848   Immediate | Immediate  | No Placement| Not         | Completed in
849   Data or   | Data       | Guarantee   | Applicable  | Order
850   Send      |            |             |             |
851   ----------+------------+-------------+-------------+-------------------
852   RDMA Write| Immediate  | No Placement| Not         | Immediate Data
853             | Data       | Guarantee   | Applicable  | is Completed
854             |            |             |             | after RDMA
855             |            |             |             | Write is Placed
856             |            |             |             | and Delivered
857   ----------+------------+-------------+-------------+-------------------
858   RDMA Read | Immediate  | No Placement| Immediate   | Not Applicable
859             | Data       | Guarantee   | Data may be |
860             |            | between     | Placed      |
861             |            | Immediate   | before      |
862             |            | Data and    | RDMA Read   |
863             |            | RDMA Read   | Response is |
864             |            | Request     | generated   |
865   ----------+------------+-------------+-------------+-------------------
866   Atomic    | Immediate  | No Placement| Immediate   | Not Applicable
867             | Data       | Guarantee   | Data may be |
868             |            | between     | Placed      |
869             |            | Immediate   | before      |
870             |            | Data and    | Atomic      |
871             |            | Atomic      | Response is |
872             |            | Request     | generated   |
873   ----------+------------+-------------+-------------+-------------------
874   Atomic    | Send       | No Placement| Send Payload| Not Applicable
875             |            | Guarantee   | may be      |
876             |            | between Send| Placed      |
877             |            | Payload and | before      |
878             |            | Atomic      | Atomic      |
879             |            | Request     | Response is |
880             |            |             | generated   |
881   ----------+------------+-------------+-------------+-------------------
882   Atomic    | RDMA       | No Placement| RDMA Write  | Not
883             | Write      | Guarantee   | Payload may | Applicable
884             |            | between RDMA| be Placed   |
885             |            | Write       | before      |
886             |            | Payload and | Atomic      |
887             |            | Atomic      | Response is |
888             |            | Request     | generated   |
889   ----------+------------+-------------+-------------+-------------------
890   Atomic    | RDMA       | No Placement| No Placement| RDMA Read
891             | Read       | Guarantee   | Guarantee   | Response
892             |            | between     | between     | Message will
893             |            | Atomic      | Atomic      | not be
894             |            | Request and | Response    | generated
895             |            | RDMA Read   | and RDMA    | until Atomic
896             |            | Request     | Read        | Response Message
897             |            |             | Response    | has been
898             |            |             |             | generated
899   ----------+------------+-------------+-------------+-------------------
900   Atomic    | Atomic     | No Placement| No Placement| Second Atomic
901             |            | Guarantee   | Guarantee   | Response
902             |            | between two | between two | Message will
903             |            | Atomic      | Atomic      | not be
904             |            | Requests    | Responses   | generated
905             |            |             |             | until first
906             |            |             |             | Atomic Response
907             |            |             |             | has been
908             |            |             |             | generated
909   ----------+------------+-------------+-------------+-------------------
910   Send      | Atomic     | No Placement| Atomic      | Atomic Response
911             |            | Guarantee   | Response    | Message will not
912             |            | between Send| will not be | be generated until
913             |            | Payload and | Placed at   | Send has been
914             |            | Atomic      | the Local   | Completed
915             |            | Request     | Peer until  |
916             |            |             | Send Payload|
917             |            |             | is Placed   |
918             |            |             | at the      |
919             |            |             | Remote Peer |
920   ----------+------------+-------------+-------------+-------------------
921   RDMA      | Atomic     | No Placement| Atomic      | Not
922   Write     |            | Guarantee   | Response    | Applicable
923             |            | between RDMA| will not be |
924             |            | Write       | Placed at   |
925             |            | Payload and | the Local   |
926             |            | Atomic      | Peer until  |
927             |            | Request     | RDMA Write  |
928             |            |             | Payload is  |
929             |            |             | Placed at   |
930             |            |             | Remote Peer |
931   ----------+------------+-------------+-------------+-------------------
932   RDMA      | Atomic     | No Placement| No Placement| Atomic Response
933   Read      |            | Guarantee   | Guarantee   | Message will
934             |            | between     | between     | not be generated
935             |            | Atomic      | Atomic      | until RDMA
936             |            | Request and | Response    | Read Response
937             |            | RDMA Read   | and RDMA    | has been
938             |            | Request     | Read        | generated
939             |            |             | Response    |
940   ----------+------------+-------------+-------------+-------------------

942   8. Error Processing

944      In addition to error processing described in section 7 of RFC 5040,
945      the following rules apply for the new RDMA Messages defined in this
946      specification.

948   8.1. Errors Detected at the Local Peer

950      The Local Peer MUST send a Terminate Message for each of the
951      following cases:

953      1.
            For errors detected while creating an Atomic Request, Atomic
954         Response, Immediate Data, or Immediate Data with SE Message, or
955         other reasons not directly associated with an incoming Message,
956         the Terminate Message and Error Code are sent instead of the
957         Message. In this case, the Error Type and Error Code fields are
958         included in the Terminate Message, but the Terminated DDP Header
959         and Terminated RDMA Header fields are set to zero.

961      2. For errors detected on an incoming Atomic Request, Atomic
962         Response, Immediate Data, or Immediate Data with Solicited Event
963         (after the Message has been Delivered by DDP), the Terminate
964         Message is sent at the earliest possible opportunity, preferably
965         in the next outgoing RDMA Message. In this case, the Error Type,
966         Error Code, and Terminated DDP Header fields are included in the
967         Terminate Message, but the Terminated RDMA Header field is set to
968         zero.

970   8.2. Errors Detected at the Remote Peer

972      On incoming Atomic Requests, Atomic Responses, Immediate Data, and
973      Immediate Data with Solicited Event, the following must be
974      validated:

976      1. The DDP layer MUST validate all DDP Segment fields.

978      2. The RDMA OpCode MUST be valid.

980      3. The RDMA Version MUST be valid.

982   9. Security Considerations

984      This document specifies extensions to the RDMA Protocol
985      specification in [RFC5040], and as such the Security Considerations
986      discussed in Section 8 of [RFC5040] apply.

988   10. IANA Considerations

990      This document requests no direct action from IANA.

992   11. References

994   11.1. Normative References

996      [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
997                Requirement Levels", BCP 14, RFC 2119, March 1997.

999      [RFC5040] Recio, R. et al., "A Remote Direct Memory Access Protocol
1000               Specification", RFC 5040, October 2007.

1002     [RFC5041] Shah, H. et al., "Direct Data Placement over Reliable
1003               Transports", RFC 5041, October 2007.

1005  11.2. Informative References

1007  12. Acknowledgments

1009     The authors would like to acknowledge the following contributors who
1010     provided valuable comments and suggestions.

1012     o Steve Wise.

1014     This document was prepared using 2-Word-v2.0.template.dot.

1016  Appendix A. DDP Segment Formats for RDMA Messages

1018     This appendix is for information only and is NOT part of the
1019     standard. It simply depicts the DDP Segment format for the various
1020     RDMA Messages.

1022  A.1. DDP Segment for Atomic Operation Request

1024     The following figure depicts an Atomic Operation Request DDP
1025     Segment:

1027      0                   1                   2                   3
1028      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1029                                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1030                                     |  DDP Control  | RDMA Control  |
1031     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1032     |                      Reserved (Not Used)                      |
1033     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1034     |          DDP (Atomic Operation Request) Queue Number          |
1035     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1036     |    DDP (Atomic Operation Request) Message Sequence Number     |
1037     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1038     |         DDP (Atomic Operation Request) Message Offset         |
1039     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1040     |                  Reserved (Not Used)                  |AOpCode|
1041     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1042     |                      Request Identifier                       |
1043     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1044     |                          Remote STag                          |
1045     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1046     |                     Remote Tagged Offset                      |
1047     +                                                               +
1048     |                                                               |
1049     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1050     |                       Add or Swap Data                        |
1051     +                                                               +
1052     |                                                               |
1053     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1054     |                       Add or Swap Mask                        |
1055     +                                                               +
1056     |                                                               |
1057     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1058     |                         Compare Data                          |
1059     +                                                               +
1060     |                                                               |
1061     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1062     |                         Compare Mask                          |
1063     +                                                               +
1064     |                                                               |
1065     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1067  A.2. DDP Segment for Atomic Response

1069     The following figure depicts an Atomic Operation Response DDP
1070     Segment:

1072      0                   1                   2                   3
1073      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1074                                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1075                                     |  DDP Control  | RDMA Control  |
1076     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1077     |                      Reserved (Not Used)                      |
1078     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1079     |          DDP (Atomic Operation Request) Queue Number          |
1080     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1081     |    DDP (Atomic Operation Request) Message Sequence Number     |
1082     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1083     |         DDP (Atomic Operation Request) Message Offset         |
1084     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1085     |                  Original Request Identifier                  |
1086     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1087     |                     Original Remote Value                     |
1088     +                                                               +
1089     |                                                               |
1090     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1092  A.3. DDP Segment for Immediate Data and Immediate Data with SE

1094     The following figure depicts an Immediate Data or Immediate Data
1095     with SE DDP Segment:

1097      0                   1                   2                   3
1098      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
1099                                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1100                                     |  DDP Control  | RDMA Control  |
1101     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1102     |                      Reserved (Not Used)                      |
1103     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1104     |                    DDP (Send) Queue Number                    |
1105     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1106     |              DDP (Send) Message Sequence Number               |
1107     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1108     |                      DDP Message Offset                       |
1109     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
1110     |                        Immediate Data                         |
1111     +                                                               +
1112     |                                                               |
1113     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

1115  Authors' Addresses

1117     Hemal Shah
1118     Broadcom Corporation
1119     5300 California Avenue
1120     Irvine, CA 92617
1121     Phone: 1-949-926-6941
1122     Email: hemal@broadcom.com

1124     Felix Marti
1125     Chelsio Communications, Inc.
1126     370 San Aleso Ave.
1127     Sunnyvale, CA 94085
1128     Phone: 1-408-962-3600
1129     Email: felix@chelsio.com

1131     Asgeir Eiriksson
1132     Chelsio Communications, Inc.
1133     370 San Aleso Ave.
1134     Sunnyvale, CA 94085
1135     Phone: 1-408-962-3600
1136     Email: asgeir@chelsio.com

1138     Wael Noureddine
1139     Chelsio Communications, Inc.
1140     370 San Aleso Ave.
1141     Sunnyvale, CA 94085
1142     Phone: 1-408-962-3600
1143     Email: wael@chelsio.com

1145     Robert Sharp
1146     Intel Corporation
1147     1501 South Mopac, Suite 400, Mailstop: AN1-WTR1
1148     Austin, TX 78746
1149     Phone: 1-512-493-3242
1150     Email: robert.o.sharp@intel.com