Storage Maintenance (storm) Working Group                     Hemal Shah
Internet Draft                                      Broadcom Corporation
Intended status: Standards Track                             Felix Marti
Expires: October 2014                                    Wael Noureddine
                                                        Asgeir Eiriksson
                                            Chelsio Communications, Inc.
                                                            Robert Sharp
                                                       Intel Corporation
                                                          April 16, 2014


                        RDMA Protocol Extensions
                   draft-ietf-storm-rdmap-ext-10.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 16, 2014.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   This document specifies extensions to the IETF Remote Direct Memory
   Access Protocol (RDMAP RFC5040). RDMAP provides read and write
   services directly to applications and enables data to be transferred
   directly into Upper Layer Protocol (ULP) Buffers without
   intermediate data copies. The extensions specified in this document
   provide the following capabilities and/or improvements: Atomic
   Operations and Immediate Data.

Table of Contents

   1. Introduction
      1.1. Discovery of RDMAP Extensions
   2. Requirements Language
   3.
      Glossary
   4. Header Format Extensions
      4.1. RDMAP Control and Invalidate STag Fields
      4.2. RDMA Message Definitions
   5. Atomic Operations
      5.1. Atomic Operation Details
         5.1.1. FetchAdd
         5.1.2. CmpSwap
      5.2. Atomic Operations
         5.2.1. Atomic Operation Request Message
         5.2.2. Atomic Operation Response Message
      5.3. Atomicity Guarantees
      5.4. Atomic Operations Ordering and Completion Rules
   6. Immediate Data
      6.1. RDMAP Interactions with ULP for Immediate Data
      6.2. Immediate Data Header Format
      6.3. Immediate Data or Immediate Data with SE Message
      6.4. Ordering and Completions
   7. Ordering and Completions Table
   8. Error Processing
      8.1. Errors Detected at the Local Peer
      8.2. Errors Detected at the Remote Peer
   9. Security Considerations
   10. IANA Considerations
      10.1. RDMAP Message Atomic Operation Subcodes
      10.2. RDMAP Queue Numbers
   11. References
      11.1. Normative References
      11.2. Informative References
   12.
       Acknowledgments
   Appendix A. DDP Segment Formats for RDMA Messages
      A.1. DDP Segment for Atomic Operation Request
      A.2. DDP Segment for Atomic Response
      A.3. DDP Segment for Immediate Data and Immediate Data with SE

1. Introduction

   The RDMA Protocol [RFC5040] provides capabilities for zero copy data
   communications that preserve memory protection semantics, enabling
   more efficient network protocol implementations. The RDMA Protocol
   is part of the iWARP family of specifications, which also includes
   RFC 5041 [RFC5041], RFC 5044 [RFC5044], and RFC 6581 [RFC6581]. This
   document specifies the following extensions to the RDMA Protocol
   (RDMAP):

   o Atomic operations on remote memory locations. Support for atomic
     operations enhances the usability of RDMAP in distributed shared
     memory environments.

   o Immediate Data messages allow the ULP at the sender to provide a
     small amount of data. When an Immediate Data message is sent
     following an RDMA Write Message, the combination of the two
     messages is an implementation of the RDMA Write with Immediate
     message that is found in other RDMA transport protocols.

   Other RDMA transport protocols define the functionality added by
   these extensions, leading to differences in RDMA applications and/or
   Upper Layer Protocols. Removing these differences in the transport
   protocols simplifies these applications and ULPs, and that is the
   main motivation for the extensions specified in this document.

   RSockets [RSOCKETS] is an example of RDMA enabled middleware that
   provides a socket interface as the upper edge interface and utilizes
   RDMA to provide more efficient networking for sockets based
   applications. RSockets is aware of Immediate Data support in
   InfiniBand [IB].
   RSockets cannot utilize the RDMA Write with Immediate Data operation
   from InfiniBand when running over iWARP. The addition of the
   Immediate Data operation specified in this draft will alleviate this
   difference in RSockets when running on InfiniBand and iWARP.

   Structured high performance computing applications based on the MPI
   interface [MPI] may use Atomic Operations defined in this
   specification. DAT Atomics [DAT_ATOMICS] is an example of RDMA
   enabled middleware that provides a portable RDMA programming
   interface for various RDMA transport protocols. DAT Atomics
   includes a primitive for InfiniBand that is not supported by iWARP
   RDMA Network Interface Controllers (RNICs). The addition of Atomic
   Operations as specified in this draft will allow atomic operations
   in DAT Atomics to work for both InfiniBand and RNICs
   interchangeably.

   For more background on RDMA Protocol applicability, see
   Applicability of Remote Direct Memory Access Protocol (RDMA) and
   Direct Data Placement Protocol (DDP) [RFC5045].

1.1. Discovery of RDMAP Extensions

   Today there are RDMA applications and/or ULPs that are aware of the
   existence of Atomic and Immediate data operations for RDMA
   transports such as InfiniBand and application programming interfaces
   such as Open Fabrics Verbs [OFAVERBS]. These applications need to be
   aware that RDMAP does not support some of these operations.
   Typically the availability of these capabilities is exposed to the
   applications through adapter query interfaces in software.
   Applications then have to decide to use or not to use Immediate Data
   or Atomic Operations based on the results of the query interfaces.
   Such query interfaces typically return the scope of atomicity
   guarantees, not the individual Atomic Operations supported.
   Therefore, this specification requires all Atomic Operations defined
   within to be supported if an RNIC supports any Atomic Operations.

   In cases where heterogeneous hardware, with differing support for
   Atomic Operations and Immediate Data Operations, is deployed for use
   by RDMA applications and/or ULPs, applications are either statically
   configured to use or not use optional features or use application
   specific negotiation mechanisms. For the extensions covered by this
   document, it is RECOMMENDED that RDMA applications and/or ULPs
   negotiate at the application or ULP level the usage of these
   extensions. The definition of such application specific mechanisms
   is outside the scope of this specification. For backward
   compatibility, existing applications and/or ULPs should not assume
   that these extensions are supported.

   In the absence of application specific negotiation of the features
   defined within this specification, the new operations can be
   attempted and reported errors can be used to determine a remote
   peer's capabilities. In the case of Atomics, a FetchAdd operation
   with Add Data set to 0 can safely be used to determine the existence
   of Atomic Operations without modifying the content of a remote
   peer's memory. A Remote Operation Error / Unexpected OpCode error
   will be reported by the remote peer if it does not support the
   attempted Immediate Data or Atomic Operation.

2. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [RFC2119].

3. Glossary

   This document is an extension of RFC 5040 and key words are defined
   in the glossary of the referenced document.

   Atomic Operation - is an operation that results in an execution of a
   memory operation at a specific ULP Buffer address on a remote node
   using the Tagged Buffer data transfer model. The consumer can use
   Atomic Operations to read, modify and write memory at the
   destination ULP Buffer address while at the same time guaranteeing
   that no other Atomic Operation read or write accesses to the ULP
   Buffer address targeted by the Atomic Operation will occur across
   any other RDMAP Streams on an RNIC at the Responder.

   Atomic Operation Request - An RDMA Message used by the Data Source
   to perform an Atomic Operation at the Responder.

   Atomic Operation Response - An RDMA Message used by the Responder to
   describe the completion of an Atomic Operation at the Responder.

   CmpSwap - is an Atomic Operation that is used to compare and swap a
   value at a specific address on a remote node.

   FetchAdd - is an Atomic Operation that is used to atomically
   increment a value at a specific ULP Buffer address on a remote node.

   Immediate Data - a small fixed size portion of data sent from the
   Data Source to a Data Sink.

   Immediate Data Message - An RDMA Message used by the Data Source to
   send Immediate Data to the Data Sink.

   Immediate Data with Solicited Event (SE) Message - An RDMA Message
   used by the Data Source to send Immediate Data with Solicited Event
   to the Data Sink.

   iWARP - A suite of wire protocols comprised of RFC 5040, RFC 5041,
   RFC 5044, and RFC 6581.

   Requester - the sender of an RDMA Atomic Operation request.

   Responder - the receiver of an RDMA Atomic Operation request.

   RNIC - RDMA Network Interface Controller. In this context, this
   would be a network I/O adapter or embedded controller with iWARP
   functionality.

   ULP - Upper Layer Protocol. The protocol layer above the one
   currently being referenced.
   The ULP for RFC 5040 / RFC 5041 is expected to be an OS,
   Application, adaptation layer, or proprietary device. The RFC 5040 /
   RFC 5041 documents do not specify a ULP -- they provide a set of
   semantics that allow a ULP to be designed to utilize RFC 5040 /
   RFC 5041.

4. Header Format Extensions

   The control information of RDMA Messages is carried in header fields
   defined by the DDP protocol [RFC5041]. RFC 5040 defines the RDMAP
   header formats layered on the DDP header definition. This
   specification extends RFC 5040 with the following new formats:

   o Four new RDMA Messages carry additional RDMAP headers. The
     Immediate Data operation and Immediate Data with Solicited Event
     operation include 8 bytes of data following the RDMAP header.
     Atomic Operations include Atomic Request or Atomic Response
     headers following the RDMAP header. The RDMAP header for Atomic
     Request messages is 52 bytes long as specified in Figure 4. The
     RDMAP header for Atomic Response Messages is 12 bytes long as
     specified in Figure 6.

   o Introduction of a new queue for untagged buffers (QN=3) used for
     Atomic Response tracking.

4.1. RDMAP Control and Invalidate STag Fields

   For reference, Figure 1 depicts the format of the DDP Control and
   RDMAP Control fields, in the style and convention of RFC 5040:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |T|L| Resrv | DV| RV|Rsv| Opcode|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Invalidate STag                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 1 DDP Control and RDMAP Control Fields

   The DDP Control Field consists of the T, L, Resrv and DV fields
   [RFC5041]. The RDMAP Control Field consists of the RV, Rsv and
   Opcode fields [RFC5040].

   This specification adds additional values for the RDMA Opcode field
   to those specified in RFC 5040.
   Figure 2 defines the new values of RDMA Opcode field that are used
   for the RDMA Messages defined in this specification.

   As shown in Figure 2, STag and Tagged Offset are not applicable for
   the RDMA Messages defined in this specification. Figure 2 also shows
   the appropriate Queue Number for each Opcode.

   All RDMA Messages defined in this specification MUST have:

      The RDMA Version (RV) field: 01b.

      Opcode field: Set to one of the values in Figure 2.

      Invalidate STag: Set to zero by the sender, ignored by the
      receiver.

   -------+-----------+-------+------+-------+---------+-------------
   RDMA   | Message   | Tagged| STag | Queue | In-     | Message
   Opcode | Type      | Flag  | and  | Number| validate| Length
          |           |       | TO   |       | STag    | Communicated
          |           |       |      |       |         | between DDP
          |           |       |      |       |         | and RDMAP
   -------+-----------+-------+------+-------+---------+-------------
   1000b  | Immediate | 0     | N/A  | 0     | N/A     | Yes
          | Data      |       |      |       |         |
   -------+-----------+----------------------------------------------
   1001b  | Immediate | 0     | N/A  | 0     | N/A     | Yes
          | Data with |       |      |       |         |
          | SE        |       |      |       |         |
   -------+-----------+----------------------------------------------
   1010b  | Atomic    | 0     | N/A  | 1     | N/A     | Yes
          | Request   |       |      |       |         |
   -------+-----------+----------------------------------------------
   1011b  | Atomic    | 0     | N/A  | 3     | N/A     | Yes
          | Response  |       |      |       |         |
   -------+-----------+----------------------------------------------

             Figure 2 Additional RDMA Usage of DDP Fields

   Note: N/A means Not Applicable.

   This extension defines RDMAP use of Queue Number 3 for Untagged
   Buffers for Atomic Responses. This queue is used for tracking
   outstanding Atomic Requests.

   All other DDP and RDMAP control fields are set as described in RFC
   5040.

4.2.
RDMA Message Definitions

   The following figure defines which RDMA Headers are used on each new
   RDMA Message and which new RDMA Messages are allowed to carry ULP
   payload:

   -------+-----------+-------------------+-------------------------
   RDMA   | Message   | RDMA Header Used  | ULP Message allowed in
   Message| Type      |                   | the RDMA Message
   OpCode |           |                   |
   -------+-----------+-------------------+-------------------------
   1000b  | Immediate | Immediate Data    | No
          | Data      | Header            |
   -------+-----------+-------------------+-------------------------
   1001b  | Immediate | Immediate Data    | No
          | Data with | Header            |
          | SE        |                   |
   -------+-----------+-------------------+-------------------------
   1010b  | Atomic    | Atomic Request    | No
          | Request   | Header            |
   -------+-----------+-------------------+-------------------------
   1011b  | Atomic    | Atomic Response   | No
          | Response  | Header            |
   -------+-----------+-------------------+-------------------------

                  Figure 3 RDMA Message Definitions

5. Atomic Operations

   The RDMA Protocol Specification in RFC 5040 does not include support
   for Atomic Operations which are an important building block for
   implementing distributed shared memory.

   This document extends the RDMA Protocol specification with a set of
   basic Atomic Operations, and specifies their resource and ordering
   rules. The Atomic Operations specified in this document provide
   equivalent functionality to the InfiniBand RDMA transport as well as
   extended Atomic Operations defined in Open Fabrics Verbs, to allow
   applications that use these primitives to work interchangeably over
   iWARP. Other operations are left for future consideration.

   Atomic operations as specified in this document execute a 64-bit
   memory operation at a specified destination ULP Buffer address on a
   Responder node using the Tagged Buffer data transfer model.
   The operations atomically read, modify and write back the contents
   of the destination ULP Buffer address and guarantee that Atomic
   Operations on this ULP Buffer address by other RDMAP Streams on the
   same RNIC do not occur between the read and the write caused by the
   Atomic Operation. Therefore, the Responder RNIC MUST implement
   mechanisms to prevent other Atomic Operations on memory registered
   for Atomic Operations while an Atomic Operation targeting that
   memory is in progress. The Requester of an atomic operation cannot
   rely on atomic operation behavior at the Responder across multiple
   RNICs or with respect to other applications/ULPs running at the
   Responder that can access the ULP Buffer. It is OPTIONAL for an RNIC
   to provide such behavior when implementing the atomic operations
   specified in this document. An RNIC that supports Atomic Operations
   as specified in this document MUST implement both the FetchAdd
   operation as specified in section 5.1.1 and the CmpSwap operation as
   specified in section 5.1.2. The advertisement of Tagged Buffer
   information for Atomic Operations is outside the scope of this
   specification and is handled by the ULPs.

   Implementation note: It is RECOMMENDED that the applications do not
   use the ULP Buffer addresses used for Atomic Operations for other
   RDMA operations due to the lack of atomicity guarantees between
   operations other than Atomic Operations.

   Implementation note: Errors related to the alignment in the
   following sections cover Atomic Operations targeted at a ULP Buffer
   address that is not aligned to a 64-bit boundary.

   Atomic Operation Request Messages use the same remote addressing
   mechanism as RDMA Reads and Writes. The ULP Buffer address specified
   in the request is in the address space of the Remote Peer to which
   the Atomic Operation is targeted.
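
   Illustrative sketch (non-normative): the per-RNIC atomicity and
   64-bit alignment rules above can be modeled in software roughly as
   follows. The class ToyResponderMemory, its method atomic_rmw, the
   single lock, and the big-endian storage of the toy buffer are all
   assumptions made for this example; they are not part of RDMAP, and
   a real RNIC would enforce these rules in hardware. The toy buffer is
   assumed to start at a 64-bit aligned address, so checking the offset
   is equivalent to checking the ULP Buffer address.

```python
import threading

MASK64 = 0xFFFFFFFFFFFFFFFF


class ToyResponderMemory:
    """Hypothetical model of memory registered on ONE RNIC (not an RDMAP API)."""

    def __init__(self, size_in_octets):
        self._mem = bytearray(size_in_octets)
        # One lock per RNIC: serializes Atomic Operations arriving on any
        # RDMAP Stream of this RNIC. It says nothing about other RNICs.
        self._lock = threading.Lock()

    def atomic_rmw(self, address, modify):
        """Atomically replace the 64-bit value at address with modify(value).

        Returns the original value, as an Atomic Operation Response would.
        """
        if address % 8 != 0:
            # Unaligned target: surface an error, leave memory unmodified.
            raise ValueError("ULP Buffer address is not 64-bit aligned")
        if address < 0 or address + 8 > len(self._mem):
            raise IndexError("address outside the registered buffer")
        with self._lock:
            # Read, modify, and write back with no other atomic in between.
            original = int.from_bytes(self._mem[address:address + 8], "big")
            updated = modify(original) & MASK64
            self._mem[address:address + 8] = updated.to_bytes(8, "big")
        return original
```

   In this model, several threads standing in for RDMAP Streams can
   call atomic_rmw on the same address without losing updates, while a
   call with an unaligned address raises an error before any memory is
   touched.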

   Atomic Operation Response Messages MUST use the Untagged Buffer
   model with QN=3. Queue number 3 will be used to track outstanding
   Atomic Operation Request messages at the Requester. When the Atomic
   Operation Response message is received, the MSN will be used to
   locate the corresponding Atomic Operation request in order to
   complete the Atomic Operation request.

5.1. Atomic Operation Details

   The following sub-sections describe the Atomic Operations in more
   detail.

5.1.1. FetchAdd

   The FetchAdd Atomic Operation requests the Responder to read a
   64-bit Original Remote Data Value at a 64-bit aligned ULP Buffer
   address in the Responder's memory, to perform the FetchAdd operation
   on multiple fields of selectable length specified by the 64-bit "Add
   Mask", and to write the result back to the same ULP Buffer address.
   The Atomic addition is performed independently on each one of these
   fields. A bit set in the Add Mask field specifies a field boundary:
   the bit is set at the most significant bit position of each field,
   causing any carry out of that bit position to be discarded when the
   addition is performed.

   FetchAdd Atomic Operations MUST target ULP Buffer addresses that are
   64-bit aligned. FetchAdd Atomic Operations that target ULP Buffer
   addresses that are not 64-bit aligned MUST be surfaced as errors and
   the Responder's memory MUST NOT be modified in such cases.
   Additionally, a terminate message MUST be generated. Setting the
   "Add Mask" field to 0x0000000000000000 results in an Atomic Add of
   the 64-bit Original Remote Data Value and the 64-bit "Add Data".

   The pseudo code below describes masked FetchAdd Atomic Operation.

      bit_location = 1
      carry = 0
      Remote Data Value = 0
      for bit = 0 to 63
      {
         if (bit != 0 ) bit_location = bit_location << 1
         val1 = (Original Remote Data Value & bit_location) >> bit
         val2 = (Add Data & bit_location) >> bit
         sum = carry + val1 + val2
         carry = (sum & 2) >> 1
         sum = sum & 1
         if (sum)
            Remote Data Value |= bit_location
         carry = ((carry) && (!(Add Mask & bit_location)))
      }

   The FetchAdd operation is performed in the endian format of the
   target memory. The "Original Remote Data Value" is converted from
   the endian format of the target memory for return and returned to
   the Requester. The fields are in big-endian format on the wire.

   The Requester specifies:

   o Remote STag

   o Remote Tagged Offset

   o Add Data

   o Add Mask

   The Responder returns:

   o Original Remote Data

5.1.2. CmpSwap

   The CmpSwap Atomic Operation requires the Responder to read a 64-bit
   value at a 64-bit aligned ULP Buffer address in the Responder's
   memory, to perform a logical AND of that value with the 64-bit
   "Compare Mask" field in the Atomic Operation Request header, and to
   compare the result with the logical AND of the "Compare Mask" and
   the "Compare Data" fields in the header. If the two values are
   equal, the Responder swaps the masked bits at the same ULP Buffer
   address with the masked Swap Data. If the two masked compare values
   are not equal, the contents of the Responder's memory are not
   changed. In either case, the original value read from the ULP Buffer
   address is converted from the endian format of the target memory for
   return and returned to the Requester. The fields are in big-endian
   format on the wire.
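
   Illustrative sketch (non-normative): the masked FetchAdd pseudo code
   in Section 5.1.1 and the masked CmpSwap description above can be
   transliterated into Python as shown here. The function names
   masked_fetch_add and masked_cmp_swap are hypothetical names chosen
   for this example. Each function returns the new Remote Data Value
   that would be written to memory; the value returned to the Requester
   would be the unchanged original.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def masked_fetch_add(original, add_data, add_mask):
    """Bit-serial 64-bit add; a set Add Mask bit discards the carry out of it."""
    result = 0
    carry = 0
    for bit in range(64):
        bit_location = 1 << bit
        val1 = (original >> bit) & 1
        val2 = (add_data >> bit) & 1
        total = carry + val1 + val2
        if total & 1:
            result |= bit_location
        # Propagate the carry unless this bit is marked as a field boundary.
        carry = (total >> 1) if not (add_mask & bit_location) else 0
    return result


def masked_cmp_swap(original, compare_data, compare_mask, swap_data, swap_mask):
    """Swap only the masked bits when the masked comparison succeeds."""
    if ((compare_data ^ original) & compare_mask) == 0:
        return (original & ~swap_mask & MASK64) | (swap_data & swap_mask)
    return original
```

   With Add Mask set to 0x8080, the two low octets behave as
   independent 8-bit counters: adding 0x0101 to 0x00FF yields 0x0100,
   because the carry out of the low octet is discarded, whereas an Add
   Mask of zero gives the ordinary 64-bit sum 0x0200.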

   The Requester specifies:

   o Remote STag

   o Remote Tagged Offset

   o Swap Data

   o Swap Mask

   o Compare Data

   o Compare Mask

   The Responder returns:

   o Original Remote Data Value

   The following pseudo code describes the masked CmpSwap operation
   result.

      if (!((Compare Data ^ Original Remote Data Value) &
            Compare Mask))
      then
         Remote Data Value =
            (Original Remote Data Value & ~(Swap Mask))
            | (Swap Data & Swap Mask)
      else
         Remote Data Value = Original Remote Data Value

   After the operation, the remote data buffer MUST contain the
   "Original Remote Data Value" (if comparison did not match) or the
   masked "Swap Data" (if the comparison did match). CmpSwap Atomic
   Operations MUST target ULP Buffer addresses that are 64-bit aligned.

   If a CmpSwap Atomic Operation is attempted on a target ULP Buffer
   address that is not 64-bit aligned:

   o The operation MUST NOT be performed,

   o The Responder's memory MUST NOT be modified,

   o The result MUST be surfaced as an error, and

   o A terminate message MUST be generated (see Section 8.2 for the
     terminate message contents)

5.2. Atomic Operations

   The Atomic Operation Request and Response are RDMA Messages. An
   Atomic Operation makes use of the DDP Untagged Buffer Model. Atomic
   Operation Request messages MUST use the same Queue Number as RDMA
   Read Requests (QN=1). Reusing the same Queue Number for Atomic
   Request messages allows the Atomic Operations to reuse the same
   infrastructure (e.g., ORD/IRD flow control) as defined for RDMA Read
   Requests. Atomic Operation Response messages MUST set Queue Number
   (QN) to 3 in the DDP header.

   The RDMA Message OpCode for an Atomic Request Message is 1010b. The
   RDMA Message OpCode for an Atomic Response Message is 1011b.

5.2.1.
Atomic Operation Request Message

   The Atomic Operation Request Message carries an Atomic Operation
   Header that describes the ULP Buffer address in the Responder's
   memory. The Atomic Operation Request header immediately follows the
   DDP header. The RDMAP layer passes to the DDP layer an RDMAP Control
   Field. The following figure depicts the Atomic Operation Request
   Header that is used for all Atomic Operation Request Messages:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Reserved (Not Used)                  |AOpCode|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Request Identifier                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Remote STag                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Remote Tagged Offset                      |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Add or Swap Data                        |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Add or Swap Mask                        |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Compare Data                          |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Compare Mask                          |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 4 Atomic Operation Request Header

   Reserved (Not Used): 28 bits

      This field is set to zero on transmit, ignored on receive.

   Atomic Operation Code (AOpCode): 4 bits.

      See Figure 5. All Atomic Operation Codes from Figure 5 MUST
      be implemented by an RNIC that supports Atomic Operations.

   Request Identifier: 32 bits.

      The Request Identifier specifies a number that is used to
      identify the Atomic Operation Request Message.
      The value used in this field is selected by the RNIC that sends
      the message, and is reflected back to the Local Peer in the
      Atomic Operation Response message.

   Remote STag: 32 bits.

      The Remote STag identifies the Remote Peer's Tagged Buffer
      targeted by the Atomic Operation. The Remote STag is
      associated with the RDMAP Stream through a mechanism that is
      outside the scope of the RDMAP specification.

   Remote Tagged Offset: 64 bits.

      The Remote Tagged Offset specifies the starting offset, in
      octets, from the base of the Remote Peer's Tagged Buffer
      targeted by the Atomic Operation. The Remote Tagged Offset MAY
      start at an arbitrary offset but MUST represent a 64-bit
      aligned ULP Buffer address.

   Add or Swap Data: 64 bits.

      The Add or Swap Data field specifies the 64-bit "Add Data"
      value in an Atomic FetchAdd Operation or the 64-bit "Swap
      Data" value in an Atomic Swap or CmpSwap Operation.

   Add or Swap Mask: 64 bits

      This field is used in masked Atomic Operations (FetchAdd and
      CmpSwap) to perform a bitwise logical AND operation as
      specified in the definition of these operations. For
      non-masked Atomic Operations (Swap), this field is set to
      ffffffffffffffffh on transmit and ignored by the receiver.

   Compare Data: 64 bits.

      The Compare Data field specifies the 64-bit "Compare Data"
      value in an Atomic CmpSwap Operation. For Atomic FetchAdd and
      Atomic Swap operation, the Compare Data field is set to zero
      on transmit and ignored by the receiver.

   Compare Mask: 64 bits

      This field is used in masked Atomic Operation CmpSwap to
      perform a bitwise logical AND operation as specified in the
      definition of these operations. For Atomic Operations FetchAdd
      and Swap, this field is set to ffffffffffffffffh on transmit
      and ignored by the receiver.
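
   Illustrative sketch (non-normative): the 52-byte Atomic Operation
   Request Header of Figure 4 can be serialized with Python's standard
   struct module as shown here, assuming the big-endian wire format
   stated in Section 5.1 and the AOpCode values listed in Figure 5. The
   names ATOMIC_REQUEST, pack_fetch_add, pack_cmp_swap and
   unpack_request are hypothetical helpers invented for this example;
   they only build and parse the header octets and do not send
   anything.

```python
import struct

# Big-endian layout of Figure 4: one 32-bit word holding 28 reserved bits
# plus the 4-bit AOpCode, the 32-bit Request Identifier, the 32-bit Remote
# STag, then five 64-bit fields (Remote Tagged Offset, Add or Swap Data,
# Add or Swap Mask, Compare Data, Compare Mask).
ATOMIC_REQUEST = struct.Struct(">IIIQQQQQ")

AOPCODE_FETCH_ADD = 0b0000  # from Figure 5
AOPCODE_CMP_SWAP = 0b0010   # from Figure 5
ALL_ONES = 0xFFFFFFFFFFFFFFFF


def pack_fetch_add(request_id, stag, tagged_offset, add_data, add_mask=0):
    """FetchAdd: Compare Data is zero and Compare Mask is all ones on transmit."""
    return ATOMIC_REQUEST.pack(AOPCODE_FETCH_ADD, request_id, stag,
                               tagged_offset, add_data, add_mask, 0, ALL_ONES)


def pack_cmp_swap(request_id, stag, tagged_offset,
                  swap_data, swap_mask, compare_data, compare_mask):
    """CmpSwap: all four data and mask fields are valid."""
    return ATOMIC_REQUEST.pack(AOPCODE_CMP_SWAP, request_id, stag,
                               tagged_offset, swap_data, swap_mask,
                               compare_data, compare_mask)


def unpack_request(octets):
    """Parse a 52-byte header; reserved bits are ignored on receive."""
    (word0, request_id, stag, tagged_offset,
     data, mask, compare_data, compare_mask) = ATOMIC_REQUEST.unpack(octets)
    return {
        "aopcode": word0 & 0xF,
        "request_id": request_id,
        "stag": stag,
        "tagged_offset": tagged_offset,
        "add_or_swap_data": data,
        "add_or_swap_mask": mask,
        "compare_data": compare_data,
        "compare_mask": compare_mask,
    }
```

   Calling pack_fetch_add with add_data set to 0 produces the header
   for the capability probe described in Section 1.1, since such a
   request leaves the remote memory unchanged.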

   ---------+-----------+----------+----------+---------+---------
   Atomic   | Atomic    | Add or   | Add or   | Compare | Compare
   Operation| Operation | Swap     | Swap     | Data    | Mask
   Code     |           | Data     | Mask     |         |
   ---------+-----------+----------+----------+---------+---------
   0000b    | FetchAdd  | Add Data | Add Mask | N/A     | N/A
   ---------+-----------+----------+----------+---------+---------
   0010b    | CmpSwap   | Swap Data| Swap Mask| Valid   | Valid
   ---------+-----------+-----------------------------------------

            Figure 5 Atomic Operation Message Definitions

   The Atomic Operation Request Message has the following semantics:

   1. An Atomic Operation Request Message MUST reference an Untagged
      Buffer. That is, the Local Peer's RDMAP layer MUST request that
      the DDP mark the Message as Untagged.

   2. One Atomic Operation Request Message MUST consume one Untagged
      Buffer.

   3. The Responder's RDMAP layer MUST process an Atomic Operation
      Request Message. A valid Atomic Operation Request Message MUST
      NOT be delivered to the Responder's ULP (i.e., it is processed by
      the RDMAP layer).

   4. At the Responder, an error MUST be surfaced in response to
      delivery to the Remote Peer's RDMAP layer of an Atomic Operation
      Request Message with an Atomic Operation Code that the RNIC does
      not support.

   5. An Atomic Operation Request Message MUST reference the RDMA Read
      Request Queue. That is, the Requester's RDMAP layer MUST request
      that the DDP layer set the Queue Number field to one.

   6. The Requester MUST pass to the DDP layer Atomic Operation Request
      Messages in the order they were submitted by the ULP.

   7. The Responder MUST process the Atomic Operation Request Messages
      in the order they were sent.

   8. If the Responder receives a valid Atomic Operation Request
      Message, it MUST respond with a valid Atomic Operation Response
      Message.

5.2.2.
Atomic Operation Response Message 726 The Atomic Operation Response Message carries an Atomic Operation 727 Response Header that contains the "Original Request Identifier" and 728 "Original Remote Data Value". The Atomic Operation Response Header 729 immediately follows the DDP header. The RDMAP layer passes to the 730 DDP layer a RDMAP Control Field. The following figure depicts the 731 Atomic Operation Response header that is used for all Atomic 732 Operation Response Messages: 734 0 1 2 3 735 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 736 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 737 | Original Request Identifier | 738 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 739 | Original Remote Data Value | 740 + + 741 | | 742 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 744 Figure 6 Atomic Operation Response Header 746 Original Request Identifier: 32 bits. 748 The Original Request Identifier is set to the value specified 749 in the Request Identifier field that was originally provided 750 in the corresponding Atomic Operation Request Message. 752 Original Remote Data Value: 64 bits. 754 The Original Remote Value specifies the original 64-bit value 755 stored at the ULP Buffer address targeted by the Atomic 756 Operation. 758 The Atomic Operation Response Message has the following semantics: 760 1. The Atomic Operation Response Message for the associated Atomic 761 Operation Request Message travels in the opposite direction. 763 2. An Atomic Operation Response Message MUST consume an Untagged 764 Buffer. That is, the Responder RDMAP layer MUST request that the 765 DDP mark the Message as Untagged. 767 3. An Atomic Operation Response Message MUST reference the Queue 768 Number 3. That is, the Responder's RDMAP layer MUST request that 769 the DDP layer set the Queue Number field to 3. 771 4. 
The Responder MUST ensure that a sufficient number of Untagged 772 Buffers are available on the RDMA Read Request Queue (Queue with 773 DDP Queue Number 1) to support the maximum number of Atomic 774 Operation Requests negotiated by the ULP in addition to the 775 maximum number of RDMA Read Requests negotiated by the ULP. 777 5. The Requester MUST ensure that a sufficient number of Untagged 778 Buffers are available on the RDMA Atomic Response Queue (Queue 779 with DDP Queue Number 3) to support the maximum number of Atomic 780 Operation Requests negotiated by the ULP. 782 6. The RDMAP layer MUST Deliver the Atomic Operation Response 783 Message to the ULP. 785 7. At the Requester, when an invalid Atomic Operation Response 786 Message is delivered to the Remote Peer's RDMAP layer, an error 787 is surfaced. 789 8. When the Responder receives Atomic Operation Request messages, 790 the Responder RDMAP layer MUST pass Atomic Operation Response 791 Messages to the DDP layer, in the order that the Atomic Operation 792 Request Messages were received by the RDMAP layer, at the 793 Responder. 795 5.3. Atomicity Guarantees 797 Atomicity of the Read-Modify-Write (RMW) on the Responder's node by 798 the Atomic Operation MUST be assured in the context of concurrent 799 atomic accesses by other RDMAP Streams on the same RNIC. 801 5.4. Atomic Operations Ordering and Completion Rules 803 In addition to the ordering and completion rules described in RFC 804 5040, the following rules apply to implementations of the Atomic 805 operations. 807 1. For an Atomic operation, the Requester MUST NOT consider the 808 contents of the Tagged Buffer at the Responder to be modified by 809 that specific Atomic Operation until the Atomic Operation 810 Response Message has been Delivered to RDMAP at the Requester. 812 2. Atomicity guarantees MUST be provided within the scope of a 813 single RNIC. 

      Implementation Note: This requirement for atomicity among
      operations is limited to the scope of a single RNIC. Atomicity
      guarantees are OPTIONAL with respect to access to the Tagged
      Buffer by any other method than an Atomic Operation via the same
      RNIC. Examples of such accesses that may not be atomic with
      respect to an Atomic Operation include accesses via other RNICs
      and local processor memory access to the Tagged Buffer.

   3. Atomic Operation Request Messages MUST NOT start processing at
      the Responder until they have been Delivered to RDMAP by DDP.

   4. Atomic Operation Response Messages MAY be generated at the
      Responder after subsequent RDMA Write Messages or Send Messages
      have been Placed or Delivered.

   5. Atomic Operation Response Message processing at the Responder
      MUST be started only after the Atomic Operation Request Message
      has been Delivered by the DDP layer (thus, all previous RDMA
      Messages on that DDP Stream have been Delivered).

   6. Send Messages MAY be Completed at the Responder before prior
      incoming Atomic Operation Request Messages have completed their
      response processing.

   7. An Atomic Operation MUST NOT be Completed at the Requester until
      the DDP layer Delivers the associated incoming Atomic Operation
      Response Message.

   8. If more than one outstanding Atomic Request Message is
      supported by both peers, the Atomic Operation Request Messages
      MUST be processed in the order they were Delivered by the DDP
      layer on the Responder. Atomic Operation Response Messages MUST
      be submitted to the DDP layer on the Responder in the order the
      Atomic Operation Request Messages were Delivered by DDP.

6. Immediate Data

   The Immediate Data operation is typically used in conjunction with
   an RDMA Write Operation to improve ULP processing efficiency.
   The efficiency is gained by causing an RDMA Completion to be
   generated immediately following the RDMA Write operation. This RDMA
   Completion delivers 8 bytes of immediate data at the Remote Peer.
   The combination of an RDMA Write Message followed by an Immediate
   Data Operation has the same behavior as the RDMA Write with
   Immediate Data operation found in InfiniBand. An Immediate Data
   operation that is not preceded by an RDMA Write operation causes an
   RDMA Completion.

6.1. RDMAP Interactions with ULP for Immediate Data

   For Immediate Data operations, the following are the interactions
   between the RDMAP Layer and the ULP:

   o At the Data Source:

      o The ULP passes to the RDMAP Layer the following:

         o Eight bytes of ULP Immediate Data

      o When the Immediate Data operation Completes, an indication
        of the Completion results.

   o At the Data Sink:

      o If the Immediate Data operation is Completed successfully,
        the RDMAP Layer passes the following information to the ULP
        Layer:

         o Eight bytes of Immediate Data

         o An Event, if the Data Sink is configured to generate an
           Event.

      o If the Immediate Data operation is Completed in error, the
        Data Sink RDMAP Layer will pass up the corresponding error
        information to the Data Sink ULP and send a Terminate
        Message to the Data Source RDMAP Layer. The Data Source
        RDMAP Layer will then pass up the Terminate Message to the
        ULP.

6.2. Immediate Data Header Format

   The Immediate Data and Immediate Data with SE Messages carry
   immediate data as shown in Figure 7. The RDMAP layer passes to the
   DDP layer an RDMAP Control Field and 8 bytes of Immediate Data. The
   first 8 bytes of the data following the DDP header contain the
   Immediate Data. See Section A.3 for the DDP segment format of an
   Immediate Data or Immediate Data with SE Message.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Immediate Data                         |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 7 Immediate Data or Immediate Data with SE Message Header

   Immediate Data: 64 bits.

      Eight bytes of data transferred from the Data Source to an
      untagged buffer at the Data Sink.

6.3. Immediate Data or Immediate Data with SE Message

   The Immediate Data or Immediate Data with SE Message uses the DDP
   Untagged Buffer Model to transfer Immediate Data from the Data
   Source to the Data Sink.

   o An Immediate Data or Immediate Data with SE Message MUST
     reference an Untagged Buffer. That is, the Local Peer's RDMAP
     Layer MUST request that the DDP layer mark the Message as
     Untagged.

   o One Immediate Data or Immediate Data with SE Message MUST consume
     one Untagged Buffer.

   o At the Remote Peer, Immediate Data and Immediate Data with SE
     Messages MUST be Delivered to the Remote Peer's ULP in the order
     they were sent.

   o For an Immediate Data or Immediate Data with SE Message, the
     Local Peer's RDMAP Layer MUST request that the DDP layer set the
     Queue Number field to zero.

   o For an Immediate Data or Immediate Data with SE Message, the
     Local Peer's RDMAP Layer MUST request that the DDP layer transmit
     8 bytes of data.

   o The Local Peer MUST issue Immediate Data and Immediate Data with
     SE Messages in the order they were submitted by the ULP.

   o The Remote Peer MUST check that Immediate Data and Immediate Data
     with SE Messages include exactly 8 bytes of data from the DDP
     layer. The DDP header carries the length field that is reported
     by the DDP layer.

6.4. Ordering and Completions

   Ordering and completion rules for Immediate Data are the same as
   those for a Send operation as described in Section 5.5 of RFC 5040.

7. Ordering and Completions Table

   The following table summarizes the ordering relationships for Atomic
   and Immediate Data operations from the standpoint of the Local Peer
   issuing the Operations. Note that in the table that follows, Send
   includes Send, Send with Invalidate, Send with Solicited Event, and
   Send with Solicited Event and Invalidate. Also note that in the
   table below, Immediate Data includes Immediate Data and Immediate
   Data with Solicited Event.

   ---------+----------+-------------+-------------+------------------
   First    | Second   | Placement   | Placement   | Ordering
   Operation| Operation| Guarantee at| Guarantee at| Guarantee at
            |          | Remote Peer | Local Peer  | Remote Peer
   ---------+----------+-------------+-------------+------------------
   Immediate| Send     | No Placement| Not         | Completed in
   Data     |          | Guarantee   | Applicable  | Order
            |          | between Send|             |
            |          | Payload and |             |
            |          | Immediate   |             |
            |          | Data        |             |
   ---------+----------+-------------+-------------+------------------
   Immediate| RDMA     | No Placement| Not         | Not
   Data     | Write    | Guarantee   | Applicable  | Applicable
            |          | between RDMA|             |
            |          | Write       |             |
            |          | Payload and |             |
            |          | Immediate   |             |
            |          | Data        |             |
   ---------+----------+-------------+-------------+------------------
   Immediate| RDMA     | No Placement| RDMA Read   | RDMA Read
   Data     | Read     | Guarantee   | Response    | Response
            |          | between     | will not be | Message will
            |          | Immediate   | Placed until| not be
            |          | Data and    | Immediate   | generated
            |          | RDMA Read   | Data is     | until
            |          | Request     | Placed at   | Immediate Data
            |          |             | Remote Peer | has been
            |          |             |             | Completed
   ---------+----------+-------------+-------------+------------------
   Immediate| Atomic   | No Placement| Atomic      | Atomic
   Data     |          | Guarantee   | Response    | Response
            |          | between     | will not be | Message will
            |          | Immediate   | Placed until| not be
            |          | Data and    | Immediate   | generated
            |          | Atomic      | Data is     | until
            |          | Request     | Placed at   | Immediate Data
            |          |             | Remote Peer | has been
            |          |             |             | Completed
   ---------+----------+-------------+-------------+------------------
   Immediate| Immediate| No Placement| Not         | Completed in
   Data or  | Data     | Guarantee   | Applicable  | Order
   Send     |          |             |             |
   ---------+----------+-------------+-------------+------------------
   RDMA     | Immediate| No Placement| Not         | Immediate Data
   Write    | Data     | Guarantee   | Applicable  | is Completed
            |          |             |             | after RDMA
            |          |             |             | Write is Placed
            |          |             |             | and Delivered
   ---------+----------+-------------+-------------+------------------
   RDMA Read| Immediate| No Placement| Immediate   | Not Applicable
            | Data     | Guarantee   | Data MAY be |
            |          | between     | Placed      |
            |          | Immediate   | before      |
            |          | Data and    | RDMA Read   |
            |          | RDMA Read   | Response is |
            |          | Request     | generated   |
   ---------+----------+-------------+-------------+------------------
   Atomic   | Immediate| No Placement| Immediate   | Not Applicable
            | Data     | Guarantee   | Data MAY be |
            |          | between     | Placed      |
            |          | Immediate   | before      |
            |          | Data and    | Atomic      |
            |          | Atomic      | Response is |
            |          | Request     | generated   |
   ---------+----------+-------------+-------------+------------------
   Atomic   | Send     | No Placement| Send Payload| Not Applicable
            |          | Guarantee   | MAY be      |
            |          | between Send| Placed      |
            |          | Payload and | before      |
            |          | Atomic      | Atomic      |
            |          | Request     | Response is |
            |          |             | generated   |
   ---------+----------+-------------+-------------+------------------
   Atomic   | RDMA     | No Placement| RDMA Write  | Not
            | Write    | Guarantee   | Payload MAY | Applicable
            |          | between RDMA| be Placed   |
            |          | Write       | before      |
            |          | Payload and | Atomic      |
            |          | Atomic      | Response is |
            |          | Request     | generated   |
   ---------+----------+-------------+-------------+------------------
   Atomic   | RDMA     | No Placement| No Placement| RDMA Read
            | Read     | Guarantee   | Guarantee   | Response
            |          | between     | between     | Message will
            |          | Atomic      | Atomic      | not be
            |          | Request and | Response    | generated
            |          | RDMA Read   | and RDMA    | until Atomic
            |          | Request     | Read        | Response Message
            |          |             | Response    | has been
            |          |             |             | generated
   ---------+----------+-------------+-------------+------------------
   Atomic   | Atomic   | Placed in   | No Placement| Second Atomic
            |          | order       | Guarantee   | Request
            |          |             | between two | Message will
            |          |             | Atomic      | not be
            |          |             | Responses   | processed
            |          |             |             | until first
            |          |             |             | Atomic Response
            |          |             |             | has been
            |          |             |             | generated
   ---------+----------+-------------+-------------+------------------
   Send     | Atomic   | No Placement| Atomic      | Atomic Response
            |          | Guarantee   | Response    | Message will not
            |          | between Send| will not be | be generated
            |          | Payload and | Placed at   | until Send has
            |          | Atomic      | the Local   | been Completed
            |          | Request     | Peer until  |
            |          |             | Send Payload|
            |          |             | is Placed   |
            |          |             | at the      |
            |          |             | Remote Peer |
   ---------+----------+-------------+-------------+------------------
   RDMA     | Atomic   | No Placement| Atomic      | Not
   Write    |          | Guarantee   | Response    | Applicable
            |          | between RDMA| will not be |
            |          | Write       | Placed at   |
            |          | Payload and | the Local   |
            |          | Atomic      | Peer until  |
            |          | Request     | RDMA Write  |
            |          |             | Payload     |
            |          |             | is Placed   |
            |          |             | at the      |
            |          |             | Remote Peer |
   ---------+----------+-------------+-------------+------------------
   RDMA     | Atomic   | No Placement| No Placement| Atomic Response
   Read     |          | Guarantee   | Guarantee   | Message will
            |          | between     | between     | not be generated
            |          | Atomic      | Atomic      | until RDMA
            |          | Request and | Response    | Read Response
            |          | RDMA Read   | and RDMA    | has been
            |          | Request     | Read        | generated
            |          |             | Response    |
   ---------+----------+-------------+-------------+------------------

8. Error Processing

   In addition to error processing described in Section 7 of RFC 5040,
   the following rules apply for the new RDMA Messages defined in this
   specification.

8.1. Errors Detected at the Local Peer

   The Local Peer MUST send a Terminate Message for each of the
   following cases:

   1. For errors detected while creating an Atomic Request, Atomic
      Response, Immediate Data, or Immediate Data with SE Message, or
      other reasons not directly associated with an incoming Message,
      the Terminate Message and Error code are sent instead of the
      Message. In this case, the Error Type and Error Code fields are
      included in the Terminate Message, but the Terminated DDP Header
      and Terminated RDMA Header fields are set to zero.

   2. For errors detected on an incoming Atomic Request, Atomic
      Response, Immediate Data, or Immediate Data with Solicited Event
      Message (after the Message has been Delivered by DDP), the
      Terminate Message is sent at the earliest possible opportunity,
      preferably in the next outgoing RDMA Message. In this case, the
      Error Type, Error Code, and Terminated DDP Header fields are
      included in the Terminate Message, but the Terminated RDMA
      Header field is set to zero.

8.2. Errors Detected at the Remote Peer

   On incoming Atomic Requests, Atomic Responses, Immediate Data, and
   Immediate Data with Solicited Event, the following MUST be
   validated:

   o The DDP layer MUST validate all DDP Segment fields.

   o The RDMA OpCode MUST be valid.

   o The RDMA Version MUST be valid.
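
   The three checks listed above might be organized as in the
   following sketch. It is illustrative only: the function name
   validate_incoming and its parameters are hypothetical, the outcome
   of DDP-layer validation is passed in as a flag rather than
   modeled, and the expected RDMA Version is supplied by the caller
   instead of being hard-coded. The operation codes are those
   requested in the IANA Considerations section of this document;
   a full implementation would also accept the operation codes
   defined in RFC 5040.

```python
# RDMAP Message Operation Codes introduced by this document.
NEW_OPCODES = {
    0x8: "Immediate Data",
    0x9: "Immediate Data with Solicited Event",
    0xA: "Atomic Request",
    0xB: "Atomic Response",
}


def validate_incoming(ddp_segment_ok, rdma_version, opcode, expected_version):
    """Hypothetical check: return a list of failed validations (empty if OK)."""
    failures = []
    if not ddp_segment_ok:
        # The DDP layer is responsible for validating its own fields.
        failures.append("invalid DDP Segment field")
    if opcode not in NEW_OPCODES:
        failures.append("invalid RDMA OpCode")
    if rdma_version != expected_version:
        failures.append("invalid RDMA Version")
    return failures
```

   A non-empty result would lead to the Terminate Message handling
   described in Section 8.1.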

   On incoming Atomic Requests, the following additional validation
   MUST be performed:

   o The RDMAP layer MUST validate that the Remote Peer's Tagged ULP
     Buffer address references a 64-bit aligned ULP Buffer address. In
     the case of an error, the RDMAP layer MUST generate a Terminate
     Message indicating RDMA Layer Remote Operation Error with Error
     Code Name "Catastrophic Error, Localized to RDMAP Stream" as
     described in Section 4.8 of RFC 5040. Implementation Note: A ULP
     implementation can avoid this error by keeping the target ULP
     Buffer of an Atomic Operation 64-bit aligned.

9. Security Considerations

   This document specifies extensions to the RDMA Protocol
   specification in RFC 5040, and as such the Security Considerations
   discussed in Section 8 of RFC 5040 apply. In particular, Atomic
   Operations use ULP Buffer addresses for the Remote Peer buffer
   addressing used in RFC 5040 as required by the RFC 5042 [RFC5042]
   security model.

   RDMAP and related protocols may be used by applications that exhibit
   distinctive traffic characteristics such as message timing, source,
   destination and size patterns. Examples include structured high
   performance computing applications based on the MPI interface. For
   such applications, analysis of encrypted traffic could reveal
   sensitive information, e.g., the nature of the application, size of
   data set being used, and information about the application's rate of
   progress. Such information can be hidden from passive observation
   via use of ESPv3 Traffic Flow Confidentiality [RFC4303] to obfuscate
   the encrypted traffic's characteristics. ESPv3 implementation
   requirements for RDMAP are specified in [RFC7146].

10. IANA Considerations

   IANA is requested to add the following entries to the "RDMAP Message
   Operation Codes" registry of "RDDP Registries":

   0x8, Immediate Data, [RFCXXXX]

   0x9, Immediate Data with Solicited Event, [RFCXXXX]

   0xA, Atomic Request, [RFCXXXX]

   0xB, Atomic Response, [RFCXXXX]

   In addition, the following registry is requested to be added to
   "RDDP Registries". The following section specifies the registry, its
   initial contents and the administration policy in more detail.

   RFC Editor: Please replace XXXX in all instances of [RFCXXXX] above
   with the RFC number of this document and remove this note.

10.1. RDMAP Message Atomic Operation Subcodes

   Name of the registry: "RDMAP Message Atomic Operation Subcodes"

   Namespace details: RDMAP Message Atomic Operation Subcodes are 4-bit
   values [RFCXXXX].

   Information that must be provided to assign a new value: An IESG-
   approved standards-track specification defining the semantics and
   interoperability requirements of the proposed new value and the
   fields to be recorded in the registry.

   Fields to record in the registry: RDMAP Message Atomic Operation
   Subcode, Atomic Operation, RFC Reference.

   Initial registry contents:

   0x0, FetchAdd, [RFCXXXX]

   0x1, Reserved

   0x2, CmpSwap, [RFCXXXX]

   Note: An experimental RDMAP Message Operation Code has already been
   allocated; hence there is no need for an experimental RDMAP Message
   Atomic Operation Subcode.

   All other values are Unassigned and available to IANA for
   assignment. New RDMAP Message Atomic Operation Subcodes should be
   assigned sequentially in order to better support implementations
   that process RDMAP Message Atomic Operations in hardware.

   Allocation Policy: Standards Action ([RFC5226])

   RFC Editor: Please replace XXXX in all instances of [RFCXXXX] above
   with the RFC number of this document and remove this note.
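
   The code points listed above could be captured by an implementation
   roughly as follows. This is a sketch rather than a normative
   mapping: the class and function names are hypothetical, and only
   the numeric values are taken from the registry entries in this
   section.

```python
from enum import IntEnum


class RdmapOpcode(IntEnum):
    """RDMAP Message Operation Codes requested by this document."""
    IMMEDIATE_DATA = 0x8
    IMMEDIATE_DATA_SE = 0x9
    ATOMIC_REQUEST = 0xA
    ATOMIC_RESPONSE = 0xB


class AtomicSubcode(IntEnum):
    """Initial contents of the Atomic Operation Subcodes registry."""
    FETCH_ADD = 0x0
    # 0x1 is Reserved and therefore has no member here.
    CMP_SWAP = 0x2


def parse_atomic_subcode(value):
    """Return the AtomicSubcode for a 4-bit value, or None if the value
    is Reserved or Unassigned."""
    if not 0 <= value <= 0xF:
        raise ValueError("Atomic Operation Subcodes are 4-bit values")
    try:
        return AtomicSubcode(value)
    except ValueError:
        return None
```

   Returning None for Reserved and Unassigned values leaves the
   decision about how to surface the error to the caller.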

10.2. RDMAP Queue Numbers

   Name of the registry: "RDMAP DDP Untagged Queue Numbers"

   Namespace details: RDMAP DDP Untagged Queue numbers are 32-bit
   values [RFCXXXX].

   Information that must be provided to assign a new value: An IESG-
   approved standards-track specification defining the semantics and
   interoperability requirements of the proposed new value and the
   fields to be recorded in the registry.

   Fields to record in the registry: RDMAP DDP Untagged Queue Numbers,
   Queue Usage Description, RFC Reference.

   Initial registry contents:

   0x00000000, Queue 0 (Send operation Variants), [RFC5040]

   0x00000001, Queue 1 (RDMA Read Request operations), [RFC5040]

   0x00000002, Queue 2 (Terminate operations), [RFC5040]

   0x00000003, Queue 3 (Atomic Response operations), [RFCXXXX]

   Note: An experimental RDMAP Message Operation Code has already been
   allocated; hence there is no need for an experimental RDMAP DDP
   Untagged Queue Number.

   All other values are Unassigned and available to IANA for
   assignment. New RDMAP queue numbers should be assigned sequentially
   in order to better support implementations that perform RDMAP queue
   selection in hardware.

   Allocation Policy: Standards Action ([RFC5226])

   RFC Editor: Please replace XXXX in all instances of [RFCXXXX] above
   with the RFC number of this document and remove this note.

11. References

11.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4303] Kent, S., "IP Encapsulating Security Payload (ESP)", RFC
             4303, December 2005.

   [RFC5040] Recio, R. et al., "A Remote Direct Memory Access Protocol
             Specification", RFC 5040, October 2007.

   [RFC5041] Shah, H. et al., "Direct Data Placement over Reliable
             Transports", RFC 5041, October 2007.

   [RFC5042] Pinkerton, J. and E. Deleganes, "Direct Data Placement
             Protocol (DDP) / Remote Direct Memory Access Protocol
             (RDMAP) Security", RFC 5042, October 2007.

   [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an
             IANA Considerations Section in RFCs", BCP 26, RFC 5226,
             May 2008.

   [RFC7146] Black, D. and P. Koning, "Securing Block Storage Protocols
             over IP: RFC 3723 Requirements Update for IPsec v3", RFC
             7146, April 2014.

   RFC Editor: Please remove reference to RFC5226 if the associated
   IANA Considerations reference is also removed before publication.

11.2. Informative References

   [IB]      InfiniBand Trade Association, "InfiniBand Architecture
             Specification Volumes 1 and 2", Release 1.1, November
             2002, available from http://www.infinibandta.org/specs.

   [RSOCKETS] RSockets, RDMA enabled Sockets library for Open Fabrics,
             available from
             http://git.openfabrics.org/?p=~shefty/librdmacm.git;a=summary.

   [RFC5044] Culley, P., Elzur, U., Recio, R., Bailey, S., and J.
             Carrier, "Marker PDU Aligned Framing for TCP
             Specification", RFC 5044, October 2007.

   [RFC5045] Bestler, C. and L. Coene, "Applicability of Remote Direct
             Memory Access Protocol (RDMA) and Direct Data Placement
             Protocol (DDP)", RFC 5045, October 2007.

   [RFC6581] Kanevsky, A., Bestler, C., Sharp, R., and S. Wise,
             "Enhanced Remote Direct Memory Access (RDMA) Connection
             Establishment", RFC 6581, April 2012.

   [OFAVERBS] Open Fabrics Alliance Verbs Enhanced Atomic Operations,
             "[PATCH 0/2] Add support for enhanced atomic operations",
             available from
             http://www.spinics.net/lists/linux-rdma/msg02405.html.

   [DAT_ATOMICS] DAT Collaborative, User Direct Access Programming
             Library, "Ratified DAT IB extension spec", available from
             http://www.datcollaborative.org/DAT_IB_Extensions.pdf.

   [MPI]     Message Passing Interface Forum, "MPI: A Message-Passing
             Interface Standard, Version 3.0", available from
             http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf,
             September 2012.

12. Acknowledgments

   The authors would like to acknowledge the following contributors who
   provided valuable comments and suggestions.

   o David Black

   o Arkady Kanevsky

   o Bernard Metzler

   o Jim Pinkerton

   o Tom Talpey

   o Steve Wise

   o Don Wood

   This document was prepared using 2-Word-v2.0.template.dot.

Appendix A. DDP Segment Formats for RDMA Messages

   This appendix is for information only and is NOT part of the
   standard. It simply depicts the DDP Segment format for the various
   RDMA Messages.

A.1. DDP Segment for Atomic Operation Request

   The following figure depicts an Atomic Operation Request DDP
   Segment:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Reserved (Not Used)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          DDP (Atomic Operation Request) Queue Number          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    DDP (Atomic Operation Request) Message Sequence Number     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         DDP (Atomic Operation Request) Message Offset         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Reserved (Not Used)                  |AOpCode|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Request Identifier                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Remote STag                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Remote Tagged Offset                      |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Add or Swap Data                        |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Add or Swap Mask                        |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Compare Data                          |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Compare Mask                          |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

A.2. DDP Segment for Atomic Response

   The following figure depicts an Atomic Operation Response DDP
   Segment:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Reserved (Not Used)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         DDP (Atomic Operation Response) Queue Number          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    DDP (Atomic Operation Response) Message Sequence Number    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        DDP (Atomic Operation Response) Message Offset         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Original Request Identifier                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Original Remote Data Value                   |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

A.3. DDP Segment for Immediate Data and Immediate Data with SE

   The following figure depicts an Immediate Data or Immediate Data
   with SE DDP Segment:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Reserved (Not Used)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    DDP (Send) Queue Number                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              DDP (Send) Message Sequence Number               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      DDP Message Offset                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Immediate Data                         |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Authors' Addresses

   Hemal Shah
   Broadcom Corporation
   5300 California Avenue
   Irvine, CA 92617
   Phone: 1-949-926-6941
   Email: hemal@broadcom.com

   Felix Marti
   Chelsio Communications, Inc.
   370 San Aleso Ave.
   Sunnyvale, CA 94085
   Phone: 1-408-962-3600
   Email: felix@chelsio.com

   Asgeir Eiriksson
   Chelsio Communications, Inc.
   370 San Aleso Ave.
   Sunnyvale, CA 94085
   Phone: 1-408-962-3600
   Email: asgeir@chelsio.com

   Wael Noureddine
   Chelsio Communications, Inc.
   370 San Aleso Ave.
   Sunnyvale, CA 94085
   Phone: 1-408-962-3600
   Email: wael@chelsio.com

   Robert Sharp
   Intel Corporation
   1300 South Mopac Expy, Mailstop: AN4-4B
   Austin, TX 78746
   Phone: 1-512-362-1407
   Email: robert.o.sharp@intel.com