Network File System Version 4                              C. Lever, Ed.
Internet-Draft                                                    Oracle
Obsoletes: 5666 (if approved)                                         W.
                                                                 Simpson
Intended status: Standards Track                              DayDreamer
Expires: October 10, 2016                                      T. Talpey
                                                               Microsoft
                                                           April 8, 2016

Remote Direct Memory Access Transport for Remote Procedure Call, Version
                                  One
                     draft-ietf-nfsv4-rfc5666bis-05

Abstract

   This document specifies a protocol for conveying Remote Procedure
   Call (RPC) messages on physical transports capable of Remote Direct
   Memory Access (RDMA). It requires no revision to application RPC
   protocols or the RPC protocol itself. This document obsoletes RFC
   5666.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 10, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
     1.2.  Remote Procedure Calls On RDMA Transports
   2.  Changes Since RFC 5666
     2.1.  Changes To The Specification
     2.2.  Changes To The Protocol
   3.  Terminology
     3.1.  Remote Procedure Calls
     3.2.  Remote Direct Memory Access
   4.  RPC-Over-RDMA Protocol Framework
     4.1.  Transfer Models
     4.2.  Message Framing
     4.3.  Managing Receiver Resources
     4.4.  XDR Encoding With Chunks
     4.5.  Message Size
   5.  RPC-Over-RDMA In Operation
     5.1.  XDR Protocol Definition
     5.2.  Fixed Header Fields
     5.3.  Chunk Lists
     5.4.  Memory Registration
     5.5.  Error Handling
     5.6.  Protocol Elements No Longer Supported
     5.7.  XDR Examples
   6.  RPC Bind Parameters
   7.  Upper Layer Binding Specifications
     7.1.  DDP-Eligibility
     7.2.  Maximum Reply Size
     7.3.  Additional Considerations
     7.4.  Upper Layer Protocol Extensions
   8.  Protocol Extensibility
     8.1.  Conventional Extensions
   9.  Security Considerations
     9.1.  Memory Protection
     9.2.  RPC Message Security
   10. IANA Considerations
   11. Acknowledgments
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   This document obsoletes RFC 5666. However, the protocol specified by
   this document is based on existing interoperating implementations of
   the RPC-over-RDMA Version One protocol.

   The new specification clarifies text that is subject to multiple
   interpretations, and removes support for unimplemented RPC-over-RDMA
   Version One protocol elements. It makes the role of Upper Layer
   Bindings an explicit part of the protocol specification.

   In addition, this document describes current practice using
   RPCSEC_GSS [I-D.ietf-nfsv4-rpcsec-gssv3] on RDMA transports.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

1.2.  Remote Procedure Calls On RDMA Transports

   Remote Direct Memory Access (RDMA) [RFC5040] [RFC5041] [IB] is a
   technique for moving data efficiently between end nodes. By
   directing data into destination buffers as it is sent on a network,
   and placing it via direct memory access by hardware, the benefits of
   faster transfers and reduced host overhead are obtained.

   Open Network Computing Remote Procedure Call (ONC RPC, or simply,
   RPC) [RFC5531] is a remote procedure call protocol that runs over a
   variety of transports. Most RPC implementations today use UDP
   [RFC0768] or TCP [RFC0793]. On UDP, RPC messages are encapsulated
   inside datagrams, while on a TCP byte stream, RPC messages are
   delineated by a record marking protocol. An RDMA transport also
   conveys RPC messages in a specific fashion that must be fully
   described if RPC implementations are to interoperate.

   RDMA transports present semantics different from either UDP or TCP.
   They retain message delineations like UDP, but provide a reliable and
   sequenced data transfer like TCP. They also provide an offloaded
   bulk transfer service not provided by UDP or TCP. RDMA transports
   are therefore appropriately viewed as a new transport type by RPC.

   In this context, the Network File System (NFS) protocols as described
   in [RFC1094], [RFC1813], [RFC7530], [RFC5661], and future NFSv4 minor
   versions are obvious beneficiaries of RDMA transports. A complete
   problem statement is discussed in [RFC5532], and NFSv4-related issues
   are discussed in [RFC5661]. Many other RPC-based protocols can also
   benefit.

   Although the RDMA transport described here can provide relatively
   transparent support for any RPC application, this document also
   describes mechanisms that can optimize data transfer further, given
   more active participation by RPC applications.

2.  Changes Since RFC 5666

2.1.  Changes To The Specification

   The following alterations have been made to the RPC-over-RDMA Version
   One specification. The section numbers below refer to [RFC5666].

   o  Section 2 has been expanded to introduce and explain key RPC, XDR,
      and RDMA terminology. These terms are now used consistently
      throughout the specification.

   o  Section 3 has been re-organized and split into sub-sections to
      help readers locate specific requirements and definitions.

   o  Sections 4 and 5 have been combined to improve the organization of
      this information.

   o  The specification of the optional Connection Configuration
      Protocol has been removed from the specification.

   o  A section consolidating requirements for Upper Layer Bindings has
      been added.

   o  An XDR extraction mechanism is provided, along with full
      copyright, matching the approach used in [RFC5662].

   o  The "Security Considerations" section has been expanded to include
      a discussion of how RPC-over-RDMA security depends on features of
      the underlying RDMA transport.

   o  A subsection describing the use of RPCSEC_GSS with RPC-over-RDMA
      Version One has been added.

2.2.  Changes To The Protocol

   Although the protocol described herein interoperates with existing
   implementations of [RFC5666], the following changes have been made
   relative to the protocol described in that document:

   o  Support for the Read-Read transfer model has been removed. Read-
      Read is a slower transfer model than Read-Write, thus implementers
      have chosen not to support it. Removal simplifies explanatory
      text, and support for the RDMA_DONE procedure is no longer
      necessary.

   o  The specification of RDMA_MSGP in [RFC5666] and current
      implementations of it are incomplete. Even if completed, benefit
      for protocols such as NFSv4.0 [RFC7530] is doubtful. Therefore
      the RDMA_MSGP message type is no longer supported.

   o  Technical errors with regard to handling RPC-over-RDMA header
      errors have been corrected.

   o  Specific requirements related to handling XDR round-up and complex
      XDR data types have been added.

   o  Explicit guidance is provided for sizing Write chunks, managing
      multiple chunks in the Write list, and handling unused Write
      chunks.

   o  Clear guidance about Send and Receive buffer size has been added.
      This enables better decisions about when to provide and use the
      Reply chunk.

   The protocol version number has not been changed because the protocol
   specified in this document fully interoperates with implementations
   of the RPC-over-RDMA Version One protocol specified in [RFC5666].

3.  Terminology

3.1.  Remote Procedure Calls

   This section highlights key elements of the Remote Procedure Call
   [RFC5531] and External Data Representation [RFC4506] protocols, upon
   which RPC-over-RDMA Version One is constructed. A strong grounding
   in these protocols is recommended before reading this document.

3.1.1.  Upper Layer Protocols

   Remote Procedure Calls are an abstraction used to implement the
   operations of an "Upper Layer Protocol," or ULP. The term Upper
   Layer Protocol refers to an RPC Program and Version tuple, which is a
   versioned set of procedure calls that comprise a single well-defined
   API. One example of an Upper Layer Protocol is the Network File
   System Version 4.0 [RFC7530].

3.1.2.  Requesters And Responders

   Like a local procedure call, every Remote Procedure Call (RPC) has a
   set of "arguments" and a set of "results". A calling context is not
   allowed to proceed until the procedure's results are available to it.
   Unlike a local procedure call, the called procedure is executed
   remotely rather than in the local application's context.

   The RPC protocol as described in [RFC5531] is fundamentally a
   message-passing protocol between one server and one or more clients.
   ONC RPC transactions are made up of two types of messages:

   CALL Message
      A CALL message, or "Call", requests that work be done. A Call is
      designated by the value zero (0) in the message's msg_type field.
      An arbitrary unique value is placed in the message's xid field in
      order to match this CALL message to a corresponding REPLY message.

   REPLY Message
      A REPLY message, or "Reply", reports the results of work requested
      by a Call. A Reply is designated by the value one (1) in the
      message's msg_type field. The value contained in the message's
      xid field is copied from the Call whose results are being
      reported.

   The RPC client endpoint, or "requester", serializes an RPC Call's
   arguments and conveys them to a server endpoint via an RPC Call
   message. This message contains an RPC protocol header, a header
   describing the requested upper layer operation, and all arguments.

   The RPC server endpoint, or "responder", deserializes the arguments
   and processes the requested operation. It then serializes the
   operation's results into another byte stream. This byte stream is
   conveyed back to the requester via an RPC Reply message. This
   message contains an RPC protocol header, a header describing the
   upper layer reply, and all results.

   The requester deserializes the results and allows the original caller
   to proceed. At this point the RPC transaction designated by the xid
   in the Call message is complete, and the xid is retired.

3.1.3.  RPC Transports

   The role of an "RPC transport" is to mediate the exchange of RPC
   messages between requesters and responders. An RPC transport bridges
   the gap between the RPC message abstraction and the native operations
   of a particular network transport.

   RPC-over-RDMA is a connection-oriented RPC transport. When a
   connection-oriented transport is used, requesters initiate transport
   connections, while responders wait passively for incoming connection
   requests.

3.1.4.  External Data Representation

   One cannot assume that all requesters and responders internally
   represent data objects the same way.
   RPC uses eXternal Data
   Representation, or XDR, to translate data types and serialize
   arguments and results [RFC4506].

   The XDR protocol encodes data independent of the endianness or size
   of host-native data types, allowing unambiguous decoding of data on
   the receiving end. RPC Programs are specified by writing an XDR
   definition of their procedures, argument data types, and result data
   types.

   XDR assumes that the number of bits in a byte (octet) and their order
   are the same on both endpoints and on the physical network. The
   smallest indivisible unit of XDR encoding is a group of four octets
   in big-endian order. XDR also flattens lists, arrays, and other
   complex data types so they can be conveyed as a stream of bytes.

   A serialized stream of bytes that is the result of XDR encoding is
   referred to as an "XDR stream." A sending endpoint encodes native
   data into an XDR stream and then transmits that stream to a receiver.
   A receiving endpoint decodes incoming XDR byte streams into its
   native data representation format.

3.1.4.1.  XDR Opaque Data

   Sometimes a data item must be transferred as-is, without encoding or
   decoding. The contents of such a data item are referred to as
   "opaque data." XDR encoding places the content of opaque data items
   directly into an XDR stream without altering it in any way. Upper
   Layer Protocols or applications perform any needed data translation
   in this case. Examples of opaque data items include the content of
   files, or generic byte strings.

3.1.4.2.  XDR Round-up

   The number of octets in a variable-size opaque data item precedes
   that item in an XDR stream. If the size of an encoded data item is
   not a multiple of four octets, octets containing zero are added to
   the end of the item as it is encoded so that the next encoded data
   item starts on a four-octet boundary.
   The encoded size of the item
   is not changed by the addition of the extra octets, and the zero
   bytes are not exposed to the Upper Layer.

   This technique is referred to as "XDR round-up," and the extra octets
   are referred to as "XDR padding".

3.2.  Remote Direct Memory Access

   RPC requesters and responders can be made more efficient if large RPC
   messages are transferred by a third party such as intelligent network
   interface hardware (data movement offload), and placed in the
   receiver's memory so that no additional adjustment of data alignment
   has to be made (direct data placement). Remote Direct Memory Access
   transports enable both optimizations.

3.2.1.  Direct Data Placement

   Typically, RPC implementations copy the contents of RPC messages into
   a buffer before they are sent. An efficient RPC implementation sends
   bulk data without copying it into a separate send buffer first.

   However, socket-based RPC implementations are often unable to receive
   data directly into its final place in memory. Receivers often need
   to copy incoming data to finish an RPC operation; sometimes, only to
   adjust data alignment.

   In this document, "RDMA" refers to the physical mechanism an RDMA
   transport utilizes when moving data. Although this may not be
   efficient, a sender may copy data into an intermediate buffer before
   an RDMA transfer. After an RDMA transfer, a receiver may copy that
   data again to its final destination.

   This document uses the term "direct data placement" (or DDP) to refer
   specifically to an optimized data transfer where it is unnecessary
   for a receiving host's CPU to copy transferred data to another
   location after it has been received. Not all RDMA-based data
   transfer qualifies as Direct Data Placement, and DDP can be achieved
   using non-RDMA mechanisms.

3.2.2.
RDMA Transport Requirements

   The RPC-over-RDMA Version One protocol assumes the physical transport
   provides the following abstract operations. A more complete
   discussion of these operations is found in [RFC5040].

   Registered Memory
      Registered memory is a segment of memory that is assigned a
      steering tag that temporarily permits access by the RDMA provider
      to perform data transfer operations. The RPC-over-RDMA Version
      One protocol assumes that each segment of registered memory MUST
      be identified with a steering tag of no more than 32 bits and
      memory addresses of up to 64 bits in length.

   RDMA Send
      The RDMA provider supports an RDMA Send operation, with completion
      signaled on the receiving peer after data has been placed in a
      pre-posted memory segment. Sends complete at the receiver in the
      order they were issued at the sender. The amount of data
      transferred by an RDMA Send operation is limited by the size of
      the remote pre-posted memory segment.

   RDMA Receive
      The RDMA provider supports an RDMA Receive operation to receive
      data conveyed by incoming RDMA Send operations. To reduce the
      amount of memory that must remain pinned awaiting incoming Sends,
      the amount of pre-posted memory is limited. Flow-control to
      prevent overrunning receiver resources is provided by the RDMA
      consumer (in this case, the RPC-over-RDMA Version One protocol).

   RDMA Write
      The RDMA provider supports an RDMA Write operation to directly
      place data in remote memory. The local host initiates an RDMA
      Write, and completion is signaled there. No completion is
      signaled on the remote. The local host provides a steering tag,
      memory address, and length of the remote's memory segment.

      RDMA Writes are not necessarily ordered with respect to one
      another, but are ordered with respect to RDMA Sends.
      A subsequent
      RDMA Send completion obtained at the write initiator guarantees
      that prior RDMA Write data has been successfully placed in the
      remote peer's memory.

   RDMA Read
      The RDMA provider supports an RDMA Read operation to directly
      place peer source data in the read initiator's memory. The local
      host initiates an RDMA Read, and completion is signaled there; no
      completion is signaled on the remote. The local host provides
      steering tags, memory addresses, and a length for the remote
      source and local destination memory segments.

      The remote peer receives no notification of RDMA Read completion.
      The local host signals completion as part of a subsequent RDMA
      Send message so that the remote peer can release steering tags and
      subsequently free associated source memory segments.

   The RPC-over-RDMA Version One protocol is designed to be carried over
   RDMA transports that support the above abstract operations. This
   protocol conveys to the RPC peer information sufficient for that RPC
   peer to direct an RDMA layer to perform transfers containing RPC data
   and to communicate their result(s). For example, it is readily
   carried over RDMA transports such as Internet Wide Area RDMA Protocol
   (iWARP) [RFC5040] [RFC5041].

4.  RPC-Over-RDMA Protocol Framework

4.1.  Transfer Models

   A "transfer model" designates which endpoint is responsible for
   performing RDMA Read and Write operations. To enable these
   operations, the peer endpoint first exposes segments of its memory to
   the endpoint performing the RDMA Read and Write operations.

   Read-Read
      Requesters expose their memory to the responder, and the responder
      exposes its memory to requesters. The responder employs RDMA Read
      operations to pull RPC arguments or whole RPC calls from the
      requester. Requesters employ RDMA Read operations to pull RPC
      results or whole RPC replies from the responder.

   Write-Write
      Requesters expose their memory to the responder, and the responder
      exposes its memory to requesters. Requesters employ RDMA Write
      operations to push RPC arguments or whole RPC calls to the
      responder. The responder employs RDMA Write operations to push
      RPC results or whole RPC replies to the requester.

   Read-Write
      Requesters expose their memory to the responder, but the responder
      does not expose its memory. The responder employs RDMA Read
      operations to pull RPC arguments or whole RPC calls from the
      requester. The responder employs RDMA Write operations to push
      RPC results or whole RPC replies to the requester.

   Write-Read
      The responder exposes its memory to requesters, but requesters do
      not expose their memory. Requesters employ RDMA Write operations
      to push RPC arguments or whole RPC calls to the responder.
      Requesters employ RDMA Read operations to pull RPC results or
      whole RPC replies from the responder.

   [RFC5666] specifies the use of both the Read-Read and the Read-Write
   Transfer Model. All current RPC-over-RDMA Version One
   implementations use only the Read-Write Transfer Model. Therefore
   the use of the Read-Read Transfer Model within RPC-over-RDMA Version
   One implementations is no longer supported. Other Transfer Models
   may be used in future versions of RPC-over-RDMA.

4.2.  Message Framing

   On an RPC-over-RDMA transport, each RPC message is encapsulated by an
   RPC-over-RDMA message. An RPC-over-RDMA message consists of two XDR
   streams.

   RPC Payload Stream
      The "Payload stream" contains the encapsulated RPC message being
      transferred by this RPC-over-RDMA message. This stream always
      begins with the XID field of the encapsulated RPC message.

   Transport Stream
      The "Transport stream" contains a header that describes and
      controls the transfer of the Payload stream in this RPC-over-RDMA
      message.
      This header is analogous to the record marking used for
      RPC over TCP but is more extensive, since RDMA transports support
      several modes of data transfer.

   In its simplest form, an RPC-over-RDMA message consists of a
   Transport stream followed immediately by a Payload stream conveyed
   together in a single RDMA Send. To transmit large RPC messages, a
   combination of one RDMA Send operation and one or more RDMA Read or
   Write operations is employed.

   RPC-over-RDMA framing replaces all other RPC framing (such as TCP
   record marking) when used atop an RPC-over-RDMA association, even
   when the underlying RDMA protocol may itself be layered atop a
   transport with a defined RPC framing (such as TCP).

   It is however possible for RPC-over-RDMA to be dynamically enabled in
   the course of negotiating the use of RDMA via an Upper Layer Protocol
   exchange. Because RPC framing delimits an entire RPC request or
   reply, the resulting shift in framing must occur between distinct RPC
   messages, and in concert with the underlying transport.

4.3.  Managing Receiver Resources

   It is critical to provide RDMA Send flow control for an RDMA
   connection. If no pre-posted receive buffer is large enough to
   accept an incoming RDMA Send, the RDMA Send operation fails. If a
   pre-posted receive buffer is not available to accept an incoming RDMA
   Send, the RDMA Send operation can fail. Repeated occurrences of such
   errors can be fatal to the connection. This is a departure from
   conventional TCP/IP networking where buffers are allocated
   dynamically as part of receiving messages.

   The longevity of an RDMA connection requires that sending endpoints
   respect the resource limits of peer receivers. To ensure messages
   can be sent and received reliably, there are two operational
   parameters for each connection.

4.3.1.
Credit Limit

   The number of pre-posted RDMA Receive operations is sometimes
   referred to as a peer's "credit limit." Flow control for RDMA Send
   operations directed to the responder is implemented as a simple
   request/grant protocol in the RPC-over-RDMA header associated with
   each RPC message. Section 5.2.3 has further detail.

   o  The RPC-over-RDMA header for RPC Call messages contains a
      requested credit value for the responder. This is the maximum
      number of RPC replies the requester can handle at once,
      independent of how many RPCs are in flight at that moment. The
      requester MAY dynamically adjust the requested credit value to
      match its expected needs.

   o  The RPC-over-RDMA header for RPC Reply messages provides the
      granted result. This is the maximum number of RPC calls the
      responder can handle at once, without regard to how many RPCs are
      in flight at that moment. The granted value MUST NOT be zero,
      since such a value would result in deadlock. The responder MAY
      dynamically adjust the granted credit value to match its needs or
      policies (e.g. to accommodate the available resources in a shared
      receive queue).

   The requester MUST NOT send unacknowledged requests in excess of this
   granted responder credit limit. If the limit is exceeded, the RDMA
   layer may signal an error, possibly terminating the connection. If
   an RDMA layer error does not occur, the responder MAY handle excess
   requests or return an RPC layer error to the requester.

   While RPC calls complete in any order, the current flow control limit
   at the responder is known to the requester from the Send ordering
   properties. It is always the lower of the requested and granted
   credit values, minus the number of requests in flight. Advertised
   credit values are not altered when individual RPCs are started or
   completed.
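
   The credit arithmetic above can be illustrated with a small
   requester-side sketch. It is not part of the protocol definition:
   the class name CreditTracker and its method names are hypothetical,
   the choice to raise an exception on a zero grant is an assumption
   made for illustration, and the single initial credit anticipates
   the rule given in Section 4.3.3.

```python
class CreditTracker:
    """Requester-side view of RPC-over-RDMA credits (illustrative only)."""

    def __init__(self, requested: int) -> None:
        if requested < 1:
            raise ValueError("requested credit value must be at least 1")
        self.requested = requested  # value advertised in each Call header
        self.granted = 1            # assume one credit until the first Reply
        self.in_flight = 0          # Calls sent but not yet answered

    def available(self) -> int:
        # Lower of the requested and granted values, minus requests in flight.
        return max(0, min(self.requested, self.granted) - self.in_flight)

    def on_send_call(self) -> None:
        # A requester must not exceed the current limit; a real
        # implementation would queue the Call instead of raising.
        if self.available() == 0:
            raise RuntimeError("no credits available")
        self.in_flight += 1

    def on_reply(self, granted: int) -> None:
        # Assumption: treat a zero grant as a protocol violation.
        if granted == 0:
            raise ValueError("granted credit value must not be zero")
        self.granted = granted
        self.in_flight = max(0, self.in_flight - 1)
```

   In this sketch the advertised values change only when a Reply
   arrives; starting or completing an RPC alters only the in-flight
   count, matching the last sentence of the paragraph above.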

   On occasion a requester or responder may need to adjust the amount of
   resources available to a connection. When this happens, the
   responder needs to ensure that a credit increase is effected (i.e.
   RDMA Receives are posted) before the next reply is sent.

   Certain RDMA implementations may impose additional flow control
   restrictions, such as limits on RDMA Read operations in progress at
   the responder. Accommodation of such restrictions is considered the
   responsibility of each RPC-over-RDMA Version One implementation.

4.3.2.  Inline Threshold

   A receiver's "inline threshold" value is the largest message size (in
   octets) that the receiver can accept via an RDMA Receive operation.
   Each connection has two inline threshold values, one for each peer
   receiver.

   Unlike credit limits, inline threshold values are not advertised to
   peers via the RPC-over-RDMA Version One protocol, and there is no
   provision for the inline threshold value to change during the
   lifetime of an RPC-over-RDMA Version One connection.

4.3.3.  Initial Connection State

   When a connection is first established, peers might not know how many
   receive buffers the other has, nor how large these buffers are.

   As a basis for an initial exchange of RPC requests, each RPC-over-
   RDMA Version One connection provides the ability to exchange at least
   one RPC message at a time that is 1024 bytes in size. A responder
   MAY exceed this basic level of configuration, but a requester MUST
   NOT assume more than one credit is available, and MUST receive a
   valid reply from the responder carrying the actual number of
   available credits, prior to sending its next request.

   Receiver implementations MUST support an inline threshold of 1024
   bytes, but MAY support larger inline threshold values.
   A mechanism
   for discovering a peer's inline threshold value before a connection
   is established may be used to optimize the use of RDMA Send
   operations. In the absence of such a mechanism, senders MUST assume
   a receiver's inline threshold is 1024 bytes.

4.4.  XDR Encoding With Chunks

   When a direct data placement capability is available, during XDR
   encoding it can be determined that the transport can efficiently
   place the content of one or more data items directly in the
   receiver's memory, separately from the transfer of other parts of the
   containing XDR stream.

4.4.1.  Reducing An XDR Stream

   RPC-over-RDMA Version One provides a mechanism for moving part of an
   RPC message via a data transfer separate from an RDMA Send/Receive.
   The sender removes one or more XDR data items from the Payload
   stream. They are conveyed via one or more RDMA Read or Write
   operations. The receiver inserts the data items into the Payload
   stream before passing it to the Upper Layer.

   A contiguous piece of a Payload stream can be split out and moved via
   separate RDMA operations. The piece of memory containing that
   portion of the data stream and metadata in an RPC-over-RDMA header
   together comprise what is referred to as a "chunk." A Payload stream
   after chunks have been removed is referred to as a "reduced" Payload
   stream. Likewise, a data item that has been removed from a Payload
   stream to be transferred separately is referred to as a "reduced"
   data item.

4.4.2.  DDP-Eligibility

   Only an XDR data item that might benefit from Direct Data Placement
   may be reduced. The eligibility of particular XDR data items to be
   reduced is independent of RPC-over-RDMA, and thus is not specified by
   this document.
   To maintain interoperability on an RPC-over-RDMA transport, a
   determination must be made of which XDR data items in each Upper
   Layer Protocol are allowed to use Direct Data Placement. Therefore
   an additional specification is needed that describes how an Upper
   Layer Protocol enables Direct Data Placement. The set of
   requirements for an Upper Layer Protocol to use an RPC-over-RDMA
   transport is known as an "Upper Layer Binding specification," or ULB.

   An Upper Layer Binding specification states which specific individual
   XDR data items in an Upper Layer Protocol MAY be transferred via
   Direct Data Placement. This document will refer to XDR data items
   that are permitted to be reduced as "DDP-eligible". All other XDR
   data items MUST NOT be reduced. RPC-over-RDMA Version One uses RDMA
   Read and Write operations to transfer DDP-eligible data that has been
   reduced.

   Detailed requirements for Upper Layer Bindings are discussed in full
   in Section 7.

4.4.3. RDMA Segments

   When encoding a Payload stream that contains a DDP-eligible data
   item, a sender may choose to reduce that data item. It does not
   place the item into the Payload stream. Instead, the sender records
   in the RPC-over-RDMA header the actual address and size of the memory
   region containing that data item.

   The requester provides location information for DDP-eligible data
   items in both RPC Calls and Replies. The responder uses this
   information to initiate RDMA Read and Write operations to retrieve or
   update the specified region of the requester's memory.

   An "RDMA segment", or a "plain segment", is an RPC-over-RDMA header
   data object that contains the precise co-ordinates of a contiguous
   memory region that is to be conveyed via one or more RDMA Read or
   RDMA Write operations. The following fields are contained in each
   segment.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Handle                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Length                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                            Offset                             +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Handle
      Steering tag (STag) or handle obtained when the segment's memory
      is registered for RDMA. Also known as an R_key, this value is
      generated by registering this memory with the RDMA provider.

   Length
      The length of the memory segment, in octets.

   Offset
      The offset or beginning memory address of the segment.

   See [RFC5040] for further discussion of the meaning of these fields.

4.4.4. Chunks

   In RPC-over-RDMA Version One, a "chunk" refers to a portion of the
   Payload stream that is moved via RDMA Read or Write operations.
   Chunk data is removed from the sender's Payload stream, transferred
   by separate RDMA operations, and then re-inserted into the receiver's
   Payload stream.

   Each chunk consists of one or more RDMA segments. Each segment
   represents a single contiguous piece of that chunk. Segments MAY
   divide a chunk on any boundary that is convenient to the requester.

   Except in special cases, a chunk contains exactly one XDR data item.
   This makes it straightforward to remove chunks from an XDR stream
   without affecting XDR alignment. Not every RPC-over-RDMA message has
   chunks associated with it.

4.4.4.1. Counted Arrays

   If a chunk contains a counted array data type, the count of array
   elements MUST remain in the Payload stream, while the array elements
   MUST be moved to the chunk. For example, when encoding an opaque
   byte array as a chunk, the count of bytes stays in the Payload
   stream, while the bytes in the array are removed from the Payload
   stream and transferred within the chunk.

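   As an informal illustration of the counted-array rule, the sketch
   below encodes an XDR variable-length opaque both ways: unreduced,
   and reduced so that only the byte count stays in the Payload stream.
   The function names are hypothetical and are not part of this
   protocol. The big-endian count and zero padding follow XDR
   [RFC4506], and leaving the pad out of the reduced form follows the
   round-up rule in Section 4.4.5.1.

```python
import struct
from typing import Tuple


def xdr_pad_len(n: int) -> int:
    """Number of pad bytes XDR adds so that n data bytes end on a 4-byte boundary."""
    return (4 - n % 4) % 4


def encode_opaque_inline(data: bytes) -> bytes:
    """Unreduced form: count, data, and pad all travel in the Payload stream."""
    return struct.pack(">I", len(data)) + data + b"\x00" * xdr_pad_len(len(data))


def encode_opaque_reduced(data: bytes) -> Tuple[bytes, bytes]:
    """Reduced form: the count stays in the Payload stream, the bytes become chunk data.

    Returns (payload_part, chunk_data). No pad bytes are left behind in
    the Payload stream for the reduced item.
    """
    return struct.pack(">I", len(data)), data


payload_part, chunk_data = encode_opaque_reduced(b"hello")
# The count left in the Payload stream equals the chunk length.
assert struct.unpack(">I", payload_part)[0] == len(chunk_data)
```

   In this sketch, the receiver would learn the array length from the
   Payload stream and the array content from the chunk.
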
   Any byte count left in the Payload stream MUST match the sum of the
   lengths of the segments making up the chunk. If they do not agree,
   an RPC protocol encoding error results.

   Individual array elements appear in a chunk in their entirety. For
   example, when encoding an array of arrays as a chunk, the count of
   items in the enclosing array stays in the Payload stream, but each
   enclosed array, including its item count, is transferred as part of
   the chunk.

4.4.4.2. Optional-data

   If a chunk contains an optional-data data type, the "is present"
   field MUST remain in the Payload stream, while the data, if present,
   MUST be moved to the chunk.

4.4.4.3. XDR Unions

   A union data type should never be made DDP-eligible, but one or more
   of its arms may be DDP-eligible.

4.4.5. Read Chunks

   A "Read chunk" represents an XDR data item that is to be pulled from
   the requester to the responder using RDMA Read operations.

   A Read chunk is a list of one or more RDMA segments. Each RDMA
   segment in a Read chunk is a plain segment which has an additional
   Position field.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Position                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Handle                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            Length                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                            Offset                             +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Position
      The byte offset in the Payload stream where the receiver
      re-inserts the data item conveyed in a chunk. The Position value
      MUST be computed from the beginning of the Payload stream, which
      begins at Position zero. All RDMA segments belonging to the same
      Read chunk have the same value in their Position field.

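   The Read segment layout above can be sketched as follows. This is an
   illustrative sketch only: the class and function names are
   hypothetical, and it encodes a single segment without the
   surrounding list structure. It assumes the usual XDR rule that
   32-bit and 64-bit unsigned integers are big-endian, so one Read
   segment occupies 20 octets.

```python
import struct
from typing import List, NamedTuple


class ReadSegment(NamedTuple):
    position: int  # byte offset in the Payload stream
    handle: int    # STag / R_key from memory registration
    length: int    # segment length in octets
    offset: int    # 64-bit offset or starting address


def encode_read_segment(seg: ReadSegment) -> bytes:
    """Encode Position, Handle, Length (uint32 each) and Offset (uint64)."""
    return struct.pack(">IIIQ", seg.position, seg.handle, seg.length, seg.offset)


def read_chunk_length(segments: List[ReadSegment]) -> int:
    """Sum the segment lengths after checking that all segments share one Position."""
    if len({s.position for s in segments}) != 1:
        raise ValueError("segments of one Read chunk must share a Position")
    return sum(s.length for s in segments)


chunk = [ReadSegment(24, 0x1234, 4096, 0x1000), ReadSegment(24, 0x1235, 100, 0x9000)]
wire = b"".join(encode_read_segment(s) for s in chunk)
```

   Here both segments carry Position 24, so they belong to one Read
   chunk of 4196 octets.
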
   While constructing an RPC-over-RDMA Call message, a requester
   registers memory segments containing data in Read chunks. It
   advertises these chunks in the RPC-over-RDMA header of the RPC Call.

   After receiving an RPC Call sent via an RDMA Send operation, a
   responder transfers the chunk data from the requester using RDMA Read
   operations. The responder reconstructs the transferred chunk data by
   concatenating the contents of each segment, in list order, into the
   received Payload stream at the Position value recorded in the
   segment.

   Put another way, the responder inserts the first segment in a Read
   chunk into the Payload stream at the byte offset indicated by its
   Position field. Segments whose Position field value matches this
   offset are concatenated afterwards, until there are no more segments
   at that Position value. The next XDR data item in the Payload stream
   follows.

4.4.5.1. Read Chunk Round-up

   XDR requires each encoded data item to start on four-byte alignment.
   When an odd-length data item is encoded, its length is encoded
   literally, while the data is padded so the next data item in the XDR
   stream can start on a four-byte boundary. Receivers ignore the
   content of the pad bytes.

   After an XDR data item has been reduced, all data items remaining in
   the Payload stream must continue to adhere to these padding
   requirements. Thus when an XDR data item is moved from the Payload
   stream into a Read chunk, the requester MUST remove XDR padding for
   that data item from the Payload stream as well.

   The length of a Read chunk is the sum of the lengths of the read
   segments that comprise it. If this sum is not a multiple of four,
   the requester MAY choose to send a Read chunk without any XDR
   padding.
   If the requester provides no actual round-up in a Read chunk, the
   responder MUST be prepared to provide appropriate round-up in the
   reconstructed call XDR stream.

   The Position field in a read segment indicates where the containing
   Read chunk starts in the Payload stream. The value in this field
   MUST be a multiple of four. Moreover, all segments in the same Read
   chunk share the same Position value, even if one or more of the
   segments have a non-four-byte aligned length.

4.4.5.2. Decoding Read Chunks

   While decoding a received Payload stream, whenever the XDR offset in
   the Payload stream matches that of a Read chunk, the transport
   initiates an RDMA Read to pull the chunk's data content into
   registered memory on the responder.

   The responder acknowledges its completion of use of Read chunk source
   buffers when it sends an RPC Reply to the requester. The requester
   may then release Read chunks advertised in the request.

4.4.6. Write Chunks

   A "Write chunk" represents an XDR data item that is to be pushed from
   a responder to a requester using RDMA Write operations.

   A Write chunk is an array of one or more plain RDMA segments. Write
   chunks are provided by a requester long before the responder has
   prepared the reply Payload stream. Therefore RDMA segments in a
   Write chunk do not have a Position field.

   While constructing an RPC Call message, a requester also prepares
   memory regions to catch DDP-eligible reply data items. A requester
   does not know the actual length of the result data item to be
   returned, thus it MUST register a Write chunk long enough to
   accommodate the maximum possible size of the returned data item.

   A responder copies the requester-provided Write chunk segments into
   the RPC-over-RDMA header that it returns with the reply. The
   responder MUST NOT change the number of segments in the Write chunk.

   The responder fills the segments in array order until the data item
   has been completely written. The responder updates the segment
   length fields to reflect the actual amount of data that is being
   returned in each segment. If a Write chunk segment is not filled by
   the responder, the updated length of the segment SHOULD be zero.

   The responder then sends the RPC Reply via an RDMA Send operation.
   After receiving the RPC Reply, the requester reconstructs the
   transferred data by concatenating the contents of each segment, in
   array order, into the RPC Reply XDR stream.

4.4.6.1. Write Chunk Round-up

   XDR requires each encoded data item to start on four-byte alignment.
   When an odd-length data item is encoded, its length is encoded
   literally, while the data is padded so the next data item in the XDR
   stream can start on a four-byte boundary. Receivers ignore the
   content of the pad bytes.

   After a data item is reduced, data items remaining in the Payload
   stream must continue to adhere to these padding requirements. Thus
   when an XDR data item is moved from a reply Payload stream into a
   Write chunk, the responder MUST remove XDR padding for that data item
   from the reply Payload stream as well.

   A requester SHOULD NOT provide extra length in a Write chunk to
   accommodate XDR pad bytes. A responder MUST NOT write XDR pad bytes
   for a Write chunk.

4.4.6.2. Unused Write Chunks

   There are occasions when a requester provides a Write chunk but the
   responder does not use it.

   For example, an Upper Layer Protocol may define a union result where
   some arms of the union contain a DDP-eligible data item while other
   arms do not. The requester is required to provide a Write chunk in
   this case, but if the responder returns a result that uses an arm of
   the union that has no DDP-eligible data item, the Write chunk remains
   unused.

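   The segment-filling rule from Section 4.4.6 can be sketched as
   below. This is a simplified model, not an implementation: the
   function name is hypothetical, segment capacities stand in for
   registered memory, and the RDMA Write itself is represented by
   slicing a byte string. When there is no data to place, every
   returned length is zero.

```python
from typing import List, Tuple


def fill_write_chunk(capacities: List[int], data: bytes) -> Tuple[List[int], List[bytes]]:
    """Place data into segments in array order.

    capacities: lengths of the segments as provided by the requester.
    Returns (updated_lengths, pieces): one updated length and one piece
    of data per segment. The number of segments never changes.
    """
    if len(data) > sum(capacities):
        raise ValueError("Write chunk too small for the result")
    lengths, pieces, pos = [], [], 0
    for cap in capacities:
        n = min(cap, len(data) - pos)   # zero once the data item is fully written
        lengths.append(n)
        pieces.append(data[pos:pos + n])
        pos += n
    return lengths, pieces


lengths, pieces = fill_write_chunk([4, 4, 4], b"abcdef")
unused_lengths, _ = fill_write_chunk([8, 8], b"")
```

   In this sketch the requester would rebuild the data item by
   concatenating the pieces in array order.
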
   When forming an RPC-over-RDMA Reply message with an unused Write
   chunk, the responder MUST set the length of all segments in the chunk
   to zero.

   Unused Write chunks, or unused bytes in Write chunk segments, are not
   returned as results. Their memory is returned to the Upper Layer as
   part of RPC completion. However, the RPC layer MUST NOT assume that
   the buffers have not been modified.

4.5. Message Size

   A receiver of RDMA Send operations is required by RDMA to have
   previously posted one or more adequately sized buffers. Memory
   savings can be achieved on both requesters and responders by leaving
   the inline threshold small. However, not all RPC messages are small.

4.5.1. Short Messages

   RPC messages are frequently smaller than typical inline thresholds.
   For example, the NFS version 3 GETATTR request is only 56 bytes: 20
   bytes of RPC header, plus a 32-byte file handle argument and 4 bytes
   for its length. The reply to this common request is about 100 bytes.

   Since all RPC messages conveyed via RPC-over-RDMA require an RDMA
   Send operation, the most efficient way to send an RPC message that is
   smaller than the receiver's inline threshold is to append the Payload
   stream directly to the Transport stream. An RPC-over-RDMA header
   with a small RPC Call or Reply message immediately following is
   transferred using a single RDMA Send operation. No RDMA Read or
   Write operations are needed.

   An RPC-over-RDMA transaction using Short Messages:

        Requester                             Responder
            |        RDMA Send (RDMA_MSG)         |
       Call |   ------------------------------>   |
            |                                     | Processing
            |                                     |
            |                                     |
            |        RDMA Send (RDMA_MSG)         |
            |   <------------------------------   | Reply

4.5.2. Chunked Messages

   If DDP-eligible data items are present in a Payload stream, a sender
   MAY reduce some or all of these items by removing them from the
   Payload stream.
   The sender uses RDMA Read or Write operations to transfer the reduced
   data items. The Transport stream with the reduced Payload stream
   immediately following is then transferred using a single RDMA Send
   operation.

   After receiving the Transport and Payload streams of a Chunked
   RPC-over-RDMA Call message, the responder uses RDMA Read operations
   to move reduced data items in Read chunks. Before sending the
   Transport and Payload streams of a Chunked RPC-over-RDMA Reply
   message, the responder uses RDMA Write operations to move reduced
   data items in Write and Reply chunks.

   An RPC-over-RDMA transaction with a Read chunk:

        Requester                             Responder
            |        RDMA Send (RDMA_MSG)         |
       Call |   ------------------------------>   |
            |              RDMA Read              |
            |   <------------------------------   |
            |      RDMA Response (arg data)       |
            |   ------------------------------>   |
            |                                     | Processing
            |                                     |
            |                                     |
            |        RDMA Send (RDMA_MSG)         |
            |   <------------------------------   | Reply

   An RPC-over-RDMA transaction with a Write chunk:

        Requester                             Responder
            |        RDMA Send (RDMA_MSG)         |
       Call |   ------------------------------>   |
            |                                     | Processing
            |                                     |
            |                                     |
            |      RDMA Write (result data)       |
            |   <------------------------------   |
            |        RDMA Send (RDMA_MSG)         |
            |   <------------------------------   | Reply

4.5.3. Long Messages

   When a Payload stream is larger than the receiver's inline threshold,
   the Payload stream is reduced by removing DDP-eligible data items and
   placing them in chunks to be moved separately. If there are no
   DDP-eligible data items in the Payload stream, or the Payload stream
   is still too large after it has been reduced, the RDMA transport MUST
   use RDMA Read or Write operations to convey the Payload stream
   itself. This mechanism is referred to as a "Long Message."

   To transmit a Long Message, the sender conveys only the Transport
   stream with an RDMA Send operation.
   The Payload stream is not included in the Send buffer in this
   instance. Instead, the requester provides chunks that the responder
   uses to move the Payload stream.

   Long RPC Call
      To send a Long RPC-over-RDMA Call message, the requester provides
      a special Read chunk that contains the RPC Call's Payload stream.
      Every segment in this Read chunk MUST contain zero in its Position
      field. Thus this chunk is known as a "Position Zero Read chunk."

   Long RPC Reply
      To send a Long RPC-over-RDMA Reply message, the requester provides
      a single special Write chunk in advance, known as the "Reply
      chunk", that will contain the RPC Reply's Payload stream. The
      requester sizes the Reply chunk to accommodate the maximum
      expected reply size for that Upper Layer operation.

   Though the purpose of a Long Message is to handle large RPC messages,
   requesters MAY use a Long Message at any time to convey an RPC Call.

   A responder chooses which form of reply to use based on the chunks
   provided by the requester. If Write chunks were provided and the
   responder has a DDP-eligible result, it first reduces the reply
   Payload stream. If a Reply chunk was provided and the reduced
   Payload stream is larger than the requester's inline threshold, the
   responder MUST use the provided Reply chunk for the reply.

   Because these special chunks contain a whole RPC message, XDR data
   items appear in these special chunks without regard to their
   DDP-eligibility.

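   The choice among Short, Chunked, and Long Messages described in
   Section 4.5 can be approximated by the sketch below. It is a
   deliberately simplified model under stated assumptions: the function
   name is hypothetical, the size of the Transport stream and XDR pad
   bytes are ignored, and the sender reduces only when the Payload
   stream would not otherwise fit. The 1024-byte default is the inline
   threshold a sender assumes when it knows nothing else about the
   receiver.

```python
def choose_message_form(payload_len: int,
                        ddp_eligible_len: int = 0,
                        inline_threshold: int = 1024) -> str:
    """Return "short", "chunked", or "long" for a Payload stream.

    payload_len:      size of the unreduced Payload stream in bytes.
    ddp_eligible_len: bytes of DDP-eligible data items it contains.
    """
    if payload_len <= inline_threshold:
        return "short"      # Payload stream rides in the RDMA Send
    reduced_len = payload_len - ddp_eligible_len
    if ddp_eligible_len > 0 and reduced_len <= inline_threshold:
        return "chunked"    # reduced Payload stream fits inline
    return "long"           # whole Payload stream moves via a special chunk


form_getattr = choose_message_form(56)
form_write = choose_message_form(8192 + 100, ddp_eligible_len=8192)
form_big = choose_message_form(4096)
```

   A real sender has more freedom than this sketch shows: it MAY reduce
   DDP-eligible items even when the message would fit inline, and a
   requester MAY use a Long Message for a Call at any time.
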
   An RPC-over-RDMA transaction using a Long Call:

        Requester                             Responder
            |       RDMA Send (RDMA_NOMSG)        |
       Call |   ------------------------------>   |
            |              RDMA Read              |
            |   <------------------------------   |
            |      RDMA Response (RPC call)       |
            |   ------------------------------>   |
            |                                     | Processing
            |                                     |
            |                                     |
            |        RDMA Send (RDMA_MSG)         |
            |   <------------------------------   | Reply

   An RPC-over-RDMA transaction using a Long Reply:

        Requester                             Responder
            |        RDMA Send (RDMA_MSG)         |
       Call |   ------------------------------>   |
            |                                     | Processing
            |                                     |
            |                                     |
            |       RDMA Write (RPC reply)        |
            |   <------------------------------   |
            |       RDMA Send (RDMA_NOMSG)        |
            |   <------------------------------   | Reply

5. RPC-Over-RDMA In Operation

   Every RPC-over-RDMA Version One message has a header that includes a
   copy of the message's transaction ID, data for managing RDMA flow
   control credits, and lists of RDMA segments used for RDMA Read and
   Write operations. All RPC-over-RDMA header content is contained in
   the Transport stream, and thus MUST be XDR encoded.

   RPC message layout is unchanged from that described in [RFC5531]
   except for the possible reduction of data items that are moved by
   RDMA Read or Write operations.

   The RPC-over-RDMA protocol passes RPC messages without regard to
   their type (CALL or REPLY) or direction (forwards or backwards).
   Each endpoint of a connection MAY send any RPC-over-RDMA message
   header type at any time (subject to credit limits).

5.1. XDR Protocol Definition

   This section contains a description of the core features of the
   RPC-over-RDMA Version One protocol, expressed in the XDR language
   [RFC4506].

   This description is provided in a way that makes it simple to extract
   into ready-to-compile form.
   The reader can apply the following shell script to this document to
   produce a machine-readable XDR description of the RPC-over-RDMA
   Version One protocol without any OPTIONAL extensions.

   <CODE BEGINS>

   #!/bin/sh
   grep '^ *///' | sed 's?^ /// ??' | sed 's?^ *///$??'

   <CODE ENDS>

   That is, if the above script is stored in a file called "extract.sh"
   and this document is in a file called "spec.txt" then the reader can
   do the following to extract an XDR description file:

   <CODE BEGINS>

   sh extract.sh < spec.txt > rpcrdma_corev1.x

   <CODE ENDS>

5.1.1. Code Component License

   Code components extracted from this document must include the
   following license text. When the extracted XDR code is combined with
   other complementary XDR code which itself has an identical license,
   only a single copy of the license text need be preserved.

   <CODE BEGINS>

   /// /*
   ///  * Copyright (c) 2010, 2016 IETF Trust and the persons
   ///  * identified as authors of the code. All rights reserved.
   ///  *
   ///  * The authors of the code are:
   ///  * B. Callaghan, T. Talpey, and C. Lever
   ///  *
   ///  * Redistribution and use in source and binary forms, with
   ///  * or without modification, are permitted provided that the
   ///  * following conditions are met:
   ///  *
   ///  * - Redistributions of source code must retain the above
   ///  *   copyright notice, this list of conditions and the
   ///  *   following disclaimer.
   ///  *
   ///  * - Redistributions in binary form must reproduce the above
   ///  *   copyright notice, this list of conditions and the
   ///  *   following disclaimer in the documentation and/or other
   ///  *   materials provided with the distribution.
   ///  *
   ///  * - Neither the name of Internet Society, IETF or IETF
   ///  *   Trust, nor the names of specific contributors, may be
   ///  *   used to endorse or promote products derived from this
   ///  *   software without specific prior written permission.
   ///  *
   ///  *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS
   ///  *   AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
   ///  *   WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
   ///  *   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
   ///  *   FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
   ///  *   EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
   ///  *   LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   ///  *   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
   ///  *   NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
   ///  *   SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
   ///  *   INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   ///  *   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
   ///  *   OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
   ///  *   IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
   ///  *   ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
   ///  */
   ///

   <CODE ENDS>

5.1.2. RPC-Over-RDMA Version One XDR

   XDR data items defined in this section encode the Transport Header
   Stream in each RPC-over-RDMA Version One message. Comments identify
   items that cannot be changed in subsequent versions.

   <CODE BEGINS>

   /// /*
   ///  * Plain RDMA segment (Section 4.4.3)
   ///  */
   /// struct xdr_rdma_segment {
   ///    uint32 handle;  /* Registered memory handle */
   ///    uint32 length;  /* Length of the chunk in bytes */
   ///    uint64 offset;  /* Chunk virtual address or offset */
   /// };
   ///
   /// /*
   ///  * Read segment (Section 4.4.5)
   ///  */
   /// struct xdr_read_chunk {
   ///    uint32 position;  /* Position in XDR stream */
   ///    struct xdr_rdma_segment target;
   /// };
   ///
   /// /*
   ///  * Read list (Section 5.3.1)
   ///  */
   /// struct xdr_read_list {
   ///    struct xdr_read_chunk entry;
   ///    struct xdr_read_list  *next;
   /// };
   ///
   /// /*
   ///  * Write chunk (Section 4.4.6)
   ///  */
   /// struct xdr_write_chunk {
   ///    struct xdr_rdma_segment target<>;
   /// };
   ///
   /// /*
   ///  * Write list (Section 5.3.2)
   ///  */
   /// struct xdr_write_list {
   ///    struct xdr_write_chunk entry;
   ///    struct xdr_write_list  *next;
   /// };
   ///
   /// /*
   ///  * Chunk lists (Section 5.3)
   ///  */
   /// struct rpc_rdma_header {
   ///    struct xdr_read_list   *rdma_reads;
   ///    struct xdr_write_list  *rdma_writes;
   ///    struct xdr_write_chunk *rdma_reply;
   ///    /* rpc body follows */
   /// };
   ///
   /// struct rpc_rdma_header_nomsg {
   ///    struct xdr_read_list   *rdma_reads;
   ///    struct xdr_write_list  *rdma_writes;
   ///    struct xdr_write_chunk *rdma_reply;
   /// };
   ///
   /// struct rpc_rdma_header_padded {
   ///    uint32                 rdma_align;   /* Padding alignment */
   ///    uint32                 rdma_thresh;  /* Padding threshold */
   ///    struct xdr_read_list   *rdma_reads;
   ///    struct xdr_write_list  *rdma_writes;
   ///    struct xdr_write_chunk *rdma_reply;
   ///    /* rpc body follows */
   /// };
   ///
   /// /*
   ///  * Error handling (Section 5.5)
   ///  */
   /// enum rpc_rdma_errcode {
   ///    ERR_VERS = 1,   /* Fixed for all versions */
   ///    ERR_CHUNK = 2
   /// };
   ///
   /// struct rpc_rdma_errvers {
   ///    uint32 rdma_vers_low;
   ///    uint32 rdma_vers_high;
   /// };
   ///
   /// union rpc_rdma_error switch (rpc_rdma_errcode err) {
   ///    case ERR_VERS:
   ///       rpc_rdma_errvers range;
   ///    case ERR_CHUNK:
   ///       void;
   /// };
   ///
   /// /*
   ///  * Procedures (Section 5.2.4)
   ///  */
   /// enum rdma_proc {
   ///    RDMA_MSG = 0,     /* Fixed for all versions */
   ///    RDMA_NOMSG = 1,   /* Fixed for all versions */
   ///    RDMA_MSGP = 2,    /* Reserved */
   ///    RDMA_DONE = 3,    /* Reserved */
   ///    RDMA_ERROR = 4    /* Fixed for all versions */
   /// };
   ///
   /// union rdma_body switch (rdma_proc proc) {
   ///    case RDMA_MSG:
   ///       rpc_rdma_header rdma_msg;
   ///    case RDMA_NOMSG:
   ///       rpc_rdma_header_nomsg rdma_nomsg;
   ///    case RDMA_MSGP:
   ///       rpc_rdma_header_padded rdma_msgp;
   ///    case RDMA_DONE:
   ///       void;
   ///    case RDMA_ERROR:
   ///       rpc_rdma_error rdma_error;
   /// };
   ///
   /// /*
   ///  * Fixed header fields (Section 5.2)
   ///  */
   /// struct rdma_msg {
   ///    uint32    rdma_xid;
   ///    uint32    rdma_vers;
   ///    uint32    rdma_credit;
   ///    rdma_body rdma_body;
   /// };

   <CODE ENDS>

5.2. Fixed Header Fields

   The RPC-over-RDMA header begins with four fixed 32-bit fields that
   control the RDMA interaction. These four fields, which must remain
   with the same meanings and in the same positions in all subsequent
   versions of the RPC-over-RDMA protocol, are described below.

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              XID                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Version Number                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Credit Value                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Procedure Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.2.1. Transaction ID (XID)

   The XID generated for the RPC Call and Reply. Having the XID at a
   fixed location in the header makes it easy for the receiver to
   establish context as soon as each RPC-over-RDMA message arrives.
   This XID MUST be the same as the XID in the RPC message. The
   receiver MAY perform its processing based solely on the XID in the
   RPC-over-RDMA header, and thereby ignore the XID in the RPC message,
   if it so chooses.

5.2.2. Version Number

   For RPC-over-RDMA Version One, this field MUST contain the value one
   (1). Rules regarding changes to this transport protocol version
   number can be found in Section 8.

5.2.3. Credit Value

   When sent in an RPC Call message, the requested credit value is
   provided. When sent in an RPC Reply message, the granted credit
   value is returned. RPC Calls SHOULD NOT be sent in excess of the
   currently granted limit. Further discussion of how the credit value
   is determined can be found in Section 4.3.

5.2.4. Procedure Number

   o  RDMA_MSG = 0 indicates that chunk lists and a Payload stream
      follow. The format of the chunk lists is discussed below.

   o  RDMA_NOMSG = 1 indicates that after the chunk lists there is no
      Payload stream. In this case, the chunk lists provide information
      to allow the responder to transfer the Payload stream using RDMA
      Read or Write operations.

   o  RDMA_MSGP = 2 is reserved.

   o  RDMA_DONE = 3 is reserved.

   o  RDMA_ERROR = 4 is used to signal an encoding error in the
      RPC-over-RDMA header.

   An RDMA_MSG procedure conveys the Transport stream and the Payload
   stream via an RDMA Send operation. The Transport stream contains the
   four fixed fields, followed by the Read and Write lists and the Reply
   chunk, though any or all three MAY be marked as not present. The
   Payload stream then follows, beginning with its XID field. If a Read
   or Write chunk list is present, a portion of the Payload stream has
   been excised and is conveyed separately via RDMA Read or Write
   operations.

   An RDMA_NOMSG procedure conveys the Transport stream via an RDMA Send
   operation. The Transport stream contains the four fixed fields,
   followed by the Read and Write chunk lists and the Reply chunk.
   Though any of these MAY be marked as not present, one MUST be present
   and MUST hold the Payload stream for this RPC-over-RDMA message. If
   a Read or Write chunk list is present, a portion of the Payload
   stream has been excised and is conveyed separately via RDMA Read or
   Write operations.

   An RDMA_ERROR procedure conveys the Transport stream via an RDMA Send
   operation. The Transport stream contains the four fixed fields,
   followed by formatted error information. No Payload stream is
   conveyed in this type of RPC-over-RDMA message.

   A gather operation on each RDMA Send operation can be used to combine
   the Transport and Payload streams, which might have been constructed
   in separate buffers. However, the total length of the gathered send
   buffers MUST NOT exceed the peer receiver's inline threshold.

5.3. Chunk Lists

   The chunk lists in an RPC-over-RDMA Version One header are three XDR
   optional-data fields that follow the fixed header fields in RDMA_MSG
   and RDMA_NOMSG procedures. Read Section 4.19 of [RFC4506] carefully
   to understand how optional-data fields work.
   Examples of XDR encoded chunk lists are provided in Section 5.7 as an
   aid to understanding.

5.3.1. Read List

   Each RDMA_MSG or RDMA_NOMSG procedure has one "Read list." The Read
   list is a list of zero or more Read segments, provided by the
   requester, that are grouped by their Position fields into Read
   chunks. Each Read chunk advertises the location of argument data the
   responder is to retrieve via RDMA Read operations. The requester has
   removed the data in these chunks from the call's Payload stream.

   Via a Position Zero Read Chunk, a requester may provide an RPC Call
   message as a chunk in the Read list.

   If the RPC Call has no argument data that is DDP-eligible and the
   Position Zero Read Chunk is not being used, the requester leaves the
   Read list empty.

   Responders MUST leave the Read list empty in all replies.

5.3.2. Write List

   Each RDMA_MSG or RDMA_NOMSG procedure has one "Write list." The
   Write list is a list of zero or more Write chunks, provided by the
   requester. Each Write chunk is an array of RDMA segments, thus the
   Write list is a list of counted arrays. Each Write chunk advertises
   receptacles for DDP-eligible data to be pushed by the responder via
   RDMA Write operations. If the RPC Reply has no possible DDP-eligible
   result data items, the requester leaves the Write list empty.

   When a Write list is provided for the results of an RPC Call, the
   responder MUST provide data corresponding to DDP-eligible XDR data
   items via RDMA Write operations to the memory referenced in the Write
   list. The responder removes the data in these chunks from the
   reply's Payload stream.

   When multiple Write chunks are present, the responder fills in each
   Write chunk with a DDP-eligible result until either there are no more
   results or no more Write chunks.
   The requester may not be able to predict which DDP-eligible data item
   goes in which chunk. Thus the requester is responsible for
   allocating and registering Write chunks large enough to accommodate
   the largest XDR data item that might be associated with each chunk in
   the list.

   The RPC Reply conveys the size of result data items by returning each
   Write chunk to the requester with the segment lengths rewritten to
   match the actual data transferred. Decoding the reply therefore
   performs no local data copying but merely returns the length obtained
   from the reply.

   Each decoded result consumes one entry in the Write list, which in
   turn consists of an array of RDMA segments. The length of a Write
   chunk is therefore the sum of all returned lengths in all segments
   comprising the corresponding list entry. As each Write chunk is
   decoded, the entire Write list entry is consumed.

   A requester constructs the Write list for an RPC transaction before
   the responder has formulated its reply. When there is only one
   DDP-eligible result data item, the requester inserts only a single
   Write chunk in the Write list. If the responder populates that chunk
   with data, the requester knows with certainty which result data item
   is contained in it.

   However, Upper Layer Protocol procedures may allow replies where more
   than one result data item is DDP-eligible. For example, an NFSv4
   COMPOUND procedure is composed of individual NFSv4 operations, more
   than one of which may have a reply containing a DDP-eligible result.

   As stated above, when multiple Write chunks are present, the
   responder reduces DDP-eligible results until either there are no more
   results or no more Write chunks. Then, as the requester decodes the
   reply Payload stream, it is clear from the contents of the reply
   which Write chunk contains which data item.

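   On the requester side, the length bookkeeping described above can be
   sketched as follows. The names are hypothetical, and a returned
   Write list is modeled simply as a list of chunks, each a list of
   (handle, length, offset) tuples. No data is copied; the result sizes
   come only from the segment lengths the responder rewrote.

```python
from typing import List, Tuple

Segment = Tuple[int, int, int]   # (handle, length, offset)
WriteChunk = List[Segment]


def write_chunk_length(chunk: WriteChunk) -> int:
    """Length of one Write chunk: the sum of its returned segment lengths."""
    return sum(length for _handle, length, _offset in chunk)


def returned_result_lengths(write_list: List[WriteChunk]) -> List[int]:
    """One length per Write list entry, in list order."""
    return [write_chunk_length(chunk) for chunk in write_list]


returned = [
    [(0x10, 4096, 0x1000), (0x11, 904, 0x2000)],  # first result: 5000 bytes
    [(0x12, 0, 0x3000)],                          # second chunk left unused
]
sizes = returned_result_lengths(returned)
```

   In this example the first Write list entry carries a 5000-byte
   result and the second entry was returned with a zero length.
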
1447 When a requester has provided a Write list in a Call message, the 1448 responder MUST copy that list into the associated Reply. The copied 1449 Write list in the Reply is modified as above to reflect the actual 1450 amount of data that is being returned in the Write list. 1452 5.3.3. Reply Chunk 1454 Each RDMA_MSG or RDMA_NOMSG procedure has one "Reply chunk." The 1455 Reply chunk is a Write chunk, provided by the requester. The Reply 1456 chunk is a single counted array of RDMA segments. 1458 A requester MUST provide a Reply chunk whenever the maximum possible 1459 size of the reply is larger than its own inline threshold. The Reply 1460 chunk MUST be large enough to contain a Payload stream (RPC message) 1461 of this maximum size. If the actual reply Payload stream is smaller 1462 than the requester's inline threshold, the responder MAY return it as 1463 a Short message rather than using the Reply chunk. 1465 When a requester has provided a Reply chunk in a Call message, the 1466 responder MUST copy that chunk into the associated Reply. The copied 1467 Reply chunk in the Reply is modified to reflect the actual amount of 1468 data that is being returned in the Reply chunk. 1470 5.4. Memory Registration 1472 RDMA requires that data is transferred between only registered memory 1473 segments at the source and destination. All protocol headers as well 1474 as separately transferred data chunks must reside in registered 1475 memory. 1477 Since the cost of registering and de-registering memory can be a 1478 significant proportion of the RDMA transaction cost, it is important 1479 to minimize registration activity. For memory that is targeted by 1480 RDMA Send and Receive operations, a local-only registration is 1481 sufficient and can be left in place during the life of a connection 1482 without any risk of data exposure. 1484 5.4.1. 
Registration Longevity 1486 Data transferred via RDMA Read and Write can reside in a memory 1487 allocation not in the control of the RPC-over-RDMA transport. These 1488 memory allocations can persist outside the bounds of an RPC 1489 transaction. They are registered and invalidated as needed, as part 1490 of each RPC transaction. 1492 The requester endpoint must ensure that memory segments associated 1493 with each RPC transaction are properly fenced from responders before 1494 allowing Upper Layer access to the data contained in them. Moreover, 1495 the requester must not access these memory segments while the 1496 responder has access to them. 1498 This includes segments that are associated with canceled RPCs. A 1499 responder cannot know that the requester is no longer waiting for a 1500 reply, and might proceed to read or even update memory that the 1501 requester might have released for other use. 1503 5.4.2. Communicating DDP-Eligibility 1505 The interface by which an Upper Layer Protocol implementation 1506 communicates the eligibility of a data item locally to its local RPC- 1507 over-RDMA endpoint is not described by this specification. 1509 Depending on the implementation and constraints imposed by Upper 1510 Layer Bindings, it is possible to implement reduction transparently 1511 to upper layers. Such implementations may lead to inefficiencies, 1512 either because they require the RPC layer to perform expensive 1513 registration and de-registration of memory "on the fly", or they may 1514 require using RDMA chunks in reply messages, along with the resulting 1515 additional handshaking with the RPC-over-RDMA peer. 1517 However, these issues are internal and generally confined to the 1518 local interface between RPC and its upper layers, one in which 1519 implementations are free to innovate. The only requirement is that 1520 the resulting RPC-over-RDMA protocol sent to the peer is valid for 1521 the upper layer. 1523 5.4.3. 
Registration Strategies 1525 The choice of which memory registration strategies to employ is left 1526 to requester and responder implementers. To support the widest array 1527 of RDMA implementations, as well as the most general steering tag 1528 scheme, an Offset field is included in each segment. 1530 While zero-based offset schemes are available in many RDMA 1531 implementations, their use by RPC requires individual registration of 1532 each segment. For such implementations, this can be a significant 1533 overhead. By providing an offset in each chunk, many pre- 1534 registration or region-based registrations can be readily supported. 1535 By using a single, universal chunk representation, the RPC-over-RDMA 1536 protocol implementation is simplified to its most general form. 1538 5.5. Error Handling 1540 A receiver performs basic validity checks on the RPC-over-RDMA header 1541 and chunk contents before it passes the RPC message to the RPC 1542 consumer. If errors are detected in an RPC-over-RDMA header, an 1543 RDMA_ERROR procedure MUST be generated. Because the transport layer 1544 may not be aware of the direction of a problematic RPC message, an 1545 RDMA_ERROR procedure MAY be generated by either a requester or a 1546 responder. 1548 To form an RDMA_ERROR procedure: The rdma_xid field MUST contain the 1549 same XID that was in the rdma_xid field in the failing request; The 1550 rdma_vers field MUST contain the same version that was in the 1551 rdma_vers field in the failing request; The rdma_proc field MUST 1552 contain the value RDMA_ERROR; The rdma_err field contains a value 1553 that reflects the type of error that occurred, as described below. 1555 An RDMA_ERROR procedure indicates a permanent error. Receipt of this 1556 procedure completes the RPC transaction associated with XID in the 1557 rdma_xid field. A receiver MUST silently discard an RDMA_ERROR 1558 procedure that it cannot decode. 1560 5.5.1. 
Header Version Mismatch 1562 When a receiver detects an RPC-over-RDMA header version that it does 1563 not support (currently this document defines only Version One), it 1564 MUST reply with an RDMA_ERROR procedure and set the rdma_err value to 1565 ERR_VERS, also providing the low and high inclusive version numbers 1566 it does, in fact, support. 1568 5.5.2. XDR Errors 1570 A receiver might encounter an XDR parsing error that prevents it from 1571 processing the incoming Transport stream. Examples of such errors 1572 include an invalid value in the rdma_proc field, an RDMA_NOMSG 1573 message that has no chunk lists, or an rdma_xid field whose 1574 contents do not match the contents of the XID field in the 1575 accompanying RPC message. If the rdma_vers field contains a 1576 recognized value, but an XDR parsing error occurs, the responder MUST 1577 reply with an RDMA_ERROR procedure and set the rdma_err value to 1578 ERR_CHUNK. 1580 When a responder receives a valid RPC-over-RDMA header but the 1581 responder's Upper Layer Protocol implementation cannot parse the RPC 1582 arguments in the RPC Call message, the responder SHOULD return an 1583 RPC_GARBAGEARGS reply, using an RDMA_MSG procedure. This type of 1584 parsing failure might be due to mismatches between chunk sizes or 1585 offsets and the contents of the Payload stream, for example. A 1586 responder MAY also report the presence of a non-DDP-eligible data 1587 item in a Read or Write chunk using RPC_GARBAGEARGS. 1589 5.5.3. Responder RDMA Operational Errors 1591 In RPC-over-RDMA Version One, it is the responder that drives RDMA 1592 Read and Write operations that target the requester's memory. 1593 Problems might arise as the responder attempts to use requester- 1594 provided resources for RDMA operations. For example: 1596 o Chunks can be validated only by using their contents to form RDMA 1597 Read or Write operations.
If chunk contents are invalid (say, a 1598 segment is no longer registered, or a chunk length is too long), a 1599 Remote Access error occurs. 1601 o If a requester's receive buffer is too small, the responder's Send 1602 operation completes with a Local Length Error. 1604 o If the requester-provided Reply chunk is too small to accommodate 1605 a large RPC Reply, a Remote Access error occurs. A responder can 1606 detect this problem before attempting to write past the end of the 1607 Reply chunk. 1609 RDMA operational errors are typically fatal to the connection. To 1610 avoid a retransmission loop and repeated connection loss that 1611 deadlocks the connection, once the requester has re-established a 1612 connection, the responder should send an RDMA_ERROR reply with an 1613 rdma_err value of ERR_CHUNK to indicate that no RPC-level reply is 1614 possible for that XID. 1616 5.5.4. Other Operational Errors 1618 While a requester is constructing a Call message, an unrecoverable 1619 problem might occur that prevents the requester from posting further 1620 RDMA Work Requests on behalf of that message. As with other 1621 transports, if a requester is unable to construct and transmit a Call 1622 message, the associated RPC transaction fails immediately. 1624 After a requester has received a reply, if it is unable to invalidate 1625 a memory region due to an unrecoverable problem, the requester MUST 1626 close the connection to fence that memory from the responder before 1627 the associated RPC transaction is complete. 1629 While a responder is constructing a Reply message or error message, 1630 an unrecoverable problem might occur that prevents the responder from 1631 posting further RDMA Work Requests on behalf of that message. If a 1632 responder is unable to construct and transmit a Reply or error 1633 message, the responder MUST close the connection to signal to the 1634 requester that a reply was lost. 1636 5.5.5. 
RDMA Transport Errors 1638 The RDMA connection and physical link provide some degree of error 1639 detection and retransmission. iWARP's Marker PDU Aligned (MPA) layer 1640 (when used over TCP), Stream Control Transmission Protocol (SCTP), as 1641 well as the InfiniBand link layer all provide Cyclic Redundancy Check 1642 (CRC) protection of the RDMA payload, and CRC-class protection is a 1643 general attribute of such transports. 1645 Additionally, the RPC layer itself can accept errors from the 1646 transport, and recover via retransmission. RPC recovery can handle 1647 complete loss and re-establishment of a transport connection. 1649 The details of reporting and recovery from RDMA link layer errors are 1650 outside the scope of this protocol specification. See Section 9 for 1651 further discussion of the use of RPC-level integrity schemes to 1652 detect errors. 1654 5.6. Protocol Elements No Longer Supported 1656 The following protocol elements are no longer supported in RPC-over- 1657 RDMA Version One. Related enum values and structure definitions 1658 remain in the RPC-over-RDMA Version One protocol for backwards 1659 compatibility. 1661 5.6.1. RDMA_MSGP 1663 The specification of RDMA_MSGP in Section 3.9 of [RFC5666] is 1664 incomplete. To fully specify RDMA_MSGP would require: 1666 o Updating the definition of DDP-eligibility to include data items 1667 that may be transferred, with padding, via RDMA_MSGP procedures 1669 o Adding full operational descriptions of the alignment and 1670 threshold fields 1672 o Discussing how alignment preferences are communicated between two 1673 peers without using CCP 1675 o Describing the treatment of RDMA_MSGP procedures that convey Read 1676 or Write chunks 1678 The RDMA_MSGP message type is beneficial only when the padded data 1679 payload is at the end of an RPC message's argument or result list. 
1680 This is not typical for NFSv4 COMPOUND RPCs, which often include a 1681 GETATTR operation as the final element of the compound operation 1682 array. 1684 Without a full specification of RDMA_MSGP, there has been no fully 1685 implemented prototype of it. Without a complete prototype of 1686 RDMA_MSGP support, it is difficult to assess whether this protocol 1687 element has benefit, or can even be made to work interoperably. 1689 Therefore, senders MUST NOT send RDMA_MSGP procedures. When 1690 receiving an RDMA_MSGP procedure, receivers SHOULD reply with an 1691 RDMA_ERROR procedure, setting the rdma_err field to ERR_CHUNK. 1693 5.6.2. RDMA_DONE 1695 Because no implementation of RPC-over-RDMA Version One uses the Read- 1696 Read transfer model, there is never a need to send an RDMA_DONE 1697 procedure. 1699 Therefore, senders MUST NOT send RDMA_DONE messages. When receiving 1700 an RDMA_DONE procedure, receivers SHOULD reply with an RDMA_ERROR 1701 procedure, setting the rdma_err field to ERR_CHUNK. 1703 5.7. XDR Examples 1705 RPC-over-RDMA chunk lists are complex data types. In this section, 1706 illustrations are provided to help readers grasp how chunk lists are 1707 represented inside an RPC-over-RDMA header. 1709 An RDMA segment is the simplest component, being made up of a 32-bit 1710 handle (H), a 32-bit length (L), and 64-bits of offset (OO). Once 1711 flattened into an XDR stream, RDMA segments appear as 1713 HLOO 1715 A Read segment has an additional 32-bit position field. Read 1716 segments appear as 1717 PHLOO 1719 A Read chunk is a list of Read segments. Each segment is preceded by 1720 a 32-bit word containing a one if there is a segment, or a zero if 1721 there are no more segments (optional-data). In XDR form, this would 1722 look like 1724 1 PHLOO 1 PHLOO 1 PHLOO 0 1726 where P would hold the same value for each segment belonging to the 1727 same Read chunk. 1729 The Read List is also a list of Read segments. 
In XDR form, this 1730 would look like a Read chunk, except that the P values could vary 1731 across the list. An empty Read List is encoded as a single 32-bit 1732 zero. 1734 One Write chunk is a counted array of segments. In XDR form, the 1735 count would appear as the first 32-bit word, followed by an HLOO for 1736 each element of the array. For instance, a Write chunk with three 1737 elements would look like 1739 3 HLOO HLOO HLOO 1741 The Write List is a list of counted arrays. In XDR form, this is a 1742 combination of optional-data and counted arrays. To represent a 1743 Write List containing a Write chunk with three segments and a Write 1744 chunk with two segments, XDR would encode 1746 1 3 HLOO HLOO HLOO 1 2 HLOO HLOO 0 1748 An empty Write List is encoded as a single 32-bit zero. 1750 The Reply chunk is a Write chunk. Since it is an optional-data 1751 field, however, there is a 32-bit field in front of it that contains 1752 a one if the Reply chunk is present, or a zero if it is not. After 1753 encoding, a Reply chunk with 2 segments would look like 1755 1 2 HLOO HLOO 1757 Frequently a requester does not provide any chunks. In that case, 1758 after the four fixed fields in the RPC-over-RDMA header, there are 1759 simply three 32-bit fields that contain zero. 1761 6. RPC Bind Parameters 1763 In setting up a new RDMA connection, the first action by a requester 1764 is to obtain a transport address for the responder. The mechanism 1765 used to obtain this address, and to open an RDMA connection is 1766 dependent on the type of RDMA transport, and is the responsibility of 1767 each RPC protocol binding and its local implementation. 1769 RPC services normally register with a portmap or rpcbind [RFC1833] 1770 service, which associates an RPC Program number with a service 1771 address. (In the case of UDP or TCP, the service address for NFS is 1772 normally port 2049.) 
This policy is no different with RDMA 1773 transports, although it may require the allocation of port numbers 1774 appropriate to each Upper Layer Protocol that uses the RPC framing 1775 defined here. 1777 When mapped atop the iWARP transport [RFC5040] [RFC5041], which uses 1778 IP port addressing due to its layering on TCP and/or SCTP, port 1779 mapping is trivial and consists merely of issuing the port in the 1780 connection process. The NFS/RDMA protocol service address has been 1781 assigned port 20049 by IANA, for both iWARP/TCP and iWARP/SCTP. 1783 When mapped atop InfiniBand [IB], which uses a Group Identifier 1784 (GID)-based service endpoint naming scheme, a translation MUST be 1785 employed. One such translation is defined in the InfiniBand Port 1786 Addressing Annex [IBPORT], which is appropriate for translating IP 1787 port addressing to the InfiniBand network. Therefore, in this case, 1788 IP port addressing may be readily employed by the upper layer. 1790 When a mapping standard or convention exists for IP ports on an RDMA 1791 interconnect, there are several possibilities for each upper layer to 1792 consider: 1794 o One possibility is to have the responder register its mapped IP port 1795 with the rpcbind service, under the netid (or netids) defined 1796 here. An RPC-over-RDMA-aware requester can then resolve its 1797 desired service to a mappable port, and proceed to connect. This 1798 is the most flexible and compatible approach, for those upper 1799 layers that are defined to use the rpcbind service. 1801 o A second possibility is to have the responder's portmapper 1802 register itself on the RDMA interconnect at a "well known" service 1803 address (on UDP or TCP, this corresponds to port 111). A 1804 requester could connect to this service address and use the 1805 portmap protocol to obtain a service address in response to a 1806 program number, e.g., an iWARP port number, or an InfiniBand GID.
1808 o Alternatively, the requester could simply connect to the mapped 1809 well-known port for the service itself, if it is appropriately 1810 defined. By convention, the NFS/RDMA service, when operating atop 1811 such an InfiniBand fabric, will use the same 20049 assignment as 1812 for iWARP. 1814 Historically, different RPC protocols have taken different approaches 1815 to their port assignment; therefore, the specific method is left to 1816 each RPC-over-RDMA-enabled Upper Layer binding, and not addressed 1817 here. 1819 In Section 10, this specification defines two new "netid" values, to 1820 be used for registration of upper layers atop iWARP [RFC5040] 1821 [RFC5041] and (when a suitable port translation service is available) 1822 InfiniBand [IB]. Additional RDMA-capable networks MAY define their 1823 own netids, or if they provide a port translation, MAY share the one 1824 defined here. 1826 7. Upper Layer Binding Specifications 1828 An Upper Layer Protocol is typically defined independently of any 1829 particular RPC transport. An Upper Layer Binding specification (ULB) 1830 provides guidance that helps the Upper Layer Protocol interoperate 1831 correctly and efficiently over a particular transport. For RPC-over- 1832 RDMA Version One, an Upper Layer Binding may provide: 1834 o A taxonomy of XDR data items that are eligible for Direct Data 1835 Placement 1837 o Constraints on which Upper Layer procedures may be reduced, and on 1838 how many chunks may appear in a single RPC request 1840 o A method for determining the maximum size of the reply Payload 1841 stream for all procedures in the Upper Layer Protocol 1843 o An rpcbind port assignment for operation of the RPC Program and 1844 Version on an RPC-over-RDMA transport 1846 Each RPC Program and Version tuple that utilizes RPC-over-RDMA 1847 Version One needs to have an Upper Layer Binding specification. 1849 7.1. 
DDP-Eligibility 1851 An Upper Layer Binding designates some XDR data items as eligible for 1852 Direct Data Placement. As an RPC-over-RDMA message is formed, DDP- 1853 eligible data items can be removed from the Payload stream and placed 1854 directly in the receiver's memory. 1856 An XDR data item should be considered for DDP-eligibility if there is 1857 a clear benefit to moving the contents of the item directly from the 1858 sender's memory to the receiver's memory. Criteria for DDP- 1859 eligibility include: 1861 o The XDR data item is frequently sent or received, and its size is 1862 often much larger than typical inline thresholds. 1864 o Transport-level processing of the XDR data item is not needed. 1865 For example, the data item is an opaque byte array, which requires 1866 no XDR encoding and decoding of its content. 1868 o The content of the XDR data item is sensitive to address 1869 alignment. For example, pullup would be required on the receiver 1870 before the content of the item can be used. 1872 o The XDR data item does not contain DDP-eligible data items. 1874 In addition to defining the set of data items that are DDP-eligible, 1875 an Upper Layer Binding may also limit the use of chunks to particular 1876 Upper Layer procedures. If more than one data item in a procedure is 1877 DDP-eligible, the Upper Layer Binding may also limit the number of 1878 chunks that a requester can provide for a particular Upper Layer 1879 procedure. 1881 Senders MUST NOT reduce data items that are not DDP-eligible. Such 1882 data items MAY, however, be moved as part of a Position Zero Read 1883 Chunk or a Reply chunk. 1885 The programming interface by which an Upper Layer implementation 1886 indicates the DDP-eligibility of a data item to the RPC transport is 1887 not described by this specification. 
The only requirements are that 1888 the receiver can re-assemble the transmitted RPC-over-RDMA message 1889 into a valid XDR stream, and that DDP-eligibility rules specified by 1890 the Upper Layer Binding are respected. 1892 There is no provision to express DDP-eligibility within the XDR 1893 language. The only definitive specification of DDP-eligibility is an 1894 Upper Layer Binding. 1896 7.1.1. DDP-Eligibility Violation 1898 A DDP-eligibility violation occurs when a requester forms a Call 1899 message with a non-DDP-eligible data item in a Read chunk. A 1900 violation also occurs when a responder forms a Reply message without 1901 reducing a DDP-eligible data item even though the requester has 1902 provided a Write list. 1904 In the first case, a responder MUST NOT process the Call message. 1906 In the second case, as a requester parses a Reply message, it must 1907 assume that the responder has correctly reduced a DDP-eligible result 1908 data item. If the responder has not done so, it is likely that the 1909 requester cannot finish parsing the Payload stream and that an XDR 1910 error would result. 1912 Both types of violations MUST be reported as described in 1913 Section 5.5.2. 1915 7.2. Maximum Reply Size 1917 A requester provides resources for both a Call message and its 1918 matching Reply message. A requester forms the Call message itself, 1919 and thus can compute the exact resources needed for it. 1921 A requester must allocate resources for the Reply message (an RPC- 1922 over-RDMA credit, a Receive buffer, and possibly a Write list and 1923 Reply chunk) before the responder has formed the actual reply. To 1924 accommodate all possible replies for the procedure in the Call 1925 message, a requester must allocate reply resources based on the 1926 maximum possible size of the expected Reply message.
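The sizing rule can be sketched in a few lines of Python. This non-normative sketch applies the rule from Section 5.3.3 that a Reply chunk is needed whenever the maximum possible reply is larger than the requester's inline threshold. The function name, the returned tuple, and the threshold value of 1024 octets are assumptions chosen for illustration only.

```python
# Illustrative inline threshold in octets. This value is an assumption
# made for the sketch, not a requirement taken from this section.
EXAMPLE_INLINE_THRESHOLD = 1024


def plan_reply_resources(max_reply_size,
                         inline_threshold=EXAMPLE_INLINE_THRESHOLD):
    """Return (needs_reply_chunk, reply_chunk_size) for one transaction.

    max_reply_size is the largest reply Payload stream the Upper Layer
    procedure can produce, as determined by its Upper Layer Binding.
    """
    if max_reply_size > inline_threshold:
        # The Reply chunk must be able to hold a reply of maximum size.
        return True, max_reply_size
    # The reply always fits inline, so no Reply chunk is required.
    return False, 0
```

With these assumed values, a procedure whose reply can reach 8192 octets requires an 8192-octet Reply chunk, while one whose reply never exceeds 1024 octets requires none.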
1928 If there are procedures in the Upper Layer Protocol for which there 1929 is no clear reply size maximum, the Upper Layer Binding needs to 1930 specify a dependable means for determining the maximum. 1932 7.3. Additional Considerations 1934 There may be other details provided in an Upper Layer Binding. 1936 o An Upper Layer Binding may recommend an inline threshold value or 1937 other transport-related parameters for RPC-over-RDMA Version One 1938 connections bearing that Upper Layer Protocol. 1940 o An Upper Layer Protocol may provide a means to communicate these 1941 transport-related parameters between peers. Note that RPC-over- 1942 RDMA Version One does not specify any mechanism for changing any 1943 transport-related parameter after a connection has been 1944 established. 1946 o Multiple Upper Layer Protocols may share a single RPC-over-RDMA 1947 Version One connection when their Upper Layer Bindings allow the 1948 use of RPC-over-RDMA Version One and the rpcbind port assignments 1949 for the Protocols allow connection sharing. In this case, the 1950 same transport parameters (such as inline threshold) apply to all 1951 Protocols using that connection. 1953 Each Upper Layer Binding needs to be designed to allow correct 1954 interoperation without regard to the transport parameters actually in 1955 use. Furthermore, implementations of Upper Layer Protocols must be 1956 designed to interoperate correctly regardless of the connection 1957 parameters in effect on a connection. 1959 7.4. Upper Layer Protocol Extensions 1961 An RPC Program and Version tuple may be extensible. For instance, 1962 there may be a minor versioning scheme that is not reflected in the 1963 RPC version number. Or, the Upper Layer Protocol may allow 1964 additional features to be specified after the original RPC program 1965 specification was ratified. 
1967 Upper Layer Bindings are provided for interoperable RPC Programs and 1968 Versions by extending existing Upper Layer Bindings to reflect the 1969 changes made necessary by each addition to the existing XDR. 1971 8. Protocol Extensibility 1973 The RPC-over-RDMA header format is specified using XDR, unlike the 1974 message header used with RPC over TCP. To maintain a high degree of 1975 interoperability among implementations of RPC-over-RDMA, any change 1976 to this XDR requires a protocol version number change. New versions 1977 of RPC-over-RDMA may be published as separate protocol specifications 1978 without updating this document. 1980 The first four fields in every RPC-over-RDMA header must remain 1981 aligned at the same fixed offsets for all versions of the RPC-over- 1982 RDMA protocol. The version number must be in a fixed place to enable 1983 implementations to detect protocol version mismatches. 1985 For version mismatches to be reported in a fashion that all future 1986 version implementations can reliably decode, the rdma_proc field must 1987 remain in a fixed place, the value of ERR_VERS must always remain the 1988 same, and the field placement in struct rpc_rdma_errvers must always 1989 remain the same. 1991 8.1. Conventional Extensions 1993 Introducing new capabilities to RPC-over-RDMA Version One is limited 1994 to the adoption of conventions that make use of existing XDR (defined 1995 in this document) and allowed abstract RDMA operations. Because no 1996 mechanism for detecting optional features exists in RPC-over-RDMA 1997 Version One, implementations must rely on Upper Layer Protocols to 1998 communicate the existence of such extensions. 2000 Such extensions must be specified in a Standards Track document with 2001 appropriate review by the nfsv4 Working Group and the IESG. An 2002 example of a conventional extension to RPC-over-RDMA Version One can 2003 be found in [I-D.ietf-nfsv4-rpcrdma-bidirection]. 2005 9. 
Security Considerations 2007 9.1. Memory Protection 2009 A primary consideration is the protection of the integrity and 2010 privacy of local memory by an RPC-over-RDMA transport. The use of 2011 RPC-over-RDMA MUST NOT introduce any vulnerabilities to system memory 2012 contents, nor to memory owned by user processes. 2014 It is REQUIRED that any RDMA provider used for RPC transport be 2015 conformant to the requirements of [RFC5042] in order to satisfy these 2016 protections. These protections are provided by the RDMA layer 2017 specifications, and in particular, their security models. 2019 9.1.1. Protection Domains 2021 The use of Protection Domains to limit the exposure of memory 2022 segments to a single connection is critical. Any attempt by an 2023 endpoint not participating in that connection to re-use memory 2024 handles needs to result in immediate failure of that connection. 2025 Because Upper Layer Protocol security mechanisms rely on this aspect 2026 of Reliable Connection behavior, strong authentication of remote 2027 endpoints is recommended. 2029 9.1.2. Handle Predictability 2031 Unpredictable memory handles should be used for any operation 2032 requiring advertised memory segments. Advertising a continuously 2033 registered memory region allows a remote host to read or write to 2034 that region even when an RPC involving that memory is not under way. 2035 Therefore implementations should avoid advertising persistently 2036 registered memory. 2038 9.1.3. Memory Fencing 2040 Requesters should register memory segments for remote access only 2041 when they are about to be the target of an RPC operation that 2042 involves an RDMA Read or Write. 2044 Registered memory segments should be invalidated as soon as related 2045 RPC operations are complete. 
Invalidation and DMA unmapping of RDMA 2046 segments should be complete before message integrity checking is 2047 done, and before the RPC consumer is allowed to continue execution 2048 and use or alter the contents of a memory region. 2050 An RPC transaction on a requester might be terminated before a reply 2051 arrives if the RPC consumer exits unexpectedly (for example it is 2052 signaled or a segmentation fault occurs). When an RPC terminates 2053 abnormally, memory segments associated with that RPC should be 2054 invalidated appropriately before the segments are released to be 2055 reused for other purposes on the requester. 2057 9.2. RPC Message Security 2059 ONC RPC provides cryptographic security via the RPCSEC_GSS framework 2060 [I-D.ietf-nfsv4-rpcsec-gssv3]. RPCSEC_GSS implements message 2061 authentication, per-message integrity checking, and per-message 2062 confidentiality. However, integrity and privacy services require 2063 significant movement of data on each endpoint host. Some performance 2064 benefits enabled by RDMA transports can be lost. 2066 9.2.1. RPC-Over-RDMA Protection At Lower Layers 2068 Note that performance loss is expected when RPCSEC_GSS integrity or 2069 privacy is in use on any RPC transport. Protection below the RDMA 2070 layer is a more appropriate security mechanism for RDMA transports in 2071 performance-sensitive deployments. Certain configurations of IPsec 2072 can be co-located in RDMA hardware, for example, without any change 2073 to RDMA consumers or loss of data movement efficiency. 2075 The use of protection in a lower layer MAY be negotiated through the 2076 use of an RPCSEC_GSS security flavor defined in 2077 [I-D.ietf-nfsv4-rpcsec-gssv3] in conjunction with the Channel Binding 2078 mechanism [RFC5056] and IPsec Channel Connection Latching [RFC5660]. 2079 Use of such mechanisms is REQUIRED where integrity and/or privacy is 2080 desired and where efficiency is required. 2082 9.2.2. 
RPCSEC_GSS On RPC-Over-RDMA Transports 2084 Not all RDMA devices and fabrics support the above protection 2085 mechanisms. Also, per-message authentication is still required on 2086 NFS clients where multiple users access NFS files. In these cases, 2087 RPCSEC_GSS can protect NFS traffic conveyed on RPC-over-RDMA 2088 connections. 2090 RPCSEC_GSS extends the ONC RPC protocol [RFC5531] without changing 2091 the format of RPC messages. By observing the conventions described 2092 in this section, an RPC-over-RDMA transport can convey RPCSEC_GSS- 2093 protected RPC messages interoperably. 2095 As part of the ONC RPC protocol, protocol elements of RPCSEC_GSS that 2096 appear in the Payload stream of an RPC-over-RDMA message (such as 2097 control messages exchanged as part of establishing or destroying a 2098 security context, or data items that are part of RPCSEC_GSS 2099 authentication material) MUST NOT be reduced. 2101 9.2.2.1. RPCSEC_GSS Context Negotiation 2103 Some NFS client implementations use a separate connection to 2104 establish a GSS context for NFS operation. These clients use TCP and 2105 the standard NFS port (2049) for context establishment. However, 2106 there is no guarantee that an NFS/RDMA server provides a TCP-based 2107 NFS server on port 2049. 2109 9.2.2.2. RPC-Over-RDMA With RPCSEC_GSS Authentication 2111 The RPCSEC_GSS authentication service has no impact on the DDP- 2112 eligibility of data items in an Upper Layer Protocol. 2114 However, RPCSEC_GSS authentication material appearing in an RPC 2115 message header can be larger than, say, an AUTH_SYS authenticator. 2116 In particular, when an RPCSEC_GSS pseudoflavor is in use, a requester 2117 needs to accommodate a larger RPC credential when marshaling Call 2118 messages, and to provide for a maximum size RPCSEC_GSS verifier when 2119 allocating reply buffers and Reply chunks. 2121 RPC messages, and thus Payload streams, are made larger as a result.
Upper Layer Protocol operations that fit in a Short Message when a
simpler form of authentication is in use might need to be reduced, or
conveyed via a Long Message, when RPCSEC_GSS authentication is in
use. It is more likely that a requester provides both a Read list
and a Reply chunk in the same RPC-over-RDMA header to convey a Long
call and provision a receptacle for a Long reply. More frequent use
of Long Messages can impact transport efficiency.

9.2.2.3. RPC-Over-RDMA With RPCSEC_GSS Integrity Or Privacy

The RPCSEC_GSS integrity service enables endpoints to detect
modification of RPC messages in flight. The RPCSEC_GSS privacy
service prevents all but the intended recipient from viewing the
cleartext content of RPC arguments and results. RPCSEC_GSS integrity
and privacy are end-to-end. They protect RPC arguments and results
from application to server endpoint, and back.

The RPCSEC_GSS integrity and encryption services operate on whole RPC
messages after they have been XDR encoded for transmit, and before
they have been XDR decoded after receipt. Both sender and receiver
endpoints use intermediate buffers to prevent exposure of encrypted
data or unverified cleartext data to RPC consumers. After
verification, encryption, and message wrapping have been performed,
the transport layer MAY use RDMA data transfer between these
intermediate buffers.

The process of reducing a DDP-eligible data item removes the data
item and its XDR padding from the encoded XDR stream. XDR padding of
a reduced data item is not transferred in an RPC-over-RDMA message.
After reduction, the Payload stream contains fewer octets than the
whole XDR stream did beforehand. XDR padding octets are often zero
bytes, but they do not have to be. Thus, reducing DDP-eligible items
affects the result of message integrity verification or encryption.
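
The effect of reduction on octet counts and on an integrity checksum
can be illustrated with a short Python sketch. It is not part of this
protocol and makes several assumptions for illustration only: SHA-256
stands in for a GSS message integrity code, the prefix and suffix
fields are hypothetical, and the length word of a reduced opaque item
is assumed to remain in the Payload stream.

```python
import hashlib
import struct


def xdr_pad_len(n: int) -> int:
    """Pad octets XDR adds so an n-octet item ends on a 4-octet boundary."""
    return (4 - n % 4) % 4


def encode_opaque(data: bytes, pad: bytes = b"\x00") -> bytes:
    """XDR variable-length opaque: 4-octet length, data, then padding."""
    return struct.pack(">I", len(data)) + data + pad * xdr_pad_len(len(data))


def reduce_opaque(data: bytes) -> bytes:
    """What stays in the Payload stream after reduction (assumption: only
    the length word; the data and its padding leave the stream)."""
    return struct.pack(">I", len(data))


def digest(stream: bytes) -> str:
    """SHA-256 as a stand-in for a per-message integrity checksum."""
    return hashlib.sha256(stream).hexdigest()


prefix = struct.pack(">I", 7)   # hypothetical XDR field before the item
suffix = struct.pack(">I", 1)   # hypothetical XDR field after the item
item = b"hello"                 # 5 octets, so 3 pad octets follow it

whole = prefix + encode_opaque(item) + suffix
reduced = prefix + reduce_opaque(item) + suffix
nonzero_pad = prefix + encode_opaque(item, pad=b"\xff") + suffix

print(len(whole), len(reduced))           # octet counts differ
print(digest(whole) == digest(reduced))   # checksums differ
print(digest(whole) == digest(nonzero_pad))
```

The last comparison shows why the value of the padding matters: two
encodings of the same data item yield different checksums when their
pad octets differ, so a receiver cannot reconstruct the checksummed
stream from the data item alone.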

Therefore, a sender MUST NOT reduce a Payload stream when RPCSEC_GSS
integrity or encryption services are in use. Effectively, no data
item is DDP-eligible in this situation, and Chunked Messages cannot
be used. In this mode, an RPC-over-RDMA transport operates in the
same manner as a transport that does not support direct data
placement.

When RPCSEC_GSS integrity or privacy is in use, a requester provides
both a Read list and a Reply chunk in the same RPC-over-RDMA header
to convey a Long call and provision a receptacle for a Long reply.

9.2.2.4. Protecting RPC-Over-RDMA Transport Headers

Like the base fields in an ONC RPC message (XID, call direction, and
so on), the contents of an RPC-over-RDMA message's Transport stream
are not protected by RPCSEC_GSS. This exposes XIDs, connection
credit limits, and chunk lists (but not the content of the data items
they refer to) to malicious behavior, which could redirect data that
is transferred by the RPC-over-RDMA message, result in spurious
retransmits, or trigger connection loss.

In particular, if an attacker alters the information contained in the
chunk lists of an RPC-over-RDMA header, data contained in those
chunks can be redirected to other registered memory segments on
requesters. An attacker might alter the arguments of RDMA Read and
RDMA Write operations on the wire to similar effect. The use of
RPCSEC_GSS integrity or privacy services enables the requester to
detect whether such tampering has been done and reject the RPC
message.

Encryption at lower layers, as described in Section 9.2.1, protects
the content of the Transport stream. To address attacks on RDMA
protocols themselves, RDMA transport implementations should conform
to [RFC5042].

10. IANA Considerations

Three assignments are specified by this document.
These are unchanged from [RFC5666]:

   o  A set of RPC "netids" for resolving RPC-over-RDMA services

   o  Optional service port assignments for Upper Layer Bindings

   o  An RPC program number assignment for the configuration protocol

These assignments have been established, as below.

The new RPC transport has been assigned an RPC "netid", which is an
rpcbind [RFC1833] string used to describe the underlying protocol in
order for RPC to select the appropriate transport framing, as well as
the format of the service addresses and ports.

The following "Netid" registry strings are defined for this purpose:

   NC_RDMA   "rdma"
   NC_RDMA6  "rdma6"

These netids MAY be used for any RDMA network satisfying the
requirements of Section 3.2.2, and able to identify service endpoints
using IP port addressing, possibly through use of a translation
service as described above in Section 6. The "rdma" netid is to be
used when IPv4 addressing is employed by the underlying transport,
and "rdma6" for IPv6 addressing.

The netid assignment policy and registry are defined in [RFC5665].

As a new RPC transport, this protocol has no effect on RPC Program
numbers or existing registered port numbers. However, new port
numbers MAY be registered for use by RPC-over-RDMA-enabled services,
as appropriate to the new networks over which the services will
operate.

For example, the NFS/RDMA service defined in [RFC5667] has been
assigned the port 20049, in the IANA registry:

   nfsrdma 20049/tcp   Network File System (NFS) over RDMA
   nfsrdma 20049/udp   Network File System (NFS) over RDMA
   nfsrdma 20049/sctp  Network File System (NFS) over RDMA

The RPC program number assignment policy and registry are defined in
[RFC5531].

11. Acknowledgments

The editor gratefully acknowledges the work of Brent Callaghan and
Tom Talpey on the original RPC-over-RDMA Version One specification
[RFC5666].

Dave Noveck provided excellent review, constructive suggestions, and
consistent navigational guidance throughout the process of drafting
this document. Dave also contributed much of the organization and
content of Section 8 and helped the authors understand the
complexities of XDR extensibility.

The comments and contributions of Karen Deitke, Dai Ngo, Chunli
Zhang, Dominique Martinet, and Mahesh Siddheshwar are accepted with
great thanks. The editor also wishes to thank Bill Baker, Greg
Marsden, and Matt Benjamin for their support of this work.

The extract.sh shell script and formatting conventions were first
described by the authors of the NFSv4.1 XDR specification [RFC5662].

Special thanks go to nfsv4 Working Group Chair Spencer Shepler and
nfsv4 Working Group Secretary Thomas Haynes for their support.

12. References

12.1. Normative References

[I-D.ietf-nfsv4-rpcrdma-bidirection]
           Lever, C., "Size-Limited Bi-directional Remote Procedure
           Call On Remote Direct Memory Access Transports",
           draft-ietf-nfsv4-rpcrdma-bidirection-01 (work in progress),
           September 2015.

[I-D.ietf-nfsv4-rpcsec-gssv3]
           Adamson, A. and N. Williams, "Remote Procedure Call (RPC)
           Security Version 3", draft-ietf-nfsv4-rpcsec-gssv3-17
           (work in progress), January 2016.

[RFC1833]  Srinivasan, R., "Binding Protocols for ONC RPC Version 2",
           RFC 1833, DOI 10.17487/RFC1833, August 1995.

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119,
           DOI 10.17487/RFC2119, March 1997.

[RFC4506]  Eisler, M., Ed., "XDR: External Data Representation
           Standard", STD 67, RFC 4506, DOI 10.17487/RFC4506, May
           2006.

[RFC5042]  Pinkerton, J. and E. Deleganes, "Direct Data Placement
           Protocol (DDP) / Remote Direct Memory Access Protocol
           (RDMAP) Security", RFC 5042, DOI 10.17487/RFC5042, October
           2007.

[RFC5056]  Williams, N., "On the Use of Channel Bindings to Secure
           Channels", RFC 5056, DOI 10.17487/RFC5056, November 2007.

[RFC5531]  Thurlow, R., "RPC: Remote Procedure Call Protocol
           Specification Version 2", RFC 5531, DOI 10.17487/RFC5531,
           May 2009.

[RFC5660]  Williams, N., "IPsec Channels: Connection Latching",
           RFC 5660, DOI 10.17487/RFC5660, October 2009.

[RFC5665]  Eisler, M., "IANA Considerations for Remote Procedure Call
           (RPC) Network Identifiers and Universal Address Formats",
           RFC 5665, DOI 10.17487/RFC5665, January 2010.

12.2. Informative References

[IB]       InfiniBand Trade Association, "InfiniBand Architecture
           Specifications".

[IBPORT]   InfiniBand Trade Association, "IP Addressing Annex".

[RFC0768]  Postel, J., "User Datagram Protocol", STD 6, RFC 768,
           DOI 10.17487/RFC0768, August 1980.

[RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
           RFC 793, DOI 10.17487/RFC0793, September 1981.

[RFC1094]  Nowicki, B., "NFS: Network File System Protocol
           specification", RFC 1094, DOI 10.17487/RFC1094, March
           1989.

[RFC1813]  Callaghan, B., Pawlowski, B., and P. Staubach, "NFS
           Version 3 Protocol Specification", RFC 1813,
           DOI 10.17487/RFC1813, June 1995.

[RFC5040]  Recio, R., Metzler, B., Culley, P., Hilland, J., and D.
           Garcia, "A Remote Direct Memory Access Protocol
           Specification", RFC 5040, DOI 10.17487/RFC5040, October
           2007.

[RFC5041]  Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct
           Data Placement over Reliable Transports", RFC 5041,
           DOI 10.17487/RFC5041, October 2007.

[RFC5532]  Talpey, T. and C. Juszczak, "Network File System (NFS)
           Remote Direct Memory Access (RDMA) Problem Statement",
           RFC 5532, DOI 10.17487/RFC5532, May 2009.

[RFC5661]  Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
           "Network File System (NFS) Version 4 Minor Version 1
           Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010.

[RFC5662]  Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
           "Network File System (NFS) Version 4 Minor Version 1
           External Data Representation Standard (XDR) Description",
           RFC 5662, DOI 10.17487/RFC5662, January 2010.

[RFC5666]  Talpey, T. and B. Callaghan, "Remote Direct Memory Access
           Transport for Remote Procedure Call", RFC 5666,
           DOI 10.17487/RFC5666, January 2010.

[RFC5667]  Talpey, T. and B. Callaghan, "Network File System (NFS)
           Direct Data Placement", RFC 5667, DOI 10.17487/RFC5667,
           January 2010.

[RFC7530]  Haynes, T., Ed. and D. Noveck, Ed., "Network File System
           (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530,
           March 2015.

Authors' Addresses

   Charles Lever (editor)
   Oracle Corporation
   1015 Granger Avenue
   Ann Arbor, MI 48104
   USA

   Phone: +1 734 274 2396
   Email: chuck.lever@oracle.com

   William Allen Simpson
   DayDreamer
   1384 Fontaine
   Madison Heights, MI 48071
   USA

   Email: william.allen.simpson@gmail.com

   Tom Talpey
   Microsoft Corp.
   One Microsoft Way
   Redmond, WA 98052
   USA

   Phone: +1 425 704-9945
   Email: ttalpey@microsoft.com