idnits 2.17.1

draft-ietf-nfsv4-rpcrdma-bidirection-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (September 25, 2015) is 3107 days in the past. Is this
     intentional?

  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  ** Obsolete normative reference: RFC 5661 (Obsoleted by RFC 8881)

  ** Obsolete normative reference: RFC 5666 (Obsoleted by RFC 8166)

     Summary: 2 errors (**), 0 flaws (~~), 1 warning (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

NFSv4                                                           C. Lever
Internet-Draft                                                    Oracle
Intended status: Experimental                         September 25, 2015
Expires: March 28, 2016

   Size-Limited Bi-directional Remote Procedure Call On Remote Direct
                        Memory Access Transports
                draft-ietf-nfsv4-rpcrdma-bidirection-01

Abstract

   Recent minor versions of NFSv4 work best when ONC RPC transports can
   send ONC RPC transactions in both directions. This document
   describes conventions that enable RPC-over-RDMA Version One transport
   endpoints to interoperate when operation in both directions is
   necessary.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 28, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Requirements Language
     1.2. Scope Of This Document
     1.3. Understanding RPC Direction
       1.3.1. Forward Direction
       1.3.2. Backward Direction
       1.3.3. Bi-direction
       1.3.4. XID Values
     1.4. Rationale For RPC-over-RDMA Bi-Direction
       1.4.1. NFSv4.0 Callback Operation
       1.4.2. NFSv4.1 Callback Operation
     1.5. Design Considerations
       1.5.1. Backward Compatibility
       1.5.2. Performance Impact
       1.5.3. Server Memory Security
       1.5.4. Payload Size
   2. Conventions For Backward Operation
     2.1. Flow Control
       2.1.1. Forward Credits
       2.1.2. Backward Credits
     2.2. Managing Receive Buffers
       2.2.1. Client Receive Buffers
       2.2.2. Server Receive Buffers
       2.2.3. In the Absence of Backward Direction Support
     2.3. Backward Direction Retransmission
     2.4. Backward Direction Message Size
     2.5. Sending A Backward Direction Call
     2.6. Sending A Backward Direction Reply
   3. Limits To This Approach
     3.1. Payload Size
     3.2. Preparedness To Handle Backward Requests
     3.3. Long Term
   4. Security Considerations
   5. IANA Considerations
   6. Acknowledgements
   7. Normative References
   Author's Address

1. Introduction

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

1.2. Scope Of This Document

   This document describes a set of experimental conventions that apply
   to RPC-over-RDMA Version One, specified in [RFC5666]. When observed,
   these conventions enable RPC-over-RDMA Version One endpoints to
   concurrently handle RPC transactions that flow from client to server
   and from server to client.

   These conventions can be observed when using the existing
   RPC-over-RDMA Version One protocol definition. Therefore, this
   document does not update [RFC5666].

   The purpose of this document is to permit interoperable prototype
   implementations of bi-directional RPC-over-RDMA, enabling the use of
   NFSv4.1, and in particular pNFS, on RDMA transports.

   Providing an Upper Layer Binding for NFSv4.x callback operations is
   outside the scope of this document.

1.3. Understanding RPC Direction

   The ONC RPC protocol as described in [RFC5531] is fundamentally a
   message-passing protocol between one server and one or more clients.
   ONC RPC transactions are made up of two types of messages.

   A CALL message, or "Call", requests work. A Call is designated by
   the value CALL in the message's msg_type field. An arbitrary unique
   value is placed in the message's xid field. A host that originates a
   Call is referred to in this document as a "Caller."

   A REPLY message, or "Reply", reports the results of work requested by
   a Call. A Reply is designated by the value REPLY in the message's
   msg_type field. The value contained in the message's xid field is
   copied from the Call whose results are being reported. A host that
   emits a Reply is referred to as a "Responder."

   RPC-over-RDMA is a connection-oriented RPC transport.
   When a connection-oriented transport is used, ONC RPC client
   endpoints are responsible for initiating transport connections, while
   ONC RPC service endpoints wait passively for incoming connection
   requests.

   We do not consider RPC direction on connectionless RPC transports in
   this document.

1.3.1. Forward Direction

   A traditional ONC RPC client is always a Caller. A traditional ONC
   RPC service is always a Responder. This traditional form of ONC RPC
   message passing is referred to as operation in the "forward
   direction."

   During forward direction operation, the ONC RPC client is responsible
   for establishing transport connections.

1.3.2. Backward Direction

   The ONC RPC standard does not forbid passing messages in the other
   direction. An ONC RPC service endpoint can act as a Caller, in which
   case an ONC RPC client endpoint acts as a Responder. This form of
   message passing is referred to as operation in the "backward
   direction."

   During backward direction operation, the ONC RPC client is
   responsible for establishing transport connections, even though ONC
   RPC Calls come from the ONC RPC server.

   ONC RPC clients and services are optimized to perform and scale well
   while handling traffic in the forward direction, and may not be
   prepared to handle operation in the backward direction. Not until
   recently has there been a need to handle backward direction
   operation.

1.3.3. Bi-direction

   A pair of endpoints may choose to use only forward or only backward
   direction operations on a particular transport. Or, the endpoints
   may send operations in both directions concurrently on the same
   transport.

   Bi-directional operation occurs when both transport endpoints act as
   a Caller and a Responder at the same time. As above, the ONC RPC
   client is responsible for establishing transport connections.

1.3.4. XID Values

   Section 9 of [RFC5531] introduces the ONC RPC transaction identifier,
   or "xid" for short. The value of an xid is interpreted in the
   context of the message's msg_type field.

   o  The xid of a Call is arbitrary but is unique among outstanding
      Calls from that Caller.

   o  The xid of a Reply always matches that of the initiating Call.

   When receiving a Reply, a Caller matches the xid value in the Reply
   with a Call it previously sent.

1.3.4.1. XIDs with Bi-direction

   During bi-directional operation, the forward and backward directions
   use independent xid spaces.

   In other words, a forward direction Caller MAY use the same xid value
   at the same time as a backward direction Caller on the same transport
   connection. Though such concurrent requests use the same xid value,
   they represent distinct ONC RPC transactions.

1.4. Rationale For RPC-over-RDMA Bi-Direction

1.4.1. NFSv4.0 Callback Operation

   An NFSv4.0 client employs a traditional ONC RPC client to send NFS
   requests to an NFSv4.0 server's traditional ONC RPC service
   [RFC7530]. NFSv4.0 requests flow in the forward direction on a
   connection established by the client. This connection is referred to
   as a "forechannel" connection.

   NFSv4.0 introduces the use of callback operations, or "callbacks", in
   Section 10.2 of [RFC7530] for managing file delegation. An NFSv4.0
   server sets up a traditional ONC RPC client, and an NFSv4.0 client
   sets up a traditional ONC RPC service to handle callbacks. Callbacks
   flow in the forward direction on a connection established by an
   NFSv4.0 server. This connection is distinct from connections being
   used as forechannels. This connection is referred to as a
   "backchannel" connection.

   When an RDMA transport is used as a forechannel, an NFSv4.0 client
   typically provides a TCP callback service.
   The client's SETCLIENTID operation advertises the callback service
   endpoint with a "tcp" or "tcp6" netid. The server then connects to
   this service using a TCP socket.

   NFSv4.0 implementations are fully functional without a backchannel in
   place. In this case, the server does not grant file delegations.
   This might result in a negative performance effect, but functional
   correctness is unaffected.

1.4.2. NFSv4.1 Callback Operation

   NFSv4.1 supports file delegation in a similar fashion to NFSv4.0, and
   extends the repertoire of callbacks to manage pNFS layouts, as
   discussed in Chapter 12 of [RFC5661].

   For various reasons, NFSv4.1 requires that all transport connections
   be initiated by NFSv4.1 clients. Therefore, NFSv4.1 servers send
   callbacks to clients in the backward direction on connections
   established by NFSv4.1 clients.

   An NFSv4.1 client or server indicates to its peer that a backchannel
   capability is available on a given transport by sending a
   CREATE_SESSION or BIND_CONN_TO_SESSION operation.

   NFSv4.1 clients may establish distinct transport connections for
   forechannel and backchannel operation, or they may combine
   forechannel and backchannel operation on one transport connection
   using bi-directional operation.

   Without a backward direction RPC-over-RDMA capability, an NFSv4.1
   client must additionally connect using a transport with backward
   direction capability to use as a backchannel. TCP is the only choice
   at present for an NFSv4.1 backchannel connection.

   Some implementations find it more convenient to use a single combined
   transport (i.e., a transport that is capable of bi-directional
   operation). This simplifies connection establishment and recovery
   during network partitions, or when one endpoint restarts.

   As with NFSv4.0, if a backchannel is not in use, an NFSv4.1 server
   does not grant delegations.
   But because of its reliance on callbacks to manage pNFS layout state,
   pNFS operation is not possible without a backchannel.

1.5. Design Considerations

   As of this writing, the only use case for backward direction ONC RPC
   messages is the NFSv4.1 backchannel. The conventions described in
   this document take advantage of certain characteristics of NFSv4.1
   callbacks, namely:

   o  NFSv4.1 callbacks typically bear small arguments and results

   o  NFSv4.1 callback arguments and results are insensitive to
      alignment relative to system pages

   o  NFSv4.1 callbacks are infrequent relative to forechannel
      operations

1.5.1. Backward Compatibility

   Existing clients that implement RPC-over-RDMA Version One should
   interoperate correctly with servers that implement RPC-over-RDMA with
   backward direction support, and vice versa.

   The approach taken here avoids altering the RPC-over-RDMA Version One
   XDR specification. Keeping the XDR the same enables existing
   RPC-over-RDMA Version One implementations to interoperate with
   implementations that support operation in the backward direction.

1.5.2. Performance Impact

   Support for operation in the backward direction should never impact
   the performance or scalability of forward direction operation, where
   the bulk of ONC RPC transport activity typically occurs.

1.5.3. Server Memory Security

   RDMA transfers involve one endpoint exposing a section of its memory
   to the other endpoint, which then drives RDMA Read and Write
   operations to access or modify the exposed memory. RPC-over-RDMA
   client endpoints expose their memory, and RPC-over-RDMA server
   endpoints initiate RDMA data transfer operations.

   If RDMA transfers are not used for backward direction operations,
   there is no need for servers to expose their memory to clients.
   Further, this avoids the client complexity required to drive RDMA
   transfers.

1.5.4. Payload Size

   Small RPC-over-RDMA messages are conveyed using only RDMA Send
   operations. Send is used to transmit both ONC RPC Calls and replies.

   To send a large payload, an RPC-over-RDMA client endpoint registers a
   region of memory known as a chunk and transmits its coordinates to an
   RPC-over-RDMA server endpoint, which uses an RDMA transfer to move
   data to or from the client. See Sections 3.1, 3.2, and 3.4 of
   [RFC5666].

   To transmit RPC-over-RDMA messages larger than the receive buffer
   size (typically 1024 bytes), a chunk must be used. For example, in
   an RDMA_NOMSG type message, the entire RPC header and Upper Layer
   payload are contained in one or more chunks. See Section 5.1 of
   [RFC5666] for further details.

   If chunks are not allowed to be used for conveying backward direction
   messages, an RDMA_NOMSG type message cannot be used to convey a
   backward direction message using the conventions described in this
   document. Therefore, backward direction messages sent using the
   conventions in this document can be no larger than a single receive
   buffer.

   Stipulating such a limit on backward direction message size assumes
   that either Upper Layer Protocol consumers of backward direction
   messages can advertise this limit to peers, or that ULP consumers can
   agree by convention on a maximum size of their backchannel payloads.

   In addition, using only inline forms of RPC-over-RDMA messages and
   never populating the RPC-over-RDMA chunk lists means that the RPC
   header's msg_type field is always at a fixed location in messages
   flowing in the backward direction, allowing efficient detection of
   the direction of an RPC-over-RDMA message.

   With few exceptions, NFSv4.1 servers can break down callback requests
   so they fit within this limit. There are potentially large NFSv4.1
   callback operations, such as a CB_GETATTR operation where a large ACL
   must be conveyed.
   Although we are not aware of any NFSv4.1 implementation that uses
   CB_GETATTR, this state of affairs is not guaranteed in perpetuity.

2. Conventions For Backward Operation

   Performing backward direction ONC RPC operations over an
   RPC-over-RDMA transport can be accomplished within limits by
   observing the conventions described in the following subsections.
   For reference, the XDR description of RPC-over-RDMA Version One is
   contained in Section 4.3 of [RFC5666].

2.1. Flow Control

   For an RDMA Send operation to work, the receiving consumer must have
   posted an RDMA Receive Work Request to provide a receive buffer in
   which to capture the incoming message. If a receiver hasn't posted
   enough Receive WRs to catch incoming Send operations, the RDMA
   provider is allowed to drop the RDMA connection.

   The RPC-over-RDMA Version One protocol provides built-in send flow
   control to prevent overrunning the number of pre-posted receive
   buffers on a connection's receive endpoint. This is fully discussed
   in Section 3.3 of [RFC5666].

2.1.1. Forward Credits

   An RPC-over-RDMA credit is the capability to handle one RPC-over-RDMA
   transaction. Each forward direction RPC-over-RDMA Call requests a
   number of credits from the Responder. Each forward direction Reply
   informs the Caller how many credits the Responder is prepared to
   handle in total. The values of the request and grant are carried in
   each RPC-over-RDMA message's rdma_credit field.

   Practically speaking, the critical value is the value of the
   rdma_credit field in RPC-over-RDMA replies. When a Caller is
   operating correctly, it sends no more outstanding requests at a time
   than the Responder's advertised forward direction credit value.

   The credit value is a guaranteed minimum. However, a receiver can
   post more receive buffers than its credit value.
   There is no requirement in the RPC-over-RDMA protocol for a receiver
   to indicate a credit overrun. Operation continues as long as there
   are enough receive buffers to handle incoming messages.

2.1.2. Backward Credits

   Credits work the same way in the backward direction as they do in the
   forward direction. However, forward direction credits and backward
   direction credits are accounted separately.

   In other words, the forward direction credit value is the same
   whether or not there are backward direction resources associated with
   an RPC-over-RDMA transport connection. The backward direction credit
   value MAY be different than the forward direction credit value. The
   rdma_credit field in a backward direction RPC-over-RDMA message MUST
   NOT contain the value zero.

   A backward direction Caller (an RPC-over-RDMA service endpoint)
   requests credits from the Responder (an RPC-over-RDMA client
   endpoint). The Responder reports how many credits it can grant.
   This is the number of backward direction Calls the Responder is
   prepared to handle at once.

   When an RPC-over-RDMA server endpoint is operating correctly, it
   sends no more outstanding requests at a time than the client
   endpoint's advertised backward direction credit value.

2.2. Managing Receive Buffers

   An RPC-over-RDMA transport endpoint must pre-post receive buffers
   before it can receive and process incoming RPC-over-RDMA messages.
   If a sender transmits a message for a receiver which has no prepared
   receive buffer, the RDMA provider is allowed to drop the RDMA
   connection.

2.2.1. Client Receive Buffers

   Typically an RPC-over-RDMA caller posts only as many receive buffers
   as there are outstanding RPC Calls. A client endpoint without
   backward direction support might therefore at times have no
   pre-posted receive buffers.
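
   The separate credit accounting described in Section 2.1 can be
   illustrated with a minimal Python sketch. The class name
   CreditWindow, its method names, and the starting grant of one credit
   are assumptions made for illustration only; none of them is defined
   by this document or by [RFC5666]. The sketch also rejects a zero
   credit value in both windows for simplicity, whereas this document
   states that rule only for backward direction messages.

```python
from dataclasses import dataclass


@dataclass
class CreditWindow:
    """Caller-side view of the credits for one direction (hypothetical helper)."""

    granted: int = 1      # assumed starting grant before the first Reply arrives
    outstanding: int = 0  # Calls sent whose Replies have not yet arrived

    def can_send(self) -> bool:
        # A correctly operating Caller never has more outstanding Calls
        # than the Responder's advertised credit value.
        return self.outstanding < self.granted

    def on_call_sent(self) -> None:
        if not self.can_send():
            raise RuntimeError("credit limit reached; the Call must wait")
        self.outstanding += 1

    def on_reply(self, rdma_credit: int) -> None:
        # Each Reply carries the Responder's current grant in rdma_credit.
        if rdma_credit == 0:
            raise ValueError("rdma_credit must not be zero")
        if self.outstanding == 0:
            raise RuntimeError("Reply received with no outstanding Call")
        self.outstanding -= 1
        self.granted = rdma_credit


# One connection carries two independent windows: the client is the Caller
# for the forward window, and the server is the Caller for the backward one.
forward = CreditWindow()
backward = CreditWindow()
```

   Keeping two independent objects mirrors the statement that the
   forward direction credit value is unaffected by the presence of
   backward direction resources.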

   To receive incoming backward direction Calls, an RPC-over-RDMA client
   endpoint must pre-post enough additional receive buffers to match its
   advertised backward direction credit value. Each outstanding forward
   direction RPC requires an additional receive buffer above this
   minimum.

   When an RDMA transport connection is lost, all active receive buffers
   are flushed and are no longer available to receive incoming messages.
   When a fresh transport connection is established, a client endpoint
   must re-post a receive buffer to handle the Reply for each
   retransmitted forward direction Call, and a full set of receive
   buffers to handle backward direction Calls.

2.2.2. Server Receive Buffers

   A forward direction RPC-over-RDMA service endpoint posts as many
   receive buffers as it expects incoming forward direction Calls. That
   is, it posts no fewer buffers than the number of RPC-over-RDMA
   credits it advertises in the rdma_credit field of forward direction
   RPC replies.

   To receive incoming backward direction replies, an RPC-over-RDMA
   server endpoint must pre-post a receive buffer for each backward
   direction Call it sends.

   When the existing transport connection is lost, all active receive
   buffers are flushed and are no longer available to receive incoming
   messages. When a fresh transport connection is established, a server
   endpoint must re-post a receive buffer to handle the Reply for each
   retransmitted backward direction Call, and a full set of receive
   buffers for receiving forward direction Calls.

2.2.3. In the Absence of Backward Direction Support

   An RPC-over-RDMA transport endpoint might not support backward
   direction operation. There might be no mechanism in the transport
   implementation to do so. Or the Upper Layer Protocol consumer might
   not yet have configured the transport to handle backward direction
   traffic.
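
   The receive buffer requirements stated in Sections 2.2.1 and 2.2.2
   reduce to simple arithmetic, sketched here in Python. The function
   and parameter names are hypothetical and chosen for this
   illustration; the results are minimums, since a receiver is free to
   post more buffers than it strictly needs.

```python
def client_receive_buffers(backward_credits: int,
                           forward_calls_outstanding: int) -> int:
    """Minimum receive buffers a client endpoint keeps posted.

    One buffer per advertised backward direction credit, plus one for the
    Reply to each outstanding forward direction Call.
    """
    return backward_credits + forward_calls_outstanding


def server_receive_buffers(forward_credits: int,
                           backward_calls_outstanding: int) -> int:
    """Minimum receive buffers a server endpoint keeps posted.

    One buffer per advertised forward direction credit, plus one for the
    Reply to each backward direction Call it has sent.
    """
    return forward_credits + backward_calls_outstanding


# After a reconnect every buffer has been flushed, so the same totals must
# be re-posted, counting each retransmitted Call as outstanding.
client_total = client_receive_buffers(backward_credits=2,
                                      forward_calls_outstanding=5)
server_total = server_receive_buffers(forward_credits=32,
                                      backward_calls_outstanding=1)
```

   The credit values used in the example (2 and 32) are arbitrary and
   do not come from this document.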

   A loss of the RDMA connection may result if the receiver is not
   prepared to receive an incoming message. Thus a denial-of-service
   could result if a sender continues to send backchannel messages after
   every transport reconnect to an endpoint that is not prepared to
   receive them.

   Generally, for RPC-over-RDMA Version One transports, the Upper Layer
   Protocol consumer is responsible for informing its peer when it has
   support for the backward direction. Otherwise even a simple backward
   direction NULL probe from a peer would result in a lost connection.

   An NFSv4.1 server should never send backchannel messages to an
   NFSv4.1 client before the NFSv4.1 client has sent a CREATE_SESSION or
   a BIND_CONN_TO_SESSION operation. As long as an NFSv4.1 client has
   prepared appropriate backchannel resources before sending one of
   these operations, denial-of-service is avoided. Legacy versions of
   NFS should never send backchannel operations.

   Therefore, an Upper Layer Protocol consumer MUST NOT perform backward
   direction ONC RPC operations unless the peer consumer has indicated
   it is prepared to handle them. A description of Upper Layer Protocol
   mechanisms used for this indication is outside the scope of this
   document.

2.3. Backward Direction Retransmission

   In rare cases, an ONC RPC transaction cannot be completed within a
   certain time. This can be because the transport connection was lost,
   the Call or Reply message was dropped, or because the Upper Layer
   consumer delayed or dropped the ONC RPC request. Typically, the
   Caller sends the transaction again, reusing the same RPC XID. This
   is known as an "RPC retransmission".

   In the forward direction, the Caller is the ONC RPC client. The
   client is always responsible for establishing a transport connection
   before sending again.

   In the backward direction, the Caller is the ONC RPC server.
   Because an ONC RPC server does not establish transport connections
   with clients, it cannot send a retransmission if there is no
   transport connection. It must wait for the ONC RPC client to
   re-establish the transport connection before it can retransmit ONC
   RPC transactions in the backward direction.

   If an ONC RPC client has no work to do, it may be some time before it
   re-establishes a transport connection. Backward direction Callers
   must be prepared to wait indefinitely for a connection to be
   established before a pending backward direction ONC RPC Call can be
   retransmitted.

2.4. Backward Direction Message Size

   RPC-over-RDMA backward direction messages are transmitted and
   received using the same buffers as messages in the forward direction.
   Therefore they are constrained to be no larger than receive buffers
   posted for forward messages. Typical implementations have chosen to
   use 1024-byte buffers.

   It is expected that the Upper Layer Protocol consumer establishes an
   appropriate payload size limit for backward direction operations,
   either by advertising that size limit to its peers, or by convention.
   If that is done, backward direction messages do not exceed the size
   of receive buffers at either endpoint.

   If a sender transmits a backward direction message that is larger
   than the receiver is prepared for, the RDMA provider drops the
   message and the RDMA connection.

   If a sender transmits an RDMA message that is too small to convey a
   complete and valid RPC-over-RDMA and RPC message in either direction,
   the receiver MUST NOT use any value in the fields that were
   transmitted. Namely, the rdma_credit field MUST be ignored, and the
   message dropped.

2.5. Sending A Backward Direction Call

   To form a backward direction RPC-over-RDMA Call message on an
   RPC-over-RDMA Version One transport, an ONC RPC service endpoint
   constructs an RPC-over-RDMA header containing a fresh RPC XID in the
   rdma_xid field (see Section 1.3.4 for full requirements).

   The rdma_vers field MUST contain the value one. The number of
   requested credits is placed in the rdma_credit field (see
   Section 2.1).

   The rdma_proc field in the RPC-over-RDMA header MUST contain the
   value RDMA_MSG. All three chunk lists MUST be empty.

   The ONC RPC Call header MUST follow immediately, starting with the
   same XID value that is present in the RPC-over-RDMA header. The Call
   header's msg_type field MUST contain the value CALL.

2.6. Sending A Backward Direction Reply

   To form a backward direction RPC-over-RDMA Reply message on an
   RPC-over-RDMA Version One transport, an ONC RPC client endpoint
   constructs an RPC-over-RDMA header containing a copy of the matching
   ONC RPC Call's RPC XID in the rdma_xid field (see Section 1.3.4 for
   full requirements).

   The rdma_vers field MUST contain the value one. The number of
   granted credits is placed in the rdma_credit field (see Section 2.1).

   The rdma_proc field in the RPC-over-RDMA header MUST contain the
   value RDMA_MSG. All three chunk lists MUST be empty.

   The ONC RPC Reply header MUST follow immediately, starting with the
   same XID value that is present in the RPC-over-RDMA header. The
   Reply header's msg_type field MUST contain the value REPLY.

3. Limits To This Approach

3.1. Payload Size

   The major drawback to the approach described in this document is the
   limit on payload size in backward direction requests.

   o  Some NFSv4.1 callback operations can have potentially large
      arguments or results. For example, CB_GETATTR on a file with a
      large ACL; or CB_NOTIFY, which can provide a large, complex
      argument.

   o  Any backward direction operation protected by RPCSEC_GSS may have
      additional header information that makes it difficult to send
      backward direction operations with large arguments or results.

   o  Larger payloads could potentially require the use of RDMA data
      transfers, which are complex and make it more difficult to detect
      backward direction requests. The msg_type field in the ONC RPC
      header would no longer be at a fixed location in backward
      direction requests.

3.2. Preparedness To Handle Backward Requests

   A second drawback is the exposure of the client transport endpoint to
   backward direction Calls before it has posted receive buffers to
   handle them.

   Clients that do not support backward direction operation typically
   drop messages they do not recognize. However, this does not allow
   bi-direction-capable servers to quickly identify clients that cannot
   handle backward direction requests.

   The conventions in this document rely on Upper Layer Protocol
   consumers to decide when backward direction transport operation is
   appropriate.

3.3. Long Term

   To address the limitations described in this section in the long run,
   a new version of the RPC-over-RDMA protocol would be required. The
   use of the conventions described in this document to enable backward
   direction operation is thus a transitional approach that is
   appropriate only while RPC-over-RDMA Version One is the predominantly
   deployed version of the RPC-over-RDMA protocol.

4. Security Considerations

   As a consequence of limiting the size of backward direction
   RPC-over-RDMA messages, the use of RPCSEC_GSS integrity and
   confidentiality services (see [RFC2203]) in the backward direction
   may be challenging due to the size of the additional RPC header
   information required for RPCSEC_GSS.

5. IANA Considerations

   This document does not require actions by IANA.

6. Acknowledgements

   Tom Talpey was an indispensable resource, in addition to creating the
   foundation upon which this work is based. Our warmest regards go to
   him for his help and support.

   Dave Noveck provided excellent review, constructive suggestions, and
   navigational guidance throughout the process of drafting this
   document.

   Dai Ngo was a solid partner and collaborator. Together we
   constructed and tested independent prototypes of the conventions
   described in this document.

   The author wishes to thank Bill Baker for his unwavering support of
   this work. In addition, the author gratefully acknowledges the
   expert contributions of Karen Deitke, Chunli Zhang, Mahesh
   Siddheshwar, Steve Wise, and Tom Tucker.

   Special thanks go to the nfsv4 Working Group chair Spencer Shepler
   and the WG Editor Tom Haynes for their support.

7. Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2203]  Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol
              Specification", RFC 2203, September 1997.

   [RFC5531]  Thurlow, R., "RPC: Remote Procedure Call Protocol
              Specification Version 2", RFC 5531, May 2009.

   [RFC5661]  Shepler, S., Eisler, M., and D. Noveck, "Network File
              System (NFS) Version 4 Minor Version 1 Protocol", RFC
              5661, January 2010.

   [RFC5666]  Talpey, T. and B. Callaghan, "Remote Direct Memory Access
              Transport for Remote Procedure Call", RFC 5666, January
              2010.

   [RFC7530]  Haynes, T. and D. Noveck, "Network File System (NFS)
              Version 4 Protocol", RFC 7530, March 2015.

Author's Address

   Charles Lever
   Oracle Corporation
   1015 Granger Avenue
   Ann Arbor, MI 48104
   US

   Phone: +1 734 274 2396
   Email: chuck.lever@oracle.com