Network File System Version 4                                   C. Lever
Internet-Draft                                                    Oracle
Intended status: Standards Track                             May 2, 2016
Expires: November 3, 2016

    Bi-directional Remote Procedure Call On RPC-over-RDMA Transports
                draft-ietf-nfsv4-rpcrdma-bidirection-03

Abstract

   Recent minor versions of NFSv4 work best when ONC RPC transports can
   send Remote Procedure Call transactions in both directions on the
   same connection.
   This document describes how RPC-over-RDMA transport
   endpoints convey RPCs in both directions on a single connection.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on November 3, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Understanding RPC Direction
     2.1.  Forward Direction
     2.2.  Backward Direction
     2.3.  Bi-directional Operation
     2.4.  XID Values
   3.  Rationale For Bi-Directional RPC-over-RDMA
     3.1.  NFSv4.0 Callback Operation
     3.2.  NFSv4.1 Callback Operation
   4.  Flow Control
     4.1.  Backward Credits
     4.2.  Managing Receive Buffers
   5.  Protocol For Backward Operation
     5.1.  Sending A Backward Direction Call
     5.2.  Sending A Backward Direction Reply
     5.3.  Backward Direction Chunks
     5.4.  Backward Direction Retransmission
   6.  In the Absence of Backward Direction Support
   7.  Backward Direction Upper Layer Binding
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgements
   11. Normative References
   Author's Address

1.  Introduction

   The purpose of this document is to enable bi-directional RPC
   operation on RPC-over-RDMA protocol versions that do not have
   specific protocol facilities for backward direction operation.
   Backward direction RPC transactions enable the operation of NFSv4.1,
   and in particular pNFS.

   For example, using the protocol described in this document, RPC
   transactions can be conveyed in both directions on the same
   RPC-over-RDMA Version One connection without changes to the Version
   One header XDR description. Therefore this document does not update
   [I-D.ietf-nfsv4-rfc5666bis].

   Providing an Upper Layer Binding for NFSv4.x callback operations is
   outside the scope of this document.

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Understanding RPC Direction

   The ONC RPC protocol as described in [RFC5531] is fundamentally a
   message-passing protocol between one server and one or more clients.
   ONC RPC transactions are made up of two types of messages.

   A CALL message, or "Call", requests work. A Call is designated by
   the value CALL in the message's msg_type field. An arbitrary unique
   value is placed in the message's xid field. A host that originates a
   Call is referred to in this document as a "Requester."

   A REPLY message, or "Reply", reports the results of work requested by
   a Call. A Reply is designated by the value REPLY in the message's
   msg_type field. The value contained in the message's xid field is
   copied from the Call whose results are being returned. A host that
   emits a Reply is referred to as a "Responder."

   Typically, a Call generates a corresponding Reply. A Reply is never
   sent without a corresponding Call.

   RPC-over-RDMA is a connection-oriented RPC transport. When a
   connection-oriented transport is used, ONC RPC client endpoints are
   responsible for initiating transport connections, while ONC RPC
   service endpoints wait passively for incoming connection requests.

   RPC direction on connectionless RPC transports is not considered in
   this document.

2.1.  Forward Direction

   A traditional ONC RPC client is always a Requester. A traditional
   ONC RPC service is always a Responder. This traditional form of ONC
   RPC message passing is referred to as operation in the "forward
   direction."

   During forward direction operation, the ONC RPC client is responsible
   for establishing transport connections.

2.2.  Backward Direction

   The ONC RPC specification [RFC5531] does not forbid passing messages
   in the other direction. An ONC RPC service endpoint can act as a
   Requester, in which case an ONC RPC client endpoint acts as a
   Responder. This form of message passing is referred to as operation
   in the "backward direction."

   During backward direction operation, the ONC RPC client is
   responsible for establishing transport connections, even though ONC
   RPC Calls come from the ONC RPC server.

   ONC RPC clients and services are optimized to perform and scale well
   while handling traffic in the forward direction, and may not be
   prepared to handle operation in the backward direction. Not until
   recently has there been a need to handle backward direction
   operation.

2.3.  Bi-directional Operation

   A pair of connected RPC endpoints may choose to use only forward or
   only backward direction operations on a particular transport. Or,
   these endpoints may send Calls in both directions concurrently on the
   same transport.

   "Bi-directional operation" occurs when both transport endpoints act
   as a Requester and a Responder at the same time. As above, the ONC
   RPC client is always responsible for establishing transport
   connections.

2.4.  XID Values

   Section 9 of [RFC5531] introduces the ONC RPC transaction identifier,
   or "xid" for short. The value of an xid is interpreted in the
   context of the message's msg_type field.

   o  The xid of a Call is arbitrary but is unique among outstanding
      Calls from that Requester.

   o  The xid of a Reply always matches that of the initiating Call.

   When receiving a Reply, a Requester matches the xid value in the
   Reply with a Call it previously sent.

2.4.1.  XID Generation

   During bi-directional operation, forward and backward direction XIDs
   are typically generated on distinct hosts by possibly different
   algorithms.
   There is no co-ordination between forward and backward
   direction XID generation.

   Therefore, a forward direction Requester MAY use the same xid value
   at the same time as a backward direction Requester on the same
   transport connection. Though such concurrent requests use the same
   xid value, they represent distinct ONC RPC transactions.

3.  Rationale For Bi-Directional RPC-over-RDMA

3.1.  NFSv4.0 Callback Operation

   An NFSv4.0 client employs a traditional ONC RPC client to send NFS
   requests to an NFSv4.0 server's traditional ONC RPC service
   [RFC7530]. NFSv4.0 requests flow in the forward direction on a
   connection established by the client. This connection is referred to
   as a "forechannel" connection.

   An NFSv4 "delegation" is simply a promise made by a server that it
   will notify a client before another agent is allowed access to a
   file. With this guarantee, that client can operate as sole accessor
   of the file. In particular, it can manage the file's data and
   metadata caches aggressively.

   To administer file delegations, NFSv4.0 introduces the use of
   callback operations, or "callbacks", in Section 10.2 of [RFC7530].
   An NFSv4.0 server sets up a traditional ONC RPC client, and an
   NFSv4.0 client sets up a traditional ONC RPC service. Callbacks flow
   in the forward direction on a connection established between the
   server's callback client, and the client's callback server. This
   connection is distinct from connections being used as forechannels,
   and is referred to as a "backchannel connection."

   When an RDMA transport is used as a forechannel, an NFSv4.0 client
   typically provides a TCP callback service. The client's SETCLIENTID
   operation advertises the callback service endpoint with a "tcp" or
   "tcp6" netid. The server then connects to this service using a TCP
   socket.

   NFSv4.0 implementations are fully functional without a backchannel in
   place.
   In this case, the server does not grant file delegations.
   This might result in a negative performance effect, but functional
   correctness is unaffected.

3.2.  NFSv4.1 Callback Operation

   NFSv4.1 supports file delegation in a similar fashion to NFSv4.0, and
   extends the callback mechanism to manage pNFS layouts, as discussed
   in Section 12 of [RFC5661].

   To facilitate operation through NAT routers, all NFSv4.1 transport
   connections are initiated by NFSv4.1 clients. Therefore NFSv4.1
   servers send callbacks to clients in the backward direction on
   connections established by NFSv4.1 clients.

   NFSv4.1 clients and servers indicate to their peers that a
   backchannel capability is available on a given transport in the
   arguments and results of NFS CREATE_SESSION or BIND_CONN_TO_SESSION
   operations.

   NFSv4.1 clients may establish distinct transport connections for
   forechannel and backchannel operation, or they may combine
   forechannel and backchannel operation on one transport connection
   using bi-directional operation.

   Without a backward direction RPC-over-RDMA capability, an NFSv4.1
   client must additionally connect using a transport with backward
   direction capability to use as a backchannel. TCP is the only choice
   for an NFSv4.1 backchannel connection in this case.

   Some implementations find it more convenient to use a single combined
   transport (i.e., a transport that is capable of bi-directional
   operation). This simplifies connection establishment and recovery
   during network partitions or when one endpoint restarts.

   As with NFSv4.0, if a backchannel is not in use, an NFSv4.1 server
   does not grant delegations. But because of its reliance on callbacks
   to manage pNFS layout state, pNFS operation is not possible without a
   backchannel.

4.  Flow Control

   For an RDMA Send operation to work, the receiving peer must have
   posted an RDMA Receive Work Request (WR) to provide a receive buffer
   in which to land the incoming message. If a receiver hasn't posted
   enough Receive WRs to land incoming Send operations, the RDMA
   provider is allowed to drop the RDMA connection.

   RPC-over-RDMA transport protocols provide built-in send flow control
   to prevent overrunning the number of pre-posted receive buffers on a
   connection's receive endpoint. This is fully discussed in
   Section 4.3 of [I-D.ietf-nfsv4-rfc5666bis].

4.1.  Backward Credits

   Credits work the same way in the backward direction as they do in the
   forward direction. However, forward direction credits and backward
   direction credits are accounted separately.

   In other words, the forward direction credit value is the same
   whether or not there are backward direction resources associated with
   an RPC-over-RDMA transport connection. The backward direction credit
   value MAY be different from the forward direction credit value. The
   rdma_credit field in a backward direction RPC-over-RDMA message MUST
   NOT contain the value zero.

   A backward direction Requester (i.e., an RPC-over-RDMA service
   endpoint) requests credits from the Responder (i.e., an RPC-over-RDMA
   client endpoint). The Responder reports how many credits it has
   granted. This is the number of backward direction Calls the
   Responder is prepared to handle at once.

   When message direction is not fully determined by context or by an
   accompanying RPC message with a call direction field, it is not
   possible to tell whether the header credit value is a request or
   grant, or whether the value applies to the forward direction or
   backward direction. In such cases, the receiver MUST NOT use the
   header's credit value.

4.2.  Managing Receive Buffers

   An RPC-over-RDMA transport endpoint must pre-post receive buffers
   before it can receive and process incoming RPC-over-RDMA messages.
   If a sender transmits a message for a receiver which has no prepared
   receive buffer, the RDMA provider is allowed to drop the RDMA
   connection.

4.2.1.  Client Receive Buffers

   Typically an RPC-over-RDMA Requester posts only as many receive
   buffers as there are outstanding RPC Calls. A client endpoint
   without backward direction support might therefore at times have no
   pre-posted receive buffers.

   To receive incoming backward direction Calls, an RPC-over-RDMA client
   endpoint must pre-post enough additional receive buffers to match its
   advertised backward direction credit value. Each outstanding forward
   direction RPC requires an additional receive buffer above this
   minimum.

   When an RDMA transport connection is lost, all active receive buffers
   are flushed and are no longer available to receive incoming messages.
   When a fresh transport connection is established, a client endpoint
   must re-post a receive buffer to handle the Reply for each
   retransmitted forward direction Call, and a full set of receive
   buffers to handle backward direction Calls.

4.2.2.  Server Receive Buffers

   A forward direction RPC-over-RDMA service endpoint posts as many
   receive buffers as it expects incoming forward direction Calls. That
   is, it posts no fewer buffers than the number of credits granted in
   the rdma_credit field of forward direction RPC replies.

   To receive incoming backward direction replies, an RPC-over-RDMA
   server endpoint must pre-post a receive buffer for each backward
   direction Call it sends.

   When the existing transport connection is lost, all active receive
   buffers are flushed and are no longer available to receive incoming
   messages.
   When a fresh transport connection is established, a server
   endpoint must re-post a receive buffer to handle the Reply for each
   retransmitted backward direction Call, and a full set of receive
   buffers for receiving forward direction Calls.

5.  Protocol For Backward Operation

   Performing backward direction ONC RPC operations over an
   RPC-over-RDMA transport connection can be accomplished by observing
   the protocol described in the following subsections. For reference,
   the XDR description of RPC-over-RDMA Version One is contained in
   Section 5.1 of [I-D.ietf-nfsv4-rfc5666bis].

5.1.  Sending A Backward Direction Call

   To form a backward direction RPC-over-RDMA Call message, an ONC RPC
   service endpoint constructs an RPC-over-RDMA header containing a
   fresh RPC XID in the rdma_xid field (see Section 2.4 for full
   requirements).

   The rdma_vers field MUST contain the same value in backward and
   forward direction Call messages on the same connection.

   The number of requested backward direction credits is placed in the
   rdma_credit field (see Section 4).

   Whether presented inline or as a separate chunk, the ONC RPC Call
   header MUST start with the same XID value that is present in the
   RPC-over-RDMA header, and the header's msg_type field MUST contain
   the value CALL.

5.2.  Sending A Backward Direction Reply

   To form a backward direction RPC-over-RDMA Reply message, an ONC RPC
   client endpoint constructs an RPC-over-RDMA header containing a copy
   of the matching ONC RPC Call's RPC XID in the rdma_xid field (see
   Section 2.4 for full requirements).

   The rdma_vers field MUST contain the same value in a backward
   direction Reply message as in the matching Call message.

   The number of granted backward direction credits is placed in the
   rdma_credit field (see Section 4).
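
   As an illustration only, the header fields described in Sections 5.1
   and 5.2 can be sketched in Python. The helper names
   build_backward_message and build_backward_reply are hypothetical and
   are not part of any specification. The numeric values (rdma_vers 1,
   RDMA_MSG 0, CALL 0, REPLY 1) come from the RPC-over-RDMA Version One
   and ONC RPC XDR definitions. The sketch assumes an inline RDMA_MSG
   with empty chunk lists, and it emits only the first two words of the
   RPC message rather than a complete Call or Reply.

```python
import struct

RDMA_VERS_ONE = 1   # rdma_vers for RPC-over-RDMA Version One
RDMA_MSG = 0        # rdma_proc: the RPC message follows inline
CALL, REPLY = 0, 1  # msg_type values from the ONC RPC XDR


def build_backward_message(xid, credits, msg_type, rdma_vers=RDMA_VERS_ONE):
    """Return the transport header plus the first two RPC header words.

    Hypothetical helper: the rest of the RPC message is omitted.
    """
    if credits == 0:
        # Section 4.1: rdma_credit MUST NOT be zero in this direction.
        raise ValueError("rdma_credit must not be zero")
    # Fixed fields: rdma_xid, rdma_vers, rdma_credit, rdma_proc.
    header = struct.pack(">IIII", xid, rdma_vers, credits, RDMA_MSG)
    # Assumption: no chunks, so the read list, write list, and reply
    # chunk are each encoded as a single zero word.
    header += struct.pack(">III", 0, 0, 0)
    # The RPC header starts with the same XID, followed by msg_type.
    return header + struct.pack(">II", xid, msg_type)


def build_backward_reply(call_message, granted_credits):
    """Build a Reply whose XID and rdma_vers are copied from the Call."""
    xid, vers = struct.unpack_from(">II", call_message, 0)
    return build_backward_message(xid, granted_credits, REPLY, rdma_vers=vers)


# The service endpoint requests 4 credits; the client endpoint grants 2.
call = build_backward_message(xid=0x1234ABCD, credits=4, msg_type=CALL)
reply = build_backward_reply(call, granted_credits=2)
```

   In this sketch the Reply never chooses its own XID or version. Both
   are read back from the Call, which mirrors the copy semantics
   described above.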

   Whether presented inline or as a separate chunk, the ONC RPC Reply
   header MUST start with the same XID value that is present in the
   RPC-over-RDMA header, and the header's msg_type field MUST contain
   the value REPLY.

5.3.  Backward Direction Chunks

   Chunks MAY be used in the backward direction. They operate the same
   way as in the forward direction (see [I-D.ietf-nfsv4-rfc5666bis] for
   details).

   An implementation might not support any Upper Layer Protocol that has
   DDP-eligible data items. The Upper Layer Protocol may also use only
   small messages, or it may have a native mechanism for restricting the
   size of backward direction RPC messages, obviating the need to handle
   Long Messages in the backward direction.

   When there is no Upper Layer Protocol requirement for chunks,
   implementers can choose not to provide support for chunks in the
   backward direction. This avoids the complexity of adding support for
   performing RDMA Reads and Writes in the backward direction.

   When chunks are not implemented, RPC messages in the backward
   direction are always sent using RDMA_MSG, and therefore can be no
   larger than what can be sent inline (that is, without chunks).
   Sending an inline message larger than the receiver's inline threshold
   can result in loss of connection.

   If a backward direction Requester provides a non-empty chunk list to
   a Responder that does not support chunks, the Responder MUST reply
   with an RDMA_ERROR message with rdma_err field set to ERR_CHUNK.

5.4.  Backward Direction Retransmission

   In rare cases, an ONC RPC transaction cannot be completed within a
   certain time. This can be because the transport connection was lost,
   the Call or Reply message was dropped, or because the Upper Layer
   consumer delayed or dropped the ONC RPC request. Typically, the
   Requester sends the transaction again, reusing the same RPC XID.
   This is known as an "RPC retransmission".

   In the forward direction, the Requester is the ONC RPC client. The
   client is always responsible for establishing a transport connection
   before sending again.

   In the backward direction, the Requester is the ONC RPC server.
   Because an ONC RPC server does not establish transport connections
   with clients, it cannot send a retransmission if there is no
   transport connection. It must wait for the ONC RPC client to
   re-establish the transport connection before it can retransmit ONC
   RPC transactions in the backward direction.

   If an ONC RPC client has no work to do, it may be some time before it
   re-establishes a transport connection. Backward direction Requesters
   must be prepared to wait indefinitely for a connection to be
   established before a pending backward direction ONC RPC Call can be
   retransmitted.

6.  In the Absence of Backward Direction Support

   An RPC-over-RDMA transport endpoint might not support backward
   direction operation. There might be no mechanism in the transport
   implementation to do so. Or the Upper Layer Protocol consumer might
   not yet have configured the transport to handle backward direction
   traffic.

   If an endpoint is not prepared to receive an incoming backward
   direction message, loss of the RDMA connection might result. Thus a
   denial-of-service could result if a sender continues to send backward
   direction messages after every transport reconnect to an endpoint
   that is not prepared to receive them.

   When dealing with the possibility that the remote peer has no
   transport level support for backward direction operation, the Upper
   Layer Protocol becomes responsible for informing peers when backward
   direction operation is supported. Otherwise even a simple backward
   direction NULL probe from a peer could result in a lost connection.
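
   One way to picture this responsibility is a guard on the sending side
   that holds back every backward direction Call, including a NULL
   probe, until the Upper Layer Protocol reports that the peer is
   prepared. The sketch below is a minimal illustration in Python. The
   class and method names are hypothetical, and how the indication is
   actually conveyed is left to the Upper Layer Protocol.

```python
class BackwardDirectionGate:
    """Hypothetical sender-side guard for backward direction Calls."""

    def __init__(self):
        # Nothing may be sent until the Upper Layer Protocol says so.
        self._peer_ready = False

    def peer_indicated_support(self):
        """Called by the Upper Layer Protocol once the peer is prepared."""
        self._peer_ready = True

    def send_call(self, transmit, message):
        """Pass message to transmit() only if the peer is prepared.

        Returns True when the message was handed to the transport and
        False when it was held back.
        """
        if not self._peer_ready:
            return False
        transmit(message)
        return True


gate = BackwardDirectionGate()
sent = []

# Before any indication, even a NULL probe is held back.
held = gate.send_call(sent.append, b"NULL probe")

# After the Upper Layer Protocol reports readiness, the Call goes out.
gate.peer_indicated_support()
delivered = gate.send_call(sent.append, b"NULL probe")
```

   Keeping the readiness flag outside the transport code reflects the
   division of labor in this section: the transport cannot discover peer
   support on its own without risking the connection.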

   An NFSv4.1 server does not send backchannel messages to an NFSv4.1
   client before the NFSv4.1 client has sent a CREATE_SESSION or a
   BIND_CONN_TO_SESSION operation. As long as an NFSv4.1 client has
   prepared appropriate backchannel resources before sending one of
   these operations announcing support for backchannel operation,
   denial-of-service is avoided.

   Therefore, an Upper Layer Protocol consumer MUST NOT perform backward
   direction ONC RPC operations unless the peer consumer has indicated
   it is prepared to handle them. A description of Upper Layer Protocol
   mechanisms used for this indication is outside the scope of this
   document.

7.  Backward Direction Upper Layer Binding

   Since backward direction operation occurs on an already-established
   connection, there is no need to specify RPC bind parameters.

   An Upper Layer Protocol that operates on RPC-over-RDMA transports in
   the backward direction may have DDP-eligible data items. These are
   specified in an Upper Layer Binding document.

   By default, no data items in a ULP are DDP-eligible. If there are no
   DDP-eligible data items to document, an explicit Upper Layer Binding
   may not be needed for an Upper Layer Protocol that operates only in
   the backward direction.

   Consult Section 7 of [I-D.ietf-nfsv4-rfc5666bis] for details about
   what else may be contained in a binding.

8.  Security Considerations

   Security considerations for operation on RPC-over-RDMA transports are
   outlined in Section 9 of [I-D.ietf-nfsv4-rfc5666bis].

9.  IANA Considerations

   This document does not require actions by IANA.

10.  Acknowledgements

   Tom Talpey was an indispensable resource, in addition to creating the
   foundation upon which this work is based. Our warmest regards go to
   him for his help and support.

   Dave Noveck provided excellent review, constructive suggestions, and
   navigational guidance throughout the process of drafting this
   document.

   Dai Ngo was a solid partner and collaborator. Together we
   constructed and tested independent prototypes of the changes
   described in this document.

   The author wishes to thank Bill Baker for his unwavering support of
   this work. In addition, the author gratefully acknowledges the
   expert contributions of Karen Deitke, Chunli Zhang, Mahesh
   Siddheshwar, Steve Wise, and Tom Tucker.

   Special thanks go to the nfsv4 Working Group Chair Spencer Shepler
   and the nfsv4 Working Group Secretary Tom Haynes for their support.

11.  Normative References

   [I-D.ietf-nfsv4-rfc5666bis]
              Lever, C., Simpson, W., and T. Talpey, "Remote Direct
              Memory Access Transport for Remote Procedure Call",
              draft-ietf-nfsv4-rfc5666bis-04 (work in progress),
              March 2016.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC5531]  Thurlow, R., "RPC: Remote Procedure Call Protocol
              Specification Version 2", RFC 5531, May 2009.

   [RFC5661]  Shepler, S., Eisler, M., and D. Noveck, "Network File
              System (NFS) Version 4 Minor Version 1 Protocol", RFC
              5661, January 2010.

   [RFC7530]  Haynes, T. and D. Noveck, "Network File System (NFS)
              Version 4 Protocol", RFC 7530, March 2015.

Author's Address

   Charles Lever
   Oracle Corporation
   1015 Granger Avenue
   Ann Arbor, MI 48104
   USA

   Phone: +1 734 274 2396
   Email: chuck.lever@oracle.com