Network File System Version 4                                   C. Lever
Internet-Draft                                                    Oracle
Intended status: Standards Track                            6 April 2021
Expires: 8 October 2021


Network File System (NFS) Upper-Layer Binding To RPC-Over-RDMA Version 2
                     draft-ietf-nfsv4-nfs-ulb-v2-04

Abstract

   This document specifies Upper-Layer Bindings of Network File System
   (NFS) protocol versions to RPC-over-RDMA version 2.

Note

   Discussion of this draft takes place on the NFSv4 working group
   mailing list (nfsv4@ietf.org), which is archived at
   https://mailarchive.ietf.org/arch/browse/nfsv4/. Working Group
   information can be found at
   https://datatracker.ietf.org/wg/nfsv4/about/.

   This note is to be removed before publishing as an RFC.

   The source for this draft is maintained in GitHub. Suggested changes
   can be submitted as pull requests at
   https://github.com/chucklever/i-d-nfs-ulb-v2. Instructions are on
   that page as well.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 8 October 2021.

Copyright Notice

   Copyright (c) 2021 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  Upper-Layer Binding for NFS Versions 2 and 3
     3.1.  Reply Size Estimation
     3.2.  RPC Binding Considerations
     3.3.  Transport Considerations
   4.  Upper-Layer Bindings for NFS Version 2 and 3 Auxiliary
       Protocols
     4.1.  MOUNT, NLM, and NSM Protocols
     4.2.  NFSACL Protocol
   5.  Upper-Layer Binding For NFS Version 4
     5.1.  DDP-Eligibility
     5.2.  Reply Size Estimation
     5.3.  RPC Binding Considerations
     5.4.  NFS COMPOUND Requests
     5.5.  NFS Callback Requests
     5.6.  Session-Related Considerations
     5.7.  Transport Considerations
   6.  Extending NFS Upper-Layer Bindings
   7.  Security Considerations
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Acknowledgments
   Author's Address

1.  Introduction

   The RPC-over-RDMA version 2 transport may employ direct data
   placement to convey data payloads associated with RPC transactions,
   as described in [I-D.ietf-nfsv4-rpcrdma-version-two].
   RPC client and server implementations using RPC-over-RDMA version 2
   must agree which XDR data items and RPC procedures are eligible to
   use direct data placement (DDP) to ensure successful interoperation.

   An Upper-Layer Binding specifies this agreement for one or more
   versions of one RPC program. Other operational details, such as RPC
   binding assignments, pairing Write chunks with result data items, and
   reply size estimation, are also specified by such a Binding.

   This document contains material required of Upper-Layer Bindings, as
   specified in Appendix A of [I-D.ietf-nfsv4-rpcrdma-version-two], for
   the following NFS protocol versions:

   *  NFS version 2 [RFC1094]

   *  NFS version 3 [RFC1813]

   *  NFS version 4.0 [RFC7530]

   *  NFS version 4.1 [RFC8881]

   *  NFS version 4.2 [RFC7862]

   The current document also provides Upper-Layer Bindings for auxiliary
   protocols used with NFS versions 2 and 3 (see Section 4).

   This document assumes the reader is already familiar with concepts
   and terminology defined throughout
   [I-D.ietf-nfsv4-rpcrdma-version-two] and the documents it references.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Upper-Layer Binding for NFS Versions 2 and 3

   The Upper-Layer Binding specification in this section applies to NFS
   version 2 [RFC1094] and NFS version 3 [RFC1813]. For brevity, in
   this document, a "Legacy NFS client" refers to an NFS client using
   version 2 or version 3 of the NFS RPC program (100003) to communicate
   with an NFS server. Likewise, a "Legacy NFS server" is an NFS server
   communicating with clients using NFS version 2 or NFS version 3.

   The following XDR data items in NFS versions 2 and 3 are
   DDP-eligible:

   *  The opaque file data argument in the NFS WRITE procedure

   *  The pathname argument in the NFS SYMLINK procedure

   *  The opaque file data result in the NFS READ procedure

   *  The pathname result in the NFS READLINK procedure

   All other argument or result data items in NFS versions 2 and 3 are
   not DDP-eligible.

   Whether or not an NFS operation is considered non-idempotent, a
   transport error might not indicate whether the server has processed
   the arguments of the RPC Call, or whether the server has accessed or
   modified client memory associated with that RPC.

3.1.  Reply Size Estimation

   During the construction of each RPC Call message, a Requester is
   responsible for allocating appropriate transport resources to receive
   the corresponding Reply message. These resources must be capable of
   holding the entire Reply; therefore, the Requester needs to estimate
   the maximum possible size of the expected Reply message.

   *  In many cases, the expected Reply can fit in one or a few RDMA
      Send messages. The Requester need not provision any RDMA
      resources, relying instead on message continuation to handle the
      entire Reply message.

   *  In cases where the Requester deems direct data placement to be the
      most efficient transfer mechanism, it provisions Write chunks
      wherein the Responder can place results. In these cases, the
      Requester must reliably estimate the maximum size of each result
      that is to be placed in a Write chunk.

   *  When the Requester expects an especially large Reply message, it
      can provision a combination of a Reply chunk and Write chunks for
      result data items. In such cases, the Requester must reliably
      estimate the maximum size of each result that is to be placed in a
      Write chunk and the maximum size of the remainder to be placed in
      the Reply chunk.
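
   The three provisioning cases above can be sketched roughly as
   follows. This is an illustrative sketch only, not part of the
   binding: the names ReplyResources, plan_reply_resources, send_limit,
   and prefer_ddp are hypothetical, and send_limit stands in for
   whatever amount of Reply data an implementation is willing to
   receive via Send messages with message continuation.

```python
from dataclasses import dataclass, field


@dataclass
class ReplyResources:
    """Hypothetical description of what a Requester provisions for one Reply."""
    write_chunks: list = field(default_factory=list)  # max bytes per Write chunk
    reply_chunk: int = 0                               # 0 means no Reply chunk


def plan_reply_resources(ddp_sizes, remainder, send_limit, prefer_ddp):
    """Pick Reply resources from maximum size estimates (all values in bytes).

    ddp_sizes  -- estimated maximum size of each DDP-eligible result
    remainder  -- estimated maximum size of the rest of the Reply
    send_limit -- assumed limit on data received via Send messages
    prefer_ddp -- True if the Requester wants direct data placement
    """
    plan = ReplyResources()
    if prefer_ddp:
        # One Write chunk per DDP-eligible result; only the remainder
        # still has to arrive some other way.
        plan.write_chunks = list(ddp_sizes)
        non_ddp = remainder
    else:
        non_ddp = remainder + sum(ddp_sizes)
    if non_ddp > send_limit:
        # Especially large Reply: provision a Reply chunk for the part
        # that is not placed in Write chunks.
        plan.reply_chunk = non_ddp
    return plan
```

   With these assumptions, a small Reply needs no RDMA resources, a
   large READ result uses a single Write chunk, and a Reply whose
   non-DDP portion exceeds send_limit also receives a Reply chunk.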

   A legacy NFS client needs to make every effort to avoid
   retransmission of non-idempotent NFS requests due to underestimated
   Reply resources. Thanks to the mechanism of message continuation in
   RPC-over-RDMA version 2, the need for such retransmission is greatly
   reduced.

3.2.  RPC Binding Considerations

   Legacy NFS servers traditionally listen for clients on UDP and TCP
   port 2049. Additionally, they register these ports with a local
   portmapper service [RFC1833].

   A Legacy NFS server supporting RPC-over-RDMA version 2 and
   registering itself with the RPC portmapper MAY choose an arbitrary
   port, or MAY use the alternative well-known port number for its
   RPC-over-RDMA service (see Section 8). The chosen port MAY be
   registered with the RPC portmapper using the netids assigned in
   Section 12 of [I-D.ietf-nfsv4-rpcrdma-version-two].

3.3.  Transport Considerations

   Legacy NFS client implementations often rely on a transport-layer
   keep-alive mechanism to detect when a legacy server has become
   unresponsive. When an NFS server is no longer responsive,
   client-side keep-alive terminates the connection, which in turn
   triggers reconnection and retransmission of outstanding RPC
   transactions.

3.3.1.  Keep-Alive

   Some RDMA transports (such as the Reliable Connected QP type on
   InfiniBand) have no keep-alive mechanism. Without a disconnect or
   new RPC traffic, such connections can remain alive long after an NFS
   server has become unresponsive or unreachable. Once an NFS client
   has consumed all available RPC-over-RDMA version 2 credits on that
   transport connection, it awaits a reply indefinitely before sending
   another RPC request.

   Legacy NFS clients SHOULD reserve one RPC-over-RDMA version 2 credit
   to use for periodic server or connection health assessment.
   Either peer can use this credit to drive an RPC request on an
   otherwise idle connection, triggering either an affirmative server
   response or a connection termination.

3.3.2.  Replay Detection

   Legacy NFS servers typically employ request replay detection to
   reduce the risk of data and file namespace corruption that could
   result when an NFS client retransmits a non-idempotent NFS request.
   A legacy NFS server can send a cached response when a replay is
   detected, rather than executing the request again. Replay detection
   is not perfect, but it is usually adequate.

   For legacy NFS servers, replay detection commonly utilizes heuristic
   indicators such as the IP address of the NFS client, the source port
   of the connection, the transaction ID of the request, and the
   contents of the request's RPC and upper-layer protocol headers. In
   short, replay detection is typically based on a connection tuple and
   the request's XID. A legacy NFS client is careful to re-use the same
   source port, if practical, when reconnecting so that legacy NFS
   servers are better able to detect retransmissions.

   However, a legacy NFS client operating over an RDMA transport has no
   control over connection source ports. It is almost certain that an
   RPC request that is retransmitted on a new connection can never be
   detected as a replay if the legacy NFS server includes the connection
   source port in its replay detection heuristics.

   Therefore a legacy NFS server using an RDMA transport should never
   use a legacy NFS client connection's source port as part of its NFS
   request replay detection mechanism.

4.  Upper-Layer Bindings for NFS Version 2 and 3 Auxiliary Protocols

   Storage administrators typically deploy NFS versions 2 and 3 with
   several other protocols, sometimes referred to as the "NFS auxiliary
   protocols."
   These are distinct RPC programs that define procedures that are not
   part of the NFS RPC program (100003). The Upper-Layer Bindings in
   this section apply to:

   *  Versions 2 and 3 of the MOUNT RPC program (100005) [RFC1813]

   *  Versions 1, 3, and 4 of the NLM RPC program (100021) [RFC1813]

   *  Version 1 of the NSM RPC program (100024), described in Chapter 11
      of [XNFS]

   *  Versions 2 and 3 of the NFSACL RPC program (100227). The NFSACL
      program does not have a public definition. In this document it is
      treated as a de facto standard, as there are several
      interoperating implementations.

4.1.  MOUNT, NLM, and NSM Protocols

   Historically, NFS/RDMA implementations have chosen to convey the
   MOUNT, NLM, and NSM protocols via TCP. A legacy NFS server
   implementation MUST provide support for these protocols via TCP to
   enable interoperation of these protocols when NFS/RDMA is in use.

4.2.  NFSACL Protocol

   Often legacy clients and servers that support the NFSACL RPC program
   convey NFSACL procedures on the same transport connection and port as
   the NFS RPC program (100003). Utilizing the same port obviates the
   need for a separate rpcbind query to discover server support for this
   RPC program.

   ACLs are typically small, but even large ACLs must be encoded and
   decoded to some degree before being made available to users. Thus no
   data item in this Upper-Layer Protocol is DDP-eligible.

   For procedures whose replies do not include an ACL object, the size
   of a reply is determined directly from the NFSACL RPC program's XDR
   definition. However, legacy client implementations should choose a
   maximum size for ACLs based on internal limits, and can rely on
   message continuation to handle the a priori unknown size of large ACL
   objects in Replies.

5.  Upper-Layer Binding For NFS Version 4

   The Upper-Layer Binding specification in this section applies to
   versions of the NFS RPC program defined in NFS version 4.0 [RFC7530],
   NFS version 4.1 [RFC8881], and NFS version 4.2 [RFC7862].

5.1.  DDP-Eligibility

   Only the following XDR data items in the COMPOUND procedure of all
   NFS version 4 minor versions are DDP-eligible:

   *  The opaque data field in the WRITE4args structure

   *  The linkdata field of the NF4LNK arm in the createtype4 union

   *  The opaque data field in the READ4resok structure

   *  The linkdata field in the READLINK4resok structure

5.1.1.  The NFSv4.2 READ_PLUS operation

   NFS version 4.2 introduces an enhanced READ operation called
   READ_PLUS [RFC7862]. READ_PLUS enables an NFS server to perform data
   reduction of READ results so that the returned READ data is more
   compact.

   In a READ_PLUS result, returned file content appears as a list of one
   or more of the following items:

   *  Regular data content: the same as the result of a traditional READ
      operation.

   *  Unallocated space in a file: where no data has yet been written or
      previously-written data has been removed via a hole-punch
      operation.

   *  A counted pattern.

   Upon receipt of a READ_PLUS result, an NFSv4.2 client expands the
   returned list into the preferred local representation of the original
   file content.

   Before receiving that result, an NFSv4.2 client typically does not
   know how the file's content is organized on the NFS server. Thus it
   is not possible to predict the size or structure of a READ_PLUS Reply
   in advance. The use of direct data placement is therefore
   challenging.

   A READ_PLUS content list containing more than one segment of regular
   file data could be conveyed using multiple Write chunks, but only if
   the client knows in advance where those chunks appear in the Reply
   Payload stream.
   Moreover, the usual benefits of hardware-assisted data placement are
   entirely lost if the client-side transport must parse the result of
   each read I/O.

   Therefore this Upper-Layer Binding does not make any element of an
   NFSv4.2 READ_PLUS Reply DDP-eligible. Further, this Upper-Layer
   Binding recommends that implementations avoid the use of the
   READ_PLUS operation on NFS/RDMA mount points.

5.2.  Reply Size Estimation

   Within NFS version 4, there are certain variable-length result data
   items whose maximum size cannot be estimated by clients reliably
   because there is no protocol-specified size limit on these result
   arrays. These include:

   *  The attrlist4 field

   *  Fields containing ACLs such as fattr4_acl, fattr4_dacl, and
      fattr4_sacl

   *  Fields in the fs_locations4 and fs_locations_info4 data structures

   *  Fields which pertain to pNFS layout metadata, such as loc_body,
      loh_body, da_addr_body, lou_body, lrf_body, fattr_layout_types,
      and fs_layout_types

5.2.1.  Reply Size Estimation for Minor Version 0

   The NFS version 4.0 protocol itself does not impose any bound on the
   size of NFS calls or replies.

   Some of the data items enumerated in Section 5.2 (in particular, the
   items related to ACLs and fs_locations) make it difficult to predict
   the maximum size of NFS version 4.0 replies that interrogate
   variable-length fattr4 attributes. Client implementations might rely
   upon internal architectural limits to constrain the reply size, but
   such limits are not always guaranteed to be reliable.

   When an NFS version 4.0 client expects an especially sizeable fattr4
   result, it can rely on message continuation or provision a Reply
   chunk to enable that server to return that result via explicit RDMA.

5.2.2.  Reply Size Estimation for Minor Version 1 and Newer

   In NFS version 4.1 and newer minor versions, the csa_fore_chan_attrs
   argument of the CREATE_SESSION operation contains a
   ca_maxresponsesize field. The value in this field can be taken as
   the absolute maximum size of replies generated by an NFS version 4.1
   server.

   An NFS version 4 client can use this value in cases where it is not
   possible to estimate a reply size upper bound precisely. In
   practice, objects such as ACLs, named attributes, layout bodies, and
   security labels are much smaller than this maximum.

5.3.  RPC Binding Considerations

   NFS version 4 servers are required to listen on TCP port 2049, and
   are not required to register with an rpcbind service [RFC7530].
   Therefore, an NFS version 4 server supporting RPC-over-RDMA version 2
   MUST use the alternative well-known port number for its RPC-over-RDMA
   service (see Section 8). Clients SHOULD connect to this well-known
   port without consulting the RPC portmapper (as for NFS version 4 on
   TCP transports).

5.4.  NFS COMPOUND Requests

5.4.1.  Multiple DDP-eligible Data Items

   An NFS version 4 COMPOUND procedure can contain more than one
   operation that carries a DDP-eligible data item. An NFS version 4
   client provides XDR Position values in each Read chunk to
   disambiguate which chunk is associated with which argument data item.
   However, NFS version 4 server and client implementations must agree
   in advance on how to pair Write chunks with returned result data
   items.

   In the following lists, a "READ operation" refers to any NFS version
   4 operation that has a DDP-eligible result data item.
   An NFS version 4 client applies the mechanism specified in
   Section 4.3.2 of [I-D.ietf-nfsv4-rpcrdma-version-two] to this class
   of operations as follows:

   *  If an NFS version 4 client wishes all DDP-eligible items in an NFS
      reply to be conveyed inline, it leaves the Write list empty.

   An NFS version 4 server acts as follows:

   *  The first chunk in the Write list MUST be used by the first READ
      operation in an NFS version 4 COMPOUND procedure. The next Write
      chunk is used by the next READ operation, and so on.

   *  If an NFS version 4 client has provided a matching non-empty Write
      chunk, then the corresponding READ operation MUST return its
      DDP-eligible data item using that chunk.

   *  If an NFS version 4 client has provided an empty matching Write
      chunk, then the corresponding READ operation MUST return all of
      its result data items inline.

   *  If a READ operation returns a union arm which does not contain a
      DDP-eligible result, and the NFS version 4 client has provided a
      matching non-empty Write chunk, an NFS version 4 server MUST
      return an empty Write chunk in that Write list position.

   *  If there are more READ operations than Write chunks, then
      remaining READ operations in an NFS version 4 COMPOUND that have
      no matching Write chunk MUST return their results inline.

5.4.2.  Chunk List Complexity

   By default, the RPC-over-RDMA version 2 protocol places limits on the
   number of chunks or segments that may appear in Read or Write lists
   (see Section 5.2 of [I-D.ietf-nfsv4-rpcrdma-version-two]).

   These implementation limits are especially important when Kerberos
   integrity or privacy is in use [RFC7861]. GSS services increase the
   size of credential material in RPC headers, potentially requiring the
   use of a Long message, which increases the complexity of chunk lists
   independent of the particular NFS version 4 COMPOUND being conveyed.
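
   As an aside, the server-side Write chunk pairing rules of
   Section 5.4.1 can be sketched as below. This is a hedged
   illustration rather than a normative algorithm: the function name
   pair_write_chunks and its data representation (a list of operation
   tuples and a list of chunk lengths, where 0 denotes an empty chunk)
   are assumptions made for this example only.

```python
def pair_write_chunks(ops, write_list):
    """Pair Write chunks with READ-class operations in COMPOUND order.

    ops        -- list of (name, is_read_op, has_ddp_result) tuples;
                  has_ddp_result is False when the operation returns a
                  union arm without a DDP-eligible item (e.g. an error)
    write_list -- list of chunk lengths provided by the client (0 = empty)

    Returns (placements, populated): where each READ-class result goes,
    and which Write list positions the server fills.
    """
    placements = []
    populated = []
    chunks = iter(enumerate(write_list))
    for name, is_read_op, has_ddp_result in ops:
        if not is_read_op:
            continue  # only READ-class operations consume Write chunks
        slot = next(chunks, None)
        if slot is None:
            # More READ operations than Write chunks: return inline.
            placements.append((name, "inline"))
            continue
        index, length = slot
        if length > 0 and has_ddp_result:
            placements.append((name, f"chunk {index}"))
            populated.append(True)
        else:
            # Empty chunk from the client, or no DDP-eligible result:
            # the result is inline and this position is returned empty.
            placements.append((name, "inline"))
            populated.append(False)
    # Any chunks left without a matching operation are returned empty.
    populated.extend(False for _ in chunks)
    return placements, populated
```

   In this sketch a client that supplies an empty chunk in the second
   Write list position causes the second READ-class operation to return
   its result inline, while the first and third still use their chunks.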

   In the absence of an explicit transport property exchange that alters
   these limits, NFS version 4 clients SHOULD follow the prescriptions
   listed below when constructing RPC-over-RDMA version 2 messages. NFS
   version 4 servers MUST accept and process all such requests.

   *  The Read list can contain either a Position-Zero Read chunk, one
      Read chunk with a non-zero Position, or both.

   *  The Write list can contain no more than one Write chunk.

   NFS version 4 clients wishing to send more complex chunk lists can
   provide configuration interfaces to bound the complexity of NFS
   version 4 COMPOUNDs, limit the number of elements in scatter-gather
   operations, and avoid other sources of chunk overruns at the
   receiving peer.

   If an NFS version 4 server receives an RPC request via RPC-over-RDMA
   version 2 that it cannot process due to chunk list complexity limits,
   it SHOULD return one of the following responses to the client:

   *  A problem is detected by the transport layer while parsing the
      transport header in an RPC Call message. The server responds with
      an RDMA2_ERROR message with the err field set to ERR_CHUNK.

   *  A problem is detected during XDR decoding of the RPC Call message
      while the RPC layer reassembles the call's XDR stream. The server
      responds with an RPC reply with its "reply_stat" field set to
      MSG_ACCEPTED and its "accept_stat" field set to GARBAGE_ARGS.

   After receiving one of these errors, an NFS version 4 client SHOULD
   NOT retransmit the failing request, as the result would be the same
   error. It SHOULD terminate the RPC transaction associated with the
   XID in the reply without further processing, and report an error to
   the RPC consumer.

5.4.3.  NFS Version 4 COMPOUND Example

   The following example shows a Write list with three Write chunks, A,
   B, and C.
   The NFS version 4 server consumes the provided Write chunks by
   writing the results of the designated operations in the compound
   request (READ and READLINK) back to each chunk.

   Write list:

      A --> B --> C

   NFS version 4 COMPOUND request:

      PUTFH LOOKUP READ PUTFH LOOKUP READLINK PUTFH LOOKUP READ
                    |                   |                   |
                    v                   v                   v
                    A                   B                   C

   If the NFS version 4 client does not want to have the READLINK result
   returned via RDMA, it provides an empty Write chunk for buffer B to
   indicate that the READLINK result must be returned inline.

5.5.  NFS Callback Requests

   The NFS version 4 family of protocols supports server-initiated
   callbacks to notify NFS version 4 clients of events such as recalled
   delegations.

5.5.1.  NFS Version 4.0 Callback

   An NFS version 4.0 client uses the SETCLIENTID operation to advertise
   the IP address, port, and netid of its NFS version 4.0 callback
   service. When an NFS version 4.0 server provides a backchannel
   service to an NFS version 4.0 client that uses RPC-over-RDMA version
   2 for its forward channel, the server MUST advertise the backchannel
   service using either the "tcp" or "tcp6" netid.

   Because the backchannel does not operate on RPC-over-RDMA, no XDR
   data item in the NFS version 4.0 callback RPC program is
   DDP-eligible.

5.5.2.  NFS Version 4.1 Callback

   In NFS version 4.1 and newer minor versions, callback operations may
   appear on the same connection that is in use for NFS version 4
   forward channel client requests. NFS version 4 clients and servers
   MUST use the mechanisms described in Section 4.5 of
   [I-D.ietf-nfsv4-rpcrdma-version-two] to convey backchannel operations
   on an RPC-over-RDMA version 2 transport.

   The csa_back_chan_attrs argument of the CREATE_SESSION operation
   contains a ca_maxresponsesize field. The value in this field is the
   absolute maximum size of backchannel replies generated by a replying
   NFS version 4 client.
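
   A minimal sketch of how a Requester might apply a negotiated
   ca_maxresponsesize when sizing Reply resources, on either the forward
   channel (Section 5.2.2) or the backchannel, is shown below. The
   function name reply_buffer_size and the convention of passing None
   for "no reliable estimate" are assumptions made for illustration.

```python
def reply_buffer_size(estimate, ca_maxresponsesize):
    """Bound a Reply size estimate by the session's ca_maxresponsesize.

    estimate           -- the Requester's own upper-bound estimate in
                          bytes, or None when no reliable estimate exists
    ca_maxresponsesize -- value negotiated by CREATE_SESSION for the
                          channel in question
    """
    if estimate is None:
        # No reliable estimate: fall back to the negotiated maximum.
        return ca_maxresponsesize
    # A reply can never exceed the negotiated maximum, so there is no
    # benefit in provisioning more than that.
    return min(estimate, ca_maxresponsesize)
```

   For example, with a negotiated maximum of 1 MiB, a result whose size
   cannot be predicted would be given a 1 MiB buffer, while a 4 KiB
   estimate would be used unchanged.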

   There are no DDP-eligible data items in callback procedures defined
   in NFS version 4.1 or NFS version 4.2. However, some callback
   operations, such as messages that convey device ID information, can
   be sizeable. A sender can use Message Continuation or a Long message
   in this situation.

   When an NFS version 4.1 client can support Long Calls in its
   backchannel, it reports a backchannel ca_maxrequestsize that is
   larger than the connection's inline thresholds. Otherwise, an NFS
   version 4 server MUST use only Short messages to convey backchannel
   operations.

5.6.  Session-Related Considerations

   The presence of an NFS version 4 session (as defined in [RFC8881])
   does not affect the operation of RPC-over-RDMA version 2. None of
   the operations introduced to support NFS sessions (e.g., the SEQUENCE
   operation) contain DDP-eligible data items. There is no need to
   match the number of session slots with the number of available
   RPC-over-RDMA version 2 credits.

   However, there are a few new cases where an RPC transaction can fail.
   For example, a Requester might receive, in response to an RPC
   request, an RDMA2_ERROR message with an rdma_err value of ERR_CHUNK.
   These situations are not different from existing RPC errors, which an
   NFS session implementation can already handle for other transport
   types. Moreover, there might be no SEQUENCE result available to the
   Requester to distinguish whether failure occurred before or after the
   Responder executed the requested operations.

   When a transport error occurs (e.g., an RDMA2_ERROR type message is
   received), the Requester proceeds, as usual, to match the incoming
   XID value to a waiting RPC Call. The Requester terminates the RPC
   transaction and reports the result status to the RPC consumer.
   The Requester's session implementation then determines the session ID
   and slot for the failed request and performs slot recovery to make
   that slot usable again. Otherwise, that slot could be rendered
   permanently unavailable.

   When an NFS session is not present (for example, when NFS version 4.0
   is in use), a transport error does not indicate whether the server
   has processed the arguments of the RPC Call, or whether the server
   has accessed or modified client memory associated with that RPC.

5.7.  Transport Considerations

5.7.1.  Congestion Avoidance

   Section 3.1 of [RFC7530] states:

      Where an NFS version 4 implementation supports operation over the
      IP network protocol, the supported transport layer between NFS and
      IP MUST be an IETF standardized transport protocol that is
      specified to avoid network congestion; such transports include TCP
      and the Stream Control Transmission Protocol (SCTP).

   Section 2.9.1 of [RFC8881] further states:

      Even if NFS version 4.1 is used over a non-IP network protocol, it
      is RECOMMENDED that the transport support congestion control.

      It is permissible for a connectionless transport to be used under
      NFS version 4.1; however, reliable and in-order delivery of data
      combined with congestion control by the connectionless transport
      is REQUIRED. As a consequence, UDP by itself MUST NOT be used as
      an NFS version 4.1 transport.

   RPC-over-RDMA version 2 utilizes only reliable, connection-oriented
   transports that guarantee in-order delivery, meeting all the above
   requirements for NFS version 4.0 and 4.1. See Section 4.2.1 of
   [I-D.ietf-nfsv4-rpcrdma-version-two] for more details.

5.7.2.  Retransmission and Keep-alive

   NFS version 4 client implementations often rely on a transport-layer
   keep-alive mechanism to detect when an NFS version 4 server has
   become unresponsive.
   When an NFS server is no longer responsive, client-side keep-alive
   terminates the connection, which in turn triggers reconnection and
   RPC retransmission.

   Some RDMA transports (such as the Reliable Connected QP type on
   InfiniBand) have no keep-alive mechanism. Without a disconnect or
   new RPC traffic, such connections can remain alive long after an NFS
   server has become unresponsive. Once an NFS client has consumed all
   available RPC-over-RDMA version 2 credits on that transport
   connection, it indefinitely awaits a reply before sending another RPC
   request.

   NFS version 4 clients SHOULD reserve one RPC-over-RDMA version 2
   credit to use for periodic server or connection health assessment.
   Either peer can use this credit to drive an RPC request on an
   otherwise idle connection, triggering either a quick affirmative
   server response or immediate connection termination.

   In addition to network partition and request loss scenarios,
   RPC-over-RDMA version 2 transport connections can be terminated when
   a Transport header is malformed, Reply messages exceed receive
   resources, or when too many RPC-over-RDMA messages are sent at once.
   In such cases:

   *  If a transport error occurs (e.g., an RDMA2_ERROR type message is
      received) before the disconnect or instead of a disconnect, the
      Requester MUST respond to that error as prescribed by the
      specification of the RPC transport. Then the NFS version 4 rules
      for handling retransmission apply.

   *  If there is a transport disconnect and the Responder has provided
      no other response for a request, then only the NFS version 4 rules
      for handling retransmission apply.

6.  Extending NFS Upper-Layer Bindings

   RPC programs such as NFS are required to have an Upper-Layer Binding
   specification to interoperate on RPC-over-RDMA version 2 transports
   [I-D.ietf-nfsv4-rpcrdma-version-two].
Via standards action, the Upper-Layer Binding specified in this document can be extended to cover versions of the NFS version 4 protocol specified after NFS version 4 minor version 2, or to cover separately published extensions to an existing NFS version 4 minor version, as described in [RFC8178].

7. Security Considerations

RPC-over-RDMA version 2 supports all RPC security models, including RPCSEC_GSS security and transport-level security [RFC7861]. The choice of which Direct Data Placement mechanism is used to convey RPC arguments and results does not affect this, since it changes only the method of data transfer. Because the current document defines only the binding of the NFS protocols atop RPC-over-RDMA version 2 [I-D.ietf-nfsv4-rpcrdma-version-two], all relevant security considerations are described at that layer.

8. IANA Considerations

The use of direct data placement in NFS introduces a need for an additional port number assignment for networks that share traditional UDP and TCP port spaces with RDMA services. The iWARP protocol is one such example [RFC5040] [RFC5041].

For this purpose, the current document specifies a set of transport protocol port number assignments. IANA has assigned the following ports for NFS/RDMA in the IANA port registry, according to the guidelines described in [RFC6335].

   nfsrdma 20049/tcp    Network File System (NFS) over RDMA
   nfsrdma 20049/udp    Network File System (NFS) over RDMA
   nfsrdma 20049/sctp   Network File System (NFS) over RDMA

The current document should be added as a reference for the nfsrdma port assignments. The current document does not alter these assignments.

9. References

9.1. Normative References

[I-D.ietf-nfsv4-rpcrdma-version-two] Lever, C. and D.
Noveck, "RPC-over-RDMA Version 2 Protocol", Work in Progress, Internet-Draft, draft-ietf-nfsv4-rpcrdma-version-two-03, 10 August 2020.

[RFC1833] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", RFC 1833, DOI 10.17487/RFC1833, August 1995.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC6335] Cotton, M., Eggert, L., Touch, J., Westerlund, M., and S. Cheshire, "Internet Assigned Numbers Authority (IANA) Procedures for the Management of the Service Name and Transport Protocol Port Number Registry", BCP 165, RFC 6335, DOI 10.17487/RFC6335, August 2011.

[RFC7530] Haynes, T., Ed. and D. Noveck, Ed., "Network File System (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, March 2015.

[RFC7861] Adamson, A. and N. Williams, "Remote Procedure Call (RPC) Security Version 3", RFC 7861, DOI 10.17487/RFC7861, November 2016.

[RFC7862] Haynes, T., "Network File System (NFS) Version 4 Minor Version 2 Protocol", RFC 7862, DOI 10.17487/RFC7862, November 2016.

[RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

[RFC8881] Noveck, D., Ed. and C. Lever, "Network File System (NFS) Version 4 Minor Version 1 Protocol", RFC 8881, DOI 10.17487/RFC8881, August 2020.

9.2. Informative References

[RFC1094] Nowicki, B., "NFS: Network File System Protocol specification", RFC 1094, DOI 10.17487/RFC1094, March 1989.

[RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS Version 3 Protocol Specification", RFC 1813, DOI 10.17487/RFC1813, June 1995.

[RFC5040] Recio, R., Metzler, B., Culley, P., Hilland, J., and D.
Garcia, "A Remote Direct Memory Access Protocol Specification", RFC 5040, DOI 10.17487/RFC5040, October 2007.

[RFC5041] Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct Data Placement over Reliable Transports", RFC 5041, DOI 10.17487/RFC5041, October 2007.

[RFC8178] Noveck, D., "Rules for NFSv4 Extensions and Minor Versions", RFC 8178, DOI 10.17487/RFC8178, July 2017.

[XNFS] The Open Group, "Protocols for Interworking: XNFS, Version 3W", February 1998.

Acknowledgments

Thanks to Tom Talpey, who contributed the text of Section 5.4.2. David Noveck contributed the text of Section 5.6 and Section 6. The author also wishes to thank Bill Baker and Greg Marsden for their support of this work.

Special thanks go to Transport Area Director Magnus Westerlund, NFSV4 Working Group Chairs Brian Pawlowski and David Noveck, and NFSV4 Working Group Secretary Thomas Haynes for their support.

Author's Address

Charles Lever
Oracle Corporation
United States of America

Email: chuck.lever@oracle.com