Network File System Version 4                                   C. Lever
Internet-Draft                                                    Oracle
Intended status: Standards Track                             4 July 2020
Expires: 5 January 2021

Network File System (NFS) Upper-Layer Binding To RPC-Over-RDMA Version 2
                     draft-ietf-nfsv4-nfs-ulb-v2-02

Abstract

   This document specifies Upper-Layer Bindings of Network File System
   (NFS) protocol versions to RPC-over-RDMA version 2.

Note

   Discussion of this draft takes place on the NFSv4 working group
   mailing list (nfsv4@ietf.org), which is archived at
   https://mailarchive.ietf.org/arch/browse/nfsv4/.  Working Group
   information can be found at
   https://datatracker.ietf.org/wg/nfsv4/about/.

   This note is to be removed before publishing as an RFC.

   The source for this draft is maintained in GitHub.  Suggested changes
   can be submitted as pull requests at
   https://github.com/chucklever/i-d-nfs-ulb-v2.  Instructions are on
   that page as well.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 5 January 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Requirements Language
   3.  Reply Size Estimation
   4.  Upper-Layer Binding for NFS Versions 2 and 3
     4.1.  Reply Size Estimation
     4.2.  RPC Binding Considerations
     4.3.  Transport Considerations
   5.  Upper-Layer Bindings for NFS Version 2 and 3 Auxiliary Protocols
     5.1.  MOUNT, NLM, and NSM Protocols
     5.2.  NFSACL Protocol
   6.  Upper-Layer Binding For NFS Version 4
     6.1.  DDP-Eligibility
     6.2.  Reply Size Estimation
     6.3.  RPC Binding Considerations
     6.4.  NFS COMPOUND Requests
     6.5.  NFS Callback Requests
     6.6.  Session-Related Considerations
     6.7.  Transport Considerations
   7.  Extending NFS Upper-Layer Bindings
   8.  Security Considerations
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Acknowledgments
   Author's Address

1.  Introduction

   The RPC-over-RDMA version 2 transport may employ direct data
   placement to convey data payloads associated with RPC transactions
   [I-D.ietf-nfsv4-rpcrdma-version-two].  RPC client and server
   implementations using RPC-over-RDMA version 2 must agree which XDR
   data items and RPC procedures are eligible to use direct data
   placement (DDP) to ensure successful interoperation.

   An Upper-Layer Binding specifies this agreement for one or more
   versions of one RPC program.  Other operational details, such as RPC
   binding assignments, pairing Write chunks with result data items, and
   reply size estimation, are also specified by this Binding.

   This document contains material required of Upper-Layer Bindings, as
   specified in [I-D.ietf-nfsv4-rpcrdma-version-two], for the following
   NFS protocol versions:

   *  NFS version 2 [RFC1094]

   *  NFS version 3 [RFC1813]

   *  NFS version 4.0 [RFC7530]

   *  NFS version 4.1 [RFC5661]

   *  NFS version 4.2 [RFC7862]

   The current document also provides Upper-Layer Bindings for auxiliary
   protocols used with NFS versions 2 and 3 (see Section 5).

   This document assumes the reader is already familiar with concepts
   and terminology defined in [I-D.ietf-nfsv4-rpcrdma-version-two] and
   the documents it references.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Reply Size Estimation

   During the construction of each RPC Call message, a Requester is
   responsible for allocating appropriate resources for receiving the
   corresponding Reply message.  If the Requester expects that the RPC
   Reply message could be larger than its inline threshold, it MAY
   provide Write chunks wherein the Responder can place results and
   Reply chunks wherein the Responder can place the reply's Payload
   stream.  A message continuation facility is also available in
   RPC-over-RDMA version 2 to convey RPC messages that are larger than
   the transport's inline threshold.

4.  Upper-Layer Binding for NFS Versions 2 and 3

   The Upper-Layer Binding specification in this section applies to NFS
   version 2 [RFC1094] and NFS version 3 [RFC1813].  For brevity, in
   this document, a "Legacy NFS client" refers to an NFS client using
   version 2 or version 3 of the NFS RPC program (100003) to communicate
   with an NFS server.  Likewise, a "Legacy NFS server" is an NFS server
   communicating with clients using NFS version 2 or NFS version 3.

   The following XDR data items in NFS versions 2 and 3 are
   DDP-eligible:

   *  The opaque file data argument in the NFS WRITE procedure

   *  The pathname argument in the NFS SYMLINK procedure

   *  The opaque file data result in the NFS READ procedure

   *  The pathname result in the NFS READLINK procedure

   All other argument or result data items in NFS versions 2 and 3 are
   not DDP-eligible.

   Whether or not an NFS operation is considered non-idempotent, a
   transport error might not indicate whether the server has processed
   the arguments of the RPC Call, or whether the server has accessed or
   modified client memory associated with that RPC.

4.1.  Reply Size Estimation

   A Legacy NFS client determines the maximum reply size for each
   operation using the criteria outlined in Section 3.

4.2.  RPC Binding Considerations

   Legacy NFS servers traditionally listen for clients on UDP and TCP
   port 2049.  Additionally, they register these ports with a local
   portmapper service [RFC1833].

   A Legacy NFS server supporting RPC-over-RDMA version 2 and
   registering itself with the RPC portmapper MAY choose an arbitrary
   port, or MAY use the alternative well-known port number for its
   RPC-over-RDMA service (see Section 9).  The chosen port MAY be
   registered with the RPC portmapper using the netids assigned in
   [I-D.ietf-nfsv4-rpcrdma-version-two].

4.3.  Transport Considerations

   Legacy NFS client implementations often rely on a transport-layer
   keep-alive mechanism to detect when a legacy server has become
   unresponsive.  When an NFS server is no longer responsive,
   client-side keep-alive terminates the connection, which in turn
   triggers reconnection and retransmission of outstanding RPC
   transactions.

4.3.1.  Keep-Alive

   Some RDMA transports (such as the Reliable Connected QP type on
   InfiniBand) have no keep-alive mechanism.  Without a disconnect or
   new RPC traffic, such connections can remain alive long after an NFS
   server has become unresponsive or unreachable.  Once an NFS client
   has consumed all available RPC-over-RDMA version 2 credits on that
   transport connection, it awaits a reply indefinitely before sending
   another RPC request.

   Legacy NFS clients SHOULD reserve one RPC-over-RDMA version 2 credit
   to use for periodic server or connection health assessment.  Either
   peer can use this credit to drive an RPC request on an otherwise idle
   connection, triggering either an affirmative server response or a
   connection termination.

4.3.2.  Replay Detection

   Legacy NFS servers typically employ request replay detection to
   reduce the risk of data corruption that could result when an NFS
   client retransmits a non-idempotent NFS request.  A legacy NFS server
   can send a cached response when a replay is detected, rather than
   executing the request again.  Replay detection is not perfect, but it
   is usually adequate.
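
   The cached-response behavior described above can be illustrated with
   a minimal sketch.  It is not taken from any real NFS server.  The
   class name, method name, and cache key are assumptions made for
   illustration only.  The key is built from the client address, the
   XID, and the request header bytes, and it leaves out the connection
   source port, for the reasons the remainder of this section gives.

```python
from typing import Callable, Dict, Tuple

# Hypothetical cache key: client address, XID, and the raw RPC/NFS header
# bytes. The connection's source port is intentionally not part of the key.
ReplayKey = Tuple[str, int, bytes]


class ReplayCache:
    """Illustrative duplicate-request cache; unbounded for brevity."""

    def __init__(self) -> None:
        self._replies: Dict[ReplayKey, bytes] = {}

    def handle(self, client_addr: str, xid: int, headers: bytes,
               execute: Callable[[], bytes]) -> bytes:
        """Return the cached reply for a replay, else execute and cache."""
        key = (client_addr, xid, headers)
        if key in self._replies:
            return self._replies[key]      # replay: do not execute again
        reply = execute()                  # first arrival: run the request
        self._replies[key] = reply
        return reply
```

   A production cache would also bound its size and age out old
   entries; both are omitted here to keep the sketch short.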

   For legacy NFS servers, replay detection commonly utilizes heuristic
   indicators such as the IP address of the NFS client, the source port
   of the connection, the transaction ID of the request, and the
   contents of the request's RPC and upper-layer protocol headers.  In
   short, replay detection is typically based on a connection tuple and
   the request's XID.  A legacy NFS client is careful to re-use the same
   source port, if practical, when reconnecting so that legacy NFS
   servers are better able to detect retransmissions.

   However, a legacy NFS client operating over an RDMA transport has no
   control over connection source ports.  It is almost certain that an
   RPC request that is retransmitted on a new connection can never be
   detected as a replay if the legacy NFS server includes the connection
   source port in its replay detection heuristics.

   Therefore, a legacy NFS server using an RDMA transport should never
   use a legacy NFS client connection's source port as part of its NFS
   request replay detection mechanism.

5.  Upper-Layer Bindings for NFS Version 2 and 3 Auxiliary Protocols

   Storage administrators typically deploy NFS versions 2 and 3 with
   several other protocols, sometimes referred to as "NFS auxiliary
   protocols."  These are distinct RPC programs that define procedures
   that are not part of the NFS RPC program (100003).  The Upper-Layer
   Bindings in this section apply to:

   *  Versions 2 and 3 of the MOUNT RPC program (100005) [RFC1813]

   *  Versions 1, 3, and 4 of the NLM RPC program (100021) [RFC1813]

   *  Version 1 of the NSM RPC program (100024), described in Chapter 11
      of [XNFS]

   *  Version 1 of the NFSACL RPC program (100227), which does not have
      a public definition.  NFSACL is treated in this document as a de
      facto standard, as there are several interoperating
      implementations.

5.1.  MOUNT, NLM, and NSM Protocols

   Historically, NFS/RDMA implementations have chosen to convey the
   MOUNT, NLM, and NSM protocols via TCP.  A legacy NFS server
   implementation MUST provide support for these protocols via TCP to
   enable interoperation of these protocols when NFS/RDMA is in use.

5.2.  NFSACL Protocol

   Often legacy clients and servers that support the NFSACL RPC program
   convey NFSACL procedures on the same connection as the NFS RPC
   program (100003).  Utilizing the same connection obviates the need
   for separate rpcbind queries to discover server support for this RPC
   program.

   ACLs are typically small, but even large ACLs must be encoded and
   decoded to some degree before being made available to users.  Thus no
   data item in this Upper-Layer Protocol is DDP-eligible.

   For procedures whose replies do not include an ACL object, the size
   of a reply is determined directly from the NFSACL RPC program's XDR
   definition.  Legacy client implementations should choose a maximum
   size for ACLs based on internal limits.

6.  Upper-Layer Binding For NFS Version 4

   The Upper-Layer Binding specification in this section applies to
   versions of the NFS RPC program defined in NFS version 4.0 [RFC7530],
   NFS version 4.1 [RFC5661], and NFS version 4.2 [RFC7862].

6.1.  DDP-Eligibility

   Only the following XDR data items in the COMPOUND procedure of all
   NFS version 4 minor versions are DDP-eligible:

   *  The opaque data field in the WRITE4args structure

   *  The linkdata field of the NF4LNK arm in the createtype4 union

   *  The opaque data field in the READ4resok structure

   *  The linkdata field in the READLINK4resok structure

6.1.1.  The NFSv4.2 READ_PLUS operation

   NFS version 4.2 introduces an enhanced READ operation called
   READ_PLUS [RFC7862].  READ_PLUS enables an NFS server to perform
   inline data reduction of READ results so that the returned READ data
   is more compact.

   In a READ_PLUS result, returned file content appears as a list of one
   or more of the following items:

   *  Regular data content: the same as the result of a traditional READ
      operation.

   *  Unallocated space in a file: where no data has yet been written or
      previously-written data has been removed via a hole-punch
      operation.

   *  A counted pattern.

   Upon receipt of a READ_PLUS result, an NFSv4.2 client expands the
   returned list into a preferred local representation of the original
   file content.

   Before receiving that result, an NFSv4.2 client typically does not
   know how the file's content is organized on the NFS server.  Thus it
   is not possible to predict the size or structure of a READ_PLUS Reply
   in advance.  The use of direct data placement is therefore
   challenging.

   A READ_PLUS content list containing more than one segment of regular
   file data could be conveyed using multiple Write chunks, but only if
   the client knows in advance where those chunks appear in the Reply
   Payload stream.  Moreover, the usual benefits of hardware-assisted
   data placement are entirely waived if the client-side transport must
   parse the result of each read I/O.

   Therefore, this Upper-Layer Binding does not make any element of an
   NFSv4.2 READ_PLUS Reply DDP-eligible.  Further, this Upper-Layer
   Binding recommends that implementers disable the use of the READ_PLUS
   operation on NFS/RDMA mount points.

6.2.  Reply Size Estimation

   Within NFS version 4, there are certain variable-length result data
   items whose maximum size cannot be estimated by clients reliably
   because there is no protocol-specified size limit on these result
   arrays.  These include:

   *  The attrlist4 field

   *  Fields containing ACLs such as fattr4_acl, fattr4_dacl, and
      fattr4_sacl

   *  Fields in the fs_locations4 and fs_locations_info4 data structures

   *  Fields which pertain to pNFS layout metadata, such as loc_body,
      loh_body, da_addr_body, lou_body, lrf_body, fattr_layout_types,
      and fs_layout_types

6.2.1.  Reply Size Estimation for Minor Version 0

   The NFS version 4.0 protocol itself does not impose any bound on the
   size of NFS calls or replies.

   Some of the data items enumerated in Section 6.2 (in particular, the
   items related to ACLs and fs_locations) make it difficult to predict
   the maximum size of NFS version 4.0 replies that interrogate
   variable-length fattr4 attributes.  Client implementations might rely
   upon internal architectural limits to constrain the reply size, but
   such limits are not always guaranteed to be reliable.

   When an NFS version 4.0 client expects an especially sizeable fattr4
   result, it can provide a Reply chunk to enable that server to return
   that result via explicit RDMA.  An NFS version 4.0 client can use
   short Reply chunk retry when an NFS COMPOUND containing a GETATTR
   operation encounters a transport error.

6.2.2.  Reply Size Estimation for Minor Version 1 and Newer

   In NFS version 4.1 and newer minor versions, the csa_fore_chan_attrs
   argument of the CREATE_SESSION operation contains a
   ca_maxresponsesize field.  The value in this field can be taken as
   the absolute maximum size of replies generated by an NFS version 4.1
   server.

   An NFS version 4 client can use this value in cases where it is not
   possible to estimate a reply size upper bound precisely.  In
   practice, objects such as ACLs, named attributes, layout bodies, and
   security labels are much smaller than this maximum.

6.3.  RPC Binding Considerations

   NFS version 4 servers are required to listen on TCP port 2049, and
   they are not required to register with an rpcbind service [RFC7530].

   Therefore, an NFS version 4 server supporting RPC-over-RDMA version 2
   MUST use the alternative well-known port number for its RPC-over-RDMA
   service (see Section 9).  Clients SHOULD connect to this well-known
   port without consulting the RPC portmapper (as for NFS version 4 on
   TCP transports).

6.4.  NFS COMPOUND Requests

6.4.1.  Multiple DDP-eligible Data Items

   An NFS version 4 COMPOUND procedure can contain more than one
   operation that carries a DDP-eligible data item.  An NFS version 4
   client provides XDR Position values in each Read chunk to
   disambiguate which chunk is associated with which argument data item.
   However, NFS version 4 server and client implementations must agree
   in advance on how to pair Write chunks with returned result data
   items.

   In the following lists, a "READ operation" refers to any NFS version
   4 operation that has a DDP-eligible result data item.  An NFS version
   4 client applies the mechanism specified in Section 4.3.2 of
   [I-D.ietf-nfsv4-rpcrdma-version-two] to this class of operations as
   follows:

   *  If an NFS version 4 client wishes all DDP-eligible items in an NFS
      reply to be conveyed inline, it leaves the Write list empty.

   An NFS version 4 server applies that mechanism as follows:

   *  The first chunk in the Write list MUST be used by the first READ
      operation in an NFS version 4 COMPOUND procedure.  The next Write
      chunk is used by the next READ operation, and so on.

   *  If an NFS version 4 client has provided a matching non-empty Write
      chunk, then the corresponding READ operation MUST return its
      DDP-eligible data item using that chunk.

   *  If an NFS version 4 client has provided an empty matching Write
      chunk, then the corresponding READ operation MUST return all of
      its result data items inline.

   *  If a READ operation returns a union arm which does not contain a
      DDP-eligible result, and the NFS version 4 client has provided a
      matching non-empty Write chunk, an NFS version 4 server MUST
      return an empty Write chunk in that Write list position.

   *  If there are more READ operations than Write chunks, then the
      remaining READ operations in an NFS version 4 COMPOUND that have
      no matching Write chunk MUST return their results inline.

6.4.2.  Chunk List Complexity

   The RPC-over-RDMA version 2 protocol does not place any limit on the
   number of chunks or segments that may appear in Read or Write lists.
   However, for various reasons, NFS version 4 server implementations
   often have practical limits on the number of chunks or segments they
   can process in a single RPC transaction conveyed via RPC-over-RDMA
   version 2.

   These implementation limits are especially important when Kerberos
   integrity or privacy is in use [RFC7861].  GSS services increase the
   size of credential material in RPC headers, potentially requiring the
   use of a Long message, which increases the complexity of chunk lists
   independent of the particular NFS version 4 COMPOUND being conveyed.

   In the absence of explicit knowledge of the server's limits, NFS
   version 4 clients SHOULD follow the prescriptions listed below when
   constructing RPC-over-RDMA version 2 messages.  NFS version 4 servers
   MUST accept and process all such requests.

   *  The Read list can contain either a Position-Zero Read chunk, one
      Read chunk with a non-zero Position, or both.

   *  The Write list can contain no more than one Write chunk.

   *  Any chunk can contain up to sixteen RDMA segments.
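
   The server-side pairing rules in Section 6.4.1 and the client
   prescriptions listed above can be sketched as follows.  This is an
   illustrative model only.  The Chunk type, both function names, and
   the string labels are assumptions introduced here, and a chunk is
   reduced to a segment count and an XDR Position rather than real
   memory registrations.

```python
from dataclasses import dataclass
from typing import List

MAX_SEGMENTS = 16  # "Any chunk can contain up to sixteen RDMA segments"


@dataclass
class Chunk:
    segments: int       # number of RDMA segments; 0 models an empty chunk
    position: int = 0   # XDR Position (meaningful for Read chunks only)


def within_prescriptions(read_list: List[Chunk],
                         write_list: List[Chunk]) -> bool:
    """Check the three client prescriptions from Section 6.4.2."""
    zero = sum(1 for c in read_list if c.position == 0)
    nonzero = len(read_list) - zero
    if zero > 1 or nonzero > 1:      # at most one Read chunk of each kind
        return False
    if len(write_list) > 1:          # no more than one Write chunk
        return False
    return all(c.segments <= MAX_SEGMENTS for c in read_list + write_list)


def pair_write_chunks(has_ddp_result: List[bool],
                      write_list: List[Chunk]) -> List[str]:
    """Apply the Section 6.4.1 server rules to each READ operation in order.

    has_ddp_result[i] is False when the i-th READ operation returns a union
    arm with no DDP-eligible result (for example, an error status).
    """
    plan = []
    for i, ddp in enumerate(has_ddp_result):
        if i >= len(write_list):
            plan.append("inline")                 # more READs than chunks
        elif write_list[i].segments == 0:
            plan.append("inline")                 # client sent an empty chunk
        elif ddp:
            plan.append("chunk")                  # MUST use the matching chunk
        else:
            plan.append("inline, chunk returned empty")
    return plan
```

   For instance, three READ operations paired with a Write list whose
   second chunk is empty yield "chunk", "inline", "chunk".  Such a
   three-chunk Write list exceeds the single Write chunk prescription,
   which applies only when a client lacks explicit knowledge of the
   server's limits.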

   NFS version 4 clients wishing to send more complex chunk lists can
   provide configuration interfaces to bound the complexity of NFS
   version 4 COMPOUNDs, limit the number of elements in scatter-gather
   operations, and avoid other sources of chunk overruns at the
   receiving peer.

   If an NFS version 4 server receives an RPC request via RPC-over-RDMA
   version 2 that it cannot process due to chunk list complexity limits,
   it SHOULD return one of the following responses to the client:

   *  A problem is detected by the transport layer while parsing the
      transport header in an RPC Call message.  The server responds with
      an RDMA2_ERROR message with the err field set to ERR_CHUNK.

   *  A problem is detected during XDR decoding of the RPC Call message
      while the RPC layer reassembles the call's XDR stream.  The server
      responds with an RPC reply with its "reply_stat" field set to
      MSG_ACCEPTED and its "accept_stat" field set to GARBAGE_ARGS.

   After receiving one of these errors, an NFS version 4 client SHOULD
   NOT retransmit the failing request, as the result would be the same
   error.  It SHOULD immediately terminate the RPC transaction
   associated with the XID in the reply.

6.4.3.  NFS Version 4 COMPOUND Example

   The following example shows a Write list with three Write chunks, A,
   B, and C.  The NFS version 4 server consumes the provided Write
   chunks by writing the results of the designated operations in the
   compound request (READ and READLINK) back to each chunk.

   Write list:

      A --> B --> C

   NFS version 4 COMPOUND request:

      PUTFH LOOKUP READ PUTFH LOOKUP READLINK PUTFH LOOKUP READ
                    |                   |                   |
                    v                   v                   v
                    A                   B                   C

   If the NFS version 4 client does not want to have the READLINK result
   returned via RDMA, it provides an empty Write chunk for buffer B to
   indicate that the READLINK result must be returned inline.

6.5.  NFS Callback Requests

   The NFS version 4 family of protocols supports server-initiated
   callbacks to notify NFS version 4 clients of events such as recalled
   delegations.

6.5.1.  NFS Version 4.0 Callback

   NFS version 4.0 implementations typically employ a separate TCP
   connection to handle callback operations, even when the forward
   channel uses an RPC-over-RDMA version 2 transport.

   No operation in the NFS version 4.0 callback RPC program conveys a
   data payload of significant size.  Therefore, no XDR data item in
   this RPC program is DDP-eligible.

   A CB_RECALL reply is small and fixed in size.  The CB_GETATTR reply
   contains a variable-length fattr4 data item.  See Section 6.2.1 for a
   discussion of reply size prediction for this data item.

   An NFS version 4.0 client advertises netids and ad hoc port addresses
   for contacting its NFS version 4.0 callback service using the
   SETCLIENTID operation.

6.5.2.  NFS Version 4.1 Callback

   In NFS version 4.1 and newer minor versions, callback operations may
   appear on the same connection as is used for NFS version 4 forward
   channel client requests.  NFS version 4 clients and servers MUST use
   the approach described in [RFC8167] to convey backchannel operations
   on an RPC-over-RDMA version 2 transport.

   The csa_back_chan_attrs argument of the CREATE_SESSION operation
   contains a ca_maxresponsesize field.  The value in this field is the
   absolute maximum size of backchannel replies generated by a replying
   NFS version 4 client.

   There are no DDP-eligible data items in callback procedures defined
   in NFS version 4.1 or NFS version 4.2.  However, some callback
   operations, such as messages that convey device ID information, can
   be sizeable.  A sender can use Message Continuation or a Long message
   in this situation.

   When an NFS version 4.1 client can support Long Calls in its
   backchannel, it reports a backchannel ca_maxrequestsize that is
   larger than the connection's inline thresholds.  Otherwise, an NFS
   version 4 server MUST use only Short messages to convey backchannel
   operations.

6.6.  Session-Related Considerations

   The presence of an NFS version 4 session (as defined in [RFC5661])
   does not affect the operation of RPC-over-RDMA version 2.  None of
   the operations introduced to support NFS sessions (e.g., the SEQUENCE
   operation) contain DDP-eligible data items.  There is no need to
   match the number of session slots with the number of available
   RPC-over-RDMA version 2 credits.

   However, there are a few new cases where an RPC transaction can fail.
   For example, a Requester might receive, in response to an RPC
   request, an RDMA2_ERROR message with an rdma_err value of ERR_CHUNK.
   These situations are not different from existing RPC errors, which an
   NFS session implementation can already handle for other transport
   types.  Moreover, there might be no SEQUENCE result available to the
   Requester to distinguish whether failure occurred before or after the
   Responder executed the requested operations.

   When a transport error occurs (e.g., an RDMA2_ERROR type message is
   received), the Requester proceeds, as usual, to match the incoming
   XID value to a waiting RPC Call.  The Requester terminates the RPC
   transaction and reports the result status to the RPC consumer.  The
   Requester's session implementation then determines the session ID and
   slot for the failed request and performs slot recovery to make that
   slot usable again.  Otherwise, that slot could be rendered
   permanently unavailable.
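
   The sequence of steps in the preceding paragraph can be sketched as
   follows.  All names here (PendingCall, Requester, slots_to_recover)
   are hypothetical and do not correspond to any real RPC library.  The
   slot recovery procedure itself belongs to the NFS session
   implementation and is not modeled.  The sketch only records which
   session slot needs recovery.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass
class PendingCall:
    xid: int
    session_id: bytes          # session the request was sent on
    slot_id: int               # slot consumed by the request's SEQUENCE
    status: Optional[str] = None


class Requester:
    """Hypothetical requester-side bookkeeping; not a real RPC library API."""

    def __init__(self) -> None:
        self.waiting: Dict[int, PendingCall] = {}
        self.slots_to_recover: Set[Tuple[bytes, int]] = set()

    def send(self, call: PendingCall) -> None:
        self.waiting[call.xid] = call

    def on_transport_error(self, xid: int, err: str) -> Optional[PendingCall]:
        call = self.waiting.pop(xid, None)   # match XID to a waiting Call
        if call is None:
            return None                      # no such transaction; ignore
        call.status = err                    # reported to the RPC consumer
        # Queue the slot for recovery so it does not stay unusable.
        self.slots_to_recover.add((call.session_id, call.slot_id))
        return call
```

   Recording the session ID and slot alongside each waiting Call is one
   way to find the affected slot when no SEQUENCE result is available.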
599 When an NFS session is not present (for example, when NFS version 4.0 600 is in use), a transport error does not indicate whether the server 601 has processed the arguments of the RPC Call, or whether the server 602 has accessed or modified client memory associated with that RPC. 604 6.7. Transport Considerations 606 6.7.1. Congestion Avoidance 608 Section 3.1 of [RFC7530] states: 610 Where an NFS version 4 implementation supports operation over the 611 IP network protocol, the supported transport layer between NFS and 612 IP MUST be an IETF standardized transport protocol that is 613 specified to avoid network congestion; such transports include TCP 614 and the Stream Control Transmission Protocol (SCTP). 616 Section 2.9.1 of [RFC5661] further states: 618 Even if NFS version 4.1 is used over a non-IP network protocol, it 619 is RECOMMENDED that the transport support congestion control. 621 It is permissible for a connectionless transport to be used under 622 NFS version 4.1; however, reliable and in-order delivery of data 623 combined with congestion control by the connectionless transport 624 is REQUIRED. As a consequence, UDP by itself MUST NOT be used as 625 an NFS version 4.1 transport. 627 RPC-over-RDMA version 2 utilizes only RDMA Reliable Connected QP type 628 connections [I-D.ietf-nfsv4-rpcrdma-version-two]. RDMA Reliable 629 Connected QPs are reliable, connection-oriented transports that 630 guarantee in-order delivery, meeting all the above requirements. 632 6.7.2. Retransmission and Keep-alive 634 NFS version 4 client implementations often rely on a transport-layer 635 keep-alive mechanism to detect when an NFS version 4 server has 636 become unresponsive. When an NFS server is no longer responsive, 637 client-side keep-alive terminates the connection, which in turn 638 triggers reconnection and RPC retransmission. 640 Some RDMA transports (such as the Reliable Connected QP type on 641 InfiniBand) have no keep-alive mechanism. 
   Without a disconnect or new RPC traffic, such connections can remain
   alive long after an NFS server has become unresponsive.  Once an NFS
   client has consumed all available RPC-over-RDMA version 2 credits on
   that transport connection, it waits indefinitely for a reply before
   sending another RPC request.

   NFS version 4 clients SHOULD reserve one RPC-over-RDMA version 2
   credit to use for periodic server or connection health assessment.
   Either peer can use this credit to drive an RPC request on an
   otherwise idle connection, triggering either a quick affirmative
   server response or immediate connection termination.

   In addition to network partition and request loss scenarios,
   RPC-over-RDMA version 2 transport connections can be terminated when
   a Transport header is malformed, when Reply messages exceed receive
   resources, or when too many RPC-over-RDMA messages are sent at once.
   In such cases:

   *  If a transport error occurs (e.g., an RDMA2_ERROR type message is
      received) before the disconnect or instead of a disconnect, the
      Requester MUST respond to that error as prescribed by the
      specification of the RPC transport.  Then the NFS version 4 rules
      for handling retransmission apply.

   *  If there is a transport disconnect and the Responder has provided
      no other response for a request, then only the NFS version 4 rules
      for handling retransmission apply.

7.  Extending NFS Upper-Layer Bindings

   RPC programs such as NFS are required to have an Upper-Layer Binding
   specification to interoperate on RPC-over-RDMA version 2 transports
   [I-D.ietf-nfsv4-rpcrdma-version-two].  Via standards action, the
   Upper-Layer Binding specified in this document can be extended to
   cover versions of the NFS version 4 protocol specified after NFS
   version 4 minor version 2, or to cover separately published
   extensions to an existing NFS version 4 minor version, as described
   in [RFC8178].

8.  Security Considerations

   RPC-over-RDMA version 2 supports all RPC security models, including
   RPCSEC_GSS security and transport-level security [RFC7861].  The
   choice of which Direct Data Placement mechanism conveys RPC arguments
   and results does not affect this, since it changes only the method of
   data transfer.  Because the current document defines only the binding
   of the NFS protocols atop [I-D.ietf-nfsv4-rpcrdma-version-two], all
   relevant security considerations are described at that layer.

9.  IANA Considerations

   The use of direct data placement in NFS introduces a need for an
   additional port number assignment for networks that share traditional
   UDP and TCP port spaces with RDMA services.  The iWARP protocol is
   one such example [RFC5040] [RFC5041].

   For this purpose, the current document specifies a set of transport
   protocol port number assignments.  IANA has assigned the following
   ports for NFS/RDMA in the IANA port registry, according to the
   guidelines described in [RFC6335].

      nfsrdma  20049/tcp   Network File System (NFS) over RDMA
      nfsrdma  20049/udp   Network File System (NFS) over RDMA
      nfsrdma  20049/sctp  Network File System (NFS) over RDMA

   The current document should be added as a reference for the nfsrdma
   port assignments.  The current document does not alter these
   assignments.

10.  References

10.1.  Normative References

   [I-D.ietf-nfsv4-rpcrdma-version-two]
              Lever, C. and D. Noveck, "RPC-over-RDMA Version 2
              Protocol", Work in Progress, Internet-Draft,
              draft-ietf-nfsv4-rpcrdma-version-two-02, 3 July 2020.

   [RFC1833]  Srinivasan, R., "Binding Protocols for ONC RPC Version 2",
              RFC 1833, DOI 10.17487/RFC1833, August 1995.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5661]  Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
              "Network File System (NFS) Version 4 Minor Version 1
              Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010.

   [RFC6335]  Cotton, M., Eggert, L., Touch, J., Westerlund, M., and S.
              Cheshire, "Internet Assigned Numbers Authority (IANA)
              Procedures for the Management of the Service Name and
              Transport Protocol Port Number Registry", BCP 165,
              RFC 6335, DOI 10.17487/RFC6335, August 2011.

   [RFC7530]  Haynes, T., Ed. and D. Noveck, Ed., "Network File System
              (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530,
              March 2015.

   [RFC7861]  Adamson, A. and N. Williams, "Remote Procedure Call (RPC)
              Security Version 3", RFC 7861, DOI 10.17487/RFC7861,
              November 2016.

   [RFC7862]  Haynes, T., "Network File System (NFS) Version 4 Minor
              Version 2 Protocol", RFC 7862, DOI 10.17487/RFC7862,
              November 2016.

   [RFC8167]  Lever, C., "Bidirectional Remote Procedure Call on
              RPC-over-RDMA Transports", RFC 8167,
              DOI 10.17487/RFC8167, June 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

10.2.  Informative References

   [RFC1094]  Nowicki, B., "NFS: Network File System Protocol
              specification", RFC 1094, DOI 10.17487/RFC1094, March
              1989.

   [RFC1813]  Callaghan, B., Pawlowski, B., and P. Staubach, "NFS
              Version 3 Protocol Specification", RFC 1813,
              DOI 10.17487/RFC1813, June 1995.

   [RFC5040]  Recio, R., Metzler, B., Culley, P., Hilland, J., and D.
              Garcia, "A Remote Direct Memory Access Protocol
              Specification", RFC 5040, DOI 10.17487/RFC5040, October
              2007.

   [RFC5041]  Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct
              Data Placement over Reliable Transports", RFC 5041,
              DOI 10.17487/RFC5041, October 2007.

   [RFC8178]  Noveck, D., "Rules for NFSv4 Extensions and Minor
              Versions", RFC 8178, DOI 10.17487/RFC8178, July 2017.

   [XNFS]     The Open Group, "Protocols for Interworking: XNFS, Version
              3W", February 1998.

Acknowledgments

   Thanks to Tom Talpey, who contributed the text of Section 6.4.2.
   David Noveck contributed the text of Section 6.6 and Section 7.  The
   author also wishes to thank Bill Baker and Greg Marsden for their
   support of this work.

   Special thanks go to Transport Area Director Magnus Westerlund, NFSV4
   Working Group Chairs Spencer Shepler, Brian Pawlowski, and David
   Noveck, and NFSV4 Working Group Secretary Thomas Haynes for their
   support.

Author's Address

   Charles Lever
   Oracle Corporation
   United States of America

   Email: chuck.lever@oracle.com