Network File System Version 4                              C. Lever, Ed.
Internet-Draft                                                    Oracle
Obsoletes: 5667 (if approved)                              March 9, 2017
Intended status: Standards Track
Expires: September 10, 2017

 Network File System (NFS) Upper Layer Binding To RPC-Over-RDMA Version
                                  One
                     draft-ietf-nfsv4-rfc5667bis-07

Abstract

   This document specifies Upper Layer Bindings of Network File System
   (NFS) protocol versions to RPC-over-RDMA Version One, enabling the
   use of Direct Data Placement.  This document obsoletes RFC 5667.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.
   The list of current Internet-Drafts is at
   http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 10, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
   2.  Reply Size Estimation
     2.1.  Short Reply Chunk Retry
   3.  Upper Layer Binding for NFS Versions 2 and 3
     3.1.  Reply Size Estimation
     3.2.  RPC Binding Considerations
   4.  Upper Layer Bindings for NFS Version 2 and 3 Auxiliary
       Protocols
     4.1.  MOUNT, NLM, and NSM Protocols
     4.2.  NFSACL Protocol
   5.  Upper Layer Binding For NFS Version 4
     5.1.  DDP-Eligibility
     5.2.  Reply Size Estimation
     5.3.  RPC Binding Considerations
     5.4.  NFS COMPOUND Requests
     5.5.  NFS Callback Requests
     5.6.  Session-Related Considerations
     5.7.  Transport Considerations
   6.  Extending NFS Upper Layer Bindings
   7.  Security Considerations
   8.  IANA Considerations
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Changes Since RFC 5667
   Appendix B.  Acknowledgments
   Author's Address

1.  Introduction

   The RPC-over-RDMA Version One transport may employ direct data
   placement to convey data payloads associated with RPC transactions
   [I-D.ietf-nfsv4-rfc5666bis].
   To enable successful interoperation, RPC client and server
   implementations using RPC-over-RDMA Version One must agree which XDR
   data items and RPC procedures are eligible to use direct data
   placement (DDP).

   An Upper Layer Binding specifies this agreement for one RPC Program.
   Other operational details, such as RPC binding assignments, pairing
   Write chunks with result data items, and reply size estimation, are
   also specified by this Binding.

   This document contains material required of Upper Layer Bindings, as
   specified in [I-D.ietf-nfsv4-rfc5666bis], for the following NFS
   protocol versions:

   o  NFS Version 2 [RFC1094]

   o  NFS Version 3 [RFC1813]

   o  NFS Version 4.0 [RFC7530]

   o  NFS Version 4.1 [RFC5661]

   o  NFS Version 4.2 [RFC7862]

   Upper Layer Bindings are also provided for auxiliary protocols used
   with NFS versions 2 and 3.

   This document assumes the reader is already familiar with concepts
   and terminology defined in [I-D.ietf-nfsv4-rfc5666bis] and the
   documents it references.

2.  Reply Size Estimation

   During the construction of each RPC Call message, a requester is
   responsible for allocating appropriate resources for receiving the
   corresponding Reply message.  If the requester expects the RPC Reply
   message will be larger than its inline threshold, it provides Write
   and/or Reply chunks wherein the responder can place results and the
   reply's Payload stream.

   A reply resource overrun occurs if the RPC Reply Payload stream does
   not fit into the provided Reply chunk, or no Reply chunk was provided
   and the Payload stream does not fit inline.  This prevents the
   responder from returning the Upper Layer reply to the requester.

   Therefore reliable reply size estimation is necessary to ensure
   successful interoperation.
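
   The overrun condition described above can be illustrated with a
   minimal sketch.  It is not taken from any implementation: the names
   ReplyResources, plan_reply_resources, and reply_overruns are
   hypothetical, sizes are plain byte counts, and Write chunks are
   ignored so that only the Reply chunk versus inline decision is shown.

```python
from dataclasses import dataclass


@dataclass
class ReplyResources:
    # Hypothetical record of what the requester provisioned.
    # A value of 0 means no Reply chunk was provided.
    reply_chunk_bytes: int


def plan_reply_resources(max_reply_bytes: int, inline_threshold: int) -> ReplyResources:
    """Provide a Reply chunk only when the estimated maximum reply
    would not fit within the inline threshold."""
    if max_reply_bytes <= inline_threshold:
        return ReplyResources(reply_chunk_bytes=0)
    return ReplyResources(reply_chunk_bytes=max_reply_bytes)


def reply_overruns(payload_bytes: int, resources: ReplyResources,
                   inline_threshold: int) -> bool:
    """True when the Payload stream fits neither the provided Reply
    chunk nor, if no Reply chunk was provided, the inline threshold."""
    if resources.reply_chunk_bytes > 0:
        return payload_bytes > resources.reply_chunk_bytes
    return payload_bytes > inline_threshold
```

   An estimate that is too small in either branch results in an overrun,
   which is why the estimate needs to be a true upper bound.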

   In most cases, the NFS protocol's XDR definition provides enough
   information to enable an NFS client to predict the maximum size of
   the expected Reply message.  If there are variable-size data items in
   the result, the maximum size of the RPC Reply message can be
   estimated as follows:

   o  The client requests only a specific portion of an object (for
      example, using the "count" and "offset" fields in an NFS READ).

   o  The client limits the number of results (e.g., using the "count"
      field of an NFS READDIR request).

   o  The client has already cached the size of the whole object it is
      about to request (say, via a previous NFS GETATTR request).

   o  The client and server have negotiated a maximum size for all calls
      and responses (using a CREATE_SESSION operation, for instance).

2.1.  Short Reply Chunk Retry

   In a few cases, either the size of one or more returned data items or
   the number of returned data items cannot be known in advance of
   forming an RPC Call.

   If an NFS server finds that the NFS client provided inadequate
   receive resources to return the whole reply, it returns an RPC-level
   error or a transport error, such as ERR_CHUNK.

   In response to these errors, an NFS client can choose to:

   o  Terminate the RPC transaction immediately with an error, or

   o  Allocate a larger Reply chunk and send the same request as a new
      RPC transaction (to avoid a hit in a Duplicate Reply Cache).  The
      NFS client should avoid retrying the request indefinitely because
      a responder may return ERR_CHUNK for a variety of reasons.

   Subsequent sections of this document discuss exactly which operations
   might have difficulty with Reply size estimation.  These operations
   are eligible for "short Reply chunk retry."  Unless explicitly
   mentioned as applicable, short Reply chunk retry should not be used.
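
   The client-side choices listed above can be sketched as a bounded
   retry loop.  This is an illustration under stated assumptions, not a
   prescribed algorithm: ChunkError stands in for a transport error such
   as ERR_CHUNK, send_rpc is a hypothetical callable that issues the
   request as a new RPC transaction each time it is invoked, and the
   doubling policy and attempt limit are arbitrary choices.

```python
class ChunkError(Exception):
    """Hypothetical stand-in for a transport error such as ERR_CHUNK."""


def call_with_short_reply_chunk_retry(send_rpc, initial_chunk_bytes,
                                      max_chunk_bytes, max_attempts=3):
    """Retry a request with a larger Reply chunk, a bounded number of times.

    send_rpc(chunk_bytes) is assumed to send the request as a NEW RPC
    transaction on every call, so a retry is not answered from a
    Duplicate Reply Cache. initial_chunk_bytes is assumed to be > 0.
    """
    chunk = initial_chunk_bytes
    for _ in range(max_attempts):
        try:
            return send_rpc(chunk)
        except ChunkError:
            if chunk >= max_chunk_bytes:
                break  # cannot grow further; give up
            # Growth policy is an assumption: double, capped at the limit.
            chunk = min(chunk * 2, max_chunk_bytes)
    # Bounded retries: ERR_CHUNK can have causes a larger chunk won't fix.
    raise ChunkError("reply did not fit after bounded retries")
```

   Capping both the chunk size and the number of attempts reflects the
   caution above that a responder may return ERR_CHUNK for reasons
   unrelated to the size of the Reply chunk.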

   NFS server implementations can avoid connection loss by first
   confirming that target RDMA segments are large enough to receive
   results before initiating explicit RDMA operations.

3.  Upper Layer Binding for NFS Versions 2 and 3

   The Upper Layer Binding specification in this section applies to NFS
   Version 2 [RFC1094] and NFS Version 3 [RFC1813].  For brevity, in
   this document a "Legacy NFS client" refers to an NFS client using the
   NFS version 2 or NFS version 3 RPC Programs (100003) to communicate
   with an NFS server.  Likewise, a "Legacy NFS server" is an NFS server
   communicating with clients using NFS version 2 or NFS version 3.

   The following XDR data items in NFS versions 2 and 3 are DDP-
   eligible:

   o  The opaque file data argument in the NFS WRITE procedure

   o  The pathname argument in the NFS SYMLINK procedure

   o  The opaque file data result in the NFS READ procedure

   o  The pathname result in the NFS READLINK procedure

   All other argument or result data items in NFS versions 2 and 3 are
   not DDP-eligible.

   A transport error does not give an indication of whether the server
   has processed the arguments of the RPC Call, or whether the server
   has accessed or modified client memory associated with that RPC.

3.1.  Reply Size Estimation

   A Legacy NFS client determines the maximum reply size for each
   operation using the criteria outlined in Section 2.  There are no
   operations in NFS version 2 or 3 that benefit from short Reply chunk
   retry.

3.2.  RPC Binding Considerations

   Legacy NFS servers traditionally listen for clients on UDP and TCP
   port 2049.  Additionally, they register these ports with a local
   portmapper [RFC1833] service.

   A Legacy NFS server supporting RPC-over-RDMA Version One on such a
   network and registering itself with the RPC portmapper MAY choose an
   arbitrary port, or MAY use the alternative well-known port number for
   its RPC-over-RDMA service (see Section 8).  The chosen port MAY be
   registered with the RPC portmapper under the netids assigned in
   [I-D.ietf-nfsv4-rfc5666bis].

4.  Upper Layer Bindings for NFS Version 2 and 3 Auxiliary Protocols

   NFS versions 2 and 3 are typically deployed with several other
   protocols, sometimes referred to as "NFS auxiliary protocols."  These
   are distinct RPC Programs that define procedures which are not part
   of the NFS version 2 or version 3 RPC Programs.  The Upper Layer
   Bindings in this section apply to:

   o  Versions 2 and 3 of the MOUNT protocol [RFC1813]

   o  Versions 1, 3, and 4 of the NLM protocol [RFC1813]

   o  Version 1 of the NSM protocol, described in Chapter 11 of [XNFS]

   o  Version 1 of the NFSACL protocol, which does not have a public
      definition.  NFSACL is treated in this document as a de facto
      standard, as there are several interoperating implementations.

4.1.  MOUNT, NLM, and NSM Protocols

   Typically MOUNT, NLM, and NSM are conveyed via TCP, even in
   deployments where the NFS RPC Program operates on RPC-over-RDMA
   Version One.

   No XDR data item in these protocols is DDP-eligible; therefore, a
   special port assignment for operation on RPC-over-RDMA is not
   necessary.  When a Legacy server supports these RPC Programs on RPC-
   over-RDMA Version One, it advertises an arbitrarily chosen service
   port address via the rpcbind service [RFC1833].

   The largest variable-length XDR data items in these protocols are
   defined in [XNFS]: LM_MAXSTRLEN is 1024 bytes, LM_MAXNAMELEN is
   LM_MAXSTRLEN + 1, and MAXNETOBJ_SZ is 1024 bytes.  Reply size
   estimation for these protocols uses the criteria outlined in
   Section 2.
   There are no operations in these protocols that benefit from short
   Reply chunk retry.

4.2.  NFSACL Protocol

   Legacy clients and servers that support the NFSACL RPC Program
   typically convey NFSACL procedures on the same connection as NFS RPC
   Programs.  This obviates the need for separate rpcbind queries to
   discover server support for this RPC Program.

   ACLs are typically small, but even large ACLs must be encoded and
   decoded to some degree.  Thus no data item in this Upper Layer
   Protocol is DDP-eligible.

   For procedures whose replies do not include an ACL object, the size
   of a reply is determined directly from the NFSACL RPC Program's XDR
   definition.

   There is no protocol-specified size limit for NFS version 3 ACLs, and
   there is no mechanism in either the NFSACL or NFS RPC Programs for a
   Legacy client to ascertain the largest ACL a Legacy server can
   return.  Legacy client implementations should choose a maximum size
   for ACLs based on their own internal limits.

   Because an NFSACL client cannot know in advance how large a returned
   ACL will be, it can use short Reply chunk retry when an NFSACL GETACL
   operation encounters a transport error.

5.  Upper Layer Binding For NFS Version 4

   The Upper Layer Binding specification in this section applies to RPC
   Programs defined in NFS Version 4.0 [RFC7530], NFS Version 4.1
   [RFC5661], and NFS Version 4.2 [RFC7862].

5.1.  DDP-Eligibility

   Only the following XDR data items in the COMPOUND procedure of all
   NFS version 4 minor versions are DDP-eligible:

   o  The opaque data field in the WRITE4args structure

   o  The linkdata field of the NF4LNK arm in the createtype4 union

   o  The opaque data field in the READ4resok structure

   o  The linkdata field in the READLINK4resok structure

   o  In minor version 2 and newer, the rpc_data field of the
      read_plus_content union (further restrictions on the use of this
      data item follow below).
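
   The list above can be expressed as a small lookup, shown here as a
   sketch only.  The table layout and the function name
   ddp_eligible_item are hypothetical.  Associating the createtype4
   union with the CREATE operation comes from the NFS version 4 XDR
   definition rather than from the list itself, and is labeled as an
   assumption in the code.

```python
# Hypothetical table: (operation, direction) -> DDP-eligible data item.
DDP_ELIGIBLE_V4 = {
    ("WRITE", "argument"): "opaque data field in WRITE4args",
    # Assumption: createtype4 is carried in the CREATE operation's arguments.
    ("CREATE", "argument"): "linkdata field of the NF4LNK arm in createtype4",
    ("READ", "result"): "opaque data field in READ4resok",
    ("READLINK", "result"): "linkdata field in READLINK4resok",
}

# Eligible only in minor version 2 and newer.
READ_PLUS_ITEM = "rpc_data field of the read_plus_content union"


def ddp_eligible_item(operation, direction, minor_version):
    """Return a description of the DDP-eligible data item for this
    operation and direction, or None if there is none."""
    key = (operation, direction)
    if key == ("READ_PLUS", "result"):
        return READ_PLUS_ITEM if minor_version >= 2 else None
    return DDP_ELIGIBLE_V4.get(key)
```

   Any combination not present in the table, such as a GETATTR result,
   yields None, matching the "Only the following" wording above.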

5.1.1.  READ_PLUS Replies

   The NFS version 4.2 READ_PLUS operation returns a complex data type
   [RFC7862].  The rpr_contents field in the result of this operation is
   an array of read_plus_content unions, one arm of which contains an
   opaque byte stream (d_data).

   The size of d_data is limited to the value of the rpa_count field,
   but the protocol does not bound the number of elements which can be
   returned in the rpr_contents array.  In order to make the size of
   READ_PLUS replies predictable by NFS version 4.2 clients, the
   following restrictions are placed on the use of the READ_PLUS
   operation on an RPC-over-RDMA Version One transport:

   o  An NFS version 4.2 client MUST NOT provide more than one Write
      chunk for any READ_PLUS operation.  When providing a Write chunk
      for a READ_PLUS operation, an NFS version 4.2 client MUST provide
      a Write chunk that is either empty (which forces all result data
      items for this operation to be returned inline) or large enough to
      receive rpa_count bytes in a single element of the rpr_contents
      array.

   o  If the Write chunk provided for a READ_PLUS operation by an NFS
      version 4.2 client is not empty, an NFS version 4.2 server MUST
      use that chunk for the first element of the rpr_contents array
      that has an rpc_data arm.

   o  An NFS version 4.2 server MUST NOT return more than two elements
      in the rpr_contents array of any READ_PLUS operation.  It returns
      as much of the requested byte range as it can fit within these two
      elements.  If the NFS version 4.2 server has not asserted rpr_eof
      in the reply, the NFS version 4.2 client SHOULD send additional
      READ_PLUS requests for any remaining bytes.

5.2.  Reply Size Estimation

   Within NFS version 4, there are certain variable-length result data
   items whose maximum size cannot be estimated by clients reliably
   because there is no protocol-specified size limit on these arrays.
   These include:

   o  The attrlist4 field

   o  Fields containing ACLs such as fattr4_acl, fattr4_dacl,
      fattr4_sacl

   o  Fields in the fs_locations4 and fs_locations_info4 data structures

   o  Fields opaque to the NFS version 4 protocol which pertain to pNFS
      layout metadata, such as loc_body, loh_body, da_addr_body,
      lou_body, lrf_body, fattr_layout_types, and fs_layout_types.

5.2.1.  Reply Size Estimation for Minor Version 0

   The NFS version 4.0 protocol itself does not impose any bound on the
   size of NFS calls or responses.

   Some of the data items enumerated in Section 5.2 (in particular, the
   items related to ACLs and fs_locations) make it difficult to predict
   the maximum size of NFS version 4.0 replies that interrogate
   variable-length fattr4 attributes.  Client implementations might rely
   on their own internal architectural limits to constrain the reply
   size, but such limits are not always guaranteed to be reliable.

   When an especially large fattr4 result is expected, a Reply chunk
   might be required.  An NFS version 4.0 client can use short Reply
   chunk retry when an NFS COMPOUND containing a GETATTR operation
   encounters a transport error.

   The use of NFS COMPOUND operations raises the possibility of requests
   that combine a non-idempotent operation (e.g., RENAME) with a GETATTR
   operation that requests one or more variable-length results.  This
   combination should be avoided by ensuring that any GETATTR operation
   that requests a result of unpredictable length is sent in an NFS
   COMPOUND by itself.

5.2.2.  Reply Size Estimation for Minor Version 1 and Newer

   In NFS version 4.1 and newer minor versions, the csa_fore_chan_attrs
   argument of the CREATE_SESSION operation contains a
   ca_maxresponsesize field.  The value in this field can be taken as
   the absolute maximum size of replies generated by an NFS version 4.1
   server.

   This value can be used in cases where it is not possible to estimate
   a reply size upper bound precisely.  In practice, objects such as
   ACLs, named attributes, layout bodies, and security labels are much
   smaller than this maximum.

5.3.  RPC Binding Considerations

   NFS version 4 servers are required to listen on TCP port 2049, and
   they are not required to register with an rpcbind service [RFC7530].

   Therefore, an NFS version 4 server supporting RPC-over-RDMA Version
   One MUST use the alternative well-known port number for its RPC-over-
   RDMA service (see Section 8).  Clients SHOULD connect to this well-
   known port without consulting the RPC portmapper (as for NFS version
   4 on TCP transports).

5.4.  NFS COMPOUND Requests

5.4.1.  Multiple DDP-eligible Data Items

   An NFS version 4 COMPOUND procedure can contain more than one
   operation that carries a DDP-eligible data item.  An NFS version 4
   client provides XDR Position values in each Read chunk to
   disambiguate which chunk is associated with which argument data item.
   However, NFS version 4 server and client implementations must agree
   in advance on how to pair Write chunks with returned result data
   items.

   In the following list, an "NFS Read" operation refers to any NFS
   Version 4 operation which has a DDP-eligible result data item (i.e.,
   either a READ, READ_PLUS, or READLINK operation).  The mechanism
   specified in Section 4.3.2 of [I-D.ietf-nfsv4-rfc5666bis] is applied
   to this class of operations:

   o  If an NFS version 4 client wishes all DDP-eligible items in an NFS
      reply to be conveyed inline, it leaves the Write list empty.

   o  The first chunk in the Write list MUST be used by the first READ
      operation in an NFS version 4 COMPOUND procedure.  The next Write
      chunk is used by the next READ operation, and so on.

   o  If an NFS version 4 client has provided a matching non-empty Write
      chunk, then the corresponding READ operation MUST return its DDP-
      eligible data item using that chunk.

   o  If an NFS version 4 client has provided an empty matching Write
      chunk, then the corresponding READ operation MUST return all of
      its result data items inline.

   o  If a READ operation returns a union arm which does not contain a
      DDP-eligible result, and the NFS version 4 client has provided a
      matching non-empty Write chunk, an NFS version 4 server MUST
      return an empty Write chunk in that Write list position.

   o  If there are more READ operations than Write chunks, then
      remaining NFS Read operations in an NFS version 4 COMPOUND that
      have no matching Write chunk MUST return their results inline.

5.4.2.  NFS Version 4 COMPOUND Example

   The following example shows a Write list with three Write chunks, A,
   B, and C.  The NFS version 4 server consumes the provided Write
   chunks by writing the results of the designated operations in the
   compound request (READ and READLINK) back to each chunk.

      Write list:

         A --> B --> C

      NFS version 4 COMPOUND request:

         PUTFH LOOKUP READ PUTFH LOOKUP READLINK PUTFH LOOKUP READ
                       |                   |                   |
                       v                   v                   v
                       A                   B                   C

   If the NFS version 4 client does not want to have the READLINK result
   returned via RDMA, it provides an empty Write chunk for buffer B to
   indicate that the READLINK result must be returned inline.

5.5.  NFS Callback Requests

   The NFS version 4 family of protocols support server-initiated
   callbacks to notify NFS version 4 clients of events such as recalled
   delegations.

5.5.1.  NFS Version 4.0 Callback

   NFS version 4.0 implementations typically employ a separate TCP
   connection to handle callback operations, even when the forward
   channel uses an RPC-over-RDMA Version One transport.

   No operation in the NFS version 4.0 callback RPC Program conveys a
   significant data payload.  Therefore, no XDR data item in this RPC
   Program is DDP-eligible.

   A CB_RECALL reply is small and fixed in size.  The CB_GETATTR reply
   contains a variable-length fattr4 data item.  See Section 5.2.1 for a
   discussion of reply size prediction for this data item.

   An NFS version 4.0 client advertises netids and ad hoc port addresses
   for contacting its NFS version 4.0 callback service using the
   SETCLIENTID operation.

5.5.2.  NFS Version 4.1 Callback

   In NFS version 4.1 and newer minor versions, callback operations may
   appear on the same connection as is used for NFS version 4 forward
   channel client requests.  NFS version 4 clients and servers MUST use
   the approach described in [I-D.ietf-nfsv4-rpcrdma-bidirection] when
   backchannel operations are conveyed on RPC-over-RDMA Version One
   transports.

   The csa_back_chan_attrs argument of the CREATE_SESSION operation
   contains a ca_maxresponsesize field.  The value in this field can be
   taken as the absolute maximum size of backchannel replies generated
   by a replying NFS version 4 client.

   There are no DDP-eligible data items in callback procedures defined
   in NFS version 4.1 or NFS version 4.2.  However, some callback
   operations, such as messages that convey device ID information, can
   be large, in which case a Long Call or Reply might be required.

   When an NFS version 4.1 client can support Long Calls in its
   backchannel, it reports a backchannel ca_maxrequestsize that is
   larger than the connection's inline thresholds.  Otherwise, an NFS
   version 4 server MUST use only Short messages to convey backchannel
   operations.

5.6.  Session-Related Considerations

   The presence of an NFS session (defined in [RFC5661]) has no effect
   on the operation of RPC-over-RDMA Version One.
   None of the operations introduced to support NFS sessions (e.g., the
   SEQUENCE operation) contain DDP-eligible data items.  There is no
   need to match the number of session slots with the number of
   available RPC-over-RDMA credits.

   However, there are a few new cases where an RPC transaction can fail.
   For example, a requester might receive, in response to an RPC
   request, an RDMA_ERROR message with an rdma_err value of ERR_CHUNK,
   or an RDMA_MSG containing an RPC_GARBAGEARGS reply.  These situations
   are no different from existing RPC errors which an NFS session
   implementation is already prepared to handle for other transports.
   As with other transports during such a failure, there might be no
   SEQUENCE result available to the requester to distinguish whether
   failure occurred before or after the requested operations were
   executed on the responder.

   When a transport error occurs (e.g., RDMA_ERROR), the requester
   proceeds as usual to match the incoming XID value to a waiting RPC
   Call.  The RPC transaction is terminated, and the result status is
   reported to the Upper Layer Protocol.  The requester's session
   implementation then determines the session ID and slot for the failed
   request, and performs slot recovery to make that slot usable again.
   If this is not done, that slot could be rendered permanently
   unavailable.

5.7.  Transport Considerations

5.7.1.  Congestion Avoidance

   Section 3.1 of [RFC7530] states:

      Where an NFSv4 implementation supports operation over the IP
      network protocol, the supported transport layer between NFS and IP
      MUST be an IETF standardized transport protocol that is specified
      to avoid network congestion; such transports include TCP and the
      Stream Control Transmission Protocol (SCTP).

   Section 2.9.1 of [RFC5661] also states:

      Even if NFSv4.1 is used over a non-IP network protocol, it is
      RECOMMENDED that the transport support congestion control.

      It is permissible for a connectionless transport to be used under
      NFSv4.1; however, reliable and in-order delivery of data combined
      with congestion control by the connectionless transport is
      REQUIRED.  As a consequence, UDP by itself MUST NOT be used as an
      NFSv4.1 transport.

   RPC-over-RDMA Version One is constructed on a platform of RDMA
   Reliable Connections [I-D.ietf-nfsv4-rfc5666bis] [RFC5041].  RDMA
   Reliable Connections are reliable, connection-oriented transports
   that guarantee in-order delivery, meeting all above requirements for
   NFS version 4 transports.

5.7.2.  Retransmission and Keep-alive

   NFS version 4 client implementations often rely on a transport-layer
   keep-alive mechanism to detect when an NFS version 4 server has
   become unresponsive.  When an NFS server is no longer responsive,
   client-side keep-alive terminates the connection, which in turn
   triggers reconnection and RPC retransmission.

   Some RDMA transports (such as Reliable Connections on InfiniBand)
   have no keep-alive mechanism.  Without a disconnect or new RPC
   traffic, such connections can remain alive long after an NFS server
   has become unresponsive.  Once an NFS client has consumed all
   available RPC-over-RDMA credits on that transport connection, it will
   forever await a reply before sending another RPC request.

   NFS version 4 clients SHOULD reserve one RPC-over-RDMA credit to use
   for periodic server or connection health assessment.  This credit can
   be used to drive an RPC request on an otherwise idle connection,
   triggering either a quick affirmative server response or immediate
   connection termination.
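
   One way a client might hold a credit in reserve is sketched below.
   This is an illustration only: the CreditPool class and its methods
   are hypothetical, credit accounting in a real implementation is
   driven by the transport's credit grants, and the sketch assumes at
   least two credits have been granted so that ordinary requests can
   still make progress.

```python
class CreditPool:
    """Hypothetical client-side accounting of RPC-over-RDMA credits
    that keeps one credit available for health-check requests."""

    def __init__(self, granted):
        if granted < 2:
            # Assumption: with fewer than two credits, nothing can be reserved.
            raise ValueError("need at least two credits to hold one in reserve")
        self.granted = granted
        self.in_use = 0

    def try_acquire(self, health_check=False):
        """Take a credit if one is available to this kind of request.

        Ordinary requests may use at most granted - 1 credits; only a
        health-check request may consume the final one.
        """
        limit = self.granted if health_check else self.granted - 1
        if self.in_use < limit:
            self.in_use += 1
            return True
        return False

    def release(self):
        """Return a credit when a reply arrives or the RPC is abandoned."""
        if self.in_use > 0:
            self.in_use -= 1
```

   With this arrangement, a client whose ordinary requests are all
   awaiting replies can still send one probe request, so an unresponsive
   server is detected rather than waited on indefinitely.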

   In addition to network partition and request loss scenarios, RPC-
   over-RDMA transport connections can be terminated when a Transport
   header is malformed, Reply messages are larger than receive
   resources, or when too many RPC-over-RDMA messages are sent at once.
   In such cases:

   o  If there is a transport error indicated (i.e., RDMA_ERROR) before
      the disconnect or instead of a disconnect, the requester MUST
      respond to that error as prescribed by the specification of the
      RPC transport.  Then the NFS version 4 rules for handling
      retransmission apply.

   o  If there is a transport disconnect and the responder has provided
      no other response for a request, then only the NFS version 4 rules
      for handling retransmission apply.

6.  Extending NFS Upper Layer Bindings

   RPC Programs such as NFS are required to have an Upper Layer Binding
   specification to interoperate on RPC-over-RDMA Version One transports
   [I-D.ietf-nfsv4-rfc5666bis].  Via standards action, the Upper Layer
   Binding specified in this document can be extended to cover versions
   of the NFS version 4 protocol specified after NFS version 4 minor
   version 2, or separately published extensions to an existing NFS
   version 4 minor version, as described in [I-D.ietf-nfsv4-versioning].

7.  Security Considerations

   RPC-over-RDMA Version One supports all RPC security models, including
   RPCSEC_GSS security and transport-level security [RFC2203].  The
   choice of which Direct Data Placement mechanism is used to convey RPC
   arguments and results does not affect this, since it changes only the
   method of data transfer.  Specifically, the requirements of
   [I-D.ietf-nfsv4-rfc5666bis] ensure that this choice does not
   introduce new vulnerabilities.

   Because this document defines only the binding of the NFS protocols
   atop [I-D.ietf-nfsv4-rfc5666bis], all relevant security
   considerations are therefore to be described at that layer.

8.  IANA Considerations

   The use of direct data placement in NFS introduces a need for an
   additional port number assignment for networks that share traditional
   UDP and TCP port spaces with RDMA services.  The iWARP protocol is
   such an example [RFC5041] [RFC5040].

   For this purpose, a set of transport protocol port number assignments
   is specified by this document.  IANA has assigned the following ports
   for NFS/RDMA in the IANA port registry, according to the guidelines
   described in [RFC6335].

      nfsrdma 20049/tcp Network File System (NFS) over RDMA
      nfsrdma 20049/udp Network File System (NFS) over RDMA
      nfsrdma 20049/sctp Network File System (NFS) over RDMA

   This document should be listed as the reference for the nfsrdma port
   assignments.  This document does not alter these assignments.

9.  References

9.1.  Normative References

   [I-D.ietf-nfsv4-rfc5666bis]
              Lever, C., Simpson, W., and T. Talpey, "Remote Direct
              Memory Access Transport for Remote Procedure Call, Version
              One", draft-ietf-nfsv4-rfc5666bis-10 (work in progress),
              February 2017.

   [I-D.ietf-nfsv4-rpcrdma-bidirection]
              Lever, C., "Bi-directional Remote Procedure Call On RPC-
              over-RDMA Transports", draft-ietf-nfsv4-rpcrdma-
              bidirection-08 (work in progress), March 2017.

   [RFC1833]  Srinivasan, R., "Binding Protocols for ONC RPC Version 2",
              RFC 1833, DOI 10.17487/RFC1833, August 1995.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC2203]  Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol
              Specification", RFC 2203, DOI 10.17487/RFC2203, September
              1997.

   [RFC5661]  Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
              "Network File System (NFS) Version 4 Minor Version 1
              Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010.
[RFC6335]  Cotton, M., Eggert, L., Touch, J., Westerlund, M., and S.
           Cheshire, "Internet Assigned Numbers Authority (IANA)
           Procedures for the Management of the Service Name and
           Transport Protocol Port Number Registry", BCP 165,
           RFC 6335, DOI 10.17487/RFC6335, August 2011.

[RFC7530]  Haynes, T., Ed. and D. Noveck, Ed., "Network File System
           (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530,
           March 2015.

[RFC7862]  Haynes, T., "Network File System (NFS) Version 4 Minor
           Version 2 Protocol", RFC 7862, DOI 10.17487/RFC7862,
           November 2016.

9.2. Informative References

[I-D.ietf-nfsv4-versioning]
           Noveck, D., "Rules for NFSv4 Extensions and Minor
           Versions", draft-ietf-nfsv4-versioning-09 (work in
           progress), December 2016.

[RFC1094]  Nowicki, B., "NFS: Network File System Protocol
           specification", RFC 1094, DOI 10.17487/RFC1094, March
           1989.

[RFC1813]  Callaghan, B., Pawlowski, B., and P. Staubach, "NFS
           Version 3 Protocol Specification", RFC 1813,
           DOI 10.17487/RFC1813, June 1995.

[RFC5040]  Recio, R., Metzler, B., Culley, P., Hilland, J., and D.
           Garcia, "A Remote Direct Memory Access Protocol
           Specification", RFC 5040, DOI 10.17487/RFC5040, October
           2007.

[RFC5041]  Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct
           Data Placement over Reliable Transports", RFC 5041,
           DOI 10.17487/RFC5041, October 2007.

[RFC5666]  Talpey, T. and B. Callaghan, "Remote Direct Memory Access
           Transport for Remote Procedure Call", RFC 5666,
           DOI 10.17487/RFC5666, January 2010.

[RFC5667]  Talpey, T. and B. Callaghan, "Network File System (NFS)
           Direct Data Placement", RFC 5667, DOI 10.17487/RFC5667,
           January 2010.

[XNFS]     The Open Group, "Protocols for Interworking: XNFS, Version
           3W", February 1998.

Appendix A. Changes Since RFC 5667

Corrections and updates made necessary by new language in
[I-D.ietf-nfsv4-rfc5666bis] have been introduced. For example,
references to deprecated features of RPC-over-RDMA Version One, such
as RDMA_MSGP, and the use of the Read list for handling RPC replies,
have been removed. The term "mapping" has been replaced with the
term "binding" or "Upper Layer Binding" throughout the document.
Material that duplicates what is in [I-D.ietf-nfsv4-rfc5666bis] has
been deleted.

Material required by [I-D.ietf-nfsv4-rfc5666bis] for Upper Layer
Bindings that was not present in [RFC5667] has been added. A
complete discussion of reply size estimation has been introduced for
all protocols covered by the Upper Layer Bindings in this document.

Technical corrections have been made. For example, mentions of
12KB and 36KB inline thresholds have been removed. The reference to
a nonexistent NFS version 4 SYMLINK operation has been replaced.

The discussion of NFS version 4 COMPOUND handling has been completed.
Some changes were made to the algorithm for matching DDP-eligible
results to Write chunks.

Requirements to ignore extra Read or Write chunks have been removed
from the NFS version 2 and 3 Upper Layer Binding, as they conflict
with [I-D.ietf-nfsv4-rfc5666bis].

A section discussing NFS version 4 retransmission and connection loss
has been added.

The following additional improvements have been made, relative to
[RFC5667]:

o  An explicit discussion of NFS version 4.0 and NFS version 4.1
   backchannel operation has replaced the previous treatment of
   callback operations.

o  A binding for NFS version 4.2 has been added that includes
   discussion of new data-bearing operations like READ_PLUS.

o  A section suggesting a mechanism for periodically assessing
   connection health has been introduced.
o  Ambiguous or erroneous uses of RFC2119 terms have been corrected.

o  References to obsolete RFCs have been updated.

o  An IANA Considerations Section has been added, which specifies the
   port assignments for NFS/RDMA. This replaces the example
   assignment that appeared in [RFC5666].

o  Code excerpts have been removed, and figures have been modernized.

Appendix B. Acknowledgments

The author gratefully acknowledges the work of Brent Callaghan and
Tom Talpey on the original NFS Direct Data Placement specification
[RFC5667]. The author also wishes to thank Bill Baker and Greg
Marsden for their support of this work.

Dave Noveck provided excellent review, constructive suggestions, and
consistent navigational guidance throughout the process of drafting
this document. Dave also contributed the text of Section 5.6 and
Section 6, and insisted on precise discussion of reply size
estimation.

Thanks to Karen Deitke for her sharp observations about idempotency,
and the clarity of the discussion of NFS COMPOUNDs and NFS sessions.

Special thanks go to Transport Area Director Spencer Dawkins, nfsv4
Working Group Chair Spencer Shepler, and nfsv4 Working Group
Secretary Thomas Haynes for their support.

Author's Address

Charles Lever (editor)
Oracle Corporation
1015 Granger Avenue
Ann Arbor, MI 48104
USA

Phone: +1 248 816 6463
Email: chuck.lever@oracle.com