idnits 2.17.1

draft-ietf-nfsv4-rfc5667bis-03.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (September 28, 2016) is 2766 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Outdated reference: A later version (-11) exists of
     draft-ietf-nfsv4-rfc5666bis-07

  == Outdated reference: A later version (-08) exists of
     draft-ietf-nfsv4-rpcrdma-bidirection-05

  ** Obsolete normative reference: RFC 5661 (Obsoleted by RFC 8881)

  -- Obsolete informational reference (is this intentional?): RFC 5667
     (Obsoleted by RFC 8267)

     Summary: 1 error (**), 0 flaws (~~), 3 warnings (==), 2 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Network File System Version 4                              C. Lever, Ed.
Internet-Draft                                                    Oracle
Obsoletes: 5667 (if approved)                         September 28, 2016
Intended status: Standards Track
Expires: April 1, 2017

     Network File System (NFS) Upper Layer Binding To RPC-Over-RDMA
                     draft-ietf-nfsv4-rfc5667bis-03

Abstract

   This document specifies Upper Layer Bindings of Network File System
   (NFS) protocol versions to RPC-over-RDMA transports. These bindings
   are required to enable RPC-based protocols such as NFS to use direct
   data placement on RPC-over-RDMA transports. This document obsoletes
   RFC 5667.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 1, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.
   Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Conveying NFS Operations On RPC-Over-RDMA Transports
   3.  NFS Versions 2 And 3 Upper Layer Binding
   4.  NFS Version 4 Upper Layer Binding
   5.  Extending NFS Upper Layer Bindings
   6.  IANA Considerations
   7.  Security Considerations
   8.  References
   Appendix A.  Changes Since RFC 5667
   Appendix B.  Acknowledgments
   Author's Address

1.  Introduction

   An RPC-over-RDMA transport, such as defined in
   [I-D.ietf-nfsv4-rfc5666bis], may employ direct data placement to
   convey data payloads associated with RPC transactions. Each RPC-
   over-RDMA transport header conveys lists of memory locations
   corresponding to XDR data items defined in an Upper Layer Protocol
   (such as NFS).

   To facilitate interoperation, RPC client and server implementations
   must agree in advance on what XDR data items in which RPC procedures
   are eligible for direct data placement (DDP). This document contains
   material required of Upper Layer Bindings, as specified in
   [I-D.ietf-nfsv4-rfc5666bis], for the following NFS protocol versions:

   o  NFS Version 2 [RFC1094]

   o  NFS Version 3 [RFC1813]

   o  NFS Version 4.0 [RFC7530]

   o  NFS Version 4.1 [RFC5661]

   o  NFS Version 4.2 [I-D.ietf-nfsv4-minorversion2]

2.  Conveying NFS Operations On RPC-Over-RDMA Transports

   Definitions of terminology and a general discussion of how RPC-over-
   RDMA is used to convey RPC transactions can be found in
   [I-D.ietf-nfsv4-rfc5666bis]. In this section, these general
   principles are applied to the specifics of the NFS protocol.

2.1.  Use Of The Read List

   The Read list in each RPC-over-RDMA transport header represents a set
   of memory regions containing DDP-eligible NFS argument data. Large
   data items, such as the data payload of an NFS version 3 WRITE
   procedure, are referenced by the Read list. The NFS server pulls
   such payloads from the client and places them directly into its own
   memory.

   XDR unmarshaling code on the NFS server identifies the correspondence
   between Read chunks and particular NFS arguments via the chunk
   Position value encoded in each Read segment.

2.2.  Use Of The Write List

   The Write list in each RPC-over-RDMA transport header represents a
   set of memory regions that can receive DDP-eligible NFS result data.
   Large data items, such as the payload of an NFS version 3 READ
   procedure, are referenced by the Write list. The NFS server pushes
   such payloads to the client, placing them directly into the client's
   memory.

   Each Write chunk corresponds to a specific XDR data item in an NFS
   reply. This document specifies how NFS client and server
   implementations identify the correspondence between Write chunks and
   XDR results.

2.2.1.  Empty Write Chunks

   Section 4.4.6.2 of [I-D.ietf-nfsv4-rfc5666bis] defines the concept of
   unused Write chunks. An unused Write chunk is a Write chunk with
   either zero segments or where all segments in the Write chunk have
   zero length. In this document these are referred to as "empty" Write
   chunks. A "non-empty" Write chunk has at least one segment of non-
   zero length.
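
   The definition of an "empty" Write chunk above can be expressed as a
   small predicate. The following is a minimal sketch only: the Segment
   class, its field names, and the modeling of a Write chunk as a plain
   list are assumptions made for illustration and are not taken from
   the RPC-over-RDMA XDR definition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One RDMA segment. Field names are hypothetical, for illustration."""
    handle: int
    length: int
    offset: int


# Assumption: a Write chunk is modeled as a plain list of segments.
WriteChunk = List[Segment]


def is_empty_write_chunk(chunk: WriteChunk) -> bool:
    """Return True if the chunk has zero segments or only zero-length ones."""
    # all() over an empty list is True, which covers the zero-segment case.
    return all(seg.length == 0 for seg in chunk)
```

   A chunk for which this predicate returns False is "non-empty" in the
   sense used in this document.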

   An NFS client might wish an NFS server to return a DDP-eligible
   result inline. If there is only one DDP-eligible result item in the
   reply, the NFS client simply specifies an empty Write list to force
   the NFS server to return that result inline. If there are multiple
   DDP-eligible results, the NFS client specifies empty Write chunks for
   each DDP-eligible data item that it wishes to be returned inline.

   An NFS server might encounter an XDR union result where there are
   arms that have a DDP-eligible result, and arms that do not. If the
   NFS client has provided a non-empty Write chunk that matches with a
   DDP-eligible result, but the response does not contain that result,
   the NFS server MUST return an empty Write chunk in that position in
   the Write list.

2.3.  Use Of Long Calls And Replies

   Small RPC messages are conveyed using RDMA Send operations which are
   of limited size. If an NFS request is too large to be conveyed
   within the NFS server's responder inline threshold, and there are no
   DDP-eligible data items that can be removed, an NFS client must send
   the request using a Long Call. The entire NFS request is sent in a
   special Read chunk called a Position-Zero Read chunk.

   If an NFS client predicts that the maximum size of an NFS reply could
   be too large to be conveyed within its own responder inline
   threshold, it provides a Reply chunk in the RPC-over-RDMA transport
   header conveying the NFS request. The server places the entire NFS
   reply in the Reply chunk.

   These special chunks are described in more detail in
   [I-D.ietf-nfsv4-rfc5666bis].

2.4.  Scatter-Gather Considerations

   A chunk comprises exactly one XDR data item. Each Read chunk is
   represented as a list of segments at the same XDR Position. Each
   Write chunk is represented as an array of segments.
   An NFS client
   thus has the flexibility to advertise a set of discontiguous memory
   regions in which to send or receive a single DDP-eligible XDR data
   item.

3.  NFS Versions 2 And 3 Upper Layer Binding

   An NFS version 2 or version 3 client MAY send a single Read chunk to
   supply the opaque file data for an NFS WRITE procedure, or the
   pathname for an NFS SYMLINK procedure. For these procedures, NFS
   version 2 or 3 servers MUST ignore Read chunks beyond the first in
   the Read list. For all other NFS procedures, NFS version 2 or 3
   servers MUST ignore Read chunks that have a non-zero value in their
   Position fields.

   Similarly, an NFS version 2 or version 3 client MAY provide a single
   Write chunk to receive either the opaque file data from an NFS READ
   procedure, or the pathname from an NFS READLINK procedure. For these
   procedures, NFS version 2 or 3 servers MUST ignore Write chunks
   beyond the first in the Write list. For all other NFS procedures,
   NFS version 2 or 3 servers MUST ignore the Write list.

   There are no NFS version 2 or 3 procedures that have DDP-eligible
   data items in both their Call and Reply. However, when an NFS
   version 2 or version 3 client sends a Long Call or Reply, it MAY
   provide a combination of a Read list, a Write list, and/or a Reply
   chunk in the same RPC-over-RDMA header.
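
   The chunk-filtering rules for NFS version 2 and 3 servers stated
   above can be sketched as two helper functions. This is an
   illustrative reading of the text rather than a reference
   implementation: the ReadChunk class, its fields, and the function
   names are hypothetical, and the sketch applies the "ignore" rules
   literally, without modeling Long Calls or Reply chunks.

```python
from dataclasses import dataclass
from typing import List

# Procedures with a DDP-eligible argument or result, per the text above.
READ_CHUNK_PROCS = {"WRITE", "SYMLINK"}
WRITE_CHUNK_PROCS = {"READ", "READLINK"}


@dataclass
class ReadChunk:
    """A Read chunk, modeled by hypothetical fields for illustration."""
    position: int  # XDR Position shared by the chunk's segments
    length: int    # total bytes across the chunk's segments


def honored_read_chunks(proc: str, read_list: List[ReadChunk]) -> List[ReadChunk]:
    """Return the Read chunks an NFS version 2 or 3 server does not ignore."""
    if proc in READ_CHUNK_PROCS:
        # Chunks beyond the first in the Read list are ignored.
        return read_list[:1]
    # For other procedures, chunks with a non-zero Position are ignored.
    return [chunk for chunk in read_list if chunk.position == 0]


def honored_write_chunks(proc: str, write_list: list) -> list:
    """Return the Write chunks an NFS version 2 or 3 server does not ignore."""
    if proc in WRITE_CHUNK_PROCS:
        # Chunks beyond the first in the Write list are ignored.
        return write_list[:1]
    # For other procedures, the whole Write list is ignored.
    return []
```

   Keeping the DDP-eligible procedures in two small sets mirrors the way
   the binding enumerates them, so the rules stay easy to compare
   against the text.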

   If an NFS version 2 or version 3 client has not provided enough bytes
   in a Read list to match the size of a DDP-eligible NFS argument data
   item, or if an NFS version 2 or version 3 client has not provided
   enough Write list resources to handle an NFS READ or READLINK reply,
   or if the client has not provided a large enough Reply chunk to
   convey an NFS reply, the server MUST return one of:

   o  An RPC-over-RDMA message of type RDMA_ERROR, with the rdma_xid
      field set to the XID of the matching NFS Call, and the rdma_error
      field set to ERR_CHUNK; or

   o  An RPC message (via an RDMA_MSG message) with the xid field set to
      the XID of the matching NFS Call, the mtype field set to REPLY,
      the stat field set to MSG_ACCEPTED, and the accept_stat field set
      to GARBAGE_ARGS.

   These replies do not give any indication to NFS version 2 or version
   3 clients of whether an NFS version 2 or 3 server has processed the
   arguments of the RPC Call, or whether the NFS version 2 or 3 server
   has accessed NFS client memory associated with that RPC.

   NFS version 2 or version 3 clients already successfully estimate the
   maximum reply size of each operation in order to provide an adequate
   set of buffers to receive each NFS reply. An NFS version 2 or
   version 3 client provides a Reply chunk when the maximum possible
   reply size is larger than the client's responder inline threshold.

3.1.  Auxiliary Protocols

   NFS versions 2 and 3 are typically deployed with several other
   protocols, referred to as "auxiliary" protocols. These are separate
   RPC protocols which handle operations that are not part of the main
   NFS protocol. These include the MOUNT and NLM protocols, introduced
   in an appendix of [RFC1813]; the NSM protocol, described in Chapter
   11 of [NSM]; and the NFSACL protocol, which does not have a public
   definition.
   However NFSACL is treated as a de facto standard and
   there are several interoperating implementations.

   RPC-over-RDMA considers these as individual Upper Layer Protocols
   [I-D.ietf-nfsv4-rfc5666bis]. Therefore to operate on an RPC-over-
   RDMA transport, an Upper Layer Binding must be provided for each of
   these.

   Typically MOUNT, NLM, and NSM are conveyed via TCP rather than RPC-
   over-RDMA. Note that only metadata is conveyed in these protocols,
   thus direct data placement is never necessary, and the size of RPC
   messages is uniformly small. The maximum size of replies is easily
   determined by examining the XDR definitions of these protocols.

   Implementations that support the NFSACL protocol typically send
   NFSACL procedures on the same connection as the main NFS protocol.
   Thus NFSACL does require an Upper Layer Binding.

   No data item in this protocol is DDP-eligible. There is no protocol
   size limit for NFS version 3 ACL objects. The client can have some
   difficulty ascertaining the size of ACLs to be read from servers.
   Practically speaking, ACLs are not large (less than 4KB in most
   cases), but a large Reply chunk may be provided when the client is in
   doubt. The usual rules apply to the use of Long Messages when the
   size of an NFSACL RPC exceeds a connection's inline thresholds.

4.  NFS Version 4 Upper Layer Binding

   This specification applies to NFS Version 4.0 [RFC7530], NFS Version
   4.1 [RFC5661], and NFS Version 4.2 [I-D.ietf-nfsv4-minorversion2].
   It also applies to the callback protocols associated with each of
   these minor versions defined in the same documents.

4.1.  DDP-Eligibility

   For each WRITE operation in an NFS version 4 COMPOUND procedure, an
   NFS version 4 client MAY provide a single Read chunk to supply the
   opaque file data argument.
   For each CREATE(NF4LNK) operation in an
   NFS version 4 COMPOUND procedure, an NFS version 4 client MAY provide
   a single Read chunk to supply the pathname argument.

   Similarly, for each READ operation in an NFS version 4 COMPOUND
   procedure, an NFS version 4 client MAY provide a single Write chunk
   to receive the opaque file data result. For each READ_PLUS
   operation in an NFS version 4 COMPOUND procedure, an NFS version 4
   client MAY provide a single Write chunk to receive NFS4_CONTENT_DATA.
   For each READLINK operation in an NFS version 4 COMPOUND procedure,
   an NFS version 4 client MAY provide a single Write chunk to receive
   the pathname result.

   An NFS version 4 client MUST NOT provide a Read or Write chunk that
   corresponds with any other XDR data item in any other NFS version 4
   operation in an NFS version 4 COMPOUND procedure, or in an NFS
   version 4 NULL procedure.

   It is possible for NFS version 4 COMPOUND procedures to use both the
   Read list and Write list simultaneously. An NFS version 4 client MAY
   provide a Read list and a Write list in the same transaction if it is
   sending a Long Call or Reply.

   If an NFS version 4 client has not provided enough bytes in a Read
   list to match the size of a DDP-eligible NFS argument data item, or
   if an NFS version 4 client has not provided enough Write list
   resources to handle a READ or READLINK operation, or if the client
   has not provided a large enough Reply chunk to convey an NFS reply,
   the server MUST return one of:

   o  An RPC-over-RDMA message of type RDMA_ERROR, with the rdma_xid
      field set to the XID of the matching NFS Call, and the rdma_error
      field set to ERR_CHUNK; or

   o  An RPC message (via an RDMA_MSG message) with the xid field set to
      the XID of the matching NFS Call, the stat field set to
      MSG_ACCEPTED, and the accept_stat field set to GARBAGE_ARGS.
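
   The server-side decision described above, for a single DDP-eligible
   result and its matching Write chunk, can be sketched as follows.
   This is a minimal illustration under stated assumptions: the function
   name, the dictionary layout, and the representation of a Write chunk
   as a list of segment lengths are hypothetical. The sketch always
   chooses the RDMA_ERROR form of the error reply, although the text
   permits either form.

```python
from typing import List


def write_chunk_disposition(xid: int, segment_lengths: List[int],
                            result_len: int) -> dict:
    """Decide how a server conveys one DDP-eligible result.

    segment_lengths holds the length of each segment in the matching
    Write chunk (a hypothetical, simplified representation).
    """
    capacity = sum(segment_lengths)
    if capacity == 0:
        # Empty Write chunk: the client asked for the result inline.
        return {"action": "inline"}
    if capacity < result_len:
        # Not enough Write list resources: report ERR_CHUNK for this XID.
        return {
            "action": "error",
            "type": "RDMA_ERROR",
            "rdma_xid": xid,
            "rdma_error": "ERR_CHUNK",
        }
    # The chunk is large enough: place the result with RDMA Write.
    return {"action": "rdma_write", "bytes": result_len}
```

   A real server would apply the same kind of check to Read list and
   Reply chunk resources, which this sketch does not model.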

   Such error replies are permanent errors, and constitute both
   completion of the RPC transaction, and a valid server response. It
   is not necessary for an NFS version 4 server to drop the transport
   connection in this case.

4.1.1.  Session-Related Considerations

   In most cases, the presence of an NFS session [RFC5661] has no effect
   on the operation of RPC-over-RDMA. None of the operations introduced
   to support NFS sessions contain DDP-eligible data items. There is no
   need to match the number of session slots with the number of
   available RPC-over-RDMA credits.

   However, there are some rare error conditions which require special
   handling when an NFS session is operating on an RPC-over-RDMA
   transport. For example, a requester might receive, in response to an
   RPC request, an RDMA_ERROR message with an rdma_err value of
   ERR_CHUNK, or an RDMA_MSG containing an RPC_GARBAGEARGS reply.
   Within RPC-over-RDMA Version One, this class of error can be
   generated for two different reasons:

   o  There was an XDR error detected parsing the RPC-over-RDMA headers.

   o  There was an error sending the response, because, for example, a
      necessary reply chunk was not provided or the one provided is of
      insufficient length.

   These two situations, which arise only due to incorrect
   implementations, have different implications with regard to Exactly-
   Once Semantics. An XDR error in decoding the request precludes the
   execution of the request on the responder, but failure to send a
   reply indicates that some or all of the operations were executed.

   In both instances, the client SHOULD NOT retry the operation. A
   retry is liable to result in the same sort of error seen previously.
   Instead, it is best to consider the operation as completed
   unsuccessfully and report an error to the consumer who requested the
   RPC.

   In addition, within the error response, the requester does not have
   the result of the execution of the SEQUENCE operation, which
   identifies the session, slot, and sequence id for the request which
   has failed. The xid associated with the request, obtained from the
   rdma_xid field of the RDMA_ERROR or RDMA_MSG message, must be used to
   determine the session and slot for the request which failed, and the
   slot must be properly retired. If this is not done, the slot could
   be rendered permanently unavailable.

4.2.  Reply Size Estimation

   An NFS version 4 client provides a Reply chunk when the maximum
   possible reply size is larger than the client's responder inline
   threshold. NFS version 4 clients already successfully estimate the
   maximum reply size of most operations in order to provide an adequate
   set of buffers to receive each NFS reply.

   There are certain NFS version 4 data items whose size cannot be
   estimated by clients reliably, however, because there is no protocol-
   specified size limit on these structures. These include but are not
   limited to opaque types, such as:

   o  The attrlist4 field

   o  Fields containing ACLs such as fattr4_acl, fattr4_dacl,
      fattr4_sacl

   o  Fields in the fs_locations4 and fs_locations_info4 data structures

   o  Opaque fields which pertain to pNFS layout metadata, such as
      loc_body, loh_body, da_addr_body, lou_body, lrf_body,
      fattr_layout_types and fs_layout_types.

   In NFS version 4.1 and later minor versions, the csa_fore_chan_attrs
   argument of the CREATE_SESSION operation contains a
   ca_maxresponsesize field. The value in this field can be taken as
   the absolute maximum size of replies generated by a replying NFS
   version 4 server. This value can be used in cases where it is not
   possible to estimate a reply size upper bound precisely.
   In
   practice, objects such as ACLs, named attributes, layout bodies, and
   security labels are much smaller than this maximum.

   With regard to NFS version 4.0, things are more troublesome.
   Typically NFS version 4.0 client implementations rely on their own
   architectural limits to keep reply buffer sizes reasonable. For
   instance, although the NFS version 4 protocol is capable of conveying
   a megabyte-sized ACL, nearly all known physical filesystems store
   ACLs in on-disk containers which are small in size.

4.2.1.  Managing READ_PLUS Replies

   The NFS version 4.2 READ_PLUS operation returns a complex data type
   [I-D.ietf-nfsv4-minorversion2]. The rpr_contents field in the result
   of this operation is an array of read_plus_content unions, one arm of
   which contains an opaque byte stream (d_data).

   The size of d_data is limited to the value of the rpa_count field,
   but the protocol does not bound the number of elements which can be
   returned in the rpr_contents array. In order to make the size of
   READ_PLUS replies predictable by NFS version 4.2 clients, the
   following restrictions are placed on the use of the READ_PLUS
   operation on RPC-over-RDMA transports:

   o  An NFS version 4.2 client MUST NOT provide more than one Write
      chunk for any READ_PLUS operation. When providing a Write chunk
      for a READ_PLUS operation, an NFS version 4.2 client MUST provide
      a Write chunk that is either empty (which forces all result data
      items for this operation to be returned inline) or large enough to
      receive rpa_count bytes in a single element of the rpr_contents
      array.

   o  If the Write chunk provided for a READ_PLUS operation by an NFS
      version 4.2 client is not empty, an NFS version 4.2 server MUST
      use that chunk for the first element of the rpr_contents array
      that has an rpc_data arm.

   o  An NFS version 4.2 server MUST NOT return more than two elements
      in the rpr_contents array of any READ_PLUS operation. It returns
      as much of the requested byte range as it can fit within these two
      elements. If the NFS version 4.2 server has not asserted rpr_eof
      in the reply, the NFS version 4.2 client SHOULD send additional
      READ_PLUS requests for any remaining bytes.

4.3.  NFS Version 4 COMPOUND Requests

   A single NFS version 4 COMPOUND procedure supplies arguments for a
   sequence of operations, and returns results from that sequence, all
   in a single round-trip [RFC7530]. An NFS version 4 client MAY
   construct an NFS version 4 COMPOUND procedure that provides more than
   one chunk in the Read list or Write list as long as it observes the
   restrictions in Section 4.1.

   An NFS version 4 client provides XDR Position values in each Read
   chunk to disambiguate which chunk is associated with which argument
   data item. However NFS version 4 server and client implementations
   must agree in advance on how to pair Write chunks with returned
   result data items.

   The mechanism specified in Section 5.3.2 of
   [I-D.ietf-nfsv4-rfc5666bis] is applied here, with some additional
   restrictions. In the following list, an "NFS Read" operation refers
   to any NFS Version 4 operation which has a DDP-eligible result data
   item (i.e., either a READ, READ_PLUS, or READLINK operation).

   o  If an NFS version 4 client wishes all DDP-eligible items in an NFS
      reply to be conveyed inline, it leaves the Write list empty.

   o  The first chunk in the Write list MUST be used by the first NFS
      Read operation in an NFS version 4 COMPOUND procedure. The next
      Write chunk is used by the next NFS Read operation, and so on.

   o  If an NFS version 4 client has provided a matching non-empty Write
      chunk, then the corresponding NFS Read operation MUST return its
      DDP-eligible data item using that chunk.

   o  If an NFS version 4 client has provided an empty matching Write
      chunk, then the corresponding NFS Read operation MUST return all
      of its result data items inline.

   o  If an NFS Read operation returns a union arm which does not
      contain a DDP-eligible result, and the NFS version 4 client has
      provided a matching non-empty Write chunk, an NFS version 4 server
      MUST return an empty Write chunk in that Write list position.

   o  If there are more NFS Read operations than Write chunks, then
      remaining NFS Read operations in an NFS version 4 COMPOUND that
      have no matching Write chunk MUST return their results inline.

4.3.1.  NFS Version 4 COMPOUND Example

   The following example shows a Write list with three Write chunks, A,
   B, and C. The NFS version 4 server consumes the provided Write
   chunks by writing the results of the designated operations in the
   compound request (READ and READLINK) back to each chunk.

      Write list:

         A --> B --> C

      NFS version 4 COMPOUND request:

         PUTFH LOOKUP READ PUTFH LOOKUP READLINK PUTFH LOOKUP READ
                       |                   |                   |
                       v                   v                   v
                       A                   B                   C

   If the NFS version 4 client does not want to have the READLINK result
   returned via RDMA, it provides an empty Write chunk for buffer B to
   indicate that the READLINK result must be returned inline.

4.4.  NFS Version 4 Callback

   The NFS version 4 protocols support server-initiated callbacks to
   notify clients of events such as recalled delegations.

4.4.1.  NFS Version 4.0 Callback

   NFS version 4.0 implementations typically employ a separate TCP
   connection to handle callback operations, even when the forward
   channel uses an RPC-over-RDMA transport. Therefore no Upper Layer
   Binding for the NFS version 4.0 callback program is provided in this
   document.

4.4.2.  NFS Version 4.1 Callback

   In NFS version 4.1 and later minor versions, callback operations may
   appear on the same connection as is used for NFS version 4 forward
   channel client requests. NFS version 4 clients and servers MUST use
   the mechanism described in [I-D.ietf-nfsv4-rpcrdma-bidirection] when
   backchannel operations are conveyed on RPC-over-RDMA transports.

   The csa_back_chan_attrs argument of the CREATE_SESSION operation
   contains a ca_maxresponsesize field. The value in this field can be
   taken as the absolute maximum size of backchannel replies generated
   by a replying NFS version 4 client.

   There are no DDP-eligible data items in callback protocols associated
   with NFS version 4.1 or NFS version 4.2. However, some callback
   requests, such as messages that convey device ID information, may be
   large, in which case a Long Call or Reply may be appropriate. When
   the NFS version 4 client reports a backchannel ca_maxresponsesize
   that is larger than the connection's inline thresholds, the NFS
   version 4 client can support Long messages (i.e., Read chunks and
   Reply chunks). Otherwise an NFS version 4 server MUST use Short
   messages to convey backchannel operations.

   See Section 4.1 for a discussion of how an NFS version 4 server
   handles situations where an NFS version 4 client has provided
   inadequate RDMA resources to convey a backchannel reply.

4.5.  Connection Keep-Alive

   NFS version 4 client implementations often rely on a transport-layer
   keep-alive mechanism to detect when an NFS version 4 server has
   become unresponsive. When an NFS server is no longer responsive,
   client-side keep-alive terminates the connection, which in turn
   triggers reconnection and RPC retransmission.

   RDMA transports have no keep-alive mechanism. Without a disconnect
   or new RPC traffic, RDMA transport connections can remain alive long
   after an NFS server has become unresponsive.
   Once an NFS client has
   consumed all available RPC-over-RDMA credits on that transport
   connection, it will forever await a reply before sending another RPC
   request.

   NFS version 4 clients SHOULD reserve one RPC-over-RDMA credit to use
   for periodic server or connection health assessment. This credit can
   be used to drive an RPC request on an otherwise idle connection,
   triggering either a quick affirmative server response or immediate
   connection termination.

   To prevent lease expiry, NFS version 4 clients should use a lease-
   extending operation such as RENEW or SEQUENCE, rather than a NULL
   request, when performing a periodic health assessment.

5.  Extending NFS Upper Layer Bindings

   RPC programs such as NFS are required to have an Upper Layer Binding
   specification to interoperate on RPC-over-RDMA transports
   [I-D.ietf-nfsv4-rfc5666bis]. Via standards action, the Upper Layer
   Binding specified in this document can be extended to cover versions
   of the NFS version 4 protocol specified after NFS version 4 minor
   version 2. This includes NFS version 4 extensions that are
   documented separately from a new minor version.

6.  IANA Considerations

   NFS use of direct data placement introduces a need for an additional
   NFS port number assignment for networks that share traditional UDP
   and TCP port spaces with RDMA services. The iWARP [RFC5041]
   [RFC5040] protocol is such an example (InfiniBand is not).

   NFS servers for versions 2 and 3 [RFC1094] [RFC1813] traditionally
   listen for clients on UDP and TCP port 2049, and additionally, they
   register these with the portmapper and/or rpcbind [RFC1833] service.
   However, [RFC7530] requires NFS version 4 servers to listen on TCP
   port 2049, and they are not required to register.

   An NFS version 2 or version 3 server supporting RPC-over-RDMA on such
   a network and registering itself with the RPC portmapper MAY choose
   an arbitrary port, or MAY use the alternative well-known port number
   for its RPC-over-RDMA service. The chosen port MAY be registered
   with the RPC portmapper under the netid assigned by the requirement
   in [I-D.ietf-nfsv4-rfc5666bis].

   An NFS version 4 server supporting RPC-over-RDMA on such a network
   MUST use the alternative well-known port number for its RPC-over-RDMA
   service. Clients SHOULD connect to this well-known port without
   consulting the RPC portmapper (as for NFS version 4 on TCP
   transports).

   The port number assigned to an NFS service over an RPC-over-RDMA
   transport is available from the IANA port registry [RFC3232].

7.  Security Considerations

   RPC-over-RDMA supports all RPC security models, including RPCSEC_GSS
   security and transport-level security [RFC2203]. The choice of RDMA
   Read and RDMA Write to convey RPC arguments and results does not
   affect this, since it changes only the method of data transfer.
   Specifically, the requirements of [I-D.ietf-nfsv4-rfc5666bis] ensure
   that this choice does not introduce new vulnerabilities.

   Because this document defines only the binding of the NFS protocols
   atop [I-D.ietf-nfsv4-rfc5666bis], all relevant security
   considerations are therefore to be described at that layer.

8.  References

8.1.  Normative References

   [I-D.ietf-nfsv4-minorversion2]
              Haynes, T., "NFS Version 4 Minor Version 2", draft-ietf-
              nfsv4-minorversion2-41 (work in progress), January 2016.

   [I-D.ietf-nfsv4-rfc5666bis]
              Lever, C., Simpson, W., and T. Talpey, "Remote Direct
              Memory Access Transport for Remote Procedure Call, Version
              One", draft-ietf-nfsv4-rfc5666bis-07 (work in progress),
              May 2016.
632 [I-D.ietf-nfsv4-rpcrdma-bidirection] 633 Lever, C., "Bi-directional Remote Procedure Call On RPC- 634 over-RDMA Transports", draft-ietf-nfsv4-rpcrdma- 635 bidirection-05 (work in progress), June 2016. 637 [RFC1833] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", 638 RFC 1833, DOI 10.17487/RFC1833, August 1995, 639 . 641 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 642 Requirement Levels", BCP 14, RFC 2119, 643 DOI 10.17487/RFC2119, March 1997, 644 . 646 [RFC2203] Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol 647 Specification", RFC 2203, DOI 10.17487/RFC2203, September 648 1997, . 650 [RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., 651 "Network File System (NFS) Version 4 Minor Version 1 652 Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010, 653 . 655 [RFC7530] Haynes, T., Ed. and D. Noveck, Ed., "Network File System 656 (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, 657 March 2015, . 659 8.2. Informative References 661 [NSM] The Open Group, "Protocols for Interworking: XNFS, Version 662 3W", February 1998. 664 [RFC1094] Nowicki, B., "NFS: Network File System Protocol 665 specification", RFC 1094, DOI 10.17487/RFC1094, March 666 1989, . 668 [RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS 669 Version 3 Protocol Specification", RFC 1813, 670 DOI 10.17487/RFC1813, June 1995, 671 . 673 [RFC3232] Reynolds, J., Ed., "Assigned Numbers: RFC 1700 is Replaced 674 by an On-line Database", RFC 3232, DOI 10.17487/RFC3232, 675 January 2002, . 677 [RFC5040] Recio, R., Metzler, B., Culley, P., Hilland, J., and D. 678 Garcia, "A Remote Direct Memory Access Protocol 679 Specification", RFC 5040, DOI 10.17487/RFC5040, October 680 2007, . 682 [RFC5041] Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct 683 Data Placement over Reliable Transports", RFC 5041, 684 DOI 10.17487/RFC5041, October 2007, 685 . 687 [RFC5667] Talpey, T. and B. 
              Callaghan, "Network File System (NFS) Direct Data
              Placement", RFC 5667, DOI 10.17487/RFC5667, January 2010.

Appendix A.  Changes Since RFC 5667

   Corrections and updates made necessary by new language in
   [I-D.ietf-nfsv4-rfc5666bis] have been introduced.  For example,
   references to deprecated features of RPC-over-RDMA Version One, such
   as RDMA_MSGP, and the use of the Read list for handling RPC replies,
   have been removed.  The term "mapping" has been replaced with the
   term "binding" or "Upper Layer Binding" throughout the document.
   Some material that duplicates what is in [I-D.ietf-nfsv4-rfc5666bis]
   has been deleted.

   Material required by [I-D.ietf-nfsv4-rfc5666bis] for Upper Layer
   Bindings that was not present in [RFC5667] has been added, including
   discussion of how each NFS version properly estimates the maximum
   size of RPC replies.

   Technical corrections have been made.  For example, mentions of
   12KB and 36KB inline thresholds have been removed.  The reference to
   a non-existent NFS version 4 SYMLINK operation has been replaced with
   NFS version 4 CREATE(NF4LNK).

   The discussion of NFS version 4 COMPOUND handling has been completed.
   Some changes were made to the algorithm for matching DDP-eligible
   results to Write chunks.

   The following additional improvements have been made, relative to
   [RFC5667]:

   o  An explicit discussion of NFS version 4.0 and NFS version 4.1
      backchannel operation has replaced the previous treatment of
      callback operations.

   o  A binding for NFS version 4.2 has been added that includes
      discussion of new data-bearing operations like READ_PLUS.

   o  A section suggesting a mechanism for periodically assessing
      connection health has been introduced.

   o  Language inconsistent with or contradictory to
      [I-D.ietf-nfsv4-rfc5666bis] has been removed from Sections 2 and
      3, and both Sections have been combined into Section 2 in the
      present document.

   o  Ambiguous or erroneous uses of RFC2119 terms have been corrected.

   o  References to obsolete RFCs have been updated.

   o  An IANA Considerations Section has replaced the "Port Usage
      Considerations" Section.

   o  Code excerpts have been removed, and figures have been modernized.

Appendix B.  Acknowledgments

   The author gratefully acknowledges the work of Brent Callaghan and
   Tom Talpey on the original NFS Direct Data Placement specification
   [RFC5667].  The author also wishes to thank Bill Baker and Greg
   Marsden for their support of this work.

   Dave Noveck provided excellent review, constructive suggestions, and
   consistent navigational guidance throughout the process of drafting
   this document.  Dave also contributed the text of Section 4.1.1.

   Thanks to Karen Deitke for her sharp observations about idempotency,
   and the clarity of the discussion of NFS COMPOUNDs.

   Special thanks go to Transport Area Director Spencer Dawkins, nfsv4
   Working Group Chair Spencer Shepler, and nfsv4 Working Group
   Secretary Thomas Haynes for their support.

Author's Address

   Charles Lever (editor)
   Oracle Corporation
   1015 Granger Avenue
   Ann Arbor, MI  48104
   USA

   Phone: +1 734 274 2396
   Email: chuck.lever@oracle.com