| < draft-ietf-nfsv4-rfc5667bis-03.txt | draft-ietf-nfsv4-rfc5667bis-04.txt > | |||
|---|---|---|---|---|
| Network File System Version 4 C. Lever, Ed. | Network File System Version 4 C. Lever, Ed. | |||
| Internet-Draft Oracle | Internet-Draft Oracle | |||
| Obsoletes: 5667 (if approved) September 28, 2016 | Obsoletes: 5667 (if approved) January 20, 2017 | |||
| Intended status: Standards Track | Intended status: Standards Track | |||
| Expires: April 1, 2017 | Expires: July 24, 2017 | |||
| Network File System (NFS) Upper Layer Binding To RPC-Over-RDMA | Network File System (NFS) Upper Layer Binding To RPC-Over-RDMA | |||
| draft-ietf-nfsv4-rfc5667bis-03 | draft-ietf-nfsv4-rfc5667bis-04 | |||
| Abstract | Abstract | |||
| This document specifies Upper Layer Bindings of Network File System | This document specifies Upper Layer Bindings of Network File System | |||
| (NFS) protocol versions to RPC-over-RDMA transports. These bindings | (NFS) protocol versions to RPC-over-RDMA. Upper Layer Bindings are | |||
| are required to enable RPC-based protocols such as NFS to use direct | required to enable RPC-based protocols, such as NFS, to use Direct | |||
| data placement on RPC-over-RDMA transports. This document obsoletes | Data Placement on RPC-over-RDMA. This document obsoletes RFC 5667. | |||
| RFC 5667. | ||||
| Requirements Language | Requirements Language | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in [RFC2119]. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| skipping to change at page 1, line 41 | skipping to change at page 1, line 40 | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on April 1, 2017. | This Internet-Draft will expire on July 24, 2017. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2016 IETF Trust and the persons identified as the | Copyright (c) 2017 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
| to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
| include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| This document may contain material from IETF Documents or IETF | ||||
| Contributions published or made publicly available before November | ||||
| 10, 2008. The person(s) controlling the copyright in some of this | ||||
| material may not have granted the IETF Trust the right to allow | ||||
| modifications of such material outside the IETF Standards Process. | ||||
| Without obtaining an adequate license from the person(s) controlling | ||||
| the copyright in such materials, this document may not be modified | ||||
| outside the IETF Standards Process, and derivative works of it may | ||||
| not be created outside the IETF Standards Process, except to format | ||||
| it for publication as an RFC or to translate it into languages other | ||||
| than English. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
| 2. Conveying NFS Operations On RPC-Over-RDMA Transports . . . . 3 | 2. Conveying NFS Operations On RPC-Over-RDMA . . . . . . . . . . 3 | |||
| 3. NFS Versions 2 And 3 Upper Layer Binding . . . . . . . . . . 4 | 3. Upper Layer Binding For NFS Versions 2 And 3 . . . . . . . . 5 | |||
| 4. NFS Version 4 Upper Layer Binding . . . . . . . . . . . . . . 6 | 4. Upper Layer Binding For NFS Version 4 . . . . . . . . . . . . 7 | |||
| 5. Extending NFS Upper Layer Bindings . . . . . . . . . . . . . 13 | 5. Extending NFS Upper Layer Bindings . . . . . . . . . . . . . 13 | |||
| 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 | 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 | |||
| 7. Security Considerations . . . . . . . . . . . . . . . . . . . 13 | 7. Security Considerations . . . . . . . . . . . . . . . . . . . 14 | |||
| 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 14 | 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 15 | |||
| Appendix A. Changes Since RFC 5667 . . . . . . . . . . . . . . . 15 | Appendix A. Changes Since RFC 5667 . . . . . . . . . . . . . . . 16 | |||
| Appendix B. Acknowledgments . . . . . . . . . . . . . . . . . . 16 | Appendix B. Acknowledgments . . . . . . . . . . . . . . . . . . 17 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 17 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 18 | |||
| 1. Introduction | 1. Introduction | |||
| An RPC-over-RDMA transport, such as defined in | An RPC-over-RDMA transport, such as the one defined in | |||
| [I-D.ietf-nfsv4-rfc5666bis], may employ direct data placement to | [I-D.ietf-nfsv4-rfc5666bis], may employ direct data placement to | |||
| convey data payloads associated with RPC transactions. Each RPC- | convey data payloads associated with RPC transactions. To enable | |||
| over-RDMA transport header conveys lists of memory locations | successful interoperation, RPC client and server implementations must | |||
| corresponding to XDR data items defined in an Upper Layer Protocol | agree as to which XDR data items in what particular RPC procedures | |||
| (such as NFS). | are eligible for direct data placement (DDP). | |||
| To facilitate interoperation, RPC client and server implementations | This document contains material required of Upper Layer Bindings, as | |||
| must agree in advance on what XDR data items in which RPC procedures | specified in [I-D.ietf-nfsv4-rfc5666bis], for the following NFS | |||
| are eligible for direct data placement (DDP). This document contains | protocol versions: | |||
| material required of Upper Layer Bindings, as specified in | ||||
| [I-D.ietf-nfsv4-rfc5666bis], for the following NFS protocol versions: | ||||
| o NFS Version 2 [RFC1094] | o NFS Version 2 [RFC1094] | |||
| o NFS Version 3 [RFC1813] | o NFS Version 3 [RFC1813] | |||
| o NFS Version 4.0 [RFC7530] | o NFS Version 4.0 [RFC7530] | |||
| o NFS Version 4.1 [RFC5661] | o NFS Version 4.1 [RFC5661] | |||
| o NFS Version 4.2 [I-D.ietf-nfsv4-minorversion2] | o NFS Version 4.2 [RFC7862] | |||
| 2. Conveying NFS Operations On RPC-Over-RDMA Transports | Upper Layer Bindings specified in this document apply to all versions | |||
| of RPC-over-RDMA. | ||||
| 2. Conveying NFS Operations On RPC-Over-RDMA | ||||
| Definitions of terminology and a general discussion of how RPC-over- | Definitions of terminology and a general discussion of how RPC-over- | |||
| RDMA is used to convey RPC transactions can be found in | RDMA is used to convey RPC transactions can be found in | |||
| [I-D.ietf-nfsv4-rfc5666bis]. In this section, these general | [I-D.ietf-nfsv4-rfc5666bis]. In this section, these general | |||
| principals are applied to the specifics of the NFS protocol. | principles are applied in the context of conveying NFS procedures on | |||
| RPC-over-RDMA. Some issues common to all NFS protocol versions are | ||||
| introduced. | ||||
| 2.1. Use Of The Read List | 2.1. The Read List | |||
| The Read list in each RPC-over-RDMA transport header represents a set | The Read list in each RPC-over-RDMA transport header represents a set | |||
| of memory regions containing DDP-eligible NFS argument data. Large | of memory regions containing DDP-eligible NFS argument data. Large | |||
| data items, such as the data payload of an NFS version 3 WRITE | data items, such as the data payload of an NFS version 3 WRITE | |||
| procedure, are referenced by the Read list. The NFS server pulls | procedure, can be referenced by the Read list. The NFS server pulls | |||
| such payloads from the client and places them directly into its own | such payloads from the client and places them directly into its own | |||
| memory. | memory. | |||
| XDR unmarshaling code on the NFS server identifies the correspondence | Exactly which XDR data items may be conveyed in this fashion is | |||
| between Read chunks and particular NFS arguments via the chunk | detailed later in this document. | |||
| Position value encoded in each Read segment. | ||||
| 2.2. Use Of The Write List | 2.2. The Write List | |||
| The Write list in each RPC-over-RDMA transport header represents a | The Write list in each RPC-over-RDMA transport header represents a | |||
| set of memory regions that can receive DDP-eligible NFS result data. | set of memory regions that can receive DDP-eligible NFS result data. | |||
| Large data items, such as the payload of an NFS version 3 READ | Large data items, such as the payload of an NFS version 3 READ | |||
| procedure, are referenced by the Write list. The NFS server pushes | procedure, can be referenced by the Write list. The NFS server | |||
| such payloads to the client, placing them directly into the client's | pushes such payloads to the client, placing them directly into the | |||
| memory. | client's memory. | |||
| Each Write chunk corresponds to a specific XDR data item in an NFS | Each Write chunk corresponds to a specific XDR data item in an NFS | |||
| reply. This document specifies how NFS client and server | reply. This document specifies how NFS client and server | |||
| implementations identify the correspondence between Write chunks and | implementations identify the correspondence between Write chunks and | |||
| XDR results. | XDR results. | |||
| 2.2.1. Empty Write Chunks | Exactly which XDR data items may be conveyed in this fashion is | |||
| detailed later in this document. | ||||
| Section 4.4.6.2 of [I-D.ietf-nfsv4-rfc5666bis] defines the concept of | ||||
| unused Write chunks. An unused Write chunk is a Write chunk with | ||||
| either zero segments or where all segments in the Write chunk have | ||||
| zero length. In this document these are referred to as "empty" Write | ||||
| chunks. A "non-empty" Write chunk has at least one segment of non- | ||||
| zero length. | ||||
| An NFS client might wish an NFS server to return a DDP-eligible | ||||
| result inline. If there is only one DDP-eligible result item in the | ||||
| reply, the NFS client simply specifies an empty Write list to force | ||||
| the NFS server to return that result inline. If there are multiple | ||||
| DDP-eligible results, the NFS client specifies empty Write chunks for | ||||
| each DDP-eligible data item that it wishes to be returned inline. | ||||
| An NFS server might encounter an XDR union result where there are | ||||
| arms that have a DDP-eligible result, and arms that do not. If the | ||||
| NFS client has provided a non-empty Write chunk that matches with a | ||||
| DDP-eligible result, but the response does not contain that result, | ||||
| the NFS server MUST return an empty Write chunk in that position in | ||||
| the Write list. | ||||
| 2.3. Use Of Long Calls And Replies | 2.3. Long Calls And Replies | |||
| Small RPC messages are conveyed using RDMA Send operations which are | Small RPC messages are conveyed using RDMA Send operations which are | |||
| of limited size. If an NFS request is too large to be conveyed | of limited size. If an NFS request is too large to be conveyed | |||
| within the NFS server's responder inline threshold, and there are no | within the NFS server's responder inline threshold, and there are no | |||
| DDP-eligible data items that can be removed, an NFS client must send | DDP-eligible data items that can be removed, an NFS client must send | |||
| the request using a Long Call. The entire NFS request is sent in a | the request in the form of a Long Call. The entire NFS request is | |||
| special Read chunk called a Position-Zero Read chunk. | sent in a special Read chunk called a Position Zero Read chunk. | |||
| If an NFS client predicts that the maximum size of an NFS reply could | If an NFS client determines that the maximum size of an NFS reply | |||
| be too large to be conveyed within its own responder inline | could be too large to be conveyed within its own responder inline | |||
| threshold, it provides a Reply chunk in the RPC-over-RDMA transport | threshold, it provides a Reply chunk in the RPC-over-RDMA transport | |||
| header conveying the NFS request. The server places the entire NFS | header conveying the NFS request. The server places the entire NFS | |||
| reply in the Reply chunk. | reply in the Reply chunk. | |||
| These special chunks are described in more detail in | When the RPC authentication flavor requires that DDP-eligible data | |||
| items are never removed from RPC messages, an NFS client can provide | ||||
| both a Position Zero Read chunk and a Reply chunk for the same RPC. | ||||
| These special chunks are discussed in further detail in | ||||
| [I-D.ietf-nfsv4-rfc5666bis]. | [I-D.ietf-nfsv4-rfc5666bis]. | |||
| 2.4. Scatter-Gather Considerations | 2.4. Scatter-Gather Considerations | |||
| A chunk comprises exactly one XDR data item. Each Read chunk is | A chunk typically corresponds to exactly one XDR data item. Each | |||
| represented as a list of segments at the same XDR Position. Each | Read chunk is represented as a list of segments at the same XDR | |||
| Write chunk is represented as an array of segments. An NFS client | Position. Each Write chunk is represented as an array of segments. | |||
| thus has the flexibility to advertise a set of discontiguous memory | An NFS client thus has the flexibility to advertise a set of | |||
| regions in which to send or receive a single DDP-eligible XDR data | discontiguous memory regions in which to convey a single DDP-eligible | |||
| item. | XDR data item. | |||
| 3. NFS Versions 2 And 3 Upper Layer Binding | ||||
| An NFS version 2 or version 3 client MAY send a single Read chunk to | ||||
| supply the opaque file data for an NFS WRITE procedure, or the | ||||
| pathname for an NFS SYMLINK procedure. For these procedures, NFS | ||||
| version 2 or 3 servers MUST ignore Read chunks beyond the first in | ||||
| the Read list. For all other NFS procedures, NFS version 2 or 3 | ||||
| servers MUST ignore Read chunks that have a non-zero value in their | ||||
| Position fields. | ||||
| Similarly, an NFS version 2 or version 3 client MAY provide a single | ||||
| Write chunk to receive either the opaque file data from an NFS READ | ||||
| procedure, or the pathname from an NFS READLINK procedure. For these | ||||
| procedures, NFS version 2 or 3 servers MUST ignore Write chunks | ||||
| beyond the first in the Write list. For all other NFS procedures, | ||||
| NFS version 2 or 3 servers MUST ignore the Write list. | ||||
| There are no NFS version 2 or 3 procedures that have DDP-eligible | 2.5. DDP Eligibility Violations | |||
| data items in both their Call and Reply. However, when an NFS | ||||
| version 2 or version 3 client sends a Long Call or Reply, it MAY | ||||
| provide a combination of a Read list, a Write list, and/or a Reply | ||||
| chunk in the same RPC-over-RDMA header. | ||||
| If an NFS version 2 or version 3 client has not provided enough bytes | To report a DDP-eligibility violation, an NFS server MUST return one | |||
| in a Read list to match the size of a DDP-eligible NFS argument data | of: | |||
| item, or if an NFS version 2 or version 3 client has not provided | ||||
| enough Write list resources to handle an NFS READ or READLINK reply, | ||||
| or if the client has not provided a large enough Reply chunk to | ||||
| convey an NFS reply, the server MUST return one of: | ||||
| o An RPC-over-RDMA message of type RDMA_ERROR, with the rdma_xid | o An RPC-over-RDMA message of type RDMA_ERROR, with the rdma_xid | |||
| field set to the XID of the matching NFS Call, and the rdma_error | field set to the XID of the matching NFS Call, and the rdma_error | |||
| field set to ERR_CHUNK; or | field set to ERR_CHUNK; or | |||
| o An RPC message (via an RDMA_MSG message) with the xid field set to | o An RPC message (via an RDMA_MSG message) with the xid field set to | |||
| the XID of the matching NFS Call, the mtype field set to REPLY, | the XID of the matching NFS Call, the mtype field set to REPLY, | |||
| the stat field set to MSG_ACCEPTED, and the accept_stat field set | the stat field set to MSG_ACCEPTED, and the accept_stat field set | |||
| to GARBAGE_ARGS. | to GARBAGE_ARGS. | |||
| These replies do not give any indication to NFS version 2 or version | Subsequent sections of this document describe further considerations | |||
| 3 clients of whether an NFS version 2 or 3 server has processed the | particular to specific NFS protocols or procedures. | |||
| arguments of the RPC Call, or whether the NFS version 2 or 3 server | ||||
| has accessed NFS client memory associated with that RPC. | ||||
| NFS version 2 or version 3 clients already successfully estimate the | 2.6. Reply Size Estimation | |||
| maximum reply size of each operation in order to provide an adequate | ||||
| set of buffers to receive each NFS reply. An NFS version 2 or | ||||
| version 3 client provides a Reply chunk when the maximum possible | ||||
| reply size is larger than the client's responder inline threshold. | ||||
| 3.1. Auxiliary Protocols | During the construction of each RPC Call message, an NFS client is | |||
| responsible for allocating appropriate resources for receiving the | ||||
| matching Reply message. A Reply buffer overrun can result in | ||||
| corruption of the Reply message or termination of the transport | ||||
| connection. Therefore reliable reply size estimation is necessary to | ||||
| ensure successful interoperation. | ||||
| NFS versions 2 and 3 are typically deployed with several other | In many cases the Upper Layer Protocol's XDR definition provides | |||
| protocols, referred to as "auxiliary" protocols. These are separate | enough information to enable the client to make a reliable prediction | |||
| RPC protocols which handle operations that are not part of the main | of the maximum size of the expected Reply message. If there are | |||
| NFS protocol. These include the MOUNT and NLM protocols, introduced | variable-size data items in the result, the maximum size of the RPC | |||
| in an appendix of [RFC1813]; the NSM protocol, described in Chapter | Reply message can be reliably estimated in most cases: | |||
| 11 of [NSM]; and the NFSACL protocol, which does not have a public | ||||
| definition. However NFSACL is treated as a de facto standard and | ||||
| there are several interoperating implementations. | ||||
| RPC-over-RDMA considers these as individual Upper Layer Protocols | o The client requests only a specific portion of an object (for | |||
| [I-D.ietf-nfsv4-rfc5666bis]. Therefore to operate on an RPC-over- | example, using the "count" and "offset" fields in an NFS READ). | |||
| RDMA transport, an Upper Layer Binding must be provided for each of | ||||
| these. | ||||
| Typically MOUNT, NLM, and NSM are conveyed via TCP rather than RPC- | o The client has already cached the size of the whole object it is | |||
| over-RDMA. Note that only metadata is conveyed in these protocols, | about to request (say, via a previous NFS GETATTR request). | |||
| thus direct data placement is never necessary, and the size of RPC | ||||
| messages is uniformly small. The maximum size of replies is easily | ||||
| determined by examining the XDR definitions of these protocols. | ||||
| Implementations that support the NFSACL protocol typically send | It is occasionally not possible to determine the maximum Reply | |||
| NFSACL procedures on the same connection as the main NFS protocol. | message size based solely on the above criteria. NFS client | |||
| Thus NFSACL does require an Upper Layer Binding. | implementers can choose to provide the largest possible Reply buffer | |||
| in those cases, based on, for instance, the largest possible NFS READ | ||||
| or WRITE payload (which is negotiated at mount time). | ||||
| No data item in this protocol is DDP-eligible. There is no protocol | In rare cases, a client may encounter a reply for which no a priori | |||
| size limit for NFS version 3 ACL objects. The client can have some | determination of reply size bound is possible. The client SHOULD | |||
| difficulty ascertaining the size of ACLs to be read from servers. | expect a transport error to indicate that it must either terminate | |||
| Practically speaking, ACLs are not large (less than 4KB in most | that RPC transaction, or retry it with a larger Reply chunk. | |||
| cases), but a large Reply chunk may be provided when the client is in | ||||
| doubt. The usual rules apply to the use of Long Messages when the | ||||
| size of an NFSACL RPC exceeds a connection's inline thresholds. | ||||
| 4. NFS Version 4 Upper Layer Binding | The use of NFS COMPOUND operations raises the possibility of non- | |||
| idempotent requests that combine a non-idempotent operation with an | ||||
| operation whose reply size is uncertain. This causes potential | ||||
| difficulties with retrying the transaction. Note however that many | ||||
| operations normally considered non-idempotent (e.g., WRITE, SETATTR) | ||||
| are actually idempotent. Truly non-idempotent operations are quite | ||||
| unusual in COMPOUNDs that include operations with uncertain reply | ||||
| sizes. | ||||
| This specification applies to NFS Version 4.0 [RFC7530], NFS Version | 3. Upper Layer Binding For NFS Versions 2 And 3 | |||
| 4.1 [RFC5661], and NFS Version 4.2 [I-D.ietf-nfsv4-minorversion2]. | ||||
| It also applies to the callback protocols associated with each of | ||||
| these minor versions defined in the same documents. | ||||
| 4.1. DDP-Eligibility | This Upper Layer Binding specification applies to NFS Version 2 | |||
| [RFC1094] and NFS Version 3 [RFC1813]. For brevity, in this section | ||||
| a "legacy NFS client" refers to an NFS client using NFS version 2 or | ||||
| NFS version 3 to communicate with an NFS server. Likewise, a "legacy | ||||
| NFS server" is an NFS server communicating with clients using NFS | ||||
| version 2 or NFS version 3. | ||||
| For each WRITE operation in an NFS version 4 COMPOUND procedure, an | The following XDR data items in NFS versions 2 and 3 are DDP- | |||
| NFS version 4 client MAY provide a single Read chunk to supply the | eligible: | |||
| opaque file data argument. For each CREATE(NF4LNK) operation in an | ||||
| NFS version 4 COMPOUND procedure, an NFS version 4 client MAY provide | ||||
| a single Read chunk to supply the pathname argument. | ||||
| Similarly, for each READ operation in an NFS version 4 COMPOUND | o The opaque file data argument in the NFS WRITE procedure | |||
| procedure, an NFS version 4 client MAY provide a single Write chunk | ||||
| to receive the opaque file data result. For each READ_PLUS | ||||
| operation in an NFS version 4 COMPOUND procedure, an NFS version 4 | ||||
| client MAY provide a single Write chunk to receive NFS4_CONTENT_DATA. | ||||
| For each READLINK operation in an NFS version 4 COMPOUND procedure, | ||||
| an NFS version 4 client MAY provide a single Write chunk to receive | ||||
| the pathname result. | ||||
| An NFS version 4 client MUST NOT provide a Read or Write chunk that | o The pathname argument in the NFS SYMLINK procedure | |||
| corresponds with any other XDR data item in any other NFS version 4 | ||||
| operation in an NFS version 4 COMPOUND procedure, or in an NFS | ||||
| version 4 NULL procedure. | ||||
| It is possible for NFS version 4 COMPOUND procedures to use both the | o The opaque file data result in the NFS READ procedure | |||
| Read list and Write list simultaneously. An NFS version 4 client MAY | ||||
| provide a Read list and a Write list in the same transaction if it is | ||||
| sending a Long Call or Reply. | ||||
| If an NFS version 4 client has not provided enough bytes in a Read | o The pathname result in the NFS READLINK procedure | |||
| list to match the size of a DDP-eligible NFS argument data item, or | ||||
| if an NFS version 4 client has not provided enough Write list | ||||
| resources to handle a READ or READLINK operation, or if the client | ||||
| has not provided a large enough Reply chunk to convey an NFS reply, | ||||
| the server MUST return one of: | ||||
| o An RPC-over-RDMA message of type RDMA_ERROR, with the rdma_xid | All other argument or result data items in NFS versions 2 and 3 are | |||
| field set to the XID of the matching NFS Call, and the rdma_error | not DDP-eligible. | |||
| field set to ERR_CHUNK; or | ||||
| o An RPC message (via an RDMA_MSG message) with the xid field set to | A legacy server's response to a DDP-eligibility violation (described | |||
| the XID of the matching NFS Call, the stat field set to | in Section 2.5) does not give an indication to legacy clients of | |||
| MSG_ACCEPTED, and the accept_stat field set to GARBAGE_ARGS. | whether the server has processed the arguments of the RPC Call, or | |||
| whether the server has accessed or modified client memory associated | ||||
| with that RPC. | ||||
| Such error replies are permanent errors, and constitute both | A legacy NFS client determines the maximum reply size for each | |||
| completion of the RPC transaction, and a valid server response. It | operation using the basic criteria outlined in Section 2.6. Such | |||
| is not necessary for an NFS version 4 server to drop the transport | clients provide a Reply chunk when the maximum possible reply size, | |||
| connection in this case. | exclusive of any data items represented by Write chunks, is larger | |||
| than the client's responder inline threshold. | ||||
| 4.1.1. Session-Related Considerations | 3.1. Auxiliary Protocols | |||
| In most cases, the presence of an NFS session [RFC5661] has no effect | NFS versions 2 and 3 are typically deployed with several other | |||
| on the operation of RPC-over-RDMA. None of the operations introduced | protocols, sometimes referred to as "NFS auxiliary protocols." These | |||
| to support NFS sessions contain DDP-eligible data items. There is no | are separate RPC programs that define procedures which are not part | |||
| need to match the number of session slots with the number of | of the NFS version 2 or version 3 RPC programs. These include: | |||
| available RPC-over-RDMA credits. | ||||
| However, there are some rare error conditions which require special | o The MOUNT and NLM protocols, introduced in an appendix of | |||
| handling when an NFS session is operating on an RPC-over-RDMA | [RFC1813] | |||
| transport. For example, a requester might receive, in response to an | ||||
| RPC request, an RDMA_ERROR message with an rdma_err value of | ||||
| ERR_CHUNK, or an RDMA_MSG containing an RPC_GARBAGEARGS reply. | ||||
| Within RPC-over-RDMA Version One, this class of error can be | ||||
| generated for two different reasons: | ||||
| o There was an XDR error detected parsing the RPC-over-RDMA headers. | o The NSM protocol, described in Chapter 11 of [NSM] | |||
| o There was an error sending the response, because, for example, a | o The NFSACL protocol, which does not have a public definition | |||
| necessary reply chunk was not provided or the one provided is of | (NFSACL here is treated as a de facto standard as there are | |||
| insufficient length. | several interoperating implementations). | |||
| These two situations, which arise only due to incorrect | RPC-over-RDMA considers these programs as distinct Upper Layer | |||
| implementations, have different implications with regard to Exactly- | Protocols [I-D.ietf-nfsv4-rfc5666bis]. To enable the use of these | |||
| Once Semantics. An XDR error in decoding the request precludes the | ULPs on an RPC-over-RDMA transport, an Upper Layer Binding | |||
| execution of the request on the responder, but failure to send a | specification is provided here for each. | |||
| reply indicates that some or all of the operations were executed. | ||||
| In both instances, the client SHOULD NOT retry the operation. A | 3.1.1. MOUNT, NLM, And NSM Protocols | |||
| retry is liable to result in the same sort of error seen previously. | ||||
| Instead, it is best to consider the operation as completed | ||||
| unsuccessfully and report an error to the consumer who requested the | ||||
| RPC. | ||||
| In addition, within the error response, the requester does not have | Typically MOUNT, NLM, and NSM are conveyed via TCP, even in | |||
| the result of the execution of the SEQUENCE operation, which | deployments where NFS operations are conveyed on RPC-over-RDMA. When a legacy | |||
| identifies the session, slot, and sequence id for the request which | server supports these programs on RPC-over-RDMA, it advertises the | |||
| has failed. The xid associated with the request, obtained from the | port address via the usual rpcbind service [RFC1833]. | |||
| rdma_xid field of the RDMA_ERROR or RDMA_MSG message, must be used to | ||||
| determine the session and slot for the request which failed, and the | ||||
| slot must be properly retired. If this is not done, the slot could | ||||
| be rendered permanently unavailable. | ||||
| 4.2. Reply Size Estimation | No operation in these protocols conveys a significant data payload, | |||
| and the size of RPC messages in these protocols is uniformly small. | ||||
| Therefore, no XDR data items in these protocols are DDP-eligible. | ||||
| The largest variable-length XDR data item is an xdr_netobj. In most | ||||
| implementations this data item is not larger than 1024 bytes, making | ||||
| reliable reply size estimation straightforward using the criteria | ||||
| outlined in Section 2.6. | ||||
| An NFS version 4 client provides a Reply chunk when the maximum | 3.1.2. NFSACL Protocol | |||
| possible reply size is larger than the client's responder inline | ||||
| threshold. NFS version 4 clients already successfully estimate the | ||||
| maximum reply size of most operations in order to provide an adequate | ||||
| set of buffers to receive each NFS reply. | ||||
| There are certain NFS version 4 data items whose size cannot be | Legacy clients and servers that support the NFSACL RPC program | |||
| estimated by clients reliably, however, because there is no protocol- | typically convey NFSACL procedures on the same connection as the NFS | |||
| specified size limit on these structures. These include but are not | RPC program. This obviates the need for separate rpcbind queries to | |||
| limited to opaque types, such as: | discover server support for this RPC program. | |||
| o The attrlist4 field | ACLs are typically small, but even large ACLs must be encoded and | |||
| decoded to some degree. Thus no data item in this Upper Layer | ||||
| Protocol is DDP-eligible. | ||||
| o Fields containing ACLs such as fattr4_acl, fattr4_dacl, | For procedures whose replies do not include an ACL object, the size | |||
| fattr4_sacl | of a reply is determined directly from the NFSACL program's XDR | |||
| definition. | ||||
| o Fields in the fs_locations4 and fs_locations_info4 data structures | There is no protocol-wide size limit for NFS version 3 ACLs, and | |||
| o Opaque fields which pertain to pNFS layout metadata, such as | there is no mechanism in either the NFSACL or NFS programs for a | |||
| loc_body, loh_body, da_addr_body, lou_body, lrf_body, | legacy client to ascertain the largest ACL a legacy server can store. | |||
| fattr_layout_types and fs_layout_types | Legacy client implementations should choose a maximum size for ACLs | |||
| based on their own internal limits. A recommended lower bound for | ||||
| this maximum is 32,768 bytes, though a larger Reply chunk (up to the | ||||
| negotiated rsize setting) can be provided. | ||||
| In NFS version 4.1 and later minor versions, the csa_fore_chan_attrs | 4. Upper Layer Binding For NFS Version 4 | |||
| argument of the CREATE_SESSION operation contains a | ||||
| ca_maxresponsesize field. The value in this field can be taken as | ||||
| the absolute maximum size of replies generated by a replying NFS | ||||
| version 4 server. This value can be used in cases where it is not | ||||
| possible to estimate a reply size upper bound precisely. In | ||||
| practice, objects such as ACLs, named attributes, layout bodies, and | ||||
| security labels are much smaller than this maximum. | ||||
| With regard to NFS version 4.0, things are more troublesome. | This Upper Layer Binding specification applies to all protocols | |||
| Typically NFS version 4.0 client implementations rely on their own | defined in NFS Version 4.0 [RFC7530], NFS Version 4.1 [RFC5661], and | |||
| architectural limits to keep reply buffer sizes reasonable. For | NFS Version 4.2 [RFC7862]. | |||
| instance, although the NFS version 4 protocol is capable of conveying | ||||
| a megabyte-sized ACL, nearly all known physical filesystems store | ||||
| ACLs in on-disk containers which are small in size. | ||||
| 4.2.1. Managing READ_PLUS Replies | 4.1. DDP-Eligibility | |||
| Only the following XDR data items in the COMPOUND procedure of all | ||||
| NFS version 4 minor versions are DDP-eligible: | ||||
| o The opaque data field in the WRITE4args structure | ||||
| o The linkdata field of the NF4LNK arm in the createtype4 union | ||||
| o The opaque data field in the READ4resok structure | ||||
| o The linkdata field in the READLINK4resok structure | ||||
| o In minor version 2 and newer, the rpc_data field of the | ||||
| read_plus_content union (further restrictions on the use of this | ||||
| data item follow below). | ||||
| 4.1.1. READ_PLUS Replies | ||||
| The NFS version 4.2 READ_PLUS operation returns a complex data type | The NFS version 4.2 READ_PLUS operation returns a complex data type | |||
| [I-D.ietf-nfsv4-minorversion2]. The rpr_contents field in the result | [RFC7862]. The rpr_contents field in the result of this operation is | |||
| of this operation is an array of read_plus_content unions, one arm of | an array of read_plus_content unions, one arm of which contains an | |||
| which contains an opaque byte stream (d_data). | opaque byte stream (d_data). | |||
| The size of d_data is limited to the value of the rpa_count field, | The size of d_data is limited to the value of the rpa_count field, | |||
| but the protocol does not bound the number of elements which can be | but the protocol does not bound the number of elements which can be | |||
| returned in the rpr_contents array. In order to make the size of | returned in the rpr_contents array. In order to make the size of | |||
| READ_PLUS replies predictable by NFS version 4.2 clients, the | READ_PLUS replies predictable by NFS version 4.2 clients, the | |||
| following restrictions are placed on the use of the READ_PLUS | following restrictions are placed on the use of the READ_PLUS | |||
| operation on RPC-over-RDMA transports: | operation on RPC-over-RDMA transports: | |||
| o An NFS version 4.2 client MUST NOT provide more than one Write | o An NFS version 4.2 client MUST NOT provide more than one Write | |||
| chunk for any READ_PLUS operation. When providing a Write chunk | chunk for any READ_PLUS operation. When providing a Write chunk | |||
| skipping to change at page 10, line 12 ¶ | skipping to change at page 9, line 7 ¶ | |||
| use that chunk for the first element of the rpr_contents array | use that chunk for the first element of the rpr_contents array | |||
| that has an rpc_data arm. | that has an rpc_data arm. | |||
| o An NFS version 4.2 server MUST NOT return more than two elements | o An NFS version 4.2 server MUST NOT return more than two elements | |||
| in the rpr_contents array of any READ_PLUS operation. It returns | in the rpr_contents array of any READ_PLUS operation. It returns | |||
| as much of the requested byte range as it can fit within these two | as much of the requested byte range as it can fit within these two | |||
| elements. If the NFS version 4.2 server has not asserted rpr_eof | elements. If the NFS version 4.2 server has not asserted rpr_eof | |||
| in the reply, the NFS version 4.2 client SHOULD send additional | in the reply, the NFS version 4.2 client SHOULD send additional | |||
| READ_PLUS requests for any remaining bytes. | READ_PLUS requests for any remaining bytes. | |||
| 4.3. NFS Version 4 COMPOUND Requests | 4.2. NFS Version 4 Reply Size Estimation | |||
| A single NFS version 4 COMPOUND procedure supplies arguments for a | An NFS version 4 client provides a Reply chunk when the maximum | |||
| sequence of operations, and returns results from that sequence, all | possible reply size is larger than the client's responder inline | |||
| in a single round-trip [RFC7530]. An NFS version 4 client MAY | threshold. | |||
| construct an NFS version 4 COMPOUND procedure that provides more than | ||||
| one chunk in the Read list or Write list as long as it observes the | ||||
| restrictions in Section 4.1. | ||||
| An NFS version 4 client provides XDR Position values in each Read | There are certain NFS version 4 data items whose size cannot be | |||
| chunk to disambiguate which chunk is associated with which argument | estimated by clients reliably, however, because there is no protocol- | |||
| data item. However, NFS version 4 server and client implementations | specified size limit on these structures. These include: | |||
| must agree in advance on how to pair Write chunks with returned | ||||
| result data items. | ||||
| The mechanism specified in Section 5.3.2 of | o The attrlist4 field | |||
| [I-D.ietf-nfsv4-rfc5666bis] is applied here, with some additional | ||||
| restrictions. In the following list, an "NFS Read" operation refers | o Fields containing ACLs such as fattr4_acl, fattr4_dacl, | |||
| to any NFS Version 4 operation which has a DDP-eligible result data | fattr4_sacl | |||
| item (i.e., either a READ, READ_PLUS, or READLINK operation). | ||||
| o Fields in the fs_locations4 and fs_locations_info4 data structures | ||||
| o Opaque fields which pertain to pNFS layout metadata, such as | ||||
| loc_body, loh_body, da_addr_body, lou_body, lrf_body, | ||||
| fattr_layout_types and fs_layout_types | ||||
| 4.2.1. Reply Size Estimation For Minor Version 0 | ||||
| The items enumerated above in Section 4.2 make it difficult to | ||||
| predict the maximum size of GETATTR replies that interrogate | ||||
| variable-length attributes. As discussed in Section 2.6, client | ||||
| implementations can rely on their own internal architectural limits | ||||
| to bound the reply size, but such limits are not guaranteed to be | ||||
| reliable. | ||||
| If a client implementation is equipped to recognize that a transport | ||||
| error could mean that it provisioned an inadequately sized Reply | ||||
| chunk, it can retry the operation with a larger Reply chunk. | ||||
| Otherwise, the client must terminate the RPC transaction. | ||||
| It is best to avoid issuing single COMPOUNDs that contain both non- | ||||
| idempotent operations and operations where the maximum reply size | ||||
| cannot be reliably predicted. | ||||
| 4.2.2. Reply Size Estimation For Minor Version 1 And Newer | ||||
| In NFS version 4.1 and newer minor versions, the csa_fore_chan_attrs | ||||
| argument of the CREATE_SESSION operation contains a | ||||
| ca_maxresponsesize field. The value in this field can be taken as | ||||
| the absolute maximum size of replies generated by a replying NFS | ||||
| version 4 server. | ||||
| This value can be used in cases where it is not possible to estimate | ||||
| a reply size upper bound precisely. In practice, objects such as | ||||
| ACLs, named attributes, layout bodies, and security labels are much | ||||
| smaller than this maximum. | ||||
| 4.3. NFS Version 4 COMPOUND Requests | ||||
| The NFS version 4 COMPOUND procedure allows the transmission of more | ||||
| than one DDP-eligible data item per Call and Reply message. An NFS | ||||
| version 4 client provides XDR Position values in each Read chunk to | ||||
| disambiguate which chunk is associated with which argument data item. | ||||
| However, NFS version 4 server and client implementations must agree in | ||||
| advance on how to pair Write chunks with returned result data items. | ||||
| The mechanism specified in Section 4.3.2 of | ||||
| [I-D.ietf-nfsv4-rfc5666bis] is applied here, with additional | ||||
| restrictions that appear below. In the following list, an "NFS Read" | ||||
| operation refers to any NFS Version 4 operation which has a DDP- | ||||
| eligible result data item (i.e., either a READ, READ_PLUS, or | ||||
| READLINK operation). | ||||
| o If an NFS version 4 client wishes all DDP-eligible items in an NFS | o If an NFS version 4 client wishes all DDP-eligible items in an NFS | |||
| reply to be conveyed inline, it leaves the Write list empty. | reply to be conveyed inline, it leaves the Write list empty. | |||
| o The first chunk in the Write list MUST be used by the first NFS | o The first chunk in the Write list MUST be used by the first READ | |||
| Read operation in an NFS version 4 COMPOUND procedure. The next | operation in an NFS version 4 COMPOUND procedure. The next Write | |||
| Write chunk is used by the next NFS Read operation, and so on. | chunk is used by the next READ operation, and so on. | |||
| o If an NFS version 4 client has provided a matching non-empty Write | o If an NFS version 4 client has provided a matching non-empty Write | |||
| chunk, then the corresponding NFS Read operation MUST return its | chunk, then the corresponding READ operation MUST return its DDP- | |||
| DDP-eligible data item using that chunk. | eligible data item using that chunk. | |||
| o If an NFS version 4 client has provided an empty matching Write | o If an NFS version 4 client has provided an empty matching Write | |||
| chunk, then the corresponding NFS Read operation MUST return all | chunk, then the corresponding READ operation MUST return all of | |||
| of its result data items inline. | its result data items inline. | |||
| o If an NFS Read operation returns a union arm which does not | o If a READ operation returns a union arm which does not contain a | |||
| contain a DDP-eligible result, and the NFS version 4 client has | DDP-eligible result, and the NFS version 4 client has provided a | |||
| provided a matching non-empty Write chunk, an NFS version 4 server | matching non-empty Write chunk, an NFS version 4 server MUST | |||
| MUST return an empty Write chunk in that Write list position. | return an empty Write chunk in that Write list position. | |||
| o If there are more NFS Read operations than Write chunks, then | o If there are more READ operations than Write chunks, then | |||
| remaining NFS Read operations in an NFS version 4 COMPOUND that | remaining NFS Read operations in an NFS version 4 COMPOUND that | |||
| have no matching Write chunk MUST return their results inline. | have no matching Write chunk MUST return their results inline. | |||
| 4.3.1. NFS Version 4 COMPOUND Example | 4.3.1. NFS Version 4 COMPOUND Example | |||
| The following example shows a Write list with three Write chunks, A, | The following example shows a Write list with three Write chunks, A, | |||
| B, and C. The NFS version 4 server consumes the provided Write | B, and C. The NFS version 4 server consumes the provided Write | |||
| chunks by writing the results of the designated operations in the | chunks by writing the results of the designated operations in the | |||
| compound request (READ and READLINK) back to each chunk. | compound request (READ and READLINK) back to each chunk. | |||
| skipping to change at page 11, line 40 ¶ | skipping to change at page 11, line 36 ¶ | |||
| 4.4. NFS Version 4 Callback | 4.4. NFS Version 4 Callback | |||
| The NFS version 4 protocols support server-initiated callbacks to | The NFS version 4 protocols support server-initiated callbacks to | |||
| notify clients of events such as recalled delegations. | notify clients of events such as recalled delegations. | |||
| 4.4.1. NFS Version 4.0 Callback | 4.4.1. NFS Version 4.0 Callback | |||
| NFS version 4.0 implementations typically employ a separate TCP | NFS version 4.0 implementations typically employ a separate TCP | |||
| connection to handle callback operations, even when the forward | connection to handle callback operations, even when the forward | |||
| channel uses an RPC-over-RDMA transport. Therefore no Upper Layer | channel uses an RPC-over-RDMA transport. | |||
| Binding for the NFS version 4.0 callback program is provided in this | ||||
| document. | No operation in the NFS version 4.0 callback RPC program conveys a | |||
| significant data payload. Therefore, no XDR data items in this RPC | ||||
| program are DDP-eligible. | ||||
| A CB_RECALL reply is small and fixed in size. The CB_GETATTR reply | ||||
| contains a variable-length fattr4 data item. See Section 4.2.1 for a | ||||
| discussion of reply size prediction for this data item. | ||||
| An NFS version 4.0 client advertises netids and ad hoc port addresses | ||||
| for contacting its NFS version 4.0 callback service using the | ||||
| SETCLIENTID operation. | ||||
| 4.4.2. NFS Version 4.1 Callback | 4.4.2. NFS Version 4.1 Callback | |||
| In NFS version 4.1 and later minor versions, callback operations may | In NFS version 4.1 and newer minor versions, callback operations may | |||
| appear on the same connection as is used for NFS version 4 forward | appear on the same connection as is used for NFS version 4 forward | |||
| channel client requests. NFS version 4 clients and servers MUST use | channel client requests. NFS version 4 clients and servers MUST use | |||
| the mechanism described in [I-D.ietf-nfsv4-rpcrdma-bidirection] when | the mechanism described in [I-D.ietf-nfsv4-rpcrdma-bidirection] when | |||
| backchannel operations are conveyed on RPC-over-RDMA transports. | backchannel operations are conveyed on RPC-over-RDMA transports. | |||
| The csa_back_chan_attrs argument of the CREATE_SESSION operation | The csa_back_chan_attrs argument of the CREATE_SESSION operation | |||
| contains a ca_maxresponsesize field. The value in this field can be | contains a ca_maxresponsesize field. The value in this field can be | |||
| taken as the absolute maximum size of backchannel replies generated | taken as the absolute maximum size of backchannel replies generated | |||
| by a replying NFS version 4 client. | by a replying NFS version 4 client. | |||
| There are no DDP-eligible data items in callback protocols associated | There are no DDP-eligible data items in callback procedures defined | |||
| with NFS version 4.1 or NFS version 4.2. However, some callback | in NFS version 4.1 or NFS version 4.2. However, some callback | |||
| requests, such as messages that convey device ID information, may be | operations, such as messages that convey device ID information, can | |||
| large, in which case a Long Call or Reply may be appropriate. When | be large, in which case a Long Call or Reply might be required. | |||
| the NFS version 4 client reports a backchannel ca_maxresponsesize | ||||
| that is larger than the connection's inline thresholds, the NFS | ||||
| version 4 client can support Long messages (i.e., Read chunks and | ||||
| Reply chunks). Otherwise an NFS version 4 server MUST use Short | ||||
| messages to convey backchannel operations. | ||||
| See Section 4.1 for a discussion of how an NFS version 4 server | When an NFS version 4.1 client reports a backchannel | |||
| handles situations where an NFS version 4 client has provided | ca_maxrequestsize that is larger than the connection's inline | |||
| inadequate RDMA resources to convey a backchannel reply. | thresholds, the NFS version 4 client can support Long Calls. | |||
| Otherwise an NFS version 4 server MUST use Short messages to convey | ||||
| backchannel operations. | ||||
| 4.5. Connection Keep-Alive | 4.5. Session-Related Considerations | |||
| Typically the presence of an NFS session [RFC5661] has no effect on | ||||
| the operation of RPC-over-RDMA. None of the operations introduced to | ||||
| support NFS sessions contain DDP-eligible data items. There is no | ||||
| need to match the number of session slots with the number of | ||||
| available RPC-over-RDMA credits. | ||||
| However, there are some rare error conditions which require special | ||||
| handling when an NFS session is operating on an RPC-over-RDMA | ||||
| transport. For example, a requester might receive, in response to an | ||||
| RPC request, an RDMA_ERROR message with an rdma_err value of | ||||
| ERR_CHUNK, or an RDMA_MSG containing an RPC_GARBAGEARGS reply. | ||||
| Within RPC-over-RDMA Version One, this class of error can be | ||||
| generated for two different reasons: | ||||
| o There was an XDR error detected parsing the RPC-over-RDMA headers. | ||||
| o There was an error sending the response, because, for example, a | ||||
| necessary reply chunk was not provided or the one provided is of | ||||
| insufficient length. | ||||
| These two situations, which arise due to incorrect implementations or | ||||
| underestimation of reply size, have different implications with | ||||
| regard to Exactly-Once Semantics. An XDR error in decoding the | ||||
| request precludes the execution of the request on the responder, but | ||||
| failure to send a reply indicates that some or all of the operations | ||||
| were executed. | ||||
| In both instances, the client SHOULD NOT retry the operation without | ||||
| addressing reply resource inadequacy. Such a retry can result in the | ||||
| same sort of error seen previously. Instead, it is best to consider | ||||
| the operation as completed unsuccessfully and report an error to the | ||||
| consumer who requested the RPC. | ||||
| In addition, within the error response, the requester does not have | ||||
| the result of the execution of the SEQUENCE operation, which | ||||
| identifies the session, slot, and sequence id for the request which | ||||
| has failed. The xid associated with the request, obtained from the | ||||
| rdma_xid field of the RDMA_ERROR or RDMA_MSG message, must be used to | ||||
| determine the session and slot for the request which failed, and the | ||||
| slot must be properly retired. If this is not done, the slot could | ||||
| be rendered permanently unavailable. | ||||
| 4.6. Connection Keep-Alive | ||||
| NFS version 4 client implementations often rely on a transport-layer | NFS version 4 client implementations often rely on a transport-layer | |||
| keep-alive mechanism to detect when an NFS version 4 server has | keep-alive mechanism to detect when an NFS version 4 server has | |||
| become unresponsive. When an NFS server is no longer responsive, | become unresponsive. When an NFS server is no longer responsive, | |||
| client-side keep-alive terminates the connection, which in turn | client-side keep-alive terminates the connection, which in turn | |||
| triggers reconnection and RPC retransmission. | triggers reconnection and RPC retransmission. | |||
| RDMA transports have no keep-alive mechanism. Without a disconnect | Some RDMA transports (such as Reliable Connections on InfiniBand) | |||
| or new RPC traffic, RDMA transport connections can remain alive long | have no keep-alive mechanism. Without a disconnect or new RPC | |||
| after an NFS server has become unresponsive. Once an NFS client has | traffic, such connections can remain alive long after an NFS server | |||
| consumed all available RPC-over-RDMA credits on that transport | has become unresponsive. Once an NFS client has consumed all | |||
| connection, it will forever await a reply before sending another RPC | available RPC-over-RDMA credits on that transport connection, it will | |||
| request. | forever await a reply before sending another RPC request. | |||
| NFS version 4 clients SHOULD reserve one RPC-over-RDMA credit to use | NFS version 4 clients SHOULD reserve one RPC-over-RDMA credit to use | |||
| for periodic server or connection health assessment. This credit can | for periodic server or connection health assessment. This credit can | |||
| be used to drive an RPC request on an otherwise idle connection, | be used to drive an RPC request on an otherwise idle connection, | |||
| triggering either a quick affirmative server response or immediate | triggering either a quick affirmative server response or immediate | |||
| connection termination. | connection termination. | |||
| To prevent lease expiry, NFS version 4 clients should use a lease- | ||||
| extending operation such as RENEW or SEQUENCE, rather than a NULL | ||||
| request, when performing a periodic health assessment. | ||||
| 5. Extending NFS Upper Layer Bindings | 5. Extending NFS Upper Layer Bindings | |||
| RPC programs such as NFS are required to have an Upper Layer Binding | RPC programs such as NFS are required to have an Upper Layer Binding | |||
| specification to interoperate on RPC-over-RDMA transports | specification to interoperate on RPC-over-RDMA transports | |||
| [I-D.ietf-nfsv4-rfc5666bis]. Via standards action, the Upper Layer | [I-D.ietf-nfsv4-rfc5666bis]. Via standards action, the Upper Layer | |||
| Binding specified in this document can be extended to cover versions | Binding specified in this document can be extended to cover versions | |||
| of the NFS version 4 protocol specified after NFS version 4 minor | of the NFS version 4 protocol specified after NFS version 4 minor | |||
| version 2. This includes NFS version 4 extensions that are | version 2, or separately published extensions to an existing NFS | |||
| documented separately from a new minor version. | version 4 minor version, as described in [I-D.ietf-nfsv4-versioning]. | |||
| 6. IANA Considerations | 6. IANA Considerations | |||
| NFS use of direct data placement introduces a need for an additional | NFS use of direct data placement introduces a need for an additional | |||
| NFS port number assignment for networks that share traditional UDP | NFS port number assignment for networks that share traditional UDP | |||
| and TCP port spaces with RDMA services. The iWARP [RFC5041] | and TCP port spaces with RDMA services. The iWARP [RFC5041] | |||
| [RFC5040] protocol is such an example (InfiniBand is not). | [RFC5040] protocol is such an example (InfiniBand is not). | |||
| NFS servers for versions 2 and 3 [RFC1094] [RFC1813] traditionally | NFS servers for versions 2 and 3 [RFC1094] [RFC1813] traditionally | |||
| listen for clients on UDP and TCP port 2049, and additionally, they | listen for clients on UDP and TCP port 2049, and additionally, they | |||
| skipping to change at page 14, line 13 ¶ | skipping to change at page 15, line 9 ¶ | |||
| that this choice does not introduce new vulnerabilities. | that this choice does not introduce new vulnerabilities. | |||
| Because this document defines only the binding of the NFS protocols | Because this document defines only the binding of the NFS protocols | |||
| atop [I-D.ietf-nfsv4-rfc5666bis], all relevant security | atop [I-D.ietf-nfsv4-rfc5666bis], all relevant security | |||
| considerations are therefore to be described at that layer. | considerations are therefore to be described at that layer. | |||
| 8. References | 8. References | |||
| 8.1. Normative References | 8.1. Normative References | |||
| [I-D.ietf-nfsv4-minorversion2] | ||||
| Haynes, T., "NFS Version 4 Minor Version 2", draft-ietf- | ||||
| nfsv4-minorversion2-41 (work in progress), January 2016. | ||||
| [I-D.ietf-nfsv4-rfc5666bis] | [I-D.ietf-nfsv4-rfc5666bis] | |||
| Lever, C., Simpson, W., and T. Talpey, "Remote Direct | Lever, C., Simpson, W., and T. Talpey, "Remote Direct | |||
| Memory Access Transport for Remote Procedure Call, Version | Memory Access Transport for Remote Procedure Call, Version | |||
| One", draft-ietf-nfsv4-rfc5666bis-07 (work in progress), | One", draft-ietf-nfsv4-rfc5666bis-09 (work in progress), | |||
| May 2016. | January 2017. | |||
| [I-D.ietf-nfsv4-rpcrdma-bidirection] | [I-D.ietf-nfsv4-rpcrdma-bidirection] | |||
| Lever, C., "Bi-directional Remote Procedure Call On RPC- | Lever, C., "Bi-directional Remote Procedure Call On RPC- | |||
| over-RDMA Transports", draft-ietf-nfsv4-rpcrdma- | over-RDMA Transports", draft-ietf-nfsv4-rpcrdma- | |||
| bidirection-05 (work in progress), June 2016. | bidirection-06 (work in progress), January 2017. | |||
| [RFC1833] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", | [RFC1833] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", | |||
| RFC 1833, DOI 10.17487/RFC1833, August 1995, | RFC 1833, DOI 10.17487/RFC1833, August 1995, | |||
| <http://www.rfc-editor.org/info/rfc1833>. | <http://www.rfc-editor.org/info/rfc1833>. | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <http://www.rfc-editor.org/info/rfc2119>. | <http://www.rfc-editor.org/info/rfc2119>. | |||
| skipping to change at page 15, line 5 ¶ | skipping to change at page 15, line 42 ¶ | |||
| [RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., | [RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., | |||
| "Network File System (NFS) Version 4 Minor Version 1 | "Network File System (NFS) Version 4 Minor Version 1 | |||
| Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010, | Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010, | |||
| <http://www.rfc-editor.org/info/rfc5661>. | <http://www.rfc-editor.org/info/rfc5661>. | |||
| [RFC7530] Haynes, T., Ed. and D. Noveck, Ed., "Network File System | [RFC7530] Haynes, T., Ed. and D. Noveck, Ed., "Network File System | |||
| (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, | (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, | |||
| March 2015, <http://www.rfc-editor.org/info/rfc7530>. | March 2015, <http://www.rfc-editor.org/info/rfc7530>. | |||
| [RFC7862] Haynes, T., "Network File System (NFS) Version 4 Minor | ||||
| Version 2 Protocol", RFC 7862, DOI 10.17487/RFC7862, | ||||
| November 2016, <http://www.rfc-editor.org/info/rfc7862>. | ||||
| 8.2. Informative References | 8.2. Informative References | |||
| [I-D.ietf-nfsv4-versioning] | ||||
| Noveck, D., "Rules for NFSv4 Extensions and Minor | ||||
| Versions", draft-ietf-nfsv4-versioning-09 (work in | ||||
| progress), December 2016. | ||||
| [NSM] The Open Group, "Protocols for Interworking: XNFS, Version | [NSM] The Open Group, "Protocols for Interworking: XNFS, Version | |||
| 3W", February 1998. | 3W", February 1998. | |||
| [RFC1094] Nowicki, B., "NFS: Network File System Protocol | [RFC1094] Nowicki, B., "NFS: Network File System Protocol | |||
| specification", RFC 1094, DOI 10.17487/RFC1094, March | specification", RFC 1094, DOI 10.17487/RFC1094, March | |||
| 1989, <http://www.rfc-editor.org/info/rfc1094>. | 1989, <http://www.rfc-editor.org/info/rfc1094>. | |||
| [RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS | [RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS | |||
| Version 3 Protocol Specification", RFC 1813, | Version 3 Protocol Specification", RFC 1813, | |||
| DOI 10.17487/RFC1813, June 1995, | DOI 10.17487/RFC1813, June 1995, | |||
| skipping to change at page 16, line 14 ¶ | skipping to change at page 17, line 11 ¶ | |||
| Technical corrections have been made. For example, mentions of | Technical corrections have been made. For example, mentions of | |||
| 12KB and 36KB inline thresholds have been removed. The reference to | 12KB and 36KB inline thresholds have been removed. The reference to | |||
| a nonexistent NFS version 4 SYMLINK operation has been replaced with | a nonexistent NFS version 4 SYMLINK operation has been replaced with | |||
| NFS version 4 CREATE(NF4LNK). | NFS version 4 CREATE(NF4LNK). | |||
| The discussion of NFS version 4 COMPOUND handling has been completed. | The discussion of NFS version 4 COMPOUND handling has been completed. | |||
| Some changes were made to the algorithm for matching DDP-eligible | Some changes were made to the algorithm for matching DDP-eligible | |||
| results to Write chunks. | results to Write chunks. | |||
| Requirements to ignore extra Read or Write chunks have been removed | ||||
| from the NFS version 2 and 3 Upper Layer Binding, as they conflict | ||||
| with [I-D.ietf-nfsv4-rfc5666bis]. | ||||
| A complete discussion of reply size estimation has been introduced | ||||
| for all protocols covered by the Upper Layer Bindings in this | ||||
| document. | ||||
| The following additional improvements have been made, relative to | The following additional improvements have been made, relative to | |||
| [RFC5667]: | [RFC5667]: | |||
| o An explicit discussion of NFS version 4.0 and NFS version 4.1 | o An explicit discussion of NFS version 4.0 and NFS version 4.1 | |||
| backchannel operation has replaced the previous treatment of | backchannel operation has replaced the previous treatment of | |||
| callback operations. | callback operations. | |||
| o A binding for NFS version 4.2 has been added that includes | o A binding for NFS version 4.2 has been added that includes | |||
| discussion of new data-bearing operations like READ_PLUS. | discussion of new data-bearing operations like READ_PLUS. | |||
| skipping to change at page 16, line 50 ¶ | skipping to change at page 18, line 7 ¶ | |||
| Appendix B. Acknowledgments | Appendix B. Acknowledgments | |||
| The author gratefully acknowledges the work of Brent Callaghan and | The author gratefully acknowledges the work of Brent Callaghan and | |||
| Tom Talpey on the original NFS Direct Data Placement specification | Tom Talpey on the original NFS Direct Data Placement specification | |||
| [RFC5667]. The author also wishes to thank Bill Baker and Greg | [RFC5667]. The author also wishes to thank Bill Baker and Greg | |||
| Marsden for their support of this work. | Marsden for their support of this work. | |||
| Dave Noveck provided excellent review, constructive suggestions, and | Dave Noveck provided excellent review, constructive suggestions, and | |||
| consistent navigational guidance throughout the process of drafting | consistent navigational guidance throughout the process of drafting | |||
| this document. Dave also contributed the text of Section 4.1.1. | this document. Dave also contributed the text of Section 4.5. | |||
| Thanks to Karen Deitke for her sharp observations about idempotency, | Thanks to Karen Deitke for her sharp observations about idempotency, | |||
| and the clarity of the discussion of NFS COMPOUNDs. | and the clarity of the discussion of NFS COMPOUNDs. | |||
| Special thanks go to Transport Area Director Spencer Dawkins, nfsv4 | Special thanks go to Transport Area Director Spencer Dawkins, nfsv4 | |||
| Working Group Chair Spencer Shepler, and nfsv4 Working Group | Working Group Chair Spencer Shepler, and nfsv4 Working Group | |||
| Secretary Thomas Haynes for their support. | Secretary Thomas Haynes for their support. | |||
| Author's Address | Author's Address | |||
| Charles Lever (editor) | Charles Lever (editor) | |||
| Oracle Corporation | Oracle Corporation | |||
| 1015 Granger Avenue | 1015 Granger Avenue | |||
| Ann Arbor, MI 48104 | Ann Arbor, MI 48104 | |||
| USA | USA | |||
| Phone: +1 734 274 2396 | Phone: +1 248 816 6463 | |||
| Email: chuck.lever@oracle.com | Email: chuck.lever@oracle.com | |||
| End of changes. 91 change blocks. | ||||
| 320 lines changed or deleted | 378 lines changed or added | |||