| < draft-ietf-nfsv4-rfc5667bis-06.txt | draft-ietf-nfsv4-rfc5667bis-07.txt > | |||
|---|---|---|---|---|
| Network File System Version 4 C. Lever, Ed. | Network File System Version 4 C. Lever, Ed. | |||
| Internet-Draft Oracle | Internet-Draft Oracle | |||
| Obsoletes: 5667 (if approved) February 24, 2017 | Obsoletes: 5667 (if approved) March 9, 2017 | |||
| Intended status: Standards Track | Intended status: Standards Track | |||
| Expires: August 28, 2017 | Expires: September 10, 2017 | |||
| Network File System (NFS) Upper Layer Binding To RPC-Over-RDMA Version | Network File System (NFS) Upper Layer Binding To RPC-Over-RDMA Version | |||
| One | One | |||
| draft-ietf-nfsv4-rfc5667bis-06 | draft-ietf-nfsv4-rfc5667bis-07 | |||
| Abstract | Abstract | |||
| This document specifies Upper Layer Bindings of Network File System | This document specifies Upper Layer Bindings of Network File System | |||
| (NFS) protocol versions to RPC-over-RDMA Version One. Upper Layer | (NFS) protocol versions to RPC-over-RDMA Version One, enabling the | |||
| Bindings are required in order to enable RPC-based protocols such as | use of Direct Data Placement. This document obsoletes RFC 5667. | |||
| NFS to use Direct Data Placement on RPC-over-RDMA Version One. This | ||||
| document obsoletes RFC 5667. | ||||
| Requirements Language | Requirements Language | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in [RFC2119]. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| skipping to change at page 1, line 42 ¶ | skipping to change at page 1, line 40 ¶ | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on August 28, 2017. | This Internet-Draft will expire on September 10, 2017. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2017 IETF Trust and the persons identified as the | Copyright (c) 2017 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| skipping to change at page 2, line 38 ¶ | skipping to change at page 2, line 34 ¶ | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. Reply Size Estimation . . . . . . . . . . . . . . . . . . . . 3 | 2. Reply Size Estimation . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2.1. Short Reply Chunk Retry . . . . . . . . . . . . . . . . . 4 | 2.1. Short Reply Chunk Retry . . . . . . . . . . . . . . . . . 4 | |||
| 3. Upper Layer Binding for NFS Versions 2 and 3 . . . . . . . . 5 | 3. Upper Layer Binding for NFS Versions 2 and 3 . . . . . . . . 5 | |||
| 3.1. Reply Size Estimation . . . . . . . . . . . . . . . . . . 5 | 3.1. Reply Size Estimation . . . . . . . . . . . . . . . . . . 5 | |||
| 3.2. RPC Binding Considerations . . . . . . . . . . . . . . . 5 | 3.2. RPC Binding Considerations . . . . . . . . . . . . . . . 5 | |||
| 4. Upper Layer Bindings for NFS Version 2 and 3 Auxiliary | 4. Upper Layer Bindings for NFS Version 2 and 3 Auxiliary | |||
| Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 4.1. MOUNT, NLM, and NSM Protocols . . . . . . . . . . . . . . 6 | 4.1. MOUNT, NLM, and NSM Protocols . . . . . . . . . . . . . . 6 | |||
| 4.2. NFSACL Protocol . . . . . . . . . . . . . . . . . . . . . 7 | 4.2. NFSACL Protocol . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 5. Upper Layer Binding For NFS Version 4 . . . . . . . . . . . . 7 | 5. Upper Layer Binding For NFS Version 4 . . . . . . . . . . . . 7 | |||
| 5.1. DDP-Eligibility . . . . . . . . . . . . . . . . . . . . . 7 | 5.1. DDP-Eligibility . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 5.2. Reply Size Estimation . . . . . . . . . . . . . . . . . . 8 | 5.2. Reply Size Estimation . . . . . . . . . . . . . . . . . . 8 | |||
| 5.3. RPC Binding Considerations . . . . . . . . . . . . . . . 9 | 5.3. RPC Binding Considerations . . . . . . . . . . . . . . . 9 | |||
| 5.4. NFS COMPOUND Requests . . . . . . . . . . . . . . . . . . 10 | 5.4. NFS COMPOUND Requests . . . . . . . . . . . . . . . . . . 10 | |||
| 5.5. NFS Callback Requests . . . . . . . . . . . . . . . . . . 11 | 5.5. NFS Callback Requests . . . . . . . . . . . . . . . . . . 11 | |||
| 5.6. Session-Related Considerations . . . . . . . . . . . . . 12 | 5.6. Session-Related Considerations . . . . . . . . . . . . . 12 | |||
| 5.7. Transport Considerations . . . . . . . . . . . . . . . . 13 | 5.7. Transport Considerations . . . . . . . . . . . . . . . . 13 | |||
| 6. Extending NFS Upper Layer Bindings . . . . . . . . . . . . . 14 | 6. Extending NFS Upper Layer Bindings . . . . . . . . . . . . . 14 | |||
| 7. Security Considerations . . . . . . . . . . . . . . . . . . . 14 | 7. Security Considerations . . . . . . . . . . . . . . . . . . . 14 | |||
| 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15 | 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 | |||
| 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 15 | 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 15 | |||
| 9.1. Normative References . . . . . . . . . . . . . . . . . . 15 | 9.1. Normative References . . . . . . . . . . . . . . . . . . 15 | |||
| 9.2. Informative References . . . . . . . . . . . . . . . . . 16 | 9.2. Informative References . . . . . . . . . . . . . . . . . 16 | |||
| Appendix A. Changes Since RFC 5667 . . . . . . . . . . . . . . . 17 | Appendix A. Changes Since RFC 5667 . . . . . . . . . . . . . . . 17 | |||
| Appendix B. Acknowledgments . . . . . . . . . . . . . . . . . . 18 | Appendix B. Acknowledgments . . . . . . . . . . . . . . . . . . 18 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 19 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 18 | |||
| 1. Introduction | 1. Introduction | |||
| An RPC-over-RDMA Version One transport may employ direct data | The RPC-over-RDMA Version One transport may employ direct data | |||
| placement to convey certain data payloads associated with RPC | placement to convey data payloads associated with RPC transactions | |||
| transactions [I-D.ietf-nfsv4-rfc5666bis]. To enable successful | [I-D.ietf-nfsv4-rfc5666bis]. To enable successful interoperation, | |||
| interoperation, implementations of RPC Programs running on RPC-over- | RPC client and server implementations using RPC-over-RDMA Version One | |||
| RDMA must agree as to which XDR data items in what particular RPC | must agree which XDR data items and RPC procedures are eligible to | |||
| procedures are eligible for direct data placement (DDP). This | use direct data placement (DDP). | |||
| agreement is specified in an Upper Layer Binding. | ||||
| An Upper Layer Binding specifies this agreement for one RPC Program. | ||||
| Other operational details, such as RPC binding assignments, pairing | ||||
| Write chunks with result data items, and reply size estimation, are | ||||
| also specified by this Binding. | ||||
| This document contains material required of Upper Layer Bindings, as | This document contains material required of Upper Layer Bindings, as | |||
| specified in [I-D.ietf-nfsv4-rfc5666bis], for the following NFS | specified in [I-D.ietf-nfsv4-rfc5666bis], for the following NFS | |||
| protocol versions: | protocol versions: | |||
| o NFS Version 2 [RFC1094] | o NFS Version 2 [RFC1094] | |||
| o NFS Version 3 [RFC1813] | o NFS Version 3 [RFC1813] | |||
| o NFS Version 4.0 [RFC7530] | o NFS Version 4.0 [RFC7530] | |||
| o NFS Version 4.1 [RFC5661] | o NFS Version 4.1 [RFC5661] | |||
| o NFS Version 4.2 [RFC7862] | o NFS Version 4.2 [RFC7862] | |||
| Upper Layer Bindings are also provided for auxiliary protocols used | ||||
| with NFS versions 2 and 3. | ||||
| This document assumes the reader is already familiar with concepts | This document assumes the reader is already familiar with concepts | |||
| and terminology defined in [I-D.ietf-nfsv4-rfc5666bis] and the | and terminology defined in [I-D.ietf-nfsv4-rfc5666bis] and the | |||
| documents it references. | documents it references. | |||
| 2. Reply Size Estimation | 2. Reply Size Estimation | |||
| On an RPC-over-RDMA Version One transport, during the construction of | During the construction of each RPC Call message, a requester is | |||
| each RPC Call message, a requester is responsible for allocating | responsible for allocating appropriate resources for receiving the | |||
| appropriate resources for receiving the matching Reply message. | corresponding Reply message. If the requester expects the RPC Reply | |||
| message will be larger than its inline threshold, it provides Write | ||||
| and/or Reply chunks wherein the responder can place results and the | ||||
| reply's Payload stream. | ||||
| An overrun of these resources can result in corruption of the Reply | A reply resource overrun occurs if the RPC Reply Payload stream does | |||
| message or termination of the transport connection. Therefore | not fit into the provided Reply chunk, or no Reply chunk was provided | |||
| reliable reply size estimation is necessary to ensure successful | and the Payload stream does not fit inline. This prevents the | |||
| interoperation. This is particularly critical, for example, when | responder from returning the Upper Layer reply to the requester. | |||
| allocating a Reply chunk. | ||||
| Therefore reliable reply size estimation is necessary to ensure | ||||
| successful interoperation. | ||||
| In most cases, the NFS protocol's XDR definition provides enough | In most cases, the NFS protocol's XDR definition provides enough | |||
| information to enable an NFS client to predict the maximum size of | information to enable an NFS client to predict the maximum size of | |||
| the expected Reply message. If there are variable-size data items in | the expected Reply message. If there are variable-size data items in | |||
| the result, the maximum size of the RPC Reply message can be | the result, the maximum size of the RPC Reply message can be | |||
| estimated as follows: | estimated as follows: | |||
| o The client requests only a specific portion of an object (for | o The client requests only a specific portion of an object (for | |||
| example, using the "count" and "offset" fields in an NFS READ). | example, using the "count" and "offset" fields in an NFS READ). | |||
| o The client limits the number of results (e.g. using the "count" | ||||
| field of an NFS READDIR request). | ||||
| o The client has already cached the size of the whole object it is | o The client has already cached the size of the whole object it is | |||
| about to request (say, via a previous NFS GETATTR request). | about to request (say, via a previous NFS GETATTR request). | |||
| o The client and server have negotiated a maximum size for all calls | o The client and server have negotiated a maximum size for all calls | |||
| and responses (using a CREATE_SESSION operation, for instance). | and responses (using a CREATE_SESSION operation, for instance). | |||
| 2.1. Short Reply Chunk Retry | 2.1. Short Reply Chunk Retry | |||
| In a few cases, either the size of one or more returned data items or | In a few cases, either the size of one or more returned data items or | |||
| the number of returned data items cannot be known in advance of | the number of returned data items cannot be known in advance of | |||
| forming an RPC Call. | forming an RPC Call. | |||
| A requester uses a Reply chunk to handle an RPC transaction where the | If an NFS server finds that the NFS client provided inadequate | |||
| expected RPC Reply message might be larger than the requester's | receive resources to return the whole reply, it returns an RPC level | |||
| inline threshold. If an actual RPC Reply message does not fit in a | error or a transport error, such as ERR_CHUNK. | |||
| client-provided Reply chunk, the NFS server responds with an | ||||
| RDMA_ERROR message with the rdma_err field set to ERR_CHUNK, or it | ||||
| could even break the transport connection. | ||||
| In response, an NFS client can choose to: | In response to these errors, an NFS client can choose to: | |||
| o Terminate the RPC transaction with an error, or | o Terminate the RPC transaction immediately with an error, or | |||
| o Allocate a larger Reply chunk and send the same request as a new | o Allocate a larger Reply chunk and send the same request as a new | |||
| RPC transaction (to avoid a hit in a Duplicate Reply Cache). | RPC transaction (to avoid a hit in a Duplicate Reply Cache). | |||
| The NFS client should avoid retrying the request indefinitely | The NFS client should avoid retrying the request indefinitely | |||
| because a responder may return ERR_CHUNK for a variety of reasons. | because a responder may return ERR_CHUNK for a variety of reasons. | |||
| The latter choice is considered heroic recovery, and is only a real | ||||
| choice for the few operations where it is not possible for an NFS | ||||
| client to predict the size of the Reply message in advance. | ||||
| Subsequent sections of this document discuss exactly which operations | Subsequent sections of this document discuss exactly which operations | |||
| might have ultimate difficulty with Reply size estimation. These | might have ultimate difficulty with Reply size estimation. These | |||
| operations are eligible for "short Reply chunk retry." Unless | operations are eligible for "short Reply chunk retry." Unless | |||
| explicitly mentioned as applicable, short Reply chunk retry should | explicitly mentioned as applicable, short Reply chunk retry should | |||
| not be used. | not be used. | |||
| NFS server implementations can avoid connection loss by first | ||||
| confirming that target RDMA segments are large enough to receive | ||||
| results before initiating explicit RDMA operations. | ||||
| 3. Upper Layer Binding for NFS Versions 2 and 3 | 3. Upper Layer Binding for NFS Versions 2 and 3 | |||
| The Upper Layer Binding specification in this section applies to NFS | The Upper Layer Binding specification in this section applies to NFS | |||
| Version 2 [RFC1094] and NFS Version 3 [RFC1813]. For brevity, in | Version 2 [RFC1094] and NFS Version 3 [RFC1813]. For brevity, in | |||
| this document a "Legacy NFS client" refers to an NFS client using the | this document a "Legacy NFS client" refers to an NFS client using the | |||
| NFS version 2 or NFS version 3 RPC Programs (100003) to communicate | NFS version 2 or NFS version 3 RPC Programs (100003) to communicate | |||
| with an NFS server. Likewise, a "Legacy NFS server" is an NFS server | with an NFS server. Likewise, a "Legacy NFS server" is an NFS server | |||
| communicating with clients using NFS version 2 or NFS version 3. | communicating with clients using NFS version 2 or NFS version 3. | |||
| The following XDR data items in NFS versions 2 and 3 are DDP- | The following XDR data items in NFS versions 2 and 3 are DDP- | |||
| skipping to change at page 5, line 28 ¶ | skipping to change at page 5, line 32 ¶ | |||
| o The pathname argument in the NFS SYMLINK procedure | o The pathname argument in the NFS SYMLINK procedure | |||
| o The opaque file data result in the NFS READ procedure | o The opaque file data result in the NFS READ procedure | |||
| o The pathname result in the NFS READLINK procedure | o The pathname result in the NFS READLINK procedure | |||
| All other argument or result data items in NFS versions 2 and 3 are | All other argument or result data items in NFS versions 2 and 3 are | |||
| not DDP-eligible. | not DDP-eligible. | |||
| A Legacy NFS client MUST NOT send a reduced Payload stream in a Long | A transport error does not give an indication of whether the server | |||
| Call. A Legacy NFS client MUST NOT enable a Legacy NFS server to | has processed the arguments of the RPC Call, or whether the server | |||
| send a reduced Payload stream in a Long Reply. | has accessed or modified client memory associated with that RPC. | |||
| A Legacy server's response to a DDP-eligibility violation does not | ||||
| give an indication to Legacy clients of whether the server has | ||||
| processed the arguments of the RPC Call, or whether the server has | ||||
| accessed or modified client memory associated with that RPC. | ||||
| 3.1. Reply Size Estimation | 3.1. Reply Size Estimation | |||
| A Legacy NFS client determines the maximum reply size for each | A Legacy NFS client determines the maximum reply size for each | |||
| operation using the criteria outlined in Section 2. There are no | operation using the criteria outlined in Section 2. There are no | |||
| operations in NFS version 2 or 3 that benefit from short Reply chunk | operations in NFS version 2 or 3 that benefit from short Reply chunk | |||
| retry. | retry. | |||
| 3.2. RPC Binding Considerations | 3.2. RPC Binding Considerations | |||
| skipping to change at page 6, line 16 ¶ | skipping to change at page 6, line 15 ¶ | |||
| [I-D.ietf-nfsv4-rfc5666bis]. | [I-D.ietf-nfsv4-rfc5666bis]. | |||
| 4. Upper Layer Bindings for NFS Version 2 and 3 Auxiliary Protocols | 4. Upper Layer Bindings for NFS Version 2 and 3 Auxiliary Protocols | |||
| NFS versions 2 and 3 are typically deployed with several other | NFS versions 2 and 3 are typically deployed with several other | |||
| protocols, sometimes referred to as "NFS auxiliary protocols." These | protocols, sometimes referred to as "NFS auxiliary protocols." These | |||
| are distinct RPC Programs that define procedures which are not part | are distinct RPC Programs that define procedures which are not part | |||
| of the NFS version 2 or version 3 RPC Programs. The Upper Layer | of the NFS version 2 or version 3 RPC Programs. The Upper Layer | |||
| Bindings in this section apply to: | Bindings in this section apply to: | |||
| o The MOUNT and NLM protocols, introduced in an appendix of | o Versions 2 and 3 of the MOUNT protocol [RFC1813] | |||
| [RFC1813] | ||||
| o The NSM protocol, described in Chapter 11 of [NSM] | o Versions 1, 3, and 4 of the NLM protocol [RFC1813] | |||
| o The NFSACL protocol, which does not have a public definition. | o Version 1 of the NSM protocol, described in Chapter 11 of [XNFS] | |||
| NFSACL is treated in this document as a de facto standard, as | ||||
| there are several interoperating implementations. | ||||
| RPC-over-RDMA Version One considers these RPC Programs as separate | o Version 1 of the NFSACL protocol, which does not have a public | |||
| Upper Layer Protocols [I-D.ietf-nfsv4-rfc5666bis]. Therefore a | definition. NFSACL is treated in this document as a de facto | |||
| separate Upper Layer Binding, provided here, is required for each of | standard, as there are several interoperating implementations. | |||
| these. | ||||
| 4.1. MOUNT, NLM, and NSM Protocols | 4.1. MOUNT, NLM, and NSM Protocols | |||
| Typically MOUNT, NLM, and NSM are conveyed via TCP, even in | Typically MOUNT, NLM, and NSM are conveyed via TCP, even in | |||
| deployments where the NFS RPC Program operates on RPC-over-RDMA | deployments where the NFS RPC Program operates on RPC-over-RDMA | |||
| Version One. When a Legacy server supports these RPC Programs on | Version One. | |||
| RPC-over-RDMA Version One, it advertises the port address via the | ||||
| usual rpcbind service [RFC1833]. | ||||
| No operation in these protocols conveys a significant data payload, | No XDR data item in these protocols is DDP-eligible, therefore a | |||
| and the size of RPC messages in these protocols is uniformly small. | special port assignment for operation on RPC-over-RDMA is not | |||
| Therefore, no XDR data items in these protocols are DDP-eligible. | necessary. When a Legacy server supports these RPC Programs on RPC- | |||
| over-RDMA Version One, it advertises an arbitrarily-chosen service | ||||
| port address via the rpcbind service [RFC1833]. | ||||
| The largest variable-length XDR data item is an xdr_netobj. In most | The largest variable-length XDR data items in these protocols are | |||
| implementations this data item is never larger than 1024 bytes, | defined in [XNFS]: LM_MAXSTRLEN is 1024 bytes, LM_MAXNAMELEN is | |||
| making reliable reply size estimation straightforward using the | LM_MAXSTRLEN + 1, and MAXNETOBJ_SZ is 1024 bytes. Reply size | |||
| criteria outlined in Section 2. There are no operations in these | estimation for these protocols uses the criteria outlined in | |||
| protocols that benefit from short Reply chunk retry. | Section 2. There are no operations in these protocols that benefit | |||
| from short Reply chunk retry. | ||||
| 4.2. NFSACL Protocol | 4.2. NFSACL Protocol | |||
| Legacy clients and servers that support the NFSACL RPC Program | Legacy clients and servers that support the NFSACL RPC Program | |||
| typically convey NFSACL procedures on the same connection as NFS RPC | typically convey NFSACL procedures on the same connection as NFS RPC | |||
| Programs. This obviates the need for separate rpcbind queries to | Programs. This obviates the need for separate rpcbind queries to | |||
| discover server support for this RPC Program. | discover server support for this RPC Program. | |||
| ACLs are typically small, but even large ACLs must be encoded and | ACLs are typically small, but even large ACLs must be encoded and | |||
| decoded to some degree. Thus no data item in this Upper Layer | decoded to some degree. Thus no data item in this Upper Layer | |||
| skipping to change at page 9, line 4 ¶ | skipping to change at page 8, line 45 ¶ | |||
| items whose maximum size cannot be estimated by clients reliably | items whose maximum size cannot be estimated by clients reliably | |||
| because there is no protocol-specified size limit on these arrays. | because there is no protocol-specified size limit on these arrays. | |||
| These include: | These include: | |||
| o The attrlist4 field | o The attrlist4 field | |||
| o Fields containing ACLs such as fattr4_acl, fattr4_dacl, | o Fields containing ACLs such as fattr4_acl, fattr4_dacl, | |||
| fattr4_sacl | fattr4_sacl | |||
| o Fields in the fs_locations4 and fs_locations_info4 data structures | o Fields in the fs_locations4 and fs_locations_info4 data structures | |||
| o Fields opaque to the NFS version 4 protocol which pertain to pNFS | o Fields opaque to the NFS version 4 protocol which pertain to pNFS | |||
| layout metadata, such as loc_body, loh_body, da_addr_body, | layout metadata, such as loc_body, loh_body, da_addr_body, | |||
| lou_body, lrf_body, fattr_layout_types and fs_layout_types | lou_body, lrf_body, fattr_layout_types and fs_layout_types | |||
| 5.2.1. Reply Size Estimation for Minor Version 0 | 5.2.1. Reply Size Estimation for Minor Version 0 | |||
| The NFS version 4.0 protocol itself does not impose any bound on the | The NFS version 4.0 protocol itself does not impose any bound on the | |||
| size of NFS calls or responses. | size of NFS calls or responses. | |||
| Some of the data items enumerated in Section 5.2 (in particular, the | Some of the data items enumerated in Section 5.2 (in particular, the | |||
| items related to ACLs and fs_locations) make it difficult to predict | items related to ACLs and fs_locations) make it difficult to predict | |||
| the maximum size of NFS version 4.0 replies that interrogate | the maximum size of NFS version 4.0 replies that interrogate | |||
| variable-length fattr4 attributes. As discussed in Section 2, client | variable-length fattr4 attributes. Client implementations might rely | |||
| implementations can rely on their own internal architectural limits | on their own internal architectural limits to constrain the reply | |||
| to constrain the reply size, but such limits are not always | size, but such limits are not always guaranteed to be reliable. | |||
| guaranteed to be reliable. | ||||
| When an especially large fattr4 result is expected, a Reply chunk | When an especially large fattr4 result is expected, a Reply chunk | |||
| might be required. An NFS version 4.0 client can use short Reply | might be required. An NFS version 4.0 client can use short Reply | |||
| chunk retry when an NFS COMPOUND containing a GETATTR operation | chunk retry when an NFS COMPOUND containing a GETATTR operation | |||
| encounters a transport error. | encounters a transport error. | |||
| The use of NFS COMPOUND operations raises the possibility of requests | The use of NFS COMPOUND operations raises the possibility of requests | |||
| that combine a non-idempotent operation (e.g. WRITE) with a GETATTR | that combine a non-idempotent operation (e.g. RENAME) with a GETATTR | |||
| operation that requests one or more variable-length results. This | operation that requests one or more variable-length results. This | |||
| combination should be avoided by ensuring that any GETATTR operation | combination should be avoided by ensuring that any GETATTR operation | |||
| that requests a result of unpredictable length is sent in an NFS | that requests a result of unpredictable length is sent in an NFS | |||
| COMPOUND by itself. | COMPOUND by itself. | |||
| 5.2.2. Reply Size Estimation for Minor Version 1 and Newer | 5.2.2. Reply Size Estimation for Minor Version 1 and Newer | |||
| In NFS version 4.1 and newer minor versions, the csa_fore_chan_attrs | In NFS version 4.1 and newer minor versions, the csa_fore_chan_attrs | |||
| argument of the CREATE_SESSION operation contains a | argument of the CREATE_SESSION operation contains a | |||
| ca_maxresponsesize field. The value in this field can be taken as | ca_maxresponsesize field. The value in this field can be taken as | |||
| skipping to change at page 10, line 13 ¶ | skipping to change at page 10, line 7 ¶ | |||
| they are not required to register with an rpcbind service [RFC7530]. | they are not required to register with an rpcbind service [RFC7530]. | |||
| Therefore, an NFS version 4 server supporting RPC-over-RDMA Version | Therefore, an NFS version 4 server supporting RPC-over-RDMA Version | |||
| One MUST use the alternative well-known port number for its RPC-over- | One MUST use the alternative well-known port number for its RPC-over- | |||
| RDMA service (see Section 8). Clients SHOULD connect to this well- | RDMA service (see Section 8). Clients SHOULD connect to this well- | |||
| known port without consulting the RPC portmapper (as for NFS version | known port without consulting the RPC portmapper (as for NFS version | |||
| 4 on TCP transports). | 4 on TCP transports). | |||
| 5.4. NFS COMPOUND Requests | 5.4. NFS COMPOUND Requests | |||
| 5.4.1. Long Calls and Replies | 5.4.1. Multiple DDP-eligible Data Items | |||
| Each NFS version 4 COMPOUND procedure contains an array of operations | ||||
| which may be larger than a connection's inline thresholds, even after | ||||
| reduction of DDP-eligible payloads. Therefore, an NFS version 4 | ||||
| client MAY send a reduced Payload stream in a Long Call. An NFS | ||||
| version 4 client MAY enable an NFS version 4 server to send a reduced | ||||
| Payload stream in a Long Reply. | ||||
| 5.4.2. Multiple DDP-eligible Data Items | ||||
| The NFS version 4 COMPOUND procedure allows the transmission of more | An NFS version 4 COMPOUND procedure can contain more than one | |||
| than one DDP-eligible data item per Call and Reply message. An NFS | operation that carries a DDP-eligible data item. An NFS version 4 | |||
| version 4 client provides XDR Position values in each Read chunk to | client provides XDR Position values in each Read chunk to | |||
| disambiguate which chunk is associated with which argument data item. | disambiguate which chunk is associated with which argument data item. | |||
| However NFS version 4 server and client implementations must agree in | However NFS version 4 server and client implementations must agree in | |||
| advance on how to pair Write chunks with returned result data items. | advance on how to pair Write chunks with returned result data items. | |||
| The mechanism specified in Section 4.3.2 of | ||||
| [I-D.ietf-nfsv4-rfc5666bis] is applied here, with additional | ||||
| restrictions that appear below. | ||||
| In the following list, an "NFS Read" operation refers to any NFS | In the following list, an "NFS Read" operation refers to any NFS | |||
| Version 4 operation which has a DDP-eligible result data item (i.e., | Version 4 operation which has a DDP-eligible result data item (i.e., | |||
| either a READ, READ_PLUS, or READLINK operation). | either a READ, READ_PLUS, or READLINK operation). The mechanism | |||
| specified in Section 4.3.2 of [I-D.ietf-nfsv4-rfc5666bis] is applied | ||||
| to this class of operations: | ||||
| o If an NFS version 4 client wishes all DDP-eligible items in an NFS | o If an NFS version 4 client wishes all DDP-eligible items in an NFS | |||
| reply to be conveyed inline, it leaves the Write list empty. | reply to be conveyed inline, it leaves the Write list empty. | |||
| o The first chunk in the Write list MUST be used by the first READ | o The first chunk in the Write list MUST be used by the first READ | |||
| operation in an NFS version 4 COMPOUND procedure. The next Write | operation in an NFS version 4 COMPOUND procedure. The next Write | |||
| chunk is used by the next READ operation, and so on. | chunk is used by the next READ operation, and so on. | |||
| o If an NFS version 4 client has provided a matching non-empty Write | o If an NFS version 4 client has provided a matching non-empty Write | |||
| chunk, then the corresponding READ operation MUST return its DDP- | chunk, then the corresponding READ operation MUST return its DDP- | |||
| skipping to change at page 11, line 14 ¶ | skipping to change at page 10, line 46 ¶ | |||
| o If a READ operation returns a union arm which does not contain a | o If a READ operation returns a union arm which does not contain a | |||
| DDP-eligible result, and the NFS version 4 client has provided a | DDP-eligible result, and the NFS version 4 client has provided a | |||
| matching non-empty Write chunk, an NFS version 4 server MUST | matching non-empty Write chunk, an NFS version 4 server MUST | |||
| return an empty Write chunk in that Write list position. | return an empty Write chunk in that Write list position. | |||
| o If there are more READ operations than Write chunks, then | o If there are more READ operations than Write chunks, then | |||
| remaining NFS Read operations in an NFS version 4 COMPOUND that | remaining NFS Read operations in an NFS version 4 COMPOUND that | |||
| have no matching Write chunk MUST return their results inline. | have no matching Write chunk MUST return their results inline. | |||
| 5.4.3. NFS Version 4 COMPOUND Example | 5.4.2. NFS Version 4 COMPOUND Example | |||
| The following example shows a Write list with three Write chunks, A, | The following example shows a Write list with three Write chunks, A, | |||
| B, and C. The NFS version 4 server consumes the provided Write | B, and C. The NFS version 4 server consumes the provided Write | |||
| chunks by writing the results of the designated operations in the | chunks by writing the results of the designated operations in the | |||
| compound request (READ and READLINK) back to each chunk. | compound request (READ and READLINK) back to each chunk. | |||
| Write list: | Write list: | |||
| A --> B --> C | A --> B --> C | |||
| skipping to change at page 12, line 18 ¶ | skipping to change at page 11, line 49 ¶ | |||
| An NFS version 4.0 client advertises netids and ad hoc port addresses | An NFS version 4.0 client advertises netids and ad hoc port addresses | |||
| for contacting its NFS version 4.0 callback service using the | for contacting its NFS version 4.0 callback service using the | |||
| SETCLIENTID operation. | SETCLIENTID operation. | |||
| 5.5.2. NFS Version 4.1 Callback | 5.5.2. NFS Version 4.1 Callback | |||
| In NFS version 4.1 and newer minor versions, callback operations may | In NFS version 4.1 and newer minor versions, callback operations may | |||
| appear on the same connection as is used for NFS version 4 forward | appear on the same connection as is used for NFS version 4 forward | |||
| channel client requests. NFS version 4 clients and servers MUST use | channel client requests. NFS version 4 clients and servers MUST use | |||
| the mechanism described in [I-D.ietf-nfsv4-rpcrdma-bidirection] when | the approach described in [I-D.ietf-nfsv4-rpcrdma-bidirection] when | |||
| backchannel operations are conveyed on RPC-over-RDMA Version One | backchannel operations are conveyed on RPC-over-RDMA Version One | |||
| transports. | transports. | |||
| The csa_back_chan_attrs argument of the CREATE_SESSION operation | The csa_back_chan_attrs argument of the CREATE_SESSION operation | |||
| contains a ca_maxresponsesize field. The value in this field can be | contains a ca_maxresponsesize field. The value in this field can be | |||
| taken as the absolute maximum size of backchannel replies generated | taken as the absolute maximum size of backchannel replies generated | |||
| by a replying NFS version 4 client. | by a replying NFS version 4 client. | |||
| There are no DDP-eligible data items in callback procedures defined | There are no DDP-eligible data items in callback procedures defined | |||
| in NFS version 4.1 or NFS version 4.2. However, some callback | in NFS version 4.1 or NFS version 4.2. However, some callback | |||
| skipping to change at page 13, line 31 ¶ | skipping to change at page 13, line 17 ¶ | |||
| 5.7.1. Congestion Avoidance | 5.7.1. Congestion Avoidance | |||
| Section 3.1 of [RFC7530] states: | Section 3.1 of [RFC7530] states: | |||
| Where an NFSv4 implementation supports operation over the IP | Where an NFSv4 implementation supports operation over the IP | |||
| network protocol, the supported transport layer between NFS and IP | network protocol, the supported transport layer between NFS and IP | |||
| MUST be an IETF standardized transport protocol that is specified | MUST be an IETF standardized transport protocol that is specified | |||
| to avoid network congestion; such transports include TCP and the | to avoid network congestion; such transports include TCP and the | |||
| Stream Control Transmission Protocol (SCTP). | Stream Control Transmission Protocol (SCTP). | |||
| Section 2.9.1 of [RFC5661] further states: | Section 2.9.1 of [RFC5661] also states: | |||
| Even if NFSv4.1 is used over a non-IP network protocol, it is | Even if NFSv4.1 is used over a non-IP network protocol, it is | |||
| RECOMMENDED that the transport support congestion control. | RECOMMENDED that the transport support congestion control. | |||
| It is permissible for a connectionless transport to be used under | It is permissible for a connectionless transport to be used under | |||
| NFSv4.1; however, reliable and in-order delivery of data combined | NFSv4.1; however, reliable and in-order delivery of data combined | |||
| with congestion control by the connectionless transport is | with congestion control by the connectionless transport is | |||
| REQUIRED. As a consequence, UDP by itself MUST NOT be used as an | REQUIRED. As a consequence, UDP by itself MUST NOT be used as an | |||
| NFSv4.1 transport. | NFSv4.1 transport. | |||
| skipping to change at page 15, line 44 ¶ | skipping to change at page 15, line 32 ¶ | |||
| [I-D.ietf-nfsv4-rfc5666bis] | [I-D.ietf-nfsv4-rfc5666bis] | |||
| Lever, C., Simpson, W., and T. Talpey, "Remote Direct | Lever, C., Simpson, W., and T. Talpey, "Remote Direct | |||
| Memory Access Transport for Remote Procedure Call, Version | Memory Access Transport for Remote Procedure Call, Version | |||
| One", draft-ietf-nfsv4-rfc5666bis-10 (work in progress), | One", draft-ietf-nfsv4-rfc5666bis-10 (work in progress), | |||
| February 2017. | February 2017. | |||
| [I-D.ietf-nfsv4-rpcrdma-bidirection] | [I-D.ietf-nfsv4-rpcrdma-bidirection] | |||
| Lever, C., "Bi-directional Remote Procedure Call On RPC- | Lever, C., "Bi-directional Remote Procedure Call On RPC- | |||
| over-RDMA Transports", draft-ietf-nfsv4-rpcrdma- | over-RDMA Transports", draft-ietf-nfsv4-rpcrdma- | |||
| bidirection-07 (work in progress), February 2017. | bidirection-08 (work in progress), March 2017. | |||
| [RFC1833] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", | [RFC1833] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", | |||
| RFC 1833, DOI 10.17487/RFC1833, August 1995, | RFC 1833, DOI 10.17487/RFC1833, August 1995, | |||
| <http://www.rfc-editor.org/info/rfc1833>. | <http://www.rfc-editor.org/info/rfc1833>. | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <http://www.rfc-editor.org/info/rfc2119>. | <http://www.rfc-editor.org/info/rfc2119>. | |||
| skipping to change at page 16, line 41 ¶ | skipping to change at page 16, line 32 ¶ | |||
| Version 2 Protocol", RFC 7862, DOI 10.17487/RFC7862, | Version 2 Protocol", RFC 7862, DOI 10.17487/RFC7862, | |||
| November 2016, <http://www.rfc-editor.org/info/rfc7862>. | November 2016, <http://www.rfc-editor.org/info/rfc7862>. | |||
| 9.2. Informative References | 9.2. Informative References | |||
| [I-D.ietf-nfsv4-versioning] | [I-D.ietf-nfsv4-versioning] | |||
| Noveck, D., "Rules for NFSv4 Extensions and Minor | Noveck, D., "Rules for NFSv4 Extensions and Minor | |||
| Versions", draft-ietf-nfsv4-versioning-09 (work in | Versions", draft-ietf-nfsv4-versioning-09 (work in | |||
| progress), December 2016. | progress), December 2016. | |||
| [NSM] The Open Group, "Protocols for Interworking: XNFS, Version | ||||
| 3W", February 1998. | ||||
| [RFC1094] Nowicki, B., "NFS: Network File System Protocol | [RFC1094] Nowicki, B., "NFS: Network File System Protocol | |||
| specification", RFC 1094, DOI 10.17487/RFC1094, March | specification", RFC 1094, DOI 10.17487/RFC1094, March | |||
| 1989, <http://www.rfc-editor.org/info/rfc1094>. | 1989, <http://www.rfc-editor.org/info/rfc1094>. | |||
| [RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS | [RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS | |||
| Version 3 Protocol Specification", RFC 1813, | Version 3 Protocol Specification", RFC 1813, | |||
| DOI 10.17487/RFC1813, June 1995, | DOI 10.17487/RFC1813, June 1995, | |||
| <http://www.rfc-editor.org/info/rfc1813>. | <http://www.rfc-editor.org/info/rfc1813>. | |||
| [RFC5040] Recio, R., Metzler, B., Culley, P., Hilland, J., and D. | [RFC5040] Recio, R., Metzler, B., Culley, P., Hilland, J., and D. | |||
| skipping to change at page 17, line 24 ¶ | skipping to change at page 17, line 14 ¶ | |||
| [RFC5666] Talpey, T. and B. Callaghan, "Remote Direct Memory Access | [RFC5666] Talpey, T. and B. Callaghan, "Remote Direct Memory Access | |||
| Transport for Remote Procedure Call", RFC 5666, | Transport for Remote Procedure Call", RFC 5666, | |||
| DOI 10.17487/RFC5666, January 2010, | DOI 10.17487/RFC5666, January 2010, | |||
| <http://www.rfc-editor.org/info/rfc5666>. | <http://www.rfc-editor.org/info/rfc5666>. | |||
| [RFC5667] Talpey, T. and B. Callaghan, "Network File System (NFS) | [RFC5667] Talpey, T. and B. Callaghan, "Network File System (NFS) | |||
| Direct Data Placement", RFC 5667, DOI 10.17487/RFC5667, | Direct Data Placement", RFC 5667, DOI 10.17487/RFC5667, | |||
| January 2010, <http://www.rfc-editor.org/info/rfc5667>. | January 2010, <http://www.rfc-editor.org/info/rfc5667>. | |||
| [XNFS] The Open Group, "Protocols for Interworking: XNFS, Version | ||||
| 3W", February 1998. | ||||
| Appendix A. Changes Since RFC 5667 | Appendix A. Changes Since RFC 5667 | |||
| Corrections and updates made necessary by new language in | Corrections and updates made necessary by new language in | |||
| [I-D.ietf-nfsv4-rfc5666bis] have been introduced. For example, | [I-D.ietf-nfsv4-rfc5666bis] have been introduced. For example, | |||
| references to deprecated features of RPC-over-RDMA Version One, such | references to deprecated features of RPC-over-RDMA Version One, such | |||
| as RDMA_MSGP, and the use of the Read list for handling RPC replies, | as RDMA_MSGP, and the use of the Read list for handling RPC replies, | |||
| have been removed. The term "mapping" has been replaced with the | have been removed. The term "mapping" has been replaced with the | |||
| term "binding" or "Upper Layer Binding" throughout the document. | term "binding" or "Upper Layer Binding" throughout the document. | |||
| Some material that duplicates what is in [I-D.ietf-nfsv4-rfc5666bis] | Material that duplicates what is in [I-D.ietf-nfsv4-rfc5666bis] has | |||
| has been deleted. | been deleted. | |||
| Material required by [I-D.ietf-nfsv4-rfc5666bis] for Upper Layer | Material required by [I-D.ietf-nfsv4-rfc5666bis] for Upper Layer | |||
| Bindings that was not present in [RFC5667] has been added, including | Bindings that was not present in [RFC5667] has been added. A | |||
| discussion of how each NFS version properly estimates the maximum | complete discussion of reply size estimation has been introduced for | |||
| size of RPC replies. | all protocols covered by the Upper Layer Bindings in this document. | |||
| Technical corrections have been made. For example, the mention of | Technical corrections have been made. For example, the mention of | |||
| 12KB and 36KB inline thresholds has been removed. The reference to | 12KB and 36KB inline thresholds has been removed. The reference to | |||
| a nonexistent NFS version 4 SYMLINK operation has been replaced. | a nonexistent NFS version 4 SYMLINK operation has been replaced. | |||
| The discussion of NFS version 4 COMPOUND handling has been completed. | The discussion of NFS version 4 COMPOUND handling has been completed. | |||
| Some changes were made to the algorithm for matching DDP-eligible | Some changes were made to the algorithm for matching DDP-eligible | |||
| results to Write chunks. | results to Write chunks. | |||
| Requirements to ignore extra Read or Write chunks have been removed | Requirements to ignore extra Read or Write chunks have been removed | |||
| from the NFS version 2 and 3 Upper Layer Binding, as they conflict | from the NFS version 2 and 3 Upper Layer Binding, as they conflict | |||
| with [I-D.ietf-nfsv4-rfc5666bis]. | with [I-D.ietf-nfsv4-rfc5666bis]. | |||
| A complete discussion of reply size estimation has been introduced | ||||
| for all protocols covered by the Upper Layer Bindings in this | ||||
| document. | ||||
| A section discussing NFS version 4 retransmission and connection loss | A section discussing NFS version 4 retransmission and connection loss | |||
| has been added. | has been added. | |||
| The following additional improvements have been made, relative to | The following additional improvements have been made, relative to | |||
| [RFC5667]: | [RFC5667]: | |||
| o An explicit discussion of NFS version 4.0 and NFS version 4.1 | o An explicit discussion of NFS version 4.0 and NFS version 4.1 | |||
| backchannel operation has replaced the previous treatment of | backchannel operation has replaced the previous treatment of | |||
| callback operations. | callback operations. | |||
| o A binding for NFS version 4.2 has been added that includes | o A binding for NFS version 4.2 has been added that includes | |||
| discussion of new data-bearing operations like READ_PLUS. | discussion of new data-bearing operations like READ_PLUS. | |||
| o A section suggesting a mechanism for periodically assessing | o A section suggesting a mechanism for periodically assessing | |||
| connection health has been introduced. | connection health has been introduced. | |||
| o Language inconsistent with or contradictory to | ||||
| [I-D.ietf-nfsv4-rfc5666bis] has been removed from the present | ||||
| document. | ||||
| o Ambiguous or erroneous uses of RFC2119 terms have been corrected. | o Ambiguous or erroneous uses of RFC2119 terms have been corrected. | |||
| o References to obsolete RFCs have been updated. | o References to obsolete RFCs have been updated. | |||
| o An IANA Considerations Section has been added, which specifies the | o An IANA Considerations Section has been added, which specifies the | |||
| port assignments for NFS/RDMA. This replaces the example | port assignments for NFS/RDMA. This replaces the example | |||
| assignment that appeared in [RFC5666]. | assignment that appeared in [RFC5666]. | |||
| o Code excerpts have been removed, and figures have been modernized. | o Code excerpts have been removed, and figures have been modernized. | |||
| End of changes. 45 change blocks. | ||||
| 111 lines changed or deleted | 93 lines changed or added | |||
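
The Write list rules for NFS version 4 COMPOUND shown in the diff above (the bullets that pair Write chunks with NFS Read operations) can be sketched in a few lines of Python. This is an illustrative model only, not text from either draft. The names `ReadOp`, `match_write_chunks`, and `INLINE` are hypothetical, a chunk is modeled as a plain integer capacity with 0 standing for an empty chunk, and two points are assumptions rather than statements from the excerpt: that the truncated bullet ends with the result being returned in the matching chunk, and that an empty chunk supplied by the client means "return this result inline". Write chunks left over after the last NFS Read operation are not covered by the excerpt and are not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple

INLINE = "inline"  # hypothetical marker for a result conveyed inline


@dataclass
class ReadOp:
    """One NFS Read operation (READ, READ_PLUS, or READLINK) in a COMPOUND."""
    name: str
    # False when the returned union arm carries no DDP-eligible result,
    # for example when the operation returns an error status.
    has_ddp_result: bool


def match_write_chunks(ops: List[ReadOp],
                       write_list: List[int]) -> Tuple[List[str], List[int]]:
    """Pair NFS Read operations with Write chunks in list order.

    write_list holds chunk capacities in bytes; 0 models an empty chunk.
    Returns (placements, returned_chunks), where returned_chunks[i] is 0
    when the server hands back an empty chunk in position i.
    """
    placements: List[str] = []
    returned_chunks: List[int] = []
    for position, op in enumerate(ops):
        if position >= len(write_list):
            # More NFS Read operations than Write chunks: the rest go inline.
            placements.append(INLINE)
            continue
        capacity = write_list[position]
        if capacity > 0 and op.has_ddp_result:
            # Matching non-empty chunk: the result is placed in that chunk
            # (assumed reading of the truncated bullet).
            placements.append(f"chunk {position}")
            returned_chunks.append(capacity)
        else:
            # No DDP-eligible result, or an empty chunk from the client
            # (assumption): return an empty chunk in this position.
            placements.append(INLINE)
            returned_chunks.append(0)
    return placements, returned_chunks


if __name__ == "__main__":
    ops = [ReadOp("READ", True), ReadOp("READLINK", False),
           ReadOp("READ", True), ReadOp("READ", True)]
    print(match_write_chunks(ops, [4096, 1024, 8192]))
```

With three chunks and four operations, the first and third results land in chunks, the second position comes back as an empty chunk because its operation produced no DDP-eligible result, and the fourth result is returned inline because no chunk remains. An empty Write list sends every result inline.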