| < draft-ietf-nfsv4-rfc5667bis-05.txt | draft-ietf-nfsv4-rfc5667bis-06.txt > | |||
|---|---|---|---|---|
| Network File System Version 4 C. Lever, Ed. | Network File System Version 4 C. Lever, Ed. | |||
| Internet-Draft Oracle | Internet-Draft Oracle | |||
| Obsoletes: 5667 (if approved) February 3, 2017 | Obsoletes: 5667 (if approved) February 24, 2017 | |||
| Intended status: Standards Track | Intended status: Standards Track | |||
| Expires: August 7, 2017 | Expires: August 28, 2017 | |||
| Network File System (NFS) Upper Layer Binding To RPC-Over-RDMA | Network File System (NFS) Upper Layer Binding To RPC-Over-RDMA Version | |||
| draft-ietf-nfsv4-rfc5667bis-05 | One | |||
| draft-ietf-nfsv4-rfc5667bis-06 | ||||
| Abstract | Abstract | |||
| This document specifies Upper Layer Bindings of Network File System | This document specifies Upper Layer Bindings of Network File System | |||
| (NFS) protocol versions to RPC-over-RDMA. Upper Layer Bindings are | (NFS) protocol versions to RPC-over-RDMA Version One. Upper Layer | |||
| required to enable RPC-based protocols, such as NFS, to use Direct | Bindings are required in order to enable RPC-based protocols such as | |||
| Data Placement on RPC-over-RDMA. This document obsoletes RFC 5667. | NFS to use Direct Data Placement on RPC-over-RDMA Version One. This | |||
| document obsoletes RFC 5667. | ||||
| Requirements Language | Requirements Language | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in [RFC2119]. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| skipping to change at page 1, line 40 ¶ | skipping to change at page 1, line 42 ¶ | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on August 7, 2017. | This Internet-Draft will expire on August 28, 2017. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2017 IETF Trust and the persons identified as the | Copyright (c) 2017 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| skipping to change at page 2, line 25 ¶ | skipping to change at page 2, line 29 ¶ | |||
| modifications of such material outside the IETF Standards Process. | modifications of such material outside the IETF Standards Process. | |||
| Without obtaining an adequate license from the person(s) controlling | Without obtaining an adequate license from the person(s) controlling | |||
| the copyright in such materials, this document may not be modified | the copyright in such materials, this document may not be modified | |||
| outside the IETF Standards Process, and derivative works of it may | outside the IETF Standards Process, and derivative works of it may | |||
| not be created outside the IETF Standards Process, except to format | not be created outside the IETF Standards Process, except to format | |||
| it for publication as an RFC or to translate it into languages other | it for publication as an RFC or to translate it into languages other | |||
| than English. | than English. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. Conveying NFS Operations On RPC-Over-RDMA . . . . . . . . . . 3 | 2. Reply Size Estimation . . . . . . . . . . . . . . . . . . . . 3 | |||
| 3. Upper Layer Binding For NFS Versions 2 And 3 . . . . . . . . 4 | 2.1. Short Reply Chunk Retry . . . . . . . . . . . . . . . . . 4 | |||
| 4. Upper Layer Binding For NFS Version 4 . . . . . . . . . . . . 6 | 3. Upper Layer Binding for NFS Versions 2 and 3 . . . . . . . . 5 | |||
| 5. Extending NFS Upper Layer Bindings . . . . . . . . . . . . . 12 | 3.1. Reply Size Estimation . . . . . . . . . . . . . . . . . . 5 | |||
| 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 | 3.2. RPC Binding Considerations . . . . . . . . . . . . . . . 5 | |||
| 7. Security Considerations . . . . . . . . . . . . . . . . . . . 13 | 4. Upper Layer Bindings for NFS Version 2 and 3 Auxiliary | |||
| 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 14 | Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| Appendix A. Changes Since RFC 5667 . . . . . . . . . . . . . . . 15 | 4.1. MOUNT, NLM, and NSM Protocols . . . . . . . . . . . . . . 6 | |||
| Appendix B. Acknowledgments . . . . . . . . . . . . . . . . . . 17 | 4.2. NFSACL Protocol . . . . . . . . . . . . . . . . . . . . . 7 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 17 | 5. Upper Layer Binding For NFS Version 4 . . . . . . . . . . . . 7 | |||
| 5.1. DDP-Eligibility . . . . . . . . . . . . . . . . . . . . . 7 | ||||
| 5.2. Reply Size Estimation . . . . . . . . . . . . . . . . . . 8 | ||||
| 5.3. RPC Binding Considerations . . . . . . . . . . . . . . . 9 | ||||
| 5.4. NFS COMPOUND Requests . . . . . . . . . . . . . . . . . . 10 | ||||
| 5.5. NFS Callback Requests . . . . . . . . . . . . . . . . . . 11 | ||||
| 5.6. Session-Related Considerations . . . . . . . . . . . . . 12 | ||||
| 5.7. Transport Considerations . . . . . . . . . . . . . . . . 13 | ||||
| 6. Extending NFS Upper Layer Bindings . . . . . . . . . . . . . 14 | ||||
| 7. Security Considerations . . . . . . . . . . . . . . . . . . . 14 | ||||
| 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 15 | ||||
| 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 15 | ||||
| 9.1. Normative References . . . . . . . . . . . . . . . . . . 15 | ||||
| 9.2. Informative References . . . . . . . . . . . . . . . . . 16 | ||||
| Appendix A. Changes Since RFC 5667 . . . . . . . . . . . . . . . 17 | ||||
| Appendix B. Acknowledgments . . . . . . . . . . . . . . . . . . 18 | ||||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 19 | ||||
| 1. Introduction | 1. Introduction | |||
| An RPC-over-RDMA transport, such as the one defined in | An RPC-over-RDMA Version One transport may employ direct data | |||
| [I-D.ietf-nfsv4-rfc5666bis], may employ direct data placement to | placement to convey certain data payloads associated with RPC | |||
| convey data payloads associated with RPC transactions. To enable | transactions [I-D.ietf-nfsv4-rfc5666bis]. To enable successful | |||
| successful interoperation, RPC client and server implementations must | interoperation, implementations of RPC Programs running on RPC-over- | |||
| agree as to which XDR data items in what particular RPC procedures | RDMA must agree as to which XDR data items in what particular RPC | |||
| are eligible for direct data placement (DDP). | procedures are eligible for direct data placement (DDP). This | |||
| agreement is specified in an Upper Layer Binding. | ||||
| This document contains material required of Upper Layer Bindings, as | This document contains material required of Upper Layer Bindings, as | |||
| specified in [I-D.ietf-nfsv4-rfc5666bis], for the following NFS | specified in [I-D.ietf-nfsv4-rfc5666bis], for the following NFS | |||
| protocol versions: | protocol versions: | |||
| o NFS Version 2 [RFC1094] | o NFS Version 2 [RFC1094] | |||
| o NFS Version 3 [RFC1813] | o NFS Version 3 [RFC1813] | |||
| o NFS Version 4.0 [RFC7530] | o NFS Version 4.0 [RFC7530] | |||
| o NFS Version 4.1 [RFC5661] | o NFS Version 4.1 [RFC5661] | |||
| o NFS Version 4.2 [RFC7862] | o NFS Version 4.2 [RFC7862] | |||
| Upper Layer Bindings specified in this document apply to all versions | This document assumes the reader is already familiar with concepts | |||
| of RPC-over-RDMA. | and terminology defined in [I-D.ietf-nfsv4-rfc5666bis] and the | |||
| documents it references. | ||||
| 2. Conveying NFS Operations On RPC-Over-RDMA | 2. Reply Size Estimation | |||
| Definitions of terminology and a general discussion of how RPC-over- | On an RPC-over-RDMA Version One transport, during the construction of | |||
| RDMA is used to convey RPC transactions can be found in | each RPC Call message, a requester is responsible for allocating | |||
| [I-D.ietf-nfsv4-rfc5666bis]. In this section, these general | appropriate resources for receiving the matching Reply message. | |||
| principles are applied in the context of conveying NFS procedures on | ||||
| RPC-over-RDMA. Some issues common to all NFS protocol versions are | ||||
| introduced. | ||||
| 2.1. DDP Eligibility Violations | An overrun of these resources can result in corruption of the Reply | |||
| message or termination of the transport connection. Therefore | ||||
| reliable reply size estimation is necessary to ensure successful | ||||
| interoperation. This is particularly critical, for example, when | ||||
| allocating a Reply chunk. | ||||
| To report a DDP-eligibility violation, an NFS server MUST return one | In most cases, the NFS protocol's XDR definition provides enough | |||
| of: | information to enable an NFS client to predict the maximum size of | |||
| the expected Reply message. If there are variable-size data items in | ||||
| the result, the maximum size of the RPC Reply message can be | ||||
| estimated as follows: | ||||
| o An RPC-over-RDMA message of type RDMA_ERROR, with the rdma_xid | o The client requests only a specific portion of an object (for | |||
| field set to the XID of the matching NFS Call, and the rdma_error | example, using the "count" and "offset" fields in an NFS READ). | |||
| field set to ERR_CHUNK; or | ||||
| o An RPC message (via an RDMA_MSG message) with the xid field set to | o The client has already cached the size of the whole object it is | |||
| the XID of the matching NFS Call, the mtype field set to REPLY, | about to request (say, via a previous NFS GETATTR request). | |||
| the stat field set to MSG_ACCEPTED, and the accept_stat field set | ||||
| to GARBAGE_ARGS. | ||||
| Subsequent sections of this document describe further considerations | o The client and server have negotiated a maximum size for all calls | |||
| particular to specific NFS protocols or procedures. | and responses (using a CREATE_SESSION operation, for instance). | |||
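As a minimal sketch, the two DDP-eligibility violation responses listed above can be XDR-encoded as follows. The helper names and the credit value passed in are hypothetical. The numeric constants are those of the RPC-over-RDMA Version One XDR (rdma_proc, rdma_errcode) and the ONC RPC message XDR (msg_type, reply_stat, accept_stat). The GARBAGE_ARGS form assumes an AUTH_NONE reply verifier.

```python
import struct

# RPC-over-RDMA Version One constants.
RPCRDMA_VERSION_ONE = 1
RDMA_MSG = 0
RDMA_ERROR = 4
ERR_CHUNK = 2
# ONC RPC message constants.
REPLY = 1
MSG_ACCEPTED = 0
GARBAGE_ARGS = 4
AUTH_NONE = 0


def err_chunk_reply(xid: int, credits: int) -> bytes:
    """RDMA_ERROR message whose rdma_err field is ERR_CHUNK."""
    # Fields: rdma_xid, rdma_vers, rdma_credit, rdma_proc, rdma_err
    return struct.pack(">5I", xid, RPCRDMA_VERSION_ONE, credits,
                       RDMA_ERROR, ERR_CHUNK)


def garbage_args_reply(xid: int, credits: int) -> bytes:
    """RDMA_MSG carrying an accepted RPC Reply with accept_stat GARBAGE_ARGS."""
    # Transport header, then an empty Read list, an empty Write list, and
    # no Reply chunk. Each absent optional item is encoded as FALSE (0).
    header = struct.pack(">7I", xid, RPCRDMA_VERSION_ONE, credits,
                         RDMA_MSG, 0, 0, 0)
    # RPC Reply fields: xid, mtype, stat, verifier flavor, verifier
    # length (zero), accept_stat.
    reply = struct.pack(">6I", xid, REPLY, MSG_ACCEPTED, AUTH_NONE, 0,
                        GARBAGE_ARGS)
    return header + reply
```

In both forms the XID of the offending NFS Call is echoed back, which lets the client match the error to its outstanding request.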
| 2.2. Reply Size Estimation | 2.1. Short Reply Chunk Retry | |||
| During the construction of each RPC Call message, an NFS client is | In a few cases, either the size of one or more returned data items or | |||
| responsible for allocating appropriate resources for receiving the | the number of returned data items cannot be known in advance of | |||
| matching Reply message. A Reply buffer overrun can result in | forming an RPC Call. | |||
| corruption of the Reply message or termination of the transport | ||||
| connection. Therefore reliable reply size estimation is necessary to | ||||
| ensure successful interoperation. This is particularly critical, for | ||||
| example, when allocating a Reply chunk. | ||||
| In many cases the Upper Layer Protocol's XDR definition provides | A requester uses a Reply chunk to handle an RPC transaction where the | |||
| enough information to enable the client to make a reliable prediction | expected RPC Reply message might be larger than the requester's | |||
| of the maximum size of the expected Reply message. If there are | inline threshold. If an actual RPC Reply message does not fit in a | |||
| variable-size data items in the result, the maximum size of the RPC | client-provided Reply chunk, the NFS server responds with an | |||
| Reply message can be reliably estimated in most cases: | RDMA_ERROR message with the rdma_err field set to ERR_CHUNK, or it | |||
| could even break the transport connection. | ||||
| o The client requests only a specific portion of an object (for | In response, an NFS client can choose to: | |||
| example, using the "count" and "offset" fields in an NFS READ). | ||||
| o The client has already cached the size of the whole object it is | o Terminate the RPC transaction with an error, or | |||
| about to request (say, via a previous NFS GETATTR request). | ||||
| o The client and server have negotiated a maximum size for all calls | o Allocate a larger Reply chunk and send the same request as a new | |||
| and responses. | RPC transaction (to avoid hitting in a Duplicate Reply Cache). | |||
| The NFS client should avoid retrying the request indefinitely | ||||
| because a responder may return ERR_CHUNK for a variety of reasons. | ||||
| Subsequent sections of this document describe considerations | The latter choice is considered heroic recovery, and is only a real | |||
| particular to specific NFS procedures where it is not possible to | choice for the few operations where it is not possible for an NFS | |||
| determine the maximum Reply message size based solely on the above | client to predict the size of the Reply message in advance. | |||
| criteria. | ||||
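The estimation criteria above amount to simple XDR arithmetic. The following is a minimal sketch under several assumptions:

- The client already knows the fixed-size portion of the procedure's Reply.
- Every variable-size result has a known maximum length.
- DDP-eligible items that will land in a Write chunk are left out of the estimate.

The function names are hypothetical. The 1024-byte threshold is assumed to be the default inline threshold defined for RPC-over-RDMA Version One.

```python
from typing import Sequence

# Assumed default RPC-over-RDMA Version One inline threshold, in bytes.
DEFAULT_INLINE_THRESHOLD = 1024


def xdr_opaque_size(max_len: int) -> int:
    """Encoded size of a variable-length opaque: a 4-byte length plus
    the body padded to a multiple of four bytes."""
    return 4 + ((max_len + 3) & ~3)


def estimate_max_reply(fixed_bytes: int, opaque_maxima: Sequence[int]) -> int:
    """Upper bound on a Reply whose variable-size items all have known maxima.

    fixed_bytes covers the RPC and NFS headers for the procedure. Each entry
    in opaque_maxima is the largest length expected for one variable-size
    result, such as a size the client has cached or negotiated.
    """
    return fixed_bytes + sum(xdr_opaque_size(n) for n in opaque_maxima)


def needs_reply_chunk(estimate: int,
                      inline_threshold: int = DEFAULT_INLINE_THRESHOLD) -> bool:
    """A Reply chunk is needed when the Reply might exceed the inline threshold."""
    return estimate > inline_threshold
```

A result that must be padded (for example, a 5-byte opaque) costs 12 bytes on the wire, so summing raw lengths would underestimate.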
| 3. Upper Layer Binding For NFS Versions 2 And 3 | Subsequent sections of this document discuss exactly which operations | |||
| might have ultimate difficulty with Reply size estimation. These | ||||
| operations are eligible for "short Reply chunk retry." Unless | ||||
| explicitly mentioned as applicable, short Reply chunk retry should | ||||
| not be used. | ||||
| This Upper Layer Binding specification applies to NFS Version 2 | 3. Upper Layer Binding for NFS Versions 2 and 3 | |||
| [RFC1094] and NFS Version 3 [RFC1813]. For brevity, in this section | ||||
| a "legacy NFS client" refers to an NFS client using NFS version 2 or | The Upper Layer Binding specification in this section applies to NFS | |||
| NFS version 3 to communicate with an NFS server. Likewise, a "legacy | Version 2 [RFC1094] and NFS Version 3 [RFC1813]. For brevity, in | |||
| NFS server" is an NFS server communicating with clients using NFS | this document a "Legacy NFS client" refers to an NFS client using the | |||
| version 2 or NFS version 3. | NFS version 2 or NFS version 3 RPC Programs (100003) to communicate | |||
| with an NFS server. Likewise, a "Legacy NFS server" is an NFS server | ||||
| communicating with clients using NFS version 2 or NFS version 3. | ||||
| The following XDR data items in NFS versions 2 and 3 are DDP- | The following XDR data items in NFS versions 2 and 3 are DDP- | |||
| eligible: | eligible: | |||
| o The opaque file data argument in the NFS WRITE procedure | o The opaque file data argument in the NFS WRITE procedure | |||
| o The pathname argument in the NFS SYMLINK procedure | o The pathname argument in the NFS SYMLINK procedure | |||
| o The opaque file data result in the NFS READ procedure | o The opaque file data result in the NFS READ procedure | |||
| o The pathname result in the NFS READLINK procedure | o The pathname result in the NFS READLINK procedure | |||
| All other argument or result data items in NFS versions 2 and 3 are | All other argument or result data items in NFS versions 2 and 3 are | |||
| not DDP-eligible. | not DDP-eligible. | |||
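An implementation might encode this rule as a small lookup table. The sketch below does that. Keying by procedure name rather than procedure number is an illustrative choice, since NFS versions 2 and 3 assign different numbers to some of these procedures (WRITE and SYMLINK, for instance). The table and function names are hypothetical.

```python
from typing import Optional

# DDP-eligible XDR data items in NFS versions 2 and 3, keyed by
# (procedure name, direction). Every other item is not DDP-eligible.
LEGACY_NFS_DDP_ELIGIBLE = {
    ("WRITE", "call"): "opaque file data",
    ("SYMLINK", "call"): "pathname",
    ("READ", "reply"): "opaque file data",
    ("READLINK", "reply"): "pathname",
}


def legacy_ddp_item(procedure: str, direction: str) -> Optional[str]:
    """Return the DDP-eligible item for this procedure and direction, or None."""
    return LEGACY_NFS_DDP_ELIGIBLE.get((procedure.upper(), direction))
```

A requester would consult such a table before moving a data item into a Read or Write chunk. A responder could use the same table to detect a DDP-eligibility violation.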
| A legacy server's response to a DDP-eligibility violation (described | A Legacy NFS client MUST NOT send a reduced Payload stream in a Long | |||
| in Section 2.1) does not give an indication to legacy clients of | Call. A Legacy NFS client MUST NOT enable a Legacy NFS server to | |||
| whether the server has processed the arguments of the RPC Call, or | send a reduced Payload stream in a Long Reply. | |||
| whether the server has accessed or modified client memory associated | ||||
| with that RPC. | ||||
| A legacy NFS client determines the maximum reply size for each | A Legacy server's response to a DDP-eligibility violation does not | |||
| operation using the basic criteria outlined in Section 2.2. | give an indication to Legacy clients of whether the server has | |||
| processed the arguments of the RPC Call, or whether the server has | ||||
| accessed or modified client memory associated with that RPC. | ||||
| 3.1. Auxiliary Protocols | 3.1. Reply Size Estimation | |||
| A Legacy NFS client determines the maximum reply size for each | ||||
| operation using the criteria outlined in Section 2. There are no | ||||
| operations in NFS version 2 or 3 that benefit from short Reply chunk | ||||
| retry. | ||||
| 3.2. RPC Binding Considerations | ||||
| Legacy NFS servers traditionally listen for clients on UDP and TCP | ||||
| port 2049. Additionally, they register these ports with a local | ||||
| portmapper [RFC1833] service. | ||||
| A Legacy NFS server supporting RPC-over-RDMA Version One on such a | ||||
| network and registering itself with the RPC portmapper MAY choose an | ||||
| arbitrary port, or MAY use the alternative well-known port number for | ||||
| its RPC-over-RDMA service (see Section 8). The chosen port MAY be | ||||
| registered with the RPC portmapper under the netids assigned in | ||||
| [I-D.ietf-nfsv4-rfc5666bis]. | ||||
| 4. Upper Layer Bindings for NFS Version 2 and 3 Auxiliary Protocols | ||||
| NFS versions 2 and 3 are typically deployed with several other | NFS versions 2 and 3 are typically deployed with several other | |||
| protocols, sometimes referred to as "NFS auxiliary protocols." These | protocols, sometimes referred to as "NFS auxiliary protocols." These | |||
| are separate RPC programs that define procedures which are not part | are distinct RPC Programs that define procedures which are not part | |||
| of the NFS version 2 or version 3 RPC programs. These include: | of the NFS version 2 or version 3 RPC Programs. The Upper Layer | |||
| Bindings in this section apply to: | ||||
| o The MOUNT and NLM protocols, introduced in an appendix of | o The MOUNT and NLM protocols, introduced in an appendix of | |||
| [RFC1813] | [RFC1813] | |||
| o The NSM protocol, described in Chapter 11 of [NSM] | o The NSM protocol, described in Chapter 11 of [NSM] | |||
| o The NFSACL protocol, which does not have a public definition | o The NFSACL protocol, which does not have a public definition. | |||
| (NFSACL here is treated as a de facto standard as there are | NFSACL is treated in this document as a de facto standard, as | |||
| several interoperating implementations). | there are several interoperating implementations. | |||
| RPC-over-RDMA considers these programs as distinct Upper Layer | RPC-over-RDMA Version One considers these RPC Programs as separate | |||
| Protocols [I-D.ietf-nfsv4-rfc5666bis]. To enable the use of these | Upper Layer Protocols [I-D.ietf-nfsv4-rfc5666bis]. Therefore a | |||
| ULPs on an RPC-over-RDMA transport, an Upper Layer Binding | separate Upper Layer Binding, provided here, is required for each of | |||
| specification is provided here for each. | these. | |||
| 3.1.1. MOUNT, NLM, And NSM Protocols | 4.1. MOUNT, NLM, and NSM Protocols | |||
| Typically MOUNT, NLM, and NSM are conveyed via TCP, even in | Typically MOUNT, NLM, and NSM are conveyed via TCP, even in | |||
| deployments where NFS operates on RPC-over-RDMA. When a legacy | deployments where the NFS RPC Program operates on RPC-over-RDMA | |||
| server supports these programs on RPC-over-RDMA, it advertises the | Version One. When a Legacy server supports these RPC Programs on | |||
| port address via the usual rpcbind service [RFC1833]. | RPC-over-RDMA Version One, it advertises the port address via the | |||
| usual rpcbind service [RFC1833]. | ||||
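A server advertising its RPC-over-RDMA endpoints through rpcbind registers one (program, version, netid, address) tuple per service. The sketch below builds such tuples for IPv4 under these assumptions:

- The well-known port is 20049, the port IANA registered as "nfsrdma" for RFC 5667.
- The netid is "rdma", one of the netids assigned for RPC-over-RDMA.
- The address uses the rpcbind IPv4 universal address format "h1.h2.h3.h4.p1.p2".

The function names are hypothetical, and a server MAY use an arbitrary port instead.

```python
from typing import Iterable, List, Tuple

NFS_RDMA_PORT = 20049  # assumed: IANA "nfsrdma" well-known port
NFS_PROGRAM = 100003   # NFS RPC Program number


def uaddr_ipv4(address: str, port: int) -> str:
    """rpcbind universal address: dotted IPv4 address plus port high/low bytes."""
    return f"{address}.{port >> 8}.{port & 0xFF}"


def rdma_registrations(program: int, versions: Iterable[int], address: str,
                       port: int = NFS_RDMA_PORT) -> List[Tuple[int, int, str, str]]:
    """One rpcbind registration tuple per version, using the "rdma" netid."""
    return [(program, v, "rdma", uaddr_ipv4(address, port)) for v in versions]
```

For example, port 20049 encodes as the trailing ".78.81" in the universal address, because 78 * 256 + 81 = 20049.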
| No operation in these protocols conveys a significant data payload, | No operation in these protocols conveys a significant data payload, | |||
| and the size of RPC messages in these protocols is uniformly small. | and the size of RPC messages in these protocols is uniformly small. | |||
| Therefore, no XDR data items in these protocols are DDP-eligible. | Therefore, no XDR data items in these protocols are DDP-eligible. | |||
| The largest variable-length XDR data item is an xdr_netobj. In most | The largest variable-length XDR data item is an xdr_netobj. In most | |||
| implementations this data item is not larger than 1024 bytes, making | implementations this data item is never larger than 1024 bytes, | |||
| reliable reply size estimation straightforward using the criteria | making reliable reply size estimation straightforward using the | |||
| outlined in Section 2.2. | criteria outlined in Section 2. There are no operations in these | |||
| protocols that benefit from short Reply chunk retry. | ||||
| 3.1.2. NFSACL Protocol | 4.2. NFSACL Protocol | |||
| Legacy clients and servers that support the NFSACL RPC program | Legacy clients and servers that support the NFSACL RPC Program | |||
| typically convey NFSACL procedures on the same connection as the NFS | typically convey NFSACL procedures on the same connection as NFS RPC | |||
| RPC program. This obviates the need for separate rpcbind queries to | Programs. This obviates the need for separate rpcbind queries to | |||
| discover server support for this RPC program. | discover server support for this RPC Program. | |||
| ACLs are typically small, but even large ACLs must be encoded and | ACLs are typically small, but even large ACLs must be encoded and | |||
| decoded to some degree. Thus no data item in this Upper Layer | decoded to some degree. Thus no data item in this Upper Layer | |||
| Protocol is DDP-eligible. | Protocol is DDP-eligible. | |||
| For procedures whose replies do not include an ACL object, the size | For procedures whose replies do not include an ACL object, the size | |||
| of a reply is determined directly from the NFSACL program's XDR | of a reply is determined directly from the NFSACL RPC Program's XDR | |||
| definition. | definition. | |||
| There is no protocol-wide size limit for NFS version 3 ACLs, and | There is no protocol-specified size limit for NFS version 3 ACLs, and | |||
| there is no mechanism in either the NFSACL or NFS programs for a | there is no mechanism in either the NFSACL or NFS RPC Programs for a | |||
| legacy client to ascertain the largest ACL a legacy server can store. | Legacy client to ascertain the largest ACL a Legacy server can | |||
| Legacy client implementations should choose a maximum size for ACLs | return. Legacy client implementations should choose a maximum size | |||
| based on their own internal limits. A recommended lower bound for | for ACLs based on their own internal limits. | |||
| this maximum is 32,768 bytes. | ||||
| When an especially large ACL is expected, a Reply chunk might be | ||||
| required. If a legacy NFS server indicates that it cannot return an | ||||
| NFSACL GETACL response because the legacy NFS client has not provided | ||||
| a large enough Reply chunk to receive that response, the legacy NFS | ||||
| client can choose to | ||||
| o Terminate the NFSACL GETACL with an error, or | ||||
| o Allocate a larger Reply chunk and send the same NFSACL GETACL | Because an NFSACL client cannot know in advance how large a returned | |||
| request as a new RPC transaction. The NFS client should avoid | ACL will be, it can use short Reply chunk retry when an NFSACL GETACL | |||
| retrying the request indefinitely. | operation encounters a transport error. | |||
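Short Reply chunk retry for NFSACL GETACL can be sketched as a bounded loop. The sketch rests on these assumptions, none of which comes from this document:

- The transport layer surfaces an ERR_CHUNK reply as a hypothetical ErrChunk exception.
- The hypothetical send_getacl callable builds a new RPC transaction, with a fresh XID, on every call.
- The Reply chunk doubles on each attempt, up to a client-chosen ceiling.

Both the attempt limit and the ceiling keep the client from retrying indefinitely.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class ErrChunk(Exception):
    """Raised by the (hypothetical) transport when the server returns ERR_CHUNK."""


def getacl_with_short_reply_chunk_retry(send_getacl: Callable[[int], T],
                                        first_chunk_size: int,
                                        max_chunk_size: int,
                                        max_attempts: int = 3) -> T:
    """Retry GETACL with a larger Reply chunk, a bounded number of times.

    send_getacl(reply_chunk_size) must send a new RPC transaction (new XID)
    each time, so a retry is not answered from a Duplicate Reply Cache.
    """
    size = first_chunk_size
    for _ in range(max_attempts):
        try:
            return send_getacl(size)
        except ErrChunk:
            if size >= max_chunk_size:
                break  # the chunk cannot grow further, so stop retrying
            size = min(size * 2, max_chunk_size)
    raise ErrChunk("GETACL reply did not fit in the largest Reply chunk tried")
```

Terminating with an error, the other choice described above, is simply the path taken when the loop gives up.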
| 4. Upper Layer Binding For NFS Version 4 | 5. Upper Layer Binding For NFS Version 4 | |||
| This Upper Layer Binding specification applies to all protocols | The Upper Layer Binding specification in this section applies to RPC | |||
| defined in NFS Version 4.0 [RFC7530], NFS Version 4.1 [RFC5661], and | Programs defined in NFS Version 4.0 [RFC7530], NFS Version 4.1 | |||
| NFS Version 4.2 [RFC7862]. | [RFC5661], and NFS Version 4.2 [RFC7862]. | |||
| 4.1. DDP-Eligibility | 5.1. DDP-Eligibility | |||
| Only the following XDR data items in the COMPOUND procedure of all | Only the following XDR data items in the COMPOUND procedure of all | |||
| NFS version 4 minor versions are DDP-eligible: | NFS version 4 minor versions are DDP-eligible: | |||
| o The opaque data field in the WRITE4args structure | o The opaque data field in the WRITE4args structure | |||
| o The linkdata field of the NF4LNK arm in the createtype4 union | o The linkdata field of the NF4LNK arm in the createtype4 union | |||
| o The opaque data field in the READ4resok structure | o The opaque data field in the READ4resok structure | |||
| skipping to change at page 7, line 4 ¶ | skipping to change at page 7, line 48 ¶ | |||
| o The linkdata field in the READLINK4resok structure | o The linkdata field in the READLINK4resok structure | |||
| o In minor version 2 and newer, the rpc_data field of the | o In minor version 2 and newer, the rpc_data field of the | |||
| read_plus_content union (further restrictions on the use of this | read_plus_content union (further restrictions on the use of this | |||
| data item follow below). | data item follow below). | |||
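The NFS version 4 rule adds a minor-version condition for READ_PLUS. As a minimal sketch, a lookup table can record each DDP-eligible item together with the first minor version in which it exists. The table, the item descriptions, and the function name are hypothetical. The item descriptions follow the wording of the list above.

```python
from typing import Optional

# (operation, direction) -> (DDP-eligible item, first minor version)
NFS4_DDP_ELIGIBLE = {
    ("WRITE", "args"): ("WRITE4args data", 0),
    ("CREATE", "args"): ("createtype4 NF4LNK linkdata", 0),
    ("READ", "result"): ("READ4resok data", 0),
    ("READLINK", "result"): ("READLINK4resok linkdata", 0),
    ("READ_PLUS", "result"): ("read_plus_content rpc_data", 2),
}


def nfs4_ddp_item(operation: str, direction: str,
                  minor_version: int) -> Optional[str]:
    """Return the DDP-eligible item for an operation in a COMPOUND, or None."""
    entry = NFS4_DDP_ELIGIBLE.get((operation, direction))
    if entry is None:
        return None
    item, first_minor = entry
    return item if minor_version >= first_minor else None
```

Because eligibility is decided per operation, a single COMPOUND may contain several DDP-eligible items, one for each qualifying operation it carries.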
| 4.1.1. READ_PLUS Replies | 5.1.1. READ_PLUS Replies | |||
| The NFS version 4.2 READ_PLUS operation returns a complex data type | The NFS version 4.2 READ_PLUS operation returns a complex data type | |||
| [RFC7862]. The rpr_contents field in the result of this operation is | [RFC7862]. The rpr_contents field in the result of this operation is | |||
| an array of read_plus_content unions, one arm of which contains an | an array of read_plus_content unions, one arm of which contains an | |||
| opaque byte stream (d_data). | opaque byte stream (d_data). | |||
| The size of d_data is limited to the value of the rpa_count field, | The size of d_data is limited to the value of the rpa_count field, | |||
| but the protocol does not bound the number of elements which can be | but the protocol does not bound the number of elements which can be | |||
| returned in the rpr_contents array. In order to make the size of | returned in the rpr_contents array. In order to make the size of | |||
| READ_PLUS replies predictable by NFS version 4.2 clients, the | READ_PLUS replies predictable by NFS version 4.2 clients, the | |||
| following restrictions are placed on the use of the READ_PLUS | following restrictions are placed on the use of the READ_PLUS | |||
| operation on RPC-over-RDMA transports: | operation on an RPC-over-RDMA Version One transport: | |||
| o An NFS version 4.2 client MUST NOT provide more than one Write | o An NFS version 4.2 client MUST NOT provide more than one Write | |||
| chunk for any READ_PLUS operation. When providing a Write chunk | chunk for any READ_PLUS operation. When providing a Write chunk | |||
| for a READ_PLUS operation, an NFS version 4.2 client MUST provide | for a READ_PLUS operation, an NFS version 4.2 client MUST provide | |||
| a Write chunk that is either empty (which forces all result data | a Write chunk that is either empty (which forces all result data | |||
| items for this operation to be returned inline) or large enough to | items for this operation to be returned inline) or large enough to | |||
| receive rpa_count bytes in a single element of the rpr_contents | receive rpa_count bytes in a single element of the rpr_contents | |||
| array. | array. | |||
| o If the Write chunk provided for a READ_PLUS operation by an NFS | o If the Write chunk provided for a READ_PLUS operation by an NFS | |||
| skipping to change at page 7, line 42 ¶ | skipping to change at page 8, line 39 ¶ | |||
| use that chunk for the first element of the rpr_contents array | use that chunk for the first element of the rpr_contents array | |||
| that has an rpc_data arm. | that has an rpc_data arm. | |||
| o An NFS version 4.2 server MUST NOT return more than two elements | o An NFS version 4.2 server MUST NOT return more than two elements | |||
| in the rpr_contents array of any READ_PLUS operation. It returns | in the rpr_contents array of any READ_PLUS operation. It returns | |||
| as much of the requested byte range as it can fit within these two | as much of the requested byte range as it can fit within these two | |||
| elements. If the NFS version 4.2 server has not asserted rpr_eof | elements. If the NFS version 4.2 server has not asserted rpr_eof | |||
| in the reply, the NFS version 4.2 client SHOULD send additional | in the reply, the NFS version 4.2 client SHOULD send additional | |||
| READ_PLUS requests for any remaining bytes. | READ_PLUS requests for any remaining bytes. | |||
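The READ_PLUS restrictions can be sketched as three small checks:

- A client-side test of the Write chunks offered for one READ_PLUS.
- A server-side trim of rpr_contents to at most two elements.
- The offset at which the client's follow-up READ_PLUS would start.

The sketch rests on these assumptions:

- Each rpr_contents element is modeled as a hypothetical (kind, offset, length) tuple.
- Write chunks are represented only by their total lengths.
- The server clears rpr_eof whenever it has trimmed elements.

```python
from typing import List, Sequence, Tuple

MAX_READ_PLUS_ELEMENTS = 2
Segment = Tuple[str, int, int]  # (kind, offset, length); kind is "data" or "hole"


def write_chunks_ok(chunk_lengths: Sequence[int], rpa_count: int) -> bool:
    """Client rule: at most one Write chunk, either empty or able to hold
    rpa_count bytes. Offering no Write chunk at all is also allowed."""
    if len(chunk_lengths) > 1:
        return False
    return all(n == 0 or n >= rpa_count for n in chunk_lengths)


def server_trim(contents: List[Segment],
                reached_eof: bool) -> Tuple[List[Segment], bool]:
    """Server rule: return at most two elements, asserting rpr_eof only
    if nothing was dropped and end-of-file was reached."""
    kept = contents[:MAX_READ_PLUS_ELEMENTS]
    eof = reached_eof and len(kept) == len(contents)
    return kept, eof


def next_offset(kept: List[Segment]) -> int:
    """Where the client's follow-up READ_PLUS should start."""
    _, offset, length = kept[-1]
    return offset + length
```

When server_trim reports eof as False, the client would issue another READ_PLUS starting at next_offset(kept).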
| 4.2. NFS Version 4 Reply Size Estimation | 5.2. Reply Size Estimation | |||
| Within NFS version 4, there are certain variable-length result data | Within NFS version 4, there are certain variable-length result data | |||
| items whose maximum size cannot be estimated by clients reliably | items whose maximum size cannot be estimated by clients reliably | |||
| because there is no protocol-specified size limit on these arrays. | because there is no protocol-specified size limit on these arrays. | |||
| These include: | These include: | |||
| o The attrlist4 field | o The attrlist4 field | |||
| o Fields containing ACLs such as fattr4_acl, fattr4_dacl, | o Fields containing ACLs such as fattr4_acl, fattr4_dacl, | |||
| fattr4_sacl | fattr4_sacl | |||
| o Fields in the fs_locations4 and fs_locations_info4 data structures | o Fields in the fs_locations4 and fs_locations_info4 data structures | |||
| o Fields opaque to the NFS version 4 protocol which pertain to pNFS | o Fields opaque to the NFS version 4 protocol which pertain to pNFS | |||
| layout metadata, such as loc_body, loh_body, da_addr_body, | layout metadata, such as loc_body, loh_body, da_addr_body, | |||
| lou_body, lrf_body, fattr_layout_types and fs_layout_types. | lou_body, lrf_body, fattr_layout_types and fs_layout_types. | |||
| 4.2.1. Reply Size Estimation For Minor Version 0 | 5.2.1. Reply Size Estimation for Minor Version 0 | |||
| The NFSv4.0 protocol itself does not impose any bound on the size of | The NFS version 4.0 protocol itself does not impose any bound on the | |||
| NFS calls or responses. | size of NFS calls or responses. | |||
| Some of the data items enumerated in Section 4.2 (in particular, the | Some of the data items enumerated in Section 5.2 (in particular, the | |||
| items related to ACLs and fs_locations) make it difficult to predict | items related to ACLs and fs_locations) make it difficult to predict | |||
| the maximum size of NFSv4.0 GETATTR replies that interrogate | the maximum size of NFS version 4.0 replies that interrogate | |||
| variable-length attributes. As discussed in Section 2.2, client | variable-length fattr4 attributes. As discussed in Section 2, client | |||
| implementations can rely on their own internal architectural limits | implementations can rely on their own internal architectural limits | |||
| to bound the reply size, but such limits are not always guaranteed to | to constrain the reply size, but such limits are not always | |||
| be reliable. | guaranteed to be reliable. | |||
| When an especially large NFSv4.0 GETATTR result is expected, a Reply | ||||
| chunk might be required. If an NFSv4.0 server indicates that it | ||||
| cannot return an NFSv4.0 GETATTR response because the requesting | ||||
| NFSv4.0 client has not provided a large enough Reply chunk to receive | ||||
| that response, the NFSv4.0 client can choose to | ||||
| o Terminate the NFSv4.0 GETATTR with an error, or | ||||
| o Allocate a larger Reply chunk and send the same NFSv4.0 GETATTR | When an especially large fattr4 result is expected, a Reply chunk | |||
| request as a new RPC transaction. The NFS client should avoid | might be required. An NFS version 4.0 client can use short Reply | |||
| retrying the request indefinitely. | chunk retry when an NFS COMPOUND containing a GETATTR operation | |||
| | encounters a transport error. | |||
| The use of NFS COMPOUND operations raises the possibility of requests | The use of NFS COMPOUND operations raises the possibility of requests | |||
| that combine a non-idempotent operation (eg. NFS WRITE) with an | that combine a non-idempotent operation (e.g. WRITE) with a GETATTR | |||
| NFSv4.0 GETATTR that requests one or more variable length results. | operation that requests one or more variable-length results. This | |||
| This combination should be avoided by ensuring that any NFSv4.0 | combination should be avoided by ensuring that any GETATTR operation | |||
| GETATTR operation that might return a result of unpredictable length | that requests a result of unpredictable length is sent in an NFS | |||
| is sent in an NFS COMPOUND by itself. | COMPOUND by itself. | |||
| 4.2.2. Reply Size Estimation For Minor Version 1 And Newer | 5.2.2. Reply Size Estimation for Minor Version 1 and Newer | |||
| In NFS version 4.1 and newer minor versions, the csa_fore_chan_attrs | In NFS version 4.1 and newer minor versions, the csa_fore_chan_attrs | |||
| argument of the CREATE_SESSION operation contains a | argument of the CREATE_SESSION operation contains a | |||
| ca_maxresponsesize field. The value in this field can be taken as | ca_maxresponsesize field. The value in this field can be taken as | |||
| the absolute maximum size of replies generated by a replying NFS | the absolute maximum size of replies generated by an NFS version 4.1 | |||
| version 4 server. | server. | |||
| This value can be used in cases where it is not possible to estimate | This value can be used in cases where it is not possible to estimate | |||
| a reply size upper bound precisely. In practice, objects such as | a reply size upper bound precisely. In practice, objects such as | |||
| ACLs, named attributes, layout bodies, and security labels are much | ACLs, named attributes, layout bodies, and security labels are much | |||
| smaller than this maximum. | smaller than this maximum. | |||
| 4.3. NFS Version 4 COMPOUND Requests | 5.3. RPC Binding Considerations | |||
| | NFS version 4 servers are required to listen on TCP port 2049, and | |||
| | they are not required to register with an rpcbind service [RFC7530]. | |||
| | Therefore, an NFS version 4 server supporting RPC-over-RDMA Version | |||
| | One MUST use the alternative well-known port number for its RPC-over- | |||
| | RDMA service (see Section 8). Clients SHOULD connect to this well- | |||
| | known port without consulting the RPC portmapper (as for NFS version | |||
| | 4 on TCP transports). | |||
| | 5.4. NFS COMPOUND Requests | |||
| | 5.4.1. Long Calls and Replies | |||
| | Each NFS version 4 COMPOUND procedure contains an array of operations | |||
| | which may be larger than a connection's inline thresholds, even after | |||
| | reduction of DDP-eligible payloads. Therefore, an NFS version 4 | |||
| | client MAY send a reduced Payload stream in a Long Call. An NFS | |||
| | version 4 client MAY enable an NFS version 4 server to send a reduced | |||
| | Payload stream in a Long Reply. | |||
| | 5.4.2. Multiple DDP-eligible Data Items | |||
| The NFS version 4 COMPOUND procedure allows the transmission of more | The NFS version 4 COMPOUND procedure allows the transmission of more | |||
| than one DDP-eligible data item per Call and Reply message. An NFS | than one DDP-eligible data item per Call and Reply message. An NFS | |||
| version 4 client provides XDR Position values in each Read chunk to | version 4 client provides XDR Position values in each Read chunk to | |||
| disambiguate which chunk is associated with which argument data item. | disambiguate which chunk is associated with which argument data item. | |||
| However, NFS version 4 server and client implementations must agree in | However, NFS version 4 server and client implementations must agree in | |||
| advance on how to pair Write chunks with returned result data items. | advance on how to pair Write chunks with returned result data items. | |||
| The mechanism specified in Section 4.3.2 of | The mechanism specified in Section 4.3.2 of | |||
| [I-D.ietf-nfsv4-rfc5666bis] is applied here, with additional | [I-D.ietf-nfsv4-rfc5666bis] is applied here, with additional | |||
| restrictions that appear below. In the following list, an "NFS Read" | restrictions that appear below. | |||
| operation refers to any NFS Version 4 operation which has a DDP- | ||||
| eligible result data item (i.e., either a READ, READ_PLUS, or | In the following list, an "NFS Read" operation refers to any NFS | |||
| READLINK operation). | Version 4 operation which has a DDP-eligible result data item (i.e., | |||
| | either a READ, READ_PLUS, or READLINK operation). | |||
| o If an NFS version 4 client wishes all DDP-eligible items in an NFS | o If an NFS version 4 client wishes all DDP-eligible items in an NFS | |||
| reply to be conveyed inline, it leaves the Write list empty. | reply to be conveyed inline, it leaves the Write list empty. | |||
| o The first chunk in the Write list MUST be used by the first READ | o The first chunk in the Write list MUST be used by the first READ | |||
| operation in an NFS version 4 COMPOUND procedure. The next Write | operation in an NFS version 4 COMPOUND procedure. The next Write | |||
| chunk is used by the next READ operation, and so on. | chunk is used by the next READ operation, and so on. | |||
| o If an NFS version 4 client has provided a matching non-empty Write | o If an NFS version 4 client has provided a matching non-empty Write | |||
| chunk, then the corresponding READ operation MUST return its DDP- | chunk, then the corresponding READ operation MUST return its DDP- | |||
| skipping to change at page 10, line 5 | skipping to change at page 11, line 14 | |||
| o If a READ operation returns a union arm which does not contain a | o If a READ operation returns a union arm which does not contain a | |||
| DDP-eligible result, and the NFS version 4 client has provided a | DDP-eligible result, and the NFS version 4 client has provided a | |||
| matching non-empty Write chunk, an NFS version 4 server MUST | matching non-empty Write chunk, an NFS version 4 server MUST | |||
| return an empty Write chunk in that Write list position. | return an empty Write chunk in that Write list position. | |||
| o If there are more READ operations than Write chunks, then | o If there are more READ operations than Write chunks, then | |||
| remaining NFS Read operations in an NFS version 4 COMPOUND that | remaining NFS Read operations in an NFS version 4 COMPOUND that | |||
| have no matching Write chunk MUST return their results inline. | have no matching Write chunk MUST return their results inline. | |||
| 4.3.1. NFS Version 4 COMPOUND Example | 5.4.3. NFS Version 4 COMPOUND Example | |||
| The following example shows a Write list with three Write chunks, A, | The following example shows a Write list with three Write chunks, A, | |||
| B, and C. The NFS version 4 server consumes the provided Write | B, and C. The NFS version 4 server consumes the provided Write | |||
| chunks by writing the results of the designated operations in the | chunks by writing the results of the designated operations in the | |||
| compound request (READ and READLINK) back to each chunk. | compound request (READ and READLINK) back to each chunk. | |||
| Write list: | Write list: | |||
| A --> B --> C | A --> B --> C | |||
| skipping to change at page 10, line 27 | skipping to change at page 11, line 36 | |||
| PUTFH LOOKUP READ PUTFH LOOKUP READLINK PUTFH LOOKUP READ | PUTFH LOOKUP READ PUTFH LOOKUP READLINK PUTFH LOOKUP READ | |||
| | | | | | | | | |||
| v v v | v v v | |||
| A B C | A B C | |||
| If the NFS version 4 client does not want to have the READLINK result | If the NFS version 4 client does not want to have the READLINK result | |||
| returned via RDMA, it provides an empty Write chunk for buffer B to | returned via RDMA, it provides an empty Write chunk for buffer B to | |||
| indicate that the READLINK result must be returned inline. | indicate that the READLINK result must be returned inline. | |||
| 4.4. NFS Version 4 Callback | 5.5. NFS Callback Requests | |||
| The NFS version 4 protocols support server-initiated callbacks to | The NFS version 4 family of protocols support server-initiated | |||
| notify clients of events such as recalled delegations. | callbacks to notify NFS version 4 clients of events such as recalled | |||
| | delegations. | |||
| 4.4.1. NFS Version 4.0 Callback | 5.5.1. NFS Version 4.0 Callback | |||
| NFS version 4.0 implementations typically employ a separate TCP | NFS version 4.0 implementations typically employ a separate TCP | |||
| connection to handle callback operations, even when the forward | connection to handle callback operations, even when the forward | |||
| channel uses a RPC-over-RDMA transport. | channel uses an RPC-over-RDMA Version One transport. | |||
| No operation in the NFS version 4.0 callback RPC program conveys a | No operation in the NFS version 4.0 callback RPC Program conveys a | |||
| significant data payload. Therefore, no XDR data items in this RPC | significant data payload. Therefore, no XDR data items in this RPC | |||
| program are DDP-eligible. | Program are DDP-eligible. | |||
| A CB_RECALL reply is small and fixed in size. The CB_GETATTR reply | A CB_RECALL reply is small and fixed in size. The CB_GETATTR reply | |||
| contains a variable-length fattr4 data item. See Section 4.2.1 for a | contains a variable-length fattr4 data item. See Section 5.2.1 for a | |||
| discussion of reply size prediction for this data item. | discussion of reply size prediction for this data item. | |||
| An NFS version 4.0 client advertises netids and ad hoc port addresses | An NFS version 4.0 client advertises netids and ad hoc port addresses | |||
| for contacting its NFS version 4.0 callback service using the | for contacting its NFS version 4.0 callback service using the | |||
| SETCLIENTID operation. | SETCLIENTID operation. | |||
| 4.4.2. NFS Version 4.1 Callback | 5.5.2. NFS Version 4.1 Callback | |||
| In NFS version 4.1 and newer minor versions, callback operations may | In NFS version 4.1 and newer minor versions, callback operations may | |||
| appear on the same connection as is used for NFS version 4 forward | appear on the same connection as is used for NFS version 4 forward | |||
| channel client requests. NFS version 4 clients and servers MUST use | channel client requests. NFS version 4 clients and servers MUST use | |||
| the mechanism described in [I-D.ietf-nfsv4-rpcrdma-bidirection] when | the mechanism described in [I-D.ietf-nfsv4-rpcrdma-bidirection] when | |||
| backchannel operations are conveyed on RPC-over-RDMA transports. | backchannel operations are conveyed on RPC-over-RDMA Version One | |||
| | transports. | |||
| The csa_back_chan_attrs argument of the CREATE_SESSION operation | The csa_back_chan_attrs argument of the CREATE_SESSION operation | |||
| contains a ca_maxresponsesize field. The value in this field can be | contains a ca_maxresponsesize field. The value in this field can be | |||
| taken as the absolute maximum size of backchannel replies generated | taken as the absolute maximum size of backchannel replies generated | |||
| by a replying NFS version 4 client. | by a replying NFS version 4 client. | |||
| There are no DDP-eligible data items in callback procedures defined | There are no DDP-eligible data items in callback procedures defined | |||
| in NFS version 4.1 or NFS version 4.2. However, some callback | in NFS version 4.1 or NFS version 4.2. However, some callback | |||
| operations, such as messages that convey device ID information, can | operations, such as messages that convey device ID information, can | |||
| be large, in which case a Long Call or Reply might be required. | be large, in which case a Long Call or Reply might be required. | |||
| When an NFS version 4.1 client can support Long Calls in its | When an NFS version 4.1 client can support Long Calls in its | |||
| backchannel, it reports a backchannel ca_maxrequestsize that is | backchannel, it reports a backchannel ca_maxrequestsize that is | |||
| larger than the connection's inline thresholds. Otherwise an NFS | larger than the connection's inline thresholds. Otherwise an NFS | |||
| version 4 server MUST use only Short messages to convey backchannel | version 4 server MUST use only Short messages to convey backchannel | |||
| operations. | operations. | |||
| 4.5. Session-Related Considerations | 5.6. Session-Related Considerations | |||
| Typically the presence of an NFS session [RFC5661] has no effect on | The presence of an NFS session (defined in [RFC5661]) has no effect | |||
| the operation of RPC-over-RDMA. None of the operations introduced to | on the operation of RPC-over-RDMA Version One. None of the | |||
| support NFS sessions (eg. SEQUENCE) contain DDP-eligible data items. | operations introduced to support NFS sessions (e.g. the SEQUENCE | |||
| There is no need to match the number of session slots with the number | operation) contain DDP-eligible data items. There is no need to | |||
| of available RPC-over-RDMA credits. | match the number of session slots with the number of available RPC- | |||
| | over-RDMA credits. | |||
| When an NFS session operates on an RPC-over-RDMA transport, there are | However, there are a few new cases where an RPC transaction can fail. | |||
| a few additional cases where an RPC transaction can fail. For | For example, a requester might receive, in response to an RPC | |||
| example, a requester might receive, in response to an RPC request, an | request, an RDMA_ERROR message with an rdma_err value of ERR_CHUNK, | |||
| RDMA_ERROR message with an rdma_err value of ERR_CHUNK, or an | or an RDMA_MSG containing an RPC_GARBAGEARGS reply. These situations | |||
| RDMA_MSG containing an RPC_GARBAGEARGS reply. These situations are | are no different from existing RPC errors which an NFS session | |||
| no different from existing RPC errors which an NFS session | ||||
| implementation is already prepared to handle for other transports. | implementation is already prepared to handle for other transports. | |||
| As with other transports during such a failure, there might be no | And as with other transports during such a failure, there might be no | |||
| SEQUENCE result available to the requester to distinguish whether | SEQUENCE result available to the requester to distinguish whether | |||
| failure occurred before or after the requested operations were | failure occurred before or after the requested operations were | |||
| executed on the responder. When a transport error occurs (eg. | executed on the responder. | |||
| RDMA_ERROR), the requester proceeds as usual to match the incoming | ||||
| XID value to a waiting RPC Call. The RPC transaction is terminated, | ||||
| and the result status is reported to the Upper Layer Protocol. The | ||||
| requester's session implementation then determines the session ID and | ||||
| slot for the failed request, and performs slot recovery to make that | ||||
| slot usable again. If this is not done, that slot could be rendered | ||||
| permanently unavailable. | ||||
| 4.6. Retransmission And Keep-Alive | When a transport error occurs (e.g. RDMA_ERROR), the requester | |||
| | proceeds as usual to match the incoming XID value to a waiting RPC | |||
| | Call. The RPC transaction is terminated, and the result status is | |||
| | reported to the Upper Layer Protocol. The requester's session | |||
| | implementation then determines the session ID and slot for the failed | |||
| | request, and performs slot recovery to make that slot usable again. | |||
| | If this is not done, that slot could be rendered permanently | |||
| | unavailable. | |||
| | 5.7. Transport Considerations | |||
| | 5.7.1. Congestion Avoidance | |||
| | Section 3.1 of [RFC7530] states: | |||
| | Where an NFSv4 implementation supports operation over the IP | |||
| | network protocol, the supported transport layer between NFS and IP | |||
| | MUST be an IETF standardized transport protocol that is specified | |||
| | to avoid network congestion; such transports include TCP and the | |||
| | Stream Control Transmission Protocol (SCTP). | |||
| | Section 2.9.1 of [RFC5661] further states: | |||
| | Even if NFSv4.1 is used over a non-IP network protocol, it is | |||
| | RECOMMENDED that the transport support congestion control. | |||
| | It is permissible for a connectionless transport to be used under | |||
| | NFSv4.1; however, reliable and in-order delivery of data combined | |||
| | with congestion control by the connectionless transport is | |||
| | REQUIRED. As a consequence, UDP by itself MUST NOT be used as an | |||
| | NFSv4.1 transport. | |||
| | RPC-over-RDMA Version One is constructed on a platform of RDMA | |||
| | Reliable Connections [I-D.ietf-nfsv4-rfc5666bis] [RFC5041]. RDMA | |||
| | Reliable Connections are reliable, connection-oriented transports | |||
| | that guarantee in-order delivery, meeting all above requirements for | |||
| | NFS version 4 transports. | |||
| | 5.7.2. Retransmission and Keep-alive | |||
| NFS version 4 client implementations often rely on a transport-layer | NFS version 4 client implementations often rely on a transport-layer | |||
| keep-alive mechanism to detect when an NFS version 4 server has | keep-alive mechanism to detect when an NFS version 4 server has | |||
| become unresponsive. When an NFS server is no longer responsive, | become unresponsive. When an NFS server is no longer responsive, | |||
| client-side keep-alive terminates the connection, which in turn | client-side keep-alive terminates the connection, which in turn | |||
| triggers reconnection and RPC retransmission. | triggers reconnection and RPC retransmission. | |||
| Some RDMA transports (such as Reliable Connections on InfiniBand) | Some RDMA transports (such as Reliable Connections on InfiniBand) | |||
| have no keep-alive mechanism. Without a disconnect or new RPC | have no keep-alive mechanism. Without a disconnect or new RPC | |||
| traffic, such connections can remain alive long after an NFS server | traffic, such connections can remain alive long after an NFS server | |||
| skipping to change at page 12, line 30 | skipping to change at page 14, line 21 | |||
| available RPC-over-RDMA credits on that transport connection, it will | available RPC-over-RDMA credits on that transport connection, it will | |||
| forever await a reply before sending another RPC request. | forever await a reply before sending another RPC request. | |||
| NFS version 4 clients SHOULD reserve one RPC-over-RDMA credit to use | NFS version 4 clients SHOULD reserve one RPC-over-RDMA credit to use | |||
| for periodic server or connection health assessment. This credit can | for periodic server or connection health assessment. This credit can | |||
| be used to drive an RPC request on an otherwise idle connection, | be used to drive an RPC request on an otherwise idle connection, | |||
| triggering either a quick affirmative server response or immediate | triggering either a quick affirmative server response or immediate | |||
| connection termination. | connection termination. | |||
| In addition to network partition and request loss scenarios, RPC- | In addition to network partition and request loss scenarios, RPC- | |||
| over-RDMA connections can be terminated when a Transport header is | over-RDMA transport connections can be terminated when a Transport | |||
| malformed, messages are larger than receive resources, or when too | header is malformed, Reply messages are larger than receive | |||
| many RPC-over-RDMA messages are sent at once. In such cases: | resources, or when too many RPC-over-RDMA messages are sent at once. | |||
| | In such cases: | |||
| o If there is a transport error indicated (i.e., RDMA_ERROR) before | o If there is a transport error indicated (i.e., RDMA_ERROR) before | |||
| the disconnect or instead of a disconnect, the requester MUST | the disconnect or instead of a disconnect, the requester MUST | |||
| respond to that error as prescribed by the specification of the | respond to that error as prescribed by the specification of the | |||
| RPC transport. Then the NFS version 4 rules for handling | RPC transport. Then the NFS version 4 rules for handling | |||
| retransmission apply. | retransmission apply. | |||
| o If there is a transport disconnect and the responder has provided | o If there is a transport disconnect and the responder has provided | |||
| no other response for a request, then only the NFS version 4 rules | no other response for a request, then only the NFS version 4 rules | |||
| for handling retransmission apply. | for handling retransmission apply. | |||
| 5. Extending NFS Upper Layer Bindings | 6. Extending NFS Upper Layer Bindings | |||
| RPC programs such as NFS are required to have an Upper Layer Binding | RPC Programs such as NFS are required to have an Upper Layer Binding | |||
| specification to interoperate on RPC-over-RDMA transports | specification to interoperate on RPC-over-RDMA Version One transports | |||
| [I-D.ietf-nfsv4-rfc5666bis]. Via standards action, the Upper Layer | [I-D.ietf-nfsv4-rfc5666bis]. Via standards action, the Upper Layer | |||
| Binding specified in this document can be extended to cover versions | Binding specified in this document can be extended to cover versions | |||
| of the NFS version 4 protocol specified after NFS version 4 minor | of the NFS version 4 protocol specified after NFS version 4 minor | |||
| version 2, or separately published extensions to an existing NFS | version 2, or separately published extensions to an existing NFS | |||
| version 4 minor version, as described in [I-D.ietf-nfsv4-versioning]. | version 4 minor version, as described in [I-D.ietf-nfsv4-versioning]. | |||
| 6. IANA Considerations | ||||
| NFS use of direct data placement introduces a need for an additional | ||||
| NFS port number assignment for networks that share traditional UDP | ||||
| and TCP port spaces with RDMA services. The iWARP [RFC5041] | ||||
| [RFC5040] protocol is such an example (InfiniBand is not). | ||||
| NFS servers for versions 2 and 3 [RFC1094] [RFC1813] traditionally | ||||
| listen for clients on UDP and TCP port 2049, and additionally, they | ||||
| register these with the portmapper and/or rpcbind [RFC1833] service. | ||||
| However, [RFC7530] requires NFS version 4 servers to listen on TCP | ||||
| port 2049, and they are not required to register. | ||||
| An NFS version 2 or version 3 server supporting RPC-over-RDMA on such | ||||
| a network and registering itself with the RPC portmapper MAY choose | ||||
| an arbitrary port, or MAY use the alternative well-known port number | ||||
| for its RPC-over-RDMA service. The chosen port MAY be registered | ||||
| with the RPC portmapper under the netid assigned by the requirement | ||||
| in [I-D.ietf-nfsv4-rfc5666bis]. | ||||
| An NFS version 4 server supporting RPC-over-RDMA on such a network | ||||
| MUST use the alternative well-known port number for its RPC-over-RDMA | ||||
| service. Clients SHOULD connect to this well-known port without | ||||
| consulting the RPC portmapper (as for NFS version 4 on TCP | ||||
| transports). | ||||
| The port number assigned to an NFS service over an RPC-over-RDMA | ||||
| transport is available from the IANA port registry [RFC3232]. | ||||
| 7. Security Considerations | 7. Security Considerations | |||
| RPC-over-RDMA supports all RPC security models, including RPCSEC_GSS | RPC-over-RDMA Version One supports all RPC security models, including | |||
| security and transport-level security [RFC2203]. The choice of what | RPCSEC_GSS security and transport-level security [RFC2203]. The | |||
| Direct Data Placement mechanism to convey RPC argument and results | choice of what Direct Data Placement mechanism to convey RPC argument | |||
| does not affect this, since it changes only the method of data | and results does not affect this, since it changes only the method of | |||
| transfer. Specifically, the requirements of | data transfer. Specifically, the requirements of | |||
| [I-D.ietf-nfsv4-rfc5666bis] ensure that this choice does not | [I-D.ietf-nfsv4-rfc5666bis] ensure that this choice does not | |||
| introduce new vulnerabilities. | introduce new vulnerabilities. | |||
| Because this document defines only the binding of the NFS protocols | Because this document defines only the binding of the NFS protocols | |||
| atop [I-D.ietf-nfsv4-rfc5666bis], all relevant security | atop [I-D.ietf-nfsv4-rfc5666bis], all relevant security | |||
| considerations are therefore to be described at that layer. | considerations are therefore to be described at that layer. | |||
| 8. References | 8. IANA Considerations | |||
| 8.1. Normative References | The use of direct data placement in NFS introduces a need for an | |||
| | additional port number assignment for networks that share traditional | |||
| | UDP and TCP port spaces with RDMA services. The iWARP protocol is | |||
| | such an example [RFC5041] [RFC5040]. | |||
| | For this purpose, a set of transport protocol port number assignments | |||
| | is specified by this document. IANA has assigned the following ports | |||
| | for NFS/RDMA in the IANA port registry, according to the guidelines | |||
| | described in [RFC6335]. | |||
| | nfsrdma 20049/tcp Network File System (NFS) over RDMA | |||
| | nfsrdma 20049/udp Network File System (NFS) over RDMA | |||
| | nfsrdma 20049/sctp Network File System (NFS) over RDMA | |||
| | This document should be listed as the reference for the nfsrdma port | |||
| | assignments. This document does not alter these assignments. | |||
| | 9. References | |||
| | 9.1. Normative References | |||
| [I-D.ietf-nfsv4-rfc5666bis] | [I-D.ietf-nfsv4-rfc5666bis] | |||
| Lever, C., Simpson, W., and T. Talpey, "Remote Direct | Lever, C., Simpson, W., and T. Talpey, "Remote Direct | |||
| Memory Access Transport for Remote Procedure Call, Version | Memory Access Transport for Remote Procedure Call, Version | |||
| One", draft-ietf-nfsv4-rfc5666bis-09 (work in progress), | One", draft-ietf-nfsv4-rfc5666bis-10 (work in progress), | |||
| January 2017. | February 2017. | |||
| [I-D.ietf-nfsv4-rpcrdma-bidirection] | [I-D.ietf-nfsv4-rpcrdma-bidirection] | |||
| Lever, C., "Bi-directional Remote Procedure Call On RPC- | Lever, C., "Bi-directional Remote Procedure Call On RPC- | |||
| over-RDMA Transports", draft-ietf-nfsv4-rpcrdma- | over-RDMA Transports", draft-ietf-nfsv4-rpcrdma- | |||
| bidirection-06 (work in progress), January 2017. | bidirection-07 (work in progress), February 2017. | |||
| [RFC1833] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", | [RFC1833] Srinivasan, R., "Binding Protocols for ONC RPC Version 2", | |||
| RFC 1833, DOI 10.17487/RFC1833, August 1995, | RFC 1833, DOI 10.17487/RFC1833, August 1995, | |||
| <http://www.rfc-editor.org/info/rfc1833>. | <http://www.rfc-editor.org/info/rfc1833>. | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | DOI 10.17487/RFC2119, March 1997, | |||
| <http://www.rfc-editor.org/info/rfc2119>. | <http://www.rfc-editor.org/info/rfc2119>. | |||
| [RFC2203] Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol | [RFC2203] Eisler, M., Chiu, A., and L. Ling, "RPCSEC_GSS Protocol | |||
| Specification", RFC 2203, DOI 10.17487/RFC2203, September | Specification", RFC 2203, DOI 10.17487/RFC2203, September | |||
| 1997, <http://www.rfc-editor.org/info/rfc2203>. | 1997, <http://www.rfc-editor.org/info/rfc2203>. | |||
| [RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., | [RFC5661] Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed., | |||
| "Network File System (NFS) Version 4 Minor Version 1 | "Network File System (NFS) Version 4 Minor Version 1 | |||
| Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010, | Protocol", RFC 5661, DOI 10.17487/RFC5661, January 2010, | |||
| <http://www.rfc-editor.org/info/rfc5661>. | <http://www.rfc-editor.org/info/rfc5661>. | |||
| | [RFC6335] Cotton, M., Eggert, L., Touch, J., Westerlund, M., and S. | |||
| | Cheshire, "Internet Assigned Numbers Authority (IANA) | |||
| | Procedures for the Management of the Service Name and | |||
| | Transport Protocol Port Number Registry", BCP 165, | |||
| | RFC 6335, DOI 10.17487/RFC6335, August 2011, | |||
| | <http://www.rfc-editor.org/info/rfc6335>. | |||
| [RFC7530] Haynes, T., Ed. and D. Noveck, Ed., "Network File System | [RFC7530] Haynes, T., Ed. and D. Noveck, Ed., "Network File System | |||
| (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, | (NFS) Version 4 Protocol", RFC 7530, DOI 10.17487/RFC7530, | |||
| March 2015, <http://www.rfc-editor.org/info/rfc7530>. | March 2015, <http://www.rfc-editor.org/info/rfc7530>. | |||
| [RFC7862] Haynes, T., "Network File System (NFS) Version 4 Minor | [RFC7862] Haynes, T., "Network File System (NFS) Version 4 Minor | |||
| Version 2 Protocol", RFC 7862, DOI 10.17487/RFC7862, | Version 2 Protocol", RFC 7862, DOI 10.17487/RFC7862, | |||
| November 2016, <http://www.rfc-editor.org/info/rfc7862>. | November 2016, <http://www.rfc-editor.org/info/rfc7862>. | |||
| 8.2. Informative References | 9.2. Informative References | |||
| [I-D.ietf-nfsv4-versioning] | [I-D.ietf-nfsv4-versioning] | |||
| Noveck, D., "Rules for NFSv4 Extensions and Minor | Noveck, D., "Rules for NFSv4 Extensions and Minor | |||
| Versions", draft-ietf-nfsv4-versioning-09 (work in | Versions", draft-ietf-nfsv4-versioning-09 (work in | |||
| progress), December 2016. | progress), December 2016. | |||
| [NSM] The Open Group, "Protocols for Interworking: XNFS, Version | [NSM] The Open Group, "Protocols for Interworking: XNFS, Version | |||
| 3W", February 1998. | 3W", February 1998. | |||
| [RFC1094] Nowicki, B., "NFS: Network File System Protocol | [RFC1094] Nowicki, B., "NFS: Network File System Protocol | |||
| specification", RFC 1094, DOI 10.17487/RFC1094, March | specification", RFC 1094, DOI 10.17487/RFC1094, March | |||
| 1989, <http://www.rfc-editor.org/info/rfc1094>. | 1989, <http://www.rfc-editor.org/info/rfc1094>. | |||
| [RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS | [RFC1813] Callaghan, B., Pawlowski, B., and P. Staubach, "NFS | |||
| Version 3 Protocol Specification", RFC 1813, | Version 3 Protocol Specification", RFC 1813, | |||
| DOI 10.17487/RFC1813, June 1995, | DOI 10.17487/RFC1813, June 1995, | |||
| <http://www.rfc-editor.org/info/rfc1813>. | <http://www.rfc-editor.org/info/rfc1813>. | |||
| [RFC3232] Reynolds, J., Ed., "Assigned Numbers: RFC 1700 is Replaced | ||||
| by an On-line Database", RFC 3232, DOI 10.17487/RFC3232, | ||||
| January 2002, <http://www.rfc-editor.org/info/rfc3232>. | ||||
| [RFC5040] Recio, R., Metzler, B., Culley, P., Hilland, J., and D. | [RFC5040] Recio, R., Metzler, B., Culley, P., Hilland, J., and D. | |||
| Garcia, "A Remote Direct Memory Access Protocol | Garcia, "A Remote Direct Memory Access Protocol | |||
| Specification", RFC 5040, DOI 10.17487/RFC5040, October | Specification", RFC 5040, DOI 10.17487/RFC5040, October | |||
| 2007, <http://www.rfc-editor.org/info/rfc5040>. | 2007, <http://www.rfc-editor.org/info/rfc5040>. | |||
| [RFC5041] Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct | [RFC5041] Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct | |||
| Data Placement over Reliable Transports", RFC 5041, | Data Placement over Reliable Transports", RFC 5041, | |||
| DOI 10.17487/RFC5041, October 2007, | DOI 10.17487/RFC5041, October 2007, | |||
| <http://www.rfc-editor.org/info/rfc5041>. | <http://www.rfc-editor.org/info/rfc5041>. | |||
| [RFC5666] Talpey, T. and B. Callaghan, "Remote Direct Memory Access | ||||
| Transport for Remote Procedure Call", RFC 5666, | ||||
| DOI 10.17487/RFC5666, January 2010, | ||||
| <http://www.rfc-editor.org/info/rfc5666>. | ||||
| [RFC5667] Talpey, T. and B. Callaghan, "Network File System (NFS) | [RFC5667] Talpey, T. and B. Callaghan, "Network File System (NFS) | |||
| Direct Data Placement", RFC 5667, DOI 10.17487/RFC5667, | Direct Data Placement", RFC 5667, DOI 10.17487/RFC5667, | |||
| January 2010, <http://www.rfc-editor.org/info/rfc5667>. | January 2010, <http://www.rfc-editor.org/info/rfc5667>. | |||
| Appendix A. Changes Since RFC 5667 | Appendix A. Changes Since RFC 5667 | |||
| Corrections and updates made necessary by new language in | Corrections and updates made necessary by new language in | |||
| [I-D.ietf-nfsv4-rfc5666bis] have been introduced. For example, | [I-D.ietf-nfsv4-rfc5666bis] have been introduced. For example, | |||
| references to deprecated features of RPC-over-RDMA Version One, such | references to deprecated features of RPC-over-RDMA Version One, such | |||
| as RDMA_MSGP, and the use of the Read list for handling RPC replies, | as RDMA_MSGP, and the use of the Read list for handling RPC replies, | |||
skipping to change at page 16, line 4 | skipping to change at page 17, line 42 |
| Some material that duplicates what is in [I-D.ietf-nfsv4-rfc5666bis] | Some material that duplicates what is in [I-D.ietf-nfsv4-rfc5666bis] | |||
| has been deleted. | has been deleted. | |||
| Material required by [I-D.ietf-nfsv4-rfc5666bis] for Upper Layer | Material required by [I-D.ietf-nfsv4-rfc5666bis] for Upper Layer | |||
| Bindings that was not present in [RFC5667] has been added, including | Bindings that was not present in [RFC5667] has been added, including | |||
| discussion of how each NFS version properly estimates the maximum | discussion of how each NFS version properly estimates the maximum | |||
| size of RPC replies. | size of RPC replies. | |||
| Technical corrections have been made. For example, the mention of | Technical corrections have been made. For example, the mention of | |||
| 12KB and 36KB inline thresholds has been removed. The reference to | 12KB and 36KB inline thresholds has been removed. The reference to | |||
| a non-existent NFS version 4 SYMLINK operation has been replaced with | a non-existent NFS version 4 SYMLINK operation has been replaced. | |||
| NFS version 4 CREATE(NF4LNK). | ||||
| The discussion of NFS version 4 COMPOUND handling has been completed. | The discussion of NFS version 4 COMPOUND handling has been completed. | |||
| Some changes were made to the algorithm for matching DDP-eligible | Some changes were made to the algorithm for matching DDP-eligible | |||
| results to Write chunks. | results to Write chunks. | |||
| Requirements to ignore extra Read or Write chunks have been removed | Requirements to ignore extra Read or Write chunks have been removed | |||
| from the NFS version 2 and 3 Upper Layer Binding, as they conflict | from the NFS version 2 and 3 Upper Layer Binding, as they conflict | |||
| with [I-D.ietf-nfsv4-rfc5666bis]. | with [I-D.ietf-nfsv4-rfc5666bis]. | |||
| A complete discussion of reply size estimation has been introduced | A complete discussion of reply size estimation has been introduced | |||
skipping to change at page 16, line 36 | skipping to change at page 18, line 26 |
| backchannel operation has replaced the previous treatment of | backchannel operation has replaced the previous treatment of | |||
| callback operations. | callback operations. | |||
| o A binding for NFS version 4.2 has been added that includes | o A binding for NFS version 4.2 has been added that includes | |||
| discussion of new data-bearing operations like READ_PLUS. | discussion of new data-bearing operations like READ_PLUS. | |||
| o A section suggesting a mechanism for periodically assessing | o A section suggesting a mechanism for periodically assessing | |||
| connection health has been introduced. | connection health has been introduced. | |||
| o Language inconsistent with or contradictory to | o Language inconsistent with or contradictory to | |||
| [I-D.ietf-nfsv4-rfc5666bis] has been removed from Sections 2 and | [I-D.ietf-nfsv4-rfc5666bis] has been removed from the present | |||
| 3, and both Sections have been combined into Section 2 in the | document. | |||
| present document. | ||||
| o Ambiguous or erroneous uses of RFC 2119 terms have been corrected. | o Ambiguous or erroneous uses of RFC 2119 terms have been corrected. | |||
| o References to obsolete RFCs have been updated. | o References to obsolete RFCs have been updated. | |||
| o An IANA Considerations Section has replaced the "Port Usage | o An IANA Considerations Section has been added, which specifies the | |||
| Considerations" Section. | port assignments for NFS/RDMA. This replaces the example | |||
| assignment that appeared in [RFC5666]. | ||||
| o Code excerpts have been removed, and figures have been modernized. | o Code excerpts have been removed, and figures have been modernized. | |||
| Appendix B. Acknowledgments | Appendix B. Acknowledgments | |||
| The author gratefully acknowledges the work of Brent Callaghan and | The author gratefully acknowledges the work of Brent Callaghan and | |||
| Tom Talpey on the original NFS Direct Data Placement specification | Tom Talpey on the original NFS Direct Data Placement specification | |||
| [RFC5667]. The author also wishes to thank Bill Baker and Greg | [RFC5667]. The author also wishes to thank Bill Baker and Greg | |||
| Marsden for their support of this work. | Marsden for their support of this work. | |||
| Dave Noveck provided excellent review, constructive suggestions, and | Dave Noveck provided excellent review, constructive suggestions, and | |||
| consistent navigational guidance throughout the process of drafting | consistent navigational guidance throughout the process of drafting | |||
| this document. Dave also contributed the text of Section 4.5 | this document. Dave also contributed the text of Section 5.6 and | |||
| Section 6, and insisted on precise discussion of reply size | ||||
| estimation. | ||||
| Thanks to Karen Deitke for her sharp observations about idempotency, | Thanks to Karen Deitke for her sharp observations about idempotency, | |||
| and the clarity of the discussion of NFS COMPOUNDs and NFS sessions. | and the clarity of the discussion of NFS COMPOUNDs and NFS sessions. | |||
| Special thanks go to Transport Area Director Spencer Dawkins, nfsv4 | Special thanks go to Transport Area Director Spencer Dawkins, nfsv4 | |||
| Working Group Chair Spencer Shepler, and nfsv4 Working Group | Working Group Chair Spencer Shepler, and nfsv4 Working Group | |||
| Secretary Thomas Haynes for their support. | Secretary Thomas Haynes for their support. | |||
| Author's Address | Author's Address | |||
| End of changes. 95 change blocks. | ||||
| 256 lines changed or deleted | 346 lines changed or added | |||