Network File System Version 4                                   C. Lever
Internet-Draft                                                    Oracle
Intended status: Experimental                         September 29, 2016
Expires: April 2, 2017

 RDMA Connection Manager Private Messages For RPC-Over-RDMA Version One
                 draft-cel-nfsv4-rpcrdma-cm-pvt-msg-00

Abstract

   This document specifies the format of RDMA-CM Private Data exchanged
   between RPC-over-RDMA Version One peers. Such messages indicate peer
   support for Remote Invalidation and larger-than-default inline
   thresholds, but can be extended.
   The Private Data message format defined in this document is
   experimental only.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 2, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Advertised Transport Capabilities
   3. Private Data Message Format
   4. Interoperability Considerations
   5. IANA Considerations
   6. Security Considerations
   7. References
   Appendix A. Acknowledgments
   Author's Address

1. Introduction

   RPC-over-RDMA Version One, specified in [I-D.ietf-nfsv4-rfc5666bis],
   enables the use of direct data placement for upper layer protocols
   based on RPC [RFC5531]. However, there are some recognized
   shortcomings of the RPC-over-RDMA Version One protocol. The two most
   immediate shortcomings are:

   o  Setting up an explicit RDMA operation (RDMA Read or Write) can be
      costly. The small default size of inline thresholds requires the
      use of explicit RDMA operations even for relatively small messages
      and data payloads.

   o  Unlike most other contemporary RDMA-enabled storage protocols,
      there is no facility in RPC-over-RDMA Version One that enables the
      use of Remote Invalidation [RFC5042].

   The original specification of RPC-over-RDMA Version One provided an
   out-of-band protocol for passing inline threshold settings between
   connected peers. However, [I-D.ietf-nfsv4-rfc5666bis] deprecates
   this protocol because it was not fully specified and thus it was
   never implemented.

   Work on [I-D.ietf-nfsv4-rfc5666bis] has demonstrated that the
   RPC-over-RDMA Version One protocol as it stands is challenging to
   extend while maintaining interoperability. Therefore, another
   out-of-band mechanism is required to help relieve these limitations
   for RPC-over-RDMA Version One implementations.

   This document specifies a simple, non-XDR-based message format
   designed to pass between RPC-over-RDMA Version One peers when an RDMA
   transport connection is first established.
   The purpose of this message format is to enable experimentation with
   parameters of the base transport layer over which RPC-over-RDMA runs.
   Future versions of RPC-over-RDMA may make use of these experimental
   results, providing similar information exchange as part of the
   XDR-defined base transport protocol.

2. Advertised Transport Capabilities

2.1. Inline Threshold Size

   Section 4.3.2 of [I-D.ietf-nfsv4-rfc5666bis] defines the term "inline
   threshold." There is a pair of inline thresholds per transport
   connection, one for each direction of message flow, which limit the
   size of messages conveyed using RDMA Send. If an incoming message
   exceeds the size of a receiver's inline threshold, the receive
   operation fails and the connection is typically terminated. To send
   a message larger than a receiver's inline threshold, an NFS client
   uses explicit RDMA operations, which are typically more costly than
   RDMA Send.

   The default value of this threshold for RPC-over-RDMA Version One
   connections is 1024 bytes (see Section 4.3.3 of
   [I-D.ietf-nfsv4-rfc5666bis]). This is adequate for nearly all NFS
   Version 3 procedures. NFS Version 4 COMPOUNDs are larger, on
   average, forcing clients to use explicit RDMA operations for
   frequently-issued requests such as LOOKUP and GETATTR.

   If a sender and receiver can agree on a larger inline threshold, a
   greater portion of frequently-issued NFS Version 4 operations can
   avoid the use of explicit RDMA operations. Explicit RDMA can be
   avoided for smaller I/O requests as well.

   Thus each peer advertises the largest message size it can send and
   the largest size it can receive. The requester MUST use the smaller
   of its maximum send size and the responder's maximum receive size as
   the requester-to-responder inline threshold.
   The responder MUST use the smaller of its maximum send size and the
   requester's maximum receive size as the responder-to-requester inline
   threshold.

2.2. Support for Remote Invalidation

   A description of Remote Invalidation and a full discussion of the
   design issues can be found in [I-D.cel-nfsv4-reminv-design].

   Without altering the XDR definition of RPC-over-RDMA Version One
   messages that carry chunk lists, it's not possible to provide fully
   generic support for Remote Invalidation. However, it is possible to
   provide a simple signaling mechanism for a requester to indicate it
   can deal with Responder's Choice (see Section 2.3 of
   [I-D.cel-nfsv4-reminv-design]). In this case, the responder is
   allowed to invalidate any STag in an RPC-over-RDMA request.

   Thus each peer advertises its ability to support Responder's Choice
   Remote Invalidation. If both peers support it, then the responder
   MAY use RDMA Send With Invalidate rather than RDMA Send to convey
   RPC-over-RDMA reply messages.

3. Private Data Message Format

   When an RPC-over-RDMA Version One transport connection is
   established, a requester and responder MAY populate the CM Private
   Data field exchanged as part of CM connection establishment (refer to
   Section 12.7.35 of [IBTA-IB]). For RPC-over-RDMA Version One, the CM
   Private Data field is formatted as described in this section.
   Requesters and responders use the same format.

3.1. Fixed Mandatory Fields

   The first 8 octets of the CM Private Data field MUST be formatted as
   follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Magic Number                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Version    |     Flags     |   Send Size   | Receive Size  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Magic Number
      This field contains a fixed 32-bit value that identifies the
      content of the Private Data field as an RPC-over-RDMA Version One
      CM Private Data message. The value of this field MUST be
      0xf6ab0e18, in big-endian order.

   Version
      This 8-bit field contains a message format version number. The
      value "1" in this field means only the first eight octets are
      present, they appear in the order described in this section, and
      they each have the meaning defined in this section.

   Flags
      This 8-bit field contains eight boolean flags that indicate the
      support status of optional features, such as Remote Invalidation.
      The meaning of these flags is defined in Section 3.1.1.

   Send Size
      This 8-bit field contains an encoded value corresponding to the
      largest message size this peer can send using RDMA Send. The
      value is encoded as described in Section 3.1.2.

   Receive Size
      This 8-bit field contains an encoded value corresponding to the
      largest message size this peer can receive via posted receive
      buffers. The value is encoded as described in Section 3.1.2.

3.1.1. Feature Support Flags

   The bits in the Flags field are labeled from bit 8 to bit 15, as
   shown in the diagram above.
   When the Version field contains the value "1", the bits in the Flags
   field have the following meaning:

   Bit 15
      When this bit is asserted (one), the sender supports the use of
      Remote Invalidation, as described in
      [I-D.cel-nfsv4-reminv-design]. When this bit is clear (zero), the
      sender does not support Remote Invalidation.

   Bits 14 - 8
      These bits are reserved and must be clear (zero).

3.1.2. Inline Threshold Encoding

   Inline threshold sizes from 1KB to 256KB can be represented in the
   Send Size and Receive Size fields. A sender computes the encoded
   value by dividing the actual value by 1024 and subtracting one from
   the result. A receiver decodes this value by adding one and
   multiplying the result by 1024. For example, a 4096-byte threshold
   is encoded as 3, and the encoded value 255 represents 262144 bytes
   (256KB).

3.2. Extending The Private Message Format

   The Private Data format described above can be extended to add
   additional optional fields which follow the first eight octets or to
   make use of one of the reserved bits in the Flags field. To
   introduce such changes while preserving interoperability, a new
   Version number is allocated, and new fields and bit flags are
   defined. A description of how receivers should behave if they do not
   recognize the new format must also be provided. If this document is
   still a personal draft in the Experimental category, it must be
   updated to document the new Private Data message format as above.

4. Interoperability Considerations

   This extension is intended to interoperate with other RPC-over-RDMA
   Version One implementations which do not support the exchange of CM
   Private Data. When a peer does not receive a CM Private Data message
   which conforms to Section 3, it MUST assume the remote peer supports
   only the default RPC-over-RDMA Version One settings as defined in
   [I-D.ietf-nfsv4-rfc5666bis].
   In other words, the peer behaves as if a Private Data message was
   received in which bit 15 of the Flags field is clear (zero), and both
   Size fields contain the value zero.

5. IANA Considerations

   There are no IANA considerations for this document.

6. Security Considerations

   RDMA-CM Private Data typically traverses the link layer in the clear.
   The same considerations apply here that are described in the Security
   Considerations section of [I-D.ietf-nfsv4-rfc5666bis].

7. References

7.1. Normative References

   [I-D.ietf-nfsv4-rfc5666bis]
              Lever, C., Simpson, W., and T. Talpey, "Remote Direct
              Memory Access Transport for Remote Procedure Call, Version
              One", draft-ietf-nfsv4-rfc5666bis-07 (work in progress),
              May 2016.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5042]  Pinkerton, J. and E. Deleganes, "Direct Data Placement
              Protocol (DDP) / Remote Direct Memory Access Protocol
              (RDMAP) Security", RFC 5042, DOI 10.17487/RFC5042,
              October 2007.

7.2. Informative References

   [I-D.cel-nfsv4-reminv-design]
              Lever, C., "Using Remote Invalidation With RPC-Over-RDMA
              Transports", draft-cel-nfsv4-reminv-design-04 (work in
              progress), September 2016.

   [IBTA-IB]  InfiniBand Trade Association, "InfiniBand(TM) Architecture
              Specification Volume 1 Release 1.2", November 2007.

   [RFC5531]  Thurlow, R., "RPC: Remote Procedure Call Protocol
              Specification Version 2", RFC 5531, DOI 10.17487/RFC5531,
              May 2009.

Appendix A. Acknowledgments

   Thanks to Christoph Hellwig of HGST and Devesh Sharma of Broadcom for
   suggesting this approach.

   Special thanks go to Transport Area Director Spencer Dawkins, nfsv4
   Working Group Chair Spencer Shepler, and nfsv4 Working Group
   Secretary Thomas Haynes for their support.
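
Appendix B. Illustrative Encoding Sketch

   The following Python sketch is not part of the protocol definition.
   It illustrates, under stated assumptions, how a peer might build and
   parse the eight-octet message of Section 3.1, apply the encoding of
   Section 3.1.2, select inline thresholds as in Section 2.1, and fall
   back to default settings as in Section 4. All function and constant
   names are hypothetical. Rounding sizes down to a multiple of 1024
   and treating an unrecognized Version value as a non-conforming
   message are assumptions of this sketch, not requirements stated in
   this document.

```python
import struct

# Values taken from Sections 2.1 and 3.1; the names are hypothetical.
MAGIC = 0xF6AB0E18
VERSION = 1
FLAG_REMOTE_INVAL = 0x01   # bit 15 in the diagram = low-order Flags bit
DEFAULT_THRESHOLD = 1024   # default inline threshold, in bytes

# Big-endian: 32-bit Magic Number, then four single-octet fields.
_LAYOUT = ">IBBBB"


def encode_size(nbytes):
    """Section 3.1.2: divide by 1024, then subtract one."""
    if not 1024 <= nbytes <= 256 * 1024:
        raise ValueError("size must be between 1KB and 256KB")
    # Assumption: sizes that are not a multiple of 1024 round down.
    return nbytes // 1024 - 1


def decode_size(code):
    """Inverse of encode_size: add one, then multiply by 1024."""
    return (code + 1) * 1024


def pack_private_data(send_size, recv_size, remote_inval):
    """Build the first eight octets of the CM Private Data field."""
    flags = FLAG_REMOTE_INVAL if remote_inval else 0
    return struct.pack(_LAYOUT, MAGIC, VERSION, flags,
                       encode_size(send_size), encode_size(recv_size))


def unpack_private_data(data):
    """Return (send_size, recv_size, remote_inval) for the remote peer.

    Falls back to the defaults of Section 4 when no conforming message
    is present. Assumption: an unknown Version counts as non-conforming.
    """
    defaults = (DEFAULT_THRESHOLD, DEFAULT_THRESHOLD, False)
    if data is None or len(data) < 8:
        return defaults
    magic, version, flags, snd, rcv = struct.unpack(_LAYOUT, data[:8])
    if magic != MAGIC or version != VERSION:
        return defaults
    return (decode_size(snd), decode_size(rcv),
            bool(flags & FLAG_REMOTE_INVAL))


def negotiate(local, remote):
    """Apply Sections 2.1 and 2.2 to two (send, recv, inval) tuples.

    Returns (outbound_threshold, inbound_threshold, use_remote_inval)
    from the point of view of the local peer.
    """
    outbound = min(local[0], remote[1])
    inbound = min(local[1], remote[0])
    return outbound, inbound, local[2] and remote[2]
```

   For instance, a peer advertising a 4096-byte send size, an 8192-byte
   receive size, and Remote Invalidation support would call
   pack_private_data(4096, 8192, True), and would pass whatever Private
   Data it receives to unpack_private_data() before calling negotiate().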

Author's Address

   Charles Lever
   Oracle Corporation
   1015 Granger Avenue
   Ann Arbor, MI 48104
   USA

   Phone: +1 734 274 2396
   Email: chuck.lever@oracle.com