Network File System Version 4                                   C. Lever
Internet-Draft                                                    Oracle
Intended status: Informational                             July 17, 2017
Expires: January 18, 2018


        Using Remote Invalidation With RPC-Over-RDMA Transports
                    draft-cel-nfsv4-reminv-design-06

Abstract

   Remote Invalidation relieves RDMA responders of some of the burden of
   preparing memory to be accessed remotely, thus reducing the latency
   of RDMA Read and Write operations. This document considers how to
   introduce generic support for Remote Invalidation to RPC-over-RDMA
   transport protocols.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 18, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
   2. Requirements Language
   3. General Requirements
     3.1. Memory Management Extensions
     3.2. Registration Types
     3.3. Selecting STags to Invalidate Remotely
     3.4. Future Enhancements
   4. Remote Invalidation in Operation
     4.1. Determining Remote Invalidation Support Status
     4.2. Selection of Which STag to Invalidate Remotely
     4.3. Reverse-Direction Operation
   5. Protocol Elements
     5.1. Per Protocol Version Remote Invalidation
     5.2. Per Connection Remote Invalidation
     5.3. Fixed Protocol Remote Invalidation
     5.4. Per RPC Remote Invalidation (Single STag)
     5.5. Per RPC Remote Invalidation (Multiple STags)
     5.6. Inter-RPC Remote Invalidation
   6. Recommendations
     6.1. General Considerations
     6.2. Analysis And Discussion
     6.3. Example Remote Invalidation Protocol
   7. Security Considerations
   8. IANA Considerations
   9. References
     9.1. Normative References
     9.2. Informative References
   Acknowledgments
   Author's Address

1. Introduction

   Like other RDMA-enabled storage protocols, RPC-over-RDMA version 1
   employs a Read-Write transfer model when using explicit RDMA
   operations to transfer data [RFC8166]. This means an RPC-over-RDMA
   requester exposes regions of its memory to an RPC-over-RDMA
   responder. The responder then uses RDMA Read and Write operations to
   transfer bulk data payloads.

   In preparation for a bulk data transfer, a requester asks its RNIC to
   assign a steering tag, or STag, to a region of memory containing the
   data to be moved.
   At this time, access rights are granted that allow
   the RNIC to access or update that memory on behalf of a remote peer.
   This act is referred to as "memory registration." The RNIC uses this
   STag to steer data to and from the registered memory region.

   When data transfer is complete, each STag is dissociated from its
   memory region. This act is referred to as "memory invalidation." It
   prevents further responder access to that memory region by revoking
   its remote access rights. Invalidation should be done before RPC
   applications on the requester are allowed access to memory that was
   involved in an explicit RDMA operation.

   Before an RPC transaction is terminated, the requester is responsible
   for fencing memory from the responder [RFC8166]. That makes the
   completion of RPC transactions synchronous with chunk invalidation.
   Therefore the latency of invalidation adds to the total execution
   time of the RPC transaction.

   Remote Invalidation is a mechanism by which an RDMA peer can request
   that the remote peer RNIC invalidate an STag associated with memory
   on that remote peer [RFC5042]. An RDMA consumer requests Remote
   Invalidation by posting an RDMA Send With Invalidate Work Request in
   place of an RDMA Send Work Request. RDMA Send With Invalidate is
   similar to RDMA Send, but takes one additional argument: a single
   STag to be invalidated by the RNIC that receives the sent message.
   An RDMA Send message is transmitted with additional header
   information that conveys the STag that is to be invalidated
   [RFC5040].

   The benefit of Remote Invalidation is that an extra Work Request,
   context switch, and interrupt to perform memory invalidation are not
   required by the requester as part of handling the completion of an
   RPC transaction. STag invalidation begins before the Receive
   completes, thus invalidation is started (and completes) sooner.
   The upshot is faster completion of RPC transactions that involve
   registered memory.

   This mechanism has the most impact when explicit RDMA operations are
   needed to move moderate amounts of data. Invalidation latency is
   quite small compared to the time it takes to convey a large payload
   with an explicit RDMA operation. Small RPCs are already conveyed
   entirely via RDMA Send, thus Remote Invalidation is unnecessary for
   them. When the time it takes to invalidate a memory region is on the
   same order as the time it takes to move the contents of that region,
   Remote Invalidation has its greatest impact.

   Remote Invalidation confers benefits similar to the benefits of
   increasing the size of Send and Receive buffers. However, Remote
   Invalidation does not incur the cost of maintaining a pool of large
   Receive buffers on either the requester or responder. Moderate-sized
   RPC payloads can be transferred without the usual costs of memory
   registration. Requesters can rely on RDMA Write to structure their
   Receive buffers without introducing additional latency.

   There are some downsides, however. Remote Invalidation is not
   available on all RNIC devices. In addition, Remote Invalidation does
   not address the extra round trip latency incurred when using RDMA
   Read. This extra latency can be eliminated using a large inline
   threshold for transmitting RPC Calls.

   The purpose of this document is to explore generally how Remote
   Invalidation can be introduced into the RPC-over-RDMA transport
   protocol. The primary design considerations for the transport
   protocol are to provide a mechanism to indicate when Remote
   Invalidation can be used by the transport, and to provide selection
   criteria for choosing which STag to invalidate remotely.
   Elements of
   the XDR definition of the RPC-over-RDMA protocol must be altered to
   some degree, depending on desired flexibility of operation,
   invasiveness of XDR changes, and broadness of hardware support.

2. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119] [RFC8174]
   when, and only when, they appear in all capitals, as shown here.

3. General Requirements

3.1. Memory Management Extensions

   Remote Invalidation was not available in the original RDMA Verbs API.
   New verbs API objects were specified that include operations that
   enable Remote Invalidation, now described in [IBARCH]. The Verbs API
   provides a capabilities flag, MEM_MGT_EXTENSIONS, that indicates that
   an RNIC can provide the new APIs and objects.

   An STag that is registered using the FRWR mechanism (in a privileged
   execution context), or is registered via a Memory Window (in user
   space), may be invalidated remotely [RFC5040]. These mechanisms are
   available when an RNIC supports MEM_MGT_EXTENSIONS.

   RDMA Send With Invalidate is available only with MEM_MGT_EXTENSIONS.

3.2. Registration Types

   For the purposes of this discussion, there are two classes of STags.
   Dynamically-registered STags are used in a single RPC, then
   invalidated. Persistently-registered STags live longer than one RPC.
   They may persist for the life of an RPC-over-RDMA connection, or
   longer.

   In RPC-over-RDMA version 1, a requester may provide more than one
   STag in the chunk lists of an RPC. It may provide any combination of
   the following registration types in one RPC, any combination of these
   in a series of RPCs on the same connection, or it may use some other
   registration model.

   Examples of persistently-registered STags include:

   o  The device's reserved DMA R_key

   o  An STag registered for a connection that doesn't change from RPC
      to RPC (for a utility buffer, say)

   o  An STag registered for a fixed memory region that is updated after
      each time it is advertised

   o  An STag covering a large single region that is utilized in small
      segments by many RPCs

   Examples of dynamically-registered STags include:

   o  An STag registered for a single RPC transaction using a legacy
      registration mechanism, then invalidated when the RPC is retired

   o  An STag registered for a single RPC transaction using either
      Memory Windows or FRWR, then invalidated when the RPC is retired

   Among these examples, only dynamically-registered STags using Memory
   Windows or FRWR may be invalidated remotely.

3.3. Selecting STags to Invalidate Remotely

   Remote Invalidation protocol mechanisms come in different styles:

   Fixed Protocol
      The rules by which a responder selects which STag to invalidate
      remotely are fixed in the protocol specification.

   Responder's Choice
      The responder chooses an STag to invalidate remotely from among
      all the STags in incoming requests.

   Requester's Choice
      The requester chooses one or more STags that may be invalidated
      remotely, indicating its choices in each request. The responder
      chooses an STag to invalidate remotely from among the requester's
      picks.

   There is no RDMA layer mechanism by which a responder can determine
   how a requester-provided STag was registered. Thus a requester that
   mixes persistently- and dynamically-registered STags in one RPC, or
   mixes them across RPCs on the same connection, cannot tolerate
   Responder's Choice.

3.4. Future Enhancements

   There are two related enhancements that further reduce the effort
   needed to invalidate STags associated with complex RPCs:

   o  The ability for one registered STag to represent a list of memory
      regions that are not contiguous

   o  The ability to specify more than one remote STag in a single Work
      Request to be remotely invalidated

   At this time, the first mechanism has been implemented in at least
   one RNIC on the market. The second is speculative (i.e., has not yet
   been implemented anywhere).

   Given support for registering non-contiguous memory regions with one
   STag, when an RPC-over-RDMA requester constructs an RPC that has both
   a Read list and a Write list, the requester has a choice:

   o  The requester can register a separate STag for each access mode
      (one STag for memory regions needing read access, and one STag for
      those needing write access) to provide good data security

   o  The requester can register a single STag with read and write
      access enabled for the whole set of memory regions, to allow RDMA
      Send With Invalidate to work optimally

   Having the ability to remotely invalidate multiple STags at once
   enables the combination of optimal performance and optimal security.

4. Remote Invalidation in Operation

   When requester memory is registered for remote access, an RPC-over-
   RDMA implementation can use Remote Invalidation by following these
   steps:

   1.  The requester DMA-maps a memory region that will participate in
       an RPC transaction, then registers an STag for that region.

   2.  The requester transmits the RPC Call, which also conveys the
       STag, to the responder.

   3.  The responder processes the RPC transaction. The peer RNICs use
       the STag to move RPC arguments and/or results.

   4.  The responder transmits the RPC Reply using an RDMA Send With
       Invalidate Work Request, setting the Work Request's inv_handle
       field to the value of the STag.

   5.  A Receive Work Request completes on the requester, carrying this
       RPC Reply, and reporting the invalidated STag.

   6.  The requester skips invalidation of the STag, then DMA-unmaps the
       memory region associated with the STag.

   The requester no longer needs to invalidate the STag involved with
   this RPC. However, there are additional details that must be
   resolved before the use of Remote Invalidation can commence.

4.1. Determining Remote Invalidation Support Status

   An RDMA consumer (an Upper Layer Protocol implementation) that does
   not support Remote Invalidation might not tolerate the use of RDMA
   Send With Invalidate by a responder. Such a requester performs Local
   Invalidation on STags that already happen to be invalid, and in some
   cases this can result in protection errors or other issues.

   Thus, to avoid spurious connection termination, a responder must not
   post an RDMA Send With Invalidate Work Request unless it is sure the
   following three conditions are met:

   o  The requester's RNIC is prepared to receive the additional header
      information associated with Remote Invalidation

   o  The requester has used an appropriate registration mechanism to
      register STags it wants invalidated remotely

   o  The requester is prepared to recognize remotely invalidated STags
      during Receive processing, and thus avoid invalidating them a
      second time

   When all three of these conditions are true, a requester can report
   positive Remote Invalidation support status to responders using an
   Upper Layer Protocol mechanism. When a responder does not know the
   requester's Remote Invalidation support status, it cannot use Remote
   Invalidation without endangering the connection.

4.2. Selection of Which STag to Invalidate Remotely

   The RDMA Send With Invalidate Work Request invalidates only one STag.
   RPC-over-RDMA requesters may register more than one STag to handle
   the movement of payloads for a single RPC. Either the requester will
   have to specify which STag may be remotely invalidated, the protocol
   will have to specify a fixed way to select which STag to invalidate,
   or the responder will have to choose arbitrarily which STag to
   remotely invalidate.

   In some circumstances, requesters may wish to utilize STags during
   transactions that are registered using a mechanism that does not
   tolerate Remote Invalidation. For example, an STag that is the
   requester's local DMA R_key should never be invalidated remotely. If
   a responder attempts to invalidate such an STag, the result is
   undefined, but the connection may be terminated or other failures can
   occur.

   Even with Remote Invalidation enabled, requesters remain responsible
   for ensuring all STags are invalid before RPC transactions complete.
   To avoid leaving STags registered, a requester must be prepared for
   the responder or the requester's own RNIC to have not invalidated any
   of an RPC's STags. When there are multiple STags associated with a
   single RPC, a requester must be prepared for any of the STags to have
   been remotely invalidated, or none of them.

4.3. Reverse-Direction Operation

   As of this writing, no current RPC-over-RDMA implementation supports
   direct data placement in the reverse direction. However, existing
   protocol specifications do not forbid it [RFC8166] [RFC8167]
   [I-D.cel-nfsv4-rpcrdma-version-two].

   When chunks are present in a reverse-direction RPC request, Remote
   Invalidation allows the responder to trigger invalidation of a
   requester's STags as part of sending a reply, the same as in the
   forward direction.

   However, in the reverse direction, the server acts as the requester,
   and the client is the responder.
   The server's RNIC, therefore, must
   support receiving an Invalidate Extended Transport Header (IETH), and
   the server must have registered the STags with an appropriate
   registration mechanism. Thus the server must indicate its Remote
   Invalidation support status to the client (the opposite of forward
   direction Remote Invalidation).

5. Protocol Elements

   In this section, a number of abstract protocol variations are
   considered. These vary in functionality and invasiveness. Some may
   be appropriate to use in combination.

5.1. Per Protocol Version Remote Invalidation

5.1.1. Description

   When a higher protocol version number is negotiated, Remote
   Invalidation is always enabled. This new protocol version would then
   be usable only with RNICs that support Remote Invalidation. Both
   peers assume that Remote Invalidation may be used in either
   direction.

5.1.2. Similar Existing Implementations

   SMB Direct [MS-SMBD]

5.1.3. Advantages

   No XDR changes or protocol extensions are required.

   Reverse direction use of Remote Invalidation is automatically
   supported.

5.1.4. Disadvantages

   The requester is not in control of which STags in an RPC may be
   invalidated. Thus, a requester must not advertise STags which must
   never be invalidated, or the protocol must specify a fixed choice of
   which STag(s) in each request are allowed to be invalidated remotely.

   Other features and benefits of the new protocol version would not be
   available when an implementation employs an RNIC that does not
   support Remote Invalidation. In particular, RNICs that do not
   support MEM_MGT_EXTENSIONS could not use the new protocol version.

   An extension or additional protocol version bump is required to
   indicate support for transport-level mechanisms that can invalidate
   multiple STags at once.

5.2. Per Connection Remote Invalidation

5.2.1. Description

   At connection initiation time, messages are exchanged that indicate
   each peer's Remote Invalidation support status. Without these
   messages, peers assume Remote Invalidation is not supported.

5.2.2. Similar Existing Implementations

   iSER [RFC7145]. Information is exchanged in RDMA-CM connection
   requests to report an implementation's Remote Invalidation support
   status.

5.2.3. Advantages

   No changes to the base protocol XDR are required.

5.2.4. Disadvantages

   Out-of-band messages are required to establish Remote Invalidation
   support status.

   The requester is not in control of which STags in an RPC may be
   invalidated. Thus, a requester must not advertise STags which must
   never be invalidated.

   To support reverse-direction operation, the server must separately
   indicate that it supports Remote Invalidation.

   To enable support for multiple STag invalidation, this negotiation
   protocol would have to be extended again to indicate when mechanisms
   other than RDMA Send With Invalidate are supported by the requester's
   RNIC.

5.3. Fixed Protocol Remote Invalidation

5.3.1. Description

   No new field is introduced to the transport header. The protocol
   specification determines how the responder chooses which STag is to
   be invalidated remotely. Some other means is used to determine
   whether Remote Invalidation can be used or not.

5.3.2. Similar Existing Implementations

   iSER [RFC7145]. Two STag fields appear in each request: one
   advertises Read data and one advertises Write data. When only one
   STag is used in the request, it may be invalidated remotely. When
   both STags are used, only the Read STag may be invalidated remotely.

   SMB Direct [MS-SMBD]. The responder always chooses the first STag in
   each request to be invalidated remotely.

5.3.3. Advantages

   No changes to the base protocol XDR are required.

5.3.4. Disadvantages

   Out-of-band messages are required to establish support status.

   The requester is not in control of which STags in an RPC may be
   invalidated. Thus, a requester must not advertise STags which must
   never be invalidated.

   This mechanism may not work well for transport protocols that allow
   multiple read and write STags.

5.4. Per RPC Remote Invalidation (Single STag)

5.4.1. Description

   A field is added to the transport header that contains an STag which
   may be invalidated by the responder. A special value can be chosen
   to mean "no STag may be invalidated" for use by requesters that have
   no support for Remote Invalidation.

5.4.2. Similar Existing Implementations

   None.

5.4.3. Advantages

   A requester may advertise STags that cannot be invalidated remotely,
   as long as they are never marked as "may invalidate."

   No out-of-band support status negotiation is needed.

   Reverse-direction RPCs can each indicate whether a reverse-direction
   requester desires or does not support Remote Invalidation.

   The responder needs no special logic or assumptions to choose the
   STag to invalidate remotely.

5.4.4. Disadvantages

   Either the base RPC-over-RDMA header XDR definition is altered, or a
   protocol extension is required.

   Requesters transmit a little extra data per RPC, making RPC-over-RDMA
   messages slightly more costly to send and parse.

   This mechanism cannot support the remote invalidation of multiple
   STags at once.

5.5. Per RPC Remote Invalidation (Multiple STags)

5.5.1. Description

   A new data structure is added to the transport header that indicates
   which STags may be invalidated by the responder.

   This information might appear as a new field in the RDMA segment data
   structure, as each segment has its own STag field. The field
   indicates whether or not that STag may be invalidated by the
   responder.
   Perhaps that field is a boolean, though in XDR, a boolean
   is a full 32 bits.

   Or, this information could appear in the header as an array of STags,
   to reduce the amount of extra data contained in the RPC-over-RDMA
   header. Zero array elements means the requester does not support
   Remote Invalidation.

5.5.2. Similar Existing Implementations

   NVMe/Fabrics [NVME]. Each STag in a request has an associated bit
   flag that indicates whether the responder is allowed to invalidate it
   remotely.

5.5.3. Advantages

   A requester may advertise STags that cannot be invalidated remotely,
   as long as they are never marked as "may invalidate."

   The mechanism allows a requester to request either invalidation of
   multiple STags at once, or to choose one STag to invalidate remotely.

   No out-of-band support status negotiation is needed.

   Each reverse-direction RPC can indicate whether a reverse-direction
   requester desires or does not support Remote Invalidation.

   The responder needs no special logic or assumptions to choose the
   STag to invalidate remotely.

5.5.4. Disadvantages

   The RPC-over-RDMA header XDR definition is possibly extensively
   altered.

   Requesters transmit extra data per RPC. However, it is limited to
   only one or two 32-bit words in most cases.

5.6. Inter-RPC Remote Invalidation

5.6.1. Description

   As a subfeature of support for Remote Invalidation, it is possible
   that a responder can remotely invalidate an STag (using RDMA Send
   With Invalidate) that refers to registered memory being used in the
   Read chunk of a different RPC. Such Remote Invalidation would be
   requested only after the RDMA Read has already been completed.

   This can be useful when a responder is replying to an RPC via an
   inline message, but notices there are other RPC replies pending that
   have multiple STags, some of which are Read chunks.

5.6.2. Similar Existing Implementations

   None.

5.6.3. Advantages

   This is one way to enable remote invalidation of multiple STags per
   RPC, using only RDMA Send With Invalidate.

5.6.4. Disadvantages

   Additional requester and responder complexity would be required to
   keep track of STags.

6. Recommendations

6.1. General Considerations

   When constructing a protocol to support Remote Invalidation, one of
   the above designs, or some combination of them, may be chosen.

   In no particular order, the author feels that the design priorities
   are:

   o  Do not prevent the efficient operation of RNICs that do not handle
      RDMA Send With Invalidate

   o  Introduce as little impact on header XDR and header length as
      possible, to keep collateral performance impact low

   o  Enable support for Remote Invalidation when explicit RDMA is used
      in reverse-direction RPCs

   An important question is whether the base RPC-over-RDMA protocol
   should support Remote Invalidation, whether Remote Invalidation
   support should be carried entirely on the shoulders of protocol
   extensions, or whether some combination of the two is best.

   Upper Layer Protocols will likely always be responsible for some
   degree of signaling Remote Invalidation capabilities, as long as
   innovation continues at the transport layer (e.g., new RDMA
   operations that enable multi-STag Remote Invalidation). Predicting
   future hardware capabilities is challenging, limiting the ability to
   design long-lived protocol support for them.

   Lastly, it is difficult to estimate how long the industry must
   continue to support less capable devices.

6.2. Analysis And Discussion

   All things being equal, making no changes to the base XDR definition
   has great appeal. If the mechanism in Section 5.2 can be broadly
   effective at enabling Remote Invalidation in the current set of RPC-
   over-RDMA implementations, it would be the proper choice.

   Unfortunately, among current RPC-over-RDMA client implementations,
   there is one client that can immediately use a per-connection style
   protocol, and one that can use only a per-RPC style protocol such as
   Section 5.4. A third known client resides in user space and uses FMR
   registration, thus it is incapable of immediately employing Remote
   Invalidation.

   Because there is a wide latitude of implementation choice already
   allowed by the RPC-over-RDMA transport protocol, the author's
   preference is to implement Section 5.4. The target STag can be added
   to the RPC-over-RDMA transport as a single field in a new version of
   the RPC-over-RDMA transport protocol. No further changes or
   extensions are needed.

   In the longer term, the requester appears to be in the better
   position to determine which STag may be invalidated remotely. With
   this mechanism, the requester can choose based on which STags may be
   invalidated remotely, or may use criteria based on the strengths of
   its RNIC. For instance, choosing the largest registered memory
   region might be beneficial in some cases.

   Allowing the responder to select from among several choices does not
   seem to bring additional value, and burdens the responder with
   additional header parsing costs for each chunk-bearing RPC
   transaction.

   Furthermore, the ability to request Remote Invalidation of multiple
   STags in a single Work Request appears to be somewhat distant. It
   would require additional Upper Layer Protocol mechanisms to
   distinguish the new mechanism from using RDMA Send With Invalidate,
   which we are not in a position to design today. Thus it does not
   seem worth the extra implementation and protocol complexity of having
   the requester provide a list of STags for the responder to choose
   from.
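
   As a non-normative illustration of the requester-side choice
   discussed above, the following C sketch picks the largest region
   whose STag may be invalidated remotely. The type and function names
   (reg_type, seg_info, choose_inv_handle) are hypothetical and do not
   come from any specification or implementation. Using zero to mean
   "no STag may be invalidated" is an assumption borrowed from the
   example in Section 6.3, and "largest region" is only one possible
   selection criterion.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical registration classes, following Section 3.2. */
enum reg_type {
    REG_PERSISTENT,   /* e.g. reserved DMA R_key or long-lived region */
    REG_LEGACY,       /* dynamically registered, legacy mechanism */
    REG_FRWR,         /* dynamically registered via FRWR */
    REG_MEM_WINDOW    /* dynamically registered via a Memory Window */
};

/* Hypothetical requester-side record of one registered segment. */
struct seg_info {
    uint32_t      handle;  /* STag advertised in a chunk list */
    uint32_t      length;  /* size of the registered region, in bytes */
    enum reg_type type;
};

/* Only dynamically-registered FRWR or Memory Window STags qualify. */
int may_invalidate_remotely(const struct seg_info *seg)
{
    return seg->type == REG_FRWR || seg->type == REG_MEM_WINDOW;
}

/*
 * Return the STag of the largest eligible segment, or zero (assumed
 * here to mean "no STag may be invalidated") when none qualifies.
 */
uint32_t choose_inv_handle(const struct seg_info *segs, size_t count)
{
    const struct seg_info *best = NULL;
    size_t i;

    for (i = 0; i < count; i++) {
        if (!may_invalidate_remotely(&segs[i]))
            continue;
        if (best == NULL || segs[i].length > best->length)
            best = &segs[i];
    }
    return best ? best->handle : 0;
}
```

   A requester built this way would call choose_inv_handle() once per
   RPC Call, after registering its chunks, and place the result in the
   per-RPC field described in Section 5.4. Persistently-registered and
   legacy-registered STags are never returned, so they stay safe from
   Remote Invalidation.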

   As an alternative to modifying the XDR definition for the RDMA_MSG
   and RDMA_NOMSG message types, a new RDMA message type could be
   introduced in a new version of RPC-over-RDMA that provides similar
   functionality to RDMA_MSG and RDMA_NOMSG but adds one or more new
   fields. This has the advantage of leaving the version 1-compatible
   parts of the new XDR definition unchanged. It is an open
   question whether this introduces more complexity to existing
   implementations than adding new fields to RDMA_MSG and RDMA_NOMSG.
   However, this approach is similar to the introduction of READ_PLUS in
   the specification of NFSv4.2 [RFC7862].

   Allowing the feature described in Section 5.6 is likely to increase
   the complexity of responder and especially requester implementations,
   as they would have to remember invalidated STags independently of RPC
   completions. Because it does not require any XDR changes, it could
   easily be enabled in a future protocol extension. The author's
   preference is to forbid this behavior in the initial specification,
   but allow for a future extension to introduce it.

6.3. Example Remote Invalidation Protocol

   As an example of how to proceed, the simplest approach would replace
   struct rpcrdma2_chunk_lists (as defined in
   [I-D.cel-nfsv4-rpcrdma-version-two]) with the following:

      struct rpcrdma2_chunk_lists {
              enum msg_type rdma_direction;
              u32 rdma_inv_handle;
              struct rpcrdma2_read_list *rdma_reads;
              struct rpcrdma2_write_list *rdma_writes;
              struct rpcrdma2_write_chunk *rdma_reply;
      };

   The following language describes how to utilize the new field:

      The requester sets the value of the rdma_inv_handle field to the
      value of any one of the rdma_handle fields in the RPC-over-RDMA
      header of the RPC Call that may be invalidated remotely.
      If the RPC-over-RDMA header of the RPC Call contains no
      rdma_handles that may be invalidated remotely, the requester MUST
      set the value of the rdma_inv_handle field to zero.

      If the rdma_inv_handle field in the RPC-over-RDMA header of an
      RPC Call contains zero, the responder MUST NOT use RDMA Send With
      Invalidate to transmit the matching RPC Reply.  Otherwise, the
      responder SHOULD use RDMA Send With Invalidate to transmit the
      RPC Reply, specifying the value in the RPC-over-RDMA header's
      rdma_inv_handle field as the Send With Invalidate Work Request's
      inv_rkey.

7.  Security Considerations

   Remote Invalidation metadata is conveyed in the clear in
   RPC-over-RDMA headers.  This does not expose any new information to
   attackers.

   A man-in-the-middle can alter Remote Invalidation metadata while it
   is in transit.  Requesters are prepared to handle the case where
   responders have not invalidated any STags associated with an RPC.
   However, an attacker can cause other STags in flight to be
   invalidated before the responder is finished with the associated
   memory.  Alternatively, an attacker can replace the "to-be
   invalidated" STag with an STag in the same RPC that should not be
   invalidated remotely.  Any of these might cause loss of connection,
   or other failures.

   A connection relationship is required to exist between a requester
   and a responder.  The requester's RNIC has associated a Protection
   Domain with that connection.  The STag on the requester to be
   invalidated is associated with that Protection Domain.  This
   protects against arbitrary invalidation of STags by network nodes
   not part of the connection.

   Further discussion appears in [RFC5042].

8.  IANA Considerations

   This document does not require actions by IANA.

9.  References

9.1.  Normative References

   [I-D.cel-nfsv4-rpcrdma-version-two]
              Lever, C. and D.
              Noveck, "RPC-over-RDMA Version 2 Protocol",
              draft-cel-nfsv4-rpcrdma-version-two-05 (work in
              progress), July 2017.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC5040]  Recio, R., Metzler, B., Culley, P., Hilland, J., and D.
              Garcia, "A Remote Direct Memory Access Protocol
              Specification", RFC 5040, DOI 10.17487/RFC5040, October
              2007.

   [RFC5042]  Pinkerton, J. and E. Deleganes, "Direct Data Placement
              Protocol (DDP) / Remote Direct Memory Access Protocol
              (RDMAP) Security", RFC 5042, DOI 10.17487/RFC5042,
              October 2007.

   [RFC7145]  Ko, M. and A. Nezhinsky, "Internet Small Computer System
              Interface (iSCSI) Extensions for the Remote Direct Memory
              Access (RDMA) Specification", RFC 7145,
              DOI 10.17487/RFC7145, April 2014.

   [RFC8166]  Lever, C., Ed., Simpson, W., and T. Talpey, "Remote Direct
              Memory Access Transport for Remote Procedure Call Version
              1", RFC 8166, DOI 10.17487/RFC8166, June 2017.

   [RFC8167]  Lever, C., "Bidirectional Remote Procedure Call on
              RPC-over-RDMA Transports", RFC 8167,
              DOI 10.17487/RFC8167, June 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

9.2.  Informative References

   [IBARCH]   InfiniBand Trade Association, "InfiniBand Architecture
              Specification Volume 1", Release 1.3, March 2015.

   [MS-SMBD]  Microsoft Corporation, "SMB Remote Direct Memory Access
              (RDMA) Transport Protocol Specification", July 2016.

   [NVME]     NVM Express, Inc., "NVM Express Revision 1.2.1", July
              2016.

   [RFC7862]  Haynes, T., "Network File System (NFS) Version 4 Minor
              Version 2 Protocol", RFC 7862, DOI 10.17487/RFC7862,
              November 2016.
Acknowledgments

   The author wishes to thank Sagi Grimberg, Christoph Hellwig, Karen
   Deitke, Dave Noveck, and Tom Talpey.  The author also wishes to
   thank Bill Baker and Greg Marsden for their support of this work.

   Special thanks go to Transport Area Director Spencer Dawkins, NFSV4
   Working Group Chair Spencer Shepler, and NFSV4 Working Group
   Secretary Thomas Haynes for their support.

Author's Address

   Charles Lever
   Oracle Corporation
   1015 Granger Avenue
   Ann Arbor, MI  48104
   United States of America

   Phone: +1 248 816 6463
   Email: chuck.lever@oracle.com