Network File System Version 4                                  D. Noveck
Internet-Draft                                                    NetApp
Intended status: Standards Track                            June 5, 2017
Expires: December 7, 2017

        RPC-over-RDMA Extensions to Reduce Internode Round-trips
                 draft-dnoveck-nfsv4-rpcrdma-rtrext-02

Abstract

   It is expected that a future version of the RPC-over-RDMA transport
   will allow protocol extensions to be defined. This would provide for
   the specification of OPTIONAL features allowing participants who
   implement such features to cooperate as specified by that extension,
   while still interoperating with participants who do not support that
   extension.

   A particular extension is described herein, whose purpose is to
   reduce the latency due to inter-node round-trips needed to effect
   operations which involve direct data placement or which transfer RPC
   messages longer than the fixed inline buffer size limit.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on December 7, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document. Code Components
   extracted from this document must include Simplified BSD License text
   as described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Preliminaries
     1.1.  Requirements Language
     1.2.  Introduction
     1.3.  Prerequisites
     1.4.  Role Terminology
   2.  Extension Overview
   3.  Data Placement Features
     3.1.  Current Situation
     3.2.  RDMA_MSGP
     3.3.  Send-based Data Placement
     3.4.  Other Extensions Relating to Data Placement
   4.  Message Continuation Feature
     4.1.  Current Situation
     4.2.  Message Continuation Changes
     4.3.  Message Continuation and Credits
   5.  Using Protocol Additions
     5.1.  New Operation Support
     5.2.  Message Continuation Support
     5.3.  Support for Send-based Data Placement
     5.4.  Error Reporting
   6.  XDR Preliminaries
     6.1.  Message Continuation Preliminaries
     6.2.  Data Placement Preliminaries
   7.  Data Placement Structures
     7.1.  Data Placement Overview
     7.2.  Buffer Structure Definition
     7.3.  Message Data Placement Structures
     7.4.  Response Direction Data Placement Structures
   8.  Transport Properties
     8.1.  Property List
     8.2.  RTR Support Property
     8.3.  Receive Buffer Structure Property
     8.4.  Request Transmission Receive Limit Property
     8.5.  Response Transmission Send Limit Property
   9.  New Operations
     9.1.  Operations List
     9.2.  Transmit Request Operation
     9.3.  Transmit Response Operation
     9.4.  Transmit Continue Operation
     9.5.  Error Reporting Operation
   10. XDR
     10.1.  Code Component License
     10.2.  XDR Proper for Extension
   11. Security Considerations
   12. IANA Considerations
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Appendix A.  Acknowledgements
   Author's Address

1.  Preliminaries

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

1.2.  Introduction

   This document describes a potential extension to the RPC-over-RDMA
   protocol, which would allow participating implementations to have
   more flexibility in how they use RDMA sends and receives to effect
   necessary transmission of RPC requests and replies.

   In contrast to existing facilities defined in RPC-over-RDMA Version
   One in which the mapping between RPC messages and RPC-over-RDMA
   messages is strictly one-to-one and placement of bulk data is
   effected only through use of explicit RDMA operations, the following
   features are made available through this extension:

   o  The ability to effect data placement in the context of a single
      RPC-over-RDMA transmission, rather than requiring explicit RDMA
      operations to effect the necessary placement.

   o  The ability to continue an RPC request or reply over multiple
      RPC-over-RDMA transmissions.

1.3.  Prerequisites

   This document is written assuming that certain underlying facilities
   will be made available to build upon, in the context of a future
   version of RPC-over-RDMA. It is most likely that such facilities
   will be first available in Version Two of RPC-over-RDMA.

   o  A protocol extension mechanism is needed to enable the extensions
      to RPC-over-RDMA described here.

      This document is currently written to conform to the extension
      model for the proposed RPC-over-RDMA Version Two as described in
      [rpcrdmav2].

   o  An existing means of communicating transport properties between
      the RPC-over-RDMA endpoints is assumed.

      This document is currently written assuming the transport property
      model defined in [rpcrdmav2] will be available and can be extended
      to meet the needs of this extension.

   As the document referred to above is currently a personal Internet
   Draft, and subject to change, adjustments to this document are
   expected to be necessary when and if the needed facilities are
   defined in one or more working group documents.

1.4.  Role Terminology

   A number of different terms are used regarding the roles of the two
   participants in an RPC-over-RDMA connection. Some of these roles last
   for the duration of a connection while others vary from request to
   request or from message to message.

   The roles of the client and server are fixed for the lifetime of the
   connection, with the client defined as the endpoint which initiated
   the connection.

   The roles of requester and responder often parallel those of client
   and server, although this is not always the case. Most requests are
   made in the forward direction, in which the client is the requester
   and the server is the responder. However, backward direction
   requests are possible, in which case the server is the requester and
   the client is the responder. As a result, clients and servers may
   both act as requesters and responders for different requests issued
   on the same connection.

   The roles of sender and receiver vary from message to message. With
   regard to the messages described in this document, the sender may act
   as a requester by sending RPC requests, as a responder by sending RPC
   replies, or as both at the same time by sending a mix of the two.

2.  Extension Overview

   This extension is intended to function as part of RPC-over-RDMA and
   implementations should successfully interoperate with existing
   RPC-over-RDMA Version One implementations. Nevertheless, this
   extension seeks to take a somewhat different approach to
   high-performance RPC operation than has been used previously in that
   it seeks to de-emphasize the use of explicit RDMA operations. It
   does this in two
   ways:

   o  By implementing a send-based form of data placement (see
      Section 3), use of explicit RDMA operations can be avoided in many
      common cases in which data is to be placed at an appropriate
      location in the receiver's memory.

   o  Use of explicit RDMA to support reply chunks and position-zero
      read chunks can be avoided by allowing a single message to be
      split into multiple transmissions. This can be used to avoid many
      instances of the only existing use of explicit RDMA operations not
      associated with Direct Data Placement.

   While use of explicit RDMA operations allows the cost of the actual
   data transfer to be offloaded from the client and server CPUs to the
   RNIC, there are ancillary costs in setting up the transfer that
   cannot be ignored. As a result, send-based functions are often
   preferable, since the RNIC also uses DMA to effect these operations.
   In addition, the cost of the additional inter-node round trips
   required by explicit RDMA operations can be an issue, which becomes
   increasingly troublesome as internode distances increase. Once one
   moves from in-machine-room to campus-wide or metropolitan-area
   distances, the additional round-trip delay of 16 microseconds per
   mile becomes an issue impeding use of explicit RDMA operations.

3.  Data Placement Features

3.1.  Current Situation

   Although explicit RDMA operations are used in the existing
   RPC-over-RDMA protocol for purposes unrelated to Direct Data
   Placement, all placement of bulk data is effected using explicit
   RDMA operations.

   As a result, many operations requiring placement of bulk data involve
   multiple internode round trips.

3.2.  RDMA_MSGP

   Although this was not stated explicitly, it appears that RDMA_MSGP
   (defined in [RFC5666] and removed from RPC-over-RDMA Version One by
   [rfc5666bis]) was an early attempt to effect correct placement of
   bulk data within a single RPC-over-RDMA transmission.

   As things turned out, the fields within the RDMA_MSGP header were not
   described in [RFC5666] in a way that allowed this message type to be
   implemented.

   In attempting to provide the appropriate data placement
   functionality, we have to keep in mind and avoid the problems that
   led to the failure of RDMA_MSGP. It appears that the problems go
   deeper than neglecting to write a few relevant sentences. It is
   helpful to note that:

   o  The inline message size limits eventually adopted were too small
      to allow RDMA_MSGP to be used effectively. This is true of both
      the 1K limit in Version One [rfc5666bis] and the 4K limit
      specified in [rpcrdmav2].

      On the other hand, there is text within [RFC5667] that suggests
      that much longer messages were anticipated at some points during
      the evolution of RPC-over-RDMA.

   o  The fact that NFSv4 COMPOUNDs often have additional operations
      beyond the one including the bulk data means that the RDMA_MSGP
      model cannot be extended to NFSv4. As a result, the bulk data
      needs to be excised from the data stream just as chunks are, so
      that the payload stream can include non-bulk data both before and
      after the logical position of the excised bulk data.

   o  In order for the sender to determine the appropriate amount of
      padding necessary within a transmission to place the bulk data at
      the proper position within the receive buffer, the sender must
      know more about the structure of the receiver's buffers.
      Since the padding needs to bring the bulk data to a position
      within the buffer that is appropriate to receive the bulk data,
      the sender needs to know where within the receive buffers such
      placement-eligible areas are located.

   o  While appropriate padding could place the bulk data within a large
      WRITE into an appropriately aligned buffer or set of buffers,
      there is no corresponding provision for the bulk data associated
      with a READ. In short, there is no way to indicate to the
      responder that it should use RDMA_MSGP to appropriately place bulk
      data in the response.

   o  There is no explicit discussion of the required padding's use in
      effecting proper data placement or of its connection with the
      ULB's specification of DDP-eligible XDR items.

   To summarize, RDMA_MSGP was an attempt to properly place bulk data.
   It was thought of as a local optimization, and insufficient attention
   was given to it to make it successful. As a result, as RPC-over-RDMA
   Version One was developed, data placement was identified with the use
   of explicit RDMA operations providing DDP, and the possibility of
   data placement within sends was not recognized.

3.3.  Send-based Data Placement

   In this extension we will describe a more complete way to provide
   send-based data placement, as follows:

   o  By defining the structure of receive buffers as a transport
      property available to be interrogated by the peer implementation.

   o  By treating positioning of bulk data within a message as an
      instance of data placement, causing the bulk data to be excised
      from the payload XDR stream, as is the case with other forms of
      bulk data placement (e.g. DDP).

   o  By defining new data structures to control placement of bulk data
      that support both send-based data placement and DDP using explicit
      RDMA operations that was an integral part of RPC-over-RDMA Version
      One.
      These new control structures, described in Section 7.1, are
      organized differently from the chunk-based structures described in
      [rfc5666bis].

   In making these changes, we will retain certain aspects of the DDP
   model:

   o  The set of bulk data items eligible for special data placement is
      exactly the same as with DDP, as defined by the RPC protocol's
      upper-layer binding document.

   o  The concept of an inline XDR stream is retained, with specially
      placed items appearing outside it, but with references to them
      retained so that the receiver has access to all of the message
      data.

3.4.  Other Extensions Relating to Data Placement

   In order to support send-based data placement, new placement-related
   data structures have been defined, as described in Sections 7.3 and
   7.4.

   These new data structures support both send-based and
   RDMA-operation-based data placement. In addition, because of the
   restructuring described in Section 7.1, a number of additional
   facilities are made available:

   o  The ability to restrict entries regarding data placement in
      response data to XDR data items generated in response to
      performing particular constituent operations within a given RPC
      request (e.g. specific operations within an NFSv4 COMPOUND).

   o  The ability to make use of special data placement contingent on
      the actual length of a placement-eligible data item in the
      response.

   o  The ability to specify whether use of data placement for a
      particular placement-eligible data item is required or optional.

   These additional facilities will be available to implementations that
   do not support send-based data placement, as long as both parties
   support the OPTIONAL header types that include these new structures.
   For more information about the relationships among the new transport
   properties, operations, and features, see Section 5.

4.  Message Continuation Feature

4.1.  Current Situation

   Within RPC-over-RDMA Version One [rfc5666bis], each transmission of a
   request or reply involves sending a single RDMA send message and
   conversely each message-related transmission involves only a single
   RPC request or reply.

   This strict one-to-one model leads to some potential performance
   issues.

   o  Because of RDMA's use of fixed-size receives, some requests and
      replies will inevitably not fit in the limited space available,
      even if they do not contain any DDP-eligible bulk data.

      Such cases will raise performance issues because, to deal with
      them, the server is interrupted twice to receive a single request
      and all the necessary transfers are serialized. In particular,
      there are two server interrupt latencies involved before the
      server can process the actual request, in addition to the OTW
      round-trip latencies.

   o  In the case of replies, there may be cases in which reply chunks
      need to be allocated and registered even if the actual reply would
      fit within the fixed receive-size limit. Because the decision to
      create a reply chunk is made at the time the request is sent, even
      an extremely low probability of a longer reply will trigger
      allocation of a reply chunk.

      Because this decision is made in conformance with ULB rules,
      which, by their nature, may only reference a limited set of data,
      a reply chunk may be required even when the actual probability of
      a long reply is exactly zero. For example, a GETATTR request can
      generate a long reply due to a long ACL, and thus a COMPOUND with
      this operation might allocate a reply chunk, even if the specific
      file system being interrogated only supports ACLs of limited
      sizes, or the GETATTR in question does not interrogate one of the
      ACL attributes.
      Also, the OWNER attribute is a string and it may be impossible to
      determine a priori that the owner of any particular file has no
      chance of requiring more than 4K bytes of space, for example. The
      assumption that there are no such user names, while it probably is
      valid, is not a fact that RPC-over-RDMA implementations can depend
      on.

4.2.  Message Continuation Changes

   Continuing a single RPC request or reply is addressed by defining
   separate optional header types to begin and to continue sending a
   single RPC message. This is instead of creating a header with a
   continuation bit. In this approach, all of the fields relating to
   data placement, which include support for send-based data placement,
   appear in the starting header (of types ROPT_XMTREQ and ROPT_XMTRESP)
   and apply to the RPC message as a whole.

   Later RPC-over-RDMA messages (of type ROPT_XMTCONT) may extend the
   payload stream and/or provide additional buffers to which bulk data
   can be directed.

   In this case, all of the RPC-over-RDMA messages used together are
   referred to as a transmission group and must be received in order
   without any intervening message.

   In implementations using this optional facility, those decoding RPC
   messages received using RPC-over-RDMA no longer have the assurance
   that each RPC message is in a contiguous buffer. As most XDR
   implementations are built based on the assumption that input will not
   be contiguous, this will not affect performance in most cases.

4.3.  Message Continuation and Credits

   Using multiple transmissions to send a single request or response can
   complicate credit management. In the case of the message
   continuation feature, deadlocks can be avoided because use of message
   continuation is not obligatory. The requester or responder can use
   explicit RDMA operations if sufficient credits to use message
   continuation are not available.
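
   The fallback just described can be illustrated with a minimal
   sketch, which is not part of the protocol definition. It assumes
   that each transmission consumes one credit and carries at most a
   fixed number of payload bytes. The names transmissions_needed,
   choose_request_method, and send_capacity are hypothetical and do not
   appear elsewhere in this document.

```python
def transmissions_needed(payload_len: int, send_capacity: int) -> int:
    """Return how many transmissions are needed for payload_len bytes.

    Assumption: every transmission can carry up to send_capacity bytes
    of the payload stream (header overhead is ignored in this sketch).
    """
    if send_capacity <= 0:
        raise ValueError("send_capacity must be positive")
    if payload_len < 0:
        raise ValueError("payload_len must not be negative")
    # Integer ceiling division; an empty payload still takes one send.
    return max(1, -(-payload_len // send_capacity))


def choose_request_method(payload_len: int, send_capacity: int,
                          credits_available: int) -> str:
    """Pick a way to convey a request, given the credits on hand.

    Assumption: one credit is consumed per transmission.
    """
    needed = transmissions_needed(payload_len, send_capacity)
    if needed == 1:
        # Fits in a single transmission; no continuation required.
        return "single-send"
    if needed <= credits_available:
        # Enough credits to send the whole transmission group.
        return "message-continuation"
    # Not enough credits: fall back to explicit RDMA operations,
    # for example by using read chunks.
    return "explicit-rdma"
```

   For instance, with a hypothetical capacity of 4096 bytes per
   transmission, a 10000-byte request needs three transmissions, so the
   sketch selects message continuation when three credits are available
   and explicit RDMA when only two are.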

   A requester is well positioned to make this choice with regard to the
   sending of requests. The requester must know, before sending a
   request, how long it will be, and therefore, how many credits it
   would require to send the request using message continuation. If
   these are not available, it can avoid message continuation by either
   creating read chunks sufficient to make the payload stream fit in a
   single transmission or by creating a position-zero read chunk.

   With regard to the response, the requester is not in a position to
   know exactly how long the response will be. However, the ULB will
   allow the maximum response length to be determined based on the
   request. This value can be used:

   o  To determine the maximum number of receive buffers that might be
      required to receive any response sent.

   o  To allocate and register a reply chunk to hold a possible large
      reply.

   The requester can avoid doing the second of these if the responder
   has indicated it can use message continuation to send the response.
   In this case, it makes sure that the buffers will be available and
   indicates to the responder how many additional buffers (in the form
   of pre-posted reads) have been made available to accommodate
   continuation transmissions.

   When the responder processes the request, those additional receive
   buffers may be used or not, or used only in part. This may be
   because the response is shorter than the maximum possible response,
   or because a reply chunk was used to transmit the response.

   After the first or only transmission associated with the response is
   received by the requester, it can be determined how many of the
   additional buffers were used for the response. Any unused buffers
   can be made available for other uses such as expanding the pool of
   receive buffers available for the initial transmissions of responses
   or for receiving opposite direction requests.
   Alternatively, they can be kept in reserve for future uses, such as
   being made available to future requests which have potentially long
   responses.

5.  Using Protocol Additions

   In using existing RPC-over-RDMA facilities for protocol extension,
   interoperability with existing implementations needs to be assured.
   Because this document describes support for multiple features, we
   need to clearly specify the various possible extensions and how peers
   can determine whether certain facilities are supported by both ends
   of the connection.

5.1.  New Operation Support

   Note that most of the new operations defined in this extension are
   not tightly tied to a specific feature. XOPT_XMTREQ and XOPT_XMTRESP
   are designed to support implementations that support either or both
   of send-based data placement and message continuation. However, the
   converse is not the case and these header types can be implemented by
   those not supporting either of these features. For example,
   implementations may only need support for the facilities described in
   Section 3.4.

   Implementations may determine whether a peer implementation supports
   XOPT_XMTREQ, XOPT_XMTRESP, or XOPT_XMTCONT by attempting these
   operations. An alternative is to interrogate the RTR Support
   Property for information about which operations are supported.

5.2.  Message Continuation Support

   Implementations may determine and act based on the level of peer
   implementation of support for message continuation as follows:

   o  To deal with issues relating to sending the peer
      multi-transmission requests, the requester can interrogate the
      peer's value of the Request Transmission Receive Limit
      (Section 8.4).
      In cases in which the property is not provided or has the value
      one, the requester implementation can avoid sending
      multi-transmission requests, and use the equivalent of
      position-zero read chunks to convey a request larger than the
      receive buffer limit.

      Similarly, if the request is longer than can fit in a set of
      transmissions given that limit, the request can be conveyed in the
      same fashion.

   o  To deal with issues relating to sending the peer
      multi-transmission responses, responders will only send
      multi-transmission responses for requests conveyed using
      XOPT_XMTREQ where the number of response transmissions is less
      than or equal to the buffer reservation count (in the field
      optxrq_rsbuf). The requester can avoid receiving a message
      consisting of too many transmissions by setting this field
      appropriately. This includes the case in which the requester
      cannot handle any multi-transmission responses.

   o  To avoid reserving receive buffers that the responder is not
      prepared to use, the requester can interrogate the peer's value of
      the Response Transmission Send Limit (Section 8.5). In cases in
      which it is possible that a request might result in a response too
      large for this set of buffers, the requester can provide a reply
      chunk to receive the response, which the responder can use if the
      count of buffers provided is insufficient.

5.3.  Support for Send-based Data Placement

   Implementations may determine and adapt to the level of peer
   implementation support for send-based data placement as described
   below. Note that an implementation may be able to send messages
   containing bulk data items placed using send-based data placement
   while not being prepared to receive them, or the reverse.

   o  The requester can interrogate the responder's Receive Buffer
      Structure Property.
      In cases in which the property is not provided or shows no
      placement-targetable buffer segments, an implementation knows that
      messages containing bulk data may not be sent using send-based
      data placement. In such cases, when XOPT_XMTREQ is used to send a
      request, bulk items may be transferred by setting the associated
      placement information to indicate that the bulk data is to be
      fetched using explicit RDMA operations.

   o  In cases in which a requester is unprepared to accept messages
      using send-based data placement, its Receive Buffer Structure
      Property will make this clear to the responder. Nevertheless, the
      requester will generally indicate to the responder that bulk data
      items are to be returned using explicit RDMA operations. As a
      result, requesters may use XOPT_XMTREQ (and get the benefit of the
      placement-related features discussed in Section 3.4) even if they
      support neither message continuation nor send-based data
      placement.

   o  Since it is possible for a responder to generate responses
      containing bulk data using send-based data placement even if it is
      not prepared to receive such messages, a requester who is prepared
      to accept such messages should specify in the request that the
      responses are to contain (or may contain) bulk data placed in this
      way. In deciding whether this is to be done, the requester can
      interrogate the responder's RTR Support Property for information
      about whether the peer can send responses in this form. It can do
      this without regard to whether the responder can accept messages
      containing bulk data items placed using send-based data placement.

   In determining whether bulk data will be placed using send-based data
   placement or via explicit RDMA operations, the level of support for
   message continuation will have a role.
   This is because DDP using explicit RDMA will reduce message size
   while send-based data placement reduces the size of the payload
   stream by rearranging the message, leaving the message size the same.
   As a result, the considerations discussed in Section 4.3 will have to
   be attended to by the sender in determining which form of data
   placement is to be used.

5.4.  Error Reporting

   The more extensive transport layer functionality described in this
   document requires its own means of reporting errors, to deal with
   issues that are distinct from:

   o  Errors (including XDR errors) in the XDR stream as received by the
      responder or requester.

   o  XDR errors detected in the XDR headers defined by the base
      protocol.

   o  XDR errors detected in the new operations defined in this
      document.

   Beyond the above, the following sorts of errors will have to be dealt
   with, depending on which of the features of the extension are
   implemented.

   o  Information associated with send-based data placement may be
      inconsistent or otherwise invalid, even though it conforms to the
      XDR definition.

   o  There may be problems with the organization of transmission groups
      in that there are missing or extraneous transmissions.

   In each of the above cases, the problem will be reported to the
   sender using the Error Reporting operation, which needs to be
   supported by every endpoint that sends ROPT_XMTREQ, ROPT_XMTRESP, or
   ROPT_XMTCONT. This includes cases in which the problem is one with a
   reply. The function of the Error Reporting operation is to aid in
   diagnosing transport protocol errors and to allow the sender to
   recover or to decide that recovery is not possible. Reporting
   failure to the requesting process is dealt with indirectly.
   For example,

   o  When the transmissions used to send a request are ill-formed, the
      requester can respond to the error indication by proceeding to
      send the request using existing (i.e., non-extended) facilities.
      If it chooses not to do so, the requester can report an RPC
      request failure to the initiator of the RPC.

   o  When the transmissions used to send a response are ill-formed, the
      responder needs to know about the problem since it will otherwise
      assume that the transmissions succeeded. It can proceed to resend
      the reply using existing (i.e., non-extended) facilities. If it
      chooses not to do so, the requester will not see a response and
      eventually an RPC timeout will occur.

6.  XDR Preliminaries

6.1.  Message Continuation Preliminaries

   In order to implement message continuation, we have occasion to refer
   to particular RPC-over-RDMA transmissions within a transmission group
   or to characteristics of a later transmission group.

      typedef uint32 xms_grpxn;
      typedef uint32 xms_grpxc;
      struct xms_id {
              uint32          xmsi_xid;
              msg_type        xmsi_dir;
              xms_grpxn       xmsi_seq;
      };

   An xms_grpxn designates a particular RPC-over-RDMA transmission
   within a set of transmissions devoted to sending a single RPC
   message.

   An xms_grpxc specifies the number of RPC-over-RDMA transmissions in a
   potential group of transmissions devoted to sending a single RPC
   message.

6.2.  Data Placement Preliminaries

   Data structures related to data placement use a number of XDR
   typedefs to help clarify the meaning of fields in the data structures
   which use these typedefs.

      typedef uint32 xmdp_itemlen;
      typedef uint32 xmdp_pldisp;
      typedef uint32 xmdp_vsdisp;

      typedef uint32 xmdp_tbsn;

      enum xmdp_type {
              XMPTYPE_EXRW = 1,
              XMPTYPE_TBSN = 2,
              XMPTYPE_CHOICE = 3,
              XMPTYPE_BYSIZE = 4,
              XMPTYPE_TOOSHORT = 5,
              XMPTYPE_NOITEM = 6
      };

   An xmdp_itemlen specifies the length of an XDR item. Because items
   excised from the XDR stream are XDR items, lengths of items excised
   from the XDR stream are denoted by xmdp_itemlens.

   An xmdp_pldisp specifies a specific displacement within the payload
   stream associated with a single RPC-over-RDMA transmission or a group
   of such transmissions. Note that when multiple transmissions are
   used for a single message, all of the payload streams within a
   transmission group are considered concatenated.

   An xmdp_vsdisp specifies a displacement within the virtual XDR stream
   associated with the set of RPC messages transferred by a single
   RPC-over-RDMA transmission or a group of such transmissions. The
   virtual XDR stream includes bulk data excised from the payload stream
   and so displacements within it reflect those of the corresponding
   objects in the XDR stream that might be sent and received if no bulk
   data excision facilities were involved in the RPC transmission.

   An xmdp_tbsn designates a particular target buffer segment within a
   (trivial or non-trivial) RPC-over-RDMA transmission group. Each
   placement-targetable buffer segment is assigned a number starting
   with zero and proceeding through all the buffer segments for all the
   RPC-over-RDMA transmissions in the group. This includes buffer
   segments not actually used because transmissions are shorter than the
   maximum size and those in which a placement-targetable buffer segment
   is used to hold part of the payload XDR stream rather than bulk data.

   An xmdp_type allows a selection between placement using explicit RDMA
   operations (i.e.,
   DDP) and send-based data placement. Fields of this type are used in
   a number of contexts. The specific context governs which subset of
   the types is valid:

   o  In request messages, they indicate where each of the specially
      placed data items within the request has been placed. In this
      case, xmdp_type appears as the discriminator within an xmdp_loc
      which is part of an xmdp_mitem that is an element within a
      request's optxrq_dp field.

   o  In request messages, they direct the responder as to where
      potential specially placed items are to be placed. In this case,
      xmdp_type appears as the discriminator within an xmdp_rsdloc which
      is part of an xmdp_rsditem that is an element within a request's
      optxrq_rsd field.

   o  In response messages, they indicate how each of the potential
      specially placed items has been dealt with. A subset of these are
      specially placed data items and are presented in the same form as
      that used for specially placed data items within a request. In
      this case, xmdp_type appears as the discriminator within an
      xmdp_loc which is part of an xmdp_mitem that is an element within
      a response's optxrs_dp field.

   A number of these types are valid in all of these contexts, since
   they specify use of a specific mode of data placement which is to be
   used or has been used.

   o  XMPTYPE_EXRW selects DDP using explicit RDMA reads and writes.

   o  XMPTYPE_TBSN selects use of send-based data placement in which
      placement-eligible data is located in placement-targetable buffer
      segments.

   Another set of types is used to direct the use of specific sets of
   types but cannot specify an actual choice that has been made.

   o  XMPTYPE_CHOICE indicates that the responder may use either
      send-based data placement or chunk-based DDP using explicit RDMA
      operations, with a target location for the latter having been
      provided by the requester.

   o  XMPTYPE_BYSIZE indicates that the responder is to use either
      send-based data placement or chunk-based DDP using explicit RDMA
      operations, with the choice between the two governed by the actual
      size of the associated DDP-eligible XDR item.

   The following types are used when no actual special placement has
   occurred. They are used in responses to indicate ways in which a
   direction to govern data placement in a reply was responded to
   without resulting in special placement.

   o  XMPTYPE_TOOSHORT indicates that the corresponding entry in an
      xmdp_rsdset was matched with a DDP-eligible item which was too
      small to be handled using special placement, resulting in the
      DDP-eligible item being placed inline.

   o  XMPTYPE_NOITEM indicates that the corresponding entry in an
      xmdp_rsdset was not matched with a DDP-eligible item in the reply.

   The following table indicates which of the above types is valid in
   each of the contexts in which these types may appear. For valid
   occurrences, it distinguishes those which give sender-generated
   information about the message, and those that direct reply
   construction, from those that indicate how those directions governed
   the construction of a reply. For invalid occurrences, we distinguish
   between those that result in XDR decode errors and those which are
   valid from the XDR point of view but are semantically invalid.

   +------------------+--------------+-----------------+---------------+
   | Type             | xmdp_loc in  | xmdp_rsdloc in  | xmdp_loc in   |
   |                  | request      | request         | response      |
   +------------------+--------------+-----------------+---------------+
   | XMPTYPE_EXRW     | Valid Info   | Valid Direction | Valid Result  |
   | XMPTYPE_TBSN     | Valid Info   | Valid Direction | Valid Result  |
   | XMPTYPE_BYSIZE   | XDR Invalid  | Valid Direction | XDR Invalid   |
   | XMPTYPE_CHOICE   | XDR Invalid  | Valid Direction | XDR Invalid   |
   | XMPTYPE_TOOSHORT | Sem. Invalid | XDR Invalid     | Valid Result  |
   | XMPTYPE_NOITEM   | Sem. Invalid | XDR Invalid     | Valid Result  |
   +------------------+--------------+-----------------+---------------+

                                  Table 1

7.  Data Placement Structures

7.1.  Data Placement Overview

   To understand the new data placement structures defined here, it is
   necessary to review the existing DDP structures used in RPC-over-RDMA
   Version One and look at the corresponding structures in the new
   message transmission headers defined in this document.

   We look first at the existing structures.

   o  Read chunks are specified on requests to indicate data items to be
      excised from the payload stream and fetched from the requester's
      memory by the responder. As such, they serve as a means of
      supplying data excised from the payload XDR stream.

      Read chunks appear in replies but they have no clear function
      there.

   o  Write chunks are specified on requests to provide locations in
      requester memory to which DDP-eligible items in the corresponding
      reply are to be transferred. They do not describe data in the
      request but serve to direct reply construction.

      When write chunks appear in replies they serve to indicate the
      length of the data transferred. The addresses to which the bulk
      reply data has been transferred are available, but this
      information is already known to the requester.

   o  Reply chunks are specified to provide a location in the
      requester's memory to which the responder can transfer the
      response using RDMA Write. Like write chunks, they do not
      describe data in the request but serve to direct reply
      construction.

      When reply chunks appear in reply message headers, they serve
      mainly to indicate whether the reply chunk was actually used.

   Within the data placement structures defined here a different
   organization is used, even where DDP using explicit RDMA operations
   is supported.

   o  All messages that contain bulk data contain structures that
      indicate where the excised data is located. See Section 7.3 for
      details.

   o  Requests that might generate replies containing bulk data contain
      structures that provide guidance as to where the bulk data is to
      be placed. See Section 7.4 for details.

   Both sets of data structures are defined at the granularity of an
   RPC-over-RDMA transmission group. That is, they describe the
   placement of data within an RPC message and the scope of description
   is not limited to a single RPC-over-RDMA transmission.

7.2.  Buffer Structure Definition

   Buffer structure definition information is used to allow the sender
   to know how receive buffers are constructed, to allow it to
   appropriately pad messages being sent so that bulk data will be
   received into a memory area with the appropriate characteristics.

   In this case, data placement will not place data at a specific
   address, picked and registered in advance as is done to effect DDP
   using explicit RDMA operations. Instead, a message is sent so that
   when it is matched with one of the preposted receives, the bulk data
   will be received into a memory area with the appropriate
   characteristics, including:

   o  size

   o  alignment

   o  placement-targetability and potentially other memory
      characteristics such as speed and persistence.

      struct xmrbs_seg {
              uint32          xmrseg_length;
              uint32          xmrseg_align;
              uint32          xmrseg_flags;
      };

      const uint32 XMRSFLAG_PLT = 0x01;

      struct xmrbs_group {
              uint32          xmrgrp_count;
              xmrbs_seg       xmrgrp_info;
      };

      struct xmrbs_buf {
              uint32          xmrbuf_length;
              xmrbs_group     xmrbuf_groups<>;
      };

   Buffers can be, and typically are, structured to contain multiple
   segments. Preposted receives that target a buffer use a scatter
   list to place received messages in successive buffer segments.

   An xmrbs_seg defines a single buffer segment.
   The fields included are:

   o  xmrseg_length is the length of this contiguous buffer segment.

   o  xmrseg_align specifies the guaranteed alignment for the
      corresponding buffer segment.

   o  xmrseg_flags specifies some noteworthy characteristics of the
      associated buffer segment.

   The following flag bit is the only one currently defined:

   o  XMRSFLAG_PLT indicates that the buffer segment in question is to
      be considered suitable as a target for data placement.

   An xmrbs_group designates a set of buffer segments, all with the same
   buffer segment characteristics, as indicated by xmrgrp_info. The
   buffer segments are contiguous within the buffer although they are
   likely not to be physically contiguous.

   An xmrbs_buf defines a receiver's buffer structure and consists of
   multiple xmrbs_groups. This buffer structure, when made available as
   a transport property, allows the sender to structure transmissions so
   as to place DDP-eligible data in appropriate target buffer segments.

7.3.  Message Data Placement Structures

   These data structures show where, in the virtual XDR stream for the
   set of messages, data is to be excised from that XDR stream and where
   that excised bulk data is to be found instead.

      union xmdp_loc switch(xmdp_type type) {

      case XMPTYPE_EXRW:
              rpcrdma1_segment xmdl_ex<>;
      case XMPTYPE_TBSN:
              xmdp_itemlen    xmdl_offset;
              xmdp_tbsn       xmdl_bsnum<>;
      case XMPTYPE_TOOSHORT:
      case XMPTYPE_NOITEM:
              void;
      };

      struct xmdp_mitem {
              xmdp_vsdisp     xmdmi_disp;
              xmdp_itemlen    xmdmi_length;
              xmdp_loc        xmdmi_where;
      };

      typedef xmdp_mitem xmdp_grpinfo<>;

   An xmdp_loc shows where a particular piece of bulk data is located.
   This information exists in multiple forms.

   o  The case for DDP using explicit RDMA operations contains, in
      xmdl_ex, an array of rpcrdma1_segments showing where bulk data is
      to be fetched from or has been transferred to.

   o  The case for send-based data placement contains, in xmdl_bsnum, an
      array of placement-targetable buffer segment numbers, indicating
      where bulk data, excised from the payload stream, is actually
      located. The bulk data starts xmdl_offset bytes into the buffer
      segment designated by xmdl_bsnum[0] and then proceeds through
      buffer segments denoted by successive xmdl_bsnum entries until the
      length of the data item is exhausted.

   o  The cases for XMPTYPE_TOOSHORT and XMPTYPE_NOITEM are only valid
      in responses.

   An xmdp_mitem denotes a specific item of bulk data. It consists of:

   o  The XDR stream displacement of the bulk data excised from the
      payload stream, in xmdmi_disp.

   o  The length of the data item, in xmdmi_length.

   o  The actual location of the bulk data, in xmdmi_where.

   An xmdp_grpinfo consists of an array of xmdp_mitems describing all of
   the bulk data excised from all RPC messages sent in a single
   RPC-over-RDMA transmission group. Some possible cases:

   o  The array is of length zero, indicating that there is no
      DDP-eligible data excised from the virtual XDR stream. In this
      case, the virtual XDR stream and the payload stream are identical.

   o  The array consists of one or more xmdp_mitems, each of whose
      xmdmi_where fields is of type XMPTYPE_EXRW. In this case, the
      placement data corresponds to read chunks in the case in which a
      request is being sent and to write chunks in the case in which a
      reply is being sent.

   o  The array consists of one or more xmdp_mitems, each of whose
      xmdmi_where fields is of type XMPTYPE_TBSN. In this case, each
      entry, whether it applies to bulk data in a request or a reply,
      describes data logically part of the message being sent, which may
      be part of any RPC-over-RDMA transmissions in the same
      transmission group.

   o  The array consists of one or more xmdp_mitems, with xmdmi_where
      fields of a mixture of types. In this case, each entry, whether it
      applies to bulk data in a request or a reply, describes data
      logically part of the message being sent, although the method of
      getting access to that data may vary from entry to entry.

7.4.  Response Direction Data Placement Structures

   These data structures, when sent as part of the request, instruct the
   responder how to use data placement to place response data subject to
   special data placement.

      union xmdp_rsdloc switch(xmdp_type type) {

      case XMPTYPE_EXRW:
      case XMPTYPE_CHOICE:
              rpcrdma1_segment xmdrsdl_ex<>;
      case XMPTYPE_BYSIZE:
              xmdp_itemlen    xmdrsdl_dsdov;
              rpcrdma1_segment xmdrsdl_bsex<>;
      case XMPTYPE_TBSN:
              void;
      };

      struct xmdp_rsdrange {
              xmdp_vsdisp     xmdrsdr_begin;
              xmdp_vsdisp     xmdrsdr_end;
      };

      struct xmdp_rsditem {
              xmdp_itemlen    xmdrsdi_minlen;
              xmdp_rsdloc     xmdrsdi_loc;
      };

      struct xmdp_rsdset {
              xmdp_rsdrange   xmdrsds_range;
              xmdp_rsditem    xmdrsds_items<>;
      };

      typedef xmdp_rsdset xmdp_rsdgroup<>;

   An xmdp_rsdloc contains information specifying where bulk data
   generated as part of a reply is to be placed. This information is
   defined as a union with the following cases:

   o  The case for DDP using explicit RDMA operations, XMPTYPE_EXRW,
      contains, in xmdrsdl_ex, an array of rpcrdma1_segments showing
      where bulk data generated by the corresponding reply is to be
      transferred to.

   o  The case allowing the responder to freely choose the data
      placement method, XMPTYPE_CHOICE, is identical. It also contains,
      in xmdrsdl_ex, an array of rpcrdma1_segments showing where bulk
      data generated by the corresponding reply is to be transferred to
      if explicit RDMA operations are to be used.

   o  The case for send-based data placement, XMPTYPE_TBSN, is void,
      since the decisions as to where bulk data is to be placed are made
      by the responder.

   o  In the case directing the responder to choose the data placement
      method based on item size, XMPTYPE_BYSIZE, an array of
      rpcrdma1_segments is in xmdrsdl_bsex.

   In all cases, each xmdp_rsdloc sent as part of a request has a
   corresponding xmdp_loc in the associated response. The xmdp_type
   specified in the request will affect the type in the response, but
   the types are not necessarily the same. The table below describes
   the valid combinations of request and response xmdp_type values.

   In this table, rows correspond to types in requests directing the
   responder as to the desired placement in the response while the
   columns correspond to types in the ensuing response. Invalid
   combinations are labelled "Inv." while valid combinations are
   labelled either "NDR", denoting no need to deregister memory, or
   "DR", to indicate that memory previously registered will need to be
   deregistered.

   +---------+--------+--------+-----------+---------+
   | Type    | EXRW   | TBSN   | TOOSHORT  | NOITEM  |
   +---------+--------+--------+-----------+---------+
   | EXRW    | DR     | Inv.   | DR        | DR      |
   | TBSN    | Inv.   | NDR    | NDR       | NDR     |
   | CHOICE  | DR     | NDR    | DR        | DR      |
   | BYSIZE  | DR     | NDR    | DR        | DR      |
   +---------+--------+--------+-----------+---------+

                         Table 2

   An xmdp_rsdrange denotes a range of positions in the XDR stream
   associated with a request. Particular directions regarding bulk data
   in the corresponding response are limited to such ranges, where
   response XDR stream positions and request XDR stream positions can be
   reliably tied together.
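
   As an illustration only, a requester might encode Table 2 above as
   a lookup table in order to decide, for each xmdp_rsdloc it sent,
   whether the type found in the matching xmdp_loc is acceptable and
   whether previously registered memory has to be deregistered. The
   following is a minimal sketch in Python. The dictionary layout and
   the name must_deregister are assumptions made for this example and
   are not part of the protocol. The row and column keys are the
   xmdp_type names with the XMPTYPE_ prefix removed, as in the table.

```python
# Illustrative sketch only: a lookup-table encoding of Table 2.
# The names TABLE_2 and must_deregister are hypothetical.

# Outer key: type sent in the request's xmdp_rsdloc (table rows).
# Inner key: type found in the response's xmdp_loc (table columns).
TABLE_2 = {
    "EXRW":   {"EXRW": "DR",   "TBSN": "Inv.", "TOOSHORT": "DR",  "NOITEM": "DR"},
    "TBSN":   {"EXRW": "Inv.", "TBSN": "NDR",  "TOOSHORT": "NDR", "NOITEM": "NDR"},
    "CHOICE": {"EXRW": "DR",   "TBSN": "NDR",  "TOOSHORT": "DR",  "NOITEM": "DR"},
    "BYSIZE": {"EXRW": "DR",   "TBSN": "NDR",  "TOOSHORT": "DR",  "NOITEM": "DR"},
}


def must_deregister(request_type, response_type):
    """Return True for a "DR" entry and False for an "NDR" entry.

    Raises ValueError for an "Inv." entry, or for a type that has no
    row or column in Table 2.
    """
    try:
        entry = TABLE_2[request_type][response_type]
    except KeyError:
        raise ValueError("type not covered by Table 2") from None
    if entry == "Inv.":
        raise ValueError("invalid request/response combination")
    return entry == "DR"
```

   For instance, a request entry of type CHOICE answered with TBSN
   yields False (nothing to deregister), while EXRW answered with
   TOOSHORT yields True, since the registered area went unused.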

   When the ULP supports multiple individual operations per RPC request
   (e.g., COMPOUND and CB_COMPOUND in NFSv4), an xmdp_rsdrange can
   isolate elements of the reply due to particular operations.

   An xmdp_rsditem specifies the handling of one potential item of bulk
   data. The handling specified is qualified by a length range. If the
   item is smaller than xmdrsdi_minlen, it is not treated as bulk data
   and the corresponding data item appears in the payload stream, while
   that particular xmdp_rsditem is considered used up, making the next
   xmdp_rsditem in the xmdp_rsdset the target of the next DDP-eligible
   data item in the reply. Note that in the case in which xmdrsdi_loc
   specifies use of explicit RDMA operations, the area specified is not
   used and the requester is responsible for deregistering it.

   For each xmdp_rsditem, there will be a corresponding xmdp_mitem in
   the associated response.

   An xmdp_rsdset contains a set of xmdp_rsditems applicable to a given
   xmdp_rsdrange in the request.

   An xmdp_rsdgroup designates a set of xmdp_rsdsets applicable to a
   particular RPC-over-RDMA transmission group. The xmdrsds_range
   fields of successive xmdp_rsdsets must be disjoint and in strictly
   increasing order.

8.  Transport Properties

8.1.  Property List

   In this document we take advantage of the fact that the set of
   transport properties defined in [rpcrdmav2] is subject to later
   extension. The additional transport properties are summarized below
   in Table 3.

   In that table the columns have the following values:

   o  The column labeled "property" identifies the transport property
      described by the current row.

   o  The column labeled "#" specifies the propid value used to identify
      this property.

   o  The column labeled "XDR type" gives the XDR type of the data used
      to communicate the value of this property. This data overlays the
      nominally opaque field pv_data in a propval.

   o  The column labeled "default" gives the default value for the
      property which is to be assumed by those who do not receive, or
      are unable to interpret, information about the actual value of the
      property.

   o  The column labeled "section" indicates the section (within this
      document) that explains the semantics and use of this transport
      property.

   +------------------------------+----+-----------+---------+---------+
   | property                     | #  | XDR type  | default | section |
   +------------------------------+----+-----------+---------+---------+
   | RTR Support                  | 3  | uint32    | 0       | 8.2     |
   | Receive Buffer Structure     | 4  | xmrbs_buf | Note1   | 8.3     |
   | Request Transmission Receive | 5  | xms_grpxc | 1       | 8.4     |
   | Limit                        |    |           |         |         |
   | Response Transmission Send   | 6  | xms_grpxc | 1       | 8.5     |
   | Limit                        |    |           |         |         |
   +------------------------------+----+-----------+---------+---------+

                                  Table 3

   The following notes apply to the above table:

   1.  The default value for the Receive Buffer Structure always
       consists of a single buffer segment, without any alignment
       restrictions and not targetable for DDP. The length of that
       buffer segment derives from the Receive Buffer Size Property if
       available, and from the default receive buffer size otherwise.

8.2.  RTR Support Property

      const uint32 XPROP_RTRSUPP = 3;
      typedef uint32 xpr_rtrs;

      const uint32 RTRS_XREQ = 1;
      const uint32 RTRS_XRESP = 2;
      const uint32 RTRS_XCONT = 4;

8.3.  Receive Buffer Structure Property

   This property defines the structure of the endpoint's receive
   buffers, in order to give a sender the ability to place bulk data in
   specific DDP-targetable buffer segments.

      const uint32 XPROP_RBSTRUCT = 4;
      typedef xmrbs_buf xpr_rbs;

   Normally, this property, if specified, should be in agreement with
   the Receive Buffer Size Property. However, the following rules apply.

   o  If the value of the Receive Buffer Structure Property is not
      specified, it is derived from the Receive Buffer Size Property, if
      known, and the default buffer size otherwise. The buffer is
      considered to consist of a single non-DDP-targetable segment whose
      size is the buffer size.

   o  If the value of the Receive Buffer Size Property is not specified
      and the Receive Buffer Structure Property is specified, the value
      of the former is derived from the latter, by adding up the length
      of all buffer segments specified.

8.4.  Request Transmission Receive Limit Property

   This property specifies the length of the longest request message
   (in terms of number of transmissions) that a responder will accept.

      const uint32 XPROP_REQRXLIM = 5;
      typedef uint32 xpr_rqrxl;

   A requester can use this property to determine whether to send long
   requests by using message continuation or by using a position-zero
   read chunk.

8.5.  Response Transmission Send Limit Property

   This property specifies the length of the longest response message
   (in terms of number of transmissions) that a responder will generate.

      const uint32 XPROP_RESPSXLIM = 6;
      typedef uint32 xpr_rssxl;

9.  New Operations

9.1.  Operations List

   The proposed new operations are set forth in Table 4 below. In that
   table, the columns have the following values:

   o  The column labeled "operation" specifies the particular operation.

   o  The column labeled "#" specifies the value of opttype for this
      operation.

   o  The column labeled "XDR type" gives the XDR type of the data
      structure used to describe the information in this new message
      type. This data overlays the nominally opaque field optinfo in an
      RDMA_OPTIONAL message.

   o  The column labeled "msg" indicates whether this operation is
      followed (or not) by an RPC message payload (or something else).

   o  The column labeled "section" indicates the section (within this
      document) that explains the semantics and use of this optional
      operation.

   +--------------------+----+--------------+--------+----------+
   | operation          | #  | XDR type     | msg    | section  |
   +--------------------+----+--------------+--------+----------+
   | Transmit Request   | 5  | optxmt_req   | Note1  | 9.2      |
   | Transmit Response  | 6  | optxmt_resp  | Note1  | 9.3      |
   | Transmit Continue  | 7  | optxmt_cont  | Note2  | 9.4      |
   | Report Error       | 8  | optrept_err  | No.    | 9.5      |
   +--------------------+----+--------------+--------+----------+

                               Table 4

   The following notes apply to the above table:

   1.  Contains an initial segment of the message payload stream for an
       RPC message, or the entire payload stream. The optxr[qs]_pslen
       field indicates the length of the section present.

   2.  May contain a part of a message payload stream for an RPC
       message, although not the entire payload stream. The optxc_pslen
       field, if non-zero, indicates that this portion is present, and
       the length of the section.

9.2.  Transmit Request Operation

   The message definition for this operation is as follows:

      const uint32 ROPT_XMTREQ = 1;

      struct optxmt_req {
              xmdp_grpinfo    optxrq_dp;
              xmdp_rsdgroup   optxrq_rsd;
              xms_grpxc       optxrq_count;
              xms_grpxc       optxrq_rsbuf;
              xmdp_pldisp     optxrq_pslen;
      };

   The field optxrq_dp describes the fields in the virtual XDR stream
   which have been excised in forming the payload stream, and
   information about where the corresponding bulk data is located.

   The field optxrq_rsd consists of information directing the responder
   as to how to construct the reply, in terms of DDP. of length zero.

   The field optxrq_count specifies the count of transmissions in this
   group of transmissions used to send a request.

   The field optrq_repch serves as a way to transfer a reply chunk to
   the responder, providing a means by which a reply longer than the
   inline size limit may be transferred. Although not prohibited by
   the protocol, it is unlikely to be used in environments in which
   message continuation is supported.

   The field optxrq_pslen gives the length of the payload stream for the
   RPC transmitted. The payload stream begins right after the end of
   the optxmt_req and proceeds for optxrq_pslen bytes. This can include
   crossing buffer segment boundaries.

9.3.  Transmit Response Operation

   The message definition for this operation is as follows:

      const uint32 ROPT_XMTRESP = 2;

      struct optxmt_resp {
              xmdp_grpinfo    optxrs_dp;
              xms_grpxn       optxrs_count;
              xmdp_pldisp     optxrs_pslen;
      };

   The field optxrs_dp describes the fields in the virtual XDR stream
   which have been excised in forming the payload stream, and
   information about where the corresponding bulk data is located.

   The field optxrs_count specifies the count of transmissions in this
   group of transmissions used to send a reply.

   The field optxrs_pslen gives the length of the payload stream for the
   RPC transmitted. The payload stream begins right after the end of
   the optxmt_resp and proceeds for optxrs_pslen bytes. This can include
   crossing buffer segment boundaries.

9.4.  Transmit Continue Operation

   RPC-over-RDMA headers of this type are used to continue RPC messages
   begun by RPC-over-RDMA messages of type ROPT_XMTREQ or ROPT_XMTRESP.
   The xid field of this message must match that in the initial
   transmission.

   This operation needs to be supported for the message continuation
   feature to be used.
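
   As an illustration of the matching rule above, a receiver might
   track one transmission group at a time and check each ROPT_XMTCONT
   against it. The following is a minimal sketch in Python, not a
   definitive implementation. The names GroupState and
   check_continuation are hypothetical. The error names returned are
   those listed in Section 9.5. The sketch also ASSUMES that the
   initial transmission is number zero and that continuations are
   numbered consecutively from one; the numbering origin of
   optxc_xnum is not stated in this section.

```python
# Illustrative sketch only; GroupState and check_continuation are
# hypothetical names, not protocol elements.
from dataclasses import dataclass


@dataclass
class GroupState:
    """Receiver-side state for the transmission group in progress."""
    xid: int            # xid of the initial ROPT_XMTREQ or ROPT_XMTRESP
    count: int          # optxrq_count or optxrs_count from that header
    next_xnum: int = 1  # ASSUMPTION: the initial transmission is number 0


def check_continuation(state, xid, xnum):
    """Check one ROPT_XMTCONT header against the group in progress.

    Returns None if the continuation is acceptable, otherwise the name
    of the optr_err value that would be reported.
    """
    if state is None or state.next_xnum >= state.count:
        return "OPTRERR_BADCONT"   # no continuation was expected
    if xid != state.xid:
        return "OPTRERR_BADXID"    # xid differs from initial transmission
    if xnum != state.next_xnum:
        return "OPTRERR_BADSEQ"    # unexpected transmission number
    state.next_xnum += 1
    return None
```

   In this sketch, a group announced with a count of three accepts
   continuations numbered one and two carrying the original xid, and
   anything received after that is treated as unexpected.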

   The message definition for this operation is as follows:

      const uint32 ROPT_XMTCONT = 3;

      struct optxmt_cont {
              xms_grpxn       optxc_xnum;
              uint32          optxc_itype;
              xmdp_pldisp     optxc_pslen;
      };

   The field optxc_xnum indicates the transmission number of this
   transmission within its transmission group.

   The field optxc_pslen gives the length of the section of the payload
   stream which is located in the current RPC-over-RDMA transmission.
   It is valid for this length to be zero, indicating that there is no
   portion of the payload stream in this transmission. Except when the
   length is zero, the payload stream begins right after the end of the
   optxmt_cont and proceeds for optxc_pslen bytes. This can include
   crossing buffer segment boundaries. In any case, the payload streams
   for all transmissions within the same group are considered
   concatenated.

9.5.  Error Reporting Operation

   This RPC-over-RDMA message type is used to signal the occurrence of
   errors that do not involve:

   1.  Transmission of a message that violates the rules specified in
       [rpcrdmav2].

   2.  Transmission of a message described in this document which does
       not conform to the XDR specified here.

   3.  The transmission of a message which, when assembled according to
       the rules here, cannot be decoded according to the XDR for the
       ULP.

   Such errors can arise if the rules specified in this document are not
   followed and can be the result of a mismatch between multiple
   transmissions, each of which is valid when considered on its own.

   The preliminary error-related definition is as follows:

      enum optr_err {
              OPTRERR_BADHMT = 1,
              OPTRERR_BADOMT = 2,
              OPTRERR_BADCONT = 3,
              OPTRERR_BADSEQ = 4,
              OPTRERR_BADXID = 5,
              OPTRERR_BADOFF = 6,
              OPTRERR_BADTBSN = 7,
              OPTRERR_BADPL = 8
      };

      union optr_info switch(optr_err optre_which) {

      case OPTRERR_BADHMT:
      case OPTRERR_BADOMT:
      case OPTRERR_BADSEQ:
      case OPTRERR_BADXID:
              uint32          optri_expect;
              uint32          optri_current;

      case OPTRERR_BADCONT:
              void;

      case OPTRERR_BADTBSN:
      case OPTRERR_BADOFF:
      case OPTRERR_BADPL:
              uint32          optri_value;
              uint32          optri_min;
              uint32          optri_max;

      };

   optr_err enumerates the various error conditions that might be
   reported.

   o  OPTRERR_BADHMT indicates that a header message type other than the
      one expected was received. In this context, a particular message
      type can be considered "expected" only because of message or group
      continuation.

   o  OPTRERR_BADOMT indicates that an optional message type other than
      the one expected was received. In this context, a particular
      message type can be considered "expected" only because of message
      or group continuation.

   o  OPTRERR_BADCONT indicates that a continuation message was
      received when there was no reason to expect one.

   o  OPTRERR_BADSEQ indicates that a transmission sequence number other
      than the one expected was received.

   o  OPTRERR_BADXID indicates that an xid other than the one expected
      was received in a continuation context.

   o  OPTRERR_BADTBSN indicates that an invalid target buffer segment
      number was received.

   o  OPTRERR_BADOFF indicates that a bad offset was received as part of
      an xmdp_loc. This is typically because the offset is larger than
      the buffer segment size.

   o  OPTRERR_BADPL indicates that a bad value was received for the
      payload length.
This is typically because the length would make 1479 the area devoted to the payload stream not a subset of the actual 1480 transmission. 1482 The optr_info gives information about the specific invalid field being 1483 reported. The additional information given depends on the specific 1484 error. 1486 o For the errors OPTRERR_BADHMT, OPTRERR_BADOMT, OPTRERR_BADSEQ, and 1487 OPTRERR_BADXID, the expected and actual values of the field are 1488 reported. 1490 o For the error OPTRERR_BADCONT, no additional information is provided. 1492 o For the errors OPTRERR_BADTBSN, OPTRERR_BADOFF, and OPTRERR_BADPL, 1493 the actual value together with a range of valid values is 1494 provided. When the actual value is within the valid range, it can 1495 be inferred that the actual value is not properly aligned (e.g., 1496 not on a 32-bit boundary). 1498 The message definition for this operation is as follows: 1500 1502 const uint32 ROPT_REPTERR = 4; 1504 struct optrept_err { 1505 xms_id optre_bad; 1506 xms_id *optre_lead; 1507 optr_info optre_info; 1508 }; 1510 1512 The field optre_bad is a description of the transmission on which the 1513 error was actually detected. 1515 The optional field optre_lead is a description of an earlier 1516 transmission that might have led to the error reported. 1518 The field optre_info provides information about the 1520 10. XDR 1522 This section contains an XDR [RFC4506] description of the proposed 1523 extension. 1525 This description is provided in a way that makes it simple to extract 1526 into ready-to-use form. The reader can apply the following shell 1527 script to this document to produce a machine-readable XDR description 1528 of the extension which can be combined with XDR for the base protocol to 1529 produce an XDR that includes the base protocol together with the 1530 optional extensions. 1532 1534 #!/bin/sh 1535 grep '^ *///' | sed 's?^ */// ??' | sed 's?^ *///$??'
1537 1539 That is, if the above script is stored in a file called "extract.sh" 1540 and this document is in a file called "ext.txt" then the reader can 1541 do the following to extract an XDR description file for this 1542 extension: 1544 1546 sh extract.sh < ext.txt > xmitext.x 1548 1550 The XDR description for this extension can be combined with that for 1551 other extensions and that for the base protocol. While this is a 1552 complete description and can be processed by the XDR compiler, the 1553 result might not be usable to process the extended protocol, for a 1554 number of reasons: 1556 The RPC-over-RDMA transport headers do not constitute an RPC 1557 program, and version negotiation and message selection are part of the 1558 XDR, rather than being external to it. 1560 Headers used for requests and replies are not necessarily paired, 1561 as they would be in an RPC program. 1563 Header types defined as optional extensions overlay existing 1564 nominally opaque fields in the base protocol. While this overlay 1565 architecture allows code aware of the overlay relationships to 1566 have a more complete view of header structure, this overlay 1567 relationship cannot be expressed within the XDR language. 1569 10.1. Code Component License 1571 Code components extracted from this document must include the 1572 following license text. When the extracted XDR code is combined with 1573 other complementary XDR code which itself has an identical license, 1574 only a single copy of the license text need be preserved. 1576 1578 /// /* 1579 /// * Copyright (c) 2010, 2016 IETF Trust and the persons 1580 /// * identified as authors of the code. All rights reserved. 1581 /// * 1582 /// * The author of the code is: D. Noveck.
1583 /// * 1584 /// * Redistribution and use in source and binary forms, with 1585 /// * or without modification, are permitted provided that the 1586 /// * following conditions are met: 1587 /// * 1588 /// * - Redistributions of source code must retain the above 1589 /// * copyright notice, this list of conditions and the 1590 /// * following disclaimer. 1591 /// * 1592 /// * - Redistributions in binary form must reproduce the above 1593 /// * copyright notice, this list of conditions and the 1594 /// * following disclaimer in the documentation and/or other 1595 /// * materials provided with the distribution. 1596 /// * 1597 /// * - Neither the name of Internet Society, IETF or IETF 1598 /// * Trust, nor the names of specific contributors, may be 1599 /// * used to endorse or promote products derived from this 1600 /// * software without specific prior written permission. 1601 /// * 1602 /// * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS 1603 /// * AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED 1604 /// * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 1605 /// * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS 1606 /// * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO 1607 /// * EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE 1608 /// * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, 1609 /// * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT 1610 /// * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 1611 /// * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS 1612 /// * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF 1613 /// * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 1614 /// * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING 1615 /// * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF 1616 /// * ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 1617 /// */ 1619 1621 10.2. 
XDR Proper for Extension 1623 1624 /// /******************************************************************* 1625 /// ******************************************************************* 1626 /// ** 1627 /// ** XDR for OPTIONAL protocol extension. 1628 /// ** 1629 /// ** Includes support for both message continuation and send-based 1630 /// ** DDP. The latter is supported by a new structure for the 1631 /// ** specification of data placements which can be used for both 1632 /// ** send-based data placement and DDP using explicit RDMA 1633 /// ** operations. 1634 /// ** 1635 /// ** Extensions include: 1636 /// ** 1637 /// ** o Four new transport properties. 1638 /// ** o Four new OPTIONAL message types. 1639 /// ** 1640 /// ******************************************************************* 1641 /// ******************************************************************/ 1642 /// 1643 /// /******************************************************************* 1644 /// * 1645 /// * Core XDR Definitions 1646 /// * 1647 /// ******************************************************************/ 1649 /// /* 1650 /// * General XDR preliminaries for these features. 1651 /// */ 1652 /// typedef uint32 xms_grpxn; 1653 /// typedef uint32 xms_grpxc; 1654 /// 1655 /// /* 1656 /// * Basic XDR typedefs for the new approach to the specification of 1657 /// * data placement. 1658 /// */ 1659 /// typedef uint32 xmdp_itemlen; 1660 /// typedef uint32 xmdp_pldisp; 1661 /// typedef uint32 xmdp_vsdisp; 1662 /// typedef uint32 xmdp_tbsn; 1663 /// 1664 /// /* 1665 /// * Define the possible types of data placement items. 1666 /// */ 1667 /// enum xmdp_type { 1668 /// XMPTYPE_EXRW = 1, 1669 /// XMPTYPE_TBSN = 2, 1670 /// XMPTYPE_CHOOSE = 3, 1671 /// XMPTYPE_BYSIZE = 4, 1672 /// XMPTYPE_TOOSHORT = 5, 1673 /// XMPTYPE_NOITEM = 6 1674 /// }; 1675 /// 1676 /// /* 1677 /// * XDR defining the placement of bulk items in the message being 1678 /// * sent.
1679 /// */ 1680 /// union xmdp_loc switch(xmdp_type type) { 1681 /// 1682 /// case XMPTYPE_EXRW: 1683 /// rpcrdma1_segment xmdl_ex<>; 1684 /// case XMPTYPE_TBSN: 1685 /// xmdp_itemlen xmdl_offset; 1686 /// xmdp_tbsn xmdl_bsnum<>; 1687 /// case XMPTYPE_TOOSHORT: 1688 /// case XMPTYPE_NOITEM: 1689 /// void; 1690 /// }; 1691 /// 1692 /// 1693 /// 1694 /// struct xmdp_mitem { 1695 /// xmdp_vsdisp xmdmi_disp; 1696 /// xmdp_itemlen xmdmi_length; 1697 /// xmdp_loc xmdmi_where; 1698 /// }; 1699 /// 1700 /// typedef xmdp_mitem xmdp_grpinfo<>; 1701 /// 1702 /// /* 1703 /// * XDR defining the placement of bulk items in the response to the 1704 /// * message being sent. 1705 /// */ 1706 /// union xmdp_rsdloc switch(xmdp_type type) { 1707 /// 1708 /// case XMPTYPE_EXRW: 1709 /// case XMPTYPE_CHOOSE: 1710 /// rpcrdma1_segment xmdrsdl_ex<>; 1711 /// case XMPTYPE_BYSIZE: 1712 /// xmdp_itemlen xmdrsdl_dsdov; 1713 /// rpcrdma1_segment xmdrsdl_bsex<>; 1714 /// case XMPTYPE_TBSN: 1716 /// void; 1717 /// }; 1718 /// 1719 /// struct xmdp_rsdrange { 1720 /// xmdp_vsdisp xmdrsdr_begin; 1721 /// xmdp_vsdisp xmdrsdr_end; 1722 /// }; 1723 /// 1724 /// struct xmdp_rsditem { 1725 /// xmdp_itemlen xmdrsdi_minlen; 1726 /// xmdp_rsdloc xmdrsdi_loc; 1727 /// }; 1728 /// 1729 /// struct xmdp_rsdset { 1730 /// xmdp_rsdrange xmdrsds_range; 1731 /// xmdp_rsditem xmdrsds_items<>; 1732 /// }; 1733 /// 1734 /// typedef xmdp_rsdset xmdp_rsdgroup<>; 1735 /// 1736 /// /******************************************************************* 1737 /// * 1738 /// * New Transport Properties 1739 /// * 1740 /// ******************************************************************/ 1741 /// 1742 /// /* 1743 /// * New Transport Property codes 1744 /// */ 1745 /// const uint32 XPROP_RTRSUPP = 3; 1746 /// const uint32 XPROP_RBSTRUCT = 4; 1747 /// const uint32 XPROP_REQRXLIM = 5; 1748 /// const uint32 XPROP_RESPSXLIM = 6; 1749 /// 1750 /// /* 1751 /// * XDR relating to RTR Support Property 1752 /// */ 1753 /// typedef uint32
xpr_rtrs; 1754 /// 1755 /// const uint32 RTRS_XREQ = 1; 1756 /// const uint32 RTRS_XRESP = 2; 1757 /// const uint32 RTRS_XCONT = 4; 1758 /// 1759 /// /* 1760 /// * Items related to Receive Buffer Structure Property 1761 /// */ 1762 /// struct xmrbs_seg { 1763 /// uint32 xmrseg_length; 1764 /// uint32 xmrseg_align; 1765 /// uint32 xmrseg_flags; 1766 /// }; 1767 /// 1768 /// const uint32 XMRSFLAG_PLT = 0x01; 1769 /// 1770 /// struct xmrbs_group { 1771 /// uint32 xmrgrp_count; 1772 /// xmrbs_seg xmrgrp_info; 1773 /// }; 1774 /// 1775 /// struct xmrbs_buf { 1776 /// uint32 xmrbuf_length; 1777 /// xmrbs_group xmrbuf_groups<>; 1778 /// }; 1779 /// typedef xmrbs_buf xpr_rbs; 1780 /// 1781 /// /* 1782 /// * XDR relating to transmission limit properties 1783 /// */ 1784 /// typedef uint32 xpr_rqrxl; 1785 /// 1786 /// typedef uint32 xpr_rssxl; 1787 /// 1788 /// /******************************************************************* 1789 /// * 1790 /// * New OPTIONAL Message Types 1791 /// * 1792 /// ******************************************************************/ 1793 /// 1794 /// /* 1795 /// * New message type codes 1796 /// */ 1797 /// const uint32 ROPT_XMTREQ = 1; 1798 /// const uint32 ROPT_XMTRESP = 2; 1799 /// const uint32 ROPT_XMTCONT = 3; 1800 /// const uint32 ROPT_REPTERR = 4; 1801 /// 1802 /// 1803 /// /* 1804 /// * New message type to do the initial transmission of a request. 1805 /// */ 1806 /// struct optxmt_req { 1807 /// xmdp_grpinfo optxrq_dp; 1808 /// xmdp_rsdgroup optxrq_rsd; 1809 /// xms_grpxc optxrq_count; 1810 /// xms_grpxc optxrq_rsbuf; 1811 /// xmdp_pldisp optxrq_pslen; 1812 /// 1813 /// }; 1814 /// 1815 /// /* 1816 /// * New message type to do the initial transmission of a response. 
1817 /// */ 1818 /// struct optxmt_resp { 1819 /// xmdp_grpinfo optxrs_dp; 1820 /// xms_grpxn optxrs_count; 1821 /// xmdp_pldisp optxrs_pslen; 1822 /// 1823 /// }; 1824 /// 1825 /// /* 1826 /// * New message type to transmit the continuation of a request or 1827 /// * response. 1828 /// */ 1829 /// struct optxmt_cont { 1830 /// xms_grpxn optxc_xnum; 1831 /// uint32 optxc_itype; 1832 /// xmdp_pldisp optxc_pslen; 1833 /// }; 1834 /// 1835 /// /* 1836 /// * XDR definitions to support error reporting. 1837 /// */ 1838 /// enum optr_err { 1839 /// OPTRERR_BADHMT = 1, 1840 /// OPTRERR_BADOMT = 2, 1841 /// OPTRERR_BADCONT = 3, 1842 /// OPTRERR_BADSEQ = 4, 1843 /// OPTRERR_BADXID = 5, 1844 /// OPTRERR_BADOFF = 6, 1845 /// OPTRERR_BADTBSN = 7, 1846 /// OPTRERR_BADPL = 8 1847 /// }; 1848 /// 1849 /// union optr_info switch(optr_err optre_which) { 1850 /// 1851 /// case OPTRERR_BADHMT: 1852 /// case OPTRERR_BADOMT: 1853 /// case OPTRERR_BADSEQ: 1854 /// case OPTRERR_BADXID: 1855 /// uint32 optri_expect; 1856 /// uint32 optri_current; 1857 /// 1858 /// case OPTRERR_BADCONT: 1859 /// void; 1860 /// 1861 /// 1862 /// case OPTRERR_BADTBSN: 1863 /// case OPTRERR_BADOFF: 1864 /// case OPTRERR_BADPL: 1865 /// uint32 optri_value; 1866 /// uint32 optri_min; 1867 /// uint32 optri_max; 1868 /// 1869 /// }; 1870 /// 1871 /// struct xms_id { 1872 /// uint32 xmsi_xid; 1873 /// msg_type xmsi_dir; 1874 /// xms_grpxn xmsi_seq; 1875 /// }; 1876 /// 1877 /// /* 1878 /// * New message type for error reporting. 1879 /// */ 1880 /// struct optrept_err { 1881 /// xms_id optre_bad; 1882 /// xms_id *optre_lead; 1883 /// optr_info optre_info; 1884 /// }; 1885 /// 1886 /// 1887 1889 11. Security Considerations 1891 The extension described here has the same security considerations as those 1892 described in [rfc5666bis] and [rpcrdmav2].
With regard to the 1893 transport properties introduced in this document, it is possible that 1894 a man-in-the-middle could interfere with the communication of 1895 transport properties with possible negative effects. To prevent such 1896 interference, the steps described in [rpcrdmav2] should be attended 1897 to. 1899 The use of the techniques described in this document to reduce use of 1900 explicit RDMA operations raises important issues which implementers 1901 should consider: 1903 While the use of these techniques may be expedient in certain 1904 cases, their use is not likely to be universal, at least for a 1905 considerable time. As a result, implementers should remain aware 1906 of the issues discussed in Section 9.1 of [rfc5666bis], unless and 1907 until it is certain that none of a requester's memory can be 1908 registered for remote access. 1910 Extra care needs to be taken in cases in which padding needs to be 1911 inserted in a transmission to ensure that a DDP-targetable data item 1912 will be received in an appropriately aligned buffer segment. In 1913 some implementations, sensitive data could be inadvertently sent 1914 within the padding. To prevent this, the padding can be zeroed or 1915 it can be sent from a pre-zeroed area using a gather list. 1917 12. IANA Considerations 1919 This document does not require any actions by IANA. 1921 13. References 1923 13.1. Normative References 1925 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1926 Requirement Levels", BCP 14, RFC 2119, 1927 DOI 10.17487/RFC2119, March 1997, 1928 . 1930 [RFC4506] Eisler, M., Ed., "XDR: External Data Representation 1931 Standard", STD 67, RFC 4506, DOI 10.17487/RFC4506, May 1932 2006, . 1934 [rfc5666bis] 1935 Lever, C., Ed., Simpson, W., and T. Talpey, "Remote Direct 1936 Memory Access Transport for Remote Procedure Call", March 1937 2017, . 1940 Work in progress. 1942 13.2. Informative References 1944 [RFC5662] Shepler, S., Ed., Eisler, M., Ed., and D.
Noveck, Ed., 1945 "Network File System (NFS) Version 4 Minor Version 1 1946 External Data Representation Standard (XDR) Description", 1947 RFC 5662, DOI 10.17487/RFC5662, January 2010, 1948 . 1950 [RFC5666] Talpey, T. and B. Callaghan, "Remote Direct Memory Access 1951 Transport for Remote Procedure Call", RFC 5666, 1952 DOI 10.17487/RFC5666, January 2010, 1953 . 1955 [RFC5667] Talpey, T. and B. Callaghan, "Network File System (NFS) 1956 Direct Data Placement", RFC 5667, DOI 10.17487/RFC5667, 1957 January 2010, . 1959 [rpcrdmav2] 1960 Lever, C., Ed. and D. Noveck, "RPC-over-RDMA Version Two", 1961 May 2017, . 1964 Work in progress. 1966 Appendix A. Acknowledgements 1968 The author gratefully acknowledges the work of Brent Callaghan and 1969 Tom Talpey in producing the original RPC-over-RDMA Version One 1970 specification [RFC5666] and also Tom's work in helping to clarify 1971 that specification. 1973 The author also wishes to thank Chuck Lever for his work resurrecting 1974 NFS support for RDMA in [rfc5666bis], for clarifying the relationship 1975 between RDMA and direct data placement, and for beginning the work on 1976 RPC-over-RDMA Version Two. 1978 The extract.sh shell script and formatting conventions were first 1979 described by the authors of the NFSv4.1 XDR specification [RFC5662]. 1981 Author's Address 1983 David Noveck 1984 NetApp 1985 1601 Trapelo Road 1986 Waltham, MA 02451 1987 US 1989 Phone: +1 781 572 8038 1990 Email: davenoveck@gmail.com