Network File System Version 4                                  D. Noveck
Internet-Draft                                                       HPE
Intended status: Standards Track                        December 3, 2016
Expires: June 6, 2017

        RPC-over-RDMA Extensions to Reduce Internode Round-trips
                 draft-dnoveck-nfsv4-rpcrdma-rtrext-01

Abstract

   It is expected that a future version of the RPC-over-RDMA transport
   will allow protocol extensions to be defined. This would provide for
   the specification of OPTIONAL features allowing participants who
   implement such features to cooperate as specified by that extension,
   while still interoperating with participants who do not support that
   extension.

   A particular extension is described herein, whose purpose is to
   reduce the latency due to inter-node round-trips needed to effect
   operations which involve direct data placement or which transfer RPC
   messages longer than the fixed inline buffer size limit.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 6, 2017.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Preliminaries
     1.1. Requirements Language
     1.2. Introduction
     1.3. Prerequisites
     1.4. Role Terminology
   2. Extension Overview
   3. Direct Data Placement Features
     3.1. Current Situation
     3.2. RDMA_MSGP
     3.3. Send-based DDP
     3.4. Other DDP-Related Extensions
   4. Message Continuation Feature
     4.1. Current Situation
     4.2. Message Continuation Changes
     4.3. Message Continuation and Credits
   5. Using Protocol Additions
     5.1. New Operation Support
     5.2. Message Continuation Support
     5.3. Send-based DDP Support
     5.4. Error Reporting
   6. XDR Preliminaries
     6.1. Message Continuation Preliminaries
     6.2. Data Placement Preliminaries
   7. Data Placement Structures
     7.1. Data Placement Overview
     7.2. Buffer Structure Definition
     7.3. Message DDP Structures
     7.4. Response Direction DDP Structures
   8. Transport Properties
     8.1. Property List
     8.2. RTR Support Property
     8.3. Receive Buffer Structure Property
     8.4. Request Transmission Receive Limit Property
     8.5. Response Transmission Send Limit Property
   9. New Operations
     9.1. Operations List
     9.2. Transmit Request Operation
     9.3. Transmit Response Operation
     9.4. Transmit Continue Operation
     9.5. Error Reporting Operation
   10. XDR
     10.1. Code Component License
     10.2. XDR Proper for Extension
   11. Security Considerations
   12. IANA Considerations
   13. References
     13.1. Normative References
     13.2. Informative References
   Appendix A. Acknowledgements
   Author's Address

1. Preliminaries

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

1.2. Introduction

   This document describes a potential extension to the RPC-over-RDMA
   protocol, which would allow participating implementations to have
   more flexibility in how they use RDMA sends and receives to effect
   necessary transmission of RPC requests and replies.

   In contrast to existing facilities defined in RPC-over-RDMA Version
   One in which the mapping between RPC messages and RPC-over-RDMA
   messages is strictly one-to-one and DDP is effected only through use
   of explicit RDMA operations, the following features are made
   available through this extension:

   o  The ability to effect Direct Data Placement in the context of a
      single RPC-over-RDMA transmission, rather than requiring explicit
      RDMA operations to effect the necessary placement.

   o  The ability to continue an RPC request or reply over multiple
      RPC-over-RDMA transmissions.

1.3. Prerequisites

   This document is written assuming that certain underlying facilities
   will be made available to build upon, in the context of a future
   version of RPC-over-RDMA. It is most likely that such facilities
   will be first available in Version Two of RPC-over-RDMA.

   o  A protocol extension mechanism is needed to enable the extensions
      to RPC-over-RDMA described here.

      This document is currently written to conform to the extension
      model for the proposed RPC-over-RDMA Version Two as described in
      [rpcrdmav2].

   o  An existing means of communicating transport properties between
      the RPC-over-RDMA endpoints is assumed.

      This document is currently written assuming the transport property
      model defined in [rpcrdmav2] will be available and can be extended
      to meet the needs of this extension.

   As the document referred to above is currently a personal Internet
   Draft, and subject to change, adjustments to this document are
   expected to be necessary when and if the needed facilities are
   defined in one or more working group documents.

1.4. Role Terminology

   A number of different terms are used regarding the roles of the two
   participants in an RPC-over-RDMA connection. Some of these roles last
   for the duration of a connection while others vary from request to
   request or from message to message.

   The roles of the client and server are fixed for the lifetime of the
   connection, with the client defined as the endpoint which initiated
   the connection.

   The roles of requester and responder often parallel those of client
   and server, although this is not always the case. Most requests are
   made in the forward direction, in which the client is the requester
   and the server is the responder. However, backward direction
   requests are possible, in which case the server is the requester and
   the client is the responder. As a result, clients and servers may
   both act as requesters and responders for different requests issued
   on the same connection.

   The roles of sender and receiver vary from message to message. With
   regard to the messages described in this document, the sender may act
   as a requester by sending RPC requests, as a responder by sending RPC
   replies, or as both at the same time by sending a mix of the two.

2. Extension Overview

   This extension is intended to function as part of RPC-over-RDMA and
   implementations should successfully interoperate with existing
   RPC-over-RDMA Version One implementations. Nevertheless, this
   extension seeks to take a somewhat different approach to
   high-performance RPC operation than has been used previously in that
   it seeks to de-emphasize the use of explicit RDMA operations.
   It does this in two ways:

   o  By implementing a send-based form of Direct Data Placement (see
      Section 3), use of explicit RDMA operations can be avoided in many
      common cases in which data is directly placed.

   o  Use of explicit RDMA to support reply chunks and position-zero
      read chunks can be avoided by allowing a single message to be
      split into multiple transmissions. This can be used to avoid many
      instances of the only existing use of explicit RDMA operations not
      associated with Direct Data Placement.

   While use of explicit RDMA operations allows the cost of the actual
   data transfer to be offloaded from the client and server CPUs to the
   RNIC, there are ancillary costs in setting up the transfer that
   cannot be ignored. As a result, send-based functions are often
   preferable, since the RNIC also uses DMA to effect these operations.
   In addition, the cost of the additional inter-node round trips
   required by explicit RDMA operations can be an issue, which can
   become increasingly troublesome as internode distances increase.
   Once one moves from in-machine-room to campus-wide or
   metropolitan-area distances, the additional round-trip delay of 16
   microseconds per mile becomes an issue impeding use of explicit RDMA
   operations.

3. Direct Data Placement Features

3.1. Current Situation

   Although explicit RDMA operations are used in the existing
   RPC-over-RDMA protocol for purposes unrelated to Direct Data
   Placement, all DDP is effected using explicit RDMA operations.

   As a result, many operations involving Direct Data Placement involve
   multiple internode round trips.

3.2. RDMA_MSGP

   Although this was not stated explicitly, it appears that RDMA_MSGP
   (defined in [RFC5666], removed from RPC-over-RDMA Version One by
   [rfc5666bis]) was an early attempt to effect correct placement of
   bulk data within a single RPC-over-RDMA transmission.

   As things turned out, the fields within the RDMA_MSGP header were not
   described in [RFC5666] in a way that allowed this message type to be
   implemented.

   In attempting to provide DDP functionality, we have to keep in mind
   and avoid the problems that led to failure of RDMA_MSGP. It appears
   that the problems go deeper than neglecting to write a few relevant
   sentences. It is helpful to note that:

   o  The inline message size limits eventually adopted were too small
      to allow RDMA_MSGP to be used effectively. This is true of both
      the 1K limit in Version One [rfc5666bis] and the 4K limit
      specified in [rpcrdmav2].

      On the other hand, there is text within [RFC5667] that suggests
      that much longer messages were anticipated at some points during
      the evolution of RPC-over-RDMA.

   o  The fact that NFSv4 COMPOUNDs often have additional operations
      beyond the one including the bulk data means that the RDMA_MSGP
      model cannot be extended to NFSv4. As a result, the bulk data
      needs to be excised from the data stream just as chunks are, so
      that the payload stream can include non-bulk data both before and
      after the logical position of the excised bulk data.

   o  In order for the sender to determine the appropriate amount of
      padding necessary within a transmission to place the bulk data at
      the proper position within the receive buffer, the sender must
      know more about the structure of the receiver's buffers. Since the
      padding needs to bring the bulk data to a position within the
      buffer that is appropriate to receive the bulk data, the sender
      needs to know where within the receive buffers such DDP-eligible
      areas are located.

   o  While appropriate padding could place the bulk data within a large
      WRITE into an appropriately aligned buffer or set of buffers,
      there is no corresponding provision for the bulk data associated
      with a READ.
      In short, there is no way to indicate to the responder that
      it should use RDMA_MSGP to appropriately place bulk data in the
      response.

   o  There is no explicit discussion of the required padding's use in
      effecting proper data placement or of its connection with the
      ULB's specification of DDP-eligible XDR.

   To summarize, RDMA_MSGP was an attempt to properly place data, but it
   was thought of as a local optimization, and insufficient attention
   was given to making it successful. As a result, as RPC-over-RDMA
   Version One was developed, Direct Data Placement was identified with
   the use of explicit RDMA operations, and the possibility of Data
   Placement within sends was not recognized.

3.3. Send-based DDP

   In this extension we will describe a more complete way to provide
   send-based data placement, as follows:

   o  By defining the structure of receive buffers as a transport
      property available to be interrogated by the peer implementation.

   o  By treating positioning of bulk data within a message as an
      instance of DDP, causing the bulk data to be excised from the
      payload XDR stream, as is the case with other forms of DDP.

   o  By defining new DDP control data structures that support both
      send-based DDP and the form of DDP using explicit RDMA operations
      that was specified in RPC-over-RDMA Version One. These new
      control structures, described in Section 7.1, are organized
      differently from the chunk-based structures described in
      [rfc5666bis].

3.4. Other DDP-Related Extensions

   In order to support send-based DDP, new DDP-related data structures
   have been defined, as described in Sections 7.3 and 7.4.

   These new data structures support both send-based and
   RDMA-operation-based DDP.
   In addition, because of the restructuring described in
   Section 7.1, a number of additional facilities are made available:

   o  The ability to restrict entries regarding DDP in response data to
      XDR data items generated in response to performing particular
      constituent operations within a given RPC request (e.g. specific
      operations within an NFSv4 COMPOUND).

   o  The ability to make use of DDP contingent on the actual length of
      a DDP-eligible data item in the response.

   o  The ability to specify whether use of DDP for a particular
      DDP-eligible data item is required or optional.

   These additional facilities will be available to implementations that
   do not support send-based DDP, as long as both parties support the
   OPTIONAL header types that include these new structures. For more
   information about the relationships among the new transport
   properties, operations, and features, see Section 5.

4. Message Continuation Feature

4.1. Current Situation

   Within RPC-over-RDMA Version One [rfc5666bis], each transmission of a
   request or reply involves sending a single RDMA send message and
   conversely each message-related transmission involves only a single
   RPC request or reply.

   This strict one-to-one model leads to some potential performance
   issues.

   o  Because of RDMA's use of fixed-size receives, some requests and
      replies will inevitably not fit in the limited space available,
      even if they do not contain any DDP-eligible bulk data.

      Such cases will raise performance issues because, to deal with
      them, the server is interrupted twice to receive a single request
      and all the necessary transfers are serialized. In particular,
      there are two server interrupt latencies involved before the
      server can process the actual request, in addition to the
      over-the-wire (OTW) round-trip latencies.

   o  In the case of replies, there may be cases in which reply chunks
      need to be allocated and registered even if the actual reply would
      fit within the fixed receive-size limit. Because the decision to
      create a reply chunk is made at the time the request is sent, even
      an extremely low probability of a longer reply will trigger
      allocation of a reply chunk.

      Because this decision is made in conformance with ULB rules,
      which, by their nature, may only reference a limited set of data,
      a reply chunk may be required even when the actual probability of
      a long reply is exactly zero. For example, a GETATTR request can
      generate a long reply due to a long ACL, and thus a COMPOUND with
      this operation might allocate a reply chunk, even if the specific
      file system being interrogated only supports ACLs of limited
      sizes, or the GETATTR in question does not interrogate one of the
      ACL attributes. Also, the OWNER attribute is a string, and it may
      be impossible to determine a priori that the owner of any
      particular file has no chance of requiring more than 4K bytes of
      space, for example. The assumption that there are no such user
      names, while it probably is valid, is not a fact that
      RPC-over-RDMA implementations can depend on.

4.2. Message Continuation Changes

   Continuing a single RPC request or reply is addressed by defining
   separate optional header types to begin and to continue sending a
   single RPC message. This is instead of creating a header with a
   continuation bit. In this approach, all of the DDP-related fields,
   which include support for send-based DDP, appear in the starting
   header (of types ROPT_XMTREQ and ROPT_XMTRESP) and apply to the RPC
   message as a whole.

   Later RPC-over-RDMA messages (of type ROPT_XMTCONT) may extend the
   payload stream and/or provide additional buffers to which bulk data
   can be directed.

   In this case, all of the RPC-over-RDMA messages used together are
   referred to as a transmission group and must be received in order
   without any intervening message.

   In implementations using this optional facility, those decoding RPC
   messages received using RPC-over-RDMA no longer have the assurance
   that each RPC message is in a contiguous buffer. As most XDR
   implementations are built based on the assumption that input will not
   be contiguous, this will not affect performance in most cases.

4.3. Message Continuation and Credits

   Using multiple transmissions to send a single request or response can
   complicate credit management. In the case of the message
   continuation feature, deadlocks can be avoided because use of message
   continuation is not obligatory. The requester or responder can use
   explicit RDMA operations if sufficient credits to use message
   continuation are not available.

   A requester is well positioned to make this choice with regard to the
   sending of requests. The requester must know, before sending a
   request, how long it will be, and therefore, how many credits it
   would require to send the request using message continuation. If
   these are not available, it can avoid message continuation by either
   creating read chunks sufficient to make the payload stream fit in a
   single transmission or by creating a position-zero read chunk.

   With regard to the response, the requester is not in a position to
   know exactly how long the response will be. However, the ULB will
   allow the maximum response length to be determined based on the
   request. This value can be used:

   o  To determine the maximum number of receive buffers that might be
      required to receive any response sent.

   o  To allocate and register a reply chunk to hold a possible large
      reply.
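
   The credit reasoning in this section can be illustrated with a
   minimal C sketch. It is not part of the extension. The names
   xmits_needed, choose_plan, extra_reply_buffers, and the xmit_plan
   enumeration are hypothetical, and the sketch assumes that each
   transmission consumes exactly one credit and that per_xmit is the
   payload capacity of a single transmission.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: ways a sender might convey one RPC message. */
enum xmit_plan {
    PLAN_SINGLE,        /* payload fits in one transmission          */
    PLAN_CONTINUATION,  /* send as a multi-transmission group        */
    PLAN_EXPLICIT_RDMA  /* fall back to read chunks or a reply chunk */
};

/*
 * Number of transmissions needed to carry payload_len bytes when each
 * transmission holds at most per_xmit bytes of payload.
 * ASSUMPTION: one credit per transmission; header overhead is ignored.
 */
static uint32_t xmits_needed(uint32_t payload_len, uint32_t per_xmit)
{
    assert(per_xmit > 0);
    if (payload_len == 0)
        return 1;                             /* still one send */
    return (payload_len - 1) / per_xmit + 1;  /* ceiling division */
}

/* Requester-side choice made before a request is sent. */
static enum xmit_plan choose_plan(uint32_t payload_len, uint32_t per_xmit,
                                  uint32_t credits_avail)
{
    uint32_t need = xmits_needed(payload_len, per_xmit);

    if (need == 1)
        return PLAN_SINGLE;
    return need <= credits_avail ? PLAN_CONTINUATION : PLAN_EXPLICIT_RDMA;
}

/*
 * Additional receive buffers, beyond the first, that the largest
 * possible response (as bounded by the ULB) could require.
 */
static uint32_t extra_reply_buffers(uint32_t max_reply_len,
                                    uint32_t per_xmit)
{
    return xmits_needed(max_reply_len, per_xmit) - 1;
}
```

   Under these assumptions, a 10000-byte request with 4096-byte
   transmissions needs three credits; with only two available, the
   requester falls back to explicit RDMA instead of waiting, which is
   how deadlock is avoided.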

   The requester can avoid doing the second of these if the responder
   has indicated it can use message continuation to send the response.
   In this case, it makes sure that the buffers will be available and
   indicates to the responder how many additional buffers (in the form
   of pre-posted receives) have been made available to accommodate
   continuation transmissions.

   When the responder processes the request, those additional receive
   buffers may be used or not, or used only in part. This may be
   because the response is shorter than the maximum possible response,
   or because a reply chunk was used to transmit the response.

   After the first or only transmission associated with the response is
   received by the requester, it can be determined how many of the
   additional buffers were used for the response. Any unused buffers
   can be made available for other uses such as expanding the pool of
   receive buffers available for the initial transmissions of responses
   or for receiving opposite direction requests. Alternatively, they
   can be kept in reserve for future uses, such as being made available
   to future requests which have potentially long responses.

5. Using Protocol Additions

   In using existing RPC-over-RDMA facilities for protocol extension,
   interoperability with existing implementations needs to be assured.
   Because this document describes support for multiple features, we
   need to clearly specify the various possible extensions and how peers
   can determine whether certain facilities are supported by both ends
   of the connection.

5.1. New Operation Support

   Note that most of the new operations defined in this extension are
   not tightly tied to a specific feature. XOPT_XMTREQ and XOPT_XMTRESP
   are designed to support implementations that support either or both
   of send-based DDP and message continuation.
   However, the converse is not
   the case and these header types can be implemented by those not
   supporting either of these features. For example, implementations
   may only need support for the facilities described in Section 3.4.

   Implementations may determine whether a peer implementation supports
   XOPT_XMTREQ, XOPT_XMTRESP, or XOPT_XMTCONT by attempting these
   operations. An alternative is to interrogate the RTR Support
   Property for information about which operations are supported.

5.2. Message Continuation Support

   Implementations may determine and act based on the level of peer
   implementation of support for message continuation as follows:

   o  To deal with issues relating to sending the peer
      multi-transmission requests, the requester can interrogate the
      peer's value of the Request Transmission Receive Limit
      (Section 8.4). In cases in which the property is not provided or
      has the value one, the requester implementation can avoid sending
      multi-transmission requests, and use the equivalent of
      position-zero read chunks to convey a request larger than the
      receive buffer limit.

      Similarly, if the request is longer than can fit in a set of
      transmissions given that limit, the request can be conveyed in the
      same fashion.

   o  To deal with issues relating to sending the peer
      multi-transmission responses, responders will only send
      multi-transmission responses for requests conveyed using
      XOPT_XMTREQ where the number of response transmissions is less
      than or equal to the buffer reservation count (in the field
      optxrq_rsbuf). The requester can avoid receiving a message
      consisting of too many transmissions by setting this field
      appropriately. This includes the case in which the requester
      cannot handle any multi-transmission responses.

   o  To avoid reserving receive buffers that the responder is not
      prepared to use, the requester can interrogate the peer's value of
      the Response Transmission Send Limit (Section 8.5). In cases in
      which it is possible that a request might result in a response too
      large for this set of buffers, the requester can provide a reply
      chunk to receive the response, which the responder can use if the
      count of buffers provided is insufficient.

5.3. Send-based DDP Support

   Implementations may determine and adapt to the level of peer
   implementation support for send-based DDP as described below. Note
   that an implementation may be able to send messages containing bulk
   data items placed using send-based DDP while not being prepared to
   receive them, or the reverse.

   o  The requester can interrogate the responder's Receive Buffer
      Structure Property. In cases in which the property is not
      provided or shows no DDP-targetable buffer segments, an
      implementation knows that messages containing bulk data cannot be
      sent using send-based DDP. In such cases, when XOPT_XMTREQ is
      used to send a request, bulk items may be transferred by setting
      the associated DDP information to indicate that the bulk data is
      to be fetched using explicit RDMA operations.

   o  In cases in which a requester is unprepared to accept messages
      using send-based DDP, its Receive Buffer Structure Property will
      make this clear to the responder. Nevertheless, the requester
      will generally indicate to the responder that bulk data items are
      to be returned using explicit RDMA operations. As a result,
      requesters may use XOPT_XMTREQ (and get the benefit of the
      DDP-related features discussed in Section 3.4) even if they
      support neither message continuation nor send-based DDP.

   o  Since it is possible for a responder to generate responses
      containing bulk data using send-based DDP even if it is not
      prepared to receive such messages, a requester who is prepared to
      accept such messages should specify in the request that the
      responses are to contain (or may contain) bulk data placed in this
      way. In deciding whether this is to be done, the requester can
      interrogate the responder's RTR Support Property for information
      about whether the peer can send responses in this form. It
      can do this without regard to whether the responder can accept
      messages containing bulk data items placed using send-based DDP.

   In determining whether bulk data will be placed using send-based DDP
   or via explicit RDMA operations, the level of support for message
   continuation will have a role. This is because DDP using explicit
   RDMA will reduce message size while send-based DDP reduces the size
   of the payload stream by rearranging the message, leaving the message
   size the same. As a result, the considerations discussed in
   Section 4.3 will have to be attended to by the sender in determining
   which form of DDP is to be used.

5.4. Error Reporting

   The more extensive transport layer functionality described in this
   document requires its own means of reporting errors, to deal with
   issues that are distinct from:

   o  Errors (including XDR errors) in the XDR stream as received by
      the responder or requester.

   o  XDR errors detected in the XDR headers defined by the base
      protocol.

   o  XDR errors detected in the new operations defined in this
      document.

   Beyond the above, the following sorts of errors will have to be dealt
   with, depending on which of the features of the extension are
   implemented.

   o  Information associated with send-based DDP may be inconsistent or
      otherwise invalid, even though it conforms to the XDR definition.

   o  There may be problems with the organization of transmission groups
      in that there are missing or extraneous transmissions.

   In each of the above cases, the problem will be reported to the
   sender using the Error Reporting operation, which needs to be
   supported by every endpoint that sends ROPT_XMTREQ, ROPT_XMTRESP, or
   ROPT_XMTCONT. This includes cases in which the problem is one with a
   reply. The function of the Error Reporting operation is to aid in
   diagnosing transport protocol errors and to allow the sender to
   recover or decide recovery is not possible. Reporting failure to the
   requesting process is dealt with indirectly. For example:

   o  When the transmissions used to send a request are ill-formed, the
      requester can respond to the error indication by proceeding to
      send the request using existing (i.e. non-extended) facilities.
      If it chooses not to do so, the requester can report an RPC
      request failure to the initiator of the RPC.

   o  When the transmissions used to send a response are ill-formed, the
      responder needs to know about the problem since it will otherwise
      assume that the transmissions succeeded. It can proceed to resend
      the reply using existing (i.e. non-extended) facilities. If it
      chooses not to do so, the requester will not see a response and
      eventually an RPC timeout will occur.

6. XDR Preliminaries

6.1. Message Continuation Preliminaries

   In order to implement message continuation, we have occasion to refer
   to particular RPC-over-RDMA transmissions within a transmission group
   or to characteristics of a later transmission group.

   typedef uint32 xms_grpxn;
   typedef uint32 xms_grpxc;

   struct xms_id {
           uint32          xmsi_xid;
           msg_type        xmsi_dir;
           xms_grpxn       xmsi_seq;
   };

   An xms_grpxn designates a particular RPC-over-RDMA transmission
   within a set of transmissions devoted to sending a single RPC
   message.
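
   As an illustration only, the following C sketch shows one way a
   receiver might use an xms_id to check that a transmission directly
   follows its predecessor within a transmission group. The function
   xms_follows is hypothetical and not part of the extension. The sketch
   assumes that xmsi_seq increases by one per transmission; the origin
   of the numbering is not specified in this section. The msg_type
   values are those of ONC RPC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* C renderings of the XDR typedefs (XDR uint32 maps to uint32_t). */
typedef uint32_t xms_grpxn;
typedef uint32_t xms_grpxc;

/* msg_type as defined by ONC RPC. */
enum msg_type { CALL = 0, REPLY = 1 };

struct xms_id {
    uint32_t      xmsi_xid;  /* presumably the XID of the RPC message */
    enum msg_type xmsi_dir;  /* request or reply                       */
    xms_grpxn     xmsi_seq;  /* position within the transmission group */
};

/*
 * Hypothetical receiver-side check: true if `cur` belongs to the same
 * RPC message as `prev` and is the very next transmission.
 * ASSUMPTION: sequence values increase by exactly one.
 */
static bool xms_follows(const struct xms_id *prev, const struct xms_id *cur)
{
    assert(prev && cur);
    return cur->xmsi_xid == prev->xmsi_xid &&
           cur->xmsi_dir == prev->xmsi_dir &&
           cur->xmsi_seq == prev->xmsi_seq + 1;
}
```

   A transmission that fails such a check would be a candidate for the
   Error Reporting operation, since it indicates a missing or extraneous
   transmission.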
631     An xms_grpxc specifies the number of RPC-over-RDMA transmissions in a
632     potential group of transmissions devoted to sending a single RPC
633     message.

635  6.2. Data Placement Preliminaries

637     Data structures related to data placement use a number of XDR
638     typedefs to help clarify the meaning of fields in the data structures
639     which use these typedefs.

641

643        typedef uint32 xmddp_itemlen;
644        typedef uint32 xmddp_pldisp;
645        typedef uint32 xmddp_vsdisp;

647        typedef uint32 xmddp_tbsn;

649        enum xmddp_type {
650            XMDTYPE_EXRW = 1,
651            XMDTYPE_TBSN = 2,
652            XMDTYPE_CHOICE = 3,
653            XMDTYPE_BYSIZE = 4,
654            XMDTYPE_TOOSHORT = 5,
655            XMDTYPE_NOITEM = 6
656        };

658

660     An xmddp_itemlen specifies the length of an XDR item. Because items
661     excised from the XDR stream are XDR items, lengths of items excised
662     from the XDR stream are denoted by xmddp_itemlens.

664     An xmddp_pldisp specifies a specific displacement within the payload
665     stream associated with a single RPC-over-RDMA transmission or a group
666     of such transmissions. Note that when multiple transmissions are
667     used for a single message, all of the payload streams within a
668     transmission group are considered concatenated.

670     An xmddp_vsdisp specifies a displacement within the virtual XDR
671     stream associated with the set of RPC messages transferred by a single
672     RPC-over-RDMA transmission or a group of such transmissions. The
673     virtual XDR stream includes bulk data excised from the payload stream
674     and so displacements within it reflect those of the corresponding
675     objects in the XDR stream that might be sent and received if no bulk
676     data excision facilities were involved in the RPC transmission.

678     An xmddp_tbsn designates a particular target buffer segment within a
679     (trivial or non-trivial) RPC-over-RDMA transmission group.
Each DDP-
680     targetable buffer segment is assigned a number starting with zero and
681     proceeding through all the buffer segments for all the RPC-over-RDMA
682     transmissions in the group. This includes buffer segments not
683     actually used because transmissions are shorter than the maximum size
684     and those in which a DDP-targetable buffer segment is used to hold
685     part of the payload XDR stream rather than bulk data.

687     An xmddp_type allows a selection between DDP using explicit RDMA
688     operations and that using send-based DDP. It is used in a number of
689     contexts. The specific context governs which subset of the types is
690     valid:

692     o  In request messages, they indicate where each of the directly-
693        placed data items within the request has been placed. In this
694        case, xmddp_type appears as the discriminator within an xmddp_loc
695        which is part of an xmddp_mitem that is an element within a
696        request's optxrq_ddp field.

698     o  In request messages, they direct the responder as to where
699        potential directly-placed items are to be placed. In this case,
700        xmddp_type appears as the discriminator within an xmddp_rsdloc
701        which is part of an xmddp_rsditem that is an element within a
702        request's optxrq_rsd field.

704     o  In response messages, they indicate how each of the potential
705        directly-placed items has been dealt with. A subset of these are
706        directly-placed data items and are presented in the same form as
707        that used for directly-placed data items within a request. In
708        this case, xmddp_type appears as the discriminator within an
709        xmddp_loc which is part of an xmddp_mitem that is an element
710        within a response's optxrs_ddp field.

712     A number of these types are valid in all of these contexts, since they
713     specify use of a specific mode of direct placement which is to be
714     used or has been used.

716     o  XMDTYPE_EXRW selects DDP using explicit RDMA reads and writes.
718     o  XMDTYPE_TBSN selects use of send-based DDP in which DDP-eligible
719        data is located in DDP-targetable buffer segments.

721     Another set of types is used to direct the use of specific sets of
722     types but cannot specify an actual choice that has been made.

724     o  XMDTYPE_CHOICE indicates that the responder may use either send-
725        based DDP or chunk-based DDP using explicit RDMA operations, with
726        a place for the latter having been provided by the requester.

728     o  XMDTYPE_BYSIZE indicates that the responder is to use either send-
729        based DDP or chunk-based DDP using explicit RDMA operations, with
730        the choice between the two governed by the actual size of the
731        associated DDP-eligible XDR item.

733     The following types are used when no actual direct placement has
734     occurred. They are used in responses to indicate ways in which a
735     direction to govern DDP in a reply was responded to without resulting
736     in direct placement.

738     o  XMDTYPE_TOOSHORT indicates that the corresponding entry in an
739        xmddp_rsdset was matched with a DDP-eligible item which was too
740        small to be handled using direct placement, resulting in the DDP-
741        eligible item being placed inline.

743     o  XMDTYPE_NOITEM indicates that the corresponding entry in an
744        xmddp_rsdset was not matched with a DDP-eligible item in the
745        reply.

747     The following table indicates which of the above types is valid in
748     each of the contexts in which these types may appear. For valid
749     occurrences, it distinguishes those which give sender-generated
750     information about the message, and those that direct reply
751     construction, from those that indicate how those directions governed
752     the construction of a reply. For invalid occurrences, we distinguish
753     between those that result in XDR decode errors and those which are
754     valid from the XDR point of view but are semantically invalid.
756     +------------------+--------------+-----------------+---------------+
757     | Type             | xmddp_loc in | xmddp_rsdloc in | xmddp_loc in  |
758     |                  | request      | request         | response      |
759     +------------------+--------------+-----------------+---------------+
760     | XMDTYPE_EXRW     | Valid Info   | Valid Direction | Valid Result  |
761     | XMDTYPE_TBSN     | Valid Info   | Valid Direction | Valid Result  |
762     | XMDTYPE_BYSIZE   | XDR Invalid  | Valid Direction | XDR Invalid   |
763     | XMDTYPE_CHOICE   | XDR Invalid  | Valid Direction | XDR Invalid   |
764     | XMDTYPE_TOOSHORT | Sem. Invalid | XDR Invalid     | Valid Result  |
765     | XMDTYPE_NOITEM   | Sem. Invalid | XDR Invalid     | Valid Result  |
766     +------------------+--------------+-----------------+---------------+

768                                    Table 1

770  7. Data Placement Structures

772  7.1. Data Placement Overview

774     To understand the new DDP structure defined here, it is necessary to
775     review the existing DDP structures used in RPC-over-RDMA Version One
776     and look at the corresponding structures in the new message
777     transmission headers defined in this document.

779     We look first at the existing structures.

781     o  Read chunks are specified on requests to indicate data items to be
782        excised from the payload stream and fetched from the requester's
783        memory by the responder. As such, they serve as a means of
784        supplying data excised from the payload XDR stream.

786        Read chunks appear in replies but they have no clear function
787        there.

789     o  Write chunks are specified on requests to provide locations in
790        requester memory to which DDP-eligible items in the corresponding
791        reply are to be transferred. They do not describe data in the
792        request but serve to direct reply construction.

794        When write chunks appear in replies they serve to indicate the
795        length of the data transferred. The addresses to which the bulk
796        reply data has been transferred are available, but this information
797        is already known to the requester.
799     o  Reply chunks are specified to provide a location in the
800        requester's memory to which the responder can transfer the
801        response using RDMA Write. Like write chunks, they do not
802        describe data in the request but serve to direct reply
803        construction.

805        When reply chunks appear in reply message headers, they serve
806        mainly to indicate whether the reply chunk was actually used.

808     Within the DDP structures defined here a different organization is
809     used, even where DDP using explicit RDMA operations is supported.

811     o  All messages that contain bulk data contain structures that
812        indicate where the excised data is located. See Section 7.3 for
813        details.

815     o  Requests that might generate replies containing bulk data contain
816        structures that provide guidance as to where the bulk data is to
817        be placed. See Section 7.4 for details.

819     Both sets of data structures are defined at the granularity of an RPC-
820     over-RDMA transmission group. That is, they describe the placement
821     of data within an RPC message and the scope of description is not
822     limited to a single RPC-over-RDMA transmission.

824  7.2. Buffer Structure Definition

826     Buffer structure definition information is used to allow the sender
827     to know how receive buffers are constructed, to allow it to
828     appropriately pad messages being sent so that bulk data will be
829     received into a memory area with the appropriate characteristics.

831     In this case, Direct Data Placement will not place data at a specific
832     address, picked and registered in advance as is done to effect DDP
833     using explicit RDMA operations. Instead, a message is sent so that
834     when it is matched with one of the preposted receives, the bulk data
835     will be received into a memory area with the appropriate
836     characteristics, including:

838     o  size

840     o  alignment

842     o  DDP-targetability and potentially other memory characteristics
843        such as speed and persistence.
845

847        struct xmrbs_seg {
848            uint32  xmrseg_length;
849            uint32  xmrseg_align;
850            uint32  xmrseg_flags;
851        };

853        const uint32 XMRSFLAG_DDP = 0x01;

855        struct xmrbs_group {
856            uint32     xmrgrp_count;
857            xmrbs_seg  xmrgrp_info;
858        };

860        struct xmrbs_buf {
861            uint32       xmrbuf_length;
862            xmrbs_group  xmrbuf_groups<>;
863        };

865

867     Buffers can be, and typically are, structured to contain multiple
868     segments. Preposted receives that target a buffer use a scatter
869     list to place received messages in successive buffer segments.

871     An xmrbs_seg defines a single buffer segment. The fields included
872     are:

874     o  xmrseg_length is the length of this contiguous buffer segment.

876     o  xmrseg_align specifies the guaranteed alignment for the
877        corresponding buffer segment.

879     o  xmrseg_flags specifies some noteworthy characteristics of the
880        associated buffer segment.

882     The following flag bit is the only one currently defined:

884     o  XMRSFLAG_DDP indicates that the buffer segment in question is to
885        be considered suitable as a target for direct data placement.

887     An xmrbs_group designates a set of buffer segments, all with the same
888     buffer segment characteristics as indicated by xmrgrp_info. The
889     buffer segments are contiguous within the buffer although they are
890     likely not to be physically contiguous.

892     An xmrbs_buf defines a receiver's buffer structure and consists of
893     multiple xmrbs_groups. This buffer structure, when made available as
894     a transport property, allows the sender to structure transmissions so
895     as to place DDP-eligible data in appropriate target buffer segments.

897  7.3. Message DDP Structures

899     These data structures show where in the virtual XDR stream for the
900     set of messages, data is to be excised from that XDR stream and where
901     that excised bulk data is to be found instead.
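
The relationship between the virtual XDR stream and the payload stream described above can be sketched as follows. This is an illustrative sketch only: the function names excise and restore are hypothetical, each (displacement, length) pair stands in for the displacement and length recorded for one excised item, and the pairs are assumed to be in increasing order and non-overlapping.

```python
from typing import List, Sequence, Tuple

Item = Tuple[int, int]  # (displacement in the virtual XDR stream, length)


def excise(virtual: bytes, items: Sequence[Item]) -> Tuple[bytes, List[bytes]]:
    """Split a virtual XDR stream into a payload stream and excised bulk data."""
    payload = bytearray()
    bulk: List[bytes] = []
    pos = 0  # next unread position in the virtual stream
    for disp, length in items:
        if disp < pos or disp + length > len(virtual):
            raise ValueError("item is out of order or out of range")
        payload += virtual[pos:disp]              # bytes that stay inline
        bulk.append(virtual[disp:disp + length])  # bytes placed elsewhere
        pos = disp + length
    payload += virtual[pos:]
    return bytes(payload), bulk


def restore(payload: bytes, items: Sequence[Item], bulk: Sequence[bytes]) -> bytes:
    """Rebuild the virtual XDR stream from the payload stream and bulk data."""
    out = bytearray()
    taken = 0  # payload bytes consumed so far
    for (disp, _length), data in zip(items, bulk):
        gap = disp - len(out)  # inline bytes that precede this item
        out += payload[taken:taken + gap]
        taken += gap
        out += data
    out += payload[taken:]
    return bytes(out)


virtual = b"HEADbulkdataTAIL"
payload, bulk = excise(virtual, [(4, 8)])
```

With no items at all, the payload stream is simply the virtual XDR stream. Displacements are always expressed in terms of the virtual stream, which is why restore tracks the length of the output rather than the position in the payload.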
903

905        union xmddp_loc switch(xmddp_type type) {

907            case XMDTYPE_EXRW:
908                rpcrdma1_segment  xmdl_ex<>;
909            case XMDTYPE_TBSN:
910                xmddp_itemlen     xmdl_offset;
911                xmddp_tbsn        xmdl_bsnum<>;
912            case XMDTYPE_TOOSHORT:
913            case XMDTYPE_NOITEM:
914                void;
915        };

917        struct xmddp_mitem {
918            xmddp_vsdisp   xmdmi_disp;
919            xmddp_itemlen  xmdmi_length;
920            xmddp_loc      xmdmi_where;
921        };

923        typedef xmddp_mitem xmddp_grpinfo<>;

925

927     An xmddp_loc shows where a particular piece of bulk data is located.
928     This information exists in multiple forms.

930     o  The case for DDP using explicit RDMA operations contains, in
931        xmdl_ex, an array of rpcrdma1_segments showing where bulk data is
932        to be fetched from or has been transferred to.

934     o  The case for send-based DDP contains, in xmdl_bsnum, an array of DDP-
935        targetable buffer segments, indicating where bulk data, excised
936        from the payload stream, is actually located. The bulk data
937        starts xmdl_offset bytes into the buffer segment designated by
938        xmdl_bsnum[0] and then proceeds through buffer segments denoted by
939        successive xmdl_bsnum entries until the length of the data item is
940        exhausted.

942     o  The cases for XMDTYPE_TOOSHORT and XMDTYPE_NOITEM are only valid in
943        responses.

945     An xmddp_mitem denotes a specific item of bulk data. It consists of:

947     o  The XDR stream displacement of the bulk data excised from the
948        payload stream, in xmdmi_disp.

950     o  The length of the data item, in xmdmi_length.

952     o  The actual location of the bulk data, in xmdmi_where.

954     An xmddp_grpinfo consists of an array of xmddp_mitems describing all
955     of the bulk data excised from all RPC messages sent in a single RPC-
956     over-RDMA transmission group. Some possible cases:

958     o  The array is of length zero, indicating that there is no DDP-
959        eligible data excised from the virtual XDR stream. In this case,
960        the virtual XDR stream and the payload stream are identical.
962     o  The array consists of one or more xmddp_mitems, each of whose
963        xmdmi_where fields is of type XMDTYPE_EXRW. In this case, the DDP
964        data corresponds to read chunks in the case in which a request is
965        being sent and to write chunks in the case in which a reply is
966        being sent.

968     o  The array consists of one or more xmddp_mitems, each of whose
969        xmdmi_where fields is of type XMDTYPE_TBSN. In this case, each
970        entry, whether it applies to bulk data in a request or a reply,
971        describes data logically part of the message being sent, which may
972        be part of any RPC-over-RDMA transmission in the same
973        transmission group.

975     o  The array consists of one or more xmddp_mitems, with xmdmi_where
976        fields of a mixture of types. In this case, each entry, whether it
977        applies to bulk data in a request or a reply, describes data
978        logically part of the message being sent, although the method of
979        getting access to that data may vary from entry to entry.

981  7.4. Response Direction DDP Structures

983     These data structures, when sent as part of the request, instruct the
984     responder how to place response data that is subject to Direct Data
985     Placement.

987

989        union xmddp_rsdloc switch(xmddp_type type) {

991            case XMDTYPE_EXRW:
992            case XMDTYPE_CHOICE:
993                rpcrdma1_segment  xmdrsdl_ex<>;
994            case XMDTYPE_BYSIZE:
995                xmddp_itemlen     xmdrsdl_dsdov;
996                rpcrdma1_segment  xmdrsdl_bsex<>;
997            case XMDTYPE_TBSN:
998                void;
999        };

1001        struct xmddp_rsdrange {
1002            xmddp_vsdisp  xmdrsdr_begin;
1003            xmddp_vsdisp  xmdrsdr_end;
1004        };

1006        struct xmddp_rsditem {
1007            xmddp_itemlen  xmdrsdi_minlen;
1008            xmddp_rsdloc   xmdrsdi_loc;
1009        };

1011        struct xmddp_rsdset {
1012            xmddp_rsdrange  xmdrsds_range;
1013            xmddp_rsditem   xmdrsds_items<>;
1014        };

1016        typedef xmddp_rsdset xmddp_rsdgroup<>;

1018

1020     An xmddp_rsdloc contains information specifying where bulk data
1021     generated as part of a reply is to be placed.
This information is
1022     defined as a union with the following cases:

1024     o  The case for DDP using explicit RDMA operations, XMDTYPE_EXRW,
1025        contains, in xmdrsdl_ex, an array of rpcrdma1_segments showing
1026        where bulk data generated by the corresponding reply is to be
1027        transferred to.

1029     o  The case allowing the responder to freely choose the DDP method,
1030        XMDTYPE_CHOICE, is identical. It also contains, in xmdrsdl_ex, an
1031        array of rpcrdma1_segments showing where bulk data generated by
1032        the corresponding reply is to be transferred to if explicit RDMA
1033        requests are to be used.

1035     o  The case for send-based DDP, XMDTYPE_TBSN, is void, since the
1036        decisions as to where bulk data is to be placed are made by the
1037        responder.

1039     o  In the case directing the responder to choose the DDP method based
1040        on item size, XMDTYPE_BYSIZE, an array of rpcrdma1_segments is in
1041        xmdrsdl_bsex.

1043     In all cases, each xmddp_rsdloc sent as part of a request has a
1044     corresponding xmddp_loc in the associated response. The xmddp_type
1045     specified in the request will affect the type in the response, but
1046     the types are not necessarily the same. The table below describes
1047     the valid combinations of request and response xmddp_type values.

1049     In this table, rows correspond to types in requests directing the
1050     responder as to the desired placement in the response while the
1051     columns correspond to types in the ensuing response. Invalid
1052     combinations are labelled "Inv." while valid combinations are labelled
1053     either "NDR" denoting no need to deregister memory, or "DR" to
1054     indicate that memory previously registered will need to be
1055     deregistered.

1057     +---------+--------+--------+-----------+---------+
1058     | Type    | EXRW   | TBSN   | TOOSHORT  | NOITEM  |
1059     +---------+--------+--------+-----------+---------+
1060     | EXRW    | DR     | Inv.   | DR        | DR      |
1061     | TBSN    | Inv.   | NDR    | NDR       | NDR     |
1062     | CHOICE  | DR     | NDR    | DR        | DR      |
1063     | BYSIZE  | DR     | NDR    | DR        | DR      |
1064     +---------+--------+--------+-----------+---------+

1066                            Table 2

1068     An xmddp_rsdrange denotes a range of positions in the XDR stream
1069     associated with a request. Particular directions regarding bulk data
1070     in the corresponding response are limited to such ranges, where
1071     response XDR stream positions and request XDR stream positions can be
1072     reliably tied together.

1074     When the ULP supports multiple individual operations per RPC request
1075     (e.g., COMPOUND and CB_COMPOUND in NFSv4), an xmddp_rsdrange can
1076     isolate elements of the reply due to particular operations.

1078     An xmddp_rsditem specifies the handling of one potential item of bulk
1079     data. The handling specified is qualified by a length range. If the
1080     item is smaller than xmdrsdi_minlen, it is not treated as bulk data
1081     and the corresponding data item appears in the payload stream, while
1082     that particular xmddp_rsditem is considered used up, making the next
1083     xmddp_rsditem in the xmddp_rsdset the target of the next DDP-eligible
1084     data item in the reply. Note that in the case in which xmdrsdi_loc
1085     specifies use of explicit RDMA operations, the area specified is not
1086     used and the requester is responsible for deregistering it.

1088     For each xmddp_rsditem, there will be a corresponding xmddp_mitem.

1090     An xmddp_rsdset contains a set of xmddp_rsditems applicable to a
1091     given xmddp_rsdrange in the request.

1093     An xmddp_rsdgroup designates a set of xmddp_rsdsets applicable to a
1094     particular RPC-over-RDMA transmission group. The xmdrsds_range
1095     fields of successive xmddp_rsdsets must be disjoint and in strictly
1096     increasing order.

1098  8. Transport Properties

1100  8.1. Property List

1102     In this document we take advantage of the fact that the set of
1103     transport properties defined in [rpcrdmav2] is subject to later
1104     extension.
The additional transport properties are summarized below
1105     in Table 3.

1107     In that table the columns have the following values:

1109     o  The column labeled "property" identifies the transport property
1110        described by the current row.

1112     o  The column labeled "#" specifies the propid value used to identify
1113        this property.

1115     o  The column labeled "XDR type" gives the XDR type of the data used to
1116        communicate the value of this property. This data overlays the
1117        nominally opaque field pv_data in a propval.

1119     o  The column labeled "default" gives the default value for the
1120        property which is to be assumed by those who do not receive, or
1121        are unable to interpret, information about the actual value of the
1122        property.

1124     o  The column labeled "section" indicates the section (within this
1125        document) that explains the semantics and use of this transport
1126        property.

1128     +------------------------------+----+-----------+---------+---------+
1129     | property                     | #  | XDR type  | default | section |
1130     +------------------------------+----+-----------+---------+---------+
1131     | RTR Support                  | 3  | uint32    | 0       | 8.2     |
1132     | Receive Buffer Structure     | 4  | xmrbs_buf | Note1   | 8.3     |
1133     | Request Transmission Receive | 5  | xms_grpxc | 1       | 8.4     |
1134     | Limit                        |    |           |         |         |
1135     | Response Transmission Send   | 6  | xms_grpxc | 1       | 8.5     |
1136     | Limit                        |    |           |         |         |
1137     +------------------------------+----+-----------+---------+---------+

1139                                    Table 3

1141     The following notes apply to the above table:

1143     1. The default value for the Receive Buffer Structure always
1144        consists of a single buffer segment, without any alignment
1145        restrictions and not targetable for DDP. The length of that
1146        buffer segment derives from the Receive Buffer Size Property if
1147        available, and from the default receive buffer size otherwise.

1149  8.2.
RTR Support Property

1151

1153        const uint32 XPROP_RTRSUPP = 3;
1154        typedef uint32 xpr_rtrs;

1156        const uint32 RTRS_XREQ = 1;
1157        const uint32 RTRS_XRESP = 2;
1158        const uint32 RTRS_XCONT = 4;

1160

1162  8.3. Receive Buffer Structure Property

1164     This property defines the structure of the endpoint's receive
1165     buffers, in order to give a sender the ability to place bulk data in
1166     specific DDP-targetable buffer segments.

1168

1170        const uint32 XPROP_RBSTRUCT = 4;
1171        typedef xmrbs_buf xpr_rbs;

1173
1174     Normally, this property, if specified, should be in agreement with
1175     the Receive Buffer Size Property. However, the following rules apply.

1177     o  If the value of the Receive Buffer Structure Property is not
1178        specified, it is derived from the Receive Buffer Size Property, if
1179        known, and the default buffer size otherwise. The buffer is
1180        considered to consist of a single non-DDP-targetable segment whose
1181        size is the buffer size.

1183     o  If the value of the Receive Buffer Size Property is not specified and
1184        the Receive Buffer Structure Property is specified, the value of
1185        the former is derived from the latter, by adding up the lengths of
1186        all buffer segments specified.

1188  8.4. Request Transmission Receive Limit Property

1190     This property specifies the length of the longest request messages
1191     (in terms of number of transmissions) that a responder will accept.

1193

1195        const uint32 XPROP_REQRXLIM = 5;
1196        typedef uint32 xpr_rqrxl;

1198

1200     A requester can use this property to determine whether to send long
1201     requests by using message continuation or by using a position-zero
1202     read chunk.

1204  8.5. Response Transmission Send Limit Property

1206     This property specifies the length of the longest response message
1207     (in terms of number of transmissions) that a responder will generate.

1209

1211        const uint32 XPROP_RESPSXLIM = 6;
1212        typedef uint32 xpr_rssxl;

1214

1216  9. New Operations
1217  9.1.
Operations List

1219     The proposed new operations are set forth in Table 4 below. In that
1220     table, the columns have the following values:

1222     o  The column labeled "operation" specifies the particular operation.

1224     o  The column labeled "#" specifies the value of opttype for this
1225        operation.

1227     o  The column labeled "XDR type" gives the XDR type of the data structure
1228        used to describe the information in this new message type. This
1229        data overlays the nominally opaque field optinfo in an
1230        RDMA_OPTIONAL message.

1232     o  The column labeled "msg" indicates whether this operation is
1233        followed (or not) by an RPC message payload (or something else).

1235     o  The column labeled "section" indicates the section (within this
1236        document) that explains the semantics and use of this optional
1237        operation.

1239     +--------------------+----+--------------+--------+----------+
1240     | operation          | #  | XDR type     | msg    | section  |
1241     +--------------------+----+--------------+--------+----------+
1242     | Transmit Request   | 5  | optxmt_req   | Note1  | 9.2      |
1243     | Transmit Response  | 6  | optxmt_resp  | Note1  | 9.3      |
1244     | Transmit Continue  | 7  | optxmt_cont  | Note2  | 9.4      |
1245     | Report Error       | 8  | optrept_err  | No.    | 9.5      |
1246     +--------------------+----+--------------+--------+----------+

1248                                 Table 4

1250     The following notes apply to the above table:

1252     1. Contains an initial segment of the message payload stream for an
1253        RPC message, or the entire payload stream. The optxr[qs]_pslen
1254        field indicates the length of the section present.

1256     2. May contain a part of a message payload stream for an RPC
1257        message, although not the entire payload stream. The optxc_pslen
1258        field, if non-zero, indicates that this portion is present, and
1259        the length of the section.

1261  9.2.
Transmit Request Operation

1263     The message definition for this operation is as follows:

1265

1267        const uint32 ROPT_XMTREQ = 1;

1269        struct optxmt_req {
1270            xmddp_grpinfo   optxrq_ddp;
1271            xmddp_rsdgroup  optxrq_rsd;
1272            xms_grpxc       optxrq_count;
1273            xms_grpxc       optxrq_rsbuf;
1274            xmddp_pldisp    optxrq_pslen;

1276        };

1278

1280     The field optxrq_ddp describes the fields in the virtual XDR stream which
1281     have been excised in forming the payload stream, and information
1282     about where the corresponding bulk data is located.

1284     The field optxrq_rsd consists of information directing the responder
1285     as to how to construct the reply, in terms of DDP. of length zero.

1287     The field optxrq_count specifies the count of transmissions in this
1288     group of transmissions used to send a request.

1290     The field optrq_repch serves as a way to transfer a reply chunk to
1291     the responder to serve as a way in which a reply longer than the
1292     inline size limit may be transferred. Although not prohibited by
1293     the protocol, it is unlikely to be used in environments in which
1294     message continuation is supported.

1296     The field optxrq_pslen gives the length of the payload stream for the
1297     RPC transmitted. The payload stream begins right after the end of
1298     the optxmt_req and proceeds for optxrq_pslen bytes. This can include
1299     crossing buffer segment boundaries.

1301  9.3. Transmit Response Operation

1303     The message definition for this operation is as follows:

1305

1307        const uint32 ROPT_XMTRESP = 2;

1309        struct optxmt_resp {
1310            xmddp_grpinfo  optxrs_ddp;
1311            xms_grpxn      optxrs_count;
1312            xmddp_pldisp   optxrs_pslen;

1314        };

1316

1318     The field optxrs_ddp describes the fields in the virtual XDR stream which
1319     have been excised in forming the payload stream, and information
1320     about where the corresponding bulk data is located.

1322     The field optxrs_count specifies the count of transmissions in this
1323     group of transmissions used to send a reply.
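
For both Transmit Request and Transmit Response, the portion of the payload stream carried in a transmission is located purely by length: it starts immediately after the operation's header structure and runs for the number of bytes given in optxrq_pslen or optxrs_pslen. The following is a minimal sketch of that rule. The function names and the header_len parameter (the number of bytes occupied by everything preceding the payload stream) are assumptions made for illustration, and received buffer segments are modeled simply as a list of byte strings.

```python
from typing import Sequence


def received_bytes(segments: Sequence[bytes]) -> bytes:
    """Treat successive receive buffer segments as one contiguous transmission."""
    return b"".join(segments)


def payload_section(transmission: bytes, header_len: int, pslen: int) -> bytes:
    """Return the pslen bytes of payload stream that follow the header.

    header_len is a hypothetical parameter: the byte count of everything
    that precedes the payload stream in this transmission.
    """
    end = header_len + pslen
    if header_len < 0 or pslen < 0 or end > len(transmission):
        # The payload area would not lie within the actual transmission.
        raise ValueError("payload stream section exceeds the transmission")
    return transmission[header_len:end]


# A payload section that crosses a buffer segment boundary.
tx = received_bytes([b"HDR!pay", b"load"])
section = payload_section(tx, header_len=4, pslen=7)
```

Because the section is located by length alone, it may begin in one buffer segment and end in another; a receiver that keeps segments separate would need to walk them in order rather than join them as this sketch does.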
1325     The field optxrs_pslen gives the length of the payload stream for the
1326     RPC transmitted. The payload stream begins right after the end of
1327     the optxmt_resp and proceeds for optxrs_pslen bytes. This can include
1328     crossing buffer segment boundaries.

1330  9.4. Transmit Continue Operation

1332     RPC-over-RDMA headers of this type are used to continue RPC messages
1333     begun by an RPC-over-RDMA message of type ROPT_XMTREQ or ROPT_XMTRESP.
1334     The xid field of this message must match that in the initial
1335     transmission.

1337     This operation needs to be supported for the message continuation
1338     feature to be used.

1340     The message definition for this operation is as follows:

1342

1344        const uint32 ROPT_XMTCONT = 3;

1346        struct optxmt_cont {
1347            xms_grpxn     optxc_xnum;
1348            uint32        optxc_itype;
1349            xmddp_pldisp  optxc_pslen;
1350        };

1352
1353     The field optxc_xnum indicates the transmission number of this
1354     transmission within its transmission group.

1356     The field optxc_pslen gives the length of the section of the payload
1357     stream which is located in the current RPC-over-RDMA transmission.
1358     It is valid for this length to be zero, indicating that there is no
1359     portion of the payload stream in this transmission. Except when the
1360     length is zero, the payload stream begins right after the end of the
1361     optxmt_cont and proceeds for optxc_pslen bytes. This can include
1362     crossing buffer segment boundaries. In any case, the payload streams
1363     for all transmissions within the same group are considered
1364     concatenated.

1366  9.5. Error Reporting Operation

1368     This RPC-over-RDMA message type is used to signal the occurrence of
1369     errors that do not involve:

1371     1. Transmission of a message that violates the rules specified in
1372        [rpcrdmav2].

1374     2. Transmission of a message described in this document which does
1375        not conform to the XDR specified here.

1377     3.
The transmission of a message, which, when assembled according to
1378        the rules here, cannot be decoded according to the XDR for the
1379        ULP.

1381     Such errors can arise if the rules specified in this document are not
1382     followed and can be the result of a mismatch between multiple
1383     transmissions, each of which is valid when considered on its own.

1385     The preliminary error-related definition is as follows:

1387

1389        enum optr_err {
1390            OPTRERR_BADHMT = 1,
1391            OPTRERR_BADOMT = 2,
1392            OPTRERR_BADCONT = 3,
1393            OPTRERR_BADSEQ = 4,
1394            OPTRERR_BADXID = 5,
1395            OPTRERR_BADOFF = 6,
1396            OPTRERR_BADTBSN = 7,
1397            OPTRERR_BADPL = 8
1398        };

1400        union optr_info switch(optr_err optre_which) {

1402            case OPTRERR_BADHMT:
1403            case OPTRERR_BADOMT:
1404            case OPTRERR_BADSEQ:
1405            case OPTRERR_BADXID:
1406                uint32  optri_expect;
1407                uint32  optri_current;

1409            case OPTRERR_BADCONT:
1410                void;

1412            case OPTRERR_BADTBSN:
1413            case OPTRERR_BADOFF:
1414            case OPTRERR_BADPL:
1415                uint32  optri_value;
1416                uint32  optri_min;
1417                uint32  optri_max;

1419        };

1421

1423     optr_err enumerates the various error conditions that might be
1424     reported.

1426     o  OPTRERR_BADHMT indicates that a header message type other than the
1427        one expected was received. In this context, a particular message
1428        type can be considered "expected" only because of message or group
1429        continuation.

1431     o  OPTRERR_BADOMT indicates that an optional message type other than
1432        the one expected was received. In this context, a particular
1433        message type can be considered "expected" only because of message
1434        or group continuation.

1436     o  OPTRERR_BADCONT indicates that a continuation message was
1437        received when there was no reason to expect one.

1439     o  OPTRERR_BADSEQ indicates that a transmission sequence number other
1440        than the one expected was received.

1442     o  OPTRERR_BADXID indicates that an xid other than the one expected
1443        was received in a continuation context.
1445     o  OPTRERR_BADTBSN indicates that an invalid target buffer sequence
1446        number was received.

1448     o  OPTRERR_BADOFF indicates that a bad offset was received as part of
1449        an xmddp_loc. This is typically because the offset is larger than
1450        the buffer segment size.

1452     o  OPTRERR_BADPL indicates that a bad value was received for the
1453        payload length. This is typically because the length would make
1454        the area devoted to the payload stream not a subset of the actual
1455        transmission.

1457     The optr_info gives information about the specific invalid field being
1458     reported. The additional information given depends on the specific
1459     error.

1461     o  For the errors OPTRERR_BADHMT, OPTRERR_BADOMT, OPTRERR_BADSEQ, and
1462        OPTRERR_BADXID, the expected and actual values of the field are
1463        reported.

1465     o  For the error OPTRERR_BADCONT, no additional information is provided.

1467     o  For the errors OPTRERR_BADTBSN, OPTRERR_BADOFF, and OPTRERR_BADPL,
1468        the actual value together with a range of valid values is
1469        provided. When the actual value is within the valid range, it can
1470        be inferred that the actual value is not properly aligned (e.g.,
1471        not on a 32-bit boundary).

1473     The message definition for this operation is as follows:

1475

1477        const uint32 ROPT_REPTERR = 4;

1479        struct optrept_err {
1480            xms_id     optre_bad;
1481            xms_id     *optre_lead;
1482            optr_info  optre_info;
1483        };

1485

1487     The field optre_bad is a description of the transmission on which the
1488     error was actually detected.

1490     The optional field optre_lead is a description of an earlier
1491     transmission that might have led to the error reported.

1493     The field optre_info provides information about the

1495  10. XDR

1497     This section contains an XDR [RFC4506] description of the proposed
1498     extension.

1500     This description is provided in a way that makes it simple to extract
1501     into ready-to-use form.
   The reader can apply the following shell script to this document to
   produce a machine-readable XDR description of the extension, which
   can be combined with the XDR for the base protocol to produce an XDR
   that includes the base protocol together with the optional
   extensions.

   #!/bin/sh
   grep '^ *///' | sed 's?^ */// ??' | sed 's?^ *///$??'

   That is, if the above script is stored in a file called "extract.sh"
   and this document is in a file called "ext.txt", then the reader can
   do the following to extract an XDR description file for this
   extension:

   sh extract.sh < ext.txt > xmitext.x

   The XDR description for this extension can be combined with that for
   other extensions and that for the base protocol.  While this is a
   complete description and can be processed by the XDR compiler, the
   result might not be usable to process the extended protocol, for a
   number of reasons:

      The RPC-over-RDMA transport headers do not constitute an RPC
      program, and version negotiation and message selection are part
      of the XDR rather than being external to it.

      Headers used for requests and replies are not necessarily paired,
      as they would be in an RPC program.

      Header types defined as optional extensions overlay existing
      nominally opaque fields in the base protocol.  While this overlay
      architecture allows code aware of the overlay relationships to
      have a more complete view of header structure, this overlay
      relationship cannot be expressed within the XDR language.

10.1.  Code Component License

   Code components extracted from this document must include the
   following license text.  When the extracted XDR code is combined with
   other complementary XDR code which itself has an identical license,
   only a single copy of the license text need be preserved.

   /// /*
   ///  * Copyright (c) 2010, 2016 IETF Trust and the persons
   ///  * identified as authors of the code.  All rights reserved.
   ///  *
   ///  * The author of the code is: D. Noveck.
   ///  *
   ///  * Redistribution and use in source and binary forms, with
   ///  * or without modification, are permitted provided that the
   ///  * following conditions are met:
   ///  *
   ///  * - Redistributions of source code must retain the above
   ///  *   copyright notice, this list of conditions and the
   ///  *   following disclaimer.
   ///  *
   ///  * - Redistributions in binary form must reproduce the above
   ///  *   copyright notice, this list of conditions and the
   ///  *   following disclaimer in the documentation and/or other
   ///  *   materials provided with the distribution.
   ///  *
   ///  * - Neither the name of Internet Society, IETF or IETF
   ///  *   Trust, nor the names of specific contributors, may be
   ///  *   used to endorse or promote products derived from this
   ///  *   software without specific prior written permission.
   ///  *
   ///  *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS
   ///  *   AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
   ///  *   WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
   ///  *   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
   ///  *   FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO
   ///  *   EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
   ///  *   LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   ///  *   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
   ///  *   NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
   ///  *   SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
   ///  *   INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   ///  *   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
   ///  *   OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
   ///  *   IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
   ///  *   ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
   ///  */

10.2.  XDR Proper for Extension

   /// /*******************************************************************
   ///  *******************************************************************
   ///  **
   ///  ** XDR for OPTIONAL protocol extension.
   ///  **
   ///  ** Includes support for both message continuation and send-based
   ///  ** DDP.  The latter is supported by a new structure for the
   ///  ** specification of data placements which can be used for both
   ///  ** send-based DDP and DDP using explicit RDMA operations.
   ///  **
   ///  ** Extensions include:
   ///  **
   ///  **  o Four new transport properties.
   ///  **  o Four new OPTIONAL message types
   ///  **
   ///  *******************************************************************
   ///  ******************************************************************/
   ///
   /// /*******************************************************************
   ///  *
   ///  *          Core XDR Definitions
   ///  *
   ///  ******************************************************************/
   ///
   /// /*
   ///  * General XDR preliminaries for these features,
   ///  */
   /// typedef uint32  xms_grpxn;
   /// typedef uint32  xms_grpxc;
   ///
   /// /*
   ///  * Basic XDR typedefs for the new approach to DDP Specification.
   ///  */
   /// typedef uint32  xmddp_itemlen;
   /// typedef uint32  xmddp_pldisp;
   /// typedef uint32  xmddp_vsdisp;
   /// typedef uint32  xmddp_tbsn;
   ///
   /// /*
   ///  * Define the possible types of DDP items.
   ///  */
   /// enum xmddp_type {
   ///         XMDTYPE_EXRW = 1,
   ///         XMDTYPE_TBSN = 2,
   ///         XMDTYPE_CHOOSE = 3,
   ///         XMDTYPE_BYSIZE = 4,
   ///         XMDTYPE_TOOSHORT = 5,
   ///         XMDTYPE_NOITEM = 6
   /// };
   ///
   /// /*
   ///  * XDR defining the placement of bulk items in the message being
   ///  * sent.
   ///  */
   /// union xmddp_loc switch(xmddp_type type) {
   ///
   /// case XMDTYPE_EXRW:
   ///         rpcrdma1_segment        xmdl_ex<>;
   /// case XMDTYPE_TBSN:
   ///         xmddp_itemlen           xmdl_offset;
   ///         xmddp_tbsn              xmdl_bsnum<>;
   /// case XMDTYPE_TOOSHORT:
   /// case XMDTYPE_NOITEM:
   ///         void;
   /// };
   ///
   /// struct xmddp_mitem {
   ///         xmddp_vsdisp            xmdmi_disp;
   ///         xmddp_itemlen           xmdmi_length;
   ///         xmddp_loc               xmdmi_where;
   /// };
   ///
   /// typedef xmddp_mitem xmddp_grpinfo<>;
   ///
   /// /*
   ///  * XDR defining the placement of bulk items in the response to the
   ///  * message being sent.
   ///  */
   /// union xmddp_rsdloc switch(xmddp_type type) {
   ///
   /// case XMDTYPE_EXRW:
   /// case XMDTYPE_CHOOSE:
   ///         rpcrdma1_segment        xmdrsdl_ex<>;
   /// case XMDTYPE_BYSIZE:
   ///         xmddp_itemlen           xmdrsdl_dsdov;
   ///         rpcrdma1_segment        xmdrsdl_bsex<>;
   /// case XMDTYPE_TBSN:
   ///         void;
   /// };
   ///
   /// struct xmddp_rsdrange {
   ///         xmddp_vsdisp            xmdrsdr_begin;
   ///         xmddp_vsdisp            xmdrsdr_end;
   /// };
   ///
   /// struct xmddp_rsditem {
   ///         xmddp_itemlen           xmdrsdi_minlen;
   ///         xmddp_rsdloc            xmdrsdi_loc;
   /// };
   ///
   /// struct xmddp_rsdset {
   ///         xmddp_rsdrange          xmdrsds_range;
   ///         xmddp_rsditem           xmdrsds_items<>;
   /// };
   ///
   /// typedef xmddp_rsdset xmddp_rsdgroup<>;
   ///
   /// /*******************************************************************
   ///  *
   ///  *          New Transport Properties
   ///  *
   ///  ******************************************************************/
   ///
   /// /*
   ///  * New Transport Property codes
   ///  */
   /// const uint32    XPROP_RTRSUPP = 3;
   /// const uint32    XPROP_RBSTRUCT = 4;
   /// const uint32    XPROP_REQRXLIM = 5;
   /// const uint32    XPROP_RESPSXLIM = 6;
   ///
   /// /*
   ///  * XDR relating to RTR Support Property
   ///  */
   /// typedef uint32  xpr_rtrs;
   ///
   /// const uint32    RTRS_XREQ = 1;
   /// const uint32    RTRS_XRESP = 2;
   /// const uint32    RTRS_XCONT = 4;
   ///
   /// /*
   ///  * Items related to Receive Buffer Structure Property
   ///  */
   /// struct xmrbs_seg {
   ///         uint32          xmrseg_length;
   ///         uint32          xmrseg_align;
   ///         uint32          xmrseg_flags;
   /// };
   ///
   /// const uint32    XMRSFLAG_DDP = 0x01;
   ///
   /// struct xmrbs_group {
   ///         uint32          xmrgrp_count;
   ///         xmrbs_seg       xmrgrp_info;
   /// };
   ///
   /// struct xmrbs_buf {
   ///         uint32          xmrbuf_length;
   ///         xmrbs_group     xmrbuf_groups<>;
   /// };
   /// typedef xmrbs_buf xpr_rbs;
   ///
   /// /*
   ///  * XDR relating to transmission limit properties
   ///  */
   /// typedef uint32  xpr_rqrxl;
   ///
   /// typedef uint32  xpr_rssxl;
   ///
   /// /*******************************************************************
   ///  *
   ///  *          New OPTIONAL Message Types
   ///  *
   ///  ******************************************************************/
   ///
   /// /*
   ///  * New message type codes
   ///  */
   /// const uint32    ROPT_XMTREQ = 1;
   /// const uint32    ROPT_XMTRESP = 2;
   /// const uint32    ROPT_XMTCONT = 3;
   /// const uint32    ROPT_REPTERR = 4;
   ///
   /// /*
   ///  * New message type to do the initial transmission of a request.
   ///  */
   /// struct optxmt_req {
   ///         xmddp_grpinfo   optxrq_ddp;
   ///         xmddp_rsdgroup  optxrq_rsd;
   ///         xms_grpxc       optxrq_count;
   ///         xms_grpxc       optxrq_rsbuf;
   ///         xmddp_pldisp    optxrq_pslen;
   /// };
   ///
   /// /*
   ///  * New message type to do the initial transmission of a response.
   ///  */
   /// struct optxmt_resp {
   ///         xmddp_grpinfo   optxrs_ddp;
   ///         xms_grpxn       optxrs_count;
   ///         xmddp_pldisp    optxrs_pslen;
   /// };
   ///
   /// /*
   ///  * New message type to transmit the continuation of a request or
   ///  * response.
   ///  */
   /// struct optxmt_cont {
   ///         xms_grpxn       optxc_xnum;
   ///         uint32          optxc_itype;
   ///         xmddp_pldisp    optxc_pslen;
   /// };
   ///
   /// /*
   ///  * XDR definitions to support error reporting.
   ///  */
   /// enum optr_err {
   ///         OPTRERR_BADHMT = 1,
   ///         OPTRERR_BADOMT = 2,
   ///         OPTRERR_BADCONT = 3,
   ///         OPTRERR_BADSEQ = 4,
   ///         OPTRERR_BADXID = 5,
   ///         OPTRERR_BADOFF = 6,
   ///         OPTRERR_BADTBSN = 7,
   ///         OPTRERR_BADPL = 8
   /// };
   ///
   /// union optr_info switch(optr_err optre_which) {
   ///
   /// case OPTRERR_BADHMT:
   /// case OPTRERR_BADOMT:
   /// case OPTRERR_BADSEQ:
   /// case OPTRERR_BADXID:
   ///         uint32          optri_expect;
   ///         uint32          optri_current;
   ///
   /// case OPTRERR_BADCONT:
   ///         void;
   ///
   /// case OPTRERR_BADTBSN:
   /// case OPTRERR_BADOFF:
   /// case OPTRERR_BADPL:
   ///         uint32          optri_value;
   ///         uint32          optri_min;
   ///         uint32          optri_max;
   ///
   /// };
   ///
   /// struct xms_id {
   ///         uint32          xmsi_xid;
   ///         msg_type        xmsi_dir;
   ///         xms_grpxn       xmsi_seq;
   /// };
   ///
   /// /*
   ///  * New message type for error reporting.
   ///  */
   /// struct optrept_err {
   ///         xms_id          optre_bad;
   ///         xms_id          *optre_lead;
   ///         optr_info       optre_info;
   /// };

11.  Security Considerations

   The extension described here has the same security considerations as
   those described in [rfc5666bis] and [rpcrdmav2].  With regard to the
   transport properties introduced in this document, it is possible
   that a man-in-the-middle could interfere with the communication of
   transport properties, with possible negative effects.  To prevent
   such interference, the steps described in [rpcrdmav2] should be
   attended to.

   The use of the techniques described in this document to reduce use of
   explicit RDMA operations raises important issues which implementers
   should consider:

      While the use of these techniques may be expedient in certain
      cases, their use is not likely to be universal, at least for a
      considerable time.
      As a result, implementers should remain aware of the issues
      discussed in Section 9.1 of [rfc5666bis], unless and until it is
      certain that none of a requester's memory can be registered for
      remote access.

      Extra care needs to be taken in cases in which padding needs to be
      inserted in a transmission to ensure that DDP-targetable data
      items will be received in an appropriately aligned buffer segment.
      In some implementations, sensitive data could be inadvertently
      sent within the padding.  To prevent this, the padding can be
      zeroed or it can be sent from a pre-zeroed area using a gather
      list.

12.  IANA Considerations

   This document does not require any actions by IANA.

13.  References

13.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4506]  Eisler, M., Ed., "XDR: External Data Representation
              Standard", STD 67, RFC 4506, DOI 10.17487/RFC4506, May
              2006.

   [rfc5666bis]
              Lever, C., Ed., Simpson, W., and T. Talpey, "Remote Direct
              Memory Access Transport for Remote Procedure Call",
              November 2016.

              Work in progress.

13.2.  Informative References

   [RFC5662]  Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
              "Network File System (NFS) Version 4 Minor Version 1
              External Data Representation Standard (XDR) Description",
              RFC 5662, DOI 10.17487/RFC5662, January 2010.

   [RFC5666]  Talpey, T. and B. Callaghan, "Remote Direct Memory Access
              Transport for Remote Procedure Call", RFC 5666,
              DOI 10.17487/RFC5666, January 2010.

   [RFC5667]  Talpey, T. and B. Callaghan, "Network File System (NFS)
              Direct Data Placement", RFC 5667, DOI 10.17487/RFC5667,
              January 2010.

   [rpcrdmav2]
              Lever, C., Ed. and D. Noveck, "RPC-over-RDMA Version Two",
              December 2016.

              Work in progress.

Appendix A.  Acknowledgements

   The author gratefully acknowledges the work of Brent Callaghan and
   Tom Talpey in producing the original RPC-over-RDMA Version One
   specification [RFC5666] and also Tom's work in helping to clarify
   that specification.

   The author also wishes to thank Chuck Lever for his work resurrecting
   NFS support for RDMA in [rfc5666bis], for clarifying the relationship
   between RDMA and direct data placement, and for beginning the work on
   RPC-over-RDMA Version Two.

   The extract.sh shell script and formatting conventions were first
   described by the authors of the NFSv4.1 XDR specification [RFC5662].

Author's Address

   David Noveck
   Hewlett Packard Enterprise
   165 Dascomb Road
   Andover, MA  01810
   USA

   Phone: +1 781-572-8038
   Email: davenoveck@gmail.com