Network File System Version 4                                  D. Noveck
Internet-Draft                                                    NetApp
Intended status: Informational                          December 5, 2017
Expires: June 8, 2018

        RPC-over-RDMA Extensions to Reduce Internode Round-trips
                 draft-dnoveck-nfsv4-rpcrdma-rtrext-03

Abstract

   It is expected that a future version of the RPC-over-RDMA transport
   will allow protocol extensions to be defined. This would provide for
   the specification of OPTIONAL features allowing participants who
   implement such features to cooperate as specified by that extension,
   while still interoperating with participants who do not support that
   extension.

   A particular extension is described herein, whose motivation is to
   reduce the latency due to inter-node round-trips needed to effect
   operations which involve direct data placement or which transfer RPC
   messages longer than the fixed inline buffer size limit.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 8, 2018.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Preliminaries
     1.1.  Requirements Language
     1.2.  Introduction
     1.3.  Role of this Document
     1.4.  Prerequisites
     1.5.  Participant Terminology
   2.  Extension Overview
   3.  Data Placement Features
     3.1.  Current Situation
     3.2.  RDMA_MSGP
     3.3.  Send-based Data Placement
     3.4.  Other Extensions Relating to Data Placement
   4.  Message Continuation Feature
     4.1.  Current Situation
     4.2.  Message Continuation Changes
     4.3.  Message Continuation and Credits
   5.  Using Protocol Additions
     5.1.  New Operation Support
     5.2.  Message Continuation Support
     5.3.  Support for Send-based Data Placement
     5.4.  Error Reporting
   6.  XDR Preliminaries
     6.1.  Message Continuation Preliminaries
     6.2.  Data Placement Preliminaries
   7.  Data Placement Structures
     7.1.  Data Placement Overview
     7.2.  Buffer Structure Definition
     7.3.  Message Data Placement Structures
     7.4.  Response Direction Data Placement Structures
   8.  Transport Properties
     8.1.  Property List
     8.2.  RTR Support Property
     8.3.  Receive Buffer Structure Property
     8.4.  Request Transmission Receive Limit Property
     8.5.  Response Transmission Send Limit Property
   9.  New Operations
     9.1.  Operations List
     9.2.  Transmit Request Operation
     9.3.  Transmit Response Operation
     9.4.  Transmit Continue Operation
     9.5.  Error Reporting Operation
   10. XDR
     10.1.  Code Component License
     10.2.  XDR Proper for Extension
   11. Security Considerations
   12. IANA Considerations
   13. References
     13.1.  Normative References
     13.2.  Informative References
   Acknowledgments
   Author's Address

1.  Preliminaries

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

1.2.  Introduction

   This document describes a potential extension to the RPC-over-RDMA
   protocol, which would allow participating implementations to have
   more flexibility in how they use RDMA sends and receives to effect
   necessary transmission of RPC requests and replies.

   In contrast to existing facilities defined in RPC-over-RDMA Version
   One in which the mapping between RPC messages and RPC-over-RDMA
   messages is strictly one-to-one and placement of bulk data is
   effected only through use of explicit RDMA operations, the following
   features are made available through this extension:

   o  The ability to effect data placement in the context of a single
      RPC-over-RDMA transmission, rather than requiring explicit RDMA
      operations to effect the necessary placement.

   o  The ability to continue an RPC request or reply over multiple
      RPC-over-RDMA transmissions.

1.3.  Role of this Document

   This is not a standards-track document, despite the fact that it
   contains many of the sorts of items (e.g. proposed XDR, detailed
   field descriptions) that normally appear in such documents.

   Although this document is in the informational category it is not
   expected to result in an Informational RFC, as the material within it
   is not expected to be of interest to the internet community in
   general. Its target audience is the nfsv4 working group itself and
   it is not expected to evolve into an RFC.

   The function of this document is essentially exploratory, in that it
   looks at a number of possible ways that the RPC-over-RDMA transport
   could be extended.
   Although many of these might well be followed up
   on eventually with standards-track documents, it should not be
   assumed that all will or that the relation among the various elements
   of any extension to address these issues will be the same as laid out
   here.

1.4.  Prerequisites

   This document is written assuming that certain underlying facilities
   will be made available to build upon, in the context of a future
   version of RPC-over-RDMA. It is most likely that such facilities
   will be first available in Version Two of RPC-over-RDMA.

   o  A protocol extension mechanism is needed to enable the extensions
      to RPC-over-RDMA described here.

      This document is currently written to conform to the extension
      model for the proposed RPC-over-RDMA Version Two as described in
      [I-D.cel-nfsv4-rpcrdma-version-two].

   o  An existing means of communicating transport properties between
      the RPC-over-RDMA endpoints is assumed.

      This document is currently written assuming that the transport
      property model defined in [I-D.cel-nfsv4-rpcrdma-version-two] will
      be available and can be extended to meet the needs of this
      extension.

   As the document referred to above is currently a personal
   Internet-Draft, and subject to change, adjustments to this document
   are expected to be necessary when and if the needed facilities are
   defined in one or more working group documents leading to the
   potential publication of Standards-track RFCs.

   Such an RFC for a new RPC-over-RDMA version might differ from
   [I-D.cel-nfsv4-rpcrdma-version-two] in significant ways even if it
   provides the prerequisites listed above. For example,

   o  The extension model might be significantly different. For
      example, it might use an approach more like that used in [RFC8178]
      rather than using a single message type as a vehicle for OPTIONAL
      extensions.

   o  There is the possibility of significant change in the credit
      model.
      While [I-D.cel-nfsv4-rpcrdma-version-two] contains support
      for one-way messages, much of the text regarding credits is
      inherited from [RFC8166] which assumes a one-to-one mapping
      between requests and responses. It is not clear whether this
      mismatch will be resolved by changing (only) the description of
      the credit mechanism or whether a more basic protocol change is
      required. Whichever approach is taken, the treatment of message
      continuation is likely to follow.

1.5.  Participant Terminology

   A number of different terms are used regarding the roles of the two
   participants in an RPC-over-RDMA connection. Some of these roles last
   for the duration of a connection while others vary from request to
   request or from message to message.

   The roles of the client and server are fixed for the lifetime of the
   connection, with the client defined as the endpoint which initiated
   the connection.

   The roles of requester and responder often parallel those of client
   and server, although this is not always the case. Most requests are
   made in the forward direction, in which the client is the requester
   and the server is the responder. However, backward direction
   requests are possible, in which case the server is the requester and
   the client is the responder. As a result, clients and servers may
   both act as requesters and responders for different requests issued
   on the same connection.

   The roles of sender and receiver vary from message to message. With
   regard to the messages described in this document, the sender may act
   as a requester by sending RPC requests, as a responder by sending RPC
   replies, or as both at the same time by sending a mix of the two.

2.  Extension Overview

   This extension is intended to function as part of RPC-over-RDMA and
   implementations should successfully interoperate with existing
   RPC-over-RDMA Version One implementations.
   Nevertheless, this extension
   seeks to take a somewhat different approach to high-performance RPC
   operation than has been used previously in that it seeks to
   de-emphasize the use of explicit RDMA operations. It does this in two
   ways:

   o  By implementing a send-based form of data placement (see
      Section 3), use of explicit RDMA operations can be avoided in many
      common cases in which data is to be placed at an appropriate
      location in the receiver's memory.

   o  Use of explicit RDMA to support reply chunks and position-zero
      read chunks can be avoided by allowing a single message to be
      split into multiple transmissions. This can be used to avoid many
      instances of the only existing use of explicit RDMA operations not
      associated with Direct Data Placement.

   While use of explicit RDMA operations allows the cost of the actual
   data transfer to be offloaded from the client and server CPUs to the
   RNIC, there are ancillary costs in setting up the transfer that
   cannot be ignored. As a result, send-based functions are often
   preferable, since the RNIC also uses DMA to effect these operations.
   In addition, the cost of the additional inter-node round trips
   required by explicit RDMA operations can be an issue, which can
   become increasingly troublesome as internode distances increase.
   Once one moves from in-machine-room to campus-wide or
   metropolitan-area distances the additional round-trip delay of 16
   microseconds per mile becomes an issue impeding use of explicit RDMA
   operations.

3.  Data Placement Features

3.1.  Current Situation

   Although explicit RDMA operations are used in the existing
   RPC-over-RDMA protocol for purposes unrelated to Direct Data
   Placement, all placement of bulk data is effected using explicit RDMA
   operations.

   As a result, many operations requiring placement of bulk data involve
   multiple internode round trips.

3.2.  RDMA_MSGP

   Although this was not stated explicitly, it appears that RDMA_MSGP
   (defined in [RFC5666], removed from RPC-over-RDMA Version One by
   [RFC8166]), was an early attempt to effect correct placement of bulk
   data within a single RPC-over-RDMA transmission.

   As things turned out, the fields within the RDMA_MSGP header were not
   described in [RFC5666] in a way that allowed this message type to be
   implemented.

   In attempting to provide the appropriate data placement
   functionality, we have to keep in mind and avoid the problems that
   led to failure of RDMA_MSGP. It appears that the problems go deeper
   than neglecting to write a few relevant sentences. It is helpful to
   note that:

   o  The inline message size limits eventually adopted were too small
      to allow RDMA_MSGP to be used effectively. This is true of both
      the 1K limit in Version One [RFC8166] and the 4K limit specified
      in [I-D.cel-nfsv4-rpcrdma-version-two].

      On the other hand, there is text within [RFC5667] that suggests
      that much longer messages were anticipated at some points during
      the evolution of RPC-over-RDMA.

   o  The fact that NFSv4 COMPOUNDs often have additional operations
      beyond the one including the bulk data means that the RDMA_MSGP
      model cannot be extended to NFSv4. As a result, the bulk data
      needs to be excised from the data stream just as chunks are, so
      that the payload stream can include non-bulk data both before and
      after the logical position of the excised bulk data.

   o  In order for the sender to determine the appropriate amount of
      padding necessary within a transmission to place the bulk data at
      the proper position within the receive buffer, the sender must
      know more about the structure of the receiver's buffers.
      Since the
      padding needs to bring the bulk data to a position within the
      buffer that is appropriate to receive the bulk data, the sender
      needs to know where within the receive buffers such
      placement-eligible areas are located.

   o  While appropriate padding could place the bulk data within a large
      WRITE into an appropriately aligned buffer or set of buffers,
      there is no corresponding provision for the bulk data associated
      with a READ. In short, there is no way to indicate to the
      responder that it should use RDMA_MSGP to appropriately place bulk
      data in the response.

   o  There is no explicit discussion of the required padding's use in
      effecting proper data placement or of its connection with the
      ULB's specification of DDP-eligible XDR items.

   To summarize, RDMA_MSGP was an attempt to properly place bulk data
   which was thought of as a local optimization, and insufficient
   attention was given to it to make it successful. As a result, as
   RPC-over-RDMA Version One was developed, data placement was
   identified with the use of explicit RDMA operations providing DDP and
   the possibility of data placement within sends was not recognized.

3.3.  Send-based Data Placement

   In this extension we will describe a more complete way to provide
   send-based data placement, as follows:

   o  By defining the structure of receive buffers as a transport
      property available to be interrogated by the peer implementation.

   o  By treating positioning of bulk data within a message as an
      instance of data placement, causing the bulk data to be excised
      from the payload XDR stream, as is the case with other forms of
      bulk data placement (e.g. DDP).

   o  By defining new data structures to control placement of bulk data
      that support both send-based data placement and the DDP using
      explicit RDMA operations that was an integral part of
      RPC-over-RDMA Version One.
      These new control structures, described in Section 7.1, are
      organized differently from the chunk-based structures described in
      [RFC8166].

   In making these changes, we will retain certain aspects of the DDP
   model:

   o  The set of bulk data items eligible for special data placement is
      exactly the same as with DDP, as defined by the RPC protocol's
      upper-layer binding document.

   o  The concept of an inline XDR stream is retained, with specially
      placed items appearing outside it, but with references to them
      retained so that the receiver has access to all of the message
      data.

3.4.  Other Extensions Relating to Data Placement

   In order to support send-based data placement, new placement-related
   data structures have been defined, as described in Sections 7.3 and
   7.4.

   These new data structures support both send-based and
   RDMA-operation-based data placement. In addition, because of the
   restructuring described in Section 7.1, a number of additional
   facilities are made available:

   o  The ability to restrict entries regarding data placement in
      response data to XDR data items generated in response to
      performing particular constituent operations within a given RPC
      request (e.g. specific operations within an NFSv4 COMPOUND).

   o  The ability to make use of special data placement contingent on
      the actual length of a placement-eligible data item in the
      response.

   o  The ability to specify whether use of data placement for a
      particular placement-eligible data item is required or optional.

   These additional facilities will be available to implementations that
   do not support send-based data placement, as long as both parties
   support the OPTIONAL Header types that include these new structures.
   For more information about the relationships among the new transport
   properties, operations, and features, see Section 5.

4.  Message Continuation Feature

4.1.  Current Situation

   Within RPC-over-RDMA Version One [RFC8166], each transmission of a
   request or reply involves sending a single RDMA send message and
   conversely each message-related transmission involves only a single
   RPC request or reply.

   This strict one-to-one model leads to some potential performance
   issues.

   o  Because of RDMA's use of fixed-size receives, some requests and
      replies will inevitably not fit in the limited space available,
      even if they do not contain any DDP-eligible bulk data.

      Such cases will raise performance issues because, to deal with
      them, the server is interrupted twice to receive a single request
      and all the necessary transfers are serialized. In particular,
      there are two server interrupt latencies involved before the
      server can process the actual request, in addition to the OTW
      round-trip latencies.

   o  In the case of replies, there may be cases in which reply chunks
      need to be allocated and registered even if the actual reply would
      fit within the fixed receive-size limit. Because the decision to
      create a reply chunk is made at the time the request is sent, even
      an extremely low probability of a longer reply will trigger
      allocation of a reply chunk.

      Because this decision is made in conformance with ULB rules,
      which, by their nature, may only reference a limited set of data,
      a reply chunk may be required even when the actual probability of
      a long reply is exactly zero. For example, a GETATTR request can
      generate a long reply due to a long ACL, and thus a COMPOUND with
      this operation might allocate a reply chunk, even if the specific
      file system being interrogated only supports ACLs of limited
      sizes, or the GETATTR in question does not interrogate one of the
      ACL attributes.
      Also, the OWNER attribute is a string and it may
      be impossible to determine a priori that the owner of any
      particular file has no chance of requiring more than 4K bytes of
      space, for example. The assumption that there are no such user
      names, while it probably is valid, is not a fact that
      RPC-over-RDMA implementations can depend on.

4.2.  Message Continuation Changes

   Continuing a single RPC request or reply is addressed by defining
   separate optional header types to begin and to continue sending a
   single RPC message. This is instead of creating a header with a
   continuation bit. In this approach, all of the fields relating to
   data placement, which include support for send-based data placement,
   appear in the starting header (of types ROPT_XMTREQ and ROPT_XMTRESP)
   and apply to the RPC message as a whole.

   Later RPC-over-RDMA messages (of type ROPT_XMTCONT) may extend the
   payload stream and/or provide additional buffers to which bulk data
   can be directed.

   In this case, all of the RPC-over-RDMA messages used together are
   referred to as a transmission group and must be received in order
   without any intervening message.

   In implementations using this optional facility, those decoding RPC
   messages received using RPC-over-RDMA no longer have the assurance
   that each RPC message is in a contiguous buffer. As most XDR
   implementations are built based on the assumption that input will not
   be contiguous, this will not affect performance in most cases.

4.3.  Message Continuation and Credits

   Using multiple transmissions to send a single request or response can
   complicate credit management. In the case of the message
   continuation feature, deadlocks can be avoided because use of message
   continuation is not obligatory. The requester or responder can use
   explicit RDMA operations if sufficient credits to use message
   continuation are not available.
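
   The fallback just described can be illustrated with a short sketch.
   It is not part of the extension and is offered only under stated
   assumptions: each transmission is assumed to consume one credit,
   every transmission is assumed to carry the same amount of payload,
   and the sender is assumed to hold at least one credit already. The
   names SendStrategy, transmissions_needed, and choose_strategy are
   hypothetical, and peer_group_limit stands in for a peer-advertised
   limit on the number of transmissions per message, such as the
   Request Transmission Receive Limit (Section 8.4).

```python
from enum import Enum


class SendStrategy(Enum):
    """Hypothetical labels for the ways a message could be conveyed."""
    SINGLE_SEND = "single send"
    MESSAGE_CONTINUATION = "message continuation"
    EXPLICIT_RDMA = "explicit RDMA"


def transmissions_needed(message_len: int, payload_per_send: int) -> int:
    """Return how many sends are needed to carry message_len payload bytes.

    Assumes every send carries up to payload_per_send bytes; an empty
    message still needs one transmission.
    """
    if payload_per_send <= 0:
        raise ValueError("payload_per_send must be positive")
    if message_len < 0:
        raise ValueError("message_len must not be negative")
    # Integer ceiling division, with a floor of one transmission.
    return max(1, (message_len + payload_per_send - 1) // payload_per_send)


def choose_strategy(message_len: int, payload_per_send: int,
                    credits_available: int,
                    peer_group_limit: int) -> SendStrategy:
    """Pick a strategy; message continuation is used only when it fits.

    Assumption: one credit per transmission, and at least one credit held.
    """
    needed = transmissions_needed(message_len, payload_per_send)
    if needed == 1:
        return SendStrategy.SINGLE_SEND
    if needed <= credits_available and needed <= peer_group_limit:
        return SendStrategy.MESSAGE_CONTINUATION
    # Not enough credits, or the peer's limit is too small: fall back to
    # explicit RDMA (for example, a position-zero read chunk for a request).
    return SendStrategy.EXPLICIT_RDMA
```

   Because the explicit RDMA path is always available as the last
   branch, a shortage of credits changes only how the message is
   conveyed and never leaves the sender waiting for credits that the
   peer cannot grant.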

   A requester is well positioned to make this choice with regard to the
   sending of requests. The requester must know, before sending a
   request, how long it will be, and therefore, how many credits it
   would require to send the request using message continuation. If
   these are not available, it can avoid message continuation by either
   creating read chunks sufficient to make the payload stream fit in a
   single transmission or by creating a position-zero read chunk.

   With regard to the response, the requester is not in position to know
   exactly how long the response will be. However, the ULB will allow
   the maximum response length to be determined based on the request.
   This value can be used:

   o  To determine the maximum number of receive buffers that might be
      required to receive any response sent.

   o  To allocate and register a reply chunk to hold a possible large
      reply.

   The requester can avoid doing the second of these if the responder
   has indicated it can use message continuation to send the response.
   In this case, it makes sure that the buffers will be available and
   indicates to the responder how many additional buffers (in the form
   of pre-posted receives) have been made available to accommodate
   continuation transmissions.

   When the responder processes the request, those additional receive
   buffers may be used or not, or used only in part. This may be
   because the response is shorter than the maximum possible response,
   or because a reply chunk was used to transmit the response.

   After the first or only transmission associated with the response is
   received by the requester, it can be determined how many of the
   additional buffers were used for the response. Any unused buffers
   can be made available for other uses such as expanding the pool of
   receive buffers available for the initial transmissions of responses
   or for receiving opposite direction requests.
   Alternatively, they
   can be kept in reserve for future uses, such as being made available
   to future requests which have potentially long responses.

5.  Using Protocol Additions

   In using existing RPC-over-RDMA facilities for protocol extension,
   interoperability with existing implementations needs to be assured.
   Because this document describes support for multiple features, we
   need to clearly specify the various possible extensions and how peers
   can determine whether certain facilities are supported by both ends
   of the connection.

5.1.  New Operation Support

   Note that most of the new operations defined in this extension are
   not tightly tied to a specific feature. XOPT_XMTREQ and XOPT_XMTRESP
   are designed to support implementations that support either or both
   of send-based data placement and message continuation. However, the
   converse is not the case and these header types can be implemented by
   those not supporting either of these features. For example,
   implementations may only need support for the facilities described in
   Section 3.4.

   Implementations may determine whether a peer implementation supports
   XOPT_XMTREQ, XOPT_XMTRESP, or XOPT_XMTCONT by attempting these
   operations. An alternative is to interrogate the RTR Support
   Property for information about which operations are supported.

5.2.  Message Continuation Support

   Implementations may determine and act based on the level of peer
   implementation of support for message continuation as follows:

   o  To deal with issues relating to sending the peer
      multi-transmission requests, the requester can interrogate the
      peer's value of the Request Transmission Receive Limit
      (Section 8.4).
      In
      cases in which the property is not provided or has the value one,
      the requester implementation can avoid sending multi-transmission
      requests, and use the equivalent of position-zero read chunks to
      convey a request larger than the receive buffer limit.

      Similarly, if the request is longer than can fit in a set of
      transmissions given that limit, the request can be conveyed in the
      same fashion.

   o  To deal with issues relating to sending the peer
      multi-transmission responses, responders will only send
      multi-transmission responses for requests conveyed using
      XOPT_XMTREQ where the number of response transmissions is less
      than or equal to the buffer reservation count (in the field
      optxrq_rsbuf). The requester can avoid receiving a message
      consisting of too many transmissions by setting this field
      appropriately. This includes the case in which the requester
      cannot handle any multi-transmission responses.

   o  To avoid reserving receive buffers that the responder is not
      prepared to use, the requester can interrogate the peer's value of
      the Response Transmission Send Limit (Section 8.5). In
      cases in which it is possible that a request might result in a
      response too large for this set of buffers, the requester can
      provide a reply chunk to receive the response, which
      the responder can use if the count of buffers provided is
      insufficient.

5.3.  Support for Send-based Data Placement

   Implementations may determine and adapt to the level of peer
   implementation support for send-based data placement as described
   below. Note that an implementation may be able to send messages
   containing bulk data items placed using send-based data placement
   while not being prepared to receive them, or the reverse.

   o  The requester can interrogate the responder's Receive Buffer
      Structure Property.
      In cases in which the property is not
      provided or shows no placement-targetable buffer segments, an
      implementation knows that messages containing bulk data may not be
      sent using send-based data placement. In such cases, when
      XOPT_XMTREQ is used to send a request, bulk items may be
      transferred by setting the associated placement information to
      indicate that the bulk data is to be fetched using explicit RDMA
      operations.

   o  In cases in which a requester is unprepared to accept messages
      using send-based data placement, its Receive Buffer Structure
      Property will make this clear to the responder. Nevertheless, the
      requester will generally indicate to the responder that bulk data
      items are to be returned using explicit RDMA operations. As a
      result, requesters may use XOPT_XMTREQ (and get the benefit of the
      placement-related features discussed in Section 3.4) even if they
      support neither message continuation nor send-based data
      placement.

   o  Since it is possible for a responder to generate responses
      containing bulk data using send-based data placement even if it is
      not prepared to receive such messages, a requester who is prepared
      to accept such messages should specify in the request that the
      responses are to contain (or may contain) bulk data placed in this
      way. In deciding whether this is to be done, the requester can
      interrogate the responder's RTR Support Property for information
      about whether the peer can send responses in this form. It
      can do this without regard to whether the responder can accept
      messages containing bulk data items placed using send-based data
      placement.

   In determining whether bulk data will be placed using send-based data
   placement or via explicit RDMA operations, the level of support for
   message continuation will have a role.
   This is because DDP using explicit RDMA will reduce message size
   while send-based data placement reduces the size of the payload
   stream by rearranging the message, leaving the message size the
   same. As a result, the considerations discussed in Section 4.3 will
   have to be attended to by the sender in determining which form of
   data placement is to be used.

5.4.  Error Reporting

   The more extensive transport layer functionality described in this
   document requires its own means of reporting errors, to deal with
   issues that are distinct from:

   o  Errors (including XDR errors) in the XDR stream as received by
      the responder or requester.

   o  XDR errors detected in the XDR headers defined by the base
      protocol.

   o  XDR errors detected in the new operations defined in this
      document.

   Beyond the above, the following sorts of errors will have to be
   dealt with, depending on which of the features of the extension are
   implemented.

   o  Information associated with send-based data placement may be
      inconsistent or otherwise invalid, even though it conforms to the
      XDR definition.

   o  There may be problems with the organization of transmission
      groups in that there are missing or extraneous transmissions.

   In each of the above cases, the problem will be reported to the
   sender using the Error Reporting operation, which needs to be
   supported by every endpoint that sends ROPT_XMTREQ, ROPT_XMTRESP, or
   ROPT_XMTCONT. This includes cases in which the problem is one with a
   reply. The function of the Error Reporting operation is to aid in
   diagnosing transport protocol errors and to allow the sender to
   recover or decide that recovery is not possible. Reporting failure
   to the requesting process is dealt with indirectly.
   For example,

   o  When the transmissions used to send a request are ill-formed, the
      requester can respond to the error indication by proceeding to
      send the request using existing (i.e., non-extended) facilities.
      If it chooses not to do so, the requester can report an RPC
      request failure to the initiator of the RPC.

   o  When the transmissions used to send a response are ill-formed,
      the responder needs to know about the problem since it will
      otherwise assume that the transmissions succeeded. It can
      proceed to resend the reply using existing (i.e., non-extended)
      facilities. If it chooses not to do so, the requester will not
      see a response and eventually an RPC timeout will occur.

6.  XDR Preliminaries

6.1.  Message Continuation Preliminaries

   In order to implement message continuation, we have occasion to
   refer to particular RPC-over-RDMA transmissions within a
   transmission group or to characteristics of a later transmission
   group.

      typedef uint32 xms_grpxn;
      typedef uint32 xms_grpxc;

      struct xms_id {
          uint32      xmsi_xid;
          msg_type    xmsi_dir;
          xms_grpxn   xmsi_seq;
      };

   An xms_grpxn designates a particular RPC-over-RDMA transmission
   within a set of transmissions devoted to sending a single RPC
   message.

   An xms_grpxc specifies the number of RPC-over-RDMA transmissions in
   a potential group of transmissions devoted to sending a single RPC
   message.

6.2.  Data Placement Preliminaries

   Data structures related to data placement use a number of XDR
   typedefs to help clarify the meaning of fields in the data
   structures which use these typedefs.

      typedef uint32 xmdp_itemlen;
      typedef uint32 xmdp_pldisp;
      typedef uint32 xmdp_vsdisp;

      typedef uint32 xmdp_tbsn;

      enum xmdp_type {
          XMPTYPE_EXRW     = 1,
          XMPTYPE_TBSN     = 2,
          XMPTYPE_CHOICE   = 3,
          XMPTYPE_BYSIZE   = 4,
          XMPTYPE_TOOSHORT = 5,
          XMPTYPE_NOITEM   = 6
      };

   An xmdp_itemlen specifies the length of an XDR item. Because items
   excised from the XDR stream are XDR items, lengths of items excised
   from the XDR stream are denoted by xmdp_itemlens.

   An xmdp_pldisp specifies a specific displacement within the payload
   stream associated with a single RPC-over-RDMA transmission or a
   group of such transmissions. Note that when multiple transmissions
   are used for a single message, all of the payload streams within a
   transmission group are considered concatenated.

   An xmdp_vsdisp specifies a displacement within the virtual XDR
   stream associated with the set of RPC messages transferred by a
   single RPC-over-RDMA transmission or a group of such transmissions.
   The virtual XDR stream includes bulk data excised from the payload
   stream and so displacements within it reflect those of the
   corresponding objects in the XDR stream that might be sent and
   received if no bulk data excision facilities were involved in the
   RPC transmission.

   An xmdp_tbsn designates a particular target buffer segment within a
   (trivial or non-trivial) RPC-over-RDMA transmission group. Each
   placement-targetable buffer segment is assigned a number starting
   with zero and proceeding through all the buffer segments for all the
   RPC-over-RDMA transmissions in the group. This includes buffer
   segments not actually used because transmissions are shorter than
   the maximum size and those in which a placement-targetable buffer
   segment is used to hold part of the payload XDR stream rather than
   bulk data.

   An xmdp_type allows a selection between placement using explicit
   RDMA operations (i.e.
   DDP) and send-based data placement. Fields of this type are used in
   a number of contexts. The specific context governs which subset of
   the types is valid:

   o  In request messages, they indicate where each of the specially
      placed data items within the request has been placed. In this
      case, xmdp_type appears as the discriminator within an xmdp_loc
      which is part of an xmdp_mitem that is an element within a
      request's optxrq_dp field.

   o  In request messages, they direct the responder as to where
      potential specially placed items are to be placed. In this case,
      xmdp_type appears as the discriminator within an xmdp_rsdloc
      which is part of an xmdp_rsditem that is an element within a
      request's optxrq_rsd field.

   o  In response messages, they indicate how each of the potential
      specially placed items has been dealt with. A subset of these
      are specially placed data items, presented in the same form as
      that used for specially placed data items within a request. In
      this case, xmdp_type appears as the discriminator within an
      xmdp_loc which is part of an xmdp_mitem that is an element within
      a response's optxrs_dp field.

   A number of these types are valid in all of these contexts, since
   they specify use of a specific mode of data placement which is to be
   used or has been used.

   o  XMPTYPE_EXRW selects DDP using explicit RDMA reads and writes.

   o  XMPTYPE_TBSN selects use of send-based data placement in which
      placement-eligible data is located in placement-targetable buffer
      segments.

   Another set of types is used to direct the use of specific sets of
   types but cannot specify an actual choice that has been made.

   o  XMPTYPE_CHOICE indicates that the responder may use either
      send-based data placement or chunk-based DDP using explicit RDMA
      operations, with a target location for the latter having been
      provided by the requester.

   o  XMPTYPE_BYSIZE indicates that the responder is to use either
      send-based data placement or chunk-based DDP using explicit RDMA
      operations, with the choice between the two governed by the
      actual size of the associated DDP-eligible XDR item.

   The following types are used when no actual special placement has
   occurred. They are used in responses to indicate ways in which a
   direction to govern data placement in a reply was responded to
   without resulting in special placement.

   o  XMPTYPE_TOOSHORT indicates that the corresponding entry in an
      xmdp_rsdset was matched with a DDP-eligible item which was too
      small to be handled using special placement, resulting in the
      DDP-eligible item being placed inline.

   o  XMPTYPE_NOITEM indicates that the corresponding entry in an
      xmdp_rsdset was not matched with a DDP-eligible item in the
      reply.

   The following table indicates which of the above types is valid in
   each of the contexts in which these types may appear. For valid
   occurrences, it distinguishes those which give sender-generated
   information about the message, and those that direct reply
   construction, from those that indicate how those directions governed
   the construction of a reply. For invalid occurrences, we
   distinguish between those that result in XDR decode errors and those
   which are valid from the XDR point of view but are semantically
   invalid.

   +------------------+--------------+-----------------+---------------+
   | Type             | xmdp_loc in  | xmdp_rsdloc in  | xmdp_loc in   |
   |                  | request      | request         | response      |
   +------------------+--------------+-----------------+---------------+
   | XMPTYPE_EXRW     | Valid Info   | Valid Direction | Valid Result  |
   | XMPTYPE_TBSN     | Valid Info   | Valid Direction | Valid Result  |
   | XMPTYPE_BYSIZE   | XDR Invalid  | Valid Direction | XDR Invalid   |
   | XMPTYPE_CHOICE   | XDR Invalid  | Valid Direction | XDR Invalid   |
   | XMPTYPE_TOOSHORT | Sem. Invalid | XDR Invalid     | Valid Result  |
   | XMPTYPE_NOITEM   | Sem. Invalid | XDR Invalid     | Valid Result  |
   +------------------+--------------+-----------------+---------------+

                                  Table 1

7.  Data Placement Structures

7.1.  Data Placement Overview

   To understand the new data placement structures defined here, it is
   necessary to review the existing DDP structures used in
   RPC-over-RDMA Version One and look at the corresponding structures
   in the new message transmission headers defined in this document.

   We look first at the existing structures.

   o  Read chunks are specified on requests to indicate data items to
      be excised from the payload stream and fetched from the
      requester's memory by the responder. As such, they serve as a
      means of supplying data excised from the payload XDR stream.

      Read chunks appear in replies but they have no clear function
      there.

   o  Write chunks are specified on requests to provide locations in
      requester memory to which DDP-eligible items in the corresponding
      reply are to be transferred. They do not describe data in the
      request but serve to direct reply construction.

      When write chunks appear in replies they serve to indicate the
      length of the data transferred. The addresses to which the bulk
      reply data has been transferred are available, but this
      information is already known to the requester.

   o  Reply chunks are specified to provide a location in the
      requester's memory to which the responder can transfer the
      response using RDMA Write. Like write chunks, they do not
      describe data in the request but serve to direct reply
      construction.

      When reply chunks appear in reply message headers, they serve
      mainly to indicate whether the reply chunk was actually used.

   Within the data placement structures defined here a different
   organization is used, even where DDP using explicit RDMA operations
   is supported.

   o  All messages that contain bulk data contain structures that
      indicate where the excised data is located. See Section 7.3 for
      details.

   o  Requests that might generate replies containing bulk data contain
      structures that provide guidance as to where the bulk data is to
      be placed. See Section 7.4 for details.

   Both sets of data structures are defined at the granularity of an
   RPC-over-RDMA transmission group. That is, they describe the
   placement of data within an RPC message and the scope of description
   is not limited to a single RPC-over-RDMA transmission.

7.2.  Buffer Structure Definition

   Buffer structure definition information is used to allow the sender
   to know how receive buffers are constructed, to allow it to
   appropriately pad messages being sent so that bulk data will be
   received into a memory area with the appropriate characteristics.

   In this case, data placement will not place data in a specific
   address, picked and registered in advance as is done to effect DDP
   using explicit RDMA operations. Instead, a message is sent so that
   when it is matched with one of the preposted receives, the bulk data
   will be received into a memory area with the appropriate
   characteristics, including:

   o  size

   o  alignment

   o  placement-targetability and potentially other memory
      characteristics such as speed and persistence.

      struct xmrbs_seg {
          uint32      xmrseg_length;
          uint32      xmrseg_align;
          uint32      xmrseg_flags;
      };

      const uint32 XMRSFLAG_PLT = 0x01;

      struct xmrbs_group {
          uint32      xmrgrp_count;
          xmrbs_seg   xmrgrp_info;
      };

      struct xmrbs_buf {
          uint32      xmrbuf_length;
          xmrbs_group xmrbuf_groups<>;
      };

   Buffers can be, and typically are, structured to contain multiple
   segments. Preposted receives that target a buffer use a scatter
   list to place received messages in successive buffer segments.

   An xmrbs_seg defines a single buffer segment.
   The fields included are:

   o  xmrseg_length is the length of this contiguous buffer segment.

   o  xmrseg_align specifies the guaranteed alignment for the
      corresponding buffer segment.

   o  xmrseg_flags specifies some noteworthy characteristics of the
      associated buffer segment.

   The following flag bit is the only one currently defined:

   o  XMRSFLAG_PLT indicates that the buffer segment in question is to
      be considered suitable as a target for data placement.

   An xmrbs_group designates a set of buffer segments, all with the
   same buffer segment characteristics as indicated by xmrgrp_info.
   The buffer segments are contiguous within the buffer although they
   are likely not to be physically contiguous.

   An xmrbs_buf defines a receiver's buffer structure and consists of
   multiple xmrbs_groups. This buffer structure, when made available
   as a transport property, allows the sender to structure
   transmissions so as to place DDP-eligible data in appropriate target
   buffer segments.

7.3.  Message Data Placement Structures

   These data structures show where in the virtual XDR stream for the
   set of messages, data is to be excised from that XDR stream and
   where that excised bulk data is to be found instead.

      union xmdp_loc switch(xmdp_type type) {

          case XMPTYPE_EXRW:
              rpcrdma1_segment    xmdl_ex<>;
          case XMPTYPE_TBSN:
              xmdp_itemlen        xmdl_offset;
              xmdp_tbsn           xmdl_bsnum<>;
          case XMPTYPE_TOOSHORT:
          case XMPTYPE_NOITEM:
              void;
      };

      struct xmdp_mitem {
          xmdp_vsdisp     xmdmi_disp;
          xmdp_itemlen    xmdmi_length;
          xmdp_loc        xmdmi_where;
      };

      typedef xmdp_mitem xmdp_grpinfo<>;

   An xmdp_loc shows where a particular piece of bulk data is located.
   This information exists in multiple forms.

   o  The case for DDP using explicit RDMA operations contains, in
      xmdl_ex, an array of rpcrdma1_segments showing where bulk data is
      to be fetched from or has been transferred to.

   o  The case for send-based data placement contains, in xmdl_bsnum,
      an array of placement-targetable buffer segment numbers,
      indicating where bulk data, excised from the payload stream, is
      actually located. The bulk data starts xmdl_offset bytes into
      the buffer segment designated by xmdl_bsnum[0] and then proceeds
      through buffer segments denoted by successive xmdl_bsnum entries
      until the length of the data item is exhausted.

   o  The cases for XMPTYPE_TOOSHORT and XMPTYPE_NOITEM are only valid
      in responses.

   An xmdp_mitem denotes a specific item of bulk data. It consists of:

   o  The XDR stream displacement of the bulk data excised from the
      payload stream, in xmdmi_disp.

   o  The length of the data item, in xmdmi_length.

   o  The actual location of the bulk data, in xmdmi_where.

   An xmdp_grpinfo consists of an array of xmdp_mitems describing all
   of the bulk data excised from all RPC messages sent in a single
   RPC-over-RDMA transmission group. Some possible cases:

   o  The array is of length zero, indicating that there is no
      DDP-eligible data excised from the virtual XDR stream. In this
      case, the virtual XDR stream and the payload stream are
      identical.

   o  The array consists of one or more xmdp_mitems, each of whose
      xmdmi_where fields is of type XMPTYPE_EXRW. In this case, the
      placement data corresponds to read chunks in the case in which a
      request is being sent and to write chunks in the case in which a
      reply is being sent.

   o  The array consists of one or more xmdp_mitems, each of whose
      xmdmi_where fields is of type XMPTYPE_TBSN. In this case, each
      entry, whether it applies to bulk data in a request or a reply,
      describes data logically part of the message being sent, which
      may be part of any of the RPC-over-RDMA transmissions in the same
      transmission group.

   o  The array consists of one or more xmdp_mitems, with xmdmi_where
      fields of a mixture of types. In this case, each entry, whether
      it applies to bulk data in a request or a reply, describes data
      logically part of the message being sent, although the method of
      getting access to that data may vary from entry to entry.

7.4.  Response Direction Data Placement Structures

   These data structures, when sent as part of the request, instruct
   the responder how to use data placement to place response data
   subject to special data placement.

      union xmdp_rsdloc switch(xmdp_type type) {

          case XMPTYPE_EXRW:
          case XMPTYPE_CHOICE:
              rpcrdma1_segment    xmdrsdl_ex<>;
          case XMPTYPE_BYSIZE:
              xmdp_itemlen        xmdrsdl_dsdov;
              rpcrdma1_segment    xmdrsdl_bsex<>;
          case XMPTYPE_TBSN:
              void;
      };

      struct xmdp_rsdrange {
          xmdp_vsdisp     xmdrsdr_begin;
          xmdp_vsdisp     xmdrsdr_end;
      };

      struct xmdp_rsditem {
          xmdp_itemlen    xmdrsdi_minlen;
          xmdp_rsdloc     xmdrsdi_loc;
      };

      struct xmdp_rsdset {
          xmdp_rsdrange   xmdrsds_range;
          xmdp_rsditem    xmdrsds_items<>;
      };

      typedef xmdp_rsdset xmdp_rsdgroup<>;

   An xmdp_rsdloc contains information specifying where bulk data
   generated as part of a reply is to be placed. This information is
   defined as a union with the following cases:

   o  The case for DDP using explicit RDMA operations, XMPTYPE_EXRW,
      contains, in xmdrsdl_ex, an array of rpcrdma1_segments showing
      where bulk data generated by the corresponding reply is to be
      transferred to.

   o  The case allowing the responder to freely choose the data
      placement method, XMPTYPE_CHOICE, is identical. It also
      contains, in xmdrsdl_ex, an array of rpcrdma1_segments showing
      where bulk data generated by the corresponding reply is to be
      transferred to if explicit RDMA operations are to be used.

   o  The case for send-based data placement, XMPTYPE_TBSN, is void,
      since the decisions as to where bulk data is to be placed are
      made by the responder.

   o  In the case directing the responder to choose the data placement
      method based on item size, XMPTYPE_BYSIZE, an array of
      rpcrdma1_segments is in xmdrsdl_bsex.

   In all cases, each xmdp_rsdloc sent as part of a request has a
   corresponding xmdp_loc in the associated response. The xmdp_type
   specified in the request will affect the type in the response, but
   the types are not necessarily the same. The table below describes
   the valid combinations of request and response xmdp_type values.

   In this table, rows correspond to types in requests directing the
   responder as to the desired placement in the response, while the
   columns correspond to types in the ensuing response. Invalid
   combinations are labelled "Inv." while valid combinations are
   labelled either "NDR", denoting no need to deregister memory, or
   "DR", to indicate that memory previously registered will need to be
   deregistered.

   +---------+--------+--------+-----------+---------+
   | Type    | EXRW   | TBSN   | TOOSHORT  | NOITEM  |
   +---------+--------+--------+-----------+---------+
   | EXRW    | DR     | Inv.   | DR        | DR      |
   | TBSN    | Inv.   | NDR    | NDR       | NDR     |
   | CHOICE  | DR     | NDR    | DR        | DR      |
   | BYSIZE  | DR     | NDR    | DR        | DR      |
   +---------+--------+--------+-----------+---------+

                         Table 2

   An xmdp_rsdrange denotes a range of positions in the XDR stream
   associated with a request. Particular directions regarding bulk
   data in the corresponding response are limited to such ranges, where
   response XDR stream positions and request XDR stream positions can
   be reliably tied together.
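
   As an illustration only, the combinations listed in Table 2 can be
   captured in a small lookup. The following Python sketch is not part
   of the protocol definition. The names TABLE_2 and
   needs_deregistration are hypothetical, and the type names are
   abbreviated in the same way as in the table.

```python
# Illustrative encoding of Table 2; not part of the protocol definition.
# Keys pair the type sent in a request's xmdp_rsdloc with the type
# returned in the matching xmdp_loc of the response.
#   "DR"  = previously registered memory needs to be deregistered.
#   "NDR" = no need to deregister memory.
# Pairs that are absent correspond to the "Inv." entries of the table.
TABLE_2 = {
    ("EXRW", "EXRW"): "DR",
    ("EXRW", "TOOSHORT"): "DR",
    ("EXRW", "NOITEM"): "DR",
    ("TBSN", "TBSN"): "NDR",
    ("TBSN", "TOOSHORT"): "NDR",
    ("TBSN", "NOITEM"): "NDR",
    ("CHOICE", "EXRW"): "DR",
    ("CHOICE", "TBSN"): "NDR",
    ("CHOICE", "TOOSHORT"): "DR",
    ("CHOICE", "NOITEM"): "DR",
    ("BYSIZE", "EXRW"): "DR",
    ("BYSIZE", "TBSN"): "NDR",
    ("BYSIZE", "TOOSHORT"): "DR",
    ("BYSIZE", "NOITEM"): "DR",
}


def needs_deregistration(request_type, response_type):
    """Hypothetical helper: True if the requester must deregister memory."""
    outcome = TABLE_2.get((request_type, response_type))
    if outcome is None:
        # The pair is one of the combinations the table marks as invalid.
        raise ValueError(
            "invalid combination: %s in request, %s in response"
            % (request_type, response_type)
        )
    return outcome == "DR"
```

   A requester might consult such a lookup once per xmdp_rsditem when
   processing a response, treating a ValueError as a semantically
   invalid reply.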

   When the ULP supports multiple individual operations per RPC request
   (e.g., COMPOUND and CB_COMPOUND in NFSv4), an xmdp_rsdrange can
   isolate elements of the reply due to particular operations.

   An xmdp_rsditem specifies the handling of one potential item of bulk
   data. The handling specified is qualified by a length range. If
   the item is smaller than xmdrsdi_minlen, it is not treated as bulk
   data and the corresponding data item appears in the payload stream,
   while that particular xmdp_rsditem is considered used up, making the
   next xmdp_rsditem in the xmdp_rsdset the target of the next
   DDP-eligible data item in the reply. Note that in the case in which
   xmdrsdi_loc specifies use of explicit RDMA operations, the area
   specified is not used and the requester is responsible for
   deregistering it.

   For each xmdp_rsditem, there will be a corresponding xmdp_mitem in
   the associated response.

   An xmdp_rsdset contains a set of xmdp_rsditems applicable to a given
   xmdp_rsdrange in the request.

   An xmdp_rsdgroup designates a set of xmdp_rsdsets applicable to a
   particular RPC-over-RDMA transmission group. The xmdrsds_range
   fields of successive xmdp_rsdsets must be disjoint and in strictly
   increasing order.

8.  Transport Properties

8.1.  Property List

   In this document we take advantage of the fact that the set of
   transport properties defined in [I-D.cel-nfsv4-rpcrdma-version-two]
   is subject to later extension. The additional transport properties
   are summarized below in Table 3.

   In that table the columns have the following values:

   o  The column labeled "property" identifies the transport property
      described by the current row.

   o  The column labeled "#" specifies the propid value used to
      identify this property.

   o  The column labeled "XDR type" gives the XDR type of the data used
      to communicate the value of this property.
      This data overlays the nominally opaque field pv_data in a
      propval.

   o  The column labeled "default" gives the default value for the
      property which is to be assumed by those who do not receive, or
      are unable to interpret, information about the actual value of
      the property.

   o  The column labeled "section" indicates the section (within this
      document) that explains the semantics and use of this transport
      property.

   +------------------------------+----+-----------+---------+---------+
   | property                     | #  | XDR type  | default | section |
   +------------------------------+----+-----------+---------+---------+
   | RTR Support                  | 3  | uint32    | 0       | 8.2     |
   | Receive Buffer Structure     | 4  | xmrbs_buf | Note 1  | 8.3     |
   | Request Transmission Receive | 5  | xms_grpxc | 1       | 8.4     |
   | Limit                        |    |           |         |         |
   | Response Transmission Send   | 6  | xms_grpxc | 1       | 8.5     |
   | Limit                        |    |           |         |         |
   +------------------------------+----+-----------+---------+---------+

                                  Table 3

   The following notes apply to the above table:

   1.  The default value for the Receive Buffer Structure always
       consists of a single buffer segment, without any alignment
       restrictions and not targetable for DDP. The length of that
       buffer segment derives from the Receive Buffer Size Property if
       available, and from the default receive buffer size otherwise.

8.2.  RTR Support Property

      const uint32 XPROP_RTRSUPP = 3;
      typedef uint32 xpr_rtrs;

      const uint32 RTRS_XREQ  = 1;
      const uint32 RTRS_XRESP = 2;
      const uint32 RTRS_XCONT = 4;

8.3.  Receive Buffer Structure Property

   This property defines the structure of the endpoint's receive
   buffers, in order to give a sender the ability to place bulk data in
   specific DDP-targetable buffer segments.

      const uint32 XPROP_RBSTRUCT = 4;
      typedef xmrbs_buf xpr_rbs;

   Normally, this property, if specified, should be in agreement with
   the Receive Buffer Size Property. However, the following rules
   apply.

   o  If the value of the Receive Buffer Structure Property is not
      specified, it is derived from the Receive Buffer Size Property,
      if known, and the default buffer size otherwise. The buffer is
      considered to consist of a single non-DDP-targetable segment
      whose size is the buffer size.

   o  If the value of the Receive Buffer Size Property is not specified
      and the Receive Buffer Structure Property is specified, the value
      of the former is derived from the latter, by adding up the
      lengths of all buffer segments specified.

8.4.  Request Transmission Receive Limit Property

   This property specifies the length of the longest request message
   (in terms of number of transmissions) that a responder will accept.

      const uint32 XPROP_REQRXLIM = 5;
      typedef uint32 xpr_rqrxl;

   A requester can use this property to determine whether to send long
   requests by using message continuation or by using a position-zero
   read chunk.

8.5.  Response Transmission Send Limit Property

   This property specifies the length of the longest response message
   (in terms of number of transmissions) that a responder will
   generate.

      const uint32 XPROP_RESPSXLIM = 6;
      typedef uint32 xpr_rssxl;

9.  New Operations

9.1.  Operations List

   The proposed new operations are set forth in Table 4 below. In that
   table, the columns have the following values:

   o  The column labeled "operation" specifies the particular
      operation.

   o  The column labeled "#" specifies the value of opttype for this
      operation.

   o  The column labeled "XDR type" gives the XDR type of the data
      structure used to describe the information in this new message
      type.
      This data overlays the nominally opaque field optinfo in an
      RDMA_OPTIONAL message.

   o  The column labeled "msg" indicates whether this operation is
      followed (or not) by an RPC message payload (or something else).

   o  The column labeled "section" indicates the section (within this
      document) that explains the semantics and use of this optional
      operation.

   +--------------------+----+--------------+--------+----------+
   | operation          | #  | XDR type     | msg    | section  |
   +--------------------+----+--------------+--------+----------+
   | Transmit Request   | 5  | optxmt_req   | Note 1 | 9.2      |
   | Transmit Response  | 6  | optxmt_resp  | Note 1 | 9.3      |
   | Transmit Continue  | 7  | optxmt_cont  | Note 2 | 9.4      |
   | Report Error       | 8  | optrept_err  | No     | 9.5      |
   +--------------------+----+--------------+--------+----------+

                               Table 4

   The following notes apply to the above table:

   1.  Contains an initial segment of the message payload stream for an
       RPC message, or the entire payload stream. The optxr[qs]_pslen
       field indicates the length of the section present.

   2.  May contain a part of a message payload stream for an RPC
       message, although not the entire payload stream. The
       optxc_pslen field, if non-zero, indicates that this portion is
       present, and the length of the section.

9.2.  Transmit Request Operation

   The message definition for this operation is as follows:

      const uint32 ROPT_XMTREQ = 1;

      struct optxmt_req {
          xmdp_grpinfo    optxrq_dp;
          xmdp_rsdgroup   optxrq_rsd;
          xms_grpxc       optxrq_count;
          xms_grpxc       optxrq_rsbuf;
          xmdp_pldisp     optxrq_pslen;
      };

   The field optxrq_dp describes the fields in the virtual XDR stream
   which have been excised in forming the payload stream, and
   information about where the corresponding bulk data is located.

   The field optxrq_rsd consists of information directing the responder
   as to how to construct the reply, in terms of DDP.

   The field optxrq_count specifies the count of transmissions in this
   group of transmissions used to send a request.

   The field optrq_repch serves as a way to transfer a reply chunk to
   the responder to serve as a way in which a reply longer than the
   inline size limit may be transferred. Although not prohibited by
   the protocol, it is unlikely to be used in environments in which
   message continuation is supported.

   The field optxrq_pslen gives the length of the payload stream for
   the RPC transmitted. The payload stream begins right after the end
   of the optxmt_req and proceeds for optxrq_pslen bytes. This can
   include crossing buffer segment boundaries.

9.3.  Transmit Response Operation

   The message definition for this operation is as follows:

      const uint32 ROPT_XMTRESP = 2;

      struct optxmt_resp {
          xmdp_grpinfo    optxrs_dp;
          xms_grpxn       optxrs_count;
          xmdp_pldisp     optxrs_pslen;
      };

   The field optxrs_dp describes the fields in the virtual XDR stream
   which have been excised in forming the payload stream, and
   information about where the corresponding bulk data is located.

   The field optxrs_count specifies the count of transmissions in this
   group of transmissions used to send a reply.

   The field optxrs_pslen gives the length of the payload stream for
   the RPC transmitted. The payload stream begins right after the end
   of the optxmt_resp and proceeds for optxrs_pslen bytes. This can
   include crossing buffer segment boundaries.

9.4.  Transmit Continue Operation

   RPC-over-RDMA headers of this type are used to continue RPC messages
   begun by an RPC-over-RDMA message of type ROPT_XMTREQ or
   ROPT_XMTRESP. The xid field of this message must match that in the
   initial transmission.
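
   The statements above that a payload stream begins right after the
   operation header and can cross buffer segment boundaries can be
   illustrated with a short Python sketch. It is a sketch only and
   rests on stated assumptions: the receive buffer is described by a
   flat list of segment lengths, the transmission is received starting
   at the beginning of the first segment, and the function name
   payload_spans and its parameters are hypothetical.

```python
def payload_spans(segment_lengths, transmission_len, header_len, pslen):
    """Map a payload stream onto receive buffer segments.

    Returns a list of (segment_index, offset_in_segment, length) tuples
    covering the pslen bytes that follow the header_len bytes of header.
    All names and the flat segment list are illustrative assumptions.
    """
    if transmission_len > sum(segment_lengths):
        raise ValueError("transmission does not fit in the receive buffer")
    if header_len + pslen > transmission_len:
        # The payload area would not be a subset of the transmission.
        raise ValueError("payload area extends past the transmission")

    spans = []
    start = header_len      # absolute position of the next payload byte
    remaining = pslen       # payload bytes not yet assigned to a segment
    base = 0                # absolute position where this segment begins
    for index, seg_len in enumerate(segment_lengths):
        seg_end = base + seg_len
        if remaining > 0 and start < seg_end:
            offset = start - base
            take = min(seg_len - offset, remaining)
            spans.append((index, offset, take))
            start += take
            remaining -= take
        base = seg_end
    return spans
```

   For instance, with segments of 1024 and 4096 bytes, a 100-byte
   header and a 2000-byte payload, the payload occupies the last 924
   bytes of the first segment and the first 1076 bytes of the second.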

   This operation needs to be supported for the message continuation
   feature to be used.

   The message definition for this operation is as follows:

      const uint32 ROPT_XMTCONT = 3;

      struct optxmt_cont {
          xms_grpxn       optxc_xnum;
          uint32          optxc_itype;
          xmdp_pldisp     optxc_pslen;
      };

   The field optxc_xnum indicates the transmission number of this
   transmission within its transmission group.

   The field optxc_pslen gives the length of the section of the payload
   stream which is located in the current RPC-over-RDMA transmission.
   It is valid for this length to be zero, indicating that there is no
   portion of the payload stream in this transmission. Except when the
   length is zero, the payload stream begins right after the end of the
   optxmt_cont and proceeds for optxc_pslen bytes. This can include
   crossing buffer segment boundaries. In any case, the payload
   streams for all transmissions within the same group are considered
   concatenated.

9.5.  Error Reporting Operation

   This RPC-over-RDMA message type is used to signal the occurrence of
   errors that do not involve:

   1.  Transmission of a message that violates the rules specified in
       [I-D.cel-nfsv4-rpcrdma-version-two].

   2.  Transmission of a message described in this document which does
       not conform to the XDR specified here.

   3.  The transmission of a message which, when assembled according to
       the rules here, cannot be decoded according to the XDR for the
       ULP.

   Such errors can arise if the rules specified in this document are
   not followed and can be the result of a mismatch between multiple
   items, each of which is valid when considered on its own.
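
   As an illustration of the kind of cross-transmission checking
   involved, the following Python sketch reassembles the payload stream
   of a transmission group as described in Section 9.4 and rejects
   groups whose transmissions are mutually inconsistent. It is a
   sketch under stated assumptions rather than a definitive
   implementation: the Transmission class and the reassemble function
   are hypothetical, transmissions are assumed to be numbered from
   zero with the initial transmission implicitly numbered zero, and
   failures are raised as exceptions rather than encoded as Report
   Error operations.

```python
from dataclasses import dataclass


@dataclass
class Transmission:
    """Hypothetical, already-decoded view of one transmission."""
    xid: int         # xid field of the transmission
    xnum: int        # position in the group (assumed to start at zero)
    payload: bytes   # the pslen bytes of payload stream carried here


def reassemble(group_count, transmissions):
    """Concatenate the payload streams of one transmission group.

    group_count is the count announced by the initial transmission.
    Raises ValueError if the group is ill-formed.
    """
    if group_count < 1 or len(transmissions) != group_count:
        raise ValueError("missing or extraneous transmissions")

    expected_xid = transmissions[0].xid
    parts = []
    for expected_xnum, xmit in enumerate(transmissions):
        if xmit.xid != expected_xid:
            # Would be reported as OPTRERR_BADXID.
            raise ValueError("unexpected xid in continuation context")
        if xmit.xnum != expected_xnum:
            # Would be reported as OPTRERR_BADSEQ.
            raise ValueError("unexpected transmission sequence number")
        # A zero-length payload section is valid and contributes nothing.
        parts.append(xmit.payload)
    return b"".join(parts)
```

   Each transmission in such a group can be well-formed on its own; it
   is only when the group is examined as a whole that an unexpected
   xid or sequence number becomes visible.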
   The preliminary error-related definition is as follows:

      enum optr_err {
              OPTRERR_BADHMT = 1,
              OPTRERR_BADOMT = 2,
              OPTRERR_BADCONT = 3,
              OPTRERR_BADSEQ = 4,
              OPTRERR_BADXID = 5,
              OPTRERR_BADOFF = 6,
              OPTRERR_BADTBSN = 7,
              OPTRERR_BADPL = 8
      };

      union optr_info switch(optr_err optre_which) {

          case OPTRERR_BADHMT:
          case OPTRERR_BADOMT:
          case OPTRERR_BADSEQ:
          case OPTRERR_BADXID:
              uint32      optri_expect;
              uint32      optri_current;

          case OPTRERR_BADCONT:
              void;

          case OPTRERR_BADTBSN:
          case OPTRERR_BADOFF:
          case OPTRERR_BADPL:
              uint32      optri_value;
              uint32      optri_min;
              uint32      optri_max;

      };

   optr_err enumerates the various error conditions that might be
   reported.

   o  OPTRERR_BADHMT indicates that a header message type other than the
      one expected was received.  In this context, a particular message
      type can be considered "expected" only because of message or group
      continuation.

   o  OPTRERR_BADOMT indicates that an optional message type other than
      the one expected was received.  In this context, a particular
      message type can be considered "expected" only because of message
      or group continuation.

   o  OPTRERR_BADCONT indicates that a continuation message was
      received when there was no reason to expect one.

   o  OPTRERR_BADSEQ indicates that a transmission sequence number other
      than the one expected was received.

   o  OPTRERR_BADXID indicates that an xid other than the one expected
      was received in a continuation context.

   o  OPTRERR_BADTBSN indicates that an invalid target buffer sequence
      number was received.

   o  OPTRERR_BADOFF indicates that a bad offset was received as part of
      an xmdp_loc.  This is typically because the offset is larger than
      the buffer segment size.

   o  OPTRERR_BADPL indicates that a bad value was received for the
      payload length.
      This is typically because the length would make the area devoted
      to the payload stream not a subset of the actual transmission.

   The optr_info gives information about the specific invalid field
   being reported.  The additional information given depends on the
   specific error.

   o  For the errors OPTRERR_BADHMT, OPTRERR_BADOMT, OPTRERR_BADSEQ, and
      OPTRERR_BADXID, the expected and actual values of the field are
      reported.

   o  For the error OPTRERR_BADCONT, no additional information is
      provided.

   o  For the errors OPTRERR_BADTBSN, OPTRERR_BADOFF, and OPTRERR_BADPL,
      the actual value together with a range of valid values is
      provided.  When the actual value is within the valid range, it can
      be inferred that the actual value is not properly aligned (e.g.,
      not on a 32-bit boundary).

   The message definition for this operation is as follows:

      const uint32 ROPT_REPTERR = 4;

      struct optrept_err {
              xms_id          optre_bad;
              xms_id          *optre_lead;
              optr_info       optre_info;
      };

   The field optre_bad is a description of the transmission on which the
   error was actually detected.

   The optional field optre_lead is a description of an earlier
   transmission that might have led to the error reported.

   The field optre_info provides information about the specific error
   being reported.

10.  XDR

   This section contains an XDR [RFC4506] description of the proposed
   extension.

   This description is provided in a way that makes it simple to extract
   into ready-to-use form.  The reader can apply the following shell
   script to this document to produce a machine-readable XDR description
   of the extension, which can be combined with the XDR for the base
   protocol to produce an XDR description that includes the base
   protocol together with the optional extensions.

      #!/bin/sh
      grep '^ *///' | sed 's?^ */// ??' | sed 's?^ *///$??'
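   Where a POSIX shell is not at hand, the same filtering might be
   sketched in Python.  This is an illustrative equivalent only; the
   function name extract_xdr is hypothetical, and the shell script above
   remains the form this document refers to.

```python
import re


def extract_xdr(text):
    """Return the XDR description carried on '///'-marked lines."""
    out = []
    for line in text.splitlines():
        # Keep only lines whose first non-space characters are '///'.
        if not re.match(r"^ *///", line):
            continue
        # Strip the marker together with the single space after it.
        line = re.sub(r"^ */// ", "", line)
        # A bare marker line becomes an empty line.
        line = re.sub(r"^ *///$", "", line)
        out.append(line)
    return "\n".join(out) + ("\n" if out else "")
```

   The function takes the full text of the document as a string and
   returns the extracted description, which the caller can then write
   to a file.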
   That is, if the above script is stored in a file called "extract.sh"
   and this document is in a file called "ext.txt" then the reader can
   do the following to extract an XDR description file for this
   extension:

      sh extract.sh < ext.txt > xmitext.x

   The XDR description for this extension can be combined with that for
   other extensions and that for the base protocol.  While this is a
   complete description and can be processed by the XDR compiler, the
   result might not be usable to process the extended protocol, for a
   number of reasons:

      The RPC-over-RDMA transport headers do not constitute an RPC
      program, and version negotiation and message selection are part
      of the XDR, rather than being external to it.

      Headers used for requests and replies are not necessarily paired,
      as they would be in an RPC program.

      Header types defined as optional extensions overlay existing
      nominally opaque fields in the base protocol.  While this overlay
      architecture allows code aware of the overlay relationships to
      have a more complete view of header structure, this overlay
      relationship cannot be expressed within the XDR language.

10.1.  Code Component License

   Code components extracted from this document must include the
   following license text.  When the extracted XDR code is combined with
   other complementary XDR code which itself has an identical license,
   only a single copy of the license text need be preserved.

   /// /*
   ///  * Copyright (c) 2010, 2016 IETF Trust and the persons
   ///  * identified as authors of the code.  All rights reserved.
   ///  *
   ///  * The author of the code is: D. Noveck.
   ///  *
   ///  * Redistribution and use in source and binary forms, with
   ///  * or without modification, are permitted provided that the
   ///  * following conditions are met:
   ///  *
   ///  * - Redistributions of source code must retain the above
   ///  *   copyright notice, this list of conditions and the
   ///  *   following disclaimer.
   ///  *
   ///  * - Redistributions in binary form must reproduce the above
   ///  *   copyright notice, this list of conditions and the
   ///  *   following disclaimer in the documentation and/or other
   ///  *   materials provided with the distribution.
   ///  *
   ///  * - Neither the name of Internet Society, IETF or IETF
   ///  *   Trust, nor the names of specific contributors, may be
   ///  *   used to endorse or promote products derived from this
   ///  *   software without specific prior written permission.
   ///  *
   ///  *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS
   ///  *   AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
   ///  *   WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
   ///  *   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
   ///  *   FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO
   ///  *   EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
   ///  *   LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   ///  *   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
   ///  *   NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
   ///  *   SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
   ///  *   INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   ///  *   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
   ///  *   OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
   ///  *   IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
   ///  *   ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
   ///  */

10.2.  XDR Proper for Extension

   /// /*******************************************************************
   ///  *******************************************************************
   ///  **
   ///  ** XDR for OPTIONAL protocol extension.
   ///  **
   ///  ** Includes support for both message continuation and send-based
   ///  ** DDP.  The latter is supported by a new structure for the
   ///  ** specification of data placements which can be used for both
   ///  ** send-based data placement and DDP using explicit RDMA
   ///  ** operations.
   ///  **
   ///  ** Extensions include:
   ///  **
   ///  **  o Four new transport properties.
   ///  **  o Four new OPTIONAL message types.
   ///  **
   ///  *******************************************************************
   ///  ******************************************************************/
   ///
   /// /*******************************************************************
   ///  *
   ///  * Core XDR Definitions
   ///  *
   ///  ******************************************************************/

   /// /*
   ///  * General XDR preliminaries for these features.
   ///  */
   /// typedef uint32          xms_grpxn;
   /// typedef uint32          xms_grpxc;
   ///
   /// /*
   ///  * Basic XDR typedefs for the new approach to the specification of
   ///  * data placement.
   ///  */
   /// typedef uint32          xmdp_itemlen;
   /// typedef uint32          xmdp_pldisp;
   /// typedef uint32          xmdp_vsdisp;
   /// typedef uint32          xmdp_tbsn;
   ///
   /// /*
   ///  * Define the possible types of data placement items.
   ///  */
   /// enum xmdp_type {
   ///         XMPTYPE_EXRW = 1,
   ///         XMPTYPE_TBSN = 2,
   ///         XMPTYPE_CHOOSE = 3,
   ///         XMPTYPE_BYSIZE = 4,
   ///         XMPTYPE_TOOSHORT = 5,
   ///         XMPTYPE_NOITEM = 6
   /// };
   ///
   /// /*
   ///  * XDR defining the placement of bulk items in the message being
   ///  * sent.
   ///  */
   /// union xmdp_loc switch(xmdp_type type) {
   ///
   ///     case XMPTYPE_EXRW:
   ///             rpcrdma1_segment        xmdl_ex<>;
   ///     case XMPTYPE_TBSN:
   ///             xmdp_itemlen            xmdl_offset;
   ///             xmdp_tbsn               xmdl_bsnum<>;
   ///     case XMPTYPE_TOOSHORT:
   ///     case XMPTYPE_NOITEM:
   ///             void;
   /// };
   ///
   ///
   ///
   /// struct xmdp_mitem {
   ///         xmdp_vsdisp     xmdmi_disp;
   ///         xmdp_itemlen    xmdmi_length;
   ///         xmdp_loc        xmdmi_where;
   /// };
   ///
   /// typedef xmdp_mitem      xmdp_grpinfo<>;
   ///
   /// /*
   ///  * XDR defining the placement of bulk items in the response to the
   ///  * message being sent.
   ///  */
   /// union xmdp_rsdloc switch(xmdp_type type) {
   ///
   ///     case XMPTYPE_EXRW:
   ///     case XMPTYPE_CHOOSE:
   ///             rpcrdma1_segment        xmdrsdl_ex<>;
   ///     case XMPTYPE_BYSIZE:
   ///             xmdp_itemlen            xmdrsdl_dsdov;
   ///             rpcrdma1_segment        xmdrsdl_bsex<>;
   ///     case XMPTYPE_TBSN:
   ///             void;
   /// };
   ///
   /// struct xmdp_rsdrange {
   ///         xmdp_vsdisp     xmdrsdr_begin;
   ///         xmdp_vsdisp     xmdrsdr_end;
   /// };
   ///
   /// struct xmdp_rsditem {
   ///         xmdp_itemlen    xmdrsdi_minlen;
   ///         xmdp_rsdloc     xmdrsdi_loc;
   /// };
   ///
   /// struct xmdp_rsdset {
   ///         xmdp_rsdrange   xmdrsds_range;
   ///         xmdp_rsditem    xmdrsds_items<>;
   /// };
   ///
   /// typedef xmdp_rsdset     xmdp_rsdgroup<>;
   ///
   /// /*******************************************************************
   ///  *
   ///  * New Transport Properties
   ///  *
   ///  ******************************************************************/
   ///
   /// /*
   ///  * New Transport Property codes
   ///  */
   /// const uint32 XPROP_RTRSUPP = 3;
   /// const uint32 XPROP_RBSTRUCT = 4;
   /// const uint32 XPROP_REQRXLIM = 5;
   /// const uint32 XPROP_RESPSXLIM = 6;
   ///
   /// /*
   ///  * XDR relating to RTR Support Property
   ///  */
   /// typedef uint32          xpr_rtrs;
   ///
   /// const uint32 RTRS_XREQ = 1;
   /// const uint32 RTRS_XRESP = 2;
   /// const uint32 RTRS_XCONT = 4;
   ///
   /// /*
   ///  * Items related to Receive Buffer Structure Property
   ///  */
   /// struct xmrbs_seg {
   ///         uint32          xmrseg_length;
   ///         uint32          xmrseg_align;
   ///         uint32          xmrseg_flags;
   /// };
   ///
   /// const uint32 XMRSFLAG_PLT = 0x01;
   ///
   /// struct xmrbs_group {
   ///         uint32          xmrgrp_count;
   ///         xmrbs_seg       xmrgrp_info;
   /// };
   ///
   /// struct xmrbs_buf {
   ///         uint32          xmrbuf_length;
   ///         xmrbs_group     xmrbuf_groups<>;
   /// };
   /// typedef xmrbs_buf       xpr_rbs;
   ///
   /// /*
   ///  * XDR relating to transmission limit properties
   ///  */
   /// typedef uint32          xpr_rqrxl;
   ///
   /// typedef uint32          xpr_rssxl;
   ///
   /// /*******************************************************************
   ///  *
   ///  * New OPTIONAL Message Types
   ///  *
   ///  ******************************************************************/
   ///
   /// /*
   ///  * New message type codes
   ///  */
   /// const uint32 ROPT_XMTREQ = 1;
   /// const uint32 ROPT_XMTRESP = 2;
   /// const uint32 ROPT_XMTCONT = 3;
   /// const uint32 ROPT_REPTERR = 4;
   ///
   ///
   /// /*
   ///  * New message type to do the initial transmission of a request.
   ///  */
   /// struct optxmt_req {
   ///         xmdp_grpinfo    optxrq_dp;
   ///         xmdp_rsdgroup   optxrq_rsd;
   ///         xms_grpxc       optxrq_count;
   ///         xms_grpxc       optxrq_rsbuf;
   ///         xmdp_pldisp     optxrq_pslen;
   ///
   /// };
   ///
   /// /*
   ///  * New message type to do the initial transmission of a response.
   ///  */
   /// struct optxmt_resp {
   ///         xmdp_grpinfo    optxrs_dp;
   ///         xms_grpxn       optxrs_count;
   ///         xmdp_pldisp     optxrs_pslen;
   ///
   /// };
   ///
   /// /*
   ///  * New message type to transmit the continuation of a request or
   ///  * response.
   ///  */
   /// struct optxmt_cont {
   ///         xms_grpxn       optxc_xnum;
   ///         uint32          optxc_itype;
   ///         xmdp_pldisp     optxc_pslen;
   /// };
   ///
   /// /*
   ///  * XDR definitions to support error reporting.
   ///  */
   /// enum optr_err {
   ///         OPTRERR_BADHMT = 1,
   ///         OPTRERR_BADOMT = 2,
   ///         OPTRERR_BADCONT = 3,
   ///         OPTRERR_BADSEQ = 4,
   ///         OPTRERR_BADXID = 5,
   ///         OPTRERR_BADOFF = 6,
   ///         OPTRERR_BADTBSN = 7,
   ///         OPTRERR_BADPL = 8
   /// };
   ///
   /// union optr_info switch(optr_err optre_which) {
   ///
   ///     case OPTRERR_BADHMT:
   ///     case OPTRERR_BADOMT:
   ///     case OPTRERR_BADSEQ:
   ///     case OPTRERR_BADXID:
   ///         uint32      optri_expect;
   ///         uint32      optri_current;
   ///
   ///     case OPTRERR_BADCONT:
   ///         void;
   ///
   ///
   ///     case OPTRERR_BADTBSN:
   ///     case OPTRERR_BADOFF:
   ///     case OPTRERR_BADPL:
   ///         uint32      optri_value;
   ///         uint32      optri_min;
   ///         uint32      optri_max;
   ///
   /// };
   ///
   /// struct xms_id {
   ///         uint32          xmsi_xid;
   ///         msg_type        xmsi_dir;
   ///         xms_grpxn       xmsi_seq;
   /// };
   ///
   /// /*
   ///  * New message type for error reporting.
   ///  */
   /// struct optrept_err {
   ///         xms_id          optre_bad;
   ///         xms_id          *optre_lead;
   ///         optr_info       optre_info;
   /// };
   ///
   ///

11.  Security Considerations

   The extension described has the same security considerations as those
   described in [RFC8166] and [I-D.cel-nfsv4-rpcrdma-version-two].
   With regard to the transport properties introduced in this document,
   it is possible that a man-in-the-middle could interfere with the
   communication of transport properties with possible negative effects.
   To prevent such interference, the steps described in
   [I-D.cel-nfsv4-rpcrdma-version-two] should be attended to.

   The use of the techniques described in this document to reduce use of
   explicit RDMA operations raises important issues which implementers
   should consider:

      While the use of these techniques may be expedient in certain
      cases, their use is not likely to be universal, at least for a
      considerable time.  As a result, implementers should remain aware
      of the issues discussed in Section 9.1 of [RFC8166], unless and
      until it is certain that none of a requester's memory can be
      registered for remote access.

      Extra care needs to be taken in cases in which padding needs to be
      inserted in a transmission to ensure that a DDP-targetable data
      item will be received in an appropriately aligned buffer segment.
      In some implementations, sensitive data could be inadvertently
      sent within the padding.  To prevent this, the padding can be
      zeroed or it can be sent from a pre-zeroed area using a gather
      list.

12.  IANA Considerations

   This document does not require any actions by IANA.

13.  References

13.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC4506]  Eisler, M., Ed., "XDR: External Data Representation
              Standard", STD 67, RFC 4506, DOI 10.17487/RFC4506, May
              2006.

   [RFC8166]  Lever, C., Ed., Simpson, W., and T. Talpey, "Remote Direct
              Memory Access Transport for Remote Procedure Call Version
              1", RFC 8166, DOI 10.17487/RFC8166, June 2017.

13.2.  Informative References

   [I-D.cel-nfsv4-rpcrdma-version-two]
              Lever, C. and D. Noveck, "RPC-over-RDMA Version 2
              Protocol", draft-cel-nfsv4-rpcrdma-version-two-05 (work in
              progress), July 2017.

   [RFC5662]  Shepler, S., Ed., Eisler, M., Ed., and D. Noveck, Ed.,
              "Network File System (NFS) Version 4 Minor Version 1
              External Data Representation Standard (XDR) Description",
              RFC 5662, DOI 10.17487/RFC5662, January 2010.

   [RFC5666]  Talpey, T. and B. Callaghan, "Remote Direct Memory Access
              Transport for Remote Procedure Call", RFC 5666,
              DOI 10.17487/RFC5666, January 2010.

   [RFC5667]  Talpey, T. and B. Callaghan, "Network File System (NFS)
              Direct Data Placement", RFC 5667, DOI 10.17487/RFC5667,
              January 2010.

   [RFC8178]  Noveck, D., "Rules for NFSv4 Extensions and Minor
              Versions", RFC 8178, DOI 10.17487/RFC8178, July 2017.

Acknowledgments

   The author gratefully acknowledges the work of Brent Callaghan and
   Tom Talpey in producing the original RPC-over-RDMA Version One
   specification [RFC5666] and also Tom's work in helping to clarify
   that specification.

   The author also wishes to thank Chuck Lever for his work resurrecting
   NFS support for RDMA in [RFC8166], for clarifying the relationship
   between RDMA and direct data placement, and for beginning the work on
   RPC-over-RDMA Version Two.

   The extract.sh shell script and formatting conventions were first
   described by the authors of the NFSv4.1 XDR specification [RFC5662].

Author's Address

   David Noveck
   NetApp
   1601 Trapelo Road
   Waltham, MA  02451
   US

   Phone: +1 781 572 8038
   Email: davenoveck@gmail.com