STORM                                                   A. Kanevsky, Ed.
Internet-Draft                                                 Dell Inc.
Updates: 5043, 5044 (if approved)                        C. Bestler, Ed.
Intended status: Standards Track                         Nexenta Systems
Expires: June 17, 2012                                          R. Sharp
                                                                   Intel
                                                                 S. Wise
                                                     Open Grid Computing
                                                       December 15, 2011


                 Enhanced RDMA Connection Establishment
                  draft-ietf-storm-mpa-peer-connect-09

Abstract

   This document updates RFC 5043 and RFC 5044 by extending Marker
   Protocol Data Unit (PDU) Aligned Framing (MPA) negotiation for Remote
   Direct Memory Access (RDMA) connection establishment.  The first
   enhancement extends RFC 5044, enabling peer-to-peer connection
   establishment over MPA/Transmission Control Protocol (TCP).  The
   second enhancement extends both RFC 5043 and RFC 5044 by providing
   an option for standardized exchange of RDMA-layer connection
   configuration.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 17, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Summary of changes affecting RFC 5044
     1.2.  Summary of changes affecting RFC 5043
   2.  Requirements Language
   3.  Definitions
   4.  Motivations
     4.1.  Standardization of RDMA Read Parameter Configuration
     4.2.  Enabling MPA Mode
     4.3.  Lack of Explicit RTR in MPA Request/Reply Exchange
     4.4.  Limitations on ULP Workaround
       4.4.1.  Transport Neutral APIs
       4.4.2.  Work/Completion Queue Accounting
       4.4.3.  Host-based Implementation of MPA Fencing
   5.  Enhanced MPA Connection Establishment
   6.  Enhanced MPA Request/Reply Frames
   7.  Enhanced SCTP Session Control Chunks
   8.  MPA Error Reporting
   9.  Enhanced RDMA Connection Establishment Data
     9.1.  IRD and ORD Negotiation
     9.2.  Peer-to-Peer Connection Negotiation
     9.3.  Enhanced Connection Negotiation Flow
   10. Interoperability
   11. IANA Considerations
   12. Security Considerations
   13. Acknowledgements
   14. References
     14.1.  Normative References
     14.2.  Informative References
   Authors' Addresses

1.  Introduction

   When used over Transmission Control Protocol (TCP), the current
   Remote Direct Data Placement (RDDP) [RFC5041] suite of protocols
   relies on the MPA protocol [RFC5044] both for connection
   establishment and for the markers used for layering over TCP.

   A typical model for establishing an RDMA connection has the following
   steps:

   o  The passive side (responder) Upper Layer Protocol (ULP) listens
      for connection requests.

   o  The active side (initiator) ULP submits a connection request using
      an RDMA endpoint, the desired destination, and the parameters to
      be used for the connection.  Those parameters include both RDMA
      layer characteristics, such as the number of simultaneous RDMA
      Read Requests to be allowed, and application specific data.

   o  The passive side ULP receives a connection request, which includes
      the identity of the active side and the requested connection
      characteristics.  The passive side ULP uses this information to
      decide whether to accept the connection, and if it is to be
      accepted, how to create and/or configure the local RDMA endpoint.

   o  If accepting, the responder submits its acceptance of the
      connection request, which in turn generates the accept message to
      the initiator.
      This responder accept operation includes the RDMA endpoint to be
      used and the connection characteristics (both the RDMA
      configuration and any application specific private data to be
      transferred to the initiator).

   o  The active side receives confirmation that the connection has been
      accepted, what the configured connection characteristics are, and
      any application supplied private data.

   Currently, MPA only supports a client-server model for connection
   establishment, forcing peer-to-peer applications to interact as
   though they had a client/server relationship.  In addition,
   negotiation of some Remote Direct Memory Access Protocol (RDMAP)
   [RFC5040] specific parameters is left to the ULP.  Providing an
   optional ULP-independent format for exchanging these parameters
   would benefit transport neutral Remote Direct Memory Access (RDMA)
   applications.

1.1.  Summary of changes affecting RFC 5044

   This draft enhances the [RFC5044] MPA connection setup protocol.
   First, it adds exchange and negotiation of the parameters necessary
   to support RDMA Read Requests.  Second, it adds a message that serves
   as a Ready to Receive (RTR) indication from the initiator to the
   responder as the last message of connection establishment, and adds
   to the MPA request/reply frames a negotiation of which type of
   message to use to carry the RTR indication.

1.2.  Summary of changes affecting RFC 5043

   This draft enhances [RFC5043] by adding new Enhanced Session Control
   Chunks that extend the currently defined Chunks with Inbound RDMA
   Read Queue Depth (IRD) and Outbound RDMA Read Queue Depth (ORD)
   negotiation.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Definitions

   Active Side:  See Initiator.

   Consumer:  The ULPs or applications that lie above MPA and Direct
      Data Placement (DDP).  The Consumer is responsible for making TCP
      or SCTP connections, starting MPA and DDP connections, and
      generally controlling operations.  See [RFC5044] and [RFC5043].

   CRC:  Cyclic Redundancy Check.

   Completion Queue (CQ):  A consumer accessible queue where the RDMA
      device reports completions of Work Requests.  A Consumer is able
      to reap completions from a CQ without requiring per transaction
      support from the kernel or other privileged entity.  See [RDMAC].

   Completion Queue Entry (CQE):  Transport and device specific
      representation of a Work Completion.  A Completion Queue holds
      CQEs.  See [RDMAC].

   FULPDU:  Framed Upper Layer Protocol PDU.  See FPDU of [RFC5044].

   Inbound RDMA Read Request Queue (IRRQ):  A queue that is associated
      with an RDMA Connection that tracks active incoming simultaneous
      RDMA Read Request Messages.  See [RDMAC].

   Inbound RDMA Read Queue Depth (IRD):  The maximum number of incoming
      simultaneous RDMA Read Request Messages an RDMA connection can
      handle.  See [RDMAC].

   Initiator:  The endpoint of a connection that sends the MPA Request
      Frame.  The initiator is the active side of the connection
      establishment.  See [RFC5044].

   IRD:  See Inbound RDMA Read Queue Depth.

   MPA Fencing:  MPA responder Connection Establishment logic that
      ensures that no ULP messages will be transferred until the
      initiator's first message has been received.

   MPA Request Frame:  Data sent from the MPA initiator to the MPA
      responder during the Startup Phase.  See [RFC5044].

   MPA Reply Frame:  Data sent from the MPA responder to the MPA
      initiator during the Startup Phase.  See [RFC5044].

   ORD:  See Outbound RDMA Read Queue Depth.

   Outbound RDMA Read Queue Depth (ORD):  The maximum number of
      simultaneous RDMA Read Requests that can be issued for the RDMA
      connection.  This should be less than or equal to the peer's IRD.
      See [RDMAC].

   Passive Side:  See Responder.

   Private Data:  A block of data exchanged between MPA endpoints during
      initial connection setup.  See [RFC5044].

   Queue Pair (QP):  The traditional name for a local Endpoint in a
      [VIA] derived local interface.  A Queue Pair is the set of Work
      Queues associated exclusively with a single Endpoint.  The Send
      Queue (SQ), Receive Queue (RQ) and Inbound RDMA Read Queue (IRQ)
      are considered to be part of the Queue Pair.  The potentially
      shared Completion Queue (CQ) and Shared Receive Queue (SRQ) are
      not.  See [RDMAC].

   Remote Peer:  The MPA protocol implementation on the opposite end of
      the connection.  Used to refer to the remote entity when
      describing protocol exchanges or other interactions between two
      Nodes.  See [RFC5044].

   Responder:  The connection endpoint that responds to an incoming MPA
      connection request (the MPA Request Frame).  The responder is the
      passive side of the connection establishment.  See [RFC5044].

   Ready to Receive (RTR):  RTR is an indication provided by the last
      connection establishment message sent from the initiator to the
      responder.  An RTR indicates that the initiator is ready to
      receive messages and that connection establishment is completed.

   Startup Phase:  The initial exchanges of an MPA connection that
      serve to more fully identify MPA endpoints to each other and pass
      connection specific setup information to each other.  See
      [RFC5044].

   Shared Receive Queue (SRQ):  A shared pool of Receive Work Requests
      posted by the Consumer that can be allocated by multiple RDMA
      endpoints (Queue Pairs).  See [RDMAC].

   Tagged (DDP) Message:  A DDP Message that targets a Tagged Buffer
      that is explicitly Advertised to the Remote Peer through exchange
      of an STag (memory handle), offset in the memory region identified
      by STag, and length.  See [RFC5040].

   Untagged (DDP) Message:  A DDP Message that targets an Untagged
      Buffer associated with a queue specified by Queue Number (QN).
      See [RFC5040].

   Work Queue:  An element of a [VIA] derived local interface that
      allows user-space applications to submit Work Requests directly to
      network hardware.  Specific Work Queues include the Send Queue
      (SQ) for transmit requests, Receive Queue (RQ) for receive
      requests specific to a single Endpoint and Shared Receive Queues
      (SRQs) for receive requests that can be allocated by one or more
      Endpoints.  See [RDMAC].

   Work Queue Element (WQE):  Transport and device specific
      representation of a Work Request.  See [RDMAC].

   Work Request:  An elementary object used by Consumers to enqueue a
      requested operation (WQEs) onto a Work Queue.  See [RDMAC].

4.  Motivations

   The goal of this draft is twofold.  The first is to extend support
   for RDMA connection setup from the current client-server model to a
   peer-to-peer model.  The second is to add negotiation of the RDMA
   Read queue size for both sides of an RDMA connection.

4.1.  Standardization of RDMA Read Parameter Configuration

   Most RDMA applications are developed using a transport neutral
   Application Programming Interface (API) to access RDMA services based
   on a "queue pair" paradigm as originally defined by the Virtual
   Interface Architecture [VIA], refined by the Direct Access
   Programming Library [DAPL] and most commonly deployed with the
   OpenFabrics API [OFA].

   These transport neutral APIs seek to provide a common set of RDMA
   services whether the underlying transport is, for example, RDDP over
   MPA, RDDP over SCTP or InfiniBand.

   The common model for establishing an RDMA connection has the
   following steps:

   o  The passive side ULP listens for connection requests.

   o  The active side ULP submits a connection request using an RDMA
      endpoint ("queue pair"), the desired destination, and the
      parameters to be used for the connection.  Those parameters
      include both RDMA layer characteristics, such as the RDMA Read
      credits to be allowed, and application specific data (typically
      referred to as "private data").

   o  The passive side ULP receives a connection request, which includes
      the identity of the active side and the requested connection
      characteristics.  The passive side ULP uses this information to
      decide whether to accept the connection, and if it is to be
      accepted, how to create and/or configure the RDMA endpoint.

   o  If accepting, the passive side ULP submits its acceptance of the
      connection request.  This local accept operation includes the RDMA
      endpoint to be used and the connection characteristics (both the
      RDMA configuration and any application specific private data to be
      returned).

   o  The active side receives confirmation that the connection has been
      accepted, what the configured connection characteristics are, and
      any application supplied private data.

   As currently defined, DDP connection establishment requires the ULP
   to encode the RDMA configuration in the application specific private
   data.  This results in undesirable duplication of logic to cover both
   InfiniBand and RDDP, and requires each specific Upper Layer Protocol
   to define how the RDMA characteristics are extracted from its private
   data.

   Both RDDP and InfiniBand support an initial private data exchange;
   therefore, a standard definition of the RDMA characteristics within
   the private data section would enable common connection establishment
   APIs to format the RDMA characteristics based on the same API
   information used when establishing either protocol to form the
   connection.  The application would then only have to indicate that it
   was using this standard format to enable common connection
   establishment procedures to apply common code to properly parse these
   fields and configure the RDMA endpoints accordingly.  Exchange of
   parameters necessary to perform RDMA Read operations is a common
   usage of the initial private data exchange.

   One of the RDMA operations that is defined in [RDMAC] is an RDMA
   Read.  RDMA Read operations are performed using an untagged message
   sent from a Queue Pair (QP) on the local endpoint to a QP on the
   remote endpoint, targeting the Inbound RDMA Read Request Queue
   (IRRQ, QN=1) associated with the connection.  RDMA Read responses
   transfer data associated with each RDMA Read Request from the remote
   endpoint to the local endpoint using tagged messages.  An inbound
   RDMA Read Request remains on the IRRQ from the time that it is
   received until the time that the last tagged message associated with
   the RDMA request is acknowledged.  The IRRQ is associated with a QP
   but is not a Work Queue.  Instead, the IRRQ is a standalone queue
   that is used to manage RDMA Read Requests associated with a QP.  See
   [RDMAC] section 6 for more information regarding QPs and the IRRQ.
   One of the characteristics that must be configured for a QP is the
   size of the IRRQ.  This parameter is called the Inbound RDMA Read
   Queue Depth (IRD).
   Another characteristic of a QP that must be configured is a local
   limit on the number of simultaneous outbound RDMA Read Requests,
   based on the size of the remote endpoint QP's IRRQ.  This parameter
   is called the Outbound RDMA Read Queue Depth (ORD).  ORD is used to
   limit the number of simultaneous RDMA Read Requests such that the
   local endpoint does not overrun the remote endpoint's IRRQ depth or
   IRD.  Note that outbound RDMA Reads are submitted to a QP's Send
   Queue at the local peer, not to a separate outbound RDMA Read Request
   queue on the local peer.  The local endpoint uses ORD to strictly
   limit simultaneous read requests so that IRRQ overruns do not occur
   at the remote endpoint.

   Determination of the values of ORD and IRD is left to the ULP by the
   current RDDP suite of protocols and also by [RDMAC].  Since this
   negotiation of ORD and IRD is typical, it is desirable to provide the
   common mechanism described in this draft.

4.2.  Enabling MPA Mode

   MPA defines encoding of DDP Segments in Framed Upper Layer Protocol
   PDUs (FULPDUs).  Generation of FULPDUs requires the ability to
   periodically insert MPA Markers and to generate the MPA CRC-32c for
   each frame.  Reception may require parsing/removing the markers after
   using them to identify MPA Frame boundaries, and validation of the
   MPA CRC-32c.

   A major design objective for MPA was to ensure that the resulting TCP
   stream would be a fully compliant TCP stream for any and all
   TCP-aware middle-boxes.  The challenge is that while only some TCP
   payload streams are a valid stream of MPA FULPDUs, any sequence of
   bytes is a valid TCP payload stream.  The determination that a given
   stream is in a specific MPA mode cannot be made at the MPA or TCP
   layer.  Therefore, enabling of MPA mode is handled by the ULP.

   The MPA protocol can be viewed as having two parts:

   o  a specification of generation and reception of MPA FULPDUs.
      This is unchanged by enhanced RDMA connection establishment.

   o  a pre-MPA exchange of messages to enable a specific MPA mode for
      the TCP connection.  Enhanced RDMA connection establishment
      extends this protocol with two new features.

   In typical implementations, generation and reception of MPA FULPDUs
   is handled by hardware.  The exchange of the MPA Request and Reply
   frames is then handled by host software.  As will be explained, this
   implementation split impedes applications that are not compatible
   with the client-server assumptions in the current MPA Request/Reply
   exchange.

4.3.  Lack of Explicit RTR in MPA Request/Reply Exchange

   The exchange of MPA Request and Reply messages to place a TCP
   connection in MPA mode is specified in [RFC5044].  This protocol
   provides many benefits to the design of MPA FULPDU hardware:

   o  The ULP is responsible for specifying the exact MPA Mode (Markers
      enabled or disabled, CRC-32c enabled or suppressed) and the point
      in the TCP streams (inbound and outbound) where MPA frames will
      begin.

   o  Before the first MPA frame is transmitted, all pre-MPA mode TCP
      payload will have been acknowledged by the peer.  Therefore, it is
      never necessary to generate a retransmission that mixes pre-MPA
      and MPA payload.

   o  Before MPA reception is enabled, all incoming pre-MPA mode TCP
      payload will have been acknowledged.  Therefore, the host will
      never receive a TCP segment that mixes pre-MPA and MPA payload.

   The limitation of the current MPA Request/Reply exchange is that it
   does not define a Ready to Receive (RTR) indication that the active
   side would send, so that the passive side can know that the last
   non-MPA payload (the MPA Reply) had been received.

   Instead, the role of an RTR indication is piggy-backed on the first
   MPA FULPDU sent by the active side.
   This is actually a valuable optimization for all applications that
   fit the classic client/server model.  The client only initiates the
   connection when it has a request to send to the server, and the
   server has nothing to send until it has received and processed the
   client request.

   Even applications where the server sends some configuration data
   immediately can easily send the same information as application
   private data in the MPA Reply.  So the currently defined exchange
   works for almost all applications.

   Many peer-to-peer applications, especially those involving cluster
   calculations (frequently using the Message Passing Interface (MPI)
   [UsingMPI] or [RDS]), have no natural client or server roles
   ([PPMPI], [OpenMP]).  Typically one member of the cluster is
   arbitrarily selected to initiate the connection when the distributed
   task is launched, while the other accepts it.  At startup time,
   however, there is no way to predict which node will have the first
   message to actually send.  Establishing the connections immediately
   is nevertheless valuable, because it reduces latency once results are
   ready to transmit and it validates connectivity throughout the
   cluster.

   The lack of an explicit RTR indication in the MPA Request/Reply
   exchange forces all applications to have a first message from the
   connection initiator, whether or not this matches the application
   communication model.

4.4.  Limitations on ULP Workaround

   The requirement that the RDMA connection initiator sends the first
   message does not appear to be onerous on first examination.  The
   natural question is why the application layer would not simply
   generate a dummy message when there was no other message to submit.

   There are three factors that make this workaround unsuitable for many
   peer-to-peer applications:

   o  Transport Neutral APIs.

   o  Work/Completion Queue Accounting.

   o  Host-based implementation of MPA Fencing.

4.4.1.  Transport Neutral APIs

   Many of these applications access RDMA services using a transport
   neutral API such as [DAPL] or [OFA].  Only RDDP over TCP [RFC5044]
   has a first message requirement.  Other RDMA transports, including
   RDDP over SCTP (see [RFC5043]) and InfiniBand (see [IBTA]), do not.

   Application or middleware communications can be expressed as
   transport neutral RDMA operations, allowing lower software layers to
   translate to transport and device specifics.  Having a distinct extra
   message that is required only for one transport undermines the
   application's goal of being transport neutral.

4.4.2.  Work/Completion Queue Accounting

   RDMA local APIs conventionally use work queues to submit requests
   (work queue elements or WQEs) and to asynchronously receive
   completions (in completion queues or CQs).

   Each work request can generate a completion queue entry (CQE).
   Completions for successful transmit work requests are frequently
   suppressed, but the completion queue capacity must account for the
   possibility that each will complete in error.  A completion queue can
   receive completions from multiple work queues.

   Completion Queues are defined so as to allow hardware RDMA
   implementations to generate CQEs directly to a user-space mapped
   buffer.  This enables a user-space RDMA consumer to reap completions
   without requiring kernel intervention.

   A hardware RDMA implementation cannot reasonably wait for an
   available slot in the completion queue.  The queue must be sized such
   that an overflow will not occur.  When an overflow does occur, it is
   considered catastrophic and will typically require tearing down all
   RDMA connections using that CQ.

   This style of interface is very efficient, but places a burden on the
   application to properly size each Completion Queue to match the Work
   Queues that feed it.
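
   The sizing burden described above reduces to simple arithmetic once
   every source of completions is known.  The following Python sketch
   is illustrative only: the "WorkQueue" type and the
   "required_cq_depth" and "cq_may_overflow" helpers are hypothetical
   names, and the sketch assumes the worst case stated above, in which
   every outstanding Work Request (including a transmit request whose
   successful completion would normally be suppressed) can produce one
   CQE.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class WorkQueue:
    """Hypothetical description of a Work Queue that feeds a CQ."""
    name: str
    depth: int  # maximum number of outstanding Work Requests (WQEs)


def required_cq_depth(queues: Sequence[WorkQueue]) -> int:
    """Worst-case CQE count: each outstanding Work Request may complete, possibly in error."""
    return sum(q.depth for q in queues)


def cq_may_overflow(cq_depth: int, queues: Sequence[WorkQueue], hidden_wrs: int = 0) -> bool:
    """Return True if the CQ could overflow.

    hidden_wrs models Work Requests posted by a lower layer without the
    application's knowledge (for example, a dummy first message); the
    application's own sizing calculation cannot account for them.
    """
    return required_cq_depth(queues) + hidden_wrs > cq_depth


# An application sizes its CQ exactly for its Send and Receive Queues.
sq = WorkQueue("SQ", 64)
rq = WorkQueue("RQ", 128)
cq_depth = required_cq_depth([sq, rq])                    # 192
print(cq_may_overflow(cq_depth, [sq, rq]))                # False
print(cq_may_overflow(cq_depth, [sq, rq], hidden_wrs=1))  # True
```

   A single unaccounted Work Request is enough to exceed a CQ that was
   sized exactly, which is why a lower layer cannot silently insert one.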

   While the format of both WQEs and CQEs is transport and device
   dependent, a transport neutral API can deal with WQEs and CQEs as
   abstract transport and device neutral objects.  Therefore, the number
   of WQEs and CQEs required for an application can be transport and
   device neutral.

   The capacity of the work queues and completion queues can be
   calculated in an abstract transport/device neutral fashion.  If a
   dummy operation approach were used, it would require lower layers to
   know the usage model, and would disrupt the calculations by inserting
   a dummy "operation" Work Request and filtering out the matching
   completion.  The lower layer does not know the usage model on which
   the queue sizes are built, nor does it know how frequently an
   insertion will be required.

4.4.3.  Host-based Implementation of MPA Fencing

   Many hardware implementations of RDDP using MPA/TCP do not handle the
   MPA Request/Reply exchange in hardware; rather, it is handled by the
   host processor in software.  With such designs, it is common for the
   MPA Fencing to be implemented in the user-space device-specific
   library (commonly referred to as a 'User Verbs' library or module).

   When the generation and reception of MPA FULPDUs is already dedicated
   to hardware, a Work Completion can only be generated by an untagged
   message, since arrival of a message for a tagged buffer does not
   necessarily generate a completion and is done without any interaction
   with the ULP [RFC5040].

5.  Enhanced MPA Connection Establishment

   Below we provide an overview of Enhanced Connection Setup.  The goal
   is to allow standard negotiation of ORD/IRD settings on both sides of
   the RDMA connection and/or to negotiate the initial data transfer
   operation by the initiator when the existing 'client sends first'
   rule does not match application requirements.
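
   To show why the negotiated ORD/IRD values matter on the sending
   side, the following Python sketch gates outbound RDMA Read Requests.
   It is a minimal illustration, not the negotiation procedure of
   Section 9.1: the "OutboundReadGate" class and its methods are
   hypothetical, and it assumes the rules from Sections 3 and 4.1 that
   ORD should not exceed the peer's IRD and that the local endpoint
   never has more than ORD RDMA Read Requests outstanding.  It further
   assumes that one credit is returned when a local RDMA Read completes.

```python
class OutboundReadGate:
    """Hypothetical helper that limits outstanding RDMA Read Requests to ORD."""

    def __init__(self, desired_ord: int, peer_ird: int) -> None:
        if desired_ord < 0 or peer_ird < 0:
            raise ValueError("queue depths must be non-negative")
        # ORD should be less than or equal to the peer's IRD; otherwise the
        # peer's Inbound RDMA Read Request Queue (IRRQ) could be overrun.
        self.ord = min(desired_ord, peer_ird)
        self.outstanding = 0

    def try_post_read(self) -> bool:
        """Return True if one more RDMA Read Request may go on the Send Queue now."""
        if self.outstanding >= self.ord:
            return False  # caller must wait for an earlier read to complete
        self.outstanding += 1
        return True

    def read_completed(self) -> None:
        """Release one credit when a local RDMA Read completes (assumption)."""
        if self.outstanding == 0:
            raise RuntimeError("no RDMA Read Request is outstanding")
        self.outstanding -= 1


gate = OutboundReadGate(desired_ord=8, peer_ird=2)
print(gate.ord)                                    # 2
print(gate.try_post_read(), gate.try_post_read())  # True True
print(gate.try_post_read())                        # False
gate.read_completed()
print(gate.try_post_read())                        # True
```

   Because outbound RDMA Reads share the Send Queue with other
   operations, the gate only decides when a read may be posted; it does
   not model a separate outbound read queue.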

   The RDMA connection initiator sends an MPA Request, as specified in
   [RFC5044]; the new format defined here allows for:

   o  Standardized negotiation of ORD and IRD.

   o  Negotiation of RTR functionality and the RDMA message type to use
      as the RTR indication.

   The RDMA connection responder processes the MPA Request and generates
   an MPA Reply, as specified in [RFC5044]; the new format completes the
   negotiation.

   The local interface needs to provide a way for a ULP to request the
   use of an explicit RTR indication on a per-application or
   per-connection basis when an explicit RTR indication will be
   required.  Piggy-backing the RTR on a client's first message is a
   valuable optimization for most connections.

   The RDMA connection initiator MUST NOT allow any later FULPDUs to be
   transmitted before the RTR indication.  One method to achieve that is
   to delay notifying the ULP that the RDMA connection has been
   established until after any required RTR indication has been
   transmitted.

   All MPA exchanges are performed via TCP prior to RDMA establishment,
   and are therefore signaled via TCP and not via RDMA completion.

6.  Enhanced MPA Request/Reply Frames

   Enhanced RDMA connection establishment uses an alternate format for
   MPA Requests and Replies, as follows:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    0 |                                                               |
      +         Key (16 bytes containing "MPA ID Req Frame")          +
    4 |       (4D 50 41 20 49 44 20 52 65 71 20 46 72 61 6D 65)       |
      +         Or  (16 bytes containing "MPA ID Rep Frame")          +
    8 |       (4D 50 41 20 49 44 20 52 65 70 20 46 72 61 6D 65)       |
      +                                                               +
   12 |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   16 |M|C|R|S|  Res  |      Rev      |           PD_Length           |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      ~                                                               ~
      ~                         Private Data                          ~
      |                                                               |
      |                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Key:  Unchanged from [RFC5044].

   M:  Unchanged from [RFC5044].

   C:  Unchanged from [RFC5044].

   R:  Unchanged from [RFC5044].

   S:  One if the Private Data begins with the enhanced RDMA connection
      establishment data.  Zero otherwise.

   Res:  One bit smaller than in [RFC5044], otherwise unchanged.  In
      [RFC5044], the 'Res' field, in which the newly defined 'S' bit
      resides, is reserved for future use.  [RFC5044] specifies that
      'Res' MUST be set to zero when sending and MUST NOT be checked on
      reception, making use of the S bit backwards compatible with the
      original MPA frame format.  When the S bit is set to zero, no
      additional private data is used for enhanced RDMA connection
      establishment, and therefore the resulting MPA request and reply
      frames are identical to the unenhanced protocol.

   Rev:  This field contains the revision of MPA.  To use any enhanced
      connection establishment feature, this MUST be set to two or
      higher.  If no enhanced connection establishment features are
      desired, it MAY be set to one.
      A host accepting MPA connections MUST continue to accept MPA
      Requests with version one even if it supports version two.

   PD_Length:  Unchanged from [RFC5044].  This is the total length of
      the Private Data field, including the enhanced RDMA connection
      establishment data if present.

   Private Data:  Unchanged from [RFC5044].  However, if the 'S' flag is
      set, Private Data MUST begin with enhanced RDMA connection
      establishment data (see Section 9).

7.  Enhanced SCTP Session Control Chunks

   Enhanced RDMA Connection Establishment uses the first 32 bits of the
   Private Data field for IRD and ORD negotiation in the "DDP Stream
   Session Initiate" and "DDP Stream Session Accept" SCTP Session
   Control Chunks.

   The type of the SCTP Session Control Chunk is defined by a Function
   Code (see [RFC4960]).  [RFC5043] already defines codes for 'DDP
   Stream Session Initiate' and 'DDP Stream Session Accept', which are
   equivalent to an MPA Request Frame and an accepting MPA Reply Frame.

   Enhanced RDMA connection establishment requires three additional
   Function Codes, listed below:

      Enhanced DDP Stream Session Initiate: 0x005

      Enhanced DDP Stream Session Accept:   0x006

      Enhanced DDP Stream Session Reject:   0x007

   The Enhanced Reject function code MUST be used to indicate rejection
   of an enhanced DDP stream session for a configuration that would have
   been accepted for unenhanced DDP Stream Session negotiation.

   The Enhanced DDP stream session establishment follows the same rules
   as the standard DDP stream session establishment as defined in
   [RFC5043].  ULP-supplied Private Data MUST be included for Enhanced
   DDP Stream Session Initiate, Enhanced DDP Stream Session Accept, and
   Enhanced DDP Stream Session Reject messages, and MUST follow the
   enhanced RDMA connection establishment data in the Enhanced DDP
   Stream Session Initiate and the Enhanced DDP Stream Session Accept
   messages.
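
   Sections 6 and 9 together define the bytes an enhanced endpoint
   places on the wire before MPA mode starts.  The Python sketch below
   encodes and decodes an Enhanced MPA Request or Reply under stated
   assumptions: bit 0 in each diagram is the most significant bit of
   the 32-bit word in network byte order, M, C and R keep their
   [RFC5044] meanings (markers, CRC, rejection), and Private Data is
   limited to 512 bytes, assumed here from [RFC5044] and matching the
   limit this section states for the SCTP messages.  The helper names
   ("pack_enhanced_data", "encode_mpa_frame", "decode_mpa_frame") are
   hypothetical.  The A to D arguments are taken only from their
   positions in the Section 9 diagram; the sketch does not interpret
   them.  Following the 'Res' rule above, the decoder ignores the
   reserved bits.

```python
import struct

REQ_KEY = b"MPA ID Req Frame"  # Key values from the Section 6 diagram
REP_KEY = b"MPA ID Rep Frame"
MAX_PRIVATE_DATA = 512         # assumed Private Data limit, in bytes
MAX_QUEUE_DEPTH = 0x3FFF       # IRD and ORD are 14-bit unsigned integers


def pack_enhanced_data(ird, ord_, a=False, b=False, c=False, d=False):
    """Pack the Section 9 word |A|B| IRD (14) |C|D| ORD (14) | in network order."""
    for name, value in (("IRD", ird), ("ORD", ord_)):
        if not 0 <= value <= MAX_QUEUE_DEPTH:
            raise ValueError(f"{name} must fit in 14 bits")
    word = ((int(a) << 31) | (int(b) << 30) | (ird << 16)
            | (int(c) << 15) | (int(d) << 14) | ord_)
    return struct.pack("!I", word)


def unpack_enhanced_data(data):
    (word,) = struct.unpack("!I", data[:4])
    return {"A": bool((word >> 31) & 1), "B": bool((word >> 30) & 1),
            "IRD": (word >> 16) & MAX_QUEUE_DEPTH,
            "C": bool((word >> 15) & 1), "D": bool((word >> 14) & 1),
            "ORD": word & MAX_QUEUE_DEPTH}


def encode_mpa_frame(is_request, ulp_data=b"", enhanced=None,
                     markers=False, crc=False, reject=False, rev=2):
    """Build an MPA Request/Reply; 'enhanced' is the 4-byte Section 9 word or None."""
    if enhanced is not None:
        if len(enhanced) != 4:
            raise ValueError("enhanced data is exactly 32 bits")
        if rev < 2:
            raise ValueError("enhanced features require Rev >= 2")
    private_data = (enhanced or b"") + ulp_data
    if len(private_data) > MAX_PRIVATE_DATA:
        raise ValueError("Private Data too long")
    s = enhanced is not None
    # M, C, R, S occupy the top four bits; the remaining Res bits are sent as zero.
    flags = (int(markers) << 7) | (int(crc) << 6) | (int(reject) << 5) | (int(s) << 4)
    key = REQ_KEY if is_request else REP_KEY
    # PD_Length covers the whole Private Data, including the enhanced word.
    return key + struct.pack("!BBH", flags, rev, len(private_data)) + private_data


def decode_mpa_frame(frame):
    if len(frame) < 20:
        raise ValueError("frame shorter than the fixed header")
    key, flags, rev, pd_length = struct.unpack("!16sBBH", frame[:20])
    if key not in (REQ_KEY, REP_KEY):
        raise ValueError("not an MPA Request/Reply frame")
    private_data = frame[20:20 + pd_length]
    if len(private_data) != pd_length or pd_length > MAX_PRIVATE_DATA:
        raise ValueError("bad PD_Length")
    s = bool(flags & 0x10)  # the low four Res bits are deliberately not checked
    if s and pd_length < 4:
        raise ValueError("S is set but Private Data is shorter than 32 bits")
    return {"request": key == REQ_KEY, "M": bool(flags & 0x80),
            "C": bool(flags & 0x40), "R": bool(flags & 0x20), "S": s, "Rev": rev,
            "enhanced": unpack_enhanced_data(private_data) if s else None,
            "ulp_data": private_data[4:] if s else private_data}


word = pack_enhanced_data(ird=16, ord_=8, a=True)
frame = encode_mpa_frame(True, b"hello", enhanced=word, crc=True)
info = decode_mpa_frame(frame)
print(len(frame), info["S"], info["enhanced"]["IRD"], info["ulp_data"])
# 29 True 16 b'hello'
```

   A frame built with "enhanced=None" and "rev=1" is byte-for-byte an
   unenhanced [RFC5044] frame, which is the backward compatibility
   property the 'Res' and 'Rev' descriptions rely on.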

   The Private Data length, including the enhanced RDMA connection
   establishment data, MUST NOT exceed 512 bytes in any message.

   Private Data MUST NOT be included in the DDP Stream Session TERM
   message.

   Received Enhanced DDP Stream Session Control messages SHOULD be
   reported to the ULP.  If reported, any supplied Private Data MUST be
   available for the ULP to examine.  For example, a received Enhanced
   DDP Stream Session Control message is not reported to the ULP if the
   receiver supports none of the requested RTR indication types.  In
   this case, the Provider MAY generate a reject reply message
   indicating which RTR indication types it supports.

   The enhanced DDP stream management MUST use the DDP Stream Session
   termination function code to terminate a stream established using
   the enhanced DDP Stream Session function codes.

   [RFC5043] already supports either side sending the first DDP Message,
   since the Payload Protocol Identifier (PPID) distinguishes between
   Session Establishment and DDP Segments.  Enhanced RDMA connection
   establishment provides the ULP with a transport-independent way to
   support the peer-to-peer model.

   The following additional Legal Sequences of DDP Stream Session
   messages are defined:

   o  Enhanced Active/Passive Session Accepted: as in Section 6.2 of
      [RFC5043], but with the enhanced function codes defined in this
      document.

   o  Enhanced Active/Passive Session Rejected: as in Section 6.3 of
      [RFC5043], but with the enhanced function codes defined in this
      document.

   o  Enhanced Active/Passive Session Non-ULP Rejected: as in Section
      6.4 of [RFC5043], but with the enhanced function codes defined in
      this document.

8.  MPA Error Reporting

   The RDMA connection establishment protocol is layered upon [RFC5040]
   and [RFC5041].  Any enhanced RDMA connection establishment error
   causes an MPA termination message to be sent to the peer.
   [RFC5040] defines a triplet of protocol layer, error type, and error
   code for error specification.  MPA negotiation for RDMA connection
   establishment uses the following layer and error type for MPA error
   reporting:

      Layer: 0x2 - LLP

      Error Type: 0x0 - MPA

   While [RFC5044] defines four error codes, [RFC5043] does not define
   any.  Enhanced RDMA connection establishment extends the [RFC5044]
   error codes by adding three new error codes and leaves the existing
   codes unchanged.  Thus, enhanced RDMA connection establishment is
   backward compatible with both [RFC5043] and [RFC5044].

   The following error codes are defined for enhanced RDMA connection
   establishment negotiation:

      Error Code   Description
      ----------   ----------------------------
      0x05         Local catastrophic
      0x06         Insufficient IRD resources
      0x07         No matching RTR option

9.  Enhanced RDMA Connection Establishment Data

   Enhanced RDMA connection establishment places the following 32 bits
   at the beginning of the Private Data field of the MPA Request and
   Reply Frames, or of the "DDP Stream Session Initiate" and "DDP Stream
   Session Accept" SCTP Session Control Chunks.  ULP-specified Private
   Data follows this field, so the maximum amount of ULP-specified
   Private Data is reduced by 4 bytes.  This field MUST be sent in
   network byte order, with IRD and ORD encoded as 14-bit unsigned
   integers.

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     0 |A|B|            IRD            |C|D|            ORD            |
     4 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   IRD: Inbound RDMA Read Queue Depth.

   ORD: Outbound RDMA Read Queue Depth.

   A: Control Flag for the connection model.

   B: Control Flag for use of a zero-length FULPDU (Send) RTR
      indication.

   C: Control Flag for use of a zero-length RDMA Write RTR indication.

   D: Control Flag for use of a zero-length RDMA Read RTR indication.

9.1.  IRD and ORD Negotiation

   IRD and ORD are used to negotiate Inbound RDMA Read Request Queue
   depths for both endpoints of the RDMA connection.  IRD is used to
   configure the depth of the Inbound RDMA Read Request Queue (IRRQ) on
   each endpoint.  ORD is used to limit the number of simultaneous
   outbound RDMA Read Requests allowed at a given point in time, in
   order to avoid IRRQ overruns at the remote endpoint.  To describe the
   negotiation of both local and remote endpoint ORD and IRD values,
   four terms are defined:

   Initiator IRD: IRD value sent in the MPA Request or "DDP Stream
      Session Initiate" SCTP Session Control Chunk.  This is the value
      of the initiator's IRD at the time the MPA Request is generated.
      The responder sets its local ORD value to this value or less.
      Initiator IRD is the maximum number of simultaneous inbound RDMA
      Read Requests that the initiator can support for the requested
      connection.

   Initiator ORD: ORD value sent in the MPA Request or "DDP Stream
      Session Initiate" SCTP Session Control Chunk.  This is the initial
      value of the initiator's ORD at the time the MPA Request is
      generated, and also a request to the responder to support a
      responder IRD of at least this value.  Initiator ORD is the
      maximum number of simultaneous outbound RDMA Read operations that
      the initiator desires the responder to support for the requested
      connection.

   Responder IRD: IRD value returned in the MPA Reply or "DDP Stream
      Session Accept" SCTP Session Control Chunk.  This is the actual
      value that the responder set for its local IRD.  For successful
      negotiations, this value is greater than or equal to initiator
      ORD.  Responder IRD is the maximum number of simultaneous inbound
      RDMA Read Requests that the responder can actually support for the
      requested connection.

   Responder ORD: ORD value returned in the MPA Reply or "DDP Stream
      Session Accept" SCTP Session Control Chunk.  This is the actual
      value that the responder used for ORD; for successful
      negotiations, it is less than or equal to initiator IRD.
      Responder ORD is the maximum number of simultaneous outbound RDMA
      Read operations that the responder will allow for the requested
      connection.

   After a successful negotiation is complete, these parameters satisfy
   the following relationships:

      initiator ORD <= responder IRD

      responder ORD <= initiator IRD

   The responder and initiator MUST pass the peer's provided IRD and ORD
   values to the ULP, in addition to using the values as calculated by
   the preceding rules.

   Responder ORD SHOULD be set to a value less than or equal to
   initiator IRD.  If initiator ORD is insufficient to support the
   selected connection model, responder IRD MAY be increased.  For
   example, if initiator ORD is 0 (RDMA Reads will not be used by the
   ULP) and the responder supports use of a zero-length RDMA Read RTR
   indication, then responder IRD can be set to 1.  The responder MUST
   set its ORD to at most initiator IRD.  The responder MAY reject the
   connection request if initiator IRD is not sufficient for the
   ULP-required ORD, and specify the required ORD in the responder ORD
   field of the MPA Reject frame.  In this case, the TERM message MUST
   contain Layer 2, Error Type 0, Error Code 6.

   Upon receiving the MPA Accept frame from the responder, the initiator
   MUST set its IRD to at least responder ORD and its ORD to at most
   responder IRD.  If the initiator does not have sufficient resources
   for the required IRD, it MUST send a TERM message to the responder
   indicating insufficient resources and terminate the connection.  In
   this case, the TERM message MUST contain Layer 2, Error Type 0, Error
   Code 6.
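
   The negotiation rules above, together with the all-ones (0x3FFF)
   rule stated later in this section, can be sketched as two Python
   functions.  This is a simplified, non-normative model.
   responder_negotiate(), initiator_on_accept(), Reply, and MpaTerm are
   hypothetical names.  The responder is assumed to offer exactly the
   initiator ORD when its resources allow.  Raising MpaTerm stands in
   for sending a TERM message with Layer 2, Error Type 0.

```python
from typing import NamedTuple, Tuple

ULP_MANAGED = 0x3FFF     # all-ones: the ULP negotiates this value itself
INSUFFICIENT_IRD = 0x06  # MPA error code from Section 8


class Reply(NamedTuple):
    accepted: bool
    ird: int  # responder IRD
    ord: int  # responder ORD


class MpaTerm(Exception):
    """Hypothetical stand-in for sending a TERM (Layer 2, Error Type 0)."""

    def __init__(self, code: int) -> None:
        super().__init__(f"TERM: Layer 2, Error Type 0, Error Code {code}")
        self.code = code


def responder_negotiate(init_ird: int, init_ord: int, max_ird: int,
                        ulp_ord: int, read_rtr: bool = False,
                        reject_if_short: bool = False) -> Reply:
    """Compute the responder's reply; read_rtr means a zero-length
    RDMA Read RTR indication is supported for the selected model."""
    if init_ord == ULP_MANAGED:
        ird = ULP_MANAGED  # local IRD is left unchanged
    else:
        # Assumption: cover the initiator ORD if local resources allow.
        ird = min(init_ord, max_ird)
        if read_rtr and init_ord == 0 and max_ird >= 1:
            ird = 1  # room for a zero-length RDMA Read RTR indication
    if init_ird == ULP_MANAGED:
        return Reply(True, ird, ULP_MANAGED)  # local ORD is left unchanged
    if reject_if_short and ulp_ord > init_ird:
        # MAY reject, advertising the ORD the ULP requires.
        return Reply(False, ird, ulp_ord)
    return Reply(True, ird, min(ulp_ord, init_ird))  # ORD <= initiator IRD


def initiator_on_accept(reply: Reply, ird: int, ord_: int,
                        max_ird: int) -> Tuple[int, int]:
    """Return the initiator's new (IRD, ORD) after an MPA Accept."""
    if reply.ord != ULP_MANAGED:
        if reply.ord > max_ird:
            raise MpaTerm(INSUFFICIENT_IRD)
        ird = max(ird, reply.ord)  # IRD at least responder ORD
    if reply.ird != ULP_MANAGED:
        ord_ = min(ord_, reply.ird)  # ORD at most responder IRD
    return ird, ord_
```

   For example, responder_negotiate(4, 8, 6, 2) returns
   Reply(True, 6, 2), and initiator_on_accept() with an initiator IRD of
   4 and ORD of 8 then yields (4, 6).  The result satisfies both
   relationships above: initiator ORD 6 <= responder IRD 6, and
   responder ORD 2 <= initiator IRD 4.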

   The initiator MUST pass the responder-provided IRD and ORD to the ULP
   for both MPA Accept and Reject messages.  The initiator ULP can then
   decide its course of action; for example, it may terminate the
   established connection and renegotiate responder ORD.

   An all-ones value (0x3FFF) indicates that automatic negotiation of
   the IRD or ORD is not desired and that the ULP will be responsible
   for it.  The responder MUST respond to an initiator ORD value of
   0x3FFF by leaving its local endpoint IRD value unchanged and setting
   IRD to 0x3FFF in its reply message.  The initiator MUST leave its
   local endpoint ORD value unchanged upon receiving a responder IRD
   value of 0x3FFF.  The responder MUST respond to an initiator IRD
   value of 0x3FFF by leaving its local endpoint ORD value unchanged and
   setting ORD to 0x3FFF in its reply message.  The initiator MUST leave
   its local endpoint IRD value unchanged upon receiving a responder ORD
   value of 0x3FFF.

9.2.  Peer-to-Peer Connection Negotiation

   A Control Flag A value of 1 indicates that the peer-to-peer
   connection model is being used, and a value of 0 indicates the
   client-server model.  For each of Control Flags B, C, and D, a value
   of 1 in the MPA Request indicates that the initiator requests the
   corresponding RTR indication, a value of 1 in the MPA Reply
   indicates that the responder supports it, and a value of 0 indicates
   otherwise.  The corresponding RTR indications are a zero-length
   FULPDU (Send) for Control Flag B, a zero-length RDMA Write for
   Control Flag C, and a zero-length RDMA Read for Control Flag D.  The
   initiator MUST set Control Flag A to 1 for the peer-to-peer model.
   If Control Flag A is set to 1, the initiator MUST set each of Control
   Flags B, C, and D to 1 for each of the options it supports.
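
   The bit layout of Section 9 and the Control Flag rules of this
   section can be illustrated with a short Python encoder and decoder
   for the 32-bit enhanced RDMA connection establishment data.  It is a
   sketch, not a reference implementation.  EnhancedData, encode(), and
   decode() are hypothetical names, and the code assumes that bit 0 in
   the figure is the most significant bit of the big-endian word, as is
   usual for such diagrams.

```python
import struct
from typing import NamedTuple


class EnhancedData(NamedTuple):
    a: bool   # 1 = peer-to-peer model, 0 = client-server model
    b: bool   # zero-length FULPDU (Send) RTR indication
    c: bool   # zero-length RDMA Write RTR indication
    d: bool   # zero-length RDMA Read RTR indication
    ird: int  # 14-bit unsigned
    ord: int  # 14-bit unsigned


def encode(e: EnhancedData) -> bytes:
    """Pack the 32-bit field in network byte order."""
    if not (0 <= e.ird <= 0x3FFF and 0 <= e.ord <= 0x3FFF):
        raise ValueError("IRD and ORD are 14-bit unsigned integers")
    # B, C and D MUST be 0 whenever A is 0.
    b, c, d = (e.b, e.c, e.d) if e.a else (False, False, False)
    word = (int(e.a) << 31 | int(b) << 30 | e.ird << 16
            | int(c) << 15 | int(d) << 14 | e.ord)
    return struct.pack("!I", word)


def decode(buf: bytes) -> EnhancedData:
    """Unpack the first 32 bits of Private Data."""
    (word,) = struct.unpack_from("!I", buf)
    a = bool(word & 0x80000000)
    ird = (word >> 16) & 0x3FFF
    ord_ = word & 0x3FFF
    if not a:
        # On reception, B, C and D MUST be ignored when A is 0.
        return EnhancedData(False, False, False, False, ird, ord_)
    return EnhancedData(True, bool(word & 0x40000000), bool(word & 0x8000),
                        bool(word & 0x4000), ird, ord_)
```

   For example, a peer-to-peer request that offers the Send and RDMA
   Read RTR indications with an IRD of 4 and an ORD of 8 encodes as the
   bytes C0 04 40 08.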

   The responder MUST support at least one RTR indication option if it
   supports enhanced RDMA connection establishment.  If Control Flag A
   is 1 in the MPA Request, then the responder MUST set Control Flag A
   to 1 in the MPA Reply.  For each RTR indication option supported by
   the initiator, the responder SHOULD set the corresponding Control
   Flag in the MPA Reply if the responder can support that option.  The
   responder is not required to specify all RTR indication options it
   supports.  The responder MUST set at least one RTR indication option
   if it supports more than one initiator-specified RTR indication
   option.  The responder MAY include additional RTR indication options
   it supports, even if the initiator did not request them.  If the
   responder does not support any of the initiator-specified RTR
   indication options, then the responder MUST set at least one RTR
   indication option that it supports.

   Upon receiving the MPA Accept frame with Control Flag A set to 1, the
   initiator MUST generate one of the negotiated RTR indications.  If
   the initiator is not able to generate any of the RTR indications
   supported by the responder, then it MUST send a TERM message to the
   responder indicating failure to negotiate a mutually compatible
   connection model or RTR option, and terminate the connection.  In
   this case, the TERM message MUST contain Layer 2, Error Type 0, Error
   Code 7.  The ULP can negotiate a ULP-level RTR indication when a
   Provider-level RTR indication cannot be negotiated.

   The initiator MUST set Control Flag A to 0 for the client-server
   model.  The responder MUST set Control Flag A to 0 if Control Flag A
   is 0 in the request.  If Control Flag A is set to 0, then Control
   Flags B, C, and D MUST also be set to 0.  On reception, if Control
   Flag A is set to 0, then Control Flags B, C, and D MUST be ignored.

9.3.  Enhanced Connection Negotiation Flow

   RTR indication type and ORD/IRD negotiation proceeds in the following
   order:

   initiator (MPA Request) -->  Set Control Flag A to 1 to indicate the
      peer-to-peer connection model, and set IRD and ORD to the
      initiator's settings for its local endpoint of the connection.
      Set Control Flags B, C, and D to 1 for each initiator-supported
      RTR indication option.

   responder (MPA Reply) <--  Match the initiator's Control Flag A
      value, and set ORD/IRD for the responder's local endpoint based
      upon the initiator's initial ORD/IRD values and the number of
      simultaneous RDMA Read Requests required by the ULP.  Set Control
      Flags B, C, and D to 1 for the RTR indication options that the
      responder supports for the peer-to-peer connection model, and set
      the responder IRD/ORD to the actual values.

   initiator (First RDMA Message) -->  After modifying its ORD/IRD to
      match the responder's values as stated above, the initiator sends
      the first message of the negotiated RTR indication option.  If no
      matching RTR indication option exists, the initiator sends a TERM
      message instead.

   Before resetting the connection, the initiator or responder MUST
   generate a TERM message containing Layer 2, Error Type 0, Error Code
   5 when it encounters any local error for which no specific Error Code
   is defined in Section 8.

10.  Interoperability

   The initiator requests enhanced RDMA connection establishment by
   sending an enhanced RDMA connection establishment request.  An
   enhanced responder is REQUIRED to respond with an enhanced RDMA
   connection establishment response, whereas an unenhanced responder
   treats the enhanced request as incorrectly formatted and closes the
   TCP connection.  All responders are REQUIRED to issue unenhanced RDMA
   connection establishment responses to unenhanced RDMA connection
   establishment requests.
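
   The request handling described above can be sketched from the point
   of view of a responder that supports enhanced RDMA connection
   establishment, using the frame layout of Section 6.  The helper names
   and the Action enumeration are hypothetical, and build_request()
   assumes an initiator that requests CRCs and no Markers.  Treating a
   set 'S' bit without Rev two or higher, or without room for the 32-bit
   enhanced data, as malformed is an assumption of this sketch and not a
   rule stated in this document.

```python
import struct
from enum import Enum
from typing import Optional

REQ_KEY = b"MPA ID Req Frame"  # 16-byte Key of an MPA Request
C_BIT = 0x40                   # CRC flag in byte 16
S_BIT = 0x10                   # enhanced data present
HEADER_LEN = 20                # Key + flags/Res + Rev + PD_Length


class Action(Enum):
    ENHANCED_REPLY = "enhanced"
    UNENHANCED_REPLY = "unenhanced"
    CLOSE = "close TCP connection"


def build_request(ulp_data: bytes, enhanced: Optional[bytes] = None) -> bytes:
    """Hypothetical MPA Request builder (CRC requested, no Markers)."""
    if enhanced is None:
        flags, rev, pd = C_BIT, 1, ulp_data
    else:
        if len(enhanced) != 4:
            raise ValueError("enhanced data is exactly 32 bits")
        flags, rev, pd = C_BIT | S_BIT, 2, enhanced + ulp_data
    return REQ_KEY + struct.pack("!BBH", flags, rev, len(pd)) + pd


def responder_action(frame: bytes) -> Action:
    """Classify a request as an enhanced-capable responder might."""
    if len(frame) < HEADER_LEN or frame[:16] != REQ_KEY:
        return Action.CLOSE
    flags, rev, pd_len = struct.unpack_from("!BBH", frame, 16)
    if len(frame) < HEADER_LEN + pd_len:
        return Action.CLOSE
    if not flags & S_BIT:
        return Action.UNENHANCED_REPLY  # answer unenhanced with unenhanced
    if rev < 2 or pd_len < 4:
        return Action.CLOSE  # assumption: inconsistent enhanced request
    return Action.ENHANCED_REPLY
```

   If an enhanced attempt ends with the TCP connection being closed, an
   initiator could retry with build_request(ulp_data) alone.  This
   corresponds to the unenhanced fallback that this section permits when
   only the initiator is enhanced.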

   The initiator MUST NOT use the enhanced RDMA connection establishment
   formats or function codes when no enhanced functionality is desired.

   The responder MUST continue to accept unenhanced connection requests.

   There are three initiator/responder cases that involve enhanced MPA:
   both the initiator and responder are enhanced, only the responder is
   enhanced, and only the initiator is enhanced.  An enhanced MPA frame
   is identified by field 'S' set to 1.

   Enhanced MPA initiator and responder: If the responder receives an
      enhanced MPA message, it MUST respond with an enhanced MPA
      message.

   Enhanced MPA responder only: If the responder receives an unenhanced
      MPA message ('S' is set to 0), it MUST respond with an unenhanced
      MPA message.

   Enhanced MPA initiator only: If the responder receives an enhanced
      MPA message and it does not support enhanced RDMA connection
      establishment, it MUST close the TCP connection and exit MPA.
      From the point of view of standard RDMA connection establishment,
      an enhanced MPA frame is improperly formatted, as stated in
      [RFC5044].  Thus, both the initiator and responder report TCP
      connection termination to the application locally.  In this case,
      the initiator MAY attempt to establish an RDMA connection using
      the unenhanced MPA protocol as defined in [RFC5044], if this
      protocol is compatible with the application, and let the ULP
      handle ORD and IRD, and peer-to-peer negotiations.

   A note on potential future enhancements to connection establishment
   negotiation: it is possible to further extend the formatting of the
   Private Data of the MPA Request and Reply frames and to use other
   bits from the 'Res' field to indicate additional Private Data
   formatting.

11.  IANA Considerations

   IANA is requested to add the following entries to the "SCTP Function
   Codes for DDP Session Control" registry created by Section 3.4 of
   [IANA_RDDP_REGISTRY]:

      0x0005, Enhanced DDP Stream Session Initiate, [RFCXXXX]

      0x0006, Enhanced DDP Stream Session Accept, [RFCXXXX]

      0x0007, Enhanced DDP Stream Session Reject, [RFCXXXX]

   IANA is requested to add the following entries to the "MPA Errors"
   registry created by Section 3.3 of [IANA_RDDP_REGISTRY]:

      0x2/0x0/0x05 - MPA Error / Local catastrophic error, [RFCXXXX]

      0x2/0x0/0x06 - MPA Error / Insufficient IRD resources, [RFCXXXX]

      0x2/0x0/0x07 - MPA Error / No matching RTR option, [RFCXXXX]

   RFC Editor: Please replace XXXX in the six instances of [RFCXXXX]
   above with the RFC number of this document and remove this note.

12.  Security Considerations

   The security considerations from RFC 5044 and RFC 5043 apply, and the
   changes in this document do not introduce new security
   considerations.  However, it is recommended that implementations
   sanity-check the input parameters, including ORD, IRD, and the
   control flags used for RTR indication option negotiation.

13.  Acknowledgements

   The authors wish to thank Sean Hefty, Dave Minturn, Tom Talpey, David
   Black, and David Harrington for their valuable contributions and
   reviews of this document.

14.  References

14.1.  Normative References

   [IANA_RDDP_REGISTRY]
              "IANA Registries for the RDDP (Remote Direct Data
              Placement) Protocols", Work in Progress, October 2011.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol",
              RFC 4960, September 2007.

   [RFC5040]  Recio, R., Metzler, B., Culley, P., Hilland, J., and D.
              Garcia, "A Remote Direct Memory Access Protocol
              Specification", RFC 5040, October 2007.

   [RFC5041]  Shah, H., Pinkerton, J., Recio, R., and P. Culley, "Direct
              Data Placement over Reliable Transports", RFC 5041,
              October 2007.

   [RFC5043]  Bestler, C. and R. Stewart, "Stream Control Transmission
              Protocol (SCTP) Direct Data Placement (DDP) Adaptation",
              RFC 5043, October 2007.

   [RFC5044]  Culley, P., Elzur, U., Recio, R., Bailey, S., and J.
              Carrier, "Marker PDU Aligned Framing for TCP
              Specification", RFC 5044, October 2007.

14.2.  Informative References

   [DAPL]     "Direct Access Programming Library".

   [IBTA]     "InfiniBand Architecture Specification Release 1.2.1".

   [OFA]      "OFA verbs & APIs".

   [OpenMP]   McGraw-Hill, "Parallel Programming in C with MPI and
              OpenMP", 2003.

   [PPMPI]    Morgan Kaufmann Publishers Inc., "Parallel Programming
              with MPI", 2008.

   [RDMAC]    "RDMA Protocol Verbs Specification (Version 1.0)".

   [RDS]      Open Fabrics Association, "Reliable Datagram Socket",
              2008.

   [UsingMPI]
              MIT Press, "Using MPI-2: Advanced Features of the Message
              Passing Interface", 1999.

   [VIA]      Compaq, Intel, Microsoft, "Virtual Interface Architecture
              Specification", 1997.

Authors' Addresses

   Arkady Kanevsky (editor)
   Dell Inc.
   One Dell Way, MS PS2-47
   Round Rock, TX 78682
   USA

   Phone: +1-512-728-0000
   Email: arkady.kanevsky@gmail.com


   Caitlin Bestler (editor)
   Nexenta Systems
   555 E El Camino Real #104
   Sunnyvale, CA 94087
   USA

   Phone: +1-949-528-3085
   Email: Caitlin.Bestler@nexenta.com


   Robert Sharp
   Intel
   LAD High Performance Message Passing, Mailstop: AN1-WTR1
   1501 South Mopac, Suite 400
   Austin, TX 78746
   USA

   Phone: +1-512-493-3242
   Email: robert.o.sharp@intel.com


   Steve Wise
   Open Grid Computing
   4030 Braker Lane STE 130
   Austin, TX 78759
   USA

   Phone: +1-512-343-9196 x101
   Email: swise@opengridcomputing.com