Internet Draft                                              J. Pinkerton
Category: Informational                                        Microsoft
Expires: December 2001                                         M. Krause
                                                                      HP
                                                               S. Bailey
                                                               Sandburst
                                                               June 2001


                   Requirements for an RDMA Protocol
                  draft-pinkerton-rdma-reqmts-00.txt


Status of this Memo

   This document is an Internet Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet
   Drafts.

   Internet Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (2001).  All Rights Reserved.

Abstract

   This document proposes requirements for a Remote Direct Memory
   Access (RDMA) Protocol to run on TCP and SCTP transport protocols.
   An RDMA protocol provides a general facility for extremely high
   efficiency (low CPU cost per unit of data transferred), end-to-end
   data transfer.  An RDMA protocol enables high efficiency data
   transfer by allowing network interfaces with hardware support for
   the RDMA protocol to perform zero-copy data transfer directly among
   application buffers.  An RDMA protocol is an intermediary protocol
   onto which other upper layer protocols (ULPs), such as storage
   (e.g. SCSI), file (e.g. NFS) and interprocess communication
   protocols, may be mapped.  ULPs mapped onto an RDMA protocol benefit
   from high efficiency data transfer without requiring any
   ULP-specific hardware.
   In other words, an RDMA protocol permits a single network interface
   hardware mechanism to accelerate data transfer for a wide range of
   present and future applications.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [RFC2119].

1. Definitions

   In this document, the following definitions will be used.

   transport protocol - a layer 4 protocol specifically identified in
      section 4 below (e.g. TCP or SCTP).

   chunk - smallest unit of data transfer for the RDMA protocol.  A
      chunk is required to fit within a single transport protocol
      segment.

   message - a single RDMA protocol command, which may be subdivided
      into multiple chunks for a variety of reasons, including
      segmentation to respect the maximum segment size of the
      transport protocol.

2. Goals

   The RDMA protocol SHOULD satisfy the following goals.

   1. The RDMA protocol SHALL provide direct data placement into
      receive application buffers, even when chunks are received out
      of order.  This implies:

      a. each chunk SHALL be self-describing in terms of the
         destination buffer location,

      b. a generic ULP header / data split to independent buffers
         SHALL be defined to enable RDMA middleware protocols to be
         supported while preserving application buffer zero-copy
         semantics.

   2. The RDMA protocol SHALL provide strong end-to-end data integrity
      over supported transport protocols.  Data integrity will be
      implemented either by the transport protocol, or with additional
      mechanisms specified by the RDMA protocol, based on transport
      protocol capabilities and present and anticipated ULP
      requirements.

   3. The RDMA protocol SHALL provide two levels of error response:

      a. session termination on error detection - abort all current
         and in-flight operations,

      b. abort current and / or subsequent operations and allow the
         ULP to initiate recovery.

   4. The RDMA protocol SHALL provide defined memory region
      advertisement semantics, error detection, and associated wire
      protocol.

   5. The RDMA protocol SHALL define ULP payload segmentation and
      reassembly across multiple transport protocol segments when not
      explicitly supported by the underlying transport protocol (also
      known as chunking).

   6. The RDMA protocol SHALL provide a means to accumulate multiple
      RDMA protocol messages into a single receive notification
      event.

3. Scope of Work

   The RDMA protocol SHOULD conform to the following scope guidelines
   in satisfying its goals.

   1. The RDMA protocol specification SHALL define a wire protocol and
      its semantics for:

      a. startup,

      b. data transfer,

      c. teardown.

   2. The RDMA protocol SHALL NOT duplicate transport protocol
      capabilities, including:

      a. congestion management,

      b. reliable delivery.

   3. The RDMA protocol specification SHALL assume:

      a. invalid transport protocol segments (e.g. duplicate packets
         or corrupt packets) SHALL NOT be transferred into application
         buffer memory,

      b. all completions SHALL occur in order.

4. Interaction with Transport Protocols

   The RDMA protocol interface to transport protocols SHOULD satisfy
   the following requirements.

   1. The RDMA protocol SHALL run on top of TCP and SHALL require TCP
      framing headers.  When running on TCP the RDMA protocol SHALL:

      a. provide better error detection than the existing TCP 16-bit
         ones-complement checksum,

      b. not provide better error correction than provided by TCP.
         This means that when running on TCP, the RDMA protocol SHALL
         NOT specify any additional retransmission algorithm to
         recover from detected errors beyond what is already provided
         by TCP,

      c. specify how to send ULP payloads larger than a TCP segment.

   2. The RDMA protocol SHALL run on top of SCTP.
      When running on SCTP, the RDMA protocol SHALL:

      a. use existing SCTP error detection and correction mechanisms.

5. Initialization and Teardown

   The RDMA protocol initialization and teardown SHOULD satisfy the
   following requirements.

   1. Initialization of the RDMA protocol MAY be done at any time
      after transport protocol connection setup.

   2. Initialization packet format SHALL NOT be specified by the
      specification.  The RDMA protocol SHALL require at least the
      following variables be negotiated at initialization:

      a. flag specifying either in-order data delivery or out-of-order
         data delivery (in either case in-order completions are
         required),

      b. number of simultaneous outstanding RDMA Reads.

   3. No mechanisms SHALL be provided to disable the RDMA protocol
      once it is enabled.  Instead, the transport protocol connection
      MUST be terminated.

6. Data transfer mechanisms

   The RDMA protocol data transfer mechanisms SHOULD satisfy the
   following requirements.

   1. Remote DMA transfer:

      a. The RDMA protocol specification SHALL define packet formats
         for:

         i. RDMA receive memory region advertisement,

         ii. RDMA Write,

         iii. RDMA Write NAK - if an error occurred on the RDMA Write,
              a NAK will be generated,

         iv. RDMA Read Request,

         v. RDMA Read Reply.

      b. The specification SHALL use a memory region tag and offset to
         describe destination memory:

         i. the offset SHALL be either zero based or have a fixed
            offset.  This can be used either to provide a mapping
            directly to a receive virtual address space (fixed offset)
            or to be relative to the beginning of the memory region
            (zero offset);

         ii. the memory region offset SHALL be at least 64 bits;

         iii. the memory region tag SHALL be at least 32 bits;

         iv. the receiver's memory region SHALL provide byte-boundary
             protection against errant RDMA transfers.
             In other words, errant RDMA transfers SHALL NOT be
             permitted to access memory outside of the buffer posted.

      c. Demultiplexing intermediate ULP headers from application data
         SHALL be enabled through any of the following:

         i. sending the intermediate ULP header using RDMA into a
            separate memory region from the application data buffer,

         ii. sending the intermediate ULP header using Send messages
             where the application data is sent via either RDMA or
             Send mechanisms.

      d. The RDMA protocol SHALL NOT specify any explicit buffer flow
         control.  Necessary flow control definition is left to the
         ULP.

   2. Send message based transfer:

      a. SHALL use a message sequence number and offset from the start
         of a message:

         i. send message sequence numbers SHALL be at least 64 bits,

         ii. send message offsets SHALL be at least 32 bits.

      b. if the receiver's buffer is smaller than a Send message, the
         Send message SHALL be split across multiple receiver buffers.
         Such an occurrence is not an error.

      c. The RDMA protocol SHALL NOT specify any explicit Send message
         flow control.  Necessary flow control definition is left to
         the ULP.

7. Common issues

   The RDMA protocol SHOULD satisfy the following additional, overall
   requirements.

   1. The RDMA protocol SHALL ensure in-order completions of any
      combination of RDMA Reads, RDMA Writes and Send messages.

   2. The RDMA protocol SHALL provide a mechanism for an explicit,
      sender-managed ULP receive completion model.  This notification
      mechanism will allow collapsing notification of multiple RDMA
      Writes and Sends into a single notification event to the
      receiving ULP.

   3. A completion event SHALL only occur on the last chunk of a
      message.

   4. General scatter into remote buffers SHALL NOT be directly
      supported.  Posted buffers MUST be virtually contiguous.

   5. Out-of-order RDMA transfers to the same destination address are
      not guaranteed to occur in order.

8. Recommended approaches

   These recommendations are intended to be informative to the RDMA
   protocol specification process.  They are not requirements.

   1. The RDMA protocol SHOULD use a base header with extension
      headers as appropriate.

   2. Within the confines of permitting high-performance hardware
      implementation, the RDMA protocol SHOULD use headers that are as
      small as possible.  RDMA protocol headers are expected to occur
      frequently, so small headers will substantially reduce protocol
      overhead.

   3. RDMA and Send message headers SHOULD include either an opcode or
      a flag field, representing:

      a. Completion bit (C) - send a completion event to the ULP,

      b. Last bit (L) - the Last bit SHALL be set in the last chunk of
         a message,

      c. Transmit Error bit (E) - the transmitter encountered an error
         while sending this RDMA message.

9. Security Considerations

   Security considerations are not discussed in this memo.

10. References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

11. Authors' Addresses

   Jim Pinkerton
   Microsoft, Inc.
   1 Microsoft Way
   Redmond, WA 98052 USA
   Phone:
   EMail: jpink@microsoft.com

   Michael Krause
   Hewlett Packard Corporation 43LN
   19420 Homestead Road
   Cupertino, CA 95014 USA
   Phone: +1 408 447 3191
   EMail: krause@cup.hp.com

   Stephen Bailey
   Sandburst Corporation
   600 Federal Street, 2nd Floor
   Andover, MA 01810 USA
   Phone: +1 978 689 1614
   EMail: steph@sandburst.com

12. Full Copyright Statement

   Copyright (C) The Internet Society (2001).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph
   are included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
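
As a non-normative illustration of Section 6, item 1.b, the following minimal Python sketch shows one way a receiver might place self-describing chunks directly into an advertised memory region while refusing any transfer that would touch memory outside the posted buffer. The names MemoryRegion, Chunk, and place_chunk are hypothetical and are not defined by this memo. The sketch assumes a zero-based offset and ignores wire encoding, field widths, and completion handling.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MemoryRegion:
    """A posted receive buffer advertised under a tag (hypothetical type)."""
    tag: int            # memory region tag (the memo asks for at least 32 bits)
    buffer: bytearray   # virtually contiguous posted buffer


@dataclass
class Chunk:
    """A self-describing chunk: it carries its own destination (hypothetical type)."""
    tag: int            # which advertised region to write into
    offset: int         # zero-based offset into that region (assumption of this sketch)
    payload: bytes


def place_chunk(regions: Dict[int, MemoryRegion], chunk: Chunk) -> bool:
    """Copy the payload to its destination; return False for an errant transfer."""
    region = regions.get(chunk.tag)
    if region is None:
        return False  # unknown tag: nothing was advertised under it
    end = chunk.offset + len(chunk.payload)
    # Byte-boundary protection: every byte must fall inside the posted buffer.
    if chunk.offset < 0 or end > len(region.buffer):
        return False
    region.buffer[chunk.offset:end] = chunk.payload
    return True
```

Because each chunk names its own tag and offset, chunks can be placed in whatever order they arrive; the bounds check is applied per chunk, so a transfer that overruns the region by even one byte is rejected without modifying the buffer.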