Network Working Group                                         R. Stewart
Internet Draft                                                     Cisco
Document: draft-stewart-otis-sctp-ddp-rdma-01.txt                D. Otis
Expires: August, 2002                                           SANlight
                                                          February, 2002

                        SCTP DDP/RDMA Adaptation

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of [RFC2026].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   In many applications, it is desirable to place data directly into
   its destination without the overhead of multiple copies or excessive
   context switching. To accomplish this goal, this document defines a
   direct placement adaptation layer: a small shim that sits above SCTP
   and that may place data directly into a user buffer. The ultimate
   goal is to have the network interface card perform the placement,
   with this shim coordinating it while proper network layering is
   maintained. Because SCTP was not designed to handle offset-based
   fragmentation, the shim must fragment messages itself so that each
   fragment carries its proper offset. Because immediate placement
   requires unordered delivery, the shim must also determine when a
   message is complete and generate the corresponding completion
   notifications.
Table of Contents

   1  Introduction
   1.1  Conventions
   2  Adaptation Layer Formats
   2.1  Adaptation Layer Indicator
   2.2  DATA chunk format
   3  Procedures
   3.1  Association Initialization
   3.2  DDP Data Placement
   3.2.1  Receiver Side Behavior
   3.2.2  Sender Side Behavior
   4  IANA considerations
   5  Security Considerations

Internet Draft SCTP DDP/RDMA Adaptation Page[2]

   6  Acknowledgments
   7  Authors' Addresses
   8  References

1 Introduction

   In many applications, direct placement of data without the overhead
   of multiple copies or excessive context switching is a desirable
   feature. To accomplish this goal, this document defines a direct
   placement adaptation layer. We propose a small shim sitting directly
   above SCTP that enables data to be placed directly into user buffers
   without reassembly buffering. This assumes hardware that is able to
   validate each DATA chunk as it is received, prior to placement, and
   that each DATA chunk carries an offset within an identified user
   buffer. Some implementations may include this adaptation layer
   within their SCTP implementations to obtain maximum performance, but
   the behavior of SCTP is unaffected.

   To accomplish this, we specify the use of the new adaptation layer
   indication defined in [STEWa].

1.1 Conventions

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL, when
   they appear in this document, are to be interpreted as described in
   [RFC2119].

   DDP is an abbreviation for Direct Data Placement, and ULP is an
   abbreviation for Upper-Level Protocol.

2 Adaptation Layer Formats

2.1 Adaptation Layer Indicator

   This document defines three adaptation layers. Any one of them MAY
   appear in the INIT or INIT-ACK, using the following format defined
   in [STEWa].
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Type = 0xC006         |      Length = Variable        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   DDP Adaptation Indication                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   DDP Adaptation Indication: The following three values are allowed,
   and one of them MUST be present to enable the specific behaviors
   defined in this document:

      ANONYMOUS      - 0x00000001
      DDP            - 0x00000002
      DDP_WITH_TAG   - 0x00000003

   If ANONYMOUS is specified, completion semantics are delineated by
   Stream, and the buffer is referenced anonymously. The 'Placement
   Tag' field is passed through but holds no meaning for the shim.

   If DDP is specified, completion semantics are delineated by Stream;
   the Stream references the user buffer, and the buffer length is the
   only range limit. The 'Placement Tag' field is passed through but
   holds no meaning for the shim.

   If DDP_WITH_TAG is specified, completion semantics are delineated by
   the 'Placement Tag', which references the user buffer.

   For Upper-Level Protocols that use the DDP shim, the Payload
   Protocol Identifier indicates either a null value or an
   IANA-registered protocol identity.

2.2 DATA chunk format

   The following format MUST be used on all DATA chunks. Note that this
   format extends the existing DATA chunk; the direct placement fields
   are treated as user data by the SCTP stack. In addition, to allow
   immediate placement, all DATA chunks are sent unordered, and the
   shim is required to perform all message fragmentation before data is
   delivered to SCTP. SCTP is placed in a mode in which it refuses
   messages larger than the path MTU, so that SCTP itself never
   fragments a message.
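   As an illustration of the fragmentation rule above (and of the
   offset and flag handling required of the sender in section 3.2.2),
   the following Python sketch splits one ULP message into DATA chunk
   payloads. It is a minimal sketch under stated assumptions, not a
   reference implementation: `max_payload_for`, `fragment_message`,
   and `Fragment` are hypothetical names; the per-chunk budget assumes
   IPv4 without options (20 bytes), the 12-byte SCTP common header, the
   16-byte DATA chunk header, the 16-byte DDP Header Extension of this
   section, and no chunk bundling; and the budget is rounded down to a
   multiple of 4 so that chunk padding never pushes a packet past the
   path MTU.

```python
from dataclasses import dataclass
from typing import List

# Assumed per-packet overheads in bytes (IPv4 without options, no bundling).
IPV4_HDR = 20
SCTP_COMMON_HDR = 12
DATA_CHUNK_HDR = 16
DDP_EXT_LEN = 16  # Placement Mode/Flags (4) + Placement Tag (4) + Placement Offset (8)


def max_payload_for(pmtu: int, ip_header_len: int = IPV4_HDR) -> int:
    """Hypothetical helper: user bytes that fit in one DATA chunk for a path MTU."""
    budget = pmtu - ip_header_len - SCTP_COMMON_HDR - DATA_CHUNK_HDR - DDP_EXT_LEN
    budget -= budget % 4  # keep the 4-byte-padded chunk within the MTU
    if budget <= 0:
        raise ValueError("path MTU too small for a DDP DATA chunk")
    return budget


@dataclass
class Fragment:
    offset: int     # Placement Offset carried by this DATA chunk
    flags: int      # DDP Flags: only the last fragment carries them
    payload: bytes  # user data for this DATA chunk


def fragment_message(message: bytes, base_offset: int, flags: int,
                     max_payload: int) -> List[Fragment]:
    """Split one ULP message into fragments that each fit in a single DATA chunk."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    frags: List[Fragment] = []
    pos = 0
    while True:
        piece = message[pos:pos + max_payload]
        is_last = pos + len(piece) >= len(message)
        # Each fragment's offset is advanced by the sizes of all earlier fragments.
        frags.append(Fragment(base_offset + pos, flags if is_last else 0, piece))
        pos += len(piece)
        if is_last:
            return frags
```

   For a 1500-byte path MTU the assumed budget works out to 1436 bytes,
   so a 3000-byte message becomes fragments of 1436, 1436, and 128
   bytes at offsets 0, 1436, and 2872, with the flags present only on
   the last one. Each fragment would then be handed to SCTP as its own
   unordered message.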
   Common DATA Chunk header:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Type = 0    | Reserved|U|B|E|            Length             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              TSN                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      Stream Identifier S      |   Stream Sequence Number n    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Payload Protocol Identifier                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   DDP Header Extension:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                Placement Mode                 |     Flags     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         Placement Tag                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                       Placement Offset                        +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   \                                                               \
   /                   Data (seq n of Stream S)                    /
   \                                                               \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The following fields are defined in [RFC2960], and the reader should
   refer to it for their details: Type, Reserved, U, B, E, Length, TSN,
   Stream Identifier, Stream Sequence Number, and Payload Protocol
   Identifier.

   For DDP and DDP_WITH_TAG, Type is always 0; the U, B, and E flags
   MUST be set; Length indicates the unpadded length of the DATA chunk;
   the TSN is a unique value associated with the DATA chunk; the Stream
   Identifier indicates the Stream on which the message was sent; the
   Stream Sequence Number is not valid; and the Payload Protocol
   Identifier is determined by the layer above the shim.

   The exception is when the ANONYMOUS_MODE Placement Mode is active:
   the U, B, and E flags may take any value, and the Stream Sequence
   Number is valid if the U flag is not set.
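   To make the layout concrete, the following Python sketch packs and
   parses a DATA chunk carrying the DDP Header Extension with the
   standard `struct` module, in network byte order. It is an
   illustration under assumptions, not a reference implementation:
   `DdpChunk`, `pack_ddp_data_chunk`, and `unpack_ddp_data_chunk` are
   hypothetical names; the 24-bit Placement Mode and 8-bit Flags are
   packed together into one 32-bit word with the Mode in the upper 24
   bits; and the U, B, and E values (0x04, 0x02, 0x01) are the
   low-order bits of the DATA chunk flags byte as laid out in
   [RFC2960]. The sketch also rejects non-ANONYMOUS chunks that do not
   have U, B, and E all set, per the text above.

```python
import struct
from typing import NamedTuple

# Placement Mode values (section 2.2).
ANONYMOUS_MODE = 0x000001
DDP_MODE = 0x000002
DDP_WITH_TAG_MODE = 0x000003

# U, B, E bits in the DATA chunk flags byte (RFC 2960 layout).
U_BIT, B_BIT, E_BIT = 0x04, 0x02, 0x01

# Type, chunk flags, Length, TSN, Stream Identifier, Stream Sequence Number, PPID
DATA_HDR = struct.Struct("!BBHIHHI")
# (Placement Mode << 8 | Flags), Placement Tag, Placement Offset
DDP_EXT = struct.Struct("!IIQ")


class DdpChunk(NamedTuple):
    chunk_flags: int
    tsn: int
    sid: int
    ssn: int
    ppid: int
    mode: int
    ddp_flags: int
    tag: int
    offset: int
    data: bytes


def pack_ddp_data_chunk(c: DdpChunk) -> bytes:
    """Serialize a DATA chunk with the DDP Header Extension, padded to 4 bytes."""
    if not (0 <= c.mode < (1 << 24)) or not (0 <= c.ddp_flags <= 0xFF):
        raise ValueError("Placement Mode is 24 bits and Flags is 8 bits")
    ube = U_BIT | B_BIT | E_BIT
    if c.mode != ANONYMOUS_MODE and (c.chunk_flags & ube) != ube:
        raise ValueError("U, B and E must be set outside ANONYMOUS_MODE")
    body = DDP_EXT.pack((c.mode << 8) | c.ddp_flags, c.tag, c.offset) + c.data
    length = DATA_HDR.size + len(body)  # unpadded chunk length
    header = DATA_HDR.pack(0, c.chunk_flags, length, c.tsn, c.sid, c.ssn, c.ppid)
    return header + body + b"\x00" * (-length % 4)


def unpack_ddp_data_chunk(buf: bytes) -> DdpChunk:
    """Parse one chunk; performs no placement checks (see section 3.2.1)."""
    ctype, cflags, length, tsn, sid, ssn, ppid = DATA_HDR.unpack_from(buf, 0)
    if ctype != 0 or not (DATA_HDR.size + DDP_EXT.size <= length <= len(buf)):
        raise ValueError("not a DATA chunk with a DDP Header Extension")
    mode_flags, tag, offset = DDP_EXT.unpack_from(buf, DATA_HDR.size)
    data = bytes(buf[DATA_HDR.size + DDP_EXT.size:length])
    return DdpChunk(cflags, tsn, sid, ssn, ppid,
                    mode_flags >> 8, mode_flags & 0xFF, tag, offset, data)
```

   Length is written as the unpadded chunk length (16 + 16 + data
   bytes), while the returned bytes are padded to a 4-byte boundary. A
   receiver would still need to apply the checks in section 3.2.1
   before placing any data.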
   Placement Mode: 24 bits (unsigned integer)

      This field holds one of the following values, indicating a valid
      DDP extension:

      0x000001 - ANONYMOUS_MODE. In this mode, placement is into
      anonymous buffers. The Placement Offset contains either zero or a
      message byte displacement. Data in this chunk is not directly
      placed into user buffers. The Placement Tag field MAY contain
      information used by the ULP above the shim.

      0x000002 - DDP_MODE. In this mode, Direct Data Placement uses the
      Stream to reference user buffers. This mode MUST only be used
      when the adaptation layer indication was DDP. The Placement Tag
      field MAY contain information used by the ULP above the shim.

      0x000003 - DDP_WITH_TAG_MODE. In this mode, Direct Data Placement
      uses the 'Placement Tag' to reference user buffers. This mode
      MUST only be used when the adaptation layer indication was
      DDP_WITH_TAG.

   Flags: 8 bits (unsigned integer)

      Bit 0    - Acknowledgement Requested. A signal to the ULP above
                 the shim indicating that an acknowledgement was
                 requested upon completion.

      Bit 1    - Disclose. A signal to the ULP above the shim indicating
                 completion of the message; it MAY invoke a process
                 related to the current buffer.

      Bit 2    - Release Buffer. A signal to the ULP that the current
                 buffer MAY be released to a process upon completion.

      Bits 3-7 - Reserved.

      Completion signals are held until the cumulative TSN is greater
      than or equal to the TSN of the DATA chunk carrying the signal
      flag. Because DATA chunks are delivered unordered, the chunk
      carrying a signal may arrive before earlier chunks; waiting for
      the cumulative TSN ensures that every DATA chunk with a lower TSN
      has also been received. Comparisons and arithmetic on TSNs in
      this document SHOULD use Serial Number Arithmetic as defined in
      [RFC1982] with SERIAL_BITS = 32.

   Placement Tag: 32 bits (unsigned integer)

      When the Placement Mode is set to ANONYMOUS_MODE or DDP_MODE, this
      field may hold information used by the ULP; otherwise, it holds
      the reference to the user buffer. This tag is used to look up the
      actual buffer address, limits, and restrictions in the local
      endpoint's tag lookup cache.
   Placement Offset: 64 bits (unsigned integer)

      When the Placement Mode is not set to ANONYMOUS_MODE, this value
      holds the placement byte offset. The local endpoint MUST verify
      that the offset is within a valid range for the user buffer. If
      the Placement Mode is set to ANONYMOUS_MODE, this value may hold
      either zero or the message byte displacement.

3 Procedures

3.1 Association Initialization

   At the startup of an association, an endpoint wishing to perform DDP
   placement MUST include an adaptation layer indication in its INIT or
   INIT-ACK (as defined in section 2.1).

   After the exchange of the first two messages (INIT and INIT-ACK), an
   endpoint MUST verify that the peer supports the DDP mode and Payload
   Protocol Identifier by confirming that the peer included one of the
   adaptation indications.

   If the peer did specify a DDP adaptation, then ALL DATA chunks MUST
   contain the header extension specified in section 2.2, and the
   endpoint SHOULD enable the indicated adaptation. The value of the
   Payload Protocol Identifier in subsequent DATA chunks is defined by
   the ULP.

   If the peer endpoint did NOT specify a DDP placement adaptation,
   then the local endpoint MUST disable DDP adaptation, and it MUST NOT
   send DATA chunks with the additional fields specified in section
   2.2.

3.2 DDP Data Placement

3.2.1 Receiver Side Behavior

   When a DATA chunk arrives and DDP placement adaptation has been
   enabled, the following procedures MUST be performed.

   R1 - If the Placement Mode is set to DDP_MODE and the peer endpoint
        did not indicate DDP in its adaptation indication, the endpoint
        MUST abort the association.

   R2 - If the Placement Mode is set to DDP_WITH_TAG_MODE and the peer
        endpoint did not indicate DDP_WITH_TAG in its adaptation
        indication, the endpoint MUST abort the association.

   R3 - If the Placement Mode is set to a recognized mode other than
        ANONYMOUS_MODE, the endpoint MUST use its lookup cache to
        determine the buffer that receives the payload of this DATA
        chunk.
        For modes using the 'Placement Tag', this field SHOULD be used
        to obtain buffer-related information. The buffer SHOULD be
        indexed by the Placement Offset, and the data SHOULD be placed
        directly within the buffer.

        Note: Great caution must be taken when referencing buffers with
        offsets. The Placement Tag SHOULD NOT be a direct memory address
        but instead an index to be translated into a memory address,
        memory limits, and read/write restrictions. The Placement Offset
        must be carefully verified to ensure that the offset, together
        with the length of the data being placed, lies within the valid
        range of the indicated buffer. If any data placement
        specification is incorrect, the association SHOULD be aborted.

   R4 - Otherwise, if the Placement Mode is set to ANONYMOUS_MODE, the
        endpoint MUST pass the message into anonymous buffers for the
        ANONYMOUS adaptation process associated with the Stream.

   R5 - Release completion signals to the processes associated with the
        buffers when the cumulative TSN is greater than or equal to the
        TSN of the DATA chunk carrying the signal flag.

3.2.2 Sender Side Behavior

   The sender of a message MUST always include an extension header if a
   DDP adaptation is enabled. The sender MUST perform the following
   when sending data:

   S1 - If ANONYMOUS was specified by the sender in its adaptation
        indication, the Placement Mode must be set to ANONYMOUS_MODE.

   S2 - If the message is not to be directly placed into a user buffer
        (such as a negotiation message or a read request), the sender
        MUST set the Placement Mode to ANONYMOUS_MODE, and the Placement
        Offset field contains either zero or a message byte
        displacement. The Placement Tag field may contain information
        used by the ULP.

   S3 - If the message is to be directly placed into a user buffer, the
        Placement Mode SHOULD be set to the appropriate mode, and the
        Stream, tag, offset, and flags SHOULD be placed into the
        appropriate fields in the outgoing DATA chunk.
        For messages that must be fragmented by the shim, only the last
        DATA chunk of the message includes the flag values, and each
        subsequent fragment has its Placement Offset advanced by the sum
        of the sizes of all previous fragments.

4 IANA considerations

   This document defines three new Adaptation Layer Indications, as
   specified in section 2.1.

5 Security Considerations

   Any direct placement into memory poses a significant security risk.
   Great caution must be taken when referencing offsets to memory
   addresses on behalf of peer endpoints. The Placement Tag SHOULD NOT
   be a direct memory address passed to a peer but instead an index to
   be translated into a memory address. The Placement Offset must be
   carefully verified to ensure that the Offset, together with the
   length of the data being placed, is within a valid range of the
   buffer. If any data placement specification is incorrect, the
   association SHOULD be aborted.

6 Acknowledgments

   The authors would like to thank the following people who have
   provided comments and input: Stephen Bailey, Allyn Romanow, and
   Caitlin Bestler.

7 Authors' Addresses

   Randall R. Stewart
   24 Burning Bush Trail
   Crystal Lake, IL 60012
   USA

   EMail: rrs@cisco.com

   Douglas Otis
   800 E. Middlefield
   Mountain View, CA 94043
   USA

   EMail: dotis@sanlight.net

8 References

   [RFC1982] Elz, R. and R. Bush, "Serial Number Arithmetic", RFC 1982,
             August 1996.

   [RFC2026] Bradner, S., "The Internet Standards Process -- Revision
             3", BCP 9, RFC 2026, October 1996.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2960] Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
             Schwarzbauer, H. J., Taylor, T., Rytina, I., Kalla, M.,
             Zhang, L. and V. Paxson, "Stream Control Transmission
             Protocol", RFC 2960, October 2000.
   [STEWa]   Stewart, Ramalho, Xie, Tuexen, Rytina and Conrad, "SCTP
             Extensions for Dynamic Reconfiguration of IP Addresses",
             draft-ietf-tsvwg-addip-sctp-03.txt, work in progress,
             November 2001.

Full Copyright Statement

   Copyright (C) The Internet Society (2002). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

   Funding for the RFC Editor function is currently provided by the
   Internet Society.