INTERNET-DRAFT                                              Hemal Shah
draft-shah-iwarp-ddp-01.txt                          Intel Corporation
                                                        James Pinkerton
                                                  Microsoft Corporation
                                                           Renato Recio
                                                        IBM Corporation
                                                            Paul Culley
                                                Hewlett-Packard Company

Expires: April, 2003

            Direct Data Placement over Reliable Transports

1 Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.
   Note that other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html
   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

2 Abstract

   The Direct Data Placement protocol provides information to Place the
   incoming data directly into an upper layer protocol's receive buffer
   without intermediate buffers. This removes excess CPU and memory
   utilization associated with transferring data through the
   intermediate buffers.

shah, et. al.                Expires April 2003                      1

Table of Contents

   1      Status of this Memo
   2      Abstract
   3      Introduction
   3.1    Architectural Goals
   3.2    Protocol Overview
   3.3    DDP Layering
   4      Glossary
   4.1    General
   4.2    LLP
   4.3    Direct Data Placement (DDP)
   5      Reliable Delivery LLP Requirements
   6      Header Format
   6.1    DDP Control Field
   6.2    DDP Tagged Buffer Model Header
   6.3    DDP Untagged Buffer Model Header
   6.4    DDP Segment Format
   7      Data Transfer
   7.1    DDP Tagged or Untagged Buffer Models
   7.1.1  Tagged Buffer Model
   7.1.2  Untagged Buffer Model
   7.2    Segmentation and Reassembly of a DDP Message
   7.3    Ordering Among DDP Messages
   7.4    DDP Message Completion & Delivery
   8      DDP Stream Setup & Teardown
   8.1    DDP Stream Setup
   8.2    DDP Stream Teardown
   8.2.1  DDP Graceful Teardown
   8.2.2  DDP Abortive Teardown
   9      Error Semantics
   9.1    Errors detected at the Data Sink
   9.2    DDP Error Numbers
   10     Security Considerations
   10.1   Protocol-specific Security Considerations
   10.2   Using IPSec with DDP
   10.3   Other Security Considerations
   11     IANA Considerations
   12     References
   12.1   Normative References
   12.2   Informative References
   13     Appendix
   13.1   Receive Window sizing
   14     Author's Addresses
   15     Acknowledgments
   16     Full Copyright Statement

Table of Figures

   Figure 1  DDP Layering
   Figure 2  MPA, DDP, and RDMAP Header Alignment
   Figure 3  DDP Control Field
   Figure 4  Tagged Buffer DDP Header
   Figure 5  Untagged Buffer DDP Header
   Figure 6  DDP Segment Format

3 Introduction

   Direct Data Placement Protocol (DDP) enables an Upper Layer Protocol
   (ULP) to send data to a Data Sink without requiring the Data Sink to
   Place the data in an intermediate buffer - thus when the data
   arrives at the Data Sink, the network interface can Place the data
   directly into the ULP's buffer. This can enable the Data Sink to
   consume substantially less memory bandwidth than a buffered model
   because the Data Sink is not required to move the data from the
   intermediate buffer to the final destination. This can also enable
   the network protocol to consume substantially fewer CPU cycles than
   if the CPU were used to move the data, and removes the bandwidth
   limitation of only being able to move data as fast as the CPU can
   copy the data.

   DDP preserves ULP record boundaries (messages) while providing a
   variety of data transfer mechanisms and completion mechanisms to be
   used to transfer ULP messages.

3.1 Architectural Goals

   DDP has been designed with the following high-level architectural
   goals:

   * Provide a buffer model that enables the Local Peer to Advertise
     a named buffer (i.e. a Tag for a buffer) to the Remote Peer,
     such that across the network the Remote Peer can Place data
     into the buffer at Remote Peer specified locations. This is
     referred to as the Tagged Buffer Model.

   * Provide a second receive buffer model which preserves ULP
     message boundaries from the Remote Peer and keeps the Local
     Peer's buffers anonymous (i.e. Untagged). This is referred to
     as the Untagged Buffer Model.

   * Provide reliable, in-order Delivery semantics for both Tagged
     and Untagged Buffer Models.

   * Provide segmentation and reassembly of ULP messages.

   * Enable the ULP buffer to be used as a reassembly buffer,
     without a need for a copy, even if incoming DDP Segments arrive
     out of order. This requires the protocol to separate Data
     Placement of ULP Payload contained in an incoming DDP Segment
     from Data Delivery of completed ULP Messages.

   * If the LLP supports multiple LLP streams within an LLP
     Connection, provide the above capabilities independently on
     each LLP stream and enable the capability to be exported on a
     per LLP stream basis to the ULP.

3.2 Protocol Overview

   DDP supports two basic data transfer models - a Tagged Buffer data
   transfer model and an Untagged Buffer data transfer model.

   The Tagged Buffer data transfer model requires the Data Sink to send
   the Data Source an identifier for the ULP buffer, referred to as a
   Steering Tag (STag). The STag is transferred to the Data Source
   using a ULP defined method. Once the Data Source ULP has an STag for
   a destination ULP buffer, it can request that DDP send the ULP data
   to the destination ULP buffer by specifying the STag to DDP. Note
   that the Tagged Buffer does not have to be filled starting at the
   beginning of the ULP buffer. The ULP Data Source can provide an
   arbitrary offset into the ULP buffer.

   The Untagged Buffer data transfer model enables data transfer to
   occur without requiring the Data Sink to Advertise a ULP Buffer to
   the Data Source. The Data Sink can queue up a series of receive ULP
   buffers.
   An Untagged DDP Message from the Data Source consumes an
   Untagged Buffer at the Data Sink. Because DDP is message oriented,
   even if the Data Source sends a DDP Message payload smaller than the
   receive ULP buffer, the partially filled receive ULP buffer is
   Delivered to the ULP anyway. If the Data Source sends a DDP Message
   payload larger than the receive ULP buffer, it results in an error.

   There are several key differences between the Tagged and Untagged
   Buffer Models:

   * For the Tagged Buffer Model, the Data Source specifies which
     received Tagged Buffer will be used for a specific Tagged DDP
     Message (sender-based ULP buffer management). For the Untagged
     Buffer Model, the Data Sink specifies the order in which
     Untagged Buffers will be consumed as Untagged DDP Messages are
     received (receiver-based ULP buffer management).

   * For the Tagged Buffer Model, the ULP at the Data Sink must
     Advertise the ULP buffer to the Data Source through a ULP
     specific mechanism before data transfer can occur. For the
     Untagged Buffer Model, data transfer can occur without an
     end-to-end explicit ULP buffer Advertisement. Note, however, that
     the ULP needs to address flow control issues because if a DDP
     Message arrives for an Untagged Buffer without an associated
     receive ULP buffer, the DDP Message is dropped, the DDP Stream
     is disabled for reception, and an error is reported to the ULP
     at the Data Sink.

   * For the Tagged Buffer Model, a DDP Message can start at an
     arbitrary offset within the Tagged Buffer. For the Untagged
     Buffer Model, a DDP Message can only start at offset 0.

   * The Tagged Buffer Model allows multiple DDP Messages targeted
     to a Tagged Buffer with a single ULP buffer Advertisement. The
     Untagged Buffer Model requires associating a receive ULP buffer
     for each DDP Message targeted to an Untagged Buffer.
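
   The receiver-based buffer management of the Untagged Buffer Model
   described above can be illustrated with a small sketch. This is not
   part of the protocol definition: the class UntaggedQueue and its
   methods post_receive and receive_message are hypothetical names
   chosen for illustration, and a real Data Sink would also track
   queue numbers, sequence numbers, and per-segment Placement.

```python
from collections import deque


class UntaggedQueue:
    """Hypothetical Data Sink receive queue for the Untagged Buffer Model."""

    def __init__(self):
        self.buffers = deque()  # receive ULP buffers, consumed in posted order
        self.enabled = True     # False once the DDP Stream is disabled for reception

    def post_receive(self, size):
        """The ULP queues a receive buffer; nothing is Advertised to the peer."""
        self.buffers.append(bytearray(size))

    def receive_message(self, payload):
        """Consume one buffer for one Untagged DDP Message.

        Returns (buffer, octets_used) so that a partially filled buffer
        can still be Delivered to the ULP.
        """
        if not self.enabled:
            raise RuntimeError("DDP Stream is disabled for reception")
        if not self.buffers:
            # No receive ULP buffer: the message is dropped, the stream is
            # disabled for reception, and the error is reported to the ULP.
            self.enabled = False
            raise RuntimeError("no receive ULP buffer for Untagged DDP Message")
        if len(payload) > len(self.buffers[0]):
            # The text only says this "results in an error"; what happens to
            # the buffer afterwards is not modelled here (assumption).
            raise ValueError("DDP Message payload larger than receive ULP buffer")
        buf = self.buffers.popleft()
        buf[:len(payload)] = payload  # Untagged DDP Messages start at offset 0
        return buf, len(payload)
```

   In this sketch a 3-octet message arriving for an 8-octet buffer
   returns the whole buffer with a used length of 3, which mirrors the
   "partially filled buffer is Delivered anyway" behavior. The flow
   control needed to avoid the no-buffer case is left to the ULP.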

   Either data transfer model Places a ULP Message into a DDP Message.
   Each DDP Message is then sliced into DDP Segments that are intended
   to fit within a Lower Layer Protocol's (LLP) Maximum Upper Layer
   Protocol Data Unit (MULPDU). Thus the ULP can post arbitrary size
   ULP Messages, containing up to 2^32 - 1 octets of ULP Payload, and
   DDP slices the ULP message into DDP Segments which are reassembled
   transparently at the Data Sink.

   DDP provides in-order Delivery for the ULP. However, DDP
   differentiates between Data Delivery and Data Placement. DDP
   provides enough information in each DDP Segment to allow the ULP
   Payload in each inbound DDP Segment to be directly Placed
   into the correct ULP Buffer, even when the DDP Segments arrive
   out-of-order. Thus, DDP enables the reassembly of ULP Payload
   contained in DDP Segments of a DDP Message into a ULP Message to
   occur within the ULP Buffer, therefore eliminating the traditional
   copy out of the reassembly buffer into the ULP Buffer.

   A DDP Message's payload is Delivered to the ULP when:

   * all DDP Segments of a DDP Message have been completely received
     and the payload of the DDP Message has been Placed into the
     associated ULP Buffer,

   * all prior DDP Messages have been Placed, and

   * all prior DDP Message Deliveries have been performed.

   The LLP under DDP may support a single LLP stream of data per
   connection (e.g. TCP) or multiple LLP streams of data per connection
   (e.g. SCTP). But in either case, DDP is specified such that each DDP
   Stream is independent and maps to a single LLP stream. Within a
   specific DDP Stream, the LLP Stream is required to provide in-order,
   reliable Delivery. Note that DDP has no ordering guarantees between
   DDP Streams.

   A DDP protocol could potentially run over reliable Delivery LLPs or
   unreliable Delivery LLPs. This specification requires reliable,
   in-order Delivery LLPs.
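
   The slicing of a DDP Message into DDP Segments, and the Placement
   of those segments in any arrival order, can be sketched as follows.
   This is an illustration under stated assumptions rather than a
   normative algorithm: the function names segment_message and
   place_segments are hypothetical, the header lengths are derived
   from the field widths shown in Figure 4 and Figure 5, and the sketch
   assumes that the DDP Header counts against the MULPDU and that a
   zero-length message is sent as one header-only segment.

```python
# Header sizes in octets, derived from the field widths in the figures
# (assumption: no other per-segment DDP overhead counts against the MULPDU).
TAGGED_HEADER_LEN = 14    # control (1) + RsvdULP (1) + STag (4) + TO (8)
UNTAGGED_HEADER_LEN = 18  # control (1) + RsvdULP (5) + QN (4) + MSN (4) + MO (4)


def segment_message(payload_len, mulpdu, header_len):
    """Yield (offset, length, last) for each DDP Segment of one DDP Message."""
    max_payload = mulpdu - header_len
    if max_payload <= 0:
        raise ValueError("MULPDU too small to carry any ULP Payload")
    if payload_len == 0:
        # Assumption: an empty message is one segment with the Last flag set.
        yield (0, 0, True)
        return
    offset = 0
    while offset < payload_len:
        length = min(max_payload, payload_len - offset)
        last = (offset + length == payload_len)
        yield (offset, length, last)
        offset += length


def place_segments(ulp_buffer, message, segments):
    """Place each segment's payload at its own offset in the ULP buffer.

    Each segment carries its offset, so the result does not depend on
    the order in which the segments arrive.
    """
    for offset, length, _last in segments:
        ulp_buffer[offset:offset + length] = message[offset:offset + length]
```

   For example, a 10-octet message with a MULPDU of 18 and the Tagged
   header leaves 4 octets of payload per segment, giving three
   segments of which only the final one is marked last. Placing them
   in reverse order still fills the ULP buffer correctly. Delivery to
   the ULP would still wait for the conditions listed above.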

3.3 DDP Layering

   DDP is intended to be LLP independent, subject to the requirements
   defined in section 5. However, DDP was specifically defined to be
   part of a family of protocols that were created to work well
   together, as shown in Figure 1. For the definition of each LLP,
   see [MPA], [TCP], and [SCTP].

   DDP enables direct data Placement capability for any ULP, but it has
   been specifically designed to work well with RDMAP (see [RDMAP]),
   and is part of the iWARP protocol suite.

                      +-------------------+
                      |                   |
                      |     RDMA ULP      |
                      |                   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                 |                 |
      |      ULP        |      RDMAP      |
      |                 |                 |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                   |
      |           DDP protocol            |
      |                                   |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                 |                 |
      |       MPA       |                 |
      |                 |                 |
      |                 |                 |
      +-+-+-+-+-+-+-+-+-+      SCTP       |
      |                 |                 |
      |       TCP       |                 |
      |                 |                 |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 1 DDP Layering

   If DDP is layered below RDMAP and on top of MPA and TCP, then the
   respective headers and payload are arranged as follows (Note: For
   clarity, MPA header and CRC are included but framing markers are not
   shown.):

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   //                         TCP Header                          //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               |
   //                         DDP Header                          //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   //                        RDMAP Header                         //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   //                      RDMAP ULP Payload                      //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 2 MPA, DDP, and RDMAP Header Alignment

4 Glossary

4.1 General

   Advertisement (Advertised, Advertise, Advertisements, Advertises) -
      The act of informing a Remote Peer that a local RDMA Buffer is
      available to it. A Node makes available an RDMA Buffer for
      incoming RDMA Read or RDMA Write access by informing its
      RDMA/DDP peer of the Tagged Buffer identifiers (STag, base
      address, length). This advertisement of Tagged Buffer
      information is not defined by RDMA/DDP and is left to the ULP. A
      typical method would be for the Local Peer to embed the Tagged
      Buffer's Steering Tag, address, and length in a Send message
      destined for the Remote Peer.

   Data Delivery (Delivery, Delivered, Delivers) - Delivery is defined
      as the process of informing the ULP or consumer that a
      particular Message is available for use. This is specifically
      different from "Placement", which may generally occur in any
      order, while the order of "Delivery" is strictly defined. See
      "Data Placement".

   Data Sink - The peer receiving a data payload.
      Note that the Data
      Sink can be required to both send and receive RDMA/DDP Messages
      to transfer a data payload.

   Data Source - The peer sending a data payload. Note that the Data
      Source can be required to both send and receive RDMA/DDP
      Messages to transfer a data payload.

   iWARP - A suite of wire protocols comprised of RDMAP [RDMAP], DDP
      [DDP], and MPA [MPA]. The iWARP protocol suite may be layered
      above TCP, SCTP, or other transport protocols.

   Local Peer - The RDMA/DDP protocol implementation on the local end
      of the connection. Used to refer to the local entity when
      describing a protocol exchange or other interaction between two
      Nodes.

   Node - A computing device attached to one or more links of a
      network. A Node in this context does not refer to a specific
      application or protocol instantiation running on the computer. A
      Node may consist of one or more RNICs installed in a host
      computer.

   Remote Peer - The RDMA/DDP protocol implementation on the opposite
      end of the connection. Used to refer to the remote entity when
      describing protocol exchanges or other interactions between two
      Nodes.

   ULP - Upper Layer Protocol. The protocol layer above the protocol
      layer currently being referenced. The ULP for RDMA/DDP is
      expected to be an OS, Application, adaptation layer, or
      proprietary device. The RDMA/DDP documents do not specify a ULP
      - they provide a set of semantics that allow a ULP to be
      designed to utilize RDMA/DDP.

   ULP Message - The ULP data that is handed to a specific protocol
      layer for transmission. Data boundaries are preserved as they
      are transmitted through iWARP.

   ULP Payload - The ULP data that is contained within a single
      protocol segment or packet (e.g. a DDP Segment).

4.2 LLP

   LLP - Lower Layer Protocol. The protocol layer beneath the protocol
      layer currently being referenced.
      For example, for DDP the LLP
      is SCTP, MPA, or other transport protocols. For RDMA, the LLP is
      DDP.

   LLP Connection - Corresponds to an LLP transport-level connection
      between the peer LLP layers on two Nodes.

   LLP Stream - Corresponds to a single LLP transport-level stream
      between the peer LLP layers on two Nodes. One or more LLP
      Streams may map to a single transport-level LLP Connection. For
      transport protocols that support multiple streams per connection
      (e.g. SCTP), an LLP Stream corresponds to one transport-level
      stream.

   MULPDU - Maximum ULPDU. The current maximum size of the record that
      is acceptable for DDP to pass to the LLP for transmission.

   ULPDU - Upper Layer Protocol Data Unit. The data record defined by
      the layer above MPA.

4.3 Direct Data Placement (DDP)

   DDP Graceful Teardown - The act of closing a DDP Stream such that
      all in-progress and pending DDP Messages are allowed to complete
      successfully.

   DDP Abortive Teardown - The act of closing a DDP Stream without
      attempting to complete in-progress and pending DDP Messages.

   Data Placement (Placement, Placed, Places) - For DDP, this term is
      specifically used to indicate the process of writing to a data
      buffer by a DDP implementation. DDP Segments carry Placement
      information, which may be used by the receiving DDP
      implementation to perform Data Placement of the DDP Segment ULP
      Payload. See "Data Delivery".

   DDP Control Field - A fixed 8-bit field in the DDP Header.

   DDP Header - The header present in all DDP Segments. The DDP Header
      contains control and Placement fields that are used to define
      the final Placement location for the ULP Payload carried in a
      DDP Segment.

   DDP Message - A ULP defined unit of data interchange, which is
      subdivided into one or more DDP Segments.
      This segmentation may
      occur for a variety of reasons, including segmentation to
      respect the maximum segment size of the underlying transport
      protocol.

   DDP Segment - The smallest unit of data transfer for the DDP
      protocol. It includes a DDP Header and ULP Payload (if present).
      A DDP Segment should be sized to fit within the Lower Layer
      Protocol MULPDU.

   DDP Stream - A sequence of DDP Messages whose ordering is defined by
      the LLP. For SCTP, a DDP Stream maps directly to an SCTP stream.
      For MPA, a DDP Stream maps directly to a TCP connection and a
      single DDP Stream is supported. Note that DDP has no ordering
      guarantees between DDP Streams.

   Direct Data Placement - A mechanism whereby ULP data contained
      within DDP Segments may be Placed directly into its final
      destination in memory without processing of the ULP. This may
      occur even when the DDP Segments arrive out of order. Out of
      order Placement support may require the Data Sink to implement
      the LLP and DDP as one functional block.

   Direct Data Placement Protocol (DDP) - A wire protocol that
      supports Direct Data Placement by associating explicit memory
      buffer placement information with the LLP payload units.

   Message Offset (MO) - For the DDP Untagged Buffer Model, specifies
      the offset, in octets, from the start of a DDP Message.

   Message Sequence Number (MSN) - For the DDP Untagged Buffer Model,
      specifies a sequence number that is increasing with each DDP
      Message.

   Queue Number (QN) - For the DDP Untagged Buffer Model, identifies a
      destination Data Sink queue for a DDP Segment.

   Steering Tag - An identifier of a Tagged Buffer on a Node, valid as
      defined within a protocol specification.

   STag - Steering Tag

   Tagged Buffer - A buffer that is explicitly Advertised to the Remote
      Peer through exchange of an STag, Tagged Offset, and length.

   Tagged Buffer Model - A DDP data transfer model used to transfer
      Tagged Buffers from the Local Peer to the Remote Peer.

   Tagged DDP Message - A DDP Message that targets a Tagged Buffer.

   Tagged Offset (TO) - The offset within a Tagged Buffer on a Node.

   ULP Buffer - A buffer owned above the DDP Layer and advertised to
      the DDP Layer either as a Tagged Buffer or an Untagged ULP
      Buffer.

   ULP Message Length - The total length of the ULP Payload contained
      in a DDP Message.

   Untagged Buffer - A buffer that is not explicitly Advertised to the
      Remote Peer.

   Untagged Buffer Model - A DDP data transfer model used to transfer
      Untagged Buffers from the Local Peer to the Remote Peer.

   Untagged DDP Message - A DDP Message that targets an Untagged
      Buffer.

5 Reliable Delivery LLP Requirements

   1.  LLPs MUST expose MULPDU & MULPDU Changes. This is required so
       that the DDP layer can perform segmentation aligned with the
       MULPDU and can adapt as MULPDU changes come about. The corner
       case of how to handle outstanding requests during a MULPDU
       change is covered by the requirements below.

   2.  In the event of a MULPDU change, DDP MUST NOT be required by the
       LLP to re-segment DDP Segments that have been previously posted
       to the LLP. Note that under pathological conditions the LLP may
       change the advertised MULPDU more frequently than the queue of
       previously posted DDP Segment transmit requests is flushed.
       Under this pathological condition, the LLP transmit queue can
       contain DDP Messages which were posted multiple MULPDU updates
       previously, thus there may be no correlation between the queued
       DDP Segment(s) and the LLP's current value of MULPDU.

   3.  The LLP MUST ensure that if it accepts a DDP Segment, it will
       transfer it reliably to the receiver or return with an error
       stating that the transfer failed to complete.

   4.  The LLP MUST preserve DDP Segment and Message boundaries at the
       Data Sink.

   5.  The LLP MAY provide the incoming segments out of order for
       Placement, but if it does, it MUST also provide information that
       specifies what the sender specified order was.

   6.  The LLP MUST provide a strong digest (at least equivalent to
       CRC32-C) to cover at least the DDP Segment. It is believed that
       some of the existing data integrity digests are not sufficient
       and that direct memory transfer semantics require a stronger
       digest than, for example, a simple checksum.

   7.  On receive, the LLP MUST provide the length of the DDP Segment
       received. This ensures that DDP does not have to carry a length
       field in its header.

   8.  If an LLP does not support teardown of an LLP stream independent
       of other LLP streams and a DDP error occurs on a specific DDP
       Stream, then the LLP MUST label the associated LLP stream as an
       erroneous LLP stream and MUST NOT allow any further data
       transfer on that LLP stream after DDP requests the associated
       DDP Stream to be torn down.

   9.  For a specific LLP Stream, the LLP MUST provide a mechanism to
       indicate that the LLP Stream has been gracefully torn down. For
       a specific LLP Connection, the LLP MUST provide a mechanism to
       indicate that the LLP Connection has been gracefully torn down.
       Note that if the LLP does not allow an LLP Stream to be torn
       down independently of the LLP Connection, the above requirements
       allow the LLP to notify DDP of both events at the same time.

   10. For a specific LLP Connection, when all LLP Streams are either
       gracefully torn down or are labeled as erroneous LLP streams,
       the LLP Connection MUST be torn down.

   11. The LLP MUST NOT pass a duplicate DDP Segment to the DDP Layer
       after it has passed all the previous DDP Segments to the DDP
       Layer and the associated ordering information for the previous
       DDP Segments and the current DDP Segment.

6 Header Format

   DDP has two different header formats: one for Data Placement into
   Tagged Buffers, and the other for Data Placement into Untagged
   Buffers. See Section 7.1 for a description of the two models.

6.1 DDP Control Field

   The first 8 bits of the DDP Header carry a DDP Control Field that is
   common between the two formats. It is shown below in Figure 3,
   offset by 16 bits to accommodate the MPA header defined in [MPA].
   The MPA header is only present if DDP is layered on top of MPA.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+
                                   |T|L| Rsvd  |DV |
                                   +-+-+-+-+-+-+-+-+

                    Figure 3 DDP Control Field

   T - Tagged flag: 1 bit.

      Specifies the Tagged or Untagged Buffer Model. If set to one,
      the ULP Payload carried in this DDP Segment MUST be Placed into
      a Tagged Buffer.

      If set to zero, the ULP Payload carried in this DDP Segment
      MUST be Placed into an Untagged Buffer.

   L - Last flag: 1 bit.

      Specifies whether the DDP Segment is the Last segment of a DDP
      Message. It MUST be set to one on the last DDP Segment of every
      DDP Message. It MUST NOT be set to one on any other DDP
      Segment.

      The DDP Segment with the L bit set to 1 MUST be posted to the
      LLP after all other DDP Segments of the associated DDP Message
      have been posted to the LLP. For an Untagged DDP Message, the
      DDP Segment with the L bit set to 1 MUST carry the highest MO.

      If the Last flag is set to one, the DDP Message payload MUST be
      Delivered to the ULP after:

         * Placement of all DDP Segments of this DDP Message and all
           prior DDP Messages, and

         * Delivery of each prior DDP Message.

      If the Last flag is set to zero, the DDP Segment is an
      intermediate DDP Segment.

   Rsvd - Reserved: 4 bits.

      Reserved for future use by the DDP protocol. This field MUST be
      set to zero on transmit, and not checked on receive.

   DV - Direct Data Placement Protocol Version: 2 bits.

      The version of the DDP Protocol in use. This field MUST be set
      to one to indicate the version of the specification described
      in this document. The value of DV MUST be the same for all the
      DDP Segments transmitted or received on a DDP Stream.

6.2 DDP Tagged Buffer Model Header

   Figure 4 shows the DDP Header format that MUST be used in all DDP
   Segments that target Tagged Buffers. It includes the DDP Control
   Field previously defined in Section 6.1. (Note: In Figure 4, the DDP
   Header is offset by 16 bits to accommodate the MPA header defined in
   [MPA]. The MPA header is only present if DDP is layered on top of
   MPA.)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |T|L| Rsvd  |DV |   RsvdULP     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              STag                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +                               TO                              +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 4 Tagged Buffer DDP Header

   T is set to one.

   RsvdULP - Reserved for use by the ULP: 8 bits.

      The RsvdULP field is opaque to the DDP protocol and can be
      structured in any way by the ULP. At the Data Source, DDP MUST
      set RsvdULP Field to the value specified by the ULP. It is
      transferred unmodified from the Data Source to the Data Sink.
      At the Data Sink, DDP MUST provide the RsvdULP field to the ULP
      when the DDP Message is delivered.
       Each DDP Segment within a specific DDP Message MUST contain the
       same value for this field.

   STag - Steering Tag: 32 bits.

       The Steering Tag identifies the Data Sink's Tagged Buffer. The
       STag MUST be valid for this DDP Stream. The STag is associated
       with the DDP Stream through a mechanism that is outside the
       scope of the DDP Protocol specification. At the Data Source,
       DDP MUST set the STag field to the value specified by the ULP.
       At the Data Sink, DDP MUST provide the STag field when the
       ULP Message is Delivered. Each DDP Segment within a specific
       DDP Message MUST contain the same value for this field, and
       that value MUST be the value supplied by the ULP.

   TO - Tagged Offset: 64 bits.

       The Tagged Offset specifies the offset, in octets, within the
       Data Sink's Tagged Buffer, where the Placement of ULP Payload
       contained in the DDP Segment starts. A DDP Message MAY start at
       an arbitrary TO within a Tagged Buffer.

6.3  DDP Untagged Buffer Model Header

   Figure 5 shows the DDP Header format that MUST be used in all DDP
   Segments that target Untagged Buffers. It includes the DDP Control
   Field previously defined in Section 6.1. (Note: In Figure 5, the DDP
   Header is offset by 16 bits to accommodate the MPA header defined in
   [MPA]. The MPA header is only present if DDP is layered on top of
   MPA.)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |T|L| Rsvd  | DV| RsvdULP[0:7]  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         RsvdULP[8:39]                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              QN                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              MSN                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              MO                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   Figure 5 Untagged Buffer DDP Header

   T is set to zero.

   RsvdULP - Reserved for use by the ULP: 40 bits.

       The RsvdULP field is opaque to the DDP protocol and can be
       structured in any way by the ULP. At the Data Source, DDP MUST
       set RsvdULP Field to the value specified by the ULP. It is
       transferred unmodified from the Data Source to the Data Sink.
       At the Data Sink, DDP MUST provide the RsvdULP field to the ULP
       when the ULP Message is Delivered. Each DDP Segment within a
       specific DDP Message MUST contain the same value for the
       RsvdULP field. At the Data Sink, the DDP implementation is NOT
       REQUIRED to verify that the same value is present in the
       RsvdULP field of each DDP Segment within a specific DDP Message
       and MAY provide the value from any one of the received DDP
       Segments to the ULP when the ULP Message is Delivered.

   QN - Queue Number: 32 bits.

       The Queue Number identifies the Data Sink's Untagged Buffer
       queue referenced by this header. Each DDP Segment within a
       specific DDP Message MUST contain the same value for this
       field, and that value MUST be the value supplied by the ULP at
       the Data Source.

   MSN - Message Sequence Number: 32 bits.

       The Message Sequence Number specifies a sequence number that
       MUST be increased by one (modulo 2^32) with each DDP Message
       targeting the specific Queue Number on the DDP Stream
       associated with this DDP Segment. The initial value for MSN
       MUST be one. The MSN value MUST wrap to 0 after a value of
       0xFFFFFFFF.

   MO - Message Offset: 32 bits.

       The Message Offset specifies the offset, in octets, from the
       start of the DDP Message represented by the MSN and Queue
       Number on the DDP Stream associated with this DDP Segment. The
       MO referencing the first octet of the DDP Message MUST be set
       to zero by the DDP layer.

6.4  DDP Segment Format

   Each DDP Segment MUST contain a DDP Header. Each DDP Segment may
   also contain ULP Payload. Following is the DDP Segment format:

      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  DDP  |                                       |
      | Header|          ULP Payload (if any)         |
      |       |                                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                  Figure 6 DDP Segment Format

7  Data Transfer

   DDP supports multi-segment DDP Messages. Each DDP Message is
   composed of one or more DDP Segments. Each DDP Segment contains a
   DDP Header. The DDP Header contains the information required by the
   receiver to Place any ULP Payload included in the DDP Segment.

7.1  DDP Tagged or Untagged Buffer Models

   DDP uses two basic Buffer Models for the Placement of the ULP
   Payload: Tagged Buffer Model and Untagged Buffer Model.

7.1.1  Tagged Buffer Model

   The Tagged Buffer Model is used by the Data Source to transfer a DDP
   Message into a Tagged Buffer at the Data Sink that has been
   previously Advertised to the Data Source. An STag identifies a
   Tagged Buffer.
   For the Placement of a DDP Message using the Tagged
   Buffer model, the STag is used to identify the buffer, and the TO is
   used to identify the offset within the Tagged Buffer into which the
   ULP Payload is transferred. The protocol used to Advertise the
   Tagged Buffer is outside the scope of this specification (i.e., it
   is ULP specific). A DDP Message can start at an arbitrary TO within
   a Tagged Buffer.

   Additionally, a Tagged Buffer can potentially be written multiple
   times. This might be done for error recovery or because a buffer is
   being re-used after some ULP-specific synchronization mechanism.

7.1.2  Untagged Buffer Model

   The Untagged Buffer Model is used by the Data Source to transfer a
   DDP Message to the Data Sink into a queued buffer.

   The DDP Queue Number is used by the ULP to separate ULP messages
   into different queues of receive buffers. For example, if two queues
   were supported, the ULP could use one queue to post buffers handed
   to it by the application above the ULP, and it could use the other
   queue for buffers that are consumed only by ULP-specific control
   messages. This enables the separation of ULP control messages from
   opaque ULP Payload when using Untagged Buffers.

   The DDP Message Sequence Number can be used by the Data Sink to
   identify the specific Untagged Buffer. The protocol used to
   communicate how many buffers have been queued is outside the scope
   of this specification. Similarly, the exact implementation of the
   buffer queue is outside the scope of this specification.

7.2  Segmentation and Reassembly of a DDP Message

   At the Data Source, the DDP layer MUST segment the data contained in
   a ULP message into a series of DDP Segments, where each DDP Segment
   contains a DDP Header and ULP Payload, and MUST be no larger than
   the MULPDU value advertised by the LLP.
   The ULP Message Length MUST
   be less than 2^32. At the Data Source, the DDP layer MUST send all
   the data contained in the ULP message. At the Data Sink, the DDP
   layer MUST Place the ULP Payload contained in all valid incoming DDP
   Segments associated with a DDP Message into the ULP Buffer.

   DDP Message segmentation at the Data Source is accomplished by
   identifying a DDP Message (which corresponds one-to-one with a ULP
   Message) uniquely and then, for each associated DDP Segment of a DDP
   Message, by specifying an octet offset for the portion of the ULP
   Message contained in the DDP Segment.

   For an Untagged DDP Message, the combination of the QN and MSN
   uniquely identifies a DDP Message. The octet offset for each DDP
   Segment of an Untagged DDP Message is the MO field. For each DDP
   Segment of an Untagged DDP Message, the MO MUST be set to the octet
   offset from the first octet in the associated ULP Message (which is
   defined to be zero) to the first octet in the ULP Payload contained
   in the DDP Segment.

   For example, if the ULP Untagged Message was 2048 octets, and the
   MULPDU was 1500 octets, the Data Source would generate two DDP
   Segments, one with MO = 0, containing 1482 octets of ULP Payload,
   and a second with MO = 1482, containing 566 octets of ULP Payload.
   In this example, the amount of ULP Payload for the first DDP Segment
   was calculated as:

      1482 = 1500 (MULPDU) - 18 (for the DDP Header)

   For a Tagged DDP Message, the STag and TO, combined with the
   in-order delivery characteristics of the LLP, are used to segment
   and reassemble the ULP Message. Because the initial octet offset
   (the TO field) can be non-zero, recovery of the original ULP Message
   boundary cannot be done in the general case without an additional
   ULP Message.
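
   The Untagged segmentation arithmetic described above can be sketched
   as follows. This is a minimal illustration, not part of the
   specification: the function name segment_untagged and its tuple
   return format are hypothetical, and the handling of a zero-length
   Untagged message (one empty segment with the Last flag set) is an
   assumption. Only the 18-octet header size, the MO rule, and the
   2^32 length limit come from the text.

```python
from typing import List, Tuple

# Untagged DDP Header size in octets: control + RsvdULP (6), QN, MSN, MO (4 each).
UNTAGGED_HEADER_LEN = 18


def segment_untagged(msg_len: int, mulpdu: int) -> List[Tuple[int, int, bool]]:
    """Return (MO, payload_len, last_flag) for each DDP Segment of a message."""
    if not 0 <= msg_len < 2**32:
        raise ValueError("ULP Message Length must be less than 2^32")
    max_payload = mulpdu - UNTAGGED_HEADER_LEN
    if max_payload <= 0:
        raise ValueError("MULPDU leaves no room for ULP Payload")
    if msg_len == 0:
        # Assumption: a zero-length Untagged message uses one empty segment.
        return [(0, 0, True)]
    segments = []
    mo = 0  # the first octet of the message is defined to be offset zero
    while mo < msg_len:
        payload_len = min(max_payload, msg_len - mo)
        last = mo + payload_len == msg_len  # L flag only on the final segment
        segments.append((mo, payload_len, last))
        mo += payload_len
    return segments


if __name__ == "__main__":
    for mo, length, last in segment_untagged(2048, 1500):
        print(f"MO={mo} payload={length} L={int(last)}")
```

   With the values from the example (a 2048-octet message and a
   MULPDU of 1500), the sketch yields one segment at MO = 0 carrying
   1482 octets and a final segment at MO = 1482 carrying 566 octets.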

       Implementers note: One implementation, valid for some ULPs such
       as RDMAP, is to not directly support recovery of the ULP
       Message boundary for a Tagged DDP Message. For example, the ULP
       may wish to have the Local Peer use small buffers at the Data
       Source even when the ULP at the Data Sink has advertised a
       single large Tagged Buffer for this data transfer. In this
       case, the ULP may choose to use the same STag for multiple
       consecutive ULP Messages. Thus a non-zero initial TO and re-use
       of the STag effectively enables the ULP to implement
       segmentation and reassembly due to ULP-specific constraints.
       See [RDMAP] for details of how this is done.

       A different implementation of a ULP could use an Untagged DDP
       Message sent after the Tagged DDP Message which details the
       initial TO for the STag that was used in the Tagged DDP
       Message. And finally, another implementation of a ULP could
       choose to always use an initial TO of zero such that no
       additional message is required to convey the initial TO used in
       a Tagged DDP Message.

   Regardless of whether the ULP chooses to recover the original ULP
   Message boundary at the Data Sink for a Tagged DDP Message, DDP
   supports segmentation and reassembly of the Tagged DDP Message. The
   STag is used to identify the ULP Buffer at the Data Sink and the TO
   is used to identify the octet-offset within the ULP Buffer
   referenced by the STag. The ULP at the Data Source MUST specify the
   STag and the initial TO when the ULP Message is handed to DDP.

   For each DDP Segment of a Tagged DDP Message, the TO MUST be set to
   the octet offset from the first octet in the associated ULP Message
   to the first octet in the ULP Payload contained in the DDP Segment,
   plus the TO assigned to the first octet in the associated ULP
   Message.
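
   The TO rule above, together with the 14-octet Tagged header of
   Section 6.2, can be sketched as follows. This is an illustration
   under stated assumptions rather than a reference encoder: the names
   control_byte and segment_tagged are hypothetical, and the sketch
   assumes that bit 0 in the header figures is the most significant
   bit of the octet and that multi-octet fields are sent in network
   byte order. The MPA framing that may precede the header is not
   modeled.

```python
import struct
from typing import List

# control (8 bits), RsvdULP (8 bits), STag (32 bits), TO (64 bits) = 14 octets.
TAGGED_HEADER = struct.Struct("!BBIQ")
DDP_VERSION = 1  # DV value required by this version of the draft


def control_byte(tagged: bool, last: bool) -> int:
    # Assumption: T is the most significant bit, then L, 4 Rsvd bits, 2 DV bits.
    return (int(tagged) << 7) | (int(last) << 6) | DDP_VERSION


def segment_tagged(payload: bytes, stag: int, initial_to: int,
                   mulpdu: int, rsvd_ulp: int = 0) -> List[bytes]:
    """Split a ULP Message into Tagged DDP Segments (header + payload)."""
    max_payload = mulpdu - TAGGED_HEADER.size
    if max_payload <= 0:
        raise ValueError("MULPDU leaves no room for ULP Payload")
    segments = []
    offset = 0  # octet offset from the start of the ULP Message
    while True:
        chunk = payload[offset:offset + max_payload]
        last = offset + len(chunk) == len(payload)
        # TO = offset within the message + TO of the message's first octet.
        header = TAGGED_HEADER.pack(control_byte(True, last), rsvd_ulp,
                                    stag, initial_to + offset)
        segments.append(header + chunk)
        offset += len(chunk)
        if last:  # a zero-length message still produces exactly one segment
            return segments
```

   Every segment repeats the same STag and RsvdULP value, and only the
   final segment carries the Last flag; a zero-length message falls out
   of the same loop as a single header-only segment.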

   For example, if the ULP Tagged Message was 2048 octets with an
   initial TO of 16384, and the MULPDU was 1500 octets, the Data Source
   would generate two DDP Segments, one with TO = 16384, containing the
   first 1486 octets of ULP payload, and a second with TO = 17870,
   containing 562 octets of ULP payload. In this example, the amount of
   ULP payload for the first DDP Segment was calculated as:

      1486 = 1500 (MULPDU) - 14 (for the DDP Header)

   A zero-length Tagged DDP Message is allowed and MUST consume exactly
   one DDP Segment. Only the DDP Control and RsvdULP Fields MUST be
   valid for a zero-length Tagged DDP Segment. The STag and TO fields
   MUST NOT be checked for a zero-length Tagged DDP Message.

   For either Untagged or Tagged DDP Messages, the Data Sink is not
   required to verify that the entire ULP Message has been received.

7.3  Ordering Among DDP Messages

   Messages passed through the DDP MUST conform to the ordering rules
   defined in this section.

   At the Data Source, DDP:

   *  MUST transmit DDP Messages in the order they were submitted to
      the DDP layer,

   *  SHOULD transmit DDP Segments within a DDP Message in increasing
      MO order for Untagged DDP Messages and in increasing TO order
      for Tagged DDP Messages.

   At the Data Sink, DDP (Note: The following rules are motivated by
   LLP implementations that separate Placement and Delivery.):

   *  MAY perform Placement of DDP Segments out of order,

   *  MAY perform Placement of a DDP Segment more than once,

   *  MUST Deliver a DDP Message to the ULP at most once,

   *  MUST Deliver DDP Messages to the ULP in the order they were
      sent by the Data Source.

7.4  DDP Message Completion & Delivery

   At the Data Source, DDP Message transfer is considered completed
   when the reliable, in-order transport LLP has indicated that the
   transfer will occur reliably.
   Note that this in no way restricts the
   LLP from buffering the data at either the Data Source or Data Sink.
   Thus at the Data Source, completion of a DDP Message does not
   necessarily mean that the Data Sink has received the message.

   At the Data Sink, DDP MUST Deliver a DDP Message if and only if all
   of the following are true:

   *  the last DDP Segment of the DDP Message had its Last flag set,

   *  all of the DDP Segments of the DDP Message have been Placed,

   *  all preceding DDP Messages have been Placed, and

   *  each preceding DDP Message has been Delivered to the ULP.

   At the Data Sink, DDP MUST provide the ULP Message Length to the ULP
   when an Untagged DDP Message is Delivered. The ULP Message Length
   may be calculated by adding the MO and the ULP Payload length in the
   last DDP Segment (with the Last flag set) of an Untagged DDP
   Message.

   At the Data Sink, DDP MUST provide the RsvdULP Field of the DDP
   Message to the ULP when the DDP Message is delivered.

8  DDP Stream Setup & Teardown

   This section describes LLP independent issues related to DDP Stream
   setup and teardown.

8.1  DDP Stream Setup

   It is expected that the ULP will use a mechanism outside the scope
   of this specification to establish an LLP Connection, and that the
   LLP Connection will support one or more LLP Streams (e.g. MPA/TCP or
   SCTP). After the LLP sets up the LLP Stream, it will enable a DDP
   Stream on a specific LLP Stream at an appropriate point.

   The ULP is required to enable both endpoints of an LLP Stream for
   DDP data transfer at the same time, in both directions; this is
   necessary so that the Data Sink can properly recognize the DDP
   Segments.

8.2  DDP Stream Teardown

   DDP MUST NOT independently initiate Stream Teardown.
   DDP either
   responds to a stream being torn down by the LLP or processes a
   request from the ULP to tear down a stream. DDP Stream teardown
   disables DDP capabilities on both endpoints. For connection-oriented
   LLPs, DDP Stream teardown MAY result in underlying LLP Connection
   teardown.

8.2.1  DDP Graceful Teardown

   It is up to the ULP to ensure that DDP teardown happens on both
   endpoints of the DDP Stream at the same time; this is necessary so
   that the Data Sink stops trying to interpret the DDP Segments.

   If the Local Peer ULP indicates graceful teardown, the DDP layer on
   the Local Peer SHOULD ensure that all ULP data is transferred
   before the underlying LLP Stream & Connection are torn down, and any
   further data transfer requests by the Local Peer ULP MUST return an
   error.

   If the DDP layer on the Local Peer receives a graceful teardown
   request from the LLP, any further data received after the request is
   considered an error and MUST cause the DDP Stream to be abortively
   torn down.

   If the Local Peer LLP supports a half-closed LLP Stream, on the
   receipt of an LLP graceful teardown request of the DDP Stream, DDP
   SHOULD indicate the half-closed state to the ULP, and continue to
   process outbound data transfer requests normally. Following this
   event, when the Local Peer ULP requests graceful teardown, DDP MUST
   indicate to the LLP that it SHOULD perform a graceful close of the
   other half of the LLP Stream.

   If the Local Peer LLP supports a half-closed LLP Stream, on the
   receipt of a ULP graceful half-close teardown request of the DDP
   Stream, DDP SHOULD keep data reception enabled on the other half of
   the LLP stream.

8.2.2  DDP Abortive Teardown

   As previously mentioned, DDP does not independently terminate a DDP
   Stream.
   Thus any of the following fatal errors on a DDP Stream MUST
   cause DDP to indicate to the ULP that a fatal error has occurred:

   *  Underlying LLP Connection or LLP Stream is lost.

   *  Underlying LLP reports a catastrophic error.

   *  DDP Header has one or more invalid fields.

   If the LLP indicates to the ULP that a fatal error has occurred, the
   DDP layer SHOULD report the error to the ULP (see Section 9.2, DDP
   Error Numbers) and complete all outstanding ULP requests with an
   error. If the underlying LLP Stream is still intact, DDP SHOULD
   continue to allow the ULP to transfer additional DDP Messages on the
   outgoing half connection after the fatal error was indicated to the
   ULP. This enables the ULP to transfer an error syndrome to the
   Remote Peer. After indicating to the ULP a fatal error has occurred,
   the DDP Stream MUST NOT be terminated until the Local Peer ULP
   indicates to the DDP layer that the DDP Stream should be abortively
   torn down.

9  Error Semantics

   All LLP errors reported to DDP SHOULD be passed up to the ULP.

9.1  Errors detected at the Data Sink

   For non-zero length Untagged DDP Segments, the DDP Segment MUST be
   validated before Placement by verifying:

   1. The QN is valid for this stream.

   2. The QN and MSN have an associated buffer that allows Placement
      of the payload.

   3. The MO falls in the range of legal offsets associated with the
      Untagged Buffer.

   4. The sum of the DDP Segment payload length and the MO falls in
      the range of legal offsets associated with the Untagged Buffer.

   5. For DDP Messages using Untagged Buffer model, the Message
      Sequence Number falls in the range of legal Message Sequence
      Numbers, for the queue defined by the QN.
      The legal range is
      defined as being between the MSN value assigned to the first
      available buffer for a specific QN and the MSN value assigned to
      the last available buffer for a specific QN.

      Implementers note: for a typical Queue Number, the lower limit
      of the Message Sequence Number is defined by whatever DDP
      Messages have already been Completed. The upper limit is
      defined by however many message buffers are currently available
      for that queue. Both numbers change dynamically as new DDP
      Messages are received and Completed, and new buffers are added.
      It is up to the ULP to ensure that sufficient buffers are
      available to handle the incoming DDP Segments.

   For non-zero length Tagged DDP Segments, the segment MUST be
   validated before Placement by verifying:

   1. The STag is valid for this stream.

   2. The STag has an associated buffer that allows Placement of the
      payload.

   3. The TO falls in the range of legal offsets registered for the
      STag.

   4. The sum of the DDP Segment payload length and the TO falls in
      the range of legal offsets registered for the STag.

   5. A 64-bit unsigned sum of the DDP Segment payload length and the
      TO does not wrap.

   If the DDP layer detects any of the receive errors listed in this
   section, it MUST cease placing the remainder of the DDP Segment and
   report the error(s) to the ULP. The DDP layer SHOULD include in the
   error report the DDP Header, the type of error, and the length of
   the DDP segment, if available. DDP MUST silently drop any subsequent
   incoming DDP Segments. Since each of these errors represents a
   failure of the sending ULP or protocol, DDP SHOULD enable the ULP to
   send one additional DDP Message before terminating the DDP Stream.

9.2  DDP Error Numbers

   The following error numbers MUST be used when reporting receive
   errors to the ULP.
   They correspond to the checks enumerated in
   Section 9.1. Each error is subdivided into a 4-bit Error Type and an
   8-bit Error Code.

   Error    Error
   Type     Code        Description
   ----------------------------------------------------------
   0x0      0x00        Local Catastrophic

   0x1                  Tagged Buffer Error
            0x00        Invalid STag
            0x01        Base or bounds violation
            0x02        STag not associated with RDMA Stream
            0x03        TO wrap
            0x04        Invalid DDP version

   0x2                  Untagged Buffer Error
            0x01        Invalid QN
            0x02        Invalid MSN - no buffer available
            0x03        Invalid MSN - MSN range is not valid
            0x04        Invalid MO
            0x05        DDP Message too long for available buffer
            0x06        Invalid DDP version

   0x3      Rsvd        Reserved for the use by the LLP

10  Security Considerations

   This section discusses both protocol-specific considerations and the
   implications of using DDP with existing security mechanisms.

10.1  Protocol-specific Security Considerations

   The vulnerabilities of DDP to active third-party interference are no
   greater than those of any other protocol running over TCP. A third
   party, by injecting spoofed packets into the network that are
   Delivered to a DDP Data Sink, could launch a variety of attacks that
   exploit DDP-specific behavior. Since DDP directly or indirectly
   exposes memory addresses on the wire, the Placement information
   carried in each DDP Segment must be validated, including STag
   validity and octet-level base and bounds checks, before any data is
   Placed. For example, a third-party adversary could inject random
   packets that appear to be valid DDP Segments and corrupt the memory
   on a DDP Data Sink. Since DDP is IP transport protocol independent,
   communication security mechanisms such as IPsec [IPSEC] or TLS [TLS]
   may be used to prevent such attacks.
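
   The validation called for above can be sketched for the Tagged
   case as follows, using the Tagged checks of Section 9.1 and the
   Error Type and Error Code values of Section 9.2. This is a
   simplified illustration, not a complete receiver: the function name
   validate_tagged and the representation of registered buffers as a
   dictionary mapping an STag to a (base TO, length) pair are
   hypothetical, and the stream-association and version checks are
   left out.

```python
from typing import Dict, Optional, Tuple

# Error Type and Error Codes taken from the table in Section 9.2.
ERR_TYPE_TAGGED = 0x1
ERR_INVALID_STAG = 0x00
ERR_BOUNDS_VIOLATION = 0x01
ERR_TO_WRAP = 0x03

MAX_U64 = (1 << 64) - 1


def validate_tagged(buffers: Dict[int, Tuple[int, int]], stag: int,
                    to: int, payload_len: int) -> Optional[Tuple[int, int]]:
    """Return None if Placement is allowed, else (Error Type, Error Code).

    `buffers` maps each valid STag to (base_to, length) in octets; this
    layout is an assumption made for the example.
    """
    region = buffers.get(stag)
    if region is None:
        return (ERR_TYPE_TAGGED, ERR_INVALID_STAG)
    # The 64-bit unsigned sum of TO and payload length must not wrap.
    if to + payload_len > MAX_U64:
        return (ERR_TYPE_TAGGED, ERR_TO_WRAP)
    base_to, length = region
    # Both the first and the last octet Placed must lie inside the buffer.
    if to < base_to or to + payload_len > base_to + length:
        return (ERR_TYPE_TAGGED, ERR_BOUNDS_VIOLATION)
    return None


if __name__ == "__main__":
    registered = {0x10: (4096, 8192)}  # one buffer covering TO 4096..12287
    print(validate_tagged(registered, 0x10, 4096, 100))   # allowed
    print(validate_tagged(registered, 0x99, 0, 1))        # unknown STag
    print(validate_tagged(registered, 0x10, 12000, 400))  # runs past the end
```

   Performing every check before the first octet is written is what
   keeps an injected or malformed segment from touching memory outside
   the range that the ULP registered.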

10.2  Using IPsec with DDP

   IPsec can be used to protect against the packet injection attacks
   outlined above. Because IPsec is designed to secure arbitrary IP
   packet streams, including streams where packets are lost, DDP can
   run on top of IPsec without any change. IPsec packets are processed
   (e.g., integrity checked and possibly decrypted) in the order they
   are received, and a DDP Data Sink will process the decrypted DDP
   Segments contained in these packets in the same manner as DDP
   Segments contained in unsecured IP packets.

10.3  Other Security Considerations

   DDP has several mechanisms that deal with a number of attacks.
   These attacks include, but are not limited to:

   1. Connection to/from an unauthorized or unauthenticated endpoint.
   2. Hijacking of a DDP Stream.
   3. Attempts to read or write from unauthorized memory regions.
   4. Injection of RDMA Messages within a Stream on a multi-user
      operating system by another application.

   DDP relies on the LLP to establish the LLP Stream over which DDP
   Messages will be carried. DDP itself does nothing to authenticate
   the validity of the LLP Stream of either of the endpoints. It is the
   responsibility of the ULP to validate the LLP Stream. This is highly
   desirable due to the nature of DDP.

   Hijacking of a DDP Stream would require that the underlying LLP
   Stream is hijacked. This would require knowledge of Advertised
   buffers in order to directly Place data into a user buffer and is
   therefore constrained by the same techniques mentioned to guard
   against attempts to read or write from unauthorized memory regions.

   DDP does not require a node to open its buffers to arbitrary attacks
   over the DDP Stream. It may access ULP memory only to the extent
   that the ULP has enabled and authorized it to do so.
   The STag
   access control model is defined by a (forthcoming) document.
   Specific security operations include:

   1. STags are only valid over the exact byte range established by the
      ULP. DDP MUST provide a mechanism for the ULP to establish and
      revoke the TO range associated with the ULP Buffer referenced by
      the STag.
   2. STags are only valid for the duration established by the ULP. The
      ULP may revoke them at any time, in accordance with its own upper
      layer protocol requirements. DDP MUST provide a mechanism for the
      ULP to establish and revoke STag validity.
   3. DDP MUST provide a mechanism for the ULP to communicate the
      association between STags and a specific DDP Stream.
   4. A ULP may only expose memory to remote access to the extent that
      it already had access to that memory itself.
   5. If an STag is not valid on a DDP Stream, DDP MUST pass the invalid
      access attempt to the ULP. The ULP may provide a mechanism for
      terminating the DDP Stream.

   Further, DDP provides a mechanism that directly Places incoming
   payloads in user-mode ULP Buffers. This avoids the risks of prior
   solutions that relied upon exposing system buffers for incoming
   payloads.

11  IANA Considerations

   If DDP was enabled a priori for a ULP by connecting to a well-known
   port, this well-known port would be registered for the DDP with
   IANA.

12  References

12.1  Normative References

   [RFC2026] Bradner, S., "The Internet Standards Process -- Revision
       3", BCP 9, RFC 2026, October 1996.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
       Requirement Levels", BCP 14, RFC 2119, March 1997.

   [MPA] P. Culley et al., "Markers with PDU Alignment", RDMA
       Consortium Draft Specification draft-cully-iwarp-mpa-01.doc,
       October 2002

   [RDMAP] R.
       Recio et al., "RDMA Protocol Specification", RDMA
       Consortium Draft Specification draft-recio-iwarp-01, October
       2002

   [SCTP] R. Stewart et al., "Stream Control Transmission Protocol",
       RFC 2960, October 2000.

   [TCP] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
       September 1981.

12.2  Informative References

   [TLS] Dierks, T. and C. Allen, "The TLS Protocol Version 1.0", RFC
       2246, November 1998.

   [IPSEC] Atkinson, R., Kent, S., "Security Architecture for the
       Internet Protocol", RFC 2401, November 1998.

13  Appendix

13.1  Receive Window sizing

   Reliable, sequenced LLPs include a mechanism to advertise the
   amount of receive buffer space a sender may consume. This is
   generally called a "receive window".

   DDP allows data to be transferred directly to predefined buffers at
   the Data Sink. Accordingly, the LLP receive window size need not be
   affected by the reception of a DDP Segment, if that segment is
   placed before additional segments arrive.

   The LLP implementation SHOULD maintain an advertised receive window
   large enough to enable a reasonable number of segments to be
   outstanding at one time. The amount to advertise depends on the
   desired data rate, and the expected or actual round trip delay
   between endpoints.

   The amount of actual buffers maintained to "back up" the receive
   window is left up to the implementation. This amount will depend on
   the rate that DDP Segments can be retired; there may be some cases
   where segment processing cannot keep up with the incoming packet
   rate. If this occurs, one reasonable way to slow the incoming packet
   rate is to reduce the receive window.
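
   The dependence of the advertised window on data rate and round trip
   delay can be illustrated with the usual bandwidth-delay product.
   This rule of thumb is not stated in this document and is offered
   only as one plausible starting point; the function name
   receive_window_octets and its parameters are hypothetical.

```python
def receive_window_octets(rate_bits_per_s: int, rtt_us: int) -> int:
    """Octets in flight at the given rate over one round trip, rounded up."""
    # rate (bits/s) * RTT (microseconds) / 8 bits per octet / 1e6 us per s
    bits_times_us = rate_bits_per_s * rtt_us
    return -(-bits_times_us // 8_000_000)  # ceiling division on integers


if __name__ == "__main__":
    # 1 Gbit/s with a 1 ms round trip delay
    print(receive_window_octets(1_000_000_000, 1_000))
```

   An implementation would still need to bound the result by whatever
   window range its LLP can actually advertise.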

   Note that the LLP should take care to comply with the applicable
   RFCs; for instance, for TCP, receivers are highly discouraged from
   "shrinking" the receive window (reducing the right edge of the
   window after it has been advertised).

14  Authors' Addresses

   Paul R. Culley
   Hewlett-Packard Company
   20555 SH 249
   Houston, TX 77070-2698 USA
   Phone: +1 (281) 514-5543
   Email: paul.culley@hp.com

   James Pinkerton
   Microsoft Corporation
   One Microsoft Way
   Redmond, WA 98052 USA
   Phone: +1 (425) 705-5442
   Email: jpink@microsoft.com

   Renato Recio
   IBM Corporation
   11501 Burnett Road
   Austin, TX 78758 USA
   Phone: +1 (512) 838-1365
   Email: recio@us.ibm.com

   Hemal Shah
   Intel Corporation
   MS PTL1
   1501 South Mopac Expressway, #400
   Austin, TX 78746 USA
   Phone: +1 (512) 732-3963
   Email: hemal.shah@intel.com

15  Acknowledgments

   John Carrier
   Adaptec, Inc.
   691 S. Milpitas Blvd.
   Milpitas, CA 95035 USA
   Phone: +1 (360) 378-8526
   Email: john_carrier@adaptec.com

   Hari Ghadia
   Adaptec, Inc.
   691 S. Milpitas Blvd.,
   Milpitas, CA 95035 USA
   Phone: +1 (408) 957-5608
   Email: hari_ghadia@adaptec.com

   Patricia Thaler
   Agilent Technologies, Inc.
   1101 Creekside Ridge Drive, #100
   M/S-RG10
   Roseville, CA 95678
   Phone: +1-916-788-5662
   Email: pat_thaler@agilent.com

   Mike Penna
   Broadcom Corporation
   16215 Alton Parkway
   Irvine, California 92619-7013 USA
   Phone: +1 (949) 926-7149
   Email: MPenna@Broadcom.com

   Uri Elzur
   Broadcom Corporation
   16215 Alton Parkway
   Irvine, California 92619-7013 USA
   Phone: +1 (949) 585-6432
   Email: Uri@Broadcom.com

   Ted Compton
   EMC Corporation
   Research Triangle Park, NC 27709, USA
   Phone: 919-248-6075
   Email: compton_ted@emc.com

   Jim Wendt
   Hewlett-Packard Company
   8000 Foothills Boulevard
   Roseville, CA 95747-5668 USA
   Phone: +1 (916) 785-5198
   Email: jim_wendt@hp.com

   Mike Krause
   Hewlett-Packard Company, 43LN
   19410 Homestead Road
   Cupertino, CA 95014 USA
   Phone: +1 (408) 447-3191
   Email: krause@cup.hp.com

   Dave Minturn
   Intel Corporation
   MS JF1-210
   5200 North East Elam Young Parkway
   Hillsboro, OR 97124 USA
   Phone: +1 (503) 712-4106
   Email: dave.b.minturn@intel.com

   Howard C. Herbert
   Intel Corporation
   MS CH7-404
   5000 West Chandler Blvd.
   Chandler, AZ 85226 USA
   Phone: +1 (480) 554-3116
   Email: howard.c.herbert@intel.com

   Tom Talpey
   Network Appliance
   375 Totten Pond Road
   Waltham, MA 02451 USA
   Phone: +1 (781) 768-5329
   Email: thomas.talpey@netapp.com

   Dwight Barron
   Hewlett-Packard Company
   20555 SH 249
   Houston, TX 77070-2698 USA
   Phone: +1 (281) 514-2769
   Email: Dwight.Barron@Hp.com

   Dave Garcia
   Hewlett-Packard Company
   19333 Vallco Parkway
   Cupertino, Ca. 95014 USA
   Phone: +1 (408) 285-6116
   Email: dave.garcia@hp.com

   Jeff Hilland
   Hewlett-Packard Company
   20555 SH 249
   Houston, Tx.
   77070-2698 USA
   Phone: +1 (281) 514-9489
   Email: jeff.hilland@hp.com

16  Full Copyright Statement

   This document and the information contained herein is provided on an
   "AS IS" basis and ADAPTEC INC., AGILENT TECHNOLOGIES INC., BROADCOM
   CORPORATION, CISCO SYSTEMS INC., EMC CORPORATION, HEWLETT-PACKARD
   COMPANY, INTERNATIONAL BUSINESS MACHINES CORPORATION, INTEL
   CORPORATION, MICROSOFT CORPORATION, NETWORK APPLIANCE INC., THE
   INTERNET SOCIETY, AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM
   ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
   WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
   ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
   FOR A PARTICULAR PURPOSE.

   Copyright (c) 2002 ADAPTEC INC., BROADCOM CORPORATION, CISCO SYSTEMS
   INC., EMC CORPORATION, HEWLETT-PACKARD COMPANY, INTERNATIONAL
   BUSINESS MACHINES CORPORATION, INTEL CORPORATION, MICROSOFT
   CORPORATION, NETWORK APPLIANCE INC., All Rights Reserved.