INTERNET-DRAFT                                  Hemal Shah
draft-ietf-rddp-ddp-00.txt                      Intel Corporation
                                                James Pinkerton
                                                Microsoft Corporation
                                                Renato Recio
                                                IBM Corporation
                                                Paul Culley
                                                Hewlett-Packard Company

Expires: August, 2003

            Direct Data Placement over Reliable Transports

1 Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.
   Note that other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html The list of Internet-Draft
   Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

2 Abstract

   The Direct Data Placement protocol provides information to Place the
   incoming data directly into an upper layer protocol's receive buffer
   without intermediate buffers. This removes excess CPU and memory
   utilization associated with transferring data through the
   intermediate buffers.

shah, et. al.              Expires August 2003                        1

Table of Contents

   1 Status of this Memo
   2 Abstract
   3 Introduction
     3.1 Architectural Goals
     3.2 Protocol Overview
     3.3 DDP Layering
   4 Glossary
     4.1 General
     4.2 LLP
     4.3 Direct Data Placement (DDP)
   5 Reliable Delivery LLP Requirements
   6 Header Format
     6.1 DDP Control Field
     6.2 DDP Tagged Buffer Model Header
     6.3 DDP Untagged Buffer Model Header
     6.4 DDP Segment Format
   7 Data Transfer
     7.1 DDP Tagged or Untagged Buffer Models
       7.1.1 Tagged Buffer Model
       7.1.2 Untagged Buffer Model
     7.2 Segmentation and Reassembly of a DDP Message
     7.3 Ordering Among DDP Messages
     7.4 DDP Message Completion & Delivery
   8 DDP Stream Setup & Teardown
     8.1 DDP Stream Setup
     8.2 DDP Stream Teardown
       8.2.1 DDP Graceful Teardown
       8.2.2 DDP Abortive Teardown
   9 Error Semantics
     9.1 Errors detected at the Data Sink
     9.2 DDP Error Numbers
   10 Security Considerations
     10.1 Protocol-specific Security Considerations
     10.2 Using IPSec with DDP
     10.3 Association of an STag and a DDP Stream
     10.4 Other Security Considerations
   11 IANA Considerations
   12 References
     12.1 Normative References
     12.2 Informative References
   13 Appendix
     13.1 Receive Window sizing
   14 Author's Addresses
   15 Acknowledgments
   16 Full Copyright Statement

Table of Figures

   Figure 1 DDP Layering
   Figure 2 MPA, DDP, and RDMAP Header Alignment
   Figure 3 DDP Control Field
   Figure 4 Tagged Buffer DDP Header
   Figure 5 Untagged Buffer DDP Header
   Figure 6 DDP Segment Format

3 Introduction

   Direct Data Placement Protocol (DDP) enables an Upper Layer Protocol
   (ULP) to send data to a Data Sink without requiring the Data Sink to
   Place the data in an intermediate buffer - thus when the data
   arrives at the Data Sink, the network interface can Place the data
   directly into the ULP's buffer. This can enable the Data Sink to
   consume substantially less memory bandwidth than a buffered model
   because the Data Sink is not required to move the data from the
   intermediate buffer to the final destination. Additionally, this can
   also enable the network protocol to consume substantially fewer CPU
   cycles than if the CPU was used to move the data, and removes the
   bandwidth limitation of only being able to move data as fast as the
   CPU can copy the data.

   DDP preserves ULP record boundaries (messages) while providing a
   variety of data transfer mechanisms and completion mechanisms to be
   used to transfer ULP messages.

3.1 Architectural Goals

   DDP has been designed with the following high-level architectural
   goals:

   * Provide a buffer model that enables the Local Peer to Advertise
     a named buffer (i.e. a Tag for a buffer) to the Remote Peer,
     such that across the network the Remote Peer can Place data
     into the buffer at Remote Peer specified locations. This is
     referred to as the Tagged Buffer Model.

   * Provide a second receive buffer model which preserves ULP
     message boundaries from the Remote Peer and keeps the Local
     Peer's buffers anonymous (i.e. Untagged). This is referred to
     as the Untagged Buffer Model.

   * Provide reliable, in-order Delivery semantics for both Tagged
     and Untagged Buffer Models.

   * Provide segmentation and reassembly of ULP messages.

   * Enable the ULP buffer to be used as a reassembly buffer,
     without a need for a copy, even if incoming DDP Segments arrive
     out of order. This requires the protocol to separate Data
     Placement of ULP Payload contained in an incoming DDP Segment
     from Data Delivery of completed ULP Messages.

   * If the LLP supports multiple LLP streams within an LLP
     Connection, provide the above capabilities independently on
     each LLP stream and enable the capability to be exported on a
     per LLP stream basis to the ULP.

3.2 Protocol Overview

   DDP supports two basic data transfer models - a Tagged Buffer data
   transfer model and an Untagged Buffer data transfer model.

   The Tagged Buffer data transfer model requires the Data Sink to send
   the Data Source an identifier for the ULP buffer, referred to as a
   Steering Tag (STag). The STag is transferred to the Data Source
   using a ULP defined method. Once the Data Source ULP has an STag for
   a destination ULP buffer, it can request that DDP send the ULP data
   to the destination ULP buffer by specifying the STag to DDP. Note
   that the Tagged Buffer does not have to be filled starting at the
   beginning of the ULP buffer. The ULP Data Source can provide an
   arbitrary offset into the ULP buffer.

   The Untagged Buffer data transfer model enables data transfer to
   occur without requiring the Data Sink to Advertise a ULP Buffer to
   the Data Source. The Data Sink can queue up a series of receive ULP
   buffers. An Untagged DDP Message from the Data Source consumes an
   Untagged Buffer at the Data Sink. Because DDP is message oriented,
   even if the Data Source sends a DDP Message payload smaller than the
   receive ULP buffer, the partially filled receive ULP buffer is
   Delivered to the ULP anyway. If the Data Source sends a DDP Message
   payload larger than the receive ULP buffer, it results in an error.

   There are several key differences between the Tagged and Untagged
   Buffer Models:

   * For the Tagged Buffer Model, the Data Source specifies which
     received Tagged Buffer will be used for a specific Tagged DDP
     Message (sender-based ULP buffer management). For the Untagged
     Buffer Model, the Data Sink specifies the order in which
     Untagged Buffers will be consumed as Untagged DDP Messages are
     received (receiver-based ULP buffer management).

   * For the Tagged Buffer Model, the ULP at the Data Sink must
     Advertise the ULP buffer to the Data Source through a ULP
     specific mechanism before data transfer can occur. For the
     Untagged Buffer Model, data transfer can occur without an
     end-to-end explicit ULP buffer Advertisement. Note, however, that
     the ULP needs to address flow control issues because if a DDP
     Message arrives for an Untagged Buffer without an associated
     receive ULP buffer, the DDP Message is dropped, the DDP Stream
     is disabled for reception, and an error is reported to the ULP
     at the Data Sink.

   * For the Tagged Buffer Model, a DDP Message can start at an
     arbitrary offset within the Tagged Buffer. For the Untagged
     Buffer Model, a DDP Message can only start at offset 0.

   * The Tagged Buffer Model allows multiple DDP Messages targeted
     to a Tagged Buffer with a single ULP buffer Advertisement. The
     Untagged Buffer Model requires associating a receive ULP buffer
     for each DDP Message targeted to an Untagged Buffer.

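   The receiver-based buffer management of the Untagged Buffer Model
   described above can be illustrated with a small sketch. It is not
   part of the protocol definition: the class, method, and exception
   names are hypothetical, one Python object stands in for a single
   Data Sink queue, and leaving a too-small buffer on the queue after
   an error is an assumption of the sketch, since the text only states
   that an error results.

```python
from collections import deque


class UntaggedQueueError(Exception):
    """Hypothetical error type for Untagged Buffer failures."""


class UntaggedReceiveQueue:
    """Toy model of one Data Sink queue of Untagged Buffers."""

    def __init__(self):
        self.posted = deque()  # receive ULP buffers, consumed in posting order
        self.enabled = True    # False once the stream is disabled for reception

    def post_buffer(self, size):
        """The ULP queues an anonymous receive buffer of `size` octets."""
        self.posted.append(bytearray(size))

    def receive_message(self, payload):
        """Consume the next posted buffer for one Untagged DDP Message."""
        if not self.enabled:
            raise UntaggedQueueError("stream disabled for reception")
        if not self.posted:
            # No receive ULP buffer: the message is dropped, the stream is
            # disabled for reception, and an error is reported to the ULP.
            self.enabled = False
            raise UntaggedQueueError("no Untagged Buffer available")
        if len(payload) > len(self.posted[0]):
            # Assumption: the undersized buffer stays queued after the error.
            raise UntaggedQueueError("DDP Message larger than receive buffer")
        buf = self.posted.popleft()
        n = len(payload)
        buf[:n] = payload      # an Untagged DDP Message starts at offset 0
        return bytes(buf[:n])  # Delivered even if only partially filled
```

   In this sketch a 3-octet message sent to an 8-octet buffer is
   Delivered as 3 octets and the buffer is consumed, which mirrors the
   message-oriented behavior described in the text.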
   Either data transfer model Places a ULP Message into a DDP Message.
   Each DDP Message is then sliced into DDP Segments that are intended
   to fit within a Lower Layer Protocol's (LLP) Maximum Upper Layer
   Protocol Data Unit (MULPDU). Thus the ULP can post arbitrary size
   ULP Messages, containing up to 2^32 - 1 octets of ULP Payload, and
   DDP slices the ULP message into DDP Segments which are reassembled
   transparently at the Data Sink.

   DDP provides in-order Delivery for the ULP. However, DDP
   differentiates between Data Delivery and Data Placement. DDP
   provides enough information in each DDP Segment to allow the ULP
   Payload in each inbound DDP Segment to be directly Placed
   into the correct ULP Buffer, even when the DDP Segments arrive
   out-of-order. Thus, DDP enables the reassembly of ULP Payload
   contained in DDP Segments of a DDP Message into a ULP Message to
   occur within the ULP Buffer, therefore eliminating the traditional
   copy out of the reassembly buffer into the ULP Buffer.

   A DDP Message's payload is Delivered to the ULP when:

   * all DDP Segments of a DDP Message have been completely received
     and the payload of the DDP Message has been Placed into the
     associated ULP Buffer,

   * all prior DDP Messages have been Placed, and

   * all prior DDP Message Deliveries have been performed.

   The LLP under DDP may support a single LLP stream of data per
   connection (e.g. TCP) or multiple LLP streams of data per connection
   (e.g. SCTP). But in either case, DDP is specified such that each DDP
   Stream is independent and maps to a single LLP stream. Within a
   specific DDP Stream, the LLP Stream is required to provide in-order,
   reliable Delivery. Note that DDP has no ordering guarantees between
   DDP Streams.

   A DDP protocol could potentially run over reliable Delivery LLPs or
   unreliable Delivery LLPs. This specification requires reliable,
   in-order Delivery LLPs.

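   The separation of Placement from Delivery described in this section
   can be sketched as follows. This is an illustrative model only, not
   an implementation of the wire protocol: all function and class names
   are hypothetical, each segment is reduced to an (offset, last flag,
   payload) tuple, the header length is passed in as a parameter, and
   the sketch assumes the LLP never hands up a duplicate segment.

```python
def segment_message(payload, mulpdu, header_len):
    """Slice one DDP Message into (offset, last, chunk) tuples so that
    header_len + len(chunk) never exceeds the MULPDU."""
    room = mulpdu - header_len
    if room <= 0:
        raise ValueError("MULPDU too small for the DDP header")
    segments = []
    offset = 0
    while True:
        chunk = payload[offset:offset + room]
        last = offset + len(chunk) >= len(payload)
        segments.append((offset, last, chunk))
        if last:
            return segments
        offset += len(chunk)


class MessageReassembler:
    """Places the segments of one DDP Message straight into the ULP buffer."""

    def __init__(self, ulp_buffer):
        self.buf = ulp_buffer  # bytearray owned by the ULP
        self.placed = 0        # octets Placed so far
        self.total = None      # message length, known once Last is seen

    def place(self, offset, last, chunk):
        """Placement: may happen in any arrival order."""
        if offset < 0 or offset + len(chunk) > len(self.buf):
            raise ValueError("segment falls outside the ULP buffer")
        self.buf[offset:offset + len(chunk)] = chunk
        self.placed += len(chunk)
        if last:
            self.total = offset + len(chunk)

    def complete(self):
        return self.total is not None and self.placed == self.total


def deliverable(messages):
    """Delivery: return the indices of messages that may be Delivered.
    A message qualifies only if it and every prior message are complete."""
    ready = []
    for index, message in enumerate(messages):
        if not message.complete():
            break
        ready.append(index)
    return ready
```

   For example, with a MULPDU of 18 octets and a 14-octet header (the
   Tagged Buffer header size implied by the field widths in Section
   6.2), a 10-octet message is sliced into segments of 4, 4, and 2
   octets. Those segments can be Placed in reverse order and still
   fill the ULP buffer correctly, while a completed second message is
   held back until the first one is also complete.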
3.3 DDP Layering

   DDP is intended to be LLP independent, subject to the requirements
   defined in section 5. However, DDP was specifically defined to be
   part of a family of protocols that were created to work well
   together, as shown in Figure 1 DDP Layering. For LLP protocol
   definitions of each LLP, see [MPA], [TCP], and [SCTP].

   DDP enables direct data Placement capability for any ULP, but it has
   been specifically designed to work well with RDMAP (see [RDMAP]), and
   is part of the iWARP protocol suite.

                     +-------------------+
                     |                   |
                     |     RDMA ULP      |
                     |                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 |                   |
   |       ULP       |       RDMAP       |
   |                 |                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                     |
   |            DDP protocol             |
   |                                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 |                   |
   |       MPA       |                   |
   |                 |                   |
   |                 |                   |
   +-+-+-+-+-+-+-+-+-+       SCTP        |
   |                 |                   |
   |       TCP       |                   |
   |                 |                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 1 DDP Layering

   If DDP is layered below RDMAP and on top of MPA and TCP, then the
   respective headers and payload are arranged as follows (Note: For
   clarity, MPA header and CRC are included but framing markers are not
   shown.):

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   //                          TCP Header                         //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         MPA Header            |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               |
   //                          DDP Header                         //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   //                         RDMAP Header                        //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   //                       RDMAP ULP Payload                     //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Figure 2 MPA, DDP, and RDMAP Header Alignment

4 Glossary

4.1 General

   Advertisement (Advertised, Advertise, Advertisements, Advertises) -
       the act of informing a Remote Peer that a local RDMA Buffer is
       available to it. A Node makes available an RDMA Buffer for
       incoming RDMA Read or RDMA Write access by informing its
       RDMA/DDP peer of the Tagged Buffer identifiers (STag, base
       address, length). This advertisement of Tagged Buffer
       information is not defined by RDMA/DDP and is left to the ULP. A
       typical method would be for the Local Peer to embed the Tagged
       Buffer's Steering Tag, address, and length in a Send message
       destined for the Remote Peer.

   Data Delivery (Delivery, Delivered, Delivers) - Delivery is defined
       as the process of informing the ULP or consumer that a
       particular Message is available for use. This is specifically
       different from "Placement", which may generally occur in any
       order, while the order of "Delivery" is strictly defined. See
       "Data Placement".

   Data Sink - The peer receiving a data payload. Note that the Data
       Sink can be required to both send and receive RDMA/DDP Messages
       to transfer a data payload.

   Data Source - The peer sending a data payload. Note that the Data
       Source can be required to both send and receive RDMA/DDP
       Messages to transfer a data payload.

   iWARP - A suite of wire protocols comprised of RDMAP [RDMAP], DDP
       [DDP], and MPA [MPA]. The iWARP protocol suite may be layered
       above TCP, SCTP, or other transport protocols.

   Local Peer - The RDMA/DDP protocol implementation on the local end
       of the connection. Used to refer to the local entity when
       describing a protocol exchange or other interaction between two
       Nodes.

   Node - A computing device attached to one or more links of a
       network. A Node in this context does not refer to a specific
       application or protocol instantiation running on the computer.
       A Node may consist of one or more RNICs installed in a host
       computer.

   Remote Peer - The RDMA/DDP protocol implementation on the opposite
       end of the connection. Used to refer to the remote entity when
       describing protocol exchanges or other interactions between two
       Nodes.

   ULP - Upper Layer Protocol. The protocol layer above the protocol
       layer currently being referenced. The ULP for RDMA/DDP is
       expected to be an OS, Application, adaptation layer, or
       proprietary device. The RDMA/DDP documents do not specify a ULP
       - they provide a set of semantics that allow a ULP to be
       designed to utilize RDMA/DDP.

   ULP Message - The ULP data that is handed to a specific protocol
       layer for transmission. Data boundaries are preserved as they
       are transmitted through iWARP.

   ULP Payload - The ULP data that is contained within a single
       protocol segment or packet (e.g. a DDP Segment).

4.2 LLP

   LLP - Lower Layer Protocol. The protocol layer beneath the protocol
       layer currently being referenced. For example, for DDP the LLP
       is SCTP, MPA, or other transport protocols. For RDMA, the LLP is
       DDP.

   LLP Connection - Corresponds to an LLP transport-level connection
       between the peer LLP layers on two Nodes.

   LLP Stream - Corresponds to a single LLP transport-level stream
       between the peer LLP layers on two Nodes. One or more LLP
       Streams may map to a single transport-level LLP Connection. For
       transport protocols that support multiple streams per connection
       (e.g. SCTP), an LLP Stream corresponds to one transport-level
       stream.

   MULPDU - Maximum ULPDU. The current maximum size of the record that
       is acceptable for DDP to pass to the LLP for transmission.

   ULPDU - Upper Layer Protocol Data Unit. The data record defined by
       the layer above MPA.

4.3 Direct Data Placement (DDP)

   DDP Graceful Teardown - The act of closing a DDP Stream such that
       all in-progress and pending DDP Messages are allowed to complete
       successfully.

   DDP Abortive Teardown - The act of closing a DDP Stream without
       attempting to complete in-progress and pending DDP Messages.

   Data Placement (Placement, Placed, Places) - For DDP, this term is
       specifically used to indicate the process of writing to a data
       buffer by a DDP implementation. DDP Segments carry Placement
       information, which may be used by the receiving DDP
       implementation to perform Data Placement of the DDP Segment ULP
       Payload. See "Data Delivery".

   DDP Control Field - A fixed 8-bit field in the DDP Header.

   DDP Header - The header present in all DDP Segments. The DDP Header
       contains control and Placement fields that are used to define
       the final Placement location for the ULP Payload carried in a
       DDP Segment.

   DDP Message - A ULP defined unit of data interchange, which is
       subdivided into one or more DDP Segments. This segmentation may
       occur for a variety of reasons, including segmentation to
       respect the maximum segment size of the underlying transport
       protocol.

   DDP Segment - The smallest unit of data transfer for the DDP
       protocol. It includes a DDP Header and ULP Payload (if present).
       A DDP Segment should be sized to fit within the Lower Layer
       Protocol MULPDU.

   DDP Stream - A sequence of DDP messages whose ordering is defined by
       the LLP. For SCTP, a DDP Stream maps directly to an SCTP stream.
       For MPA, a DDP Stream maps directly to a TCP connection and a
       single DDP Stream is supported. Note that DDP has no ordering
       guarantees between DDP Streams.

   DDP Stream Identifier (ID) - An identifier for a DDP Stream.

   Direct Data Placement - A mechanism whereby ULP data contained
       within DDP Segments may be Placed directly into its final
       destination in memory without processing of the ULP. This may
       occur even when the DDP Segments arrive out of order. Out of
       order Placement support may require the Data Sink to implement
       the LLP and DDP as one functional block.

   Direct Data Placement Protocol (DDP) - Also, a wire protocol that
       supports Direct Data Placement by associating explicit memory
       buffer placement information with the LLP payload units.

   Message Offset (MO) - For the DDP Untagged Buffer Model, specifies
       the offset, in octets, from the start of a DDP Message.

   Message Sequence Number (MSN) - For the DDP Untagged Buffer Model,
       specifies a sequence number that is increasing with each DDP
       Message.

   Protection Domain (PD) - A Mechanism used to associate a DDP Stream
       and an STag. Under this mechanism, the use of an STag is valid
       on a DDP Stream if the STag has the same Protection Domain
       Identifier (PD ID) as the DDP Stream.

   Protection Domain Identifier (PD ID) - An identifier for the
       Protection Domain.

   Queue Number (QN) - For the DDP Untagged Buffer Model, identifies a
       destination Data Sink queue for a DDP Segment.

   Steering Tag - An identifier of a Tagged Buffer on a Node, valid as
       defined within a protocol specification.

   STag - Steering Tag

   Tagged Buffer - A buffer that is explicitly Advertised to the Remote
       Peer through exchange of an STag, Tagged Offset, and length.

   Tagged Buffer Model - A DDP data transfer model used to transfer
       Tagged Buffers from the Local Peer to the Remote Peer.

   Tagged DDP Message - A DDP Message that targets a Tagged Buffer.

   Tagged Offset (TO) - The offset within a Tagged Buffer on a Node.

   ULP Buffer - A buffer owned above the DDP Layer and advertised to
       the DDP Layer either as a Tagged Buffer or an Untagged ULP
       Buffer.

   ULP Message Length - The total length of the ULP Payload contained
       in a DDP Message.

   Untagged Buffer - A buffer that is not explicitly Advertised to the
       Remote Peer.

   Untagged Buffer Model - A DDP data transfer model used to transfer
       Untagged Buffers from the Local Peer to the Remote Peer.

   Untagged DDP Message - A DDP Message that targets an Untagged
       Buffer.

5 Reliable Delivery LLP Requirements

   1.  LLPs MUST expose MULPDU & MULPDU Changes. This is required so
       that the DDP layer can perform segmentation aligned with the
       MULPDU and can adapt as MULPDU changes come about. The corner
       case of how to handle outstanding requests during a MULPDU
       change is covered by the requirements below.

   2.  In the event of a MULPDU change, DDP MUST NOT be required by the
       LLP to re-segment DDP Segments that have been previously posted
       to the LLP. Note that under pathological conditions the LLP may
       change the advertised MULPDU more frequently than the queue of
       previously posted DDP Segment transmit requests is flushed.
       Under this pathological condition, the LLP transmit queue can
       contain DDP Messages which were posted multiple MULPDU updates
       previously, thus there may be no correlation between the queued
       DDP Segment(s) and the LLP's current value of MULPDU.

   3.  The LLP MUST ensure that if it accepts a DDP Segment, it will
       transfer it reliably to the receiver or return with an error
       stating that the transfer failed to complete.

   4.  The LLP MUST preserve DDP Segment and Message boundaries at the
       Data Sink.

   5.  The LLP MAY provide the incoming segments out of order for
       Placement, but if it does, it MUST also provide information that
       specifies what the sender-specified order was.

   6.  The LLP MUST provide a strong digest (at least equivalent to
       CRC32-C) to cover at least the DDP Segment. It is believed that
       some of the existing data integrity digests are not sufficient
       and that direct memory transfer semantics require a stronger
       digest than, for example, a simple checksum.

   7.  On receive, the LLP MUST provide the length of the DDP Segment
       received. This ensures that DDP does not have to carry a length
       field in its header.

   8.  If an LLP does not support teardown of an LLP stream independent
       of other LLP streams and a DDP error occurs on a specific DDP
       Stream, then the LLP MUST label the associated LLP stream as an
       erroneous LLP stream and MUST NOT allow any further data
       transfer on that LLP stream after DDP requests the associated
       DDP Stream to be torn down.

   9.  For a specific LLP Stream, the LLP MUST provide a mechanism to
       indicate that the LLP Stream has been gracefully torn down. For
       a specific LLP Connection, the LLP MUST provide a mechanism to
       indicate that the LLP Connection has been gracefully torn down.
       Note that if the LLP does not allow an LLP Stream to be torn
       down independently of the LLP Connection, the above requirements
       allow the LLP to notify DDP of both events at the same time.

   10. For a specific LLP Connection, when all LLP Streams are either
       gracefully torn down or are labeled as erroneous LLP streams,
       the LLP Connection MUST be torn down.

   11. The LLP MUST NOT pass a duplicate DDP Segment to the DDP Layer
       after it has passed all the previous DDP Segments to the DDP
       Layer and the associated ordering information for the previous
       DDP Segments and the current DDP Segment.

6 Header Format

   DDP has two different header formats: one for Data Placement into
   Tagged Buffers, and the other for Data Placement into Untagged
   Buffers. See Section 7.1 for a description of the two models.

6.1 DDP Control Field

   The first 8 bits of the DDP Header carry a DDP Control Field that is
   common between the two formats. It is shown below in Figure 3,
   offset by 16 bits to accommodate the MPA header defined in [MPA].
   The MPA header is only present if DDP is layered on top of MPA.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+
                                   |T|L| Rsvd  |DV |
                                   +-+-+-+-+-+-+-+-+

                    Figure 3 DDP Control Field

   T - Tagged flag: 1 bit.

       Specifies the Tagged or Untagged Buffer Model. If set to one,
       the ULP Payload carried in this DDP Segment MUST be Placed into
       a Tagged Buffer.

       If set to zero, the ULP Payload carried in this DDP Segment
       MUST be Placed into an Untagged Buffer.

   L - Last flag: 1 bit.

       Specifies whether the DDP Segment is the Last segment of a DDP
       Message. It MUST be set to one on the last DDP Segment of every
       DDP Message. It MUST NOT be set to one on any other DDP
       Segment.

612 The DDP Segment with the L bit set to 1 MUST be posted to the 613 LLP after all other DDP Segments of the associated DDP Message 614 have been posted to the LLP. For an Untagged DDP Message, the 615 DDP Segment with the L bit set to 1 MUST carry the highest MO. 617 If the Last flag is set to one, the DDP Message payload MUST be 618 Delivered to the ULP after: 620 . Placement of all DDP Segments of this DDP Message and all 621 prior DDP Messages, and 622 shah, et. al. Expires August 2003 15 623 . Delivery of each prior DDP Message. 625 If the Last flag is set to zero, the DDP Segment is an 626 intermediate DDP Segment. 628 Rsvd - Reserved: 4 bits. 630 Reserved for future use by the DDP protocol. This field MUST be 631 set to zero on transmit, and not checked on receive. 633 DV - Direct Data Placement Protocol Version: 2 bits. 635 The version of the DDP Protocol in use. This field MUST be set 636 to one to indicate the version of the specification described 637 in this document. The value of DV MUST be the same for all the 638 DDP Segments transmitted or received on a DDP Stream. 640 6.2 DDP Tagged Buffer Model Header 642 Figure 4 shows the DDP Header format that MUST be used in all DDP 643 Segments that target Tagged Buffers. It includes the DDP Control 644 Field previously defined in Section 6.1. (Note: In Figure 4, the DDP 645 Header is offset by 16 bits to accommodate the MPA header defined in 646 [MPA]. The MPA header is only present if DDP is layered on top of 647 MPA.) 649 0 1 2 3 650 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 651 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 652 |T|L| Rsvd | DV| RsvdULP | 653 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 654 | STag | 655 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 656 | | 657 + TO + 658 | | 659 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 660 Figure 4 Tagged Buffer DDP Header 662 T is set to one. 
664  RsvdULP - Reserved for use by the ULP: 8 bits.

666      The RsvdULP field is opaque to the DDP protocol and can be
667      structured in any way by the ULP. At the Data Source, DDP MUST
668      set the RsvdULP field to the value specified by the ULP. It is
669      transferred unmodified from the Data Source to the Data Sink.
670      At the Data Sink, DDP MUST provide the RsvdULP field to the ULP
672      when the DDP Message is delivered. Each DDP Segment within a
673      specific DDP Message MUST contain the same value for this
674      field.

676  STag - Steering Tag: 32 bits.

678      The Steering Tag identifies the Data Sink's Tagged Buffer. The
679      STag MUST be valid for this DDP Stream. The STag is associated
680      with the DDP Stream through a mechanism that is outside the
681      scope of the DDP Protocol specification. At the Data Source,
682      DDP MUST set the STag field to the value specified by the ULP.
683      At the Data Sink, DDP MUST provide the STag field when the
684      ULP Message is delivered. Each DDP Segment within a specific
685      DDP Message MUST contain the same value for this field, which
686      MUST be the value supplied by the ULP.

688  TO - Tagged Offset: 64 bits.

690      The Tagged Offset specifies the offset, in octets, within the
691      Data Sink's Tagged Buffer, where the Placement of ULP Payload
692      contained in the DDP Segment starts. A DDP Message MAY start at
693      an arbitrary TO within a Tagged Buffer.

695  6.3 DDP Untagged Buffer Model Header

697  Figure 5 shows the DDP Header format that MUST be used in all DDP
698  Segments that target Untagged Buffers. It includes the DDP Control
699  Field previously defined in Section 6.1. (Note: In Figure 5, the DDP
700  Header is offset by 16 bits to accommodate the MPA header defined in
701  [MPA]. The MPA header is only present if DDP is layered on top of
702  MPA.)

705   0                   1                   2                   3
706   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
707                                  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
708                                  |T|L| Rsvd  | DV| RsvdULP[0:7]  |
709  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
710  |                         RsvdULP[8:39]                         |
711  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
712  |                              QN                               |
713  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
714  |                              MSN                              |
715  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
716  |                              MO                               |
717  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
718                  Figure 5 Untagged Buffer DDP Header

720  T is set to zero.

722  RsvdULP - Reserved for use by the ULP: 40 bits.

724      The RsvdULP field is opaque to the DDP protocol and can be
725      structured in any way by the ULP. At the Data Source, DDP MUST
726      set the RsvdULP field to the value specified by the ULP. It is
727      transferred unmodified from the Data Source to the Data Sink.
728      At the Data Sink, DDP MUST provide the RsvdULP field to the ULP
729      when the ULP Message is Delivered. Each DDP Segment within a
730      specific DDP Message MUST contain the same value for the
731      RsvdULP field. At the Data Sink, the DDP implementation is NOT
732      REQUIRED to verify that the same value is present in the
733      RsvdULP field of each DDP Segment within a specific DDP Message
734      and MAY provide the value from any one of the received DDP
735      Segments to the ULP when the ULP Message is Delivered.

737  QN - Queue Number: 32 bits.

739      The Queue Number identifies the Data Sink's Untagged Buffer
740      queue referenced by this header. Each DDP Segment within a
741      specific DDP Message MUST contain the same value for this field,
742      which MUST be the value supplied by the ULP at the Data Source.

744  MSN - Message Sequence Number: 32 bits.
746      The Message Sequence Number specifies a sequence number that
747      MUST be increased by one (modulo 2^32) with each DDP Message
748      targeting the specific Queue Number on the DDP Stream
749      associated with this DDP Segment. The initial value for MSN
752      MUST be one. The MSN value MUST wrap to 0 after a value of
753      0xFFFFFFFF.

755  MO - Message Offset: 32 bits.

757      The Message Offset specifies the offset, in octets, from the
758      start of the DDP Message represented by the MSN and Queue
759      Number on the DDP Stream associated with this DDP Segment. The
760      MO referencing the first octet of the DDP Message MUST be set
761      to zero by the DDP layer.

763  6.4 DDP Segment Format

765  Each DDP Segment MUST contain a DDP Header. Each DDP Segment may
766  also contain ULP Payload. Following is the DDP Segment format:

768  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
769  |  DDP  |                                       |
770  | Header|         ULP Payload (if any)          |
771  |       |                                       |
772  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
773             Figure 6 DDP Segment Format

776  7 Data Transfer

778  DDP supports multi-segment DDP Messages. Each DDP Message is
779  composed of one or more DDP Segments. Each DDP Segment contains a
780  DDP Header. The DDP Header contains the information required by the
781  receiver to Place any ULP Payload included in the DDP Segment.

783  7.1 DDP Tagged or Untagged Buffer Models

785  DDP uses two basic Buffer Models for the Placement of the ULP
786  Payload: Tagged Buffer Model and Untagged Buffer Model.

788  7.1.1 Tagged Buffer Model

790  The Tagged Buffer Model is used by the Data Source to transfer a DDP
791  Message into a Tagged Buffer at the Data Sink that has been
792  previously Advertised to the Data Source. An STag identifies a
793  Tagged Buffer.
For the Placement of a DDP Message using the Tagged 794 Buffer model, the STag is used to identify the buffer, and the TO is 795 used to identify the offset within the Tagged Buffer into which the 796 ULP Payload is transferred. The protocol used to Advertise the 797 Tagged Buffer is outside the scope of this specification (i.e. ULP 798 specific). A DDP Message can start at an arbitrary TO within a 799 Tagged Buffer. 801 Additionally, a Tagged Buffer can potentially be written multiple 802 times. This might be done for error recovery or because a buffer is 803 being re-used after some ULP specific synchronization mechanism. 805 7.1.2 Untagged Buffer Model 807 The Untagged Buffer Model is used by the Data Source to transfer a 808 DDP Message to the Data Sink into a queued buffer. 810 The DDP Queue Number is used by the ULP to separate ULP messages 811 into different queues of receive buffers. For example, if two queues 812 were supported, the ULP could use one queue to post buffers handed 813 to it by the application above the ULP, and it could use the other 814 queue for buffers which are only consumed by ULP specific control 815 messages. This enables the separation of ULP control messages from 816 opaque ULP Payload when using Untagged Buffers. 818 The DDP Message Sequence Number can be used by the Data Sink to 819 identify the specific Untagged Buffer. The protocol used to 820 communicate how many buffers have been queued is outside the scope 821 of this specification. Similarly, the exact implementation of the 822 buffer queue is outside the scope of this specification. 824 shah, et. al. Expires August 2003 20 825 7.2 Segmentation and Reassembly of a DDP Message 827 At the Data Source, the DDP layer MUST segment the data contained in 828 a ULP message into a series of DDP Segments, where each DDP Segment 829 contains a DDP Header and ULP Payload, and MUST be no larger than 830 the MULPDU value advertised by the LLP. 
The ULP Message Length MUST
831  be less than 2^32. At the Data Source, the DDP layer MUST send all
832  the data contained in the ULP message. At the Data Sink, the DDP
833  layer MUST Place the ULP Payload contained in all valid incoming DDP
834  Segments associated with a DDP Message into the ULP Buffer.

836  DDP Message segmentation at the Data Source is accomplished by
837  identifying a DDP Message (which corresponds one-to-one with a ULP
838  Message) uniquely and then, for each associated DDP Segment of a DDP
839  Message, by specifying an octet offset for the portion of the ULP
840  Message contained in the DDP Segment.

842  For an Untagged DDP Message, the combination of the QN and MSN
843  uniquely identifies a DDP Message. The octet offset for each DDP
844  Segment of an Untagged DDP Message is the MO field. For each DDP
845  Segment of an Untagged DDP Message, the MO MUST be set to the octet
846  offset from the first octet in the associated ULP Message (which is
847  defined to be zero) to the first octet in the ULP Payload contained
848  in the DDP Segment.

850  For example, if the ULP Untagged Message was 2048 octets, and the
851  MULPDU was 1500 octets, the Data Source would generate two DDP
852  Segments, one with MO = 0, containing 1482 octets of ULP Payload,
853  and a second with MO = 1482, containing 566 octets of ULP Payload.
854  In this example, the amount of ULP Payload for the first DDP Segment
855  was calculated as:

857      1482 = 1500 (MULPDU) - 18 (for the DDP Header)

859  For a Tagged DDP Message, the STag and TO, combined with the
860  in-order delivery characteristics of the LLP, are used to segment and
861  reassemble the ULP Message. Because the initial octet offset (the TO
862  field) can be non-zero, recovery of the original ULP Message
863  boundary cannot be done in the general case without an additional
864  ULP Message.
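As a rough illustration of the Untagged segmentation rule and the 2048-octet example above, the following Python sketch computes the MO, payload length, and Last flag for each DDP Segment. The function name `untagged_segments` and the tuple layout are hypothetical, not part of the draft. The 18-octet header size is derived from the field widths of the Untagged Buffer DDP Header. Emitting a single empty segment for a zero-length message is an assumption; the draft states that rule only for Tagged DDP Messages.

```python
# Untagged DDP Header: control (8) + RsvdULP (40) + QN (32) + MSN (32) + MO (32) bits
UNTAGGED_HDR_OCTETS = (8 + 40 + 32 + 32 + 32) // 8  # 18 octets


def untagged_segments(msg_len: int, mulpdu: int):
    """Return (MO, payload_len, last_flag) for each DDP Segment of one message.

    Illustrative only: names and return shape are not defined by the draft.
    """
    if not 0 <= msg_len < 2**32:
        raise ValueError("ULP Message Length MUST be less than 2^32")
    room = mulpdu - UNTAGGED_HDR_OCTETS  # payload octets that fit in one segment
    if room <= 0:
        raise ValueError("MULPDU too small for an Untagged DDP Header")

    segments = []
    mo = 0  # the MO of the first octet of the message is zero
    while True:
        length = min(room, msg_len - mo)
        last = mo + length == msg_len  # L flag is set only on the final segment
        segments.append((mo, length, last))
        mo += length
        if last:
            return segments
```

With `msg_len=2048` and `mulpdu=1500` this yields MO = 0 with 1482 octets and MO = 1482 with 566 octets. Adding the MO and payload length of the final segment gives back the ULP Message Length, which is how a Data Sink may compute it.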
866 Implementers Note: One implementation, valid for some ULPs such 867 as RDMAP, is to not directly support recovery of the ULP 868 Message boundary for a Tagged DDP Message. For example, the ULP 869 may wish to have the Local Peer use small buffers at the Data 870 Source even when the ULP at the Data Sink has advertised a 871 single large Tagged Buffer for this data transfer. In this 872 case, the ULP may choose to use the same STag for multiple 873 shah, et. al. Expires August 2003 21 874 consecutive ULP Messages. Thus a non-zero initial TO and re-use 875 of the STag effectively enables the ULP to implement 876 segmentation and reassembly due to ULP specific constraints. 877 See [RDMAP] for details of how this is done. 879 A different implementation of a ULP could use an Untagged DDP 880 Message sent after the Tagged DDP Message which details the 881 initial TO for the STag that was used in the Tagged DDP 882 Message. And finally, another implementation of a ULP could 883 choose to always use an initial TO of zero such that no 884 additional message is required to convey the initial TO used in 885 a Tagged DDP Message. 887 Regardless of whether the ULP chooses to recover the original ULP 888 Message boundary at the Data Sink for a Tagged DDP Message, DDP 889 supports segmentation and reassembly of the Tagged DDP Message. The 890 STag is used to identify the ULP Buffer at the Data Sink and the TO 891 is used to identify the octet-offset within the ULP Buffer 892 referenced by the STag. The ULP at the Data Source MUST specify the 893 STag and the initial TO when the ULP Message is handed to DDP. 895 For each DDP Segment of a Tagged DDP Message, the TO MUST be set to 896 the octet offset from the first octet in the associated ULP Message 897 to the first octet in the ULP Payload contained in the DDP Segment, 898 plus the TO assigned to the first octet in the associated ULP 899 Message. 
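The TO rule above could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the draft's implementation: `tagged_segments` and `control_byte` are hypothetical names, and the control byte assumes the usual RFC diagram convention that bit 0 in the header figure is the most significant bit. The 14-octet header (control, RsvdULP, STag, TO) follows the field widths of the Tagged Buffer DDP Header.

```python
import struct

# control (1 octet), RsvdULP (1), STag (4), TO (8), network byte order: 14 octets
TAGGED_HDR = struct.Struct(">BBIQ")
DV = 1  # DDP version value required by this draft


def control_byte(tagged: bool, last: bool) -> int:
    # Assumed layout: T is the top bit, then L, then 4 reserved zero bits, then DV.
    return (int(tagged) << 7) | (int(last) << 6) | DV


def tagged_segments(payload: bytes, stag: int, initial_to: int,
                    mulpdu: int, rsvd_ulp: int = 0):
    """Yield (header_bytes, chunk) for each DDP Segment of one Tagged DDP Message."""
    room = mulpdu - TAGGED_HDR.size
    if room <= 0:
        raise ValueError("MULPDU too small for a Tagged DDP Header")
    offset = 0  # octet offset of this chunk within the ULP Message
    while True:
        chunk = payload[offset:offset + room]
        last = offset + len(chunk) >= len(payload)
        to = initial_to + offset  # TO = message offset + TO of the first octet
        yield TAGGED_HDR.pack(control_byte(True, last), rsvd_ulp, stag, to), chunk
        offset += len(chunk)
        if last:  # a zero-length message still produces exactly one segment
            break
```

Every segment of the message carries the same STag and RsvdULP value; only the TO and the L flag change from one segment to the next.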
901 For example, if the ULP Tagged Message was 2048 octets with an 902 initial TO of 16384, and the MULPDU was 1500 octets, the Data Source 903 would generate two DDP Segments, one with TO = 16384, containing the 904 first 1486 octets of ULP payload, and a second with TO = 17870, 905 containing 562 octets of ULP payload. In this example, the amount of 906 ULP payload for the first DDP Segment was calculated as: 908 1486 = 1500 (MULPDU) - 14 (for the DDP Header) 910 A zero-length Tagged DDP Message is allowed and MUST consume exactly 911 one DDP Segment. Only the DDP Control and RsvdULP Fields MUST be 912 valid for a zero length Tagged DDP Segment. The STag and TO fields 913 MUST NOT be checked for a zero-length Tagged DDP Message. 915 For either Untagged or Tagged DDP Messages, the Data Sink is not 916 required to verify that the entire ULP Message has been received. 918 7.3 Ordering Among DDP Messages 920 Messages passed through the DDP MUST conform to the ordering rules 921 defined in this section. 922 shah, et. al. Expires August 2003 22 923 At the Data Source, DDP: 925 * MUST transmit DDP Messages in the order they were submitted to 926 the DDP layer, 928 * SHOULD transmit DDP Segments within a DDP Message in increasing 929 MO order for Untagged DDP Messages and in increasing TO order 930 for Tagged DDP Messages. 932 At the Data Sink, DDP (Note: The following rules are motivated by 933 LLP implementations that separate Placement and Delivery.): 935 * MAY perform Placement of DDP Segments out of order, 937 * MAY perform Placement of a DDP Segment more than once, 939 * MUST Deliver a DDP Message to the ULP at most once, 941 * MUST Deliver DDP Messages to the ULP in the order they were 942 sent by the Data Source. 944 7.4 DDP Message Completion & Delivery 946 At the Data Source, DDP Message transfer is considered completed 947 when the reliable, in-order transport LLP has indicated that the 948 transfer will occur reliably. 
Note that this in no way restricts the 949 LLP from buffering the data at either the Data Source or Data Sink. 950 Thus at the Data Source, completion of a DDP Message does not 951 necessarily mean that the Data Sink has received the message. 953 At the Data Sink, DDP MUST Deliver a DDP Message if and only if all 954 of the following are true: 956 * the last DDP Segment of the DDP Message had its Last flag set, 958 * all of the DDP Segments of the DDP Message have been Placed, 960 * all preceding DDP Messages have been Placed, and 962 * each preceding DDP Message has been Delivered to the ULP. 964 At the Data Sink, DDP MUST provide the ULP Message Length to the ULP 965 when an Untagged DDP Message is Delivered. The ULP Message Length 966 may be calculated by adding the MO and the ULP Payload length in the 967 last DDP Segment (with the Last flag set) of an Untagged DDP 968 Message. 970 shah, et. al. Expires August 2003 23 971 At the Data Sink, DDP MUST provide the RsvdULP Field of the DDP 972 Message to the ULP when the DDP Message is delivered. 974 shah, et. al. Expires August 2003 24 975 8 DDP Stream Setup & Teardown 977 This section describes LLP independent issues related to DDP Stream 978 setup and teardown. 980 8.1 DDP Stream Setup 982 It is expected that the ULP will use a mechanism outside the scope 983 of this specification to establish an LLP Connection, and that the 984 LLP Connection will support one or more LLP Streams (e.g. MPA/TCP or 985 SCTP). After the LLP sets up the LLP Stream, it will enable a DDP 986 Stream on a specific LLP Stream at an appropriate point. 988 The ULP is required to enable both endpoints of an LLP Stream for 989 DDP data transfer at the same time, in both directions; this is 990 necessary so that the Data Sink can properly recognize the DDP 991 Segments. 993 8.2 DDP Stream Teardown 995 DDP MUST NOT independently initiate Stream Teardown. 
DDP either
996  responds to a stream being torn down by the LLP or processes a
997  request from the ULP to tear down a stream. DDP Stream teardown
998  disables DDP capabilities on both endpoints. For connection-oriented
999  LLPs, DDP Stream teardown MAY result in underlying LLP Connection
1000  teardown.

1002  8.2.1 DDP Graceful Teardown

1004  It is up to the ULP to ensure that DDP teardown happens on both
1005  endpoints of the DDP Stream at the same time; this is necessary so
1006  that the Data Sink stops trying to interpret the DDP Segments.

1008  If the Local Peer ULP indicates graceful teardown, the DDP layer on
1009  the Local Peer SHOULD ensure that all ULP data is transferred
1010  before the underlying LLP Stream & Connection are torn down, and any
1011  further data transfer requests by the Local Peer ULP MUST return an
1012  error.

1014  If the DDP layer on the Local Peer receives a graceful teardown
1015  request from the LLP, any further data received after the request is
1016  considered an error and MUST cause the DDP Stream to be abortively
1017  torn down.

1019  If the Local Peer LLP supports a half-closed LLP Stream, on the
1020  receipt of an LLP graceful teardown request of the DDP Stream, DDP
1021  SHOULD indicate the half-closed state to the ULP, and continue to
1022  process outbound data transfer requests normally. Following this
1024  event, when the Local Peer ULP requests graceful teardown, DDP MUST
1025  indicate to the LLP that it SHOULD perform a graceful close of the
1026  other half of the LLP Stream.

1028  If the Local Peer LLP supports a half-closed LLP Stream, on the
1029  receipt of a ULP graceful half-close teardown request of the DDP
1030  Stream, DDP SHOULD keep data reception enabled on the other half of
1031  the LLP stream.

1033  8.2.2 DDP Abortive Teardown

1035  As previously mentioned, DDP does not independently terminate a DDP
1036  Stream.
Thus any of the following fatal errors on a DDP Stream MUST
1037  cause DDP to indicate to the ULP that a fatal error has occurred:

1039  *  Underlying LLP Connection or LLP Stream is lost.

1041  *  Underlying LLP reports a catastrophic error.

1043  *  DDP Header has one or more invalid fields.

1045  If the LLP indicates to the ULP that a fatal error has occurred, the
1046  DDP layer SHOULD report the error to the ULP (see Section 9.2, DDP
1047  Error Numbers) and complete all outstanding ULP requests with an
1048  error. If the underlying LLP Stream is still intact, DDP SHOULD
1049  continue to allow the ULP to transfer additional DDP Messages on the
1050  outgoing half connection after the fatal error was indicated to the
1051  ULP. This enables the ULP to transfer an error syndrome to the
1052  Remote Peer. After indicating to the ULP that a fatal error has
1053  occurred, the DDP Stream MUST NOT be terminated until the Local Peer
1054  ULP indicates to the DDP layer that the DDP Stream should be
1055  abortively torn down.

1058  9 Error Semantics

1060  All LLP errors reported to DDP SHOULD be passed up to the ULP.

1062  9.1 Errors detected at the Data Sink

1064  For non-zero length Untagged DDP Segments, the DDP Segment MUST be
1065  validated before Placement by verifying:

1067  1. The QN is valid for this stream.

1069  2. The QN and MSN have an associated buffer that allows Placement
1070     of the payload.

1072  3. The MO falls in the range of legal offsets associated with the
1073     Untagged Buffer.

1075  4. The sum of the DDP Segment payload length and the MO falls in
1076     the range of legal offsets associated with the Untagged Buffer.

1078  5. For DDP Messages using the Untagged Buffer model, the Message
1079     Sequence Number falls in the range of legal Message Sequence
1080     Numbers, for the queue defined by the QN.
The legal range is 1081 defined as being between the MSN value assigned to the first 1082 available buffer for a specific QN and the MSN value assigned to 1083 the last available buffer for a specific QN. 1085 Implementers note: for a typical Queue Number, the lower limit 1086 of the Message Sequence Number is defined by whatever DDP 1087 Messages have already been Completed. The upper limit is 1088 defined by however many message buffers are currently available 1089 for that queue. Both numbers change dynamically as new DDP 1090 Messages are received and Completed, and new buffers are added. 1091 It is up to the ULP to ensure that sufficient buffers are 1092 available to handle the incoming DDP Segments. 1094 For non-zero length Tagged DDP Segments, the segment MUST be 1095 validated before Placement by verifying: 1097 1. The STag is valid for this stream. 1099 2. The STag has an associated buffer that allows Placement of the 1100 payload. 1102 3. The TO falls in the range of legal offsets registered for the 1103 STag. 1105 shah, et. al. Expires August 2003 27 1106 4. The sum of the DDP Segment payload length and the TO falls in 1107 the range of legal offsets registered for the STag. 1109 5. A 64-bit unsigned sum of the DDP Segment payload length and the 1110 TO does not wrap. 1112 If the DDP layer detects any of the receive errors listed in this 1113 section, it MUST cease placing the remainder of the DDP Segment and 1114 report the error(s) to the ULP. The DDP layer SHOULD include in the 1115 error report the DDP Header, the type of error, and the length of 1116 the DDP segment, if available. DDP MUST silently drop any subsequent 1117 incoming DDP Segments. Since each of these errors represents a 1118 failure of the sending ULP or protocol, DDP SHOULD enable the ULP to 1119 send one additional DDP Message before terminating the DDP Stream. 1121 9.2 DDP Error Numbers 1123 The following error numbers MUST be used when reporting receive 1124 errors to the ULP. 
They correspond to the checks enumerated in
1125  Section 9.1. Each error is subdivided into a 4-bit Error Type and an
1126  8-bit Error Code.

1128     Error    Error
1129     Type     Code        Description
1130     ----------------------------------------------------------
1131     0x0      0x00        Local Catastrophic

1133     0x1                  Tagged Buffer Error
1134              0x00        Invalid STag
1135              0x01        Base or bounds violation
1136              0x02        STag not associated with DDP Stream
1137              0x03        TO wrap
1138              0x04        Invalid DDP version

1140     0x2                  Untagged Buffer Error
1141              0x01        Invalid QN
1142              0x02        Invalid MSN - no buffer available
1143              0x03        Invalid MSN - MSN range is not valid
1144              0x04        Invalid MO
1145              0x05        DDP Message too long for available buffer
1146              0x06        Invalid DDP version

1148     0x3      Rsvd        Reserved for use by the LLP

1151  10 Security Considerations

1153  This section discusses both protocol-specific considerations and the
1154  implications of using DDP with existing security mechanisms.

1156  10.1 Protocol-specific Security Considerations

1158  The vulnerabilities of DDP to active third-party interference are no
1159  greater than those of any other protocol running over TCP. A third
1160  party, by injecting spoofed packets into the network that are
1161  Delivered to a DDP Data Sink, could launch a variety of attacks that
1162  exploit DDP-specific behavior. Since DDP directly or indirectly
1163  exposes memory addresses on the wire, the Placement information
1164  carried in each DDP Segment must be validated, including an invalid
1165  STag check and an octet-level base and bounds check, before any data
1166  is Placed. For example, a third-party adversary could inject random
1167  packets that appear to be valid DDP Segments and corrupt the memory
1168  on a DDP Data Sink. Since DDP is IP transport protocol independent,
1169  communication security mechanisms such as IPsec [IPSEC] or TLS [TLS]
1170  may be used to prevent such attacks.
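The validation that Section 10.1 calls for, using the Tagged checks of Section 9.1 and the error numbers of Section 9.2, might look roughly like the Python sketch below. The `TaggedBuffer` record, the `buffers` table, and the `check_tagged` function are hypothetical bookkeeping invented for illustration. The mapping from each check to an error code is an assumption too, since the draft lists the checks and the codes but does not pair them explicitly.

```python
from dataclasses import dataclass

ERR_TAGGED = 0x1            # Error Type: Tagged Buffer Error
INVALID_STAG = 0x00
BASE_OR_BOUNDS = 0x01
STAG_NOT_ON_STREAM = 0x02
TO_WRAP = 0x03
MAX_U64 = 0xFFFFFFFFFFFFFFFF


@dataclass(frozen=True)
class TaggedBuffer:
    """Hypothetical local record for one registered STag."""
    stream_id: int  # DDP Stream the STag is associated with
    first_to: int   # lowest legal TO for the buffer
    length: int     # number of octets registered


def check_tagged(buffers, stream_id, stag, to, payload_len):
    """Return None if Placement may proceed, else (error_type, error_code)."""
    if payload_len == 0:
        return None  # STag and TO are not checked for a zero-length message
    buf = buffers.get(stag)
    if buf is None:
        return (ERR_TAGGED, INVALID_STAG)
    if buf.stream_id != stream_id:
        return (ERR_TAGGED, STAG_NOT_ON_STREAM)
    end = to + payload_len
    if end > MAX_U64:  # a 64-bit unsigned sum of TO and length would wrap
        return (ERR_TAGGED, TO_WRAP)
    if to < buf.first_to or end > buf.first_to + buf.length:
        return (ERR_TAGGED, BASE_OR_BOUNDS)
    return None
```

All checks run before any octet is Placed, so a rejected segment never touches ULP memory. Python integers do not overflow, which lets the wrap check be written as a plain comparison; a fixed-width language would need to detect the overflow before adding.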
1172 10.2 Using IPSec with DDP 1174 IPsec can be used to protect against the packet injection attacks 1175 outlined above. Because IPsec is designed to secure arbitrary IP 1176 packet streams, including streams where packets are lost, DDP can 1177 run on top of IPsec without any change. IPsec packets are processed 1178 (e.g., integrity checked and possibly decrypted) in the order they 1179 are received, and a DDP Data Sink will process the decrypted DDP 1180 Segments contained in these packets in the same manner as DDP 1181 Segments contained in unsecured IP packets. 1183 10.3 Association of an STag and a DDP Stream 1185 There are several mechanisms for associating an STag and a DDP 1186 Stream. Two reasonable mechanisms for this association are a 1187 Protection Domain (PD) association and a DDP Stream association. 1189 Under the Protection Domain (PD) association, a unique Protection 1190 Domain Identifier (PD ID) is created and used locally to associate 1191 an STag with a set of DDP Streams. Under this mechanism, the use of 1192 the STag is only permitted on the DDP Streams that have the same PD 1193 ID as the STag. For an incoming DDP Segment of a Tagged DDP Message 1194 on a DDP Stream, if the PD ID of the DDP Stream is not the same as 1195 the PD ID of the STag targeted by the Tagged DDP Message, then the 1196 DDP Segment is not placed and the DDP layer MUST surface a local 1197 error to the ULP. Note that the PD ID is locally defined, and cannot 1198 be directly manipulated by the Remote Peer. 1199 shah, et. al. Expires August 2003 29 1200 Under the DDP Stream association, a DDP Stream is identified locally 1201 by a unique DDP Stream identifier (ID). An STag is associated with a 1202 DDP Stream by using a DDP Stream ID. 
In this case, for an incoming
1203  DDP Segment of a Tagged DDP Message on a DDP Stream, if the DDP
1204  Stream ID of the DDP Stream is not the same as the DDP Stream ID of
1205  the STag targeted by the Tagged DDP Message, then the DDP Segment is
1206  not placed and the DDP layer MUST surface a local error to the ULP.
1207  Note that the DDP Stream ID is locally defined, and cannot be
1208  directly manipulated by the Remote Peer.

1210  A ULP SHOULD associate an STag and a DDP Stream. DDP MUST support
1211  Protection Domain association and DDP Stream association mechanisms
1212  for associating an STag and a DDP Stream.

1214  10.4 Other Security Considerations

1216  DDP has several mechanisms that deal with a number of attacks.
1217  These attacks include, but are not limited to:

1219  1. Connection to/from an unauthorized or unauthenticated endpoint.
1220  2. Hijacking of a DDP Stream.
1221  3. Attempts to read or write from unauthorized memory regions.
1222  4. Injection of RDMA Messages within a Stream on a multi-user
1223     operating system by another application.

1225  DDP relies on the LLP to establish the LLP Stream over which DDP
1226  Messages will be carried. DDP itself does nothing to authenticate
1227  the validity of the LLP Stream of either of the endpoints. It is the
1228  responsibility of the ULP to validate the LLP Stream. This is highly
1229  desirable due to the nature of DDP.

1231  Hijacking of a DDP Stream would require that the underlying LLP
1232  Stream is hijacked. This would require knowledge of Advertised
1233  buffers in order to directly Place data into a user buffer and is
1234  therefore constrained by the same techniques mentioned to guard
1235  against attempts to read or write from unauthorized memory regions.

1237  DDP does not require a node to open its buffers to arbitrary attacks
1238  over the DDP Stream. It may access ULP memory only to the extent
1239  that the ULP has enabled and authorized it to do so.
The STag
1240  access control model is defined by a (forthcoming) document.
1241  Specific security operations include:

1243  1. STags are only valid over the exact byte range established by the
1244     ULP. DDP MUST provide a mechanism for the ULP to establish and
1247     revoke the TO range associated with the ULP Buffer referenced by
1248     the STag.
1249  2. STags are only valid for the duration established by the ULP. The
1250     ULP may revoke them at any time, in accordance with its own upper
1251     layer protocol requirements. DDP MUST provide a mechanism for the
1252     ULP to establish and revoke STag validity.
1253  3. DDP MUST provide a mechanism for the ULP to communicate the
1254     association between an STag and a specific DDP Stream.
1255  4. A ULP may only expose memory to remote access to the extent that
1256     it already had access to that memory itself.
1257  5. If an STag is not valid on a DDP Stream, DDP MUST pass the invalid
1258     access attempt to the ULP. The ULP may provide a mechanism for
1259     terminating the DDP Stream.

1261  Further, DDP provides a mechanism that directly Places incoming
1262  payloads in user-mode ULP Buffers. This avoids the risks of prior
1263  solutions that relied upon exposing system buffers for incoming
1264  payloads.

1267  11 IANA Considerations

1269  If DDP was enabled a priori for a ULP by connecting to a well-known
1270  port, this well-known port would be registered for the DDP with
1271  IANA.

1274  12 References

1276  12.1 Normative References

1278  [RFC2026] Bradner, S., "The Internet Standards Process -- Revision
1279      3", BCP 9, RFC 2026, October 1996.

1281  [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
1282      Requirement Levels", BCP 14, RFC 2119, March 1997.

1284  [MPA] P.
Culley et al., "Markers with PDU Alignment", RDMA 1285 Consortium Draft Specification draft-cully-iwarp-mpa-01.doc, 1286 October 2002 1288 [RDMAP] R. Recio et al., "RDMA Protocol Specification", RDMA 1289 Consortium Draft Specification draft-recio-iwarp-01, October 1290 2002 1292 [SCTP] R. Stewart et al., "Stream Control Transmission Protocol", 1293 RFC 2960, October 2000. 1295 [TCP] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, 1296 September 1981. 1298 12.2 Informative References 1300 [TLS] Dierks, T. and C. Allen, "The TLS Protocol Version 1.0", RFC 1301 2246, November 1998. 1303 [IPSEC] Atkinson, R., Kent, S., "Security Architecture for the 1304 Internet Protocol", RFC 2401, November 1998. 1306 shah, et. al. Expires August 2003 33 1307 13 Appendix 1309 13.1 Receive Window sizing 1311 Reliable, sequenced, LLPs include a mechanism to advertise the 1312 amount of receive buffer space a sender may consume. This is 1313 generally called a "receive window". 1315 DDP allows data to be transferred directly to predefined buffers at 1316 the Data Sink. Accordingly, the LLP receive window size need not be 1317 affected by the reception of a DDP Segment, if that segment is 1318 placed before additional segments arrive. 1320 The LLP implementation SHOULD maintain an advertised receive window 1321 large enough to enable a reasonable number of segments to be 1322 outstanding at one time. The amount to advertise depends on the 1323 desired data rate, and the expected or actual round trip delay 1324 between endpoints. 1326 The amount of actual buffers maintained to "back up" the receive 1327 window is left up to the implementation. This amount will depend on 1328 the rate that DDP Segments can be retired; there may be some cases 1329 where segment processing cannot keep up with the incoming packet 1330 rate. If this occurs, one reasonable way to slow the incoming packet 1331 rate is to reduce the receive window. 
1333 Note that the LLP should take care to comply with the applicable 1334 RFCs; for instance, for TCP, receivers are highly discouraged from 1335 "shrinking" the receive window (reducing the right edge of the 1336 window after it has been advertised). 1338 shah, et. al. Expires August 2003 34 1339 14 Author's Addresses 1341 Paul R. Culley 1342 Hewlett-Packard Company 1343 20555 SH 249 1344 Houston, TX 77070-2698 USA 1345 Phone: +1 (281) 514-5543 1346 Email: paul.culley@hp.com 1348 James Pinkerton 1349 Microsoft Corporation 1350 One Microsoft Way 1351 Redmond, WA 98052 USA 1352 Phone: +1 (425) 705-5442 1353 Email: jpink@microsoft.com 1355 Renato Recio 1356 IBM Corporation 1357 11501 Burnett Road 1358 Austin, TX 78758 USA 1359 Phone: +1 (512) 838-1365 1360 Email: recio@us.ibm.com 1362 Hemal Shah 1363 Intel Corporation 1364 MS PTL1 1365 1501 South Mopac Expressway, #400 1366 Austin, TX 78746 USA 1367 Phone: +1 (512) 732-3963 1368 Email: hemal.shah@intel.com 1370 shah, et. al. Expires August 2003 35 1371 15 Acknowledgments 1373 John Carrier 1374 Adaptec, Inc. 1375 691 S. Milpitas Blvd. 1376 Milpitas, CA 95035 USA 1377 Phone: +1 (360) 378-8526 1378 Email: john_carrier@adaptec.com 1380 Hari Ghadia 1381 Adaptec, Inc. 1382 691 S. Milpitas Blvd., 1383 Milpitas, CA 95035 USA 1384 Phone: +1 (408) 957-5608 1385 Email: hari_ghadia@adaptec.com 1387 Patricia Thaler 1388 Agilent Technologies, Inc. 
1389 1101 Creekside Ridge Drive, #100 1390 M/S-RG10 1391 Roseville, CA 95678 1392 Phone: +1-916-788-5662 1393 email: pat_thaler@agilent.com 1395 Mike Penna 1396 Broadcom Corporation 1397 16215 Alton Parkway 1398 Irvine, California 92619-7013 USA 1399 Phone: +1 (949) 926-7149 1400 Email: MPenna@Broadcom.com 1402 Uri Elzur 1403 Broadcom Corporation 1404 16215 Alton Parkway 1405 Irvine, California 92619-7013 USA 1406 Phone: +1 (949) 585-6432 1407 Email: Uri@Broadcom.com 1409 Ted Compton 1410 EMC Corporation 1411 Research Triangle Park, NC 27709, USA 1412 Phone: 919-248-6075 1413 Email: compton_ted@emc.com 1415 Jim Wendt 1416 Hewlett-Packard Company 1417 8000 Foothills Boulevard 1418 Roseville, CA 95747-5668 USA 1419 shah, et. al. Expires August 2003 36 1420 Phone: +1 (916) 785-5198 1421 Email: jim_wendt@hp.com 1423 Mike Krause 1424 Hewlett-Packard Company, 43LN 1425 19410 Homestead Road 1426 Cupertino, CA 95014 USA 1427 Phone: +1 (408) 447-3191 1428 Email: krause@cup.hp.com 1430 Dave Minturn 1431 Intel Corporation 1432 MS JF1-210 1433 5200 North East Elam Young Parkway 1434 Hillsboro, OR 97124 USA 1435 Phone: +1 (503) 712-4106 1436 Email: dave.b.minturn@intel.com 1438 Howard C. Herbert 1439 Intel Corporation 1440 MS CH7-404 1441 5000 West Chandler Blvd. 1442 Chandler, AZ 85226 USA 1443 Phone: +1 (480) 554-3116 1444 Email: howard.c.herbert@intel.com 1446 Tom Talpey 1447 Network Appliance 1448 375 Totten Pond Road 1449 Waltham, MA 02451 USA 1450 Phone: +1 (781) 768-5329 1451 EMail: thomas.talpey@netapp.com 1453 Dwight Barron 1454 Hewlett-Packard Company 1455 20555 SH 249 1456 Houston, TX 77070-2698 USA 1457 Phone: +1 (281) 514-2769 1458 Email: Dwight.Barron@Hp.com 1460 Dave Garcia 1461 Hewlett-Packard Company 1462 19333 Vallco Parkway 1463 Cupertino, Ca. 95014 USA 1464 Phone: +1 (408) 285-6116 1465 Email: dave.garcia@hp.com 1467 shah, et. al. Expires August 2003 37 1468 Jeff Hilland 1469 Hewlett-Packard Company 1470 20555 SH 249 1471 Houston, Tx. 
77070-2698 USA 1472 Phone: +1 (281) 514-9489 1473 Email: jeff.hilland@hp.com 1475 shah, et. al. Expires August 2003 38 1476 16 Full Copyright Statement 1478 This document and the information contained herein is provided on an 1479 "AS IS" basis and ADAPTEC INC., AGILENT TECHNOLOGIES INC., BROADCOM 1480 CORPORATION, CISCO SYSTEMS INC., EMC CORPORATION, HEWLETT-PACKARD 1481 COMPANY, INTERNATIONAL BUSINESS MACHINES CORPORATION, INTEL 1482 CORPORATION, MICROSOFT CORPORATION, NETWORK APPLIANCE INC., THE 1483 INTERNET SOCIETY, AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM 1484 ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY 1485 WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE 1486 ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS 1487 FOR A PARTICULAR PURPOSE. 1489 Copyright (c) 2002 ADAPTEC INC., BROADCOM CORPORATION, CISCO SYSTEMS 1490 INC., EMC CORPORATION, HEWLETT-PACKARD COMPANY, INTERNATIONAL 1491 BUSINESS MACHINES CORPORATION, INTEL CORPORATION, MICROSOFT 1492 CORPORATION, NETWORK APPLIANCE INC., All Rights Reserved. 1494 shah, et. al. Expires August 2003 39