idnits 2.17.1 draft-ietf-rddp-ddp-01.txt: ** The Abstract section seems to be numbered Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** Looks like you're using RFC 2026 boilerplate. This must be updated to follow RFC 3978/3979, as updated by RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == There are 3 instances of lines with non-ascii characters in the document. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RDMA' is mentioned on line 257, but not defined == Missing Reference: 'DDP' is mentioned on line 350, but not defined == Unused Reference: 'RFC2026' is defined on line 1292, but no explicit reference was found in the text == Unused Reference: 'RFC2119' is defined on line 1295, but no explicit reference was found in the text -- No information found for draft-cully-iwarp-mpa - is the name correct? -- Possible downref: Normative reference to a draft: ref. 'MPA' -- No information found for draft-recio-iwarp - is the name correct? -- Possible downref: Normative reference to a draft: ref. 'RDMAP' ** Obsolete normative reference: RFC 2960 (ref. 'SCTP') (Obsoleted by RFC 4960) ** Obsolete normative reference: RFC 793 (ref. 'TCP') (Obsoleted by RFC 9293) -- Obsolete informational reference (is this intentional?): RFC 2246 (ref. 'TLS') (Obsoleted by RFC 4346) -- Obsolete informational reference (is this intentional?): RFC 2401 (ref. 'IPSEC') (Obsoleted by RFC 4301) Summary: 4 errors (**), 0 flaws (~~), 7 warnings (==), 8 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 INTERNET-DRAFT Hemal Shah 2 draft-ietf-rddp-ddp-01.txt Intel Corporation 3 James Pinkerton 4 Microsoft Corporation 5 Renato Recio 6 IBM Corporation 7 Paul Culley 8 Hewlett-Packard Company 10 Expires: April, 2004 12 Direct Data Placement over Reliable Transports 14 1 Status of this Memo 16 This document is an Internet-Draft and is subject to all provisions 17 of Section 10 of RFC2026. 19 Internet-Drafts are working documents of the Internet Engineering 20 Task Force (IETF), its areas, and its working groups. 
Note that 21 other groups may also distribute working documents as Internet- 22 Drafts. 24 Internet-Drafts are draft documents valid for a maximum of six 25 months and may be updated, replaced, or obsoleted by other documents 26 at any time. It is inappropriate to use Internet-Drafts as 27 reference material or to cite them other than as "work in progress." 29 The list of current Internet-Drafts can be accessed at 30 http://www.ietf.org/1id-abstracts.html The list of Internet-Draft 31 Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html. 34 2 Abstract 36 The Direct Data Placement protocol provides information to Place the 37 incoming data directly into an upper layer protocol's receive buffer 38 without intermediate buffers. This removes excess CPU and memory 39 utilization associated with transferring data through the 40 intermediate buffers. 42 shah, et. al. Expires April 2004 1 43 Table of Contents 45 1 Status of this Memo..........................................1 46 2 Abstract.....................................................1 47 3 Introduction.................................................4 48 3.1 Architectural Goals..........................................4 49 3.2 Protocol Overview............................................5 50 3.3 DDP Layering.................................................6 51 4 Glossary.....................................................9 52 4.1 General......................................................9 53 4.2 LLP.........................................................10 54 4.3 Direct Data Placement (DDP).................................10 55 5 Reliable Delivery LLP Requirements..........................13 56 6 Header Format...............................................15 57 6.1 DDP Control Field...........................................15 58 6.2 DDP Tagged Buffer Model Header..............................16 59 6.3 DDP Untagged Buffer Model Header............................17 60 6.4 DDP Segment 
Format..........................................18 61 7 Data Transfer...............................................19 62 7.1 DDP Tagged or Untagged Buffer Models........................19 63 7.1.1 Tagged Buffer Model.......................................19 64 7.1.2 Untagged Buffer Model.....................................19 65 7.2 Segmentation and Reassembly of a DDP Message................19 66 7.3 Ordering Among DDP Messages.................................21 67 7.4 DDP Message Completion & Delivery...........................22 68 8 DDP Stream Setup & Teardown.................................23 69 8.1 DDP Stream Setup............................................23 70 8.2 DDP Stream Teardown.........................................23 71 8.2.1 DDP Graceful Teardown.....................................23 72 8.2.2 DDP Abortive Teardown.....................................24 73 9 Error Semantics.............................................25 74 9.1 Errors detected at the Data Sink............................25 75 9.2 DDP Error Numbers...........................................26 76 10 Security Considerations.....................................27 77 10.1 Protocol-specific Security Considerations.................27 78 10.2 Using IPSec with DDP......................................27 79 10.3 Association of an STag and a DDP Stream...................27 80 10.4 Other Security Considerations.............................28 81 11 IANA Considerations.........................................30 82 12 References..................................................31 83 12.1 Normative References......................................31 84 12.2 Informative References....................................31 85 13 Appendix....................................................32 86 13.1 Receive Window sizing.....................................32 87 14 Author's Addresses..........................................33 88 15 Acknowledgments.............................................34 89 16 
Full Copyright Statement....................................36 91 Table of Figures 93 Figure 1 DDP Layering.............................................7 94 Figure 2 MPA, DDP, and RDMAP Header Alignment.....................8 96 shah, et. al. Expires April 2004 2 97 Figure 3 DDP Control Field.......................................15 98 Figure 4 Tagged Buffer DDP Header................................16 99 Figure 5 Untagged Buffer DDP Header..............................17 100 Figure 6 DDP Segment Format......................................18 102 shah, et. al. Expires April 2004 3 103 3 Introduction 105 Direct Data Placement Protocol (DDP) enables an Upper Layer Protocol 106 (ULP) to send data to a Data Sink without requiring the Data Sink to 107 Place the data in an intermediate buffer - thus when the data 108 arrives at the Data Sink, the network interface can Place the data 109 directly into the ULP's buffer. This can enable the Data Sink to 110 consume substantially less memory bandwidth than a buffered model 111 because the Data Sink is not required to move the data from the 112 intermediate buffer to the final destination. Additionally, this can 113 also enable the network protocol to consume substantially fewer CPU 114 cycles than if the CPU was used to move the data, and removes the 115 bandwidth limitation of only being able to move data as fast as the 116 CPU can copy the data. 118 DDP preserves ULP record boundaries (messages) while providing a 119 variety of data transfer mechanisms and completion mechanisms to be 120 used to transfer ULP messages. 122 3.1 Architectural Goals 124 DDP has been designed with the following high-level architectural 125 goals: 127 * Provide a buffer model that enables the Local Peer to Advertise 128 a named buffer (i.e. a Tag for a buffer) to the Remote Peer, 129 such that across the network the Remote Peer can Place data 130 into the buffer at Remote Peer specified locations. 
This is 131 referred to as the Tagged Buffer Model. 133 * Provide a second receive buffer model which preserves ULP 134 message boundaries from the Remote Peer and keeps the Local 135 Peer's buffers anonymous (i.e. Untagged). This is referred to 136 as the Untagged Buffer Model. 138 * Provide reliable, in-order Delivery semantics for both Tagged 139 and Untagged Buffer Models. 141 * Provide segmentation and reassembly of ULP messages. 143 * Enable the ULP buffer to be used as a reassembly buffer, 144 without a need for a copy, even if incoming DDP Segments arrive 145 out of order. This requires the protocol to separate Data 146 Placement of ULP Payload contained in an incoming DDP Segment 147 from Data Delivery of completed ULP Messages. 149 * If the LLP supports multiple LLP streams within a LLP 150 Connection, provide the above capabilities independently on 151 each LLP stream and enable the capability to be exported on a 152 per LLP stream basis to the ULP. 154 shah, et. al. Expires April 2004 4 155 3.2 Protocol Overview 157 DDP supports two basic data transfer models - a Tagged Buffer data 158 transfer model and an Untagged Buffer data transfer model. 160 The Tagged Buffer data transfer model requires the Data Sink to send 161 the Data Source an identifier for the ULP buffer, referred to as a 162 Steering Tag (STag). The STag is transferred to the Data Source 163 using a ULP defined method. Once the Data Source ULP has an STag for 164 a destination ULP buffer, it can request that DDP send the ULP data 165 to the destination ULP buffer by specifying the STag to DDP. Note 166 that the Tagged Buffer does not have to be filled starting at the 167 beginning of the ULP buffer. The ULP Data Source can provide an 168 arbitrary offset into the ULP buffer. 170 The Untagged Buffer data transfer model enables data transfer to 171 occur without requiring the Data Sink to Advertise a ULP Buffer to 172 the Data Source. 
The Data Sink can queue up a series of receive ULP 173 buffers. An Untagged DDP Message from the Data Source consumes an 174 Untagged Buffer at the Data Sink. Because DDP is message oriented, 175 even if the Data Source sends a DDP Message payload smaller than the 176 receive ULP buffer, the partially filled receive ULP buffer is 177 Delivered to the ULP anyway. If the Data Source sends a DDP Message 178 payload larger than the receive ULP buffer, it results in an error. 180 There are several key differences between the Tagged and Untagged 181 Buffer Model: 183 * For the Tagged Buffer Model, the Data Source specifies which 184 received Tagged Buffer will be used for a specific Tagged DDP 185 Message (sender-based ULP buffer management). For the Untagged 186 Buffer Model, the Data Sink specifies the order in which 187 Untagged Buffers will be consumed as Untagged DDP Messages are 188 received (receiver-based ULP buffer management). 190 * For the Tagged Buffer Model, the ULP at the Data Sink must 191 Advertise the ULP buffer to the Data Source through a ULP 192 specific mechanism before data transfer can occur. For the 193 Untagged Buffer Model, data transfer can occur without an end- 194 to-end explicit ULP buffer Advertisement. Note, however, that 195 the ULP needs to address flow control issues. 197 * For the Tagged Buffer Model, a DDP Message can start at an 198 arbitrary offset within the Tagged Buffer. For the Untagged 199 Buffer Model, a DDP Message can only start at offset 0. 201 * The Tagged Buffer Model allows multiple DDP Messages targeted 202 to a Tagged Buffer with a single ULP buffer Advertisement. The 203 Untagged Buffer Model requires associating a receive ULP buffer 204 for each DDP Message targeted to an Untagged Buffer. 206 Either data transfer model Places a ULP Message into a DDP Message. 207 Each DDP Message is then sliced into DDP Segments that are intended 209 shah, et. al. 
Expires April 2004 5 210 to fit within a lower-layer-protocol's (LLP) Maximum Upper Layer 211 Protocol Data Unit (MULPDU). Thus the ULP can post arbitrary size 212 ULP Messages, containing up to 2^32 - 1 octets of ULP Payload, and 213 DDP slices the ULP message into DDP Segments which are reassembled 214 transparently at the Data Sink. 216 DDP provides in-order Delivery for the ULP. However, DDP 217 differentiates between Data Delivery and Data Placement. DDP 218 provides enough information in each DDP Segment to allow the ULP 219 Payload in each inbound DDP Segment to be directly Placed 220 into the correct ULP Buffer, even when the DDP Segments arrive out- 221 of-order. Thus, DDP enables the reassembly of ULP Payload contained 222 in DDP Segments of a DDP Message into a ULP Message to occur within 223 the ULP Buffer, therefore eliminating the traditional copy out of 224 the reassembly buffer into the ULP Buffer. 226 A DDP Message's payload is Delivered to the ULP when: 228 * all DDP Segments of a DDP Message have been completely received 229 and the payload of the DDP Message has been Placed into the 230 associated ULP Buffer, 232 * all prior DDP Messages have been Placed, and 234 * all prior DDP Message Deliveries have been performed. 236 The LLP under DDP may support a single LLP stream of data per 237 connection (e.g. TCP) or multiple LLP streams of data per connection 238 (e.g. SCTP). But in either case, DDP is specified such that each DDP 239 Stream is independent and maps to a single LLP stream. Within a 240 specific DDP Stream, the LLP Stream is required to provide in-order, 241 reliable Delivery. Note that DDP has no ordering guarantees between 242 DDP Streams. 244 A DDP protocol could potentially run over reliable Delivery LLPs or 245 unreliable Delivery LLPs. This specification requires reliable, in 246 order Delivery LLPs. 248 3.3 DDP Layering 250 DDP is intended to be LLP independent, subject to the requirements 251 defined in section 5. 
However, DDP was specifically defined to be 252 part of a family of protocols that were created to work well 253 together, as shown in Figure 1 DDP Layering. For LLP protocol 254 definitions of each LLP, see [MPA], [TCP], and [SCTP]. 256 DDP enables direct data Placement capability for any ULP, but it has 257 been specifically designed to work well with RDMAP (see [RDMA]), and 258 is part of the iWARP protocol suite. 260 shah, et. al. Expires April 2004 6
261                      +-------------------+
262                      |                   |
263                      |     RDMA ULP      |
264                      |                   |
265    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
266    |                 |                   |
267    |      ULP        |       RDMAP       |
268    |                 |                   |
269    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
270    |                                     |
271    |           DDP protocol              |
272    |                                     |
273    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
274    |                 |                   |
275    |      MPA        |                   |
276    |                 |                   |
277    |                 |                   |
278    +-+-+-+-+-+-+-+-+-+       SCTP        |
279    |                 |                   |
280    |      TCP        |                   |
281    |                 |                   |
282    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
284              Figure 1 DDP Layering
286 If DDP is layered below RDMAP and on top of MPA and TCP, then the 287 respective headers and payload are arranged as follows (Note: For 288 clarity, MPA header and CRC are included but framing markers are not 289 shown.): 291 shah, et. al. Expires April 2004 7
292     0                   1                   2                   3
293     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
294    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
295    |                                                               |
296    //                         TCP Header                          //
297    |                                                               |
298    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
299    |          MPA Header           |                               |
300    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
301    |                                                               |
302    //                         DDP Header                          //
303    |                                                               |
304    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
305    |                                                               |
306    //                        RDMAP Header                         //
307    |                                                               |
308    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
309    |                                                               |
310    //                      RDMAP ULP Payload                      //
311    |                                                               |
312    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
313    |                            MPA CRC                            |
314    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
316           Figure 2 MPA, DDP, and RDMAP Header Alignment
318 shah, et. al. 
Expires April 2004 8 319 4 Glossary 321 4.1 General 323 Advertisement (Advertised, Advertise, Advertisements, Advertises) - 324 the act of informing a Remote Peer that a local RDMA Buffer is 325 available to it. A Node makes available an RDMA Buffer for 326 incoming RDMA Read or RDMA Write access by informing its 327 RDMA/DDP peer of the Tagged Buffer identifiers (STag, base 328 address, length). This advertisement of Tagged Buffer 329 information is not defined by RDMA/DDP and is left to the ULP. A 330 typical method would be for the Local Peer to embed the Tagged 331 Buffer's Steering Tag, address, and length in a Send message 332 destined for the Remote Peer. 334 Data Delivery (Delivery, Delivered, Delivers) - Delivery is defined 335 as the process of informing the ULP or consumer that a 336 particular Message is available for use. This is specifically 337 different from "Placement", which may generally occur in any 338 order, while the order of "Delivery" is strictly defined. See 339 "Data Placement". 341 Data Sink - The peer receiving a data payload. Note that the Data 342 Sink can be required to both send and receive RDMA/DDP Messages 343 to transfer a data payload. 345 Data Source - The peer sending a data payload. Note that the Data 346 Source can be required to both send and receive RDMA/DDP 347 Messages to transfer a data payload. 349 iWARP - A suite of wire protocols comprised of RDMAP [RDMAP], DDP 350 [DDP], and MPA [MPA]. The iWARP protocol suite may be layered 351 above TCP, SCTP, or other transport protocols. 353 Local Peer - The RDMA/DDP protocol implementation on the local end 354 of the connection. Used to refer to the local entity when 355 describing a protocol exchange or other interaction between two 356 Nodes. 358 Node - A computing device attached to one or more links of network. 359 A Node in this context does not refer to a specific application 360 or protocol instantiation running on the computer. 
A Node may 361 consist of one or more RNICs installed in a host computer. 363 Remote Peer - The RDMA/DDP protocol implementation on the opposite 364 end of the connection. Used to refer to the remote entity when 365 describing protocol exchanges or other interactions between two 366 Nodes. 368 ULP - Upper Layer Protocol. The protocol layer above the protocol 369 layer currently being referenced. The ULP for RDMA/DDP is 370 expected to be an OS, Application, adaptation layer, or 371 proprietary device. The RDMA/DDP documents do not specify a ULP 373 shah, et. al. Expires April 2004 9 374 - they provide a set of semantics that allow a ULP to be 375 designed to utilize RDMA/DDP. 377 ULP Message - the ULP data that is handed to a specific protocol 378 layer for transmission. Data boundaries are preserved as they 379 are transmitted through iWARP. 381 ULP Payload - The ULP data that is contained within a single 382 protocol segment or packet (e.g. a DDP Segment). 384 4.2 LLP 386 LLP - Lower Layer Protocol. The protocol layer beneath the protocol 387 layer currently being referenced. For example, for DDP the LLP 388 is SCTP, MPA, or other transport protocols. For RDMA, the LLP is 389 DDP. 391 LLP Connection - Corresponds to an LLP transport-level connection 392 between the peer LLP layers on two nodes. 394 LLP Stream - Corresponds to a single LLP transport-level stream 395 between the peer LLP layers on two Nodes. One or more LLP 396 Streams may map to a single transport-level LLP Connection. For 397 transport protocols that support multiple streams per connection 398 (e.g. SCTP), a LLP Stream corresponds to one transport-level 399 stream. 401 MULPDU - Maximum ULPDU. The current maximum size of the record that 402 is acceptable for DDP to pass to the LLP for transmission. 404 ULPDU - Upper Layer Protocol Data Unit. The data record defined by 405 the layer above MPA. 
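To make the MULPDU definition above concrete, the following is a minimal Python sketch of how a sender might slice one DDP Message into DDP Segments that each fit within the MULPDU. It is an illustration only, not part of the specification. The function name `ddp_segments` and its parameters are hypothetical. The sketch assumes that the MULPDU bounds the whole DDP Segment (DDP Header plus ULP Payload), that every segment except the last is filled to capacity (the draft only says a segment "should be sized to fit"), and that an empty message is still sent as one header-only segment.

```python
from typing import Iterator, Tuple


def ddp_segments(message_len: int, mulpdu: int,
                 header_len: int) -> Iterator[Tuple[int, int, bool]]:
    """Yield (offset, payload_len, last) for each DDP Segment of one DDP Message.

    message_len: ULP Message Length in octets.
    mulpdu:      current MULPDU reported by the LLP, in octets.
    header_len:  size of the DDP Header in octets (hypothetical parameter).
    """
    # Assumption: header and payload together must fit within the MULPDU.
    max_payload = mulpdu - header_len
    if max_payload <= 0:
        raise ValueError("MULPDU leaves no room for ULP Payload")

    if message_len == 0:
        # Assumption: an empty message is carried by one header-only segment.
        yield 0, 0, True
        return

    offset = 0
    while offset < message_len:
        length = min(max_payload, message_len - offset)
        # Only the final segment of the message is marked as last.
        yield offset, length, offset + length == message_len
        offset += length
```

For example, with a MULPDU of 1024 octets and an 18-octet header, `list(ddp_segments(2500, 1024, 18))` gives `[(0, 1006, False), (1006, 1006, False), (2012, 488, True)]`. Because each tuple carries its own offset, a receiver could Place the payloads in any arrival order.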
407 4.3 Direct Data Placement (DDP) 409 DDP Graceful Teardown - The act of closing a DDP Stream such that 410 all in-progress and pending DDP Messages are allowed to complete 411 successfully. 413 DDP Abortive Teardown - The act of closing a DDP Stream without 414 attempting to complete in-progress and pending DDP Messages. 416 Data Placement (Placement, Placed, Places) - For DDP, this term is 417 specifically used to indicate the process of writing to a data 418 buffer by a DDP implementation. DDP Segments carry Placement 419 information, which may be used by the receiving DDP 420 implementation to perform Data Placement of the DDP Segment ULP 421 Payload. See "Data Delivery". 423 DDP Control Field - a fixed 8-bit field in the DDP Header. 425 shah, et. al. Expires April 2004 10 426 DDP Header - The header present in all DDP Segments. The DDP Header 427 contains control and Placement fields that are used to define 428 the final Placement location for the ULP Payload carried in a 429 DDP Segment. 431 DDP Message - A ULP defined unit of data interchange, which is 432 subdivided into one or more DDP Segments. This segmentation may 433 occur for a variety of reasons, including segmentation to 434 respect the maximum segment size of the underlying transport 435 protocol. 437 DDP Segment - The smallest unit of data transfer for the DDP 438 protocol. It includes a DDP Header and ULP Payload (if present). 439 A DDP Segment should be sized to fit within the Lower Layer 440 Protocol MULPDU. 442 DDP Stream - a sequence of DDP messages whose ordering is defined by 443 the LLP. For SCTP, a DDP Stream maps directly to an SCTP stream. 444 For MPA, a DDP Stream maps directly to a TCP connection and a 445 single DDP Stream is supported. Note that DDP has no ordering 446 guarantees between DDP Streams. 448 DDP Stream Identifier (ID) - An identifier for a DDP Stream. 
450 Direct Data Placement - A mechanism whereby ULP data contained 451 within DDP Segments may be Placed directly into its final 452 destination in memory without processing of the ULP. This may 453 occur even when the DDP Segments arrive out of order. Out of 454 order Placement support may require the Data Sink to implement 455 the LLP and DDP as one functional block. 457 Direct Data Placement Protocol (DDP) - Also, a wire protocol that 458 supports Direct Data Placement by associating explicit memory 459 buffer placement information with the LLP payload units. 461 Message Offset (MO) - For the DDP Untagged Buffer Model, specifies 462 the offset, in octets, from the start of a DDP Message. 464 Message Sequence Number (MSN) - For the DDP Untagged Buffer Model, 465 specifies a sequence number that increases with each DDP 466 Message. 468 Protection Domain (PD) - A mechanism used to associate a DDP Stream 469 and an STag. Under this mechanism, the use of an STag is valid 470 on a DDP Stream if the STag has the same Protection Domain 471 Identifier (PD ID) as the DDP Stream. 473 Protection Domain Identifier (PD ID) - An identifier for the 474 Protection Domain. 476 Queue Number (QN) - For the DDP Untagged Buffer Model, identifies a 477 destination Data Sink queue for a DDP Segment. 479 shah, et. al. Expires April 2004 11 480 Steering Tag - An identifier of a Tagged Buffer on a Node, valid as 481 defined within a protocol specification. 483 STag - Steering Tag 485 Tagged Buffer - A buffer that is explicitly Advertised to the Remote 486 Peer through exchange of an STag, Tagged Offset, and length. 488 Tagged Buffer Model - A DDP data transfer model used to transfer 489 Tagged Buffers from the Local Peer to the Remote Peer. 491 Tagged DDP Message - A DDP Message that targets a Tagged Buffer. 493 Tagged Offset (TO) - The offset within a Tagged Buffer on a Node. 
495 ULP Buffer - A buffer owned above the DDP Layer and advertised to 496 the DDP Layer either as a Tagged Buffer or an Untagged ULP 497 Buffer. 499 ULP Message Length - is the total length of the ULP Payload contained 500 in a DDP Message. 502 Untagged Buffer - A buffer that is not explicitly Advertised to the 503 Remote Peer. 505 Untagged Buffer Model - A DDP data transfer model used to transfer 506 Untagged Buffers from the Local Peer to the Remote Peer. 508 Untagged DDP Message - A DDP Message that targets an Untagged 509 Buffer. 511 shah, et. al. Expires April 2004 12 512 5 Reliable Delivery LLP Requirements 514 1. LLPs MUST expose MULPDU & MULPDU Changes. This is required so 515 that the DDP layer can perform segmentation aligned with the 516 MULPDU and can adapt as MULPDU changes come about. The corner 517 case of how to handle outstanding requests during a MULPDU 518 change is covered by the requirements below. 520 2. In the event of a MULPDU change, DDP MUST NOT be required by the 521 LLP to re-segment DDP Segments that have been previously posted 522 to the LLP. Note that under pathological conditions the LLP may 523 change the advertised MULPDU more frequently than the queue of 524 previously posted DDP Segment transmit requests is flushed. 525 Under this pathological condition, the LLP transmit queue can 526 contain DDP Messages which were posted multiple MULPDU updates 527 previously, thus there may be no correlation between the queued 528 DDP Segment(s) and the LLP's current value of MULPDU. 530 3. The LLP MUST ensure that if it accepts a DDP Segment, it will 531 transfer it reliably to the receiver or return with an error 532 stating that the transfer failed to complete. 534 4. The LLP MUST preserve DDP Segment and Message boundaries at the 535 Data Sink. 537 5. The LLP MAY provide the incoming segments out of order for 538 Placement, but if it does, it MUST also provide information that 539 specifies what the sender specified order was. 541 6. 
LLP MUST provide a strong digest (at least equivalent to CRC32- 542 C) to cover at least the DDP Segment. It is believed that some 543 of the existing data integrity digests are not sufficient and 544 that direct memory transfer semantics require a stronger digest 545 than, for example, a simple checksum. 547 7. On receive, the LLP MUST provide the length of the DDP Segment 548 received. This ensures that DDP does not have to carry a length 549 field in its header. 551 8. If an LLP does not support teardown of a LLP stream independent 552 of other LLP streams and a DDP error occurs on a specific DDP 553 Stream, then the LLP MUST label the associated LLP stream as an 554 erroneous LLP stream and MUST NOT allow any further data 555 transfer on that LLP stream after DDP requests the associated 556 DDP Stream to be torn down. 558 9. For a specific LLP Stream, the LLP MUST provide a mechanism to 559 indicate that the LLP Stream has been gracefully torn down. For 560 a specific LLP Connection, the LLP MUST provide a mechanism to 561 indicate that the LLP Connection has been gracefully torn down. 562 Note that if the LLP does not allow an LLP Stream to be torn 563 down independently of the LLP Connection, the above requirements 564 allow the LLP to notify DDP of both events at the same time. 566 shah, et. al. Expires April 2004 13 567 10. For a specific LLP Connection, when all LLP Streams are either 568 gracefully torn down or are labeled as erroneous LLP streams, 569 the LLP Connection MUST be torn down. 571 11. The LLP MUST NOT pass a duplicate DDP Segment to the DDP Layer 572 after it has passed all the previous DDP Segments to the DDP 573 Layer and the associated ordering information for the previous 574 DDP Segments and the current DDP Segment. 576 shah, et. al. Expires April 2004 14 577 6 Header Format 579 DDP has two different header formats: one for Data Placement into 580 Tagged Buffers, and the other for Data Placement into Untagged 581 Buffers. 
See Section 7.1 for a description of the two models. 583 6.1 DDP Control Field 585 The first 8 bits of the DDP Header carry a DDP Control Field that is 586 common between the two formats. It is shown below in Figure 3, 587 offset by 16 bits to accommodate the MPA header defined in [MPA]. 588 The MPA header is only present if DDP is layered on top of MPA.
590     0                   1                   2                   3
591     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
592                                    +-+-+-+-+-+-+-+-+
593                                    |T|L| Rsvd  |DV |
594                                    +-+-+-+-+-+-+-+-+
595                     Figure 3 DDP Control Field
597 T - Tagged flag: 1 bit. 599 Specifies the Tagged or Untagged Buffer Model. If set to one, 600 the ULP Payload carried in this DDP Segment MUST be Placed into 601 a Tagged Buffer. 603 If set to zero, the ULP Payload carried in this DDP Segment 604 MUST be Placed into an Untagged Buffer. 606 L - Last flag: 1 bit. 608 Specifies whether the DDP Segment is the Last segment of a DDP 609 Message. It MUST be set to one on the last DDP Segment of every 610 DDP Message. It MUST NOT be set to one on any other DDP 611 Segment. 613 The DDP Segment with the L bit set to 1 MUST be posted to the 614 LLP after all other DDP Segments of the associated DDP Message 615 have been posted to the LLP. For an Untagged DDP Message, the 616 DDP Segment with the L bit set to 1 MUST carry the highest MO. 618 If the Last flag is set to one, the DDP Message payload MUST be 619 Delivered to the ULP after: 621 . Placement of all DDP Segments of this DDP Message and all 622 prior DDP Messages, and 624 . Delivery of each prior DDP Message. 626 If the Last flag is set to zero, the DDP Segment is an 627 intermediate DDP Segment. 629 shah, et. al. Expires April 2004 15 630 Rsvd - Reserved: 4 bits. 632 Reserved for future use by the DDP protocol. This field MUST be 633 set to zero on transmit, and not checked on receive. 635 DV - Direct Data Placement Protocol Version: 2 bits. 637 The version of the DDP Protocol in use. 
This field MUST be set 638 to one to indicate the version of the specification described 639 in this document. The value of DV MUST be the same for all the 640 DDP Segments transmitted or received on a DDP Stream. 642 6.2 DDP Tagged Buffer Model Header 644 Figure 4 shows the DDP Header format that MUST be used in all DDP 645 Segments that target Tagged Buffers. It includes the DDP Control 646 Field previously defined in Section 6.1. (Note: In Figure 4, the DDP 647 Header is offset by 16 bits to accommodate the MPA header defined in 648 [MPA]. The MPA header is only present if DDP is layered on top of 649 MPA.)
651     0                   1                   2                   3
652     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
653                                    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
654                                    |T|L| Rsvd  | DV|   RsvdULP     |
655    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
656    |                              STag                             |
657    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
658    |                                                               |
659    +                               TO                              +
660    |                                                               |
661    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
662                   Figure 4 Tagged Buffer DDP Header
664 T is set to one. 666 RsvdULP - Reserved for use by the ULP: 8 bits. 668 The RsvdULP field is opaque to the DDP protocol and can be 669 structured in any way by the ULP. At the Data Source, DDP MUST 670 set RsvdULP Field to the value specified by the ULP. It is 671 transferred unmodified from the Data Source to the Data Sink. 672 At the Data Sink, DDP MUST provide the RsvdULP field to the ULP 673 when the DDP Message is delivered. Each DDP Segment within a 674 specific DDP Message MUST contain the same value for this 675 field. 677 STag - Steering Tag: 32 bits. 679 The Steering Tag identifies the Data Sink's Tagged Buffer. The 680 STag MUST be valid for this DDP Stream. The STag is associated 681 with the DDP Stream through a mechanism that is outside the 682 scope of the DDP Protocol specification. At the Data Source, 684 shah, et. al. Expires April 2004 16 685 DDP MUST set the STag field to the value specified by the ULP. 
       At the Data Sink, DDP MUST provide the STag field when the
       ULP Message is delivered. Each DDP Segment within a specific
       DDP Message MUST contain the same value for this field, and
       that value MUST be the one supplied by the ULP.

   TO - Tagged Offset: 64 bits.

       The Tagged Offset specifies the offset, in octets, within the
       Data Sink's Tagged Buffer, where the Placement of ULP Payload
       contained in the DDP Segment starts. A DDP Message MAY start at
       an arbitrary TO within a Tagged Buffer.

6.3 DDP Untagged Buffer Model Header

   Figure 5 shows the DDP Header format that MUST be used in all DDP
   Segments that target Untagged Buffers. It includes the DDP Control
   Field previously defined in Section 6.1. (Note: In Figure 5, the DDP
   Header is offset by 16 bits to accommodate the MPA header defined in
   [MPA]. The MPA header is only present if DDP is layered on top of
   MPA.)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                   |T|L| Rsvd  |DV | RsvdULP[0:7]  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                         RsvdULP[8:39]                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               QN                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              MSN                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                               MO                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   Figure 5 Untagged Buffer DDP Header

   T is set to zero.

   RsvdULP - Reserved for use by the ULP: 40 bits.

       The RsvdULP field is opaque to the DDP protocol and can be
       structured in any way by the ULP. At the Data Source, DDP MUST
       set RsvdULP Field to the value specified by the ULP. It is
       transferred unmodified from the Data Source to the Data Sink.
       At the Data Sink, DDP MUST provide the RsvdULP field to the ULP
       when the ULP Message is Delivered.
       Each DDP Segment within a specific DDP Message MUST contain the
       same value for the RsvdULP field. At the Data Sink, the DDP
       implementation is NOT REQUIRED to verify that the same value is
       present in the RsvdULP field of each DDP Segment within a
       specific DDP Message and MAY provide the value from any one of
       the received DDP Segments to the ULP when the ULP Message is
       Delivered.

   QN - Queue Number: 32 bits.

       The Queue Number identifies the Data Sink's Untagged Buffer
       queue referenced by this header. Each DDP Segment within a
       specific DDP Message MUST contain the same value for this
       field, and that value MUST be the one supplied by the ULP at
       the Data Source.

   MSN - Message Sequence Number: 32 bits.

       The Message Sequence Number specifies a sequence number that
       MUST be increased by one (modulo 2^32) with each DDP Message
       targeting the specific Queue Number on the DDP Stream
       associated with this DDP Segment. The initial value for MSN
       MUST be one. The MSN value MUST wrap to 0 after a value of
       0xFFFFFFFF.

   MO - Message Offset: 32 bits.

       The Message Offset specifies the offset, in octets, from the
       start of the DDP Message represented by the MSN and Queue
       Number on the DDP Stream associated with this DDP Segment. The
       MO referencing the first octet of the DDP Message MUST be set
       to zero by the DDP layer.

6.4 DDP Segment Format

   Each DDP Segment MUST contain a DDP Header. Each DDP Segment may
   also contain ULP Payload. Following is the DDP Segment format:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  DDP  |                                       |
   | Header|          ULP Payload (if any)         |
   |       |                                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
               Figure 6 DDP Segment Format

7 Data Transfer

   DDP supports multi-segment DDP Messages. Each DDP Message is
   composed of one or more DDP Segments.
   Each DDP Segment contains a DDP Header. The DDP Header contains the
   information required by the receiver to Place any ULP Payload
   included in the DDP Segment.

7.1 DDP Tagged or Untagged Buffer Models

   DDP uses two basic Buffer Models for the Placement of the ULP
   Payload: Tagged Buffer Model and Untagged Buffer Model.

7.1.1 Tagged Buffer Model

   The Tagged Buffer Model is used by the Data Source to transfer a DDP
   Message into a Tagged Buffer at the Data Sink that has been
   previously Advertised to the Data Source. An STag identifies a
   Tagged Buffer. For the Placement of a DDP Message using the Tagged
   Buffer model, the STag is used to identify the buffer, and the TO is
   used to identify the offset within the Tagged Buffer into which the
   ULP Payload is transferred. The protocol used to Advertise the
   Tagged Buffer is outside the scope of this specification (i.e. ULP
   specific). A DDP Message can start at an arbitrary TO within a
   Tagged Buffer.

   Additionally, a Tagged Buffer can potentially be written multiple
   times. This might be done for error recovery or because a buffer is
   being re-used after some ULP specific synchronization mechanism.

7.1.2 Untagged Buffer Model

   The Untagged Buffer Model is used by the Data Source to transfer a
   DDP Message to the Data Sink into a queued buffer.

   The DDP Queue Number is used by the ULP to separate ULP messages
   into different queues of receive buffers. For example, if two queues
   were supported, the ULP could use one queue to post buffers handed
   to it by the application above the ULP, and it could use the other
   queue for buffers which are only consumed by ULP specific control
   messages. This enables the separation of ULP control messages from
   opaque ULP Payload when using Untagged Buffers.

   The DDP Message Sequence Number can be used by the Data Sink to
   identify the specific Untagged Buffer.
   The protocol used to communicate how many buffers have been queued
   is outside the scope of this specification. Similarly, the exact
   implementation of the buffer queue is outside the scope of this
   specification.

7.2 Segmentation and Reassembly of a DDP Message

   At the Data Source, the DDP layer MUST segment the data contained in
   a ULP message into a series of DDP Segments, where each DDP Segment
   contains a DDP Header and ULP Payload, and MUST be no larger than
   the MULPDU value advertised by the LLP. The ULP Message Length MUST
   be less than 2^32. At the Data Source, the DDP layer MUST send all
   the data contained in the ULP message. At the Data Sink, the DDP
   layer MUST Place the ULP Payload contained in all valid incoming DDP
   Segments associated with a DDP Message into the ULP Buffer.

   DDP Message segmentation at the Data Source is accomplished by
   identifying a DDP Message (which corresponds one-to-one with a ULP
   Message) uniquely and then, for each associated DDP Segment of a DDP
   Message, by specifying an octet offset for the portion of the ULP
   Message contained in the DDP Segment.

   For an Untagged DDP Message, the combination of the QN and MSN
   uniquely identifies a DDP Message. The octet offset for each DDP
   Segment of an Untagged DDP Message is the MO field. For each DDP
   Segment of an Untagged DDP Message, the MO MUST be set to the octet
   offset from the first octet in the associated ULP Message (which is
   defined to be zero) to the first octet in the ULP Payload contained
   in the DDP Segment.

   For example, if the ULP Untagged Message was 2048 octets, and the
   MULPDU was 1500 octets, the Data Source would generate two DDP
   Segments, one with MO = 0, containing 1482 octets of ULP Payload,
   and a second with MO = 1482, containing 566 octets of ULP Payload.
   In this example, the amount of ULP Payload for the first DDP Segment
   was calculated as:

       1482 = 1500 (MULPDU) - 18 (for the DDP Header)

   For a Tagged DDP Message, the STag and TO, combined with the
   in-order delivery characteristics of the LLP, are used to segment
   and reassemble the ULP Message. Because the initial octet offset
   (the TO field) can be non-zero, recovery of the original ULP Message
   boundary cannot be done in the general case without an additional
   ULP Message.

       Implementers Note: One implementation, valid for some ULPs such
       as RDMAP, is to not directly support recovery of the ULP
       Message boundary for a Tagged DDP Message. For example, the ULP
       may wish to have the Local Peer use small buffers at the Data
       Source even when the ULP at the Data Sink has advertised a
       single large Tagged Buffer for this data transfer. In this
       case, the ULP may choose to use the same STag for multiple
       consecutive ULP Messages. Thus a non-zero initial TO and re-use
       of the STag effectively enables the ULP to implement
       segmentation and reassembly due to ULP specific constraints.
       See [RDMAP] for details of how this is done.

       A different implementation of a ULP could use an Untagged DDP
       Message sent after the Tagged DDP Message which details the
       initial TO for the STag that was used in the Tagged DDP
       Message. And finally, another implementation of a ULP could
       choose to always use an initial TO of zero such that no
       additional message is required to convey the initial TO used in
       a Tagged DDP Message.

   Regardless of whether the ULP chooses to recover the original ULP
   Message boundary at the Data Sink for a Tagged DDP Message, DDP
   supports segmentation and reassembly of the Tagged DDP Message.
   The STag is used to identify the ULP Buffer at the Data Sink and the
   TO is used to identify the octet-offset within the ULP Buffer
   referenced by the STag. The ULP at the Data Source MUST specify the
   STag and the initial TO when the ULP Message is handed to DDP.

   For each DDP Segment of a Tagged DDP Message, the TO MUST be set to
   the octet offset from the first octet in the associated ULP Message
   to the first octet in the ULP Payload contained in the DDP Segment,
   plus the TO assigned to the first octet in the associated ULP
   Message.

   For example, if the ULP Tagged Message was 2048 octets with an
   initial TO of 16384, and the MULPDU was 1500 octets, the Data Source
   would generate two DDP Segments, one with TO = 16384, containing the
   first 1486 octets of ULP payload, and a second with TO = 17870,
   containing 562 octets of ULP payload. In this example, the amount of
   ULP payload for the first DDP Segment was calculated as:

       1486 = 1500 (MULPDU) - 14 (for the DDP Header)

   A zero-length Tagged DDP Message is allowed and MUST consume exactly
   one DDP Segment. Only the DDP Control and RsvdULP Fields MUST be
   valid for a zero-length Tagged DDP Segment. The STag and TO fields
   MUST NOT be checked for a zero-length Tagged DDP Message.

   For either Untagged or Tagged DDP Messages, the Data Sink is not
   required to verify that the entire ULP Message has been received.

7.3 Ordering Among DDP Messages

   Messages passed through the DDP MUST conform to the ordering rules
   defined in this section.

   At the Data Source, DDP:

   * MUST transmit DDP Messages in the order they were submitted to
     the DDP layer,

   * SHOULD transmit DDP Segments within a DDP Message in increasing
     MO order for Untagged DDP Messages and in increasing TO order
     for Tagged DDP Messages.
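
   The segmentation arithmetic of Section 7.2 and the increasing-offset
   order preferred above can be illustrated with a minimal Python
   sketch. It is not part of the specification. The function names are
   hypothetical, the header lengths (14 and 18 octets) are taken from
   the worked examples in Section 7.2, and the control-octet layout
   assumes that bit 0 in Figure 4 is the most significant bit, as is
   conventional for such diagrams. Emitting one segment for a
   zero-length Untagged message is also an assumption; the text states
   that rule only for Tagged messages.

```python
import struct

TAGGED_HDR_LEN = 14    # control (1) + RsvdULP (1) + STag (4) + TO (8) octets
UNTAGGED_HDR_LEN = 18  # control (1) + RsvdULP (5) + QN (4) + MSN (4) + MO (4)
DDP_VERSION = 1        # DV value required by Section 6.1


def segment(msg_len, mulpdu, hdr_len, initial_offset=0):
    """Split a ULP Message into (offset, payload_len, last) tuples.

    Offsets are MO values for Untagged messages (initial_offset = 0)
    and TO values for Tagged messages (initial_offset = initial TO).
    Segments are returned in increasing offset order.
    """
    max_payload = mulpdu - hdr_len
    if max_payload <= 0:
        raise ValueError("MULPDU leaves no room for ULP Payload")
    if not 0 <= msg_len < 2**32:
        raise ValueError("ULP Message Length must be less than 2^32")
    if msg_len == 0:
        # One segment with the Last flag set (stated for Tagged messages;
        # assumed here for Untagged messages as well).
        return [(initial_offset, 0, True)]
    segments = []
    sent = 0
    while sent < msg_len:
        n = min(max_payload, msg_len - sent)
        segments.append((initial_offset + sent, n, sent + n == msg_len))
        sent += n
    return segments


def tagged_header(stag, to, last, rsvd_ulp=0):
    """Pack a 14-octet Tagged Buffer DDP Header (hypothetical helper)."""
    # T = 1 (0x80), L (0x40) only on the last segment, Rsvd = 0, DV = 1.
    control = 0x80 | (0x40 if last else 0) | DDP_VERSION
    return struct.pack("!BBIQ", control, rsvd_ulp, stag, to)
```

   Calling segment(2048, 1500, TAGGED_HDR_LEN, 16384) reproduces the
   Tagged example of Section 7.2 (TO = 16384 with 1486 octets, then
   TO = 17870 with 562 octets), and segment(2048, 1500,
   UNTAGGED_HDR_LEN) reproduces the Untagged example.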

   At the Data Sink, DDP (Note: The following rules are motivated by
   LLP implementations that separate Placement and Delivery.):

   * MAY perform Placement of DDP Segments out of order,

   * MAY perform Placement of a DDP Segment more than once,

   * MUST Deliver a DDP Message to the ULP at most once,

   * MUST Deliver DDP Messages to the ULP in the order they were
     sent by the Data Source.

7.4 DDP Message Completion & Delivery

   At the Data Source, DDP Message transfer is considered completed
   when the reliable, in-order transport LLP has indicated that the
   transfer will occur reliably. Note that this in no way restricts the
   LLP from buffering the data at either the Data Source or Data Sink.
   Thus at the Data Source, completion of a DDP Message does not
   necessarily mean that the Data Sink has received the message.

   At the Data Sink, DDP MUST Deliver a DDP Message if and only if all
   of the following are true:

   * the last DDP Segment of the DDP Message had its Last flag set,

   * all of the DDP Segments of the DDP Message have been Placed,

   * all preceding DDP Messages have been Placed, and

   * each preceding DDP Message has been Delivered to the ULP.

   At the Data Sink, DDP MUST provide the ULP Message Length to the ULP
   when an Untagged DDP Message is Delivered. The ULP Message Length
   may be calculated by adding the MO and the ULP Payload length in the
   last DDP Segment (with the Last flag set) of an Untagged DDP
   Message.

   At the Data Sink, DDP MUST provide the RsvdULP Field of the DDP
   Message to the ULP when the DDP Message is delivered.

8 DDP Stream Setup & Teardown

   This section describes LLP independent issues related to DDP Stream
   setup and teardown.

8.1 DDP Stream Setup

   It is expected that the ULP will use a mechanism outside the scope
   of this specification to establish an LLP Connection, and that the
   LLP Connection will support one or more LLP Streams (e.g. MPA/TCP or
   SCTP). After the LLP sets up the LLP Stream, it will enable a DDP
   Stream on a specific LLP Stream at an appropriate point.

   The ULP is required to enable both endpoints of an LLP Stream for
   DDP data transfer at the same time, in both directions; this is
   necessary so that the Data Sink can properly recognize the DDP
   Segments.

8.2 DDP Stream Teardown

   DDP MUST NOT independently initiate Stream Teardown. DDP either
   responds to a stream being torn down by the LLP or processes a
   request from the ULP to tear down a stream. DDP Stream teardown
   disables DDP capabilities on both endpoints. For connection-oriented
   LLPs, DDP Stream teardown MAY result in underlying LLP Connection
   teardown.

8.2.1 DDP Graceful Teardown

   It is up to the ULP to ensure that DDP teardown happens on both
   endpoints of the DDP Stream at the same time; this is necessary so
   that the Data Sink stops trying to interpret the DDP Segments.

   If the Local Peer ULP indicates graceful teardown, the DDP layer on
   the Local Peer SHOULD ensure that all ULP data would be transferred
   before the underlying LLP Stream & Connection are torn down, and any
   further data transfer requests by the Local Peer ULP MUST return an
   error.

   If the DDP layer on the Local Peer receives a graceful teardown
   request from the LLP, any further data received after the request is
   considered an error and MUST cause the DDP Stream to be abortively
   torn down.

   If the Local Peer LLP supports a half-closed LLP Stream, on the
   receipt of an LLP graceful teardown request of the DDP Stream, DDP
   SHOULD indicate the half-closed state to the ULP, and continue to
   process outbound data transfer requests normally. Following this
   event, when the Local Peer ULP requests graceful teardown, DDP MUST
   indicate to the LLP that it SHOULD perform a graceful close of the
   other half of the LLP Stream.

   If the Local Peer LLP supports a half-closed LLP Stream, on the
   receipt of a ULP graceful half-close teardown request of the DDP
   Stream, DDP SHOULD keep data reception enabled on the other half of
   the LLP stream.

8.2.2 DDP Abortive Teardown

   As previously mentioned, DDP does not independently terminate a DDP
   Stream. Thus any of the following fatal errors on a DDP Stream MUST
   cause DDP to indicate to the ULP that a fatal error has occurred:

   * Underlying LLP Connection or LLP Stream is lost.

   * Underlying LLP reports a catastrophic error.

   * DDP Header has one or more invalid fields.

   If the LLP indicates to the ULP that a fatal error has occurred, the
   DDP layer SHOULD report the error to the ULP (see Section 9.2, DDP
   Error Numbers) and complete all outstanding ULP requests with an
   error. If the underlying LLP Stream is still intact, DDP SHOULD
   continue to allow the ULP to transfer additional DDP Messages on the
   outgoing half connection after the fatal error was indicated to the
   ULP. This enables the ULP to transfer an error syndrome to the
   Remote Peer. After indicating to the ULP a fatal error has occurred,
   the DDP Stream MUST NOT be terminated until the Local Peer ULP
   indicates to the DDP layer that the DDP Stream should be abortively
   torn down.

9 Error Semantics

   All LLP errors reported to DDP SHOULD be passed up to the ULP.

9.1 Errors detected at the Data Sink

   For non-zero length Untagged DDP Segments, the DDP Segment MUST be
   validated before Placement by verifying:

   1. The QN is valid for this stream.

   2. The QN and MSN have an associated buffer that allows Placement
      of the payload.

      Implementers note: DDP implementations SHOULD consider lack of
      an associated buffer as a system fault. DDP implementations MAY
      try to recover from the system fault using LLP means in a
      ULP-transparent way. DDP implementations SHOULD NOT permit
      system faults to occur repeatedly or frequently.

   3. The MO falls in the range of legal offsets associated with the
      Untagged Buffer.

   4. The sum of the DDP Segment payload length and the MO falls in
      the range of legal offsets associated with the Untagged Buffer.

   5. For DDP Messages using Untagged Buffer model, the Message
      Sequence Number falls in the range of legal Message Sequence
      Numbers, for the queue defined by the QN. The legal range is
      defined as being between the MSN value assigned to the first
      available buffer for a specific QN and the MSN value assigned to
      the last available buffer for a specific QN.

      Implementers note: for a typical Queue Number, the lower limit
      of the Message Sequence Number is defined by whatever DDP
      Messages have already been Completed. The upper limit is
      defined by however many message buffers are currently available
      for that queue. Both numbers change dynamically as new DDP
      Messages are received and Completed, and new buffers are added.
      It is up to the ULP to ensure that sufficient buffers are
      available to handle the incoming DDP Segments.

   For non-zero length Tagged DDP Segments, the segment MUST be
   validated before Placement by verifying:

   1. The STag is valid for this stream.

   2. The STag has an associated buffer that allows Placement of the
      payload.

   3. The TO falls in the range of legal offsets registered for the
      STag.

   4. The sum of the DDP Segment payload length and the TO falls in
      the range of legal offsets registered for the STag.

   5. A 64-bit unsigned sum of the DDP Segment payload length and the
      TO does not wrap.

   If the DDP layer detects any of the receive errors listed in this
   section, it MUST cease placing the remainder of the DDP Segment and
   report the error(s) to the ULP. The DDP layer SHOULD include in the
   error report the DDP Header, the type of error, and the length of
   the DDP segment, if available. DDP MUST silently drop any subsequent
   incoming DDP Segments. Since each of these errors represents a
   failure of the sending ULP or protocol, DDP SHOULD enable the ULP to
   send one additional DDP Message before terminating the DDP Stream.

9.2 DDP Error Numbers

   The following error numbers MUST be used when reporting receive
   errors to the ULP. They correspond to the checks enumerated in
   Section 9.1. Each error is subdivided into a 4-bit Error Type and an
   8-bit Error Code.

   Error    Error
   Type     Code        Description
   ----------------------------------------------------------
   0x0      0x00        Local Catastrophic

   0x1                  Tagged Buffer Error
            0x00        Invalid STag
            0x01        Base or bounds violation
            0x02        STag not associated with DDP Stream
            0x03        TO wrap
            0x04        Invalid DDP version

   0x2                  Untagged Buffer Error
            0x01        Invalid QN
            0x02        Invalid MSN - no buffer available
            0x03        Invalid MSN - MSN range is not valid
            0x04        Invalid MO
            0x05        DDP Message too long for available buffer
            0x06        Invalid DDP version

   0x3      Rsvd        Reserved for the use by the LLP
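
   The Tagged Buffer checks of Section 9.1 and the Tagged Buffer error
   numbers above might be combined roughly as in the following Python
   sketch. It is illustrative only. The TaggedBuffer record, the
   stream_id association, the function name, and the order of the
   checks are assumptions, as is the mapping of each check to a
   specific Error Code; the specification lists the checks and the
   codes but does not pair them one to one. The sketch also treats the
   legal offsets as the half-open range [base_to, base_to + length).

```python
from typing import NamedTuple

TAGGED_BUFFER_ERROR = 0x1      # Error Type from Section 9.2
INVALID_STAG = 0x00
BASE_OR_BOUNDS_VIOLATION = 0x01
STAG_NOT_ASSOCIATED = 0x02
TO_WRAP = 0x03


class TaggedBuffer(NamedTuple):
    """Hypothetical registration record for one STag."""
    stream_id: int   # DDP Stream the STag is associated with
    base_to: int     # first legal TO
    length: int      # number of legal octets starting at base_to


def validate_tagged(buffers, stream_id, stag, to, payload_len):
    """Return None if Placement may proceed, else (error_type, error_code)."""
    if payload_len == 0:
        # STag and TO are not checked for a zero-length Tagged segment.
        return None
    buf = buffers.get(stag)
    if buf is None:
        return (TAGGED_BUFFER_ERROR, INVALID_STAG)
    if buf.stream_id != stream_id:
        return (TAGGED_BUFFER_ERROR, STAG_NOT_ASSOCIATED)
    end = to + payload_len
    if end >= 2**64:
        # The 64-bit unsigned sum of TO and payload length would wrap.
        return (TAGGED_BUFFER_ERROR, TO_WRAP)
    if to < buf.base_to or end > buf.base_to + buf.length:
        return (TAGGED_BUFFER_ERROR, BASE_OR_BOUNDS_VIOLATION)
    return None
```

   A receiver built along these lines would stop Placement and report
   the returned pair to the ULP whenever the result is not None.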

10 Security Considerations

   This section discusses both protocol-specific considerations and the
   implications of using DDP with existing security mechanisms. A more
   detailed analysis of the security issues around the implementation
   and the use of the DDP can be found in [RDMASEC].

10.1 Protocol-specific Security Considerations

   The vulnerabilities of DDP to active third-party interference are no
   greater than those of any other protocol running over TCP. A third
   party, by injecting spoofed packets into the network that are
   Delivered to a DDP Data Sink, could launch a variety of attacks that
   exploit DDP-specific behavior. Since DDP directly or indirectly
   exposes memory addresses on the wire, the Placement information
   carried in each DDP Segment must be validated before any data is
   Placed; this includes checking for an invalid STag and performing an
   octet-granularity base and bounds check. For example, a third-party
   adversary could inject random packets that appear to be valid DDP
   Segments and corrupt the memory on a DDP Data Sink. Since DDP is IP
   transport protocol independent, communication security mechanisms
   such as IPsec [IPSEC] or TLS [TLS] may be used to prevent such
   attacks.

10.2 Using IPsec with DDP

   IPsec can be used to protect against the packet injection attacks
   outlined above. Because IPsec is designed to secure arbitrary IP
   packet streams, including streams where packets are lost, DDP can
   run on top of IPsec without any change. IPsec packets are processed
   (e.g., integrity checked and possibly decrypted) in the order they
   are received, and a DDP Data Sink will process the decrypted DDP
   Segments contained in these packets in the same manner as DDP
   Segments contained in unsecured IP packets.

10.3 Association of an STag and a DDP Stream

   There are several mechanisms for associating an STag and a DDP
   Stream.
   Two reasonable mechanisms for this association are a Protection
   Domain (PD) association and a DDP Stream association.

   Under the Protection Domain (PD) association, a unique Protection
   Domain Identifier (PD ID) is created and used locally to associate
   an STag with a set of DDP Streams. Under this mechanism, the use of
   the STag is only permitted on the DDP Streams that have the same PD
   ID as the STag. For an incoming DDP Segment of a Tagged DDP Message
   on a DDP Stream, if the PD ID of the DDP Stream is not the same as
   the PD ID of the STag targeted by the Tagged DDP Message, then the
   DDP Segment is not placed and the DDP layer MUST surface a local
   error to the ULP. Note that the PD ID is locally defined, and cannot
   be directly manipulated by the Remote Peer.

   Under the DDP Stream association, a DDP Stream is identified locally
   by a unique DDP Stream identifier (ID). An STag is associated with a
   DDP Stream by using a DDP Stream ID. In this case, for an incoming
   DDP Segment of a Tagged DDP Message on a DDP Stream, if the DDP
   Stream ID of the DDP Stream is not the same as the DDP Stream ID of
   the STag targeted by the Tagged DDP Message, then the DDP Segment is
   not placed and the DDP layer MUST surface a local error to the ULP.
   Note that the DDP Stream ID is locally defined, and cannot be
   directly manipulated by the Remote Peer.

   A ULP SHOULD associate an STag and a DDP Stream. DDP MUST support
   Protection Domain association and DDP Stream association mechanisms
   for associating an STag and a DDP Stream.

10.4 Other Security Considerations

   DDP has several mechanisms that deal with a number of attacks.
   These attacks include, but are not limited to:

   1. Connection to/from an unauthorized or unauthenticated endpoint.
   2. Hijacking of a DDP Stream.
   3. Attempts to read or write from unauthorized memory regions.
   4. Injection of RDMA Messages within a Stream on a multi-user
      operating system by another application.

   DDP relies on the LLP to establish the LLP Stream over which DDP
   Messages will be carried. DDP itself does nothing to authenticate
   the validity of the LLP Stream of either of the endpoints. It is the
   responsibility of the ULP to validate the LLP Stream. This is highly
   desirable due to the nature of DDP.

   Hijacking of a DDP Stream would require that the underlying LLP
   Stream is hijacked. This would require knowledge of Advertised
   buffers in order to directly Place data into a user buffer and is
   therefore constrained by the same techniques mentioned to guard
   against attempts to read or write from unauthorized memory regions.

   DDP does not require a node to open its buffers to arbitrary attacks
   over the DDP Stream. It may access ULP memory only to the extent
   that the ULP has enabled and authorized it to do so. The STag
   access control model is defined by a (forthcoming) document.
   Specific security operations include:

   1. STags are only valid over the exact byte range established by the
      ULP. DDP MUST provide a mechanism for the ULP to establish and
      revoke the TO range associated with the ULP Buffer referenced by
      the STag.
   2. STags are only valid for the duration established by the ULP. The
      ULP may revoke them at any time, in accordance with its own upper
      layer protocol requirements. DDP MUST provide a mechanism for the
      ULP to establish and revoke STag validity.
   3. DDP MUST provide a mechanism for the ULP to communicate the
      association between an STag and a specific DDP Stream.
   4. A ULP may only expose memory to remote access to the extent that
      it already had access to that memory itself.
   5. If an STag is not valid on a DDP Stream, DDP MUST pass the invalid
      access attempt to the ULP. The ULP may provide a mechanism for
      terminating the DDP Stream.

   Further, DDP provides a mechanism that directly Places incoming
   payloads in user-mode ULP Buffers. This avoids the risks of prior
   solutions that relied upon exposing system buffers for incoming
   payloads.

11 IANA Considerations

   If DDP was enabled a priori for a ULP by connecting to a well-known
   port, this well-known port would be registered for the DDP with
   IANA.

12 References

12.1 Normative References

   [RFC2026] Bradner, S., "The Internet Standards Process -- Revision
       3", BCP 9, RFC 2026, October 1996.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
       Requirement Levels", BCP 14, RFC 2119, March 1997.

   [MPA] P. Culley et al., "Markers with PDU Alignment", RDMA
       Consortium Draft Specification draft-cully-iwarp-mpa-01.doc,
       October 2002.

   [RDMAP] R. Recio et al., "RDMA Protocol Specification", RDMA
       Consortium Draft Specification draft-recio-iwarp-01, October
       2002.

   [SCTP] R. Stewart et al., "Stream Control Transmission Protocol",
       RFC 2960, October 2000.

   [TCP] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
       September 1981.

12.2 Informative References

   [TLS] Dierks, T. and C. Allen, "The TLS Protocol Version 1.0", RFC
       2246, November 1998.

   [IPSEC] Atkinson, R., Kent, S., "Security Architecture for the
       Internet Protocol", RFC 2401, November 1998.

   [RDMASEC] J. Pinkerton et al., "DDP/RDMAP Security",
       draft-pinkerton-rddp-security-00.txt, June 2003.

13 Appendix

13.1 Receive Window sizing

   Reliable, sequenced LLPs include a mechanism to advertise the
   amount of receive buffer space a sender may consume. This is
   generally called a "receive window".

   DDP allows data to be transferred directly to predefined buffers at
   the Data Sink. Accordingly, the LLP receive window size need not be
   affected by the reception of a DDP Segment, if that segment is
   placed before additional segments arrive.

   The LLP implementation SHOULD maintain an advertised receive window
   large enough to enable a reasonable number of segments to be
   outstanding at one time. The amount to advertise depends on the
   desired data rate, and the expected or actual round trip delay
   between endpoints.

   The number of actual buffers maintained to "back up" the receive
   window is left up to the implementation. This number will depend on
   the rate at which DDP Segments can be retired; there may be some
   cases where segment processing cannot keep up with the incoming
   packet rate. If this occurs, one reasonable way to slow the incoming
   packet rate is to reduce the receive window.

   Note that the LLP should take care to comply with the applicable
   RFCs; for instance, for TCP, receivers are highly discouraged from
   "shrinking" the receive window (reducing the right edge of the
   window after it has been advertised).
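
   As a rough illustration of how the advertised window in this
   appendix relates to data rate and round trip delay, the following
   Python sketch computes the number of octets in flight over one round
   trip (the usual bandwidth-delay product). This formula is a common
   rule of thumb and an assumption here, not something this document
   mandates. The function names and the example figures are
   hypothetical.

```python
def receive_window_octets(rate_bits_per_s, rtt_us):
    """Octets in flight at the given rate over one round trip, rounded up.

    rate_bits_per_s: desired data rate in bits per second.
    rtt_us: expected or measured round trip delay in microseconds.
    """
    bits_times_1e6 = rate_bits_per_s * rtt_us
    # Integer ceiling division by (8 bits per octet * 10^6 us per second).
    return -(-bits_times_1e6 // 8_000_000)


def segments_outstanding(window_octets, segment_octets):
    """Number of segments of the given size needed to fill the window."""
    return -(-window_octets // segment_octets)
```

   For example, under these assumptions a 10 Gbit/s stream with a 100
   microsecond round trip gives a window of 125000 octets, or 84
   segments of 1500 octets.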

14 Author's Addresses

   Hemal Shah
   Intel Corporation
   MS AN1-PTL1
   1501 South Mopac Expressway, #400
   Austin, TX 78746 USA
   Phone: +1 (512) 732-3963
   Email: hemal.shah@intel.com

   James Pinkerton
   Microsoft Corporation
   One Microsoft Way
   Redmond, WA 98052 USA
   Phone: +1 (425) 705-5442
   Email: jpink@microsoft.com

   Renato Recio
   IBM Corporation
   11501 Burnett Road
   Austin, TX 78758 USA
   Phone: +1 (512) 838-1365
   Email: recio@us.ibm.com

   Paul R. Culley
   Hewlett-Packard Company
   20555 SH 249
   Houston, TX 77070-2698 USA
   Phone: +1 (281) 514-5543
   Email: paul.culley@hp.com

15 Acknowledgments

   John Carrier
   Adaptec, Inc.
   691 S. Milpitas Blvd.
   Milpitas, CA 95035 USA
   Phone: +1 (360) 378-8526
   Email: john_carrier@adaptec.com

   Hari Ghadia
   Adaptec, Inc.
   691 S. Milpitas Blvd.,
   Milpitas, CA 95035 USA
   Phone: +1 (408) 957-5608
   Email: hari_ghadia@adaptec.com

   Patricia Thaler
   Agilent Technologies, Inc.
   1101 Creekside Ridge Drive, #100
   M/S-RG10
   Roseville, CA 95678
   Phone: +1-916-788-5662
   email: pat_thaler@agilent.com

   Mike Penna
   Broadcom Corporation
   16215 Alton Parkway
   Irvine, California 92619-7013 USA
   Phone: +1 (949) 926-7149
   Email: MPenna@Broadcom.com

   Uri Elzur
   Broadcom Corporation
   16215 Alton Parkway
   Irvine, California 92619-7013 USA
   Phone: +1 (949) 585-6432
   Email: Uri@Broadcom.com

   Ted Compton
   EMC Corporation
   Research Triangle Park, NC 27709, USA
   Phone: 919-248-6075
   Email: compton_ted@emc.com

   Jim Wendt
   Hewlett-Packard Company
   8000 Foothills Boulevard
   Roseville, CA 95747-5668 USA
   Phone: +1 (916) 785-5198
   Email: jim_wendt@hp.com

   Mike Krause
   Hewlett-Packard Company, 43LN
Expires April 2004 34 1443 19410 Homestead Road 1444 Cupertino, CA 95014 USA 1445 Phone: +1 (408) 447-3191 1446 Email: krause@cup.hp.com 1448 Dave Minturn 1449 Intel Corporation 1450 MS JF1-210 1451 5200 North East Elam Young Parkway 1452 Hillsboro, OR 97124 USA 1453 Phone: +1 (503) 712-4106 1454 Email: dave.b.minturn@intel.com 1456 Howard C. Herbert 1457 Intel Corporation 1458 MS CH7-404 1459 5000 West Chandler Blvd. 1460 Chandler, AZ 85226 USA 1461 Phone: +1 (480) 554-3116 1462 Email: howard.c.herbert@intel.com 1464 Tom Talpey 1465 Network Appliance 1466 375 Totten Pond Road 1467 Waltham, MA 02451 USA 1468 Phone: +1 (781) 768-5329 1469 EMail: thomas.talpey@netapp.com 1471 Dwight Barron 1472 Hewlett-Packard Company 1473 20555 SH 249 1474 Houston, TX 77070-2698 USA 1475 Phone: +1 (281) 514-2769 1476 Email: Dwight.Barron@Hp.com 1478 Dave Garcia 1479 Hewlett-Packard Company 1480 19333 Vallco Parkway 1481 Cupertino, Ca. 95014 USA 1482 Phone: +1 (408) 285-6116 1483 Email: dave.garcia@hp.com 1485 Jeff Hilland 1486 Hewlett-Packard Company 1487 20555 SH 249 1488 Houston, Tx. 77070-2698 USA 1489 Phone: +1 (281) 514-9489 1490 Email: jeff.hilland@hp.com 1492 shah, et. al. Expires April 2004 35 1493 16 Full Copyright Statement 1495 This document and the information contained herein is provided on an 1496 "AS IS" basis and ADAPTEC INC., AGILENT TECHNOLOGIES INC., BROADCOM 1497 CORPORATION, CISCO SYSTEMS INC., EMC CORPORATION, HEWLETT-PACKARD 1498 COMPANY, INTERNATIONAL BUSINESS MACHINES CORPORATION, INTEL 1499 CORPORATION, MICROSOFT CORPORATION, NETWORK APPLIANCE INC., THE 1500 INTERNET SOCIETY, AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM 1501 ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY 1502 WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE 1503 ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS 1504 FOR A PARTICULAR PURPOSE. 

   Copyright (c) 2002 ADAPTEC INC., BROADCOM CORPORATION, CISCO SYSTEMS
   INC., EMC CORPORATION, HEWLETT-PACKARD COMPANY, INTERNATIONAL
   BUSINESS MACHINES CORPORATION, INTEL CORPORATION, MICROSOFT
   CORPORATION, NETWORK APPLIANCE INC., All Rights Reserved.