Remote Direct Data Placement Work Group                    Hemal Shah
INTERNET-DRAFT                                      Intel Corporation
Category: Standards Track                             James Pinkerton
draft-ietf-rddp-ddp-04.txt                      Microsoft Corporation
                                                         Renato Recio
                                                      IBM Corporation
                                                          Paul Culley
                                              Hewlett-Packard Company

Expires: August, 2005                                   February, 2005

             Direct Data Placement over Reliable Transports

Status of this Memo

   By submitting this Internet-Draft, I certify that any applicable
   patent or other IPR claims of which I am aware have been disclosed,
   and any of which I become aware will be disclosed, in accordance
   with RFC 3668.

   By submitting this Internet-Draft, I accept the provisions of
   Section 4 of RFC 3667.

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html.  The list of Internet-Draft
   Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   The Direct Data Placement protocol provides information to Place the
   incoming data directly into an upper layer protocol's receive buffer
   without intermediate buffers.  This removes excess CPU and memory
   utilization associated with transferring data through the
   intermediate buffers.

Shah, et. al.                Expires August 2005                     1
Table of Contents

   Status of this Memo
   Abstract
   1     Introduction
   1.1   Architectural Goals
   1.2   Protocol Overview
   1.3   DDP Layering
   2     Glossary
   2.1   General
   2.2   LLP
   2.3   Direct Data Placement (DDP)
   3     Reliable Delivery LLP Requirements
   4     Header Format
   4.1   DDP Control Field
   4.2   DDP Tagged Buffer Model Header
   4.3   DDP Untagged Buffer Model Header
   4.4   DDP Segment Format
   5     Data Transfer
   5.1   DDP Tagged or Untagged Buffer Models
   5.1.1 Tagged Buffer Model
   5.1.2 Untagged Buffer Model
   5.2   Segmentation and Reassembly of a DDP Message
   5.3   Ordering Among DDP Messages
   5.4   DDP Message Completion & Delivery
   6     DDP Stream Setup & Teardown
   6.1   DDP Stream Setup
   6.2   DDP Stream Teardown
   6.2.1 DDP Graceful Teardown
   6.2.2 DDP Abortive Teardown
   7     Error Semantics
   7.1   Errors detected at the Data Sink
   7.2   DDP Error Numbers
   8     Security Considerations
   8.1   Protocol-specific Security Considerations
   8.2   Association of an STag and a DDP Stream
   8.3   Security Requirements
   8.3.1 RNIC Requirements
   8.3.2 Privileged Resources Manager Requirement
   8.4   Security Services for DDP
   9     IANA Considerations
   10    References
   10.1  Normative References
   10.2  Informative References
   11    Appendix
   11.1  Receive Window sizing
   12    Author's Addresses
   13    Acknowledgments
   14    Full Copyright Statement

Table of Figures

   Figure 1  DDP Layering
   Figure 2  MPA, DDP, and RDMAP Header Alignment
   Figure 3  DDP Control Field
   Figure 4  Tagged Buffer DDP Header
   Figure 5  Untagged Buffer DDP Header
   Figure 6  DDP Segment Format
1 Introduction

   Direct Data Placement Protocol (DDP) enables an Upper Layer Protocol
   (ULP) to send data to a Data Sink without requiring the Data Sink to
   Place the data in an intermediate buffer; thus, when the data
   arrives at the Data Sink, the network interface can Place the data
   directly into the ULP's buffer.  This can enable the Data Sink to
   consume substantially less memory bandwidth than a buffered model
   because the Data Sink is not required to move the data from the
   intermediate buffer to the final destination.  Additionally, this
   can enable the network protocol to consume substantially fewer CPU
   cycles than if the CPU were used to move the data, and it removes
   the bandwidth limitation of only being able to move data as fast as
   the CPU can copy the data.

   DDP preserves ULP record boundaries (messages) while providing a
   variety of data transfer mechanisms and completion mechanisms to be
   used to transfer ULP messages.

1.1 Architectural Goals

   DDP has been designed with the following high-level architectural
   goals:

   * Provide a buffer model that enables the Local Peer to Advertise
     a named buffer (i.e., a Tag for a buffer) to the Remote Peer,
     such that across the network the Remote Peer can Place data
     into the buffer at Remote Peer specified locations.  This is
     referred to as the Tagged Buffer Model.

   * Provide a second receive buffer model which preserves ULP
     message boundaries from the Remote Peer and keeps the Local
     Peer's buffers anonymous (i.e., Untagged).  This is referred to
     as the Untagged Buffer Model.

   * Provide reliable, in-order Delivery semantics for both Tagged
     and Untagged Buffer Models.

   * Provide segmentation and reassembly of ULP messages.

   * Enable the ULP buffer to be used as a reassembly buffer,
     without a need for a copy, even if incoming DDP Segments arrive
     out of order.
     This requires the protocol to separate Data Placement of ULP
     Payload contained in an incoming DDP Segment from Data Delivery
     of completed ULP Messages.

   * If the LLP supports multiple LLP streams within an LLP
     Connection, provide the above capabilities independently on
     each LLP stream and enable the capability to be exported to the
     ULP on a per-LLP-stream basis.

1.2 Protocol Overview

   DDP supports two basic data transfer models: a Tagged Buffer data
   transfer model and an Untagged Buffer data transfer model.

   The Tagged Buffer data transfer model requires the Data Sink to send
   the Data Source an identifier for the ULP buffer, referred to as a
   Steering Tag (STag).  The STag is transferred to the Data Source
   using a ULP-defined method.  Once the Data Source ULP has an STag
   for a destination ULP buffer, it can request that DDP send the ULP
   data to the destination ULP buffer by specifying the STag to DDP.
   Note that the Tagged Buffer does not have to be filled starting at
   the beginning of the ULP buffer; the ULP Data Source can provide an
   arbitrary offset into the ULP buffer.

   The Untagged Buffer data transfer model enables data transfer to
   occur without requiring the Data Sink to Advertise a ULP Buffer to
   the Data Source.  The Data Sink can queue up a series of receive ULP
   buffers.  An Untagged DDP Message from the Data Source consumes an
   Untagged Buffer at the Data Sink.  Because DDP is message oriented,
   even if the Data Source sends a DDP Message payload smaller than the
   receive ULP buffer, the partially filled receive ULP buffer is
   Delivered to the ULP anyway.  If the Data Source sends a DDP Message
   payload larger than the receive ULP buffer, it results in an error.
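   The Untagged Buffer consumption rules above can be illustrated with
   a small model.  This is a minimal, non-normative Python sketch.  The
   class and method names are hypothetical, and only buffer capacities
   are tracked.  Treating "no buffer posted" as an error is an
   assumption of this sketch; the paragraph above does not address it.

```python
from collections import deque


class UntaggedBufferError(Exception):
    """Raised when an Untagged DDP Message cannot be Placed (sketch only)."""


class UntaggedReceiveQueue:
    """Hypothetical Data Sink queue of posted Untagged ULP Buffers.

    Only buffer capacities are modelled; the ULP data itself is not stored.
    """

    def __init__(self):
        self._posted = deque()  # buffer capacities in octets, oldest first

    def post_buffer(self, capacity):
        # Receiver-based buffer management: the Data Sink ULP chooses the
        # order in which its Untagged Buffers will be consumed.
        self._posted.append(capacity)

    def on_message(self, payload):
        """Consume the next posted buffer for one Untagged DDP Message.

        Returns (buffer_capacity, octets_delivered).
        """
        if not self._posted:
            # Assumption of this sketch: no posted buffer is an error.
            raise UntaggedBufferError("no Untagged Buffer posted")
        capacity = self._posted.popleft()
        if len(payload) > capacity:
            # A payload larger than the receive buffer is an error.
            raise UntaggedBufferError(
                f"{len(payload)}-octet payload exceeds {capacity}-octet buffer")
        # A shorter payload still consumes the whole buffer, which is
        # Delivered to the ULP partially filled.
        return capacity, len(payload)


if __name__ == "__main__":
    rq = UntaggedReceiveQueue()
    rq.post_buffer(4096)
    rq.post_buffer(16)
    print(rq.on_message(b"hello"))  # (4096, 5)
    try:
        rq.on_message(b"x" * 32)
    except UntaggedBufferError as err:
        print("error:", err)  # error: 32-octet payload exceeds 16-octet buffer
```

   The FIFO queue reflects receiver-based buffer management.  The Data
   Source never names a buffer; it simply consumes the next one the Data
   Sink posted.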
   There are several key differences between the Tagged and Untagged
   Buffer Models:

   * For the Tagged Buffer Model, the Data Source specifies which
     received Tagged Buffer will be used for a specific Tagged DDP
     Message (sender-based ULP buffer management).  For the Untagged
     Buffer Model, the Data Sink specifies the order in which
     Untagged Buffers will be consumed as Untagged DDP Messages are
     received (receiver-based ULP buffer management).

   * For the Tagged Buffer Model, the ULP at the Data Sink must
     Advertise the ULP buffer to the Data Source through a
     ULP-specific mechanism before data transfer can occur.  For the
     Untagged Buffer Model, data transfer can occur without an
     end-to-end explicit ULP buffer Advertisement.  Note, however,
     that the ULP needs to address flow control issues.

   * For the Tagged Buffer Model, a DDP Message can start at an
     arbitrary offset within the Tagged Buffer.  For the Untagged
     Buffer Model, a DDP Message can only start at offset 0.

   * The Tagged Buffer Model allows multiple DDP Messages targeted
     to a Tagged Buffer with a single ULP buffer Advertisement.  The
     Untagged Buffer Model requires associating a receive ULP buffer
     with each DDP Message targeted to an Untagged Buffer.

   Either data transfer model Places a ULP Message into a DDP Message.
   Each DDP Message is then sliced into DDP Segments that are intended
   to fit within a Lower Layer Protocol's (LLP) Maximum Upper Layer
   Protocol Data Unit (MULPDU).  Thus the ULP can post arbitrary-size
   ULP Messages, containing up to 2^32 - 1 octets of ULP Payload, and
   DDP slices the ULP Message into DDP Segments which are reassembled
   transparently at the Data Sink.

   DDP provides in-order Delivery for the ULP.  However, DDP
   differentiates between Data Delivery and Data Placement.
   DDP provides enough information in each DDP Segment to allow the ULP
   Payload in each inbound DDP Segment to be directly Placed into the
   correct ULP Buffer, even when the DDP Segments arrive out of order.
   Thus, DDP enables the reassembly of ULP Payload contained in DDP
   Segments of a DDP Message into a ULP Message to occur within the
   ULP Buffer, eliminating the traditional copy out of the reassembly
   buffer into the ULP Buffer.

   A DDP Message's payload is Delivered to the ULP when:

   * all DDP Segments of a DDP Message have been completely received
     and the payload of the DDP Message has been Placed into the
     associated ULP Buffer,

   * all prior DDP Messages have been Placed, and

   * all prior DDP Message Deliveries have been performed.

   The LLP under DDP may support a single LLP stream of data per
   connection (e.g., TCP) or multiple LLP streams of data per
   connection (e.g., SCTP).  In either case, DDP is specified such that
   each DDP Stream is independent and maps to a single LLP stream.
   Within a specific DDP Stream, the LLP Stream is required to provide
   in-order, reliable Delivery.  Note that DDP has no ordering
   guarantees between DDP Streams.

   A DDP protocol could potentially run over reliable Delivery LLPs or
   unreliable Delivery LLPs.  This specification requires reliable,
   in-order Delivery LLPs.

1.3 DDP Layering

   DDP is intended to be LLP-independent, subject to the requirements
   defined in Section 3.  However, DDP was specifically defined to be
   part of a family of protocols that were created to work well
   together, as shown in Figure 1.  For the definitions of each LLP,
   see [MPA], [TCP], and [SCTP].

   DDP enables Direct Data Placement capability for any ULP, but it has
   been specifically designed to work well with RDMAP (see [RDMAP]),
   and it is part of the iWARP protocol suite.
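   The separation between Placement and Delivery described in Section
   1.2 can be illustrated with a small bookkeeping model.  This is a
   non-normative Python sketch.  The names `segment` and
   `DeliveryTracker` are hypothetical.  It assumes the following:

   * messages are numbered 0, 1, 2, ... in the sender's order;
   * segment offsets are relative to the start of each message;
   * the per-segment payload budget stands in for the LLP's MULPDU;
   * no segment is passed up twice or overlaps another.

   Segments may be Placed in any order, but a message is Delivered only
   after it and every earlier message are fully Placed and every earlier
   message has been Delivered.

```python
def segment(payload, max_payload):
    """Slice one ULP Message into (offset, chunk, last) DDP Segments.

    max_payload is a hypothetical per-segment ULP Payload budget, e.g. the
    LLP's MULPDU minus header overhead. A zero-length message still yields
    one segment, which carries the Last flag.
    """
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    segments = []
    for off in range(0, max(len(payload), 1), max_payload):
        chunk = payload[off:off + max_payload]
        segments.append((off, chunk, off + max_payload >= len(payload)))
    return segments


class DeliveryTracker:
    """Out-of-order Placement, in-order Delivery (bookkeeping only)."""

    def __init__(self):
        self._placed = {}     # message number -> octets Placed so far
        self._length = {}     # message number -> length, known once Last arrives
        self._next = 0        # lowest message number not yet Delivered
        self.delivered = []   # message numbers in Delivery order

    def place(self, msg, offset, chunk, last):
        """Record Placement of one segment; return messages newly Delivered."""
        # Copying chunk into the ULP Buffer at `offset` would happen here,
        # in whatever order segments arrive; only the counts are kept.
        self._placed[msg] = self._placed.get(msg, 0) + len(chunk)
        if last:
            # The segment carrying the Last flag ends the message.
            self._length[msg] = offset + len(chunk)
        return self._deliver_ready()

    def _fully_placed(self, msg):
        return msg in self._length and self._placed.get(msg, 0) == self._length[msg]

    def _deliver_ready(self):
        newly = []
        # Deliver strictly in order, starting from the oldest undelivered
        # message, and stop at the first one not yet fully Placed.
        while self._fully_placed(self._next):
            newly.append(self._next)
            self.delivered.append(self._next)
            self._next += 1
        return newly


if __name__ == "__main__":
    tracker = DeliveryTracker()
    # Message 1 arrives first, then message 0's segments in reverse order.
    arrivals = [(1, s) for s in segment(b"y" * 3, 4)]
    arrivals += [(0, s) for s in reversed(segment(b"x" * 10, 4))]
    for msg, (off, chunk, last) in arrivals:
        print(msg, off, tracker.place(msg, off, chunk, last))
    # Only the final arrival Delivers anything: messages 0 and 1 together.
```

   Message 1 is fully Placed long before message 0, yet it is Delivered
   only once message 0 completes.  This mirrors the Delivery conditions
   listed in Section 1.2.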
                     +-------------------+
                     |                   |
                     |     RDMA ULP      |
                     |                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 |                   |
   |       ULP       |       RDMAP       |
   |                 |                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                     |
   |            DDP protocol             |
   |                                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 |                   |
   |       MPA       |                   |
   |                 |                   |
   |                 |                   |
   +-+-+-+-+-+-+-+-+-+       SCTP        |
   |                 |                   |
   |       TCP       |                   |
   |                 |                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                         Figure 1 DDP Layering

   If DDP is layered below RDMAP and on top of MPA and TCP, then the
   respective headers and payload are arranged as follows (Note: For
   clarity, the MPA header and CRC are included, but framing markers
   are not shown.):

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   //                         TCP Header                          //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
   |                                                               |
   //                         DDP Header                          //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   //                        RDMAP Header                         //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   //                      RDMAP ULP Payload                      //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

              Figure 2 MPA, DDP, and RDMAP Header Alignment

2 Glossary

2.1 General

   Advertisement (Advertised, Advertise, Advertisements, Advertises) -
        The act of informing a Remote Peer that a local RDMA Buffer is
        available to it.  A Node makes available an RDMA Buffer for
        incoming RDMA Read or RDMA Write access by informing its
        RDMA/DDP peer of the Tagged Buffer identifiers (STag, base
        address, length).
        This advertisement of Tagged Buffer information is not defined
        by RDMA/DDP and is left to the ULP.  A typical method would be
        for the Local Peer to embed the Tagged Buffer's Steering Tag,
        address, and length in a Send message destined for the Remote
        Peer.

   Data Delivery (Delivery, Delivered, Delivers) - Delivery is defined
        as the process of informing the ULP or consumer that a
        particular message is available for use.  This is specifically
        different from "Placement", which may generally occur in any
        order, while the order of "Delivery" is strictly defined.  See
        "Data Placement".

   Data Sink - The peer receiving a data payload.  Note that the Data
        Sink can be required to both send and receive RDMA/DDP Messages
        to transfer a data payload.

   Data Source - The peer sending a data payload.  Note that the Data
        Source can be required to both send and receive RDMA/DDP
        Messages to transfer a data payload.

   iWARP - A suite of wire protocols composed of RDMAP [RDMAP], DDP
        (this document), and MPA [MPA].  The iWARP protocol suite may
        be layered above TCP, SCTP, or other transport protocols.

   Local Peer - The RDMA/DDP protocol implementation on the local end
        of the connection.  Used to refer to the local entity when
        describing a protocol exchange or other interaction between two
        Nodes.

   Node - A computing device attached to one or more links of a
        network.  A Node in this context does not refer to a specific
        application or protocol instantiation running on the computer.
        A Node may consist of one or more RNICs installed in a host
        computer.

   Remote Peer - The RDMA/DDP protocol implementation on the opposite
        end of the connection.  Used to refer to the remote entity when
        describing protocol exchanges or other interactions between two
        Nodes.

   RNIC - RDMA Enabled Network Interface Controller.
        In this context, this would be a network I/O adapter or
        embedded controller with iWARP functionality.

   ULP - Upper Layer Protocol.  The protocol layer above the protocol
        layer currently being referenced.  The ULP for RDMA/DDP is
        expected to be an OS, application, adaptation layer, or
        proprietary device.  The RDMA/DDP documents do not specify a
        ULP; they provide a set of semantics that allow a ULP to be
        designed to utilize RDMA/DDP.

   ULP Message - The ULP data that is handed to a specific protocol
        layer for transmission.  Data boundaries are preserved as they
        are transmitted through iWARP.

   ULP Payload - The ULP data that is contained within a single
        protocol segment or packet (e.g., a DDP Segment).

2.2 LLP

   LLP - Lower Layer Protocol.  The protocol layer beneath the protocol
        layer currently being referenced.  For example, for DDP the LLP
        is SCTP, MPA, or other transport protocols.  For RDMA, the LLP
        is DDP.

   LLP Connection - Corresponds to an LLP transport-level connection
        between the peer LLP layers on two Nodes.

   LLP Stream - Corresponds to a single LLP transport-level stream
        between the peer LLP layers on two Nodes.  One or more LLP
        Streams may map to a single transport-level LLP Connection.
        For transport protocols that support multiple streams per
        connection (e.g., SCTP), an LLP Stream corresponds to one
        transport-level stream.

   MULPDU - Maximum Upper Layer Protocol Data Unit.  The current
        maximum size of the record that is acceptable for DDP to pass
        to the LLP for transmission.

2.3 Direct Data Placement (DDP)

   DDP Graceful Teardown - The act of closing a DDP Stream such that
        all in-progress and pending DDP Messages are allowed to
        complete successfully.

   DDP Abortive Teardown - The act of closing a DDP Stream without
        attempting to complete in-progress and pending DDP Messages.
   Data Placement (Placement, Placed, Places) - For DDP, this term is
        specifically used to indicate the process of writing to a data
        buffer by a DDP implementation.  DDP Segments carry Placement
        information, which may be used by the receiving DDP
        implementation to perform Data Placement of the DDP Segment ULP
        Payload.  See "Data Delivery" and "Direct Data Placement".

   DDP Control Field - A fixed 8-bit field in the DDP Header.

   DDP Header - The header present in all DDP Segments.  The DDP Header
        contains control and Placement fields that are used to define
        the final Placement location for the ULP Payload carried in a
        DDP Segment.

   DDP Message - A ULP-defined unit of data interchange, which is
        subdivided into one or more DDP Segments.  This segmentation
        may occur for a variety of reasons, including segmentation to
        respect the maximum segment size of the underlying transport
        protocol.

   DDP Segment - The smallest unit of data transfer for the DDP
        protocol.  It includes a DDP Header and ULP Payload (if
        present).  A DDP Segment should be sized to fit within the
        Lower Layer Protocol MULPDU.

   DDP Stream - A sequence of DDP Messages whose ordering is defined by
        the LLP.  For SCTP, a DDP Stream maps directly to an SCTP
        stream.  For MPA, a DDP Stream maps directly to a TCP
        connection and a single DDP Stream is supported.  Note that DDP
        has no ordering guarantees between DDP Streams.

   DDP Stream Identifier (ID) - An identifier for a DDP Stream.

   Direct Data Placement - A mechanism whereby ULP data contained
        within DDP Segments may be Placed directly into its final
        destination in memory without processing of the ULP.  This may
        occur even when the DDP Segments arrive out of order.
        Out-of-order Placement support may require the Data Sink to
        implement the LLP and DDP as one functional block.
   Direct Data Placement Protocol (DDP) - A wire protocol that
        supports Direct Data Placement by associating explicit memory
        buffer placement information with the LLP payload units.

   Message Offset (MO) - For the DDP Untagged Buffer Model, specifies
        the offset, in octets, from the start of a DDP Message.

   Message Sequence Number (MSN) - For the DDP Untagged Buffer Model,
        specifies a sequence number that increases with each DDP
        Message.

   Protection Domain (PD) - A mechanism used to associate a DDP Stream
        and an STag.  Under this mechanism, the use of an STag is valid
        on a DDP Stream if the STag has the same Protection Domain
        Identifier (PD ID) as the DDP Stream.

   Protection Domain Identifier (PD ID) - An identifier for the
        Protection Domain.

   Queue Number (QN) - For the DDP Untagged Buffer Model, identifies a
        destination Data Sink queue for a DDP Segment.

   Steering Tag - An identifier of a Tagged Buffer on a Node, valid as
        defined within a protocol specification.

   STag - Steering Tag.

   Tagged Buffer - A buffer that is explicitly Advertised to the Remote
        Peer through exchange of an STag, Tagged Offset, and length.

   Tagged Buffer Model - A DDP data transfer model used to transfer
        Tagged Buffers from the Local Peer to the Remote Peer.

   Tagged DDP Message - A DDP Message that targets a Tagged Buffer.

   Tagged Offset (TO) - The offset within a Tagged Buffer on a Node.

   ULP Buffer - A buffer owned above the DDP Layer and advertised to
        the DDP Layer either as a Tagged Buffer or an Untagged ULP
        Buffer.

   ULP Message Length - The total length, in octets, of the ULP Payload
        contained in a DDP Message.

   Untagged Buffer - A buffer that is not explicitly Advertised to the
        Remote Peer.

   Untagged Buffer Model - A DDP data transfer model used to transfer
        Untagged Buffers from the Local Peer to the Remote Peer.
   Untagged DDP Message - A DDP Message that targets an Untagged
        Buffer.

3 Reliable Delivery LLP Requirements

   1.  LLPs MUST expose MULPDU and MULPDU changes.  This is required so
       that the DDP layer can perform segmentation aligned with the
       MULPDU and can adapt as MULPDU changes come about.  The corner
       case of how to handle outstanding requests during a MULPDU
       change is covered by the requirements below.

   2.  In the event of a MULPDU change, DDP MUST NOT be required by the
       LLP to re-segment DDP Segments that have been previously posted
       to the LLP.  Note that under pathological conditions the LLP may
       change the advertised MULPDU more frequently than the queue of
       previously posted DDP Segment transmit requests is flushed.
       Under this pathological condition, the LLP transmit queue can
       contain DDP Messages which were posted multiple MULPDU updates
       previously; thus, there may be no correlation between the queued
       DDP Segment(s) and the LLP's current value of MULPDU.

   3.  The LLP MUST ensure that if it accepts a DDP Segment, it will
       transfer it reliably to the receiver or return with an error
       stating that the transfer failed to complete.

   4.  The LLP MUST preserve DDP Segment and Message boundaries at the
       Data Sink.

   5.  The LLP MAY provide the incoming segments out of order for
       Placement, but if it does, it MUST also provide information
       that specifies what the sender-specified order was.

   6.  The LLP MUST provide a strong digest (at least equivalent to
       CRC32-C) to cover at least the DDP Segment.  It is believed that
       some of the existing data integrity digests are not sufficient
       and that direct memory transfer semantics require a stronger
       digest than, for example, a simple checksum.

   7.  On receive, the LLP MUST provide the length of the DDP Segment
       received.
       This ensures that DDP does not have to carry a length field in
       its header.

   8.  If an LLP does not support teardown of an LLP stream independent
       of other LLP streams and a DDP error occurs on a specific DDP
       Stream, then the LLP MUST label the associated LLP stream as an
       erroneous LLP stream and MUST NOT allow any further data
       transfer on that LLP stream after DDP requests the associated
       DDP Stream to be torn down.

   9.  For a specific LLP Stream, the LLP MUST provide a mechanism to
       indicate that the LLP Stream has been gracefully torn down.  For
       a specific LLP Connection, the LLP MUST provide a mechanism to
       indicate that the LLP Connection has been gracefully torn down.
       Note that if the LLP does not allow an LLP Stream to be torn
       down independently of the LLP Connection, the above requirements
       allow the LLP to notify DDP of both events at the same time.

   10. For a specific LLP Connection, when all LLP Streams are either
       gracefully torn down or are labeled as erroneous LLP streams,
       the LLP Connection MUST be torn down.

   11. The LLP MUST NOT pass a duplicate DDP Segment to the DDP Layer
       after it has passed all the previous DDP Segments to the DDP
       Layer and the associated ordering information for the previous
       DDP Segments and the current DDP Segment.

4 Header Format

   DDP has two different header formats: one for Data Placement into
   Tagged Buffers, and the other for Data Placement into Untagged
   Buffers.  See Section 5.1 for a description of the two models.

4.1 DDP Control Field

   The first 8 bits of the DDP Header carry a DDP Control Field that is
   common between the two formats.  It is shown below in Figure 3,
   offset by 16 bits to accommodate the MPA header defined in [MPA].
   The MPA header is only present if DDP is layered on top of MPA.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                   +-+-+-+-+-+-+-+-+
                                   |T|L| Rsvd  | DV|
                                   +-+-+-+-+-+-+-+-+

                      Figure 3 DDP Control Field

   T - Tagged flag: 1 bit.

        Specifies the Tagged or Untagged Buffer Model.  If set to one,
        the ULP Payload carried in this DDP Segment MUST be Placed into
        a Tagged Buffer.

        If set to zero, the ULP Payload carried in this DDP Segment
        MUST be Placed into an Untagged Buffer.

   L - Last flag: 1 bit.

        Specifies whether the DDP Segment is the Last segment of a DDP
        Message.  It MUST be set to one on the last DDP Segment of
        every DDP Message.  It MUST NOT be set to one on any other DDP
        Segment.

        The DDP Segment with the L bit set to 1 MUST be posted to the
        LLP after all other DDP Segments of the associated DDP Message
        have been posted to the LLP.  For an Untagged DDP Message, the
        DDP Segment with the L bit set to 1 MUST carry the highest MO.

        If the Last flag is set to one, the DDP Message payload MUST be
        Delivered to the ULP after:

        * Placement of all DDP Segments of this DDP Message and all
          prior DDP Messages, and

        * Delivery of each prior DDP Message.

        If the Last flag is set to zero, the DDP Segment is an
        intermediate DDP Segment.

   Rsvd - Reserved: 4 bits.

        Reserved for future use by the DDP protocol.  This field MUST
        be set to zero on transmit, and not checked on receive.

   DV - Direct Data Placement Protocol Version: 2 bits.

        The version of the DDP Protocol in use.  This field MUST be set
        to one to indicate the version of the specification described
        in this document.  The value of DV MUST be the same for all the
        DDP Segments transmitted or received on a DDP Stream.

4.2 DDP Tagged Buffer Model Header

   Figure 4 shows the DDP Header format that MUST be used in all DDP
   Segments that target Tagged Buffers.
   It includes the DDP Control Field previously defined in Section 4.1.
   (Note: In Figure 4, the DDP Header is offset by 16 bits to
   accommodate the MPA header defined in [MPA]. The MPA header is only
   present if DDP is layered on top of MPA.)

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                      |T|L| Rsvd  | DV|   RsvdULP     |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                             STag                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                               TO                              +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 4 Tagged Buffer DDP Header

   T is set to one.

   RsvdULP - Reserved for use by the ULP: 8 bits.

      The RsvdULP field is opaque to the DDP protocol and can be
      structured in any way by the ULP. At the Data Source, DDP MUST
      set the RsvdULP field to the value specified by the ULP. It is
      transferred unmodified from the Data Source to the Data Sink.
      At the Data Sink, DDP MUST provide the RsvdULP field to the ULP
      when the DDP Message is delivered. Each DDP Segment within a
      specific DDP Message MUST contain the same value for this
      field. The Data Source MUST ensure that each DDP Segment within
      a specific DDP Message contains the same value for this field.

   STag - Steering Tag: 32 bits.

      The Steering Tag identifies the Data Sink's Tagged Buffer. The
      STag MUST be valid for this DDP Stream. The STag is associated
      with the DDP Stream through a mechanism that is outside the
      scope of the DDP Protocol specification. At the Data Source,
      DDP MUST set the STag field to the value specified by the ULP.
      At the Data Sink, DDP MUST provide the STag field to the ULP
      when the ULP Message is delivered. Each DDP Segment within a
      specific DDP Message MUST contain the same value for this field,
      and that value MUST be the value supplied by the ULP.
      The Data Source MUST ensure that each DDP Segment within a
      specific DDP Message contains the same value for this field.

   TO - Tagged Offset: 64 bits.

      The Tagged Offset specifies the offset, in octets, within the
      Data Sink's Tagged Buffer, where the Placement of ULP Payload
      contained in the DDP Segment starts. A DDP Message MAY start at
      an arbitrary TO within a Tagged Buffer.

4.3 DDP Untagged Buffer Model Header

   Figure 5 shows the DDP Header format that MUST be used in all DDP
   Segments that target Untagged Buffers. It includes the DDP Control
   Field previously defined in Section 4.1. (Note: In Figure 5, the DDP
   Header is offset by 16 bits to accommodate the MPA header defined in
   [MPA]. The MPA header is only present if DDP is layered on top of
   MPA.)

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
                                      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                                      |T|L| Rsvd  | DV| RsvdULP[0:7]  |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                         RsvdULP[8:39]                         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                              QN                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                              MSN                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                              MO                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 5 Untagged Buffer DDP Header

   T is set to zero.

   RsvdULP - Reserved for use by the ULP: 40 bits.

      The RsvdULP field is opaque to the DDP protocol and can be
      structured in any way by the ULP. At the Data Source, DDP MUST
      set the RsvdULP field to the value specified by the ULP. It is
      transferred unmodified from the Data Source to the Data Sink.
      At the Data Sink, DDP MUST provide the RsvdULP field to the ULP
      when the ULP Message is Delivered. Each DDP Segment within a
      specific DDP Message MUST contain the same value for the
      RsvdULP field.
      At the Data Sink, the DDP implementation is NOT REQUIRED to
      verify that the same value is present in the RsvdULP field of
      each DDP Segment within a specific DDP Message and MAY provide
      the value from any one of the received DDP Segments to the ULP
      when the ULP Message is Delivered.

   QN - Queue Number: 32 bits.

      The Queue Number identifies the Data Sink's Untagged Buffer
      queue referenced by this header. Each DDP Segment within a
      specific DDP Message MUST contain the same value for this field,
      and that value MUST be the value supplied by the ULP at the Data
      Source. The Data Source MUST ensure that each DDP Segment within
      a specific DDP Message contains the same value for this field.

   MSN - Message Sequence Number: 32 bits.

      The Message Sequence Number specifies a sequence number that
      MUST be increased by one (modulo 2^32) with each DDP Message
      targeting the specific Queue Number on the DDP Stream
      associated with this DDP Segment. The initial value for MSN
      MUST be one. The MSN value MUST wrap to 0 after a value of
      0xFFFFFFFF. Each DDP Segment within a specific DDP Message MUST
      contain the same value for this field. The Data Source MUST
      ensure that each DDP Segment within a specific DDP Message
      contains the same value for this field.

   MO - Message Offset: 32 bits.

      The Message Offset specifies the offset, in octets, from the
      start of the DDP Message represented by the MSN and Queue
      Number on the DDP Stream associated with this DDP Segment. The
      MO referencing the first octet of the DDP Message MUST be set
      to zero by the DDP layer.

4.4 DDP Segment Format

   Each DDP Segment MUST contain a DDP Header. Each DDP Segment may
   also contain ULP Payload.
   Following is the DDP Segment format:

      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |  DDP  |                                       |
      | Header|          ULP Payload (if any)         |
      |       |                                       |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 6 DDP Segment Format

5 Data Transfer

   DDP supports multi-segment DDP Messages. Each DDP Message is
   composed of one or more DDP Segments. Each DDP Segment contains a
   DDP Header. The DDP Header contains the information required by the
   receiver to Place any ULP Payload included in the DDP Segment.

5.1 DDP Tagged or Untagged Buffer Models

   DDP uses two basic Buffer Models for the Placement of the ULP
   Payload: Tagged Buffer Model and Untagged Buffer Model.

5.1.1 Tagged Buffer Model

   The Tagged Buffer Model is used by the Data Source to transfer a DDP
   Message into a Tagged Buffer at the Data Sink that has been
   previously Advertised to the Data Source. An STag identifies a
   Tagged Buffer. For the Placement of a DDP Message using the Tagged
   Buffer model, the STag is used to identify the buffer, and the TO is
   used to identify the offset within the Tagged Buffer into which the
   ULP Payload is transferred. The protocol used to Advertise the
   Tagged Buffer is outside the scope of this specification (i.e., it
   is ULP specific). A DDP Message can start at an arbitrary TO within
   a Tagged Buffer.

   Additionally, a Tagged Buffer can potentially be written multiple
   times. This might be done for error recovery or because a buffer is
   being re-used after some ULP-specific synchronization mechanism.

5.1.2 Untagged Buffer Model

   The Untagged Buffer Model is used by the Data Source to transfer a
   DDP Message to the Data Sink into a queued buffer.

   The DDP Queue Number is used by the ULP to separate ULP messages
   into different queues of receive buffers.
   For example, if two queues were supported, the ULP could use one
   queue to post buffers handed to it by the application above the
   ULP, and it could use the other queue for buffers which are only
   consumed by ULP-specific control messages. This enables the
   separation of ULP control messages from opaque ULP Payload when
   using Untagged Buffers.

   The DDP Message Sequence Number can be used by the Data Sink to
   identify the specific Untagged Buffer. The protocol used to
   communicate how many buffers have been queued is outside the scope
   of this specification. Similarly, the exact implementation of the
   buffer queue is outside the scope of this specification.

5.2 Segmentation and Reassembly of a DDP Message

   At the Data Source, the DDP layer MUST segment the data contained in
   a ULP Message into a series of DDP Segments, where each DDP Segment
   contains a DDP Header and ULP Payload, and MUST be no larger than
   the MULPDU value advertised by the LLP. The ULP Message Length MUST
   be less than 2^32. At the Data Source, the DDP layer MUST send all
   the data contained in the ULP Message. At the Data Sink, the DDP
   layer MUST Place the ULP Payload contained in all valid incoming DDP
   Segments associated with a DDP Message into the ULP Buffer.

   DDP Message segmentation at the Data Source is accomplished by
   identifying a DDP Message (which corresponds one-to-one with a ULP
   Message) uniquely and then, for each associated DDP Segment of a DDP
   Message, by specifying an octet offset for the portion of the ULP
   Message contained in the DDP Segment.

   For an Untagged DDP Message, the combination of the QN and MSN
   uniquely identifies a DDP Message. The octet offset for each DDP
   Segment of an Untagged DDP Message is the MO field.
   For each DDP Segment of an Untagged DDP Message, the MO MUST be set
   to the octet offset from the first octet in the associated ULP
   Message (which is defined to be zero) to the first octet in the ULP
   Payload contained in the DDP Segment.

   For example, if the ULP Untagged Message was 2048 octets, and the
   MULPDU was 1500 octets, the Data Source would generate two DDP
   Segments, one with MO = 0, containing 1482 octets of ULP Payload,
   and a second with MO = 1482, containing 566 octets of ULP Payload.
   In this example, the amount of ULP Payload for the first DDP Segment
   was calculated as:

      1482 = 1500 (MULPDU) - 18 (for the Untagged DDP Header)

   For a Tagged DDP Message, the STag and TO, combined with the
   in-order delivery characteristics of the LLP, are used to segment
   and reassemble the ULP Message. Because the initial octet offset
   (the TO field) can be non-zero, recovery of the original ULP Message
   boundary cannot be done in the general case without an additional
   ULP Message.

      Implementers Note: One implementation, valid for some ULPs such
      as RDMAP, is to not directly support recovery of the ULP
      Message boundary for a Tagged DDP Message. For example, the ULP
      may wish to have the Local Peer use small buffers at the Data
      Source even when the ULP at the Data Sink has advertised a
      single large Tagged Buffer for this data transfer. In this
      case, the ULP may choose to use the same STag for multiple
      consecutive ULP Messages. Thus a non-zero initial TO and re-use
      of the STag effectively enable the ULP to implement
      segmentation and reassembly due to ULP-specific constraints.
      See [RDMAP] for details of how this is done.

      A different implementation of a ULP could use an Untagged DDP
      Message sent after the Tagged DDP Message which details the
      initial TO for the STag that was used in the Tagged DDP
      Message.
      Finally, another implementation of a ULP could choose to always
      use an initial TO of zero such that no additional message is
      required to convey the initial TO used in a Tagged DDP Message.

   Regardless of whether the ULP chooses to recover the original ULP
   Message boundary at the Data Sink for a Tagged DDP Message, DDP
   supports segmentation and reassembly of the Tagged DDP Message. The
   STag is used to identify the ULP Buffer at the Data Sink and the TO
   is used to identify the octet offset within the ULP Buffer
   referenced by the STag. The ULP at the Data Source MUST specify the
   STag and the initial TO when the ULP Message is handed to DDP.

   For each DDP Segment of a Tagged DDP Message, the TO MUST be set to
   the octet offset from the first octet in the associated ULP Message
   to the first octet in the ULP Payload contained in the DDP Segment,
   plus the TO assigned to the first octet in the associated ULP
   Message.

   For example, if the ULP Tagged Message was 2048 octets with an
   initial TO of 16384, and the MULPDU was 1500 octets, the Data Source
   would generate two DDP Segments, one with TO = 16384, containing the
   first 1486 octets of ULP Payload, and a second with TO = 17870,
   containing 562 octets of ULP Payload. In this example, the amount of
   ULP Payload for the first DDP Segment was calculated as:

      1486 = 1500 (MULPDU) - 14 (for the Tagged DDP Header)

   and the TO of the second DDP Segment is 16384 + 1486 = 17870.

   A zero-length DDP Message is allowed and MUST consume exactly one
   DDP Segment. Only the DDP Control and RsvdULP Fields MUST be valid
   for a zero-length Tagged DDP Segment. The STag and TO fields MUST
   NOT be checked for a zero-length Tagged DDP Message.

   For either Untagged or Tagged DDP Messages, the Data Sink is not
   required to verify that the entire ULP Message has been received.
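   The segmentation rules above can be illustrated with a short,
   non-normative sketch. It assumes the header layouts of Figures 4
   and 5 with every field in network (big-endian) byte order, and it
   omits the MPA header that precedes the DDP Header when DDP runs
   over MPA. The function names (segment_untagged, segment_tagged,
   next_msn) are hypothetical helpers, not part of any DDP API.
   Running the module reproduces both worked examples in this section.

```python
import struct

# Non-normative sketch of DDP segmentation at the Data Source.
# Assumptions: big-endian fields, no MPA header, hypothetical names.

DDP_VERSION = 1        # DV value for the version described here
TAGGED_HDR_LEN = 14    # Control(1) + RsvdULP(1) + STag(4) + TO(8)
UNTAGGED_HDR_LEN = 18  # Control(1) + RsvdULP(5) + QN(4) + MSN(4) + MO(4)


def control_byte(tagged: bool, last: bool) -> int:
    """T is the most significant bit, then L, four zero Rsvd bits, DV."""
    return (int(tagged) << 7) | (int(last) << 6) | DDP_VERSION


def next_msn(msn: int) -> int:
    """MSN grows by one per DDP Message and wraps 0xFFFFFFFF -> 0."""
    return (msn + 1) & 0xFFFFFFFF


def _split(msg: bytes, room: int):
    """Yield (octet offset, payload, last) for each DDP Segment."""
    if room <= 0:
        raise ValueError("MULPDU too small to carry the DDP Header")
    if len(msg) >= 2 ** 32:
        raise ValueError("ULP Message Length MUST be less than 2^32")
    offset = 0
    while True:
        payload = msg[offset:offset + room]
        # A zero-length message still produces exactly one segment.
        last = offset + len(payload) >= len(msg)
        yield offset, payload, last
        if last:
            return
        offset += len(payload)


def segment_untagged(msg, mulpdu, qn, msn, rsvd_ulp=0):
    """MO is the octet offset of the payload within the ULP Message."""
    segments = []
    for mo, payload, last in _split(msg, mulpdu - UNTAGGED_HDR_LEN):
        header = (bytes([control_byte(False, last)])
                  + rsvd_ulp.to_bytes(5, "big")      # 40-bit RsvdULP
                  + struct.pack("!III", qn, msn, mo))
        segments.append(header + payload)
    return segments


def segment_tagged(msg, mulpdu, stag, initial_to, rsvd_ulp=0):
    """TO is the initial TO plus the payload's offset in the message."""
    segments = []
    for offset, payload, last in _split(msg, mulpdu - TAGGED_HDR_LEN):
        header = struct.pack("!BBIQ", control_byte(True, last), rsvd_ulp,
                             stag, initial_to + offset)
        segments.append(header + payload)
    return segments


if __name__ == "__main__":
    untagged = segment_untagged(bytes(2048), mulpdu=1500, qn=0, msn=1)
    # MO occupies the last four octets of the 18-octet Untagged header.
    print([(struct.unpack("!I", s[14:18])[0], len(s) - UNTAGGED_HDR_LEN)
           for s in untagged])   # [(0, 1482), (1482, 566)]
    tagged = segment_tagged(bytes(2048), mulpdu=1500, stag=0x1234,
                            initial_to=16384)
    # TO occupies octets 6..13 of the 14-octet Tagged header.
    print([(struct.unpack("!Q", s[6:14])[0], len(s) - TAGGED_HDR_LEN)
           for s in tagged])     # [(16384, 1486), (17870, 562)]
```

   Both models share one splitting helper. Untagged and Tagged
   segmentation therefore differ only in how the octet offset is
   carried: directly as MO, or shifted by the initial TO.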
5.3 Ordering Among DDP Messages

   Messages passed through DDP MUST conform to the ordering rules
   defined in this section.

   At the Data Source, DDP:

   *  MUST transmit DDP Messages in the order they were submitted to
      the DDP layer,

   *  SHOULD transmit DDP Segments within a DDP Message in increasing
      MO order for Untagged DDP Messages and in increasing TO order
      for Tagged DDP Messages.

   At the Data Sink, DDP (the following rules are motivated by LLP
   implementations that separate Placement and Delivery):

   *  MAY perform Placement of DDP Segments out of order,

   *  MAY perform Placement of a DDP Segment more than once,

   *  MUST Deliver a DDP Message to the ULP at most once,

   *  MUST Deliver DDP Messages to the ULP in the order they were
      sent by the Data Source.

5.4 DDP Message Completion & Delivery

   At the Data Source, DDP Message transfer is considered completed
   when the reliable, in-order transport LLP has indicated that the
   transfer will occur reliably. Note that this in no way restricts the
   LLP from buffering the data at either the Data Source or Data Sink.
   Thus at the Data Source, completion of a DDP Message does not
   necessarily mean that the Data Sink has received the message.

   At the Data Sink, DDP MUST Deliver a DDP Message if and only if all
   of the following are true:

   *  the last DDP Segment of the DDP Message had its Last flag set,

   *  all of the DDP Segments of the DDP Message have been Placed,

   *  all preceding DDP Messages have been Placed, and

   *  each preceding DDP Message has been Delivered to the ULP.

   At the Data Sink, DDP MUST provide the ULP Message Length to the ULP
   when an Untagged DDP Message is Delivered. The ULP Message Length
   may be calculated by adding the MO and the ULP Payload length in the
   last DDP Segment (with the Last flag set) of an Untagged DDP
   Message.
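   The Delivery conditions above, together with the Data Sink ordering
   rules of Section 5.3, can be sketched for Untagged DDP Messages as
   follows. This is a minimal, non-normative sketch. It makes two
   hypothetical assumptions. First, the Data Sink keeps DDP Messages
   in a list in the order they were sent. Second, "all DDP Segments
   Placed" is detected by checking that the Placed MO ranges cover
   the message without gaps. The class and function names are
   illustrative only.

```python
from typing import List, Optional, Tuple


class UntaggedMessage:
    """Hypothetical Data Sink bookkeeping for one Untagged DDP Message."""

    def __init__(self) -> None:
        self.placed: List[Tuple[int, int]] = []  # (MO, payload length)
        self.length: Optional[int] = None        # set once L=1 is Placed

    def place(self, mo: int, payload_len: int, last: bool) -> None:
        # Placement may happen out of order and more than once.
        self.placed.append((mo, payload_len))
        if last:
            # ULP Message Length = MO + payload length of last segment.
            self.length = mo + payload_len

    def fully_placed(self) -> bool:
        if self.length is None:
            return False          # segment with the Last flag not yet seen
        covered = 0
        for mo, payload_len in sorted(self.placed):
            if mo > covered:
                return False      # a gap: some segment is not yet Placed
            covered = max(covered, mo + payload_len)
        return covered >= self.length


def deliverable(messages: List[UntaggedMessage], delivered: int) -> List[int]:
    """Indices that may be Delivered now, strictly in sending order.

    `delivered` counts messages already Delivered, so no message is
    Delivered twice and none is Delivered before its predecessors.
    """
    ready = []
    i = delivered
    while i < len(messages) and messages[i].fully_placed():
        ready.append(i)
        i += 1
    return ready


if __name__ == "__main__":
    first, second = UntaggedMessage(), UntaggedMessage()
    first.place(1482, 566, last=True)   # second segment arrives first
    second.place(0, 10, last=True)
    print(deliverable([first, second], 0))   # []
    first.place(0, 1482, last=False)
    print(deliverable([first, second], 0))   # [0, 1]
    print(first.length)                      # 2048
```

   In the example, the second message is fully Placed early but is
   held back. It is Delivered only after the first message's missing
   segment has been Placed.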
   At the Data Sink, DDP MUST provide the RsvdULP Field of the DDP
   Message to the ULP when the DDP Message is delivered.

6 DDP Stream Setup & Teardown

   This section describes LLP-independent issues related to DDP Stream
   setup and teardown.

6.1 DDP Stream Setup

   It is expected that the ULP will use a mechanism outside the scope
   of this specification to establish an LLP Connection, and that the
   LLP Connection will support one or more LLP Streams (e.g., MPA/TCP
   or SCTP). After the LLP sets up the LLP Stream, the ULP will enable
   a DDP Stream on a specific LLP Stream at an appropriate point.

   The ULP is required to enable both endpoints of an LLP Stream for
   DDP data transfer at the same time, in both directions; this is
   necessary so that the Data Sink can properly recognize the DDP
   Segments.

6.2 DDP Stream Teardown

   DDP MUST NOT independently initiate Stream Teardown. DDP either
   responds to a stream being torn down by the LLP or processes a
   request from the ULP to tear down a stream. DDP Stream teardown
   disables DDP capabilities on both endpoints. For connection-oriented
   LLPs, DDP Stream teardown MAY result in underlying LLP Connection
   teardown.

6.2.1 DDP Graceful Teardown

   It is up to the ULP to ensure that DDP teardown happens on both
   endpoints of the DDP Stream at the same time; this is necessary so
   that the Data Sink stops trying to interpret the DDP Segments.

   If the Local Peer ULP indicates graceful teardown, the DDP layer on
   the Local Peer SHOULD ensure that all ULP data is transferred before
   the underlying LLP Stream and Connection are torn down, and any
   further data transfer requests by the Local Peer ULP MUST return an
   error.
   If the DDP layer on the Local Peer receives a graceful teardown
   request from the LLP, any further data received after the request is
   considered an error and MUST cause the DDP Stream to be abortively
   torn down.

   If the Local Peer LLP supports a half-closed LLP Stream, on the
   receipt of an LLP graceful teardown request of the DDP Stream, DDP
   SHOULD indicate the half-closed state to the ULP, and continue to
   process outbound data transfer requests normally. Following this
   event, when the Local Peer ULP requests graceful teardown, DDP MUST
   indicate to the LLP that it SHOULD perform a graceful close of the
   other half of the LLP Stream.

   If the Local Peer LLP supports a half-closed LLP Stream, on the
   receipt of a ULP graceful half-close teardown request of the DDP
   Stream, DDP SHOULD keep data reception enabled on the other half of
   the LLP Stream.

6.2.2 DDP Abortive Teardown

   As previously mentioned, DDP does not independently terminate a DDP
   Stream. Thus any of the following fatal errors on a DDP Stream MUST
   cause DDP to indicate to the ULP that a fatal error has occurred:

   *  Underlying LLP Connection or LLP Stream is lost.

   *  Underlying LLP reports a catastrophic error.

   *  DDP Header has one or more invalid fields.

   If the LLP indicates to the ULP that a fatal error has occurred, the
   DDP layer SHOULD report the error to the ULP (see Section 7.2, DDP
   Error Numbers) and complete all outstanding ULP requests with an
   error. If the underlying LLP Stream is still intact, DDP SHOULD
   continue to allow the ULP to transfer additional DDP Messages on the
   outgoing half connection after the fatal error was indicated to the
   ULP. This enables the ULP to transfer an error syndrome to the
   Remote Peer.
   After DDP indicates to the ULP that a fatal error has occurred, the
   DDP Stream MUST NOT be terminated until the Local Peer ULP indicates
   to the DDP layer that the DDP Stream should be abortively torn down.

7 Error Semantics

   All LLP errors reported to DDP SHOULD be passed up to the ULP.

7.1 Errors detected at the Data Sink

   For non-zero length Untagged DDP Segments, the DDP Segment MUST be
   validated before Placement by verifying:

   1.  The QN is valid for this stream.

   2.  The QN and MSN have an associated buffer that allows Placement
       of the payload.

       Implementers note: DDP implementations SHOULD consider lack of
       an associated buffer as a system fault. DDP implementations MAY
       try to recover from the system fault using LLP means in a
       ULP-transparent way. DDP implementations SHOULD NOT permit
       system faults to occur repeatedly or frequently. If there is
       not an associated buffer, DDP implementations MAY choose to
       disable the stream for reception and report an error to the ULP
       at the Data Sink.

   3.  The MO falls in the range of legal offsets associated with the
       Untagged Buffer.

   4.  The sum of the DDP Segment payload length and the MO falls in
       the range of legal offsets associated with the Untagged Buffer.

   5.  The Message Sequence Number falls in the range of legal Message
       Sequence Numbers for the queue defined by the QN. The legal
       range is defined as being between the MSN value assigned to the
       first available buffer for a specific QN and the MSN value
       assigned to the last available buffer for a specific QN.

       Implementers note: for a typical Queue Number, the lower limit
       of the Message Sequence Number is defined by whatever DDP
       Messages have already been Completed. The upper limit is
       defined by however many message buffers are currently available
       for that queue.
       Both numbers change dynamically as new DDP Messages are
       received and Completed, and new buffers are added. It is up to
       the ULP to ensure that sufficient buffers are available to
       handle the incoming DDP Segments.

   For non-zero length Tagged DDP Segments, the segment MUST be
   validated before Placement by verifying:

   1.  The STag is valid for this stream.

   2.  The STag has an associated buffer that allows Placement of the
       payload.

   3.  The TO falls in the range of legal offsets registered for the
       STag.

   4.  The sum of the DDP Segment payload length and the TO falls in
       the range of legal offsets registered for the STag.

   5.  A 64-bit unsigned sum of the DDP Segment payload length and the
       TO does not wrap.

   If the DDP layer detects any of the receive errors listed in this
   section, it MUST cease placing the remainder of the DDP Segment and
   report the error(s) to the ULP. The DDP layer SHOULD include in the
   error report the DDP Header, the type of error, and the length of
   the DDP Segment, if available. DDP MUST silently drop any subsequent
   incoming DDP Segments. Since each of these errors represents a
   failure of the sending ULP or protocol, DDP SHOULD enable the ULP to
   send one additional DDP Message before terminating the DDP Stream.

7.2 DDP Error Numbers

   The following error numbers MUST be used when reporting errors to
   the ULP. They correspond to the checks enumerated in Section 7.1.
   Each error is subdivided into a 4-bit Error Type and an 8-bit Error
   Code.
      Error   Error
      Type    Code    Description
      ----------------------------------------------------------
      0x0     0x00    Local Catastrophic

      0x1             Tagged Buffer Error
              0x00    Invalid STag
              0x01    Base or bounds violation
              0x02    STag not associated with DDP Stream
              0x03    TO wrap
              0x04    Invalid DDP version

      0x2             Untagged Buffer Error
              0x01    Invalid QN
              0x02    Invalid MSN - no buffer available
              0x03    Invalid MSN - MSN range is not valid
              0x04    Invalid MO
              0x05    DDP Message too long for available buffer
              0x06    Invalid DDP version

      0x3     Rsvd    Reserved for use by the LLP

8 Security Considerations

   This section discusses both protocol-specific considerations and the
   implications of using DDP with existing security mechanisms. The
   security requirements for the DDP implementation are provided at the
   end of the section. A more detailed analysis of the security issues
   around the implementation and use of DDP can be found in
   [RDMASEC].

8.1 Protocol-specific Security Considerations

   The vulnerabilities of DDP to active third-party interference are no
   greater than those of any other protocol running over TCP. A third
   party, by injecting spoofed packets into the network that are
   Delivered to a DDP Data Sink, could launch a variety of attacks that
   exploit DDP-specific behavior. Since DDP directly or indirectly
   exposes memory addresses on the wire, the Placement information
   carried in each DDP Segment must be validated before any data is
   Placed, including a check for an invalid STag and an octet-level
   base and bounds check. For example, a third-party adversary could
   inject random packets that appear to be valid DDP Segments and
   corrupt the memory on a DDP Data Sink.
   Since DDP is independent of the IP transport protocol, communication
   security mechanisms such as IPsec [IPSEC] or TLS [TLS] may be used
   to prevent such attacks.

8.2 Association of an STag and a DDP Stream

   There are several mechanisms for associating an STag and a DDP
   Stream. Two required mechanisms for this association are a
   Protection Domain (PD) association and a DDP Stream association.

   Under the Protection Domain (PD) association, a unique Protection
   Domain Identifier (PD ID) is created and used locally to associate
   an STag with a set of DDP Streams. Under this mechanism, the use of
   the STag is only permitted on the DDP Streams that have the same PD
   ID as the STag. For an incoming DDP Segment of a Tagged DDP Message
   on a DDP Stream, if the PD ID of the DDP Stream is not the same as
   the PD ID of the STag targeted by the Tagged DDP Message, then the
   DDP Segment is not Placed and the DDP layer MUST surface a local
   error to the ULP. Note that the PD ID is locally defined, and cannot
   be directly manipulated by the Remote Peer.

   Under the DDP Stream association, a DDP Stream is identified locally
   by a unique DDP Stream identifier (ID). An STag is associated with a
   DDP Stream by using a DDP Stream ID. In this case, for an incoming
   DDP Segment of a Tagged DDP Message on a DDP Stream, if the DDP
   Stream ID of the DDP Stream is not the same as the DDP Stream ID of
   the STag targeted by the Tagged DDP Message, then the DDP Segment is
   not Placed and the DDP layer MUST surface a local error to the ULP.
   Note that the DDP Stream ID is locally defined, and cannot be
   directly manipulated by the Remote Peer.

   A ULP SHOULD associate an STag and a DDP Stream. DDP MUST support
   Protection Domain association and DDP Stream association mechanisms
   for associating an STag and a DDP Stream.
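   The Tagged Buffer checks of Section 7.1 and the two association
   mechanisms above can be combined into a single receive-side check.
   The sketch below is minimal and non-normative. Several details are
   assumptions made for illustration: the STag table, the record
   fields (base_to, length, pd_id, stream_id), the check ordering,
   and reporting both PD and DDP Stream mismatches as Error Type 0x1,
   Code 0x02 ("STag not associated with DDP Stream"). The function
   returns the (Error Type, Error Code) pair from Section 7.2, or
   None when Placement may proceed.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Error Type / Error Code values from the Tagged Buffer rows of 7.2.
TAGGED_BUFFER_ERROR = 0x1
INVALID_STAG = 0x00
BASE_OR_BOUNDS = 0x01
STAG_NOT_ASSOCIATED = 0x02
TO_WRAP = 0x03


@dataclass
class STagEntry:
    """Hypothetical local registration record for one STag."""
    base_to: int                     # first legal TO registered
    length: int                      # number of octets registered
    pd_id: int                       # Protection Domain of the STag
    stream_id: Optional[int] = None  # set if scoped to one DDP Stream


@dataclass
class DDPStream:
    stream_id: int                   # locally defined DDP Stream ID
    pd_id: int                       # locally defined PD ID


def check_tagged(stags: Dict[int, STagEntry], stream: DDPStream,
                 stag: int, to: int,
                 payload_len: int) -> Optional[Tuple[int, int]]:
    """Return None if Placement may proceed, else (Type, Code)."""
    if payload_len == 0:
        return None                  # STag and TO are not checked
    entry = stags.get(stag)
    if entry is None:
        return (TAGGED_BUFFER_ERROR, INVALID_STAG)
    # PD association, then DDP Stream association if Stream-scoped.
    if entry.pd_id != stream.pd_id or (
            entry.stream_id is not None
            and entry.stream_id != stream.stream_id):
        return (TAGGED_BUFFER_ERROR, STAG_NOT_ASSOCIATED)
    end = to + payload_len           # one past the last octet Placed
    if end >= 1 << 64:               # 64-bit unsigned sum would wrap
        return (TAGGED_BUFFER_ERROR, TO_WRAP)
    if to < entry.base_to or end > entry.base_to + entry.length:
        return (TAGGED_BUFFER_ERROR, BASE_OR_BOUNDS)
    return None


if __name__ == "__main__":
    stags = {0x10: STagEntry(base_to=16384, length=4096, pd_id=7)}
    stream = DDPStream(stream_id=1, pd_id=7)
    print(check_tagged(stags, stream, 0x10, 17870, 562))   # None
    print(check_tagged(stags, stream, 0x10, 20380, 200))   # (1, 1)
```

   The sketch tests for TO wrap before base and bounds only so that
   the more specific Error Code is reported. A wrapped sum would
   otherwise always show up as a bounds violation. An implementation
   may order the checks differently.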
8.3 Security Requirements

   [RDMASEC] defines the security model and general assumptions for
   RDMAP/DDP. This subsection provides the security requirements for
   the DDP implementation. For more details on the type of attacks,
   type of attackers, trust models, and resource sharing for the DDP
   implementation, the reader is referred to [RDMASEC].

   DDP has several mechanisms that deal with a number of attacks.
   These attacks include, but are not limited to:

   1. Connection to/from an unauthorized or unauthenticated endpoint.
   2. Hijacking of a DDP Stream.
   3. Attempts to read from or write to unauthorized memory regions.
   4. Injection of RDMA Messages within a Stream on a multi-user
      operating system by another application.

   DDP relies on the LLP to establish the LLP Stream over which DDP
   Messages will be carried. DDP itself does nothing to authenticate
   the validity of the LLP Stream of either of the endpoints. It is the
   responsibility of the ULP to validate the LLP Stream. Such
   validation is highly desirable because DDP Places incoming data
   directly into ULP Buffers.

   Hijacking of a DDP Stream would require that the underlying LLP
   Stream be hijacked. This would require knowledge of Advertised
   buffers in order to directly Place data into a user buffer and is
   therefore constrained by the same techniques mentioned to guard
   against attempts to read from or write to unauthorized memory
   regions.

   DDP does not require a node to open its buffers to arbitrary attacks
   over the DDP Stream. It may access ULP memory only to the extent
   that the ULP has enabled and authorized it to do so. The STag
   access control model is defined in [RDMASEC]. Specific security
   operations include:

   1. STags are only valid over the exact byte range established by the
      ULP. DDP MUST provide a mechanism for the ULP to establish and
      revoke the TO range associated with the ULP Buffer referenced by
      the STag.
   2. STags are only valid for the duration established by the ULP. The
      ULP may revoke them at any time, in accordance with its own upper
      layer protocol requirements. DDP MUST provide a mechanism for the
      ULP to establish and revoke STag validity.
   3. DDP MUST provide a mechanism for the ULP to communicate the
      association between an STag and a specific DDP Stream.
   4. A ULP may only expose memory to remote access to the extent that
      it already had access to that memory itself.
   5. If an STag is not valid on a DDP Stream, DDP MUST pass the invalid
      access attempt to the ULP. The ULP may provide a mechanism for
      terminating the DDP Stream.

   Further, DDP provides a mechanism that directly Places incoming
   payloads in user-mode ULP Buffers. This avoids the risks of prior
   solutions that relied upon exposing system buffers for incoming
   payloads.

   For the DDP implementation, two components MUST be provided: an
   RDMA-enabled NIC (RNIC) and a Privileged Resource Manager (PRM).

8.3.1 RNIC Requirements

   The RNIC MUST implement the DDP wire Protocol and perform the
   security semantics described below.

   *  An RNIC MUST ensure that a specific DDP Stream in a specific
      Protection Domain cannot access an STag in a different
      Protection Domain.

   *  An RNIC MUST ensure that if an STag is limited in scope to a
      single DDP Stream, no other DDP Stream can use the STag.

   *  An RNIC MUST ensure that a Remote Peer is not able to access
      memory outside of the buffer specified when the STag was
      enabled for remote access.

   *  An RNIC MUST provide a mechanism for the ULP to establish and
      revoke the association of a ULP Buffer to an STag and TO range.

   *  An RNIC MUST provide a mechanism for the ULP to establish and
      revoke read, write, or read and write access to the ULP Buffer
      referenced by an STag.
   *  An RNIC MUST ensure that the network interface can no longer
      modify an advertised buffer after the ULP revokes remote access
      rights for an STag.

   *  An RNIC MUST NOT enable firmware to be loaded on the RNIC
      directly from an untrusted Local Peer or Remote Peer, unless
      the Peer is properly authenticated (by a mechanism outside the
      scope of this specification, which presumably entails
      authenticating that the remote ULP has the right to perform the
      update), and the update is done via a secure protocol, such as
      IPsec.

8.3.2 Privileged Resource Manager Requirements

   The PRM MUST implement the security semantics described below.

   *  All Non-Privileged ULP interactions with the RNIC Engine that
      could affect other ULPs MUST be done using the Privileged
      Resource Manager as a proxy.

   *  All ULP resource allocation requests for scarce resources MUST
      also be done using a Privileged Resource Manager.

   *  The Privileged Resource Manager MUST NOT assume different ULPs
      share Partial Mutual Trust unless there is a mechanism to
      ensure that the ULPs do indeed share partial mutual trust.

   *  If Non-Privileged ULPs are supported, the Privileged Resource
      Manager MUST verify that the Non-Privileged ULP has the right
      to access a specific Data Buffer before allowing an STag for
      which the ULP has access rights to be associated with a
      specific Data Buffer.

   *  The Privileged Resource Manager SHOULD prevent a Local Peer
      from allocating more than its fair share of resources.
      If an RNIC provides the ability to share receive buffers across
      multiple DDP Streams, the combination of the RNIC and the
      Privileged Resource Manager MUST be able to detect if the Remote
      Peer is attempting to consume more than its fair share of
      resources so that the Local Peer can apply countermeasures to
      detect and prevent the attack.

8.4 Security Services for DDP

   DDP uses IP-based network services; therefore, all exchanged DDP
   Segments are vulnerable to spoofing, tampering, and information
   disclosure attacks. If a DDP Stream may be subject to impersonation
   attacks or Stream hijacking attacks, it is highly RECOMMENDED that
   the DDP Stream be authenticated, integrity protected, and protected
   from replay attacks; it MAY use confidentiality protection to
   protect from eavesdropping.

   IPsec can be used to protect against the packet injection attacks
   outlined above. Because IPsec is designed to secure arbitrary IP
   packet streams, including streams where packets are lost, DDP can
   run on top of IPsec without any change.

   The DDP implementation MUST implement IPsec services as outlined in
   Section 2.3 of [RFC3723]. IPsec packets are processed (e.g.,
   integrity checked and possibly decrypted) in the order they are
   received, and a DDP Data Sink will process the decrypted DDP
   Segments contained in these packets in the same manner as DDP
   Segments contained in unsecured IP packets.

   The receipt of an IKE Phase 2 delete message MUST NOT be
   interpreted as a reason for tearing down a DDP Stream. Rather, it
   is preferable to leave the DDP Stream up and, if additional traffic
   is sent on it, to bring up another IKE Phase 2 SA to protect it.
   This avoids the potential for continually bringing DDP Streams up
   and down.
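   The RNIC requirements in Section 8.3.1 amount to a validation step
   that must pass before any tagged Placement touches memory: the STag
   must be known and not revoked, belong to the same Protection Domain
   as the DDP Stream, not be scoped to a different Stream, grant the
   requested right, and cover the whole TO range. The following is a
   minimal Python sketch under those assumptions. The names STagTable,
   STagEntry, establish, revoke, and check are hypothetical and are not
   defined by this specification or by any RNIC interface; a real RNIC
   would perform these checks in hardware with its own data structures.

```python
from dataclasses import dataclass
from enum import Flag, auto
from typing import Dict, Optional


class Access(Flag):
    """Remote access rights a ULP may grant on an advertised buffer."""
    READ = auto()
    WRITE = auto()


@dataclass
class STagEntry:
    pd: int                # Protection Domain that owns the STag
    stream: Optional[int]  # DDP Stream the STag is limited to, or None
    base_to: int           # first Tagged Offset (TO) of the ULP Buffer
    length: int            # size of the ULP Buffer in bytes
    access: Access         # rights granted when the STag was enabled
    valid: bool = True     # cleared when the ULP revokes the STag


class STagTable:
    """Hypothetical per-RNIC table mapping STags to advertised buffers."""

    def __init__(self) -> None:
        self._entries: Dict[int, STagEntry] = {}

    def establish(self, stag: int, entry: STagEntry) -> None:
        # ULP associates a ULP Buffer, TO range, and rights with an STag.
        self._entries[stag] = entry

    def revoke(self, stag: int) -> None:
        # After revocation no further remote access may modify the buffer.
        entry = self._entries.get(stag)
        if entry is not None:
            entry.valid = False

    def check(self, stag: int, stream: int, stream_pd: int,
              to: int, length: int, op: Access) -> bool:
        """Return True if a tagged access may be Placed.

        False stands for an invalid access attempt, which DDP would pass
        to the ULP instead of touching memory.
        """
        entry = self._entries.get(stag)
        if entry is None or not entry.valid:
            return False                  # unknown or revoked STag
        if entry.pd != stream_pd:
            return False                  # Stream is in another PD
        if entry.stream is not None and entry.stream != stream:
            return False                  # STag scoped to another Stream
        if (op & entry.access) != op:
            return False                  # requested right not granted
        if length < 0 or to < entry.base_to:
            return False                  # starts before the buffer
        if to + length > entry.base_to + entry.length:
            return False                  # would run past the buffer end
        return True
```

   Returning a boolean rather than raising an exception leaves the
   decision of how to report the failure (item 5 in the list above) to
   the caller, which is where the ULP notification path would live.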
9 IANA Considerations

   If DDP were enabled a priori for a ULP by connecting to a well-known
   port, that well-known port would be registered for DDP with IANA.

10 References

10.1 Normative References

   [RFC2026] Bradner, S., "The Internet Standards Process -- Revision
             3", BCP 9, RFC 2026, October 1996.

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3723] Aboba, B., Tseng, J., Walker, J., Rangan, V., and F.
             Travostino, "Securing Block Storage Protocols over IP",
             RFC 3723, April 2004.

   [MPA]     Culley, P., Elzur, U., Recio, R., Bailey, S., and J.
             Carrier, "Marker PDU Aligned Framing for TCP
             Specification", Internet-Draft draft-ietf-rddp-mpa-01.txt
             (work in progress), July 2004.

   [RDMAP]   Recio, R., Culley, P., Garcia, D., and J. Hilland, "An
             RDMA Protocol Specification", Internet-Draft
             draft-ietf-rddp-rdmap-01.txt (work in progress), October
             2003.

   [SCTP]    Stewart, R., et al., "Stream Control Transmission
             Protocol", RFC 2960, October 2000.

   [TCP]     Postel, J., "Transmission Control Protocol", STD 7,
             RFC 793, September 1981.

10.2 Informative References

   [TLS]     Dierks, T. and C. Allen, "The TLS Protocol Version 1.0",
             RFC 2246, January 1999.

   [IPSEC]   Kent, S. and R. Atkinson, "Security Architecture for the
             Internet Protocol", RFC 2401, November 1998.

   [RDMASEC] Pinkerton, J., Deleganes, E., Romanow, A., and S. Bitan,
             "DDP/RDMAP Security", Internet-Draft
             draft-ietf-rddp-security-05.txt (work in progress), August
             2004.

11 Appendix

11.1 Receive Window Sizing

   Reliable, sequenced LLPs include a mechanism to advertise the amount
   of receive buffer space a sender may consume. This is generally
   called a "receive window".
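   A common way to estimate how large a receive window must be to keep
   a path busy is the bandwidth-delay product: the desired data rate
   multiplied by the round-trip time. The following is a minimal Python
   sketch, assuming integer inputs in bits per second and microseconds
   (to avoid floating-point rounding) and a hypothetical 1460-byte
   segment payload; the function names are illustrative only.

```python
def window_for_rate(rate_bps: int, rtt_us: int) -> int:
    """Bytes that must be outstanding to sustain rate_bps over rtt_us.

    This is the bandwidth-delay product, rounded up to whole bytes.
    """
    if rate_bps < 0 or rtt_us < 0:
        raise ValueError("rate and RTT must be non-negative")
    bits = rate_bps * rtt_us               # bit-microseconds in flight
    return -(-bits // (8 * 1_000_000))     # ceiling division to bytes


def segments_in_flight(window_bytes: int, segment_bytes: int) -> int:
    """Full-size segments the window must allow to be outstanding."""
    if segment_bytes <= 0:
        raise ValueError("segment size must be positive")
    return -(-window_bytes // segment_bytes)


if __name__ == "__main__":
    w = window_for_rate(10_000_000_000, 100)  # 10 Gbit/s, 100 us RTT
    print(w, segments_in_flight(w, 1460))
```

   For 10 Gbit/s and a 100 microsecond round trip this gives 125,000
   bytes, or 86 segments of 1460 bytes. When TCP is the LLP, a window
   larger than 65,535 bytes can only be advertised using the TCP window
   scale option.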
   DDP allows data to be transferred directly to predefined buffers at
   the Data Sink. Accordingly, the LLP receive window size need not be
   affected by the reception of a DDP Segment, if that segment is
   placed before additional segments arrive.

   The LLP implementation SHOULD maintain an advertised receive window
   large enough to enable a reasonable number of segments to be
   outstanding at one time. The amount to advertise depends on the
   desired data rate and the expected or actual round-trip delay
   between endpoints.

   The amount of buffer space actually maintained to "back up" the
   receive window is left up to the implementation. This amount will
   depend on the rate at which DDP Segments can be retired; there may
   be some cases where segment processing cannot keep up with the
   incoming packet rate. If this occurs, one reasonable way to slow the
   incoming packet rate is to reduce the receive window.

   Note that the LLP should take care to comply with the applicable
   RFCs; for instance, for TCP, receivers are highly discouraged from
   "shrinking" the receive window (reducing the right edge of the
   window after it has been advertised).

12 Authors' Addresses

   Hemal Shah
   Intel Corporation
   MS AN1-PTL1
   1501 South Mopac Expressway, #400
   Austin, TX 78746 USA
   Phone: +1 (512) 732-3963
   Email: hemal.shah@intel.com

   James Pinkerton
   Microsoft Corporation
   One Microsoft Way
   Redmond, WA 98052 USA
   Phone: +1 (425) 705-5442
   Email: jpink@microsoft.com

   Renato Recio
   IBM Corporation
   11501 Burnett Road
   Austin, TX 78758 USA
   Phone: +1 (512) 838-1365
   Email: recio@us.ibm.com

   Paul R. Culley
   Hewlett-Packard Company
   20555 SH 249
   Houston, TX 77070-2698 USA
   Phone: +1 (281) 514-5543
   Email: paul.culley@hp.com
13 Acknowledgments

   John Carrier
   Adaptec, Inc.
   691 S. Milpitas Blvd.
   Milpitas, CA 95035 USA
   Phone: +1 (360) 378-8526
   Email: john_carrier@adaptec.com

   Hari Ghadia
   Adaptec, Inc.
   691 S. Milpitas Blvd.
   Milpitas, CA 95035 USA
   Phone: +1 (408) 957-5608
   Email: hari_ghadia@adaptec.com

   Patricia Thaler
   Agilent Technologies, Inc.
   1101 Creekside Ridge Drive, #100
   M/S-RG10
   Roseville, CA 95678
   Phone: +1 (916) 788-5662
   Email: pat_thaler@agilent.com

   Mike Penna
   Broadcom Corporation
   16215 Alton Parkway
   Irvine, CA 92619-7013 USA
   Phone: +1 (949) 926-7149
   Email: MPenna@Broadcom.com

   Uri Elzur
   Broadcom Corporation
   16215 Alton Parkway
   Irvine, CA 92619-7013 USA
   Phone: +1 (949) 585-6432
   Email: Uri@Broadcom.com

   Ted Compton
   EMC Corporation
   Research Triangle Park, NC 27709 USA
   Phone: +1 (919) 248-6075
   Email: compton_ted@emc.com

   Jim Wendt
   Hewlett-Packard Company
   8000 Foothills Boulevard
   Roseville, CA 95747-5668 USA
   Phone: +1 (916) 785-5198
   Email: jim_wendt@hp.com

   Mike Krause
   Hewlett-Packard Company, 43LN
   19410 Homestead Road
   Cupertino, CA 95014 USA
   Phone: +1 (408) 447-3191
   Email: krause@cup.hp.com

   Dave Minturn
   Intel Corporation
   MS JF1-210
   5200 North East Elam Young Parkway
   Hillsboro, OR 97124 USA
   Phone: +1 (503) 712-4106
   Email: dave.b.minturn@intel.com

   Howard C. Herbert
   Intel Corporation
   MS CH7-404
   5000 West Chandler Blvd.
   Chandler, AZ 85226 USA
   Phone: +1 (480) 554-3116
   Email: howard.c.herbert@intel.com

   Tom Talpey
   Network Appliance
   375 Totten Pond Road
   Waltham, MA 02451 USA
   Phone: +1 (781) 768-5329
   Email: thomas.talpey@netapp.com

   Dwight Barron
   Hewlett-Packard Company
   20555 SH 249
   Houston, TX 77070-2698 USA
   Phone: +1 (281) 514-2769
   Email: Dwight.Barron@Hp.com

   Dave Garcia
   Hewlett-Packard Company
   19333 Vallco Parkway
   Cupertino, CA 95014 USA
   Phone: +1 (408) 285-6116
   Email: dave.garcia@hp.com

   Jeff Hilland
   Hewlett-Packard Company
   20555 SH 249
   Houston, TX 77070-2698 USA
   Phone: +1 (281) 514-9489
   Email: jeff.hilland@hp.com

   Barry Reinhold
   Lamprey Networks
   Durham, NH 03824 USA
   Phone: +1 (603) 868-8411
   Email: bbr@LampreyNetworks.com

14 Full Copyright Statement

   This document contains contributions from individuals representing
   or sponsored by ADAPTEC INC., AGILENT TECHNOLOGIES INC., BROADCOM
   CORPORATION, CISCO SYSTEMS INC., EMC CORPORATION, HEWLETT-PACKARD
   COMPANY, INTERNATIONAL BUSINESS MACHINES CORPORATION, INTEL
   CORPORATION, MICROSOFT CORPORATION, NETWORK APPLIANCE INC.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE
   OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY
   IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
   PURPOSE.

   Copyright (c) The Internet Society (2005).
   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.