INTERNET DRAFT                            Mallikarjun Chadalapaka
draft-ietf-ips-iwarp-da-05.txt                                 HP
                                                     John Hufferd
                                                              IBM
                                                    Julian Satran
                                                              IBM
                                                       Hemal Shah
                                                            Intel

Expires
May 2007

            Datamover Architecture for iSCSI (DA)

Status of this Memo

   By submitting this Internet-Draft, each author represents
   that any applicable patent or other IPR claims of which he or
   she is aware have been or will be disclosed, and any of which
   he or she becomes aware will be disclosed, in accordance with
   Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet
   Engineering Task Force (IETF), its areas, and its working
   groups. Note that other groups may also distribute working
   documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of
   six months and may be updated, replaced, or obsoleted by
   other documents at any time.
   It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed
   at http://www.ietf.org/shadow.html.

Abstract

   iSCSI is a SCSI transport protocol that maps the SCSI family
   of application protocols onto TCP/IP. Datamover Architecture
   for iSCSI (DA) defines an abstract model in which the
   movement of data between iSCSI end nodes is logically
   separated from the rest of the iSCSI protocol in order to
   allow iSCSI to adapt to innovations available in new IP
   transports. While DA defines the architectural functions
   required of the class of Datamover protocols, it does not
   define any specific Datamover protocols. Each such Datamover
   protocol, to be defined in a separate document, provides a
   reliable transport for all iSCSI PDUs, but actually moves the
   data required for certain iSCSI PDUs without involving the
   remote iSCSI layer itself. This document begins with an
   introduction of a few new abstractions, defines a layered
   architecture for iSCSI and Datamover protocols, and then
   models the interactions within an iSCSI end node between the
   iSCSI layer and the Datamover layer that happen in order to
   transparently perform remote data movement within an IP
   fabric. It is intended that this definition would help map
   iSCSI to generic RDMA-capable IP fabrics in the future
   comprising TCP, SCTP, and possibly other underlying network
   transport layers such as InfiniBand.

Table of Contents

   1 Definitions and acronyms
     1.1 Definitions
     1.2 Acronyms
   2 Motivation
     2.1 Intent
     2.2 Interpretation of Requirements
   3 Architectural layering of iSCSI and Datamover layers
   4 Design Overview
   5 Architectural Concepts
     5.1 iSCSI PDU types
       5.1.1 iSCSI data-type PDUs
       5.1.2 iSCSI control-type PDUs
     5.2 Data_Descriptor
     5.3 Connection_Handle
     5.4 Operational Primitive
     5.5 Transport Connection
   6 Datamover layer and Datamover protocol
   7 Functional Overview
     7.1 Startup
     7.2 Full Feature Phase
     7.3 Wrapup
   8 Operational Primitives provided by the Datamover layer
     8.1 Send_Control
     8.2 Put_Data
     8.3 Get_Data
     8.4 Allocate_Connection_Resources
     8.5 Deallocate_Connection_Resources
     8.6 Enable_Datamover
     8.7 Connection_Terminate
     8.8 Notice_Key_Values
     8.9 Deallocate_Task_Resources
   9 Operational Primitives provided by the iSCSI layer
     9.1 Control_Notify
     9.2 Connection_Terminate_Notify
     9.3 Data_Completion_Notify
     9.4 Data_ACK_Notify
   10 Datamover Interface (DI)
     10.1 Overview
     10.2 Interactions for handling asynchronous notifications
       10.2.1 Connection termination
       10.2.2 Data transfer completion
       10.2.3 Data acknowledgement
     10.3 Interactions for sending an iSCSI PDU
       10.3.1 SCSI Command
       10.3.2 SCSI Response
       10.3.3 Task Management Function Request
       10.3.4 Task Management Function Response
       10.3.5 SCSI Data-out & SCSI Data-in
       10.3.6 Ready To Transfer (R2T)
       10.3.7 Asynchronous Message
       10.3.8 Text Request
       10.3.9 Text Response
       10.3.10 Login Request
       10.3.11 Login Response
       10.3.12 Logout Command
       10.3.13 Logout Response
       10.3.14 SNACK Request
       10.3.15 Reject
       10.3.16 NOP-Out
       10.3.17 NOP-In
     10.4 Interactions for receiving an iSCSI PDU
       10.4.1 General Control-type PDU notification
       10.4.2 SCSI Data Transfer PDUs
       10.4.3 Login Request
       10.4.4 Login Response
   11 Security Considerations
     11.1 Architectural Considerations
     11.2 Wire Protocol Considerations
   12 IANA Considerations
   13 References and Bibliography
     13.1 Normative References
     13.2 Informative References
   14 Authors' Addresses
   15 Acknowledgements
   16 Appendix
     16.1 Design considerations for a Datamover protocol
     16.2 Examples of Datamover interactions
   17 Full Copyright Statement
   18 Intellectual Property Statement

Table of Figures

   Figure 1 Datamover Architecture diagram, with the RDMAP
      example
   Figure 2 A successful iSCSI login on initiator
   Figure 3 A successful iSCSI login on target
   Figure 4 A failed iSCSI login on initiator
   Figure 5 A failed iSCSI login on target
   Figure 6 iSCSI does not enable the Datamover
   Figure 7 A normal iSCSI connection termination
   Figure 8 An abnormal iSCSI connection termination
   Figure 9 A SCSI Write data transfer
   Figure 10 A SCSI Read data transfer
   Figure 11 A SCSI Read data acknowledgement
   Figure
   12 Task resource cleanup on abort

1 Definitions and acronyms

1.1 Definitions

   I/O Buffer - A buffer that is used in a SCSI Read or Write
      operation so SCSI data may be sent from or received into
      that buffer.

   Datamover protocol - A Datamover protocol is a data transfer
      wire protocol for iSCSI that meets the requirements
      stated in section 6.

   Datamover layer - A Datamover layer is a protocol layer
      within an end node that implements the Datamover
      protocol.

   Datamover-assisted - An iSCSI connection is said to be
      "Datamover-assisted" when a Datamover layer is enabled
      for moving control and data information on that iSCSI
      connection.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
   NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as
   described in [RFC2119].

1.2 Acronyms

   Acronym        Definition
   -------------------------------------------------------------
   DA             Datamover Architecture for iSCSI
   DDP            Direct Data Placement Protocol
   DI             Datamover Interface
   IANA           Internet Assigned Numbers Authority
   IETF           Internet Engineering Task Force
   I/O            Input - Output
   IP             Internet Protocol
   iSCSI          Internet SCSI
   iSER           iSCSI Extensions for RDMA
   ITT            Initiator Task Tag
   LO             Leading Only
   MPA            Marker PDU Aligned Framing for TCP
   PDU            Protocol Data Unit
   RDDP           Remote Direct Data Placement
   RDMA           Remote Direct Memory Access
   R2T            Ready To Transfer
   R2TSN          Ready To Transfer Sequence Number
   RDMAP          Remote Direct Memory Access Protocol
   RFC            Request For Comments
   SAM            SCSI Architecture Model
   SCSI           Small Computer Systems Interface
   SN             Sequence Number
   SNACK          Selective Negative Acknowledgment - also
                  Sequence Number Acknowledgement for data
   TCP            Transmission Control Protocol
   TTT            Target Transfer Tag

2 Motivation

2.1 Intent

   There are relatively new standard protocols that enable
   Remote Direct Memory Access (RDMA) and Remote Direct Data
   Placement (RDDP) technologies to work over IP fabrics. The
   principal value proposition of these technologies is that
   they enable one end node to place data in the final intended
   buffer on the remote end node, thus eliminating the data copy
   that traditionally happens in the receive path to move the
   data to the final buffer. The data copy avoidance in turn
   eliminates unnecessary memory bandwidth consumption,
   substantially decreases the reassembly buffer size
   requirements, and preserves CPU cycles that would otherwise
   be spent in copying.

   The iSCSI specification ([RFC3720]) defines a very detailed
   data transfer model that employs SCSI Data-In PDUs, SCSI
   Data-Out PDUs, and R2T PDUs, in addition to the SCSI Command
   and SCSI Response PDUs that respectively create and conclude
   the task context for the data transfer. In the traditional
   iSCSI model, the iSCSI protocol layer plays the central role
   in pacing the data transfer and carrying out the ensuing data
   transfer itself. An alternative architecture would be for
   iSCSI to delegate a large part of this data transfer role to
   a separate protocol layer exclusively designed to move data,
   which in turn is possibly aided by a data movement and
   placement technology such as RDMA.

   If iSCSI were operating in such RDMA environments, iSCSI
   would be shielded from the low-level data transfer mechanics
   and would be privy only to the conclusion of the requested
   data transfer. Thus, there would be an effective
   "off-loading" of the work that an iSCSI protocol layer is
   expected to perform, compared to today's iSCSI end nodes.
   For such RDMA environments, it is highly desirable that there
   be a standard architecture to separate the data movement part
   of the iSCSI protocol definition from the rest of the iSCSI
   functionality. This architecture precisely defines what a
   Datamover layer is and also describes the model of
   interactions between the iSCSI layer and the Datamover layer
   (section 6). In order to satisfy this need, this document
   presents a Datamover Architecture for iSCSI (DA) and also
   summarizes a reasonable model for interactions between the
   iSCSI layer and the Datamover layer for each of the iSCSI
   PDUs that are defined in [RFC3720]. Note that while DA is
   motivated by the advent of RDMA over TCP/IP technology, the
   architecture is not dependent on RDMA in its design. DA is
   intended to be a generic architectural framework for allowing
   different types of Datamovers based on different types of
   RDMA and transport protocols. Adoption of this model will
   help iSCSI proliferate into more environments.

2.2 Interpretation of Requirements

   This draft introduces certain architectural abstractions and
   builds an abstract functional interface model between iSCSI
   and Datamover protocol layers based on those abstractions.
   This architectural style is motivated by the following
   desires:

      a) Provide guidance to Datamover protocol designers
         with respect to the functional boundary between
         iSCSI and the Datamover protocols. This guidance is
         critical since a significant part of the [RFC3720]
         protocol definition is left unchanged by the DA
         architecture and the iSCSI notions from [RFC3720]
         (e.g., tasks, ITTs) are leveraged by the Datamover
         protocol.

      b) Help existing iSCSI implementations adapt rapidly
         to the DA architecture, largely by carrying the
         architectural abstractions over into implementation
         constructs - e.g., functions, APIs, modules.

   However, note that the DA architecture does not intend to
   impose any implementation specifics per se. When a DA
   architectural concept (e.g., Operational Primitive) is
   described as mandatory ("MUST") or recommended ("SHOULD") of
   a layer (iSCSI or Datamover) in this document, the intent is
   that an implementation respectively MUST or SHOULD produce
   the same protocol action as what the model describes.
   Specifically, such use of [RFC2119] terms implies no
   implementation compliance in terms of names, modules, API
   arguments, etc.; only functional compliance is sought.

3 Architectural layering of iSCSI and Datamover layers

   Figure 1 illustrates an example of the architectural layering
   of iSCSI and Datamover layers, in conjunction with a TCP/IP
   implementation of RDMAP/DDP ([DDP]) layers in an iSCSI end
   node. Note that the RDMAP/DDP/MPA and TCP protocol layers are
   shown here only as an example; in reality, DA is completely
   oblivious to protocol layers below the Datamover layer. The
   RDMAP/DDP/MPA protocol stack provides a generic transport
   service with direct data placement. There is no need to
   tailor the implementation of this protocol stack to the
   specific ULP to benefit from these services.

    Initiator stack                          Target stack

   +----------------+  SCSI application   +----------------+
   |   SCSI Layer   |      protocols      |   SCSI Layer   |
   +----------------+                     +----------------+
           ^                                      ^
           |                                      |
           v                                      v
   +----------------+   iSCSI protocol    +----------------+
   |  iSCSI Layer   |   (excluding data   |  iSCSI Layer   |
   +----------------+      movement)      +----------------+
           ^                                      ^
     -- ---+-- --- DI (Datamover Interface) -- ---+--- ----
           v                                      v
   +----------------+     a Datamover     +----------------+
   | Datamover Layer|      protocol       | Datamover Layer|
   +----------------+                     +----------------+
           ^                                      ^
   +-------+----------+                 +---------+-----------+
   |       v          |                 |         v           |
   |+---------------+ |                 | +-----------------+ |
   || RDMAP/DDP/MPA | |  RDMAP/DDP/MPA  | |  RDMAP/DDP/MPA  | |
   ||    Layers     | |    protocols    | |     Layers      | |
   |+---------------+ |                 | +-----------------+ |
   |       ^          |                 |         ^           |
   |       | network  |                 |         | network   |
   |       | transport|                 |         | transport |
   |       v          |                 |         v           |
   |+---------------+ |                 | +-----------------+ |
   ||   TCP Layer   | |  TCP protocol   | |    TCP Layer    | |
   |+---------------+ |                 | +-----------------+ |
   |       ^          |                 |         ^           |
   +-------+----------+                 +---------+-----------+
           +--------------------------------------+

      Figure 1 Datamover Architecture diagram, with the
                        RDMAP example

   The scope of this document is limited to:

   1. Defining the notion of a Datamover layer and a Datamover
      protocol (section 6),

   2. Defining the functionality distribution between the
      iSCSI layer and the Datamover layer along with the
      communication model between the two (Operational
      Primitives), and,

   3. Modeling the interactions between the blocks labeled as
      "iSCSI Layer" and "Datamover Layer" in Figure 1 - i.e.
      defining the interface labeled as "DI" in the figure -
      for each defined iSCSI PDU, based on the Operational
      Primitives.

4 Design Overview

   This document discusses and defines a model for interactions
   between the iSCSI layer and a "Datamover layer" (see section
   6) operating within an iSCSI end node, presumably
   communicating with one or more iSCSI end nodes with similar
   layering. The model for interactions for handling different
   iSCSI operations is called the "Datamover Interface" (DI,
   section 10), while the architecture itself is called
   "Datamover Architecture for iSCSI" (DA). It is likely that
   the architecture will have implications on the Datamover wire
   protocols as DA places certain requirements and functionality
   expectations on the Datamover layer. However, this document
   itself neither defines any new wire protocol for the
   Datamover layer, nor any potential modifications to the iSCSI
   wire protocol to employ the Datamover layer. The scope of
   this document is strictly limited to specifying the
   architectural framework and the minimally required
   interactions that happen within an iSCSI end node to leverage
   the Datamover layer.

   The design ideas behind DA can be summarized thus -

   1) DA defines an abstract functional interface model of iSCSI
      layer's interactions with a Datamover layer below - i.e. DA
      models the interactions between the logical "bottom"
      interface of iSCSI and the logical "top" interface of a
      Datamover.

   2) DA guides the wire protocol for a Datamover layer by
      defining the iSCSI knowledge that the Datamover layer may
      utilize in its protocol definition (as an example, this
      draft completely limits the notion of "iSCSI session" to
      the iSCSI layer).

   3) DA is designed to allow implementing the Datamover layer
      either in hardware or in software.

   4) DA is not a wire protocol spec, but an architecture that
      also models the interactions between iSCSI and Datamover
      layers operating within an iSCSI end node.

   5) DA by design seeks to model the iSCSI-Datamover
      interactions in a way that the modeling is independent of
      the specifics of either a particular iSCSI revision, or a
      specific instantiation of a Datamover layer.

   6) DA introduces and relies on the notion of a defined set of
      Operational Primitives (could be seen as entry point
      definitions in implementation terms) provided by each layer
      to the other to carry out the request-response
      interactions.

   7) DA is intended to allow Datamover protocol definitions with
      minimal changes to existing iSCSI implementations.

   8) DA is designed to allow the iSCSI layer to completely rely
      on the Datamover layer for all the data transport needs.

   9) DA models the architecturally required minimal interactions
      between an operational iSCSI layer and a Datamover layer to
      realize the iSCSI-transparent data movement. There may be
      several other interactions in a typical implementation in
      order to bootstrap a Datamover layer (or an iSCSI layer)
      into operation, and they are outside the scope of this
      document.

   Note that in summary, DA is architected to support many
   different Datamover protocols operating under the iSCSI
   layer. One such example of a Datamover protocol is iSER
   ([iSER]).

5 Architectural Concepts

5.1 iSCSI PDU types

   This section defines the iSCSI PDU classification
   terminology, as defined and used in this document. Out of
   the set of legal iSCSI PDUs defined in [RFC3720], as we will
   see in section 5.1.1, the iSCSI layer does not request a SCSI
   Data-Out PDU carrying solicited data for transmission across
   the Datamover Interface per this architecture. For this
   reason, the SCSI Data-Out PDU carrying solicited data is
   excluded in the iSCSI PDU classification we introduce in this
   section (for SCSI Data-Out PDUs for unsolicited Data, see
   section 5.1.2).
   The rest of the legal iSCSI PDUs that may be exchanged across
   the Datamover Interface are defined to consist of two
   classes:

   1) iSCSI data-type PDUs

   2) iSCSI control-type PDUs

5.1.1 iSCSI data-type PDUs

   An iSCSI data-type PDU is defined as an iSCSI PDU that causes
   data transfer, transparent to the remote iSCSI layer, to take
   place between the peer iSCSI nodes on a full feature phase
   iSCSI connection. A data-type PDU, when requested for
   transmission by the sender iSCSI layer, results in the
   associated data transfer without the participation of the
   remote iSCSI layer, i.e. the PDU itself is not delivered
   as-is to the remote iSCSI layer. The following iSCSI PDUs
   constitute the set of iSCSI data-type PDUs -

   1) SCSI Data-In PDU

   2) R2T PDU

   In an iSCSI end node structured as an iSCSI layer and a
   Datamover layer as defined in this document, the solicitation
   for Data-out (i.e. R2T PDU) is not delivered to the initiator
   iSCSI layer, per the definition of an iSCSI data-type PDU.
   The data transfer is instead performed via the mechanisms
   known to the Datamover layer (e.g. RDMA Read). This in turn
   implies that a SCSI Data-Out PDU for solicited data is never
   requested for transmission across the Datamover Interface at
   the initiator.

5.1.2 iSCSI control-type PDUs

   Any iSCSI PDU that is not an iSCSI data-type PDU and also not
   a solicited SCSI Data-out PDU is defined as an iSCSI
   control-type PDU. Specifically, it is to be noted that SCSI
   Data-Out PDUs for unsolicited Data are defined as iSCSI
   control-type PDUs.

5.2 Data_Descriptor

   A Data_Descriptor is an information element that describes an
   iSCSI/SCSI data buffer, provided by the iSCSI layer to its
   local Datamover layer or by the Datamover layer to its local
   iSCSI layer for identifying the data associated respectively
   with the requested or completed operation.
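
   As one concrete picture of such an information element, the
   sketch below models a Data_Descriptor as an ordered list of
   buffer segments. This is only an illustration under stated
   assumptions: the names Segment, DataDescriptor, total_length
   and locate, and the (address, length) layout, are hypothetical
   and are not defined by DA, which leaves the structure to the
   implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """One contiguous piece of a local buffer (hypothetical layout)."""
    address: int  # start address or offset of the segment
    length: int   # number of bytes in the segment


@dataclass
class DataDescriptor:
    """Illustrative Data_Descriptor: an ordered list of buffer segments."""
    segments: List[Segment]

    def total_length(self) -> int:
        # The described data is the concatenation of all segments.
        return sum(seg.length for seg in self.segments)

    def locate(self, offset: int) -> Segment:
        """Return the segment holding byte `offset` of the described data."""
        if offset < 0:
            raise IndexError("offset must be non-negative")
        for seg in self.segments:
            if offset < seg.length:
                return seg
            offset -= seg.length
        raise IndexError("offset beyond described data")


# A 12 KiB I/O buffer described as three non-contiguous 4 KiB pieces.
descriptor = DataDescriptor([
    Segment(address=0x1000, length=4096),
    Segment(address=0x8000, length=4096),
    Segment(address=0x3000, length=4096),
])
```

   Either layer could hand such an object to the other to
   identify the data of a requested or completed operation
   without copying the data itself.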

   In implementation terms, a Data_Descriptor may be a
   scatter-gather list describing a local buffer, the exact
   structure of which is subject to the constraints imposed by
   the operating environment on the local iSCSI node.

5.3 Connection_Handle

   A Connection_Handle is an information element that identifies
   the particular iSCSI connection for which an inbound or
   outbound iSCSI PDU is intended. A Connection_Handle is unique
   for a given pair of an iSCSI layer instance and a Datamover
   layer instance. The Connection_Handle qualifier is used in
   all invocations of any Operational Primitive for connection
   identification.

   Note that the Connection_Handle is conceptually different
   from the Connection Identifier (CID) defined by the iSCSI
   specification. While the CID is a unique identifier of an
   iSCSI connection within an iSCSI session, the uniqueness of
   the Connection_Handle extends to the entire iSCSI layer
   instance coupled with the Datamover layer instance, across
   possibly multiple iSCSI sessions.

   In implementation terms, a Connection_Handle could be an
   opaque identifier exchanged between the iSCSI layer and the
   Datamover layer at the connection login time. One may also
   consider it to be similar in scope of uniqueness to a socket
   identifier. The exact structure and modalities of exchange
   of a Connection_Handle between the two layers are
   implementation-specific.

5.4 Operational Primitive

   An Operational Primitive, in this document, is an abstract
   functional interface procedure that requests another layer to
   perform a specific action on the requestor's behalf or
   notifies the other layer of some event. The Datamover
   Interface between an iSCSI layer instance and a Datamover
   layer instance within an iSCSI end node uses a set of
   Operational Primitives to define the functional interface
   between the two layers.
   Note that not every invocation of an Operational Primitive
   elicits a response from the requested layer. This document
   describes the types of Operational Primitives that are
   implicitly required and provided by the iSCSI protocol layer
   as defined in [RFC3720], and the semantics of these
   Primitives.

   Note that ownership of buffers and data structures is likely
   to be exchanged between the iSCSI layer and its local
   Datamover layer in invoking the Operational Primitives
   defined in this architecture. The buffer management details,
   including how buffers are allocated and released, are
   implementation-specific and thus are outside the scope of
   this document.

   Each Operational Primitive invocation needs a certain
   "information context" (e.g., Connection_Handle) for
   performing the specific action being requested of it. The
   required information context is described in this document by
   a listing of "qualifiers" on each invocation - in the style
   of function call arguments. No implementation specifics are,
   however, implied by this notation. The "qualifiers" of any
   Operational Primitive invocation specified in this document
   thus represent the mandatory information context that the
   Operational Primitive invocation MUST consider in performing
   the action. While the qualifiers are required, the method of
   realizing the qualifiers (passed synchronously with
   invocation, or retrieved from task context, or retrieved from
   shared memory etc.) is up to the implementations.

   When an Operational Primitive implementation is described as
   mandatory ("MUST") or recommended ("SHOULD") of a layer
   (iSCSI or Datamover) in this document, the intent is that an
   implementation respectively MUST or SHOULD produce the same
   protocol action as what the model describes.
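
   The qualifier notation and the PDU classification of section
   5.1 can be pictured together in a short sketch. It is
   illustrative only and rests on assumptions: PDU types are
   plain strings; the helper names (classify, RecordingDatamover,
   request_transmission) are hypothetical; and the pairing of
   SCSI Data-In with Put_Data, R2T with Get_Data, and
   control-type PDUs with Send_Control is assumed from the
   primitive names listed in section 8, which this part of the
   document has not yet defined. Passing qualifiers as keyword
   arguments is just one of the realizations mentioned above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class PduClass(Enum):
    DATA_TYPE = "data-type"
    CONTROL_TYPE = "control-type"
    NOT_REQUESTED = "never requested across the DI"


def classify(pdu_name: str, solicited: bool = False) -> PduClass:
    """Classify a PDU per section 5.1; PDU names are plain strings here."""
    if pdu_name in ("SCSI Data-In", "R2T"):
        return PduClass.DATA_TYPE
    if pdu_name == "SCSI Data-Out" and solicited:
        return PduClass.NOT_REQUESTED
    return PduClass.CONTROL_TYPE


@dataclass
class Invocation:
    primitive: str
    qualifiers: Dict[str, object]


class RecordingDatamover:
    """Stand-in Datamover layer that only records what it is asked to do."""

    def __init__(self) -> None:
        self.log: List[Invocation] = []

    def invoke(self, primitive: str, **qualifiers: object) -> None:
        # Connection_Handle is a qualifier of every invocation (section 5.3).
        if "connection_handle" not in qualifiers:
            raise ValueError("Connection_Handle qualifier is required")
        self.log.append(Invocation(primitive, dict(qualifiers)))


# ASSUMPTION: pairing inferred from the primitive names in section 8.
DATA_TYPE_PRIMITIVE = {"SCSI Data-In": "Put_Data", "R2T": "Get_Data"}


def request_transmission(datamover: RecordingDatamover,
                         connection_handle: int,
                         pdu_name: str,
                         solicited: bool = False) -> str:
    """Pick a primitive for an outbound PDU and invoke it (illustrative)."""
    kind = classify(pdu_name, solicited)
    if kind is PduClass.NOT_REQUESTED:
        raise ValueError("solicited SCSI Data-Out never crosses the DI")
    if kind is PduClass.DATA_TYPE:
        primitive = DATA_TYPE_PRIMITIVE[pdu_name]
    else:
        primitive = "Send_Control"
    datamover.invoke(primitive, connection_handle=connection_handle,
                     pdu=pdu_name)
    return primitive
```

   For example, request_transmission(dm, 7, "SCSI Response")
   records a Send_Control invocation carrying connection handle
   7, while a solicited SCSI Data-Out is rejected before any
   primitive is invoked.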

5.5 Transport Connection

   The term "Transport Connection" is used in this document as a
   generic term to represent the end-to-end logical connection
   as defined by the underlying reliable transport protocol.
   For this revision of this document, a Transport Connection
   means only a TCP connection.

6 Datamover layer and Datamover protocol

   This section introduces the notion of a "Datamover layer" and
   "Datamover protocol" as meant in this document, and defines
   the requirements on a Datamover protocol.

   A Datamover layer is the implementation component that
   realizes a Datamover protocol functionality in an
   iSCSI-capable end node, in communicating with other iSCSI end
   nodes with similar capabilities. More specifically, a
   "Datamover layer" MUST provide the following functionality
   and the "Datamover protocol" MUST consist of the wire
   protocol required to realize the following functionality -

   1) guarantee that all the necessary data transfers take place
      when the local iSCSI layer requests transmitting a command
      (in order to complete a SCSI command, for an initiator), or
      sending/receiving an iSCSI data sequence (in order to
      complete part of a SCSI command, for a target).

   2) transport an iSCSI control-type PDU as-is to the peer
      Datamover layer when requested to do so by the local iSCSI
      layer.

   3) provide notification and delivery to the iSCSI layer upon
      arrival of an iSCSI control-type PDU.

   4) provide an initiator-to-target data acknowledgement of SCSI
      read data back to the target iSCSI layer, when requested.

   5) provide an asynchronous notification upon completion of a
      requested data transfer operation that moved data without
      involving the iSCSI layer.

   6) place the SCSI data into the I/O buffers or pick up the
      SCSI data for transmission out of the data buffers that the
      iSCSI layer had requested to be used for a SCSI I/O.

7) provide an error-free (i.e. must have at least the same level of assurance of data integrity as the CRC32C iSCSI data digest), reliable, in-order delivery transport mechanism over IP networks in performing the data transfer, and asynchronously notify the iSCSI layer upon iSCSI connection termination.

Note that this architecture expects that each compliant Datamover protocol will define the precise means of satisfying the requirements specified in this section.

In order to meet the functional requirements listed in this section, certain Datamover protocols may require pre-posted buffers from the local iSCSI protocol layer via mechanisms outside the scope of this document; in some implementations, the absence of such buffers may result in a connection failure. Datamover protocols may also realize these functional requirements via methods not explicitly listed in this document.

7 Functional Overview

This section presents an overview of the functional interactions between the iSCSI layer and the Datamover layer as intended by this Architecture.

7.1 Startup

The iSCSI Login Phase on an iSCSI connection occurs as defined in [RFC3720]. The Architecture assumes that at the end of the Login Phase, both the initiator and target, if they had so decided, transition the connection to being Datamover-assisted. The precise means of how an iSCSI initiator and an iSCSI target agree on having the connection Datamover-assisted is defined by the Datamover protocol. The only architectural requirement is that all iSCSI interactions in the iSCSI Full Feature Phase MUST be Datamover-assisted subject to the prior agreement, meaning that the Datamover protocol is in the iSCSI-to-iSCSI communication path below the iSCSI layer on either side as shown in Figure 1.
DA defines the Enable_Datamover Operational Primitive (section 8.6) to bring about this transition to a Datamover-assisted connection.

The Architecture also assumes that the Datamover layer may require a certain number of opaque local resources for making a connection Datamover-assisted. DA thus defines the Allocate_Connection_Resources Operational Primitive (section 8.4) to model this interaction. This Primitive is intended to be invoked on each side once the two sides decide (as previously noted) to have the connection Datamover-assisted. The expected sequence of Primitive invocations is depicted in Figure 2 and Figure 3 in section 16.2. Figure 4, Figure 5, and Figure 6 illustrate how the Primitives may be employed to deal with various legal login outcomes.

7.2 Full Feature Phase

All iSCSI peer communication in the Full Feature Phase happens through the Datamover layers if the iSCSI connection is Datamover-assisted. The Architecture assumes that a Datamover layer may require a certain number of opaque local resources for each new iSCSI task. In the normal course of execution, these task-level resources in the Datamover layer are assumed to be transparently allocated on each task initiation and deallocated on the conclusion of each task as appropriate. In exception scenarios, however - that is, scenarios that do not yield a SCSI Response for each task, such as an ABORT TASK operation - the Architecture assumes that the Datamover layer needs to be notified of the individual task terminations to aid its task-level resource management. DA thus defines the Deallocate_Task_Resources Operational Primitive (section 8.9) to model this task-resource management.
In specifying the ITT qualifier for the Deallocate_Task_Resources Primitive, the Architecture further assumes that the Datamover layer tracks its opaque task-level local resources by the iSCSI ITT. DA also defines Send_Control (section 8.1), Put_Data (section 8.2), Get_Data (section 8.3), Data_Completion_Notify (section 9.3), Data_ACK_Notify (section 9.4), and Control_Notify (section 9.1) Operational Primitives to model the various Full Feature Phase interactions.

Figure 9, Figure 10, and Figure 11 in section 16.2 show some Full Feature Phase interactions - SCSI Write task, SCSI Read task, and a SCSI Read Data acknowledgement respectively. Figure 12 in section 16.2 illustrates how an ABORT TASK operation can be modeled leading to deterministic resource cleanup on the Datamover layer.

7.3 Wrapup

Once an iSCSI connection becomes Datamover-assisted, the connection continues in that state until the end of the Full Feature Phase, i.e. the termination of the connection. The Architecture assumes that when a connection is normally logged out, the Datamover layer needs to be notified so that its connection-level opaque resources (see section 7.1) may now be freed up. DA thus defines a Connection_Terminate Operational Primitive (section 8.7) to model this interaction. The Architecture further assumes that when a connection termination happens without the iSCSI layer's involvement (e.g., TCP RST), the Datamover layer is capable of locally cleaning up its task-level and connection-level resources before notifying the iSCSI layer of the fact. DA thus defines the Connection_Terminate_Notify Operational Primitive (section 9.2) to model this interaction.

Figure 7 and Figure 8 in section 16.2 illustrate the interactions between the iSCSI and Datamover layers in normal and unexpected connection termination scenarios.
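
The task-level resource assumptions of sections 7.2 and 7.3 can be sketched informally as follows. This is a non-normative Python illustration: the class TaskResourceTable, its method names other than the Primitive names, and the use of a dictionary keyed by ITT are assumptions of the sketch, since DA treats these resources as opaque.

```python
from typing import Dict, List


class TaskResourceTable:
    """Opaque task-level Datamover resources for one connection, keyed by ITT."""

    def __init__(self) -> None:
        self._by_itt: Dict[int, object] = {}

    def on_task_initiation(self, itt: int) -> None:
        # Normal course: allocated transparently when a task starts.
        self._by_itt[itt] = object()

    def on_scsi_response(self, itt: int) -> None:
        # Normal course: released transparently when a SCSI Response
        # concludes the task.
        self._by_itt.pop(itt, None)

    def deallocate_task_resources(self, itt: int) -> None:
        # Exception path (e.g. ABORT TASK): no SCSI Response will arrive for
        # this ITT, so the iSCSI layer requests the release explicitly.
        self._by_itt.pop(itt, None)

    def connection_terminate(self) -> None:
        # Wrapup: all remaining task-level resources are released.
        self._by_itt.clear()

    def active_itts(self) -> List[int]:
        return sorted(self._by_itt)


table = TaskResourceTable()
for itt in (1, 2, 3):
    table.on_task_initiation(itt)

table.on_scsi_response(1)           # task 1 ends with a SCSI Response
table.deallocate_task_resources(2)  # task 2 is aborted, no SCSI Response
after_abort = table.active_itts()

table.connection_terminate()        # termination frees whatever is left
after_logout = table.active_itts()
```

In this sketch only task 3 still holds resources after the abort, and nothing remains once the connection is terminated.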

8 Operational Primitives provided by the Datamover layer

While the iSCSI specification itself does not have a notion of Operational Primitives, any iSCSI layer implementing the iSCSI specification functionally requires the following Operational Primitives from its Datamover layer. Thus, any Datamover protocol compliant with this architecture MUST implement the Operational Primitives described in this section. These Operational Primitives are invoked by the iSCSI layer as appropriate. Unless otherwise stated, all the following Operational Primitives may be used both on the initiator side and the target side. In general programming terminology, this set of Operational Primitives may be construed as "down calls".

1) Send_Control

2) Put_Data

3) Get_Data

4) Allocate_Connection_Resources

5) Deallocate_Connection_Resources

6) Enable_Datamover

7) Connection_Terminate

8) Notice_Key_Values

9) Deallocate_Task_Resources

8.1 Send_Control

Input qualifiers: Connection_Handle, iSCSI PDU-specific qualifiers

Return Results: Not specified.

An iSCSI layer requests its local Datamover layer to transmit an iSCSI control-type PDU to the peer iSCSI layer operating in the remote iSCSI node by this Operational Primitive. The Datamover layer performs the requested operation, and may add its own protocol headers in doing so. The iSCSI layer MUST NOT invoke the Send_Control Operational Primitive on an iSCSI connection that is not yet Datamover-assisted.

An initiator iSCSI layer requesting the transfer of a SCSI command PDU or a target iSCSI layer requesting the transfer of a SCSI response PDU are examples of invoking the Send_Control Operational Primitive.
As section 10.3.1 illustrates later on, the iSCSI PDU-specific qualifiers for the SCSI command PDU in this example are: BHS and AHS, DataDescriptorOut, DataDescriptorIn, ImmediateDataSize, and UnsolicitedDataSize.

8.2 Put_Data

Input qualifiers: Connection_Handle, contents of a SCSI Data-In PDU header, Data_Descriptor, Notify_Enable

Return Results: Not specified.

An iSCSI layer requests its local Datamover layer to transmit the data identified by the Data_Descriptor for the SCSI Data-In PDU to the peer iSCSI layer on the remote iSCSI node by this Operational Primitive. The Datamover layer performs the operation by using its own protocol means, completely transparent to the remote iSCSI layer. The iSCSI layer MUST NOT invoke the Put_Data Operational Primitive on an iSCSI connection that is not yet Datamover-assisted.

The Notify_Enable qualifier is used to request the local Datamover layer to generate or to not generate the eventual local completion notification to the iSCSI layer for this Put_Data invocation. For detailed semantics of this qualifier, see section 9.3.

A Put_Data Primitive may only be invoked by an iSCSI layer on the target to its local Datamover layer.

A target iSCSI layer requesting the transfer of an iSCSI read data sequence (also known as a read burst) is an example of invoking the Put_Data Operational Primitive.

8.3 Get_Data

Input qualifiers: Connection_Handle, contents of an R2T PDU, Data_Descriptor, Notify_Enable

Return Results: Not specified.

An iSCSI layer requests its local Datamover layer to retrieve certain data identified by the R2T PDU from the peer iSCSI layer on the remote iSCSI node into the buffer identified by the Data_Descriptor by invoking this Operational Primitive. The Datamover layer performs the operation by using its own protocol means, completely transparent to the remote iSCSI layer.
The iSCSI layer MUST NOT invoke the Get_Data Operational Primitive on an iSCSI connection that is not yet Datamover-assisted.

The Notify_Enable qualifier is used to request the local Datamover layer to generate or to not generate the eventual local completion notification to the iSCSI layer for this Get_Data invocation. For detailed semantics of this qualifier, see section 9.3.

A Get_Data Primitive may only be invoked by an iSCSI layer on the target to its local Datamover layer.

A target iSCSI layer requesting the transfer of an iSCSI write data sequence (also known as a write burst) is an example of invoking the Get_Data Operational Primitive.

8.4 Allocate_Connection_Resources

Input qualifiers: Connection_Handle [, Resource_Descriptor]

Return Results: Status.

By invoking this Operational Primitive, an iSCSI layer requests its local Datamover layer to perform all the Datamover-specific resource allocations required for the full feature phase of an iSCSI connection. The Connection_Handle identifies the connection for which the iSCSI layer is requesting the resource allocation, in order to eventually transition the connection to be a Datamover-assisted iSCSI connection. Note, however, that the Datamover layer does not allocate any Datamover-specific task-level resources upon invocation of this Primitive.

An iSCSI layer, in addition, optionally specifies the implementation-specific resource requirements for the iSCSI connection to the Datamover layer, by passing an input qualifier called Resource_Descriptor. The exact structure of a Resource_Descriptor is implementation-dependent, and hence structurally opaque to DA.

A return result of Status=success means that the Allocate_Connection_Resources invocation corresponding to that Connection_Handle succeeded.
If an Allocate_Connection_Resources invocation is made for a Connection_Handle for which an earlier invocation succeeded, the return Status must be success and the request will be ignored by the Datamover layer. A return result of Status=failure means that the Allocate_Connection_Resources invocation corresponding to that Connection_Handle failed. There MUST NOT be more than one Allocate_Connection_Resources Primitive invocation outstanding for a given Connection_Handle at any time.

The iSCSI layer must invoke the Allocate_Connection_Resources Primitive before the invocation of the Enable_Datamover Primitive.

8.5 Deallocate_Connection_Resources

Input qualifiers: Connection_Handle

Return Results: Not specified.

By invoking this Operational Primitive, an iSCSI layer requests its local Datamover layer to deallocate all the Datamover-specific resources that may have been allocated earlier for the Transport Connection identified by the Connection_Handle. The iSCSI layer may invoke this Operational Primitive when the Datamover-specific resources associated with the Connection_Handle are no longer necessary (such as the Login failure of the corresponding iSCSI connection).

8.6 Enable_Datamover

Input qualifiers: Connection_Handle, Transport_Connection_Descriptor [, Final_Login_Response_PDU]

Return Results: Not specified.

By invoking this Operational Primitive, an iSCSI layer requests its local Datamover layer to assist all further iSCSI exchanges on the iSCSI connection (i.e. to make the connection Datamover-assisted) identified by the Connection_Handle, for which the Datamover-specific resource allocation was earlier made. The iSCSI layer MUST NOT invoke the Enable_Datamover Operational Primitive for an iSCSI connection unless there was a corresponding prior resource allocation.
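
The ordering and repeat-invocation rules of sections 8.4 and 8.6 can be sketched informally as follows. This non-normative Python illustration makes several assumptions that are not part of DA: the class name ConnectionResources, a max_connections limit as the only cause of Status=failure, string Status values, and a RuntimeError to flag an Enable_Datamover invocation without prior allocation. The optional Final_Login_Response_PDU qualifier is not modeled.

```python
from typing import Optional, Set


class ConnectionResources:
    """Connection-level Datamover resources, tracked per Connection_Handle."""

    def __init__(self, max_connections: int) -> None:
        self._max = max_connections      # hypothetical resource limit
        self._allocated: Set[int] = set()
        self._assisted: Set[int] = set()

    def allocate_connection_resources(
            self, handle: int,
            resource_descriptor: Optional[object] = None) -> str:
        if handle in self._allocated:
            # An earlier invocation succeeded: report success, do nothing.
            return "success"
        if len(self._allocated) >= self._max:
            return "failure"
        self._allocated.add(handle)
        return "success"

    def enable_datamover(self, handle: int,
                         transport_connection_descriptor: object) -> None:
        # The iSCSI layer is required to have allocated resources first.
        if handle not in self._allocated:
            raise RuntimeError("Enable_Datamover without prior allocation")
        self._assisted.add(handle)

    def is_assisted(self, handle: int) -> bool:
        return handle in self._assisted


dm = ConnectionResources(max_connections=1)
first = dm.allocate_connection_resources(handle=1)
repeat = dm.allocate_connection_resources(handle=1)  # ignored, still success
second = dm.allocate_connection_resources(handle=2)  # limit reached
dm.enable_datamover(1, transport_connection_descriptor="tcp-conn-1")
```

Here the repeated allocation for handle 1 reports success without allocating again, while handle 2 fails and therefore can never be enabled.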

The Final_Login_Response_PDU input qualifier is applicable only for a target, and contains the final Login Response that concludes the iSCSI Login phase and which must be sent as a byte stream as expected by the initiator iSCSI layer. When this qualifier is used, the target-Datamover layer MUST transmit this final Login Response before Datamover assistance is enabled for the Transport Connection.

The iSCSI layer identifies the specific Transport Connection associated with the Connection_Handle to the Datamover layer by specifying the Transport_Connection_Descriptor. The exact structure of this Descriptor is implementation-dependent.

8.7 Connection_Terminate

Input qualifiers: Connection_Handle

Return Results: Not specified.

By invoking this Operational Primitive, an iSCSI layer requests its local Datamover layer to terminate the Transport Connection and deallocate all the connection and task resources associated with the Connection_Handle. When this Operational Primitive invocation returns to the iSCSI layer, the iSCSI layer may assume the full ownership of all the iSCSI-level resources, e.g. I/O Buffers, associated with the connection. This Operational Primitive may be invoked only with a valid Connection_Handle and the Transport Connection associated with the Connection_Handle must already be Datamover-assisted.

8.8 Notice_Key_Values

Input qualifiers: Connection_Handle, Number of keys, a list of Key-Value pairs

Return Results: Not specified.

By invoking this Operational Primitive, an iSCSI layer requests its local Datamover layer to take note of the negotiated values of the listed keys for the Transport Connection. This Operational Primitive may be invoked only with a valid Connection_Handle and the Key-Value pairs MUST be the current values that were successfully agreed upon by the iSCSI peers for the connection.
The Datamover layer may use the values of the keys to aid the Datamover operation as it deems appropriate. The specific keys to be passed in as input qualifiers and the point(s) in time this Operational Primitive is invoked are implementation-dependent.

8.9 Deallocate_Task_Resources

Input qualifiers: Connection_Handle, ITT

Return Results: Not specified.

By invoking this Operational Primitive, an iSCSI layer requests its local Datamover Layer to deallocate all Datamover-specific resources that earlier may have been allocated for the task identified by the ITT qualifier. The iSCSI layer uses this Operational Primitive during exception processing when one or more active tasks are to be terminated without corresponding SCSI Response PDUs. This Primitive MUST be invoked for each active task terminated without a SCSI Response PDU. This Primitive MUST NOT be invoked by the iSCSI layer when a SCSI Response PDU normally concludes a task. When a SCSI Response PDU normally concludes a task (even if the SCSI Status was not a success), the Datamover layer is assumed to have automatically deallocated all Datamover-specific task resources for that task. Refer to section 7.2 for a related discussion on the Architectural assumptions on the task-level Datamover resource management, especially with respect to when the resources are assumed to be allocated.

9 Operational Primitives provided by the iSCSI layer

While the iSCSI specification itself does not have a notion of Operational Primitives, any iSCSI layer implementing the iSCSI specification would have to provide the following Operational Primitives to its local Datamover layer. Thus, any iSCSI protocol implementation compliant with this architecture MUST implement the Operational Primitives described in this section.
These Operational Primitives are invoked by the Datamover layer as appropriate and when the iSCSI connection is Datamover-assisted. Unless otherwise stated, all the following Operational Primitives may be used both on the initiator side and the target side. In general programming terminology, this set of Operational Primitives may be construed as "up calls".

1) Control_Notify

2) Connection_Terminate_Notify

3) Data_Completion_Notify

4) Data_ACK_Notify

9.1 Control_Notify

Input qualifiers: Connection_Handle, an iSCSI control-type PDU.

Return Results: Not specified.

A Datamover layer notifies its local iSCSI layer, via this Operational Primitive, of the arrival of an iSCSI control-type PDU from the peer Datamover layer on the remote iSCSI node. The iSCSI layer processes the control-type PDU as defined in [RFC3720].

A target iSCSI layer being notified of the arrival of a SCSI Command is an example of invoking the Control_Notify Operational Primitive.

Note that implementations may choose to describe the "iSCSI control-type PDU" qualifier in this notification using a Data_Descriptor (section 5.2) and not necessarily one contiguous buffer.

9.2 Connection_Terminate_Notify

Input qualifiers: Connection_Handle

Return Results: Not specified.

A Datamover layer notifies its local iSCSI layer of an unsolicited termination or failure of an iSCSI connection, providing the Connection_Handle associated with the iSCSI Connection. The iSCSI Layer MUST consider the Connection_Handle to be invalid upon being so notified. The iSCSI layer processes the connection termination as defined in [RFC3720]. The Datamover layer MUST deallocate the connection and task resources associated with the terminated connection before notifying the iSCSI layer of the termination via this Operational Primitive.
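
The ordering requirement above (deallocate first, notify second) can be sketched informally as follows. This non-normative Python illustration assumes two hypothetical classes, IscsiLayer and DatamoverLayer, a shared log used only to make the ordering visible, and an on_transport_failure hook standing in for whatever event the Datamover layer detects.

```python
from typing import Dict, List, Set


class IscsiLayer:
    def __init__(self, log: List[str]) -> None:
        self.valid_handles: Set[int] = set()
        self._log = log

    def connection_terminate_notify(self, handle: int) -> None:
        # Up call: the handle is treated as invalid from this point on.
        self.valid_handles.discard(handle)
        self._log.append(f"iscsi: handle {handle} invalidated")


class DatamoverLayer:
    def __init__(self, iscsi: IscsiLayer, log: List[str]) -> None:
        self._iscsi = iscsi
        self._log = log
        self.resources: Dict[int, List[str]] = {}

    def on_transport_failure(self, handle: int) -> None:
        # Release connection and task resources first ...
        self.resources.pop(handle, None)
        self._log.append(f"datamover: resources for handle {handle} released")
        # ... and only then notify the local iSCSI layer.
        self._iscsi.connection_terminate_notify(handle)


log: List[str] = []
iscsi = IscsiLayer(log)
datamover = DatamoverLayer(iscsi, log)

iscsi.valid_handles.add(5)
datamover.resources[5] = ["connection state", "task 0x10 state"]
datamover.on_transport_failure(5)   # e.g. triggered by a TCP RST
```

After the call, the log records the resource release ahead of the notification, and handle 5 is no longer in the iSCSI layer's set of valid handles.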

A target iSCSI layer being notified of an ungraceful connection termination by the Datamover layer when the underlying Transport Connection is torn down is an example of invoking the Connection_Terminate_Notify Operational Primitive. Such an invocation may be triggered, for example, by a TCP RESET in cases where the underlying Transport Connection uses TCP.

9.3 Data_Completion_Notify

Input qualifiers: Connection_Handle, ITT, SN

Return Results: Not specified.

A Datamover layer notifies its local iSCSI layer on completing the retrieval of the data or upon sending the data, as requested in a prior iSCSI data-type PDU, from/to the peer Datamover layer on the remote iSCSI node via this Operational Primitive. The iSCSI layer processes the operation as defined in [RFC3720].

SN may be either the DataSN associated with the SCSI Data-In PDU or R2TSN associated with the R2T PDU depending on the SCSI operation. Note that, for targets, a TTT (see [RFC3720]) could have been specified instead of an SN. However, the considered choice was to leave the SN to be the qualifier for two reasons - a) it is generic and applicable to initiators and targets as well as Data-in and Data-out, and b) having both SN and TTT qualifiers for the notification was considered onerous on the Datamover layer, in terms of state maintenance for each completion notification. The implication of this choice is that iSCSI target implementations will have to adapt to using the ITT-SN tuple in associating the solicited data to the appropriate task, rather than the ITT-TTT tuple for doing the same.

If Notify_Enable was set in either a Put_Data or a Get_Data invocation, the Datamover layer MUST invoke the Data_Completion_Notify Operational Primitive upon completing that requested data transfer.
If Notify_Enable was cleared in either a Put_Data or a Get_Data invocation, the Datamover layer MUST NOT invoke the Data_Completion_Notify Operational Primitive upon completing that requested data transfer.

A Data_Completion_Notify invocation serves to notify the iSCSI layer of the Put_Data or Get_Data completion respectively. As earlier noted in sections 8.2 and 8.3, specific Datamover protocol definitions may restrict the usage scope of Put_Data and Get_Data, and thus implicitly the usage scope of Data_Completion_Notify.

A target iSCSI layer being notified of the retrieval of a write data sequence is an example of invoking the Data_Completion_Notify Operational Primitive.

9.4 Data_ACK_Notify

Input qualifiers: Connection_Handle, ITT, DataSN

Return Results: Not specified.

A target Datamover layer notifies its local iSCSI layer of the arrival of a previously requested data acknowledgement from the peer Datamover layer on the remote (initiator) iSCSI node via this Operational Primitive. The iSCSI layer processes the data acknowledgement notification as defined in [RFC3720].

A target iSCSI layer being notified of the arrival of a data acknowledgement for a certain SCSI Read data PDU is the only example of invoking the Data_ACK_Notify Operational Primitive.

10 Datamover Interface (DI)

10.1 Overview

This chapter describes the interactions model between the iSCSI and Datamover layers when the iSCSI connection is Datamover-assisted, so that the iSCSI layer may carry out the following -

- send iSCSI data-type PDUs and exchange iSCSI control-type PDUs, and

- handle asynchronous notifications such as completion of data sequence transfer, and connection failure.

This chapter relies on the notion of Operational Primitives (section 5.4) to define DI.
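
As an informal, non-normative illustration of the Notify_Enable rule stated in section 9.3, the Python sketch below issues Data_Completion_Notify only for transfers whose invocation had Notify_Enable set. It is a simplification under stated assumptions: the Datamover class and its callback are hypothetical, the PDU header contents are reduced to the ITT and the sequence number, the Data_Descriptor is omitted, and each transfer is modeled as completing immediately rather than asynchronously.

```python
from typing import Callable, List, Tuple

Completion = Tuple[int, int, int]  # (Connection_Handle, ITT, SN)


class Datamover:
    def __init__(self,
                 data_completion_notify: Callable[[int, int, int], None]) -> None:
        # Up call into the local iSCSI layer (hypothetical callback).
        self._notify = data_completion_notify

    def put_data(self, handle: int, itt: int, data_sn: int,
                 notify_enable: bool) -> None:
        # The Data-In payload would be moved to the peer here.
        self._complete(handle, itt, data_sn, notify_enable)

    def get_data(self, handle: int, itt: int, r2t_sn: int,
                 notify_enable: bool) -> None:
        # The solicited Data-out payload would be retrieved here.
        self._complete(handle, itt, r2t_sn, notify_enable)

    def _complete(self, handle: int, itt: int, sn: int,
                  notify_enable: bool) -> None:
        # Notify if and only if the invocation asked for it.
        if notify_enable:
            self._notify(handle, itt, sn)


completions: List[Completion] = []
mover = Datamover(lambda h, itt, sn: completions.append((h, itt, sn)))

mover.put_data(handle=1, itt=0x20, data_sn=0, notify_enable=False)
mover.put_data(handle=1, itt=0x20, data_sn=1, notify_enable=True)
mover.get_data(handle=1, itt=0x21, r2t_sn=0, notify_enable=True)
```

Only the second Put_Data and the Get_Data produce a notification, each identified by the ITT-SN tuple discussed in section 9.3.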

10.2 Interactions for handling asynchronous notifications

10.2.1 Connection termination

As stated in section 9.2, the Datamover layer notifies the iSCSI layer of a failed or terminated connection via the Connection_Terminate_Notify Operational Primitive. The iSCSI layer MUST consider the connection as unusable upon the invocation of this Primitive and handle the connection termination as specified in [RFC3720].

10.2.2 Data transfer completion

As stated in section 9.3, the Datamover layer notifies the iSCSI layer of a completed data transfer operation via the Data_Completion_Notify Operational Primitive. The iSCSI layer processes the transfer completion as specified in [RFC3720].

10.2.2.1 Completion of a requested SCSI Data transfer

The Datamover layer, to notify the iSCSI layer of the completion of a requested iSCSI data-type PDU transfer, uses the Data_Completion_Notify Operational Primitive with the following input qualifiers.

a) Connection_Handle

b) ITT: Initiator Task Tag semantics as defined in [RFC3720]

c) SN: DataSN for a SCSI Data-in/Data-out PDU, and R2TSN for an iSCSI R2T PDU. The semantics for both types of sequence numbers are as defined in [RFC3720].

The rationale for choosing SN is explained in section 9.3.

Every invocation of the Data_Completion_Notify Operational Primitive MUST be preceded by an invocation of the Put_Data or Get_Data Operational Primitive with the Notify_Enable qualifier set by the iSCSI layer at an earlier point in time.

10.2.3 Data acknowledgement

[RFC3720] allows the iSCSI targets to optionally solicit data acknowledgement from the initiator for one or more Data-in PDUs, via setting of the A-bit on a Data-in PDU.
The Data_ACK_Notify Operational Primitive with the following input qualifiers is used by the target Datamover layer to notify the local iSCSI layer of the arrival of a previously solicited iSCSI read data acknowledgement. This Operational Primitive thus is applicable only to iSCSI targets.

a) Connection_Handle

b) ITT: Initiator Task Tag semantics as defined in [RFC3720]

c) DataSN: of the next SCSI Data-in PDU which immediately follows the SCSI Data-in PDU with the A-bit set to which this notification corresponds, with semantics as defined in [RFC3720].

Every invocation of the Data_ACK_Notify Operational Primitive MUST be preceded by an invocation of the Put_Data Operational Primitive by the iSCSI target layer with the A-bit set to 1 at an earlier point in time.

10.3 Interactions for sending an iSCSI PDU

This section discusses the interactions model for sending each of the iSCSI PDUs defined in [RFC3720]. A Connection_Handle (see section 5.3) is assumed to qualify each of these interactions so that the Datamover layer can route it to the appropriate Transport Connection. The qualifying Connection_Handle is not explicitly listed in the subsequent sections.

Note that the defined list of input qualifiers represents the semantically required set for the Datamover layer to consider in implementing the Primitive in each interaction described in this section (see section 5.4 for an elaboration). Implementations may choose to deduce the qualifiers in ways that are optimized for the implementation specifics. Two examples of this are:

1. For SCSI Command (section 10.3.1), deducing the ImmediateDataSize input qualifier from the DataSegmentLength field of the SCSI Command PDU.

2. For SCSI Data-Out (section 10.3.5.1), deducing the DataDescriptorOut input qualifier from the associated SCSI Command invocation qualifiers (assuming such state is maintained) in conjunction with BHS fields of the SCSI Data-out PDU.

10.3.1 SCSI Command

The Send_Control Operational Primitive with the following input qualifiers is used for requesting the transmission of a SCSI Command PDU.

a) BHS and AHS, if any, of the SCSI Command PDU as defined in [RFC3720]

b) DataDescriptorOut: that defines the I/O Buffer meant for Data-out for the entire command, in the case of a write or bidirectional command

c) DataDescriptorIn: that defines the I/O Buffer meant for Data-in for the entire command, in the case of a read or bidirectional command

d) ImmediateDataSize: that defines the number of octets of immediate unsolicited data for a write/bidirectional command

e) UnsolicitedDataSize: that defines the number of octets of immediate and non-immediate unsolicited data for a write/bidirectional command.

10.3.2 SCSI Response

The Send_Control Operational Primitive with the following input qualifiers is used for requesting the transmission of a SCSI Response PDU.

a) BHS of the SCSI Response PDU as defined in [RFC3720]

b) DataDescriptorStatus: that defines the iSCSI buffer which contains the sense and response information for the command

10.3.3 Task Management Function Request

The Send_Control Operational Primitive with the following input qualifiers is used for requesting the transmission of a Task Management Function Request PDU.

a) BHS of the Task Management Function Request PDU as defined in [RFC3720]

b) DataDescriptorOut: that defines the I/O Buffer meant for Data-out for the entire command, in the case of a write or bidirectional command (Only valid if Function="TASK REASSIGN" - [RFC3720])

c) DataDescriptorIn: that defines the I/O Buffer meant for Data-in for the entire command, in the case of a read or bidirectional command (Only valid if Function="TASK REASSIGN" - [RFC3720])

10.3.4 Task Management Function Response

The Send_Control Operational Primitive with the following input qualifier is used for requesting the transmission of a Task Management Function Response PDU.

a) BHS of the Task Management Function Response PDU as defined in [RFC3720]

10.3.5 SCSI Data-out & SCSI Data-in

10.3.5.1 SCSI Data-out

The Send_Control Operational Primitive with the following input qualifiers is used by the initiator iSCSI layer for requesting the transmission of a SCSI Data-out PDU carrying the non-immediate unsolicited data.

a) BHS of the SCSI Data-out PDU as defined in [RFC3720]

b) DataDescriptorOut: that defines the I/O Buffer with the Data-out to be carried in the iSCSI data segment of the PDU

10.3.5.2 SCSI Data-in

The Put_Data Operational Primitive with the following input qualifiers is used by the target iSCSI layer for requesting the transmission of the data carried by a SCSI Data-in PDU.

a) BHS of the SCSI Data-in PDU as defined in [RFC3720]

b) DataDescriptorIn: that defines the I/O Buffer with the Data-in being requested for transmission

10.3.6 Ready To Transfer (R2T)

The Get_Data Operational Primitive with the following input qualifiers is used by the target iSCSI layer for requesting the retrieval of the data as specified by the semantic content of an R2T PDU.

a) BHS of the Ready To Transfer PDU as defined in [RFC3720]

b) DataDescriptorOut: that defines the I/O Buffer for the Data-out being requested for retrieval

10.3.7 Asynchronous Message

The Send_Control Operational Primitive with the following input qualifiers is used for requesting the transmission of an Asynchronous Message PDU.

a) BHS of the Asynchronous Message PDU as defined in [RFC3720]

b) DataDescriptorSense: that defines an iSCSI buffer which contains the sense and iSCSI Event information.

10.3.8 Text Request

The Send_Control Operational Primitive with the following input qualifiers is used for requesting the transmission of a Text Request PDU.

a) BHS of the Text Request PDU as defined in [RFC3720]

b) DataDescriptorTextOut: that defines the iSCSI Text Request buffer

10.3.9 Text Response

The Send_Control Operational Primitive with the following input qualifiers is used for requesting the transmission of a Text Response PDU.

a) BHS of the Text Response PDU as defined in [RFC3720]

b) DataDescriptorTextIn: that defines the iSCSI Text Response buffer

10.3.10 Login Request

The Send_Control Operational Primitive with the following input qualifiers is used for requesting the transmission of a Login Request PDU.

a) BHS of the Login Request PDU as defined in [RFC3720]

b) DataDescriptorLoginRequest: that defines the iSCSI Login Request buffer

Note that specific Datamover protocols may choose to disallow the standard DA Primitives from being used for the iSCSI Login phase. When used in conjunction with such Datamover protocols, an attempt to send a Login Request via the Send_Control Operational Primitive invocation is clearly an error scenario, as the Login Request PDU is being sent while the connection is in the iSCSI full feature phase.
   It is outside the scope of this document to specify the
   resulting implementation behavior in this case - [RFC3720]
   already defines the error handling for this error scenario.

10.3.11 Login Response

   The Send_Control Operational Primitive with the following
   input qualifiers is used for requesting the transmission of a
   Login Response PDU.

   a) BHS of the Login Response PDU as defined in [RFC3720]

   b) DataDescriptorLoginResponse: that defines the iSCSI Login
      Response buffer

   Note that specific Datamover protocols may choose to disallow
   the standard DA Primitives from being used for the iSCSI
   Login phase. When used in conjunction with such Datamover
   protocols, an attempt to send a Login Response via the
   Send_Control Operational Primitive invocation is clearly an
   error scenario, as the Login Response PDU is being sent while
   in the iSCSI full feature phase. It is outside the scope of
   this document to specify the resulting implementation
   behavior in this case - [RFC3720] already defines the error
   handling for this error scenario.

10.3.12 Logout Command

   The Send_Control Operational Primitive with the following
   input qualifier is used for requesting the transmission of a
   Logout Command PDU.

   a) BHS of the Logout Command PDU as defined in [RFC3720]

10.3.13 Logout Response

   The Send_Control Operational Primitive with the following
   input qualifier is used for requesting the transmission of a
   Logout Response PDU.

   a) BHS of the Logout Response PDU as defined in [RFC3720]

10.3.14 SNACK Request

   The Send_Control Operational Primitive with the following
   input qualifier is used for requesting the transmission of a
   SNACK Request PDU.

   a) BHS of the SNACK Request PDU as defined in [RFC3720]

10.3.15 Reject

   The Send_Control Operational Primitive with the following
   input qualifiers is used for requesting the transmission of a
   Reject PDU.

   a) BHS of the Reject PDU as defined in [RFC3720]

   b) DataDescriptorReject: that defines the iSCSI Reject buffer

10.3.16 NOP-Out

   The Send_Control Operational Primitive with the following
   input qualifiers is used for requesting the transmission of a
   NOP-Out PDU.

   a) BHS of the NOP-Out PDU as defined in [RFC3720]

   b) DataDescriptorNOPOut: that defines the iSCSI Ping data
      buffer

10.3.17 NOP-In

   The Send_Control Operational Primitive with the following
   input qualifiers is used for requesting the transmission of a
   NOP-In PDU.

   a) BHS of the NOP-In PDU as defined in [RFC3720]

   b) DataDescriptorNOPIn: that defines the iSCSI Return Ping
      data buffer

10.4 Interactions for receiving an iSCSI PDU

   The only PDUs that are received by an iSCSI layer operating
   on a Datamover layer are the iSCSI control-type PDUs. The
   Datamover layer delivers the iSCSI control-type PDUs as they
   arrive, qualifying each with the Connection_Handle (see
   section 5.3) that identifies the iSCSI connection the PDU is
   meant for. The subsequent processing of the iSCSI
   control-type PDUs proceeds as defined in [RFC3720].

10.4.1 General Control-type PDU notification

   This sub-section describes the general mechanics applicable
   to several control-type PDUs. The following sub-sections
   note additional considerations for control-type PDUs not
   covered in this sub-section.

   The Control_Notify Operational Primitive is used for
   notifying the arrival of the following iSCSI control-type
   PDUs: SCSI Command, SCSI Response, Task Management Function
   Request, Task Management Function Response, Asynchronous
   Message, Text Request, Text Response, Logout Command, Logout
   Response, SNACK, Reject, NOP-Out, NOP-In.

10.4.2 SCSI Data Transfer PDUs

10.4.2.1 SCSI Data-out

   The Control_Notify Operational Primitive is used for
   notifying the iSCSI layer of the arrival of a SCSI Data-out
   PDU carrying the non-immediate unsolicited data. Note
   however that the solicited SCSI Data-out arriving on the
   target is not notified to the iSCSI layer using the
   Control_Notify Primitive because the solicited SCSI Data-out
   was not sent by the initiator iSCSI layer as a control-type
   PDU.

10.4.2.2 SCSI Data-in

   The arrival of the SCSI Data-in is not notified to the iSCSI
   layer by the Datamover layer at the initiator, because SCSI
   Data-in is an iSCSI data-type PDU (see section 5.1). The
   iSCSI layer at the initiator however may infer the arrival of
   the SCSI Data-in when it receives a subsequent notification
   of the SCSI Response PDU via a Control_Notify invocation.

   While this document does not contemplate the possibility of a
   Data-in PDU being received at the initiator iSCSI layer,
   specific Datamover protocols may define how to deal with an
   unexpected inbound SCSI Data-in PDU that may result in the
   initiator iSCSI layer receiving the Data-in PDU. This
   document leaves the details of handling this error scenario
   to the specific Datamover protocols, so each may define the
   appropriate error handling specific to the Datamover
   environment.
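
   The notification rules of sections 10.4.1, 10.4.2.1 and
   10.4.2.2 can be summarized in a short sketch. This is an
   illustration only, not part of the DA model: the Python
   names (Pdu, GENERAL_CONTROL_PDUS, control_notify_expected)
   and the "solicited" flag are assumptions introduced here,
   and the sketch covers only the PDU types discussed so far.

```python
from enum import Enum, auto


class Pdu(Enum):
    """PDU types named in sections 10.4.1 and 10.4.2 (hypothetical names)."""
    SCSI_COMMAND = auto()
    SCSI_RESPONSE = auto()
    TMF_REQUEST = auto()
    TMF_RESPONSE = auto()
    ASYNC_MESSAGE = auto()
    TEXT_REQUEST = auto()
    TEXT_RESPONSE = auto()
    LOGOUT_COMMAND = auto()
    LOGOUT_RESPONSE = auto()
    SNACK = auto()
    REJECT = auto()
    NOP_OUT = auto()
    NOP_IN = auto()
    SCSI_DATA_OUT = auto()
    SCSI_DATA_IN = auto()


# The thirteen control-type PDUs listed in section 10.4.1.
GENERAL_CONTROL_PDUS = frozenset(
    set(Pdu) - {Pdu.SCSI_DATA_OUT, Pdu.SCSI_DATA_IN}
)


def control_notify_expected(pdu: Pdu, solicited: bool = False) -> bool:
    """Return True if arrival of `pdu` is notified via Control_Notify.

    `solicited` is only meaningful for SCSI Data-out.
    """
    if pdu in GENERAL_CONTROL_PDUS:
        return True
    if pdu is Pdu.SCSI_DATA_OUT:
        # Only non-immediate unsolicited Data-out is sent as a
        # control-type PDU; solicited Data-out is not notified.
        return not solicited
    # SCSI Data-in is a data-type PDU and is not notified; the
    # initiator infers its arrival from the later SCSI Response.
    return False
```

   For example, control_notify_expected(Pdu.SCSI_DATA_OUT,
   solicited=True) evaluates to False, matching the statement
   that solicited Data-out is not notified to the iSCSI layer.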

10.4.2.3 Ready To Transfer (R2T)

   Because an R2T PDU is an iSCSI data-type PDU (see section
   5.1) that is not delivered as-is to the initiator iSCSI
   layer, the arrival of an R2T PDU is not notified to the iSCSI
   layer by the Datamover layer. When an iSCSI node sends an
   R2T PDU to its local Datamover layer, the local and remote
   Datamover layers transparently bring about the data transfer
   requested by the R2T PDU.

   While this document does not contemplate the possibility of
   an R2T PDU being received at the initiator iSCSI layer,
   specific Datamover protocols may define how to deal with an
   unexpected inbound R2T PDU that may result in the initiator
   iSCSI layer receiving the R2T PDU. This document leaves the
   details of handling this error scenario to the specific
   Datamover protocols, so each may define the appropriate error
   handling specific to the Datamover environment.

10.4.3 Login Request

   The Control_Notify Operational Primitive is used for
   notifying the target iSCSI layer of the arrival of a Login
   Request PDU. Note that specific Datamover protocols may
   choose to disallow the standard DA Primitives from being used
   for the iSCSI Login phase. When used in conjunction with
   such Datamover protocols, the arrival of a Login Request
   necessitating the Control_Notify Operational Primitive
   invocation is clearly an error scenario, as the Login Request
   PDU is arriving in the iSCSI full feature phase. It is
   outside the scope of this document to specify the resulting
   implementation behavior in this case - [RFC3720] already
   defines the error handling in this error scenario.

10.4.4 Login Response

   The Control_Notify Operational Primitive is used for
   notifying the initiator iSCSI layer of the arrival of a Login
   Response PDU.
   Note that specific Datamover protocols may choose to disallow
   the standard DA Primitives from being used for the iSCSI
   Login phase. When used in conjunction with such Datamover
   protocols, the arrival of a Login Response necessitating the
   Control_Notify Operational Primitive invocation is clearly an
   error scenario, as the Login Response PDU is arriving in the
   iSCSI full feature phase. It is outside the scope of this
   document to specify the resulting implementation behavior in
   this case - [RFC3720] already defines the error handling in
   this error scenario.

11 Security Considerations

11.1 Architectural Considerations

   DA enables compliant iSCSI implementations to realize a
   control and data separation in the way they interact with
   their Datamover protocols. Note however that this separation
   does not imply a separation in transport mediums between
   control traffic and data traffic - basic iSCSI architecture
   with respect to tasks and PDU relationships to tasks remains
   unchanged. [RFC3720] defines several MUST requirements on
   ordering relationships across control and data for a given
   task besides a mandatory deterministic task allegiance model
   - DA does not change this basic architecture (DA has a
   normative reference to [RFC3720]) nor allow any additional
   flexibility in compliance in this area. To summarize,
   sending bulk data transfers (prompted by Put_Data and
   Get_Data Primitive invocations) on a different transport
   medium would be as ill-advised as sending just the
   Data-out/Data-in PDUs on a different TCP connection in
   RFC 3720-based iSCSI implementations. Consequently, all the
   iSCSI-related security text in [RFC3723] is directly
   applicable to a DA-enabled iSCSI implementation.

   Another area with security implications is the Datamover
   connection resource management model which DA defines -
   particularly the Allocate_Connection_Resources Primitive. A
   careless realization of this model could leave an iSCSI
   implementation exposed to denial of service attacks. As
   Figure 2 and Figure 3 in section 16.2 illustrate, the most
   effective countermeasure to this potential attack consists of
   performing the Datamover resource allocation when the iSCSI
   layer is sufficiently far along in the iSCSI Login Phase that
   it is reasonably certain that the peer side is not an
   attacker. In particular, if the Login Phase includes a
   SecurityNegotiation stage, an iSCSI end node MUST defer the
   Datamover connection resource allocation (i.e. invoking the
   Allocate_Connection_Resources Primitive) to the
   LoginOperationalNegotiation stage ([RFC3720]) so that the
   resource allocation happens post-authentication. This
   considerably reduces the potential for a denial of service
   attack.

11.2 Wire Protocol Considerations

   In view of the fact that the DA architecture itself does not
   define any new wire protocol nor propose modifications to the
   existing protocols, there are no additional wire protocol
   security considerations in employing DA itself. However, a
   DA-compliant iSCSI implementation MUST comply with all the
   iSCSI-related requirements stipulated in [RFC3723] and
   [RFC3720]. Note further that in realizing DA, each Datamover
   protocol must define and elaborate as appropriate on any
   additional security considerations resulting from the use of
   that Datamover protocol.

   All Datamover protocol designers are strongly recommended to
   refer to [RDDPSEC] for the types of security issues to
   consider.
   While [RDDPSEC] elaborates on the security considerations
   applicable to an RDDP-based Datamover ([iSER]), the document
   is representative of the type of analysis of resource
   exhaustion and the application of countermeasures that needs
   to be done for any Datamover protocol.

12 IANA Considerations

   DA architecture does not have any IANA considerations.

13 References and Bibliography

13.1 Normative References

   [RFC3720] J. Satran, K. Meth, C. Sapuntzakis, M. Chadalapaka,
        E. Zeidner, "Internet Small Computer Systems Interface
        (iSCSI)", RFC 3720, April 2004.

   [RFC3723] B. Aboba, J. Tseng, J. Walker, V. Rangan, F.
        Travostino, "Securing Block Storage Protocols over IP",
        RFC 3723, April 2004.

   [RFC2119] S. Bradner, "Key words for use in RFCs to Indicate
        Requirement Levels", BCP 14, RFC 2119, March 1997.

13.2 Informative References

   [DDP] H. Shah et al., "Direct Data Placement over Reliable
        Transports", IETF Internet Draft
        draft-ietf-rddp-ddp-06.txt (work in progress), June 2006.

   [iSER] M. Ko et al., "iSCSI Extensions for RDMA", IETF
        Internet Draft draft-ietf-ips-iser-03.txt (work in
        progress), April 2005.

   [RDDPSEC] J. Pinkerton et al., "DDP/RDMAP Security", IETF
        Internet Draft draft-ietf-rddp-security-07.txt (work in
        progress), April 2005.

14 Authors' Addresses

   Mallikarjun Chadalapaka
   Hewlett-Packard Company
   8000 Foothills Blvd.
   Roseville, CA 95747-5668, USA
   Phone: +1-916-785-5621
   Email: cbm@rose.hp.com

   John L. Hufferd
   IBM
   San Jose CA, USA
   Phone: +1-408-256-0403
   Email: hufferd@us.ibm.com

   Julian Satran
   IBM, Haifa Research Lab
   Haifa University Campus - Mount Carmel
   Haifa 31905, Israel
   Phone: +972-4-829-6264
   Email: Julian_Satran@il.ibm.com

   Hemal Shah
   Intel Corporation
   MS PTL1
   1501 South Mopac Expressway, #400
   Austin, TX 78746 USA
   Phone: +1 (512) 732-3963
   Email: hemal.shah@intel.com

   Comments may be sent to Mallikarjun Chadalapaka.

15 Acknowledgements

   The IP Storage (ips) Working Group in the Transport Area of
   IETF has been responsible for defining the iSCSI protocol
   (apart from a host of other relevant IP Storage protocols).
   The authors are grateful to the entire working group, whose
   work allowed this document to build on the concepts and
   details of the iSCSI protocol.

   In addition, the following individuals reviewed and
   contributed to the improvement of this document. The authors
   are grateful for their contribution.

   John Carrier
   Adaptec, Inc.
   691 S. Milpitas Blvd., Milpitas, CA 95035 USA
   Phone: +1 (360) 378-8526
   Email: john_carrier@adaptec.com

   Hari Ghadia
   Adaptec, Inc.
   691 S. Milpitas Blvd., Milpitas, CA 95035 USA
   Phone: +1 (408) 957-5608
   Email: hari_ghadia@adaptec.com

   Hari Mudaliar
   Adaptec, Inc.
   691 S. Milpitas Blvd., Milpitas, CA 95035 USA
   Phone: +1 (408) 957-6012
   Email: hari_mudaliar@adaptec.com

   Patricia Thaler
   Agilent Technologies, Inc.
   1101 Creekside Ridge Drive, #100, M/S-RG10,
   Roseville, CA 95678
   Phone: +1-916-788-5662
   Email: pat_thaler@agilent.com

   Uri Elzur
   Broadcom Corporation
   16215 Alton Parkway, Irvine, CA 92619-7013 USA
   Phone: +1 (949) 585-6432
   Email: Uri@Broadcom.com

   Mike Penna
   Broadcom Corporation
   16215 Alton Parkway, Irvine, CA 92619-7013 USA
   Phone: +1 (949) 926-7149
   Email: MPenna@Broadcom.com

   David Black
   EMC Corporation
   176 South St., Hopkinton, MA 01748, USA
   Phone: +1 (508) 293-7953
   Email: black_david@emc.com

   Ted Compton
   EMC Corporation
   Research Triangle Park, NC 27709, USA
   Phone: +1-919-248-6075
   Email: compton_ted@emc.com

   Dwight Barron
   Hewlett-Packard Company
   20555 SH 249, Houston, TX 77070-2698 USA
   Phone: +1 (281) 514-2769
   Email: Dwight.Barron@Hp.com

   Paul R. Culley
   Hewlett-Packard Company
   20555 SH 249, Houston, TX 77070-2698 USA
   Phone: +1 (281) 514-5543
   Email: paul.culley@hp.com

   Dave Garcia
   Hewlett-Packard Company
   19333 Vallco Parkway, Cupertino, CA 95014 USA
   Phone: +1 (408) 285-6116
   Email: dave.garcia@hp.com

   Randy Haagens
   Hewlett-Packard Company
   8000 Foothills Blvd, MS 5668, Roseville CA
   Phone: +1-916-785-4578
   Email: randy_haagens@hp.com

   Jeff Hilland
   Hewlett-Packard Company
   20555 SH 249, Houston, TX 77070-2698 USA
   Phone: +1 (281) 514-9489
   Email: jeff.hilland@hp.com

   Mike Krause
   Hewlett-Packard Company, 43LN
   19410 Homestead Road, Cupertino, CA 95014 USA
   Phone: +1 (408) 447-3191
   Email: krause@cup.hp.com

   Jim Wendt
   Hewlett-Packard Company
   8000 Foothills Blvd, MS 5668, Roseville CA
   Phone: +1-916-785-5198
   Email: jim_wendt@hp.com

   Mike Ko
   IBM
   650 Harry Rd, San Jose, CA 95120
   Phone: +1 (408) 927-2085
   Email: mako@us.ibm.com

   Renato Recio
   IBM Corporation
   11501 Burnett Road, Austin, TX 78758 USA
   Phone: +1 (512) 838-1365
   Email: recio@us.ibm.com

   Howard C. Herbert
   Intel Corporation
   MS CH7-404, 5000 West Chandler Blvd., Chandler, AZ 85226 USA
   Phone: +1 (480) 554-3116
   Email: howard.c.herbert@intel.com

   Dave Minturn
   Intel Corporation
   MS JF1-210, 5200 North East Elam Young Parkway
   Hillsboro, OR 97124 USA
   Phone: +1 (503) 712-4106
   Email: dave.b.minturn@intel.com

   James Pinkerton
   Microsoft Corporation
   One Microsoft Way, Redmond, WA 98052 USA
   Phone: +1 (425) 705-5442
   Email: jpink@microsoft.com

   Tom Talpey
   Network Appliance
   375 Totten Pond Road, Waltham, MA 02451 USA
   Phone: +1 (781) 768-5329
   Email: thomas.talpey@netapp.com

16 Appendix

16.1 Design considerations for a Datamover protocol

   This section discusses the specific considerations for
   RDMA-based and RDDP-based Datamover protocols.

   a) Note that the modeling of interactions for SCSI Data-Out
      (section 10.3.5.1) is only used for unsolicited data
      transfer.

   b) The modeling of interactions for SNACK (section 10.3.14,
      and section 10.4.1) is not expected to be used given that
      one of the design requirements on the Datamover is that it
      "guarantees an error-free, reliable, in-order transport
      mechanism" (section 6).
      The interactions for sending and receiving a SNACK are
      nevertheless modeled in this document because the
      receiving iSCSI layer can deterministically deal with an
      inadvertent SNACK. This also shows the DA designers'
      intent that DI is not meant to filter certain types of
      PDUs.

   c) The onus is on a reliable Datamover (per requirements
      stated in section 6) to realize end-to-end data
      acknowledgements via Datamover-specific means. In view of
      this, even data-ACK-type SNACKs are unnecessary.
      Consequently, an initiator may never need to request
      sending a SNACK Request in this model, assuming that the
      proactive (timeout-driven) SNACK functionality is turned
      off in the legacy iSCSI code.

   d) Note that the current DA model for bootstrapping a
      Connection_Handle into service - i.e. associating a new
      iSCSI connection with a Connection_Handle - clearly implies
      that the iSCSI connection must already be in full feature
      phase when the Datamover layer comes into the stack. This
      further implies that the iSCSI login phase must be carried
      out in the traditional "Byte streaming mode" with no
      assistance or involvement from the Datamover layer.

16.2 Examples of Datamover interactions

   The figures described in this section provide some examples
   of the usage of Operational Primitives in interactions
   between the iSCSI layer and the Datamover layer. The
   following abbreviations are used in this section.

   Avail        - Available
   Abted        - Aborted
   Buf          - I/O Buffer
   Cmd          - Command
   Compl        - Complete
   Conn         - Connection
   Ctrl_Ntfy    - Control_Notify
   Dal_Tk_Res   - Deallocate_Task_Resources
   Data_Cmp_Nfy - Data_Completion_Notify
   Data_ACK_Nfy - Data_ACK_Notify
   DM           - Datamover
   Imm          - Immediate
   Snd_Ctrl     - Send_Control
   Msg          - Message
   Resp         - Response
   Sol          - Solicited
   TMF Req      - Task Management Function Request
   TMF Res      - Task Management Function Response
   Trans        - Transfer
   Unsol        - Unsolicited

   |   | Allocate_Connection_Resources  | D |      ^
   |   |------------------------------->| a |      |
   |   |    Connection resources are    | t |      |
   | i |     successfully allocated     | a |      | iSCSI
   | S |                                | m |      | Login
   | C |                                | o |      | Phase
   | S |                                | v |      |
   | I |                                | e |      |
   |   |                                | r |      | Login Phase
   | L | Final Login Response (success)            v succeeds
   | a |<------------------------------------------^
   | y |                                | L |      | iSCSI
   | e |        Enable_Datamover        | a |      | Full
   | r |------------------------------->| y |      | Feature
   |   |      Datamover is enabled      | e |      | Phase
   |   |                                | r |      |
   |   |       Full Feature Phase       |   |      |
   |   |   control and data Transfer    |   |      v

       Figure 2  A successful iSCSI login on initiator

   |   |       Notice_Key_Values        |   |      |
   |   |------------------------------->|   |      |
   |   |  Datamover layer is notified   |   |      |
   |   |  of the negotiated key values  |   |      |
   |   |                                |   |      |
   |   | Allocate_Connection_Resources  |   |      |
   |   |------------------------------->| D |      |
   |   |    Connection resources are    | a |      |
   | i |     successfully allocated     | t |      | iSCSI
   | S |                                | a |      | Login
   | C |                                | m |Final | Phase
   | S |                                | o |Login |
   | I |Enable_Datamover(Login Response)| v |Resp  |
   |   |------------------------------->| e |----->v Login Phase
   | L |      Datamover is enabled      | r |      ^ succeeds
   | a |                                |   |      |
   | y |                                | L |      | iSCSI
   | e |                                | a |      | Full
   | r |                                | y |      | Feature
   |   |                                | e |      | Phase
   |   |       Full Feature Phase       | r |      |
   |   |   control and data Transfer    |   |      |
   |   |                                |   |      v

       Figure 3  A successful iSCSI login on target

   |   | Allocate_Connection_Resources  | D |      ^
   |   |------------------------------->| a |      |
   |   |    Connection resources are    | t |      |
   | i |     successfully allocated     | a |      | iSCSI
   | S |                                | m |      | Login
   | C |                                | o |      | Phase
   | S |                                | v |      |
   | I |                                | e |      |
   |   |                                | r |      | Login
   |   |                                |   |      | Phase
   | L | Final Login Response (failure)            v fails
   | a |<-------------------------------------------
   | y |                                | L |
   | e | Deallocate_Connection_Resources| a |
   | r |------------------------------->| y |
   |   |       Datamover-specific       | e |
   |   |   connection resources freed   | r |
   |   |                                |   |
   |   |
   |   | Connection terminated by standard means
   |   |------------------------------------------->

       Figure 4  A failed iSCSI login on initiator

   |   | Allocate_Connection_Resources  | D |      ^
   |   |------------------------------->| a |      |
   |   |    Connection resources are    | t |      |
   | i |     successfully allocated     | a |      | iSCSI
   | S |                                | m |      | Login
   | C |                                | o |      | Phase
   | S |                                | v |      |
   | I |                                | e |      |
   |   |                                | r |      | Login
   |   |                                |   |      | Phase
   | L | Final Login Response (failure)            v fails
   | a |------------------------------------------->
   | y |                                | L |
   | e | Deallocate_Connection_Resources| a |
   | r |------------------------------->| y |
   |   |       Datamover-specific       | e |
   |   |   connection resources freed   | r |
   |   |                                |   |
   |   |
   |   | Connection terminated by standard means
   |   |------------------------------------------->

       Figure 5  A failed iSCSI login on target

   |   | Allocate_Connection_Resources  | D |      ^
   |   |------------------------------->| a |      |
   |   |    Connection resources are    | t |      |
   | i |     successfully allocated     | a |      | iSCSI
   | S |                                | m |      | Login
   | C |                                | o |      | Phase
   | S |                                | v |      |
   | I |                                | e |      |
   |   |                                | r |      |
   | L | Login non-Final Request/Response          |
   | a |<------------------------------------------|
   | y |   iSCSI layer decides not to   | L |      |
   | e |   enable Datamover for this    | a |      |
   | r |           connection           | y |      |
   |   |                                | e |      |
   |   | Deallocate_Connection_Resources| r |      |
   |   |------------------------------->|   |      |
   |   |     All Datamover-specific     |   |      |
   |   |     resources deallocated      |   |      |
   |   |                                |   |      | Login
   |   |                                |   |      | Phase
   |   |                                           | continues
   |   | Regular Login negotiation continues       |
   |   |<----------------------------------------->|
   |   |                   .
   |   |                   .
   |   |                   .

       Figure 6  iSCSI does not enable the Datamover

   |   |                                |   |      ^
   |   |  Full Feature Phase Control &  |   |      |
   |   |     Data Transfer Using DM     | D |      | iSCSI
   |   |                                | a |      | Full Feature
   | i |                                | t |      | Phase
   | S |                                | a |      | (DM Enabled)
   | C |                                | m |      |
   | S |    Successful iSCSI Logout     | o |      |
   | I |                                | v |      v
   |   |      Connection_Terminate      | e |
   | L |------------------------------->| r |
   | a |    Connection is terminated    |   |
   | y |  Datamover-specific resources  | L | Transport
   | e |  deallocated, both connection  | a | Connection
   | r |       level & task level       | y | is terminated
   |   |                                | e |
   |   |                                | r |
   |   |                                |   |
   |   |                                |   |

       Figure 7  A normal iSCSI connection termination

   |   |                                |   |      ^
   |   |  Full Feature Phase Control &  | D |      | iSCSI
   |   |     Data Transfer Using DM     | a |      | Full Feature
   | i |                                | t |      | Phase
   | S |                                | a |      | (DM Enabled)
   | C |                                | m |      v
   | S |                                | o |<--Transport
   | I |  Datamover-specific resources  | v |   Connection
   |   |  deallocated, both connection  | e |   Terminated (e.g.
   | L |       level & task level       | r |   unexpected
   | a |                                |   |   FIN/RESET)
   | y |                                | L |
   | e |  Connection_Terminate_Notify   | a |
   | r |<-------------------------------| y |
   |   |                                | e |
   |   |                                | r |
   |   |                                |   |

       Figure 8  An abnormal iSCSI connection termination

      <-----Initiator----->              <-------Target------->

        |  |          |  | DM Msg holding |  |            |  |
   SCSI |  |          |  | SCSI Cmd PDU & |  |            |  |SCSI
   Cmd  |  | Snd_Ctrl |  |Unsol Imm Data  |  |Ctrl_Notify |  |Cmd
   ---->|  |--------->|  |--------------->|  |----------->|  |--->
        |  |          |  |                |  |            |  |
        |  |          |  | DM Msg holding |  |            |  |
        |  | Snd_Ctrl |  |SCSI Dataout PDU|  |Ctrl_Notify |  |
        |  |--------->|  |--------------->|  |----------->|  |
        |  |    .     |  |       .        |  |     .      |  |Unsol
        |  |    .     | D|       .        | D|     .      |  |Data
        |  |    .     | a| DM Msg holding | a|     .      |  |Trans
        | i| Snd_Ctrl | t|SCSI Dataout PDU| t|Ctrl_Notify | i|
        | S|--------->| a|--------------->| a|----------->| S|
        | C|          | m|                | m|            | C|Buf
        | S|          | o|                | o|            | S|Avail
        | I|          | v|                | v|  Get_Data  | I|(R2T)
        |  |          | e|----------------| e|<-----------|  |<----
        | L|          | r||Solicited Data | r|            | L|  .
        | a|          |  ||   Transfer    |  |            | a|  .
        | y|          | L|--------------->| L|     .      | y|Buf
        | e|          | a|       .        | a|     .      | e|Avail
        | r|          | y|       .        | y|  Get_Data  | r|(R2T)
        |  |          | e|----------------| e|<-----------|  |<----
        |  |          | r||Solicited Data | r|            |  |
        |  |          |  ||   Transfer    |  |            |  |
        |  |          |  |--------------->|  |Data_Cmp_Nfy|  |Data
        |  |          |  |                |  |----------->|  |Trans
        |  |          |  |                |  |            |  |Compl
        |  |          |  | DM Msg holding |  |            |  |
   SCSI |  |          |  |SCSI Resp PDU & |  |            |  |SCSI
   Resp |  |Ctrl_Ntfy |  |   Sense Data   |  |  Snd_Ctrl  |  |Resp
   <----|  |<---------|  |<---------------|  |<-----------|  |<----
        |  |          |  |                |  |            |  |

       Figure 9  A SCSI Write data transfer

      <-----Initiator----->              <-------Target------->

        |  |          |  |                |  |            |  |
   SCSI |  |          |  | DM Msg holding |  |            |  |SCSI
   Cmd  |  | Snd_Ctrl |  |  SCSI Cmd PDU  |  |Ctrl_Notify |  |Cmd
   ---->|  |--------->|  |--------------->|  |----------->|  |--->
        |  |          |  |                |  |            |  |
        |  |          | D|   SCSI Read    | D|            |  |Buf
        |  |          | a| Data Transfer  | a|  Put_Data  |  |Avail
        | i|          | t|<---------------| t|<-----------| i|<----
        | S|          | a|       .        | a|     .      | S|  .
        | C|          | m|       .        | m|     .      | C|  .
        | S|          | o|       .        | o|     .      | S|  .
        | I|          | v|   SCSI Read    | v|     .      | I|Buf
        |  |          | e| Data Transfer  | e|  Put_Data  |  |Avail
        | L|          | r|<---------------| r|<-----------| L|<----
        | a|          |  |                |  |            | a|
        | y|          | L|                | L|            | y|
        | e|          | a|                | a|Data_Cmp_Nfy| e|Data
        | r|          | y|                | y|----------->| r|Trans
        |  |          | e|                | e|            |  |Compl
        |  |          | r| DM Msg holding | r|            |  |
   SCSI |  |          |  |SCSI Resp PDU & |  |            |  |SCSI
   Resp |  |Ctrl_Ntfy |  |   Sense Data   |  |  Snd_Ctrl  |  |Resp
   <----|  |<---------|  |<---------------|  |<-----------|  |<----
        |  |          |  |                |  |            |  |

       Figure 10  A SCSI Read data transfer

      <-----Initiator----->              <-------Target------->

        |  |          |  |                |  |            |  |
   SCSI |  |          |  | DM Msg holding |  |            |  |SCSI
   Cmd  |  | Snd_Ctrl |  |  SCSI Cmd PDU  |  |Ctrl_Notify |  |Cmd
   ---->|  |--------->|  |--------------->|  |----------->|  |---->
        |  |          |  |                |  |            |  |
        |  |          | D|   SCSI Read    | D|  Put_Data  |  |Buf
        |  |          | a| Data Transfer  | a|Data_in.A=1 |  |Avail
        | i|          | t|<---------------| t|<-----------| i|<----
        | S|          | a|       .        | a|     .      | S|  .
        | C|          | m|       .        | m|Data_ACK_Nfy| C|  .
        | S|          | o|                | o|----------->| S|  .
        | I|          | v|                | v|     .      | I|
        |  |          | e|                | e|     .      |  |
        | L|          | r|                | r|            | L|
        | a|          |  |                |  |            | a|
        | y|          | L|                | L|            | y|
        | e|          | a|                | a|            | e|Data
        | r|          | y|                | y|            | r|Trans
        |  |          | e|                | e|            |  |Compl
        |  |          | r| DM Msg holding | r|            |  |
   SCSI |  |          |  |SCSI Resp PDU & |  |            |  |SCSI
   Resp |  |Ctrl_Ntfy |  |   Sense Data   |  |  Snd_Ctrl  |  |Resp
   <----|  |<---------|  |<---------------|  |<-----------|  |<----
        |  |          |  |                |  |            |  |

       Figure 11  A SCSI Read data acknowledgement

      <-----Initiator----->              <-------Target------->

        |  |          |  |                |  |            |  |
   SCSI |  |          |  | DM Msg holding |  |            |  |SCSI
   Cmd  |  | Snd_Ctrl |  |  SCSI Cmd PDU  |  |Ctrl_Notify |  |Cmd
   ---->|  |--------->|  |--------------->|  |----------->|  |---->
        |  |          |  |                |  |            |  |
        |  |          | D|   SCSI Read    | D|            |  |Buf
        |  |          | a| Data Transfer  | a|  Put_Data  |  |Avail
        | i|          | t|<---------------| t|<-----------| i|<----
        | S|          | a|       .        | a|     .      | S|  .
   Abort| C|          | m| DM Msg holding | m|     .      | C|Abort
   Task | S| Snd_Ctrl | o| Abort TMF Req  | o|Ctrl_Notify | S|Task
   ---->| I|--------->| v|--------------->| v|----------->| I|---->
        |  |          | e|       .        | e|     .      |  |
   Abort| L|          | r|  DM Msg holding| r|            | L|  .
   Done | a|Ctrl_Ntfy |  |   Abort TMF Res|  |  Snd_Ctrl  |  |Abted
   <----| y|<---------| L|<---------------| L|<-----------| y|<----
        | e|          | a|                | a|            | e|
        | r|          | y|                | y|            | r|
        |  |          | e|                | e|            |  |
        |  |          | r|                | r|            |  |
        |  |          |  |                |  |            |  |
        |  |Dal_Tk_Res|  |                |  |Dal_Tk_Res  |  |
        |  |--------->|  |                |  |<-----------|  |
        |  |          |  |                |  |            |  |

       Figure 12  Task resource cleanup on abort

17 Full Copyright Statement

   Copyright (C) The IETF Trust (2006). This document is
   subject to the rights, licenses and restrictions contained in
   BCP 78, and except as set forth therein, the authors retain
   all their rights.

   This document and the information contained herein are
   provided on an "AS IS" basis and THE CONTRIBUTOR, THE
   ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY),
   THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE
   DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
   NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES
   OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

18 Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of
   any Intellectual Property Rights or other rights that might
   be claimed to pertain to the implementation or use of the
   technology described in this document or the extent to which
   any license under such rights might or might not be
   available; nor does it represent that it has made any
   independent effort to identify any such rights. Information
   on the procedures with respect to rights in RFC documents can
   be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and
   any assurances of licenses to be made available, or the
   result of an attempt made to obtain a general license or
   permission for the use of such proprietary rights by
   implementers or users of this specification can be obtained
   from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its
   attention any copyrights, patents or patent applications, or
   other proprietary rights that may cover technology that may
   be required to implement this standard. Please address the
   information to the IETF at ietf-ipr@ietf.org.