idnits 2.17.1 draft-ko-iwarp-iser-02.txt: -(180): Line appears to be too long, but this could be caused by non-ascii characters in UTF-8 encoding Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3667, Section 5.1 on line 22. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 3315. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 3322. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 3328. ** The document seems to lack an RFC 3978 Section 5.1 IPR Disclosure Acknowledgement -- however, there's a paragraph with a matching beginning. Boilerplate error? ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** The document seems to lack an RFC 3978 Section 5.5 (updated by RFC 4748) Disclaimer -- however, there's a paragraph with a matching beginning. Boilerplate error? ** The document uses RFC 3667 boilerplate or RFC 3978-like boilerplate instead of verbatim RFC 3978 boilerplate. After 6 May 2005, submission of drafts without verbatim RFC 3978 boilerplate is not accepted. The following non-3978 patterns matched text found in the document. That text should be removed or replaced: By submitting this Internet-Draft, I certify that any applicable patent or other IPR claims of which I am aware have been disclosed, or will be disclosed, and any of which I become aware will be disclosed, in accordance with RFC 3668. 
Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == There are 9 instances of lines with non-ascii characters in the document. == No 'Intended status' indicated for this document; assuming Proposed Standard == The page length should not exceed 58 lines per page, but there was 1 longer page, the longest (page 1) being 4804 lines Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** The document seems to lack an Authors' Addresses Section. ** The document seems to lack a both a reference to RFC 2119 and the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. RFC 2119 keyword, line 204: '...on of an Operational Primitive MUST be...' RFC 2119 keyword, line 625: '...re iSCSI session MUST operate in one m...' RFC 2119 keyword, line 844: '...R protocol layer MUST support the foll...' RFC 2119 keyword, line 896: '... The iSER layer MUST use the followin...' RFC 2119 keyword, line 926: '...both the initiator and the target MUST...' (261 more instances...) Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 1200 has weird spacing: '...imitive with ...' -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'IPSEC' is defined on line 2886, but no explicit reference was found in the text == Unused Reference: 'VERBS' is defined on line 2898, but no explicit reference was found in the text == Outdated reference: A later version (-07) exists of draft-ietf-rddp-rdmap-01 -- No information found for draft-ietf-rddp-ddp-0 - is the name correct? -- Possible downref: Normative reference to a draft: ref. 'DDP' == Outdated reference: A later version (-02) exists of draft-chadalapaka-iwarp-da-01 -- Possible downref: Normative reference to a draft: ref. 'DA' -- Obsolete informational reference (is this intentional?): RFC 2401 (ref. 'IPSEC') (Obsoleted by RFC 4301) == Outdated reference: A later version (-08) exists of draft-ietf-rddp-mpa-00 -- Obsolete informational reference (is this intentional?): RFC 793 (ref. 'TCP') (Obsoleted by RFC 9293) -- No information found for draft-hilland-iwarp-verbs-v1 - is the name correct? Summary: 7 errors (**), 0 flaws (~~), 10 warnings (==), 12 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 INTERNET DRAFT Mike Ko 2 draft-ko-iwarp-iser-02.txt John Hufferd 3 IBM Corporation 4 Mallikarjun Chadalapaka 5 Hewlett-Packard Company 6 Uri Elzur 7 Broadcom Corporation 8 Hemal Shah 9 Intel Corporation 10 Patricia Thaler 11 Agilent Technologies, Inc. 
13 Expires: January, 2005 15 iSCSI Extensions for RDMA Specification 17 Status of this Memo 19 By submitting this Internet-Draft, I certify that any applicable 20 patent or other IPR claims of which I am aware have been disclosed, 21 or will be disclosed, and any of which I become aware will be 22 disclosed, in accordance with RFC 3668. 24 Internet-Drafts are working documents of the Internet Engineering 25 Task Force (IETF), its areas, and its working groups. Note that 26 other groups may also distribute working documents as Internet- 27 Drafts. 29 Internet-Drafts are draft documents valid for a maximum of six 30 months and may be updated, replaced, or obsoleted by other documents 31 at any time. It is inappropriate to use Internet-Drafts as 32 reference material or to cite them other than as "work in progress." 34 The list of current Internet-Drafts can be accessed at 35 http://www.ietf.org/1id-abstracts.html. 37 The list of Internet-Draft Shadow Directories can be accessed at 38 http://www.ietf.org/shadow.html. 40 Abstract 42 iSCSI Extensions for RDMA provides the RDMA data transfer capability 43 to iSCSI [iSCSI] by layering iSCSI on top of the Remote Direct 44 Memory Access Protocol (RDMAP). The iWARP protocol suite provides 45 RDMA Read and Write services, which enable data to be transferred 46 directly into SCSI I/O Buffers without intermediate data copies. 47 This document describes the extensions to the iSCSI protocol to 48 support RDMA services as defined by the iWARP protocol suite. 50 Ko et. al. 
Expires January 2005 1 51 Table of Contents 53 1 Definitions and Acronyms....................................5 54 1.1 Definitions.................................................5 55 1.2 Acronyms...................................................10 56 2 Introduction...............................................13 57 2.1 Motivation.................................................13 58 2.2 Architectural Goals........................................14 59 2.3 Protocol Overview..........................................15 60 2.4 RDMA services and iSER.....................................16 61 2.4.1 STag......................................................16 62 2.4.2 Send......................................................17 63 2.4.3 RDMA Write................................................17 64 2.4.4 RDMA Read.................................................17 65 2.5 SCSI Read Overview.........................................18 66 2.6 SCSI Write Overview........................................18 67 2.7 iSCSI/iSER Layering........................................18 68 3 Upper Layer Interface Requirements.........................20 69 3.1 Operational Primitives offered by iSER.....................20 70 3.2 Operational Primitives used by iSER........................21 71 3.3 iSCSI Protocol Usage Requirements..........................22 72 4 Lower Layer Interface Requirements.........................24 73 4.1 Interactions with the iWARP Layer..........................24 74 4.2 Interactions with the Transport Layer......................25 75 5 Connection Setup and Termination...........................26 76 5.1 iSCSI/iSER Connection Setup................................26 77 5.1.1 Initiator Behavior........................................27 78 5.1.2 Target Behavior...........................................28 79 5.1.3 iSER Hello Exchange.......................................30 80 5.2 iSCSI/iSER Connection Termination..........................31 81 5.2.1 Normal Connection 
Termination at the Initiator............31 82 5.2.2 Normal Connection Termination at the Target...............31 83 5.2.3 Termination without Logout Request/Response PDUs..........32 84 6 Login/Text Operational Keys................................34 85 6.1 HeaderDigest and DataDigest................................34 86 6.2 MaxRecvDataSegmentLength...................................34 87 6.3 RDMAExtensions.............................................34 88 6.4 TargetRecvDataSegmentLength................................35 89 6.5 InitiatorRecvDataSegmentLength.............................36 90 6.6 OFMarker and IFMarker......................................36 91 7 iSCSI PDU Considerations...................................37 92 7.1 iSCSI Data-Type PDU........................................37 93 7.2 iSCSI Control-Type PDU.....................................38 94 7.3 iSCSI PDUs.................................................38 95 7.3.1 SCSI Command..............................................38 96 7.3.2 SCSI Response.............................................40 97 7.3.3 Task Management Function Request/Response.................41 99 Ko et. al. 
Expires January 2005 2 100 7.3.4 SCSI Data-out.............................................42 101 7.3.5 SCSI Data-in..............................................43 102 7.3.6 Ready To Transfer (R2T)...................................45 103 7.3.7 Asynchronous Message......................................47 104 7.3.8 Text Request & Text Response..............................47 105 7.3.9 Login Request & Login Response............................48 106 7.3.10 Logout Request & Logout Response........................48 107 7.3.11 SNACK Request...........................................48 108 7.3.12 Reject..................................................48 109 7.3.13 NOP-Out & NOP-In........................................49 110 8 Flow Control and STag Management...........................50 111 8.1 Flow Control for RDMA Send Message Types...................50 112 8.2 Flow Control for RDMA Read Resources.......................50 113 8.3 STag Management............................................51 114 8.3.1 Allocation of STags.......................................51 115 8.3.2 Invalidation of STags.....................................51 116 9 iSER Control and Data Transfer.............................53 117 9.1 iSER Header Format.........................................53 118 9.2 iSER Header Format for iSCSI Control-Type PDU..............53 119 9.3 iSER Header Format for iSER Hello Message..................55 120 9.4 iSER Header Format for iSER HelloReply Message.............56 121 9.5 SCSI Data Transfer Operations..............................57 122 9.5.1 SCSI Write Operation......................................57 123 9.5.2 SCSI Read Operation.......................................58 124 9.5.3 Bidirectional Operation...................................58 125 10 iSER Error Handling and Recovery...........................59 126 10.1 Error Handling............................................59 127 10.1.1 Errors in the Transport Layer...........................59 128 10.1.2 
Errors in the iWARP protocol suite......................60 129 10.1.3 Errors in the iSER Layer................................60 130 10.1.4 Errors in the iSCSI Layer...............................62 131 10.2 Error Recovery............................................64 132 10.2.1 SNACK Handling and PDU Recovery.........................64 133 10.2.2 Connection Recovery.....................................65 134 11 Security Considerations....................................66 135 12 IANA Considerations........................................67 136 13 References.................................................68 137 13.1 Normative References......................................68 138 13.2 Informative References....................................68 139 14 Appendix...................................................69 140 14.1 iWARP Message Format for iSER.............................69 141 14.1.1 iWARP Message Format for iSER Hello Message.............69 142 14.1.2 iWARP Message Format for iSER HelloReply Message........70 143 14.1.3 iWARP Message Format for SCSI Read Command PDU..........71 144 14.1.4 iWARP Message Format for SCSI Read Data.................72 145 14.1.5 iWARP Message Format for SCSI Write Command PDU.........73 146 14.1.6 iWARP Message Format for RDMA Read Request..............74 148 Ko et. al. 
Expires January 2005 3 149 14.1.7 iWARP Message Format for Solicited SCSI Write Data......75 150 14.1.8 iWARP Message Format for SCSI Response PDU..............76 151 15 Author's Address...........................................77 152 16 Acknowledgments............................................78 153 17 Full Copyright Statement...................................80 155 Table of Figures 157 Figure 1 Example of iSCSI/iSER Layering in Full Feature Mode....19 158 Figure 2 iSER Header Format.....................................53 159 Figure 3 iSER Header Format for iSCSI Control-Type PDU..........54 160 Figure 4 iSER Header Format for iSER Hello Message..............55 161 Figure 5 iSER Header Format for iSER HelloReply Message.........56 162 Figure 6 SendSE Message containing an iSER Hello Message........69 163 Figure 7 SendSE Message containing an iSER HelloReply Message...70 164 Figure 8 SendSE Message containing a SCSI Read Command PDU......71 165 Figure 9 RDMA Write Message containing SCSI Read Data...........72 166 Figure 10 SendSE Message containing a SCSI Write Command PDU....73 167 Figure 11 RDMA Read Request Message.............................74 168 Figure 12 RDMA Read Response Message containing SCSI Write Data.75 169 Figure 13 SendInvSE Message containing SCSI Response PDU........76 172 1 Definitions and Acronyms 174 Some of the following definitions are taken from [RDMAP]. In those 175 definitions, the term ULP refers to the iSER Layer. 177 1.1 Definitions 179 Advertisement (Advertised, Advertise, Advertisements, Advertises) - 180 The act of informing a Remote Peer that a local node's buffer is 181 available to it. A Node makes a buffer available for incoming 182 RDMA Read Request Message or incoming RDMA Write Message access 183 by informing its RDMA/DDP peer of the Tagged Buffer identifiers 184 (STag, TO, and buffer length).
This Advertisement of Tagged 185 Buffer information is not defined by RDMA/DDP and is left to the 186 ULP. A typical method would be for the Local Peer to embed the 187 Tagged Buffer's STag, TO, and buffer length in a Send Message 188 destined for the Remote Peer. 190 Completion (Completed, Complete, Completes) - Completion is defined 191 as the process by which the iWARP layer informs the ULP, in this 192 case the iSER Layer, that a particular RDMA Operation has 193 performed all functions specified for the RDMA Operation. 195 Connection - A connection is a logical circuit between the initiator 196 and the target, e.g., a TCP connection. Communication between 197 the initiator and the target occurs over one or more 198 connections. The connections carry control messages, SCSI 199 commands, parameters, and data within iSCSI Protocol Data Units 200 (iSCSI PDUs). 202 Connection Handle - An information element that identifies the 203 particular iSCSI connection and is unique for a given iSCSI-iSER 204 pair. Every invocation of an Operational Primitive MUST be 205 qualified with the Connection Handle. 207 Data Sink - The peer receiving a data payload. Note that the Data 208 Sink can be required to both send and receive RDMAP Messages to 209 transfer a data payload. 211 Data Source - The peer sending a data payload. Note that the Data 212 Source can be required to both send and receive RDMAP Messages 213 to transfer a data payload. 215 Datamover Interface (DI) - The interface between the iSCSI Layer and 216 the Datamover Layer as described in [DA]. 219 Datamover Layer - A layer that is directly below the iSCSI Layer and 220 above the underlying transport layers. This layer exposes and 221 uses a set of transport-independent Operational Primitives for 222 the communication between the iSCSI Layer and itself. The
The 223 Datamover layer, operating in conjunction with the transport 224 layers, moves the control and data information on the iSCSI 225 connection. In this specification, the iSER Layer is the 226 Datamover layer. 228 Datamover Protocol - A Datamover protocol is the wire-protocol that 229 is defined to realize the Datamover layer functionality. In this 230 specification, the iSER protocol is the Datamover protocol. 232 Event - An indication provided by the RDMAP Layer to the ULP to 233 indicate a Completion or other condition requiring immediate 234 attention. 236 Inbound RDMA Read Queue Depth (IRD) - The maximum number of incoming 237 outstanding RDMA Read Requests that the RNIC can handle on a 238 particular RDMAP Stream at the Data Source. 240 Invalidate STag - A mechanism used to prevent the Remote Peer from 241 reusing a previous explicitly Advertised STag, until the Local 242 Peer makes it available through a subsequent explicit 243 Advertisement. 245 I/O Buffer - A buffer that is used in a SCSI Read or Write operation 246 so SCSI data may be sent from or received into that buffer. 248 iSCSI - The iSCSI protocol is a mapping of the SCSI remote procedure 249 model of SAM-2 over the TCP, and the protocol itself is defined 250 in [iSCSI]. 252 iSCSI control-type PDU - Any iSCSI PDU that is not an iSCSI data- 253 type PDU and also not a SCSI Data-out PDU carrying solicited 254 data is defined as an iSCSI control-type PDU. Specifically, it 255 is to be noted that SCSI Data-out PDUs for unsolicited data are 256 defined as iSCSI control-type PDUs. 258 iSCSI data-type PDU - An iSCSI data-type PDU is defined as an iSCSI 259 PDU that causes data transfer, transparent to the remote iSCSI 260 Layer, to take place between the peer iSCSI nodes on a full 261 feature phase iSCSI connection. 
An iSCSI data-type PDU, when 262 requested for transmission by the sender iSCSI Layer, results in 263 the associated data transfer without the participation of the 264 remote iSCSI Layer, i.e. the PDU itself is not delivered as-is 267 to the remote iSCSI Layer. The following iSCSI PDUs constitute 268 the set of iSCSI data-type PDUs - SCSI Data-in PDU and R2T PDU. 270 iSCSI Layer - A layer in the protocol stack implementation within an 271 end node that implements the iSCSI protocol and interfaces with 272 the iSER Layer via the Datamover Interface. 274 iSCSI PDU (iSCSI Protocol Data Unit) - The iSCSI Layer at the 275 initiator and the iSCSI Layer at the target divide their 276 communications into messages. The term "iSCSI protocol data 277 unit" (iSCSI PDU) is used for these messages. 279 iSCSI/iSER Connection - An iSER-assisted iSCSI connection. 281 iSCSI/iSER Session - An iSER-assisted iSCSI session. 283 iSCSI-iSER Pair - The iSCSI Layer and the underlying iSER Layer. 285 iSER - iSCSI Extensions for RDMA, the protocol defined in this 286 document. 288 iSER-assisted - A term generally used to describe the operation of 289 iSCSI when the iSER functionality is also enabled below the 290 iSCSI Layer for the specific iSCSI/iSER connection in question. 292 iSER-IRD - This variable represents the maximum number of incoming 293 outstanding RDMA Read Requests that the iSER Layer at the 294 initiator declares on a particular RDMAP Stream. 296 iSER-ORD - This variable represents the maximum number of 297 outstanding RDMA Read Requests that the iSER Layer can initiate 298 on a particular RDMAP Stream. This variable is maintained only 299 by the iSER Layer at the target. 301 iSER Layer - The layer that implements the iSCSI Extensions for RDMA 302 (iSER) protocol. 304 iWARP - A suite of wire protocols comprising [RDMAP], [DDP], and 305 [MPA] when layered above [TCP].
[RDMAP] and [DDP] may be layered 306 above SCTP or other transport protocols. 308 Local Peer - The RDMAP implementation on the local end of the 309 connection. Used to refer to the local entity when describing 310 protocol exchanges or other interactions between two Nodes. 312 Node - A computing device attached to one or more links of a 313 network. A Node in this context does not refer to a specific 316 application or protocol instantiation running on the computer. A 317 Node may consist of one or more RNICs installed in a host 318 computer. 320 Operational Primitive - An Operational Primitive is an abstract 321 functional interface procedure that requests another layer to 322 perform a specific action on the requestor's behalf or notifies 323 the other layer of some event. The Datamover Interface between 324 an iSCSI Layer and a Datamover layer within an iSCSI end node 325 uses a set of Operational Primitives to define the functional 326 interface between the two layers. Note that not every invocation 327 of an Operational Primitive elicits a response from the 328 requested layer. A full discussion of the Operational Primitive 329 types and request-response semantics available to iSCSI and iSER 330 can be found in [DA]. 332 Outbound RDMA Read Queue Depth (ORD) - The maximum number of 333 outstanding RDMA Read Requests that the RNIC can initiate on a 334 particular RDMAP Stream at the Data Sink. 336 RDMA-enabled Network Interface Controller (RNIC) - A network I/O 337 adapter or embedded controller with iWARP functionality. 339 RDMA Operation - A sequence of RDMAP Messages, including control 340 Messages, to transfer data from a Data Source to a Data Sink. 341 The following RDMA Operations are defined - RDMA Write 342 Operation, RDMA Read Operation, Send Operation, Send with 343 Invalidate Operation, Send with Solicited Event Operation, Send 344 with Solicited Event and Invalidate Operation, and Terminate 345 Operation.
347 RDMA Protocol (RDMAP) - A wire protocol that supports RDMA 348 Operations to transfer ULP data between a Local Peer and the 349 Remote Peer as described in [RDMAP]. 351 RDMA Read Operation - An RDMA Operation used by the Data Sink to 352 transfer the contents of a Data Source buffer from the Remote 353 Peer to a Data Sink buffer at the Local Peer. An RDMA Read 354 Operation consists of a single RDMA Read Request Message and a 355 single RDMA Read Response Message. 357 RDMA Read Request - An RDMAP Message used by the Data Sink to 358 request the Data Source to transfer the contents of a buffer. 359 The RDMA Read Request Message describes both the Data Source and 360 the Data Sink buffers. 363 RDMA Read Response - An RDMAP Message used by the Data Source to 364 transfer the contents of a buffer to the Data Sink, in response 365 to an RDMA Read Request. The RDMA Read Response Message 366 describes only the Data Sink buffer. 368 RDMA Write Operation - An RDMA Operation used by the Data Source to 369 transfer the contents of a Data Source buffer from the Local 370 Peer to a Data Sink buffer at the Remote Peer. The RDMA Write 371 Message describes only the Data Sink buffer. 373 RDMAP Message - The sequence of RDMAP packets that represents a 374 single RDMA Operation or a part of an RDMA Read Operation. 376 RDMAP Stream - A single bidirectional association between the peer 377 RDMAP layers on two Nodes over a single transport-level stream. 378 For iSER, the association is created when the iSCSI connection 379 transitions to iSER-assisted mode following a successful iSCSI 380 Login Phase during which iSER support is negotiated. 382 Remote Direct Memory Access (RDMA) - A method of accessing memory on 383 a remote system in which the local system specifies the remote 384 location of the data to be transferred.
Employing an RNIC in the 385 remote system allows the access to take place without 386 interrupting the processing of the CPU(s) on the system. 388 Remote Peer - The RDMAP implementation on the opposite end of the 389 connection. Used to refer to the remote entity when describing 390 protocol exchanges or other interactions between two Nodes. 392 SCSI Layer - This layer builds/receives SCSI CDBs (Command 393 Descriptor Blocks) and sends/receives them with the remaining 394 command execute [SAM2] parameters to/from the iSCSI Layer. 396 Send - An RDMA Operation that transfers the contents of a ULP Buffer 397 from the Local Peer to a Buffer at the Remote Peer. 399 Send Message Type - A Send Message, Send with Invalidate Message, 400 Send with Solicited Event Message, or Send with Solicited Event 401 and Invalidate Message. 403 SendInvSE Message - A Send with Solicited Event and Invalidate 404 Message. 406 SendSE Message - A Send with Solicited Event Message. 409 Sequence Number (SN) - DataSN for a SCSI Data-in PDU and R2TSN for 410 an R2T PDU. The semantics for both types of sequence numbers 411 are as defined in [iSCSI]. 413 Session, iSCSI Session - The group of Connections that link an 414 initiator SCSI port with a target SCSI port forms an iSCSI 415 session (equivalent to a SCSI I-T nexus). Connections can be 416 added to and removed from a session even while the I-T nexus is 417 intact. Across all connections within a session, an initiator 418 sees one and the same target. 420 Solicited Event (SE) - A facility by which an RDMA Operation sender 421 may cause an Event to be generated at the recipient, if the 422 recipient is configured to generate such an Event, when a Send 423 with Solicited Event or Send with Solicited Event and Invalidate 424 Message is received. 426 Steering Tag (STag) - An identifier of a Tagged Buffer on a Node as 427 defined in [RDMAP] and [DDP].
429 Tagged Buffer - A buffer that is explicitly Advertised to a Remote 430 Peer through exchange of an STag, Tagged Offset, and length. 432 Tagged Offset (TO) - The offset within a Tagged Buffer. 434 Traditional iSCSI - Refers to the iSCSI protocol defined by [iSCSI] 435 (i.e. without the iSER enhancements). 437 Untagged Buffer - A buffer that is not explicitly Advertised to the 438 Remote Peer. 440 1.2 Acronyms 442 Acronym Definition 444 -------------------------------------------------------------- 446 CO Connection Only 448 CRC Cyclic Redundancy Check 450 DDP Direct Data Placement Protocol 452 DI Datamover Interface 454 IANA Internet Assigned Numbers Authority 457 IETF Internet Engineering Task Force 459 I/O Input - Output 461 IO Initialize Only 463 IP Internet Protocol 465 IPsec Internet Protocol Security 467 iSER iSCSI Extensions for RDMA 469 ITT Initiator Task Tag 471 LO Leading Only 473 MPA Marker PDU Aligned Framing for TCP 475 NOP No Operation 477 NSG Next Stage (during the iSCSI Login Phase) 479 OS Operating System 481 PDU Protocol Data Unit 483 R2T Ready To Transfer 485 R2TSN Ready To Transfer Sequence Number 487 RDMA Remote Direct Memory Access 489 RDMAP Remote Direct Memory Access Protocol 491 RFC Request For Comments 493 RNIC RDMA-enabled Network Interface Controller 495 SAM2 SCSI Architecture Model - 2 497 SCSI Small Computer Systems Interface 499 SNACK Selective Negative Acknowledgment - also 501 Sequence Number Acknowledgement for data 503 STag Steering Tag 506 SW Session Wide 508 TCP Transmission Control Protocol 510 TO Tagged Offset 512 ULP Upper Layer Protocol 515 2 Introduction 517 2.1 Motivation 519 The iSCSI protocol ([iSCSI]) is a mapping of the SCSI remote 520 procedure invocation model (see [SAM2]) over TCP.
SCSI 521 commands are carried by iSCSI requests, and SCSI responses and status 522 are carried by iSCSI responses. Other iSCSI protocol exchanges and 523 SCSI Data are also transported in iSCSI PDUs. 525 Out-of-order TCP segments in the traditional iSCSI model have to be 526 stored and reassembled before the iSCSI protocol layer within an end 527 node can place the data in the iSCSI buffers. This reassembly is 528 required because not every TCP segment is likely to contain an iSCSI 529 header to enable its placement and TCP itself does not have a built- 530 in mechanism for signaling ULP message boundaries to aid placement 531 of out-of-order segments. This TCP reassembly at high network 532 speeds is costly for the following reasons: wasted 533 memory bandwidth in data copying, need for reassembly memory, wasted 534 CPU cycles in data copying, and the general store-and-forward 535 latency from an application perspective. [iSCSI] itself recognized 536 that TCP reassembly could be a serious issue and introduced the 537 notion of a "sync and steering layer" that is optional to implement 538 and use. [iSCSI] further defined one specific sync and steering 539 layer - called "markers" - an application-level way of framing iSCSI 540 PDUs within the TCP data stream even when the TCP segments are not 541 yet reassembled to be in-order. 543 With these [iSCSI]-defined techniques, a Network Interface 544 Controller customized for iSCSI (SNIC) could offload the TCP/IP 545 processing and support direct data placement. 547 Supporting direct data placement is the main function of the iWARP 548 protocol suite. A NIC enhanced with the RDMAP/DDP functions (RNIC) 549 can be used by any application that has been extended to support 550 RDMA. 552 When RNICs are available in a host system that does not 553 have SNICs, it is appropriate for iSCSI to exploit the 554 direct data placement function of the RNIC, as other applications do.
556 iSCSI Extensions for RDMA (iSER) is designed precisely to take 557 advantage of generic RDMA technologies - iSER's goal is to permit 558 iSCSI to employ direct data placement and RDMA capabilities using a 559 generic RNIC. In summary, the iSCSI/iSER protocol stack is designed to 560 enable scaling to high speeds by relying on a generic data placement 563 process and RDMA technologies and products, which enable direct data 564 placement of both in-order and out-of-order data. 566 This document describes iSER as a protocol extension to iSCSI, both 567 for convenience of description and also because it is true in a very 568 strict protocol sense. Note, however, that iSER in 569 reality extends the connectivity of the iSCSI protocol defined in 570 [iSCSI], and the name iSER reflects this reality. 572 When the iSCSI protocol defined by [iSCSI] (i.e. without the iSER 573 enhancements) is intended in the rest of the document, the term 574 "traditional iSCSI" is used to make the intention clear. 576 2.2 Architectural Goals 578 This section summarizes the architectural goals that guided the 579 design of iSER. 581 1. Provide an iWARP-based data transfer model for iSCSI that enables 582 direct in-order or out-of-order data placement of SCSI data into 583 pre-allocated SCSI buffers while maintaining in-order data 584 delivery. 586 2. Not require any major changes to the SCSI Architecture Model (SAM/SAM- 587 2/SAM-3) and SCSI command set standards. 589 3. Utilize existing traditional iSCSI infrastructure (sometimes 590 referred to as "iSCSI ecosystem") including but not limited to 591 MIB, bootstrapping, negotiation, naming & discovery, and security. 593 4. Not require iSCSI full feature phase interoperability between an 594 end node operating in traditional iSCSI mode and an end node 595 operating in iSER-assisted mode. 597 5.
      Allow initiator and target implementations that utilize generic
      RNICs and implement iSCSI and iSER in software (that is, do not
      require iSCSI or iSER specific assists in the iWARP protocol
      suite or RNIC).

   6. Require full and only generic iWARP functionality at both the
      initiator and the target.

   7. Require a session to operate in the traditional iSCSI data
      transfer mode if iSER is not supported by either the initiator or
      the target.

   8. Implement a lightweight Datamover protocol for iSCSI with minimal
      state maintenance.

2.3 Protocol Overview

   Consistent with the architectural goals stated in section 2.2, the
   iSER protocol does not require changes in the iSCSI ecosystem or any
   related SCSI specifications.  The iSER protocol defines the mapping
   of iSCSI PDUs to RDMAP Messages in such a way that it is entirely
   feasible to realize iSCSI/iSER implementations that are based on
   generic RNICs.  The iSER protocol layer requires minimal state
   maintenance to assist an iSCSI full feature phase connection,
   besides being oblivious to the notion of an iSCSI session.  The
   crucial protocol aspects of iSER may be summarized thus:

   1. iSER-assisted mode is negotiated during the iSCSI login for each
      connection, but an entire iSCSI session MUST operate in one mode
      (i.e. one connection in the session cannot operate in
      iSER-assisted mode while a different connection of the same
      session is already in full feature mode in the traditional iSCSI
      mode).

   2. Once in iSER-assisted mode, all iSCSI interactions on that
      connection use RDMAP Messages.

   3. A Send Message Type is used for carrying an iSCSI control-type
      PDU preceded by an iSER header.  See section 7.2 for more details
      on iSCSI control-type PDUs.

   4.
      RDMA Write, RDMA Read Request, and RDMA Read Response Messages
      are used for carrying control and all data information associated
      with the iSCSI data-type PDUs.  See section 7.1 for more details
      on iSCSI data-type PDUs.

   5. The target drives all data transfer (with the exception of iSCSI
      unsolicited data) for SCSI writes and SCSI reads, by issuing RDMA
      Read Requests and RDMA Writes respectively.

   6. The iWARP protocol suite guarantees data integrity.  (For TCP,
      iWARP uses a CRC-enhanced framing layer on TCP.)  For this
      reason, iSCSI header and data digests are negotiated to "None"
      for iSCSI/iSER sessions.

   7. The iSCSI error recovery hierarchy defined by [iSCSI] is fully
      supported by iSER.

   8. iSER requires no changes to iSCSI authentication, security, and
      text mode negotiation mechanisms.

   Note that traditional iSCSI implementations may have to be adapted
   to employ iSER.  It is expected that the adaptation, when required,
   is likely to be centered around the upper layer interface
   requirements of iSER (section 3).

2.4 RDMA services and iSER

   iSER is designed to work with software and/or hardware protocol
   stacks providing the protocol services defined in [RDMAP].  The
   following subsections describe the key protocol elements of RDMAP
   that iSER relies on.

2.4.1 STag

   An STag is the RNIC-unique identifier of an I/O Buffer that the iSER
   Layer Advertises to the remote iSCSI/iSER node in order to complete
   a SCSI I/O.

   In iSER, Advertisement is the act by which the initiator informs the
   target that an I/O Buffer is available at the initiator for RDMA
   Read or RDMA Write access by the target.  The initiator Advertises
   the I/O Buffer by including the STag in the header of an iSER
   Message containing the SCSI Command PDU to the target.
   The base Tagged Offset is not explicitly specified, but the target
   must always assume it to be zero.  The buffer length is as specified
   in the SCSI Command PDU.

   The iSER Layer at the initiator Advertises the STag for the I/O
   Buffer of each SCSI I/O to the iSER Layer at the target in the iSER
   header of the SendSE Message containing the SCSI Command PDU, unless
   the I/O can be completely satisfied by unsolicited data alone.

   The iSER Layer at the target provides the STag for the I/O Buffer
   that is the Data Sink of an RDMA Read Operation (section 2.4.4) to
   the RDMAP layer on the initiator node - i.e. this is completely
   transparent to the iSER Layer at the initiator.

   The iSER protocol is defined so that the Advertised STag is
   automatically invalidated upon a normal completion of the associated
   task.  This automatic invalidation is realized via the SendInvSE
   Message carrying the SCSI Response PDU.  There are two exceptions to
   this automatic invalidation - bidirectional commands, and abnormal
   completion of a command.  The iSER Layer at the initiator is
   required to explicitly invalidate the STag in these cases, in
   addition to sanity checking the automatic invalidation even when
   that does happen.

2.4.2 Send

   Send is the RDMA Operation that is not addressed to an Advertised
   buffer by the sending side, and thus uses Untagged buffers on the
   receiving side.

   The iSER Layer at the initiator uses the Send Operation to transmit
   any iSCSI control-type PDU to the target.  As an example, the
   initiator uses Send Operations to transfer iSER Messages containing
   SCSI Command PDUs to the iSER Layer at the target.

   An iSER layer at the target uses the Send Operation to transmit any
   iSCSI control-type PDU to the initiator.
   As an example, the target uses Send Operations to transfer iSER
   Messages containing SCSI Response PDUs to the iSER Layer at the
   initiator.

2.4.3 RDMA Write

   RDMA Write is the RDMA Operation that is used to place data into an
   Advertised buffer on the receiving side.  The sending side addresses
   the Message using an STag and a Tagged Offset that are valid on the
   Data Sink.

   The iSER Layer at the target uses the RDMA Write Operation to
   transfer the contents of a local I/O Buffer to an Advertised I/O
   Buffer at the initiator.  The iSER Layer at the target uses the RDMA
   Write to transfer whole or part of the data required to complete a
   SCSI Read command.

   The iSER Layer at the initiator does not employ RDMA Writes.

2.4.4 RDMA Read

   RDMA Read is the RDMA Operation that is used to retrieve data from
   an Advertised buffer on a remote node.  The sending side of the RDMA
   Read Request addresses the Message using an STag and a Tagged Offset
   that are valid on the Data Source in addition to providing a valid
   local STag and Tagged Offset that identify the Data Sink.

   The iSER Layer at the target uses the RDMA Read Operation to
   transfer the contents of an Advertised I/O Buffer at the initiator
   to a local I/O Buffer at the target.  The iSER Layer at the target
   uses the RDMA Read to fetch whole or part of the data required to
   complete a SCSI Write.

   The iSER Layer at the initiator does not employ RDMA Reads.

2.5 SCSI Read Overview

   The iSER Layer at the initiator receives the SCSI Command PDU from
   the iSCSI Layer.  The iSER Layer at the initiator generates an STag
   for the I/O Buffer of the SCSI Read and Advertises the buffer by
   including the STag as part of the iSER header for the PDU.  The iSER
   Message is transferred to the target using a SendSE Message.

   The iSER Layer at the target uses one or more RDMA Writes to
   transfer the data required to complete the SCSI Read.

   The iSER Layer at the target uses a SendInvSE Message to transfer
   the SCSI Response PDU back to the iSER Layer at the initiator.  The
   iSER Layer at the initiator notifies the iSCSI Layer of the
   availability of the SCSI Response PDU.

2.6 SCSI Write Overview

   The iSER Layer at the initiator receives the SCSI Command PDU from
   the iSCSI Layer.  If solicited data transfer is involved, the iSER
   Layer at the initiator generates an STag for the I/O Buffer of the
   SCSI Write and Advertises the buffer by including the STag as part
   of the iSER header for the PDU.  The iSER Message is transferred to
   the target using a SendSE Message.

   The iSER Layer at the initiator may optionally send one or more
   non-immediate unsolicited data PDUs to the target using Send Message
   Types.

   If solicited data transfer is involved, the iSER Layer at the target
   uses one or more RDMA Reads to transfer the data required to
   complete the SCSI Write.

   The iSER Layer at the target uses a SendInvSE Message to transfer
   the SCSI Response PDU back to the iSER Layer at the initiator.  The
   iSER Layer at the initiator notifies the iSCSI Layer of the
   availability of the SCSI Response PDU.

2.7 iSCSI/iSER Layering

   iSCSI Extensions for RDMA (iSER) is layered between the iSCSI layer
   and the RDMAP layer.  Figure 1 shows an example of the relationship
   between SCSI, iSCSI, iSER, RDMAP, and the rest of the iWARP stack
   when the transport layer is TCP.

              +-------------------------------------+
              |                SCSI                 |
              +-------------------------------------+
              |                iSCSI                |
   DI ------> +-------------------------------------+
              |                iSER                 |
              +-------------------------------------+
              |                RDMAP                |
              +-------------------------------------+
              |                 DDP                 |
              +-------------------------------------+
              |                 MPA                 |
              +-------------------------------------+
              |                 TCP                 |
              +-------------------------------------+

      Figure 1 Example of iSCSI/iSER Layering in Full Feature Mode

3 Upper Layer Interface Requirements

   This section discusses the upper layer interface requirements in the
   form of an abstract model of the required interactions between the
   iSCSI Layer and the iSER Layer.  The abstract model used here is
   derived from the architectural model described in [DA].  The
   interface requirements are specified by Operational Primitives.  An
   Operational Primitive is an abstract functional interface procedure
   between the iSCSI Layer and the iSER Layer that requests one layer
   to perform a specific action on behalf of the other layer or
   notifies the other layer of some event.

   The abstract model and Operational Primitives defined in this
   section are for ease of description of the iSER protocol.  In the
   rest of the iSER specification, the compliance statements related to
   the use of these Operational Primitives are only for the purpose of
   the required interactions between the iSCSI Layer and the iSER
   Layer.  Note that the compliance statements related to Operational
   Primitives in the rest of this specification only mandate functional
   equivalence on implementations, but do not put any requirements on
   the implementation specifics of the interface between the iSCSI
   Layer and the iSER Layer.

3.1 Operational Primitives offered by iSER

   The iSER protocol layer MUST support the following Operational
   Primitives to be used by the iSCSI protocol layer.

   1. Send_Control: The iSCSI Layers at the initiator and the target
      use this to request the outbound transfer of an iSCSI
      control-type PDU.

   2. Put_Data: The iSCSI Layer at the target uses this to request the
      outbound transfer of data for a SCSI Data-in PDU.

   3. Get_Data: The iSCSI Layer at the target uses this to request the
      inbound transfer of solicited data requested by an R2T PDU.

   4. Allocate_Connection_Resources: The iSCSI Layers at the initiator
      and the target use this to request the allocation of all
      iWARP-specific connection resources required for an operational
      iSCSI/iSER connection.

   5. Deallocate_Connection_Resources: The iSCSI Layers at the
      initiator and the target use this to request the deallocation of
      all iWARP-specific connection resources that were earlier
      allocated as a result of a successful
      Allocate_Connection_Resources invocation.

   6. Enable_Datamover: The iSCSI Layers at the initiator and the
      target use this to request that a specified iSCSI connection be
      transitioned to iSER-assisted mode.

   7. Connection_Terminate: The iSCSI Layers at the initiator and the
      target use this to request that a specified iSCSI/iSER connection
      be terminated and all the associated connection and task
      resources be freed.

   8. Notice_Key_Values: The iSCSI Layers at the initiator and the
      target use this to request that the local Datamover layer take
      note of the specified Key-Value pairs.

   9. Deallocate_Task_Resources: The iSCSI Layers at the initiator and
      the target use this to request the deallocation of all
      iWARP-specific task resources that may have been allocated as
      part of the task initiation by the iSER Layer.
      This Operational Primitive is only used for tasks that did not
      conclude with a SCSI Response PDU.

3.2 Operational Primitives used by iSER

   Note that in the following discussion and in the rest of the
   document, a PDU is described as "available" to the iSCSI Layer when
   the iSER Layer notifies the iSCSI Layer of the reception of that
   inbound PDU, along with an implementation-specific indication as to
   where the received PDU is.

   The iSER layer MUST use the following Operational Primitives offered
   by the iSCSI protocol layer via DI.

   1. Control_Notify: The iSER Layers at both the initiator and the
      target use this to notify the iSCSI Layer of the availability of
      an inbound iSCSI control-type PDU.

   2. Data_Completion_Notify: The iSER Layer at the target uses this to
      notify the iSCSI Layer of the completion of inbound/outbound data
      transfer that was requested by the iSCSI Layer when the request
      was qualified with Notify_Enable set.

   3. Data_ACK_Notify: The iSER Layer at the target uses this to notify
      the iSCSI Layer of the arrival of the data acknowledgement (as
      defined in [iSCSI]) requested earlier by the iSCSI Layer for the
      outbound data transfer (Data-in PDUs).

   4. Connection_Terminate_Notify: The iSER Layers at both the
      initiator and the target use this to notify the iSCSI Layer of
      the termination of an iSCSI/iSER connection.  However,
      Connection_Terminate_Notify is not invoked when the termination
      of the connection was earlier requested by the local iSCSI Layer.

3.3 iSCSI Protocol Usage Requirements

   An iSER-assisted iSCSI protocol layer should satisfy the following
   protocol usage requirements from the iSER protocol:

   1. The iSCSI Layers at both the initiator and the target MUST
      negotiate the new RDMAExtensions key (see section 6.3) to "Yes"
      on the leading connection.
      If the invocation of the Allocate_Connection_Resources
      Operational Primitive to the iSER layer fails after this key is
      negotiated to "Yes", the iSCSI layer MUST fail the iSCSI Login
      process or terminate the connection as appropriate.  See
      sections 10.1.3.1 and 10.1.3.2 for details.

   2. The iSCSI Layers at both the initiator and the target MUST
      negotiate the HeaderDigest key and the DataDigest key to "None"
      during the login phase for iSER-assisted iSCSI connections.

   3. The iSCSI Layer at the initiator MUST set ExpDataSN = 0 in Task
      Management Function Requests for Task Allegiance Reassignment
      for read/bidirectional commands, so as to cause the target to
      send all unacknowledged read data.

   4. The iSCSI Layer at the target MUST always return the SCSI status
      in a separate SCSI Response PDU for read commands, i.e., there
      MUST NOT be a "phase collapse" in concluding a SCSI Read
      Command.

   5. The iSCSI Layers at both the initiator and the target MUST
      successfully negotiate the new InitiatorRecvDataSegmentLength
      key for each iSER-assisted connection, and follow its defined
      semantics.

   6. The iSCSI Layers at both the initiator and the target MUST
      successfully negotiate the new TargetRecvDataSegmentLength key
      for each iSER-assisted connection, and follow its defined
      semantics.

   7. The iSCSI Layer at the initiator SHOULD NOT issue proactive
      (based on time-outs) SNACKs for PDUs that it presumes are lost.

   8. The iSCSI Layers at both the initiator and the target MUST
      negotiate the OFMarker key and the IFMarker key to "No" during
      the login phase for an iSER-assisted iSCSI connection.
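
   The key-related requirements of section 3.3 (items 1, 2, 5, 6, and
   8) can be sketched as a login-time check on the leading connection.
   This is a minimal illustration only: the function name, the table
   names, and the dictionary representation of negotiated keys are
   assumptions made for this example and are not defined by this
   specification.

```python
# Hypothetical helper: checks negotiated login keys on the leading
# connection against the key requirements of section 3.3.
REQUIRED_VALUES = {
    "RDMAExtensions": "Yes",  # item 1
    "HeaderDigest": "None",   # item 2
    "DataDigest": "None",     # item 2
    "OFMarker": "No",         # item 8
    "IFMarker": "No",         # item 8
}

# Keys that must have been negotiated; their values are not checked here.
REQUIRED_PRESENT = (
    "InitiatorRecvDataSegmentLength",  # item 5
    "TargetRecvDataSegmentLength",     # item 6
)


def check_iser_login_keys(negotiated):
    """Return a list of violations; an empty list means all checks pass."""
    violations = []
    for key, expected in REQUIRED_VALUES.items():
        actual = negotiated.get(key)
        if actual != expected:
            violations.append(f"{key}: expected {expected!r}, got {actual!r}")
    for key in REQUIRED_PRESENT:
        if key not in negotiated:
            violations.append(f"{key}: not negotiated")
    return violations
```

   The sketch only inspects key values; items 3, 4, and 7 concern
   run-time behavior and are outside what such a check can cover.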

4 Lower Layer Interface Requirements

4.1 Interactions with the iWARP Layer

   The iSER protocol layer is layered on top of the iWARP protocol
   stack (see Figure 1) and the following are the key features that are
   assumed to be supported by iWARP:

   * The RDMAP layer supports all basic RDMAP operations, including
     RDMA Write Operation, RDMA Read Operation, Send Operation, Send
     with Invalidate Operation, Send with Solicited Event Operation,
     Send with Solicited Event & Invalidate Operation, and Terminate
     Operation.

   * The RDMAP/DDP layers provide reliable, in-order message delivery
     and direct data placement.

   * The RDMAP layer encapsulates a single iSER Message into a single
     RDMAP message on the Data Source side.  The RDMAP layer
     decapsulates the iSER Message before delivering it to the iSER
     Layer on the Data Sink side.

   * When the iSER Layer provides the STag to be remotely invalidated
     to the RDMAP layer for a SendInvSE Message, the RDMAP layer uses
     this STag as the STag to be invalidated in the SendInvSE Message.

   * The RDMAP layer uses the STag and Tagged Offset provided by the
     iSER Layer for the RDMA Write and RDMA Read Request Messages.

   * When the RDMAP layer delivers the content of an RDMA Send Message
     Type to the iSER Layer, the RDMAP layer provides the length of
     the RDMA Send message.  This ensures that the iSER Layer does not
     have to carry a length field in the iSER header.

   * When the RDMAP layer delivers the SendSE or SendInvSE Message to
     the iSER Layer, it notifies the iSER Layer with the mechanism
     provided on that interface.

   * When the RDMAP layer delivers a SendInvSE Message to the iSER
     Layer, it passes the value of the STag that was invalidated.

   * The RDMAP layer propagates all status and error indications to
     the iSER Layer.

   * The iWARP implementation supports the enabling of the iWARP mode
     after Connection establishment.

   * Whenever the iSER Layer terminates the RDMAP Stream, the RDMAP
     layer terminates the associated Connection.

4.2 Interactions with the Transport Layer

   The iSER Layer does not interface with the transport layer (e.g.,
   TCP) directly.  During Connection setup, the iSCSI Layer is
   responsible for setting up the Connection.  If the login is
   successful, the iSCSI Layer invokes the Enable_Datamover Operational
   Primitive to request the iSER Layer to transition to the
   iSER-assisted mode for that iSCSI connection.  See section 5.1 on
   iSCSI/iSER Connection Setup.  After transitioning to iSER-assisted
   mode, the iWARP layer is responsible for maintaining the Connection
   and reports any Connection failures to the iSER Layer.

5 Connection Setup and Termination

5.1 iSCSI/iSER Connection Setup

   During connection setup, the iSCSI Layer at the initiator is
   responsible for establishing a connection with the target.  After
   the connection is established, the iSCSI Layers at the initiator and
   the target enter the Login Phase using the same rules as outlined in
   [iSCSI].  Transition to iSER-assisted mode occurs when the
   connection transitions into the iSCSI full feature phase following a
   successful login negotiation between the initiator and the target in
   which iSER-assisted mode is negotiated and the necessary iWARP
   resources have been allocated at both the initiator and the target.

   iSER-assisted mode MUST be enabled only if it is negotiated on the
   leading connection during the LoginOperationalNegotiation Stage of
   the iSCSI Login Phase.  iSER-assisted mode is negotiated using the
   RDMAExtensions key.
   Both the initiator and the target MUST exchange the RDMAExtensions
   key with the value set to "Yes" to enable iSER-assisted mode.  If
   both the initiator and the target fail to negotiate the
   RDMAExtensions key set to "Yes", then the connection MUST continue
   with the login semantics as defined in [iSCSI].

   iSER-assisted mode is defined for a Normal session only and the
   RDMAExtensions key MUST NOT be negotiated for a Discovery session.

   An iSER enabled node is not required to initiate the RDMAExtensions
   key exchange if its preference is for the traditional iSCSI mode.
   The RDMAExtensions key, if offered, MUST be sent in the first
   available Login Response or Login Request PDU in the
   LoginOperationalNegotiation stage.  This is due to the fact that the
   value of some login parameters might depend on whether iSER-assisted
   mode is enabled or not.

   iSER-assisted mode is a session-wide attribute.  If both the
   initiator and the target negotiated RDMAExtensions="Yes" on the
   leading connection of a session, then all subsequent connections of
   the same session MUST enable iSER-assisted mode without having to
   exchange the RDMAExtensions key during the iSCSI Login Phase.
   Conversely, if both the initiator and the target failed to negotiate
   RDMAExtensions to "Yes" on the leading connection of a session, then
   the RDMAExtensions key MUST NOT be negotiated further on any
   additional subsequent connection of the session.

   When the RDMAExtensions key is negotiated to "Yes", the HeaderDigest
   and the DataDigest keys MUST be negotiated to "None" on all
   iSCSI/iSER connections participating in that iSCSI session.  This is
   because, for an iSCSI/iSER connection, the iWARP protocol suite
   provides a CRC32c-based error detection for all iWARP Messages.
   Furthermore, all SCSI Read data are sent using RDMA Write Messages
   instead of the SCSI Data-in PDUs, and all solicited SCSI write data
   are sent using RDMA Read Response Messages instead of the SCSI
   Data-out PDUs.  HeaderDigest and DataDigest, which apply to iSCSI
   PDUs, would not be appropriate for RDMA Read and RDMA Write
   operations used with iSER.

5.1.1 Initiator Behavior

   If the outcome of the iSCSI negotiation is to enable iSER-assisted
   mode, then on the initiator side, prior to sending the Login Request
   with the T (Transit) bit set to 1 and the NSG (Next Stage) field set
   to FullFeaturePhase, the iSCSI Layer MUST invoke the
   Allocate_Connection_Resources Operational Primitive to request the
   iSER Layer to allocate the resources necessary to support iWARP.
   The iWARP resources required are defined by implementation and are
   outside the scope of this specification.  Optionally, the iSCSI
   Layer MAY invoke the Notice_Key_Values Operational Primitive before
   invoking the Allocate_Connection_Resources Operational Primitive to
   request the iSER Layer to take note of the negotiated values of the
   iSCSI keys for the Connection.  The specific keys to be passed in as
   input qualifiers are implementation dependent.  These may include,
   but are not limited to, MaxOutstandingR2T, ErrorRecoveryLevel, etc.

   Among the iWARP resources allocated at the initiator is the Inbound
   RDMA Read Queue Depth (IRD).  As described in section 9.5.1, R2Ts
   are transformed by the target into RDMA Read operations.  IRD limits
   the maximum number of simultaneously incoming outstanding RDMA Read
   Requests per RDMAP Stream from the target to the initiator.  The
   required value of IRD is outside the scope of the iSER
   specification.  The iSER Layer at the initiator MUST set IRD to 1 or
   higher if R2Ts are to be used in the connection.
   However, the iSER Layer at the initiator MAY set IRD to 0 based on
   implementation configuration which indicates that no R2Ts will be
   used on that connection.  Initially, the iSER-IRD value at the
   initiator SHOULD be set to the IRD value at the initiator and MUST
   NOT be more than the IRD value.

   On the other hand, the Outbound RDMA Read Queue Depth (ORD) MAY be
   set to 0 since the iSER Layer at the initiator does not issue RDMA
   Read Requests to the target.

   Failure to allocate the requested iWARP resources locally results in
   a login failure and its handling is described in section 10.1.3.1.

   If the iSER Layer at the initiator is successful in allocating the
   necessary connection resources for iWARP, the following events MUST
   occur in the specified sequence:

   1. The iSER Layer MUST return a success status to the iSCSI Layer
      in response to the Allocate_Connection_Resources Operational
      Primitive.

   2. After the target returns the Login Response with the T bit set
      to 1 and the NSG field set to FullFeaturePhase, and a status
      class of 0 (Success), the iSCSI Layer MUST invoke the
      Enable_Datamover Operational Primitive with the following
      qualifiers to request the iSER Layer to transition to
      iSER-assisted mode (See section 10.1.4.6 for the case when the
      status class is not Success.):

      a. Connection_Handle that identifies the iSCSI connection.

      b. Transport_Connection_Descriptor which identifies the
         specific transport connection associated with the
         Connection_Handle.

   3. The iSER Layer MUST enable iWARP and transition the connection
      to iSER-assisted mode.

   4. The iSER Layer MUST send the iSER Hello Message as the first
      RDMAP message.  See Section 5.1.3 on iSER Hello Exchange.
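
   The initiator-side rules of section 5.1.1 can be sketched as a
   small ordering check.  This is a minimal illustration under stated
   assumptions: the class name, method names, and the dictionary
   returned for the Hello Message are hypothetical, and steps 2 and 3
   of the sequence are folded into a single call for brevity.

```python
class InitiatorSetup:
    """Hypothetical tracker for the ordered initiator steps of 5.1.1."""

    ORDER = ("allocated", "enabled", "hello_sent")

    def __init__(self, ird, uses_r2t):
        # IRD MUST be 1 or higher if R2Ts are used; 0 is allowed otherwise.
        if ird < 0 or (uses_r2t and ird < 1):
            raise ValueError("IRD too small for this connection")
        self.ird = ird
        self.iser_ird = ird  # SHOULD start equal to IRD, MUST NOT exceed it
        self.done = []

    def _step(self, name):
        # Enforce that the steps occur in the specified sequence.
        expected = self.ORDER[len(self.done)] if len(self.done) < len(self.ORDER) else None
        if name != expected:
            raise RuntimeError(f"{name} attempted, expected {expected}")
        self.done.append(name)

    def allocate_connection_resources(self):
        # Step 1: report success back to the iSCSI Layer.
        self._step("allocated")
        return "success"

    def enable_datamover(self, login_status_class):
        # Steps 2 and 3: only after a final Login Response with Success.
        if login_status_class != 0:
            raise RuntimeError("final Login Response was not Success")
        self._step("enabled")

    def send_hello(self):
        # Step 4: the Hello Message is the first RDMAP message sent.
        self._step("hello_sent")
        return {"iser_ird": self.iser_ird}
```

   Calling send_hello() before enable_datamover() raises an error in
   this sketch, which mirrors the requirement that the events occur in
   the specified sequence.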

5.1.2 Target Behavior

   If the outcome of the iSCSI negotiation is to enable iSER-assisted
   mode, then on the target side, prior to sending the Login Response
   with the T (Transit) bit set to 1 and the NSG (Next Stage) field set
   to FullFeaturePhase, the iSCSI Layer MUST invoke the
   Allocate_Connection_Resources Operational Primitive to request the
   iSER Layer to allocate the resources necessary to support iWARP.
   The iWARP resources required are defined by implementation and are
   outside the scope of this specification.  Optionally, the iSCSI
   Layer MAY invoke the Notice_Key_Values Operational Primitive before
   invoking the Allocate_Connection_Resources Operational Primitive to
   request the iSER Layer to take note of the negotiated values of the
   iSCSI keys for the Connection.  The specific keys to be passed in as
   input qualifiers are implementation dependent.  These may include,
   but are not limited to, MaxOutstandingR2T, ErrorRecoveryLevel, etc.

   Among the iWARP resources allocated at the target is the Outbound
   RDMA Read Queue Depth (ORD).  As described in section 9.5.1, R2Ts
   are transformed by the target into RDMA Read operations.  The ORD
   limits the maximum number of simultaneously outstanding RDMA Read
   Requests per RDMAP Stream from the target to the initiator.
   Initially, the iSER-ORD value at the target SHOULD be set to the ORD
   value at the target.

   On the other hand, the IRD at the target MAY be set to 0 since the
   iSER Layer at the target does not expect RDMA Read Requests to be
   issued by the initiator.  Failure to allocate the requested iWARP
   resources locally is a negotiation failure and is described in
   section 10.1.3.2.

   If the iSER Layer at the target is successful in allocating the
   necessary iWARP resources, the following events MUST occur in the
   specified sequence:

   1.
      The iSER Layer MUST return a success status to the iSCSI Layer
      in response to the Allocate_Connection_Resources Operational
      Primitive.

   2. The iSCSI Layer MUST invoke the Enable_Datamover Operational
      Primitive with the following qualifiers to request the iSER
      Layer to transition to iSER-assisted mode:

      a. Connection_Handle that identifies the iSCSI connection.

      b. Transport_Connection_Descriptor which identifies the
         specific transport connection associated with the
         Connection_Handle.

      c. The final transport layer (e.g. TCP) message containing the
         Login Response with the T bit set to 1 and the NSG field set
         to FullFeaturePhase.

   3. The iSER Layer MUST send the final iSCSI Login Response PDU in
      byte stream mode to conclude the iSCSI Login Phase.

   4. After sending the final iSCSI Login Response PDU in byte stream
      mode, the iSER Layer MUST enable iWARP and transition the
      connection to iSER-assisted mode.

   5. After receiving the iSER Hello Message from the initiator, the
      iSER Layer MUST respond with the iSER HelloReply Message to be
      sent as the first RDMAP Message.  See section 5.1.3 on iSER
      Hello Exchange for more details.

   Note: In the above sequence, the operations described in steps 3
   and 4 must be performed atomically.  Failure to do this may result
   in race conditions.

5.1.3 iSER Hello Exchange

   After the connection transitions into the iSER-assisted mode, the
   first RDMAP Message sent by the iSER Layer at the initiator to the
   target MUST be the iSER Hello Message.  The iSER Hello Message is
   used by the iSER Layer at the initiator to declare iSER parameters
   to the target.
   See section 9.3 on iSER Header Format for iSER Hello Message.

   In response to the iSER Hello Message, the iSER Layer at the target
   MUST return the iSER HelloReply Message as the first RDMAP Message
   sent by the target.  The iSER HelloReply Message is used by the iSER
   Layer at the target to declare iSER parameters to the initiator.
   See section 9.4 on iSER Header Format for iSER HelloReply Message.

   In the iSER Hello Message, the iSER Layer at the initiator declares
   the iSER-IRD value to the target.

   Upon receiving the iSER Hello Message, the iSER Layer at the target
   MUST set the iSER-ORD value to the minimum of the iSER-ORD value at
   the target and the iSER-IRD value declared by the initiator.  The
   iSER Layer at the target MAY adjust (lower) its ORD value to match
   the iSER-ORD value if the iSER-ORD value is smaller than the ORD
   value at the target in order to free up the unused resources.

   In the iSER HelloReply Message, the iSER Layer at the target
   declares the iSER-ORD value to the initiator.

   Upon receiving the iSER HelloReply Message, the iSER Layer at the
   initiator MAY adjust (lower) its IRD value to match the iSER-ORD
   value in order to free up the unused resources, if the iSER-ORD
   value declared by the target is smaller than the iSER-IRD value
   declared by the initiator.

   It is an iSER level negotiation failure if the iSER parameters
   declared in the iSER Hello Message by the initiator are unacceptable
   to the target.  See section 10.1.3.3 on the handling of the error
   situation.

5.2 iSCSI/iSER Connection Termination

5.2.1 Normal Connection Termination at the Initiator

   The iSCSI Layer at the initiator terminates an iSCSI/iSER connection
   normally by invoking the Send_Control Operational Primitive
   qualified with the Logout Request PDU.
The iSER Layer at the 1277 initiator MUST use a SendSE Message to send the Logout Request PDU 1278 to the target. After the iSER Layer at the initiator receives the 1279 SendSE Message containing the Logout Response PDU from the target, 1280 it MUST notify the iSCSI Layer by invoking the Control_Notify 1281 Operational Primitive qualified with the Logout Response PDU. 1283 After the iSCSI logout process is complete, the iSCSI layer at the 1284 target is responsible for closing the iSCSI/iSER connection as 1285 described in Section 5.2.2. After the RDMAP layer at the initiator 1286 reports that the Connection has been closed, the iSER Layer at the 1287 initiator MUST deallocate the iWARP resources for the connection, 1288 deallocate all the task resources (if any) associated with the 1289 connection, invalidate the local mapping(s) (if any) that associate 1290 the ITT(s) used on that connection to the local STag(s), and then 1291 invoke the Connection_Terminate_Notify Operational Primitive to 1292 notify the iSCSI Layer. 1294 5.2.2 Normal Connection Termination at the Target 1296 Upon receiving the SendSE Message containing the Logout Request PDU, 1297 the iSER Layer at the target MUST notify the iSCSI Layer at the 1298 target by invoking the Control_Notify Operational Primitive 1299 qualified with the Logout Request PDU. The iSCSI Layer completes 1300 the logout process by invoking the Send_Control Operational 1301 Primitive qualified with the Logout Response PDU. The iSER Layer at 1302 the target MUST use a SendSE Message to send the Logout Response PDU 1303 to the initiator. After the iSCSI logout process is complete, the 1304 iSCSI Layer at the target MUST invoke the Connection_Terminate 1305 Operational Primitive to request the iSER Layer at the target to 1306 terminate the RDMAP Stream. 1308 As part of the termination process, the RDMAP layer MUST close the 1309 Connection. 
When the RDMAP layer notifies the iSER Layer after the 1310 RDMAP stream and the associated Connection are terminated, the iSER 1311 Layer MUST deallocate the iWARP resources for the connection. In 1312 addition to deallocating the iWARP resources, the iSER Layer at the 1313 target MUST deallocate all the task resources (if any) associated 1314 with the connection, and invalidate the local and remote mapping(s) 1315 (if any) that associate the ITT(s) used on that connection to the 1316 local STag(s) and the Advertised STag(s) respectively. 1318 Ko et. al. Expires January 2005 31 1319 5.2.3 Termination without Logout Request/Response PDUs 1321 5.2.3.1 Connection Termination Initiated by the iSCSI Layer 1323 The Connection_Terminate Operational Primitive MAY be invoked by the 1324 iSCSI Layer to terminate the iSCSI/iSER connection without having 1325 previously exchanged the Logout Request and Logout Response PDUs 1326 between the two iSCSI/iSER nodes. The Connection_Terminate 1327 Operational Primitive requests the iSER Layer to terminate the RDMAP 1328 Stream. As part of the termination process, the RDMAP layer will 1329 close the Connection. When the RDMAP layer notifies the iSER Layer 1330 after the RDMAP stream and the associated Connection are terminated, 1331 the iSER Layer MUST perform the following actions. 1333 If the Connection_Terminate Operational Primitive is invoked by the 1334 iSCSI Layer at the target, then the iSER Layer at the target MUST 1335 deallocate the iWARP resources for the connection, deallocate all 1336 the task resources (if any) associated with the connection, and 1337 invalidate the local and remote mappings (if any) that associate the 1338 ITT(s) used on the connection to the local STag(s) and the 1339 Advertised STag(s) respectively. 
1341 If the Connection_Terminate Operational Primitive is invoked by the 1342 iSCSI Layer at the initiator, then the iSER Layer at the initiator 1343 MUST deallocate the iWARP resources for the connection, deallocate 1344 the task resources (if any) associated with the connection, and 1345 invalidate the local mapping(s) (if any) that associate the ITT(s) 1346 used on the connection to the local STag(s). 1348 5.2.3.2 Connection Termination Notification to the iSCSI Layer 1350 If the iSCSI/iSER connection is terminated without the invocation of 1351 Connection_Terminate from the iSCSI Layer, the iSER Layer MUST 1352 invoke the Connection_Terminate_Notify Operational Primitive to 1353 notify the iSCSI Layer that the iSCSI/iSER connection has been 1354 terminated. 1356 Prior to invoking Connection_Terminate_Notify, the iSER Layer at the 1357 target MUST deallocate the iWARP resources for the connection, 1358 deallocate the task resources (if any) associated with the 1359 connection, and invalidate the local and remote mappings (if any) 1360 that associate the ITT(s) used on the connection to the local 1361 STag(s) and the Advertised STag(s) respectively. 1363 Prior to invoking Connection_Terminate_Notify, the iSER Layer at the 1364 initiator MUST deallocate the iWARP resources for the connection, 1365 deallocate the task resources (if any) associated with the 1367 Ko et. al. Expires January 2005 32 1368 connection, and invalidate the local mappings (if any) that 1369 associate the ITT(s) used on the connection to the local STag(s). 1371 If the remote iSCSI/iSER node initiated the closing of the 1372 Connection (e.g., by sending a TCP FIN or TCP RST), the iSER Layer 1373 MUST invoke the Connection_Terminate_Notify Operational Primitive to 1374 notify the iSCSI Layer after the RDMAP layer reports that the 1375 Connection is closed. 
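The cleanup rules of sections 5.2.3.1 and 5.2.3.2 can be illustrated with a minimal sketch. It is not part of the protocol definition: the IserConnection class, its fields, and on_connection_closed are hypothetical names, and the sketch assumes that Connection_Terminate_Notify is issued only when the local iSCSI Layer did not itself invoke Connection_Terminate.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IserConnection:
    """Hypothetical per-connection state kept by an iSER Layer."""
    role: str  # "initiator" or "target"
    local_map: Dict[int, List[int]] = field(default_factory=dict)   # ITT -> local STag(s)
    remote_map: Dict[int, List[int]] = field(default_factory=dict)  # ITT -> Advertised STag(s), target only
    tasks: List[int] = field(default_factory=list)                  # ITTs with task resources
    iwarp_allocated: bool = True
    events: List[str] = field(default_factory=list)                 # primitives invoked toward iSCSI


def on_connection_closed(conn: IserConnection,
                         terminate_invoked_by_iscsi: bool) -> None:
    """Run when the RDMAP layer reports that the Connection is closed."""
    conn.iwarp_allocated = False      # deallocate the iWARP resources
    conn.tasks.clear()                # deallocate the task resources, if any
    conn.local_map.clear()            # invalidate the local mapping(s)
    if conn.role == "target":
        conn.remote_map.clear()       # only the target holds remote mappings
    if not terminate_invoked_by_iscsi:
        # Cleanup happens before the notification, as section 5.2.3.2 requires.
        conn.events.append("Connection_Terminate_Notify")
```

In this sketch a remotely initiated close (for example, a TCP FIN from the peer) is modeled by calling on_connection_closed with terminate_invoked_by_iscsi set to False.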
1377 Another example of a Connection termination without a preceding 1378 logout is when the iSCSI Layer at the initiator does an implicit 1379 logout (connection reinstatement). 1381 Ko et. al. Expires January 2005 33 1382 6 Login/Text Operational Keys 1384 Certain iSCSI login/text operational keys have restricted usage in 1385 iSER, and additional keys are used to support the iSER protocol 1386 functionality. All other keys defined by [iSCSI] and not discussed 1387 in this section may be used on iSCSI/iSER connections with the same 1388 semantics. 1390 6.1 HeaderDigest and DataDigest 1392 If the RDMAExtensions key is negotiated to "Yes" on the leading 1393 connection of a session, both HeaderDigest and DataDigest MUST be 1394 negotiated to "None" for each connection belonging to that session. 1396 6.2 MaxRecvDataSegmentLength 1398 For an iSCSI connection belonging to a session in which 1399 RDMAExtensions=Yes was negotiated on the leading connection of the 1400 session, MaxRecvDataSegmentLength need not be declared in the Login 1401 Phase. Instead InitiatorRecvDataSegmentLength (as described in 1402 section 6.5) and TargetRecvDataSegmentLength (as described in 1403 section 6.4) keys are negotiated. The values of the local and remote 1404 MaxRecvDataSegmentLength are derived from the 1405 InitiatorRecvDataSegmentLength and TargetRecvDataSegmentLength keys 1406 even if the MaxRecvDataSegmentLength was declared during the login 1407 phase. 1409 In the full feature phase, the initiator MUST consider the value of 1410 its local MaxRecvDataSegmentLength (that it would have declared to 1411 the target) as having the value of InitiatorRecvDataSegmentLength, 1412 and the value of the remote MaxRecvDataSegmentLength (that would 1413 have been declared by the target) as having the value of 1414 TargetRecvDataSegmentLength. 
Similarly, the target MUST consider 1415 the value of its local MaxRecvDataSegmentLength (that it would have 1416 declared to the initiator) as having the value of 1417 TargetRecvDataSegmentLength, and the value of the remote 1418 MaxRecvDataSegmentLength (that would have been declared by the 1419 initiator) as having the value of InitiatorRecvDataSegmentLength. 1421 The MaxRecvDataSegmentLength key is applicable only for iSCSI 1422 control-type PDUs. 1424 6.3 RDMAExtensions 1426 Use: LO (leading only) 1428 Senders: Initiator and Target 1430 Ko et. al. Expires January 2005 34 1431 Scope: SW (session-wide) 1433 RDMAExtensions= 1435 Irrelevant when: SessionType=Discovery 1437 Default is No 1439 Result function is AND 1441 This key is used by the initiator and the target to negotiate the 1442 support for iSER-assisted mode. To enable the use of iSER-assisted 1443 mode, both the initiator and the target MUST exchange 1444 RDMAExtensions=Yes. iSER-assisted mode MUST NOT be used if either 1445 the initiator or the target offers RDMAExtensions=No. 1447 An iSER-enabled node is not required to initiate the RDMAExtensions 1448 key exchange if it prefers to operate in the traditional iSCSI mode. 1449 However, if the RDMAExtensions key is to be negotiated, it MUST be 1450 offered only on the initial Login Request PDU or Login Response PDU 1451 of the leading connection, and if offered, the response MUST be sent 1452 in the immediately following Login Response or Login Request PDU 1453 respectively. The key must precede any other login keys which may 1454 be affected by the outcome of the negotiation of the RDMAExtensions 1455 key. 
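As an illustration only, the AND result function of the RDMAExtensions key and the digest restriction of section 6.1 could be checked as sketched below. The function names are hypothetical and not defined by this document; a side that never sends the key is modeled as None and treated as the default value No.

```python
from typing import Optional


def negotiate_rdma_extensions(initiator_offer: Optional[str],
                              target_offer: Optional[str]) -> bool:
    """Return True only if both sides exchanged RDMAExtensions=Yes.

    None models a side that did not send the key; this sketch treats it
    as the default value "No".
    """
    # Result function is AND: iSER-assisted mode needs "Yes" from both.
    return initiator_offer == "Yes" and target_offer == "Yes"


def digest_settings_valid(rdma_extensions: bool,
                          header_digest: str,
                          data_digest: str) -> bool:
    """Check the section 6.1 rule for one connection of the session."""
    if not rdma_extensions:
        return True  # no iSER restriction applies in traditional iSCSI mode
    return header_digest == "None" and data_digest == "None"
```

For example, negotiate_rdma_extensions("Yes", "No") evaluates to False, so the session stays in traditional iSCSI mode.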
1457 6.4 TargetRecvDataSegmentLength 1459 Use: IO (Initialize only) 1461 Senders: Initiator and Target 1463 Scope: CO (connection-only) 1465 Irrelevant when: RDMAExtensions=No 1467 TargetRecvDataSegmentLength= 1469 Default is 8192 bytes 1471 Result function is minimum 1473 This key is relevant only for the iSCSI connection of an iSCSI 1474 session if RDMAExtensions=Yes was negotiated on the leading 1475 connection of the session. It is used by the initiator and the 1476 target to negotiate the maximum size of the data segment that an 1477 initiator may send to the target in an iSCSI control-type PDU. For 1479 Ko et. al. Expires January 2005 35 1480 SCSI Command PDUs and SCSI Data-out PDUs containing non-immediate 1481 unsolicited data to be sent by the initiator, the initiator MUST 1482 send all non-Final PDUs with a data segment size of exactly 1483 TargetRecvDataSegmentLength whenever the PDUs constitute a data 1484 sequence whose size is larger than TargetRecvDataSegmentLength. 1486 6.5 InitiatorRecvDataSegmentLength 1488 Use: IO (Initialize only) 1490 Senders: Initiator and Target 1492 Scope: CO (connection-only) 1494 Irrelevant when: RDMAExtensions=No 1496 InitiatorRecvDataSegmentLength= 1498 Default is 8192 bytes 1500 Result function is minimum 1502 This key is relevant only for the iSCSI connection of an iSCSI 1503 session if RDMAExtensions=Yes was negotiated on the leading 1504 connection of the session. It is used by the initiator and the 1505 target to negotiate the maximum size of the data segment that a 1506 target may send to the initiator in an iSCSI control-type PDU. 1508 6.6 OFMarker and IFMarker 1510 If the RDMAExtensions key is negotiated to "Yes" on the leading 1511 connection of a session, both OFMarker and IFMarker MUST be 1512 negotiated to "No" for each connection belonging to that session if 1513 they are negotiated. 1515 Ko et. al. 
Expires January 2005 36 1516 7 iSCSI PDU Considerations 1518 When a connection is in the iSER-assisted mode, two types of message 1519 transfers are allowed between the iSCSI Layer at the initiator and 1520 the iSCSI Layer at the target. These are known as the iSCSI data- 1521 type PDUs and the iSCSI control-type PDUs and these terms are 1522 described in the following sections. 1524 7.1 iSCSI Data-Type PDU 1526 An iSCSI data-type PDU is defined as an iSCSI PDU that causes data 1527 transfer, transparent to the remote iSCSI layer, to take place 1528 between the peer iSCSI nodes in the full feature phase of an 1529 iSCSI/iSER connection. An iSCSI data-type PDU, when requested for 1530 transmission by the iSCSI Layer in the sending node, results in the 1531 data being transferred without the participation of the iSCSI Layers 1532 at the sending and the receiving nodes. This is due to the fact 1533 that the PDU itself is not delivered as-is to the iSCSI Layer in the 1534 receiving node. Instead, the data transfer operations are 1535 transformed into the appropriate RDMA operations which are handled 1536 by the RNIC. The set of iSCSI data-type PDUs consists of SCSI Data- 1537 in PDUs and R2T PDUs. 1539 If the invocation of the Operational Primitive by the iSCSI Layer to 1540 request the iSER Layer to process an iSCSI data-type PDU is 1541 qualified with Notify_Enable set, then upon completing the RDMA 1542 operation, the iSER Layer at the target MUST notify the iSCSI Layer 1543 at the target by invoking the Data_Completion_Notify Operational 1544 Primitive qualified with ITT and SN. There is no data completion 1545 notification at the initiator since the RDMA operations are 1546 completely handled by the RNIC at the initiator and the iSER Layer 1547 at the initiator is not involved with the data transfer associated 1548 with iSCSI data-type PDUs. 
1550 If the invocation of the Operational Primitive by the iSCSI Layer to 1551 request the iSER Layer to process an iSCSI data-type PDU is 1552 qualified with Notify_Enable cleared, then upon completing the RDMA 1553 operation, the iSER Layer at the target MUST NOT notify the iSCSI 1554 Layer at the target and MUST NOT invoke the Data_Completion_Notify 1555 Operational Primitive. 1557 If an operation associated with an iSCSI data-type PDU fails for any 1558 reason, the contents of the Data Sink buffers associated with the 1559 operation are considered indeterminate. 1561 Ko et. al. Expires January 2005 37 1562 7.2 iSCSI Control-Type PDU 1564 Any iSCSI PDU that is not an iSCSI data-type PDU and also not a SCSI 1565 Data-out PDU carrying solicited data is defined as an iSCSI control- 1566 type PDU. The iSCSI Layer invokes the Send_Control Operational 1567 Primitive to request the iSER Layer to process an iSCSI control-type 1568 PDU. iSCSI control-type PDUs are transferred using RDMAP Send 1569 Message Types. Specifically, it is to be noted that SCSI Data-Out 1570 PDUs carrying unsolicited data are defined as iSCSI control-type 1571 PDUs. See section 7.3.4 on the treatment of SCSI Data-out PDUs. 1573 When the iSER Layer receives an iSCSI control-type PDU, it MUST 1574 notify the iSCSI Layer by invoking the Control_Notify Operational 1575 Primitive qualified with the iSCSI control-type PDU. 1577 7.3 iSCSI PDUs 1579 This section describes the handling of each of the iSCSI PDU types 1580 by the iSER Layer. The iSCSI Layer requests the iSER Layer to 1581 process the iSCSI PDU by invoking the appropriate Operational 1582 Primitive. A Connection_Handle MUST qualify each of these 1583 invocations. In addition, BHS and the optional AHS of the iSCSI PDU 1584 as defined in [iSCSI] MUST qualify each of the invocations. The 1585 qualifying Connection_Handle, the BHS and the AHS are not explicitly 1586 listed in the subsequent sections. 
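The definitions in sections 7.1 and 7.2 can be summarized by a small classifier. This is a sketch under stated assumptions: the string labels and the function name classify_pdu are hypothetical, and a solicited SCSI Data-out PDU is recognized by a TTT other than 0xffffffff, as noted in section 7.3.4.

```python
RESERVED_TTT = 0xFFFFFFFF  # TTT value carried by unsolicited SCSI Data-out PDUs


def classify_pdu(pdu_type: str, ttt: int = RESERVED_TTT) -> str:
    """Classify an iSCSI PDU for a connection in iSER-assisted mode."""
    if pdu_type in ("SCSI Data-in", "R2T"):
        # Transformed into RDMA operations and handled by the RNIC.
        return "data-type"
    if pdu_type == "SCSI Data-out" and ttt != RESERVED_TTT:
        # Solicited data: neither data-type nor control-type by definition.
        return "solicited-data-out"
    # Everything else, including unsolicited SCSI Data-out, is carried
    # in RDMAP Send Message Types.
    return "control-type"
```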
1588 7.3.1 SCSI Command 1590 The SCSI Command PDU is an iSCSI control-type PDU as described in 1591 section 7.2. The iSER Layer at the initiator MUST send the SCSI 1592 command in a SendSE Message to the target. 1594 For a SCSI Write or bidirectional command, the iSCSI Layer at the 1595 initiator MUST invoke the Send_Control Operational Primitive 1596 qualified with ImmediateDataSize, UnsolicitedDataSize, and 1597 DataDescriptorOut. 1599 * If there is immediate data to be transferred for the SCSI write 1600 or bidirectional command, the qualifier ImmediateDataSize defines 1601 the number of bytes of immediate unsolicited data to be sent with 1602 the write or bidirectional command, and the qualifier 1603 DataDescriptorOut defines the initiator's I/O Buffer containing 1604 the SCSI Write data. 1606 * If there is unsolicited data to be transferred for the SCSI Write 1607 or bidirectional command, the qualifier UnsolicitedDataSize 1608 defines the number of bytes of immediate and non-immediate 1610 Ko et. al. Expires January 2005 38 1611 unsolicited data for the command. The iSCSI Layer will issue one 1612 or more SCSI Data-out PDUs for the non-immediate unsolicited 1613 data. See Section 7.3.4 on SCSI Data-out. 1615 * If there is solicited data to be transferred for the SCSI Write 1616 or bidirectional command, as indicated by the Expected Data 1617 Transfer Length in the SCSI Command PDU exceeding the value of 1618 UnsolicitedDataSize, the iSER Layer at the initiator MUST do the 1619 following: 1621 a. It MUST allocate a Write STag for the I/O Buffer defined by 1622 the qualifier DataDescriptorOut. The DataDescriptorOut 1623 describes the I/O buffer starting with the immediate 1624 unsolicited data (if any), followed by the non-immediate 1625 unsolicited data (if any) and solicited data. This means 1626 that the BufferOffset for the SCSI Data-out for this command 1627 is equal to the TO. 
This implies that a TO of zero for this STag 1628 points to the beginning of this I/O Buffer. 1630 b. It MUST establish a local mapping that associates the 1631 Initiator Task Tag (ITT) to the Write STag. 1633 c. It MUST Advertise the Write STag to the target by sending it 1634 as the Write STag in the iSER header of the iSER Message 1635 (the payload of the RDMAP SendSE Message) containing the 1636 SCSI Write or bidirectional command PDU. See section 9.2 on 1637 iSER Header Format for iSCSI Control-Type PDU. 1639 For a SCSI Read or bidirectional command, the iSCSI Layer at the 1640 initiator MUST invoke the Send_Control Operational Primitive 1641 qualified with DataDescriptorIn which defines the initiator's I/O 1642 Buffer for receiving the SCSI Read data. The iSER Layer at the 1643 initiator MUST do the following: 1645 a. It MUST allocate a Read STag for the I/O Buffer. 1647 b. It MUST establish a local mapping that associates the 1648 Initiator Task Tag (ITT) to the Read STag. 1650 c. It MUST Advertise the Read STag to the target by sending it 1651 as the Read STag in the iSER header of the iSER Message (the 1652 payload of the RDMAP SendSE Message) containing the SCSI 1653 Read or bidirectional command PDU. See section 9.2 on iSER 1654 Header Format for iSCSI Control-Type PDU. 1656 If the amount of unsolicited data to be transferred in a SCSI 1657 Command exceeds TargetRecvDataSegmentLength, then the iSCSI Layer at 1660 the initiator MUST segment the data into multiple iSCSI control-type 1661 PDUs, with the data segment length in all PDUs generated except the 1662 last one having exactly the size TargetRecvDataSegmentLength. The 1663 data segment length of the last iSCSI control-type PDU carrying the 1664 unsolicited data can be up to TargetRecvDataSegmentLength.
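The segmentation rule above can be sketched as follows. The function name and its arguments are hypothetical; the sketch only computes the data segment length of each control-type PDU and does not build the PDUs themselves.

```python
from typing import List


def segment_unsolicited_data(total_len: int, target_recv_dsl: int) -> List[int]:
    """Return the data segment length of each PDU carrying unsolicited data.

    Every PDU except the last carries exactly target_recv_dsl bytes; the
    last one carries the remainder, which is at most target_recv_dsl.
    """
    if total_len < 0 or target_recv_dsl <= 0:
        raise ValueError("lengths must be non-negative and the limit positive")
    full_pdus, remainder = divmod(total_len, target_recv_dsl)
    lengths = [target_recv_dsl] * full_pdus
    if remainder:
        lengths.append(remainder)
    return lengths
```

With the default TargetRecvDataSegmentLength of 8192 bytes, 20000 bytes of unsolicited data would be split into segments of 8192, 8192, and 3616 bytes.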
1666 When the iSER Layer at the target receives the SCSI Command, it MUST 1667 establish a remote mapping that associates the ITT to the Advertised 1668 Write STag and the Read STag if present in the iSER header. The 1669 Write STag is used by the iSER Layer at the target in handling the 1670 data transfer associated with the R2T PDU(s) as described in section 1671 7.3.6. The Read STag is used in handling the SCSI Data-in PDU(s) 1672 from the iSCSI Layer at the target as described in section 7.3.5. 1674 7.3.2 SCSI Response 1676 The SCSI Response PDU is an iSCSI control-type PDU as described in 1677 section 7.2. The iSCSI Layer at the target MUST invoke the 1678 Send_Control Operational Primitive qualified with 1679 DataDescriptorStatus which defines the buffer containing the sense 1680 and response information. The iSCSI Layer at the target MUST always 1681 return the SCSI status for a SCSI command in a separate SCSI 1682 Response PDU. "Phase collapse" for transferring SCSI status in a 1683 SCSI Data-in PDU MUST NOT be used. The iSER Layer at the target 1684 sends the SCSI Response PDU according to the following rules: 1686 * If no STags were Advertised by the initiator in the iSER Message 1687 containing the SCSI command PDU, then the iSER Layer at the 1688 target MUST send a SendSE Message containing the SCSI Response 1689 PDU. 1691 * If the initiator Advertised a Read STag in the iSER Message 1692 containing the SCSI Command PDU, then the iSER Layer at the 1693 target MUST send a SendInvSE Message containing the SCSI Response 1694 PDU. The RDMAP header of the SendInvSE Message MUST carry the 1695 Read STag to be invalidated at the initiator. 1697 * If the initiator Advertised only the Write STag in the iSER 1698 Message containing the SCSI command PDU, then the iSER Layer at 1699 the target MUST send a SendInvSE Message containing the SCSI 1700 Response PDU. 
The RDMAP header of the SendInvSE Message MUST 1701 carry the Write STag to be invalidated at the initiator. 1703 When the iSCSI Layer at the target invokes the Send_Control 1704 Operational Primitive to send the SCSI Response PDU, the iSER Layer 1705 at the target MUST invalidate the remote mapping that associates the 1707 Ko et. al. Expires January 2005 40 1708 ITT to the Advertised STag(s) before transferring the SCSI Response 1709 PDU to the initiator. 1711 Upon receiving the SendInvSE Message containing the SCSI Response 1712 PDU from the target, the RDMAP layer at the initiator will 1713 invalidate the STag specified in the RDMAP header. The iSER Layer at 1714 the initiator MUST ensure that the correct STag is invalidated. If 1715 both the Read and the Write STags were Advertised earlier by the 1716 initiator, then the iSER Layer at the initiator MUST explicitly 1717 invalidate the Write STag upon receiving the SendInvSE Message 1718 because the RDMAP header of the SendInvSE Message can only carry one 1719 STag (in this case the Read STag) to be invalidated. 1721 The iSER Layer at the initiator MUST ensure the invalidation of the 1722 STag(s) used in a command before invoking the Control_Notify 1723 Operational Primitive qualified with the SCSI Response to notify the 1724 iSCSI Layer at the initiator. This precludes the possibility of 1725 using the STag(s) after the completion of the command thereby 1726 causing data corruption. 1728 When the iSER Layer at the initiator receives the SendSE or the 1729 SendInvSE Message containing the SCSI Response PDU, it SHOULD 1730 invalidate the local mapping that associates the ITT to the local 1731 STag(s). The iSER Layer MUST ensure that all local STag(s) 1732 associated with the ITT are invalidated before invoking the 1733 Control_Notify Operational Primitive to notify the iSCSI Layer of 1734 the SCSI Response PDU. 
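The rules of section 7.3.2 for choosing the message type and the STag to invalidate can be sketched as shown below. The function names are hypothetical, STags are modeled as plain integers, and None stands for an STag that was not Advertised.

```python
from typing import List, Optional, Tuple


def select_response_message(read_stag: Optional[int],
                            write_stag: Optional[int]) -> Tuple[str, Optional[int]]:
    """Target side: pick the RDMAP message type for the SCSI Response PDU
    and the STag carried in its RDMAP header, if any."""
    if read_stag is not None:
        return ("SendInvSE", read_stag)   # Read STag takes the single slot
    if write_stag is not None:
        return ("SendInvSE", write_stag)  # only the Write STag was Advertised
    return ("SendSE", None)               # no STags were Advertised


def stags_to_invalidate_locally(read_stag: Optional[int],
                                write_stag: Optional[int],
                                invalidated_by_rdmap: Optional[int]) -> List[int]:
    """Initiator side: STags the iSER Layer must still invalidate itself
    before invoking Control_Notify."""
    return [s for s in (read_stag, write_stag)
            if s is not None and s != invalidated_by_rdmap]
```

For a bidirectional command with both STags Advertised, the target sends a SendInvSE Message carrying the Read STag, and the initiator is left to invalidate the Write STag explicitly.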
1736 7.3.3 Task Management Function Request/Response 1738 The Task Management Function Request/Response PDUs are iSCSI 1739 control-type PDUs as described in section 7.2. The iSER Layer MUST 1740 use a SendSE Message to send the Task Management Function Request 1741 /Response PDU. 1743 For the Task Management Function Request with the TASK REASSIGN 1744 function, the iSER Layer at the initiator MUST do the following: 1746 * It MUST use the ITT as specified in the Referenced Task Tag from 1747 the Task Management Function Request PDU to locate the existing 1748 STag(s), if any, in the local mapping(s) that associates the ITT 1749 to the local STag(s). 1751 * It MUST invalidate the existing STag(s), if any, and the local 1752 mapping(s) that associates the ITT to the local STag(s). 1754 Ko et. al. Expires January 2005 41 1755 * It MUST allocate a Read STag for the I/O Buffer as defined by the 1756 qualifier DataDescriptorIn if the Send_Control Operational 1757 Primitive invocation is qualified with DataDescriptorIn. 1759 * It MUST allocate a Write STag for the I/O Buffer as defined by 1760 the qualifier DataDescriptorOut if the Send_Control Operational 1761 Primitive invocation is qualified with DataDescriptorOut. 1763 * If STags are allocated, it MUST establish new local mapping(s) 1764 that associate the ITT to the allocated STag(s). 1766 * It MUST Advertise the STags, if allocated, to the target in the 1767 iSER header of the SendSE Message carrying the iSCSI PDU, as 1768 described in section 9.2. 1770 For the Task Management Function Request with the TASK REASSIGN 1771 function for a SCSI Read or bidirectional command, the iSCSI Layer 1772 at the initiator MUST set ExpDataSN to 0 since the data transfer and 1773 acknowledgements happen transparently to the iSCSI Layer at the 1774 initiator. This provides the flexibility to the iSCSI Layer at the 1775 target to request transmission of only the unacknowledged data as 1776 specified in [iSCSI]. 
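The initiator-side steps for TASK REASSIGN listed above can be sketched as follows. This is an illustration under stated assumptions: the mapping layout, the function name reassign_task, and the stag_source iterator that stands in for STag allocation are all hypothetical.

```python
import itertools
from typing import Dict, Iterator

# ITT -> {"read": STag, "write": STag}; either entry may be absent.
STagMap = Dict[int, Dict[str, int]]


def reassign_task(local_map: STagMap,
                  referenced_task_tag: int,
                  has_data_in: bool,
                  has_data_out: bool,
                  stag_source: Iterator[int]) -> Dict[str, int]:
    """Replace the STag(s) mapped to the ITT named by the Referenced Task Tag.

    Returns the newly allocated STag(s), which would then be Advertised in
    the iSER header of the SendSE Message.
    """
    # Locate and invalidate the existing STag(s) and mapping, if any.
    local_map.pop(referenced_task_tag, None)
    new_stags: Dict[str, int] = {}
    if has_data_in:    # invocation qualified with DataDescriptorIn
        new_stags["read"] = next(stag_source)
    if has_data_out:   # invocation qualified with DataDescriptorOut
        new_stags["write"] = next(stag_source)
    if new_stags:
        local_map[referenced_task_tag] = new_stags
    return new_stags
```

A caller might pass itertools.count(0x100) as stag_source to obtain distinct placeholder STag values.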
1778 When the iSER Layer at the target receives the Task Management 1779 Function Request with the TASK REASSIGN function, it MUST do the 1780 following: 1782 * It MUST use the ITT as specified in the Referenced Task Tag from 1783 the Task Management Function Request PDU to locate the mappings 1784 that associate the ITT to the Advertised STag(s) and the local 1785 STag(s), if any. 1787 * It MUST invalidate the local STag(s), if any, associated with the 1788 ITT. 1790 * It MUST replace the Advertised STag(s) in the remote mapping that 1791 associates the ITT to the Advertised STag(s) with the Write STag 1792 and the Read STag if present in the iSER header. The Write STag 1793 is used in the handling of the R2T PDU(s) from the iSCSI Layer at 1794 the target as described in section 7.3.6. The Read STag is used 1795 in the handling of the SCSI Data-in PDU(s) from the iSCSI Layer 1796 at the target as described in section 7.3.5. 1798 7.3.4 SCSI Data-out 1800 SCSI Data-out PDUs for unsolicited SCSI Write data are iSCSI 1801 control-type PDUs as described in section 7.2. The iSCSI Layer at 1804 the initiator MUST invoke the Send_Control Operational Primitive 1805 qualified with DataDescriptorOut which defines the initiator's I/O 1806 Buffer containing the unsolicited SCSI Write data. 1808 If the amount of unsolicited data to be transferred as SCSI Data-out 1809 exceeds TargetRecvDataSegmentLength, then the iSCSI Layer at the 1810 initiator MUST segment the data into multiple iSCSI control-type 1811 PDUs, with the DataSegmentLength having the value of 1812 TargetRecvDataSegmentLength in all PDUs generated except the last 1813 one. The DataSegmentLength of the last iSCSI control-type PDU 1814 carrying the unsolicited data can be up to 1815 TargetRecvDataSegmentLength. The iSCSI Layer at the target MUST 1816 perform the reassembly function for the unsolicited data.
1818 For unsolicited data, if the F bit is set to 0 in a SCSI Data-out 1819 PDU, the iSER Layer at the initiator MUST use a Send Message to send 1820 the SCSI Data-out PDU. If the F bit is set to 1, the iSER Layer at the 1821 initiator MUST use a SendSE Message to send the SCSI Data-out PDU. 1823 Solicited SCSI Write Data are handled using the R2T mechanism as 1824 described in section 7.3.6. Therefore SCSI Data-out PDUs for 1825 solicited data should never be requested for transmission by the 1826 iSCSI Layer at the initiator. However, if a solicited SCSI Data-out 1827 PDU (i.e., one with TTT != 0xffffffff) is inadvertently requested for 1828 transmission by the iSCSI Layer at the initiator, the iSER Layer at 1829 the initiator is not required to distinguish it as such. The iSER 1830 Layer at the initiator in such a case MAY treat it as an iSCSI 1831 control-type PDU and handle it as unsolicited data. 1833 7.3.5 SCSI Data-in 1835 SCSI Data-in PDUs are iSCSI data-type PDUs. When the iSCSI Layer at 1836 the target is ready to return the SCSI Read data to the initiator, 1837 it MUST invoke the Put_Data Operational Primitive qualified with 1838 DataDescriptorIn which defines the SCSI Data-in buffer. See section 1839 7.1 on the general requirements on the handling of iSCSI data-type 1840 PDUs. SCSI Data-in PDU(s) are used in SCSI Read data transfer as 1841 described in section 9.5.2. 1843 The iSER Layer at the target MUST do the following for each 1844 invocation of the Put_Data Operational Primitive: 1846 1. It MUST use the ITT in the SCSI Data-in PDU to locate the remote 1847 Read STag in the remote mapping that associates the ITT to 1848 Advertised STag(s). The remote mapping was established earlier 1849 by the iSER Layer at the target when the SCSI Read Command was 1850 received from the initiator. 1853 2. It MUST generate and send an RDMA Write Message containing the 1854 read data to the initiator. 1856 a.
It MUST use the remote Read STag as the Data Sink STag of 1857 the RDMA Write Message. 1859 b. It MUST use the Buffer Offset from the SCSI Data-in PDU as 1860 the Data Sink Tagged Offset of the RDMA Write Message. 1862 c. It MUST use DataSegmentLength from the SCSI Data-in PDU to 1863 determine the amount of data to be sent in the RDMA Write 1864 Message. 1866 3. It MUST associate DataSN and ITT from the SCSI Data-in PDU with 1867 the RDMA Write operation. If the Put_Data Operational Primitive 1868 invocation was qualified with Notify_Enable set, then when the 1869 iSER Layer at the target receives a completion from the RDMAP 1870 layer for the RDMA Write Message, the iSER Layer at the target 1871 MUST notify the iSCSI Layer by invoking the 1872 Data_Completion_Notify Operational Primitive qualified with 1873 DataSN and ITT. Conversely, if the Put_Data Operational 1874 Primitive invocation was qualified with Notify_Enable cleared, 1875 then the iSER Layer at the target MUST NOT notify the iSCSI 1876 Layer on completion and MUST NOT invoke the 1877 Data_Completion_Notify Operational Primitive. 1879 When the A-bit is set to 1 in the SCSI Data-in PDU, the iSER Layer 1880 at the target MUST notify the iSCSI Layer at the target when the 1881 data transfer is complete at the initiator. To perform this 1882 additional function, the iSER Layer at the target can take advantage 1883 of the operational ErrorRecoveryLevel if previously disclosed by the 1884 iSCSI Layer via an earlier invocation of the Notice_Key_Values 1885 Operational Primitive. There are two approaches that can be taken: 1887 1. If the iSER Layer at the target knows that the operational 1888 ErrorRecoveryLevel is 2, or if the iSER Layer at the target does 1889 not know the operational ErrorRecoveryLevel, then the iSER Layer 1890 at the target MUST issue a zero-length RDMA Read Message 1891 following the RDMA Write Message. 
When the iSER Layer at the 1892 target receives a completion for the RDMA Read Message from the 1893 RDMAP layer, implying that the initiator RNIC has completed 1894 processing of the RDMA Write Message due to the completion 1895 ordering semantics of RDMAP, the iSER Layer at the target MUST 1896 invoke the Data_Ack_Notify Operational Primitive qualified with 1897 ITT and DataSN to notify the iSCSI Layer at the target. 1899 Ko et. al. Expires January 2005 44 1900 2. If the iSER Layer at the target knows that the operational 1901 ErrorRecoveryLevel is 1, then the iSER Layer at the target MUST 1902 do one of the following: 1904 a. It MUST invoke the Data_Ack_Notify Operational Primitive 1905 qualified with ITT and DataSN when it receives the local 1906 completion from the RDMAP layer for the RDMA Write Message. 1907 This is allowed since digest errors do not occur in iSER 1908 (see section 10.1.4.2) and a CRC error will cause the 1909 connection to be terminated and the task to be terminated 1910 anyway. The local RDMA Write completion from the RDMAP 1911 layer guarantees that the RDMAP layer will not access the 1912 I/O Buffer again to transfer the data associated with that 1913 RDMA Write operation. 1915 b. Alternatively, it MUST use the same procedure for handling 1916 the data transfer completion at the initiator as for 1917 ErrorRecoveryLevel 2. 1919 It should be noted that the iSCSI Layer at the target cannot set the 1920 A-bit to 1 if the ErrorRecoveryLevel=0. 1922 SCSI status MUST always be returned in a separate SCSI Response PDU. 1923 The S bit in the SCSI Data-in PDU MUST always be set to 0. There 1924 MUST NOT be a "phase collapse" in the SCSI Data-in PDU. 
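The A-bit handling described earlier in this section can be summarized by a small decision function. It is a sketch only: the function name, the returned labels, and the prefer_local_completion flag (which selects between options 2a and 2b) are hypothetical, and None models an ErrorRecoveryLevel that was never disclosed through Notice_Key_Values.

```python
from typing import Optional


def data_ack_strategy(error_recovery_level: Optional[int],
                      prefer_local_completion: bool = True) -> str:
    """Decide what triggers Data_Ack_Notify when the A-bit is set to 1."""
    if error_recovery_level == 0:
        # The iSCSI Layer cannot set the A-bit at ErrorRecoveryLevel 0.
        raise ValueError("A-bit cannot be set to 1 when ErrorRecoveryLevel=0")
    if error_recovery_level is None or error_recovery_level == 2:
        # Approach 1: zero-length RDMA Read following the RDMA Write.
        return "zero_length_rdma_read"
    if error_recovery_level == 1:
        # Approach 2a (local RDMA Write completion) or 2b (same as level 2).
        if prefer_local_completion:
            return "local_write_completion"
        return "zero_length_rdma_read"
    raise ValueError("unknown ErrorRecoveryLevel")
```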
1926 Since the RDMA Write Message only transfers the data portion of the 1927 SCSI Data-in PDU but not the control information in the header, such 1928 as ExpCmdSN, if timely updates of such information are crucial, the 1929 iSCSI Layer at the initiator MAY issue NOP-out PDUs to request the 1930 iSCSI Layer at the target to respond with the information using 1931 NOP-in PDUs. 1933 7.3.6 Ready To Transfer (R2T) 1935 The R2T PDU is an iSCSI data-type PDU. In order to send an R2T PDU, 1936 the iSCSI Layer at the target MUST invoke the Get_Data Operational 1937 Primitive qualified with DataDescriptorOut which defines the I/O 1938 Buffer for receiving the SCSI Write data from the initiator. See 1939 section 7.1 on the general requirements on the handling of iSCSI 1940 data-type PDUs. 1942 The iSER Layer at the target MUST do the following for each 1943 invocation of the Get_Data Operational Primitive: 1945 1. It MUST ensure a valid local STag for the I/O Buffer and a valid 1946 local mapping that associates the Initiator Task Tag (ITT) to 1949 the local STag. This may involve allocating a valid local STag 1950 and establishing a local mapping. 1952 2. It MUST use the ITT in the R2T to locate the remote Write STag 1953 in the remote mapping that associates the ITT to Advertised 1954 STag(s). The remote mapping was established earlier by the iSER 1955 Layer at the target when the iSER Message containing the 1956 Advertised Write STag and the SCSI Command PDU for a SCSI Write 1957 or bidirectional command was received from the initiator. 1959 3. If the iSER-ORD value at the target is set to 0, the iSER Layer 1960 at the target MUST terminate the connection and free up the 1961 resources associated with the connection (as described in 5.2.3) 1962 if it receives the R2T PDU from the iSCSI Layer at the target.
      Upon termination of the connection, the iSER Layer at the target
      MUST notify the iSCSI Layer at the target using the
      Connection_Terminate_Notify Operational Primitive.

   4. If the iSER-ORD value at the target is set to greater than 0,
      the iSER Layer at the target MUST transform the R2T PDU into an
      RDMA Read Request Message. While transforming the R2T PDU, the
      iSER Layer at the target MUST ensure that the number of
      outstanding RDMA Read Request Messages does not exceed the
      iSER-ORD value. To transform the R2T PDU, the iSER Layer at the
      target:

      a. MUST derive the local STag and local Tagged Offset from the
         DataDescriptorOut that qualified the Get_Data invocation.

      b. MUST use the local STag as the Data Sink STag of the RDMA
         Read Request Message.

      c. MUST use the local Tagged Offset as the Data Sink Tagged
         Offset of the RDMA Read Request Message.

      d. MUST use the Desired Data Transfer Length from the R2T PDU
         as the RDMA Read Message Size of the RDMA Read Request
         Message.

      e. MUST use the remote Write STag as the Data Source STag of
         the RDMA Read Request Message.

      f. MUST use the Buffer Offset from the R2T PDU as the Data
         Source Tagged Offset of the RDMA Read Request Message.

   5. It MUST associate R2TSN and ITT from the R2T PDU with the RDMA
      Read operation. If the Get_Data Operational Primitive
      invocation was qualified with Notify_Enable set, then when the
      iSER Layer at the target receives a completion from the RDMAP
      layer for the RDMA Read operation, the iSER Layer at the target
      MUST notify the iSCSI Layer by invoking the
      Data_Completion_Notify Operational Primitive qualified with
      R2TSN and ITT.
      Conversely, if the Get_Data Operational Primitive invocation
      was qualified with Notify_Enable cleared, then the iSER Layer
      at the target MUST NOT notify the iSCSI Layer on completion and
      MUST NOT invoke the Data_Completion_Notify Operational
      Primitive.

   When the RDMAP layer at the initiator receives a valid RDMA Read
   Request Message, it will return an RDMA Read Response Message
   containing the solicited write data to the target. When the RDMAP
   layer at the target receives the RDMA Read Response Message from the
   initiator, it will place the solicited data in the I/O Buffer
   referenced by the Data Sink STag in the RDMA Read Response Message.

   Since the RDMA Read Request Message from the target does not
   transfer the control information in the R2T PDU such as ExpCmdSN, if
   timely updates of such information are crucial, the iSCSI Layer at
   the initiator MAY issue NOP-out PDUs to request the iSCSI Layer at
   the target to respond with the information using NOP-in PDUs.

   Similarly, since the RDMA Read Response Message from the initiator
   only transfers the data but not the control information normally
   found in the SCSI Data-out PDU, such as ExpStatSN, if timely updates
   of such information are crucial, the iSCSI Layer at the target MAY
   issue NOP-in PDUs to request the iSCSI Layer at the initiator to
   respond with the information using NOP-out PDUs.

7.3.7 Asynchronous Message

   The Asynchronous Message PDU is an iSCSI control-type PDU as
   described in section 7.2. The iSCSI Layer MUST invoke the
   Send_Control Operational Primitive qualified with
   DataDescriptorSense which defines the buffer containing the sense
   and iSCSI Event information. The iSER Layer MUST use a SendSE
   Message to send the Asynchronous Message PDU.
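
   Looking back at the R2T handling in section 7.3.6, the
   transformation of an R2T PDU into an RDMA Read Request Message
   (steps 3 through 5) can be sketched roughly as follows. This is an
   illustration only, not part of the specification: the class names,
   the field names, and the choice to return None when the iSER-ORD
   limit has been reached are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class R2T:
    """Fields of the R2T PDU that the transformation uses."""
    itt: int
    r2tsn: int
    buffer_offset: int
    desired_data_transfer_length: int


@dataclass
class DataDescriptorOut:
    """Hypothetical descriptor: local STag and Tagged Offset of the I/O Buffer."""
    local_stag: int
    local_tagged_offset: int


@dataclass
class RdmaReadRequest:
    sink_stag: int
    sink_offset: int
    read_size: int
    source_stag: int
    source_offset: int


@dataclass
class TargetIser:
    iser_ord: int
    write_stags: dict = field(default_factory=dict)  # ITT -> Advertised Write STag
    outstanding: dict = field(default_factory=dict)  # (ITT, R2TSN) -> Notify_Enable

    def get_data(self, r2t, desc, notify_enable=True):
        if self.iser_ord == 0:
            # Step 3: no RDMA Read resources, so the connection is terminated.
            raise ConnectionError("iSER-ORD is 0: terminate the connection")
        if len(self.outstanding) >= self.iser_ord:
            # Step 4: never exceed iSER-ORD. Deferring is an assumed policy.
            return None
        request = RdmaReadRequest(
            sink_stag=desc.local_stag,                       # 4b
            sink_offset=desc.local_tagged_offset,            # 4c
            read_size=r2t.desired_data_transfer_length,      # 4d
            source_stag=self.write_stags[r2t.itt],           # 4e (step 2 lookup)
            source_offset=r2t.buffer_offset,                 # 4f
        )
        # Step 5: associate R2TSN and ITT with the RDMA Read operation.
        self.outstanding[(r2t.itt, r2t.r2tsn)] = notify_enable
        return request

    def on_read_completion(self, itt, r2tsn):
        # Notify the iSCSI Layer only if Notify_Enable was set.
        notify = self.outstanding.pop((itt, r2tsn))
        return ("Data_Completion_Notify", itt, r2tsn) if notify else None
```

   In this sketch a caller that receives None would retry the Get_Data
   invocation after an earlier RDMA Read operation completes; a real
   implementation might queue the request instead.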

7.3.8 Text Request & Text Response

   The Text Request and Text Response PDUs are iSCSI control-type PDUs
   as described in section 7.2. The iSCSI Layer MUST invoke the
   Send_Control Operational Primitive qualified with
   DataDescriptorTextOut (or DataDescriptorIn) which defines the Text
   Request (or Text Response) buffer. The iSER Layer MUST use SendSE
   Messages to send the Text Request and Text Response PDUs.

7.3.9 Login Request & Login Response

   The Login Request PDUs and the Login Response PDUs are exchanged
   when the connection between the initiator and the target is still in
   the byte stream mode. During the login negotiation, the iSCSI Layer
   interacts with the transport layer directly and the iSER Layer is
   not involved. See section 5.1 on iSCSI/iSER Connection Setup.

   If the iSCSI Layer attempts to send a Login Request (or a Login
   Response) PDU during the full feature phase, it MUST invoke the
   Send_Control Operational Primitive qualified with
   DataDescriptorLoginRequest (or DataDescriptorLoginResponse) which
   defines the Login Request (or Login Response) buffer. The iSER
   Layer MUST handle it as an iSCSI control-type PDU as described in
   section 7.2, and use SendSE Messages to send the Login Request and
   Login Response PDUs.

7.3.10 Logout Request & Logout Response

   The Logout Request and Logout Response PDUs are iSCSI control-type
   PDUs as described in section 7.2. The iSER Layer MUST use a SendSE
   Message to send the Logout Request or Logout Response PDU. Sections
   5.2.1 and 5.2.2 describe the handling of the Logout Request and the
   Logout Response at the initiator and the target and the interactions
   between the initiator and the target to terminate a connection.

7.3.11 SNACK Request

   Since HeaderDigest and DataDigest must be negotiated to "None",
   there are no digest errors when the connection is in iSER-assisted
   mode. Also, since RDMAP delivers all messages in the order they were
   sent, there are no sequence errors when the connection is in
   iSER-assisted mode. Therefore the iSCSI Layer SHOULD NOT send SNACK
   Request PDUs. In particular, the Proactive (Time out) SNACK SHOULD
   NOT be issued. If the iSCSI Layer invokes the Send_Control
   Operational Primitive to request the iSER Layer to send a SNACK
   Request, the iSER Layer MUST handle it as an iSCSI control-type PDU
   as described in section 7.2, and use a SendSE Message to send the
   SNACK Request PDU. Upon receiving the iSER Message containing the
   SNACK PDU, the iSER Layer notifies the iSCSI Layer using the
   Control_Notify Operational Primitive.

7.3.12 Reject

   The Reject PDU is an iSCSI control-type PDU as described in section
   7.2. The iSCSI Layer MUST invoke the Send_Control Operational
   Primitive qualified with DataDescriptorReject which defines the
   Reject buffer. The iSER Layer MUST use a SendSE Message to send the
   Reject PDU.

7.3.13 NOP-Out & NOP-In

   The NOP-Out and NOP-In PDUs are iSCSI control-type PDUs as described
   in section 7.2. The iSCSI Layer MUST invoke the Send_Control
   Operational Primitive qualified with DataDescriptorNOPOut (or
   DataDescriptorNOPIn) which defines the Ping (or Return Ping) data
   buffer. The iSER Layer MUST use SendSE Messages to send the NOP-Out
   and NOP-In PDUs.

8 Flow Control and STag Management

8.1 Flow Control for RDMA Send Message Types

   RDMAP Send Message Types are used by the iSER Layer to transfer
   iSCSI control-type PDUs. Each RDMAP Send Message Type consumes an
   Untagged Buffer at the Data Sink.
   However, neither the RDMAP layer nor the iSER Layer provides an
   explicit flow control mechanism for the RDMAP Send Message Types.
   Therefore, the iSER Layer SHOULD provision enough Untagged Buffers
   for handling incoming RDMAP Send Message Types to prevent a buffer
   underrun condition at the RDMAP layer. If a buffer underrun
   happens, it may result in the termination of the connection. An
   implementation may choose to satisfy this requirement by using a
   common buffer pool shared across multiple connections, with usage
   limits on a per connection basis and usage limits on the buffer
   pool itself. In such an implementation, exceeding the buffer usage
   limit for a connection or the buffer pool itself may trigger
   interventions from the iSER Layer to replenish the buffer pool
   and/or to isolate the connection causing the problem.

8.2 Flow Control for RDMA Read Resources

   The total number of RDMA Read operations that can be active
   simultaneously on an iSCSI/iSER connection depends on the amount of
   resources allocated as declared in the iSER Hello exchange described
   in section 5.1.3. Exceeding the number of RDMA Read operations
   allowed on a connection will result in the connection being
   terminated by the RDMAP layer. The iSER Layer at the target
   maintains the iSER-ORD to keep track of the maximum number of RDMA
   Read Requests that can be issued by the iSER Layer on a particular
   RDMAP Stream.

   During connection setup (see section 5.1), iSER-IRD is known at the
   initiator and iSER-ORD is known at the target after the iSER Layers
   at the initiator and the target have respectively allocated the
   iWARP resources for the connection, as directed by the
   Allocate_Connection_Resources Operational Primitive from the iSCSI
   Layer before the end of the iSCSI Login Phase.
   In the full feature phase, the first message sent by the initiator
   is the iSER Hello Message (see section 9.3) which contains the value
   of iSER-IRD. In response to the iSER Hello Message, the target
   sends the iSER HelloReply Message (see section 9.4) which contains
   the value of iSER-ORD. The iSER Layer at both the initiator and the
   target MAY adjust (lower) the iWARP resources associated with
   iSER-IRD and iSER-ORD respectively to match the iSER-ORD value
   declared in the HelloReply Message. The iSER Layer at the target
   MUST flow control the RDMA Read Request Messages to not exceed the
   iSER-ORD value at the target.

8.3 STag Management

   An STag, as defined in [RDMAP], is an identifier of a Tagged Buffer
   used in an RDMA operation. The allocation and the subsequent
   invalidation of the STags are specified in this document if the
   STags are exposed on the wire by being Advertised in the iSER header
   or declared in the RDMAP header of an iWARP Message.

8.3.1 Allocation of STags

   When the iSCSI Layer at the initiator invokes the Send_Control
   Operational Primitive to request the iSER Layer at the initiator to
   process a SCSI Command, zero, one, or two STags may be allocated by
   the iSER Layer. See section 7.3.1 for details. The number of STags
   allocated depends on whether the command is unidirectional or
   bidirectional and whether solicited write data transfer is involved
   or not.

   When the iSCSI Layer at the initiator invokes the Send_Control
   Operational Primitive to request the iSER Layer at the initiator to
   process a Task Management Function Request with the TASK REASSIGN
   function, besides allocating zero, one, or two STags, the iSER Layer
   MUST invalidate the existing STags, if any, associated with the ITT.
   See section 7.3.3 for details.

   The iSER Layer at the target allocates a local Data Sink STag when
   the iSCSI Layer at the target invokes the Get_Data Operational
   Primitive to request the iSER Layer to process an R2T PDU. See
   section 7.3.6 for details.

8.3.2 Invalidation of STags

   The invalidation of the STags at the initiator at the completion of
   a unidirectional or bidirectional command when the associated SCSI
   Response PDU is sent by the target is described in section 7.3.2.

   When a unidirectional or bidirectional command concludes without the
   associated SCSI Response PDU being sent by the target, the iSCSI
   Layer at the initiator MUST invoke the Deallocate_Task_Resources
   Operational Primitive qualified with ITT. In response, the iSER
   Layer at the initiator MUST locate the STag(s) (if any) in the local
   mapping that associates the ITT to the local STag(s). The iSER
   Layer at the initiator MUST invalidate the STag(s) (if any) and the
   local mapping.

   For an RDMA Read operation used to realize a SCSI Write data
   transfer, the iSER Layer at the target SHOULD invalidate the Data
   Sink STag at the conclusion of the RDMA Read operation referencing
   the Data Sink STag (to permit the immediate reuse of buffer
   resources).

   For an RDMA Write operation used to realize a SCSI Read data
   transfer, the Data Source STag at the target is not declared to the
   initiator and is not exposed on the wire. Invalidation of the STag
   is thus not specified.

   When a unidirectional or bidirectional command concludes without the
   associated SCSI Response PDU being sent by the target, the iSCSI
   Layer at the target MUST invoke the Deallocate_Task_Resources
   Operational Primitive qualified with ITT. In response, the iSER
   Layer at the target MUST locate the local STag(s) (if any) in the
   local mapping that associates the ITT to the local STag(s).
   The iSER Layer at the target MUST invalidate the local STag(s) (if
   any) and the mapping.

9 iSER Control and Data Transfer

   For iSCSI data-type PDUs (see section 7.1), the iSER Layer uses RDMA
   Read and RDMA Write operations to transfer the solicited data. For
   iSCSI control-type PDUs (see section 7.2), the iSER Layer uses RDMAP
   Send Message Types.

9.1 iSER Header Format

   An iSER header MUST be present in every RDMAP Send Message Type.
   The iSER header is located in the first 12 bytes of the message
   payload of the RDMAP Send Message Type, as shown in Figure 2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Opcode|                Opcode Specific Fields                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Opcode Specific Fields                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Opcode Specific Fields                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 2 iSER Header Format

   Opcode - Operation Code: 4 bits

      The Opcode field identifies the type of iSER Messages:

         0001b = iSCSI control-type PDU

         0010b = iSER Hello Message

         0011b = iSER HelloReply Message

      All other opcodes are reserved.

9.2 iSER Header Format for iSCSI Control-Type PDU

   The iSER Layer uses RDMAP Send Message Types to transfer iSCSI
   control-type PDUs (see section 7.2). The message payload of each of
   the RDMAP Send Message Types used for transferring an iSER Message
   contains an iSER Header followed by an iSCSI control-type PDU.

   The iSER header in an RDMAP Send Message Type carrying an iSCSI
   control-type PDU MUST have the format as described in Figure 3.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       |W|R|                                                   |
   | 0001b |S|S|                     Reserved                      |
   |       |V|V|                                                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Write STag (or N/A)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Read STag (or N/A)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 3 iSER Header Format for iSCSI Control-Type PDU

   WSV - Write STag Valid flag: 1 bit

      This flag indicates the validity of the Write STag field of the
      iSER Header. If set to one, the Write STag field in this iSER
      Header is valid. If set to zero, the Write STag field in this
      iSER Header MUST be ignored at the receiver. The Write STag
      Valid flag is set to one when there is solicited data to be
      transferred for a SCSI Write or bidirectional command, or when
      there are non-immediate unsolicited and solicited data to be
      transferred for the referenced task specified in a Task
      Management Function Request with the TASK REASSIGN function.

   RSV - Read STag Valid flag: 1 bit

      This flag indicates the validity of the Read STag field of the
      iSER Header. If set to one, the Read STag field in this iSER
      Header is valid. If set to zero, the Read STag field in this
      iSER Header MUST be ignored at the receiver. The Read STag
      Valid flag is set to one for a SCSI Read or bidirectional
      command, or a Task Management Function Request with the TASK
      REASSIGN function.

   Write STag - Write Steering Tag: 32 bits

      This field contains the Write STag when the Write STag Valid
      flag is set to one. For a SCSI Write or bidirectional command,
      the Write STag is used to Advertise the initiator's I/O Buffer
      containing the solicited data.
      For a Task Management Function Request with the TASK REASSIGN
      function, the Write STag is used to Advertise the initiator's
      I/O Buffer containing the non-immediate unsolicited data and
      solicited data. This Write STag is used as the Data Source STag
      in the resultant RDMA Read operation(s). When the Write STag
      Valid flag is set to zero, this field MUST be set to zero.

   Read STag - Read Steering Tag: 32 bits

      This field contains the Read STag when the Read STag Valid flag
      is set to one. The Read STag is used to Advertise the
      initiator's Read I/O Buffer of a SCSI Read or bidirectional
      command, or a Task Management Function Request with the TASK
      REASSIGN function. This Read STag is used as the Data Sink STag
      in the resultant RDMA Write operation(s). When the Read STag
      Valid flag is zero, this field MUST be set to zero.

   Reserved:

      Reserved fields MUST be set to zero on transmit and MUST be
      ignored on receive.

9.3 iSER Header Format for iSER Hello Message

   An iSER Hello Message MUST only contain the iSER header which MUST
   have the format as described in Figure 4. The iSER Hello Message is
   the first RDMAP Message sent on the RDMAP Stream from the iSER Layer
   at the initiator to the iSER Layer at the target.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       |       |       |       |                               |
   | 0010b | Rsvd  | MaxVer| MinVer|           iSER-IRD            |
   |       |       |       |       |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 4 iSER Header Format for iSER Hello Message

   MaxVer - Maximum Version: 4 bits

      This field specifies the maximum version of the iSER protocol
      supported. It MUST be set to 1 to indicate the version of the
      specification described in this document.

   MinVer - Minimum Version: 4 bits

      This field specifies the minimum version of the iSER protocol
      supported. It MUST be set to 1 to indicate the version of the
      specification described in this document.

   iSER-IRD: 16 bits

      This field contains the value of the iSER-IRD at the initiator.

   Reserved (Rsvd):

      Reserved fields MUST be set to zero on transmit, and MUST be
      ignored on receive.

9.4 iSER Header Format for iSER HelloReply Message

   An iSER HelloReply Message MUST only contain the iSER header which
   MUST have the format as described in Figure 5. The iSER HelloReply
   Message is the first RDMAP Message sent on the RDMAP Stream from the
   iSER Layer at the target to the iSER Layer at the initiator.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       |     |R|       |       |                               |
   | 0011b |Rsvd |E| MaxVer| CurVer|           iSER-ORD            |
   |       |     |J|       |       |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 5 iSER Header Format for iSER HelloReply Message

   REJ - Reject flag: 1 bit

      This flag indicates whether the target is rejecting this
      connection. If set to one, the target is rejecting the
      connection.

   MaxVer - Maximum Version: 4 bits

      This field specifies the maximum version of the iSER protocol
      supported. It MUST be set to 1 to indicate the version of the
      specification described in this document.

   CurVer - Current Version: 4 bits

      This field specifies the current version of the iSER protocol
      supported. It MUST be set to 1 to indicate the version of the
      specification described in this document.

   iSER-ORD: 16 bits

      This field contains the value of the iSER-ORD at the target.

   Reserved (Rsvd):

      Reserved fields MUST be set to zero on transmit, and MUST be
      ignored on receive.

9.5 SCSI Data Transfer Operations

   The iSER Layer at the initiator and the iSER Layer at the target
   handle each SCSI Write, SCSI Read, and bidirectional operation as
   described below.

9.5.1 SCSI Write Operation

   The iSCSI Layer at the initiator MUST invoke the Send_Control
   Operational Primitive to request the iSER Layer at the initiator to
   send the SCSI Write Command.
   The iSER Layer at the initiator MUST request the RDMAP layer to
   transmit a SendSE Message with the message payload consisting of the
   iSER header followed by the SCSI Command PDU and immediate data (if
   any). If there is solicited data, the iSER Layer MUST Advertise the
   Write STag in the iSER header of the SendSE Message, as described in
   section 9.2. Upon receiving the SendSE Message, the iSER Layer at
   the target MUST notify the iSCSI Layer at the target by invoking the
   Control_Notify Operational Primitive qualified with the SCSI Command
   PDU. See section 7.3.1 for details on the handling of the SCSI
   Write Command.

   For the non-immediate unsolicited data, the iSCSI Layer at the
   initiator MUST invoke a Send_Control Operational Primitive qualified
   with the SCSI Data-out PDU. Upon receiving each Send or SendSE
   Message containing the non-immediate unsolicited data, the iSER
   Layer at the target MUST notify the iSCSI Layer at the target by
   invoking the Control_Notify Operational Primitive qualified with the
   SCSI Data-out PDU. See section 7.3.4 for details on the handling of
   the SCSI Data-out PDU.

   For the solicited data, when the iSCSI Layer at the target has an
   I/O Buffer available, it MUST invoke the Get_Data Operational
   Primitive qualified with the R2T PDU. See section 7.3.6 for details
   on the handling of the R2T PDU.

   When the data transfer associated with this SCSI Write operation is
   complete, the iSCSI Layer at the target MUST invoke the Send_Control
   Operational Primitive when it is ready to send the SCSI Response
   PDU. Upon receiving a SendSE or SendInvSE Message containing the
   SCSI Response PDU, the iSER Layer at the initiator MUST notify the
   iSCSI Layer at the initiator by invoking the Control_Notify
   Operational Primitive qualified with the SCSI Response PDU.
   See section 7.3.2 for details on the handling of the SCSI Response
   PDU.

9.5.2 SCSI Read Operation

   The iSCSI Layer at the initiator MUST invoke the Send_Control
   Operational Primitive to request the iSER Layer at the initiator to
   send the SCSI Read Command. The iSER Layer at the initiator MUST
   request the RDMAP layer to transmit a SendSE Message with the
   message payload consisting of the iSER header followed by the SCSI
   Command PDU. The iSER Layer at the initiator MUST Advertise the Read
   STag in the iSER header of the SendSE Message, as described in
   section 9.2. Upon receiving the SendSE Message, the iSER Layer at
   the target MUST notify the iSCSI Layer at the target by invoking the
   Control_Notify Operational Primitive qualified with the SCSI Command
   PDU. See section 7.3.1 for details on the handling of the SCSI Read
   Command.

   When the requested SCSI data is available in the I/O Buffer, the
   iSCSI Layer at the target MUST invoke the Put_Data Operational
   Primitive qualified with the SCSI Data-in PDU. See section 7.3.5
   for details on the handling of the SCSI Data-in PDU.

   When the data transfer associated with this SCSI Read operation is
   complete, the iSCSI Layer at the target MUST invoke the Send_Control
   Operational Primitive when it is ready to send the SCSI Response
   PDU. Upon receiving the SendInvSE Message containing the SCSI
   Response PDU, the iSER Layer at the initiator MUST notify the iSCSI
   Layer at the initiator by invoking the Control_Notify Operational
   Primitive qualified with the SCSI Response PDU. See section 7.3.2
   for details on the handling of the SCSI Response PDU.

9.5.3 Bidirectional Operation

   The initiator and the target handle the SCSI Write and the SCSI Read
   portions of this bidirectional operation in a similar manner as
   described in Section 9.5.1 and Section 9.5.2 respectively.
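
   The three 12-byte header layouts of sections 9.2 through 9.4 can be
   sketched as follows. This is a minimal illustration, not part of
   the specification. It assumes network (big-endian) byte order with
   bit 0 of the figures as the most significant bit of the first byte;
   the function names and the dictionary returned by the parser are
   hypothetical.

```python
import struct

OP_CONTROL, OP_HELLO, OP_HELLO_REPLY = 0x1, 0x2, 0x3
HEADER_LEN = 12  # the iSER header occupies the first 12 bytes of the payload


def pack_control(write_stag=None, read_stag=None):
    """Header for an iSCSI control-type PDU; None means 'STag not valid'."""
    wsv = write_stag is not None
    rsv = read_stag is not None
    byte0 = (OP_CONTROL << 4) | (wsv << 3) | (rsv << 2)
    # An STag field MUST be zero when its valid flag is zero.
    return struct.pack("!B3xII", byte0, write_stag or 0, read_stag or 0)


def pack_hello(iser_ird, max_ver=1, min_ver=1):
    """Header for the iSER Hello Message (sent by the initiator)."""
    return struct.pack("!BBH8x", OP_HELLO << 4, (max_ver << 4) | min_ver, iser_ird)


def pack_hello_reply(iser_ord, reject=False, max_ver=1, cur_ver=1):
    """Header for the iSER HelloReply Message (sent by the target)."""
    byte0 = (OP_HELLO_REPLY << 4) | int(reject)
    return struct.pack("!BBH8x", byte0, (max_ver << 4) | cur_ver, iser_ord)


def parse(payload):
    """Decode the iSER header at the start of a Send Message payload."""
    if len(payload) < HEADER_LEN:
        raise ValueError("payload shorter than the iSER header")
    byte0, byte1, half, word1, word2 = struct.unpack("!BBHII", payload[:HEADER_LEN])
    opcode = byte0 >> 4
    if opcode == OP_CONTROL:
        # Ignore an STag field whose valid flag is zero.
        return {"opcode": opcode,
                "write_stag": word1 if byte0 & 0x08 else None,
                "read_stag": word2 if byte0 & 0x04 else None}
    if opcode == OP_HELLO:
        return {"opcode": opcode, "max_ver": byte1 >> 4,
                "min_ver": byte1 & 0x0F, "iser_ird": half}
    if opcode == OP_HELLO_REPLY:
        return {"opcode": opcode, "reject": bool(byte0 & 0x01),
                "max_ver": byte1 >> 4, "cur_ver": byte1 & 0x0F, "iser_ord": half}
    raise ValueError("reserved iSER opcode")
```

   For instance, a SCSI Write command with solicited data would carry
   the result of pack_control(write_stag=...) in front of the SCSI
   Command PDU, while a SCSI Read command would use the read_stag
   argument instead.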

10 iSER Error Handling and Recovery

   [RDMAP] and the protocols below it provide the iSER Layer with
   reliable in-order delivery. Therefore, the error management needs of
   an iSCSI/iSER connection are somewhat different than those of
   traditional iSCSI running directly over TCP.

10.1 Error Handling

   iSER error handling is described in the following sections,
   classified loosely based on the sources of errors:

   1. Those originating at the transport layer (e.g., TCP).

   2. Those originating at the RDMAP layer.

   3. Those originating at the iSER Layer.

   4. Those originating at the iSCSI Layer.

10.1.1 Errors in the Transport Layer

   If the transport layer is TCP, then TCP packets with errors are
   silently dropped by the TCP layer and result in retransmission at
   the TCP layer. This has no impact on the iSER Layer. However,
   connection loss (e.g., link failure) and unexpected termination
   (e.g., TCP graceful or abnormal close without the iSCSI Logout
   exchanges) at the transport layer will cause the iSCSI/iSER
   connection to be terminated as well.

10.1.1.1 Failure in the Transport Layer Before iWARP is Enabled

   If the Connection is lost or terminated before the iSCSI Layer
   invokes the Allocate_Connection_Resources Operational Primitive, the
   login process is terminated and no further action is required.

   If the Connection is lost or terminated after the iSCSI Layer has
   invoked the Allocate_Connection_Resources Operational Primitive,
   then the iSCSI Layer MUST invoke the Deallocate_Connection_Resources
   Operational Primitive to request the iSER Layer to deallocate the
   iWARP resources for the connection.

10.1.1.2 Failure in the Transport Layer After iWARP is Enabled

   If the Connection is lost or terminated after the iSCSI Layer has
   invoked the Enable_Datamover Operational Primitive, the iSER Layer
   MUST notify the iSCSI Layer of the connection loss by invoking the
   Connection_Terminate_Notify Operational Primitive. Prior to invoking
   the Connection_Terminate_Notify Operational Primitive, the iSER
   layer MUST perform the actions described in Section 5.2.3.2.

10.1.2 Errors in the iWARP protocol suite

   The RDMAP layer does not have error recovery operations built in.
   If errors are detected at the RDMAP layer, the RDMAP layer will
   terminate the RDMAP Stream and the associated Connection.

10.1.2.1 Errors Detected in the Local RDMAP Layer

   If an error is encountered at the local RDMAP layer, the RDMAP layer
   MAY send a Terminate Message to the Remote Peer to report the error
   if possible. (See [RDMAP] for the list of errors where a Terminate
   Message is sent.) The RDMAP layer is responsible for terminating
   the Connection. After the RDMAP layer notifies the iSER Layer that
   the Connection is terminated, the iSER Layer MUST notify the iSCSI
   Layer by invoking the Connection_Terminate_Notify Operational
   Primitive. Prior to invoking the Connection_Terminate_Notify
   Operational Primitive, the iSER layer MUST perform the actions
   described in Section 5.2.3.2.

10.1.2.2 Errors Detected in the RDMAP Layer at the Remote Peer

   If an error is encountered at the RDMAP layer at the Remote Peer,
   the RDMAP layer at the Remote Peer may send a Terminate Message to
   report the error if possible. If it is unable to send the Terminate
   Message, the Connection is terminated. This is treated similarly to
   a failure in the transport layer after iWARP is enabled as described
   in section 10.1.1.2.

   If an error is encountered at the RDMAP layer at the Remote Peer and
   it is able to send a Terminate Message, the RDMAP layer at the
   Remote Peer is responsible for terminating the connection. After
   the local RDMAP layer notifies the iSER Layer that the Connection is
   terminated, the iSER Layer MUST notify the iSCSI Layer by invoking
   the Connection_Terminate_Notify Operational Primitive. Prior to
   invoking the Connection_Terminate_Notify Operational Primitive, the
   iSER layer MUST perform the actions described in Section 5.2.3.2.

10.1.3 Errors in the iSER Layer

   The error handling due to errors at the iSER Layer is described in
   the following sections.

10.1.3.1 Insufficient iWARP Resources at the Initiator at Connection
         Setup

   After the iSCSI Layer at the initiator invokes the
   Allocate_Connection_Resources Operational Primitive during the iSCSI
   login negotiation phase, if the iSER Layer at the initiator fails to
   allocate the necessary iWARP resources, it MUST return a status of
   failure to the iSCSI Layer at the initiator. The iSCSI Layer at the
   initiator MUST terminate the Connection as described in Section
   5.2.3.1.

10.1.3.2 Insufficient iWARP Resources at the Target at Connection Setup

   After the iSCSI Layer at the target invokes the
   Allocate_Connection_Resources Operational Primitive during the iSCSI
   login negotiation phase, if the iSER Layer at the target fails to
   allocate the necessary iWARP resources, it MUST return a status of
   failure to the iSCSI Layer at the target. The iSCSI Layer at the
   target MUST send a Login Response with a status class of 3 (Target
   Error), and a status code of "0302" (Out of Resources). The iSCSI
   Layers at the initiator and the target MUST terminate the Connection
   as described in Section 5.2.3.1.

10.1.3.3 iSER Negotiation Failures

   If the iWARP or iSER related parameters declared by the initiator in
   the iSER Hello Message are unacceptable to the iSER Layer at the
   target, the iSER Layer at the target MUST set the Reject (REJ) flag,
   as described in section 9.4, in the iSER HelloReply Message. The
   following are the cases when the iSER Layer MUST set the REJ flag to
   1 in the HelloReply Message:

   * The initiator-declared iSER-IRD value is greater than 0 and the
     target-declared iSER-ORD value is 0.

   * The initiator-supported and the target-supported iSER protocol
     versions do not overlap.

   After requesting the RDMAP layer to send the iSER HelloReply
   Message, the handling of the error situation is similar to that for
   iSER format errors, as described in section 10.1.3.4.

10.1.3.4 iSER Format Errors

   The following types of errors in an iSER header are considered
   format errors:

   * Illegal contents of any iSER header field

   * Inconsistent field contents in an iSER header

   * Length error for an iSER Hello or HelloReply Message (see sections
     9.3 and 9.4)

   When a format error is detected, the following events MUST occur in
   the specified sequence:

   1. The iSER Layer MUST request the RDMAP layer to terminate the
      RDMAP Stream. The RDMAP layer MUST terminate the associated
      Connection.

   2. The iSER Layer MUST notify the iSCSI Layer by invoking the
      Connection_Terminate_Notify Operational Primitive. Prior to
      invoking the Connection_Terminate_Notify Operational Primitive,
      the iSER layer MUST perform the actions described in Section
      5.2.3.2.

10.1.3.5 iSER Protocol Errors

   The first iSER Message sent by the iSER Layer at the initiator after
   transitioning into iSER-assisted mode MUST be the iSER Hello Message
   (see section 9.3).
   Likewise, the first iSER Message sent by the iSER Layer at the
   target after transitioning into iSER-assisted mode MUST be the iSER
   HelloReply Message (see section 9.4). Failure to send the iSER Hello
   or HelloReply Message, as indicated by the wrong Opcode in the iSER
   header, is a protocol error.

   The handling of an iSER protocol error is similar to that for iSER
   format errors, as described in section 10.1.3.4.

10.1.4 Errors in the iSCSI Layer

   The error handling due to errors at the iSCSI Layer is described in
   the following sections. For error recovery, see section 10.2.

10.1.4.1 iSCSI Format Errors

   When an iSCSI format error is detected, the iSCSI Layer MUST invoke
   the Connection_Terminate Operational Primitive to request the iSER
   Layer to terminate the RDMAP Stream. For more details on the
   connection termination, see Section 5.2.3.1.

10.1.4.2 iSCSI Digest Errors

   In the iSER-assisted mode, the iSCSI Layer will not see any digest
   error because both the HeaderDigest and the DataDigest keys are
   negotiated to "None".

10.1.4.3 iSCSI Sequence Errors

   For traditional iSCSI, sequence errors are caused by dropped PDUs
   due to header or data digest errors. Since digests are not used in
   iSER-assisted mode and the RDMAP layer will deliver all messages in
   the order they were sent, sequence errors will not occur in
   iSER-assisted mode.

10.1.4.4 iSCSI Protocol Error

   When the iSCSI Layer handles certain protocol errors by dropping the
   connection, the error handling is similar to that for iSCSI format
   errors as described in section 10.1.4.1.

   When the iSCSI Layer uses the iSCSI Reject PDU and response codes to
   handle certain other protocol errors, no special handling at the
   iSER Layer is required.

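   The REJ conditions of section 10.1.3.3 and the first-message rule of
   section 10.1.3.5 can be expressed as two small checks. The Python
   sketch below is illustrative only. The function names, the
   representation of supported versions as an inclusive (lowest,
   highest) pair, and the assumption that the Opcode occupies the top
   four bits of the first iSER header byte are not stated in this
   section; the opcode values are the ones shown in Figures 6 and 7.

```python
# Illustrative sketch only; names and argument shapes are assumptions.
HELLO_OPCODE = 0b0010        # opcode shown in the Hello Message figure
HELLOREPLY_OPCODE = 0b0011   # opcode shown in the HelloReply Message figure


def must_reject(initiator_ird, target_ord, initiator_versions, target_versions):
    """Return True when the target must set REJ to 1 (section 10.1.3.3).

    Each *_versions argument is an inclusive (lowest, highest) pair.
    """
    ird_conflict = initiator_ird > 0 and target_ord == 0
    lowest_common = max(initiator_versions[0], target_versions[0])
    highest_common = min(initiator_versions[1], target_versions[1])
    no_common_version = lowest_common > highest_common
    return ird_conflict or no_common_version


def first_message_is_valid(sender_role, first_header_byte):
    """Check the first-message rule of section 10.1.3.5.

    Assumes the Opcode is carried in the top four bits of the byte.
    """
    if sender_role == "initiator":
        expected = HELLO_OPCODE
    else:
        expected = HELLOREPLY_OPCODE
    return (first_header_byte >> 4) == expected
```

   A False result from first_message_is_valid corresponds to a protocol
   error, which is then handled like a format error (section 10.1.3.4).
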
10.1.4.5 SCSI Timeouts and Session Errors

   This is handled at the iSCSI Layer and no special handling at the
   iSER Layer is required.

10.1.4.6 iSCSI Negotiation Failures

   For negotiation failures that happen during the Login Phase at the
   initiator after the iSCSI Layer has invoked the
   Allocate_Connection_Resources Operational Primitive and before the
   Enable_Datamover Operational Primitive has been invoked, the iSCSI
   Layer MUST invoke the Deallocate_Connection_Resources Operational
   Primitive for the iSER Layer to deallocate the iWARP resources for
   the connection. The iSCSI Layer at the initiator MUST terminate the
   Connection.

   For negotiation failures during the Login Phase at the target, the
   iSCSI Layer can use a Login Response with a status class other than
   0 (Success) to terminate the Login Phase. If the iSCSI Layer has
   invoked the Allocate_Connection_Resources Operational Primitive and
   the Enable_Datamover Operational Primitive has not yet been invoked,
   the iSCSI Layer at the target MUST invoke the
   Deallocate_Connection_Resources Operational Primitive to request the
   iSER Layer at the target to deallocate the iWARP resources for the
   connection. The iSCSI Layer at both the initiator and the target
   MUST terminate the Connection.

   During the iSCSI Login Phase, if the iSCSI Layer at the initiator
   receives a Login Response from the target with a status class other
   than 0 (Success) after the iSCSI Layer at the initiator has invoked
   the Allocate_Connection_Resources Operational Primitive, the iSCSI
   Layer MUST invoke the Deallocate_Connection_Resources Operational
   Primitive to request the iSER Layer to deallocate all iWARP
   resources for the connection. The iSCSI Layer MUST terminate the
   Connection in this case.

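   The Login Phase cases above share one cleanup pattern: resources
   that were allocated but never handed over by Enable_Datamover are
   deallocated, and the Connection is then terminated. The following
   Python sketch shows that pattern under the assumption that the iSCSI
   Layer tracks two booleans per connection; the function and flag
   names are hypothetical.

```python
# Illustrative sketch only; the flags and function name are assumptions.
def login_failure_actions(resources_allocated, datamover_enabled):
    """Return the ordered primitives/actions for a Login Phase
    negotiation failure, as described in section 10.1.4.6."""
    actions = []
    if resources_allocated and not datamover_enabled:
        # Allocate_Connection_Resources was invoked, Enable_Datamover was not.
        actions.append("Deallocate_Connection_Resources")
    actions.append("terminate_connection")
    return actions
```

   The sketch does not model the case where Enable_Datamover has
   already been invoked, since the text above does not describe it.
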
   For negotiation failures during the full feature phase, the error
   handling is left to the iSCSI Layer and no special handling at the
   iSER Layer is required.

10.2 Error Recovery

   Error recovery requirements of iSCSI/iSER are the same as those of
   traditional iSCSI. All three ErrorRecoveryLevels as defined in
   [iSCSI] are supported in iSCSI/iSER.

   *  For ErrorRecoveryLevel 0, session recovery is handled by iSCSI
      and no special handling by the iSER Layer is required.

   *  For ErrorRecoveryLevel 1, see section 10.2.1 on SNACK Handling
      and PDU Recovery.

   *  For ErrorRecoveryLevel 2, see section 10.2.2 on Connection
      Recovery.

   The iSCSI Layer MAY invoke the Notice_Key_Values Operational
   Primitive during connection setup to request the iSER Layer to take
   note of the value of the operational ErrorRecoveryLevel, as
   described in sections 5.1.1 and 5.1.2.

10.2.1 SNACK Handling and PDU Recovery

   As described in sections 10.1.4.2 and 10.1.4.3, digest and sequence
   errors will not occur in the iSER-assisted mode. If the RDMAP layer
   detects an error, it will close the iSCSI/iSER connection, as
   described in section 10.1.2. Therefore, PDU recovery is not useful
   in the iSER-assisted mode.

   The iSCSI Layer at the initiator SHOULD disable timeout-driven
   proactive SNACKs. If the iSCSI Layer at the target receives a SNACK,
   it MUST respond to it as required by [iSCSI].

   The iSCSI Layer at the initiator SHOULD disable iSCSI timeout-driven
   PDU retransmissions.

10.2.2 Connection Recovery

   The iSCSI Layer at the initiator MAY reassign connection allegiance
   for non-immediate commands which are still in progress and are
   associated with the failed connection by using a Task Management
   Function Request with the TASK REASSIGN function. See section 7.3.3
   for more details.

   When the iSCSI Layer at the initiator does a task reassignment for a
   SCSI Write command, it MUST qualify the Send_Control Operational
   Primitive invocation with DataDescriptorOut which defines the I/O
   Buffer for both the non-immediate unsolicited data and the solicited
   data. This allows the iSCSI Layer at the target to use recovery
   R2Ts to request data originally sent as unsolicited and solicited
   from the initiator.

   When the iSCSI Layer at the target accepts a reassignment request
   for a SCSI Read command, it MUST invoke the Put_Data Operational
   Primitive to request the iSER Layer to process SCSI Data-in for all
   unacknowledged data. See section 7.3.5 on the handling of SCSI
   Data-in.

   When the iSCSI Layer at the target accepts a reassignment request
   for a SCSI Write command, it MUST invoke the Get_Data Operational
   Primitive to request the iSER Layer to process a recovery R2T for
   any non-immediate unsolicited data and any solicited data sequences
   that have not been received. See section 7.3.6 on the handling of
   Ready To Transfer (R2T).

   The iSCSI Layer at the target MUST NOT issue recovery R2Ts on an
   iSCSI/iSER connection for a task for which the connection allegiance
   was never reassigned. The iSER Layer at the target MAY reject such
   a recovery R2T received via the Get_Data Operational Primitive
   invocation from the iSCSI Layer at the target, with an appropriate
   error code.

   The iSER Layer at the target will process the requests invoked by
   the Put_Data and Get_Data Operational Primitives for a reassigned
   task in the same way as for the original commands.

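   The target-side rules for a reassigned task reduce to a choice of
   Operational Primitive by command direction, plus a guard on recovery
   R2Ts. The Python sketch below illustrates this; the function names
   and the "read"/"write" strings are hypothetical and serve only to
   mirror the paragraphs above.

```python
# Illustrative sketch only; names and string values are assumptions.
def target_primitive_for_reassigned_task(command):
    """Return the Operational Primitive the iSCSI Layer at the target
    invokes after accepting a reassignment request."""
    if command == "read":
        return "Put_Data"   # SCSI Data-in for all unacknowledged data
    if command == "write":
        return "Get_Data"   # recovery R2T for data not yet received
    raise ValueError("command must be 'read' or 'write'")


def recovery_r2t_allowed(allegiance_was_reassigned):
    """Recovery R2Ts MUST NOT be issued for a task whose connection
    allegiance was never reassigned."""
    return bool(allegiance_was_reassigned)
```
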
11 Security Considerations

   Since iSER is layered on top of the iWARP layer and provides the
   RDMA extensions to the iSCSI protocol, the security considerations
   of iSER are similar to those of the underlying RDMAP layer as
   described in [RDMAP].

   All the security protocol mechanisms described in [iSCSI] MAY be
   deployed for an iSCSI/iSER connection. If the IPsec mechanism is
   used, then it MUST be established before the connection transitions
   from the traditional iSCSI mode to the iSER-assisted mode.

12 IANA Considerations

   The login operational keys RDMAExtensions,
   InitiatorRecvDataSegmentLength, and TargetRecvDataSegmentLength will
   be registered with IANA before this draft is approved to become an
   RFC.

13 References

13.1 Normative References

   [RDMAP] R. Recio et al., "An RDMA Protocol Specification", IETF
      Internet-draft draft-ietf-rddp-rdmap-01.txt (work in progress),
      October 2003

   [DDP] H. Shah et al., "Direct Data Placement over Reliable
      Transports", IETF Internet-draft draft-ietf-rddp-ddp-0.1.txt
      (work in progress), October 2003

   [DA] M. Chadalapaka et al., "Datamover Architecture for iSCSI",
      IETF Internet-draft draft-chadalapaka-iwarp-da-01.txt, January
      2004

   [iSCSI] J. Satran et al., "iSCSI", IETF Internet-draft
      draft-ietf-ips-iSCSI-20.txt (work in progress), January 2003

13.2 Informative References

   [IPSEC] S. Kent et al., "Security Architecture for the Internet
      Protocol", RFC 2401, November 1998

   [SAM2] T10/1157D, SCSI Architecture Model - 2 (SAM-2)

   [MPA] P. Culley et al., "Marker PDU Aligned Framing for TCP
      Specification", IETF Internet-draft draft-ietf-rddp-mpa-00.txt
      (work in progress), October 2003

   [TCP] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
      September 1981

   [VERBS] J. Hilland et al., "RDMA Protocol Verbs Specification",
      RDMAC Consortium Draft Specification
      draft-hilland-iwarp-verbs-v1.0-RDMAC, April 2003

14 Appendix

14.1 iWARP Message Format for iSER

   This section is for information only and is NOT part of the
   standard. It simply depicts the iWARP Message format for the various
   iSER Messages when the transport layer is TCP.

14.1.1 iWARP Message Format for iSER Hello Message

   The following figure depicts an iSER Hello Message encapsulated in
   an iWARP SendSE Message.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Send) Queue Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Send) Message Sequence Number                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     (Send) Message Offset                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0010b | Zeros | 0001b | 0001b |           iSER-IRD            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 6 SendSE Message containing an iSER Hello Message

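   The three iSER words of the Hello Message in Figure 6 can be built
   with ordinary bit operations. The Python sketch below is
   illustrative only and rests on several assumptions: the field widths
   are read off the figure (4, 4, 4, 4 and 16 bits), the two 0001b
   fields are taken to be the highest and lowest supported iSER
   versions (section 9.3 is not reproduced here), and the words are
   encoded in network byte order. The function and parameter names are
   hypothetical.

```python
import struct

# Illustrative sketch only; names and field interpretation are assumptions.
HELLO_OPCODE = 0b0010  # opcode shown in Figure 6


def pack_hello(iser_ird, max_ver=1, min_ver=1):
    """Build the 12-byte iSER portion of a Hello Message."""
    if not 0 <= iser_ird <= 0xFFFF:
        raise ValueError("iSER-IRD must fit in 16 bits")
    if not (0 <= max_ver <= 0xF and 0 <= min_ver <= 0xF):
        raise ValueError("version fields must fit in 4 bits")
    # Opcode in bits 0-3, zeros in bits 4-7, versions in bits 8-15,
    # iSER-IRD in bits 16-31 (bit 0 is the most significant bit).
    word0 = (HELLO_OPCODE << 28) | (max_ver << 20) | (min_ver << 16) | iser_ird
    return struct.pack(">III", word0, 0, 0)


def unpack_hello(data):
    """Split the 12-byte iSER portion back into its fields."""
    word0, _, _ = struct.unpack(">III", data)
    return {
        "opcode": word0 >> 28,
        "max_ver": (word0 >> 20) & 0xF,
        "min_ver": (word0 >> 16) & 0xF,
        "iser_ird": word0 & 0xFFFF,
    }
```

   The MPA and DDP/RDMAP headers and the MPA CRC that surround these
   words in the figure are produced by the lower layers and are not
   modeled here.
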
14.1.2 iWARP Message Format for iSER HelloReply Message

   The following figure depicts an iSER HelloReply Message encapsulated
   in an iWARP SendSE Message. The Reject (REJ) flag is set to 0.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Send) Queue Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Send) Message Sequence Number                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     (Send) Message Offset                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0011b |Zeros|0| 0001b | 0001b |           iSER-ORD            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 7 SendSE Message containing an iSER HelloReply Message

14.1.3 iWARP Message Format for SCSI Read Command PDU

   The following figure depicts a SCSI Read Command PDU embedded in an
   iSER Message encapsulated in an iWARP SendSE Message. For this
   particular example, in the iSER header, the Write STag Valid flag is
   set to zero, the Read STag Valid flag is set to one, the Write STag
   field is set to all zeros, and the Read STag field contains a valid
   Read STag.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Send) Queue Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Send) Message Sequence Number                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     (Send) Message Offset                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0001b |0|1|                     All zeros                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Read STag                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     SCSI Read Command PDU                     |
   //                                                             //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 8 SendSE Message containing a SCSI Read Command PDU

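   The iSER header that precedes the SCSI Read Command PDU in Figure 8
   consists of three 32-bit words: a flags word, the Write STag, and
   the Read STag. The Python sketch below packs that header. It is
   illustrative only; the bit positions are read off the figure, the
   network byte order and the use of None to mean "STag not valid" are
   assumptions, and the function and constant names are hypothetical.

```python
import struct

# Illustrative sketch only; names and the None convention are assumptions.
ISCSI_CONTROL_OPCODE = 0b0001  # opcode shown in Figure 8


def pack_iser_header(write_stag=None, read_stag=None):
    """Build the 12-byte iSER header carried in front of a SCSI command.

    A missing STag clears its Valid flag and zero-fills its field.
    """
    write_valid = 1 if write_stag is not None else 0
    read_valid = 1 if read_stag is not None else 0
    # Opcode in bits 0-3, Write STag Valid in bit 4, Read STag Valid in bit 5.
    word0 = (ISCSI_CONTROL_OPCODE << 28) | (write_valid << 27) | (read_valid << 26)
    return struct.pack(">III", word0, write_stag or 0, read_stag or 0)
```

   Calling pack_iser_header(read_stag=...) reproduces the flag pattern
   of Figure 8 (Write STag Valid 0, Read STag Valid 1); passing only
   write_stag gives the pattern used for a SCSI Write Command PDU.
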
14.1.4 iWARP Message Format for SCSI Read Data

   The following figure depicts an iWARP RDMA Write Message carrying
   SCSI Read data in the payload:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Data Sink STag                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Data Sink Tagged Offset                    |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        SCSI Read data                         |
   //                                                             //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 9 RDMA Write Message containing SCSI Read Data

14.1.5 iWARP Message Format for SCSI Write Command PDU

   The following figure depicts a SCSI Write Command PDU embedded in an
   iSER Message encapsulated in an iWARP SendSE Message. For this
   particular example, in the iSER header, the Write STag Valid flag is
   set to one, the Read STag Valid flag is set to zero, the Write STag
   field contains a valid Write STag, and the Read STag field is set to
   all zeros since it is not used.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Send) Queue Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Send) Message Sequence Number                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     (Send) Message Offset                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0001b |1|0|                     All zeros                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Write STag                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    SCSI Write Command PDU                     |
   //                                                             //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 10 SendSE Message containing a SCSI Write Command PDU

14.1.6 iWARP Message Format for RDMA Read Request

   An iSCSI R2T is transformed into an iWARP RDMA Read Request Message.
   The following figure depicts an iWARP RDMA Read Request Message:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Reserved (Not Used)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             DDP (RDMA Read Request) Queue Number              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        DDP (RDMA Read Request) Message Sequence Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            DDP (RDMA Read Request) Message Offset             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Data Sink STag (SinkSTag)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +               Data Sink Tagged Offset (SinkTO)                +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               RDMA Read Message Size (RDMARDSZ)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Data Source STag (SrcSTag)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +               Data Source Tagged Offset (SrcTO)               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 11 RDMA Read Request Message

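   The five fields that follow the DDP header in Figure 11 occupy 28
   bytes: two 32-bit STags, two 64-bit Tagged Offsets, and a 32-bit
   size. The Python sketch below packs and unpacks just those fields.
   It is illustrative only; the field widths are read off the figure,
   network byte order is assumed, and the function names are
   hypothetical. The DDP header words and the MPA framing are left to
   the lower layers.

```python
import struct

# Illustrative sketch only; names are assumptions.
# Field order: SinkSTag, SinkTO, RDMARDSZ, SrcSTag, SrcTO.
_READ_REQUEST_FORMAT = ">IQIIQ"


def pack_read_request(sink_stag, sink_to, read_size, src_stag, src_to):
    """Build the 28-byte RDMA Read Request payload shown in Figure 11."""
    return struct.pack(_READ_REQUEST_FORMAT,
                       sink_stag, sink_to, read_size, src_stag, src_to)


def unpack_read_request(data):
    """Return (SinkSTag, SinkTO, RDMARDSZ, SrcSTag, SrcTO)."""
    return struct.unpack(_READ_REQUEST_FORMAT, data)
```
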
14.1.7 iWARP Message Format for Solicited SCSI Write Data

   The following figure depicts an iWARP RDMA Read Response Message
   carrying the solicited SCSI Write data in the payload:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Data Sink STag                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Data Sink Tagged Offset                    |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        SCSI Write Data                        |
   //                                                             //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 12 RDMA Read Response Message containing SCSI Write Data

14.1.8 iWARP Message Format for SCSI Response PDU

   The following figure depicts a SCSI Response PDU embedded in an iSER
   Message encapsulated in an iWARP SendInvSE Message:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Invalidate STag                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Send) Queue Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Send) Message Sequence Number                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     (Send) Message Offset                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0001b |0|0|                     All Zeros                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       SCSI Response PDU                       |
   //                                                             //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 13 SendInvSE Message containing SCSI Response PDU

15 Authors' Addresses

   Mallikarjun Chadalapaka
   Hewlett-Packard Company
   8000 Foothills Blvd.
   Roseville, CA 95747-5668, USA
   Phone: +1-916-785-5621
   Email: cbm@rose.hp.com

   Uri Elzur
   Broadcom Corporation
   16215 Alton Parkway
   Irvine, California 92619-7013, USA
   Phone: +1-949-926-6432
   Email: Uri@Broadcom.com

   John Hufferd
   IBM Corp.
   5600 Cottle Rd.
   San Jose, CA 95120, USA
   Phone: +1-408-256-0403
   Email: hufferd@us.ibm.com

   Mike Ko
   IBM Corp.
   650 Harry Rd.
   San Jose, CA 95120, USA
   Phone: +1-408-927-2085
   Email: mako@us.ibm.com

   Hemal Shah
   Intel Corporation
   MS AN1-PTL1
   1501 South Mopac Expressway, #400
   Austin, Texas 78746, USA
   Phone: +1-512-732-3963
   Email: hemal.shah@intel.com

   Patricia Thaler
   Agilent Technologies, Inc.
   1101 Creekside Ridge Drive, #100
   M/S-RG10
   Roseville, CA 95678, USA
   Phone: +1-916-788-5662
   Email: pat_thaler@agilent.com

16 Acknowledgments

   Dwight Barron
   Hewlett-Packard Company
   20555 SH.249
   Houston, TX 77070-2698, USA
   Phone: +1-281-514-2769
   Email: Dwight.Barron@Hp.com

   John Carrier
   Adaptec, Inc.
   691 S. Milpitas Blvd.
   Milpitas, CA 95035, USA
   Phone: +1-360-378-8526
   Email: john_carrier@adaptec.com

   Ted Compton
   EMC Corporation
   Research Triangle Park, NC 27709, USA
   Phone: +1-919-248-6075
   Email: compton_ted@emc.com

   Paul R. Culley
   Hewlett-Packard Company
   20555 SH 249
   Houston, Tx. 77070-2698, USA
   Phone: +1-281-514-5543
   Email: paul.culley@hp.com

   Jeff Hilland
   Hewlett-Packard Company
   20555 SH 249
   Houston, Tx. 77070-2698, USA
   Phone: +1-281-514-9489
   Email: jeff.hilland@hp.com

   Mike Krause
   Hewlett-Packard Company
   43LN
   19410 Homestead Road
   Cupertino, CA 95014, USA
   Phone: +1-408-447-3191
   Email: krause@cup.hp.com

   Jim Pinkerton
   Microsoft, Inc.
   One Microsoft Way
   Redmond, WA, 98052, USA
   Email: jpink@windows.microsoft.com

   Renato J. Recio
   IBM Corp.
   11501 Burnett Road
   Austin, TX 78758, USA
   Phone: +1-512-838-3685
   Email: recio@us.ibm.com

   Julian Satran
   IBM Corp.
   Haifa Research Lab
   Haifa University Campus - Mount Carmel
   Haifa 31905, Israel
   Phone: +972-4-829-6264
   Email: Julian_Satran@il.ibm.com

   Tom Talpey
   Network Appliance
   375 Totten Pond Road
   Waltham, MA 02451, USA
   Phone: +1-781-768-5329
   Email: thomas.talpey@netapp.com

   Jim Wendt
   Hewlett-Packard Company
   8000 Foothills Boulevard MS 5668
   Roseville, CA 95747-5668, USA
   Phone: +1-916-785-5198
   Email: jim_wendt@hp.com

17 Full Copyright Statement

   Copyright (C) The Internet Society (2004). This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

   This document and the information contained herein is provided on an
   "AS IS" basis and ADAPTEC INC., AGILENT TECHNOLOGIES INC., BROADCOM
   CORPORATION, CISCO SYSTEMS INC., DELL COMPUTER CORPORATION, EMC
   CORPORATION, HEWLETT-PACKARD COMPANY, INTERNATIONAL BUSINESS
   MACHINES CORPORATION, INTEL CORPORATION, MICROSOFT CORPORATION,
   NETWORK APPLIANCE INC., THE INTERNET SOCIETY, AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology
   described in this document or the extent to which any license
   under such rights might or might not be available; nor does it
   represent that it has made any independent effort to identify any
   such rights. Information on the procedures with respect to rights
   in RFC documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention
   any copyrights, patents or patent applications, or other
   proprietary rights that may cover technology that may be required
   to implement this standard. Please address the information to the
   IETF at ietf-ipr@ietf.org.