idnits 2.17.1 draft-ietf-ips-iser-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 22. -- Found old boilerplate from RFC 3978, Section 5.5 on line 3964. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 3973. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 3980. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 3986. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- == There are 1 instance of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. == There are 2 instances of lines with private range IPv4 addresses in the document. If these are generic example addresses, they should be changed to use any of the ranges defined in RFC 6890 (or successor): 192.0.2.x, 198.51.100.x or 203.0.113.x. 
Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: The iSCSI Layer MUST not send a Login Request (or a Login Response) PDU during the full feature phase. A Login Request (or a Login Response) PDU, if used, MUST be treated as an iSCSI protocol error. The iSER Layer MAY reject such a PDU from the iSCSI Layer with an appropriate error code. If a Login Request PDU is received by the iSCSI Layer at the target, it MUST respond with a Reject PDU with a reason code of "protocol error". == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: PDUs in the Unregulated and Unexpected category are PDUs with the immediate flag set. The number of PDUs in this category which can be sent by an initiator is controlled by the value of MaxOutstandingUnexpectedPDUs declared by the target. (See section 6.7.) After a PDU in this category is sent by the initiator, it is outstanding until it is retired. At any time, the number of outstanding unexpected PDUs MUST not exceed the value of MaxOutstandingUnexpectedPDUs declared by the target. == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). 
Found 'MUST not' in this paragraph: For the control-type PDUs that can be sent by a target and are unexpected by the initiator, the number is controlled by MaxOutstandingUnexpectedPDUs declared by the initiator. (See section 6.7.) After a PDU in this category is sent by a target, it is outstanding until it is retired. At any time, the number of outstanding unexpected PDUs MUST not exceed the value of MaxOutstandingUnexpectedPDUs declared by the initiator. The initiator uses the value of MaxOutstandingUnexpectedPDUs that it declared to determine the amount of buffer resources required for control-type PDUs in this category that can be sent by a target. The following is a list of the PDUs in this category and the conditions for retiring the outstanding PDU: -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- Couldn't find a document date in the document -- date freshness check skipped. Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Unused Reference: 'RFC2119' is defined on line 3480, but no explicit reference was found in the text == Unused Reference: 'VERBS' is defined on line 3491, but no explicit reference was found in the text == Unused Reference: 'IPSEC' is defined on line 3495, but no explicit reference was found in the text ** Obsolete normative reference: RFC 3720 (Obsoleted by RFC 7143) ** Obsolete normative reference: RFC 793 (ref. 
'TCP') (Obsoleted by RFC 9293) -- No information found for draft-ietf-ips-da - is the name correct? -- No information found for draft-hilland-iwarp-verbs-v1 - is the name correct? -- Obsolete informational reference (is this intentional?): RFC 2401 (ref. 'IPSEC') (Obsoleted by RFC 4301) Summary: 5 errors (**), 0 flaws (~~), 10 warnings (==), 10 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 INTERNET DRAFT Mike Ko 3 draft-ietf-ips-iser-06.txt IBM Corporation 4 Mallikarjun Chadalapaka 5 Hewlett-Packard Company 6 John Hufferd 7 Brocade, Inc. 8 Uri Elzur 9 Hemal Shah 10 Patricia Thaler 11 Broadcom Corporation 13 Expires: May, 2007 15 iSCSI Extensions for RDMA Specification 17 Status of this Memo 19 By submitting this Internet-Draft, each author represents that any 20 applicable patent or other IPR claims of which he or she is aware 21 have been or will be disclosed, and any of which he or she becomes 22 aware will be disclosed, in accordance with Section 6 of BCP 79. 24 Internet-Drafts are working documents of the Internet Engineering 25 Task Force (IETF), its areas, and its working groups. Note that 26 other groups may also distribute working documents as Internet- 27 Drafts. 29 Internet-Drafts are draft documents valid for a maximum of six 30 months and may be updated, replaced, or obsoleted by other documents 31 at any time. It is inappropriate to use Internet-Drafts as 32 reference material or to cite them other than as "work in progress." 34 The list of current Internet-Drafts can be accessed at 35 http://www.ietf.org/1id-abstracts.html. 37 The list of Internet-Draft Shadow Directories can be accessed at 38 http://www.ietf.org/shadow.html. 40 Abstract 42 iSCSI Extensions for RDMA provides the RDMA data transfer capability 43 to iSCSI by layering iSCSI on top of an RDMA-Capable Protocol such 44 as the iWARP protocol suite. 
An RDMA-Capable Protocol provides RDMA 45 Read and Write services, which enable data to be transferred 46 directly into SCSI I/O Buffers without intermediate data copies. 47 This document describes the extensions to the iSCSI protocol to 49 Ko et al. Expires May 2007 1 50 support RDMA services as provided by an RDMA-Capable Protocol such 51 as the iWARP protocol suite. 53 Ko et al. Expires May 2007 2 54 Table of Contents 56 1 Definitions and Acronyms....................................7 57 1.1 Definitions.................................................7 58 1.2 Acronyms...................................................13 59 1.3 Conventions................................................15 60 2 Introduction...............................................16 61 2.1 Motivation.................................................16 62 2.2 Architectural Goals........................................17 63 2.3 Protocol Overview..........................................18 64 2.4 RDMA services and iSER.....................................19 65 2.4.1 STag......................................................19 66 2.4.2 Send......................................................20 67 2.4.3 RDMA Write................................................20 68 2.4.4 RDMA Read.................................................20 69 2.5 SCSI Read Overview.........................................21 70 2.6 SCSI Write Overview........................................21 71 2.7 iSCSI/iSER Layering........................................22 72 3 Upper Layer Interface Requirements.........................23 73 3.1 Operational Primitives offered by iSER.....................23 74 3.1.1 Send_Control..............................................24 75 3.1.2 Put_Data..................................................24 76 3.1.3 Get_Data..................................................24 77 3.1.4 Allocate_Connection_Resources.............................25 78 3.1.5 
Deallocate_Connection_Resources...........................25 79 3.1.6 Enable_Datamover..........................................25 80 3.1.7 Connection_Terminate......................................26 81 3.1.8 Notice_Key_Values.........................................26 82 3.1.9 Deallocate_Task_Resources.................................26 83 3.2 Operational Primitives used by iSER........................27 84 3.2.1 Control_Notify............................................27 85 3.2.2 Data_Completion_Notify....................................27 86 3.2.3 Data_ACK_Notify...........................................28 87 3.2.4 Connection_Terminate_Notify...............................28 88 3.3 iSCSI Protocol Usage Requirements..........................28 89 4 Lower Layer Interface Requirements.........................30 90 4.1 Interactions with the RCaP Layer...........................30 91 4.2 Interactions with the Transport Layer......................31 92 5 Connection Setup and Termination...........................32 93 5.1 iSCSI/iSER Connection Setup................................32 94 5.1.1 Initiator Behavior........................................33 95 5.1.2 Target Behavior...........................................35 96 5.1.3 iSER Hello Exchange.......................................37 97 5.2 iSCSI/iSER Connection Termination..........................38 98 5.2.1 Normal Connection Termination at the Initiator............38 99 5.2.2 Normal Connection Termination at the Target...............38 100 5.2.3 Termination without Logout Request/Response PDUs..........39 102 Ko et al. 
Expires May 2007 3 103 6 Login/Text Operational Keys................................41 104 6.1 HeaderDigest and DataDigest................................41 105 6.2 MaxRecvDataSegmentLength...................................41 106 6.3 RDMAExtensions.............................................41 107 6.4 TargetRecvDataSegmentLength................................42 108 6.5 InitiatorRecvDataSegmentLength.............................43 109 6.6 OFMarker and IFMarker......................................43 110 6.7 MaxOutstandingUnexpectedPDUs...............................44 111 7 iSCSI PDU Considerations...................................45 112 7.1 iSCSI Data-Type PDU........................................45 113 7.2 iSCSI Control-Type PDU.....................................46 114 7.3 iSCSI PDUs.................................................46 115 7.3.1 SCSI Command..............................................46 116 7.3.2 SCSI Response.............................................48 117 7.3.3 Task Management Function Request/Response.................49 118 7.3.4 SCSI Data-out.............................................51 119 7.3.5 SCSI Data-in..............................................51 120 7.3.6 Ready To Transfer (R2T)...................................54 121 7.3.7 Asynchronous Message......................................56 122 7.3.8 Text Request & Text Response..............................56 123 7.3.9 Login Request & Login Response............................56 124 7.3.10 Logout Request & Logout Response........................56 125 7.3.11 SNACK Request...........................................57 126 7.3.12 Reject..................................................57 127 7.3.13 NOP-Out & NOP-In........................................57 128 8 Flow Control and STag Management...........................58 129 8.1 Flow Control for RDMA Send Message Types...................58 130 8.1.1 Flow Control for Control-Type PDUs from the Initiator.....58 131 8.1.2 Flow 
Control for Control-Type PDUs from the Target........61 132 8.2 Flow Control for RDMA Read Resources.......................62 133 8.3 STag Management............................................62 134 8.3.1 Allocation of STags.......................................63 135 8.3.2 Invalidation of STags.....................................63 136 9 iSER Control and Data Transfer.............................65 137 9.1 iSER Header Format.........................................65 138 9.2 iSER Header Format for iSCSI Control-Type PDU..............65 139 9.3 iSER Header Format for iSER Hello Message..................67 140 9.4 iSER Header Format for iSER HelloReply Message.............68 141 9.5 SCSI Data Transfer Operations..............................69 142 9.5.1 SCSI Write Operation......................................69 143 9.5.2 SCSI Read Operation.......................................70 144 9.5.3 Bidirectional Operation...................................70 145 10 iSER Error Handling and Recovery...........................71 146 10.1 Error Handling............................................71 147 10.1.1 Errors in the Transport Layer...........................71 148 10.1.2 Errors in the RCaP Layer................................72 149 10.1.3 Errors in the iSER Layer................................72 151 Ko et al. 
Expires May 2007 4 152 10.1.4 Errors in the iSCSI Layer...............................74 153 10.2 Error Recovery............................................76 154 10.2.1 PDU Recovery............................................76 155 10.2.2 Connection Recovery.....................................77 156 11 Security Considerations....................................78 157 12 IANA Considerations........................................79 158 13 References.................................................80 159 13.1 Normative References......................................80 160 13.2 Informative References....................................80 161 14 Appendix A.................................................82 162 14.1 iWARP Message Format for iSER.............................82 163 14.1.1 iWARP Message Format for iSER Hello Message.............82 164 14.1.2 iWARP Message Format for iSER HelloReply Message........83 165 14.1.3 iWARP Message Format for SCSI Read Command PDU..........84 166 14.1.4 iWARP Message Format for SCSI Read Data.................85 167 14.1.5 iWARP Message Format for SCSI Write Command PDU.........86 168 14.1.6 iWARP Message Format for RDMA Read Request..............87 169 14.1.7 iWARP Message Format for Solicited SCSI Write Data......88 170 14.1.8 iWARP Message Format for SCSI Response PDU..............89 171 15 Appendix B.................................................90 172 15.1 Architectural discussion of iSER over InfiniBand..........90 173 15.2 The Host side of the iSCSI & iSER connections in Infiniband90 174 15.3 The Storage side of iSCSI & iSER mixed network environment91 175 15.4 Discovery processes for an InfiniBand Host................91 176 15.5 IBTA Connection specifications............................92 177 16 Author's Address...........................................93 178 17 Acknowledgments............................................94 179 18 Full Copyright Statement...................................95 181 Ko et al. 
Expires May 2007 5 182 Table of Figures 184 Figure 1 Example of iSCSI/iSER Layering in Full Feature Phase...22 185 Figure 2 iSER Header Format.....................................65 186 Figure 3 iSER Header Format for iSCSI Control-Type PDU..........66 187 Figure 4 iSER Header Format for iSER Hello Message..............67 188 Figure 5 iSER Header Format for iSER HelloReply Message.........68 189 Figure 6 SendSE Message containing an iSER Hello Message........82 190 Figure 7 SendSE Message containing an iSER HelloReply Message...83 191 Figure 8 SendSE Message containing a SCSI Read Command PDU......84 192 Figure 9 RDMA Write Message containing SCSI Read Data...........85 193 Figure 10 SendSE Message containing a SCSI Write Command PDU....86 194 Figure 11 RDMA Read Request Message.............................87 195 Figure 12 RDMA Read Response Message containing SCSI Write Data.88 196 Figure 13 SendInvSE Message containing SCSI Response PDU........89 197 Figure 14 iSCSI and iSER on IB..................................90 198 Figure 15 Storage Controller with TCP, iWARP, and IB Connections91 200 Ko et al. Expires May 2007 6 201 1 Definitions and Acronyms 203 1.1 Definitions 205 Advertisement (Advertised, Advertise, Advertisements, Advertises) - 206 The act of informing a remote iSER Layer that a local node's 207 buffer is available to it. A Node makes a buffer available for 208 incoming RDMA Read Request Message or incoming RDMA Write 209 Message access by informing the remote iSER Layer of the Tagged 210 Buffer identifiers (STag, TO, and buffer length). Note that 211 this Advertisement of Tagged Buffer information is the 212 responsibility of the iSER Layer on either end and is not 213 defined by the RDMA-Capable Protocol. A typical method would be 214 for the iSER Layer to embed the Tagged Buffer's STag, TO, and 215 buffer length in a Send Message destined for the remote iSER 216 Layer. 
218 Completion (Completed, Complete, Completes) - Completion is defined 219 as the process by which the RDMA-Capable Protocol layer informs the 220 iSER Layer that a particular RDMA Operation has performed all 221 functions specified for the RDMA Operation. 223 Connection - A connection is a logical circuit between the initiator 224 and the target, e.g., a TCP connection. Communication between 225 the initiator and the target occurs over one or more 226 connections. The connections carry control messages, SCSI 227 commands, parameters, and data within iSCSI Protocol Data Units 228 (iSCSI PDUs). 230 Connection Handle - An information element that identifies the 231 particular iSCSI connection and is unique for a given iSCSI-iSER 232 pair. Every invocation of an Operational Primitive is qualified 233 with the Connection Handle. 235 Data Sink - The peer receiving a data payload. Note that the Data 236 Sink can be required to both send and receive RCaP Messages to 237 transfer a data payload. 239 Data Source - The peer sending a data payload. Note that the Data 240 Source can be required to both send and receive RCaP Messages to 241 transfer a data payload. 243 Datamover Interface (DI) - The interface between the iSCSI Layer and 244 the Datamover Layer as described in [DA]. 246 Datamover Layer - A layer that is directly below the iSCSI Layer and 247 above the underlying transport layers. This layer exposes and 250 uses a set of transport-independent Operational Primitives for 251 the communication between the iSCSI Layer and itself. The 252 Datamover layer, operating in conjunction with the transport 253 layers, moves the control and data information on the iSCSI 254 connection. In this specification, the iSER Layer is the 255 Datamover layer. 257 Datamover Protocol - A Datamover protocol is the wire-protocol that 258 is defined to realize the Datamover layer functionality.
In 259 this specification, the iSER protocol is the Datamover protocol. 261 Event - An indication provided by the RDMA-Capable Protocol layer to 262 the iSER Layer to indicate a Completion or other condition 263 requiring immediate attention. 265 Inbound RDMA Read Queue Depth (IRD) - The maximum number of incoming 266 outstanding RDMA Read Requests that the RDMA-Capable Controller 267 can handle on a particular RCaP Stream at the Data Source. For 268 some RDMA-Capable Protocol layers, the term "IRD" may be known 269 by a different name. For example, for InfiniBand, the 270 equivalent for IRD is the Responder Resources. 272 Invalidate STag - A mechanism used to prevent the Remote Peer from 273 reusing a previously explicitly Advertised STag until the iSER 274 Layer at the local node makes it available through a subsequent 275 explicit Advertisement. 277 I/O Buffer - A buffer that is used in a SCSI Read or Write operation 278 so SCSI data may be sent from or received into that buffer. 280 iSCSI - The iSCSI protocol as defined in [RFC3720] is a mapping of 281 the SCSI Architecture Model of SAM-2 over TCP. 283 iSCSI control-type PDU - Any iSCSI PDU that is not an iSCSI data- 284 type PDU and also not a SCSI Data-out PDU carrying solicited 285 data is defined as an iSCSI control-type PDU. Specifically, it 286 is to be noted that SCSI Data-out PDUs for unsolicited data are 287 defined as iSCSI control-type PDUs. 289 iSCSI data-type PDU - An iSCSI data-type PDU is defined as an iSCSI 290 PDU that causes data transfer, transparent to the remote iSCSI 291 Layer, to take place between the peer iSCSI nodes on a full 292 feature phase iSCSI connection. An iSCSI data-type PDU, when 293 requested for transmission by the sender iSCSI Layer, results in 294 the associated data transfer without the participation of the 295 remote iSCSI Layer, i.e., the PDU itself is not delivered as-is 298 to the remote iSCSI Layer.
The following iSCSI PDUs constitute 299 the set of iSCSI data-type PDUs - SCSI Data-in PDU and R2T PDU. 301 iSCSI Layer - A layer in the protocol stack implementation within an 302 end node that implements the iSCSI protocol and interfaces with 303 the iSER Layer via the Datamover Interface. 305 iSCSI PDU (iSCSI Protocol Data Unit) - The iSCSI Layer at the 306 initiator and the iSCSI Layer at the target divide their 307 communications into messages. The term "iSCSI protocol data 308 unit" (iSCSI PDU) is used for these messages. 310 iSCSI/iSER Connection - An iSER-assisted iSCSI connection. 312 iSCSI/iSER Session - An iSER-assisted iSCSI session. 314 iSCSI-iSER Pair - The iSCSI Layer and the underlying iSER Layer. 316 iSER - iSCSI Extensions for RDMA, the protocol defined in this 317 document. 319 iSER-assisted - A term generally used to describe the operation of 320 iSCSI when the iSER functionality is also enabled below the 321 iSCSI Layer for the specific iSCSI/iSER connection in question. 323 iSER-IRD - This variable represents the maximum number of incoming 324 outstanding RDMA Read Requests that the iSER Layer at the 325 initiator declares on a particular RCaP Stream. 327 iSER-ORD - This variable represents the maximum number of 328 outstanding RDMA Read Requests that the iSER Layer can initiate 329 on a particular RCaP Stream. This variable is maintained only 330 by the iSER Layer at the target. 332 iSER Layer - The layer that implements the iSCSI Extensions for RDMA 333 (iSER) protocol. 335 iWARP - A suite of wire protocols comprising [RDMAP], [DDP], and 336 [MPA] when layered above [TCP]. [RDMAP] and [DDP] may be 337 layered above SCTP or other transport protocols. 339 Local Mapping - A task state record maintained by the iSER Layer 340 that associates the Initiator Task Tag to the Local STag(s). 341 The specifics of the record structure are implementation 342 dependent. 344 Ko et al.
Expires May 2007 9 345 Local Peer - The implementation of the RDMA-Capable Protocol on the 346 local end of the connection. Used to refer to the local entity 347 when describing protocol exchanges or other interactions between 348 two Nodes. 350 Node - A computing device attached to one or more links of a 351 network. A Node in this context does not refer to a specific 352 application or protocol instantiation running on the computer. 353 A Node may consist of one or more RDMA-Capable Controllers 354 installed in a host computer. 356 Operational Primitive - An Operational Primitive is an abstract 357 functional interface procedure that requests another layer to 358 perform a specific action on the requestor's behalf or notifies 359 the other layer of some event. The Datamover Interface between 360 an iSCSI Layer and a Datamover layer within an iSCSI end node 361 uses a set of Operational Primitives to define the functional 362 interface between the two layers. Note that not every 363 invocation of an Operational Primitive may elicit a response 364 from the requested layer. A full discussion of the Operational 365 Primitive types and request-response semantics available to 366 iSCSI and iSER can be found in [DA]. 368 Outbound RDMA Read Queue Depth (ORD) - The maximum number of 369 outstanding RDMA Read Requests that the RDMA-Capable Controller 370 can initiate on a particular RCaP Stream at the Data Sink. For 371 some RDMA-Capable Protocol layers, the term "ORD" may be known by 372 a different name. For example, for InfiniBand, the equivalent 373 for ORD is the Initiator Depth. 375 Phase-Collapse - Refers to the optimization in iSCSI where the SCSI 376 status is transferred along with the final SCSI Data-in PDU from 377 a target. See section 3.2 in [RFC3720]. 379 RCaP Message - One or more packets of the network layer comprising a 380 single RDMA operation or a part of an RDMA Read Operation of the 381 RDMA-Capable Protocol.
For iWARP, an RCaP Message is known as 382 an RDMAP Message. 384 RCaP Stream - A single bidirectional association between the peer 385 RDMA-Capable Protocol layers on two Nodes over a single 386 transport-level stream. For iWARP, an RCaP Stream is known as 387 an RDMAP Stream, and the association is created when the 388 connection transitions to iSER-assisted mode following a 389 successful Login Phase during which iSER support is negotiated. 392 RDMA-Capable Protocol (RCaP) - The protocol or protocol suite that 393 provides a reliable RDMA transport functionality, e.g., iWARP, 394 InfiniBand, etc. 396 RDMA-Capable Controller - A network I/O adapter or embedded 397 controller with RDMA functionality. For example, for iWARP, 398 this could be an RNIC, and for InfiniBand, this could be an HCA 399 (Host Channel Adapter) or TCA (Target Channel Adapter). 401 RDMA-enabled Network Interface Controller (RNIC) - A network I/O 402 adapter or embedded controller with iWARP functionality. 404 RDMA Operation - A sequence of RCaP Messages, including control 405 Messages, to transfer data from a Data Source to a Data Sink. 406 The following RDMA Operations are defined - RDMA Write 407 Operation, RDMA Read Operation, Send Operation, Send with 408 Invalidate Operation, Send with Solicited Event Operation, Send 409 with Solicited Event and Invalidate Operation, and Terminate 410 Operation. 412 RDMA Protocol (RDMAP) - A wire protocol that supports RDMA 413 Operations to transfer ULP data between a Local Peer and the 414 Remote Peer as described in [RDMAP]. 416 RDMA Read Operation - An RDMA Operation used by the Data Sink to 417 transfer the contents of a Data Source buffer from the Remote 418 Peer to a Data Sink buffer at the Local Peer. An RDMA Read 419 Operation consists of a single RDMA Read Request Message and a 420 single RDMA Read Response Message.
422 RDMA Read Request - An RCaP Message used by the Data Sink to request 423 the Data Source to transfer the contents of a buffer. The RDMA 424 Read Request Message describes both the Data Source and the Data 425 Sink buffers. 427 RDMA Read Response - An RCaP Message used by the Data Source to 428 transfer the contents of a buffer to the Data Sink, in response 429 to an RDMA Read Request. The RDMA Read Response Message only 430 describes the Data Sink buffer. 432 RDMA Write Operation - An RDMA Operation used by the Data Source to 433 transfer the contents of a Data Source buffer from the Local 434 Peer to a Data Sink buffer at the Remote Peer. The RDMA Write 435 Message only describes the Data Sink buffer. 437 Remote Direct Memory Access (RDMA) - A method of accessing memory on 438 a remote system in which the local system specifies the remote 441 location of the data to be transferred. Employing an RDMA- 442 Capable Controller in the remote system allows the access to take 443 place without interrupting the processing of the CPU(s) on the 444 system. 446 Remote Mapping - A task state record maintained by the iSER Layer 447 that associates the Initiator Task Tag to the Advertised STag(s). 448 The specifics of the record structure are implementation 449 dependent. 451 Remote Peer - The implementation of the RDMA-Capable Protocol on the 452 opposite end of the connection. Used to refer to the remote 453 entity when describing protocol exchanges or other interactions 454 between two Nodes. 456 SCSI Layer - This layer builds/receives SCSI CDBs (Command 457 Descriptor Blocks) and sends/receives them with the remaining 458 command execute [SAM2] parameters to/from the iSCSI Layer. 460 Send - An RDMA Operation that transfers the contents of a Buffer 461 from the Local Peer to a Buffer at the Remote Peer.
463 Send Message Type - A Send Message, Send with Invalidate Message, 464 Send with Solicited Event Message, or Send with Solicited Event 465 and Invalidate Message. 467 SendInvSE Message - A Send with Solicited Event and Invalidate 468 Message. 470 SendSE Message - A Send with Solicited Event Message. 472 Sequence Number (SN) - DataSN for a SCSI Data-in PDU and R2TSN for 473 an R2T PDU. The semantics for both types of sequence numbers 474 are as defined in [RFC3720]. 476 Session, iSCSI Session - The group of Connections that link an 477 initiator SCSI port with a target SCSI port form an iSCSI 478 session (equivalent to a SCSI I-T nexus). Connections can be 479 added to and removed from a session even while the I-T nexus is 480 intact. Across all connections within a session, an initiator 481 sees one and the same target. 483 Solicited Event (SE) - A facility by which an RDMA Operation sender 484 may cause an Event to be generated at the recipient, if the 485 recipient is configured to generate such an Event, when a Send 486 with Solicited Event or Send with Solicited Event and Invalidate 487 Message is received. 490 Steering Tag (STag) - An identifier of a Tagged Buffer on a Node 491 (Local or Remote) as defined in [RDMAP] and [DDP]. For other 492 RDMA-Capable Protocols, the Steering Tag may be known by 493 different names but will be herein referred to as STags. For 494 example, for InfiniBand, a Remote STag is known as an R-Key, and 495 a Local STag is known as an L-Key, and both will be considered 496 STags. 498 Tagged Buffer - A buffer that is explicitly Advertised to the iSER 499 Layer at the remote node through the exchange of an STag, Tagged 500 Offset, and length. 502 Tagged Offset (TO) - The offset within a Tagged Buffer. 504 Traditional iSCSI - Refers to the iSCSI protocol as defined in 505 [RFC3720] (i.e., without the iSER enhancements).
507 Untagged Buffer - A buffer that is not explicitly Advertised to the 508 iSER Layer at the remote node.
510 1.2 Acronyms
512 Acronym   Definition
514 --------------------------------------------------------------
516 AHS       Additional Header Segment
518 BHS       Basic Header Segment
520 CO        Connection Only
522 CRC       Cyclic Redundancy Check
524 DDP       Direct Data Placement Protocol
526 DI        Datamover Interface
528 HCA       Host Channel Adapter
530 IANA      Internet Assigned Numbers Authority
532 IB        InfiniBand
534 IETF      Internet Engineering Task Force
536 I/O       Input - Output
539 IO        Initialize Only
541 IP        Internet Protocol
543 IPoIB     IP over InfiniBand
545 IPsec     Internet Protocol Security
547 iSER      iSCSI Extensions for RDMA
549 ITT       Initiator Task Tag
551 LO        Leading Only
553 MPA       Marker PDU Aligned Framing for TCP
555 NOP       No Operation
557 NSG       Next Stage (during the iSCSI Login Phase)
559 OS        Operating System
561 PDU       Protocol Data Unit
563 R2T       Ready To Transfer
565 R2TSN     Ready To Transfer Sequence Number
567 RDMA      Remote Direct Memory Access
569 RDMAP     Remote Direct Memory Access Protocol
571 RFC       Request For Comments
573 RNIC      RDMA-enabled Network Interface Controller
575 SAM2      SCSI Architecture Model - 2
577 SCSI      Small Computer Systems Interface
579 SNACK     Selective Negative Acknowledgment - also
581           Sequence Number Acknowledgement for data
583 STag      Steering Tag
585 SW        Session Wide
588 TCA       Target Channel Adapter
590 TCP       Transmission Control Protocol
592 TMF       Task Management Function
594 TTT       Target Transfer Tag
596 TO        Tagged Offset
598 ULP       Upper Level Protocol
600 1.3 Conventions
602 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 603 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 604 document are to be interpreted as described in RFC 2119. 606 Ko et al.
Expires May 2007 15 607 2 Introduction 609 2.1 Motivation 611 The iSCSI protocol ([RFC3720]) is a mapping of the SCSI Architecture 612 Model (see [SAM2]) over the TCP protocol. SCSI commands are carried 613 by iSCSI requests and SCSI responses and status are carried by iSCSI 614 responses. Other iSCSI protocol exchanges and SCSI Data are also 615 transported in iSCSI PDUs. 617 Out-of-order TCP segments in the Traditional iSCSI model have to be 618 stored and reassembled before the iSCSI protocol layer within an end 619 node can place the data in the iSCSI buffers. This reassembly is 620 required because not every TCP segment is likely to contain an iSCSI 621 header to enable its placement and TCP itself does not have a built- 622 in mechanism for signaling ULP message boundaries to aid placement 623 of out-of-order segments. This TCP reassembly at high network 624 speeds is quite counter-productive for the following reasons: wasted 625 memory bandwidth in data copying, need for reassembly memory, wasted 626 CPU cycles in data copying, and the general store-and-forward 627 latency from an application perspective. TCP reassembly was 628 recognized as a serious issue in [RFC3720], and the notion of a 629 "sync and steering layer" was introduced that is optional to 630 implement and use. One specific sync and steering mechanism, called 631 "markers", was defined in [RFC3720] which provides an application- 632 level way of framing iSCSI PDUs within the TCP data stream even when 633 the TCP segments are not yet reassembled to be in-order. 635 With these defined techniques in [RFC3720], a Network Interface 636 Controller customized for iSCSI (SNIC) could offload the TCP/IP 637 processing and support direct data placement, but most iSCSI 638 implementations do not support iSCSI "markers", making SNIC 639 marker-based direct data placement unusable in practice. 
641 The iWARP protocol stack provides direct data placement 642 functionality that is usable in practice, and in addition, there is 643 also interest in using iSCSI with other RDMA protocol stacks that 644 support direct data placement, such as the one provided by 645 InfiniBand. The generic term RDMA-Capable Protocol (RCaP) is used 646 to refer to the RDMA functionality provided by such protocol stacks. 648 With the availability of RDMA-Capable Controllers within a host 649 system, which does not have SNICs, it is appropriate for iSCSI to be 650 able to exploit the direct data placement function of the RDMA- 651 Capable Controller like other applications. 653 Ko et al. Expires May 2007 16 654 iSCSI Extensions for RDMA (iSER) is designed precisely to take 655 advantage of generic RDMA technologies - iSER's goal is to permit 656 iSCSI to employ direct data placement and RDMA capabilities using a 657 generic RDMA-Capable Controller. In summary, iSCSI/iSER protocol 658 stack is designed to enable scaling to high speeds by relying on a 659 generic data placement process and RDMA technologies and products, 660 which enable direct data placement of both in-order and out-of-order 661 data. 663 This document describes iSER as a protocol extension to iSCSI, both 664 for convenience of description and also because it is true in a very 665 strict protocol sense. However, it is to be noted that iSER is in 666 reality extending the connectivity of the iSCSI protocol defined in 667 [RFC3720], and the name iSER reflects this reality. 669 When the iSCSI protocol as defined in [RFC3720] (i.e. without the 670 iSER enhancements) is intended in the rest of the document, the term 671 "Traditional iSCSI" is used to make the intention clear. 673 2.2 Architectural Goals 675 This section summarizes the architectural goals that guided the 676 design of iSER. 678 1. 
Provide an RDMA data transfer model for iSCSI that enables direct 679 in order or out of order data placement of SCSI data into pre- 680 allocated SCSI buffers while maintaining in order data delivery. 682 2. Not require any major changes to SCSI Architecture Model [SAM2] 683 and SCSI command set standards. 685 3. Utilize existing iSCSI infrastructure (sometimes referred to as 686 "iSCSI ecosystem") including but not limited to MIB, 687 bootstrapping, negotiation, naming & discovery, and security. 689 4. Require a session to operate in the Traditional iSCSI data 690 transfer mode if iSER is not supported by either the initiator or 691 the target (not require iSCSI full feature phase interoperability 692 between an end node operating in Traditional iSCSI mode, and an 693 end node operating in iSER-assisted mode). 695 5. Allow initiator and target implementations to utilize generic 696 RDMA-Capable Controllers such as RNICs, or implement iSCSI and 697 iSER in software (not require iSCSI or iSER specific assists in 698 the RCaP implementation or RDMA-Capable Controller). 700 Ko et al. Expires May 2007 17 701 6. Require full and only generic RCaP functionality at both the 702 initiator and the target. 704 7. Implement a light weight Datamover protocol for iSCSI with minimal 705 state maintenance. 707 2.3 Protocol Overview 709 Consistent with the architectural goals stated in section 2.2, the 710 iSER protocol does not require changes in the iSCSI ecosystem or any 711 related SCSI specifications. iSER protocol defines the mapping of 712 iSCSI PDUs to RCaP Messages in such a way that it is entirely 713 feasible to realize iSCSI/iSER implementations that are based on 714 generic RDMA-Capable Controllers. The iSER protocol layer requires 715 minimal state maintenance to assist an iSCSI full feature phase 716 connection, besides being oblivious to the notion of an iSCSI 717 session. The crucial protocol aspects of iSER may be summarized 718 thus: 720 1. 
iSER-assisted mode is negotiated during the iSCSI login for each 721 session, and an entire iSCSI session can only operate in one mode 722 (i.e. a connection in a session cannot operate in iSER-assisted 723 mode if a different connection of the same session is already in 724 full feature phase in the Traditional iSCSI mode). 726 2. Once in iSER-assisted mode, all iSCSI interactions on that 727 connection use RCaP Messages. 729 3. A Send Message Type is used for carrying an iSCSI control-type 730 PDU preceded by an iSER header. See section 7.2 for more details 731 on iSCSI control-type PDUs. 733 4. RDMA Write, RDMA Read Request, and RDMA Read Response Messages 734 are used for carrying control and all data information associated 735 with the iSCSI data-type PDUs. See section 7.1 for more details 736 on iSCSI data-type PDUs. 738 5. Target drives all data transfer (with the exception of iSCSI 739 unsolicited data) for SCSI writes and SCSI reads, by issuing RDMA 740 Read Requests and RDMA Writes respectively. 742 6. RCaP is responsible for ensuring data integrity. (For example, 743 iWARP includes a CRC-enhanced framing layer called MPA on top of 744 TCP; and for Infiniband, the CRCs are included in the Reliable 745 Connection mode). For this reason, iSCSI header and data digests 746 are negotiated to "None" for iSCSI/iSER sessions. 748 Ko et al. Expires May 2007 18 749 7. The iSCSI error recovery hierarchy defined in [RFC3720] is fully 750 supported by iSER. (However, see section 7.3.11 on the handling 751 of SNACK Request PDUs.) 753 8. iSER requires no changes to iSCSI authentication, security, and 754 text mode negotiation mechanisms. 756 Note that Traditional iSCSI implementations may have to be adapted 757 to employ iSER. It is expected that the adaptation when required is 758 likely to be centered around the upper layer interface requirements 759 of iSER (section 3). 
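The mapping in items 3 through 5 above can be illustrated with a minimal sketch. It assumes, based on this document's descriptions of Put_Data and Get_Data, that the data-type PDUs are the SCSI Data-in PDU and the R2T PDU; the names `RCaPMessage`, `DATA_TYPE_PDUS`, and `rcap_message_for` are hypothetical and are not defined by iSER.

```python
from enum import Enum


class RCaPMessage(Enum):
    """Kinds of RCaP Message an iSER Layer may issue (illustrative only)."""
    SEND = "Send Message Type"
    RDMA_WRITE = "RDMA Write"
    RDMA_READ_REQUEST = "RDMA Read Request"


# Assumption: data-type PDUs are never sent as PDUs on the wire. The target
# turns them into RDMA operations against the Advertised buffer instead.
DATA_TYPE_PDUS = {
    "SCSI Data-in": RCaPMessage.RDMA_WRITE,   # SCSI read data goes to the initiator
    "R2T": RCaPMessage.RDMA_READ_REQUEST,     # solicited write data is fetched
}


def rcap_message_for(pdu_name: str) -> RCaPMessage:
    """Return the RCaP Message kind used for the named iSCSI PDU.

    Any PDU that is not a data-type PDU is treated as a control-type PDU
    and travels in a Send Message Type, preceded by an iSER header.
    """
    return DATA_TYPE_PDUS.get(pdu_name, RCaPMessage.SEND)
```

For example, `rcap_message_for("SCSI Command")` yields the Send entry, while `rcap_message_for("R2T")` yields the RDMA Read Request entry. The sketch does not model the choice among Send, SendSE, and SendInvSE.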
761 2.4 RDMA services and iSER 763 iSER is designed to work with software and/or hardware protocol 764 stacks providing the protocol services defined in RCaP documents 765 such as [RDMAP], [IB], etc. The following subsections describe the 766 key protocol elements of RCaP services that iSER relies on. 768 2.4.1 STag 770 An STag is the identifier of an I/O Buffer unique to an RDMA-Capable 771 Controller that the iSER Layer Advertises to the remote iSCSI/iSER 772 node in order to complete a SCSI I/O. 774 In iSER, Advertisement is the act of informing the target by the 775 initiator that an I/O Buffer is available at the initiator for RDMA 776 Read or RDMA Write access by the target. The initiator Advertises 777 the I/O Buffer by including the STag in the header of an iSER 778 Message containing the SCSI Command PDU to the target. The base 779 Tagged Offset is not explicitly specified, but the target must 780 always assume it as zero. The buffer length is as specified in the 781 SCSI Command PDU. 783 The iSER Layer at the initiator Advertises the STag for the I/O 784 Buffer of each SCSI I/O to the iSER Layer at the target in the iSER 785 header of the SendSE Message containing the SCSI Command PDU, unless 786 the I/O can be completely satisfied by unsolicited data alone. 788 The iSER Layer at the target provides the STag for the I/O Buffer 789 that is the Data Sink of an RDMA Read Operation (section 2.4.4) to 790 the RCaP layer on the initiator node - i.e. this is completely 791 transparent to the iSER Layer at the initiator. 793 The iSER protocol is defined so that the Advertised STag is 794 automatically invalidated upon a normal completion of the associated 795 task. This automatic invalidation is realized via the SendInvSE 797 Ko et al. Expires May 2007 19 798 Message carrying the SCSI Response PDU. There are two exceptions to 799 this automatic invalidation - bidirectional commands, and abnormal 800 completion of a command. 
The iSER Layer at the initiator is 801 required to explicitly invalidate the STag in these cases, in 802 addition to sanity checking the automatic invalidation even when 803 that does happen. 805 2.4.2 Send 807 Send is the RDMA Operation that is not addressed to an Advertised 808 buffer by the sending side, and thus uses Untagged buffers on the 809 receiving side. 811 The iSER Layer at the initiator uses the Send Operation to transmit 812 any iSCSI control-type PDU to the target. As an example, the 813 initiator uses Send Operations to transfer iSER Messages containing 814 SCSI Command PDUs to the iSER Layer at the target. 816 An iSER layer at the target uses the Send Operation to transmit any 817 iSCSI control-type PDU to the initiator. As an example, the target 818 uses Send Operations to transfer iSER Messages containing SCSI 819 Response PDUs to the iSER Layer at the initiator. 821 2.4.3 RDMA Write 823 RDMA Write is the RDMA Operation that is used to place data into an 824 Advertised buffer on the receiving side. The sending side addresses 825 the Message using an STag and a Tagged Offset that are valid on the 826 Data Sink. 828 The iSER Layer at the target uses the RDMA Write Operation to 829 transfer the contents of a local I/O Buffer to an Advertised I/O 830 Buffer at the initiator. The iSER Layer at the target uses the RDMA 831 Write to transfer whole or part of the data required to complete a 832 SCSI Read command. 834 The iSER Layer at the initiator does not employ RDMA Writes. 836 2.4.4 RDMA Read 838 RDMA Read is the RDMA Operation that is used to retrieve data from 839 an Advertised buffer on a remote node. The sending side of the RDMA 840 Read Request addresses the Message using an STag and a Tagged Offset 841 that are valid on the Data Source in addition to providing a valid 842 local STag and Tagged Offset that identify the Data Sink. 844 Ko et al. 
Expires May 2007 20 845 The iSER Layer at the target uses the RDMA Read Operation to 846 transfer the contents of an Advertised I/O Buffer at the initiator 847 to a local I/O Buffer at the target. The iSER Layer at the target 848 uses the RDMA Read to fetch whole or part of the data required to 849 complete a SCSI Write Command. 851 The iSER Layer at the initiator does not employ RDMA Reads. 853 2.5 SCSI Read Overview 855 The iSER Layer at the initiator receives the SCSI Command PDU from 856 the iSCSI Layer. The iSER Layer at the initiator generates an STag 857 for the I/O Buffer of the SCSI Read and Advertises the buffer by 858 including the STag as part of the iSER header for the PDU. The iSER 859 Message is transferred to the target using a SendSE Message. 861 The iSER Layer at the target uses one or more RDMA Writes to 862 transfer the data required to complete the SCSI Read. 864 The iSER Layer at the target uses a SendInvSE Message to transfer 865 the SCSI Response PDU back to the iSER Layer at the initiator. The 866 iSER Layer at the initiator notifies the iSCSI Layer of the 867 availability of the SCSI Response PDU. 869 2.6 SCSI Write Overview 871 The iSER Layer at the initiator receives the SCSI Command PDU from 872 the iSCSI Layer. If solicited data transfer is involved, the iSER 873 Layer at the initiator generates an STag for the I/O Buffer of the 874 SCSI Write and Advertises the buffer by including the STag as part 875 of the iSER header for the PDU. The iSER Message is transferred to 876 the target using a SendSE Message. 878 The iSER Layer at the initiator may optionally send one or more non- 879 immediate unsolicited data PDUs to the target using Send Message 880 Types. 882 If solicited data transfer is involved, the iSER Layer at the target 883 uses one or more RDMA Reads to transfer the data required to 884 complete the SCSI Write. 
886 The iSER Layer at the target uses a SendInvSE Message to transfer 887 the SCSI Response PDU back to the iSER Layer at the initiator. The 888 iSER Layer at the initiator notifies the iSCSI Layer of the 889 availability of the SCSI Response PDU. 891 Ko et al. Expires May 2007 21 892 2.7 iSCSI/iSER Layering 894 iSCSI Extensions for RDMA (iSER) is layered between the iSCSI layer 895 and the RCaP layer. Note that the RCaP layer may be composed of one 896 or more distinct protocol layers depending on the specifics of the 897 RCaP. Figure 1 shows an example of the relationship between SCSI, 898 iSCSI, iSER, and the different RCaP layers. For TCP, the RCaP is 899 iWARP. For Infiniband, the RCaP is the Reliable Connected Transport 900 Service. Note that the iSCSI layer as described here supports the 901 RDMA Extensions as used in iSER. 903 +-------------------------------------+ 904 | SCSI | 905 +-------------------------------------+ 906 | iSCSI | 907 DI ------> +-------------------------------------+ 908 | iSER | 909 +---------+--------------+------------+ 910 | RDMAP | | | 911 +---------+ Infiniband | | 912 | DDP | Reliable | Other | 913 +---------+ Connected | RDMA- | 914 | MPA | Transport | Capable | 915 +---------+ Service | Protocol | 916 | TCP | | | 917 +---------+--------------+------------+ 918 | | Infiniband | Other | 919 | IP | Network | Network | 920 | | Layer | Layer | 921 +---------+--------------+------------+ 923 Figure 1 Example of iSCSI/iSER Layering in Full Feature Phase 925 Ko et al. Expires May 2007 22 926 3 Upper Layer Interface Requirements 928 This section discusses the upper layer interface requirements in the 929 form of an abstract model of the required interactions between the 930 iSCSI Layer and the iSER Layer. The abstract model used here is 931 derived from the architectural model described in [DA]. 
[DA] also 932 provides a functional overview of the interactions between the iSCSI 933 Layer and the datamover layer as intended by the Datamover 934 Architecture. 936 The interface requirements are specified by Operational Primitives. 937 An Operational Primitive is an abstract functional interface 938 procedure between the iSCSI Layer and the iSER Layer that requests 939 one layer to perform a specific action on behalf of the other layer 940 or notifies the other layer of some event. Whenever an Operational 941 Primitive is invoked, the Connection_Handle qualifier is used to 942 identify a particular iSCSI connection. For some Operational 943 Primitives, a Data_Descriptor is used to identify the iSCSI/SCSI 944 data buffer associated with the requested or completed operation. 946 The abstract model and the Operational Primitives defined in this 947 section facilitate the description of the iSER protocol. In the 948 rest of the iSER specification, the compliance statements related to 949 the use of these Operational Primitives are only for the purpose of 950 the required interactions between the iSCSI Layer and the iSER 951 Layer. Note that the compliance statements related to the 952 Operational Primitives in the rest of this specification only 953 mandate functional equivalence on implementations, but do not put 954 any requirements on the implementation specifics of the interface 955 between the iSCSI Layer and the iSER Layer. 957 Each Operational Primitive is invoked with a set of qualifiers which 958 specify the information context for performing the specific action 959 being requested of the Operational Primitive. While the qualifiers 960 are required, the method of realizing the qualifiers (e.g., by 961 passing synchronously with invocation, or by retrieving from task 962 context, or by retrieving from shared memory, etc.) is 963 implementation dependent.
965 3.1 Operational Primitives offered by iSER 967 The iSER protocol layer MUST support the following Operational 968 Primitives to be used by the iSCSI protocol layer. 970 Ko et al. Expires May 2007 23 971 3.1.1 Send_Control 973 Input qualifiers: Connection_Handle, BHS and AHS (if any) of 974 the iSCSI PDU, PDU-specific qualifiers 976 Return results: Not specified 978 This is used by the iSCSI Layers at the initiator and the target to 979 request the outbound transfer of an iSCSI control-type PDU (see 980 section 7.2). Qualifiers that only apply for a particular control- 981 type PDU are known as PDU-specific qualifiers, e.g., 982 ImmediateDataSize for a SCSI Write command. For details on PDU- 983 specific qualifiers, see section 7.3. The iSCSI Layer can only 984 invoke the Send_Control Operational Primitive when the connection is 985 in iSER-assisted mode. 987 3.1.2 Put_Data 989 Input qualifiers: Connection_Handle, content of a SCSI Data-in 990 PDU header, Data_Descriptor, Notify_Enable 992 Return results: Not specified 994 This is used by the iSCSI Layer at the target to request the 995 outbound transfer of data for a SCSI Data-in PDU from the buffer 996 identified by the Data_Descriptor qualifier. The iSCSI Layer can 997 only invoke the Put_Data Operational Primitive when the connection 998 is in iSER-assisted mode. 1000 The Notify_Enable qualifier is used to indicate to the iSER Layer 1001 whether or not it should generate an eventual local completion 1002 notification to the iSCSI Layer. See section 3.2.2 on 1003 Data_Completion_Notify for details. 1005 3.1.3 Get_Data 1007 Input qualifiers: Connection_Handle, content of an R2T PDU, 1008 Data_Descriptor, Notify_Enable 1010 Return results: Not specified 1012 This is used by the iSCSI Layer at the target to request the inbound 1013 transfer of solicited data requested by an R2T PDU into the buffer 1014 identified by the Data_Descriptor qualifier. 
The iSCSI Layer can 1015 only invoke the Get_Data Operational Primitive when the connection 1016 is in iSER-assisted mode. 1018 Ko et al. Expires May 2007 24 1019 The Notify_Enable qualifier is used to indicate to the iSER Layer 1020 whether or not it should generate the eventual local completion 1021 notification to the iSCSI Layer. See section 3.2.2 on 1022 Data_Completion_Notify for details. 1024 3.1.4 Allocate_Connection_Resources 1026 Input qualifiers: Connection_Handle, Resource_Descriptor 1027 (optional) 1029 Return results: Status 1031 This is used by the iSCSI Layers at the initiator and the target to 1032 request the allocation of all connection resources necessary to 1033 support RCaP for an operational iSCSI/iSER connection. The iSCSI 1034 Layer may optionally specify the implementation-specific resource 1035 requirements for the iSCSI connection using the Resource_Descriptor 1036 qualifier. 1038 A return result of Status=success means the invocation succeeded, 1039 and a return result of Status=failure means that the invocation 1040 failed. If the invocation is for a Connection_Handle for which an 1041 earlier invocation succeeded, the request will be ignored by the 1042 iSER Layer and the result of Status=success will be returned. Only 1043 one Allocate_Connection_Resources Operational Primitive invocation 1044 can be outstanding for a given Connection_Handle at any time. 1046 3.1.5 Deallocate_Connection_Resources 1048 Input qualifiers: Connection_Handle 1050 Return results: Not specified 1052 This is used by the iSCSI Layers at the initiator and the target to 1053 request the deallocation of all connection resources that were 1054 allocated earlier as a result of a successful invocation of the 1055 Allocate_Connection_Resources Operational Primitive. 1057 3.1.6 Enable_Datamover 1059 Input qualifiers: Connection_Handle, 1060 Transport_Connection_Descriptor, Final Login_Response_PDU 1061 (optional) 1063 Return results: Not specified 1065 Ko et al. 
Expires May 2007 25 1066 This is used by the iSCSI Layers at the initiator and the target to 1067 request that a specified iSCSI connection be transitioned to iSER- 1068 assisted mode. The Transport_Connection_Descriptor qualifier is 1069 used to identify the specific connection associated with the 1070 Connection_Handle. The iSCSI layer can only invoke the 1071 Enable_Datamover Operational Primitive when there was a 1072 corresponding prior resource allocation. 1074 The Final_Login_Response_PDU input qualifier is applicable only for 1075 a target, and contains the final Login Response PDU that concludes 1076 the iSCSI Login Phase. If the underlying transport is TCP, the 1077 final Login Response PDU must be sent as a byte stream as expected 1078 by the iSCSI Layer at the initiator. When this qualifier is used, 1079 the iSER Layer at the target MUST transmit this final Login Response 1080 PDU before transitioning to iSER-assisted mode. 1082 3.1.7 Connection_Terminate 1084 Input qualifiers: Connection_Handle 1086 Return results: Not specified 1088 This is used by the iSCSI Layers at the initiator and the target to 1089 request that a specified iSCSI/iSER connection be terminated and all 1090 associated connection and task resources be freed. When this 1091 Operational Primitive invocation returns to the iSCSI layer, the 1092 iSCSI layer may assume full ownership of all iSCSI-level resources, 1093 e.g. I/O Buffers, associated with the connection. 1095 3.1.8 Notice_Key_Values 1097 Input qualifiers: Connection_Handle, number of keys, list of 1098 Key-Value pairs 1100 Return results: Not specified 1102 This is used by the iSCSI Layers at the initiator and the target to 1103 request the iSER Layer to take note of the specified Key-Value pairs 1104 which were negotiated by the iSCSI peers for the connection. 1106 3.1.9 Deallocate_Task_Resources 1108 Input qualifiers: Connection_Handle, ITT 1110 Return results: Not specified 1112 Ko et al. 
Expires May 2007 26 1113 This is used by the iSCSI Layers at the initiator and the target to 1114 request the deallocation of all RCaP-specific resources allocated by 1115 the iSER Layer for the task identified by the ITT qualifier. The 1116 iSER Layer may require a certain number of RCaP-specific resources 1117 associated with the ITT for each new iSCSI task. In the normal 1118 course of execution, these task-level resources in the iSER Layer 1119 are assumed to be transparently allocated on each task initiation 1120 and deallocated on the conclusion of each task as appropriate. In 1121 exception scenarios where the task does not conclude with a SCSI 1122 Response PDU, the iSER Layer needs to be notified of the individual 1123 task terminations to aid its task-level resource management. This 1124 Operational Primitive is used for this purpose, and is not needed 1125 when a SCSI Response PDU normally concludes a task. Note that RCaP- 1126 specific task resources are deallocated by the iSER Layer when a 1127 SCSI Response PDU normally concludes a task, even if the SCSI Status 1128 was not success. 1130 3.2 Operational Primitives used by iSER 1132 The iSER layer MUST use the following Operational Primitives offered 1133 by the iSCSI protocol layer when the connection is in iSER-assisted 1134 mode. 1136 3.2.1 Control_Notify 1138 Input qualifiers: Connection_Handle, an iSCSI control-type PDU 1140 Return results: Not specified 1142 This is used by the iSER Layers at the initiator and the target to 1143 notify the iSCSI Layer of the availability of an inbound iSCSI 1144 control-type PDU. A PDU is described as "available" to the iSCSI 1145 Layer when the iSER Layer notifies the iSCSI Layer of the reception 1146 of that inbound PDU, along with an implementation-specific 1147 indication as to where the received PDU is. 
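The task-level resource handling described for Deallocate_Task_Resources can be pictured with a small sketch. It is only one possible bookkeeping scheme under stated assumptions: the class `TaskResourceTable` and its method names are hypothetical, and a plain placeholder object stands in for whatever RCaP-specific resources an implementation actually holds.

```python
class TaskResourceTable:
    """Tracks RCaP-specific resources per task, keyed by ITT (illustrative)."""

    def __init__(self):
        self._resources = {}  # ITT -> placeholder for RCaP-specific resources

    def on_task_start(self, itt):
        # Resources are allocated transparently when a task is initiated.
        self._resources[itt] = object()

    def on_scsi_response(self, itt, scsi_status_ok=True):
        # A SCSI Response PDU concludes the task normally, so the resources
        # are released whether or not the SCSI Status was success.
        self._resources.pop(itt, None)

    def deallocate_task_resources(self, itt):
        # Exception path: the iSCSI Layer reports a task that ended without
        # a SCSI Response PDU.
        self._resources.pop(itt, None)

    def outstanding(self):
        # ITTs that still hold resources, in ascending order.
        return sorted(self._resources)
```

In this sketch the explicit call is needed only for tasks that never reach `on_scsi_response`; calling it for a task that has already concluded is harmless.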
1149 3.2.2 Data_Completion_Notify 1151 Input qualifiers: Connection_Handle, ITT, SN 1153 Return results: Not specified 1155 This is used by the iSER Layer to notify the iSCSI Layer of the 1156 completion of outbound data transfer that was requested by the iSCSI 1157 Layer only if the invocation of the Put_Data Operational Primitive 1158 (see section 3.1.2) was qualified with Notify_Enable set. SN refers 1159 to the DataSN associated with the SCSI Data-In PDU. 1161 Ko et al. Expires May 2007 27 1162 This is used by the iSER Layer to notify the iSCSI Layer of the 1163 completion of inbound data transfer that was requested by the iSCSI 1164 Layer only if the invocation of the Get_Data Operational Primitive 1165 (see section 3.1.3) was qualified with Notify_Enable set. SN refers 1166 to the R2TSN associated with the R2T PDU. 1168 3.2.3 Data_ACK_Notify 1170 Input qualifier: Connection_Handle, ITT, DataSN 1172 Return results: Not specified 1174 This is used by the iSER Layer at the target to notify the iSCSI 1175 Layer of the arrival of the data acknowledgement (as defined in 1176 [RFC3720]) requested earlier by the iSCSI Layer for the outbound 1177 data transfer via an invocation of the Put_Data Operational 1178 Primitive where the A-bit in the SCSI Data-in PDU is set to 1. See 1179 section 7.3.5. DataSN refers to the expected DataSN of the next 1180 SCSI Data-in PDU which immediately follows the SCSI Data-in PDU with 1181 the A-bit set to which this notification corresponds, with semantics 1182 as defined in [RFC3720]. 1184 3.2.4 Connection_Terminate_Notify 1186 Input qualifiers: Connection_Handle 1188 Return results: Not specified 1190 This is used by the iSER Layers at the initiator and the target to 1191 notify the iSCSI Layer of the unsolicited termination or failure of 1192 an iSCSI/iSER connection. 
The iSER Layer MUST deallocate the 1193 connection and task resources associated with the terminated 1194 connection before the invocation of this Operational Primitive. 1195 Note that the Connection_Terminate_Notify Operational Primitive is 1196 not invoked when the termination of the connection was earlier 1197 requested by the local iSCSI Layer. 1199 3.3 iSCSI Protocol Usage Requirements 1201 To operate in an iSER-assisted mode, the iSCSI Layers at both the 1202 initiator and the target MUST negotiate the RDMAExtensions key (see 1203 section 6.3) to "Yes" on the leading connection. If the 1204 RDMAExtensions key is not negotiated to "Yes", then iSER-assisted 1205 mode MUST NOT be used. If the RDMAExtensions key is negotiated to 1206 "Yes" but the invocation of the Allocate_Connection_Resources 1207 Operational Primitive to the iSER layer fails, the iSCSI layer MUST 1210 fail the iSCSI Login process or terminate the connection as 1211 appropriate. See section 10.1.3.1 for details. 1213 If the RDMAExtensions key is negotiated to "Yes", the iSCSI Layer 1214 MUST satisfy the following protocol usage requirements from the iSER 1215 protocol: 1217 1. The iSCSI Layer at the initiator MUST set ExpDataSN to 0 in Task 1218 Management Function Requests for Task Allegiance Reassignment 1219 for read/bidirectional commands, so as to cause the target to 1220 send all unacknowledged read data. 1222 2. The iSCSI Layer at the target MUST always return the SCSI status 1223 in a separate SCSI Response PDU for read commands, i.e., there 1224 MUST NOT be a "phase collapse" in concluding a SCSI Read 1225 Command. 1227 3. The iSCSI Layers at both the initiator and the target MUST 1228 support the keys as defined in section 6 on Login/Text 1229 Operational Keys. If used as specified, these keys MUST NOT be 1230 answered with NotUnderstood and the semantics as defined MUST be 1231 followed for each iSER-assisted connection. 1233 4.
The iSCSI Layer at the initiator MUST NOT issue SNACKs for PDUs. 1235 Ko et al. Expires May 2007 29 1236 4 Lower Layer Interface Requirements 1238 4.1 Interactions with the RCaP Layer 1240 The iSER protocol layer is layered on top of an RCaP layer (see 1241 Figure 1) and the following are the key features that are assumed to 1242 be supported by any RCaP layer: 1244 * The RCaP layer supports all basic RDMA operations, including RDMA 1245 Write Operation, RDMA Read Operation, Send Operation, Send with 1246 Invalidate Operation, Send with Solicited Event Operation, Send 1247 with Solicited Event & Invalidate Operation, and Terminate 1248 Operation. 1250 * The RCaP layer provides reliable, in-order message delivery and 1251 direct data placement. 1253 * When the iSER Layer initiates an RDMA Read Operation following an 1254 RDMA Write Operation on one RCaP Stream, the RDMA Read Response 1255 Message processing on the remote node will be started only after 1256 the preceding RDMA Write Message payload is placed in the memory 1257 of the remote node. 1259 * The RCaP layer encapsulates a single iSER Message into a single 1260 RCaP Message on the Data Source side. The RCaP layer 1261 decapsulates the iSER Message before delivering it to the iSER 1262 Layer on the Data Sink side. 1264 * When the iSER Layer provides the STag to be remotely invalidated 1265 to the RCaP layer for a SendInvSE Message, the RCaP layer uses 1266 this STag as the STag to be invalidated in the SendInvSE Message. 1268 * The RCaP layer uses the STag and Tagged Offset provided by the 1269 iSER Layer for the RDMA Write and RDMA Read Request Messages. 1271 * When the RCaP layer delivers the content of an RDMA Send Message 1272 Type to the iSER Layer, the RCaP layer provides the length of the 1273 RDMA Send message. This ensures that the iSER Layer does not 1274 have to carry a length field in the iSER header. 
1276 * When the RCaP layer delivers the SendSE or SendInvSE Message to 1277 the iSER Layer, it notifies the iSER Layer with the mechanism 1278 provided on that interface. 1280 * When the RCaP layer delivers a SendInvSE Message to the iSER 1281 Layer, it passes the value of the STag that was invalidated. 1283 Ko et al. Expires May 2007 30 1284 * The RCaP layer propagates all status and error indications to the 1285 iSER Layer. 1287 * For a transport layer that operates in byte stream mode such as 1288 TCP, the RCaP implementation supports the enabling of the RDMA 1289 mode after Connection establishment and the exchange of Login 1290 parameters in byte stream mode. For a transport layer that 1291 provides message delivery capability such as [IB], the RCaP 1292 implementation supports the use of the messaging capability by 1293 the iSCSI Layer directly for the Login phase after connection 1294 establishment before enabling iSER-assisted mode. 1296 * Whenever the iSER Layer terminates the RCaP Stream, the RCaP 1297 layer terminates the associated Connection. 1299 4.2 Interactions with the Transport Layer 1301 The iSER Layer does not directly setup the transport layer 1302 connection (e.g., TCP, or [IB]). During Connection setup, the iSCSI 1303 Layer is responsible for setting up the Connection. If the login is 1304 successful, the iSCSI Layer invokes the Enable_Datamover Operational 1305 Primitive to request the iSER Layer to transition to the iSER- 1306 assisted mode for that iSCSI connection. See section 5.1 on 1307 iSCSI/iSER Connection Setup. After transitioning to iSER-assisted 1308 mode, the RCaP layer and the underlying transport layer are 1309 responsible for maintaining the Connection and reporting to the iSER 1310 Layer any Connection failures. 1312 Ko et al. 
Expires May 2007 31 1313 5 Connection Setup and Termination 1315 5.1 iSCSI/iSER Connection Setup 1317 During connection setup, the iSCSI Layer at the initiator is 1318 responsible for establishing a connection with the target. After 1319 the connection is established, the iSCSI Layers at the initiator and 1320 the target enter the Login Phase using the same rules as outlined in 1321 [RFC3720]. Transition to iSER-assisted mode occurs when the 1322 connection transitions into the iSCSI full feature phase following a 1323 successful login negotiation between the initiator and the target in 1324 which iSER-assisted mode is negotiated and the connection resources 1325 necessary to support RCaP have been allocated at both the initiator 1326 and the target. The same connection MUST be used for both the iSCSI 1327 Login phase and the subsequent iSER-assisted full feature phase. 1329 iSER-assisted mode MUST be enabled only if it is negotiated on the 1330 leading connection during the LoginOperationalNegotiation Stage of 1331 the iSCSI Login Phase. iSER-assisted mode is negotiated using the 1332 RDMAExtensions= key. Both the initiator and the 1333 target MUST exchange the RDMAExtensions key with the value set to 1334 "Yes" to enable iSER-assisted mode. If both the initiator and the 1335 target fail to negotiate the RDMAExtensions key set to "Yes", then 1336 the connection MUST continue with the login semantics as defined in 1337 [RFC3720]. If the RDMAExtensions key is not negotiated to Yes, then 1338 for some RCaP implementation (such as [IB]), the connection may need 1339 to be re-established in TCP capable mode. (For InfiniBand this will 1340 require an [IPoIB] type connection.) 1342 iSER-assisted mode is defined for a Normal session only and the 1343 RDMAExtensions key MUST NOT be negotiated for a Discovery session. 1344 Discovery sessions are always conducted using the transport layer as 1345 described in [RFC3720]. 
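The negotiation rule for the RDMAExtensions key in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the string session types, and the use of None to model a side that never sent the key are hypothetical and do not come from the specification.

```python
from typing import Optional


def iser_assisted_mode(session_type: str,
                       initiator_value: Optional[str],
                       target_value: Optional[str]) -> bool:
    """Decide whether a leading connection enters iSER-assisted mode.

    initiator_value / target_value are the RDMAExtensions values each
    side exchanged ("Yes" or "No"); None models a side that never sent
    the key (hypothetical modelling choice).
    """
    if session_type != "Normal":
        # iSER-assisted mode is defined for a Normal session only; the
        # key is not negotiated at all for a Discovery session.
        return False
    # Both sides must exchange "Yes"; anything else falls back to the
    # login semantics of [RFC3720] (Traditional iSCSI mode).
    return initiator_value == "Yes" and target_value == "Yes"
```

For example, iser_assisted_mode("Normal", "Yes", "No") evaluates to False, so the connection would continue in Traditional iSCSI mode.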
1347 An iSER enabled node is not required to initiate the RDMAExtensions 1348 key exchange if its preference is for the Traditional iSCSI mode. 1349 The RDMAExtensions key, if offered, MUST be sent in the first 1350 available Login Response or Login Request PDU in the 1351 LoginOperationalNegotiation stage. This is due to the fact that the 1352 value of some login parameters might depend on whether iSER-assisted 1353 mode is enabled or not. 1355 iSER-assisted mode is a session-wide attribute. If both the 1356 initiator and the target negotiated RDMAExtensions="Yes" on the 1357 leading connection of a session, then all subsequent connections of 1358 the same session MUST enable iSER-assisted mode without having to 1359 exchange RDMAExtensions key during the iSCSI Login Phase. 1361 Ko et al. Expires May 2007 32 1362 Conversely, if both the initiator and the target failed to negotiate 1363 RDMAExtensions to "Yes" on the leading connection of a session, then 1364 the RDMAExtensions key MUST NOT be negotiated further on any 1365 additional subsequent connection of the session. 1367 When the RDMAExtensions key is negotiated to "Yes", the HeaderDigest 1368 and the DataDigest keys MUST be negotiated to "None" on all 1369 iSCSI/iSER connections participating in that iSCSI session. This is 1370 because, for an iSCSI/iSER connection, RCaP is responsible for 1371 providing error detection that is at least as good as a 32-bit CRC 1372 for all iSER Messages. Furthermore, all SCSI Read data are sent 1373 using RDMA Write Messages instead of the SCSI Data-in PDUs, and all 1374 solicited SCSI write data are sent using RDMA Read Response Messages 1375 instead of the SCSI Data-out PDUs. HeaderDigest and DataDigest 1376 which apply to iSCSI PDUs would not be appropriate for RDMA Read and 1377 RDMA Write operations used with iSER. 
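As a rough sketch of the digest rule above, the following Python function returns the digest settings in effect on any connection of a session, given the RDMAExtensions result from the leading connection. The function name and parameter names are hypothetical; the "None" defaults reflect the [RFC3720] defaults for HeaderDigest and DataDigest.

```python
from typing import Tuple


def effective_digests(rdma_extensions_yes: bool,
                      header_digest: str = "None",
                      data_digest: str = "None") -> Tuple[str, str]:
    """Return the (HeaderDigest, DataDigest) pair in effect.

    rdma_extensions_yes is the session-wide result negotiated on the
    leading connection; the other arguments are what would otherwise
    apply on this connection.
    """
    if rdma_extensions_yes:
        # RCaP provides error detection at least as good as a 32-bit
        # CRC, so both digests are None on every connection.
        return ("None", "None")
    return (header_digest, data_digest)
```

Because iSER-assisted mode is a session-wide attribute, the same rdma_extensions_yes value would be passed for every connection of the session, not renegotiated per connection.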
1379 5.1.1 Initiator Behavior 1381 If the outcome of the iSCSI negotiation is to enable iSER-assisted 1382 mode, then on the initiator side, prior to sending the Login Request 1383 with the T (Transit) bit set to 1 and the NSG (Next Stage) field set 1384 to FullFeaturePhase, the iSCSI Layer MUST request the iSER Layer to 1385 allocate the connection resources necessary to support RCaP by 1386 invoking the Allocate_Connection_Resources Operational Primitive. 1387 The connection resources required are defined by the implementation and 1388 are outside the scope of this specification. The iSCSI Layer may 1389 invoke the Notice_Key_Values Operational Primitive before invoking 1390 the Allocate_Connection_Resources Operational Primitive to request 1391 the iSER Layer to take note of the negotiated values of the iSCSI 1392 keys for the Connection. The specific keys to be passed in as input 1393 qualifiers are implementation dependent. These may include, but are not 1394 limited to, MaxOutstandingR2T and ErrorRecoveryLevel. 1396 To minimize the potential for a denial-of-service attack, the iSCSI 1397 Layer MUST NOT request the iSER Layer to allocate the connection 1398 resources necessary to support RCaP until the iSCSI Layer is 1399 sufficiently far along in the iSCSI Login Phase that it is 1400 reasonably certain that the peer side is not an attacker. In 1401 particular, if the Login Phase includes a SecurityNegotiation stage, 1402 the iSCSI Layer MUST defer the connection resource allocation (i.e., 1403 invoking the Allocate_Connection_Resources Operational Primitive) to 1404 the LoginOperationalNegotiation stage ([RFC3720]) so that the 1405 resource allocation occurs after the authentication phase is 1406 completed. 1409 Among the connection resources allocated at the initiator is the 1410 Inbound RDMA Read Queue Depth (IRD). As described in section 9.5.1, 1411 R2Ts are transformed by the target into RDMA Read operations.
IRD 1412 limits the maximum number of simultaneously incoming outstanding 1413 RDMA Read Requests per an RCaP Stream from the target to the 1414 initiator. The required value of IRD is outside the scope of the 1415 iSER specification. The iSER Layer at the initiator MUST set IRD to 1416 1 or higher if R2Ts are to be used in the connection. However, the 1417 iSER Layer at the initiator MAY set IRD to 0 based on implementation 1418 configuration which indicates that no R2Ts will be used on that 1419 connection. Initially, the iSER-IRD value at the initiator SHOULD 1420 be set to the IRD value at the initiator and MUST NOT be more than 1421 the IRD value. 1423 On the other hand, the Outbound RDMA Read Queue Depth (ORD) MAY be 1424 set to 0 since the iSER Layer at the initiator does not issue RDMA 1425 Read Requests to the target. 1427 Failure to allocate the requested connection resources locally 1428 results in a login failure and its handling is described in section 1429 10.1.3.1. 1431 If the iSER Layer at the initiator is successful in allocating the 1432 connection resources necessary to support RCaP, the following events 1433 MUST occur in the specified sequence: 1435 1. The iSER Layer MUST return a success status to the iSCSI Layer 1436 in response to the Allocate_Connection_Resources Operational 1437 Primitive. 1439 2. After the target returns the Login Response with the T bit set 1440 to 1 and the NSG field set to FullFeaturePhase, and a status 1441 class of 0 (Success), the iSCSI Layer MUST request the iSER 1442 Layer to transition to iSER-assisted mode by invoking the 1443 Enable_Datamover Operational Primitive with the following 1444 qualifiers. (See section 10.1.4.6 for the case when the status 1445 class is not Success.): 1447 a. Connection_Handle that identifies the iSCSI connection. 1449 b. Transport_Connection_Descriptor which identifies the 1450 specific transport connection associated with the 1451 Connection_Handle. 1453 3. 
If necessary, the iSER Layer should enable RCaP and transition 1454 the connection to iSER-assisted mode. When the RCaP is iWARP, 1457 this step MUST be done. Some RCaPs may not need it, 1458 depending on the RCaP Stream start-up state. 1460 4. The iSER Layer MUST send the iSER Hello Message as the first 1461 iSER Message. See Section 5.1.3 on iSER Hello Exchange. 1463 5.1.2 Target Behavior 1465 If the outcome of the iSCSI negotiation is to enable iSER-assisted 1466 mode, then on the target side, prior to sending the Login Response 1467 with the T (Transit) bit set to 1 and the NSG (Next Stage) field set 1468 to FullFeaturePhase, the iSCSI Layer MUST request the iSER Layer to 1469 allocate the resources necessary to support RCaP by invoking the 1470 Allocate_Connection_Resources Operational Primitive. The connection 1471 resources required are defined by the implementation and are outside the 1472 scope of this specification. Optionally, the iSCSI Layer may invoke 1473 the Notice_Key_Values Operational Primitive before invoking the 1474 Allocate_Connection_Resources Operational Primitive to request the 1475 iSER Layer to take note of the negotiated values of the iSCSI keys 1476 for the Connection. The specific keys to be passed in as input 1477 qualifiers are implementation dependent. These may include, but are not 1478 limited to, MaxOutstandingR2T and ErrorRecoveryLevel. 1480 To minimize the potential for a denial-of-service attack, the iSCSI 1481 Layer MUST NOT request the iSER Layer to allocate the connection 1482 resources necessary to support RCaP until the iSCSI Layer is 1483 sufficiently far along in the iSCSI Login Phase that it is 1484 reasonably certain that the peer side is not an attacker. In 1485 particular, if the Login Phase includes a SecurityNegotiation stage, 1486 the iSCSI Layer MUST defer the connection resource allocation (i.e.,
1487 invoking the Allocate_Connection_Resources Operational Primitive) to 1488 the LoginOperationalNegotiation stage ([RFC3720]) so that the 1489 resource allocation occurs after the authentication phase is 1490 completed. 1492 Among the connection resources allocated at the target is the 1493 Outbound RDMA Read Queue Depth (ORD). As described in section 1494 9.5.1, R2Ts are transformed by the target into RDMA Read operations. 1495 The ORD limits the maximum number of simultaneously outstanding RDMA 1496 Read Requests per RCaP Stream from the target to the initiator. 1497 Initially, the iSER-ORD value at the target SHOULD be set to the ORD 1498 value at the target. 1500 On the other hand, the IRD at the target MAY be set to 0 since the 1501 iSER Layer at the target does not expect RDMA Read Requests to be 1502 issued by the initiator. 1504 Ko et al. Expires May 2007 35 1505 Failure to allocate the requested connection resources locally 1506 results in a login failure and its handling is described in section 1507 10.1.3.1. 1509 If the iSER Layer at the target is successful in allocating the 1510 connection resources necessary to support RCaP, the following events 1511 MUST occur in the specified sequence: 1513 1. The iSER Layer MUST return a success status to the iSCSI Layer 1514 in response to the Allocate_Connection_Resources Operational 1515 Primitive. 1517 2. The iSCSI Layer MUST request the iSER Layer to transition to 1518 iSER-assisted mode by invoking the Enable_Datamover Operational 1519 Primitive with the following qualifiers: 1521 a. Connection_Handle that identifies the iSCSI connection. 1523 b. Transport_Connection_Descriptor which identifies the 1524 specific transport connection associated with the 1525 Connection_Handle. 1527 c. The final transport layer (e.g. TCP) message containing the 1528 Login Response with the T bit set to 1 and the NSG field set 1529 to FullFeaturePhase 1531 3. 
The iSER Layer MUST send the final Login Response PDU in the 1532 native transport mode to conclude the iSCSI Login Phase. If the 1533 underlying transport is TCP, then the iSER Layer MUST send the 1534 final Login Response PDU in byte stream mode. 1536 4. After sending the final Login Response PDU, the iSER Layer 1537 should enable RCaP if necessary and transition the connection to 1538 iSER-assisted mode. When the RCaP is iWARP, then this step MUST 1539 be done. Not all RCaPs may need it depending on the RCaP Stream 1540 start-up state. 1542 5. After receiving the iSER Hello Message from the initiator, the 1543 iSER Layer MUST respond with the iSER HelloReply Message to be 1544 sent as the first iSER Message. See section 5.1.3 on iSER Hello 1545 Exchange for more details. 1547 Note: In the above sequence, the operations as described in bullets 1548 3 and 4 MUST be performed atomically for iWARP connections. Failure 1549 to do this may result in race conditions. 1551 Ko et al. Expires May 2007 36 1552 5.1.3 iSER Hello Exchange 1554 After the connection transitions into the iSER-assisted mode, the 1555 first iSER Message sent by the iSER Layer at the initiator to the 1556 target MUST be the iSER Hello Message. The iSER Hello Message is 1557 used by the iSER Layer at the initiator to declare iSER parameters 1558 to the target. See section 9.3 on iSER Header Format for iSER Hello 1559 Message. 1561 In response to the iSER Hello Message, the iSER Layer at the target 1562 MUST return the iSER HelloReply Message as the first iSER Message 1563 sent by the target. The iSER HelloReply Message is used by the iSER 1564 Layer at the target to declare iSER parameters to the initiator. 1565 See section 9.4 on iSER Header Format for iSER HelloReply Message. 1567 In the iSER Hello Message, the iSER Layer at the initiator declares 1568 the iSER-IRD value to the target. 
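The read queue depth handling performed during the Hello exchange in this section amounts to a small computation on each side. The following Python sketch models it under stated assumptions: the function names and the exception class are hypothetical, and only the depth values are modelled (protocol version checks and message formats are omitted).

```python
class IserNegotiationError(Exception):
    """Hypothetical marker for an iSER level negotiation failure."""


def target_process_hello(iser_ird_declared: int, target_iser_ord: int) -> int:
    """Target side: compute the iSER-ORD to declare in the HelloReply.

    iser_ird_declared is the iSER-IRD carried in the Hello Message;
    target_iser_ord is the target's current iSER-ORD value.
    """
    new_iser_ord = min(target_iser_ord, iser_ird_declared)
    if iser_ird_declared > 0 and new_iser_ord == 0:
        # The initiator expects RDMA Reads but the target can issue none.
        raise IserNegotiationError("iSER-IRD > 0 but iSER-ORD is 0")
    return new_iser_ord


def initiator_process_hello_reply(ird: int, iser_ird_declared: int,
                                  iser_ord_declared: int) -> int:
    """Initiator side: optionally lower IRD to free unused resources.

    Lowering is permitted (MAY), not required; this sketch always
    takes the option when it is available.
    """
    if iser_ord_declared < iser_ird_declared:
        return iser_ord_declared
    return ird
```

For instance, an initiator declaring iSER-IRD 8 to a target whose iSER-ORD is 4 would receive 4 in the HelloReply and could then lower its own IRD to 4.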
1570 Upon receiving the iSER Hello Message, the iSER Layer at the target 1571 MUST set the iSER-ORD value to the minimum of the iSER-ORD value at 1572 the target and the iSER-IRD value declared by the initiator. The 1573 iSER Layer at the target MAY adjust (lower) its ORD value to match 1574 the iSER-ORD value if the iSER-ORD value is smaller than the ORD 1575 value at the target in order to free up the unused resources. 1577 In the iSER HelloReply Message, the iSER Layer at the target 1578 declares the iSER-ORD value to the initiator. 1580 Upon receiving the iSER HelloReply Message, the iSER Layer at the 1581 initiator MAY adjust (lower) its IRD value to match the iSER-ORD 1582 value in order to free up the unused resources, if the iSER-ORD 1583 value declared by the target is smaller than the iSER-IRD value 1584 declared by the initiator. 1586 It is an iSER level negotiation failure if the iSER parameters 1587 declared in the iSER Hello Message by the initiator are unacceptable 1588 to the target. This includes the following: 1590 * The initiator-declared iSER-IRD value is greater than 0 and the 1591 target-declared iSER-ORD value is 0. 1593 * The initiator-supported and the target-supported iSER protocol 1594 versions do not overlap. 1596 See section 10.1.3.2 on the handling of the error situation. 1598 Ko et al. Expires May 2007 37 1599 5.2 iSCSI/iSER Connection Termination 1601 5.2.1 Normal Connection Termination at the Initiator 1603 The iSCSI Layer at the initiator terminates an iSCSI/iSER connection 1604 normally by invoking the Send_Control Operational Primitive 1605 qualified with the Logout Request PDU. The iSER Layer at the 1606 initiator MUST use a SendSE Message to send the Logout Request PDU 1607 to the target. 
After the iSER Layer at the initiator receives the 1608 SendSE Message containing the Logout Response PDU from the target, 1609 it MUST notify the iSCSI Layer by invoking the Control_Notify 1610 Operational Primitive qualified with the Logout Response PDU. 1612 After the iSCSI logout process is complete, the iSCSI layer at the 1613 target is responsible for closing the iSCSI/iSER connection as 1614 described in Section 5.2.2. After the RCaP layer at the initiator 1615 reports that the Connection has been closed, the iSER Layer at the 1616 initiator MUST deallocate all connection and task resources (if any) 1617 associated with the connection, invalidate the Local Mapping(s) (if 1618 any) that associate the ITT(s) used on that connection to the local 1619 STag(s) before notifying the iSCSI Layer by invoking the 1620 Connection_Terminate_Notify Operational Primitive. 1622 5.2.2 Normal Connection Termination at the Target 1624 Upon receiving the SendSE Message containing the Logout Request PDU, 1625 the iSER Layer at the target MUST notify the iSCSI Layer at the 1626 target by invoking the Control_Notify Operational Primitive 1627 qualified with the Logout Request PDU. The iSCSI Layer completes 1628 the logout process by invoking the Send_Control Operational 1629 Primitive qualified with the Logout Response PDU. The iSER Layer at 1630 the target MUST use a SendSE Message to send the Logout Response PDU 1631 to the initiator. After the iSCSI logout process is complete, the 1632 iSCSI Layer at the target MUST request the iSER Layer at the target 1633 to terminate the RCaP Stream by invoking the Connection_Terminate 1634 Operational Primitive. 1636 As part of the termination process, the RCaP layer MUST close the 1637 Connection. 
When the RCaP layer notifies the iSER Layer after the 1638 RCaP Stream and the associated Connection are terminated, the iSER 1639 Layer MUST deallocate all connection and task resources (if any) 1640 associated with the connection, and invalidate the Local and Remote 1641 Mapping(s) (if any) that associate the ITT(s) used on that 1642 connection to the local STag(s) and the Advertised STag(s) 1643 respectively. 1645 Ko et al. Expires May 2007 38 1646 5.2.3 Termination without Logout Request/Response PDUs 1648 5.2.3.1 Connection Termination Initiated by the iSCSI Layer 1650 The Connection_Terminate Operational Primitive MAY be invoked by the 1651 iSCSI Layer to request the iSER Layer to terminate the RCaP Stream 1652 without having previously exchanged the Logout Request and Logout 1653 Response PDUs between the two iSCSI/iSER nodes. As part of the 1654 termination process, the RCaP layer will close the Connection. When 1655 the RCaP layer notifies the iSER Layer after the RCaP Stream and the 1656 associated Connection are terminated, the iSER Layer MUST perform 1657 the following actions. 1659 If the Connection_Terminate Operational Primitive is invoked by the 1660 iSCSI Layer at the target, then the iSER Layer at the target MUST 1661 deallocate all connection and task resources (if any) associated 1662 with the connection, and invalidate the Local and Remote Mappings 1663 (if any) that associate the ITT(s) used on the connection to the 1664 local STag(s) and the Advertised STag(s) respectively. 1666 If the Connection_Terminate Operational Primitive is invoked by the 1667 iSCSI Layer at the initiator, then the iSER Layer at the initiator 1668 MUST deallocate all connection and task resources (if any) 1669 associated with the connection, and invalidate the Local Mapping(s) 1670 (if any) that associate the ITT(s) used on the connection to the 1671 local STag(s). 
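The cleanup rules above differ between the two roles only in which mappings exist to be invalidated. A minimal Python sketch, assuming a hypothetical per-connection state object (the class, field, and method names are illustrative and not defined by the specification):

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IserConnectionState:
    role: str  # "initiator" or "target"
    # ITT -> local STag
    local_mappings: Dict[int, int] = field(default_factory=dict)
    # ITT -> Advertised STag (populated at the target only)
    remote_mappings: Dict[int, int] = field(default_factory=dict)
    resources_allocated: bool = True

    def on_rcap_closed(self) -> None:
        """Run after the RCaP layer reports the Stream and Connection closed."""
        # Deallocate all connection and task resources.
        self.resources_allocated = False
        # Both roles invalidate their Local Mappings.
        self.local_mappings.clear()
        if self.role == "target":
            # Only the target holds Remote Mappings to invalidate.
            self.remote_mappings.clear()
```

The same routine would serve the normal logout path and the paths without Logout Request/Response PDUs, since the required cleanup is identical in each case.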
1673 5.2.3.2 Connection Termination Notification to the iSCSI Layer 1675 If the iSCSI/iSER connection is terminated without the invocation of 1676 Connection_Terminate from the iSCSI Layer, the iSER Layer MUST 1677 notify the iSCSI Layer that the iSCSI/iSER connection has been 1678 terminated by invoking the Connection_Terminate_Notify Operational 1679 Primitive. 1681 Prior to invoking Connection_Terminate_Notify, the iSER Layer at the 1682 target MUST deallocate all connection and task resources (if any) 1683 associated with the connection, and invalidate the Local and Remote 1684 Mappings (if any) that associate the ITT(s) used on the connection 1685 to the local STag(s) and the Advertised STag(s) respectively. 1687 Prior to invoking Connection_Terminate_Notify, the iSER Layer at the 1688 initiator MUST deallocate all connection and task resources (if any) 1689 associated with the connection, and invalidate the Local Mappings 1690 (if any) that associate the ITT(s) used on the connection to the 1691 local STag(s). 1693 Ko et al. Expires May 2007 39 1694 If the remote iSCSI/iSER node initiated the closing of the 1695 Connection (e.g., by sending a TCP FIN or TCP RST), the iSER Layer 1696 MUST notify the iSCSI Layer after the RCaP layer reports that the 1697 Connection is closed by invoking the Connection_Terminate_Notify 1698 Operational Primitive. 1700 Another example of a Connection termination without a preceding 1701 logout is when the iSCSI Layer at the initiator does an implicit 1702 logout (connection reinstatement). 1704 Ko et al. Expires May 2007 40 1705 6 Login/Text Operational Keys 1707 Certain iSCSI login/text operational keys have restricted usage in 1708 iSER, and additional keys are used to support the iSER protocol 1709 functionality. All other keys defined in [RFC3720] and not 1710 discussed in this section may be used on iSCSI/iSER connections with 1711 the same semantics. 
1713 6.1 HeaderDigest and DataDigest 1715 Irrelevant when: RDMAExtensions=Yes 1717 A negotiation resulting in RDMAExtensions=Yes for a session implies 1718 HeaderDigest=None and DataDigest=None for all connections in that 1719 session and overrides both the default and any explicit setting. 1721 6.2 MaxRecvDataSegmentLength 1723 For an iSCSI connection belonging to a session in which 1724 RDMAExtensions=Yes was negotiated on the leading connection of the 1725 session, MaxRecvDataSegmentLength need not be declared in the Login 1726 Phase. Instead, the InitiatorRecvDataSegmentLength (as described in 1727 section 6.5) and TargetRecvDataSegmentLength (as described in 1728 section 6.4) keys are negotiated. The values of the local and 1729 remote MaxRecvDataSegmentLength are derived from the 1730 InitiatorRecvDataSegmentLength and TargetRecvDataSegmentLength keys 1731 even if MaxRecvDataSegmentLength was declared during the Login 1732 Phase. 1734 In the full feature phase, the initiator MUST consider the value of 1735 its local MaxRecvDataSegmentLength (that it would have declared to 1736 the target) as having the value of InitiatorRecvDataSegmentLength, 1737 and the value of the remote MaxRecvDataSegmentLength (that would 1738 have been declared by the target) as having the value of 1739 TargetRecvDataSegmentLength. Similarly, the target MUST consider 1740 the value of its local MaxRecvDataSegmentLength (that it would have 1741 declared to the initiator) as having the value of 1742 TargetRecvDataSegmentLength, and the value of the remote 1743 MaxRecvDataSegmentLength (that would have been declared by the 1744 initiator) as having the value of InitiatorRecvDataSegmentLength. 1746 The MaxRecvDataSegmentLength key is applicable only to iSCSI 1747 control-type PDUs. 1749 6.3 RDMAExtensions 1751 Use: LO (leading only) 1753 Ko et al.
Expires May 2007 41 1754 Senders: Initiator and Target 1756 Scope: SW (session-wide) 1758 RDMAExtensions= 1760 Irrelevant when: SessionType=Discovery 1762 Default is No 1764 Result function is AND 1766 This key is used by the initiator and the target to negotiate the 1767 support for iSER-assisted mode. To enable the use of iSER-assisted 1768 mode, both the initiator and the target MUST exchange 1769 RDMAExtensions=Yes. iSER-assisted mode MUST NOT be used if either 1770 the initiator or the target offers RDMAExtensions=No. 1772 An iSER-enabled node is not required to initiate the RDMAExtensions 1773 key exchange if it prefers to operate in the Traditional iSCSI mode. 1774 However, if the RDMAExtensions key is to be negotiated, an initiator 1775 MUST offer the key in the first Login Request PDU in the 1776 LoginOperationalNegotiation stage of the leading connection, and a 1777 target MUST offer the key in the first Login Response PDU with which 1778 it is allowed to do so (i.e., the first Login Response PDU issued 1779 after the first Login Request PDU with the C bit set to 0) in the 1780 LoginOperationalNegotiation stage of the leading connection. In 1781 response to the offered key=value pair of RDMAExtensions=yes, an 1782 initiator MUST respond in the next Login Request PDU with which it 1783 is allowed to do so, and a target MUST respond in the next Login 1784 Response PDU with which it is allowed to do so. 1786 Negotiating the RDMAExtensions key first enables a node to negotiate 1787 the optimal value for other keys. Certain iSCSI keys such as 1788 MaxBurstLength, MaxOutstandingR2T, ErrorRecoveryLevel, InitialR2T, 1789 ImmediateData, etc., may be negotiated differently depending on 1790 whether connection is in Traditional iSCSI mode or iSER-assisted 1791 mode. 1793 6.4 TargetRecvDataSegmentLength 1795 Use: IO (Initialize only) 1797 Senders: Initiator and Target 1799 Scope: CO (connection-only) 1801 Ko et al. 
Expires May 2007 42 1802 Irrelevant when: RDMAExtensions=No 1804 TargetRecvDataSegmentLength= 1806 Default is 8192 bytes 1808 Result function is minimum 1810 This key is relevant only for an iSCSI connection of an iSCSI 1811 session in which RDMAExtensions=Yes was negotiated on the leading 1812 connection of the session. It is used by the initiator and the 1813 target to negotiate the maximum size of the data segment that an 1814 initiator may send to the target in an iSCSI control-type PDU in the 1815 full feature phase. For SCSI Command PDUs and SCSI Data-out PDUs 1816 containing non-immediate unsolicited data to be sent by the 1817 initiator, the initiator MUST send all non-Final PDUs with a data 1818 segment size of exactly TargetRecvDataSegmentLength whenever the 1819 PDUs constitute a data sequence whose size is larger than 1820 TargetRecvDataSegmentLength. 1822 6.5 InitiatorRecvDataSegmentLength 1824 Use: IO (Initialize only) 1826 Senders: Initiator and Target 1828 Scope: CO (connection-only) 1830 Irrelevant when: RDMAExtensions=No 1832 InitiatorRecvDataSegmentLength= 1834 Default is 8192 bytes 1836 Result function is minimum 1838 This key is relevant only for an iSCSI connection of an iSCSI 1839 session in which RDMAExtensions=Yes was negotiated on the leading 1840 connection of the session. It is used by the initiator and the 1841 target to negotiate the maximum size of the data segment that a 1842 target may send to the initiator in an iSCSI control-type PDU in the 1843 full feature phase. 1845 6.6 OFMarker and IFMarker 1847 Irrelevant when: RDMAExtensions=Yes 1850 A negotiation resulting in RDMAExtensions=Yes for a session implies 1851 OFMarker=No and IFMarker=No for all connections in that session and 1852 overrides both the default and any explicit setting.
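The relationship between the keys in sections 6.2, 6.4, and 6.5 can be illustrated with a short Python sketch. The function names and the role strings are hypothetical; the 8192-byte default and the minimum result function are taken from the key definitions above.

```python
from typing import Tuple

DEFAULT_RECV_DSL = 8192  # default for both *RecvDataSegmentLength keys


def negotiate_minimum(offered: int, responded: int) -> int:
    """Result function "minimum": the smaller of the two values wins."""
    return min(offered, responded)


def derived_max_recv_lengths(role: str,
                             initiator_recv_dsl: int = DEFAULT_RECV_DSL,
                             target_recv_dsl: int = DEFAULT_RECV_DSL
                             ) -> Tuple[int, int]:
    """Return (local, remote) MaxRecvDataSegmentLength for a node.

    The values are derived from the negotiated
    InitiatorRecvDataSegmentLength and TargetRecvDataSegmentLength,
    regardless of any MaxRecvDataSegmentLength declared at login.
    """
    if role == "initiator":
        return (initiator_recv_dsl, target_recv_dsl)
    if role == "target":
        return (target_recv_dsl, initiator_recv_dsl)
    raise ValueError("role must be 'initiator' or 'target'")
```

If the initiator offers TargetRecvDataSegmentLength=65536 and the target responds with 16384, negotiate_minimum yields 16384, which the initiator then treats as the remote MaxRecvDataSegmentLength.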
1854 6.7 MaxOutstandingUnexpectedPDUs 1856 Use: LO (leading only), Declarative 1858 Senders: Initiator and Target 1860 Scope: SW (session-wide) 1862 Irrelevant when: RDMAExtensions=No 1864 MaxOutstandingUnexpectedPDUs= 1867 Default is 0 1869 This key is used by the initiator and the target to declare the 1870 maximum number of outstanding "unexpected" iSCSI control-type PDUs 1871 that it can receive in the full feature phase. It is intended to 1872 allow the receiving side to determine the amount of buffer resources 1873 needed beyond the normal flow control mechanism available in iSCSI. 1874 An initiator or target should select a value such that it would not 1875 impose an unnecessary constraint on the iSCSI Layer under normal 1876 circumstances. The value of 0 is defined to indicate that the 1877 declarer has no limit on the maximum number of outstanding 1878 "unexpected" iSCSI control-type PDUs that it can receive. See 1879 sections 8.1.1 and 8.1.2 for the usage of this key. Note that iSER 1880 Hello and HelloReply Messages are not iSCSI control-type PDUs and 1881 are not affected by this key. 1883 Ko et al. Expires May 2007 44 1884 7 iSCSI PDU Considerations 1886 When a connection is in the iSER-assisted mode, two types of message 1887 transfers are allowed between the iSCSI Layer at the initiator and 1888 the iSCSI Layer at the target. These are known as the iSCSI data- 1889 type PDUs and the iSCSI control-type PDUs and these terms are 1890 described in the following sections. 1892 7.1 iSCSI Data-Type PDU 1894 An iSCSI data-type PDU is defined as an iSCSI PDU that causes data 1895 transfer, transparent to the remote iSCSI layer, to take place 1896 between the peer iSCSI nodes in the full feature phase of an 1897 iSCSI/iSER connection. 
An iSCSI data-type PDU, when requested for 1898 transmission by the iSCSI Layer in the sending node, results in the 1899 data being transferred without the participation of the iSCSI Layers 1900 at the sending and the receiving nodes. This is due to the fact 1901 that the PDU itself is not delivered as-is to the iSCSI Layer in the 1902 receiving node. Instead, the data transfer operations are 1903 transformed into the appropriate RDMA operations which are handled 1904 by the RDMA-Capable Controller. The set of iSCSI data-type PDUs 1905 consists of SCSI Data-in PDUs and R2T PDUs. 1907 If the invocation of the Operational Primitive by the iSCSI Layer to 1908 request the iSER Layer to process an iSCSI data-type PDU is 1909 qualified with Notify_Enable set, then upon completing the RDMA 1910 operation, the iSER Layer at the target MUST notify the iSCSI Layer 1911 at the target by invoking the Data_Completion_Notify Operational 1912 Primitive qualified with ITT and SN. There is no data completion 1913 notification at the initiator since the RDMA operations are 1914 completely handled by the RDMA-Capable Controller at the initiator 1915 and the iSER Layer at the initiator is not involved with the data 1916 transfer associated with iSCSI data-type PDUs. 1918 If the invocation of the Operational Primitive by the iSCSI Layer to 1919 request the iSER Layer to process an iSCSI data-type PDU is 1920 qualified with Notify_Enable cleared, then upon completing the RDMA 1921 operation, the iSER Layer at the target MUST NOT notify the iSCSI 1922 Layer at the target and MUST NOT invoke the Data_Completion_Notify 1923 Operational Primitive. 1925 If an operation associated with an iSCSI data-type PDU fails for any 1926 reason, the contents of the Data Sink buffers associated with the 1927 operation are considered indeterminate. 1929 Ko et al. 
Expires May 2007 45 1930 7.2 iSCSI Control-Type PDU 1932 Any iSCSI PDU that is not an iSCSI data-type PDU and also not a SCSI 1933 Data-out PDU carrying solicited data is defined as an iSCSI control- 1934 type PDU. The iSCSI Layer invokes the Send_Control Operational 1935 Primitive to request the iSER Layer to process an iSCSI control-type 1936 PDU. iSCSI control-type PDUs are transferred using Send Message 1937 Types of RCaP. Specifically, it is to be noted that SCSI Data-Out 1938 PDUs carrying unsolicited data are defined as iSCSI control-type 1939 PDUs. See section 7.3.4 on the treatment of SCSI Data-out PDUs. 1941 When the iSER Layer receives an iSCSI control-type PDU, it MUST 1942 notify the iSCSI Layer by invoking the Control_Notify Operational 1943 Primitive qualified with the iSCSI control-type PDU. 1945 7.3 iSCSI PDUs 1947 This section describes the handling of each of the iSCSI PDU types 1948 by the iSER Layer. The iSCSI Layer requests the iSER Layer to 1949 process the iSCSI PDU by invoking the appropriate Operational 1950 Primitive. A Connection_Handle MUST qualify each of these 1951 invocations. In addition, BHS and the optional AHS of the iSCSI PDU 1952 as defined in [RFC3720] MUST qualify each of the invocations. The 1953 qualifying Connection_Handle, the BHS and the AHS are not explicitly 1954 listed in the subsequent sections. 1956 7.3.1 SCSI Command 1958 Type: control-type PDU 1960 PDU-specific qualifiers (for SCSI Write or bidirectional 1961 command): ImmediateDataSize, UnsolicitedDataSize, 1962 DataDescriptorOut 1964 PDU-specific qualifiers (for SCSI Read or bidirectional 1965 command): DataDescriptorIn 1967 The iSER Layer at the initiator MUST send the SCSI command in a 1968 SendSE Message to the target. 
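The classification given in sections 7.1 and 7.2 can be summarised as a small decision function. This Python sketch is an illustration only: the function name, the PDU type strings, and the use of None for a PDU that falls into neither category are assumptions made for the example.

```python
from typing import Optional

# The set of iSCSI data-type PDUs named in section 7.1.
DATA_TYPE_PDUS = {"SCSI Data-in", "R2T"}


def classify_pdu(pdu_type: str, solicited: bool = False) -> Optional[str]:
    """Classify an iSCSI PDU for an iSER-assisted connection.

    solicited matters only for SCSI Data-out PDUs and tells whether
    the PDU carries solicited data.
    """
    if pdu_type in DATA_TYPE_PDUS:
        # Transformed into RDMA operations, never delivered as-is.
        return "data-type"
    if pdu_type == "SCSI Data-out" and solicited:
        # Neither category: excluded from the control-type definition.
        return None
    # Everything else, including Data-out with unsolicited data,
    # is sent with Send_Control using a Send Message Type of RCaP.
    return "control-type"
```

Under this sketch, classify_pdu("SCSI Data-out") returns "control-type", matching the statement that SCSI Data-out PDUs carrying unsolicited data are control-type PDUs.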
1970 For a SCSI Write or bidirectional command, the iSCSI Layer at the 1971 initiator MUST invoke the Send_Control Operational Primitive as 1972 follows: 1974 * If there is immediate data to be transferred for the SCSI write 1975 or bidirectional command, the qualifier ImmediateDataSize MUST be 1976 used to define the number of bytes of immediate unsolicited data 1978 Ko et al. Expires May 2007 46 1979 to be sent with the write or bidirectional command, and the 1980 qualifier DataDescriptorOut MUST be used to define the 1981 initiator's I/O Buffer containing the SCSI Write data. 1983 * If there is unsolicited data to be transferred for the SCSI Write 1984 or bidirectional command, the qualifier UnsolicitedDataSize MUST 1985 be used to define the number of bytes of immediate and non- 1986 immediate unsolicited data for the command. The iSCSI Layer will 1987 issue one or more SCSI Data-out PDUs for the non-immediate 1988 unsolicited data. See Section 7.3.4 on SCSI Data-out. 1990 * If there is solicited data to be transferred for the SCSI Write 1991 or bidirectional command, as indicated by the Expected Data 1992 Transfer Length in the SCSI Command PDU exceeding the value of 1993 UnsolicitedDataSize, the iSER Layer at the initiator MUST do the 1994 following: 1996 a. It MUST allocate a Write STag for the I/O Buffer defined by 1997 the qualifier DataDescriptorOut. DataDescriptorOut 1998 describes the I/O buffer starting with the immediate 1999 unsolicited data (if any), followed by the non-immediate 2000 unsolicited data (if any) and solicited data. This means 2001 that the BufferOffset for the SCSI Data-out for this command 2002 is equal to the TO. This implies zero TO for this STag 2003 points to the beginning of this I/O Buffer. 2005 b. It MUST establish a Local Mapping that associates the 2006 Initiator Task Tag (ITT) to the Write STag. 2008 c. 
It MUST Advertise the Write STag to the target by sending it 2009 as the Write STag in the iSER header of the iSER Message 2010 (the payload of the SendSE Message of RCaP) containing the 2011 SCSI Write or bidirectional command PDU. See section 9.2 on 2012 iSER Header Format for iSCSI Control-Type PDU. 2014 For a SCSI Read or bidirectional command, the iSCSI Layer at the 2015 initiator MUST invoke the Send_Control Operational Primitive 2016 qualified with DataDescriptorIn which defines the initiator's I/O 2017 Buffer for receiving the SCSI Read data. The iSER Layer at the 2018 initiator MUST do the following: 2020 a. It MUST allocate a Read STag for the I/O Buffer. 2022 b. It MUST establish a Local Mapping that associates the 2023 Initiator Task Tag (ITT) to the Read STag. 2025 Ko et al. Expires May 2007 47 2026 c. It MUST Advertise the Read STag to the target by sending it 2027 as the Read STag in the iSER header of the iSER Message (the 2028 payload of the SendSE Message of RCaP) containing the SCSI 2029 Read or bidirectional command PDU. See section 9.2 on iSER 2030 Header Format for iSCSI Control-Type PDU. 2032 If the amount of unsolicited data to be transferred in a SCSI 2033 Command exceeds TargetRecvDataSegmentLength, then the iSCSI Layer at 2034 the initiator MUST segment the data into multiple iSCSI control-type 2035 PDUs, with the data segment length in all PDUs generated except the 2036 last one having exactly the size TargetRecvDataSegmentLength. The 2037 data segment length of the last iSCSI control-type PDU carrying the 2038 unsolicited data can be up to TargetRecvDataSegmentLength. 2040 When the iSER Layer at the target receives the SCSI Command, it MUST 2041 establish a Remote Mapping that associates the ITT to the Advertised 2042 Write STag and the Read STag if present in the iSER header. The 2043 Write STag is used by the iSER Layer at the target in handling the 2044 data transfer associated with the R2T PDU(s) as described in section 2045 7.3.6. 
The Read STag is used in handling the SCSI Data-in PDU(s) 2046 from the iSCSI Layer at the target as described in section 7.3.5. 2048 7.3.2 SCSI Response 2050 Type: control-type PDU 2052 PDU-specific qualifiers: DataDescriptorStatus 2054 The iSCSI Layer at the target MUST invoke the Send_Control 2055 Operational Primitive qualified with DataDescriptorStatus which 2056 defines the buffer containing the sense and response information. 2057 The iSCSI Layer at the target MUST always return the SCSI status for 2058 a SCSI command in a separate SCSI Response PDU. "Phase collapse" 2059 for transferring SCSI status in a SCSI Data-in PDU MUST NOT be used. 2060 The iSER Layer at the target sends the SCSI Response PDU according 2061 to the following rules: 2063 * If no STags were Advertised by the initiator in the iSER Message 2064 containing the SCSI command PDU, then the iSER Layer at the 2065 target MUST send a SendSE Message containing the SCSI Response 2066 PDU. 2068 * If the initiator Advertised a Read STag in the iSER Message 2069 containing the SCSI Command PDU, then the iSER Layer at the 2070 target MUST send a SendInvSE Message containing the SCSI Response 2071 PDU. The header of the SendInvSE Message MUST carry the Read 2072 STag to be invalidated at the initiator. 2074 Ko et al. Expires May 2007 48 2075 * If the initiator Advertised only the Write STag in the iSER 2076 Message containing the SCSI command PDU, then the iSER Layer at 2077 the target MUST send a SendInvSE Message containing the SCSI 2078 Response PDU. The header of the SendInvSE Message MUST carry the 2079 Write STag to be invalidated at the initiator. 2081 When the iSCSI Layer at the target invokes the Send_Control 2082 Operational Primitive to send the SCSI Response PDU, the iSER Layer 2083 at the target MUST invalidate the Remote Mapping that associates the 2084 ITT to the Advertised STag(s) before transferring the SCSI Response 2085 PDU to the initiator. 
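The three SCSI Response rules in section 7.3.2 above can be read as a small decision function. The following is a minimal sketch, not part of the specification: the function name and the tuple return shape are illustrative assumptions, and STags are modeled as plain integers.

```python
from typing import Optional, Tuple


def select_response_message(read_stag: Optional[int],
                            write_stag: Optional[int]) -> Tuple[str, Optional[int]]:
    """Return (RCaP message type, STag carried for remote invalidation).

    Hypothetical helper: models only the choice between SendSE and
    SendInvSE for a SCSI Response PDU at the target.
    """
    if read_stag is not None:
        # A Read STag was Advertised (alone or with a Write STag):
        # the SendInvSE header carries the Read STag.
        return ("SendInvSE", read_stag)
    if write_stag is not None:
        # Only a Write STag was Advertised.
        return ("SendInvSE", write_stag)
    # No STags were Advertised with the SCSI Command.
    return ("SendSE", None)
```

Because the header can name only one STag, a command that Advertised both STags yields the Read STag here.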
2087 Upon receiving the SendInvSE Message containing the SCSI Response 2088 PDU from the target, the RCaP layer at the initiator will invalidate 2089 the STag specified in the header. The iSER Layer at the initiator 2090 MUST ensure that the correct STag is invalidated. If both the Read 2091 and the Write STags were Advertised earlier by the initiator, then 2092 the iSER Layer at the initiator MUST explicitly invalidate the Write 2093 STag upon receiving the SendInvSE Message because the header of the 2094 SendInvSE Message can only carry one STag (in this case the Read 2095 STag) to be invalidated. 2097 The iSER Layer at the initiator MUST ensure the invalidation of the 2098 STag(s) used in a command before notifying the iSCSI Layer at the 2099 initiator by invoking the Control_Notify Operational Primitive 2100 qualified with the SCSI Response. This precludes the possibility of 2101 using the STag(s) after the completion of the command thereby 2102 causing data corruption. 2104 When the iSER Layer at the initiator receives the SendSE or the 2105 SendInvSE Message containing the SCSI Response PDU, it SHOULD 2106 invalidate the Local Mapping that associates the ITT to the local 2107 STag(s). The iSER Layer MUST ensure that all local STag(s) 2108 associated with the ITT are invalidated before notifying the iSCSI 2109 Layer of the SCSI Response PDU by invoking the Control_Notify 2110 Operational Primitive qualified with the SCSI Response PDU. 2112 7.3.3 Task Management Function Request/Response 2114 Type: control-type PDU 2116 PDU-specific qualifiers (for TMF Request): DataDescriptorOut, 2117 DataDescriptorIn 2119 The iSER Layer MUST use a SendSE Message to send the Task Management 2120 Function Request/Response PDU. 2122 Ko et al. 
Expires May 2007 49 2123 For the Task Management Function Request with the TASK REASSIGN 2124 function, the iSER Layer at the initiator MUST do the following: 2126 * It MUST use the ITT as specified in the Referenced Task Tag from 2127 the Task Management Function Request PDU to locate the existing 2128 STag(s), if any, in the Local Mapping(s) that associates the ITT 2129 to the local STag(s). 2131 * It MUST invalidate the existing STag(s), if any, and the Local 2132 Mapping(s) that associates the ITT to the local STag(s). 2134 * It MUST allocate a Read STag for the I/O Buffer as defined by the 2135 qualifier DataDescriptorIn if the Send_Control Operational 2136 Primitive invocation is qualified with DataDescriptorIn. 2138 * It MUST allocate a Write STag for the I/O Buffer as defined by 2139 the qualifier DataDescriptorOut if the Send_Control Operational 2140 Primitive invocation is qualified with DataDescriptorOut. 2142 * If STags are allocated, it MUST establish new Local Mapping(s) 2143 that associate the ITT to the allocated STag(s). 2145 * It MUST Advertise the STags, if allocated, to the target in the 2146 iSER header of the SendSE Message carrying the iSCSI PDU, as 2147 described in section 9.2. 2149 For the Task Management Function Request with the TASK REASSIGN 2150 function for a SCSI Read or bidirectional command, the iSCSI Layer 2151 at the initiator MUST set ExpDataSN to 0 since the data transfer and 2152 acknowledgements happen transparently to the iSCSI Layer at the 2153 initiator. This provides the flexibility to the iSCSI Layer at the 2154 target to request transmission of only the unacknowledged data as 2155 specified in [RFC3720]. 
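The initiator-side TASK REASSIGN steps listed above can be sketched as follows. This is an illustrative toy model only: the class name, the integer counter standing in for real STag allocation, and the dictionary layout of the Local Mapping are all assumptions, not anything defined by iSER.

```python
from itertools import count
from typing import Dict, Set


class InitiatorStags:
    """Toy model of the initiator's Local Mappings (ITT -> STags)."""

    def __init__(self) -> None:
        self._next_stag = count(1)      # stand-in for real STag allocation
        self.valid: Set[int] = set()    # STags currently valid
        self.local_mapping: Dict[int, Dict[str, int]] = {}

    def _allocate(self) -> int:
        stag = next(self._next_stag)
        self.valid.add(stag)
        return stag

    def task_reassign(self, referenced_itt: int,
                      has_data_in: bool, has_data_out: bool) -> Dict[str, int]:
        # Locate and invalidate any existing STags and the old mapping.
        old = self.local_mapping.pop(referenced_itt, {})
        for stag in old.values():
            self.valid.discard(stag)
        # Allocate new STags according to the qualifiers supplied.
        new: Dict[str, int] = {}
        if has_data_in:
            new["read"] = self._allocate()
        if has_data_out:
            new["write"] = self._allocate()
        # Establish the new Local Mapping only if something was allocated.
        if new:
            self.local_mapping[referenced_itt] = new
        # The returned STags are what would be Advertised in the iSER header.
        return new
```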
2157 When the iSER Layer at the target receives the Task Management 2158 Function Request with the TASK REASSIGN function, it MUST do the 2159 following: 2161 * It MUST use the ITT as specified in the Referenced Task Tag from 2162 the Task Management Function Request PDU to locate the mappings 2163 that associate the ITT to the Advertised STag(s) and the local 2164 STag(s), if any. 2166 * It MUST invalidate the local STag(s), if any, associated with the 2167 ITT. 2170 * It MUST replace the Advertised STag(s) in the Remote Mapping that 2171 associates the ITT to the Advertised STag(s) with the Write STag 2172 and the Read STag if present in the iSER header. The Write STag 2173 is used in the handling of the R2T PDU(s) from the iSCSI Layer at 2174 the target as described in section 7.3.6. The Read STag is used 2175 in the handling of the SCSI Data-in PDU(s) from the iSCSI Layer 2176 at the target as described in section 7.3.5. 2178 7.3.4 SCSI Data-out 2180 Type: control-type PDU 2182 PDU-specific qualifiers: DataDescriptorOut 2184 The iSCSI Layer at the initiator MUST invoke the Send_Control 2185 Operational Primitive qualified with DataDescriptorOut which defines 2186 the initiator's I/O Buffer containing unsolicited SCSI Write data. 2188 If the amount of unsolicited data to be transferred as SCSI Data-out 2189 exceeds TargetRecvDataSegmentLength, then the iSCSI Layer at the 2190 initiator MUST segment the data into multiple iSCSI control-type 2191 PDUs, with the DataSegmentLength having the value of 2192 TargetRecvDataSegmentLength in all PDUs generated except the last 2193 one. The DataSegmentLength of the last iSCSI control-type PDU 2194 carrying the unsolicited data can be up to 2195 TargetRecvDataSegmentLength. The iSCSI Layer at the target MUST 2196 perform the reassembly function for the unsolicited data.
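The segmentation rule above (every PDU except the last carries exactly TargetRecvDataSegmentLength bytes) can be illustrated with a short sketch. The function name and the choice to return a list of DataSegmentLength values are assumptions made for illustration.

```python
from typing import List


def segment_unsolicited(total_len: int, target_recv_dsl: int) -> List[int]:
    """Return the DataSegmentLength of each SCSI Data-out PDU, in order."""
    if target_recv_dsl <= 0:
        raise ValueError("TargetRecvDataSegmentLength must be positive")
    lengths: List[int] = []
    remaining = total_len
    # All PDUs except the last are exactly target_recv_dsl bytes long.
    while remaining > target_recv_dsl:
        lengths.append(target_recv_dsl)
        remaining -= target_recv_dsl
    # The last PDU carries whatever is left, up to target_recv_dsl bytes.
    if remaining > 0:
        lengths.append(remaining)
    return lengths
```

For example, 20000 bytes with a limit of 8192 would be split into PDUs of 8192, 8192, and 3616 bytes.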
2198 For unsolicited data, if the F bit is set to 0 in a SCSI Data-out 2199 PDU, the iSER Layer at the initiator MUST use a Send Message to send 2200 the SCSI Data-out PDU. If the F bit is set to 1, the iSER Layer at 2201 the initiator MUST use a SendSE Message to send the SCSI Data-out 2202 PDU. 2204 Note that for solicited data, the SCSI Data-out PDUs are not used 2205 since R2T PDUs are not delivered to the iSCSI layer at the 2206 initiator; instead R2T PDUs are transformed by the iSER layer at the 2207 target into RDMA Read operations. (See section 7.3.6.) 2209 7.3.5 SCSI Data-in 2211 Type: data-type PDU 2213 PDU-specific qualifiers: DataDescriptorIn 2215 When the iSCSI Layer at the target is ready to return the SCSI Read 2216 data to the initiator, it MUST invoke the Put_Data Operational 2218 Ko et al. Expires May 2007 51 2219 Primitive qualified with DataDescriptorIn which defines the SCSI 2220 Data-in buffer. See section 7.1 on the general requirement on the 2221 handling of iSCSI data-type PDUs. SCSI Data-in PDU(s) are used in 2222 SCSI Read data transfer as described in section 9.5.2. 2224 The iSER Layer at the target MUST do the following for each 2225 invocation of the Put_Data Operational Primitive: 2227 1. It MUST use the ITT in the SCSI Data-in PDU to locate the remote 2228 Read STag in the Remote Mapping that associates the ITT to 2229 Advertised STag(s). The Remote Mapping was established earlier 2230 by the iSER Layer at the target when the SCSI Read Command was 2231 received from the initiator. 2233 2. It MUST generate and send an RDMA Write Message containing the 2234 read data to the initiator. 2236 a. It MUST use the remote Read STag as the Data Sink STag of 2237 the RDMA Write Message. 2239 b. It MUST use the Buffer Offset from the SCSI Data-in PDU as 2240 the Data Sink Tagged Offset of the RDMA Write Message. 2242 c. 
It MUST use DataSegmentLength from the SCSI Data-in PDU to 2243 determine the amount of data to be sent in the RDMA Write 2244 Message. 2246 3. It MUST associate DataSN and ITT from the SCSI Data-in PDU with 2247 the RDMA Write operation. If the Put_Data Operational Primitive 2248 invocation was qualified with Notify_Enable set, then when the 2249 iSER Layer at the target receives a completion from the RCaP 2250 layer for the RDMA Write Message, the iSER Layer at the target 2251 MUST notify the iSCSI Layer by invoking the 2252 Data_Completion_Notify Operational Primitive qualified with 2253 DataSN and ITT. Conversely, if the Put_Data Operational 2254 Primitive invocation was qualified with Notify_Enable cleared, 2255 then the iSER Layer at the target MUST NOT notify the iSCSI 2256 Layer on completion and MUST NOT invoke the 2257 Data_Completion_Notify Operational Primitive. 2259 When the A-bit is set to 1 in the SCSI Data-in PDU, the iSER Layer 2260 at the target MUST notify the iSCSI Layer at the target when the 2261 data transfer is complete at the initiator. To perform this 2262 additional function, the iSER Layer at the target can take advantage 2263 of the operational ErrorRecoveryLevel if previously disclosed by the 2264 iSCSI Layer via an earlier invocation of the Notice_Key_Values 2265 Operational Primitive. There are two approaches that can be taken: 2267 Ko et al. Expires May 2007 52 2268 1. If the iSER Layer at the target knows that the operational 2269 ErrorRecoveryLevel is 2, or if the iSER Layer at the target does 2270 not know the operational ErrorRecoveryLevel, then the iSER Layer 2271 at the target MUST issue a zero-length RDMA Read Request Message 2272 following the RDMA Write Message. 
When the iSER Layer at the 2273 target receives a completion for the RDMA Read Request Message 2274 from the RCaP layer, implying that the RDMA-Capable Controller 2275 at the initiator has completed processing the RDMA Write Message 2276 due to the completion ordering semantics of RCaP, the iSER Layer 2277 at the target MUST notify the iSCSI Layer at the target by 2278 invoking the Data_Ack_Notify Operational Primitive qualified 2279 with ITT and DataSN (see section 3.2.3). 2281 2. If the iSER Layer at the target knows that the operational 2282 ErrorRecoveryLevel is 1, then the iSER Layer at the target MUST 2283 do one of the following: 2285 a. It MUST notify the iSCSI Layer at the target by invoking the 2286 Data_Ack_Notify Operational Primitive qualified with ITT and 2287 DataSN (see section 3.2.3) when it receives the local 2288 completion from the RCaP layer for the RDMA Write Message. 2289 This is allowed since digest errors do not occur in iSER 2290 (see section 10.1.4.2) and a CRC error will cause the 2291 connection to be terminated and the task to be terminated 2292 anyway. The local RDMA Write completion from the RCaP layer 2293 guarantees that the RCaP layer will not access the I/O 2294 Buffer again to transfer the data associated with that RDMA 2295 Write operation. 2297 b. Alternatively, it MUST use the same procedure for handling 2298 the data transfer completion at the initiator as for 2299 ErrorRecoveryLevel 2. 2301 It should be noted that the iSCSI Layer at the target cannot set the 2302 A-bit to 1 if the ErrorRecoveryLevel=0. 2304 SCSI status MUST always be returned in a separate SCSI Response PDU. 2305 The S bit in the SCSI Data-in PDU MUST always be set to 0. There 2306 MUST NOT be a "phase collapse" in the SCSI Data-in PDU. 
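The A-bit handling described in this section depends on what the iSER Layer at the target knows about the operational ErrorRecoveryLevel. A minimal sketch of that choice follows; the function name and the string labels for the two approaches are hypothetical, and None is used here to mean that the level was never disclosed.

```python
from typing import List, Optional


def data_ack_options(error_recovery_level: Optional[int]) -> List[str]:
    """List the permitted ways to detect completion at the initiator."""
    if error_recovery_level not in (None, 0, 1, 2):
        raise ValueError("ErrorRecoveryLevel must be 0, 1, 2, or unknown")
    if error_recovery_level == 0:
        # The iSCSI Layer cannot set the A-bit at ErrorRecoveryLevel 0.
        raise ValueError("A-bit cannot be set at ErrorRecoveryLevel 0")
    if error_recovery_level == 1:
        # Either rely on the local RDMA Write completion, or fall back
        # to the same procedure as for ErrorRecoveryLevel 2.
        return ["local_write_completion", "zero_length_rdma_read"]
    # Level 2, or level unknown: a zero-length RDMA Read Request
    # must follow the RDMA Write Message.
    return ["zero_length_rdma_read"]
```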
2308 Since the RDMA Write Message only transfers the data portion of the 2309 SCSI Data-in PDU but not the control information in the header, such 2310 as ExpCmdSN, if timely updates of such information are crucial, the 2311 iSCSI Layer at the initiator MAY issue NOP-Out PDUs to request the 2312 iSCSI Layer at the target to respond with the information using NOP- 2313 In PDUs. 2316 7.3.6 Ready To Transfer (R2T) 2318 Type: data-type PDU 2320 PDU-specific qualifiers: DataDescriptorOut 2322 In order to send an R2T PDU, the iSCSI Layer at the target MUST 2323 invoke the Get_Data Operational Primitive qualified with 2324 DataDescriptorOut which defines the I/O Buffer for receiving the 2325 SCSI Write data from the initiator. See section 7.1 on the general 2326 requirements on the handling of iSCSI data-type PDUs. 2328 The iSER Layer at the target MUST do the following for each 2329 invocation of the Get_Data Operational Primitive: 2331 1. It MUST ensure a valid local STag for the I/O Buffer and a valid 2332 Local Mapping that associates the Initiator Task Tag (ITT) to 2333 the local STag. This may involve allocating a valid local STag 2334 and establishing a Local Mapping. 2336 2. It MUST use the ITT in the R2T to locate the remote Write STag 2337 in the Remote Mapping that associates the ITT to Advertised 2338 STag(s). The Remote Mapping was established earlier by the iSER 2339 Layer at the target when the iSER Message containing the 2340 Advertised Write STag and the SCSI Command PDU for a SCSI Write 2341 or bidirectional command was received from the initiator. 2343 3. If the iSER-ORD value at the target is set to 0, the iSER Layer 2344 at the target MUST terminate the connection and free up the 2345 resources associated with the connection (as described in 5.2.3) 2346 if it received the R2T PDU from the iSCSI Layer at the target.
2347 Upon termination of the connection, the iSER Layer at the target 2348 MUST notify the iSCSI Layer at the target by invoking the 2349 Connection_Terminate_Notify Operational Primitive. 2351 4. If the iSER-ORD value at the target is set to greater than 0, 2352 the iSER Layer at the target MUST transform the R2T PDU into an 2353 RDMA Read Request Message. While transforming the R2T PDU, the 2354 iSER Layer at the target MUST ensure that the number of 2355 outstanding RDMA Read Request Messages does not exceed iSER-ORD 2356 value. To transform the R2T PDU, the iSER Layer at the target: 2358 a. MUST derive the local STag and local Tagged Offset from the 2359 DataDescriptorOut that qualified the Get_Data invocation. 2361 b. MUST use the local STag as the Data Sink STag of the RDMA 2362 Read Request Message. 2364 Ko et al. Expires May 2007 54 2365 c. MUST use the local Tagged Offset as the Data Sink Tagged 2366 Offset of the RDMA Read Request Message. 2368 d. MUST use the Desired Data Transfer Length from the R2T PDU 2369 as the RDMA Read Message Size of the RDMA Read Request 2370 Message. 2372 e. MUST use the remote Write STag as the Data Source STag of 2373 the RDMA Read Request Message. 2375 f. MUST use the Buffer Offset from the R2T PDU as the Data 2376 Source Tagged Offset of the RDMA Read Request Message. 2378 5. It MUST associate R2TSN and ITT from the R2T PDU with the RDMA 2379 Read operation. If the Get_Data Operational Primitive 2380 invocation was qualified with Notify_Enable set, then when the 2381 iSER Layer at the target receives a completion from the RCaP 2382 layer for the RDMA Read operation, the iSER Layer at the target 2383 MUST notify the iSCSI Layer by invoking the 2384 Data_Completion_Notify Operational Primitive qualified with 2385 R2TSN and ITT. 
Conversely, if the Get_Data Operational 2386 Primitive invocation was qualified with Notify_Enable cleared, 2387 then the iSER Layer at the target MUST NOT notify the iSCSI 2388 Layer on completion and MUST NOT invoke the 2389 Data_Completion_Notify Operational Primitive. 2391 When the RCaP layer at the initiator receives a valid RDMA Read 2392 Request Message, it will return an RDMA Read Response Message 2393 containing the solicited write data to the target. When the RCaP 2394 layer at the target receives the RDMA Read Response Message from the 2395 initiator, it will place the solicited data in the I/O Buffer 2396 referenced by the Data Sink STag in the RDMA Read Response Message. 2398 Since the RDMA Read Request Message from the target does not 2399 transfer the control information in the R2T PDU such as ExpCmdSN, if 2400 timely updates of such information are crucial, the iSCSI Layer at 2401 the initiator MAY issue NOP-Out PDUs to request the iSCSI Layer at 2402 the target to respond with the information using NOP-In PDUs. 2404 Similarly, since the RDMA Read Response Message from the initiator 2405 only transfers the data but not the control information normally 2406 found in the SCSI Data-out PDU, such as ExpStatSN, if timely updates 2407 of such information are crucial, the iSCSI Layer at the target MAY 2408 issue NOP-In PDUs to request the iSCSI Layer at the initiator to 2409 respond with the information using NOP-Out PDUs. 2412 7.3.7 Asynchronous Message 2414 Type: control-type PDU 2416 PDU-specific qualifiers: DataDescriptorSense 2418 The iSCSI Layer MUST invoke the Send_Control Operational Primitive 2419 qualified with DataDescriptorSense which defines the buffer 2420 containing the sense and iSCSI Event information. The iSER Layer 2421 MUST use a SendSE Message to send the Asynchronous Message PDU.
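Returning to the R2T handling in section 7.3.6, the field-by-field transformation of an R2T PDU into an RDMA Read Request Message can be sketched as below. The class names, field names, and the convention of returning None when the request must wait for an outstanding read to complete are illustrative assumptions; only the source of each field follows the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class R2T:
    itt: int
    r2tsn: int
    buffer_offset: int
    desired_data_transfer_length: int


@dataclass(frozen=True)
class RdmaReadRequest:
    data_sink_stag: int
    data_sink_tagged_offset: int
    rdma_read_message_size: int
    data_source_stag: int
    data_source_tagged_offset: int


def transform_r2t(r2t: R2T, local_stag: int, local_tagged_offset: int,
                  remote_write_stags: Dict[int, int],
                  iser_ord: int, outstanding_reads: int) -> Optional[RdmaReadRequest]:
    if iser_ord == 0:
        # With iSER-ORD of 0 the connection has to be terminated.
        raise ConnectionError("iSER-ORD is 0; R2T cannot be processed")
    if outstanding_reads >= iser_ord:
        # Assumed policy: the caller queues the request until a read completes.
        return None
    return RdmaReadRequest(
        data_sink_stag=local_stag,                   # from DataDescriptorOut
        data_sink_tagged_offset=local_tagged_offset,  # from DataDescriptorOut
        rdma_read_message_size=r2t.desired_data_transfer_length,
        data_source_stag=remote_write_stags[r2t.itt],  # Remote Mapping lookup
        data_source_tagged_offset=r2t.buffer_offset,
    )
```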
2423 7.3.8 Text Request & Text Response 2425 Type: control-type PDU 2427 PDU-specific qualifiers: DataDescriptorTextOut (for Text 2428 Request), DataDescriptorIn (for Text Response) 2430 The iSCSI Layer MUST invoke the Send_Control Operational Primitive 2431 qualified with DataDescriptorTextOut (or DataDescriptorIn) which 2432 defines the Text Request (or Text Response) buffer. The iSER Layer 2433 MUST use SendSE Messages to send the Text Request (or Text Response) 2434 PDUs. 2436 7.3.9 Login Request & Login Response 2438 During the login negotiation, the iSCSI Layer interacts with the 2439 transport layer directly and the iSER Layer is not involved. See 2440 section 5.1 on iSCSI/iSER Connection Setup. If the underlying 2441 transport is TCP, the Login Request PDUs and the Login Response PDUs 2442 are exchanged when the connection between the initiator and the 2443 target is still in the byte stream mode. 2445 The iSCSI Layer MUST not send a Login Request (or a Login Response) 2446 PDU during the full feature phase. A Login Request (or a Login 2447 Response) PDU, if used, MUST be treated as an iSCSI protocol error. 2448 The iSER Layer MAY reject such a PDU from the iSCSI Layer with an 2449 appropriate error code. If a Login Request PDU is received by the 2450 iSCSI Layer at the target, it MUST respond with a Reject PDU with a 2451 reason code of "protocol error". 2453 7.3.10 Logout Request & Logout Response 2455 Type: control-type PDU 2457 PDU-specific qualifiers: None 2460 The iSER Layer MUST use a SendSE Message to send the Logout Request 2461 or Logout Response PDU. Sections 5.2.1 and 5.2.2 describe the 2462 handling of the Logout Request and the Logout Response at the 2463 initiator and the target and the interactions between the initiator 2464 and the target to terminate a connection.
2466 7.3.11 SNACK Request 2468 Since HeaderDigest and DataDigest must be negotiated to "None", 2469 there are no digest errors when the connection is in iSER-assisted 2470 mode. Also since RCaP delivers all messages in the order they were 2471 sent, there are no sequence errors when the connection is in iSER- 2472 assisted mode. Therefore the iSCSI Layer MUST NOT send SNACK 2473 Request PDUs. A SNACK Request PDU, if used, MUST be treated as an 2474 iSCSI protocol error. The iSER Layer MAY reject such a PDU from the 2475 iSCSI Layer with an appropriate error code. If a SNACK Request PDU 2476 is received by the iSCSI Layer at the target, it MUST respond with a 2477 Reject PDU with a reason code of "protocol error". 2479 7.3.12 Reject 2481 Type: control-type PDU 2483 PDU-specific qualifiers: DataDescriptorReject 2485 The iSCSI Layer MUST invoke the Send_Control Operational Primitive 2486 qualified with DataDescriptorReject which defines the Reject buffer. 2487 The iSER Layer MUST use a SendSE Message to send the Reject PDU. 2489 7.3.13 NOP-Out & NOP-In 2491 Type: control-type PDU 2493 PDU-specific qualifiers: DataDescriptorNOPOut (for NOP-Out), 2494 DataDescriptorNOPIn (for NOP-In) 2496 The iSCSI Layer MUST invoke the Send_Control Operational Primitive 2497 qualified with DataDescriptorNOPOut (or DataDescriptorNOPIn) which 2498 defines the Ping (or Return Ping) data buffer. The iSER Layer MUST 2499 use SendSE Messages to send the NOP-Out (or NOP-In) PDU. 2502 8 Flow Control and STag Management 2504 8.1 Flow Control for RDMA Send Message Types 2506 Send Message Types in RCaP are used by the iSER Layer to transfer 2507 iSCSI control-type PDUs. Each Send Message Type in RCaP consumes an 2508 Untagged Buffer at the Data Sink. However, neither the RCaP layer 2509 nor the iSER Layer provides an explicit flow control mechanism for 2510 the Send Message Types.
Therefore, the iSER Layer SHOULD provision 2511 enough Untagged Buffers for handling incoming Send Message Types to 2512 prevent buffer exhaustion at the RCaP layer. If buffer exhaustion 2513 occurs, it may result in the termination of the connection. 2515 An implementation may choose to satisfy the buffer requirement by 2516 using a common buffer pool shared across multiple connections, with 2517 usage limits on a per connection basis and usage limits on the 2518 buffer pool itself. In such an implementation, exceeding the buffer 2519 usage limit for a connection or the buffer pool itself may trigger 2520 interventions from the iSER Layer to replenish the buffer pool 2521 and/or to isolate the connection causing the problem. 2523 iSER also provides the MaxOutstandingUnexpectedPDUs key to be used 2524 by the initiator and the target to declare the maximum number of 2525 outstanding "unexpected" control-type PDUs that each can receive. It 2526 is intended to allow the receiving side to determine the amount of 2527 buffer resources needed beyond the normal flow control mechanism 2528 available in iSCSI. 2530 The buffer resources required at both the initiator and the target 2531 as a result of control-type PDUs sent by the initiator are described 2532 in section 8.1.1. The buffer resources required at both the 2533 initiator and target as a result of control-type PDUs sent by the 2534 target are described in section 8.1.2. 2536 8.1.1 Flow Control for Control-Type PDUs from the Initiator 2538 The control-type PDUs that can be sent by an initiator to a target 2539 can be grouped into the following categories: 2541 1. Regulated: Control-type PDUs in this category are regulated by 2542 the iSCSI CmdSN window mechanism and the immediate flag is not 2543 set. 2545 2. Unregulated but Expected: Control-type PDUs in this category 2546 are not regulated by the iSCSI CmdSN window mechanism but are 2547 expected by the target. 2550 3.
Unregulated and Unexpected: Control-type PDUs in this category 2551 are not regulated by the iSCSI CmdSN window mechanism and are 2552 "unexpected" by the target. 2554 8.1.1.1 Control-Type PDUs from the Initiator in the Regulated Category 2556 Control-type PDUs that can be sent by the initiator in this category 2557 are regulated by the iSCSI CmdSN window mechanism and the immediate 2558 flag is not set. 2560 The queuing capacity required of the iSCSI layer at the target is 2561 described in section 3.2.2.1 of [RFC3720]. For each of the control- 2562 type PDUs that can be sent by the initiator in this category, the 2563 initiator MUST provision for the buffer resources required for the 2564 corresponding control-type PDU sent as a response from the target. 2565 The following is a list of the PDUs that can be sent by the 2566 initiator and the PDUs that are sent by the target in response: 2568 a. When an initiator sends a SCSI Command PDU, it expects a 2569 SCSI Response PDU from the target. 2571 b. When the initiator sends a Task Management Function Request 2572 PDU, it expects a Task Management Function Response PDU from 2573 the target. 2575 c. When the initiator sends a Text Request PDU, it expects a 2576 Text Response PDU from the target. 2578 d. When the initiator sends a Logout Request PDU, it expects a 2579 Logout Response PDU from the target. 2581 e. When the initiator sends a NOP-Out PDU as a ping request 2582 with ITT != 0xffffffff and TTT = 0xffffffff, it expects a 2583 NOP-In PDU from the target with the same ITT and TTT as in 2584 the ping request. 2586 The response from the target for any of the PDUs enumerated here may 2587 alternatively be in the form of a Reject PDU sent instead before the 2588 task is active, as described in section 6.3 of [RFC3720]. 
2590 8.1.1.2 Control-Type PDUs from the Initiator in the Unregulated but 2591 Expected Category 2593 For the control-type PDUs in the Unregulated but Expected category, 2594 the amount of buffering resources required at the target can be 2595 predetermined. The following is a list of the PDUs in this 2596 category: 2598 Ko et al. Expires May 2007 59 2599 a. SCSI Data-out PDUs are used by the initiator to send 2600 unsolicited data. The amount of buffer resources required 2601 by the target can be determined using FirstBurstLength. 2602 Note that SCSI Data-out PDUs are not used for solicited 2603 data since the R2T PDU which is used for solicitation is 2604 transformed into RDMA Read operations by the iSER layer at 2605 the target. See section 7.3.4. 2607 b. A NOP-Out PDU with TTT != 0xffffffff is sent as a ping 2608 response by the initiator to the NOP-In PDU sent as a ping 2609 request by the target. 2611 8.1.1.3 Control-Type PDUs from the Initiator in the Unregulated and 2612 Unexpected Category 2614 PDUs in the Unregulated and Unexpected category are PDUs with the 2615 immediate flag set. The number of PDUs in this category which can 2616 be sent by an initiator is controlled by the value of 2617 MaxOutstandingUnexpectedPDUs declared by the target. (See section 2618 6.7.) After a PDU in this category is sent by the initiator, it is 2619 outstanding until it is retired. At any time, the number of 2620 outstanding unexpected PDUs MUST not exceed the value of 2621 MaxOutstandingUnexpectedPDUs declared by the target. 2623 The target uses the value of MaxOutstandingUnexpectedPDUs that it 2624 declared to determine the amount of buffer resources required for 2625 control-type PDUs in this category that can be sent by an initiator. 
2626 For the initiator, for each of the control-type PDUs that can be 2627 sent in this category, the initiator MUST provision for the buffer 2628 resources if required for the corresponding control-type PDU that 2629 can be sent as a response from the target. 2631 An outstanding PDU in this category is retired as follows. If the 2632 CmdSN of the PDU sent by the initiator in this category is x, the 2633 PDU is outstanding until the initiator sends a non-immediate 2634 control-type PDU on the same connection with CmdSN = y (where y is 2635 at least x) and the target responds with a control-type PDU on any 2636 connection where ExpCmdSN is at least y+1. 2638 When the number of outstanding unexpected control-type PDUs equals 2639 MaxOutstandingUnexpectedPDUs, the iSCSI Layer at the initiator MUST 2640 NOT generate any unexpected PDUs which otherwise it would have 2641 generated, even if it is intended for immediate delivery. 2643 Ko et al. Expires May 2007 60 2644 8.1.2 Flow Control for Control-Type PDUs from the Target 2646 Control-type PDUs that can be sent by a target and are expected by 2647 the initiator are listed in the Regulated category. (See section 2648 8.1.1.1.) 2650 For the control-type PDUs that can be sent by a target and are 2651 unexpected by the initiator, the number is controlled by 2652 MaxOutstandingUnexpectedPDUs declared by the initiator. (See 2653 section 6.7.) After a PDU in this category is sent by a target, it 2654 is outstanding until it is retired. At any time, the number of 2655 outstanding unexpected PDUs MUST not exceed the value of 2656 MaxOutstandingUnexpectedPDUs declared by the initiator. The 2657 initiator uses the value of MaxOutstandingUnexpectedPDUs that it 2658 declared to determine the amount of buffer resources required for 2659 control-type PDUs in this category that can be sent by a target. 2660 The following is a list of the PDUs in this category and the 2661 conditions for retiring the outstanding PDU: 2663 a. 
For an Asynchronous Message PDU with StatSN = x, the PDU is 2664 outstanding until the initiator sends a control-type PDU 2665 with ExpStatSN set to at least x+1. 2667 b. For a Reject PDU with StatSN = x which is sent after a task 2668 is active, the PDU is outstanding until the initiator sends 2669 a control-type PDU with ExpStatSN set to at least x+1. 2671 c. For a NOP-In PDU with ITT = 0xffffffff and StatSN = x, the 2672 PDU is outstanding until the initiator responds with a 2673 control-type PDU on the same connection where ExpStatSN is 2674 at least x+1. But if the NOP-In PDU is sent as a ping 2675 request with TTT != 0xffffffff, the PDU can also be retired 2676 when the initiator sends a NOP-Out PDU with the same ITT and 2677 TTT as in the ping request. Note that when a target sends a 2678 NOP-In PDU as a ping request, it must provision a buffer for 2679 the NOP-Out PDU sent as a ping response from the initiator. 2681 When the number of outstanding unexpected control-type PDUs equals 2682 MaxOutstandingUnexpectedPDUs, the iSCSI Layer at the target MUST NOT 2683 generate any unexpected PDUs which otherwise it would have 2684 generated, even if its intent is to indicate an iSCSI error 2685 condition (e.g., Asynchronous Message, Reject). Task timeouts as in 2686 the initiator waiting for a command completion or other connection 2687 and session level exceptions will ensure that correct operational 2688 behavior will result in these cases despite not generating the PDU. 2689 This rule overrides any other requirements elsewhere which require 2690 that a Reject PDU MUST be sent. 2692 Ko et al. Expires May 2007 61 2693 (Implementation note: SCSI task timeout and recovery can be a 2694 lengthy process and hence SHOULD be avoided by proper provisioning 2695 of resources.) 
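The target-side accounting in section 8.1.2 can be sketched as a small per-connection tracker: an unexpected PDU sent with StatSN = x stays outstanding until the initiator reports ExpStatSN of at least x+1, and no new unexpected PDU may be generated once the declared limit is reached. The class and method names are hypothetical, and the sketch deliberately ignores 32-bit sequence number wraparound, which a real implementation would need to handle.

```python
from typing import List


class UnexpectedPduWindow:
    """Tracks outstanding unexpected PDUs sent by the target."""

    def __init__(self, max_outstanding: int) -> None:
        # Value of MaxOutstandingUnexpectedPDUs declared by the initiator.
        self.max_outstanding = max_outstanding
        self.outstanding: List[int] = []  # StatSN of each outstanding PDU

    def can_send(self) -> bool:
        return len(self.outstanding) < self.max_outstanding

    def record_sent(self, statsn: int) -> None:
        if not self.can_send():
            raise RuntimeError("limit reached; the PDU must not be generated")
        self.outstanding.append(statsn)

    def on_exp_statsn(self, exp_statsn: int) -> None:
        # A PDU with StatSN = x is retired once ExpStatSN >= x + 1.
        # Simplification: plain integer comparison, no wraparound handling.
        self.outstanding = [x for x in self.outstanding if exp_statsn < x + 1]
```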
2697 (Implementation note: To ensure that the initiator has a means to 2698 inform the target that outstanding PDUs have been retired, the 2699 target should reserve the last unexpected control-type PDU allowable 2700 by the value of MaxOutstandingUnexpectedPDUs declared by the 2701 initiator for sending a NOP-In ping request with TTT != 0xffffffff 2702 to allow the initiator to return the NOP-Out ping response with the 2703 current ExpStatSN.) 2705 8.2 Flow Control for RDMA Read Resources 2707 The total number of RDMA Read operations that can be active 2708 simultaneously on an iSCSI/iSER connection depends on the amount of 2709 resources allocated as declared in the iSER Hello exchange described 2710 in section 5.1.3. Exceeding the number of RDMA Read operations 2711 allowed on a connection will result in the connection being 2712 terminated by the RCaP layer. The iSER Layer at the target 2713 maintains the iSER-ORD to keep track of the maximum number of RDMA 2714 Read Requests that can be issued by the iSER Layer on a particular 2715 RCaP Stream. 2717 During connection setup (see section 5.1), iSER-IRD is known at the 2718 initiator and iSER-ORD is known at the target after the iSER Layers 2719 at the initiator and the target have respectively allocated the 2720 connection resources necessary to support RCaP, as directed by the 2721 Allocate_Connection_Resources Operational Primitive from the iSCSI 2722 Layer before the end of the iSCSI Login Phase. In the full feature 2723 phase, the first message sent by the initiator is the iSER Hello 2724 Message (see section 9.3) which contains the value of iSER-IRD. In 2725 response to the iSER Hello Message, the target sends the iSER 2726 HelloReply Message (see section 9.4) which contains the value of 2727 iSER-ORD. 
The iSER Layer at both the initiator and the target MAY 2728 adjust (lower) the resources associated with iSER-IRD and iSER-ORD 2729 respectively to match the iSER-ORD value declared in the HelloReply 2730 Message. The iSER Layer at the target MUST flow control the RDMA 2731 Read Request Messages to not exceed the iSER-ORD value at the 2732 target. 2734 8.3 STag Management 2736 An STag, as defined in [RDMAP], is an identifier of a Tagged Buffer 2737 used in an RDMA operation. The allocation and the subsequent 2738 invalidation of the STags are specified in this document if the 2740 Ko et al. Expires May 2007 62 2741 STags are exposed on the wire by being Advertised in the iSER header 2742 or declared in the header of an RCaP Message. 2744 8.3.1 Allocation of STags 2746 When the iSCSI Layer at the initiator invokes the Send_Control 2747 Operational Primitive to request the iSER Layer at the initiator to 2748 process a SCSI Command, zero, one, or two STags may be allocated by 2749 the iSER Layer. See section 7.3.1 for details. The number of STags 2750 allocated depends on whether the command is unidirectional or 2751 bidirectional and whether solicited write data transfer is involved 2752 or not. 2754 When the iSCSI Layer at the initiator invokes the Send_Control 2755 Operational Primitive to request the iSER Layer at the initiator to 2756 process a Task Management Function Request with the TASK REASSIGN 2757 function, besides allocating zero, one, or two STags, the iSER Layer 2758 MUST invalidate the existing STags, if any, associated with the ITT. 2759 See section 7.3.3 for details. 2761 The iSER Layer at the target allocates a local Data Sink STag when 2762 the iSCSI Layer at the target invokes the Get_Data Operational 2763 Primitive to request the iSER Layer to process an R2T PDU. See 2764 section 7.3.6 for details. 
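As an illustration of how the number of STags allocated for a SCSI Command follows from the command type, the sketch below derives which STags the initiator would Advertise. It is a simplification under stated assumptions: the function name and its boolean parameters are hypothetical, and the authoritative rules are those of section 7.3.1.

```python
def stags_for_command(is_read, is_write, has_solicited_write_data):
    """Return (need_write_stag, need_read_stag) for a SCSI Command.

    Assumption: a Write STag is needed only when solicited write data is
    involved, and a Read STag is needed whenever the command reads data.
    A bidirectional command sets both is_read and is_write.
    """
    need_write_stag = is_write and has_solicited_write_data
    need_read_stag = is_read
    return need_write_stag, need_read_stag


def stag_count(is_read, is_write, has_solicited_write_data):
    # Zero, one, or two STags, depending on the flags above.
    return sum(stags_for_command(is_read, is_write, has_solicited_write_data))
```

For example, a write whose data is entirely immediate or unsolicited yields zero STags, while a bidirectional command with solicited write data yields two.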
2766 8.3.2 Invalidation of STags 2768 The invalidation of the STags at the initiator at the completion of 2769 a unidirectional or bidirectional command when the associated SCSI 2770 Response PDU is sent by the target is described in section 7.3.2. 2772 When a unidirectional or bidirectional command concludes without the 2773 associated SCSI Response PDU being sent by the target, the iSCSI 2774 Layer at the initiator MUST request the iSER Layer at the initiator 2775 to invalidate the STags by invoking the Deallocate_Task_Resources 2776 Operational Primitive qualified with ITT. In response, the iSER 2777 Layer at the initiator MUST locate the STag(s) (if any) in the Local 2778 Mapping that associates the ITT to the local STag(s). The iSER 2779 Layer at the initiator MUST invalidate the STag(s) (if any) and the 2780 Local Mapping. 2782 For an RDMA Read operation used to realize a SCSI Write data 2783 transfer, the iSER Layer at the target SHOULD invalidate the Data 2784 Sink STag at the conclusion of the RDMA Read operation referencing 2785 the Data Sink STag (to permit the immediate reuse of buffer 2786 resources). 2788 Ko et al. Expires May 2007 63 2789 For an RDMA Write operation used to realize a SCSI Read data 2790 transfer, the Data Source STag at the target is not declared to the 2791 initiator and is not exposed on the wire. Invalidation of the STag 2792 is thus not specified. 2794 When a unidirectional or bidirectional command concludes without the 2795 associated SCSI Response PDU being sent by the target, the iSCSI 2796 Layer at the target MUST request the iSER Layer at the target to 2797 invalidate the STags by invoking the Deallocate_Task_Resources 2798 Operational Primitive qualified with ITT. In response, the iSER 2799 Layer at the target MUST locate the local STag(s) (if any) in the 2800 Local Mapping that associates the ITT to the local STag(s). The 2801 iSER Layer at the target MUST invalidate the local STag(s) (if any) 2802 and the mapping. 
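The Local Mapping lookup and invalidation performed for Deallocate_Task_Resources can be sketched as follows. This is an illustrative model only: the class, its methods, and the invalidate callback standing in for the RCaP layer are all hypothetical names, not an interface defined by this document.

```python
class LocalMapping:
    """Hypothetical model of the Local Mapping from ITT to local STag(s)."""

    def __init__(self, invalidate):
        # invalidate: assumed callback that asks the RCaP layer to
        # invalidate one STag.
        self._invalidate = invalidate
        self._by_itt = {}

    def record(self, itt, stags):
        # Remember the STag(s) allocated for the task identified by ITT.
        self._by_itt[itt] = list(stags)

    def deallocate_task_resources(self, itt):
        # Locate the STag(s), if any, then invalidate them and the mapping.
        stags = self._by_itt.pop(itt, [])
        for stag in stags:
            self._invalidate(stag)
        return stags
```

Removing the entry with pop() makes a second invocation for the same ITT a harmless no-op, which matches the "(if any)" wording of the text.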
2804 Ko et al. Expires May 2007 64 2805 9 iSER Control and Data Transfer 2807 For iSCSI data-type PDUs (see section 7.1), the iSER Layer uses RDMA 2808 Read and RDMA Write operations to transfer the solicited data. For 2809 iSCSI control-type PDUs (see section 7.2), the iSER Layer uses Send 2810 Message Types of RCaP. 2812 9.1 iSER Header Format 2814 An iSER header MUST be present in every Send Message Type of RCaP. 2815 The iSER header is located in the first 12 bytes of the message 2816 payload of the Send Message Type of RCaP, as shown in Figure 2. 2818 0 1 2 3 2819 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2820 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2821 | Opcode| Opcode Specific Fields | 2822 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2823 | Opcode Specific Fields | 2824 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2825 | Opcode Specific Fields | 2826 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2827 Figure 2 iSER Header Format 2829 Opcode - Operation Code: 4 bits 2831 The Opcode field identifies the type of iSER Messages: 2833 0001b = iSCSI control-type PDU 2835 0010b = iSER Hello Message 2837 0011b = iSER HelloReply Message 2839 All other opcodes are reserved. 2841 9.2 iSER Header Format for iSCSI Control-Type PDU 2843 The iSER Layer uses Send Message Types of RCaP to transfer iSCSI 2844 control-type PDUs (see section 7.2). The message payload of each of 2845 the Send Message Types of RCaP used for transferring an iSER Message 2846 contains an iSER Header followed by an iSCSI control-type PDU. 2848 The iSER header in a Send Message Type of RCaP carrying an iSCSI 2849 control-type PDU MUST have the format as described in Figure 3. 2851 Ko et al. 
Expires May 2007 65 2852 0 1 2 3 2853 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2854 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2855 | |W|R| | 2856 | 0001b |S|S| Reserved | 2857 | |V|V| | 2858 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2859 | Write STag (or N/A) | 2860 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2861 | Read STag (or N/A) | 2862 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2863 Figure 3 iSER Header Format for iSCSI Control-Type PDU 2865 WSV - Write STag Valid flag: 1 bit 2867 This flag indicates the validity of the Write STag field of the 2868 iSER Header. If set to one, the Write STag field in this iSER 2869 Header is valid. If set to zero, the Write STag field in this 2870 iSER Header MUST be ignored at the receiver. The Write STag 2871 Valid flag is set to one when there is solicited data to be 2872 transferred for a SCSI Write or bidirectional command, or when 2873 there are non-immediate unsolicited and solicited data to be 2874 transferred for the referenced task specified in a Task 2875 Management Function Request with the TASK REASSIGN function. 2877 RSV - Read STag Valid flag: 1 bit 2879 This flag indicates the validity of the Read STag field of the 2880 iSER Header. If set to one, the Read STag field in this iSER 2881 Header is valid. If set to zero, the Read STag field in this 2882 iSER Header MUST be ignored at the receiver. The Read STag 2883 Valid flag is set to one for a SCSI Read or bidirectional 2884 command, or a Task Management Function Request with the TASK 2885 REASSIGN function. 2887 Write STag - Write Steering Tag: 32 bits 2889 This field contains the Write STag when the Write STag Valid 2890 flag is set to one. For a SCSI Write or bidirectional command, 2891 the Write STag is used to Advertise the initiator's I/O Buffer 2892 containing the solicited data. 
For a Task Management Function 2893 Request with the TASK REASSIGN function, the Write STag is used 2894 to Advertise the initiator's I/O Buffer containing the non- 2895 immediate unsolicited data and solicited data. This Write STag 2896 is used as the Data Source STag in the resultant RDMA Read 2897 operation(s). When the Write STag Valid flag is set to zero, 2898 this field MUST be set to zero. 2900 Ko et al. Expires May 2007 66 2901 Read STag - Read Steering Tag: 32 bits 2903 This field contains the Read STag when the Read STag Valid flag 2904 is set to one. The Read STag is used to Advertise the 2905 initiator's Read I/O Buffer of a SCSI Read or bidirectional 2906 command, or a Task Management Function Request with the TASK 2907 REASSIGN function. This Read STag is used as the Data Sink 2908 STag in the resultant RDMA Write operation(s). When the Read 2909 STag Valid flag is zero, this field MUST be set to zero. 2911 Reserved: 2913 Reserved fields MUST be set to zero on transmit and MUST be 2914 ignored on receive. 2916 9.3 iSER Header Format for iSER Hello Message 2918 An iSER Hello Message MUST only contain the iSER header which MUST 2919 have the format as described in Figure 4. iSER Hello Message is the 2920 first iSER Message sent on the RCaP Stream from the iSER Layer at 2921 the initiator to the iSER Layer at the target. 
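The iSER header for an iSCSI control-type PDU (section 9.2, Figure 3) can be encoded and decoded as in the sketch below. The function and constant names are hypothetical. The sketch assumes network byte order, with bit 0 of Figure 3 being the most significant bit of the first byte.

```python
import struct

ISER_OPCODE_CONTROL = 0x1  # 0001b, iSCSI control-type PDU
_WSV = 0x08                # Write STag Valid flag (bit 4 of Figure 3)
_RSV = 0x04                # Read STag Valid flag (bit 5 of Figure 3)
_FORMAT = "!B3xII"         # first byte, 3 reserved bytes, two 32-bit STags


def pack_control_header(write_stag=None, read_stag=None):
    """Build the 12-byte header; None means the STag is not Advertised."""
    first = ISER_OPCODE_CONTROL << 4
    if write_stag is not None:
        first |= _WSV
    if read_stag is not None:
        first |= _RSV
    # An STag field whose valid flag is zero is set to zero.
    return struct.pack(_FORMAT, first, write_stag or 0, read_stag or 0)


def unpack_control_header(buf):
    """Return (write_stag, read_stag), with None for an invalid field."""
    first, write_stag, read_stag = struct.unpack(_FORMAT, buf[:12])
    if first >> 4 != ISER_OPCODE_CONTROL:
        raise ValueError("not an iSCSI control-type iSER header")
    # Fields whose valid flag is zero are ignored at the receiver.
    return (write_stag if first & _WSV else None,
            read_stag if first & _RSV else None)
```

Reserved bits are written as zero by the pad bytes in the format string and are never inspected on receive.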
2923 0 1 2 3 2924 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2925 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2926 | | | | | | 2927 | 0010b | Rsvd | MaxVer| MinVer| iSER-IRD | 2928 | | | | | | 2929 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2930 | Reserved | 2931 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2932 | Reserved | 2933 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2934 Figure 4 iSER Header Format for iSER Hello Message 2936 MaxVer - Maximum Version: 4 bits 2938 This field specifies the maximum version of the iSER protocol 2939 supported. It MUST be set to 1 to indicate the version of the 2940 specification described in this document. 2942 MinVer - Minimum Version: 4 bits 2944 This field specifies the minimum version of the iSER protocol 2945 supported. It MUST be set to 1 to indicate the version of the 2946 specification described in this document. 2948 Ko et al. Expires May 2007 67 2949 iSER-IRD: 16 bits 2951 This field contains the value of the iSER-IRD at the initiator. 2953 Reserved (Rsvd): 2955 Reserved fields MUST be set to zero on transmit, and MUST be 2956 ignored on receive. 2958 9.4 iSER Header Format for iSER HelloReply Message 2960 An iSER HelloReply Message MUST only contain the iSER header which 2961 MUST have the format as described in Figure 5. The iSER HelloReply 2962 Message is the first iSER Message sent on the RCaP Stream from the 2963 iSER Layer at the target to the iSER Layer at the initiator. 
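The iSER Hello Message header of section 9.3 (Figure 4) can be sketched in the same way. The names are hypothetical, and the sketch again assumes network byte order with bit 0 of the figure as the most significant bit of the first byte.

```python
import struct

ISER_OPCODE_HELLO = 0x2  # 0010b, iSER Hello Message
ISER_VERSION = 1         # version described in this document
_FORMAT = "!BBH8x"       # opcode/Rsvd, MaxVer/MinVer, iSER-IRD, 8 reserved


def pack_hello(iser_ird):
    """Build the 12-byte iSER Hello Message (header only, no payload)."""
    first = ISER_OPCODE_HELLO << 4               # Rsvd nibble left at zero
    versions = (ISER_VERSION << 4) | ISER_VERSION  # MaxVer, then MinVer
    return struct.pack(_FORMAT, first, versions, iser_ird)


def unpack_hello(buf):
    """Parse a Hello Message; a wrong length is treated as a format error."""
    if len(buf) != 12:
        raise ValueError("length error in iSER Hello Message")
    first, versions, iser_ird = struct.unpack(_FORMAT, buf)
    if first >> 4 != ISER_OPCODE_HELLO:
        raise ValueError("not an iSER Hello Message")
    return {"max_ver": versions >> 4,
            "min_ver": versions & 0x0F,
            "iser_ird": iser_ird}
```

With an iSER-IRD of 16, the first four bytes on the wire would be 20 11 00 10 in hexadecimal, followed by eight zero bytes.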
2965 0 1 2 3 2966 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2967 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2968 | | |R| | | | 2969 | 0011b |Rsvd |E| MaxVer| CurVer| iSER-ORD | 2970 | | |J| | | | 2971 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2972 | Reserved | 2973 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2974 | Reserved | 2975 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2976 Figure 5 iSER Header Format for iSER HelloReply Message 2978 REJ - Reject flag: 1 bit 2980 This flag indicates whether the target is rejecting this 2981 connection. If set to one, the target is rejecting the 2982 connection. 2984 MaxVer - Maximum Version: 4 bits 2986 This field specifies the maximum version of the iSER protocol 2987 supported. It MUST be set to 1 to indicate the version of the 2988 specification described in this document. 2990 CurVer - Current Version: 4 bits 2992 This field specifies the current version of the iSER protocol 2993 supported. It MUST be set to 1 to indicate the version of the 2994 specification described in this document. 2996 Ko et al. Expires May 2007 68 2997 iSER-ORD: 16 bits 2999 This field contains the value of the iSER-ORD at the target. 3001 Reserved (Rsvd): 3003 Reserved fields MUST be set to zero on transmit, and MUST be 3004 ignored on receive. 3006 9.5 SCSI Data Transfer Operations 3008 The iSER Layer at the initiator and the iSER Layer at the target 3009 handle each SCSI Write, SCSI Read, and bidirectional operation as 3010 described below. 3012 9.5.1 SCSI Write Operation 3014 The iSCSI Layer at the initiator MUST invoke the Send_Control 3015 Operational Primitive to request the iSER Layer at the initiator to 3016 send the SCSI Write Command. 
The iSER Layer at the initiator MUST 3017 request the RCaP layer to transmit a SendSE Message with the message 3018 payload consisting of the iSER header followed by the SCSI Command 3019 PDU and immediate data (if any). If there is solicited data, the 3020 iSER Layer MUST Advertise the Write STag in the iSER header of the 3021 SendSE Message, as described in section 9.2. Upon receiving the 3022 SendSE Message, the iSER Layer at the target MUST notify the iSCSI 3023 Layer at the target by invoking the Control_Notify Operational 3024 Primitive qualified with the SCSI Command PDU. See section 7.3.1 3025 for details on the handling of the SCSI Write Command. 3027 For the non-immediate unsolicited data, the iSCSI Layer at the 3028 initiator MUST invoke a Send_Control Operational Primitive qualified 3029 with the SCSI Data-out PDU. Upon receiving each Send or SendSE 3030 Message containing the non-immediate unsolicited data, the iSER 3031 Layer at the target MUST notify the iSCSI Layer at the target by 3032 invoking the Control_Notify Operational Primitive qualified with the 3033 SCSI Data-out PDU. See section 7.3.4 for details on the handling of 3034 the SCSI Data-out PDU. 3036 For the solicited data, when the iSCSI Layer at the target has an 3037 I/O Buffer available, it MUST invoke the Get_Data Operational 3038 Primitive qualified with the R2T PDU. See section 7.3.6 for details 3039 on the handling of the R2T PDU. 3041 When the data transfer associated with this SCSI Write operation is 3042 complete, the iSCSI Layer at the target MUST invoke the Send_Control 3043 Operational Primitive when it is ready to send the SCSI Response 3045 Ko et al. Expires May 2007 69 3046 PDU. Upon receiving a SendSE or SendInvSE Message containing the 3047 SCSI Response PDU, the iSER Layer at the initiator MUST notify the 3048 iSCSI Layer at the initiator by invoking the Control_Notify 3049 Operational Primitive qualified with the SCSI Response PDU. 
See 3050 section 7.3.2 for details on the handling of the SCSI Response PDU. 3052 9.5.2 SCSI Read Operation 3054 The iSCSI Layer at the initiator MUST invoke the Send_Control 3055 Operational Primitive to request the iSER Layer at the initiator to 3056 send the SCSI Read Command. The iSER Layer at the initiator MUST 3057 request the RCaP layer to transmit a SendSE Message with the message 3058 payload consisting of the iSER header followed by the SCSI Command 3059 PDU. The iSER Layer at the initiator MUST Advertise the Read STag 3060 in the iSER header of the SendSE Message, as described in section 3061 9.2. Upon receiving the SendSE Message, the iSER Layer at the 3062 target MUST notify the iSCSI Layer at the target by invoking the 3063 Control_Notify Operational Primitive qualified with the SCSI Command 3064 PDU. See section 7.3.1 for details on the handling of the SCSI Read 3065 Command. 3067 When the requested SCSI data is available in the I/O Buffer, the 3068 iSCSI Layer at the target MUST invoke the Put_Data Operational 3069 Primitive qualified with the SCSI Data-in PDU. See section 7.3.5 3070 for details on the handling of the SCSI Data-in PDU. 3072 When the data transfer associated with this SCSI Read operation is 3073 complete, the iSCSI Layer at the target MUST invoke the Send_Control 3074 Operational Primitive when it is ready to send the SCSI Response 3075 PDU. Upon receiving the SendInvSE Message containing the SCSI 3076 Response PDU, the iSER Layer at the initiator MUST notify the iSCSI 3077 Layer at the initiator by invoking the Control_Notify Operational 3078 Primitive qualified with the SCSI Response PDU. See section 7.3.2 3079 for details on the handling of the SCSI Response PDU. 3081 9.5.3 Bidirectional Operation 3083 The initiator and the target handle the SCSI Write and the SCSI Read 3084 portions of this bidirectional operation the same as described in 3085 Section 9.5.1 and Section 9.5.2 respectively. 3087 Ko et al. 
Expires May 2007 70 3088 10 iSER Error Handling and Recovery 3090 RCaP provides the iSER Layer with reliable in-order delivery. 3091 Therefore, the error management needs of an iSER-assisted connection 3092 are somewhat different than those of a Traditional iSCSI connection. 3094 10.1 Error Handling 3096 iSER error handling is described in the following sections, 3097 classified loosely based on the sources of errors: 3099 1. Those originating at the transport layer (e.g., TCP). 3101 2. Those originating at the RCaP layer. 3103 3. Those originating at the iSER Layer. 3105 4. Those originating at the iSCSI Layer. 3107 10.1.1 Errors in the Transport Layer 3109 If the transport layer is TCP, then TCP packets with detected errors 3110 are silently dropped by the TCP layer and result in retransmission 3111 at the TCP layer. This has no impact on the iSER Layer. However, 3112 connection loss (e.g., link failure) and unexpected termination 3113 (e.g., TCP graceful or abnormal close without the iSCSI Logout 3114 exchanges) at the transport layer will cause the iSCSI/iSER 3115 connection to be terminated as well. 3117 10.1.1.1 Failure in the Transport Layer Before RCaP Mode is Enabled 3119 If the Connection is lost or terminated before the iSCSI Layer 3120 invokes the Allocate_Connection_Resources Operational Primitive, the 3121 login process is terminated and no further action is required. 3123 If the Connection is lost or terminated after the iSCSI Layer has 3124 invoked the Allocate_Connection_Resources Operational Primitive, 3125 then the iSCSI Layer MUST request the iSER Layer to deallocate all 3126 connection resources by invoking the Deallocate_Connection_Resources 3127 Operational Primitive. 
3129 10.1.1.2 Failure in the Transport Layer After RCaP Mode is Enabled 3131 If the Connection is lost or terminated after the iSCSI Layer has 3132 invoked the Enable_Datamover Operational Primitive, the iSER Layer 3133 MUST notify the iSCSI Layer of the connection loss by invoking the 3134 Connection_Terminate_Notify Operational Primitive. Prior to 3136 Ko et al. Expires May 2007 71 3137 invoking the Connection_Terminate_Notify Operational Primitive, the 3138 iSER layer MUST perform the actions described in Section 5.2.3.2. 3140 10.1.2 Errors in the RCaP Layer 3142 The RCaP layer does not have error recovery operations built in. If 3143 errors are detected at the RCaP layer, the RCaP layer will terminate 3144 the RCaP Stream and the associated Connection. 3146 10.1.2.1 Errors Detected in the Local RCaP Layer 3148 If an error is encountered at the local RCaP layer, the RCaP layer 3149 MAY send a Terminate Message to the Remote Peer to report the error 3150 if possible. (For iWARP, see [RDMAP] for the list of errors where a 3151 Terminate Message is sent.) The RCaP layer is responsible for 3152 terminating the Connection. After the RCaP layer notifies the iSER 3153 Layer that the Connection is terminated, the iSER Layer MUST notify 3154 the iSCSI Layer by invoking the Connection_Terminate_Notify 3155 Operational Primitive. Prior to invoking the 3156 Connection_Terminate_Notify Operational Primitive, the iSER layer 3157 MUST perform the actions described in Section 5.2.3.2. 3159 10.1.2.2 Errors Detected in the RCaP Layer at the Remote Peer 3161 If an error is encountered at the RCaP layer at the Remote Peer, the 3162 RCaP layer at the Remote Peer may send a Terminate Message to report 3163 the error if possible. If it is unable to send the Terminate 3164 Message, the Connection is terminated. This is treated the same as 3165 a failure in the transport layer after RDMA is enabled as described 3166 in section 10.1.1.2. 
3168 If an error is encountered at the RCaP layer at the Remote Peer and 3169 it is able to send a Terminate Message, the RCaP layer at the Remote 3170 Peer is responsible for terminating the connection. After the local 3171 RCaP layer notifies the iSER Layer that the Connection is 3172 terminated, the iSER Layer MUST notify the iSCSI Layer by invoking 3173 the Connection_Terminate_Notify Operational Primitive. Prior to 3174 invoking the Connection_Terminate_Notify Operational Primitive, the 3175 iSER layer MUST perform the actions described in Section 5.2.3.2. 3177 10.1.3 Errors in the iSER Layer 3179 The error handling due to errors at the iSER Layer is described in 3180 the following sections. 3182 Ko et al. Expires May 2007 72 3183 10.1.3.1 Insufficient Connection Resources to Support RCaP at 3184 Connection Setup 3186 After the iSCSI Layer at the initiator invokes the 3187 Allocate_Connection_Resources Operational Primitive during the iSCSI 3188 login negotiation phase, if the iSER Layer at the initiator fails to 3189 allocate the connection resources necessary to support RCaP, it MUST 3190 return a status of failure to the iSCSI Layer at the initiator. The 3191 iSCSI Layer at the initiator MUST terminate the Connection as 3192 described in Section 5.2.3.1. 3194 After the iSCSI Layer at the target invokes the 3195 Allocate_Connection_Resources Operational Primitive during the iSCSI 3196 login negotiation phase, if the iSER Layer at the target fails to 3197 allocate the connection resources necessary to support RCaP, it MUST 3198 return a status of failure to the iSCSI Layer at the target. The 3199 iSCSI Layer at the target MUST send a Login Response with a status 3200 class of 3 (Target Error), and a status code of "0302" (Out of 3201 Resources). The iSCSI Layers at the initiator and the target MUST 3202 terminate the Connection as described in Section 5.2.3.1. 
3204 10.1.3.2 iSER Negotiation Failures 3206 If the RCaP or iSER related parameters declared by the initiator in 3207 the iSER Hello Message are unacceptable to the iSER Layer at the 3208 target, the iSER Layer at the target MUST set the Reject (REJ) flag, 3209 as described in section 9.4, in the iSER HelloReply Message. The 3210 following are the cases when the iSER Layer MUST set the REJ flag to 3211 1 in the HelloReply Message: 3213 * The initiator-declared iSER-IRD value is greater than 0 and the 3214 target-declared iSER-ORD value is 0. 3216 * The initiator-supported and the target-supported iSER protocol 3217 versions do not overlap. 3219 After requesting the RCaP layer to send the iSER HelloReply Message, 3220 the handling of the error situation is the same as that for iSER 3221 format errors as described in section 10.1.3.3. 3223 10.1.3.3 iSER Format Errors 3225 The following types of errors in an iSER header are considered 3226 format errors: 3228 * Illegal contents of any iSER header field 3230 Ko et al. Expires May 2007 73 3231 * Inconsistent field contents in an iSER header 3233 * Length error for an iSER Hello or HelloReply Message (see sections 3234 9.3 and 9.4) 3236 When a format error is detected, the following events MUST occur in 3237 the specified sequence: 3239 1. The iSER Layer MUST request the RCaP layer to terminate the RCaP 3240 Stream. The RCaP layer MUST terminate the associated 3241 Connection. 3243 2. The iSER Layer MUST notify the iSCSI Layer of the connection 3244 termination by invoking the Connection_Terminate_Notify 3245 Operational Primitive. Prior to invoking the 3246 Connection_Terminate_Notify Operational Primitive, the iSER 3247 layer MUST perform the actions described in Section 5.2.3.2. 3249 10.1.3.4 iSER Protocol Errors 3251 The first iSER Message sent by the iSER Layer at the initiator after 3252 transitioning into iSER-assisted mode MUST be the iSER Hello Message 3253 (see section 9.3).
Likewise, the first iSER Message sent by the 3254 iSER Layer at the target after transitioning into iSER-assisted mode 3255 MUST be the iSER HelloReply Message (see section 9.4). Failure to 3256 send the iSER Hello or HelloReply Message, as indicated by the wrong 3257 Opcode in the iSER header, is a protocol error. The handling of 3258 this error situation is the same as that for iSER format errors as 3259 described in section 10.1.3.3. 3261 If the sending side of an iSER-enabled connection acts in a manner 3262 not permitted by the negotiated or declared login/text operational 3263 key values as described in section 6, this is a protocol error and 3264 the receiving side MAY handle this the same as for iSER format 3265 errors as described in section 10.1.3.3. 3267 10.1.4 Errors in the iSCSI Layer 3269 The error handling due to errors at the iSCSI Layer is described in 3270 the following sections. For error recovery, see section 10.2. 3272 10.1.4.1 iSCSI Format Errors 3274 When an iSCSI format error is detected, the iSCSI Layer MUST request 3275 the iSER Layer to terminate the RCaP Stream by invoking the 3276 Connection_Terminate Operational Primitive. For more details on the 3277 connection termination, see Section 5.2.3.1. 3279 Ko et al. Expires May 2007 74 3280 10.1.4.2 iSCSI Digest Errors 3282 In the iSER-assisted mode, the iSCSI Layer will not see any digest 3283 error because both the HeaderDigest and the DataDigest keys are 3284 negotiated to "None". 3286 10.1.4.3 iSCSI Sequence Errors 3288 For Traditional iSCSI, sequence errors are caused by dropped PDUs 3289 due to header or data digest errors. Since digests are not used in 3290 iSER-assisted mode and the RCaP layer will deliver all messages in 3291 the order they were sent, sequence errors will not occur in iSER- 3292 assisted mode. 
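The target-side REJ decision of section 10.1.3.2 and the first-message check of section 10.1.3.4 can be sketched as follows. The function and parameter names are hypothetical. Treating the target's supported versions as a minimum/maximum range is an assumption made for illustration, since the HelloReply Message itself carries only MaxVer and CurVer.

```python
ISER_OPCODE_HELLO = 0x2        # 0010b
ISER_OPCODE_HELLO_REPLY = 0x3  # 0011b


def hello_reply_rejects(initiator_ird, target_ord,
                        initiator_min_ver, initiator_max_ver,
                        target_min_ver, target_max_ver):
    """Return True when the target must set REJ to 1 in the HelloReply."""
    # Case 1: initiator-declared iSER-IRD > 0 but target iSER-ORD is 0.
    no_read_resources = initiator_ird > 0 and target_ord == 0
    # Case 2: the supported version ranges do not overlap.
    no_common_version = (initiator_max_ver < target_min_ver or
                         target_max_ver < initiator_min_ver)
    return no_read_resources or no_common_version


def first_message_is_valid(opcode, sent_by_initiator):
    """Check the Opcode of the first iSER Message on the RCaP Stream."""
    expected = ISER_OPCODE_HELLO if sent_by_initiator else ISER_OPCODE_HELLO_REPLY
    # Any other Opcode in the first message is a protocol error.
    return opcode == expected
```

In both failure cases the subsequent handling is that of an iSER format error: terminate the RCaP Stream, then notify the iSCSI Layer.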
3294 10.1.4.4 iSCSI Protocol Error 3296 When the iSCSI Layer handles certain protocol errors by dropping the 3297 connection, the error handling is the same as that for iSCSI format 3298 errors as described in section 10.1.4.1. 3300 When the iSCSI Layer uses the iSCSI Reject PDU and response codes to 3301 handle certain other protocol errors, no special handling at the 3302 iSER Layer is required. 3304 10.1.4.5 SCSI Timeouts and Session Errors 3306 This is handled at the iSCSI Layer and no special handling at the 3307 iSER Layer is required. 3309 10.1.4.6 iSCSI Negotiation Failures 3311 For negotiation failures that happen during the Login Phase at the 3312 initiator after the iSCSI Layer has invoked the 3313 Allocate_Connection_Resources Operational Primitive and before the 3314 Enable_Datamover Operational Primitive has been invoked, the iSCSI 3315 Layer MUST request the iSER Layer to deallocate all connection 3316 resources by invoking the Deallocate_Connection_Resources 3317 Operational Primitive. The iSCSI Layer at the initiator MUST 3318 terminate the Connection. 3320 For negotiation failures during the Login Phase at the target, the 3321 iSCSI Layer can use a Login Response with a status class other than 3322 0 (success) to terminate the Login Phase. If the iSCSI Layer has 3323 invoked the Allocate_Connection_Resources Operational Primitive and 3324 before the Enable_Datamover Operational Primitive has been invoked, 3325 the iSCSI Layer at the target MUST request the iSER Layer at the 3326 target to deallocate all connection resources by invoking the 3328 Ko et al. Expires May 2007 75 3329 Deallocate_Connection_Resources Operational Primitive. The iSCSI 3330 Layer at both the initiator and the target MUST terminate the 3331 Connection. 
3333 During the iSCSI Login Phase, if the iSCSI Layer at the initiator 3334 receives a Login Response from the target with a status class other 3335 than 0 (Success) after the iSCSI Layer at the initiator has invoked 3336 the Allocate_Connection_Resources Operational Primitive, the iSCSI 3337 Layer MUST request the iSER Layer to deallocate all connection 3338 resources by invoking the Deallocate_Connection_Resources 3339 Operational Primitive. The iSCSI Layer MUST terminate the 3340 Connection in this case. 3342 For negotiation failures during the full feature phase, the error 3343 handling is left to the iSCSI Layer and no special handling at the 3344 iSER Layer is required. 3346 10.2 Error Recovery 3348 Error recovery requirements of iSCSI/iSER are the same as those of 3349 Traditional iSCSI. All three ErrorRecoveryLevels as defined in 3350 [RFC3720] are supported in iSCSI/iSER. 3352 * For ErrorRecoveryLevel 0, session recovery is handled by iSCSI 3353 and no special handling by the iSER Layer is required. 3355 * For ErrorRecoveryLevel 1, see section 10.2.1 on PDU Recovery. 3357 * For ErrorRecoveryLevel 2, see section 10.2.2 on Connection 3358 Recovery. 3360 The iSCSI Layer may invoke the Notice_Key_Values Operational 3361 Primitive during connection setup to request the iSER Layer to take 3362 note of the value of the operational ErrorRecoveryLevel, as 3363 described in sections 5.1.1 and 5.1.2. 3365 10.2.1 PDU Recovery 3367 As described in sections 10.1.4.2 and 10.1.4.3, digest and sequence 3368 errors will not occur in the iSER-assisted mode. If the RCaP layer 3369 detects an error, it will close the iSCSI/iSER connection, as 3370 described in section 10.1.2. Therefore, PDU recovery is not useful 3371 in the iSER-assisted mode. 3373 The iSCSI Layer at the initiator SHOULD disable iSCSI timeout-driven 3374 PDU retransmissions. 3376 Ko et al.
Expires May 2007 76 3377 10.2.2 Connection Recovery 3379 The iSCSI Layer at the initiator MAY reassign connection allegiance 3380 for non-immediate commands which are still in progress and are 3381 associated with the failed connection by using a Task Management 3382 Function Request with the TASK REASSIGN function. See section 7.3.3 3383 for more details. 3385 When the iSCSI Layer at the initiator performs a task reassignment for a 3386 SCSI Write command, it MUST qualify the Send_Control Operational 3387 Primitive invocation with DataDescriptorOut which defines the I/O 3388 Buffer for both the non-immediate unsolicited data and the solicited 3389 data. This allows the iSCSI Layer at the target to use recovery 3390 R2Ts to request the data originally sent as unsolicited and 3391 solicited from the initiator. 3393 When the iSCSI Layer at the target accepts a reassignment request 3394 for a SCSI Read command, it MUST request the iSER Layer to process 3395 SCSI Data-in for all unacknowledged data by invoking the Put_Data 3396 Operational Primitive. See section 7.3.5 on the handling of SCSI 3397 Data-in. 3399 When the iSCSI Layer at the target accepts a reassignment request 3400 for a SCSI Write command, it MUST request the iSER Layer to process 3401 a recovery R2T for any non-immediate unsolicited data and any 3402 solicited data sequences that have not been received by invoking the 3403 Get_Data Operational Primitive. See section 7.3.6 on the handling 3404 of Ready To Transfer (R2T). 3406 The iSCSI Layer at the target MUST NOT issue recovery R2Ts on an 3407 iSCSI/iSER connection for a task for which the connection allegiance 3408 was never reassigned. The iSER Layer at the target MAY reject such 3409 a recovery R2T received via the Get_Data Operational Primitive 3410 invocation from the iSCSI Layer at the target, with an appropriate 3411 error code.
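The optional rejection of a recovery R2T for a task whose allegiance was never reassigned can be sketched as a guard in front of Get_Data processing. This is a hypothetical illustration: the class, the status strings, and the way a recovery R2T is identified are assumptions, and the document does not define the error code.

```python
class TargetTask:
    """Hypothetical per-task state kept by the iSER Layer at the target."""

    def __init__(self, itt):
        self.itt = itt
        # Set when a TASK REASSIGN for this ITT has been accepted.
        self.allegiance_reassigned = False


def get_data(task, is_recovery_r2t):
    """Model of a Get_Data invocation; returns a status string."""
    if is_recovery_r2t and not task.allegiance_reassigned:
        # The iSER Layer MAY reject this request; the value returned here
        # is a placeholder for "an appropriate error code".
        return "rejected"
    # Otherwise the R2T is processed as for the original command.
    return "accepted"
```

Because the rejection is a MAY, an implementation could equally omit the guard and rely on the iSCSI Layer never issuing such an R2T.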

   The iSER Layer at the target will process the requests invoked by
   the Put_Data and Get_Data Operational Primitives for a reassigned
   task in the same way as for the original commands.

11 Security Considerations

   When iSER is layered on top of an RCaP layer and provides the RDMA
   extensions to the iSCSI protocol, the security considerations of
   iSER are the same as those of the underlying RCaP layer. For iWARP,
   these are described in [RDMAP] and [RDDPSEC].

   Since iSER-assisted iSCSI protocol is still functionally iSCSI from
   a security considerations perspective, all of the iSCSI security
   requirements as described in [RFC3720] and [RFC3723] apply. If the
   IPsec mechanism is used, then it MUST be established before the
   connection transitions to the iSER-assisted mode. If iSER is
   layered on top of a non-IP based RCaP layer, all the security
   protocol mechanisms applicable to that RCaP layer are also
   applicable to an iSCSI/iSER connection. If iSER is layered on top
   of a non-IP protocol, the IPsec mechanism as specified in [RFC3720]
   MUST be implemented at any point where the iSER protocol enters the
   IP network (e.g., via gateways), and the non-IP protocol SHOULD
   implement (optional to use) a packet-by-packet security protocol
   equal in strength to the IPsec mechanism specified by [RFC3720].

   To minimize the potential for a denial of service attack, the iSCSI
   Layer MUST NOT request the iSER Layer to allocate the connection
   resources necessary to support RCaP until the iSCSI Layer is
   sufficiently far along in the iSCSI Login Phase that it is
   reasonably certain that the peer side is not an attacker, as
   described in sections 5.1.1 and 5.1.2.

12 IANA Considerations

   This document has no actions for IANA.

13 References

13.1 Normative References

   [RFC3720] J. Satran et al., "iSCSI", RFC 3720, April 2004.

   [RFC3723] B. Aboba et al., "Securing Block Storage Protocols over
      IP", RFC 3723, April 2004.

   [RDMAP] R. Recio et al., "An RDMA Protocol Specification", IETF
      Internet-draft draft-ietf-rddp-rdmap-07.txt (work in progress),
      September 2006.

   [DDP] H. Shah et al., "Direct Data Placement over Reliable
      Transports", IETF Internet-draft draft-ietf-rddp-ddp-07.txt
      (work in progress), September 2006.

   [MPA] P. Culley et al., "Marker PDU Aligned Framing for TCP
      Specification", IETF Internet-draft draft-ietf-rddp-mpa-08.txt
      (work in progress), October 2006.

   [RDDPSEC] J. Pinkerton et al., "DDP/RDMAP Security", IETF
      Internet-draft draft-ietf-rddp-security-10.txt (work in
      progress), June 2006.

   [TCP] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
      September 1981.

   [RFC2119] Bradner, S., "Key Words for use in RFCs to Indicate
      Requirement Levels", BCP 14, RFC 2119, March 1997.

13.2 Informative References

   [SAM2] T10/1157D, SCSI Architecture Model - 2 (SAM-2).

   [DA] M. Chadalapaka et al., "Datamover Architecture for iSCSI", IETF
      Internet-draft draft-ietf-ips-da-04.txt (work in progress),
      October 2006.

   [VERBS] J. Hilland et al., "RDMA Protocol Verbs Specification",
      RDMAC Consortium Draft Specification
      draft-hilland-iwarp-verbs-v1.0-RDMAC, April 2003.

   [IPSEC] S. Kent et al., "Security Architecture for the Internet
      Protocol", RFC 2401, November 1998.

   [IB] InfiniBand Architecture Specification Volume 1 Release 1.2,
      October 2004.

   [IPoIB] H.K. Chu et al., "Transmission of IP over InfiniBand", RFC
      4391, March 2006.

14 Appendix A

14.1 iWARP Message Format for iSER

   This section is for information only and is NOT part of the
   standard. It simply depicts the iWARP Message format for the
   various iSER Messages when the transport layer is TCP.

14.1.1 iWARP Message Format for iSER Hello Message

   The following figure depicts an iSER Hello Message encapsulated in
   an iWARP SendSE Message.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Send) Queue Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Send) Message Sequence Number                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     (Send) Message Offset                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0010b | Zeros | 0001b | 0001b |           iSER-IRD            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       Figure 6 SendSE Message containing an iSER Hello Message

14.1.2 iWARP Message Format for iSER HelloReply Message

   The following figure depicts an iSER HelloReply Message encapsulated
   in an iWARP SendSE Message. The Reject (REJ) flag is set to 0.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Send) Queue Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Send) Message Sequence Number                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     (Send) Message Offset                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0011b |Zeros|0| 0001b | 0001b |           iSER-ORD            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure 7 SendSE Message containing an iSER HelloReply Message

14.1.3 iWARP Message Format for SCSI Read Command PDU

   The following figure depicts a SCSI Read Command PDU embedded in an
   iSER Message encapsulated in an iWARP SendSE Message. For this
   particular example, in the iSER header, the Write STag Valid flag is
   set to zero, the Read STag Valid flag is set to one, the Write STag
   field is set to all zeros, and the Read STag field contains a valid
   Read STag.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Send) Queue Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Send) Message Sequence Number                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     (Send) Message Offset                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0001b |0|1|                     All zeros                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Read STag                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     SCSI Read Command PDU                     |
   //                                                             //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

      Figure 8 SendSE Message containing a SCSI Read Command PDU

14.1.4 iWARP Message Format for SCSI Read Data

   The following figure depicts an iWARP RDMA Write Message carrying
   SCSI Read data in the payload:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Data Sink STag                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Data Sink Tagged Offset                    |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        SCSI Read data                         |
   //                                                             //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 9 RDMA Write Message containing SCSI Read Data

14.1.5 iWARP Message Format for SCSI Write Command PDU

   The following figure depicts a SCSI Write Command PDU embedded in an
   iSER Message encapsulated in an iWARP SendSE Message. For this
   particular example, in the iSER header, the Write STag Valid flag is
   set to one, the Read STag Valid flag is set to zero, the Write STag
   field contains a valid Write STag, and the Read STag field is set to
   all zeros since it is not used.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Send) Queue Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Send) Message Sequence Number                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     (Send) Message Offset                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0001b |1|0|                     All zeros                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Write STag                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    SCSI Write Command PDU                     |
   //                                                             //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure 10 SendSE Message containing a SCSI Write Command PDU

14.1.6 iWARP Message Format for RDMA Read Request

   An iSCSI R2T is transformed into an iWARP RDMA Read Request Message.
   The following figure depicts an iWARP RDMA Read Request Message:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      Reserved (Not Used)                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             DDP (RDMA Read Request) Queue Number              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        DDP (RDMA Read Request) Message Sequence Number        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |            DDP (RDMA Read Request) Message Offset             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   Data Sink STag (SinkSTag)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +               Data Sink Tagged Offset (SinkTO)                +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               RDMA Read Message Size (RDMARDSZ)               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Data Source STag (SrcSTag)                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   +               Data Source Tagged Offset (SrcTO)               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                 Figure 11 RDMA Read Request Message

14.1.7 iWARP Message Format for Solicited SCSI Write Data

   The following figure depicts an iWARP RDMA Read Response Message
   carrying the solicited SCSI Write data in the payload:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Data Sink STag                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Data Sink Tagged Offset                    |
   +                                                               +
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        SCSI Write Data                        |
   //                                                             //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure 12 RDMA Read Response Message containing SCSI Write Data

14.1.8 iWARP Message Format for SCSI Response PDU

   The following figure depicts a SCSI Response PDU embedded in an iSER
   Message encapsulated in an iWARP SendInvSE Message:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Invalidate STag                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Send) Queue Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Send) Message Sequence Number                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     (Send) Message Offset                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0001b |0|0|                     All Zeros                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       SCSI Response PDU                       |
   //                                                             //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       Figure 13 SendInvSE Message containing SCSI Response PDU

15 Appendix B

15.1 Architectural discussion of iSER over InfiniBand

   This section explains how an InfiniBand network (with Gateways)
   would be structured. It is informational only and is intended to
   provide insight on how iSER is used in an InfiniBand environment.

15.2 The Host side of the iSCSI & iSER connections in InfiniBand

   Figure 14 defines the topologies in which iSCSI and iSER will be
   able to operate on an InfiniBand Network.

   +---------+ +---------+ +---------+ +---------+ +---------+
   |  Host   | |  Host   | |  Host   | |  Host   | |  Host   |
   |         | |         | |         | |         | |         |
   +---+-+---+ +---+-+---+ +---+-+---+ +---+-+---+ +---+-+---+
   |HCA| |HCA| |HCA| |HCA| |HCA| |HCA| |HCA| |HCA| |HCA| |HCA|
   +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+
     |----+------|-----+-----|-----+-----|-----+-----|-----+---> To IB
   IB|        IB |        IB |        IB |        IB |    SubNet2 SWTCH
   +-v-----------v-----------v-----------v-----------v---------+
   |               InfiniBand Switch for Subnet1               |
   +---+-----+--------+-----+--------+-----+------------v------+
       | TCA |        | TCA |        | TCA |            |
       +-----+        +-----+        +-----+            | IB
      /  IB   \      /  IB   \      /       \     +--+--v--+--+
     |  iSER   |    |  iSER   |    |  IPoIB  |    |  | TCA |  |
     | Gateway |    | Gateway |    | Gateway |    |  +-----+  |
     |   to    |    |   to    |    |   to    |    |  Storage  |
     |  iSCSI  |    |  iSER   |    |   IP    |    | Controller|
     |   TCP   |    |  iWARP  |    |Ethernet |    +-----+-----+
     +---v-----|    +---v-----|    +----v----+
         | EN           | EN            | EN
         +--------------+---------------+----> to IP based storage
          Ethernet links that carry iSCSI or iWARP

                   Figure 14 iSCSI and iSER on IB

   In Figure 14, the Host systems are connected via the InfiniBand Host
   Channel Adapters (HCAs) to the InfiniBand links. With the use of IB
   switch(es), the InfiniBand links connect the HCAs to InfiniBand
   Target Channel Adapters (TCAs) located in gateways or Storage
   Controllers. An iSER-capable IB-IP Gateway converts the iSER
   Messages encapsulated in IB protocols to either standard iSCSI, or
   iSER Messages for iWARP. An [IPoIB] Gateway converts the InfiniBand
   [IPoIB] protocol to IP protocol, and in the iSCSI case, permits
   iSCSI to be operated on an IB Network between the Hosts and the
   [IPoIB] Gateway.

15.3 The Storage side of iSCSI & iSER mixed network environment

   Figure 15 shows a storage controller that has three different portal
   groups: one supporting only iSCSI (TPG-4), one supporting iSER/iWARP
   or iSCSI (TPG-2), and one supporting iSER/IB (TPG-1).

                  |                |                |
                  |                |                |
            +--+--v--+----------+--v--+----------+--v--+--+
            |  | IB  |          |iWARP|          | EN  |  |
            |  |     |          | TCP |          | NIC |  |
            |  |(TCA)|          | RNIC|          |     |  |
            |  +-----|          +-----+          +-----+  |
            |   TPG-1            TPG-2            TPG-4   |
            |  9.1.3.3          9.1.2.4          9.1.2.6  |
            |                                             |
            |             Storage Controller              |
            |                                             |
            +---------------------------------------------+

   Figure 15 Storage Controller with TCP, iWARP, and IB Connections

   The normal iSCSI portal group advertising processes (via SLP, iSNS,
   or SendTargets) are available to a Storage Controller.

15.4 Discovery processes for an InfiniBand Host

   An InfiniBand Host system can gather portal group IP addresses from
   SLP, iSNS, or the SendTargets discovery processes by using TCP/IP
   via [IPoIB].
   After obtaining one or more remote portal IP addresses, the
   Initiator uses the standard IP mechanisms to resolve the IP address
   to a local outgoing interface and the destination hardware address
   (Ethernet MAC or IB GID of the target or a gateway leading to the
   target). If the resolved interface is an [IPoIB] network interface,
   then the target portal can be reached through an InfiniBand fabric.
   In this case, the Initiator can establish an iSCSI/TCP or iSCSI/iSER
   session with the Target over that InfiniBand interface, using the
   Hardware Address (InfiniBand GID) obtained through the standard
   Address Resolution (ARP) processes.

   If more than one IP address is obtained through the discovery
   process, the Initiator should select a Target IP address that is on
   the same IP subnet as the Initiator if one exists. This will avoid
   a potential overhead of going through a gateway when a direct path
   exists.

   In addition, a user can configure manual static IP route entries if
   a particular path to the target is preferred.

15.5 IBTA Connection specifications

   It is outside the scope of this document, but it is expected that
   the InfiniBand Trade Association (IBTA) has defined or will define:

   * The iSER ServiceID

   * A means for permitting a Host to establish a connection with a
     peer InfiniBand end-node, and for that peer to indicate whether
     it supports iSER, so that the Host is able to fall back to
     iSCSI/TCP over [IPoIB].

   * A means for permitting the Host to establish connections with
     IB iSER connections on storage controllers or IB iSER connected
     Gateways in preference to [IPoIB] connected Gateways/Bridges or
     connections to Target Storage Controllers that also accept
     iSCSI via [IPoIB].

   * A means for combining the IB ServiceID for iSER and the IP port
     number such that the IB Host can use normal IB connection
     processes, yet ensure that the iSER target peer can actually
     connect to the required IP port number.

16 Author's Address

   Mallikarjun Chadalapaka
   Hewlett-Packard Company
   8000 Foothills Blvd.
   Roseville, CA 95747-5668, USA
   Phone: +1-916-785-5621
   Email: cbm@rose.hp.com

   Uri Elzur
   Broadcom Corporation
   16215 Alton Parkway
   Irvine, CA 92619-7013, USA
   Phone: +1-949-926-6432
   Email: Uri@Broadcom.com

   John Hufferd
   Brocade Communications Systems, Inc.
   1745 Technology Drive
   San Jose, CA 95110, USA
   Phone: +1-408-333-5244
   Email: jhufferd@brocade.com

   Mike Ko
   IBM Corp.
   650 Harry Rd.
   San Jose, CA 95120, USA
   Phone: +1-408-927-2085
   Email: mako@us.ibm.com

   Hemal Shah
   Broadcom Corporation
   16215 Alton Parkway
   P.O. Box 57013
   Irvine, CA 92619, USA
   Phone: +1-949-926-6941
   Email: hemal@broadcom.com

   Patricia Thaler
   Broadcom Corporation
   5025 Keane Dr.
   Carmichael, CA 95608, USA
   Phone: +1-916-570-2707
   Email: pthaler@broadcom.com

17 Acknowledgments

   This protocol was developed by a design team that, in addition to
   the authors, included Dwight Barron (HP), John Carrier (formerly
   from Adaptec), Ted Compton (EMC), Paul R. Culley (HP), Yaron Haviv
   (Voltaire), Jeff Hilland (HP), Mike Krause (HP), Alex Nezhinsky
   (Voltaire), Jim Pinkerton (Microsoft), Renato J. Recio (IBM), Julian
   Satran (IBM), Tom Talpey (Network Appliance), and Jim Wendt (HP).
   Special thanks to David Black (EMC) for his extensive review
   comments.

18 Full Copyright Statement

   Copyright (C) The Internet Society (2006).
   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed
   to pertain to the implementation or use of the technology
   described in this document or the extent to which any license
   under such rights might or might not be available; nor does it
   represent that it has made any independent effort to identify any
   such rights. Information on the procedures with respect to rights
   in RFC documents can be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention
   any copyrights, patents or patent applications, or other
   proprietary rights that may cover technology that may be required
   to implement this standard. Please address the information to the
   IETF at ietf-ipr@ietf.org.