Storage Maintenance (StorM) Working Group                      Michael Ko
Internet Draft                                                 Consultant
Intended status: Proposed Standard                    Alexander Nezhinsky
Expires: November 2012                                           Mellanox
Obsoletes: 5046                                              May 20, 2012

                 iSCSI Extensions for RDMA Specification
                       draft-ietf-storm-iser-11.txt

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire in November 2012.

Abstract

iSCSI Extensions for RDMA provides the RDMA data transfer capability to iSCSI by layering iSCSI on top of an RDMA-Capable Protocol. An RDMA-Capable Protocol provides RDMA Read and Write services, which enable data to be transferred directly into SCSI I/O Buffers without intermediate data copies. This document describes the extensions to the iSCSI protocol to support RDMA services as provided by an RDMA-Capable Protocol.

This document obsoletes RFC 5046.

Table of Contents

   1 Definitions and Acronyms
      1.1 Definitions
      1.2 Acronyms
      1.3 Conventions
   2 Introduction
      2.1 Motivation
      2.2 Architectural Goals
      2.3 Protocol Overview
      2.4 RDMA services and iSER
         2.4.1 STag
         2.4.2 Send
         2.4.3 RDMA Write
         2.4.4 RDMA Read
      2.5 SCSI Read Overview
      2.6 SCSI Write Overview
      2.7 iSCSI/iSER Layering
   3 Upper Layer Interface Requirements
      3.1 Operational Primitives offered by iSER
         3.1.1 Send_Control
         3.1.2 Put_Data
         3.1.3 Get_Data
         3.1.4 Allocate_Connection_Resources
         3.1.5 Deallocate_Connection_Resources
         3.1.6 Enable_Datamover
         3.1.7 Connection_Terminate
         3.1.8 Notice_Key_Values
         3.1.9 Deallocate_Task_Resources
      3.2 Operational Primitives used by iSER
         3.2.1 Control_Notify
         3.2.2 Data_Completion_Notify
         3.2.3 Data_ACK_Notify
         3.2.4 Connection_Terminate_Notify
      3.3 iSCSI Protocol Usage Requirements
   4 Lower Layer Interface Requirements
      4.1 Interactions with the RCaP Layer
      4.2 Interactions with the Transport Layer
   5 Connection Setup and Termination
      5.1 iSCSI/iSER Connection Setup
         5.1.1 Initiator Behavior
         5.1.2 Target Behavior
         5.1.3 iSER Hello Exchange
      5.2 iSCSI/iSER Connection Termination
         5.2.1 Normal Connection Termination at the Initiator
         5.2.2 Normal Connection Termination at the Target
         5.2.3 Termination without Logout Request/Response PDUs
   6 Login/Text Operational Keys
      6.1 HeaderDigest and DataDigest
      6.2 MaxRecvDataSegmentLength
      6.3 RDMAExtensions
      6.4 TargetRecvDataSegmentLength
      6.5 InitiatorRecvDataSegmentLength
      6.6 OFMarker and IFMarker
      6.7 MaxOutstandingUnexpectedPDUs
      6.8 MaxAHSLength
      6.9 TaggedBufferForSolicitedDataOnly
      6.10 iSERHelloRequired
   7 iSCSI PDU Considerations
      7.1 iSCSI Data-Type PDU
      7.2 iSCSI Control-Type PDU
      7.3 iSCSI PDUs
         7.3.1 SCSI Command
         7.3.2 SCSI Response
         7.3.3 Task Management Function Request/Response
         7.3.4 SCSI Data-out
         7.3.5 SCSI Data-in
         7.3.6 Ready To Transfer (R2T)
         7.3.7 Asynchronous Message
         7.3.8 Text Request & Text Response
         7.3.9 Login Request & Login Response
         7.3.10 Logout Request & Logout Response
         7.3.11 SNACK Request
         7.3.12 Reject
         7.3.13 NOP-Out & NOP-In
   8 Flow Control and STag Management
      8.1 Flow Control for RDMA Send Messages
         8.1.1 Flow Control for Control-Type PDUs from the Initiator
         8.1.2 Flow Control for Control-Type PDUs from the Target
      8.2 Flow Control for RDMA Read Resources
      8.3 STag Management
         8.3.1 Allocation of STags
         8.3.2 Invalidation of STags
   9 iSER Control and Data Transfer
      9.1 iSER Header Format
      9.2 iSER Header Format for iSCSI Control-Type PDU
      9.3 iSER Header Format for iSER Hello Message
      9.4 iSER Header Format for iSER HelloReply Message
      9.5 SCSI Data Transfer Operations
         9.5.1 SCSI Write Operation
         9.5.2 SCSI Read Operation
         9.5.3 Bidirectional Operation
   10 iSER Error Handling and Recovery
      10.1 Error Handling
         10.1.1 Errors in the Transport Layer
         10.1.2 Errors in the RCaP Layer
         10.1.3 Errors in the iSER Layer
         10.1.4 Errors in the iSCSI Layer
      10.2 Error Recovery
         10.2.1 PDU Recovery
         10.2.2 Connection Recovery
   11 Security Considerations
   12 IANA Considerations
   13 References
      13.1 Normative References
      13.2 Informative References
   14 Appendix A: Summary of Changes from RFC 5046
   15 Appendix B: Message Format for iSER
      15.1 iWARP Message Format for iSER Hello Message
      15.2 iWARP Message Format for iSER HelloReply Message
      15.3 iSER Header Format for SCSI Read Command PDU
      15.4 iSER Header Format for SCSI Write Command PDU
      15.5 iSER Header Format for SCSI Response PDU
   16 Appendix C: Architectural discussion of iSER over InfiniBand
      16.1 Host side of iSCSI & iSER connections in Infiniband
      16.2 Storage side of iSCSI & iSER mixed network environment
      16.3 Discovery processes for an InfiniBand Host
      16.4 IBTA Connection specifications
   17 Acknowledgments

Table of Figures

   Figure 1 Example of iSCSI/iSER Layering in Full Feature Phase
   Figure 2 iSER Header Format
   Figure 3 iSER Header Format for iSCSI Control-Type PDU
   Figure 4 iSER Header Format for iSER Hello Message
   Figure 5 iSER Header Format for iSER HelloReply Message
   Figure 6 SendSE Message containing an iSER Hello Message
   Figure 7 SendSE Message containing an iSER HelloReply Message
   Figure 8 iSER Header Format for SCSI Read Command PDU
   Figure 9 iSER Header Format for SCSI Write Command PDU
   Figure 10 iSER Header Format for SCSI Response PDU
   Figure 11 iSCSI and iSER on IB
   Figure 12 Storage Controller with TCP, iWARP, and IB Connections

1 Definitions and Acronyms

1.1 Definitions

Advertisement (Advertised, Advertise, Advertisements, Advertises) - The act of informing a remote iSER Layer that a local node's buffer is available to it. A Node makes a buffer available for incoming RDMA Read Request Message or incoming RDMA Write Message access by informing the remote iSER Layer of the Tagged Buffer identifiers (STag, Base Offset, and buffer length). Note that this Advertisement of Tagged Buffer information is the responsibility of the iSER Layer on either end and is not defined by the RDMA-Capable Protocol. A typical method would be for the iSER Layer to embed the Tagged Buffer's STag, Base Offset, and buffer length in a message destined for the remote iSER Layer.

Base Offset - A value that, when added to the Buffer Offset, forms the Tagged Offset.

Completion (Completed, Complete, Completes) - Completion is defined as the process by which the RDMA-Capable Protocol layer informs the iSER Layer that a particular RDMA Operation has performed all functions specified for the RDMA Operation.

Connection - A connection is a logical bidirectional communication channel between the initiator and the target, e.g., a TCP connection. Communication between the initiator and the target occurs over one or more connections. The connections carry control messages, SCSI commands, parameters, and data within iSCSI Protocol Data Units (iSCSI PDUs).

Connection Handle - An information element that identifies the particular iSCSI connection and is unique for a given iSCSI Layer and the underlying iSER Layer. Every invocation of an Operational Primitive is qualified with the Connection Handle.

Data Sink - The peer receiving a data payload. Note that the Data Sink can be required to both send and receive RCaP Messages to transfer a data payload.

Data Source - The peer sending a data payload. Note that the Data Source can be required to both send and receive RCaP Messages to transfer a data payload.

Datamover Interface (DI) - The interface between the iSCSI Layer and the Datamover Layer as described in [DA].

Datamover Layer - A layer that is directly below the iSCSI Layer and above the underlying transport layers. This layer exposes and uses a set of transport-independent Operational Primitives for the communication between the iSCSI Layer and itself. The Datamover layer, operating in conjunction with the transport layers, moves the control and data information on the iSCSI connection. In this specification, the iSER Layer is the Datamover layer.

Datamover Protocol - A Datamover protocol is the wire protocol that is defined to realize the Datamover layer functionality. In this specification, the iSER protocol is the Datamover protocol.

Inbound RDMA Read Queue Depth (IRD) - The maximum number of incoming outstanding RDMA Read Requests that the RDMA-Capable Controller can handle on a particular RCaP Stream at the Data Source. For some RDMA-Capable Protocol layers, the term "IRD" may be known by a different name. For example, for InfiniBand, the equivalent for IRD is the Responder Resources.

I/O Buffer - A buffer that is used in a SCSI Read or Write operation so SCSI data may be sent from or received into that buffer.

iSCSI - The iSCSI protocol as defined in [iSCSI] is a mapping of the SCSI Architecture Model of SAM-5 over TCP.

iSCSI control-type PDU - Any iSCSI PDU that is not an iSCSI data-type PDU and also not a SCSI Data-out PDU carrying solicited data is defined as an iSCSI control-type PDU. In particular, SCSI Data-out PDUs for unsolicited data are defined as iSCSI control-type PDUs.

iSCSI data-type PDU - An iSCSI data-type PDU is defined as an iSCSI PDU that causes data transfer via RDMA operations at the iSER layer, transparent to the remote iSCSI Layer, to take place between the peer iSCSI nodes on a full feature phase iSCSI connection. An iSCSI data-type PDU, when requested for transmission by the sender iSCSI Layer, results in the associated data transfer without the participation of the remote iSCSI Layer, i.e., the PDU itself is not delivered as-is to the remote iSCSI Layer. The following iSCSI PDUs constitute the set of iSCSI data-type PDUs: SCSI Data-In PDU and R2T PDU.

iSCSI Layer - A layer in the protocol stack implementation within an end node that implements the iSCSI protocol and interfaces with the iSER Layer via the Datamover Interface.

iSCSI PDU (iSCSI Protocol Data Unit) - The iSCSI Layer at the initiator and the iSCSI Layer at the target divide their communications into messages. The term "iSCSI protocol data unit" (iSCSI PDU) is used for these messages.

iSCSI/iSER Connection - An iSER-assisted iSCSI connection. An iSCSI connection that is not iSER-assisted always maps onto a TCP connection at the transport level, but an iSER-assisted iSCSI connection may not have an underlying TCP connection. For some RCaP implementations (e.g., iWARP), an iSER-assisted iSCSI connection has an underlying TCP connection. For other RCaP implementations (e.g., InfiniBand), there is no underlying TCP connection. (In the specific example of InfiniBand [IB], an iSER-assisted iSCSI connection is directly mapped onto the InfiniBand RC channel.)

iSCSI/iSER Session - An iSER-assisted iSCSI session. All connections of an iSCSI/iSER session are iSCSI/iSER connections.

iSER - iSCSI Extensions for RDMA, the protocol defined in this document.

iSER-assisted - A term generally used to describe the operation of iSCSI when the iSER functionality is also enabled below the iSCSI Layer for the specific iSCSI/iSER connection in question.

iSER-IRD - This variable represents the maximum number of incoming outstanding RDMA Read Requests that the iSER Layer at the initiator declares on a particular RCaP Stream.

iSER-ORD - This variable represents the maximum number of outstanding RDMA Read Requests that the iSER Layer can initiate on a particular RCaP Stream. This variable is maintained only by the iSER Layer at the target.

iSER Layer - The layer that implements the iSCSI Extensions for RDMA (iSER) protocol.

iWARP - A suite of wire protocols comprising [RDMAP], [DDP], and [MPA] when layered above [TCP]. [RDMAP] and [DDP] may be layered above SCTP or other transport protocols.

Local Mapping - A task state record maintained by the iSER Layer that associates the Initiator Task Tag with the Local STag(s). The specifics of the record structure are implementation dependent.

Local Peer - The implementation of the RDMA-Capable Protocol on the local end of the connection. Used to refer to the local entity when describing protocol exchanges or other interactions between two Nodes.

Node - A computing device attached to one or more links of a network. A Node in this context does not refer to a specific application or protocol instantiation running on the computer. A Node may consist of one or more RDMA-Capable Controllers installed in a host computer.

Operational Primitive - An Operational Primitive is an abstract functional interface procedure that requests another layer to perform a specific action on the requestor's behalf or notifies the other layer of some event. The Datamover Interface between an iSCSI Layer and a Datamover layer within an iSCSI end node uses a set of Operational Primitives to define the functional interface between the two layers. Note that not every invocation of an Operational Primitive elicits a response from the requested layer. A full discussion of the Operational Primitive types and request-response semantics available to iSCSI and iSER can be found in [DA].

Outbound RDMA Read Queue Depth (ORD) - The maximum number of outstanding RDMA Read Requests that the RDMA-Capable Controller can initiate on a particular RCaP Stream at the Data Sink. For some RDMA-Capable Protocol layers, the term "ORD" may be known by a different name. For example, for InfiniBand, the equivalent for ORD is the Initiator Depth.

Phase Collapse - Refers to the optimization in iSCSI where the SCSI status is transferred along with the final SCSI Data-in PDU from a target. See section 4.2 in [iSCSI].

RCaP Message - One or more packets of the network layer comprising a single RDMA operation or a part of an RDMA Read Operation of the RDMA-Capable Protocol. For iWARP, an RCaP Message is known as an RDMAP Message.

RCaP Stream - A single bidirectional association between the peer RDMA-Capable Protocol layers on two Nodes over a single transport-level stream. For iWARP, an RCaP Stream is known as an RDMAP Stream, and the association is created following a successful Login Phase during which iSER support is negotiated.

RDMA-Capable Protocol (RCaP) - The protocol or protocol suite that provides a reliable RDMA transport functionality, e.g., iWARP, InfiniBand, etc.

RDMA-Capable Controller - A network I/O adapter or embedded controller with RDMA functionality. For example, for iWARP, this could be an RNIC, and for InfiniBand, this could be an HCA (Host Channel Adapter) or TCA (Target Channel Adapter).

RDMA-enabled Network Interface Controller (RNIC) - A network I/O adapter or embedded controller with iWARP functionality.

RDMA Operation - A sequence of RCaP Messages, including control Messages, to transfer data from a Data Source to a Data Sink. The following RDMA Operations are defined: RDMA Write Operation, RDMA Read Operation, and Send Operation.

RDMA Protocol (RDMAP) - A wire protocol that supports RDMA Operations to transfer ULP data between a Local Peer and the Remote Peer as described in [RDMAP].

RDMA Read Operation - An RDMA Operation used by the Data Sink to transfer the contents of a Data Source buffer from the Remote Peer to a Data Sink buffer at the Local Peer. An RDMA Read operation consists of a single RDMA Read Request Message and a single RDMA Read Response Message.

RDMA Read Request - An RCaP Message used by the Data Sink to request the Data Source to transfer the contents of a buffer. The RDMA Read Request Message describes both the Data Source and the Data Sink buffers.

RDMA Read Response - An RCaP Message used by the Data Source to transfer the contents of a buffer to the Data Sink, in response to an RDMA Read Request. The RDMA Read Response Message only describes the Data Sink buffer.

RDMA Write Operation - An RDMA Operation used by the Data Source to transfer the contents of a Data Source buffer from the Local Peer to a Data Sink buffer at the Remote Peer. The RDMA Write Message only describes the Data Sink buffer.

Remote Direct Memory Access (RDMA) - A method of accessing memory on a remote system in which the local system specifies the remote location of the data to be transferred. Employing an RDMA-Capable Controller in the remote system allows the access to take place without interrupting the processing of the CPU(s) on the system.

Remote Mapping - A task state record maintained by the iSER Layer that associates the Initiator Task Tag with the Advertised STag(s) and the Base Offset(s). The specifics of the record structure are implementation dependent.

Remote Peer - The implementation of the RDMA-Capable Protocol on the opposite end of the connection. Used to refer to the remote entity when describing protocol exchanges or other interactions between two Nodes.

SCSI Layer - This layer builds/receives SCSI CDBs (Command Descriptor Blocks) and sends/receives them with the remaining command execute [SAM5] parameters to/from the iSCSI Layer.

Send - An RDMA Operation that transfers the content of a buffer from the Local Peer to an untagged buffer at the Remote Peer.

SendInvSE Message - A Send with Solicited Event and Invalidate Message.

SendSE Message - A Send with Solicited Event Message.

Sequence Number (SN) - DataSN for a SCSI Data-in PDU and R2TSN for an R2T PDU.
The semantics for both types of sequence numbers are as defined in [iSCSI].

Session, iSCSI Session - The group of Connections that link an initiator SCSI port with a target SCSI port form an iSCSI session (equivalent to a SCSI I-T nexus). Connections can be added to and removed from a session even while the I-T nexus is intact. Across all connections within a session, an initiator sees one and the same target.

Steering Tag (STag) - An identifier of a Tagged Buffer on a Node (Local or Remote) as defined in [RDMAP] and [DDP]. For other RDMA-Capable Protocols, the Steering Tag may be known by different names but is referred to herein as an STag. For example, for InfiniBand, a Remote STag is known as an R-Key, and a Local STag is known as an L-Key, and both are considered STags.

Tagged Buffer - A buffer that is explicitly Advertised to the iSER Layer at the remote node through the exchange of an STag, Base Offset, and length.

Tagged Offset - The offset within a Tagged Buffer.

Traditional iSCSI - Refers to the iSCSI protocol as defined in [iSCSI] (i.e., without the iSER enhancements).

Untagged Buffer - A buffer that is not explicitly Advertised to the iSER Layer at the remote node.
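
The control-type/data-type split defined above can be illustrated with a small classifier. This is a minimal sketch, not part of the protocol. The `PduKind` enum is a hypothetical, partial list of iSCSI PDU kinds, and `classify` is an invented helper name. The sketch encodes only what the definitions state. SCSI Data-In and R2T are data-type. Unsolicited SCSI Data-out is control-type. Per section 2.3, iSER does not use solicited SCSI Data-out at all.

```python
from enum import Enum, auto


class PduKind(Enum):
    """A partial, hypothetical list of iSCSI PDU kinds (illustration only)."""
    SCSI_COMMAND = auto()
    SCSI_RESPONSE = auto()
    SCSI_DATA_OUT = auto()
    SCSI_DATA_IN = auto()
    R2T = auto()
    NOP_OUT = auto()
    NOP_IN = auto()


class PduClass(Enum):
    CONTROL = "control-type"
    DATA = "data-type"


# Only SCSI Data-In and R2T are iSCSI data-type PDUs.
DATA_TYPE_KINDS = frozenset({PduKind.SCSI_DATA_IN, PduKind.R2T})


def classify(kind: PduKind, solicited: bool = False) -> PduClass:
    """Classify an iSCSI PDU on an iSCSI/iSER connection.

    `solicited` is only meaningful for SCSI Data-out PDUs.
    """
    if kind in DATA_TYPE_KINDS:
        return PduClass.DATA
    if kind is PduKind.SCSI_DATA_OUT and solicited:
        # Not used with iSER: the target fetches solicited write data
        # with RDMA Read Requests instead.
        raise ValueError("solicited SCSI Data-out is not used with iSER")
    # Everything else, including unsolicited SCSI Data-out, is control-type.
    return PduClass.CONTROL
```

For example, `classify(PduKind.SCSI_DATA_OUT)` yields `PduClass.CONTROL`, while `classify(PduKind.R2T)` yields `PduClass.DATA`.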
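
The IRD, ORD, iSER-IRD, and iSER-ORD definitions above amount to a cap on outstanding RDMA Read Requests per RCaP Stream. The sketch below shows one way a target-side iSER Layer might enforce such a cap by queueing excess requests until earlier ones complete. It is illustrative only, and `ReadRequestThrottle` and `effective_ord` are hypothetical names. This sketch also assumes the limit is the smaller of the local ORD and the initiator's declared iSER-IRD. That is an assumption of the sketch, not a rule stated in this section.

```python
from collections import deque
from dataclasses import dataclass, field


def effective_ord(local_ord: int, peer_iser_ird: int) -> int:
    # Assumption: never exceed what either end of the RCaP Stream can handle.
    return min(local_ord, peer_iser_ird)


@dataclass
class ReadRequestThrottle:
    """Caps outstanding RDMA Read Requests on one RCaP Stream (hypothetical)."""
    iser_ord: int
    outstanding: int = 0
    pending: deque = field(default_factory=deque)

    def submit(self, request) -> list:
        """Queue a read request; return the requests that may be issued now."""
        self.pending.append(request)
        return self._drain()

    def complete(self) -> list:
        """Record one completed RDMA Read; return newly issuable requests."""
        if self.outstanding == 0:
            raise RuntimeError("completion with no outstanding RDMA Read Request")
        self.outstanding -= 1
        return self._drain()

    def _drain(self) -> list:
        # Issue queued requests in order while staying within the cap.
        issued = []
        while self.pending and self.outstanding < self.iser_ord:
            issued.append(self.pending.popleft())
            self.outstanding += 1
        return issued
```

For instance, with a local ORD of 4 and a declared iSER-IRD of 2, the third submitted request is returned as issuable only after one of the first two completes.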

1.2 Acronyms

   Acronym   Definition
   --------------------------------------------------------------
   AHS       Additional Header Segment
   BHS       Basic Header Segment
   CO        Connection Only
   CRC       Cyclic Redundancy Check
   DDP       Direct Data Placement Protocol
   DI        Datamover Interface
   HCA       Host Channel Adapter
   IANA      Internet Assigned Numbers Authority
   IB        InfiniBand
   IETF      Internet Engineering Task Force
   I/O       Input - Output
   IO        Initialize Only
   IP        Internet Protocol
   IPoIB     IP over InfiniBand
   IPsec     Internet Protocol Security
   iSER      iSCSI Extensions for RDMA
   ITT       Initiator Task Tag
   LO        Leading Only
   MPA       Marker PDU Aligned Framing for TCP
   NOP       No Operation
   NSG       Next Stage (during the iSCSI Login Phase)
   PDU       Protocol Data Unit
   R2T       Ready To Transfer
   R2TSN     Ready To Transfer Sequence Number
   RDMA      Remote Direct Memory Access
   RDMAP     Remote Direct Memory Access Protocol
   RFC       Request For Comments
   RNIC      RDMA-enabled Network Interface Controller
   SAM5      SCSI Architecture Model - 5
   SCSI      Small Computer Systems Interface
   SNACK     Selective Negative Acknowledgment - also Sequence Number
             Acknowledgement for data
   STag      Steering Tag
   SW        Session Wide
   TCA       Target Channel Adapter
   TCP       Transmission Control Protocol
   TMF       Task Management Function
   TTT       Target Transfer Tag
   ULP       Upper Level Protocol

1.3 Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2 Introduction

2.1 Motivation

The iSCSI protocol ([iSCSI]) is a mapping of the SCSI Architecture Model (see [SAM5] and [iSCSI-SAM]) over the TCP protocol. SCSI commands are carried by iSCSI requests, and SCSI responses and status are carried by iSCSI responses. Other iSCSI protocol exchanges and SCSI Data are also transported in iSCSI PDUs.

Out-of-order TCP segments in the Traditional iSCSI model have to be stored and reassembled before the iSCSI protocol layer within an end node can place the data in the iSCSI buffers. This reassembly is required because not every TCP segment is likely to contain an iSCSI header to enable its placement, and TCP itself does not have a built-in mechanism for signaling ULP message boundaries to aid placement of out-of-order segments. This TCP reassembly at high network speeds is quite counter-productive for the following reasons: wasted memory bandwidth in data copying, the need for reassembly memory, wasted CPU cycles in data copying, and the general store-and-forward latency from an application perspective.

The generic term RDMA-Capable Protocol (RCaP) is used to refer to protocol stacks that provide the RDMA functionality, such as iWARP and InfiniBand.

With the availability of RDMA-Capable Controllers within a host system, it is appropriate for iSCSI to be able to exploit the direct data placement function of the RDMA-Capable Controller, as other applications do.

iSCSI Extensions for RDMA (iSER) is designed precisely to take advantage of generic RDMA technologies: iSER's goal is to permit iSCSI to employ direct data placement and RDMA capabilities using a generic RDMA-Capable Controller. In summary, the iSCSI/iSER protocol stack is designed to enable scaling to high speeds by relying on a generic data placement process and RDMA technologies and products, which enable direct data placement of both in-order and out-of-order data.

This document describes iSER as a protocol extension to iSCSI, both for convenience of description and because it is true in a very strict protocol sense. However, iSER in reality extends the connectivity of the iSCSI protocol defined in [iSCSI], and the name iSER reflects this reality.

When the iSCSI protocol as defined in [iSCSI] (i.e., without the iSER enhancements) is intended in the rest of the document, the term "Traditional iSCSI" is used to make the intention clear.

2.2 Architectural Goals

This section summarizes the architectural goals that guided the design of iSER.

1. Provide an RDMA data transfer model for iSCSI that enables direct in-order or out-of-order data placement of SCSI data into pre-allocated SCSI buffers while maintaining in-order data delivery.

2. Not require any major changes to the SCSI Architecture Model [SAM5] and SCSI command set standards.

3. Utilize existing iSCSI infrastructure (sometimes referred to as the "iSCSI ecosystem") including but not limited to MIB, bootstrapping, negotiation, naming & discovery, and security.

4. Enable a session to operate in the Traditional iSCSI data transfer mode if iSER is not supported by either the initiator or the target (not require iSCSI full feature phase interoperability between an end node operating in Traditional iSCSI mode and an end node operating in iSER-assisted mode).

5. Allow initiator and target implementations to utilize generic RDMA-Capable Controllers such as RNICs, or implement iSCSI and iSER in software (not require iSCSI- or iSER-specific assists in the RCaP implementation or RDMA-Capable Controller).

6. Implement a lightweight Datamover protocol for iSCSI with minimal state maintenance.

2.3 Protocol Overview

Consistent with the architectural goals stated in section 2.2, the iSER protocol does not require changes in the iSCSI ecosystem or any related SCSI specifications. The iSER protocol defines the mapping of iSCSI PDUs to RCaP Messages in such a way that it is entirely feasible to realize iSCSI/iSER implementations that are based on generic RDMA-Capable Controllers.
The iSER protocol layer requires 646 minimal state maintenance to assist an iSCSI full feature phase 647 connection, besides being oblivious to the notion of an iSCSI 648 session. The crucial protocol aspects of iSER may be summarized 649 thus: 651 1. iSER-assisted mode is negotiated during the iSCSI login in the 652 leading connection for each session, and an entire iSCSI session 653 can only operate in one mode (i.e. a connection in a session 654 cannot operate in iSER-assisted mode if a different connection of 655 the same session is already in full feature phase in the 656 Traditional iSCSI mode). 658 2. Once in iSER-assisted mode, all iSCSI interactions on that 659 connection use RCaP Messages. 661 3. A Send Message is used for carrying an iSCSI control-type PDU 662 preceded by an iSER header. See section 7.2 for more details on 663 iSCSI control-type PDUs. 665 4. RDMA Write, RDMA Read Request, and RDMA Read Response Messages 666 are used for carrying control and all data information associated 667 with the iSCSI data-type PDUs (i.e., SCSI Data-In PDUs and R2T 668 PDUs). iSER does not use SCSI Data-Out PDUs for solicited data, 669 and SCSI Data-Out PDUs for unsolicited data are not treated as 670 iSCSI data-type PDUs by iSER because RDMA is not used. See 671 section 7.1 for more details on iSCSI data-type PDUs. 673 5. Target drives all data transfer (with the exception of iSCSI 674 unsolicited data) for SCSI writes and SCSI reads, by issuing RDMA 675 Read Requests and RDMA Writes respectively. 677 6. RCaP is responsible for ensuring data integrity. (For example, 678 iWARP includes a CRC-enhanced framing layer called MPA on top of 679 TCP; and for Infiniband, the CRCs are included in the Reliable 680 Connection mode). For this reason, iSCSI header and data digests 681 are negotiated to "None" for iSCSI/iSER sessions. 683 7. The iSCSI error recovery hierarchy defined in [iSCSI] is fully 684 supported by iSER. 
(However, see section 7.3.11 on the handling 685 of SNACK Request PDUs.) 687 8. iSER requires no changes to iSCSI authentication, security, and 688 text mode negotiation mechanisms. 690 Note that Traditional iSCSI implementations may have to be adapted 691 to employ iSER. It is expected that the adaptation when required is 692 likely to be centered around the upper layer interface requirements 693 of iSER (section 3). 695 2.4 RDMA services and iSER 697 iSER is designed to work with software and/or hardware protocol 698 stacks providing the protocol services defined in RCaP documents 699 such as [RDMAP], [IB], etc. The following subsections describe the 700 key protocol elements of RCaP services that iSER relies on. 702 2.4.1 STag 704 An STag is the identifier of an I/O Buffer unique to an RDMA-Capable 705 Controller that the iSER Layer Advertises to the remote iSCSI/iSER 706 node in order to complete a SCSI I/O. 708 In iSER, Advertisement is the act of informing the target by the 709 initiator that an I/O Buffer is available at the initiator for RDMA 710 Read or RDMA Write access by the target. The initiator Advertises 711 the I/O Buffer by including the STag and the Base Offset in the 712 header of an iSER Message containing the SCSI Command PDU to the 713 target. The buffer length is as specified in the SCSI Command PDU. 715 The iSER Layer at the initiator Advertises the STag and the Base 716 Offset for the I/O Buffer of each SCSI I/O to the iSER Layer at the 717 target in the iSER header of a Send Message containing the SCSI 718 Command PDU, unless the I/O can be completely satisfied by 719 unsolicited data alone. The SendSE Message should be used if 720 supported by the RCaP layer (e.g., iWARP). 722 The iSER Layer at the target provides the STag for the I/O Buffer 723 that is the Data Sink of an RDMA Read Operation (section 2.4.4) to 724 the RCaP layer on the initiator node - i.e. this is completely 725 transparent to the iSER Layer at the initiator. 
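The advertisement rule above can be captured as a small decision made per SCSI I/O. The following Python fragment is a minimal sketch, not part of this specification: the names Advertisement, needs_advertisement, and build_advertisement are hypothetical, and treating a zero-length transfer as needing no Advertised buffer is an assumption that this section does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Advertisement:
    """Hypothetical record of what the initiator places in the iSER header."""
    stag: int         # STag of the registered I/O Buffer
    base_offset: int  # Base Offset of that I/O Buffer


def needs_advertisement(is_read: bool, expected_length: int,
                        unsolicited_length: int = 0) -> bool:
    """Decide whether an STag must be Advertised for one SCSI I/O.

    Assumption: a zero-length transfer needs no buffer at all.
    """
    if expected_length == 0:
        return False
    if is_read:
        # The target needs a buffer to RDMA Write the read data into.
        return True
    # A write needs an Advertised buffer only if unsolicited data alone
    # cannot satisfy the whole transfer.
    return unsolicited_length < expected_length


def build_advertisement(is_read: bool, expected_length: int,
                        unsolicited_length: int, stag: int,
                        base_offset: int) -> Optional[Advertisement]:
    """Return the Advertisement to carry with the SCSI Command PDU, or None."""
    if needs_advertisement(is_read, expected_length, unsolicited_length):
        return Advertisement(stag, base_offset)
    return None
```

For example, a SCSI Write whose data is entirely covered by unsolicited data yields None, so no STag is exposed to the target for that task.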
The iSER Layer at the initiator SHOULD invalidate the Advertised STag upon a normal completion of the associated task. The Send with Invalidate Message, if supported by the RCaP layer (e.g., iWARP), can be used for automatic invalidation when it is used to carry the SCSI Response PDU. There are two exceptions to this automatic invalidation: bidirectional commands, and abnormal completion of a command. The iSER Layer at the initiator SHOULD explicitly invalidate the STag in these two cases. The iSER Layer at the initiator MUST check that STag invalidation has occurred whenever receipt of a Send with Invalidate Message is the expected means of causing an STag to be invalidated, and MUST perform the STag invalidation if the STag has not already been invalidated (e.g., because a Send Message was used instead of Send with Invalidate).

If the Advertised STag is not invalidated as recommended in the foregoing paragraph (e.g., in order to cache the STag for future reuse), the I/O Buffer remains exposed to the network for access by the RCaP. Such an I/O Buffer is capable of being read or written by the RCaP outside the scope of the iSCSI operation for which it was originally established, which has both robustness and security considerations. The robustness consideration is that the system containing the iSER initiator may react poorly to an unexpected modification of its memory. For the security considerations, see Section 11.

2.4.2 Send

Send is the RDMA Operation that is not addressed to an Advertised buffer; the receiver uses Untagged buffers to receive the message.

The iSER Layer at the initiator uses the Send Operation to transmit any iSCSI control-type PDU to the target. As an example, the initiator uses Send Operations to transfer iSER Messages containing SCSI Command PDUs to the iSER Layer at the target.
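The STag invalidation rules of section 2.4.1 can be modeled as a check performed whenever a task completes at the initiator. The Python sketch below is illustrative only: InitiatorStags and its methods are hypothetical, and _invalidate_locally stands in for whatever local invalidation request the RCaP implementation actually offers.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


@dataclass
class InitiatorStags:
    """Hypothetical tracker of STags Advertised by the initiator."""
    valid: Set[int] = field(default_factory=set)
    explicit_invalidations: int = 0

    def advertise(self, stag: int) -> None:
        self.valid.add(stag)

    def _invalidate_locally(self, stag: int) -> None:
        # Stand-in for a local invalidation request to the RDMA-Capable
        # Controller; the real call is RCaP- and implementation-specific.
        self.valid.discard(stag)
        self.explicit_invalidations += 1

    def on_scsi_response(self, task_stags: Iterable[int],
                         remotely_invalidated: Optional[int]) -> None:
        """Normal completion: the SCSI Response arrived in a Send Message.

        remotely_invalidated is the STag the RCaP layer reports for a Send
        with Invalidate, or None when a plain Send was used.
        """
        if remotely_invalidated is not None:
            self.valid.discard(remotely_invalidated)
        # Check that invalidation really happened; any STag of the task
        # that is still valid (plain Send, or the second STag of a
        # bidirectional command) is invalidated explicitly.
        for stag in task_stags:
            if stag in self.valid:
                self._invalidate_locally(stag)

    def on_abnormal_completion(self, task_stags: Iterable[int]) -> None:
        """No Send with Invalidate to rely on: invalidate every STag of the task."""
        for stag in task_stags:
            if stag in self.valid:
                self._invalidate_locally(stag)
```

Checking the task's STags after every response, rather than trusting the message type, is what covers the case in which the target used Send instead of Send with Invalidate.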
An iSER Layer at the target uses the Send Operation to transmit any iSCSI control-type PDU to the initiator. As an example, the target uses Send Operations to transfer iSER Messages containing SCSI Response PDUs to the iSER Layer at the initiator.

For interoperability, iSER implementations SHOULD accept and correctly process SendSE and SendInvSE messages. However, SendSE and SendInvSE messages are to be regarded as optimizations or enhancements to the basic Send message, and their support may vary by RCaP protocol and specific implementation. In general, these messages SHOULD NOT be used unless the RCaP requires support for them in all implementations. If these messages are used, the implementation SHOULD be capable of reverting to the use of Send in order to work with a receiver that does not support these messages. Attempted use of these messages with a peer that does not support them may result in a fatal error that closes the RCaP connection. For example, these messages SHOULD NOT be used with the InfiniBand RCaP because InfiniBand does not require support for them in all cases. New iSER implementations SHOULD use Send (and not SendSE or SendInvSE) unless there are compelling reasons for doing otherwise. Similarly, iSER implementations SHOULD NOT rely on events triggered by SendSE and SendInvSE, as these messages may not be used.

2.4.3 RDMA Write

RDMA Write is the RDMA Operation that is used to place data into an Advertised buffer at the Data Sink. The Data Source addresses the Message using an STag and a Tagged Offset that are valid on the Data Sink.

The iSER Layer at the target uses the RDMA Write Operation to transfer the contents of a local I/O Buffer to an Advertised I/O Buffer at the initiator. The iSER Layer at the target uses RDMA Writes to transfer all or part of the data required to complete a SCSI Read command.
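The target's use of RDMA Writes for a SCSI Read can be sketched as splitting the transfer into pieces addressed by the Advertised STag. In this hypothetical Python example, each Tagged Offset is computed as the Advertised Base Offset plus the offset within the I/O Buffer, and max_write is an assumed implementation limit; neither detail is defined in this section.

```python
from typing import List, NamedTuple


class RdmaWrite(NamedTuple):
    """Hypothetical description of one RDMA Write issued by the target."""
    stag: int           # initiator's Advertised STag (valid on the Data Sink)
    tagged_offset: int  # where this payload lands in the Advertised buffer
    length: int         # payload bytes carried by this RDMA Write


def plan_read_writes(stag: int, base_offset: int, transfer_length: int,
                     max_write: int) -> List[RdmaWrite]:
    """Split the data of a SCSI Read into RDMA Writes.

    Assumes Tagged Offset = Base Offset + offset within the I/O Buffer,
    and that max_write is an implementation-chosen size limit.
    """
    if max_write <= 0:
        raise ValueError("max_write must be positive")
    writes: List[RdmaWrite] = []
    offset = 0
    while offset < transfer_length:
        length = min(max_write, transfer_length - offset)
        writes.append(RdmaWrite(stag, base_offset + offset, length))
        offset += length
    return writes
```

For a 10000-byte read with max_write of 4096, this yields three RDMA Writes of 4096, 4096, and 1808 bytes, all carrying the same STag.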
The iSER Layer at the initiator does not employ RDMA Writes.

2.4.4 RDMA Read

RDMA Read is the RDMA Operation that is used to retrieve data from an Advertised buffer at the Data Source. The sender of the RDMA Read Request addresses the Message using an STag and a Tagged Offset that are valid on the Data Source, in addition to providing a valid local STag and Tagged Offset that identify the Data Sink.

The iSER Layer at the target uses the RDMA Read Operation to transfer the contents of an Advertised I/O Buffer at the initiator to a local I/O Buffer at the target. The iSER Layer at the target uses RDMA Reads to fetch all or part of the data required to complete a SCSI Write command.

The iSER Layer at the initiator does not employ RDMA Reads.

2.5 SCSI Read Overview

The iSER Layer at the initiator receives the SCSI Command PDU from the iSCSI Layer. The iSER Layer at the initiator generates an STag for the I/O Buffer of the SCSI Read and Advertises the buffer by including the STag and the Base Offset as part of the iSER header for the PDU. The iSER Message is transferred to the target using a Send Message. The SendSE Message should be used if supported by the RCaP layer (e.g., iWARP).

The iSER Layer at the target uses one or more RDMA Writes to transfer the data required to complete the SCSI Read.

The iSER Layer at the target uses a Send Message to transfer the SCSI Response PDU back to the iSER Layer at the initiator. The iSER Layer at the initiator invalidates the STag and notifies the iSCSI Layer of the availability of the SCSI Response PDU. The Send with Invalidate Message, if supported by the RCaP layer (e.g., iWARP), can be used for automatic invalidation of the STag.

2.6 SCSI Write Overview

The iSER Layer at the initiator receives the SCSI Command PDU from the iSCSI Layer. If solicited data transfer is involved, the iSER Layer at the initiator generates an STag for the I/O Buffer of the SCSI Write and Advertises the buffer by including the STag and the Base Offset as part of the iSER header for the PDU. The iSER Message is transferred to the target using a Send Message. The SendSE Message should be used if supported by the RCaP layer (e.g., iWARP).

The iSER Layer at the initiator may optionally send one or more non-immediate unsolicited data PDUs to the target using Send Messages.

If solicited data transfer is involved, the iSER Layer at the target uses one or more RDMA Reads to transfer the data required to complete the SCSI Write.

The iSER Layer at the target uses a Send Message to transfer the SCSI Response PDU back to the iSER Layer at the initiator. The iSER Layer at the initiator invalidates the STag and notifies the iSCSI Layer of the availability of the SCSI Response PDU. The Send with Invalidate Message, if supported by the RCaP layer (e.g., iWARP), can be used for automatic invalidation of the STag.

2.7 iSCSI/iSER Layering

iSCSI Extensions for RDMA (iSER) is layered between the iSCSI layer and the RCaP layer. Note that the RCaP layer may be composed of one or more distinct protocol layers depending on the specifics of the RCaP. Figure 1 shows an example of the relationship between SCSI, iSCSI, iSER, and the different RCaP layers. For TCP, the RCaP is iWARP. For InfiniBand, the RCaP is the Reliable Connected Transport Service. Note that the iSCSI layer as described here supports the RDMA Extensions as used in iSER.
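The SCSI Read and SCSI Write overviews in sections 2.5 and 2.6 can be summarized as sequences of RCaP Messages on one connection. The Python sketch below is a simplified, hypothetical model: it shows only one possible ordering, serializes each RDMA Read Request with its RDMA Read Response, and does not distinguish Send from its SE or Invalidate variants beyond a label.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (who sends it, RCaP Message)


def scsi_read_flow(n_rdma_writes: int) -> List[Step]:
    """RCaP Messages for one SCSI Read (section 2.5)."""
    flow: List[Step] = [
        ("initiator", "Send: iSER header with STag and Base Offset + SCSI Command PDU")]
    flow += [("target", "RDMA Write")] * n_rdma_writes
    flow.append(("target", "Send or Send with Invalidate: SCSI Response PDU"))
    return flow


def scsi_write_flow(n_unsolicited_pdus: int, n_rdma_reads: int) -> List[Step]:
    """RCaP Messages for one SCSI Write (section 2.6).

    An STag is Advertised only when solicited data (RDMA Reads) is involved.
    """
    header = ("iSER header with STag and Base Offset" if n_rdma_reads
              else "iSER header")
    flow: List[Step] = [("initiator", f"Send: {header} + SCSI Command PDU")]
    flow += [("initiator", "Send: unsolicited SCSI Data-Out PDU")] * n_unsolicited_pdus
    for _ in range(n_rdma_reads):
        flow.append(("target", "RDMA Read Request"))
        # Generated by the initiator's RCaP layer, not by its iSER Layer.
        flow.append(("initiator RCaP", "RDMA Read Response"))
    flow.append(("target", "Send or Send with Invalidate: SCSI Response PDU"))
    return flow
```

The asymmetry is visible in the output: the initiator's iSER Layer only ever issues Send Messages, while the target drives every RDMA operation.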
           +-------------------------------------+
           |                SCSI                 |
           +-------------------------------------+
           |                iSCSI                |
DI ------> +-------------------------------------+
           |                iSER                 |
           +---------+--------------+------------+
           |  RDMAP  |              |            |
           +---------+  InfiniBand  |            |
           |   DDP   |   Reliable   |   Other    |
           +---------+  Connected   |   RDMA-    |
           |   MPA   |  Transport   |  Capable   |
           +---------+   Service    |  Protocol  |
           |   TCP   |              |            |
           +---------+--------------+------------+
           |         |  InfiniBand  |   Other    |
           |   IP    |   Network    |  Network   |
           |         |    Layer     |   Layer    |
           +---------+--------------+------------+

     Figure 1 Example of iSCSI/iSER Layering in Full Feature Phase

3 Upper Layer Interface Requirements

This section discusses the upper layer interface requirements in the form of an abstract model of the required interactions between the iSCSI Layer and the iSER Layer. The abstract model used here is derived from the architectural model described in [DA]. [DA] also provides a functional overview of the interactions between the iSCSI Layer and the datamover layer as intended by the Datamover Architecture.

The interface requirements are specified by Operational Primitives. An Operational Primitive is an abstract functional interface procedure between the iSCSI Layer and the iSER Layer that requests one layer to perform a specific action on behalf of the other layer or notifies the other layer of some event. Whenever an Operational Primitive is invoked, the Connection_Handle qualifier is used to identify a particular iSCSI connection. For some Operational Primitives, a Data_Descriptor is used to identify the iSCSI/SCSI data buffer associated with the requested or completed operation.

The abstract model and the Operational Primitives defined in this section facilitate the description of the iSER protocol. In the rest of the iSER specification, the compliance statements related to the use of these Operational Primitives are only for the purpose of the required interactions between the iSCSI Layer and the iSER Layer. Note that the compliance statements related to the Operational Primitives in the rest of this specification only mandate functional equivalence on implementations, but do not put any requirements on the implementation specifics of the interface between the iSCSI Layer and the iSER Layer.

Each Operational Primitive is invoked with a set of qualifiers that specify the information context for performing the specific action being requested of the Operational Primitive. While the qualifiers are required, the method of realizing the qualifiers (e.g., by passing them synchronously with the invocation, by retrieving them from the task context, or by retrieving them from shared memory) is implementation dependent.

3.1 Operational Primitives offered by iSER

The iSER protocol layer MUST support the following Operational Primitives to be used by the iSCSI protocol layer.

3.1.1 Send_Control

Input qualifiers: Connection_Handle, BHS and AHS (if any) of the iSCSI PDU, PDU-specific qualifiers

Return results: Not specified

This is used by the iSCSI Layers at the initiator and the target to request the outbound transfer of an iSCSI control-type PDU (see section 7.2). Qualifiers that only apply for a particular control-type PDU are known as PDU-specific qualifiers, e.g., ImmediateDataSize for a SCSI Write command. For details on PDU-specific qualifiers, see section 7.3. The iSCSI Layer can only invoke the Send_Control Operational Primitive when the connection is in iSER-assisted mode.
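The two preconditions of Send_Control that this section states, iSER-assisted mode and a BHS plus optional AHS as input, can be sketched as follows. IserConnectionSketch and its fields are hypothetical; the fragment records the PDU-specific qualifiers without interpreting them, and it omits the iSER header that precedes the PDU in the Send Message.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

BHS_LENGTH = 48  # the iSCSI Basic Header Segment is always 48 bytes


@dataclass
class IserConnectionSketch:
    """Hypothetical iSER-side view of one connection (not a real API)."""
    connection_handle: int
    iser_assisted: bool = False
    outbound: List[bytes] = field(default_factory=list)
    last_qualifiers: Dict[str, int] = field(default_factory=dict)

    def send_control(self, bhs: bytes, ahs: bytes = b"",
                     pdu_specific: Optional[Dict[str, int]] = None) -> None:
        """Accept an iSCSI control-type PDU for transmission in a Send Message."""
        if not self.iser_assisted:
            raise RuntimeError("Send_Control requires iSER-assisted mode")
        if len(bhs) != BHS_LENGTH:
            raise ValueError("BHS must be 48 bytes")
        # PDU-specific qualifiers (e.g. ImmediateDataSize for a SCSI Write)
        # inform how the iSER Layer handles this PDU; here they are only kept.
        self.last_qualifiers = dict(pdu_specific or {})
        # Queue the header segments; the iSER header is omitted in this sketch.
        self.outbound.append(bhs + ahs)
```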
3.1.2 Put_Data

Input qualifiers: Connection_Handle, content of a SCSI Data-in PDU header, Data_Descriptor, Notify_Enable

Return results: Not specified

This is used by the iSCSI Layer at the target to request the outbound transfer of data for a SCSI Data-in PDU from the buffer identified by the Data_Descriptor qualifier. The iSCSI Layer can only invoke the Put_Data Operational Primitive when the connection is in iSER-assisted mode.

The Notify_Enable qualifier is used to indicate to the iSER Layer whether or not it should generate an eventual local completion notification to the iSCSI Layer. See section 3.2.2 on Data_Completion_Notify for details.

3.1.3 Get_Data

Input qualifiers: Connection_Handle, content of an R2T PDU, Data_Descriptor, Notify_Enable

Return results: Not specified

This is used by the iSCSI Layer at the target to request the inbound transfer of solicited data requested by an R2T PDU into the buffer identified by the Data_Descriptor qualifier. The iSCSI Layer can only invoke the Get_Data Operational Primitive when the connection is in iSER-assisted mode.

The Notify_Enable qualifier is used to indicate to the iSER Layer whether or not it should generate the eventual local completion notification to the iSCSI Layer. See section 3.2.2 on Data_Completion_Notify for details.

3.1.4 Allocate_Connection_Resources

Input qualifiers: Connection_Handle, Resource_Descriptor (optional)

Return results: Status

This is used by the iSCSI Layers at the initiator and the target to request the allocation of all connection resources necessary to support RCaP for an operational iSCSI/iSER connection. The iSCSI Layer may optionally specify the implementation-specific resource requirements for the iSCSI connection using the Resource_Descriptor qualifier.
A return result of Status=success means that the invocation succeeded, and a return result of Status=failure means that the invocation failed. If the invocation is for a Connection_Handle for which an earlier invocation succeeded, the request will be ignored by the iSER Layer and the result of Status=success will be returned. Only one Allocate_Connection_Resources Operational Primitive invocation can be outstanding for a given Connection_Handle at any time.

3.1.5 Deallocate_Connection_Resources

Input qualifiers: Connection_Handle

Return results: Not specified

This is used by the iSCSI Layers at the initiator and the target to request the deallocation of all connection resources that were allocated earlier as a result of a successful invocation of the Allocate_Connection_Resources Operational Primitive.

3.1.6 Enable_Datamover

Input qualifiers: Connection_Handle, Transport_Connection_Descriptor, Final_Login_Response_PDU (optional)

Return results: Not specified

This is used by the iSCSI Layers at the initiator and the target to request that iSER-assisted mode be used for the connection. The Transport_Connection_Descriptor qualifier is used to identify the specific connection associated with the Connection_Handle. The iSCSI Layer can only invoke the Enable_Datamover Operational Primitive when there was a corresponding prior resource allocation.

The Final_Login_Response_PDU input qualifier is applicable only for a target, and contains the final Login Response PDU that concludes the iSCSI Login Phase.

3.1.7 Connection_Terminate

Input qualifiers: Connection_Handle

Return results: Not specified

This is used by the iSCSI Layers at the initiator and the target to request that a specified iSCSI/iSER connection be terminated and all associated connection and task resources be freed. When this Operational Primitive invocation returns to the iSCSI Layer, the iSCSI Layer may assume full ownership of all iSCSI-level resources, e.g., I/O Buffers, associated with the connection.

3.1.8 Notice_Key_Values

Input qualifiers: Connection_Handle, number of keys, list of Key-Value pairs

Return results: Not specified

This is used by the iSCSI Layers at the initiator and the target to request the iSER Layer to take note of the specified Key-Value pairs that were negotiated by the iSCSI peers for the connection.

3.1.9 Deallocate_Task_Resources

Input qualifiers: Connection_Handle, ITT

Return results: Not specified

This is used by the iSCSI Layers at the initiator and the target to request the deallocation of all RCaP-specific resources allocated by the iSER Layer for the task identified by the ITT qualifier. The iSER Layer may require a certain number of RCaP-specific resources associated with the ITT for each new iSCSI task. In the normal course of execution, these task-level resources in the iSER Layer are assumed to be transparently allocated on each task initiation and deallocated on the conclusion of each task as appropriate. In exception scenarios where the task does not conclude with a SCSI Response PDU, the iSER Layer needs to be notified of the individual task terminations to aid its task-level resource management. This Operational Primitive is used for this purpose, and is not needed when a SCSI Response PDU normally concludes a task. Note that RCaP-specific task resources are deallocated by the iSER Layer when a SCSI Response PDU normally concludes a task, even if the SCSI Status was not success.

3.2 Operational Primitives used by iSER

The iSER Layer MUST use the following Operational Primitives offered by the iSCSI protocol layer when the connection is in iSER-assisted mode.
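The resource semantics of sections 3.1.4 through 3.1.9 can be sketched as per-connection and per-task bookkeeping inside the iSER Layer. The Python class below is hypothetical: max_connections stands in for real RCaP resource limits, and because every call is synchronous, the rule that only one Allocate_Connection_Resources invocation may be outstanding per Connection_Handle holds trivially.

```python
from typing import Dict, Set


class IserResourceSketch:
    """Hypothetical iSER-side bookkeeping for sections 3.1.4 to 3.1.9."""

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections  # stand-in for RCaP limits
        self.allocated: Set[int] = set()        # Connection_Handles
        self.enabled: Set[int] = set()          # in iSER-assisted mode
        self.tasks: Dict[int, Set[int]] = {}    # Connection_Handle -> ITTs

    def allocate_connection_resources(self, handle: int) -> str:
        if handle in self.allocated:
            return "success"  # repeat after success: ignored, success returned
        if len(self.allocated) >= self.max_connections:
            return "failure"
        self.allocated.add(handle)
        self.tasks[handle] = set()
        return "success"

    def enable_datamover(self, handle: int) -> None:
        if handle not in self.allocated:
            raise RuntimeError("Enable_Datamover needs a prior resource allocation")
        self.enabled.add(handle)

    def start_task(self, handle: int, itt: int) -> None:
        # Task-level resources are allocated transparently on task initiation.
        self.tasks[handle].add(itt)

    def on_scsi_response(self, handle: int, itt: int) -> None:
        # A SCSI Response PDU concludes the task, whatever its SCSI Status.
        self.tasks[handle].discard(itt)

    def deallocate_task_resources(self, handle: int, itt: int) -> None:
        # Exception path: the task ended without a SCSI Response PDU.
        self.tasks[handle].discard(itt)

    def deallocate_connection_resources(self, handle: int) -> None:
        self.allocated.discard(handle)
        self.enabled.discard(handle)
        self.tasks.pop(handle, None)
```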
3.2.1 Control_Notify

Input qualifiers: Connection_Handle, an iSCSI control-type PDU

Return results: Not specified

This is used by the iSER Layers at the initiator and the target to notify the iSCSI Layer of the availability of an inbound iSCSI control-type PDU. A PDU is described as "available" to the iSCSI Layer when the iSER Layer notifies the iSCSI Layer of the reception of that inbound PDU, along with an implementation-specific indication as to where the received PDU is.

3.2.2 Data_Completion_Notify

Input qualifiers: Connection_Handle, ITT, SN

Return results: Not specified

This is used by the iSER Layer to notify the iSCSI Layer of the completion of an outbound data transfer requested by the iSCSI Layer, but only if the invocation of the Put_Data Operational Primitive (see section 3.1.2) was qualified with Notify_Enable set. In this case, SN refers to the DataSN associated with the SCSI Data-In PDU.

It is also used by the iSER Layer to notify the iSCSI Layer of the completion of an inbound data transfer requested by the iSCSI Layer, but only if the invocation of the Get_Data Operational Primitive (see section 3.1.3) was qualified with Notify_Enable set. In this case, SN refers to the R2TSN associated with the R2T PDU.

3.2.3 Data_ACK_Notify

Input qualifiers: Connection_Handle, ITT, DataSN

Return results: Not specified

This is used by the iSER Layer at the target to notify the iSCSI Layer of the arrival of the data acknowledgement (as defined in [iSCSI]) requested earlier by the iSCSI Layer for the outbound data transfer via an invocation of the Put_Data Operational Primitive where the A-bit in the SCSI Data-in PDU is set to 1. See section 7.3.5. DataSN refers to the expected DataSN of the next SCSI Data-in PDU, i.e., the one that immediately follows the SCSI Data-in PDU with the A-bit set to which this notification corresponds, with semantics as defined in [iSCSI].

3.2.4 Connection_Terminate_Notify

Input qualifiers: Connection_Handle

Return results: Not specified

This is used by the iSER Layers at the initiator and the target to notify the iSCSI Layer of the unsolicited termination or failure of an iSCSI/iSER connection. The iSER Layer MUST deallocate the connection and task resources associated with the terminated connection before the invocation of this Operational Primitive. Note that the Connection_Terminate_Notify Operational Primitive is not invoked when the termination of the connection was earlier requested by the local iSCSI Layer.

3.3 iSCSI Protocol Usage Requirements

To operate in an iSER-assisted mode, the iSCSI Layers at both the initiator and the target MUST negotiate the RDMAExtensions key (see section 6.3) to "Yes" on the leading connection. If the RDMAExtensions key is not negotiated to "Yes", then iSER-assisted mode MUST NOT be used. If the RDMAExtensions key is negotiated to "Yes" but the invocation of the Allocate_Connection_Resources Operational Primitive to the iSER Layer fails, the iSCSI Layer MUST fail the iSCSI Login process or terminate the connection as appropriate. See section 10.1.3.1 for details.

If the RDMAExtensions key is negotiated to "Yes", the iSCSI Layer MUST satisfy the following protocol usage requirements from the iSER protocol:

1. The iSCSI Layer at the initiator MUST set ExpDataSN to 0 in Task Management Function Requests for Task Allegiance Reassignment for read/bidirectional commands, so as to cause the target to send all unacknowledged read data.

2. The iSCSI Layer at the target MUST always return the SCSI status in a separate SCSI Response PDU for read commands, i.e., there MUST NOT be a "phase collapse" in concluding a SCSI Read Command.

3. The iSCSI Layers at both the initiator and the target MUST support the keys as defined in section 6 on Login/Text Operational Keys. If used as specified, these keys MUST NOT be answered with NotUnderstood, and the semantics as defined MUST be followed for each iSER-assisted connection.

4. The iSCSI Layer at the initiator MUST NOT issue SNACKs for PDUs.

4 Lower Layer Interface Requirements

4.1 Interactions with the RCaP Layer

The iSER protocol layer is layered on top of an RCaP layer (see Figure 1), and the following are the key features that are assumed to be supported by any RCaP layer:

* The RCaP layer supports all basic RDMA operations, including the RDMA Write Operation, RDMA Read Operation, and Send Operation.

* The RCaP layer provides reliable, in-order message delivery and direct data placement.

* When the iSER Layer initiates an RDMA Read Operation following an RDMA Write Operation on one RCaP Stream, the RDMA Read Response Message processing on the remote node will be started only after the preceding RDMA Write Message payload is placed in the memory of the remote node.

* The RCaP layer encapsulates a single iSER Message into a single RCaP Message on the Data Source side. The RCaP layer decapsulates the iSER Message before delivering it to the iSER Layer on the Data Sink side.

* For an RCaP layer that supports the Send with Invalidate Message (e.g., iWARP), when the iSER Layer provides the STag to be remotely invalidated to the RCaP layer for a Send with Invalidate Message, the RCaP layer uses this STag as the STag to be invalidated in the Send with Invalidate Message.

* The RCaP layer uses the STag and Tagged Offset provided by the iSER Layer for the RDMA Write and RDMA Read Request Messages.

* When the RCaP layer delivers the content of an RDMA Send Message to the iSER Layer, the RCaP layer provides the length of the RDMA Send Message. This ensures that the iSER Layer does not have to carry a length field in the iSER header.

* When the RCaP layer delivers the Send Message to the iSER Layer, it notifies the iSER Layer with the mechanism provided on that interface.

* For an RCaP layer that supports the Send with Invalidate Message (e.g., iWARP), when the RCaP layer delivers a Send with Invalidate Message to the iSER Layer, it passes the value of the STag that was invalidated.

* The RCaP layer propagates all status and error indications to the iSER Layer.

* For a transport layer that operates in byte stream mode such as TCP, the RCaP implementation supports the enabling of the RDMA mode after Connection establishment and the exchange of Login parameters in byte stream mode. For a transport layer that provides message delivery capability such as [IB], the RCaP implementation supports the use of the messaging capability by the iSCSI Layer directly for the Login phase after connection establishment before enabling iSER-assisted mode. (In the specific example of InfiniBand [IB], the iSCSI Layer uses IB messages to transfer iSCSI PDUs for the Login phase after connection establishment before enabling iSER-assisted mode.)

* Whenever the iSER Layer terminates the RCaP Stream, the RCaP layer terminates the associated Connection.

4.2 Interactions with the Transport Layer

After the iSER connection is established, the RCaP layer and the underlying transport layer are responsible for maintaining the Connection and reporting any Connection failures to the iSER Layer.
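Because the RCaP layer reports the length of each delivered Send Message, the iSER Layer can locate the iSCSI PDU that follows the iSER header without any length field of its own. In the Python sketch below, split_send_message is a hypothetical helper; BHS_LENGTH is the fixed 48-byte iSCSI Basic Header Segment, while ISER_HEADER_LEN is a placeholder that an implementation would take from the iSER header format defined in this specification.

```python
from typing import Tuple

# BHS size is fixed by iSCSI; ISER_HEADER_LEN is an assumed placeholder
# for the fixed iSER header length, used here for illustration only.
BHS_LENGTH = 48
ISER_HEADER_LEN = 28


def split_send_message(recv_buf: bytes, length: int) -> Tuple[bytes, bytes, bytes]:
    """Split a received Send Message into (iSER header, BHS, remainder).

    length is the message length reported by the RCaP layer on delivery,
    so no length field inside the iSER header is needed.
    """
    if length > len(recv_buf):
        raise ValueError("reported length exceeds the receive buffer")
    if length < ISER_HEADER_LEN + BHS_LENGTH:
        raise ValueError("message too short for an iSER header and a BHS")
    msg = recv_buf[:length]
    header = msg[:ISER_HEADER_LEN]
    bhs = msg[ISER_HEADER_LEN:ISER_HEADER_LEN + BHS_LENGTH]
    remainder = msg[ISER_HEADER_LEN + BHS_LENGTH:]  # AHS and data segment, if any
    return header, bhs, remainder
```

Any AHS and data segment in the remainder would then be interpreted using the TotalAHSLength and DataSegmentLength fields of the BHS, as in [iSCSI].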
1263 5 Connection Setup and Termination 1265 5.1 iSCSI/iSER Connection Setup 1267 During connection setup, the iSCSI Layer at the initiator is 1268 responsible for establishing a connection with the target. After 1269 the connection is established, the iSCSI Layers at the initiator and 1270 the target enter the Login Phase using the same rules as outlined in 1271 [iSCSI]. The connection transitions into the iSCSI full feature 1272 phase in iSER-assisted mode following a successful login negotiation 1273 between the initiator and the target in which iSER-assisted mode is 1274 negotiated and the connection resources necessary to support RCaP 1275 have been allocated at both the initiator and the target. The same 1276 connection MUST be used for both the iSCSI Login phase and the 1277 subsequent iSER-assisted full feature phase. 1279 For a transport layer that operates in byte stream mode such as TCP, 1280 the RCaP implementation supports the enabling of the RDMA mode after 1281 Connection establishment and the exchange of Login parameters in 1282 byte stream mode. For a transport layer that provides message 1283 delivery capability such as [IB], the RCaP implementation supports 1284 the use of the messaging capability by the iSCSI Layer directly for 1285 the Login phase after connection establishment before enabling iSER- 1286 assisted mode. 1288 iSER-assisted mode MUST be enabled only if it is negotiated on the 1289 leading connection during the LoginOperationalNegotiation Stage of 1290 the iSCSI Login Phase. iSER-assisted mode is negotiated using the 1291 RDMAExtensions= key. Both the initiator and the 1292 target MUST exchange the RDMAExtensions key with the value set to 1293 "Yes" to enable iSER-assisted mode. If both the initiator and the 1294 target fail to negotiate the RDMAExtensions key set to "Yes", then 1295 the connection MUST continue with the login semantics as defined in 1296 [iSCSI]. 
If the RDMAExtensions key is not negotiated to Yes, then 1297 for some RCaP implementation (such as [IB]), the existing connection 1298 may need to be torn down and a new connection may need to be 1299 established in TCP capable mode. (For InfiniBand this will require 1300 an [IPoIB] type connection.) 1302 iSER-assisted mode is defined for a Normal session only and the 1303 RDMAExtensions key MUST NOT be negotiated for a Discovery session. 1304 Discovery sessions are always conducted using the transport layer as 1305 described in [iSCSI]. 1307 An iSER enabled node is not required to initiate the RDMAExtensions 1308 key exchange if its preference is for the Traditional iSCSI mode. 1309 The RDMAExtensions key, if offered, MUST be sent in the first 1310 available Login Response or Login Request PDU in the 1311 LoginOperationalNegotiation stage. This is due to the fact that the 1312 value of some login parameters might depend on whether iSER-assisted 1313 mode is enabled or not. 1315 iSER-assisted mode is a session-wide attribute. If both the 1316 initiator and the target negotiated RDMAExtensions="Yes" on the 1317 leading connection of a session, then all subsequent connections of 1318 the same session MUST enable iSER-assisted mode without having to 1319 exchange RDMAExtensions key during the iSCSI Login Phase. 1320 Conversely, if both the initiator and the target failed to negotiate 1321 RDMAExtensions to "Yes" on the leading connection of a session, then 1322 the RDMAExtensions key MUST NOT be negotiated further on any 1323 additional subsequent connection of the session. 1325 When the RDMAExtensions key is negotiated to "Yes", the HeaderDigest 1326 and the DataDigest keys MUST be negotiated to "None" on all 1327 iSCSI/iSER connections participating in that iSCSI session. This is 1328 because, for an iSCSI/iSER connection, RCaP is responsible for 1329 providing error detection that is at least as good as a 32-bit CRC 1330 for all iSER Messages. 
   Furthermore, all SCSI Read data are sent using RDMA Write Messages
   instead of SCSI Data-in PDUs, and all solicited SCSI Write data are
   sent using RDMA Read Response Messages instead of SCSI Data-out
   PDUs. HeaderDigest and DataDigest, which apply to iSCSI PDUs, would
   not be appropriate for the RDMA Read and RDMA Write operations used
   with iSER.

5.1.1 Initiator Behavior

   If the outcome of the iSCSI negotiation is to enable iSER-assisted
   mode, then on the initiator side, prior to sending the Login Request
   with the T (Transit) bit set to 1 and the NSG (Next Stage) field set
   to FullFeaturePhase, the iSCSI Layer MUST request the iSER Layer to
   allocate the connection resources necessary to support RCaP by
   invoking the Allocate_Connection_Resources Operational Primitive.
   The connection resources required are implementation defined and
   are outside the scope of this specification. The iSCSI Layer may
   invoke the Notice_Key_Values Operational Primitive before invoking
   the Allocate_Connection_Resources Operational Primitive to request
   the iSER Layer to take note of the negotiated values of the iSCSI
   keys for the Connection. The specific keys to be passed in as input
   qualifiers are implementation dependent. These may include, but are
   not limited to, MaxOutstandingR2T and ErrorRecoveryLevel.

   Among the connection resources allocated at the initiator is the
   Inbound RDMA Read Queue Depth (IRD). As described in Section 9.5.1,
   R2Ts are transformed by the target into RDMA Read operations. IRD
   limits the maximum number of simultaneously outstanding incoming
   RDMA Read Requests per RCaP Stream from the target to the
   initiator. The required value of IRD is outside the scope of the
   iSER specification. The iSER Layer at the initiator MUST set IRD to
   1 or higher if R2Ts are to be used in the connection.
   However, the iSER Layer at the initiator MAY set IRD to 0 based on
   an implementation configuration indicating that no R2Ts will be used
   on that connection. Initially, the iSER-IRD value at the initiator
   SHOULD be set to the IRD value at the initiator and MUST NOT be more
   than the IRD value.

   On the other hand, the Outbound RDMA Read Queue Depth (ORD) MAY be
   set to 0, since the iSER Layer at the initiator does not issue RDMA
   Read Requests to the target.

   Failure to allocate the requested connection resources locally
   results in a login failure; its handling is described in Section
   10.1.3.1.

   If the iSER Layer at the initiator is successful in allocating the
   connection resources necessary to support RCaP, the following events
   MUST occur in the specified sequence:

   1. The iSER Layer MUST return a success status to the iSCSI Layer
      in response to the Allocate_Connection_Resources Operational
      Primitive.

   2. After the target returns the Login Response with the T bit set
      to 1, the NSG field set to FullFeaturePhase, and a status class
      of 0 (Success), the iSCSI Layer MUST invoke the Enable_Datamover
      Operational Primitive with the following qualifiers (see Section
      10.1.4.6 for the case when the status class is not Success):

      a. Connection_Handle, which identifies the iSCSI connection.

      b. Transport_Connection_Descriptor, which identifies the
         specific transport connection associated with the
         Connection_Handle.

   3. The iSER Layer MUST send the iSER Hello Message as the first
      iSER Message only if iSERHelloRequired is negotiated to "Yes".
      See Section 5.1.3 on the iSER Hello Exchange.
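
   The initiator-side sizing rules above can be sketched as follows.
   This is a minimal, non-normative illustration: the type and function
   names are hypothetical, a single integer stands in for the locally
   configured IRD, and the optional IRD reduction the initiator MAY
   perform after receiving the iSER HelloReply Message (Section 5.1.3)
   is included for completeness.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class InitiatorQueues:
    """Hypothetical per-connection RDMA Read queue settings at the initiator."""
    ird: int       # Inbound RDMA Read Queue Depth (bounds target-issued RDMA Reads)
    iser_ird: int  # value to be declared in the iSER Hello Message
    ord: int       # Outbound RDMA Read Queue Depth


def size_initiator_queues(configured_ird: int, r2ts_used: bool) -> InitiatorQueues:
    """Apply the Section 5.1.1 rules to a locally configured IRD (assumed input)."""
    if configured_ird < 0:
        raise ValueError("IRD cannot be negative")
    if r2ts_used and configured_ird < 1:
        # IRD MUST be 1 or higher if R2Ts are to be used on the connection.
        raise ValueError("IRD must be at least 1 when R2Ts are used")
    # iSER-IRD SHOULD start equal to IRD (and MUST NOT exceed it).
    # ORD MAY be 0 because the initiator never issues RDMA Read Requests.
    return InitiatorQueues(ird=configured_ird, iser_ird=configured_ird, ord=0)


def adjust_after_hello_reply(q: InitiatorQueues, target_iser_ord: int) -> InitiatorQueues:
    """Optionally (MAY) lower IRD to the iSER-ORD declared in the HelloReply.

    The already-declared iSER-IRD is left unchanged.
    """
    if target_iser_ord < q.iser_ird:
        return replace(q, ird=target_iser_ord)
    return q
```

   For example, an initiator configured with an IRD of 4 that receives
   a HelloReply declaring an iSER-ORD of 2 may shrink its IRD to 2.
   With r2ts_used=False, an IRD of 0 is accepted.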

5.1.2 Target Behavior

   If the outcome of the iSCSI negotiation is to enable iSER-assisted
   mode, then on the target side, prior to sending the Login Response
   with the T (Transit) bit set to 1 and the NSG (Next Stage) field set
   to FullFeaturePhase, the iSCSI Layer MUST request the iSER Layer to
   allocate the resources necessary to support RCaP by invoking the
   Allocate_Connection_Resources Operational Primitive. The connection
   resources required are implementation defined and are outside the
   scope of this specification. Optionally, the iSCSI Layer may invoke
   the Notice_Key_Values Operational Primitive before invoking the
   Allocate_Connection_Resources Operational Primitive to request the
   iSER Layer to take note of the negotiated values of the iSCSI keys
   for the Connection. The specific keys to be passed in as input
   qualifiers are implementation dependent. These may include, but are
   not limited to, MaxOutstandingR2T and ErrorRecoveryLevel.

   To minimize the potential for a denial-of-service attack, the iSCSI
   Layer MUST NOT request the iSER Layer to allocate the connection
   resources necessary to support RCaP until the iSCSI Layer is
   sufficiently far along in the iSCSI Login Phase that it is
   reasonably certain that the peer side is not an attacker. In
   particular, if the Login Phase includes a SecurityNegotiation stage,
   the iSCSI Layer MUST defer the connection resource allocation (i.e.,
   invoking the Allocate_Connection_Resources Operational Primitive) to
   the LoginOperationalNegotiation stage ([iSCSI]) so that the resource
   allocation occurs after the authentication phase is completed.

   Among the connection resources allocated at the target is the
   Outbound RDMA Read Queue Depth (ORD). As described in Section
   9.5.1, R2Ts are transformed by the target into RDMA Read operations.
   The ORD limits the maximum number of simultaneously outstanding RDMA
   Read Requests per RCaP Stream from the target to the initiator.
   Initially, the iSER-ORD value at the target SHOULD be set to the ORD
   value at the target.

   On the other hand, the IRD at the target MAY be set to 0, since the
   iSER Layer at the target does not expect RDMA Read Requests to be
   issued by the initiator.

   Failure to allocate the requested connection resources locally
   results in a login failure; its handling is described in Section
   10.1.3.1.

   If the iSER Layer at the target is successful in allocating the
   connection resources necessary to support RCaP, the following events
   MUST occur in the specified sequence:

   1. The iSER Layer MUST return a success status to the iSCSI Layer
      in response to the Allocate_Connection_Resources Operational
      Primitive.

   2. The iSCSI Layer MUST invoke the Enable_Datamover Operational
      Primitive with the following qualifiers:

      a. Connection_Handle, which identifies the iSCSI connection.

      b. Transport_Connection_Descriptor, which identifies the
         specific transport connection associated with the
         Connection_Handle.

      c. The final transport layer (e.g., TCP) message containing the
         Login Response with the T bit set to 1 and the NSG field set
         to FullFeaturePhase.

   3. The iSER Layer MUST send the final Login Response PDU in the
      native transport mode to conclude the iSCSI Login Phase. If the
      underlying transport is TCP, then the iSER Layer MUST send the
      final Login Response PDU in byte stream mode.

   4. If iSERHelloRequired is negotiated to "Yes", then after
      receiving the iSER Hello Message from the initiator, the iSER
      Layer MUST respond with the iSER HelloReply Message, to be sent
      as the first iSER Message.
      If the iSER Layer receives an iSER Hello Message when
      iSERHelloRequired is negotiated to "No", then this MUST be
      treated as an iSER protocol error. See Section 5.1.3 on the iSER
      Hello Exchange for more details.

   Note: In the above sequence, the operations described in steps 3
   and 4 MUST be performed atomically for iWARP connections. Failure
   to do so may result in race conditions.

5.1.3 iSER Hello Exchange

   If iSERHelloRequired is negotiated to "Yes", the first iSER Message
   sent by the iSER Layer at the initiator to the target MUST be the
   iSER Hello Message. The iSER Hello Message is used by the iSER
   Layer at the initiator to declare iSER parameters to the target.
   See Section 9.3 on the iSER Header Format for the iSER Hello
   Message. Conversely, if iSERHelloRequired is negotiated to "No",
   then the iSER Layer at the initiator MUST NOT send an iSER Hello
   Message.

   In response to the iSER Hello Message, the iSER Layer at the target
   MUST return the iSER HelloReply Message as the first iSER Message
   sent by the target if iSERHelloRequired is negotiated to "Yes". The
   iSER HelloReply Message is used by the iSER Layer at the target to
   declare iSER parameters to the initiator. See Section 9.4 on the
   iSER Header Format for the iSER HelloReply Message. If the iSER
   Layer receives an iSER Hello Message when iSERHelloRequired is
   negotiated to "No", then this MUST be treated as an iSER protocol
   error. See Section 10.1.3.4 on iSER Protocol Errors for more
   details.

   In the iSER Hello Message, the iSER Layer at the initiator declares
   the iSER-IRD value to the target.

   Upon receiving the iSER Hello Message, the iSER Layer at the target
   MUST set the iSER-ORD value to the minimum of the iSER-ORD value at
   the target and the iSER-IRD value declared by the initiator.
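
   A minimal sketch of the target-side processing of the iSER Hello
   Message follows. It computes the iSER-ORD to be returned in the
   HelloReply and applies the two negotiation-failure conditions listed
   later in this section. The names are hypothetical. Supported
   protocol versions are modeled as inclusive (minimum, maximum)
   ranges, which is an assumption of this sketch and not a description
   of the header fields in Section 9.3. The exception merely stands in
   for the handling described in Section 10.1.3.2.

```python
from dataclasses import dataclass
from typing import Tuple


class HelloNegotiationError(Exception):
    """Stands in for the iSER negotiation failure handling (Section 10.1.3.2)."""


@dataclass(frozen=True)
class HelloOutcome:
    iser_ord: int  # value to declare in the iSER HelloReply Message
    ord: int       # target ORD after the optional reduction


def process_hello(target_iser_ord: int, target_ord: int, initiator_iser_ird: int,
                  initiator_versions: Tuple[int, int],
                  target_versions: Tuple[int, int],
                  release_unused_ord: bool = True) -> HelloOutcome:
    # Version ranges are an assumed (min, max) model; they must overlap.
    lo = max(initiator_versions[0], target_versions[0])
    hi = min(initiator_versions[1], target_versions[1])
    if lo > hi:
        raise HelloNegotiationError("iSER protocol versions do not overlap")
    # iSER-ORD MUST become min(local iSER-ORD, initiator-declared iSER-IRD).
    iser_ord = min(target_iser_ord, initiator_iser_ird)
    if initiator_iser_ird > 0 and iser_ord == 0:
        # The initiator can accept RDMA Reads, but the target offers none.
        raise HelloNegotiationError("iSER-IRD > 0 but iSER-ORD is 0")
    # The target MAY lower its ORD to iSER-ORD to free unused resources.
    new_ord = iser_ord if release_unused_ord and iser_ord < target_ord else target_ord
    return HelloOutcome(iser_ord=iser_ord, ord=new_ord)
```

   For instance, a target whose ORD and iSER-ORD are both 8 that
   receives an iSER-IRD of 4 declares an iSER-ORD of 4 and may release
   the four unused outbound read slots.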

   The iSER Layer at the target MAY adjust (lower) its ORD value to
   match the iSER-ORD value, if the iSER-ORD value is smaller than the
   ORD value at the target, in order to free up unused resources.

   In the iSER HelloReply Message, the iSER Layer at the target
   declares the iSER-ORD value to the initiator.

   Upon receiving the iSER HelloReply Message, the iSER Layer at the
   initiator MAY adjust (lower) its IRD value to match the iSER-ORD
   value in order to free up unused resources, if the iSER-ORD value
   declared by the target is smaller than the iSER-IRD value declared
   by the initiator.

   It is an iSER-level negotiation failure if the iSER parameters
   declared in the iSER Hello Message by the initiator are unacceptable
   to the target. This includes the following:

   * The initiator-declared iSER-IRD value is greater than 0 and the
     target-declared iSER-ORD value is 0.

   * The initiator-supported and the target-supported iSER protocol
     versions do not overlap.

   See Section 10.1.3.2 on the handling of the error situation.

5.2 iSCSI/iSER Connection Termination

5.2.1 Normal Connection Termination at the Initiator

   The iSCSI Layer at the initiator terminates an iSCSI/iSER connection
   normally by invoking the Send_Control Operational Primitive
   qualified with the Logout Request PDU. The iSER Layer at the
   initiator MUST use a Send Message to send the Logout Request PDU to
   the target. The SendSE Message should be used if supported by the
   RCaP layer (e.g., iWARP). After the iSER Layer at the initiator
   receives the Send Message containing the Logout Response PDU from
   the target, it MUST notify the iSCSI Layer by invoking the
   Control_Notify Operational Primitive qualified with the Logout
   Response PDU.

   After the iSCSI logout process is complete, the iSCSI Layer at the
   target is responsible for closing the iSCSI/iSER connection as
   described in Section 5.2.2. After the RCaP layer at the initiator
   reports that the Connection has been closed, the iSER Layer at the
   initiator MUST deallocate all connection and task resources (if any)
   associated with the connection and invalidate the Local Mappings (if
   any) before notifying the iSCSI Layer by invoking the
   Connection_Terminate_Notify Operational Primitive.

5.2.2 Normal Connection Termination at the Target

   Upon receiving the Send Message containing the Logout Request PDU,
   the iSER Layer at the target MUST notify the iSCSI Layer at the
   target by invoking the Control_Notify Operational Primitive
   qualified with the Logout Request PDU. The iSCSI Layer completes
   the logout process by invoking the Send_Control Operational
   Primitive qualified with the Logout Response PDU. The iSER Layer at
   the target MUST use a Send Message to send the Logout Response PDU
   to the initiator. The SendSE Message should be used if supported by
   the RCaP layer (e.g., iWARP). After the iSCSI logout process is
   complete, the iSCSI Layer at the target MUST request the iSER Layer
   at the target to terminate the RCaP Stream by invoking the
   Connection_Terminate Operational Primitive.

   As part of the termination process, the RCaP layer MUST close the
   Connection. When the RCaP layer notifies the iSER Layer that the
   RCaP Stream and the associated Connection have been terminated, the
   iSER Layer MUST deallocate all connection and task resources (if
   any) associated with the connection and invalidate the Local and
   Remote Mappings (if any).
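
   The cleanup that follows the closing of the Connection differs only
   slightly between the two sides, and the same pattern recurs in
   Section 5.2.3. The hypothetical helper below lists, in order, the
   actions the iSER Layer takes once the RCaP layer reports the
   Connection closed. The action strings are illustrative labels, not
   Operational Primitive qualifiers.

```python
from typing import List


def close_actions(role: str, terminate_invoked_locally: bool) -> List[str]:
    """Ordered iSER Layer actions after the RCaP layer reports the Connection closed.

    role: "initiator" or "target".
    terminate_invoked_locally: True if the local iSCSI Layer invoked
    Connection_Terminate (e.g., the target after a normal logout).
    """
    if role not in ("initiator", "target"):
        raise ValueError("role must be 'initiator' or 'target'")
    actions = ["deallocate connection and task resources",
               "invalidate Local Mappings"]
    if role == "target":
        # Per Sections 5.2.1-5.2.3, Remote Mappings are invalidated at the target.
        actions.append("invalidate Remote Mappings")
    if not terminate_invoked_locally:
        # Cleanup MUST precede the notification to the iSCSI Layer.
        actions.append("invoke Connection_Terminate_Notify")
    return actions
```

   After a normal logout, close_actions("initiator", False) ends with
   the Connection_Terminate_Notify invocation required by Section
   5.2.1, whereas close_actions("target", True) performs cleanup only,
   because the target's iSCSI Layer itself requested the termination.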

5.2.3 Termination without Logout Request/Response PDUs

5.2.3.1 Connection Termination Initiated by the iSCSI Layer

   The Connection_Terminate Operational Primitive MAY be invoked by the
   iSCSI Layer to request the iSER Layer to terminate the RCaP Stream
   without the Logout Request and Logout Response PDUs having
   previously been exchanged between the two iSCSI/iSER nodes. As part
   of the termination process, the RCaP layer will close the
   Connection. When the RCaP layer notifies the iSER Layer that the
   RCaP Stream and the associated Connection have been terminated, the
   iSER Layer MUST perform the following actions.

   If the Connection_Terminate Operational Primitive is invoked by the
   iSCSI Layer at the target, then the iSER Layer at the target MUST
   deallocate all connection and task resources (if any) associated
   with the connection and invalidate the Local and Remote Mappings
   (if any).

   If the Connection_Terminate Operational Primitive is invoked by the
   iSCSI Layer at the initiator, then the iSER Layer at the initiator
   MUST deallocate all connection and task resources (if any)
   associated with the connection and invalidate the Local Mappings
   (if any).

5.2.3.2 Connection Termination Notification to the iSCSI Layer

   If the iSCSI/iSER connection is terminated without the invocation of
   Connection_Terminate by the iSCSI Layer, the iSER Layer MUST notify
   the iSCSI Layer that the iSCSI/iSER connection has been terminated
   by invoking the Connection_Terminate_Notify Operational Primitive.

   Prior to invoking Connection_Terminate_Notify, the iSER Layer at the
   target MUST deallocate all connection and task resources (if any)
   associated with the connection and invalidate the Local and Remote
   Mappings (if any).

   Prior to invoking Connection_Terminate_Notify, the iSER Layer at the
   initiator MUST deallocate all connection and task resources (if any)
   associated with the connection and invalidate the Local Mappings
   (if any).

   If the remote iSCSI/iSER node initiated the closing of the
   Connection (e.g., by sending a TCP FIN or TCP RST), the iSER Layer
   MUST notify the iSCSI Layer by invoking the
   Connection_Terminate_Notify Operational Primitive after the RCaP
   layer reports that the Connection is closed.

   Another example of a Connection termination without a preceding
   logout is when the iSCSI Layer at the initiator performs an implicit
   logout (connection reinstatement).

6 Login/Text Operational Keys

   Certain iSCSI login/text operational keys have restricted usage in
   iSER, and additional keys are used to support the iSER protocol
   functionality. All other keys defined in [iSCSI] and not discussed
   in this section may be used on iSCSI/iSER connections with the same
   semantics.

6.1 HeaderDigest and DataDigest

   Irrelevant when: RDMAExtensions=Yes

   Negotiations resulting in RDMAExtensions=Yes for a session imply
   HeaderDigest=None and DataDigest=None for all connections in that
   session and override both the default and an explicit setting.

6.2 MaxRecvDataSegmentLength

   For an iSCSI connection belonging to a session in which
   RDMAExtensions=Yes was negotiated on the leading connection of the
   session, MaxRecvDataSegmentLength need not be declared in the Login
   Phase and MUST be ignored if it is declared. Instead, the
   InitiatorRecvDataSegmentLength (as described in Section 6.5) and
   TargetRecvDataSegmentLength (as described in Section 6.4) keys are
   negotiated. The values of the local and remote
   MaxRecvDataSegmentLength are derived from the
   InitiatorRecvDataSegmentLength and TargetRecvDataSegmentLength keys.
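
   The derivation, spelled out in the following paragraph, can be
   summarized by the sketch below, together with the NOP-Out Ping Data
   limit noted after it. The function names are hypothetical and the
   offered values are assumed to be already parsed into integers. Both
   keys use the "minimum" result function (Sections 6.4 and 6.5), and
   any declared MaxRecvDataSegmentLength is simply ignored.

```python
from typing import Tuple


def negotiate_min(initiator_value: int, target_value: int) -> int:
    # Result function is "minimum" for both *RecvDataSegmentLength keys.
    return min(initiator_value, target_value)


def effective_max_recv_dsl(role: str, initiator_rdsl: int,
                           target_rdsl: int) -> Tuple[int, int]:
    """Return (local, remote) MaxRecvDataSegmentLength in the full feature phase."""
    if role == "initiator":
        return initiator_rdsl, target_rdsl
    if role == "target":
        return target_rdsl, initiator_rdsl
    raise ValueError("role must be 'initiator' or 'target'")


def nop_out_ping_data_ok(ping_data: bytes, initiator_rdsl: int) -> bool:
    # The target echoes the Ping Data back, so its length MUST NOT exceed
    # InitiatorRecvDataSegmentLength.
    return len(ping_data) <= initiator_rdsl
```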

   In the full feature phase, the initiator MUST consider the value of
   its local MaxRecvDataSegmentLength (that it would have declared to
   the target) as having the value of InitiatorRecvDataSegmentLength,
   and the value of the remote MaxRecvDataSegmentLength (that would
   have been declared by the target) as having the value of
   TargetRecvDataSegmentLength. Similarly, the target MUST consider
   the value of its local MaxRecvDataSegmentLength (that it would have
   declared to the initiator) as having the value of
   TargetRecvDataSegmentLength, and the value of the remote
   MaxRecvDataSegmentLength (that would have been declared by the
   initiator) as having the value of InitiatorRecvDataSegmentLength.

   Note that RFC 3720 requires that when a target receives a NOP-Out
   request with a valid Initiator Task Tag, it respond with a NOP-In
   carrying the same Initiator Task Tag that was provided in the
   NOP-Out request and return the first MaxRecvDataSegmentLength bytes
   of the initiator-provided Ping Data. Since there is no
   MaxRecvDataSegmentLength common to the initiator and the target in
   iSER, the length of the data sent with the NOP-Out request MUST NOT
   exceed InitiatorRecvDataSegmentLength.

   The MaxRecvDataSegmentLength key is applicable only to iSCSI
   control-type PDUs.

6.3 RDMAExtensions

   Use: LO (leading only)

   Senders: Initiator and Target

   Scope: SW (session-wide)

   RDMAExtensions=<boolean-value>

   Irrelevant when: SessionType=Discovery

   Default is No

   Result function is AND

   This key is used by the initiator and the target to negotiate
   support for iSER-assisted mode. To enable the use of iSER-assisted
   mode, both the initiator and the target MUST exchange
   RDMAExtensions=Yes. iSER-assisted mode MUST NOT be used if either
   the initiator or the target offers RDMAExtensions=No.
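
   Because the result function is AND and the default is No,
   iSER-assisted mode is enabled only when both sides send Yes. The
   sketch below uses hypothetical names and models a key that was never
   exchanged as None, in which case the default applies. It computes
   the outcome for a session and the key values that the outcome
   implies per Sections 6.1 and 6.6.

```python
from typing import Dict, Optional


def rdma_extensions_enabled(session_type: str, initiator_value: Optional[str],
                            target_value: Optional[str]) -> bool:
    if session_type == "Discovery":
        # Irrelevant for (and MUST NOT be negotiated in) Discovery sessions.
        return False
    # Default is "No"; None models a key that was never exchanged.
    i = initiator_value if initiator_value is not None else "No"
    t = target_value if target_value is not None else "No"
    return i == "Yes" and t == "Yes"  # result function is AND


def implied_keys(enabled: bool) -> Dict[str, str]:
    """Key values forced on every connection of an iSER-assisted session."""
    if not enabled:
        return {}
    return {"HeaderDigest": "None", "DataDigest": "None",
            "OFMarker": "No", "IFMarker": "No"}
```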

   An iSER-enabled node is not required to initiate the RDMAExtensions
   key exchange if it prefers to operate in the Traditional iSCSI mode.
   However, if the RDMAExtensions key is to be negotiated, an initiator
   MUST offer the key in the first Login Request PDU in the
   LoginOperationalNegotiation stage of the leading connection, and a
   target MUST offer the key in the first Login Response PDU with which
   it is allowed to do so (i.e., the first Login Response PDU issued
   after the first Login Request PDU with the C bit set to 0) in the
   LoginOperationalNegotiation stage of the leading connection. In
   response to the offered key=value pair of RDMAExtensions=Yes, an
   initiator MUST respond in the next Login Request PDU with which it
   is allowed to do so, and a target MUST respond in the next Login
   Response PDU with which it is allowed to do so.

   Negotiating the RDMAExtensions key first enables a node to negotiate
   the optimal value for other keys. Certain iSCSI keys, such as
   MaxBurstLength, MaxOutstandingR2T, ErrorRecoveryLevel, InitialR2T,
   and ImmediateData, may be negotiated differently depending on
   whether the connection is in Traditional iSCSI mode or iSER-assisted
   mode.

6.4 TargetRecvDataSegmentLength

   Use: IO (Initialize only)

   Senders: Initiator and Target

   Scope: CO (connection-only)

   Irrelevant when: RDMAExtensions=No

   TargetRecvDataSegmentLength=

   Default is 8192 bytes

   Result function is minimum

   This key is relevant only for the iSCSI connection of an iSCSI
   session if RDMAExtensions=Yes was negotiated on the leading
   connection of the session. It is used by the initiator and the
   target to negotiate the maximum size of the data segment that an
   initiator may send to the target in an iSCSI control-type PDU in the
   full feature phase.
   For SCSI Command PDUs and SCSI Data-out PDUs containing
   non-immediate unsolicited data to be sent by the initiator, the
   initiator MUST send all non-Final PDUs with a data segment size of
   exactly TargetRecvDataSegmentLength whenever the PDUs constitute a
   data sequence whose size is larger than TargetRecvDataSegmentLength.

6.5 InitiatorRecvDataSegmentLength

   Use: IO (Initialize only)

   Senders: Initiator and Target

   Scope: CO (connection-only)

   Irrelevant when: RDMAExtensions=No

   InitiatorRecvDataSegmentLength=

   Default is 8192 bytes

   Result function is minimum

   This key is relevant only for the iSCSI connection of an iSCSI
   session if RDMAExtensions=Yes was negotiated on the leading
   connection of the session. It is used by the initiator and the
   target to negotiate the maximum size of the data segment that a
   target may send to the initiator in an iSCSI control-type PDU in the
   full feature phase.

6.6 OFMarker and IFMarker

   Irrelevant when: RDMAExtensions=Yes

   Negotiations resulting in RDMAExtensions=Yes for a session imply
   OFMarker=No and IFMarker=No for all connections in that session and
   override both the default and an explicit setting.

6.7 MaxOutstandingUnexpectedPDUs

   Use: LO (leading only), Declarative

   Senders: Initiator and Target

   Scope: SW (session-wide)

   Irrelevant when: RDMAExtensions=No

   MaxOutstandingUnexpectedPDUs=

   Default is 0

   This key is used by the initiator and the target to declare the
   maximum number of outstanding "unexpected" iSCSI control-type PDUs
   that each can receive in the full feature phase. It is intended to
   allow the receiving side to determine the amount of buffer resources
   needed beyond the normal flow control mechanism available in iSCSI.
   An initiator or target should select a value such that it would not
   impose an unnecessary constraint on the iSCSI Layer under normal
   circumstances. The value of 0 is defined to indicate that the
   declarer has no limit on the maximum number of outstanding
   "unexpected" iSCSI control-type PDUs that it can receive. See
   Sections 8.1.1 and 8.1.2 for the usage of this key. Note that the
   iSER Hello and HelloReply Messages are not iSCSI control-type PDUs
   and are not affected by this key.

   For interoperability with implementations based on [RFC5046], this
   key SHOULD be negotiated, because the default value of 0 in
   [RFC5046] is problematic for most implementations, as it does not
   impose a bound on the resources consumable by unexpected PDUs.

6.8 MaxAHSLength

   Use: LO (leading only), Declarative

   Senders: Initiator and Target

   Scope: SW (session-wide)

   Irrelevant when: RDMAExtensions=No

   MaxAHSLength=

   Default is 256

   This key is used by the initiator and the target to declare the
   maximum size of the AHS in an iSCSI control-type PDU that each can
   receive in the full feature phase. It is intended to allow the
   receiving side to determine the amount of resources needed for
   receive buffering. An initiator or target should select a value
   such that it would not impose an unnecessary constraint on the iSCSI
   Layer under normal circumstances. The value of 0 is defined to
   indicate that the declarer has no limit on the maximum size of the
   AHS in iSCSI control-type PDUs that it can receive.

   For interoperability with implementations based on [RFC5046], an
   initiator or target MAY terminate the connection if it anticipates
   MaxAHSLength to be greater than 256 and the key is not understood by
   its peer.
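
   Both declarative keys above share the convention that 0 means "no
   limit". A sender might check a peer's declared limits as sketched
   below. The function names, and the idea of keeping a running count
   of outstanding unexpected PDUs, are assumptions of this
   illustration; the exact accounting is defined in Sections 8.1.1 and
   8.1.2.

```python
DEFAULT_MAX_AHS_LENGTH = 256  # MaxAHSLength default (Section 6.8)


def may_send_unexpected_pdu(outstanding: int, peer_max_outstanding: int) -> bool:
    """True if one more "unexpected" control-type PDU stays within the peer's limit."""
    # MaxOutstandingUnexpectedPDUs=0 declares no limit (this is also the default).
    return peer_max_outstanding == 0 or outstanding < peer_max_outstanding


def ahs_fits(ahs_length: int, peer_max_ahs_length: int = DEFAULT_MAX_AHS_LENGTH) -> bool:
    """True if an AHS of ahs_length bytes is within the peer's MaxAHSLength."""
    # MaxAHSLength=0 likewise declares no limit.
    return peer_max_ahs_length == 0 or ahs_length <= peer_max_ahs_length
```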

6.9 TaggedBufferForSolicitedDataOnly

   Use: LO (leading only), Declarative

   Senders: Initiator

   Scope: SW (session-wide)

   TaggedBufferForSolicitedDataOnly=<boolean-value>

   Irrelevant when: RDMAExtensions=No

   Default is No

   This key is used by the initiator to declare to the target the usage
   of the Write Base Offset in the iSER header of an iSCSI control-type
   PDU. When set to No, the Base Offset is associated with an I/O
   buffer that contains all the write data, including both unsolicited
   and solicited data. When set to Yes, the Base Offset is associated
   with an I/O buffer that contains only solicited data.

6.10 iSERHelloRequired

   Use: LO (leading only), Declarative

   Senders: Initiator

   Scope: SW (session-wide)

   iSERHelloRequired=<boolean-value>

   Irrelevant when: RDMAExtensions=No

   Default is No

   This key is relevant only for the iSCSI connection of an iSCSI
   session if RDMAExtensions=Yes was negotiated on the leading
   connection of the session. It is used by the initiator to declare to
   the target whether the iSER Hello Exchange is required. When set to
   Yes, the iSER Layers MUST perform the iSER Hello Exchange as
   described in Section 5.1.3. When set to No, the iSER Layers MUST NOT
   perform the iSER Hello Exchange.

7 iSCSI PDU Considerations

   When a connection is in the iSER-assisted mode, two types of message
   transfers are allowed between the iSCSI Layer at the initiator and
   the iSCSI Layer at the target. These are known as iSCSI data-type
   PDUs and iSCSI control-type PDUs, and these terms are described in
   the following sections.

7.1 iSCSI Data-Type PDU

   An iSCSI data-type PDU is defined as an iSCSI PDU that causes data
   transfer, transparent to the remote iSCSI Layer, to take place
   between the peer iSCSI nodes in the full feature phase of an
   iSCSI/iSER connection.
   An iSCSI data-type PDU, when requested for transmission by the iSCSI
   Layer in the sending node, results in the data being transferred
   without the participation of the iSCSI Layers at the sending and the
   receiving nodes. This is because the PDU itself is not delivered
   as-is to the iSCSI Layer in the receiving node. Instead, the data
   transfer operations are transformed into the appropriate RDMA
   operations, which are handled by the RDMA-Capable Controller. The
   set of iSCSI data-type PDUs consists of SCSI Data-in PDUs and R2T
   PDUs.

   If the invocation of the Operational Primitive by the iSCSI Layer to
   request the iSER Layer to process an iSCSI data-type PDU is
   qualified with Notify_Enable set, then upon completing the RDMA
   operation, the iSER Layer at the target MUST notify the iSCSI Layer
   at the target by invoking the Data_Completion_Notify Operational
   Primitive qualified with ITT and SN. There is no data completion
   notification at the initiator, since the RDMA operations are
   completely handled by the RDMA-Capable Controller at the initiator
   and the iSER Layer at the initiator is not involved in the data
   transfer associated with iSCSI data-type PDUs.

   If the invocation of the Operational Primitive by the iSCSI Layer to
   request the iSER Layer to process an iSCSI data-type PDU is
   qualified with Notify_Enable cleared, then upon completing the RDMA
   operation, the iSER Layer at the target MUST NOT notify the iSCSI
   Layer at the target and MUST NOT invoke the Data_Completion_Notify
   Operational Primitive.

   If an operation associated with an iSCSI data-type PDU fails for any
   reason, the contents of the Data Sink buffers associated with the
   operation are considered indeterminate.
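
   The classification in this section, together with the definition of
   iSCSI control-type PDUs in Section 7.2, can be sketched as a small
   function. PDU kinds are passed as descriptive strings purely for
   illustration; a real implementation would dispatch on the opcodes
   defined in [iSCSI]. The function name is hypothetical.

```python
def classify_pdu(kind: str, solicited_data: bool = False) -> str:
    """Classify an iSCSI PDU for transfer over an iSER-assisted connection."""
    if kind in ("SCSI Data-in", "R2T"):
        # Data-type: Data-in becomes RDMA Writes, R2T becomes RDMA Reads.
        return "data-type"
    if kind == "SCSI Data-out" and solicited_data:
        # Neither type: solicited write data travels in RDMA Read Responses.
        return "RDMA Read Response"
    # Everything else, including Data-out carrying unsolicited data,
    # is a control-type PDU sent in an RCaP Send Message.
    return "control-type"
```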

7.2 iSCSI Control-Type PDU

   Any iSCSI PDU that is not an iSCSI data-type PDU and also not a SCSI
   Data-out PDU carrying solicited data is defined as an iSCSI
   control-type PDU. The iSCSI Layer invokes the Send_Control
   Operational Primitive to request the iSER Layer to process an iSCSI
   control-type PDU. iSCSI control-type PDUs are transferred using Send
   Messages of RCaP. In particular, note that SCSI Data-out PDUs
   carrying unsolicited data are defined as iSCSI control-type PDUs.
   See Section 7.3.4 on the treatment of SCSI Data-out PDUs.

   When the iSER Layer receives an iSCSI control-type PDU, it MUST
   notify the iSCSI Layer by invoking the Control_Notify Operational
   Primitive qualified with the iSCSI control-type PDU.

7.3 iSCSI PDUs

   This section describes the handling of each of the iSCSI PDU types
   by the iSER Layer. The iSCSI Layer requests the iSER Layer to
   process the iSCSI PDU by invoking the appropriate Operational
   Primitive. A Connection_Handle MUST qualify each of these
   invocations. In addition, the BHS and the optional AHS of the iSCSI
   PDU as defined in [iSCSI] MUST qualify each of the invocations. The
   qualifying Connection_Handle, BHS, and AHS are not explicitly listed
   in the subsequent sections.

7.3.1 SCSI Command

   Type: control-type PDU

   PDU-specific qualifiers (for SCSI Write or bidirectional
   command): ImmediateDataSize, UnsolicitedDataSize,
   DataDescriptorOut

   PDU-specific qualifiers (for SCSI Read or bidirectional
   command): DataDescriptorIn

   The iSER Layer at the initiator MUST send the SCSI command in a Send
   Message to the target. The SendSE Message should be used if
   supported by the RCaP layer (e.g., iWARP).

   For a SCSI Write or bidirectional command, the iSCSI Layer at the
   initiator MUST invoke the Send_Control Operational Primitive as
   follows:

   * If there is immediate data to be transferred for the SCSI Write
     or bidirectional command, the qualifier ImmediateDataSize MUST be
     used to define the number of bytes of immediate unsolicited data
     to be sent with the Write or bidirectional command, and the
     qualifier DataDescriptorOut MUST be used to define the
     initiator's I/O Buffer containing the SCSI Write data.

   * If there is unsolicited data to be transferred for the SCSI Write
     or bidirectional command, the qualifier UnsolicitedDataSize MUST
     be used to define the number of bytes of immediate and
     non-immediate unsolicited data for the command. The iSCSI Layer
     will issue one or more SCSI Data-out PDUs for the non-immediate
     unsolicited data. See Section 7.3.4 on SCSI Data-out.

   * If there is solicited data to be transferred for the SCSI Write
     or bidirectional command, as indicated by the Expected Data
     Transfer Length in the SCSI Command PDU exceeding the value of
     UnsolicitedDataSize, the iSER Layer at the initiator MUST do the
     following:

     a. It MUST allocate a Write STag for the I/O Buffer defined by
        the qualifier DataDescriptorOut. DataDescriptorOut describes
        the I/O Buffer starting with the immediate unsolicited data
        (if any), followed by the non-immediate unsolicited data (if
        any) and the solicited data. When
        TaggedBufferForSolicitedDataOnly is negotiated to No, the Base
        Offset is associated with this I/O Buffer. When
        TaggedBufferForSolicitedDataOnly is negotiated to Yes, the Base
        Offset is associated with an I/O Buffer that contains only
        solicited data.

     b. It MUST establish a Local Mapping that associates the
        Initiator Task Tag (ITT) with the Write STag.

     c. It MUST Advertise the Write STag and the Base Offset to the
        target by sending them in the iSER header of the iSER Message
        (the payload of the Send Message of RCaP) containing the SCSI
        Write or bidirectional command PDU. The SendSE Message should
        be used if supported by the RCaP layer (e.g., iWARP). See
        Section 9.2 on the iSER Header Format for iSCSI Control-Type
        PDUs.

   For a SCSI Read or bidirectional command, the iSCSI Layer at the
   initiator MUST invoke the Send_Control Operational Primitive
   qualified with DataDescriptorIn, which defines the initiator's I/O
   Buffer for receiving the SCSI Read data. The iSER Layer at the
   initiator MUST do the following:

   a. It MUST allocate a Read STag for the I/O Buffer and note the
      Base Offset for this I/O Buffer.

   b. It MUST establish a Local Mapping that associates the Initiator
      Task Tag (ITT) with the Read STag.

   c. It MUST Advertise the Read STag and the Base Offset to the
      target by sending them in the iSER header of the iSER Message
      (the payload of the Send Message of RCaP) containing the SCSI
      Read or bidirectional command PDU. The SendSE Message should be
      used if supported by the RCaP layer (e.g., iWARP). See Section
      9.2 on the iSER Header Format for iSCSI Control-Type PDUs.

   If the amount of unsolicited data to be transferred in a SCSI
   Command exceeds TargetRecvDataSegmentLength, then the iSCSI Layer at
   the initiator MUST segment the data into multiple iSCSI control-type
   PDUs, with the data segment length of every PDU except the last one
   being exactly TargetRecvDataSegmentLength. The data segment length
   of the last iSCSI control-type PDU carrying the unsolicited data can
   be up to TargetRecvDataSegmentLength.
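
   The segmentation rule can be illustrated with a short sketch that
   returns the data segment length of each PDU in the sequence. The
   function name is hypothetical, and it only computes sizes; whether
   the first segment is carried in the SCSI Command PDU as immediate
   data or in a SCSI Data-out PDU is left to the caller.

```python
from typing import List


def unsolicited_segment_sizes(total: int, target_rdsl: int) -> List[int]:
    """Split `total` bytes of unsolicited data per TargetRecvDataSegmentLength."""
    if target_rdsl <= 0:
        raise ValueError("TargetRecvDataSegmentLength must be positive")
    if total < 0:
        raise ValueError("data length cannot be negative")
    sizes = []
    remaining = total
    # Every PDU except the last carries exactly target_rdsl bytes.
    while remaining > target_rdsl:
        sizes.append(target_rdsl)
        remaining -= target_rdsl
    if remaining:
        # The last PDU carries the remainder (at most target_rdsl bytes).
        sizes.append(remaining)
    return sizes
```

   For example, 20000 bytes of unsolicited data with
   TargetRecvDataSegmentLength=8192 yields segments of 8192, 8192, and
   3616 bytes.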
   When the iSER Layer at the target receives the SCSI Command, it MUST
   establish a Remote Mapping that associates the ITT to the Base
   Offset(s) and the Advertised STag(s) in the iSER header. The Write
   STag is used by the iSER Layer at the target in handling the data
   transfer associated with the R2T PDU(s) as described in section
   7.3.6. The Read STag is used in handling the SCSI Data-in PDU(s)
   from the iSCSI Layer at the target as described in section 7.3.5.

7.3.2 SCSI Response

   Type: control-type PDU

   PDU-specific qualifiers: DataDescriptorStatus

   The iSCSI Layer at the target MUST invoke the Send_Control
   Operational Primitive qualified with DataDescriptorStatus which
   defines the buffer containing the sense and response information.
   The iSCSI Layer at the target MUST always return the SCSI status for
   a SCSI command in a separate SCSI Response PDU. "Phase collapse"
   for transferring SCSI status in a SCSI Data-in PDU MUST NOT be used.
   The iSER Layer at the target sends the SCSI Response PDU according
   to the following rules:

   * If no STags were Advertised by the initiator in the iSER Message
     containing the SCSI command PDU, then the iSER Layer at the
     target MUST send a Send Message containing the SCSI Response PDU.
     The SendSE Message should be used if supported by the RCaP layer
     (e.g., iWARP).

   * If the initiator Advertised a Read STag in the iSER Message
     containing the SCSI Command PDU, then the iSER Layer at the
     target MUST send a Send Message containing the SCSI Response PDU.
     The header of the Send Message MUST carry the Read STag to be
     invalidated at the initiator. The Send with Invalidate Message,
     if supported by the RCaP layer (e.g., iWARP), can be used for the
     automatic invalidation of the STag.
   * If the initiator Advertised only the Write STag in the iSER
     Message containing the SCSI Command PDU, then the iSER Layer at
     the target MUST send a Send Message containing the SCSI Response
     PDU. The header of the Send Message MUST carry the Write STag to
     be invalidated at the initiator. The Send with Invalidate
     Message, if supported by the RCaP layer (e.g., iWARP), can be
     used for the automatic invalidation of the STag.

   When the iSCSI Layer at the target invokes the Send_Control
   Operational Primitive to send the SCSI Response PDU, the iSER Layer
   at the target MUST invalidate the Remote Mapping before transferring
   the SCSI Response PDU to the initiator.

   Upon receiving a Send Message containing the SCSI Response PDU from
   the target, the iSER Layer at the initiator MUST invalidate the
   STag(s) specified in the header. If a Send with Invalidate Message
   is supported by the RCaP layer (e.g., iWARP) and is used to carry
   the SCSI Response PDU, the RCaP layer at the initiator will
   invalidate the STag. The iSER Layer at the initiator MUST ensure
   that the correct STag is invalidated. If both the Read and the
   Write STags were Advertised earlier by the initiator, then the iSER
   Layer at the initiator MUST explicitly invalidate the Write STag
   upon receiving the Send with Invalidate Message, because the header
   of the Send with Invalidate Message can carry only one STag (in
   this case, the Read STag) to be invalidated.

   The iSER Layer at the initiator MUST ensure the invalidation of the
   STag(s) used in a command before notifying the iSCSI Layer at the
   initiator by invoking the Control_Notify Operational Primitive
   qualified with the SCSI Response. This prevents the STag(s) from
   being used after the command has completed, which could otherwise
   cause data corruption.
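   The rules above determine which STag the target places in the
   header of the Send Message carrying the SCSI Response, and which
   STag the initiator must still invalidate itself. The hypothetical
   helper below summarizes them, assuming the RCaP supports Send with
   Invalidate; STags are modelled as plain integers, and None means
   that no STag of that kind was Advertised.

```python
from typing import List, Optional, Tuple


def response_invalidation(read_stag: Optional[int],
                          write_stag: Optional[int]) -> Tuple[Optional[int], List[int]]:
    """Return (stag_in_send_header, stags_to_invalidate_explicitly).

    stag_in_send_header is the STag the target carries in the Send (with
    Invalidate) Message header, or None if no STag was Advertised.
    stags_to_invalidate_explicitly lists the STags that the initiator's
    iSER Layer must invalidate itself, because the header can carry only
    one STag.
    """
    if read_stag is not None:
        # A Read STag always takes the single header slot.
        header_stag = read_stag
    else:
        # Only a Write STag (or nothing at all) was Advertised.
        header_stag = write_stag
    if read_stag is not None and write_stag is not None:
        # Bidirectional command: the Write STag is left for the initiator.
        explicit = [write_stag]
    else:
        explicit = []
    return header_stag, explicit
```

   For a bidirectional command, response_invalidation(0x1001, 0x2002)
   yields (0x1001, [0x2002]): the RCaP layer invalidates 0x1001, and
   the initiator's iSER Layer invalidates 0x2002 before it invokes
   Control_Notify.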
   When the iSER Layer at the initiator receives a Send Message
   containing the SCSI Response PDU, it SHOULD invalidate the Local
   Mapping. The iSER Layer MUST ensure that all local STag(s)
   associated with the ITT are invalidated before notifying the iSCSI
   Layer of the SCSI Response PDU by invoking the Control_Notify
   Operational Primitive qualified with the SCSI Response PDU.

7.3.3 Task Management Function Request/Response

   Type: control-type PDU

   PDU-specific qualifiers (for TMF Request): DataDescriptorOut,
   DataDescriptorIn

   The iSER Layer MUST use a Send Message to send the Task Management
   Function Request/Response PDU. The SendSE Message should be used if
   supported by the RCaP layer (e.g., iWARP).

   For the Task Management Function Request with the TASK REASSIGN
   function, the iSER Layer at the initiator MUST do the following:

   * It MUST use the ITT as specified in the Referenced Task Tag from
     the Task Management Function Request PDU to locate the existing
     STags (if any) in the Local Mappings.

   * It MUST invalidate the existing STags (if any) and the Local
     Mappings.

   * It MUST allocate a Read STag for the I/O Buffer and note the Base
     Offset associated with the I/O Buffer as defined by the qualifier
     DataDescriptorIn if the Send_Control Operational Primitive
     invocation is qualified with DataDescriptorIn.

   * It MUST allocate a Write STag for the I/O Buffer and note the
     Base Offset associated with the I/O Buffer as defined by the
     qualifier DataDescriptorOut if the Send_Control Operational
     Primitive invocation is qualified with DataDescriptorOut.

   * If STags are allocated, it MUST establish new Local Mapping(s)
     that associate the ITT to the allocated STag(s).
   * It MUST Advertise the STags and the Base Offsets, if allocated,
     to the target in the iSER header of the Send Message carrying the
     iSCSI PDU, as described in section 9.2. The SendSE Message
     should be used if supported by the RCaP layer (e.g., iWARP).

   For the Task Management Function Request with the TASK REASSIGN
   function for a SCSI Read or bidirectional command, the iSCSI Layer
   at the initiator MUST set ExpDataSN to 0, since the data transfer
   and acknowledgements happen transparently to the iSCSI Layer at the
   initiator. This gives the iSCSI Layer at the target the flexibility
   to request transmission of only the unacknowledged data, as
   specified in [iSCSI].

   When the iSER Layer at the target receives the Task Management
   Function Request with the TASK REASSIGN function, it MUST do the
   following:

   * It MUST use the ITT as specified in the Referenced Task Tag from
     the Task Management Function Request PDU to locate the Local and
     Remote Mappings (if any).

   * It MUST invalidate the local STags (if any) associated with the
     ITT.

   * It MUST replace the Base Offset(s) and the Advertised STag(s) in
     the Remote Mapping with the Base Offset(s) and the Advertised
     STag(s) in the iSER header. The Write STag is used in the
     handling of the R2T PDU(s) from the iSCSI Layer at the target as
     described in section 7.3.6. The Read STag is used in the
     handling of the SCSI Data-in PDU(s) from the iSCSI Layer at the
     target as described in section 7.3.5.

7.3.4 SCSI Data-out

   Type: control-type PDU

   PDU-specific qualifiers: DataDescriptorOut

   The iSCSI Layer at the initiator MUST invoke the Send_Control
   Operational Primitive qualified with DataDescriptorOut, which
   defines the initiator's I/O Buffer containing unsolicited SCSI Write
   data.
   If the amount of unsolicited data to be transferred as SCSI Data-out
   exceeds TargetRecvDataSegmentLength, then the iSCSI Layer at the
   initiator MUST segment the data into multiple iSCSI control-type
   PDUs, with the DataSegmentLength having the value of
   TargetRecvDataSegmentLength in all PDUs generated except the last
   one. The DataSegmentLength of the last iSCSI control-type PDU
   carrying the unsolicited data can be up to
   TargetRecvDataSegmentLength. The iSCSI Layer at the target MUST
   perform the reassembly function for the unsolicited data.

   For unsolicited data, the iSER Layer at the initiator MUST use a
   Send Message to send the SCSI Data-out PDU. If the F bit is set to
   1, the SendSE Message should be used if supported by the RCaP layer
   (e.g., iWARP).

   Note that SCSI Data-out PDUs are not used for solicited data, since
   R2T PDUs are not delivered to the iSCSI Layer at the initiator;
   instead, R2T PDUs are transformed by the iSER Layer at the target
   into RDMA Read operations (see section 7.3.6).

7.3.5 SCSI Data-in

   Type: data-type PDU

   PDU-specific qualifiers: DataDescriptorIn

   When the iSCSI Layer at the target is ready to return the SCSI Read
   data to the initiator, it MUST invoke the Put_Data Operational
   Primitive qualified with DataDescriptorIn, which defines the SCSI
   Data-in buffer. See section 7.1 on the general requirements for the
   handling of iSCSI data-type PDUs. SCSI Data-in PDU(s) are used in
   SCSI Read data transfer as described in section 9.5.2.

   The iSER Layer at the target MUST do the following for each
   invocation of the Put_Data Operational Primitive:

   1. It MUST use the ITT in the SCSI Data-in PDU to locate the remote
      Read STag and the Base Offset in the Remote Mapping. The Remote
      Mapping was established earlier by the iSER Layer at the target
      when the SCSI Read Command was received from the initiator.
   2. It MUST generate and send an RDMA Write Message containing the
      read data to the initiator.

      a. It MUST use the remote Read STag as the Data Sink STag of
         the RDMA Write Message.

      b. It MUST add the Buffer Offset from the SCSI Data-in PDU to
         the Base Offset from the Remote Mapping and use the sum as
         the Data Sink Tagged Offset of the RDMA Write Message.

      c. It MUST use DataSegmentLength from the SCSI Data-in PDU to
         determine the amount of data to be sent in the RDMA Write
         Message.

   3. It MUST associate DataSN and ITT from the SCSI Data-in PDU with
      the RDMA Write operation. If the Put_Data Operational Primitive
      invocation was qualified with Notify_Enable set, then when the
      iSER Layer at the target receives a completion from the RCaP
      layer for the RDMA Write Message, the iSER Layer at the target
      MUST notify the iSCSI Layer by invoking the
      Data_Completion_Notify Operational Primitive qualified with
      DataSN and ITT. Conversely, if the Put_Data Operational
      Primitive invocation was qualified with Notify_Enable cleared,
      then the iSER Layer at the target MUST NOT notify the iSCSI
      Layer on completion and MUST NOT invoke the
      Data_Completion_Notify Operational Primitive.

   When the A-bit is set to 1 in the SCSI Data-in PDU, the iSER Layer
   at the target MUST notify the iSCSI Layer at the target when the
   data transfer is complete at the initiator. To perform this
   additional function, the iSER Layer at the target can take advantage
   of the operational ErrorRecoveryLevel if previously disclosed by the
   iSCSI Layer via an earlier invocation of the Notice_Key_Values
   Operational Primitive. There are two approaches that can be taken:

   1. If the iSER Layer at the target knows that the operational
      ErrorRecoveryLevel is 2, or if the iSER Layer at the target does
      not know the operational ErrorRecoveryLevel, then the iSER Layer
      at the target MUST issue a zero-length RDMA Read Request Message
      following the RDMA Write Message. When the iSER Layer at the
      target receives a completion for the RDMA Read Request Message
      from the RCaP layer (which, because of the completion ordering
      semantics of RCaP, implies that the RDMA-Capable Controller at
      the initiator has completed processing the RDMA Write Message),
      the iSER Layer at the target MUST notify the iSCSI Layer at the
      target by invoking the Data_Ack_Notify Operational Primitive
      qualified with ITT and DataSN (see section 3.2.3).

   2. If the iSER Layer at the target knows that the operational
      ErrorRecoveryLevel is 1, then the iSER Layer at the target MUST
      do one of the following:

      a. It MUST notify the iSCSI Layer at the target by invoking the
         Data_Ack_Notify Operational Primitive qualified with ITT and
         DataSN (see section 3.2.3) when it receives the local
         completion from the RCaP layer for the RDMA Write Message.
         This is allowed since digest errors do not occur in iSER
         (see section 10.1.4.2), and a CRC error will cause both the
         connection and the task to be terminated anyway. The local
         RDMA Write completion from the RCaP layer guarantees that the
         RCaP layer will not access the I/O Buffer again to transfer
         the data associated with that RDMA Write operation.

      b. Alternatively, it MUST use the same procedure for handling
         the data transfer completion at the initiator as for
         ErrorRecoveryLevel 2.

   Note that the iSCSI Layer at the target cannot set the A-bit to 1 if
   the ErrorRecoveryLevel is 0.

   SCSI status MUST always be returned in a separate SCSI Response PDU.
   The S bit in the SCSI Data-in PDU MUST always be set to 0. There
   MUST NOT be a "phase collapse" in the SCSI Data-in PDU.

   Since the RDMA Write Message transfers only the data portion of the
   SCSI Data-in PDU and not the control information in its header, such
   as ExpCmdSN, the iSCSI Layer at the initiator MAY issue NOP-Out PDUs
   to request the iSCSI Layer at the target to respond with that
   information using NOP-In PDUs, if timely updates of the information
   are crucial.

7.3.6 Ready To Transfer (R2T)

   Type: data-type PDU

   PDU-specific qualifiers: DataDescriptorOut

   In order to send an R2T PDU, the iSCSI Layer at the target MUST
   invoke the Get_Data Operational Primitive qualified with
   DataDescriptorOut, which defines the I/O Buffer for receiving the
   SCSI Write data from the initiator. See section 7.1 on the general
   requirements for the handling of iSCSI data-type PDUs.

   The iSER Layer at the target MUST do the following for each
   invocation of the Get_Data Operational Primitive:

   1. It MUST ensure that there is a valid local STag for the I/O
      Buffer and a valid Local Mapping. This may involve allocating a
      valid local STag and establishing a Local Mapping.

   2. It MUST use the ITT in the R2T to locate the remote Write STag
      and the Base Offset in the Remote Mapping. The Remote Mapping
      was established earlier by the iSER Layer at the target when the
      iSER Message containing the Advertised Write STag, the Base
      Offset, and the SCSI Command PDU for a SCSI Write or
      bidirectional command was received from the initiator.

   3. If the iSER-ORD value at the target is set to 0 and the iSER
      Layer at the target receives the R2T PDU from the iSCSI Layer at
      the target, the iSER Layer MUST terminate the connection and free
      up the resources associated with the connection (as described in
      section 5.2.3).
      Upon termination of the connection, the iSER Layer at the target
      MUST notify the iSCSI Layer at the target by invoking the
      Connection_Terminate_Notify Operational Primitive.

   4. If the iSER-ORD value at the target is greater than 0, the iSER
      Layer at the target MUST transform the R2T PDU into an RDMA Read
      Request Message. While transforming the R2T PDU, the iSER Layer
      at the target MUST ensure that the number of outstanding RDMA
      Read Request Messages does not exceed the iSER-ORD value. To
      transform the R2T PDU, the iSER Layer at the target:

      a. MUST derive the local STag and local Tagged Offset from the
         DataDescriptorOut that qualified the Get_Data invocation.

      b. MUST use the local STag as the Data Sink STag of the RDMA
         Read Request Message.

      c. MUST use the local Tagged Offset as the Data Sink Tagged
         Offset of the RDMA Read Request Message.

      d. MUST use the Desired Data Transfer Length from the R2T PDU
         as the RDMA Read Message Size of the RDMA Read Request
         Message.

      e. MUST use the remote Write STag as the Data Source STag of
         the RDMA Read Request Message.

      f. MUST add the Buffer Offset from the R2T PDU to the Base
         Offset from the Remote Mapping and use the sum as the Data
         Source Tagged Offset of the RDMA Read Request Message.

   5. It MUST associate R2TSN and ITT from the R2T PDU with the RDMA
      Read operation. If the Get_Data Operational Primitive
      invocation was qualified with Notify_Enable set, then when the
      iSER Layer at the target receives a completion from the RCaP
      layer for the RDMA Read operation, the iSER Layer at the target
      MUST notify the iSCSI Layer by invoking the
      Data_Completion_Notify Operational Primitive qualified with
      R2TSN and ITT.
      Conversely, if the Get_Data Operational Primitive invocation was
      qualified with Notify_Enable cleared, then the iSER Layer at the
      target MUST NOT notify the iSCSI Layer on completion and MUST
      NOT invoke the Data_Completion_Notify Operational Primitive.

   When the RCaP layer at the initiator receives a valid RDMA Read
   Request Message, it will return an RDMA Read Response Message
   containing the solicited write data to the target. When the RCaP
   layer at the target receives the RDMA Read Response Message from the
   initiator, it will place the solicited data in the I/O Buffer
   referenced by the Data Sink STag in the RDMA Read Response Message.

   Since the RDMA Read Request Message from the target does not
   transfer the control information in the R2T PDU, such as ExpCmdSN,
   the iSCSI Layer at the initiator MAY issue NOP-Out PDUs to request
   the iSCSI Layer at the target to respond with that information using
   NOP-In PDUs, if timely updates of the information are crucial.

   Similarly, since the RDMA Read Response Message from the initiator
   transfers only the data and not the control information normally
   found in the SCSI Data-out PDU, such as ExpStatSN, the iSCSI Layer
   at the target MAY issue NOP-In PDUs to request the iSCSI Layer at
   the initiator to respond with that information using NOP-Out PDUs,
   if timely updates of the information are crucial.

7.3.7 Asynchronous Message

   Type: control-type PDU

   PDU-specific qualifiers: DataDescriptorSense

   The iSCSI Layer MUST invoke the Send_Control Operational Primitive
   qualified with DataDescriptorSense, which defines the buffer
   containing the sense and iSCSI event information. The iSER Layer
   MUST use a Send Message to send the Asynchronous Message PDU. The
   SendSE Message should be used if supported by the RCaP layer (e.g.,
   iWARP).
7.3.8 Text Request & Text Response

   Type: control-type PDU

   PDU-specific qualifiers: DataDescriptorTextOut (for Text
   Request), DataDescriptorIn (for Text Response)

   The iSCSI Layer MUST invoke the Send_Control Operational Primitive
   qualified with DataDescriptorTextOut (or DataDescriptorIn), which
   defines the Text Request (or Text Response) buffer. The iSER Layer
   MUST use Send Messages to send the Text Request (or Text Response)
   PDUs. The SendSE Message should be used if supported by the RCaP
   layer (e.g., iWARP).

7.3.9 Login Request & Login Response

   During the login negotiation, the iSCSI Layer interacts with the
   transport layer directly, and the iSER Layer is not involved. See
   section 5.1 on iSCSI/iSER Connection Setup. If the underlying
   transport is TCP, the Login Request PDUs and the Login Response PDUs
   are exchanged while the connection between the initiator and the
   target is still in byte stream mode.

   The iSCSI Layer MUST NOT send a Login Request (or a Login Response)
   PDU during the full feature phase. A Login Request (or a Login
   Response) PDU, if used, MUST be treated as an iSCSI protocol error.
   The iSER Layer MAY reject such a PDU from the iSCSI Layer with an
   appropriate error code. If a Login Request PDU is received by the
   iSCSI Layer at the target, it MUST respond with a Reject PDU with a
   reason code of "protocol error".

7.3.10 Logout Request & Logout Response

   Type: control-type PDU

   PDU-specific qualifiers: None

   The iSER Layer MUST use a Send Message to send the Logout Request or
   Logout Response PDU. The SendSE Message should be used if supported
   by the RCaP layer (e.g., iWARP). Sections 5.2.1 and 5.2.2 describe
   the handling of the Logout Request and the Logout Response at the
   initiator and the target, and the interactions between the initiator
   and the target to terminate a connection.
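   Sections 7.3.2 through 7.3.10 map each control-type PDU onto a small
   set of RCaP Send variants. The hypothetical helper below condenses
   those rules for a sender in the full feature phase. The enum names,
   the PDU-name strings, and the boolean capability flags are
   assumptions made for illustration, and the sketch does not model a
   Send variant that both invalidates an STag and signals a solicited
   event.

```python
from enum import Enum


class SendVariant(Enum):
    SEND = "Send"
    SEND_SE = "SendSE"
    SEND_INV = "Send with Invalidate"


def choose_send_variant(pdu: str, *, sendse_supported: bool,
                        send_inv_supported: bool, f_bit: bool = False,
                        stag_advertised: bool = False) -> SendVariant:
    """Pick the RCaP Send variant for a control-type PDU (hypothetical helper)."""
    if pdu in ("Login Request", "Login Response"):
        # Section 7.3.9: Login PDUs in the full feature phase are a protocol error.
        raise ValueError(f"{pdu} is a protocol error in the full feature phase")
    if pdu == "SCSI Data-out":
        # Section 7.3.4: SendSE only for the PDU with the F bit set.
        if f_bit and sendse_supported:
            return SendVariant.SEND_SE
        return SendVariant.SEND
    if pdu == "SCSI Response" and stag_advertised:
        # Section 7.3.2: the header carries the STag to be invalidated.
        if send_inv_supported:
            return SendVariant.SEND_INV
        return SendVariant.SEND
    # All other control-type PDUs: a Send, preferably SendSE when supported.
    return SendVariant.SEND_SE if sendse_supported else SendVariant.SEND
```

   For example, an unsolicited SCSI Data-out PDU with the F bit clear
   maps to a plain Send even on an RCaP that supports SendSE, so that
   only the final PDU of the unsolicited burst requests a solicited
   event.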
7.3.11 SNACK Request

   Since HeaderDigest and DataDigest must be negotiated to "None",
   there are no digest errors when the connection is in iSER-assisted
   mode. Also, since RCaP delivers all messages in the order they were
   sent, there are no sequence errors when the connection is in
   iSER-assisted mode. Therefore, the iSCSI Layer MUST NOT send SNACK
   Request PDUs. A SNACK Request PDU, if used, MUST be treated as an
   iSCSI protocol error. The iSER Layer MAY reject such a PDU from the
   iSCSI Layer with an appropriate error code. If a SNACK Request PDU
   is received by the iSCSI Layer at the target, it MUST respond with a
   Reject PDU with a reason code of "protocol error".

7.3.12 Reject

   Type: control-type PDU

   PDU-specific qualifiers: DataDescriptorReject

   The iSCSI Layer MUST invoke the Send_Control Operational Primitive
   qualified with DataDescriptorReject, which defines the Reject buffer.
   The iSER Layer MUST use a Send Message to send the Reject PDU. The
   SendSE Message should be used if supported by the RCaP layer (e.g.,
   iWARP).

7.3.13 NOP-Out & NOP-In

   Type: control-type PDU

   PDU-specific qualifiers: DataDescriptorNOPOut (for NOP-Out),
   DataDescriptorNOPIn (for NOP-In)

   The iSCSI Layer MUST invoke the Send_Control Operational Primitive
   qualified with DataDescriptorNOPOut (or DataDescriptorNOPIn), which
   defines the Ping (or Return Ping) data buffer. The iSER Layer MUST
   use Send Messages to send the NOP-Out (or NOP-In) PDU. The SendSE
   Message should be used if supported by the RCaP layer (e.g., iWARP).

8 Flow Control and STag Management

8.1 Flow Control for RDMA Send Messages

   Send Messages in RCaP are used by the iSER Layer to transfer iSCSI
   control-type PDUs. Each Send Message in RCaP consumes an Untagged
   Buffer at the Data Sink.
   However, neither the RCaP layer nor the iSER Layer provides an
   explicit flow control mechanism for the Send Messages. Therefore,
   the iSER Layer SHOULD provision enough Untagged Buffers for handling
   incoming Send Messages to prevent buffer exhaustion at the RCaP
   layer. Buffer exhaustion may result in the termination of the
   connection.

   An implementation may choose to satisfy the buffer requirement by
   using a common buffer pool shared across multiple connections, with
   usage limits on a per-connection basis and usage limits on the
   buffer pool itself. In such an implementation, exceeding the buffer
   usage limit for a connection or the buffer pool itself may trigger
   interventions from the iSER Layer to replenish the buffer pool
   and/or to isolate the connection causing the problem.

   iSER also provides the MaxOutstandingUnexpectedPDUs key to be used
   by the initiator and the target to declare the maximum number of
   outstanding "unexpected" control-type PDUs that each can receive.
   The key allows the receiving side to determine the amount of buffer
   resources needed beyond the normal flow control mechanism available
   in iSCSI.

   The buffer resources required at both the initiator and the target
   as a result of control-type PDUs sent by the initiator are described
   in section 8.1.1. The buffer resources required at both the
   initiator and the target as a result of control-type PDUs sent by
   the target are described in section 8.1.2.

8.1.1 Flow Control for Control-Type PDUs from the Initiator

   The control-type PDUs that can be sent by an initiator to a target
   can be grouped into the following categories:

   1. Regulated: Control-type PDUs in this category are regulated by
      the iSCSI CmdSN window mechanism, and the immediate flag is not
      set.

   2. Unregulated but Expected: Control-type PDUs in this category
      are not regulated by the iSCSI CmdSN window mechanism but are
      expected by the target.

   3. Unregulated and Unexpected: Control-type PDUs in this category
      are not regulated by the iSCSI CmdSN window mechanism and are
      "unexpected" by the target.

8.1.1.1 Control-Type PDUs from the Initiator in the Regulated Category

   Control-type PDUs that can be sent by the initiator in this category
   are regulated by the iSCSI CmdSN window mechanism, and the immediate
   flag is not set.

   The queuing capacity required of the iSCSI Layer at the target is
   described in section 4.2.2.1 of [iSCSI]. For each of the control-type
   PDUs that can be sent by the initiator in this category, the
   initiator MUST provision for the buffer resources required for the
   corresponding control-type PDU sent as a response from the target.
   The following is a list of the PDUs that can be sent by the
   initiator and the PDUs that are sent by the target in response:

      a. When the initiator sends a SCSI Command PDU, it expects a
         SCSI Response PDU from the target.

      b. When the initiator sends a Task Management Function Request
         PDU, it expects a Task Management Function Response PDU from
         the target.

      c. When the initiator sends a Text Request PDU, it expects a
         Text Response PDU from the target.

      d. When the initiator sends a Logout Request PDU, it expects a
         Logout Response PDU from the target.

      e. When the initiator sends a NOP-Out PDU as a ping request
         with ITT != 0xffffffff and TTT = 0xffffffff, it expects a
         NOP-In PDU from the target with the same ITT and TTT as in
         the ping request.

   Instead of any of the responses enumerated here, the target may
   send a Reject PDU before the task is active, as described in
   section 7.3 of [iSCSI].
8.1.1.2 Control-Type PDUs from the Initiator in the Unregulated but
        Expected Category

   For the control-type PDUs in the Unregulated but Expected category,
   the amount of buffer resources required at the target can be
   predetermined. The following is a list of the PDUs in this
   category:

      a. SCSI Data-out PDUs are used by the initiator to send
         unsolicited data. The amount of buffer resources required
         by the target can be determined using FirstBurstLength.
         Note that SCSI Data-out PDUs are not used for solicited
         data, since the R2T PDU that is used for solicitation is
         transformed into RDMA Read operations by the iSER Layer at
         the target. See section 7.3.4.

      b. A NOP-Out PDU with TTT != 0xffffffff is sent as a ping
         response by the initiator to the NOP-In PDU sent as a ping
         request by the target.

8.1.1.3 Control-Type PDUs from the Initiator in the Unregulated and
        Unexpected Category

   PDUs in the Unregulated and Unexpected category are PDUs with the
   immediate flag set. The number of PDUs in this category that can
   be sent by an initiator is controlled by the value of
   MaxOutstandingUnexpectedPDUs declared by the target (see section
   6.7). After a PDU in this category is sent by the initiator, it is
   outstanding until it is retired. At any time, the number of
   outstanding unexpected PDUs MUST NOT exceed the value of
   MaxOutstandingUnexpectedPDUs declared by the target.

   The target uses the value of MaxOutstandingUnexpectedPDUs that it
   declared to determine the amount of buffer resources required for
   control-type PDUs in this category that can be sent by an initiator.
   For each of the control-type PDUs that it can send in this category,
   the initiator MUST provision the buffer resources, if any, required
   for the corresponding control-type PDU that can be sent as a
   response from the target.
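   The three categories of section 8.1.1 can be sketched as a
   classification function over the PDUs an initiator sends. This is a
   hypothetical helper: opcodes are plain strings, and TTT is an
   integer with 0xffffffff as the reserved value. It checks the
   Unregulated but Expected cases before the immediate flag, assuming
   that the list in section 8.1.1.2 takes precedence. This ordering
   matters because, in [iSCSI], a NOP-Out sent as a ping response
   carries ITT 0xffffffff and therefore has the immediate flag set.

```python
RESERVED_TAG = 0xFFFFFFFF

REGULATED = "Regulated"
UNREGULATED_EXPECTED = "Unregulated but Expected"
UNREGULATED_UNEXPECTED = "Unregulated and Unexpected"


def classify_initiator_pdu(opcode: str, immediate: bool,
                           ttt: int = RESERVED_TAG) -> str:
    """Return the section 8.1.1 flow-control category of an initiator PDU."""
    # Section 8.1.1.2: unsolicited SCSI Data-out, sized by FirstBurstLength.
    if opcode == "SCSI Data-out":
        return UNREGULATED_EXPECTED
    # Section 8.1.1.2: NOP-Out answering a target's NOP-In ping request.
    if opcode == "NOP-Out" and ttt != RESERVED_TAG:
        return UNREGULATED_EXPECTED
    # Section 8.1.1.3: any other PDU with the immediate flag set counts
    # against the MaxOutstandingUnexpectedPDUs declared by the target.
    if immediate:
        return UNREGULATED_UNEXPECTED
    # Section 8.1.1.1: governed by the CmdSN window.
    return REGULATED
```

   Only PDUs classified as Unregulated and Unexpected need to be
   counted against the target's MaxOutstandingUnexpectedPDUs.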
   An outstanding PDU in this category is retired as follows. If the
   CmdSN of the PDU sent by the initiator in this category is x, the
   PDU is outstanding until the initiator sends a non-immediate
   control-type PDU on the same connection with CmdSN = y (where y is
   at least x) and the target responds with a control-type PDU on any
   connection where ExpCmdSN is at least y+1.

   When the number of outstanding unexpected control-type PDUs equals
   MaxOutstandingUnexpectedPDUs, the iSCSI Layer at the initiator MUST
   NOT generate any unexpected PDU that it would otherwise have
   generated, even if the PDU is intended for immediate delivery.

8.1.2 Flow Control for Control-Type PDUs from the Target

   Control-type PDUs that can be sent by a target and are expected by
   the initiator are listed in the Regulated category (see section
   8.1.1.1).

   For the control-type PDUs that can be sent by a target and are
   unexpected by the initiator, the number is controlled by the value
   of MaxOutstandingUnexpectedPDUs declared by the initiator (see
   section 6.7). After a PDU in this category is sent by a target, it
   is outstanding until it is retired. At any time, the number of
   outstanding unexpected PDUs MUST NOT exceed the value of
   MaxOutstandingUnexpectedPDUs declared by the initiator. The
   initiator uses the value of MaxOutstandingUnexpectedPDUs that it
   declared to determine the amount of buffer resources required for
   control-type PDUs in this category that can be sent by a target.
   The following is a list of the PDUs in this category and the
   conditions for retiring the outstanding PDU:

      a. For an Asynchronous Message PDU with StatSN = x, the PDU is
         outstanding until the initiator sends a control-type PDU
         with ExpStatSN set to at least x+1.

      b. For a Reject PDU with StatSN = x that is sent after a task
         is active, the PDU is outstanding until the initiator sends
         a control-type PDU with ExpStatSN set to at least x+1.

      c. For a NOP-In PDU with ITT = 0xffffffff and StatSN = x, the
         PDU is outstanding until the initiator responds with a
         control-type PDU on the same connection where ExpStatSN is
         at least x+1. However, if the NOP-In PDU is sent as a ping
         request with TTT != 0xffffffff, the PDU can also be retired
         when the initiator sends a NOP-Out PDU with the same ITT and
         TTT as in the ping request. Note that when a target sends a
         NOP-In PDU as a ping request, it must provision a buffer for
         the NOP-Out PDU sent as a ping response from the initiator.

   When the number of outstanding unexpected control-type PDUs equals
   MaxOutstandingUnexpectedPDUs, the iSCSI Layer at the target MUST NOT
   generate any unexpected PDU that it would otherwise have generated,
   even if its intent is to indicate an iSCSI error condition (e.g.,
   Asynchronous Message, Reject). In these cases, task timeouts (such
   as the initiator waiting for a command to complete) or other
   connection-level and session-level exceptions will ensure correct
   operational behavior even though the PDU is not generated. This
   rule overrides any requirement elsewhere that a Reject PDU MUST be
   sent.

   (Implementation note: SCSI task timeout and recovery can be a
   lengthy process and hence SHOULD be avoided by proper provisioning
   of resources.)
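   The target-side accounting in section 8.1.2 can be sketched as a
   small window kept per connection. The class and method names are
   hypothetical; the caller is assumed to invoke on_exp_stat_sn() for
   every control-type PDU received from the initiator and
   on_ping_response() for every NOP-Out ping response. Because one
   StatSN value can appear on more than one outstanding PDU (for
   example, in [iSCSI] a NOP-In with ITT 0xffffffff does not advance
   StatSN), the sketch keeps a list rather than a set keyed by StatSN,
   and it compares sequence numbers with 32-bit serial number
   arithmetic, as iSCSI does, so that the comparison survives
   wraparound.

```python
from typing import List, Optional, Tuple

SN_MASK = 0xFFFFFFFF


def sn_lt(a: int, b: int) -> bool:
    """32-bit serial number comparison a < b (RFC 1982, SERIAL_BITS = 32)."""
    diff = (b - a) & SN_MASK
    return diff != 0 and diff < 0x80000000


class UnexpectedPduWindow:
    """Hypothetical tracker of unexpected control-type PDUs sent by a target."""

    def __init__(self, max_outstanding: int) -> None:
        # MaxOutstandingUnexpectedPDUs as declared by the initiator.
        self.max_outstanding = max_outstanding
        # One (StatSN, TTT of a NOP-In ping request or None) per outstanding PDU.
        self._outstanding: List[Tuple[int, Optional[int]]] = []

    @property
    def outstanding(self) -> int:
        return len(self._outstanding)

    def try_send(self, stat_sn: int, ping_ttt: Optional[int] = None) -> bool:
        """Record a PDU about to be sent; False means it MUST NOT be generated."""
        if len(self._outstanding) >= self.max_outstanding:
            return False
        self._outstanding.append((stat_sn, ping_ttt))
        return True

    def on_exp_stat_sn(self, exp_stat_sn: int) -> None:
        """Retire every PDU with StatSN x such that ExpStatSN >= x + 1."""
        self._outstanding = [(sn, ttt) for sn, ttt in self._outstanding
                             if not sn_lt(sn, exp_stat_sn)]

    def on_ping_response(self, ttt: int) -> None:
        """Retire the NOP-In ping request answered by a NOP-Out with this TTT."""
        self._outstanding = [(sn, t) for sn, t in self._outstanding if t != ttt]
```

   With max_outstanding set to 2, an Asynchronous Message and a NOP-In
   ping request sent with StatSN 10 fill the window, so a third
   unexpected PDU is refused until the NOP-Out ping response arrives or
   the initiator reports ExpStatSN 11.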

   (Implementation note: To ensure that the initiator has a means to
   inform the target that outstanding PDUs have been retired, the
   target should reserve the last unexpected control-type PDU allowable
   by the value of MaxOutstandingUnexpectedPDUs declared by the
   initiator for sending a NOP-In ping request with TTT != 0xffffffff
   to allow the initiator to return the NOP-Out ping response with the
   current ExpStatSN.)

8.2 Flow Control for RDMA Read Resources

   If iSERHelloRequired is negotiated to "Yes", then the total number
   of RDMA Read operations that can be active simultaneously on an
   iSCSI/iSER connection depends on the amount of resources allocated
   as declared in the iSER Hello exchange described in section 5.1.3.
   Exceeding the number of RDMA Read operations allowed on a connection
   will result in the connection being terminated by the RCaP layer.
   The iSER Layer at the target maintains the iSER-ORD to keep track of
   the maximum number of RDMA Read Requests that can be issued by the
   iSER Layer on a particular RCaP Stream.

   During connection setup (see section 5.1), iSER-IRD is known at the
   initiator and iSER-ORD is known at the target after the iSER Layers
   at the initiator and the target have respectively allocated the
   connection resources necessary to support RCaP, as directed by the
   Allocate_Connection_Resources Operational Primitive from the iSCSI
   Layer before the end of the iSCSI Login Phase.  In the full feature
   phase, if iSERHelloRequired is negotiated to "Yes", then the first
   message sent by the initiator is the iSER Hello Message (see section
   9.3), which contains the value of iSER-IRD.  In response to the iSER
   Hello Message, the target sends the iSER HelloReply Message (see
   section 9.4), which contains the value of iSER-ORD.
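
   Because exceeding the permitted number of RDMA Read operations
   causes the RCaP layer to terminate the connection, the target needs
   a gate in front of the RCaP layer.  The following is a minimal
   sketch, assuming a hypothetical `post_read` callback that stands in
   for handing a request to the RCaP layer; requests beyond iSER-ORD
   are queued rather than issued.

```python
from collections import deque
from typing import Callable, Deque


class RdmaReadThrottle:
    """Hypothetical gate keeping in-flight RDMA Read Requests <= iSER-ORD."""

    def __init__(self, iser_ord: int, post_read: Callable[[int], None]) -> None:
        self.iser_ord = iser_ord            # iSER-ORD known at the target
        self.in_flight = 0
        self.pending: Deque[int] = deque()  # request ids waiting for a free slot
        self._post_read = post_read         # stand-in for the RCaP layer

    def submit(self, request_id: int) -> None:
        if self.in_flight < self.iser_ord:
            self.in_flight += 1
            self._post_read(request_id)
        else:
            self.pending.append(request_id)  # queue instead of exceeding iSER-ORD

    def on_read_complete(self) -> None:
        if self.in_flight == 0:
            raise RuntimeError("completion without an outstanding RDMA Read")
        self.in_flight -= 1
        if self.pending:
            self.in_flight += 1
            self._post_read(self.pending.popleft())
```

   How completions are reported is left to the RCaP layer and is not
   modeled here; the sketch only shows that the in-flight count never
   exceeds the declared iSER-ORD.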

   The iSER Layer at both the initiator and the target MAY adjust
   (lower) the resources associated with iSER-IRD and iSER-ORD,
   respectively, to match the iSER-ORD value declared in the
   HelloReply Message.  The iSER Layer at the target MUST flow control
   the RDMA Read Request Messages so as not to exceed the iSER-ORD
   value at the target.

   If iSERHelloRequired is negotiated to "No", then the maximum number
   of RDMA Read operations that can be active is negotiated via other
   means outside the scope of this document.  For example, in
   InfiniBand, iSER connection setup uses InfiniBand CM MADs, with
   additional iSER information exchanged in the private data.

8.3 STag Management

   An STag is an identifier of a Tagged Buffer used in an RDMA
   operation.  The allocation and the subsequent invalidation of the
   STags are specified in this document if the STags are exposed on the
   wire by being Advertised in the iSER header or declared in the
   header of an RCaP Message.

8.3.1 Allocation of STags

   When the iSCSI Layer at the initiator invokes the Send_Control
   Operational Primitive to request the iSER Layer at the initiator to
   process a SCSI Command, zero, one, or two STags may be allocated by
   the iSER Layer.  See section 7.3.1 for details.  The number of STags
   allocated depends on whether the command is unidirectional or
   bidirectional and whether solicited write data transfer is involved
   or not.

   When the iSCSI Layer at the initiator invokes the Send_Control
   Operational Primitive to request the iSER Layer at the initiator to
   process a Task Management Function Request with the TASK REASSIGN
   function, besides allocating zero, one, or two STags, the iSER Layer
   MUST invalidate the existing STags (if any) associated with the ITT.
   See section 7.3.3 for details.
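
   A minimal sketch of the initiator's Local Mapping under these
   allocation rules.  All names are hypothetical, and the
   counter-based STag values stand in for real memory registration,
   which is not modeled.  Keying the mapping by ITT is what lets the
   TASK REASSIGN path find the STags it must invalidate before
   allocating new ones.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TaskStags:
    write_stag: Optional[int] = None  # Advertised as the Write STag (RDMA Read source)
    read_stag: Optional[int] = None   # Advertised as the Read STag (RDMA Write sink)


class InitiatorLocalMapping:
    """Hypothetical ITT -> STag bookkeeping in the initiator's iSER Layer."""

    def __init__(self) -> None:
        self._stags = itertools.count(0x100)  # stand-in for real memory registration
        self.by_itt: Dict[int, TaskStags] = {}
        self.invalidated: List[int] = []

    def allocate(self, itt: int, has_read_buffer: bool, has_write_buffer: bool) -> TaskStags:
        """Allocate zero, one, or two STags for a task.

        has_write_buffer: solicited write data (and, for TASK REASSIGN,
        non-immediate unsolicited data) will be fetched by RDMA Read.
        has_read_buffer: the command returns data via RDMA Write.
        """
        stags = TaskStags(
            write_stag=next(self._stags) if has_write_buffer else None,
            read_stag=next(self._stags) if has_read_buffer else None,
        )
        self.by_itt[itt] = stags
        return stags

    def invalidate(self, itt: int) -> None:
        """Invalidate the STags (if any) for an ITT and drop its mapping entry."""
        stags = self.by_itt.pop(itt, None)
        if stags is None:
            return
        for stag in (stags.write_stag, stags.read_stag):
            if stag is not None:
                self.invalidated.append(stag)

    def task_reassign(self, itt: int, has_read_buffer: bool, has_write_buffer: bool) -> TaskStags:
        """TASK REASSIGN: invalidate the existing STags for the ITT, then allocate."""
        self.invalidate(itt)
        return self.allocate(itt, has_read_buffer, has_write_buffer)
```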

   The iSER Layer at the target allocates a local Data Sink STag when
   the iSCSI Layer at the target invokes the Get_Data Operational
   Primitive to request the iSER Layer to process an R2T PDU.  See
   section 7.3.6 for details.

8.3.2 Invalidation of STags

   The invalidation of the STags at the initiator at the completion of
   a unidirectional or bidirectional command when the associated SCSI
   Response PDU is sent by the target is described in section 7.3.2.

   When a unidirectional or bidirectional command concludes without the
   associated SCSI Response PDU being sent by the target, the iSCSI
   Layer at the initiator MUST request the iSER Layer at the initiator
   to invalidate the STags by invoking the Deallocate_Task_Resources
   Operational Primitive qualified with ITT.  In response, the iSER
   Layer at the initiator MUST locate the STags (if any) in the Local
   Mapping.  The iSER Layer at the initiator MUST invalidate the STags
   (if any) and the Local Mapping.

   For an RDMA Read operation used to realize a SCSI Write data
   transfer, the iSER Layer at the target SHOULD invalidate the Data
   Sink STag at the conclusion of the RDMA Read operation referencing
   the Data Sink STag (to permit the immediate reuse of buffer
   resources).

   For an RDMA Write operation used to realize a SCSI Read data
   transfer, the Data Source STag at the target is not declared to the
   initiator and is not exposed on the wire.  Invalidation of the STag
   is thus not specified.

   When a unidirectional or bidirectional command concludes without the
   associated SCSI Response PDU being sent by the target, the iSCSI
   Layer at the target MUST request the iSER Layer at the target to
   invalidate the STags by invoking the Deallocate_Task_Resources
   Operational Primitive qualified with ITT.  In response, the iSER
   Layer at the target MUST locate the local STags (if any) in the
   Local Mapping.
   The iSER Layer at the target MUST invalidate the local STags (if
   any) and the Local Mapping.

9 iSER Control and Data Transfer

   For iSCSI data-type PDUs (see section 7.1), the iSER Layer uses RDMA
   Read and RDMA Write operations to transfer the solicited data.  For
   iSCSI control-type PDUs (see section 7.2), the iSER Layer uses Send
   Messages of RCaP.

9.1 iSER Header Format

   An iSER header MUST be present in every Send Message of RCaP.  The
   iSER header is located in the first 28 bytes of the message payload
   of the Send Message of RCaP, as shown in Figure 2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Opcode|                Opcode Specific Fields                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Opcode Specific Fields (32 bits)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |               Opcode Specific Fields (64 bits)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Opcode Specific Fields (32 bits)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |               Opcode Specific Fields (64 bits)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 2 iSER Header Format

   Opcode - Operation Code: 4 bits

      The Opcode field identifies the type of iSER Messages:

         0001b = iSCSI control-type PDU

         0010b = iSER Hello Message

         0011b = iSER HelloReply Message

         All other opcodes are reserved.

9.2 iSER Header Format for iSCSI Control-Type PDU

   The iSER Layer uses Send Messages of RCaP to transfer iSCSI
   control-type PDUs (see section 7.2).  The message payload of each of
   the Send Messages of RCaP used for transferring an iSER Message
   contains an iSER Header followed by an iSCSI control-type PDU.
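
   On receive, the generic layout of Figure 2 lets a peer dispatch on
   the Opcode before interpreting anything else.  The sketch below is
   illustrative; the function name is hypothetical, and it assumes
   the usual RFC diagram convention that bit 0 is the most significant
   bit of the first byte.  Raising on a reserved opcode is one way to
   surface what section 10.1.3.3 classifies as a format error
   (illegal contents of an iSER header field).

```python
from typing import Tuple

ISER_HEADER_LEN = 28  # the iSER header occupies the first 28 bytes of the payload

OPCODES = {
    0b0001: "iSCSI control-type PDU",
    0b0010: "iSER Hello Message",
    0b0011: "iSER HelloReply Message",
}


def split_send_payload(payload: bytes) -> Tuple[str, bytes, bytes]:
    """Return (message type, iSER header, remainder) for a received Send Message.

    For opcode 0001b the remainder is the iSCSI control-type PDU.
    Raises ValueError for a payload shorter than the header or a reserved opcode.
    """
    if len(payload) < ISER_HEADER_LEN:
        raise ValueError("payload shorter than the 28-byte iSER header")
    opcode = payload[0] >> 4  # Opcode is the 4 most significant bits of byte 0
    if opcode not in OPCODES:
        raise ValueError(f"reserved iSER opcode {opcode:04b}b")
    return OPCODES[opcode], payload[:ISER_HEADER_LEN], payload[ISER_HEADER_LEN:]
```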

   The iSER header in a Send Message of RCaP carrying an iSCSI
   control-type PDU MUST have the format as described in Figure 3.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       |W|R|                                                   |
   | 0001b |S|S|                     Reserved                      |
   |       |V|V|                                                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Write STag                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                       Write Base Offset                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Read STag                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                       Read Base Offset                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 3 iSER Header Format for iSCSI Control-Type PDU

   WSV - Write STag Valid flag: 1 bit

      This flag indicates the validity of the Write STag field and
      the Write Base Offset field of the iSER Header.  If set to one,
      the Write STag field and the Write Base Offset field in this
      iSER Header are valid.  If set to zero, the Write STag field
      and the Write Base Offset field in this iSER Header MUST be
      ignored at the receiver.  The Write STag Valid flag is set to
      one when there is solicited data to be transferred for a SCSI
      Write or bidirectional command, or when there are non-immediate
      unsolicited and solicited data to be transferred for the
      referenced task specified in a Task Management Function Request
      with the TASK REASSIGN function.

   RSV - Read STag Valid flag: 1 bit

      This flag indicates the validity of the Read STag field and the
      Read Base Offset field of the iSER Header.  If set to one, the
      Read STag field and the Read Base Offset field in this iSER
      Header are valid.
      If set to zero, the Read STag field and the Read Base Offset
      field in this iSER Header MUST be ignored at the receiver.  The
      Read STag Valid flag is set to one for a SCSI Read or
      bidirectional command, or a Task Management Function Request
      with the TASK REASSIGN function.

   Write STag - Write Steering Tag: 32 bits

      This field contains the Write STag when the Write STag Valid
      flag is set to one.  For a SCSI Write or bidirectional command,
      the Write STag is used to Advertise the initiator's I/O Buffer
      containing the solicited data.  For a Task Management Function
      Request with the TASK REASSIGN function, the Write STag is used
      to Advertise the initiator's I/O Buffer containing the
      non-immediate unsolicited data and solicited data.  This Write
      STag is used as the Data Source STag in the resultant RDMA Read
      operation(s).  When the Write STag Valid flag is set to zero,
      this field MUST be set to zero and ignored on receive.

   Write Base Offset: 64 bits

      This field contains the Base Offset associated with the I/O
      Buffer for the SCSI Write command when the Write STag Valid
      flag is set to one.  When the Write STag Valid flag is set to
      zero, this field MUST be set to zero and ignored on receive.

   Read STag - Read Steering Tag: 32 bits

      This field contains the Read STag when the Read STag Valid flag
      is set to one.  The Read STag is used to Advertise the
      initiator's Read I/O Buffer of a SCSI Read or bidirectional
      command, or a Task Management Function Request with the TASK
      REASSIGN function.  This Read STag is used as the Data Sink
      STag in the resultant RDMA Write operation(s).  When the Read
      STag Valid flag is set to zero, this field MUST be set to zero
      and ignored on receive.

   Read Base Offset: 64 bits

      This field contains the Base Offset associated with the I/O
      Buffer for the SCSI Read command when the Read STag Valid flag
      is set to one.
      When the Read STag Valid flag is set to zero, this field MUST be
      set to zero and ignored on receive.

   Reserved:

      Reserved fields MUST be set to zero on transmit and MUST be
      ignored on receive.

9.3 iSER Header Format for iSER Hello Message

   An iSER Hello Message MUST contain only the iSER header, which MUST
   have the format as described in Figure 4.  If iSERHelloRequired is
   negotiated to "Yes", then the iSER Hello Message is the first iSER
   Message sent on the RCaP Stream from the iSER Layer at the initiator
   to the iSER Layer at the target.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       |       |       |       |                               |
   | 0010b | Rsvd  | MaxVer| MinVer|           iSER-IRD            |
   |       |       |       |       |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 4 iSER Header Format for iSER Hello Message

   MaxVer - Maximum Version: 4 bits

      This field specifies the maximum version of the iSER protocol
      supported.  It MUST be set to 10 to indicate the version of the
      specification described in this document.

   MinVer - Minimum Version: 4 bits

      This field specifies the minimum version of the iSER protocol
      supported.  It MUST be set to 10 to indicate the version of the
      specification described in this document.

   iSER-IRD: 16 bits

      This field contains the value of the iSER-IRD at the initiator.

   Reserved (Rsvd):

      Reserved fields MUST be set to zero on transmit and MUST be
      ignored on receive.
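
   Returning to the control-type header of section 9.2: Figure 3 maps
   onto a fixed 28-byte layout that can be packed with Python's
   `struct` module.  This is a sketch under stated assumptions:
   network (big-endian) byte order, and the RFC diagram convention
   that bit 0 is the most significant bit of the first 32-bit word.
   The function names are hypothetical.

```python
import struct
from typing import Optional, Tuple

CONTROL_OPCODE = 0b0001
WSV_BIT = 1 << 27  # bit 4 of the first 32-bit word
RSV_BIT = 1 << 26  # bit 5 of the first 32-bit word
# First word | Write STag | Write Base Offset | Read STag | Read Base Offset
_FMT = "!IIQIQ"    # network byte order, no padding: 4 + 4 + 8 + 4 + 8 = 28 bytes

Advert = Optional[Tuple[int, int]]  # (STag, Base Offset) or None


def pack_control_header(write: Advert = None, read: Advert = None) -> bytes:
    """Build the iSER header for a control-type PDU; absent fields are zero."""
    first = CONTROL_OPCODE << 28
    w_stag, w_off = write if write is not None else (0, 0)
    r_stag, r_off = read if read is not None else (0, 0)
    if write is not None:
        first |= WSV_BIT
    if read is not None:
        first |= RSV_BIT
    return struct.pack(_FMT, first, w_stag, w_off, r_stag, r_off)


def unpack_control_header(hdr: bytes) -> Tuple[Advert, Advert]:
    """Parse the header, ignoring STag/offset fields whose Valid flag is clear."""
    first, w_stag, w_off, r_stag, r_off = struct.unpack(_FMT, hdr)
    if first >> 28 != CONTROL_OPCODE:
        raise ValueError("not an iSER header for a control-type PDU")
    write = (w_stag, w_off) if first & WSV_BIT else None
    read = (r_stag, r_off) if first & RSV_BIT else None
    return write, read
```

   A SCSI Read would pass only `read`, a Write with solicited data
   only `write`, and a bidirectional command both, matching the WSV
   and RSV descriptions in section 9.2.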

9.4 iSER Header Format for iSER HelloReply Message

   An iSER HelloReply Message MUST contain only the iSER header, which
   MUST have the format as described in Figure 5.  If iSERHelloRequired
   is negotiated to "Yes", then the iSER HelloReply Message is the
   first iSER Message sent on the RCaP Stream from the iSER Layer at
   the target to the iSER Layer at the initiator.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       |     |R|       |       |                               |
   | 0011b |Rsvd |E| MaxVer| CurVer|           iSER-ORD            |
   |       |     |J|       |       |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 5 iSER Header Format for iSER HelloReply Message

   REJ - Reject flag: 1 bit

      This flag indicates whether the target is rejecting this
      connection.  If set to one, the target is rejecting the
      connection.

   MaxVer - Maximum Version: 4 bits

      This field specifies the maximum version of the iSER protocol
      supported.  It MUST be set to 10 to indicate the version of the
      specification described in this document.

   CurVer - Current Version: 4 bits

      This field specifies the current version of the iSER protocol
      supported.  It MUST be set to 10 to indicate the version of the
      specification described in this document.

   iSER-ORD: 16 bits

      This field contains the value of the iSER-ORD at the target.

   Reserved (Rsvd):

      Reserved fields MUST be set to zero on transmit and MUST be
      ignored on receive.
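
   The Hello/HelloReply exchange can be sketched end to end: the
   initiator packs its iSER-IRD, and the target parses it and answers,
   setting REJ in the two cases listed in section 10.1.3.2.  This is a
   minimal sketch with hypothetical names; it assumes network byte
   order, treats any message that is not exactly 28 bytes as the
   length error of section 10.1.3.3, and fixes CurVer at 10 as this
   section requires.  The `tgt_min`/`tgt_max` parameters model a
   target's supported version range; with the values this document
   mandates, both are 10.  The optional lowering of resources described
   in section 8.2 is not modeled.

```python
import struct
from dataclasses import dataclass

ISER_VERSION = 10       # MaxVer/MinVer/CurVer value required by this document
HELLO, HELLO_REPLY = 0b0010, 0b0011
_FMT = "!I24x"          # one 32-bit word, then 24 reserved bytes (28 bytes total)


@dataclass
class Hello:
    max_ver: int
    min_ver: int
    iser_ird: int


def pack_hello(iser_ird: int) -> bytes:
    if not 0 <= iser_ird <= 0xFFFF:
        raise ValueError("iSER-IRD is a 16-bit field")
    word = (HELLO << 28) | (ISER_VERSION << 20) | (ISER_VERSION << 16) | iser_ird
    return struct.pack(_FMT, word)


def parse_hello(msg: bytes) -> Hello:
    if len(msg) != struct.calcsize(_FMT):
        raise ValueError("length error: a Hello Message is only the 28-byte header")
    (word,) = struct.unpack(_FMT, msg)
    if word >> 28 != HELLO:
        raise ValueError("expected an iSER Hello Message")
    # Bits 4-7 are reserved and ignored on receive.
    return Hello((word >> 20) & 0xF, (word >> 16) & 0xF, word & 0xFFFF)


def build_hello_reply(hello: Hello, iser_ord: int,
                      tgt_min: int = ISER_VERSION, tgt_max: int = ISER_VERSION) -> bytes:
    """Build a HelloReply, setting REJ in the two cases of section 10.1.3.2."""
    if not 0 <= iser_ord <= 0xFFFF:
        raise ValueError("iSER-ORD is a 16-bit field")
    versions_overlap = max(hello.min_ver, tgt_min) <= min(hello.max_ver, tgt_max)
    reject = (hello.iser_ird > 0 and iser_ord == 0) or not versions_overlap
    word = ((HELLO_REPLY << 28) | (int(reject) << 24)   # REJ is bit 7
            | (tgt_max << 20) | (ISER_VERSION << 16) | iser_ord)
    return struct.pack(_FMT, word)
```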

9.5 SCSI Data Transfer Operations

   The iSER Layer at the initiator and the iSER Layer at the target
   handle each SCSI Write, SCSI Read, and bidirectional operation as
   described below.

9.5.1 SCSI Write Operation

   The iSCSI Layer at the initiator MUST invoke the Send_Control
   Operational Primitive to request the iSER Layer at the initiator to
   send the SCSI Write Command.  The iSER Layer at the initiator MUST
   request the RCaP layer to transmit a Send Message with the message
   payload consisting of the iSER header followed by the SCSI Command
   PDU and immediate data (if any).  The SendSE Message should be used
   if supported by the RCaP layer (e.g., iWARP).  If there is solicited
   data, the iSER Layer MUST Advertise the Write STag and the Base
   Offset in the iSER header of the Send Message, as described in
   section 9.2.  Upon receiving the Send Message, the iSER Layer at the
   target MUST notify the iSCSI Layer at the target by invoking the
   Control_Notify Operational Primitive qualified with the SCSI Command
   PDU.  See section 7.3.1 for details on the handling of the SCSI
   Write Command.

   For the non-immediate unsolicited data, the iSCSI Layer at the
   initiator MUST invoke a Send_Control Operational Primitive qualified
   with the SCSI Data-out PDU.  Upon receiving each Send Message
   containing the non-immediate unsolicited data, the iSER Layer at the
   target MUST notify the iSCSI Layer at the target by invoking the
   Control_Notify Operational Primitive qualified with the SCSI
   Data-out PDU.  See section 7.3.4 for details on the handling of the
   SCSI Data-out PDU.

   For the solicited data, when the iSCSI Layer at the target has an
   I/O Buffer available, it MUST invoke the Get_Data Operational
   Primitive qualified with the R2T PDU.  See section 7.3.6 for details
   on the handling of the R2T PDU.
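
   Exactly how the target turns that R2T into an RDMA Read is
   specified in section 7.3.6.  As a rough illustration only, the
   sketch below assumes that the RDMA Read reads from the initiator's
   Advertised Write STag at the Write Base Offset plus the R2T's Buffer
   Offset, into the local Data Sink STag allocated for the R2T (section
   8.3.1).  The record type and helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RdmaReadRequest:
    """Hypothetical description of one RDMA Read issued by the target."""
    data_sink_stag: int      # local STag allocated for this R2T
    data_sink_offset: int    # offset within the local I/O Buffer
    data_source_stag: int    # initiator's Advertised Write STag
    data_source_offset: int  # assumed: Write Base Offset + R2T Buffer Offset
    length: int              # R2T Desired Data Transfer Length


def r2t_to_rdma_read(write_stag: int, write_base_offset: int,
                     buffer_offset: int, desired_length: int,
                     local_stag: int, local_offset: int = 0) -> RdmaReadRequest:
    """Map one R2T onto one RDMA Read, assuming the source offset is the
    Write Base Offset plus the R2T Buffer Offset."""
    if desired_length <= 0:
        raise ValueError("Desired Data Transfer Length must be positive")
    return RdmaReadRequest(
        data_sink_stag=local_stag,
        data_sink_offset=local_offset,
        data_source_stag=write_stag,
        data_source_offset=write_base_offset + buffer_offset,
        length=desired_length,
    )
```

   Because each such RDMA Read consumes one of the iSER-ORD slots
   discussed in section 8.2, a target would normally pass these
   requests through its RDMA Read flow control rather than posting
   them directly.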

   When the data transfer associated with this SCSI Write operation is
   complete, the iSCSI Layer at the target MUST invoke the Send_Control
   Operational Primitive when it is ready to send the SCSI Response
   PDU.  Upon receiving a Send Message containing the SCSI Response
   PDU, the iSER Layer at the initiator MUST notify the iSCSI Layer at
   the initiator by invoking the Control_Notify Operational Primitive
   qualified with the SCSI Response PDU.  See section 7.3.2 for details
   on the handling of the SCSI Response PDU.

9.5.2 SCSI Read Operation

   The iSCSI Layer at the initiator MUST invoke the Send_Control
   Operational Primitive to request the iSER Layer at the initiator to
   send the SCSI Read Command.  The iSER Layer at the initiator MUST
   request the RCaP layer to transmit a Send Message with the message
   payload consisting of the iSER header followed by the SCSI Command
   PDU.  The SendSE Message should be used if supported by the RCaP
   layer (e.g., iWARP).  The iSER Layer at the initiator MUST Advertise
   the Read STag and the Base Offset in the iSER header of the Send
   Message, as described in section 9.2.  Upon receiving the Send
   Message, the iSER Layer at the target MUST notify the iSCSI Layer at
   the target by invoking the Control_Notify Operational Primitive
   qualified with the SCSI Command PDU.  See section 7.3.1 for details
   on the handling of the SCSI Read Command.

   When the requested SCSI data is available in the I/O Buffer, the
   iSCSI Layer at the target MUST invoke the Put_Data Operational
   Primitive qualified with the SCSI Data-in PDU.  See section 7.3.5
   for details on the handling of the SCSI Data-in PDU.

   When the data transfer associated with this SCSI Read operation is
   complete, the iSCSI Layer at the target MUST invoke the Send_Control
   Operational Primitive when it is ready to send the SCSI Response
   PDU.
   The SendInvSE Message should be used if supported by the RCaP layer
   (e.g., iWARP).  Upon receiving the Send Message containing the SCSI
   Response PDU, the iSER Layer at the initiator MUST notify the iSCSI
   Layer at the initiator by invoking the Control_Notify Operational
   Primitive qualified with the SCSI Response PDU.  See section 7.3.2
   for details on the handling of the SCSI Response PDU.

9.5.3 Bidirectional Operation

   The initiator and the target handle the SCSI Write and the SCSI Read
   portions of this bidirectional operation the same as described in
   Section 9.5.1 and Section 9.5.2, respectively.

10 iSER Error Handling and Recovery

   RCaP provides the iSER Layer with reliable in-order delivery.
   Therefore, the error management needs of an iSER-assisted connection
   are somewhat different from those of a Traditional iSCSI connection.

10.1 Error Handling

   iSER error handling is described in the following sections,
   classified loosely based on the sources of errors:

   1. Those originating at the transport layer (e.g., TCP).

   2. Those originating at the RCaP layer.

   3. Those originating at the iSER Layer.

   4. Those originating at the iSCSI Layer.

10.1.1 Errors in the Transport Layer

   If the transport layer is TCP, then TCP packets with detected errors
   are silently dropped by the TCP layer and result in retransmission
   at the TCP layer.  This has no impact on the iSER Layer.  However,
   connection loss (e.g., link failure) and unexpected termination
   (e.g., TCP graceful or abnormal close without the iSCSI Logout
   exchanges) at the transport layer will cause the iSCSI/iSER
   connection to be terminated as well.

10.1.1.1 Failure in the Transport Layer Before RCaP Mode is Enabled

   If the Connection is lost or terminated before the iSCSI Layer
   invokes the Allocate_Connection_Resources Operational Primitive, the
   login process is terminated and no further action is required.

   If the Connection is lost or terminated after the iSCSI Layer has
   invoked the Allocate_Connection_Resources Operational Primitive,
   then the iSCSI Layer MUST request the iSER Layer to deallocate all
   connection resources by invoking the Deallocate_Connection_Resources
   Operational Primitive.

10.1.1.2 Failure in the Transport Layer After RCaP Mode is Enabled

   If the Connection is lost or terminated after the iSCSI Layer has
   invoked the Enable_Datamover Operational Primitive, the iSER Layer
   MUST notify the iSCSI Layer of the connection loss by invoking the
   Connection_Terminate_Notify Operational Primitive.  Prior to
   invoking the Connection_Terminate_Notify Operational Primitive, the
   iSER Layer MUST perform the actions described in Section 5.2.3.2.

10.1.2 Errors in the RCaP Layer

   The RCaP layer does not have error recovery operations built in.  If
   errors are detected at the RCaP layer, the RCaP layer will terminate
   the RCaP Stream and the associated Connection.

10.1.2.1 Errors Detected in the Local RCaP Layer

   If an error is encountered at the local RCaP layer, the RCaP layer
   MAY send a Send Message to the Remote Peer to report the error if
   possible.  (For iWARP, see [RDMAP] for the list of errors where a
   Terminate Message is sent.)  The RCaP layer is responsible for
   terminating the Connection.  After the RCaP layer notifies the iSER
   Layer that the Connection is terminated, the iSER Layer MUST notify
   the iSCSI Layer by invoking the Connection_Terminate_Notify
   Operational Primitive.
   Prior to invoking the Connection_Terminate_Notify Operational
   Primitive, the iSER Layer MUST perform the actions described in
   Section 5.2.3.2.

10.1.2.2 Errors Detected in the RCaP Layer at the Remote Peer

   If an error is encountered at the RCaP layer at the Remote Peer, the
   RCaP layer at the Remote Peer may send a Send Message to report the
   error if possible.  If it is unable to send a Send Message, the
   Connection is terminated.  This is treated the same as a failure in
   the transport layer after RCaP mode is enabled, as described in
   section 10.1.1.2.

   If an error is encountered at the RCaP layer at the Remote Peer and
   it is able to send a Send Message, the RCaP layer at the Remote Peer
   is responsible for terminating the Connection.  After the local RCaP
   layer notifies the iSER Layer that the Connection is terminated, the
   iSER Layer MUST notify the iSCSI Layer by invoking the
   Connection_Terminate_Notify Operational Primitive.  Prior to
   invoking the Connection_Terminate_Notify Operational Primitive, the
   iSER Layer MUST perform the actions described in Section 5.2.3.2.

10.1.3 Errors in the iSER Layer

   The error handling due to errors at the iSER Layer is described in
   the following sections.

10.1.3.1 Insufficient Connection Resources to Support RCaP at
         Connection Setup

   After the iSCSI Layer at the initiator invokes the
   Allocate_Connection_Resources Operational Primitive during the iSCSI
   login negotiation phase, if the iSER Layer at the initiator fails to
   allocate the connection resources necessary to support RCaP, it MUST
   return a status of failure to the iSCSI Layer at the initiator.  The
   iSCSI Layer at the initiator MUST terminate the Connection as
   described in Section 5.2.3.1.

   After the iSCSI Layer at the target invokes the
   Allocate_Connection_Resources Operational Primitive during the iSCSI
   login negotiation phase, if the iSER Layer at the target fails to
   allocate the connection resources necessary to support RCaP, it MUST
   return a status of failure to the iSCSI Layer at the target.  The
   iSCSI Layer at the target MUST send a Login Response with a status
   class of 3 (Target Error) and a status code of "0302" (Out of
   Resources).  The iSCSI Layers at the initiator and the target MUST
   terminate the Connection as described in Section 5.2.3.1.

10.1.3.2 iSER Negotiation Failures

   If iSERHelloRequired is negotiated to "Yes" and the RCaP or iSER
   related parameters declared by the initiator in the iSER Hello
   Message are unacceptable to the iSER Layer at the target, the iSER
   Layer at the target MUST set the Reject (REJ) flag, as described in
   section 9.4, in the iSER HelloReply Message.  The following are the
   cases when the iSER Layer MUST set the REJ flag to 1 in the
   HelloReply Message:

   * The initiator-declared iSER-IRD value is greater than 0 and the
     target-declared iSER-ORD value is 0.

   * The initiator-supported and the target-supported iSER protocol
     versions do not overlap.

   After requesting the RCaP layer to send the iSER HelloReply Message,
   the handling of the error situation is the same as that for iSER
   format errors as described in section 10.1.3.3.

10.1.3.3 iSER Format Errors

   The following types of errors in an iSER header are considered
   format errors:

   * Illegal contents of any iSER header field

   * Inconsistent field contents in an iSER header

   * Length error for an iSER Hello or HelloReply Message (see sections
     9.3 and 9.4)

   When a format error is detected, the following events MUST occur in
   the specified sequence:

   1. The iSER Layer MUST request the RCaP layer to terminate the RCaP
      Stream.  The RCaP layer MUST terminate the associated
      Connection.

   2. The iSER Layer MUST notify the iSCSI Layer of the connection
      termination by invoking the Connection_Terminate_Notify
      Operational Primitive.  Prior to invoking the
      Connection_Terminate_Notify Operational Primitive, the iSER
      Layer MUST perform the actions described in Section 5.2.3.2.

10.1.3.4 iSER Protocol Errors

   If iSERHelloRequired is negotiated to "Yes", then the first iSER
   Message sent by the iSER Layer at the initiator MUST be the iSER
   Hello Message (see section 9.3).  In this case, the first iSER
   Message sent by the iSER Layer at the target MUST be the iSER
   HelloReply Message (see section 9.4).  Failure to send the iSER
   Hello or HelloReply Message, as indicated by the wrong Opcode in the
   iSER header, is a protocol error.  Conversely, it is a protocol
   error if the iSER Hello Message is sent by the iSER Layer at the
   initiator when iSERHelloRequired is negotiated to "No".  The
   handling of iSER protocol errors is the same as that for iSER format
   errors as described in section 10.1.3.3.

   If the sending side of an iSER-enabled connection acts in a manner
   not permitted by the negotiated or declared login/text operational
   key values as described in section 6, this is a protocol error and
   the receiving side MAY handle this the same as for iSER format
   errors as described in section 10.1.3.3.

10.1.4 Errors in the iSCSI Layer

   The error handling due to errors at the iSCSI Layer is described in
   the following sections.  For error recovery, see section 10.2.

10.1.4.1 iSCSI Format Errors

   When an iSCSI format error is detected, the iSCSI Layer MUST request
   the iSER Layer to terminate the RCaP Stream by invoking the
   Connection_Terminate Operational Primitive.
   For more details on the connection termination, see Section
   5.2.3.1.

10.1.4.2 iSCSI Digest Errors

   In the iSER-assisted mode, the iSCSI Layer will not see any digest
   error because both the HeaderDigest and the DataDigest keys are
   negotiated to "None".

10.1.4.3 iSCSI Sequence Errors

   For Traditional iSCSI, sequence errors are caused by dropped PDUs
   due to header or data digest errors.  Since digests are not used in
   iSER-assisted mode and the RCaP layer will deliver all messages in
   the order they were sent, sequence errors will not occur in
   iSER-assisted mode.

10.1.4.4 iSCSI Protocol Error

   When the iSCSI Layer handles certain protocol errors by dropping the
   connection, the error handling is the same as that for iSCSI format
   errors as described in section 10.1.4.1.

   When the iSCSI Layer uses the iSCSI Reject PDU and response codes to
   handle certain other protocol errors, no special handling at the
   iSER Layer is required.

10.1.4.5 SCSI Timeouts and Session Errors

   These are handled at the iSCSI Layer and no special handling at the
   iSER Layer is required.

10.1.4.6 iSCSI Negotiation Failures

   For negotiation failures that happen during the Login Phase at the
   initiator after the iSCSI Layer has invoked the
   Allocate_Connection_Resources Operational Primitive and before the
   Enable_Datamover Operational Primitive has been invoked, the iSCSI
   Layer MUST request the iSER Layer to deallocate all connection
   resources by invoking the Deallocate_Connection_Resources
   Operational Primitive.  The iSCSI Layer at the initiator MUST
   terminate the Connection.

   For negotiation failures during the Login Phase at the target, the
   iSCSI Layer can use a Login Response with a status class other than
   0 (Success) to terminate the Login Phase.
   If the iSCSI Layer has invoked the Allocate_Connection_Resources
   Operational Primitive but the Enable_Datamover Operational Primitive
   has not yet been invoked, the iSCSI Layer at the target MUST request
   the iSER Layer at the target to deallocate all connection resources
   by invoking the Deallocate_Connection_Resources Operational
   Primitive.  The iSCSI Layers at both the initiator and the target
   MUST terminate the Connection.

   During the iSCSI Login Phase, if the iSCSI Layer at the initiator
   receives a Login Response from the target with a status class other
   than 0 (Success) after the iSCSI Layer at the initiator has invoked
   the Allocate_Connection_Resources Operational Primitive, the iSCSI
   Layer MUST request the iSER Layer to deallocate all connection
   resources by invoking the Deallocate_Connection_Resources
   Operational Primitive.  The iSCSI Layer MUST terminate the
   Connection in this case.

   For negotiation failures during the full feature phase, the error
   handling is left to the iSCSI Layer and no special handling at the
   iSER Layer is required.

10.2 Error Recovery

   Error recovery requirements of iSCSI/iSER are the same as those of
   Traditional iSCSI.  All three ErrorRecoveryLevels as defined in
   [iSCSI] are supported in iSCSI/iSER.

   * For ErrorRecoveryLevel 0, session recovery is handled by iSCSI
     and no special handling by the iSER Layer is required.

   * For ErrorRecoveryLevel 1, see section 10.2.1 on PDU Recovery.

   * For ErrorRecoveryLevel 2, see section 10.2.2 on Connection
     Recovery.

   The iSCSI Layer may invoke the Notice_Key_Values Operational
   Primitive during connection setup to request the iSER Layer to take
   note of the value of the operational ErrorRecoveryLevel, as
   described in sections 5.1.1 and 5.1.2.

10.2.1 PDU Recovery

As described in sections 10.1.4.2 and 10.1.4.3, digest and sequence errors will not occur in the iSER-assisted mode. If the RCaP layer detects an error, it will close the iSCSI/iSER connection, as described in section 10.1.2. Therefore, PDU recovery is not useful in the iSER-assisted mode.

The iSCSI Layer at the initiator SHOULD disable iSCSI timeout-driven PDU retransmissions.

10.2.2 Connection Recovery

The iSCSI Layer at the initiator MAY reassign connection allegiance for non-immediate commands that are still in progress and are associated with the failed connection by using a Task Management Function Request with the TASK REASSIGN function. See section 7.3.3 for more details.

When the iSCSI Layer at the initiator does a task reassignment for a SCSI Write command, it MUST qualify the Send_Control Operational Primitive invocation with DataDescriptorOut, which defines the I/O Buffer for both the non-immediate unsolicited data and the solicited data. This allows the iSCSI Layer at the target to use recovery R2Ts to request data originally sent as unsolicited and solicited from the initiator.

When the iSCSI Layer at the target accepts a reassignment request for a SCSI Read command, it MUST request the iSER Layer to process SCSI Data-in for all unacknowledged data by invoking the Put_Data Operational Primitive. See section 7.3.5 on the handling of SCSI Data-in.

When the iSCSI Layer at the target accepts a reassignment request for a SCSI Write command, it MUST request the iSER Layer to process a recovery R2T for any non-immediate unsolicited data and any solicited data sequences that have not been received by invoking the Get_Data Operational Primitive. See section 7.3.6 on the handling of Ready To Transfer (R2T).
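
For a reassigned SCSI Write command, the target essentially has to work out which parts of the I/O Buffer it has not yet received. The following Python sketch assumes, for illustration only, that the target tracks received data as (offset, length) extents relative to the start of the command's data; the helper name is hypothetical. Each extent it returns would correspond to one recovery R2T that the iSCSI Layer passes to the iSER Layer through Get_Data.

```python
# Hypothetical helper, not part of iSER or iSCSI: computes the ranges that
# recovery R2Ts would cover after a SCSI Write command is reassigned.
from typing import List, Tuple

Extent = Tuple[int, int]  # (buffer offset, length) in bytes


def recovery_r2t_extents(total_length: int, received: List[Extent],
                         max_burst_length: int) -> List[Extent]:
    """Return (offset, length) pairs covering data not yet received.

    Each gap is split so that no piece exceeds max_burst_length, since the
    Desired Data Transfer Length of an R2T must not exceed MaxBurstLength.
    """
    gaps: List[Extent] = []
    cursor = 0  # highest end offset covered by received data so far
    for offset, length in sorted(received):
        if offset > cursor:
            gaps.append((cursor, offset - cursor))
        cursor = max(cursor, offset + length)
    if cursor < total_length:
        gaps.append((cursor, total_length - cursor))

    r2ts: List[Extent] = []
    for offset, length in gaps:
        while length > 0:
            chunk = min(length, max_burst_length)
            r2ts.append((offset, chunk))
            offset += chunk
            length -= chunk
    return r2ts


# 64 KiB write: the immediate data [0, 8192) and one solicited burst
# [16384, 32768) arrived before the original connection failed.
print(recovery_r2t_extents(65536, [(0, 8192), (16384, 16384)], 16384))
# [(8192, 8192), (32768, 16384), (49152, 16384)]
```

In this example three recovery R2Ts are needed: one for the missing unsolicited range, and two for the tail of the buffer, which is split at a MaxBurstLength of 16384 bytes.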

The iSCSI Layer at the target MUST NOT issue recovery R2Ts on an iSCSI/iSER connection for a task for which the connection allegiance was never reassigned. The iSER Layer at the target MAY reject such a recovery R2T received via the Get_Data Operational Primitive invocation from the iSCSI Layer at the target, with an appropriate error code.

The iSER Layer at the target will process the requests invoked by the Put_Data and Get_Data Operational Primitives for a reassigned task in the same way as for the original commands.

11 Security Considerations

When iSER is layered on top of an RCaP layer and provides the RDMA extensions to the iSCSI protocol, the security considerations of iSER are the same as those of the underlying RCaP layer. For iWARP, these are described in [RDMAP] and [RDDPSEC].

Since the iSER-assisted iSCSI protocol is still functionally iSCSI from a security considerations perspective, all of the iSCSI security requirements described in [iSCSI] apply. If iSER is layered on top of a non-IP based RCaP layer, all the security protocol mechanisms applicable to that RCaP layer are also applicable to an iSCSI/iSER connection. If iSER is layered on top of a non-IP protocol, the IPsec mechanism as specified in [iSCSI] MUST be implemented at any point where the iSER protocol enters the IP network (e.g., via gateways), and the non-IP protocol SHOULD implement (optional to use) a packet-by-packet security protocol equal in strength to the IPsec mechanism specified by [iSCSI].

To minimize the potential for a denial of service attack, the iSCSI Layer MUST NOT request the iSER Layer to allocate the connection resources necessary to support RCaP until the iSCSI Layer is sufficiently far along in the iSCSI Login Phase that it is reasonably certain that the peer side is not an attacker, as described in sections 5.1.1 and 5.1.2.

A valid STag exposes I/O Buffer resources to the network for access via the RCaP. The security considerations referred to in the above paragraphs provide means of controlling that access in order to prevent undesired disclosure or modification of data in the I/O Buffer. These considerations are of heightened importance for implementations that do not invalidate the STag after completion of the associated task (iSCSI I/O operation), because the period of exposure is correspondingly longer. For this reason, STag invalidation after completion of the associated task is RECOMMENDED in Section 2.4.1.

12 IANA Considerations

IANA is requested to add the following entries to the "iSCSI Login/Text Keys" registry of "iSCSI Parameters":

   MaxAHSLength, [RFCXXXX]

   TaggedBufferForSolicitedDataOnly, [RFCXXXX]

   iSERHelloRequired, [RFCXXXX]

RFC Editor: Please replace XXXX in all instances of [RFCXXXX] above with the RFC number of this document and remove this note.

IANA is requested to update the registrations of the other 4 iSER keys in that registry to reference the RFC number of this draft when it is published as an RFC.

13 References

13.1 Normative References

[RFC5046] M. Ko et al., "iSCSI Extensions for Remote Direct Memory Access", RFC 5046, October 2007.

[iSCSI] M. Chadalapaka et al., "iSCSI Protocol (Consolidated)", draft-ietf-storm-iscsi-cons-04.txt (work in progress), October 2011.

[RDMAP] R. Recio et al., "An RDMA Protocol Specification", RFC 5040, October 2007.

[DDP] H. Shah et al., "Direct Data Placement over Reliable Transports", RFC 5041, October 2007.

[MPA] P. Culley et al., "Marker PDU Aligned Framing for TCP Specification", RFC 5044, October 2007.

[RDDPSEC] J. Pinkerton et al., "DDP/RDMAP Security", RFC 5042, October 2007.

[TCP] J. Postel, "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[RFC2119] S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

13.2 Informative References

[SAM5] T10/2104D rev r04, SCSI Architecture Model - 5 (SAM-5), Committee Draft.

[iSCSI-SAM] F. Knight et al., "Internet Small Computer Systems Interface (iSCSI) SCSI Architecture Features Update", draft-ietf-storm-iscsi-sam-04.txt (work in progress), August 2011.

[DA] M. Chadalapaka et al., "Datamover Architecture for iSCSI", RFC 5047, October 2007.

[IB] InfiniBand Architecture Specification Volume 1 Release 1.2, October 2004.

[IPoIB] H.K. Chu et al., "Transmission of IP over InfiniBand", RFC 4391, March 2006.

14 Appendix A: Summary of Changes from RFC 5046

All changes are backward compatible with RFC 5046 except for item #8, which reflects all known implementations of iSER, each of which has implemented this change, despite its absence in RFC 5046. As a result, a hypothetical implementation based on RFC 5046 will not interoperate with an implementation based on this version of the specification.

1. Removed the requirement that a connection be opened in "normal" TCP mode and transitioned to zero-copy mode. This allows the spec to conform to existing implementations for both InfiniBand and iWARP. Changes were made in sections 2, 3.1.6, 4.2, 5.1, 5.1.1, 5.1.2, 5.1.3, 10.1.3.4, and 11.

2. Added a clause in section 6.2 to clarify that MaxRecvDataSegmentLength must be ignored if it is declared in the Login Phase.

3. Added a clause in section 6.2 to clarify that the initiator must not send more than InitiatorMaxRecvDataSegmentLength worth of data when a NOP-Out request is sent with a valid Initiator Task Tag. Since InitiatorMaxRecvDataSegmentLength can be smaller than TargetMaxRecvDataSegmentLength, returning the data from the NOP-Out request in the NOP-In response can overflow the initiator's receive buffer unless the length of the data sent with the NOP-Out request does not exceed InitiatorMaxRecvDataSegmentLength.

4. Added a recommendation (SHOULD) to negotiate MaxOutstandingUnexpectedPDUs in section 6.7.

5. Added MaxAHSLength key in section 6.8 to set a limit on the AHS Length. This is useful when posting receive buffers, because it bounds the maximum possible message length of a PDU that contains AHS.

6. Added TaggedBufferForSolicitedDataOnly key in section 6.9 to indicate how the memory region will be used. An initiator can treat the memory regions intended for unsolicited and solicited data differently, and can use different registration modes. In contrast, RFC 5046 treats the memory occupied by the data as a contiguous (or virtually contiguous, by means of scatter-gather mechanisms) and homogeneous region. Adding a new key will allow different memory models to be accommodated. Changes were also made in section 7.3.1.

7. Added iSERHelloRequired key in section 6.10 to make the use of iSER Hello messages optional. iSER Hello messages are required for certain RCaP implementations such as iWARP but can cause problems for others such as InfiniBand. The default is "No", since iSER Hello messages have not been implemented and are not in use.
   Changes were made in sections 5.1.1, 5.1.2, 5.1.3, 8.2, 9.3, 9.4, 10.1.3.2, and 10.1.3.4.

8. Added two 64-bit fields in the iSER header in section 9.2 for the Read Base Offset and the Write Base Offset to accommodate a non-zero Base Offset. This allows one implementation, such as the OFED stack, to be used in both the InfiniBand and the iWARP environments. Changes were made in the definitions of Base Offset, Advertisement, and Tagged Buffer. Changes were also made in sections 2.4.1, 2.5, 2.6, 7.3.1, 7.3.3, 7.3.5, 7.3.6, 9.1, 9.3, 9.4, 9.5.1, and 9.5.2. This change is not backward compatible with RFC 5046, but is part of all known implementations of iSER at the time this document was developed.

9. Removed iWARP-specific behavior. Changes were made in the definition section on RDMA Operation and Send Message Type. Clarifications were added in section 2.4.2 on the use of SendSE and SendInvSE. These clarifications reflect a removal of the requirements in RFC 5046 for the use of these messages, as implementations have not followed RFC 5046 in this area. Changes affecting Send with Invalidate were made in sections 2.4.1, 2.5, 2.6, 4.1, and 7.3.2. Changes affecting Terminate were made in sections 10.1.2.1 and 10.1.2.2. Changes were made in section 15 to remove iWARP headers.

10. Removed denial of service descriptions for the initiator in section 5.1.1, since they are applicable to the target only.

11. Clarified in section 2.4.1 that STag invalidation is the initiator's responsibility for security reasons, and that the initiator cannot rely on the target using an Invalidate version of Send. Added text in section 11 on STag invalidation.

15 Appendix B: Message Format for iSER

This section is for information only and is NOT part of the standard.
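
The iSER header layouts depicted in Figures 8 through 10 of this appendix can also be produced programmatically. The following Python sketch, informational like the rest of this appendix, packs the 28-byte iSER header that precedes an iSCSI Control-Type PDU such as a SCSI Command or SCSI Response. It assumes the bit numbering used in the figures (bit 0 is the most significant bit of each 32-bit word, sent in network byte order); the function and constant names are hypothetical.

```python
# Informational sketch: packs the 28-byte iSER header that precedes an
# iSCSI Control-Type PDU, using the field order shown in Figures 8-10.
# Names are hypothetical; bit 0 is the most significant bit of each word.
import struct
from typing import Optional

ISER_CTRL_OPCODE = 0x1  # 0001b, as shown in Figures 8-10

# Network byte order, no padding: control word (32 bits), Write STag (32),
# Write Base Offset (64), Read STag (32), Read Base Offset (64).
_ISER_HEADER = struct.Struct(">IIQIQ")


def pack_iser_ctrl_header(write_stag: Optional[int] = None,
                          write_base_offset: int = 0,
                          read_stag: Optional[int] = None,
                          read_base_offset: int = 0) -> bytes:
    """Build the header; an STag of None leaves its Valid flag at 0."""
    w = write_stag is not None
    r = read_stag is not None
    # Bits 0-3: opcode; bit 4: W (Write STag Valid);
    # bit 5: R (Read STag Valid); bits 6-31: reserved, all zeros.
    control = (ISER_CTRL_OPCODE << 28) | (int(w) << 27) | (int(r) << 26)
    # Fields belonging to an invalid STag are sent as all zeros.
    return _ISER_HEADER.pack(
        control,
        write_stag if w else 0,
        write_base_offset if w else 0,
        read_stag if r else 0,
        read_base_offset if r else 0,
    )


# SCSI Read Command: only the Read STag is valid (compare Figure 8).
header = pack_iser_ctrl_header(read_stag=0x12345678, read_base_offset=0x1000)
print(len(header), header[:4].hex())  # 28 14000000
```

Leaving both STags unset yields the all-zeros layout of Figure 10, and supplying only a Write STag sets the W flag as in Figure 9.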

15.1 iWARP Message Format for iSER Hello Message

The following figure depicts an iSER Hello Message encapsulated in an iWARP SendSE Message.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Send) Queue Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Send) Message Sequence Number                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     (Send) Message Offset                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0010b | Zeros | 0001b | 0001b |           iSER-IRD            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 6 SendSE Message containing an iSER Hello Message

15.2 iWARP Message Format for iSER HelloReply Message

The following figure depicts an iSER HelloReply Message encapsulated in an iWARP SendSE Message. The Reject (REJ) flag is set to 0.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          MPA Header           |  DDP Control  | RDMA Control  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                      (Send) Queue Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                (Send) Message Sequence Number                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     (Send) Message Offset                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0011b |Zeros|0| 0001b | 0001b |           iSER-ORD            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                            MPA CRC                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 7 SendSE Message containing an iSER HelloReply Message

15.3 iSER Header Format for SCSI Read Command PDU

The following figure depicts a SCSI Read Command PDU embedded in an iSER Message. For this particular example, in the iSER header, the Write STag Valid flag is set to zero, the Read STag Valid flag is set to one, the Write STag field is set to all zeros, the Write Base Offset field is set to all zeros, the Read STag field contains a valid Read STag, and the Read Base Offset field contains a valid Base Offset for the Read Tagged Buffer.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0001b |0|1|                     All Zeros                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Read STag                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                       Read Base Offset                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     SCSI Read Command PDU                     |
   //                                                             //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 8 iSER Header Format for SCSI Read Command PDU

15.4 iSER Header Format for SCSI Write Command PDU

The following figure depicts a SCSI Write Command PDU embedded in an iSER Message. For this particular example, in the iSER header, the Write STag Valid flag is set to one, the Read STag Valid flag is set to zero, the Write STag field contains a valid Write STag, the Write Base Offset field contains a valid Base Offset for the Write Tagged Buffer, the Read STag field is set to all zeros since it is not used, and the Read Base Offset field is set to all zeros.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0001b |1|0|                     All Zeros                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Write STag                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                       Write Base Offset                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    SCSI Write Command PDU                     |
   //                                                             //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 9 iSER Header Format for SCSI Write Command PDU

15.5 iSER Header Format for SCSI Response PDU

The following figure depicts a SCSI Response PDU embedded in an iSER Message:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | 0001b |0|0|                     All Zeros                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                           All Zeros                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       SCSI Response PDU                       |
   //                                                             //
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 10 iSER Header Format for SCSI Response PDU

16 Appendix C: Architectural discussion of iSER over InfiniBand

This section explains how an InfiniBand network (with Gateways) would be structured.
It is informational only and is intended to provide insight into how iSER is used in an InfiniBand environment.

16.1 Host side of iSCSI & iSER connections in InfiniBand

Figure 11 illustrates the topologies in which iSCSI and iSER will be able to operate on an InfiniBand network.

   +---------+ +---------+ +---------+ +---------+ +---------+
   |  Host   | |  Host   | |  Host   | |  Host   | |  Host   |
   |         | |         | |         | |         | |         |
   +---+-+---+ +---+-+---+ +---+-+---+ +---+-+---+ +---+-+---+
   |HCA| |HCA| |HCA| |HCA| |HCA| |HCA| |HCA| |HCA| |HCA| |HCA|
   +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+
     |-----+-----|-----+-----|-----+-----|-----+-----|-----+---> To IB
   IB|         IB|         IB|         IB|         IB|           SubNet2 SWTCH
   +-v-----------v-----------v-----------v-----------v---------+
   |               InfiniBand Switch for Subnet1               |
   +---+-----+--------+-----+--------+-----+------------v------+
       | TCA |        | TCA |        | TCA |            |
       +-----+        +-----+        +-----+            | IB
     /   IB    \    /   IB    \    /         \    +--+--v--+--+
     |  iSER   |    |  iSER   |    |  IPoIB  |    |  | TCA |  |
     | Gateway |    | Gateway |    | Gateway |    |  +-----+  |
     |   to    |    |   to    |    |   to    |    |  Storage  |
     |  iSCSI  |    |  iSER   |    |   IP    |    | Controller|
     |   TCP   |    |  iWARP  |    |Ethernet |    +-----+-----+
     +----v----+    +----v----+    +----v----+
          | EN             | EN             | EN
          +--------------+--------------+----> to IP based storage
           Ethernet links that carry iSCSI or iWARP

                       Figure 11 iSCSI and iSER on IB

In Figure 11, the Host systems are connected via the InfiniBand Host Channel Adapters (HCAs) to the InfiniBand links. With the use of IB switch(es), the InfiniBand links connect the HCAs to InfiniBand Target Channel Adapters (TCAs) located in gateways or Storage Controllers. An iSER-capable IB-IP Gateway converts the iSER Messages encapsulated in IB protocols either to standard iSCSI or to iSER Messages for iWARP.
An [IPoIB] Gateway converts the InfiniBand [IPoIB] protocol to the IP protocol and, in the iSCSI case, permits iSCSI to be operated on an IB network between the Hosts and the [IPoIB] Gateway.

16.2 Storage side of iSCSI & iSER mixed network environment

Figure 12 shows a storage controller that has three different portal groups: one supporting only iSCSI (TPG-4), one supporting iSER/iWARP or iSCSI (TPG-2), and one supporting iSER/IB (TPG-1).

         |                |                |
         |                |                |
   +--+--v--+----------+--v--+----------+--v--+--+
   |  | IB  |          |iWARP|          | EN  |  |
   |  |     |          | TCP |          | NIC |  |
   |  |(TCA)|          | RNIC|          |     |  |
   |  +-----+          +-----+          +-----+  |
   |   TPG-1            TPG-2            TPG-4   |
   |  9.1.3.3          9.1.2.4          9.1.2.6  |
   |                                             |
   |             Storage Controller              |
   |                                             |
   +---------------------------------------------+

      Figure 12 Storage Controller with TCP, iWARP, and IB Connections

The normal iSCSI portal group advertising processes (via SLP, iSNS, or SendTargets) are available to a Storage Controller.

16.3 Discovery processes for an InfiniBand Host

An InfiniBand Host system can gather portal group IP addresses from SLP, iSNS, or the SendTargets discovery processes by using TCP/IP via [IPoIB]. After obtaining one or more remote portal IP addresses, the Initiator uses the standard IP mechanisms to resolve the IP address to a local outgoing interface and the destination hardware address (Ethernet MAC or IB GID of the target or a gateway leading to the target). If the resolved interface is an [IPoIB] network interface, then the target portal can be reached through an InfiniBand fabric. In this case, the Initiator can establish an iSCSI/TCP or iSCSI/iSER session with the Target over that InfiniBand interface, using the hardware address (InfiniBand GID) obtained through the standard Address Resolution Protocol (ARP) processes.
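
The interface resolution described above, together with the same-subnet preference recommended in the next paragraph, can be sketched with Python's standard ipaddress module. This is a simplification under stated assumptions: it considers only directly attached (on-link) subnets and ignores the routing table, static routes, and ARP. The LocalInterface record, its is_ipoib flag, and the function names are hypothetical, and the example addresses are taken from the IPv4 documentation ranges.

```python
# Simplified, hypothetical sketch: matches targets against directly attached
# subnets only. A real initiator would use the routing table and ARP.
import ipaddress
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class LocalInterface:
    name: str
    network: ipaddress.IPv4Network  # subnet this interface is attached to
    is_ipoib: bool                  # True for an [IPoIB] network interface


def on_link_interface(target: str,
                      interfaces: List[LocalInterface]) -> Optional[LocalInterface]:
    """Return the local interface whose subnet contains the target, if any."""
    addr = ipaddress.ip_address(target)
    for itf in interfaces:
        if addr in itf.network:
            return itf
    return None


def choose_portal(portals: List[str], interfaces: List[LocalInterface]
                  ) -> Tuple[str, Optional[LocalInterface]]:
    """Prefer a discovered portal on the same subnet as a local interface.

    If no portal is on-link, fall back to the first one, which would then
    be reached through a router or gateway.
    """
    if not portals:
        raise ValueError("no target portals discovered")
    for portal in portals:
        itf = on_link_interface(portal, interfaces)
        if itf is not None:
            return portal, itf
    return portals[0], None


interfaces = [
    LocalInterface("ib0", ipaddress.IPv4Network("192.0.2.0/24"), True),
    LocalInterface("eth0", ipaddress.IPv4Network("198.51.100.0/24"), False),
]
portal, itf = choose_portal(["203.0.113.7", "192.0.2.20"], interfaces)
print(portal, itf.name, itf.is_ipoib)  # 192.0.2.20 ib0 True
```

Here the portal 192.0.2.20 is chosen over 203.0.113.7, which is not on any local subnet, and because it resolves to the IPoIB interface, the session could be set up over InfiniBand as either iSCSI/TCP or iSCSI/iSER.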

If more than one IP address is obtained through the discovery process, the Initiator should select a Target IP address that is on the same IP subnet as the Initiator, if one exists. This avoids the potential overhead of going through a gateway when a direct path exists.

In addition, a user can configure manual static IP route entries if a particular path to the target is preferred.

16.4 IBTA Connection specifications

It is outside the scope of this document, but it is expected that the InfiniBand Trade Association (IBTA) has defined or will define:

* The iSER ServiceID.

* A means for permitting a Host to establish a connection with a peer InfiniBand end-node, and for that peer to indicate whether it supports iSER, so that the Host would be able to fall back to iSCSI/TCP over [IPoIB].

* A means for permitting the Host to prefer IB iSER connections to storage controllers or to IB iSER-connected Gateways over connections to [IPoIB]-connected Gateways/Bridges or to Target Storage Controllers that also accept iSCSI via [IPoIB].

* A means for combining the IB ServiceID for iSER and the IP port number such that the IB Host can use normal IB connection processes, yet ensure that the iSER target peer can actually connect to the required IP port number.

17 Acknowledgments

The authors acknowledge the following individuals for identifying implementation issues and/or suggesting resolutions to the issues clarified in this document: Alexander Nezhinsky, Robert Russell, Arne Redlich, David Black, Mallikarjun Chadalapaka, Tom Talpey, Felix Marti, Robert Sharp, Caitlin Bestler, and Hemal Shah. Credit also goes to the authors of the original iSER Specification [RFC5046], including Michael Ko, Mallikarjun Chadalapaka, John Hufferd, Uri Elzur, Hemal Shah, and Patricia Thaler.
This document 3940 benefited from all of their contributions. 3942 Author's Address 3944 Michael Ko 3945 Email: mkosjc@gmail.com 3947 Alexander Nezhinsky 3948 Mellanox Technologies 3949 13 Zarchin St. 3950 Raanana 43662, Israel 3951 Phone: +972-74-712-9000 3952 Email: alexandern@mellanox.com, nezhinsky@gmail.com 3954 Copyright Notice 3956 Copyright (c) 2012 IETF Trust and the persons identified as the 3957 document authors. All rights reserved. 3959 This document is subject to BCP 78 and the IETF Trust's Legal 3960 Provisions Relating to IETF Documents 3961 (http://trustee.ietf.org/license-info) in effect on the date of 3962 publication of this document. Please review these documents 3963 carefully, as they describe your rights and restrictions with 3964 respect to this document. Code Components extracted from this 3965 document must include Simplified BSD License text as described in 3966 Section 4.e of the Trust Legal Provisions and are provided without 3967 warranty as described in the Simplified BSD License.