Storage Maintenance (StorM) Working Group                     Michael Ko
Internet Draft                                                Consultant
Intended status: Proposed Standard                   Alexander Nezhinsky
Expires: December 2013                                          Mellanox
Obsoletes: 5046                                             June 6, 2013

                iSCSI Extensions for RDMA Specification
                      draft-ietf-storm-iser-14.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December, 2013.

Abstract

   iSCSI Extensions for Remote Direct Memory Access (RDMA) provides the
   RDMA data transfer capability to iSCSI by layering iSCSI on top of
   an RDMA-Capable Protocol. An RDMA-Capable Protocol provides RDMA
   Read and Write services, which enable data to be transferred
   directly into SCSI I/O Buffers without intermediate data copies.
   This document describes the extensions to the iSCSI protocol to
   support RDMA services as provided by an RDMA-Capable Protocol.

   This document obsoletes RFC 5046.

Table of Contents

   1 Definitions and Acronyms
      1.1 Definitions
      1.2 Acronyms
      1.3 Conventions
   2 Introduction
      2.1 Motivation
      2.2 iSCSI/iSER Layering
      2.3 Architectural Goals
      2.4 Protocol Overview
      2.5 RDMA services and iSER
         2.5.1 STag
         2.5.2 Send
         2.5.3 RDMA Write
         2.5.4 RDMA Read
      2.6 SCSI Read Overview
      2.7 SCSI Write Overview
   3 Upper Layer Interface Requirements
      3.1 Operational Primitives offered by iSER
         3.1.1 Send_Control
         3.1.2 Put_Data
         3.1.3 Get_Data
         3.1.4 Allocate_Connection_Resources
         3.1.5 Deallocate_Connection_Resources
         3.1.6 Enable_Datamover
         3.1.7 Connection_Terminate
         3.1.8 Notice_Key_Values
         3.1.9 Deallocate_Task_Resources
      3.2 Operational Primitives used by iSER
         3.2.1 Control_Notify
         3.2.2 Data_Completion_Notify
         3.2.3 Data_ACK_Notify
         3.2.4 Connection_Terminate_Notify
      3.3 iSCSI Protocol Usage Requirements
   4 Lower Layer Interface Requirements
      4.1 Interactions with the RCaP Layer
      4.2 Interactions with the Transport Layer
   5 Connection Setup and Termination
      5.1 iSCSI/iSER Connection Setup
         5.1.1 Initiator Behavior
         5.1.2 Target Behavior
         5.1.3 iSER Hello Exchange
      5.2 iSCSI/iSER Connection Termination
         5.2.1 Normal Connection Termination at the Initiator
         5.2.2 Normal Connection Termination at the Target
         5.2.3 Termination without Logout Request/Response PDUs
   6 Login/Text Operational Keys
      6.1 HeaderDigest and DataDigest
      6.2 MaxRecvDataSegmentLength
      6.3 RDMAExtensions
      6.4 TargetRecvDataSegmentLength
      6.5 InitiatorRecvDataSegmentLength
      6.6 OFMarker and IFMarker
      6.7 MaxOutstandingUnexpectedPDUs
      6.8 MaxAHSLength
      6.9 TaggedBufferForSolicitedDataOnly
      6.10 iSERHelloRequired
   7 iSCSI PDU Considerations
      7.1 iSCSI Data-Type PDU
      7.2 iSCSI Control-Type PDU
      7.3 iSCSI PDUs
         7.3.1 SCSI Command
         7.3.2 SCSI Response
         7.3.3 Task Management Function Request/Response
         7.3.4 SCSI Data-out
         7.3.5 SCSI Data-in
         7.3.6 Ready To Transfer (R2T)
         7.3.7 Asynchronous Message
         7.3.8 Text Request & Text Response
         7.3.9 Login Request & Login Response
         7.3.10 Logout Request & Logout Response
         7.3.11 SNACK Request
         7.3.12 Reject
         7.3.13 NOP-Out & NOP-In
   8 Flow Control and STag Management
      8.1 Flow Control for RDMA Send Messages
         8.1.1 Flow Control for Control-Type PDUs from the Initiator
         8.1.2 Flow Control for Control-Type PDUs from the Target
      8.2 Flow Control for RDMA Read Resources
      8.3 STag Management
         8.3.1 Allocation of STags
         8.3.2 Invalidation of STags
   9 iSER Control and Data Transfer
      9.1 iSER Header Format
      9.2 iSER Header Format for iSCSI Control-Type PDU
      9.3 iSER Header Format for iSER Hello Message
      9.4 iSER Header Format for iSER HelloReply Message
      9.5 SCSI Data Transfer Operations
         9.5.1 SCSI Write Operation
         9.5.2 SCSI Read Operation
         9.5.3 Bidirectional Operation
   10 iSER Error Handling and Recovery
      10.1 Error Handling
         10.1.1 Errors in the Transport Layer
         10.1.2 Errors in the RCaP Layer
         10.1.3 Errors in the iSER Layer
         10.1.4 Errors in the iSCSI Layer
      10.2 Error Recovery
         10.2.1 PDU Recovery
         10.2.2 Connection Recovery
   11 Security Considerations
   12 IANA Considerations
   13 References
      13.1 Normative References
      13.2 Informative References
   14 Appendix A: Summary of Changes from RFC 5046
   15 Appendix B: Message Format for iSER
      15.1 iWARP Message Format for iSER Hello Message
      15.2 iWARP Message Format for iSER HelloReply Message
      15.3 iSER Header Format for SCSI Read Command PDU
      15.4 iSER Header Format for SCSI Write Command PDU
      15.5 iSER Header Format for SCSI Response PDU
   16 Appendix C: Architectural discussion of iSER over InfiniBand
      16.1 Host side of iSCSI & iSER connections in Infiniband
      16.2 Storage side of iSCSI & iSER mixed network environment
      16.3 Discovery processes for an InfiniBand Host
      16.4 IBTA Connection specifications
   17 Acknowledgments

Table of Figures

   Figure 1 Example of iSCSI/iSER Layering in Full Feature Phase
   Figure 2 iSER Header Format
   Figure 3 iSER Header Format for iSCSI Control-Type PDU
   Figure 4 iSER Header Format for iSER Hello Message
   Figure 5 iSER Header Format for iSER HelloReply Message
   Figure 6 SendSE Message containing an iSER Hello Message
   Figure 7 SendSE Message containing an iSER HelloReply Message
   Figure 8 iSER Header Format for SCSI Read Command PDU
   Figure 9 iSER Header Format for SCSI Write Command PDU
   Figure 10 iSER Header Format for SCSI Response PDU
   Figure 11 iSCSI and iSER on IB
   Figure 12 Storage Controller with TCP, iWARP, and IB Connections

1 Definitions and Acronyms

1.1 Definitions

   Advertisement (Advertised, Advertise, Advertisements, Advertises) -
      The act of informing a remote iSER (iSCSI Extensions for RDMA)
      Layer that a local node's buffer is available to it. A Node
      makes a buffer available for incoming RDMA Read Request Message
      or incoming RDMA Write Message access by informing the remote
      iSER Layer of the Tagged Buffer identifiers (STag, Base Offset,
      and buffer length).
      Note that this Advertisement of Tagged
      Buffer information is the responsibility of the iSER Layer on
      either end and is not defined by the RDMA-Capable Protocol. A
      typical method would be for the iSER Layer to embed the Tagged
      Buffer's STag, Base Offset, and buffer length in a message
      destined for the remote iSER Layer.

   Base Offset - A value that, when added to the Buffer Offset, forms
      the Tagged Offset.

   Completion (Completed, Complete, Completes) - The process by which
      the RDMA-Capable Protocol layer informs the iSER Layer that a
      particular RDMA Operation has performed all functions specified
      for the RDMA Operation.

   Connection - A connection is a logical bidirectional communication
      channel between the initiator and the target, e.g., a TCP
      connection. Communication between the initiator and the target
      occurs over one or more connections. The connections carry
      control messages, SCSI commands, parameters, and data within
      iSCSI Protocol Data Units (iSCSI PDUs).

   Connection Handle - An information element that identifies the
      particular iSCSI connection and is unique for a given iSCSI
      Layer and the underlying iSER Layer. Every invocation of an
      Operational Primitive is qualified with the Connection Handle.

   Data Sink - The peer receiving a data payload. Note that the Data
      Sink can be required to both send and receive RCaP (RDMA-Capable
      Protocol) Messages to transfer a data payload.

   Data Source - The peer sending a data payload. Note that the Data
      Source can be required to both send and receive RCaP Messages to
      transfer a data payload.

   Datamover Interface (DI) - The interface between the iSCSI Layer and
      the Datamover Layer as described in [DA].

   Datamover Layer - A layer that is directly below the iSCSI Layer and
      above the underlying transport layers. This layer exposes and
      uses a set of transport independent Operational Primitives for
      the communication between the iSCSI Layer and itself. The
      Datamover layer, operating in conjunction with the transport
      layers, moves the control and data information on the iSCSI
      connection. In this specification, the iSER Layer is the
      Datamover layer.

   Datamover Protocol - A Datamover protocol is the wire-protocol that
      is defined to realize the Datamover layer functionality. In
      this specification, the iSER protocol is the Datamover protocol.

   Inbound RDMA Read Queue Depth (IRD) - The maximum number of incoming
      outstanding RDMA Read Requests that the RDMA-Capable Controller
      can handle on a particular RCaP Stream at the Data Source. For
      some RDMA-Capable Protocol layers, the term "IRD" may be known
      by a different name. For example, for InfiniBand, the
      equivalent for IRD is the Responder Resources.

   I/O Buffer - A buffer that is used in a SCSI Read or Write operation
      so SCSI data may be sent from or received into that buffer.

   iSCSI - The iSCSI protocol as defined in [iSCSI] is a mapping of the
      SCSI Architecture Model of SAM-5 over TCP.

   iSCSI control-type PDU - Any iSCSI PDU that is not an iSCSI data-
      type PDU and also not a SCSI Data-out PDU carrying solicited
      data is defined as an iSCSI control-type PDU. Specifically, it
      is to be noted that SCSI Data-out PDUs for unsolicited data are
      defined as iSCSI control-type PDUs.

   iSCSI data-type PDU - An iSCSI data-type PDU is defined as an iSCSI
      PDU that causes data transfer via RDMA operations at the iSER
      layer, transparent to the remote iSCSI Layer, to take place
      between the peer iSCSI nodes on a full feature phase iSCSI
      connection. An iSCSI data-type PDU, when requested for
      transmission by the sender iSCSI Layer, results in the
      associated data transfer without the participation of the remote
      iSCSI Layer, i.e. the PDU itself is not delivered as-is to the
      remote iSCSI Layer. The following iSCSI PDUs constitute the set
      of iSCSI data-type PDUs - SCSI Data-In PDU and R2T PDU.

   iSCSI Layer - A layer in the protocol stack implementation within an
      end node that implements the iSCSI protocol and interfaces with
      the iSER Layer via the Datamover Interface.

   iSCSI PDU (iSCSI Protocol Data Unit) - The iSCSI Layer at the
      initiator and the iSCSI Layer at the target divide their
      communications into messages. The term "iSCSI protocol data
      unit" (iSCSI PDU) is used for these messages.

   iSCSI/iSER Connection - An iSER-assisted iSCSI connection. An iSCSI
      connection that is not iSER-assisted always maps onto a TCP
      connection at the transport level. But an iSER-assisted iSCSI
      connection may not have an underlying TCP connection. For some
      RCaP implementations (e.g., iWARP), an iSER-assisted iSCSI
      connection has an underlying TCP connection. For other RCaP
      implementations (e.g., InfiniBand), there is no underlying TCP
      connection. (In the specific example of InfiniBand [IB], an
      iSER-assisted iSCSI connection is directly mapped onto the
      InfiniBand RC channel.)

   iSCSI/iSER Session - An iSER-assisted iSCSI session. All
      connections of an iSCSI/iSER session are iSCSI/iSER connections.

   iSER - iSCSI Extensions for RDMA, the protocol defined in this
      document.

   iSER-assisted - A term generally used to describe the operation of
      iSCSI when the iSER functionality is also enabled below the
      iSCSI Layer for the specific iSCSI/iSER connection in question.

   iSER-IRD - This variable represents the maximum number of incoming
      outstanding RDMA Read Requests that the iSER Layer at the
      initiator declares on a particular RCaP Stream.

   iSER-ORD - This variable represents the maximum number of
      outstanding RDMA Read Requests that the iSER Layer can initiate
      on a particular RCaP Stream.
      This variable is maintained only
      by the iSER Layer at the target.

   iSER Layer - The layer that implements the iSCSI Extensions for RDMA
      (iSER) protocol.

   iWARP - A suite of wire protocols comprising [RDMAP], [DDP], and
      [MPA] when layered above [TCP]. [RDMAP] and [DDP] may be
      layered above SCTP or other transport protocols.

   Local Mapping - A task state record maintained by the iSER Layer
      that associates the Initiator Task Tag to the Local STag(s).
      The specifics of the record structure are implementation
      dependent.

   Local Peer - The implementation of the RDMA-Capable Protocol on the
      local end of the connection. Used to refer to the local entity
      when describing protocol exchanges or other interactions between
      two Nodes.

   Node - A computing device attached to one or more links of a
      network. A Node in this context does not refer to a specific
      application or protocol instantiation running on the computer.
      A Node may consist of one or more RDMA-Capable Controllers
      installed in a host computer.

   Operational Primitive - An Operational Primitive is an abstract
      functional interface procedure that requests another layer to
      perform a specific action on the requestor's behalf or notifies
      the other layer of some event. The Datamover Interface between
      an iSCSI Layer and a Datamover layer within an iSCSI end node
      uses a set of Operational Primitives to define the functional
      interface between the two layers. Note that not every
      invocation of an Operational Primitive may elicit a response
      from the requested layer. A full discussion of the Operational
      Primitive types and request-response semantics available to
      iSCSI and iSER can be found in [DA].

   Outbound RDMA Read Queue Depth (ORD) - The maximum number of
      outstanding RDMA Read Requests that the RDMA-Capable Controller
      can initiate on a particular RCaP Stream at the Data Sink. For
      some RDMA-Capable Protocol layers, the term "ORD" may be known
      by a different name. For example, for InfiniBand, the equivalent
      for ORD is the Initiator Depth.

   Phase Collapse - Refers to the optimization in iSCSI where the SCSI
      status is transferred along with the final SCSI Data-in PDU from
      a target. See section 4.2 in [iSCSI].

   RCaP Message - One or more packets of the network layer comprising a
      single RDMA operation or a part of an RDMA Read Operation of the
      RDMA-Capable Protocol. For iWARP, an RCaP Message is known as
      an RDMAP Message.

   RCaP Stream - A single bidirectional association between the peer
      RDMA-Capable Protocol layers on two Nodes over a single
      transport-level stream. For iWARP, an RCaP Stream is known as
      an RDMAP Stream, and the association is created following a
      successful Login Phase during which iSER support is negotiated.

   RDMA-Capable Protocol (RCaP) - The protocol or protocol suite that
      provides a reliable RDMA transport functionality, e.g., iWARP,
      InfiniBand, etc.

   RDMA-Capable Controller - A network I/O adapter or embedded
      controller with RDMA functionality. For example, for iWARP,
      this could be an RNIC, and for InfiniBand, this could be an HCA
      (Host Channel Adapter) or TCA (Target Channel Adapter).

   RDMA-enabled Network Interface Controller (RNIC) - A network I/O
      adapter or embedded controller with iWARP functionality.

   RDMA Operation - A sequence of RCaP Messages, including control
      Messages, to transfer data from a Data Source to a Data Sink.
      The following RDMA Operations are defined - RDMA Write
      Operation, RDMA Read Operation, and Send Operation.

   RDMA Protocol (RDMAP) - A wire protocol that supports RDMA
      Operations to transfer ULP (Upper Level Protocol) data between a
      Local Peer and the Remote Peer as described in [RDMAP].

   RDMA Read Operation - An RDMA Operation used by the Data Sink to
      transfer the contents of a Data Source buffer from the Remote
      Peer to a Data Sink buffer at the Local Peer. An RDMA Read
      operation consists of a single RDMA Read Request Message and a
      single RDMA Read Response Message.

   RDMA Read Request - An RCaP Message used by the Data Sink to request
      the Data Source to transfer the contents of a buffer. The RDMA
      Read Request Message describes both the Data Source and the Data
      Sink buffers.

   RDMA Read Response - An RCaP Message used by the Data Source to
      transfer the contents of a buffer to the Data Sink, in response
      to an RDMA Read Request. The RDMA Read Response Message only
      describes the Data Sink buffer.

   RDMA Write Operation - An RDMA Operation used by the Data Source to
      transfer the contents of a Data Source buffer from the Local
      Peer to a Data Sink buffer at the Remote Peer. The RDMA Write
      Message only describes the Data Sink buffer.

   Remote Direct Memory Access (RDMA) - A method of accessing memory on
      a remote system in which the local system specifies the remote
      location of the data to be transferred. Employing an
      RDMA-Capable Controller in the remote system allows the access
      to take place without interrupting the processing of the CPU(s)
      on the system.

   Remote Mapping - A task state record maintained by the iSER Layer
      that associates the Initiator Task Tag to the Advertised STag(s)
      and the Base Offset(s). The specifics of the record structure
      are implementation dependent.

   Remote Peer - The implementation of the RDMA-Capable Protocol on the
      opposite end of the connection. Used to refer to the remote
      entity when describing protocol exchanges or other interactions
      between two Nodes.

   SCSI Layer - This layer builds/receives SCSI CDBs (Command
      Descriptor Blocks) and sends/receives them with the remaining
      command execute [SAM5] parameters to/from the iSCSI Layer.

   Send - An RDMA Operation that transfers the content of a buffer from
      the Local Peer to an untagged buffer at the Remote Peer.

   SendInvSE Message - A Send with Solicited Event and Invalidate
      Message.

   SendSE Message - A Send with Solicited Event Message.

   Sequence Number (SN) - DataSN for a SCSI Data-in PDU and R2TSN for
      an R2T PDU. The semantics for both types of sequence numbers
      are as defined in [iSCSI].

   Session, iSCSI Session - The group of Connections that link an
      initiator SCSI port with a target SCSI port form an iSCSI
      session (equivalent to a SCSI I-T nexus). Connections can be
      added to and removed from a session even while the I-T nexus is
      intact. Across all connections within a session, an initiator
      sees one and the same target.

   Steering Tag (STag) - An identifier of a Tagged Buffer on a Node
      (Local or Remote) as defined in [RDMAP] and [DDP]. For other
      RDMA-Capable Protocols, the Steering Tag may be known by
      different names but will be herein referred to as STags. For
      example, for InfiniBand, a Remote STag is known as an R-Key, and
      a Local STag is known as an L-Key, and both will be considered
      STags.

   Tagged Buffer - A buffer that is explicitly Advertised to the iSER
      Layer at the remote node through the exchange of an STag, Base
      Offset, and length.

   Tagged Offset - The offset within a Tagged Buffer.

   Traditional iSCSI - Refers to the iSCSI protocol as defined in
      [iSCSI] (i.e. without the iSER enhancements).

   Untagged Buffer - A buffer that is not explicitly Advertised to the
      iSER Layer at the remote node.
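
   The relationship between the Advertisement, Tagged Buffer, Base
   Offset, and Tagged Offset definitions above can be illustrated with
   a minimal sketch. The class name TaggedBufferAd, its field names,
   and the bounds check are assumptions made for illustration only;
   this document does not define any such data structure or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedBufferAd:
    """Hypothetical record of one Advertised Tagged Buffer."""

    stag: int         # Steering Tag identifying the buffer
    base_offset: int  # added to a Buffer Offset to form the Tagged Offset
    length: int       # buffer length in bytes

    def tagged_offset(self, buffer_offset: int) -> int:
        """Return Base Offset + Buffer Offset, per the definitions."""
        # Assumption: reject offsets outside the Advertised length.
        if not 0 <= buffer_offset < self.length:
            raise ValueError("Buffer Offset outside the Advertised buffer")
        return self.base_offset + buffer_offset


# Illustrative values only.
ad = TaggedBufferAd(stag=0x1234, base_offset=4096, length=8192)
print(ad.tagged_offset(512))  # 4608
```

   The three fields mirror the Tagged Buffer identifiers that the
   Advertisement definition says the iSER Layer conveys to its peer.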

1.2 Acronyms

   Acronym        Definition
   --------------------------------------------------------------
   AHS            Additional Header Segment
   BHS            Basic Header Segment
   CO             Connection Only
   CRC            Cyclic Redundancy Check
   DDP            Direct Data Placement Protocol
   DI             Datamover Interface
   HCA            Host Channel Adapter
   IANA           Internet Assigned Numbers Authority
   IB             InfiniBand
   IETF           Internet Engineering Task Force
   I/O            Input - Output
   IO             Initialize Only
   IP             Internet Protocol
   IPoIB          IP over InfiniBand
   IPsec          Internet Protocol Security
   iSER           iSCSI Extensions for RDMA
   ITT            Initiator Task Tag
   LO             Leading Only
   MPA            Marker PDU Aligned Framing for TCP
   NOP            No Operation
   NSG            Next Stage (during the iSCSI Login Phase)
   PDU            Protocol Data Unit
   R2T            Ready To Transfer
   R2TSN          Ready To Transfer Sequence Number
   RDMA           Remote Direct Memory Access
   RDMAP          Remote Direct Memory Access Protocol
   RFC            Request For Comments
   RNIC           RDMA-enabled Network Interface Controller
   SAM5           SCSI Architecture Model - 5
   SCSI           Small Computer Systems Interface
   SNACK          Selective Negative Acknowledgment - also
                  Sequence Number Acknowledgement for data
   STag           Steering Tag
   SW             Session Wide
   TCA            Target Channel Adapter
   TCP            Transmission Control Protocol
   TMF            Task Management Function
   TTT            Target Transfer Tag
   ULP            Upper Level Protocol

1.3 Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2 Introduction

2.1 Motivation

   The iSCSI protocol ([iSCSI]) is a mapping of the SCSI Architecture
   Model (see [SAM5] and [iSCSI-SAM]) over the TCP protocol. SCSI
   commands are carried by iSCSI requests and SCSI responses and status
   are carried by iSCSI responses. Other iSCSI protocol exchanges and
   SCSI Data are also transported in iSCSI PDUs.

   Out-of-order TCP segments in the Traditional iSCSI model have to be
   stored and reassembled before the iSCSI protocol layer within an end
   node can place the data in the iSCSI buffers. This reassembly is
   required because not every TCP segment is likely to contain an iSCSI
   header to enable its placement, and TCP itself does not have a
   built-in mechanism for signaling ULP message boundaries to aid
   placement of out-of-order segments. This TCP reassembly at high
   network speeds is quite counter-productive for the following
   reasons: wasted memory bandwidth in data copying, need for
   reassembly memory, wasted CPU cycles in data copying, and the
   general store-and-forward latency from an application perspective.

   The generic term RDMA-Capable Protocol (RCaP) is used to refer to
   protocol stacks that provide the Remote Direct Memory Access (RDMA)
   functionality, such as iWARP and InfiniBand.

   With the availability of RDMA-Capable Controllers within a host
   system, it is appropriate for iSCSI to be able to exploit the direct
   data placement function of the RDMA-Capable Controller like other
   applications.

   iSCSI Extensions for RDMA (iSER) is designed precisely to take
   advantage of generic RDMA technologies - iSER's goal is to permit
   iSCSI to employ direct data placement and RDMA capabilities using a
   generic RDMA-Capable Controller. In summary, the iSCSI/iSER protocol
   stack is designed to enable scaling to high speeds by relying on a
   generic data placement process and RDMA technologies and products,
   which enable direct data placement of both in-order and out-of-order
   data.

   This document describes iSER as a protocol extension to iSCSI, both
   for convenience of description and also because it is true in a very
   strict protocol sense.
However, it is to be noted that iSER is in 601 reality extending the connectivity of the iSCSI protocol defined in 602 [iSCSI], and the name iSER reflects this reality. 604 When the iSCSI protocol as defined in [iSCSI] (i.e. without the iSER 605 enhancements) is intended in the rest of the document, the term 606 "Traditional iSCSI" is used to make the intention clear. 608 This document obsoletes RFC 5046. See Section 14 for the list of 609 changes from RFC 5046. 611 2.2 iSCSI/iSER Layering 613 iSCSI Extensions for RDMA (iSER) is layered between the iSCSI layer 614 and the RCaP layer. 616 +--------------------------------------------------------+ 618 | SCSI | 620 +--------------------------------------------------------+ 622 | iSCSI | 624 DI -> +--------------------------------------------------------+ 626 | iSER | 628 +-------+--------------------------+---------------------+ 630 | RDMAP | | | 632 +-------+ Infiniband | | 634 | DDP | Reliable | Other | 636 +-------+ Connected | RDMA | 638 | MPA | Transport | Capable | 640 +-------+ Service | Protocol | 642 | TCP | | | 644 +-------+--------------------------+---------------------+ 646 | IP | Infiniband Network Layer | Other Network Layer | 648 +-------+--------------------------+---------------------+ 650 Figure 1 Example of iSCSI/iSER Layering in Full Feature Phase 652 Figure 1 shows an example of the relationship between SCSI, iSCSI, 653 iSER, and the different RCaP layers. For TCP, the RCaP is iWARP. 654 For Infiniband, the RCaP is the Reliable Connected Transport 655 Service. Note that the iSCSI layer as described here supports the 656 RDMA Extensions as used in iSER. 658 2.3 Architectural Goals 660 This section summarizes the architectural goals that guided the 661 design of iSER. 663 1. Provide an RDMA data transfer model for iSCSI that enables direct 664 in order or out of order data placement of SCSI data into pre- 665 allocated SCSI buffers while maintaining in order data delivery. 667 2. 
Not require any major changes to the SCSI Architecture Model [SAM5] 668 and SCSI command set standards. 670 3. Utilize existing iSCSI infrastructure (sometimes referred to as 671 "iSCSI ecosystem") including but not limited to MIB, 672 bootstrapping, negotiation, naming & discovery, and security. 674 4. Enable a session to operate in the Traditional iSCSI data transfer 675 mode if iSER is not supported by either the initiator or the 676 target (not require iSCSI full feature phase interoperability 677 between an end node operating in Traditional iSCSI mode, and an 678 end node operating in iSER-assisted mode). 680 5. Allow initiator and target implementations to utilize generic 681 RDMA-Capable Controllers such as RNICs, or implement iSCSI and 682 iSER in software (not require iSCSI or iSER specific assists in 683 the RCaP implementation or RDMA-Capable Controller). 685 6. Implement a lightweight Datamover protocol for iSCSI with minimal 686 state maintenance. 688 2.4 Protocol Overview 690 Consistent with the architectural goals stated in section 2.3, the 691 iSER protocol does not require changes in the iSCSI ecosystem or any 692 related SCSI specifications. The iSER protocol defines the mapping of 693 iSCSI PDUs to RCaP Messages in such a way that it is entirely 694 feasible to realize iSCSI/iSER implementations that are based on 695 generic RDMA-Capable Controllers. The iSER protocol layer requires 696 minimal state maintenance to assist an iSCSI full feature phase 697 connection, besides being oblivious to the notion of an iSCSI 698 session. The crucial protocol aspects of iSER may be summarized 699 thus: 701 1. iSER-assisted mode is negotiated during the iSCSI login in the 702 leading connection for each session, and an entire iSCSI session 703 can only operate in one mode (i.e. a connection in a session 704 cannot operate in iSER-assisted mode if a different connection of 705 the same session is already in full feature phase in the 706 Traditional iSCSI mode).
708 2. Once in iSER-assisted mode, all iSCSI interactions on that 709 connection use RCaP Messages. 711 3. A Send Message is used for carrying an iSCSI control-type PDU 712 preceded by an iSER header. See section 7.2 for more details on 713 iSCSI control-type PDUs. 715 4. RDMA Write, RDMA Read Request, and RDMA Read Response Messages 716 are used for carrying control and all data information associated 717 with the iSCSI data-type PDUs (i.e., SCSI Data-In PDUs and R2T 718 PDUs). iSER does not use SCSI Data-Out PDUs for solicited data, 719 and SCSI Data-Out PDUs for unsolicited data are not treated as 720 iSCSI data-type PDUs by iSER because RDMA is not used. See 721 section 7.1 for more details on iSCSI data-type PDUs. 723 5. The target drives all data transfer (with the exception of iSCSI 724 unsolicited data) for SCSI writes and SCSI reads, by issuing RDMA 725 Read Requests and RDMA Writes respectively. 727 6. The RCaP is responsible for ensuring data integrity. (For example, 728 iWARP includes a CRC-enhanced framing layer called MPA on top of 729 TCP; and for InfiniBand, the CRCs are included in the Reliable 730 Connection mode.) For this reason, iSCSI header and data digests 731 are negotiated to "None" for iSCSI/iSER sessions. 733 7. The iSCSI error recovery hierarchy defined in [iSCSI] is fully 734 supported by iSER. (However, see section 7.3.11 on the handling 735 of SNACK Request PDUs.) 737 8. iSER requires no changes to iSCSI security and text mode 738 negotiation mechanisms. 740 Note that Traditional iSCSI implementations may have to be adapted 741 to employ iSER. It is expected that the adaptation, when required, is 742 likely to be centered around the upper layer interface requirements 743 of iSER (section 3). 745 2.5 RDMA services and iSER 747 iSER is designed to work with software and/or hardware protocol 748 stacks providing the protocol services defined in RCaP documents 749 such as [RDMAP], [IB], etc.
The following subsections describe the 750 key protocol elements of RCaP services that iSER relies on. 752 2.5.1 STag 754 An STag is the identifier of an I/O Buffer unique to an RDMA-Capable 755 Controller that the iSER Layer Advertises to the remote iSCSI/iSER 756 node in order to complete a SCSI I/O. 758 In iSER, Advertisement is the act of informing the target by the 759 initiator that an I/O Buffer is available at the initiator for RDMA 760 Read or RDMA Write access by the target. The initiator Advertises 761 the I/O Buffer by including the STag and the Base Offset in the 762 header of an iSER Message containing the SCSI Command PDU to the 763 target. The buffer length is as specified in the SCSI Command PDU. 765 The iSER Layer at the initiator Advertises the STag and the Base 766 Offset for the I/O Buffer of each SCSI I/O to the iSER Layer at the 767 target in the iSER header of a Send Message containing the SCSI 768 Command PDU, unless the I/O can be completely satisfied by 769 unsolicited data alone. The SendSE Message should be used if 770 supported by the RCaP layer (e.g., iWARP). 772 The iSER Layer at the target provides the STag for the I/O Buffer 773 that is the Data Sink of an RDMA Read Operation (section 2.5.4) to 774 the RCaP layer on the initiator node - i.e. this is completely 775 transparent to the iSER Layer at the initiator. 777 The iSER layer at the initiator SHOULD invalidate the Advertised 778 STag upon a normal completion of the associated task. The Send with 779 Invalidate Message, if supported by the RCaP layer (e.g., iWARP), 780 can be used for automatic invalidation when it is used to carry the 781 SCSI Response PDU. There are two exceptions to this automatic 782 invalidation - bidirectional commands, and abnormal completion of a 783 command. The iSER Layer at the initiator SHOULD explicitly 784 invalidate the STag in these two cases. 
That iSER layer MUST check 785 that STag invalidation has occurred whenever receipt of a Send with 786 Invalidate message is the expected means of causing an STag to be 787 invalidated, and MUST perform the STag invalidation if the STag has 788 not already been invalidated (e.g., because a Send message was used 789 instead of Send with Invalidate). 791 If the Advertised STag is not invalidated as recommended in the 792 foregoing paragraph (e.g., in order to cache the STag for future 793 reuse), the I/O Buffer remains exposed to the network for access by 794 the RCaP. Such an I/O Buffer is capable of being read or written by 795 the RCaP outside the scope of the iSCSI operation for which it was 796 originally established, which has both robustness and security 797 considerations. The robustness considerations are that the system 798 containing the iSER initiator may react poorly to an unexpected 799 modification of its memory. For the security considerations, see 800 Section 11. 802 2.5.2 Send 804 Send is the RDMA Operation that is not addressed to an Advertised 805 buffer, and uses Untagged buffers as the message is received. 807 The iSER Layer at the initiator uses the Send Operation to transmit 808 any iSCSI control-type PDU to the target. As an example, the 809 initiator uses Send Operations to transfer iSER Messages containing 810 SCSI Command PDUs to the iSER Layer at the target. 812 An iSER layer at the target uses the Send Operation to transmit any 813 iSCSI control-type PDU to the initiator. As an example, the target 814 uses Send Operations to transfer iSER Messages containing SCSI 815 Response PDUs to the iSER Layer at the initiator. 817 For interoperability, iSER implementations SHOULD accept and 818 correctly process SendSE and SendInvSE messages. However, SendSE 819 and SendInvSE messages are to be regarded as optimizations or 820 enhancements to the basic Send message, and their support may vary 821 by RCaP protocol and specific implementation. 
In general, these 822 messages SHOULD NOT be used, unless the RCaP requires support for 823 them in all implementations. If these messages are used, the 824 implementation SHOULD be capable of reverting to use of Send in 825 order to work with a receiver that does not support these messages. 826 Attempted use of these messages with a peer that does not support 827 them may result in a fatal error that closes the RCaP connection. 828 For example, these messages SHOULD NOT be used with the InfiniBand 829 RCaP because InfiniBand does not require support for them in all 830 cases. New iSER implementations SHOULD use Send (and not SendSE or 831 SendInvSE) unless there are compelling reasons for doing otherwise. 832 Similarly, iSER implementations SHOULD NOT rely on events triggered 833 by SendSE and SendInvSE, as these messages may not be used. 835 2.5.3 RDMA Write 837 RDMA Write is the RDMA Operation that is used to place data into an 838 Advertised buffer at the Data Sink. The Data Source addresses the 839 Message using an STag and a Tagged Offset that are valid on the Data 840 Sink. 842 The iSER Layer at the target uses the RDMA Write Operation to 843 transfer the contents of a local I/O Buffer to an Advertised I/O 844 Buffer at the initiator. The iSER Layer at the target uses the RDMA 845 Write to transfer all or part of the data required to complete a 846 SCSI Read command. 848 The iSER Layer at the initiator does not employ RDMA Writes. 850 2.5.4 RDMA Read 852 RDMA Read is the RDMA Operation that is used to retrieve data from 853 an Advertised buffer at the Data Source. The sender of the RDMA 854 Read Request addresses the Message using an STag and a Tagged Offset 855 that are valid on the Data Source in addition to providing a valid 856 local STag and Tagged Offset that identify the Data Sink.
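The STag and Tagged Offset addressing used by RDMA Write and RDMA Read can be illustrated with a toy in-memory model. The following sketch is not part of the protocol. The names TaggedBufferTable, advertise, invalidate, rdma_write, and rdma_read are hypothetical, and offsets are simplified to zero-based indexes into a registered buffer rather than RCaP-defined Tagged Offsets.

```python
from dataclasses import dataclass, field


@dataclass
class TaggedBufferTable:
    """Toy stand-in for the buffers one node has Advertised (hypothetical)."""
    buffers: dict = field(default_factory=dict)   # STag -> bytearray
    next_stag: int = 1

    def advertise(self, length: int) -> int:
        """Register a buffer and return the STag that names it."""
        stag = self.next_stag
        self.next_stag += 1
        self.buffers[stag] = bytearray(length)
        return stag

    def invalidate(self, stag: int) -> None:
        """Remove the STag so that later remote access is rejected."""
        self.buffers.pop(stag, None)

    def _region(self, stag: int, offset: int, length: int) -> bytearray:
        buf = self.buffers.get(stag)
        if buf is None:
            raise PermissionError("STag is not valid on this node")
        if offset < 0 or length < 0 or offset + length > len(buf):
            raise ValueError("access falls outside the Advertised buffer")
        return buf

    def rdma_write(self, stag: int, offset: int, data: bytes) -> None:
        """Place data into the Advertised buffer (this node is the Data Sink)."""
        buf = self._region(stag, offset, len(data))
        buf[offset:offset + len(data)] = data

    def rdma_read(self, stag: int, offset: int, length: int) -> bytes:
        """Return data from the Advertised buffer (this node is the Data Source)."""
        buf = self._region(stag, offset, length)
        return bytes(buf[offset:offset + length])


initiator = TaggedBufferTable()
stag = initiator.advertise(8)
initiator.rdma_write(stag, 4, b"DATA")     # as a target would for a SCSI Read
chunk = initiator.rdma_read(stag, 4, 4)    # as a target would for a SCSI Write
initiator.invalidate(stag)                 # later access with this STag fails
```

In this model both operations are always aimed at the initiator's table, which mirrors the statement that only the target issues RDMA Writes and RDMA Reads.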
858 The iSER Layer at the target uses the RDMA Read Operation to 859 transfer the contents of an Advertised I/O Buffer at the initiator 860 to a local I/O Buffer at the target. The iSER Layer at the target 861 uses the RDMA Read to fetch whole or part of the data required to 862 complete a SCSI Write Command. 864 The iSER Layer at the initiator does not employ RDMA Reads. 866 2.6 SCSI Read Overview 868 The iSER Layer at the initiator receives the SCSI Command PDU from 869 the iSCSI Layer. The iSER Layer at the initiator generates an STag 870 for the I/O Buffer of the SCSI Read and Advertises the buffer by 871 including the STag and the Base Offset as part of the iSER header 872 for the PDU. The iSER Message is transferred to the target using a 873 Send Message. The SendSE Message should be used if supported by the 874 RCaP layer (e.g., iWARP). 876 The iSER Layer at the target uses one or more RDMA Writes to 877 transfer the data required to complete the SCSI Read. 879 The iSER Layer at the target uses a Send Message to transfer the 880 SCSI Response PDU back to the iSER Layer at the initiator. The iSER 881 Layer at the initiator invalidates the STag and notifies the iSCSI 882 Layer of the availability of the SCSI Response PDU. The Send with 883 Invalidate Message, if supported by the RCaP layer (e.g., iWARP), 884 can be used for automatic invalidation of the STag. 886 2.7 SCSI Write Overview 888 The iSER Layer at the initiator receives the SCSI Command PDU from 889 the iSCSI Layer. If solicited data transfer is involved, the iSER 890 Layer at the initiator generates an STag for the I/O Buffer of the 891 SCSI Write and Advertises the buffer by including the STag and the 892 Base Offset as part of the iSER header for the PDU. The iSER 893 Message is transferred to the target using a Send Message. The 894 SendSE Message should be used if supported by the RCaP layer (e.g., 895 iWARP). 
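The SCSI Read sequence of section 2.6 can be traced with a small simulation. This is a minimal sketch under stated assumptions: the function name scsi_read_trace, the tuple layout of the trace, and the STag value are all hypothetical, and the RCaP is reduced to a byte copy into the initiator's buffer.

```python
def scsi_read_trace(data: bytes, chunk_size: int, send_with_invalidate: bool):
    """Return (trace, buffer contents, STag still valid?) for one SCSI Read."""
    trace = []
    stag, base_offset = 0x1234, 0          # arbitrary illustrative values
    stag_valid = True
    io_buffer = bytearray(len(data))       # initiator's Advertised I/O Buffer

    # Initiator: Advertise the buffer in the iSER header of the command.
    trace.append(("initiator", "Send", "SCSI Command PDU + STag/Base Offset"))

    # Target: one or more RDMA Writes carry the read data.
    for off in range(0, len(data), chunk_size):
        piece = data[off:off + chunk_size]
        start = base_offset + off
        io_buffer[start:start + len(piece)] = piece
        trace.append(("target", "RDMA Write", f"offset={off} len={len(piece)}"))

    # Target: the SCSI Response PDU travels in a Send-type Message.
    if send_with_invalidate:
        trace.append(("target", "Send with Invalidate", "SCSI Response PDU"))
        stag_valid = False                 # invalidated on receipt by the RCaP
    else:
        trace.append(("target", "Send", "SCSI Response PDU"))

    # Initiator: invalidate explicitly if the RCaP did not do it.
    if stag_valid:
        trace.append(("initiator", "local invalidate", f"STag={stag:#x}"))
        stag_valid = False

    return trace, bytes(io_buffer), stag_valid


trace, buffer, stag_valid = scsi_read_trace(b"abcdefgh", 3, False)
```

Either way the STag ends up invalid before the iSCSI Layer is notified of the SCSI Response PDU. The only difference is whether the RCaP or the iSER Layer at the initiator performs the invalidation.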
897 The iSER Layer at the initiator may optionally send one or more non- 898 immediate unsolicited data PDUs to the target using Send Messages. 900 If solicited data transfer is involved, the iSER Layer at the target 901 uses one or more RDMA Reads to transfer the data required to 902 complete the SCSI Write. 904 The iSER Layer at the target uses a Send Message to transfer the 905 SCSI Response PDU back to the iSER Layer at the initiator. The iSER 906 Layer at the initiator invalidates the STag and notifies the iSCSI 907 Layer of the availability of the SCSI Response PDU. The Send with 908 Invalidate Message, if supported by the RCaP layer (e.g., iWARP), 909 can be used for automatic invalidation of the STag. 911 3 Upper Layer Interface Requirements 913 This section discusses the upper layer interface requirements in the 914 form of an abstract model of the required interactions between the 915 iSCSI Layer and the iSER Layer. The abstract model used here is 916 derived from the architectural model described in [DA]. [DA] also 917 provides a functional overview of the interactions between the iSCSI 918 Layer and the datamover layer as intended by the Datamover 919 Architecture. 921 The interface requirements are specified by Operational Primitives. 922 An Operational Primitive is an abstract functional interface 923 procedure between the iSCSI Layer and the iSER Layer that requests 924 one layer to perform a specific action on behalf of the other layer 925 or notifies the other layer of some event. Whenever an Operational 926 Primitive is invoked, the Connection_Handle qualifier is used to 927 identify a particular iSCSI connection. For some Operational 928 Primitives, a Data_Descriptor is used to identify the iSCSI/SCSI 929 data buffer associated with the requested or completed operation. 931 The abstract model and the Operational Primitives defined in this 932 section facilitate the description of the iSER protocol.
In the 933 rest of the iSER specification, the compliance statements related to 934 the use of these Operational Primitives are only for the purpose of 935 the required interactions between the iSCSI Layer and the iSER 936 Layer. Note that the compliance statements related to the 937 Operational Primitives in the rest of this specification only 938 mandate functional equivalence on implementations, but do not put 939 any requirements on the implementation specifics of the interface 940 between the iSCSI Layer and the iSER Layer. 942 Each Operational Primitive is invoked with a set of qualifiers which 943 specify the information context for performing the specific action 944 being requested of the Operational Primitive. While the qualifiers 945 are required, the method of realizing the qualifiers (e.g., by 946 passing synchronously with invocation, or by retrieving from task 947 context, or by retrieving from shared memory, etc.) is 948 implementation dependent. 950 3.1 Operational Primitives offered by iSER 952 The iSER protocol layer MUST support the following Operational 953 Primitives to be used by the iSCSI protocol layer. 955 3.1.1 Send_Control 957 Input qualifiers: Connection_Handle, BHS and AHS (if any) of 958 the iSCSI PDU, PDU-specific qualifiers 960 Return results: Not specified 962 This is used by the iSCSI Layers at the initiator and the target to 963 request the outbound transfer of an iSCSI control-type PDU (see 964 section 7.2). Qualifiers that only apply for a particular control- 965 type PDU are known as PDU-specific qualifiers, e.g., 966 ImmediateDataSize for a SCSI Write command. For details on PDU- 967 specific qualifiers, see section 7.3. The iSCSI Layer can only 968 invoke the Send_Control Operational Primitive when the connection is 969 in iSER-assisted mode. 
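The Send_Control primitive can be pictured as a guarded call that collects its qualifiers. The sketch below is only illustrative, since the specification mandates functional equivalence rather than any particular interface. The Connection class, the outbound queue, and the keyword-argument form of the PDU-specific qualifiers are assumptions of this example. The 48-byte check reflects the fixed size of the iSCSI Basic Header Segment.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical per-connection state held by an iSER Layer."""
    handle: int
    iser_assisted: bool = False
    outbound: list = field(default_factory=list)


def send_control(conn: Connection, bhs: bytes, ahs: bytes = b"", **pdu_qualifiers):
    """Queue an iSCSI control-type PDU for outbound transfer (sketch only)."""
    if not conn.iser_assisted:
        # Send_Control may only be invoked in iSER-assisted mode.
        raise RuntimeError("Send_Control requires iSER-assisted mode")
    if len(bhs) != 48:
        raise ValueError("an iSCSI BHS is 48 bytes long")
    conn.outbound.append({"bhs": bhs, "ahs": ahs, "qualifiers": pdu_qualifiers})


conn = Connection(handle=1, iser_assisted=True)
# ImmediateDataSize is the PDU-specific qualifier named for a SCSI Write.
send_control(conn, bytes(48), ImmediateDataSize=512)
```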
971 3.1.2 Put_Data 973 Input qualifiers: Connection_Handle, content of a SCSI Data-in 974 PDU header, Data_Descriptor, Notify_Enable 976 Return results: Not specified 978 This is used by the iSCSI Layer at the target to request the 979 outbound transfer of data for a SCSI Data-in PDU from the buffer 980 identified by the Data_Descriptor qualifier. The iSCSI Layer can 981 only invoke the Put_Data Operational Primitive when the connection 982 is in iSER-assisted mode. 984 The Notify_Enable qualifier is used to indicate to the iSER Layer 985 whether or not it should generate an eventual local completion 986 notification to the iSCSI Layer. See section 3.2.2 on 987 Data_Completion_Notify for details. 989 3.1.3 Get_Data 991 Input qualifiers: Connection_Handle, content of an R2T PDU, 992 Data_Descriptor, Notify_Enable 994 Return results: Not specified 996 This is used by the iSCSI Layer at the target to request the inbound 997 transfer of solicited data requested by an R2T PDU into the buffer 998 identified by the Data_Descriptor qualifier. The iSCSI Layer can 999 only invoke the Get_Data Operational Primitive when the connection 1000 is in iSER-assisted mode. 1002 The Notify_Enable qualifier is used to indicate to the iSER Layer 1003 whether or not it should generate the eventual local completion 1004 notification to the iSCSI Layer. See section 3.2.2 on 1005 Data_Completion_Notify for details. 1007 3.1.4 Allocate_Connection_Resources 1009 Input qualifiers: Connection_Handle, Resource_Descriptor 1010 (optional) 1012 Return results: Status 1014 This is used by the iSCSI Layers at the initiator and the target to 1015 request the allocation of all connection resources necessary to 1016 support RCaP for an operational iSCSI/iSER connection. The iSCSI 1017 Layer may optionally specify the implementation-specific resource 1018 requirements for the iSCSI connection using the Resource_Descriptor 1019 qualifier. 
1021 A return result of Status=success means the invocation succeeded, 1022 and a return result of Status=failure means that the invocation 1023 failed. If the invocation is for a Connection_Handle for which an 1024 earlier invocation succeeded, the request will be ignored by the 1025 iSER Layer and the result of Status=success will be returned. Only 1026 one Allocate_Connection_Resources Operational Primitive invocation 1027 can be outstanding for a given Connection_Handle at any time. 1029 3.1.5 Deallocate_Connection_Resources 1031 Input qualifiers: Connection_Handle 1033 Return results: Not specified 1035 This is used by the iSCSI Layers at the initiator and the target to 1036 request the deallocation of all connection resources that were 1037 allocated earlier as a result of a successful invocation of the 1038 Allocate_Connection_Resources Operational Primitive. 1040 3.1.6 Enable_Datamover 1042 Input qualifiers: Connection_Handle, 1043 Transport_Connection_Descriptor, Final_Login_Response_PDU 1044 (optional) 1046 Return results: Not specified 1048 This is used by the iSCSI Layers at the initiator and the target to 1049 request that iSER-assisted mode be used for the connection. The 1050 Transport_Connection_Descriptor qualifier is used to identify the 1051 specific connection associated with the Connection_Handle. The 1052 iSCSI layer can only invoke the Enable_Datamover Operational 1053 Primitive when there was a corresponding prior resource allocation. 1055 The Final_Login_Response_PDU input qualifier is applicable only for 1056 a target, and contains the final Login Response PDU that concludes 1057 the iSCSI Login Phase. 1059 3.1.7 Connection_Terminate 1061 Input qualifiers: Connection_Handle 1063 Return results: Not specified 1065 This is used by the iSCSI Layers at the initiator and the target to 1066 request that a specified iSCSI/iSER connection be terminated and all 1067 associated connection and task resources be freed.
When this 1068 Operational Primitive invocation returns to the iSCSI layer, the 1069 iSCSI layer may assume full ownership of all iSCSI-level resources, 1070 e.g. I/O Buffers, associated with the connection. 1072 3.1.8 Notice_Key_Values 1074 Input qualifiers: Connection_Handle, number of keys, list of 1075 Key-Value pairs 1077 Return results: Not specified 1079 This is used by the iSCSI Layers at the initiator and the target to 1080 request the iSER Layer to take note of the specified Key-Value pairs 1081 which were negotiated by the iSCSI peers for the connection. 1083 3.1.9 Deallocate_Task_Resources 1085 Input qualifiers: Connection_Handle, ITT 1087 Return results: Not specified 1089 This is used by the iSCSI Layers at the initiator and the target to 1090 request the deallocation of all RCaP-specific resources allocated by 1091 the iSER Layer for the task identified by the ITT qualifier. The 1092 iSER Layer may require a certain number of RCaP-specific resources 1093 associated with the ITT for each new iSCSI task. In the normal 1094 course of execution, these task-level resources in the iSER Layer 1095 are assumed to be transparently allocated on each task initiation 1096 and deallocated on the conclusion of each task as appropriate. In 1097 exception scenarios where the task does not conclude with a SCSI 1098 Response PDU, the iSER Layer needs to be notified of the individual 1099 task terminations to aid its task-level resource management. This 1100 Operational Primitive is used for this purpose, and is not needed 1101 when a SCSI Response PDU normally concludes a task. Note that RCaP- 1102 specific task resources are deallocated by the iSER Layer when a 1103 SCSI Response PDU normally concludes a task, even if the SCSI Status 1104 was not success. 1106 3.2 Operational Primitives used by iSER 1108 The iSER layer MUST use the following Operational Primitives offered 1109 by the iSCSI protocol layer when the connection is in iSER-assisted 1110 mode. 
1112 3.2.1 Control_Notify 1114 Input qualifiers: Connection_Handle, an iSCSI control-type PDU 1116 Return results: Not specified 1118 This is used by the iSER Layers at the initiator and the target to 1119 notify the iSCSI Layer of the availability of an inbound iSCSI 1120 control-type PDU. A PDU is described as "available" to the iSCSI 1121 Layer when the iSER Layer notifies the iSCSI Layer of the reception 1122 of that inbound PDU, along with an implementation-specific 1123 indication as to where the received PDU is. 1125 3.2.2 Data_Completion_Notify 1127 Input qualifiers: Connection_Handle, ITT, SN 1129 Return results: Not specified 1131 This is used by the iSER Layer to notify the iSCSI Layer of the 1132 completion of outbound data transfer that was requested by the iSCSI 1133 Layer only if the invocation of the Put_Data Operational Primitive 1134 (see section 3.1.2) was qualified with Notify_Enable set. SN refers 1135 to the DataSN associated with the SCSI Data-In PDU. 1137 This is used by the iSER Layer to notify the iSCSI Layer of the 1138 completion of inbound data transfer that was requested by the iSCSI 1139 Layer only if the invocation of the Get_Data Operational Primitive 1140 (see section 3.1.3) was qualified with Notify_Enable set. SN refers 1141 to the R2TSN associated with the R2T PDU. 1143 3.2.3 Data_ACK_Notify 1145 Input qualifiers: Connection_Handle, ITT, DataSN 1147 Return results: Not specified 1149 This is used by the iSER Layer at the target to notify the iSCSI 1150 Layer of the arrival of the data acknowledgement (as defined in 1151 [iSCSI]) requested earlier by the iSCSI Layer for the outbound data 1152 transfer via an invocation of the Put_Data Operational Primitive 1153 where the A-bit in the SCSI Data-in PDU is set to 1. See section 1154 7.3.5.
DataSN refers to the expected DataSN of the next SCSI Data- 1155 in PDU which immediately follows the SCSI Data-in PDU with the A-bit 1156 set to which this notification corresponds, with semantics as 1157 defined in [iSCSI]. 1159 3.2.4 Connection_Terminate_Notify 1161 Input qualifiers: Connection_Handle 1163 Return results: Not specified 1165 This is used by the iSER Layers at the initiator and the target to 1166 notify the iSCSI Layer of the unsolicited termination or failure of 1167 an iSCSI/iSER connection. The iSER Layer MUST deallocate the 1168 connection and task resources associated with the terminated 1169 connection before the invocation of this Operational Primitive. 1170 Note that the Connection_Terminate_Notify Operational Primitive is 1171 not invoked when the termination of the connection was earlier 1172 requested by the local iSCSI Layer. 1174 3.3 iSCSI Protocol Usage Requirements 1176 To operate in iSER-assisted mode, the iSCSI Layers at both the 1177 initiator and the target MUST negotiate the RDMAExtensions key (see 1178 section 6.3) to "Yes" on the leading connection. If the 1179 RDMAExtensions key is not negotiated to "Yes", then iSER-assisted 1180 mode MUST NOT be used. If the RDMAExtensions key is negotiated to 1181 "Yes" but the invocation of the Allocate_Connection_Resources 1182 Operational Primitive to the iSER layer fails, the iSCSI layer MUST 1183 fail the iSCSI Login process or terminate the connection as 1184 appropriate. See section 10.1.3.1 for details. 1186 If the RDMAExtensions key is negotiated to "Yes", the iSCSI Layer 1187 MUST satisfy the following protocol usage requirements from the iSER 1188 protocol: 1190 1. The iSCSI Layer at the initiator MUST set ExpDataSN to 0 in Task 1191 Management Function Requests for Task Allegiance Reassignment 1192 for read/bidirectional commands, so as to cause the target to 1193 send all unacknowledged read data. 1195 2.
The iSCSI Layer at the target MUST always return the SCSI status 1196 in a separate SCSI Response PDU for read commands, i.e., there 1197 MUST NOT be a "phase collapse" in concluding a SCSI Read 1198 Command. 1200 3. The iSCSI Layers at both the initiator and the target MUST 1201 support the keys as defined in section 6 on Login/Text 1202 Operational Keys. If used as specified, these keys MUST NOT be 1203 answered with NotUnderstood and the semantics as defined MUST be 1204 followed for each iSER-assisted connection. 1206 4. The iSCSI Layer at the initiator MUST NOT issue SNACKs for PDUs. 1208 4 Lower Layer Interface Requirements 1210 4.1 Interactions with the RCaP Layer 1212 The iSER protocol layer is layered on top of an RCaP layer (see 1213 Figure 1) and the following are the key 1214 features that are assumed to be supported by any RCaP layer: 1216 * The RCaP layer supports all basic RDMA operations, including RDMA 1217 Write Operation, RDMA Read Operation, and Send Operation. 1219 * The RCaP layer provides reliable, in-order message delivery and 1220 direct data placement. 1222 * When the iSER Layer initiates an RDMA Read Operation following an 1223 RDMA Write Operation on one RCaP Stream, the RDMA Read Response 1224 Message processing on the remote node will be started only after 1225 the preceding RDMA Write Message payload is placed in the memory 1226 of the remote node. 1228 * The RCaP layer encapsulates a single iSER Message into a single 1229 RCaP Message on the Data Source side. The RCaP layer 1230 decapsulates the iSER Message before delivering it to the iSER 1231 Layer on the Data Sink side. 1233 * For an RCaP layer that supports the Send with Invalidate Message 1234 (e.g., iWARP), when the iSER Layer provides the STag to be 1235 remotely invalidated to the RCaP layer for a Send with Invalidate 1236 Message, the RCaP layer uses this STag as the STag to be 1237 invalidated in the Send with Invalidate Message.
1239 * The RCaP layer uses the STag and Tagged Offset provided by the 1240 iSER Layer for the RDMA Write and RDMA Read Request Messages. 1242 * When the RCaP layer delivers the content of an RDMA Send Message 1243 to the iSER Layer, the RCaP layer provides the length of the RDMA 1244 Send message. This ensures that the iSER Layer does not have to 1245 carry a length field in the iSER header. 1247 * When the RCaP layer delivers the Send Message to the iSER Layer, 1248 it notifies the iSER Layer with the mechanism provided on that 1249 interface. 1251 * For an RCaP layer that supports the Send with Invalidate Message 1252 (e.g., iWARP), when the RCaP layer delivers a Send with 1253 Invalidate Message to the iSER Layer, it passes the value of the 1254 STag that was invalidated. 1256 * The RCaP layer propagates all status and error indications to the 1257 iSER Layer. 1259 * For a transport layer that operates in byte stream mode such as 1260 TCP, the RCaP implementation supports the enabling of the RDMA 1261 mode after Connection establishment and the exchange of Login 1262 parameters in byte stream mode. For a transport layer that 1263 provides message delivery capability such as [IB], the RCaP 1264 implementation supports the use of the messaging capability by 1265 the iSCSI Layer directly for the Login phase after connection 1266 establishment before enabling iSER-assisted mode. (In the 1267 specific example of InfiniBand [IB], the iSCSI Layer uses IB 1268 messages to transfer iSCSI PDUs for the Login phase after 1269 connection establishment before enabling iSER-assisted mode.) 1271 * Whenever the iSER Layer terminates the RCaP Stream, the RCaP 1272 layer terminates the associated Connection. 1274 4.2 Interactions with the Transport Layer 1276 After the iSER connection is established, the RCaP layer and the 1277 underlying transport layer are responsible for maintaining the 1278 Connection and reporting to the iSER Layer any Connection failures.
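One consequence of the section 4.1 requirements is that a receiver can separate the iSER header from the iSCSI PDU using only the message length reported by the RCaP layer. The following sketch illustrates this under stated assumptions: split_send_message is a hypothetical helper, and header_len is passed in by the caller because the iSER header layout is defined elsewhere in the specification and is not assumed here.

```python
def split_send_message(buffer: bytes, delivered_len: int, header_len: int):
    """Split a delivered Send Message into (iSER header, iSCSI PDU bytes).

    delivered_len is the message length reported by the RCaP layer.
    header_len is supplied by the caller (an assumption of this sketch).
    """
    if not 0 <= header_len <= delivered_len <= len(buffer):
        raise ValueError("inconsistent lengths for delivered Send Message")
    message = buffer[:delivered_len]      # ignore unused receive-buffer space
    return message[:header_len], message[header_len:]


# A receive buffer is usually larger than the message placed in it.
receive_buffer = b"HHHH" + b"pdu-bytes" + bytes(16)
header, pdu = split_send_message(receive_buffer, delivered_len=13, header_len=4)
```

Because the length comes from the RCaP layer, the trailing unused bytes of the receive buffer never reach the iSCSI Layer.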
1280 5 Connection Setup and Termination 1282 5.1 iSCSI/iSER Connection Setup 1284 During connection setup, the iSCSI Layer at the initiator is 1285 responsible for establishing a connection with the target. After 1286 the connection is established, the iSCSI Layers at the initiator and 1287 the target enter the Login Phase using the same rules as outlined in 1288 [iSCSI]. The connection transitions into the iSCSI full feature 1289 phase in iSER-assisted mode following a successful login negotiation 1290 between the initiator and the target in which iSER-assisted mode is 1291 negotiated and the connection resources necessary to support RCaP 1292 have been allocated at both the initiator and the target. The same 1293 connection MUST be used for both the iSCSI Login phase and the 1294 subsequent iSER-assisted full feature phase. 1296 For a transport layer that operates in byte stream mode such as TCP, 1297 the RCaP implementation supports the enabling of the RDMA mode after 1298 Connection establishment and the exchange of Login parameters in 1299 byte stream mode. For a transport layer that provides message 1300 delivery capability such as [IB], the RCaP implementation supports 1301 the use of the messaging capability by the iSCSI Layer directly for 1302 the Login phase after connection establishment before enabling iSER- 1303 assisted mode. 1305 iSER-assisted mode MUST be enabled only if it is negotiated on the 1306 leading connection during the LoginOperationalNegotiation Stage of 1307 the iSCSI Login Phase. iSER-assisted mode is negotiated using the 1308 RDMAExtensions=<boolean-value> key. Both the initiator and the 1309 target MUST exchange the RDMAExtensions key with the value set to 1310 "Yes" to enable iSER-assisted mode. If both the initiator and the 1311 target fail to negotiate the RDMAExtensions key set to "Yes", then 1312 the connection MUST continue with the login semantics as defined in 1313 [iSCSI].
If the RDMAExtensions key is not negotiated to "Yes", then 1314 for some RCaP implementations (such as [IB]), the existing connection 1315 may need to be torn down and a new connection may need to be 1316 established in TCP capable mode. (For InfiniBand this will require 1317 an [IPoIB] type connection.) 1319 iSER-assisted mode is defined for a Normal session only and the 1320 RDMAExtensions key MUST NOT be negotiated for a Discovery session. 1321 Discovery sessions are always conducted using the transport layer as 1322 described in [iSCSI]. 1324 An iSER enabled node is not required to initiate the RDMAExtensions 1325 key exchange if its preference is for the Traditional iSCSI mode. 1326 The RDMAExtensions key, if offered, MUST be sent in the first 1327 available Login Response or Login Request PDU in the 1328 LoginOperationalNegotiation stage. This is because the 1329 value of some login parameters might depend on whether iSER-assisted 1330 mode is enabled or not. 1332 iSER-assisted mode is a session-wide attribute. If both the 1333 initiator and the target negotiated RDMAExtensions="Yes" on the 1334 leading connection of a session, then all subsequent connections of 1335 the same session MUST enable iSER-assisted mode without having to 1336 exchange the RDMAExtensions key during the iSCSI Login Phase. 1337 Conversely, if both the initiator and the target failed to negotiate 1338 RDMAExtensions to "Yes" on the leading connection of a session, then 1339 the RDMAExtensions key MUST NOT be negotiated further on any 1340 subsequent connection of the session. 1342 When the RDMAExtensions key is negotiated to "Yes", the HeaderDigest 1343 and the DataDigest keys MUST be negotiated to "None" on all 1344 iSCSI/iSER connections participating in that iSCSI session. This is 1345 because, for an iSCSI/iSER connection, RCaP is responsible for 1346 providing error detection that is at least as good as a 32-bit CRC 1347 for all iSER Messages.
Furthermore, all SCSI Read data are sent 1348 using RDMA Write Messages instead of the SCSI Data-in PDUs, and all 1349 solicited SCSI write data are sent using RDMA Read Response Messages 1350 instead of the SCSI Data-out PDUs. HeaderDigest and DataDigest 1351 which apply to iSCSI PDUs would not be appropriate for RDMA Read and 1352 RDMA Write operations used with iSER. 1354 5.1.1 Initiator Behavior 1356 If the outcome of the iSCSI negotiation is to enable iSER-assisted 1357 mode, then on the initiator side, prior to sending the Login Request 1358 with the T (Transit) bit set to 1 and the NSG (Next Stage) field set 1359 to FullFeaturePhase, the iSCSI Layer SHOULD request the iSER Layer 1360 to allocate the connection resources necessary to support RCaP by 1361 invoking the Allocate_Connection_Resources Operational Primitive. 1362 The connection resources required are defined by implementation and 1363 are outside the scope of this specification. The iSCSI Layer may 1364 invoke the Notice_Key_Values Operational Primitive before invoking 1365 the Allocate_Connection_Resources Operational Primitive to request 1366 the iSER Layer to take note of the negotiated values of the iSCSI 1367 keys for the Connection. The specific keys to be passed in as input 1368 qualifiers are implementation dependent. These may include, but not 1369 limited to, MaxOutstandingR2T, ErrorRecoveryLevel, etc. 1371 Among the connection resources allocated at the initiator is the 1372 Inbound RDMA Read Queue Depth (IRD). As described in section 9.5.1, 1373 R2Ts are transformed by the target into RDMA Read operations. IRD 1374 limits the maximum number of simultaneously incoming outstanding 1375 RDMA Read Requests per an RCaP Stream from the target to the 1376 initiator. The required value of IRD is outside the scope of the 1377 iSER specification. The iSER Layer at the initiator MUST set IRD to 1378 1 or higher if R2Ts are to be used in the connection. 
However, the 1379 iSER Layer at the initiator MAY set IRD to 0 based on implementation 1380 configuration which indicates that no R2Ts will be used on that 1381 connection. Initially, the iSER-IRD value at the initiator SHOULD 1382 be set to the IRD value at the initiator and MUST NOT be more than 1383 the IRD value. 1385 On the other hand, the Outbound RDMA Read Queue Depth (ORD) MAY be 1386 set to 0 since the iSER Layer at the initiator does not issue RDMA 1387 Read Requests to the target. 1389 Failure to allocate the requested connection resources locally 1390 results in a login failure and its handling is described in section 1391 10.1.3.1. 1393 The iSER Layer MUST return a success status to the iSCSI Layer in 1394 response to the Allocate_Connection_Resources Operational Primitive. 1396 After the target returns the Login Response with the T bit set to 1 1397 and the NSG field set to FullFeaturePhase, and a status class of 0 1398 (Success), the iSCSI Layer MUST invoke the Enable_Datamover 1399 Operational Primitive with the following qualifiers. (See section 1400 10.1.4.6 for the case when the status class is not Success.): 1402 a. Connection_Handle that identifies the iSCSI connection. 1404 b. Transport_Connection_Descriptor which identifies the 1405 specific transport connection associated with the 1406 Connection_Handle. 1408 The iSER Layer MUST send the iSER Hello Message as the first iSER 1409 Message only if iSERHelloRequired is negotiated to "Yes". See 1410 Section 5.1.3 on iSER Hello Exchange. 1412 If the iSCSI Layer on the initiator side allocates the connection 1413 resources to support RCaP only after it receives the final Login 1414 Response PDU from the target, then it may not be able to handle the 1415 number of unexpected iSCSI control-type PDUs (as declared by the 1416 MaxOutstandingUnexpectedPDUs key from the initiator) that can be 1417 sent by the target before the buffer resources are allocated at the 1418 initiator side. 
In this case the iSERHelloRequired key SHOULD be 1419 negotiated to "Yes" so that the initiator can allocate the 1420 connection resources before sending the iSER Hello Message. See 1421 section 5.1.3 for more details. 1423 5.1.2 Target Behavior 1425 If the outcome of the iSCSI negotiation is to enable iSER-assisted 1426 mode, then on the target side, prior to sending the Login Response 1427 with the T (Transit) bit set to 1 and the NSG (Next Stage) field set 1428 to FullFeaturePhase, the iSCSI Layer MUST request the iSER Layer to 1429 allocate the resources necessary to support RCaP by invoking the 1430 Allocate_Connection_Resources Operational Primitive. The connection 1431 resources required are defined by the implementation and are outside the 1432 scope of this specification. Optionally, the iSCSI Layer may invoke 1433 the Notice_Key_Values Operational Primitive before invoking the 1434 Allocate_Connection_Resources Operational Primitive to request the 1435 iSER Layer to take note of the negotiated values of the iSCSI keys 1436 for the Connection. The specific keys to be passed in as input 1437 qualifiers are implementation-dependent. These may include, but are not 1438 limited to, MaxOutstandingR2T and ErrorRecoveryLevel. 1440 Premature allocation of RCaP connection resources can expose an iSER 1441 target to a resource exhaustion attack on those resources via 1442 multiple iSER connections that progress only to the point at which 1443 the implementation allocates the RCaP connection resources. The 1444 countermeasure for this attack is initiator authentication; the 1445 iSCSI Layer MUST NOT request the iSER Layer to allocate the 1446 connection resources necessary to support RCaP until the iSCSI layer 1447 is sufficiently far along in the iSCSI Login Phase that it is 1448 reasonably certain that the peer side is not an attacker.
In 1449 particular, if the Login Phase includes a SecurityNegotiation stage, 1450 the iSCSI Layer MUST defer the connection resource allocation (i.e. 1451 invoking the Allocate_Connection_Resources Operational Primitive) to 1452 the LoginOperationalNegotiation stage ([iSCSI]) so that the resource 1453 allocation occurs after the authentication phase is completed. 1455 Among the connection resources allocated at the target is the 1456 Outbound RDMA Read Queue Depth (ORD). As described in section 1457 9.5.1, R2Ts are transformed by the target into RDMA Read operations. 1458 The ORD limits the maximum number of simultaneously outstanding RDMA 1459 Read Requests per RCaP Stream from the target to the initiator. 1460 Initially, the iSER-ORD value at the target SHOULD be set to the ORD 1461 value at the target. 1463 On the other hand, the IRD at the target MAY be set to 0 since the 1464 iSER Layer at the target does not expect RDMA Read Requests to be 1465 issued by the initiator. 1467 Failure to allocate the requested connection resources locally 1468 results in a login failure and its handling is described in section 1469 10.1.3.1. 1471 If the iSER Layer at the target is successful in allocating the 1472 connection resources necessary to support RCaP, the following events 1473 MUST occur in the specified sequence: 1475 1. The iSER Layer MUST return a success status to the iSCSI Layer 1476 in response to the Allocate_Connection_Resources Operational 1477 Primitive. 1479 2. The iSCSI Layer MUST invoke the Enable_Datamover Operational 1480 Primitive with the following qualifiers: 1482 a. Connection_Handle that identifies the iSCSI connection. 1484 b. Transport_Connection_Descriptor which identifies the 1485 specific transport connection associated with the 1486 Connection_Handle. 1488 c. The final transport layer (e.g. TCP) message containing the 1489 Login Response with the T bit set to 1 and the NSG field set 1490 to FullFeaturePhase 1492 3. 
The iSER Layer MUST send the final Login Response PDU in the 1493 native transport mode to conclude the iSCSI Login Phase. If the 1494 underlying transport is TCP, then the iSER Layer MUST send the 1495 final Login Response PDU in byte stream mode. 1497 4. After receiving the iSER Hello Message from the initiator, the 1498 iSER Layer MUST respond with the iSER HelloReply Message to be 1499 sent as the first iSER Message if iSERHelloRequired is 1500 negotiated to "Yes". If the iSER layer receives an iSER Hello 1501 Message when iSERHelloRequired is negotiated to "No", then this 1502 MUST be treated as an iSER protocol error. See section 5.1.3 on 1503 iSER Hello Exchange for more details. 1505 Note: In the above sequence, the operations as described in bullets 1506 3 and 4 MUST be performed atomically for iWARP connections. Failure 1507 to do this may result in race conditions. 1509 5.1.3 iSER Hello Exchange 1511 If iSERHelloRequired is negotiated to "Yes", the first iSER Message 1512 sent by the iSER Layer at the initiator to the target MUST be the 1513 iSER Hello Message. The iSER Hello Message is used by the iSER 1514 Layer at the initiator to declare iSER parameters to the target. 1515 See section 9.3 on iSER Header Format for iSER Hello Message. 1516 Conversely, if iSERHelloRequired is negotiated to "No", then the 1517 iSER Layer at the initiator MUST NOT send an iSER Hello Message. 1519 In response to the iSER Hello Message, the iSER Layer at the target 1520 MUST return the iSER HelloReply Message as the first iSER Message 1521 sent by the target if iSERHelloRequired is negotiated to "Yes". The 1522 iSER HelloReply Message is used by the iSER Layer at the target to 1523 declare iSER parameters to the initiator. See section 9.4 on iSER 1524 Header Format for iSER HelloReply Message. If the iSER layer 1525 receives an iSER Hello Message when iSERHelloRequired is negotiated 1526 to "No", then this MUST be treated as an iSER protocol error. 
See 1527 section 10.1.3.4 on iSER Protocol Errors for more details. 1529 In the iSER Hello Message, the iSER Layer at the initiator declares 1530 the iSER-IRD value to the target. 1532 Upon receiving the iSER Hello Message, the iSER Layer at the target 1533 MUST set the iSER-ORD value to the minimum of the iSER-ORD value at 1534 the target and the iSER-IRD value declared by the initiator. The 1535 iSER Layer at the target MAY adjust (lower) its ORD value to match 1536 the iSER-ORD value if the iSER-ORD value is smaller than the ORD 1537 value at the target in order to free up the unused resources. 1539 In the iSER HelloReply Message, the iSER Layer at the target 1540 declares the iSER-ORD value to the initiator. 1542 Upon receiving the iSER HelloReply Message, the iSER Layer at the 1543 initiator MAY adjust (lower) its IRD value to match the iSER-ORD 1544 value in order to free up the unused resources, if the iSER-ORD 1545 value declared by the target is smaller than the iSER-IRD value 1546 declared by the initiator. 1548 It is an iSER-level negotiation failure if the iSER parameters 1549 declared in the iSER Hello Message by the initiator are unacceptable 1550 to the target. This includes the following: 1552 * The initiator-declared iSER-IRD value is greater than 0 and the 1553 target-declared iSER-ORD value is 0. 1555 * The initiator-supported and the target-supported iSER protocol 1556 versions do not overlap. 1558 See section 10.1.3.2 on the handling of this error. 1560 An initiator that conforms to [RFC5046] allocates connection 1561 resources before sending the Login Request with the T (Transit) bit 1562 set to 1 and the NSG (Next Stage) field set to FullFeaturePhase. 1563 (For brevity, this is referred to as "early" connection allocation.) 1564 The current iSER specification relaxes this requirement to allow an 1565 initiator to allocate connection resources after it receives the 1566 final Login Response PDU from the target.
(For brevity, this is 1567 referred to as "late" connection allocation.) An initiator that 1568 employs "late" connection allocation may encounter problems (e.g., 1569 RCaP connection closure) with a target that sends unexpected iSCSI 1570 PDUs immediately upon transitioning to Full Feature Phase, as 1571 allowed by the negotiated value of the MaxOutstandingUnexpectedPDUs 1572 key. The only way to prevent this situation in full generality is 1573 to use iSER Hello Messages, as they enable the initiator to allocate 1574 its connection resources before sending its iSER Hello Message. The 1575 iSERHelloRequired key is used by the initiator to determine if it is 1576 dealing with a target that supports the iSER Hello exchanges. 1577 Fortunately, known iSER target implementations do not take full 1578 advantage of the number of allowed unexpected PDUs immediately upon 1579 transitioning into full feature phase, enabling an initiator 1580 workaround that involves a smaller quantity of connection resources 1581 prior to full-feature phase, as explained further below. 1583 In the following summary where "late" connection allocation is 1584 practised, an initiator that follows [RFC5046] is referred to as an 1585 "old" initiator; otherwise it is referred to as a "new" initiator. 1586 Similarly, a target that does not support the iSERHelloRequired key 1587 (and responds with "NotUnderstood" when negotiating the 1588 iSERHelloRequired key) is referred to as an "old" target; otherwise 1589 it is referred to as a "new" target. Note that an "old" target can 1590 still support the iSER Hello exchanges but this fact is not known by 1591 the initiator. A "new" target can also respond with "No" when 1592 negotiating the iSERHelloRequired key. In this case its behavior 1593 with respect to "late" connection allocation is similar to that of an "old" 1594 target. 1596 A "new" initiator will work fine with a "new" target.
1598 For an "old" initiator and an "old" target, the failure by the 1599 initiator to handle the number of unexpected iSCSI control-type PDUs 1600 that are sent by the target before the buffer resources are 1601 allocated at the initiator can result in the failure of the iSER 1602 session caused by closure of the underlying RCaP connection. For 1603 the "old" target, there is a known implementation that sends one 1604 unexpected iSCSI control-type PDU after sending the final Login 1605 Response and then waits a while before sending the next one. This 1606 tends to somewhat alleviate the buffer allocation problem at the 1607 initiator. 1609 For a "new" initiator and an "old" target, the failure by the 1610 initiator to handle the number of unexpected iSCSI control-type PDUs 1611 that are sent by the target before the buffer resources are 1612 allocated at the initiator can result in the failure of the iSER 1613 session caused by closure of the underlying RCaP connection. A 1614 "new" initiator MAY choose to terminate the connection; otherwise it 1615 SHOULD do one of the following: 1617 1. Allocate the connection resources before sending the final Login 1618 Request PDU. 1620 2. Allocate one or more buffers for receiving unexpected control- 1621 type PDUs from the target before sending the final Login Request 1622 PDU. This reduces the possibility of the unexpected control-type 1623 PDUs causing the RCaP connection to close before the connection 1624 resources have been allocated. 1626 For an "old" initiator and a "new" target, if the iSERHelloRequired 1627 key is not negotiated, a "new" target MUST still respond with the 1628 iSER HelloReply Message when it receives the iSER Hello Message. 
If 1629 the iSERHelloRequired key is negotiated to "No" or "NotUnderstood", 1630 a "new" target MAY choose to terminate the connection; otherwise it 1631 SHOULD delay sending any unexpected control-type PDUs until one of 1632 the following events has occurred: 1634 1.
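As an illustration of the "old"/"new" target terminology defined above, the following Python sketch shows how an initiator might interpret the target's answer to the iSERHelloRequired key. It is a minimal sketch and not part of the specification; the names TargetKind, classify_target, hello_exchange_expected, and treat_as_old_for_late_allocation are hypothetical, and the response is assumed to be available as one of the strings "Yes", "No", or "NotUnderstood".

```python
from enum import Enum


class TargetKind(Enum):
    """Hypothetical labels mirroring the "old"/"new" wording of this section."""
    OLD = "old"
    NEW = "new"


def classify_target(response: str) -> TargetKind:
    """Classify a target from its answer to the iSERHelloRequired key."""
    if response == "NotUnderstood":
        # The target does not support the key at all.
        return TargetKind.OLD
    if response in ("Yes", "No"):
        # The target understands the key, so it is a "new" target.
        return TargetKind.NEW
    raise ValueError(f"unexpected iSERHelloRequired response: {response!r}")


def hello_exchange_expected(response: str) -> bool:
    """The iSER Hello Exchange is performed only when the key is "Yes"."""
    return response == "Yes"


def treat_as_old_for_late_allocation(response: str) -> bool:
    """A "new" target answering "No" behaves like an "old" target
    with respect to "late" connection allocation."""
    return response in ("No", "NotUnderstood")
```

An initiator that practises "late" connection allocation could consult treat_as_old_for_late_allocation() to decide whether it needs to fall back to pre-allocating receive buffers.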
A PDU is received from the initiator after it sends the final 1635 Login Response PDU. 1637 2. A system configurable timeout period, say one second, has 1638 expired. 1640 5.2 iSCSI/iSER Connection Termination 1642 5.2.1 Normal Connection Termination at the Initiator 1644 The iSCSI Layer at the initiator terminates an iSCSI/iSER connection 1645 normally by invoking the Send_Control Operational Primitive 1646 qualified with the Logout Request PDU. The iSER Layer at the 1647 initiator MUST use a Send Message to send the Logout Request PDU to 1648 the target. The SendSE Message should be used if supported by the 1649 RCaP layer (e.g., iWARP). After the iSER Layer at the initiator 1650 receives the Send Message containing the Logout Response PDU from 1651 the target, it MUST notify the iSCSI Layer by invoking the 1652 Control_Notify Operational Primitive qualified with the Logout 1653 Response PDU. 1655 After the iSCSI logout process is complete, the iSCSI layer at the 1656 target is responsible for closing the iSCSI/iSER connection as 1657 described in Section 5.2.2. After the RCaP layer at the initiator 1658 reports that the Connection has been closed, the iSER Layer at the 1659 initiator MUST deallocate all connection and task resources (if any) 1660 associated with the connection, invalidate the Local Mappings (if 1661 any) before notifying the iSCSI Layer by invoking the 1662 Connection_Terminate_Notify Operational Primitive. 1664 5.2.2 Normal Connection Termination at the Target 1666 Upon receiving the Send Message containing the Logout Request PDU, 1667 the iSER Layer at the target MUST notify the iSCSI Layer at the 1668 target by invoking the Control_Notify Operational Primitive 1669 qualified with the Logout Request PDU. The iSCSI Layer completes 1670 the logout process by invoking the Send_Control Operational 1671 Primitive qualified with the Logout Response PDU. 
The iSER Layer at 1672 the target MUST use a Send Message to send the Logout Response PDU 1673 to the initiator. The SendSE Message should be used if supported by 1674 the RCaP layer (e.g., iWARP). After the iSCSI logout process is 1675 complete, the iSCSI Layer at the target MUST request the iSER Layer 1676 at the target to terminate the RCaP Stream by invoking the 1677 Connection_Terminate Operational Primitive. 1679 As part of the termination process, the RCaP layer MUST close the 1680 Connection. When the RCaP layer notifies the iSER Layer after the 1681 RCaP Stream and the associated Connection are terminated, the iSER 1682 Layer MUST deallocate all connection and task resources (if any) 1683 associated with the connection, and invalidate the Local and Remote 1684 Mappings (if any). 1686 5.2.3 Termination without Logout Request/Response PDUs 1688 5.2.3.1 Connection Termination Initiated by the iSCSI Layer 1690 The Connection_Terminate Operational Primitive MAY be invoked by the 1691 iSCSI Layer to request the iSER Layer to terminate the RCaP Stream 1692 without having previously exchanged the Logout Request and Logout 1693 Response PDUs between the two iSCSI/iSER nodes. As part of the 1694 termination process, the RCaP layer will close the Connection. When 1695 the RCaP layer notifies the iSER Layer after the RCaP Stream and the 1696 associated Connection are terminated, the iSER Layer MUST perform 1697 the following actions. 1699 If the Connection_Terminate Operational Primitive is invoked by the 1700 iSCSI Layer at the target, then the iSER Layer at the target MUST 1701 deallocate all connection and task resources (if any) associated 1702 with the connection, and invalidate the Local and Remote Mappings 1703 (if any). 
1705 If the Connection_Terminate Operational Primitive is invoked by the 1706 iSCSI Layer at the initiator, then the iSER Layer at the initiator 1707 MUST deallocate all connection and task resources (if any) 1708 associated with the connection, and invalidate the Local Mappings 1709 (if any). 1711 5.2.3.2 Connection Termination Notification to the iSCSI Layer 1713 If the iSCSI/iSER connection is terminated without the invocation of 1714 Connection_Terminate from the iSCSI Layer, the iSER Layer MUST 1715 notify the iSCSI Layer that the iSCSI/iSER connection has been 1716 terminated by invoking the Connection_Terminate_Notify Operational 1717 Primitive. 1719 Prior to invoking Connection_Terminate_Notify, the iSER Layer at the 1720 target MUST deallocate all connection and task resources (if any) 1721 associated with the connection, and invalidate the Local and Remote 1722 Mappings (if any). 1724 Prior to invoking Connection_Terminate_Notify, the iSER Layer at the 1725 initiator MUST deallocate all connection and task resources (if any) 1726 associated with the connection, and invalidate the Local Mappings 1727 (if any). 1729 If the remote iSCSI/iSER node initiated the closing of the 1730 Connection (e.g., by sending a TCP FIN or TCP RST), the iSER Layer 1731 MUST notify the iSCSI Layer after the RCaP layer reports that the 1732 Connection is closed by invoking the Connection_Terminate_Notify 1733 Operational Primitive. 1735 Another example of a Connection termination without a preceding 1736 logout is when the iSCSI Layer at the initiator does an implicit 1737 logout (connection reinstatement). 1739 6 Login/Text Operational Keys 1741 Certain iSCSI login/text operational keys have restricted usage in 1742 iSER, and additional keys are used to support the iSER protocol 1743 functionality. All other keys defined in [iSCSI] and not discussed 1744 in this section may be used on iSCSI/iSER connections with the same 1745 semantics. 
1747 6.1 HeaderDigest and DataDigest 1749 Irrelevant when: RDMAExtensions=Yes 1751 A negotiation resulting in RDMAExtensions=Yes for a session implies 1752 HeaderDigest=None and DataDigest=None for all connections in that 1753 session and overrides both the default and an explicit setting. 1755 6.2 MaxRecvDataSegmentLength 1757 For an iSCSI connection belonging to a session in which 1758 RDMAExtensions=Yes was negotiated on the leading connection of the 1759 session, MaxRecvDataSegmentLength need not be declared in the Login 1760 Phase, and MUST be ignored if it is declared. Instead, 1761 InitiatorRecvDataSegmentLength (as described in section 6.5) and 1762 TargetRecvDataSegmentLength (as described in section 6.4) keys are 1763 negotiated. The values of the local and remote 1764 MaxRecvDataSegmentLength are derived from the 1765 InitiatorRecvDataSegmentLength and TargetRecvDataSegmentLength keys. 1767 In the full feature phase, the initiator MUST consider the value of 1768 its local MaxRecvDataSegmentLength (that it would have declared to 1769 the target) as having the value of InitiatorRecvDataSegmentLength, 1770 and the value of the remote MaxRecvDataSegmentLength (that would 1771 have been declared by the target) as having the value of 1772 TargetRecvDataSegmentLength. Similarly, the target MUST consider 1773 the value of its local MaxRecvDataSegmentLength (that it would have 1774 declared to the initiator) as having the value of 1775 TargetRecvDataSegmentLength, and the value of the remote 1776 MaxRecvDataSegmentLength (that would have been declared by the 1777 initiator) as having the value of InitiatorRecvDataSegmentLength. 1779 Note that RFC 3720 requires that when a target receives a NOP-Out 1780 request with a valid Initiator Task Tag, it responds with a NOP-In 1781 with the same Initiator Task Tag that was provided in the NOP-Out 1782 request. Furthermore, it returns the first MaxRecvDataSegmentLength 1783 bytes of the initiator-provided Ping Data.
Since there is no 1784 MaxRecvDataSegmentLength common to the initiator and the target in 1785 iSER, the length of the data sent with the NOP-Out request MUST NOT 1786 exceed InitiatorRecvDataSegmentLength. 1788 The MaxRecvDataSegmentLength key is applicable only to iSCSI 1789 control-type PDUs. 1791 6.3 RDMAExtensions 1793 Use: LO (leading only) 1795 Senders: Initiator and Target 1797 Scope: SW (session-wide) 1799 RDMAExtensions= 1801 Irrelevant when: SessionType=Discovery 1803 Default is No 1805 Result function is AND 1807 This key is used by the initiator and the target to negotiate the 1808 support for iSER-assisted mode. To enable the use of iSER-assisted 1809 mode, both the initiator and the target MUST exchange 1810 RDMAExtensions=Yes. iSER-assisted mode MUST NOT be used if either 1811 the initiator or the target offers RDMAExtensions=No. 1813 An iSER-enabled node is not required to initiate the RDMAExtensions 1814 key exchange if it prefers to operate in the Traditional iSCSI mode. 1815 However, if the RDMAExtensions key is to be negotiated, an initiator 1816 MUST offer the key in the first Login Request PDU in the 1817 LoginOperationalNegotiation stage of the leading connection, and a 1818 target MUST offer the key in the first Login Response PDU with which 1819 it is allowed to do so (i.e., the first Login Response PDU issued 1820 after the first Login Request PDU with the C bit set to 0) in the 1821 LoginOperationalNegotiation stage of the leading connection. In 1822 response to the offered key=value pair of RDMAExtensions=Yes, an 1823 initiator MUST respond in the next Login Request PDU with which it 1824 is allowed to do so, and a target MUST respond in the next Login 1825 Response PDU with which it is allowed to do so. 1827 Negotiating the RDMAExtensions key first enables a node to negotiate 1828 the optimal value for other keys.
Certain iSCSI keys such as 1829 MaxBurstLength, MaxOutstandingR2T, ErrorRecoveryLevel, InitialR2T, 1830 ImmediateData, etc., may be negotiated differently depending on 1831 whether the connection is in Traditional iSCSI mode or iSER-assisted 1832 mode. 1834 6.4 TargetRecvDataSegmentLength 1836 Use: IO (Initialize only) 1838 Senders: Initiator and Target 1840 Scope: CO (connection-only) 1842 Irrelevant when: RDMAExtensions=No 1844 TargetRecvDataSegmentLength= 1846 Default is 8192 bytes 1848 Result function is minimum 1850 This key is relevant only for the iSCSI connection of an iSCSI 1851 session if RDMAExtensions=Yes was negotiated on the leading 1852 connection of the session. It is used by the initiator and the 1853 target to negotiate the maximum size of the data segment that an 1854 initiator may send to the target in an iSCSI control-type PDU in the 1855 full feature phase. For SCSI Command PDUs and SCSI Data-out PDUs 1856 containing non-immediate unsolicited data to be sent by the 1857 initiator, the initiator MUST send all non-Final PDUs with a data 1858 segment size of exactly TargetRecvDataSegmentLength whenever the 1859 PDUs constitute a data sequence whose size is larger than 1860 TargetRecvDataSegmentLength. 1862 6.5 InitiatorRecvDataSegmentLength 1864 Use: IO (Initialize only) 1866 Senders: Initiator and Target 1868 Scope: CO (connection-only) 1870 Irrelevant when: RDMAExtensions=No 1872 InitiatorRecvDataSegmentLength= 1874 Default is 8192 bytes 1876 Result function is minimum 1877 This key is relevant only for the iSCSI connection of an iSCSI 1878 session if RDMAExtensions=Yes was negotiated on the leading 1879 connection of the session. It is used by the initiator and the 1880 target to negotiate the maximum size of the data segment that a 1881 target may send to the initiator in an iSCSI control-type PDU in the 1882 full feature phase.
1884 6.6 OFMarker and IFMarker 1886 Irrelevant when: RDMAExtensions=Yes 1888 A negotiation resulting in RDMAExtensions=Yes for a session implies 1889 OFMarker=No and IFMarker=No for all connections in that session and 1890 overrides both the default and an explicit setting. 1892 6.7 MaxOutstandingUnexpectedPDUs 1894 Use: LO (leading only), Declarative 1896 Senders: Initiator and Target 1898 Scope: SW (session-wide) 1900 Irrelevant when: RDMAExtensions=No 1902 MaxOutstandingUnexpectedPDUs= 1905 Default is 0 1907 This key is used by the initiator and the target to declare the 1908 maximum number of outstanding "unexpected" iSCSI control-type PDUs 1909 that each can receive in the full feature phase. It is intended to 1910 allow the receiving side to determine the amount of buffer resources 1911 needed beyond the normal flow control mechanism available in iSCSI. 1912 An initiator or target should select a value such that it would not 1913 impose an unnecessary constraint on the iSCSI Layer under normal 1914 circumstances. The value of 0 is defined to indicate that the 1915 declarer has no limit on the maximum number of outstanding 1916 "unexpected" iSCSI control-type PDUs that it can receive. See 1917 sections 8.1.1 and 8.1.2 for the usage of this key. Note that iSER 1918 Hello and HelloReply Messages are not iSCSI control-type PDUs and 1919 are not affected by this key. 1921 For interoperability with implementations based on [RFC5046], this 1922 key SHOULD be negotiated because the default value of 0 in [RFC5046] 1923 is problematic for most implementations as it does not impose a 1924 bound on resources consumable by unexpected PDUs.
1926 6.8 MaxAHSLength 1928 Use: LO (leading only), Declarative 1930 Senders: Initiator and Target 1932 Scope: SW (session-wide) 1934 Irrelevant when: RDMAExtensions=No 1936 MaxAHSLength= 1938 Default is 256 1940 This key is used by the initiator and target to declare the maximum 1941 size of AHS in an iSCSI control-type PDU that each can receive in the 1942 full feature phase. It is intended to allow the receiving side to 1943 determine the amount of resources needed for receive buffering. An 1944 initiator or target should select a value such that it would not 1945 impose an unnecessary constraint on the iSCSI Layer under normal 1946 circumstances. The value of 0 is defined to indicate that the 1947 declarer has no limit on the maximum size of AHS in iSCSI control- 1948 type PDUs that it can receive. 1950 For interoperability with implementations based on [RFC5046], an 1951 initiator or target MAY terminate the connection if it anticipates 1952 MaxAHSLength to be greater than 256 and the key is not understood by 1953 its peer. 1955 6.9 TaggedBufferForSolicitedDataOnly 1957 Use: LO (leading only), Declarative 1959 Senders: Initiator 1961 Scope: SW (session-wide) 1963 TaggedBufferForSolicitedDataOnly= 1965 Irrelevant when: RDMAExtensions=No 1967 Default is No 1968 This key is used by the initiator to declare to the target the usage 1969 of the Write Base Offset in the iSER header of an iSCSI control-type 1970 PDU. When set to No, the Base Offset is associated with an I/O 1971 buffer that contains all the write data, including both unsolicited 1972 and solicited data. When set to Yes, the Base Offset is associated 1973 with an I/O buffer that only contains solicited data.
1975 6.10 iSERHelloRequired 1977 Use: LO (leading only), Declarative 1979 Senders: Initiator 1981 Scope: SW (session-wide) 1983 iSERHelloRequired= 1985 Irrelevant when: RDMAExtensions=No 1987 Default is No 1989 This key is relevant only for the iSCSI connection of an iSCSI 1990 session if RDMAExtensions=Yes was negotiated on the leading 1991 connection of the session. It is used by the initiator to declare to 1992 the target whether the iSER Hello Exchange is required. When set to Yes, 1993 the iSER layers MUST perform the iSER Hello Exchange as described in 1994 section 5.1.3. When set to No, the iSER layers MUST NOT perform the iSER 1995 Hello Exchange. 1997 7 iSCSI PDU Considerations 1999 When a connection is in the iSER-assisted mode, two types of message 2000 transfers are allowed between the iSCSI Layer at the initiator and 2001 the iSCSI Layer at the target. These are known as the iSCSI data- 2002 type PDUs and the iSCSI control-type PDUs and these terms are 2003 described in the following sections. 2005 7.1 iSCSI Data-Type PDU 2007 An iSCSI data-type PDU is defined as an iSCSI PDU that causes data 2008 transfer, transparent to the remote iSCSI layer, to take place 2009 between the peer iSCSI nodes in the full feature phase of an 2010 iSCSI/iSER connection. An iSCSI data-type PDU, when requested for 2011 transmission by the iSCSI Layer in the sending node, results in the 2012 data being transferred without the participation of the iSCSI Layers 2013 at the sending and the receiving nodes. This is because 2014 the PDU itself is not delivered as-is to the iSCSI Layer in the 2015 receiving node. Instead, the data transfer operations are 2016 transformed into the appropriate RDMA operations which are handled 2017 by the RDMA-Capable Controller. The set of iSCSI data-type PDUs 2018 consists of SCSI Data-in PDUs and R2T PDUs.
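The relationship between the two RecvDataSegmentLength keys and the local and remote MaxRecvDataSegmentLength values described in section 6.2 can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not a normative algorithm: the names negotiate_minimum, EffectiveLengths, and effective_lengths are hypothetical, the role is assumed to be passed as the string "initiator" or "target", and only the "minimum" result function and the 8192-byte default come from the key definitions above.

```python
from dataclasses import dataclass

# Default stated for both keys in sections 6.4 and 6.5.
DEFAULT_RECV_DATA_SEGMENT_LENGTH = 8192


def negotiate_minimum(offered: int, responded: int) -> int:
    """Apply the "minimum" result function to an offered and a responded value."""
    return min(offered, responded)


@dataclass(frozen=True)
class EffectiveLengths:
    """Values a node treats as its local and remote MaxRecvDataSegmentLength."""
    local_max_recv: int
    remote_max_recv: int


def effective_lengths(role: str, initiator_recv: int, target_recv: int) -> EffectiveLengths:
    """Map the negotiated keys onto local/remote MaxRecvDataSegmentLength.

    initiator_recv is the negotiated InitiatorRecvDataSegmentLength and
    target_recv is the negotiated TargetRecvDataSegmentLength.
    """
    if role == "initiator":
        return EffectiveLengths(local_max_recv=initiator_recv, remote_max_recv=target_recv)
    if role == "target":
        return EffectiveLengths(local_max_recv=target_recv, remote_max_recv=initiator_recv)
    raise ValueError(f"unknown role: {role!r}")
```

For example, if one side offers TargetRecvDataSegmentLength=16384 and the other responds with 8192, negotiate_minimum() yields 8192, which the initiator then treats as the remote MaxRecvDataSegmentLength.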
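The special meaning of the value 0 for MaxOutstandingUnexpectedPDUs is easy to misread as "no unexpected PDUs allowed". The following Python sketch makes the interpretation explicit. It is only a sketch: the function names are hypothetical, and the sender-side check assumes that a sender simply counts its outstanding unexpected PDUs against the peer's declared value, which is a simplification of the usage described in sections 8.1.1 and 8.1.2.

```python
from typing import Optional


def unexpected_pdu_limit(declared: int) -> Optional[int]:
    """Translate a declared MaxOutstandingUnexpectedPDUs value into a limit.

    Returns None when the declarer has no limit (declared value 0).
    """
    if declared < 0:
        raise ValueError("MaxOutstandingUnexpectedPDUs cannot be negative")
    return None if declared == 0 else declared


def may_send_unexpected(outstanding: int, peer_declared: int) -> bool:
    """Assumed sender-side check: is one more unexpected PDU within the limit?"""
    limit = unexpected_pdu_limit(peer_declared)
    return limit is None or outstanding < limit
```

A receiver that relies on the default of 0 therefore advertises unbounded buffering, which is the reason this section recommends negotiating the key explicitly.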
2020 If the invocation of the Operational Primitive by the iSCSI Layer to 2021 request the iSER Layer to process an iSCSI data-type PDU is 2022 qualified with Notify_Enable set, then upon completing the RDMA 2023 operation, the iSER Layer at the target MUST notify the iSCSI Layer 2024 at the target by invoking the Data_Completion_Notify Operational 2025 Primitive qualified with ITT and SN. There is no data completion 2026 notification at the initiator since the RDMA operations are 2027 completely handled by the RDMA-Capable Controller at the initiator 2028 and the iSER Layer at the initiator is not involved with the data 2029 transfer associated with iSCSI data-type PDUs. 2031 If the invocation of the Operational Primitive by the iSCSI Layer to 2032 request the iSER Layer to process an iSCSI data-type PDU is 2033 qualified with Notify_Enable cleared, then upon completing the RDMA 2034 operation, the iSER Layer at the target MUST NOT notify the iSCSI 2035 Layer at the target and MUST NOT invoke the Data_Completion_Notify 2036 Operational Primitive. 2038 If an operation associated with an iSCSI data-type PDU fails for any 2039 reason, the contents of the Data Sink buffers associated with the 2040 operation are considered indeterminate. 2042 7.2 iSCSI Control-Type PDU 2044 Any iSCSI PDU that is not an iSCSI data-type PDU and also not a SCSI 2045 Data-out PDU carrying solicited data is defined as an iSCSI control- 2046 type PDU. The iSCSI Layer invokes the Send_Control Operational 2047 Primitive to request the iSER Layer to process an iSCSI control-type 2048 PDU. iSCSI control-type PDUs are transferred using Send Messages of 2049 RCaP. Specifically, it is to be noted that SCSI Data-Out PDUs 2050 carrying unsolicited data are defined as iSCSI control-type PDUs. 2051 See section 7.3.4 on the treatment of SCSI Data-out PDUs. 
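The classification in sections 7.1 and 7.2 can be illustrated with a short Python sketch. It is not part of the protocol definition. The names PduClass and classify_pdu, and the use of plain strings to name PDUs, are assumptions made only for this example.

```python
from enum import Enum


class PduClass(Enum):
    DATA_TYPE = "data-type"        # transformed into RDMA operations (7.1)
    CONTROL_TYPE = "control-type"  # carried in Send Messages of RCaP (7.2)
    NEITHER = "neither"            # SCSI Data-out carrying solicited data


def classify_pdu(pdu_name: str, solicited: bool = False) -> PduClass:
    """Classify an iSCSI PDU by name, following sections 7.1 and 7.2.

    `pdu_name` is a hypothetical string label; `solicited` only matters
    for SCSI Data-out PDUs.
    """
    if pdu_name in ("SCSI Data-in", "R2T"):
        return PduClass.DATA_TYPE
    if pdu_name == "SCSI Data-out" and solicited:
        # Excluded from both definitions by the wording of section 7.2.
        return PduClass.NEITHER
    return PduClass.CONTROL_TYPE
```

With this sketch, a SCSI Data-out PDU carrying unsolicited data is classified as control-type, which matches the explicit note in section 7.2.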
2053 When the iSER Layer receives an iSCSI control-type PDU, it MUST 2054 notify the iSCSI Layer by invoking the Control_Notify Operational 2055 Primitive qualified with the iSCSI control-type PDU. 2057 7.3 iSCSI PDUs 2059 This section describes the handling of each of the iSCSI PDU types 2060 by the iSER Layer. The iSCSI Layer requests the iSER Layer to 2061 process the iSCSI PDU by invoking the appropriate Operational 2062 Primitive. A Connection_Handle MUST qualify each of these 2063 invocations. In addition, BHS and the optional AHS of the iSCSI PDU 2064 as defined in [iSCSI] MUST qualify each of the invocations. The 2065 qualifying Connection_Handle, the BHS and the AHS are not explicitly 2066 listed in the subsequent sections. 2068 7.3.1 SCSI Command 2070 Type: control-type PDU 2072 PDU-specific qualifiers (for SCSI Write or bidirectional 2073 command): ImmediateDataSize, UnsolicitedDataSize, 2074 DataDescriptorOut 2076 PDU-specific qualifiers (for SCSI Read or bidirectional 2077 command): DataDescriptorIn 2079 The iSER Layer at the initiator MUST send the SCSI command in a Send 2080 Message to the target. The SendSE Message should be used if 2081 supported by the RCaP layer (e.g., iWARP). 2083 For a SCSI Write or bidirectional command, the iSCSI Layer at the 2084 initiator MUST invoke the Send_Control Operational Primitive as 2085 follows: 2087 * If there is immediate data to be transferred for the SCSI write 2088 or bidirectional command, the qualifier ImmediateDataSize MUST be 2089 used to define the number of bytes of immediate unsolicited data 2090 to be sent with the write or bidirectional command, and the 2091 qualifier DataDescriptorOut MUST be used to define the 2092 initiator's I/O Buffer containing the SCSI Write data. 
2094 * If there is unsolicited data to be transferred for the SCSI Write 2095 or bidirectional command, the qualifier UnsolicitedDataSize MUST 2096 be used to define the number of bytes of immediate and non- 2097 immediate unsolicited data for the command. The iSCSI Layer will 2098 issue one or more SCSI Data-out PDUs for the non-immediate 2099 unsolicited data. See Section 7.3.4 on SCSI Data-out. 2101 * If there is solicited data to be transferred for the SCSI Write 2102 or bidirectional command, as indicated by the Expected Data 2103 Transfer Length in the SCSI Command PDU exceeding the value of 2104 UnsolicitedDataSize, the iSER Layer at the initiator MUST do the 2105 following: 2107 a. It MUST allocate a Write STag for the I/O Buffer defined by 2108 the qualifier DataDescriptorOut. DataDescriptorOut 2109 describes the I/O buffer starting with the immediate 2110 unsolicited data (if any), followed by the non-immediate 2111 unsolicited data (if any) and solicited data. When 2112 TaggedBufferForSolicitedDataOnly is negotiated to No, the 2113 Base Offset is associated with this I/O Buffer. When 2114 TaggedBufferForSolicitedDataOnly is negotiated to Yes, the 2115 Base Offset is associated with an I/O Buffer that contains 2116 only solicited data. 2118 b. It MUST establish a Local Mapping that associates the 2119 Initiator Task Tag (ITT) to the Write STag. 2121 c. It MUST Advertise the Write STag and the Base Offset to the 2122 target by sending them in the iSER header of the iSER 2123 Message (the payload of the Send Message of RCaP) containing 2124 the SCSI Write or bidirectional command PDU. The SendSE 2125 Message should be used if supported by the RCaP layer (e.g., 2126 iWARP). See section 9.2 on iSER Header Format for iSCSI 2127 Control-Type PDU. 
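Steps a through c for the solicited-data case can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the class names, the STag allocator (a simple counter), and the way the Base Offset is passed in by the caller are all hypothetical. How the Base Offset is derived from DataDescriptorOut depends on the negotiated value of TaggedBufferForSolicitedDataOnly and is left to the caller here.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterator, Optional


@dataclass
class IserHeader:
    """Hypothetical view of the Write STag fields of the iSER header."""
    write_stag: Optional[int] = None
    write_base_offset: Optional[int] = None


@dataclass
class InitiatorIser:
    # Local Mapping: ITT -> Write STag
    local_mappings: Dict[int, int] = field(default_factory=dict)
    # Stand-in for the real STag allocation done by the RCaP layer.
    _next_stag: Iterator[int] = field(default_factory=lambda: count(0x1000))

    def prepare_write(self, itt: int, expected_data_transfer_length: int,
                      unsolicited_data_size: int, base_offset: int) -> IserHeader:
        header = IserHeader()
        # Solicited data exists only if the Expected Data Transfer Length
        # exceeds UnsolicitedDataSize.
        if expected_data_transfer_length > unsolicited_data_size:
            stag = next(self._next_stag)          # step a: allocate Write STag
            self.local_mappings[itt] = stag       # step b: Local Mapping
            header.write_stag = stag              # step c: Advertise STag
            header.write_base_offset = base_offset  # step c: and Base Offset
        return header
```

When all write data is unsolicited, the sketch leaves the header fields empty, so no Write STag is advertised.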
2129 For a SCSI Read or bidirectional command, the iSCSI Layer at the 2130 initiator MUST invoke the Send_Control Operational Primitive 2131 qualified with DataDescriptorIn which defines the initiator's I/O 2132 Buffer for receiving the SCSI Read data. The iSER Layer at the 2133 initiator MUST do the following: 2135 a. It MUST allocate a Read STag for the I/O Buffer and note the 2136 Base Offset for this I/O Buffer. 2138 b. It MUST establish a Local Mapping that associates the 2139 Initiator Task Tag (ITT) to the Read STag. 2141 c. It MUST Advertise the Read STag and the Base Offset to the 2142 target by sending them in the iSER header of the iSER 2143 Message (the payload of the Send Message of RCaP) containing 2144 the SCSI Read or bidirectional command PDU. The SendSE 2145 Message should be used if supported by the RCaP layer (e.g., 2146 iWARP). See section 9.2 on iSER Header Format for iSCSI 2147 Control-Type PDU. 2149 If the amount of unsolicited data to be transferred in a SCSI 2150 Command exceeds TargetRecvDataSegmentLength, then the iSCSI Layer at 2151 the initiator MUST segment the data into multiple iSCSI control-type 2152 PDUs, with the data segment length in all PDUs generated except the 2153 last one having exactly the size TargetRecvDataSegmentLength. The 2154 data segment length of the last iSCSI control-type PDU carrying the 2155 unsolicited data can be up to TargetRecvDataSegmentLength. 2157 When the iSER Layer at the target receives the SCSI Command, it MUST 2158 establish a Remote Mapping that associates the ITT to the Base 2159 Offset(s) and the Advertised STag(s) in the iSER header. The Write 2160 STag is used by the iSER Layer at the target in handling the data 2161 transfer associated with the R2T PDU(s) as described in section 2162 7.3.6. The Read STag is used in handling the SCSI Data-in PDU(s) 2163 from the iSCSI Layer at the target as described in section 7.3.5. 
2165 7.3.2 SCSI Response 2167 Type: control-type PDU 2169 PDU-specific qualifiers: DataDescriptorStatus 2171 The iSCSI Layer at the target MUST invoke the Send_Control 2172 Operational Primitive qualified with DataDescriptorStatus which 2173 defines the buffer containing the sense and response information. 2174 The iSCSI Layer at the target MUST always return the SCSI status for 2175 a SCSI command in a separate SCSI Response PDU. "Phase collapse" 2176 for transferring SCSI status in a SCSI Data-in PDU MUST NOT be used. 2177 The iSER Layer at the target sends the SCSI Response PDU according 2178 to the following rules: 2180 * If no STags were Advertised by the initiator in the iSER Message 2181 containing the SCSI command PDU, then the iSER Layer at the 2182 target MUST send a Send Message containing the SCSI Response PDU. 2183 The SendSE Message should be used if supported by the RCaP layer 2184 (e.g., iWARP). 2186 * If the initiator Advertised a Read STag in the iSER Message 2187 containing the SCSI Command PDU, then the iSER Layer at the 2188 target MUST send a Send Message containing the SCSI Response PDU. 2189 The header of the Send Message MUST carry the Read STag to be 2190 invalidated at the initiator. The Send with Invalidate Message, 2191 if supported by the RCaP layer (e.g., iWARP), can be used for the 2192 automatic invalidation of the STag. 2194 * If the initiator Advertised only the Write STag in the iSER 2195 Message containing the SCSI command PDU, then the iSER Layer at 2196 the target MUST send a Send Message containing the SCSI Response 2197 PDU. The header of the Send Message MUST carry the Write STag to 2198 be invalidated at the initiator. The Send with Invalidate 2199 Message, if supported by the RCaP layer (e.g., iWARP), can be 2200 used for the automatic invalidation of the STag. 
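The three rules above reduce to a simple selection of the STag carried in the header of the Send Message. The following Python sketch shows that selection. The function name and the use of None to mean "not Advertised" are assumptions made for this example.

```python
from typing import Optional


def stag_to_invalidate(read_stag: Optional[int],
                       write_stag: Optional[int]) -> Optional[int]:
    """Return the STag the target places in the Send Message header.

    None means no STag was Advertised and nothing is to be invalidated.
    """
    if read_stag is not None:
        # A Read STag was Advertised (with or without a Write STag).
        return read_stag
    # Either only the Write STag was Advertised, or no STag at all.
    return write_stag
```

If both STags were Advertised, only the Read STag is returned, since the header can carry a single STag.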
2202 When the iSCSI Layer at the target invokes the Send_Control 2203 Operational Primitive to send the SCSI Response PDU, the iSER Layer 2204 at the target MUST invalidate the Remote Mapping before transferring 2205 the SCSI Response PDU to the initiator. 2207 Upon receiving a Send Message containing the SCSI Response PDU from 2208 the target, the iSER layer at the initiator MUST invalidate the 2209 STag(s) specified in the header. (If a Send with Invalidate Message 2210 is supported by the RCaP layer (e.g., iWARP) and is used to carry 2211 the SCSI Response PDU, the RCaP layer at the initiator will 2212 invalidate the STag. The iSER Layer at the initiator MUST ensure 2213 that the correct STag is invalidated. If both the Read and the 2214 Write STags were Advertised earlier by the initiator, then the iSER 2215 Layer at the initiator MUST explicitly invalidate the Write STag 2216 upon receiving the Send with Invalidate Message because the header 2217 of the Send with Invalidate Message can only carry one STag (in this 2218 case the Read STag) to be invalidated.) 2220 The iSER Layer at the initiator MUST ensure the invalidation of the 2221 STag(s) used in a command before notifying the iSCSI Layer at the 2222 initiator by invoking the Control_Notify Operational Primitive 2223 qualified with the SCSI Response. This precludes the possibility of 2224 using the STag(s) after the completion of the command thereby 2225 causing data corruption. 2227 When the iSER Layer at the initiator receives a Send Message 2228 containing the SCSI Response PDU, it SHOULD invalidate the Local 2229 Mapping. The iSER Layer MUST ensure that all local STag(s) 2230 associated with the ITT are invalidated before notifying the iSCSI 2231 Layer of the SCSI Response PDU by invoking the Control_Notify 2232 Operational Primitive qualified with the SCSI Response PDU. 
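The ordering requirement at the initiator (invalidate every STag of the task first, then notify the iSCSI Layer) can be sketched as follows. The function signature, the dictionary used for the Local Mappings, and the two callbacks are hypothetical and stand in for whatever interfaces an implementation provides.

```python
from typing import Callable, Dict, List, Set


def handle_scsi_response(local_mappings: Dict[int, List[int]],
                         itt: int,
                         invalidated_by_rcap: Set[int],
                         invalidate: Callable[[int], None],
                         control_notify: Callable[[int], None]) -> None:
    """Process a received SCSI Response PDU at the initiator.

    `invalidated_by_rcap` holds any STag that the RCaP layer already
    invalidated through a Send with Invalidate Message.
    """
    # Remove the Local Mapping and collect the STags of this task.
    stags = local_mappings.pop(itt, [])
    for stag in stags:
        if stag not in invalidated_by_rcap:
            # For example, the Write STag of a bidirectional command.
            invalidate(stag)
    # Only after all STags are invalid is the iSCSI Layer notified.
    control_notify(itt)
```

For a bidirectional command whose Read STag was invalidated by a Send with Invalidate Message, this sketch explicitly invalidates the remaining Write STag before invoking the notification callback.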
2234 7.3.3 Task Management Function Request/Response 2236 Type: control-type PDU 2238 PDU-specific qualifiers (for TMF Request): DataDescriptorOut, 2239 DataDescriptorIn 2241 The iSER Layer MUST use a Send Message to send the Task Management 2242 Function Request/Response PDU. The SendSE Message should be used if 2243 supported by the RCaP layer (e.g., iWARP). 2245 For the Task Management Function Request with the TASK REASSIGN 2246 function, the iSER Layer at the initiator MUST do the following: 2248 * It MUST use the ITT as specified in the Referenced Task Tag from 2249 the Task Management Function Request PDU to locate the existing 2250 STags (if any) in the Local Mappings. 2252 * It MUST invalidate the existing STags (if any) and the Local 2253 Mappings. 2255 * It MUST allocate a Read STag for the I/O Buffer and note the Base 2256 Offset associated with the I/O Buffer as defined by the qualifier 2257 DataDescriptorIn if the Send_Control Operational Primitive 2258 invocation is qualified with DataDescriptorIn. 2260 * It MUST allocate a Write STag for the I/O Buffer and note the 2261 Base Offset associated with the I/O Buffer as defined by the 2262 qualifier DataDescriptorOut if the Send_Control Operational 2263 Primitive invocation is qualified with DataDescriptorOut. 2265 * If STags are allocated, it MUST establish new Local Mapping(s) 2266 that associate the ITT to the allocated STag(s). 2268 * It MUST Advertise the STags and the Base Offsets, if allocated, 2269 to the target in the iSER header of the Send Message carrying the 2270 iSCSI PDU, as described in section 9.2. The SendSE Message 2271 should be used if supported by the RCaP layer (e.g., iWARP). 2273 For the Task Management Function Request with the TASK REASSIGN 2274 function for a SCSI Read or bidirectional command, the iSCSI Layer 2275 at the initiator MUST set ExpDataSN to 0 since the data transfer and 2276 acknowledgements happen transparently to the iSCSI Layer at the 2277 initiator.
This provides the flexibility to the iSCSI Layer at the 2278 target to request transmission of only the unacknowledged data as 2279 specified in [iSCSI]. 2281 When the iSER Layer at the target receives the Task Management 2282 Function Request with the TASK REASSIGN function, it MUST do the 2283 following: 2285 * It MUST use the ITT as specified in the Referenced Task Tag from 2286 the Task Management Function Request PDU to locate the Local and 2287 Remote Mappings (if any). 2289 * It MUST invalidate the local STags (if any) associated with the 2290 ITT. 2292 * It MUST replace the Base Offset(s) and the Advertised STag(s) in 2293 the Remote Mapping with the Base Offset(s) and the Advertised 2294 STag(s) in the iSER header. The Write STag is used in the 2295 handling of the R2T PDU(s) from the iSCSI Layer at the target as 2296 described in section 7.3.6. The Read STag is used in the 2297 handling of the SCSI Data-in PDU(s) from the iSCSI Layer at the 2298 target as described in section 7.3.5. 2300 7.3.4 SCSI Data-out 2302 Type: control-type PDU 2304 PDU-specific qualifiers: DataDescriptorOut 2306 The iSCSI Layer at the initiator MUST invoke the Send_Control 2307 Operational Primitive qualified with DataDescriptorOut which defines 2308 the initiator's I/O Buffer containing unsolicited SCSI Write data. 2310 If the amount of unsolicited data to be transferred as SCSI Data-out 2311 exceeds TargetRecvDataSegmentLength, then the iSCSI Layer at the 2312 initiator MUST segment the data into multiple iSCSI control-type 2313 PDUs, with the DataSegmentLength having the value of 2314 TargetRecvDataSegmentLength in all PDUs generated except the last 2315 one. The DataSegmentLength of the last iSCSI control-type PDU 2316 carrying the unsolicited data can be up to 2317 TargetRecvDataSegmentLength. The iSCSI Layer at the target MUST 2318 perform the reassembly function for the unsolicited data.
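The segmentation rule for unsolicited data can be illustrated with the following Python sketch, which computes the DataSegmentLength of each control-type PDU. The function name and its parameters are assumptions made for this example. The sketch deals only with lengths and does not build PDUs.

```python
from typing import List


def segment_unsolicited(total_len: int, target_recv_dsl: int) -> List[int]:
    """Return the DataSegmentLength of each PDU carrying unsolicited data.

    Every PDU except the last carries exactly `target_recv_dsl` bytes
    (TargetRecvDataSegmentLength); the last carries the remainder.
    """
    if target_recv_dsl <= 0:
        raise ValueError("TargetRecvDataSegmentLength must be positive")
    lengths = []
    remaining = total_len
    while remaining > 0:
        seg = min(remaining, target_recv_dsl)
        lengths.append(seg)
        remaining -= seg
    return lengths
```

For example, 10000 bytes of unsolicited data with a TargetRecvDataSegmentLength of 4096 yields segments of 4096, 4096, and 1808 bytes.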
2320 For unsolicited data, the iSER Layer at the initiator MUST use a 2321 Send Message to send the SCSI Data-out PDU. If the F bit is set to 2322 1, the SendSE Message should be used if supported by the RCaP layer 2323 (e.g., iWARP). 2325 Note that for solicited data, the SCSI Data-out PDUs are not used 2326 since R2T PDUs are not delivered to the iSCSI layer at the 2327 initiator; instead R2T PDUs are transformed by the iSER layer at the 2328 target into RDMA Read operations. (See section 7.3.6.) 2330 7.3.5 SCSI Data-in 2332 Type: data-type PDU 2334 PDU-specific qualifiers: DataDescriptorIn 2336 When the iSCSI Layer at the target is ready to return the SCSI Read 2337 data to the initiator, it MUST invoke the Put_Data Operational 2338 Primitive qualified with DataDescriptorIn which defines the SCSI 2339 Data-in buffer. See section 7.1 on the general requirement on the 2340 handling of iSCSI data-type PDUs. SCSI Data-in PDU(s) are used in 2341 SCSI Read data transfer as described in section 9.5.2. 2343 The iSER Layer at the target MUST do the following for each 2344 invocation of the Put_Data Operational Primitive: 2346 1. It MUST use the ITT in the SCSI Data-in PDU to locate the remote 2347 Read STag and the Base Offset in the Remote Mapping. The Remote 2348 Mapping was established earlier by the iSER Layer at the target 2349 when the SCSI Read Command was received from the initiator. 2351 2. It MUST generate and send an RDMA Write Message containing the 2352 read data to the initiator. 2354 a. It MUST use the remote Read STag as the Data Sink STag of 2355 the RDMA Write Message. 2357 b. It MUST add the Buffer Offset from the SCSI Data-in PDU to 2358 the Base Offset from the Remote Mapping as the Data Sink 2359 Tagged Offset of the RDMA Write Message. 2361 c. It MUST use DataSegmentLength from the SCSI Data-in PDU to 2362 determine the amount of data to be sent in the RDMA Write 2363 Message. 2365 3.
It MUST associate DataSN and ITT from the SCSI Data-in PDU with 2366 the RDMA Write operation. If the Put_Data Operational Primitive 2367 invocation was qualified with Notify_Enable set, then when the 2368 iSER Layer at the target receives a completion from the RCaP 2369 layer for the RDMA Write Message, the iSER Layer at the target 2370 MUST notify the iSCSI Layer by invoking the 2371 Data_Completion_Notify Operational Primitive qualified with 2372 DataSN and ITT. Conversely, if the Put_Data Operational 2373 Primitive invocation was qualified with Notify_Enable cleared, 2374 then the iSER Layer at the target MUST NOT notify the iSCSI 2375 Layer on completion and MUST NOT invoke the 2376 Data_Completion_Notify Operational Primitive. 2378 When the A-bit is set to 1 in the SCSI Data-in PDU, the iSER Layer 2379 at the target MUST notify the iSCSI Layer at the target when the 2380 data transfer is complete at the initiator. To perform this 2381 additional function, the iSER Layer at the target can take advantage 2382 of the operational ErrorRecoveryLevel if previously disclosed by the 2383 iSCSI Layer via an earlier invocation of the Notice_Key_Values 2384 Operational Primitive. There are two approaches that can be taken: 2386 1. If the iSER Layer at the target knows that the operational 2387 ErrorRecoveryLevel is 2, or if the iSER Layer at the target does 2388 not know the operational ErrorRecoveryLevel, then the iSER Layer 2389 at the target MUST issue a zero-length RDMA Read Request Message 2390 following the RDMA Write Message. 
When the iSER Layer at the 2391 target receives a completion for the RDMA Read Request Message 2392 from the RCaP layer, implying that the RDMA-Capable Controller 2393 at the initiator has completed processing the RDMA Write Message 2394 due to the completion ordering semantics of RCaP, the iSER Layer 2395 at the target MUST notify the iSCSI Layer at the target by 2396 invoking the Data_Ack_Notify Operational Primitive qualified 2397 with ITT and DataSN (see section 3.2.3). 2399 2. If the iSER Layer at the target knows that the operational 2400 ErrorRecoveryLevel is 1, then the iSER Layer at the target MUST 2401 do one of the following: 2403 a. It MUST notify the iSCSI Layer at the target by invoking the 2404 Data_Ack_Notify Operational Primitive qualified with ITT and 2405 DataSN (see section 3.2.3) when it receives the local 2406 completion from the RCaP layer for the RDMA Write Message. 2407 This is allowed since digest errors do not occur in iSER 2408 (see section 10.1.4.2) and a CRC error will cause the 2409 connection to be terminated and the task to be terminated 2410 anyway. The local RDMA Write completion from the RCaP layer 2411 guarantees that the RCaP layer will not access the I/O 2412 Buffer again to transfer the data associated with that RDMA 2413 Write operation. 2415 b. Alternatively, it MUST use the same procedure for handling 2416 the data transfer completion at the initiator as for 2417 ErrorRecoveryLevel 2. 2419 It should be noted that the iSCSI Layer at the target cannot set the 2420 A-bit to 1 if the ErrorRecoveryLevel=0. 2422 SCSI status MUST always be returned in a separate SCSI Response PDU. 2423 The S bit in the SCSI Data-in PDU MUST always be set to 0. There 2424 MUST NOT be a "phase collapse" in the SCSI Data-in PDU. 
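The choice between the two acknowledgement approaches for a SCSI Data-in PDU with the A-bit set can be sketched in Python as follows. The function name, the returned string labels, and the prefer_local_completion flag (which models the implementation's choice between options a and b at ErrorRecoveryLevel 1) are assumptions made only for this illustration.

```python
from typing import Optional


def ack_strategy(error_recovery_level: Optional[int],
                 prefer_local_completion: bool = True) -> str:
    """Select how the target iSER Layer decides when to invoke
    Data_Ack_Notify.

    `error_recovery_level` is None when the iSER Layer was never told
    the operational ErrorRecoveryLevel.
    """
    if error_recovery_level == 0:
        # The iSCSI Layer cannot set the A-bit at ErrorRecoveryLevel=0.
        raise ValueError("A-bit is not allowed when ErrorRecoveryLevel=0")
    if error_recovery_level is None or error_recovery_level == 2:
        # Approach 1: zero-length RDMA Read Request after the RDMA Write.
        return "zero-length-rdma-read"
    if error_recovery_level == 1:
        # Approach 2: option a (local completion) or option b (same as ERL 2).
        if prefer_local_completion:
            return "local-write-completion"
        return "zero-length-rdma-read"
    raise ValueError("unknown ErrorRecoveryLevel")
```

An unknown ErrorRecoveryLevel is treated like level 2, which is the conservative choice required by approach 1.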
2426 Since the RDMA Write Message only transfers the data portion of the 2427 SCSI Data-in PDU but not the control information in the header, such 2428 as ExpCmdSN, if timely updates of such information are crucial, the 2429 iSCSI Layer at the initiator MAY issue NOP-Out PDUs to request the 2430 iSCSI Layer at the target to respond with the information using NOP- 2431 In PDUs. 2433 7.3.6 Ready To Transfer (R2T) 2435 Type: data-type PDU 2437 PDU-specific qualifiers: DataDescriptorOut 2439 In order to send an R2T PDU, the iSCSI Layer at the target MUST 2440 invoke the Get_Data Operational Primitive qualified with 2441 DataDescriptorOut which defines the I/O Buffer for receiving the 2442 SCSI Write data from the initiator. See section 7.1 on the general 2443 requirements on the handling of iSCSI data-type PDUs. 2445 The iSER Layer at the target MUST do the following for each 2446 invocation of the Get_Data Operational Primitive: 2448 1. It MUST ensure a valid local STag for the I/O Buffer and a valid 2449 Local Mapping. This may involve allocating a valid local STag 2450 and establishing a Local Mapping. 2452 2. It MUST use the ITT in the R2T to locate the remote Write STag 2453 and the Base Offset in the Remote Mapping. The Remote Mapping 2454 was established earlier by the iSER Layer at the target when the 2455 iSER Message containing the Advertised Write STag, the Base 2456 Offset and the SCSI Command PDU for a SCSI Write or 2457 bidirectional command was received from the initiator. 2459 3. If the iSER-ORD value at the target is set to 0, the iSER Layer 2460 at the target MUST terminate the connection and free up the 2461 resources associated with the connection (as described in 5.2.3) 2462 if it received the R2T PDU from the iSCSI Layer at the target. 2463 Upon termination of the connection, the iSER Layer at the target 2464 MUST notify the iSCSI Layer at the target by invoking the 2465 Connection Terminate Notify Operational Primitive. 2467 4.
If the iSER-ORD value at the target is set to greater than 0, 2468 the iSER Layer at the target MUST transform the R2T PDU into an 2469 RDMA Read Request Message. While transforming the R2T PDU, the 2470 iSER Layer at the target MUST ensure that the number of 2471 outstanding RDMA Read Request Messages does not exceed iSER-ORD 2472 value. To transform the R2T PDU, the iSER Layer at the target: 2474 a. MUST derive the local STag and local Tagged Offset from the 2475 DataDescriptorOut that qualified the Get_Data invocation. 2477 b. MUST use the local STag as the Data Sink STag of the RDMA 2478 Read Request Message. 2480 c. MUST use the local Tagged Offset as the Data Sink Tagged 2481 Offset of the RDMA Read Request Message. 2483 d. MUST use the Desired Data Transfer Length from the R2T PDU 2484 as the RDMA Read Message Size of the RDMA Read Request 2485 Message. 2487 e. MUST use the remote Write STag as the Data Source STag of 2488 the RDMA Read Request Message. 2490 f. MUST add the Buffer Offset from the R2T PDU to the Base 2491 Offset from the Remote Mapping as the Data Source Tagged 2492 Offset of the RDMA Read Request Message. 2494 5. It MUST associate R2TSN and ITT from the R2T PDU with the RDMA 2495 Read operation. If the Get_Data Operational Primitive 2496 invocation was qualified with Notify_Enable set, then when the 2497 iSER Layer at the target receives a completion from the RCaP 2498 layer for the RDMA Read operation, the iSER Layer at the target 2499 MUST notify the iSCSI Layer by invoking the 2500 Data_Completion_Notify Operational Primitive qualified with 2501 R2TSN and ITT. Conversely, if the Get_Data Operational 2502 Primitive invocation was qualified with Notify_Enable cleared, 2503 then the iSER Layer at the target MUST NOT notify the iSCSI 2504 Layer on completion and MUST NOT invoke the 2505 Data_Completion_Notify Operational Primitive. 
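The transformation of an R2T PDU into an RDMA Read Request Message (steps 3 and 4 above) can be sketched in Python as follows. All class and function names are hypothetical. Returning None when the iSER-ORD limit has been reached is an assumption of this sketch; the text only requires that the limit not be exceeded and does not say how an implementation defers the request.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RemoteMapping:
    write_stag: int    # Advertised Write STag
    base_offset: int   # Advertised Base Offset


@dataclass(frozen=True)
class RdmaReadRequest:
    sink_stag: int
    sink_tagged_offset: int
    message_size: int
    source_stag: int
    source_tagged_offset: int


def r2t_to_rdma_read(local_stag: int, local_tagged_offset: int,
                     desired_data_transfer_length: int, buffer_offset: int,
                     mapping: RemoteMapping, iser_ord: int,
                     outstanding_reads: int) -> Optional[RdmaReadRequest]:
    if iser_ord == 0:
        # Step 3: the connection must be terminated.
        raise ConnectionError("iSER-ORD is 0; R2T cannot be processed")
    if outstanding_reads >= iser_ord:
        # Assumption: the caller defers the request until a slot frees up.
        return None
    return RdmaReadRequest(
        sink_stag=local_stag,                       # step 4b
        sink_tagged_offset=local_tagged_offset,     # step 4c
        message_size=desired_data_transfer_length,  # step 4d
        source_stag=mapping.write_stag,             # step 4e
        source_tagged_offset=mapping.base_offset + buffer_offset,  # step 4f
    )
```

The Data Source Tagged Offset is the sum of the Base Offset from the Remote Mapping and the Buffer Offset from the R2T PDU, mirroring the Data Sink Tagged Offset computation used for SCSI Data-in in section 7.3.5.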
2507 When the RCaP layer at the initiator receives a valid RDMA Read 2508 Request Message, it will return an RDMA Read Response Message 2509 containing the solicited write data to the target. When the RCaP 2510 layer at the target receives the RDMA Read Response Message from the 2511 initiator, it will place the solicited data in the I/O Buffer 2512 referenced by the Data Sink STag in the RDMA Read Response Message. 2514 Since the RDMA Read Request Message from the target does not 2515 transfer the control information in the R2T PDU such as ExpCmdSN, if 2516 timely updates of such information are crucial, the iSCSI Layer at 2517 the initiator MAY issue NOP-Out PDUs to request the iSCSI Layer at 2518 the target to respond with the information using NOP-In PDUs. 2520 Similarly, since the RDMA Read Response Message from the initiator 2521 only transfers the data but not the control information normally 2522 found in the SCSI Data-out PDU, such as ExpStatSN, if timely updates 2523 of such information are crucial, the iSCSI Layer at the target MAY 2524 issue NOP-In PDUs to request the iSCSI Layer at the initiator to 2525 respond with the information using NOP-Out PDUs. 2527 7.3.7 Asynchronous Message 2529 Type: control-type PDU 2531 PDU-specific qualifiers: DataDescriptorSense 2533 The iSCSI Layer MUST invoke the Send_Control Operational Primitive 2534 qualified with DataDescriptorSense which defines the buffer 2535 containing the sense and iSCSI event information. The iSER Layer 2536 MUST use a Send Message to send the Asynchronous Message PDU. The 2537 SendSE Message should be used if supported by the RCaP layer (e.g., 2538 iWARP).
2540 7.3.8 Text Request & Text Response 2542 Type: control-type PDU 2544 PDU-specific qualifiers: DataDescriptorTextOut (for Text 2545 Request), DataDescriptorIn (for Text Response) 2547 The iSCSI Layer MUST invoke the Send_Control Operational Primitive 2548 qualified with DataDescriptorTextOut (or DataDescriptorIn) which 2549 defines the Text Request (or Text Response) buffer. The iSER Layer 2550 MUST use Send Messages to send the Text Request (or Text Response) 2551 PDUs. The SendSE Message should be used if supported by the RCaP 2552 layer (e.g., iWARP). 2554 7.3.9 Login Request & Login Response 2556 During the login negotiation, the iSCSI Layer interacts with the 2557 transport layer directly and the iSER Layer is not involved. See 2558 section 5.1 on iSCSI/iSER Connection Setup. If the underlying 2559 transport is TCP, the Login Request PDUs and the Login Response PDUs 2560 are exchanged when the connection between the initiator and the 2561 target is still in the byte stream mode. 2563 The iSCSI Layer MUST NOT send a Login Request (or a Login Response) 2564 PDU during the full feature phase. A Login Request (or a Login 2565 Response) PDU, if used, MUST be treated as an iSCSI protocol error. 2566 The iSER Layer MAY reject such a PDU from the iSCSI Layer with an 2567 appropriate error code. If a Login Request PDU is received by the 2568 iSCSI Layer at the target, it MUST respond with a Reject PDU with a 2569 reason code of "protocol error". 2571 7.3.10 Logout Request & Logout Response 2573 Type: control-type PDU 2575 PDU-specific qualifiers: None 2577 The iSER Layer MUST use a Send Message to send the Logout Request or 2578 Logout Response PDU. The SendSE Message should be used if supported 2579 by the RCaP layer (e.g., iWARP). Sections 5.2.1 and 5.2.2 describe 2580 the handling of the Logout Request and the Logout Response at the 2581 initiator and the target and the interactions between the initiator 2582 and the target to terminate a connection.
2584 7.3.11 SNACK Request 2586 Since HeaderDigest and DataDigest must be negotiated to "None", 2587 there are no digest errors when the connection is in iSER-assisted 2588 mode. Also since RCaP delivers all messages in the order they were 2589 sent, there are no sequence errors when the connection is in iSER- 2590 assisted mode. Therefore the iSCSI Layer MUST NOT send SNACK 2591 Request PDUs. A SNACK Request PDU, if used, MUST be treated as an 2592 iSCSI protocol error. The iSER Layer MAY reject such a PDU from the 2593 iSCSI Layer with an appropriate error code. If a SNACK Request PDU 2594 is received by the iSCSI Layer at the target, it MUST respond with a 2595 Reject PDU with a reason code of "protocol error". 2597 7.3.12 Reject 2599 Type: control-type PDU 2600 PDU-specific qualifiers: DataDescriptorReject 2602 The iSCSI Layer MUST invoke the Send_Control Operational Primitive 2603 qualified with DataDescriptorReject which defines the Reject buffer. 2604 The iSER Layer MUST use a Send Message to send the Reject PDU. The 2605 SendSE Message should be used if supported by the RCaP layer (e.g., 2606 iWARP). 2608 7.3.13 NOP-Out & NOP-In 2610 Type: control-type PDU 2612 PDU-specific qualifiers: DataDescriptorNOPOut (for NOP-Out), 2613 DataDescriptorNOPIn (for NOP-In) 2615 The iSCSI Layer MUST invoke the Send_Control Operational Primitive 2616 qualified with DataDescriptorNOPOut (or DataDescriptorNOPIn) which 2617 defines the Ping (or Return Ping) data buffer. The iSER Layer MUST 2618 use Send Messages to send the NOP-Out (or NOP-In) PDU. The SendSE 2619 Message should be used if supported by the RCaP layer (e.g., iWARP). 2621 8 Flow Control and STag Management 2623 8.1 Flow Control for RDMA Send Messages 2625 Send Messages in RCaP are used by the iSER Layer to transfer iSCSI 2626 control-type PDUs. Each Send Message in RCaP consumes an Untagged 2627 Buffer at the Data Sink.
However, neither the RCaP layer nor the 2628 iSER Layer provides an explicit flow control mechanism for the Send 2629 Messages. Therefore, the iSER Layer SHOULD provision enough 2630 Untagged buffers for handling incoming Send Messages to prevent 2631 buffer exhaustion at the RCaP layer. If buffer exhaustion occurs, 2632 it may result in the termination of the connection. 2634 An implementation may choose to satisfy the buffer requirement by 2635 using a common buffer pool shared across multiple connections, with 2636 usage limits on a per connection basis and usage limits on the 2637 buffer pool itself. In such an implementation, exceeding the buffer 2638 usage limit for a connection or the buffer pool itself may trigger 2639 interventions from the iSER Layer to replenish the buffer pool 2640 and/or to isolate the connection causing the problem. 2642 iSER also provides the MaxOutstandingUnexpectedPDUs key to be used 2643 by the initiator and the target to declare the maximum number of 2644 outstanding "unexpected" control-type PDUs that it can receive. It 2645 is intended to allow the receiving side to determine the amount of 2646 buffer resources needed beyond the normal flow control mechanism 2647 available in iSCSI. 2649 The buffer resources required at both the initiator and the target 2650 as a result of control-type PDUs sent by the initiator are described 2651 in section 8.1.1. The buffer resources required at both the 2652 initiator and target as a result of control-type PDUs sent by the 2653 target are described in section 8.1.2. 2655 8.1.1 Flow Control for Control-Type PDUs from the Initiator 2657 The control-type PDUs that can be sent by an initiator to a target 2658 can be grouped into the following categories: 2660 1. Regulated: Control-type PDUs in this category are regulated by 2661 the iSCSI CmdSN window mechanism and the immediate flag is not 2662 set. 2664 2.
Unregulated but Expected: Control-type PDUs in this category 2665 are not regulated by the iSCSI CmdSN window mechanism but are 2666 expected by the target. 2668 3. Unregulated and Unexpected: Control-type PDUs in this category 2669 are not regulated by the iSCSI CmdSN window mechanism and are 2670 "unexpected" by the target. 2672 8.1.1.1 Control-Type PDUs from the Initiator in the Regulated Category 2674 Control-type PDUs that can be sent by the initiator in this category 2675 are regulated by the iSCSI CmdSN window mechanism and the immediate 2676 flag is not set. 2678 The queuing capacity required of the iSCSI layer at the target is 2679 described in section 4.2.2.1 of [iSCSI]. For each of the control- 2680 type PDUs that can be sent by the initiator in this category, the 2681 initiator MUST provision for the buffer resources required for the 2682 corresponding control-type PDU sent as a response from the target. 2683 The following is a list of the PDUs that can be sent by the 2684 initiator and the PDUs that are sent by the target in response: 2686 a. When an initiator sends a SCSI Command PDU, it expects a 2687 SCSI Response PDU from the target. 2689 b. When the initiator sends a Task Management Function Request 2690 PDU, it expects a Task Management Function Response PDU from 2691 the target. 2693 c. When the initiator sends a Text Request PDU, it expects a 2694 Text Response PDU from the target. 2696 d. When the initiator sends a Logout Request PDU, it expects a 2697 Logout Response PDU from the target. 2699 e. When the initiator sends a NOP-Out PDU as a ping request 2700 with ITT != 0xffffffff and TTT = 0xffffffff, it expects a 2701 NOP-In PDU from the target with the same ITT and TTT as in 2702 the ping request. 2704 The response from the target for any of the PDUs enumerated here may 2705 alternatively be in the form of a Reject PDU sent instead before the 2706 task is active, as described in section 7.3 of [iSCSI]. 
2708 8.1.1.2 Control-Type PDUs from the Initiator in the Unregulated but 2709 Expected Category 2711 For the control-type PDUs in the Unregulated but Expected category, 2712 the amount of buffering resources required at the target can be 2713 predetermined. The following is a list of the PDUs in this 2714 category: 2716 a. SCSI Data-out PDUs are used by the initiator to send 2717 unsolicited data. The amount of buffer resources required 2718 by the target can be determined using FirstBurstLength. 2719 Note that SCSI Data-out PDUs are not used for solicited 2720 data since the R2T PDU which is used for solicitation is 2721 transformed into RDMA Read operations by the iSER layer at 2722 the target. See section 7.3.4. 2724 b. A NOP-Out PDU with TTT != 0xffffffff is sent as a ping 2725 response by the initiator to the NOP-In PDU sent as a ping 2726 request by the target. 2728 8.1.1.3 Control-Type PDUs from the Initiator in the Unregulated and 2729 Unexpected Category 2731 PDUs in the Unregulated and Unexpected category are PDUs with the 2732 immediate flag set. The number of PDUs in this category which can 2733 be sent by an initiator is controlled by the value of 2734 MaxOutstandingUnexpectedPDUs declared by the target. (See section 2735 6.7.) After a PDU in this category is sent by the initiator, it is 2736 outstanding until it is retired. At any time, the number of 2737 outstanding unexpected PDUs MUST NOT exceed the value of 2738 MaxOutstandingUnexpectedPDUs declared by the target. 2740 The target uses the value of MaxOutstandingUnexpectedPDUs that it 2741 declared to determine the amount of buffer resources required for 2742 control-type PDUs in this category that can be sent by an initiator. 2743 For the initiator, for each of the control-type PDUs that can be 2744 sent in this category, the initiator MUST provision for the buffer 2745 resources if required for the corresponding control-type PDU that 2746 can be sent as a response from the target. 
2748 An outstanding PDU in this category is retired as follows. If the 2749 CmdSN of the PDU sent by the initiator in this category is x, the 2750 PDU is outstanding until the initiator sends a non-immediate 2751 control-type PDU on the same connection with CmdSN = y (where y is 2752 at least x) and the target responds with a control-type PDU on any 2753 connection where ExpCmdSN is at least y+1. 2755 When the number of outstanding unexpected control-type PDUs equals 2756 MaxOutstandingUnexpectedPDUs, the iSCSI Layer at the initiator MUST 2757 NOT generate any unexpected PDUs which otherwise it would have 2758 generated, even if it is intended for immediate delivery. 2760 8.1.2 Flow Control for Control-Type PDUs from the Target 2762 Control-type PDUs that can be sent by a target and are expected by 2763 the initiator are listed in the Regulated category. (See section 2764 8.1.1.1.) 2766 For the control-type PDUs that can be sent by a target and are 2767 unexpected by the initiator, the number is controlled by 2768 MaxOutstandingUnexpectedPDUs declared by the initiator. (See 2769 section 6.7.) After a PDU in this category is sent by a target, it 2770 is outstanding until it is retired. At any time, the number of 2771 outstanding unexpected PDUs MUST NOT exceed the value of 2772 MaxOutstandingUnexpectedPDUs declared by the initiator. The 2773 initiator uses the value of MaxOutstandingUnexpectedPDUs that it 2774 declared to determine the amount of buffer resources required for 2775 control-type PDUs in this category that can be sent by a target. 2776 The following is a list of the PDUs in this category and the 2777 conditions for retiring the outstanding PDU: 2779 a. For an Asynchronous Message PDU with StatSN = x, the PDU is 2780 outstanding until the initiator sends a control-type PDU 2781 with ExpStatSN set to at least x+1. 2783 b. 
For a Reject PDU with StatSN = x which is sent after a task 2784 is active, the PDU is outstanding until the initiator sends 2785 a control-type PDU with ExpStatSN set to at least x+1. 2787 c. For a NOP-In PDU with ITT = 0xffffffff and StatSN = x, the 2788 PDU is outstanding until the initiator responds with a 2789 control-type PDU on the same connection where ExpStatSN is 2790 at least x+1. But if the NOP-In PDU is sent as a ping 2791 request with TTT != 0xffffffff, the PDU can also be retired 2792 when the initiator sends a NOP-Out PDU with the same ITT and 2793 TTT as in the ping request. Note that when a target sends a 2794 NOP-In PDU as a ping request, it must provision a buffer for 2795 the NOP-Out PDU sent as a ping response from the initiator. 2797 When the number of outstanding unexpected control-type PDUs equals 2798 MaxOutstandingUnexpectedPDUs, the iSCSI Layer at the target MUST NOT 2799 generate any unexpected PDUs which otherwise it would have 2800 generated, even if its intent is to indicate an iSCSI error 2801 condition (e.g., Asynchronous Message, Reject). Task timeouts as in 2802 the initiator waiting for a command completion or other connection 2803 and session level exceptions will ensure that correct operational 2804 behavior will result in these cases despite not generating the PDU. 2805 This rule overrides any other requirements elsewhere which require 2806 that a Reject PDU MUST be sent. 2808 (Implementation note: SCSI task timeout and recovery can be a 2809 lengthy process and hence SHOULD be avoided by proper provisioning 2810 of resources.) 
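   The retirement rules above for unexpected PDUs sent by the target
   can be sketched as a small bookkeeping helper. This is a minimal
   illustration, not part of the specification: the class name
   UnexpectedPduWindow and its methods are hypothetical, sequence
   number wraparound is ignored for brevity, and the alternative
   retirement of a NOP-In ping request by the matching NOP-Out is
   omitted.

```python
class UnexpectedPduWindow:
    """Target-side count of outstanding unexpected PDUs (hypothetical helper)."""

    def __init__(self, max_outstanding):
        # Value of MaxOutstandingUnexpectedPDUs declared by the initiator.
        self.max_outstanding = max_outstanding
        # StatSN of every unexpected PDU that has not been retired yet.
        self.outstanding = set()

    def can_send(self):
        # The target MUST NOT generate an unexpected PDU once the limit is hit.
        return len(self.outstanding) < self.max_outstanding

    def record_sent(self, stat_sn):
        if not self.can_send():
            raise RuntimeError("MaxOutstandingUnexpectedPDUs reached")
        self.outstanding.add(stat_sn)

    def on_exp_stat_sn(self, exp_stat_sn):
        # A PDU with StatSN = x is retired once ExpStatSN is at least x + 1.
        # Assumption: plain integer comparison, no 32-bit wraparound handling.
        self.outstanding = {x for x in self.outstanding if exp_stat_sn < x + 1}
```

   A target using such a helper would check can_send() before
   generating an Asynchronous Message, Reject, or NOP-In PDU, and call
   on_exp_stat_sn() for each control-type PDU received from the
   initiator.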
   (Implementation note: To ensure that the initiator has a means to
   inform the target that outstanding PDUs have been retired, the
   target should reserve the last unexpected control-type PDU allowable
   by the value of MaxOutstandingUnexpectedPDUs declared by the
   initiator for sending a NOP-In ping request with TTT != 0xffffffff
   to allow the initiator to return the NOP-Out ping response with the
   current ExpStatSN.)

8.2 Flow Control for RDMA Read Resources

   If iSERHelloRequired is negotiated to "Yes", then the total number
   of RDMA Read operations that can be active simultaneously on an
   iSCSI/iSER connection depends on the amount of resources allocated
   as declared in the iSER Hello exchange described in section 5.1.3.
   Exceeding the number of RDMA Read operations allowed on a connection
   will result in the connection being terminated by the RCaP layer.
   The iSER Layer at the target maintains the iSER-ORD to keep track of
   the maximum number of RDMA Read Requests that can be issued by the
   iSER Layer on a particular RCaP Stream.

   During connection setup (see section 5.1), iSER-IRD is known at the
   initiator and iSER-ORD is known at the target after the iSER Layers
   at the initiator and the target have respectively allocated the
   connection resources necessary to support RCaP, as directed by the
   Allocate_Connection_Resources Operational Primitive from the iSCSI
   Layer before the end of the iSCSI Login Phase. In the full feature
   phase, if iSERHelloRequired is negotiated to "Yes", then the first
   message sent by the initiator is the iSER Hello Message (see section
   9.3) which contains the value of iSER-IRD. In response to the iSER
   Hello Message, the target sends the iSER HelloReply Message (see
   section 9.4) which contains the value of iSER-ORD.
The iSER Layer 2843 at both the initiator and the target MAY adjust (lower) the 2844 resources associated with iSER-IRD and iSER-ORD respectively to 2845 match the iSER-ORD value declared in the HelloReply Message. The 2846 iSER Layer at the target MUST flow control the RDMA Read Request 2847 Messages to not exceed the iSER-ORD value at the target. 2849 If iSERHelloRequired is negotiated to "No", then the maximum number 2850 of RDMA Read operations that can be active is negotiated via other 2851 means outside the scope of this document. For example, in 2852 InfiniBand, iSER connection setup uses InfiniBand CM MADs, with 2853 additional iSER information exchanged in the private data. 2855 8.3 STag Management 2857 An STag is an identifier of a Tagged Buffer used in an RDMA 2858 operation. The allocation and the subsequent invalidation of the 2859 STags are specified in this document if the STags are exposed on the 2860 wire by being Advertised in the iSER header or declared in the 2861 header of an RCaP Message. 2863 8.3.1 Allocation of STags 2865 When the iSCSI Layer at the initiator invokes the Send_Control 2866 Operational Primitive to request the iSER Layer at the initiator to 2867 process a SCSI Command, zero, one, or two STags may be allocated by 2868 the iSER Layer. See section 7.3.1 for details. The number of STags 2869 allocated depends on whether the command is unidirectional or 2870 bidirectional and whether solicited write data transfer is involved 2871 or not. 2873 When the iSCSI Layer at the initiator invokes the Send_Control 2874 Operational Primitive to request the iSER Layer at the initiator to 2875 process a Task Management Function Request with the TASK REASSIGN 2876 function, besides allocating zero, one, or two STags, the iSER Layer 2877 MUST invalidate the existing STags (if any) associated with the ITT. 2878 See section 7.3.3 for details. 
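   The "zero, one, or two STags" rule for a SCSI Command can be
   written as a one-line count. This is only a sketch: the function
   name and its two boolean parameters are hypothetical, and the
   authoritative rules are those of section 7.3.1.

```python
def stags_to_allocate(has_solicited_write_data, has_read_data):
    """Number of STags the initiator-side iSER Layer allocates for one command.

    has_solicited_write_data: the command moves write data that the target
        will solicit (a Write STag is Advertised).
    has_read_data: the command returns read data (a Read STag is Advertised).
    """
    return int(has_solicited_write_data) + int(has_read_data)
```

   Under these assumptions, a write whose data is entirely immediate
   or unsolicited needs no STag, a unidirectional command with RDMA
   data transfer needs one, and a bidirectional command with
   solicited write data needs two.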
2880 The iSER Layer at the target allocates a local Data Sink STag when 2881 the iSCSI Layer at the target invokes the Get_Data Operational 2882 Primitive to request the iSER Layer to process an R2T PDU. See 2883 section 7.3.6 for details. 2885 8.3.2 Invalidation of STags 2887 The invalidation of the STags at the initiator at the completion of 2888 a unidirectional or bidirectional command when the associated SCSI 2889 Response PDU is sent by the target is described in section 7.3.2. 2891 When a unidirectional or bidirectional command concludes without the 2892 associated SCSI Response PDU being sent by the target, the iSCSI 2893 Layer at the initiator MUST request the iSER Layer at the initiator 2894 to invalidate the STags by invoking the Deallocate_Task_Resources 2895 Operational Primitive qualified with ITT. In response, the iSER 2896 Layer at the initiator MUST locate the STags (if any) in the Local 2897 Mapping. The iSER Layer at the initiator MUST invalidate the STags 2898 (if any) and the Local Mapping. 2900 For an RDMA Read operation used to realize a SCSI Write data 2901 transfer, the iSER Layer at the target SHOULD invalidate the Data 2902 Sink STag at the conclusion of the RDMA Read operation referencing 2903 the Data Sink STag (to permit the immediate reuse of buffer 2904 resources). 2906 For an RDMA Write operation used to realize a SCSI Read data 2907 transfer, the Data Source STag at the target is not declared to the 2908 initiator and is not exposed on the wire. Invalidation of the STag 2909 is thus not specified. 2911 When a unidirectional or bidirectional command concludes without the 2912 associated SCSI Response PDU being sent by the target, the iSCSI 2913 Layer at the target MUST request the iSER Layer at the target to 2914 invalidate the STags by invoking the Deallocate_Task_Resources 2915 Operational Primitive qualified with ITT. In response, the iSER 2916 Layer at the target MUST locate the local STags (if any) in the 2917 Local Mapping. 
   The iSER Layer at the target MUST invalidate the local STags (if
   any) and the Local Mapping.

9 iSER Control and Data Transfer

   For iSCSI data-type PDUs (see section 7.1), the iSER Layer uses RDMA
   Read and RDMA Write operations to transfer the solicited data. For
   iSCSI control-type PDUs (see section 7.2), the iSER Layer uses Send
   Messages of RCaP.

9.1 iSER Header Format

   An iSER header MUST be present in every Send Message of RCaP. The
   iSER header is located in the first 28 bytes of the message payload
   of the Send Message of RCaP, as shown in Figure 2.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Opcode|                Opcode Specific Fields                 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Opcode Specific Fields (32 bits)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |               Opcode Specific Fields (64 bits)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Opcode Specific Fields (32 bits)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |               Opcode Specific Fields (64 bits)                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 2 iSER Header Format

   Opcode - Operation Code: 4 bits

      The Opcode field identifies the type of iSER Messages:

         0001b = iSCSI control-type PDU

         0010b = iSER Hello Message

         0011b = iSER HelloReply Message

      All other opcodes are reserved.

9.2 iSER Header Format for iSCSI Control-Type PDU

   The iSER Layer uses Send Messages of RCaP to transfer iSCSI
   control-type PDUs (see section 7.2). The message payload of each of
   the Send Messages of RCaP used for transferring an iSER Message
   contains an iSER Header followed by an iSCSI control-type PDU.
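   As an illustration of the common header in Figure 2, the sketch
   below extracts the 4-bit Opcode from a received Send Message
   payload. It assumes the usual IETF diagram convention that bit 0 is
   the most significant bit of the first byte; the names
   ISER_HEADER_LEN, OPCODES, and iser_opcode are hypothetical.

```python
ISER_HEADER_LEN = 28  # the iSER header occupies the first 28 bytes

# Opcode values defined in section 9.1; all other values are reserved.
OPCODES = {
    0b0001: "iSCSI control-type PDU",
    0b0010: "iSER Hello Message",
    0b0011: "iSER HelloReply Message",
}


def iser_opcode(payload):
    """Return the Opcode of the iSER header at the start of payload."""
    if len(payload) < ISER_HEADER_LEN:
        raise ValueError("payload shorter than the 28-byte iSER header")
    # Assumption: bits 0-3 of the diagram are the high nibble of byte 0.
    opcode = payload[0] >> 4
    if opcode not in OPCODES:
        raise ValueError("reserved iSER opcode %d" % opcode)
    return opcode
```

   A receiver could dispatch on the returned value before interpreting
   the opcode-specific fields described in sections 9.2 through 9.4.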
   The iSER header in a Send Message of RCaP carrying an iSCSI
   control-type PDU MUST have the format as described in Figure 3.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       |W|R|                                                   |
   | 0001b |S|S|                     Reserved                      |
   |       |V|V|                                                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                          Write STag                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                       Write Base Offset                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Read STag                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                       Read Base Offset                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 3 iSER Header Format for iSCSI Control-Type PDU

   WSV - Write STag Valid flag: 1 bit

      This flag indicates the validity of the Write STag field and
      the Write Base Offset field of the iSER Header. If set to one,
      the Write STag field and the Write Base Offset field in this
      iSER Header are valid. If set to zero, the Write STag field
      and the Write Base Offset field in this iSER Header MUST be
      ignored at the receiver. The Write STag Valid flag is set to
      one when there is solicited data to be transferred for a SCSI
      Write or bidirectional command, or when there are non-immediate
      unsolicited and solicited data to be transferred for the
      referenced task specified in a Task Management Function Request
      with the TASK REASSIGN function.

   RSV - Read STag Valid flag: 1 bit

      This flag indicates the validity of the Read STag field and the
      Read Base Offset field of the iSER Header. If set to one, the
      Read STag field and the Read Base Offset field in this iSER
      Header are valid.
If set to zero, the Read STag field and the 3012 Read Base Offset field in this iSER Header MUST be ignored at 3013 the receiver. The Read STag Valid flag is set to one for a 3014 SCSI Read or bidirectional command, or a Task Management 3015 Function Request with the TASK REASSIGN function. 3017 Write STag - Write Steering Tag: 32 bits 3019 This field contains the Write STag when the Write STag Valid 3020 flag is set to one. For a SCSI Write or bidirectional command, 3021 the Write STag is used to Advertise the initiator's I/O Buffer 3022 containing the solicited data. For a Task Management Function 3023 Request with the TASK REASSIGN function, the Write STag is used 3024 to Advertise the initiator's I/O Buffer containing the non- 3025 immediate unsolicited data and solicited data. This Write STag 3026 is used as the Data Source STag in the resultant RDMA Read 3027 operation(s). When the Write STag Valid flag is set to zero, 3028 this field MUST be set to zero and ignored on receive. 3030 Write Base Offset: 64 bits 3032 This field contains the Base Offset associated with the I/O 3033 Buffer for the SCSI Write command when the Write STag Valid 3034 flag is set to one. When the Write STag Valid flag is set to 3035 zero, this field MUST be set to zero and ignored on receive. 3037 Read STag - Read Steering Tag: 32 bits 3039 This field contains the Read STag when the Read STag Valid flag 3040 is set to one. The Read STag is used to Advertise the 3041 initiator's Read I/O Buffer of a SCSI Read or bidirectional 3042 command, or a Task Management Function Request with the TASK 3043 REASSIGN function. This Read STag is used as the Data Sink 3044 STag in the resultant RDMA Write operation(s). When the Read 3045 STag Valid flag is zero, this field MUST be set to zero and 3046 ignored on receive. 3048 Read Base Offset: 64 bits 3050 This field contains the Base Offset associated with the I/O 3051 Buffer for the SCSI Read command when the Read STag Valid flag 3052 is set to one. 
      When the Read STag Valid flag is set to zero, this field MUST
      be set to zero and ignored on receive.

   Reserved:

      Reserved fields MUST be set to zero on transmit and MUST be
      ignored on receive.

9.3 iSER Header Format for iSER Hello Message

   An iSER Hello Message MUST only contain the iSER header which MUST
   have the format as described in Figure 4. If iSERHelloRequired is
   negotiated to "Yes", then iSER Hello Message is the first iSER
   Message sent on the RCaP Stream from the iSER Layer at the initiator
   to the iSER Layer at the target.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       |       |       |       |                               |
   | 0010b | Rsvd  | MaxVer| MinVer|           iSER-IRD            |
   |       |       |       |       |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 4 iSER Header Format for iSER Hello Message

   MaxVer - Maximum Version: 4 bits

      This field specifies the maximum version of the iSER protocol
      supported. It MUST be set to 10 to indicate the version of the
      specification described in this document.

   MinVer - Minimum Version: 4 bits

      This field specifies the minimum version of the iSER protocol
      supported. It MUST be set to 10 to indicate the version of the
      specification described in this document.

   iSER-IRD: 16 bits

      This field contains the value of the iSER-IRD at the initiator.

   Reserved (Rsvd):

      Reserved fields MUST be set to zero on transmit, and MUST be
      ignored on receive.
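   The Hello Message layout in Figure 4 can be sketched as a packing
   routine. This is a minimal illustration under two assumptions: bit
   0 of the diagram is the most significant bit of the first byte, and
   the 16-bit iSER-IRD field is carried in network byte order. The
   names pack_hello, ISER_HELLO_OPCODE, and ISER_VERSION are
   hypothetical.

```python
import struct

ISER_HELLO_OPCODE = 0b0010
ISER_VERSION = 10  # value required for MaxVer and MinVer by this document


def pack_hello(iser_ird, max_ver=ISER_VERSION, min_ver=ISER_VERSION):
    """Build the 28-byte iSER Hello Message (header only, no payload)."""
    if not 0 <= iser_ird <= 0xFFFF:
        raise ValueError("iSER-IRD must fit in 16 bits")
    byte0 = ISER_HELLO_OPCODE << 4       # Opcode in bits 0-3, Rsvd bits zero
    byte1 = (max_ver << 4) | min_ver     # MaxVer in bits 8-11, MinVer in 12-15
    # Assumption: iSER-IRD is big-endian; the remaining 24 bytes are Reserved.
    return struct.pack("!BBH", byte0, byte1, iser_ird) + bytes(24)
```

   For example, pack_hello(8) yields a header whose first four bytes
   are 0x20 0xAA 0x00 0x08, followed by 24 zero bytes.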
9.4 iSER Header Format for iSER HelloReply Message

   An iSER HelloReply Message MUST only contain the iSER header which
   MUST have the format as described in Figure 5. If iSERHelloRequired
   is negotiated to "Yes", then the iSER HelloReply Message is the
   first iSER Message sent on the RCaP Stream from the iSER Layer at
   the target to the iSER Layer at the initiator.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       |     |R|       |       |                               |
   | 0011b |Rsvd |E| MaxVer| CurVer|           iSER-ORD            |
   |       |     |J|       |       |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                           Reserved                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Figure 5 iSER Header Format for iSER HelloReply Message

   REJ - Reject flag: 1 bit

      This flag indicates whether the target is rejecting this
      connection. If set to one, the target is rejecting the
      connection.

   MaxVer - Maximum Version: 4 bits

      This field specifies the maximum version of the iSER protocol
      supported. It MUST be set to 10 to indicate the version of the
      specification described in this document.

   CurVer - Current Version: 4 bits

      This field specifies the current version of the iSER protocol
      supported. It MUST be set to 10 to indicate the version of the
      specification described in this document.

   iSER-ORD: 16 bits

      This field contains the value of the iSER-ORD at the target.

   Reserved (Rsvd):

      Reserved fields MUST be set to zero on transmit, and MUST be
      ignored on receive.
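   A receiver-side view of Figure 5 can be sketched as follows. The
   sketch assumes that bit 0 of the diagram is the most significant
   bit of the first byte (so REJ, bit 7, is the least significant bit
   of that byte) and that iSER-ORD is in network byte order. The names
   HelloReply and parse_hello_reply are hypothetical.

```python
import struct
from typing import NamedTuple


class HelloReply(NamedTuple):
    rej: bool       # True when the target is rejecting the connection
    max_ver: int
    cur_ver: int
    iser_ord: int


def parse_hello_reply(message):
    """Decode a received iSER HelloReply Message (exactly one iSER header)."""
    # A HelloReply contains only the 28-byte header; any other length is
    # treated here as a format error.
    if len(message) != 28:
        raise ValueError("length error for iSER HelloReply Message")
    byte0, byte1, iser_ord = struct.unpack("!BBH", message[:4])
    if byte0 >> 4 != 0b0011:
        raise ValueError("not an iSER HelloReply Message")
    return HelloReply(
        rej=bool(byte0 & 0x01),   # REJ is bit 7 of the first byte
        max_ver=byte1 >> 4,
        cur_ver=byte1 & 0x0F,
        iser_ord=iser_ord,
    )
```

   Reserved bits are deliberately not checked, since they are to be
   ignored on receive.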
3163 9.5 SCSI Data Transfer Operations 3165 The iSER Layer at the initiator and the iSER Layer at the target 3166 handle each SCSI Write, SCSI Read, and bidirectional operation as 3167 described below. 3169 9.5.1 SCSI Write Operation 3171 The iSCSI Layer at the initiator MUST invoke the Send_Control 3172 Operational Primitive to request the iSER Layer at the initiator to 3173 send the SCSI Write Command. The iSER Layer at the initiator MUST 3174 request the RCaP layer to transmit a Send Message with the message 3175 payload consisting of the iSER header followed by the SCSI Command 3176 PDU and immediate data (if any). The SendSE Message should be used 3177 if supported by the RCaP layer (e.g., iWARP). If there is solicited 3178 data, the iSER Layer MUST Advertise the Write STag and the Base 3179 Offset in the iSER header of the Send Message, as described in 3180 section 9.2. Upon receiving the Send Message, the iSER Layer at the 3181 target MUST notify the iSCSI Layer at the target by invoking the 3182 Control_Notify Operational Primitive qualified with the SCSI Command 3183 PDU. See section 7.3.1 for details on the handling of the SCSI 3184 Write Command. 3186 For the non-immediate unsolicited data, the iSCSI Layer at the 3187 initiator MUST invoke a Send_Control Operational Primitive qualified 3188 with the SCSI Data-out PDU. Upon receiving each Send Message 3189 containing the non-immediate unsolicited data, the iSER Layer at the 3190 target MUST notify the iSCSI Layer at the target by invoking the 3191 Control_Notify Operational Primitive qualified with the SCSI Data- 3192 out PDU. See section 7.3.4 for details on the handling of the SCSI 3193 Data-out PDU. 3195 For the solicited data, when the iSCSI Layer at the target has an 3196 I/O Buffer available, it MUST invoke the Get_Data Operational 3197 Primitive qualified with the R2T PDU. See section 7.3.6 for details 3198 on the handling of the R2T PDU. 
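   The iSER header that precedes the SCSI Command PDU in the Send
   Message (section 9.2, Figure 3) can be sketched as below. The
   sketch assumes that bit 0 of the diagram is the most significant
   bit of the first byte and that multi-byte fields are in network
   byte order; the function name pack_control_header and its
   (STag, Base Offset) tuple arguments are hypothetical.

```python
import struct


def pack_control_header(write=None, read=None):
    """Build the 28-byte iSER header for an iSCSI control-type PDU.

    write, read: optional (STag, Base Offset) tuples to Advertise.
    When a tuple is omitted, the valid flag is cleared and the
    corresponding STag and Base Offset fields are set to zero.
    """
    wsv = write is not None
    rsv = read is not None
    # Opcode 0001b in bits 0-3, WSV in bit 4, RSV in bit 5, rest Reserved.
    byte0 = (0b0001 << 4) | (int(wsv) << 3) | (int(rsv) << 2)
    w_stag, w_off = write if wsv else (0, 0)
    r_stag, r_off = read if rsv else (0, 0)
    # Layout: 1 byte flags, 3 Reserved bytes, 32-bit STag and 64-bit Base
    # Offset for write, then the same pair for read (28 bytes in total).
    return struct.pack("!B3xIQIQ", byte0, w_stag, w_off, r_stag, r_off)
```

   For a SCSI Write with solicited data, only the write tuple would be
   supplied; for a SCSI Read, only the read tuple; for a bidirectional
   command, both.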
3200 When the data transfer associated with this SCSI Write operation is 3201 complete, the iSCSI Layer at the target MUST invoke the Send_Control 3202 Operational Primitive when it is ready to send the SCSI Response 3203 PDU. Upon receiving a Send Message containing the SCSI Response 3204 PDU, the iSER Layer at the initiator MUST notify the iSCSI Layer at 3205 the initiator by invoking the Control_Notify Operational Primitive 3206 qualified with the SCSI Response PDU. See section 7.3.2 for details 3207 on the handling of the SCSI Response PDU. 3209 9.5.2 SCSI Read Operation 3211 The iSCSI Layer at the initiator MUST invoke the Send_Control 3212 Operational Primitive to request the iSER Layer at the initiator to 3213 send the SCSI Read Command. The iSER Layer at the initiator MUST 3214 request the RCaP layer to transmit a Send Message with the message 3215 payload consisting of the iSER header followed by the SCSI Command 3216 PDU. The SendSE Message should be used if supported by the RCaP 3217 layer (e.g., iWARP). The iSER Layer at the initiator MUST Advertise 3218 the Read STag and the Base Offset in the iSER header of the Send 3219 Message, as described in section 9.2. Upon receiving the Send 3220 Message, the iSER Layer at the target MUST notify the iSCSI Layer at 3221 the target by invoking the Control_Notify Operational Primitive 3222 qualified with the SCSI Command PDU. See section 7.3.1 for details 3223 on the handling of the SCSI Read Command. 3225 When the requested SCSI data is available in the I/O Buffer, the 3226 iSCSI Layer at the target MUST invoke the Put_Data Operational 3227 Primitive qualified with the SCSI Data-in PDU. See section 7.3.5 3228 for details on the handling of the SCSI Data-in PDU. 3230 When the data transfer associated with this SCSI Read operation is 3231 complete, the iSCSI Layer at the target MUST invoke the Send_Control 3232 Operational Primitive when it is ready to send the SCSI Response 3233 PDU. 
The SendInvSE Message should be used if supported by the RCaP 3234 layer (e.g., iWARP). Upon receiving the Send Message containing the 3235 SCSI Response PDU, the iSER Layer at the initiator MUST notify the 3236 iSCSI Layer at the initiator by invoking the Control_Notify 3237 Operational Primitive qualified with the SCSI Response PDU. See 3238 section 7.3.2 for details on the handling of the SCSI Response PDU. 3240 9.5.3 Bidirectional Operation 3242 The initiator and the target handle the SCSI Write and the SCSI Read 3243 portions of this bidirectional operation the same as described in 3244 Section 9.5.1 and Section 9.5.2 respectively. 3246 10 iSER Error Handling and Recovery 3248 RCaP provides the iSER Layer with reliable in-order delivery. 3249 Therefore, the error management needs of an iSER-assisted connection 3250 are somewhat different than those of a Traditional iSCSI connection. 3252 10.1 Error Handling 3254 iSER error handling is described in the following sections, 3255 classified loosely based on the sources of errors: 3257 1. Those originating at the transport layer (e.g., TCP). 3259 2. Those originating at the RCaP layer. 3261 3. Those originating at the iSER Layer. 3263 4. Those originating at the iSCSI Layer. 3265 10.1.1 Errors in the Transport Layer 3267 If the transport layer is TCP, then TCP packets with detected errors 3268 are silently dropped by the TCP layer and result in retransmission 3269 at the TCP layer. This has no impact on the iSER Layer. However, 3270 connection loss (e.g., link failure) and unexpected termination 3271 (e.g., TCP graceful or abnormal close without the iSCSI Logout 3272 exchanges) at the transport layer will cause the iSCSI/iSER 3273 connection to be terminated as well. 
3275 10.1.1.1 Failure in the Transport Layer Before RCaP Mode is Enabled 3277 If the Connection is lost or terminated before the iSCSI Layer 3278 invokes the Allocate_Connection_Resources Operational Primitive, the 3279 login process is terminated and no further action is required. 3281 If the Connection is lost or terminated after the iSCSI Layer has 3282 invoked the Allocate_Connection_Resources Operational Primitive, 3283 then the iSCSI Layer MUST request the iSER Layer to deallocate all 3284 connection resources by invoking the Deallocate_Connection_Resources 3285 Operational Primitive. 3287 10.1.1.2 Failure in the Transport Layer After RCaP Mode is Enabled 3289 If the Connection is lost or terminated after the iSCSI Layer has 3290 invoked the Enable_Datamover Operational Primitive, the iSER Layer 3291 MUST notify the iSCSI Layer of the connection loss by invoking the 3292 Connection_Terminate_Notify Operational Primitive. Prior to 3293 invoking the Connection_Terminate_Notify Operational Primitive, the 3294 iSER layer MUST perform the actions described in Section 5.2.3.2. 3296 10.1.2 Errors in the RCaP Layer 3298 The RCaP layer does not have error recovery operations built in. If 3299 errors are detected at the RCaP layer, the RCaP layer will terminate 3300 the RCaP Stream and the associated Connection. 3302 10.1.2.1 Errors Detected in the Local RCaP Layer 3304 If an error is encountered at the local RCaP layer, the RCaP layer 3305 MAY send a Send Message to the Remote Peer to report the error if 3306 possible. (For iWARP, see [RDMAP] for the list of errors where a 3307 Terminate Message is sent.) The RCaP layer is responsible for 3308 terminating the Connection. After the RCaP layer notifies the iSER 3309 Layer that the Connection is terminated, the iSER Layer MUST notify 3310 the iSCSI Layer by invoking the Connection_Terminate_Notify 3311 Operational Primitive. 
   Prior to invoking the Connection_Terminate_Notify Operational
   Primitive, the iSER layer MUST perform the actions described in
   Section 5.2.3.2.

10.1.2.2 Errors Detected in the RCaP Layer at the Remote Peer

   If an error is encountered at the RCaP layer at the Remote Peer, the
   RCaP layer at the Remote Peer may send a Send Message to report the
   error if possible. If it is unable to send a Send Message, the
   Connection is terminated. This is treated the same as a failure in
   the transport layer after RCaP mode is enabled as described in
   section 10.1.1.2.

   If an error is encountered at the RCaP layer at the Remote Peer and
   it is able to send a Send Message, the RCaP layer at the Remote Peer
   is responsible for terminating the connection. After the local RCaP
   layer notifies the iSER Layer that the Connection is terminated, the
   iSER Layer MUST notify the iSCSI Layer by invoking the
   Connection_Terminate_Notify Operational Primitive. Prior to
   invoking the Connection_Terminate_Notify Operational Primitive, the
   iSER layer MUST perform the actions described in Section 5.2.3.2.

10.1.3 Errors in the iSER Layer

   The error handling due to errors at the iSER Layer is described in
   the following sections.

10.1.3.1 Insufficient Connection Resources to Support RCaP at
         Connection Setup

   After the iSCSI Layer at the initiator invokes the
   Allocate_Connection_Resources Operational Primitive during the iSCSI
   login negotiation phase, if the iSER Layer at the initiator fails to
   allocate the connection resources necessary to support RCaP, it MUST
   return a status of failure to the iSCSI Layer at the initiator. The
   iSCSI Layer at the initiator MUST terminate the Connection as
   described in Section 5.2.3.1.
   After the iSCSI Layer at the target invokes the
   Allocate_Connection_Resources Operational Primitive during the iSCSI
   login negotiation phase, if the iSER Layer at the target fails to
   allocate the connection resources necessary to support RCaP, it MUST
   return a status of failure to the iSCSI Layer at the target. The
   iSCSI Layer at the target MUST send a Login Response with a status
   class of 3 (Target Error), and a status code of "0302" (Out of
   Resources). The iSCSI Layers at the initiator and the target MUST
   terminate the Connection as described in Section 5.2.3.1.

10.1.3.2 iSER Negotiation Failures

   If iSERHelloRequired is negotiated to "Yes" and the RCaP or iSER
   related parameters declared by the initiator in the iSER Hello
   Message are unacceptable to the iSER Layer at the target, the iSER
   Layer at the target MUST set the Reject (REJ) flag, as described in
   section 9.4, in the iSER HelloReply Message. The following are the
   cases when the iSER Layer MUST set the REJ flag to 1 in the
   HelloReply Message:

   * The initiator-declared iSER-IRD value is greater than 0 and the
     target-declared iSER-ORD value is 0.

   * The initiator-supported and the target-supported iSER protocol
     versions do not overlap.

   After requesting the RCaP layer to send the iSER HelloReply Message,
   the handling of the error situation is the same as that for iSER
   format errors as described in section 10.1.3.3.

10.1.3.3 iSER Format Errors

   The following types of errors in an iSER header are considered
   format errors:

   * Illegal contents of any iSER header field

   * Inconsistent field contents in an iSER header

   * Length error for an iSER Hello or HelloReply Message (see
     sections 9.3 and 9.4)

   When a format error is detected, the following events MUST occur in
   the specified sequence:

   1.
The iSER Layer MUST request the RCaP layer to terminate the RCaP 3394 Stream. The RCaP layer MUST terminate the associated 3395 Connection. 3397 2. The iSER Layer MUST notify the iSCSI Layer of the connection 3398 termination by invoking the Connection_Terminate_Notify 3399 Operational Primitive. Prior to invoking the 3400 Connection_Terminate_Notify Operational Primitive, the iSER 3401 layer MUST perform the actions described in Section 5.2.3.2. 3403 10.1.3.4 iSER Protocol Errors 3405 If iSERHelloRequired is negotiated to "Yes", then the first iSER 3406 Message sent by the iSER Layer at the initiator MUST be the iSER 3407 Hello Message (see section 9.3). In this case the first iSER 3408 Message sent by the iSER Layer at the target MUST be the iSER 3409 HelloReply Message (see section 9.4). Failure to send the iSER 3410 Hello or HelloReply Message, as indicated by the wrong Opcode in the 3411 iSER header, is a protocol error. Conversely, if the iSER Hello 3412 Message is sent by the iSER Layer at the initiator when 3413 iSERHelloRequired is negotiated to "No", the iSER Layer at the 3414 target MAY treat this as a protocol error or respond with an iSER 3415 HelloReply Message. The handling of iSER protocol errors is the 3416 same as that for iSER format errors as described in section 3417 10.1.3.3. 3419 If the sending side of an iSER-enabled connection acts in a manner 3420 not permitted by the negotiated or declared login/text operational 3421 key values as described in section 6, this is a protocol error and 3422 the receiving side MAY handle this the same as for iSER format 3423 errors as described in section 10.1.3.3. 3425 10.1.4 Errors in the iSCSI Layer 3427 The error handling due to errors at the iSCSI Layer is described in 3428 the following sections. For error recovery, see section 10.2. 
3430 10.1.4.1 iSCSI Format Errors 3432 When an iSCSI format error is detected, the iSCSI Layer MUST request 3433 the iSER Layer to terminate the RCaP Stream by invoking the 3434 Connection_Terminate Operational Primitive. For more details on the 3435 connection termination, see Section 5.2.3.1. 3437 10.1.4.2 iSCSI Digest Errors 3439 In the iSER-assisted mode, the iSCSI Layer will not see any digest 3440 error because both the HeaderDigest and the DataDigest keys are 3441 negotiated to "None". 3443 10.1.4.3 iSCSI Sequence Errors 3445 For Traditional iSCSI, sequence errors are caused by dropped PDUs 3446 due to header or data digest errors. Since digests are not used in 3447 iSER-assisted mode and the RCaP layer will deliver all messages in 3448 the order they were sent, sequence errors will not occur in iSER- 3449 assisted mode. 3451 10.1.4.4 iSCSI Protocol Error 3453 When the iSCSI Layer handles certain protocol errors by dropping the 3454 connection, the error handling is the same as that for iSCSI format 3455 errors as described in section 10.1.4.1. 3457 When the iSCSI Layer uses the iSCSI Reject PDU and response codes to 3458 handle certain other protocol errors, no special handling at the 3459 iSER Layer is required. 3461 10.1.4.5 SCSI Timeouts and Session Errors 3463 This is handled at the iSCSI Layer and no special handling at the 3464 iSER Layer is required. 3466 10.1.4.6 iSCSI Negotiation Failures 3468 For negotiation failures that happen during the Login Phase at the 3469 initiator after the iSCSI Layer has invoked the 3470 Allocate_Connection_Resources Operational Primitive and before the 3471 Enable_Datamover Operational Primitive has been invoked, the iSCSI 3472 Layer MUST request the iSER Layer to deallocate all connection 3473 resources by invoking the Deallocate_Connection_Resources 3474 Operational Primitive. The iSCSI Layer at the initiator MUST 3475 terminate the Connection. 
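The initiator-side cleanup rule in the paragraph above can be sketched as a small state tracker. The primitive names come from the text; the class, its fields, and the action strings are hypothetical. The branch for a failure that occurs before any resources were allocated is an assumption of this sketch and is not stated in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InitiatorLoginState:
    """Hypothetical record of which primitives the iSCSI Layer has invoked."""

    resources_allocated: bool = False  # Allocate_Connection_Resources invoked
    datamover_enabled: bool = False    # Enable_Datamover invoked
    actions: List[str] = field(default_factory=list)

    def on_negotiation_failure(self) -> List[str]:
        # Resources allocated but the datamover not yet enabled: the
        # iSCSI Layer must ask the iSER Layer to release them first.
        if self.resources_allocated and not self.datamover_enabled:
            self.actions.append("Deallocate_Connection_Resources")
            self.resources_allocated = False
        # Assumption: the Connection is terminated in every failure case.
        self.actions.append("terminate-connection")
        return self.actions
```

For example, a state with resources_allocated=True yields the deallocation followed by the termination, while a fresh state yields only the termination.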
3477 For negotiation failures during the Login Phase at the target, the 3478 iSCSI Layer can use a Login Response with a status class other than 3479 0 (Success) to terminate the Login Phase. If the iSCSI Layer has 3480 invoked the Allocate_Connection_Resources Operational Primitive but 3481 has not yet invoked the Enable_Datamover Operational Primitive, 3482 the iSCSI Layer at the target MUST request the iSER Layer at the 3483 target to deallocate all connection resources by invoking the 3484 Deallocate_Connection_Resources Operational Primitive. The iSCSI 3485 Layer at both the initiator and the target MUST terminate the 3486 Connection. 3488 During the iSCSI Login Phase, if the iSCSI Layer at the initiator 3489 receives a Login Response from the target with a status class other 3490 than 0 (Success) after the iSCSI Layer at the initiator has invoked 3491 the Allocate_Connection_Resources Operational Primitive, the iSCSI 3492 Layer MUST request the iSER Layer to deallocate all connection 3493 resources by invoking the Deallocate_Connection_Resources 3494 Operational Primitive. The iSCSI Layer MUST terminate the 3495 Connection in this case. 3497 For negotiation failures during the full feature phase, the error 3498 handling is left to the iSCSI Layer and no special handling at the 3499 iSER Layer is required. 3501 10.2 Error Recovery 3503 Error recovery requirements of iSCSI/iSER are the same as those of 3504 Traditional iSCSI. All three ErrorRecoveryLevels as defined in 3505 [iSCSI] are supported in iSCSI/iSER. 3507 * For ErrorRecoveryLevel 0, session recovery is handled by iSCSI 3508 and no special handling by the iSER Layer is required. 3510 * For ErrorRecoveryLevel 1, see section 10.2.1 on PDU Recovery. 3512 * For ErrorRecoveryLevel 2, see section 10.2.2 on Connection 3513 Recovery.
3515 The iSCSI Layer may invoke the Notice_Key_Values Operational 3516 Primitive during connection setup to request the iSER Layer to take 3517 note of the value of the operational ErrorRecoveryLevel, as 3518 described in sections 5.1.1 and 5.1.2. 3520 10.2.1 PDU Recovery 3522 As described in sections 10.1.4.2 and 10.1.4.3, digest and sequence 3523 errors will not occur in the iSER-assisted mode. If the RCaP layer 3524 detects an error, it will close the iSCSI/iSER connection, as 3525 described in section 10.1.2. Therefore, PDU recovery is not useful 3526 in the iSER-assisted mode. 3528 The iSCSI Layer at the initiator SHOULD disable iSCSI timeout-driven 3529 PDU retransmissions. 3531 10.2.2 Connection Recovery 3533 The iSCSI Layer at the initiator MAY reassign connection allegiance 3534 for non-immediate commands that are still in progress and are 3535 associated with the failed connection by using a Task Management 3536 Function Request with the TASK REASSIGN function. See section 7.3.3 3537 for more details. 3539 When the iSCSI Layer at the initiator does a task reassignment for a 3540 SCSI Write command, it MUST qualify the Send_Control Operational 3541 Primitive invocation with DataDescriptorOut, which defines the I/O 3542 Buffer for both the non-immediate unsolicited data and the solicited 3543 data. This allows the iSCSI Layer at the target to use recovery 3544 R2Ts to request data originally sent as unsolicited and 3545 solicited from the initiator. 3547 When the iSCSI Layer at the target accepts a reassignment request 3548 for a SCSI Read command, it MUST request the iSER Layer to process 3549 SCSI Data-in for all unacknowledged data by invoking the Put_Data 3550 Operational Primitive. See section 7.3.5 on the handling of SCSI 3551 Data-in.
3553 When the iSCSI Layer at the target accepts a reassignment request 3554 for a SCSI Write command, it MUST request the iSER Layer to process 3555 a recovery R2T for any non-immediate unsolicited data and any 3556 solicited data sequences that have not been received by invoking the 3557 Get_Data Operational Primitive. See section 7.3.6 on the handling 3558 of Ready To Transfer (R2T). 3560 The iSCSI Layer at the target MUST NOT issue recovery R2Ts on an 3561 iSCSI/iSER connection for a task for which the connection allegiance 3562 was never reassigned. The iSER Layer at the target MAY reject such 3563 a recovery R2T received via the Get_Data Operational Primitive 3564 invocation from the iSCSI Layer at the target, with an appropriate 3565 error code. 3567 The iSER Layer at the target will process the requests invoked by 3568 the Put_Data and Get_Data Operational Primitives for a reassigned 3569 task in the same way as for the original commands. 3571 11 Security Considerations 3573 When iSER is layered on top of an RCaP layer and provides the RDMA 3574 extensions to the iSCSI protocol, the security considerations of 3575 iSER are the same as those of the underlying RCaP layer. For iWARP, 3576 these are described in [RDMAP] and [RDDPSEC], plus the updates to both 3577 of those RFCs that are contained in [IPS-IPSEC]. 3579 Since iSER-assisted iSCSI protocol is still functionally iSCSI from 3580 a security considerations perspective, all of the iSCSI security 3581 requirements as described in [iSCSI] apply. If iSER is layered on 3582 top of a non-IP based RCaP layer, all the security protocol 3583 mechanisms applicable to that RCaP layer are also applicable to an 3584 iSCSI/iSER connection.
If iSER is layered on top of a non-IP 3585 protocol, the IPsec mechanism as specified in [iSCSI] MUST be 3586 implemented at any point where the iSER protocol enters the IP 3587 network (e.g., via gateways), and the non-IP protocol SHOULD 3588 implement (optional to use) a packet-by-packet security protocol 3589 equal in strength to the IPsec mechanism specified by [iSCSI]. 3591 In order to protect target RCaP connection resources from possible 3592 resource exhaustion attacks, allocation of such resources for a new 3593 connection MUST be delayed until it is reasonably certain that the 3594 new connection is not part of a resource exhaustion attack (e.g., 3595 until after the SecurityNegotiation stage of Login); see section 3596 5.1.2. 3598 A valid STag exposes I/O Buffer resources to the network for access 3599 via the RCaP. The security measures for the RCaP and iSER described 3600 in the above paragraphs can be used to protect data in an I/O buffer 3601 from undesired disclosure or modification, and these measures are of 3602 heightened importance for implementations that retain (e.g., cache) 3603 STags for use in multiple tasks (e.g., iSCSI I/O operations) because 3604 the resources are exposed to the network for a longer period of 3605 time. 3607 A complementary means of controlling I/O Buffer resource exposure is 3608 invalidation of the STag after completion of the associated task, 3609 which is RECOMMENDED in Section 2.5.1. The use of Send with 3610 Invalidate messages (which cause remote STag invalidation) is 3611 OPTIONAL; therefore, the iSER layer MUST NOT rely on use of a Send 3612 with Invalidate by its Remote Peer to cause local STag invalidation. 3613 If an STag is expected to be invalid after completion of a task, the 3614 iSER layer MUST check the STag and invalidate it if it is still 3615 valid.
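The check-then-invalidate rule in the paragraph above can be sketched as follows. This is an illustration only: the registry class, its method names, and the use of plain integers as STags are hypothetical, and a real implementation would invalidate through its RCaP provider rather than a Python set.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class StagRegistry:
    """Hypothetical local record of the STags that are currently valid."""

    valid: Set[int] = field(default_factory=set)

    def advertise(self, stag: int) -> None:
        # The STag becomes valid and exposes an I/O Buffer to the network.
        self.valid.add(stag)

    def on_send_with_invalidate(self, stag: int) -> None:
        # The Remote Peer chose to use Send with Invalidate (OPTIONAL),
        # so the STag has already been invalidated remotely.
        self.valid.discard(stag)

    def on_task_complete(self, stag: int) -> bool:
        """Check the STag at task completion; invalidate it if still valid.

        Returns True when local invalidation had to be performed.
        """
        if stag in self.valid:
            self.valid.discard(stag)
            return True
        return False
```

The point of the sketch is that on_task_complete never assumes the Remote Peer invalidated the STag; it always checks, which mirrors the MUST in the text.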
3617 12 IANA Considerations 3619 IANA is requested to add the following entries to the "iSCSI 3620 Login/Text Keys" registry of "iSCSI Parameters": 3622 MaxAHSLength, [RFCXXXX] 3624 TaggedBufferForSolicitedDataOnly, [RFCXXXX] 3626 iSERHelloRequired, [RFCXXXX] 3628 RFC Editor: Please replace XXXX in all instances of [RFCXXXX] above 3629 with the RFC number of this document and remove this note. 3631 IANA is requested to update the following entries in the "iSCSI 3632 Login/Text Keys" registry of "iSCSI Parameters" to reference the RFC 3633 number of this draft when it is published as an RFC. 3635 InitiatorRecvDataSegmentLength 3637 MaxOutstandingUnexpectedPDUs 3639 RDMAExtensions 3641 TargetRecvDataSegmentLength 3643 IANA is also requested to change the RFC5046 reference for the iSCSI 3644 Login/Text Keys registry to the RFC number of this document. 3646 IANA is requested to update the registrations of the iSER Opcodes 1- 3647 3 in the iSER Opcodes registry to reference the RFC number of this 3648 draft when it is published as an RFC. 3650 13 References 3652 13.1 Normative References 3654 [RFC5046] M. Ko et al., "iSCSI Extensions for Remote Direct Memory 3655 Access", RFC 5046, October 2007 3657 [iSCSI] Chadalapaka et al., "iSCSI Protocol (Consolidated)", draft- 3658 ietf-storm-iscsi-cons-08.txt (work in progress), January 2013 3660 [RDMAP] R. Recio et al., "An RDMA Protocol Specification", RFC 5040, 3661 October 2007 3663 [DDP] H. Shah et al., "Direct Data Placement over Reliable 3664 Transports", RFC 5041, October 2007 3666 [MPA] P. Culley et al., "Marker PDU Aligned Framing for TCP 3667 Specification", RFC 5044, October 2007 3669 [RDDPSEC] J. Pinkerton et al., "DDP/RDMAP Security", RFC 5042, 3670 October 2007 3672 [TCP] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, 3673 September 1981 3675 [RFC2119] Bradner, S., "Key Words for use in RFCs to Indicate 3676 Requirement Levels", BCP 14, RFC 2119, March 1997 3678 [IPS-IPSEC] D.
Black et al., "IP Storage: IPsec Requirements Update 3679 for IPsec v3", draft-ietf-storm-ipsec-ips-update-00 (work in 3680 progress), June 2013 3682 13.2 Informative References 3684 [SAM5] T10/2104D rev r04, SCSI Architecture Model - 5 (SAM-5), 3685 Committee Draft. 3687 [iSCSI-SAM] F. Knight et al., "Internet Small Computer Systems 3688 Interface (iSCSI) SCSI Architecture Features Update", draft- 3689 ietf-storm-iscsi-sam-04.txt (work in progress), August 2011 3691 [DA] M. Chadalapaka et al., "Datamover Architecture for iSCSI", RFC 3692 5047, October 2007 3694 [IB] InfiniBand Architecture Specification Volume 1 Release 1.2, 3695 October 2004 3697 [IPoIB] H.K. Chu et al., "Transmission of IP over InfiniBand", RFC 3698 4391, March 2006 3700 14 Appendix A: Summary of Changes from RFC 5046 3702 All changes are backward compatible with RFC 5046 except for item #8, 3703 which reflects all known implementations of iSER, each of which has 3704 implemented this change, despite its absence in RFC 5046. As a 3705 result, a hypothetical implementation based on RFC 5046 will not 3706 interoperate with an implementation based on this version of the 3707 specification. 3709 1. Removed the requirement that a connection be opened in "normal" 3710 TCP mode and transitioned to zero-copy mode. This allows the spec 3711 to conform to existing implementations for both InfiniBand and 3712 iWARP. Changes were made in sections 2, 3.1.6, 4.2, 5.1, 5.1.1, 3713 5.1.2, 5.1.3, 10.1.3.4, and 11. 3715 2. Added a clause in section 6.2 to clarify that 3716 MaxRecvDataSegmentLength must be ignored if it is declared in the 3717 Login Phase. 3719 3. Added a clause in section 6.2 to clarify that the initiator must 3720 not send more than InitiatorMaxRecvDataSegmentLength worth of data 3721 when a NOP-Out request is sent with a valid Initiator Task Tag.
3722 Since InitiatorMaxRecvDataSegmentLength can be smaller than 3723 TargetMaxRecvDataSegmentLength, returning the original data in the 3724 NOP-Out request in this situation can overflow the receive buffer 3725 unless the length of the data sent with the NOP-Out request is 3726 less than InitiatorMaxRecvDataSegmentLength. 3728 4. Added a SHOULD negotiate recommendation for 3729 MaxOutstandingUnexpectedPDUs in section 6.7. 3731 5. Added MaxAHSLength key in section 6.8 to set a limit on the AHS 3732 Length. This is useful when posting receive buffers, because it 3733 bounds the maximum possible message length of a PDU that 3734 contains an AHS. 3736 6. Added TaggedBufferForSolicitedDataOnly key in section 6.9 to 3737 indicate how the memory region will be used. An initiator can 3738 treat the memory regions intended for unsolicited and solicited 3739 data differently, and can use different registration modes. In 3740 contrast, RFC 5046 treats the memory occupied by the data as a 3741 contiguous (or virtually contiguous, by means of scatter-gather 3742 mechanisms) and homogeneous region. Adding a new key will allow 3743 different memory models to be accommodated. Changes were also 3744 made in section 7.3.1. 3746 7. Added iSERHelloRequired key in section 6.10 to allow an initiator 3747 to allocate connection resources after the login process by 3748 requiring the use of the iSER Hello messages before sending iSCSI 3749 PDUs. The default is "No" since iSER Hello messages have not been 3750 implemented and are not in use. Changes were made in sections 3751 5.1.1, 5.1.2, 5.1.3, 8.2, 9.3, 9.4, 10.1.3.2 and 10.1.3.4. 3753 8. Added two 64-bit fields in iSER header in section 9.2 for the Read 3754 Base Offset and the Write Base Offset to accommodate a non-zero 3755 Base Offset. This allows one implementation such as the OFED 3756 stack to be used in both the InfiniBand and the iWARP environment.
3757 Changes were made in the definition of Base Offset, Advertisement, 3758 and Tagged Buffer. Changes were also made in sections 2.4.1, 2.5, 3759 2.6, 7.3.1, 7.3.3, 7.3.5, 7.3.6, 9.1, 9.3, 9.4, 9.5.1, and 9.5.2. 3760 This change is not backward compatible with RFC 5046, but is part 3761 of all known implementations of iSER at the time this document was 3762 developed. 3764 9. Removed iWARP-specific behavior. Changes were made in the 3765 definition section on RDMA Operation and Send Message Type. 3766 Clarifications were added in section 2.4.2 on the use of SendSE 3767 and SendInvSE. These clarifications reflect a removal of the 3768 requirements in RFC 5046 for the use of these messages, as 3769 implementations have not followed RFC 5046 in this area. Changes 3770 affecting Send with Invalidate were made in sections 2.4.1, 2.5, 3771 2.6, 4.1, and 7.3.2. Changes affecting Terminate were made in 3772 sections 10.1.2.1 and 10.1.2.2. Changes were made in section 15 3773 to remove iWARP headers. 3775 10. Removed denial of service descriptions for the initiator in 3776 section 5.1.1 since they are applicable to the target only. 3778 11. Clarified in section 2.4.1 that STag invalidation is the 3779 initiator's responsibility for security reasons, and the initiator 3780 cannot rely on the target using an Invalidate version of Send. 3781 Added text in section 11 on STag invalidation. 3783 15 Appendix B: Message Format for iSER 3785 This section is for information only and is NOT part of the 3786 standard. 3788 15.1 iWARP Message Format for iSER Hello Message 3790 The following figure depicts an iSER Hello Message encapsulated in 3791 an iWARP SendSE Message.
3793    0                   1                   2                   3
3794    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
3795   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3796   |          MPA Header           |  DDP Control  | RDMA Control  |
3797   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3798   |                           Reserved                            |
3799   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3800   |                      (Send) Queue Number                      |
3801   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3802   |                (Send) Message Sequence Number                 |
3803   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3804   |                     (Send) Message Offset                     |
3805   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3806   | 0010b | Zeros | 0001b | 0001b |           iSER-IRD            |
3807   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3808   |                           All Zeros                           |
3809   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3810   |                                                               |
3811   |                           All Zeros                           |
3812   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3813   |                           All Zeros                           |
3814   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3815   |                                                               |
3816   |                           All Zeros                           |
3817   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3818   |                            MPA CRC                            |
3819   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3820   Figure 6 SendSE Message containing an iSER Hello Message

3822   15.2 iWARP Message Format for iSER HelloReply Message

3824   The following figure depicts an iSER HelloReply Message encapsulated
3825   in an iWARP SendSE Message. The Reject (REJ) flag is set to 0.
3827    0                   1                   2                   3
3828    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
3829   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3830   |          MPA Header           |  DDP Control  | RDMA Control  |
3831   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3832   |                           Reserved                            |
3833   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3834   |                      (Send) Queue Number                      |
3835   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3836   |                (Send) Message Sequence Number                 |
3837   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3838   |                     (Send) Message Offset                     |
3839   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3840   | 0011b |Zeros|0| 0001b | 0001b |           iSER-ORD            |
3841   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3842   |                           All Zeros                           |
3843   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3844   |                                                               |
3845   |                           All Zeros                           |
3846   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3847   |                           All Zeros                           |
3848   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3849   |                                                               |
3850   |                           All Zeros                           |
3851   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3852   |                            MPA CRC                            |
3853   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3854   Figure 7 SendSE Message containing an iSER HelloReply Message

3856   15.3 iSER Header Format for SCSI Read Command PDU

3858   The following figure depicts a SCSI Read Command PDU embedded in an
3859   iSER Message. For this particular example, in the iSER header, the
3860   Write STag Valid flag is set to zero, the Read STag Valid flag is
3861   set to one, the Write STag field is set to all zeros, the Write Base
3862   Offset field is set to all zeros, the Read STag field contains a
3863   valid Read STag, and the Read Base Offset field contains a valid
3864   Base Offset for the Read Tagged Buffer.
3866    0                   1                   2                   3
3867    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
3868   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3869   | 0001b |0|1|                     All zeros                     |
3870   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3871   |                           All Zeros                           |
3872   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3873   |                                                               |
3874   |                           All Zeros                           |
3875   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3876   |                           Read STag                           |
3877   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3878   |                                                               |
3879   |                       Read Base Offset                        |
3880   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3881   |                     SCSI Read Command PDU                     |
3882   //                                                             //
3883   |                                                               |
3884   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3886   Figure 8 iSER Header Format for SCSI Read Command PDU

3888   15.4 iSER Header Format for SCSI Write Command PDU

3890   The following figure depicts a SCSI Write Command PDU embedded in an
3891   iSER Message. For this particular example, in the iSER header, the
3892   Write STag Valid flag is set to one, the Read STag Valid flag is set
3893   to zero, the Write STag field contains a valid Write STag, the Write
3894   Base Offset field contains a valid Base Offset for the Write Tagged
3895   Buffer, the Read STag field is set to all zeros since it is not
3896   used, and the Read Base Offset field is set to all zeros.
3898    0                   1                   2                   3
3899    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
3900   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3901   | 0001b |1|0|                     All zeros                     |
3902   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3903   |                          Write STag                           |
3904   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3905   |                                                               |
3906   |                       Write Base Offset                       |
3907   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3908   |                           All Zeros                           |
3909   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3910   |                                                               |
3911   |                           All Zeros                           |
3912   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3913   |                    SCSI Write Command PDU                     |
3914   //                                                             //
3915   |                                                               |
3916   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3917   Figure 9 iSER Header Format for SCSI Write Command PDU

3919   15.5 iSER Header Format for SCSI Response PDU

3921   The following figure depicts a SCSI Response PDU embedded in an iSER
3922   Message:

3924    0                   1                   2                   3
3925    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
3926   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3927   | 0001b |0|0|                     All Zeros                     |
3928   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3929   |                           All Zeros                           |
3930   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3931   |                                                               |
3932   |                           All Zeros                           |
3933   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3934   |                           All Zeros                           |
3935   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3936   |                                                               |
3937   |                           All Zeros                           |
3938   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3939   |                       SCSI Response PDU                       |
3940   //                                                             //
3941   |                                                               |
3942   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3943   Figure 10 iSER Header Format for SCSI Response PDU

3945   16 Appendix C: Architectural discussion of iSER over InfiniBand

3947   This section explains how an InfiniBand network (with Gateways)
3948   would be structured.
It is informational only and is intended to
3949   provide insight on how iSER is used in an InfiniBand environment.

3951   16.1 Host side of iSCSI & iSER connections in InfiniBand

3953   Figure 11 defines the topologies in which iSCSI and iSER will be
3954   able to operate on an InfiniBand Network.

3956   +---------+ +---------+ +---------+ +---------+ +---------+
3957   |  Host   | |  Host   | |  Host   | |  Host   | |  Host   |
3958   |         | |         | |         | |         | |         |
3959   +---+-+---+ +---+-+---+ +---+-+---+ +---+-+---+ +---+-+---+
3960   |HCA| |HCA| |HCA| |HCA| |HCA| |HCA| |HCA| |HCA| |HCA| |HCA|
3961   +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+
3962     |----+------|-----+-----|-----+-----|-----+-----|-----+---> To IB
3963   IB|        IB |        IB |        IB |        IB |     SubNet2 SWTCH
3964   +-v-----------v-----------v-----------v-----------v---------+
3965   |               InfiniBand Switch for Subnet1               |
3966   +---+-----+--------+-----+--------+-----+------------v------+
3967       | TCA |        | TCA |        | TCA |            |
3968       +-----+        +-----+        +-----+            | IB
3969      /  IB   \      /  IB   \      /       \     +--+--v--+--+
3970     |  iSER   |    |  iSER   |    |  IPoIB  |    |  | TCA |  |
3971     | Gateway |    | Gateway |    | Gateway |    |  +-----+  |
3972     |   to    |    |   to    |    |   to    |    |  Storage  |
3973     |  iSCSI  |    |  iSER   |    |   IP    |    | Controller|
3974     |   TCP   |    |  iWARP  |    |Ethernet |    +-----+-----+
3975     +---v-----+    +---v-----+    +----v----+
3976         | EN           | EN            | EN
3977         +--------------+---------------+----> to IP based storage
3978          Ethernet links that carry iSCSI or iWARP

3980   Figure 11 iSCSI and iSER on IB

3982   In Figure 11, the Host systems are connected via the InfiniBand Host
3983   Channel Adapters (HCAs) to the InfiniBand links. With the use of IB
3984   switch(es), the InfiniBand links connect the HCA to InfiniBand
3985   Target Channel Adapters (TCAs) located in gateways or Storage
3986   Controllers. An iSER-capable IB-IP Gateway converts the iSER
3987   Messages encapsulated in IB protocols to either standard iSCSI, or
3988   iSER Messages for iWARP.
An [IPoIB] Gateway converts the InfiniBand
3989   [IPoIB] protocol to IP protocol, and in the iSCSI case, permits
3990   iSCSI to be operated on an IB Network between the Hosts and the
3991   [IPoIB] Gateway.

3993   16.2 Storage side of iSCSI & iSER mixed network environment

3995   Figure 12 shows a storage controller that has three different portal
3996   groups: one supporting only iSCSI (TPG-4), one supporting iSER/iWARP
3997   or iSCSI (TPG-2), and one supporting iSER/IB (TPG-1).

3999         |                |                |
4000         |                |                |
4001   +--+--v--+----------+--v--+----------+--v--+--+
4002   |  | IB  |          |iWARP|          | EN  |  |
4003   |  |     |          | TCP |          | NIC |  |
4004   |  |(TCA)|          | RNIC|          |     |  |
4005   |  +-----+          +-----+          +-----+  |
4006   |   TPG-1            TPG-2            TPG-4   |
4007   |  9.1.3.3          9.1.2.4          9.1.2.6  |
4008   |                                             |
4009   |             Storage Controller              |
4010   |                                             |
4011   +---------------------------------------------+

4013   Figure 12 Storage Controller with TCP, iWARP, and IB Connections

4015   The normal iSCSI portal group advertising processes (via SLP, iSNS,
4016   or SendTargets) are available to a Storage Controller.

4018   16.3 Discovery processes for an InfiniBand Host

4020   An InfiniBand Host system can gather portal group IP addresses from
4021   SLP, iSNS, or the SendTargets discovery processes by using TCP/IP
4022   via [IPoIB]. After obtaining one or more remote portal IP
4023   addresses, the Initiator uses the standard IP mechanisms to resolve
4024   the IP address to a local outgoing interface and the destination
4025   hardware address (Ethernet MAC or IB GID of the target or a gateway
4026   leading to the target). If the resolved interface is an [IPoIB]
4027   network interface, then the target portal can be reached through an
4028   InfiniBand fabric. In this case the Initiator can establish an
4029   iSCSI/TCP or iSCSI/iSER session with the Target over that InfiniBand
4030   interface, using the Hardware Address (InfiniBand GID) obtained
4031   through the standard Address Resolution (ARP) processes.
4033 If more than one IP address is obtained through the discovery 4034 process, the Initiator should select a Target IP address that is on 4035 the same IP subnet as the Initiator if one exists. This will avoid 4036 a potential overhead of going through a gateway when a direct path 4037 exists. 4039 In addition, a user can configure manual static IP route entries if a 4040 particular path to the target is preferred. 4042 16.4 IBTA Connection specifications 4044 It is outside the scope of this document, but it is expected that 4045 the InfiniBand Trade Association (IBTA) has defined or will define: 4047 * The iSER ServiceID 4049 * A means for permitting a Host to establish a connection with a 4050 peer InfiniBand end-node, and that peer indicating whether that 4051 end-node supports iSER, so the Host would be able to fall back 4052 to iSCSI/TCP over [IPoIB]. 4054 * A means for permitting the Host to establish connections with 4055 IB iSER connections on storage controllers or IB iSER connected 4056 Gateways in preference to [IPoIB] connected Gateways/Bridges or 4057 connections to Target Storage Controllers that also accept 4058 iSCSI via [IPoIB]. 4060 * A means for combining the IB ServiceID for iSER and the IP port 4061 number such that the IB Host can use normal IB connection 4062 processes, yet ensure that the iSER target peer can actually 4063 connect to the required IP port number. 4065 17 Acknowledgments 4067 The authors acknowledge the following individuals for identifying 4068 implementation issues and/or suggesting resolutions to the issues 4069 clarified in this document: Alexander Nezhinsky, Robert Russell, 4070 Arne Redlich, David Black, Mallikarjun Chadalapaka, Tom Talpey, 4071 Felix Marti, Robert Sharp, Caitlin Bestler, and Hemal Shah. Credit 4072 also goes to the authors of the original iSER Specification 4073 [RFC5046], including Michael Ko, Mallikarjun Chadalapaka, John 4074 Hufferd, Uri Elzur, Hemal Shah, and Patricia Thaler.
This document 4075 benefited from all of their contributions. 4077 Authors' Addresses 4079 Michael Ko 4080 Email: mkosjc@gmail.com 4082 Alexander Nezhinsky 4083 Mellanox Technologies 4084 13 Zarchin St. 4085 Raanana 43662, Israel 4086 Phone: +972-74-712-9000 4087 Email: alexandern@mellanox.com, nezhinsky@gmail.com 4089 Copyright Notice 4091 Copyright (c) 2012 IETF Trust and the persons identified as the 4092 document authors. All rights reserved. 4094 This document is subject to BCP 78 and the IETF Trust's Legal 4095 Provisions Relating to IETF Documents 4096 (http://trustee.ietf.org/license-info) in effect on the date of 4097 publication of this document. Please review these documents 4098 carefully, as they describe your rights and restrictions with 4099 respect to this document. Code Components extracted from this 4100 document must include Simplified BSD License text as described in 4101 Section 4.e of the Trust Legal Provisions and are provided without 4102 warranty as described in the Simplified BSD License.