Storage Maintenance (StorM) Working Group                    Michael Ko
Internet Draft                                               Consultant
Intended status: Proposed Standard                  Alexander Nezhinsky
Expires: January 2014                                          Mellanox
Obsoletes: 5046                                            July 9, 2013

              iSCSI Extensions for RDMA Specification
                   draft-ietf-storm-iser-15.txt

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on January, 2014.

Abstract

   iSCSI Extensions for Remote Direct Memory Access (RDMA) provides the
   RDMA data transfer capability to iSCSI by layering iSCSI on top of
   an RDMA-Capable Protocol. An RDMA-Capable Protocol provides RDMA
   Read and Write services, which enable data to be transferred
   directly into SCSI I/O Buffers without intermediate data copies.
   This document describes the extensions to the iSCSI protocol to
   support RDMA services as provided by an RDMA-Capable Protocol.

   This document obsoletes RFC 5046.

Table of Contents

   1 Definitions and Acronyms
      1.1 Definitions
      1.2 Acronyms
      1.3 Conventions
   2 Introduction
      2.1 Motivation
      2.2 iSCSI/iSER Layering
      2.3 Architectural Goals
      2.4 Protocol Overview
      2.5 RDMA services and iSER
         2.5.1 STag
         2.5.2 Send
         2.5.3 RDMA Write
         2.5.4 RDMA Read
      2.6 SCSI Read Overview
      2.7 SCSI Write Overview
   3 Upper Layer Interface Requirements
      3.1 Operational Primitives offered by iSER
         3.1.1 Send_Control
         3.1.2 Put_Data
         3.1.3 Get_Data
         3.1.4 Allocate_Connection_Resources
         3.1.5 Deallocate_Connection_Resources
         3.1.6 Enable_Datamover
         3.1.7 Connection_Terminate
         3.1.8 Notice_Key_Values
         3.1.9 Deallocate_Task_Resources
      3.2 Operational Primitives used by iSER
         3.2.1 Control_Notify
         3.2.2 Data_Completion_Notify
         3.2.3 Data_ACK_Notify
         3.2.4 Connection_Terminate_Notify
      3.3 iSCSI Protocol Usage Requirements
   4 Lower Layer Interface Requirements
      4.1 Interactions with the RCaP Layer
      4.2 Interactions with the Transport Layer
   5 Connection Setup and Termination
      5.1 iSCSI/iSER Connection Setup
         5.1.1 Initiator Behavior
         5.1.2 Target Behavior
         5.1.3 iSER Hello Exchange
      5.2 iSCSI/iSER Connection Termination
         5.2.1 Normal Connection Termination at the Initiator
         5.2.2 Normal Connection Termination at the Target
         5.2.3 Termination without Logout Request/Response PDUs
   6 Login/Text Operational Keys
      6.1 HeaderDigest and DataDigest
      6.2 MaxRecvDataSegmentLength
      6.3 RDMAExtensions
      6.4 TargetRecvDataSegmentLength
      6.5 InitiatorRecvDataSegmentLength
      6.6 OFMarker and IFMarker
      6.7 MaxOutstandingUnexpectedPDUs
      6.8 MaxAHSLength
      6.9 TaggedBufferForSolicitedDataOnly
      6.10 iSERHelloRequired
   7 iSCSI PDU Considerations
      7.1 iSCSI Data-Type PDU
      7.2 iSCSI Control-Type PDU
      7.3 iSCSI PDUs
         7.3.1 SCSI Command
         7.3.2 SCSI Response
         7.3.3 Task Management Function Request/Response
         7.3.4 SCSI Data-out
         7.3.5 SCSI Data-in
         7.3.6 Ready To Transfer (R2T)
         7.3.7 Asynchronous Message
         7.3.8 Text Request & Text Response
         7.3.9 Login Request & Login Response
         7.3.10 Logout Request & Logout Response
         7.3.11 SNACK Request
         7.3.12 Reject
         7.3.13 NOP-Out & NOP-In
   8 Flow Control and STag Management
      8.1 Flow Control for RDMA Send Messages
         8.1.1 Flow Control for Control-Type PDUs from the Initiator
         8.1.2 Flow Control for Control-Type PDUs from the Target
      8.2 Flow Control for RDMA Read Resources
      8.3 STag Management
         8.3.1 Allocation of STags
         8.3.2 Invalidation of STags
   9 iSER Control and Data Transfer
      9.1 iSER Header Format
      9.2 iSER Header Format for iSCSI Control-Type PDU
      9.3 iSER Header Format for iSER Hello Message
      9.4 iSER Header Format for iSER HelloReply Message
      9.5 SCSI Data Transfer Operations
         9.5.1 SCSI Write Operation
         9.5.2 SCSI Read Operation
         9.5.3 Bidirectional Operation
   10 iSER Error Handling and Recovery
      10.1 Error Handling
         10.1.1 Errors in the Transport Layer
         10.1.2 Errors in the RCaP Layer
         10.1.3 Errors in the iSER Layer
         10.1.4 Errors in the iSCSI Layer
      10.2 Error Recovery
         10.2.1 PDU Recovery
         10.2.2 Connection Recovery
   11 Security Considerations
   12 IANA Considerations
   13 References
      13.1 Normative References
      13.2 Informative References
   14 Appendix A: Summary of Changes from RFC 5046
   15 Appendix B: Message Format for iSER
      15.1 iWARP Message Format for iSER Hello Message
      15.2 iWARP Message Format for iSER HelloReply Message
      15.3 iSER Header Format for SCSI Read Command PDU
      15.4 iSER Header Format for SCSI Write Command PDU
      15.5 iSER Header Format for SCSI Response PDU
   16 Appendix C: Architectural discussion of iSER over InfiniBand
      16.1 Host side of iSCSI & iSER connections in Infiniband
      16.2 Storage side of iSCSI & iSER mixed network environment
      16.3 Discovery processes for an InfiniBand Host
      16.4 IBTA Connection specifications
   17 Acknowledgments

Table of Figures

   Figure 1 Example of iSCSI/iSER Layering in Full Feature Phase
   Figure 2 iSER Header Format
   Figure 3 iSER Header Format for iSCSI Control-Type PDU
   Figure 4 iSER Header Format for iSER Hello Message
   Figure 5 iSER Header Format for iSER HelloReply Message
   Figure 6 SendSE Message containing an iSER Hello Message
   Figure 7 SendSE Message containing an iSER HelloReply Message
   Figure 8 iSER Header Format for SCSI Read Command PDU
   Figure 9 iSER Header Format for SCSI Write Command PDU
   Figure 10 iSER Header Format for SCSI Response PDU
   Figure 11 iSCSI and iSER on IB
   Figure 12 Storage Controller with TCP, iWARP, and IB Connections

1 Definitions and Acronyms

1.1 Definitions

   Advertisement (Advertised, Advertise, Advertisements, Advertises) -
      The act of informing a remote iSER (iSCSI Extensions for RDMA)
      Layer that a local node's buffer is available to it. A Node
      makes a buffer available for incoming RDMA Read Request Message
      or incoming RDMA Write Message access by informing the remote
      iSER Layer of the Tagged Buffer identifiers (STag, Base Offset,
      and buffer length).
      Note that this Advertisement of Tagged Buffer information is the
      responsibility of the iSER Layer on either end and is not defined
      by the RDMA-Capable Protocol. A typical method would be for the
      iSER Layer to embed the Tagged Buffer's STag, Base Offset, and
      buffer length in a message destined for the remote iSER Layer.

   Base Offset - A value that, when added to the Buffer Offset, forms
      the Tagged Offset.

   Completion (Completed, Complete, Completes) - Completion is defined
      as the process by which the RDMA-Capable Protocol layer informs
      the iSER Layer that a particular RDMA Operation has performed
      all functions specified for the RDMA Operation.

   Connection - A connection is a logical bidirectional communication
      channel between the initiator and the target, e.g., a TCP
      connection. Communication between the initiator and the target
      occurs over one or more connections. The connections carry
      control messages, SCSI commands, parameters, and data within
      iSCSI Protocol Data Units (iSCSI PDUs).

   Connection Handle - An information element that identifies the
      particular iSCSI connection and is unique for a given iSCSI
      Layer and the underlying iSER Layer. Every invocation of an
      Operational Primitive is qualified with the Connection Handle.

   Data Sink - The peer receiving a data payload. Note that the Data
      Sink can be required to both send and receive RCaP (RDMA-Capable
      Protocol) Messages to transfer a data payload.

   Data Source - The peer sending a data payload. Note that the Data
      Source can be required to both send and receive RCaP Messages to
      transfer a data payload.

   Datamover Interface (DI) - The interface between the iSCSI Layer and
      the Datamover Layer as described in [DA].

   Datamover Layer - A layer that is directly below the iSCSI Layer and
      above the underlying transport layers.
      This layer exposes and uses a set of transport independent
      Operational Primitives for the communication between the iSCSI
      Layer and itself. The Datamover layer, operating in conjunction
      with the transport layers, moves the control and data
      information on the iSCSI connection. In this specification, the
      iSER Layer is the Datamover layer.

   Datamover Protocol - A Datamover protocol is the wire-protocol that
      is defined to realize the Datamover layer functionality. In
      this specification, the iSER protocol is the Datamover protocol.

   Inbound RDMA Read Queue Depth (IRD) - The maximum number of incoming
      outstanding RDMA Read Requests that the RDMA-Capable Controller
      can handle on a particular RCaP Stream at the Data Source. For
      some RDMA-Capable Protocol layers, the term "IRD" may be known
      by a different name. For example, for InfiniBand, the
      equivalent for IRD is the Responder Resources.

   I/O Buffer - A buffer that is used in a SCSI Read or Write operation
      so that SCSI data may be sent from or received into that buffer.

   iSCSI - The iSCSI protocol as defined in [iSCSI] is a mapping of the
      SCSI Architecture Model of SAM-5 over TCP.

   iSCSI control-type PDU - Any iSCSI PDU that is not an iSCSI
      data-type PDU and also not a SCSI Data-out PDU carrying
      solicited data is defined as an iSCSI control-type PDU.
      Specifically, it is to be noted that SCSI Data-out PDUs for
      unsolicited data are defined as iSCSI control-type PDUs.

   iSCSI data-type PDU - An iSCSI data-type PDU is defined as an iSCSI
      PDU that causes data transfer via RDMA operations at the iSER
      layer, transparent to the remote iSCSI Layer, to take place
      between the peer iSCSI nodes on a full feature phase iSCSI
      connection. An iSCSI data-type PDU, when requested for
      transmission by the sender iSCSI Layer, results in the
      associated data transfer without the participation of the remote
      iSCSI Layer, i.e., the PDU itself is not delivered as-is to the
      remote iSCSI Layer. The following iSCSI PDUs constitute the set
      of iSCSI data-type PDUs: the SCSI Data-In PDU and the R2T PDU.

   iSCSI Layer - A layer in the protocol stack implementation within an
      end node that implements the iSCSI protocol and interfaces with
      the iSER Layer via the Datamover Interface.

   iSCSI PDU (iSCSI Protocol Data Unit) - The iSCSI Layer at the
      initiator and the iSCSI Layer at the target divide their
      communications into messages. The term "iSCSI protocol data
      unit" (iSCSI PDU) is used for these messages.

   iSCSI/iSER Connection - An iSER-assisted iSCSI connection. An iSCSI
      connection that is not iSER-assisted always maps onto a TCP
      connection at the transport level. But an iSER-assisted iSCSI
      connection may not have an underlying TCP connection. For some
      RCaP implementations (e.g., iWARP), an iSER-assisted iSCSI
      connection has an underlying TCP connection. For other RCaP
      implementations (e.g., InfiniBand), there is no underlying TCP
      connection. (In the specific example of InfiniBand [IB], an
      iSER-assisted iSCSI connection is directly mapped onto the
      InfiniBand RC channel.)

   iSCSI/iSER Session - An iSER-assisted iSCSI session. All
      connections of an iSCSI/iSER session are iSCSI/iSER connections.

   iSER - iSCSI Extensions for RDMA, the protocol defined in this
      document.

   iSER-assisted - A term generally used to describe the operation of
      iSCSI when the iSER functionality is also enabled below the
      iSCSI Layer for the specific iSCSI/iSER connection in question.

   iSER-IRD - This variable represents the maximum number of incoming
      outstanding RDMA Read Requests that the iSER Layer at the
      initiator grants on a particular RCaP Stream.

   iSER-ORD - This variable represents the maximum number of
      outstanding RDMA Read Requests that the iSER Layer can initiate
      on a particular RCaP Stream.
      This variable is maintained only by the iSER Layer at the
      target.

   iSER Layer - The layer that implements the iSCSI Extensions for RDMA
      (iSER) protocol.

   iWARP - A suite of wire protocols comprising [RDMAP], [DDP], and
      [MPA] when layered above [TCP]. [RDMAP] and [DDP] may be
      layered above SCTP or other transport protocols.

   Local Mapping - A task state record maintained by the iSER Layer
      that associates the Initiator Task Tag with the Local STag(s).
      The specifics of the record structure are implementation
      dependent.

   Local Peer - The implementation of the RDMA-Capable Protocol on the
      local end of the connection. Used to refer to the local entity
      when describing protocol exchanges or other interactions between
      two Nodes.

   Node - A computing device attached to one or more links of a
      network. A Node in this context does not refer to a specific
      application or protocol instantiation running on the computer.
      A Node may consist of one or more RDMA-Capable Controllers
      installed in a host computer.

   Operational Primitive - An Operational Primitive is an abstract
      functional interface procedure that requests another layer to
      perform a specific action on the requestor's behalf or notifies
      the other layer of some event. The Datamover Interface between
      an iSCSI Layer and a Datamover layer within an iSCSI end node
      uses a set of Operational Primitives to define the functional
      interface between the two layers. Note that not every
      invocation of an Operational Primitive may elicit a response
      from the requested layer. A full discussion of the Operational
      Primitive types and request-response semantics available to
      iSCSI and iSER can be found in [DA].

   Outbound RDMA Read Queue Depth (ORD) - The maximum number of
      outstanding RDMA Read Requests that the RDMA-Capable Controller
      can initiate on a particular RCaP Stream at the Data Sink.
      For some RDMA-Capable Protocol layers, the term "ORD" may be
      known by a different name. For example, for InfiniBand, the
      equivalent for ORD is the Initiator Depth.

   Phase Collapse - Refers to the optimization in iSCSI where the SCSI
      status is transferred along with the final SCSI Data-in PDU from
      a target. See section 4.2 in [iSCSI].

   RCaP Message - One or more packets of the network layer comprising a
      single RDMA operation or a part of an RDMA Read Operation of the
      RDMA-Capable Protocol. For iWARP, an RCaP Message is known as
      an RDMAP Message.

   RCaP Stream - A single bidirectional association between the peer
      RDMA-Capable Protocol layers on two Nodes over a single
      transport-level stream. For iWARP, an RCaP Stream is known as
      an RDMAP Stream, and the association is created following a
      successful Login Phase during which iSER support is negotiated.

   RDMA-Capable Protocol (RCaP) - The protocol or protocol suite that
      provides a reliable RDMA transport functionality, e.g., iWARP,
      InfiniBand, etc.

   RDMA-Capable Controller - A network I/O adapter or embedded
      controller with RDMA functionality. For example, for iWARP,
      this could be an RNIC, and for InfiniBand, this could be an HCA
      (Host Channel Adapter) or TCA (Target Channel Adapter).

   RDMA-enabled Network Interface Controller (RNIC) - A network I/O
      adapter or embedded controller with iWARP functionality.

   RDMA Operation - A sequence of RCaP Messages, including control
      Messages, to transfer data from a Data Source to a Data Sink.
      The following RDMA Operations are defined: RDMA Write
      Operation, RDMA Read Operation, and Send Operation.

   RDMA Protocol (RDMAP) - A wire protocol that supports RDMA
      Operations to transfer ULP (Upper Level Protocol) data between a
      Local Peer and the Remote Peer as described in [RDMAP].

   RDMA Read Operation - An RDMA Operation used by the Data Sink to
      transfer the contents of a Data Source buffer from the Remote
      Peer to a Data Sink buffer at the Local Peer. An RDMA Read
      operation consists of a single RDMA Read Request Message and a
      single RDMA Read Response Message.

   RDMA Read Request - An RCaP Message used by the Data Sink to request
      the Data Source to transfer the contents of a buffer. The RDMA
      Read Request Message describes both the Data Source and the Data
      Sink buffers.

   RDMA Read Response - An RCaP Message used by the Data Source to
      transfer the contents of a buffer to the Data Sink, in response
      to an RDMA Read Request. The RDMA Read Response Message only
      describes the Data Sink buffer.

   RDMA Write Operation - An RDMA Operation used by the Data Source to
      transfer the contents of a Data Source buffer from the Local
      Peer to a Data Sink buffer at the Remote Peer. The RDMA Write
      Message only describes the Data Sink buffer.

   Remote Direct Memory Access (RDMA) - A method of accessing memory on
      a remote system in which the local system specifies the remote
      location of the data to be transferred. Employing an
      RDMA-Capable Controller in the remote system allows the access
      to take place without interrupting the processing of the CPU(s)
      on the system.

   Remote Mapping - A task state record maintained by the iSER Layer
      that associates the Initiator Task Tag with the Advertised
      STag(s) and the Base Offset(s). The specifics of the record
      structure are implementation dependent.

   Remote Peer - The implementation of the RDMA-Capable Protocol on the
      opposite end of the connection. Used to refer to the remote
      entity when describing protocol exchanges or other interactions
      between two Nodes.

   SCSI Layer - This layer builds/receives SCSI CDBs (Command
      Descriptor Blocks) and sends/receives them with the remaining
      command execute [SAM5] parameters to/from the iSCSI Layer.

   Send - An RDMA Operation that transfers the content of a buffer from
      the Local Peer to an untagged buffer at the Remote Peer.

   SendInvSE Message - A Send with Solicited Event and Invalidate
      Message.

   SendSE Message - A Send with Solicited Event Message.

   Sequence Number (SN) - DataSN for a SCSI Data-in PDU and R2TSN for
      an R2T PDU. The semantics for both types of sequence numbers
      are as defined in [iSCSI].

   Session, iSCSI Session - The group of Connections that link an
      initiator SCSI port with a target SCSI port form an iSCSI
      session (equivalent to a SCSI I-T nexus). Connections can be
      added to and removed from a session even while the I-T nexus is
      intact. Across all connections within a session, an initiator
      sees one and the same target.

   Steering Tag (STag) - An identifier of a Tagged Buffer on a Node
      (Local or Remote) as defined in [RDMAP] and [DDP]. For other
      RDMA-Capable Protocols, the Steering Tag may be known by
      different names but will be herein referred to as STags. For
      example, for InfiniBand, a Remote STag is known as an R-Key, and
      a Local STag is known as an L-Key, and both will be considered
      STags.

   Tagged Buffer - A buffer that is explicitly Advertised to the iSER
      Layer at the remote node through the exchange of an STag, Base
      Offset, and length.

   Tagged Offset - The offset within a Tagged Buffer.

   Traditional iSCSI - Refers to the iSCSI protocol as defined in
      [iSCSI] (i.e., without the iSER enhancements).

   Untagged Buffer - A buffer that is not explicitly Advertised to the
      iSER Layer at the remote node.
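
   As an informal illustration of the Tagged Buffer terms defined in
   this section, the following Python sketch models an Advertised
   buffer (STag, Base Offset, and buffer length) and computes a Tagged
   Offset as the Base Offset plus the Buffer Offset. The class name,
   the field names, and the range check are assumptions made for this
   example only; they are not defined by this specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedBuffer:
    """Hypothetical model of the identifiers carried in an Advertisement."""
    stag: int         # Steering Tag identifying the Tagged Buffer
    base_offset: int  # Base Offset supplied with the Advertisement
    length: int       # buffer length in bytes

    def tagged_offset(self, buffer_offset: int) -> int:
        """Return Tagged Offset = Base Offset + Buffer Offset."""
        # The bounds check is an assumption of this sketch, not a rule
        # taken from the specification text.
        if not 0 <= buffer_offset < self.length:
            raise ValueError("Buffer Offset outside the Advertised buffer")
        return self.base_offset + buffer_offset


# Example values are arbitrary.
adv = TaggedBuffer(stag=0x1234, base_offset=4096, length=8192)
offset = adv.tagged_offset(512)
```

   Keeping the three identifiers together in one record mirrors how
   the definitions describe an Advertisement as a single unit.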

1.2 Acronyms

   Acronym        Definition
   --------------------------------------------------------------
   AHS            Additional Header Segment
   BHS            Basic Header Segment
   CO             Connection Only
   CRC            Cyclic Redundancy Check
   DDP            Direct Data Placement Protocol
   DI             Datamover Interface
   HCA            Host Channel Adapter
   IANA           Internet Assigned Numbers Authority
   IB             InfiniBand
   IETF           Internet Engineering Task Force
   I/O            Input - Output
   IO             Initialize Only
   IP             Internet Protocol
   IPoIB          IP over InfiniBand
   IPsec          Internet Protocol Security
   iSER           iSCSI Extensions for RDMA
   ITT            Initiator Task Tag
   LO             Leading Only
   MPA            Marker PDU Aligned Framing for TCP
   NOP            No Operation
   NSG            Next Stage (during the iSCSI Login Phase)
   PDU            Protocol Data Unit
   R2T            Ready To Transfer
   R2TSN          Ready To Transfer Sequence Number
   RCaP           RDMA-Capable Protocol
   RDMA           Remote Direct Memory Access
   RDMAP          Remote Direct Memory Access Protocol
   RFC            Request For Comments
   RNIC           RDMA-enabled Network Interface Controller
   SAM5           SCSI Architecture Model - 5
   SCSI           Small Computer Systems Interface
   SNACK          Selective Negative Acknowledgment - also
                  Sequence Number Acknowledgement for data
   STag           Steering Tag
   SW             Session Wide
   TCA            Target Channel Adapter
   TCP            Transmission Control Protocol
   TMF            Task Management Function
   TTT            Target Transfer Tag
   ULP            Upper Level Protocol

1.3 Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2 Introduction

2.1 Motivation

   The iSCSI protocol ([iSCSI]) is a mapping of the SCSI Architecture
   Model (see [SAM5] and [iSCSI-SAM]) over the TCP protocol. SCSI
   commands are carried by iSCSI requests, and SCSI responses and
   status are carried by iSCSI responses.
   Other iSCSI protocol exchanges and SCSI Data are also transported
   in iSCSI PDUs.

   Out-of-order TCP segments in the Traditional iSCSI model have to be
   stored and reassembled before the iSCSI protocol layer within an end
   node can place the data in the iSCSI buffers. This reassembly is
   required because not every TCP segment is likely to contain an iSCSI
   header to enable its placement, and TCP itself does not have a
   built-in mechanism for signaling ULP message boundaries to aid
   placement of out-of-order segments. This TCP reassembly at high
   network speeds is quite counter-productive for the following
   reasons: wasted memory bandwidth in data copying, the need for
   reassembly memory, wasted CPU cycles in data copying, and the
   general store-and-forward latency from an application perspective.

   The generic term RDMA-Capable Protocol (RCaP) is used to refer to
   protocol stacks that provide the Remote Direct Memory Access (RDMA)
   functionality, such as iWARP and InfiniBand.

   With the availability of RDMA-Capable Controllers within a host
   system, it is appropriate for iSCSI to be able to exploit the direct
   data placement function of the RDMA-Capable Controller like other
   applications.

   iSCSI Extensions for RDMA (iSER) is designed precisely to take
   advantage of generic RDMA technologies: iSER's goal is to permit
   iSCSI to employ direct data placement and RDMA capabilities using a
   generic RDMA-Capable Controller. In summary, the iSCSI/iSER protocol
   stack is designed to enable scaling to high speeds by relying on a
   generic data placement process and RDMA technologies and products,
   which enable direct data placement of both in-order and out-of-order
   data.

   This document describes iSER as a protocol extension to iSCSI, both
   for convenience of description and also because it is true in a very
   strict protocol sense.
   However, it is to be noted that iSER is in reality extending the
   connectivity of the iSCSI protocol defined in [iSCSI], and the name
   iSER reflects this reality.

   When the iSCSI protocol as defined in [iSCSI] (i.e., without the
   iSER enhancements) is intended in the rest of the document, the term
   "Traditional iSCSI" is used to make the intention clear.

   This document obsoletes RFC 5046. See Section 14 for the list of
   changes from RFC 5046.

2.2 iSCSI/iSER Layering

   iSCSI Extensions for RDMA (iSER) is layered between the iSCSI layer
   and the RCaP layer.

         +--------------------------------------------------------+
         |                          SCSI                          |
         +--------------------------------------------------------+
         |                         iSCSI                          |
   DI -> +--------------------------------------------------------+
         |                          iSER                          |
         +-------+--------------------------+---------------------+
         | RDMAP |                          |                     |
         +-------+        InfiniBand        |                     |
         |  DDP  |         Reliable         |        Other        |
         +-------+        Connected         |        RDMA         |
         |  MPA  |        Transport         |       Capable       |
         +-------+         Service          |      Protocol       |
         |  TCP  |                          |                     |
         +-------+--------------------------+---------------------+
         |  IP   | InfiniBand Network Layer | Other Network Layer |
         +-------+--------------------------+---------------------+

      Figure 1 Example of iSCSI/iSER Layering in Full Feature Phase

   Figure 1 shows an example of the relationship between SCSI, iSCSI,
   iSER, and the different RCaP layers. For TCP, the RCaP is iWARP.
   For InfiniBand, the RCaP is the Reliable Connected Transport
   Service. Note that the iSCSI layer as described here supports the
   RDMA Extensions as used in iSER.

2.3 Architectural Goals

   This section summarizes the architectural goals that guided the
   design of iSER.

   1. Provide an RDMA data transfer model for iSCSI that enables direct
      in-order or out-of-order data placement of SCSI data into
      pre-allocated SCSI buffers while maintaining in-order data
      delivery.

   2. Not require any major changes to the SCSI Architecture Model
      [SAM5] and SCSI command set standards.

   3. Utilize existing iSCSI infrastructure (sometimes referred to as
      "iSCSI ecosystem") including but not limited to MIB,
      bootstrapping, negotiation, naming & discovery, and security.

   4. Enable a session to operate in the Traditional iSCSI data transfer
      mode if iSER is not supported by either the initiator or the
      target (without requiring iSCSI full feature phase
      interoperability between an end node operating in Traditional
      iSCSI mode and an end node operating in iSER-assisted mode).

   5. Allow initiator and target implementations to utilize generic
      RDMA-Capable Controllers such as RNICs, or implement iSCSI and
      iSER in software (without requiring iSCSI- or iSER-specific
      assists in the RCaP implementation or RDMA-Capable Controller).

   6. Implement a lightweight Datamover protocol for iSCSI with minimal
      state maintenance.

2.4 Protocol Overview

   Consistent with the architectural goals stated in section 2.3, the
   iSER protocol does not require changes in the iSCSI ecosystem or any
   related SCSI specifications. The iSER protocol defines the mapping
   of iSCSI PDUs to RCaP Messages in such a way that it is entirely
   feasible to realize iSCSI/iSER implementations that are based on
   generic RDMA-Capable Controllers. The iSER protocol layer requires
   minimal state maintenance to assist an iSCSI full feature phase
   connection and is oblivious to the notion of an iSCSI session. The
   crucial protocol aspects of iSER may be summarized as follows:

   1. iSER-assisted mode is negotiated during the iSCSI login in the
      leading connection for each session, and an entire iSCSI session
      can only operate in one mode (i.e., a connection in a session
      cannot operate in iSER-assisted mode if a different connection of
      the same session is already in full feature phase in the
      Traditional iSCSI mode).
710 2. Once in iSER-assisted mode, all iSCSI interactions on that 711 connection use RCaP Messages. 713 3. A Send Message is used for carrying an iSCSI control-type PDU 714 preceded by an iSER header. See section 7.2 for more details on 715 iSCSI control-type PDUs. 717 4. RDMA Write, RDMA Read Request, and RDMA Read Response Messages 718 are used for carrying control and all data information associated 719 with the iSCSI data-type PDUs (i.e., SCSI Data-In PDUs and R2T 720 PDUs). iSER does not use SCSI Data-Out PDUs for solicited data, 721 and SCSI Data-Out PDUs for unsolicited data are not treated as 722 iSCSI data-type PDUs by iSER because RDMA is not used. See 723 section 7.1 for more details on iSCSI data-type PDUs. 725 5. Target drives all data transfer (with the exception of iSCSI 726 unsolicited data) for SCSI writes and SCSI reads, by issuing RDMA 727 Read Requests and RDMA Writes respectively. 729 6. RCaP is responsible for ensuring data integrity. (For example, 730 iWARP includes a CRC-enhanced framing layer called MPA on top of 731 TCP; and for Infiniband, the CRCs are included in the Reliable 732 Connection mode). For this reason, iSCSI header and data digests 733 are negotiated to "None" for iSCSI/iSER sessions. 735 7. The iSCSI error recovery hierarchy defined in [iSCSI] is fully 736 supported by iSER. (However, see section 7.3.11 on the handling 737 of SNACK Request PDUs.) 739 8. iSER requires no changes to iSCSI security and text mode 740 negotiation mechanisms. 742 Note that Traditional iSCSI implementations may have to be adapted 743 to employ iSER. It is expected that the adaptation when required is 744 likely to be centered around the upper layer interface requirements 745 of iSER (section 3). 747 2.5 RDMA services and iSER 749 iSER is designed to work with software and/or hardware protocol 750 stacks providing the protocol services defined in RCaP documents 751 such as [RDMAP], [IB], etc. 
The following subsections describe the 752 key protocol elements of RCaP services that iSER relies on. 754 2.5.1 STag 756 An STag is the identifier of an I/O Buffer unique to an RDMA-Capable 757 Controller that the iSER Layer Advertises to the remote iSCSI/iSER 758 node in order to complete a SCSI I/O. 760 In iSER, Advertisement is the act of informing the target by the 761 initiator that an I/O Buffer is available at the initiator for RDMA 762 Read or RDMA Write access by the target. The initiator Advertises 763 the I/O Buffer by including the STag and the Base Offset in the 764 header of an iSER Message containing the SCSI Command PDU to the 765 target. The buffer length is as specified in the SCSI Command PDU. 767 The iSER Layer at the initiator Advertises the STag and the Base 768 Offset for the I/O Buffer of each SCSI I/O to the iSER Layer at the 769 target in the iSER header of a Send Message containing the SCSI 770 Command PDU, unless the I/O can be completely satisfied by 771 unsolicited data alone. The SendSE Message should be used if 772 supported by the RCaP layer (e.g., iWARP). 774 The iSER Layer at the target provides the STag for the I/O Buffer 775 that is the Data Sink of an RDMA Read Operation (section 2.5.4) to 776 the RCaP layer on the initiator node - i.e. this is completely 777 transparent to the iSER Layer at the initiator. 779 The iSER layer at the initiator SHOULD invalidate the Advertised 780 STag upon a normal completion of the associated task. The Send with 781 Invalidate Message, if supported by the RCaP layer (e.g., iWARP), 782 can be used for automatic invalidation when it is used to carry the 783 SCSI Response PDU. There are two exceptions to this automatic 784 invalidation - bidirectional commands, and abnormal completion of a 785 command. The iSER Layer at the initiator SHOULD explicitly 786 invalidate the STag in these two cases. 
That iSER layer MUST check 787 that STag invalidation has occurred whenever receipt of a Send with 788 Invalidate message is the expected means of causing an STag to be 789 invalidated, and MUST perform the STag invalidation if the STag has 790 not already been invalidated (e.g., because a Send message was used 791 instead of Send with Invalidate). 793 If the Advertised STag is not invalidated as recommended in the 794 foregoing paragraph (e.g., in order to cache the STag for future 795 reuse), the I/O Buffer remains exposed to the network for access by 796 the RCaP. Such an I/O Buffer is capable of being read or written by 797 the RCaP outside the scope of the iSCSI operation for which it was 798 originally established, which has both robustness and security 799 considerations. The robustness considerations are that the system 800 containing the iSER initiator may react poorly to an unexpected 801 modification of its memory. For the security considerations, see 802 Section 11. 804 2.5.2 Send 806 Send is the RDMA Operation that is not addressed to an Advertised 807 buffer, and uses Untagged buffers as the message is received. 809 The iSER Layer at the initiator uses the Send Operation to transmit 810 any iSCSI control-type PDU to the target. As an example, the 811 initiator uses Send Operations to transfer iSER Messages containing 812 SCSI Command PDUs to the iSER Layer at the target. 814 An iSER layer at the target uses the Send Operation to transmit any 815 iSCSI control-type PDU to the initiator. As an example, the target 816 uses Send Operations to transfer iSER Messages containing SCSI 817 Response PDUs to the iSER Layer at the initiator. 819 For interoperability, iSER implementations SHOULD accept and 820 correctly process SendSE and SendInvSE messages. However, SendSE 821 and SendInvSE messages are to be regarded as optimizations or 822 enhancements to the basic Send message, and their support may vary 823 by RCaP protocol and specific implementation. 
In general, these 824 messages SHOULD NOT be used, unless the RCaP requires support for 825 them in all implementations. If these messages are used, the 826 implementation SHOULD be capable of reverting to use of Send in 827 order to work with a receiver that does not support these messages. 828 Attempted use of these messages with a peer that does not support 829 them may result in a fatal error that closes the RCaP connection. 830 For example, these messages SHOULD NOT be used with the InfiniBand 831 RCaP because InfiniBand does not require support for them in all 832 cases. New iSER implementations SHOULD use Send (and not SendSE or 833 SendInvSE) unless there are compelling reasons for doing otherwise. 834 Similarly, iSER implementations SHOULD NOT rely on events triggered 835 by SendSE and SendInvSE, as these messages may not be used. 837 2.5.3 RDMA Write 839 RDMA Write is the RDMA Operation that is used to place data into an 840 Advertised buffer at the Data Sink. The Data Source addresses the 841 Message using an STag and a Tagged Offset that are valid on the Data 842 Sink. 844 The iSER Layer at the target uses the RDMA Write Operation to 845 transfer the contents of a local I/O Buffer to an Advertised I/O 846 Buffer at the initiator. The iSER Layer at the target uses the RDMA 847 Write to transfer whole or part of the data required to complete a 848 SCSI Read command. 850 The iSER Layer at the initiator does not employ RDMA Writes. 852 2.5.4 RDMA Read 854 RDMA Read is the RDMA Operation that is used to retrieve data from 855 an Advertised buffer at the Data Source. The sender of the RDMA 856 Read Request addresses the Message using an STag and a Tagged Offset 857 that are valid on the Data Source in addition to providing a valid 858 local STag and Tagged Offset that identify the Data Sink.
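As an illustration of the addressing described for RDMA Write (section 2.5.3) and RDMA Read (section 2.5.4), the sketch below models the two requests as plain records. The class, field, and function names are hypothetical and do not correspond to any RCaP wire format or verbs API; the sketch only shows which STag and Tagged Offset each operation carries, and that the target is the side that issues both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedBuffer:
    """Hypothetical handle for a registered buffer: an STag plus a Tagged Offset."""
    stag: int
    tagged_offset: int


@dataclass(frozen=True)
class RdmaWrite:
    """Addressed with the STag and Tagged Offset that are valid on the Data Sink."""
    sink: TaggedBuffer    # Advertised I/O Buffer at the initiator
    length: int


@dataclass(frozen=True)
class RdmaReadRequest:
    """Names both the remote Data Source and the local Data Sink."""
    source: TaggedBuffer  # Advertised I/O Buffer at the initiator
    sink: TaggedBuffer    # local I/O Buffer at the target
    length: int


def target_transfer(scsi_op, advertised, local, length):
    """Pick the RDMA Operation the target issues for a SCSI Read or Write."""
    if scsi_op == "read":
        # SCSI Read: the target places data into the initiator's buffer.
        return RdmaWrite(sink=advertised, length=length)
    if scsi_op == "write":
        # SCSI Write: the target fetches data from the initiator's buffer.
        return RdmaReadRequest(source=advertised, sink=local, length=length)
    raise ValueError(f"unsupported SCSI operation: {scsi_op}")
```

In this model the initiator never constructs either record, which mirrors the statements that the iSER Layer at the initiator employs neither RDMA Writes nor RDMA Reads.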
860 The iSER Layer at the target uses the RDMA Read Operation to 861 transfer the contents of an Advertised I/O Buffer at the initiator 862 to a local I/O Buffer at the target. The iSER Layer at the target 863 uses the RDMA Read to fetch whole or part of the data required to 864 complete a SCSI Write Command. 866 The iSER Layer at the initiator does not employ RDMA Reads. 868 2.6 SCSI Read Overview 870 The iSER Layer at the initiator receives the SCSI Command PDU from 871 the iSCSI Layer. The iSER Layer at the initiator generates an STag 872 for the I/O Buffer of the SCSI Read and Advertises the buffer by 873 including the STag and the Base Offset as part of the iSER header 874 for the PDU. The iSER Message is transferred to the target using a 875 Send Message. The SendSE Message should be used if supported by the 876 RCaP layer (e.g., iWARP). 878 The iSER Layer at the target uses one or more RDMA Writes to 879 transfer the data required to complete the SCSI Read. 881 The iSER Layer at the target uses a Send Message to transfer the 882 SCSI Response PDU back to the iSER Layer at the initiator. The iSER 883 Layer at the initiator invalidates the STag and notifies the iSCSI 884 Layer of the availability of the SCSI Response PDU. The Send with 885 Invalidate Message, if supported by the RCaP layer (e.g., iWARP), 886 can be used for automatic invalidation of the STag. 888 2.7 SCSI Write Overview 890 The iSER Layer at the initiator receives the SCSI Command PDU from 891 the iSCSI Layer. If solicited data transfer is involved, the iSER 892 Layer at the initiator generates an STag for the I/O Buffer of the 893 SCSI Write and Advertises the buffer by including the STag and the 894 Base Offset as part of the iSER header for the PDU. The iSER 895 Message is transferred to the target using a Send Message. The 896 SendSE Message should be used if supported by the RCaP layer (e.g., 897 iWARP). 
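The Advertisement step that opens both the SCSI Read and SCSI Write overviews can be sketched as follows. This is a minimal illustration and not the iSER header layout: the `Advertisement` record, the `register_buffer` callback, and the function name are hypothetical. The only rule encoded is the one stated in the text: a SCSI Read Advertises its I/O Buffer, and a SCSI Write does so only when solicited data transfer is involved.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Advertisement:
    """Hypothetical record of an Advertised I/O Buffer (STag and Base Offset)."""
    stag: int
    base_offset: int


def advertise_for_command(
    is_read: bool,
    is_write: bool,
    solicited_write_data: bool,
    register_buffer: Callable[[str], Tuple[int, int]],
) -> Tuple[Optional[Advertisement], Optional[Advertisement]]:
    """Return (read_buffer_adv, write_buffer_adv) for one SCSI Command PDU.

    register_buffer is an assumed callback that generates an STag for the
    I/O Buffer of the given direction and returns (stag, base_offset).
    """
    read_adv = write_adv = None
    if is_read:
        read_adv = Advertisement(*register_buffer("read"))
    # A write that is fully satisfied by unsolicited data needs no Advertisement.
    if is_write and solicited_write_data:
        write_adv = Advertisement(*register_buffer("write"))
    return read_adv, write_adv
```

A bidirectional command sets both flags and may therefore Advertise two buffers in this sketch.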
899 The iSER Layer at the initiator may optionally send one or more non- 900 immediate unsolicited data PDUs to the target using Send Messages. 902 If solicited data transfer is involved, the iSER Layer at the target 903 uses one or more RDMA Reads to transfer the data required to 904 complete the SCSI Write. 906 The iSER Layer at the target uses a Send Message to transfer the 907 SCSI Response PDU back to the iSER Layer at the initiator. The iSER 908 Layer at the initiator invalidates the STag and notifies the iSCSI 909 Layer of the availability of the SCSI Response PDU. The Send with 910 Invalidate Message, if supported by the RCaP layer (e.g., iWARP), 911 can be used for automatic invalidation of the STag. 913 3 Upper Layer Interface Requirements 915 This section discusses the upper layer interface requirements in the 916 form of an abstract model of the required interactions between the 917 iSCSI Layer and the iSER Layer. The abstract model used here is 918 derived from the architectural model described in [DA]. [DA] also 919 provides a functional overview of the interactions between the iSCSI 920 Layer and the datamover layer as intended by the Datamover 921 Architecture. 923 The interface requirements are specified by Operational Primitives. 924 An Operational Primitive is an abstract functional interface 925 procedure between the iSCSI Layer and the iSER Layer that requests 926 one layer to perform a specific action on behalf of the other layer 927 or notifies the other layer of some event. Whenever an Operational 928 Primitive is invoked, the Connection_Handle qualifier is used to 929 identify a particular iSCSI connection. For some Operational 930 Primitives, a Data_Descriptor is used to identify the iSCSI/SCSI 931 data buffer associated with the requested or completed operation. 933 The abstract model and the Operational Primitives defined in this 934 section facilitate the description of the iSER protocol.
In the 935 rest of the iSER specification, the compliance statements related to 936 the use of these Operational Primitives are only for the purpose of 937 the required interactions between the iSCSI Layer and the iSER 938 Layer. Note that the compliance statements related to the 939 Operational Primitives in the rest of this specification only 940 mandate functional equivalence on implementations, but do not put 941 any requirements on the implementation specifics of the interface 942 between the iSCSI Layer and the iSER Layer. 944 Each Operational Primitive is invoked with a set of qualifiers which 945 specify the information context for performing the specific action 946 being requested of the Operational Primitive. While the qualifiers 947 are required, the method of realizing the qualifiers (e.g., by 948 passing synchronously with invocation, or by retrieving from task 949 context, or by retrieving from shared memory, etc.) is 950 implementation dependent. 952 3.1 Operational Primitives offered by iSER 954 The iSER protocol layer MUST support the following Operational 955 Primitives to be used by the iSCSI protocol layer. 957 3.1.1 Send_Control 959 Input qualifiers: Connection_Handle, BHS and AHS (if any) of 960 the iSCSI PDU, PDU-specific qualifiers 962 Return results: Not specified 964 This is used by the iSCSI Layers at the initiator and the target to 965 request the outbound transfer of an iSCSI control-type PDU (see 966 section 7.2). Qualifiers that only apply for a particular control- 967 type PDU are known as PDU-specific qualifiers, e.g., 968 ImmediateDataSize for a SCSI Write command. For details on PDU- 969 specific qualifiers, see section 7.3. The iSCSI Layer can only 970 invoke the Send_Control Operational Primitive when the connection is 971 in iSER-assisted mode. 
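As a rough illustration of the Send_Control description above, the sketch below models the primitive as a method on a hypothetical iSER layer object. All Python names are assumptions made for illustration. The specification mandates only functional equivalence and leaves the return results unspecified, so raising an exception when the connection is not in iSER-assisted mode is a design choice of this sketch, not a requirement.

```python
class NotInIserModeError(RuntimeError):
    """Raised by this sketch when a primitive is invoked outside iSER-assisted mode."""


class IserLayerSketch:
    def __init__(self):
        self._iser_mode = set()  # Connection_Handles in iSER-assisted mode
        self.sent = []           # recorded (handle, bhs, ahs, qualifiers) tuples

    def mark_iser_assisted(self, connection_handle):
        # Stand-in for the connection having entered iSER-assisted mode.
        self._iser_mode.add(connection_handle)

    def send_control(self, connection_handle, bhs, ahs=None, **pdu_qualifiers):
        """Request the outbound transfer of an iSCSI control-type PDU.

        pdu_qualifiers carries PDU-specific qualifiers, for example
        ImmediateDataSize for a SCSI Write command.
        """
        if connection_handle not in self._iser_mode:
            raise NotInIserModeError(connection_handle)
        self.sent.append((connection_handle, bhs, ahs, dict(pdu_qualifiers)))
```

A caller might invoke `layer.send_control(handle, bhs, ImmediateDataSize=512)` for a SCSI Write command that carries immediate data.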
973 3.1.2 Put_Data 975 Input qualifiers: Connection_Handle, content of a SCSI Data-in 976 PDU header, Data_Descriptor, Notify_Enable 978 Return results: Not specified 980 This is used by the iSCSI Layer at the target to request the 981 outbound transfer of data for a SCSI Data-in PDU from the buffer 982 identified by the Data_Descriptor qualifier. The iSCSI Layer can 983 only invoke the Put_Data Operational Primitive when the connection 984 is in iSER-assisted mode. 986 The Notify_Enable qualifier is used to indicate to the iSER Layer 987 whether or not it should generate an eventual local completion 988 notification to the iSCSI Layer. See section 3.2.2 on 989 Data_Completion_Notify for details. 991 3.1.3 Get_Data 993 Input qualifiers: Connection_Handle, content of an R2T PDU, 994 Data_Descriptor, Notify_Enable 996 Return results: Not specified 998 This is used by the iSCSI Layer at the target to request the inbound 999 transfer of solicited data requested by an R2T PDU into the buffer 1000 identified by the Data_Descriptor qualifier. The iSCSI Layer can 1001 only invoke the Get_Data Operational Primitive when the connection 1002 is in iSER-assisted mode. 1004 The Notify_Enable qualifier is used to indicate to the iSER Layer 1005 whether or not it should generate the eventual local completion 1006 notification to the iSCSI Layer. See section 3.2.2 on 1007 Data_Completion_Notify for details. 1009 3.1.4 Allocate_Connection_Resources 1011 Input qualifiers: Connection_Handle, Resource_Descriptor 1012 (optional) 1014 Return results: Status 1016 This is used by the iSCSI Layers at the initiator and the target to 1017 request the allocation of all connection resources necessary to 1018 support RCaP for an operational iSCSI/iSER connection. The iSCSI 1019 Layer may optionally specify the implementation-specific resource 1020 requirements for the iSCSI connection using the Resource_Descriptor 1021 qualifier. 
1023 A return result of Status=success means the invocation succeeded, 1024 and a return result of Status=failure means that the invocation 1025 failed. If the invocation is for a Connection_Handle for which an 1026 earlier invocation succeeded, the request will be ignored by the 1027 iSER Layer and the result of Status=success will be returned. Only 1028 one Allocate_Connection_Resources Operational Primitive invocation 1029 can be outstanding for a given Connection_Handle at any time. 1031 3.1.5 Deallocate_Connection_Resources 1033 Input qualifiers: Connection_Handle 1035 Return results: Not specified 1037 This is used by the iSCSI Layers at the initiator and the target to 1038 request the deallocation of all connection resources that were 1039 allocated earlier as a result of a successful invocation of the 1040 Allocate_Connection_Resources Operational Primitive. 1042 3.1.6 Enable_Datamover 1044 Input qualifiers: Connection_Handle, 1045 Transport_Connection_Descriptor, Final Login_Response_PDU 1046 (optional) 1048 Return results: Not specified 1050 This is used by the iSCSI Layers at the initiator and the target to 1051 request that iSER-assisted mode be used for the connection. The 1052 Transport_Connection_Descriptor qualifier is used to identify the 1053 specific connection associated with the Connection_Handle. The 1054 iSCSI layer can only invoke the Enable_Datamover Operational 1055 Primitive when there was a corresponding prior resource allocation. 1057 The Final_Login_Response_PDU input qualifier is applicable only for 1058 a target, and contains the final Login Response PDU that concludes 1059 the iSCSI Login Phase. 1061 3.1.7 Connection_Terminate 1063 Input qualifiers: Connection_Handle 1065 Return results: Not specified 1067 This is used by the iSCSI Layers at the initiator and the target to 1068 request that a specified iSCSI/iSER connection be terminated and all 1069 associated connection and task resources be freed. 
When this 1070 Operational Primitive invocation returns to the iSCSI layer, the 1071 iSCSI layer may assume full ownership of all iSCSI-level resources, 1072 e.g. I/O Buffers, associated with the connection. 1074 3.1.8 Notice_Key_Values 1076 Input qualifiers: Connection_Handle, number of keys, list of 1077 Key-Value pairs 1079 Return results: Not specified 1081 This is used by the iSCSI Layers at the initiator and the target to 1082 request the iSER Layer to take note of the specified Key-Value pairs 1083 which were negotiated by the iSCSI peers for the connection. 1085 3.1.9 Deallocate_Task_Resources 1087 Input qualifiers: Connection_Handle, ITT 1089 Return results: Not specified 1091 This is used by the iSCSI Layers at the initiator and the target to 1092 request the deallocation of all RCaP-specific resources allocated by 1093 the iSER Layer for the task identified by the ITT qualifier. The 1094 iSER Layer may require a certain number of RCaP-specific resources 1095 associated with the ITT for each new iSCSI task. In the normal 1096 course of execution, these task-level resources in the iSER Layer 1097 are assumed to be transparently allocated on each task initiation 1098 and deallocated on the conclusion of each task as appropriate. In 1099 exception scenarios where the task does not conclude with a SCSI 1100 Response PDU, the iSER Layer needs to be notified of the individual 1101 task terminations to aid its task-level resource management. This 1102 Operational Primitive is used for this purpose, and is not needed 1103 when a SCSI Response PDU normally concludes a task. Note that RCaP- 1104 specific task resources are deallocated by the iSER Layer when a 1105 SCSI Response PDU normally concludes a task, even if the SCSI Status 1106 was not success. 1108 3.2 Operational Primitives used by iSER 1110 The iSER layer MUST use the following Operational Primitives offered 1111 by the iSCSI protocol layer when the connection is in iSER-assisted 1112 mode. 
1114 3.2.1 Control_Notify 1116 Input qualifiers: Connection_Handle, an iSCSI control-type PDU 1118 Return results: Not specified 1120 This is used by the iSER Layers at the initiator and the target to 1121 notify the iSCSI Layer of the availability of an inbound iSCSI 1122 control-type PDU. A PDU is described as "available" to the iSCSI 1123 Layer when the iSER Layer notifies the iSCSI Layer of the reception 1124 of that inbound PDU, along with an implementation-specific 1125 indication as to where the received PDU is. 1127 3.2.2 Data_Completion_Notify 1129 Input qualifiers: Connection_Handle, ITT, SN 1131 Return results: Not specified 1133 This is used by the iSER Layer to notify the iSCSI Layer of the 1134 completion of outbound data transfer that was requested by the iSCSI 1135 Layer only if the invocation of the Put_Data Operational Primitive 1136 (see section 3.1.2) was qualified with Notify_Enable set. SN refers 1137 to the DataSN associated with the SCSI Data-In PDU. 1139 This is used by the iSER Layer to notify the iSCSI Layer of the 1140 completion of inbound data transfer that was requested by the iSCSI 1141 Layer only if the invocation of the Get_Data Operational Primitive 1142 (see section 3.1.3) was qualified with Notify_Enable set. SN refers 1143 to the R2TSN associated with the R2T PDU. 1145 3.2.3 Data_ACK_Notify 1147 Input qualifier: Connection_Handle, ITT, DataSN 1149 Return results: Not specified 1151 This is used by the iSER Layer at the target to notify the iSCSI 1152 Layer of the arrival of the data acknowledgement (as defined in 1153 [iSCSI]) requested earlier by the iSCSI Layer for the outbound data 1154 transfer via an invocation of the Put_Data Operational Primitive 1155 where the A-bit in the SCSI Data-in PDU is set to 1. See section 1156 7.3.5. 
DataSN refers to the expected DataSN of the next SCSI Data- 1157 in PDU which immediately follows the SCSI Data-in PDU with the A-bit 1158 set to which this notification corresponds, with semantics as 1159 defined in [iSCSI]. 1161 3.2.4 Connection_Terminate_Notify 1163 Input qualifiers: Connection_Handle 1165 Return results: Not specified 1167 This is used by the iSER Layers at the initiator and the target to 1168 notify the iSCSI Layer of the unsolicited termination or failure of 1169 an iSCSI/iSER connection. The iSER Layer MUST deallocate the 1170 connection and task resources associated with the terminated 1171 connection before the invocation of this Operational Primitive. 1172 Note that the Connection_Terminate_Notify Operational Primitive is 1173 not invoked when the termination of the connection was earlier 1174 requested by the local iSCSI Layer. 1176 3.3 iSCSI Protocol Usage Requirements 1178 To operate in an iSER-assisted mode, the iSCSI Layers at both the 1179 initiator and the target MUST negotiate the RDMAExtensions key (see 1180 section 6.3) to "Yes" on the leading connection. If the 1181 RDMAExtensions key is not negotiated to "Yes", then iSER-assisted 1182 mode MUST NOT be used. If the RDMAExtensions key is negotiated to 1183 "Yes" but the invocation of the Allocate_Connection_Resources 1184 Operational Primitive to the iSER layer fails, the iSCSI layer MUST 1185 fail the iSCSI Login process or terminate the connection as 1186 appropriate. See section 10.1.3.1 for details. 1188 If the RDMAExtensions key is negotiated to "Yes", the iSCSI Layer 1189 MUST satisfy the following protocol usage requirements from the iSER 1190 protocol: 1192 1. The iSCSI Layer at the initiator MUST set ExpDataSN to 0 in Task 1193 Management Function Requests for Task Allegiance Reassignment 1194 for read/bidirectional commands, so as to cause the target to 1195 send all unacknowledged read data. 1197 2.
The iSCSI Layer at the target MUST always return the SCSI status 1198 in a separate SCSI Response PDU for read commands, i.e., there 1199 MUST NOT be a "phase collapse" in concluding a SCSI Read 1200 Command. 1202 3. The iSCSI Layers at both the initiator and the target MUST 1203 support the keys as defined in section 6 on Login/Text 1204 Operational Keys. If used as specified, these keys MUST NOT be 1205 answered with NotUnderstood and the semantics as defined MUST be 1206 followed for each iSER-assisted connection. 1208 4. The iSCSI Layer at the initiator MUST NOT issue SNACKs for PDUs. 1210 4 Lower Layer Interface Requirements 1212 4.1 Interactions with the RCaP Layer 1214 The iSER protocol layer is layered on top of an RCaP layer (see 1215 Figure 1) and the following are the key features that are assumed to 1216 be supported by any RCaP layer: 1218 * The RCaP layer supports all basic RDMA operations, including RDMA 1219 Write Operation, RDMA Read Operation, and Send Operation. 1221 * The RCaP layer provides reliable, in-order message delivery and 1222 direct data placement. 1224 * When the iSER Layer initiates an RDMA Read Operation following an 1225 RDMA Write Operation on one RCaP Stream, the RDMA Read Response 1226 Message processing on the remote node will be started only after 1227 the preceding RDMA Write Message payload is placed in the memory 1228 of the remote node. 1230 * The RCaP layer encapsulates a single iSER Message into a single 1231 RCaP Message on the Data Source side. The RCaP layer 1232 decapsulates the iSER Message before delivering it to the iSER 1233 Layer on the Data Sink side. 1235 * For a RCaP layer that supports the Send with Invalidate Message 1236 (e.g., iWARP), when the iSER Layer provides the STag to be 1237 remotely invalidated to the RCaP layer for a Send with Invalidate 1238 Message, the RCaP layer uses this STag as the STag to be 1239 invalidated in the Send with Invalidate Message. 
1241 * The RCaP layer uses the STag and Tagged Offset provided by the 1242 iSER Layer for the RDMA Write and RDMA Read Request Messages. 1244 * When the RCaP layer delivers the content of an RDMA Send Message 1245 to the iSER Layer, the RCaP layer provides the length of the RDMA 1246 Send message. This ensures that the iSER Layer does not have to 1247 carry a length field in the iSER header. 1249 * When the RCaP layer delivers the Send Message to the iSER Layer, 1250 it notifies the iSER Layer with the mechanism provided on that 1251 interface. 1253 * For a RCaP layer that supports the Send with Invalidate Message 1254 (e.g., iWARP), when the RCaP layer delivers a Send with 1255 Invalidate Message to the iSER Layer, it passes the value of the 1256 STag that was invalidated. 1258 * The RCaP layer propagates all status and error indications to the 1259 iSER Layer. 1261 * For a transport layer that operates in byte stream mode such as 1262 TCP, the RCaP implementation supports the enabling of the RDMA 1263 mode after Connection establishment and the exchange of Login 1264 parameters in byte stream mode. For a transport layer that 1265 provides message delivery capability such as [IB], the RCaP 1266 implementation supports the use of the messaging capability by 1267 the iSCSI Layer directly for the Login phase after connection 1268 establishment before enabling iSER-assisted mode. (In the 1269 specific example of InfiniBand [IB], the iSCSI Layer uses IB 1270 messages to transfer iSCSI PDUs for the Login phase after 1271 connection establishment before enabling iSER-assisted mode.) 1273 * Whenever the iSER Layer terminates the RCaP Stream, the RCaP 1274 layer terminates the associated Connection. 1276 4.2 Interactions with the Transport Layer 1278 After the iSER connection is established, the RCaP layer and the 1279 underlying transport layer are responsible for maintaining the 1280 Connection and reporting to the iSER Layer any Connection failures. 
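Two behaviours described earlier interact at the initiator. The RCaP layer passes up the STag that was invalidated by a Send with Invalidate Message (section 4.1), and the iSER Layer must check that the invalidation occurred and perform it itself if it did not (section 2.5.1). A minimal sketch of that check follows; the function name, the set of currently valid STags, and the `invalidate_local` callback are hypothetical.

```python
from typing import Callable, Optional, Set


def complete_task_stag(
    advertised_stag: int,
    delivered_invalidated_stag: Optional[int],
    valid_stags: Set[int],
    invalidate_local: Callable[[int], None],
) -> bool:
    """Ensure the Advertised STag is invalid once the SCSI Response PDU arrives.

    delivered_invalidated_stag is the STag reported by the RCaP layer for a
    Send with Invalidate Message, or None when a plain Send was received.
    Returns True if a local invalidation had to be performed.
    """
    remotely_done = (
        delivered_invalidated_stag == advertised_stag
        and advertised_stag not in valid_stags
    )
    if remotely_done:
        return False
    # Plain Send, a different STag, or invalidation not observed: do it locally.
    invalidate_local(advertised_stag)
    return True
```

The fallback branch covers the case noted in section 2.5.1 where the peer used a Send Message instead of Send with Invalidate.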
1282 5 Connection Setup and Termination 1284 5.1 iSCSI/iSER Connection Setup 1286 During connection setup, the iSCSI Layer at the initiator is 1287 responsible for establishing a connection with the target. After 1288 the connection is established, the iSCSI Layers at the initiator and 1289 the target enter the Login Phase using the same rules as outlined in 1290 [iSCSI]. The connection transitions into the iSCSI full feature 1291 phase in iSER-assisted mode following a successful login negotiation 1292 between the initiator and the target in which iSER-assisted mode is 1293 negotiated and the connection resources necessary to support RCaP 1294 have been allocated at both the initiator and the target. The same 1295 connection MUST be used for both the iSCSI Login phase and the 1296 subsequent iSER-assisted full feature phase. 1298 For a transport layer that operates in byte stream mode such as TCP, 1299 the RCaP implementation supports the enabling of the RDMA mode after 1300 Connection establishment and the exchange of Login parameters in 1301 byte stream mode. For a transport layer that provides message 1302 delivery capability such as [IB], the RCaP implementation supports 1303 the use of the messaging capability by the iSCSI Layer directly for 1304 the Login phase after connection establishment before enabling iSER- 1305 assisted mode. 1307 iSER-assisted mode MUST NOT be enabled unless it is negotiated on 1308 the leading connection during the LoginOperationalNegotiation Stage 1309 of the iSCSI Login Phase. iSER-assisted mode is negotiated using 1310 the RDMAExtensions=<boolean-value> key. Both the initiator and the 1311 target MUST exchange the RDMAExtensions key with the value set to 1312 "Yes" to enable iSER-assisted mode. If both the initiator and the 1313 target fail to negotiate the RDMAExtensions key set to "Yes", then 1314 the connection MUST continue with the login semantics as defined in 1315 [iSCSI].
If the RDMAExtensions key is not negotiated to Yes, then 1316 for some RCaP implementation (such as [IB]), the existing connection 1317 may need to be torn down and a new connection may need to be 1318 established in TCP capable mode. (For InfiniBand this will require 1319 an [IPoIB] type connection.) 1321 iSER-assisted mode is defined for a Normal session only and the 1322 RDMAExtensions key MUST NOT be negotiated for a Discovery session. 1323 Discovery sessions are always conducted using the transport layer as 1324 described in [iSCSI]. 1326 An iSER enabled node is not required to initiate the RDMAExtensions 1327 key exchange if its preference is for the Traditional iSCSI mode. 1328 The RDMAExtensions key, if offered, MUST be sent in the first 1329 available Login Response or Login Request PDU in the 1330 LoginOperationalNegotiation stage. This is due to the fact that the 1331 value of some login parameters might depend on whether iSER-assisted 1332 mode is enabled or not. 1334 iSER-assisted mode is a session-wide attribute. If both the 1335 initiator and the target negotiated RDMAExtensions="Yes" on the 1336 leading connection of a session, then all subsequent connections of 1337 the same session MUST enable iSER-assisted mode without having to 1338 exchange RDMAExtensions key during the iSCSI Login Phase. 1339 Conversely, if both the initiator and the target failed to negotiate 1340 RDMAExtensions to "Yes" on the leading connection of a session, then 1341 the RDMAExtensions key MUST NOT be negotiated further on any 1342 additional subsequent connection of the session. 1344 When the RDMAExtensions key is negotiated to "Yes", the HeaderDigest 1345 and the DataDigest keys MUST be negotiated to "None" on all 1346 iSCSI/iSER connections participating in that iSCSI session. This is 1347 because, for an iSCSI/iSER connection, RCaP is responsible for 1348 providing error detection that is at least as good as a 32-bit CRC 1349 for all iSER Messages. 
Furthermore, all SCSI Read data are sent 1350 using RDMA Write Messages instead of the SCSI Data-in PDUs, and all 1351 solicited SCSI write data are sent using RDMA Read Response Messages 1352 instead of the SCSI Data-out PDUs. HeaderDigest and DataDigest 1353 which apply to iSCSI PDUs would not be appropriate for RDMA Read and 1354 RDMA Write operations used with iSER. 1356 5.1.1 Initiator Behavior 1358 If the outcome of the iSCSI negotiation is to enable iSER-assisted 1359 mode, then on the initiator side, prior to sending the Login Request 1360 with the T (Transit) bit set to 1 and the NSG (Next Stage) field set 1361 to FullFeaturePhase, the iSCSI Layer SHOULD request the iSER Layer 1362 to allocate the connection resources necessary to support RCaP by 1363 invoking the Allocate_Connection_Resources Operational Primitive. 1364 The connection resources required are defined by implementation and 1365 are outside the scope of this specification. The iSCSI Layer may 1366 invoke the Notice_Key_Values Operational Primitive before invoking 1367 the Allocate_Connection_Resources Operational Primitive to request 1368 the iSER Layer to take note of the negotiated values of the iSCSI 1369 keys for the Connection. The specific keys to be passed in as input 1370 qualifiers are implementation dependent. These may include, but are 1371 not limited to, MaxOutstandingR2T, ErrorRecoveryLevel, etc. 1373 Among the connection resources allocated at the initiator is the 1374 Inbound RDMA Read Queue Depth (IRD). As described in section 9.5.1, 1375 R2Ts are transformed by the target into RDMA Read operations. IRD 1376 limits the maximum number of simultaneously incoming outstanding 1377 RDMA Read Requests per an RCaP Stream from the target to the 1378 initiator. The required value of IRD is outside the scope of the 1379 iSER specification. The iSER Layer at the initiator MUST set IRD to 1380 1 or higher if R2Ts are to be used in the connection. 
However, the 1381 iSER Layer at the initiator MAY set IRD to 0 based on implementation 1382 configuration which indicates that no R2Ts will be used on that 1383 connection. Initially, the iSER-IRD value at the initiator SHOULD 1384 be set to the IRD value at the initiator and MUST NOT be more than 1385 the IRD value. 1387 On the other hand, the Outbound RDMA Read Queue Depth (ORD) MAY be 1388 set to 0 since the iSER Layer at the initiator does not issue RDMA 1389 Read Requests to the target. 1391 Failure to allocate the requested connection resources locally 1392 results in a login failure and its handling is described in section 1393 10.1.3.1. 1395 The iSER Layer MUST return a success status to the iSCSI Layer in 1396 response to the Allocate_Connection_Resources Operational Primitive. 1398 After the target returns the Login Response with the T bit set to 1 1399 and the NSG field set to FullFeaturePhase, and a status class of 0 1400 (Success), the iSCSI Layer MUST invoke the Enable_Datamover 1401 Operational Primitive with the following qualifiers. (See section 1402 10.1.4.6 for the case when the status class is not Success.): 1404 a. Connection_Handle that identifies the iSCSI connection. 1406 b. Transport_Connection_Descriptor which identifies the 1407 specific transport connection associated with the 1408 Connection_Handle. 1410 The iSER Layer MUST send the iSER Hello Message as the first iSER 1411 Message only if iSERHelloRequired is negotiated to "Yes". See 1412 Section 5.1.3 on iSER Hello Exchange. 1414 If the iSCSI Layer on the initiator side allocates the connection 1415 resources to support RCaP only after it receives the final Login 1416 Response PDU from the target, then it may not be able to handle the 1417 number of unexpected iSCSI control-type PDUs (as declared by the 1418 MaxOutstandingUnexpectedPDUs key from the initiator) that can be 1419 sent by the target before the buffer resources are allocated at the 1420 initiator side. 
In this case the iSERHelloRequired key SHOULD be 1421 negotiated to "Yes" so that the initiator can allocate the 1422 connection resources before sending the iSER Hello Message. See 1423 section 5.1.3 for more details. 1425 5.1.2 Target Behavior 1427 If the outcome of the iSCSI negotiation is to enable iSER-assisted 1428 mode, then on the target side, prior to sending the Login Response 1429 with the T (Transit) bit set to 1 and the NSG (Next Stage) field set 1430 to FullFeaturePhase, the iSCSI Layer MUST request the iSER Layer to 1431 allocate the resources necessary to support RCaP by invoking the 1432 Allocate_Connection_Resources Operational Primitive. The connection 1433 resources required are defined by implementation and are outside the 1434 scope of this specification. Optionally, the iSCSI Layer may invoke 1435 the Notice_Key_Values Operational Primitive before invoking the 1436 Allocate_Connection_Resources Operational Primitive to request the 1437 iSER Layer to take note of the negotiated values of the iSCSI keys 1438 for the Connection. The specific keys to be passed in as input 1439 qualifiers are implementation dependent. These may include, but are not 1440 limited to, MaxOutstandingR2T, ErrorRecoveryLevel, etc. 1442 Premature allocation of RCaP connection resources can expose an iSER 1443 target to a resource exhaustion attack on those resources via 1444 multiple iSER connections that progress only to the point at which 1445 the implementation allocates the RCaP connection resources. The 1446 countermeasure for this attack is initiator authentication; the 1447 iSCSI Layer MUST NOT request the iSER Layer to allocate the 1448 connection resources necessary to support RCaP until the iSCSI layer 1449 is sufficiently far along in the iSCSI Login Phase that it is 1450 reasonably certain that the peer side is not an attacker.
In 1451 particular, if the Login Phase includes a SecurityNegotiation stage, 1452 the iSCSI Layer MUST defer the connection resource allocation (i.e. 1453 invoking the Allocate_Connection_Resources Operational Primitive) to 1454 the LoginOperationalNegotiation stage ([iSCSI]) so that the resource 1455 allocation occurs after the authentication phase is completed. 1457 Among the connection resources allocated at the target is the 1458 Outbound RDMA Read Queue Depth (ORD). As described in section 1459 9.5.1, R2Ts are transformed by the target into RDMA Read operations. 1460 The ORD limits the maximum number of simultaneously outstanding RDMA 1461 Read Requests per RCaP Stream from the target to the initiator. 1462 Initially, the iSER-ORD value at the target SHOULD be set to the ORD 1463 value at the target. 1465 On the other hand, the IRD at the target MAY be set to 0 since the 1466 iSER Layer at the target does not expect RDMA Read Requests to be 1467 issued by the initiator. 1469 Failure to allocate the requested connection resources locally 1470 results in a login failure and its handling is described in section 1471 10.1.3.1. 1473 If the iSER Layer at the target is successful in allocating the 1474 connection resources necessary to support RCaP, the following events 1475 MUST occur in the specified sequence: 1477 1. The iSER Layer MUST return a success status to the iSCSI Layer 1478 in response to the Allocate_Connection_Resources Operational 1479 Primitive. 1481 2. The iSCSI Layer MUST invoke the Enable_Datamover Operational 1482 Primitive with the following qualifiers: 1484 a. Connection_Handle that identifies the iSCSI connection. 1486 b. Transport_Connection_Descriptor which identifies the 1487 specific transport connection associated with the 1488 Connection_Handle. 1490 c. The final transport layer (e.g. TCP) message containing the 1491 Login Response with the T bit set to 1 and the NSG field set 1492 to FullFeaturePhase 1494 3. 
The iSER Layer MUST send the final Login Response PDU in the 1495 native transport mode to conclude the iSCSI Login Phase. If the 1496 underlying transport is TCP, then the iSER Layer MUST send the 1497 final Login Response PDU in byte stream mode. 1499 4. After receiving the iSER Hello Message from the initiator, the 1500 iSER Layer MUST respond with the iSER HelloReply Message to be 1501 sent as the first iSER Message if iSERHelloRequired is 1502 negotiated to "Yes". If the iSER layer receives an iSER Hello 1503 Message when iSERHelloRequired is negotiated to "No", then this 1504 MUST be treated as an iSER protocol error. See section 5.1.3 on 1505 iSER Hello Exchange for more details. 1507 Note: In the above sequence, the operations as described in bullets 1508 3 and 4 MUST be performed atomically for iWARP connections. Failure 1509 to do this may result in race conditions. 1511 5.1.3 iSER Hello Exchange 1513 If iSERHelloRequired is negotiated to "Yes", the first iSER Message 1514 sent by the iSER Layer at the initiator to the target MUST be the 1515 iSER Hello Message. The iSER Hello Message is used by the iSER 1516 Layer at the initiator to declare iSER parameters to the target. 1517 See section 9.3 on iSER Header Format for iSER Hello Message. 1518 Conversely, if iSERHelloRequired is negotiated to "No", then the 1519 iSER Layer at the initiator MUST NOT send an iSER Hello Message. 1521 In response to the iSER Hello Message, the iSER Layer at the target 1522 MUST return the iSER HelloReply Message as the first iSER Message 1523 sent by the target if iSERHelloRequired is negotiated to "Yes". The 1524 iSER HelloReply Message is used by the iSER Layer at the target to 1525 declare iSER parameters to the initiator. See section 9.4 on iSER 1526 Header Format for iSER HelloReply Message. If the iSER layer 1527 receives an iSER Hello Message when iSERHelloRequired is negotiated 1528 to "No", then this MUST be treated as an iSER protocol error. 
See 1529 section 10.1.3.4 on iSER Protocol Errors for more details. 1531 In the iSER Hello Message, the iSER Layer at the initiator declares 1532 the iSER-IRD value to the target. 1534 Upon receiving the iSER Hello Message, the iSER Layer at the target 1535 MUST set the iSER-ORD value to the minimum of the iSER-ORD value at 1536 the target and the iSER-IRD value declared by the initiator. The 1537 iSER Layer at the target MAY adjust (lower) its ORD value to match 1538 the iSER-ORD value if the iSER-ORD value is smaller than the ORD 1539 value at the target in order to free up the unused resources. 1541 In the iSER HelloReply Message, the iSER Layer at the target 1542 declares the iSER-ORD value to the initiator. 1544 Upon receiving the iSER HelloReply Message, the iSER Layer at the 1545 initiator MAY adjust (lower) its IRD value to match the iSER-ORD 1546 value in order to free up the unused resources, if the iSER-ORD 1547 value declared by the target is smaller than the iSER-IRD value 1548 declared by the initiator. 1550 It is an iSER level negotiation failure if the iSER parameters 1551 declared in the iSER Hello Message by the initiator are unacceptable 1552 to the target. This includes the following: 1554 * The initiator-declared iSER-IRD value is greater than 0 and the 1555 target-declared iSER-ORD value is 0. 1557 * The initiator-supported and the target-supported iSER protocol 1558 versions do not overlap. 1560 See section 10.1.3.2 on the handling of the error situation. 1562 An initiator that conforms to [RFC5046] allocates connection 1563 resources before sending the Login Request with the T (Transit) bit 1564 set to 1 and the NSG (Next Stage) field set to FullFeaturePhase. 1565 (For brevity, this is referred to as "early" connection allocation.) 1566 The current iSER specification relaxes this requirement to allow an 1567 initiator to allocate connection resources after it receives the 1568 final Login Response PDU from the target.
(For brevity, this is 1569 referred to as "late" connection allocation.) An initiator that 1570 employs "late" connection allocation may encounter problems (e.g., 1571 RCaP connection closure) with a target that sends unexpected iSCSI 1572 PDUs immediately upon transitioning to Full Feature Phase, as 1573 allowed by the negotiated value of the MaxOutstandingUnexpectedPDUs 1574 key. The only way to prevent this situation in full generality is 1575 to use iSER Hello Messages, as they enable the initiator to allocate 1576 its connection resources before sending its iSER Hello Message. The 1577 iSERHelloRequired key is used by the initiator to determine if it is 1578 dealing with a target that supports the iSER Hello exchanges. 1579 Fortunately, known iSER target implementations do not take full 1580 advantage of the number of allowed unexpected PDUs immediately upon 1581 transitioning into full feature phase, enabling an initiator 1582 workaround that involves a smaller quantity of connection resources 1583 prior to full-feature phase, as explained further below. 1585 In the following summary where "late" connection allocation is 1586 practised, an initiator that follows [RFC5046] is referred to as an 1587 "old" initiator; otherwise it is referred to as a "new" initiator. 1588 Similarly, a target that does not support the iSERHelloRequired key 1589 (and responds with "NotUnderstood" when negotiating the 1590 iSERHelloRequired key) is referred to as an "old" target; otherwise 1591 it is referred to as a "new" target. Note that an "old" target can 1592 still support the iSER Hello exchanges but this fact is not known by 1593 the initiator. A "new" target can also respond with "No" when 1594 negotiating the iSERHelloRequired key. In this case its behavior 1595 with respect to "late" connection allocation is similar to an "old" 1596 target. 1598 A "new" initiator will work fine with a "new" target.
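The "old"/"new" target terminology defined above can be illustrated with a small classifier. This is a minimal sketch only, assuming the initiator already holds the target's answer to the iSERHelloRequired key as a text value; the function names classify_target and behaves_like_old_target are hypothetical and are not defined by this specification.

```python
def classify_target(response):
    """Classify a target from its answer to the iSERHelloRequired key.

    `response` is assumed to be the value returned by the target:
    "Yes", "No", or "NotUnderstood".
    """
    if response == "NotUnderstood":
        return "old"   # the target does not support the key
    if response in ("Yes", "No"):
        return "new"   # the target understands the key
    raise ValueError("unexpected iSERHelloRequired response: %r" % (response,))


def behaves_like_old_target(response):
    """True when "late" connection allocation must be handled as for an
    "old" target: either the key was not understood, or a "new" target
    answered "No"."""
    return response in ("No", "NotUnderstood")
```

Keeping the two questions separate reflects the text above: a "new" target that answers "No" is still a "new" target, but with respect to "late" connection allocation it is treated like an "old" one.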
1600 For an "old" initiator and an "old" target, the failure by the 1601 initiator to handle the number of unexpected iSCSI control-type PDUs 1602 that are sent by the target before the buffer resources are 1603 allocated at the initiator can result in the failure of the iSER 1604 session caused by closure of the underlying RCaP connection. For 1605 the "old" target, there is a known implementation that sends one 1606 unexpected iSCSI control-type PDU after sending the final Login 1607 Response and then waits a while before sending the next one. This 1608 tends to alleviate somewhat the buffer allocation problem at the 1609 initiator. 1611 For a "new" initiator and an "old" target, the failure by the 1612 initiator to handle the number of unexpected iSCSI control-type PDUs 1613 that are sent by the target before the buffer resources are 1614 allocated at the initiator can result in the failure of the iSER 1615 session caused by closure of the underlying RCaP connection. A 1616 "new" initiator MAY choose to terminate the connection; otherwise it 1617 SHOULD do one of the following: 1619 1. Allocate the connection resources before sending the final Login 1620 Request PDU. 1622 2. Allocate one or more buffers for receiving unexpected control- 1623 type PDUs from the target before sending the final Login Request 1624 PDU. This reduces the possibility of the unexpected control-type 1625 PDUs causing the RCaP connection to close before the connection 1626 resources have been allocated. 1628 For an "old" initiator and a "new" target, if the iSERHelloRequired 1629 key is not negotiated, a "new" target MUST still respond with the 1630 iSER HelloReply Message when it receives the iSER Hello Message. If 1631 the iSERHelloRequired key is negotiated to "No" or "NotUnderstood", 1632 a "new" target MAY choose to terminate the connection; otherwise it 1633 SHOULD delay sending any unexpected control-type PDUs until one of 1634 the following events has occurred: 1636 1.
A PDU is received from the initiator after it sends the final 1637 Login Response PDU. 1639 2. A system configurable timeout period, say one second, has 1640 expired. 1642 5.2 iSCSI/iSER Connection Termination 1644 5.2.1 Normal Connection Termination at the Initiator 1646 The iSCSI Layer at the initiator terminates an iSCSI/iSER connection 1647 normally by invoking the Send_Control Operational Primitive 1648 qualified with the Logout Request PDU. The iSER Layer at the 1649 initiator MUST use a Send Message to send the Logout Request PDU to 1650 the target. The SendSE Message should be used if supported by the 1651 RCaP layer (e.g., iWARP). After the iSER Layer at the initiator 1652 receives the Send Message containing the Logout Response PDU from 1653 the target, it MUST notify the iSCSI Layer by invoking the 1654 Control_Notify Operational Primitive qualified with the Logout 1655 Response PDU. 1657 After the iSCSI logout process is complete, the iSCSI layer at the 1658 target is responsible for closing the iSCSI/iSER connection as 1659 described in Section 5.2.2. After the RCaP layer at the initiator 1660 reports that the Connection has been closed, the iSER Layer at the 1661 initiator MUST deallocate all connection and task resources (if any) 1662 associated with the connection, invalidate the Local Mappings (if 1663 any) before notifying the iSCSI Layer by invoking the 1664 Connection_Terminate_Notify Operational Primitive. 1666 5.2.2 Normal Connection Termination at the Target 1668 Upon receiving the Send Message containing the Logout Request PDU, 1669 the iSER Layer at the target MUST notify the iSCSI Layer at the 1670 target by invoking the Control_Notify Operational Primitive 1671 qualified with the Logout Request PDU. The iSCSI Layer completes 1672 the logout process by invoking the Send_Control Operational 1673 Primitive qualified with the Logout Response PDU. 
The iSER Layer at 1674 the target MUST use a Send Message to send the Logout Response PDU 1675 to the initiator. The SendSE Message should be used if supported by 1676 the RCaP layer (e.g., iWARP). After the iSCSI logout process is 1677 complete, the iSCSI Layer at the target MUST request the iSER Layer 1678 at the target to terminate the RCaP Stream by invoking the 1679 Connection_Terminate Operational Primitive. 1681 As part of the termination process, the RCaP layer MUST close the 1682 Connection. When the RCaP layer notifies the iSER Layer after the 1683 RCaP Stream and the associated Connection are terminated, the iSER 1684 Layer MUST deallocate all connection and task resources (if any) 1685 associated with the connection, and invalidate the Local and Remote 1686 Mappings (if any). 1688 5.2.3 Termination without Logout Request/Response PDUs 1690 5.2.3.1 Connection Termination Initiated by the iSCSI Layer 1692 The Connection_Terminate Operational Primitive MAY be invoked by the 1693 iSCSI Layer to request the iSER Layer to terminate the RCaP Stream 1694 without having previously exchanged the Logout Request and Logout 1695 Response PDUs between the two iSCSI/iSER nodes. As part of the 1696 termination process, the RCaP layer will close the Connection. When 1697 the RCaP layer notifies the iSER Layer after the RCaP Stream and the 1698 associated Connection are terminated, the iSER Layer MUST perform 1699 the following actions. 1701 If the Connection_Terminate Operational Primitive is invoked by the 1702 iSCSI Layer at the target, then the iSER Layer at the target MUST 1703 deallocate all connection and task resources (if any) associated 1704 with the connection, and invalidate the Local and Remote Mappings 1705 (if any). 
1707 If the Connection_Terminate Operational Primitive is invoked by the 1708 iSCSI Layer at the initiator, then the iSER Layer at the initiator 1709 MUST deallocate all connection and task resources (if any) 1710 associated with the connection, and invalidate the Local Mappings 1711 (if any). 1713 5.2.3.2 Connection Termination Notification to the iSCSI Layer 1715 If the iSCSI/iSER connection is terminated without the invocation of 1716 Connection_Terminate from the iSCSI Layer, the iSER Layer MUST 1717 notify the iSCSI Layer that the iSCSI/iSER connection has been 1718 terminated by invoking the Connection_Terminate_Notify Operational 1719 Primitive. 1721 Prior to invoking Connection_Terminate_Notify, the iSER Layer at the 1722 target MUST deallocate all connection and task resources (if any) 1723 associated with the connection, and invalidate the Local and Remote 1724 Mappings (if any). 1726 Prior to invoking Connection_Terminate_Notify, the iSER Layer at the 1727 initiator MUST deallocate all connection and task resources (if any) 1728 associated with the connection, and invalidate the Local Mappings 1729 (if any). 1731 If the remote iSCSI/iSER node initiated the closing of the 1732 Connection (e.g., by sending a TCP FIN or TCP RST), the iSER Layer 1733 MUST notify the iSCSI Layer after the RCaP layer reports that the 1734 Connection is closed by invoking the Connection_Terminate_Notify 1735 Operational Primitive. 1737 Another example of a Connection termination without a preceding 1738 logout is when the iSCSI Layer at the initiator does an implicit 1739 logout (connection reinstatement). 1741 6 Login/Text Operational Keys 1743 Certain iSCSI login/text operational keys have restricted usage in 1744 iSER, and additional keys are used to support the iSER protocol 1745 functionality. All other keys defined in [iSCSI] and not discussed 1746 in this section may be used on iSCSI/iSER connections with the same 1747 semantics. 
1749 6.1 HeaderDigest and DataDigest 1751 Irrelevant when: RDMAExtensions=Yes 1753 A negotiation resulting in RDMAExtensions=Yes for a session implies 1754 HeaderDigest=None and DataDigest=None for all connections in that 1755 session and overrides both the default and an explicit setting. 1757 6.2 MaxRecvDataSegmentLength 1759 For an iSCSI connection belonging to a session in which 1760 RDMAExtensions=Yes was negotiated on the leading connection of the 1761 session, MaxRecvDataSegmentLength need not be declared in the Login 1762 Phase, and MUST be ignored if it is declared. Instead, the 1763 InitiatorRecvDataSegmentLength (as described in section 6.5) and 1764 TargetRecvDataSegmentLength (as described in section 6.4) keys are 1765 negotiated. The values of the local and remote 1766 MaxRecvDataSegmentLength are derived from the 1767 InitiatorRecvDataSegmentLength and TargetRecvDataSegmentLength keys. 1769 In the full feature phase, the initiator MUST consider the value of 1770 its local MaxRecvDataSegmentLength (that it would have declared to 1771 the target) as having the value of InitiatorRecvDataSegmentLength, 1772 and the value of the remote MaxRecvDataSegmentLength (that would 1773 have been declared by the target) as having the value of 1774 TargetRecvDataSegmentLength. Similarly, the target MUST consider 1775 the value of its local MaxRecvDataSegmentLength (that it would have 1776 declared to the initiator) as having the value of 1777 TargetRecvDataSegmentLength, and the value of the remote 1778 MaxRecvDataSegmentLength (that would have been declared by the 1779 initiator) as having the value of InitiatorRecvDataSegmentLength. 1781 Note that RFC 3720 requires that when a target receives a NOP-Out 1782 request with a valid Initiator Task Tag, it responds with a NOP-In 1783 with the same Initiator Task Tag that was provided in the NOP-Out 1784 request. Furthermore, it returns the first MaxRecvDataSegmentLength 1785 bytes of the initiator provided Ping Data.
Since there is no 1786 MaxRecvDataSegmentLength common to the initiator and the target in 1787 iSER, the length of the data sent with the NOP-Out request MUST NOT 1788 exceed InitiatorMaxRecvDataSegmentLength. 1790 The MaxRecvDataSegmentLength key is applicable only for iSCSI 1791 control-type PDUs. 1793 6.3 RDMAExtensions 1795 Use: LO (leading only) 1797 Senders: Initiator and Target 1799 Scope: SW (session-wide) 1801 RDMAExtensions= 1803 Irrelevant when: SessionType=Discovery 1805 Default is No 1807 Result function is AND 1809 This key is used by the initiator and the target to negotiate the 1810 support for iSER-assisted mode. To enable the use of iSER-assisted 1811 mode, both the initiator and the target MUST exchange 1812 RDMAExtensions=Yes. iSER-assisted mode MUST NOT be used if either 1813 the initiator or the target offers RDMAExtensions=No. 1815 An iSER-enabled node is not required to initiate the RDMAExtensions 1816 key exchange if it prefers to operate in the Traditional iSCSI mode. 1817 However, if the RDMAExtensions key is to be negotiated, an initiator 1818 MUST offer the key in the first Login Request PDU in the 1819 LoginOperationalNegotiation stage of the leading connection, and a 1820 target MUST offer the key in the first Login Response PDU with which 1821 it is allowed to do so (i.e., the first Login Response PDU issued 1822 after the first Login Request PDU with the C bit set to 0) in the 1823 LoginOperationalNegotiation stage of the leading connection. In 1824 response to the offered key=value pair of RDMAExtensions=yes, an 1825 initiator MUST respond in the next Login Request PDU with which it 1826 is allowed to do so, and a target MUST respond in the next Login 1827 Response PDU with which it is allowed to do so. 1829 Negotiating the RDMAExtensions key first enables a node to negotiate 1830 the optimal value for other keys. 
Certain iSCSI keys such as 1831 MaxBurstLength, MaxOutstandingR2T, ErrorRecoveryLevel, InitialR2T, 1832 ImmediateData, etc., may be negotiated differently depending on 1833 whether the connection is in Traditional iSCSI mode or iSER-assisted 1834 mode. 1836 6.4 TargetRecvDataSegmentLength 1838 Use: IO (Initialize only) 1840 Senders: Initiator and Target 1842 Scope: CO (connection-only) 1844 Irrelevant when: RDMAExtensions=No 1846 TargetRecvDataSegmentLength= 1848 Default is 8192 bytes 1850 Result function is minimum 1852 This key is relevant only for the iSCSI connection of an iSCSI 1853 session if RDMAExtensions=Yes was negotiated on the leading 1854 connection of the session. It is used by the initiator and the 1855 target to negotiate the maximum size of the data segment that an 1856 initiator may send to the target in an iSCSI control-type PDU in the 1857 full feature phase. For SCSI Command PDUs and SCSI Data-out PDUs 1858 containing non-immediate unsolicited data to be sent by the 1859 initiator, the initiator MUST send all non-Final PDUs with a data 1860 segment size of exactly TargetRecvDataSegmentLength whenever the 1861 PDUs constitute a data sequence whose size is larger than 1862 TargetRecvDataSegmentLength. 1864 6.5 InitiatorRecvDataSegmentLength 1866 Use: IO (Initialize only) 1868 Senders: Initiator and Target 1870 Scope: CO (connection-only) 1872 Irrelevant when: RDMAExtensions=No 1874 InitiatorRecvDataSegmentLength= 1876 Default is 8192 bytes 1878 Result function is minimum 1879 This key is relevant only for the iSCSI connection of an iSCSI 1880 session if RDMAExtensions=Yes was negotiated on the leading 1881 connection of the session. It is used by the initiator and the 1882 target to negotiate the maximum size of the data segment that a 1883 target may send to the initiator in an iSCSI control-type PDU in the 1884 full feature phase.
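The relationship between the two RecvDataSegmentLength keys and the local and remote MaxRecvDataSegmentLength values (sections 6.2, 6.4, and 6.5) can be sketched as follows. This is an illustrative sketch, not part of the specification: the function names and the "initiator"/"target" role strings are hypothetical, and the splitting helper only models the data segment sizes of a data sequence, not the PDUs themselves.

```python
def negotiate_minimum(offered, answered):
    """Result function "minimum", as used by both RecvDataSegmentLength keys."""
    return min(offered, answered)


def effective_max_recv_lengths(role, initiator_rdsl, target_rdsl):
    """Return (local, remote) MaxRecvDataSegmentLength as seen by `role`.

    initiator_rdsl and target_rdsl are the negotiated values of
    InitiatorRecvDataSegmentLength and TargetRecvDataSegmentLength.
    """
    if role == "initiator":
        return initiator_rdsl, target_rdsl
    if role == "target":
        return target_rdsl, initiator_rdsl
    raise ValueError("role must be 'initiator' or 'target'")


def split_data_sequence(total_len, target_rdsl):
    """Data segment sizes for a data sequence sent by the initiator.

    Every non-Final PDU carries exactly target_rdsl bytes; the Final PDU
    carries whatever remains.
    """
    if total_len <= 0 or target_rdsl <= 0:
        raise ValueError("lengths must be positive")
    full, rest = divmod(total_len, target_rdsl)
    sizes = [target_rdsl] * full
    if rest:
        sizes.append(rest)
    return sizes
```

For example, with TargetRecvDataSegmentLength negotiated to 8192, split_data_sequence(20000, 8192) yields two segments of 8192 bytes followed by a final segment of 3616 bytes.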
1886 6.6 OFMarker and IFMarker 1888 Irrelevant when: RDMAExtensions=Yes 1890 A negotiation resulting in RDMAExtensions=Yes for a session implies 1891 OFMarker=No and IFMarker=No for all connections in that session and 1892 overrides both the default and an explicit setting. 1894 6.7 MaxOutstandingUnexpectedPDUs 1896 Use: LO (leading only), Declarative 1898 Senders: Initiator and Target 1900 Scope: SW (session-wide) 1902 Irrelevant when: RDMAExtensions=No 1904 MaxOutstandingUnexpectedPDUs= 1907 Default is 0 1909 This key is used by the initiator and the target to declare the 1910 maximum number of outstanding "unexpected" iSCSI control-type PDUs 1911 that it can receive in the full feature phase. It is intended to 1912 allow the receiving side to determine the amount of buffer resources 1913 needed beyond the normal flow control mechanism available in iSCSI. 1914 An initiator or target should select a value such that it would not 1915 impose an unnecessary constraint on the iSCSI Layer under normal 1916 circumstances. The value of 0 is defined to indicate that the 1917 declarer has no limit on the maximum number of outstanding 1918 "unexpected" iSCSI control-type PDUs that it can receive. See 1919 sections 8.1.1 and 8.1.2 for the usage of this key. Note that iSER 1920 Hello and HelloReply Messages are not iSCSI control-type PDUs and 1921 are not affected by this key. 1923 For interoperability with implementations based on [RFC5046], this 1924 key SHOULD be negotiated because the default value of 0 in [RFC5046] 1925 is problematic for most implementations as it does not impose a 1926 bound on resources consumable by unexpected PDUs.
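The special meaning of the value 0 for MaxOutstandingUnexpectedPDUs can be sketched as a sender-side check. This is a hedged illustration only: the counter of outstanding unexpected PDUs and the function name are assumptions made for the example, and the precise accounting rules are those of sections 8.1.1 and 8.1.2, which are not reproduced here.

```python
NO_LIMIT = 0  # a declared value of 0 means the declarer imposes no limit


def may_send_unexpected_pdu(outstanding, peer_declared_max):
    """Decide whether one more "unexpected" control-type PDU may be sent.

    outstanding        -- unexpected PDUs currently outstanding at the peer
                          (hypothetical counter kept by the sender)
    peer_declared_max  -- MaxOutstandingUnexpectedPDUs declared by the peer
    """
    if outstanding < 0 or peer_declared_max < 0:
        raise ValueError("counts must be non-negative")
    if peer_declared_max == NO_LIMIT:
        return True
    return outstanding < peer_declared_max
```

The sketch makes the interoperability concern visible: when the peer leaves the key at its default of 0, the check never refuses a PDU, so the receiver has no bound on the buffers it may need.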
1928 6.8 MaxAHSLength 1930 Use: LO (leading only), Declarative 1932 Senders: Initiator and Target 1934 Scope: SW (session-wide) 1936 Irrelevant when: RDMAExtensions=No 1938 MaxAHSLength= 1940 Default is 256 1942 This key is used by the initiator and target to declare the maximum 1943 size of AHS in an iSCSI control-type PDU that it can receive in the 1944 full feature phase. It is intended to allow the receiving side to 1945 determine the amount of resources needed for receive buffering. An 1946 initiator or target should select a value such that it would not 1947 impose an unnecessary constraint on the iSCSI Layer under normal 1948 circumstances. The value of 0 is defined to indicate that the 1949 declarer has no limit on the maximum size of AHS in iSCSI control- 1950 type PDUs that it can receive. 1952 For interoperability with implementations based on [RFC5046], an 1953 initiator or target MAY terminate the connection if it anticipates 1954 MaxAHSLength to be greater than 256 and the key is not understood by 1955 its peer. 1957 6.9 TaggedBufferForSolicitedDataOnly 1959 Use: LO (leading only), Declarative 1961 Senders: Initiator 1963 Scope: SW (session-wide) 1965 RDMAExtensions= 1967 Irrelevant when: RDMAExtensions=No 1969 Default is No 1970 This key is used by the initiator to declare to the target the usage 1971 of the Write Base Offset in the iSER header of an iSCSI control-type 1972 PDU. When set to No, the Base Offset is associated with an I/O 1973 buffer that contains all the write data, including both unsolicited 1974 and solicited data. When set to Yes, the Base Offset is associated 1975 with an I/O buffer that only contains solicited data.
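The effect of TaggedBufferForSolicitedDataOnly on which part of the write data the tagged I/O buffer covers can be sketched as below. This is an illustration under stated assumptions, not a definition: it assumes that unsolicited data occupies the beginning of the write data and solicited data the remainder, and the function name and the (first_byte, length) return convention are hypothetical.

```python
def tagged_buffer_extent(total_write_len, unsolicited_len, solicited_only):
    """Return (first_byte, length) of the write data covered by the
    I/O buffer that the Write Base Offset is associated with.

    solicited_only -- True for TaggedBufferForSolicitedDataOnly=Yes,
                      False for TaggedBufferForSolicitedDataOnly=No
    """
    if not 0 <= unsolicited_len <= total_write_len:
        raise ValueError("unsolicited_len must be within the write data")
    if solicited_only:
        # Buffer holds only the solicited part (assumed to follow the
        # unsolicited part).
        return unsolicited_len, total_write_len - unsolicited_len
    # Buffer holds all write data, unsolicited and solicited.
    return 0, total_write_len
```

For a 65536-byte write with 8192 bytes of unsolicited data, the sketch gives (0, 65536) when the key is No and (8192, 57344) when it is Yes.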
1977 6.10 iSERHelloRequired 1979 Use: LO (leading only), Declarative 1981 Senders: Initiator 1983 Scope: SW (session-wide) 1985 RDMAExtensions= 1987 Irrelevant when: RDMAExtensions=No 1989 Default is No 1991 This key is relevant only for the iSCSI connection of an iSCSI 1992 session if RDMAExtensions=Yes was negotiated on the leading 1993 connection of the session. It is used by the initiator to declare to 1994 the target if the iSER Hello Exchange is required. When set to Yes, 1995 the iSER layers MUST perform the iSER Hello Exchange as described in 1996 section 5.1.3. When set to No, the iSER layers MUST NOT perform the iSER 1997 Hello Exchange. 1999 7 iSCSI PDU Considerations 2001 When a connection is in the iSER-assisted mode, two types of message 2002 transfers are allowed between the iSCSI Layer at the initiator and 2003 the iSCSI Layer at the target. These are known as the iSCSI data- 2004 type PDUs and the iSCSI control-type PDUs and these terms are 2005 described in the following sections. 2007 7.1 iSCSI Data-Type PDU 2009 An iSCSI data-type PDU is defined as an iSCSI PDU that causes data 2010 transfer, transparent to the remote iSCSI layer, to take place 2011 between the peer iSCSI nodes in the full feature phase of an 2012 iSCSI/iSER connection. An iSCSI data-type PDU, when requested for 2013 transmission by the iSCSI Layer in the sending node, results in the 2014 data being transferred without the participation of the iSCSI Layers 2015 at the sending and the receiving nodes. This is due to the fact 2016 that the PDU itself is not delivered as-is to the iSCSI Layer in the 2017 receiving node. Instead, the data transfer operations are 2018 transformed into the appropriate RDMA operations which are handled 2019 by the RDMA-Capable Controller. The set of iSCSI data-type PDUs 2020 consists of SCSI Data-in PDUs and R2T PDUs.
2022 If the invocation of the Operational Primitive by the iSCSI Layer to 2023 request the iSER Layer to process an iSCSI data-type PDU is 2024 qualified with Notify_Enable set, then upon completing the RDMA 2025 operation, the iSER Layer at the target MUST notify the iSCSI Layer 2026 at the target by invoking the Data_Completion_Notify Operational 2027 Primitive qualified with ITT and SN. There is no data completion 2028 notification at the initiator since the RDMA operations are 2029 completely handled by the RDMA-Capable Controller at the initiator 2030 and the iSER Layer at the initiator is not involved with the data 2031 transfer associated with iSCSI data-type PDUs. 2033 If the invocation of the Operational Primitive by the iSCSI Layer to 2034 request the iSER Layer to process an iSCSI data-type PDU is 2035 qualified with Notify_Enable cleared, then upon completing the RDMA 2036 operation, the iSER Layer at the target MUST NOT notify the iSCSI 2037 Layer at the target and MUST NOT invoke the Data_Completion_Notify 2038 Operational Primitive. 2040 If an operation associated with an iSCSI data-type PDU fails for any 2041 reason, the contents of the Data Sink buffers associated with the 2042 operation are considered indeterminate. 2044 7.2 iSCSI Control-Type PDU 2046 Any iSCSI PDU that is not an iSCSI data-type PDU and also not a SCSI 2047 Data-out PDU carrying solicited data is defined as an iSCSI control- 2048 type PDU. The iSCSI Layer invokes the Send_Control Operational 2049 Primitive to request the iSER Layer to process an iSCSI control-type 2050 PDU. iSCSI control-type PDUs are transferred using Send Messages of 2051 RCaP. Specifically, it is to be noted that SCSI Data-Out PDUs 2052 carrying unsolicited data are defined as iSCSI control-type PDUs. 2053 See section 7.3.4 on the treatment of SCSI Data-out PDUs. 
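The definitions of data-type and control-type PDUs given in sections 7.1 and 7.2 can be summarized in a short sketch. The enum, the function name, and the use of PDU names as strings are illustrative assumptions; the sketch only restates the definitions and does not model which PDUs are permitted in the full feature phase.

```python
from enum import Enum


class PduClass(Enum):
    DATA_TYPE = "data-type"
    CONTROL_TYPE = "control-type"
    NEITHER = "neither"  # SCSI Data-out carrying solicited data


def classify_pdu(pdu_name, carries_solicited_data=False):
    """Classify an iSCSI PDU per the definitions in sections 7.1 and 7.2."""
    # Only SCSI Data-in and R2T are transformed into RDMA operations.
    if pdu_name in ("SCSI Data-in", "R2T"):
        return PduClass.DATA_TYPE
    # A SCSI Data-out PDU carrying solicited data is excluded from both sets.
    if pdu_name == "SCSI Data-out" and carries_solicited_data:
        return PduClass.NEITHER
    # Everything else, including unsolicited SCSI Data-out, is control-type.
    return PduClass.CONTROL_TYPE
```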
2055 When the iSER Layer receives an iSCSI control-type PDU, it MUST 2056 notify the iSCSI Layer by invoking the Control_Notify Operational 2057 Primitive qualified with the iSCSI control-type PDU. 2059 7.3 iSCSI PDUs 2061 This section describes the handling of each of the iSCSI PDU types 2062 by the iSER Layer. The iSCSI Layer requests the iSER Layer to 2063 process the iSCSI PDU by invoking the appropriate Operational 2064 Primitive. A Connection_Handle MUST qualify each of these 2065 invocations. In addition, BHS and the optional AHS of the iSCSI PDU 2066 as defined in [iSCSI] MUST qualify each of the invocations. The 2067 qualifying Connection_Handle, the BHS and the AHS are not explicitly 2068 listed in the subsequent sections. 2070 7.3.1 SCSI Command 2072 Type: control-type PDU 2074 PDU-specific qualifiers (for SCSI Write or bidirectional 2075 command): ImmediateDataSize, UnsolicitedDataSize, 2076 DataDescriptorOut 2078 PDU-specific qualifiers (for SCSI Read or bidirectional 2079 command): DataDescriptorIn 2081 The iSER Layer at the initiator MUST send the SCSI command in a Send 2082 Message to the target. The SendSE Message should be used if 2083 supported by the RCaP layer (e.g., iWARP). 2085 For a SCSI Write or bidirectional command, the iSCSI Layer at the 2086 initiator MUST invoke the Send_Control Operational Primitive as 2087 follows: 2089 * If there is immediate data to be transferred for the SCSI write 2090 or bidirectional command, the qualifier ImmediateDataSize MUST be 2091 used to define the number of bytes of immediate unsolicited data 2092 to be sent with the write or bidirectional command, and the 2093 qualifier DataDescriptorOut MUST be used to define the 2094 initiator's I/O Buffer containing the SCSI Write data. 
2096 * If there is unsolicited data to be transferred for the SCSI Write 2097 or bidirectional command, the qualifier UnsolicitedDataSize MUST 2098 be used to define the number of bytes of immediate and non- 2099 immediate unsolicited data for the command. The iSCSI Layer will 2100 issue one or more SCSI Data-out PDUs for the non-immediate 2101 unsolicited data. See Section 7.3.4 on SCSI Data-out. 2103 * If there is solicited data to be transferred for the SCSI Write 2104 or bidirectional command, as indicated by the Expected Data 2105 Transfer Length in the SCSI Command PDU exceeding the value of 2106 UnsolicitedDataSize, the iSER Layer at the initiator MUST do the 2107 following: 2109 a. It MUST allocate a Write STag for the I/O Buffer defined by 2110 the qualifier DataDescriptorOut. DataDescriptorOut 2111 describes the I/O buffer starting with the immediate 2112 unsolicited data (if any), followed by the non-immediate 2113 unsolicited data (if any) and solicited data. When 2114 TaggedBufferForSolicitedDataOnly is negotiated to No, the 2115 Base Offset is associated with this I/O Buffer. When 2116 TaggedBufferForSolicitedDataOnly is negotiated to Yes, the 2117 Base Offset is associated with an I/O Buffer that contains 2118 only solicited data. 2120 b. It MUST establish a Local Mapping that associates the 2121 Initiator Task Tag (ITT) to the Write STag. 2123 c. It MUST Advertise the Write STag and the Base Offset to the 2124 target by sending them in the iSER header of the iSER 2125 Message (the payload of the Send Message of RCaP) containing 2126 the SCSI Write or bidirectional command PDU. The SendSE 2127 Message should be used if supported by the RCaP layer (e.g., 2128 iWARP). See section 9.2 on iSER Header Format for iSCSI 2129 Control-Type PDU. 
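Steps a through c for a SCSI Write or bidirectional command can be sketched as follows. This is a minimal model under stated assumptions: the class name, the counter used as an STag allocator, and the dictionary standing in for the iSER header fields are all hypothetical and do not reflect the wire format described in section 9.2.

```python
import itertools


class InitiatorIserSketch:
    """Hypothetical model of the initiator-side iSER Layer for write commands."""

    def __init__(self):
        self._next_stag = itertools.count(1)  # stand-in for real STag allocation
        self.local_mapping = {}               # ITT -> {"write": stag, "read": stag}

    def prepare_write_command(self, itt, expected_len, unsolicited_size, base_offset):
        """Return the header fields to Advertise, or {} if no data is solicited."""
        # Solicited data exists only if the Expected Data Transfer Length
        # exceeds UnsolicitedDataSize.
        if expected_len <= unsolicited_size:
            return {}
        write_stag = next(self._next_stag)                         # step a
        self.local_mapping.setdefault(itt, {})["write"] = write_stag  # step b
        # Step c: these values travel in the iSER header of the Send Message.
        return {"write_stag": write_stag, "write_base_offset": base_offset}
```

A command with 8192 bytes expected and 4096 bytes unsolicited yields a Write STag and a Local Mapping entry; a command whose data is entirely unsolicited yields neither.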
2131 For a SCSI Read or bidirectional command, the iSCSI Layer at the 2132 initiator MUST invoke the Send_Control Operational Primitive 2133 qualified with DataDescriptorIn which defines the initiator's I/O 2134 Buffer for receiving the SCSI Read data. The iSER Layer at the 2135 initiator MUST do the following: 2137 a. It MUST allocate a Read STag for the I/O Buffer and note the 2138 Base Offset for this I/O Buffer. 2140 b. It MUST establish a Local Mapping that associates the 2141 Initiator Task Tag (ITT) to the Read STag. 2143 c. It MUST Advertise the Read STag and the Base Offset to the 2144 target by sending them in the iSER header of the iSER 2145 Message (the payload of the Send Message of RCaP) containing 2146 the SCSI Read or bidirectional command PDU. The SendSE 2147 Message should be used if supported by the RCaP layer (e.g., 2148 iWARP). See section 9.2 on iSER Header Format for iSCSI 2149 Control-Type PDU. 2151 If the amount of unsolicited data to be transferred in a SCSI 2152 Command exceeds TargetRecvDataSegmentLength, then the iSCSI Layer at 2153 the initiator MUST segment the data into multiple iSCSI control-type 2154 PDUs, with the data segment length in all PDUs generated except the 2155 last one having exactly the size TargetRecvDataSegmentLength. The 2156 data segment length of the last iSCSI control-type PDU carrying the 2157 unsolicited data can be up to TargetRecvDataSegmentLength. 2159 When the iSER Layer at the target receives the SCSI Command, it MUST 2160 establish a Remote Mapping that associates the ITT to the Base 2161 Offset(s) and the Advertised STag(s) in the iSER header. The Write 2162 STag is used by the iSER Layer at the target in handling the data 2163 transfer associated with the R2T PDU(s) as described in section 2164 7.3.6. The Read STag is used in handling the SCSI Data-in PDU(s) 2165 from the iSCSI Layer at the target as described in section 7.3.5. 
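The Remote Mapping established by the iSER Layer at the target on receipt of a SCSI Command can be sketched as a table keyed by ITT. The function name and the dictionary layout of the iSER header are assumptions made for illustration only.

```python
def establish_remote_mapping(remote_mapping, itt, iser_header):
    """Associate the ITT with the Advertised STag(s) and Base Offset(s).

    iser_header is a hypothetical dict that may carry "write_stag" and
    "write_base_offset", "read_stag" and "read_base_offset", or both pairs.
    """
    entry = {}
    for kind in ("write", "read"):
        stag = iser_header.get(kind + "_stag")
        if stag is not None:
            entry[kind] = (stag, iser_header[kind + "_base_offset"])
    remote_mapping[itt] = entry
    return entry
```

The "write" entry is later consulted when handling R2T PDUs and the "read" entry when handling SCSI Data-in PDUs.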
2167 7.3.2 SCSI Response 2169 Type: control-type PDU 2171 PDU-specific qualifiers: DataDescriptorStatus 2173 The iSCSI Layer at the target MUST invoke the Send_Control 2174 Operational Primitive qualified with DataDescriptorStatus which 2175 defines the buffer containing the sense and response information. 2176 The iSCSI Layer at the target MUST always return the SCSI status for 2177 a SCSI command in a separate SCSI Response PDU. "Phase collapse" 2178 for transferring SCSI status in a SCSI Data-in PDU MUST NOT be used. 2179 The iSER Layer at the target sends the SCSI Response PDU according 2180 to the following rules: 2182 * If no STags were Advertised by the initiator in the iSER Message 2183 containing the SCSI command PDU, then the iSER Layer at the 2184 target MUST send a Send Message containing the SCSI Response PDU. 2185 The SendSE Message should be used if supported by the RCaP layer 2186 (e.g., iWARP). 2188 * If the initiator Advertised a Read STag in the iSER Message 2189 containing the SCSI Command PDU, then the iSER Layer at the 2190 target MUST send a Send Message containing the SCSI Response PDU. 2191 The header of the Send Message MUST carry the Read STag to be 2192 invalidated at the initiator. The Send with Invalidate Message, 2193 if supported by the RCaP layer (e.g., iWARP), can be used for the 2194 automatic invalidation of the STag. 2196 * If the initiator Advertised only the Write STag in the iSER 2197 Message containing the SCSI command PDU, then the iSER Layer at 2198 the target MUST send a Send Message containing the SCSI Response 2199 PDU. The header of the Send Message MUST carry the Write STag to 2200 be invalidated at the initiator. The Send with Invalidate 2201 Message, if supported by the RCaP layer (e.g., iWARP), can be 2202 used for the automatic invalidation of the STag. 
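The three rules above reduce to a simple choice of which STag, if any, is carried in the header of the Send Message containing the SCSI Response PDU. A minimal sketch, with a hypothetical function name:

```python
def stag_to_invalidate(read_stag=None, write_stag=None):
    """Return the STag carried in the SCSI Response Send Message, or None.

    The Read STag takes precedence whenever it was Advertised; the Write
    STag is carried only when it was the sole STag Advertised.
    """
    if read_stag is not None:
        return read_stag
    return write_stag  # None when no STags were Advertised
```

When both STags were Advertised, the header carries the Read STag and the Write STag is left for the initiator to invalidate explicitly.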
2204 When the iSCSI Layer at the target invokes the Send_Control 2205 Operational Primitive to send the SCSI Response PDU, the iSER Layer 2206 at the target MUST invalidate the Remote Mapping before transferring 2207 the SCSI Response PDU to the initiator. 2209 Upon receiving a Send Message containing the SCSI Response PDU from 2210 the target, the iSER layer at the initiator MUST invalidate the 2211 STag(s) specified in the header. (If a Send with Invalidate Message 2212 is supported by the RCaP layer (e.g., iWARP) and is used to carry 2213 the SCSI Response PDU, the RCaP layer at the initiator will 2214 invalidate the STag. The iSER Layer at the initiator MUST ensure 2215 that the correct STag is invalidated. If both the Read and the 2216 Write STags were Advertised earlier by the initiator, then the iSER 2217 Layer at the initiator MUST explicitly invalidate the Write STag 2218 upon receiving the Send with Invalidate Message because the header 2219 of the Send with Invalidate Message can only carry one STag (in this 2220 case the Read STag) to be invalidated.) 2222 The iSER Layer at the initiator MUST ensure the invalidation of the 2223 STag(s) used in a command before notifying the iSCSI Layer at the 2224 initiator by invoking the Control_Notify Operational Primitive 2225 qualified with the SCSI Response. This precludes the possibility of 2226 using the STag(s) after the completion of the command thereby 2227 causing data corruption. 2229 When the iSER Layer at the initiator receives a Send Message 2230 containing the SCSI Response PDU, it SHOULD invalidate the Local 2231 Mapping. The iSER Layer MUST ensure that all local STag(s) 2232 associated with the ITT are invalidated before notifying the iSCSI 2233 Layer of the SCSI Response PDU by invoking the Control_Notify 2234 Operational Primitive qualified with the SCSI Response PDU. 
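The ordering requirement at the initiator, namely that all STags for the command are invalidated before Control_Notify is invoked, can be sketched as follows. The function name, the event tuples, and the set of valid STags are illustrative assumptions; the sketch also removes the Local Mapping, which the text above makes a SHOULD.

```python
def handle_scsi_response(itt, local_mapping, valid_stags, rcap_invalidated=None):
    """Return the ordered actions taken by the initiator's iSER Layer.

    rcap_invalidated is the STag (if any) already invalidated by the RCaP
    layer because a Send with Invalidate Message carried the response.
    """
    events = []
    stags = local_mapping.pop(itt, {})  # invalidate the Local Mapping
    for kind in ("read", "write"):
        stag = stags.get(kind)
        if stag is None:
            continue
        valid_stags.discard(stag)
        if stag != rcap_invalidated:
            # Explicit invalidation by the iSER Layer itself.
            events.append(("invalidate", kind, stag))
    # Notification happens only after every STag has been invalidated.
    events.append(("control_notify", itt))
    return events
```

For a bidirectional command whose Read STag was invalidated by a Send with Invalidate Message, the only explicit invalidation is that of the Write STag, and it precedes the notification.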
2236 7.3.3 Task Management Function Request/Response 2238 Type: control-type PDU 2240 PDU-specific qualifiers (for TMF Request): DataDescriptorOut, 2241 DataDescriptorIn 2243 The iSER Layer MUST use a Send Message to send the Task Management 2244 Function Request/Response PDU. The SendSE Message should be used if 2245 supported by the RCaP layer (e.g., iWARP). 2247 For the Task Management Function Request with the TASK REASSIGN 2248 function, the iSER Layer at the initiator MUST do the following: 2250 * It MUST use the ITT as specified in the Referenced Task Tag from 2251 the Task Management Function Request PDU to locate the existing 2252 STags (if any) in the Local Mappings. 2254 * It MUST invalidate the existing STags (if any) and the Local 2255 Mappings. 2257 * It MUST allocate a Read STag for the I/O Buffer and note the Base 2258 Offset associated with the I/O Buffer as defined by the qualifier 2259 DataDescriptorIn if the Send_Control Operational Primitive 2260 invocation is qualified with DataDescriptorIn. 2262 * It MUST allocate a Write STag for the I/O Buffer and note the 2263 Base Offset associated with the I/O Buffer as defined by the 2264 qualifier DataDescriptorOut if the Send_Control Operational 2265 Primitive invocation is qualified with DataDescriptorOut. 2267 * If STags are allocated, it MUST establish new Local Mapping(s) 2268 that associate the ITT to the allocated STag(s). 2270 * It MUST Advertise the STags and the Base Offsets, if allocated, 2271 to the target in the iSER header of the Send Message carrying the 2272 iSCSI PDU, as described in section 9.2. The SendSE Message 2273 should be used if supported by the RCaP layer (e.g., iWARP). 2275 For the Task Management Function Request with the TASK REASSIGN 2276 function for a SCSI Read or bidirectional command, the iSCSI Layer 2277 at the initiator MUST set ExpDataSN to 0 since the data transfer and 2278 acknowledgements happen transparently to the iSCSI Layer at the 2279 initiator.
This provides the flexibility to the iSCSI Layer at the 2280 target to request transmission of only the unacknowledged data as 2281 specified in [iSCSI]. 2283 When the iSER Layer at the target receives the Task Management 2284 Function Request with the TASK REASSIGN function, it MUST do the 2285 following: 2287 * It MUST use the ITT as specified in the Referenced Task Tag from 2288 the Task Management Function Request PDU to locate the Local and 2289 Remote Mappings (if any). 2291 * It MUST invalidate the local STags (if any) associated with the 2292 ITT. 2294 * It MUST replace the Base Offset(s) and the Advertised STag(s) in 2295 the Remote Mapping with the Base Offset(s) and the Advertised 2296 STag(s) in the iSER header. The Write STag is used in the 2297 handling of the R2T PDU(s) from the iSCSI Layer at the target as 2298 described in section 7.3.6. The Read STag is used in the 2299 handling of the SCSI Data-in PDU(s) from the iSCSI Layer at the 2300 target as described in section 7.3.5. 2302 7.3.4 SCSI Data-out 2304 Type: control-type PDU 2306 PDU-specific qualifiers: DataDescriptorOut 2308 The iSCSI Layer at the initiator MUST invoke the Send_Control 2309 Operational Primitive qualified with DataDescriptorOut which defines 2310 the initiator's I/O Buffer containing unsolicited SCSI Write data. 2312 If the amount of unsolicited data to be transferred as SCSI Data-out 2313 exceeds TargetRecvDataSegmentLength, then the iSCSI Layer at the 2314 initiator MUST segment the data into multiple iSCSI control-type 2315 PDUs, with the DataSegmentLength having the value of 2316 TargetRecvDataSegmentLength in all PDUs generated except the last 2317 one. The DataSegmentLength of the last iSCSI control-type PDU 2318 carrying the unsolicited data can be up to 2319 TargetRecvDataSegmentLength. The iSCSI Layer at the target MUST 2320 perform the reassembly function for the unsolicited data.
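The segmentation rule for unsolicited data can be illustrated with a short sketch that computes the DataSegmentLength of each resulting control-type PDU. The function name is hypothetical.

```python
def segment_unsolicited(total_len, target_recv_data_segment_length):
    """Return the DataSegmentLength of each PDU carrying unsolicited data.

    Every PDU except the last carries exactly
    TargetRecvDataSegmentLength bytes; the last carries the remainder.
    """
    if target_recv_data_segment_length <= 0:
        raise ValueError("TargetRecvDataSegmentLength must be positive")
    sizes = []
    remaining = total_len
    while remaining > 0:
        chunk = min(remaining, target_recv_data_segment_length)
        sizes.append(chunk)
        remaining -= chunk
    return sizes
```

For example, 10 bytes of unsolicited data with a TargetRecvDataSegmentLength of 4 would be carried as segments of 4, 4, and 2 bytes.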
2322 For unsolicited data, the iSER Layer at the initiator MUST use a 2323 Send Message to send the SCSI Data-out PDU. If the F bit is set to 2324 1, the SendSE Message should be used if supported by the RCaP layer 2325 (e.g., iWARP). 2327 Note that for solicited data, the SCSI Data-out PDUs are not used 2328 since R2T PDUs are not delivered to the iSCSI layer at the 2329 initiator; instead R2T PDUs are transformed by the iSER layer at the 2330 target into RDMA Read operations. (See section 7.3.6.) 2332 7.3.5 SCSI Data-in 2334 Type: data-type PDU 2336 PDU-specific qualifiers: DataDescriptorIn 2338 When the iSCSI Layer at the target is ready to return the SCSI Read 2339 data to the initiator, it MUST invoke the Put_Data Operational 2340 Primitive qualified with DataDescriptorIn which defines the SCSI 2341 Data-in buffer. See section 7.1 on the general requirement on the 2342 handling of iSCSI data-type PDUs. SCSI Data-in PDU(s) are used in 2343 SCSI Read data transfer as described in section 9.5.2. 2345 The iSER Layer at the target MUST do the following for each 2346 invocation of the Put_Data Operational Primitive: 2348 1. It MUST use the ITT in the SCSI Data-in PDU to locate the remote 2349 Read STag and the Base Offset in the Remote Mapping. The Remote 2350 Mapping was established earlier by the iSER Layer at the target 2351 when the SCSI Read Command was received from the initiator. 2353 2. It MUST generate and send an RDMA Write Message containing the 2354 read data to the initiator. 2356 a. It MUST use the remote Read STag as the Data Sink STag of 2357 the RDMA Write Message. 2359 b. It MUST add the Buffer Offset from the SCSI Data-in PDU to 2360 the Base Offset from the Remote Mapping as the Data Sink 2361 Tagged Offset of the RDMA Write Message. 2363 c. It MUST use DataSegmentLength from the SCSI Data-in PDU to 2364 determine the amount of data to be sent in the RDMA Write 2365 Message. 2367 3.
It MUST associate DataSN and ITT from the SCSI Data-in PDU with 2368 the RDMA Write operation. If the Put_Data Operational Primitive 2369 invocation was qualified with Notify_Enable set, then when the 2370 iSER Layer at the target receives a completion from the RCaP 2371 layer for the RDMA Write Message, the iSER Layer at the target 2372 MUST notify the iSCSI Layer by invoking the 2373 Data_Completion_Notify Operational Primitive qualified with 2374 DataSN and ITT. Conversely, if the Put_Data Operational 2375 Primitive invocation was qualified with Notify_Enable cleared, 2376 then the iSER Layer at the target MUST NOT notify the iSCSI 2377 Layer on completion and MUST NOT invoke the 2378 Data_Completion_Notify Operational Primitive. 2380 When the A-bit is set to 1 in the SCSI Data-in PDU, the iSER Layer 2381 at the target MUST notify the iSCSI Layer at the target when the 2382 data transfer is complete at the initiator. To perform this 2383 additional function, the iSER Layer at the target can take advantage 2384 of the operational ErrorRecoveryLevel if previously disclosed by the 2385 iSCSI Layer via an earlier invocation of the Notice_Key_Values 2386 Operational Primitive. There are two approaches that can be taken: 2388 1. If the iSER Layer at the target knows that the operational 2389 ErrorRecoveryLevel is 2, or if the iSER Layer at the target does 2390 not know the operational ErrorRecoveryLevel, then the iSER Layer 2391 at the target MUST issue a zero-length RDMA Read Request Message 2392 following the RDMA Write Message. 
When the iSER Layer at the 2393 target receives a completion for the RDMA Read Request Message 2394 from the RCaP layer, implying that the RDMA-Capable Controller 2395 at the initiator has completed processing the RDMA Write Message 2396 due to the completion ordering semantics of RCaP, the iSER Layer 2397 at the target MUST notify the iSCSI Layer at the target by 2398 invoking the Data_Ack_Notify Operational Primitive qualified 2399 with ITT and DataSN (see section 3.2.3). 2401 2. If the iSER Layer at the target knows that the operational 2402 ErrorRecoveryLevel is 1, then the iSER Layer at the target MUST 2403 do one of the following: 2405 a. It MUST notify the iSCSI Layer at the target by invoking the 2406 Data_Ack_Notify Operational Primitive qualified with ITT and 2407 DataSN (see section 3.2.3) when it receives the local 2408 completion from the RCaP layer for the RDMA Write Message. 2409 This is allowed since digest errors do not occur in iSER 2410 (see section 10.1.4.2) and a CRC error will cause the 2411 connection to be terminated and the task to be terminated 2412 anyway. The local RDMA Write completion from the RCaP layer 2413 guarantees that the RCaP layer will not access the I/O 2414 Buffer again to transfer the data associated with that RDMA 2415 Write operation. 2417 b. Alternatively, it MUST use the same procedure for handling 2418 the data transfer completion at the initiator as for 2419 ErrorRecoveryLevel 2. 2421 It should be noted that the iSCSI Layer at the target cannot set the 2422 A-bit to 1 if the ErrorRecoveryLevel=0. 2424 SCSI status MUST always be returned in a separate SCSI Response PDU. 2425 The S bit in the SCSI Data-in PDU MUST always be set to 0. There 2426 MUST NOT be a "phase collapse" in the SCSI Data-in PDU. 
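The transformation of a SCSI Data-in PDU into an RDMA Write Message, and the choice of acknowledgement method when the A-bit is set, can be sketched as follows. The function names, the dictionary describing the RDMA Write Message, and the string labels are assumptions made for this example; the Remote Mapping is modeled as a dict from ITT to {"read": (STag, Base Offset)}.

```python
def data_in_to_rdma_write(remote_mapping, itt, buffer_offset, data_segment_length):
    """Derive the RDMA Write parameters for one Put_Data invocation."""
    read_stag, base_offset = remote_mapping[itt]["read"]
    return {
        "sink_stag": read_stag,                              # step 2a
        "sink_tagged_offset": base_offset + buffer_offset,   # step 2b
        "length": data_segment_length,                       # step 2c
    }


def a_bit_ack_method(error_recovery_level=None, use_local_completion=True):
    """Pick how the target learns that the transfer is complete at the initiator.

    error_recovery_level is None when the iSER Layer does not know it.
    """
    if error_recovery_level == 0:
        raise ValueError("the A-bit cannot be set when ErrorRecoveryLevel=0")
    if error_recovery_level is None or error_recovery_level == 2:
        return "zero_length_rdma_read"
    if error_recovery_level == 1:
        # Either approach is permitted at ErrorRecoveryLevel 1.
        if use_local_completion:
            return "local_rdma_write_completion"
        return "zero_length_rdma_read"
    raise ValueError("unknown ErrorRecoveryLevel")
```

With a Base Offset of 1000 and a Buffer Offset of 24, the Data Sink Tagged Offset of the RDMA Write Message is 1024.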
2428 Since the RDMA Write Message only transfers the data portion of the 2429 SCSI Data-in PDU but not the control information in the header, such 2430 as ExpCmdSN, if timely updates of such information are crucial, the 2431 iSCSI Layer at the initiator MAY issue NOP-Out PDUs to request the 2432 iSCSI Layer at the target to respond with the information using NOP- 2433 In PDUs. 2435 7.3.6 Ready To Transfer (R2T) 2437 Type: data-type PDU 2439 PDU-specific qualifiers: DataDescriptorOut 2441 In order to send an R2T PDU, the iSCSI Layer at the target MUST 2442 invoke the Get_Data Operational Primitive qualified with 2443 DataDescriptorOut which defines the I/O Buffer for receiving the 2444 SCSI Write data from the initiator. See section 7.1 on the general 2445 requirements on the handling of iSCSI data-type PDUs. 2447 The iSER Layer at the target MUST do the following for each 2448 invocation of the Get_Data Operational Primitive: 2450 1. It MUST ensure a valid local STag for the I/O Buffer and a valid 2451 Local Mapping. This may involve allocating a valid local STag 2452 and establishing a Local Mapping. 2454 2. It MUST use the ITT in the R2T to locate the remote Write STag 2455 and the Base Offset in the Remote Mapping. The Remote Mapping 2456 was established earlier by the iSER Layer at the target when the 2457 iSER Message containing the Advertised Write STag, the Base 2458 Offset and the SCSI Command PDU for a SCSI Write or 2459 bidirectional command was received from the initiator. 2461 3. If the iSER-ORD value at the target is set to 0, the iSER Layer 2462 at the target MUST terminate the connection and free up the 2463 resources associated with the connection (as described in 5.2.3) 2464 if it received the R2T PDU from the iSCSI Layer at the target. 2465 Upon termination of the connection, the iSER Layer at the target 2466 MUST notify the iSCSI Layer at the target by invoking the 2467 Connection_Terminate_Notify Operational Primitive. 2469 4.
If the iSER-ORD value at the target is set to greater than 0, 2470 the iSER Layer at the target MUST transform the R2T PDU into an 2471 RDMA Read Request Message. While transforming the R2T PDU, the 2472 iSER Layer at the target MUST ensure that the number of 2473 outstanding RDMA Read Request Messages does not exceed the iSER-ORD 2474 value. To transform the R2T PDU, the iSER Layer at the target: 2476 a. MUST derive the local STag and local Tagged Offset from the 2477 DataDescriptorOut that qualified the Get_Data invocation. 2479 b. MUST use the local STag as the Data Sink STag of the RDMA 2480 Read Request Message. 2482 c. MUST use the local Tagged Offset as the Data Sink Tagged 2483 Offset of the RDMA Read Request Message. 2485 d. MUST use the Desired Data Transfer Length from the R2T PDU 2486 as the RDMA Read Message Size of the RDMA Read Request 2487 Message. 2489 e. MUST use the remote Write STag as the Data Source STag of 2490 the RDMA Read Request Message. 2492 f. MUST add the Buffer Offset from the R2T PDU to the Base 2493 Offset from the Remote Mapping as the Data Source Tagged 2494 Offset of the RDMA Read Request Message. 2496 5. It MUST associate R2TSN and ITT from the R2T PDU with the RDMA 2497 Read operation. If the Get_Data Operational Primitive 2498 invocation was qualified with Notify_Enable set, then when the 2499 iSER Layer at the target receives a completion from the RCaP 2500 layer for the RDMA Read operation, the iSER Layer at the target 2501 MUST notify the iSCSI Layer by invoking the 2502 Data_Completion_Notify Operational Primitive qualified with 2503 R2TSN and ITT. Conversely, if the Get_Data Operational 2504 Primitive invocation was qualified with Notify_Enable cleared, 2505 then the iSER Layer at the target MUST NOT notify the iSCSI 2506 Layer on completion and MUST NOT invoke the 2507 Data_Completion_Notify Operational Primitive.
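The transformation of an R2T PDU into an RDMA Read Request Message, including the iSER-ORD checks of steps 3 and 4, can be sketched as follows. The exception class, the function name, and the dictionary describing the request are hypothetical; returning None to signal that the request has to wait is one possible way to respect the iSER-ORD limit and is an assumption of this sketch.

```python
class ConnectionTerminated(Exception):
    """Raised in this sketch when the connection has to be terminated."""


def r2t_to_rdma_read_request(remote_mapping, itt, local_stag, local_tagged_offset,
                             buffer_offset, desired_length, iser_ord,
                             outstanding_reads):
    """Build the RDMA Read Request parameters for one Get_Data invocation."""
    if iser_ord == 0:
        # Step 3: an R2T PDU cannot be processed when iSER-ORD is 0.
        raise ConnectionTerminated("R2T received while iSER-ORD is 0")
    if outstanding_reads >= iser_ord:
        # Step 4: never exceed iSER-ORD outstanding RDMA Read Requests.
        return None
    write_stag, base_offset = remote_mapping[itt]["write"]
    return {
        "sink_stag": local_stag,                               # step 4b
        "sink_tagged_offset": local_tagged_offset,             # step 4c
        "read_message_size": desired_length,                   # step 4d
        "source_stag": write_stag,                             # step 4e
        "source_tagged_offset": base_offset + buffer_offset,   # step 4f
    }
```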
2509 When the RCaP layer at the initiator receives a valid RDMA Read 2510 Request Message, it will return an RDMA Read Response Message 2511 containing the solicited write data to the target. When the RCaP 2512 layer at the target receives the RDMA Read Response Message from the 2513 initiator, it will place the solicited data in the I/O Buffer 2514 referenced by the Data Sink STag in the RDMA Read Response Message. 2516 Since the RDMA Read Request Message from the target does not 2517 transfer the control information in the R2T PDU such as ExpCmdSN, if 2518 timely updates of such information are crucial, the iSCSI Layer at 2519 the initiator MAY issue NOP-Out PDUs to request the iSCSI Layer at 2520 the target to respond with the information using NOP-In PDUs. 2522 Similarly, since the RDMA Read Response Message from the initiator 2523 only transfers the data but not the control information normally 2524 found in the SCSI Data-out PDU, such as ExpStatSN, if timely updates 2525 of such information are crucial, the iSCSI Layer at the target MAY 2526 issue NOP-In PDUs to request the iSCSI Layer at the initiator to 2527 respond with the information using NOP-Out PDUs. 2529 7.3.7 Asynchronous Message 2531 Type: control-type PDU 2533 PDU-specific qualifiers: DataDescriptorSense 2535 The iSCSI Layer MUST invoke the Send_Control Operational Primitive 2536 qualified with DataDescriptorSense which defines the buffer 2537 containing the sense and iSCSI event information. The iSER Layer 2538 MUST use a Send Message to send the Asynchronous Message PDU. The 2539 SendSE Message should be used if supported by the RCaP layer (e.g., 2540 iWARP).
2542 7.3.8 Text Request & Text Response 2544 Type: control-type PDU 2546 PDU-specific qualifiers: DataDescriptorTextOut (for Text 2547 Request), DataDescriptorIn (for Text Response) 2549 The iSCSI Layer MUST invoke the Send_Control Operational Primitive 2550 qualified with DataDescriptorTextOut (or DataDescriptorIn) which 2551 defines the Text Request (or Text Response) buffer. The iSER Layer 2552 MUST use Send Messages to send the Text Request (or Text Response) 2553 PDUs. The SendSE Message should be used if supported by the RCaP 2554 layer (e.g., iWARP). 2556 7.3.9 Login Request & Login Response 2558 During the login negotiation, the iSCSI Layer interacts with the 2559 transport layer directly and the iSER Layer is not involved. See 2560 section 5.1 on iSCSI/iSER Connection Setup. If the underlying 2561 transport is TCP, the Login Request PDUs and the Login Response PDUs 2562 are exchanged when the connection between the initiator and the 2563 target is still in the byte stream mode. 2565 The iSCSI Layer MUST NOT send a Login Request (or a Login Response) 2566 PDU during the full feature phase. A Login Request (or a Login 2567 Response) PDU, if used, MUST be treated as an iSCSI protocol error. 2568 The iSER Layer MAY reject such a PDU from the iSCSI Layer with an 2569 appropriate error code. If a Login Request PDU is received by the 2570 iSCSI Layer at the target, it MUST respond with a Reject PDU with a 2571 reason code of "protocol error". 2573 7.3.10 Logout Request & Logout Response 2575 Type: control-type PDU 2577 PDU-specific qualifiers: None 2579 The iSER Layer MUST use a Send Message to send the Logout Request or 2580 Logout Response PDU. The SendSE Message should be used if supported 2581 by the RCaP layer (e.g., iWARP). Sections 5.2.1 and 5.2.2 describe 2582 the handling of the Logout Request and the Logout Response at the 2583 initiator and the target and the interactions between the initiator 2584 and the target to terminate a connection.
2586 7.3.11 SNACK Request 2588 Since HeaderDigest and DataDigest must be negotiated to "None", 2589 there are no digest errors when the connection is in iSER-assisted 2590 mode. Also since RCaP delivers all messages in the order they were 2591 sent, there are no sequence errors when the connection is in iSER- 2592 assisted mode. Therefore the iSCSI Layer MUST NOT send SNACK 2593 Request PDUs. A SNACK Request PDU, if used, MUST be treated as an 2594 iSCSI protocol error. The iSER Layer MAY reject such a PDU from the 2595 iSCSI Layer with an appropriate error code. If a SNACK Request PDU 2596 is received by the iSCSI Layer at the target, it MUST respond with a 2597 Reject PDU with a reason code of "protocol error". 2599 7.3.12 Reject 2601 Type: control-type PDU 2602 PDU-specific qualifiers: DataDescriptorReject 2604 The iSCSI Layer MUST invoke the Send_Control Operational Primitive 2605 qualified with DataDescriptorReject which defines the Reject buffer. 2606 The iSER Layer MUST use a Send Message to send the Reject PDU. The 2607 SendSE Message should be used if supported by the RCaP layer (e.g., 2608 iWARP). 2610 7.3.13 NOP-Out & NOP-In 2612 Type: control-type PDU 2614 PDU-specific qualifiers: DataDescriptorNOPOut (for NOP-Out), 2615 DataDescriptorNOPIn (for NOP-In) 2617 The iSCSI Layer MUST invoke the Send_Control Operational Primitive 2618 qualified with DataDescriptorNOPOut (or DataDescriptorNOPIn) which 2619 defines the Ping (or Return Ping) data buffer. The iSER Layer MUST 2620 use Send Messages to send the NOP-Out (or NOP-In) PDU. The SendSE 2621 Message should be used if supported by the RCaP layer (e.g., iWARP). 2623 8 Flow Control and STag Management 2625 8.1 Flow Control for RDMA Send Messages 2627 Send Messages in RCaP are used by the iSER Layer to transfer iSCSI 2628 control-type PDUs. Each Send Message in RCaP consumes an Untagged 2629 Buffer at the Data Sink. 
However, neither the RCaP layer nor the 2630 iSER Layer provides an explicit flow control mechanism for the Send 2631 Messages. Therefore, the iSER Layer SHOULD provision enough 2632 Untagged buffers for handling incoming Send Messages to prevent 2633 buffer exhaustion at the RCaP layer. If buffer exhaustion occurs, 2634 it may result in the termination of the connection. 2636 An implementation may choose to satisfy the buffer requirement by 2637 using a common buffer pool shared across multiple connections, with 2638 usage limits on a per connection basis and usage limits on the 2639 buffer pool itself. In such an implementation, exceeding the buffer 2640 usage limit for a connection or the buffer pool itself may trigger 2641 interventions from the iSER Layer to replenish the buffer pool 2642 and/or to isolate the connection causing the problem. 2644 iSER also provides the MaxOutstandingUnexpectedPDUs key to be used 2645 by the initiator and the target to declare the maximum number of 2646 outstanding "unexpected" control-type PDUs that it can receive. It 2647 is intended to allow the receiving side to determine the amount of 2648 buffer resources needed beyond the normal flow control mechanism 2649 available in iSCSI. 2651 The buffer resources required at both the initiator and the target 2652 as a result of control-type PDUs sent by the initiator are described 2653 in section 8.1.1. The buffer resources required at both the 2654 initiator and target as a result of control-type PDUs sent by the 2655 target are described in section 8.1.2. 2657 8.1.1 Flow Control for Control-Type PDUs from the Initiator 2659 The control-type PDUs that can be sent by an initiator to a target 2660 can be grouped into the following categories: 2662 1. Regulated: Control-type PDUs in this category are regulated by 2663 the iSCSI CmdSN window mechanism and the immediate flag is not 2664 set. 2666 2.
Unregulated but Expected: Control-type PDUs in this category 2667 are not regulated by the iSCSI CmdSN window mechanism but are 2668 expected by the target. 2670 3. Unregulated and Unexpected: Control-type PDUs in this category 2671 are not regulated by the iSCSI CmdSN window mechanism and are 2672 "unexpected" by the target. 2674 8.1.1.1 Control-Type PDUs from the Initiator in the Regulated Category 2676 Control-type PDUs that can be sent by the initiator in this category 2677 are regulated by the iSCSI CmdSN window mechanism and the immediate 2678 flag is not set. 2680 The queuing capacity required of the iSCSI layer at the target is 2681 described in section 4.2.2.1 of [iSCSI]. For each of the control- 2682 type PDUs that can be sent by the initiator in this category, the 2683 initiator MUST provision for the buffer resources required for the 2684 corresponding control-type PDU sent as a response from the target. 2685 The following is a list of the PDUs that can be sent by the 2686 initiator and the PDUs that are sent by the target in response: 2688 a. When an initiator sends a SCSI Command PDU, it expects a 2689 SCSI Response PDU from the target. 2691 b. When the initiator sends a Task Management Function Request 2692 PDU, it expects a Task Management Function Response PDU from 2693 the target. 2695 c. When the initiator sends a Text Request PDU, it expects a 2696 Text Response PDU from the target. 2698 d. When the initiator sends a Logout Request PDU, it expects a 2699 Logout Response PDU from the target. 2701 e. When the initiator sends a NOP-Out PDU as a ping request 2702 with ITT != 0xffffffff and TTT = 0xffffffff, it expects a 2703 NOP-In PDU from the target with the same ITT and TTT as in 2704 the ping request. 2706 The response from the target for any of the PDUs enumerated here may 2707 alternatively be in the form of a Reject PDU sent instead before the 2708 task is active, as described in section 7.3 of [iSCSI]. 
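The request and response pairs listed for the Regulated category can be captured in a small lookup table, which an initiator could consult when provisioning receive buffers. The names below are illustrative assumptions, and the sketch treats a Reject PDU as an alternative response to any listed request, as the paragraph above allows.

```python
RESERVED_TAG = 0xFFFFFFFF

# Request PDU sent by the initiator -> response PDU expected from the target.
EXPECTED_RESPONSE = {
    "SCSI Command": "SCSI Response",
    "Task Management Function Request": "Task Management Function Response",
    "Text Request": "Text Response",
    "Logout Request": "Logout Response",
    "NOP-Out": "NOP-In",  # only when the NOP-Out is a ping request
}


def is_ping_request(itt, ttt):
    """A NOP-Out is a ping request when ITT != 0xffffffff and TTT = 0xffffffff."""
    return itt != RESERVED_TAG and ttt == RESERVED_TAG


def acceptable_responses(request_pdu):
    """Return the PDU types the initiator has to be ready to receive."""
    return {EXPECTED_RESPONSE[request_pdu], "Reject"}
```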
2710 8.1.1.2 Control-Type PDUs from the Initiator in the Unregulated but 2711 Expected Category 2713 For the control-type PDUs in the Unregulated but Expected category, 2714 the amount of buffering resources required at the target can be 2715 predetermined. The following is a list of the PDUs in this 2716 category: 2718 a. SCSI Data-out PDUs are used by the initiator to send 2719 unsolicited data. The amount of buffer resources required 2720 by the target can be determined using FirstBurstLength. 2721 Note that SCSI Data-out PDUs are not used for solicited 2722 data since the R2T PDU which is used for solicitation is 2723 transformed into RDMA Read operations by the iSER layer at 2724 the target. See section 7.3.4. 2726 b. A NOP-Out PDU with TTT != 0xffffffff is sent as a ping 2727 response by the initiator to the NOP-In PDU sent as a ping 2728 request by the target. 2730 8.1.1.3 Control-Type PDUs from the Initiator in the Unregulated and 2731 Unexpected Category 2733 PDUs in the Unregulated and Unexpected category are PDUs with the 2734 immediate flag set. The number of PDUs in this category which can 2735 be sent by an initiator is controlled by the value of 2736 MaxOutstandingUnexpectedPDUs declared by the target. (See section 2737 6.7.) After a PDU in this category is sent by the initiator, it is 2738 outstanding until it is retired. At any time, the number of 2739 outstanding unexpected PDUs MUST NOT exceed the value of 2740 MaxOutstandingUnexpectedPDUs declared by the target. 2742 The target uses the value of MaxOutstandingUnexpectedPDUs that it 2743 declared to determine the amount of buffer resources required for 2744 control-type PDUs in this category that can be sent by an initiator. 2745 For the initiator, for each of the control-type PDUs that can be 2746 sent in this category, the initiator MUST provision for the buffer 2747 resources if required for the corresponding control-type PDU that 2748 can be sent as a response from the target. 
2750 An outstanding PDU in this category is retired as follows. If the 2751 CmdSN of the PDU sent by the initiator in this category is x, the 2752 PDU is outstanding until the initiator sends a non-immediate 2753 control-type PDU on the same connection with CmdSN = y (where y is 2754 at least x) and the target responds with a control-type PDU on any 2755 connection where ExpCmdSN is at least y+1. 2757 When the number of outstanding unexpected control-type PDUs equals 2758 MaxOutstandingUnexpectedPDUs, the iSCSI Layer at the initiator MUST 2759 NOT generate any unexpected PDUs which otherwise it would have 2760 generated, even if it is intended for immediate delivery. 2762 8.1.2 Flow Control for Control-Type PDUs from the Target 2764 Control-type PDUs that can be sent by a target and are expected by 2765 the initiator are listed in the Regulated category. (See section 2766 8.1.1.1.) 2768 For the control-type PDUs that can be sent by a target and are 2769 unexpected by the initiator, the number is controlled by 2770 MaxOutstandingUnexpectedPDUs declared by the initiator. (See 2771 section 6.7.) After a PDU in this category is sent by a target, it 2772 is outstanding until it is retired. At any time, the number of 2773 outstanding unexpected PDUs MUST NOT exceed the value of 2774 MaxOutstandingUnexpectedPDUs declared by the initiator. The 2775 initiator uses the value of MaxOutstandingUnexpectedPDUs that it 2776 declared to determine the amount of buffer resources required for 2777 control-type PDUs in this category that can be sent by a target. 2778 The following is a list of the PDUs in this category and the 2779 conditions for retiring the outstanding PDU: 2781 a. For an Asynchronous Message PDU with StatSN = x, the PDU is 2782 outstanding until the initiator sends a control-type PDU 2783 with ExpStatSN set to at least x+1. 2785 b. 
For a Reject PDU with StatSN = x which is sent after a task 2786 is active, the PDU is outstanding until the initiator sends 2787 a control-type PDU with ExpStatSN set to at least x+1. 2789 c. For a NOP-In PDU with ITT = 0xffffffff and StatSN = x, the 2790 PDU is outstanding until the initiator responds with a 2791 control-type PDU on the same connection where ExpStatSN is 2792 at least x+1. But if the NOP-In PDU is sent as a ping 2793 request with TTT != 0xffffffff, the PDU can also be retired 2794 when the initiator sends a NOP-Out PDU with the same ITT and 2795 TTT as in the ping request. Note that when a target sends a 2796 NOP-In PDU as a ping request, it must provision a buffer for 2797 the NOP-Out PDU sent as a ping response from the initiator. 2799 When the number of outstanding unexpected control-type PDUs equals 2800 MaxOutstandingUnexpectedPDUs, the iSCSI Layer at the target MUST NOT 2801 generate any unexpected PDUs which otherwise it would have 2802 generated, even if its intent is to indicate an iSCSI error 2803 condition (e.g., Asynchronous Message, Reject). Task timeouts as in 2804 the initiator waiting for a command completion or other connection 2805 and session level exceptions will ensure that correct operational 2806 behavior will result in these cases despite not generating the PDU. 2807 This rule overrides any other requirements elsewhere which require 2808 that a Reject PDU MUST be sent. 2810 (Implementation note: SCSI task timeout and recovery can be a 2811 lengthy process and hence SHOULD be avoided by proper provisioning 2812 of resources.) 
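The retirement rules of section 8.1.2 can be sketched as a small per-connection tracker at the target. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, sequence numbers are compared with 32-bit serial arithmetic, and the alternative retirement of a NOP-In ping request by its matching NOP-Out response (item c) is not modeled.

```python
MASK32 = 0xFFFFFFFF


def sn_ge(a, b):
    """True if 32-bit sequence number a is at or after b (serial arithmetic)."""
    return ((a - b) & MASK32) < 0x80000000


class UnexpectedPduTracker:
    """Hypothetical target-side accounting of outstanding unexpected PDUs."""

    def __init__(self, max_outstanding):
        # Value of MaxOutstandingUnexpectedPDUs declared by the initiator.
        self.max_outstanding = max_outstanding
        self.outstanding = []  # StatSN of each PDU not yet retired

    def may_send(self):
        return len(self.outstanding) < self.max_outstanding

    def sent(self, stat_sn):
        if not self.may_send():
            raise RuntimeError("limit reached; the PDU MUST NOT be generated")
        self.outstanding.append(stat_sn)

    def on_exp_stat_sn(self, exp_stat_sn):
        # A PDU with StatSN = x is retired once ExpStatSN is at least x+1.
        self.outstanding = [
            x for x in self.outstanding
            if not sn_ge(exp_stat_sn, (x + 1) & MASK32)
        ]
```

A target would call may_send() before generating an Asynchronous Message, Reject, or NOP-In PDU in this category, and on_exp_stat_sn() for each control-type PDU received from the initiator.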
 2814 (Implementation note: To ensure that the initiator has a means to 2815 inform the target that outstanding PDUs have been retired, the 2816 target should reserve the last unexpected control-type PDU allowable 2817 by the value of MaxOutstandingUnexpectedPDUs declared by the 2818 initiator for sending a NOP-In ping request with TTT != 0xffffffff 2819 to allow the initiator to return the NOP-Out ping response with the 2820 current ExpStatSN.) 2822 8.2 Flow Control for RDMA Read Resources 2824 If iSERHelloRequired is negotiated to "Yes", then the total number 2825 of RDMA Read operations that can be active simultaneously on an 2826 iSCSI/iSER connection depends on the amount of resources allocated 2827 as declared in the iSER Hello exchange described in section 5.1.3. 2828 Exceeding the number of RDMA Read operations allowed on a connection 2829 will result in the connection being terminated by the RCaP layer. 2830 The iSER Layer at the target maintains the iSER-ORD to keep track of 2831 the maximum number of RDMA Read Requests that can be issued by the 2832 iSER Layer on a particular RCaP Stream. 2834 During connection setup (see section 5.1), iSER-IRD is known at the 2835 initiator and iSER-ORD is known at the target after the iSER Layers 2836 at the initiator and the target have respectively allocated the 2837 connection resources necessary to support RCaP, as directed by the 2838 Allocate_Connection_Resources Operational Primitive from the iSCSI 2839 Layer before the end of the iSCSI Login Phase. In the full feature 2840 phase, if iSERHelloRequired is negotiated to "Yes", then the first 2841 message sent by the initiator is the iSER Hello Message (see section 2842 9.3) which contains the value of iSER-IRD. In response to the iSER 2843 Hello Message, the target sends the iSER HelloReply Message (see 2844 section 9.4) which contains the value of iSER-ORD.
The iSER Layer 2845 at both the initiator and the target MAY adjust (lower) the 2846 resources associated with iSER-IRD and iSER-ORD respectively to 2847 match the iSER-ORD value declared in the HelloReply Message. The 2848 iSER Layer at the target MUST flow control the RDMA Read Request 2849 Messages to not exceed the iSER-ORD value at the target. 2851 If iSERHelloRequired is negotiated to "No", then the maximum number 2852 of RDMA Read operations that can be active is negotiated via other 2853 means outside the scope of this document. For example, in 2854 InfiniBand, iSER connection setup uses InfiniBand CM MADs, with 2855 additional iSER information exchanged in the private data. 2857 8.3 STag Management 2859 An STag is an identifier of a Tagged Buffer used in an RDMA 2860 operation. The allocation and the subsequent invalidation of the 2861 STags are specified in this document if the STags are exposed on the 2862 wire by being Advertised in the iSER header or declared in the 2863 header of an RCaP Message. 2865 8.3.1 Allocation of STags 2867 When the iSCSI Layer at the initiator invokes the Send_Control 2868 Operational Primitive to request the iSER Layer at the initiator to 2869 process a SCSI Command, zero, one, or two STags may be allocated by 2870 the iSER Layer. See section 7.3.1 for details. The number of STags 2871 allocated depends on whether the command is unidirectional or 2872 bidirectional and whether solicited write data transfer is involved 2873 or not. 2875 When the iSCSI Layer at the initiator invokes the Send_Control 2876 Operational Primitive to request the iSER Layer at the initiator to 2877 process a Task Management Function Request with the TASK REASSIGN 2878 function, besides allocating zero, one, or two STags, the iSER Layer 2879 MUST invalidate the existing STags (if any) associated with the ITT. 2880 See section 7.3.3 for details. 
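For the SCSI Command case in section 8.3.1, the "zero, one, or two STags" rule can be sketched as a small decision function. The function name and its boolean parameters are hypothetical. The conditions follow the text here and the WSV and RSV flag descriptions in section 9.2: a Read STag when the command reads data, a Write STag only when solicited write data is involved.

```python
def stags_for_command(reads_data, writes_data, has_solicited_write_data):
    """Return the list of STags the initiator's iSER Layer would allocate."""
    stags = []
    if reads_data:
        # SCSI Read or the read half of a bidirectional command.
        stags.append("Read STag")
    if writes_data and has_solicited_write_data:
        # Immediate and unsolicited write data travel in Send Messages,
        # so only solicited write data needs an advertised buffer.
        stags.append("Write STag")
    return stags
```

A write whose data fits entirely in immediate or unsolicited data therefore yields an empty list, a plain read yields one STag, and a bidirectional command with solicited write data yields two.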
2882 The iSER Layer at the target allocates a local Data Sink STag when 2883 the iSCSI Layer at the target invokes the Get_Data Operational 2884 Primitive to request the iSER Layer to process an R2T PDU. See 2885 section 7.3.6 for details. 2887 8.3.2 Invalidation of STags 2889 The invalidation of the STags at the initiator at the completion of 2890 a unidirectional or bidirectional command when the associated SCSI 2891 Response PDU is sent by the target is described in section 7.3.2. 2893 When a unidirectional or bidirectional command concludes without the 2894 associated SCSI Response PDU being sent by the target, the iSCSI 2895 Layer at the initiator MUST request the iSER Layer at the initiator 2896 to invalidate the STags by invoking the Deallocate_Task_Resources 2897 Operational Primitive qualified with ITT. In response, the iSER 2898 Layer at the initiator MUST locate the STags (if any) in the Local 2899 Mapping. The iSER Layer at the initiator MUST invalidate the STags 2900 (if any) and the Local Mapping. 2902 For an RDMA Read operation used to realize a SCSI Write data 2903 transfer, the iSER Layer at the target SHOULD invalidate the Data 2904 Sink STag at the conclusion of the RDMA Read operation referencing 2905 the Data Sink STag (to permit the immediate reuse of buffer 2906 resources). 2908 For an RDMA Write operation used to realize a SCSI Read data 2909 transfer, the Data Source STag at the target is not declared to the 2910 initiator and is not exposed on the wire. Invalidation of the STag 2911 is thus not specified. 2913 When a unidirectional or bidirectional command concludes without the 2914 associated SCSI Response PDU being sent by the target, the iSCSI 2915 Layer at the target MUST request the iSER Layer at the target to 2916 invalidate the STags by invoking the Deallocate_Task_Resources 2917 Operational Primitive qualified with ITT. In response, the iSER 2918 Layer at the target MUST locate the local STags (if any) in the 2919 Local Mapping. 
The iSER Layer at the target MUST invalidate the 2920 local STags (if any) and the Local Mapping. 2922 9 iSER Control and Data Transfer 2924 For iSCSI data-type PDUs (see section 7.1), the iSER Layer uses RDMA 2925 Read and RDMA Write operations to transfer the solicited data. For 2926 iSCSI control-type PDUs (see section 7.2), the iSER Layer uses Send 2927 Messages of RCaP. 2929 9.1 iSER Header Format 2931 An iSER header MUST be present in every Send Message of RCaP. The 2932 iSER header is located in the first 28 bytes of the message payload 2933 of the Send Message of RCaP, as shown in Figure 2. 2935 0 1 2 3 2936 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2937 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2938 | Opcode| Opcode Specific Fields | 2939 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2940 | Opcode Specific Fields (32 bits) | 2941 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2942 | | 2943 | Opcode Specific Fields (64 bits) | 2944 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2945 | Opcode Specific Fields (32 bits) | 2946 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2947 | | 2948 | Opcode Specific Fields (64 bits) | 2949 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2951 Figure 2 iSER Header Format 2953 Opcode - Operation Code: 4 bits 2955 The Opcode field identifies the type of iSER Messages: 2957 0001b = iSCSI control-type PDU 2959 0010b = iSER Hello Message 2961 0011b = iSER HelloReply Message 2963 All other opcodes are reserved. 2965 9.2 iSER Header Format for iSCSI Control-Type PDU 2967 The iSER Layer uses Send Messages of RCaP to transfer iSCSI control- 2968 type PDUs (see section 7.2). The message payload of each of the 2969 Send Messages of RCaP used for transferring an iSER Message contains 2970 an iSER Header followed by an iSCSI control-type PDU. 
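The opcode dispatch implied by section 9.1 can be sketched as follows. This is an illustration, not a definitive implementation: the names are hypothetical, and it assumes that bit 0 in Figure 2 is the most significant bit of the first byte of the message payload, so the 4-bit Opcode is the high nibble of that byte.

```python
ISER_HEADER_LEN = 28  # the iSER header occupies the first 28 bytes

OPCODES = {
    0b0001: "iSCSI control-type PDU",
    0b0010: "iSER Hello Message",
    0b0011: "iSER HelloReply Message",
}


def iser_opcode(payload: bytes) -> int:
    """Extract and validate the Opcode of a received Send Message payload."""
    if len(payload) < ISER_HEADER_LEN:
        raise ValueError("payload is shorter than the iSER header")
    opcode = payload[0] >> 4  # high nibble of the first byte (assumed)
    if opcode not in OPCODES:
        raise ValueError(f"reserved iSER opcode {opcode:#06b}")
    return opcode
```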
 2972 The iSER header in a Send Message of RCaP carrying an iSCSI control- 2973 type PDU MUST have the format as described in Figure 3. 2975 0 1 2 3 2976 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2977 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2978 | |W|R| | 2979 | 0001b |S|S| Reserved | 2980 | |V|V| | 2981 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2982 | Write STag | 2983 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2984 | | 2985 | Write Base Offset | 2986 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2987 | Read STag | 2988 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2989 | | 2990 | Read Base Offset | 2991 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 2992 Figure 3 iSER Header Format for iSCSI Control-Type PDU 2994 WSV - Write STag Valid flag: 1 bit 2996 This flag indicates the validity of the Write STag field and 2997 the Write Base Offset field of the iSER Header. If set to one, 2998 the Write STag field and the Write Base Offset field in this 2999 iSER Header are valid. If set to zero, the Write STag field 3000 and the Write Base Offset field in this iSER Header MUST be 3001 ignored at the receiver. The Write STag Valid flag is set to 3002 one when there is solicited data to be transferred for a SCSI 3003 Write or bidirectional command, or when there are non-immediate 3004 unsolicited and solicited data to be transferred for the 3005 referenced task specified in a Task Management Function Request 3006 with the TASK REASSIGN function. 3008 RSV - Read STag Valid flag: 1 bit 3010 This flag indicates the validity of the Read STag field and the 3011 Read Base Offset field of the iSER Header. If set to one, the 3012 Read STag field and the Read Base Offset field in this iSER 3013 Header are valid.
If set to zero, the Read STag field and the 3014 Read Base Offset field in this iSER Header MUST be ignored at 3015 the receiver. The Read STag Valid flag is set to one for a 3016 SCSI Read or bidirectional command, or a Task Management 3017 Function Request with the TASK REASSIGN function. 3019 Write STag - Write Steering Tag: 32 bits 3021 This field contains the Write STag when the Write STag Valid 3022 flag is set to one. For a SCSI Write or bidirectional command, 3023 the Write STag is used to Advertise the initiator's I/O Buffer 3024 containing the solicited data. For a Task Management Function 3025 Request with the TASK REASSIGN function, the Write STag is used 3026 to Advertise the initiator's I/O Buffer containing the non- 3027 immediate unsolicited data and solicited data. This Write STag 3028 is used as the Data Source STag in the resultant RDMA Read 3029 operation(s). When the Write STag Valid flag is set to zero, 3030 this field MUST be set to zero and ignored on receive. 3032 Write Base Offset: 64 bits 3034 This field contains the Base Offset associated with the I/O 3035 Buffer for the SCSI Write command when the Write STag Valid 3036 flag is set to one. When the Write STag Valid flag is set to 3037 zero, this field MUST be set to zero and ignored on receive. 3039 Read STag - Read Steering Tag: 32 bits 3041 This field contains the Read STag when the Read STag Valid flag 3042 is set to one. The Read STag is used to Advertise the 3043 initiator's Read I/O Buffer of a SCSI Read or bidirectional 3044 command, or a Task Management Function Request with the TASK 3045 REASSIGN function. This Read STag is used as the Data Sink 3046 STag in the resultant RDMA Write operation(s). When the Read 3047 STag Valid flag is zero, this field MUST be set to zero and 3048 ignored on receive. 3050 Read Base Offset: 64 bits 3052 This field contains the Base Offset associated with the I/O 3053 Buffer for the SCSI Read command when the Read STag Valid flag 3054 is set to one. 
When the Read STag Valid flag is set to zero, 3055 this field MUST be set to zero and ignored on receive. 3057 Reserved: 3059 Reserved fields MUST be set to zero on transmit and MUST be 3060 ignored on receive. 3062 9.3 iSER Header Format for iSER Hello Message 3064 An iSER Hello Message MUST only contain the iSER header which MUST 3065 have the format as described in Figure 4. If iSERHelloRequired is 3066 negotiated to "Yes", then iSER Hello Message is the first iSER 3067 Message sent on the RCaP Stream from the iSER Layer at the initiator 3068 to the iSER Layer at the target. 3070 0 1 2 3 3071 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 3072 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3073 | | | | | | 3074 | 0010b | Rsvd | MaxVer| MinVer| iSER-IRD | 3075 | | | | | | 3076 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3077 | Reserved | 3078 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3079 | | 3080 | Reserved | 3081 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3082 | Reserved | 3083 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3084 | | 3085 | Reserved | 3086 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3088 Figure 4 iSER Header Format for iSER Hello Message 3090 MaxVer - Maximum Version: 4 bits 3092 This field specifies the maximum version of the iSER protocol 3093 supported. It MUST be set to 10 to indicate the version of the 3094 specification described in this document. 3096 MinVer - Minimum Version: 4 bits 3098 This field specifies the minimum version of the iSER protocol 3099 supported. It MUST be set to 10 to indicate the version of the 3100 specification described in this document. 3102 iSER-IRD: 16 bits 3104 This field contains the value of the iSER-IRD at the initiator. 3106 Reserved (Rsvd): 3108 Reserved fields MUST be set to zero on transmit, and MUST be 3109 ignored on receive. 
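A minimal sketch of building the 28-byte iSER Hello Message of Figure 4 follows. The function and constant names are hypothetical, and network byte order (big-endian) with bit 0 as the most significant bit of each byte is an assumption of this sketch; the field widths and the version value 10 come from the text above.

```python
import struct

ISER_VERSION = 10      # MaxVer and MinVer for this specification
OPCODE_HELLO = 0b0010


def pack_hello(iser_ird: int) -> bytes:
    """Build an iSER Hello Message carrying the initiator's iSER-IRD."""
    if not 0 <= iser_ird <= 0xFFFF:
        raise ValueError("iSER-IRD must fit in 16 bits")
    byte0 = OPCODE_HELLO << 4                      # low 4 bits: Rsvd = 0
    versions = (ISER_VERSION << 4) | ISER_VERSION  # MaxVer | MinVer
    # ">" selects big-endian with no padding; "24x" emits the 24 reserved
    # bytes as zeros, giving 1 + 1 + 2 + 24 = 28 bytes in total.
    return struct.pack(">BBH24x", byte0, versions, iser_ird)
```

With iser_ird = 16, the first four bytes are 0x20 0xAA 0x00 0x10 and the remaining 24 bytes are zero.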
3111 9.4 iSER Header Format for iSER HelloReply Message 3113 An iSER HelloReply Message MUST only contain the iSER header which 3114 MUST have the format as described in Figure 5. If iSERHelloRequired 3115 is negotiated to "Yes", then the iSER HelloReply Message is the 3116 first iSER Message sent on the RCaP Stream from the iSER Layer at 3117 the target to the iSER Layer at the initiator. 3119 0 1 2 3 3120 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 3121 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3122 | | |R| | | | 3123 | 0011b |Rsvd |E| MaxVer| CurVer| iSER-ORD | 3124 | | |J| | | | 3125 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3126 | Reserved | 3127 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3128 | | 3129 | Reserved | 3130 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3131 | Reserved | 3132 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3133 | | 3134 | Reserved | 3135 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 3137 Figure 5 iSER Header Format for iSER HelloReply Message 3139 REJ - Reject flag: 1 bit 3141 This flag indicates whether the target is rejecting this 3142 connection. If set to one, the target is rejecting the 3143 connection. 3145 MaxVer - Maximum Version: 4 bits 3147 This field specifies the maximum version of the iSER protocol 3148 supported. It MUST be set to 10 to indicate the version of the 3149 specification described in this document. 3151 CurVer - Current Version: 4 bits 3152 This field specifies the current version of the iSER protocol 3153 supported. It MUST be set to 10 to indicate the version of the 3154 specification described in this document. 3156 iSER-ORD: 16 bits 3158 This field contains the value of the iSER-ORD at the target. 3160 Reserved (Rsvd): 3162 Reserved fields MUST be set to zero on transmit, and MUST be 3163 ignored on receive. 
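The layout in Figure 5 can be decoded as sketched below. The HelloReply type and parse_hello_reply function are hypothetical names, and big-endian byte order with bit 0 as the most significant bit is assumed. Under that assumption the REJ flag is the least significant bit of the first byte.

```python
import struct
from typing import NamedTuple


class HelloReply(NamedTuple):
    rej: bool
    max_ver: int
    cur_ver: int
    iser_ord: int


def parse_hello_reply(header: bytes) -> HelloReply:
    """Decode an iSER HelloReply Message (header only, 28 bytes)."""
    if len(header) != 28:
        raise ValueError("a HelloReply Message is exactly the 28-byte header")
    # "24x" skips the reserved bytes, which are ignored on receive.
    byte0, versions, iser_ord = struct.unpack(">BBH24x", header)
    if byte0 >> 4 != 0b0011:
        raise ValueError("not an iSER HelloReply Message")
    return HelloReply(
        rej=bool(byte0 & 0x01),   # bit 7 of the first 32-bit word
        max_ver=versions >> 4,
        cur_ver=versions & 0x0F,
        iser_ord=iser_ord,
    )
```

An initiator that sees rej set would treat the connection as rejected; otherwise it can compare iser_ord with its own iSER-IRD as described in section 8.2.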
3165 9.5 SCSI Data Transfer Operations 3167 The iSER Layer at the initiator and the iSER Layer at the target 3168 handle each SCSI Write, SCSI Read, and bidirectional operation as 3169 described below. 3171 9.5.1 SCSI Write Operation 3173 The iSCSI Layer at the initiator MUST invoke the Send_Control 3174 Operational Primitive to request the iSER Layer at the initiator to 3175 send the SCSI Write Command. The iSER Layer at the initiator MUST 3176 request the RCaP layer to transmit a Send Message with the message 3177 payload consisting of the iSER header followed by the SCSI Command 3178 PDU and immediate data (if any). The SendSE Message should be used 3179 if supported by the RCaP layer (e.g., iWARP). If there is solicited 3180 data, the iSER Layer MUST Advertise the Write STag and the Base 3181 Offset in the iSER header of the Send Message, as described in 3182 section 9.2. Upon receiving the Send Message, the iSER Layer at the 3183 target MUST notify the iSCSI Layer at the target by invoking the 3184 Control_Notify Operational Primitive qualified with the SCSI Command 3185 PDU. See section 7.3.1 for details on the handling of the SCSI 3186 Write Command. 3188 For the non-immediate unsolicited data, the iSCSI Layer at the 3189 initiator MUST invoke a Send_Control Operational Primitive qualified 3190 with the SCSI Data-out PDU. Upon receiving each Send Message 3191 containing the non-immediate unsolicited data, the iSER Layer at the 3192 target MUST notify the iSCSI Layer at the target by invoking the 3193 Control_Notify Operational Primitive qualified with the SCSI Data- 3194 out PDU. See section 7.3.4 for details on the handling of the SCSI 3195 Data-out PDU. 3197 For the solicited data, when the iSCSI Layer at the target has an 3198 I/O Buffer available, it MUST invoke the Get_Data Operational 3199 Primitive qualified with the R2T PDU. See section 7.3.6 for details 3200 on the handling of the R2T PDU. 
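The iSER header that Advertises the Write STag for a SCSI Write Command (section 9.2, Figure 3) can be packed as sketched here. The function name and the tuple-based parameters are hypothetical, and big-endian byte order with bit 0 as the most significant bit is assumed. Fields whose valid flag is zero are set to zero, as section 9.2 requires.

```python
import struct

OPCODE_CONTROL = 0b0001


def pack_control_header(write=None, read=None) -> bytes:
    """Build the iSER header for an iSCSI control-type PDU.

    write and read are optional (stag, base_offset) tuples; passing one
    sets the corresponding WSV or RSV flag.
    """
    wsv = write is not None
    rsv = read is not None
    w_stag, w_off = write if wsv else (0, 0)
    r_stag, r_off = read if rsv else (0, 0)
    # First byte: Opcode (4 bits), WSV, RSV, then 2 reserved bits.
    byte0 = (OPCODE_CONTROL << 4) | (wsv << 3) | (rsv << 2)
    # B + 3 reserved bytes, 32-bit STag, 64-bit offset, twice: 28 bytes.
    return struct.pack(">B3xIQIQ", byte0, w_stag, w_off, r_stag, r_off)
```

For a SCSI Write with solicited data, pack_control_header(write=(stag, offset)) yields a header whose first byte is 0x18; the SCSI Command PDU and any immediate data would follow it in the Send Message payload.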
3202 When the data transfer associated with this SCSI Write operation is 3203 complete, the iSCSI Layer at the target MUST invoke the Send_Control 3204 Operational Primitive when it is ready to send the SCSI Response 3205 PDU. Upon receiving a Send Message containing the SCSI Response 3206 PDU, the iSER Layer at the initiator MUST notify the iSCSI Layer at 3207 the initiator by invoking the Control_Notify Operational Primitive 3208 qualified with the SCSI Response PDU. See section 7.3.2 for details 3209 on the handling of the SCSI Response PDU. 3211 9.5.2 SCSI Read Operation 3213 The iSCSI Layer at the initiator MUST invoke the Send_Control 3214 Operational Primitive to request the iSER Layer at the initiator to 3215 send the SCSI Read Command. The iSER Layer at the initiator MUST 3216 request the RCaP layer to transmit a Send Message with the message 3217 payload consisting of the iSER header followed by the SCSI Command 3218 PDU. The SendSE Message should be used if supported by the RCaP 3219 layer (e.g., iWARP). The iSER Layer at the initiator MUST Advertise 3220 the Read STag and the Base Offset in the iSER header of the Send 3221 Message, as described in section 9.2. Upon receiving the Send 3222 Message, the iSER Layer at the target MUST notify the iSCSI Layer at 3223 the target by invoking the Control_Notify Operational Primitive 3224 qualified with the SCSI Command PDU. See section 7.3.1 for details 3225 on the handling of the SCSI Read Command. 3227 When the requested SCSI data is available in the I/O Buffer, the 3228 iSCSI Layer at the target MUST invoke the Put_Data Operational 3229 Primitive qualified with the SCSI Data-in PDU. See section 7.3.5 3230 for details on the handling of the SCSI Data-in PDU. 3232 When the data transfer associated with this SCSI Read operation is 3233 complete, the iSCSI Layer at the target MUST invoke the Send_Control 3234 Operational Primitive when it is ready to send the SCSI Response 3235 PDU. 
The SendInvSE Message should be used if supported by the RCaP 3236 layer (e.g., iWARP). Upon receiving the Send Message containing the 3237 SCSI Response PDU, the iSER Layer at the initiator MUST notify the 3238 iSCSI Layer at the initiator by invoking the Control_Notify 3239 Operational Primitive qualified with the SCSI Response PDU. See 3240 section 7.3.2 for details on the handling of the SCSI Response PDU. 3242 9.5.3 Bidirectional Operation 3244 The initiator and the target handle the SCSI Write and the SCSI Read 3245 portions of this bidirectional operation the same as described in 3246 Section 9.5.1 and Section 9.5.2 respectively. 3248 10 iSER Error Handling and Recovery 3250 RCaP provides the iSER Layer with reliable in-order delivery. 3251 Therefore, the error management needs of an iSER-assisted connection 3252 are somewhat different than those of a Traditional iSCSI connection. 3254 10.1 Error Handling 3256 iSER error handling is described in the following sections, 3257 classified loosely based on the sources of errors: 3259 1. Those originating at the transport layer (e.g., TCP). 3261 2. Those originating at the RCaP layer. 3263 3. Those originating at the iSER Layer. 3265 4. Those originating at the iSCSI Layer. 3267 10.1.1 Errors in the Transport Layer 3269 If the transport layer is TCP, then TCP packets with detected errors 3270 are silently dropped by the TCP layer and result in retransmission 3271 at the TCP layer. This has no impact on the iSER Layer. However, 3272 connection loss (e.g., link failure) and unexpected termination 3273 (e.g., TCP graceful or abnormal close without the iSCSI Logout 3274 exchanges) at the transport layer will cause the iSCSI/iSER 3275 connection to be terminated as well. 
3277 10.1.1.1 Failure in the Transport Layer Before RCaP Mode is Enabled 3279 If the Connection is lost or terminated before the iSCSI Layer 3280 invokes the Allocate_Connection_Resources Operational Primitive, the 3281 login process is terminated and no further action is required. 3283 If the Connection is lost or terminated after the iSCSI Layer has 3284 invoked the Allocate_Connection_Resources Operational Primitive, 3285 then the iSCSI Layer MUST request the iSER Layer to deallocate all 3286 connection resources by invoking the Deallocate_Connection_Resources 3287 Operational Primitive. 3289 10.1.1.2 Failure in the Transport Layer After RCaP Mode is Enabled 3291 If the Connection is lost or terminated after the iSCSI Layer has 3292 invoked the Enable_Datamover Operational Primitive, the iSER Layer 3293 MUST notify the iSCSI Layer of the connection loss by invoking the 3294 Connection_Terminate_Notify Operational Primitive. Prior to 3295 invoking the Connection_Terminate_Notify Operational Primitive, the 3296 iSER layer MUST perform the actions described in Section 5.2.3.2. 3298 10.1.2 Errors in the RCaP Layer 3300 The RCaP layer does not have error recovery operations built in. If 3301 errors are detected at the RCaP layer, the RCaP layer will terminate 3302 the RCaP Stream and the associated Connection. 3304 10.1.2.1 Errors Detected in the Local RCaP Layer 3306 If an error is encountered at the local RCaP layer, the RCaP layer 3307 MAY send a Send Message to the Remote Peer to report the error if 3308 possible. (For iWARP, see [RDMAP] for the list of errors where a 3309 Terminate Message is sent.) The RCaP layer is responsible for 3310 terminating the Connection. After the RCaP layer notifies the iSER 3311 Layer that the Connection is terminated, the iSER Layer MUST notify 3312 the iSCSI Layer by invoking the Connection_Terminate_Notify 3313 Operational Primitive. 
Prior to invoking the Connection_Terminate_Notify 3314 Operational Primitive, the iSER layer MUST perform the 3315 actions described in Section 5.2.3.2. 3317 10.1.2.2 Errors Detected in the RCaP Layer at the Remote Peer 3319 If an error is encountered at the RCaP layer at the Remote Peer, the 3320 RCaP layer at the Remote Peer may send a Send Message to report the 3321 error if possible. If it is unable to send a Send Message, the 3322 Connection is terminated. This is treated the same as a failure in 3323 the transport layer after RDMA is enabled as described in section 3324 10.1.1.2. 3326 If an error is encountered at the RCaP layer at the Remote Peer and 3327 it is able to send a Send Message, the RCaP layer at the Remote Peer 3328 is responsible for terminating the connection. After the local RCaP 3329 layer notifies the iSER Layer that the Connection is terminated, the 3330 iSER Layer MUST notify the iSCSI Layer by invoking the Connection_Terminate_Notify 3331 Operational Primitive. Prior to invoking the 3332 Connection_Terminate_Notify Operational Primitive, the iSER layer 3333 MUST perform the actions described in Section 5.2.3.2. 3335 10.1.3 Errors in the iSER Layer 3337 The error handling due to errors at the iSER Layer is described in 3338 the following sections. 3340 10.1.3.1 Insufficient Connection Resources to Support RCaP at 3341 Connection Setup 3343 After the iSCSI Layer at the initiator invokes the 3344 Allocate_Connection_Resources Operational Primitive during the iSCSI 3345 login negotiation phase, if the iSER Layer at the initiator fails to 3346 allocate the connection resources necessary to support RCaP, it MUST 3347 return a status of failure to the iSCSI Layer at the initiator. The 3348 iSCSI Layer at the initiator MUST terminate the Connection as 3349 described in Section 5.2.3.1.
 3351 After the iSCSI Layer at the target invokes the 3352 Allocate_Connection_Resources Operational Primitive during the iSCSI 3353 login negotiation phase, if the iSER Layer at the target fails to 3354 allocate the connection resources necessary to support RCaP, it MUST 3355 return a status of failure to the iSCSI Layer at the target. The 3356 iSCSI Layer at the target MUST send a Login Response with a status 3357 class of 3 (Target Error), and a status code of "0302" (Out of 3358 Resources). The iSCSI Layers at the initiator and the target MUST 3359 terminate the Connection as described in Section 5.2.3.1. 3361 10.1.3.2 iSER Negotiation Failures 3363 If iSERHelloRequired is negotiated to "Yes" and the RCaP or iSER 3364 related parameters declared by the initiator in the iSER Hello 3365 Message are unacceptable to the iSER Layer at the target, the iSER 3366 Layer at the target MUST set the Reject (REJ) flag, as described in 3367 section 9.4, in the iSER HelloReply Message. The following are the 3368 cases when the iSER Layer MUST set the REJ flag to 1 in the 3369 HelloReply Message: 3371 * The initiator-declared iSER-IRD value is greater than 0 and the 3372 target-declared iSER-ORD value is 0. 3374 * The initiator-supported and the target-supported iSER protocol 3375 versions do not overlap. 3377 After requesting the RCaP layer to send the iSER HelloReply Message, 3378 the handling of the error situation is the same as that for iSER 3379 format errors as described in section 10.1.3.3. 3381 10.1.3.3 iSER Format Errors 3383 The following types of errors in an iSER header are considered 3384 format errors: 3386 * Illegal contents of any iSER header field 3387 * Inconsistent field contents in an iSER header 3389 * Length error for an iSER Hello or HelloReply Message (see sections 3390 9.3 and 9.4) 3392 When a format error is detected, the following events MUST occur in 3393 the specified sequence: 3395 1.
The iSER Layer MUST request the RCaP layer to terminate the RCaP 3396 Stream. The RCaP layer MUST terminate the associated 3397 Connection. 3399 2. The iSER Layer MUST notify the iSCSI Layer of the connection 3400 termination by invoking the Connection_Terminate_Notify 3401 Operational Primitive. Prior to invoking the 3402 Connection_Terminate_Notify Operational Primitive, the iSER 3403 layer MUST perform the actions described in Section 5.2.3.2. 3405 10.1.3.4 iSER Protocol Errors 3407 If iSERHelloRequired is negotiated to "Yes", then the first iSER 3408 Message sent by the iSER Layer at the initiator MUST be the iSER 3409 Hello Message (see section 9.3). In this case the first iSER 3410 Message sent by the iSER Layer at the target MUST be the iSER 3411 HelloReply Message (see section 9.4). Failure to send the iSER 3412 Hello or HelloReply Message, as indicated by the wrong Opcode in the 3413 iSER header, is a protocol error. Conversely, if the iSER Hello 3414 Message is sent by the iSER Layer at the initiator when 3415 iSERHelloRequired is negotiated to "No", the iSER Layer at the 3416 target MAY treat this as a protocol error or respond with an iSER 3417 HelloReply Message. The handling of iSER protocol errors is the 3418 same as that for iSER format errors as described in section 3419 10.1.3.3. 3421 If the sending side of an iSER-enabled connection acts in a manner 3422 not permitted by the negotiated or declared login/text operational 3423 key values as described in section 6, this is a protocol error and 3424 the receiving side MAY handle this the same as for iSER format 3425 errors as described in section 10.1.3.3. 3427 10.1.4 Errors in the iSCSI Layer 3429 The error handling due to errors at the iSCSI Layer is described in 3430 the following sections. For error recovery, see section 10.2. 

3432   10.1.4.1 iSCSI Format Errors

3434   When an iSCSI format error is detected, the iSCSI Layer MUST request
3435   the iSER Layer to terminate the RCaP Stream by invoking the
3436   Connection_Terminate Operational Primitive. For more details on the
3437   connection termination, see Section 5.2.3.1.

3439   10.1.4.2 iSCSI Digest Errors

3441   In the iSER-assisted mode, the iSCSI Layer will not see any digest
3442   error because both the HeaderDigest and the DataDigest keys are
3443   negotiated to "None".

3445   10.1.4.3 iSCSI Sequence Errors

3447   For Traditional iSCSI, sequence errors are caused by dropped PDUs
3448   due to header or data digest errors. Since digests are not used in
3449   iSER-assisted mode and the RCaP layer will deliver all messages in
3450   the order they were sent, sequence errors will not occur in iSER-
3451   assisted mode.

3453   10.1.4.4 iSCSI Protocol Error

3455   When the iSCSI Layer handles certain protocol errors by dropping the
3456   connection, the error handling is the same as that for iSCSI format
3457   errors as described in section 10.1.4.1.

3459   When the iSCSI Layer uses the iSCSI Reject PDU and response codes to
3460   handle certain other protocol errors, no special handling at the
3461   iSER Layer is required.

3463   10.1.4.5 SCSI Timeouts and Session Errors

3465   This is handled at the iSCSI Layer and no special handling at the
3466   iSER Layer is required.

3468   10.1.4.6 iSCSI Negotiation Failures

3470   For negotiation failures that happen during the Login Phase at the
3471   initiator after the iSCSI Layer has invoked the
3472   Allocate_Connection_Resources Operational Primitive and before the
3473   Enable_Datamover Operational Primitive has been invoked, the iSCSI
3474   Layer MUST request the iSER Layer to deallocate all connection
3475   resources by invoking the Deallocate_Connection_Resources
3476   Operational Primitive. The iSCSI Layer at the initiator MUST
3477   terminate the Connection.
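
The initiator-side rule for Login Phase negotiation failures can be sketched as a small decision function. This is an illustrative sketch only: the LoginState class, its field names, and the returned action strings are hypothetical and are not defined by this document; only the Operational Primitive names come from the text. Listing connection termination in every case is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LoginState:
    """Hypothetical record of which primitives the iSCSI Layer has invoked."""
    resources_allocated: bool = False  # Allocate_Connection_Resources invoked
    datamover_enabled: bool = False    # Enable_Datamover invoked


def initiator_login_failure_actions(state: LoginState) -> List[str]:
    """Return the ordered cleanup actions after a Login Phase negotiation failure."""
    actions: List[str] = []
    # Resources are released only in the window between allocation and
    # Enable_Datamover, which is the case section 10.1.4.6 describes.
    if state.resources_allocated and not state.datamover_enabled:
        actions.append("Deallocate_Connection_Resources")
    # Assumption: termination is listed unconditionally, since a failed
    # login leaves no usable connection.
    actions.append("terminate Connection")
    return actions
```

Keeping the decision in a pure function makes the ordering (deallocate first, then terminate) easy to check in isolation from any real transport code.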

3479   For negotiation failures during the Login Phase at the target, the
3480   iSCSI Layer can use a Login Response with a status class other than
3481   0 (Success) to terminate the Login Phase. If the iSCSI Layer has
3482   invoked the Allocate_Connection_Resources Operational Primitive but
3483   has not yet invoked the Enable_Datamover Operational Primitive,
3484   the iSCSI Layer at the target MUST request the iSER Layer at the
3485   target to deallocate all connection resources by invoking the
3486   Deallocate_Connection_Resources Operational Primitive. The iSCSI
3487   Layer at both the initiator and the target MUST terminate the
3488   Connection.

3490   During the iSCSI Login Phase, if the iSCSI Layer at the initiator
3491   receives a Login Response from the target with a status class other
3492   than 0 (Success) after the iSCSI Layer at the initiator has invoked
3493   the Allocate_Connection_Resources Operational Primitive, the iSCSI
3494   Layer MUST request the iSER Layer to deallocate all connection
3495   resources by invoking the Deallocate_Connection_Resources
3496   Operational Primitive. The iSCSI Layer MUST terminate the
3497   Connection in this case.

3499   For negotiation failures during the full feature phase, the error
3500   handling is left to the iSCSI Layer and no special handling at the
3501   iSER Layer is required.

3503   10.2 Error Recovery

3505   Error recovery requirements of iSCSI/iSER are the same as those of
3506   Traditional iSCSI. All three ErrorRecoveryLevels as defined in
3507   [iSCSI] are supported in iSCSI/iSER.

3509   * For ErrorRecoveryLevel 0, session recovery is handled by iSCSI
3510     and no special handling by the iSER Layer is required.

3512   * For ErrorRecoveryLevel 1, see section 10.2.1 on PDU Recovery.

3514   * For ErrorRecoveryLevel 2, see section 10.2.2 on Connection
3515     Recovery.

3517   The iSCSI Layer may invoke the Notice_Key_Values Operational
3518   Primitive during connection setup to request the iSER Layer to take
3519   note of the value of the operational ErrorRecoveryLevel, as
3520   described in sections 5.1.1 and 5.1.2.

3522   10.2.1 PDU Recovery

3524   As described in sections 10.1.4.2 and 10.1.4.3, digest and sequence
3525   errors will not occur in the iSER-assisted mode. If the RCaP layer
3526   detects an error, it will close the iSCSI/iSER connection, as
3527   described in section 10.1.2. Therefore, PDU recovery is not useful
3528   in the iSER-assisted mode.

3530   The iSCSI Layer at the initiator SHOULD disable iSCSI timeout-driven
3531   PDU retransmissions.

3533   10.2.2 Connection Recovery

3535   The iSCSI Layer at the initiator MAY reassign connection allegiance
3536   for non-immediate commands which are still in progress and are
3537   associated with the failed connection by using a Task Management
3538   Function Request with the TASK REASSIGN function. See section 7.3.3
3539   for more details.

3541   When the iSCSI Layer at the initiator does a task reassignment for a
3542   SCSI Write command, it MUST qualify the Send_Control Operational
3543   Primitive invocation with DataDescriptorOut which defines the I/O
3544   Buffer for both the non-immediate unsolicited data and the solicited
3545   data. This allows the iSCSI Layer at the target to use recovery
3546   R2Ts to request data originally sent as unsolicited and
3547   solicited from the initiator.

3549   When the iSCSI Layer at the target accepts a reassignment request
3550   for a SCSI Read command, it MUST request the iSER Layer to process
3551   SCSI Data-in for all unacknowledged data by invoking the Put_Data
3552   Operational Primitive. See section 7.3.5 on the handling of SCSI
3553   Data-in.
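
The target-side choice of Operational Primitive for a reassigned task can be sketched as follows. The Read branch reflects the paragraph above; the Write branch and the "never reassigned" restriction reflect the paragraphs that follow in this section. The Command enum, the function name, and the use of an exception are hypothetical illustrations, not part of this specification.

```python
from enum import Enum


class Command(Enum):
    READ = "SCSI Read"
    WRITE = "SCSI Write"


def target_recovery_primitive(command: Command, allegiance_reassigned: bool) -> str:
    """Name the primitive the target iSCSI Layer invokes for a reassigned task."""
    if not allegiance_reassigned:
        # The text forbids recovery R2Ts for a task whose allegiance was
        # never reassigned; treating Reads the same way is an assumption.
        raise ValueError("task was never reassigned; no recovery transfer applies")
    if command is Command.READ:
        return "Put_Data"  # process SCSI Data-in for all unacknowledged data
    return "Get_Data"      # recovery R2T for data that has not been received
```

For example, `target_recovery_primitive(Command.WRITE, True)` names Get_Data, the primitive that carries the recovery R2T.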

3555   When the iSCSI Layer at the target accepts a reassignment request
3556   for a SCSI Write command, it MUST request the iSER Layer to process
3557   a recovery R2T for any non-immediate unsolicited data and any
3558   solicited data sequences that have not been received by invoking the
3559   Get_Data Operational Primitive. See section 7.3.6 on the handling
3560   of Ready To Transfer (R2T).

3562   The iSCSI Layer at the target MUST NOT issue recovery R2Ts on an
3563   iSCSI/iSER connection for a task for which the connection allegiance
3564   was never reassigned. The iSER Layer at the target MAY reject such
3565   a recovery R2T received via the Get_Data Operational Primitive
3566   invocation from the iSCSI Layer at the target, with an appropriate
3567   error code.

3569   The iSER Layer at the target will process the requests invoked by
3570   the Put_Data and Get_Data Operational Primitives for a reassigned
3571   task in the same way as for the original commands.

3573   11 Security Considerations

3575   When iSER is layered on top of an RCaP layer and provides the RDMA
3576   extensions to the iSCSI protocol, the security considerations of
3577   iSER are the same as those of the underlying RCaP layer. For iWARP,
3578   this is described in [RDMAP] and [RDDPSEC], plus the updates to both
3579   of those RFCs that are contained in [IPS-IPSEC].

3581   Since the iSER-assisted iSCSI protocol is still functionally iSCSI from
3582   a security considerations perspective, all of the iSCSI security
3583   requirements as described in [iSCSI] apply. If iSER is layered on
3584   top of a non-IP based RCaP layer, all the security protocol
3585   mechanisms applicable to that RCaP layer are also applicable to an
3586   iSCSI/iSER connection.
       If iSER is layered on top of a non-IP
3587   protocol, the IPsec mechanism as specified in [iSCSI] MUST be
3588   implemented at any point where the iSER protocol enters the IP
3589   network (e.g., via gateways), and the non-IP protocol SHOULD
3590   implement (optional to use) a packet-by-packet security protocol
3591   equal in strength to the IPsec mechanism specified by [iSCSI].

3593   In order to protect target RCaP connection resources from possible
3594   resource exhaustion attacks, allocation of such resources for a new
3595   connection MUST be delayed until it is reasonably certain that the
3596   new connection is not part of a resource exhaustion attack (e.g.,
3597   until after the SecurityNegotiation stage of Login); see section
3598   5.1.2.

3600   A valid STag exposes I/O Buffer resources to the network for access
3601   via the RCaP. The security measures for the RCaP and iSER described
3602   in the above paragraphs can be used to protect data in an I/O buffer
3603   from undesired disclosure or modification, and these measures are of
3604   heightened importance for implementations that retain (e.g., cache)
3605   STags for use in multiple tasks (e.g., iSCSI I/O operations) because
3606   the resources are exposed to the network for a longer period of
3607   time.

3609   A complementary means of controlling I/O Buffer resource exposure is
3610   invalidation of the STag after completion of the associated task,
3611   which is RECOMMENDED in Section 2.5.1. The use of Send with
3612   Invalidate messages (which cause remote STag invalidation) is
3613   OPTIONAL; therefore, the iSER layer MUST NOT rely on use of a Send
3614   with Invalidate by its Remote Peer to cause local STag invalidation.
3615   If an STag is expected to be invalid after completion of a task, the
3616   iSER layer MUST check the STag and invalidate it if it is still
3617   valid.
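
The local STag check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the STag class and the ensure_invalidated function are hypothetical, and setting a boolean stands in for whatever RCaP-specific invalidation operation an implementation actually uses.

```python
from dataclasses import dataclass


@dataclass
class STag:
    """Hypothetical stand-in for a registered I/O Buffer handle."""
    value: int
    valid: bool = True


def ensure_invalidated(stag: STag, remotely_invalidated: bool) -> bool:
    """Make sure `stag` is invalid once its task has completed.

    Returns True if a local invalidation had to be performed.
    """
    if remotely_invalidated:
        # Models the effect of a Send with Invalidate from the Remote Peer.
        stag.valid = False
    # The local check always runs, because remote invalidation is OPTIONAL
    # and cannot be relied upon.
    if stag.valid:
        stag.valid = False  # stand-in for the real invalidate operation
        return True
    return False
```

The point of the sketch is that the outcome (an invalid STag) is the same whether or not the Remote Peer used Send with Invalidate; only the party doing the work differs.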

3619   12 IANA Considerations

3621   IANA is requested to add the following entries to the "iSCSI
3622   Login/Text Keys" registry of "iSCSI Parameters":

3624   MaxAHSLength, [RFCXXXX]

3626   TaggedBufferForSolicitedDataOnly, [RFCXXXX]

3628   iSERHelloRequired, [RFCXXXX]

3630   RFC Editor: Please replace XXXX in all instances of [RFCXXXX] above
3631   with the RFC number of this document and remove this note.

3633   IANA is requested to update the following entries in the "iSCSI
3634   Login/Text Keys" registry of "iSCSI Parameters" to reference the RFC
3635   number of this draft when it is published as an RFC.

3637   InitiatorRecvDataSegmentLength

3639   MaxOutstandingUnexpectedPDUs

3641   RDMAExtensions

3643   TargetRecvDataSegmentLength

3645   IANA is also requested to change the RFC5046 reference for the iSCSI
3646   Login/Text Keys registry to the RFC number of this document.

3648   IANA is requested to update the registrations of the iSER Opcodes 1-
3649   3 in the iSER Opcodes registry to reference the RFC number of this
3650   draft when it is published as an RFC.

3652   13 References

3654   13.1 Normative References

3656   [RFC5046] M. Ko et al., "iSCSI Extensions for Remote Direct Memory
3657   Access", RFC 5046, October 2007

3659   [iSCSI] Chadalapaka et al., "iSCSI Protocol (Consolidated)", draft-
3660   ietf-storm-iscsi-cons-08.txt (work in progress), January 2013

3662   [RDMAP] R. Recio et al., "An RDMA Protocol Specification", RFC 5040,
3663   October 2007

3665   [DDP] H. Shah et al., "Direct Data Placement over Reliable
3666   Transports", RFC 5041, October 2007

3668   [MPA] P. Culley et al., "Marker PDU Aligned Framing for TCP
3669   Specification", RFC 5044, October 2007

3671   [RDDPSEC] J. Pinkerton et al., "DDP/RDMAP Security", RFC 5042,
3672   October 2007

3674   [TCP] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
3675   September 1981

3677   [RFC2119] Bradner, S., "Key Words for use in RFCs to Indicate
3678   Requirement Levels", BCP 14, RFC 2119, March 1997

3680   [IPS-IPSEC] D. Black et al., "Securing Block Storage Protocols over
3681   IP: RFC 3723 Requirements Update for IPsec v3", draft-ietf-
3682   storm-ipsec-ips-update-03 (work in progress), July 2013

3684   13.2 Informative References

3686   [SAM5] T10/2104D rev r04, SCSI Architecture Model - 5 (SAM-5),
3687   Committee Draft.

3689   [iSCSI-SAM] F. Knight et al., "Internet Small Computer Systems
3690   Interface (iSCSI) SCSI Architecture Features Update", draft-
3691   ietf-storm-iscsi-sam-04.txt (work in progress), August 2011

3693   [DA] M. Chadalapaka et al., "Datamover Architecture for iSCSI", RFC
3694   5047, October 2007

3696   [IB] InfiniBand Architecture Specification Volume 1 Release 1.2,
3697   October 2004

3699   [IPoIB] H.K. Chu et al., "Transmission of IP over InfiniBand", RFC
3700   4391, March 2006

3702   14 Appendix A: Summary of Changes from RFC 5046

3704   All changes are backward compatible with RFC 5046 except for item #8
3705   which reflects all known implementations of iSER, each of which has
3706   implemented this change, despite its absence in RFC 5046. As a
3707   result, a hypothetical implementation based on RFC 5046 will not
3708   interoperate with an implementation based on this version of the
3709   specification.

3711   1. Removed the requirement that a connection be opened in "normal"
3712   TCP mode and transitioned to zero-copy mode. This allows the spec
3713   to conform to existing implementations for both InfiniBand and
3714   iWARP. Changes were made in sections 2, 3.1.6, 4.2, 5.1, 5.1.1,
3715   5.1.2, 5.1.3, 10.1.3.4, and 11.

3717   2. Added a clause in section 6.2 to clarify that
3718   MaxRecvDataSegmentLength must be ignored if it is declared in the
3719   Login Phase.

3721   3. Added a clause in section 6.2 to clarify that the initiator must
3722   not send more than InitiatorMaxRecvDataSegmentLength worth of data
3723   when a NOP-Out request is sent with a valid Initiator Task Tag.
3724   Since InitiatorMaxRecvDataSegmentLength can be smaller than
3725   TargetMaxRecvDataSegmentLength, returning the original data in the
3726   NOP-Out request in this situation can overflow the receive buffer
3727   unless the length of the data sent with the NOP-Out request is
3728   less than InitiatorMaxRecvDataSegmentLength.

3730   4. Added a SHOULD negotiate recommendation for
3731   MaxOutstandingUnexpectedPDUs in section 6.7.

3733   5. Added MaxAHSLength key in section 6.8 to set a limit on the AHS
3734   Length. This is useful when posting receive buffers, because it
3735   establishes the maximum possible message length of a PDU which
3736   contains AHS.

3738   6. Added TaggedBufferForSolicitedDataOnly key in section 6.9 to
3739   indicate how the memory region will be used. An initiator can
3740   treat the memory regions intended for unsolicited and solicited
3741   data differently, and can use different registration modes. In
3742   contrast, RFC 5046 treats the memory occupied by the data as a
3743   contiguous (or virtually contiguous, by means of scatter-gather
3744   mechanisms) and homogeneous region. Adding a new key will allow
3745   different memory models to be accommodated. Changes were also
3746   made in section 7.3.1.

3748   7. Added iSERHelloRequired key in section 6.10 to allow an initiator
3749   to allocate connection resources after the login process by
3750   requiring the use of the iSER Hello messages before sending iSCSI
3751   PDUs. The default is "No" since iSER Hello messages have not been
3752   implemented and are not in use. Changes were made in sections
3753   5.1.1, 5.1.2, 5.1.3, 8.2, 9.3, 9.4, 10.1.3.2 and 10.1.3.4.

3755   8. Added two 64-bit fields in iSER header in section 9.2 for the Read
3756   Base Offset and the Write Base Offset to accommodate a non-zero
3757   Base Offset. This allows one implementation such as the OFED
3758   stack to be used in both the InfiniBand and the iWARP environment.
3759   Changes were made in the definition of Base Offset, Advertisement,
3760   and Tagged Buffer. Changes were also made in sections 2.4.1, 2.5,
3761   2.6, 7.3.1, 7.3.3, 7.3.5, 7.3.6, 9.1, 9.3, 9.4, 9.5.1, and 9.5.2.
3762   This change is not backward compatible with RFC 5046, but is part
3763   of all known implementations of iSER at the time this document was
3764   developed.

3766   9. Removed iWARP-specific behavior. Changes were made in the
3767   definition section on RDMA Operation and Send Message Type.
3768   Clarifications were added in section 2.4.2 on the use of SendSE
3769   and SendInvSE. These clarifications reflect a removal of the
3770   requirements in RFC 5046 for the use of these messages, as
3771   implementations have not followed RFC 5046 in this area. Changes
3772   affecting Send with Invalidate were made in sections 2.4.1, 2.5,
3773   2.6, 4.1, and 7.3.2. Changes affecting Terminate were made in
3774   sections 10.1.2.1 and 10.1.2.2. Changes were made in section 15
3775   to remove iWARP headers.

3777   10. Removed denial of service descriptions for the initiator in
3778   section 5.1.1 since they are applicable to the target only.

3780   11. Clarified in section 2.4.1 that STag invalidation is the
3781   initiator's responsibility for security reasons, and the initiator
3782   cannot rely on the target using an Invalidate version of Send.
3783   Added text in section 11 on STag invalidation.

3785   15 Appendix B: Message Format for iSER

3787   This section is for information only and is NOT part of the
3788   standard.

3790   15.1 iWARP Message Format for iSER Hello Message

3792   The following figure depicts an iSER Hello Message encapsulated in
3793   an iWARP SendSE Message.

3795    0                   1                   2                   3
3796    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
3797   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3798   |          MPA Header           |  DDP Control  | RDMA Control  |
3799   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3800   |                           Reserved                            |
3801   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3802   |                      (Send) Queue Number                      |
3803   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3804   |                (Send) Message Sequence Number                 |
3805   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3806   |                     (Send) Message Offset                     |
3807   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3808   | 0010b | Zeros | 0001b | 0001b |           iSER-IRD            |
3809   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3810   |                           All Zeros                           |
3811   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3812   |                                                               |
3813   |                           All Zeros                           |
3814   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3815   |                           All Zeros                           |
3816   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3817   |                                                               |
3818   |                           All Zeros                           |
3820   |                            MPA CRC                            |
3821   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3822   Figure 6 SendSE Message containing an iSER Hello Message

3824   15.2 iWARP Message Format for iSER HelloReply Message

3826   The following figure depicts an iSER HelloReply Message encapsulated
3827   in an iWARP SendSE Message. The Reject (REJ) flag is set to 0.

3829    0                   1                   2                   3
3830    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
3831   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3832   |          MPA Header           |  DDP Control  | RDMA Control  |
3833   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3834   |                           Reserved                            |
3835   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3836   |                      (Send) Queue Number                      |
3837   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3838   |                (Send) Message Sequence Number                 |
3839   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3840   |                     (Send) Message Offset                     |
3841   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3842   | 0011b |Zeros|0| 0001b | 0001b |           iSER-ORD            |
3843   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3844   |                           All Zeros                           |
3845   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3846   |                                                               |
3847   |                           All Zeros                           |
3848   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3849   |                           All Zeros                           |
3850   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3851   |                                                               |
3852   |                           All Zeros                           |
3853   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3854   |                            MPA CRC                            |
3855   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3856   Figure 7 SendSE Message containing an iSER HelloReply Message

3858   15.3 iSER Header Format for SCSI Read Command PDU

3860   The following figure depicts a SCSI Read Command PDU embedded in an
3861   iSER Message. For this particular example, in the iSER header, the
3862   Write STag Valid flag is set to zero, the Read STag Valid flag is
3863   set to one, the Write STag field is set to all zeros, the Write Base
3864   Offset field is set to all zeros, the Read STag field contains a
3865   valid Read STag, and the Read Base Offset field contains a valid
3866   Base Offset for the Read Tagged Buffer.

3868    0                   1                   2                   3
3869    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
3870   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3871   | 0001b |0|1|                     All zeros                     |
3872   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3873   |                           All Zeros                           |
3874   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3875   |                                                               |
3876   |                           All Zeros                           |
3877   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3878   |                           Read STag                           |
3879   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3880   |                                                               |
3881   |                       Read Base Offset                        |
3882   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3883   |                     SCSI Read Command PDU                     |
3884   //                                                             //
3885   |                                                               |
3886   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3888   Figure 8 iSER Header Format for SCSI Read Command PDU

3890   15.4 iSER Header Format for SCSI Write Command PDU

3892   The following figure depicts a SCSI Write Command PDU embedded in an
3893   iSER Message. For this particular example, in the iSER header, the
3894   Write STag Valid flag is set to one, the Read STag Valid flag is set
3895   to zero, the Write STag field contains a valid Write STag, the Write
3896   Base Offset field contains a valid Base Offset for the Write Tagged
3897   Buffer, the Read STag field is set to all zeros since it is not
3898   used, and the Read Base Offset field is set to all zeros.

3900    0                   1                   2                   3
3901    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
3902   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3903   | 0001b |1|0|                     All zeros                     |
3904   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3905   |                          Write STag                           |
3906   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3907   |                                                               |
3908   |                       Write Base Offset                       |
3909   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3910   |                           All Zeros                           |
3911   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3912   |                                                               |
3913   |                           All Zeros                           |
3914   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3915   |                    SCSI Write Command PDU                     |
3916   //                                                             //
3917   |                                                               |
3918   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3919   Figure 9 iSER Header Format for SCSI Write Command PDU

3921   15.5 iSER Header Format for SCSI Response PDU

3923   The following figure depicts a SCSI Response PDU embedded in an iSER
3924   Message:

3926    0                   1                   2                   3
3927    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
3928   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3929   | 0001b |0|0|                     All Zeros                     |
3930   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3931   |                           All Zeros                           |
3932   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3933   |                                                               |
3934   |                           All Zeros                           |
3935   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3936   |                           All Zeros                           |
3937   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3938   |                                                               |
3939   |                           All Zeros                           |
3940   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3941   |                       SCSI Response PDU                       |
3942   //                                                             //
3943   |                                                               |
3944   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3945   Figure 10 iSER Header Format for SCSI Response PDU

3947   16 Appendix C: Architectural discussion of iSER over InfiniBand

3949   This section explains how an InfiniBand network (with Gateways)
3950   would be structured.
       It is informational only and is intended to
3951   provide insight on how iSER is used in an InfiniBand environment.

3953   16.1 Host side of iSCSI & iSER connections in InfiniBand

3955   Figure 11 defines the topologies in which iSCSI and iSER will be
3956   able to operate on an InfiniBand Network.

3958   +---------+ +---------+ +---------+ +---------+ +---------+
3959   |  Host   | |  Host   | |  Host   | |  Host   | |  Host   |
3960   |         | |         | |         | |         | |         |
3961   +---+-+---+ +---+-+---+ +---+-+---+ +---+-+---+ +---+-+---+
3962   |HCA| |HCA| |HCA| |HCA| |HCA| |HCA| |HCA| |HCA| |HCA| |HCA|
3963   +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+ +-v-+
3964     |----+------|-----+-----|-----+-----|-----+-----|-----+---> To IB
3965   IB|        IB |        IB |        IB |        IB |    SubNet2 SWTCH
3966   +-v-----------v-----------v-----------v-----------v---------+
3967   |               InfiniBand Switch for Subnet1               |
3968   +---+-----+--------+-----+--------+-----+------------v------+
3969       | TCA |        | TCA |        | TCA |            |
3970       +-----+        +-----+        +-----+            | IB
3971      /  IB   \      /  IB   \      /       \     +--+--v--+--+
3972     |  iSER   |    |  iSER   |    |  IPoIB  |    |  | TCA |  |
3973     | Gateway |    | Gateway |    | Gateway |    |  +-----+  |
3974     |   to    |    |   to    |    |   to    |    |  Storage  |
3975     |  iSCSI  |    |  iSER   |    |   IP    |    | Controller|
3976     |   TCP   |    |  iWARP  |    |Ethernet |    +-----+-----+
3977     +---v-----|    +---v-----|    +----v----+
3978         | EN           | EN            | EN
3979         +--------------+---------------+----> to IP based storage
3980           Ethernet links that carry iSCSI or iWARP

3982   Figure 11 iSCSI and iSER on IB

3984   In Figure 11, the Host systems are connected via the InfiniBand Host
3985   Channel Adapters (HCAs) to the InfiniBand links. With the use of IB
3986   switch(es), the InfiniBand links connect the HCA to InfiniBand
3987   Target Channel Adapters (TCAs) located in gateways or Storage
3988   Controllers. An iSER-capable IB-IP Gateway converts the iSER
3989   Messages encapsulated in IB protocols to either standard iSCSI, or
3990   iSER Messages for iWARP.
       An [IPoIB] Gateway converts the InfiniBand
3991   [IPoIB] protocol to IP protocol, and in the iSCSI case, permits
3992   iSCSI to be operated on an IB Network between the Hosts and the
3993   [IPoIB] Gateway.

3995   16.2 Storage side of iSCSI & iSER mixed network environment

3997   Figure 12 shows a storage controller that has three different portal
3998   groups: one supporting only iSCSI (TPG-4), one supporting iSER/iWARP
3999   or iSCSI (TPG-2), and one supporting iSER/IB (TPG-1).

4001         |                |                |
4002         |                |                |
4003   +--+--v--+----------+--v--+----------+--v--+--+
4004   |  | IB  |          |iWARP|          | EN  |  |
4005   |  |     |          | TCP |          | NIC |  |
4006   |  |(TCA)|          | RNIC|          |     |  |
4007   |  +-----|          +-----+          +-----+  |
4008   |   TPG-1            TPG-2            TPG-4   |
4009   |  9.1.3.3          9.1.2.4          9.1.2.6  |
4010   |                                             |
4011   |             Storage Controller              |
4012   |                                             |
4013   +---------------------------------------------+

4015   Figure 12 Storage Controller with TCP, iWARP, and IB Connections

4017   The normal iSCSI portal group advertising processes (via SLP, iSNS,
4018   or SendTargets) are available to a Storage Controller.

4020   16.3 Discovery processes for an InfiniBand Host

4022   An InfiniBand Host system can gather portal group IP addresses from
4023   SLP, iSNS, or the SendTargets discovery processes by using TCP/IP
4024   via [IPoIB]. After obtaining one or more remote portal IP
4025   addresses, the Initiator uses the standard IP mechanisms to resolve
4026   the IP address to a local outgoing interface and the destination
4027   hardware address (Ethernet MAC or IB GID of the target or a gateway
4028   leading to the target). If the resolved interface is an [IPoIB]
4029   network interface, then the target portal can be reached through an
4030   InfiniBand fabric. In this case the Initiator can establish an
4031   iSCSI/TCP or iSCSI/iSER session with the Target over that InfiniBand
4032   interface, using the Hardware Address (InfiniBand GID) obtained
4033   through the standard Address Resolution (ARP) processes.
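
When discovery returns several portal addresses, the same-subnet preference recommended in the next paragraph can be sketched with Python's standard ipaddress module. The function name and the fallback to the first candidate are assumptions made for illustration; the addresses in the usage note are documentation-range examples.

```python
import ipaddress
from typing import Iterable, Optional


def pick_target_address(local_interface: str, candidates: Iterable[str]) -> Optional[str]:
    """Prefer a discovered portal address on the Initiator's own IP subnet.

    `local_interface` is the Initiator address with its prefix length,
    for example "192.0.2.10/24".
    """
    local_net = ipaddress.ip_interface(local_interface).network
    found = list(candidates)
    for addr in found:
        # A same-subnet target avoids the overhead of going through a gateway.
        if ipaddress.ip_address(addr) in local_net:
            return addr
    # Assumption: with no same-subnet candidate, fall back to the first one.
    return found[0] if found else None
```

With a local interface of "192.0.2.10/24" and candidates "198.51.100.7" and "192.0.2.20", the sketch selects "192.0.2.20", since it shares the Initiator's subnet.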

4035   If more than one IP address is obtained through the discovery
4036   process, the Initiator should select a Target IP address that is on
4037   the same IP subnet as the Initiator if one exists. This will avoid
4038   a potential overhead of going through a gateway when a direct path
4039   exists.

4041   In addition, a user can configure manual static IP route entries if a
4042   particular path to the target is preferred.

4044   16.4 IBTA Connection specifications

4046   It is outside the scope of this document, but it is expected that
4047   the InfiniBand Trade Association (IBTA) has defined or will define:

4049   * The iSER ServiceID

4051   * A means for permitting a Host to establish a connection with a
4052     peer InfiniBand end-node, and for that peer to indicate whether
4053     it supports iSER, so that the Host can fall back to iSCSI/TCP
4054     over [IPoIB] when it does not.

4056   * A means for permitting the Host to establish connections with
4057     IB iSER connections on storage controllers or IB iSER connected
4058     Gateways in preference to [IPoIB] connected Gateways/Bridges or
4059     connections to Target Storage Controllers that also accept
4060     iSCSI via [IPoIB].

4062   * A means for combining the IB ServiceID for iSER and the IP port
4063     number such that the IB Host can use normal IB connection
4064     processes, yet ensure that the iSER target peer can actually
4065     connect to the required IP port number.

4067   17 Acknowledgments

4069   The authors acknowledge the following individuals for identifying
4070   implementation issues and/or suggesting resolutions to the issues
4071   clarified in this document: Robert Russell, Arne Redlich, David
4072   Black, Mallikarjun Chadalapaka, Tom Talpey, Felix Marti, Robert
4073   Sharp, Caitlin Bestler, Hemal Shah, Spencer Dawkins, Pete Resnick,
4074   Ted Lemon, Pete McCann, and Steve Kent.
       Credit also goes to the
4075   authors of the original iSER Specification [RFC5046], including
4076   Michael Ko, Mallikarjun Chadalapaka, John Hufferd, Uri Elzur, Hemal
4077   Shah, and Patricia Thaler. This document benefited from all of
4078   their contributions.

4080   Authors' Addresses

4082   Michael Ko
4083   Email: mkosjc@gmail.com

4085   Alexander Nezhinsky
4086   Mellanox Technologies
4087   13 Zarchin St.
4088   Raanana 43662, Israel
4089   Phone: +972-74-712-9000
4090   Email: alexandern@mellanox.com, nezhinsky@gmail.com

4092   Copyright Notice

4094   Copyright (c) 2012 IETF Trust and the persons identified as the
4095   document authors. All rights reserved.

4097   This document is subject to BCP 78 and the IETF Trust's Legal
4098   Provisions Relating to IETF Documents
4099   (http://trustee.ietf.org/license-info) in effect on the date of
4100   publication of this document. Please review these documents
4101   carefully, as they describe your rights and restrictions with
4102   respect to this document. Code Components extracted from this
4103   document must include Simplified BSD License text as described in
4104   Section 4.e of the Trust Legal Provisions and are provided without
4105   warranty as described in the Simplified BSD License.