INTERNET DRAFT                              Mallikarjun Chadalapaka
draft-ietf-ips-iscsi-impl-guide-02.txt          Hewlett-Packard Co.
                                                             Editor

Expires September 2006

                    iSCSI Implementer's Guide

Status of this Memo

By submitting this Internet-Draft, each author represents
that any applicable patent or other IPR claims of which he or
she is aware have been or will be disclosed, and any of which
he or she becomes aware will be disclosed, in accordance with
Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet
Engineering Task Force (IETF), its areas, and its working
groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of
six months and may be updated, replaced, or obsoleted by
other documents at any time. It is inappropriate to use
Internet-Drafts as reference material or to cite them other
than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/1id-abstracts.html

The list of Internet-Draft Shadow Directories can be accessed
at http://www.ietf.org/shadow.html.

Abstract

iSCSI is a SCSI transport protocol and maps the SCSI family
of application protocols onto TCP/IP. RFC 3720 defines the
iSCSI protocol. This document compiles the clarifications to
the original protocol definition in RFC 3720 to serve as a
companion document for the iSCSI implementers. This document
updates RFC 3720 and the text in this document supersedes the
text in RFC 3720 when the two differ.

Table of Contents

1 Definitions and acronyms
   1.1 Definitions
   1.2 Acronyms
2 Introduction
3 iSCSI semantics for SCSI tasks
   3.1 Residual handling
      3.1.1 Overview
      3.1.2 SCSI REPORT LUNS and Residual Overflow
   3.2 R2T Ordering
   3.3 SCSI Protocol Interface Model for Response Ordering
      3.3.1 Model Description
      3.3.2 iSCSI Semantics with the Interface Model
      3.3.3 Current List of Fenced Response Use Cases
4 Task Management
   4.1 Requests Affecting Multiple Tasks
      4.1.1 Scope of affected tasks
      4.1.2 Clarified multi-task abort semantics
      4.1.3 Updated multi-task abort semantics
      4.1.4 Rationale behind the new semantics
5 Discovery semantics
   5.1 Error Recovery for Discovery Sessions
   5.2 Reinstatement Semantics of Discovery Sessions
      5.2.1 Unnamed Discovery Sessions
      5.2.2 Named Discovery Sessions
   5.3 TPGT Values
   5.4 Session type negotiation
6 iSCSI Error Handling and Recovery
   6.1 ITT
   6.2 Format Errors
   6.3 Digest Errors
7 iSCSI PDUs
   7.1 Asynchronous Message
8 Login/Text Operational Text Keys
   8.1 FastMultiTaskAbort
9 Security Considerations
10 IANA Considerations
11 References and Bibliography
   11.1 Normative References
   11.2 Informative References
12 Editor's Address
13 Acknowledgements
14 Full Copyright Statement
15 Intellectual Property Statement

1 Definitions and acronyms

1.1 Definitions

I/O Buffer - A buffer that is used in a SCSI Read or Write
operation so SCSI data may be sent from or received into
that buffer. For a read or write data transfer to take
place for a task, an I/O Buffer is required on the
initiator and at least one is required on the target.

SCSI-Presented Data Transfer Length (SPDTL): SPDTL is the
aggregate data length of the data that the SCSI layer
logically "presents" to the iSCSI layer for a Data-in or
Data-out transfer in the context of a SCSI task. For a
bidirectional task, there are two SPDTL values - one for
Data-in and one for Data-out. Note that the notion of
"presenting" includes immediate data per the data
transfer model in [SAM2], and excludes overlapping data
transfers, if any, requested by the SCSI layer.

Third-party: A term used in this document to denote nexus
objects (I_T or I_T_L) and iSCSI sessions which reap the
side-effects of actions that took place in the context of a
separate iSCSI session, while being third parties to the
action that caused the side-effects.
One example of a Third-party session is an iSCSI session
hosting an I_T_L nexus to an LU that is reset with an LU
Reset TMF via a separate I_T nexus.

1.2 Acronyms

Acronym     Definition
-------------------------------------------------------------
EDTL        Expected Data Transfer Length
IANA        Internet Assigned Numbers Authority
IETF        Internet Engineering Task Force
I/O         Input - Output
IP          Internet Protocol
iSCSI       Internet SCSI
iSER        iSCSI Extensions for RDMA
ITT         Initiator Task Tag
LO          Leading Only
LU          Logical Unit
LUN         Logical Unit Number
PDU         Protocol Data Unit
RDMA        Remote Direct Memory Access
R2T         Ready To Transfer
R2TSN       Ready To Transfer Sequence Number
RFC         Request For Comments
SAM         SCSI Architecture Model
SCSI        Small Computer Systems Interface
SN          Sequence Number
SNACK       Selective Negative Acknowledgment - also
            Sequence Number Acknowledgement for data
TCP         Transmission Control Protocol
TMF         Task Management Function
TTT         Target Transfer Tag
UA          Unit Attention

2 Introduction

Several iSCSI implementations have been built since [RFC3720]
was published and the iSCSI community is now richer by the
resulting implementation expertise. The goal of this document
is to leverage this expertise both to offer clarifications to
the [RFC3720] semantics and to address defects in [RFC3720] as
appropriate. This document intends to offer critical guidance
to implementers with regard to non-obvious iSCSI implementation
aspects so as to improve interoperability and accelerate iSCSI
adoption. This document, however, does not purport to be an
all-encompassing iSCSI how-to guide for implementers, nor a
complete revision of [RFC3720]. This document instead is
intended as a companion document to [RFC3720] for the iSCSI
implementers.

iSCSI implementers are required to reference [RFC3722] and
[RFC3723] in addition to [RFC3720] for mandatory requirements.
[RFC3721] also contains useful information for iSCSI
implementers. The text in this document, however, updates and
supersedes the text in all the noted RFCs wherever the texts
differ.

3 iSCSI semantics for SCSI tasks

3.1 Residual handling

Section 10.4.1 of [RFC3720] defines the notion of "residuals"
and specifies how the residual information should be encoded
into the SCSI Response PDU in Counts and Flags fields. Section
3.1.1 clarifies the intent of [RFC3720] and explains the general
principles. Section 3.1.2 describes the residual handling in
the REPORT LUNS scenario.

3.1.1 Overview

SCSI-Presented Data Transfer Length (SPDTL) is the term this
document uses (see section 1.1 for definition) to represent the
aggregate data length that the target SCSI layer attempts to
transfer using the local iSCSI layer for a task. Expected Data
Transfer Length (EDTL) is the iSCSI term that represents the
length of data that the iSCSI layer expects to transfer for a
task. EDTL is specified in the SCSI Command PDU.

When SPDTL = EDTL for a task, the target iSCSI layer completes
the task with no residuals. Whenever SPDTL differs from EDTL
for a task, that task is said to have a residual.

If SPDTL > EDTL for a task, iSCSI Overflow MUST be signaled in
the SCSI Response PDU as specified in [RFC3720]. Residual Count
MUST be set to the numerical value of (SPDTL - EDTL).

If SPDTL < EDTL for a task, iSCSI Underflow MUST be signaled in
the SCSI Response PDU as specified in [RFC3720]. Residual Count
MUST be set to the numerical value of (EDTL - SPDTL).

Note that the Overflow and Underflow scenarios are independent
of Data-in and Data-out. Either scenario is logically possible
in either direction of data transfer.

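The residual rules of section 3.1.1 can be sketched in a few lines of
Python. This is a minimal illustration under stated assumptions, not
text from [RFC3720]: the function name compute_residual and the
(indication, count) tuple it returns are hypothetical, chosen only for
this example. The "overflow" and "underflow" labels stand for the
Overflow and Underflow indications carried in the SCSI Response PDU.

```python
def compute_residual(spdtl, edtl):
    """Return (indication, residual_count) for one data direction of a task.

    spdtl: SCSI-Presented Data Transfer Length, in bytes.
    edtl:  Expected Data Transfer Length from the SCSI Command PDU, in bytes.
    Both the function name and the return shape are illustrative assumptions.
    """
    if spdtl < 0 or edtl < 0:
        raise ValueError("lengths must be non-negative")
    if spdtl > edtl:
        # More data presented than expected: Overflow, count = SPDTL - EDTL.
        return ("overflow", spdtl - edtl)
    if spdtl < edtl:
        # Less data presented than expected: Underflow, count = EDTL - SPDTL.
        return ("underflow", edtl - spdtl)
    # SPDTL = EDTL: the task completes with no residual.
    return ("none", 0)
```

For a bidirectional task the same check would be applied twice, once
with the Data-in SPDTL and once with the Data-out SPDTL, because each
direction has its own SPDTL value.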
3.1.2 SCSI REPORT LUNS and Residual Overflow

The specification of the SCSI REPORT LUNS command requires that
the SCSI target limit the amount of data transferred to a
maximum size (ALLOCATION LENGTH) provided by the initiator in
the REPORT LUNS CDB. If the Expected Data Transfer Length
(EDTL) in the iSCSI header of the SCSI Command PDU for a REPORT
LUNS command is at least as large as that ALLOCATION LENGTH,
the SCSI layer truncation prevents an iSCSI Residual Overflow
from occurring. A SCSI initiator can detect that such
truncation has occurred via other information at the SCSI layer.
The rest of the section elaborates on this required behavior.

iSCSI uses the (O) bit (bit 5) in the Flags field of the SCSI
Response and the last SCSI Data-In PDUs to indicate that an
iSCSI target was unable to transfer all of the SCSI data for a
command to the initiator because the amount of data to be
transferred exceeded the EDTL in the corresponding SCSI Command
PDU (see Section 10.4.1 of [RFC3720]).

The SCSI REPORT LUNS command requests a target SCSI layer to
return a logical unit inventory (LUN list) to the initiator SCSI
layer (see section 6.21 of SPC-3 [SPC3]). The size of this LUN
list may not be known to the initiator SCSI layer when it issues
the REPORT LUNS command; to avoid transfer of more LUN list data
than the initiator is prepared for, the REPORT LUNS CDB contains
an ALLOCATION LENGTH field to specify the maximum amount of data
to be transferred to the initiator for this command. If the
initiator SCSI layer has under-estimated the number of logical
units at the target, it is possible that the complete logical
unit inventory does not fit in the specified ALLOCATION LENGTH.
In this situation, section 4.3.3.6 in [SPC3] requires that the
target SCSI layer "shall terminate transfers to the Data-In
Buffer" when the number of bytes specified by the ALLOCATION
LENGTH field have been transferred.

Therefore, in response to a REPORT LUNS command, the SCSI layer
at the target presents at most ALLOCATION LENGTH bytes of data
(logical unit inventory) to iSCSI for transfer to the initiator.
For a REPORT LUNS command, if the iSCSI EDTL is at least as
large as the ALLOCATION LENGTH, the SCSI truncation ensures that
the EDTL will accommodate all of the data to be transferred. If
all of the logical unit inventory data presented to the iSCSI
layer - i.e. the data remaining after any SCSI truncation - is
transferred to the initiator by the iSCSI layer, an iSCSI
Residual Overflow has not occurred and the iSCSI (O) bit MUST
NOT be set in the SCSI Response or final SCSI Data-In PDU.
This is not a new requirement but is already required by the
combination of [RFC3720] with the specification of the REPORT
LUNS command in [SPC3]. Note, however, that if the iSCSI EDTL
is larger than the ALLOCATION LENGTH in this scenario, iSCSI
Underflow MUST be signaled in the SCSI Response PDU. An iSCSI
Underflow MUST also be signaled when the iSCSI EDTL is equal to
ALLOCATION LENGTH but the logical unit inventory data presented
to the iSCSI layer is smaller than ALLOCATION LENGTH.

The LUN LIST LENGTH field in the logical unit inventory (first
field in the inventory) is not affected by truncation of the
inventory to fit in ALLOCATION LENGTH; this enables a SCSI
initiator to determine that the received inventory is incomplete
by noticing that the LUN LIST LENGTH in the inventory is larger
than the ALLOCATION LENGTH that was sent in the REPORT LUNS CDB.
A common initiator behavior in this situation is to re-issue the
REPORT LUNS command with a larger ALLOCATION LENGTH.

3.2 R2T Ordering

Section 10.8 in [RFC3720] says the following:

   The target may send several R2T PDUs. It, therefore, can have
   a number of pending data transfers. The number of outstanding
   R2T PDUs are limited by the value of the negotiated key
   MaxOutstandingR2T. Within a connection, outstanding R2Ts MUST
   be fulfilled by the initiator in the order in which they were
   received.

The quoted [RFC3720] text is unclear on the scope of
applicability - either per task, or across all tasks on a
connection - and may be interpreted as either. This section
clarifies that the scope of applicability of the quoted text is
a task. No R2T ordering relationship - either in generation at
the target or in fulfilling at the initiator - across tasks is
implied. That is, outstanding R2Ts within a task MUST be
fulfilled by the initiator in the order in which they were
received on a connection.

3.3 SCSI Protocol Interface Model for Response Ordering

Whenever an iSCSI session is composed of multiple connections,
the Response PDUs (task responses or TMF responses) originating
in the target SCSI layer are distributed onto the multiple
connections by the target iSCSI layer according to iSCSI
connection allegiance rules. This process generally may not
preserve the ordering of the responses by the time they are
delivered to the initiator SCSI layer. Since ordering is not
expected across SCSI responses anyway, this approach works fine
in the general case. However, to address the special cases
where some ordering is desired by the SCSI layer, the following
SCSI protocol interface model is assumed.

3.3.1 Model Description

The SCSI protocol layer notifies the SCSI transport layer of a
"Response Fence" associated with the response in question when
the "Send Command Complete" protocol data service (SAM-2, clause
5.4.2) and "Task Management Function Executed" (SAM-2, clause
6.9) service are invoked. The Response Fence flag instructs the
SCSI transport layer that the following conditions must be met
in delivering the response message:

(1) Response with Response Fence MUST chronologically be
    delivered after all the "preceding" responses on the
    I_T_L nexus, if the preceding responses are delivered at
    all, to the application client on the initiator.

(2) Response with Response Fence MUST chronologically be
    delivered prior to all the "following" responses on the
    I_T_L nexus.

The "preceding" and "following" notions refer to the order of
hand-off of a response message from the target SCSI protocol
layer to the target SCSI transport (e.g. iSCSI) layer.

3.3.2 iSCSI Semantics with the Interface Model

The target iSCSI layer MUST do the following on sensing the
"Response Fence" flag associated with a response being handed
down from the target SCSI layer:

a) If it is a single-connection session, no special processing
   is required. The standard SCSI Response PDU build process
   happens.

b) If it is a multi-connection session, the target iSCSI layer
   takes note of the last-sent and unacknowledged StatSN on
   each of the connections in the iSCSI session, and waits for
   acknowledgement (may solicit acknowledgement by way of a
   Nop-In) of each such StatSN to clear the fence. The SCSI
   response with the Response Fence flag must be sent to the
   initiator only after receiving acknowledgements for each of
   the unacknowledged StatSNs.

c) The target iSCSI layer must wait for an acknowledgement of
   the SCSI Response PDU that carried the response which the
   target SCSI layer marked with the Response Fence flag. The
   fence must be considered cleared after receiving the
   acknowledgement.

d) All further status processing for the LU is resumed only
   after clearing the fence. If any new responses for the
   I_T_L nexus are received from the SCSI layer before the
   fence is cleared, those Response PDUs must be held and
   queued at the iSCSI layer until the fence is cleared.

3.3.3 Current List of Fenced Response Use Cases

This section lists the fenced response use cases that iSCSI
implementations must comply with. However, this is not an
exhaustive enumeration. It is expected that as SCSI protocol
specifications evolve, the specifications will specify when
response fencing is required on a case-by-case basis.

The Response Fence flag MUST be assumed set by the target SCSI
layer on the following SCSI completion messages handed down to
the target iSCSI layer:

1. The first completion message carrying the UA after the
   multi-task abort on issuing and third-party sessions.

2. The TMF Response carrying the multi-task TMF Response on the
   issuing session.

3. The completion message indicating ACA establishment on the
   issuing session.

4. The first completion message carrying the ACA ACTIVE status
   after ACA establishment on issuing and third-party
   sessions.

5. The TMF Response carrying the Clear ACA response on the
   issuing session.

Note: Due to the absence of ACA-related fencing requirements in
[RFC3720], initiator implementations SHOULD NOT use ACA on
multi-connection iSCSI sessions to targets complying only with
[RFC3720], i.e. those not complying with this document.
Initiators may assess target compliance with this document by
negotiating the FastMultiTaskAbort key (section 8.1).

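The multi-connection fence handling of section 3.3.2 (steps b through
d) can be sketched as a small state machine. The Python below is a toy
model under stated assumptions, not an implementation taken from this
document or from [RFC3720]: the class name ResponseFence, its methods,
and the use of plain integers for StatSN (no serial-number wraparound)
are all hypothetical. An acknowledgement is modelled as cumulative per
connection, and one instance tracks a single I_T_L nexus.

```python
from collections import deque


class ResponseFence:
    """Toy model of target-side Response Fence handling (hypothetical API)."""

    def __init__(self):
        self.wire = []         # names of responses sent, in send order
        self.unacked = set()   # (conn, statsn) sent but not yet acknowledged
        self.held = deque()    # responses queued while a fence is active (step d)
        self.pending = None    # fenced response waiting on older acks (step b)
        self.fence_key = None  # (conn, statsn) of a sent fenced response (step c)

    def fence_active(self):
        return self.pending is not None or self.fence_key is not None

    def submit(self, name, conn, statsn, fenced=False):
        """Hand a response down from the SCSI layer."""
        if self.fence_active():
            # Step d: hold new responses until the fence is cleared.
            self.held.append((name, conn, statsn, fenced))
        elif fenced:
            self.pending = (name, conn, statsn)
            self._try_send_fenced()
        else:
            self._send(name, conn, statsn)

    def ack(self, conn, statsn):
        """Initiator acknowledged every StatSN up to statsn on conn."""
        self.unacked = {(c, s) for (c, s) in self.unacked
                        if c != conn or s > statsn}
        if self.pending is not None:
            self._try_send_fenced()
        elif (self.fence_key is not None and self.fence_key[0] == conn
              and self.fence_key[1] <= statsn):
            # Step c: the fenced response itself is acknowledged.
            self.fence_key = None
            while self.held and not self.fence_active():
                self.submit(*self.held.popleft())

    def _send(self, name, conn, statsn):
        self.wire.append(name)
        self.unacked.add((conn, statsn))

    def _try_send_fenced(self):
        # Step b: send the fenced response only once nothing older is unacked.
        if not self.unacked:
            name, conn, statsn = self.pending
            self.pending = None
            self._send(name, conn, statsn)
            self.fence_key = (conn, statsn)
```

As a usage illustration: after two ordinary responses are submitted on
connections 1 and 2, a fenced TMF response stays off the wire until
both earlier StatSNs are acknowledged, and a later ordinary response
stays queued until the TMF response itself is acknowledged. A real
target would additionally solicit acknowledgements (for example with a
Nop-In) rather than wait passively.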
4 Task Management

4.1 Requests Affecting Multiple Tasks

This section clarifies and updates the original text in section
10.6.2 of [RFC3720]. The clarified semantics (section 4.1.2)
are a superset of the protocol behavior required in the original
text and all iSCSI implementations MUST support the new
behavior. The updated semantics (section 4.1.3), on the other
hand, are mandatory only when the new key FastMultiTaskAbort
(section 8.1) is negotiated to "Yes".

4.1.1 Scope of affected tasks

This section defines the notion of "affected tasks" in
multi-task abort scenarios. Scope definitions in this section
apply to both the clarified protocol behavior (section 4.1.2)
and the updated protocol behavior (section 4.1.3).

   ABORT TASK SET: All outstanding tasks for the I_T_L nexus
   identified by the LUN field in the ABORT TASK SET TMF
   Request PDU.

   CLEAR TASK SET: All outstanding tasks in the task set for
   the LU identified by the LUN field in the CLEAR TASK SET
   TMF Request PDU. See [SPC3] for the definition of a "task
   set".

   LOGICAL UNIT RESET: All outstanding tasks from all
   initiators for the LU identified by the LUN field in the
   LOGICAL UNIT RESET Request PDU.

   TARGET WARM RESET/TARGET COLD RESET: All outstanding tasks
   from all initiators across all LUs that the TMF-issuing
   session has access to on the SCSI target device hosting the
   iSCSI session.

Usage: an "ABORT TASK SET TMF Request PDU" in the preceding text
is an iSCSI TMF Request PDU with the "Function" field set to
"ABORT TASK SET" as defined in [RFC3720]. Similar usage is
employed for other scope descriptions.

4.1.2 Clarified multi-task abort semantics

All iSCSI implementations MUST support the protocol behavior
defined in this section as the default behavior. The execution
of ABORT TASK SET, CLEAR TASK SET, LOGICAL UNIT RESET, TARGET
WARM RESET, and TARGET COLD RESET TMF Requests consists of the
following sequence of actions in the specified order on the
specified party.

The initiator iSCSI layer:

a. MUST continue to respond to each TTT received for the
   affected tasks.

b. Should receive any responses that the target may provide
   for some tasks among the affected tasks (may process them
   as usual because they are guaranteed to have
   chronologically originated prior to the TMF response).

c. Should receive the TMF Response concluding all the tasks in
   the set of affected tasks.

The target iSCSI layer:

a. MUST wait for all currently valid target transfer tags of
   the affected tasks to be responded to.

b. MUST wait (concurrent with the wait in Step.a) for all
   commands of the affected tasks to be received based on the
   CmdSN ordering. SHOULD NOT wait for new commands on
   third-party affected sessions - only the instantiated tasks
   have to be considered for the purpose of determining the
   affected tasks. In the case of target-scoped requests
   (i.e. TARGET WARM RESET and TARGET COLD RESET), however, all
   the commands that are not yet received on the issuing
   session in the command stream can be considered to have
   been received with no command waiting period - i.e. the
   entire CmdSN space up to the CmdSN of the task management
   function can be "plugged".

c. MUST propagate the TMF request to and receive the response
   from the target SCSI layer.

d. MUST address the Response Fence flag on the TMF Response on
   the issuing session as defined in 3.3.2.

e. MUST address the Response Fence flag on the first post-TMF
   Response on third-party sessions as defined in 3.3.2.
   If some tasks originate from non-iSCSI I_T_L nexuses then the
   means by which the target ensures that all affected tasks
   have returned their status to the initiator are defined by
   the specific non-iSCSI transport protocol(s).

4.1.3 Updated multi-task abort semantics

Protocol behavior defined in this section MUST be implemented by
all iSCSI implementations complying with this document.
Protocol behavior defined in this section MUST be exhibited by
iSCSI implementations on an iSCSI session when they negotiate
the FastMultiTaskAbort (section 8.1) key to "Yes" on that
session. The execution of ABORT TASK SET, CLEAR TASK SET,
LOGICAL UNIT RESET, TARGET WARM RESET, and TARGET COLD RESET TMF
Requests consists of the following sequence of actions in the
specified order on the specified party.

The initiator iSCSI layer:

a. MUST NOT send any more Data-Out PDUs for affected tasks on
   the issuing connection of the issuing iSCSI session once
   the TMF is sent to the target.

b. Should receive any responses that the target may provide
   for some tasks among the affected tasks (may process them
   as usual because they are guaranteed to have
   chronologically originated prior to the TMF response).

c. MUST respond to Async Message PDU with AsyncEvent=5 as
   defined in section 7.1.

d. Should receive the TMF Response concluding all the tasks in
   the set of affected tasks.

The target iSCSI layer:

a. MUST wait for all commands of the affected tasks to be
   received based on the CmdSN ordering on the issuing
   session. SHOULD NOT wait for new commands on third-party
   affected sessions - only the instantiated tasks have to be
   considered for the purpose of determining the affected
   tasks. In the case of target-scoped requests (i.e. TARGET
   WARM RESET and TARGET COLD RESET), however, all the commands
   that are not yet received on the issuing session in the
   command stream can be considered to have been received with
   no command waiting period - i.e. the entire CmdSN space up
   to the CmdSN of the task management function can be
   "plugged".

b. MUST propagate the TMF request to and receive the response
   from the target SCSI layer.

c. MUST leave all active "affected TTTs" (i.e. active TTTs
   associated with affected tasks) valid along with any buffer
   allocations for the TTTs intact.

d. MUST generate an Asynchronous Message PDU with AsyncEvent=5
   (section 7.1) on:
     i) each connection of each third-party session that at
        least one affected task is allegiant to, and
    ii) each connection except the issuing connection of the
        issuing session that has at least one allegiant
        affected task.

   If there are multiple affected LUs (say due to a target
   reset), then one Async Message PDU MUST be sent for each
   such LU on each connection that has at least one allegiant
   affected task.

e. MUST address the Response Fence flag on the TMF Response on
   the issuing session as defined in 3.3.2.

f. MUST address the Response Fence flag on the first post-TMF
   Response on third-party sessions as defined in 3.3.2. If
   some tasks originate from non-iSCSI I_T_L nexuses then the
   means by which the target ensures that all affected tasks
   have returned their status to the initiator are defined by
   the specific non-iSCSI transport protocol(s).

g. MUST free up the affected TTTs (and STags, if applicable)
   and the corresponding buffers once it receives the
   associated Nop-Out acknowledgement that the initiator
   generated in response to the Async Message.

Implementation note: Technically, the TMF servicing is
complete in Step.e.
Data transfers corresponding to terminated 599 tasks may however still be in progress even at the end of 600 Step.f. In the case of iSCSI/iSER, these transfers would be 601 into tagged buffers with STags not owned by any active tasks. 602 Step.g specifies an event to free up the resources. A target 603 may, on an implementation-defined internal timeout, also 604 choose to drop the connections on which it did not receive the 606 expected Nop-Out acknowledgements so as to reclaim the 607 associated buffer, STag and TTT resources as appropriate. 609 4.1.3.1 Clearing effects update 611 Appendix F.1 of [RFC3720] specifies the clearing effects of 612 target and LU resets on "Incomplete TTTs" as "Y". This meant 613 that a target warm reset or a target cold reset or an LU reset 614 would clear the active TTTs upon completion. The 615 FastMultiTaskAbort semantics defined by this section however do 616 not guarantee that the active TTTs are cleared by the end of the 617 reset operations. In fact, the new semantics are designed to 618 allow clearing the TTTs in a "lazy" fashion after the TMF 619 Response is delivered. Thus, when FastMultiTaskAbort=Yes is 620 operational on a session, the clearing effects of reset 621 operations on "Incomplete TTTs" is "N". 623 4.1.4 Rationale behind the new semantics 625 There are fundamentally three basic objectives behind the 626 semantics specified in section 4.1.2 and section 4.1.3. 628 1. Maintaining an ordered command flow I_T nexus abstraction 629 to the target SCSI layer even with multi-connection 630 sessions. 632 o Target iSCSI processing of a TMF request must maintain 633 the single flow illusion. Target behavior in Step.b 634 of section 4.1.2 and Step.a of section 4.1.3 635 correspond to this objective. 637 2. Maintaining a single ordered response flow I_T nexus 638 abstraction to the initiator SCSI layer even with multi- 639 connection sessions when one response (i.e. 
TMF Response) 640 could imply the status of other unfinished tasks from the 641 initiator's perspective. 643 o Target must ensure that the initiator does not see 644 "old" task responses (that were placed on the wire 645 chronologically earlier than the TMF Response) after 646 seeing the TMF Response. Target behavior in Step.d of 647 section 4.1.2 and Step.e of section 4.1.3 correspond 648 to this objective. 650 o Whenever the result of a TMF action is visible across 651 multiple I_T_L nexuses, [SAM2] requires the SCSI 652 device server to trigger a UA on each of the other 654 I_T_L nexuses. Once an initiator is notified of such 655 a UA, the application client on the receiving 656 initiator is required to clear its task state (clause 657 5.5 in [SAM2]) for the affected tasks. It would thus 658 be inappropriate to deliver a SCSI Response for a task 659 after the task state is cleared on the initiator, i.e. 660 after the UA is notified. The UA notification 661 contained in the first SCSI Response PDU on each 662 affected third-party I_T_L nexus after the TMF action 663 thus MUST NOT pass the affected task responses on any 664 of the iSCSI sessions accessing the LU. Target 665 behavior in Step.e of section 4.1.2 and Step.f of 666 section 4.1.3 correspond to this objective. 668 3. Draining all active TTTs corresponding to affected tasks 669 in a deterministic fashion. 671 o Data-out PDUs with stale TTTs arriving after the tasks 672 are terminated can create a buffer management problem 673 even for traditional iSCSI implementations, and are 674 fatal for the connection for iSCSI/iSER 675 implementations. Either the termination of affected 676 tasks should be postponed until the TTTs are retired 677 (as in Step.a of section 4.1.2), or the TTTs and the 678 buffers should stay allocated beyond task termination 679 to be deterministically freed up later (as in Step.c 680 and Step.g of section 4.1.3). 682 The only other notable optimization is the plugging.
If all 683 tasks on an I_T nexus will be aborted anyway (as with a target 684 reset), there is no need to wait to receive all commands to plug 685 the CmdSN holes. The target iSCSI layer can simply plug all missing 686 CmdSN slots and move on with TMF processing. The first 687 objective (maintaining a single ordered command flow) is still 688 met with this optimization because the target SCSI layer only sees 689 ordered commands. 691 5 Discovery semantics 693 5.1 Error Recovery for Discovery Sessions 695 The negotiation of the key ErrorRecoveryLevel is not required 696 for Discovery sessions (i.e. for sessions that negotiated 697 "SessionType=Discovery") because the default value of 0 is 698 necessary and sufficient for Discovery sessions. It is however 699 possible that some legacy iSCSI implementations might attempt to 700 negotiate the ErrorRecoveryLevel key on Discovery sessions. 701 When such a negotiation attempt is made by the remote side, a 702 compliant iSCSI implementation MUST propose a value of 0 (zero) 703 in response. The operational ErrorRecoveryLevel for Discovery 704 sessions thus MUST be 0. This naturally follows from the 705 functionality constraints [RFC3720] imposes on Discovery 706 sessions. 708 5.2 Reinstatement Semantics of Discovery Sessions 710 Discovery sessions are intended to be relatively short-lived. 711 Initiators are not expected to establish multiple Discovery 712 sessions to the same iSCSI Network Portal (see [RFC3720]). An 713 initiator may use the same iSCSI Initiator Name and ISID when 714 establishing different unique sessions with different targets 715 and/or different portal groups. This behavior is discussed in 716 Section 9.1.1 of [RFC3720] and is, in fact, encouraged as 717 conservative reuse of ISIDs. The ISID RULE in [RFC3720] states that 718 there must not be more than one session with a matching 4-tuple: 719 <InitiatorName, ISID, TargetName, TargetPortalGroupTag>.
While 720 the spirit of the ISID RULE applies to Discovery sessions the 721 same as it does for Normal sessions, note that some Discovery 722 sessions differ from the Normal sessions in two important 723 aspects: 725 o Because [RFC3720] allows a Discovery session to be 726 established without specifying a TargetName key in the 727 Login Request PDU (let us call such a session an "Unnamed" 728 Discovery session), there is no Target Node context to 729 enforce the ISID RULE. 731 o Portal Groups are defined only in the context of a Target 732 Node. When the TargetName key is NULL-valued (i.e. not 733 specified), the TargetPortalGroupTag thus cannot be 734 ascertained to enforce the ISID RULE. 736 The following sections describe the two scenarios, Named 737 Discovery sessions and Unnamed Discovery sessions, separately. 739 5.2.1 Unnamed Discovery Sessions 741 For Unnamed Discovery sessions, neither the TargetName nor the 742 TargetPortalGroupTag is available to the targets in order to 743 enforce the ISID RULE. So the following rule applies. 745 UNNAMED ISID RULE: Targets MUST enforce the uniqueness of the 746 following 4-tuple for Unnamed Discovery sessions: 747 <InitiatorName, ISID, NULL, TargetAddress>. The following 748 semantics are implied by this uniqueness requirement. 750 Targets SHOULD allow concurrent establishment of one Discovery 751 session with each of its Network Portals by the same initiator 752 port with a given iSCSI Node Name and an ISID. Each of the 753 concurrent Discovery sessions, if established by the same 754 initiator port to other Network Portals, MUST be treated as 755 independent sessions, i.e. one session MUST NOT reinstate the 756 other. 758 A new Unnamed Discovery session that has a matching 759 <InitiatorName, ISID, NULL, TargetAddress> to an existing 760 Discovery session MUST reinstate the existing Unnamed Discovery 761 session. Note thus that only an Unnamed Discovery session may 762 reinstate an Unnamed Discovery session.
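As a rough illustration of the UNNAMED ISID RULE, the following Python sketch keeps one table of Unnamed Discovery sessions per target. The key layout (initiator name, ISID, a null target name, and the address of the Network Portal on which the login arrived), the class and method names, and the example addresses are all assumptions made for this sketch; none of them is defined by [RFC3720] or by this document.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumed key: (InitiatorName, ISID, TargetName=None, portal address).
SessionKey = Tuple[str, bytes, None, str]


@dataclass
class DiscoverySession:
    initiator_name: str
    isid: bytes
    portal: str          # Network Portal the login arrived on (hypothetical format)
    active: bool = True


class UnnamedDiscoveryTable:
    """Tracks Unnamed Discovery sessions on one target (illustrative only)."""

    def __init__(self) -> None:
        self._sessions: Dict[SessionKey, DiscoverySession] = {}

    def login(
        self, initiator_name: str, isid: bytes, portal: str
    ) -> Tuple[DiscoverySession, Optional[DiscoverySession]]:
        """Admit a new session; return it plus the session it reinstated, if any."""
        key: SessionKey = (initiator_name, isid, None, portal)
        old = self._sessions.get(key)
        if old is not None:
            # Matching 4-tuple: the new session reinstates the existing one.
            old.active = False
        new = DiscoverySession(initiator_name, isid, portal)
        self._sessions[key] = new
        return new, old
```

With this table, a login from the same initiator name and ISID on a second Network Portal yields an independent session, while a repeated login on the same portal marks the earlier session as reinstated.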
764 5.2.2 Named Discovery Sessions 766 For a Named Discovery session, the TargetName key is specified 767 by the initiator and thus the target can unambiguously ascertain 768 the TargetPortalGroupTag as well. Since all the four elements 769 of the 4-tuple are known, the ISID RULE MUST be enforced by 770 targets with no changes from [RFC3720] semantics. A new session 771 with a matching <InitiatorName, ISID, TargetName, TargetPortalGroupTag> thus will reinstate an existing session. 773 Note in this case that any new iSCSI session (Discovery or 774 Normal) with the matching 4-tuple may reinstate an existing 775 Named Discovery iSCSI session. 777 5.3 TPGT Values 779 SAM-2 and SAM-3 specifications incorrectly note in their 780 informative text that the TPGT value should be non-zero, although 781 [RFC3720] allows the value of zero for TPGT. This section 782 clarifies that a value of zero is expressly allowed as a legal value 783 for TPGT. A future revision of SAM will be corrected to address 784 this discrepancy. 786 5.4 Session type negotiation 788 During the Login phase, the SessionType key is offered by the 789 initiator to choose the type of session it wants to create with 790 the target. The target may accept or reject the offer. 791 Depending on the type of the session, a target may decide on 792 resources to allocate and the security to enforce etc. for the 793 session. If the SessionType key is thus going to be offered as 794 "Discovery", it SHOULD be offered in the initial Login request 795 by the initiator. 797 6 iSCSI Error Handling and Recovery 799 6.1 ITT 801 Section 10.19 in [RFC3720] mentions this in passing, but it is noted 802 here again to make it obvious since the semantics apply to 803 the initiators in general. An ITT value of 0xffffffff is 804 reserved and MUST NOT be assigned for a task by the initiator. 805 The only instance in which it may be seen on the wire is in a target- 806 initiated NOP-In PDU (and in the initiator response to that PDU 807 if necessary).
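The reserved-ITT rule can be sketched as a small allocator on the initiator side. This is only an illustration: the class name IttAllocator, its methods, and the set used to track outstanding tags are hypothetical and are not part of [RFC3720] or of this document.

```python
RESERVED_ITT = 0xFFFFFFFF  # reserved value; never assigned to a task


class IttAllocator:
    """Hands out 32-bit Initiator Task Tags, skipping the reserved value
    and any tag that is still outstanding (hypothetical helper)."""

    def __init__(self, start: int = 0) -> None:
        self._next = start & 0xFFFFFFFF
        self._in_use = set()

    def allocate(self) -> int:
        # Every value except the reserved one is usable, so the tag space
        # is exhausted once 0xFFFFFFFF tags are outstanding.
        if len(self._in_use) >= 0xFFFFFFFF:
            raise RuntimeError("no free ITT values")
        while True:
            itt = self._next
            self._next = (self._next + 1) & 0xFFFFFFFF  # wrap at 32 bits
            if itt != RESERVED_ITT and itt not in self._in_use:
                self._in_use.add(itt)
                return itt

    def release(self, itt: int) -> None:
        """Return a tag to the pool once its task has completed."""
        self._in_use.discard(itt)
```

An allocator started just below the top of the range hands out 0xfffffffe, skips the reserved value, and continues from 0 after wrapping.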
809 6.2 Format Errors 811 Section 6.6 of [RFC3720] discusses format error handling. This 812 section elaborates on the "inconsistent" PDU field contents 813 noted in [RFC3720]. 815 All initiator-detected PDU construction errors MUST be 816 considered as format errors. Some examples of such errors are: 818 - NOP-In with a valid TTT but an invalid LUN 820 - NOP-In with a valid ITT (i.e. a NOP-In response) and also a 821 valid TTT 823 - SCSI Response PDU with Status=CHECK CONDITION, but 824 DataSegmentLength = 0 826 6.3 Digest Errors 828 Section 6.7 of [RFC3720] discusses digest error handling. It 829 states that "No further action is necessary for initiators if the discarded 830 PDU is an unsolicited PDU (e.g., Async, Reject)" on detecting a 831 payload digest error. This is incorrect. 833 An Asynchronous Message PDU or a Reject PDU carries the next 834 StatSN value on an iSCSI connection, advancing the StatSN. When 835 an initiator discards one of these PDUs due to a payload digest 836 error, the entire PDU including the header MUST be discarded. 837 Consequently, the initiator MUST treat the exception like a loss 838 of any other solicited response PDU, i.e. it MUST use one of 839 the following options noted in [RFC3720]: 841 a) Request PDU retransmission with a status SNACK. 843 b) Logout the connection for recovery and continue the 844 tasks on a different connection instance. 846 c) Logout to close the connection (abort all the commands 847 associated with the connection). 849 7 iSCSI PDUs 851 7.1 Asynchronous Message 853 This section defines additional semantics for the Asynchronous 854 Message PDU defined in section 10.9 of [RFC3720] using the same 855 conventions. 857 The following new legal value for AsyncEvent is defined: 859 5: all active tasks for the LU with matching LUN field in the Async 860 Message PDU are being terminated. 862 The receiving initiator iSCSI layer MUST respond to this Message by 863 taking the following steps in order.
865 i) Stop Data-Out transfers on that connection for all active 866 TTTs for the affected LUN quoted in the Async Message 867 PDU. 868 ii) Acknowledge the StatSN of the Async Message PDU via a 869 Nop-Out PDU with ITT=0xffffffff (i.e. non-ping flavor), 870 while copying the LUN field from the Async Message to the Nop- 871 Out. 873 8 Login/Text Operational Text Keys 875 This section follows the same conventions as section 12 of 876 [RFC3720]. 878 8.1 FastMultiTaskAbort 880 Use: LO 881 Senders: Initiator and Target 882 Scope: SW 884 Irrelevant when: SessionType=Discovery 885 FastMultiTaskAbort=<boolean-value> 887 Default is No. 888 Result function is AND. 890 This key is used to negotiate the updated fast multi-task abort 891 semantics defined in section 4.1.3. By negotiating this key to 892 "Yes", an initiator and a target agree that the new semantics 893 MUST be used in the multi-task TMF handling situations. The 894 default is to use the [RFC3720] TMF semantics as clarified in 895 section 4.1.2. 897 9 Security Considerations 899 This document does not introduce any new security considerations 900 other than those already noted in [RFC3720]. Consequently, all 901 the iSCSI-related security text in [RFC3723] is also directly 902 applicable to this document. 904 10 IANA Considerations 906 This draft does not have any specific IANA considerations other 907 than those already noted in [RFC3720]. 909 11 References and Bibliography 911 11.1 Normative References 913 [RFC3720] Satran, J., Meth, K., Sapuntzakis, C., Chadalapaka, 914 M., and E. Zeidner, "Internet Small Computer Systems 915 Interface (iSCSI)", RFC 3720, April 2004. 917 [RFC3722] Bakke, M., "String Profile for Internet Small 918 Computer Systems Interface (iSCSI) Names", RFC 3722, April 919 2004. 921 [RFC3723] Aboba, B., Tseng, J., Walker, J., Rangan, V., and 922 F. Travostino, "Securing Block Storage Protocols over IP", 923 RFC 3723, April 2004. 925 [SPC3] T10/1416-D, SCSI Primary Commands-3.
927 11.2 Informative References 929 [RFC3721] Bakke, M., Hafner, J., Hufferd, J., Voruganti, K., 930 and M. Krueger, "Internet Small Computer Systems Interface 931 (iSCSI) Naming and Discovery", RFC 3721, April 2004. 933 [iSER] Ko, M., Chadalapaka, M., Elzur, U., Shah, H., Thaler, 934 P., and J. Hufferd, "iSCSI Extensions for RDMA", IETF 935 Internet Draft draft-ietf-ips-iser-04.txt (work in 936 progress), June 2005. 938 [RFC2119] Bradner, S., "Key Words for use in RFCs to Indicate 939 Requirement Levels", BCP 14, RFC 2119, March 1997. 941 [SAM2] ANSI X3.366-2003, SCSI Architecture Model-2 (SAM-2). 943 12 Editor's Address 945 Mallikarjun Chadalapaka 946 Hewlett-Packard Company 947 8000 Foothills Blvd. 948 Roseville, CA 95747-5668, USA 949 Phone: +1-916-785-5621 950 E-mail: cbm@rose.hp.com 952 13 Acknowledgements 954 The IP Storage (ips) Working Group in the Transport Area of 955 IETF has been responsible for defining the iSCSI protocol 956 (apart from a host of other relevant IP Storage protocols). 957 The editor acknowledges the contributions of the entire 958 working group. 960 The following individuals directly contributed to identifying 961 [RFC3720] issues and/or suggesting resolutions to the issues 962 clarified in this document: David Black (REPORT LUNS/overflow 963 semantics, ACA semantics), Gwendal Grignou (TMF scope), Mike 964 Ko (digest error handling for Asynchronous Message), Dmitry 965 Fomichev (reserved ITT), Bill Studenmund (residual handling, 966 discovery semantics), Ken Sandars (discovery semantics), Bob 967 Russell (discovery semantics), Julian Satran (discovery 968 semantics), Rob Elliott (T10 liaison, R2T ordering), Joseph 969 Pittman (TMF scope), Somesh Gupta (multi-task abort 970 semantics). This document benefited from all these 971 contributions. 973 14 Full Copyright Statement 975 Copyright (C) The Internet Society (2006).
This document is 976 subject to the rights, licenses and restrictions contained in 977 BCP 78, and except as set forth therein, the authors retain 978 all their rights. 980 This document and the information contained herein are 981 provided on an "AS IS" basis and THE CONTRIBUTOR, THE 982 ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), 983 THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE 984 DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT 985 NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 986 HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES 987 OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 989 15 Intellectual Property Statement 991 The IETF takes no position regarding the validity or scope of 992 any Intellectual Property Rights or other rights that might 993 be claimed to pertain to the implementation or use of the 994 technology described in this document or the extent to which 995 any license under such rights might or might not be 996 available; nor does it represent that it has made any 997 independent effort to identify any such rights. Information 998 on the procedures with respect to rights in RFC documents can 999 be found in BCP 78 and BCP 79. 1001 Copies of IPR disclosures made to the IETF Secretariat and 1002 any assurances of licenses to be made available, or the 1003 result of an attempt made to obtain a general license or 1004 permission for the use of such proprietary rights by 1005 implementers or users of this specification can be obtained 1006 from the IETF on-line IPR repository at 1007 http://www.ietf.org/ipr. 1009 The IETF invites any interested party to bring to its 1010 attention any copyrights, patents or patent applications, 1011 or other proprietary rights that may cover technology that 1012 may be required to implement this standard. Please address 1013 the information to the IETF at ietf-ipr@ietf.org.