idnits 2.17.1 draft-ietf-ips-iscsi-impl-guide-03.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 15. -- Found old boilerplate from RFC 3978, Section 5.5 on line 1017. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1029. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1037. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1043. ** This document has an original RFC 3978 Section 5.4 Copyright Line, instead of the newer IETF Trust Copyright according to RFC 4748. ** This document has an original RFC 3978 Section 5.5 Disclaimer, instead of the newer disclaimer which includes the IETF Trust according to RFC 4748. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- ** The document seems to lack a 1id_guidelines paragraph about 6 months document validity -- however, there's a paragraph with a matching beginning. Boilerplate error? == There are 15 instances of lines with non-ascii characters in the document. == No 'Intended status' indicated for this document; assuming Proposed Standard Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There is 1 instance of too long lines in the document, the longest one being 3 characters in excess of 72. -- The abstract seems to indicate that this document updates RFC3720, but the header doesn't have an 'Updates:' line to match this. 
Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the RFC 3978 Section 5.4 Copyright Line does not match the current year == Line 1023 has weird spacing: '... be clai...' == Line 1032 has weird spacing: '... any ass...' == Line 1033 has weird spacing: '...t of an atte...' == Line 1035 has weird spacing: '...of this spec...' == Line 1040 has weird spacing: '...tention any...' == (2 more instances...) == The document seems to lack the recommended RFC 2119 boilerplate, even if it appears to use RFC 2119 keywords. (The document does seem to have the reference to RFC 2119 which the ID-Checklist requires). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (March 2007) is 6245 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) == Missing Reference: 'RFC 3720' is mentioned on line 284, but not defined ** Obsolete undefined reference: RFC 3720 (Obsoleted by RFC 7143) == Unused Reference: 'RFC2119' is defined on line 968, but no explicit reference was found in the text ** Obsolete normative reference: RFC 3720 (Obsoleted by RFC 7143) -- Possible downref: Non-RFC (?) normative reference: ref. 'SPC3' Summary: 7 errors (**), 0 flaws (~~), 12 warnings (==), 9 comments (--). 
Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 INTERNET DRAFT Mallikarjun Chadalapaka 3 draft-ietf-ips-iscsi-impl-guide-03.txt Hewlett-Packard Co. 4 Editor 6 Expires March 2007 8 iSCSI Implementer's Guide 10 Status of this Memo 11 By submitting this Internet-Draft, each author represents 12 that any applicable patent or other IPR claims of which he or 13 she is aware have been or will be disclosed, and any of which 14 he or she becomes aware will be disclosed, in accordance with 15 Section 6 of BCP 79. 17 Internet-Drafts are working documents of the Internet 18 Engineering Task Force (IETF), its areas, and its working 19 groups. Note that other groups may also distribute working 20 documents as Internet-Drafts. 22 Internet-Drafts are draft documents valid for a maximum of 23 six months and may be updated, replaced, or obsoleted by 24 other documents at any time. It is inappropriate to use 25 Internet-Drafts as reference material or to cite them other 26 than a "work in progress." 28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/1id-abstracts.html 31 The list of Internet-Draft Shadow Directories can be accessed 32 at http://www.ietf.org/shadow.html. 34 Abstract 35 iSCSI is a SCSI transport protocol and maps the SCSI family 36 of application protocols onto TCP/IP. RFC 3720 defines the 37 iSCSI protocol. This document compiles the clarifications to 38 the original protocol definition in RFC 3720 to serve as a 39 companion document for the iSCSI implementers. This document 40 updates RFC 3720 and the text in this document supersedes the 41 text in RFC 3720 when the two differ. 
43 Table of Contents 45 1 Definitions and acronyms ...............................3 46 1.1 Definitions ............................................3 47 1.2 Acronyms ...............................................3 48 2 Introduction ...........................................5 49 3 iSCSI semantics for SCSI tasks .........................6 50 3.1 Residual handling ......................................6 51 3.1.1 Overview..............................................6 52 3.1.2 SCSI REPORT LUNS and Residual Overflow................7 53 3.2 R2T Ordering ...........................................8 54 3.3 SCSI Protocol Interface Model for Response Ordering ....8 55 3.3.1 Model Description.....................................9 56 3.3.2 iSCSI Semantics with the Interface Model..............9 57 3.3.3 Current List of Fenced Response Use Cases............10 58 4 Task Management .......................................12 59 4.1 Requests Affecting Multiple Tasks .....................12 60 4.1.1 Scope of affected tasks..............................12 61 4.1.2 Clarified multi-task abort semantics.................12 62 4.1.3 Updated multi-task abort semantics...................14 63 4.1.4 Rationale behind the new semantics...................16 64 5 Discovery semantics ...................................18 65 5.1 Error Recovery for Discovery Sessions .................18 66 5.2 Reinstatement Semantics of Discovery Sessions .........18 67 5.2.1 Unnamed Discovery Sessions...........................19 68 5.2.2 Named Discovery Sessions.............................19 69 6 Negotiation and Others ................................20 70 6.1 TPGT Values ...........................................20 71 6.2 Session type negotiation ..............................20 72 6.3 Understanding NotUnderstood ...........................20 73 7 iSCSI Error Handling and Recovery .....................22 74 7.1 ITT ...................................................22 75 7.2 Format Errors 
.........................................22 76 7.3 Digest Errors .........................................22 77 8 iSCSI PDUs ............................................24 78 8.1 Asynchronous Message ..................................24 79 9 Login/Text Operational Text Keys ......................25 80 9.1 FastMultiTaskAbort ....................................25 81 10 Security Considerations ...............................26 82 11 IANA Considerations ...................................27 83 12 References and Bibliography ...........................28 84 12.1 Normative References.................................28 85 12.2 Informative References...............................28 86 13 Editor's Address ......................................29 87 14 Acknowledgements ......................................30 88 15 Full Copyright Statement ..............................31 89 16 Intellectual Property Statement .......................32 91 1 Definitions and acronyms 93 1.1 Definitions 95 I/O Buffer - A buffer that is used in a SCSI Read or Write 96 operation so SCSI data may be sent from or received into 97 that buffer. For a read or write data transfer to take 98 place for a task, an I/O Buffer is required on the 99 initiator and at least one is required on the target. 101 SCSI-Presented Data Transfer Length (SPDTL): SPDTL is the 102 aggregate data length of the data that SCSI layer 103 logically "presents" to iSCSI layer for a Data-in or 104 Data-out transfer in the context of a SCSI task. For a 105 bidirectional task, there are two SPDTL values - one for 106 Data-in and one for Data-out. Note that the notion of 107 "presenting" includes immediate data per the data 108 transfer model in [SAM2], and excludes overlapping data 109 transfers, if any, requested by the SCSI layer. 
111 Third-party: A term used in this document to denote nexus 112 objects (I_T or I_T_L) and iSCSI sessions which reap the 113 side-effects of actions that took place in the context of a 114 separate iSCSI session, while being third parties to the 115 action that caused the side-effects. One example of a 116 Third-party session is an iSCSI session hosting an I_T_L 117 nexus to an LU that is reset with an LU Reset TMF via a 118 separate I_T nexus.

120 1.2 Acronyms

122 Acronym   Definition
124 -------------------------------------------------------------
126 EDTL      Expected Data Transfer Length
128 IANA      Internet Assigned Numbers Authority
130 IETF      Internet Engineering Task Force
132 I/O       Input - Output
134 IP        Internet Protocol
136 iSCSI     Internet SCSI
138 iSER      iSCSI Extensions for RDMA
140 ITT       Initiator Task Tag
142 LO        Leading Only
144 LU        Logical Unit
146 LUN       Logical Unit Number
148 PDU       Protocol Data Unit
150 RDMA      Remote Direct Memory Access
152 R2T       Ready To Transfer
154 R2TSN     Ready To Transfer Sequence Number
156 RFC       Request For Comments
158 SAM       SCSI Architecture Model
160 SCSI      Small Computer Systems Interface
162 SN        Sequence Number
164 SNACK     Selective Negative Acknowledgment - also
166           Sequence Number Acknowledgement for data
168 TCP       Transmission Control Protocol
170 TMF       Task Management Function
172 TTT       Target Transfer Tag
174 UA        Unit Attention

176 2 Introduction 178 Several iSCSI implementations have been built since [RFC3720] was 179 published, and the iSCSI community is now richer for the resulting 180 implementation expertise. The goal of this document is to 181 leverage this expertise both to offer clarifications to the 182 [RFC3720] semantics and to address defects in [RFC3720] as 183 appropriate. This document intends to offer critical guidance 184 to implementers with regard to non-obvious iSCSI implementation 185 aspects so as to improve interoperability and accelerate iSCSI 186 adoption. 
This document, however, does not purport to be an 187 all-encompassing iSCSI how-to guide for implementers, nor a 188 complete revision of [RFC3720]. This document instead is 189 intended as a companion document to [RFC3720] for the iSCSI 190 implementers. 192 iSCSI implementers are required to reference [RFC3722] and 193 [RFC3723] in addition to [RFC3720] for mandatory requirements. 194 [RFC3721] also contains useful information for 195 iSCSI implementers. The text in this document, however, updates 196 and supersedes the text in all the noted RFCs whenever the 197 texts differ. 199 3 iSCSI semantics for SCSI tasks 201 3.1 Residual handling 203 Section 10.4.1 of [RFC3720] defines the notion of "residuals" 204 and specifies how the residual information should be encoded 205 into the SCSI Response PDU in Counts and Flags fields. Section 206 3.1.1 clarifies the intent of [RFC3720] and explains the general 207 principles. Section 3.1.2 describes the residual handling in 208 the REPORT LUNS scenario. 210 3.1.1 Overview 212 SCSI-Presented Data Transfer Length (SPDTL) is the term this 213 document uses (see section 1.1 for definition) to represent the 214 aggregate data length that the target SCSI layer attempts to 215 transfer using the local iSCSI layer for a task. Expected Data 216 Transfer Length (EDTL) is the iSCSI term that represents the 217 length of data that iSCSI layer expects to transfer for a task. 218 EDTL is specified in the SCSI Command PDU. 220 When SPDTL = EDTL for a task, the target iSCSI layer completes 221 the task with no residuals. Whenever SPDTL differs from EDTL 222 for a task, that task is said to have a residual. 224 If SPDTL > EDTL for a task, iSCSI Overflow MUST be signaled in 225 the SCSI Response PDU as specified in [RFC3720]. Residual Count 226 MUST be set to the numerical value of (SPDTL - EDTL). 228 If SPDTL < EDTL for a task, iSCSI Underflow MUST be signaled in 229 the SCSI Response PDU as specified in [RFC3720]. 
Residual Count 230 MUST be set to the numerical value of (EDTL - SPDTL). 232 Note that the Overflow and Underflow scenarios are independent 233 of Data-in and Data-out. Either scenario is logically possible 234 in either direction of data transfer. 236 3.1.2 SCSI REPORT LUNS and Residual Overflow 238 The specification of the SCSI REPORT LUNS command requires that 239 the SCSI target limit the amount of data transferred to a 240 maximum size (ALLOCATION LENGTH) provided by the initiator in 241 the REPORT LUNS CDB. If the Expected Data Transfer Length 242 (EDTL) in the iSCSI header of the SCSI Command PDU for a REPORT 243 LUNS command is set to a value at least as large as that ALLOCATION 244 LENGTH, the SCSI layer truncation prevents an iSCSI Residual 245 Overflow from occurring. A SCSI initiator can detect that such 246 truncation has occurred via other information at the SCSI layer. 247 The rest of the section elaborates on this required behavior. 249 iSCSI uses the (O) bit (bit 5) in the Flags field of the SCSI 250 Response and the last SCSI Data-In PDUs to indicate that an 251 iSCSI target was unable to transfer all of the SCSI data for a 252 command to the initiator because the amount of data to be 253 transferred exceeded the EDTL in the corresponding SCSI Command 254 PDU (see Section 10.4.1 of [RFC3720]). 256 The SCSI REPORT LUNS command requests a target SCSI layer to 257 return a logical unit inventory (LUN list) to the initiator SCSI 258 layer (see section 6.21 of SPC-3 [SPC3]). The size of this LUN 259 list may not be known to the initiator SCSI layer when it issues 260 the REPORT LUNS command; to avoid transfer of more LUN list data 261 than the initiator is prepared for, the REPORT LUNS CDB contains 262 an ALLOCATION LENGTH field to specify the maximum amount of data 263 to be transferred to the initiator for this command. 
If the 264 initiator SCSI layer has under-estimated the number of logical 265 units at the target, it is possible that the complete logical 266 unit inventory does not fit in the specified ALLOCATION LENGTH. 267 In this situation, section 4.3.3.6 in [SPC3] requires that the 268 target SCSI layer "shall terminate transfers to the Data-In 269 Buffer" when the number of bytes specified by the ALLOCATION 270 LENGTH field has been transferred. 272 Therefore, in response to a REPORT LUNS command, the SCSI layer 273 at the target presents at most ALLOCATION LENGTH bytes of data 274 (logical unit inventory) to iSCSI for transfer to the initiator. 275 For a REPORT LUNS command, if the iSCSI EDTL is at least as 276 large as the ALLOCATION LENGTH, the SCSI truncation ensures that 277 the EDTL will accommodate all of the data to be transferred. If 278 all of the logical unit inventory data presented to the iSCSI 279 layer - i.e. the data remaining after any SCSI truncation - is 280 transferred to the initiator by the iSCSI layer, an iSCSI 281 Residual Overflow has not occurred and the iSCSI (O) bit MUST 282 NOT be set in the SCSI Response or final SCSI Data-In PDU. 283 This is not a new requirement but is already required by the 284 combination of [RFC 3720] with the specification of the REPORT 285 LUNS command in [SPC3]. Note, however, that if the iSCSI EDTL is 286 larger than the ALLOCATION LENGTH in this scenario, iSCSI 287 Underflow MUST be signaled in the SCSI Response PDU. An iSCSI 288 Underflow MUST also be signaled when the iSCSI EDTL is equal to 289 ALLOCATION LENGTH but the logical unit inventory data presented 290 to the iSCSI layer is smaller than ALLOCATION LENGTH. 
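The residual rules of sections 3.1.1 and 3.1.2 can be sketched as a small calculation. This is only an illustrative model, not text from [RFC3720] or [SPC3]: the names Residual, compute_residual and report_luns_spdtl are hypothetical, lengths are plain byte counts, and only a single (unidirectional) transfer direction is modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Residual:
    """Hypothetical model of the residual fields in a SCSI Response PDU."""
    overflow: bool       # models the (O) bit
    underflow: bool      # models the (U) bit
    residual_count: int  # models the Residual Count field


def compute_residual(spdtl: int, edtl: int) -> Residual:
    """Apply the section 3.1.1 rules to one transfer direction of a task."""
    if spdtl < 0 or edtl < 0:
        raise ValueError("lengths must be non-negative")
    if spdtl > edtl:
        # SCSI presented more data than iSCSI expected: Overflow.
        return Residual(True, False, spdtl - edtl)
    if spdtl < edtl:
        # SCSI presented less data than iSCSI expected: Underflow.
        return Residual(False, True, edtl - spdtl)
    return Residual(False, False, 0)


def report_luns_spdtl(inventory_length: int, allocation_length: int) -> int:
    """SPDTL for REPORT LUNS: the SCSI layer truncates the logical unit
    inventory to ALLOCATION LENGTH before presenting it to iSCSI."""
    return min(inventory_length, allocation_length)


# Inventory larger than ALLOCATION LENGTH, EDTL equal to ALLOCATION LENGTH:
# the SCSI truncation means there is no iSCSI residual at all.
no_residual = compute_residual(report_luns_spdtl(4104, 1024), edtl=1024)

# EDTL larger than ALLOCATION LENGTH: Underflow is signaled.
underflow = compute_residual(report_luns_spdtl(4104, 1024), edtl=2048)
```

In this model a REPORT LUNS command can produce an Overflow only when the EDTL is smaller than the ALLOCATION LENGTH, which matches the condition stated in the text above.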
292 The LUN LIST LENGTH field in the logical unit inventory (first 293 field in the inventory) is not affected by truncation of the 294 inventory to fit in ALLOCATION LENGTH; this enables a SCSI 295 initiator to determine that the received inventory is incomplete 296 by noticing that the LUN LIST LENGTH in the inventory is larger 297 than the ALLOCATION LENGTH that was sent in the REPORT LUNS CDB. 298 A common initiator behavior in this situation is to re-issue the 299 REPORT LUNS command with a larger ALLOCATION LENGTH. 301 3.2 R2T Ordering 303 Section 10.8 in [RFC3720] says the following: 305 The target may send several R2T PDUs. It, therefore, can have 306 a number of pending data transfers. The number of outstanding 307 R2T PDUs are limited by the value of the negotiated key 308 MaxOutstandingR2T. Within a connection, outstanding R2Ts MUST 309 be fulfilled by the initiator in the order in which they were 310 received. 312 The quoted [RFC3720] text was unclear on the scope of 313 applicability - either per task, or across all tasks on a 314 connection - and may be interpreted as either. This section is 315 intended to clarify that the scope of applicability of the 316 quoted text is a task. No R2T ordering relationship - either in 317 generation at the target or in fulfilling at the initiator - 318 across tasks is implied. I.e., outstanding R2Ts within a task 319 MUST be fulfilled by the initiator in the order in which they 320 were received on a connection. 322 3.3 SCSI Protocol Interface Model for Response Ordering 324 Whenever an iSCSI session is composed of multiple connections, 325 the Response PDUs (task responses or TMF responses) originating 326 in the target SCSI layer are distributed onto the multiple 327 connections by the target iSCSI layer according to iSCSI 328 connection allegiance rules. This process generally may not 329 preserve the ordering of the responses by the time they are 330 delivered to the initiator SCSI layer. 
Since ordering is not 331 expected across SCSI responses anyway, this approach works fine 332 in the general case. However to address the special cases where 333 some ordering is desired by the SCSI layer, the following SCSI 334 protocol interface model is assumed. 336 3.3.1 Model Description 338 SCSI protocol layer instructs the SCSI transport layer of a 339 "Response Fence" associated with the response in question when 340 the "Send Command Complete" protocol data service (SAM-2, clause 341 5.4.2) and "Task Management Function Executed" (SAM-2, clause 342 6.9) service are invoked. The Response Fence flag instructs the 343 SCSI transport layer that the following conditions must be met 344 in delivering the response message: 346 (1) Response with Response Fence MUST chronologically be 347 delivered after all the "preceding" responses on the 348 I_T_L nexus, if the preceding responses are delivered at 349 all, to the application client on the initiator. 351 (2) Response with Response Fence MUST chronologically be 352 delivered prior to all the "following" responses on the 353 I_T_L nexus. 355 The "preceding" and "following" notions refer to the order of 356 hand-off of a response message from the target SCSI protocol 357 layer to the target SCSI transport (e.g. iSCSI) layer. 359 3.3.2 iSCSI Semantics with the Interface Model 361 The target iSCSI layer MUST do the following on sensing the 362 "Response Fence" flag associated with a response being handed 363 down from the target SCSI layer: 365 a) If it is a single-connection session, no special processing 366 is required. Standard SCSI Response PDU build process 367 happens. 369 b) If it is a multi-connection session, target iSCSI layer 370 takes note of last-sent and unacknowledged StatSN on each 371 of the connections in the iSCSI session, and waits for 372 acknowledgement (may solicit for acknowledgement by way of 373 a Nop-In) of each such StatSN to clear the fence. 
SCSI 374 response with the Response Fence flag must be sent to the 375 initiator only after receiving acknowledgements for each of 376 the unacknowledged StatSNs. 378 c) Target iSCSI layer must wait for an acknowledgement of the 379 SCSI Response PDU that carried the response which the 380 target SCSI layer marked with the Response Fence flag. The 381 fence must be considered cleared after receiving the 382 acknowledgement. 384 d) All further status processing for the LU is resumed only 385 after clearing the fence. If any new responses for the 386 I_T_L nexus are received from the SCSI layer before the 387 fence is cleared, those Response PDUs must be held and 388 queued at the iSCSI layer until the fence is cleared. 390 3.3.3 Current List of Fenced Response Use Cases 392 This section lists the fenced response use cases that iSCSI 393 implementations must comply with. However, this is not an 394 exhaustive enumeration. It is expected that as SCSI protocol 395 specifications evolve, the specifications will specify when 396 response fencing is required on a case-by-case basis. 398 Response Fence flag MUST be assumed set by the target SCSI layer 399 on the following SCSI completion messages handed down to the 400 target iSCSI layer: 402 1. The first completion message carrying the UA after the 403 multi-task abort on issuing and third-party sessions. 405 2. The TMF Response carrying the multi-task TMF Response on the 406 issuing session. 408 3. The completion message indicating ACA establishment on the 409 issuing session. 411 4. The first completion message carrying the ACA ACTIVE status 412 after ACA establishment on issuing and third-party 413 sessions. 415 5. The TMF Response carrying the Clear ACA response on the 416 issuing session. 418 Note: Due to the absence of ACA-related fencing requirements in 419 [RFC3720], initiator implementations SHOULD NOT use ACA on 420 multi-connection iSCSI sessions to targets complying only with 421 [RFC3720], i.e. 
those not complying with this document. 422 Initiators may assess target compliance to this document via 423 negotiating for FastMultiTaskAbort (section 9.1) key. 425 4 Task Management 427 4.1 Requests Affecting Multiple Tasks 429 This section clarifies and updates the original text in section 430 10.6.2 of [RFC3720]. The clarified semantics (section 4.1.2) 431 are a superset of the protocol behavior required in the original 432 text and all iSCSI implementations MUST support the new 433 behavior. The updated semantics (section 4.1.3) on the other 434 hand are mandatory only when the new key FastMultiTaskAbort 435 (section 9.1) is negotiated to "Yes". 437 4.1.1 Scope of affected tasks 439 This section defines the notion of "affected tasks" in multi- 440 task abort scenarios. Scope definitions in this section apply 441 to both the clarified protocol behavior (section 4.1.2) and the 442 updated protocol behavior (section 4.1.3). 444 ABORT TASK SET: All outstanding tasks for the I_T_L nexus 445 identified by the LUN field in the ABORT TASK SET TMF 446 Request PDU. 448 CLEAR TASK SET: All outstanding tasks in the task set for 449 the LU identified by the LUN field in the CLEAR TASK SET 450 TMF Request PDU. See [SPC3] for the definition of a "task 451 set". 453 LOGICAL UNIT RESET: All outstanding tasks from all 454 initiators for the LU identified by the LUN field in the 455 LOGICAL UNIT RESET Request PDU. 457 TARGET WARM RESET/TARGET COLD RESET: All outstanding tasks 458 from all initiators across all LUs that the TMF-issuing 459 session has access to on the SCSI target device hosting the 460 iSCSI session. 462 Usage: an "ABORT TASK SET TMF Request PDU" in the preceding text 463 is an iSCSI TMF Request PDU with the "Function" field set to 464 "ABORT TASK SET" as defined in [RFC3720]. Similar usage is 465 employed for other scope descriptions. 
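The scope definitions in section 4.1.1 can be read as a filter over the set of outstanding tasks. The following is a minimal sketch under stated assumptions, not a definitive implementation: Tmf, Task, affected_tasks and all field names are hypothetical, a task set is modeled as an opaque identifier (see [SPC3] for the real definition), and accessible_luns stands in for the LUs that the TMF-issuing session has access to.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import FrozenSet, Iterable, List


class Tmf(Enum):
    ABORT_TASK_SET = auto()
    CLEAR_TASK_SET = auto()
    LOGICAL_UNIT_RESET = auto()
    TARGET_WARM_RESET = auto()
    TARGET_COLD_RESET = auto()


@dataclass(frozen=True)
class Task:
    nexus: str     # hypothetical I_T nexus identifier
    lun: int       # LU the task was issued to
    task_set: str  # hypothetical task set identifier


def affected_tasks(tmf: Tmf, tasks: Iterable[Task], issuer_nexus: str,
                   lun: int, issuer_task_set: str,
                   accessible_luns: FrozenSet[int]) -> List[Task]:
    """Select the outstanding tasks in scope for a multi-task TMF."""
    if tmf is Tmf.ABORT_TASK_SET:
        # Only the issuing I_T_L nexus.
        return [t for t in tasks if t.nexus == issuer_nexus and t.lun == lun]
    if tmf is Tmf.CLEAR_TASK_SET:
        # Every task in the task set for the LU, regardless of initiator.
        return [t for t in tasks
                if t.lun == lun and t.task_set == issuer_task_set]
    if tmf is Tmf.LOGICAL_UNIT_RESET:
        # All tasks from all initiators for the LU.
        return [t for t in tasks if t.lun == lun]
    # TARGET WARM RESET / TARGET COLD RESET: all tasks from all initiators
    # on every LU the issuing session can access.
    return [t for t in tasks if t.lun in accessible_luns]


outstanding = [
    Task("A", 0, "ts0"), Task("B", 0, "ts0"),
    Task("A", 1, "ts1"), Task("B", 2, "ts2"),
]
aborted = affected_tasks(Tmf.ABORT_TASK_SET, outstanding, "A", 0, "ts0",
                         frozenset({0, 1}))
```

For the target-scoped functions the lun and issuer_task_set arguments are ignored in this sketch; a real implementation would take the scope from the TMF Request PDU and from its own access-control state.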
467 4.1.2 Clarified multi-task abort semantics 469 All iSCSI implementations MUST support the protocol behavior 470 defined in this section as the default behavior. The execution 471 of ABORT TASK SET, CLEAR TASK SET, LOGICAL UNIT RESET, TARGET 472 WARM RESET, and TARGET COLD RESET TMF Requests consists of the 473 following sequence of actions in the specified order on the 474 specified party. 476 The initiator iSCSI layer: 478 a. MUST continue to respond to each TTT received for the 479 affected tasks. 481 b. Should receive any responses that the target may provide 482 for some tasks among the affected tasks (may process them 483 as usual because they are guaranteed to have 484 chronologically originated prior to the TMF response). 486 c. Should receive the TMF Response concluding all the tasks in 487 the set of affected tasks. 489 The target iSCSI layer: 491 a. MUST wait for all currently valid target transfer tags of 492 the affected tasks to be responded to. 494 b. MUST wait (concurrent with the wait in Step.a) for all 495 commands of the affected tasks to be received based on the 496 CmdSN ordering. SHOULD NOT wait for new commands on 497 third-party affected sessions - only the instantiated tasks 498 have to be considered for the purpose of determining the 499 affected tasks. In the case of target-scoped requests 500 (i.e. TARGET WARM RESET and TARGET COLD RESET), all the 501 commands that are not yet received on the issuing session 502 in the command stream however can be considered to have 503 been received with no command waiting period - i.e. the 504 entire CmdSN space up to the CmdSN of the task management 505 function can be "plugged". 507 c. MUST propagate the TMF request to and receive the response 508 from the target SCSI layer. 510 d. MUST address the Response Fence flag on the TMF Response on 511 issuing session as defined in 3.3.2. 513 e. 
MUST address the Response Fence flag on the first post-TMF 514 Response on third-party sessions as defined in 3.3.2. If 515 some tasks originate from non-iSCSI I_T_L nexuses then the 516 means by which the target ensures that all affected tasks 517 have returned their status to the initiator are defined by 518 the specific non-iSCSI transport protocol(s). 520 4.1.3 Updated multi-task abort semantics 522 Protocol behavior defined in this section MUST be implemented by 523 all iSCSI implementations complying with this document. 524 Protocol behavior defined in this section MUST be exhibited by 525 iSCSI implementations on an iSCSI session when they negotiate 526 the FastMultiTaskAbort (section 9.1) key to "Yes" on that 527 session. The execution of ABORT TASK SET, CLEAR TASK SET, 528 LOGICAL UNIT RESET, TARGET WARM RESET, and TARGET COLD RESET TMF 529 Requests consists of the following sequence of actions in the 530 specified order on the specified party. 532 The initiator iSCSI layer: 534 a. MUST NOT send any more Data-Out PDUs for affected tasks on 535 the issuing connection of the issuing iSCSI session once 536 the TMF is sent to the target. 538 b. Should receive any responses that the target may provide 539 for some tasks among the affected tasks (may process them 540 as usual because they are guaranteed to have 541 chronologically originated prior to the TMF response). 543 c. MUST respond to Async Message PDU with AsyncEvent=5 as 544 defined in section 8.1. 546 d. Should receive the TMF Response concluding all the tasks in 547 the set of affected tasks. 549 The target iSCSI layer: 551 a. MUST wait for all commands of the affected tasks to be 552 received based on the CmdSN ordering on the issuing 553 session. SHOULD NOT wait for new commands on third-party 554 affected sessions - only the instantiated tasks have to be 555 considered for the purpose of determining the affected 556 tasks. In the case of target-scoped requests (i.e. 
TARGET 557 WARM RESET and TARGET COLD RESET), all the commands that 558 are not yet received on the issuing session in the command 559 stream however can be considered to have been received with 560 no command waiting period - i.e. the entire CmdSN space up 561 to the CmdSN of the task management function can be 562 "plugged". 564 b. MUST propagate the TMF request to and receive the response 565 from the target SCSI layer. 567 c. MUST leave all active "affected TTTs" (i.e. active TTTs 568 associated with affected tasks) valid, and leave any buffer 569 allocations for those TTTs intact. 571 d. MUST generate an Asynchronous Message PDU with AsyncEvent=5 572 (section 8.1) on: 573 i) each connection of each third-party session that at 574 least one affected task is allegiant to, and 575 ii) each connection except the issuing connection of the 576 issuing session that has at least one allegiant affected 577 task. 579 If there are multiple affected LUs (say due to a target 580 reset), then one Async Message PDU MUST be sent for each 581 such LU on each connection that has at least one allegiant 582 affected task. 584 e. MUST address the Response Fence flag on the TMF Response on 585 issuing session as defined in 3.3.2. 587 f. MUST address the Response Fence flag on the first post-TMF 588 Response on third-party sessions as defined in 3.3.2. If 589 some tasks originate from non-iSCSI I_T_L nexuses then the 590 means by which the target ensures that all affected tasks 591 have returned their status to the initiator are defined by 592 the specific non-iSCSI transport protocol(s). 594 g. MUST free up the affected TTTs (and STags, if applicable) 595 and the corresponding buffers once it receives the 596 associated Nop-Out acknowledgement that the initiator 597 generated in response to the Async Message. 599 Implementation note: Technically, the TMF servicing is 600 complete in Step.e. 
Data transfers corresponding to terminated 601 tasks may however still be in progress even at the end of 602 Step.f. In the case of iSCSI/iSER, these transfers would be 603 into tagged buffers with STags not owned by any active tasks. 604 Step.g specifies an event to free up the resources. A target 605 may, on an implementation-defined internal timeout, also 606 choose to drop the connections on which it did not receive the 608 expected Nop-Out acknowledgements so as to reclaim the 609 associated buffer, STag and TTT resources as appropriate. 611 4.1.3.1 Clearing effects update 613 Appendix F.1 of [RFC3720] specifies the clearing effects of 614 target and LU resets on "Incomplete TTTs" as "Y". This meant 615 that a target warm reset or a target cold reset or an LU reset 616 would clear the active TTTs upon completion. The 617 FastMultiTaskAbort semantics defined by this section however do 618 not guarantee that the active TTTs are cleared by the end of the 619 reset operations. In fact, the new semantics are designed to 620 allow clearing the TTTs in a "lazy" fashion after the TMF 621 Response is delivered. Thus, when FastMultiTaskAbort=Yes is 622 operational on a session, the clearing effects of reset 623 operations on "Incomplete TTTs" is "N". 625 4.1.4 Rationale behind the new semantics 627 There are fundamentally three basic objectives behind the 628 semantics specified in section 4.1.2 and section 4.1.3. 630 1. Maintaining an ordered command flow I_T nexus abstraction 631 to the target SCSI layer even with multi-connection 632 sessions. 634 o Target iSCSI processing of a TMF request must maintain 635 the single flow illusion. Target behavior in Step.b 636 of section 4.1.2 and Step.a of section 4.1.3 637 correspond to this objective. 639 2. Maintaining a single ordered response flow I_T nexus 640 abstraction to the initiator SCSI layer even with multi- 641 connection sessions when one response (i.e. 
         TMF response) could imply the status of other unfinished
         tasks from the initiator's perspective.

           o Target must ensure that the initiator does not see
             "old" task responses (that were placed on the wire
             chronologically earlier than the TMF Response) after
             seeing the TMF response.  Target behavior in Step.d of
             section 4.1.2 and Step.e of section 4.1.3 correspond
             to this objective.

           o Whenever the result of a TMF action is visible across
             multiple I_T_L nexuses, [SAM2] requires the SCSI
             device server to trigger a UA on each of the other
             I_T_L nexuses.  Once an initiator is notified of such
             a UA, the application client on the receiving
             initiator is required to clear its task state (clause
             5.5 in [SAM2]) for the affected tasks.  It would thus
             be inappropriate to deliver a SCSI Response for a task
             after the task state is cleared on the initiator, i.e.
             after the UA is notified.  The UA notification
             contained in the first SCSI Response PDU on each
             affected Third-party I_T_L nexus after the TMF action
             thus MUST NOT pass the affected task responses on any
             of the iSCSI sessions accessing the LU.  Target
             behavior in Step.e of section 4.1.2 and Step.f of
             section 4.1.3 correspond to this objective.

      3. Draining all active TTTs corresponding to affected tasks
         in a deterministic fashion.

           o Data-out PDUs with stale TTTs arriving after the tasks
             are terminated can create a buffer management problem
             even for traditional iSCSI implementations, and are
             fatal for the connection for iSCSI/iSER
             implementations.  Either the termination of affected
             tasks should be postponed until the TTTs are retired
             (as in Step.a of section 4.1.2), or the TTTs and the
             buffers should stay allocated beyond task termination
             to be deterministically freed up later (as in Step.c
             and Step.g of section 4.1.3).

   The only other notable optimization is the plugging.
   If all tasks on an I_T nexus will be aborted anyway (as with a
   target reset), there is no need to wait to receive all commands
   to plug the CmdSN holes.  Target iSCSI layer can simply plug all
   missing CmdSN slots and move on with TMF processing.  The first
   objective (maintaining a single ordered command flow) is still
   met with this optimization because target SCSI layer only sees
   ordered commands.

5 Discovery semantics

5.1 Error Recovery for Discovery Sessions

   The negotiation of the key ErrorRecoveryLevel is not required
   for Discovery sessions - i.e. for sessions that negotiated
   "SessionType=Discovery" - because the default value of 0 is
   necessary and sufficient for Discovery sessions.  It is however
   possible that some legacy iSCSI implementations might attempt to
   negotiate the ErrorRecoveryLevel key on Discovery sessions.
   When such a negotiation attempt is made by the remote side, a
   compliant iSCSI implementation MUST propose a value of 0 (zero)
   in response.  The operational ErrorRecoveryLevel for Discovery
   sessions thus MUST be 0.  This naturally follows from the
   functionality constraints [RFC3720] imposes on Discovery
   sessions.

5.2 Reinstatement Semantics of Discovery Sessions

   Discovery sessions are intended to be relatively short-lived.
   Initiators are not expected to establish multiple Discovery
   sessions to the same iSCSI Network Portal (see [RFC3720]).  An
   initiator may use the same iSCSI Initiator Name and ISID when
   establishing different unique sessions with different targets
   and/or different portal groups.  This behavior is discussed in
   Section 9.1.1 of [RFC3720] and is, in fact, encouraged as
   conservative reuse of ISIDs.  ISID RULE in [RFC3720] states that
   there must not be more than one session with a matching 4-tuple:
   <InitiatorName, ISID, TargetName, TargetPortalGroupTag>.
   While the spirit of the ISID RULE applies to Discovery sessions
   the same as it does for Normal sessions, note that some
   Discovery sessions differ from the Normal sessions in two
   important aspects:

      - Because [RFC3720] allows a Discovery session to be
        established without specifying a TargetName key in the
        Login Request PDU (let us call such a session an "Unnamed"
        Discovery session), there is no Target Node context to
        enforce the ISID RULE.

      - Portal Groups are defined only in the context of a Target
        Node.  When the TargetName key is NULL-valued (i.e. not
        specified), the TargetPortalGroupTag thus cannot be
        ascertained to enforce the ISID RULE.

   The following sections describe the two scenarios - Named
   Discovery sessions and Unnamed Discovery sessions - separately.

5.2.1 Unnamed Discovery Sessions

   For Unnamed Discovery sessions, neither the TargetName nor the
   TargetPortalGroupTag is available to the targets in order to
   enforce the ISID RULE.  So the following rule applies.

   UNNAMED ISID RULE: Targets MUST enforce the uniqueness of the
   following 4-tuple for Unnamed Discovery sessions:
   <InitiatorName, ISID, NULL, TargetAddress>.  The following
   semantics are implied by this uniqueness requirement.

   Targets SHOULD allow concurrent establishment of one Discovery
   session with each of its Network Portals by the same initiator
   port with a given iSCSI Node Name and an ISID.  Each of the
   concurrent Discovery sessions, if established by the same
   initiator port to other Network Portals, MUST be treated as
   independent sessions - i.e. one session MUST NOT reinstate the
   other.

   A new Unnamed Discovery session that has a matching
   <InitiatorName, ISID, NULL, TargetAddress> to an existing
   discovery session MUST reinstate the existing Unnamed Discovery
   session.  Note thus that only an Unnamed Discovery session may
   reinstate an Unnamed Discovery session.
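
   The reinstatement behavior for Unnamed Discovery sessions can be
   illustrated with a short sketch.  The Python below is not part of
   this specification.  It assumes that the uniqueness key is made up
   of the initiator name, the ISID, a NULL target name and the address
   of the Network Portal on which the login arrived.  The names
   SessionKey, DiscoverySessionTable and login are hypothetical, and
   the portal address is modeled as a plain string.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SessionKey:
    """Hypothetical lookup key for an Unnamed Discovery session."""
    initiator_name: str
    isid: int
    target_name: Optional[str]  # always None (NULL) for Unnamed sessions
    target_address: str         # Network Portal the login arrived on


class DiscoverySessionTable:
    """Illustrative table of Unnamed Discovery sessions on one target."""

    def __init__(self) -> None:
        self._sessions: Dict[SessionKey, int] = {}
        self._next_id = 1

    def login(self, initiator_name: str, isid: int,
              portal: str) -> Tuple[int, Optional[int]]:
        """Admit a login; return (new session id, reinstated id or None)."""
        key = SessionKey(initiator_name, isid, None, portal)
        # A matching key means the existing session is reinstated
        # (replaced); a different portal yields an independent session.
        reinstated = self._sessions.pop(key, None)
        new_id = self._next_id
        self._next_id += 1
        self._sessions[key] = new_id
        return new_id, reinstated

    def active(self) -> int:
        """Number of Unnamed Discovery sessions currently tracked."""
        return len(self._sessions)


table = DiscoverySessionTable()
first = table.login("iqn.2007-03.example:host1", 0x1234, "192.0.2.1:3260")
# Same initiator name and ISID on another portal: independent session.
second = table.login("iqn.2007-03.example:host1", 0x1234, "192.0.2.2:3260")
# Same key as the first login: the first session is reinstated.
third = table.login("iqn.2007-03.example:host1", 0x1234, "192.0.2.1:3260")
```

   In this sketch the third login replaces the first session, while
   the session on the second portal is left untouched.  A real target
   would also have to tear down the connections and state of the
   reinstated session, which is omitted here.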

5.2.2 Named Discovery Sessions

   For a Named Discovery session, the TargetName key is specified
   by the initiator and thus the target can unambiguously ascertain
   the TargetPortalGroupTag as well.  Since all the four elements
   of the 4-tuple are known, the ISID RULE MUST be enforced by
   targets with no changes from [RFC3720] semantics.  A new session
   with a matching <InitiatorName, ISID, TargetName,
   TargetPortalGroupTag> thus will reinstate an existing session.
   Note in this case that any new iSCSI session (Discovery or
   Normal) with the matching 4-tuple may reinstate an existing
   Named Discovery iSCSI session.

6 Negotiation and Others

6.1 TPGT Values

   SAM-2 and SAM-3 specifications incorrectly note in their
   informative text that the TPGT value should be non-zero,
   although [RFC3720] allows the value of zero for TPGT.  This
   section clarifies that the value zero is expressly allowed as a
   legal value for TPGT.  A future revision of SAM will be
   corrected to address this discrepancy.

6.2 Session type negotiation

   During the Login phase, the SessionType key is offered by the
   initiator to choose the type of session it wants to create with
   the target.  The target may accept or reject the offer.
   Depending on the type of the session, a target may decide on
   the resources to allocate, the security to enforce and so on
   for the session.  If the SessionType key is thus going to be
   offered as "Discovery", it SHOULD be offered in the initial
   Login request by the initiator.

6.3 Understanding NotUnderstood

   [RFC3720] defines NotUnderstood as a valid answer during a
   negotiation text key exchange between two iSCSI nodes.
   NotUnderstood has the reserved meaning that the sending side did
   not understand the key semantics.  This section seeks to clarify
   that NotUnderstood is a valid answer for both declarative and
   negotiated keys.  The general iSCSI philosophy is that
   comprehension precedes processing for any iSCSI key.
   A proposer of an iSCSI key, negotiated or declarative, in a text
   key exchange MUST thus be able to properly handle a
   NotUnderstood response.

   The proper way to handle a NotUnderstood response varies
   depending on the lineage and type of the key.  All keys defined
   in [RFC3720] MUST be supported by all compliant implementations;
   a NotUnderstood answer on any of the [RFC3720] keys therefore
   MUST be considered a protocol error and handled accordingly.
   For all other later keys, a NotUnderstood answer concludes the
   negotiation for a negotiated key whereas for a declarative key,
   a NotUnderstood answer simply informs the declarer of lack of
   comprehension by the receiver.  In either case, a NotUnderstood
   answer always requires that the protocol behavior associated
   with that key not be used within the scope of the key
   (connection/session) by either side.

7 iSCSI Error Handling and Recovery

7.1 ITT

   Section 10.19 of [RFC3720] mentions this in passing, but it is
   noted here again to make it obvious, since the semantics apply
   to initiators in general.  An ITT value of 0xffffffff is
   reserved and MUST NOT be assigned for a task by the initiator.
   The only instance it may be seen on the wire is in a
   target-initiated NOP-In PDU (and in the initiator response to
   that PDU if necessary).

7.2 Format Errors

   Section 6.6 of [RFC3720] discusses format error handling.  This
   section elaborates on the "inconsistent" PDU field contents
   noted in [RFC3720].

   All initiator-detected PDU construction errors MUST be
   considered as format errors.  Some examples of such errors are:

      - NOP-In with a valid TTT but an invalid LUN

      - NOP-In with a valid ITT (i.e. a NOP-In response) and also a
        valid TTT

      - SCSI Response PDU with Status=CHECK CONDITION, but
        DataSegmentLength = 0

7.3 Digest Errors

   Section 6.7 of [RFC3720] discusses digest error handling.
   It states that "No further action is necessary for initiators if
   the discarded PDU is an unsolicited PDU (e.g., Async, Reject)"
   on detecting a payload digest error.  This is incorrect.

   An Asynchronous Message PDU or a Reject PDU carries the next
   StatSN value on an iSCSI connection, advancing the StatSN.  When
   an initiator discards one of these PDUs due to a payload digest
   error, the entire PDU including the header MUST be discarded.
   Consequently, the initiator MUST treat the exception like a loss
   of any other solicited response PDU - i.e. it MUST use one of
   the following options noted in [RFC3720]:

      a) Request PDU retransmission with a status SNACK.

      b) Logout the connection for recovery and continue the
         tasks on a different connection instance.

      c) Logout to close the connection (abort all the commands
         associated with the connection).

8 iSCSI PDUs

8.1 Asynchronous Message

   This section defines additional semantics for the Asynchronous
   Message PDU defined in section 10.9 of [RFC3720] using the same
   conventions.

   The following new legal value for AsyncEvent is defined:

      5: all active tasks for the LU with a matching LUN field in
         the Async Message PDU are being terminated.

   The receiving initiator iSCSI layer MUST respond to this Message
   by taking the following steps in order.

       i) Stop Data-Out transfers on that connection for all active
          TTTs for the affected LUN quoted in the Async Message
          PDU.
      ii) Acknowledge the StatSN of the Async Message PDU via a
          Nop-Out PDU with ITT=0xffffffff (i.e. non-ping flavor),
          while copying the LUN field from Async Message to
          Nop-Out.

9 Login/Text Operational Text Keys

   This section follows the same conventions as section 12 of
   [RFC3720].

9.1 FastMultiTaskAbort

   Use: LO
   Senders: Initiator and Target
   Scope: SW

   Irrelevant when: SessionType=Discovery

   FastMultiTaskAbort=<boolean-value>

   Default is No.
   Result function is AND.

   This key is used to negotiate the updated fast multi-task abort
   semantics defined in section 4.1.3.  By negotiating this key to
   "Yes", an initiator and a target agree that the new semantics
   MUST be used in the multi-task TMF handling situations.  The
   default is to use the [RFC3720] TMF semantics as clarified in
   section 4.1.2.

10 Security Considerations

   This document does not introduce any new security considerations
   other than those already noted in [RFC3720].  Consequently, all
   the iSCSI-related security text in [RFC3723] is also directly
   applicable to this document.

11 IANA Considerations

   This draft does not have any specific IANA considerations other
   than those already noted in [RFC3720].

12 References and Bibliography

12.1 Normative References

   [RFC3720] Satran, J., Meth, K., Sapuntzakis, C., Chadalapaka,
      M., and E. Zeidner, "Internet Small Computer Systems
      Interface (iSCSI)", RFC 3720, April 2004.

   [RFC3722] Bakke, M., "String Profile for Internet Small
      Computer Systems Interface (iSCSI) Names", RFC 3722, April
      2004.

   [RFC3723] Aboba, B., Tseng, J., Walker, J., Rangan, V., and
      F. Travostino, "Securing Block Storage Protocols over IP",
      RFC 3723, April 2004.

   [SPC3] T10/1416-D, SCSI Primary Commands-3.

12.2 Informative References

   [RFC3721] Bakke, M., Hafner, J., Hufferd, J., Voruganti, K.,
      and M. Krueger, "Internet Small Computer Systems Interface
      (iSCSI) Naming and Discovery", RFC 3721, April 2004.

   [iSER] Ko, M., Chadalapaka, M., Elzur, U., Shah, H., Thaler,
      P., and J. Hufferd, "iSCSI Extensions for RDMA", IETF
      Internet Draft draft-ietf-ips-iser-04.txt (work in
      progress), June 2005.

   [RFC2119] Bradner, S., "Key Words for use in RFCs to Indicate
      Requirement Levels", BCP 14, RFC 2119, March 1997.

   [SAM2] ANSI X3.366-2003, SCSI Architecture Model-2 (SAM-2).

13 Editor's Address

   Mallikarjun Chadalapaka
   Hewlett-Packard Company
   8000 Foothills Blvd.
   Roseville, CA 95747-5668, USA
   Phone: +1-916-785-5621
   E-mail: cbm@rose.hp.com

14 Acknowledgements

   The IP Storage (ips) Working Group in the Transport Area of
   IETF has been responsible for defining the iSCSI protocol
   (apart from a host of other relevant IP Storage protocols).
   The editor acknowledges the contributions of the entire
   working group.

   The following individuals directly contributed to identifying
   [RFC3720] issues and/or suggesting resolutions to the issues
   clarified in this document: David Black (REPORT LUNS/overflow
   semantics, ACA semantics), Gwendal Grignou (TMF scope), Mike
   Ko (digest error handling for Asynchronous Message), Dmitry
   Fomichev (reserved ITT), Bill Studenmund (residual handling,
   discovery semantics), Ken Sandars (discovery semantics), Bob
   Russell (discovery semantics), Julian Satran (discovery
   semantics), Rob Elliott (T10 liaison, R2T ordering), Joseph
   Pittman (TMF scope), Somesh Gupta (multi-task abort
   semantics).  This document benefited from all these
   contributions.

15 Full Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is
   subject to the rights, licenses and restrictions contained in
   BCP 78, and except as set forth therein, the authors retain
   all their rights.

   This document and the information contained herein are
   provided on an "AS IS" basis and THE CONTRIBUTOR, THE
   ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY),
   THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE
   DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT
   NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES
   OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

16 Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of
   any Intellectual Property Rights or other rights that might
   be claimed to pertain to the implementation or use of the
   technology described in this document or the extent to which
   any license under such rights might or might not be
   available; nor does it represent that it has made any
   independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can
   be found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and
   any assurances of licenses to be made available, or the
   result of an attempt made to obtain a general license or
   permission for the use of such proprietary rights by
   implementers or users of this specification can be obtained
   from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its
   attention any copyrights, patents or patent applications,
   or other proprietary rights that may cover technology that
   may be required to implement this standard.  Please address
   the information to the IETF at ietf-ipr@ietf.org.