iSCSI                                                        1-July-02

IPS                                          Julian Satran
Internet Draft                               Kalman Meth
draft-ietf-ips-iscsi-14.txt                  IBM
Category: standards-track
                                             Costa Sapuntzakis
                                             Cisco Systems

                                             Mallikarjun Chadalapaka
                                             Hewlett-Packard Co.

                                             Efri Zeidner
                                             SANGate

                                 iSCSI

Julian Satran            Expires February 2003

Status of this Memo

This document is an Internet-Draft and fully conforms to all provisions of Section 10 of [RFC2026]. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for at most six months and may be updated, replaced, or made obsolete by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them except as "work in progress."

The list of Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

The Small Computer Systems Interface (SCSI) is a popular family of protocols for communicating with I/O devices, especially storage devices. This document describes a transport protocol for SCSI that works on top of TCP. The iSCSI protocol aims to be fully compliant with the rules laid out in the SCSI Architecture Model - 2 [SAM2] document.

The current version of iSCSI is 0.

Acknowledgements

This protocol was developed by a design team that, besides the authors, included Daniel Smith, Ofer Biran, Jim Hafner and John Hufferd (IBM), Mark Bakke (Cisco), Randy Haagens (HP), Matt Wakeley (Agilent, now Sierra Logic), Luciano Dalle Ore (Quantum), Paul Von Stamwitz (Adaptec, now TrueSAN Networks).

Also, a large group of people contributed to this work through their review, comments and valuable insights. We are grateful to all of them.
We are especially grateful to those who found the time and patience to take part in our weekly phone conferences and intermediate meetings in Almaden and Haifa, so helping to shape this document: Prasenjit Sarkar, Meir Toledano, John Dowdy, Steve Legg, Alain Azagury (IBM), Dave Nagle (CMU), David Black (EMC), John Matze (Veritas - now with Okapi Software), Steve DeGroote, Mark Schrandt (NuSpeed), Gabi Hecht (Gadzoox), Robert Snively and Brian Forbes (Brocade), Nelson Nachum (StorAge), Uri Elzur (Broadcom). Many more helped clean and improve this document within the IPS working group. We are especially grateful to David Robinson and Raghavendra Rao (Sun), Charles Monia, Joshua Tseng (Nishan), Somesh Gupta (Silverback), Michael Krause, Pierre Labat, Santosh Rao, Matthew Burbridge, Bob Barry, Robert Elliott, Nick Martin (HP), Stephen Bailey (Sandburst), Steve Senum, Ayman Ghanem, Dave Peterson (Cisco), Barry Reinhold (Trebia Networks), Bob Russell (UNH), Eddy Quicksall (iVivity, Inc.), Bill Lynn and Michael Fischer (Adaptec), Vince Cavanna, Pat Thaler (Agilent), Jonathan Stone (Stanford), Luben Tuikov (Splentec), Paul Konig (?), Michael Krueger (Windriver), Martins Krikis (Intel), Doug Otis (Sanlight), Robert Griswold and Bill Moody (Crossroads), Yaron Klein (Sanrad). The recovery chapter was enhanced with help from Stephen Bailey (Sandburst), Somesh Gupta (Silverback) and Venkat Rangan (Rhapsody Networks). Eddy Quicksall contributed some examples and began the Definitions Section. Michael Fischer and Bob Barry started the Acronyms Section. Last, but not least, thanks to Ralph Weber for keeping us in line with T10 (SCSI) standardization.

We would like to thank Steve Hetzler for his unwavering support and for coming up with such a good name for the protocol, Micky Rodeh, Jai Menon, Clod Barrera and Andy Bechtolsheim for helping this work happen.
This document has to be considered together with the "Naming & Discovery" [NDT], "Boot" [BOOT] and "Securing iSCSI, iFCP and FCIP" [SEC-IPS] documents.

The "Naming & Discovery" document is authored by:

   Mark Bakke (Cisco), Jim Hafner, John Hufferd, Kaladhar Voruganti (IBM), Marjorie Krueger (Hewlett-Packard).

The "Boot" document is authored by:

   Prasenjit Sarkar (IBM), Duncan Missimer (HP) and Costa Sapuntzakis (Cisco).

The "Securing iSCSI, iFCP and FCIP" document is authored by:

   Bernard Aboba (Microsoft), Joshua Tseng (Nishan), Jesse Walker (Intel), Venkat Rangan (Rhapsody Networks), Franco Travostino (Nortel Networks).

We are grateful to all of them for their good work and for helping us correlate this document with the ones they produced.

Change Log

The following changes were made from draft-ietf-ips-iSCSI-13 to draft-ietf-ips-iSCSI-14:

- Text cleanup
- Clarification on COLD RESET - required by SAM
- fixed in 9.5 recommendation on empty data (was inconsistent with R2T)
- 9.4.6.2 text refers only to firstburstsize; changed error code to "incorrect amount of data"
- changed size to length everywhere
- Reinstated I bit in text request (typo)
- StatSN is retransmitted R2T should be the new value
- Fixed DefaultTime2Wait and changed selection function format in Section 11

The following changes were made from draft-ietf-ips-iSCSI-12 to draft-ietf-ips-iSCSI-13:

- Text cleanup
- Limited decimal encoding to 64 bit integers
- Logout Request reason code moved to byte 1
- Renamed MaxRecvPDULength to MaxRecvDataSegmentLength
- Large Numbers allowed only if explicitly stated
- CHAP is the mandatory-to-implement in-band authentication and SRP is optional
- A negotiation answer is permitted only if all key=value pairs are complete. A flag indicates completion.
- Clearing effects appendix simplified - SCSI effects are now part of [SPC3]
- Made explicit a rule about checking when committing a negotiation
- Added code 4 for Asynch Message - request negotiation

The following changes were made from draft-ietf-ips-iSCSI-11 to draft-ietf-ips-iSCSI-12:

- Clarify the use of A bit and DataACK at the end of data
- Clarified checking to be done for abort task and removed Referenced task tag from task management response
- Range separator is tilde.
- Fixed the paragraph numbering in the appendices.
- Clarified the expected target behavior in a lost F-bit scenario when responding to Abort Task Set/Clear Task Set.
- Added the TargetPortalGroupTag key as a Login/operational key, and its usage semantics were added to Section 4.3 Login Phase.
- Clarified the language in Section 6.1.2 Allegiance Reassignment and Section 6.2 Usage Of Reject PDU in Recovery.
- Clarified the states corresponding to full-feature phase operation in connection and session state diagrams in Chapter 5.
- Delivering all negotiated unsolicited data is mandatory
- Delivering all the data for an R2T is mandatory
- Added a timeout guidance section to Chapter 8
- Added normative naming text (previously in NDT)
- Clarified no duplicate parameter for login
- Added a minimum required to support to text length (16k/64k)
- Changed the name of TSID to TSIH to better reflect its meaning
- Security - IPsec transport mode is MAY and authentication MUST be used when encryption is used
- Added to logout a section clarifying the actions to be taken on task termination by the target
- Removed CRN
- Changed default time2wait & retain to better express typical ratio
- Changed SCSI port element separator to comma
- Async Event data format same as for SCSI response

The following changes were made from draft-ietf-ips-iSCSI-10 to draft-ietf-ips-iSCSI-11:

- ACA is SHOULD
- New format for ISID that allows factory presets
- New wording in section 9.5.4 that makes it clear that initiator must discard discontiguous data PDUs during reassignment.
- Removed Parameter1 field definition for "drop the session" Async Message.
- In state transitions chapter, added Logout timeout to the event set causing T17, and removed the "session close" event from the event set for T6. Changed "status class" to StatusClass.
- Clarified that for ErrorRecoveryLevel < 2, a restart Login PDU terminates all the tasks.
- Clarified the various subcases of interpretation for Time2Retain and Time2Wait in the Logout Response section.
- Added a new section in the recovery chapter on connection timeout management.
- The LogoutLoginMinTime and LogoutLoginMaxTime keys are respectively renamed to DefaultTime2Wait and DefaultTime2Retain, because they are used only on non-Logout events and also to better align with the notion of Time2Wait and Time2Retain that the draft already defines.
- Added the new Appendix on clearing effects.
- Retired the X-bit in Login PDU to make the bit position reserved. Moved the content under X-bit description to a new section 4.3.4 that describes "connection reinstatement".
- Added text to section 6.1.2 that clarifies the expectations on targets during allegiance reassignment.
- Minor changes in error recovery algorithms to change NextCmdSN to CmdSN in the Session data structure.
- Added a new section 4.3.5 defining the term "session reinstatement".
- Added a new transition N11 to target session state diagram, to address the session reinstatement event. Enhancing the event set for N3(T) and N6(I & T) for the same event. Adding the same event to the event sets for target transitions T8, T13, T15, T16, T17, T18, and M2 (I & T).
- Addressed the case of active TTTs when ABORT TASK SET/CLEAR TASK SET is in progress in section 9.5 and section 9.6.
- Added a new Section 9.6.2 Task Management actions on task sets that describes the exact timeline of events on a task set task management function.
- Clarified the usage of ITT for DataACK type of SNACK.
- Added error code for nonexistent session to login response
- Changed the FIM SHOULD to should(!)
- Added a TTT field for Data-In when A bit is 1 and to the corresponding SNACK. To make it consistent changed slightly the layout of Data-IN, SCSI Response and SNACK.
- Clarified the use of LUN with all PDUs holding TTT
- Removed the ?
  value from negotiations
- Unified text negotiations (login, ffp and formats) in one chapter
- Clarified AHSLength and DataLength for all PDUs
- Clarified use of Reject
- Replaced Protocol Error with Negotiation Failure in negotiations
- Removed FFP command before login from Reject Causes
- Added Invalid Request During Login to Login Errors
- Added tape text
- Clarified Security Text
- Aligned marker negotiations with the overall negotiations and added numeric range to the negotiation forms
- Changed target network architecture example in Overview
- Clarified T bit use in Login Reject
- Version back to 00

The following changes were made from draft-ietf-ips-iSCSI-09 to draft-ietf-ips-iSCSI-10:

- Clarifying MaxOutstandingR2T
- Widening the scope of Reject reason code 0x09 to mean "Invalid PDU field".
- Changes in the "iSCSI connection termination" section to make the terminology usage consistent with the rest of the draft.
- Adding transition T18 in standard connection state diagram, and its description.
- Other minor wording changes in the state transitions chapter to address "session close" case and others.
- Adding a new state Q5(IN_CONTINUE) to the target session state diagram to resolve transitions N8 and N9 off Q2.
- Removed the AHS drop bit feature.
- Removed the qualifier field in Task Management Response PDU, and added a new response "Function authorization failed".
- Clarified the fate of regular SCSI reservations on a session timeout, compared to a transient session failure.
- Added wording in R2T section to address the case of receiving a smaller write data sequence than was asked for in an R2T.
- Changes and fixes in recovery algorithms to be consistent with the rest of the draft.
- Changed the "Invalid SNACK" Reject reason code to "Invalid data ACK" because the invalid SNACK is already covered under "Protocol error". Also treating DataSN and R2TSN equivalently in this case.
- Change in the SNACK section to require a Reject "Protocol error" on an invalid SNACK.
- Time2Retain 0 in Logout Response indicates connection/session can't recover
- Coordinate DataSequenceInOrder with Error recovery level and MaxOutstandingR2T, also stating that only the last read/write sequence is recoverable under digest error recovery if DataSequenceInOrder=Yes
- Alias designation format appendix is again out(!) - T10 has decided it will go in SPC
- Task Management synchronization moved to the target (task management response given after task management action and confirmed delivery of all previous responses)
- Removed the don't care value in numerical negotiations
- Changed Marker negotiation to allow it to be closed in one round
- Marker position is not dependent on the length of the login phase
- Statement made that reserved bits do not have to be checked at the beginning of Chapter 9
- InitialR2T, BidiInitialR2T and ImmediateData changed to LO
- I bit (equivalent) in responses made 0
- Added a "double response" version for the ? key value to
- ?
  value can be used only outside Login
- added :, [ and ] as allowed in key values
- allow 0 in LogoutLoginMax and Min
- after task reassign no SNACK mandated, the function must be performed by target with information made available by reassign
- removed the third party command section - SCSI now handles everything needed (including iSCSI aliasing)

The following changes were made from draft-ietf-ips-iSCSI-08 to draft-ietf-ips-iSCSI-09:

- Added Task management response "task management function not supported"
- Negotiation (numeric) responder driven
- Added vendor specific data to reject
- Allow logout in discovery sessions
- Variable DataPDULength - renamed MaxRecvPDULength
- Key=value pairs can span PDU boundaries
- Uniform treatment of text exchange resets
- Reintroduced DataACK as a special form of SNACK
- Extended ISID in the Login Request
- Removed 0 as a "no limit value" (residue from mode pages)
- Reintroduced LogoutLoginMinTime
- Digests moved to Operational Keys
- Removed X bit in all commands and replaced it in Login and added a cleaning rule to CmdSN numbering
- Several simplifications in state transition section - standard connection and session state diagrams are separately described for initiators and targets
- Several minor technical and language changes in the error recovery section
- Added Irrelevant to negotiations
- Clarification to logout behavior
- Clarification to command ordering
- On SCSI timeout task abort instead of session failure
- Changed version to 0x03 - ALL VERSION NUMBERS are temporary up to "Rafting" (take them with a grain of salt)

The following changes were made from draft-ietf-ips-iSCSI-07 to draft-ietf-ips-iSCSI-08:

- Clarified the use of initiator task tag with regard to the SCSI tag in Section 9.2.1.7 Initiator Task Tag
- Added a clarification to Section 2.2.2.1 Command Numbering and Acknowledging - response to a command should not precede acknowledgment.
- Added clarification to Section 9.7 SCSI Data-out & SCSI Data-in - good status in Data-In must be supported by initiators
- Clarified InitiatorName is required at login in Section 4.3.1 Login Phase Start
- Another clarification for SecurityContextComplete in Section 4.3.2 iSCSI Security Negotiation
- Added "command not supported in this session type" to reject reasons
- Discovery session implies MaxConnections = 1
- Second appearance of TargetAddress deleted
- Padding forbidden for non-end-of-sequence data PDUs
- Removed Boot and Copenhagener Session types
- Changed explanation of ExpDataSN
- Removed/corrected response 05 in Section 9.4.3 Response
- Brought Section 2.2.6 iSCSI Names in line with NDT draft
- Fixed the syntax in accordance with [RFC2372] and [RFC2373]
- Removed forgotten references to the default iSCSI target
- Counters back to Reject Response
- Clarification - SendTargets admissible only in full feature phase
- Changed name of DataOrder and DataDeliveryOrder to DataSequenceOrder and DataPDUInOrder and clarified appendix text
- Padding bytes SHOULD be sent as 0 (instead of MUST be 0)
- UA attention behavior for various resets deleted - replaced with reference to SAM2
- Removed AccessID
- OpParmReset generalized
- Clarified the definition of full-feature phase in Section 2.2.4 iSCSI Full Feature Phase
- Added new Reject reason codes, tabular listing and a pointer to Section 9.14.4 Implicit termination of tasks
- Added additional Reject usage semantics on CmdSN and DataSN to Section 9.14.4 Implicit termination of tasks
- Added a new Logout Response code for failure
- Renamed BUSY as RECOVERY_START, removed RECOVERY_DONE, and merged T11 and T14 transitions into T11-(1,2) in Section 5 State Transitions.
- Corrected initiator handling of format errors
- Clarified usage of command replay
- Removed the delivery in same order as presented from Text Response
- Clarified RefCmdSN function for abort task
- Corrected length field for AHS of type Extended CDB
- Removed LUN from text management response
- Clarified F bit for Bidirectional commands
- Removed the Async iSCSI event "target reset"
- Removed wording in Section 9.6 Task Management Function Response linking SCSI mode pages to Async Messages
- Changed the ASC/ASCQ values to better mean "not enough unsolicited data"
- Names examples include date
- Removed references to S bit in Section 9.4 SCSI Response
- Fixed NOP to simplify and avoid it consuming CmdSN
- Fixed CRC and examples
- Added the T, CSG & NSG fields to Login Command & Response, rewrote Chapter 3, changed all examples in Appendix C. - Login Phase Examples - to fit the above changes
- Key=value confined to one response
- Add command restart/replay to task management
- Removed cryptographic digests
- Removed "proxy required" status code
- Re-named and fixed descriptions of status codes
- Re-formatted login examples for clarity
- SCSI/iSCSI parameters - fixed Section 3 SCSI Mode Parameters for iSCSI, out DataPDULength, DataSequenceOrder
- Changed all sense keys to aborted command in the table in Section 9.4.2 Status
- Rearranged requests to have all SCSI related grouped etc.
- Fixed Task Management Function Request ABORT TASK and removed the part about it in Chapter 8.
- Reintroduced aliases (the data format) in an appendix.
  The aliasing mechanism once part of iSCSI is part of [SPC]
- Login negotiations - using only login request response (instead of former login and text)
- F bit in login changed name to T bit
- Stated defaults for mode parameters in chapter 3
- Updated Chapter 7 to reflect the current consensus on security
- Changed all sense keys to aborted command in the table in 2.4.2
- Minor language clarifications in sections 1.2.3, 1.2.5, 1.2.6, 1.2.8.
- Added a new Reject reason code "Task in progress" and clarified language in the same section.
- Added more description to the session state transitions in Chapter 5.
- Several changes in Chapter 6 corresponding to the new task management function "reassign". Other language changes in Chapter 6 for better description. Format errors are mandated to cause session failures.
- Renamed the erstwhile error recovery levels as error recovery classes, and renamed "within-session" recovery to "connection recovery" to better reflect the mechanics.
- Added Section 6.13 Error Recovery Hierarchy to define the error recovery hierarchy.
- Modifications to error recovery algorithms in Appendix F.
- Added a new Reject reason code "Invalid SNACK", added DataSN to Reject PDU.
- Changed Section 9.17 Reject to use the "Invalid SNACK" reason code.
- Removed a Logout reason code in Section 9.14 Logout Request to be consistent with Section 9.9 Asynchronous Message.
- Collapsed the two event fields in Async Event and added vendor specific event
- Immediate data can be negotiated anytime (consistency)
- Removed replay as a protocol notion and all references to it
- SNACK RunLength 0 means all
- Cleaning the bookmark mechanism for text
- New T10 approved ASC/ASCQ codes
- Added an incipient definitions section - thanks to Eddy Quicksall
- Change OpParmReset from Yes/No to default/current
- Added Base64 to encode large strings
- The 255 limit for key values is now "unless specified otherwise"
- Cleaned SNACK format
- Removed ExpR2TSN from SCSI command response - it is too late
- MaxBurstSize/FirstBurstSize back as key=value
- Removed LogoutLoginMinTime (value provided in exchange)
- Clear language on component function in generating ISID/TSID
- Negotiation breaking is done through abort/reject
- Removed all iSCSI mode pages

The following changes were made from draft-ietf-ips-iSCSI-06 to draft-ietf-ips-iSCSI-07:

- Clarified the "fate" of immediate commands and resources mandated (1.2.2.1) and introduced a reject-code for rejected immediate commands
- Clarify CmdSN handling and checking order for ITT and CmdSN 1.2.2.1
- Added a statement to the effect that a receiver must be able to accept 0 length Data Segments to 2.7.6. Added also a statement to 2.2.1 that a zero-length data segment implies a zero-length digest
- SCSI MODE SELECT will not really set the parameters (will not cause an error either). The parameters will be set exclusively with text mode and can be retrieved with either text or Mode-SENSE. This enables us to disable their change after the Login negotiation. Also added to the negotiation (1.2.4) the value "?" with special meaning of enquiry
- Changed "task" to "command" wherever relevant
- EMDP usage in line with other SCSI protocols. EMDP governs how a target may request data and deliver. Similar to FCP a separate (protocol) parameter governs data PDU ordering within Sequence (DataPDUInOrder).
  Cleaned wording of DataOrder. Fixed final bit to define sequences in input stream.
- Added a "persistent state" part (1.2.8)
- Some Task Management commands may require authorization or may not be implemented. If not authorized they will return as if executed with a qualifier indicating "not authorized" or "not implemented" (clear LU and the resets)
- Task management commands and responses are "generalized" to all iSCSI tagged commands (they are named now Task Management command and response). Their behavior with respect to their CmdSN is clarified and mandated
- The logic to update ExpCmdSN etc. moved to 1.2.2.1
- Explicitly specified that a target can "initiate" negotiating a parameter (offering)(1.2.4)
- Returned the "direction" bit and a set of codes similar to version 05
- Introduced a "special" session type (CopyManagerSession) to be used between a Copy Manager and all of its targets; it may help define authentication and limit the type of commands to be executed in such a session
- Added 8.4 - How to Abort Safely a Command that Was Not Received
- Fixed the Logout Text
- AHSLength is now the first field in the AHS
- Fixed wording in 2.35 indicating AHS is mandatory for Bi-directional commands
- All key=value responses have to be explicit (none, not-understood etc.); no more selection by hiatus
- Targets can also offer key=value pairs (i.e., initiate negotiation) stated explicitly in 2.9.3
- Logout has a CmdSN field
- The Status SNACK can be discarded if the target has no such recovery
- Some parameters have been removed and replaced by "reasonable" defaults (read arbitrary defaults!); many others can't be changed anymore while the session is in full-feature phase
- NOP-Out specifies how LUN is generated when used (copied from NOP-In)
- Initial Marker-Less Interval is not a parameter anymore
- A response with F=1 during negotiation may not contain key=value pairs that may require additional
  answers from the initiator
- Clarified the meaning of the F bit on Write commands with regard to immediate and unsolicited data; F bit 0 means that unsolicited data will follow while F bit 1 means that this is the last of them (if any)
- You can have both immediate and unsolicited Data-Out PDUs
- DataPDULength and FirstBurstSize of 0 are allowed and mean unlimited length
- Task management command behavior relative to their own CmdSN is now stated in no uncertain terms (they are mandated to execute as if issued at CmdSN and, in case of aborts and clear/reset no additional response/status is expected for those commands after the task management command response)
- DataSN field in R2T renamed as R2TSN (better reflects semantics) and SNACK explicitly says that it requests Data or R2T.
- A session can have only one outstanding text request (not sequence)
- Text for Login Response 0301 changed (removed the maintenance mention)
- Clarified when ExpDataSN is reserved in SCSI Response
- Clarified the text and parameter (timers) for iSCSI event
- Padding bytes should be 0 (2.1)
- TotalAHSLength in 2.1.1.1 includes padding
- DataSegmentLength in 2.1.1.2 excludes padding
- Clarified bits in AHS type
- Limit for key/value string lengths (63, 255) in 2.8.3
- Added an example of SCSI event to Asynchronous Message
- Changed "Who" to "Who can send" in appendix
- Clarified meaning of parameters on 2.18.1 - Asynchronous Message - iSCSI Event
- Clarified the required initiator behavior at logout (not sending other commands) and how one expects the TCP close to be performed in 2.14
- Added a Login Response code indicating that a session can't include a given connection (0208)
- Clarified transition to full feature phase (per session and per connection and the role of the leading connection) in 1.2.5
- Corrected "one outstanding text request per connection" instead of "per session"
- For the Login Response TSID must be valid
  only if Login is accepted and the F bit is 1
- Added examples illustrating DataSN and R2TSN (from Eddy Quicksall)
- Added more text to the task management command 2.5
- Removed EnableACA and its dependents (in task management) and stated the requirement for a Unit Attention conforming to SAM2
- iSCSI Target Name if used on a connection other than the first must be the same as on the first (4.1)
- Fixed the examples in the Login appendix to correspond to the new keys
- Fixed SCSI Response Flags and made them consistent with the Data-In PDU
- All specified keys except X-* MUST be accepted (2.8.3)
- Hexadecimal notation is 0xab123cd (not 0x'ab123cd')
- Clarified CmdSN usage in immediate commands and the meaning of "execution engine" in 1.2.2.1
- Reject responses that prevent the creation of a SCSI task or result in a SCSI task being terminated must be followed by a SCSI Response with a Check Condition status 2.19.1
- Additional Runs (AddRuns) dropped from the SNACK request (too complex). With it disappeared also the implicit acknowledgement of sequences "between runs"
- PDUs delivered because of SNACK will be exact replicas of the original PDUs (including all flags) 2.16
- Added CommandReplaySupport key to negotiate support for full command replay (a command can be replayed after the status has been issued but has not been acknowledged) and a reject cause of unsupported command reply
- Added CommandFailoverSupport key to negotiate support for command allegiance change (command retry on another connection)
- Status SNACK for an acknowledged status is a protocol error (cause for reject)
- Reject cause "Command In Progress" when requesting replay before status is issued and while command is running
- Premature SNACKs are silently discarded (2.16)
- Status SNACK has to be supported only if within command or within connection recovery is supported.
  If within session recovery is supported SNACK can be discarded and followed by an Async. Message requesting logout
- StatSN added to Logout Response
- Added "CID not found" to Logout Response reason codes
- Async Message - iSCSI event 2 (request logout) has to be sent on the connection to be dropped. Wording fixed.
- Naming changes - iqn (stands for iSCSI qualified name) introduced as a replacement to fqn. Iqn prefixes also reversed names
- text in 8.3 revised (task management implementation mechanism)
- Fixed bit 7 byte 1 in Task Management response to 1 (consistency)
- Clarified in 1.2.2 behavior when "command window" is 0 (MaxCmdSN = ExpCmdSN - 1)
- Added state transitions part (new part 6)
- Refreshed recovery chapter (new part 7)
- Added an appendix with detailed recovery mechanisms (Appendix E)
- Added a brief explanation of session types in part 1
- Added DiscoverySession key and SendTargets appendix
- SCSI response made to fit having both a Status and a Response field. Needed for target errors that result in a check condition and ACA. In line with SAM2 that requires both fields (former versions were modeled on FCP).
- The security appendix lists SRP as mandatory to implement
- Clarified initial CmdSN and the role of TSID as a serializer
- Long Text Responses - additional fields added to the text request and text response
- Added a SCSI to iSCSI concept mapping section 1.5
- Clarified SNACK wording to indicate that in general command. Request, iSCSI command and iSCSI command have the same meaning. Also status, response or numbered response.
- Changed InitStatSN and clarified how it increases
- Added requirement for a 0x00 delimiter after each key=value
- Added binary negotiations (Yes|No) explicitly to 1.2.4
- All keys and values in the spec are case sensitive (stated in the text request)
- Changed the "operational parameters sent before the security.
  MAY be discarded" into MUST be discarded
- Changed the login reject 0201 to read - Security Negotiation Failed
- Added to 2.3.1 a paragraph about mandatory consistencies
- Stated clearly that F bit pairing is "local" (per/pair) and not per negotiation
- Clarified dependent parameter status
- Added CRC Example
- Added OpParmReset=Yes
- SecurityContextComplete is mandatory if any option offered
- Added a warning about the implications of not sending all unsolicited data to part 8
- Added a recommendation to send unsolicited data at FirstBurstSize and a response (error) for targets not supporting less
- Many more minor editorial changes, clarifications, typos etc.
- Responses in same position in SCSI response, logout, task etc.

Status of this Memo
Abstract
Acknowledgements
Change Log
1. Definitions and Acronyms
   1.1 Definitions
   1.2 Acronyms
   1.3 Conventions used in this document
      1.3.1 Word Rule
      1.3.2 Half-Word Rule
      1.3.3 Byte Rule
2. Overview
   2.1 SCSI Concepts
   2.2 iSCSI Concepts and Functional Overview
      2.2.1 Layers and Sessions
      2.2.2 Ordering and iSCSI Numbering
         2.2.2.1 Command Numbering and Acknowledging
         2.2.2.2 Response/Status Numbering and Acknowledging
         2.2.2.3 Data Sequencing
      2.2.3 iSCSI Login
      2.2.4 iSCSI Full Feature Phase
      2.2.5 iSCSI Connection Termination
      2.2.6 iSCSI Names
         2.2.6.1 iSCSI Name Requirements
         2.2.6.2 iSCSI Name Encoding
         2.2.6.3 iSCSI Name Structure
            2.2.6.3.1 Type "iqn." (iSCSI Qualified Name)
            2.2.6.3.2 Type "eui." (IEEE EUI-64 format)
      2.2.7 Persistent State
      2.2.8 Message Synchronization and Steering
         2.2.8.1 Rationale
         2.2.8.2 Synchronization (sync) and Steering Functional Model
         2.2.8.3 Sync and Steering and Other Encapsulation Layers
         2.2.8.4 Sync/Steering and iSCSI PDU Length
   2.3 iSCSI Session Types
   2.4 SCSI to iSCSI Concepts Mapping Model
      2.4.1 iSCSI Architecture Model
      2.4.2 SCSI Architecture Model
      2.4.3 Consequences of the Model
         2.4.3.1 I_T Nexus State
         2.4.3.2 SCSI Mode Pages
   2.5 Request/Response Summary
      2.5.1 Request/Response types carrying SCSI payload
         2.5.1.1 SCSI-Command
         2.5.1.2 SCSI-Response
         2.5.1.3 Task Management Function Request
         2.5.1.4 Task Management Function Response
         2.5.1.5 SCSI Data-out and SCSI Data-in
         2.5.1.6 Ready To Transfer (R2T)
      2.5.2 Requests/Responses carrying SCSI and iSCSI Payload
         2.5.2.1 Asynchronous Message
      2.5.3 Requests/Responses carrying iSCSI Only Payload
         2.5.3.1 Text Request and Text Response
         2.5.3.2 Login Request and Login Response
         2.5.3.3 Logout Request and Response
         2.5.3.4 SNACK Request
         2.5.3.5 Reject
         2.5.3.6 NOP-Out Request and NOP-In Response
3. SCSI Mode Parameters for iSCSI
4. Login and Full Feature Phase Negotiation
   4.1 Text Format
   4.2 Text Mode Negotiation
      4.2.1 List negotiations
      4.2.2 Simple-value negotiations
   4.3 Login Phase
      4.3.1 Login Phase Start
      4.3.2 iSCSI Security Negotiation
      4.3.3 Operational Parameter Negotiation During the Login Phase
      4.3.4 Connection reinstatement
      4.3.5 Session reinstatement, closure and timeout
         4.3.5.1 Loss of Nexus notification
      4.3.6 Session continuation and failure
   4.4 Operational Parameter Negotiation Outside the Login Phase
5. State Transitions
   5.1 Standard Connection State Diagrams
      5.1.1 Standard Connection State Diagram for an Initiator
      5.1.2 Standard Connection State Diagram for a Target . . .
. . .88 5.1.3 State Descriptions for Initiators and Targets . . . . . .90 5.1.4 State Transition Descriptions for Initiators and Targets .91 5.2 Connection Cleanup State Diagram for Initiators and Targets . .95 5.2.1 State Descriptions for Initiators and Targets . . . . . .96 5.2.2 State Transition Descriptions for Initiators and Targets .97 5.3 Session State Diagrams . . . . . . . . . . . . . . . . . . . .98 5.3.1 Session State Diagram for a Target . . . . . . . . . . . .99 5.3.2 State Descriptions for Initiators and Targets . . . . . 101 5.3.3 State Transition Descriptions for Initiators and Targets 101 Julian Satran Expires February 2003 17 iSCSI 1-July-02 6. iSCSI Error Handling and Recovery . . . . . . . . . . . . . . . 103 6.1 Retry and Reassign in Recovery . . . . . . . . . . . . . . . 103 6.1.1 Usage of Retry . . . . . . . . . . . . . . . . . . . . . 103 6.1.2 Allegiance Reassignment . . . . . . . . . . . . . . . . 104 6.2 Usage Of Reject PDU in Recovery . . . . . . . . . . . . . . . 105 6.3 Connection timeout management . . . . . . . . . . . . . . . . 105 6.3.1 Timeouts on transport exception events . . . . . . . . . 106 6.3.2 Timeouts on planned decommissioning . . . . . . . . . . 106 6.4 Format Errors . . . . . . . . . . . . . . . . . . . . . . . . 106 6.5 Digest Errors . . . . . . . . . . . . . . . . . . . . . . . . 107 6.6 Sequence Errors . . . . . . . . . . . . . . . . . . . . . . . 108 6.7 SCSI Timeouts . . . . . . . . . . . . . . . . . . . . . . . . 109 6.8 Negotiation Failures . . . . . . . . . . . . . . . . . . . . 109 6.9 Protocol Errors . . . . . . . . . . . . . . . . . . . . . . . 110 6.10 Connection Failures . . . . . . . . . . . . . . . . . . . . 110 6.11 Session Errors . . . . . . . . . . . . . . . . . . . . . . . 111 6.12 Recovery Classes . . . . . . . . . . . . . . . . . . . . . . 112 6.12.1 Recovery Within-command . . . . . . . . . . . . . . . . 112 6.12.2 Recovery Within-connection . . . . . . . . . . . . . . 
113 6.12.3 Connection Recovery . . . . . . . . . . . . . . . . . . 114 6.12.4 Session Recovery . . . . . . . . . . . . . . . . . . . 114 6.13 Error Recovery Hierarchy . . . . . . . . . . . . . . . . . . 115 7. Security Considerations . . . . . . . . . . . . . . . . . . . . 118 7.1 iSCSI Security Mechanisms . . . . . . . . . . . . . . . . . . 118 7.2 In-band Initiator-Target Authentication . . . . . . . . . . . 119 7.2.1 CHAP Considerations . . . . . . . . . . . . . . . . . . 120 7.2.2 SRP Considerations . . . . . . . . . . . . . . . . . . . 120 7.3 IPsec . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 7.3.1 Data Integrity and Authentication . . . . . . . . . . . 121 7.3.2 Confidentiality . . . . . . . . . . . . . . . . . . . . 121 7.3.3 Policy, Security Associations and Key Management . . . . 122 8. Notes to Implementers . . . . . . . . . . . . . . . . . . . . . 124 8.1 Multiple Network Adapters . . . . . . . . . . . . . . . . . . 124 8.1.1 Conservative Reuse of ISIDs . . . . . . . . . . . . . . 124 8.1.2 iSCSI Name, ISID and TPGT Use . . . . . . . . . . . . . 125 8.2 Autosense and Auto Contingent Allegiance (ACA) . . . . . . . 127 8.3 iSCSI timeouts . . . . . . . . . . . . . . . . . . . . . . . 127 8.4 Command Retry and Cleaning Old Command Instances . . . . . . 127 8.5 Synch and Steering Layer and Performance . . . . . . . . . . 127 8.6 Considerations for State-dependent devices . . . . . . . . . 128 8.6.1 Determining the proper ErrorRecoveryLevel . . . . . . . 128 9. iSCSI PDU Formats . . . . . . . . . . . . . . . . . . . . . . . 130 9.1 iSCSI PDU Length and Padding . . . . . . . . . . . . . . . . 130 Julian Satran Expires February 2003 18 iSCSI 1-July-02 9.2 PDU Template, Header, and Opcodes . . . . . . . . . . . . . . 130 9.2.1 Basic Header Segment (BHS) . . . . . . . . . . . . . . . 131 9.2.1.1 I . . . . . . . . . . . . . . . . . . . . . . . . . 132 9.2.1.2 Opcode . . . . . . . . . . . . . . . . . . . . . . 132 9.2.1.3 Opcode-specific Fields . . . 
. . . . . . . . . . . 133 9.2.1.4 TotalAHSLength . . . . . . . . . . . . . . . . . . 133 9.2.1.5 DataSegmentLength . . . . . . . . . . . . . . . . . 133 9.2.1.6 LUN . . . . . . . . . . . . . . . . . . . . . . . . 134 9.2.1.7 Initiator Task Tag . . . . . . . . . . . . . . . . 134 9.2.2 Additional Header Segment (AHS) . . . . . . . . . . . . 134 9.2.2.1 AHSType . . . . . . . . . . . . . . . . . . . . . . 134 9.2.2.2 AHSLength . . . . . . . . . . . . . . . . . . . . . 135 9.2.2.3 Extended CDB AHS . . . . . . . . . . . . . . . . . 135 9.2.2.4 Bidirectional Expected Read-Data Length AHS . . . . 135 9.2.3 Header Digest and Data Digest . . . . . . . . . . . . . 136 9.2.4 Data Segment . . . . . . . . . . . . . . . . . . . . . . 136 9.3 SCSI Command . . . . . . . . . . . . . . . . . . . . . . . . . 137 9.3.1 Flags and Task Attributes (byte 1) . . . . . . . . . . . 137 9.3.2 CmdSN - Command Sequence Number . . . . . . . . . . . . 138 9.3.3 ExpStatSN . . . . . . . . . . . . . . . . . . . . . . . 138 9.3.4 Expected Data Transfer Length . . . . . . . . . . . . . 138 9.3.5 CDB - SCSI Command Descriptor Block . . . . . . . . . . 139 9.3.6 Data Segment - Command Data . . . . . . . . . . . . . . 139 9.4 SCSI Response . . . . . . . . . . . . . . . . . . . . . . . . 140 9.4.1 Flags (byte 1) . . . . . . . . . . . . . . . . . . . . . 140 9.4.2 Status . . . . . . . . . . . . . . . . . . . . . . . . . 141 9.4.3 Response . . . . . . . . . . . . . . . . . . . . . . . . 142 9.4.4 Residual Count . . . . . . . . . . . . . . . . . . . . . 142 9.4.5 Bidirectional Read Residual Count . . . . . . . . . . . 143 9.4.6 Data Segment - Sense and Response Data Segment . . . . . 143 9.4.6.1 SenseLength . . . . . . . . . . . . . . . . . . . . 143 9.4.6.2 Sense Data . . . . . . . . . . . . . . . . . . . . 144 9.4.7 ExpDataSN . . . . . . . . . . . . . . . . . . . . . . . 144 9.4.8 StatSN - Status Sequence Number . . . . . . . . . . . . 145 9.4.9 ExpCmdSN - Next Expected CmdSN from this Initiator . . . 
145 9.4.10 MaxCmdSN - Maximum CmdSN from this Initiator . . . . . 145 9.5 Task Management Function Request . . . . . . . . . . . . . . . 146 9.5.1 Function . . . . . . . . . . . . . . . . . . . . . . . . 146 9.5.2 LUN . . . . . . . . . . . . . . . . . . . . . . . . . . 149 9.5.3 Referenced Task Tag . . . . . . . . . . . . . . . . . . 149 9.5.4 RefCmdSN . . . . . . . . . . . . . . . . . . . . . . . . 149 9.5.5 ExpDataSN . . . . . . . . . . . . . . . . . . . . . . . 149 9.6 Task Management Function Response . . . . . . . . . . . . . . 151 Julian Satran Expires February 2003 19 iSCSI 1-July-02 9.6.1 Response . . . . . . . . . . . . . . . . . . . . . . . . 151 9.6.2 Task Management actions on task sets . . . . . . . . . . 153 9.7 SCSI Data-out & SCSI Data-in . . . . . . . . . . . . . . . . . 154 9.7.1 F (Final) Bit . . . . . . . . . . . . . . . . . . . . . 156 9.7.2 A (Acknowledge) bit . . . . . . . . . . . . . . . . . . 156 9.7.3 Target Transfer Tag . . . . . . . . . . . . . . . . . . 157 9.7.4 StatSN . . . . . . . . . . . . . . . . . . . . . . . . . 157 9.7.5 DataSN . . . . . . . . . . . . . . . . . . . . . . . . . 157 9.7.6 Buffer Offset . . . . . . . . . . . . . . . . . . . . . 158 9.7.7 DataSegmentLength . . . . . . . . . . . . . . . . . . . 158 9.7.8 Flags (byte 1) . . . . . . . . . . . . . . . . . . . . . 158 9.8 Ready To Transfer (R2T) . . . . . . . . . . . . . . . . . . . 160 9.8.1 R2TSN . . . . . . . . . . . . . . . . . . . . . . . . . 161 9.8.2 StatSN . . . . . . . . . . . . . . . . . . . . . . . . . 161 9.8.3 Desired Data Transfer Length and Buffer Offset . . . . . 162 9.8.4 Target Transfer Tag . . . . . . . . . . . . . . . . . . 162 9.9 Asynchronous Message . . . . . . . . . . . . . . . . . . . . . 163 9.9.1 AsyncEvent . . . . . . . . . . . . . . . . . . . . . . . 164 9.9.2 AsyncVCode . . . . . . . . . . . . . . . . . . . . . . . 165 9.9.3 Sense Data and iSCSI Event Data . . . . . . . . . . . . 165 9.9.3.1 SenseLength . . . . . . . . . . . . . . . . . . . 
. 166 9.10 Text Request . . . . . . . . . . . . . . . . . . . . . . . . 167 9.10.1 F (Final) Bit . . . . . . . . . . . . . . . . . . . . . 168 9.10.2 C (Continue) Bit . . . . . . . . . . . . . . . . . . . 168 9.10.3 Initiator Task Tag . . . . . . . . . . . . . . . . . . 168 9.10.4 Target Transfer Tag . . . . . . . . . . . . . . . . . . 168 9.10.5 Text . . . . . . . . . . . . . . . . . . . . . . . . . 169 9.11 Text Response . . . . . . . . . . . . . . . . . . . . . . . . 171 9.11.1 F (Final) Bit . . . . . . . . . . . . . . . . . . . . . 171 9.11.2 C (Continue) Bit . . . . . . . . . . . . . . . . . . . 172 9.11.3 Initiator Task Tag . . . . . . . . . . . . . . . . . . 172 9.11.4 Target Transfer Tag . . . . . . . . . . . . . . . . . . 172 9.11.5 StatSN . . . . . . . . . . . . . . . . . . . . . . . . 173 9.11.6 Text Response Data . . . . . . . . . . . . . . . . . . 173 9.12 Login Request . . . . . . . . . . . . . . . . . . . . . . . . 174 9.12.1 T (Transit) Bit . . . . . . . . . . . . . . . . . . . . 175 9.12.2 C (Continue) Bit . . . . . . . . . . . . . . . . . . . 175 9.12.3 CSG and NSG . . . . . . . . . . . . . . . . . . . . . . 175 9.12.4 Version-max . . . . . . . . . . . . . . . . . . . . . . 175 9.12.5 Version-min . . . . . . . . . . . . . . . . . . . . . . 175 9.12.6 ISID . . . . . . . . . . . . . . . . . . . . . . . . . 176 9.12.7 TSIH . . . . . . . . . . . . . . . . . . . . . . . . . 177 9.12.8 Connection ID - CID . . . . . . . . . . . . . . . . . . 177 Julian Satran Expires February 2003 20 iSCSI 1-July-02 9.12.9 CmdSN . . . . . . . . . . . . . . . . . . . . . . . . . 178 9.12.10 ExpStatSN . . . . . . . . . . . . . . . . . . . . . . 178 9.12.11 Login Parameters . . . . . . . . . . . . . . . . . . . 178 9.13 Login Response . . . . . . . . . . . . . . . . . . . . . . . 180 9.13.1 Version-max . . . . . . . . . . . . . . . . . . . . . . 180 9.13.2 Version-active . . . . . . . . . . . . . . . . . . . . 181 9.13.3 TSIH . . . . . . . . . . . . . . . . . . . . . . . . 
. 181 9.13.4 StatSN . . . . . . . . . . . . . . . . . . . . . . . . 181 9.13.5 Status-Class and Status-Detail . . . . . . . . . . . . 181 9.13.6 T (Transit) bit . . . . . . . . . . . . . . . . . . . . 184 9.13.7 C (Continue) Bit . . . . . . . . . . . . . . . . . . . 184 9.13.8 Login Parameters . . . . . . . . . . . . . . . . . . . 185 9.14 Logout Request . . . . . . . . . . . . . . . . . . . . . . . 186 9.14.1 Reason Code . . . . . . . . . . . . . . . . . . . . . . 188 9.14.2 CID . . . . . . . . . . . . . . . . . . . . . . . . . . 189 9.14.3 ExpStatSN . . . . . . . . . . . . . . . . . . . . . . . 189 9.14.4 Implicit termination of tasks . . . . . . . . . . . . . 189 9.15 Logout Response . . . . . . . . . . . . . . . . . . . . . . . 190 9.15.1 Response . . . . . . . . . . . . . . . . . . . . . . . 190 9.15.2 Time2Wait . . . . . . . . . . . . . . . . . . . . . . . 191 9.15.3 Time2Retain . . . . . . . . . . . . . . . . . . . . . . 191 9.16 SNACK Request . . . . . . . . . . . . . . . . . . . . . . . 193 9.16.1 Type . . . . . . . . . . . . . . . . . . . . . . . . . 194 9.16.2 BegRun . . . . . . . . . . . . . . . . . . . . . . . . 195 9.16.3 RunLength . . . . . . . . . . . . . . . . . . . . . . . 195 9.17 Reject . . . . . . . . . . . . . . . . . . . . . . . . . . . 197 9.17.1 Reason . . . . . . . . . . . . . . . . . . . . . . . . 198 9.17.2 DataSN . . . . . . . . . . . . . . . . . . . . . . . . 199 9.17.3 StatSN, ExpCmdSN and MaxCmdSN . . . . . . . . . . . . . 199 9.17.4 Complete Header of Bad PDU . . . . . . . . . . . . . . 199 9.18 NOP-Out . . . . . . . . . . . . . . . . . . . . . . . . . . . 200 9.18.1 Initiator Task Tag . . . . . . . . . . . . . . . . . . 201 9.18.2 Target Transfer Tag . . . . . . . . . . . . . . . . . . 201 9.18.3 Ping Data . . . . . . . . . . . . . . . . . . . . . . . 201 9.19 NOP-In . . . . . . . . . . . . . . . . . . . . . . . . . . . 202 9.19.1 Target Transfer Tag . . . . . . . . . . . . . . . . . . 203 9.19.2 StatSN . . . . . . . . . . . . . . 
. . . . . . . . . . 203 9.19.3 LUN . . . . . . . . . . . . . . . . . . . . . . . . . . 203 10. iSCSI Security Keys and Authentication Methods . . . . . . . . 204 10.1 AuthMethod . . . . . . . . . . . . . . . . . . . . . . . . . 204 10.2 Kerberos . . . . . . . . . . . . . . . . . . . . . . . . . . 205 10.3 Simple Public-Key Mechanism (SPKM) . . . . . . . . . . . . . 206 10.4 Secure Remote Password (SRP) . . . . . . . . . . . . . . . . 207 Julian Satran Expires February 2003 21 iSCSI 1-July-02 10.5 Challenge Handshake Authentication Protocol (CHAP) . . . . . 208 11. Login/Text Operational Keys . . . . . . . . . . . . . . . . . . 210 11.1 HeaderDigest and DataDigest . . . . . . . . . . . . . . . . 210 11.2 MaxConnections . . . . . . . . . . . . . . . . . . . . . . . 212 11.3 SendTargets . . . . . . . . . . . . . . . . . . . . . . . . 212 11.4 TargetName . . . . . . . . . . . . . . . . . . . . . . . . . 212 11.5 InitiatorName . . . . . . . . . . . . . . . . . . . . . . . 213 11.6 TargetAlias . . . . . . . . . . . . . . . . . . . . . . . . 213 11.7 InitiatorAlias . . . . . . . . . . . . . . . . . . . . . . . 214 11.8 TargetAddress . . . . . . . . . . . . . . . . . . . . . . . 214 11.9 TargetPortalGroupTag . . . . . . . . . . . . . . . . . . . . 215 11.10 InitialR2T . . . . . . . . . . . . . . . . . . . . . . . . 215 11.11 BidiInitialR2T . . . . . . . . . . . . . . . . . . . . . . 216 11.12 ImmediateData . . . . . . . . . . . . . . . . . . . . . . . 217 11.13 MaxRecvDataSegmentLength . . . . . . . . . . . . . . . . . 218 11.14 MaxBurstLength . . . . . . . . . . . . . . . . . . . . . . 218 11.15 FirstBurstLength . . . . . . . . . . . . . . . . . . . . . 219 11.16 DefaultTime2Wait . . . . . . . . . . . . . . . . . . . . . 219 11.17 DefaultTime2Retain . . . . . . . . . . . . . . . . . . . . 220 11.18 MaxOutstandingR2T . . . . . . . . . . . . . . . . . . . . . 220 11.19 DataPDUInOrder . . . . . . . . . . . . . . . . . . . . . . 221 11.20 DataSequenceInOrder . . . . . . . . . 
. . . . . . . . . . . 221 11.21 ErrorRecoveryLevel . . . . . . . . . . . . . . . . . . . . 222 11.22 SessionType . . . . . . . . . . . . . . . . . . . . . . . . 222 11.23 The Vendor Specific Key Format . . . . . . . . . . . . . . 223 12. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 224 References and Bibliography . . . . . . . . . . . . . . . . . . . . 225 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 227 Appendix A. Sync and Steering with Fixed Interval Markers . . . . . 229 A.1 Markers At Fixed Intervals . . . . . . . . . . . . . . . . . 229 A.2 Initial Marker-less Interval . . . . . . . . . . . . . . . . 230 A.3 Negotiation . . . . . . . . . . . . . . . . . . . . . . . . 230 OFMarker, IFMarker 230 OFMarkInt, IFMarkInt 231 Appendix B. Examples . . . . . . . . . . . . . . . . . . . . . . . 233 B.2 Write Operation Example . . . . . . . . . . . . . . . . . . 234 B.3 R2TSN/DataSN use Examples . . . . . . . . . . . . . . . . . 234 B.4 CRC Examples . . . . . . . . . . . . . . . . . . . . . . . . 238 Appendix C. Login Phase Examples . . . . . . . . . . . . . . . . . 240 Appendix D. SendTargets Operation . . . . . . . . . . . . . . . . . 249 Appendix E. Algorithmic Presentation of Error Recovery Classes . . 254 E.2 Within-command Error Recovery Algorithms . . . . . . . . . . 255 Procedure Descriptions 255 Julian Satran Expires February 2003 22 iSCSI 1-July-02 Initiator Algorithms 256 Target Algorithms 258 E.3 Within-connection Recovery Algorithms . . . . . . . . . . . 260 Procedure Descriptions 260 Initiator Algorithms 261 Target Algorithms 264 E.4 Connection Recovery Algorithms . . . . . . . . . . . . . . . 264 Procedure Descriptions 264 Initiator Algorithms 265 Target Algorithms 267 Appendix F. Clearing effects of various events on targets . . . . . 269 F.1 Clearing effects on iSCSI objects . . . . . . . . . . . . . 269 F.2 Clearing effects on SCSI objects . . . . . . . . . . . . . . 274 Full Copyright Statement . . . . . . . . . 
1. Definitions and Acronyms

1.1 Definitions

- Alias: An alias string can also be associated with an iSCSI Node. The alias allows an organization to associate a user-friendly string with the iSCSI Name. However, the alias string is not a substitute for the iSCSI Name.

- CID (Connection ID): Connections within a session are identified by a connection ID. It is a unique ID for this connection within the session for the initiator. It is generated by the initiator and presented to the target during login requests and during logouts that close connections.

- Connection: A connection is a TCP connection. Communication between the initiator and target occurs over one or more TCP connections. The TCP connections carry control messages, SCSI commands, parameters, and data within iSCSI Protocol Data Units (iSCSI PDUs).

- iSCSI Device: A SCSI Device using an iSCSI delivery subsystem.

- iSCSI Initiator Name: The iSCSI Initiator Name specifies the worldwide unique name of the initiator.

- iSCSI Initiator Node: The "initiator".

- iSCSI Layer: This layer builds/receives iSCSI PDUs and relays/receives them to/from one or more TCP connections that form an initiator-target "session".

- iSCSI Name: The name of an iSCSI initiator or iSCSI target.

- iSCSI Node: The iSCSI Node represents a single iSCSI initiator or iSCSI target. There are one or more iSCSI Nodes within a Network Entity. The iSCSI Node is accessible via one or more Network Portals. An iSCSI Node is identified by its iSCSI Name. The separation of the iSCSI Name from the addresses used by and for the iSCSI node allows multiple iSCSI nodes to use the same addresses, and the same iSCSI node to use multiple addresses.

- iSCSI Target Name: The iSCSI Target Name specifies the worldwide unique name of the target.

- iSCSI Target Node: The "target".

- iSCSI Task: An iSCSI task is an iSCSI request for which a response is expected.

- iSCSI Transfer Direction: The iSCSI transfer direction is defined with regard to the initiator. Outbound or outgoing transfers are transfers from the initiator to the target, while inbound or incoming transfers are from the target to the initiator.

- I_T nexus: According to [SAM2], the I_T nexus is a relationship between a SCSI Initiator Port and a SCSI Target Port. For iSCSI, this relationship is a session, defined as a relationship between an iSCSI Initiator's end of session (SCSI Initiator Port) and the iSCSI Target's Portal Group. The I_T nexus can be identified by the conjunction of the SCSI port names; that is, the I_T nexus identifier is the tuple (iSCSI Initiator Name + 'i' + ISID, iSCSI Target Name + 't' + Portal Group Tag).

- Network Entity: The Network Entity represents a device or gateway that is accessible from the IP network. A Network Entity must have one or more Network Portals, each of which can be used to gain access to the IP network by some iSCSI Nodes contained in that Network Entity.

- Network Portal: The Network Portal is a component of a Network Entity that has a TCP/IP network address and that may be used by an iSCSI Node within that Network Entity for the connection(s) within one of its iSCSI sessions. A Network Portal in an initiator is identified by its IP address. A Network Portal in a target is identified by its IP address and its listening TCP port.

- Originator: In a negotiation or exchange, the party that initiates the negotiation or exchange.

- PDU (Protocol Data Unit): The initiator and target divide their communications into messages. The term "iSCSI protocol data unit" (iSCSI PDU) is used for these messages.

- Portal Groups: iSCSI supports multiple connections within the same session; some implementations will have the ability to combine connections in a session across multiple Network Portals. A Portal Group defines a set of Network Portals within an iSCSI Node that collectively supports the capability of coordinating a session with connections spanning these portals. Not all Network Portals within a Portal Group need participate in every session connected through that Portal Group. One or more Portal Groups may provide access to an iSCSI Node. Each Network Portal as utilized by a given iSCSI Node belongs to exactly one portal group within that node.

- Portal Group Tag: This simple unsigned-integer between 1 and 65535 identifies the Portal Group within an iSCSI Node. All Network Portals with the same portal group tag in the context of a given iSCSI Node are in the same Portal Group.

- Responder: In a negotiation or exchange, the party that responds to the originator of the negotiation or exchange.

- SCSI Device: This is the SAM2 term for an entity that contains other SCSI entities. For example, a SCSI Initiator Device contains one or more SCSI Initiator Ports and zero or more application clients; a SCSI Target Device contains one or more SCSI Target Ports and one or more logical units. For iSCSI, the SCSI Device is the component within an iSCSI Node that provides the SCSI functionality. As such, there can be at most one SCSI Device within a given iSCSI Node. Access to the SCSI Device can only be achieved in an iSCSI normal operational session. The SCSI Device Name is defined to be the iSCSI Name of the node and its use is mandatory in the iSCSI protocol.

- SCSI Layer: This builds/receives SCSI CDBs (Command Descriptor Blocks) and relays/receives them with the remaining command execute parameters to/from the iSCSI Layer.

- Session: The group of TCP connections that link an initiator with a target form a session (loosely equivalent to a SCSI I-T nexus). TCP connections can be added and removed from a session. Across all connections within a session, an initiator sees one "target image".

- SSID (Session ID): A session between an iSCSI initiator and an iSCSI target is defined by a session ID that is a tuple composed of an initiator part (ISID) and a target part (Target Portal Group Tag). The ISID is explicitly specified by the initiator at session establishment. The Target Portal Group Tag is implied by the initiator through the selection of the TCP end-point at connection establishment. The TargetPortalGroupTag key may also be returned by the target as a confirmation during session establishment.

- SCSI Initiator Port: This maps to the endpoint of an iSCSI normal operational session. An iSCSI normal operational session is negotiated through the login process between an iSCSI initiator node and an iSCSI target node. At successful completion of this process, a SCSI Initiator Port is created within the SCSI Initiator Device. The SCSI Initiator Port Name and SCSI Initiator Port Identifier are both defined to be the iSCSI Initiator Name together with (a) a label that identifies it as an initiator port name/identifier and (b) the ISID portion of the session identifier.

- SCSI Port: This is the SAM2 term for an entity in a SCSI Device that provides the SCSI functionality to interface with a service delivery subsystem or transport. For iSCSI, the definitions of the SCSI Initiator Port and the SCSI Target Port are different.

- SCSI Port Name: A name made up of UTF-8 characters that includes the iSCSI Name + 'i' or 't' + ISID or Portal Group Tag.

- SCSI Target Port: This maps to an iSCSI Target Portal Group.

- SCSI Target Port Name and SCSI Target Port Identifier: These are both defined to be the iSCSI Target Name together with (a) a label that identifies it as a target port name/identifier and (b) the portal group tag.

- Target Portal Group Tag: A numerical identifier (16 bit) for an iSCSI Target Portal Group.

- TSIH (Target Session Identifying Handle): The TSIH is a target-assigned tag for a session with a specific named initiator. The target generates it during session establishment. Its internal format and content are not defined by this protocol, except for the value 0, which is reserved and used by the initiator to indicate a new session. It is given to the target during additional connection establishment for the same session.

1.2 Acronyms

Acronym   Definition
--------------------------------------------------------------
3DES      Triple Data Encryption Standard
ACA       Auto Contingent Allegiance
AEN       Asynchronous Event Notification
AES       Advanced Encryption Standard
AH        Additional Header
AHS       Additional Header Segment
API       Application Programming Interface
ASC       Additional Sense Code
ASCII     American Standard Code for Information Interchange
ASCQ      Additional Sense Code Qualifier
BHS       Basic Header Segment
CBC       Cipher Block Chaining
CDB       Command Descriptor Block
CHAP      Challenge Handshake Authentication Protocol
CID       Connection ID
CO        Connection Only
CRC       Cyclic Redundancy Check
CRL       Certificate Revocation List
CSG       Current Stage
CSM       Connection State Machine
DES       Data Encryption Standard
DNS       Domain Name Server
DOI       Domain of Interpretation
ESP       Encapsulating Security Payload
EUI       Extended Unique Identifier
FFP       Full Feature Phase
FFPO      Full Feature Phase Only
Gbps      GigaBits per Second
HBA       Host Bus Adapter
HMAC      Hashed Message Authentication Code
IANA      Internet Assigned Numbers Authority
ID        Identifier
IDN       Internationalized Domain Name
IEEE      Institute of Electrical & Electronics Engineers
IETF      Internet Engineering Task Force
IKE       Internet Key Exchange
I/O       Input - Output
IO        Initialize Only
IP        Internet Protocol
IPsec     Internet Protocol Security
IPv4      Internet Protocol Version 4
IPv6      Internet Protocol Version 6
IQN       iSCSI Qualified Name
ISID      Initiator Session ID
ITN       Initiator Task Name
ITT       Initiator Task Tag
KRB5      Kerberos V5
LFL       Lower Functional Layer
LTDS      Logical-Text-Data-Segment
LO        Leading Only
LU        Logical Unit
LUN       Logical Unit Number
MAC       Message Authentication Codes
NA        Not Applicable
NIC       Network Interface Card
NOP       No Operation
NSG       Next Stage
OS        Operating System
PDU       Protocol Data Unit
PKI       Public Key Infrastructure
R2T       Ready To Transfer
R2TSN     Ready To Transfer Sequence Number
RDMA      Remote Direct Memory Access
SAM       SCSI Architecture Model
SAM2      SCSI Architecture Model - 2
SAN       Storage Area Network
SCSI      Small Computer Systems Interface
SN        Sequence Number
SNACK     Selective Negative Acknowledgment - also
          Sequence Number Acknowledgement for data
SPKM      Simple Public-Key Mechanism
SRP       Secure Remote Password
SSID      Session ID
SW        Session Wide
TCB       Task Control Block
TCP       Transmission Control Protocol
TPGT      Target Portal Group Tag
TSIH      Target Session Identifying Handle
TTT       Target Transfer Tag
UFL       Upper Functional Layer
ULP       Upper Level Protocol
URN       Uniform Resource Names
UTF       Universal Transformation Format
WG        Working Group

1.3 Conventions used in this document

In examples, "I->" and "T->" show iSCSI PDUs sent by the initiator and target respectively.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119.

iSCSI messages - PDUs - are represented by diagrams as in the following example:

   Byte/     0       |       1       |       2       |       3       |
      /              |               |               |               |
     |0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|0 1 2 3 4 5 6 7|
     +---------------+---------------+---------------+---------------+
    0| Basic Header Segment (BHS)                                    |
     +---------------+---------------+---------------+---------------+
     ----------
    +|                                                               |
     +---------------+---------------+---------------+---------------+

The diagrams include byte and bit numbering.

The following representation and ordering rules are observed in this document:

- Word Rule
- Half-word Rule
- Byte Rule

1.3.1 Word Rule

A word holds 4 consecutive bytes. Whenever a word has numeric content, it is considered an unsigned number in base 2 positional representation, with bit 0 of the lowest numbered byte (e.g., byte 0) representing 2**31, bit 1 representing 2**30, and so on through bit 7 of the lowest numbered byte + 3 (e.g., byte 3), which represents 2**0.

Decimal and hexadecimal representation of word values map this representation to decimal or hexadecimal positional notation.

1.3.2 Half-Word Rule

A half-word holds 2 consecutive bytes. Whenever a half-word has numeric content, it is considered an unsigned number in base 2 positional representation, with bit 0 of the lowest numbered byte (e.g., byte 0) representing 2**15, bit 1 representing 2**14, and so on through bit 7 of the lowest numbered byte + 1 (e.g., byte 1), which represents 2**0.

Decimal and hexadecimal representation of half-word values map this representation to decimal or hexadecimal positional notation.

1.3.3 Byte Rule

For every PDU, bytes are sent and received in increasing numbering order (network order).

Whenever a byte has numerical content, it is considered an unsigned number in base 2 positional representation, with bit 0 representing 2**7, bit 1 representing 2**6, and so on through bit 7 representing 2**0.

2. Overview

2.1 SCSI Concepts

The SCSI Architecture Model-2 [SAM2] describes, in detail, the architecture of the SCSI family of I/O protocols. This section provides a brief background of the SCSI architecture and is intended to familiarize readers with its terminology.

At the highest level, SCSI is a family of interfaces for requesting services from I/O devices, including hard drives, tape drives, CD and DVD drives, printers, and scanners. In SCSI terminology, an individual I/O device is called a "logical unit" (LU).

SCSI is a client-server architecture. Clients of a SCSI interface are called "initiators". Initiators issue SCSI "commands" to request service from a logical unit. The "device server" on the logical unit accepts SCSI commands and processes them.

A "SCSI transport" maps the client-server SCSI protocol to a specific interconnect. Initiators are one endpoint of a SCSI transport. The "target" is the other endpoint. A target can contain multiple Logical Units (LUs). Each Logical Unit has an address within a target called a Logical Unit Number (LUN).

A SCSI task is a SCSI command or possibly a linked set of SCSI commands. Some LUs support multiple pending (queued) tasks, but the queue of tasks is managed by the target. The target uses an initiator-provided "task tag" to distinguish between tasks. Only one command in a task can be outstanding at any given time.

Each SCSI command results in an optional data phase and a required response phase. In the data phase, information can travel from the initiator to target (e.g., WRITE), target to initiator (e.g., READ), or in both directions. In the response phase, the target returns the final status of the operation, including any errors. A response terminates a SCSI command.

Command Descriptor Blocks (CDB) are the data structures used to contain the command parameters that an initiator hands to a target. The CDB content and structure are defined by [SAM] and device-type specific SCSI standards.

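As an illustration of the CDB concept, the following minimal sketch builds a 10-byte READ(10) CDB. The byte layout comes from the SCSI block command standard rather than from this document, and the helper name build_read10_cdb and its parameters are hypothetical. The multi-byte fields are packed most-significant byte first, the same big-endian convention that the word and half-word rules of Section 1.3 describe for iSCSI PDU fields.

```python
import struct

READ_10_OPCODE = 0x28  # READ(10) operation code from the SCSI block command set


def build_read10_cdb(lba: int, num_blocks: int) -> bytes:
    """Build a 10-byte READ(10) CDB (illustrative helper, not part of iSCSI)."""
    if not 0 <= lba < 2**32:
        raise ValueError("LBA must fit in 32 bits")
    if not 0 <= num_blocks < 2**16:
        raise ValueError("transfer length must fit in 16 bits")
    # ">" selects big-endian packing with no padding:
    # B opcode, B flags, I 4-byte LBA, B group number,
    # H 2-byte transfer length, B control
    return struct.pack(">BBIBHB", READ_10_OPCODE, 0, lba, 0, num_blocks, 0)


cdb = build_read10_cdb(lba=0x00012345, num_blocks=8)
print(cdb.hex())  # prints 28000001234500000800
```

The initiator would hand a block like this to the target as the command parameters; how it is carried inside an iSCSI PDU is described in the SCSI Command PDU format later in the document.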
2.2 iSCSI Concepts and Functional Overview

The iSCSI protocol is a mapping of the SCSI remote procedure invocation model (see [SAM]) over the TCP protocol. SCSI commands are carried by iSCSI requests and SCSI responses and status are carried by iSCSI responses. iSCSI also uses the request-response mechanism for iSCSI protocol mechanisms.

For the remainder of this document, the terms "initiator" and "target" refer to "iSCSI initiator node" and "iSCSI target node", respectively (see Section 2.4.1 iSCSI Architecture Model) unless otherwise qualified.

In keeping with similar protocols, the initiator and target divide their communications into messages. This document uses the term "iSCSI protocol data unit" (iSCSI PDU) for these messages.

For performance reasons, iSCSI allows a "phase-collapse". A command and its associated data may be shipped together from initiator to target, and data and responses may be shipped together from targets.

The iSCSI transfer direction is defined with respect to the initiator. Outbound or outgoing transfers are transfers from an initiator to a target, while inbound or incoming transfers are from a target to an initiator.

An iSCSI task is an iSCSI request for which a response is expected.

In this document "iSCSI request", "iSCSI command", request, or (unqualified) command have the same meaning. Also, unless otherwise specified, status, response, or numbered response have the same meaning.

2.2.1 Layers and Sessions

The following conceptual layering model is used to specify initiator and target actions and how they relate to transmitted and received Protocol Data Units:

   - The SCSI layer builds/receives SCSI CDBs (Command Descriptor Blocks) and relays/receives them with the remaining command execute parameters (cf. SAM2) to/from the iSCSI layer.

   - The iSCSI layer builds/receives iSCSI PDUs and relays/receives them to/from one or more TCP connections that form an initiator-target "session".

Communication between the initiator and target occurs over one or more TCP connections. The TCP connections carry control messages, SCSI commands, parameters, and data within iSCSI Protocol Data Units (iSCSI PDUs). The group of TCP connections that link an initiator with a target form a session (loosely equivalent to a SCSI I-T nexus - see Section 2.4.2 SCSI Architecture Model). A session is defined by a session ID that is composed of an initiator part and a target part. TCP connections can be added to and removed from a session. Connections within a session are identified by a connection ID (CID).

Across all connections within a session, an initiator sees one "target image". All target identifying elements, such as LUN, are the same. A target also sees one "initiator image" across all connections within a session. Initiator identifying elements, such as the Initiator Task Tag, are global across the session regardless of the connection on which they are sent or received.

iSCSI targets and initiators MUST support at least one TCP connection and MAY support several connections in a session. For error recovery purposes, targets and initiators that support a single active connection in a session may have to support two connections during recovery.

2.2.2 Ordering and iSCSI Numbering

iSCSI uses Command and Status numbering schemes and a Data sequencing scheme.

Command numbering is session-wide and is used for ordered command delivery over multiple connections. It can also be used as a mechanism for command flow control over a session.

Status numbering is per connection and is used to enable missing status detection and recovery in the presence of transient or permanent communication errors.
Data sequencing is per command or part of a command (R2T triggered sequence) and is used to detect missing data and/or R2T PDUs due to header digest errors.

Typically, fields in the iSCSI PDUs communicate the Sequence Numbers between the initiator and target. During periods when traffic on a connection is unidirectional, iSCSI NOP-Out/In PDUs may be utilized to synchronize the command and status ordering counters of the target and initiator.

2.2.2.1 Command Numbering and Acknowledging

iSCSI supports ordered command delivery within a session. All commands (initiator-to-target PDUs) are numbered.

Many SCSI activities are related to a task (SAM2). The task is identified by the Initiator Task Tag for the life of the task.

Commands in transit from the initiator to the target are numbered by iSCSI; the number is carried by the iSCSI PDU as CmdSN (Command-Sequence-Number). The numbering is session-wide. Outgoing iSCSI PDUs carry this number. The iSCSI initiator allocates CmdSNs with a 32-bit unsigned counter (modulo 2**32). Comparisons and arithmetic on CmdSN use Serial Number Arithmetic as defined in [RFC1982] where SERIAL_BITS = 32.

Commands meant for immediate delivery are marked with an immediate delivery flag; they also carry CmdSN. CmdSN does not advance for commands marked for immediate delivery.

Command numbering starts with the first login request on the first connection of a session (the leading login on the leading connection) and command numbers are incremented by 1 for every non-immediate command issued afterwards.

If immediate delivery is used with task management commands, these commands may reach the target before the tasks on which they are supposed to act. For this reason the task management commands MUST carry the current CmdSN as a marker of their position in the stream of commands. The initiator and target must ensure that the task management commands act as specified by SAM2.
For example, both commands and responses appear as if delivered in order. Whenever CmdSN for an outgoing PDU is not specified by an explicit rule, CmdSN will carry the current value of the local CmdSN register (see later in this section).

The means by which one may request immediate delivery for a command or by which iSCSI decides by itself to mark a PDU for immediate delivery are beyond the scope of this document.

The number of commands used for immediate delivery is not limited and their delivery to execution is not acknowledged through the numbering scheme. Immediate commands can be rejected by the iSCSI target due to lack of resources. An iSCSI target MUST be able to handle at least one immediate task management command and one immediate non-task-management iSCSI command per connection at any time.

With the exception of the commands marked for immediate delivery, the iSCSI target layer MUST deliver the commands for execution in the order specified by CmdSN. Commands marked for immediate delivery may be handed over by the iSCSI target layer for execution as soon as detected. iSCSI may avoid delivering some commands for execution if required by a prior SCSI or iSCSI action (e.g., CLEAR TASK SET Task Management request received before all the commands on which it was supposed to act). Delivery for execution means delivery to the SCSI execution engine or an iSCSI-SCSI protocol specific execution engine (e.g., for text requests).

On any given connection, the iSCSI initiator MUST send the commands in increasing order of CmdSN, except for commands that are retransmitted due to digest error recovery and connection recovery.

The initiator and target are assumed to have the following three registers that are unique session wide and that define the numbering mechanism:

   - CmdSN - the current command Sequence Number, advanced by 1 on each command shipped except for commands marked for immediate delivery. CmdSN always contains the number to be assigned next.

   - ExpCmdSN - the next expected command by the target. The target acknowledges all commands up to, but not including, this number. The initiator has to mark the acknowledged commands as such as soon as a PDU with the corresponding ExpCmdSN is received. The target iSCSI layer sets the ExpCmdSN to the largest non-immediate CmdSN that it can deliver for execution plus 1 (no holes in the CmdSN sequence).

   - MaxCmdSN - the maximum number to be shipped. The queuing capacity of the receiving iSCSI layer is MaxCmdSN - ExpCmdSN + 1.

ExpCmdSN and MaxCmdSN are derived from target-to-initiator PDU fields. Comparisons and arithmetic on ExpCmdSN and MaxCmdSN MUST use Serial Number Arithmetic as defined in [RFC1982] where SERIAL_BITS = 32.

The target MUST NOT transmit a MaxCmdSN that is less than ExpCmdSN-1. For non-immediate commands, the CmdSN field can take any value from ExpCmdSN to MaxCmdSN inclusive. The target MUST silently ignore any non-immediate command outside of this range or non-immediate duplicates within the range. Note that the CmdSN carried by immediate commands may lie outside the ExpCmdSN to MaxCmdSN range (e.g., if the initiator has previously sent a non-immediate command carrying the CmdSN equal to MaxCmdSN - i.e., target window is closed). For group task management commands issued as immediate commands, CmdSN indicates the scope of the group action (e.g., on ABORT TASK SET - what commands get aborted).

MaxCmdSN and ExpCmdSN fields are processed by the initiator as follows:

   - If the PDU MaxCmdSN is less than the PDU ExpCmdSN-1 (in Serial Arithmetic Sense), they are both ignored.

   - If the PDU MaxCmdSN is greater than the local MaxCmdSN (in Serial Arithmetic Sense), it updates the local MaxCmdSN; otherwise, it is ignored.

   - If the PDU ExpCmdSN is greater than the local ExpCmdSN (in Serial Arithmetic Sense), it updates the local ExpCmdSN; otherwise, it is ignored.

This sequence is required because updates may arrive out of order, given that they travel on different TCP connections.

iSCSI initiators and targets MUST support the command numbering scheme.

A numbered iSCSI request will not change its allocated CmdSN, regardless of the number of times and circumstances in which it is reissued (see Section 6.1.1 Usage of Retry). At the target, it is assumed that CmdSN is relevant only while the command has not created any state related to its execution (execution state); afterwards, CmdSN becomes irrelevant. Testing for the execution state (represented by identifying the Initiator Task Tag) is assumed to precede any other action at the target, and is followed by ordering and delivery if no execution state is found or delivery if an execution state is found.

If an initiator issues a command retry for a command with CmdSN R on a connection when the session CmdSN register is Q, it MUST NOT advance the CmdSN past R + 2**31 - 1 unless the connection is no longer operational (has returned to the FREE state - see Section 5.1 Standard Connection State Diagrams), or the connection has been reinstated (see Section 4.3.4 Connection reinstatement), or a non-immediate command with CmdSN equal to or greater than Q was issued on the same connection and the reception of the command is acknowledged by the target (see Section 8.4 Command Retry and Cleaning Old Command Instances).

A target MUST NOT issue a command response or Data-In PDU with status before acknowledging the command. However, the acknowledgement can be included in the response or Data-In PDU itself.

2.2.2.2 Response/Status Numbering and Acknowledging

Responses in transit from the target to the initiator are numbered. The StatSN (Status Sequence Number) is used for this purpose.
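Before the details of status numbering, the command window rules of Section 2.2.2.1 can be made concrete. This is a minimal sketch, assuming the comparison rule of [RFC1982] with SERIAL_BITS = 32. The class CommandWindow and its method names are hypothetical and only mirror the three-step processing of MaxCmdSN and ExpCmdSN by the initiator.

```python
SERIAL_BITS = 32
MOD = 1 << SERIAL_BITS
HALF = 1 << (SERIAL_BITS - 1)

def serial_lt(a: int, b: int) -> bool:
    """True if a precedes b in RFC 1982 serial number arithmetic."""
    return a != b and ((b - a) % MOD) < HALF

def serial_gt(a: int, b: int) -> bool:
    return serial_lt(b, a)

class CommandWindow:
    """Initiator-side view of ExpCmdSN and MaxCmdSN (hypothetical helper)."""

    def __init__(self, exp_cmd_sn: int, max_cmd_sn: int):
        self.exp_cmd_sn = exp_cmd_sn
        self.max_cmd_sn = max_cmd_sn

    def update(self, pdu_exp: int, pdu_max: int) -> None:
        # Step 1: an inconsistent pair is ignored altogether.
        if serial_lt(pdu_max, (pdu_exp - 1) % MOD):
            return
        # Steps 2 and 3: each field only ever moves forward, so a stale
        # PDU arriving late on another connection changes nothing.
        if serial_gt(pdu_max, self.max_cmd_sn):
            self.max_cmd_sn = pdu_max
        if serial_gt(pdu_exp, self.exp_cmd_sn):
            self.exp_cmd_sn = pdu_exp

    def capacity(self) -> int:
        # Queuing capacity: MaxCmdSN - ExpCmdSN + 1 (modulo 2**32).
        return (self.max_cmd_sn - self.exp_cmd_sn + 1) % MOD

    def can_send(self, cmd_sn: int) -> bool:
        # Non-immediate CmdSN must lie in ExpCmdSN..MaxCmdSN inclusive.
        return (not serial_lt(cmd_sn, self.exp_cmd_sn)
                and not serial_gt(cmd_sn, self.max_cmd_sn))

# A window that straddles the 2**32 wrap-around point.
w = CommandWindow(exp_cmd_sn=0xFFFFFFFE, max_cmd_sn=0x00000001)
print(w.capacity(), w.can_send(0xFFFFFFFF), w.can_send(2))
```

The modulo-based comparison is what lets the window straddle the wrap-around point without special cases. A window with MaxCmdSN equal to ExpCmdSN - 1 has capacity 0, which is the closed-window case described above.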
StatSN is a counter maintained per connection. ExpStatSN is used by the initiator to acknowledge status. The status sequence number space is 32-bit unsigned integers and the arithmetic operations are the regular mod(2**32) arithmetic.

Status numbering starts with the Login response to the first Login request of the connection. The Login response includes an initial value for status numbering (any initial value is valid).

To enable command recovery, the target MAY maintain enough state information to enable data and status recovery after a connection failure. A target can discard all the state information maintained for recovery after the status delivery is acknowledged through ExpStatSN.

A large absolute difference between StatSN and ExpStatSN may indicate a failed connection. Initiators undertake recovery actions if the difference is greater than an implementation defined constant that SHOULD NOT exceed 2**31-1.

Initiators and Targets MUST support the response-numbering scheme.

2.2.2.3 Data Sequencing

Data and R2T PDUs, transferred as part of some command execution, MUST be sequenced. The DataSN field is used for data sequencing. For input (read) data PDUs, DataSN starts with 0 for the first data PDU of an input command and advances by 1 for each subsequent data PDU. For output data PDUs, DataSN starts with 0 for the first data PDU of a sequence (the initial unsolicited sequence or any data PDU sequence issued to satisfy an R2T) and advances by 1 for each subsequent data PDU. R2Ts are also sequenced per command. For example, the first R2T has an R2TSN of 0 and advances by 1 for each subsequent R2T. For bidirectional commands, the target uses the DataSN/R2TSN to sequence Data-In and R2T PDUs in one continuous sequence (undifferentiated). Unlike command and status, data PDUs and R2Ts are not acknowledged by a field in regular outgoing PDUs.
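As a sketch of how the DataSN rules above let an initiator notice missing Data-In PDUs for a single read command, consider the tracker below. The class name DataInTracker is hypothetical, and what the initiator does with a detected gap (for example, requesting retransmission) is outside the sketch.

```python
class DataInTracker:
    """Tracks DataSN for the Data-In PDUs of one read command."""

    def __init__(self) -> None:
        self.expected = 0  # DataSN starts with 0 for each input command

    def receive(self, data_sn: int) -> list[int]:
        """Record one Data-In PDU; return the DataSNs found missing."""
        if data_sn == self.expected:
            self.expected += 1
            return []
        if data_sn > self.expected:
            # One or more PDUs were dropped (e.g., header digest error).
            missing = list(range(self.expected, data_sn))
            self.expected = data_sn + 1
            return missing
        return []  # duplicate of an already seen PDU

tracker = DataInTracker()
for sn in (0, 1, 4, 5):
    gap = tracker.receive(sn)
    if gap:
        print("missing DataSN values:", gap)  # missing DataSN values: [2, 3]
```

Plain integer comparison is sufficient in this sketch because DataSN restarts at 0 for every command.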
Data-In PDUs can be acknowledged on demand by a special form of the SNACK PDU. Data and R2T PDUs are implicitly acknowledged by status. The DataSN/R2TSN field enables the initiator to detect missing data or R2T PDUs.

For any given read or bidirectional command, a target MUST issue fewer than 2**32 combined R2T and Data-In PDUs. Any output data sequence MUST contain fewer than 2**32 Data-Out PDUs.

2.2.3 iSCSI Login

The purpose of the iSCSI login is to enable a TCP connection for iSCSI use, authenticate the parties, negotiate the session's parameters and mark the connection as belonging to an iSCSI session.

A session is used to identify all the connections with a given initiator that belong to the same I_T nexus to a target. (See Section 2.4.2 SCSI Architecture Model for more details on how a session relates to an I_T nexus).

The targets listen on a well-known TCP port or other TCP port for incoming connections. The initiator begins the login process by connecting to one of these TCP ports.

As part of the login process, the initiator and target MAY wish to authenticate each other and set a security association protocol for the session. This can occur in many different ways and is subject to negotiation.

In order to protect the TCP connection, an IPsec security association MAY be established before the Login request. Using IPsec security for iSCSI is specified in Chapter 7 and in [SEC-IPS].

The iSCSI Login Phase is carried through Login requests and responses. Once suitable authentication has occurred and operational parameters have been set, the initiator may start to send SCSI commands. How the target chooses to authorize an initiator is beyond the scope of this document. A more detailed description of the Login Phase can be found in Chapter 4.

The login PDU includes the ISID part of the session ID (SSID).
The target portal group servicing the login is implied by the selection of the connection end-point. For a new session, the TSIH is zero. As part of the response, the target generates a TSIH.

During session establishment, the target identifies the SCSI initiator port (the "I" in the "I_T nexus") through the value pair (InitiatorName, ISID) (InitiatorName is described later in this section). Any persistent state (e.g., persistent reservations) on the target that is associated with a SCSI initiator port is identified based on this value pair. Any state associated with the SCSI target port (the "T" in the "I_T nexus") is identified externally by the TargetName and portal group tag (see Section 2.4.1 iSCSI Architecture Model) and internally in an implementation dependent way. As ISID is used to identify a persistent state, it is subject to reuse restrictions (see Section 2.4.3 Consequences of the Model).

Before the Full Feature Phase is established, only Login Request and Login Response PDUs are allowed. Any other PDU, when received at initiator or target, is a protocol error and MUST result in the connection being terminated. Login requests and responses MUST be used exclusively during Login. On any connection, the Login Phase MUST immediately follow TCP connection establishment, and only a single Login Phase is allowed before tearing down a connection.

2.2.4 iSCSI Full Feature Phase

Once the initiator is authorized to do so, the iSCSI session is in the iSCSI Full Feature Phase. A session is in Full Feature Phase after successfully finishing the Login Phase on the first (leading) connection of a session. A connection is in Full Feature Phase if the session is in Full Feature Phase and the connection login has completed successfully.
An iSCSI connection is not in Full Feature Phase

   a) when it does not have an established transport connection, or

   b) when it has a valid transport connection, but a successful login was not performed or the connection is currently logged out.

In a normal Full Feature Phase, the initiator may send SCSI commands and data to the various LUs on the target by wrapping them in iSCSI PDUs that go over the established iSCSI session.

For an iSCSI request issued over a TCP connection, the corresponding response and/or requested PDU(s) MUST be sent over the same connection. We call this "connection allegiance". If the original connection fails before the command is completed, the connection allegiance of the command may be explicitly reassigned to a different transport connection as described in detail in Section 6.1 Retry and Reassign in Recovery.

For SCSI commands that require data and/or a parameter transfer, the (optional) data and the status for the command MUST be sent over the same TCP connection to which the SCSI command is currently allegiant, illustrating the above rule.

Thus, if an initiator issues a READ command, the target MUST send the requested data, if any, followed by the status to the initiator over the same TCP connection that was used to deliver the SCSI command. If an initiator issues a WRITE command, the initiator MUST send the data, if any, for that command over the same TCP connection that was used to deliver the SCSI command. The target MUST return Ready To Transfer (R2T), if any, and the status over the same TCP connection that was used to deliver the SCSI command. Retransmission requests (SNACK PDUs) and the data and status that they generate MUST also use the same connection.

However, consecutive commands that are part of a SCSI linked command-chain task MAY use different connections. Connection allegiance is strictly per-command and not per-task.
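The connection allegiance rule can be pictured as a per-command mapping from Initiator Task Tag to connection ID. The sketch below is illustrative only: the class AllegianceTable and its methods are hypothetical names, and the actual reassignment procedure is the one described in Section 6.1.

```python
class AllegianceTable:
    """Maps each outstanding command (by Initiator Task Tag) to its CID."""

    def __init__(self) -> None:
        self._cid_by_itt: dict[int, int] = {}

    def command_issued(self, itt: int, cid: int) -> None:
        # Allegiance is established by the connection carrying the command.
        self._cid_by_itt[itt] = cid

    def is_allegiant(self, itt: int, cid: int) -> bool:
        # Data, R2T, status and SNACK traffic for a command must use
        # the connection to which the command is currently allegiant.
        return self._cid_by_itt.get(itt) == cid

    def reassign(self, itt: int, new_cid: int) -> None:
        # Explicit reassignment after the original connection failed.
        if itt not in self._cid_by_itt:
            raise KeyError(f"no outstanding command with tag {itt:#x}")
        self._cid_by_itt[itt] = new_cid

    def command_completed(self, itt: int) -> None:
        self._cid_by_itt.pop(itt, None)

table = AllegianceTable()
table.command_issued(itt=0x10, cid=1)
table.command_issued(itt=0x11, cid=2)   # a second command, another connection
print(table.is_allegiant(0x10, 1), table.is_allegiant(0x10, 2))
```

Keying the table by task tag rather than by task reflects that allegiance is per-command: two commands of the same session may be allegiant to different connections.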
During the iSCSI Full Feature Phase, the initiator and target MAY interleave unrelated SCSI commands, their SCSI Data, and responses over the session.

Outgoing SCSI data (initiator to target user data or command parameters) is sent as either solicited data or unsolicited data. Solicited data are sent in response to R2T PDUs. Unsolicited data can be sent as part of an iSCSI command PDU ("immediate data") or in separate iSCSI data PDUs.

An initiator may send unsolicited data up to FirstBurstLength as immediate (up to the negotiated maximum PDU length), in a separate PDU sequence or both. All subsequent data MUST be solicited. The maximum length of an individual data PDU or the immediate-part of the first unsolicited burst MAY be negotiated at login.

Targets operate in either solicited (R2T) data mode or unsolicited (non R2T) data mode. The maximum amount of unsolicited data that can be sent with a command is negotiated at login. A target MAY separately enable immediate data without enabling the more general (separate data PDUs) form of unsolicited data.

Unsolicited data on write are meant to reduce the effect of latency on throughput (no R2T is needed to start sending data). In addition, immediate data are meant to reduce the protocol overhead (both bandwidth and execution time).

An iSCSI initiator MAY choose to send no unsolicited data, only immediate data or FirstBurstLength bytes of unsolicited data with a command. If any non-immediate unsolicited data are sent, the total unsolicited data MUST be either the negotiated amount or all the data if the total amount is less than the negotiated amount for unsolicited data.

An initiator MUST honor an R2T data request for a valid outstanding command (i.e., carrying a valid Initiator Task Tag) and deliver all the requested data provided the command is supposed to deliver outgoing data and the R2T specifies data within the command bounds.
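The following sketch works through the arithmetic of the unsolicited data rules and the R2T bounds condition above. It is an illustration under stated assumptions: the function names plan_write and r2t_within_bounds and their parameter names are hypothetical, and the limits are taken as already negotiated values.

```python
def plan_write(total: int, first_burst: int, max_pdu: int,
               immediate: bool = True, separate: bool = True):
    """Split a write of `total` bytes into
    (immediate, separate unsolicited, solicited) byte counts."""
    unsolicited_limit = min(total, first_burst)
    # Immediate data travel in the command PDU, so they are also
    # bounded by the negotiated maximum PDU length.
    imm = min(unsolicited_limit, max_pdu) if immediate else 0
    # If separate unsolicited PDUs are used, the unsolicited total must
    # reach the negotiated amount (or all the data, if there is less).
    sep = unsolicited_limit - imm if separate else 0
    return imm, sep, total - imm - sep

def r2t_within_bounds(offset: int, length: int, expected: int) -> bool:
    """True if an R2T asks only for data inside the command bounds."""
    return offset >= 0 and length > 0 and offset + length <= expected

print(plan_write(total=100000, first_burst=65536, max_pdu=8192))
# (8192, 57344, 34464)
print(r2t_within_bounds(offset=65536, length=34464, expected=100000))
# True
```

In the first call the initiator sends 8192 bytes with the command, 57344 more bytes as separate unsolicited data PDUs, and waits for R2Ts covering the remaining 34464 bytes.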
The initiator actions on receiving an R2T request that specifies data wholly or partly outside the command bounds are unspecified.

It is considered an error for an initiator to send unsolicited data PDUs to a target that operates in R2T mode (only solicited data are allowed). It is also an error for an initiator to send more data, whether immediate or as separate PDUs, than the iSCSI limit for first burst.

A target SHOULD NOT silently discard data and then request retransmission through R2T. Initiators SHOULD NOT keep track of the data transferred to or from the target (scoreboarding); targets perform residual count calculation. Incoming data for initiators is always implicitly solicited. SCSI data packets are matched to their corresponding SCSI commands by using Tags specified in the protocol.

Initiator tags for pending commands are unique initiator-wide for a session. Target tags are not strictly specified by the protocol. It is assumed that target tags are used by the target to tag (alone or in combination with the LUN) the solicited data. Target tags are generated by the target and "echoed" by the initiator. The above mechanisms are designed to accomplish efficient data delivery and a large degree of control over the data flow.

iSCSI initiators and targets MUST also enforce some ordering rules. Unsolicited data MUST be sent on every connection in the same order in which commands were sent. A target that receives data out of order MAY terminate the session.

2.2.5 iSCSI Connection Termination

An iSCSI connection may be terminated by use of a transport connection shutdown, or a transport reset. Transport reset is assumed to be an exceptional event.

Graceful TCP connection shutdowns are done by sending TCP FINs. A graceful transport connection shutdown SHOULD be initiated by either party only when the connection is not in iSCSI Full Feature Phase.
A target MAY terminate a Full Feature Phase connection on internal exception events, but it SHOULD announce the fact through an Asynchronous Message PDU. Connection termination with outstanding commands may require recovery actions.

If a connection is terminated while in Full Feature Phase, connection cleanup (section 5) is required as a prelude to recovery. By doing connection cleanup before starting recovery, the initiator and target can avoid receiving stale PDUs after recovery.

2.2.6 iSCSI Names

Both targets and initiators require names for the purpose of identification, and so that iSCSI storage resources can be managed regardless of location (address). An iSCSI node name is also the SCSI device name of an iSCSI device. The iSCSI name of a SCSI device is the principal object used in authentication of targets to initiators and initiators to targets. This name is also used to identify and manage iSCSI storage resources.

iSCSI names must be unique within the operation domain of the end user. However, because the operation domain of an IP network is potentially worldwide, the iSCSI name formats are architected to be worldwide unique. To assist naming authorities in the construction of worldwide unique names, iSCSI provides two name formats for different types of naming authorities.

iSCSI names are associated with iSCSI nodes, not iSCSI network adapter cards, to ensure the replacement of network adapter cards does not require reconfiguration of all SCSI and iSCSI resource allocation information.

Some SCSI commands require that protocol-specific identifiers be communicated within SCSI CDBs. See Section 2.4.2 SCSI Architecture Model for the definition of the SCSI port name/identifier for iSCSI ports.

An initiator may discover the iSCSI Target Names to which it has access, along with their addresses, using the SendTargets text request, or other techniques discussed in [NDT].
2.2.6.1 iSCSI Name Requirements

Each iSCSI node, whether an initiator or target, MUST have an iSCSI name.

Initiators and targets MUST support the receipt of iSCSI names of up to the maximum length of 255 bytes.

The initiator MUST present both its iSCSI Initiator Name and the iSCSI Target Name to which it wishes to connect in the first login request of a new session or connection. The only exception is if a discovery session (see Section 2.3 iSCSI Session Types) is to be established; the iSCSI Initiator Name is still required, but the iSCSI Target Name may be ignored.

iSCSI names must adhere to the following requirements:

   a) iSCSI names must be globally unique. No two initiators or targets should have the same name.

   b) iSCSI names must be permanent. An iSCSI initiator or target has the same name for its lifetime.

   c) iSCSI names do not imply a location or address. An iSCSI initiator or target can move, or have multiple addresses. A change of address does not imply a change of name.

   d) iSCSI names must not rely on a central name broker; the naming authority must be distributed.

   e) iSCSI names must support integration with existing unique naming schemes.

   f) iSCSI names must rely on existing naming authorities. iSCSI does not have to create its own naming authority.

The encoding of an iSCSI name also has some requirements:

   a) iSCSI names must have a single encoding method when transmitted over various protocols.

   b) iSCSI names must be relatively simple to compare. The algorithm for comparing two iSCSI names for equivalence must not rely on any external server.

   c) iSCSI names must be composed of displayable characters only. iSCSI names should be kept as simple as possible. They must provide for the use of international character sets, and must not be case sensitive. Whitespace characters are not allowed.

   d) iSCSI names must be transport-friendly. They must be transported using both binary and ASCII-based protocols.

An iSCSI name really names a logical software entity, and is not tied to a port or other hardware that can be changed. For instance, an initiator name should name the iSCSI initiator node, not a particular NIC or HBA. When multiple NICs are used, they should generally all present the same iSCSI initiator name to the targets, because they are just paths to the same SCSI layer. In most operating systems, the named entity is the operating system image.

A target name should similarly not be tied to hardware interfaces which can be changed. A target name should identify the logical target, and must be the same for the target regardless of the physical portion being addressed. This assists iSCSI initiators in determining that two targets they have discovered are really two paths to the same target.

The iSCSI name is designed to fulfill the functional requirements for Uniform Resource Names (URN) [RFC1737]. For example, it is required that the name have a global scope, independent of address or location, and that it be persistent and globally unique. Names must be extensible, and scale with the use of naming authorities. The encoding of the name should be readable by a human, as well as be machine-readable. See [RFC1737] for further requirements.
2.2.6.2 iSCSI Name Encoding An iSCSI name MUST be a UTF-8 encoding of a string of Unicode charac- ters, with the following properties: - it is in Normalization Form C (see "Unicode Normalization Forms" [UNICODE]) - it contains only the following characters: - ASCII dash ('-'=U+002d) - ASCII dot ('.'=U+002e) - ASCII colon (':'=U+003a) - Any character allowed by the output of the iSCSI stringprep template (described in [STPREP-iSCSI]) - when encoded in UTF-8, it is no larger than 255 bytes The stringprep process is described in [STPREP]; iSCSI's use of the stringprep process is described in [STPREP-iSCSI]. Stringprep is a method designed by the Internationalized Domain Name (IDN) working group to translate human-typed strings into a format that can be com- pared as opaque strings. Strings must not include punctuation, spac- ing, diacritical marks, or other characters that could get in the way of readability. The stringprep process also converts strings into equivalent strings of lower-case characters. Note that in most cases, the Stringprep process does not need to be implemented if the names are generated using only lower-case (any character set) alpha-numeric characters. Once iSCSI names encoded in UTF-8 are "normalized" (there is one and only one representation for each possible name), they may be safely compared byte-for-byte. 2.2.6.3 iSCSI Name Structure An iSCSI name consists of two parts - a type designator followed by a unique name string. Julian Satran Expires February 2003 46 iSCSI 1-July-02 The iSCSI name does not define any new naming authorities. Instead, it supports two existing ways of designating naming authorities: an iSCSI-Qualified Name, using domain names to identify a naming author- ity, and the EUI format, where the IEEE Registration Authority assists in the formation of world wide unique names (EUI-64 format). The type designator strings that may currently be used are: iqn. - iSCSI Qualified name eui. 
- Remainder of the string is an IEEE EUI-64 identi- fier, in ASCII-encoded hexadecimal. As these two naming authority designators will suffice in nearly every case for both software and hardware-based entities, the cre- ation of additional type designators is prohibited. One of these two type strings MUST be used when constructing an iSCSI name; any type string not listed here is not allowed, as they cannot be guaranteed to be unique. 2.2.6.3.1 Type "iqn." (iSCSI Qualified Name) This iSCSI name type can be used by any organization which owns a domain name. This naming format is useful when an end user or ser- vice provider wishes to assign iSCSI names for targets and/or initia- tors. To generate names of this type, the person or organization generat- ing the name must own a DNS domain name. This domain name does not have to be active, and does not have to resolve to an address; it just needs to be reserved to prevent others from generating iSCSI names using the same domain name. Because a domain name can expire, be acquired by another entity, and might be used to generate iSCSI names by both owners, the domain name must be additionally qualified by a date during which the naming authority owned the domain name. A date code is provided as part of the "iqn." format for this reason. The iSCSI qualified name string consists of: - The string "iqn.", used to distinguish these names from "eui." formatted names. - A date code, in yyyy-mm format. This date MUST be a date during which the naming authority owned the domain name used Julian Satran Expires February 2003 47 iSCSI 1-July-02 in this format, and SHOULD be the date on which the domain name was acquired by this naming authority. This date code uses the Gregorian calendar. All four digits in the year must be present. Both digits of the month must be present, with January == "01" and December == "12". The dash must be included. - Another ".". 
   - The reversed domain name of the naming authority (person or organization) creating this iSCSI name.
   - Another ".".
   - Any string, within the character set and length boundaries, that the owner of the domain name deems appropriate. This may contain product types, serial numbers, host identifiers, software keys, or anything else that makes sense to uniquely identify the initiator or target.

Everything after the reversed domain name, followed by another dot ".", can be assigned as desired by the owner of the domain name. It is the responsibility of the entity that is the naming authority to ensure that the iSCSI names it assigns are world wide unique. For example, "ACME Storage Arrays, Inc.", might own the domain name "acme.com".

The following are examples of iSCSI qualified names that might be generated by "ACME Storage Arrays, Inc.":

             Organization  Subgroup Naming Authority
                Naming     and/or string defined by
   Type Date     Auth      "acme.com" Naming Authority
   +--++-----+ +------+ +--------------------------------+
   |  ||     | |      | |                                |
   iqn.2001-04.com.acme.diskarrays-sn-a8675309
   iqn.2001-04.com.acme.storage:tape.sys1.xyz
   iqn.2001-04.com.acme.storage.tape:sys1.xyz

2.2.6.3.2 Type "eui." (IEEE EUI-64 format)

The IEEE Registration Authority provides a service for assigning globally unique identifiers [EUI]. The EUI-64 format is in use as a global identifier in other network protocols such as Fibre Channel. See http://standards.ieee.org/regauth/oui/index.shtml for more information on registering for EUI identifiers.

The format is "eui." followed by an EUI-64 identifier (16 ASCII-encoded hexadecimal digits).

Example iSCSI name:

   Type  EUI-64 identifier (ASCII-encoded hexadecimal)
   +--++--------------+
   |  ||              |
   eui.02004567A425678D

The IEEE EUI-64 iSCSI name format might be used when a manufacturer is already registered with the IEEE Registration Authority and uses EUI-64 formatted world wide unique names for its products.
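
The format rules for the two name types can be checked mechanically. The following Python sketch is illustrative only and is not part of the protocol: the function name parse_iscsi_name and the layout of the returned dictionary are hypothetical. It checks Normalization Form C, the 255-byte UTF-8 limit, and the "iqn." and "eui." layouts; it does NOT implement the stringprep profile of [STPREP-iSCSI]. Because the reversed domain name and the owner-assigned string are both dot-separated, the sketch makes no attempt to tell them apart and returns them as one remainder.

```python
import re
import unicodedata

MAX_NAME_BYTES = 255  # limit on the UTF-8 encoded length of a name

# Hypothetical helper patterns; [0-9] is used instead of \d so that
# only ASCII digits are accepted.
_IQN_RE = re.compile(r"iqn\.([0-9]{4})-([0-9]{2})\.(.+)")
_EUI_RE = re.compile(r"eui\.([0-9A-Fa-f]{16})")


def parse_iscsi_name(name):
    """Split an iSCSI name into its type designator and remaining parts.

    Raises ValueError if the name breaks one of the checked rules.
    Stringprep processing is assumed to have happened elsewhere.
    """
    if unicodedata.normalize("NFC", name) != name:
        raise ValueError("name is not in Normalization Form C")
    if len(name.encode("utf-8")) > MAX_NAME_BYTES:
        raise ValueError("name is larger than 255 bytes in UTF-8")

    match = _IQN_RE.fullmatch(name)
    if match:
        year = int(match.group(1))
        month = int(match.group(2))
        if month not in range(1, 13):
            raise ValueError("month must be 01 through 12")
        # 'rest' holds the reversed domain name and the owner-assigned
        # string together; the boundary is not recoverable from the name.
        return {"type": "iqn", "year": year, "month": month,
                "rest": match.group(3)}

    match = _EUI_RE.fullmatch(name)
    if match:
        return {"type": "eui", "eui64": match.group(1)}

    raise ValueError("name is not a valid 'iqn.' or 'eui.' name")
```

For instance, parse_iscsi_name("iqn.2001-04.com.acme.storage:tape.sys1.xyz") yields the date 2001-04 and the remainder "com.acme.storage:tape.sys1.xyz", while a name with any other type designator is rejected.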
More examples of name construction are discussed in [NDT].

2.2.7 Persistent State

iSCSI does not require any persistent state maintenance across sessions. However, in some cases, SCSI requires persistent identification of the SCSI initiator port name (for iSCSI, the InitiatorName plus the ISID portion of the session identifier). (See Section 2.4.2 SCSI Architecture Model and Section 2.4.3 Consequences of the Model.)

iSCSI sessions do not persist through power cycles and boot operations.

All iSCSI session and connection parameters are re-initialized on session and connection creation.

Commands persist beyond connection termination if the session persists and command recovery within the session is supported. However, when a connection is dropped, command execution, as perceived by iSCSI (i.e., involving iSCSI protocol exchanges for the affected task), is suspended until a new allegiance is established by the 'task reassign' task management function. (See Section 9.5 Task Management Function Request.)

2.2.8 Message Synchronization and Steering

2.2.8.1 Rationale

iSCSI presents a mapping of the SCSI protocol onto TCP. This encapsulation is accomplished by sending iSCSI PDUs of varying lengths. Unfortunately, TCP does not have a built-in mechanism for signaling message boundaries at the TCP layer. iSCSI overcomes this obstacle by placing the message length in the iSCSI message header. This serves to delineate the end of the current message as well as the beginning of the next message.

In situations where IP packets are delivered in order from the network, iSCSI message framing is not an issue and messages are processed one after the other. In the presence of IP packet reordering (i.e., frames being dropped), legacy TCP implementations store the "out of order" TCP segments in temporary buffers until the missing TCP segments arrive, upon which the data must be copied to the application buffers.
In iSCSI, it is desirable to steer the SCSI data within these out of order TCP segments into the pre-allocated SCSI buffers rather than store them in temporary buffers. This decreases the need for dedicated reassembly buffers as well as the latency and bandwidth related to extra copies.

Relying solely on the "message length" information from the iSCSI message header may make it impossible to find iSCSI message boundaries in subsequent TCP segments due to the loss of a TCP segment that contains the iSCSI message length. The missing TCP segment(s) must be received before any of the following segments can be steered to the correct SCSI buffers (due to the inability to determine the iSCSI message boundaries). Because these segments cannot be steered to the correct location, they must be saved in temporary buffers that must then be copied to the SCSI buffers.

Different schemes can be used to recover synchronization. One of these schemes is detailed in Appendix A. - Sync and Steering with Fixed Interval Markers -. To make these schemes work, iSCSI implementations have to make sure that the appropriate protocol layers are provided with enough information to implement a synchronization and/or data steering mechanism.

2.2.8.2 Synchronization (sync) and Steering Functional Model

We assume that iSCSI is implemented according to the following layering scheme:

   +------------------------+
   |          SCSI          |
   +------------------------+
   |         iSCSI          |
   +------------------------+
   |   Sync and Steering    |
   | +-------------------+  |
   | |        TCP        |  |
   | +-------------------+  |
   +------------------------+
   | Lower Functional Layers|
   |         (LFL)          |
   +------------------------+
   |           IP           |
   +------------------------+
   |          Link          |
   +------------------------+

In this model, LFL can be IPsec (a mechanism changing the IP stream and invisible to TCP). We assume that Sync and Steering operates just underneath iSCSI.
An implementation may choose to place Sync and Steering somewhere else in the stack if it can translate the information kept by iSCSI into terms valid for the chosen layer.

According to our layering model, iSCSI considers the information it delivers to the Sync and Steering layer (headers and payloads) as a contiguous stream of bytes mapped to the non-negative integers from 0 to infinity. In practice, though, iSCSI is not expected to handle infinitely long streams; stream addressing will wrap around at 2**32-1.

This model assumes that the iSCSI layer will deliver complete PDUs to underlying layers in single (atomic) operations. The underlying layer does not need to examine the stream content to discover the PDU boundaries. If a specific implementation performs PDU delivery to the Sync and Steering layer through multiple operations, it MUST bracket an operation set used to deliver a single PDU in a manner that the Sync and Steering Layer can understand.

The Sync and Steering Layer (which is OPTIONAL) MUST retain the PDU end address within the stream for every delivered iSCSI PDU.

To enable the Sync and Steering operation to perform Steering, additional information, including identifying tags and buffer offsets, MUST also be retained for every sent PDU. The Sync and Steering Layer is required to add enough information to every sent data item (IP packet, TCP packet or some other superstructure) to enable the receiver to steer it to a memory location independent of any other piece.

If the transmission stream is built dynamically, this information is used to insert Sync and Steering information in the transmission stream (at first transmission or at re-transmission) either through a globally accessible table or a call-back mechanism. If the transmission stream is built statically, the Sync and Steering information is inserted in the transmission stream when data are first presented to Sync and Steering.
The retained information can be released whenever the transmitted data are acknowledged by the receiver (in the case of dynamically built streams, by deletion from the global table or by an additional callback).

On the outgoing path, the Sync and Steering layer MUST map the outgoing stream addresses from iSCSI stream addresses to TCP stream sequence numbers.

On the incoming path, the Sync and Steering layer extracts the Sync and Steering information from the TCP stream. It then helps steer (place) the data stream to its final location and/or recover iSCSI PDU boundaries when some TCP packets are lost or received out of order. The data stream seen by the receiving iSCSI layer is identical to the data stream that left the sending iSCSI layer. The Sync and Steering information is kept until the PDUs to which it refers are completely processed by the iSCSI layer.

On the incoming path, the Sync and Steering layer does not change the way TCP notifies iSCSI about in-order data arrival. All data placements, in-order or out-of-order, performed by the Sync and Steering layer are hidden from iSCSI, while conventional, in-order data arrival notifications generated by TCP are passed through to iSCSI.
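
The sender-side bookkeeping described in this section (retaining the PDU end address, tag, and buffer offset for every delivered PDU, mapping iSCSI stream addresses to TCP sequence numbers, and releasing retained information once data are acknowledged) can be pictured with the following Python sketch. It is a minimal illustration under stated assumptions, not a definition of the layer: the names PduRecord and SteeringTracker are hypothetical, the "end address" is taken to be the stream address just past the last byte of the PDU, stream addresses are taken modulo 2**32, and acknowledgements are modeled as a cumulative byte count rather than as raw TCP ACK numbers.

```python
from collections import deque
from dataclasses import dataclass

STREAM_MODULUS = 2 ** 32  # assumption: 32-bit stream addresses


@dataclass
class PduRecord:
    """Information retained for one delivered PDU (hypothetical layout)."""
    end_address: int    # stream address just past the PDU, modulo 2**32
    tag: int            # identifying tag used for steering
    buffer_offset: int  # offset into the buffer named by the tag


class SteeringTracker:
    """Sender-side sketch of the retained Sync and Steering state."""

    def __init__(self, initial_tcp_seq):
        # TCP sequence number that corresponds to iSCSI stream address 0.
        self.initial_tcp_seq = initial_tcp_seq % STREAM_MODULUS
        self.sent_bytes = 0      # total bytes delivered by iSCSI so far
        self.retained = deque()  # (absolute end, PduRecord), oldest first

    def deliver_pdu(self, length, tag, buffer_offset):
        """Record a complete PDU delivered in one (atomic) operation."""
        self.sent_bytes += length
        record = PduRecord(self.sent_bytes % STREAM_MODULUS,
                           tag, buffer_offset)
        self.retained.append((self.sent_bytes, record))
        return record

    def to_tcp_seq(self, stream_address):
        """Map an iSCSI stream address to a TCP sequence number."""
        return (self.initial_tcp_seq + stream_address) % STREAM_MODULUS

    def acknowledge(self, acked_total):
        """Release records for PDUs that lie entirely within the first
        acked_total bytes of the stream."""
        while self.retained and acked_total >= self.retained[0][0]:
            self.retained.popleft()
```

A tracker created with an initial TCP sequence number of 2**32 - 10 maps stream address 48 to TCP sequence number 38, which shows the wrap-around; after two PDUs of 48 and 100 bytes are delivered, acknowledging 48 bytes releases only the first record.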

2.2.8.3 Sync and Steering and Other Encapsulation Layers

We recognize that in many environments the following is a more appropriate layering model:

   +----------------------------------+
   |               SCSI               |
   +----------------------------------+
   |              iSCSI               |
   +----------------------------------+
   |  Upper Functional Layers (UFL)   |
   +----------------------------------+
   |        Sync and Steering         |
   | +-----------------------------+  |
   | |             TCP             |  |
   | +-----------------------------+  |
   +----------------------------------+
   |  Lower Functional Layers (LFL)   |
   +----------------------------------+
   |                IP                |
   +----------------------------------+
   |               Link               |
   +----------------------------------+

In this model, UFL can be TLS (see [RFC2246]) or some other transport conversion mechanism (a mechanism that changes the TCP stream, but that is transparent to iSCSI).

To be effective and act on reception of TCP packets out of order, Sync and Steering has to be underneath UFL, and Sync and Steering data must be left out of any UFL transformation (encryption, compression, padding, etc.). However, Sync and Steering MUST take into account the additional data inserted in the stream by UFL. Sync and Steering MAY also restrict the type of transformations UFL may perform on the stream.

This makes implementation of Sync and Steering in the presence of otherwise opaque UFLs less attractive.

2.2.8.4 Sync/Steering and iSCSI PDU Length

When a large iSCSI message is sent, the TCP segment(s) that contain the iSCSI header may be lost. The remaining TCP segment(s) up to the next iSCSI message must be buffered (in temporary buffers) because the iSCSI header that indicates to which SCSI buffers the data are to be steered was lost. To minimize the amount of buffering, it is recommended that the iSCSI PDU length be restricted to a small value (perhaps a few TCP segments in length).
During login, each end of the iSCSI session specifies the maximum iSCSI PDU length it will accept.

2.3 iSCSI Session Types

iSCSI defines two types of sessions:

   a) Normal operational session - an unrestricted session.
   b) Discovery-session - a session opened only for target discovery; the target MAY accept only text requests with the SendTargets key and a logout request with reason "close the session".

The session type is defined during login with a key=value parameter in the login command.

2.4 SCSI to iSCSI Concepts Mapping Model

The following diagram shows an example of how multiple iSCSI Nodes (targets in this case) can coexist within the same Network Entity and can share Network Portals (IP addresses and TCP ports). Other more complex configurations are also possible. See Section 2.4.1 iSCSI Architecture Model for detailed descriptions of the components of these diagrams.

   +-----------------------------------+
   |  Network Entity (iSCSI Client)    |
   |                                   |
   |         +-------------+           |
   |         | iSCSI Node  |           |
   |         | (Initiator) |           |
   |         +-------------+           |
   |            |       |              |
   | +--------------+ +--------------+ |
   | |Network Portal| |Network Portal| |
   | |   10.1.30.4  | |   10.1.40.6  | |
   +-+--------------+-+--------------+-+
            |               |
            |  IP Networks  |
            |               |
   +-+--------------+-+--------------+-+
   | |Network Portal| |Network Portal| |
   | |  10.1.30.21  | |   10.1.40.3  | |
   | | TCP Port 3260| | TCP Port 3260| |
   | +--------------+ +--------------+ |
   |        |               |          |
   |        -----------------          |
   |           |         |             |
   |  +-------------+ +--------------+ |
   |  | iSCSI Node  | | iSCSI Node   | |
   |  |  (Target)   | |  (Target)    | |
   |  +-------------+ +--------------+ |
   |                                   |
   |   Network Entity (iSCSI Server)   |
   +-----------------------------------+

2.4.1 iSCSI Architecture Model

This section describes the part of the iSCSI architecture model that has the most bearing on the relationship between iSCSI and the SCSI Architecture Model.

   a) Network Entity - represents a device or gateway that is accessible from the IP network.
A Network Entity must have one or more Network Portals (see item d), each of which can be used b