Internet Draft                                    Douglas Otis
Expires October 2001                              SANlight
                                                  April 19, 2001

iSCSI Full Acknowledgement
draft-otis-iscsi-fullack-00.txt

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1].

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or made obsolete by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

This document illustrates potential modifications to the iSCSI protocol proposal (draft-ietf-ips-iscsi-05+.txt). These changes are intended to provide a means to:

- Ensure that Task Management responses are coherent.
- Acknowledge ALL requests delivered to the Server.
- Ensure the integrity of the iSCSI request window.
- Open the request window during abnormal events.
- Quickly eliminate invalidated requests.
- Quickly expunge sequence holes.
- Simplify the reception sequencer.

draft-otis-iscsi-fullack-00.txt Page [2]

Problem Statement:

The iSCSI Service Delivery Subsystem provides a means of exchanging Device Service and Task Management requests, together with their associated data and responses, between Client and Server. The Service Delivery Subsystem is assumed to deliver requests and responses between the Client and Server without error (see SCSI Architecture Model-2). This model assumes that the Service Delivery Subsystem enforces state synchronization transparently to the Server.
The iSCSI Service Delivery Subsystem is also assumed to be able to provide sequential delivery between the Client's and the Server's Service Delivery Ports. Although the SAM-2 model merely assumes sequential delivery, the iSCSI protocol makes it a requirement in order to minimize complexity. The Service Delivery Subsystem must also ensure that responses to Task Management requests are delivered within the sequence of Server responses; otherwise, task state information becomes corrupt.

iSCSI uses TCP as a transport, which in general ensures reliable, sequential exchanges. Much of the complexity in meeting the sequential delivery requirement arises from the use of multiple TCP connections. Multiple connections provide increased reliability and capacity through multiple paths or adapters; they are not intended as a means of capturing additional network bandwidth. Because these connections are expected to traverse different physical equipment, their relative latencies are expected to diverge.

To ensure sequential delivery, all Client requests are serialized session-wide, with connection allegiance between each Client request and its Server response. The exception to this serialization and sequential delivery is a request that is to be presented to the Server ahead of requests already within the Service Delivery Subsystem. If limited to a single instance, the currently pending provision of using a flag with a non-incremented serial number, rather than a null serial number, allows this request to be identified relative to requests on other connections. If successive ahead-of-sequence requests, as well as the subsequent normal request carrying the same serial number, are limited to the same connection, then the relative position of these requests can also be determined.

Session-wide serialization of requests provides two functions. First, it allows simple detection of requests that may have been repeated. TCP runs over connectionless IP and, as a result, does not provide an immediate indication of communication loss.
TCP will eventually detect communication loss, but perhaps well after iSCSI has attempted corrective action. Second, session-wide request serialization allows sequential delivery to the Server as well as timely acknowledgement of reception.

When the Service Delivery Subsystem attempts corrective action, the suspect connection is terminated by a new TCP connection performing a Login Restart with the same connection ID. The new connection also acknowledges the response serialization, and the prior connection allegiance is transferred to it. Potential problems arise because the Service Delivery Subsystem does not acknowledge ahead-of-sequence requests; if there are successive ahead-of-sequence requests, repeated requests cannot be identified during corrective action without examining Client Tags.

There is not necessarily a one-to-one relationship between a Task Management request and the tasks it affects. This creates a state synchronization problem, because responses returning to the Client are serialized independently on each connection. As a result, a Task Management response may be seen out of context relative to other Server responses. Task Management requests are identified to the Service Delivery Subsystem, which allows for special handling.

Sequential delivery to the Server through a request window presents an additional problem. The iSCSI Service Delivery Subsystem spans multiple Logical Units. Task Management is generally limited to one outstanding request per Logical Unit, yet there is provision for only one additional Task Management request flagged ahead-of-sequence. Successive Task Management requests will therefore carry the same serial number, and at least enough spare resources must be set aside to accommodate one such request per Logical Unit.

Sequential delivery potentially presents another problem, depending on Logical Unit hierarchies and the related delivery structures.
If the Logical Unit follows a simple flat model, delivery may be stopped by a lack of associated resources combined with a busy unit. If the acknowledgement returned by the Service Delivery Subsystem signifies delivery, acknowledgement stops until resources are freed.

The event created by an ahead-of-sequence or Task Management request will likely invalidate requests in transit within the Service Delivery Subsystem. The size and latency of this in-transit request queue may be problematic for applications that are unlikely to anticipate this unusual situation. To reject these requests, the Logical Unit must enter an ACA (Auto Contingent Allegiance) condition, which may persist beyond normal fabric timeouts. Because iSCSI may encompass various SCSI models, this interlocking mechanism for purging in-transit requests may not exist.

Solutions and Benefits:

To preserve the proper context of a response to a Task Management request, that response must not appear before prior Server responses. Server response serialization can be made session-wide, in the same manner as Client request serialization. The benefits are that Server resources can be freed without a response directed specifically to each connection, that logs of Server responses can be compiled coherently, and that a connection failure becomes apparent at the Client across all connections. This last point is important because the Client is expected to initiate recovery action.

Because ahead-of-sequence requests, which are most likely Task Management requests, do not increment the request serial number, they receive no Service Delivery Subsystem acknowledgement and carry no simple sequential sorting variable. Apart from the sorting problem, which is solved by examining the Client Tag, this technique keeps the sequencer's handling of these events simple; however, another technique would afford even greater simplicity. A side effect of these requests is the likely invalidation of many in-transit requests.
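The session-wide response serialization proposed under Solutions and Benefits can be illustrated with a minimal Client-side sketch, written in C to match the C-like sequencer fragment in this draft. It assumes 32-bit serial numbers compared with RFC 1982-style serial arithmetic; the names session_rx, resp_class, sn_before and classify_response are hypothetical. Because every connection draws from one counter, a response that arrives early on one connection exposes a response still missing from another.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: Client-side check of a single response serial
 * number shared by every connection in a session. Type and function
 * names are illustrative only, not taken from any iSCSI draft.
 */
typedef enum {
    RESP_IN_ORDER,  /* the next expected response: consume it          */
    RESP_DUPLICATE, /* behind the expected value: already seen          */
    RESP_EARLY      /* ahead of it: an earlier response is outstanding, */
                    /* possibly delayed or lost on another connection   */
} resp_class;

typedef struct {
    uint32_t exp_resp_sn; /* next expected session-wide response SN */
} session_rx;

/* RFC 1982-style comparison for 32-bit serial numbers: true if a precedes b. */
bool sn_before(uint32_t a, uint32_t b)
{
    return a != b && (uint32_t)(b - a) < 0x80000000u;
}

/* Classify a response that arrived on ANY connection of the session. */
resp_class classify_response(session_rx *s, uint32_t resp_sn)
{
    if (resp_sn == s->exp_resp_sn) {
        s->exp_resp_sn++; /* unsigned wrap-around is well defined */
        return RESP_IN_ORDER;
    }
    if (sn_before(resp_sn, s->exp_resp_sn))
        return RESP_DUPLICATE;
    return RESP_EARLY;
}
```

The sketch only classifies responses. A Client would presumably hold RESP_EARLY responses and treat a hole that persists across all connections as the connection-failure signal described above; how long to wait before acting is left open here.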
Instead of creating a special case for ahead-of-sequence request serialization, these requests should be serialized in the same manner as all other requests. The mechanism for advancing them, however, is to reject all prior pending non-ahead-of-sequence requests back to the Client. This has the advantage of instantly opening the request window to its maximum. No additional set-aside resources need to be allotted to handle ahead-of-sequence requests.

The reject status could either indicate the range of rejected requests, or each request could be rejected individually; either way, the Client is then free to purge or retry those requests as required. This approach does not require the Service Delivery Subsystem to interpret the nature of the SCSI requests treated in this fashion. It also ensures timely removal of enqueued requests, well within typical fabric timeouts.

This rejection technique can also simplify recovery from a terminated connection, as the serial numbers of the failed connection need not be recalled for recovery, nor are timeouts required to discover the sequence holes created by the connection termination. The technique also maintains the integrity of the iSCSI request window, and it eliminates the potentially sizeable amount of resources that would otherwise have to be set aside.

If the sequencer (the sequencing function within the Server side of the Service Delivery Subsystem) were unable to deliver a request, sending an override of that request would create uncertainty: without explicit status, it would be unknown whether progress continued because the prior request was accepted or because the override had taken effect. The rejection structure is already in place to allow the sequencer simply to be advanced to the point of this ahead-of-sequence request.
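The rejection technique can be sketched in C as follows. This is a minimal illustration under stated assumptions, not a specification: the fixed window size (WINDOW), the sequencer structure, the function names, and reporting the rejected range as (rej_first, rej_count) rather than as individual rejects are all hypothetical. It shows how an ahead-of-sequence request that carries an ordinary serial number both expunges sequence holes and reopens the request window without any timeout.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the rejection technique. WINDOW, the struct
 * and the function names are assumptions made for illustration; the
 * last rejected range is reported as (rej_first, rej_count).
 */
#define WINDOW 16u /* request window size; assumed fixed */

typedef struct {
    uint32_t next_sn;      /* next serial number to deliver to the Server */
    bool     held[WINDOW]; /* received but waiting behind a sequence hole */
    uint32_t delivered;    /* requests handed to the Server so far        */
    uint32_t rej_first;    /* first SN of the last range rejected         */
    uint32_t rej_count;    /* size of that range (0 if none)              */
} sequencer;

/* Deliver every consecutive held request starting at next_sn. */
static void drain(sequencer *q)
{
    while (q->held[q->next_sn % WINDOW]) {
        q->held[q->next_sn % WINDOW] = false;
        q->delivered++;
        q->next_sn++;
    }
}

/* Normal request. Returns false (reject) if sn is behind next_sn or
 * beyond the window; unsigned subtraction makes both cases large. */
bool seq_receive(sequencer *q, uint32_t sn)
{
    if ((uint32_t)(sn - q->next_sn) >= WINDOW)
        return false;
    q->held[sn % WINDOW] = true;
    drain(q);
    return true;
}

/* Ahead-of-sequence request carrying an ordinary, incremented SN.
 * Every earlier undelivered SN, whether held or a hole, is rejected
 * back to the Client as one range; sn is then delivered at once. */
bool seq_ahead_of_sequence(sequencer *q, uint32_t sn)
{
    uint32_t gap = (uint32_t)(sn - q->next_sn);

    if (gap >= WINDOW)
        return false;
    q->rej_first = q->next_sn;
    q->rej_count = gap;
    for (uint32_t i = q->next_sn; i != sn; i++)
        q->held[i % WINDOW] = false; /* purge held request or hole */
    q->held[sn % WINDOW] = false;
    q->delivered++;                  /* the ahead-of-sequence request */
    q->next_sn = sn + 1;             /* window reopens from here */
    drain(q);                        /* later held requests may now flow */
    return true;
}
```

For example, starting at serial number 5: request 5 is delivered, request 6 is lost with a failed connection, and requests 7 and 8 are held behind that hole. An ahead-of-sequence request carrying 9 rejects the range 6 through 8 back to the Client, is delivered immediately, and leaves the sequencer expecting 10; a late copy of request 7 is then rejected as stale.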
The sequencer process could look something like the following, assuming 32-bit unsigned serial numbers (SERIAL_BITS = 32, so subtraction wraps modulo 2^32):

      /* request_SN precedes next_request_SN in serial-number
         arithmetic: the request was invalidated, so reject it. */
      if ((uint32_t)(request_SN - next_request_SN) >
          (1UL << (SERIAL_BITS - 1))) {
          reject_pdu(request_SN, SEQUENCER_INVALIDATION);
      } else if (request_SN == next_request_SN) {
          /* In sequence: deliver to the Server and advance. */
          send_pdu(request_SN);
          next_request_SN++;
      }
      /* Otherwise the request is ahead of next_request_SN and waits. */

Upon receipt of a request flagged ahead-of-sequence, the 'next sequence' value immediately becomes the serial number of this request, and ExpCmd advances to this value plus one. Requests left behind the new 'next sequence' value should be rejected back to the Client rather than silently discarded, as is currently defined. Unexpected rejections would then indicate a malicious spoofing attempt or a software bug.

One unsatisfactory alternative would be to redefine Service Delivery Subsystem acknowledgement to indicate the point of sequential reception rather than actual delivery. This would again create a large quantity of enqueued requests, but now beyond even the ability of an ahead-of-sequence flag to remove them.

Using the ahead-of-sequence flag to elicit a response that indicates the range of rejected commands, or that rejects them individually, ensures that the state of the Server can be ascertained quickly, well within a fabric timeout. This allows quick recovery from a connection termination or a Logical Unit hang condition, quick flushing of invalidated requests, and instant opening of the request window, while still preserving the iSCSI flow control mechanism. This technique also ensures that all requests receive timely acknowledgement from the Service Delivery Subsystem as they are delivered.

Author's Address:

   Douglas Otis
   SANlight Inc.
   160 Saratoga Ave, #40
   Santa Clara, CA 95051
   Tel: (408) 260-1400 x2
   dotis@sanlight.net

Full Copyright Statement

Copyright (C) The Internet Society (2000). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.