INTERNET DRAFT                                       Yogesh Prem Swami
File: draft-swami-tcp-lmdr-02.txt                             Khiem Le
Expires: August 09, 2004                 Nokia Research Center Dallas
                                                        March 08, 2004


     Lightweight Mobility Detection and Response (LMDR) Algorithm
                               for TCP


Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of [RFC2026].

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Abstract

   TCP congestion control is based on the assumption that the
   end-to-end path of a connection doesn't change--or at best changes
   infrequently--once the connection is established. However, when a
   user moves from one subnet to another, this assumption breaks down.
   After a subnet change, a TCP sender that relies only on the rate of
   arrival of ACKs for congestion control may inadvertently add to
   network congestion (or suffer reduced throughput). Worse, a TCP
   sender may be totally unaware of such user mobility and may not be
   able to take any remedial action to prevent packet loss.

   In this document we describe a network layer independent mechanism
   by which a TCP receiver can propagate its subnet change information
   to its peer, so that the sender can take appropriate action.

1. Introduction

   TCP congestion control [RFC2581] is based on the assumption that
   the end-to-end path of a TCP connection does not change--or at best
   changes infrequently--once the connection is established. Based on
   this assumption, TCP increases its data rate (probes the network)
   whenever it receives positive feedback in the form of ACKs.
   However, if the "constant path" assumption does not hold, the TCP
   sender cannot safely continue at the old data rate, since the old
   and new paths may differ in capacity and level of congestion.

   When a TCP sender or receiver changes its point of attachment to
   the Internet (henceforth referred to as "changes subnets"), the
   entire end-to-end path between the sender and receiver can change.
   In these cases, the rate at which ACKs are received reflects only
   the congestion state of the old path. Therefore, relying on the
   rate of arrival of ACKs as the only criterion for congestion
   control can lead to periods of congestion that cannot be alleviated
   using existing algorithms.

   To summarize, after a subnet change the following problems can
   arise:

   a) the TCP sender MAY add to congestion and continuously lose
      packets in those subnets where there is an influx of connections
      from other subnets, OR

   b) in case the packets sent in the old subnet are all lost due to
      the subnet change (typically the case with Mobile-IPv4), the TCP
      sender may have to wait until the RTO expires before it can
      start its loss recovery algorithm, OR

   c) it MAY spend a lot of time trying to reach a reasonable
      throughput on the new path if the congestion and network
      capacity (measured in terms of bandwidth-delay product) on the
      two paths are substantially different. This is a direct
      consequence of SS_THRESH being set to a value that does not
      reflect the appropriate SS_THRESH for the new subnet.

   In this document, we describe a network layer independent mechanism
   by which a TCP receiver can propagate its subnet change information
   to its peer.
   We assume that a mobile host always knows about its own subnet
   change (for example, by looking at its neighbor cache, destination
   cache, default router, or a combination of these [RFC2461]), but
   currently it may not be able to inform its peer of that change.

   Please note that some network layer mobility management techniques,
   such as Mobile-IPv6 [JPA03] with route optimization, may be used to
   indirectly derive the peer's mobility information (for example, a
   TCP sender can look into its binding cache to derive its peer's
   mobility information), but these schemes do not work in cases such
   as Mobile-IPv6 with reverse tunnelling, Mobile-IPv4 [RFC3344], or
   other types of networks such as traditional cellular networks.

   Once a TCP sender has mobility information about itself or its
   peer, it can use the congestion response described in section-5 to
   adjust its data rate.

   The rest of this document is organized as follows: Section-2
   defines the terminology used in this document. Section-3 describes
   the issue of congestion in more detail. Section-4 has the details
   of the subnet change algorithm, and Section-5 contains the
   associated congestion response algorithm. Section-6 describes
   certain corner cases.

2. Terminology

   The key words "MUST," "MUST NOT," "REQUIRED," "SHALL," "SHALL NOT,"
   "SHOULD," "SHOULD NOT," "RECOMMENDED," "MAY," "OPTIONAL," and
   "silently ignore" in this document are to be interpreted as
   described in [RFC2119].

   Mobile Node (MN):
      A host (not a router) capable of changing its point of
      attachment to the Internet without breaking transport layer
      connectivity. Hosts that change their point of attachment to the
      Internet but use DHCP or other mechanisms to get a new IP
      address are not considered mobile.
   Old Subnet:
      MN's point of attachment (subnet prefix) to the Internet prior
      to movement.
   New Subnet:
      MN's point of attachment after movement.
   Stale ACK:
      ACKs generated in response to the data originally sent in the
      old subnet (note that some routers might transparently tunnel
      these packets to the new subnet, but even then the ACKs are
      still considered stale).
   INIT_WINDOW:
      The initial congestion window size at the start of a connection
      as described in [RFC3390].

3. Congestion Issues with Subnet Change

   For concreteness, the description below assumes network mobility
   based on Mobile IP, but the same concepts are readily applicable to
   other types of networks.

   To illustrate the problem, consider Figure-1. At time=T, the MN is
   reachable on Subnet-1 through AR-1 and has the care-of address .
   While MN is "attached" to AR-1, packet exchange between TCP-Sender
   and takes place using PATH-1.

   Let's assume that after some period of time, at T+1, MN moves
   (hands over) to Subnet-2 and is reachable through AR-2 with the
   care-of address . While MN is attached to AR-2, all packets
   exchanged between TCP-Sender and traverse the Internet Cloud-2
   (which may or may not overlap with Cloud-1) and use PATH-2.

                     <---------PATH-1---------->
                      /---------\     +---------+
                      |         |     |         |  Subnet-1
                  +---+ Cloud-1 +-----+  AR-1   +-->>>>>MN
                  |   |         |     |         |  (Time=T)
   +------------+ |   \----++---/     +---------+
   |            | |        ||              |
   | TCP Sender +-+        ^V PATH-3      ^V^ PATH-4
   |            | |        ||              |
   +------------+ |   /----++---\     +----+----+
                  |   |         |     |         |  Subnet-2
                  +---+ Cloud-2 +-----+  AR-2   +-->>>>>MN
                      |         |     |         |  (Time=T+1)
                      \---------/     +---------+
                     <--------PATH-2----------->

   During the transient period when MN moves from Subnet-1 to
   Subnet-2, AR-1 may (or may not) buffer and forward packets destined
   to and from through PATH-3 or through PATH-4 [K03]. We make the
   distinction between PATH-3 and PATH-4 to emphasize the fact that
   PATH-4 may belong to a well provisioned network that has dynamic
   equilibrium for mobile users. Such networks are designed to
   accommodate extremely bursty traffic.
   PATH-3, on the other hand, may consist of arbitrary routers without
   proper provisioning.

   Let's assume that a TCP connection was progressing between MN and
   TCP Sender when the user moves from Subnet-1 to Subnet-2. We now
   analyze the problem of congestion on the different paths shown
   above.

3.1 Congestion On PATH-1

   Congestion on PATH-1 is governed by the basic slow-start and
   congestion avoidance mechanisms [RFC2581]. As long as MN is on
   Subnet-1, standard congestion control is sufficient. But once it
   moves from Subnet-1 to Subnet-2, two different events can take
   place:

   1. All packets destined to Subnet-1 are dropped by AR-1. In this
      case, after MN moves to Subnet-2, the TCP sender will most
      likely time out, since tunnel establishment to the new access
      router will typically exceed the time during which the ACKs can
      trigger new data (in other words, the new data triggered by ACKs
      in flight will still have their tunnel end point set to AR-1
      because of the latency involved in establishing the new tunnel).
      After the timeout, the TCP sender will start with a congestion
      window of 1 which will hopefully traverse the new path PATH-3.
      In this case there is no need for extra congestion control.

      The disadvantages, however, of dropping all packets destined to
      Subnet-1 are:

      a) The sender will wait for one complete RTO before it can start
         loss recovery.

      b) If the MN moves faster than one subnet per RTO, on average,
         the TCP receiver will take a relatively long time to recover
         such packets (theoretically, it will never be able to
         recover, but in practice this is not true due to the
         randomness of motion).

      c) The sender will reduce its SS_THRESH to half the packets in
         flight.
         Since there is no correlation between BDP and packet loss on
         PATH-1, the throughput of the connection will suffer if the
         SS_THRESH on the new path is set to a small value (for
         example, if the sender moves to the new path right after
         connection setup, SS_THRESH will be set to 2*MSS).

   2. All packets (or all packets arriving to AR-1 during some period
      of time) destined to are forwarded to ([K03] describes the
      details of how this can be done). In this case, AR-1 can forward
      packets to using PATH-3 or PATH-4. We consider these two paths
      separately.

3.2 Congestion On PATH-3

   If AR-1 starts forwarding packets to AR-2 using PATH-3, PATH-3 will
   experience a sudden burst of data. In addition, if multiple MNs
   move between AR-2 and AR-1, PATH-3 MAY get congested. But while
   sending packets on PATH-3 is bad for other connections, dropping
   them is bad for the connections that change subnets (section-3.1).

3.3 Congestion On PATH-4

   In many cases, it's reasonable to assume that wireless service
   providers will have a well provisioned network that can accommodate
   highly bursty traffic. Such networks may have a dynamic equilibrium
   where the average transit traffic from AR-1 to AR-2 is the same as
   the transit traffic from AR-2 to AR-1. Such well provisioned paths
   are, however, not possible Internet-wide, since different mobile
   users will typically be connected to different hosts.

3.4 Congestion On PATH-2

   Since the MN is able to receive packets even after moving away from
   AR-1, it will continue to generate ACKs in an orderly fashion.
   These ACKs will traverse PATH-3 or PATH-4 and finally reach the TCP
   sender. But the segments sent by the TCP sender in response to
   these ACKs will travel on PATH-2 (assuming the TCP sender has
   received the binding update to send data on the new path).
   Unfortunately, the TCP sender has no congestion information about
   PATH-2, and using the old congestion window may cause network
   congestion on PATH-2. This problem becomes worse as the number of
   mobile users or the rate of subnet change increases in the system.

   To summarize, after a subnet change, if the old access router does
   not take part in tunnelling packets to the new subnet, there is no
   problem of congestion, but such a scheme is inefficient
   (section-3.1). On the other hand, if the old access router does
   take part in tunnelling packets to the new subnet, the new path may
   get heavily congested.

4. Subnet Change Detection

   Quite often, a TCP sender is not aware of its peer's subnet state
   (whether it's in the old subnet or in a new subnet) even though its
   peer almost always knows about its own subnet information. This
   happens, for example, if MN uses Mobile-IPv6 with reverse routing
   (i.e., the home network transparently tunnels all packets to the
   receiver), or Mobile-IPv4, or a cellular network for mobility
   management. It's therefore important to have a subnet change
   detection mechanism at the transport layer that can propagate this
   information between peers. This section describes such a subnet
   change detection scheme.

   Subnet change detection in itself is a two step process. First, a
   mobile terminal needs to know it has moved from one subnet to
   another; second, it needs to propagate this information to its
   peer. Detecting when a mobile terminal has changed its subnet is a
   neighbor discovery [RFC2461] problem and is beyond the scope of
   this document. In this document we assume that hosts can determine
   their own subnet information with assistance from lower layers.

   We now focus on how a mobile can propagate this information to its
   peer. To do so, we propose to use one bit--call it 'M-bit'--from
   the "reserved bits" in the TCP header.
   This bit acts as a flag whose value remains unchanged as long as
   the mobile remains attached to the same subnet. Once the mobile
   moves to a new subnet, the mobile flips (binary NOT) the bit and
   keeps the bit flipped as long as it remains in the new subnet. The
   peer host compares the value of 'M-bit' with the previously
   received value and uses any M-bit transition as an indication of
   the peer's subnet change. Following are the details of the subnet
   change detection algorithm:

   1. Each TCP implementation should keep three state variables--
      my_subnet_flag, rem_subnet_flag, and high_out_old--to facilitate
      mobility detection. In addition, a sender MAY also keep another
      state variable--prefix_now--to indicate the current
      subnet-prefix information.

      The first two flags (my_subnet_flag, rem_subnet_flag) hold the
      mobility state information about the local and remote TCP
      respectively. 'high_out_old' is the highest sequence number of
      packet-in-flight when a TCP receiver detects that its peer has
      changed subnet. This state information is needed for congestion
      response.

   2. At connection set up, both the client and server willing to have
      mobility detection should set M=1 in the SYN packets sent by the
      TCP client and server. If either (or both) of the SYN packets
      has M=0, then the TCP sender should stop processing the mobility
      detection and response scheme. In these cases a Mobile Host
      should let the sender time out after a subnet change.

      Once both entities know that the sender and receiver have
      mobility detection capabilities, the TCP sender and receiver
      should initialize

         my_subnet_flag = 1;
         rem_subnet_flag = 1;

   3. For each packet sent, each host should determine if it has moved
      to a new subnet. If either of the end points determines that it
      has moved, it should update the value of my_subnet_flag as
      follows:

         my_subnet_flag = ~(my_subnet_flag)

      where '~' is the boolean operation NOT.
      ***In addition, the receiver should also send an ACK with the
      highest sequence number within the maximum delayed ACK period if
      no such ACK is already scheduled.***

   4. Before sending any data or ACK packet, the TCP sender should set
      the value of M-bit in the TCP header as:

         M = my_subnet_flag

   5. When the peer TCP receives a valid TCP packet, it should compare
      the value of 'M-bit' with the value of 'rem_subnet_flag.' If the
      two values match, TCP should proceed as usual. If the two flags
      differ, then the TCP sender SHOULD update the variables as
      follows:

         rem_subnet_flag = M-bit of the present packet.
         high_out_old    = Sequence Number of the Last Byte in the
                           retransmission queue.

      The peer TCP uses 'high_out_old' so that it does not base its
      congestion control decisions on stale ACKs. After making these
      changes, the TCP sender SHOULD follow the congestion response
      algorithm as described in section-5.

   NOTE: In certain network architectures it's possible that a mobile
   (and the associated link technology) has information on the
   congestion of the new path. In these cases, if the congestion on
   the new path is low, one MAY choose not to indicate the mobility
   information (i.e., flip the 'M-bit') to the sender since there is
   no need to reduce the data rate. However, the mobility information
   MUST be indicated if no such information is available.

   Implementation Note: Since the M-bit is taken from the reserved
   bits, a firewall may drop the SYN packet itself [RFC3360].
   Implementations that enable this feature should account for this in
   order to prevent black holes.

5. Congestion Response after Subnet Change

   The goal of congestion response after subnet change is to minimize
   congestion on PATH-2.
   In principle, congestion response for PATH-2 has the same
   congestion control issues as initiating a new connection--the
   sender should have no more than INIT_WINDOW worth of data
   outstanding on the *new path* and the SS_THRESH should be set to a
   large value. What makes the problem complex is the fact that,
   unlike new connections, connections after a subnet change have
   non-zero packets in flight. ***The congestion response after subnet
   change MUST therefore ignore the stale-ACKs and only use the ACKs
   generated in the new subnet to base its congestion control
   decisions.***

   Unfortunately, the cumulative ACK property of TCP does not allow an
   easy way to ignore stale-ACKs. In this document we describe the
   congestion response in the presence of the SACK option [RFC2018]
   only. With the SACK option, the congestion response waits for the
   SACK/ACK of new data sent in the new subnet before growing its
   window. Following are the details of the algorithm:

   1. Set the congestion window as

         cwnd = cwnd + INIT_WINDOW;

   2. Send INIT_WINDOW worth of data on the new path and restart the
      RTO timer as if this were a new connection [RFC2988].

   3. For each subsequent ACK received, follow
      mobile_SACK_cong_resp():

      mobile_SACK_cong_resp(tcp_packet ack_pkt)
      {
         IF ((ack_pkt contains an ACK seq > high_out_old) OR
             (ack_pkt contains a SACK seq > high_out_old)) {

            cwnd = INIT_WINDOW + 2;
            SS_THRESH = INFINITE;

            IF (ack_pkt contains a SACK seq > high_out_old) {
               Mark packets less than high_out_old without a
               SACK flag as lost;
               Update packets in flight assuming all unsacked
               packets were lost;
               Do loss recovery as described in [RFC3517];
            } ELSE {
               Send new data as appropriate;
            }

            Follow [RFC2988] for timer calculation as if this
            were a new connection;

         } ELSE {
            cwnd = 0;   /* Don't send any new data */
            If ack_pkt contains a SACK block, mark the packet
            as sacked;
            DO NOT restart the RTO timer even for pure ACKs;
         }
      }

   Please note that the above algorithm waits for an ACK or SACK block
   that must have traversed the new path. In addition, the timer
   values are initialized as if this were a new connection. The timer
   values are not reset for stale ACKs since they don't provide any
   new congestion information (data flow rate) about the new path.

6. Anomalies

6.1 Race Conditions

   The congestion response algorithm described above works as long as
   the TCP sender receives the flipped M-bit before the new path is
   established. But if the flipped M-bit is received much later, the
   TCP sender will already have injected some data on the new path. An
   implementation MUST take proper precautions to send the M-bit
   before the new path is established (for example, by sending the
   flipped M-bit in parallel with the binding update procedure).

6.2 Rapid Subnet Hopping

   Consider the case when a mobile node moves from subnet-1 to
   subnet-2 and then to subnet-3 in a short period of time. If all the
   ACKs generated in subnet-2 are lost, it's possible that the sender
   will miss the subnet change indication. We believe that such events
   are rare and we do not attempt to solve this case.
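
   The missed indication described above can be illustrated with a
   minimal sketch of the M-bit bookkeeping from section-4 (steps 3 to
   5). The state variable names my_subnet_flag, rem_subnet_flag and
   high_out_old come from this document; the class names MobileEnd and
   PeerEnd, the helper run(), and the sequence number 5000 are
   hypothetical and exist only for illustration. The sketch models
   only the flag comparison, not a TCP stack.

```python
from dataclasses import dataclass


@dataclass
class MobileEnd:
    """Mobile side: flips my_subnet_flag on each subnet change (step 3)."""
    my_subnet_flag: int = 1  # both flags start at 1 after the SYN exchange

    def move_to_new_subnet(self) -> None:
        # One-bit NOT; XOR is used because Python's ~ acts on whole integers.
        self.my_subnet_flag ^= 1

    def m_bit(self) -> int:
        return self.my_subnet_flag  # step 4: M = my_subnet_flag


@dataclass
class PeerEnd:
    """Peer side: compares each received M-bit with rem_subnet_flag (step 5)."""
    rem_subnet_flag: int = 1
    high_out_old: int = 0

    def on_segment(self, m_bit: int, last_byte_queued: int) -> bool:
        """Return True when this segment signals a subnet change."""
        if m_bit == self.rem_subnet_flag:
            return False  # no transition seen: proceed as usual
        self.rem_subnet_flag = m_bit
        self.high_out_old = last_byte_queued
        return True


def run(acks_delivered):
    """Move once per entry; an ACK reaches the peer only when the entry is True.

    Returns the detection result for every ACK that was delivered.
    """
    mobile, peer = MobileEnd(), PeerEnd()
    detected = []
    for delivered in acks_delivered:
        mobile.move_to_new_subnet()
        if delivered:
            # 5000 is an arbitrary illustrative sequence number.
            detected.append(peer.on_segment(mobile.m_bit(), 5000))
    return detected
```

   Under these assumptions, run([True, True]) reports a change for
   both moves, whereas run([False, True]) models the case above: the
   ACKs from subnet-2 are lost, the M-bit has flipped twice by the
   time an ACK from subnet-3 arrives, and the peer sees no transition.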

7. Architectural Considerations

   Architecturally, the method described above does not add any new
   architectural features to the system. Although LMDR requires a TCP
   receiver to look into some parameters and data structures (local to
   that stack) that are specific to the IP layer, this should not be a
   problem either from an implementation point of view or from a
   theoretical point of view. In most cases, the TCP layer already
   consults the IP layer for MTU information at the very least.

   Recently several proposals have been made regarding link-up and
   link-down, which address different link layer related issues. LMDR
   is different from these proposals and has been designed for just
   one purpose: subnet change notification and response for a TCP
   connection. LMDR does not try to solve any link-up or link-down
   issues which may or may not take place due to a subnet change.

8. Security Considerations

   Since the M-bit is valid only for an acceptable ACK [RFC793], it's
   immune to passive attacks as long as the congestion window is not
   of the order of 2^31 bytes. However, the M-bit is not safe against
   active DoS attacks (present TCP is not safe either). We will
   describe a security mechanism (a TCP option) to protect against
   active attacks if there is a requirement from the working group.

9. Acknowledgments

   We would like to thank Mark Allman for his comments and suggestions
   on the draft.

10. REFERENCES

   [RFC2581] M. Allman, V. Paxson, W. Stevens, "TCP Congestion
             Control," Apr 1999.

   [K03]     R. Koodli, "Fast Handover for Mobile IPv6," Internet
             draft; work in progress,
             draft-ietf-mobileip-fast-mipv6-07.txt, Sept 2003.

   [RFC2461] T. Narten, E. Nordmark, W. Simpson, "Neighbor Discovery
             for IP Version 6 (IPv6)," Dec 1998.

   [JPA03]   D. Johnson, C. Perkins, J. Arkko, "Mobility Support in
             IPv6," Internet draft; work in progress,
             draft-ietf-mobileip-ipv6-24.txt, June 2003.

   [RFC3344] C. Perkins, "IP Mobility Support for IPv4," Aug 2002.

   [RFC3390] M. Allman, S. Floyd, C. Partridge, "Increasing TCP's
             Initial Window," Oct 2002.

   [RFC3360] S. Floyd, "Inappropriate TCP Resets Considered Harmful,"
             Aug 2002.

   [RFC3517] E. Blanton, M. Allman, K. Fall, L. Wang, "A Conservative
             SACK-based Loss Recovery Algorithm for TCP," Internet
             draft; work in progress, draft-allman-tcp-sack-13.txt,
             Oct 2002.

   [RFC2018] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, "TCP
             Selective Acknowledgment Options," RFC 2018, Oct 1996.

   [RFC2988] V. Paxson, M. Allman, "Computing TCP's Retransmission
             Timer," Nov 2000.

   [RFC793]  "Transmission Control Protocol," RFC 793, Sept 1981.

11. IPR Statement

   The IETF has been notified of intellectual property rights claimed
   in regard to some or all of the specification contained in this
   document. For more information consult the on-line list of claimed
   rights at http://www.ietf.org/ipr.

Authors' Addresses:

   Yogesh Prem Swami                  Khiem Le
   Nokia Research Center, Dallas      Nokia Research Center, Dallas
   6000 Connection Drive              6000 Connection Drive
   Irving, TX-75063, USA.             Irving, TX-75063, USA.
   E-Mail: yogesh.swami@nokia.com     E-Mail: khiem.le@nokia.com
   Ph    : +1 972 374 0669            Ph    : +1 972 894 4882