Network Working Group                                        C. Huitema
Internet Draft                                                    INRIA
Expiration Date: November 1995                                 May 1995

                            Multi-homed TCP

                              C. Huitema
                  Christian.Huitema@sophia.inria.fr

                    draft-huitema-multi-homed-0.txt

1. Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or ftp.isi.edu (US West Coast).

2. Abstract

Current TCP connections are identified by pairs of host addresses and port numbers. This introduces a "fate sharing" dependency between the connection and these values, and especially with the lifetime of the host addresses. There are at least three cases where this dependency is unduly harmful: (1) when the host moves to a new location and is assigned a new address, (2) when the interface used by a multi-homed host to initiate the connection is temporarily disconnected, (3) when the host's network is renumbered, e.g. after changing provider.

We propose to remove this fate sharing effect by allowing the set of addresses used by a TCP connection to change over time. We modify TCP by defining a new type of parameter, PCB-ID, to be used during the initial synchronisation. This parameter identifies the "Protocol Control Block" associated with the TCP connection. When initiating a connection, the host attaches the identifier of its local PCB to the SYN packet.
If both hosts have identified their local PCB, they can now exchange "Extended TCP" packets, where the pair of 16-bit port numbers has been replaced by the 32-bit PCB-ID at the destination. This way, PCB location becomes independent of the IP addresses. The addresses in use can be stored in this PCB, so that we can change them in the course of the connection. In fact, we can use several addresses in parallel for the same connection, which is why our proposal is called "multi-homed TCP".

3. Proposed extensions to TCP

Our proposal requires two extensions to TCP: the definition of a context parameter during the synchronization exchange and the carrying of this parameter in the data packets.

3.1. The context identification parameter

When a TCP entity initiates a new connection, it can announce its willingness to receive data from alternate addresses by including a "PCB identification" parameter in the initial "synchronization" packet. This parameter has the following format:

   +----------+----------+-----------+----------+-----------+----------+
   |Kind=PCBID| Length=6 | 32-bit Protocol Control Block Identifier    |
   +----------+----------+-----------+----------+-----------+----------+

This option is a unilateral offer, not subject to negotiation. It allows the partner to send data from an alternate source address, provided that the "local context id" is repeated in the new packets.

The context-id is normally the identification of a "protocol control block"; however, the precise format and bits carried in this parameter are entirely a local matter. The only condition is one on reuse: in order to avoid protocol errors due to old packets popping out of the network, the context-id used by one connection cannot be reused immediately. One must wait for the end of the "TIME-WAIT" state.

The parameter kind PCBID is to be allocated by the IANA. This parameter can only be present in TCP packets where the SYN bit is set.

3.2. Incoming data

An entity that has learned the PCB identification of its partner can elect to send "Extended TCP" (ETCP) packets. The differences between extended TCP and regular TCP packets are:

- the IP protocol type is (a value to be defined by IANA), not 6 (TCP).

- the IP source and destination addresses may be different from the ones used in the SYN packet.

- the first 32 bits of the ETCP header carry a protocol control block identification, not a pair of ports.

The extended TCP packets have a format identical to normal TCP packets, except for the replacement of the port pair by a context identification:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             protocol control block identification             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Acknowledgment Number                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |           |U|A|P|R|S|F|                               |
   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   |       |           |G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                          ETCP Header Format

ETCP packets are delivered to an ETCP incoming process, which merely looks at the context identification and finds the corresponding PCB. All packets arriving on the PCB identified by the context xb are treated exactly as if they had been sent by host A to host B, using the TCP ports "a" and "b", i.e. the ports present in the initial TCP (SYN) packet.

3.3. Extended checksum rules

The checksum of ETCP packets is computed over the pseudo-header and the packet itself.
The pseudo-header is composed of the IP addresses actually present in the packet.

3.4. PCB extension

The protocol control block of TCP is extended to include a list of partner addresses. This list is initialized with the partner address expressed in the connection request. It is populated with the source addresses from which ETCP packets bound to that PCB are received. Two dates are associated with each address, "first seen" and "last seen", i.e. the dates at which a packet was received from this address for the first time and for the last time. The list is sorted in reverse order of "last seen" dates, i.e. most recently seen first.

TCP entities should normally send their packets to the most recently seen address. They may elect to spread them over all of the "currently active" addresses, i.e. all addresses whose "last seen" date is later than the "first seen" date of the most recently seen address.

4. Finding addresses

The TCP "protocol control block" will be extended to contain a list of IP addresses of the source entity. These addresses are obtained by monitoring the IP source addresses of the ETCP packets. In this section, we will see the address reassignment at work in four cases: deprecation, anycasting, multi-homing and mobility.

4.1. Deprecation

The address autoconfiguration and neighbor discovery procedures of IPv6 allow hosts to renumber their interfaces. This is supposed to occur in three phases:

1) Initially, the interface of host A is numbered with the address A1.

2) After reception of a router advertisement message, the host A decides that the preferred address of the interface is now A2. Packets sent to the deprecated address A1 can still be received for some period.

3) At the end of that period, the address A1 is invalidated. Packets sent to it are discarded by the network.

Let's discuss first the behaviour of the host A, which is engaged in a TCP connection with the host B, using ports a and b and PCB identifiers xa and xb, respectively.
Initially, packets will be exchanged using the addresses A1 and B1.

   (A1, B1, tcp) (a, b, SYN, PCBID=xa)      ===========>
         <============ (B1, A1, etcp) (xa, SYN, ACK, PCBID=xb)
   (A1, B1, etcp) (xb, ACK, DATA)           ===========>
         <============ (B1, A1, etcp) (xa, ACK)

As soon as the address A1 becomes deprecated, A will stop using it. The next data or acknowledgement packets will be sent from the preferred source address A2:

   (A2, B1, etcp) (xb, DATA)                ===========>
         <============ (B1, A1, etcp) (xa, DATA)
   (A2, B1, etcp) (xb, ACK, DATA)           ===========>
         <============ (B1, A2, etcp) (xa, ACK)

The address change is noted by B upon reception, in context "xb", of a packet from the new address A2. At this stage, B switches to the new address. Due to the propagation delay, some packets sent to the old address may still be in transit at this point. This will last for at most twice the maximum propagation delay. We can therefore ensure that no packets will be lost if the old address remains valid for at least twice that delay, e.g. more than four minutes.

4.2. Anycasting

In order to use IPv6's anycasting facility, we should be able to send the initial SYN packet to an anycast address. Such connection requests are doomed in today's TCP, because anycast addresses cannot be used as source addresses. ETCP solves the problem rather elegantly. Suppose that the packet sent by the host A to the anycast address X is actually routed to the nearest server for that anycast address, B, through its interface B1.

   (A1, X, tcp) (a, b, SYN, PCBID=xa)       ===========>
         <============ (B1, A1, etcp) (xa, SYN, ACK, PCBID=xb)
   (A1, B1, etcp) (xb, ACK, DATA)           ===========>
         <============ (B1, A1, etcp) (xa, ACK)

Because B knows the PCB identifier "xa", it can send an ETCP packet back towards A, using the regular address B1 as source address and advertizing its own PCB identifier. A notes the source address B1 in its PCB, and the following packets are happily exchanged between A1 and B1.
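The address bookkeeping behind the deprecation and anycast exchanges can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names Pcb, PeerAddress, note_source and deliver are hypothetical, addresses are plain strings, dates are plain numbers, and the reaction to an unknown context identifier (returning nothing) is a choice made for the example, not something this memo specifies.

```python
from dataclasses import dataclass, field


@dataclass
class PeerAddress:
    address: str
    first_seen: float
    last_seen: float


@dataclass
class Pcb:
    """Hypothetical PCB reduced to the partner address list of section 3.4."""
    context_id: int
    peers: list = field(default_factory=list)  # most recently seen first

    def note_source(self, address, now):
        """Record that a packet bound to this PCB arrived from `address`."""
        for peer in self.peers:
            if peer.address == address:
                peer.last_seen = now
                break
        else:
            self.peers.append(PeerAddress(address, now, now))
        self.peers.sort(key=lambda p: p.last_seen, reverse=True)

    def preferred(self):
        """Address that packets should normally be sent to."""
        return self.peers[0].address


def deliver(pcbs, context_id, source, now):
    """ETCP incoming process: find the PCB from the context id alone."""
    pcb = pcbs.get(context_id)
    if pcb is None:
        return None  # assumption: unknown contexts are simply ignored here
    pcb.note_source(source, now)
    return pcb


# Deprecation (4.1), seen from B: A moves from A1 to A2.
XB = 0x0B0B0B0B  # arbitrary example value for the context id xb
b_pcbs = {XB: Pcb(XB)}
b_pcbs[XB].note_source("A1", now=0.0)  # address from the connection request
deliver(b_pcbs, XB, "A1", now=1.0)
deliver(b_pcbs, XB, "A2", now=2.0)     # first packet from the preferred address
deprecation_target = b_pcbs[XB].preferred()

# Anycast (4.2), seen from A: the SYN went to X, the reply comes from B1.
XA = 0x0A0A0A0A  # arbitrary example value for the context id xa
a_pcbs = {XA: Pcb(XA)}
a_pcbs[XA].note_source("X", now=0.0)
deliver(a_pcbs, XA, "B1", now=1.0)
anycast_target = a_pcbs[XA].preferred()

print(deprecation_target, anycast_target)
```

In both cases the sender needs no explicit "address change" message: the most recently seen source address simply moves to the head of the list.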
The solution can in fact be extended to other variations of anycasting, e.g. when the initial packet is sent to a multicast group. The only difference here is that several responses will come, e.g. from B and C, two members of the group which both decided to reply to A.

   (A1, G, tcp) (a, b, SYN, PCBID=xa)       ===========>
         <============ (B1, A1, etcp) (xa, SYN, ACK, PCBID=xb)
   (A1, B1, etcp) (xb, ACK, DATA)           ===========>
         <============ (C1, A1, etcp) (xa, SYN, ACK, PCBID=xc)
   (A1, C1, etcp) (xc, RST)                 ===========>
   (A1, B1, etcp) (xb, DATA)                ===========>

Upon reception of the synchronisation packet from C, A realizes that this packet is a duplicate SYN and that the PCB identifier is different from the one advertized by B. It will refuse to consider the packet's information, and immediately send a "reset" to C, clearing the additional connection. In the rare case where the binary values of xb and xc are identical, A should simply discard the additional synchronisation packet. It must not add the corresponding source address to its PCB.

4.3. Multi-homing

Suppose that the host A is equipped with two interfaces, A1 and A2. It may decide to spread the traffic over these two interfaces, sending packets alternately from sources A1 and A2. B, in turn, will send packets alternately towards each of these addresses:

   (A1, B1, tcp) (a, b, SYN, PCBID=xa)      ===========>
         <============ (B1, A1, etcp) (xa, SYN, ACK, PCBID=xb)
   (A1, B1, etcp) (xb, ACK, DATA)           ===========>
         <============ (B1, A1, etcp) (xa, ACK)
   (A2, B1, etcp) (xb, ACK, DATA)           ===========>
         <============ (B1, A2, etcp) (xa, DATA)
   (A1, B1, etcp) (xb, ACK, DATA)           ===========>
         <============ (B1, A1, etcp) (xa, ACK)

This is possible because B keeps, in the PCB identified by "xb", a list of partner addresses. Each of the two addresses will in turn be the "most recently seen" address.

4.4. Mobility

Mobility is treated by exactly the same mechanisms as those of "renumbering", which we described in 4.1.
There are however two problems: fast movements and double jumps.

There are few good solutions to the "fast movement" problem. If a host gets a new address every fraction of a second, there is just no way that a mechanism that exchanges information with communicating hosts through the whole Internet will keep pace. Proposed solutions involve home bases and tunnelling. They amount to saying that the mobile host is dual homed. It has a "mobile address", which changes when the host roams, and a "home address", which remains relatively static. When the host moves very fast, it should only use its home address. When the host moves slowly, it can start using the "mobile address" for more efficiency.

The double jump problem occurs when two mobile hosts change cells simultaneously. In this case, we are in a situation where neither of the previous addresses is usable. A cannot tell B that its address changed from A1 to A2, because the packets bound to B1 don't make it to B, and vice versa. The only solution here is to request that mobile hosts maintain a stable home address, e.g. A0 for A and B0 for B. They will "refresh" the address list of their partners by regularly sending messages from these addresses. In the double jump situation, the deadlock will be broken if either A or B manages to get a packet to the home address A0 or B0.

5. Security considerations

The procedure that we propose introduces a significant security hole. If a third party can obtain the "context-id" used by the TCP connection, then it can send TCP packets using a bogus source address and trick the TCP entity into believing that this is the new address of its partner: it can, in short, "grab the connection". In the current Internet, such connection grabbing requires that one somehow subvert the routing protocol, which is more difficult than simply listening to traffic and forging packets.
This threat is in fact inherent to the mobility environment which we want to accommodate: we have the choice of accepting this mode of operation or implementing a secure version of TCP.

Implementing a secure version of TCP may be a good idea in any case, as TCP is currently vulnerable to several forms of attack: as long as the origin of the packet is not securely authenticated, it is easy for intruders to send forged TCP messages and distort the state of an existing connection. We may want to use "IP security" to that effect, wrapping the TCP packets inside an "IP secure" payload:

   +-----------+-------------------------------+
   |           | IP       +-------------------+|
   | IP header | security | TCP header + data ||
   |           | header   +-------------------+|
   +-----------+-------------------------------+

The IP security header identifies a "security context", to which a key is attached, as well as the type of protection. Once a security context has been established, we obtain a proof of the source Internet address, as well as, in some cases, an integrity check and confidentiality, i.e. protection against eavesdropping. How secure IP contexts are established is outside the scope of this memo.

If such protections are available, then our procedure becomes entirely acceptable. In fact, we can define three modes of operation:

(1) Secure mode, in which only those packets which were transmitted securely can be considered by TCP,

(2) Protected mode, in which packets are required to come either from a well identified address or through a secure channel,

(3) Unprotected mode, for users who are willing to take chances.

The second mode of operation protects us against the specific threats brought in by mobility, without imposing the performance penalties of cryptography on each and every packet.

We should note that it is theoretically feasible to associate security parameters with the TCP connection, in much the same way that they are associated with a pair of IP hosts.
This would solve the TCP problem entirely, making our connection very secure. But this is clearly a point of debate.

8. Acknowledgements and related works

Many of the features of this proposal were discussed privately with Lixia Zhang, John Wroclavski, Van Jacobson and Greg Minshall. I received comments from Allison Mankin on a previous version of this proposal. The support of multiple addresses bears some similarities with the "transport context names" proposal of Dave Crocker.

Various authors, notably Steve Deering, have proposed an alternative "context identification" mechanism based on "volatile port numbers": a specific port is assigned to each connection within the SYN exchange, which is essentially equivalent to our "context identifier". The scheme that we propose here has the advantage of using larger identifiers: 16 bits does not necessarily allow enough contexts. It also has the advantage of being entirely backward compatible (an additional TCP parameter can be ignored) and of being explicit: alternative addresses can only be used if the peer provided a context identifier.

Appendix A. Relation with other proposals.

Letting TCP connections survive a change of addresses is not a new problem. Partial solutions have been proposed, which essentially insert a "permanent identifier" layer between the transport control and the routing process. Instead of identifying the transport contexts by a pair of source and destination addresses, one would use these "more permanent" identifiers, often called "end point identifiers" or EIDs. The addresses may change but the EID will remain constant.

In the "classic" version of the EID proposal, these identifiers will be present in each and every packet, in addition to the addresses. Several implementations have been proposed, e.g. as innovative IPng formats.
Including EIDs in the IP or IPng header itself has been ruled out during the debates, but one could design an intermediate layer between IP and TCP that would carry those headers. The proposal, however, suffers from three drawbacks:

(1) As EIDs are likely to have the same size as the IPng addresses, we are considering a sizeable overhead.

(2) EIDs don't exist yet. Creating yet another address space would imply significant administration efforts.

(3) Disconnecting identification and routing removes whatever security there is in the current IP protocol.

The early SIP and SIPP proposals included a scheme that would alleviate the first concern and eliminate the second one. Instead of creating a new numbering space for EIDs, one would simply specialize one address, e.g. the one that was used for sending the initial synchronization packet of TCP. Most connections will simply go on using that address for their whole duration. If the host moves, or if a new provider is selected, the old address will still be used as nominal source or destination address, but the packet will be explicitly source routed through the new address.

This solution however suffers from the same security problems as the generic EID proposal, and does not entirely cure the "change of provider" problem. If the local network is renumbered, the old addresses cease to be valid.

In fact, that argument applies to all approaches based on "permanent identifiers": we are exchanging a dependency on the IP address for a dependency on these new identifiers. Proponents of the EID approaches are ready to swear on their most sacred books, or on the head of their favorite dog, that these new numbers will remain valid, unique and stable for eternity. We may note however that there is not a single human construct that ever met this requirement.
The case is even more obvious if we focus on network history, where exceptions to the uniqueness and stability rules have been found for all proposed "unique numbers" that were considered, be they IP addresses, which get renumbered, domain names, which change from time to time, or IEEE-802 "MAC addresses", which can easily be misconfigured. Rather than yet another gigantic administrative effort, our solution involves only locally asserted numbers, the PCB identifiers.

3. Why TCP

Our solution focuses on TCP. One may well object that TCP is not the only protocol used over the Internet, and that fate sharing also occurs with other transport protocols, notably over UDP. However, a quick analysis shows that "user controlled" applications, which use the user datagram protocol, are much less sensitive than those applications which rely on the services of a transport layer. This is notably the case of "client server" and "real time" applications; this is also the case of "secure" applications.

The most popular building block for "client server" applications is the "stateless RPC". In this model, a client simply sends a request to the server, generally a single UDP datagram. Upon reception of this request, the server executes the required actions and sends back a reply, in most cases a single UDP datagram. No "context" is kept by the server, and the time-span of the transaction is generally quite short, which minimizes the risk of observing a move in the middle of a transaction. At worst, if such a transition occurs, the client may repeat its request. Indeed, if the server moves, the client will have to find its new location, i.e. its new address. But that has to be solved by updating the routing process or the name service; it is not a "transport control" problem.

It is true that client-server applications may have to maintain long term associations.
But they generally do so by identifying "application level contexts", which are identified by the user and server names: authentication and access controls ought to be based on user names, not on hardware addresses.

Real time applications also use UDP, generally in conjunction with "multicast transmission". Practical experience has shown however that multicast transmission is very often achieved through a cascade of relays, and that the source address received in the packets is not always that of the actual source. The "real time transport protocol" includes an identification mechanism by which stations are assigned numbers within multicast groups. Real time applications go to great lengths to reduce their packet size, using computation intensive compression techniques; an additional header in each and every packet would be entirely unwelcome.

In fact, one may draw the general conclusion that applications which use datagram services have to incorporate their own control functions. Identification of the partner, as well as general security measures, are an integral part of these applications. Providing them with a "one size fits all" intermediate layer would probably not help.

Then, one may also observe that even if one were to implement a "transparent" solution, e.g. using some form of stable identifiers, the TCP code would still need to be updated. The error and throughput characteristics of mobile networks are different from those of fixed networks; controlling parallel transmission over multiple paths is not quite the same as controlling a single path. Key points of the implementation, such as the retransmission strategy or the congestion control algorithm, would have to be revised in any case.

Appendix B. Managing a set of addresses

We have mentioned the need to maintain a "working set" of partner addresses, and to "grade" each address in that working set.
However, the procedure described in the previous section only allows for addition of addresses to the connection, not for their removal. This is because we expect that the working set of addresses will simply be a cache of the "most recently used" addresses of our partners. TCP entities may use whatever strategy they want to maintain the set of addresses within reasonable limits, e.g. LRU replacement.

We will describe here some possible ways of grading the addresses.

The "first seen" and "last seen" monitoring will produce a first selection of selectable addresses within the working set. However, the selected addresses need not all be used. Even if several of them are used, they need not all be used at the same rate.

Suppose that we have sent different packets to different addresses. If one of these addresses has become invalid, packets sent to it will be lost. Indeed, packets sent to valid addresses may also be lost, e.g. if a cosmic ray hits a photodiode at the wrong moment. It is however quite reasonable to relate the selection of addresses to the error control: if a packet sent to one particular address is lost, further packets should preferably be sent to another acceptable address.

Current TCP implementations try to characterize the network they are using by computing round-trip time estimates and congestion windows. When several addresses are in use, one may well believe that each address will be reached by a different path, and that each of these paths will have different characteristics. In fact, due to the sequential nature of the acknowledgment process, it would be very difficult to maintain more than one round-trip estimate for the connection. However, it is entirely conceivable to maintain a separate congestion avoidance window for each address in the working set, and to maintain these windows by running parallel versions of the "slow start" algorithm.
The window will be initialised when a new address is added to the working set; it will be increased when a packet sent to that address is acknowledged, and it will shrink when a packet sent to that address is lost.

Maintaining a congestion window per active address provides a rational way of spreading the traffic over the various addresses. It is clearly not the only way. In fact, this may well open the path to very fruitful research.
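The per-address windows described above can be illustrated by the following Python sketch. It is one possible reading, not a specification: the class names AddressWindow and WorkingSet are hypothetical, windows are counted in segments, and the specific rules (initial window of one segment, additive growth above a threshold, halving the threshold and restarting from one segment on loss) are borrowed from the classic slow start and congestion avoidance scheme as an assumption. This memo only requires that the window grow on acknowledgement and shrink on loss.

```python
class AddressWindow:
    """Hypothetical congestion state for one partner address (in segments)."""

    def __init__(self, initial=1.0, ssthresh=64.0):
        self.cwnd = initial        # initialised when the address is added
        self.ssthresh = ssthresh   # assumed threshold between the two phases

    def on_ack(self):
        """A packet sent to this address was acknowledged: grow the window."""
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0               # slow start phase
        else:
            self.cwnd += 1.0 / self.cwnd   # congestion avoidance phase

    def on_loss(self):
        """A packet sent to this address was lost: shrink the window."""
        self.ssthresh = max(self.cwnd / 2.0, 2.0)
        self.cwnd = 1.0


class WorkingSet:
    """One window per address; traffic is spread in proportion to cwnd."""

    def __init__(self):
        self.windows = {}

    def add(self, address):
        self.windows.setdefault(address, AddressWindow())

    def acked(self, address):
        self.windows[address].on_ack()

    def lost(self, address):
        self.windows[address].on_loss()

    def shares(self):
        """Fraction of new packets to send towards each address."""
        total = sum(w.cwnd for w in self.windows.values())
        return {a: w.cwnd / total for a, w in self.windows.items()}


ws = WorkingSet()
ws.add("A1")
ws.add("A2")
for _ in range(7):
    ws.acked("A1")     # A1 keeps delivering: window grows from 1 to 8
for _ in range(3):
    ws.acked("A2")     # A2 grows from 1 to 4 ...
ws.lost("A2")          # ... then loses a packet and restarts from 1
shares = ws.shares()
print(shares)
```

With these assumed rules, an address that has become invalid sees its share of the traffic collapse after the first loss, while the sender keeps probing it at a low rate in case the loss was accidental.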