Internet Draft                                            Man Yeob Lim
                                                         Dae Young Kim
                                                    Chungnam Nat. Univ.
                                                         November 1997


                  IP Extension for Reliable Multicast


Status of This Memo

   This document is an Internet Draft. Internet Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as "work in
   progress".

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ds.internic.net (US East Coast), nic.nordu.net
   (Europe), ftp.isi.edu (US West Coast), or munnari.oz.au (Pacific
   Rim).

Abstract

   This memo presents an IP extension for recovering multicast packets
   dropped because of congestion. IP routers that implement the
   extension can recover dropped packets far faster than group member
   end hosts can. Because the necessary interactions are limited to
   adjacent routers, the scheme substantially reduces the overall
   signaling overhead among group members for packet recovery.

Lim & Kim                  Expires May 1998                   [page 1]

Internet Draft   IP Extension for Reliable Multicasting  November 1997

Table of Contents

   Status of This Memo
   Abstract
   1. Introduction
   2. Overview
      2.1. Recovering single loss by cache routers
      2.2. Recovering burst loss by buffering routers
      2.3. Three schemes of recovery
      2.4. Delay before retransmission
   3. Protocol description
      3.1. Extension to IP datagram adding cache/buffering router
           address
      3.2. Converting multicast datagram to unicast datagram
      3.3. Drop ICMP Message
      3.4. ICMP register buffering host message
   4. Implementation issues
      4.1. Compatibility
      4.2. Minimum implementation
      4.3. Cache/Buffer size consideration

1. Introduction

   Since IP Multicast was proposed [1], there has been much research
   on reliable multicast protocols. However, multicast itself is done
   in the IP layer while the solutions are sought in the transport
   layer, and this mismatch makes the search for solutions more
   difficult. The transport protocol resides on group member end
   hosts, which are spread over a large geographical area. If packets
   are lost in the IP layer, the loss not only takes a long time to
   detect in the transport layer but also makes group coordination
   very complicated. Even though many schemes have been proposed to
   overcome multicast communication losses [2-6], it is hard to devise
   an efficient solution without involving the IP protocol in the
   task.

   There are two types of packet loss in real Internet environments.
   One type is transparent to routers, while the other is not. The
   first type includes packet loss due to link errors or router
   failure. Recovering these packets requires an end-to-end Ack/Nak
   operation. The second type includes packets dropped because of
   congestion or TTL (time to live) expiration. This type of loss
   results from an explicit router decision. As transmission quality
   improves, the first type of loss is diminishing to a negligible
   level and congestion is becoming the major reason for packet loss.

   Because packets dropped at congestion are dropped with the router's
   knowledge, we can devise a recovery scheme based on explicit
   coordination among routers. If lost packets are recovered
   immediately and actively by the IP routers, before any later
   intervention by a higher-layer protocol, the end-to-end multicast
   protocol can be significantly simplified and recovery can be much
   faster. The minimum capability the proposed scheme requires of a
   congested router is that it can inspect a packet and collect the
   necessary information before actually dropping it.

   A study [7] shows that loss on the links of the multicast network
   accounts for only 2% or less of all packet loss, and that the
   remaining congestion loss falls into two types, single and burst.
   Most congestion loss consists of isolated single losses, but a few
   very long loss bursts, lasting from a few seconds up to 3 minutes
   (around 2000 consecutive packets), contribute heavily to the total
   packet loss. Single losses are believed to come from momentary
   congestion of relatively short duration, while burst losses come
   from long-lasting congestion.

   We propose extensions to the IP and ICMP protocols for efficient
   recovery of both single and burst packet losses due to router
   congestion. We propose to place multicast routers equipped with a
   recovery cache or buffer at various places in the network so that
   lost packets can be recovered by coordination among routers.

   It is not appropriate for routers to recover all lost packets.
   Recovery should be limited to important multicast packets, which
   are to be specially tagged, so that the cache size can be minimized
   and multicast routers are not burdened with too much processing
   overhead.
   In IP version 4, the reliability bit in the type of service field
   can be used for this purpose. In IP version 6, one value of the
   priority field is suitable for requesting reliable multicasting per
   packet, or the flow label can be used to request reliable
   multicasting per data stream.

2. Overview

2.1. Recovering single loss by cache routers

   Single losses are recovered by the so-called cache router, which is
   located just one hop before the point where congestion occurs. A
   cache router continuously copies every multicast packet whose QoS
   marking requests recovery into its ring-type cache. When a cache
   router forwards a multicast packet, it overwrites the cache router
   address in the option field of the packet with its own IP address.
   As the packet travels through the network toward its destinations,
   the cache router address is updated every time the packet passes
   through a cache router.

   When a router has to drop a packet because of congestion, it sends
   a drop message to the cache router whose IP address is given in the
   option field of the packet. On receiving the drop message, the
   cache router looks for the same packet in its cache. If the packet
   is still there, the cache router decrements the TTL field of the
   packet, retransmits the packet, and stores a copy in the cache
   again, so that retransmission can be repeated as long as the
   packet's TTL is not zero.

   To generate a drop message properly, the router must be able to do
   a minimum of processing before actually dropping a packet: it must
   accept the packet into a buffer and build a drop message that
   duplicates the header part.

2.2. Recovering burst loss by buffering routers

   Burst loss occurs when a router is congested for a long time. A
   burst can range from a few packets to several thousand packets, so
   a large cache is needed to store the copies. Because this size can
   be large, it is not feasible to equip every router with a cache big
   enough to recover burst loss.
   Instead, we place special routers for burst loss, so-called
   buffering routers, at several nodes where the multicast tree
   branches. A buffering router covers recovery from burst loss that
   occurs between itself and the next buffering router in the routing
   tree.

   When a burst loss occurs at a router, the router sends a series of
   drop messages to the previous-hop router, i.e. the cache router,
   which relays the drop messages to the buffering router. Once a
   burst loss occurs, it will take a while before the router is
   relieved of the congestion. The router therefore includes an
   estimated recovery time in the drop message. The buffering router
   waits until the congested router has recovered from congestion and
   then retransmits the packets. The packets are converted to unicast
   packets and directed to the congested router. The congested router
   then restores the multicast packets from the unicast packets and
   resumes multicast forwarding.

   When a host is available as a buffering device in place of the
   buffering router itself, the host can be used to store copies of
   multicast packets. If a host registers itself with a buffering
   router as a buffering device, the buffering router sends the
   buffering host a duplicate of every multicast packet of the proper
   QoS that passes through it, and overwrites the buffering router
   address in the option field of those packets with the IP address of
   the buffering host.

2.3. Three schemes of recovery

   Recovery is done in three steps. First, if a router encounters a
   loss, either single or burst, it sends a drop message to the cache
   router. If the cache router still has the packet in its cache, it
   transmits the packet to the requester. Second, if the cache router
   finds that the packet is no longer in its cache, it forwards the
   drop message to the buffering router.
   The buffering router then searches for the packet in its buffer and
   sends it to the requester by unicast. The congested router converts
   the unicast packet into a multicast packet and resumes multicast
   routing. Third, if the buffering router finds that the packet is no
   longer in its buffer, it forwards the message to the original
   source of the packet. The source host then retransmits the packet
   to the congested router by unicast. The three schemes can be
   implemented in any combination or separately. This operation
   ensures full recovery from packet loss due to congestion.

2.4. Delay before retransmission

   When a single or burst loss occurs, the question arises of how soon
   the router will recover from congestion and be ready to receive
   packets again. If the congestion lasts for a long time, fast
   retransmission is useless or even makes the problem worse. The
   current version of IP has no provision for this; TCP flow control
   handles the situation by reducing the window size. The window size
   control, however, does not correspond precisely to the recovery
   timing of the router, and there has been no alternative because no
   information about the router's recovery time is available.

   The congested router, on the other hand, may know why the overflow
   occurred, whether the loss is single or burst, and how soon it can
   be relieved. If so, the router can add this information, the
   estimated recovery time, to the drop message. The cache router,
   which is in charge of the primary retransmission, decides whether
   to retransmit immediately or after a certain delay. If a delay is
   required, the cache router forwards the drop message to the
   buffering router together with the delay information. Because the
   buffering router has a buffer large enough to hold the packets for
   a long time, the packets are retransmitted after the delay the
   congested router needs to recover.
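
   The decision made by the cache router in Sections 2.3 and 2.4 can
   be sketched as follows. This is only an illustrative sketch, not
   part of the proposal: the names DropMessage and
   cache_router_action, the packet_id key, the use of milliseconds,
   and the immediate_threshold_ms parameter are all assumptions made
   for the example. The memo itself does not define how a packet is
   identified or what delay counts as "immediate".

```python
from dataclasses import dataclass


@dataclass
class DropMessage:
    congested_router: str   # source IP address of the drop message
    packet_id: int          # hypothetical key for the dropped packet
    est_recovery_ms: int    # estimated time to recover (unit assumed)


def cache_router_action(msg, cached_ids, immediate_threshold_ms=0):
    """Return (action, delay_ms) for one drop message.

    cached_ids is the set of packet ids still held in the ring cache.
    immediate_threshold_ms is an assumed policy knob: estimates at or
    below it are treated as "retransmit immediately".
    """
    if msg.est_recovery_ms > immediate_threshold_ms:
        # A delay is required: hand the request, with the delay
        # information, to the buffering router.
        return ("forward_to_buffering_router", msg.est_recovery_ms)
    if msg.packet_id in cached_ids:
        # Packet still cached and no delay needed: retransmit now.
        return ("retransmit_now", 0)
    # Packet already overwritten in the cache: fall back to the
    # buffering router.
    return ("forward_to_buffering_router", 0)


if __name__ == "__main__":
    cached = {101, 102, 103}
    print(cache_router_action(DropMessage("10.0.0.9", 102, 0), cached))
    print(cache_router_action(DropMessage("10.0.0.9", 102, 5000), cached))
    print(cache_router_action(DropMessage("10.0.0.9", 7, 0), cached))
```

   Keeping the estimate in the returned tuple mirrors the text above:
   the delay information travels with the forwarded drop message
   rather than being consumed by the cache router.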

3. Protocol description

3.1. Extension to IP datagram adding cache/buffering router address

   An option is defined in IP datagrams to carry the cache and
   buffering router IP addresses. The cache IP address is the IP
   address of the cache router, which ensures recovery from single
   loss, and the buffering IP address is the IP address of the
   buffering router or host, which ensures recovery from burst loss.
   Figure 1 shows the format of the option in the IP datagram.

    0               8               16              24            31
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   code(10)    |    length     |           reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Cache Router IP Address                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |               Buffering Router/Host IP Address                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       Figure 1. The format of the router option in an IP datagram

   The source host initializes the cache address to 0 and the
   buffering address to its own IP address. Each cache router
   overwrites the cache address field with its own IP address, and
   each buffering router overwrites the buffering address field with
   its own IP address. If a host is used as a buffering device, the
   buffering address field is set to the IP address of the buffering
   host. When routing multicast packets with the proper QoS, the
   buffering router forwards a copy of each packet to the buffering
   host. The cache and buffering address fields are updated
   continually as the packets pass through the routers. This makes it
   possible for recovery to be carried out by the router nearest to
   the congested router.

3.2. Converting multicast datagram to unicast datagram

   When a buffering router receives a drop message, it searches for
   the packet in its buffer.
   If it finds the packet, it converts the packet into a unicast
   packet, saving the multicast address in the option field and
   changing the destination address to the address of the congested
   router. If it does not find the packet, it forwards the drop
   message to the original source host. A receiving router should
   convert the unicast packet back to a multicast packet and continue
   multicast routing. Figure 2 shows the format of the multicast
   address option. A packet retransmitted by a cache router is
   forwarded in multicast format, because the cache router is adjacent
   to the congested router.

    0               8               16              24            31
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   code(11)    |    length     |           reserved            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Multicast Address                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 2. The format of the multicast address option in an IP
             datagram

3.3. Drop ICMP Message

   This message is sent to the cache router when a packet is dropped
   because of congestion. One message is generated for each drop.

   On receiving the drop message, the cache router searches for the
   requested packet in its cache. If the packet is found, the cache
   router sends it to the congested router. If not, the drop message
   is forwarded to the buffering router without changing the source IP
   field, thus preserving the IP address of the congested router.

   On receiving the drop message, the buffering router searches for
   the requested packet in its buffer. If the packet is found, the
   buffering router converts the multicast packet into a unicast
   packet and sends it to the congested router. If not, the drop
   message is forwarded to the original source host without changing
   the source IP field, again preserving the IP address of the
   congested router. The source host can either retransmit the
   requested packet to the congested router or notify the upper-layer
   protocol.
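
   The forwarding chain described above can be sketched as follows.
   This is a minimal illustration under stated assumptions, not a
   definitive implementation: the PacketStore class, the handle_drop
   function, the dictionary layout of the drop message, and the
   packet_id key are hypothetical names introduced for the example. A
   bounded store that discards its oldest entry stands in for both the
   ring-type cache and the larger buffer.

```python
from collections import OrderedDict


class PacketStore:
    """Bounded packet store; the oldest entry is discarded when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.packets = OrderedDict()

    def put(self, packet_id, payload):
        self.packets[packet_id] = payload
        self.packets.move_to_end(packet_id)
        if len(self.packets) > self.capacity:
            self.packets.popitem(last=False)  # drop the oldest copy

    def get(self, packet_id):
        return self.packets.get(packet_id)


def handle_drop(drop, cache, buffer):
    """Return (retransmitter, format, destination) for a drop message.

    drop is a dict with 'src' (the congested router, never rewritten
    while the message is forwarded) and 'packet_id' (assumed key).
    """
    destination = drop["src"]
    if cache.get(drop["packet_id"]) is not None:
        # Cache router is adjacent: retransmit in multicast format.
        return ("cache_router", "multicast", destination)
    if buffer.get(drop["packet_id"]) is not None:
        # Buffering router converts the packet to unicast.
        return ("buffering_router", "unicast", destination)
    # Last resort: the source host retransmits by unicast (it may
    # instead notify the upper-layer protocol).
    return ("source_host", "unicast", destination)


if __name__ == "__main__":
    cache, buffer = PacketStore(2), PacketStore(4)
    for i in range(1, 5):
        data = f"packet-{i}".encode()
        cache.put(i, data)
        buffer.put(i, data)
    for wanted in (4, 1, 99):
        print(handle_drop({"src": "10.0.0.9", "packet_id": wanted},
                          cache, buffer))
```

   Because the destination is always taken from the unchanged source
   field of the drop message, whichever node answers can reach the
   congested router directly.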

   The congested router specifies its estimated time to recover from
   congestion in the drop message, so that routers can delay
   retransmission accordingly. Figure 3 shows the format of the drop
   ICMP message.

    0               8               16              24            31
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     type      |     code      |           checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   estimated time to recover                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         internet header + 64 bits of datagram prefix         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 3. ICMP drop message format

3.4. ICMP register buffering host message

   A host uses this message to register itself with a buffering router
   as a buffering host. After a buffering router receives this
   message, it forwards all multicast packets with the proper QoS to
   the buffering host. On receiving a drop message from a congested
   router, the buffering host transmits the requested packet to the
   congested router in unicast format.

    0               8               16              24            31
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     type      |     code      |           checksum            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                   buffering host IP address                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

          Figure 4. ICMP register buffering host message format

4. Implementation issues

4.1. Compatibility

   The extensions proposed above are upward compatible with the
   current version of IP. Cache routers and buffering routers can be
   used in combination or separately. Routers can be replaced
   incrementally, starting with congested areas. The new ICMP drop
   message and datagrams carrying the new options will tunnel through
   existing routers intact, so the network can be improved smoothly
   without changing all the routers simultaneously.

4.2. Minimum implementation

   The extensions proposed above could require major technology
   improvements, such as router designs with large memory. As an
   intermediate stage, we can consider a minimum implementation plan.
   Without using cache or buffering routers, the sending host can
   recover packet losses in the IP layer. The IP module of the source
   host can send retransmitted packets to the congested router as
   unicast messages. The congested router converts each unicast packet
   to a multicast packet and resumes multicasting from the point where
   multicasting was suspended. This enables recovery from congestion
   entirely in the IP layer, which not only makes recovery much faster
   but also keeps the transport multicast protocol simple and
   straightforward. The IP layer module can either have its own buffer
   to hold IP packets or manage the packets for retransmission using a
   linked list together with the TCP buffer.

4.3. Cache/Buffer size consideration

   Suppose cache routers are used in a gigabit network and routers are
   100 kilometers apart. The one-way packet travel time is 0.3
   millisecond. If congestion occurs, the congested router drops the
   received multicast packet and sends a drop message to the upstream
   cache router. If we assume that the time to process a received
   packet and to generate a drop message is 0.4 millisecond, the cache
   router must keep a duplicate copy of each multicast packet for 1
   millisecond. This results in a cache of 1 Mbit for each channel.

   For the buffer size of a buffering router, suppose the round-trip
   time between the source and destination hosts is 10 seconds. If we
   allow the router 10 seconds to recover from congestion, the total
   time to hold packets in the buffer is 20 seconds. Supposing that 1
   percent of the total 1 Gbit/s traffic is multicast traffic
   requiring retransmission, the buffer size will be 25 Mbyte.
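
   The sizing arithmetic above can be checked with a short
   calculation. This is a sketch under stated assumptions: the
   function names and parameters are hypothetical, and the 1
   millisecond holding time is read here as two one-way trips (0.3 ms
   each) plus the 0.4 ms processing time. Integer arithmetic is used
   so that the results are exact.

```python
def cache_bits(link_bps, one_way_us, processing_us):
    """Bits a cache router must hold per channel.

    A copy is kept while the packet travels to the next router, is
    processed there, and the drop message travels back.
    """
    hold_us = 2 * one_way_us + processing_us
    return link_bps * hold_us // 1_000_000


def buffer_bytes(link_bps, reliable_percent, rtt_s, recovery_s):
    """Bytes a buffering router must hold.

    Only the share of traffic marked for recovery is buffered, for
    the round-trip time plus the allowed recovery time.
    """
    hold_s = rtt_s + recovery_s
    return link_bps * reliable_percent * hold_s // (100 * 8)


if __name__ == "__main__":
    gigabit = 10**9
    # 0.3 ms one way, 0.4 ms processing: 1 ms hold time, 1 Mbit
    print(cache_bits(gigabit, one_way_us=300, processing_us=400))
    # 1 percent of 1 Gbit/s held for 10 s + 10 s: 25 Mbyte
    print(buffer_bytes(gigabit, reliable_percent=1,
                       rtt_s=10, recovery_s=10))
```

   Both sizes scale linearly with link speed and holding time, so the
   share of traffic marked for recovery is the main lever for keeping
   the buffer small.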

   If we increase the recovery time to 3 minutes, the buffer size
   becomes around 250 Mbyte, and we believe a buffer of this size is
   not difficult to implement.

Authors:

   Man Yeob Lim, Mr
   InfoCom Eng. Dept.
   Chungnam National University
   Daejeon 305-764
   Korea
   Phone: +82 42 821 3544
   Fax:   +82 42 821 2225
   Email: mylim@sunam.kreonet.re.kr

   Dae Young Kim, Prof.
   InfoCom Eng. Dept.
   Chungnam National University
   Daejeon 305-764
   Korea
   Phone: +82 42 821 6862
   Fax:   +82 42 823 5586
   Email: dykim@ccl.chungnam.ac.kr
   http://ccl.chungnam.ac.kr/~dykim/

REFERENCES

   [1] S. Deering, Host Extensions for IP Multicasting, RFC 1112,
       Jan. 1989.

   [2] S. Kasera, J. Kurose, and D. Towsley, Scalable reliable
       multicast using multiple multicast groups, Proc. ACM
       Sigmetrics Conference, 1997.

   [3] J. M. Chang and N. F. Maxemchuk, Reliable broadcast protocol,
       ACM Trans. Computer Systems, 2(3):251-273, August 1984.

   [4] S. Armstrong, A. Freier, and K. Marzullo, Multicast Transport
       Protocol, RFC 1301, Feb. 1992.

   [5] B. Whetten, T. Montgomery, and S. Kaplan, A high performance
       totally ordered multicast protocol, Theory and Practice in
       Distributed Systems, Springer Verlag, LNCS 938.

   [6] C. Papadopoulos, G. Parulkar, and G. Varghese, An error
       control scheme for large-scale multicast applications,
       Washington University, St. Louis.

   [7] M. Yajnik, J. Kurose, and D. Towsley, Packet loss correlation
       in the Mbone multicast network, University of Massachusetts at
       Amherst.