INTERNET-DRAFT                                        Katsushi Kobayashi
draft-kobayashi-sirens-00.txt                                       NICT
Expires February 2006                                    August 24, 2005

         Simple Internet REsource Notification System (SIRENS)
                        framework and protocol

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on February xx, 2006.

Copyright Notice

Copyright (C) The Internet Society (2005).

Abstract

This document specifies the Simple Internet REsource Notification System (SIRENS) framework and protocol. The SIRENS framework is intended to improve end-to-end communication throughput by using network information that routers provide explicitly. In the seven-layer model, the SIRENS protocol layer is located between the network layer and the transport layer. This document also describes how SIRENS interacts with upper-layer transport protocols such as TCP.

1. Introduction

This document specifies the Simple Internet REsource Notification System (SIRENS) framework and protocol.
Kobayashi                 Expires February 2006                 [Page 1]

Internet Draft     SIRENS framework and protocol     August 24, 2005

SIRENS is an in-band network path surveying system with an explicit router response mechanism, similar to the previously proposed XCP and TCP Quick-Start. With SIRENS, an end host surveys the communication path either end-to-end or hop-by-hop, and the information about the path is updated immediately. The end host can therefore keep its view of the path up to date and optimize its communication behavior accordingly.

In TCP communication, the maximum size of the congestion window should be set to the product of the bottleneck bandwidth and the round trip time. In standard TCP, the bottleneck bandwidth must be inferred from implicit responses from the network, and the inferred value is not sufficiently accurate. With SIRENS, the end host obtains the bottleneck bandwidth from an explicit response. A router can also inform the end host of the traffic behavior preferred by the operator's policy or dictated by current network conditions.

2. Motivation and requirements for explicit network information feedback system

As network applications extend into new areas, they place many kinds of requirements on end-to-end transport. Meeting these requirements demands sophisticated transport control that follows end-to-end network conditions. Almost all current Internet transport standards infer network conditions from implicit responses. Examples include packet losses detected through discontinuous sequence numbers, changes in round trip time (RTT) and time to live (TTL), and path MTU discovery with ICMP "Packet Too Big" messages. Only a few explicit responses, such as ECN, have been standardized for transport. The network conditions inferred with these methods are insufficient to satisfy application requirements. A more accurate system that surveys network conditions explicitly is therefore needed.
Additive increase and multiplicative decrease (AIMD) congestion control has been a great success for TCP. With the TCP NewReno implementation, the congestion window grows slowly in the congestion avoidance phase, by only one segment per RTT. Consider a path with a bottleneck bandwidth of 10 Gbps, an RTT of 100 ms, and a segment size of 1500 bytes. Its bandwidth-delay product is about 83,333 segments. The sending rate therefore needs roughly 41,667 RTTs, i.e., more than an hour, to grow to half the bottleneck bandwidth. This long ramp-up time prevents TCP from fully using high-bandwidth links.

As network bandwidth increases, efforts to improve TCP performance continue. More aggressive rate control and bottleneck prediction approaches have been proposed for networks with large bandwidth-delay products [HSTCP, SCTCP, and FAST]. However, all of these approaches cause packet losses or RTT changes, because the TCP sender itself induces congestion while estimating the bottleneck bandwidth. If routers reported network path information explicitly, this self-induced congestion could be avoided and TCP would be expected to perform better.

AIMD control as used in TCP is also difficult to apply to real-time A/V stream communication. The data rate of a stream application depends on its encoding method and usually has CBR-like traffic characteristics. When packet loss is detected, the sender can easily reduce the data rate multiplicatively by switching to a lower-quality encoding. However, the data rate cannot be increased additively; it must be changed in steps. If the end host had a real-time system for surveying the bottleneck bandwidth, the sender would know when to switch to a higher data rate, i.e., a higher-quality encoding format.
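The ramp-up time of congestion avoidance can be computed directly from the bandwidth-delay product. The Python sketch below does this arithmetic under its own simplifying assumptions. The window starts near zero and grows by exactly one segment per RTT, with no delayed ACKs and no losses. The function name `ca_ramp_up_seconds` is hypothetical.

```python
def ca_ramp_up_seconds(bottleneck_bps: float, rtt_s: float,
                       segment_bytes: int, target_fraction: float = 0.5) -> float:
    """Hypothetical helper: seconds for congestion avoidance (+1 segment per
    RTT) to grow the window from ~0 to target_fraction of the BDP."""
    # Bandwidth-delay product expressed in segments.
    bdp_segments = bottleneck_bps * rtt_s / (8 * segment_bytes)
    # One segment is added per RTT, so this many RTTs are needed.
    rtts_needed = target_fraction * bdp_segments
    return rtts_needed * rtt_s


seconds = ca_ramp_up_seconds(10e9, 0.1, 1500)
print(f"{seconds:.0f} s = {seconds / 3600:.2f} h")
# prints: 4167 s = 1.16 h
```

Delayed ACKs would roughly double this time, because the window then grows by about one segment every two RTTs.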
End hosts need an explicit feedback system that lets them obtain end-to-end path information through a specific probe message. An end host sends a probe message toward a destination IP address. A router that receives the probe message reacts by updating the message's attributes and then forwards it toward the destination. The destination host finally receives the probe message and obtains the network information collected along the path.

The following sections describe the requirements for an explicit router feedback system on the Internet. More network information can improve transport performance, but surveying network paths also costs more. The explicit router feedback mechanism should therefore be carefully designed so that it improves transport while keeping the cost of the system low.

2.1 Gradual deployment possibility

An explicit router feedback system requires support from both end hosts and routers. Replacing all Internet hosts and routers in the short term is difficult, so the system should work well even when it is only partially deployed. A probe message sent on a hop-by-hop basis, like an RSVP RESV message [RFC2205], cannot pass through legacy clouds. In a legacy cloud, the probe message should be forwarded like a conventional IP packet. Its destination address should therefore be the same as that of the data stream.

2.2 In-band or out-of-band

If the probe message can be conveyed within data packets, the approach is in-band path surveying. If the probe message cannot be encoded into the data stream, the approach is out-of-band.
Explicit congestion notification (ECN) [RFC3168] is an in-band approach. One Pass With Advertising (OPWA) with RSVP [RFC2205] is an out-of-band approach. The out-of-band approach requires maintaining two streams: the data stream and the probe message stream. For firewall management, it also makes policies more complicated than the in-band approach does. The main cost of the in-band approach is the larger packet size caused by the probe message field added to each IP packet. If this overhead is acceptable, the in-band approach is preferable.

2.3 Maintenance of flow state

An explicit network information feedback system should work not only in edge networks but also in the Internet backbone, which carries a very large number of streams. If the architecture required routers to keep per-flow state, the number of states on a router would grow to the same order as the number of streams. This would limit the scalability of the system. Routers in the system must therefore not be required to maintain per-flow state.

2.4 Resource allocation

Some similar efforts [XCP, TCP-QS] consider temporary bandwidth allocation that does not disrupt cross traffic necessary to maximize end-to-end throughput. Such resource allocation may become a target of malicious attacks that aim to exhaust resources. Accumulative resource allocation should therefore be evaluated carefully before it is introduced, even if each allocation expires after a short time.

2.5 End host decides what information to request and when

The bottleneck bandwidth is important information for efficient communication in best-effort networks. However, a bandwidth parameter alone cannot characterize network conditions.
In the TCP slow-start phase, the queue size at the bottleneck router interface matters more than the bandwidth. Acknowledgments return together, so the congestion window grows in steps once per RTT, and each step sends a chunk of packets as a burst. Serious packet losses can occur when a burst is larger than the buffer. To avoid such losses, the end host can obtain the size of the output queue at the bottleneck and keep each burst smaller than that queue.

The surveying system should be able to obtain information on many kinds of network conditions. Different transport protocols, implementations, and states also require different information about network conditions. The router cannot take the end host's situation into account, so the surveying system should let the end host decide what to request and when.

2.6 Sufficient accuracy to minimize overhead

More accurate network condition data may improve transport, but larger message data for better accuracy leads to larger overhead. The data format of the surveying system should balance adequate accuracy against data overhead.

2.7 Avoiding malicious users

If processing path surveying messages degrades router forwarding, malicious users may try to mount denial-of-service (DoS) attacks. The path surveying system should therefore keep the router processing cost for each message low. Even if every packet contains a path surveying message, the router must handle each one without degrading forwarding performance.

3. SIRENS protocol framework

In the OSI layer model, the SIRENS protocol sits between the network and transport layers, i.e.,
the SIRENS protocol data in a packet is located just after the IP header and before the transport header, e.g., TCP or UDP. The protocol number in the IP header is a SIRENS-specific value that should be assigned by IANA. The SIRENS protocol data contains a protocol number field for the upper transport layer, which identifies the protocol of the rest of the packet.

SIRENS-capable hosts and routers cooperate to run the SIRENS protocol. In typical unicast communication, the SIRENS host places an appropriate SIRENS probe request into a packet, e.g., for the bandwidth limit, the link delay, or the router queue size. A SIRENS-capable router that finds the SIRENS protocol number in the IP header must examine the SIRENS data and pass it to its SIRENS processing block. After processing the SIRENS data, the router forwards the packet to the next hop using normal IP forwarding. The destination SIRENS host collects the network condition data conveyed in the SIRENS header and returns it in the SIRENS response field. When the original SIRENS host receives the SIRENS response header, it learns about the network path taken by the packet it sent. The host can then optimize its network behavior based on that information.

By sending SIRENS probe packets continually at short intervals, the SIRENS host can keep its view of network conditions up to date. The host can therefore adapt immediately to changing network conditions, e.g., a change in the communication path or the termination of competing flows.

The SIRENS protocol specifies two probe modes: the watermark mode and the profile mode. The watermark mode obtains an upper or lower limit of a network resource along the data path.
This mode provides the same function as other explicit network condition feedback frameworks that manipulate a special header, e.g., XCP and TCP Quick-Start. The profile mode lets the end host depict the conditions of the communication path hop by hop.

+-------+                 |                 +-------+
|       |--------------\  V  /--------------|       |
|SIRENS |               \---/               |SIRENS |
|Host A |               /---\               |Host B |
|       |--------------/  ^  \--------------|       |
+-------+                 |                 +-------+

(a) Watermark mode: to determine the bottleneck anywhere along the path.

                                          |
                                          V
+-------+       +------+       +------+       +------+       +-------+
|       |-------|      |-------|      |       |      |-------|       |
|SIRENS |       |SIRENS|       |SIRENS|_______|SIRENS|       |SIRENS |
|Host A |       |Router|       |Router|-------|Router|       |Host B |
|       |-------|      |-------|      |       |      |-------|       |
+-------+       +------+       +------+       +------+       +-------+

(b) Profile mode: to depict the path with per-hop granularity. The bottleneck is the output interface of the second-hop router.

Fig. 1 SIRENS watermark and profile modes

3.1 Watermark mode

The watermark mode can survey the entire end-to-end network path with a single SIRENS request packet. It is typically used to discover the bottleneck bandwidth or the minimum MTU of a network path. The sender places the expected value of a communication parameter in the SIRENS probe header. Each router that receives the packet decides whether it can provide the service quality specified by that parameter. If it cannot, it rewrites the parameter field with the value it can provide; otherwise, it leaves the field untouched. After the packet has passed through the SIRENS-capable routers, the destination host knows the upper or lower limit of the service quality available on the communication path. The watermark mode follows the same idea as other explicit router response mechanisms such as XCP and TCP Quick-Start.
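Watermark processing along a path can be modelled as a single pass over the routers. The Python sketch below is illustrative only. It assumes each router's internal value is already a plain comparable number. It also represents a legacy (non-SIRENS) router as `None`, which forwards the packet without touching the field, in line with the gradual-deployment requirement of Section 2.1. The function name `watermark_survey` is hypothetical.

```python
from typing import Optional, Sequence


def watermark_survey(initial: int, router_values: Sequence[Optional[int]],
                     mode: str = "min") -> int:
    """Return the parameter value that reaches the destination.

    router_values holds one entry per hop; None marks a legacy router that
    forwards the packet as plain IP without touching the SIRENS header.
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    value = initial
    for router_value in router_values:
        if router_value is None:
            continue  # legacy router: no SIRENS processing
        if mode == "min" and router_value < value:
            value = router_value  # router cannot support the requested value
        elif mode == "max" and router_value > value:
            value = router_value
    return value


# The sender asks for a 9000-octet MTU; the path can only carry 1500.
print(watermark_survey(9000, [9000, 1500, None, 4470]))
# prints: 1500
```

Each router needs only one comparison and no per-flow state, which fits the requirements in Sections 2.3 and 2.7.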
Temporary resource blocking on a router could also be supported, if needed.

3.2 Profile mode

The profile mode surveys the network path with hop-by-hop granularity. The sender changes the TTL value in the SIRENS header sequentially. A router that receives a profile mode packet compares the TTL value in the SIRENS header with the TTL value in the IP header. If the two values match, the router rewrites the parameter field with the value it can provide and forwards the rewritten packet. Otherwise, it leaves the SIRENS header untouched. Once the destination host has collected SIRENS headers for all TTL values, it knows the conditions along the whole network path. This information could also be sent back to the sender host in a reverse-direction message.

In the profile mode, surveying the end-to-end path takes as many packets as there are router hops. Sending these packets therefore takes longer than sending the single packet of the watermark mode. Consider a host surveying the end-to-end path in profile mode with a bandwidth of 100 Mbps, an RTT of 100 ms, and an MTU of 1500 bytes. It sends more than 800 packets (100 ms x 100 Mbps / (8 x 1500 bytes)) during each RTT. With the largest possible number of hops, 256, the survey therefore takes about 1.3 times the RTT (1 + 256/833). In the watermark mode, the survey takes exactly one RTT. The difference between the two modes is thus only about 0.3 times the RTT, which may be acceptable in exchange for hop-by-hop detail. In this example, the sender can also survey the whole path about three times in each RTT.

4. SIRENS protocol

The SIRENS header is placed immediately after the IP header, because it must come before other transport layer headers such as TCP or UDP. The IP protocol number in a SIRENS packet is a SIRENS-specific number that should be assigned by IANA.
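The profile-mode survey cost estimated in Section 3.2 can be reproduced numerically. The model in this Python sketch is deliberately simple. Probe packets are assumed to leave back to back at the path bandwidth, one per hop, and the survey completes one RTT after the last probe is sent. The function name is hypothetical.

```python
import math


def profile_survey_cost(bandwidth_bps: float, rtt_s: float,
                        mtu_bytes: int, hops: int):
    """Hypothetical helper returning (packets per RTT,
    survey delay in RTTs, complete surveys per RTT)."""
    # Number of MTU-sized packets the sender emits during one RTT.
    packets_per_rtt = bandwidth_bps * rtt_s / (8 * mtu_bytes)
    # The last of `hops` probes leaves hops/packets_per_rtt RTTs after the
    # first one, and its result needs one more RTT to come back.
    delay_rtts = 1 + hops / packets_per_rtt
    # Whole-path surveys that fit into one RTT of sending time.
    surveys_per_rtt = math.floor(packets_per_rtt / hops)
    return packets_per_rtt, delay_rtts, surveys_per_rtt


ppr, delay, surveys = profile_survey_cost(100e6, 0.1, 1500, 256)
print(f"{ppr:.0f} packets/RTT, delay {delay:.2f} RTT, {surveys} surveys/RTT")
# prints: 833 packets/RTT, delay 1.31 RTT, 3 surveys/RTT
```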
The SIRENS header consists of three parts: the SIRENS control block, the SIRENS request data block, and the SIRENS response data block. Fig. 2 shows the structure of a SIRENS packet.

        +========================================================+
        | IP header, Protocol = SIRENS (To be allocated by IANA) |
        +========================================================+
   ^    | SIRENS control block including                         |
   |    | Protocol ID for successive payload (e.g. TCP/UDP)      |
 SIRENS +--------------------------------------------------------+
 header | SIRENS request block                                   |
   |    +--------------------------------------------------------+
   |    | SIRENS response block                                  |
   V    =........................................................=
        +========================================================+
        | Header and Payload for the protocol                    |
        | shown in SIRENS control block                          |
        =........................................................=
        =........................................................=
        +========================================================+

Fig. 2 SIRENS header in IP packet

4.1 SIRENS header

Fig. 3 shows the SIRENS header format.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---+---+-------+---------------+---------------+---------------+
|V=1|   |req.mod|  req. probe   |   req. TTL    |   Protocol    |
+-------+-------+---------------+---------------+---------------+
|res.len|res.mod|  res. probe   |   res. TTL    |   reserved    |
+===============================================================+
|                      SIRENS request data                      |
+===============================================================+
|                   SIRENS response data [0]                    |
+---------------------------------------------------------------+
|                   SIRENS response data [1]                    |
+---------------------------------------------------------------+
.................................................................
+---------------------------------------------------------------+
|               SIRENS response data [n: n <= 15]               |
+---------------------------------------------------------------+

Fig. 3 SIRENS header structure

The first eight octets form the SIRENS control block, and the following four octets form the SIRENS request data block. Every SIRENS packet must contain both blocks, so the minimum size of the SIRENS header is twelve octets. SIRENS response data blocks optionally follow the SIRENS request data.

The fields of the SIRENS header (Fig. 3) are defined as follows:

version (V): 2 bits

   This field gives the version of SIRENS. The version specified in this document is one (1).

request mode (req.mode): 4 bits

   This field specifies the SIRENS request mode. A SIRENS router examines this field to decide how to treat the packet. The request mode values are defined as follows:

   req. mode | Name     | Router action
   ----------+----------+--------------------------------------------------
   0, 8      | NOP      | No action.
   ----------+----------+--------------------------------------------------
   1         | update   | Sub-mode of the profile mode. If the req. TTL
             |          | value equals the IP TTL, the router overwrites
             |          | the req. data block with its internal value
             |          | corresponding to the req. probe.
   ----------+----------+--------------------------------------------------
   2         | min.     | Sub-mode of the watermark mode. If the router's
             |          | internal value corresponding to the req. probe
             |          | is less than the req. data block, the router
             |          | overwrites the block with its own value.
   ----------+----------+--------------------------------------------------
   3         | max.     | Sub-mode of the watermark mode. If the router's
             |          | internal value corresponding to the req. probe
             |          | is greater than the req. data block, the router
             |          | overwrites the block with its own value.
   ----------+----------+--------------------------------------------------
   4..15     | reserved |
   ----------+----------+--------------------------------------------------

request probe ID (req. probe): 8 bits

   This field identifies the request data block that follows the SIRENS control block. The probe field values are explained in Section 4.2.

request TTL (req. TTL): 8 bits

   The use of this field depends on the request mode. In the profile mode, i.e., request mode one "(1) update", an intermediate SIRENS router compares this field with the TTL value in the IP header. If the two values match, the router overwrites the SIRENS request data block with its internal value corresponding to the request probe ID. In the watermark modes, i.e., request mode two "(2) min." or three "(3) max.", the router decrements the request TTL value by one. If the value before the decrement is zero (0), the SIRENS router must wrap it around, so the new value is 255. In the watermark modes, this field can be used to estimate how many SIRENS-capable routers are on the path.

Protocol: 8 bits

   This field specifies the protocol of the data that follows the SIRENS header in the packet. Its values are the same as the IP protocol numbers [RFC791].

response mode (res.mode): 4 bits

   This field specifies the mode of the SIRENS response. The response mode values have the same meanings as the request mode values described above.

response length (res.len): 4 bits

   This field specifies the number of response data blocks that follow the request data block.

response probe ID (res. probe): 8 bits

   This field identifies the response data blocks.
   The response probe field values have the same meanings as the request probe values.

response TTL (res. TTL): 8 bits

   The use of this field depends on the response mode. If the response mode is one "(1) update", this field gives the TTL of the first response data block. If the response mode is two "(2) min." or three "(3) max.", the SIRENS sender sets this field to the difference between the request TTL value and the IP header TTL value of the most recently received packet.

SIRENS request data: 32 bits

   This field carries network condition information for the whole path or for a specific hop, e.g., the bandwidth limit, the packet loss rate, or the queue length at the output interface. The SIRENS request sender fills in this field, and any router along the communication path may overwrite it. The meaning of the SIRENS request data block depends on the SIRENS request probe value, as explained in Section 4.2.

SIRENS response data[n]: 32 bits, 0 < n <= 15

   This field carries information fed back from the other end. Only the SIRENS response sender may write this field, with either the collected values or its preferred values. Routers should not overwrite this field. The meaning of the SIRENS response data block depends on the SIRENS response probe value.

4.2 SIRENS probe field and data block

The probe and data fields have the same format and meaning in both the SIRENS request and response blocks. A SIRENS data field consists of two 16-bit sub-fields. When the request is in the max. or min. mode, the router must compare each sub-field value with its internal data. If the comparison shows that only one sub-field should be updated, the router updates that sub-field and leaves the other unchanged.
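A single router's handling of a watermark-mode request can be sketched as follows. This Python example illustrates the rules under stated assumptions; it is not the normative processing. The two 16-bit sub-fields are assumed to be already decoded into comparable numbers, since the 16-bit float format of Appendix A is not reproduced here. The function name `process_watermark` is hypothetical. The function applies the min./max. rule to each sub-field independently. It then decrements the request TTL, wrapping from 0 to 255 as the req. TTL field definition requires.

```python
from typing import Tuple

MODE_MIN, MODE_MAX = 2, 3  # request mode values from the req. mode table


def process_watermark(req_mode: int, req_ttl: int,
                      data: Tuple[float, float],
                      router: Tuple[float, float]) -> Tuple[int, Tuple[float, float]]:
    """Return (new req. TTL, new data sub-fields) after one SIRENS router.

    Assumes both sub-fields are already decoded into comparable numbers.
    """
    if req_mode == MODE_MIN:
        pick = min  # overwrite only where the router value is smaller
    elif req_mode == MODE_MAX:
        pick = max  # overwrite only where the router value is larger
    else:
        raise ValueError("not a watermark request mode")
    # Each 16-bit sub-field is compared and updated independently.
    new_data = (pick(data[0], router[0]), pick(data[1], router[1]))
    # Decrement by one; a value of 0 wraps around to 255.
    new_ttl = (req_ttl - 1) % 256
    return new_ttl, new_data


print(process_watermark(MODE_MIN, 0, (1000.0, 400.0), (800.0, 600.0)))
# prints: (255, (800.0, 400.0))
```

Routers without SIRENS support leave the request TTL untouched. The receiver can therefore subtract the received value from the sender's initial value, modulo 256, to count the SIRENS-capable routers that processed the packet.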
The format of the probe field is as follows:

 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|             |d|
|  probe ID   |i|
|             |r|
+-+-+-+-+-+-+-+-+

Dir. (direction): 1 bit

   This bit tells the router which interface parameters to use. If the direction bit is zero (0), the router uses the parameters of the outgoing interface. For a multicast packet with more than one outgoing interface, the router must use each outgoing interface's parameters for the copy sent on that interface. If the direction bit is one (1), the router uses the parameters of the interface on which the packet arrived.

Probe ID: 7 bits

   This field specifies what kind of network information the sender expects, i.e., what is encoded in the data blocks. The probe IDs currently defined and the formats of their data blocks are described below.

4.2.1 Null: Probe ID = 0

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------------------------------------------------------+
|                             all 0                             |
+---------------------------------------------------------------+

This is null data; the SIRENS router takes no action.

4.2.2 Link capacity: Probe ID = 1

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-------------------------------+-------------------------------+
| Link capacity (Bytes/sec.)    |Available capacity (Bytes/sec.)|
+-------------------------------+-------------------------------+

This probe carries link bandwidth information and is used in both the watermark and profile modes. The link capacity is the physical bandwidth limit of the router interface. The available capacity is the difference between the physical bandwidth limit and the current utilization. Both fields use the 16-bit float format described in Appendix A.
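A small encoder exercises the bit layout of Fig. 3 and of the probe field. The Python sketch below illustrates the layout under several assumptions; it is not a reference implementation. It writes the two unlabelled bits after the version field and the reserved octet as zero. It takes the request and response data blocks as already-encoded 32-bit words. The names `probe_field` and `pack_sirens_header` are hypothetical. All words are written in network byte order.

```python
import struct
from typing import Sequence

SIRENS_VERSION = 1


def _check(name: str, value: int, bits: int) -> None:
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} must fit in {bits} bits")


def probe_field(probe_id: int, incoming: bool = False) -> int:
    """Probe ID in bits 0-6, direction bit in bit 7 (least significant)."""
    _check("probe ID", probe_id, 7)
    return (probe_id << 1) | int(incoming)


def pack_sirens_header(req_mode: int, req_probe: int, req_ttl: int,
                       protocol: int, req_data: int,
                       res_mode: int = 0, res_probe: int = 0, res_ttl: int = 0,
                       res_data: Sequence[int] = ()) -> bytes:
    for name, value, bits in (("req. mode", req_mode, 4), ("req. probe", req_probe, 8),
                              ("req. TTL", req_ttl, 8), ("protocol", protocol, 8),
                              ("req. data", req_data, 32), ("res. mode", res_mode, 4),
                              ("res. probe", res_probe, 8), ("res. TTL", res_ttl, 8)):
        _check(name, value, bits)
    if len(res_data) > 15:
        raise ValueError("res.len allows at most 15 response data blocks")
    for word in res_data:
        _check("response data", word, 32)
    # V(2) | unused(2) | req.mode(4) | req.probe(8) | req.TTL(8) | Protocol(8)
    word0 = ((SIRENS_VERSION << 30) | (req_mode << 24) | (req_probe << 16)
             | (req_ttl << 8) | protocol)
    # res.len(4) | res.mode(4) | res.probe(8) | res.TTL(8) | reserved(8)
    word1 = ((len(res_data) << 28) | (res_mode << 24) | (res_probe << 16)
             | (res_ttl << 8))
    words = [word0, word1, req_data, *res_data]
    return struct.pack("!" + "I" * len(words), *words)


# A min.-mode link capacity request (probe ID 1, outgoing interface)
# carried in front of a TCP segment (IP protocol number 6).
hdr = pack_sirens_header(req_mode=2, req_probe=probe_field(1), req_ttl=255,
                         protocol=6, req_data=0xFFFFFFFF)
print(len(hdr), hdr[:4].hex())
# prints: 12 4202ff06
```

The twelve-octet result matches the minimum header size of a control block plus one request data block.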
4.2.3 Queue in packet count: Probe ID = 2

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-------------------------------+-------------------------------+
| Queue length (packets)        |Available queue length(packets)|
+-------------------------------+-------------------------------+

This probe carries the queue length of the interface, counted in packets, and is used in both the watermark and profile modes. The queue length (packets) is the maximum queue size before the router discards a packet. The available queue length (packets) is the part of the queue still free to accept new traffic. Typically, this is the difference between the queue length (packets) and the current queue utilization. Both fields use the 16-bit float format. If the router uses active queue management instead of tail drop, the available queue length (packets) might be negative.

4.2.3 Queue in Kbytes: Probe ID = 3

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-------------------------------+-------------------------------+
| Queue length (Kbyte)          | Available queue length (Kbyte)|
+-------------------------------+-------------------------------+

This probe carries the queue length of the interface, counted in Kbytes, and is used in both the watermark and profile modes. The queue length (Kbyte) is the maximum queue size before the router discards a packet. The available queue length (Kbyte) is the part of the queue still free to accept new traffic. Typically, this is the difference between the queue length (Kbyte) and the current queue utilization. Both fields use the 16-bit float format. If the router uses active queue management instead of tail drop, the available queue length (Kbyte) might be negative.
4.2.4 Loss: Probe ID = 4

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-------------------------------+-------------------------------+
| Loss on link                  | Loss on queue                 |
+-------------------------------+-------------------------------+

This probe carries the packet loss probability on the link and in the router queue, and is used in both the watermark and profile modes. The loss on link value is the packet loss probability caused mainly by link errors. The loss on queue value is the packet loss probability caused by router queue overflow. Both fields use the 16-bit float format.

4.2.5 MTU: Probe ID = 5

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-------------------------------+-------------------------------+
| MTU                           | NULL                          |
+-------------------------------+-------------------------------+

This probe carries the link MTU size in octets and is used in both the watermark and profile modes. The MTU value is an unsigned 16-bit integer. An MTU value of zero represents 65536 or more.

4.2.6 Delay: Probe ID = 6

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-------------------------------+-------------------------------+
| Delay on link                 | Delay in maximum              |
+-------------------------------+-------------------------------+

This probe carries link delay and is used in both the watermark and profile modes. The delay on link value is the delay in microseconds (1/1,000,000 second) under the best conditions, e.g., with no collisions on an Ethernet. The delay in maximum value is the delay in milliseconds under the worst conditions. It includes both link-related delay and delay in the router's internal buffers or queues. Both fields use the 16-bit float format.
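In the MTU probe (probe ID 5), a value of zero stands for 65536 octets or more. A router handling a min.-mode MTU request therefore cannot compare the raw 16-bit values directly, because a naive comparison would treat 0 as the smallest MTU. The Python sketch below decodes the field before comparing. Its function names are hypothetical. It assumes the MTU occupies the upper 16 bits of the data block and the lower NULL sub-field is zero, as in the format above.

```python
def decode_mtu(raw: int) -> int:
    """Raw 16-bit MTU sub-field -> octets (0 means 65536 or more)."""
    if not 0 <= raw <= 0xFFFF:
        raise ValueError("MTU sub-field is 16 bits")
    return 65536 if raw == 0 else raw


def encode_mtu(mtu: int) -> int:
    """Octets -> raw 16-bit sub-field; 65536 and above are encoded as 0."""
    if mtu <= 0:
        raise ValueError("MTU must be positive")
    return 0 if mtu >= 65536 else mtu


def min_mode_mtu(data_block: int, router_mtu: int) -> int:
    """Min.-mode update of a 32-bit MTU data block.

    When the router rewrites the block, the NULL lower half is written as 0.
    """
    current = decode_mtu(data_block >> 16)
    if router_mtu < current:
        return encode_mtu(router_mtu) << 16
    return data_block


block = encode_mtu(70000) << 16  # sender asks for "as large as possible"
for hop_mtu in (65536, 9000, 1500, 4470):
    block = min_mode_mtu(block, hop_mtu)
print(decode_mtu(block >> 16))
# prints: 1500
```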
4.2.7 Router ID: Probe ID = 7

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------------------------------------------------------+
|                           Router ID                           |
+---------------------------------------------------------------+

This probe identifies the routers on the path and is used in the profile mode. The router ID is a 32-bit identifier for the router. It should remain unchanged for at least the duration of a single transport session. The router ID used by routing protocols could serve this purpose. With this information, the end host can detect path changes even when the number of hops stays the same.

5. Use scenario with SIRENS protocol

The goal of the SIRENS protocol is to improve the performance of end-to-end communication. The previous sections described the SIRENS protocol and framework. However, the SIRENS protocol only lets an end system obtain network information as explicit data; it does not by itself improve communication performance. To improve performance, the transport protocols and applications in the end system must take into account the network information obtained through SIRENS. The following sections give examples of how end systems can do so.

5.1 TCP performance on long fat pipe networks

Achieving maximum throughput on a long fat network is difficult for a transport system. If the TCP (NewReno) stack knows the bottleneck bandwidth of the path, it can set an optimal maximum congestion window from the bandwidth-delay product with the RTT. If it knows the available headroom bandwidth, it can adjust its window increments to match that headroom. In this way, SIRENS can help optimize TCP behavior.

When SIRENS is used with TCP, the TCP client sends a TCP SYN packet carrying a SIRENS header with a link capacity request in the watermark mode.
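The TCP scenario in Section 5.1 uses two kinds of SIRENS results. The first is the bottleneck link capacity from a watermark-mode request, in bytes per second as for probe ID 1. The second is the per-hop queue lengths in packets from profile-mode requests (probe ID 2). The Python sketch below shows one simple way a sender might turn these into window parameters, assuming the values are already decoded from the 16-bit float format. Capping the slow start threshold at the smallest queue on the path is an illustrative policy, not one this document specifies. The function names are hypothetical.

```python
import math
from typing import Sequence


def max_cwnd_segments(bottleneck_bytes_per_s: float, rtt_s: float, mss: int) -> int:
    """Bandwidth-delay product in segments: the largest useful congestion window."""
    return max(1, math.ceil(bottleneck_bytes_per_s * rtt_s / mss))


def slow_start_threshold(queue_lengths_pkts: Sequence[float], cwnd_max: int) -> int:
    """Illustrative policy: stop slow start before the window (and hence a
    slow-start burst) exceeds the smallest queue reported along the path."""
    smallest_queue = min(queue_lengths_pkts)
    return max(1, min(cwnd_max, int(smallest_queue)))


cwnd_max = max_cwnd_segments(1.25e9, 0.1, 1500)  # 10 Gbps, 100 ms RTT
ssthresh = slow_start_threshold([5000, 1200, 800], cwnd_max)
print(cwnd_max, ssthresh)
# prints: 83334 800
```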
The SIRENS routers should check and rewrite the appropriate fields at every node along the path, so that the TCP server receives the SYN packet with the minimum bandwidth information rewritten by the routers. The TCP server replies with this minimum bandwidth information in a SIRENS response block carried with the TCP SYN ACK packet. The server also sends its own link capacity request to survey the reverse path. After receiving the SYN ACK packet, the TCP client chooses an optimal parameter set, such as the slow start threshold and the congestion window increment used in the congestion avoidance phase.

Even if the end host chooses an optimal slow start threshold based on the bottleneck bandwidth of the path, slow start may not reach that threshold without packet loss. This is because the rapidly increasing congestion window in slow start produces bursty traffic. These bursts can fill the queue at the bottleneck and cause unexpected packet losses, even when the link bandwidth is not fully used. In profile mode, SIRENS can map the network on a hop-by-hop basis. After obtaining the queue and link capacity profile along the path, the TCP sender can select a slow start threshold that does not fill the queue.

Although this example is for TCP, other transport protocols such as DCCP, TFRC, and SCTP could use SIRENS information in the same way.

5.2 TCP throughput on wireless network paths

When using wireless links such as 802.11, network conditions change frequently. For example, the available throughput can change when a wireless node switches base stations, i.e., during a handover. Even without a handover, the throughput can change as the wireless node moves.
In the case of node mobility, i.e., when the end host has a wireless interface and chooses base stations itself, the node can be aware of the changing conditions and can optimize its network behavior by updating its parameters. With network mobility, however, there is generally no way of detecting changing conditions along the communication path. If network conditions worsen, the host soon becomes aware of it, because packet losses and increased delays occur. However, a host has no way to detect improving conditions immediately, e.g., a better base station link or a shorter distance along the wireless path.

In addition, packet losses on wireless data links are more frequent than on wired links, due to noise or path conditions. Under current congestion control principles, a transport protocol must have a congestion avoidance mechanism [RFC2914]. Most transport protocols regard packet loss as a sign of congestion on the path and decrease their sending rate. If a transport protocol could distinguish losses caused by congestion from losses due to other causes, its response to packet loss could be adjusted accordingly.

5.3 SIRENS with Multicast

IP multicast has long been expected to serve as an infrastructure technology enabling scalable data distribution without concentrating connections on the data server. Receiver-driven Layered Multicast (RLM) provides a scalable data distribution framework using multi-rate distribution groups, without feedback messages from the receivers to the source. To find out whether a higher-rate group can be received, an RLM receiver tries to join a group with more bandwidth, much like the window increase in TCP. However, because the number of rate groups is limited, the receiver cannot move to a higher-bandwidth group gradually, so it may attempt a trial join. A trial join can cause serious congestion not only for the receiver attempting it but also for other receivers sharing the distribution tree.
SIRENS enables a host to estimate the available headroom bandwidth at per-hop granularity. An RLM receiver can then choose the optimal rate group from the network path information without creating unnecessary congestion.

6. SIRENS Protocol Deployment

6.1 SIRENS deployment with legacy infrastructure

Even if some routers are not SIRENS capable, a SIRENS packet is still forwarded to the destination. The SIRENS probe field has a direction option for obtaining information about both input-side and output-side interfaces. When a legacy router (L) has SIRENS routers (A and B) on either side, and a packet is forwarded through A, L, and B, information about L is missing. If SIRENS probes designate the output interface of A and the input interface of B, at least some information about the links between A and L and between L and B can be obtained. Although this works only in limited situations, it is helpful when deploying SIRENS.

6.2 SIRENS with layer two switch networks

SIRENS assumes that all intermediate devices on the communication path are IP routers. However, numerous layer two switches, e.g., Ethernet switches, have already been deployed, and SIRENS should work across a layer two cloud to maximize its benefits. This section describes how layer two devices could support SIRENS.

One possible way is for layer two devices to snoop into the SIRENS header and process it, as IGMP snooping does [RFC3448].

A second way would be for the SIRENS router to listen to the layer two routing protocol on the attached cloud and deduce network information from it. For example, the spanning tree protocol (STP) may be running on the Ethernet cloud. STP works by exchanging bridge protocol data units (BPDUs), and each bridge computes the root path cost from itself to
Since the cost value of each hop is defined by the bandwidth, and the root path cost is the multiplex of each hop, a device is able to suggests the bottleneck bandwidth from itself to the root bridge. The bottleneck bandwidth could be determined by combining the bandwidth information to the root bridges from both SIRENS routers on the layer two edges. Although the second way can only support a limited portion of SIRENS, it could use a legacy layer two infrastructures without any modifications.[802.1D] +-------+ +--------+ ?????????? +--------+ |SIRENS | |Ethernet| ? ? | | | Router+============+ Switch +===...===+ +===...===+ + | A |<-- BPDU -->| 1 |<-BPDU ->? ?<-BPDU ->| | +-------+SW 1 to Root+--------+ to Root? ? to Root| | ?Ethernet? |Ethernet| ? Switch ? | Switch | ? 3(?) ? |Root | +-------+ +--------+ ? ? | Bridge)| |SIRENS | |Ethernet| ? ? | | | Router+============+ Switch +===...===+ +===...===+ | | B |<-- BPDU -->| 2 |<-BPDU ->? ?<-BPDU ->| | +-------+SW 2 to Root+--------+ to Root?????????? to Root+--------+ Fig. 4. SIRENS routers are connected with a ethernet switch network. The cost between router A and B is lower-than equal to the combined cost values, SW 1 to Root, and SW 2 to root, even if SW 3 provides a direct connectivity between SW 1 and 2. 7. Security considerations As discussed in the requirement section, a SIRENS router must have sufficient performance to process the SIRENS header. Although SIRENS does not disallow the resource allocation function, the SIRENS specifications in this document do not define it. Resource allocation functions will also be victims of attacks involving over- consumed resources, and most user requests will be blocked. Therefore, the resource allocation function should be carefully designed to be robust against attempted attacks. SIRENS provides a function to survey network path information from every SIRENS capable node. 
Using the information surveyed by SIRENS, more sophisticated attacks become possible, such as bandwidth-consuming attacks matched to the link capacity. Threats of this kind exist to some degree in any best-effort packet network, because brute-force attacks that over-consume bandwidth are always possible on the Internet. The threat posed by SIRENS is therefore no higher than that posed by other system protocols.

8. IANA considerations

The SIRENS protocol requires its own protocol numbers, which should be assigned by IANA. The protocol includes a four-bit request and response mode field and a seven-bit probe ID field. Since these values should be managed as globally unique, additional values should be registered through IANA.

Appendix A. Format for 16-bit float used in these specifications

The format used to represent network parameters in this specification should support a wide range of values, e.g., network bandwidths from 9.6 Kbps to 40 Gbps or more, and loss probabilities from 10^-6 (a bit error rate in SONET) up to 100%. Although a 32-bit integer or 32-bit float format is common and can cover this range, it is more accurate than SIRENS needs for collecting network path information. From the viewpoint of bandwidth utilization, a smaller data overhead with appropriate accuracy is better. This specification therefore defines a 16-bit floating-point format as follows. A value in this format can be derived simply by cutting off the lower 16 bits of the corresponding IEEE 754 single-precision (32-bit) floating-point value.

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-------------------------------+
   |S|       E       |      M      |
   +-------------------------------+

where S (1 bit), E (8 bits), and M (7 bits) represent the sign, exponent, and mantissa. E is biased: it holds the true exponent plus 127. If all bits in E are one, the data represents infinity.
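
A sketch of this conversion in Python follows. Encoding truncates the IEEE 754 single-precision bit pattern to its upper 16 bits. Decoding applies the case formulas of this appendix directly, including the rule that an all-ones exponent means infinity (IEEE 754 would instead treat a nonzero mantissa there as NaN). The function names are hypothetical; struct and math are Python standard library modules, and struct.pack raises OverflowError for finite values beyond the single-precision range.

```python
import math
import struct


def encode_f16(x: float) -> int:
    """SIRENS 16-bit float: upper 16 bits of the IEEE 754 single-precision pattern."""
    bits32 = struct.unpack("!I", struct.pack("!f", x))[0]
    return bits32 >> 16  # truncation: the low 16 mantissa bits are dropped


def decode_f16(h: int) -> float:
    """Decode per Appendix A: S (1 bit), E (8 bits, bias 127), M (7 bits)."""
    s = (h >> 15) & 0x1
    e = (h >> 7) & 0xFF
    m = h & 0x7F
    sign = -1.0 if s else 1.0
    if e == 0xFF:  # all-ones exponent: infinity, as this appendix specifies
        return sign * math.inf
    if e == 0:     # zero when M = 0, otherwise the subnormal formula
        return sign * (m / 2**6) * 2.0**-127
    return sign * (1 + m / 2**7) * 2.0**(e - 127)


if __name__ == "__main__":
    h = encode_f16(1e10)        # 10 Gbps expressed in bit/s
    print(hex(h))               # 0x5015
    print(decode_f16(h))        # 9999220736.0
```

Because only the low mantissa bits are dropped, the relative error stays at roughly 2^-7 or less; for 10 Gbps the decoded value is low by less than 0.01%.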
   If E = 0 and M = 0:     Val = (-1)^S x 0
   If E = 0 and M != 0:    Val = (-1)^S x M/2^6 x 2^(-127)
   If 0 < E and E < 255:   Val = (-1)^S x (1 + M/2^7) x 2^(E - 127)

References

[HSTCP] S. Floyd, "HighSpeed TCP for Large Congestion Windows", RFC 3649, December 2003.

[SCTCP] T. Kelly, "Scalable TCP: Improving Performance in Highspeed Wide Area Networks", ACM SIGCOMM Computer Communication Review, 33(2):83-91, April 2003.

[FAST] C. Jin, et al., "FAST TCP: Motivation, Architecture, Algorithms, Performance", IEEE INFOCOM 2004.

[RFC2205] R. Braden, et al., "Resource ReSerVation Protocol (RSVP) -- Version 1 Functional Specification", RFC 2205, September 1997.

[RFC3168] K. Ramakrishnan, et al., "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001.

[XCP] A. Falk and D. Katabi, "XCP Specification", draft-falk-xcp-spec-00.txt (work in progress), October 2004.

[TCP-QS] A. Jain, et al., "Quick-Start for TCP and IP", draft-ietf-tsvwg-quickstart-00.txt (work in progress), May 2005.

[RFC791] J. Postel, "Internet Protocol", STD 5, RFC 791, September 1981.

[RFC2914] S. Floyd, "Congestion Control Principles", RFC 2914, September 2000.

[RFC3448] M. Handley, et al., "TCP Friendly Rate Control (TFRC): Protocol Specification", RFC 3448, January 2003.

[802.1D] IEEE, "Media Access Control (MAC) Bridges", IEEE Std 802.1D, 2004 Edition, June 2004.

Author's Address

Katsushi Kobayashi
National Institute of Information and Communications Technology
4-2-1 Nukui-Kitamachi, Koganei
Tokyo 184-8795
JAPAN
Email: ikob@koganei.wide.ad.jp

Full Copyright Statement

Copyright (C) The Internet Society (2005).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.