Internet Working Group                          Syed Misbahuddin (editor)
INTERNET DRAFT             King Fahd University of Petroleum and Minerals,
Category: Informational                                       Saudi Arabia
Expires in 6 months                                         Tariq Ibn Aziz
                           King Fahd University of Petroleum and Minerals,
                                                              Saudi Arabia
                                                       Nizar Al-Holou, PhD
                          The University of Detroit Mercy, Detroit, MI, USA
                                                                  June 2003


   Development of an Algorithm to Reduce Internet Data Traffic Congestion
                     draft-misbahuddin-data-reduc-02.txt

Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC 2026.

   This document specifies an algorithm to reduce Internet data traffic
   congestion for the Internet community, and requests discussion and
   suggestions for improvements.  A version of this draft document is
   intended for submission to the RFC Editor as a Proposed Standard for
   the Internet community.

   This document will expire six months after publication.
   Distribution of this draft is unlimited.

Copyright Notice

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

Abstract

   In the current era of information technology, data traffic over the
   Internet is increasing rapidly.  This proliferation of data traffic
   is due to a general shift towards e-business and other applications
   of information technology.  Businesses relying on the Internet lose
   billions of dollars each year due to slow or failed web services.
   Therefore, a conspicuous issue in Internet research is to develop
   methodologies that reduce traffic over the Internet.  In this
   document, an algorithm is proposed to reduce net data traffic, which
   works at the Internet layer in the TCP/IP reference model.  The
   algorithm monitors data repetition in IP datagrams and prepares a
   compression code in response to this repetition.  If no IP datagram
   content is repeated, no compression code is sent; therefore, the
   algorithm does not put any overhead on the system.  Furthermore,
   because the proposed algorithm works on IP datagrams only, it
   remains transparent to all client-server applications.

Table of Contents

   1. Introduction ..................................................
   2. Review of IP Datagram .........................................
   3. Data Reduction Algorithm Motivation ...........................
      3.1 Assumptions ...............................................
      3.2 The Algorithm .............................................
   4. Conclusion ....................................................
   5. Acknowledgments ...............................................
   Normative References .............................................
   Informative References ...........................................
   Authors' Addresses ...............................................
   Full Copyright Statement .........................................

1. Introduction

   WWW traffic is growing exponentially, with a significant shift
   towards e-commerce and increasing Internet usage in society.
   However, the exorbitant data traffic over the Internet leads to
   slow responses for net users, and consequently e-commerce web sites
   lose about $4.3 billion per annum [1].  The quality of web service
   and web response time is influenced by two factors: the web
   server's slow response and net congestion over the Internet.  The
   web server's slow response is directly related to the number of
   client hits.
   The server's response may be improved by improving web server
   performance [2].  Net congestion over the Internet may be
   controlled by two methods: web caching and data compression [1].
   In web caching, a history of web objects is maintained at caching
   servers.  When a client requests a web object, a cache server
   supplies the requested object instead of the original web server.
   The web caching mechanism helps reduce net data traffic.  However,
   web caching is still a challenging area because not all web objects
   are cacheable [3].  Data compression algorithms, in contrast, are
   applied to the information content of the web object.

   There is related work addressing the issue of net congestion.
   However, its focus has been limited to IP/UDP/RTP and TCP header
   compression for data communication over low-speed serial links,
   such as dial-up access [7, 8].  In these schemes, the sender and
   receiver each keep their own state of the header information for
   the duration of the session, and the sender sends only the
   difference information in the next packet transmission.  In the
   present Internet paradigm, net traffic is growing uncontrollably,
   which leads to significant net congestion.  Therefore, the issue of
   net congestion mandates the investigation of data compression
   algorithms for the general Internet domain.  Furthermore, it is
   observed that TCP/IP datagrams carry a significant data portion
   whose content is repeated in several situations.  This data
   repetition also motivates the investigation of data compression
   techniques for the IP datagram's data field.

   Syed Misbahuddin et al. have presented a data reduction algorithm
   for message-based data communication networks [6].  In this
   algorithm, the processors connected to a data network maintain a
   history of transmitted and received messages.  The transmitter-side
   processor prepares a compression code if some data bytes in the
   message are repeated and sends only the non-repeated data bytes to
   the network.  The receiving end reconstructs the complete message
   with the help of the message history and the received non-repeated
   data bytes.  The objective of this document is to extend the data
   reduction algorithm proposed in [6] to IP datagrams.  Here, the
   focus is on data repetition in the data field of the IP datagram.

   Multimedia data traffic is sent over UDP at a constant rate equal
   to the drain rate of the receiver.  Existing congestion control
   algorithms operate on TCP.  However, TCP-based congestion
   algorithms degrade the performance of multimedia data traffic [9].
   Therefore, multimedia data traffic requires a new approach to
   congestion control.  The data reduction algorithm presented in this
   draft works at the IP layer, on IP datagrams, at both the client
   and server ends.  Therefore, this approach can be applied equally
   to both TCP and UDP traffic, and it will not introduce a
   performance degradation problem for multimedia traffic.  The
   proposed data reduction algorithm is also conducive to controlling
   data congestion at routers, as it works before IP datagrams are
   packetized at the sending end and after they are reassembled at the
   receiving end.

   Section 2 reviews the IP datagram, and section 3 discusses the
   motivation for and the details of the proposed data reduction
   algorithm.  Finally, the conclusion is presented in section 4.
2. Review of IP Datagram

   In connectionless Internet services, web objects are broken into
   individual data packets, which are sent over the Internet
   independently.  These data packets carry information about their
   intended recipients.  At the receiving end of the Internet model,
   the receiver software combines the received packets and
   reconstructs the originally transmitted web object.  Figure 1 shows
   the general format of an IP datagram.

      +--------+-----------+
      | Header | Data Area |
      +--------+-----------+

      Figure 1: Format of an IP datagram

   An IP datagram travels independently over the net and contains a
   variable-length data field.  The size of the data field depends
   upon the application sending the data over the net.  In the current
   version of IP, version 4, a datagram varies in size from a single
   octet up to 64K octets, including the header.  The header portion
   contains routing and other informational details about the datagram
   [4].

3. Data Reduction Algorithm Motivation

   It is commonly observed that in a client-server interaction, a
   client hits a web server multiple times.  This situation is
   especially common in web-based email services, on-line shopping,
   and other e-commerce applications.  During multiple client-server
   interactions, most of the web object content may remain intact for
   a long time.  The web server uses several session variables to
   maintain the currency of an already shipped web object for a
   particular client.  These session variables are originated by the
   web clients and sent to the web server.  Furthermore, some
   commercial web servers maintain a history of the interactions from
   web clients.  For example, a server maintains a history of the
   advertisements pushed to a particular client [5].  Based upon these
   observations of repeated web object content and the availability of
   client information at the server side, a Data Reduction (DR)
   algorithm has been investigated in this document.  The DR algorithm
   is based upon the following assumptions.

3.1 Assumptions

   1. The internetworking model is connectionless and uses IP
      datagrams for information exchange.  All IP datagrams follow IP
      version 4.

   2. One bit in the "Type of Service" field of the IP version 4
      header is used to denote the data reduction process.  This bit
      is defined as the Data Reduction Bit (DRB).

   3. The algorithm is implemented at the Internet layer of the TCP/IP
      model, at both the client and the server side.

   4. Both client and server maintain limited histories of the IP
      datagrams of web objects.

   5. Each IP datagram is assigned an identification number according
      to the content of the web page.

   6. The data field of an IP datagram is at least 8 bytes long.

3.2 The Algorithm

   The algorithm divides the data field of the IP datagram into eight
   fixed groups.  For a data field of N bytes, the width of each group
   is N/8 bytes (1 byte for an 8-byte data field).  According to the
   algorithm, the web server keeps a copy of the recently transmitted
   IP datagram in a buffer called T_BUFF.  Each entry in T_BUFF
   consists of two fields: an ID field and a data field.  The ID field
   holds the information needed to identify a particular IP datagram,
   and the data field keeps a copy of the data field of the
   transmitted datagram.
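   As an informal illustration only (not part of the algorithm
   specification), the following Python sketch shows one way the
   grouping and T_BUFF bookkeeping described above could be realized.
   The names split_into_groups and remember are ours, and the sketch
   assumes, for simplicity, that the data field length is an exact
   multiple of 8 bytes.

      NUM_GROUPS = 8  # the data field is always divided into 8 groups

      def split_into_groups(data: bytes) -> list:
          """Divide an N-byte data field into eight equal-width groups
          (assumption 6 guarantees N >= 8; this sketch further assumes
          N is a multiple of 8)."""
          if len(data) < NUM_GROUPS or len(data) % NUM_GROUPS != 0:
              raise ValueError("data field must be a multiple of 8 bytes")
          width = len(data) // NUM_GROUPS
          return [data[i * width:(i + 1) * width]
                  for i in range(NUM_GROUPS)]

      # T_BUFF maps a datagram identification number (assumption 5) to
      # a copy of the most recently transmitted data field, already
      # split into groups.
      T_BUFF = {}

      def remember(datagram_id: int, data: bytes) -> None:
          """Keep a copy of the transmitted data field in T_BUFF."""
          T_BUFF[datagram_id] = split_into_groups(data)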
   When a client requests the same web object, the data reduction
   algorithm intercepts the generated IP datagram and checks for its
   presence in T_BUFF.  If T_BUFF contains a copy of the IP datagram,
   the data reduction algorithm verifies whether the data field of the
   IP datagram has changed.  If some data byte groups have not
   changed, the DR algorithm sets the DRB in the IP header to "1" to
   reflect the repetition of data bytes.  The algorithm then prepares
   an eight-bit compression code (CC) to indicate the repeated data
   byte groups in the IP datagram's data field.  In the compression
   code, a bit with a value of "1" indicates that the corresponding
   data byte group has been repeated; a bit with a value of "0"
   indicates that the data byte group is not repeated.  The
   non-repeated data byte group(s) follow the compression code in the
   data field of the IP datagram.  The indices of repeated and
   non-repeated data byte groups are determined by the bit positions
   of the compression code with values "1" or "0", respectively.  The
   IP datagram with compression code is shown in Figure 2, and the
   data reduction algorithm is summarized in Figure 3.

      +----------------------+----+------------+
      | IP header with DRB=1 | CC | Data field |
      +----------------------+----+------------+

      Figure 2: IP datagram with compression code

      For each generated IP packet pk
         If pk is the very first IP packet THEN
            1. T_BUFF(k) = pk
            2. Send the IP packet out
         Else if pk is found in T_BUFF and data byte groups are
         repeated THEN
            1. Prepare the compression code (CC)
            2. Set DRB to "1"
            3. Send CC and the non-repeated data byte groups
            4. Update T_BUFF
         Else
            Send the IP packet out normally and update T_BUFF
         End
      End

      Figure 3: Data reduction algorithm executed at the web server
                side

   The decompression algorithm running at the client side recovers the
   originally transmitted IP datagram from the received IP datagram.
   The algorithm checks the data reduction bit in the received IP
   datagram.  If this bit is "1", the process assumes that some data
   byte groups in the received datagram have been repeated and that
   copies of the repeated data byte groups are already present in
   R_BUFF at the client side.  In this case, the client interprets the
   very first byte of the received datagram's data field as the
   eight-bit compression code.  The compression code describes the
   indices of the repeated and non-repeated data byte groups.  The
   data decompression algorithm collects the non-repeated data byte
   groups from the received IP datagram and the repeated data byte
   groups from the R_BUFF buffer at the client side.  The
   reconstructed IP datagram is sent to the appropriate layer to
   reconstruct the received web page.  The data decompression process
   is summarized in Figure 4.

      For each received IP datagram pk
         If pk is the very first IP datagram THEN
            1. R_BUFF(k) = pk
            2. Utilize IP datagram pk
         Else if pk is present in R_BUFF and DRB is "1" THEN
            1. Interpret CC
            2. Collect the repeated data byte groups from R_BUFF
            3. Collect the non-repeated data byte groups from the
               received datagram
            4. R_BUFF(k) = pk
            5. Utilize IP datagram pk
         End
      End

      Figure 4: IP datagram reconstruction process at the client side
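   As an informal illustration of the server-side step in Figure 3,
   the following Python sketch compares the current data field groups
   against the copy held in T_BUFF, builds the eight-bit CC, and
   returns the DRB value together with the payload to transmit.  The
   function name build_compressed_payload and the choice of bit order
   (bit i of the CC is taken here as the i-th least significant bit)
   are ours; the draft does not specify an on-the-wire bit order.

      def build_compressed_payload(old_groups: list, new_groups: list):
          """Return (drb, payload) for the outgoing datagram.

          old_groups is the copy held in T_BUFF; new_groups is the
          freshly generated data field, both split into 8 groups."""
          cc = 0
          changed = []
          for i, (old, new) in enumerate(zip(old_groups, new_groups)):
              if old == new:
                  cc |= 1 << i          # group i repeated: CC bit i = 1
              else:
                  changed.append(new)   # group i changed: send after CC
          if cc == 0:
              # Nothing repeated: send the datagram normally, no CC.
              return 0, b"".join(new_groups)
          return 1, bytes([cc]) + b"".join(changed)

   Under these assumptions, if all eight groups repeat, the payload
   collapses to the single CC byte 0xFF; if none repeat, the data
   field is sent unchanged with DRB set to 0.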
   To illustrate the IP datagram reconstruction process at the client
   side, we assume that the IP datagram's data field is 8 bytes long.
   According to the DR algorithm, the data field is divided into eight
   groups, and each data byte group is one byte long.  Assuming groups
   0, 1, 2, and 3 are repeated in the data field of the IP datagram,
   the CC's bits 0, 1, 2, and 3 are set to "1".  To indicate the
   positions of the non-repeated data byte groups, bits 4 to 7 of the
   CC are set to "0".  The CC reflecting this situation is shown in
   Figure 5.  The IP datagram received at the client side, along with
   the compression code and the non-repeated data bytes, is shown in
   Figure 6.

      +------+------+------+------+------+------+------+------+
      | Bit0 | Bit1 | Bit2 | Bit3 | Bit4 | Bit5 | Bit6 | Bit7 |
      +------+------+------+------+------+------+------+------+
      |  1   |  1   |  1   |  1   |  0   |  0   |  0   |  0   |
      +------+------+------+------+------+------+------+------+

      Figure 5: Compression code showing the indices of repeated and
                non-repeated data byte groups

      +-------------------------------+----+-------------------------+
      | IP datagram header with DRB=1 | CC | Non-repeated data bytes |
      +-------------------------------+----+-------------------------+

      Figure 6: The IP datagram received at the client side

   In this scenario, the data decompression algorithm running at the
   web client side retrieves bytes 0, 1, 2, and 3 from R_BUFF and
   bytes 4, 5, 6, and 7 from the received IP datagram.  In this
   example, a total of 5 data byte groups is transmitted instead of 8
   (the CC plus 4 non-repeated data byte groups).  In other words, due
   to the DR algorithm, the IP datagram arrives at the client side in
   a shorter period of time.  Therefore, by applying the proposed data
   reduction algorithm, more IP datagrams can be transferred from a
   web server to a web client in a given amount of time.  The number
   of transmitted data bytes decreases as the number of repeated data
   bytes increases.  Consequently, if all data bytes are repeated, a
   single byte of compression code is sufficient to represent all data
   bytes.  If no data bytes are repeated, the compression code is not
   needed and the IP datagram transmission remains normal.
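   The reconstruction of this example can be sketched in Python as
   follows.  The function name reconstruct, the example byte values,
   and the bit order (bit i of the CC as the i-th least significant
   bit) are ours and are used only to make the Figure 5/Figure 6
   scenario concrete.

      NUM_GROUPS = 8

      def reconstruct(cc: int, received: bytes, cached_groups: list,
                      width: int = 1) -> bytes:
          """Rebuild the data field from the CC, the non-repeated
          bytes that followed it, and the copy held in R_BUFF."""
          groups = []
          pos = 0
          for i in range(NUM_GROUPS):
              if cc & (1 << i):                    # bit i = 1: repeated
                  groups.append(cached_groups[i])  # take it from R_BUFF
              else:                                # bit i = 0: was sent
                  groups.append(received[pos:pos + width])
                  pos += width
          return b"".join(groups)

      # Figure 5 example: groups 0-3 repeated, groups 4-7 transmitted.
      cached = [bytes([b]) for b in b"ABCDEFGH"]  # copy held in R_BUFF
      cc = 0b00001111                             # bits 0-3 set to "1"
      received = b"WXYZ"                          # new groups 4-7
      assert reconstruct(cc, received, cached) == b"ABCDWXYZ"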
4. Conclusion

   Internet traffic is growing due to the increasing trend of its
   application in all facets of life.  New web sites, online shopping,
   online news, online buying and selling of stocks, online banking,
   Internet email, and Internet telephony are some of the major
   contributing factors to increased net traffic and net congestion.
   It may easily be observed that in most client-server interactions,
   most of the web page content remains unchanged.  This observation
   of repeated web objects can be exploited to devise a data reduction
   algorithm.  In this document, a data reduction algorithm has been
   proposed which utilizes the content repetition of web objects.  The
   proposed algorithm generates a compression code if some data bytes
   in the IP datagram are repeated.  The web client can reconstruct
   the original IP datagrams with the help of the compression code and
   the received non-repeated data bytes.  The performance of the
   proposed data reduction algorithm has been evaluated in terms of
   queue length at the router and IP datagram delay.  The numerical
   results generated from the simulation model indicate that the
   proposed data reduction algorithm helps reduce data congestion on
   the Internet and improves web transactions.  The proposed data
   reduction algorithm can be combined with the IP header compression
   described in the literature [7, 8]; this approach may give better
   performance results.

Informative References

   [1] Mazen Zari, Hossein Saiedian, and Muhammad Naeem,
       "Understanding and Reducing Web Delays," IEEE Computer,
       December 2001, pp. 30-37.

   [4] Ed Taylor, "The Network Architecture Design Handbook,"
       McGraw-Hill, 1998.

   [5] Douglas E. Comer, "Computer Networks and Internets,"
       Prentice Hall, 1997.

   [7] V. Jacobson, "Compressing TCP/IP Headers for Low-Speed Serial
       Links," RFC 1144.

   [8] S. Casner and V. Jacobson, "Compressing IP/UDP/RTP Headers for
       Low-Speed Serial Links," draft-ietf-avt-crtp-05.txt, July 1998.

Normative References

   [2] G. Banga and J. Mogul, "Scalable Kernel Performance for
       Internet Servers under Realistic Loads," Proc. USENIX 1998
       Annual Technical Conference, USENIX, Berkeley, California,
       pp. 1-12.

   [3] J. Almeida, V. Almeida, and D. Yates, "Measuring the Behavior
       of a World Wide Web Server," Proc. 7th IFIP Conf. High
       Performance Networking, Kluwer Academic Publishers, Norwell,
       Mass., 1997, pp. 57-72.

   [6] Syed Misbahuddin, Syed M. Mahmud, and Nizar Al-Holou,
       "Development and Performance Analysis of a Data Reduction
       Algorithm for Automotive Multiplexing," IEEE Transactions on
       Vehicular Technology, Vol. 50, No. 1, January 2001.

   [9] Veselin Rakocevic, "Congestion Control Algorithms for
       Multimedia Applications in the Wireless Internet,"
       www.staff.city.ac.uk/~veselin/publications/Acta.pdf.

Authors' Addresses

   Syed Misbahuddin, Dr. Eng.
   Assistant Professor
   Department of Computer Science and Software Engineering
   Hail Community College, KFUPM
   PO Box 2440, Hail, Saudi Arabia
   Email: smisbah@kfupm.edu.sa

   Nizar Al-Holou, PhD
   Chairman
   Department of Electrical and Computer Engineering
   The University of Detroit Mercy
   4001 West McNichols, Detroit, MI 48221, USA
   Email: alholoun@udmercy.edu

   Tariq Ibn Aziz
   Lecturer
   Department of Computer Science and Software Engineering
   Hail Community College, KFUPM
   PO Box 2440, Hail, Saudi Arabia
   Email: taziz@kfupm.edu.sa

Full Copyright Statement

   Copyright (C) The Internet Society (2003).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain
   it or assist in its implementation may be prepared, copied,
   published and distributed, in whole or in part, without restriction
   of any kind, provided that the above copyright notice and this
   paragraph are included on all such copies and derivative works.
   However, this document itself may not be modified in any way, such
   as by removing the copyright notice or references to the Internet
   Society or other Internet organizations, except as needed for the
   purpose of developing Internet standards in which case the
   procedures for copyrights defined in the Internet Standards process
   must be followed, or as required to translate it into languages
   other than English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on
   an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.