INTERNET-DRAFT                                            FRANCE TELECOM
                                                       February 18, 1999
                                                          Cedric Goutard
Expires: July 18, 1999                                       Ivan Lovric
draft-lovric-francetelecom-satellites-00.txt       Eric Maschio-Esposito


               Pre-filling a cache - A satellite overview


Status of this Memo

This document is an Internet-Draft and is in full conformance with all the provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress".

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Drafts Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

Satellites are becoming major vectors for distributing information on the Internet. They are particularly useful for pre-filling caches, because they can transfer large volumes of data at high speed (up to 45 Mb/s) and deliver them simultaneously to several receiving dishes. Once this pre-filled content is in the cache, users benefit from faster access to the stored pages. In this context, the satellite improves the quality of service for the end user by making efficient use of satellite links and by transferring large volumes of data directly, and only when the traffic on the network is low.
France Telecom                Expires: July 1999                [Page 1]

Internet-Draft                                             February 1999

Table of contents

   I Introduction
   II Experiments
      II-1 Proxy-satellite experimentation
      II-2 Caches pre-filling experiments over satellite
         II-2.1 Technical solution of cache pre-filling by co-operation
            II-2.1.1 Choice of the ICP co-operation mode
            II-2.1.2 Impact of a second cache in the architecture
            II-2.1.3 Collecting the test URLs
            II-2.1.4 FTP transfer over satellite
            II-2.1.5 Results of caches pre-filling by co-operation
         II-2.2 Technical solution of pre-filling by HTTP redirection
            II-2.2.1 Principle
            II-2.2.2 Experiments of pre-filling by redirection
   III Technical description of pre-filling methods
      III-1 Technical description of the pre-filling with Wcol
         III-1.1 Description of the files structure hierarchy with Wcol
         III-1.2 Complementary files generated by Wcol
            III-1.2.1 Creation of the INFO file
            III-1.2.2 Creation of the HEAD file
         III-1.3 Complete pre-filling process with Wcol
      III-2 Technical description of the pre-filling with SQUID
         III-2.1 Description of files structure hierarchy with SQUID
            III-2.1.1 Description of the LOG file
            III-2.1.2 Localization of the storage path
         III-2.2 Content of a file stored by SQUID
         III-2.3 LOG file generation
         III-2.4 Full pre-filling process with SQUID
         III-2.5 Difficulties encountered with dates
      III-3 Description file of URLS for the pre-filling with Wcol and SQUID
   IV Advantages highlighted by these experiments
   V Next experiments
      V-1 Multicast diffusion toward several remote caches
      V-2 Automatic feeding of the HTTP server
      V-3 Diffusion services toward communities of interest
      V-4 Pre-filling using ICP extensions
   VI Partnership France Telecom - EUTELSAT
   VII References
   VIII Acknowledgments
   IX Authors' addresses

I Introduction

To meet France Telecom's internal needs for information broadcast technologies on the Internet, we chose to define services built on a satellite diffusion
infrastructure. Because of the intrinsic broadcast nature of satellites, this is the most practical way to develop this kind of service. Multicast technologies, which are designed for diffusion, have not managed to establish themselves on the market, and their deployment costs (i.e. upgrading all routers to work in multicast mode) are too high.

The satellite is becoming an unavoidable communication mode for the Internet community. Satellites make it possible to increase bandwidth without investing huge amounts of money in connectivity. Transmission speed and information diffusion are critical issues that all ISPs are trying to solve without finding effective, inexpensive means.

We began with the observation that Web traffic is asymmetrical: the amount of information returned by Web servers is much larger than the amount generated by client requests. We therefore studied how to deploy a unidirectional high-speed satellite access, used only for the return path (from servers to clients), for a company Intranet in order to decrease the traffic on its leased connections. This Intranet is built around a LAN connected to an Internet access point by frame relay, ISDN or leased lines.

II Experiments

II-1 Proxy-Satellite experimentation

During this proxy-satellite experiment, we simulated the needs of a company that has to broadcast bulky documents on its Intranet (to several offices in several regions). We assumed that this company has only limited bandwidth (64 kb/s to 256 kb/s) and does not want to change its network infrastructure (routers, etc.). In this configuration, the traffic generated by broadcasting the documents quickly saturates the network, and the other applications no longer have enough bandwidth to run correctly: all the client/server applications that regularly use part of the network resources are blocked.
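To give an order of magnitude for this saturation, the following Python sketch computes the time needed to deliver one bulky document over the link rates quoted in this memo (64 kb/s and 256 kb/s terrestrial links, up to 45 Mb/s over satellite with a GEO delay of about 300 ms). The 10-Mbyte document size is a hypothetical example; protocol overhead and terrestrial propagation delay are ignored.

```python
# Order-of-magnitude transfer times for one bulky document.
# Link figures are those quoted in this memo; the document size is a
# hypothetical example. Protocol overhead is ignored, and so is the
# (small) propagation delay of the terrestrial links.

def transfer_time(size_bytes: int, rate_bps: float, delay_s: float = 0.0) -> float:
    """Seconds to deliver size_bytes at rate_bps, plus a fixed one-way delay."""
    return delay_s + size_bytes * 8 / rate_bps


DOC_SIZE = 10_000_000  # hypothetical 10-Mbyte document

LINKS = {
    "terrestrial 64 kb/s": (64_000, 0.0),
    "terrestrial 256 kb/s": (256_000, 0.0),
    "satellite 45 Mb/s": (45_000_000, 0.3),  # about 300 ms GEO delay
}

if __name__ == "__main__":
    for name, (rate, delay) in LINKS.items():
        print(f"{name:22s} {transfer_time(DOC_SIZE, rate, delay):8.1f} s")
```

With these figures the document occupies a 64 kb/s link for 1250 s (312.5 s at 256 kb/s), during which other applications compete for the same link, whereas the satellite delivers it in roughly 2.1 s.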
We therefore modified the company's Internet/Intranet access architecture by installing a proxy on the local network, at the satellite reception point, while taking the existing parameters into account. As it is not possible to build differentiated services for Intranet use and for Internet use, this proxy concentrates all the Intranet/Internet services (HTTP, FTP, NNTP). The default routing parameters send requests (from clients to servers) over the terrestrial network, while responses (from servers to clients) are received over the satellite link.

On the client side, this solution only requires automatic configuration utilities (i.e. the proxy.pac file) so that browsers access Intranet/Internet services through the proxy. We also modified the routing table of the last router so that all packets whose destination address is that of the proxy are redirected towards the satellite uplink site.

The diagram below shows the architecture defined for our experiment.

                           ___________
                          /           \
           ///////////////X Satellite X///////////////
                          \___________/
               //                                 \\
             //                                     \\
            *                                         *
           / \                                       / \
          /___\                                     /___\
            ||                                        ||  Satellite board
  +------------------------+                    +---------+
  | ISP                    |        ISDN        |         |
  |  CACHE   REMOTE ACCESS |====================|  PROXY  |
  |                        |    FRAME RELAY     |         |
  +------------------------+                    +---------+
            ||                                       ||
       +++++++++++              LAN ================================
      +  INTERNET  +                       |                |
       +++++++++++                    +--------+       +--------+
                                      | User 1 |       | User 2 |
                                      +--------+       +--------+

Although this solution already improves response times, it remains insufficient because it does not exploit the broadcasting capabilities of satellites: the satellite is still used as a unicast communication tool in a potentially multicast environment.
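The routing change on the last router amounts to a host route for the proxy that is more specific than the route to the company LAN. The following Python sketch illustrates this with a longest-prefix match; the addresses (taken from the documentation ranges) and the link labels are hypothetical, and a real router would be configured with its own syntax.

```python
import ipaddress

# Hypothetical routing table of the last router on the ISP side.
# Addresses come from the documentation ranges and are illustrative only.
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "Internet"),
    (ipaddress.ip_network("198.51.100.0/24"), "ISDN / frame relay"),  # company LAN
    (ipaddress.ip_network("198.51.100.10/32"), "satellite uplink"),   # the proxy
]


def next_hop(destination: str) -> str:
    """Pick the outgoing link by longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    candidates = [(net, hop) for net, hop in ROUTES if addr in net]
    _, hop = max(candidates, key=lambda entry: entry[0].prefixlen)
    return hop


if __name__ == "__main__":
    for dst in ("198.51.100.10", "198.51.100.42", "203.0.113.7"):
        print(dst, "->", next_hop(dst))
```

Only packets addressed to the proxy, that is the responses it receives, match the /32 entry and take the satellite path; other traffic towards the LAN keeps using the terrestrial link.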
II-2 Caches pre-filling experiments over satellite

To make the satellite connections more efficient, we undertook to pre-fill the content of the proxy server (client side). Pre-filling is done in two main steps:

   1- Analyzing the proxy logs to identify a significant set of requested URLs.

   2- Refreshing these contents (on the ISP cache server) and preparing a pre-filling file, which is broadcast over satellite (for example as a background task during the night, when the company's activity is reduced).

When the download is complete, an application immediately installs the updated URLs on the local cache or on an HTTP server.

There are therefore two different methods to pre-fill the cache:

   - pre-filling by co-operation with another local cache;

   - pre-filling by redirecting the traffic towards a local HTTP server.

In both cases, the most frequently requested URLs are delivered directly from local storage; no backbone connection to the original remote servers is needed. Because updating the local cache or HTTP server over satellite is a unidirectional mechanism, refreshing the content does not consume any backbone bandwidth.

II-2.1 Technical solution of pre-filling by co-operation

The Netscape Proxy Server 3.5 used for the experiment cannot easily be pre-filled. Although it provides a development API, this API has no functions for acting on the cache content. Moreover, the directory hierarchy generated by the proxy server is complex and the file names are transcoded. We therefore decided to install a second cache. Because their sources are in the public domain, we chose the Wcol and Squid proxy caches: only their source files allowed us to understand how these caches manage the content they handle.
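The first pre-filling step of II-2, selecting frequently requested URLs from the proxy logs, can be sketched in Python as follows. The sketch assumes a log in the Common Log Format with the request line in double quotes; the actual Netscape Proxy log layout may differ, and `top_urls` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumed log layout (Common Log Format), one request per line, e.g.:
#   client - - [18/Feb/1999:10:00:00 +0100] "GET http://www.ft.fr/ HTTP/1.0" 200 1234
REQUEST_RE = re.compile(r'"GET (\S+) [^"]*"')


def top_urls(lines: Iterable[str], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequently requested URLs with their request counts."""
    counts: Counter = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:  # lines without a quoted GET request are ignored
            counts[match.group(1)] += 1
    return counts.most_common(n)
```

Counting only successful requests (status 200), or weighting each URL by its size, would be natural refinements; they are not shown here.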
Since optimizing communications remains a major issue, and to avoid increasing the number of remote connections, we chose to install the cache to be pre-filled on the local network at the satellite reception point. On this network, the pre-filling technique relies on the ICP protocol.

                           ___________
                          /           \
           ///////////////X Satellite X///////////////
                          \___________/
               //                                 \\
             //                                     \\
            *                                         *
           / \                                       / \
          /___\                                     /___\
            ||                                        ||
  +------------------------+                    +---------+       +---------+
  | ISP                    |        ISDN        |         |  ICP  |   PRE   |
  |  CACHE   REMOTE ACCESS |====================|  PROXY  |<----->| FILLING |
  |                        |    FRAME RELAY     |         |       |  CACHE  |
  +------------------------+                    +---------+       +---------+
            ||                                       ||
       +++++++++++              LAN ================================
      +  INTERNET  +                       |                |
       +++++++++++                    +--------+       +--------+
                                      | User 1 |       | User 2 |
                                      +--------+       +--------+

II-2.1.1 Choice of the ICP co-operation mode

The ICP protocol allows a hierarchy of co-operating caches to be defined. The usual approach is a vertical hierarchy with two or three levels of parents and children:

                       Internet
                          ||
   First Level          Parent
                        /    \
                       /      \
   Second Level    Child1      Child2
                   /   \       /   \
                  /     \     /     \
   Third Level   C3     C4   C5     C6

We preferred the child/child relationship ("sibling" mode) for the following two reasons:

   1- The pre-filling operation requires stopping the cache, running the pre-filling process and then restarting the cache. In operational mode, Internet/Intranet services must not be interrupted during updates, even temporarily; with a sibling, the main proxy keeps serving its clients while the pre-filled cache is stopped. Moreover, the content to pre-fill can quickly become voluminous and may require a relatively long processing time. Further studies should allow these parameters to be improved.

   2- In a parent/child hierarchy, the child must systematically query its parent, whatever URL it needs.
      If the parent does not have this URL in its cache, it must fetch it from the Internet. In a child/child relationship, a queried child that does not have the document simply replies with a MISS; it never connects to the network to get the document. The querying child then fetches the document directly from the Internet, so its initial way of working is unchanged.

II-2.1.2 Impact of a second cache in the architecture

The pre-filled cache complements the main proxy-cache. The two diagrams below show the change that the pre-filled cache introduces into the architecture. Sibling mode, chosen for the reasons given above, generates little traffic between the querying and the replying caches.

A - The URL is contained in the pre-filled cache.

                        2  Query (ICP)
   +--------------+  ---------------->  +--------------+
   | PROXY SERVER |                     |    WcolD     |
   |   NETSCAPE   |                     |      or      |
   |              |  <----------------  |    SQUID     |
   +--------------+       3  HIT        +--------------+
      /\      |
    1 |       | 4
      |       \/
   +--------------+
   |    CLIENT    |
   +--------------+

The pre-filled cache contains the requested URL: it replies HIT and then returns the URL. As both caches are on the company's local network, transfer times are almost immediate.

B - The URL is not contained in either cache.

       Direct to the main server
      /\      |
    4 |       | 5
      |       \/
                        2  Query (ICP)
   +--------------+  ---------------->  +--------------+
   | PROXY SERVER |                     |    WcolD     |
   |   NETSCAPE   |                     |      or      |
   |              |  <----------------  |    SQUID     |
   +--------------+       3  MISS       +--------------+
      /\      |
    1 |       | 6
      |       \/
   +--------------+
   |    CLIENT    |
   +--------------+

The pre-filled cache does not contain the requested URL. Upon receiving the MISS reply, the proxy server contacts the original server to get the URL. Despite the negative answer, the time taken by the cache to respond is negligible compared to the time needed to connect to the remote original server.
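The query/reply exchange of diagrams A and B can be illustrated with the ICPv2 message format of RFC 2186: a fixed 20-byte header followed by a NUL-terminated URL, preceded in a query by a 4-byte requester host address. The Python sketch below encodes a query, decodes a reply and applies the sibling rule described above; it is not taken from Wcol or Squid, and the helper names are hypothetical.

```python
import socket
import struct

# ICPv2 constants (RFC 2186).
ICP_VERSION = 2
ICP_OP_QUERY = 1
ICP_OP_HIT = 2
ICP_OP_MISS = 3

# Header: opcode, version, message length, request number, options,
# option data, sender host address; all in network byte order.
HEADER = struct.Struct("!BBHIII4s")
NO_ADDRESS = socket.inet_aton("0.0.0.0")


def build_query(reqnum: int, url: str) -> bytes:
    """Encode an ICP_OP_QUERY for url."""
    payload = NO_ADDRESS + url.encode("ascii") + b"\0"  # requester address + URL
    length = HEADER.size + len(payload)
    header = HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length, reqnum, 0, 0, NO_ADDRESS)
    return header + payload


def parse_reply(data: bytes) -> tuple:
    """Decode a HIT/MISS reply into (opcode, request number, URL)."""
    opcode, _version, length, reqnum, _opts, _optdata, _sender = HEADER.unpack_from(data)
    url = data[HEADER.size:length].split(b"\0", 1)[0].decode("ascii")
    return opcode, reqnum, url


def choose_source(opcode: int) -> str:
    """Sibling rule: fetch from the sibling on HIT, otherwise from the origin server."""
    return "sibling" if opcode == ICP_OP_HIT else "origin server"
```

In practice the query travels over UDP (3130 is Squid's default ICP port), and a proxy that gets no reply within a short timeout typically goes to the origin server as well, which is what keeps sibling mode harmless while the pre-filled cache is stopped.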
II-2.1.3 Collecting the test URLs

We used software that downloads all or part of a Web site. Files are recorded in their original formats (HTML, GIF, JPG, etc.) and the original path of the information is preserved, so the software effectively replicates part of the downloaded site. A ZIP compression tool is used to reduce transfer times.

Remark: We noticed that it was not easy to predict the right download depth. The pre-filled content must be selected carefully to avoid downloading documents of very little interest.

II-2.1.4 FTP transfer over satellite

For feasibility and demonstration purposes, we used an FTP server available on our experimental Internet platform. We placed a zipped file on the platform and launched the FTP download from the server hosting the satellite board and the Netscape Proxy.

II-2.1.5 Results of caches pre-filling by co-operation

The progressive integration of the different elements of the experiment (first the satellite, then the proxy-satellite pair, then proxy pre-filling) shows a steady improvement in document access times: each element of the process contributes to reducing download times. The improvement is all the more noticeable when tests are carried out on video sequences or on Web sites containing many high-definition pictures. When used in unicast mode, FTP transfers are faster. Background updates of pre-filled content considerably improve the quality of service and limit the load on remote connections.

II-2.2 Technical solution of pre-filling by HTTP redirection

This part describes an alternative to cache pre-filling by co-operation. This solution also allows URLs to be pre-fetched over satellite in order to improve the quality of service.
II-2.2.1 Principle

As a cache, the proxy is able to filter the requests it receives. This capability should also allow it to redirect those requests toward an HTTP server of our choice. The following diagram presents the principle of the pre-filling by redirection that we tested:

     +++++++++++          +-------+            +-------------+
    +           +         |       |    HTTP    |             |
   +  INTERNET   +========| PROXY |<---------->| HTTP SERVER |
    +           +         |       |            |             |
     +++++++++++          +-------+            +-------------+
                              ||                      ||
                       ========================================= LAN
                            |             |
                       +--------+    +--------+
                       | User 1 |    | User 2 |
                       +--------+    +--------+

The process can be analyzed in two distinct parts:

   1. The redirection of client requests by the cache toward an HTTP server.

   2. The feeding of the HTTP server with up-to-date documents.

All HTTP requests from a client are parsed by the cache. If a filter applies, the request is modified so that it is sent to the local HTTP server; otherwise, the request is processed normally by the cache. The request is modified according to the following principle:

   URL requested by the client from the cache:

      http://remote_server/document.html

   URL submitted by the cache to the local HTTP server (whose response is returned to the client):

      http://local_server/remote_server/document.html

Pre-filling thus consists in applying filters to client requests so that the cache requests certain documents directly from the local HTTP server.
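The filtering and mapping principle can be sketched in Python with regular expressions. The sketch models only the logic and does not reproduce the Netscape obj.conf syntax; the filter set is modeled on the www.ft.fr and www.cnet.fr examples used in this section, and the names `redirect` and `filters_from_urls` are hypothetical. A request that matches no filter is left to normal cache processing (the function returns None).

```python
import re
from typing import List, Optional, Tuple

Filter = Tuple[str, str]  # (regular expression on the requested URL, mapped URL template)

FILTERS: List[Filter] = [
    (r"http://www\.ft\.fr/", "http://server/www.ft.fr/index.html"),
    (r"http://www\.ft\.fr/(ima[12]\.gif)", r"http://server/www.ft.fr/\1"),
    (r"http://www\.cnet\.fr/(.*)", r"http://server/www.cnet.fr/\1"),  # whole site
]


def redirect(url: str, filters: List[Filter]) -> Optional[str]:
    """Return the mapped URL for the first filter matching the whole URL, else None."""
    for pattern, template in filters:
        match = re.fullmatch(pattern, url)
        if match:
            return match.expand(template)
    return None


def filters_from_urls(urls: List[str], local_server: str) -> List[Filter]:
    """One exact-match filter per pre-filled URL:
    http://remote_server/doc -> http://local_server/remote_server/doc."""
    prefix = "http://"
    return [
        (re.escape(u), f"http://{local_server}/{u[len(prefix):]}")
        for u in urls
        if u.startswith(prefix)
    ]
```

Generating exact-match filters from the list of pre-filled URLs, rather than writing broad patterns by hand, is one way to ensure that only documents actually present on the local server are redirected.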
The filters can use regular expressions; for example:

   Requested URL               | Mapped URL
   ----------------------------+------------------------------------
   http://www.ft.fr/           | http://server/www.ft.fr/index.html
   http://www.ft.fr/ima1.gif   | http://server/www.ft.fr/ima1.gif
   http://www.ft.fr/ima2.gif   | http://server/www.ft.fr/ima2.gif
   http://www.cnet.fr/         | http://server/www.cnet.fr/

These examples redirect either particular documents (those of the www.ft.fr site) or a whole site (www.cnet.fr). Filters must be written with particular care to ensure that only the documents to be pre-filled are affected, and they must be updated as soon as the content of the local HTTP server changes.

Finally, the great advantage of this type of redirection is that it is transparent to the client. The client believes it is reaching the original Web server, whereas the document it receives actually comes from another HTTP server while the address field still shows the original URL (unlike the redirection defined in the HTTP protocol, which reveals the new location to the client). The local HTTP server is regularly fed with up-to-date documents; the study of this file transfer will be the subject of a future experiment.

II-2.2.2 Experiments of pre-filling by redirection

For this experiment we used the Netscape proxy-cache 3.52 on Solaris 2.6. We chose it because it let us easily create filtering and mapping rules simply by modifying a configuration file (obj.conf) and restarting the cache with the "restart" command. We used Apache as the HTTP server, but any other server would have suited the experiment; the only two requirements were that the server be easy to feed and very efficient. We developed a shell script that uses a list of URLs to create the Netscape configuration file.
This script creates an up-to-date configuration file and then restarts the proxy-cache. The file containing the URLs has the following form:

   http://www.ft.fr/index.html
   http://www.ft.fr/ima1.gif
   http://www.ft.fr/ima2.gif
   http://www.cnet.fr/intro.html
   ...

This script demonstrated the feasibility of a cache pre-filling service based on a simple and effective HTTP traffic redirection principle. This kind of service can therefore be an efficient alternative to the experiments described previously. For the moment, the script must be launched manually once the up-to-date URLs have been downloaded and the filter file has been created on the target server.

III Technical description of pre-filling methods

This chapter describes the methods used to pre-fill the Wcol and Squid caches. These methods were successfully implemented in the cache pre-filling solution based on ICP co-operation described earlier in this document.

III-1 Technical description of the pre-filling with Wcol

Wcol (see [http://shika.aist-nara.ac.jp/products/wcol/wcol.html]) is a cache with specific pre-fetching features, but these features were not used in our cache pre-filling studies. Wcol's first advantage is that it supports all or part of the ICP protocol since version WcolD (the next version, WcolE, fully implements ICPv2, whereas WcolD implements only a small subset of the ICP messages, which was nevertheless sufficient for our experiments). Its second advantage is the simplicity of the hierarchy in which it stores Web pages.

III-1.1 Description of the files structure hierarchy with Wcol

Under a main directory "http", corresponding to the protocol, a hash key is used to make a first selection of the URLs and to build a first directory level.
The URLs are then stored directly within the hash directory, whose name has the format "hxxx" (xxx being a number between 000 and 999): first a directory named after the server, then one named after the HTTP port, and then the URL's own directories, in hierarchical order. Apart from the hash level, this storage mode is very similar to the one used by Web servers. The internal storage hierarchy is therefore easy to recreate, which is why Wcol was an attractive choice for the cache pre-filling experiment.

Example: if the internal storage directory is /home/cache/ (set with the CacheDir keyword in the Wcol configuration file), the URL http://sample/Welcome.html is stored in the cache at the following path:

   /home/cache/http/h001/sample/80/Welcome.html

III-1.2 Complementary files generated by Wcol

When a URL is stored within Wcol (for example Welcome.html), the cache adds an information file with the ",info" extension (e.g. Welcome.html,info), which holds information about the stored URL for Wcol's internal use. This information file contains attributes such as the number of times the document has been accessed, its last modification date, its creation date, etc.

For every stored URL there is also a header file with the ",head" extension (e.g. Welcome.html,head), which contains the HTTP header and all related information.

If either the information file or the header file is missing, Wcol does not consider the URL valid, even if it is stored at the correct path. Therefore, for a URL to be correctly preloaded into the cache, the "HEAD" and "INFO" files must be created. In our experiment, we therefore had to reproduce Wcol's internal mechanism for creating the HEAD and INFO files in order to pre-fill the cache.
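The storage layout of III-1.1 and the companion files of III-1.2 can be summarized by the following Python sketch, which computes where a URL's body, ",info" and ",head" files would live. Wcol's actual hash function is not described in this memo, so the hash directory number is passed in as a parameter rather than computed; the function name `wcol_paths` is hypothetical, query strings are ignored, and only http URLs naming a file (not a directory) are handled.

```python
import posixpath
from urllib.parse import urlsplit


def wcol_paths(cache_dir: str, url: str, bucket: int) -> dict:
    """Paths of a URL's body, ,info and ,head files in a Wcol-style layout.

    bucket is the hash directory number (000-999); it is supplied by the
    caller because Wcol's hash function is not reproduced here.
    """
    if not 0 <= bucket <= 999:
        raise ValueError("hash directory number must be between 0 and 999")
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.path or parts.path.endswith("/"):
        raise ValueError("only http URLs naming a file are handled in this sketch")
    body = posixpath.join(
        cache_dir,
        "http",                  # protocol directory
        f"h{bucket:03d}",        # hash directory, e.g. h001
        parts.hostname,          # server name
        str(parts.port or 80),   # HTTP port
        parts.path.lstrip("/"),  # the URL's own directories and file
    )
    return {"body": body, "info": body + ",info", "head": body + ",head"}
```

With bucket 1, `wcol_paths("/home/cache/", "http://sample/Welcome.html", 1)["body"]` gives the path shown in the example above.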
Remark: once the INFO and HEAD files have been created and the URL has been stored at the correct place in Wcol's storage space, the cache considers the file valid, even though the information in the HEAD and INFO files is incomplete.

III-1.2.1 Creation of the INFO file

Creating the information file is difficult and requires calling specific Wcol routines from the "base.c" and "info.c" modules. For a given URL, the "AssignFileName" routine in "base.c" gives its exact location in the cache's internal storage space. The "NewInfo" and "SaveInfo" routines in the "info.c" module automatically create the INFO file corresponding to a given URL. Although many attributes of the INFO structure are left uninitialized by these routines, we noticed that a restricted information file created this way is sufficient for Wcol to recognize the URL as valid, provided that at least the fields "attr.name", "attr.state" and "attr.last" of the INFO structure are correctly initialized.

III-1.2.2 Creation of the HEAD file

For a pre-filled URL to be recognized by the cache, a HEAD file must always be created. A short HEAD file containing only the following information is sufficient:

   HTTP/1.1 200 OK
   Content-type: -the MIME type corresponding to the URL-

III-1.3 Complete pre-filling process with Wcol

Once the creation of the HEAD and INFO files had been understood and implemented in software, the next step was to build a tool that processes a whole description file, which links each URL to preload with its physical location. The format of this file is described in section III-3. For each entry in the description file, the tool creates the information and header files and moves the document to be stored from its initial physical location on the hard disk to the right place in Wcol's storage space.
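The per-entry work of this tool can be sketched in Python as follows, using the three-field description file format of III-3 and the minimal HEAD content of III-1.2.2. The Wcol-specific parts are deliberately left as caller-supplied functions: `locate` stands in for the path computation done by AssignFileName, and `make_info` for the NewInfo/SaveInfo routines, which live in Wcol's C code and are not reproduced here. All names are hypothetical, and the line endings of the HEAD file are an assumption.

```python
import os
import shutil
from typing import Callable, Iterable, Iterator, Tuple

Entry = Tuple[str, str, str]  # (URL, local file, MIME type)


def read_description(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield (URL, local path, MIME type) from space-separated description lines."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got {len(fields)}: {line!r}")
        url, local_path, mime = fields
        yield url, local_path, mime


def head_text(mime: str) -> str:
    """Minimal HEAD file content (line endings are an assumption)."""
    return f"HTTP/1.1 200 OK\nContent-type: {mime}\n"


def prefill_wcol(
    lines: Iterable[str],
    locate: Callable[[str], str],           # stand-in for AssignFileName
    make_info: Callable[[str, str], None],  # stand-in for NewInfo/SaveInfo
) -> int:
    """Install every described document; return the number of entries processed."""
    count = 0
    for url, src, mime in read_description(lines):
        dest = locate(url)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.move(src, dest)                   # move the document into place
        with open(dest + ",head", "w") as head:  # companion HEAD file
            head.write(head_text(mime))
        make_info(url, dest)                     # companion INFO file
        count += 1
    return count
```

As in the experiments, Wcol would be stopped before running such a tool and restarted afterwards.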
Once the description file has been created, the files to be included in the cache must be placed, temporarily or permanently, at the locations given in the description file. The pre-filling tool then simply has to be launched.

The Wcol pre-filling mechanism built for the cache pre-filling experiment over satellite therefore consists of these three elements:

   - the process that recreates the INFO and HEAD files;
   - the tool that processes the URL description file;
   - the description file itself.

Remark: for our experiments, we stopped Wcol before each pre-filling process and restarted it afterwards, to simplify the complete experiment and to prevent data held in memory by Wcol from interfering with the preloaded data.

III-2 Technical description of the pre-filling with SQUID

As described above, the Wcol cache has the advantage of storing information in a very simple way, very similar to the file hierarchies of Web servers. The drawback of this solution is that Wcol is far less widespread than the main caches on the market (Netscape Proxy Server, SQUID, etc.).

That is why the second part of the experiment studied the possibilities of pre-filling a widely used cache that supports ICP and whose source files are in the public domain. The only one we found was SQUID. This well-known cache is also recognized for its quality and its robustness under heavy load. The experiments were done with version 1.1.22 of SQUID (see [http://squid.nlanr.net/]).

III-2.1 Description of the files structure hierarchy with SQUID

With SQUID, the hierarchy of stored files is more complex than with Wcol.
In fact, the hierarchy of files stored by SQUID has nothing in common with that of a Web server, because it was designed to optimize file lookup using hash keys on two distinct levels, whereas Wcol uses only one hash level. Moreover, files are not stored as they are, as with Wcol: they are transformed and renamed before being stored. The exact location of a file within SQUID can only be found by analyzing the "log" file, which links each URL to its stored file.

III-2.1.1 Description of the LOG file

The information needed to locate a file is given in the "log" file, whose location is set in the SQUID configuration file (keyword: cache_dir). This LOG file contains one line for each URL stored in the cache, with the following format:

   - name of the file, as 8 hexadecimal characters
   - creation date
   - expiration date
   - last modification date
   - length of the file
   - URL corresponding to the file

Without this LOG file, it is therefore impossible to link a URL to the corresponding stored file, so this LOG file must be generated in order to pre-fill SQUID.

III-2.1.2 Localization of the storage path

After transformation, the name of a file stored in a SQUID cache is coded on 8 hexadecimal digits. This name alone is not sufficient to determine the exact physical storage location of the file in the cache: the information in the configuration file (keywords swap_level1_dirs and swap_level2_dirs) is also needed to calculate the final file path.
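The two pieces of information needed to place a file in SQUID, its path and its LOG line, can be computed as in the following Python sketch. It assumes a single cache_dir and the two-level formula that Squid applies in storeSwapFullPath (first level = file number modulo swap_level1_dirs, second level = file number divided by swap_level1_dirs, modulo swap_level2_dirs). The hexadecimal case and field widths are assumptions chosen to match the examples in this memo, and the function names are hypothetical.

```python
import posixpath


def squid_path(cache_dir: str, fileno: int, level1_dirs: int, level2_dirs: int) -> str:
    """Full path of swap file number fileno under a single cache_dir."""
    first = fileno % level1_dirs                    # first directory level
    second = (fileno // level1_dirs) % level2_dirs  # second directory level
    return posixpath.join(cache_dir, f"{first:02X}", f"{second:02X}", f"{fileno:08X}")


def log_line(fileno: int, created: int, expires: int, lastmod: int, size: int, url: str) -> str:
    """One LOG line: file name, creation, expiration and last modification
    dates as 8 hex digits, then the size in bytes and the URL."""
    return f"{fileno:08x} {created:08x} {expires:08x} {lastmod:08x} {size} {url}"
```

For example, with swap_level1_dirs = 16 and swap_level2_dirs = 256 (values chosen only for illustration), `squid_path("/Squid/cache", 1, 16, 256)` gives /Squid/cache/01/00/00000001, and `log_line(9, 0x3a6c6c6c, 0xfffffffe, 0x3a6c6c60, 250, "http://sample/Welcome.html")` reproduces the sample LOG entry of III-2.3.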
The physical storage path on the disk is then obtained with the following formulas, where the file name is read as a hexadecimal number and "/" denotes integer division:

   first directory level  = file name % swap_level1_dirs
   second directory level = (file name / swap_level1_dirs) % swap_level2_dirs

These formulas come from the "storeSwapFullPath" function in the "store.c" module. By applying them, the final location on the hard drive can be calculated from the Squid file name. For example, the file 00000001 is stored at /Squid/cache/01/00/00000001 if the keyword cache_dir is set to /Squid/cache in the SQUID configuration file.

III-2.2 Content of a file stored in SQUID

Once the physical storage place in the cache is known, the file to store must be created. This file is based on the URL content and includes the additional information SQUID requires to consider the stored file valid. This additional information must be placed at the beginning of the file, followed by the full content of the URL. The information to add at the beginning of the file is:

   HTTP/1.1 200 OK
   Content-type: -the MIME type of the stored object-

It is interesting to note that this information is exactly the same as the one required in the Wcol HEAD file. Other information, such as that shown in the document-properties menus of Web browsers, can be added, but the main fields described above are sufficient for the stored files to be considered valid.

III-2.3 LOG file generation

For the pre-filling to be correctly taken into account by SQUID, the LOG file line corresponding to the URL forced into the cache must also be generated. In our case, we found that each line must contain the following information, in this form:

   - name of the stored file (8 hexadecimal digits), incremented by one for each new line;
   - creation date;
   - fffffffe as the expiration date (see section III-2.5);
   - modification date earlier than the creation date;
   - size of the stored file in the cache, i.e. the size of the header information plus the size of the stored object;
   - URL corresponding to the stored file.

Example of an entry in the LOG file:

   00000009 3a6c6c6c fffffffe 3a6c6c60 250 http://sample/Welcome.html

III-2.4 Full pre-filling process with SQUID

Given the way SQUID stores URLs, as described above, pre-filling the cache required building the following three elements:

   - a process that generates the file header, creates the file to store within SQUID from the URL to include, generates the LOG file line corresponding to the URL, and inserts the file into the cache;
   - a process that handles the URL description file;
   - the description file itself, whose syntax is exactly the same as the one used for Wcol.

The method used to pre-fill SQUID is very similar to the one used for Wcol; the additional complexity of SQUID comes from the more complex way it represents a URL in its storage space.

Remark: at each SQUID startup, the mapping between URLs and the files stored on disk is rebuilt in memory to optimize lookup time. For the experiment, it was therefore necessary to stop SQUID before every pre-filling process, to prevent the data held in memory from interfering with the pre-filled information.

III-2.5 Difficulties encountered with dates

A problem appeared with the expiration date of pre-filled documents: SQUID considers a pre-filled document outdated after only a few minutes, which means that additional information about the creation and last modification dates must also be added to the header at the beginning of the stored documents.
For the purpose of the experimentation, however, it was sufficient to set in the LOG file an expiration date different from fffffffe and later than the creation date: the document is then no longer considered as outdated by SQUID until this date is reached.

III-3 Description file of URLs for the pre-filling with Wcol and SQUID

The description file has a very simple structure, because only three attributes are strictly necessary to pre-fill Wcol and SQUID. Each line of the file contains the URL in its normalized format (see [RFC1738]), the exact location (on the hard drive or on the network) of the document that must be included in the cache, and the MIME type of this document (see [RFC1341]). These fields are separated by a space. The description file contains "n" lines for "n" URLs to include in the cache.

Example of description file:

   http://sample/Welcome.html /home/sample/Welcome.html text/html
   http://sample/Welcome.gif /home/sample/Welcome.gif image/gif

IV - Advantages highlighted by these experiments

The cache pre-filling techniques implemented and described in this document have shown their feasibility in real experiments. Reading the SQUID and Wcol sources, which are publicly available, considerably helped us to find a pre-filling solution quickly. Moreover, these experiments allowed us to validate complex co-operating cache architectures using ICP between two different operating systems and two different cache products. This shows the interoperability of these solutions as well as the advantages of the ICP protocol.

The use of a satellite link highlights the great potential of this medium for transferring bulky contents very quickly, as close as possible to the end users. Information access times are considerably improved. Combining a satellite link with a pre-filled cache avoids part of the problems caused by traffic congestion.
In spite of the 300 ms delay introduced by a GEO satellite, the benefit of this connection becomes undeniable as soon as the size of the requested document exceeds 5 KBytes. Some of our tests involved large volumes of video and high-definition pictures. In that case, playing a video animation while the transfer continues in the background ensures smooth playback of the video sequences, without the interruptions that frequently occur on ISDN connections (64 kb/s). Moreover, the reliability of satellite links makes it possible to use lightweight error correction protocols. Smart anticipation of the user's needs and fine-grained update processing can even give the user the impression of an Internet bandwidth equal to that of the local area network. Satellite is also a simple way to provide powerful and very fast access in critical or poorly served areas with no high-speed infrastructure.

Moreover, when an ISP's architecture is upgraded by adding a new cache server, these techniques can be used to pre-fill the new cache and initialize it with preset content matching the interests of its users. This saves time and makes the cache effective immediately, whereas without pre-filling a long initialization period is necessary, during which the first users get no benefit from the cache.

V - Next Experiments

We will first improve the tools described above. Our next research then concerns the diffusion of contents: we aim at pre-filling several caches simultaneously. For this purpose, we will use satellite and multicast protocols, for example the MFTP protocol. Moreover, we will study the use of satellite diffusion to pre-fill caches with contents specifically targeted at communities of interest.
V-1 Multicast diffusion toward several remote caches

The next step consists in experimenting with pre-filling using a real multicast transfer between the satellite up-link site and the different reception sites.

V-2 Automatic feeding of the HTTP server

In the case of pre-filling by redirection, the experimentation does not yet include a mechanism for feeding the HTTP server with up-to-date documents. We have implemented neither an automatic file transfer method nor an automatic selection of the files downloaded to the HTTP server. Our next work will therefore consist in improving our script in order to automate and optimize updates, and to restart the cache as soon as a new directory structure is written on the HTTP server. The second stage will be the use of a file transfer method to the HTTP server over a satellite link.

V-3 Diffusion services toward communities of interest

Using the architectures defined in the previous experiments, we will work on the definition of a service feeding up-to-date documents. For that purpose, we will use a file transfer method that we will develop later. This service could rely on an analysis of the logs collected on the caches and could, for example, decide that the next satellite update of the HTTP server will only concern the most popular URLs. In that case, feedback on the most used cached data (obtained by analyzing the log files) will help make the pre-filling more cost-effective and more relevant.

Identifying and grouping the major points of interest extracted from the log analysis could enable us to create groups of interest (which may differ from site to site). As not all clients share the same points of interest, studies will be conducted to optimize the transfer of a common content to all caches, followed by the transfer of a personalized content for each community of interest.
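As an illustration of the log analysis mentioned in V-3, the following Python sketch ranks URLs by number of requests and keeps the most popular ones as candidates for the next satellite update. It assumes the native SQUID access.log format, in which the request method and the URL are the sixth and seventh whitespace-separated fields. The function name and the choice of counting only GET requests are hypothetical; a real service would probably also weigh object sizes or group clients into communities of interest.

```python
from collections import Counter


def most_popular_urls(log_lines, top_n):
    """Return the top_n most requested URLs in SQUID access.log lines."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        # Native format (assumed): time elapsed client code/status bytes
        # method URL ident hierarchy/peer type
        if len(fields) >= 7 and fields[5] == "GET":
            counts[fields[6]] += 1
    # Most requested first.
    return [url for url, _ in counts.most_common(top_n)]
```

The resulting list could then feed the description file used to pre-fill the remote caches, so that satellite bandwidth is spent on the documents most likely to be requested.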
We will also work on dynamic pages.

V-4 Pre-filling using ICP extensions

Another kind of cache pre-filling, based on ICP extensions, will be implemented in further experiments. The ICP extensions proposed in [draft-lovric-icp-ext-01.txt] allow the content of any cache to be pre-filled by means of push-caching messages.

A process sending to a target cache an ICP_OP_SET message with the ICP_FLAG_ALIAS flag set can force a URL into that cache. For this purpose, it uses a scheme such as "file://..." to give the cache the network path of the stored URL alias. The cache must then fetch this alias within the delay set in the ICP_OP_SET message.

Alternatively, a full list of URLs can be pre-filled by sending an ICP_OP_SET_TAB message with the ICP_FLAG_ALIAS flag set. In this case, the alias contains the list of URLs to pre-fill, and each URL must in turn have an alias giving its network path.

Example of list file (see [draft-lovric-icp-ext-01.txt] for the list file syntax) to pre-fill the following URLs:

   http://sample/Welcome.html
   http://sample/Welcome.gif

   1,http
   2,sample
   3,80
   4,/
   5,I,Welcome.html,A,file://home/sample/Welcome.html
   5,I,Welcome.gif,A,file://home/sample/Welcome.gif

Note: ICP extensions also allow compressed aliases to be pre-filled.

VI Partnership France Telecom - EUTELSAT

A partnership between France Telecom and EUTELSAT will focus on the evaluation of the solutions described above on a large-scale multicast platform using satellites. EUTELSAT will provide the up-link site and the satellite bandwidth, and France Telecom will provide the cache pre-filling solutions. New services for diffusing personalized contents to different communities of interest will also be evaluated on this platform. The results of these evaluations will be published in a second draft written jointly by both partners.
VII References

[RFC1341] Borenstein, N. and N. Freed, "MIME (Multipurpose Internet Mail Extensions): Mechanisms for Specifying and Describing the Format of Internet Message Bodies", RFC 1341, Bellcore, June 1992.

[RFC1738] Berners-Lee, T., Masinter, L. and M. McCahill, "Uniform Resource Locators (URL)", RFC 1738, CERN, Xerox PARC, University of Minnesota, December 1994.

[RFC2186] Wessels, D. and K. Claffy, "Internet Cache Protocol (ICP), version 2", RFC 2186, National Laboratory for Applied Network Research/UCSD, September 1997.

[draft-lovric-icp-ext-01.txt] Lovric, I., "Internet Cache Protocol Extension", France Telecom, October 1998.

VIII Acknowledgments

The authors wish to thank Sandrine CHELLES, Christophe NETILLARD, Gilles GRATTARD, Betty PREHU and Sylvie LOVRIC for their help in writing this document.

IX Authors' addresses

Cedric Goutard
France Telecom
Centre National des Etudes en Telecommunications
42, rue des Coutures
BP 6243
14066 Caen Cedex
France
Phone: +33 2 31 75 91 49
Fax: +33 2 31 73 56 26
E-mail: cedric.goutard@cnet.francetelecom.fr

Ivan Lovric
France Telecom
Centre National des Etudes en Telecommunications
42, rue des Coutures
BP 6243
14066 Caen Cedex
France
Phone: +33 2 31 75 91 25
Fax: +33 2 31 73 56 26
E-mail: ivan.lovric@cnet.francetelecom.fr

Eric Maschio-Esposito
France Telecom
Centre National des Etudes en Telecommunications
42, rue des Coutures
BP 6243
14066 Caen Cedex
France
Phone: +33 2 31 75 91 63
Fax: +33 2 31 73 56 26
E-mail: eric.maschio-esposito@cnet.francetelecom.fr