INTERNET-DRAFT                                        FRANCE TELECOM 
February 18, 1999                                     Cedric Goutard,
Expires: July 18, 1999                                   Ivan Lovric, 
draft-lovric-francetelecom-satellites-00.txt   Eric Maschio-Esposito 

 
                Pre-filling a cache - A satellite overview 
 

Status of this Memo 

This document is an Internet-Draft and is in full conformance with
all the provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six 
months and may be updated, replaced, or obsoleted by other 
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress".

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt


The list of Internet-Drafts Shadow Directories can be accessed at
http://www.ietf.org/shadow.html .


Abstract

Satellites are becoming a major vector for distributing information
on the Internet. They are well suited to cache pre-filling because
they allow large volumes of data to be transferred at high speed (up
to 45 Mb/s) and delivered simultaneously to several reception dishes.
Once the pre-filled content is in the cache, users benefit from
shorter access times to the stored pages.
In this context, the satellite improves the quality of service for
the end user by optimizing the use of satellite links and by
transferring large volumes of data only when the traffic on the
network is low.


France Telecom                Expires: July 1999              [Page 1]
                           Internet-Draft              February 1999


Table of contents

I  Introduction

II Experiments

  II-1 Proxy-satellite experimentation

  II-2 Caches pre-filling experiments over satellite

    II-2.1 Technical solution of cache pre-filling by co-operation

      II-2.1.1 Choice of the ICP co-operation mode

      II-2.1.2 Impact of a second cache in the architecture

      II-2.1.3 Collecting the test URLs

      II-2.1.4 FTP transfer over satellite

      II-2.1.5 Results of caches pre-filling by co-operation

    II-2.2 Technical solution of pre-filling by HTTP redirection

      II-2.2.1 Principle

      II-2.2.2 Experiments of pre-filling by redirection

III Technical description of pre-filling methods

  III-1 Technical description of the pre-filling with Wcol

    III-1.1 Description of the files structure hierarchy with Wcol

    III-1.2 Complementary files generated by Wcol

      III-1.2.1 Creation of the INFO file

      III-1.2.2 Creation of the HEAD file

    III-1.3 Complete pre-filling process with Wcol

  III-2 Technical description of the pre-filling with SQUID

    III-2.1 Description of files structure hierarchy with SQUID

       III-2.1.1 Description of the LOG file

       III-2.1.2 Localization of the storage path

    III-2.2 Content of a file stored by SQUID

    III-2.3  LOG file generation

    III-2.4 Full pre-filling process with SQUID

    III-2.5 Difficulties encountered with dates

  III-3 Description file of URLS for the pre-filling with Wcol
        and SQUID

IV Advantages highlighted by these experiments

V Next experiments

  V-1 Multicast diffusion toward several remote caches

  V-2 Automatic feeding of the HTTP server 

  V-3 Diffusion services toward communities of interest

  V-4 Pre-filling using ICP extensions

VI Partnership France Telecom - EUTELSAT

VII References

VIII Acknowledgments

IX Authors' addresses

I Introduction

To meet France Telecom's internal needs for information broadcast
technologies on the Internet, we chose to define services on top of a
satellite diffusion infrastructure. Because satellites broadcast by
nature, they are the most practical way to develop this kind of
service. Multicast technologies, which are designed for diffusion,
have not managed to establish themselves on the market, and their
deployment costs (i.e. the modification of all the routers to work in
multicast mode) are too high. The satellite is becoming an essential
communication mode for the Internet community. With satellites, it is
possible to increase bandwidth without investing huge amounts of
money in connectivity. Transmission speed and information diffusion
are critical problems that all ISPs are trying to solve, so far
without finding effective, inexpensive means.

We began with the observation that Web traffic is asymmetrical:
the amount of information returned by Web servers is much higher
than the amount generated by client requests. We therefore studied
how to deploy a unidirectional high-speed access over satellite (for
the return path only) for a company Intranet, in order to decrease
the traffic on the leased connections. This Intranet is built around
a LAN connected through frame relay, ISDN or leased lines that have
an Internet access point.


II Experiments

II-1 Proxy-Satellite experimentation

In this proxy-satellite experiment, we simulated the needs of a
company that has to broadcast bulky documents on its Intranet (to
several offices in several areas). We assumed that this company has
only limited bandwidth (64 kb/s to 256 kb/s) and does not want to
change its network infrastructure (routers, etc.). With this
configuration, the traffic generated by broadcasting the documents
rapidly saturates the local network, and the other applications do
not have enough bandwidth to run correctly. All the client/server
applications that regularly use part of the network resources are
blocked.

We therefore modified the company's Internet/Intranet access
architecture by installing a proxy on the local network (taking the
existing parameters into account), at the reception point of the
satellite. As it is not possible to build differentiated services for
Intranet use and for Internet use, this proxy concentrates all the
Intranet/Internet services (HTTP, FTP, NNTP). The default routing
parameters then direct the requests (clients to servers) toward the
terrestrial network and allow the responses (servers to clients) to
be received over the satellite link.
To configure the clients' browsers, this solution only requires
automatic configuration utilities (e.g. the proxy.pac file) so that
they access Intranet/Internet services through the proxy.
We also modified the routing table of the last router to redirect all
packets whose destination address is that of the proxy toward the
satellite uplink site.


The diagram below shows the architecture defined for our experiment.

                            ___________
                 ////////  /           \  ////////
                //////////X  Satellite  X//////////
                 ////////  \___________/  ////////  

                  .                            .
                .                                .           
         \\   .                                    .  //
        \\  .                                        . //
        \\.                                           .//  
       * \\                                          // *
      / \  \\                                      //  / \ 
     /___\                                            /___\
      ||                                Satellite board || 
 +------------------------+                        +---------+     
 |       ISP              | /|                     |         |      
 |  CACHE             R   |/ |     ISDN            |  PROXY  | 
 |                    E   |  |=====================|         |
 +---------------+    M   |\ |  FRAME RELAY        +---------+      
        ||       |    O   | \|                      ||            
        ||       |    T   |                         ||            
      +++++      |    E   |                LAN ====================== 
    ++     ++    |        |                     |             |
  +           +  | ACCESS |                  +--------+     +--------+
 +             + |        |                  | User 1 |     | User 2 | 
 +  INTERNET   + +--------+                  +--------+     +--------+
  +           +                                      
    ++     ++
      +++++

Although this solution already improves response times, it is
insufficient because it does not use the broadcasting capability of
satellites. It remains a unicast communication tool in a potentially
multicast environment.


II-2 Caches pre-filling experiments over satellite 

In order to use the satellite connections more efficiently, we
undertook to pre-fill the content of the proxy server (client side).
The pre-filling is done in two main steps:
- The first step consists in analyzing the proxy logs and determining
  a significant number of requested URLs.
- The second step refreshes the contents (ISP cache server) and
  prepares a pre-filling file which is broadcast over satellite (for
  example as a background task during the night, when the activity of
  the company is reduced). When the download is done, an application
  immediately installs the updated URLs on the local cache or on an
  HTTP server.
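
As an illustration only, the log analysis of the first step could be
sketched as follows in Python. The log layout (a common-log-style line
whose seventh whitespace-separated field is the URL), the function name
top_urls and the sample lines are assumptions made for this sketch; the
experiment did not necessarily use this format.

```python
from collections import Counter

def top_urls(log_lines, url_field=6, limit=100):
    """Return the `limit` most requested URLs with their request counts.

    `url_field` is the index of the URL in a whitespace-split log line;
    the default matches the assumed common-log-style layout below.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        # Ignore lines too short to contain a URL field.
        if len(fields) > url_field:
            counts[fields[url_field]] += 1
    return counts.most_common(limit)

# Hypothetical proxy log excerpt used only to exercise the function.
sample_log = [
    '10.0.0.1 - - [18/Feb/1999:10:00:00 +0100] "GET http://www.ft.fr/index.html HTTP/1.0" 200 1024',
    '10.0.0.2 - - [18/Feb/1999:10:00:05 +0100] "GET http://www.ft.fr/index.html HTTP/1.0" 200 1024',
    '10.0.0.1 - - [18/Feb/1999:10:00:09 +0100] "GET http://www.ft.fr/ima1.gif HTTP/1.0" 200 2048',
]

if __name__ == "__main__":
    for url, hits in top_urls(sample_log, limit=10):
        print(hits, url)
```

The resulting list is the kind of input from which a pre-filling file
could then be prepared.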


There are thus two different methods to pre-fill the cache:
- Pre-filling by co-operation with another local cache.
- Pre-filling by redirection of the traffic toward a local HTTP
  server.

The most frequently requested URLs are thus delivered directly and
are available locally. No backbone connection to the original remote
servers is needed.
As the update of the local cache or of the HTTP server over satellite
is a unidirectional mechanism, refreshing the contents does not
affect the bandwidth of the backbone at all.


II-2.1 Technical solution of pre-filling by co-operation 

The Netscape Proxy Server 3.5 used for the experiment cannot easily
be pre-filled. Although it includes a development API, this API has
no functions that act on the cache content. In addition, the
hierarchy generated on the proxy server is complex and the file names
are transcoded. We therefore decided to install a second cache.
Because their source code is publicly available, we chose the proxy
caches Wcol and Squid. We used the source files only to better
understand how these caches manage their content.
 
Optimizing communications remains a major concern, so, to avoid
increasing the number of remote connections, we installed the cache
to be pre-filled on the local network at the reception point of the
satellite. On this network, the pre-filling technique is based on
the ICP protocol.

                            ___________ 
                 ////////  /           \  ////////
                //////////X  Satellite  X//////////
                 ////////  \___________/  ////////                               
               .                           .
             .                               .           
      \\   .                                   .  //
     \\  .                                       . //
     \\.                                          .//  
    * \\                                         // *
   / \  \\                                     //  / \
  /___\                                           /___\
    ||                                             ||
 +------------------------+                   +------+      +-------+
 |       ISP              | /|                |      |      |  PRE  |
 |  CACHE             R   |/ |     ISDN       |PROXY | ICP  |FILLING|
 |                    E   |  |================|      |<---->| CACHE |
 +---------------+    M   |\ |  FRAME RELAY   +------+      +-------+
        ||       |    O   | \|                   ||            ||          
      ++++++++   |    T   |                      ||            ||
     ++      ++  |    E   |                  ===================== LAN
    +          + |        |                     |             |
   +  INTERNET  +| ACCESS |                  +--------+     +--------+
   +            +|        |                  | User 1 |     | User 2 | 
     ++++++++++  +--------+                  +--------+     +--------+


II-2.1.1 Choice of the ICP co-operation mode 

The ICP protocol allows a hierarchy of co-operating caches to be 
defined.
Usually, it is natural to define a vertical hierarchy with two or 
three levels of parents/children.

                                Internet 
                                   ||
First Level                      Parent 
                              /         \
                             /           \
Second Level              Child1       Child2 
                          /    \       /    \
                         /      \     /      \
Third Level             C3      C4   C5      C6 

We favored the child/child relationship ("sibling" mode) for the
following two reasons:

1- The pre-filling operation requires stopping the cache, launching
   the pre-filling process and then restarting the cache. In
   operational mode, the Internet/Intranet services should not be
   stopped during updates, even temporarily. Moreover, the content
   to pre-fill can quickly become voluminous and can require a
   relatively long processing time. Further studies should allow
   these parameters to be improved.

2- In a parent/child hierarchy, the child must systematically query
   its parent, whatever the URL it needs. If the parent does not
   have this URL in its cache, it must fetch it from the Internet.
   In a child/child relationship, if the queried child does not
   have the document, it simply replies with a MISS. The queried
   child never connects to the network to get the document; it is
   the querying child that gets the document directly from the
   Internet.
   The querying cache therefore keeps its initial way of working.


II-2.1.2 Impact of a second cache in the architecture

The pre-filled cache complements the main proxy-cache. The two
diagrams below show how the pre-filled cache modifies the
architecture.

Sibling mode, chosen for the reasons given above, generates little
traffic between the querying and the replying caches.


A - THE URL is contained in the pre-filled cache. 

                                    Query (ICP)
              +--------------+      2               +--------------+
              |              |   --------------- >  |     WCOLD    |
              | PROXY SERVER |                      |       or     |
              |   NETSCAPE   |   < ---------------  |     SQUID    |
              +--------------+      3 HIT           +--------------+
                 / \    |         
                  |     |
                1 |     | 4
                  |    \ /
              +--------------+   
              |              |   
              |    CLIENT    |   
              |              |   
              +--------------+

The pre-filled cache contains the requested URL: it replies HIT and
then returns the URL. As we are on the local network of the
company, transfer times are almost immediate.
              
B - THE URL is not contained in caches


          Direct to the main server

                 / \    |  
                  |     |  
                4 |     | 5
                  |    \ /         Query (ICP)                      
              +--------------+      2               +--------------+
              |              |   --------------- >  |     WCOLD    |
              | PROXY SERVER |                      |       or     |
              |   NETSCAPE   |   < ---------------  |     SQUID    |
              +--------------+      3 MISS          +--------------+
                 / \    |         
                  |     |
                1 |     | 6
                  |    \ /
              +--------------+   
              |              |   
              |    CLIENT    |   
              |              |   
              +--------------+   

The pre-filled cache does not contain the requested URL. Upon
receiving the MISS reply, the proxy server contacts the original
server to get the URL. Despite the negative answer, the cache
response time is negligible compared to the connection time to the
remote original server.
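
Cases A and B can be summarized by the following minimal Python
sketch. It is not an ICP implementation: the ICP query is modelled as
a simple dictionary lookup, and the names resolve, sibling_cache and
fetch_from_origin are hypothetical. It only illustrates that, in
sibling mode, the queried cache never contacts the original server
itself.

```python
def resolve(url, sibling_cache, fetch_from_origin):
    """Return (source, document) for a request handled by the main proxy.

    sibling_cache     -- dict standing in for the pre-filled sibling cache
    fetch_from_origin -- callable used by the main proxy on a MISS
    """
    # Steps 2 and 3: the ICP query/reply is modelled as a lookup.
    if url in sibling_cache:
        # HIT: the document is retrieved from the sibling on the LAN.
        return "sibling", sibling_cache[url]
    # MISS: the sibling does nothing more; the querying proxy itself
    # contacts the original server.
    return "origin", fetch_from_origin(url)

if __name__ == "__main__":
    cache = {"http://www.ft.fr/index.html": b"pre-filled copy"}
    origin = lambda url: b"copy fetched from " + url.encode()
    print(resolve("http://www.ft.fr/index.html", cache, origin))
    print(resolve("http://www.cnet.fr/intro.html", cache, origin))
```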


II-2.1.3 Collecting the test URLs

We used software that downloads all or part of a Web site. Files
are saved in their original formats (HTML, GIF, JPG, etc.) and the
original paths are preserved. In effect, this software replicates
part of the downloaded site. A ZIP compression tool is then used to
reduce transfer times.

Remark:
We noticed that it was not easy to predict the right depth (level)
of downloading. The pre-filled contents must be optimized to avoid
downloading documents of very little interest.


II-2.1.4 FTP transfer over satellite 

To demonstrate feasibility, we used an FTP server available on our
experimental Internet platform. We placed a zipped file on the
platform and launched the FTP download from the server hosting the
satellite board and the Netscape Proxy.


II-2.1.5 Results of caches pre-filling by co-operation
  
The progressive integration of the different elements of the
experiment (the satellite, then the proxy-satellite pair, then the
proxy pre-filling) shows a steady improvement in access times to
documents. Every element of the process contributes to reducing
download times. The improvement is all the more remarkable when tests
are carried out on video sequences or on Web sites containing a large
number of high-definition pictures. When used in unicast mode, FTP
transfers are faster. Background updates of pre-filled content
considerably increase the quality of service and limit the load on
remote connections.


II-2.2 Technical solution of pre-filling by HTTP redirection 

This part describes an alternative to cache pre-filling by
co-operation. This solution also allows URLs to be pre-fetched over
satellite in order to improve the quality of service.


II-2.2.1 Principle

Being a cache, the proxy is able to filter the requests it receives.
From this capability, it follows that it should also be able to
redirect those requests toward an HTTP server of our choice.

The following diagram presents the functional principle of the
pre-filling by redirection that we tested:


     +++++                          +------+      +---------+
   ++     ++      /|                |      |      |  HTTP   |
 +           +   / |                |PROXY | HTTP | SERVER  |
+             + |  |================|      |<---->|         |
+  INTERNET   +  \ |                +------+      +---------+
 +           +    \|                   ||             ||
   ++     ++                           ||             ||
     +++++                             ||             ||
                                    ====================== LAN
                                        |             |
                                  +--------+     +--------+
                                  | User 1 |     | User 2 | 
                                  +--------+     +--------+


This process can be analyzed in two distinct parts:

1. The redirection of client requests by the cache toward
   an HTTP server.
2. The feeding of the HTTP server with up-to-date documents.

All HTTP requests from a client are parsed by the cache. If a
filter applies, the request is modified and transmitted to the
local HTTP server; otherwise, the request is processed normally by
the cache. The request is modified according to the following
principle:

Initial URL requested by the client to the cache: 
http://remote_server/document.html 

Requested URL submitted by the cache to the HTTP server and returned 
to the client: 
http://local_server/remote_server/document.html 
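
The rewriting rule shown above can be expressed as a short Python
function. This is a sketch of the principle only, not the mechanism
used inside the Netscape proxy; the function name map_url is
hypothetical, and mapping a directory URL to a specific index page is
deliberately left out.

```python
from urllib.parse import urlsplit

def map_url(url, local_server):
    """Rewrite http://remote/path into http://local_server/remote/path."""
    parts = urlsplit(url)
    # An empty path is treated as the root of the remote site.
    path = parts.path or "/"
    mapped = "http://" + local_server + "/" + parts.netloc + path
    # Keep any query string unchanged.
    if parts.query:
        mapped += "?" + parts.query
    return mapped

if __name__ == "__main__":
    print(map_url("http://remote_server/document.html", "local_server"))
```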

The principle of the pre-filling consists in applying filters to
client requests so that the cache requests some documents directly
from the local HTTP server. The filters can use regular expressions
and can look like the following:

  Requested URL             |  Mapped URL
----------------------------------------------------------------------
http://www.ft.fr/           | http://server/www.ft.fr/index.html
http://www.ft.fr/ima1.gif   | http://server/www.ft.fr/ima1.gif
http://www.ft.fr/ima2.gif   | http://server/www.ft.fr/ima2.gif
http://www.cnet.fr/         | http://server/www.cnet.fr/

These examples redirect either particular documents (those of the
www.ft.fr site) or a whole site (www.cnet.fr).
Particular care must be taken when writing filters to make sure that
only the documents to be pre-filled are taken into account. The
filters must be updated as soon as the content of the local HTTP
server is modified. Finally, the great advantage of this type of
redirection is that it is transparent to the client. The client
believes it is reaching the original Web server whereas, in fact, the
document it receives comes from another HTTP server with the same
address field (contrary to the redirection defined in the HTTP
protocol).

The local HTTP server is regularly fed with up-to-date documents;
the study of this file transfer will be the subject of a future
experiment.


II-2.2.2 Experiments of pre-filling by redirection 

For this experiment, we used the Netscape proxy-cache 3.52 on
Solaris 2.6. This solution was chosen because it let us easily
create filtering and mapping rules just by modifying a configuration
file (obj.conf) and restarting the cache with the command-line
command "restart".

The HTTP server that we used is Apache, but any other server would
be suitable for the experiment. The two requirements for this server,
in our experiment, are that it be easy to feed and very efficient.

We developed a shell script that uses a list of URLs to create the
Netscape configuration file. This script creates an up-to-date
configuration file and then restarts the proxy-cache. The file
containing the URLs has the following form:

http://www.ft.fr/index.html 
http://www.ft.fr/ima1.gif 
http://www.ft.fr/ima2.gif 
http://www.cnet.fr/intro.html 
   ...
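
To illustrate how such a list can be turned into filters, here is a
minimal Python sketch. It does not reproduce our shell script and it
does not generate obj.conf syntax: it only builds hypothetical
(pattern, mapped URL) pairs, where each pattern is a regular
expression matching exactly one listed URL. The names build_filters
and local_server are assumptions made for this example.

```python
import re

def build_filters(url_lines, local_server="server"):
    """Build (regex pattern, mapped URL) pairs from a list of URLs."""
    filters = []
    for line in url_lines:
        url = line.strip()
        # Skip blank lines and anything that is not an http URL.
        if not url.startswith("http://"):
            continue
        # Escape the URL so that '.' and other characters match literally.
        pattern = "^" + re.escape(url) + "$"
        mapped = "http://" + local_server + "/" + url[len("http://"):]
        filters.append((pattern, mapped))
    return filters

if __name__ == "__main__":
    urls = [
        "http://www.ft.fr/index.html",
        "http://www.ft.fr/ima1.gif",
    ]
    for pattern, mapped in build_filters(urls):
        print(pattern, "->", mapped)
```

Escaping each URL before using it as a pattern is one way to make
sure that only the listed documents are redirected.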

This script demonstrated the feasibility of a cache pre-filling
service based on a simple and effective principle of HTTP traffic
redirection. This kind of service can therefore be an efficient
alternative to the experiments previously described.

For the moment, this script has to be launched manually once the
up-to-date URLs are downloaded and the filter file is created on the
targeted server.

III Technical description of pre-filling methods

This chapter describes the methods used to pre-fill the Wcol and
Squid caches. These methods were successfully implemented in the
technical solution of cache pre-filling by ICP co-operation
described earlier in this document.


III-1 Technical description of the pre-filling with Wcol 

Wcol (see [http://shika.aist-nara.ac.jp/products/wcol/wcol.html]) is a
cache with specific pre-fetching functionalities, but these
capabilities were not used in our cache pre-filling studies.
The first interest of Wcol is its support of all or part of the ICP
protocol since the WcolD version (the following version, WcolE, fully
implements ICPv2, whereas WcolD implements only a small subset of the
ICP messages, which is nevertheless sufficient for our experiments).
The second interest of Wcol is the simplicity of the hierarchy in
which Web pages are stored on the cache.


III-1.1 Description of the files structure hierarchy with Wcol 

Under a main directory "http", corresponding to the protocol, a
hash-coding key provides a first selection of the URLs and
constitutes a first directory level. The URLs are then stored
directly within the hash directory, whose name has the format
"hxxx" (xxx is a number between 000 and 999), under a directory
level corresponding to the server name, then the HTTP port, and then
the different directory names stored hierarchically. This storage
mode is very similar to the one used by Web servers, except for the
hash-coding level. The internal storage hierarchy is therefore easy
to recreate, which is why Wcol was an interesting solution for the
cache pre-filling experiment.

Example:
If the internal storage directory is /home/cache/ (set with the
CacheDir keyword in the Wcol configuration file), the URL
http://sample/Welcome.html stored in the cache will have the
following path:
/home/cache/http/h001/sample/80/Welcome.html 
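
The construction of this path can be sketched in Python as follows.
The hash-coding function of Wcol is not described in this document,
so the hash bucket is passed in as a parameter (in Wcol itself the
location is computed by its own routines); the function name
wcol_path is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

def wcol_path(cache_dir, url, bucket):
    """Build the storage path of `url` under a Wcol-style hierarchy.

    bucket -- number between 0 and 999 naming the "hxxx" directory;
              it is an input here because the real hash function is
              not reproduced in this sketch.
    """
    if not 0 <= bucket <= 999:
        raise ValueError("bucket must be between 0 and 999")
    parts = urlsplit(url)
    # Default HTTP port when the URL does not give one explicitly.
    port = parts.port or 80
    hash_dir = "h%03d" % bucket
    # Note: urlsplit returns the host name in lower case.
    return posixpath.join(cache_dir, parts.scheme, hash_dir,
                          parts.hostname, str(port),
                          parts.path.lstrip("/"))

if __name__ == "__main__":
    print(wcol_path("/home/cache/", "http://sample/Welcome.html", 1))
```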


III-1.2 Complementary files generated by Wcol 

When a URL is stored within Wcol (for example Welcome.html), the cache
complements the stored URL with an information file with the ",info"
extension (e.g. Welcome.html,info), which contains information about
the stored URL for internal use.
This information file holds attributes such as the number of times
the document has been accessed, the last modification date, the
creation date, etc.
For every stored URL, there is also a header file with the ",head"
extension (e.g. Welcome.html,head). This file contains the HTTP header
and all related information. If the information file or the header
file is missing, Wcol does not consider the URL valid even though it
is stored at the correct path. Therefore, for a URL to be correctly
pre-loaded in the cache, it is essential to create the "HEAD" and
"INFO" files.


In our experiment, in order to pre-fill the cache, it was therefore
necessary to reproduce the internal mechanism that Wcol uses to
create the HEAD and INFO files.

Remark:
Once the INFO and HEAD files are created and the URL is stored at the
correct place in the storage space of Wcol, the file is accepted as
valid by the cache even though the information in the HEAD and INFO
files is partial.


III-1.2.1 Creation of the INFO file 

Creating the information file is difficult and requires calling
specific Wcol routines located in the "base.c" and "info.c" modules.
The "AssignFileName" routine in "base.c" returns, for a given URL,
its exact location in the internal storage space of the cache.
The "NewInfo" and "SaveInfo" routines of the "info.c" module allow
the INFO file corresponding to a specific URL to be created
automatically. Although many attributes are not initialized in the
INFO structure created by these routines, we noticed that such a
restricted information file is sufficient for the URL to be
recognized by Wcol as valid, provided that at least the "attr.name",
"attr.state" and "attr.last" fields of the INFO structure are
correctly initialized.


III-1.2.2 Creation of the HEAD file 

For a pre-filled URL, a HEAD file must always be created for the
URL to be recognized by the cache.
In practice, a short HEAD file containing only the following
information is sufficient:

HTTP/1.1 200 OK 
Content-type: -the MIME type corresponding to the URL- 
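
As an illustration, such a minimal HEAD file could be produced with
the Python sketch below. The MIME type is guessed from the file
extension with the standard mimetypes module; the function names, the
fallback type and the use of a plain line feed as line terminator are
assumptions of this sketch, not a description of Wcol's own routines.

```python
import mimetypes

def head_content(filename):
    """Return the text of a minimal HEAD file for the stored document."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        # Assumed fallback when the extension is unknown.
        mime_type = "application/octet-stream"
    return "HTTP/1.1 200 OK\nContent-type: %s\n" % mime_type

def write_head_file(stored_path):
    """Write the ",head" companion file next to a stored document."""
    head_path = stored_path + ",head"
    with open(head_path, "w", newline="\n") as head_file:
        head_file.write(head_content(stored_path))
    return head_path

if __name__ == "__main__":
    print(head_content("Welcome.html"), end="")
```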

III-1.3 Complete pre-filling process with Wcol

Once this mechanism was understood and implemented in software that
recreates the HEAD and INFO files, the next step was to create a tool
that processes the whole description file, which links each URL to
preload to its physical location. The format of this file is
described in chapter III-3.
The tool creates the information and header files. It also moves the
URL to store from its initial physical location on the hard disk to
the right place in the storage space of Wcol. The tool executes this
process for each entry in the description file.
Once the description file is created, the files to include in the
cache must be stored, temporarily or not, at the places stated in the
description file. The tool described above then just has to be
launched. The Wcol pre-filling mechanism implemented in the
experiment of cache pre-filling over satellite therefore contains
these three elements:

- the process that recreates the INFO and HEAD files
- the tool that processes the description file of the URLs
- the description file itself

Remark:
For our experiments, we stopped Wcol before each pre-filling process
and restarted it afterwards, in order to simplify the experiment and
to prevent data held in memory by Wcol from interfering with the
preloaded data.
 

III-2 Technical description of the pre-filling with SQUID 

As previously described, the Wcol cache has the advantage of storing
information in a very simple way, very similar to the file
hierarchies of Web servers. Its disadvantage is that Wcol is not
as widespread as the main caches found on the market (Netscape Proxy
Server, SQUID, etc.). That is why the second part of the experiment
consisted in studying how to pre-fill the content of a frequently
used cache which supports ICP and whose source files are publicly
available. The only one we found is SQUID. This well-known cache is
also recognized for its quality and its robustness under heavy load.
The experiments were done with version 1.1.22 of SQUID (see
[http://squid.nlanr.net/]).


III-2.1 Description of the files structure hierarchy with SQUID 

With SQUID, the hierarchy of stored files is more complex than with
Wcol. The hierarchy of files stored by SQUID has nothing in common
with that of a Web server, because it was designed to optimize file
lookup by using hash-coding keys on two distinct levels of hierarchy,
whereas Wcol has only one hash-coding level. Moreover, files are not
stored as is within the cache, as is the case with Wcol: they are
transformed and renamed before being stored.
The exact location of a file within SQUID is found by analyzing the
"log" file, which contains the link between a URL and the stored
file.


III-2.1.1 Description of the LOG file 

The information needed to locate a file is recorded in the "log" file,
whose exact location is given in the SQUID configuration file
(keyword: cache_dir).

In this LOG file, there is one line for each URL stored in the cache.
Each line contains the following fields:

- name of the file, as 8 hexadecimal characters
- creation date 
- expiration date 
- last modification date
- length of the file 
- URL corresponding to the file 

Thus, it appears that, without this LOG file, it is not possible to
make the link between a URL and the corresponding stored file. 

Therefore, it is necessary to generate this LOG file to be able to 
pre-fill SQUID. 


III-2.1.2 Localization of the storage path 

The name of a file stored in a SQUID cache, after transformation, is
coded as 8 hexadecimal digits. This name alone is not sufficient to
determine the exact physical storage location of the file in the
cache. The information stored in the configuration file (keywords:
swap_level1_dirs and swap_level2_dirs) is also required to calculate
the final file path.
The physical storage path on the disk is then generated using the
following formulas:

- first level of directory = name of file % swap_level1_dirs

- second level of directory = (name of file / swap_level1_dirs) %
                              swap_level2_dirs

Here "/" is an integer division and "%" is the remainder of an integer
division. These formulas come from the function "storeSwapFullPath" in
the "store.c" module.
By applying these formulas, it is then possible to calculate the final
location on the hard drive from the SQUID filename.

For example:
the file 00000001 is stored at /Squid/cache/01/00/00000001 if keyword
cache_dir is equal to /Squid/cache in the configuration file of SQUID.
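
The two formulas above can be sketched in a few lines of Python. This
is only an illustration and not the SQUID code itself: the function
name squid_storage_path is hypothetical, the values 16 and 256 for
swap_level1_dirs and swap_level2_dirs are assumed for the example, and
the two-digit uppercase hexadecimal directory names are an assumption
inferred from the example path above.

```python
import posixpath


def squid_storage_path(cache_dir, file_number, level1_dirs, level2_dirs):
    """Return the on-disk path of a stored file from its file number."""
    # first level of directory = name of file % swap_level1_dirs
    first = file_number % level1_dirs
    # second level = (name of file / swap_level1_dirs) % swap_level2_dirs
    second = (file_number // level1_dirs) % level2_dirs
    # Directory names on two hexadecimal digits (assumed from the example).
    return posixpath.join(cache_dir, f"{first:02X}", f"{second:02X}",
                          f"{file_number:08X}")


# Assumed configuration values: swap_level1_dirs=16, swap_level2_dirs=256.
print(squid_storage_path("/Squid/cache", 0x00000001, 16, 256))
# prints /Squid/cache/01/00/00000001
```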


III-2.2 Content of a file stored in SQUID 

Once the physical storage location in the cache is known, the file to
store must be created. This file is based on the URL and includes the
supplementary information that SQUID requires in order to consider
the stored file valid. This complementary information must be stored
at the beginning of the file, followed by the full content of the URL.
The complementary information to add at the beginning of the file is:

HTTP/1.1 200 OK
Content-type: -the MIME type of the stored object-

It is interesting to see that this complementary information is
exactly the same as that required in the HEAD file of Wcol. Other
information can be added, such as that which appears in the properties
menus of Web browsers. However, the main characteristics described
above are sufficient for the stored files to be considered valid.
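
As an illustration, the content of a stored file could be assembled as
follows. This is a minimal sketch: the function name
build_stored_object is hypothetical, and the CRLF line endings and the
empty line separating the header from the object follow the usual HTTP
convention, which this document does not state explicitly.

```python
def build_stored_object(mime_type: str, body: bytes) -> bytes:
    """Prefix the object with the minimal header described above."""
    header = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-type: {mime_type}\r\n"
        "\r\n"  # assumed: empty line between header and object
    ).encode("ascii")
    return header + body


stored = build_stored_object("text/html", b"<html>Welcome</html>")
# len(stored) is the size of the header information plus the size of
# the object, which is the value expected in the LOG file.
print(len(stored))
```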

III-2.3 LOG file generation

For the pre-filling to be correctly taken into account by SQUID, it is
still necessary to generate the line of the LOG file corresponding to
the URL that must be forced into the cache. In our case, we found that
each line must contain the following information:

- name of the stored file (8 hexadecimal digits), incremented by
  one for each new line.
- creation date.
- fffffffe for the expiration date (see paragraph III-2.5).
- modification date earlier than the creation date.
- size of the stored file in the cache, that is, size of the header
  information + size of the object to store.
- URL corresponding to the stored file.

Entry sample in the LOG file: 

00000009 3a6c6c6c fffffffe 3a6c6c60 250 http://sample/Welcome.html
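
The entry above can be produced and read back with a short Python
sketch. The names LogEntry, format_log_line and parse_log_line are
hypothetical. The sketch assumes that the three dates are seconds
since the Unix epoch written in lowercase hexadecimal and that the
size is written in decimal, which is what the sample suggests.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    file_number: int  # name of the stored file
    created: int      # creation date
    expires: int      # expiration date
    modified: int     # last modification date
    size: int         # header size + object size
    url: str


def format_log_line(entry: LogEntry) -> str:
    """Build one LOG line; fields are separated by a single space."""
    return (f"{entry.file_number:08x} {entry.created:08x} "
            f"{entry.expires:08x} {entry.modified:08x} "
            f"{entry.size} {entry.url}")


def parse_log_line(line: str) -> LogEntry:
    """Split one LOG line back into its six fields."""
    name, created, expires, modified, size, url = line.split(None, 5)
    return LogEntry(int(name, 16), int(created, 16), int(expires, 16),
                    int(modified, 16), int(size), url.strip())


sample = "00000009 3a6c6c6c fffffffe 3a6c6c60 250 http://sample/Welcome.html"
entry = parse_log_line(sample)
print(format_log_line(entry) == sample)  # prints True
```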


III-2.4 Full pre-filling process with SQUID 

Given the way SQUID stores URLs, as described above, the work needed
to pre-fill the cache consisted in creating the following three
elements:
- a process that generates the file header and creates the file to
  store within SQUID from the URL to include; this process also
  generates the line of the LOG file corresponding to the URL and
  inserts the file to store into the cache.
- a process responsible for processing the description file of the
  URLs
- the description file itself, whose syntax is exactly the same as
  the one used for Wcol

The method used to pre-fill SQUID is very similar to the one used for
Wcol. The additional complexity of SQUID is due to the more complex
representation of a URL in its storage location.

Remark: 
On each SQUID startup, the link between the URLs and the files stored
on the disk is recreated in memory in order to optimize the lookup
time.
Thus, for the experiment, it was necessary to stop SQUID before every
pre-filling process in order to prevent the data held in memory from
interfering with the pre-filled information.
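
The complete process described in this section can be sketched end to
end as follows. This is an illustration under assumptions and not the
tool used in the experiment: the function name prefill_squid and all
its parameters are hypothetical, the header layout and the hexadecimal
date fields are assumed as in the LOG entry sample, and the expiration
date is computed from an assumed lifetime in seconds.

```python
import os
import time


def prefill_squid(cache_dir, log_path, entries, first_number,
                  level1_dirs, level2_dirs, lifetime_seconds):
    """Store each (url, source_path, mime_type) entry in the cache tree
    and append the matching line to the LOG file."""
    created = int(time.time())
    modified = created - 1                # earlier than the creation date
    expires = created + lifetime_seconds  # later than the creation date
    with open(log_path, "a", encoding="ascii") as log:
        for offset, (url, source_path, mime_type) in enumerate(entries):
            number = first_number + offset  # incremented by one per line
            with open(source_path, "rb") as source:
                body = source.read()
            header = ("HTTP/1.1 200 OK\r\n"
                      f"Content-type: {mime_type}\r\n\r\n").encode("ascii")
            directory = os.path.join(
                cache_dir,
                f"{number % level1_dirs:02X}",
                f"{number // level1_dirs % level2_dirs:02X}")
            os.makedirs(directory, exist_ok=True)
            with open(os.path.join(directory, f"{number:08X}"), "wb") as out:
                out.write(header + body)
            log.write(f"{number:08x} {created:08x} {expires:08x} "
                      f"{modified:08x} {len(header) + len(body)} {url}\n")
```

As noted in the remark above, such a process would be run while SQUID
is stopped, so that the in-memory index is rebuilt from the LOG file
at the next startup.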


III-2.5 Difficulties encountered with dates 

A problem appeared concerning the expiration date of pre-filled
documents. We observed that SQUID considers a pre-filled document
outdated after only a few minutes, which means that supplementary
information about the creation and last modification dates must also
be added to the header at the beginning of the stored documents.
However, for the purpose of the experiment, it was sufficient to set
an expiration date in the LOG file different from fffffffe and later
than the creation date, so that SQUID no longer considers the document
outdated until that date is reached.
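
The workaround can be illustrated by a small Python sketch that
computes the expiration field of the LOG line. The function name
expiration_field and the lifetime parameter are hypothetical, and the
sketch assumes that dates are seconds since the Unix epoch written on
8 hexadecimal digits, as the LOG entry sample suggests.

```python
def expiration_field(created: int, lifetime_seconds: int) -> str:
    """Return an expiration date later than the creation date and
    different from fffffffe, on 8 hexadecimal digits."""
    expires = created + lifetime_seconds
    if expires <= created:
        raise ValueError("expiration must be later than creation")
    if expires >= 0xFFFFFFFE:
        raise ValueError("expiration must stay below fffffffe")
    return f"{expires:08x}"


# One day after the creation date of the LOG entry sample.
print(expiration_field(0x3A6C6C6C, 24 * 60 * 60))  # prints 3a6dbdec
```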


III-3 Description file of URLS for the pre-filling with Wcol and SQUID 

The description file has a very simple structure because only three
attributes are strictly necessary to pre-fill Wcol and SQUID.
Each line of the file must contain the URL in its normalized format
(see [RFC1738]), the exact location (on the hard drive or on the
network) of the document that must be included in the cache, and the
MIME type of this document (see [RFC1341]).
These fields must be separated by a space. The description file
contains "n" lines for "n" URLs to include in the cache.

Example of description file: 

http://sample/Welcome.html /home/sample/Welcome.html text/html 
http://sample/Welcome.gif /home/sample/Welcome.gif image/gif 
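
Reading such a file takes only a few lines. The following Python
sketch is an illustration; the function name parse_description_file is
hypothetical, and skipping empty lines is an assumption that the
format description above does not cover.

```python
def parse_description_file(text):
    """Return a list of (url, location, mime_type) tuples."""
    entries = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumed: empty lines are ignored
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(fields)}")
        url, location, mime_type = fields
        entries.append((url, location, mime_type))
    return entries


example = (
    "http://sample/Welcome.html /home/sample/Welcome.html text/html\n"
    "http://sample/Welcome.gif /home/sample/Welcome.gif image/gif\n"
)
for url, location, mime_type in parse_description_file(example):
    print(url, location, mime_type)
```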

 
IV - Advantages highlighted by these experiments

The cache pre-filling techniques implemented and described in this
document have proved feasible in real experiments. Reading the
publicly available SQUID and Wcol sources considerably helped us to
find a pre-filling solution quickly. Moreover, these experiments
allowed complex co-operating cache architectures using ICP to be
validated across two different operating systems and two different
cache products. This shows the interoperability of these solutions
and also the advantages of the ICP protocol.

The use of a satellite link highlights the great potential of this
medium to transfer bulky content very quickly, as close as possible to
the end users. Information access times are considerably improved.
Coupling a satellite link with cache pre-filling avoids part of the
problems caused by traffic congestion. In spite of the 300 ms delay
introduced by a GEO satellite, the benefit of this connection becomes
undeniable as soon as the size of the requested document exceeds
5 KBytes. Some of our tests involved large volumes of video and
high-definition pictures. In that case, playing a video animation
while the transfer continues in the background ensures smooth video
sequences, without the cuts that frequently occur on ISDN connections
(64 Kb/s). Moreover, the reliability of the satellite link makes it
possible to use lightweight error correction protocols.
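
A rough order of magnitude for the 5 KBytes figure can be obtained
with a deliberately simple model in which the transfer time is the
link delay plus the document size divided by the bandwidth. This is
not the measurement method of the experiment. The model ignores
protocol overhead, it takes the 45 Mb/s peak rate quoted in the
abstract for the satellite link, and it assumes a 64 Kb/s ISDN line
with no delay at all as the point of comparison.

```python
def transfer_time(size_bytes, bandwidth_bits_per_s, delay_s):
    """Link delay plus serialization time, in seconds."""
    return delay_s + size_bytes * 8 / bandwidth_bits_per_s


size = 5000  # a document of about 5 KBytes
satellite = transfer_time(size, 45_000_000, 0.300)
isdn = transfer_time(size, 64_000, 0.0)
print(f"satellite: {satellite:.3f} s, ISDN: {isdn:.3f} s")
# prints satellite: 0.301 s, ISDN: 0.625 s
```

Under these assumptions the satellite link already takes about half
the time of the ISDN line for a 5 KByte document, and the gap widens
as the document grows.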

Smart anticipation of users' needs and fine-grained update processing
can even give users the impression that their Internet bandwidth
equals that of the local area network. Satellite is also a simple way
to provide powerful and very fast access in critical or poorly served
areas with no high-speed infrastructure.
Moreover, when an ISP's architecture is upgraded by adding a new cache
server, these techniques can be used to pre-fill this cache and
initialize it with preset content matching the interests of the users.
This saves time and makes the cache immediately effective, whereas in
practice a long initialization period is otherwise necessary, during
which the first users gain no benefit from the cache.


V - Next Experiments 

We will first improve the tools described previously. Our next
research then concerns the distribution of content. We aim at
pre-filling several caches simultaneously. For this purpose, we will
use satellite and multicast protocols, for example the MFTP protocol.
Moreover, we will study the use of satellite distribution to pre-fill
caches with content specifically targeted at communities of interest.


V-1 Multicast diffusion toward several remote caches 

The next step consists in experimenting with pre-filling using a real
multicast transfer between the satellite up-link site and the
different reception sites.


V-2 Automatic feeding of the HTTP server 

In the case of pre-filling by redirection, the experiment does not yet
cover feeding the HTTP server with up-to-date documents. We have also
not yet implemented either an automatic file transfer method or an
automatic selection of the files we download to the HTTP server.

So, our next work will consist in improving our script in order to
automate and optimize updates and to restart the cache as soon as a
new directory structure is written on the HTTP server.

The second stage will be the use of a file transfer method to the HTTP
server using a satellite link.


V-3 Diffusion services toward communities of interest 

Using the architectures defined in the previous experiments, we will
work on the definition of a service that feeds up-to-date documents.
For that purpose we will use a file transfer method to be developed
later. This service could rely on an analysis of the logs collected on
the caches and could, for example, decide that the next satellite
update of the HTTP server will only concern the most popular URLs. In
that case, feedback on the most used cached data (obtained by
analyzing the log files) will help make the pre-filling more
cost-effective and more relevant.
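
The selection of the most popular URLs could be sketched as follows.
This is only an illustration of the idea: the function name
most_popular_urls is hypothetical, and the input is assumed to be the
list of requested URLs already extracted from the cache logs, since
the log format used for this analysis is not defined here.

```python
from collections import Counter


def most_popular_urls(requested_urls, limit):
    """Return the `limit` most requested URLs, most popular first."""
    counts = Counter(requested_urls)
    return [url for url, _ in counts.most_common(limit)]


requests = [
    "http://sample/Welcome.html",
    "http://sample/Welcome.gif",
    "http://sample/Welcome.html",
    "http://sample/News.html",
    "http://sample/Welcome.html",
    "http://sample/Welcome.gif",
]
print(most_popular_urls(requests, 2))
```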

Identifying and linking the major points of interest extracted from
the analysis of the logs could enable us to create groups of interest
(which can be different on each site).
As all clients do not have the same points of interest, studies will
be conducted to optimize first the transfer of content common to all
caches, and then the transfer of personalized content for each
community of interest.

We will also work on dynamic pages. 


V-4 Pre-filling using ICP extensions 

Another kind of cache pre-filling, based on the ICP extensions, will
be implemented in a further experiment.
The ICP extensions proposed in the referenced draft
[draft-lovric-icp-ext-01.txt] permit the content of any cache to be
pre-filled thanks to push-caching messages.

A process that sends an ICP_OP_SET message with the ICP_FLAG_ALIAS
flag set to a targeted cache could force a URL into that cache. For
that purpose it uses a scheme such as "file://..." to tell the cache
the network path of the stored URL alias. The cache must then fetch
this alias within the delay set in the ICP_OP_SET message.
Alternatively, a full list of URLs can be pre-filled by sending an
ICP_OP_SET_TAB message with the ICP_FLAG_ALIAS flag set. In this case,
the alias contains the list of the URLs to pre-fill. Each URL must
also have an alias specifying its network path.

Example of list file 
(see [draft-lovric-icp-ext-01.txt] for list file syntax) 
to pre-fill the following URLs :  http://sample/Welcome.html
                                  http://sample/Welcome.gif
1,http
2,sample
3,80
4,/
5,I,Welcome.html,A,file://home/sample/Welcome.html 
5,I,Welcome.gif,A,file://home/sample/Welcome.gif


Note: ICP Extensions also permit compressed aliases to be pre-filled.


VI Partnership France Telecom - EUTELSAT

A partnership between France Telecom and EUTELSAT will focus on the
evaluation of the solutions described previously on a large-scale
multicast platform using satellites.
EUTELSAT will provide the up-link site and the satellite bandwidth,
and France Telecom will provide the cache pre-filling solutions.
New services distributing personalized content to different
communities of interest will also be evaluated on this platform.

The results of these evaluations will be published in a second draft,
which will be written by both partners.

VII References

[RFC1341]
Borenstein, N., and N. Freed, "MIME (Multipurpose Internet Mail
Extensions): Mechanisms for Specifying and Describing the Format of
Internet Message Bodies", RFC 1341, Bellcore, June 1992.

[RFC1738]
Berners-Lee, T., Masinter, L., and M. McCahill, "Uniform Resource
Locators (URL)", RFC 1738, CERN, Xerox PARC, University of Minnesota,
December 1994.

[RFC2186]
Wessels, D., and K. Claffy, "Internet Cache Protocol (ICP), version 2",
RFC 2186, National Laboratory for Applied Network Research/UCSD,
September 1997.

[draft-lovric-icp-ext-01.txt]
Lovric, "Internet Cache Protocol Extension", France Telecom,
October 1998.


VIII Acknowledgments

The authors wish to thank Sandrine CHELLES, Christophe NETILLARD,
Gilles GRATTARD, Betty PREHU, Sylvie LOVRIC for helping us in writing 
this document.


IX Authors' addresses

Cedric Goutard
France Telecom
Centre National des Etudes en Telecommunications
42, rue des Coutures BP 6243 
14066 Caen Cedex
France
Phone: +33 2 31 75 91 49 
Fax: +33 2 31 73 56 26 
E-mail: cedric.goutard@cnet.francetelecom.fr 

Ivan Lovric
France Telecom
Centre National des Etudes en Telecommunications
42, rue des Coutures BP 6243 
14066 Caen Cedex
France
Phone: +33 2 31 75 91 25
Fax: +33 2 31 73 56 26 
E-mail: ivan.lovric@cnet.francetelecom.fr 

Eric Maschio-Esposito
France Telecom
Centre National des Etudes en Telecommunications
42, rue des Coutures BP 6243 
14066 Caen Cedex
France
Phone: +33 2 31 75 91 63
Fax: +33 2 31 73 56 26
E-mail: eric.maschio-esposito@cnet.francetelecom.fr 

