< draft-wessels-icp-v2-appl-01.txt   draft-wessels-icp-v2-appl-02.txt >
Network Working Group D. Wessels Network Working Group D. Wessels
Internet-Draft K. Claffy Internet-Draft K. Claffy
National Laboratory for Applied National Laboratory for Applied
Obsoletes <draft-wessels-icp-v2-appl-00.txt> Network Research/UCSD Obsoletes <draft-wessels-icp-v2-appl-02.txt> Network Research/UCSD
Expires: 1 January 1998 1 July 1997 Expires: 8 January 1998 8 July 1997
Application of Internet Cache Protocol (ICP), version 2 Application of Internet Cache Protocol (ICP), version 2
<draft-wessels-icp-v2-appl-01.txt> <draft-wessels-icp-v2-appl-03.txt>
Status of this Memo Status of this Memo
This document is an Internet-Draft. Internet-Drafts are working This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts. working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
skipping to change at page 1, line 32 skipping to change at page 1, line 32
material or to cite them other than as ``work in progress.'' material or to cite them other than as ``work in progress.''
To learn the current status of any Internet-Draft, please check the To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast). ftp.isi.edu (US West Coast).
Abstract Abstract
This draft document describes version 2 of the Internet Cache This document describes the application of ICPv2 (Internet Cache
Protocol (ICPv2) as currently implemented in two World-Wide Web proxy Protocol version 2, RFCXXXX) to Web caching. ICPv2 is a lightweight
cache packages[3,5]. ICP is a lightweight message format used for message format used for communication among Web caches. Several
communicating among Web caches. ICP is used to exchange hints about independent caching implementations now use ICP[3,5], making it
the existence of URLs in neighbor caches. Caches exchange ICP important to codify the existing practical uses of ICP for those
queries and replies to gather information to use in selecting the trying to implement, deploy, and extend its use.
most appropriate location from which to retrieve an object.
This document describes the application of ICPv2 to Web caching. A ICP queries and replies refer to the existence of URLs (or objects)
companion document (RFCXXXX, <draft-wessels-icp-v2-03.txt>) describes in neighbor caches. Caches exchange ICP messages and use the
the format and syntax of the protocol itself. Several independent gathered information to select the most appropriate location from
caching implementations now use ICP, and we consider it important to which to retrieve an object. A companion document (RFCXXXX)
codify the existing practical uses of ICP for those trying to describes the format and syntax of the protocol itself. In this
implement, deploy, and extend its use for their own purposes. document we focus on issues of ICP deployment, efficiency, security,
and interaction with other aspects of Web traffic behavior.
Table of Contents Table of Contents
1. Introduction................................................. 2 1. Introduction................................................. 2
2. Web Cache Hierarchies........................................ 3 2. Web Cache Hierarchies........................................ 3
3. What is the Added Value of ICP?.............................. 5 3. What is the Added Value of ICP?.............................. 5
4. Example Configuration of ICP Hierarchy....................... 5 4. Example Configuration of ICP Hierarchy....................... 5
4.1. Configuring the `proxy.customer.org' cache................. 5 4.1. Configuring the `proxy.customer.org' cache................. 6
4.2. Configuring the `cache.isp.com' cache...................... 6 4.2. Configuring the `cache.isp.com' cache...................... 6
5. Applying the Protocol........................................ 7 5. Applying the Protocol........................................ 7
5.1. Sending ICP Queries........................................ 7 5.1. Sending ICP Queries........................................ 8
5.2. Receiving ICP Queries and Sending Replies.................. 9 5.2. Receiving ICP Queries and Sending Replies.................. 10
5.3. Receiving ICP Replies...................................... 11 5.3. Receiving ICP Replies...................................... 11
5.4. ICP Options................................................ 13 5.4. ICP Options................................................ 13
6. Firewalls.................................................... 14 6. Firewalls.................................................... 14
7. Multicast.................................................... 14 7. Multicast.................................................... 15
8. Lessons Learned.............................................. 15 8. Lessons Learned.............................................. 16
8.1. Differences Between ICP and HTTP........................... 16
8.2. Parents, Siblings, Hits and Misses......................... 16
8.3. Different Roles of ICP..................................... 17
8.4. Protocol Design Flaws of ICPv2............................. 17
9. Security Considerations...................................... 18 9. Security Considerations...................................... 18
9.1. Inserting Bogus ICP Queries................................ 19
9.2. Inserting Bogus ICP Replies................................ 19
9.3. Eavesdropping.............................................. 20
9.4. Blocking ICP Messages...................................... 20
9.5. Delaying ICP Messages...................................... 20
9.6. Denial of Service.......................................... 20
9.7. Altering ICP Fields........................................ 21
9.8. Summary.................................................... 22
10. References................................................... 23 10. References................................................... 23
11. Acknowledgments.............................................. 23 11. Acknowledgments.............................................. 24
12. Author's Addresses........................................... 23 12. Author's Addresses........................................... 24
1. Introduction 1. Introduction
ICP is a lightweight message format used for communicating among Web ICP is a lightweight message format used for communicating among Web
caches. ICP is used to exchange hints about the existence of URLs in caches. ICP is used to exchange hints about the existence of URLs in
neighbor caches. Caches exchange ICP queries and replies to gather neighbor caches. Caches exchange ICP queries and replies to gather
information for use in selecting the most appropriate location from information for use in selecting the most appropriate location from
which to retrieve an object. which to retrieve an object.
This document describes the implementation of ICP in software. For a This document describes the implementation of ICP in software. For a
description of the protocol and message format, please refer to the description of the protocol and message format, please refer to the
companion document (RFCXXXX, <draft-wessels-icp-v2-03.txt>). We companion document (RFCXXXX). We avoid making judgments about
avoid making judgments about whether or how ICP should be used in whether or how ICP should be used in particular Web caching configu-
particular Web caching configurations. ICP may be a "net win" in rations. ICP may be a "net win" in some situations, and a "net loss"
some situations, and a "net loss" in others. We recognize that cer- in others. We recognize that certain practices described in this
tain things described in this document are incorrect approaches. document are suboptimal. Some of these exist for historical reasons.
Some of these exist for historical reasons. Some aspects have been Some aspects have been improved in later versions. Since this docu-
improved in later versions. Since this document only serves to ment only serves to describe current practices, we focus on document-
describe current practices, we focus on documenting rather than eval- ing rather than evaluating. However, we do address known security
uating. However, we do address known security problems and other problems and other shortcomings.
shortcomings.
The remainder of this document is written as follows: First, we give The remainder of this document is written as follows. We first
some basic descriptions and definitions of Web cache hierarchies, a describe Web cache hierarchies, explain motivation for using ICP, and
brief justification for the existence of ICP, and demonstrate how demonstrate how to configure its use in cache hierarchies. We then
cache hierarchies and ICP are expressed in configuration files. Sec- provide a step-by-step description of an ICP query-response transac-
tion five includes a step-by-step description of a ICP query-response tion. We then discuss ICP interaction with firewalls, and briefly
transaction. The subsequent sections address firewalls, multicast, touch on multicasting ICP. We end with lessons we have learned
lessons learned, and security. during the protocol development and deployment thus far, and the
canonical security considerations.
ICP was initially developed by Peter Danzig et al. at the Univer- ICP was initially developed by Peter Danzig et al. at the Univer-
sity of Southern California as a central part of hierarchical caching sity of Southern California as a central part of hierarchical caching
in the Harvest research project[3]. in the Harvest research project[3].
2. Web Cache Hierarchies 2. Web Cache Hierarchies
A single Web cache will reduce the amount of traffic generated by the A single Web cache will reduce the amount of traffic generated by the
clients behind it. Similarly, a group of Web caches can benefit by clients behind it. Similarly, a group of Web caches can benefit by
sharing another cache in much the same way. Researchers on the Har- sharing another cache in much the same way. Researchers on the Har-
skipping to change at page 3, line 39 skipping to change at page 4, line 4
does have the requested object (i.e., a "neighbor hit"), then the does have the requested object (i.e., a "neighbor hit"), then the
cache will request it from them. If none of the neighbors has the cache will request it from them. If none of the neighbors has the
object (a "neighbor miss"), then the cache must forward the request object (a "neighbor miss"), then the cache must forward the request
either to a parent, or directly to the origin server. The essential either to a parent, or directly to the origin server. The essential
difference between a parent and sibling is that a "neighbor hit" may difference between a parent and sibling is that a "neighbor hit" may
be fetched from either one, but a "neighbor miss" may NOT be fetched be fetched from either one, but a "neighbor miss" may NOT be fetched
from a sibling. In other words, in a sibling relationship, a cache from a sibling. In other words, in a sibling relationship, a cache
can only ask to retrieve objects that the sibling already has cached, can only ask to retrieve objects that the sibling already has cached,
whereas the same cache can ask a parent to retrieve any object whereas the same cache can ask a parent to retrieve any object
regardless of whether or not it is cached. A parent cache's role is regardless of whether or not it is cached. A parent cache's role is
to provide "transit" for the request if necessary, and accordingly
parent caches are ideally located within or on the way to a transit
Internet service provider (ISP).
Squid and Harvest allow for complex hierarchical configurations. For
example, one could specify that a given neighbor be used for only a
certain class of requests, such as URLs from a specific DNS domain.
Additionally, it is possible to treat a neighbor as a sibling for
some requests and as a parent for others.
The cache hierarchy model described here includes a number of fea-
tures to prevent top-level caches from becoming choke points. One is
T H E I N T E R N E T T H E I N T E R N E T
=========================== ===========================
| || | ||
| || | ||
| || | ||
| || | ||
| +----------------------+ | +----------------------+
| | | | | |
| | PARENT | | | PARENT |
| | CACHE | | | CACHE |
skipping to change at page 4, line 45 skipping to change at page 4, line 45
| | | | | | | | | |
| | | | | | | | | |
V V V V V V V V V V
=================== ===================
CACHE CLIENTS CACHE CLIENTS
FIGURE 1: A Simple Web cache hierarchy. The local cache can FIGURE 1: A Simple Web cache hierarchy. The local cache can
retrieve hits from sibling caches, hits and misses from parent retrieve hits from sibling caches, hits and misses from parent
caches, and some requests directly from origin servers. caches, and some requests directly from origin servers.
to provide "transit" for the request if necessary, and accordingly
parent caches are ideally located within or on the way to a transit
Internet service provider (ISP).
Squid and Harvest allow for complex hierarchical configurations. For
example, one could specify that a given neighbor be used for only a
certain class of requests, such as URLs from a specific DNS domain.
Additionally, it is possible to treat a neighbor as a sibling for
some requests and as a parent for others.
The cache hierarchy model described here includes a number of fea-
tures to prevent top-level caches from becoming choke points. One is
the ability to restrict parents as just described previously (by the ability to restrict parents as just described previously (by
domains). Another optimization is that the cache only forwards domains). Another optimization is that the cache only forwards
cachable requests to its neighbors. A large class of Web requests cachable requests to its neighbors. A large class of Web requests
are inherently uncachable, including: requests requiring certain are inherently uncachable, including: requests requiring certain
types of authentication, session-encrypted data, highly personalized types of authentication, session-encrypted data, highly personalized
responses, and certain types of database queries. Lower level caches responses, and certain types of database queries. Lower level caches
should handle these requests directly rather than burdening parent should handle these requests directly rather than burdening parent
caches. caches.
3. What is the Added Value of ICP? 3. What is the Added Value of ICP?
skipping to change at page 8, line 48 skipping to change at page 9, line 13
request (method, port number, source, etc.). request (method, port number, source, etc.).
o The peer is a sibling, and the HTTP request includes a "Pragma: o The peer is a sibling, and the HTTP request includes a "Pragma:
no-cache" header. This is because the sibling would be asked to no-cache" header. This is because the sibling would be asked to
transit the request, which is not allowed. transit the request, which is not allowed.
o The peer is configured to never be sent ICP queries (i.e. with o The peer is configured to never be sent ICP queries (i.e. with
the `no-query' option). the `no-query' option).
If the determination yields only one queryable ICP peer, and the If the determination yields only one queryable ICP peer, and the
squid configuration directive `single_parent_bypass' is set, then one Squid configuration directive `single_parent_bypass' is set, then one
can bypass waiting for the single ICP response and just send the HTTP can bypass waiting for the single ICP response and just send the HTTP
request directly to the peer cache. request directly to the peer cache.
The squid configuration option `source_ping' configures a squid cache The Squid configuration option `source_ping' configures a Squid cache
to send a ping to the original source simultaneous with its ICP to send a ping to the original source simultaneous with its ICP
queries, in case the origin is closer than any of the caches. queries, in case the origin is closer than any of the caches.
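The peer-selection rules visible in this part of the diff can be sketched roughly as follows. This is only an illustration, not code from Squid or Harvest: the `Peer` class, the function names, and the returned labels are hypothetical. It covers just the two exclusion rules shown above (sibling with "Pragma: no-cache", and `no-query`) plus the `single_parent_bypass` shortcut. Conditions elided from this diff and the `source_ping` behavior are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    # Hypothetical record for a configured neighbor cache.
    host: str
    kind: str               # "parent" or "sibling"
    no_query: bool = False  # mirrors the `no-query' option


def queryable_peers(peers, request_headers):
    """Return the peers that may be sent an ICP query for this request.

    Assumes request_headers is a dict keyed by canonical header name.
    """
    no_cache = request_headers.get("Pragma", "").strip().lower() == "no-cache"
    result = []
    for peer in peers:
        if peer.no_query:
            continue  # configured never to receive ICP queries
        if peer.kind == "sibling" and no_cache:
            continue  # a sibling would have to transit the request
        result.append(peer)
    return result


def plan_lookup(peers, request_headers, single_parent_bypass=False):
    """Decide between sending ICP queries and going straight to HTTP."""
    candidates = queryable_peers(peers, request_headers)
    if len(candidates) == 1 and single_parent_bypass:
        # Skip waiting for the single ICP reply.
        return ("http_direct_to_peer", candidates)
    return ("icp_query", candidates)
```

For example, with one parent, one sibling, and one `no-query' parent configured, a request carrying "Pragma: no-cache" leaves a single queryable peer, so the bypass applies when the directive is set.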
5.1.3. Calculate the expected number of ICP replies 5.1.3. Calculate the expected number of ICP replies
Harvest and Squid want to maximize the chance to get a HIT reply from Harvest and Squid want to maximize the chance to get a HIT reply from
one of the peers. Therefore, the cache waits for all ICP replies to one of the peers. Therefore, the cache waits for all ICP replies to
be received. Normally, we expect to receive an ICP reply for each be received. Normally, we expect to receive an ICP reply for each
query sent, except: query sent, except:
skipping to change at page 16, line 7 skipping to change at page 16, line 15
exactly how many replies to expect. Squid regularly (every 15 min- exactly how many replies to expect. Squid regularly (every 15 min-
utes) sends out test ICP_OP_QUERY messages to only the multicast utes) sends out test ICP_OP_QUERY messages to only the multicast
group peers. As with a real ICP query, a timeout event is installed group peers. As with a real ICP query, a timeout event is installed
and the replies are counted until the timeout occurs. We have found and the replies are counted until the timeout occurs. We have found
that the received count varies considerably. Therefore, the number that the received count varies considerably. Therefore, the number
of replies to expect is calculated as a moving average, rounded down of replies to expect is calculated as a moving average, rounded down
to the nearest integer. to the nearest integer.
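A minimal sketch of the reply-count estimate described above is given below. The text says only that a moving average is used and rounded down. The class name, the window size, and the particular averaging formula (a running mean that becomes a fixed-weight moving average once the window fills) are assumptions made for illustration and are not taken from Squid.

```python
import math


class MulticastReplyEstimator:
    """Tracks how many ICP replies a multicast query usually yields."""

    def __init__(self, window=10):
        # `window' is an assumed smoothing parameter; the draft does
        # not say how the moving average is weighted.
        self.window = window
        self.average = 0.0
        self.samples = 0

    def record_probe(self, replies_counted):
        """Fold in the reply count from one periodic test query."""
        self.samples += 1
        n = min(self.samples, self.window)
        self.average += (replies_counted - self.average) / n

    def expected_replies(self):
        """Number of replies to wait for, rounded down."""
        return math.floor(self.average)
```

Rounding down means the cache tends to stop waiting a little early rather than stall on a reply that may never arrive.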
8. Lessons Learned 8. Lessons Learned
8.1. Differences between ICP and HTTP 8.1. Differences Between ICP and HTTP
ICP is notably different from HTTP. HTTP supports a rich and sophis- ICP is notably different from HTTP. HTTP supports a rich and sophis-
ticated set of features. In contrast, ICP was designed to be simple, ticated set of features. In contrast, ICP was designed to be simple,
small, and efficient. HTTP request and reply headers consist of small, and efficient. HTTP request and reply headers consist of
lines of ASCII text delimited by a CRLF pair, whereas ICP uses a lines of ASCII text delimited by a CRLF pair, whereas ICP uses a
fixed size header and represents numbers in binary. The only thing fixed size header and represents numbers in binary. The only thing
ICP and HTTP have in common is the URL. ICP and HTTP have in common is the URL.
Note that the ICP message does not even include the HTTP request Note that the ICP message does not even include the HTTP request
method. The original implementation assumed that only GET requests method. The original implementation assumed that only GET requests
skipping to change at page 16, line 45 skipping to change at page 17, line 8
reply does it say that the two caches have a sibling or parent rela- reply does it say that the two caches have a sibling or parent rela-
tionship. A sibling cache can only respond with HIT or MISS, not tionship. A sibling cache can only respond with HIT or MISS, not
"you can retrieve this from me" or "you can not retrieve this from "you can retrieve this from me" or "you can not retrieve this from
me." The querying cache must apply the HIT or MISS reply to its me." The querying cache must apply the HIT or MISS reply to its
local configuration to prevent it from resolving misses through a local configuration to prevent it from resolving misses through a
sibling cache. This constraint is awkward, because this aspect of sibling cache. This constraint is awkward, because this aspect of
the relationship can be configured only in the cache originating the the relationship can be configured only in the cache originating the
requests, and indirectly via the access controls configured in the requests, and indirectly via the access controls configured in the
queried cache as described earlier in section 4.2. queried cache as described earlier in section 4.2.
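Because the reply itself carries no relationship information, the querying cache has to combine it with its own configuration. A small sketch of that check follows. The function name and the string labels are hypothetical, and treating any other reply as "do not fetch" is a conservative assumption rather than documented behavior.

```python
def may_fetch_from(peer_kind, icp_reply):
    """Apply local configuration to an ICP reply.

    peer_kind: "parent" or "sibling", as configured locally.
    icp_reply: "HIT" or "MISS", as received from the peer.
    """
    if icp_reply == "HIT":
        # A neighbor hit may be fetched from either kind of peer.
        return True
    if icp_reply == "MISS":
        # Only a parent may be asked to resolve a miss.
        return peer_kind == "parent"
    # Assumption: anything else is treated as not fetchable.
    return False
```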
8.3. Different roles of ICP 8.3. Different Roles of ICP
There are two different understandings of what exactly the role of There are two different understandings of what exactly the role of
ICP is in a cache mesh. One understanding is that ICP's role is only ICP is in a cache mesh. One understanding is that ICP's role is only
object location, specifically, to provide hints about whether or not object location, specifically, to provide hints about whether or not
a named object exists in a neighbor cache. An implied assumption is a named object exists in a neighbor cache. An implied assumption is
that cache hits are highly desirable, and ICP is used to maximize the that cache hits are highly desirable, and ICP is used to maximize the
chance of getting them. If an ICP message is lost due to congestion, chance of getting them. If an ICP message is lost due to congestion,
then nothing significant is lost; the request will be satisfied then nothing significant is lost; the request will be satisfied
regardless. regardless.
skipping to change at page 18, line 44 skipping to change at page 19, line 6
unknown addresses. unknown addresses.
Because we trust the validity of an address in an IP packet, ICP is Because we trust the validity of an address in an IP packet, ICP is
susceptible to IP address spoofing. In this document we address some susceptible to IP address spoofing. In this document we address some
consequences of IP address spoofing. Normally, spoofed addresses can consequences of IP address spoofing. Normally, spoofed addresses can
only be detected by routers, not by hosts. However, the IP Authenti- only be detected by routers, not by hosts. However, the IP Authenti-
cation Header[7,8] can be used underneath ICP to provide crypto- cation Header[7,8] can be used underneath ICP to provide crypto-
graphic authentication of the entire IP packet containing the ICP graphic authentication of the entire IP packet containing the ICP
protocol, thus eliminating the risk of IP address spoofing. protocol, thus eliminating the risk of IP address spoofing.
9.1. Inserting bogus ICP queries 9.1. Inserting Bogus ICP Queries
Processing an ICP_OP_QUERY message has no known security implica- Processing an ICP_OP_QUERY message has no known security implica-
tions, so long as the requesting address is granted access to the tions, so long as the requesting address is granted access to the
cache. cache.
9.2. Inserting bogus ICP replies 9.2. Inserting Bogus ICP Replies
Here we are concerned with a third party generating ICP reply mes- Here we are concerned with a third party generating ICP reply mes-
sages which are returned to the querying cache before the real reply sages which are returned to the querying cache before the real reply
arrives, or before any replies arrive. The third party may insert arrives, or before any replies arrive. The third party may insert
bogus ICP replies which appear to come from legitimate neighbors. bogus ICP replies which appear to come from legitimate neighbors.
There are three vulnerabilities: There are three vulnerabilities:
o Preventing a certain neighbor from being used o Preventing a certain neighbor from being used
If a third-party could send an ICP_OP_MISS_NOFETCH reply back If a third-party could send an ICP_OP_MISS_NOFETCH reply back
skipping to change at page 20, line 14 skipping to change at page 20, line 26
Address is zero-filled by Squid and Harvest, the URLs cannot be Address is zero-filled by Squid and Harvest, the URLs cannot be
mapped back to individual host systems. mapped back to individual host systems.
By default, Squid and Harvest do not send ICP messages for URLs con- By default, Squid and Harvest do not send ICP messages for URLs con-
taining `cgi-bin' or `?'. These URLs sometimes contain sensitive taining `cgi-bin' or `?'. These URLs sometimes contain sensitive
information as argument parameters. Cache administrators need to be information as argument parameters. Cache administrators need to be
aware that altering the configuration to make ICP queries for such aware that altering the configuration to make ICP queries for such
URLs may expose sensitive information to outsiders, especially when URLs may expose sensitive information to outsiders, especially when
multicast is used. multicast is used.
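The default filter described above amounts to a substring test on the URL. The sketch below is illustrative only: the function name and the `sensitive_markers` parameter are hypothetical, and real implementations expose this through their own configuration directives.

```python
def should_send_icp_query(url, sensitive_markers=("cgi-bin", "?")):
    """Return False for URLs that may carry sensitive arguments."""
    return not any(marker in url for marker in sensitive_markers)
```

An administrator who empties the marker list gets ICP queries for every URL, which is the exposure the paragraph above warns about.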
9.4. Blocking ICP messages 9.4. Blocking ICP Messages
Intentionally blocked (or discarded) ICP queries or replies will Intentionally blocked (or discarded) ICP queries or replies will
appear to reflect link failure or congestion, and will prevent the appear to reflect link failure or congestion, and will prevent the
use of a neighbor as well as lead to timeouts (see section 5.1.4). use of a neighbor as well as lead to timeouts (see section 5.1.4).
If all messages are blocked, the cache will assume the neighbor is If all messages are blocked, the cache will assume the neighbor is
down and remove it from the selection algorithm. However, if, for down and remove it from the selection algorithm. However, if, for
example, every other query is blocked, the neighbor will remain example, every other query is blocked, the neighbor will remain
"alive," but every other request will suffer the ICP timeout. "alive," but every other request will suffer the ICP timeout.
9.5. Delaying ICP messages 9.5. Delaying ICP Messages
The neighbor selection algorithm normally waits for all ICP MISS The neighbor selection algorithm normally waits for all ICP MISS
replies to arrive. Delaying queries or replies, so that they arrive replies to arrive. Delaying queries or replies, so that they arrive
later than they normally would, will cause additional delay for the later than they normally would, will cause additional delay for the
subsequent HTTP request. Of course, if messages are delayed so that subsequent HTTP request. Of course, if messages are delayed so that
they arrive after the timeout, the behavior is the same as "blocking" they arrive after the timeout, the behavior is the same as "blocking"
above. above.
9.6. Denial of service 9.6. Denial of Service
A denial-of-service attack, in which the ICP port is flooded with a A denial-of-service attack, in which the ICP port is flooded with a
continuous stream of bogus messages, exposes three vulnerabilities: continuous stream of bogus messages, exposes three vulnerabilities:
o The application may log every bogus ICP message and eventually o The application may log every bogus ICP message and eventually
fill up a disk partition. fill up a disk partition.
o The socket receive queue may fill up, causing legitimate mes- o The socket receive queue may fill up, causing legitimate mes-
sages to be dropped. sages to be dropped.
o The host may waste some CPU cycles receiving the bogus messages. o The host may waste some CPU cycles receiving the bogus messages.
9.7. Altering ICP fields 9.7. Altering ICP Fields
Here we assume a third party is able to change one or more of the ICP Here we assume a third party is able to change one or more of the ICP
reply message fields. reply message fields.
Opcode Opcode
Changing the opcode field is much like inserting bogus messages Changing the opcode field is much like inserting bogus messages
described above. Changing a hit to a miss would prevent the peer described above. Changing a hit to a miss would prevent the peer
from being used. Changing a miss to a hit would force the peer to from being used. Changing a miss to a hit would force the peer to
be used. be used.
skipping to change at page 23, line 43 skipping to change at page 24, line 10
1995. 1995.
[9] Atkinson, R., "IP Encapsulating Security Payload (ESP)", RFC [9] Atkinson, R., "IP Encapsulating Security Payload (ESP)", RFC
1827, NRL, August 1995. 1827, NRL, August 1995.
11. Acknowledgments 11. Acknowledgments
The authors wish to thank Paul A Vixie <paul@vix.com> for The authors wish to thank Paul A Vixie <paul@vix.com> for
providing excellent feedback on this document, providing excellent feedback on this document,
Martin Hamilton <martin@mrrl.lut.ac.uk> for pushing the Martin Hamilton <martin@mrrl.lut.ac.uk> for pushing the
development of multicast ICP, and Eric Rescorla <ekr@terisa.com> development of multicast ICP, Eric Rescorla <ekr@terisa.com>
and Randall Atkinson <rja@home.net> for assisting with security issues. and Randall Atkinson <rja@home.net> for assisting with security issues,
and especially Allyn Romanow for keeping us on the right track.
12. Author's Addresses: 12. Authors' Addresses:
Duane Wessels Duane Wessels
National Laboratory for Applied Network Research National Laboratory for Applied Network Research
10100 Hopkins Drive 10100 Hopkins Drive
La Jolla, CA 92093 La Jolla, CA 92093
wessels@nlanr.net wessels@nlanr.net
K Claffy K Claffy
National Laboratory for Applied Network Research National Laboratory for Applied Network Research
10100 Hopkins Drive 10100 Hopkins Drive
 End of changes. 25 change blocks. 
65 lines changed or deleted 79 lines changed or added
