Network Working Group                                           J. Abley
Internet-Draft                                                       ISC
Expires: July 28, 2006                                      K. Lindqvist
                                                Netnod Internet Exchange
                                                        January 24, 2006

                     Operation of Anycast Services
                       draft-ietf-grow-anycast-03

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."
   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 28, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   As the Internet has grown, and as systems and networked services
   within enterprises have become more pervasive, many services with
   high availability requirements have emerged.  These requirements have
   increased the demands on the reliability of the infrastructure on
   which those services rely.

   Various techniques have been employed to increase the availability of
   services deployed on the Internet.  This document presents commentary
   and recommendations for distribution of services using anycast.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Anycast Service Distribution
     3.1.  General Description
     3.2.  Goals
   4.  Design
     4.1.  Protocol Suitability
     4.2.  Node Placement
     4.3.  Routing Systems
       4.3.1.  Anycast within an IGP
       4.3.2.  Anycast within the Global Internet
     4.4.  Routing Considerations
       4.4.1.  Signalling Service Availability
       4.4.2.  Covering Prefix
       4.4.3.  Equal-Cost Paths
       4.4.4.  Route Dampening
       4.4.5.  Reverse Path Forwarding Checks
       4.4.6.  Propagation Scope
       4.4.7.  Other People's Networks
       4.4.8.  Aggregation Risks
     4.5.  Addressing Considerations
     4.6.  Data Synchronisation
     4.7.  Node Autonomy
     4.8.  Multi-Service Nodes
       4.8.1.  Multiple Covering Prefixes
       4.8.2.  Pessimistic Withdrawal
       4.8.3.  Intra-Node Interior Connectivity
   5.  Service Management
     5.1.  Monitoring
   6.  Security Considerations
     6.1.  Denial-of-Service Attack Mitigation
     6.2.  Service Compromise
     6.3.  Service Hijacking
   7.  Protocol Considerations
   8.  IANA Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Change History
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   To distribute a service using anycast, the service is first
   associated with a stable set of IP addresses, and reachability to
   those addresses is advertised in a routing system from multiple,
   independent service nodes.  Various techniques for anycast deployment
   of services are discussed in [RFC1546], [ISC-TN-2003-1] and
   [ISC-TN-2004-1].

   The techniques and considerations described in this document apply to
   services reachable over both IPv4 and IPv6.

   Anycast has in recent years become increasingly popular for adding
   redundancy to DNS servers, complementing the redundancy which the DNS
   architecture itself already provides.  Several root DNS server
   operators have distributed their servers widely around the Internet,
   and both resolver and authority servers are commonly distributed
   within the networks of service providers.  Anycast distribution has
   been used by commercial DNS authority server operators for several
   years.  The use of anycast is not limited to the DNS, although
   anycast imposes some additional limitations on the nature of the
   service being distributed, including transaction longevity,
   transaction state held on servers and data synchronisation
   capabilities.

   Although anycast is conceptually simple, its implementation
   introduces some pitfalls for operation of services.  For example,
   monitoring the availability of the service becomes more difficult:
   the observed availability changes according to the location of the
   client within the network, and the client catchment of individual
   anycast nodes is neither static nor reliably deterministic.

   This document describes the use of anycast for both local-scope
   distribution of services using an Interior Gateway Protocol (IGP) and
   global distribution using BGP [RFC1771].  Many of the issues for
   monitoring and data synchronisation are common to both, but
   deployment issues differ substantially.

2.  Terminology

   Service Address: an IP address associated with a particular service
      (e.g. the destination address used by DNS resolvers to reach a
      particular authority server).

   Anycast: the practice of making a particular Service Address
      available in multiple, discrete, autonomous locations, such that
      datagrams sent are routed to one of several available locations.

   Anycast Node: an internally-connected collection of hosts and routers
      which together provide service for an anycast Service Address.  An
      Anycast Node might be as simple as a single host participating in
      a routing system with adjacent routers, or it might include a
      number of hosts connected in some more elaborate fashion; in
      either case, to the routing system across which the service is
      being anycast, each Anycast Node presents a unique path to the
      Service Address.  The entire anycast system for the service
      consists of two or more separate Anycast Nodes.

   Catchment: in physical geography, an area drained by a river, also
      known as a drainage basin.  By analogy, as used in this document,
      the topological region of a network within which packets directed
      at an anycast address are routed to one particular node.

   Local-Scope Anycast: reachability information for the anycast Service
      Address is propagated through a routing system in such a way that
      a particular anycast node is only visible to a subset of the whole
      routing system.

   Local Node: an Anycast Node providing service using a Local-Scope
      Anycast address.

   Global-Scope Anycast: reachability information for the anycast
      Service Address is propagated through a routing system in such a
      way that a particular anycast node is potentially visible to the
      whole routing system.

   Global Node: an Anycast Node providing service using a Global-Scope
      Anycast address.

3.  Anycast Service Distribution

3.1.  General Description

   Anycast is the name given to the practice of making a Service Address
   available to a routing system at Anycast Nodes in two or more
   discrete locations.  The service provided by each node is generally
   consistent regardless of the particular node chosen by the routing
   system to handle a particular request, although some services may
   benefit from deliberate differences in the behaviours of individual
   nodes, in order to facilitate locality-specific behaviour (see
   Section 4.6).

   For services distributed using anycast, there is no inherent
   requirement for referrals to other servers or name-based service
   distribution ("round-robin DNS"), although those techniques could be
   combined with anycast service distribution if an application required
   it.  The routing system decides which node is used for each request,
   based on the topological design of the routing system and the point
   in the network at which the request originates.

   The Anycast Node chosen to service a particular query can be
   influenced by the traffic engineering capabilities of the routing
   protocols which make up the routing system.  The degree of influence
   available to the operator of the node depends on the scale of the
   routing system within which the Service Address is anycast.

   Load-balancing between Anycast Nodes is typically difficult to
   achieve: the distribution of request and traffic load between nodes
   is generally unbalanced.  However, distribution of load between nodes
   for the purposes of reliability, and coarse-grained distribution of
   load for the purposes of making popular services scalable, can often
   be achieved.
   The scale of the routing system through which a service is anycast
   can vary from a small Interior Gateway Protocol (IGP) connecting a
   small handful of components, to the Border Gateway Protocol (BGP)
   [RFC1771] connecting the global Internet, depending on the nature of
   the service distribution that is required.

3.2.  Goals

   A service may be anycast for a variety of reasons.  A number of
   common objectives are:

   1.  Coarse ("unbalanced") distribution of load across nodes, to allow
       infrastructure to scale to increased numbers of queries and to
       accommodate transient query peaks;

   2.  Mitigation of non-distributed denial-of-service attacks by
       localising damage to single anycast nodes;

   3.  Constraint of distributed denial-of-service attacks or flash
       crowds to local regions around anycast nodes (perhaps restricting
       query traffic to local peering links, rather than paid transit
       circuits);

   4.  Provision of additional information to help locate the sources of
       attack (or query) traffic which incorporates spoofed source
       addresses.  This information is derived from the property of
       anycast service distribution that the selection of the Anycast
       Node used to service a particular query may be related to the
       topological source of the request;

   5.  Improvement of query response time, by reducing the network
       distance between client and server with the provision of a local
       Anycast Node.  The extent to which query response time is
       improved depends on the way that nodes are selected for the
       clients by the routing system.  Topological nearness within the
       routing system does not, in general, correlate to round-trip
       performance across a network; in some cases response times may
       see no reduction, and may even increase;

   6.  Reduction of a list of servers to a single, distributed address.
       For example, a large number of authoritative nameservers for a
       zone may be deployed using a small set of anycast Service
       Addresses; this approach can increase the accessibility of zone
       data in the DNS without increasing the size of a referral
       response from a nameserver authoritative for the parent zone.

4.  Design

4.1.  Protocol Suitability

   When a service is anycast between two or more nodes, the routing
   system makes the node selection decision on behalf of a client.
   Since it is usually a requirement that a single client-server
   interaction is carried out between a client and the same server node
   for the duration of the transaction, it follows that the routing
   system's node selection decision ought to be stable for substantially
   longer than the expected transaction time, if the service is to be
   provided reliably.

   Some services have very short transaction times, and may even be
   carried out using a single packet request and a single packet reply
   (e.g. DNS transactions over UDP transport).  Other services involve
   far longer-lived transactions (e.g. bulk file downloads and audio-
   visual media streaming).

   Services may be anycast within very predictable routing systems,
   which can remain stable for long periods of time (e.g. anycast within
   a well-managed and topologically-simple IGP, where node selection
   changes only occur as a response to node failures).  Other
   deployments have far less predictable characteristics (see
   Section 4.4.7).

   The stability of the routing system and the transaction time of the
   service should be carefully compared when deciding whether a service
   is suitable for distribution using anycast.  In some cases, for new
   protocols, it may be practical to split large transactions into an
   initialisation phase which is handled by anycast servers, and a
   sustained phase which is provided by non-anycast servers, perhaps
   chosen during the initialisation phase.

   This document deliberately avoids prescribing rules as to which
   protocols or services are suitable for distribution by anycast; to
   attempt to do so would be presumptuous.

4.2.  Node Placement

   Decisions as to where Anycast Nodes should be placed will depend to a
   large extent on the goals of the service distribution.  For example:

   o  A DNS recursive resolver service might be distributed within an
      ISP's network, one Anycast Node per site.

   o  A root DNS server service might be distributed throughout the
      Internet; Anycast Nodes could be located in regions with poor
      external connectivity to ensure that the DNS functions adequately
      within the region during times of external network failure.

   o  An FTP mirror service might include local nodes located at
      exchange points, so that ISPs connected to that exchange point
      could download bulk data more cheaply than if they had to use
      expensive transit circuits.

   In general, node placement decisions should be made with
   consideration of likely traffic requirements, the potential for flash
   crowds or denial-of-service traffic, the stability of the local
   routing system, and the failure modes with respect to node failure or
   local routing system failure.

4.3.  Routing Systems

4.3.1.  Anycast within an IGP

   There are several common motivations for the distribution of a
   Service Address within the scope of an IGP:

   1.  to improve service response times, by hosting a service close to
       other users of the network;

   2.  to improve service reliability by providing automatic fail-over
       to backup nodes; and

   3.  to keep service traffic local, to avoid congesting wide-area
       links.

   In each case the decisions as to where and how services are
   provisioned can be made by network engineers without requiring such
   operational complexities as regional variances in the configuration
   of client computers, or deliberate DNS incoherence (causing DNS
   queries to yield different answers depending on where the queries
   originate).

   When a service is anycast within an IGP, the routing system is
   typically under the control of the same organisation that is
   providing the service, and hence the relationship between service
   transaction characteristics and network stability is likely to be
   well-understood.  This technique is consequently applicable to a
   larger number of applications than Internet-wide anycast service
   distribution (see Section 4.1).

   An IGP will generally have no inherent restriction on the length of
   prefix that can be introduced to it.  In this case there is no need
   to construct a covering prefix for particular Service Addresses; host
   routes corresponding to the Service Address can instead be introduced
   to the routing system.  See Section 4.4.2 for more discussion of the
   requirement for a covering prefix.

   IGPs often feature little or no aggregation of routes, partly due to
   algorithmic complexities in supporting aggregation.  There is little
   motivation for aggregation in many networks' IGPs, since the amount
   of routing information carried in the IGP is small enough that
   scaling concerns in routers do not arise.  For discussion of
   aggregation risks in other routing systems, see Section 4.4.8.

   By reducing the scope of the IGP to just the hosts providing service
   (together with one or more gateway routers), this technique can be
   applied to the construction of server clusters.  This application is
   discussed in some detail in [ISC-TN-2004-1].

4.3.2.  Anycast within the Global Internet

   Service Addresses may be anycast within the global Internet routing
   system in order to distribute services across the entire network.
   The principal differences between this application and the IGP-scope
   distribution discussed in Section 4.3.1 are that:

   1.  the routing system is, in general, controlled by other people;

   2.  the routing protocol concerned (BGP), and commonly-accepted
       practices in its deployment, impose some additional constraints
       (see Section 4.4).

4.4.  Routing Considerations

4.4.1.  Signalling Service Availability

   When a routing system is provided with reachability information for a
   Service Address from an individual node, packets addressed to that
   Service Address will start to arrive at the node.  Since it is
   essential for the node to be ready to accept requests before they
   start to arrive, a coupling between the routing information and the
   availability of the service at a particular node is desirable.

   Where a routing advertisement from a node corresponds to a single
   Service Address, this coupling might be such that availability of the
   service triggers the route advertisement, and non-availability of the
   service triggers a route withdrawal.  This can be achieved by running
   a routing protocol implementation on the same server that provides
   the service being distributed, configured to advertise and withdraw
   the route in conjunction with the availability (and health) of the
   software which processes service requests.  An example of such an
   arrangement for a DNS service is included in [ISC-TN-2004-1].

   Where a routing advertisement from a node corresponds to two or more
   Service Addresses, it may not be appropriate to trigger a route
   withdrawal due to the non-availability of a single service.
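   The coupling between service health and route advertisement described
   above can be sketched as a small control loop.  The following Python
   sketch is illustrative only: the advertise/withdraw callbacks stand in
   for whatever routing protocol implementation a deployment actually
   uses, and the minimum re-advertisement delay is an assumed parameter
   (a hold-down to avoid contributing route flap; see Section 4.4.4).

```python
import time

class RouteController:
    """Couples a route advertisement to service health, with a hold-down
    interval after each withdrawal to damp rapid oscillation."""

    def __init__(self, advertise, withdraw, min_readvertise_delay=300.0,
                 clock=time.monotonic):
        self.advertise = advertise          # callback: announce the route
        self.withdraw = withdraw            # callback: withdraw the route
        self.min_readvertise_delay = min_readvertise_delay
        self.clock = clock
        self.advertised = False
        self.last_withdrawal = None

    def on_health_check(self, healthy):
        now = self.clock()
        if healthy and not self.advertised:
            # Hold-down: do not re-advertise too soon after a withdrawal.
            if (self.last_withdrawal is None or
                    now - self.last_withdrawal >= self.min_readvertise_delay):
                self.advertise()
                self.advertised = True
        elif not healthy and self.advertised:
            self.withdraw()
            self.advertised = False
            self.last_withdrawal = now
```

   A monitoring process would call on_health_check() periodically with
   the result of a service probe; the route then tracks service health
   without oscillating faster than the configured hold-down allows.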
   Another approach in the case where the service is down at one Anycast
   Node is to route requests to a different Anycast Node where the
   service is working normally.  This approach is discussed in
   Section 4.8.

   Rapid advertisement/withdrawal oscillations can cause operational
   problems, and nodes should be configured such that rapid oscillations
   are avoided (e.g. by implementing a minimum delay following a
   withdrawal before the service can be re-advertised).  See
   Section 4.4.4 for a discussion of route oscillations in BGP.

4.4.2.  Covering Prefix

   In some routing systems (e.g. the BGP-based routing system of the
   global Internet) it is not possible, in general, to propagate a host
   route with confidence that the route will propagate throughout the
   network.  This is a consequence of operational policy, not a protocol
   restriction.

   In such cases it is necessary to propagate a route which covers the
   Service Address, and which has a sufficiently short prefix that it
   will not be discarded by commonly-deployed import policies.  For IPv4
   Service Addresses, this is often a 24-bit prefix, but there are other
   well-documented examples of IPv4 import policies which filter on
   Regional Internet Registry (RIR) allocation boundaries, and hence
   some experimentation may be prudent.  Corresponding import policies
   for IPv6 prefixes also exist.  See Section 4.5 for more discussion of
   IPv6 Service Addresses and corresponding anycast routes.

   The propagation of a single route per service has some associated
   scaling issues, which are discussed in Section 4.4.8.

   Where multiple Service Addresses are covered by the same covering
   route, there is no longer a tight coupling between the advertisement
   of that route and the individual services associated with the covered
   host routes.  The resulting impact on signalling availability of
   individual services is discussed in Section 4.4.1 and Section 4.8.

4.4.3.  Equal-Cost Paths

   Some routing systems support equal-cost paths to the same
   destination.  Where multiple, equal-cost paths exist and lead to
   different anycast nodes, there is a risk that different request
   packets associated with a single transaction might be delivered to
   more than one node.  Services provided over TCP [RFC0793] necessarily
   involve transactions with multiple request packets, due to the TCP
   setup handshake.

   For services which are distributed across the global Internet using
   BGP, equal-cost paths are normally not a consideration: BGP's exit
   selection algorithm usually selects a single, consistent exit for a
   single destination regardless of whether multiple candidate paths
   exist.  However, implementations of BGP exist that support multi-path
   exit selection.

   Equal-cost paths are commonly supported in IGPs.  Multi-node
   selection for a single transaction can be avoided in most cases by
   careful consideration of IGP link metrics, or by applying equal-cost
   multi-path (ECMP) selection algorithms which cause a single node to
   be selected for a single multi-packet transaction.  For an example of
   the use of hash-based ECMP selection in anycast service distribution,
   see [ISC-TN-2004-1].

   Other ECMP selection algorithms are commonly available, including
   those in which packets from the same flow are not guaranteed to be
   routed towards the same destination.  ECMP algorithms which select a
   route on a per-packet basis rather than per-flow are commonly
   referred to as performing "Per Packet Load Balancing" (PPLB).
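   The per-flow property that hash-based ECMP provides can be
   illustrated with a minimal sketch (a generic illustration of the
   technique, not the specific implementation cited above; the addresses
   and next-hop names are hypothetical).  Because the hash is computed
   over the packet's flow identifiers, every packet of a given TCP or
   UDP flow maps to the same next hop, so a multi-packet transaction
   always reaches the same anycast node.

```python
import hashlib

def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto, next_hops):
    """Per-flow ECMP: hash the 5-tuple so that all packets belonging to
    one flow are forwarded via the same equal-cost next hop."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]

# Two equal-cost paths leading to different anycast nodes:
hops = ["node-A", "node-B"]
# All packets of one DNS-over-UDP flow select the same node:
choice = ecmp_next_hop("192.0.2.1", "198.51.100.1", 49152, 53, "udp", hops)
```

   A per-packet (PPLB) selector, by contrast, would pick a next hop
   independently for each packet, which is exactly the behaviour that
   can split one transaction across multiple anycast nodes.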
474 With respect to anycast service distribution, some uses of PPLB may 475 cause different packets from a single multi-packet transaction sent 476 by a client to be delivered to different anycast nodes, effectively 477 making the anycast service unavailable. Whether this affects 478 specific anycast services will depend on how and where anycast nodes 479 are deployed within the routing system, and on where the PPLB is 480 being performed: 482 1. PPLB across multiple, parallel links between the same pair of 483 routers should cause no node selection problems; 485 2. PPLB across diverse paths within a single autonomous system (AS), 486 where the paths converge to a single exit as they leave the AS, 487 should cause no node selection problems; 489 3. PPLB across links to different neighbour ASes where the neighbour 490 ASes have selected different nodes for a particular anycast 491 destination will, in general, cause request packets to be 492 distributed across multiple anycast nodes. This will have the 493 effect that the anycast service is unavailable to clients 494 downstream of the router performing PPLB. 496 The uses of PPLB which have the potential to interact badly with 497 anycast service distribution can also cause persistent packet 498 reordering. A network path that persistently reorders segments will 499 degrade the performance of traffic carried by TCP [Allman2000]. TCP, 500 according to several documented measurements, accounts for the bulk 501 of traffic carried on the Internet ([McCreary2000], [Fomenkov2004]). 502 Consequently, in many cases it is reasonable to consider networks 503 making such use of PPLB to be pathological. 505 4.4.4. Route Dampening 507 Frequent advertisements and withdrawals of individual prefixes in BGP 508 are known as flaps. 
Rapid flapping can lead to CPU exhaustion on 509 routers quite remote from the source of the instability, and for this 510 reason rapid route oscillations are frequently "dampened", as 511 described in [RFC2439]. 513 A dampened path will be suppressed by routers for an interval which 514 increases according to the frequency of the observed oscillation; a 515 suppressed path will not propagate. Hence a single router can 516 prevent the propagation of a flapping prefix to the rest of an 517 autonomous system, affording other routers in the network protection 518 from the instability. 520 Some implementations of flap dampening penalise oscillating 521 advertisements based on the observed AS_PATH, and not on the NLRI. 522 For this reason, network instability which leads to route flapping 523 from a single anycast node ought not to cause advertisements from 524 other nodes (which have different AS_PATH attributes) to be dampened. 526 To limit the opportunity of such implementations to penalise 527 advertisements originating from different Anycast Nodes in response 528 to oscillations from just a single node, care should be taken to 529 arrange that the AS_PATH attributes on routes from different nodes 530 are as diverse as possible. For example, Anycast Nodes should use 531 the same origin AS for their advertisements, but might have different 532 upstream ASes. 534 Where different implementations of flap dampening are prevalent, 535 individual nodes' instability may result in stable nodes becoming 536 unavailable. In mitigation, the following measures may be useful: 538 1. Judicious deployment of Local Nodes in combination with 539 especially stable Global Nodes (with high inter-AS path splay, 540 redundant hardware, power, etc) may help limit oscillation 541 problems to the Local Nodes' limited regions of influence; 543 2. Aggressive flap-dampening of the service prefix close to the 544 origin (e.g. 
within an Anycast Node, or in adjacent ASes of each 545 Anycast Node) may also help reduce the opportunity of remote ASes 546 to see oscillations at all. 548 4.4.5. Reverse Path Forwarding Checks 550 Reverse Path Forwarding (RPF) checks, first described in [RFC2267], 551 are commonly deployed as part of ingress interface packet filters on 552 routers in the Internet in order to deny packets whose source 553 addresses are spoofed (see also RFC 2827 [RFC2827]). Deployed 554 implementations of RPF make several modes of operation available 555 (e.g. "loose" and "strict"). 557 Some modes of RPF can cause non-spoofed packets to be denied when 558 they originate from multi-homed site, since selected paths might 559 legitimately not correspond with the ingress interface of non-spoofed 560 packets from the multi-homed site. This issue is discussed in 561 [RFC3704]. 563 A collection of anycast nodes deployed across the Internet is largely 564 indistinguishable from a distributed, multi-homed site to the routing 565 system, and hence this risk also exists for anycast nodes, even if 566 individual nodes are not multi-homed. Care should be taken to ensure 567 that each anycast node is treated as a multi-homed network, and that 568 the corresponding recommendations in [RFC3704] with respect to RPF 569 checks are heeded. 571 4.4.6. Propagation Scope 573 In the context of Anycast service distribution across the global 574 Internet, Global Nodes are those which are capable of providing 575 service to clients anywhere in the network; reachability information 576 for the service is propagated globally, without restriction, by 577 advertising the routes covering the Service Addresses for global 578 transit to one or more providers. 580 More than one Global Node can exist for a single service (and indeed 581 this is often the case, for reasons of redundancy and load-sharing). 
In contrast, it is sometimes desirable to deploy an Anycast Node which only provides services to a local catchment of autonomous systems, and which is deliberately not available to the entire Internet; such nodes are referred to in this document as Local Nodes.  An example of circumstances in which a Local Node may be appropriate is a node designed to serve a region with rich internal connectivity but unreliable, congested or expensive access to the rest of the Internet.

Local Nodes advertise covering routes for Service Addresses in such a way that their propagation is restricted.  This might be done using well-known community string attributes such as NO_EXPORT [RFC1997] or NOPEER [RFC3765], by arranging with peers to apply a conventional "peering" import policy instead of a "transit" import policy, or by some suitable combination of measures.

Advertisements of reachability to Service Addresses from Local Nodes should ideally be made using a routing policy that requires the presence of explicit attributes for propagation, rather than relying on implicit (default) policy.  Inadvertent propagation of a route beyond its intended horizon can result in capacity problems for Local Nodes which might degrade service performance network-wide.

4.4.7.  Other Peoples' Networks

When Anycast services are deployed across networks operated by others, their reachability is dependent on routing policies and topology changes (planned and unplanned) which are unpredictable and sometimes difficult to identify.  Since the routing system may include networks operated by multiple, unrelated organisations, the possibility of unforeseen interactions resulting from combinations of unrelated changes also exists.
The stability and predictability of such a routing system should be taken into consideration when assessing the suitability of anycast as a distribution strategy for particular services and protocols (see also Section 4.1).

By way of mitigation, routing policies used by Anycast Nodes across such routing systems should be conservative; individual nodes' internal and external/connecting infrastructure should be scaled to support loads far in excess of the average; and the service should be monitored proactively from many points in order to avoid unpleasant surprises (see Section 5.1).

4.4.8.  Aggregation Risks

The propagation of a single route for each anycast service does not scale well for routing systems in which the load of routing information to be carried is a concern, and where there are potentially many services to distribute.  For example, an autonomous system which provides services to the Internet with N Service Addresses covered by a single exported route would need to advertise (N+1) routes if each of those services were to be distributed using anycast.

The common practice of applying minimum prefix-length filters in import policies on the Internet (see Section 4.4.2) means that, for a route covering a Service Address to be usefully propagated, the prefix length must be substantially less than that required to advertise just the host route.  Widespread advertisement of short prefixes for individual services hence also has a negative impact on address conservation.

Both of these issues can be mitigated to some extent by the use of a single covering prefix to accommodate multiple Service Addresses, as described in Section 4.8.  This implies a de-coupling of the route advertisement from individual service availability (see Section 4.4.1), however, with attendant risks to the stability of the service as a whole (see Section 4.7).
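The arithmetic above can be illustrated with a short sketch.  Here a single covering prefix accommodates several Service Addresses, so one route suffices where per-service routes plus the aggregate would require N+1.  The addresses are drawn from the 192.0.2.0/24 documentation range and the service labels are purely illustrative:

```python
import ipaddress

# Hypothetical Service Addresses for several distinct services,
# all numbered out of a single netblock (illustrative only).
service_addresses = [
    ipaddress.ip_address("192.0.2.1"),   # e.g. DNS
    ipaddress.ip_address("192.0.2.2"),   # e.g. NTP
    ipaddress.ip_address("192.0.2.3"),   # e.g. FTP archive
]

covering_prefix = ipaddress.ip_network("192.0.2.0/24")

# With one covering prefix, a single route reaches all services.
assert all(addr in covering_prefix for addr in service_addresses)

# With one route per service plus the aggregate, the routing system
# would instead carry N+1 routes, as described above.
n = len(service_addresses)
print(f"routes with aggregation: 1; without: {n + 1}")  # -> 1 vs 4
```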
In general, the scaling problems described here prevent anycast from being a useful, general approach for service distribution on the global Internet.  It remains, however, a useful technique for distributing a limited number of Internet-critical services, as well as in smaller networks where the aggregation concerns discussed here do not apply.

4.5.  Addressing Considerations

Service Addresses should be unique within the routing system that connects all Anycast Nodes to all possible clients of the service.  Service Addresses must also be chosen so that the corresponding routes will be allowed to propagate within that routing system.

For an IPv4-numbered service deployed across the Internet, for example, an address might be chosen from a block where the minimum RIR allocation size is 24 bits, and reachability to that address might be provided by originating the covering 24-bit prefix.

For an IPv4-numbered service deployed within a private network, a locally-unused [RFC1918] address might be chosen, and reachability to that address might be signalled using a (32-bit) host route.

For IPv6-numbered services, Anycast Addresses are not scoped differently from unicast addresses.  As such, the guidelines presented for IPv4 with respect to address suitability also apply to IPv6.  Note that historical prohibitions on anycast distribution of services over IPv6 have been removed from the IPv6 addressing specification in [I-D.ietf-ipv6-addr-arch-v4].

4.6.  Data Synchronisation

Although some services have been deployed in localised form (such that clients from particular regions are presented with regionally-relevant content), many services have the property that responses to client requests should be consistent, regardless of where the request originates.
For a service distributed using anycast, this implies that different Anycast Nodes must operate in a consistent manner and, where that consistent behaviour is based on a data set, that the data concerned be synchronised between nodes.

The mechanism by which data is synchronised depends on the nature of the service; examples are zone transfers for authoritative DNS servers and rsync for FTP archives.  In general, the synchronisation of data between Anycast Nodes will involve transactions between non-anycast addresses.

Data synchronisation across public networks should be carried out with appropriate authentication and encryption.

4.7.  Node Autonomy

For an Anycast deployment whose goals include improved reliability through redundancy, it is important to minimise the opportunity for a single defect to compromise many (or all) nodes, or for the failure of one node to trigger a cascading failure that brings down successive nodes until the service as a whole is defeated.

Co-dependencies are avoided by making each node as autonomous and self-sufficient as possible.  The degree to which nodes can survive failure elsewhere depends on the nature of the service being delivered, but for services which accommodate disconnected operation (e.g. the timed propagation of changes between master and slave servers in the DNS) a high degree of autonomy can be achieved.

The possibility of cascading failure due to load can also be reduced by the deployment of both Global and Local Nodes for a single service, since the effective fail-over path of traffic is, in general, from Local Node to Global Node; traffic that might sink one Local Node is unlikely to sink all Local Nodes, except in the most degenerate cases.
The chance of cascading failure due to a software defect in an operating system or server can be reduced in many cases by deploying nodes running different implementations of operating system, server software, routing protocol software, etc., such that a defect which appears in a single component does not affect the whole system.

It should be noted that these approaches to increasing node autonomy are, to varying degrees, contrary to the practical goal of making a deployed service straightforward to operate.  A service which is over-complex is more likely to suffer from operator error than a service which is more straightforward to run.  Careful consideration should be given to all of these aspects so that an appropriate balance may be found.

4.8.  Multi-Service Nodes

For a service distributed across a routing system where covering prefixes are required to announce reachability to a single Service Address (see Section 4.4.2), special consideration is required in the case where multiple services need to be distributed across a single set of nodes.  This results from the requirement to signal the availability of individual services to the routing system so that requests for service are not received by nodes which are not able to process them (see Section 4.4.1).

Several approaches are described in the following sections.

4.8.1.  Multiple Covering Prefixes

Each Service Address is chosen such that only one Service Address is covered by each advertised prefix.  Advertisement and withdrawal of a single covering prefix can then be tightly coupled to the availability of the single associated service.

This is the most straightforward approach.  However, since it makes very poor use of globally-unique addresses, it is only suitable for a small number of critical, infrastructural services such as root DNS servers.
General Internet-wide deployment of services using this approach will not scale.

4.8.2.  Pessimistic Withdrawal

Multiple Service Addresses are chosen such that they are covered by a single prefix.  Advertisement and withdrawal of the single covering prefix is coupled to the availability of all associated services; if any individual service becomes unavailable, the covering prefix is withdrawn.

The coupling between service availability and advertisement of the covering prefix is complicated by the requirement that all Service Addresses be available: the announcement needs to be triggered by the presence of all component routes, and not just a single covered route.

The fact that a single malfunctioning service causes all deployed services in a node to be taken off-line may make this approach unsuitable for many applications.

4.8.3.  Intra-Node Interior Connectivity

Multiple Service Addresses are chosen such that they are covered by a single prefix.  Advertisement and withdrawal of the single covering prefix is coupled to the availability of any one service.  Nodes have interior connectivity (e.g. using tunnels), and host routes for service addresses are distributed using an IGP which extends to include routers at all nodes.

In the event that a service is unavailable at one node, but available at other nodes, a request may be routed over the interior network from the receiving node towards some other node for processing.

In the event that some local services in a node are down and the node is disconnected from other nodes, continued advertisement of the covering prefix might cause requests to be black-holed.
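The two coupling rules described in Sections 4.8.2 and 4.8.3 differ only in how per-service health is combined into a single advertisement decision for the covering prefix.  A minimal sketch of that distinction (service names and health states are hypothetical):

```python
# Hypothetical health states for services sharing one covering prefix.
services_up = {"dns": True, "ntp": False, "ftp": True}

def advertise_pessimistic(health):
    """Section 4.8.2: withdraw the covering prefix if ANY service is down."""
    return all(health.values())

def advertise_with_interior(health):
    """Section 4.8.3: keep advertising while ANY service is up, relying
    on interior (e.g. tunnelled) host routes to hand requests for
    locally-unavailable services to other nodes."""
    return any(health.values())

print(advertise_pessimistic(services_up))    # False: one service is down
print(advertise_with_interior(services_up))  # True: at least one is up
```

Under the second rule, the black-holing risk noted above corresponds to the case where the node is advertising but has lost its interior paths to the other nodes.
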
This approach allows reasonable address utilisation of the netblock covered by the announced prefix, at the expense of reduced autonomy of individual nodes; the IGP in which all nodes participate can be viewed as a single point of failure.

5.  Service Management

5.1.  Monitoring

Monitoring a distributed service is more complex than monitoring a non-distributed service, since the observed accuracy and availability of the service is, in general, different when viewed from clients attached to different parts of the network.  When a problem is identified, it is also not always obvious which node served the request, and hence which node is malfunctioning.

It is recommended that distributed services be monitored from probes distributed representatively across the routing system and, where possible, that the identity of the node answering individual requests be recorded along with performance and availability statistics.  The RIPE NCC DNSMON service [1] is an example of such monitoring for the DNS.

Monitoring the routing system (from a variety of places, in the case of routing systems where perspective is relevant) can also provide useful diagnostics for troubleshooting service availability.  This can be achieved using dedicated probes, or public route measurement facilities on the Internet such as the RIPE NCC Routing Information Service [2] and the University of Oregon Route Views Project [3].

Monitoring the health of the component devices in an Anycast deployment of a service (hosts, routers, etc.) is straightforward, and can be achieved using the same tools and techniques commonly used to manage other network-connected infrastructure, without the additional complexity involved in monitoring Anycast service addresses.

6.  Security Considerations

6.1.  Denial-of-Service Attack Mitigation

This document describes mechanisms for deploying services on the Internet which can be used to mitigate vulnerability to attack:

1.  An Anycast Node can act as a sink for attack traffic originated within its sphere of influence, preventing nodes elsewhere from having to deal with that traffic;

2.  The task of dealing with attack traffic whose sources are widely distributed is itself distributed across all the nodes which contribute to the service.  Since the problem of sorting between legitimate and attack traffic is distributed, this may lead to better scaling properties than for a service which is not distributed.

6.2.  Service Compromise

The distribution of a service across several (or many) autonomous nodes imposes an increased monitoring and systems administration burden on the operator of the service, which might reduce the effectiveness of host and router security.

The potential benefit of being able to take compromised servers off-line without compromising the service can only be realised if there are working procedures for doing so quickly and reliably.

6.3.  Service Hijacking

It is possible that an unauthorised party might advertise routes corresponding to anycast Service Addresses across a network, and by doing so capture legitimate request traffic or process requests in a manner which compromises the service (or both).  A rogue Anycast Node might be difficult for clients or the operator of the service to detect.

The risk of service hijacking by manipulation of the routing system exists regardless of whether a service is distributed using anycast.  However, the fact that legitimate Anycast Nodes are observable in the routing system may make it more difficult to detect rogue nodes.

7.  Protocol Considerations

This document does not impose any protocol considerations.

8.  IANA Considerations

This document requests no action from IANA.

9.  Acknowledgements

The authors gratefully acknowledge the contributions from various participants of the grow working group, and in particular Geoff Huston, Pekka Savola, Danny McPherson, Ben Black and Alan Barrett.

This work was supported by the US National Science Foundation (research grant SCI-0427144) and DNS-OARC.

10.  References

10.1.  Normative References

[I-D.ietf-ipv6-addr-arch-v4]
           Hinden, R. and S. Deering, "IP Version 6 Addressing Architecture", draft-ietf-ipv6-addr-arch-v4-04 (work in progress), May 2005.

[RFC0793]  Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[RFC1771]  Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 (BGP-4)", RFC 1771, March 1995.

[RFC1918]  Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and E. Lear, "Address Allocation for Private Internets", BCP 5, RFC 1918, February 1996.

[RFC1997]  Chandrasekeran, R., Traina, P., and T. Li, "BGP Communities Attribute", RFC 1997, August 1996.

[RFC2439]  Villamizar, C., Chandra, R., and R. Govindan, "BGP Route Flap Damping", RFC 2439, November 1998.

[RFC2827]  Ferguson, P. and D. Senie, "Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing", BCP 38, RFC 2827, May 2000.

[RFC3704]  Baker, F. and P. Savola, "Ingress Filtering for Multihomed Networks", BCP 84, RFC 3704, March 2004.

10.2.  Informative References

[Allman2000]
           Allman, M. and E. Blanton, "On Making TCP More Robust to Packet Reordering", January 2000.

[Fomenkov2004]
           Fomenkov, M., Keys, K., Moore, D., and k. claffy, "Longitudinal Study of Internet Traffic from 1999-2003", January 2004.

[ISC-TN-2003-1]
           Abley, J., "Hierarchical Anycast for Global Service Distribution", March 2003.
[ISC-TN-2004-1]
           Abley, J., "A Software Approach to Distributing Requests for DNS Service using GNU Zebra, ISC BIND 9 and FreeBSD", March 2004.

[McCreary2000]
           McCreary, S. and k. claffy, "Trends in Wide Area IP Traffic Patterns: A View from Ames Internet Exchange", September 2000.

[RFC1546]  Partridge, C., Mendez, T., and W. Milliken, "Host Anycasting Service", RFC 1546, November 1993.

[RFC2267]  Ferguson, P. and D. Senie, "Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing", RFC 2267, January 1998.

[RFC3765]  Huston, G., "NOPEER Community for Border Gateway Protocol (BGP) Route Scope Control", RFC 3765, April 2004.

URIs

[1]

[2]

[3]

Appendix A.  Change History

This section should be removed before publication.

draft-kurtis-anycast-bcp-00: Initial draft.  Discussed at IETF 61 in the grow meeting and adopted as a working group document shortly afterwards.

draft-ietf-grow-anycast-00: Missing and empty sections completed; some structural reorganisation; general wordsmithing.  Document discussed at IETF 62.

draft-ietf-grow-anycast-01: This appendix added; acknowledgements section added; commentary on RFC3513 prohibition of anycast on hosts removed; minor sentence re-casting and related jiggery-pokery.  This revision published for discussion at IETF 63.

draft-ietf-grow-anycast-02: Normative reference to [I-D.ietf-ipv6-addr-arch-v4] added (in the RFC Editor's queue at the time of writing; reference should be updated to an RFC number when available).  Added commentary on per-packet load balancing.

draft-ietf-grow-anycast-03: Editorial changes and language clean-up at the request of the IESG.

Authors' Addresses

Joe Abley
Internet Systems Consortium, Inc.
950 Charter Street
Redwood City, CA  94063
USA

Phone: +1 650 423 1317
Email: jabley@isc.org
URI:   http://www.isc.org/

Kurt Erik Lindqvist
Netnod Internet Exchange
Bellmansgatan 30
118 47 Stockholm
Sweden

Email: kurtis@kurtis.pp.se
URI:   http://www.netnod.se/

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights.  Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard.  Please address the information to the IETF at ietf-ipr@ietf.org.
Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2006).  This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.