2 Network Working Group J. Abley 3 Internet-Draft ISC 4 Expires: April 24, 2006 K. Lindqvist 5 Netnod Internet Exchange 6 October 21, 2005 8 Operation of Anycast Services 9 draft-ietf-grow-anycast-02 11 Status of this Memo 13 By submitting this Internet-Draft, each author represents that any 14 applicable patent or other IPR claims of which he or she is aware 15 have been or will be disclosed, and any of which he or she becomes 16 aware will be disclosed, in accordance with Section 6 of BCP 79. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that 20 other groups may also distribute working documents as Internet- 21 Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six months 24 and may be updated, replaced, or obsoleted by other documents at any 25 time. It is inappropriate to use Internet-Drafts as reference 26 material or to cite them other than as "work in progress." 
28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/ietf/1id-abstracts.txt. 31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html. 34 This Internet-Draft will expire on April 24, 2006. 36 Copyright Notice 38 Copyright (C) The Internet Society (2005). 40 Abstract 42 As the Internet has grown, and as systems and networked services 43 within enterprises have become more pervasive, many services with 44 high availability requirements have emerged. These requirements have 45 increased the demands on the reliability of the infrastructure on 46 which those services rely. 48 Various techniques have been employed to increase the availability of 49 services deployed on the Internet. This document presents commentary 50 and recommendations for distribution of services using anycast. 52 Table of Contents 54 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 55 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 56 3. Anycast Service Distribution . . . . . . . . . . . . . . . . . 5 57 3.1 General Description . . . . . . . . . . . . . . . . . . . 5 58 3.2 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 5 59 4. Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 60 4.1 Protocol Suitability . . . . . . . . . . . . . . . . . . . 7 61 4.2 Node Placement . . . . . . . . . . . . . . . . . . . . . . 7 62 4.3 Routing Systems . . . . . . . . . . . . . . . . . . . . . 8 63 4.3.1 Anycast within an IGP . . . . . . . . . . . . . . . . 8 64 4.3.2 Anycast within the Global Internet . . . . . . . . . . 9 65 4.4 Routing Considerations . . . . . . . . . . . . . . . . . . 9 66 4.4.1 Signalling Service Availability . . . . . . . . . . . 9 67 4.4.2 Covering Prefix . . . . . . . . . . . . . . . . . . . 10 68 4.4.3 Equal-Cost Paths . . . . . . . . . . . . . . . . . . . 10 69 4.4.4 Route Dampening . . . . . . . . . . . . . . . . . . . 
12 70 4.4.5 Reverse Path Forwarding Checks . . . . . . . . . . . . 13 71 4.4.6 Propagation Scope . . . . . . . . . . . . . . . . . . 13 72 4.4.7 Other People's Networks . . . . . . . . . . . . . . . 14 73 4.4.8 Aggregation Risks . . . . . . . . . . . . . . . . . . 14 74 4.5 Addressing Considerations . . . . . . . . . . . . . . . . 15 75 4.6 Data Synchronisation . . . . . . . . . . . . . . . . . . . 15 76 4.7 Node Autonomy . . . . . . . . . . . . . . . . . . . . . . 16 77 4.8 Multi-Service Nodes . . . . . . . . . . . . . . . . . . . 16 78 4.8.1 Multiple Covering Prefixes . . . . . . . . . . . . . . 17 79 4.8.2 Pessimistic Withdrawal . . . . . . . . . . . . . . . . 17 80 4.8.3 Intra-Node Interior Connectivity . . . . . . . . . . . 17 81 5. Service Management . . . . . . . . . . . . . . . . . . . . . . 19 82 5.1 Monitoring . . . . . . . . . . . . . . . . . . . . . . . . 19 83 6. Security Considerations . . . . . . . . . . . . . . . . . . . 20 84 6.1 Denial-of-Service Attack Mitigation . . . . . . . . . . . 20 85 6.2 Service Compromise . . . . . . . . . . . . . . . . . . . . 20 86 6.3 Service Hijacking . . . . . . . . . . . . . . . . . . . . 20 87 7. Protocol Considerations . . . . . . . . . . . . . . . . . . . 21 88 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22 89 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 23 90 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 24 91 10.1 Normative References . . . . . . . . . . . . . . . . . . . 24 92 10.2 Informative References . . . . . . . . . . . . . . . . . . 24 93 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 26 94 A. Change History . . . . . . . . . . . . . . . . . . . . . . . . 27 95 Intellectual Property and Copyright Statements . . . . . . . . 28 97 1. 
Introduction 99 To distribute a service using anycast, the service is first 100 associated with a stable set of IP addresses, and reachability to 101 those addresses is advertised in a routing system from multiple, 102 independent service nodes. Various techniques for anycast deployment 103 of services are discussed in [RFC1546], [ISC-TN-2003-1] and [ISC-TN- 104 2004-1]. 106 Anycast has in recent years become increasingly popular for adding 107 redundancy to DNS servers to complement the redundancy which the DNS 108 architecture itself already provides. Several root DNS server 109 operators have distributed their servers widely around the Internet, 110 and both resolver and authority servers are commonly distributed 111 within the networks of service providers. Anycast distribution has 112 been used by commercial DNS authority server operators for several 113 years. The use of anycast is not limited to the DNS, although the 114 use of anycast imposes some additional limitations on the nature of 115 the service being distributed, including transaction longevity, 116 transaction state held on servers and data synchronisation 117 capabilities. 119 Although anycast is conceptually simple, its implementation 120 introduces some pitfalls for operation of services. For example, 121 monitoring the availability of the service becomes more difficult; 122 the observed availability changes according to the location of the 123 client within the network, and the client catchment of individual 124 anycast nodes is neither static, nor reliably deterministic. 126 This document will describe the use of anycast for both local scope 127 distribution of services using an Interior Gateway Protocol (IGP) and 128 global distribution using BGP [RFC1771]. Many of the issues for 129 monitoring and data synchronisation are common to both, but 130 deployment issues differ substantially. 132 2. Terminology 134 Service Address: an IP address associated with a particular service 135 (e.g. 
the destination address used by DNS resolvers to reach a 136 particular authority server). 138 Anycast: the practice of making a particular Service Address 139 available in multiple, discrete, autonomous locations, such that 140 datagrams sent to it are routed to one of several available locations. 142 Anycast Node: an internally-connected collection of hosts and routers 143 which together provide service for an anycast Service Address. An 144 Anycast Node might be as simple as a single host participating in 145 a routing protocol with adjacent routers, or it might include a 146 number of hosts connected in some more elaborate fashion; in 147 either case, to the routing system across which the service is 148 being anycast, each Anycast Node presents a unique path to the 149 Service Address. The entire anycast system for the service 150 consists of two or more separate Anycast Nodes. 152 Local-Scope Anycast: reachability information for the anycast Service 153 Address is propagated through a routing system in such a way that 154 a particular anycast node is only visible to a subset of the whole 155 routing system. 157 Local Node: an Anycast Node providing service using a Local-Scope 158 Anycast address. 160 Global-Scope Anycast: reachability information for the anycast 161 Service Address is propagated through a routing system in such a 162 way that a particular anycast node is potentially visible to the 163 whole routing system. 165 Global Node: an Anycast Node providing service using a Global-Scope 166 Anycast address. 168 3. Anycast Service Distribution 170 3.1 General Description 172 Anycast is the name given to the practice of making a Service Address 173 available to a routing system at Anycast Nodes in two or more 174 discrete locations. The service provided by each node is consistent 175 regardless of the particular node chosen by the routing system to 176 handle a particular request. 
178 For services distributed using anycast, there is no inherent 179 requirement for referrals to other servers or name-based service 180 distribution ("round-robin DNS"), although those techniques could be 181 combined with anycast service distribution if an application required 182 it. The routing system decides which node is used for each request, 183 based on the topological design of the routing system and the point 184 in the network at which the request originates. 186 The Anycast Node chosen to service a particular query can be 187 influenced by the traffic engineering capabilities of the routing 188 protocols which make up the routing system. The degree of influence 189 available to the operator of the node depends on the scale of the 190 routing system within which the Service Address is anycast. 192 Load-balancing between Anycast Nodes is typically difficult to 193 achieve (load distribution between nodes is generally unbalanced in 194 terms of request and traffic load). Distribution of load between 195 nodes for the purposes of reliability, and coarse-grained 196 distribution of load for the purposes of making popular services 197 scalable can often be achieved, however. 199 The scale of the routing system through which a service is anycast 200 can vary from a small Interior Gateway Protocol (IGP) connecting a 201 small handful of components, to the Border Gateway Protocol (BGP) 202 [RFC1771] connecting the global Internet, depending on the nature of 203 the service distribution that is required. 205 3.2 Goals 207 A service may be anycast for a variety of reasons. A number of 208 common objectives are: 210 1. Coarse ("unbalanced") distribution of load across nodes, to allow 211 infrastructure to scale to increased numbers of queries and to 212 accommodate transient query peaks; 214 2. Mitigation of non-distributed denial of service attacks by 215 localising damage to single anycast nodes; 217 3. 
Constraint of distributed denial of service attacks or flash 218 crowds to local regions around anycast nodes (perhaps restricting 219 query traffic to local peering links, rather than paid transit 220 circuits); 222 4. Provision of additional information to help identify the location 223 of traffic sources in the case of attack (or query) traffic which 224 incorporates spoofed source addresses. This information is 225 derived from the property of anycast service distribution that 226 the selection of the Anycast Node used to service a 227 particular query may be related to the topological source of the 228 request. 230 5. Improvement of query response time, by reducing the network 231 distance between client and server with the provision of a local 232 Anycast Node. The extent to which query response time is 233 improved depends on the way that nodes are selected for the 234 clients by the routing system. Topological nearness within the 235 routing system does not, in general, correlate to round-trip 236 performance across a network; in some cases response times may 237 see no reduction, and may increase. 239 6. Reduction of a list of servers to a single, distributed address. 240 For example, a large number of authoritative nameservers for a 241 zone may be deployed using a small set of anycast Service 242 Addresses; this approach can increase the accessibility of zone 243 data in the DNS without increasing the size of a referral 244 response from a nameserver authoritative for the parent zone. 246 4. Design 248 4.1 Protocol Suitability 250 When a service is anycast between two or more nodes, the routing 251 system makes the node selection decision on behalf of a client. 
252 Since it is usually a requirement that a single client-server 253 interaction is carried out between a client and the same server node 254 for the duration of the transaction, it follows that the routing 255 system's node selection decision ought to be stable for substantially 256 longer than the expected transaction time, if the service is to be 257 provided reliably. 259 Some services have very short transaction times, and may even be 260 carried out using a single packet request and a single packet reply 261 in some cases (e.g. DNS transactions over UDP transport). Other 262 services involve far longer-lived transactions (e.g. bulk file 263 downloads and audio-visual media streaming). 265 Some anycast deployments have very predictable routing systems, which 266 can remain stable for long periods of time (e.g. anycast within a 267 well-managed and topologically-simple IGP, where node selection 268 changes only occur as a response to node failures). Other 269 deployments have far less predictable characteristics (see 270 Section 4.4.7). 272 The stability of the routing system together with the transaction 273 time of the service should be carefully compared when deciding 274 whether a service is suitable for distribution using anycast. In 275 some cases, for new protocols, it may be practical to split large 276 transactions into an initialisation phase which is handled by anycast 277 servers, and a sustained phase which is provided by non-anycast 278 servers, perhaps chosen during the initialisation phase. 280 This document deliberately avoids prescribing rules as to which 281 protocols or services are suitable for distribution by anycast; to 282 attempt to do so would be presumptuous. 284 4.2 Node Placement 286 Decisions as to where Anycast Nodes should be placed will depend to a 287 large extent on the goals of the service distribution. 
For example: 289 o A DNS recursive resolver service might be distributed within an 290 ISP's network, one Anycast Node per site. 292 o A root DNS server service might be distributed throughout the 293 Internet with nodes located in regions with poor external 294 connectivity, to ensure that the DNS functions adequately within 295 the region during times of external network failure. 297 o An FTP mirror service might include local nodes located at 298 exchange points, so that ISPs connected to that exchange point 299 could download bulk data more cheaply than if they had to use 300 expensive transit circuits. 302 In general, node placement decisions should be made with consideration 303 of likely traffic requirements, the potential for flash crowds or 304 denial-of-service traffic, the stability of the local routing system, 305 and the failure modes with respect to node failure or local routing 306 system failure. 308 4.3 Routing Systems 310 4.3.1 Anycast within an IGP 312 There are several common motivations for the distribution of a 313 Service Address within the scope of an IGP: 315 1. to improve service response times, by hosting a service close to 316 other users of the network; 318 2. to improve service reliability by providing automatic fail-over 319 to backup nodes; and 321 3. to keep service traffic local, to avoid congesting wide-area 322 links. 324 In each case the decisions as to where and how services are 325 provisioned can be made by network engineers without requiring such 326 operational complexities as regional variances in the configuration 327 of client computers, or deliberate DNS incoherence (causing DNS 328 queries to yield different answers depending on where the queries 329 originate). 
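The node selection performed by an IGP can be illustrated with a toy shortest-path computation. This is a sketch only: the router names, topology and link metrics below are invented, and a real IGP (e.g. OSPF or IS-IS) performs the equivalent computation inside the routing protocol itself.

```python
import heapq

# Toy IGP topology: router -> {neighbour: link metric} (invented values).
GRAPH = {
    "r1": {"r2": 1, "r3": 5},
    "r2": {"r1": 1, "r4": 1},
    "r3": {"r1": 5, "r4": 1},
    "r4": {"r2": 1, "r3": 1},
}

# Two Anycast Nodes inject a host route for the same Service Address;
# to the IGP, each appears as a distinct path to that address.
ANYCAST_NODES = {"r2", "r3"}

def nearest_anycast_node(source, nodes=frozenset(ANYCAST_NODES)):
    """Dijkstra from `source`; the routing system delivers packets for
    the Service Address to whichever advertising node is closest."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, router = heapq.heappop(heap)
        if d > dist.get(router, float("inf")):
            continue
        for neigh, metric in GRAPH[router].items():
            nd = d + metric
            if nd < dist.get(neigh, float("inf")):
                dist[neigh] = nd
                heapq.heappush(heap, (nd, neigh))
    # Ties broken by router name, purely for determinism in this sketch.
    return min(sorted(nodes), key=lambda n: dist.get(n, float("inf")))
```

In this invented topology, a client behind r1 reaches the service at r2 (metric 1); if r2 withdraws its host route, the same computation fails over automatically to r3, illustrating motivation 2 above.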
331 When a service is anycast within an IGP, the routing system is 332 typically under the control of the same organisation that is 333 providing the service, and hence the relationship between service 334 transaction characteristics and network stability is likely to be 335 well-understood. This technique is consequently applicable to a 336 larger number of applications than Internet-wide anycast service 337 distribution (see Section 4.1). 339 An IGP will generally have no inherent restriction on the length of 340 prefix that can be introduced to it. There may well therefore be no 341 need to construct a covering prefix for particular Service Addresses; 342 host routes corresponding to the Service Address can instead be 343 introduced to the routing system. See Section 4.4.2 for more 344 discussion of the requirement for a covering prefix. 346 IGPs often feature little or no aggregation of routes, partly due to 347 algorithmic complexities in supporting aggregation. There is little 348 motivation for aggregation in many networks' IGPs in any case, since 349 the amount of routing information carried in the IGP is small enough 350 that scaling concerns in routers do not arise. For discussion of 351 aggregation risks in other routing systems, see Section 4.4.8. 353 By reducing the scope of the IGP to just the hosts providing service 354 (together with one or more gateway routers) this technique can be 355 applied to the construction of server clusters. This application is 356 discussed in some detail in [ISC-TN-2004-1]. 358 4.3.2 Anycast within the Global Internet 360 Service Addresses may be anycast within the global Internet routing 361 system in order to distribute services across the entire network. 362 The principal differences between this application and the IGP-scope 363 distribution discussed in Section 4.3.1 are that: 365 1. the routing system is, in general, controlled by other people; 367 2. 
the routing protocol concerned (BGP), and commonly-accepted 368 practices in its deployment, impose some additional constraints 369 (see Section 4.4). 371 4.4 Routing Considerations 373 4.4.1 Signalling Service Availability 375 When a routing system is provided with reachability information for a 376 Service Address from an individual node, packets addressed to that 377 Service Address will start to arrive at the node. Since it is 378 essential for the node to be ready to accept requests before they 379 start to arrive, a coupling between the routing information and the 380 availability of the service at a particular node is desirable. 382 Where a routing advertisement from a node corresponds to a single 383 Service Address, this coupling might be such that availability of the 384 service triggers the route advertisement, and non-availability of the 385 service triggers a route withdrawal. This can be achieved by running 386 a routing protocol implementation on the same server which provides 387 the distributed service, configured to advertise and withdraw the 388 route in conjunction with the availability 389 (and health) of the software on the host which processes service 390 requests. An example of such an arrangement for a DNS service is 391 included in [ISC-TN-2004-1]. 393 Where a routing advertisement from a node corresponds to two or more 394 Service Addresses, it may not be appropriate to trigger a route 395 withdrawal due to the non-availability of a single service. Another 396 approach is to route requests for the service which is down at one 397 Anycast Node to a different Anycast Node at which the service is up. 398 This approach is discussed in Section 4.8. 400 Rapid advertisement/withdrawal oscillations can cause operational 401 problems, and nodes should be configured such that rapid oscillations 402 are avoided (e.g. by implementing a minimum delay following a 403 withdrawal before the service can be re-advertised). See 
See 404 Section 4.4.4 for a discussion of route oscillations in BGP. 406 4.4.2 Covering Prefix 408 In some routing systems (e.g. the BGP-based routing system of the 409 global Internet) it is not possible, in general, to propagate a host 410 route with confidence that the route will propagate throughout the 411 network. This is a consequence of operational policy, and not a 412 protocol restriction. 414 In such cases it is necessary to propagate a route which covers the 415 Service Address, and which has a sufficiently short prefix that it 416 will not be discarded by commonly-deployed import policies. For IPv4 417 Service Addresses, this is often a 24-bit prefix, but there are other 418 well-documented examples of IPv4 import polices which filter on 419 Regional Internet Registry (RIR) allocation boundaries, and hence 420 some experimentation may be prudent. Corresponding import policies 421 for IPv6 prefixes also exist. See Section 4.5 for more discussion of 422 IPv6 Service Addresses and corresponding anycast routes. 424 The propagation of a single route per service has some associated 425 scaling issues which are discussed in Section 4.4.8. 427 Where multiple Service Addresses are covered by the same covering 428 route, there is no longer a tight coupling between the advertisement 429 of that route and the individual services associated with the covered 430 host routes. The resulting impact on signaling availability of 431 individual services is discussed in Section 4.4.1 and Section 4.8. 433 4.4.3 Equal-Cost Paths 435 Some routing systems support equal-cost paths to the same 436 destination. Where multiple, equal-cost paths exist and lead to 437 different anycast nodes, there is a risk that different request 438 packets associated with a single transaction might be delivered to 439 more than one node. Services provided over TCP [RFC0793] necessarily 440 involve transactions with multiple request packets, due to the TCP 441 setup handshake. 
443 For services which are distributed across the global Internet using 444 BGP, equal-cost paths are normally not a consideration: BGP's exit 445 selection algorithm usually selects a single, consistent exit for a 446 single destination regardless of whether multiple candidate paths 447 exist. Implementations of BGP exist that support multi-path exit 448 selection, however. 450 Equal cost paths are commonly supported in IGPs. Multi-node 451 selection for a single transaction can be avoided in most cases by 452 careful consideration of IGP link metrics, or by applying equal-cost 453 multi-path (ECMP) selection algorithms which cause a single node to 454 be selected for a single multi-packet transaction. For an example of 455 the use of hash-based ECMP selection in anycast service distribution, 456 see [ISC-TN-2004-1]. 458 Other ECMP selection algorithms are commonly available, including 459 those in which packets from the same flow are not guaranteed to be 460 routed towards the same destination. ECMP algorithms which select a 461 route on a per-packet basis rather than per-flow are commonly 462 referred to as performing "Per Packet Load Balancing" (PPLB). 464 With respect to anycast service distribution, some uses of PPLB may 465 cause different packets from a single multi-packet transaction sent 466 by a client to be delivered to different anycast nodes, effectively 467 making the anycast service unavailable. Whether this affects 468 specific anycast services will depend on how and where anycast nodes 469 are deployed within the routing system, and on where the PPLB is 470 being performed: 472 1. PPLB across multiple, parallel links between the same pair of 473 routers should cause no node selection problems; 475 2. PPLB across diverse paths within a single autonomous system (AS), 476 where the paths converge to a single exit as they leave the AS, 477 should cause no node selection problems; 479 3. 
PPLB across links to different neighbour ASes where the 480 neighbour ASes have selected different nodes for a particular 481 anycast destination will, in general, cause request packets to be 482 distributed across multiple anycast nodes. This will have the 483 effect that the anycast service is unavailable to clients 484 downstream of the router performing PPLB. 486 The uses of PPLB which have the potential to interact badly with 487 anycast service distribution can also cause persistent packet 488 reordering. A network path that persistently reorders segments will 489 degrade the performance of traffic carried by TCP [Allman2000]. TCP, 490 according to several documented measurements, accounts for the bulk 491 of traffic carried on the Internet ([McCreary2000], [Fomenkov2004]). 492 Consequently, in many cases it is reasonable to consider networks 493 making such use of PPLB to be pathological. 495 4.4.4 Route Dampening 497 Frequent advertisements and withdrawals of individual prefixes in BGP 498 are known as flaps. Rapid flapping can lead to CPU exhaustion on 499 routers quite remote from the source of the instability, and for this 500 reason rapid route oscillations are frequently "dampened", as 501 described in [RFC2439]. 503 A dampened path will be suppressed by routers for an interval which 504 increases according to the frequency of the observed oscillation; a 505 suppressed path will not propagate. Hence a single router can 506 prevent the propagation of a flapping prefix to the rest of an 507 autonomous system, affording other routers in the network protection 508 from the instability. 510 Some implementations of flap dampening penalise oscillating 511 advertisements based on the observed AS_PATH, and not on the NLRI. 512 For this reason, network instability which leads to route flapping 513 from a single anycast node ought not to cause advertisements from 514 other nodes (which have different AS_PATH attributes) to be dampened. 
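The penalty-based suppression of [RFC2439] can be sketched as a decaying counter. The figures below (penalty per flap, suppress and reuse thresholds, half-life) are illustrative only, loosely resembling common vendor defaults; real implementations differ in detail.

```python
import math

class FlapDamping:
    """Toy RFC 2439-style penalty tracker for one path (illustrative
    parameters, not those of any specific router implementation)."""

    def __init__(self, penalty_per_flap=1000, suppress_limit=2000,
                 reuse_limit=750, half_life=900.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress_limit = suppress_limit
        self.reuse_limit = reuse_limit
        self.half_life = half_life  # seconds for the penalty to halve
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Exponential decay of the accumulated penalty since last update.
        elapsed = now - self.last_update
        self.penalty *= math.exp(-math.log(2) * elapsed / self.half_life)
        self.last_update = now

    def flap(self, now):
        """An advertisement/withdrawal oscillation observed at time `now`."""
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty >= self.suppress_limit:
            self.suppressed = True

    def usable(self, now):
        """A suppressed path does not propagate until the penalty has
        decayed below the reuse limit."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False
        return not self.suppressed
```

With these figures a single flap leaves the path usable, three flaps in thirty seconds suppress it, and roughly fifty minutes of stability are needed before the penalty decays below the reuse limit.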
516 To limit the opportunity of such implementations to penalise 517 advertisements originating from different Anycast Nodes in response 518 to oscillations from just a single node, care should be taken to 519 arrange that the AS_PATH attributes on routes from different nodes 520 are as diverse as possible. For example, Anycast Nodes should use 521 the same origin AS for their advertisements, but might have different 522 upstream ASes. 524 Where different implementations of flap dampening are prevalent, 525 individual nodes' instability may result in stable nodes becoming 526 unavailable. In mitigation, the following measures may be useful: 528 1. Judicious deployment of Local Nodes in combination with 529 especially stable Global Nodes (with high inter-AS path splay, 530 redundant hardware, power, etc) may help limit oscillation 531 problems to the Local Nodes' limited regions of influence; 533 2. Aggressive flap-dampening of the service prefix close to the 534 origin (e.g. within an Anycast Node, or in adjacent ASes of each 535 Anycast Node) may also help reduce the opportunity of remote ASes 536 to see oscillations at all. 538 4.4.5 Reverse Path Forwarding Checks 540 Reverse Path Forwarding (RPF) checks, first described in [RFC2267], 541 are commonly deployed as part of ingress interface packet filters on 542 routers in the Internet in order to deny packets whose source 543 addresses are spoofed (see also RFC 2827 [RFC2827]). Deployed 544 implementations of RPF make several modes of operation available 545 (e.g. "loose" and "strict"). 547 Some modes of RPF can cause non-spoofed packets to be denied when 548 they originate from a multi-homed site, since selected paths might 549 legitimately not correspond with the ingress interface of non-spoofed 550 packets from the multi-homed site. This issue is discussed in 551 [RFC3704]. 
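The difference between the two modes can be sketched with a toy forwarding table. The FIB contents and interface names below are invented (using documentation prefixes); this is an illustration of the checks, not of any vendor's implementation.

```python
import ipaddress

# Toy FIB: prefix -> interface the best route points out of (invented).
FIB = {
    ipaddress.ip_network("192.0.2.0/24"): "if0",
    ipaddress.ip_network("198.51.100.0/24"): "if1",
}

def _best_route(src):
    """Longest-prefix match on the source address, or None if unrouted."""
    matches = [(net, ifc) for net, ifc in FIB.items()
               if ipaddress.ip_address(src) in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)

def rpf_accept(src, ingress, mode="strict"):
    """Loose mode: accept if any route back to the source exists.
    Strict mode: additionally require that route to point back out of
    the interface the packet arrived on."""
    route = _best_route(src)
    if route is None:
        return False          # no route at all: looks spoofed in both modes
    if mode == "loose":
        return True
    return route[1] == ingress
```

Note how strict mode discards legitimate traffic from a multi-homed (or anycast) source whose best return path uses a different interface, while loose mode accepts it; this is the asymmetry [RFC3704] warns about.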
553 A collection of anycast nodes deployed across the Internet is largely 554 indistinguishable from a distributed, multi-homed site to the routing 555 system, and hence this risk also exists for anycast nodes, even if 556 individual nodes are not multi-homed. Care should be taken to ensure 557 that each anycast node is treated as a multi-homed network, and that 558 the corresponding recommendations in [RFC3704] with respect to RPF 559 checks are heeded. 561 4.4.6 Propagation Scope 563 In the context of Anycast service distribution across the global 564 Internet, Global Nodes are those which are capable of providing 565 service to clients anywhere in the network; reachability information 566 for the service is propagated globally, without restriction, by 567 advertising the routes covering the Service Addresses for global 568 transit to one or more providers. 570 More than one Global Node can exist for a single service (and indeed 571 this is often the case, for reasons of redundancy and load-sharing). 573 In contrast, it is sometimes desirable to deploy an Anycast Node 574 which only provides services to a local catchment of autonomous 575 systems, and which is deliberately not available to the entire 576 Internet; such nodes are referred to in this document as Local Nodes. 577 An example of circumstances in which a Local Node may be appropriate 578 is a node designed to serve a region with rich internal connectivity 579 but unreliable, congested or expensive access to the rest of the 580 Internet. 582 Local Nodes advertise covering routes for Service Addresses in such a 583 way that their propagation is restricted. This might be done using 584 well-known community string attributes such as NO_EXPORT [RFC1997] or 585 NOPEER [RFC3765], or by arranging with peers to apply a conventional 586 "peering" import policy instead of a "transit" import policy, or some 587 suitable combination of measures. 
589 Advertising reachability to Service Addresses from Local Nodes should 590 ideally be done using a routing policy that requires the presence of 591 explicit attributes for propagation, rather than relying on implicit 592 (default) policy. Inadvertent propagation of a route beyond its 593 intended horizon can result in capacity problems for Local Nodes 594 which might degrade service performance network-wide. 596 4.4.7 Other People's Networks 598 When Anycast services are deployed across networks operated by 599 others, their reachability is dependent on routing policies and 600 topology changes (planned and unplanned) which are unpredictable and 601 sometimes difficult to identify. Since the routing system may 602 include networks operated by multiple, unrelated organisations, the 603 possibility of unforeseen interactions resulting from the 604 combinations of unrelated changes also exists. 606 The stability and predictability of such a routing system should be 607 taken into consideration when assessing the suitability of anycast as 608 a distribution strategy for particular services and protocols (see 609 also Section 4.1). 611 By way of mitigation, routing policies used by Anycast Nodes across 612 such routing systems should be conservative, individual nodes' 613 internal and external/connecting infrastructure should be scaled to 614 support loads far in excess of the average, and the service should be 615 monitored proactively from many points in order to avoid unpleasant 616 surprises (see Section 5.1). 618 4.4.8 Aggregation Risks 620 The propagation of a single route for each anycast service does not 621 scale well for routing systems in which the load of routing 622 information which must be carried is a concern, and where there are 623 potentially many services to distribute.
For example, an autonomous 624 system which provides services to the Internet with N Service 625 Addresses covered by a single exported route would need to advertise 626 (N+1) routes if each of those services were to be distributed using 627 anycast. 629 The common practice of applying minimum prefix-length filters in 630 import policies on the Internet (see Section 4.4.2) means that for a 631 route covering a Service Address to be usefully propagated, the prefix 632 length must be substantially less than that required to advertise 633 just the host route. Widespread advertisement of short prefixes for 634 individual services hence also has a negative impact on address 635 conservation. 637 Both of these issues can be mitigated to some extent by the use of a 638 single covering prefix to accommodate multiple Service Addresses, as 639 described in Section 4.8. This implies a decoupling of the route 640 advertisement from individual service availability (see 641 Section 4.4.1), however, with attendant risks to the stability of the 642 service as a whole (see Section 4.7). 644 In general, the scaling problems described here prevent anycast from 645 being a useful, general approach for service distribution on the 646 global Internet. It remains, however, a useful technique for 647 distributing a limited number of Internet-critical services, as well 648 as in smaller networks where the aggregation concerns discussed here 649 do not apply. 651 4.5 Addressing Considerations 653 Service Addresses should be unique within the routing system that 654 connects all Anycast Nodes to all possible clients of the service. 655 Service Addresses must also be chosen so that corresponding routes 656 will be allowed to propagate within that routing system.
658 For an IPv4-numbered service deployed across the Internet, for 659 example, an address might be chosen from a block where the minimum 660 RIR allocation size is 24 bits, and reachability to that address 661 might be provided by originating the covering 24-bit prefix. 663 For an IPv4-numbered service deployed within a private network, a 664 locally-unused [RFC1918] address might be chosen, and reachability to 665 that address might be signalled using a (32-bit) host route. 667 For IPv6-numbered services, Anycast Addresses are not scoped 668 differently from unicast addresses. As such, the guidelines presented 669 for IPv4 with respect to address suitability also apply to IPv6. Note 670 that historical prohibitions on anycast distribution of services over 671 IPv6 have been removed from the IPv6 addressing specification in 672 [I-D.ietf-ipv6-addr-arch-v4]. 674 4.6 Data Synchronisation 676 Although some services have been deployed in localised form (such 677 that clients from particular regions are presented with regionally- 678 relevant content) many services have the property that responses to 679 client requests should be consistent, regardless of where the request 680 originates. For a service distributed using anycast, that implies 681 that different Anycast Nodes must operate in a consistent manner and, 682 where that consistent behaviour is based on a data set, that the data 683 concerned be synchronised between nodes. 685 The mechanism by which data is synchronised depends on the nature of 686 the service; examples are zone transfers for authoritative DNS 687 servers and rsync for FTP archives. In general, the synchronisation 688 of data between Anycast Nodes will involve transactions between non- 689 anycast addresses. 691 Data synchronisation across public networks should be carried out 692 with appropriate authentication and encryption.
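One simple operational check on synchronisation compares digests of the data set as retrieved from each node over its unicast address. The sketch below is illustrative only; the node names are invented, and the snapshot dictionary stands in for a deployment-specific retrieval step (e.g. zone serials obtained by query, or checksum files fetched over an authenticated channel):

```python
import hashlib

def digest(data: bytes) -> str:
    """Content digest used to compare the data set served by nodes."""
    return hashlib.sha256(data).hexdigest()

# Stand-in for "fetch the served data set from each node's unicast
# (management) address"; in practice this might be an AXFR or a
# checksum file retrieved over SSH.
snapshots = {
    "node-ams": b"zone-serial-2005102101",
    "node-sfo": b"zone-serial-2005102101",
    "node-hkg": b"zone-serial-2005102001",   # lagging one update behind
}

digests = {node: digest(data) for node, data in snapshots.items()}
reference = digests["node-ams"]
out_of_sync = sorted(n for n, d in digests.items() if d != reference)

print(out_of_sync)   # ['node-hkg']
```

A node flagged as out of sync might simply be mid-transfer; a real monitor would allow for normal propagation delay (such as the timed zone-transfer lag mentioned in Section 4.7) before raising an alarm.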
694 4.7 Node Autonomy 696 For an Anycast deployment whose goals include improved reliability 697 through redundancy, it is important to minimise the opportunity for a 698 single defect to compromise many (or all) nodes, or for the failure 699 of one node to trigger a cascading failure that brings down 700 successive nodes until the service as a whole is defeated. 702 Co-dependencies are avoided by making each node as autonomous and 703 self-sufficient as possible. The degree to which nodes can survive 704 failure elsewhere depends on the nature of the service being 705 delivered, but for services which accommodate disconnected operation 706 (e.g. the timed propagation of changes between master and slave 707 servers in the DNS) a high degree of autonomy can be achieved. 709 The possibility of cascading failure due to load can also be reduced 710 by the deployment of both Global and Local Nodes for a single 711 service, since the effective fail-over path of traffic is, in 712 general, from Local Node to Global Node; traffic that might sink one 713 Local Node is unlikely to sink all Local Nodes, except in the most 714 degenerate cases. 716 The chance of cascading failure due to a software defect in an 717 operating system or server can be reduced in many cases by deploying 718 nodes running different implementations of operating system, server 719 software, routing protocol software, etc., such that a defect which 720 appears in a single component does not affect the whole system. 722 4.8 Multi-Service Nodes 724 For a service distributed across a routing system where covering 725 prefixes are required to announce reachability to a single Service 726 Address (see Section 4.4.2), special consideration is required in the 727 case where multiple services need to be distributed across a single 728 set of nodes.
This results from the requirement to signal 729 availability of individual services to the routing system so that 730 requests for service are not received by nodes which are not able to 731 process them (see Section 4.4.1). 733 Several approaches are described in the following sections. 735 4.8.1 Multiple Covering Prefixes 737 Each Service Address is chosen such that only one Service Address is 738 covered by each advertised prefix. Advertisement and withdrawal of a 739 single covering prefix can be tightly coupled to the availability of 740 the single associated service. 742 This is the most straightforward approach. However, since it makes 743 very poor utilisation of globally-unique addresses, it is only 744 suitable for use for a small number of critical, infrastructural 745 services such as root DNS servers. General Internet-wide deployment 746 of services using this approach will not scale. 748 4.8.2 Pessimistic Withdrawal 750 Multiple Service Addresses are chosen such that they are covered by a 751 single prefix. Advertisement and withdrawal of the single covering 752 prefix is coupled to the availability of all associated services; if 753 any individual service becomes unavailable, the covering prefix is 754 withdrawn. 756 The coupling between service availability and advertisement of the 757 covering prefix is complicated by the requirement that all Service 758 Addresses must be available -- the announcement needs to be triggered 759 by the presence of all component routes, and not just a single 760 covered route. 762 The fact that a single malfunctioning service causes all deployed 763 services in a node to be taken off-line may make this approach 764 unsuitable for many applications. 766 4.8.3 Intra-Node Interior Connectivity 768 Multiple Service Addresses are chosen such that they are covered by a 769 single prefix. Advertisement and withdrawal of the single covering 770 prefix is coupled to the availability of any one service.
Nodes have 771 interior connectivity, e.g. using tunnels, and host routes for 772 service addresses are distributed using an IGP which extends to 773 include routers at all nodes. 775 In the event that a service is unavailable at one node, but available 776 at other nodes, a request may be routed over the interior network 777 from the receiving node towards some other node for processing. 779 In the event that some local services in a node are down and the node 780 is disconnected from other nodes, continued advertisement of the 781 covering prefix might cause requests to become black-holed. 783 This approach allows reasonable address utilisation of the netblock 784 covered by the announced prefix, at the expense of reduced autonomy 785 of individual nodes; the IGP in which all nodes participate can be 786 viewed as a single point of failure. 788 5. Service Management 790 5.1 Monitoring 792 Monitoring a service which is distributed is more complex than 793 monitoring a non-distributed service, since the observed accuracy and 794 availability of the service is, in general, different when viewed 795 from clients attached to different parts of the network. When a 796 problem is identified, it is also not always obvious which node 797 served the request, and hence which node is malfunctioning. 799 It is recommended that distributed services are monitored from probes 800 distributed representatively across the routing system, and, where 801 possible, the identity of the node answering individual requests is 802 recorded along with performance and availability statistics. The 803 RIPE NCC DNSMON service [1] is an example of such monitoring for the 804 DNS. 806 Monitoring the routing system (from a variety of places, in the case 807 of routing systems where perspective is relevant) can also provide 808 useful diagnostics for troubleshooting service availability. 
This 809 can be achieved using dedicated probes, or public route measurement 810 facilities on the Internet such as the RIPE NCC Routing Information 811 Service [2] and the University of Oregon Route Views Project [3]. 813 Monitoring the health of the component devices in an Anycast 814 deployment of a service (hosts, routers, etc) is straightforward, and 815 can be achieved using the same tools and techniques commonly used to 816 manage other network-connected infrastructure, without the additional 817 complexity involved in monitoring Anycast service addresses. 819 6. Security Considerations 821 6.1 Denial-of-Service Attack Mitigation 823 This document describes mechanisms for deploying services on the 824 Internet which can be used to mitigate vulnerability to attack: 826 1. An Anycast Node can act as a sink for attack traffic originated 827 within its sphere of influence, preventing nodes elsewhere from 828 having to deal with that traffic; 830 2. The task of dealing with attack traffic whose sources are widely 831 distributed is itself distributed across all the nodes which 832 contribute to the service. Since the problem of sorting between 833 legitimate and attack traffic is distributed, this may lead to 834 better scaling properties than a service which is not 835 distributed. 837 6.2 Service Compromise 839 The distribution of a service across several (or many) autonomous 840 nodes imposes increased monitoring as well as an increased systems 841 administration burden on the operator of the service which might 842 reduce the effectiveness of host and router security. 844 The potential benefit of being able to take compromised servers off- 845 line without compromising the service can only be realised if there 846 are working procedures to do so quickly and reliably. 
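Such procedures can be automated by coupling a node's route advertisement to a service health check, in the spirit of Section 4.4.1. The sketch below models only the decision logic; the health probe and the announce/withdraw mechanisms are deployment-specific (e.g. a local service test plus control of the node's routing daemon) and are not shown:

```python
def reconcile(healthy: bool, advertised: bool):
    """Return the routing action needed to make a node's
    advertisement of the Service Address match service health:
    'announce', 'withdraw', or None if nothing needs to change.

    Withdrawing on failure takes a broken or compromised node out of
    rotation, leaving its clients to be served by other nodes."""
    if healthy and not advertised:
        return "announce"
    if not healthy and advertised:
        return "withdraw"
    return None

print(reconcile(healthy=True,  advertised=False))  # announce
print(reconcile(healthy=False, advertised=True))   # withdraw
print(reconcile(healthy=True,  advertised=True))   # None
```

In practice such a loop should damp its own behaviour (e.g. require several consecutive probe failures before withdrawing) so that a flapping service does not itself trigger the route oscillation problems described in Section 4.4.4.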
848 6.3 Service Hijacking 850 It is possible that an unauthorised party might advertise routes 851 corresponding to anycast Service Addresses across a network, and by 852 doing so capture legitimate request traffic or process requests in a 853 manner which compromises the service (or both). A rogue Anycast Node 854 might be difficult to detect by clients or by the operator of the 855 service. 857 The risk of service hijacking by manipulation of the routing system 858 exists regardless of whether a service is distributed using anycast. 859 However, the fact that legitimate Anycast Nodes are observable in the 860 routing system may make it more difficult to detect rogue nodes. 862 7. Protocol Considerations 864 This document does not impose any protocol considerations. 866 8. IANA Considerations 868 This document requests no action from IANA. 870 9. Acknowledgements 872 The authors gratefully acknowledge the contributions from various 873 participants of the grow working group, and in particular Geoff 874 Huston, Pekka Savola, Danny McPherson, Ben Black and Alan Barrett. 876 This work was supported by the US National Science Foundation 877 (research grant SCI-0427144) and DNS-OARC. 879 10. References 881 10.1 Normative References 883 [I-D.ietf-ipv6-addr-arch-v4] 884 Hinden, R. and S. Deering, "IP Version 6 Addressing 885 Architecture", draft-ietf-ipv6-addr-arch-v4-04 (work in 886 progress), May 2005. 888 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, 889 RFC 793, September 1981. 891 [RFC1771] Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 892 (BGP-4)", RFC 1771, March 1995. 894 [RFC1918] Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and 895 E. Lear, "Address Allocation for Private Internets", 896 BCP 5, RFC 1918, February 1996. 898 [RFC1997] Chandrasekeran, R., Traina, P., and T. Li, "BGP 899 Communities Attribute", RFC 1997, August 1996. 901 [RFC2439] Villamizar, C., Chandra, R., and R.
Govindan, "BGP Route 902 Flap Damping", RFC 2439, November 1998. 904 [RFC2827] Ferguson, P. and D. Senie, "Network Ingress Filtering: 905 Defeating Denial of Service Attacks which employ IP Source 906 Address Spoofing", BCP 38, RFC 2827, May 2000. 908 [RFC3704] Baker, F. and P. Savola, "Ingress Filtering for Multihomed 909 Networks", BCP 84, RFC 3704, March 2004. 911 10.2 Informative References 913 [Allman2000] 914 Allman, M. and E. Blanton, "On Making TCP More Robust to 915 Packet Reordering", January 2000, 916 . 918 [Fomenkov2004] 919 Fomenkov, M., Keys, K., Moore, D., and k. claffy, 920 "Longitudinal Study of Internet Traffic from 1999-2003", 921 January 2004, . 924 [ISC-TN-2003-1] 925 Abley, J., "Hierarchical Anycast for Global Service 926 Distribution", March 2003, 927 . 929 [ISC-TN-2004-1] 930 Abley, J., "A Software Approach to Distributing Requests 931 for DNS Service using GNU Zebra, ISC BIND 9 and FreeBSD", 932 March 2004, 933 . 935 [McCreary2000] 936 McCreary, S. and k. claffy, "Trends in Wide Area IP 937 Traffic Patterns: A View from Ames Internet Exchange", 938 September 2000, . 941 [RFC1546] Partridge, C., Mendez, T., and W. Milliken, "Host 942 Anycasting Service", RFC 1546, November 1993. 944 [RFC2267] Ferguson, P. and D. Senie, "Network Ingress Filtering: 945 Defeating Denial of Service Attacks which employ IP Source 946 Address Spoofing", RFC 2267, January 1998. 948 [RFC3765] Huston, G., "NOPEER Community for Border Gateway Protocol 949 (BGP) Route Scope Control", RFC 3765, April 2004. 951 URIs 953 [1] 955 [2] 957 [3] 959 Authors' Addresses 961 Joe Abley 962 Internet Systems Consortium, Inc. 963 950 Charter Street 964 Redwood City, CA 94063 965 USA 967 Phone: +1 650 423 1317 968 Email: jabley@isc.org 969 URI: http://www.isc.org/ 971 Kurt Erik Lindqvist 972 Netnod Internet Exchange 973 Bellmansgatan 30 974 118 47 Stockholm 975 Sweden 977 Email: kurtis@kurtis.pp.se 978 URI: http://www.netnod.se/ 980 Appendix A. 
Change History 982 This section should be removed before publication. 984 draft-kurtis-anycast-bcp-00: Initial draft. Discussed at IETF 61 in 985 the grow meeting and adopted as a working group document shortly 986 afterwards. 988 draft-ietf-grow-anycast-00: Missing and empty sections completed; 989 some structural reorganisation; general wordsmithing. Document 990 discussed at IETF 62. 992 draft-ietf-grow-anycast-01: This appendix added; acknowledgements 993 section added; commentary on RFC3513 prohibition of anycast on 994 hosts removed; minor sentence re-casting and related jiggery- 995 pokery. This revision published for discussion at IETF 63. 997 draft-ietf-grow-anycast-02: Normative reference to [I-D.ietf-ipv6- 998 addr-arch-v4] added (in the RFC editor's queue at the time of 999 writing; reference should be updated to an RFC number when 1000 available). Added commentary on per-packet load balancing. 1002 Intellectual Property Statement 1004 The IETF takes no position regarding the validity or scope of any 1005 Intellectual Property Rights or other rights that might be claimed to 1006 pertain to the implementation or use of the technology described in 1007 this document or the extent to which any license under such rights 1008 might or might not be available; nor does it represent that it has 1009 made any independent effort to identify any such rights. Information 1010 on the procedures with respect to rights in RFC documents can be 1011 found in BCP 78 and BCP 79. 1013 Copies of IPR disclosures made to the IETF Secretariat and any 1014 assurances of licenses to be made available, or the result of an 1015 attempt made to obtain a general license or permission for the use of 1016 such proprietary rights by implementers or users of this 1017 specification can be obtained from the IETF on-line IPR repository at 1018 http://www.ietf.org/ipr. 
1020 The IETF invites any interested party to bring to its attention any 1021 copyrights, patents or patent applications, or other proprietary 1022 rights that may cover technology that may be required to implement 1023 this standard. Please address the information to the IETF at 1024 ietf-ipr@ietf.org. 1026 Disclaimer of Validity 1028 This document and the information contained herein are provided on an 1029 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 1030 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 1031 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 1032 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 1033 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 1034 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 1036 Copyright Statement 1038 Copyright (C) The Internet Society (2005). This document is subject 1039 to the rights, licenses and restrictions contained in BCP 78, and 1040 except as set forth therein, the authors retain all their rights. 1042 Acknowledgment 1044 Funding for the RFC Editor function is currently provided by the 1045 Internet Society.