2 Network Working Group J. Abley 3 Internet-Draft ISC 4 Expires: April 24, 2006 K. Lindqvist 5 Netnod Internet Exchange 6 October 21, 2005 8 Operation of Anycast Services 9 draft-ietf-grow-anycast-02 11 Status of this Memo 13 By submitting this Internet-Draft, each author represents that any 14 applicable patent or other IPR claims of which he or she is aware 15 have been or will be disclosed, and any of which he or she becomes 16 aware will be disclosed, in accordance with Section 6 of BCP 79. 18 Internet-Drafts are working documents of the Internet Engineering 19 Task Force (IETF), its areas, and its working groups. Note that 20 other groups may also distribute working documents as Internet- 21 Drafts. 23 Internet-Drafts are draft documents valid for a maximum of six months 24 and may be updated, replaced, or obsoleted by other documents at any 25 time. It is inappropriate to use Internet-Drafts as reference 26 material or to cite them other than as "work in progress." 
28 The list of current Internet-Drafts can be accessed at 29 http://www.ietf.org/ietf/1id-abstracts.txt. 31 The list of Internet-Draft Shadow Directories can be accessed at 32 http://www.ietf.org/shadow.html. 34 This Internet-Draft will expire on April 24, 2006. 36 Copyright Notice 38 Copyright (C) The Internet Society (2005). 40 Abstract 42 As the Internet has grown, and as systems and networked services 43 within enterprises have become more pervasive, many services with 44 high availability requirements have emerged. These requirements have 45 increased the demands on the reliability of the infrastructure on 46 which those services rely. 48 Various techniques have been employed to increase the availability of 49 services deployed on the Internet. This document presents commentary 50 and recommendations for distribution of services using anycast. 52 Table of Contents 54 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 55 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 56 3. Anycast Service Distribution . . . . . . . . . . . . . . . . . 5 57 3.1 General Description . . . . . . . . . . . . . . . . . . . 5 58 3.2 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . 5 59 4. Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 60 4.1 Protocol Suitability . . . . . . . . . . . . . . . . . . . 7 61 4.2 Node Placement . . . . . . . . . . . . . . . . . . . . . . 7 62 4.3 Routing Systems . . . . . . . . . . . . . . . . . . . . . 8 63 4.3.1 Anycast within an IGP . . . . . . . . . . . . . . . . 8 64 4.3.2 Anycast within the Global Internet . . . . . . . . . . 9 65 4.4 Routing Considerations . . . . . . . . . . . . . . . . . . 9 66 4.4.1 Signalling Service Availability . . . . . . . . . . . 9 67 4.4.2 Covering Prefix . . . . . . . . . . . . . . . . . . . 10 68 4.4.3 Equal-Cost Paths . . . . . . . . . . . . . . . . . . . 10 69 4.4.4 Route Dampening . . . . . . . . . . . . . . . . . . . 
12 70 4.4.5 Reverse Path Forwarding Checks . . . . . . . . . . . . 13 71 4.4.6 Propagation Scope . . . . . . . . . . . . . . . . . . 13 72 4.4.7 Other People's Networks . . . . . . . . . . . . . . . 14 73 4.4.8 Aggregation Risks . . . . . . . . . . . . . . . . . . 14 74 4.5 Addressing Considerations . . . . . . . . . . . . . . . . 15 75 4.6 Data Synchronisation . . . . . . . . . . . . . . . . . . . 15 76 4.7 Node Autonomy . . . . . . . . . . . . . . . . . . . . . . 16 77 4.8 Multi-Service Nodes . . . . . . . . . . . . . . . . . . . 16 78 4.8.1 Multiple Covering Prefixes . . . . . . . . . . . . . . 17 79 4.8.2 Pessimistic Withdrawal . . . . . . . . . . . . . . . . 17 80 4.8.3 Intra-Node Interior Connectivity . . . . . . . . . . . 17 81 5. Service Management . . . . . . . . . . . . . . . . . . . . . . 19 82 5.1 Monitoring . . . . . . . . . . . . . . . . . . . . . . . . 19 83 6. Security Considerations . . . . . . . . . . . . . . . . . . . 20 84 6.1 Denial-of-Service Attack Mitigation . . . . . . . . . . . 20 85 6.2 Service Compromise . . . . . . . . . . . . . . . . . . . . 20 86 6.3 Service Hijacking . . . . . . . . . . . . . . . . . . . . 20 87 7. Protocol Considerations . . . . . . . . . . . . . . . . . . . 21 88 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22 89 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 23 90 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 24 91 10.1 Normative References . . . . . . . . . . . . . . . . . . . 24 92 10.2 Informative References . . . . . . . . . . . . . . . . . . 24 93 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . 26 94 A. Change History . . . . . . . . . . . . . . . . . . . . . . . . 27 95 Intellectual Property and Copyright Statements . . . . . . . . 28 97 1. 
Introduction 99 To distribute a service using anycast, the service is first 100 associated with a stable set of IP addresses, and reachability to 101 those addresses is advertised in a routing system from multiple, 102 independent service nodes. Various techniques for anycast deployment 103 of services are discussed in [RFC1546], [ISC-TN-2003-1] and [ISC-TN- 104 2004-1]. 106 Anycast has in recent years become increasingly popular for adding 107 redundancy to DNS servers to complement the redundancy which the DNS 108 architecture itself already provides. Several root DNS server 109 operators have distributed their servers widely around the Internet, 110 and both resolver and authority servers are commonly distributed 111 within the networks of service providers. Anycast distribution has 112 been used by commercial DNS authority server operators for several 113 years. The use of anycast is not limited to the DNS, although the 114 use of anycast imposes some additional limitations on the nature of 115 the service being distributed, including transaction longevity, 116 transaction state held on servers and data synchronisation 117 capabilities. 119 Although anycast is conceptually simple, its implementation 120 introduces some pitfalls for operation of services. For example, 121 monitoring the availability of the service becomes more difficult; 122 the observed availability changes according to the location of the 123 client within the network, and the client catchment of individual 124 anycast nodes is neither static, nor reliably deterministic. 126 This document will describe the use of anycast for both local scope 127 distribution of services using an Interior Gateway Protocol (IGP) and 128 global distribution using BGP [RFC1771]. Many of the issues for 129 monitoring and data synchronisation are common to both, but 130 deployment issues differ substantially. 132 2. Terminology 134 Service Address: an IP address associated with a particular service 135 (e.g. 
the destination address used by DNS resolvers to reach a 136 particular authority server). 138 Anycast: the practice of making a particular Service Address 139 available in multiple, discrete, autonomous locations, such that 140 datagrams sent to it are routed to one of several available locations. 142 Anycast Node: an internally-connected collection of hosts and routers 143 which together provide service for an anycast Service Address. An 144 Anycast Node might be as simple as a single host participating in 145 a routing protocol with adjacent routers, or it might include a 146 number of hosts connected in some more elaborate fashion; in 147 either case, to the routing system across which the service is 148 being anycast, each Anycast Node presents a unique path to the 149 Service Address. The entire anycast system for the service 150 consists of two or more separate Anycast Nodes. 152 Local-Scope Anycast: reachability information for the anycast Service 153 Address is propagated through a routing system in such a way that 154 a particular anycast node is only visible to a subset of the whole 155 routing system. 157 Local Node: an Anycast Node providing service using a Local-Scope 158 Anycast address. 160 Global-Scope Anycast: reachability information for the anycast 161 Service Address is propagated through a routing system in such a 162 way that a particular anycast node is potentially visible to the 163 whole routing system. 165 Global Node: an Anycast Node providing service using a Global-Scope 166 Anycast address. 168 3. Anycast Service Distribution 170 3.1 General Description 172 Anycast is the name given to the practice of making a Service Address 173 available to a routing system at Anycast Nodes in two or more 174 discrete locations. The service provided by each node is consistent 175 regardless of the particular node chosen by the routing system to 176 handle a particular request. 
178 For services distributed using anycast, there is no inherent 179 requirement for referrals to other servers or name-based service 180 distribution ("round-robin DNS"), although those techniques could be 181 combined with anycast service distribution if an application required 182 it. The routing system decides which node is used for each request, 183 based on the topological design of the routing system and the point 184 in the network at which the request originates. 186 The Anycast Node chosen to service a particular query can be 187 influenced by the traffic engineering capabilities of the routing 188 protocols which make up the routing system. The degree of influence 189 available to the operator of the node depends on the scale of the 190 routing system within which the Service Address is anycast. 192 Load-balancing between Anycast Nodes is typically difficult to 193 achieve (load distribution between nodes is generally unbalanced in 194 terms of request and traffic load). Distribution of load between 195 nodes for the purposes of reliability, and coarse-grained 196 distribution of load for the purposes of making popular services 197 scalable can often be achieved, however. 199 The scale of the routing system through which a service is anycast 200 can vary from a small Interior Gateway Protocol (IGP) connecting a 201 small handful of components, to the Border Gateway Protocol (BGP) 202 [RFC1771] connecting the global Internet, depending on the nature of 203 the service distribution that is required. 205 3.2 Goals 207 A service may be anycast for a variety of reasons. A number of 208 common objectives are: 210 1. Coarse ("unbalanced") distribution of load across nodes, to allow 211 infrastructure to scale to increased numbers of queries and to 212 accommodate transient query peaks; 214 2. Mitigation of non-distributed denial of service attacks by 215 localising damage to single anycast nodes; 217 3. 
Constraint of distributed denial of service attacks or flash 218 crowds to local regions around anycast nodes (perhaps restricting 219 query traffic to local peering links, rather than paid transit 220 circuits); 222 4. Provision of additional information to help identify the location 223 of traffic sources in the case of attack (or query) traffic which 224 incorporates spoofed source addresses. This information is 225 derived from the property of anycast service distribution that 226 the selection of the Anycast Node used to service a 227 particular query may be related to the topological source of the 228 request. 230 5. Improvement of query response time, by reducing the network 231 distance between client and server with the provision of a local 232 Anycast Node. The extent to which query response time is 233 improved depends on the way that nodes are selected for the 234 clients by the routing system. Topological nearness within the 235 routing system does not, in general, correlate to round-trip 236 performance across a network; in some cases response times may 237 see no reduction, and may increase. 239 6. Reduction of a list of servers to a single, distributed address. 240 For example, a large number of authoritative nameservers for a 241 zone may be deployed using a small set of anycast Service 242 Addresses; this approach can increase the accessibility of zone 243 data in the DNS without increasing the size of a referral 244 response from a nameserver authoritative for the parent zone. 246 4. Design 248 4.1 Protocol Suitability 250 When a service is anycast between two or more nodes, the routing 251 system makes the node selection decision on behalf of a client. 
252 Since it is usually a requirement that a single client-server 253 interaction is carried out between a client and the same server node 254 for the duration of the transaction, it follows that the routing 255 system's node selection decision ought to be stable for substantially 256 longer than the expected transaction time, if the service is to be 257 provided reliably. 259 Some services have very short transaction times, and may even be 260 carried out using a single packet request and a single packet reply 261 in some cases (e.g. DNS transactions over UDP transport). Other 262 services involve far longer-lived transactions (e.g. bulk file 263 downloads and audio-visual media streaming). 265 Some anycast deployments have very predictable routing systems, which 266 can remain stable for long periods of time (e.g. anycast within a 267 well-managed and topologically-simple IGP, where node selection 268 changes only occur as a response to node failures). Other 269 deployments have far less predictable characteristics (see 270 Section 4.4.7). 272 The stability of the routing system together with the transaction 273 time of the service should be carefully compared when deciding 274 whether a service is suitable for distribution using anycast. In 275 some cases, for new protocols, it may be practical to split large 276 transactions into an initialisation phase which is handled by anycast 277 servers, and a sustained phase which is provided by non-anycast 278 servers, perhaps chosen during the initialisation phase. 280 This document deliberately avoids prescribing rules as to which 281 protocols or services are suitable for distribution by anycast; to 282 attempt to do so would be presumptuous. 284 4.2 Node Placement 286 Decisions as to where Anycast Nodes should be placed will depend to a 287 large extent on the goals of the service distribution. 
For example: 289 o A DNS recursive resolver service might be distributed within an 290 ISP's network, one Anycast Node per site. 292 o A root DNS server service might be distributed throughout the 293 Internet with nodes located in regions with poor external 294 connectivity, to ensure that the DNS functions adequately within 295 the region during times of external network failure. 297 o An FTP mirror service might include local nodes located at 298 exchange points, so that ISPs connected to that exchange point 299 could download bulk data more cheaply than if they had to use 300 expensive transit circuits. 302 In general, node placement decisions should be made with consideration 303 of likely traffic requirements, the potential for flash crowds or 304 denial-of-service traffic, the stability of the local routing system, 305 and the failure modes with respect to node failure or local routing 306 system failure. 308 4.3 Routing Systems 310 4.3.1 Anycast within an IGP 312 There are several common motivations for the distribution of a 313 Service Address within the scope of an IGP: 315 1. to improve service response times, by hosting a service close to 316 other users of the network; 318 2. to improve service reliability by providing automatic fail-over 319 to backup nodes; and 321 3. to keep service traffic local, to avoid congesting wide-area 322 links. 324 In each case the decisions as to where and how services are 325 provisioned can be made by network engineers without requiring such 326 operational complexities as regional variances in the configuration 327 of client computers, or deliberate DNS incoherence (causing DNS 328 queries to yield different answers depending on where the queries 329 originate). 
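The node selection performed by an IGP can be illustrated with a toy shortest-path computation. This is a sketch only: the router names, topology and link metrics below are invented, and a real IGP (e.g. OSPF or IS-IS) performs the equivalent computation inside the routing protocol itself.

```python
import heapq

# Toy IGP topology: router -> {neighbour: link metric} (invented values).
GRAPH = {
    "r1": {"r2": 1, "r3": 5},
    "r2": {"r1": 1, "r4": 1},
    "r3": {"r1": 5, "r4": 1},
    "r4": {"r2": 1, "r3": 1},
}

# Two Anycast Nodes inject a host route for the same Service Address;
# to the IGP, each appears as a distinct path to that address.
ANYCAST_NODES = {"r2", "r3"}

def nearest_anycast_node(source, nodes=frozenset(ANYCAST_NODES)):
    """Dijkstra from `source`; the routing system delivers packets for
    the Service Address to whichever advertising node is closest."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, router = heapq.heappop(heap)
        if d > dist.get(router, float("inf")):
            continue
        for neigh, metric in GRAPH[router].items():
            nd = d + metric
            if nd < dist.get(neigh, float("inf")):
                dist[neigh] = nd
                heapq.heappush(heap, (nd, neigh))
    # Ties broken by router name, purely for determinism in this sketch.
    return min(sorted(nodes), key=lambda n: dist.get(n, float("inf")))
```

In this invented topology, a client behind r1 reaches the service at r2 (metric 1); if r2 withdraws its host route, the same computation fails over automatically to r3, illustrating motivation 2 above.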
331 When a service is anycast within an IGP, the routing system is 332 typically under the control of the same organisation that is 333 providing the service, and hence the relationship between service 334 transaction characteristics and network stability is likely to be 335 well-understood. This technique is consequently applicable to a 336 larger number of applications than Internet-wide anycast service 337 distribution (see Section 4.1). 339 An IGP will generally have no inherent restriction on the length of 340 prefix that can be introduced to it. There may well therefore be no 341 need to construct a covering prefix for particular Service Addresses; 342 host routes corresponding to the Service Address can instead be 343 introduced to the routing system. See Section 4.4.2 for more 344 discussion of the requirement for a covering prefix. 346 IGPs often feature little or no aggregation of routes, partly due to 347 algorithmic complexities in supporting aggregation. There is little 348 motivation for aggregation in many networks' IGPs in any case, since 349 the amount of routing information carried in the IGP is small enough 350 that scaling concerns in routers do not arise. For discussion of 351 aggregation risks in other routing systems, see Section 4.4.8. 353 By reducing the scope of the IGP to just the hosts providing service 354 (together with one or more gateway routers) this technique can be 355 applied to the construction of server clusters. This application is 356 discussed in some detail in [ISC-TN-2004-1]. 358 4.3.2 Anycast within the Global Internet 360 Service Addresses may be anycast within the global Internet routing 361 system in order to distribute services across the entire network. 362 The principal differences between this application and the IGP-scope 363 distribution discussed in Section 4.3.1 are that: 365 1. the routing system is, in general, controlled by other people; 367 2. 
the routing protocol concerned (BGP), and commonly-accepted 368 practices in its deployment, impose some additional constraints 369 (see Section 4.4). 371 4.4 Routing Considerations 373 4.4.1 Signalling Service Availability 375 When a routing system is provided with reachability information for a 376 Service Address from an individual node, packets addressed to that 377 Service Address will start to arrive at the node. Since it is 378 essential for the node to be ready to accept requests before they 379 start to arrive, a coupling between the routing information and the 380 availability of the service at a particular node is desirable. 382 Where a routing advertisement from a node corresponds to a single 383 Service Address, this coupling might be such that availability of the 384 service triggers the route advertisement, and non-availability of the 385 service triggers a route withdrawal. This can be achieved by running 386 a routing protocol implementation on the same server which provides 387 the distributed service, configured to advertise and withdraw the 388 route in conjunction with the availability 389 (and health) of the software on the host which processes service 390 requests. An example of such an arrangement for a DNS service is 391 included in [ISC-TN-2004-1]. 393 Where a routing advertisement from a node corresponds to two or more 394 Service Addresses, it may not be appropriate to trigger a route 395 withdrawal due to the non-availability of a single service. Another 396 approach is to route requests for the service which is down at one 397 Anycast Node to a different Anycast Node at which the service is up. 398 This approach is discussed in Section 4.8. 400 Rapid advertisement/withdrawal oscillations can cause operational 401 problems, and nodes should be configured such that rapid oscillations 402 are avoided (e.g. by implementing a minimum delay following a 403 withdrawal before the service can be re-advertised). See 
See 404 Section 4.4.4 for a discussion of route oscillations in BGP. 406 4.4.2 Covering Prefix 408 In some routing systems (e.g. the BGP-based routing system of the 409 global Internet) it is not possible, in general, to propagate a host 410 route with confidence that the route will propagate throughout the 411 network. This is a consequence of operational policy, and not a 412 protocol restriction. 414 In such cases it is necessary to propagate a route which covers the 415 Service Address, and which has a sufficiently short prefix that it 416 will not be discarded by commonly-deployed import policies. For IPv4 417 Service Addresses, this is often a 24-bit prefix, but there are other 418 well-documented examples of IPv4 import polices which filter on 419 Regional Internet Registry (RIR) allocation boundaries, and hence 420 some experimentation may be prudent. Corresponding import policies 421 for IPv6 prefixes also exist. See Section 4.5 for more discussion of 422 IPv6 Service Addresses and corresponding anycast routes. 424 The propagation of a single route per service has some associated 425 scaling issues which are discussed in Section 4.4.8. 427 Where multiple Service Addresses are covered by the same covering 428 route, there is no longer a tight coupling between the advertisement 429 of that route and the individual services associated with the covered 430 host routes. The resulting impact on signaling availability of 431 individual services is discussed in Section 4.4.1 and Section 4.8. 433 4.4.3 Equal-Cost Paths 435 Some routing systems support equal-cost paths to the same 436 destination. Where multiple, equal-cost paths exist and lead to 437 different anycast nodes, there is a risk that different request 438 packets associated with a single transaction might be delivered to 439 more than one node. Services provided over TCP [RFC0793] necessarily 440 involve transactions with multiple request packets, due to the TCP 441 setup handshake. 
443 For services which are distributed across the global Internet using 444 BGP, equal-cost paths are normally not a consideration: BGP's exit 445 selection algorithm usually selects a single, consistent exit for a 446 single destination regardless of whether multiple candidate paths 447 exist. Implementations of BGP exist that support multi-path exit 448 selection, however. 450 Equal cost paths are commonly supported in IGPs. Multi-node 451 selection for a single transaction can be avoided in most cases by 452 careful consideration of IGP link metrics, or by applying equal-cost 453 multi-path (ECMP) selection algorithms which cause a single node to 454 be selected for a single multi-packet transaction. For an example of 455 the use of hash-based ECMP selection in anycast service distribution, 456 see [ISC-TN-2004-1]. 458 Other ECMP selection algorithms are commonly available, including 459 those in which packets from the same flow are not guaranteed to be 460 routed towards the same destination. ECMP algorithms which select a 461 route on a per-packet basis rather than per-flow are commonly 462 referred to as performing "Per Packet Load Balancing" (PPLB). 464 With respect to anycast service distribution, some uses of PPLB may 465 cause different packets from a single multi-packet transaction sent 466 by a client to be delivered to different anycast nodes, effectively 467 making the anycast service unavailable. Whether this affects 468 specific anycast services will depend on how and where anycast nodes 469 are deployed within the routing system, and on where the PPLB is 470 being performed: 472 1. PPLB across multiple, parallel links between the same pair of 473 routers should cause no node selection problems; 475 2. PPLB across diverse paths within a single autonomous system (AS), 476 where the paths converge to a single exit as they leave the AS, 477 should cause no node selection problems; 479 3. 
PPLB across links to different neighbour ASes where the 480 neighbour ASes have selected different nodes for a particular 481 anycast destination will, in general, cause request packets to be 482 distributed across multiple anycast nodes. This will have the 483 effect that the anycast service is unavailable to clients 484 downstream of the router performing PPLB. 486 The uses of PPLB which have the potential to interact badly with 487 anycast service distribution can also cause persistent packet 488 reordering. A network path that persistently reorders segments will 489 degrade the performance of traffic carried by TCP [Allman2000]. TCP, 490 according to several documented measurements, accounts for the bulk 491 of traffic carried on the Internet ([McCreary2000], [Fomenkov2004]). 492 Consequently, in many cases it is reasonable to consider networks 493 making such use of PPLB to be pathological. 495 4.4.4 Route Dampening 497 Frequent advertisements and withdrawals of individual prefixes in BGP 498 are known as flaps. Rapid flapping can lead to CPU exhaustion on 499 routers quite remote from the source of the instability, and for this 500 reason rapid route oscillations are frequently "dampened", as 501 described in [RFC2439]. 503 A dampened path will be suppressed by routers for an interval which 504 increases according to the frequency of the observed oscillation; a 505 suppressed path will not propagate. Hence a single router can 506 prevent the propagation of a flapping prefix to the rest of an 507 autonomous system, affording other routers in the network protection 508 from the instability. 510 Some implementations of flap dampening penalise oscillating 511 advertisements based on the observed AS_PATH, and not on the NLRI. 512 For this reason, network instability which leads to route flapping 513 from a single anycast node ought not to cause advertisements from 514 other nodes (which have different AS_PATH attributes) to be dampened. 
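The penalty-based suppression of [RFC2439] can be sketched as a decaying counter. The figures below (penalty per flap, suppress and reuse thresholds, half-life) are illustrative only, loosely resembling common vendor defaults; real implementations differ in detail.

```python
import math

class FlapDamping:
    """Toy RFC 2439-style penalty tracker for one path (illustrative
    parameters, not those of any specific router implementation)."""

    def __init__(self, penalty_per_flap=1000, suppress_limit=2000,
                 reuse_limit=750, half_life=900.0):
        self.penalty_per_flap = penalty_per_flap
        self.suppress_limit = suppress_limit
        self.reuse_limit = reuse_limit
        self.half_life = half_life  # seconds for the penalty to halve
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Exponential decay of the accumulated penalty since last update.
        elapsed = now - self.last_update
        self.penalty *= math.exp(-math.log(2) * elapsed / self.half_life)
        self.last_update = now

    def flap(self, now):
        """An advertisement/withdrawal oscillation observed at time `now`."""
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty >= self.suppress_limit:
            self.suppressed = True

    def usable(self, now):
        """A suppressed path does not propagate until the penalty has
        decayed below the reuse limit."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False
        return not self.suppressed
```

With these figures a single flap leaves the path usable, three flaps in thirty seconds suppress it, and roughly fifty minutes of stability are needed before the penalty decays below the reuse limit.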
516 To limit the opportunity of such implementations to penalise 517 advertisements originating from different Anycast Nodes in response 518 to oscillations from just a single node, care should be taken to 519 arrange that the AS_PATH attributes on routes from different nodes 520 are as diverse as possible. For example, Anycast Nodes should use 521 the same origin AS for their advertisements, but might have different 522 upstream ASes. 524 Where different implementations of flap dampening are prevalent, 525 individual nodes' instability may result in stable nodes becoming 526 unavailable. In mitigation, the following measures may be useful: 528 1. Judicious deployment of Local Nodes in combination with 529 especially stable Global Nodes (with high inter-AS path splay, 530 redundant hardware, power, etc) may help limit oscillation 531 problems to the Local Nodes' limited regions of influence; 533 2. Aggressive flap-dampening of the service prefix close to the 534 origin (e.g. within an Anycast Node, or in adjacent ASes of each 535 Anycast Node) may also help reduce the opportunity of remote ASes 536 to see oscillations at all. 538 4.4.5 Reverse Path Forwarding Checks 540 Reverse Path Forwarding (RPF) checks, first described in [RFC2267], 541 are commonly deployed as part of ingress interface packet filters on 542 routers in the Internet in order to deny packets whose source 543 addresses are spoofed (see also RFC 2827 [RFC2827]). Deployed 544 implementations of RPF make several modes of operation available 545 (e.g. "loose" and "strict"). 547 Some modes of RPF can cause non-spoofed packets to be denied when 548 they originate from a multi-homed site, since selected paths might 549 legitimately not correspond with the ingress interface of non-spoofed 550 packets from the multi-homed site. This issue is discussed in 551 [RFC3704]. 
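The difference between the two modes can be sketched with a toy forwarding table. The FIB contents and interface names below are invented (using documentation prefixes); this is an illustration of the checks, not of any vendor's implementation.

```python
import ipaddress

# Toy FIB: prefix -> interface the best route points out of (invented).
FIB = {
    ipaddress.ip_network("192.0.2.0/24"): "if0",
    ipaddress.ip_network("198.51.100.0/24"): "if1",
}

def _best_route(src):
    """Longest-prefix match on the source address, or None if unrouted."""
    matches = [(net, ifc) for net, ifc in FIB.items()
               if ipaddress.ip_address(src) in net]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)

def rpf_accept(src, ingress, mode="strict"):
    """Loose mode: accept if any route back to the source exists.
    Strict mode: additionally require that route to point back out of
    the interface the packet arrived on."""
    route = _best_route(src)
    if route is None:
        return False          # no route at all: looks spoofed in both modes
    if mode == "loose":
        return True
    return route[1] == ingress
```

Note how strict mode discards legitimate traffic from a multi-homed (or anycast) source whose best return path uses a different interface, while loose mode accepts it; this is the asymmetry [RFC3704] warns about.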
553 A collection of anycast nodes deployed across the Internet is largely 554 indistinguishable from a distributed, multi-homed site to the routing 555 system, and hence this risk also exists for anycast nodes, even if 556 individual nodes are not multi-homed. Care should be taken to ensure 557 that each anycast node is treated as a multi-homed network, and that 558 the corresponding recommendations in [RFC3704] with respect to RPF 559 checks are heeded. 561 4.4.6 Propagation Scope 563 In the context of Anycast service distribution across the global 564 Internet, Global Nodes are those which are capable of providing 565 service to clients anywhere in the network; reachability information 566 for the service is propagated globally, without restriction, by 567 advertising the routes covering the Service Addresses for global 568 transit to one or more providers. 570 More than one Global Node can exist for a single service (and indeed 571 this is often the case, for reasons of redundancy and load-sharing). 573 In contrast, it is sometimes desirable to deploy an Anycast Node 574 which only provides services to a local catchment of autonomous 575 systems, and which is deliberately not available to the entire 576 Internet; such nodes are referred to in this document as Local Nodes. 577 An example of circumstances in which a Local Node may be appropriate 578 is a node designed to serve a region with rich internal connectivity 579 but unreliable, congested or expensive access to the rest of the 580 Internet. 582 Local Nodes advertise covering routes for Service Addresses in such a 583 way that their propagation is restricted. This might be done using 584 well-known community string attributes such as NO_EXPORT [RFC1997] or 585 NOPEER [RFC3765], or by arranging with peers to apply a conventional 586 "peering" import policy instead of a "transit" import policy, or some 587 suitable combination of measures. 
589 Advertising reachability to Service Addresses from Local Nodes should 590 ideally be done using a routing policy that requires the presence of 591 explicit attributes for propagation, rather than relying on implicit 592 (default) policy. Inadvertent propagation of a route beyond its 593 intended horizon can result in capacity problems for Local Nodes 594 which might degrade service performance network-wide. 596 4.4.7 Other People's Networks 598 When Anycast services are deployed across networks operated by 599 others, their reachability is dependent on routing policies and 600 topology changes (planned and unplanned) which are unpredictable and 601 sometimes difficult to identify. Since the routing system may 602 include networks operated by multiple, unrelated organisations, the 603 possibility of unforeseen interactions resulting from the 604 combinations of unrelated changes also exists. 606 The stability and predictability of such a routing system should be 607 taken into consideration when assessing the suitability of anycast as 608 a distribution strategy for particular services and protocols (see 609 also Section 4.1). 611 By way of mitigation, routing policies used by Anycast Nodes across 612 such routing systems should be conservative, individual nodes' 613 internal and external/connecting infrastructure should be scaled to 614 support loads far in excess of the average, and the service should be 615 monitored proactively from many points in order to avoid unpleasant 616 surprises (see Section 5.1). 618 4.4.8 Aggregation Risks 620 The propagation of a single route for each anycast service does not 621 scale well for routing systems in which the load of routing 622 information which must be carried is a concern, and where there are 623 potentially many services to distribute.
For example, an autonomous 624 system which provides services to the Internet with N Service 625 Addresses covered by a single exported route would need to advertise 626 (N+1) routes if each of those services were to be distributed using 627 anycast. 629 The common practice of applying minimum prefix-length filters in 630 import policies on the Internet (see Section 4.4.2) means that for a 631 route covering a Service Address to be usefully propagated, the prefix 632 length must be substantially less than that required to advertise 633 just the host route. Widespread advertisement of short prefixes for 634 individual services hence also has a negative impact on address 635 conservation. 637 Both of these issues can be mitigated to some extent by the use of a 638 single covering prefix to accommodate multiple Service Addresses, as 639 described in Section 4.8. This implies a decoupling of the route 640 advertisement from individual service availability (see 641 Section 4.4.1), however, with attendant risks to the stability of the 642 service as a whole (see Section 4.7). 644 In general, the scaling problems described here prevent anycast from 645 being a useful, general approach for service distribution on the 646 global Internet. It remains, however, a useful technique for 647 distributing a limited number of Internet-critical services, as well 648 as in smaller networks where the aggregation concerns discussed here 649 do not apply. 651 4.5 Addressing Considerations 653 Service Addresses should be unique within the routing system that 654 connects all Anycast Nodes to all possible clients of the service. 655 Service Addresses must also be chosen so that corresponding routes 656 will be allowed to propagate within that routing system.
658 For an IPv4-numbered service deployed across the Internet, for 659 example, an address might be chosen from a block where the minimum 660 RIR allocation size is 24 bits, and reachability to that address 661 might be provided by originating the covering 24-bit prefix. 663 For an IPv4-numbered service deployed within a private network, a 664 locally-unused [RFC1918] address might be chosen, and reachability to 665 that address might be signalled using a (32-bit) host route. 667 For IPv6-numbered services, Anycast Addresses are not scoped 668 differently from unicast addresses. As such, the guidelines presented 669 for IPv4 with respect to address suitability also apply to IPv6. Note 670 that historical prohibitions on anycast distribution of services over 671 IPv6 have been removed from the IPv6 addressing specification in 672 [I-D.ietf-ipv6-addr-arch-v4]. 674 4.6 Data Synchronisation 676 Although some services have been deployed in localised form (such 677 that clients from particular regions are presented with regionally- 678 relevant content) many services have the property that responses to 679 client requests should be consistent, regardless of where the request 680 originates. For a service distributed using anycast, that implies 681 that different Anycast Nodes must operate in a consistent manner and, 682 where that consistent behaviour is based on a data set, that the data 683 concerned be synchronised between nodes. 685 The mechanism by which data is synchronised depends on the nature of 686 the service; examples are zone transfers for authoritative DNS 687 servers and rsync for FTP archives. In general, the synchronisation 688 of data between Anycast Nodes will involve transactions between non- 689 anycast addresses. 691 Data synchronisation across public networks should be carried out 692 with appropriate authentication and encryption.
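One simple operational check on synchronisation compares digests of the data set as retrieved from each node over its unicast address. The sketch below is illustrative only; the node names are invented, and the snapshot dictionary stands in for a deployment-specific retrieval step (e.g. zone serials obtained by query, or checksum files fetched over an authenticated channel):

```python
import hashlib

def digest(data: bytes) -> str:
    """Content digest used to compare the data set served by nodes."""
    return hashlib.sha256(data).hexdigest()

# Stand-in for "fetch the served data set from each node's unicast
# (management) address"; in practice this might be an AXFR or a
# checksum file retrieved over SSH.
snapshots = {
    "node-ams": b"zone-serial-2005102101",
    "node-sfo": b"zone-serial-2005102101",
    "node-hkg": b"zone-serial-2005102001",   # lagging one update behind
}

digests = {node: digest(data) for node, data in snapshots.items()}
reference = digests["node-ams"]
out_of_sync = sorted(n for n, d in digests.items() if d != reference)

print(out_of_sync)   # ['node-hkg']
```

A node flagged as out of sync might simply be mid-transfer; a real monitor would allow for normal propagation delay (such as the timed zone-transfer lag mentioned in Section 4.7) before raising an alarm.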
694 4.7 Node Autonomy 696 For an Anycast deployment whose goals include improved reliability 697 through redundancy, it is important to minimise the opportunity for a 698 single defect to compromise many (or all) nodes, or for the failure 699 of one node to trigger a cascading failure that brings down 700 successive nodes until the service as a whole is defeated. 702 Co-dependencies are avoided by making each node as autonomous and 703 self-sufficient as possible. The degree to which nodes can survive 704 failure elsewhere depends on the nature of the service being 705 delivered, but for services which accommodate disconnected operation 706 (e.g. the timed propagation of changes between master and slave 707 servers in the DNS) a high degree of autonomy can be achieved. 709 The possibility of cascading failure due to load can also be reduced 710 by the deployment of both Global and Local Nodes for a single 711 service, since the effective fail-over path of traffic is, in 712 general, from Local Node to Global Node; traffic that might sink one 713 Local Node is unlikely to sink all Local Nodes, except in the most 714 degenerate cases. 716 The chance of cascading failure due to a software defect in an 717 operating system or server can be reduced in many cases by deploying 718 nodes running different implementations of operating system, server 719 software, routing protocol software, etc., such that a defect which 720 appears in a single component does not affect the whole system. 722 4.8 Multi-Service Nodes 724 For a service distributed across a routing system where covering 725 prefixes are required to announce reachability to a single Service 726 Address (see Section 4.4.2), special consideration is required in the 727 case where multiple services need to be distributed across a single 728 set of nodes.
This results from the requirement to signal 729 availability of individual services to the routing system so that 730 requests for service are not received by nodes which are not able to 731 process them (see Section 4.4.1). 733 Several approaches are described in the following sections. 735 4.8.1 Multiple Covering Prefixes 737 Each Service Address is chosen such that only one Service Address is 738 covered by each advertised prefix. Advertisement and withdrawal of a 739 single covering prefix can be tightly coupled to the availability of 740 the single associated service. 742 This is the most straightforward approach. However, since it makes 743 very poor utilisation of globally-unique addresses, it is only 744 suitable for use for a small number of critical, infrastructural 745 services such as root DNS servers. General Internet-wide deployment 746 of services using this approach will not scale. 748 4.8.2 Pessimistic Withdrawal 750 Multiple Service Addresses are chosen such that they are covered by a 751 single prefix. Advertisement and withdrawal of the single covering 752 prefix is coupled to the availability of all associated services; if 753 any individual service becomes unavailable, the covering prefix is 754 withdrawn. 756 The coupling between service availability and advertisement of the 757 covering prefix is complicated by the requirement that all Service 758 Addresses must be available -- the announcement needs to be triggered 759 by the presence of all component routes, and not just a single 760 covered route. 762 The fact that a single malfunctioning service causes all deployed 763 services in a node to be taken off-line may make this approach 764 unsuitable for many applications. 766 4.8.3 Intra-Node Interior Connectivity 768 Multiple Service Addresses are chosen such that they are covered by a 769 single prefix. Advertisement and withdrawal of the single covering 770 prefix is coupled to the availability of any one service.
Nodes have 771 interior connectivity, e.g. using tunnels, and host routes for 772 service addresses are distributed using an IGP which extends to 773 include routers at all nodes. 775 In the event that a service is unavailable at one node, but available 776 at other nodes, a request may be routed over the interior network 777 from the receiving node towards some other node for processing. 779 In the event that some local services in a node are down and the node 780 is disconnected from other nodes, continued advertisement of the 781 covering prefix might cause requests to become black-holed. 783 This approach allows reasonable address utilisation of the netblock 784 covered by the announced prefix, at the expense of reduced autonomy 785 of individual nodes; the IGP in which all nodes participate can be 786 viewed as a single point of failure. 788 5. Service Management 790 5.1 Monitoring 792 Monitoring a service which is distributed is more complex than 793 monitoring a non-distributed service, since the observed accuracy and 794 availability of the service is, in general, different when viewed 795 from clients attached to different parts of the network. When a 796 problem is identified, it is also not always obvious which node 797 served the request, and hence which node is malfunctioning. 799 It is recommended that distributed services are monitored from probes 800 distributed representatively across the routing system, and, where 801 possible, the identity of the node answering individual requests is 802 recorded along with performance and availability statistics. The 803 RIPE NCC DNSMON service [1] is an example of such monitoring for the 804 DNS. 806 Monitoring the routing system (from a variety of places, in the case 807 of routing systems where perspective is relevant) can also provide 808 useful diagnostics for troubleshooting service availability. 
This 809 can be achieved using dedicated probes, or public route measurement 810 facilities on the Internet such as the RIPE NCC Routing Information 811 Service [2] and the University of Oregon Route Views Project [3]. 813 Monitoring the health of the component devices in an Anycast 814 deployment of a service (hosts, routers, etc) is straightforward, and 815 can be achieved using the same tools and techniques commonly used to 816 manage other network-connected infrastructure, without the additional 817 complexity involved in monitoring Anycast service addresses. 819 6. Security Considerations 821 6.1 Denial-of-Service Attack Mitigation 823 This document describes mechanisms for deploying services on the 824 Internet which can be used to mitigate vulnerability to attack: 826 1. An Anycast Node can act as a sink for attack traffic originated 827 within its sphere of influence, preventing nodes elsewhere from 828 having to deal with that traffic; 830 2. The task of dealing with attack traffic whose sources are widely 831 distributed is itself distributed across all the nodes which 832 contribute to the service. Since the problem of sorting between 833 legitimate and attack traffic is distributed, this may lead to 834 better scaling properties than a service which is not 835 distributed. 837 6.2 Service Compromise 839 The distribution of a service across several (or many) autonomous 840 nodes imposes increased monitoring as well as an increased systems 841 administration burden on the operator of the service which might 842 reduce the effectiveness of host and router security. 844 The potential benefit of being able to take compromised servers off- 845 line without compromising the service can only be realised if there 846 are working procedures to do so quickly and reliably. 
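Such procedures can be automated by coupling a node's route advertisement to a service health check, in the spirit of Section 4.4.1. The sketch below models only the decision logic; the health probe and the announce/withdraw mechanisms are deployment-specific (e.g. a local service test plus control of the node's routing daemon) and are not shown:

```python
def reconcile(healthy: bool, advertised: bool):
    """Return the routing action needed to make a node's
    advertisement of the Service Address match service health:
    'announce', 'withdraw', or None if nothing needs to change.

    Withdrawing on failure takes a broken or compromised node out of
    rotation, leaving its clients to be served by other nodes."""
    if healthy and not advertised:
        return "announce"
    if not healthy and advertised:
        return "withdraw"
    return None

print(reconcile(healthy=True,  advertised=False))  # announce
print(reconcile(healthy=False, advertised=True))   # withdraw
print(reconcile(healthy=True,  advertised=True))   # None
```

In practice such a loop should damp its own behaviour (e.g. require several consecutive probe failures before withdrawing) so that a flapping service does not itself trigger the route oscillation problems described in Section 4.4.4.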
848 6.3 Service Hijacking 850 It is possible that an unauthorised party might advertise routes 851 corresponding to anycast Service Addresses across a network, and by 852 doing so capture legitimate request traffic or process requests in a 853 manner which compromises the service (or both). A rogue Anycast Node 854 might be difficult to detect by clients or by the operator of the 855 service. 857 The risk of service hijacking by manipulation of the routing system 858 exists regardless of whether a service is distributed using anycast. 859 However, the fact that legitimate Anycast Nodes are observable in the 860 routing system may make it more difficult to detect rogue nodes. 862 7. Protocol Considerations 864 This document does not impose any protocol considerations. 866 8. IANA Considerations 868 This document requests no action from IANA. 870 9. Acknowledgements 872 The authors gratefully acknowledge the contributions from various 873 participants of the grow working group, and in particular Geoff 874 Huston, Pekka Savola, Danny McPherson, Ben Black and Alan Barrett. 876 This work was supported by the US National Science Foundation 877 (research grant SCI-0427144) and DNS-OARC. 879 10. References 881 10.1 Normative References 883 [I-D.ietf-ipv6-addr-arch-v4] 884 Hinden, R. and S. Deering, "IP Version 6 Addressing 885 Architecture", draft-ietf-ipv6-addr-arch-v4-04 (work in 886 progress), May 2005. 888 [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, 889 RFC 793, September 1981. 891 [RFC1771] Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 892 (BGP-4)", RFC 1771, March 1995. 894 [RFC1918] Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and 895 E. Lear, "Address Allocation for Private Internets", 896 BCP 5, RFC 1918, February 1996. 898 [RFC1997] Chandrasekeran, R., Traina, P., and T. Li, "BGP 899 Communities Attribute", RFC 1997, August 1996. 901 [RFC2439] Villamizar, C., Chandra, R., and R.
Govindan, "BGP Route 902 Flap Damping", RFC 2439, November 1998. 904 [RFC2827] Ferguson, P. and D. Senie, "Network Ingress Filtering: 905 Defeating Denial of Service Attacks which employ IP Source 906 Address Spoofing", BCP 38, RFC 2827, May 2000. 908 [RFC3704] Baker, F. and P. Savola, "Ingress Filtering for Multihomed 909 Networks", BCP 84, RFC 3704, March 2004. 911 10.2 Informative References 913 [Allman2000] 914 Allman, M. and E. Blanton, "On Making TCP More Robust to 915 Packet Reordering", January 2000, 916 . 918 [Fomenkov2004] 919 Fomenkov, M., Keys, K., Moore, D., and k. claffy, 920 "Longitudinal Study of Internet Traffic from 1999-2003", 921 January 2004, . 924 [ISC-TN-2003-1] 925 Abley, J., "Hierarchical Anycast for Global Service 926 Distribution", March 2003, 927 . 929 [ISC-TN-2004-1] 930 Abley, J., "A Software Approach to Distributing Requests 931 for DNS Service using GNU Zebra, ISC BIND 9 and FreeBSD", 932 March 2004, 933 . 935 [McCreary2000] 936 McCreary, S. and k. claffy, "Trends in Wide Area IP 937 Traffic Patterns: A View from Ames Internet Exchange", 938 September 2000, . 941 [RFC1546] Partridge, C., Mendez, T., and W. Milliken, "Host 942 Anycasting Service", RFC 1546, November 1993. 944 [RFC2267] Ferguson, P. and D. Senie, "Network Ingress Filtering: 945 Defeating Denial of Service Attacks which employ IP Source 946 Address Spoofing", RFC 2267, January 1998. 948 [RFC3765] Huston, G., "NOPEER Community for Border Gateway Protocol 949 (BGP) Route Scope Control", RFC 3765, April 2004. 951 URIs 953 [1] 955 [2] 957 [3] 959 Authors' Addresses 961 Joe Abley 962 Internet Systems Consortium, Inc. 963 950 Charter Street 964 Redwood City, CA 94063 965 USA 967 Phone: +1 650 423 1317 968 Email: jabley@isc.org 969 URI: http://www.isc.org/ 971 Kurt Erik Lindqvist 972 Netnod Internet Exchange 973 Bellmansgatan 30 974 118 47 Stockholm 975 Sweden 977 Email: kurtis@kurtis.pp.se 978 URI: http://www.netnod.se/ 980 Appendix A. 
Change History 982 This section should be removed before publication. 984 draft-kurtis-anycast-bcp-00: Initial draft. Discussed at IETF 61 in 985 the grow meeting and adopted as a working group document shortly 986 afterwards. 988 draft-ietf-grow-anycast-00: Missing and empty sections completed; 989 some structural reorganisation; general wordsmithing. Document 990 discussed at IETF 62. 992 draft-ietf-grow-anycast-01: This appendix added; acknowledgements 993 section added; commentary on RFC3513 prohibition of anycast on 994 hosts removed; minor sentence re-casting and related jiggery- 995 pokery. This revision published for discussion at IETF 63. 997 draft-ietf-grow-anycast-02: Normative reference to [I-D.ietf-ipv6- 998 addr-arch-v4] added (in the RFC editor's queue at the time of 999 writing; reference should be updated to an RFC number when 1000 available). Added commentary on per-packet load balancing. 1002 Intellectual Property Statement 1004 The IETF takes no position regarding the validity or scope of any 1005 Intellectual Property Rights or other rights that might be claimed to 1006 pertain to the implementation or use of the technology described in 1007 this document or the extent to which any license under such rights 1008 might or might not be available; nor does it represent that it has 1009 made any independent effort to identify any such rights. Information 1010 on the procedures with respect to rights in RFC documents can be 1011 found in BCP 78 and BCP 79. 1013 Copies of IPR disclosures made to the IETF Secretariat and any 1014 assurances of licenses to be made available, or the result of an 1015 attempt made to obtain a general license or permission for the use of 1016 such proprietary rights by implementers or users of this 1017 specification can be obtained from the IETF on-line IPR repository at 1018 http://www.ietf.org/ipr. 
1020 The IETF invites any interested party to bring to its attention any 1021 copyrights, patents or patent applications, or other proprietary 1022 rights that may cover technology that may be required to implement 1023 this standard. Please address the information to the IETF at 1024 ietf-ipr@ietf.org. 1026 Disclaimer of Validity 1028 This document and the information contained herein are provided on an 1029 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 1030 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET 1031 ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, 1032 INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE 1033 INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 1034 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 1036 Copyright Statement 1038 Copyright (C) The Internet Society (2005). This document is subject 1039 to the rights, licenses and restrictions contained in BCP 78, and 1040 except as set forth therein, the authors retain all their rights. 1042 Acknowledgment 1044 Funding for the RFC Editor function is currently provided by the 1045 Internet Society.