Network Working Group                                           J. Abley
Internet-Draft                                                       ISC
Expires: July 28, 2006                                      K. Lindqvist
                                                Netnod Internet Exchange
                                                        January 24, 2006

                     Operation of Anycast Services
                       draft-ietf-grow-anycast-03

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."
   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 28, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   As the Internet has grown, and as systems and networked services
   within enterprises have become more pervasive, many services with
   high availability requirements have emerged.  These requirements have
   increased the demands on the reliability of the infrastructure on
   which those services rely.

   Various techniques have been employed to increase the availability of
   services deployed on the Internet.  This document presents commentary
   and recommendations for distribution of services using anycast.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Anycast Service Distribution
     3.1.  General Description
     3.2.  Goals
   4.  Design
     4.1.  Protocol Suitability
     4.2.  Node Placement
     4.3.  Routing Systems
       4.3.1.  Anycast within an IGP
       4.3.2.  Anycast within the Global Internet
     4.4.  Routing Considerations
       4.4.1.  Signalling Service Availability
       4.4.2.  Covering Prefix
       4.4.3.  Equal-Cost Paths
       4.4.4.  Route Dampening
       4.4.5.  Reverse Path Forwarding Checks
       4.4.6.  Propagation Scope
       4.4.7.  Other People's Networks
       4.4.8.  Aggregation Risks
     4.5.  Addressing Considerations
     4.6.  Data Synchronisation
     4.7.  Node Autonomy
     4.8.  Multi-Service Nodes
       4.8.1.  Multiple Covering Prefixes
       4.8.2.  Pessimistic Withdrawal
       4.8.3.  Intra-Node Interior Connectivity
   5.  Service Management
     5.1.  Monitoring
   6.  Security Considerations
     6.1.  Denial-of-Service Attack Mitigation
     6.2.  Service Compromise
     6.3.  Service Hijacking
   7.  Protocol Considerations
   8.  IANA Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Change History
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   To distribute a service using anycast, the service is first
   associated with a stable set of IP addresses, and reachability to
   those addresses is advertised in a routing system from multiple,
   independent service nodes.  Various techniques for anycast deployment
   of services are discussed in [RFC1546], [ISC-TN-2003-1] and
   [ISC-TN-2004-1].

   The techniques and considerations described in this document apply to
   services reachable over both IPv4 and IPv6.

   Anycast has in recent years become increasingly popular for adding
   redundancy to DNS servers, complementing the redundancy which the DNS
   architecture itself already provides.  Several root DNS server
   operators have distributed their servers widely around the Internet,
   and both resolver and authority servers are commonly distributed
   within the networks of service providers.  Anycast distribution has
   been used by commercial DNS authority server operators for several
   years.  The use of anycast is not limited to the DNS, although
   anycast imposes some additional limitations on the nature of the
   service being distributed, including transaction longevity,
   transaction state held on servers and data synchronisation
   capabilities.

   Although anycast is conceptually simple, its implementation
   introduces some pitfalls for operation of services.  For example,
   monitoring the availability of the service becomes more difficult:
   the observed availability changes according to the location of the
   client within the network, and the client catchment of individual
   anycast nodes is neither static nor reliably deterministic.

   This document describes the use of anycast for both local-scope
   distribution of services using an Interior Gateway Protocol (IGP) and
   global distribution using BGP [RFC1771].  Many of the issues for
   monitoring and data synchronisation are common to both, but
   deployment issues differ substantially.

2.  Terminology

   Service Address: an IP address associated with a particular service
      (e.g. the destination address used by DNS resolvers to reach a
      particular authority server).

   Anycast: the practice of making a particular Service Address
      available in multiple, discrete, autonomous locations, such that
      datagrams sent are routed to one of several available locations.

   Anycast Node: an internally-connected collection of hosts and routers
      which together provide service for an anycast Service Address.  An
      Anycast Node might be as simple as a single host participating in
      a routing system with adjacent routers, or it might include a
      number of hosts connected in some more elaborate fashion; in
      either case, to the routing system across which the service is
      being anycast, each Anycast Node presents a unique path to the
      Service Address.  The entire anycast system for the service
      consists of two or more separate Anycast Nodes.

   Catchment: in physical geography, an area drained by a river, also
      known as a drainage basin.  By analogy, as used in this document,
      the topological region of a network within which packets directed
      at an anycast address are routed to one particular node.

   Local-Scope Anycast: reachability information for the anycast Service
      Address is propagated through a routing system in such a way that
      a particular anycast node is only visible to a subset of the whole
      routing system.

   Local Node: an Anycast Node providing service using a Local-Scope
      Anycast address.

   Global-Scope Anycast: reachability information for the anycast
      Service Address is propagated through a routing system in such a
      way that a particular anycast node is potentially visible to the
      whole routing system.

   Global Node: an Anycast Node providing service using a Global-Scope
      Anycast address.

3.  Anycast Service Distribution

3.1.  General Description

   Anycast is the name given to the practice of making a Service Address
   available to a routing system at Anycast Nodes in two or more
   discrete locations.  The service provided by each node is generally
   consistent regardless of the particular node chosen by the routing
   system to handle a particular request, although some services may
   benefit from deliberate differences in the behaviours of individual
   nodes, in order to facilitate locality-specific behaviour (see
   Section 4.6).

   For services distributed using anycast, there is no inherent
   requirement for referrals to other servers or name-based service
   distribution ("round-robin DNS"), although those techniques could be
   combined with anycast service distribution if an application required
   it.  The routing system decides which node is used for each request,
   based on the topological design of the routing system and the point
   in the network at which the request originates.

   The Anycast Node chosen to service a particular query can be
   influenced by the traffic engineering capabilities of the routing
   protocols which make up the routing system.  The degree of influence
   available to the operator of the node depends on the scale of the
   routing system within which the Service Address is anycast.

   Load-balancing between Anycast Nodes is typically difficult to
   achieve: the distribution of request and traffic load between nodes
   is generally unbalanced.  However, distribution of load between nodes
   for the purposes of reliability, and coarse-grained distribution of
   load for the purposes of making popular services scalable, can often
   be achieved.
   The scale of the routing system through which a service is anycast
   can vary from a small Interior Gateway Protocol (IGP) connecting a
   small handful of components, to the Border Gateway Protocol (BGP)
   [RFC1771] connecting the global Internet, depending on the nature of
   the service distribution that is required.

3.2.  Goals

   A service may be anycast for a variety of reasons.  A number of
   common objectives are:

   1.  Coarse ("unbalanced") distribution of load across nodes, to allow
       infrastructure to scale to increased numbers of queries and to
       accommodate transient query peaks;

   2.  Mitigation of non-distributed denial-of-service attacks by
       localising damage to single anycast nodes;

   3.  Constraint of distributed denial-of-service attacks or flash
       crowds to local regions around anycast nodes (perhaps restricting
       query traffic to local peering links, rather than paid transit
       circuits);

   4.  Provision of additional information to help locate the sources of
       attack (or query) traffic which incorporates spoofed source
       addresses.  This information is derived from the property of
       anycast service distribution that the selection of the Anycast
       Node used to service a particular query may be related to the
       topological source of the request;

   5.  Improvement of query response time, by reducing the network
       distance between client and server with the provision of a local
       Anycast Node.  The extent to which query response time is
       improved depends on the way that nodes are selected for the
       clients by the routing system.  Topological nearness within the
       routing system does not, in general, correlate to round-trip
       performance across a network; in some cases response times may
       see no reduction, and may even increase;

   6.  Reduction of a list of servers to a single, distributed address.
       For example, a large number of authoritative nameservers for a
       zone may be deployed using a small set of anycast Service
       Addresses; this approach can increase the accessibility of zone
       data in the DNS without increasing the size of a referral
       response from a nameserver authoritative for the parent zone.

4.  Design

4.1.  Protocol Suitability

   When a service is anycast between two or more nodes, the routing
   system makes the node selection decision on behalf of a client.
   Since it is usually a requirement that a single client-server
   interaction is carried out between a client and the same server node
   for the duration of the transaction, it follows that the routing
   system's node selection decision ought to be stable for substantially
   longer than the expected transaction time, if the service is to be
   provided reliably.

   Some services have very short transaction times, and may even be
   carried out using a single packet request and a single packet reply
   (e.g. DNS transactions over UDP transport).  Other services involve
   far longer-lived transactions (e.g. bulk file downloads and audio-
   visual media streaming).

   Services may be anycast within very predictable routing systems,
   which can remain stable for long periods of time (e.g. anycast within
   a well-managed and topologically-simple IGP, where node selection
   changes only occur as a response to node failures).  Other
   deployments have far less predictable characteristics (see
   Section 4.4.7).

   The stability of the routing system and the transaction time of the
   service should be carefully compared when deciding whether a service
   is suitable for distribution using anycast.  In some cases, for new
   protocols, it may be practical to split large transactions into an
   initialisation phase which is handled by anycast servers, and a
   sustained phase which is provided by non-anycast servers, perhaps
   chosen during the initialisation phase.

   This document deliberately avoids prescribing rules as to which
   protocols or services are suitable for distribution by anycast; to
   attempt to do so would be presumptuous.

4.2.  Node Placement

   Decisions as to where Anycast Nodes should be placed will depend to a
   large extent on the goals of the service distribution.  For example:

   o  A DNS recursive resolver service might be distributed within an
      ISP's network, one Anycast Node per site.

   o  A root DNS server service might be distributed throughout the
      Internet; Anycast Nodes could be located in regions with poor
      external connectivity to ensure that the DNS functions adequately
      within the region during times of external network failure.

   o  An FTP mirror service might include local nodes located at
      exchange points, so that ISPs connected to that exchange point
      could download bulk data more cheaply than if they had to use
      expensive transit circuits.

   In general, node placement decisions should be made with
   consideration of likely traffic requirements, the potential for flash
   crowds or denial-of-service traffic, the stability of the local
   routing system, and the failure modes with respect to node failure or
   local routing system failure.

4.3.  Routing Systems

4.3.1.  Anycast within an IGP

   There are several common motivations for the distribution of a
   Service Address within the scope of an IGP:

   1.  to improve service response times, by hosting a service close to
       other users of the network;

   2.  to improve service reliability by providing automatic fail-over
       to backup nodes; and

   3.  to keep service traffic local, to avoid congesting wide-area
       links.

   In each case the decisions as to where and how services are
   provisioned can be made by network engineers without requiring such
   operational complexities as regional variances in the configuration
   of client computers, or deliberate DNS incoherence (causing DNS
   queries to yield different answers depending on where the queries
   originate).

   When a service is anycast within an IGP, the routing system is
   typically under the control of the same organisation that is
   providing the service, and hence the relationship between service
   transaction characteristics and network stability is likely to be
   well-understood.  This technique is consequently applicable to a
   larger number of applications than Internet-wide anycast service
   distribution (see Section 4.1).

   An IGP will generally have no inherent restriction on the length of
   prefix that can be introduced to it.  In this case there is no need
   to construct a covering prefix for particular Service Addresses; host
   routes corresponding to the Service Address can instead be introduced
   to the routing system.  See Section 4.4.2 for more discussion of the
   requirement for a covering prefix.

   IGPs often feature little or no aggregation of routes, partly due to
   algorithmic complexities in supporting aggregation.  There is little
   motivation for aggregation in many networks' IGPs, since the amount
   of routing information carried in the IGP is small enough that
   scaling concerns in routers do not arise.  For discussion of
   aggregation risks in other routing systems, see Section 4.4.8.

   By reducing the scope of the IGP to just the hosts providing service
   (together with one or more gateway routers), this technique can be
   applied to the construction of server clusters.  This application is
   discussed in some detail in [ISC-TN-2004-1].

4.3.2.  Anycast within the Global Internet

   Service Addresses may be anycast within the global Internet routing
   system in order to distribute services across the entire network.
   The principal differences between this application and the IGP-scope
   distribution discussed in Section 4.3.1 are that:

   1.  the routing system is, in general, controlled by other people;

   2.  the routing protocol concerned (BGP), and commonly-accepted
       practices in its deployment, impose some additional constraints
       (see Section 4.4).

4.4.  Routing Considerations

4.4.1.  Signalling Service Availability

   When a routing system is provided with reachability information for a
   Service Address from an individual node, packets addressed to that
   Service Address will start to arrive at the node.  Since it is
   essential for the node to be ready to accept requests before they
   start to arrive, a coupling between the routing information and the
   availability of the service at a particular node is desirable.

   Where a routing advertisement from a node corresponds to a single
   Service Address, this coupling might be such that availability of the
   service triggers the route advertisement, and non-availability of the
   service triggers a route withdrawal.  This can be achieved by running
   a routing protocol implementation on the same server that provides
   the service being distributed, configured to advertise and withdraw
   the route in conjunction with the availability (and health) of the
   software which processes service requests.  An example of such an
   arrangement for a DNS service is included in [ISC-TN-2004-1].

   Where a routing advertisement from a node corresponds to two or more
   Service Addresses, it may not be appropriate to trigger a route
   withdrawal due to the non-availability of a single service.
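   The coupling between service health and route advertisement described
   above can be sketched as a small control loop.  The following Python
   sketch is illustrative only: the advertise/withdraw callbacks stand in
   for whatever routing protocol implementation a deployment actually
   uses, and the minimum re-advertisement delay is an assumed parameter
   (a hold-down to avoid contributing route flap; see Section 4.4.4).

```python
import time

class RouteController:
    """Couples a route advertisement to service health, with a hold-down
    interval after each withdrawal to damp rapid oscillation."""

    def __init__(self, advertise, withdraw, min_readvertise_delay=300.0,
                 clock=time.monotonic):
        self.advertise = advertise          # callback: announce the route
        self.withdraw = withdraw            # callback: withdraw the route
        self.min_readvertise_delay = min_readvertise_delay
        self.clock = clock
        self.advertised = False
        self.last_withdrawal = None

    def on_health_check(self, healthy):
        now = self.clock()
        if healthy and not self.advertised:
            # Hold-down: do not re-advertise too soon after a withdrawal.
            if (self.last_withdrawal is None or
                    now - self.last_withdrawal >= self.min_readvertise_delay):
                self.advertise()
                self.advertised = True
        elif not healthy and self.advertised:
            self.withdraw()
            self.advertised = False
            self.last_withdrawal = now
```

   A monitoring process would call on_health_check() periodically with
   the result of a service probe; the route then tracks service health
   without oscillating faster than the configured hold-down allows.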
   Another approach in the case where the service is down at one Anycast
   Node is to route requests to a different Anycast Node where the
   service is working normally.  This approach is discussed in
   Section 4.8.

   Rapid advertisement/withdrawal oscillations can cause operational
   problems, and nodes should be configured such that rapid oscillations
   are avoided (e.g. by implementing a minimum delay following a
   withdrawal before the service can be re-advertised).  See
   Section 4.4.4 for a discussion of route oscillations in BGP.

4.4.2.  Covering Prefix

   In some routing systems (e.g. the BGP-based routing system of the
   global Internet) it is not possible, in general, to propagate a host
   route with confidence that the route will propagate throughout the
   network.  This is a consequence of operational policy, not a protocol
   restriction.

   In such cases it is necessary to propagate a route which covers the
   Service Address, and which has a sufficiently short prefix that it
   will not be discarded by commonly-deployed import policies.  For IPv4
   Service Addresses, this is often a 24-bit prefix, but there are other
   well-documented examples of IPv4 import policies which filter on
   Regional Internet Registry (RIR) allocation boundaries, and hence
   some experimentation may be prudent.  Corresponding import policies
   for IPv6 prefixes also exist.  See Section 4.5 for more discussion of
   IPv6 Service Addresses and corresponding anycast routes.

   The propagation of a single route per service has some associated
   scaling issues, which are discussed in Section 4.4.8.

   Where multiple Service Addresses are covered by the same covering
   route, there is no longer a tight coupling between the advertisement
   of that route and the individual services associated with the covered
   host routes.  The resulting impact on signalling availability of
   individual services is discussed in Section 4.4.1 and Section 4.8.

4.4.3.  Equal-Cost Paths

   Some routing systems support equal-cost paths to the same
   destination.  Where multiple, equal-cost paths exist and lead to
   different anycast nodes, there is a risk that different request
   packets associated with a single transaction might be delivered to
   more than one node.  Services provided over TCP [RFC0793] necessarily
   involve transactions with multiple request packets, due to the TCP
   setup handshake.

   For services which are distributed across the global Internet using
   BGP, equal-cost paths are normally not a consideration: BGP's exit
   selection algorithm usually selects a single, consistent exit for a
   single destination regardless of whether multiple candidate paths
   exist.  However, implementations of BGP exist that support multi-path
   exit selection.

   Equal-cost paths are commonly supported in IGPs.  Multi-node
   selection for a single transaction can be avoided in most cases by
   careful consideration of IGP link metrics, or by applying equal-cost
   multi-path (ECMP) selection algorithms which cause a single node to
   be selected for a single multi-packet transaction.  For an example of
   the use of hash-based ECMP selection in anycast service distribution,
   see [ISC-TN-2004-1].

   Other ECMP selection algorithms are commonly available, including
   those in which packets from the same flow are not guaranteed to be
   routed towards the same destination.  ECMP algorithms which select a
   route on a per-packet basis rather than per-flow are commonly
   referred to as performing "Per Packet Load Balancing" (PPLB).
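   The per-flow property that hash-based ECMP provides can be
   illustrated with a minimal sketch (a generic illustration of the
   technique, not the specific implementation cited above; the addresses
   and next-hop names are hypothetical).  Because the hash is computed
   over the packet's flow identifiers, every packet of a given TCP or
   UDP flow maps to the same next hop, so a multi-packet transaction
   always reaches the same anycast node.

```python
import hashlib

def ecmp_next_hop(src_ip, dst_ip, src_port, dst_port, proto, next_hops):
    """Per-flow ECMP: hash the 5-tuple so that all packets belonging to
    one flow are forwarded via the same equal-cost next hop."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]

# Two equal-cost paths leading to different anycast nodes:
hops = ["node-A", "node-B"]
# All packets of one DNS-over-UDP flow select the same node:
choice = ecmp_next_hop("192.0.2.1", "198.51.100.1", 49152, 53, "udp", hops)
```

   A per-packet (PPLB) selector, by contrast, would pick a next hop
   independently for each packet, which is exactly the behaviour that
   can split one transaction across multiple anycast nodes.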
474 With respect to anycast service distribution, some uses of PPLB may 475 cause different packets from a single multi-packet transaction sent 476 by a client to be delivered to different anycast nodes, effectively 477 making the anycast service unavailable. Whether this affects 478 specific anycast services will depend on how and where anycast nodes 479 are deployed within the routing system, and on where the PPLB is 480 being performed: 482 1. PPLB across multiple, parallel links between the same pair of 483 routers should cause no node selection problems; 485 2. PPLB across diverse paths within a single autonomous system (AS), 486 where the paths converge to a single exit as they leave the AS, 487 should cause no node selection problems; 489 3. PPLB across links to different neighbour ASes where the neighbour 490 ASes have selected different nodes for a particular anycast 491 destination will, in general, cause request packets to be 492 distributed across multiple anycast nodes. This will have the 493 effect that the anycast service is unavailable to clients 494 downstream of the router performing PPLB. 496 The uses of PPLB which have the potential to interact badly with 497 anycast service distribution can also cause persistent packet 498 reordering. A network path that persistently reorders segments will 499 degrade the performance of traffic carried by TCP [Allman2000]. TCP, 500 according to several documented measurements, accounts for the bulk 501 of traffic carried on the Internet ([McCreary2000], [Fomenkov2004]). 502 Consequently, in many cases it is reasonable to consider networks 503 making such use of PPLB to be pathological. 505 4.4.4. Route Dampening 507 Frequent advertisements and withdrawals of individual prefixes in BGP 508 are known as flaps. 
Rapid flapping can lead to CPU exhaustion on 509 routers quite remote from the source of the instability, and for this 510 reason rapid route oscillations are frequently "dampened", as 511 described in [RFC2439]. 513 A dampened path will be suppressed by routers for an interval which 514 increases according to the frequency of the observed oscillation; a 515 suppressed path will not propagate. Hence a single router can 516 prevent the propagation of a flapping prefix to the rest of an 517 autonomous system, affording other routers in the network protection 518 from the instability. 520 Some implementations of flap dampening penalise oscillating 521 advertisements based on the observed AS_PATH, and not on the NLRI. 522 For this reason, network instability which leads to route flapping 523 from a single anycast node ought not to cause advertisements from 524 other nodes (which have different AS_PATH attributes) to be dampened. 526 To limit the opportunity of such implementations to penalise 527 advertisements originating from different Anycast Nodes in response 528 to oscillations from just a single node, care should be taken to 529 arrange that the AS_PATH attributes on routes from different nodes 530 are as diverse as possible. For example, Anycast Nodes should use 531 the same origin AS for their advertisements, but might have different 532 upstream ASes. 534 Where different implementations of flap dampening are prevalent, 535 individual nodes' instability may result in stable nodes becoming 536 unavailable. In mitigation, the following measures may be useful: 538 1. Judicious deployment of Local Nodes in combination with 539 especially stable Global Nodes (with high inter-AS path splay, 540 redundant hardware, power, etc) may help limit oscillation 541 problems to the Local Nodes' limited regions of influence; 543 2. Aggressive flap-dampening of the service prefix close to the 544 origin (e.g. 
within an Anycast Node, or in adjacent ASes of each 545 Anycast Node) may also help reduce the opportunity of remote ASes 546 to see oscillations at all. 548 4.4.5. Reverse Path Forwarding Checks 550 Reverse Path Forwarding (RPF) checks, first described in [RFC2267], 551 are commonly deployed as part of ingress interface packet filters on 552 routers in the Internet in order to deny packets whose source 553 addresses are spoofed (see also RFC 2827 [RFC2827]). Deployed 554 implementations of RPF make several modes of operation available 555 (e.g. "loose" and "strict"). 557 Some modes of RPF can cause non-spoofed packets to be denied when 558 they originate from multi-homed site, since selected paths might 559 legitimately not correspond with the ingress interface of non-spoofed 560 packets from the multi-homed site. This issue is discussed in 561 [RFC3704]. 563 A collection of anycast nodes deployed across the Internet is largely 564 indistinguishable from a distributed, multi-homed site to the routing 565 system, and hence this risk also exists for anycast nodes, even if 566 individual nodes are not multi-homed. Care should be taken to ensure 567 that each anycast node is treated as a multi-homed network, and that 568 the corresponding recommendations in [RFC3704] with respect to RPF 569 checks are heeded. 571 4.4.6. Propagation Scope 573 In the context of Anycast service distribution across the global 574 Internet, Global Nodes are those which are capable of providing 575 service to clients anywhere in the network; reachability information 576 for the service is propagated globally, without restriction, by 577 advertising the routes covering the Service Addresses for global 578 transit to one or more providers. 580 More than one Global Node can exist for a single service (and indeed 581 this is often the case, for reasons of redundancy and load-sharing). 
In contrast, it is sometimes desirable to deploy an Anycast Node which only provides services to a local catchment of autonomous systems, and which is deliberately not available to the entire Internet; such nodes are referred to in this document as Local Nodes.  An example of circumstances in which a Local Node may be appropriate is a node designed to serve a region with rich internal connectivity but unreliable, congested or expensive access to the rest of the Internet.

Local Nodes advertise covering routes for Service Addresses in such a way that their propagation is restricted.  This might be done using well-known community string attributes such as NO_EXPORT [RFC1997] or NOPEER [RFC3765], by arranging with peers to apply a conventional "peering" import policy instead of a "transit" import policy, or by some suitable combination of measures.

Advertisements of reachability to Service Addresses from Local Nodes should ideally be made using a routing policy that requires the presence of explicit attributes for propagation, rather than relying on implicit (default) policy.  Inadvertent propagation of a route beyond its intended horizon can result in capacity problems for Local Nodes which might degrade service performance network-wide.

4.4.7.  Other Peoples' Networks

When Anycast services are deployed across networks operated by others, their reachability is dependent on routing policies and topology changes (planned and unplanned) which are unpredictable and sometimes difficult to identify.  Since the routing system may include networks operated by multiple, unrelated organisations, the possibility of unforeseen interactions resulting from combinations of unrelated changes also exists.
The stability and predictability of such a routing system should be taken into consideration when assessing the suitability of anycast as a distribution strategy for particular services and protocols (see also Section 4.1).

By way of mitigation, routing policies used by Anycast Nodes across such routing systems should be conservative; individual nodes' internal and external/connecting infrastructure should be scaled to support loads far in excess of the average; and the service should be monitored proactively from many points in order to avoid unpleasant surprises (see Section 5.1).

4.4.8.  Aggregation Risks

The propagation of a single route for each anycast service does not scale well for routing systems in which the load of routing information to be carried is a concern, and where there are potentially many services to distribute.  For example, an autonomous system which provides services to the Internet with N Service Addresses covered by a single exported route would need to advertise (N+1) routes if each of those services were to be distributed using anycast.

The common practice of applying minimum prefix-length filters in import policies on the Internet (see Section 4.4.2) means that, for a route covering a Service Address to be usefully propagated, the prefix length must be substantially less than that required to advertise just the host route.  Widespread advertisement of short prefixes for individual services hence also has a negative impact on address conservation.

Both of these issues can be mitigated to some extent by the use of a single covering prefix to accommodate multiple Service Addresses, as described in Section 4.8.  This implies a de-coupling of the route advertisement from individual service availability (see Section 4.4.1), however, with attendant risks to the stability of the service as a whole (see Section 4.7).
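The arithmetic above can be illustrated with a short sketch.  Here a single covering prefix accommodates several Service Addresses, so one route suffices where per-service routes plus the aggregate would require N+1.  The addresses are drawn from the 192.0.2.0/24 documentation range and the service labels are purely illustrative:

```python
import ipaddress

# Hypothetical Service Addresses for several distinct services,
# all numbered out of a single netblock (illustrative only).
service_addresses = [
    ipaddress.ip_address("192.0.2.1"),   # e.g. DNS
    ipaddress.ip_address("192.0.2.2"),   # e.g. NTP
    ipaddress.ip_address("192.0.2.3"),   # e.g. FTP archive
]

covering_prefix = ipaddress.ip_network("192.0.2.0/24")

# With one covering prefix, a single route reaches all services.
assert all(addr in covering_prefix for addr in service_addresses)

# With one route per service plus the aggregate, the routing system
# would instead carry N+1 routes, as described above.
n = len(service_addresses)
print(f"routes with aggregation: 1; without: {n + 1}")  # -> 1 vs 4
```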
In general, the scaling problems described here prevent anycast from being a useful, general approach for service distribution on the global Internet.  It remains, however, a useful technique for distributing a limited number of Internet-critical services, as well as in smaller networks where the aggregation concerns discussed here do not apply.

4.5.  Addressing Considerations

Service Addresses should be unique within the routing system that connects all Anycast Nodes to all possible clients of the service.  Service Addresses must also be chosen so that the corresponding routes will be allowed to propagate within that routing system.

For an IPv4-numbered service deployed across the Internet, for example, an address might be chosen from a block where the minimum RIR allocation size is 24 bits, and reachability to that address might be provided by originating the covering 24-bit prefix.

For an IPv4-numbered service deployed within a private network, a locally-unused [RFC1918] address might be chosen, and reachability to that address might be signalled using a (32-bit) host route.

For IPv6-numbered services, Anycast Addresses are not scoped differently from unicast addresses.  As such, the guidelines presented for IPv4 with respect to address suitability also apply to IPv6.  Note that historical prohibitions on anycast distribution of services over IPv6 have been removed from the IPv6 addressing specification in [I-D.ietf-ipv6-addr-arch-v4].

4.6.  Data Synchronisation

Although some services have been deployed in localised form (such that clients from particular regions are presented with regionally-relevant content), many services have the property that responses to client requests should be consistent, regardless of where the request originates.
For a service distributed using anycast, this implies that different Anycast Nodes must operate in a consistent manner and, where that consistent behaviour is based on a data set, that the data concerned be synchronised between nodes.

The mechanism by which data is synchronised depends on the nature of the service; examples are zone transfers for authoritative DNS servers and rsync for FTP archives.  In general, the synchronisation of data between Anycast Nodes will involve transactions between non-anycast addresses.

Data synchronisation across public networks should be carried out with appropriate authentication and encryption.

4.7.  Node Autonomy

For an Anycast deployment whose goals include improved reliability through redundancy, it is important to minimise the opportunity for a single defect to compromise many (or all) nodes, or for the failure of one node to trigger a cascading failure that brings down successive nodes until the service as a whole is defeated.

Co-dependencies are avoided by making each node as autonomous and self-sufficient as possible.  The degree to which nodes can survive failure elsewhere depends on the nature of the service being delivered, but for services which accommodate disconnected operation (e.g. the timed propagation of changes between master and slave servers in the DNS) a high degree of autonomy can be achieved.

The possibility of cascading failure due to load can also be reduced by the deployment of both Global and Local Nodes for a single service, since the effective fail-over path of traffic is, in general, from Local Node to Global Node; traffic that might sink one Local Node is unlikely to sink all Local Nodes, except in the most degenerate cases.
The chance of cascading failure due to a software defect in an operating system or server can be reduced in many cases by deploying nodes running different implementations of operating system, server software, routing protocol software, etc., such that a defect which appears in a single component does not affect the whole system.

It should be noted that these approaches to increasing node autonomy are, to varying degrees, contrary to the practical goal of making a deployed service straightforward to operate.  A service which is over-complex is more likely to suffer from operator error than a service which is more straightforward to run.  Careful consideration should be given to all of these aspects so that an appropriate balance may be found.

4.8.  Multi-Service Nodes

For a service distributed across a routing system where covering prefixes are required to announce reachability to a single Service Address (see Section 4.4.2), special consideration is required in the case where multiple services need to be distributed across a single set of nodes.  This results from the requirement to signal the availability of individual services to the routing system so that requests for service are not received by nodes which are not able to process them (see Section 4.4.1).

Several approaches are described in the following sections.

4.8.1.  Multiple Covering Prefixes

Each Service Address is chosen such that only one Service Address is covered by each advertised prefix.  Advertisement and withdrawal of a single covering prefix can then be tightly coupled to the availability of the single associated service.

This is the most straightforward approach.  However, since it makes very poor use of globally-unique addresses, it is only suitable for a small number of critical, infrastructural services such as root DNS servers.
General Internet-wide deployment of services using this approach will not scale.

4.8.2.  Pessimistic Withdrawal

Multiple Service Addresses are chosen such that they are covered by a single prefix.  Advertisement and withdrawal of the single covering prefix is coupled to the availability of all associated services; if any individual service becomes unavailable, the covering prefix is withdrawn.

The coupling between service availability and advertisement of the covering prefix is complicated by the requirement that all Service Addresses be available: the announcement needs to be triggered by the presence of all component routes, and not just a single covered route.

The fact that a single malfunctioning service causes all deployed services in a node to be taken off-line may make this approach unsuitable for many applications.

4.8.3.  Intra-Node Interior Connectivity

Multiple Service Addresses are chosen such that they are covered by a single prefix.  Advertisement and withdrawal of the single covering prefix is coupled to the availability of any one service.  Nodes have interior connectivity (e.g. using tunnels), and host routes for service addresses are distributed using an IGP which extends to include routers at all nodes.

In the event that a service is unavailable at one node, but available at other nodes, a request may be routed over the interior network from the receiving node towards some other node for processing.

In the event that some local services in a node are down and the node is disconnected from other nodes, continued advertisement of the covering prefix might cause requests to be black-holed.
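The two coupling rules described in Sections 4.8.2 and 4.8.3 differ only in how per-service health is combined into a single advertisement decision for the covering prefix.  A minimal sketch of that distinction (service names and health states are hypothetical):

```python
# Hypothetical health states for services sharing one covering prefix.
services_up = {"dns": True, "ntp": False, "ftp": True}

def advertise_pessimistic(health):
    """Section 4.8.2: withdraw the covering prefix if ANY service is down."""
    return all(health.values())

def advertise_with_interior(health):
    """Section 4.8.3: keep advertising while ANY service is up, relying
    on interior (e.g. tunnelled) host routes to hand requests for
    locally-unavailable services to other nodes."""
    return any(health.values())

print(advertise_pessimistic(services_up))    # False: one service is down
print(advertise_with_interior(services_up))  # True: at least one is up
```

Under the second rule, the black-holing risk noted above corresponds to the case where the node is advertising but has lost its interior paths to the other nodes.
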
This approach allows reasonable address utilisation of the netblock covered by the announced prefix, at the expense of reduced autonomy of individual nodes; the IGP in which all nodes participate can be viewed as a single point of failure.

5.  Service Management

5.1.  Monitoring

Monitoring a distributed service is more complex than monitoring a non-distributed service, since the observed accuracy and availability of the service is, in general, different when viewed from clients attached to different parts of the network.  When a problem is identified, it is also not always obvious which node served the request, and hence which node is malfunctioning.

It is recommended that distributed services be monitored from probes distributed representatively across the routing system and, where possible, that the identity of the node answering individual requests be recorded along with performance and availability statistics.  The RIPE NCC DNSMON service [1] is an example of such monitoring for the DNS.

Monitoring the routing system (from a variety of places, in the case of routing systems where perspective is relevant) can also provide useful diagnostics for troubleshooting service availability.  This can be achieved using dedicated probes, or public route measurement facilities on the Internet such as the RIPE NCC Routing Information Service [2] and the University of Oregon Route Views Project [3].

Monitoring the health of the component devices in an Anycast deployment of a service (hosts, routers, etc.) is straightforward, and can be achieved using the same tools and techniques commonly used to manage other network-connected infrastructure, without the additional complexity involved in monitoring Anycast service addresses.

6.  Security Considerations

6.1.  Denial-of-Service Attack Mitigation

This document describes mechanisms for deploying services on the Internet which can be used to mitigate vulnerability to attack:

1.  An Anycast Node can act as a sink for attack traffic originated within its sphere of influence, preventing nodes elsewhere from having to deal with that traffic;

2.  The task of dealing with attack traffic whose sources are widely distributed is itself distributed across all the nodes which contribute to the service.  Since the problem of sorting between legitimate and attack traffic is distributed, this may lead to better scaling properties than for a service which is not distributed.

6.2.  Service Compromise

The distribution of a service across several (or many) autonomous nodes imposes an increased monitoring and systems administration burden on the operator of the service, which might reduce the effectiveness of host and router security.

The potential benefit of being able to take compromised servers off-line without compromising the service can only be realised if there are working procedures for doing so quickly and reliably.

6.3.  Service Hijacking

It is possible that an unauthorised party might advertise routes corresponding to anycast Service Addresses across a network, and by doing so capture legitimate request traffic or process requests in a manner which compromises the service (or both).  A rogue Anycast Node might be difficult for clients or the operator of the service to detect.

The risk of service hijacking by manipulation of the routing system exists regardless of whether a service is distributed using anycast.  However, the fact that legitimate Anycast Nodes are observable in the routing system may make it more difficult to detect rogue nodes.

7.  Protocol Considerations

This document does not impose any protocol considerations.

8.  IANA Considerations

This document requests no action from IANA.

9.  Acknowledgements

The authors gratefully acknowledge the contributions from various participants of the grow working group, and in particular Geoff Huston, Pekka Savola, Danny McPherson, Ben Black and Alan Barrett.

This work was supported by the US National Science Foundation (research grant SCI-0427144) and DNS-OARC.

10.  References

10.1.  Normative References

[I-D.ietf-ipv6-addr-arch-v4]
           Hinden, R. and S. Deering, "IP Version 6 Addressing Architecture", draft-ietf-ipv6-addr-arch-v4-04 (work in progress), May 2005.

[RFC0793]  Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981.

[RFC1771]  Rekhter, Y. and T. Li, "A Border Gateway Protocol 4 (BGP-4)", RFC 1771, March 1995.

[RFC1918]  Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and E. Lear, "Address Allocation for Private Internets", BCP 5, RFC 1918, February 1996.

[RFC1997]  Chandrasekeran, R., Traina, P., and T. Li, "BGP Communities Attribute", RFC 1997, August 1996.

[RFC2439]  Villamizar, C., Chandra, R., and R. Govindan, "BGP Route Flap Damping", RFC 2439, November 1998.

[RFC2827]  Ferguson, P. and D. Senie, "Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing", BCP 38, RFC 2827, May 2000.

[RFC3704]  Baker, F. and P. Savola, "Ingress Filtering for Multihomed Networks", BCP 84, RFC 3704, March 2004.

10.2.  Informative References

[Allman2000]
           Allman, M. and E. Blanton, "On Making TCP More Robust to Packet Reordering", January 2000.

[Fomenkov2004]
           Fomenkov, M., Keys, K., Moore, D., and k. claffy, "Longitudinal Study of Internet Traffic from 1999-2003", January 2004.

[ISC-TN-2003-1]
           Abley, J., "Hierarchical Anycast for Global Service Distribution", March 2003.
[ISC-TN-2004-1]
           Abley, J., "A Software Approach to Distributing Requests for DNS Service using GNU Zebra, ISC BIND 9 and FreeBSD", March 2004.

[McCreary2000]
           McCreary, S. and k. claffy, "Trends in Wide Area IP Traffic Patterns: A View from Ames Internet Exchange", September 2000.

[RFC1546]  Partridge, C., Mendez, T., and W. Milliken, "Host Anycasting Service", RFC 1546, November 1993.

[RFC2267]  Ferguson, P. and D. Senie, "Network Ingress Filtering: Defeating Denial of Service Attacks which employ IP Source Address Spoofing", RFC 2267, January 1998.

[RFC3765]  Huston, G., "NOPEER Community for Border Gateway Protocol (BGP) Route Scope Control", RFC 3765, April 2004.

URIs

[1]

[2]

[3]

Appendix A.  Change History

This section should be removed before publication.

draft-kurtis-anycast-bcp-00: Initial draft.  Discussed at IETF 61 in the grow meeting and adopted as a working group document shortly afterwards.

draft-ietf-grow-anycast-00: Missing and empty sections completed; some structural reorganisation; general wordsmithing.  Document discussed at IETF 62.

draft-ietf-grow-anycast-01: This appendix added; acknowledgements section added; commentary on RFC3513 prohibition of anycast on hosts removed; minor sentence re-casting and related jiggery-pokery.  This revision published for discussion at IETF 63.

draft-ietf-grow-anycast-02: Normative reference to [I-D.ietf-ipv6-addr-arch-v4] added (in the RFC Editor's queue at the time of writing; reference should be updated to an RFC number when available).  Added commentary on per-packet load balancing.

draft-ietf-grow-anycast-03: Editorial changes and language clean-up at the request of the IESG.

Authors' Addresses

Joe Abley
Internet Systems Consortium, Inc.
950 Charter Street
Redwood City, CA  94063
USA

Phone: +1 650 423 1317
Email: jabley@isc.org
URI:   http://www.isc.org/

Kurt Erik Lindqvist
Netnod Internet Exchange
Bellmansgatan 30
118 47 Stockholm
Sweden

Email: kurtis@kurtis.pp.se
URI:   http://www.netnod.se/

Intellectual Property Statement

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights.  Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard.  Please address the information to the IETF at ietf-ipr@ietf.org.
Disclaimer of Validity

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

Copyright (C) The Internet Society (2006).  This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

Acknowledgment

Funding for the RFC Editor function is currently provided by the Internet Society.