Network Working Group                                          J. Abley
Internet-Draft                                           Afilias Canada
Expires: July 5, 2006                                      K. Lindqvist
                                                Netnod Internet Exchange
                                                            January 2006

                     Operation of Anycast Services
                       draft-ietf-grow-anycast-04

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 5, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   As the Internet has grown, and as systems and networked services
   within enterprises have become more pervasive, many services with
   high availability requirements have emerged.  These requirements
   have increased the demands on the reliability of the infrastructure
   on which those services rely.

   Various techniques have been employed to increase the availability
   of services deployed on the Internet.  This document presents
   commentary and recommendations for distribution of services using
   anycast.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Anycast Service Distribution
     3.1.  General Description
     3.2.  Goals
   4.  Design
     4.1.  Protocol Suitability
     4.2.  Node Placement
     4.3.  Routing Systems
       4.3.1.  Anycast within an IGP
       4.3.2.  Anycast within the Global Internet
     4.4.  Routing Considerations
       4.4.1.  Signalling Service Availability
       4.4.2.  Covering Prefix
       4.4.3.  Equal-Cost Paths
       4.4.4.  Route Dampening
       4.4.5.  Reverse Path Forwarding Checks
       4.4.6.  Propagation Scope
       4.4.7.  Other Peoples' Networks
       4.4.8.  Aggregation Risks
     4.5.  Addressing Considerations
     4.6.  Data Synchronisation
     4.7.  Node Autonomy
     4.8.  Multi-Service Nodes
       4.8.1.  Multiple Covering Prefixes
       4.8.2.  Pessimistic Withdrawal
       4.8.3.  Intra-Node Interior Connectivity
     4.9.  Node Identification by Clients
   5.  Service Management
     5.1.  Monitoring
   6.  Security Considerations
     6.1.  Denial-of-Service Attack Mitigation
     6.2.  Service Compromise
     6.3.  Service Hijacking
   7.  Protocol Considerations
   8.  IANA Considerations
   9.  Acknowledgements
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Appendix A.  Change History
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   To distribute a service using anycast, the service is first
   associated with a stable set of IP addresses, and reachability to
   those addresses is advertised in a routing system from multiple,
   independent service nodes.  Various techniques for anycast
   deployment of services are discussed in [RFC1546], [ISC-TN-2003-1]
   and [ISC-TN-2004-1].

   The techniques and considerations described in this document apply
   to services reachable over both IPv4 and IPv6.

   Anycast has in recent years become increasingly popular for adding
   redundancy to DNS servers, complementing the redundancy which the
   DNS architecture itself already provides.  Several root DNS server
   operators have distributed their servers widely around the Internet,
   and both resolver and authority servers are commonly distributed
   within the networks of service providers.  Anycast distribution has
   been used by commercial DNS authority server operators for several
   years.  The use of anycast is not limited to the DNS, although
   anycast imposes some additional limitations on the nature of the
   service being distributed, including transaction longevity,
   transaction state held on servers, and data synchronisation
   capabilities.

   Although anycast is conceptually simple, its implementation
   introduces some pitfalls for operation of services.  For example,
   monitoring the availability of the service becomes more difficult:
   the observed availability changes according to the location of the
   client within the network, and the population of clients using
   individual anycast nodes is neither static nor reliably
   deterministic.

   This document describes the use of anycast for both local-scope
   distribution of services using an Interior Gateway Protocol (IGP)
   and global distribution using the Border Gateway Protocol (BGP)
   [RFC4271].  Many of the issues for monitoring and data
   synchronisation are common to both, but deployment issues differ
   substantially.

2.  Terminology

   Service Address: an IP address associated with a particular service
      (e.g. the destination address used by DNS resolvers to reach a
      particular authority server).
   Anycast: the practice of making a particular Service Address
      available in multiple, discrete, autonomous locations, such that
      datagrams sent to that address are routed to one of several
      available locations.

   Anycast Node: an internally-connected collection of hosts and
      routers which together provide service for an anycast Service
      Address.  An Anycast Node might be as simple as a single host
      participating in a routing system with adjacent routers, or it
      might include a number of hosts connected in some more elaborate
      fashion; in either case, to the routing system across which the
      service is being anycast, each Anycast Node presents a unique
      path to the Service Address.  The entire anycast system for the
      service consists of two or more separate Anycast Nodes.

   Catchment: in physical geography, an area drained by a river, also
      known as a drainage basin.  By analogy, as used in this document,
      the topological region of a network within which packets directed
      at an anycast address are routed to one particular node.

   Local-Scope Anycast: reachability information for the anycast
      Service Address is propagated through a routing system in such a
      way that a particular anycast node is only visible to a subset of
      the whole routing system.

   Local Node: an Anycast Node providing service using a Local-Scope
      Anycast address.

   Global-Scope Anycast: reachability information for the anycast
      Service Address is propagated through a routing system in such a
      way that a particular anycast node is potentially visible to the
      whole routing system.

   Global Node: an Anycast Node providing service using a Global-Scope
      Anycast address.

3.  Anycast Service Distribution

3.1.  General Description

   Anycast is the name given to the practice of making a Service
   Address available to a routing system at Anycast Nodes in two or
   more discrete locations.  The service provided by each node is
   generally consistent regardless of the particular node chosen by the
   routing system to handle a particular request (although some
   services may benefit from deliberate differences in the behaviours
   of individual nodes, in order to facilitate locality-specific
   behaviour; see Section 4.6).

   For services distributed using anycast, there is no inherent
   requirement for referrals to other servers or name-based service
   distribution ("round-robin DNS"), although those techniques could be
   combined with anycast service distribution if an application
   required it.  The routing system decides which node is used for each
   request, based on the topological design of the routing system and
   the point in the network at which the request originates.

   The Anycast Node chosen to service a particular query can be
   influenced by the traffic engineering capabilities of the routing
   protocols which make up the routing system.  The degree of influence
   available to the operator of the node depends on the scale of the
   routing system within which the Service Address is anycast.

   Load-balancing between Anycast Nodes is typically difficult to
   achieve: the distribution of requests and traffic between nodes is
   generally unbalanced.  Distribution of load between nodes for the
   purposes of reliability, and coarse-grained distribution of load for
   the purposes of making popular services scalable, can often be
   achieved, however.
   The scale of the routing system through which a service is anycast
   can vary from a small Interior Gateway Protocol (IGP) connecting a
   small handful of components, to the Border Gateway Protocol (BGP)
   [RFC4271] connecting the global Internet, depending on the nature of
   the service distribution that is required.

3.2.  Goals

   A service may be anycast for a variety of reasons.  A number of
   common objectives are:

   1.  Coarse ("unbalanced") distribution of load across nodes, to
       allow infrastructure to scale to increased numbers of queries
       and to accommodate transient query peaks;

   2.  Mitigation of non-distributed denial-of-service attacks by
       localising damage to single anycast nodes;

   3.  Constraint of distributed denial-of-service attacks or flash
       crowds to local regions around anycast nodes.  Anycast
       distribution of a service provides the opportunity for traffic
       to be handled closer to its source, perhaps using high-
       performance peering links rather than oversubscribed, paid
       transit circuits;

   4.  Provision of additional information to help identify the
       location of traffic sources in the case of attack (or query)
       traffic which incorporates spoofed source addresses.  This
       information is derived from the property of anycast service
       distribution that the selection of the Anycast Node used to
       service a particular query may be related to the topological
       source of the request;

   5.  Improvement of query response time, by reducing the network
       distance between client and server with the provision of a local
       Anycast Node.  The extent to which query response time is
       improved depends on the way that nodes are selected for the
       clients by the routing system.  Topological nearness within the
       routing system does not, in general, correlate to round-trip
       performance across a network; in some cases response times may
       see no reduction, and may increase;

   6.  Reduction of a list of servers to a single, distributed address.
       For example, a large number of authoritative nameservers for a
       zone may be deployed using a small set of anycast Service
       Addresses; this approach can increase the accessibility of zone
       data in the DNS without increasing the size of a referral
       response from a nameserver authoritative for the parent zone.

4.  Design

4.1.  Protocol Suitability

   When a service is anycast between two or more nodes, the routing
   system makes the node selection decision on behalf of a client.
   Since it is usually a requirement that a single client-server
   interaction is carried out between a client and the same server node
   for the duration of the transaction, it follows that the routing
   system's node selection decision ought to be stable for
   substantially longer than the expected transaction time, if the
   service is to be provided reliably.

   Some services have very short transaction times, and may even be
   carried out using a single packet request and a single packet reply
   (e.g. DNS transactions over UDP transport).  Other services involve
   far longer-lived transactions (e.g. bulk file downloads and audio-
   visual media streaming).

   Services may be anycast within very predictable routing systems,
   which can remain stable for long periods of time (e.g. anycast
   within a well-managed and topologically-simple IGP, where node
   selection changes only occur as a response to node failures).
   Other deployments have far less predictable characteristics (see
   Section 4.4.7).

   The stability of the routing system, together with the transaction
   time of the service, should be carefully compared when deciding
   whether a service is suitable for distribution using anycast.  In
   some cases, for new protocols, it may be practical to split large
   transactions into an initialisation phase which is handled by
   anycast servers, and a sustained phase which is provided by non-
   anycast servers, perhaps chosen during the initialisation phase.

   This document deliberately avoids prescribing rules as to which
   protocols or services are suitable for distribution by anycast; to
   attempt to do so would be presumptuous.

4.2.  Node Placement

   Decisions as to where Anycast Nodes should be placed will depend to
   a large extent on the goals of the service distribution.  For
   example:

   o  A DNS recursive resolver service might be distributed within an
      ISP's network, one Anycast Node per site.

   o  A root DNS server service might be distributed throughout the
      Internet; Anycast Nodes could be located in regions with poor
      external connectivity to ensure that the DNS functions adequately
      within the region during times of external network failure.

   o  An FTP mirror service might include local nodes located at
      exchange points, so that ISPs connected to that exchange point
      could download bulk data more cheaply than if they had to use
      expensive transit circuits.

   In general, node placement decisions should be made with
   consideration of likely traffic requirements, the potential for
   flash crowds or denial-of-service traffic, the stability of the
   local routing system, and the failure modes with respect to node
   failure or local routing system failure.

4.3.  Routing Systems

4.3.1.  Anycast within an IGP

   There are several common motivations for the distribution of a
   Service Address within the scope of an IGP:

   1.  to improve service response times, by hosting a service close to
       other users of the network;

   2.  to improve service reliability by providing automatic fail-over
       to backup nodes; and

   3.  to keep service traffic local, to avoid congesting wide-area
       links.

   In each case the decisions as to where and how services are
   provisioned can be made by network engineers without requiring such
   operational complexities as regional variances in the configuration
   of client computers, or deliberate DNS incoherence (causing DNS
   queries to yield different answers depending on where the queries
   originate).

   When a service is anycast within an IGP, the routing system is
   typically under the control of the same organisation that is
   providing the service, and hence the relationship between service
   transaction characteristics and network stability is likely to be
   well understood.  This technique is consequently applicable to a
   larger number of applications than Internet-wide anycast service
   distribution (see Section 4.1).

   An IGP will generally have no inherent restriction on the length of
   prefix that can be introduced to it.  In this case there is no need
   to construct a covering prefix for particular Service Addresses;
   host routes corresponding to the Service Address can instead be
   introduced to the routing system.  See Section 4.4.2 for more
   discussion of the requirement for a covering prefix.
   IGPs often feature little or no aggregation of routes, partly due to
   algorithmic complexities in supporting aggregation.  In many
   networks there is little motivation for aggregation in the IGP,
   since the amount of routing information carried is small enough that
   scaling concerns in routers do not arise.  For discussion of
   aggregation risks in other routing systems, see Section 4.4.8.

   By reducing the scope of the IGP to just the hosts providing service
   (together with one or more gateway routers), this technique can be
   applied to the construction of server clusters.  This application is
   discussed in some detail in [ISC-TN-2004-1].

4.3.2.  Anycast within the Global Internet

   Service Addresses may be anycast within the global Internet routing
   system in order to distribute services across the entire network.
   The principal differences between this application and the IGP-scope
   distribution discussed in Section 4.3.1 are that:

   1.  the routing system is, in general, controlled by other people;
       and

   2.  the routing protocol concerned (BGP), and commonly-accepted
       practices in its deployment, impose some additional constraints
       (see Section 4.4).

4.4.  Routing Considerations

4.4.1.  Signalling Service Availability

   When a routing system is provided with reachability information for
   a Service Address from an individual node, packets addressed to that
   Service Address will start to arrive at the node.  Since it is
   essential for the node to be ready to accept requests before they
   start to arrive, a coupling between the routing information and the
   availability of the service at a particular node is desirable.

   Where a routing advertisement from a node corresponds to a single
   Service Address, this coupling might be such that availability of
   the service triggers the route advertisement, and non-availability
   of the service triggers a route withdrawal.  This can be achieved by
   running a routing protocol implementation on the same server that
   provides the service, configured to advertise and withdraw the route
   in conjunction with the availability (and health) of the software
   that processes service requests.  An example of such an arrangement
   for a DNS service is included in [ISC-TN-2004-1].

   Where a routing advertisement from a node corresponds to two or more
   Service Addresses, it may not be appropriate to trigger a route
   withdrawal due to the non-availability of a single service.  Another
   approach in the case where the service is down at one Anycast Node
   is to route requests to a different Anycast Node where the service
   is working normally.  This approach is discussed in Section 4.8.

   Rapid advertisement/withdrawal oscillations can cause operational
   problems, and nodes should be configured such that rapid
   oscillations are avoided (e.g. by implementing a minimum delay
   following a withdrawal before the service can be re-advertised).
   See Section 4.4.4 for a discussion of route oscillations in BGP.
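   As a concrete illustration (not part of this document's
   recommendations), the following minimal sketch couples a health
   check to route advertisement and withdrawal, including the hold-down
   delay described above.  It assumes a BGP speaker in the style of
   ExaBGP, which reads textual "announce"/"withdraw" commands from a
   helper process's standard output; the addresses, port and check
   method are illustrative assumptions only.

      #!/usr/bin/env python3
      # Illustrative sketch only: couple route advertisement to
      # service health.  Assumes an ExaBGP-style speaker reading
      # "announce"/"withdraw" commands from this process's stdout.
      # The Service Address and port are documentation examples.

      import socket
      import sys
      import time

      SERVICE = ("192.0.2.53", 53)            # assumed service
      ROUTE = "192.0.2.53/32 next-hop self"   # host route for it
      HOLD_DOWN = 60   # minimum delay before re-advertising (s)

      def healthy():
          # Crude check: can we open a TCP connection to the service?
          try:
              socket.create_connection(SERVICE, timeout=2).close()
              return True
          except OSError:
              return False

      advertised = False
      withdrawn_at = 0.0
      while True:
          if healthy():
              if not advertised and time.time() - withdrawn_at > HOLD_DOWN:
                  sys.stdout.write("announce route %s\n" % ROUTE)
                  sys.stdout.flush()
                  advertised = True
          elif advertised:
              sys.stdout.write("withdraw route %s\n" % ROUTE)
              sys.stdout.flush()
              advertised = False
              withdrawn_at = time.time()   # start hold-down timer
          time.sleep(5)

   In a real deployment the check would exercise the service itself
   (e.g. a test DNS query) rather than merely opening a connection.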
4.4.2.  Covering Prefix

   In some routing systems (e.g. the BGP-based routing system of the
   global Internet) it is not possible, in general, to propagate a host
   route with confidence that the route will propagate throughout the
   network.  This is a consequence of operational policy, and not a
   protocol restriction.

   In such cases it is necessary to propagate a route which covers the
   Service Address, and which has a sufficiently short prefix that it
   will not be discarded by commonly-deployed import policies.  For
   IPv4 Service Addresses this is often a 24-bit prefix, but there are
   other well-documented examples of IPv4 import policies which filter
   on Regional Internet Registry (RIR) allocation boundaries, and hence
   some experimentation may be prudent.  Corresponding import policies
   for IPv6 prefixes also exist.  See Section 4.5 for more discussion
   of IPv6 Service Addresses and corresponding anycast routes.

   The propagation of a single route per service has some associated
   scaling issues which are discussed in Section 4.4.8.

   Where multiple Service Addresses are covered by the same covering
   route, there is no longer a tight coupling between the advertisement
   of that route and the individual services associated with the
   covered host routes.  The resulting impact on signalling
   availability of individual services is discussed in Section 4.4.1
   and Section 4.8.

4.4.3.  Equal-Cost Paths

   Some routing systems support equal-cost paths to the same
   destination.  Where multiple, equal-cost paths exist and lead to
   different anycast nodes, there is a risk that different request
   packets associated with a single transaction might be delivered to
   more than one node.  Services provided over TCP [RFC0793]
   necessarily involve transactions with multiple request packets, due
   to the TCP setup handshake.

   For services which are distributed across the global Internet using
   BGP, equal-cost paths are normally not a consideration: BGP's exit
   selection algorithm usually selects a single, consistent exit for a
   single destination regardless of whether multiple candidate paths
   exist.  Implementations of BGP exist that support multi-path exit
   selection, however.

   Equal-cost paths are commonly supported in IGPs.  Multi-node
   selection for a single transaction can be avoided in most cases by
   careful consideration of IGP link metrics, or by applying equal-cost
   multi-path (ECMP) selection algorithms which cause a single node to
   be selected for a single multi-packet transaction.  For an example
   of the use of hash-based ECMP selection in anycast service
   distribution, see [ISC-TN-2004-1].

   Other ECMP selection algorithms are commonly available, including
   those in which packets from the same flow are not guaranteed to be
   routed towards the same destination.  ECMP algorithms which select a
   route on a per-packet basis rather than per-flow are commonly
   referred to as performing "Per Packet Load Balancing" (PPLB).

   With respect to anycast service distribution, some uses of PPLB may
   cause different packets from a single multi-packet transaction sent
   by a client to be delivered to different anycast nodes, effectively
   making the anycast service unavailable.  Whether this affects
   specific anycast services will depend on how and where anycast nodes
   are deployed within the routing system, and on where the PPLB is
   being performed:

   1.  PPLB across multiple, parallel links between the same pair of
       routers should cause no node selection problems;

   2.  PPLB across diverse paths within a single autonomous system
       (AS), where the paths converge to a single exit as they leave
       the AS, should cause no node selection problems;
   3.  PPLB across links to different neighbour ASes, where the
       neighbour ASes have selected different nodes for a particular
       anycast destination, will in general cause request packets to be
       distributed across multiple anycast nodes.  This will have the
       effect that the anycast service is unavailable to clients
       downstream of the router performing PPLB.

   The uses of PPLB which have the potential to interact badly with
   anycast service distribution can also cause persistent packet
   reordering.  A network path that persistently reorders segments will
   degrade the performance of traffic carried by TCP [Allman2000].
   TCP, according to several documented measurements, accounts for the
   bulk of traffic carried on the Internet ([McCreary2000],
   [Fomenkov2004]).  Consequently, in many cases it is reasonable to
   consider networks making such use of PPLB to be pathological.
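   The distinction between per-flow and per-packet selection can be
   made concrete with a small sketch.  The following illustrative
   fragment (the hash function and path names are assumptions chosen
   for the example, not router code) selects a next hop by hashing the
   flow's 5-tuple, so that every packet of one transaction follows the
   same path and hence reaches the same anycast node:

      # Illustrative sketch only: per-flow ECMP selection.  Hashing
      # the 5-tuple keeps every packet of one transaction on the same
      # path (and at the same anycast node); per-packet selection
      # (PPLB) offers no such guarantee.

      import hashlib

      def select_path(paths, src, sport, dst, dport, proto):
          # Stable per-flow choice: same 5-tuple, same path.
          key = "%s/%d/%s/%d/%s" % (src, sport, dst, dport, proto)
          digest = hashlib.sha256(key.encode("ascii")).digest()
          return paths[digest[0] % len(paths)]

      paths = ["via-node-A", "via-node-B"]   # two equal-cost paths
      flow = ("198.51.100.7", 40001, "192.0.2.53", 53, "udp")
      chosen = {select_path(paths, *flow) for _ in range(1000)}
      assert len(chosen) == 1   # deterministic for the whole flow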
4.4.4.  Route Dampening

   Frequent advertisements and withdrawals of individual prefixes in
   BGP are known as flaps.  Rapid flapping can lead to CPU exhaustion
   on routers quite remote from the source of the instability, and for
   this reason rapid route oscillations are frequently "dampened", as
   described in [RFC2439].

   A dampened path will be suppressed by routers for an interval which
   increases according to the frequency of the observed oscillation; a
   suppressed path will not propagate.  Hence a single router can
   prevent the propagation of a flapping prefix to the rest of an
   autonomous system, affording other routers in the network protection
   from the instability.

   Some implementations of flap dampening penalise oscillating
   advertisements based on the observed AS_PATH, and not on Network
   Layer Reachability Information (NLRI; see [RFC4271]).  For this
   reason, network instability which leads to route flapping from a
   single anycast node will not generally cause advertisements from
   other nodes (which have different AS_PATH attributes) to be dampened
   by these implementations.

   To limit the opportunity for such implementations to penalise
   advertisements originating from different Anycast Nodes in response
   to oscillations from just a single node, care should be taken to
   arrange that the AS_PATH attributes on routes from different nodes
   are as diverse as possible.  For example, Anycast Nodes should use
   the same origin AS for their advertisements, but might have
   different upstream ASes.

   Where different implementations of flap dampening are prevalent,
   individual nodes' instability may result in stable nodes becoming
   unavailable.  In mitigation, the following measures may be useful:

   1.  Judicious deployment of Local Nodes in combination with
       especially stable Global Nodes (with high inter-AS path splay,
       redundant hardware, power, etc.) may help limit oscillation
       problems to the Local Nodes' limited regions of influence;

   2.  Aggressive flap-dampening of the service prefix close to the
       origin (e.g. within an Anycast Node, or in adjacent ASes of each
       Anycast Node) may also help reduce the opportunity for remote
       ASes to see oscillations at all.

4.4.5.  Reverse Path Forwarding Checks

   Reverse Path Forwarding (RPF) checks, first described in [RFC2267],
   are commonly deployed as part of ingress interface packet filters on
   routers in the Internet in order to deny packets whose source
   addresses are spoofed (see also RFC 2827 [RFC2827]).  Deployed
   implementations of RPF make several modes of operation available
   (e.g. "loose" and "strict").

   Some modes of RPF can cause non-spoofed packets to be denied when
   they originate from a multi-homed site, since the selected paths
   might legitimately not correspond with the ingress interface of non-
   spoofed packets from the multi-homed site.  This issue is discussed
   in [RFC3704].

   A collection of anycast nodes deployed across the Internet is
   largely indistinguishable from a distributed, multi-homed site to
   the routing system, and hence this risk also exists for anycast
   nodes, even if individual nodes are not multi-homed.  Care should be
   taken to ensure that each anycast node is treated as a multi-homed
   network, and that the corresponding recommendations in [RFC3704]
   with respect to RPF checks are heeded.

4.4.6.  Propagation Scope

   In the context of anycast service distribution across the global
   Internet, Global Nodes are those which are capable of providing
   service to clients anywhere in the network; reachability information
   for the service is propagated globally, without restriction, by
   advertising the routes covering the Service Addresses for global
   transit to one or more providers.

   More than one Global Node can exist for a single service (and indeed
   this is often the case, for reasons of redundancy and load-sharing).

   In contrast, it is sometimes desirable to deploy an Anycast Node
   which only provides services to a local catchment of autonomous
   systems, and which is deliberately not available to the entire
   Internet; such nodes are referred to in this document as Local
   Nodes.  Examples of circumstances in which a Local Node may be
   appropriate include nodes designed to serve a region with rich
   internal connectivity but unreliable, congested or expensive access
   to the rest of the Internet.

   Local Nodes advertise covering routes for Service Addresses in such
   a way that their propagation is restricted.  This might be done
   using well-known community string attributes such as NO_EXPORT
   [RFC1997] or NOPEER [RFC3765], or by arranging with peers to apply a
   conventional "peering" import policy instead of a "transit" import
   policy, or some suitable combination of measures.

   Advertisements of reachability to Service Addresses from Local Nodes
   should ideally be made using a routing policy that requires the
   presence of explicit attributes for propagation, rather than relying
   on implicit (default) policy.  Inadvertent propagation of a route
   beyond its intended horizon can result in capacity problems for
   Local Nodes which might degrade service performance network-wide.
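   By way of illustration, the following sketch shows how a Local Node
   might restrict propagation by attaching the well-known NO_EXPORT
   community (65535:65281) to its covering route, while a Global Node
   announces the same route without restriction.  As in the earlier
   sketch, an ExaBGP-style textual command interface is assumed, and
   the prefix shown is a documentation example:

      # Illustrative sketch only: restricting propagation of a
      # covering route with the well-known NO_EXPORT community
      # (RFC 1997).  An ExaBGP-style textual command interface is
      # assumed; the prefix is a documentation example.

      COVERING_PREFIX = "192.0.2.0/24"
      NO_EXPORT = "65535:65281"   # well-known value 0xFFFFFF01

      def advertisement(local_node):
          cmd = "announce route %s next-hop self" % COVERING_PREFIX
          if local_node:
              # Adjacent ASes will not re-export a NO_EXPORT route,
              # keeping this Local Node's catchment small.
              cmd += " community [%s]" % NO_EXPORT
          return cmd

      print(advertisement(local_node=True))    # Local Node
      print(advertisement(local_node=False))   # Global Node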
4.4.7.  Other Peoples' Networks

   When anycast services are deployed across networks operated by
   others, their reachability is dependent on routing policies and
   topology changes (planned and unplanned) which are unpredictable and
   sometimes difficult to identify.  Since the routing system may
   include networks operated by multiple, unrelated organisations, the
   possibility of unforeseen interactions resulting from the
   combination of unrelated changes also exists.

   The stability and predictability of such a routing system should be
   taken into consideration when assessing the suitability of anycast
   as a distribution strategy for particular services and protocols
   (see also Section 4.1).

   By way of mitigation, routing policies used by Anycast Nodes across
   such routing systems should be conservative; individual nodes'
   internal and external/connecting infrastructure should be scaled to
   support loads far in excess of the average; and the service should
   be monitored proactively from many points in order to avoid
   unpleasant surprises (see Section 5.1).

4.4.8.  Aggregation Risks

   The propagation of a single route for each anycast service does not
   scale well for routing systems in which the load of routing
   information which must be carried is a concern, and where there are
   potentially many services to distribute.  For example, an autonomous
   system which provides services to the Internet with N Service
   Addresses covered by a single exported route would need to advertise
   (N+1) routes if each of those services were to be distributed using
   anycast.

   The common practice of applying minimum prefix-length filters in
   import policies on the Internet (see Section 4.4.2) means that, for
   a route covering a Service Address to be usefully propagated, the
   prefix length must be substantially less than that required to
   advertise just the host route.  Widespread advertisement of short
   prefixes for individual services hence also has a negative impact on
   address conservation.

   Both of these issues can be mitigated to some extent by the use of a
   single covering prefix to accommodate multiple Service Addresses, as
   described in Section 4.8.  This implies a de-coupling of the route
   advertisement from individual service availability (see
   Section 4.4.1), however, with attendant risks to the stability of
   the service as a whole (see Section 4.7).

   In general, the scaling problems described here prevent anycast from
   being a useful, general approach for service distribution on the
   global Internet.  It remains, however, a useful technique for
   distributing a limited number of Internet-critical services, as well
   as in smaller networks where the aggregation concerns discussed here
   do not apply.

4.5.  Addressing Considerations

   Service Addresses should be unique within the routing system that
   connects all Anycast Nodes to all possible clients of the service.
   Service Addresses must also be chosen so that corresponding routes
   will be allowed to propagate within that routing system.

   For an IPv4-numbered service deployed across the Internet, for
   example, an address might be chosen from a block where the minimum
   RIR allocation size is 24 bits, and reachability to that address
   might be provided by originating the covering 24-bit prefix.

   For an IPv4-numbered service deployed within a private network, a
   locally-unused [RFC1918] address might be chosen, and reachability
   to that address might be signalled using a (32-bit) host route.

   For IPv6-numbered services, Anycast Addresses are not scoped
   differently from unicast addresses.  As such, the guidelines
   presented for IPv4 with respect to address suitability apply equally
   to IPv6.  Note that historical prohibitions on anycast distribution
   of services over IPv6 have been removed from the IPv6 addressing
   specification in [RFC4291].
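   These relationships between a Service Address, its host route and a
   covering prefix can be checked mechanically.  The following
   illustrative sketch (the addresses are documentation examples) uses
   Python's ipaddress module to confirm that a chosen Service Address
   is covered by a prefix short enough to survive common minimum
   prefix-length import filters:

      # Illustrative sketch only: check that a Service Address sits
      # inside a covering prefix short enough to survive common
      # minimum prefix-length import filters.  Addresses are
      # documentation examples.

      import ipaddress

      service = ipaddress.ip_address("192.0.2.53")
      host_route = ipaddress.ip_network("192.0.2.53/32")  # IGP form
      covering = ipaddress.ip_network("192.0.2.0/24")     # BGP form

      assert service in covering
      assert host_route.subnet_of(covering)
      # Many Internet import policies discard IPv4 prefixes longer
      # than /24:
      assert covering.prefixlen <= 24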
4.6.  Data Synchronisation

   Although some services have been deployed in localised form (such
   that clients from particular regions are presented with regionally-
   relevant content), many services have the property that responses to
   client requests should be consistent, regardless of where the
   request originates.  For a service distributed using anycast, that
   implies that different Anycast Nodes must operate in a consistent
   manner and, where that consistent behaviour is based on a data set,
   that the data concerned be synchronised between nodes.

   The mechanism by which data is synchronised depends on the nature of
   the service; examples are zone transfers for authoritative DNS
   servers and rsync for FTP archives.  In general, the synchronisation
   of data between Anycast Nodes will involve transactions between non-
   anycast addresses.

   Data synchronisation across public networks should be carried out
   with appropriate authentication and encryption.
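   For example, a node might pull its data set from a master server
   over an authenticated, encrypted channel, as in the following sketch
   (the hostname, paths and the choice of rsync over ssh are
   illustrative assumptions):

      # Illustrative sketch only: pull the service data set from a
      # master server over an authenticated, encrypted channel
      # (rsync over ssh).  Hostname and paths are assumptions.

      import subprocess

      MASTER = "master.example.net"   # non-anycast address of master
      SRC = "/srv/ftp/"               # data set on the master
      DST = "/srv/ftp/"               # local copy served by this node

      subprocess.run(
          ["rsync", "-az", "--delete",
           "-e", "ssh",               # authentication and encryption
           "%s:%s" % (MASTER, SRC), DST],
          check=True)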
4.7.  Node Autonomy

   For an anycast deployment whose goals include improved reliability
   through redundancy, it is important to minimise the opportunity for
   a single defect to compromise many (or all) nodes, or for the
   failure of one node to cause a cascading failure, bringing down
   successive additional nodes until the service as a whole is
   defeated.

   Co-dependencies are avoided by making each node as autonomous and
   self-sufficient as possible.  The degree to which nodes can survive
   failure elsewhere depends on the nature of the service being
   delivered, but for services which accommodate disconnected operation
   (e.g. the timed propagation of changes between master and slave
   servers in the DNS) a high degree of autonomy can be achieved.

   The possibility of cascading failure due to load can also be reduced
   by the deployment of both Global and Local Nodes for a single
   service, since the effective fail-over path of traffic is, in
   general, from Local Node to Global Node; traffic that might sink one
   Local Node is unlikely to sink all Local Nodes, except in the most
   degenerate cases.

   The chance of cascading failure due to a software defect in an
   operating system or server can be reduced in many cases by deploying
   nodes running different implementations of operating system, server
   software, routing protocol software, etc., such that a defect which
   appears in a single component does not affect the whole system.

   It should be noted that these approaches to increasing node autonomy
   are, to varying degrees, contrary to the practical goal of making a
   deployed service straightforward to operate.  A service which is
   over-complex is more likely to suffer from operator error than a
   service which is more straightforward to run.  Careful consideration
   should be given to all of these aspects so that an appropriate
   balance may be found.

4.8.  Multi-Service Nodes

   For a service distributed across a routing system where covering
   prefixes are required to announce reachability to a single Service
   Address (see Section 4.4.2), special consideration is required in
   the case where multiple services need to be distributed across a
   single set of nodes.  This results from the requirement to signal
   availability of individual services to the routing system so that
   requests for service are not received by nodes which are not able to
   process them (see Section 4.4.1).

   Several approaches are described in the following sections.

4.8.1.  Multiple Covering Prefixes

   Each Service Address is chosen such that only one Service Address is
   covered by each advertised prefix.  Advertisement and withdrawal of
   a single covering prefix can be tightly coupled to the availability
   of the single associated service.

   This is the most straightforward approach.  However, since it makes
   very poor utilisation of globally-unique addresses, it is only
   suitable for a small number of critical, infrastructural services
   such as root DNS servers.  General Internet-wide deployment of
   services using this approach will not scale.

4.8.2.  Pessimistic Withdrawal

   Multiple Service Addresses are chosen such that they are covered by
   a single prefix.  Advertisement and withdrawal of the single
   covering prefix is coupled to the availability of all associated
   services; if any individual service becomes unavailable, the
   covering prefix is withdrawn.

   The coupling between service availability and advertisement of the
   covering prefix is complicated by the requirement that all Service
   Addresses must be available -- the announcement needs to be
   triggered by the presence of all component routes, and not just a
   single covered route.

   The fact that a single malfunctioning service causes all deployed
   services in a node to be taken off-line may make this approach
   unsuitable for many applications.

4.8.3.  Intra-Node Interior Connectivity

   Multiple Service Addresses are chosen such that they are covered by
   a single prefix.  Advertisement and withdrawal of the single
   covering prefix is coupled to the availability of any one service.
   Nodes have interior connectivity, e.g. using tunnels, and host
   routes for service addresses are distributed using an IGP which
   extends to include routers at all nodes.

   In the event that a service is unavailable at one node, but
   available at other nodes, a request may be routed over the interior
   network from the receiving node towards some other node for
   processing.

   In the event that some local services in a node are down and the
   node is disconnected from other nodes, continued advertisement of
   the covering prefix might cause requests to become black-holed.

   This approach allows reasonable address utilisation of the netblock
   covered by the announced prefix, at the expense of reduced autonomy
   of individual nodes; the IGP in which all nodes participate can be
   viewed as a single point of failure.

4.9.  Node Identification by Clients

   From time to time, all clients of deployed services experience
   problems, and those problems require diagnosis.  A service
   distributed using anycast imposes an additional variable on the
   diagnostic process over a simple, unicast service -- the particular
   anycast node which is handling a client's request.

   In some cases, common network-level diagnostic tools such as
   traceroute may be sufficient to identify the node being used by a
   client.  However, the use of such tools may be beyond the abilities
   of users at the client side of a transaction, and in any case
   network conditions at the time of the problem may change by the time
   such tools are exercised.

   Troubleshooting problems with anycast services is greatly
   facilitated if mechanisms to determine the identity of a node are
   designed into the protocol.  Examples of such mechanisms include the
   NSID option in DNS [I-D.ietf-dnsext-nsid] and the common inclusion
   of hostname information in SMTP servers' initial greeting at session
   initiation [RFC2821].

   Provision of such in-band mechanisms for node identification is
   strongly recommended for services to be distributed using anycast.
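   In the DNS, node identity can also often be determined with the
   conventional CHAOS-class "hostname.bind" query.  The following
   sketch uses the dnspython library for illustration; the server
   address is a documentation example, and support for the convention
   on the server side is an assumption:

      # Illustrative sketch only: ask an anycast DNS server which
      # node answered, using the conventional CHAOS-class TXT query
      # "hostname.bind" (supported by many, but not all, nameserver
      # implementations).  Uses the dnspython library.

      import dns.message
      import dns.query
      import dns.rdataclass
      import dns.rdatatype

      ANYCAST_SERVER = "192.0.2.53"   # documentation example

      query = dns.message.make_query("hostname.bind",
                                     dns.rdatatype.TXT,
                                     rdclass=dns.rdataclass.CH)
      response = dns.query.udp(query, ANYCAST_SERVER, timeout=2)
      for rrset in response.answer:
          print("answered by node:", rrset[0].to_text())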
5.  Service Management

5.1.  Monitoring

   Monitoring a distributed service is more complex than monitoring a
   non-distributed service, since the observed accuracy and
   availability of the service are, in general, different when viewed
   from clients attached to different parts of the network.  When a
   problem is identified, it is also not always obvious which node
   served the request, and hence which node is malfunctioning.

   It is recommended that distributed services be monitored by probes
   distributed representatively across the routing system and, where
   possible, that the identity of the node answering individual
   requests be recorded along with performance and availability
   statistics.  The RIPE NCC DNSMON service [1] is an example of such
   monitoring for the DNS.

   Monitoring the routing system (from a variety of places, in the case
   of routing systems where perspective is relevant) can also provide
   useful diagnostics for troubleshooting service availability.  This
   can be achieved using dedicated probes, or public route measurement
   facilities on the Internet such as the RIPE NCC Routing Information
   Service [2] and the University of Oregon Route Views Project [3].

   Monitoring the health of the component devices in an anycast
   deployment of a service (hosts, routers, etc.) is straightforward,
   and can be achieved using the same tools and techniques commonly
   used to manage other network-connected infrastructure, without the
   additional complexity involved in monitoring anycast service
   addresses.
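   A minimal sketch of the kind of probe recommended above, intended to
   be run from several representative vantage points, is shown below.
   The service address is a documentation example, and the use of
   "hostname.bind" for node identification is an assumption carried
   over from the previous sketch:

      # Illustrative sketch only: a monitoring probe measuring
      # response time and recording which node answered, reusing
      # the "hostname.bind" convention from the previous sketch.

      import time
      import dns.message
      import dns.query
      import dns.rdataclass
      import dns.rdatatype

      SERVICE = "192.0.2.53"   # documentation example

      def probe():
          q = dns.message.make_query("hostname.bind",
                                     dns.rdatatype.TXT,
                                     rdclass=dns.rdataclass.CH)
          start = time.time()
          try:
              r = dns.query.udp(q, SERVICE, timeout=2)
          except Exception:
              print("FAIL: no answer from %s" % SERVICE)
              return
          rtt = (time.time() - start) * 1000.0
          node = r.answer[0][0].to_text() if r.answer else "unknown"
          print("ok node=%s rtt=%.1f ms" % (node, rtt))

      probe()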
6.  Security Considerations

6.1.  Denial-of-Service Attack Mitigation

   This document describes mechanisms for deploying services on the
   Internet which can be used to mitigate vulnerability to attack:

   1.  An Anycast Node can act as a sink for attack traffic originated
       within its sphere of influence, preventing nodes elsewhere from
       having to deal with that traffic;

   2.  The task of dealing with attack traffic whose sources are widely
       distributed is itself distributed across all the nodes which
       contribute to the service.  Since the problem of sorting between
       legitimate and attack traffic is distributed, this may lead to
       better scaling properties than a service which is not
       distributed.

6.2.  Service Compromise

   The distribution of a service across several (or many) autonomous
   nodes imposes an increased monitoring and systems administration
   burden on the operator of the service, which might reduce the
   effectiveness of host and router security.

   The potential benefit of being able to take compromised servers off-
   line without compromising the service can only be realised if there
   are working procedures to do so quickly and reliably.

6.3.  Service Hijacking

   It is possible that an unauthorised party might advertise routes
   corresponding to anycast Service Addresses across a network, and by
   doing so capture legitimate request traffic or process requests in a
   manner which compromises the service (or both).  A rogue Anycast
   Node might be difficult to detect by clients or by the operator of
   the service.

   The risk of service hijacking by manipulation of the routing system
   exists regardless of whether a service is distributed using anycast.
   However, the fact that legitimate Anycast Nodes are observable in
   the routing system may make it more difficult to detect rogue nodes.

   Many protocols which incorporate authentication or integrity
   protection provide those features in a robust fashion, e.g. using
   periodic re-authentication within a single session, or integrity
   protection at either the channel (e.g. [RFC2845], [RFC2487]) or
   message (e.g. [RFC4033], [RFC2311]) levels.  Protocols which are
   less robust may be more vulnerable to session hijacking.  Given the
   greater opportunity for undetected session hijack with anycast
   services, the use of robust protocols is recommended for anycast
   services that require authentication or integrity protection.

7.  Protocol Considerations

   This document does not impose any protocol considerations.

8.  IANA Considerations

   This document requests no action from IANA.

9.  Acknowledgements

   The authors gratefully acknowledge the contributions from various
   participants of the grow working group, and in particular Geoff
   Huston, Pekka Savola, Danny McPherson, Ben Black and Alan Barrett.

   This work was supported by the US National Science Foundation
   (research grant SCI-0427144) and DNS-OARC.

10.  References

10.1.  Normative References

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC1918]  Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G.,
              and E. Lear, "Address Allocation for Private Internets",
              BCP 5, RFC 1918, February 1996.

   [RFC1997]  Chandrasekeran, R., Traina, P., and T. Li, "BGP
              Communities Attribute", RFC 1997, August 1996.

   [RFC2439]  Villamizar, C., Chandra, R., and R. Govindan, "BGP Route
              Flap Damping", RFC 2439, November 1998.

   [RFC2827]  Ferguson, P. and D. Senie, "Network Ingress Filtering:
              Defeating Denial of Service Attacks which employ IP
              Source Address Spoofing", BCP 38, RFC 2827, May 2000.

   [RFC3704]  Baker, F. and P. Savola, "Ingress Filtering for
              Multihomed Networks", BCP 84, RFC 3704, March 2004.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4291]  Hinden, R. and S. Deering, "IP Version 6 Addressing
              Architecture", RFC 4291, February 2006.

10.2.  Informative References

   [Allman2000]
              Allman, M. and E. Blanton, "On Making TCP More Robust to
              Packet Reordering", January 2000.

   [Fomenkov2004]
              Fomenkov, M., Keys, K., Moore, D., and k. claffy,
              "Longitudinal Study of Internet Traffic from 1999-2003",
              January 2004.

   [I-D.ietf-dnsext-nsid]
              Austein, R., "DNS Name Server Identifier Option (NSID)",
              draft-ietf-dnsext-nsid-02 (work in progress), June 2006.

   [ISC-TN-2003-1]
              Abley, J., "Hierarchical Anycast for Global Service
              Distribution", March 2003.

   [ISC-TN-2004-1]
              Abley, J., "A Software Approach to Distributing Requests
              for DNS Service using GNU Zebra, ISC BIND 9 and FreeBSD",
              March 2004.

   [McCreary2000]
              McCreary, S. and k. claffy, "Trends in Wide Area IP
              Traffic Patterns: A View from Ames Internet Exchange",
              September 2000.
Milliken, "Host 1004 Anycasting Service", RFC 1546, November 1993. 1006 [RFC2267] Ferguson, P. and D. Senie, "Network Ingress Filtering: 1007 Defeating Denial of Service Attacks which employ IP Source 1008 Address Spoofing", RFC 2267, January 1998. 1010 [RFC2311] Dusse, S., Hoffman, P., Ramsdell, B., Lundblade, L., and 1011 L. Repka, "S/MIME Version 2 Message Specification", 1012 RFC 2311, March 1998. 1014 [RFC2487] Hoffman, P., "SMTP Service Extension for Secure SMTP over 1015 TLS", RFC 2487, January 1999. 1017 [RFC2821] Klensin, J., "Simple Mail Transfer Protocol", RFC 2821, 1018 April 2001. 1020 [RFC2845] Vixie, P., Gudmundsson, O., Eastlake, D., and B. 1021 Wellington, "Secret Key Transaction Authentication for DNS 1022 (TSIG)", RFC 2845, May 2000. 1024 [RFC3765] Huston, G., "NOPEER Community for Border Gateway Protocol 1025 (BGP) Route Scope Control", RFC 3765, April 2004. 1027 [RFC4033] Arends, R., Austein, R., Larson, M., Massey, D., and S. 1028 Rose, "DNS Security Introduction and Requirements", 1029 RFC 4033, March 2005. 1031 URIs 1033 [1] 1035 [2] 1037 [3] 1039 Appendix A. Change History 1041 This section should be removed before publication. 1043 Intended category: BCP. 1045 draft-kurtis-anycast-bcp-00: Initial draft. Discussed at IETF 61 in 1046 the grow meeting and adopted as a working group document shortly 1047 afterwards. 1049 draft-ietf-grow-anycast-00: Missing and empty sections completed; 1050 some structural reorganisation; general wordsmithing. Document 1051 discussed at IETF 62. 1053 draft-ietf-grow-anycast-01: This appendix added; acknowledgements 1054 section added; commentary on RFC3513 prohibition of anycast on 1055 hosts removed; minor sentence re-casting and related jiggery- 1056 pokery. This revision published for discussion at IETF 63. 1058 draft-ietf-grow-anycast-02: Normative reference to 1059 draft-ietf-ipv6-addr-arch-v4" added (in the RFC editor's queue at 1060 the time of writing; reference should be updated to an RFC number 1061 when available). Added commentary on per-packet load balancing. 1063 draft-ietf-grow-anycast-03: Editorial changes and language clean-up 1064 at the request of the IESG. 1066 draftt-ietf-grow-anycast-04: Replaced reference to RFC1771 with a 1067 reference to RFC4271. Replaced reference to 1068 draft-ietf-ipv6-addr-arch-v4 with a reference to RFC 4291. 1069 Changed author address for Abley. Wordsmithing in response to 1070 Gen-ART review by Sharon Chrisholm and Secdir review by Rob 1071 Austein. Added Section 4.9 at the suggestion of Rob Austein. 1073 Authors' Addresses 1075 Joe Abley 1076 Afilias Canada, Corp. 1077 204 - 4141 Yonge Street 1078 Toronto, ON M2P 2A8 1079 Canada 1081 Phone: +1 416 673 4176 1082 Email: jabley@ca.afilias.info 1083 URI: http://afilias.info/ 1085 Kurt Erik Lindqvist 1086 Netnod Internet Exchange 1087 Bellmansgatan 30 1088 118 47 Stockholm 1089 Sweden 1091 Email: kurtis@kurtis.pp.se 1092 URI: http://www.netnod.se/ 1094 Intellectual Property Statement 1096 The IETF takes no position regarding the validity or scope of any 1097 Intellectual Property Rights or other rights that might be claimed to 1098 pertain to the implementation or use of the technology described in 1099 this document or the extent to which any license under such rights 1100 might or might not be available; nor does it represent that it has 1101 made any independent effort to identify any such rights. Information 1102 on the procedures with respect to rights in RFC documents can be 1103 found in BCP 78 and BCP 79. 
   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use
   of such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository
   at http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on
   an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
   REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE
   INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR
   IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.