IDR Working Group                                             P. Marques
Internet-Draft                                                  N. Sheth
Intended status: Standards Track                               R. Raszuk
Expires: October 23, 2009                                      B. Greene
                                                        Juniper Networks
                                                                J. Mauch
                                                               NTT/Verio
                                                            D. McPherson
                                                          Arbor Networks
                                                          April 21, 2009

               Dissemination of flow specification rules
                      draft-ietf-idr-flow-spec-08

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on October 23, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document defines a new BGP NLRI encoding format that can be used
   to distribute traffic flow specifications.  This allows the routing
   system to propagate information regarding more-specific components of
   the traffic aggregate defined by an IP destination prefix.

   Additionally, it defines two applications of that encoding format:
   one that can be used to automate inter-domain coordination of traffic
   filtering, such as what is required in order to mitigate
   (distributed) denial of service attacks, and a second that provides
   traffic filtering in the context of a BGP/MPLS VPN service.

   The information is carried via the Border Gateway Protocol (BGP),
   thereby reusing protocol algorithms, operational experience, and
   administrative processes such as inter-provider peering agreements.

Table of Contents

   1.  Definitions of Terms Used in this Memo
   2.  Introduction
   3.  Flow specifications
   4.  Dissemination of Information
   5.  Traffic filtering
     5.1.  Order of traffic filtering rules
   6.  Validation procedure
   7.  Traffic Filtering Actions
   8.  Traffic filtering in RFC2547bis networks
   9.  Monitoring
   10. Security considerations
   11. IANA Considerations
   12. Acknowledgments
   13. Normative References
   Authors' Addresses

1.  Definitions of Terms Used in this Memo

      NLRI - Network Layer Reachability Information

      RIB - Routing Information Base

      Loc-RIB - Local RIB

      AS - Autonomous System

      VRF - Virtual Routing and Forwarding instance

      PE - Provider Edge router

2.  Introduction

   Modern IP routers contain both the capability to forward traffic
   according to aggregate IP prefixes as well as to classify, shape,
   rate limit, filter, or redirect packets based on administratively
   defined policies.

   While forwarding information is, typically, dynamically signaled
   across the network via routing protocols, there is no agreed-upon
   mechanism to dynamically signal flow information across autonomous
   systems.

   For several applications, it may be necessary to exchange control
   information pertaining to aggregated traffic flow definitions which
   cannot be expressed using destination address prefixes only.

   An aggregated traffic flow is considered to be an n-tuple consisting
   of several matching criteria such as source and destination address
   prefixes, IP protocol, and transport protocol port numbers.

   The intention of this document is to define a general procedure to
   encode such flow specification rules as a BGP [RFC4271] NLRI which
   can be reused for several different control applications.
   Additionally, we define the required mechanisms to apply this
   definition to the problem of immediate concern to the authors: intra-
   and inter-provider distribution of traffic filtering rules to filter
   (distributed) denial of service (DoS) attacks.

   By expanding routing information with flow specifications, the
   routing system can take advantage of the ACL/firewall capabilities in
   the router's forwarding path.  Flow specifications can be seen as
   more-specific routing entries to a unicast prefix and are expected
   to depend upon the existing unicast data information.

   A flow specification received from an external autonomous system will
   need to be validated against unicast routing before being accepted.
   If the aggregate traffic flow defined by the unicast destination
   prefix is forwarded to a given BGP peer, then the local system can
   safely install more-specific flow rules which may result in different
   forwarding behavior, as requested by this system.

   The key technology components required to address the class of
   problems targeted by this document are:

   1.  Efficient point-to-multipoint distribution of control plane
       information.

   2.  Inter-domain capabilities and routing policy support.

   3.  Tight integration with unicast routing, for verification
       purposes.

   Items 1 and 2 have already been addressed using BGP for other types
   of control plane information.  Close integration with BGP also makes
   it feasible to specify a mechanism to automatically verify flow
   information against unicast routing.  These factors are behind the
   choice of BGP as the carrier of flow specification information.

   As with previous extensions to the BGP protocol, this specification
   makes it possible to add additional information to Internet routers.
   These routers are limited in terms of the maximum number of data
   elements they can hold as well as the number of events they are able
   to process in a given unit of time.  The authors believe that, as
   with previous extensions, service providers will be careful to keep
   information levels below the maximum capacity of their devices.

   It is also expected that in many initial deployments flow
   specification information will replace existing host length route
   advertisements rather than add additional information.

   Experience with previous BGP extensions has also shown that the
   maximum capacity of BGP speakers has been gradually increased
   according to expected loads, taking into account Internet unicast
   routing as well as additional applications as they gain popularity.

   From an operational perspective, the utilization of BGP as the
   carrier for this information allows a network service provider to
   reuse both internal route distribution infrastructure (e.g., route
   reflector or confederation design) and existing external
   relationships (e.g., inter-domain BGP sessions to a customer
   network).

   While it is certainly possible to address this problem using other
   mechanisms, the authors believe that this solution offers the
   substantial advantage of being an incremental addition to already
   deployed mechanisms.

   In current deployments, the information distributed by the flow-spec
   extension is originated both manually and automatically, the latter
   by systems that are able to detect malicious flows.  When automated
   systems are used, care should be taken to ensure their correctness as
   well as to limit the advertisement rate of flow routes.

   This specification defines required protocol extensions to address
   most common applications of IPv4 unicast and VPNv4 unicast filtering.
   The same mechanism can be reused and new match criteria added to
   address similar filtering needs for other BGP address families (for
   example, IPv6 unicast).  The authors believe that those would be best
   addressed in a separate document.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Flow specifications

   A flow specification is an n-tuple consisting of several matching
   criteria that can be applied to IP traffic.  A given IP packet is
   said to match the defined flow if it matches all the specified
   criteria.

   A given flow may be associated with a set of attributes, depending on
   the particular application; such attributes may or may not include
   reachability information (i.e., NEXT_HOP).  Well-known or AS-specific
   community attributes can be used to encode a set of predetermined
   actions.

   A particular application is identified by a specific (AFI, SAFI) pair
   [RFC4760] and corresponds to a distinct set of RIBs.  Those RIBs
   should be treated independently from each other in order to assure
   non-interference between distinct applications.

   BGP itself treats the NLRI as an opaque key to an entry in its
   databases.  Entries that are placed in the Loc-RIB are then
   associated with a given set of semantics which is application
   dependent.  This is consistent with existing BGP applications.  For
   instance, IP unicast routing (AFI=1, SAFI=1) and IP multicast
   reverse-path information (AFI=1, SAFI=2) are handled by BGP without
   any particular semantics being associated with them until installed
   in the Loc-RIB.

   Standard BGP policy mechanisms, such as UPDATE filtering by NLRI
   prefix and community matching, SHOULD apply to the newly defined
   NLRI-type.
   Network operators can also control propagation of such
   routing updates by enabling or disabling the exchange of a particular
   (AFI, SAFI) pair on a given BGP peering session.

4.  Dissemination of Information

   We define a "Flow Specification" NLRI type that may include several
   components such as destination prefix, source prefix, protocol,
   ports, etc.  This NLRI is treated as an opaque bit string prefix by
   BGP.  Each bit string identifies a key to a database entry with which
   a set of attributes can be associated.

   This NLRI information is encoded using MP_REACH_NLRI and
   MP_UNREACH_NLRI attributes as defined in RFC4760 [RFC4760].  Whenever
   the corresponding application does not require Next Hop information,
   this shall be encoded as a 0-octet length Next Hop in the
   MP_REACH_NLRI attribute and ignored on receipt.

   The NLRI field of the MP_REACH_NLRI and MP_UNREACH_NLRI is encoded as
   a 1- or 2-octet NLRI length field followed by a variable-length NLRI
   value.  The NLRI length is expressed in octets.

      +------------------------------+
      |    length (0xnn or 0xfn nn)  |
      +------------------------------+
      |    NLRI value  (variable)    |
      +------------------------------+

                  flow-spec NLRI

   If the NLRI length value is smaller than 240 (0xf0 hex), the length
   field can be encoded as a single octet.  Otherwise, it is encoded as
   an extended-length 2-octet value in which the most significant nibble
   of the first byte is all ones.

   The Flow Specification NLRI-type consists of several optional
   subcomponents.  A specific packet is considered to match the flow
   specification when it matches the intersection (AND) of all the
   components present in the specification.

   The following component types are defined:

   Type 1 - Destination Prefix

      Encoding: <type (1 octet), prefix length (1 octet), prefix>

      Defines the destination prefix to match.
      Prefixes are encoded as in BGP UPDATE messages: a length in bits
      is followed by enough octets to contain the prefix information.

   Type 2 - Source Prefix

      Encoding: <type (1 octet), prefix length (1 octet), prefix>

      Defines the source prefix to match.

   Type 3 - IP Protocol

      Encoding: <type (1 octet), [op, value]+>

      Contains a set of {operator, value} pairs that are used to
      match the IP protocol value byte in IP packets.

      The operator byte is encoded as:

       0   1   2   3   4   5   6   7
     +---+---+---+---+---+---+---+---+
     | e | a |  len  | 0 |lt |gt |eq |
     +---+---+---+---+---+---+---+---+

                Numeric operator

      *  End of List bit.  Set in the last {op, value} pair in the list.

      *  AND bit.  If unset, the previous term is logically ORed with
         the current one.  If set, the operation is a logical AND.  It
         should be unset in the first operator byte of a sequence.  The
         AND operator has higher priority than OR for the purposes of
         evaluating logical expressions.

      *  The length of the value field for this operand is given as
         (1 << len).

      *  lt - less than comparison between data and value.

      *  gt - greater than comparison between data and value.

      *  eq - equality between data and value.

      *  The bits lt, gt, and eq can be combined to produce "less or
         equal", "greater or equal", and inequality values.

   Type 4 - Port

      Encoding: <type (1 octet), [op, value]+>

      Defines a list of {operation, value} pairs that matches source
      OR destination TCP/UDP ports.  This list is encoded using the
      numeric operand format defined above.  Values are encoded as 1-
      or 2-byte quantities.

      Port, source port, and destination port components evaluate to
      FALSE if the IP protocol field of the packet has a value other
      than TCP or UDP, if the packet is fragmented and this is not
      the first fragment, or if the system is unable to locate the
      transport header.  Different implementations may or may not be
      able to decode the transport header in the presence of IP
      options or ESP NULL [RFC4303] encryption.
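
The numeric operator format above can be illustrated with a short
sketch.  This is not part of the specification: the function names
parse_numeric_ops and matches are hypothetical, the input is assumed
to be the {op, value} list only (the leading type octet already
removed), and bit 0 in the diagram is taken to be the most significant
bit of the operator byte.

```python
def parse_numeric_ops(data: bytes) -> list:
    """Parse a list of {operator, value} pairs in numeric operator format."""
    ops = []
    i = 0
    while i < len(data):
        op = data[i]
        size = 1 << ((op >> 4) & 0x3)  # len field: value is (1 << len) octets
        value = int.from_bytes(data[i + 1:i + 1 + size], "big")
        ops.append({
            "and": bool(op & 0x40),    # AND bit (bit 1)
            "lt": bool(op & 0x04),     # less than (bit 5)
            "gt": bool(op & 0x02),     # greater than (bit 6)
            "eq": bool(op & 0x01),     # equal (bit 7)
            "value": value,
        })
        i += 1 + size
        if op & 0x80:                  # End of List bit (bit 0)
            break
    return ops


def matches(ops: list, data: int) -> bool:
    """Evaluate the parsed list against one value; AND binds tighter than OR."""
    result = False  # OR of all completed AND-groups
    group = None    # running value of the current AND-group
    for op in ops:
        term = ((op["lt"] and data < op["value"]) or
                (op["gt"] and data > op["value"]) or
                (op["eq"] and data == op["value"]))
        if op["and"] and group is not None:
            group = group and term
        else:
            if group is not None:
                result = result or group
            group = term
    if group is not None:
        result = result or group
    return result
```

As a usage example, the byte string 03 89 45 8b 91 1f 90 parses into
three terms (>= 137, AND <= 139, OR == 8080), so matches() accepts
ports 137 through 139 and 8080 and rejects all others.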

   Type 5 - Destination port

      Encoding: <type (1 octet), [op, value]+>

      Defines a list of {operation, value} pairs used to match the
      destination port of a TCP or UDP packet.  Values are encoded as
      1- or 2-byte quantities.

   Type 6 - Source port

      Encoding: <type (1 octet), [op, value]+>

      Defines a list of {operation, value} pairs used to match the
      source port of a TCP or UDP packet.  Values are encoded as 1- or
      2-byte quantities.

   Type 7 - ICMP type

      Encoding: <type (1 octet), [op, value]+>

      Defines a list of {operation, value} pairs used to match the
      type field of an ICMP packet.  Values are encoded using a
      single byte.

      The ICMP type and code specifiers evaluate to FALSE whenever
      the protocol value is not ICMP.

   Type 8 - ICMP code

      Encoding: <type (1 octet), [op, value]+>

      Defines a list of {operation, value} pairs used to match the
      code field of an ICMP packet.  Values are encoded using a
      single byte.

   Type 9 - TCP flags

      Encoding: <type (1 octet), [op, bitmask]+>

      Bitmask values can be encoded as a 1- or 2-byte bitmask.
      When a single byte is specified, it matches byte 13 of the TCP
      header [RFC0793], which contains bits 8 through 15 of the 4th
      32-bit word.  When a 2-byte encoding is used, it matches bytes
      12 and 13 of the TCP header with the data offset field having a
      "don't care" value.

      As with port specifiers, this component evaluates to FALSE for
      packets that are not TCP packets.

      This type uses the bitmask operand format, which differs from
      the numeric operator format in the lower nibble.

       0   1   2   3   4   5   6   7
     +---+---+---+---+---+---+---+---+
     | e | a |  len  | 0 | 0 |not| m |
     +---+---+---+---+---+---+---+---+

      *  Most significant nibble: (End of List bit, AND bit, and Length
         field), as defined for the numeric operator format.

      *  NOT bit.  If set, logical negation of operation.

      *  Match bit.  If set, this is a bitwise match operation defined
         as "(data & value) == value"; if unset, (data & value)
         evaluates to true if any of the bits in the value mask are set
         in the data.
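
The Match and NOT bits of the bitmask operand can be illustrated with
a minimal sketch.  This is not part of the specification: the function
name bitmask_term is hypothetical, bit 0 in the diagram is taken to be
the most significant bit of the operator byte, and only a single
{op, bitmask} term is evaluated (list handling follows the numeric
operator format).

```python
def bitmask_term(op: int, value: int, data: int) -> bool:
    """Evaluate one {op, bitmask} term against a data field."""
    if op & 0x01:
        # Match bit (bit 7) set: every bit of the mask must be present.
        result = (data & value) == value
    else:
        # Match bit unset: any bit of the mask being present is enough.
        result = (data & value) != 0
    if op & 0x02:
        # NOT bit (bit 6) set: negate the outcome.
        result = not result
    return result
```

For instance, with the TCP flags mask 0x12 (SYN and ACK), operator
0x81 matches only packets carrying both flags, while operator 0x80
also matches a packet carrying SYN alone (flags byte 0x02).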

   Type 10 - Packet length

      Encoding: <type (1 octet), [op, value]+>

      Match on the total IP packet length (excluding L2 but including
      IP header).  Values are encoded as 1- or 2-byte quantities.

   Type 11 - DSCP

      Encoding: <type (1 octet), [op, value]+>

      Defines a list of {operation, value} pairs used to match the
      6-bit DSCP field [RFC2474].  Values are encoded using a single
      byte, where the two most significant bits are zero and the six
      least significant bits contain the DSCP value.

   Type 12 - Fragment

      Encoding: <type (1 octet), [op, bitmask]+>

      Uses the bitmask operand format defined above.

       0   1   2   3   4   5   6   7
     +---+---+---+---+---+---+---+---+
     |   Reserved    |LF |FF |IsF|DF |
     +---+---+---+---+---+---+---+---+

      Bitmask values:

      +  Bit 7 - Don't fragment

      +  Bit 6 - Is a fragment

      +  Bit 5 - First fragment

      +  Bit 4 - Last fragment

   Flow specification components must follow strict type ordering.  A
   given component type may or may not be present in the specification,
   but if present it MUST precede any component of higher numeric type
   value.

   If a given component type within a prefix is unknown, the prefix in
   question cannot be used for traffic filtering purposes by the
   receiver.  Since a Flow Specification has the semantics of a logical
   AND of all components, if a component is FALSE by definition it
   cannot be applied.  However, for the purposes of BGP route
   propagation this prefix should still be transmitted since BGP route
   distribution is independent of NLRI semantics.

   The encoding is chosen in order to account for future
   extensibility.

   An example of a Flow Specification encoding for: "all packets to
   10.0.1/24 and TCP port 25".

   +------------------+----------+----------+
   | destination      | proto    | port     |
   +------------------+----------+----------+
   | 0x01 18 0a 00 01 | 03 81 06 | 04 81 19 |
   +------------------+----------+----------+

   Decode for protocol:

   +-------+----------+------------------------------+
   | Value |          |                              |
   +-------+----------+------------------------------+
   |  0x03 | type     |                              |
   |  0x81 | operator | end-of-list, value size=1, = |
   |  0x06 | value    |                              |
   +-------+----------+------------------------------+

   An example of a Flow Specification encoding for: "all packets to
   10.0.1/24 from 192/8 and port {range [137, 139] or 8080}".

   +------------------+----------+-------------------------+
   | destination      | source   | port                    |
   +------------------+----------+-------------------------+
   | 0x01 18 0a 00 01 | 02 08 c0 | 04 03 89 45 8b 91 1f 90 |
   +------------------+----------+-------------------------+

   Decode for port:

   +--------+----------+------------------------------+
   | Value  |          |                              |
   +--------+----------+------------------------------+
   |   0x04 | type     |                              |
   |   0x03 | operator | size=1, >=                   |
   |   0x89 | value    | 137                          |
   |   0x45 | operator | &, value size=1, <=          |
   |   0x8b | value    | 139                          |
   |   0x91 | operator | end-of-list, value-size=2, = |
   | 0x1f90 | value    | 8080                         |
   +--------+----------+------------------------------+

   This constitutes an NLRI with an NLRI length of 16 octets.

   Implementations wishing to exchange flow specification rules MUST use
   BGP's Capability Advertisement facility to exchange the Multiprotocol
   Extension Capability Code (Code 1) as defined in RFC4760 [RFC4760].
   The (AFI, SAFI) pair carried in the Multiprotocol Extension
   capability MUST be the same as the one used to identify a particular
   application that uses this NLRI-type.

5.  Traffic filtering

   Traffic filtering policies have been traditionally considered to be
   relatively static.

   The popularity of traffic-based denial of service (DoS) attacks,
   which often require the network operator to be able to use traffic
   filters for detection and mitigation, brings with it requirements
   that are not fully satisfied by existing tools.

   Increasingly, DoS mitigation requires coordination among several
   Service Providers, in order to be able to identify traffic source(s)
   and because the volumes of traffic may be such that they will
   otherwise significantly affect the performance of the network.

   Several techniques are currently used to control traffic filtering of
   DoS attacks.  Among those, one of the most common is to inject
   unicast route advertisements corresponding to a destination prefix
   being attacked.  One variant of this technique marks such route
   advertisements with a community that gets translated into a discard
   next-hop by the receiving router.  Other variants attract traffic to
   a particular node that serves as a deterministic drop point.

   Using unicast routing advertisements to distribute traffic filtering
   information has the advantage of using the existing infrastructure
   and inter-AS communication channels.  This can allow, for instance, a
   service provider to accept filtering requests from customers for
   address space they own.

   There are several drawbacks, however.  An issue that is immediately
   apparent is the granularity of filtering control: only destination
   prefixes may be specified.  Another area of concern is the fact that
   filtering information is intermingled with routing information.

   The mechanism defined in this document is designed to address these
   limitations.  We use the flow specification NLRI defined above to
   convey information about traffic filtering rules for traffic that
   should be discarded.

   This mechanism is designed primarily to allow an upstream autonomous
   system to perform inbound filtering, in its ingress routers, of
   traffic that a given downstream AS wishes to drop.

   In order to achieve that goal, we define an application-specific NLRI
   identifier (AFI=1, SAFI=133) along with specific semantic rules.

   BGP routing updates containing this identifier use the flow
   specification NLRI encoding to convey particular aggregated flows
   that require special treatment.

   Flow routing information received via this (AFI, SAFI) pair is
   subject to the validation procedure detailed below.

5.1.  Order of traffic filtering rules

   With traffic filtering rules, more than one rule may match a
   particular traffic flow.  Thus, it is necessary to define the order
   in which rules get matched and applied to a particular traffic flow.
   This ordering function must not depend on the arrival order of the
   flow specification rules and must be constant in the network.

   The relative order of two flow specification rules is determined by
   comparing their respective components.  The algorithm starts by
   comparing the left-most components of the rules.  If the types
   differ, the rule with the lowest numeric type value has higher
   precedence than (and thus will match before) the rule that doesn't
   contain that component type.  If the component types are the same,
   then a type-specific comparison is performed.

   For IP prefix values (IP destination and source prefix), precedence
   is given to the lowest IP value of the common prefix length; if the
   common prefix is equal, then the most specific prefix has precedence.

   For all other component types, unless otherwise specified, the
   comparison is performed by comparing the component data as a binary
   string using the memcmp() function as defined by the ISO C
   standard.
   For strings of different lengths, the common prefix is
   compared.  If equal, the longest string is considered to have higher
   precedence than the shorter one.

   Pseudocode:

   flow_rule_cmp (a, b)
   {
       comp1 = next_component(a);
       comp2 = next_component(b);
       while (comp1 || comp2) {
           // component_type returns infinity on end-of-list
           if (component_type(comp1) < component_type(comp2)) {
               return A_HAS_PRECEDENCE;
           }
           if (component_type(comp1) > component_type(comp2)) {
               return B_HAS_PRECEDENCE;
           }

           if (component_type(comp1) == IP_DESTINATION ||
               component_type(comp1) == IP_SOURCE) {
               len1 = prefix_length(comp1);
               len2 = prefix_length(comp2);
               cmp = prefix_compare(comp1, comp2, MIN(len1, len2));
           } else {
               len1 = component_length(comp1);
               len2 = component_length(comp2);
               cmp = memcmp(data(comp1), data(comp2), MIN(len1, len2));
           }
           // not equal, lowest value has precedence
           if (cmp < 0) return A_HAS_PRECEDENCE;
           if (cmp > 0) return B_HAS_PRECEDENCE;
           // equal, longest match (or longest string) has precedence
           if (len1 > len2) return A_HAS_PRECEDENCE;
           if (len1 < len2) return B_HAS_PRECEDENCE;

           // components are identical, move on to the next pair
           comp1 = next_component(a);
           comp2 = next_component(b);
       }

       return EQUAL;
   }

6.  Validation procedure

   Flow specifications received from a BGP peer and accepted in the
   respective Adj-RIB-In are used as input to the route selection
   process.  Although the forwarding attributes of two routes for the
   same Flow Specification prefix may be the same, BGP is still required
   to perform its path selection algorithm in order to select the
   correct set of attributes to advertise.

   The first step of the BGP Route Selection procedure (section 9.1.2 of
   [RFC4271]) is to exclude from the selection procedure routes that are
   considered non-feasible.  In the context of IP routing information,
   this step is used to validate that the NEXT_HOP attribute of a given
   route is resolvable.

   The concept can be extended, in the case of Flow Specification NLRI,
   to allow other validation procedures.

   A flow specification NLRI must be validated such that it is
   considered feasible if and only if:

   a)  The originator of the flow specification matches the originator
       of the best-match unicast route for the destination prefix
       embedded in the flow specification.

   b)  There are no more-specific unicast routes, when compared with the
       flow destination prefix, that have been received from a different
       neighboring AS than the best-match unicast route, which has been
       determined in step a).

   By originator of a BGP route, we mean either the BGP originator path
   attribute, as used by route reflection, or the transport address of
   the BGP peer, if this path attribute is not present.

   The underlying concept is that the neighboring AS that advertises the
   best unicast route for a destination is allowed to advertise
   flow-spec information that conveys a more or equally specific
   destination prefix, as long as there are no more-specific unicast
   routes, received from a different neighboring AS, that would be
   affected by that filtering rule.

   The neighboring AS is the immediate destination of the traffic
   described by the Flow Specification.  If it requests these flows to
   be dropped, that request can be honored without concern that it
   represents a denial of service in itself.  Supposedly, the traffic is
   being dropped by the downstream autonomous system and there is no
   added value in carrying the traffic to it.

   BGP implementations MUST also enforce that the AS_PATH attribute of a
   route received via eBGP contains the neighboring AS in the left-most
   position of the AS_PATH attribute.  While this rule is optional in
   the BGP specification, it becomes necessary to enforce it for
   security reasons.

7.  Traffic Filtering Actions

   This specification defines a minimum set of filtering actions that it
   standardizes as BGP extended community values [RFC4360].
   This is not meant to be an inclusive list of all the possible actions
   but only a subset that can be interpreted consistently across the
   network.

   Implementations should provide mechanisms that map an arbitrary BGP
   community value (normal or extended) to filtering actions that
   require different mappings in different systems in the network.  For
   instance, providing packets with a worse than best-effort per-hop
   behavior is a functionality that is likely to be implemented
   differently in different systems and for which no standard behavior
   is currently known.  Rather than attempting to define it here, this
   can be accomplished by mapping a user-defined community value to
   platform- or network-specific behavior via user configuration.

   The default action for a traffic filtering flow specification is to
   accept IP traffic that matches that particular rule.

   The following extended community values can be used to specify
   particular actions.

   +--------+--------------------+--------------------------+
   | type   | extended community | encoding                 |
   +--------+--------------------+--------------------------+
   | 0x8006 | traffic-rate       | 2-byte as#, 4-byte float |
   | 0x8007 | traffic-action     | bitmask                  |
   | 0x8008 | redirect           | 6-byte Route Target      |
   | 0x8009 | traffic-marking    | DSCP value               |
   +--------+--------------------+--------------------------+

   Traffic-rate  The traffic-rate extended community is a non-transitive
      extended community across the autonomous system boundary and uses
      the following extended community encoding:

         The first two octets carry the 2-octet id, which can be
         assigned from a 2-byte AS number.  When a 4-byte AS number is
         locally present, the 2 least significant bytes of that AS
         number can be used.  This value is purely informational and
         should not be interpreted by the implementation.

721           The remaining 4 octets carry the rate information in IEEE
722           floating point [IEEE.754.1985] format, units being bytes per
723           second. A traffic-rate of 0 should result in all traffic for
724           the particular flow being discarded.

726     Traffic-action  The traffic-action extended community consists of 6
727        bytes of which only the 2 least significant bits of the 6th byte
728        (from left to right) are currently defined.

730          0   1   2   3   4   5   6   7
731        +---+---+---+---+---+---+---+---+
732        |        reserved       | S | T |
733        +---+---+---+---+---+---+---+---+

735        *  Terminal action (bit 7). When this bit is set, the traffic
736           filtering engine will apply any subsequent filtering rules (as
737           defined by the ordering procedure). If not set, the evaluation
738           of the traffic filter stops when this rule is applied.

740        *  Sample (bit 6). Enables traffic sampling and logging for this
741           flow specification.

743     Redirect  The redirect extended community allows the traffic to be
744        redirected to a VRF routing instance that lists the specified
745        route-target in its import policy. If several local instances
746        match this criterion, the choice between them is a local matter
747        (for example, the instance with the lowest Route Distinguisher
748        value can be elected). This extended community uses the same
749        encoding as the Route Target extended community [RFC4360].

751     Traffic Marking  The traffic marking extended community instructs a
752        system to modify the DSCP bits of a transiting IP packet to the
753        corresponding value. This extended community is encoded as a
754        sequence of 5 zero bytes followed by the DSCP value encoded in the
755        6 least significant bits of the 6th byte.

757  8. Traffic filtering in RFC2547bis networks

759     Provider-based layer 3 VPN networks, such as the ones using a
760     BGP/MPLS IP VPN [RFC4364] control plane, have different traffic
761     filtering requirements than Internet service providers.

763     In these environments, the VPN customer network often has traffic
764     filtering capabilities towards its external network connections
765     (e.g., a firewall facing the public network connection). Less common
766     is the presence of traffic filtering capabilities between different
767     VPN attachment sites. In an any-to-any connectivity model, which is
768     the default, this means that site-to-site traffic is unfiltered.

770     In circumstances where a security threat does get propagated inside
771     the VPN customer network, there may not be readily available
772     mechanisms to provide mitigation via traffic filtering.

774     This document proposes an additional BGP NLRI type (afi=1, safi=134)
775     value, which can be used to propagate traffic filtering information
776     in a BGP/MPLS VPN environment.

778     The NLRI format for this address family consists of a fixed-length
779     Route Distinguisher field (8 bytes) followed by a flow specification,
780     following the encoding defined in this document. The NLRI length
781     field shall include both the 8 bytes of the Route Distinguisher and
782     the subsequent flow specification.

784     Propagation of this NLRI is controlled by matching Route Target
785     extended communities associated with the BGP path advertisement with
786     the VRF import policy, using the same mechanism as described in
787     "BGP/MPLS IP VPNs" [RFC4364].

789     Flow specification rules received via this NLRI apply only to traffic
790     that belongs to the VRF(s) in which it is imported. By default,
791     traffic received from a remote PE is switched via an MPLS forwarding
792     decision and is not subject to filtering.

794     Contrary to the behavior specified for the non-VPN NLRI, flow rules
795     are accepted by default, when received from remote PE routers.

797  9. Monitoring

799     Traffic filtering applications require monitoring and traffic
800     statistics facilities.
While this is an implementation-specific
801     choice, implementations SHOULD provide:

803     o  A mechanism to log the packet header of filtered traffic,

805     o  A mechanism to count the number of matches for a given Flow
806        Specification rule.

808  10. Security considerations

810     Inter-provider routing is based on a web of trust. Neighboring
811     autonomous-systems are trusted to advertise valid reachability
812     information. If this trust model is violated, a neighboring
813     autonomous system may cause a denial of service attack by advertising
814     reachability information for a given prefix for which it does not
815     provide service.

817     As long as traffic filtering rules are restricted to match the
818     corresponding unicast routing paths for the relevant prefixes, the
819     security characteristics of this proposal are equivalent to the
820     existing security properties of BGP unicast routing.

822     Were this not the case, it would open the door to further denial of
823     service attacks.

825     Enabling firewall-like capabilities in routers without centralized
826     management could make certain failures harder to diagnose. For
827     example, it is possible to allow TCP packets to pass between a pair
828     of addresses but not ICMP packets. It is also possible to permit
829     packets smaller than 900 or greater than 1000 bytes to pass between a
830     pair of addresses, but not packets whose length is in the range
831     900-1000. Such behavior may be confusing and these capabilities should
832     be used with care whether manually configured or coordinated through
833     the protocol extensions described in this document.

835  11. IANA Considerations

837     A flow specification consists of a sequence of flow components, which
838     are identified by an 8-bit component type. Types must be assigned
839     and interpreted uniquely. The current specification defines types 1
840     through 12, with the value 0 being reserved.
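
The 8-bit component type space can be illustrated with a short sketch. The following Python fragment is not part of the specification; it only restates the type names and allocation ranges listed later in this section. The names COMPONENT_TYPES and registry_policy are hypothetical and chosen for illustration.

```python
# Component type names as recommended in this section (types 1 through 12).
COMPONENT_TYPES = {
    1: "Destination Prefix", 2: "Source Prefix", 3: "IP Protocol",
    4: "Port", 5: "Destination port", 6: "Source port",
    7: "ICMP type", 8: "ICMP code", 9: "TCP flags",
    10: "Packet length", 11: "DSCP", 12: "Fragment",
}


def registry_policy(component_type: int) -> str:
    """Return the allocation policy that covers an 8-bit component type."""
    if not 0 <= component_type <= 255:
        raise ValueError("component type must fit in 8 bits")
    if component_type == 0:
        return "Invalid value"
    if component_type <= 12:
        return "Defined by this specification"
    if component_type <= 127:
        return "Specification Required"
    return "Private Use"


if __name__ == "__main__":
    for value in (0, 3, 13, 200):
        name = COMPONENT_TYPES.get(value, "(no name assigned)")
        print(value, name, "->", registry_policy(value))
```

A receiver could use such a lookup to decide whether an unknown component type falls in a range that may later be standardized or in the range reserved for private use; how it then treats the flow specification is outside the scope of this sketch.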

842     For the purpose of this work, IANA has allocated values for two SAFIs:
843     SAFI 133 for IPv4 and SAFI 134 for VPNv4 dissemination of flow
844     specification rules.

846     The following traffic filtering flow specification rules are to be
847     allocated by IANA from the BGP Extended Communities Type -
848     Experimental Use registry. The authors recommend the following type
        values:

850     0x8006 - Flow spec traffic-rate

852     0x8007 - Flow spec traffic-action

854     0x8008 - Flow spec redirect

856     0x8009 - Flow spec traffic-remarking

858     The authors would like to ask IANA to create and maintain a new
859     registry entitled: "Flow Spec Component Type". The authors recommend
860     allocating the following component types:

862     Type 1 - Destination Prefix

864     Type 2 - Source Prefix

866     Type 3 - IP Protocol

868     Type 4 - Port

870     Type 5 - Destination port

872     Type 6 - Source port
873     Type 7 - ICMP type

875     Type 8 - ICMP code

877     Type 9 - TCP flags

879     Type 10 - Packet length

881     Type 11 - DSCP

883     Type 12 - Fragment

885     In order to manage the limited number space and accommodate several
886     usages, the following policies defined by RFC 5226 [RFC5226] are used:

888     +--------------+-------------------------------+
889     | Range        | Policy                        |
890     +--------------+-------------------------------+
891     | 0            | Invalid value                 |
892     | [1 .. 12]    | Defined by this specification |
893     | [13 .. 127]  | Specification Required        |
894     | [128 .. 255] | Private Use                   |
895     +--------------+-------------------------------+

897     The specification of a particular "flow component type" must clearly
898     identify the criteria used to match packets forwarded by the
899     router. These criteria should be meaningful across router hops and
900     not depend on values that change hop-by-hop such as TTL or layer-2
901     encapsulation.

903     The "Traffic-action" extended community defined in this document has
904     6 unused bits which can be used to convey additional meaning.
905     The authors would like to ask IANA to create and maintain a new
906     registry entitled: "Traffic Action Fields". These values should be
907     assigned via IETF Review rules only. The authors recommend allocating
908     the following traffic action fields:

910     0     Terminal Action

912     1     Sample

914     2-47  Unassigned

916  12. Acknowledgments

918     The authors would like to thank Yakov Rekhter, Dennis Ferguson, Chris
919     Morrow, Charlie Kaufman and David Smith for their comments.

921     Chaitanya Kodeboyina helped design the flow validation procedure.

923     Steven Lin and Jim Washburn ironed out all the details necessary to
924     produce a working implementation.

926  13. Normative References

928     [IEEE.754.1985]
929                Institute of Electrical and Electronics Engineers,
930                "Standard for Binary Floating-Point Arithmetic",
931                IEEE Standard 754, August 1985.

933     [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
934                RFC 793, September 1981.

936     [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
937                Requirement Levels", BCP 14, RFC 2119, March 1997.

939     [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
940                "Definition of the Differentiated Services Field (DS
941                Field) in the IPv4 and IPv6 Headers", RFC 2474,
942                December 1998.

944     [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
945                Protocol 4 (BGP-4)", RFC 4271, January 2006.

947     [RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)",
948                RFC 4303, December 2005.

950     [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
951                Communities Attribute", RFC 4360, February 2006.

953     [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
954                Networks (VPNs)", RFC 4364, February 2006.

956     [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
957                "Multiprotocol Extensions for BGP-4", RFC 4760,
958                January 2007.

960     [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
961                IANA Considerations Section in RFCs", BCP 26, RFC 5226,
962                May 2008.
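
As a non-normative illustration of the encodings described in Section 7, the following Python sketch builds the traffic-rate, traffic-action and traffic-marking extended communities. It assumes the usual 8-octet extended community layout of [RFC4360] (2 octets of type followed by 6 octets of value), network byte order, and a single-precision IEEE 754 float for the rate. The function names are hypothetical, and the redirect community is omitted because it simply reuses the Route Target encoding.

```python
import struct

# Type values taken from the table in Section 7.
TRAFFIC_RATE = 0x8006
TRAFFIC_ACTION = 0x8007
TRAFFIC_MARKING = 0x8009


def encode_traffic_rate(as_number: int, bytes_per_second: float) -> bytes:
    """2-octet type, 2-octet informational id, 4-octet float rate."""
    # For a 4-byte AS number only the 2 least significant bytes are carried.
    return struct.pack("!HHf", TRAFFIC_RATE, as_number & 0xFFFF, bytes_per_second)


def encode_traffic_action(terminal: bool = False, sample: bool = False) -> bytes:
    """Only the 2 least significant bits of the 6th value byte are defined."""
    flags = (0x01 if terminal else 0) | (0x02 if sample else 0)  # T = bit 7, S = bit 6
    return struct.pack("!H5xB", TRAFFIC_ACTION, flags)  # 5x = five zero bytes


def encode_traffic_marking(dscp: int) -> bytes:
    """Five zero bytes, then the DSCP in the 6 least significant bits."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP is a 6-bit value")
    return struct.pack("!H5xB", TRAFFIC_MARKING, dscp)


if __name__ == "__main__":
    # A rate of 0 asks for all traffic matching the flow to be discarded.
    print(encode_traffic_rate(65000, 0.0).hex())
    print(encode_traffic_action(sample=True).hex())
    print(encode_traffic_marking(46).hex())
```

Each function returns exactly 8 octets, so the results can be concatenated directly into the value of an Extended Communities path attribute. The AS number 65000 and DSCP value 46 are arbitrary example inputs.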

964  Authors' Addresses

966     Pedro Marques
967     Juniper Networks
968     1194 N. Mathilda Ave.
969     Sunnyvale, CA 94089
970     US

972     Email: roque@juniper.net

974     Nischal Sheth
975     Juniper Networks
976     1194 N. Mathilda Ave.
977     Sunnyvale, CA 94089
978     US

980     Email: nsheth@juniper.net

982     Robert Raszuk
983     Juniper Networks
984     1194 N. Mathilda Ave.
985     Sunnyvale, CA 94089
986     US

988     Email: raszuk@juniper.net

990     Barry Greene
991     Juniper Networks
992     1194 N. Mathilda Ave.
993     Sunnyvale, CA 94089
994     US

996     Email: bgreene@juniper.net

998     Jared Mauch
999     NTT/Verio
1000    8285 Reese Lane
1001    Ann Arbor, MI 48103-9753
1002    US

1004    Email: jared@puck.nether.net
1005    Danny McPherson
1006    Arbor Networks

1008    Email: danny@arbor.net