IDR Working Group                                             P. Marques
Internet-Draft                                                  N. Sheth
Intended status: Standards Track                               R. Raszuk
Expires: November 27, 2009                                     B. Greene
                                                        Juniper Networks
                                                                J. Mauch
                                                             NTT America
                                                            D. McPherson
                                                          Arbor Networks
                                                            May 26, 2009


               Dissemination of flow specification rules
                      draft-ietf-idr-flow-spec-09

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on November 27, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   This document defines a new BGP NLRI encoding format that can be used
   to distribute traffic flow specifications.  This allows the routing
   system to propagate information regarding more-specific components of
   the traffic aggregate defined by an IP destination prefix.

   Additionally, it defines two applications of that encoding format:
   one that can be used to automate inter-domain coordination of traffic
   filtering, such as what is required in order to mitigate
   (distributed) denial of service attacks, and a second that provides
   traffic filtering in the context of a BGP/MPLS VPN service.

   The information is carried via the Border Gateway Protocol (BGP),
   thereby reusing protocol algorithms, operational experience and
   administrative processes such as inter-provider peering agreements.

Table of Contents

   1.  Definitions of Terms Used in this Memo
   2.  Introduction
   3.  Flow specifications
   4.  Dissemination of Information
   5.  Traffic filtering
     5.1.  Order of traffic filtering rules
   6.  Validation procedure
   7.  Traffic Filtering Actions
   8.  Traffic filtering in RFC2547bis networks
   9.  Monitoring
   10. Security considerations
   11. IANA Considerations
   12. Acknowledgments
   13. Normative References
   Authors' Addresses

1.  Definitions of Terms Used in this Memo

   NLRI -  Network Layer Reachability Information

   RIB -  Routing Information Base

   Loc-RIB -  Local RIB

   AS -  Autonomous System

   VRF -  Virtual Routing and Forwarding instance

   PE -  Provider Edge router

2.  Introduction

   Modern IP routers contain both the capability to forward traffic
   according to IP prefixes as well as to classify, shape, rate limit,
   filter or redirect packets based on administratively defined
   policies.

   These traffic policy mechanisms allow the router to define match
   rules that operate on multiple fields of the packet header.  Actions
   such as the ones described above can be associated with each rule.

   The n-tuple consisting of the matching criteria defines an aggregate
   traffic flow specification.  The matching criteria can include
   elements such as source and destination address prefixes, IP protocol
   and transport protocol port numbers.

   This document defines a general procedure to encode flow
   specification rules for aggregated traffic flows so that they can be
   distributed as a BGP [RFC4271] NLRI.  Additionally, we define the
   mechanisms required to apply this definition to the problem of
   immediate concern to the authors: intra- and inter-provider
   distribution of traffic filtering rules to filter (Distributed)
   Denial of Service (DoS) attacks.

   By expanding routing information with flow specifications, the
   routing system can take advantage of the ACL/firewall capabilities in
   the router's forwarding path.  Flow specifications can be seen as
   more specific routing entries to a unicast prefix and are expected
   to depend upon the existing unicast data information.

   A flow specification received from an external autonomous system will
   need to be validated against unicast routing before being accepted.
   If the aggregate traffic flow defined by the unicast destination
   prefix is forwarded to a given BGP peer, then the local system can
   safely install more specific flow rules which may result in different
   forwarding behavior, as requested by this system.

   The key technology components required to address the class of
   problems targeted by this document are:

   1.  Efficient point-to-multipoint distribution of control plane
       information.

   2.  Inter-domain capabilities and routing policy support.

   3.  Tight integration with unicast routing, for verification
       purposes.

   Items 1 and 2 have already been addressed using BGP for other types
   of control plane information.  Close integration with BGP also makes
   it feasible to specify a mechanism to automatically verify flow
   information against unicast routing.  These factors are behind the
   choice of BGP as the carrier of flow specification information.

   As with previous extensions to the BGP protocol, this specification
   makes it possible to add additional information to Internet routers.
   Routers are limited in terms of the maximum number of data elements
   they can hold as well as the number of events they are able to
   process in a given unit of time.  The authors believe that, as with
   previous extensions, service providers will be careful to keep
   information levels below the maximum capacity of their devices.

   It is also expected that in many initial deployments flow
   specification information will replace existing host length route
   advertisements rather than add additional information.

   Experience with previous BGP extensions has also shown that the
   maximum capacity of BGP speakers has been gradually increased
   according to expected loads, taking into account Internet unicast
   routing as well as additional applications as they gain popularity.

   From an operational perspective, the utilization of BGP as the
   carrier for this information allows a network service provider to
   reuse both internal route distribution infrastructure (e.g., route
   reflector or confederation design) and existing external
   relationships (e.g., inter-domain BGP sessions to a customer
   network).

   While it is certainly possible to address this problem using other
   mechanisms, the authors believe that this solution offers the
   substantial advantage of being an incremental addition to already
   deployed mechanisms.

   In current deployments, the information distributed by the flow-spec
   extension is originated both manually and automatically, the latter
   by systems which are able to detect malicious flows.  When automated
   systems are used, care should be taken to ensure their correctness as
   well as to limit the number and advertisement rate of flow routes.

   This specification defines required protocol extensions to address
   most common applications of IPv4 unicast and VPNv4 unicast filtering.
   The same mechanism can be reused and new match criteria added to
   address similar filtering needs for other BGP address families (for
   example IPv6 unicast).  The authors believe that those are best
   addressed in a separate document.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Flow specifications

   A flow specification is an n-tuple consisting of several matching
   criteria that can be applied to IP traffic.  A given IP packet is
   said to match the defined flow if it matches all the specified
   criteria.

   A given flow may be associated with a set of attributes, depending on
   the particular application.  Such attributes may or may not include
   reachability information (i.e., NEXT_HOP).  Well-known or AS-specific
   community attributes can be used to encode a set of predetermined
   actions.

   A particular application is identified by a specific (AFI, SAFI) pair
   [RFC4760] and corresponds to a distinct set of RIBs.  Those RIBs
   should be treated independently from each other in order to assure
   non-interference between distinct applications.

   BGP itself treats the NLRI as an opaque key to an entry in its
   databases.  Entries that are placed in the Loc-RIB are then
   associated with a given set of semantics which is application
   dependent.  This is consistent with existing BGP applications.  For
   instance, IP unicast routing (AFI=1, SAFI=1) and IP multicast
   reverse-path information (AFI=1, SAFI=2) are handled by BGP without
   any particular semantics being associated with them until installed
   in the Loc-RIB.

   Standard BGP policy mechanisms, such as UPDATE filtering by NLRI
   prefix and community matching, SHOULD apply to the newly defined
   NLRI-type.  Network operators can also control propagation of such
   routing updates by enabling or disabling the exchange of a particular
   (AFI, SAFI) pair on a given BGP peering session.

4.  Dissemination of Information

   We define a "Flow Specification" NLRI type that may include several
   components such as destination prefix, source prefix, protocol,
   ports, etc.  This NLRI is treated as an opaque bit string prefix by
   BGP.  Each bit string identifies a key to a database entry with which
   a set of attributes can be associated.

   This NLRI information is encoded using MP_REACH_NLRI and
   MP_UNREACH_NLRI attributes as defined in RFC4760 [RFC4760].  Whenever
   the corresponding application does not require Next Hop information,
   this shall be encoded as a 0 octet length Next Hop in the
   MP_REACH_NLRI attribute and ignored on receipt.

   The NLRI field of the MP_REACH_NLRI and MP_UNREACH_NLRI is encoded as
   a 1 or 2 octet NLRI length field followed by a variable length NLRI
   value.  The NLRI length is expressed in octets.

                    +------------------------------+
                    |    length (0xnn or 0xfn nn)  |
                    +------------------------------+
                    |    NLRI value  (variable)    |
                    +------------------------------+

                             flow-spec NLRI

   If the NLRI length value is smaller than 240 (0xf0 hex), the length
   field can be encoded as a single octet.  Otherwise, it is encoded as
   an extended length 2 octet value in which the most significant nibble
   of the first byte is all ones.

   The Flow Specification NLRI-type consists of several optional
   subcomponents.  A specific packet is considered to match the flow
   specification when it matches the intersection (AND) of all the
   components present in the specification.

   The following component types are defined:

   Type 1 - Destination Prefix

      Encoding:

      Defines the destination prefix to match.  Prefixes are encoded
      as in BGP UPDATE messages: a length in bits is followed by
      enough octets to contain the prefix information.

   Type 2 - Source Prefix

      Encoding:

      Defines the source prefix to match.
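
   The NLRI length field rule given above (one octet below 240,
   otherwise two octets with the high nibble of the first octet set to
   all ones) can be sketched in C as follows.  This is an illustrative
   sketch only: the function names are hypothetical, and the upper
   bound of 4095 is an assumption inferred from the 12 bits that remain
   once the 0xf nibble is used as the marker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 12 bits remain for the length after the 0xf marker nibble. */
#define FLOWSPEC_MAX_NLRI_LEN 0x0fffu

/* Write the NLRI length field into out[]; return octets used (1 or 2). */
size_t flowspec_put_length(uint16_t len, uint8_t out[2])
{
    assert(len <= FLOWSPEC_MAX_NLRI_LEN);
    if (len < 0xf0) {
        out[0] = (uint8_t)len;              /* 0xnn form */
        return 1;
    }
    out[0] = (uint8_t)(0xf0 | (len >> 8));  /* 0xfn nn form */
    out[1] = (uint8_t)(len & 0xff);
    return 2;
}

/* Read the NLRI length field; return octets consumed, or 0 if the
 * buffer is too short to hold the field. */
size_t flowspec_get_length(const uint8_t *in, size_t avail, uint16_t *len)
{
    if (avail < 1)
        return 0;
    if ((in[0] & 0xf0) != 0xf0) {           /* high nibble not all ones */
        *len = in[0];
        return 1;
    }
    if (avail < 2)
        return 0;
    *len = (uint16_t)(((in[0] & 0x0f) << 8) | in[1]);
    return 2;
}
```

   With these helpers, a length of 16 is carried as the single octet
   0x10, while a length of 300 (0x12c) is carried as 0xf1 0x2c.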

   Type 3 - IP Protocol

      Encoding:

      Contains a set of {operator, value} pairs that are used to
      match the IP protocol value byte in IP packets.

      The operator byte is encoded as:

        0   1   2   3   4   5   6   7
      +---+---+---+---+---+---+---+---+
      | e | a |  len  | 0 |lt |gt |eq |
      +---+---+---+---+---+---+---+---+

                  Numeric operator

      *  End of List bit.  Set in the last {op, value} pair in the list.

      *  AND bit.  If unset, the previous term is logically ORed with
         the current one.  If set, the operation is a logical AND.  It
         should be unset in the first operator byte of a sequence.  The
         AND operator has higher priority than OR for the purposes of
         evaluating logical expressions.

      *  The length of the value field for this operand is given as
         (1 << len).

      *  lt - less than comparison between data and value.

      *  gt - greater than comparison between data and value.

      *  eq - equality between data and value.

      *  The bits lt, gt, and eq can be combined to produce "less or
         equal", "greater or equal" and inequality values.

   Type 4 - Port

      Encoding:

      Defines a list of {operation, value} pairs that matches source
      OR destination TCP/UDP ports.  This list is encoded using the
      numeric operand format defined above.  Values are encoded as 1
      or 2 byte quantities.

      Port, source port and destination port components evaluate to
      FALSE if the IP protocol field of the packet has a value other
      than TCP or UDP, if the packet is fragmented and this is not
      the first fragment, or if the system is unable to locate the
      transport header.  Different implementations may or may not be
      able to decode the transport header in the presence of IP
      options or ESP NULL [RFC4303] encryption.

   Type 5 - Destination port

      Encoding:

      Defines a list of {operation, value} pairs used to match the
      destination port of a TCP or UDP packet.  Values are encoded as
      1 or 2 byte quantities.
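
   The numeric operator format described above can be evaluated with a
   single pass over the {operator, value} list.  The following C sketch
   is illustrative only: the function name, the constant names and the
   return convention (1 for a match, 0 for no match, -1 for a truncated
   list or one with no End of List bit) are assumptions, not part of
   this specification.  It gives AND higher priority than OR by
   treating the list as an OR of AND-groups.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum {
    FS_OP_END = 0x80,  /* e:  End of List bit */
    FS_OP_AND = 0x40,  /* a:  AND bit */
    FS_OP_LT  = 0x04,  /* lt: less than */
    FS_OP_GT  = 0x02,  /* gt: greater than */
    FS_OP_EQ  = 0x01   /* eq: equal */
};

/* Evaluate a numeric {operator, value} list against one data value. */
int flowspec_numeric_match(const uint8_t *ops, size_t n, uint64_t data)
{
    bool any = false;    /* OR of all completed AND-groups */
    bool group = false;  /* value of the AND-group being built */
    bool first = true;
    size_t i = 0;

    assert(ops != NULL);
    while (i < n) {
        uint8_t op = ops[i++];
        size_t vlen = (size_t)1 << ((op >> 4) & 0x3);  /* 1 << len */
        uint64_t value = 0;

        if (n - i < vlen)
            return -1;                     /* value field truncated */
        for (size_t k = 0; k < vlen; k++)
            value = (value << 8) | ops[i++];

        bool term = ((op & FS_OP_LT) && data < value) ||
                    ((op & FS_OP_GT) && data > value) ||
                    ((op & FS_OP_EQ) && data == value);

        if (!first && (op & FS_OP_AND)) {
            group = group && term;         /* extend the current group */
        } else {
            any = any || group;            /* close the previous group */
            group = term;                  /* start a new one */
        }
        first = false;

        if (op & FS_OP_END)
            return (any || group) ? 1 : 0;
    }
    return -1;                             /* no End of List bit seen */
}
```

   The AND bit of the first operator byte is ignored here, on the
   reading that it "should be unset" there; a stricter implementation
   might instead reject such a list.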

   Type 6 - Source port

      Encoding:

      Defines a list of {operation, value} pairs used to match the
      source port of a TCP or UDP packet.  Values are encoded as 1 or
      2 byte quantities.

   Type 7 - ICMP type

      Encoding:

      Defines a list of {operation, value} pairs used to match the
      type field of an ICMP packet.  Values are encoded using a
      single byte.

      The ICMP type and code specifiers evaluate to FALSE whenever
      the protocol value is not ICMP.

   Type 8 - ICMP code

      Encoding:

      Defines a list of {operation, value} pairs used to match the
      code field of an ICMP packet.  Values are encoded using a
      single byte.

   Type 9 - TCP flags

      Encoding:

      Bitmask values can be encoded as a one or two byte bitmask.
      When a single byte is specified it matches byte 13 of the TCP
      header [RFC0793], which contains bits 8 through 15 of the 4th
      32-bit word.  When a 2 byte encoding is used it matches bytes 12
      and 13 of the TCP header with the data offset field having a
      "don't care" value.

      As with port specifiers, this component evaluates to FALSE for
      packets that are not TCP packets.

      This type uses the bitmask operand format, which differs from
      the numeric operator format in the lower nibble.

        0   1   2   3   4   5   6   7
      +---+---+---+---+---+---+---+---+
      | e | a |  len  | 0 | 0 |not| m |
      +---+---+---+---+---+---+---+---+

      *  Most significant nibble: (End of List bit, AND bit and Length
         field), as defined for the numeric operator format.

      *  NOT bit.  If set, logical negation of operation.

      *  Match bit.  If set, this is a bitwise match operation defined
         as "(data & value) == value"; if unset, (data & value)
         evaluates to true if any of the bits in the value mask are set
         in the data.

   Type 10 - Packet length

      Encoding:

      Match on the total IP packet length (excluding L2 but including
      IP header).  Values are encoded as 1 or 2 byte quantities.
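
   The NOT and Match bits of the bitmask operand format described under
   Type 9 can be sketched in C as a single-term test.  The function and
   constant names below are hypothetical, and combining several terms
   through the End of List and AND bits is left out; this is only a
   sketch of how one {operator, value} pair might be evaluated.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    FS_BM_NOT   = 0x02,  /* not: logical negation of the term */
    FS_BM_MATCH = 0x01   /* m:   exact match on the masked bits */
};

/* Evaluate one bitmask {operator, value} pair against packet data. */
bool flowspec_bitmask_term(uint8_t op, uint16_t data, uint16_t value)
{
    bool hit;

    if (op & FS_BM_MATCH)
        hit = (data & value) == value;  /* every bit in value is set */
    else
        hit = (data & value) != 0;      /* at least one bit is set */

    return (op & FS_BM_NOT) ? !hit : hit;
}
```

   For instance, with a value of 0x12 (the SYN and ACK bits of byte 13
   of the TCP header), a pure SYN packet (0x02) satisfies the term when
   the Match bit is unset but not when it is set.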

   Type 11 - DSCP

      Encoding:

      Defines a list of {operation, value} pairs used to match the
      6-bit DSCP field [RFC2474].  Values are encoded using a single
      byte, where the two most significant bits are zero and the six
      least significant bits contain the DSCP value.

   Type 12 - Fragment

      Encoding:

      Uses the bitmask operand format defined above.

        0   1   2   3   4   5   6   7
      +---+---+---+---+---+---+---+---+
      |   Reserved    |LF |FF |IsF|DF |
      +---+---+---+---+---+---+---+---+

      Bitmask values:

      +  Bit 7 - Don't fragment

      +  Bit 6 - Is a fragment

      +  Bit 5 - First fragment

      +  Bit 4 - Last fragment

   Flow specification components must follow strict type ordering.  A
   given component type may or may not be present in the specification,
   but if present it MUST precede any component of higher numeric type
   value.

   If a given component type within a prefix is unknown, the prefix in
   question cannot be used for traffic filtering purposes by the
   receiver.  Since a Flow Specification has the semantics of a logical
   AND of all components, if a component is FALSE by definition it
   cannot be applied.  However, for the purposes of BGP route
   propagation this prefix should still be transmitted, since BGP route
   distribution is independent of NLRI semantics.

   The encoding is chosen in order to account for future
   extensibility.

   An example of a Flow Specification encoding for: "all packets to
   10.0.1/24 and TCP port 25".

   +------------------+----------+----------+
   | destination      | proto    | port     |
   +------------------+----------+----------+
   | 0x01 18 0a 00 01 | 03 81 06 | 04 81 19 |
   +------------------+----------+----------+

   Decode for protocol:

   +-------+----------+------------------------------+
   | Value |          |                              |
   +-------+----------+------------------------------+
   | 0x03  | type     |                              |
   | 0x81  | operator | end-of-list, value size=1, = |
   | 0x06  | value    |                              |
   +-------+----------+------------------------------+

   An example of a Flow Specification encoding for: "all packets to
   10.0.1/24 from 192/8 and port {range [137, 139] or 8080}".

   +------------------+----------+-------------------------+
   | destination      | source   | port                    |
   +------------------+----------+-------------------------+
   | 0x01 18 0a 00 01 | 02 08 c0 | 04 03 89 45 8b 91 1f 90 |
   +------------------+----------+-------------------------+

   Decode for port:

   +--------+----------+------------------------------+
   | Value  |          |                              |
   +--------+----------+------------------------------+
   | 0x04   | type     |                              |
   | 0x03   | operator | size=1, >=                   |
   | 0x89   | value    | 137                          |
   | 0x45   | operator | &, value size=1, <=          |
   | 0x8b   | value    | 139                          |
   | 0x91   | operator | end-of-list, value-size=2, = |
   | 0x1f90 | value    | 8080                         |
   +--------+----------+------------------------------+

   This constitutes an NLRI with an NLRI length of 16 octets.

   Implementations wishing to exchange flow specification rules MUST use
   BGP's Capability Advertisement facility to exchange the Multiprotocol
   Extension Capability Code (Code 1) as defined in RFC4760 [RFC4760].
   The (AFI, SAFI) pair carried in the Multiprotocol Extension
   capability MUST be the same as the one used to identify a particular
   application that uses this NLRI-type.

5.  Traffic filtering

   Traffic filtering policies have been traditionally considered to be
   relatively static.

   The popularity of traffic-based denial of service (DoS) attacks,
   which often require the network operator to be able to use traffic
   filters for detection and mitigation, brings with it requirements
   that are not fully satisfied by existing tools.

   Increasingly, DoS mitigation requires coordination among several
   Service Providers, in order to be able to identify traffic source(s)
   and because the volumes of traffic may be such that they will
   otherwise significantly affect the performance of the network.

   Several techniques are currently used to control traffic filtering of
   DoS attacks.  Among those, one of the most common is to inject
   unicast route advertisements corresponding to a destination prefix
   being attacked.  One variant of this technique marks such route
   advertisements with a community that gets translated into a discard
   next-hop by the receiving router.  Other variants attract traffic to
   a particular node that serves as a deterministic drop point.

   Using unicast routing advertisements to distribute traffic filtering
   information has the advantage of using the existing infrastructure
   and inter-AS communication channels.  This can allow, for instance, a
   service provider to accept filtering requests from customers for
   address space they own.

   There are several drawbacks, however.  An issue that is immediately
   apparent is the granularity of filtering control: only destination
   prefixes may be specified.  Another area of concern is the fact that
   filtering information is intermingled with routing information.

   The mechanism defined in this document is designed to address these
   limitations.  We use the flow specification NLRI defined above to
   convey information about traffic filtering rules for traffic that
   should be discarded.

   This mechanism is primarily designed to allow an upstream autonomous
   system to perform inbound filtering, on its ingress routers, of
   traffic that a given downstream AS wishes to drop.

   In order to achieve this goal, we define an application specific NLRI
   identifier (AFI=1, SAFI=133) along with specific semantic rules.

   BGP routing updates containing this identifier use the flow
   specification NLRI encoding to convey particular aggregated flows
   that require special treatment.

   Flow routing information received via this (AFI, SAFI) pair is
   subject to the validation procedure detailed below.

5.1.  Order of traffic filtering rules

   With traffic filtering rules, more than one rule may match a
   particular traffic flow.  Thus it is necessary to define the order in
   which rules get matched and applied to a particular traffic flow.
   This ordering function must not depend on the arrival order of the
   flow specification rules and must be constant in the network.

   The relative order of two flow specification rules is determined by
   comparing their respective components.  The algorithm starts by
   comparing the left-most components of the rules.  If the types
   differ, the rule with the lowest numeric type value has higher
   precedence than (and thus will match before) the rule that doesn't
   contain that component type.  If the component types are the same,
   then a type specific comparison is performed.

   For IP prefix values (IP destination and source prefix) precedence is
   given to the lowest IP value of the common prefix length; if the
   common prefix is equal then the most specific prefix has precedence.

   For all other component types, unless otherwise specified, the
   comparison is performed by comparing the component data as a binary
   string using the memcmp() function as defined by the ISO C standard.
   For strings of different lengths, the common prefix is compared.  If
   equal, the longest string is considered to have higher precedence
   than the shorter one.

   Pseudocode:

   flow_rule_cmp (a, b)
   {
       comp1 = next_component(a);
       comp2 = next_component(b);
       while (comp1 || comp2) {
           // component_type returns infinity on end-of-list
           if (component_type(comp1) < component_type(comp2)) {
               return A_HAS_PRECEDENCE;
           }
           if (component_type(comp1) > component_type(comp2)) {
               return B_HAS_PRECEDENCE;
           }

           if (component_type(comp1) == IP_DESTINATION ||
               component_type(comp1) == IP_SOURCE) {
               len1 = prefix_length(comp1);
               len2 = prefix_length(comp2);
               common = MIN(len1, len2);
               // prefix_compare is assumed to return <0, 0 or >0,
               // like memcmp
               cmp = prefix_compare(comp1, comp2, common);
           } else {
               len1 = component_length(comp1);
               len2 = component_length(comp2);
               common = MIN(len1, len2);
               cmp = memcmp(data(comp1), data(comp2), common);
           }

           // not equal, lowest value has precedence
           if (cmp != 0) {
               return cmp < 0 ? A_HAS_PRECEDENCE : B_HAS_PRECEDENCE;
           }
           // equal, longest match (or longest string) has precedence
           if (len1 != len2) {
               return len1 > len2 ? A_HAS_PRECEDENCE : B_HAS_PRECEDENCE;
           }

           // identical components, move on to the next pair
           comp1 = next_component(a);
           comp2 = next_component(b);
       }

       return EQUAL;
   }

6.  Validation procedure

   Flow specifications that are received from a BGP peer and accepted
   in the respective Adj-RIB-In are used as input to the route selection
   process.  Although the forwarding attributes of two routes for the
   same Flow Specification prefix may be the same, BGP is still required
   to perform its path selection algorithm in order to select the
   correct set of attributes to advertise.

   The first step of the BGP Route Selection procedure (section 9.1.2 of
   [RFC4271]) is to exclude from the selection procedure routes that are
   considered non-feasible.  In the context of IP routing information
   this step is used to validate that the NEXT_HOP attribute of a given
   route is resolvable.

   The concept can be extended, in the case of Flow Specification NLRI,
   to allow other validation procedures.

   A flow specification NLRI must be validated such that it is
   considered feasible if and only if:

   a)  The originator of the flow specification matches the originator
       of the best-match unicast route for the destination prefix
       embedded in the flow specification.

   b)  There are no more-specific unicast routes, when compared with the
       flow destination prefix, that have been received from a different
       neighboring AS than the best-match unicast route, which has been
       determined in step a).

   By originator of a BGP route, we mean either the BGP originator path
   attribute, as used by route reflection, or the transport address of
   the BGP peer, if this path attribute is not present.

   The underlying concept is that the neighboring AS that advertises the
   best unicast route for a destination is allowed to advertise flow-
   spec information that conveys a more or equally specific destination
   prefix, as long as there are no more-specific unicast routes,
   received from a different neighboring AS, that would be affected by
   that filtering rule.

   The neighboring AS is the immediate destination of the traffic
   described by the Flow Specification.  If it requests these flows to
   be dropped, that request can be honored without concern that it
   represents a denial of service in itself.  Supposedly, the traffic is
   being dropped by the downstream autonomous system and there is no
   added value in carrying the traffic to it.

   BGP implementations MUST also enforce that the AS_PATH attribute of a
   route received via eBGP contains the neighboring AS in the left-most
   position of the AS_PATH attribute.  While this rule is optional in
   the BGP specification, it becomes necessary to enforce it for
   security reasons.

7.  Traffic Filtering Actions

   This specification defines a minimum set of filtering actions that it
   standardizes as BGP extended community values [RFC4360].  This is not
   meant to be an inclusive list of all the possible actions but only a
   subset that can be interpreted consistently across the network.

   Implementations should provide mechanisms that map an arbitrary BGP
   community value (normal or extended) to filtering actions that
   require different mappings in different systems in the network.  For
   instance, providing packets with a worse than best-effort per-hop
   behavior is a functionality that is likely to be implemented
   differently in different systems and for which no standard behavior
   is currently known.  Rather than attempting to define it here, this
   can be accomplished by mapping a user defined community value to
   platform / network specific behavior via user configuration.

   The default action for a traffic filtering flow specification is to
   accept IP traffic that matches that particular rule.

   The following extended community values can be used to specify
   particular actions.

   +--------+--------------------+--------------------------+
   | type   | extended community | encoding                 |
   +--------+--------------------+--------------------------+
   | 0x8006 | traffic-rate       | 2-byte as#, 4-byte float |
   | 0x8007 | traffic-action     | bitmask                  |
   | 0x8008 | redirect           | 6-byte Route Target      |
   | 0x8009 | traffic-marking    | DSCP value               |
   +--------+--------------------+--------------------------+

   Traffic-rate  The traffic-rate extended community is a non-transitive
      extended community across the Autonomous System boundary and uses
      the following extended community encoding:

         The first two octets carry the 2 octet id, which can be
         assigned from a 2 byte AS number.  When a 4 byte AS number is
         locally present, the 2 least significant bytes of such AS
         number can be used.  This value is purely informational and
         should not be interpreted by the implementation.
         The remaining 4 octets carry the rate information in IEEE
         floating point [IEEE.754.1985] format, units being bytes per
         second.  A traffic-rate of 0 should result in all traffic for
         the particular flow being discarded.

   Traffic-action:  The traffic-action extended community consists of 6
      bytes of which only the 2 least significant bits of the 6th byte
      (from left to right) are currently defined.

        0   1   2   3   4   5   6   7
      +---+---+---+---+---+---+---+---+
      |        reserved       | S | T |
      +---+---+---+---+---+---+---+---+

      *  Terminal action (bit 7).  When this bit is set, the traffic
         filtering engine will apply any subsequent filtering rules (as
         defined by the ordering procedure).  If not set, the evaluation
         of the traffic filter stops when this rule is applied.

      *  Sample (bit 6).  Enables traffic sampling and logging for this
         flow specification.

   Redirect:  The redirect extended community allows the traffic to be
      redirected to a VRF routing instance that lists the specified
      route-target in its import policy.  If several local instances
      match this criterion, the choice between them is a local matter
      (for example, the instance with the lowest Route Distinguisher
      value can be elected).  This extended community uses the same
      encoding as the Route Target extended community [RFC4360].

   Traffic Marking:  The traffic marking extended community instructs a
      system to modify the DSCP bits of a transiting IP packet to the
      corresponding value.  This extended community is encoded as a
      sequence of 5 zero bytes followed by the DSCP value encoded in the
      6 least significant bits of the 6th byte.

8.  Traffic filtering in RFC2547bis networks

   Provider-based layer 3 VPN networks, such as the ones using a BGP/
   MPLS IP VPN [RFC4364] control plane, have different traffic filtering
   requirements than Internet service providers.
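
   As an illustration of the extended community encodings defined in
   Section 7, the following Python sketch packs each action into an
   8-octet value (2-octet type followed by the 6 octets described
   there).  It is not part of this specification: the function names
   are hypothetical, and the redirect helper assumes the 2-octet-AS
   form of the Route Target (AS number followed by a 4-octet assigned
   number) purely for the sake of the example.

```python
import struct

# Type values taken from the table in Section 7.
TRAFFIC_RATE = 0x8006
TRAFFIC_ACTION = 0x8007
REDIRECT = 0x8008
TRAFFIC_MARKING = 0x8009


def encode_traffic_rate(as_number: int, bytes_per_second: float) -> bytes:
    """2-octet id (low 16 bits of the AS number) + IEEE 754 single float."""
    return struct.pack("!HHf", TRAFFIC_RATE, as_number & 0xFFFF,
                       bytes_per_second)


def decode_traffic_rate(community: bytes) -> tuple:
    """Return (id, rate in bytes per second) from an 8-octet community."""
    type_code, as_id, rate = struct.unpack("!HHf", community)
    if type_code != TRAFFIC_RATE:
        raise ValueError("not a traffic-rate extended community")
    return as_id, rate


def encode_traffic_action(terminal: bool = False, sample: bool = False) -> bytes:
    """Only the 2 least significant bits of the 6th byte are defined."""
    flags = (0x01 if terminal else 0) | (0x02 if sample else 0)  # T=bit 7, S=bit 6
    return struct.pack("!H5xB", TRAFFIC_ACTION, flags)


def encode_redirect(rt_as: int, rt_number: int) -> bytes:
    """Assumed Route Target form: 2-octet AS + 4-octet assigned number."""
    return struct.pack("!HHI", REDIRECT, rt_as, rt_number)


def encode_traffic_marking(dscp: int) -> bytes:
    """5 zero bytes followed by the DSCP value in the low 6 bits."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must fit in 6 bits")
    return struct.pack("!H5xB", TRAFFIC_MARKING, dscp)
```

   For example, encode_traffic_rate(65001, 0.0) yields a community
   that asks for all matching traffic to be discarded, since a rate of
   0 means discard.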

   In these environments, the VPN customer network often has traffic
   filtering capabilities towards its external network connections
   (e.g., a firewall facing the public network connection).  Less
   common is the presence of traffic filtering capabilities between
   different VPN attachment sites.  In an any-to-any connectivity
   model, which is the default, this means that site-to-site traffic is
   unfiltered.

   In circumstances where a security threat does get propagated inside
   the VPN customer network, there may not be readily available
   mechanisms to provide mitigation via traffic filters.

   This document proposes an additional BGP NLRI type (afi=1, safi=134),
   which can be used to propagate traffic filtering information in a
   BGP/MPLS VPN environment.

   The NLRI format for this address family consists of a fixed-length
   Route Distinguisher field (8 bytes) followed by a flow specification,
   following the encoding defined in this document.  The NLRI length
   field shall include both the 8 bytes of the Route Distinguisher and
   the subsequent flow specification.

   Propagation of this NLRI is controlled by matching Route Target
   extended communities associated with the BGP path advertisement with
   the VRF import policy, using the same mechanism as described in "BGP/
   MPLS IP VPNs" [RFC4364].

   Flow specification rules received via this NLRI apply only to traffic
   that belongs to the VRF(s) in which they are imported.  By default,
   traffic received from a remote PE is switched via an MPLS forwarding
   decision and is not subject to filtering.

   Contrary to the behavior specified for the non-VPN NLRI, flow rules
   are accepted by default when received from remote PE routers.

9.  Monitoring

   Traffic filtering applications require monitoring and traffic
   statistics facilities.
   While this is an implementation-specific choice, implementations
   SHOULD provide:

   o  A mechanism to log the packet header of filtered traffic,

   o  A mechanism to count the number of matches for a given Flow
      Specification rule.

10.  Security considerations

   Inter-provider routing is based on a web of trust.  Neighboring
   autonomous-systems are trusted to advertise valid reachability
   information.  If this trust model is violated, a neighboring
   autonomous system may cause a denial of service attack by advertising
   reachability information for a given prefix for which it does not
   provide service.

   As long as traffic filtering rules are restricted to match the
   corresponding unicast routing paths for the relevant prefixes, the
   security characteristics of this proposal are equivalent to the
   existing security properties of BGP unicast routing.

   Were this not the case, it would open the door to further denial of
   service attacks.

   Enabling firewall-like capabilities in routers without centralized
   management could make certain failures harder to diagnose.  For
   example, it is possible to allow TCP packets to pass between a pair
   of addresses but not ICMP packets.  It is also possible to permit
   packets smaller than 900 or greater than 1000 bytes to pass between a
   pair of addresses, but not packets whose length is in the range
   900-1000.  Such behavior may be confusing and these capabilities
   should be used with care whether manually configured or coordinated
   through the protocol extensions described in this document.

11.  IANA Considerations

   A flow specification consists of a sequence of flow components, which
   are identified by an 8-bit component type.  Types must be assigned
   and interpreted uniquely.  The current specification defines types 1
   through 12, with the value 0 being reserved.
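
   The component type assignments and allocation ranges listed in this
   section can be summarized in a small lookup, shown here as a Python
   sketch.  The names COMPONENT_TYPES and allocation_policy are
   hypothetical and chosen only for illustration; the values mirror the
   type list and the policy table given in this section.

```python
# Component types 1 through 12 as recommended in this section.
COMPONENT_TYPES = {
    1: "Destination Prefix",
    2: "Source Prefix",
    3: "IP Protocol",
    4: "Port",
    5: "Destination port",
    6: "Source port",
    7: "ICMP type",
    8: "ICMP code",
    9: "TCP flags",
    10: "Packet length",
    11: "DSCP",
    12: "Fragment",
}


def allocation_policy(component_type: int) -> str:
    """Map an 8-bit component type to its allocation policy range."""
    if not 0 <= component_type <= 255:
        raise ValueError("component type must fit in 8 bits")
    if component_type == 0:
        return "Invalid value"
    if component_type <= 12:
        return "Defined by this specification"
    if component_type <= 127:
        return "Specification Required"
    return "Private Use"
```

   A receiver could use such a lookup to decide whether an unknown
   component type falls in a standardized or a private range; how an
   unknown type is then handled is outside the scope of this sketch.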

   For the purpose of this work IANA has allocated values for two SAFIs:
   SAFI 133 for IPv4 and SAFI 134 for VPNv4 dissemination of flow
   specification rules.

   The following traffic filtering flow specification rules are to be
   allocated by IANA from the BGP Extended Communities Type -
   Experimental Use registry.  Authors recommend the following type
   values:

      0x8006 - Flow spec traffic-rate

      0x8007 - Flow spec traffic-action

      0x8008 - Flow spec redirect

      0x8009 - Flow spec traffic-remarking

   Authors would like to ask IANA to create and maintain a new registry
   entitled: "Flow Spec Component Type".  Authors recommend to allocate
   the following component types:

      Type 1 - Destination Prefix

      Type 2 - Source Prefix

      Type 3 - IP Protocol

      Type 4 - Port

      Type 5 - Destination port

      Type 6 - Source port

      Type 7 - ICMP type

      Type 8 - ICMP code

      Type 9 - TCP flags

      Type 10 - Packet length

      Type 11 - DSCP

      Type 12 - Fragment

   In order to manage the limited number space and accommodate several
   usages the following policies defined by RFC 5226 [RFC5226] are used:

   +--------------+-------------------------------+
   | Range        | Policy                        |
   +--------------+-------------------------------+
   | 0            | Invalid value                 |
   | [1 .. 12]    | Defined by this specification |
   | [13 .. 127]  | Specification Required        |
   | [128 .. 255] | Private Use                   |
   +--------------+-------------------------------+

   The specification of a particular "flow component type" must clearly
   identify the criteria used to match packets forwarded by the router.
   These criteria should be meaningful across router hops and not
   depend on values that change hop-by-hop such as TTL or layer-2
   encapsulation.

   The "Traffic-action" extended community defined in this document has
   46 unused bits which can be used to convey additional meaning.
   Authors would like to ask IANA to create and maintain a new registry
   entitled: "Traffic Action Fields".  These values should be assigned
   via IETF Review rules only.  Authors recommend to allocate the
   following traffic action fields:

      0 Terminal Action

      1 Sample

      2-47 Unassigned

12.  Acknowledgments

   The authors would like to thank Yakov Rekhter, Dennis Ferguson, Chris
   Morrow, Charlie Kaufman and David Smith for their comments.

   Chaitanya Kodeboyina helped design the flow validation procedure.

   Steven Lin and Jim Washburn ironed out all the details necessary to
   produce a working implementation.

13.  Normative References

   [IEEE.754.1985]
              Institute of Electrical and Electronics Engineers,
              "Standard for Binary Floating-Point Arithmetic",
              IEEE Standard 754, August 1985.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black,
              "Definition of the Differentiated Services Field (DS
              Field) in the IPv4 and IPv6 Headers", RFC 2474,
              December 1998.

   [RFC4271]  Rekhter, Y., Li, T., and S. Hares, "A Border Gateway
              Protocol 4 (BGP-4)", RFC 4271, January 2006.

   [RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)",
              RFC 4303, December 2005.

   [RFC4360]  Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended
              Communities Attribute", RFC 4360, February 2006.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, February 2006.

   [RFC4760]  Bates, T., Chandra, R., Katz, D., and Y. Rekhter,
              "Multiprotocol Extensions for BGP-4", RFC 4760,
              January 2007.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
              May 2008.

Authors' Addresses

   Pedro Marques
   Juniper Networks
   1194 N. Mathilda Ave.
   Sunnyvale, CA  94089
   US

   Email: roque@juniper.net


   Nischal Sheth
   Juniper Networks
   1194 N. Mathilda Ave.
   Sunnyvale, CA  94089
   US

   Email: nsheth@juniper.net


   Robert Raszuk
   Juniper Networks
   1194 N. Mathilda Ave.
   Sunnyvale, CA  94089
   US

   Email: raszuk@juniper.net


   Barry Greene
   Juniper Networks
   1194 N. Mathilda Ave.
   Sunnyvale, CA  94089
   US

   Email: bgreene@juniper.net


   Jared Mauch
   NTT America
   101 Park Ave
   41st Floor
   New York, NY  10178
   US

   Email: jared@us.ntt.net


   Danny McPherson
   Arbor Networks

   Email: danny@arbor.net