PWE3                                                      S. Bryant, Ed.
Internet-Draft                                               C. Filsfils
Intended status: Standards Track                           Cisco Systems
Expires: September 3, 2009                                      U. Drafz
                                                        Deutsche Telekom
                                                             V. Kompella
                                                                J. Regan
                                                          Alcatel-Lucent
                                                               S. Amante
                                                  Level 3 Communications
                                                           March 2, 2009


                Flow Aware Transport of MPLS Pseudowires
                    draft-bryant-filsfils-fat-pw-03

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on September 3, 2009.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   Where the payload carried over a pseudowire carries a number of
   identifiable flows, it can in some circumstances be desirable to
   carry those flows over the equal cost multiple paths (ECMPs) that
   exist in the packet switched network.  Most forwarding engines are
   able to hash based on label stacks and use this to balance flows over
   ECMPs.  This draft describes a method of identifying the flows, or
   flow groups, to the label switched routers by including an additional
   label in the label stack.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC2119 [1].

Table of Contents

   1.  Introduction
     1.1.  ECMP in Label Switched Routers
     1.2.  Flow Label
   2.  Native Service Processing Function
   3.  Pseudowire Forwarder
     3.1.  Encapsulation
   4.  Signaling the Presence of the Flow Label
     4.1.  Structure of Flow Label TLV
   5.  OAM
   6.  Applicability
     6.1.  ECMP
     6.2.  Link Aggregation Groups
     6.3.  The Single Large Flow Case
   7.  Applicability to MPLS
   8.  Security Considerations
   9.  IANA Considerations
   10. Congestion Considerations
   11. Acknowledgements
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   A pseudowire [11] is normally transported over one single network
   path, even if multiple Equal Cost Multiple Paths (ECMP) exist between
   the ingress and egress PEs [2] [3].  This is required to preserve the
   characteristics of the emulated service (e.g. to avoid misordering
   SAToP pseudowires [4]).  The use of a single path to preserve order
   remains the default mode of operation of a pseudowire (PW).  The new
   capability proposed in this document is an OPTIONAL mode which may be
   used when the use of ECMP paths is known to be beneficial (and not
   harmful) to the operation of the PW.

   Some pseudowires are used to transport large volumes of IP traffic
   between routers at two locations.  One example of this is the use of
   an Ethernet pseudowire to create a virtual direct link between a pair
   of routers.  Such pseudowires may carry from hundreds of Mbps to
   Gbps of traffic.  Such pseudowires do not require strict ordering to
   be preserved between packets of the pseudowire.
   They only require ordering to be preserved within the context of each
   individual transported IP flow.  Some operators have requested the
   ability to explicitly configure such a pseudowire to leverage the
   availability of multiple ECMP paths.  This allows for better capacity
   planning, as the statistical multiplexing of a larger number of
   smaller flows is more efficient than with a smaller set of larger
   flows.  Although Ethernet is used as an example above, the mechanisms
   described in this draft are general mechanisms that may be applied to
   any pseudowire type in which there are identifiable flows, and in
   which there is no requirement to preserve the order between those
   flows.

   Typically, forwarding hardware can deduce that an IP payload is being
   directly carried by an MPLS label stack, and is capable of looking at
   some fields in packets to construct hash buckets for conversations or
   flows.  However, an intermediate node has no information on the type
   of pseudowire being carried in the packet.  This limits the forwarder
   at the intermediate node to making an ECMP choice based only on a
   hash of the label stack.  In the case of a pseudowire emulating a
   high bandwidth trunk, the granularity obtained by hashing the default
   label stack is inadequate for satisfactory load-balancing.  The
   ingress node, however, is in the special position of being able to
   look at the un-encapsulated packet and spread flows amongst the
   available ECMP paths, or even Loop-Free Alternates [12].  This draft
   proposes a method to introduce granularity on the hashing of traffic
   running over pseudowires by introducing an additional label, chosen
   by the ingress node, and placed at the bottom of the label stack.
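
   The effect described above can be sketched in Python.  This is an
   illustration only and not part of the specification: the function
   names flow_label and ecmp_path, the use of SHA-256, the flow key
   strings, and the label values 3001 and 5002 are all assumptions made
   for the example.  The draft leaves both the ingress label generation
   method and the hash used by an LSR unspecified.

```python
import hashlib
from typing import Sequence

RESERVED_MAX = 15        # labels 0..15 are reserved and must not be used
LABEL_SPACE = 1 << 20    # an MPLS label value is 20 bits wide


def _hash32(data: bytes) -> int:
    """Return a 32-bit integer derived from SHA-256 (an arbitrary choice)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")


def flow_label(flow_key: bytes) -> int:
    """Hypothetical ingress PE mapping of a flow identity to a flow label.

    The same flow key always yields the same label, and the result is
    never in the reserved range 0..15.
    """
    usable = LABEL_SPACE - (RESERVED_MAX + 1)
    return RESERVED_MAX + 1 + _hash32(flow_key) % usable


def ecmp_path(label_stack: Sequence[int], n_paths: int) -> int:
    """Hypothetical transit LSR choice: hash the label stack, pick a path."""
    data = b"".join(label.to_bytes(3, "big") for label in label_stack)
    return _hash32(data) % n_paths


TUNNEL, PW = 3001, 5002  # made-up tunnel and pseudowire label values
flows = [f"10.0.0.{i}:1234->10.0.1.1:80/tcp".encode() for i in range(64)]

# Without a flow label every packet of the pseudowire hashes identically.
paths_without = {ecmp_path([TUNNEL, PW], 4) for _ in flows}

# With a flow label at the bottom of the stack, flows can take different paths.
paths_with = {ecmp_path([TUNNEL, PW, flow_label(f)], 4) for f in flows}
```

   In this sketch paths_without holds a single path index, because the
   tunnel and pseudowire labels are the same for every packet, whereas
   paths_with is expected to hold several indices.  Packets of one flow
   always carry the same flow label and therefore stay on one path.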

   In addition to providing an indication of the flow structure for use
   in ECMP forwarding decisions, the mechanism described in this
   document may also be used to select flows for distribution over an
   IEEE 802.3ad link aggregation group that has been used in an MPLS
   network.

1.1.  ECMP in Label Switched Routers

   Label switched routers commonly hash the label stack, or some
   elements of the label stack, as a method of discriminating between
   flows, in order to distribute those flows over the available equal
   cost multiple paths that exist in the network.  Since the label at
   the bottom of stack is usually the label most closely associated with
   the flow, this normally provides the greatest entropy, and hence is
   usually included in the hash.  This draft describes a method of
   adding an additional label at the bottom of stack in order to
   facilitate the load balancing of the flows within a pseudowire over
   the available ECMPs.  A similar design for general MPLS use has also
   been proposed [13]; however, that is outside the scope of this draft.

   An alternative method of load balancing by creating a number of
   pseudowires and distributing the flows amongst them was considered,
   but was rejected because:

   o  It did not introduce as much entropy as the load balance label
      method.

   o  It required additional pseudowires to be set up and maintained.

1.2.  Flow Label

   An additional label is interposed between the pseudowire label and
   the control word, or if the control word is not present, between the
   pseudowire label and the pseudowire payload.  This additional label
   is called the flow label.  Indivisible flows within the pseudowire
   MUST be mapped to the same flow label by the ingress PE.  The flow
   label stimulates the correct ECMP load balancing behaviour in the
   PSN.
   On receipt of the pseudowire packet at the egress PE (which knows
   this additional label is present) the flow label is discarded without
   processing.

   Note that the flow label MUST NOT be an MPLS reserved label (values
   in the range 0..15) [5], but is otherwise unconstrained by the
   protocol.

   Considerations of the TTL value are described in the Security
   Considerations section of this document.  In the case of a pseudowire
   there are no lower restrictions on the label value since the TTL is
   never the top label.  The designers of the generalized solution [13].

2.  Native Service Processing Function

   The Native Service Processing (NSP) function is a component of a PE
   that has knowledge of the structure of the emulated service and is
   able to take action on the service outside the scope of the
   pseudowire.  In this case it is required that the NSP in the ingress
   PE identify flows, or groups of flows within the service, and
   indicate the flow (group) identity of each packet as it is passed to
   the pseudowire forwarder.  Since this is an NSP function, by
   definition, the method used to identify a flow is outside the scope
   of the pseudowire design.  Similarly, since the NSP is internal to
   the PE, the method of flow indication to the pseudowire forwarder is
   outside the scope of this document.

3.  Pseudowire Forwarder

   The pseudowire forwarder must be provided with a method of mapping
   flows to load balanced paths.

   The forwarder must generate a label for the flow or group of flows.
   How the load balance label values are determined is outside the scope
   of this document; however, the load balance label allocated to a flow
   MUST NOT be an MPLS reserved label and SHOULD remain constant for the
   life of the flow.
   It is recommended that the method chosen to generate the load
   balancing labels introduces a high degree of entropy in their values,
   to maximise the entropy presented to the ECMP path selection
   mechanism in the LSRs in the PSN, and hence distribute the flows as
   evenly as possible over the available PSN ECMP paths.  The forwarder
   at the ingress PE prepends the pseudowire control word (if
   applicable), and then pushes the flow label, followed by the
   pseudowire label.

   The forwarder at the egress PE uses the pseudowire label to identify
   the pseudowire.  From the context associated with the pseudowire
   label, the egress PE can determine whether a flow label is present.
   If a flow label is present, the label is discarded.

   All other pseudowire forwarding operations are unmodified by the
   inclusion of the flow label.

3.1.  Encapsulation

   The PWE3 Protocol Stack Reference Model, modified to include the flow
   label, is shown in Figure 1 below.

   +-------------+                                +-------------+
   |  Emulated   |                                |  Emulated   |
   |  Ethernet   |                                |  Ethernet   |
   | (including  |        Emulated Service        | (including  |
   |    VLAN)    |<==============================>|    VLAN)    |
   |  Services   |                                |  Services   |
   +-------------+                                +-------------+
   |    Flow     |                                |    Flow     |
   +-------------+           Pseudowire           +-------------+
   |Demultiplexer|<==============================>|Demultiplexer|
   +-------------+                                +-------------+
   |    PSN      |           PSN Tunnel           |    PSN      |
   |    MPLS     |<==============================>|    MPLS     |
   +-------------+                                +-------------+
   |  Physical   |                                |  Physical   |
   +-----+-------+                                +-----+-------+

             Figure 1: PWE3 Protocol Stack Reference Model

   The encapsulation of a pseudowire with a flow label is shown in
   Figure 2 below.

   +-------------------------------+
   |      MPLS Tunnel label(s)     | n*4 octets (four octets per label)
   +-------------------------------+
   |            PW label           | 4 octets
   +-------------------------------+
   |           Flow label          | 4 octets
   +-------------------------------+
   |     Optional Control Word     | 4 octets
   +-------------------------------+
   |            Payload            |
   |                               |
   |                               | n octets
   |                               |
   +-------------------------------+

      Figure 2: Encapsulation of a pseudowire with a pseudowire load
                            balancing label

4.  Signaling the Presence of the Flow Label

   When using the signalling procedures in [6], there is a Pseudowire
   Interface Parameter Sub-TLV type used to synchronize the flow label
   states between the ingress and egress PEs.

   The absence of a flow label (FL) TLV by either party indicates that
   the PE concerned is unable to recognize this TLV, and the sender of
   the FL TLV MUST send a new label mapping without the FL TLV.  This
   preserves backwards compatibility with existing PEs that do not
   understand the FL TLV or that cannot, or do not wish to, process the
   flow label.

   A PE that wishes to use a flow label sends an FL TLV with the F bit
   set.  A PE that can correctly process a flow label and is willing to
   receive one, but does not wish to send a flow label, sends an FL TLV
   with the F bit clear.  A PE that sends an FL TLV with the F bit set
   and receives an FL TLV with or without the F bit set MUST include the
   flow label between the pseudowire label and the control word (or, if
   the control word is not present, between the pseudowire label and the
   pseudowire payload).

   If PWE3 signalling [6] is not in use for a pseudowire, then whether
   the flow label is used MUST be identically provisioned in both PEs at
   the pseudowire endpoints.  If there is no provisioning support for
   this option, the default behaviour is not to include the flow label.

   Note that what is signalled is the desire to include the flow label
   in the label stack.  The value of the label is a local matter for the
   ingress PE, and the label value itself is not signalled.

4.1.  Structure of Flow Label TLV

   The structure of the flow label TLV is shown in Figure 3.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      FL       |    Length     |F|        must be zero         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 3: Flow Label TLV

   Where:

   o  FL is the flow label TLV identifier assigned by IANA.

   o  Length is the length of the TLV in octets and is 4.

   o  When F=1 a flow label will be pushed.  When F=0 a flow label will
      not be pushed.

5.  OAM

   The following OAM considerations apply to this method of load
   balancing.

   Where the OAM is only to be used to perform a basic test that the
   pseudowires have been configured at the PEs, VCCV [7] messages may be
   sent using any load balance pseudowire path, i.e. using any value for
   the flow label.

   Where it is required to verify that a pseudowire is fully functional
   for all flows, VCCV [7] connection verification messages MUST be sent
   over each ECMP path to the pseudowire egress PE.  This problem is
   difficult to solve and scales poorly.  We believe that this problem
   is addressed by the following two methods:

   1.  If a failure occurs within the PSN, this failure will normally be
       detected by the PSN's IGP (link/node failure, link or BFD or IGP
       hello detection), and the IGP convergence will naturally modify
       the ECMP set of network paths between the ingress and egress
       PEs.  Hence the PW is only impacted during the normal IGP
       convergence time.

   2.  If the failure is related to the individual corruption of an LFIB
       entry in a router, then only the network path using that specific
       entry is impacted.  If the PW is load balanced over multiple
       network paths, then this failure can only be detected if, by
       chance, the transported OAM flow is mapped onto the impacted
       network path, or all paths are tested.
       This type of error may be better solved by other means such as
       LSP self test [14].

   To troubleshoot the MPLS PSN, including multiple paths, the
   techniques described in [8] and [9] can be used.

   Where the pseudowire OAM is carried out of band (VCCV Type 2) it is
   necessary to insert an "MPLS Router Alert Label" in the label stack.
   The resultant label stack is as follows:

   +-------------------------------+
   |      MPLS Tunnel label(s)     | n*4 octets (four octets per label)
   +-------------------------------+
   |       Router Alert label      | 4 octets
   +-------------------------------+
   |            PW label           | 4 octets
   +-------------------------------+
   |           Flow label          | 4 octets
   +-------------------------------+
   |     Optional Control Word     | 4 octets
   +-------------------------------+
   |            Payload            |
   |                               |
   |                               | n octets
   |                               |
   +-------------------------------+

                Figure 4: Use of Router Alert Label

6.  Applicability

   A node within the PSN is not able to perform deep-packet-inspection
   (DPI) of the PW as the PW technology is not self-describing: the
   structure of the PW payload is only known to the ingress and egress
   PE devices.  The method proposed in this document provides a
   statistical mitigation of the problem of load balance in those cases
   where a PE is able to discern flows embedded in the traffic received
   on the attachment circuit.

   The methods described in this document are transparent to the PSN and
   as such do not require any new capability from the PSN.

   The requirement to load-balance over multiple PSN paths occurs when
   the ratio between the PW access speed and the PSN's core link
   bandwidth is large (e.g. >= 10%).  ATM and FR are unlikely to meet
   this property.  Ethernet does, and this is the reason why this
   document focuses on Ethernet.  Applications for other high-access-
   bandwidth PWs (e.g. Fibre Channel) may be defined in the future.
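
   The ratio test in the preceding paragraph can be written out as a
   small calculation.  The following Python sketch is illustrative only:
   the function names and the example link speeds are assumptions, and
   the 10% threshold is simply the example figure quoted above, not a
   normative value.

```python
def pw_to_core_ratio(pw_access_bps: int, core_link_bps: int) -> float:
    """Ratio of the PW access speed to the PSN core link bandwidth."""
    return pw_access_bps / core_link_bps


def may_need_load_balancing(pw_access_bps: int, core_link_bps: int,
                            threshold: float = 0.10) -> bool:
    """True when the ratio reaches the (example) 10% threshold."""
    return pw_to_core_ratio(pw_access_bps, core_link_bps) >= threshold


# Hypothetical deployments, with speeds in bits per second.
ethernet_10g_over_40g = may_need_load_balancing(10_000_000_000, 40_000_000_000)
atm_155m_over_10g = may_need_load_balancing(155_000_000, 10_000_000_000)
```

   Under these assumed speeds a 10 Gbps Ethernet PW on 40 Gbps core
   links has a ratio of 0.25 and meets the threshold, while a 155 Mbps
   ATM PW on 10 Gbps core links has a ratio of about 0.016 and does not.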

   This design applies to MPLS pseudowires where it is meaningful to
   deconstruct the packets presented to the ingress PE into flows.  The
   mechanism described in this document promotes the distribution of
   flows within the pseudowire over different network paths.  This in
   turn means that whilst packets within a flow are delivered in order
   (subject to normal IP delivery perturbations due to topology
   variation), order is not maintained amongst packets of different
   flows.  It is not proposed to associate a different sequence number
   with each flow.  If sequence number support is required, this
   mechanism is not applicable.

   Where it is known that the traffic carried by the Ethernet pseudowire
   is IP, the methods of identifying the flows are well known and can be
   applied.  Such methods typically include hashing on the source and
   destination addresses, the protocol ID and higher-layer flow-
   dependent fields such as TCP/UDP ports, L2TPv3 Session IDs, etc.

   Where it is known that the traffic carried by the Ethernet pseudowire
   is non-IP, techniques used for link bundling between Ethernet
   switches may be reused.  In this case, however, the latency
   distribution would be larger than is found in the link bundle case.
   The acceptability of the increased latency is for further study.  Of
   particular importance, Ethernet control frames SHOULD always be
   mapped to the same PSN path to ensure in-order delivery.

6.1.  ECMP

   ECMP in packet switched networks is statistical in nature.  The
   mapping of flows to a particular path does not take into account the
   bandwidth of the flow being mapped or the current bandwidth usage of
   the members of the ECMP set.  This simplification works well when the
   distribution of flows is evenly spread over the ECMP set and there
   are a large number of flows that have low bandwidth relative to the
   paths.
   A random allocation of a flow to a path provides a good approximation
   to an even spread, provided polarization effects are avoided.  The
   method proposed in this document has the same statistical properties
   as an IP PSN.

   ECMP is a load-sharing mechanism that is based on sharing the load
   over a number of layer 3 paths through the PSN.  Often, however,
   multiple links exist between a pair of LSRs that are considered by
   the IGP to be a single link.  These are known as link bundles.  The
   mechanism described in this document can also be used to distribute
   the flows within a pseudowire over the members of the link bundle by
   using the flow label value to identify candidate flows.  How that
   mapping takes place is outside the scope of this specification.
   Similar considerations apply to link aggregation groups.

   In the ECMP case and the link bundling case, the NSP may attempt to
   take bandwidth into consideration when allocating groups of flows to
   a common path.  That is permitted, but it must be borne in mind that
   the semantics of a label stack entry (LSE) as defined by [5] cannot
   be modified, the value of the flow label cannot be modified at any
   point on the LSP, and the interpretation of bit patterns in or values
   of the flow label by an LSR are undefined.

   A different type of load balancing is the desire to carry a
   pseudowire over a set of PSN links in which the bandwidth of members
   of the link set is less than the bandwidth of the pseudowire.  This
   problem is addressed in [15].  Such a mechanism can be considered
   complementary to this mechanism.

6.2.  Link Aggregation Groups

   Link Aggregation (LAG) is used to bond together several physical
   circuits between two adjacent nodes so they appear to higher-layer
   protocols as a single, higher bandwidth "virtual" pipe.  These may
   co-exist in various parts of a given network.
   An advantage of LAGs is that they reduce the number of routing and
   signaling protocol adjacencies between devices, reducing control
   plane processing overhead.  As with ECMP, a key problem related to
   LAG is that, due to inefficiencies in LAG load-distribution
   algorithms, a particular component link may experience congestion.
   The mechanism proposed here may be able to assist in producing a more
   uniform flow distribution.

   The same considerations requiring a flow to go over a single member
   of an ECMP path set apply to a member of a LAG.

6.3.  The Single Large Flow Case

   Clearly the operator should make sure that the service offered using
   PW technology and the method described in this document does not
   exceed the maximum planned link capacity unless it can be guaranteed
   that it conforms to the Internet traffic profile of a very large
   number of small flows.

   If the payload on a PW is made of a single inner flow (e.g. an
   encrypted connection between two routers), or the flow identifiers
   are too deeply buried in the packet, then the functionality described
   in this document does not give any benefits, though neither does it
   cause harm relative to the existing situation.  The most common case
   where a single flow dominates the traffic on a PW is when it is used
   to transport enterprise traffic.  Enterprise traffic may well consist
   of a single large TCP flow, or encrypted flows that cannot be
   handled by the methods described in this document.

   An operator has five options under these circumstances:

   1.  The operator can do nothing and the system will work as it does
       without the flow label.

   2.  The operator can make the customer aware that the service
       offering has a restriction on flow bandwidth and police flows to
       that restriction.
       This would allow customers offering multiple flows to use a
       larger fraction of their access bandwidth, whilst preventing a
       single flow from consuming a fraction of internal link bandwidth
       that the operator considered excessive.

   3.  The operator could configure the ingress PE to assign a constant
       flow label to all high bandwidth flows so that only one path was
       affected by these flows.

   4.  The operator could configure the ingress PE to assign a random
       flow label to all high bandwidth flows so as to minimise the
       disruption to the network at the cost of out of order traffic to
       the user.

   5.  The operator could configure the ingress PE to assign a label of
       special significance to all high bandwidth flows so that some
       other action (not specified in this document) could be taken on
       the flow.

   The issues described above are mitigated by the following two
   factors:

   o  Firstly, the customer of a high-bandwidth PW service has an
      incentive to get the best transport service because an inefficient
      use of the PSN leads to jitter and eventually to loss of the PW's
      payload.

   o  Secondly, the customer is usually able to tailor their
      applications to generate many flows in the PSN.  A well-known
      example is massive data transport between servers which use many
      parallel TCP sessions.  This same technique can be used by any
      transport protocol: multiple UDP ports, multiple L2TPv3 Session
      IDs, or multiple GRE keys may be used to decompose a large flow
      into smaller components.  This approach may be applied to IPsec
      where multiple SPIs may be allocated to the same security
      association.

7.  Applicability to MPLS

   A further application of this technique would be to create a basis
   for hash diversity without having to peek below the label stack for
   IP traffic carried over LDP LSPs.  Work on the generalization of this
   to MPLS has been described in draft-kompella-mpls-entropy-label.
   This can be regarded as a complementary but distinct approach since,
   although similar considerations may apply to the identification of
   flows and the allocation of flow label values, the flow labels are
   imposed by different network components and the associated signalling
   mechanisms are different.

8.  Security Considerations

   The pseudowire generic security considerations described in [11] and
   the security considerations applicable to a specific pseudowire type
   (for example, in the case of an Ethernet pseudowire [10]) apply.

   The ingress PE SHOULD take steps to ensure that the load-balance
   label is not used as a covert channel.

   It is useful to give consideration to the choice of TTL value in the
   flow label LSE.  Since the flow label is at the bottom of stack and,
   even when PHP is employed, will on arrival at the egress PE be
   preceded by the PW label, the flow label TTL MAY be set to a value of
   1.  This will prevent the packet being inadvertently forwarded based
   on the value of the flow label.  Note that this may be a departure
   from considerations that apply to the general MPLS case.

9.  IANA Considerations

   IANA is requested to allocate the next available value from the IETF
   Consensus range in the Pseudowire Interface Parameters Sub-TLV type
   Registry as a Flow Label indicator.

   Parameter  Length  Description

   TBD        4       Load Balancing Label

10.  Congestion Considerations

   The congestion considerations applicable to pseudowires as described
   in [11] and any additional congestion considerations developed at the
   time of publication apply to this design.

   The ability to explicitly configure a PW to leverage the availability
   of multiple ECMP paths is beneficial to capacity planning as, all
   other parameters being constant, the statistical multiplexing of a
   larger number of smaller flows is more efficient than with a smaller
   number of larger flows.
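
   The statistical multiplexing claim can be made concrete under a
   simplified model that is not taken from this document: assume the
   traffic consists of n equal-sized flows, each assigned independently
   and uniformly at random to one of k paths.  The number of flows on a
   given path is then binomially distributed, and the coefficient of
   variation (standard deviation divided by mean) of the per-path load
   works out to sqrt((k - 1) / n).  The Python sketch below, with the
   hypothetical function name load_cv, evaluates that expression.

```python
import math


def load_cv(n_flows: int, n_paths: int) -> float:
    """Coefficient of variation of the load on one path.

    Model assumption: n_flows equal-sized flows, each placed
    independently and uniformly at random on one of n_paths paths.
    """
    return math.sqrt((n_paths - 1) / n_flows)


few_large = load_cv(n_flows=4, n_paths=4)      # same total load, 4 flows
many_small = load_cv(n_flows=400, n_paths=4)   # same total load, 400 flows
```

   For the same total load over four paths, splitting the traffic into
   100 times as many flows reduces the relative spread of the per-path
   load by a factor of ten in this model, which is the sense in which
   many small flows multiplex more efficiently than a few large ones.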

   Note that if the classification into flows is only performed on IP
   packets, the behaviour of those flows in the face of congestion will
   be as already defined by the IETF for packets of that type and no
   additional congestion processing is required.

   Where flows that are not IP are classified, pseudowire congestion
   avoidance must be applied to each non-IP load balance group.

11.  Acknowledgements

   The authors wish to thank Joerg Kuechemann, Wilfried Maas, Luca
   Martini, Mark Townsley, Kireeti Kompella and Lucy Yong for valuable
   comments on this document.

12.  References

12.1.  Normative References

   [1]   Bradner, S., "Key words for use in RFCs to Indicate Requirement
         Levels", BCP 14, RFC 2119, March 1997.

   [2]   Bryant, S., Swallow, G., Martini, L., and D. McPherson,
         "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for Use
         over an MPLS PSN", RFC 4385, February 2006.

   [3]   Swallow, G., Bryant, S., and L. Andersson, "Avoiding Equal Cost
         Multipath Treatment in MPLS Networks", BCP 128, RFC 4928,
         June 2007.

   [4]   Vainshtein, A. and YJ. Stein, "Structure-Agnostic Time Division
         Multiplexing (TDM) over Packet (SAToP)", RFC 4553, June 2006.

   [5]   Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y., Farinacci,
         D., Li, T., and A. Conta, "MPLS Label Stack Encoding",
         RFC 3032, January 2001.

   [6]   Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G. Heron,
         "Pseudowire Setup and Maintenance Using the Label Distribution
         Protocol (LDP)", RFC 4447, April 2006.

   [7]   Nadeau, T. and C. Pignataro, "Pseudowire Virtual Circuit
         Connectivity Verification (VCCV): A Control Channel for
         Pseudowires", RFC 5085, December 2007.

   [8]   Allan, D. and T. Nadeau, "A Framework for Multi-Protocol Label
         Switching (MPLS) Operations and Management (OAM)", RFC 4378,
         February 2006.

   [9]   Kompella, K. and G. Swallow, "Detecting Multi-Protocol Label
         Switched (MPLS) Data Plane Failures", RFC 4379, February 2006.

   [10]  Martini, L., Rosen, E., El-Aawar, N., and G. Heron,
         "Encapsulation Methods for Transport of Ethernet over MPLS
         Networks", RFC 4448, April 2006.

12.2.  Informative References

   [11]  Bryant, S. and P. Pate, "Pseudo Wire Emulation Edge-to-Edge
         (PWE3) Architecture", RFC 3985, March 2005.

   [12]  Zinin, A., Torvi, R., Choudhury, G., Martin, C., Imhoff, B.,
         and D. Fedyk, "Basic Specification for IP Fast-Reroute: Loop-
         free Alternates", draft-ietf-rtgwg-ipfrr-spec-base-12 (work in
         progress), March 2008.

   [13]  Kompella, K. and S. Amante, "The Use of Entropy Labels in MPLS
         Forwarding", draft-kompella-mpls-entropy-label-00 (work in
         progress), July 2008.

   [14]  Swallow, G., "Label Switching Router Self-Test",
         draft-ietf-mpls-lsr-self-test-07 (work in progress), May 2007.

   [15]  Stein, Y., Mendelsohn, I., and R. Insler, "PW Bonding",
         draft-stein-pwe3-pwbonding-01 (work in progress),
         November 2008.

Authors' Addresses

   Stewart Bryant (editor)
   Cisco Systems
   250 Longwater Ave
   Reading  RG2 6GB
   United Kingdom

   Phone: +44-208-824-8828
   Email: stbryant@cisco.com


   Clarence Filsfils
   Cisco Systems
   Brussels
   Belgium

   Email: cfilsfil@cisco.com


   Ulrich Drafz
   Deutsche Telekom
   Muenster
   Germany

   Email: Ulrich.Drafz@t-com.net


   Vach Kompella
   Alcatel-Lucent

   Email: vach.kompella@alcatel-lucent.com


   Joe Regan
   Alcatel-Lucent

   Email: joe.regan@alcatel-lucent.com


   Shane Amante
   Level 3 Communications

   Email: shane@castlepoint.net