idnits 2.17.1

draft-ietf-pwe3-fat-pw-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** The document seems to lack a License Notice according IETF Trust
     Provisions of 28 Dec 2009, Section 6.b.i or Provisions of 12 Sep 2009
     Section 6.b -- however, there's a paragraph with a matching beginning.
     Boilerplate error?

     (You're using the IETF Trust Provisions' Section 6.b License Notice from
     12 Feb 2009 rather than one of the newer Notices.  See
     https://trustee.ietf.org/license-info/.)

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (September 23, 2009) is 5328 days in the past.  Is
     this intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  ** Obsolete normative reference: RFC 4379 (Obsoleted by RFC 8029)

  ** Obsolete normative reference: RFC 4447 (Obsoleted by RFC 8077)

  == Outdated reference: A later version (-11) exists of
     draft-ietf-bfd-base-09

  == Outdated reference: A later version (-02) exists of
     draft-kompella-mpls-entropy-label-00

     Summary: 3 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------

PWE3                                                      S. Bryant, Ed.
Internet-Draft                                               C. Filsfils
Intended status: Standards Track                           Cisco Systems
Expires: March 27, 2010                                         U. Drafz
                                                        Deutsche Telekom
                                                             V. Kompella
                                                                J. Regan
                                                          Alcatel-Lucent
                                                               S. Amante
                                                  Level 3 Communications
                                                      September 23, 2009

          Flow Aware Transport of Pseudowires over an MPLS PSN
                       draft-ietf-pwe3-fat-pw-01

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on March 27, 2010.

Copyright Notice

   Copyright (c) 2009 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document (http://trustee.ietf.org/license-info).
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.

Abstract

   Where the payload carried over a pseudowire carries a number of
   identifiable flows it can in some circumstances be desirable to carry
   those flows over the equal cost multiple paths (ECMPs) that exist in
   the packet switched network.  Most forwarding engines are able to
   hash based on label stacks and use this to balance flows over ECMPs.
   This draft describes a method of identifying the flows, or flow
   groups, to the label switched routers by including an additional
   label in the label stack.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC2119 [RFC2119].

Table of Contents

   1.  Introduction
     1.1.  ECMP in Label Switched Routers
     1.2.  Flow Label
   2.  Native Service Processing Function
   3.  Pseudowire Forwarder
     3.1.  Encapsulation
   4.  Signaling the Presence of the Flow Label
     4.1.  Structure of Flow Label TLV
   5.  OAM
   6.  Applicability
     6.1.  Equal Cost Multiple Paths
     6.2.  Link Aggregation Groups
     6.3.  The Single Large Flow Case
     6.4.  MPLS-TP
   7.  Applicability to MPLS
   8.  Security Considerations
   9.  IANA Considerations
   10. Congestion Considerations
   11. Acknowledgements
   12. References
     12.1.  Normative References
     12.2.  Informative References
   Authors' Addresses

1.  Introduction

   A pseudowire (PW) [RFC3985] is normally transported over one single
   network path, even if Equal Cost Multiple Paths (ECMP) exist between
   the ingress and egress PW provider edge (PE) equipment [RFC4385]
   [RFC4928].  This is required to preserve the characteristics of the
   emulated service (e.g. to avoid misordering SAToP pseudowire packets
   [RFC4553] or subjecting the packets to unusable inter-arrival times).
   The use of a single path to preserve order remains the default mode
   of operation of a pseudowire (PW).  The new capability proposed in
   this document is an OPTIONAL mode which may be used when the use of
   ECMP paths is known to be beneficial (and not harmful) to the
   operation of the PW.

   Some pseudowires are used to transport large volumes of IP traffic
   between routers at two locations.  One example of this is the use of
   an Ethernet pseudowire to create a virtual direct link between a pair
   of routers.  Such pseudowires may carry from hundreds of Mbps to
   Gbps of traffic.  Such pseudowires do not require strict ordering to
   be preserved between packets of the pseudowire.  They only require
   ordering to be preserved within the context of each individual
   transported IP flow.  Some operators have requested the ability to
   explicitly configure such a pseudowire to leverage the availability
   of multiple ECMP paths.
   This allows for better capacity planning as the statistical
   multiplexing of a larger number of smaller flows is more efficient
   than with a smaller set of larger flows.  Although Ethernet is used
   as an example above, the mechanisms described in this draft are
   general mechanisms that may be applied to any pseudowire type in
   which there are identifiable flows, and in which there is no
   requirement to preserve the order between those flows.

   Typically, forwarding hardware can deduce that an IP payload is being
   directly carried by an MPLS label stack, and is capable of looking at
   some fields in packets to construct hash buckets for conversations or
   flows.  However, an intermediate node has no information on the type
   of pseudowire being carried in the packet.  This limits the forwarder
   at the intermediate node to only being able to make an ECMP choice
   based on a hash of the label stack.  In the case of a pseudowire
   emulating a high bandwidth trunk, the granularity obtained by hashing
   the default label stack is inadequate for satisfactory load-
   balancing.  The ingress node, however, is in the special position of
   being able to look at the un-encapsulated packet and spread flows
   amongst any available ECMP paths, or even any Loop-Free Alternates
   [RFC5286].  This draft proposes a method to introduce granularity on
   the hashing of traffic running over pseudowires by introducing an
   additional label, chosen by the ingress node, and placed at the
   bottom of the label stack.

   In addition to providing an indication of the flow structure for use
   in ECMP forwarding decisions, the mechanism described in this
   document may also be used to select flows for distribution over an
   802.3ad link aggregation group that has been used in an MPLS network.

1.1.  ECMP in Label Switched Routers

   Label switched routers commonly hash the label stack or some elements
   of the label stack as a method of discriminating between flows, in
   order to distribute those flows over the available equal cost
   multiple paths that exist in the network.  Since the label at the
   bottom of stack is usually the label most closely associated with the
   flow, this normally provides the greatest entropy, and hence is
   usually included in the hash.  This draft describes a method of
   adding an additional label at the bottom of stack in order to
   facilitate the load balancing of the flows within a pseudowire over
   the available ECMPs.  A similar design for general MPLS use has also
   been proposed [I-D.kompella-mpls-entropy-label]; however, that is
   outside the scope of this draft.

   An alternative method of load balancing by creating a number of
   pseudowires and distributing the flows amongst them was considered,
   but was rejected because:

   o  It did not introduce as much entropy as the load balance label
      method.

   o  It required additional pseudowires to be set up and maintained.

1.2.  Flow Label

   An additional label is interposed between the pseudowire label and
   the control word, or if the control word is not present, between the
   pseudowire label and the pseudowire payload.  This additional label
   is called the Flow label.  Indivisible flows within the pseudowire
   MUST be mapped to the same Flow label by the ingress PE.  The flow
   label stimulates the correct ECMP load balancing behaviour in the
   packet switched network (PSN).  On receipt of the pseudowire packet
   at the egress PE (which knows this additional label is present) the
   flow label is discarded without processing.

   Note that the flow label MUST NOT be an MPLS reserved label (values
   in the range 0..15) [RFC3032], but is otherwise unconstrained by the
   protocol.
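
   As an illustration only, the rules above (one label per indivisible
   flow, no reserved label values, flow label at the bottom of the
   stack) could be sketched as follows.  This is not part of the
   specification: the function names, the choice of SHA-256, the flow
   key fields, the default TTL of 255 for the outer labels and the TTL
   of 1 for the flow label are all assumptions made for the example.
   The only encoding fact relied upon is the label stack entry layout
   of [RFC3032] (20-bit label, 3-bit traffic class field, bottom-of-
   stack bit, 8-bit TTL).

```python
import hashlib
import struct

RESERVED_MAX = 15        # label values 0..15 are reserved [RFC3032]
LABEL_SPACE = 1 << 20    # the label field of a stack entry is 20 bits


def flow_label(src_ip, dst_ip, proto=0, src_port=0, dst_port=0):
    """Map a flow key to a non-reserved 20-bit label (hypothetical helper).

    The same key always yields the same label, so an indivisible flow
    keeps a single flow label.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    value = int.from_bytes(digest[:4], "big")
    # Fold the hash into the range 16 .. 2^20 - 1.
    return RESERVED_MAX + 1 + value % (LABEL_SPACE - RESERVED_MAX - 1)


def stack_entry(label, bottom, ttl, tc=0):
    """Encode one label stack entry: label(20) | TC(3) | S(1) | TTL(8)."""
    word = (label << 12) | (tc << 9) | (int(bottom) << 8) | ttl
    return struct.pack("!I", word)


def encode_stack(tunnel_labels, pw_label, fl, ttl=255):
    """Tunnel label(s), then the PW label, then the flow label.

    Only the flow label carries the bottom-of-stack bit.  The outer TTL
    of 255 and the flow label TTL of 1 are assumptions for this sketch.
    """
    entries = [stack_entry(label, False, ttl) for label in tunnel_labels]
    entries.append(stack_entry(pw_label, False, ttl))
    entries.append(stack_entry(fl, True, 1))
    return b"".join(entries)
```

   For example, encode_stack([1000], 2000, flow_label("192.0.2.1",
   "198.51.100.7", 6, 49152, 443)) yields three 4-octet entries, and
   repeated calls for the same address and port tuple produce the same
   flow label.  A cryptographic hash is used here only because it
   spreads values evenly; any function with similar dispersion would
   serve.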

   Considerations of the TTL value are described in the Security
   Considerations section of this document.  The flow label can never
   become the top label in normal operation, and hence the TTL in the
   flow label is never used to determine whether the packet should be
   discarded due to TTL expiry.  Therefore there are no lower
   restrictions on the TTL value.

2.  Native Service Processing Function

   The Native Service Processing (NSP) function [RFC3985] is a component
   of a PE that has knowledge of the structure of the emulated service
   and is able to take action on the service outside the scope of the
   pseudowire.  In this case it is required that the NSP in the ingress
   PE identify flows, or groups of flows within the service, and
   indicate the flow (group) identity of each packet as it is passed to
   the pseudowire forwarder.  As an example, where the PW type is
   Ethernet, the NSP might parse the ingress Ethernet traffic and
   consider all of the IP traffic.  This traffic could then be
   categorised into flows by considering all traffic with the same
   source and destination address pair to be a single indivisible flow.
   Since this is an NSP function, by definition, the method used to
   identify a flow is outside the scope of the pseudowire design.
   Similarly, since the NSP is internal to the PE, the method of flow
   indication to the pseudowire forwarder is outside the scope of this
   document.

3.  Pseudowire Forwarder

   The pseudowire forwarder must be provided with a method of mapping
   flows to load balanced paths.

   The forwarder must generate a label for the flow or group of flows.
   How the load balance label values are determined is outside the scope
   of this document; however, the load balance label allocated to a flow
   MUST NOT be an MPLS reserved label and SHOULD remain constant for the
   life of the flow.
   It is recommended that the method chosen to generate the load
   balancing labels introduces a high degree of entropy in their values,
   to maximise the entropy presented to the ECMP path selection
   mechanism in the LSRs in the PSN, and hence distribute the flows as
   evenly as possible over the available PSN ECMP paths.  The forwarder
   at the ingress PE prepends the pseudowire control word (if
   applicable), and then pushes the flow label, followed by the
   pseudowire label.

   The forwarder at the egress PE uses the pseudowire label to identify
   the pseudowire.  From the context associated with the pseudowire
   label, the egress PE can determine whether a flow label is present.
   If a flow label is present, the label is discarded.

   All other pseudowire forwarding operations are unmodified by the
   inclusion of the flow label.

3.1.  Encapsulation

   The PWE3 Protocol Stack Reference Model modified to include the flow
   label is shown in Figure 1 below.

   +-------------+                                +-------------+
   |  Emulated   |                                |  Emulated   |
   |  Ethernet   |                                |  Ethernet   |
   | (including  |        Emulated Service        | (including  |
   |    VLAN)    |<==============================>|    VLAN)    |
   |  Services   |                                |  Services   |
   +-------------+                                +-------------+
   |    Flow     |                                |    Flow     |
   +-------------+           Pseudowire           +-------------+
   |Demultiplexer|<==============================>|Demultiplexer|
   +-------------+                                +-------------+
   |    PSN      |           PSN Tunnel           |    PSN      |
   |   MPLS      |<==============================>|   MPLS      |
   +-------------+                                +-------------+
   |  Physical   |                                |  Physical   |
   +-----+-------+                                +-----+-------+

            Figure 1: PWE3 Protocol Stack Reference Model

   The encapsulation of a pseudowire with a flow label is shown in
   Figure 2 below.

   +-------------------------------+
   |      MPLS Tunnel label(s)     | n*4 octets (four octets per label)
   +-------------------------------+
   |           PW label            | 4 octets
   +-------------------------------+
   |          Flow label           | 4 octets
   +-------------------------------+
   |     Optional Control Word     | 4 octets
   +-------------------------------+
   |            Payload            |
   |                               |
   |                               | n octets
   |                               |
   +-------------------------------+

     Figure 2: Encapsulation of a pseudowire with a pseudowire load
                            balancing label

4.  Signaling the Presence of the Flow Label

   When using the signalling procedures in [RFC4447], there is a
   Pseudowire Interface Parameter Sub-TLV type used to synchronise the
   flow label states between the ingress and egress PEs.

   The absence of a flow label (FL) TLV from either party indicates that
   the PE concerned is unable to recognise this TLV, and the sender of
   the FL TLV MUST send a new label mapping without the FL TLV.  This
   preserves backwards compatibility with existing PEs that do not
   understand the FL TLV or that cannot, or do not wish to, process the
   flow label.

   A PE that wishes to use a flow label sends an FL TLV with the F bit
   set (see Section 4.1).  A PE that can correctly process a flow label
   and is willing to receive one, but does not wish to send a flow
   label, sends an FL TLV with the F bit clear.  A PE that sends an FL
   TLV with the F bit set and receives an FL TLV with or without the F
   bit set MUST include the flow label between the pseudowire label and
   the control word (or, if the control word is not present, between the
   pseudowire label and the pseudowire payload).

   If PWE3 signalling [RFC4447] is not in use for a pseudowire, then
   whether the flow label is used MUST be identically provisioned in
   both PEs at the pseudowire endpoints.  If there is no provisioning
   support for this option, the default behaviour is not to include the
   flow label.

   Note that what is signalled is the desire to include the flow label
   in the label stack.  The value of the label is a local matter for the
   ingress PE, and the label value itself is not signalled.

4.1.  Structure of Flow Label TLV

   The structure of the flow label TLV is shown in Figure 3.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      FL       |    Length     |F|        must be zero         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                      Figure 3: Flow Label TLV

   Where:

   o  FL is the flow label TLV identifier assigned by IANA.

   o  Length is the length of the TLV in octets and is 4.

   o  When F=1 a flow label will be pushed.  When F=0 a flow label will
      not be pushed.

5.  OAM

   The following OAM considerations apply to this method of load
   balancing.

   Where the OAM is only to be used to perform a basic test that the
   pseudowires have been configured at the PEs, VCCV [RFC5085] messages
   may be sent using any load balance pseudowire path, i.e. using any
   value for the flow label.

   Where it is required to verify that a pseudowire is fully functional
   for all flows, VCCV [RFC5085] connection verification messages MUST
   be sent over each ECMP path to the pseudowire egress PE.  This
   problem is difficult to solve and scales poorly.  We believe that
   this problem is addressed by the following two methods:

   1.  If a failure occurs within the PSN, this failure will normally be
       detected by the PSN's Interior Gateway Protocol (IGP) link/node
       failure detection mechanism (loss of light, bidirectional
       forwarding detection [I-D.ietf-bfd-base] or IGP hello detection),
       and the IGP convergence will naturally modify the ECMP set of
       network paths between the Ingress and Egress PEs.  Hence the PW
       is only impacted during the normal IGP convergence time.

   2.  If the failure is related to the individual corruption of a
       Label Forwarding Information dataBase (LFIB) entry in a router,
       then only the network path using that specific entry is impacted.
       If the PW is load balanced over multiple network paths, then this
       failure can only be detected if, by chance, the transported OAM
       flow is mapped onto the impacted network path, or all paths are
       tested.  This type of error may be better solved by other means
       such as LSP self test [I-D.ietf-mpls-lsr-self-test].

   To troubleshoot the MPLS PSN, including multiple paths, the
   techniques described in [RFC4378] and [RFC4379] can be used.

   Where the pseudowire OAM is carried out of band (VCCV Type 2)
   [RFC5085] it is necessary to insert an "MPLS Router Alert Label" in
   the label stack.  The resultant label stack is as follows:

   +-------------------------------+
   |      MPLS Tunnel label(s)     | n*4 octets (four octets per label)
   +-------------------------------+
   |      Router Alert label       | 4 octets
   +-------------------------------+
   |           PW label            | 4 octets
   +-------------------------------+
   |          Flow label           | 4 octets
   +-------------------------------+
   |     Optional Control Word     | 4 octets
   +-------------------------------+
   |            Payload            |
   |                               |
   |                               | n octets
   |                               |
   +-------------------------------+

                 Figure 4: Use of Router Alert Label

6.  Applicability

   A node within the PSN is not able to perform deep-packet-inspection
   (DPI) of the PW as the PW technology is not self-describing: the
   structure of the PW payload is only known to the ingress and egress
   PE devices.  The method proposed in this document provides a
   statistical mitigation of the problem of load balance in those cases
   where a PE is able to discern flows embedded in the traffic received
   on the attachment circuit.

   The methods described in this document are transparent to the PSN and
   as such do not require any new capability from the PSN.

   The requirement to load-balance over multiple PSN paths occurs when
   the ratio between the PW access speed and the PSN's core link
   bandwidth is large (e.g. >= 10%).
   ATM and FR are unlikely to meet this property.  Ethernet may have
   this property, and for that reason this document focuses on Ethernet.
   Applications for other high-access-bandwidth PWs (e.g. Fibre Channel)
   may be defined in the future.

   This design applies to MPLS pseudowires where it is meaningful to
   deconstruct the packets presented to the ingress PE into flows.  The
   mechanism described in this document promotes the distribution of
   flows within the pseudowire over different network paths.  This in
   turn means that whilst packets within a flow are delivered in order
   (subject to normal IP delivery perturbations due to topology
   variation), order is not maintained amongst packets of different
   flows.  It is not proposed to associate a different sequence number
   with each flow.  If sequence number support is required this
   mechanism is not applicable.

   Where it is known that the traffic carried by the Ethernet pseudowire
   is IP, the methods of identifying the flows are well known and can be
   applied.  Such methods typically include hashing on the source and
   destination addresses, the protocol ID and higher-layer flow-
   dependent fields such as TCP/UDP ports, L2TPv3 Session IDs etc.

   Where it is known that the traffic carried by the Ethernet pseudowire
   is non-IP, techniques used for link bundling between Ethernet
   switches may be reused.  In this case, however, the latency
   distribution would be larger than is found in the link bundle case.
   The acceptability of the increased latency is for further study.  Of
   particular importance, Ethernet control frames SHOULD always be
   mapped to the same PSN path to ensure in-order delivery.

6.1.  Equal Cost Multiple Paths

   ECMP in packet switched networks is statistical in nature.
   The mapping of flows to a particular path does not take into account
   the bandwidth of the flow being mapped or the current bandwidth usage
   of the members of the ECMP set.  This simplification works well when
   the distribution of flows is evenly spread over the ECMP set and
   there are a large number of flows that have low bandwidth relative to
   the paths.  The random allocation of a flow to a path provides a good
   approximation to an even spread of flows, provided that polarisation
   effects are avoided.  The method proposed in this document has the
   same statistical properties as an IP PSN.

   ECMP is a load-sharing mechanism that is based on sharing the load
   over a number of layer 3 paths through the PSN.  Often however
   multiple links exist between a pair of LSRs that are considered by
   the IGP to be a single link.  These are known as link bundles.  The
   mechanism described in this document can also be used to distribute
   the flows within a pseudowire over the members of the link bundle by
   using the flow label value to identify candidate flows.  How that
   mapping takes place is outside the scope of this specification.
   Similar considerations apply to link aggregation groups.

   In the ECMP case and the link bundling case the NSP may attempt to
   take bandwidth into consideration when allocating groups of flows to
   a common path.  That is permitted, but it must be borne in mind that
   the semantics of a label stack entry (LSE) as defined by [RFC3032]
   cannot be modified, the value of the flow label cannot be modified at
   any point on the LSP, and the interpretation of bit patterns in, or
   values of, the flow label by an LSR are undefined.

   A different type of load balancing is the desire to carry a
   pseudowire over a set of PSN links in which the bandwidth of members
   of the link set is less than the bandwidth of the pseudowire.  This
   problem is addressed in [I-D.stein-pwe3-pwbonding].
   Such a mechanism can be considered complementary to this mechanism.

6.2.  Link Aggregation Groups

   A Link Aggregation Group (LAG) is used to bond together several
   physical circuits between two adjacent nodes so they appear to
   higher-layer protocols as a single, higher bandwidth "virtual" pipe.
   These may co-exist in various parts of a given network.  An advantage
   of LAGs is that they reduce the number of routing and signalling
   protocol adjacencies between devices, reducing control plane
   processing overhead.  As with ECMP, the key problem related to LAGs
   is that due to inefficiencies in LAG load-distribution algorithms, a
   particular component of a LAG may experience congestion.  The
   mechanism proposed here may be able to assist in producing a more
   uniform flow distribution.

   The same considerations requiring a flow to go over a single member
   of an ECMP path set apply to a member of a LAG.

6.3.  The Single Large Flow Case

   Clearly the operator should make sure that the service offered using
   PW technology and the method described in this document does not
   exceed the maximum planned link capacity, unless it can be guaranteed
   that it conforms to the Internet traffic profile of a very large
   number of small flows.

   If the payload on a PW is made of a single inner flow (e.g. an
   encrypted connection between two routers), or the flow identifiers
   are too deeply buried in the packet, then the functionality described
   in this document does not give any benefits, though neither does it
   cause harm relative to the existing situation.  The most common case
   where a single flow dominates the traffic on a PW is when it is used
   to transport enterprise traffic.  Enterprise traffic may well consist
   of large single TCP flows, or encrypted flows that cannot be handled
   by the methods described in this document.

   An operator has five options under these circumstances:

   1.  The operator can do nothing and the system will work as it does
       without the flow label.

   2.  The operator can make the customer aware that the service
       offering has a restriction on flow bandwidth and police flows to
       that restriction.  This would allow customers offering multiple
       flows to use a larger fraction of their access bandwidth, whilst
       preventing a single flow from consuming a fraction of internal
       link bandwidth that the operator considered excessive.

   3.  The operator could configure the ingress PE to assign a constant
       flow label to all high bandwidth flows so that only one path was
       affected by these flows.

   4.  The operator could configure the ingress PE to assign a random
       flow label to all high bandwidth flows so as to minimise the
       disruption to the network at the cost of out of order traffic to
       the user.

   5.  The operator could configure the ingress to assign a label of
       special significance (such as a reserved label) to all high
       bandwidth flows so that some other action (not specified in this
       document) could be taken on the flow.

   The issues described above are mitigated by the following two
   factors:

   o  Firstly, the customer of a high-bandwidth PW service has an
      incentive to get the best transport service because an inefficient
      use of the PSN leads to jitter and eventually to loss to the PW's
      payload.

   o  Secondly, the customer is usually able to tailor their
      applications to generate many flows in the PSN.  A well-known
      example is massive data transport between servers which use many
      parallel TCP sessions.  This same technique can be used by any
      transport protocol: multiple UDP ports, multiple L2TPv3 Session
      IDs, multiple GRE keys may be used to decompose a large flow into
      smaller components.  This approach may be applied to IPsec
      [RFC4301] where multiple Security Parameters Indexes (SPIs) may
      be allocated to the same security association.
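
   Options 3 and 4 above can be pictured with the following sketch.  It
   is purely illustrative and not part of this specification: how an
   ingress PE measures the rate of a flow, the rate limit, the function
   name and the constant label value are all hypothetical.

```python
import random

RESERVED_MAX = 15            # label values 0..15 are reserved [RFC3032]
LABEL_SPACE = 1 << 20        # the label field is 20 bits wide
LARGE_FLOW_LABEL = 0x4C46    # hypothetical constant label for option 3


def label_for_packet(hashed_label, flow_rate_bps, limit_bps,
                     policy="constant", rng=random):
    """Choose the flow label for one packet (illustrative only).

    hashed_label   label the ingress PE would normally use for the flow
    flow_rate_bps  measured rate of the flow (measurement not shown)
    limit_bps      operator-chosen rate above which a flow is "large"
    policy         "constant" (option 3) or "random" (option 4)
    """
    if flow_rate_bps <= limit_bps:
        return hashed_label
    if policy == "constant":
        # Option 3: all large flows share one label, hence one path.
        return LARGE_FLOW_LABEL
    if policy == "random":
        # Option 4: a fresh non-reserved label per packet; this spreads
        # the load but allows the flow to be reordered.
        return rng.randrange(RESERVED_MAX + 1, LABEL_SPACE)
    raise ValueError(f"unknown policy: {policy!r}")
```

   Under option 3 the large flows still arrive in order but are
   concentrated on one path; under option 4 the per-packet label trades
   ordering for an even spread, as noted in the list above.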

6.4.  MPLS-TP

   The MPLS Transport Profile (MPLS-TP) [I-D.ietf-mpls-tp-requirements]
   requirement 44 states that "MPLS-TP SHOULD support mechanisms to
   enable the reserved bandwidth of a transport path to be decreased
   without impacting the existing traffic on that transport path,
   provided that the level of existing traffic is smaller than the
   reserved bandwidth following the decrease."  The flow aware transport
   of a PW reorders packets (albeit in an application friendly way), and
   therefore SHOULD NOT be deployed in a network conforming to the
   MPLS-TP.

7.  Applicability to MPLS

   A further application of this technique would be to create a basis
   for hash diversity without having to peek below the label stack for
   IP traffic carried over LDP LSPs.  Work on the generalisation of this
   to MPLS has been described in [I-D.kompella-mpls-entropy-label].
   This can be regarded as a complementary but distinct approach since,
   although similar considerations may apply to the identification of
   flows and the allocation of flow label values, the flow labels are
   imposed by different network components, and the associated
   signalling mechanisms are different.

8.  Security Considerations

   The pseudowire generic security considerations described in [RFC3985]
   and the security considerations applicable to a specific pseudowire
   type (for example, in the case of an Ethernet pseudowire [RFC4448])
   apply.

   The ingress PE SHOULD take steps to ensure that the load-balance
   label is not used as a covert channel.

   It is useful to give consideration to the choice of TTL value in the
   flow label stack entry [RFC3032].  The flow label is at the bottom of
   the label stack.  Therefore, even when penultimate hop popping is
   employed, it will always be preceded by the PW label on arrival at
   the PE.
   The flow label TTL will therefore never be considered by the MPLS
   forwarder, and hence MAY be set to a value of 1.  This will prevent
   the packet being inadvertently forwarded based on the value of the
   flow label.  Note that this may be a departure from considerations
   that apply to the general MPLS case.

9.  IANA Considerations

   IANA is requested to allocate the next available value from the IETF
   Consensus range in the Pseudowire Interface Parameters Sub-TLV type
   Registry as a Flow Label indicator.

   Parameter  Length  Description

   TBD        4       Load Balancing Label

10.  Congestion Considerations

   The congestion considerations applicable to pseudowires as described
   in [RFC3985] and any additional congestion considerations developed
   at the time of publication apply to this design.

   The ability to explicitly configure a PW to leverage the availability
   of multiple ECMP paths is beneficial to capacity planning as, all
   other parameters being constant, the statistical multiplexing of a
   larger number of smaller flows is more efficient than with a smaller
   number of larger flows.

   Note that if the classification into flows is only performed on IP
   packets, the behaviour of those flows in the face of congestion will
   be as already defined by the IETF for packets of that type and no
   additional congestion processing is required.

   Where flows that are not IP are classified, pseudowire congestion
   avoidance must be applied to each non-IP load balance group.

11.  Acknowledgements

   The authors wish to thank Eric Grey, Kireeti Kompella, Joerg
   Kuechemann, Wilfried Maas, Luca Martini, Mark Townsley, and Lucy Yong
   for valuable comments on this document.

12.  References

12.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
           Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
           Encoding", RFC 3032, January 2001.

[RFC4379]  Kompella, K. and G. Swallow, "Detecting Multi-Protocol
           Label Switched (MPLS) Data Plane Failures", RFC 4379,
           February 2006.

[RFC4385]  Bryant, S., Swallow, G., Martini, L., and D. McPherson,
           "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for
           Use over an MPLS PSN", RFC 4385, February 2006.

[RFC4447]  Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G.
           Heron, "Pseudowire Setup and Maintenance Using the Label
           Distribution Protocol (LDP)", RFC 4447, April 2006.

[RFC4448]  Martini, L., Rosen, E., El-Aawar, N., and G. Heron,
           "Encapsulation Methods for Transport of Ethernet over MPLS
           Networks", RFC 4448, April 2006.

[RFC4553]  Vainshtein, A. and YJ. Stein, "Structure-Agnostic Time
           Division Multiplexing (TDM) over Packet (SAToP)",
           RFC 4553, June 2006.

[RFC4928]  Swallow, G., Bryant, S., and L. Andersson, "Avoiding Equal
           Cost Multipath Treatment in MPLS Networks", BCP 128,
           RFC 4928, June 2007.

[RFC5085]  Nadeau, T. and C. Pignataro, "Pseudowire Virtual Circuit
           Connectivity Verification (VCCV): A Control Channel for
           Pseudowires", RFC 5085, December 2007.

12.2. Informative References

[I-D.ietf-bfd-base]
           Katz, D. and D. Ward, "Bidirectional Forwarding
           Detection", draft-ietf-bfd-base-09 (work in progress),
           February 2009.

[I-D.ietf-mpls-lsr-self-test]
           Swallow, G., "Label Switching Router Self-Test",
           draft-ietf-mpls-lsr-self-test-07 (work in progress),
           May 2007.

[I-D.ietf-mpls-tp-requirements]
           Niven-Jenkins, B., Brungard, D., Betts, M., Sprecher, N.,
           and S. Ueno, "MPLS-TP Requirements",
           draft-ietf-mpls-tp-requirements-10 (work in progress),
           August 2009.

[I-D.kompella-mpls-entropy-label]
           Kompella, K. and S.
           Amante, "The Use of Entropy Labels in MPLS Forwarding",
           draft-kompella-mpls-entropy-label-00 (work in progress),
           July 2008.

[I-D.stein-pwe3-pwbonding]
           Stein, Y., Mendelsohn, I., and R. Insler, "PW Bonding",
           draft-stein-pwe3-pwbonding-01 (work in progress),
           November 2008.

[RFC3985]  Bryant, S. and P. Pate, "Pseudo Wire Emulation Edge-to-
           Edge (PWE3) Architecture", RFC 3985, March 2005.

[RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
           Internet Protocol", RFC 4301, December 2005.

[RFC4378]  Allan, D. and T. Nadeau, "A Framework for Multi-Protocol
           Label Switching (MPLS) Operations and Management (OAM)",
           RFC 4378, February 2006.

[RFC5286]  Atlas, A. and A. Zinin, "Basic Specification for IP Fast
           Reroute: Loop-Free Alternates", RFC 5286, September 2008.

Authors' Addresses

Stewart Bryant (editor)
Cisco Systems
250 Longwater Ave
Reading RG2 6GB
United Kingdom

Phone: +44-208-824-8828
Email: stbryant@cisco.com

Clarence Filsfils
Cisco Systems
Brussels
Belgium

Email: cfilsfil@cisco.com

Ulrich Drafz
Deutsche Telekom
Muenster
Germany

Email: Ulrich.Drafz@t-com.net

Vach Kompella
Alcatel-Lucent

Email: vach.kompella@alcatel-lucent.com

Joe Regan
Alcatel-Lucent

Email: joe.regan@alcatel-lucent.com

Shane Amante
Level 3 Communications

Email: shane@castlepoint.net