PWE3                                                      S. Bryant, Ed.
Internet-Draft                                               C. Filsfils
Intended status: Standards Track                           Cisco Systems
Expires: July 31, 2010                                          U. Drafz
                                                        Deutsche Telekom
                                                             V. Kompella
                                                                J. Regan
                                                          Alcatel-Lucent
                                                                      S.
                                                                  Amante
                                                  Level 3 Communications
                                                        January 27, 2010

          Flow Aware Transport of Pseudowires over an MPLS PSN
                       draft-ietf-pwe3-fat-pw-03

Abstract

   Where the payload carried over a pseudowire carries a number of
   identifiable flows it can in some circumstances be desirable to carry
   those flows over the equal cost multiple paths (ECMPs) that exist in
   the packet switched network.  Most forwarding engines are able to
   hash based on label stacks and use this to balance flows over ECMPs.
   This draft describes a method of identifying the flows, or flow
   groups, to the label switched routers by including an additional
   label in the label stack.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC2119 [RFC2119].

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on July 31, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

Table of Contents

   1. Introduction
     1.1. ECMP in Label Switched Routers
     1.2. Flow Label
   2. Native Service Processing Function
   3. Pseudowire Forwarder
     3.1. Encapsulation
   4. Signaling the Presence of the Flow Label
     4.1. Structure of Flow Label Sub-TLV
   5. Multi-Segment Pseudowires
   6. OAM
   7. Applicability of FAT PWs
     7.1. Equal Cost Multiple Paths
     7.2. Link Aggregation Groups
     7.3. The Single Large Flow Case
     7.4. MPLS-TP
   8. Applicability to MPLS
   9. Security Considerations
   10. IANA Considerations
   11. Congestion Considerations
   12. Acknowledgements
   13. References
     13.1. Normative References
     13.2. Informative References
   Authors' Addresses

1. Introduction

   A pseudowire (PW) [RFC3985] is normally transported over one single
   network path, even if multiple Equal Cost Multiple Paths (ECMP) exist
   between the ingress and egress PW provider edge (PE) equipment
   [RFC4385] [RFC4928].  This is required to preserve the
   characteristics of the emulated service (e.g. to avoid misordering
   SAToP pseudowire packets [RFC4553] or subjecting the packets to
   unusable inter-arrival times).  The use of a single path to preserve
   order remains the default mode of operation of a pseudowire (PW).
   The new capability proposed in this document is an OPTIONAL mode
   which may be used when the use of ECMP paths is known to be
   beneficial (and not harmful) to the operation of the PW.

   Some pseudowires are used to transport large volumes of IP traffic
   between routers at two locations.  One example of this is the use of
   an Ethernet pseudowire to create a virtual direct link between a pair
   of routers.  Such pseudowires may carry from hundreds of Mbps to
   Gbps of traffic.  Such pseudowires do not require strict ordering to
   be preserved between packets of the pseudowire.  They only require
   ordering to be preserved within the context of each individual
   transported IP flow.  Some operators have requested the ability to
   explicitly configure such a pseudowire to leverage the availability
   of multiple ECMP paths.  This allows for better capacity planning as
   the statistical multiplexing of a larger number of smaller flows is
   more efficient than with a smaller set of larger flows.
   Although Ethernet is used as an example above, the mechanisms
   described in this draft are general mechanisms that may be applied to
   any pseudowire type in which there are identifiable flows, and in
   which there is no requirement to preserve the order between those
   flows.

   Typically, forwarding hardware can deduce that an IP payload is being
   directly carried by an MPLS label stack, and is capable of looking at
   some fields in packets to construct hash buckets for conversations or
   flows.  However, an intermediate node has no information on the type
   of pseudowire being carried in the packet.  This limits the forwarder
   at the intermediate node to only being able to make an ECMP choice
   based on a hash of the label stack.  In the case of a pseudowire
   emulating a high bandwidth trunk, the granularity obtained by hashing
   the default label stack is inadequate for satisfactory load-
   balancing.  The ingress node, however, is in the special position of
   being able to look at the un-encapsulated packet and spread flows
   amongst any available ECMP paths, or even any Loop-Free Alternates
   [RFC5286].  This draft proposes a method to introduce granularity on
   the hashing of traffic running over pseudowires by introducing an
   additional label, chosen by the ingress node, and placed at the
   bottom of the label stack.

   In addition to providing an indication of the flow structure for use
   in ECMP forwarding decisions, the mechanism described in the document
   may also be used to select flows for distribution over an IEEE
   802.1AX (formerly 802.3ad) link aggregation group that has been used
   in an MPLS network.

1.1. ECMP in Label Switched Routers

   Label switched routers commonly hash the label stack or some elements
   of the label stack as a method of discriminating between flows, in
   order to distribute those flows over the available equal cost
   multiple paths that exist in the network.
   Since the label at the bottom of stack is usually the label most
   closely associated with the flow, this normally provides the greatest
   entropy, and hence is usually included in the hash.  This draft
   describes a method of adding an additional label at the bottom of
   stack in order to facilitate the load balancing of the flows within a
   pseudowire over the available ECMPs.  A similar design for general
   MPLS use has also been proposed [I-D.kompella-mpls-entropy-label];
   however, that is outside the scope of this draft.

   An alternative method of load balancing by creating a number of
   pseudowires and distributing the flows amongst them was considered,
   but was rejected because:

   o  It did not introduce as much entropy as the load balance label
      method.

   o  It required additional pseudowires to be set up and maintained.

1.2. Flow Label

   An additional label is interposed between the pseudowire label and
   the control word, or if the control word is not present, between the
   pseudowire label and the pseudowire payload.  This additional label
   is called the flow label.  Indivisible flows within the pseudowire
   MUST be mapped to the same flow label by the ingress PE.  The flow
   label stimulates the correct ECMP load balancing behaviour in the
   packet switched network (PSN).  On receipt of the pseudowire packet
   at the egress PE (which knows this additional label is present) the
   flow label is discarded without processing.

   Note that the flow label MUST NOT be an MPLS reserved label (values
   in the range 0..15) [RFC3032], but is otherwise unconstrained by the
   protocol.

   Considerations of the TTL value are described in the Security section
   of this document.  The flow label can never become the top label in
   normal operation, and hence the TTL in the flow label is never used
   to determine whether the packet should be discarded due to TTL
   expiry.
   Therefore there is no lower bound on the TTL value that may be used.

2. Native Service Processing Function

   The Native Service Processing (NSP) function [RFC3985] is a component
   of a PE that has knowledge of the structure of the emulated service
   and is able to take action on the service outside the scope of the
   pseudowire.  In this case it is required that the NSP in the ingress
   PE identify flows, or groups of flows within the service, and
   indicate the flow (group) identity of each packet as it is passed to
   the pseudowire forwarder.  As an example, where the PW type is
   Ethernet, the NSP might parse the ingress Ethernet traffic and
   consider all of the IP traffic.  This traffic could then be
   categorised into flows by considering all traffic with the same
   source and destination address pair to be a single indivisible flow.
   Since this is an NSP function, by definition, the method used to
   identify a flow is outside the scope of the pseudowire design.
   Similarly, since the NSP is internal to the PE, the method of flow
   indication to the pseudowire forwarder is outside the scope of this
   document.

3. Pseudowire Forwarder

   The pseudowire forwarder must be provided with a method of mapping
   flows to load balanced paths.

   The forwarder must generate a label for the flow or group of flows.
   How the load balance label values are determined is outside the scope
   of this document; however, the load balance label allocated to a flow
   MUST NOT be an MPLS reserved label and SHOULD remain constant for the
   life of the flow.  It is recommended that the method chosen to
   generate the load balancing labels introduces a high degree of
   entropy in their values, to maximise the entropy presented to the
   ECMP path selection mechanism in the LSRs in the PSN, and hence
   distribute the flows as evenly as possible over the available PSN
   ECMP paths.
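
   The label generation requirements above can be illustrated with a
   short sketch.  This document deliberately leaves the method out of
   scope, so everything below is an assumption made for illustration:
   the function name flow_label, the choice of SHA-256 as the hash, and
   the use of the source and destination address pair as the flow key
   (the example flow definition given for the NSP).  The sketch only
   shows that a deterministic hash yields a label that stays constant
   for the life of a flow and never falls in the reserved range 0..15.

```python
import hashlib
import ipaddress

FIRST_USABLE_LABEL = 16   # label values 0..15 are reserved [RFC3032]
LABEL_SPACE = 1 << 20     # an MPLS label value is 20 bits wide


def flow_label(src_ip: str, dst_ip: str) -> int:
    """Map an address pair to a non-reserved 20-bit label value.

    Hypothetical helper: the draft does not define how labels are chosen.
    """
    # The flow key is the packed source and destination addresses, so every
    # packet of the same address pair produces the same label.
    key = ipaddress.ip_address(src_ip).packed + ipaddress.ip_address(dst_ip).packed
    digest = hashlib.sha256(key).digest()
    value = int.from_bytes(digest[:4], "big")
    # Fold the hash into the range 16 .. 2^20 - 1, skipping reserved labels.
    return FIRST_USABLE_LABEL + value % (LABEL_SPACE - FIRST_USABLE_LABEL)
```

   A cryptographic hash is more than is needed here; it is used only
   because it is readily available and spreads values well.  An
   implementation would more plausibly use whatever hash its forwarding
   hardware already provides.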
   The forwarder at the ingress PE prepends the pseudowire control word
   (if applicable), and then pushes the flow label, followed by the
   pseudowire label.

   The forwarder at the egress PE uses the pseudowire label to identify
   the pseudowire.  From the context associated with the pseudowire
   label, the egress PE can determine whether a flow label is present.
   If a flow label is present, the label is discarded.

   All other pseudowire forwarding operations are unmodified by the
   inclusion of the flow label.

3.1. Encapsulation

   The PWE3 Protocol Stack Reference Model modified to include the flow
   label is shown in Figure 1 below.

   +-------------+                                +-------------+
   |  Emulated   |                                |  Emulated   |
   |  Ethernet   |                                |  Ethernet   |
   | (including  |         Emulated Service       | (including  |
   |    VLAN)    |<==============================>|    VLAN)    |
   |  Services   |                                |  Services   |
   +-------------+                                +-------------+
   |    Flow     |                                |    Flow     |
   +-------------+            Pseudowire          +-------------+
   |Demultiplexer|<==============================>|Demultiplexer|
   +-------------+                                +-------------+
   |    PSN      |            PSN Tunnel          |    PSN      |
   |   MPLS      |<==============================>|   MPLS      |
   +-------------+                                +-------------+
   |  Physical   |                                |  Physical   |
   +-----+-------+                                +-----+-------+

            Figure 1: PWE3 Protocol Stack Reference Model

   The encapsulation of a pseudowire with a flow label is shown in
   Figure 2 below.

   +-------------------------------+
   |                               |
   |            Payload            |
   |                               | n octets
   |                               |
   +-------------------------------+
   |     Optional Control Word     | 4 octets
   +-------------------------------+
   |          Flow label           | 4 octets
   +-------------------------------+
   |           PW label            | 4 octets
   +-------------------------------+
   |     MPLS Tunnel label(s)      | n*4 octets (four octets per label)
   +-------------------------------+

   Figure 2: Encapsulation of a pseudowire with a pseudowire load
                           balancing label

4.
Signaling the Presence of the Flow Label

   When using the signalling procedures in [RFC4447], a Pseudowire
   Interface Parameter Flow Label Sub-TLV (FL Sub-TLV) type is used to
   synchronise the flow label states between the ingress and egress PEs.
   The presence of an FL Sub-TLV in the interface parameters indicates
   to the ingress PE that the egress PE can correctly process a flow
   label.

   A PE that wishes to use a flow label includes in its label mapping
   message a Flow Label Sub-TLV (FL Sub-TLV) with F = 1 (see
   Section 4.1).  A PE that can correctly process a flow label, and is
   willing to receive one, but does not wish to send a flow label,
   includes an FL Sub-TLV with F = 0.

   If a PE has sent an FL Sub-TLV with F = 1, and has received an FL
   Sub-TLV, it MUST include a flow label in the label stack.

   If a PE has sent an FL Sub-TLV with F = 1 and does not receive an FL
   Sub-TLV, it MUST send a new label mapping using an FL Sub-TLV with
   F = 0.

   A PE that has sent an FL Sub-TLV with F = 0 MUST NOT include a flow
   label in the label stack.

   If a PE that previously received a label binding without an FL
   Sub-TLV receives a new label mapping with one included, it MAY send
   a new label mapping including an FL Sub-TLV with F = 1.

   The signalling procedures in [RFC4447] state that "Processing of the
   interface parameters should continue when unknown interface
   parameters are encountered, and they MUST be silently ignored."  The
   signalling procedure described here is therefore backwards
   compatible with existing implementations.

   If PWE3 signalling [RFC4447] is not in use for a pseudowire, then
   whether the flow label is used MUST be identically provisioned in
   both PEs at the pseudowire endpoints.  If there is no provisioning
   support for this option, the default behaviour is not to include the
   flow label.
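
   The signalling rules above reduce to two decisions at each PE, which
   the following sketch expresses as code.  It is a reading aid under
   stated assumptions, not part of the specification: the function and
   parameter names are hypothetical, sent_f is None when the local PE
   sent no FL Sub-TLV at all, and both functions are meant to be
   evaluated once the peer's label mapping has been received.

```python
from typing import Optional


def include_flow_label(sent_f: Optional[int], received_fl_sub_tlv: bool) -> bool:
    """True if the local PE must push a flow label on packets it sends.

    sent_f is the F bit the local PE advertised, or None if it sent no
    FL Sub-TLV (hypothetical representation chosen for this sketch).
    """
    # F = 1 sent and an FL Sub-TLV received: the flow label MUST be included.
    # F = 0 sent, or no FL Sub-TLV sent: the flow label MUST NOT be included.
    return sent_f == 1 and received_fl_sub_tlv


def must_readvertise_with_f0(sent_f: Optional[int], received_fl_sub_tlv: bool) -> bool:
    """True if the local PE must send a new label mapping with F = 0."""
    # F = 1 was sent but the peer's mapping carried no FL Sub-TLV.
    return sent_f == 1 and not received_fl_sub_tlv
```

   For example, a PE that advertised F = 1 to a peer that does not
   implement this mechanism gets False from include_flow_label and True
   from must_readvertise_with_f0, which matches the fallback behaviour
   described above.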

   Note that what is signalled is the desire to include the flow label
   in the label stack.  The value of the label is a local matter for the
   ingress PE, and the label value itself is not signalled.

4.1. Structure of Flow Label Sub-TLV

   The structure of the Flow Label Sub-TLV is shown in Figure 3.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       FL      |    Length     |F|          Reserved           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 3: Flow Label Sub-TLV

   Where:

   o  FL is the flow label sub-TLV identifier assigned by IANA.

   o  Length is the length of the TLV in octets and is 4.

   o  When F=1 a flow label MUST be pushed.  When F=0 a flow label MUST
      NOT be pushed.

   o  Reserved bits MUST be zero on transmit and MUST be ignored on
      receive.

5. Multi-Segment Pseudowires

   The flow label mechanism described in this document works on multi-
   segment PWs without requiring modification to the Switching PEs
   (S-PEs).  This is because the flow label is transparent to the label
   swap operation, and because interface parameter Sub-TLV signalling is
   transitive.

6. OAM

   The following OAM considerations apply to this method of load
   balancing.

   Where the OAM is only to be used to perform a basic test that the
   pseudowires have been configured at the PEs, VCCV [RFC5085] messages
   may be sent using any load balance pseudowire path, i.e. using any
   value for the flow label.

   Where it is required to verify that a pseudowire is fully functional
   for all flows, VCCV [RFC5085] connection verification messages MUST
   be sent over each ECMP path to the pseudowire egress PE.  This
   problem is difficult to solve and scales poorly.  We believe that
   this problem is addressed by the following two methods:

   1.
       If a failure occurs within the PSN, this failure will normally be
       detected by the PSN's Interior Gateway Protocol (IGP) link/node
       failure detection mechanism (loss of light, bidirectional
       forwarding detection [I-D.ietf-bfd-base] or IGP hello detection),
       and the IGP convergence will naturally modify the ECMP set of
       network paths between the ingress and egress PEs.  Hence the PW
       is only impacted during the normal IGP convergence time.

   2.  If the failure is related to the individual corruption of a
       Label Forwarding Information dataBase (LFIB) entry in a router,
       then only the network path using that specific entry is impacted.
       If the PW is load balanced over multiple network paths, then this
       failure can only be detected if, by chance, the transported OAM
       flow is mapped onto the impacted network path, or all paths are
       tested.  This type of error may be better solved by other means
       such as LSP self test [I-D.ietf-mpls-lsr-self-test].

   To troubleshoot the MPLS PSN, including multiple paths, the
   techniques described in [RFC4378] and [RFC4379] can be used.

   Where the pseudowire OAM is carried out of band (VCCV Type 2)
   [RFC5085] it is necessary to insert an "MPLS Router Alert Label" in
   the label stack.  The resultant label stack is as follows:

   +-------------------------------+
   |                               |
   |            Payload            |
   |                               | n octets
   |                               |
   +-------------------------------+
   |     Optional Control Word     | 4 octets
   +-------------------------------+
   |          Flow label           | 4 octets
   +-------------------------------+
   |           PW label            | 4 octets
   +-------------------------------+
   |      Router Alert label       | 4 octets
   +-------------------------------+
   |     MPLS Tunnel label(s)      | n*4 octets (four octets per label)
   +-------------------------------+

                Figure 4: Use of Router Alert Label

7.
Applicability of FAT PWs

   A node within the PSN is not able to perform deep-packet-inspection
   (DPI) of the PW as the PW technology is not self-describing: the
   structure of the PW payload is only known to the ingress and egress
   PE devices.  The method proposed in this document provides a
   statistical mitigation of the problem of load balance in those cases
   where a PE is able to discern flows embedded in the traffic received
   on the attachment circuit.

   The methods described in this document are transparent to the PSN and
   as such do not require any new capability from the PSN.

   The requirement to load-balance over multiple PSN paths occurs when
   the ratio between the PW access speed and the PSN's core link
   bandwidth is large (e.g. >= 10%).  ATM and FR are unlikely to meet
   this property.  Ethernet may have this property, and for that reason
   this document focuses on Ethernet.  Applications for other high-
   access-bandwidth PWs (e.g. Fibre Channel) may be defined in the
   future.

   This design applies to MPLS pseudowires where it is meaningful to de-
   construct the packets presented to the ingress PE into flows.  The
   mechanism described in this document promotes the distribution of
   flows within the pseudowire over different network paths.  This in
   turn means that whilst packets within a flow are delivered in order
   (subject to normal IP delivery perturbations due to topology
   variation), order is not maintained amongst packets of different
   flows.  It is not proposed to associate a different sequence number
   with each flow.  If sequence number support is required this
   mechanism is not applicable.

   Where it is known that the traffic carried by the Ethernet pseudowire
   is IP, the methods of identifying the flows are well known and can be
   applied.
   Such methods typically include hashing on the source and destination
   addresses, the protocol ID and higher-layer flow-dependent fields
   such as TCP/UDP ports, L2TPv3 Session IDs etc.

   Where it is known that the traffic carried by the Ethernet pseudowire
   is non-IP, techniques used for link bundling between Ethernet
   switches may be reused.  In this case however the latency
   distribution would be larger than is found in the link bundle case.
   The acceptability of the increased latency is for further study.  Of
   particular importance, Ethernet control frames SHOULD always be
   mapped to the same PSN path to ensure in-order delivery.

7.1. Equal Cost Multiple Paths

   ECMP in packet switched networks is statistical in nature.  The
   mapping of flows to a particular path does not take into account the
   bandwidth of the flow being mapped or the current bandwidth usage of
   the members of the ECMP set.  This simplification works well when the
   distribution of flows is evenly spread over the ECMP set and there
   are a large number of flows that have low bandwidth relative to the
   paths.  The random allocation of a flow to a path provides a good
   approximation to an even spread of flows, provided that polarisation
   effects are avoided.  The method proposed in this document has the
   same statistical properties as an IP PSN.

   ECMP is a load-sharing mechanism that is based on sharing the load
   over a number of layer 3 paths through the PSN.  Often however
   multiple links exist between a pair of LSRs that are considered by
   the IGP to be a single link.  These are known as link bundles.  The
   mechanism described in this document can also be used to distribute
   the flows within a pseudowire over the members of the link bundle by
   using the flow label value to identify candidate flows.  How that
   mapping takes place is outside the scope of this specification.
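
   Since the mapping is left unspecified, the following sketch is purely
   illustrative of the statistical behaviour described in this section.
   It assumes a hypothetical LSR that hashes every label value in the
   stack with CRC-32 and takes the result modulo the number of ECMP or
   bundle members; the function name select_member and the label values
   used are inventions for the example.  Real forwarding engines differ
   in which labels they hash and in the hash function they use.

```python
import zlib
from typing import Sequence


def select_member(label_stack: Sequence[int], member_count: int) -> int:
    """Pick an ECMP or link bundle member from a hash of the label stack.

    Hypothetical LSR behaviour for illustration only; the draft does not
    specify how an LSR maps label values to paths.
    """
    # Each 20-bit label value fits in three octets.
    key = b"".join(label.to_bytes(3, "big") for label in label_stack)
    return zlib.crc32(key) % member_count


TUNNEL_LABEL, PW_LABEL = 1000, 2000   # arbitrary example values

# Without a flow label every packet of the pseudowire carries the same
# stack, so all of its traffic is mapped to a single member.
no_flow_label = {select_member([TUNNEL_LABEL, PW_LABEL], 4) for _ in range(64)}

# With a flow label at the bottom of the stack, distinct flows can be
# mapped to different members while each flow stays on one member.
with_flow_label = {
    select_member([TUNNEL_LABEL, PW_LABEL, flow], 4) for flow in range(16, 80)
}
```

   In this sketch no_flow_label always contains exactly one member,
   whereas with_flow_label contains several, which is the effect the
   flow label is intended to produce.  As noted above, the choice takes
   no account of the bandwidth of any individual flow.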
   Similar considerations apply to link aggregation groups.

   In the ECMP case and the link bundling case the NSP may attempt to
   take bandwidth into consideration when allocating groups of flows to
   a common path.  That is permitted, but it must be borne in mind that
   the semantics of a label stack entry (LSE) as defined by [RFC3032]
   cannot be modified, the value of the flow label cannot be modified at
   any point on the LSP, and the interpretation of bit patterns in, or
   values of, the flow label by an LSR are undefined.

   A different type of load balancing is the desire to carry a
   pseudowire over a set of PSN links in which the bandwidth of members
   of the link set is less than the bandwidth of the pseudowire.  This
   problem is addressed in [I-D.stein-pwe3-pwbonding].  Such a mechanism
   can be considered complementary to this mechanism.

7.2. Link Aggregation Groups

   A Link Aggregation Group (LAG) is used to bond together several
   physical circuits between two adjacent nodes so they appear to
   higher-layer protocols as a single, higher bandwidth "virtual" pipe.
   These may co-exist in various parts of a given network.  An advantage
   of LAGs is that they reduce the number of routing and signalling
   protocol adjacencies between devices, reducing control plane
   processing overhead.  As with ECMP, the key problem related to LAGs
   is that due to inefficiencies in LAG load-distribution algorithms, a
   particular component of a LAG may experience congestion.  The
   mechanism proposed here may be able to assist in producing a more
   uniform flow distribution.

   The same considerations requiring a flow to go over a single member
   of an ECMP path set apply to a member of a LAG.

7.3.
The Single Large Flow Case

   Clearly the operator should make sure that the service offered using
   PW technology and the method described in this document does not
   exceed the maximum planned link capacity, unless it can be guaranteed
   that it conforms to the Internet traffic profile of a very large
   number of small flows.

   If the payload on a PW is made of a single inner flow (e.g. an
   encrypted connection between two routers), or the flow identifiers
   are too deeply buried in the packet, then the functionality described
   in this document does not give any benefits, though neither does it
   cause harm relative to the existing situation.  The most common case
   where a single flow dominates the traffic on a PW is when it is used
   to transport enterprise traffic.  Enterprise traffic may well consist
   of large single TCP flows, or encrypted flows that cannot be handled
   by the methods described in this document.

   An operator has five options under these circumstances:

   1.  The operator can do nothing and the system will work as it does
       without the flow label.

   2.  The operator can make the customer aware that the service
       offering has a restriction on flow bandwidth and police flows to
       that restriction.  This would allow customers offering multiple
       flows to use a larger fraction of their access bandwidth, whilst
       preventing a single flow from consuming a fraction of internal
       link bandwidth that the operator considered excessive.

   3.  The operator could configure the ingress PE to assign a constant
       flow label to all high bandwidth flows so that only one path was
       affected by these flows.

   4.  The operator could configure the ingress PE to assign a random
       flow label to all high bandwidth flows so as to minimise the
       disruption to the network at the cost of out of order traffic to
       the user.

   5.
       The operator could configure the ingress PE to assign a label of
       special significance (such as a reserved label) to all high
       bandwidth flows so that some other action (not specified in this
       document) could be taken on the flow.

   The issues described above are mitigated by the following two
   factors:

   o  Firstly, the customer of a high-bandwidth PW service has an
      incentive to get the best transport service because an inefficient
      use of the PSN leads to jitter and eventually to loss to the PW's
      payload.

   o  Secondly, the customer is usually able to tailor their
      applications to generate many flows in the PSN.  A well-known
      example is massive data transport between servers which use many
      parallel TCP sessions.  This same technique can be used by any
      transport protocol: multiple UDP ports, multiple L2TPv3 Session
      IDs, multiple GRE keys may be used to decompose a large flow into
      smaller components.  This approach may be applied to IPsec
      [RFC4301] where multiple Security Parameter Indexes (SPIs) may
      be allocated to the same security association.

7.4. MPLS-TP

   The MPLS Transport Profile (MPLS-TP) [RFC5654] requirement 44 states
   that "MPLS-TP SHOULD support mechanisms to enable the reserved
   bandwidth of a transport path to be decreased without impacting the
   existing traffic on that transport path, provided that the level of
   existing traffic is smaller than the reserved bandwidth following the
   decrease."  The flow aware transport of a PW reorders packets (albeit
   in an application friendly way), and therefore SHOULD NOT be deployed
   in a network conforming to the MPLS-TP.

8. Applicability to MPLS

   A further application of this technique would be to create a basis
   for hash diversity without having to peek below the label stack for
   IP traffic carried over LDP LSPs.  Work on the generalisation of this
   to MPLS has been described in [I-D.kompella-mpls-entropy-label].
   This can be regarded as a complementary, but distinct, approach
   since although similar considerations may apply to the identification
   of flows and the allocation of flow label values, the flow labels are
   imposed by different network components, and the associated
   signalling mechanisms are different.

9. Security Considerations

   The pseudowire generic security considerations described in [RFC3985]
   and the security considerations applicable to a specific pseudowire
   type (for example, in the case of an Ethernet pseudowire [RFC4448])
   apply.

   The ingress PE SHOULD take steps to ensure that the load-balance
   label is not used as a covert channel.

   It is useful to give consideration to the choice of TTL value in the
   flow label stack entry [RFC3032].  The flow label is at the bottom of
   the label stack.  Therefore, even when penultimate hop popping is
   employed, it will always be preceded by the PW label on arrival at
   the PE.  The flow label TTL should therefore never be considered by
   the forwarder, and hence SHOULD be set to a value of 1.  This will
   prevent the packet being inadvertently forwarded based on the value
   of the flow label.  Note that this may be a departure from
   considerations that apply to the general MPLS case.

10. IANA Considerations

   IANA is requested to allocate the next available value from the IETF
   Consensus range in the Pseudowire Interface Parameters Sub-TLV type
   Registry as a Flow Label indicator.

   Parameter ID   Length   Description
   ------------   ------   -----------
   TBD            4        Flow Label

11. Congestion Considerations

   The congestion considerations applicable to pseudowires as described
   in [RFC3985] and any additional congestion considerations developed
   at the time of publication apply to this design.

   The ability to explicitly configure a PW to leverage the availability
   of multiple ECMP paths is beneficial to capacity planning as, all
   other parameters being constant, the statistical multiplexing of a
   larger number of smaller flows is more efficient than that of a
   smaller number of larger flows.

   Note that if the classification into flows is only performed on IP
   packets, the behaviour of those flows in the face of congestion will
   be as already defined by the IETF for packets of that type, and no
   additional congestion processing is required.

   Where flows that are not IP are classified, pseudowire congestion
   avoidance must be applied to each non-IP load balance group.

12.  Acknowledgements

   The authors wish to thank Eric Grey, Kireeti Kompella, Joerg
   Kuechemann, Wilfried Maas, Luca Martini, Mark Townsley, and Lucy Yong
   for valuable comments on this document.

13.  References

13.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC3032]  Rosen, E., Tappan, D., Fedorkow, G., Rekhter, Y.,
              Farinacci, D., Li, T., and A. Conta, "MPLS Label Stack
              Encoding", RFC 3032, January 2001.

   [RFC4379]  Kompella, K. and G. Swallow, "Detecting Multi-Protocol
              Label Switched (MPLS) Data Plane Failures", RFC 4379,
              February 2006.

   [RFC4385]  Bryant, S., Swallow, G., Martini, L., and D. McPherson,
              "Pseudowire Emulation Edge-to-Edge (PWE3) Control Word for
              Use over an MPLS PSN", RFC 4385, February 2006.

   [RFC4447]  Martini, L., Rosen, E., El-Aawar, N., Smith, T., and G.
              Heron, "Pseudowire Setup and Maintenance Using the Label
              Distribution Protocol (LDP)", RFC 4447, April 2006.

   [RFC4448]  Martini, L., Rosen, E., El-Aawar, N., and G. Heron,
              "Encapsulation Methods for Transport of Ethernet over MPLS
              Networks", RFC 4448, April 2006.

   [RFC4553]  Vainshtein, A. and YJ.
              Stein, "Structure-Agnostic Time
              Division Multiplexing (TDM) over Packet (SAToP)",
              RFC 4553, June 2006.

   [RFC4928]  Swallow, G., Bryant, S., and L. Andersson, "Avoiding Equal
              Cost Multipath Treatment in MPLS Networks", BCP 128,
              RFC 4928, June 2007.

   [RFC5085]  Nadeau, T. and C. Pignataro, "Pseudowire Virtual Circuit
              Connectivity Verification (VCCV): A Control Channel for
              Pseudowires", RFC 5085, December 2007.

13.2.  Informative References

   [I-D.ietf-bfd-base]
              Katz, D. and D. Ward, "Bidirectional Forwarding
              Detection", draft-ietf-bfd-base-11 (work in progress),
              January 2010.

   [I-D.ietf-mpls-lsr-self-test]
              Swallow, G., "Label Switching Router Self-Test",
              draft-ietf-mpls-lsr-self-test-07 (work in progress),
              May 2007.

   [I-D.kompella-mpls-entropy-label]
              Kompella, K. and S. Amante, "The Use of Entropy Labels in
              MPLS Forwarding", draft-kompella-mpls-entropy-label-00
              (work in progress), July 2008.

   [I-D.stein-pwe3-pwbonding]
              Stein, Y., Mendelsohn, I., and R. Insler, "PW Bonding",
              draft-stein-pwe3-pwbonding-01 (work in progress),
              November 2008.

   [RFC3985]  Bryant, S. and P. Pate, "Pseudo Wire Emulation Edge-to-
              Edge (PWE3) Architecture", RFC 3985, March 2005.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4378]  Allan, D. and T. Nadeau, "A Framework for Multi-Protocol
              Label Switching (MPLS) Operations and Management (OAM)",
              RFC 4378, February 2006.

   [RFC5286]  Atlas, A. and A. Zinin, "Basic Specification for IP Fast
              Reroute: Loop-Free Alternates", RFC 5286, September 2008.

   [RFC5654]  Niven-Jenkins, B., Brungard, D., Betts, M., Sprecher, N.,
              and S. Ueno, "Requirements of an MPLS Transport Profile",
              RFC 5654, September 2009.

Authors' Addresses

   Stewart Bryant (editor)
   Cisco Systems
   250 Longwater Ave
   Reading  RG2 6GB
   United Kingdom

   Phone: +44-208-824-8828
   Email: stbryant@cisco.com


   Clarence Filsfils
   Cisco Systems
   Brussels
   Belgium

   Email: cfilsfil@cisco.com


   Ulrich Drafz
   Deutsche Telekom
   Muenster
   Germany

   Email: Ulrich.Drafz@t-com.net


   Vach Kompella
   Alcatel-Lucent

   Email: vach.kompella@alcatel-lucent.com


   Joe Regan
   Alcatel-Lucent

   Email: joe.regan@alcatel-lucent.com


   Shane Amante
   Level 3 Communications

   Email: shane@castlepoint.net