Internet Engineering Task Force                                  X. Wei
INTERNET-DRAFT                                                   L. Zhu
Intended Status: Standards Track                    Huawei Technologies
Expires: April 11, 2015                                         L. Deng
                                                            China Mobile
                                                        October 8, 2014

                       Tunnel Congestion Feedback
             draft-wei-tsvwg-tunnel-congestion-feedback-03

Abstract

   This document describes a mechanism for calculating the congestion of a tunnel segment based on the recommendations of RFC 6040, and a feedback protocol by which the measured congestion can be sent from the tunnel egress to the ingress router.  A basic model for measuring tunnel congestion and feeding it back is described, and a protocol for carrying the feedback data is outlined.
Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups.  Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.  Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2. Conventions and Terminology . . . . . . . . . . . . . . . . .  4
     2.1 Conventions . . . . . . . . . . . . . . . . . . . . . . . .  4
     2.2 Terminology . . . . . . . . . . . . . . . . . . . . . . . .  4
   3. Problem Statement . . . . . . . . . . . . . . . . . . . . . .  5
     3.1 3GPP network scenario . . . . . . . . . . . . . . . . . . .  6
     3.2 Network Function Virtualization Scenario  . . . . . . . . .  7
     3.3 Data Center Tenancy Scenario  . . . . . . . . . . . . . . .  9
   4. Congestion Control Model  . . . . . . . . . . . . . . . . . .  9
     4.1 Congestion Calculation  . . . . . . . . . . . . . . . . . . 10
     4.2 Data Information  . . . . . . . . . . . . . . . . . . . . . 12
     4.3 Congestion Feedback . . . . . . . . . . . . . . . . . . . . 12
     4.4 Congestion Control  . . . . . . . . . . . . . . . . . . . . 13
   5. Congestion Feedback Protocol  . . . . . . . . . . . . . . . . 13
     5.1 Properties of Candidate Protocol  . . . . . . . . . . . . . 13
     5.2 IPFIX Extensions for Congestion Feedback  . . . . . . . . . 14
     5.3 Other Protocols . . . . . . . . . . . . . . . . . . . . . . 18
   6. Benefits  . . . . . . . . . . . . . . . . . . . . . . . . . . 18
   7. Security Considerations . . . . . . . . . . . . . . . . . . . 18
   8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18
   9. References  . . . . . . . . . . . . . . . . . . . . . . . . . 19
     9.1 Normative References  . . . . . . . . . . . . . . . . . . . 19
     9.2 Informative References  . . . . . . . . . . . . . . . . . . 19
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . . 20

1. Introduction

   In current Internet practice, encapsulation of IP headers is a common technique for overlay networking scenarios.
   For example, mobile networks encapsulate the inner IP header and the application-layer headers within an IP/UDP/GTP-U header chain; this design also supports mobility, QoS control, bearer management and other functions specific to the mobile network.  Similarly, many organizations' private networks encrypt the IP header using tunneling solutions based on pre-shared keys or certificates to set up a VPN (virtual private network) over a WAN (wide area network).

   Congestion occurs when the traffic offered to any segment of a transmission path exceeds the throughput of that segment; it can result from transport constraints as well as interface or processor overload.  To the network end points, congestion generally appears as packet loss or unexpected delay.  End-to-end congestion control mechanisms (e.g. ECN [RFC3168] and ECN handling for tunnels [RFC6040]) have been specified in the IETF.

   When IP headers are encapsulated, the inner packets are carried over a transport protocol such as TCP or UDP, which affects explicit congestion feedback, since ECN marks are echoed by the receiver in TCP acknowledgments on the end-to-end path.  On the other hand, packet loss and performance degradation inside a network segment encapsulated by an IP/UDP header chain cannot readily be recognized by network elements such as the tunnel ingress and egress entities.  This creates a management problem when the tunnel segment is operated as an independent administrative domain and the network operator intends to keep its operation reliable.

   This document describes a mechanism for feeding back congestion observed in IP tunnels.  Common tunnel deployments, such as mobile backhaul networks, VPNs and other IP-in-IP tunnels, can become congested as a result of sustained high load.

   Network providers use a number of methods to deal with high-load conditions, including proper network dimensioning, policies for preferential flow treatment and selective offloading, among others.  The mechanism proposed in this document is expected to complement these methods by providing congestion information that allows better policies and decisions to be made.

   The model and general solution proposed in Section 4 consist of identifying congestion marks set within the tunnel segment and feeding the congestion information back from the egress to the ingress of the tunnel.  Congestion of a tunnel segment is measured by counting CE marks in the outer header of packets whose inner header carries an ECT mark.  The proposal depends on statistical marking of congestion and uses the method described in Appendix C of RFC 6040 [RFC6040].

   Section 5 outlines the desired properties of a protocol for conveying the congestion information, and explores IPFIX [RFC5101] as a candidate protocol for such extensions.

2. Conventions and Terminology

2.1 Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

2.2 Terminology

   Tunnel:        A channel over which encapsulated packets traverse a network.

   Encapsulation: The process of adding control information to a packet as it passes down through the layered protocol model.
   Encapsulator:  The tunnel endpoint function that adds an outer IP header to tunnel a packet; the encapsulator is considered the "ingress" of the tunnel.

   Decapsulator:  The tunnel endpoint function that removes an outer IP header from a tunneled packet; the decapsulator is considered the "egress" of the tunnel.

   Outer header:  The header added to encapsulate a tunneled packet.

   Inner header:  The header encapsulated by the outer header.

   E2E:           End to End.

   VPN:           Virtual Private Network; a technology for using the Internet or another intermediate network to connect computers to isolated remote computer networks that would otherwise be inaccessible.

   GRE:           Generic Routing Encapsulation.

   IPFIX:         IP Flow Information Export; an IETF protocol for exporting flow information from routers and other devices.

   RED:           Random Early Detection.

   NFV:           Network Functions Virtualization; an alternative design approach for building complex IT applications, particularly in the telecommunications and service provider industries, that virtualizes entire classes of function into building blocks that may be connected, or chained, together to create services.

   VNF:           Virtualized Network Function; may consist of one or more virtual machines running different software and processes, which form the building blocks for NFV.

   SFC:           Service Function Chain; a group of VNFs connected in a specific sequence/map using the NFV approach in order to deliver a specific service.

3. Problem Statement

   Congestion control plays a significant role in network performance management, and sustained congestion can degrade the subscriber's experience.  Current solutions to network congestion focus mainly on end-to-end methods, i.e. ECN [RFC3168], in which the traffic sender is responsible for reducing its sending rate when the network is congested.  However, it is not always reliable to depend on end hosts to resolve the congestion: some end hosts may not support ECN, and even when the end hosts do support it, some traffic (e.g. UDP-based traffic) may not.

   Even though the congestion occurs in the operator's network, if the congestion information is invisible to the operator, the network administration can hardly take action to control the traffic that is causing the congestion.  To improve network performance, it is better for the operator to take the congestion situation into account in its traffic management.

   Many kinds of tunnels are widely deployed in current networks; in some scenarios, all traffic is transmitted through designated tunnel(s).

   Because the tunnel ingress and egress are usually deployed by the operator, it is easy for the operator to enforce its policies there, for example gating, flow control and dropping.  A tunnel feedback mechanism would make it feasible for the operator to collect congestion information for the encapsulated segment.  After obtaining this information, the operator can apply policies at the tunnel ingress that take it into consideration for traffic management.

   RFC 6040 specifies how the ECN field should be handled for tunneled packets.  In addition, Appendix C of RFC 6040 provides guidance on calculating the congestion experienced within the tunnel itself.  However, there is no standardized mechanism by which this congestion information can be fed back from the egress to the ingress router.
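   As a non-normative illustration (not part of this proposal), the following Python sketch approximates the RFC 6040 normal-mode decapsulation behaviour; RFC 6040 remains the authoritative definition, including the anomalous combinations it requires to be logged or dropped.  The sketch shows why an outer CE mark on an ECN-capable inner packet is a signal the egress can observe, while for a Not-ECT inner packet congestion inside the tunnel can only be expressed as a drop.

   <CODE BEGINS>
   # Simplified RFC 6040 normal-mode decapsulation (illustrative only).
   # ECN codepoints as 2-bit field values.
   NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

   def decapsulate_ecn(inner, outer):
       """Return the ECN field of the forwarded packet, or None to drop."""
       if outer == CE:
           if inner == NOT_ECT:
               return None       # drop: no way to propagate the CE signal
           return CE             # propagate the tunnel's congestion mark
       if outer == ECT1 and inner == ECT0:
           return ECT1
       return inner              # otherwise the inner field is left unchanged
   <CODE ENDS>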
   In the following sub-sections, some network tunnel scenarios are discussed.

3.1 3GPP network scenario

   Tunnels, including GRE [RFC2784], GTP [TS29.060], IP-in-IP [RFC2003] and IPsec [RFC4301], are widely deployed in 3GPP networks, where they are used to carry end-user flows within the backhaul network, as shown in Figure 1.

   IP backhaul networks, such as those of mobile networks, are provisioned and managed to provide the subscribed levels of end-user service.  These networks are traffic engineered and have defined mechanisms for providing differentiated services and QoS per user or flow.  Policies that configure per-user flow attributes in these networks have traditionally been based on monitoring and static configuration.

   These networks are increasingly used for applications that demand high bandwidth.  The nature of the flows and the length of end-user sessions can lead to significant variability in aggregate bandwidth demand and latency, so a more dynamic feedback of congestion information would be useful.  In addition, while the eNB, S-GW and P-GW are administered by the mobile operator, the backhaul carrying the IP/UDP/GTP encapsulation is typically administered by a backhaul service operator.  Aggregate congestion feedback could be used to determine flow handling and admission control.

                \|/
                 |
                 |
             +--|--+            +------+           +------+
   +--+      |     |  Tunnel1   |      |  Tunnel2  |      |   Ext
   |UE|-(RAN)-| eNB |============| S-GW |===========| P-GW |--------
   +--+      |     |    RAN     |      |   Core    |      | Network
             +--+--+  Backhaul  +---+--+  Network  +---+--+

            Figure 1: Example - Mobile Network and Tunnels

3.2 Network Function Virtualization Scenario

   Telecom networks contain an increasing variety of proprietary hardware appliances, which makes launching new network services increasingly difficult and adds to the complexity of integrating and deploying these appliances in a network.

   Network Functions Virtualisation (NFV) aims to address these problems by decoupling network functions from dedicated hardware platforms and running them on industry-standard servers, using IT virtualization technology so that functions can be moved to, or instantiated in, various locations in the network as required.  In this way, NFV is expected to provide significant benefits for network operators (reduced expenditure on network construction and maintenance) and their customers (shortened time-to-market for new network services).

   Furthermore, service functions are increasingly deployed and managed in a data-center manner rather than being inserted on the data-forwarding path between communicating peers, as is done today.  The SFC WG is working on a framework to cope with this highly dynamic routing problem, in which the relevant data traffic of a network service traverses a group of virtualized network function nodes (VNFs), each of which may operate at any layer of the protocol stack (network layer, transport layer, application layer, etc.) [I-D.boucadair-sfc-framework].
   As shown in Figure 2, in an SFC-enabled domain (e.g. within or across a network operator's data centers), a PDP (Policy Decision Point) is the central entity responsible for maintaining SFC Policy Tables (the rules by which boundary nodes decide which IP flow traverses which service function path) and for enforcing the appropriate policies in SF nodes and SFC boundary nodes.  Beginning at the ingress node, at each hop of a given service function path (as decided by a matched SFC policy rule/map), if the next function node is not an immediate (L3) neighbor, packets are encapsulated and forwarded to the corresponding downstream function node, as shown in Figure 3.

   . . . . . . . . . . . . . . . . . . . . . . . . . .
   .             SFC Policy Enforcement              .
   .                 +-------+                       .
   .                 |       |------------------+    .
   .         +-------|  PDP  |                  |    .
   .         |       |       |--------+         |    .
   .         |       +---+---+        |         |    .
   . . . . . | . . . . . | . . . . .  | . . . .  | . .
   . . . . . | . . . . . | . . . . .  | . . . .  | . .
   .         |           |            |         |    .
   .         v           v            v         v    .
   .   +---------+  +---------+   +-------+ +-------+.
   .   |SFC_BN_1 |  |SFC_BN_n |   | SF_1  | | SF_m  |.
   .   +---------+  +---------+   +-------+ +-------+.
   .               SFC-enabled Domain                .
   . . . . . . . . . . . . . . . . . . . . . . . . . .

           Figure 2: SFC Policy Enforcement Scheme

                          Network Service
   +----------+            +----------+             +----------+
   |  VNF#1   |  tunnel#1  |  VNF#2   |   tunnels   |  VNF#n   |
   | Instance |------------| Instance |-- ... ... --| Instance |
   +----------+            +----------+             +----------+
        ^
        | Virtualization
   +--------------------------------------------------------+
   |                 Virtualization Platform                 |
   +--------------------------------------------------------+

   Figure 3: Example - Mobile Network service chaining and Tunnels

   However, using VNFs running on commodity platforms can introduce additional points of failure beyond those inherent in a single specialized server, and therefore poses additional reliability challenges.  [I-D.zong-vnfpool-problem-statement] proposes pooling techniques in response, which require maintaining a backup mapping among the running VNF instances for a given service function and choosing among them for a specific data flow.  Making that choice based on ECN feedback, in addition to the running status and/or physical resource occupancy of a candidate VNF instance, would allow more efficient use of network capacity in case of local congestion.

3.3 Data Center Tenancy Scenario

   In a multi-tenant data center, network resources are shared among multiple tenants; in order to provide functional isolation while guaranteeing scalability for tenants, tunnel-based isolation mechanisms, e.g. VXLAN and STT, are used.

   In this scenario, the hypervisor or vSwitch acts as the tunnel endpoint for traffic between VMs, and the tunnels are invisible to the VMs; in other words, congestion indications such as ECN marks set by the data center's network elements are not visible to the VMs.  Two solutions could be used to deal with this situation:

   Solution 1: Using tunnel translation, the hypervisor or vSwitch marks the inner IP header according to the ECN field of the outer IP header before delivering packets to the VM.

   Solution 2: Using the congestion control mechanism described in this document between hypervisors or vSwitches to perform congestion control for the VMs' traffic.
4. Congestion Control Model

   This section introduces the basic congestion control model; each aspect of the model is described in detail in the following subsections.

   The congestion control model provides the network administrator with a method to manage the data traffic in its network domain.  The basic model consists of the following components: Ingress, Egress, Feedback, Meter, Collector and Manager.

   As shown in Figure 4, network traffic enters the tunnel through the tunnel ingress and passes through en-route routers, which mark packets according to the ECN mechanism specified in RFC 3168, towards the tunnel egress.  The egress collects information about the congestion level encountered in the tunnel and feeds it back to the corresponding ingress.  After receiving the congestion information, the ingress takes actions to control the traffic passing through the path between the ingress and the egress, so as to reduce the congestion level in the tunnel.

   At the egress, a module named Meter estimates the congestion level in the tunnel, as described in Section 4.1.  A congestion information feedback module, called Feedback, controls the congestion information feedback procedure.

   The Meter module in the Egress node counts the congestion marks it receives.  The Feedback module calculates the amount of congestion and feeds the congestion information back to the Ingress node.  The Collector at the Ingress receives the congestion information fed back by the Feedback module.  The Manager implements functions such as admission control and traffic engineering according to the congestion level experienced in the tunnel, in order to control the traffic and reduce that congestion level; the detailed actions taken by the Manager are out of the scope of this document.

                    congestion feedback signal
             #########################################
       +-----#-------+                        +------#----+
       |     #       |                        |      #    |
       |     #       |                        |      #    |
       |     V       |                        |      #    |
       | +---------+ |    +--------------+    | +--------+|
       | |Collector| |    |              |    | |Meter   ||
traffic| +---------+ |    |              |    | +-----+--+|traffic
======>| |Manager  | |====================>   | |Feedback||======>
       | +---------+ |    |   Routers    |    | +--------+|
       |             |    | (ECN-enabled)|    |           |
       +-------------+    +--------------+    +-----------+

                   Figure 4: Basic Feedback Model

   To support traffic management and congestion information feedback in a tunnel, this document discusses three main issues: calculation of the congestion level information, feeding the congestion information back from the egress to the ingress, and implementation of congestion control.  The tunnel ingress and egress are assumed to be compliant with RFC 6040, and the tunnel interior routers with RFC 3168.

   In addition, it should be noted that these tunnels may carry both ECT and Not-ECT traffic.  A well-defined mechanism for aggregate congestion calculation should work in the presence of all kinds of traffic and would benefit from a common feedback mechanism and protocol.
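   The sketch below is only a rough structural illustration, in Python, of how the responsibilities of the Meter, Feedback, Collector and Manager components of Figure 4 could be divided; the class and method names are assumptions of the sketch and are not defined by this document.

   <CODE BEGINS>
   # Illustrative only: the real Meter and Feedback sit at the tunnel
   # egress, the Collector and Manager at the ingress, and they are
   # connected by the feedback protocol of Section 5.

   class Meter:
       """Egress side: counts congestion marks observed in the tunnel."""
       def __init__(self):
           self.marked = 0       # outer CE, inner ECT (marked in tunnel)
           self.unmarked = 0     # outer ECT, inner ECT (not marked)

   class Manager:
       """Ingress side: applies admission control or traffic engineering;
       the concrete actions are out of scope of this document."""
       def __init__(self):
           self.levels = {}
       def handle(self, egress_id, level):
           self.levels[egress_id] = level

   class Collector:
       """Ingress side: receives congestion reports and passes them on."""
       def __init__(self, manager):
           self.manager = manager
       def receive(self, egress_id, level):
           self.manager.handle(egress_id, level)

   class Feedback:
       """Egress side: turns Meter counts into reports for the Collector."""
       def __init__(self, meter, collector):
           self.meter, self.collector = meter, collector
       def export(self, egress_id):
           total = self.meter.marked + self.meter.unmarked
           level = self.meter.marked / total if total else 0.0
           self.collector.receive(egress_id, level)
   <CODE ENDS>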
4.1 Congestion Calculation

   This section discusses how to calculate the congestion level experienced in the tunnel, and provides an example of the calculation.  In this document, calculation of congestion in the tunnel is based on the method described in RFC 6040, Appendix C.

   The egress can calculate congestion using moving averages.  The proportion of packets that are not marked in the inner header but carry a CE mark in the outer header is considered to have experienced congestion in the tunnel.  Note that these packets are ECN-capable and were not congestion-marked before entering the tunnel.  Since routers implementing RED randomly select a percentage of packets to mark, this method effectively exposes the congestion in the tunnel.

   When the ingress is RFC 6040 compliant, the packets observed by the egress can be divided into four categories, shown in Figure 5.  The tag before "|" stands for the ECN field in the outer header, and the tag after "|" stands for the ECN field in the inner header.

   "Not-ECT|Not-ECT" indicates traffic that does not support ECN, for example UDP traffic and Not-ECT marked TCP traffic; "CE|CE" indicates ECN-capable packets that were already CE-marked before entering the tunnel; "CE|ECT" indicates ECN-capable packets that are CE-marked within the tunnel; and "ECT|ECT" indicates ECN-capable packets that have not experienced congestion either inside or outside the tunnel.

   +--------------------------+
   |     Not-ECT|Not-ECT      |
   +--------------------------+
   |          CE|CE           |
   +--------------------------+
   |          CE|ECT          |
   +--------------------------+
   |         ECT|ECT          |
   +--------------------------+

   Figure 5: ECN marking categories by outer/inner packet

   Out of the total number of packets, if the quantity of CE|ECT packets is A and the quantity of ECT|ECT packets is B, then the congestion level C can be calculated as follows:

      C = A / (A + B)

   As an example, consider 100 packets used to calculate the moving average as shown in RFC 6040, Appendix C.  Say that 12 packets carry CE|ECT marks, indicating that they experienced congestion in the tunnel, and 58 packets carry ECT|ECT marks, indicating that they experienced no congestion either in the tunnel or elsewhere.  The egress then calculates the congestion level as:

      C = 12 / (12 + 58)
        = 12/70  (approximately 17% congestion)
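   The counting rule above can be sketched in a few lines of Python, assuming the egress can observe the outer and inner ECN fields of each packet in the sample; the function and variable names are illustrative only.

   <CODE BEGINS>
   # Illustrative sketch of the Section 4.1 counting rule.
   # Each packet is represented by its (outer, inner) ECN field pair.
   NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

   def congestion_level(packets):
       """Return C = A / (A + B) for a sample of (outer, inner) pairs."""
       a = 0   # CE|ECT  : CE-marked inside the tunnel
       b = 0   # ECT|ECT : ECN-capable, not marked anywhere
       for outer, inner in packets:
           if inner in (ECT0, ECT1):
               if outer == CE:
                   a += 1
               elif outer in (ECT0, ECT1):
                   b += 1
           # Not-ECT|Not-ECT and CE|CE packets do not contribute
       return a / (a + b) if (a + b) else 0.0

   # The worked example above: 12 CE|ECT, 58 ECT|ECT, 30 other packets.
   sample = ([(CE, ECT0)] * 12 + [(ECT0, ECT0)] * 58
             + [(NOT_ECT, NOT_ECT)] * 30)
   assert round(congestion_level(sample), 4) == 0.1714
   <CODE ENDS>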
The discussion of feedback protocol will be discussed in 527 the next section. 529 To reduce the overload, caused by this procedure, on network 530 especially in case the feedback signal goes through the same path as 531 data traffic, the feedback will only occur when congestion happens. 532 In other words, egress doesn't send feedback signal if there is no 533 congestion happens. Also egress will ignore ephemeral congestion and 534 only feed back congestion information if the congestion level goes 535 higher than a specified threshold (TH1) and/or lasts for a specified 536 period of time (T1). 538 When egress detects congestion level higher than TH1 and for a period 539 of T1, it sends feedback signal to ingress periodically (T2) until 540 the congestion level is lower than TH1. 542 4.4 Congestion Control 544 After ingress receives congestion information from egress, it will 545 take actions to try to reduce the congestion. For example, ingress 546 could choose to drop some packets or do certain traffic engineering 547 etc. 549 Usually, network policy would have impact on what action is to be 550 taken. For example, which packets to drop may be decided by the 551 agreement between subscriber and network administrator. The specific 552 choice of congestion alleviation measures taken by the ingress is out 553 of scope of this document. 555 The ingress will continue to implement control actions until there is 556 no congestion feedback from the egress. 558 5. Congestion Feedback Protocol 560 In different networks, there are always different tunnel protocols 561 deployed. For instance, the congestion feedback can be done either by 562 utilizing the existing tunnel protocol or using an alternative 563 protocol. For example, in 3GPP network GTP (GPRS Tunnel 564 Protocol)[TS29.060] is used as tunnel protocol to transmit traffic 565 between network entities. And because GTP protocol is easy to be 566 extended for additional information element, GTP itself would be a 567 good choice for congestion feedback. In some other networks an 568 independent protocol could be used for congestion feedback, for 569 example the network using tunnel protocols such as IP-in-IP 570 [RFC2003], GRE [RFC2784]. 572 Currently, this section mainly focuses on the discussion of 573 independent protocols for congestion feedback. There are two choices 574 for such an independent protocol, one is define as a new dedicated 575 protocol from scratch, the other one is meant to evaluate and reuse 576 the existing protocol(s). 578 5.1 Properties of Candidate Protocol 580 To feedback congestion efficiently there are some properties that are 581 desirable in the feedback protocol. 583 1. Congestion friendliness. The feeding back traffics are coexistence 584 with other traffics, so when congestion happens in the network, 585 the feeding back traffic should be reduced, So that feedback 586 itself will not congest the network further when the network is 587 already getting congested. In other words, feedback frequency 588 should adjust to network's congestion level. 590 2. Extensibility. The authors consider that using an existing 591 protocol, or extensions to an existing protocol is preferable. The 592 ability of a protocol to support modular extensions to report 593 congestion level as feedback is a key attribute of the protocol 594 under consideration. 596 3. Compactness. 
4.4 Congestion Control

   After the ingress receives congestion information from the egress, it takes actions to try to reduce the congestion.  For example, the ingress could choose to drop some packets or apply traffic engineering.

   Usually, network policy influences which action is taken; for example, which packets to drop may be decided by the agreement between the subscriber and the network administrator.  The specific choice of congestion alleviation measures taken by the ingress is out of scope of this document.

   The ingress continues to apply control actions until it no longer receives congestion feedback from the egress.
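   As one example of a possible ingress reaction (the concrete policy is explicitly out of scope), the sketch below drops newly arriving packets with a probability equal to the most recently reported congestion level, and stops doing so once feedback ceases.  The names used are illustrative only.

   <CODE BEGINS>
   import random

   # Illustrative only: probabilistic dropping at the ingress, driven by
   # the congestion level reported by the egress.  Real deployments would
   # apply operator policy (gating, traffic engineering, per-subscriber
   # agreements) instead of this simple rule.

   class IngressController:
       def __init__(self):
           self.level = 0.0                 # last reported congestion level

       def on_feedback(self, level):
           self.level = level               # updated every T2 while congested

       def on_feedback_stopped(self):
           self.level = 0.0                 # no feedback: stop the control action

       def admit(self, packet):
           """Return True to forward the packet into the tunnel."""
           if self.level > 0.0 and random.random() < self.level:
               return False                 # drop (or divert) this packet
           return True
   <CODE ENDS>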
5. Congestion Feedback Protocol

   Different networks deploy different tunnel protocols, and congestion feedback can be provided either by extending the existing tunnel protocol or by using a separate protocol.  For example, in 3GPP networks GTP (GPRS Tunnelling Protocol) [TS29.060] is used as the tunnel protocol between network entities; because GTP is easy to extend with additional information elements, GTP itself would be a good choice for carrying congestion feedback.  In other networks, an independent protocol could be used for congestion feedback, for example in networks using tunnel protocols such as IP-in-IP [RFC2003] or GRE [RFC2784].

   This section focuses on independent protocols for congestion feedback.  There are two choices for such a protocol: defining a new dedicated protocol from scratch, or evaluating and reusing (an) existing protocol(s).

5.1 Properties of Candidate Protocol

   To feed back congestion information efficiently, the following properties are desirable in the feedback protocol.

   1. Congestion friendliness.  The feedback traffic coexists with other traffic, so when congestion occurs in the network the feedback traffic should be reduced, ensuring that the feedback itself does not further congest a network that is already becoming congested.  In other words, the feedback frequency should adapt to the network's congestion level.

   2. Extensibility.  The authors consider using an existing protocol, or extensions to an existing protocol, to be preferable.  The ability of a protocol to support modular extensions for reporting the congestion level as feedback is a key attribute of any protocol under consideration.

   3. Compactness.  Different situations may require different congestion information to be conveyed; in order to reduce network load, the information to be conveyed should be selectable, i.e. it should be possible to convey only the required information.

   4. In-band/out-of-band signaling.  The feedback message can follow the same path as the network data traffic, referred to as in-band signaling, or a different path, referred to as out-of-band signaling.

5.2 IPFIX Extensions for Congestion Feedback

   This section outlines IPFIX extensions for congestion feedback.  The authors consider IPFIX a suitable protocol that is reasonably easy to extend to carry tunnel congestion reports.  The Feedback module acts as the IPFIX Exporter, and the Collector module acts as the IPFIX Collector.

   Since SCTP is the preferred transport for IPFIX, the protocol has a foundation for congestion-friendly behavior; and because SCTP allows partially reliable delivery [RFC3758], IPFIX message streams can be configured so that SCTP does not retransmit certain losses.  This keeps the feedback safe during high levels of congestion in the reverse direction and helps avoid congestion collapse.  When congestion occurs in the network, the Exporter (at the egress) can reduce its IPFIX traffic, so the feedback itself does not further congest a network that is already becoming congested; when the Exporter detects network congestion, it can also reduce the IPFIX reporting frequency to avoid adding congestion while still conveying the congestion status adequately.

   Because the template mechanism in IPFIX is flexible, it allows only the required information to be exported, which also reduces the network load.

   The basic procedure for feedback using IPFIX is as follows:

   (1) The Exporter informs the Collector how to interpret the Information Elements (IEs) in subsequent IPFIX messages by sending a Template.  The Collector simply accepts the Template; which IEs to send is configured by other means not covered by the IPFIX specification.

   (2) The Exporter meters the traffic and sends the congestion level to the Collector.

   Congestion feedback using IPFIX is shown in the figures below.  There are two variations of the congestion feedback model using IPFIX.  In the first, shown in Figure 6(a), the congestion information is sent directly from the egress to the ingress, and the ingress makes decisions based on this information.  In the second, shown in Figure 6(b), the congestion information is sent to a mediation controller instead of the tunnel ingress; the controller is in charge of making decisions according to the network congestion and of controlling the behavior of the ingress node, for example by reducing traffic or refusing new traffic flows.  In this model the congestion information from the egress to the controller is conveyed by IPFIX, but how the controller controls the behavior of the ingress is out of scope of this document.

                              IPFIX
        |------------------------------------------|
        |                                          |
        |                                          |
        |                                          V
   +----------+             tunnel           +-----------+
   |Egress    |==============================|Ingress    |
   |(Exporter)|                              |(Collector)|
   +----------+                              +-----------+

                      (a) Direct Feedback

          IPFIX   +-----------+
        --------->|Controller |#####################
        |         |(Collector)|                    #
        |         +-----------+                    #
        |                                          #
   +----------+             tunnel            +----V--+
   |Egress    |===============================|Ingress|
   |(Exporter)|                               +-------+
   +----------+

                      (b) Mediated Feedback

             Figure 6: IPFIX Congestion Feedback Models

   To support feeding back congestion information, some extensions to the IPFIX protocol are necessary.  According to the congestion-related information defined in Section 4.2 ("Data Information"), a new IE conveying the congestion level is defined for IPFIX.

   Definition of the new IE indicating the congestion level:

      Description:         The congestion level calculated by the
                           Exporter.
      Abstract Data Type:  float32
      Data Type Semantics: quantity
      ElementId:           TBD
      Status:              current

   The example below shows how IPFIX can be used for congestion feedback.

   (1) Sending the Template Set.  The Exporter uses a Template Set to inform the Collector how to interpret the IEs in the subsequent Data Sets.

   +------------------------+--------------------+
   |Set ID=2                |Length=n            |
   +------------------------+--------------------+
   |Template ID=257         |Field Count=m       |
   +------------------------+--------------------+
   |exporterIPv4Address=130 |Field Length=4      |
   +------------------------+--------------------+
   |collectorIPv4Address=211|Field Length=4      |
   +------------------------+--------------------+
   |CongestionLevel=TBD1    |Field Length=4      |
   +---------------------------------------------+
   |Enterprise Number=TBD2                       |
   +---------------------------------------------+

   (2) Sending the Data Set.  The Exporter meters the traffic and sends the congestion information to the Collector in a Data Set.

   +------------------+-------------------+
   |Set ID=257        |Length=n           |
   +--------------------------------------+
   |192.0.2.12                            |
   +--------------------------------------+
   |192.0.2.34                            |
   +--------------------------------------+
   |0.1714                                |
   +--------------------------------------+

   +--------+                              +---------+
   |Exporter|                              |Collector|
   +--------+                              +---------+
       |                                        |
       |                                        |
       |      (1) Sending Template Set          |
       |--------------------------------------->|
       |                                        |
   +--------+                                   |
   |metering|                                   |
   +--------+                                   |
       |      (2) Sending Data Set              |
       |--------------------------------------->|
       |                  .                     |
       |                  .                     |
       |                  .                     |
       |                                        |
       |                                        |

              Figure 7: IPFIX Congestion Flow

   Before sending congestion information to the Collector, the Exporter sends a Template Set.  The Template Set specifies the structure and semantics of the subsequent Data Sets containing the congestion-related information, and the Collector interprets the Data Sets that follow according to the previously received Template Set.  The Exporting Process transmits the Template Set in advance of any Data Sets that use that Template ID, to help ensure that the Collector has the Template Record before receiving the first Data Record.  Data Records that correspond to a Template Record may appear in the same and/or subsequent IPFIX Messages.

   The Exporter meters the traffic passing through it and generates flow records.  It may cache these records and then send cumulative congestion information to the Collector.  When the Exporter detects that the network is heavily congested, it can change the feedback frequency to avoid adding more congestion to the network.

   On receiving the congestion-related information, the Collector makes decisions to control the traffic entering the tunnel in order to reduce the tunnel congestion.
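   For illustration only, the sketch below shows how the Template Set and Data Set of this example could be serialized following the IPFIX message layout of [RFC5101].  The enterprise-specific Information Element ID (1001) and private enterprise number (99999) are placeholders for the TBD1/TBD2 values above, and the helper functions are assumptions of this sketch rather than part of any IPFIX implementation.  In a deployment, such messages would be carried over SCTP from the Feedback module (Exporter) to the Collector, as discussed above.

   <CODE BEGINS>
   import socket
   import struct
   import time

   TEMPLATE_ID = 257           # as in the example above

   def template_set():
       # Field specifiers: IANA IEs 130/211, plus the enterprise-specific
       # CongestionLevel IE (placeholder ID 1001, enterprise number 99999).
       fields  = struct.pack("!HH", 130, 4)                   # exporterIPv4Address
       fields += struct.pack("!HH", 211, 4)                   # collectorIPv4Address
       fields += struct.pack("!HHI", 0x8000 | 1001, 4, 99999) # CongestionLevel, float32
       record  = struct.pack("!HH", TEMPLATE_ID, 3) + fields  # Template ID, Field Count
       return struct.pack("!HH", 2, 4 + len(record)) + record # Set ID 2 = Template Set

   def data_set(exporter_ip, collector_ip, level):
       record  = socket.inet_aton(exporter_ip)
       record += socket.inet_aton(collector_ip)
       record += struct.pack("!f", level)                     # congestion level
       return struct.pack("!HH", TEMPLATE_ID, 4 + len(record)) + record

   def ipfix_message(sets, sequence=0, domain=0):
       body = b"".join(sets)
       header = struct.pack("!HHIII", 10, 16 + len(body),     # Version 10, Length
                            int(time.time()), sequence, domain)
       return header + body

   msg = ipfix_message([template_set(),
                        data_set("192.0.2.12", "192.0.2.34", 0.1714)])
   <CODE ENDS>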
5.3 Other Protocols

   A thorough evaluation of other protocols has not been performed at this time.

6. Benefits

   This section briefly discusses the benefits that tunnel congestion control would bring.

   Tunnel congestion control is a form of local congestion control in which each tunnel is treated as an independent administrative domain for the purposes of congestion feedback and control, and it responds only to congestion occurring within the tunnel.  Tunnel congestion control is complementary to e2e ECN-based control.

   Tunnel congestion feedback provides the network administrator with congestion level information that can be used as an input to local network management, rather than relying solely on e2e congestion control or blind traffic throttling.  If the tunnel is congested, allowing new traffic to enter wastes resources, because that traffic may eventually be dropped in the tunnel; it is more efficient to control new traffic at the ingress.

7. Security Considerations

   This document describes tunnel congestion calculation and feedback.  For feeding back congestion information, the security mechanisms of IPFIX are expected to be sufficient.  No additional security concerns are expected.

8. IANA Considerations

   IANA assignment of parameters for the IPFIX extension may need to be considered for this document.

9. References

9.1 Normative References

   [RFC2003]  Perkins, C., "IP Encapsulation within IP", RFC 2003, October 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2784]  Farinacci, D., Li, T., Hanks, S., Meyer, D., and P. Traina, "Generic Routing Encapsulation (GRE)", RFC 2784, March 2000.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001.

   [RFC3758]  Stewart, R., Ramalho, M., Xie, Q., Tuexen, M., and P. Conrad, "Stream Control Transmission Protocol (SCTP) Partial Reliability Extension", RFC 3758, May 2004.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, December 2005.

   [RFC5101]  Claise, B., Ed., "Specification of the IP Flow Information Export (IPFIX) Protocol for the Exchange of IP Traffic Flow Information", RFC 5101, January 2008.

   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion Notification", RFC 6040, November 2010.

9.2 Informative References

   [I-D.boucadair-sfc-framework]  Boucadair, M., et al., "Service Function Chaining: Framework & Architecture", draft-boucadair-sfc-framework-00 (work in progress), October 2013.

   [I-D.zong-vnfpool-problem-statement]  Zong, N., et al., "Virtualized Network Function (VNF) Pool Problem Statement", draft-zong-vnfpool-problem-statement-02 (work in progress), January 2014.

   [TS29.060] 3GPP TS 29.060, "General Packet Radio Service (GPRS); GPRS Tunnelling Protocol (GTP) across the Gn and Gp interface".

Authors' Addresses

   Xinpeng Wei
   Beiqing Rd. Z-park No.156, Haidian District,
   Beijing, 100095, P. R. China
   E-mail: weixinpeng@huawei.com

   Zhu Lei
   Beiqing Rd. Z-park No.156, Haidian District,
   Beijing, 100095, P. R. China
   E-mail: lei.zhu@huawei.com

   Lingli Deng
   Beijing, 100095, P. R. China
   E-mail: denglingli@gmail.com