Operations and Management Area Working Group                      P. Fan
Internet-Draft                                                     L. Li
Intended status: Informational                              China Mobile
Expires: January 16, 2014                                  July 15, 2013


  Requirements for IP/MPLS network transmission interruption duration
             draft-fan-opsawg-transmission-interruption-03

Abstract

   The transmission performance of an IP/MPLS network affects the
   upper-layer services and networks it carries, but the industry has
   not yet reached consensus on transmission interruption criteria for
   IP/MPLS networks.  This memo studies requirements for interruption
   duration criteria in several service scenarios.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 16, 2014.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Services and Performance Criteria
     2.1.  Softswitch
     2.2.  SS7 transport
     2.3.  LTE Backhaul
     2.4.  Ethernet VPN
     2.5.  IPTV
   3.  Other considerations
   4.  Security Considerations
   5.  IANA Considerations
   6.  Appendix: Impact Analysis on Transmission Quality of IP
       Carried Softswitch Voice
   7.  Acknowledgements
   8.  Informative References
   Authors' Addresses

1.  Introduction

   Today, IP/MPLS networks are widely used as bearer networks carrying
   diverse packet-switched services.  The transmission quality of these
   services is closely related to the performance of the bearer layer,
   because network failures, delay, congestion, and other abnormalities
   inevitably cause service interruption and degrade user-perceived
   quality.  However, the industry has not yet reached consensus on
   transmission interruption criteria for IP/MPLS networks.  This memo
   studies the relationship between service performance and
   transmission interruption duration in several scenarios, with the
   aim of arriving at a list of requirements for these interruption
   duration criteria.

   For a long time the industry has aspired to the so-called gold
   standard for network resilience: the 50-millisecond recovery
   threshold.  [HeavyReading] gives a basic introduction to the origin
   of this fast-protection legacy, which dates back to the 1980s.  The
   50 ms threshold was established informally in the early 1980s and
   then formally through ITU-T Recommendation [G.841] on SDH network
   protection architectures.  That requirement sets a maximum of 60 ms
   for detecting and restoring a fault, made up of a fault detection
   time of less than 10 ms plus a protection switching time of less
   than 50 ms.  The report also describes the original concerns that
   gave rise to the threshold.
   The voice channel banks deployed in the early 1980s had limited fault
   tolerance.  Failures lasting longer than 200 ms would generate a
   Carrier Group Alarm (CGA), which caused the channel bank to terminate
   all connections over the affected TDM line.  Carriers therefore
   developed an outage budget, and the 50 ms standard was adopted to
   protect voice services.  However, newer channel banks of that time
   had already started to implement a CGA timer of 2 s, so the 50 ms
   protection in fact served only a small and diminishing fraction of
   the digital network.

   Historically, this 50 ms protection speed has been achieved by SDH
   networks.  Using various fast convergence techniques, IP/MPLS
   networks are also able to react within 50 ms.  Meanwhile, the
   applications carried over optical and packet cores have changed over
   the past decades, and the need for 50 ms protection has been
   repeatedly questioned.  We therefore list three basic considerations
   about services and their requirements on IP/MPLS:

   o  For services such as TDM over IP/MPLS, the traditional 50 ms
      guarantee should be kept and met.

   o  For current IP services (e.g., voice and Internet services),
      operational experience or experimental results should be provided
      as guidance.

   o  For future services, requirements should be proposed early, taking
      the capabilities of IP/MPLS into account.

2.  Services and Performance Criteria

   Services delivered over IP/MPLS networks have different transmission
   quality requirements and thus impose different performance criteria
   on the underlying IP/MPLS bearer.  We believe two principles need to
   be considered during network and service design, configuration, and
   operation: the IP/MPLS bearer should satisfy the quality requirements
   of upper-level services and applications, while services and
   applications should in turn take the intrinsic capabilities of IP
   into account.
   In this section, we describe concerns about the mutual adaptation of
   IP/MPLS and services from the perspective of several service
   scenarios.

2.1.  Softswitch

   From the softswitch point of view, carrying voice over IP has a
   certain influence on service quality.  In particular, when speech is
   delivered over IP, the quality of voice communication is impaired,
   which in turn places higher requirements on the transmission
   performance of IP.  The following table lists criteria for the
   transmission quality of a typical GSM network, together with the
   impacting factors introduced by an IP bearer.

   +-----------------------------------------+------------------------+
   |             Criteria of GSM             |   Impacting Factors    |
   |          Transmission Quality           | Introduced by IP Bearer|
   +----------+------------------------------+------------------------+
   |          | Call loss of wireless channel|          None          |
   |          +------------------------------+------------------------+
   |          |  Call loss between switches  |    Failure of Nc/Mc    |
   | Call Loss|    (typical value: <=1%)     | interface carried by IP|
   |          +------------------------------+------------------------+
   |          | Call loss between switch and |          None          |
   |          | BSC (typical value: <=0.5%)  |                        |
   +----------+------------------------------+------------------------+
   |   Call   |      Call cut-off rate       |    Failure of Nc/Mc    |
   | Cut-off  |     (typical value: <1%)     | interface carried by IP|
   +----------+------------------------------+------------------------+
   |          |   Service providing delay    |          None          |
   |          +------------------------------+------------------------+
   |Connection|   Calling party connection   |  IP-carried signaling  |
   |  Delay   | delay (typical value: <=4s)  |         delay          |
   |          +------------------------------+------------------------+
   |          | Called party connection delay|          None          |
   |          |    (typical value: <=4s)     |                        |
   +----------+------------------------------+------------------------+

   If voice is carried over IP, the communication
   quality criteria of call loss, call cut-off, and connection delay are
   likely to be affected.  This subsection focuses on these three
   criteria and their impacting factors in order to derive requirements
   for softswitch and IP bearer networks; the detailed analysis is given
   in the appendix (Section 6).  Note that the current discussion of
   softswitch focuses on the quality of transmission rather than the
   quality of voice.  In other words, the scope of discussion is limited
   to network-related QoS aspects, while subjective QoE criteria such as
   PESQ (Perceptual Evaluation of Speech Quality) and MOS (Mean Opinion
   Score) are left to later revisions.

   Call loss related requirement:  The SCTP interface association timer
      should be shorter than the state machine message timer of
      upper-layer protocols, and is further recommended to be no longer
      than 6 seconds in order to maintain detection sensitivity.  The
      interruption duration of the IP bearer network should be as short
      as possible to avoid call loss, and is further recommended to be
      no longer than 5 seconds.

   Call cut-off related requirement:  The SCTP association should be
      maintained during an IP layer interruption to avoid interface
      breakoff alerts.  The requirements are the same as those related
      to call loss.

   Connection delay related requirement:  The IP convergence time should
      be no longer than 3 seconds to ensure that the connection delay is
      shorter than 4 seconds.

   Overall, the IP/MPLS interruption duration for softswitch services
   should be no longer than 3 seconds.

2.2.  SS7 transport

   The Signaling System No. 7 (SS7/C7) network is an example of the
   principle that services should take the capabilities of IP into
   account.  The bearer of the SS7 protocol stack has been evolving from
   TDM to IP.  Traditionally, the user parts of SS7 (including MAP,
   CAP, BSSAP+, ISUP, etc.)
   are carried by the MTP layers, but the bearer has gradually evolved
   into a packetized form in which SIGTRAN adaptation layers (including
   M2PA, M2UA, M3UA, etc.) run over SCTP associations over IP.  This
   change requires the transport layer to provide mechanisms that meet
   the demands of SCN signaling and, more importantly, requires the
   protocols to adapt to the "best effort" nature of IP.

   SIGTRAN uses an architecture that can be described as standard IP
   plus a unified transport layer plus diverse adaptation modules.  It
   introduces SCTP to provide reliable signaling transport over IP.
   SCTP itself provides reliability mechanisms such as path selection
   and monitoring, validation and acknowledgement, and retransmission
   timer management.

   The unreliable nature of IP makes it necessary for upper-level
   protocols to be more tolerant of possible instability of the bearer.
   Once a service request from a UE is accepted, the system allocates
   resources and establishes paths for the user.  A breakoff caused by
   IP results in signaling disconnection or rerouting, and the signaling
   transmission path may be switched back after the IP layer recovers.
   Frequent switchovers and disconnections lead to unnecessary system
   cost and service interruption, so parameters should be configured to
   be somewhat "insensitive" in order to sustain control-plane
   connections.

   Timer values are one example of such parameter configuration.  The
   following two cases concern SCTP at the transport layer and M2PA at
   the adaptation layer.  The values should not be set too small, to
   avoid unnecessary disconnections caused by IP instability.  However,
   because the upper-layer services of SS7 may also have timeout rules,
   the values should not be set too large either, to avoid violating
   those rules.

   1) SCTP

   SCTP uses the retransmission timeout (RTO) to manage how long it
   waits before retransmitting when no acknowledgement is received.
   The RTO has initial, maximum, and minimum values and is recalculated
   dynamically according to a set of rules.  Several other SCTP
   parameters are used for fault detection.  Association.Max.Retrans is
   the maximum number of consecutive retransmissions allowed before the
   peer endpoint is considered unreachable.  Path.Max.Retrans is a
   similar limit used to detect the failure of a single path.  Together,
   these parameters characterize the ability of SCTP to tolerate the
   bearer below while providing reliable SS7 transport to the layers
   above.  Typical values of these parameters are RTO.Initial = 0.5 s,
   RTO.Min = 0.5 s, RTO.Max = 1.5 s, Path.Max.Retrans = 5, and
   Association.Max.Retrans = 10.

   2) M2PA

   Although protocols such as H.248 and BICC can be carried directly
   over SCTP, the user part protocols of SS7 usually have to be carried
   over SCTP/IP with the help of an adaptation layer.  In this case, the
   attributes of the adaptation layer, e.g., M2PA used between STPs, are
   more important to SS7.  M2PA uses timer T7 to bound the delay of
   acknowledgements: T7 is started when data is transmitted, and if no
   message is acknowledged within the maximum waiting time, T7 expires
   and M2PA sends an out-of-service message to the peer end.  Because
   propagation delays in IP networks are more variable than in
   traditional SS7 networks, the value of T7 should be set taking into
   account IP propagation delay, acknowledgement time, the SCTP
   slow-start algorithm, upper-layer service timers, and other factors.
   A typical value of T7 is 7 to 10 seconds.

   Configuring parameters for tolerance to the bearer may have some
   influence on services, but it avoids service cut-off and severe
   degradation of user perception.  For services such as SMS or route
   lookup, some latency may be introduced, but operations can still be
   completed after a short delay.  Because SMS has no strict real-time
   requirement, the impact on this service is limited.
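
   The SCTP timer values above determine how long SCTP waits before it
   gives up on a path or on the peer.  As a rough illustration only,
   the following sketch estimates that failure detection time from the
   typical values quoted above.  It assumes the backoff rules of the
   SCTP specification (RFC 4960): the RTO is doubled on each
   retransmission timeout, capped at RTO.Max, and a path (or the peer)
   is declared unreachable once its error counter exceeds
   Path.Max.Retrans (or Association.Max.Retrans).  It further assumes a
   single-homed association, continuous traffic (heartbeats are
   ignored), and an RTO equal to RTO.Initial when the failure begins.
   The function name is hypothetical.

```python
def failure_detection_time(rto_initial: float, rto_max: float, max_retrans: int) -> float:
    """Rough time (s) until SCTP declares a path or peer unreachable.

    Sketch only: assumes each timeout doubles the RTO (capped at rto_max)
    and that failure is declared when the error counter exceeds
    max_retrans, i.e. after max_retrans + 1 consecutive timeouts.
    """
    total = 0.0
    rto = min(rto_initial, rto_max)
    for _ in range(max_retrans + 1):
        total += rto                   # wait one full RTO for an ack that never arrives
        rto = min(rto * 2.0, rto_max)  # exponential backoff, bounded by RTO.Max
    return total


# Typical values quoted in this memo (seconds and retransmission counts).
RTO_INITIAL, RTO_MAX = 0.5, 1.5
PATH_MAX_RETRANS, ASSOC_MAX_RETRANS = 5, 10

path_time = failure_detection_time(RTO_INITIAL, RTO_MAX, PATH_MAX_RETRANS)
assoc_time = failure_detection_time(RTO_INITIAL, RTO_MAX, ASSOC_MAX_RETRANS)
print(f"path failure detected after ~{path_time} s")       # ~7.5 s
print(f"peer declared unreachable after ~{assoc_time} s")  # ~15.0 s
```

   Under these assumptions the timeouts run 0.5 s, 1.0 s, and then
   1.5 s each, so short bearer interruptions are absorbed by
   retransmission rather than by tearing down the path.  Real
   implementations may behave differently, for example with
   heartbeat-based detection on idle paths, multi-homing, or
   vendor-specific association timers.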

   If route lookup takes longer because of IP interruption and
   convergence, users may experience a longer setup delay when dialing.
   For location update, even if the operation fails because the bearer
   is interrupted for too long, the UE has a mechanism to initiate the
   request again.

2.3.  LTE Backhaul

   To be further analyzed.

2.4.  Ethernet VPN

   Ethernet VPNs (e.g., VPLS) provide transparent Ethernet layer 2
   connections for customers.  Ethernet frames are treated as service
   payload, encapsulated, and transported across the provider's MPLS
   network.  The interruption criteria of the IP/MPLS bearer should
   guarantee the continuity of the Ethernet service; that is, an IP/MPLS
   failover should not cause an outage of the Ethernet service.

   [Y.1731] and [IEEE802.1ag] describe in detail the OAM functions and
   mechanisms for Ethernet, with specific recommendations on
   connectivity fault management.  Ethernet uses a continuity check
   function to detect loss of continuity between any pair of MEPs in a
   MEG; this function is realized by sending CCMs (Continuity Check
   Messages) between peer MEPs.  When a MEP does not receive a CCM from
   a peer MEP within a certain interval, it detects loss of continuity
   to that peer MEP.  The threshold interval is specified as 3.5 times
   the CCM transmission period, which corresponds to the loss of three
   consecutive CCMs from the peer MEP, and the CCM transmission period
   is recommended to be the default value of 1 second, so loss of
   continuity is declared after 3.5 seconds.  Therefore, the
   interruption duration of IP/MPLS for Ethernet VPN services should be
   less than 3 seconds.

2.5.  IPTV

   To be further analyzed.

3.  Other considerations

   So far, this document has focused on use cases and their requirements
   on IP/MPLS; other practical issues are not addressed in this version.
   For example, an IP/MPLS packet core is expected to carry a variety
   of services, so the requirements on IP/MPLS may have to include
   additional concerns about this multi-service coexistence scenario.
   A simple and straightforward approach may be to satisfy the most
   stringent protection time required by any of the services.  Another
   issue is service awareness: whether the service type is, or can be,
   known to IP/MPLS affects the ability of IP/MPLS to provide
   reliability guarantees accordingly.  Service identification appears
   easier to perform on edge devices than in the network core.  We
   believe these issues need to be taken into account, and we leave
   them to future revisions.

4.  Security Considerations

   TBD

5.  IANA Considerations

   This memo includes no request to IANA.

6.  Appendix: Impact Analysis on Transmission Quality of IP Carried
    Softswitch Voice

   This section describes the impact on the transmission quality of
   softswitch voice when it is carried over IP, and the resulting
   requirements for IP bearer convergence time.

   1) Call Loss

   Call loss describes the situation in which a phone call initiated by
   a subscriber fails to be established because of network faults.  In
   practical networks, the call loss rate is mainly associated with the
   following factors:

   1.  Interfaces, including Nc, Mc, and the interface between the MSS
       and the SG.

   2.  State machine message timer.  If a timeout occurs, the state
       machine releases the signaling messages, producing a call loss.
       The typical value of the BICC timer is 10 to 15 seconds, and the
       value of the DTAP timer is about 15 seconds.

   3.  Interface association timer.  The association breaks off when
       this timer expires.

   4.  Bearer network convergence time.
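
   The four factors listed above interact in the way analyzed in the
   remainder of this subsection.  The following sketch encodes one
   reading of that analysis: given the state machine timer, the
   association timer, and the IP convergence (interruption) time, it
   classifies the expected outcome, checks the call loss related
   requirements of Section 2.1, and evaluates the call loss rate
   formula for the association-breakoff case.  CAPS and BHCA are taken
   as call attempts per second and busy hour call attempts.  All names
   and example numbers (including the association restoration time) are
   hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SoftswitchTimers:
    """Illustrative inputs, all in seconds (hypothetical names)."""
    state_machine_timer: float  # e.g. BICC timer, typically 10 to 15 s
    association_timer: float    # SCTP interface association timer
    ip_convergence: float       # IP bearer interruption/convergence time


def call_loss_outcome(t: SoftswitchTimers) -> str:
    """Classify the outcome of an IP interruption (one reading of the analysis)."""
    if t.ip_convergence >= t.association_timer:
        # SCTP declares the association broken: massive call loss.
        return "association broken"
    if t.ip_convergence >= t.state_machine_timer:
        # Association survives, but upper-layer message timers expire first.
        return "message timer expiry"
    # Association survives and data is retransmitted; loss ~0 with enough buffer.
    return "retransmitted"


def call_loss_rate(ip_convergence: float, assoc_restoration: float,
                   caps: float, bhca: float) -> float:
    """(IP Convergence Time + Association Restoration Time) * CAPS / BHCA."""
    return (ip_convergence + assoc_restoration) * caps / bhca


def meets_call_loss_requirements(t: SoftswitchTimers) -> bool:
    """Association timer < state machine timer and <= 6 s; interruption <= 5 s."""
    return (t.association_timer < t.state_machine_timer
            and t.association_timer <= 6.0
            and t.ip_convergence <= 5.0)


if __name__ == "__main__":
    ok = SoftswitchTimers(state_machine_timer=10.0, association_timer=6.0,
                          ip_convergence=3.0)
    print(call_loss_outcome(ok), meets_call_loss_requirements(ok))  # retransmitted True

    slow = SoftswitchTimers(state_machine_timer=10.0, association_timer=6.0,
                            ip_convergence=8.0)
    print(call_loss_outcome(slow))  # association broken

    # Hypothetical busy hour: 360000 BHCA, i.e. an average of 100 CAPS.
    rate = call_loss_rate(ip_convergence=8.0, assoc_restoration=10.0,
                          caps=100.0, bhca=360000.0)
    print(f"{rate:.3%}")  # 0.500%
```

   In this reading, requiring the association timer to be shorter than
   the state machine timer rules out the "message timer expiry" case:
   any interruption long enough to expire the message timer then also
   breaks the association.  When CAPS is the busy-hour average
   (BHCA / 3600), the formula reduces to the total outage time divided
   by 3600 seconds.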

   If the configured state machine timer is shorter than the interface
   association timer, call loss may still occur because of message
   timer expiration, even though the interface association is not
   broken off.  If the association timer is shorter than the IP routing
   convergence time, SCTP considers the association broken off; message
   loss at the interface between the MSS and the SG and at interface Nc
   then results in massive call loss, and new call requests cannot be
   served because interface Mc is broken off.  In this case, the call
   loss rate can be calculated as

      Call Loss Rate = ( IP Convergence Time + Association Restoration
                         Time ) * CAPS / BHCA

   where CAPS is the number of call attempts per second and BHCA is the
   number of busy hour call attempts.

   However, if the association timer is longer than the IP routing
   convergence time, SCTP considers the association normal and the data
   is retransmitted.  Although this may cause buffer overflow leading
   to call loss, the call loss rate can approach zero if the buffer is
   large enough.

   Based on the above analysis and practical operational experience,
   the requirements for softswitch and IP bearer are as follows: the
   SCTP interface association timer should be shorter than the state
   machine message timer, and is further recommended to be no longer
   than 6 seconds in order to maintain detection sensitivity; the
   interruption duration of the IP bearer network should be as short as
   possible to avoid call loss during the IP layer interruption period,
   and is further recommended to be no longer than 5 seconds.

   2) Call Cut-off

   Call cut-off refers to the abnormal release of an ongoing phone call
   for reasons other than an intentional release by any of the parties
   involved.  The call cut-off rate is related to:

   1.  Interfaces, including Nc and the interface between the MSS and
       the SG.

   2.  Interface association timer.

   3.  Bearer network convergence time.

   If the association timer is shorter than the IP routing convergence
   time, established phone calls will be released once the interruption
   of interface Nc or of the interface between the MSS and the SG is
   detected.  In the case of association breakoff, the call cut-off rate
   can be calculated as

      Call Cut-off Rate = ( CAPS * Call Duration ) * Busy Hour
                          Association Breakoffs / BHCA

   If the association is not interrupted, the call cut-off rate can be
   approximately zero.

   In conclusion, the SCTP association should be maintained during an
   IP layer interruption to avoid interface breakoff alerts.  The
   requirements for softswitch and IP bearer are the same as those
   related to call loss.

   3) Connection Delay

   The connection delay from call initiation by a calling party to the
   PLMN should be no longer than 4 seconds.  This delay is affected by
   the following factors:

   1.  RRC connection setup delay (independent of whether the service
       is carried over IP).

   2.  Core network signaling interaction delay.  The number of
       messages at interface Nc/Nb is 6, and at interface Mc it is 8
       (calling side) or 16 (called side, in the IP-IP case).  Each
       message has a delay of no longer than 50 milliseconds, so the
       calling message delay at interface Nc is no longer than 300
       milliseconds (6 x 50 ms).  If a long-distance call is made
       through the CMN, the message delay is increased by the
       transmission delay (about 5 microseconds per km) and the CMN
       processing delay, so the message delay is likely to be about 400
       milliseconds.

   3.  IP bearer network QoS and load.

   The connection delay is influenced by the delay criterion defined in
   the IP bearer network QoS, and is increased by delay, jitter, and
   packet loss caused by network overload.
   In addition, if the configured interface association timer is too
   long, SCTP becomes less sensitive to the messages retransmitted after
   packet loss, which increases the connection delay.

   Connection delay is generally expressed as

      Connection Delay = IP Convergence Time + RRC Connection Setup
                         Delay + Signaling Interaction Delay

   and must be no longer than 4 seconds.  Therefore, in the normal
   working state, the load of the IP network should be kept within a
   range that keeps the delay below 50 milliseconds, while in the
   interruption state the IP convergence time should be no longer than
   3 seconds to ensure that the connection delay is shorter than 4
   seconds.

   From the analysis of IP/MPLS performance against the three criteria
   above, we suggest that the transmission interruption duration of the
   IP/MPLS network for softswitch services should be no longer than 3
   seconds.

7.  Acknowledgements

   The authors would like to thank Chris Donley and Melinda Shore for
   their kind help in enriching the content, and Christopher
   Liljenstolpe, Andrew Malis, and Adrian Farrel for their helpful
   comments on the document.

8.  Informative References

   [G.841]    ITU-T, "Types and characteristics of SDH network
              protection architectures", ITU-T Recommendation G.841,
              October 1998.

   [HeavyReading]
              Bennett, G., "Resilience Reliability and OAM in Converged
              Network", Heavy Reading, Vol. 2, No. 6, February 2004.

   [IEEE802.1ag]
              IEEE, "IEEE Standard for Local and metropolitan area
              networks, Virtual Bridged Local Area Networks, Amendment
              5: Connectivity Fault Management", IEEE Std 802.1ag-2007,
              December 2007.

   [Y.1731]   ITU-T, "OAM Functions and Mechanisms for Ethernet based
              Networks", ITU-T Recommendation Y.1731, July 2011.

Authors' Addresses

   Peng Fan
   China Mobile
   32 Xuanwumen West Street, Xicheng District
   Beijing  100053
   P.R. China

   Email: fanpeng@chinamobile.com

   Lianyuan Li
   China Mobile
   32 Xuanwumen West Street, Xicheng District
   Beijing  100053
   P.R. China

   Email: lilianyuan@chinamobile.com