TSVWG                                                          Y. Zhuang
Internet-Draft                                                  B. Zhang
Intended status: Informational                                    H. Pan
Expires: April 20, 2020                    Huawei Technologies Co., Ltd.
                                                        October 18, 2019


   Artificial Intelligence (AI) based ECN adaptive reconfiguration for
                           datacenter networks
                  draft-zhuang-tsvwg-ai-ecn-for-dcn-00

Abstract

   This document describes an artificial intelligence (AI) based method
   for adaptively reconfiguring Explicit Congestion Notification (ECN)
   parameters in datacenter networks.
Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 20, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Background
     1.2.  Intent
     1.3.  Terminology
   2.  Architecture of the AI ECN datacenter networks
   3.  Scene-based ECN adaptive reconfiguration with AI
     3.1.  Scene Training
     3.2.  Scene Identification and ECN Adaptive Reconfiguration
   4.  Data collection and AI ECN adaptive reconfiguration
     4.1.  Data collection
     4.2.  ECN adaptive Reconfiguration
   5.  Security Considerations
   6.  Manageability Consideration
   7.  IANA Considerations
   8.  References
     8.1.  Normative References
     8.2.  Informative References
   Acknowledgements
   Authors' Addresses

1.  Introduction

1.1.  Background

   Explicit Congestion Notification (ECN), defined for IP in [RFC3168],
   allows congestion to be signaled before packets have to be dropped.
   Because fewer packets are dropped and retransmitted, application
   latency is reduced.  ECN is also supported in MPLS (RFC 5129).  For
   tunneling, [RFC6040] defines how the ECN field is to be handled for
   IP-in-IP tunnels.

   Transport protocols have also been specified to support ECN,
   including TCP [RFC3168], DCCP [RFC4341] [RFC4342] [RFC5632], and RTP
   over UDP [RFC6679].

   With ECN marking, active queue management (AQM) on a network device
   can indicate congestion without packet loss, instead of dropping
   packets, which can trigger retransmissions and increase latency.  By
   signaling congestion early, AQM lets congestion-controlled transports
   keep the queue length in device buffers short and thus reduce the
   latency of traffic.  Random Early Detection (RED), described in
   [RFC2309], is one AQM algorithm that was recommended for
   implementation in routers.
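   As an illustration only, the following minimal Python sketch shows
   how a RED-style AQM can use ECN: an exponentially weighted average of
   the queue length is compared against a minimum and a maximum
   threshold, and an arriving packet is selected with a probability that
   rises linearly between them.  A selected ECN-capable packet is marked
   Congestion Experienced (CE) as in [RFC3168]; a selected packet that
   is not ECN-capable is dropped.  The names RedEcnParams,
   mark_probability, and congestion_action and the example values are
   hypothetical.  The sketch omits parts of full RED (such as the
   count-based probability adjustment and idle-period handling), and
   whether ECN-capable packets above the maximum threshold are marked or
   dropped varies between implementations.

```python
from dataclasses import dataclass

# ECN codepoints carried in the two ECN bits of the IP header (RFC 3168).
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


@dataclass
class RedEcnParams:
    """Hypothetical container for the RED parameters an operator would tune."""
    min_th: float       # average queue length (packets) at which marking starts
    max_th: float       # average queue length (packets) at which probability becomes 1
    max_p: float        # marking probability just below max_th
    w_q: float = 0.002  # weight of the exponentially weighted moving average


def update_avg(avg: float, queue_len: int, p: RedEcnParams) -> float:
    """Update the EWMA of the instantaneous queue length on packet arrival."""
    return (1.0 - p.w_q) * avg + p.w_q * queue_len


def mark_probability(avg: float, p: RedEcnParams) -> float:
    """Simplified RED: 0 below min_th, linear up to max_p, 1 at or above max_th."""
    if avg < p.min_th:
        return 0.0
    if avg >= p.max_th:
        return 1.0
    return p.max_p * (avg - p.min_th) / (p.max_th - p.min_th)


def congestion_action(ecn: int, avg: float, p: RedEcnParams, rand: float) -> tuple[str, int]:
    """Return (action, new ECN codepoint) for one packet; rand is uniform in [0, 1)."""
    if rand >= mark_probability(avg, p):
        return "forward", ecn
    if ecn != NOT_ECT:
        # ECN-capable (ECT(0), ECT(1), or already CE): signal congestion with CE.
        return "mark", CE
    # Not ECN-capable: fall back to dropping the packet.
    return "drop", ecn


if __name__ == "__main__":
    params = RedEcnParams(min_th=20, max_th=80, max_p=0.1)  # example values only
    print(mark_probability(50, params))                  # halfway between thresholds
    print(congestion_action(ECT_0, 50, params, 0.01))    # ('mark', 3)
    print(congestion_action(NOT_ECT, 50, params, 0.01))  # ('drop', 0)
```

   In this sketch, min_th, max_th, and max_p are exactly the kind of
   per-device knobs whose good values depend on the traffic mix.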
   As stated in [RFC7567], RED can be an effective algorithm when its
   parameters are set properly.  However, dynamically predicting the
   right set of parameters (the minimum threshold and the maximum
   threshold) is difficult, and as a result RED's present use in the
   Internet is limited.  Other AQM algorithms have since been developed,
   but finding proper parameters for these algorithms for the traffic
   of particular applications is still difficult, and poorly chosen
   parameters degrade network performance.

   In datacenter networks, traffic patterns change as applications such
   as storage and high-performance computing are deployed and as their
   traffic changes, which makes the network more dynamic.  At the same
   time, these applications have strict requirements for high
   throughput and ultra-low latency.  In this environment, it is
   challenging to find a single static ECN configuration that suits all
   traffic at all times.

   This document therefore describes a way to adaptively reconfigure
   ECN in a running datacenter network by using AI technologies.

1.2.  Intent

   Our intent is to use AI technologies to find proper ECN parameters
   and reconfigure them adaptively, so that a running datacenter network
   tunes itself as its traffic and resource usage change, and network
   performance improves.

   We also offer this work as a starting point for using AI
   technologies to find adaptive parameters for other algorithms and
   network reconfigurations.  This document does not change how ECN
   works as defined in [RFC3168].

1.3.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.  Architecture of the AI ECN datacenter networks

   Figure 1 shows a simple two-layer (spine-leaf) datacenter network
   with an analyzer that performs AI-based ECN adaptive reconfiguration
   as network traffic changes.

   +--------------------------------------------------------------+
   |                           Analyzer                           |
   +-.----------.----------.--------------.----------.----------.-+
     .          .          .              .          .          .
     . +--------.--------+ .              . +--------.--------+ .
     . |      Spine      | .              . |      Spine      | .
     . +--+---------+-+-++ .              . ++-+-+--------+---+ .
     .    |         | | |  .              .  | | |        |     .
     .    |         | | +--.--------------.--|-|-|--------|-+   .
     .  +-|---------|-|----.--------------.--+ | |        | |   .
     .  | |         | +----.--------------.---+| |        | |   .
     .  | |         |   +--.--------------.---|+ |        | |   .
     .  | |         |   |  .              .   |  |        | |   .
   +-.--+-+-+      ++---+--.+            +.---+--++      ++-+---.-+
   |  Leaf  |      |  Leaf  |            |  Leaf  |      |  Leaf  |
   ++------++      ++------++            ++------++      ++------++
    |      |        |      |              |      |        |      |
   +++    +++      +++    +++            +++    +++      +++    +++
   |S| ...|S|      |S| ...|S|            |S| ...|S|      |S| ...|S|
   +-+    +-+      +-+    +-+            +-+    +-+      +-+    +-+

   ........  information collecting path

   --------  data path

        Figure 1. The architecture of a 2-layer data center network

   The analyzer can be integrated into a spine switch or be an
   independent device; this is left to implementations.  In this
   design, it is responsible for periodically collecting device
   information and inferring proper parameters for ECN adaptive
   reconfiguration.

3.  Scene-based ECN adaptive reconfiguration with AI

   The idea of AI ECN in this document is to identify the "scene" of
   the current network at a given time based on information collected
   over a period.
   The identified scene (which can also be considered a network traffic
   pattern) is one of the scenes that were collected and learned, during
   a training process, from datacenter networks running the traffic of
   various applications.  The ECN settings for these scenes are decided
   based on human experience.  The ECN parameters of the current network
   can then be tuned to the settings of the identified scene.  This
   adaptive reconfiguration process runs periodically so that the
   configuration follows changes in the running network caused by
   changing traffic.

3.1.  Scene Training

   Scene training is the first process in the procedure.  It consists of
   two steps.  First, typical scenes are constructed, and a learning
   model is generated that identifies these scenes from a set of network
   performance indicators.  Second, proper ECN settings are provided for
   these typical scenes based on human experience.

   In the first step, the network operator might need to select, based
   on experience, some typical applications and traffic combinations to
   be used as the typical training scenes.  For these scenes, a learning
   algorithm (for example, a neural network) is run to learn the
   characteristics of each scene from periodically collected network
   performance indicators.

   The selected network performance indicators can include the port
   bandwidth and queue size of devices, and other indicators that might
   be related to the applications and traffic in the network.

   In the second step, the experience of network administrators is used
   to provide proper ECN configurations for these typical scenes.  AI
   technologies can also be used to enrich the set of scenes based on
   this human experience; this is left to implementations.

3.2.  Scene Identification and ECN Adaptive Reconfiguration

   In an operational network, the analyzer periodically collects the
   selected network performance indicators from network nodes.  This
   information is used as input to the pre-learned model, which outputs
   the identified scene.  The ECN settings of the network devices are
   then reconfigured to the parameters of the identified scene, and
   this is repeated in every period.

   The length of the adaptation period can be decided from experience,
   or it can be an output of the training process described in
   Section 3.1.

4.  Data collection and AI ECN adaptive reconfiguration

4.1.  Data collection

   In both the training and the adaptive reconfiguration processes, the
   analyzer needs to collect information about the network, i.e., a set
   of network performance indicators.

   Data collection can be achieved using gRPC, YANG-Push, or other
   protocols.

4.2.  ECN adaptive Reconfiguration

   The adaptive reconfiguration of ECN in a running network can be
   achieved using management protocols such as NETCONF.

5.  Security Considerations

   TBD

6.  Manageability Consideration

   TBD

7.  IANA Considerations

   This document has no IANA actions.

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

8.2.  Informative References

   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
              S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
              Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
              S., Wroclawski, J., and L. Zhang, "Recommendations on
              Queue Management and Congestion Avoidance in the
              Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998.
   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, DOI 10.17487/RFC3168, September 2001.

   [RFC4341]  Floyd, S. and E. Kohler, "Profile for Datagram Congestion
              Control Protocol (DCCP) Congestion Control ID 2: TCP-like
              Congestion Control", RFC 4341, DOI 10.17487/RFC4341, March
              2006.

   [RFC4342]  Floyd, S., Kohler, E., and J. Padhye, "Profile for
              Datagram Congestion Control Protocol (DCCP) Congestion
              Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342,
              DOI 10.17487/RFC4342, March 2006.

   [RFC5632]  Griffiths, C., Livingood, J., Popkin, L., Woundy, R., and
              Y. Yang, "Comcast's ISP Experiences in a Proactive Network
              Provider Participation for P2P (P4P) Technical Trial",
              RFC 5632, DOI 10.17487/RFC5632, September 2009.

   [RFC6040]  Briscoe, B., "Tunnelling of Explicit Congestion
              Notification", RFC 6040, DOI 10.17487/RFC6040, November
              2010.

   [RFC6679]  Westerlund, M., Johansson, I., Perkins, C., O'Hanlon, P.,
              and K. Carlberg, "Explicit Congestion Notification (ECN)
              for RTP over UDP", RFC 6679, DOI 10.17487/RFC6679, August
              2012.

   [RFC7567]  Baker, F., Ed. and G. Fairhurst, Ed., "IETF
              Recommendations Regarding Active Queue Management",
              BCP 197, RFC 7567, DOI 10.17487/RFC7567, July 2015.

Acknowledgements

   We would like to thank the following persons for their great efforts
   and contributions to the work: Huafeng Wen, Binghui Wu, Weiqin Kong,
   Ke Meng, Xitong Jia, Liang Shan, Siyu Yan, Weishan Deng, Boding Wang,
   Jungan Yan, Haonan Ye and Liang Zhang.

Authors' Addresses

   Yan Zhuang
   Huawei Technologies Co., Ltd.

   Email: zhuangyan.zhuang@huawei.com


   Bai Zhang
   Huawei Technologies Co., Ltd.

   Email: white.zhangbai@huawei.com


   Haotao Pan
   Huawei Technologies Co., Ltd.

   Email: panhaotao@huawei.com