<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc SYSTEM "http://xml.resource.org/authoring/rfc2629.dtd" [
	<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
	<!ENTITY RFC8174 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8174.xml">
	<!ENTITY RFC3168 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3168.xml">
	<!ENTITY RFC4342 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4342.xml">
	<!ENTITY RFC4341 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4341.xml">
	<!ENTITY RFC5632 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5632.xml">
	<!ENTITY RFC6040 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6040.xml">
	<!ENTITY RFC6679 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6679.xml">
	<!ENTITY RFC2309 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2309.xml">
	<!ENTITY RFC7567 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7567.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc iprnotified="no" ?>
<?rfc strict="no" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no"?>
<?rfc sortrefs="yes" ?>
<rfc category="info" docName="draft-zhuang-tsvwg-ai-ecn-for-dcn-00" ipr="trust200902">
	<front>
		<title abbrev="AI ECN adaptive reconfiguration">
                Artificial Intelligence (AI) based ECN adaptive reconfiguration for datacenter networks </title>
		<author initials="Y. Z." surname="Zhuang" fullname="Yan Zhuang">
			<organization>Huawei Technologies Co., Ltd.</organization>
			<address>
				<email>zhuangyan.zhuang@huawei.com</email>
			</address>
		</author>
		<author initials="B.Z." surname="Zhang" fullname="Bai Zhang">
			<organization>Huawei Technologies Co., Ltd.</organization>
			<address>
				<email>white.zhangbai@huawei.com</email>
			</address>
		</author>
		<author initials="H.P." surname="Pan" fullname="Haotao Pan">
			<organization>Huawei Technologies Co., Ltd.</organization>
			<address>
				<email>panhaotao@huawei.com</email>
			</address>
		</author>
		<date month="October" year="2019"/>
		<area>TSV</area>
		<workgroup>TSVWG</workgroup>
		<abstract>
			<t>This document describes an artificial intelligence (AI) based approach to the adaptive reconfiguration of Explicit Congestion Notification (ECN) parameters in data center networks.
		</t>
		</abstract>
	</front>
	<middle>
		<section title="Introduction" anchor="introduction">
		<section title="Background" anchor="background">
			<t>As defined in <xref target="RFC3168"/>, Explicit Congestion Notification (ECN) allows IP networks to signal congestion before 
			packets are dropped. Application latency is thereby reduced, because fewer dropped packets need to be retransmitted. In addition, 
			MPLS also supports ECN, as specified in RFC 5129. For tunneling, <xref target="RFC6040"/> defines how the ECN field should be constructed for IP-in-IP tunnels.</t>

			<t>Meanwhile, transport protocols have been specified to be ECN-capable, including TCP in <xref target="RFC3168"/>, DCCP in <xref target="RFC4341"/>, <xref target="RFC4342"/>, and <xref target="RFC5632"/>, and RTP over UDP in 
			<xref target="RFC6679"/>.</t>
			
			<t>With ECN marking, active queue management (AQM) can indicate congestion on a device without packet loss, rather than dropping
			packets, which triggers retransmission and increases latency. An AQM in a network device signals congestion-controlled
			transports so that they adjust their sending rates, which keeps the queue in the buffer short and reduces latency. Random Early Detection (RED), specified in <xref target="RFC2309"/>, is one of the 
			AQM algorithms recommended for implementation in routers.
			</t>

			<t>As stated in <xref target="RFC7567"/>, RED can be an effective algorithm when its parameters are set properly. However, dynamically predicting a suitable set of parameters (the minimum 
			threshold and the maximum threshold) is difficult. As a result, its present use in the Internet is limited. Other AQM algorithms have also been developed, 
			but finding proper parameters for them under changing application traffic remains difficult, and poorly chosen parameters degrade network performance. 
			</t>
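			<t>To make the role of these two thresholds concrete, the following is a minimal Python sketch of a simplified RED-style marking decision. It assumes the classic linear ramp between the thresholds. The names avg_queue, min_th, max_th, and max_p are illustrative and are not defined by this document, and queue averaging and other details of a real RED implementation are omitted.</t>
			<figure>
				<artwork><![CDATA[
```python
def red_mark_probability(avg_queue, min_th, max_th, max_p):
    """Probability of ECN-marking a packet (simplified RED).

    avg_queue, min_th and max_th share one unit (e.g. packets);
    max_p is the marking probability reached just below max_th.
    """
    if not 0 <= min_th < max_th:
        raise ValueError("need 0 <= min_th < max_th")
    if not 0.0 <= max_p <= 1.0:
        raise ValueError("max_p must be within [0, 1]")
    if avg_queue < min_th:
        return 0.0  # short queue: never mark
    if avg_queue >= max_th:
        # Long queue: mark every packet in this sketch
        # (a real device may drop instead).
        return 1.0
    # Linear ramp from 0 to max_p between the thresholds.
    return max_p * (avg_queue - min_th) / (max_th - min_th)
```
]]></artwork>
			</figure>
			<t>For the same average queue length, moving min_th or max_th changes the marking probability, which is why one static pair of thresholds may not suit all traffic.</t>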

			<t>In data center networks, traffic patterns change as applications such as storage and high-performance computing are deployed and as their
			traffic varies, which makes the network more dynamic. At the same time, such applications have stricter requirements for high throughput and ultra-low 
			latency. In this environment, it is challenging to find a single set of static ECN configurations that suits all traffic at all times. </t>

			<t>This document therefore describes a way to adaptively reconfigure ECN parameters in a running data center network by using AI technologies.</t>
		</section>
		<section title ="Intent">
			<t>Our intent is to find proper ECN parameters by using artificial intelligence technologies, so that a running 
			data center network can tune itself to changes in traffic and network resources and thereby improve network performance.</t>

			<t>We also offer this as a starting point for using AI technologies to find adaptive parameters for other algorithms and network configurations. 
			This document does not change the way ECN works as defined in <xref target="RFC3168"/>; it only describes how ECN parameters can be adaptively reconfigured
			in a dynamic data center network environment.</t>
			
		</section>
		<section title="Terminology">
			<t>	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
      NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
      "MAY", and "OPTIONAL" in this document are to be interpreted as
      described in BCP 14 <xref target="RFC2119"/>
				<xref target="RFC8174"/> when, and only when, they
      appear in all capitals, as shown here.				
				
			</t>
		</section>
		</section>
		
		<section title="Architecture of the AI ECN datacenter networks">
		<t>The following figure shows a simple two-layer data center network with an analyzer that performs AI-based ECN adaptive reconfiguration as network traffic changes.
		</t>
		<figure>
			<artwork>


  +------------------------------------------------------+
  |                     Analyzer                         |
  +-.-----.-------------.-------.--------------.-----.---+
    .     .             .       .              .     .
    .     .             .       .              .     .
    . +---.-----------+ .       .  +-----------.---+ .
    . |     Spine     | .       .  |     Spine     | .
    . ++--+--+----+---+ .       .  +-+-+-+----+----+ .
    .  |  |  +----------.-------.---------------+    .
    .  |  +-------------.-------.-+  | | |    | |    .
    .  |          |  +--.-------.--------+    | |    .
    .  |  +-------------.-------.------+      | |    .
   +---+--+-+    ++--+--.-+    +.-+--+--+    ++-+----.+
   |        |    |        |    |        |    |        |
   |  Leaf  |    |  Leaf  |    |  Leaf  |    |  Leaf  |
   ++------++    ++------++    ++------++    ++------++
    |      |      |      |      |      |      |      |
    |      |      |      |      |      |      |      |
   +++    +++    +++    +++    +++    +++    +++    +++
   |S| ...|S|    |S| ...|S|    |S| ...|S|    |S| ...|S|
   +-+    +-+    +-+    +-+    +-+    +-+    +-+    +-+

   ........  information collecting path

   --------  data path

   Figure 1. The architecture of a 2-layer data center network
			</artwork>
			</figure>
		<t>The analyzer can be integrated with a spine device or can be an independent device; this choice is left to the implementation. 
		In this design, the analyzer is responsible for collecting device information and for periodically inferring proper parameters 
		for ECN adaptive reconfiguration.</t>
		</section>

		<section title="Scene-based ECN adaptive reconfiguration with AI">
			<t>The idea of AI ECN in this document is to identify the "scene" of the current network at a given
			time, based on information collected over a period. The identified scene (which can also be considered
			a network traffic pattern) is one of a set of scenes that are collected and learned during a training process from data center networks
			running the traffic of various applications. The ECN settings of these scenes
			are decided based on human experience. The ECN parameters of the current network can thus be tuned to the 
			settings of the identified scene. This adaptive reconfiguration process runs periodically to accommodate 
			changes in the running network environment caused by traffic changes.</t>

			<section title ="Scene Training">
			<t>Scene training is the first process in the procedure. It consists of two steps. First, construct typical
			scenes and generate a learning model that identifies these scenes based on a set of network performance indicators. 
			Second, provide proper ECN settings for these typical scenes based on human experience.</t>
			
			<t>In the first step, the network operator may need to select, based on experience, some typical applications and combinations 
			of traffic to be used as the typical training scenes. For these typical scenes, a learning
			algorithm (for example, a neural network) is run to learn the characteristics of the scenes from periodically collected network
			performance indicators. </t>
			
			<t>The selected network performance indicators can be a device's port bandwidth, queue size, etc., which might be related
			to the applications and traffic in the network.</t>
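			<t>The first training step can be sketched in Python as follows. This is only an illustration under stated assumptions: a nearest-centroid classifier stands in for the learning algorithm (this document mentions a neural network as one example), and the scene labels, the two indicators, and all sample values are hypothetical.</t>
			<figure>
				<artwork><![CDATA[
```python
from math import dist

def train_centroids(samples):
    """Average the indicator vectors of each labelled scene.

    samples: iterable of (scene_label, indicator_vector) pairs.
    Returns {scene_label: centroid_vector}.
    """
    grouped = {}
    for label, vector in samples:
        grouped.setdefault(label, []).append(vector)
    return {
        label: [sum(col) / len(vectors) for col in zip(*vectors)]
        for label, vectors in grouped.items()
    }

def identify_scene(centroids, vector):
    """Return the label of the centroid closest to vector."""
    return min(centroids, key=lambda s: dist(centroids[s], vector))

# Hypothetical samples: [port utilisation, normalised queue size]
training = [
    ("storage", [0.9, 0.8]), ("storage", [0.7, 0.6]),
    ("hpc", [0.2, 0.1]), ("hpc", [0.4, 0.3]),
]
model = train_centroids(training)
```
]]></artwork>
			</figure>
			<t>Calling identify_scene(model, vector) with a newly collected indicator vector returns the label of the closest learned scene. A real deployment would likely use more indicators and a more capable model.</t>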
			
			<t>In the second step, the experience of network administrators can be used to provide proper ECN configurations
			for these typical scenes. AI technologies can also be used to enrich the scene set based on this human experience, which 
			is left to the implementation.</t>
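			<t>The result of the second step can be pictured as a table that maps each scene to its ECN settings. The Python sketch below is a hypothetical representation: the field names, the scene labels, and the numeric values are placeholders and are not recommendations of this document.</t>
			<figure>
				<artwork><![CDATA[
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EcnSettings:
    """ECN marking parameters for one scene (hypothetical)."""
    min_threshold_kb: int
    max_threshold_kb: int
    max_mark_probability: float

    def __post_init__(self):
        if not 0 <= self.min_threshold_kb < self.max_threshold_kb:
            raise ValueError("need 0 <= min < max threshold")
        if not 0.0 < self.max_mark_probability <= 1.0:
            raise ValueError("probability must be in (0, 1]")

# Placeholder values only; real settings come from operators.
SCENE_SETTINGS = {
    "storage": EcnSettings(200, 2000, 0.2),
    "hpc": EcnSettings(20, 200, 0.8),
}

def settings_for(scene):
    """Look up the operator-provided settings for a scene."""
    try:
        return SCENE_SETTINGS[scene]
    except KeyError:
        raise ValueError(f"no ECN settings for scene {scene!r}")
```
]]></artwork>
			</figure>
			<t>Validating the settings when the table is built keeps an inconsistent entry, such as a minimum threshold above the maximum, from reaching a device later.</t>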
			</section>
			
			<section title="Scene Identification and ECN Adaptive Reconfiguration">
			<t>In the production network, the analyzer periodically collects the selected network performance indicators from network nodes. 
			This information is then fed to the pre-trained model, which outputs the identified scene. The ECN settings of the network devices 
			are then reconfigured to the parameters of the identified scene.</t>
			
			<t>The period of the adaptive cycle can be decided according to experience, or it can be a result of the training process
			defined in Section 3.1.</t>
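			<t>The periodic cycle described in this section can be sketched in Python as follows. The function and parameter names are hypothetical, and the collection, identification, and configuration steps are passed in as callables because this document does not prescribe them. Reconfiguring only when the identified scene changes is an assumption of this sketch, not a requirement of this document.</t>
			<figure>
				<artwork><![CDATA[
```python
import time

def run_cycle(collect, identify, apply_settings, scene_settings,
              current_scene=None):
    """Run one collect, identify, reconfigure cycle.

    Returns the identified scene. As an assumption of this sketch,
    devices are only reconfigured when the scene has changed.
    """
    indicators = collect()
    scene = identify(indicators)
    if scene != current_scene:
        apply_settings(scene_settings[scene])
    return scene

def run_periodically(cycles, interval_s, **hooks):
    """Repeat run_cycle, sleeping interval_s seconds in between."""
    scene = None
    for _ in range(cycles):
        scene = run_cycle(current_scene=scene, **hooks)
        time.sleep(interval_s)
    return scene
```
]]></artwork>
			</figure>
			<t>Here interval_s plays the role of the adaptive cycle period, whether it is chosen by experience or obtained from training.</t>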
			
			</section>
		</section>
		<section title="Data collection and AI ECN adaptive reconfiguration">
			<section title="Data collection">
				<t>In both the training process and the adaptive reconfiguration process, the analyzer needs to collect information about the network, i.e.,
				a set of network performance indicators.</t>
				
				<t>Data collection can be achieved by gRPC, YANG-Push, or other protocols.</t>
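				<t>Whatever protocol delivers the data, the analyzer has to turn the reports received during one period into an input for the model. The Python sketch below shows one possible way, averaging each indicator over the period. The indicator names are hypothetical, and how the reports arrive (gRPC, YANG-Push, or otherwise) is not shown.</t>
				<figure>
					<artwork><![CDATA[
```python
from statistics import fmean

def summarize_window(samples, indicators):
    """Average each indicator over one collection period.

    samples: list of {indicator_name: value} dicts, one per report.
    indicators: names, in the order the model expects them.
    """
    if not samples:
        raise ValueError("no samples collected in this period")
    return [fmean(s[name] for s in samples) for name in indicators]
```
]]></artwork>
				</figure>
				<t>Other summaries, such as maxima or percentiles, may describe bursty traffic better than a mean; the choice is left to the implementation.</t>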
			</section>
			
			<section title="ECN adaptive Reconfiguration">
				<t>The adaptive reconfiguration of ECN in a running network environment can be achieved by network management protocols such as NETCONF. </t>
			</section>
		</section>
		<section title="Security Considerations" anchor="Security">
			<t>
			TBD
			</t>
		</section>
		<section title="Manageability Considerations" anchor="Manageability">
			<t>TBD</t>
		</section>
		<section title="IANA Considerations" anchor="IANA">
			<t>This document has no IANA actions.</t>
		</section>
	</middle>
	<back>
	
		<references title="Normative References">
   &RFC2119;
   &RFC8174;
  
		</references>
		<references title="Informative References">
			
			&RFC3168;
			&RFC4341;
			&RFC4342;
			&RFC5632;
			&RFC6040;
			&RFC6679;
			&RFC2309;
			&RFC7567;
		</references>
		<section title="Acknowledgements" numbered="no">
		<t>
		We would like to thank the following people for their great efforts and contributions to this work: Huafeng Wen, 
		Binghui Wu, Weiqin Kong, Ke Meng, Xitong Jia, Liang Shan, Siyu Yan, Weishan Deng, Boding Wang, Jungan Yan, Haonan Ye and Liang Zhang.
		</t>
		</section>
		<!-- generic-out-of-band-aspects -->
	</back>
</rfc>
