Network Management Research Group                             M-S. Kim
Internet-Draft                                                     ETRI
Intended status: Informational                                 Y-H. Han
Expires: September 12, 2019                                   KoreaTech
                                                              Y-G. Hong
                                                                   ETRI
                                                         March 11, 2019

      Intelligent Reinforcement-learning-based Network Management
                          draft-kim-nmrg-rl-04

Abstract

   This document presents intelligent network management scenarios
   based on reinforcement-learning approaches.
   Today's heterogeneous networks are expected to provide real-time
   connectivity, network management that preserves the quality of
   real-time data, and transmission services for the application
   traffic generated by operating systems.  For this reason, an
   intelligent management system is needed to support real-time
   connectivity and protection, through efficient handling of
   interfering network traffic, for high-quality data transmission in
   both cloud and IoE network systems.  Reinforcement-learning is a
   machine learning technique that can intelligently and autonomously
   support management systems operating over a communication network.
   It has been developed and extended with deep learning, following
   model-driven or data-driven approaches, and these techniques have
   been widely applied to build adaptive networking models with
   effective strategies against environmental disturbances across a
   variety of networking areas.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 12, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Conventions and Terminology . . . . . . . . . . . . . . . . .  3
   3.  Theoretical Approaches  . . . . . . . . . . . . . . . . . . .  4
     3.1.  Reinforcement-learning  . . . . . . . . . . . . . . . . .  4
     3.2.  Deep-reinforcement-learning . . . . . . . . . . . . . . .  4
     3.3.  Advantage Actor Critic (A2C)  . . . . . . . . . . . . . .  4
     3.4.  Asynchronous Advantage Actor Critic (A3C) . . . . . . . .  5
   4.  Reinforcement-learning-based Process Scenarios  . . . . . . .  5
     4.1.  Single-agent with Single-model  . . . . . . . . . . . . .  6
     4.2.  Multi-agents Sharing Single-model . . . . . . . . . . . .  6
     4.3.  Adversarial Self-Play with Single-model . . . . . . . . .  6
     4.4.  Cooperative Multi-agents with Multiple-models . . . . . .  6
     4.5.  Competitive Multi-agents with Multiple-models . . . . . .  7
   5.  Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . .  7
     5.1.  Intelligent Edge-computing for Traffic Control using
           Deep-reinforcement-learning . . . . . . . . . . . . . . .  7
     5.2.  Edge Computing System at a Construction Site using
           Reinforcement-learning  . . . . . . . . . . . . . . . . .  7
     5.3.  Deep-reinforcement-learning-based Cyber Physical
           Management Control System over a Network  . . . . . . . .  8
   6.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  9
   7.  Security Considerations . . . . . . . . . . . . . . . . . . .  9
   8.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  9
     8.1.  Normative References  . . . . . . . . . . . . . . . . . .  9
     8.2.  Informative References  . . . . . . . . . . . . . . . . .  9
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . . 11

1.  Introduction

   Reinforcement-learning for intelligent and autonomous network
   management is, in general, a challenging method to apply in dynamic,
   complex, and cluttered network environments.  Such an intelligent
   approach requires the development of computational systems on a
   single node or across a large set of distributed networking nodes,
   where the environment offers only limited and incomplete knowledge.

   Reinforcement-learning can be an effective technique for
   transferring and sharing information through the global environment,
   as it does not require a priori knowledge of the agent behavior or
   of the environment to accomplish its tasks [Megherbi].  Such
   knowledge is instead acquired and learned repeatedly and
   autonomously by trial and error.  Reinforcement-learning is also one
   of the machine learning techniques expected to be adapted to various
   networking environments for automated networks
   [I-D.jiang-nmlrg-network-machine-learning].

   Deep-reinforcement-learning, recently proposed as an extension of
   reinforcement-learning, can act as a more powerful model-driven or
   data-driven model in a large state space and overcome the limits of
   the classical reinforcement-learning process.  Classical
   reinforcement-learning is, however, of limited use in networking
   areas, since networking environments consist of significantly large
   and complex components in fields such as routing configuration,
   optimization, and system management; deep-reinforcement-learning can
   handle much more state information in the learning process [MS].

   There are many network management problems to solve intelligently,
   such as connectivity, traffic management, and low-latency Internet
   access.  Reinforcement-learning-based approaches can provide
   specific solutions for several of these cases beyond human operating
   capacity, although this remains a challenging area for a multitude
   of reasons: a large state space, complexity in assigning rewards,
   difficulty in controlling actions, and difficulty in sharing and
   merging trained knowledge held in distributed memory nodes and
   transferred over a communication network [MS].

2.  Conventions and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Theoretical Approaches
3.1.  Reinforcement-learning

   Reinforcement-learning is an area of machine learning concerned with
   how software agents should take actions in an environment so as to
   maximize some notion of cumulative reward [Wikipedia].
   Reinforcement-learning is normally used with a reward issued by a
   centralized node (the global brain) and is capable of autonomously
   acquiring and incorporating knowledge.  The agent continuously
   improves itself and becomes more efficient as it learns from its own
   experience, optimizing management performance through an autonomous
   learning process [Sutton][Madera].

3.2.  Deep-reinforcement-learning

   Advanced reinforcement-learning techniques are increasingly combined
   with deep learning in neural networks, which has made it possible to
   extract high-level features from raw data, for example in computer
   vision [Krizhevsky].  Deep-learning models such as convolutional
   neural networks and recurrent neural networks raise many challenges
   when used with the reinforcement-learning approach.  The benefit of
   deep-learning applications is the large number of networking models
   they enable; the problematic issue is that complex and cluttered
   networking structures require large amounts of labelled training
   data.

   Recently, advances in training deep neural networks have been used
   to develop a novel artificial agent, termed a deep Q-network (a
   deep-reinforcement-learning network), that can learn successful
   policies directly from high-dimensional sensory inputs using end-to-
   end reinforcement-learning [Mnih].

   Deep-reinforcement-learning (the deep Q-network) enables more
   extensive and powerful scenarios for building networking models with
   optimized action control, very large system state spaces, and real-
   time reward functions.  Moreover, the technique has a significant
   advantage in handling highly sequential data in a large model state
   space [MS].  In particular, the data distribution in reinforcement-
   learning changes as the agent learns new behaviors, which is a
   problem for deep-learning approaches that assume a fixed underlying
   distribution [Mnih].

3.3.  Advantage Actor Critic (A2C)

   Advantage Actor Critic is an intelligent reinforcement-learning
   model based on the policy-gradient approach.  It can optimize a deep
   neural network controller with reinforcement-learning algorithms,
   and it has been shown that parallel actor-learners have a
   stabilizing effect on training and allow all of these methods to
   successfully train neural network controllers [Volodymyr].  Even
   though earlier deep-reinforcement-learning algorithms with
   experience replay memory perform very well in challenging control
   domains, they still require more memory and computational power
   because of their off-policy learning methods.  Actor-critic
   algorithms appeared to make up for these limitations.

   The Advantage Actor Critic method, consisting of an actor and a
   critic, implements generalized policy iteration, alternating between
   a policy evaluation step and a policy improvement step.  The actor
   is the policy-based part: it improves the current policy toward the
   best available next action.  The critic is the value-based part: it
   evaluates the current policy and reduces variance through
   bootstrapping.  The combination is more stable and effective than
   pure policy-gradient methods [MS].
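   The following is a minimal sketch of the one-step advantage actor-
   critic update described above, written in Python on a toy problem.
   The environment, the state and action sizes, and the hyper-
   parameters are hypothetical and chosen only for illustration; the
   temporal-difference (TD) error computed by the critic serves as the
   advantage estimate that scales the actor's policy-gradient step.

      # Minimal one-step advantage actor-critic (A2C) sketch on a toy MDP.
      # The MDP, sizes, and hyper-parameters are illustrative assumptions.
      import numpy as np

      N_STATES, N_ACTIONS = 4, 2
      rng = np.random.default_rng(0)

      theta = np.zeros((N_STATES, N_ACTIONS))  # actor (softmax) parameters
      value = np.zeros(N_STATES)               # critic: state-value estimates
      alpha_actor, alpha_critic, gamma = 0.1, 0.1, 0.95

      def policy(state):
          prefs = theta[state] - theta[state].max()
          p = np.exp(prefs)
          return p / p.sum()

      def env_step(state, action):
          # Hypothetical environment: action 1 in the last state earns
          # reward 1, and states are visited in a fixed cycle.
          reward = 1.0 if (state == N_STATES - 1 and action == 1) else 0.0
          return (state + 1) % N_STATES, reward

      state = 0
      for _ in range(5000):
          probs = policy(state)
          action = rng.choice(N_ACTIONS, p=probs)
          next_state, reward = env_step(state, action)

          # Critic: the TD error doubles as the advantage estimate A(s, a).
          td_error = reward + gamma * value[next_state] - value[state]
          value[state] += alpha_critic * td_error

          # Actor: policy-gradient step weighted by the advantage.
          grad_log = -probs
          grad_log[action] += 1.0
          theta[state] += alpha_actor * td_error * grad_log

          state = next_state

      print(np.round(policy(N_STATES - 1), 2))  # learns to prefer action 1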
3.4.  Asynchronous Advantage Actor Critic (A3C)

   Asynchronous Advantage Actor Critic is an updated algorithm based on
   Advantage Actor Critic.  The main concept is to run multiple
   environments in parallel and to run the agents asynchronously
   instead of relying on experience replay.  The parallel environments
   reduce the correlation of the agents' data and let each agent
   experience a variety of states, so that the learning process becomes
   closer to a stationary process.  The algorithm is beneficial from a
   practical point of view, since it achieves good learning performance
   even on a general multi-core CPU.  In addition, it can be applied to
   continuous as well as discrete action spaces and has the advantage
   of training both feedforward and recurrent agents [MS].

   The A3C algorithm also admits a number of complementary improvements
   to the neural network architecture.  It has been shown to produce
   more accurate estimates of Q-values by including separate streams
   for the state value and the advantage in the network, improving both
   value-based and policy-based methods by making it easier for the
   network to represent the relevant features [Volodymyr].

4.  Reinforcement-learning-based Process Scenarios

   With a single agent or multiple agents trained for intelligent
   network management, a variety of training scenarios are possible,
   depending on how the agents interact and how many models are linked
   to them.  The following are possible RL training scenarios for
   network management.

4.1.  Single-agent with Single-model

   This is the traditional scenario of training a single agent that
   tries to achieve one goal related to network management.  The agent
   receives all information and rewards from a network (or a simulated
   network) and decides the appropriate action for the current network
   status.

4.2.  Multi-agents Sharing Single-model

   In this scenario, multiple agents share a single model and a single
   goal linked to that model.  Each agent, however, is connected to an
   independent part of the network, or to an independent network of its
   own, so that the agents receive different information and rewards
   from their respective environments.  The agents therefore gather
   different experiences on their connected networks, but this does not
   mean that their training behavior for network management will
   diverge: every agent's experience is used to train the same single
   model.  This scenario is a parallelized version of the traditional
   'Single-agent with Single-model' scenario, which can speed up the RL
   training process and stabilize the model's behavior.

4.3.  Adversarial Self-Play with Single-model

   This scenario contains two interacting agents with inverse reward
   functions linked to a single model.  It gives an agent a perfectly
   matched opponent: itself, and trains the agent to become
   increasingly skilled at network management.  Inverse rewards are
   used so that the opposing agent is punished whenever an agent
   receives a positive reward, and vice versa.  The two agents are
   linked to a single model for network management, and the model is
   trained and stabilized while both agents interact in this
   conflicting manner.
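   As an illustration of this scenario, the following Python sketch
   shows two agents drawing actions from one shared model, with every
   experience fed back into that model twice: once with the reward r
   and once with the inverse reward -r.  The simulated network
   interaction, the state, and the action names are hypothetical
   placeholders used only for this example.

      # Structural sketch of adversarial self-play with one shared model.
      # Environment, states, actions, and rewards are assumptions.
      import random

      class SharedModel:
          """Single model (here a tiny Q-table) trained from both agents."""
          def __init__(self):
              self.q = {}

          def update(self, state, action, reward, lr=0.1):
              key = (state, action)
              old = self.q.get(key, 0.0)
              self.q[key] = old + lr * (reward - old)

          def act(self, state, actions, epsilon=0.2):
              if random.random() < epsilon:
                  return random.choice(actions)
              return max(actions, key=lambda a: self.q.get((state, a), 0.0))

      def simulated_network_step(action_a, action_b):
          # Hypothetical zero-sum interaction: what agent A gains in
          # priority, its self-play opponent B loses, so the rewards are
          # exact inverses.
          return 1.0 if action_a != action_b else -1.0

      model = SharedModel()
      ACTIONS = ["low_priority", "high_priority"]

      for episode in range(1000):
          state = "congested"              # placeholder network state
          a = model.act(state, ACTIONS)    # agent A uses the shared model
          b = model.act(state, ACTIONS)    # agent B, the self-play opponent
          r = simulated_network_step(a, b)
          model.update(state, a, r)        # A's experience with reward r
          model.update(state, b, -r)       # B's experience with reward -r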
4.4.  Cooperative Multi-agents with Multiple-models

   In this scenario, two or more interacting agents share a common
   reward function linked to multiple different models for network
   management.  A common goal is set up, and all agents are trained to
   achieve together a goal that would be hard to achieve alone.
   Usually, each agent has access only to partial information about the
   network status and determines an appropriate action by using its own
   model.  Each action is taken independently in order to accomplish a
   management task and collaboratively achieve the common goal.

4.5.  Competitive Multi-agents with Multiple-models

   This scenario contains two or more interacting agents with diverse
   reward functions linked to multiple different models.  The agents
   compete with one another to obtain a limited set of network
   resources, each trying to achieve its own goal.  In a network there
   will be tasks with different management objectives, which leads to
   multi-objective optimization problems that are generally difficult
   to solve analytically.  This scenario is suitable for solving such
   multi-objective optimization problems related to network management
   by letting each agent solve a single-objective problem while
   competing with the others.

5.  Use Cases

5.1.  Intelligent Edge-computing for Traffic Control using
      Deep-reinforcement-learning

   Edge computing is a concept that allows data from a variety of
   devices to be analyzed directly at, or near, the place where the
   data is produced, rather than being sent to a centralized data
   center such as the cloud.  As such, edge computing supports data
   flow acceleration by processing data with low latency in real time.
   In addition, by processing large amounts of data efficiently near
   the source, it also reduces Internet bandwidth usage.

   Deep-reinforcement-learning is a useful technique for improving
   system performance in an intelligent edge-controlled service system
   that requires fast response times, reliability, and security.  Deep-
   reinforcement-learning is a model-free approach, so algorithms such
   as DQN, A2C, and A3C can be adopted to resolve network problems in
   time-sensitive systems.

5.2.  Edge Computing System at a Construction Site using
      Reinforcement-learning

   On a construction site there are many dangerous conditions, such as
   noise, gas leaks, and vibration, that require alerts.  A real-time
   monitoring system that detects these alerts using machine learning
   techniques can therefore provide a more effective way to recognize
   dangerous conditions on the site.

   Typically, CCTV (closed-circuit television) cameras monitor these
   conditions by streaming locally and continuously from the
   construction site.  It is ineffective and wasteful, however, for the
   CCTV system to constantly stream unchanging scenes in high
   definition.  When an alert is raised by one of the dangerous
   conditions, the stream should instead be switched to high-quality
   streaming data so that the dangerous situation can be shown and
   detected rapidly.  Technically, deep-reinforcement-learning can
   provide a solution that automatically detects these kinds of
   dangerous situations and predicts them in advance.  It can also
   deliver the transformed data, including the high-rate streaming
   video, and quickly prevent further risks.  Deep-reinforcement-
   learning thus plays an important role in efficiently managing and
   monitoring the given data in real time.
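   The stream-quality decision described above can be cast as a small
   reinforcement-learning problem.  The Python sketch below uses
   tabular Q-learning with two states (alert or no alert) and two
   actions (low or high bitrate); the reward values and the alert
   probability are assumptions made only for this example, and a real
   deployment would obtain the alert state from the edge monitoring
   system.

      # Q-learning sketch for switching CCTV stream quality on alerts.
      # States, actions, rewards, and alert statistics are assumptions.
      import random

      ACTIONS = ["low_bitrate", "high_bitrate"]
      STATES = ["no_alert", "alert"]
      q_table = {(s, a): 0.0 for s in STATES for a in ACTIONS}
      alpha, gamma, epsilon = 0.1, 0.9, 0.1

      def reward(state, action):
          # Reward high-definition streaming only while an alert is
          # active; otherwise penalize the wasted uplink bandwidth.
          if state == "alert":
              return 1.0 if action == "high_bitrate" else -1.0
          return 0.2 if action == "low_bitrate" else -0.5

      def next_alert_state():
          # Hypothetical sensor model: alerts (gas, noise, vibration)
          # are rare events.
          return "alert" if random.random() < 0.05 else "no_alert"

      state = "no_alert"
      for _ in range(20000):
          if random.random() < epsilon:
              action = random.choice(ACTIONS)
          else:
              action = max(ACTIONS, key=lambda a: q_table[(state, a)])
          r = reward(state, action)
          nxt = next_alert_state()
          best_next = max(q_table[(nxt, a)] for a in ACTIONS)
          q_table[(state, action)] += alpha * (r + gamma * best_next
                                               - q_table[(state, action)])
          state = nxt

      # The learned policy keeps the stream at low bitrate until an
      # alert is detected and then switches to high bitrate.
      for s in STATES:
          print(s, max(ACTIONS, key=lambda a: q_table[(s, a)]))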
5.3.  Deep-reinforcement-learning-based Cyber Physical Management
      Control System over a Network

   A nonlinear control system such as a cyber physical system presents
   an unstable system environment in its initial control state because
   of its nonlinear nature.  To control this unstable initial state,
   classical mathematical control methods such as the Linear Quadratic
   Regulator (LQR) and Proportional Integral Derivative (PID) control
   are used for successful control and management, but these approaches
   require difficult mathematical modeling and considerable effort.
   Deep-reinforcement-learning can therefore provide a more effective
   technical approach that, compared with those methods, does not
   require a difficult initial set of control states.

   The ultimate purpose of reinforcement-learning is to interact with
   the environment and maximize the target reward value.  At each step
   the state is observed, an action is selected by the policy, and the
   reward assigns a value through the compensation given by the
   environment.  Deep-reinforcement-learning using a Convolutional
   Neural Network (CNN) can provide a better-performing learning
   process for stable control and management.

   Figure 1 shows how the physical environment and the cyber
   environment interact with the reinforcement-learning module over a
   network.  The actions that control the physical environment are
   produced by the reinforcement-learning model based on DQN and
   transferred as data to the physical environment using networking
   communication tools, as shown below.

   +-----Environment-----+         +---Control and Management---+
   .                     .         .                            .
   . +-----------------+ . Network .    +--------------+        .
   . . Physical System . .-------->.    . Cyber Module .        .
   . .                 . .<--------.    .              .        .
   . +-----------------+ .         .    +--------------+        .
   .                     .         .          .     +--------+  .
   +---------------------+         .          .-----.RL Agent.  .
                                   .                +--------+  .
                                   +............................+

     Figure 1: DRL-based Cyber Physical Management Control System
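   As an illustration of the control loop in Figure 1, the following
   Python sketch outlines a DQN-based cyber module written against
   PyTorch.  The physical system is replaced by a placeholder function
   standing in for the plant reached over the network, and the state
   dimension, action set, network size, and hyper-parameters are
   assumptions made for this example rather than values specified by
   this document.

      # DQN sketch for the cyber module of Figure 1 (PyTorch assumed).
      # The plant model and all hyper-parameters are illustrative.
      import random
      from collections import deque

      import torch
      import torch.nn as nn

      STATE_DIM, N_ACTIONS = 4, 3     # e.g., a pendulum-like plant
      GAMMA, EPSILON, BATCH = 0.99, 0.1, 32

      q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                            nn.Linear(64, N_ACTIONS))
      target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_ACTIONS))
      target_net.load_state_dict(q_net.state_dict())
      optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
      replay = deque(maxlen=10000)

      def physical_system(state, action):
          """Placeholder for the plant reached over the network."""
          next_state = (0.99 * state + 0.01 * (action - 1)
                        + 0.01 * torch.randn(STATE_DIM))
          reward = -float(next_state.abs().sum())  # stay near equilibrium
          return next_state, reward

      state = torch.zeros(STATE_DIM)
      for step in range(10000):
          # Cyber module: epsilon-greedy action from the Q-network.
          if random.random() < EPSILON:
              action = random.randrange(N_ACTIONS)
          else:
              with torch.no_grad():
                  action = int(q_net(state).argmax())

          next_state, reward = physical_system(state, action)
          replay.append((state, action, reward, next_state))
          state = next_state

          if len(replay) >= BATCH:
              batch = random.sample(replay, BATCH)
              s = torch.stack([b[0] for b in batch])
              a = torch.tensor([b[1] for b in batch])
              r = torch.tensor([b[2] for b in batch])
              s2 = torch.stack([b[3] for b in batch])

              q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
              with torch.no_grad():
                  target = r + GAMMA * target_net(s2).max(1).values
              loss = nn.functional.mse_loss(q, target)
              optimizer.zero_grad()
              loss.backward()
              optimizer.step()

          if step % 500 == 0:
              # Periodically synchronize the target network.
              target_net.load_state_dict(q_net.state_dict())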
6.  IANA Considerations

   There are no IANA considerations related to this document.

7.  Security Considerations

   [TBD]

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

8.2.  Informative References

   [I-D.jiang-nmlrg-network-machine-learning]
              Jiang, S., "Network Machine Learning", draft-jiang-nmlrg-
              network-machine-learning-02 (work in progress), October
              2016.

   [Wikipedia]
              "Reinforcement learning", Wikipedia,
              <https://en.wikipedia.org/wiki/Reinforcement_learning>.

   [Megherbi] Megherbi, D. B., Kim, M., and M. Madera, "A Study of
              Collaborative Distributed Multi-Goal and Multi-agent
              based Systems for Large Critical Key Infrastructures and
              Resources (CKIR) Dynamic Monitoring and Surveillance",
              IEEE International Conference on Technologies for
              Homeland Security, 2013.

   [Teiralbar]
              Megherbi, D. B., Teiralbar, A., and J. Boulenouar, "A
              Time-varying Environment Machine Learning Technique for
              Autonomous Agent Shortest Path Planning", Proceedings of
              SPIE International Conference on Signal and Image
              Processing, Orlando, Florida, 2001.

   [Nasim]    Arianpoo, N. and V. C. M. Leung, "How Network Monitoring
              and Reinforcement Learning Can Improve TCP Fairness in
              Wireless Multi-hop Networks", EURASIP Journal on Wireless
              Communications and Networking, 2016.

   [Minsuk]   Megherbi, D. B. and M. Kim, "A Hybrid P2P and Master-
              Slave Cooperative Distributed Multi-Agent Reinforcement
              Learning System with Asynchronously Triggered Exploratory
              Trials and Clutter-index-based Selected Sub goals", IEEE
              CIG Conference, 2016.

   [April]    Yu, A., Palefsky-Smith, R., and R. Bedi, "Deep
              Reinforcement Learning for Simulated Autonomous Vehicle
              Control", Stanford University, 2016.

   [Markus]   Kuderer, M., Gulati, S., and W. Burgard, "Learning
              Driving Styles for Autonomous Vehicles from
              Demonstration", Robotics and Automation (ICRA), 2015.

   [Ann]      Nowe, A., Vrancx, P., and Y. De Hauwere, "Game Theory and
              Multi-agent Reinforcement Learning", in Reinforcement
              Learning: State of the Art, Adaptation, Learning, and
              Optimization Volume 12, 2012.

   [Kok-Lim]  Yau, K-L. A., Goh, H. G., Chieng, D., and K. H. Kwong,
              "Application of Reinforcement Learning to Wireless Sensor
              Networks: Models and Algorithms", Computing, Volume 97,
              Issue 11, pp. 1045-1075, November 2015.

   [Sutton]   Sutton, R. S. and A. G. Barto, "Reinforcement Learning:
              An Introduction", MIT Press, 1998.

   [Madera]   Madera, M. and D. B. Megherbi, "An Interconnected
              Dynamical System Composed of Dynamics-based Reinforcement
              Learning Agents in a Distributed Environment: A Case
              Study", Proceedings of the IEEE International Conference
              on Computational Intelligence for Measurement Systems and
              Applications, Italy, 2012.

   [Al-Dayaa] Al-Dayaa, H. S. and D. B. Megherbi, "Towards A Multiple-
              Lookahead-Levels Reinforcement-Learning Technique and Its
              Implementation in Integrated Circuits", Journal of
              Artificial Intelligence, Journal of Supercomputing, Vol.
              62, Issue 1, pp. 588-61, 2012.

   [Chowdappa]
              Chowdappa, A., Skjellum, A., and N. Doss, "Thread-Safe
              Message Passing with P4 and MPI", Technical Report
              TR-CS-941025, Computer Science Department and NSF
              Engineering Research Center, Mississippi State
              University, 1994.

   [Mnih]     Mnih, V., et al., "Human-level Control Through Deep
              Reinforcement Learning", Nature 518.7540, 2015.

   [Stampa]   Stampa, G., Arias, M., et al., "A Deep-Reinforcement
              Learning Approach for Software-Defined Networking Routing
              Optimization", arXiv (cs.NI), 2017.

   [Krizhevsky]
              Krizhevsky, A., Sutskever, I., and G. Hinton, "ImageNet
              Classification with Deep Convolutional Neural Networks",
              in Advances in Neural Information Processing Systems, pp.
              1106-1114, 2012.

   [Volodymyr]
              Mnih, V., et al., "Asynchronous Methods for Deep
              Reinforcement Learning", ICML, arXiv:1602.01783, 2016.

   [MS]       Kim, M-S., Han, Y-H., and Y-G. Hong, "Intelligent Network
              Management using Reinforcement-learning", draft-kim-nmrg-
              rl-03 (work in progress), 2018.

   [Ju-Bong]  Kim, J-B., et al., "Deep Q-Network Based Rotary Inverted
              Pendulum System and Its Monitoring on the EdgeX
              Platform", International Conference on Artificial
              Intelligence in Information and Communication (ICAIIC),
              2019.

Authors' Addresses

   Min-Suk Kim
   ETRI
   161 Gajeong-Dong Yuseung-Gu
   Daejeon 305-700
   Korea

   Phone: +82 42 860 5930
   Email: mskim16@etri.re.kr

   Youn-Hee Han
   KoreaTech
   Byeongcheon-myeon Gajeon-ri, Dongnam-gu
   Cheonan-si, Chungcheongnam-do
   330-708
   Korea

   Phone: +82 41 560 1486
   Email: yhhan@koreatech.ac.kr

   Yong-Geun Hong
   ETRI
   161 Gajeong-Dong Yuseung-Gu
   Daejeon 305-700
   Korea

   Phone: +82 42 860 6557
   Email: yghong@etri.re.kr