Internet Engineering Task Force                                I. Ullah
Internet-Draft                                                 Y-H. Han
Intended status: Informational                                KOREATECH
Expires: 23 October 2022                                        TY. Kim
                                                                   ETRI
                                                          21 April 2022
     Reinforcement Learning-Based Virtual Network Embedding: Problem
                               Statement
                    draft-ihsan-nmrg-rl-vne-ps-02
Abstract
In Network virtualization (NV) technology, Virtual Network Embedding
(VNE) is an algorithm used to map a virtual network onto the substrate
network.  VNE is a core function of NV and has a great impact on the
performance of the virtual network and on the resource utilization of
the substrate network.  An efficient embedding algorithm can maximize
the acceptance ratio of virtual networks and thereby increase the
revenue for Internet service providers.  Several works have appeared
on the
skipping to change at page 2, line 10
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF).  Note that other groups may also distribute
working documents as Internet-Drafts.  The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on 23 October 2022.
Copyright Notice
Copyright (c) 2022 IETF Trust and the persons identified as the
document authors.  All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights
and restrictions with respect to this document.  Code Components
extracted from this document must include Revised BSD License text as
described in Section 4.e of the Trust Legal Provisions and are
provided without warranty as described in the Revised BSD License.
Table of Contents
1.  Introduction and Scope  . . . . . . . . . . . . . . . . . . .   2
2.  Reinforcement Learning-based VNE Solutions . . . . . . . . .   5
3.  Terminology . . . . . . . . . . . . . . . . . . . . . . . . .   8
4.  Problem Space . . . . . . . . . . . . . . . . . . . . . . . .   9
  4.1.  State Representation  . . . . . . . . . . . . . . . . . .   9
  4.2.  Action Space  . . . . . . . . . . . . . . . . . . . . . .   9
  4.3.  Reward Description  . . . . . . . . . . . . . . . . . . .  10
  4.4.  Policy and RL Algorithms  . . . . . . . . . . . . . . . .  11
  4.5.  Training Environment  . . . . . . . . . . . . . . . . . .  12
  4.6.  Sim2Real Gap  . . . . . . . . . . . . . . . . . . . . . .  13
  4.7.  Generalization  . . . . . . . . . . . . . . . . . . . . .  14
5.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  14
6.  Security Considerations . . . . . . . . . . . . . . . . . . .  14
7.  Informative References  . . . . . . . . . . . . . . . . . . .  14
Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  18
1.  Introduction and Scope
Recently, Network virtualization (NV) technology has received a lot
of attention from academia and industry.  It allows multiple
heterogeneous virtual networks to share resources on the same
substrate network (SN) [RFC7364], [ASNVT2020].  The current large-
size fixed substrate network architecture is no longer efficient or
extendable due to network ossification.  To overcome these
limitations, traditional Internet Service Providers (ISPs) are
skipping to change at page 3, line 26
network and increase the commercial revenue of both SPs and InPs.  NV
can increase network agility, flexibility, and scalability while
creating significant cost savings.  Greater network workload
mobility, increased availability of network resources with good
performance, and automated operations are all benefits of NV.

Virtual Network Embedding (VNE) [VNESURV2013] is one of the main
techniques used to map a virtual network to the substrate network.  A
VNE algorithm has two main parts: node embedding, where the virtual
nodes of a VN have to be mapped to SN nodes, and link embedding,
where the virtual links between the virtual nodes have to be mapped
to physical paths in the substrate network.  VNE has been proven to
be NP-hard, and both node and link embedding have become challenging
for researchers.  Virtual nodes and links should be embedded into a
given SN efficiently, so that more VNRs can be accepted at minimum
cost.  The distance between the virtual nodes in a given SN
contributes greatly to link failures and causes the rejection of
VNRs.  Hence, an efficient and intelligent technique is required for
the VNE problem to reduce VNR rejection [ENViNE2021].  From the
perspective of the InPs, an efficient VNE performs better mostly in
terms of revenue, acceptance ratio, and revenue-to-cost ratio.
skipping to change at page 5, line 5
Recently, artificial intelligence and machine learning technologies
have been widely used to solve networking problems [SUR2018],
[MLCNM2018], [MVNNML2021].  There has been a surge in research
efforts, especially in reinforcement learning (RL), which has
contributed to many complex tasks, e.g., video games and autonomous
driving.  The main goal of RL is to learn better policies for
sequential decision-making problems (e.g., VNE) and to solve them
very efficiently.
Problems such as node classification, pattern matching, and network
feature extraction can be simplified by graph-related theories and
techniques.  A graph neural network (GNN) is a new type of ML model
architecture that can aggregate graph features (degrees, distance to
specific nodes, node connectivity, etc.) on nodes [DVNEGCN2021].  A
graph convolutional neural network (GCNN) is a natural generalization
of the GNN that is used to automatically extract the features of the
underlying network, which optimizes the selection of VNE decisions.
The model can be used to cluster nodes and links according to the
attribute characteristics of the physical nodes and physical links
(CPU, storage, bandwidth, delay, etc.), and it is highly suitable for
graph structures of any topological form.  Hence, a GNN is useful for
finding the best VNE strategy through intelligent agent training, and
the combination of VNE and GCN is a promising approach.
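To make the aggregation step concrete, the following sketch shows a
single graph-convolution layer over substrate-node features.  It is
illustrative only: the two toy features (CPU, degree), the weight
matrix, and the symmetric normalization D^-1/2 (A + I) D^-1/2 are
common conventions assumed here, not requirements of this document.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution layer: aggregate neighbor features.

    A: (N, N) adjacency matrix of the substrate network
    X: (N, F) node feature matrix (e.g., CPU, degree)
    W: (F, F') weight matrix (learnable in a real model)
    Uses the common normalization D^-1/2 (A + I) D^-1/2.
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)                     # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # symmetric normalization
    return np.maximum(A_norm @ X @ W, 0.0)    # ReLU activation

# Toy substrate network: 3 nodes in a path, 2 features per node.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.array([[4.0, 1.0], [2.0, 2.0], [8.0, 1.0]])  # e.g., CPU, degree
W = np.ones((2, 2))
H = gcn_layer(A, X, W)
print(H.shape)  # (3, 2): aggregated node embeddings
```

In an RL-based VNE agent, the resulting node embeddings H would feed
the policy model that ranks substrate nodes for each virtual node.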
Designing and applying RL techniques directly to VNE problems is not
yet trivial and may face several challenges.  Several works have
appeared on the design of VNE solutions using RL, focusing on how to
interact with the environment to achieve the maximum cumulative
return [VNEQS2021], [NRRL2020], [MVNE2020], [CDVNE2020], [PPRL2020],
[RLVNEWSN2020], [QLDC2019], [VNFFG2020], [VNEGCN2020], [NFVDeep2019],
[DeepViNE2019], [VNETD2019], [RDAM2018], [MOQL2018], [ZTORCH2018],
[NeuroViNE2018], [QVNE2020].  This document outlines the problems
encountered when designing and applying RL-based VNE solutions.
Section 2 describes how to design RL-based VNE solutions, Section 3
gives terminology, and Section 4 describes the problem space in
detail.
2.  Reinforcement Learning-based VNE Solutions
RL has been studied in various fields (such as games, control
systems, operations research, information theory, multi-agent
systems, network systems, etc.) and often shows better performance
than humans.  Unlike supervised deep learning, RL trains a policy
model by receiving rewards through interaction with the environment,
without labeled training data.
Recently, there have been several attempts to solve VNE problems
using RL.  When applying RL-based algorithms to solve VNE problems,
the RL agent learns automatically through the environment, without
human intervention.  Once the agent has completed the learning
process, it can generate the most appropriate embedding decision
(action) based on its knowledge and the network state.  For each
embedding action at each time step, the agent gets a reward from the
environment and uses it to adaptively train its policy for future
actions.  The RL agent obtains the most optimized model based on the
reward function defined according to each objective (revenue, cost,
revenue-to-cost ratio, and acceptance ratio).  The optimal RL policy
model provides the VNE strategy appropriate to the objective of the
network operator.
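A minimal sketch of how such objective-driven reward functions might
look, assuming a simple dictionary layout for VNRs and embeddings;
the multi-hop bandwidth cost rule and the fixed -1 rejection penalty
are illustrative assumptions, not definitions taken from this
document.

```python
def revenue(vnr):
    """Revenue: sum of the virtual resources requested by the VN."""
    cpu = sum(node["cpu"] for node in vnr["nodes"])
    bw = sum(link["bw"] for link in vnr["links"])
    return cpu + bw

def cost(vnr, embedding):
    """Cost: substrate resources consumed; a virtual link mapped onto
    a multi-hop substrate path consumes bandwidth on every hop."""
    cpu = sum(node["cpu"] for node in vnr["nodes"])
    bw = sum(link["bw"] * len(embedding["paths"][i])
             for i, link in enumerate(vnr["links"]))
    return cpu + bw

def reward(vnr, embedding, accepted):
    """Shaping: revenue-to-cost ratio for an accepted VNR,
    a fixed penalty for a rejected one."""
    if not accepted:
        return -1.0
    return revenue(vnr) / cost(vnr, embedding)

vnr = {"nodes": [{"cpu": 10}, {"cpu": 20}],
       "links": [{"bw": 5}]}
embedding = {"paths": [[("a", "b"), ("b", "c")]]}  # one 2-hop path
print(reward(vnr, embedding, accepted=True))       # 0.875 = 35 / 40
```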
Figure 2 shows the virtual network embedding solution based on an RL
algorithm.  The RL strategy is divided into two main parts: a
training process and an inference process.  In the training process,
state information is composed of various substrate networks and VNRs
(Environment), which are used as suitable inputs for RL models
through feature extraction.  After that, the RL model is updated by
the model updater using the feature-extracted state and reward.  In
the inference process, using the trained RL model, the embedding
result is provided to the operating network in real time.
The following figure shows the details of the RL-based virtual
network embedding solution.
RL Model Training Process
+--------------------------------------------------------------------+
| Training Environment                                                |
|  +-------------------+          RL-based VNE Agent                  |
|  |    +---------+    |   +----------------------------------+       |
|  |  | +---------+    |   |              Action              |       |
skipping to change at page 8, line 24
Virtual Network Embedding (VNE) [VNESURV2013] is one of the main
techniques used to map a virtual network to the substrate network.
Substrate Network (SN)
The underlying physical network which contains the resources, such
as CPU and bandwidth, for virtual networks is called the substrate
network.
Virtual Network Request (VNR)
A Virtual Network Request is a complete single virtual network
containing virtual nodes and virtual links.
Agent
In RL, an agent is the component that makes the decision and takes
the action (i.e., the embedding decision).
State
State is a representation (e.g., remaining SN capacity and
requested VN resources) of the current environment, and it tells
the agent what situation it is currently in.
skipping to change at page 9, line 33
RL agent to establish a thorough knowledge of the network status and
generate efficient embedding decisions.  Therefore, it is essential
to first design the state representation that serves as the input to
the agent.  The state representation is the information which an
agent can receive from the environment, and it consists of a set of
values representing the current situation in the environment.  Based
on the state representation, the RL agent selects the most
appropriate action through its policy model.  In the VNE problem, an
RL agent needs to know the information of the overall SN entities and
their current status in order to use the resources of the nodes and
links of the substrate network.  It must also know the requirements
of the VNR.  Therefore, in the VNE problem, the state usually should
represent the current resource state of the nodes and links of the
substrate network (i.e., CPU, memory, storage, bandwidth, delay, loss
rate, etc.) and the requirements of the virtual nodes and links of
the VNR.  The collected status information is used as raw input, or
refined status information obtained through a feature extraction
process is used as input for the RL agent.  The state representation
may vary depending on the operator's objective and VNE strategy.  The
method of feature extraction and representation greatly affects the
performance of the agent.
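A minimal sketch of building such a state representation from raw
status information, assuming only CPU and bandwidth features and a
simple max normalization; actual feature sets and refinement steps
will differ per operator objective and VNE strategy.

```python
def build_state(sn_nodes, sn_links, vnr_node, vnr_link):
    """Flatten raw status information into a state vector.

    sn_nodes: dicts with remaining CPU per substrate node
    sn_links: dicts with remaining bandwidth per substrate link
    vnr_node / vnr_link: requirements of the virtual node/link to embed
    """
    state = []
    for n in sn_nodes:
        state.append(n["cpu"])        # remaining node resources
    for l in sn_links:
        state.append(l["bw"])         # remaining link resources
    state.append(vnr_node["cpu"])     # VNR node requirement
    state.append(vnr_link["bw"])      # VNR link requirement
    # simple normalization so features share a common scale
    m = max(state) or 1.0
    return [v / m for v in state]

sn_nodes = [{"cpu": 50}, {"cpu": 30}]
sn_links = [{"bw": 100}]
s = build_state(sn_nodes, sn_links, {"cpu": 10}, {"bw": 20})
print(s)  # [0.5, 0.3, 1.0, 0.1, 0.2]
```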
4.2.  Action Space
skipping to change at page 11, line 14
Revenue
Revenue is the sum of the virtual resources requested by the VN,
and it is calculated to determine the total cost of the resources.
Typically, a successful action (e.g., a VNR is embedded without
violation) yields a good reward and also increases the revenue.
Otherwise, a failed action (e.g., a VNR is rejected) causes the
agent to receive a negative reward and decreases the revenue.
Cost
Cost is the expenditure incurred when a VNR is embedded into the
substrate network.  Pursuing only high revenue does not yield a
good embedding result; it is also important for the network
operator and SP to spend less.  The lower the cost, the better the
agent will be rewarded.
Acceptance Ratio
The acceptance ratio is the number of successfully embedded
virtual network requests divided by the total number of virtual
network requests.  To achieve a high acceptance ratio, the agent
tries to embed as many VNRs as possible and get a good reward.
Getting a good reward is usually proportional to the acceptance
ratio.
Revenue-to-cost ratio
To balance and compare the cost of resources for embedding VNR,
skipping to change at page 13, line 4
through a simulation that simulates the real environment.  In order
to solve the VNE problem, we need to use a network simulator similar
to the real environment, because it is difficult to repeatedly
experiment with real network environments using an RL algorithm, and
it is very challenging to apply an RL algorithm directly to
real-world environments.  When solving VNE problems, a network
simulation environment similar to a real network is required.  The
network simulation environment should have a general SN environment
and the VNRs required by the operator.  The SN has nodes and links
between nodes, and each has capacities such as CPU and bandwidth.  In
the case of a VNR, there are virtual nodes and links required by the
operator, and each must have its own requirements.
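A simulation environment of this kind might generate its SN and VNRs
as in the following sketch; the random-graph model, value ranges, and
dictionary layout are illustrative assumptions, not requirements of
this document.

```python
import random

def make_substrate(n_nodes=10, link_prob=0.5,
                   cpu_range=(50, 100), bw_range=(50, 100)):
    """Generate a random SN: nodes with CPU capacity, links with
    bandwidth capacity."""
    nodes = {i: {"cpu": random.uniform(*cpu_range)}
             for i in range(n_nodes)}
    links = {}
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            if random.random() < link_prob:
                links[(i, j)] = {"bw": random.uniform(*bw_range)}
    return nodes, links

def make_vnr(n_nodes=3, cpu_range=(1, 20), bw_range=(1, 50)):
    """Generate a VNR: virtual nodes and links with their own
    resource requirements (here a fully meshed small VN)."""
    nodes = [{"cpu": random.uniform(*cpu_range)} for _ in range(n_nodes)]
    links = [{"ends": (i, j), "bw": random.uniform(*bw_range)}
             for i in range(n_nodes) for j in range(i + 1, n_nodes)]
    return {"nodes": nodes, "links": links}

random.seed(7)
sn_nodes, sn_links = make_substrate()
vnr = make_vnr()
print(len(sn_nodes), "substrate nodes,", len(vnr["nodes"]), "virtual nodes")
```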
As described in [DTwin2022], a digital twin network is a virtual
representation of the physical network environment and can be built
by applying digital twin technologies to the environment and creating
virtual images of diverse physical network facilities. The digital
twin for networks is an expansion platform of network simulation. In
[DTwin2022], Section 8.2 describes that a digital twin network
provides the complete machine learning lifecycle development by
providing a realistic network environment, including network
topologies, etc. Hence, RL algorithms to solve the VNE problem can
be trained and verified on a digital twin network before being
deployed to the physical networks, and the verification accuracy will
generally be high when the digital twin network reproduces network
behaviors well under various conditions.
4.6.  Sim2Real Gap
Sim-to-real is a broad concept applied in many fields, including
robotics and classic machine vision tasks.  An RL algorithm
iteratively learns through a simulation environment to train a model
of the desired policy.  The trained model is then applied to the real
environment and/or tuned further to adapt to the real one.  However,
when the model trained in simulation is applied to the real
environment, the sim2real gap problem arises.  Closing the gap
between simulation and reality requires simulators to be more
accurate and to account for variability in agent dynamics.
Obviously, the simulation environment does not match the real
environment perfectly, which often causes the tuning process to fail
and the model to perform poorly because of the sim2real gap.  The
sim2real gap is caused by the difference between the simulation and
the real environment: the simulation environment cannot perfectly
simulate the real environment, and there are many variables in the
real environment.  In a real network environment for VNE, the SN's
nodes and links may fail due to external factors, or capacities such
as CPU may change suddenly.  In order to solve this problem, the
simulation environment should be made more robust, or the trained RL
model should be generalized.  To reduce the gap between simulated and
real network environments, we need to train the model with a large
and diverse set of VNRs and keep the agent learning so that it does
not depend only on memorization of previous cases.
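One common way to make the simulation environment more robust,
borrowed from the broader sim-to-real literature rather than
prescribed by this document, is to randomize the simulated conditions
every training episode (often called domain randomization); the
jitter and failure parameters below are illustrative assumptions.

```python
import random

def randomized_episode_env(base_cpu=100.0, cpu_jitter=0.2,
                           n_links=8, fail_prob=0.05):
    """Resample substrate conditions for each training episode so the
    policy does not overfit one fixed simulated network: node capacity
    jitters around its nominal value and links may randomly fail."""
    cpu = base_cpu * random.uniform(1 - cpu_jitter, 1 + cpu_jitter)
    links_up = [random.random() > fail_prob for _ in range(n_links)]
    return {"cpu": cpu, "links_up": links_up}

random.seed(0)
episodes = [randomized_episode_env() for _ in range(100)]
cpus = [e["cpu"] for e in episodes]
print(min(cpus) >= 80.0 and max(cpus) <= 120.0)  # True
```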
4.7.  Generalization
Generalization refers to the trained model's ability to adapt
properly to previously unseen new observations.  An RL algorithm
tries to learn a model that optimizes some objective with the purpose
of performing well on data that has never been seen by the model
during training.  In terms of VNE problems, generalization is a
measure of how well the agent's policy model performs on predicting
unseen VNRs.  The RL agent not only has to memorize all the previous variance
skipping to change at page 15, line 12
DOI 10.1109/TNSM.2020.2971543, February 2020,
<https://ieeexplore.ieee.org/document/8982091>.
[DeepViNE2019]
Dolati, M., Hassanpour, S. B., Ghaderi, M., and A.
Khonsari, "DeepViNE: Virtual Network Embedding with Deep
Reinforcement Learning", DOI 10.1109/INFCOMW.2019.8845171,
September 2019,
<https://ieeexplore.ieee.org/document/8845171>.
[DTwin2022]
Yang, H., Zhou, C., Duan, X., Lopez, D., Pastor, A., and
Q. Wu, "Digital Twin Network: Concepts and Reference
Architecture", Work in Progress, Internet-Draft,
draft-irtf-nmrg-network-digital-twin-arch, March 2022,
<https://datatracker.ietf.org/doc/draft-irtf-nmrg-network-
digital-twin-arch/>.
[DVNEGCN2021]
Zhang, P., Wang, C., Kumar, N., Zhang, W., and L. Liu,
"Dynamic Virtual Network Embedding Algorithm based on
Graph Convolution Neural Network and Reinforcement
Learning", DOI 10.1109/JIOT.2021.3095094, July 2021,
<https://ieeexplore.ieee.org/document/9475485>.
[ENViNE2021] [ENViNE2021]
ULLAH, IHSAN., Lim, Hyun-Kyo., and Youn-Hee. Han, "Ego ULLAH, IHSAN., Lim, Hyun-Kyo., and Youn-Hee. Han, "Ego
Network-Based Virtual Network Embedding Scheme for Revenue Network-Based Virtual Network Embedding Scheme for Revenue