| < draft-ihsan-nmrg-rl-vne-ps-01.txt | draft-ihsan-nmrg-rl-vne-ps-02.txt > | |||
|---|---|---|---|---|
| Internet Engineering Task Force I. Ullah | Internet Engineering Task Force I. Ullah | |||
| Internet-Draft Y-H. Han | Internet-Draft Y-H. Han | |||
| Intended status: Informational KOREATECH | Intended status: Informational KOREATECH | |||
| Expires: 22 April 2022 TY. Kim | Expires: 23 October 2022 TY. Kim | |||
| ETRI | ETRI | |||
| 19 October 2021 | 21 April 2022 | |||
| Reinforcement Learning-Based Virtual Network Embedding: Problem | Reinforcement Learning-Based Virtual Network Embedding: Problem | |||
| Statement | Statement | |||
| draft-ihsan-nmrg-rl-vne-ps-01 | draft-ihsan-nmrg-rl-vne-ps-02 | |||
| Abstract | Abstract | |||
| In Network virtualization (NV) technology, Virtual Network Embedding | In Network virtualization (NV) technology, Virtual Network Embedding | |||
| (VNE) is an algorithm used to map a virtual network to the substrate | (VNE) is an algorithm used to map a virtual network to the substrate | |||
| network. VNE is at the core of NV and has a great impact | network. VNE is at the core of NV and has a great impact | |||
| on the performance of virtual network and resource utilization of the | on the performance of virtual network and resource utilization of the | |||
| substrate network. An efficient embedding algorithm can maximize the | substrate network. An efficient embedding algorithm can maximize the | |||
| acceptance ratio of virtual networks to increase the revenue for | acceptance ratio of virtual networks to increase the revenue for | |||
| Internet service providers. Several works have appeared on the | Internet service providers. Several works have appeared on the | |||
| skipping to change at page 2, line 10 ¶ | skipping to change at page 2, line 10 ¶ | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on 22 April 2022. | This Internet-Draft will expire on 23 October 2022. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2021 IETF Trust and the persons identified as the | Copyright (c) 2022 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents (https://trustee.ietf.org/ | |||
| license-info) in effect on the date of publication of this document. | license-info) in effect on the date of publication of this document. | |||
| Please review these documents carefully, as they describe your rights | Please review these documents carefully, as they describe your rights | |||
| and restrictions with respect to this document. Code Components | and restrictions with respect to this document. Code Components | |||
| extracted from this document must include Simplified BSD License text | extracted from this document must include Revised BSD License text as | |||
| as described in Section 4.e of the Trust Legal Provisions and are | described in Section 4.e of the Trust Legal Provisions and are | |||
| provided without warranty as described in the Simplified BSD License. | provided without warranty as described in the Revised BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction and Scope . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction and Scope . . . . . . . . . . . . . . . . . . . 2 | |||
| 2. Reinforcement Learning-based VNE Solutions . . . . . . . . . 5 | 2. Reinforcement Learning-based VNE Solutions . . . . . . . . . 5 | |||
| 3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
| 4. Problem Space . . . . . . . . . . . . . . . . . . . . . . . . 9 | 4. Problem Space . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 4.1. State Representation . . . . . . . . . . . . . . . . . . 9 | 4.1. State Representation . . . . . . . . . . . . . . . . . . 9 | |||
| 4.2. Action Space . . . . . . . . . . . . . . . . . . . . . . 9 | 4.2. Action Space . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 4.3. Reward Description . . . . . . . . . . . . . . . . . . . 10 | 4.3. Reward Description . . . . . . . . . . . . . . . . . . . 10 | |||
| 4.4. Policy and RL Algorithms . . . . . . . . . . . . . . . . 11 | 4.4. Policy and RL Algorithms . . . . . . . . . . . . . . . . 11 | |||
| 4.5. Training Environment . . . . . . . . . . . . . . . . . . 12 | 4.5. Training Environment . . . . . . . . . . . . . . . . . . 12 | |||
| 4.6. Sim2Real Gap . . . . . . . . . . . . . . . . . . . . . . 13 | 4.6. Sim2Real Gap . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 4.7. Generalization . . . . . . . . . . . . . . . . . . . . . 13 | 4.7. Generalization . . . . . . . . . . . . . . . . . . . . . 14 | |||
| 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 | 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 | |||
| 6. Security Considerations . . . . . . . . . . . . . . . . . . . 14 | 6. Security Considerations . . . . . . . . . . . . . . . . . . . 14 | |||
| 7. Informative References . . . . . . . . . . . . . . . . . . . 14 | 7. Informative References . . . . . . . . . . . . . . . . . . . 14 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 17 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 18 | |||
| 1. Introduction and Scope | 1. Introduction and Scope | |||
| Recently, Network virtualization (NV) technology has received a lot | Recently, Network virtualization (NV) technology has received a lot | |||
| of attention from academics and industry. It allows multiple | of attention from academics and industry. It allows multiple | |||
| heterogeneous virtual networks to share resources on the same | heterogeneous virtual networks to share resources on the same | |||
| substrate network (SN) [RFC7364], [ASNVT2020]. The current large- | substrate network (SN) [RFC7364], [ASNVT2020]. The current large- | |||
| size fixed substrate network architecture is no longer efficient and | size fixed substrate network architecture is no longer efficient and | |||
| not extendable due to network ossification. To overcome these | not extendable due to network ossification. To overcome these | |||
| limitations, traditional Internet Service Providers (ISPs) are | limitations, traditional Internet Service Providers (ISPs) are | |||
| skipping to change at page 3, line 26 ¶ | skipping to change at page 3, line 26 ¶ | |||
| network and increase the commercial revenue of both SPs and InPs. NV | network and increase the commercial revenue of both SPs and InPs. NV | |||
| can increase network agility, flexibility and scalability while | can increase network agility, flexibility and scalability while | |||
| creating significant cost savings. Greater network workload | creating significant cost savings. Greater network workload | |||
| mobility, increased availability of network resources with good | mobility, increased availability of network resources with good | |||
| performance, and automated operations, are all the benefits of NV. | performance, and automated operations, are all the benefits of NV. | |||
| Virtual Network Embedding (VNE) [VNESURV2013] is one of the main | Virtual Network Embedding (VNE) [VNESURV2013] is one of the main | |||
| techniques and strategies used to map a virtual network to the | techniques and strategies used to map a virtual network to the | |||
| substrate network. A VNE algorithm has two main parts, Node embedding: | substrate network. A VNE algorithm has two main parts, Node embedding: | |||
| where virtual nodes of VN have to be mapped to the SN nodes, and Link | where virtual nodes of VN have to be mapped to the SN nodes, and Link | |||
| ebbedding: where virtual links between the VNs have to be mapped to | embedding: where virtual links between the VNs have to be mapped to | |||
| the physical paths in the substrate network. It has been proven to | the physical paths in the substrate network. It has been proven to | |||
| be NP-Hard, and both node and link embeddings have become challenging | be NP-Hard, and both node and link embeddings have become challenging | |||
| for the researchers. A virtual node and link should be efficiently | for the researchers. A virtual node and link should be efficiently | |||
| embedded into a given SN, so that more VNRs can be accepted with | embedded into a given SN, so that more VNRs can be accepted with | |||
| minimum cost. The distance between the virtual nodes in a given SN | minimum cost. The distance between the virtual nodes in a given SN | |||
| contributes significantly to link failures and causes the rejection | contributes significantly to link failures and causes the rejection | |||
| of VNRs. Hence, an efficient and intelligent technique is required | of VNRs. Hence, an efficient and intelligent technique is required | |||
| for the VNE problem to reduce VNR rejection [ENViNE2021]. From the | for the VNE problem to reduce VNR rejection [ENViNE2021]. From the | |||
| perspective of the InPs, an efficient VNE performs better mostly | perspective of the InPs, an efficient VNE performs better mostly | |||
| in terms of revenue, acceptance ratio, and revenue-to-cost ratio. | in terms of revenue, acceptance ratio, and revenue-to-cost ratio. | |||
| skipping to change at page 5, line 5 ¶ | skipping to change at page 5, line 5 ¶ | |||
| Recently, artificial intelligence and machine learning technologies | Recently, artificial intelligence and machine learning technologies | |||
| have been widely used to solve networking problems [SUR2018], | have been widely used to solve networking problems [SUR2018], | |||
| [MLCNM2018], [MVNNML2021]. There has been a surge in research | [MLCNM2018], [MVNNML2021]. There has been a surge in research | |||
| efforts, especially in reinforcement learning (RL), which has | efforts, especially in reinforcement learning (RL), which has | |||
| contributed to many complex tasks, e.g., video games and | contributed to many complex tasks, e.g., video games and | |||
| autonomous driving. The main goal of RL is to learn better policies | autonomous driving. The main goal of RL is to learn better policies | |||
| for sequential decision making problems (e.g., VNE) and solve them | for sequential decision making problems (e.g., VNE) and solve them | |||
| very efficiently. | very efficiently. | |||
| Problems such as node ordering, pattern matching, and network feature | Problems such as node classification, pattern matching, and network | |||
| extraction can all be simplified by graph-related theories and | feature extraction can be simplified by graph-related theories and | |||
| techniques. Graph neural network (GNN) is a new type of ML model | techniques. Graph neural network (GNN) is a new type of ML model | |||
| architecture that can aggregate graph features (degrees, distance to | architecture that can aggregate graph features (degrees, distance to | |||
| specific nodes, node connectivity, etc.) on nodes [DVNEGCN2021]. The | specific nodes, node connectivity, etc.) on nodes [DVNEGCN2021]. | |||
| model can be used to cluster nodes and links according to the | Graph convolution neural network (GCNN) is a natural generalization | |||
| form of GNN which is used to automatically extract the features of | ||||
| underlying network, which optimizes the selection of VNE decision. | ||||
| The model can be used to cluster nodes and links according to the | ||||
| physical nodes and physical links attribute characteristics (CPU, | physical nodes and physical links attribute characteristics (CPU, | |||
| storage, bandwidth, delay, etc.), and it is highly suitable for graph | storage, bandwidth, delay, etc.), and it is highly suitable for graph | |||
| structures of any topological form. Hence, a GNN is useful for | structures of any topological form. Hence, a GNN is useful for | |||
| finding the best VNE strategy through intelligent agent training, | finding the best VNE strategy through intelligent agent training, | |||
| and VNE combines naturally with GCNs. | and VNE combines naturally with GCNs. | |||
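As a non-normative illustration of the graph feature aggregation described above, the following Python sketch computes, for each substrate node, its own CPU feature together with the mean CPU of its neighbors, which is the kind of neighborhood aggregation a GCNN layer performs. All node names and capacity values here are hypothetical, and a real GNN would learn weighted transformations rather than a plain mean.

```python
# Toy substrate network as an adjacency list (hypothetical topology).
substrate = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
# Hypothetical remaining CPU capacity per substrate node.
cpu = {"A": 50.0, "B": 30.0, "C": 80.0, "D": 20.0}

def aggregate(features, adjacency):
    """One message-passing step: (own_feature, mean_of_neighbor_features)."""
    out = {}
    for node, neighbors in adjacency.items():
        mean_nbr = sum(features[n] for n in neighbors) / len(neighbors)
        out[node] = (features[node], mean_nbr)
    return out

embeddings = aggregate(cpu, substrate)
# Node "D" sees its own CPU (20.0) plus its neighborhood's mean CPU (80.0).
```

Stacking several such steps lets each node's representation reflect a wider neighborhood, which is how graph features such as connectivity and distance to well-provisioned nodes enter the embedding decision.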
| Designing and applying RL techniques directly into VNE problems is | Designing and applying RL techniques directly into VNE problems is | |||
| not trivial and may face several challenges. This document | not trivial and may face several challenges. Several works | |||
| describes the problems. Several works have appeared on the design of | have appeared on the design of VNE solutions using RL, which focus | |||
| VNE solutions using RL, which focus on how to interact with the | on how to interact with the environment to achieve maximum cumulative | |||
| environment to achieve maximum cumulative return [VNEQS2021], | return [VNEQS2021], [NRRL2020], [MVNE2020], [CDVNE2020], [PPRL2020], | |||
| [NRRL2020], [MVNE2020], [CDVNE2020], [PPRL2020], [RLVNEWSN2020], | [RLVNEWSN2020], [QLDC2019], [VNFFG2020], [VNEGCN2020], [NFVDeep2019], | |||
| [QLDC2019], [VNFFG2020], [VNEGCN2020], [NFVDeep2019], [DeepViNE2019], | [DeepViNE2019], [VNETD2019], [RDAM2018], [MOQL2018], [ZTORCH2018], | |||
| [VNETD2019], [RDAM2018], [MOQL2018], [ZTORCH2018], [NeuroViNE2018], | [NeuroViNE2018], [QVNE2020]. This document outlines the problems | |||
| [QVNE2020]. This document outlines the problems encountered when | encountered when designing and applying RL-based VNE solutions. | |||
| designing and applying RL-based VNE solutions. Section 2 describes | Section 2 describes how to design RL-based VNE solutions. Section 3 | |||
| how to design RL-based VNE solutions. Section 3 gives terminology, | gives terminology, and Section 4 describes the problem space details. | |||
| and Section 4 describes the problem space details. | ||||
| 2. Reinforcement Learning-based VNE Solutions | 2. Reinforcement Learning-based VNE Solutions | |||
| As discussed, RL has been studied in various fields (such as | As discussed, RL has been studied in various fields (such as | |||
| games, control systems, operations research, information theory, | games, control systems, operations research, information theory, | |||
| multi-agent systems, network systems, etc.) and can perform better | multi-agent systems, network systems, etc.) and can perform better | |||
| than humans. Unlike supervised deep learning, RL trains a policy | than humans. Unlike supervised deep learning, RL trains a policy | |||
| model by receiving rewards through interaction with the environment, | model by receiving rewards through interaction with the environment, | |||
| without labeled training data. | without labeled training data. | |||
| Recently, there have been several attempts to solve VNE problems | Recently, there have been several attempts to solve VNE problems | |||
| using RL. When applying RL-based algorithms to solve VNE problems, | using RL. When applying RL-based algorithms to solve VNE problems, | |||
| the RL agent automatically learns without human intervention through | the RL agent automatically learns through the environment without | |||
| interaction with the environment. Once the agent has completed the | human intervention. Once the agent has completed the learning | |||
| learning process, it can generate the most appropriate embedding | process, it can generate the most appropriate embedding decision | |||
| decision (action) based on the state of the network. Based on the | (action) based on its knowledge and the network state. For each | |||
| embedding or action, the agent gets a reward from the environment to | embedding action at each time step, the agent gets a reward from | |||
| adaptively train its policy for future actions. The RL agent gets | the environment to adaptively train its policy for future actions. | |||
| the most optimized model based on the reward function defined | The RL agent obtains the most optimized model based on the reward | |||
| according to each objective (revenue, cost, revenue-to-cost ratio, | function defined according to each objective (revenue, cost, | |||
| and acceptance ratio). The optimal RL policy model provides the VNE | revenue-to-cost ratio, and acceptance ratio). The optimal RL policy | |||
| strategy appropriately according to the objective of the network | model provides the VNE strategy according to the objective of the | |||
| operator. | network operator. | |||
| Figure 2 shows the virtual network embedding solution based on an RL | Figure 2 shows the virtual network embedding solution based on an RL | |||
| algorithm. The RL approach is divided into a training process and an | algorithm. The RL strategy is divided into two main parts: a training | |||
| inference process. In the training process, state information is | process and an inference process. In the training process, state | |||
| composed of various substrate networks and VNRs (Environment), which | information is composed of various substrate networks and VNRs | |||
| are used as suitable inputs for RL models through feature extraction. | (Environment), which are used as suitable inputs for RL models | |||
| After that, the RL model is updated by model updater using a feature | through feature extraction. After that, the RL model is updated by | |||
| extracted state and reward. In the inference process, using the | model updater using a feature extracted state and reward. In the | |||
| trained RL model, the embedding result is provided to the operating | inference process, using the trained RL model, the embedding result | |||
| network in real time. | is provided to the operating network in real time. | |||
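The training/inference split described above can be sketched, in a non-normative way, with a toy tabular policy. The environment, state label, and substrate node names below are hypothetical stand-ins, not a real VNE simulator or a prescribed model architecture.

```python
import random

class TabularPolicy:
    """Toy state-action value table standing in for the RL policy model."""
    def __init__(self, actions, lr=0.1):
        self.q = {}            # (state, action) -> estimated value
        self.actions = actions
        self.lr = lr

    def update(self, state, action, reward):
        # Model updater: move the estimate toward the observed reward.
        key = (state, action)
        old = self.q.get(key, 0.0)
        self.q[key] = old + self.lr * (reward - old)

    def best_action(self, state):
        # Inference: pick the embedding decision with the highest value.
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

def env_step(state, action):
    # Hypothetical environment: placing on substrate node "n2" succeeds,
    # other placements are rejected and penalized.
    return 1.0 if action == "n2" else -1.0

def train(policy, episodes=500):
    random.seed(0)
    for _ in range(episodes):
        state = "vnr_small"                      # toy state label
        action = random.choice(policy.actions)   # random exploration
        reward = env_step(state, action)
        policy.update(state, action, reward)

policy = TabularPolicy(actions=["n1", "n2", "n3"])
train(policy)                                    # training process
decision = policy.best_action("vnr_small")       # inference process
```

After training, the frozen policy maps a network state directly to an embedding decision, which is what the inference process delivers to the operating network in real time.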
| The following figure shows the details of the RL-based virtual | The following figure shows the details of the RL-based virtual | |||
| network embedding solution. | network embedding solution. | |||
| RL Model Training Process | RL Model Training Process | |||
| +--------------------------------------------------------------------+ | +--------------------------------------------------------------------+ | |||
| | Training Environment | | | Training Environment | | |||
| | +-------------------+ RL-based VNE Agent | | | +-------------------+ RL-based VNE Agent | | |||
| | | +---------+ | +----------------------------------+ | | | | +---------+ | +----------------------------------+ | | |||
| | | | +---------+ | | Action | | | | | | +---------+ | | Action | | | |||
| skipping to change at page 8, line 24 ¶ | skipping to change at page 8, line 24 ¶ | |||
| Virtual Network Embedding (VNE) [VNESURV2013] is one of the main | Virtual Network Embedding (VNE) [VNESURV2013] is one of the main | |||
| techniques used to map a virtual network to the substrate network. | techniques used to map a virtual network to the substrate network. | |||
| Substrate Network (SN) | Substrate Network (SN) | |||
| The underlying physical network which contains the resources such | The underlying physical network which contains the resources such | |||
| as CPU and bandwidth for virtual networks is called substrate | as CPU and bandwidth for virtual networks is called substrate | |||
| network. | network. | |||
| Virtual Network Request (VNR) | Virtual Network Request (VNR) | |||
| A Virtual Network Request is a single complete virtual network | A Virtual Network Request is a single complete virtual network | |||
| request containing virtual nodes and virtual links. | containing virtual nodes and virtual links. | |||
| Agent | Agent | |||
| In RL, an agent is the component that makes decisions and takes | In RL, an agent is the component that makes decisions and takes | |||
| actions (i.e., embedding decisions). | actions (i.e., embedding decisions). | |||
| State | State | |||
| State is a representation (e.g., remaining SN capacity and | State is a representation (e.g., remaining SN capacity and | |||
| requested VN resource) of the current environment, and it tells | requested VN resource) of the current environment, and it tells | |||
| the agent what situation it is in currently. | the agent what situation it is in currently. | |||
| skipping to change at page 9, line 33 ¶ | skipping to change at page 9, line 33 ¶ | |||
| RL agent to establish a thorough knowledge of the network status and | RL agent to establish a thorough knowledge of the network status and | |||
| generate efficient embedding decisions. Therefore, it is essential | generate efficient embedding decisions. Therefore, it is essential | |||
| to first design the state representation that serves as the input | to first design the state representation that serves as the input | |||
| to the agent. The state representation is the information which an | to the agent. The state representation is the information which an | |||
| agent can receive from the environment, and consists of a set of | agent can receive from the environment, and consists of a set of | |||
| values representing the current situation in the environment. Based | values representing the current situation in the environment. Based | |||
| on the state representation, the RL agent selects the most | on the state representation, the RL agent selects the most | |||
| appropriate action through its policy model. In the VNE problem, an | appropriate action through its policy model. In the VNE problem, an | |||
| RL agent needs to know the information of the overall SN entities and | RL agent needs to know the information of the overall SN entities and | |||
| their current status in order to use the resources of the nodes and | their current status in order to use the resources of the nodes and | |||
| edges of the substrate network. Also it must know the requirements | links of the substrate network. Also it must know the requirements | |||
| of the VNR. Therefore, in the VNE problem, the state usually should | of the VNR. Therefore, in the VNE problem, the state usually should | |||
| represent the current resource state of the nodes and edges of the | represent the current resource state of the nodes and links of the | |||
| substrate network (i.e., CPU, memory, storage, bandwidth, delay, loss | substrate network (i.e., CPU, memory, storage, bandwidth, delay, loss | |||
| rate, etc.) and the requirements of the virtual node and link of the | rate, etc.) and the requirements of the virtual node and link of the | |||
| VNR. The collected status information is used as raw input, or | VNR. The collected status information is used as raw input, or | |||
| refined status information through the feature extraction process is | refined status information through the feature extraction process is | |||
| used as input for the RL agent. The state representation may vary | used as input for the RL agent. The state representation may vary | |||
| depending on the operator's objective and VNE strategy. The method | depending on the operator's objective and VNE strategy. The method | |||
| of determining such feature extraction and representation greatly | of determining such feature extraction and representation greatly | |||
| affects the performance of the agent. | affects the performance of the agent. | |||
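As a hedged illustration of such a state vector, the sketch below flattens the remaining substrate resources and the current VNR's requirements into one normalized feature list. The attribute set (CPU and bandwidth only) and the normalization constant are assumptions for the example, not prescribed by this document; real solutions may extract richer features.

```python
def build_state(sn_node_cpu, sn_link_bw, vnr_node_cpu, vnr_link_bw):
    """Flatten SN residual capacities and VNR demands into one feature list."""
    state = []
    for cpu in sn_node_cpu:           # remaining CPU per SN node
        state.append(cpu / 100.0)     # normalize to [0, 1] (assumed max: 100)
    for bw in sn_link_bw:             # remaining bandwidth per SN link
        state.append(bw / 100.0)
    state.append(vnr_node_cpu / 100.0)  # requested CPU of the virtual node
    for bw in vnr_link_bw:              # requested bandwidth of its links
        state.append(bw / 100.0)
    return state

# Hypothetical SN with two nodes and one link, and one virtual node to place.
s = build_state(sn_node_cpu=[50, 80], sn_link_bw=[40],
                vnr_node_cpu=20, vnr_link_bw=[10])
# -> [0.5, 0.8, 0.4, 0.2, 0.1]
```

The resulting vector is what the agent's policy model would consume, either raw as here or after a feature extraction step such as the GNN aggregation discussed earlier.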
| 4.2. Action Space | 4.2. Action Space | |||
| skipping to change at page 11, line 14 ¶ | skipping to change at page 11, line 14 ¶ | |||
| Revenue | Revenue | |||
| Revenue is the sum of the virtual resources requested by the VN, | Revenue is the sum of the virtual resources requested by the VN, | |||
| and is calculated to determine the total cost of the resources. | and is calculated to determine the total cost of the resources. | |||
| Typically, a successful action (e.g., a VNR is embedded without | Typically, a successful action (e.g., a VNR is embedded without | |||
| violation) yields a good reward, which also increases the | violation) yields a good reward, which also increases the | |||
| revenue. Otherwise, a failed action (e.g., a VNR is rejected) | revenue. Otherwise, a failed action (e.g., a VNR is rejected) | |||
| leads the agent to receive a negative reward as well as | leads the agent to receive a negative reward as well as | |||
| decreasing the revenue. | decreasing the revenue. | |||
| Cost | ||||
| Cost is the expenditure incurred when a VNR is embedded into the | ||||
| substrate network. Pursuing only high revenue is not a good | ||||
| embedding result. It is important for the network operator and | ||||
| SP to spend less. The lower the cost, the better the agent will | ||||
| be rewarded. | ||||
| Acceptance Ratio | Acceptance Ratio | |||
| Acceptance ratio is the ratio measured by the number of | Acceptance ratio is the ratio measured by the number of | |||
| successfully embedded virtual network requests divided by total | successfully embedded virtual network requests divided by total | |||
| number of virtual network requests. To achieve a high acceptance | number of virtual network requests. To achieve a high acceptance | |||
| ratio, the agent tries to embed as many VNRs as possible and get a | ratio, the agent tries to embed as many VNRs as possible and get a | |||
| good reward. Getting a good reward is usually proportional to the | good reward. Getting a good reward is usually proportional to the | |||
| acceptance ratio. | acceptance ratio. | |||
| Revenue-to-cost ratio | Revenue-to-cost ratio | |||
| To balance and compare the cost of resources for embedding VNR, | To balance and compare the cost of resources for embedding VNR, | |||
| skipping to change at page 12, line 45 ¶ | skipping to change at page 13, line 4 ¶ | |||
| through a simulation that simulates the real environment. In order | through a simulation that simulates the real environment. In order | |||
| to solve the VNE problem, we need to use a network simulator similar | to solve the VNE problem, we need to use a network simulator similar | |||
| to the real environment because it is difficult to repeatedly | to the real environment because it is difficult to repeatedly | |||
| experiment with real network environments using an RL algorithm, and | experiment with real network environments using an RL algorithm, and | |||
| it is very challenging and overwhelming to directly apply an RL | it is very challenging and overwhelming to directly apply an RL | |||
| algorithm to real-world environments. When solving VNE problems, a | algorithm to real-world environments. When solving VNE problems, a | |||
| network simulation environment similar to a real network is required. | network simulation environment similar to a real network is required. | |||
| The network simulation environment should have a general SN | The network simulation environment should have a general SN | |||
| environment and VNR required by the operator. The SN has nodes and | environment and VNR required by the operator. The SN has nodes and | |||
| links between nodes, each with capacities such as CPU and bandwidth. | links between nodes, each with capacities such as CPU and bandwidth. | |||
| In the case of VNR, there are virtual nodes and links required by the | In the case of VNR, there are virtual nodes and links required by the | |||
| operator, and each must have its own requirements. | operator, and each must have its own requirements. | |||
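A minimal, non-normative sketch of such a simulation environment is shown below: an SN whose nodes have CPU capacity and whose links have bandwidth, plus a feasibility check for embedding a single virtual node or link. The topology and capacity values are illustrative; a real simulator would also handle path selection, VNR arrival/departure, and resource release.

```python
class SubstrateNetwork:
    """Toy SN environment tracking residual CPU and bandwidth."""
    def __init__(self, node_cpu, link_bw):
        self.node_cpu = dict(node_cpu)    # node -> remaining CPU
        self.link_bw = dict(link_bw)      # (u, v) -> remaining bandwidth

    def embed_node(self, sn_node, cpu_demand):
        if self.node_cpu.get(sn_node, 0) < cpu_demand:
            return False                  # reject: not enough CPU
        self.node_cpu[sn_node] -= cpu_demand
        return True

    def embed_link(self, sn_link, bw_demand):
        if self.link_bw.get(sn_link, 0) < bw_demand:
            return False                  # reject: not enough bandwidth
        self.link_bw[sn_link] -= bw_demand
        return True

# Hypothetical two-node SN and one VNR's demands.
sn = SubstrateNetwork(node_cpu={"A": 50, "B": 30}, link_bw={("A", "B"): 40})
accepted = sn.embed_node("A", cpu_demand=20)       # accepted: 50 >= 20
link_ok = sn.embed_link(("A", "B"), bw_demand=25)  # accepted: 40 >= 25
rejected = sn.embed_node("B", cpu_demand=60)       # rejected: exceeds capacity
```

Accept/reject outcomes like these are exactly what the RL agent's reward signal is derived from during training.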
| As described in [DTwin2022], a digital twin network is a virtual | ||||
| representation of the physical network environment and can be built | ||||
| by applying digital twin technologies to the environment and creating | ||||
| virtual images of diverse physical network facilities. The digital | ||||
| twin for networks is an expansion platform of network simulation. In | ||||
| [DTwin2022], Section 8.2 describes that a digital twin network | ||||
| provides the complete machine learning lifecycle development by | ||||
| providing a realistic network environment, including network | ||||
| topologies, etc. Hence, RL algorithms to solve the VNE problem can | ||||
| be trained and verified on a digital twin network upfront before | ||||
| deployed to the physical networks, and the verification accuracy will | ||||
| be generally high when the digital twin network reproduces network | ||||
| behaviors well under various conditions. | ||||
| 4.6. Sim2Real Gap | 4.6. Sim2Real Gap | |||
| An RL algorithm iteratively learns in a simulation environment to | Sim-to-real is a broad concept applied in many fields, including | |||
| train a model of the desired policy. The trained model is then | robotics and classic machine vision tasks. An RL algorithm | |||
| applied to the real environment and/or further tuned to adapt to | iteratively learns in a simulation environment to train a model | |||
| it. However, when the model trained in simulation is applied to | of the desired policy. The trained model is then applied to the | |||
| the real environment, the sim2real gap problem arises. The | real environment and/or further tuned to adapt to it. However, | |||
| simulation environment never matches the real environment | when the model trained in simulation is applied to the real | |||
| perfectly, so the tuning process often fails and the model | environment, the sim2real gap problem arises. Closing the gap | |||
| performs poorly because of this sim2real gap. The gap is caused | between simulation and reality in terms of actuation requires | |||
| by the difference between the simulation and the real | simulators to be more accurate and to account for variability in | |||
| environment: the simulation cannot perfectly reproduce the real | agent dynamics. The simulation environment never matches the | |||
| environment, which contains many more variables. In a real | real environment perfectly, so the tuning process often fails | |||
| network environment for VNE, the SN's nodes and links may fail | and the model performs poorly because of this sim2real gap. The | |||
| due to external factors, or capacities such as CPU may change | gap is caused by the difference between the simulation and the | |||
| suddenly. To solve this problem, the simulation environment | real environment: the simulation cannot perfectly reproduce the | |||
| should be made more realistic or the trained RL model should be | real environment, which contains many more variables. In a real | |||
| generalized. To reduce the gap between the simulated and real | network environment for VNE, the SN's nodes and links may fail | |||
| network environments, the model needs to be trained with a large | due to external factors, or capacities such as CPU may change | |||
| and diverse set of VNRs, and the agent must keep learning rather | suddenly. To solve this problem, the simulation environment | |||
| than rely only on previous memorization. | should be made more realistic or the trained RL model should be | |||
| generalized. To reduce the gap between the simulated and real | ||||
| network environments, the model needs to be trained with a large | ||||
| and diverse set of VNRs, and the agent must keep learning rather | ||||
| than rely only on previous memorization. | ||||
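One way to realize the suggestion above (training over varied substrate conditions so the agent does not overfit to one idealized simulator) is per-episode domain randomization. The sketch below is a minimal, hypothetical illustration: the names (`randomized_substrate`, the `cpu`/`bw`/`up` fields, the capacity ranges, and the failure probability) are assumptions for this example, not defined by this draft or any VNE library.

```python
import random

def randomized_substrate(num_nodes=20, seed=None):
    """Build a substrate network description with randomly perturbed
    node capacities, link bandwidths, and occasional link failures.
    All ranges and probabilities here are illustrative assumptions."""
    rng = random.Random(seed)
    # Node CPU capacities drift between episodes, mimicking real load.
    nodes = {n: {"cpu": rng.uniform(50, 100)} for n in range(num_nodes)}
    links = {}
    for i in range(num_nodes):
        for j in range(i + 1, num_nodes):
            if rng.random() < 0.3:  # random sparse topology
                links[(i, j)] = {
                    "bw": rng.uniform(50, 100),   # perturbed bandwidth
                    "up": rng.random() > 0.05,    # rare link failures
                }
    return {"nodes": nodes, "links": links}

# Per-episode randomization: each training episode the (hypothetical)
# RL agent would see a different substrate, so its embedding policy
# cannot simply memorize one fixed simulated environment.
for episode in range(3):
    substrate = randomized_substrate(seed=episode)
    alive = sum(1 for link in substrate["links"].values() if link["up"])
    print(f"episode {episode}: {len(substrate['links'])} links, {alive} up")
```

In an actual training loop, the embedding agent and the VNR arrival process would be plugged in where the print statement stands; the key design choice is that the randomized parameters cover (and slightly exceed) the variability expected in the real substrate network.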
| 4.7. Generalization | 4.7. Generalization | |||
| Generalization refers to a trained model's ability to adapt | Generalization refers to a trained model's ability to adapt | |||
| properly to previously unseen observations. An RL algorithm | properly to previously unseen observations. An RL algorithm | |||
| tries to learn a model that optimizes some objective while | tries to learn a model that optimizes some objective while | |||
| performing well on data the model has never seen during | performing well on data the model has never seen during | |||
| training. In terms of the VNE problem, generalization is a | training. In terms of the VNE problem, generalization is a | |||
| measure of how well the agent's policy model performs on unseen | measure of how well the agent's policy model performs on unseen | |||
| VNRs. The RL agent not only has to memorize all the previous variance | VNRs. The RL agent not only has to memorize all the previous variance | |||
| skipping to change at page 14, line 32 ¶ | skipping to change at page 15, line 12 ¶ | |||
| DOI 10.1109/TNSM.2020.2971543, February 2020, | DOI 10.1109/TNSM.2020.2971543, February 2020, | |||
| <https://ieeexplore.ieee.org/document/8982091>. | <https://ieeexplore.ieee.org/document/8982091>. | |||
| [DeepViNE2019] | [DeepViNE2019] | |||
| Dolati, M., Hassanpour, S. B., Ghaderi, M., and A. | Dolati, M., Hassanpour, S. B., Ghaderi, M., and A. | |||
| Khonsari, "DeepViNE: Virtual Network Embedding with Deep | Khonsari, "DeepViNE: Virtual Network Embedding with Deep | |||
| Reinforcement Learning", | Reinforcement Learning", | |||
| DOI 10.1109/INFCOMW.2019.8845171, September 2019, | DOI 10.1109/INFCOMW.2019.8845171, September 2019, | |||
| <https://ieeexplore.ieee.org/document/8845171>. | <https://ieeexplore.ieee.org/document/8845171>. | |||
| [DTwin2022] | ||||
| Yang, H., Zhou, C., Duan, X., Lopez, D., Pastor, A., and | ||||
| Q. Wu, "Digital Twin Network: Concepts and Reference | ||||
| Architecture", Work in Progress, Internet-Draft, draft- | ||||
| irtf-nmrg-network-digital-twin-arch, March 2022, | ||||
| <https://datatracker.ietf.org/doc/draft-irtf-nmrg-network- | ||||
| digital-twin-arch/>. | ||||
| [DVNEGCN2021] | [DVNEGCN2021] | |||
| Zhang, P., Wang, C., Kumar, N., Zhang, W., and | Zhang, P., Wang, C., Kumar, N., Zhang, W., and | |||
| L. Liu, "Dynamic Virtual Network Embedding | L. Liu, "Dynamic Virtual Network Embedding | |||
| Algorithm based on Graph Convolution Neural Network and | Algorithm based on Graph Convolution Neural Network and | |||
| Reinforcement Learning", DOI 10.1109/JIOT.2021.3095094, | Reinforcement Learning", DOI 10.1109/JIOT.2021.3095094, | |||
| July 2021, <https://ieeexplore.ieee.org/document/9475485>. | July 2021, <https://ieeexplore.ieee.org/document/9475485>. | |||
| [ENViNE2021] | [ENViNE2021] | |||
| Ullah, I., Lim, H-K., and Y-H. Han, "Ego | Ullah, I., Lim, H-K., and Y-H. Han, "Ego | |||
| Network-Based Virtual Network Embedding Scheme for Revenue | Network-Based Virtual Network Embedding Scheme for Revenue | |||
| End of changes. 24 change blocks. | ||||
| 67 lines changed or deleted | 107 lines changed or added | |||