ALTO                                                      J. Ros-Giralt
Internet-Draft                                            S. Yellamraju
Intended status: Informational                                 Qualcomm
Expires: 23 September 2022                                        Q. Wu
                                                                 Huawei
                                                         L.M. Contreras
                                                             Telefonica
                                                                R. Yang
                                                        Yale University
                                                                 K. Gao
                                                     Sichuan University
                                                          22 March 2022

     Supporting Bottleneck Structure Graphs in ALTO: Use Cases and
                              Requirements
            draft-giraltyellamraju-alto-bsg-requirements-01
Abstract
   This document proposes an extension to the base Application-Layer
   Traffic Optimization (ALTO) protocol to support bottleneck
   structures as an efficient representation of the state of a network.
   Bottleneck structures are efficient computational graphs that allow
   network operators and application service providers to optimize
   application performance in a variety of communication problems
   including routing, flow control, flow scheduling, bandwidth
   prediction, and network
skipping to change at page 2, line 20
   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."
   This Internet-Draft will expire on 23 September 2022.
Copyright Notice
   Copyright (c) 2022 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights
skipping to change at page 2, line 47
   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   3
   2.  Conventions and Definitions . . . . . . . . . . . . . . . . .   4
   3.  Brief Introduction to Bottleneck Structures . . . . . . . . .   4
     3.1.  Example of Bottleneck Structure . . . . . . . . . . . . .   4
     3.2.  Propagation Properties  . . . . . . . . . . . . . . . . .   7
     3.3.  Forces of Interaction among Flows and Links . . . . . . .   7
     3.4.  Ripple Effects in a Communication Network . . . . . . . .   8
     3.5.  Not all Bottleneck Links Are Born Equal . . . . . . . . .   8
     3.6.  Quantifying the Ripple Effects  . . . . . . . . . . . . .   9
     3.7.  Types of Bottleneck Structures  . . . . . . . . . . . . .  11
     3.8.  Computing Optimized Network Reconfigurations  . . . . . .  12
     3.9.  Types of Network Reconfigurations . . . . . . . . . . . .  12
   4.  ALTO Bottleneck Structure Service Use Cases . . . . . . . . .  15
     4.1.  Application Rate Limiting for CDN and Edge Cloud
           Applications  . . . . . . . . . . . . . . . . . . . . . .  15
     4.2.  Time-bound Constrained Flow Acceleration for Large Data
           Set Transfers . . . . . . . . . . . . . . . . . . . . . .  15
     4.3.  Application Performance Optimization Through AI
           Modeling  . . . . . . . . . . . . . . . . . . . . . . . .  18
     4.4.  Optimized Joint Routing and Congestion Control  . . . . .  19
     4.5.  Service Placement for Edge Computing  . . . . . . . . . .  20
     4.6.  Training Neural Networks and AI Inference for Edge
           Clouds, Data Centers and Planet-Scale Networks  . . . . .  21
   5.  Example: Application Layer Traffic Optimization using
       Bottleneck Structures . . . . . . . . . . . . . . . . . . . .  23
   6.  Requirements  . . . . . . . . . . . . . . . . . . . . . . . .  28
     6.1.  Requirement 1: Bottleneck Structure Graph (BSG)
           Abstraction . . . . . . . . . . . . . . . . . . . . . . .  28
     6.2.  Requirement 2: Information Received from the Network  . .  29
     6.3.  Requirement 3: Information Passed to the Application  . .  30
     6.4.  Requirement 4: Features Needed to Support the Use
           Cases . . . . . . . . . . . . . . . . . . . . . . . . . .  30
   7.  Security Considerations . . . . . . . . . . . . . . . . . . .  31
   8.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  32
   9.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  33
     9.1.  Normative References  . . . . . . . . . . . . . . . . . .  33
     9.2.  Informative References  . . . . . . . . . . . . . . . . .  33
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  35
1.  Introduction
   Bottleneck structures have been recently introduced in [G2-SIGCOMM]
   and [G2-SIGMETRICS] as efficient computational graphs that embed
   information about the topology, routing, and flows of a network.
   These computational graphs allow network operators and application
   service providers to compute network derivatives that can be used to
   make traffic optimization decisions.  For instance, using the
   bottleneck structure of a network, a real-time communication
skipping to change at page 8, line 51
   the same relevancy.  In Figure 2, links at the top of the graph have
   a higher impact on the overall performance of the network than links
   at the bottom.  For instance, consider link l1.  A variation on its
   capacity will create a ripple effect that will impact the
   performance of all the flows in the network, since all flow vertices
   are reachable from vertex l1 according to the bottleneck structure.
   In contrast, link l3 has a much smaller impact on the overall
   performance of the network, since a variation of its capacity will
   affect flow f5, but have no impact on any of the other flows.  This
   information is valuable to network operators and application service
   providers as it can be used to make informed network optimization
   decisions.  For instance, in edge computing, an operator could
   choose to place a containerized service (e.g., for extended reality,
   XR) on compute nodes that would yield communication paths traversing
   bottleneck links with lower impact on the overall performance of the
   network (see the use case in Section 4.5 for more details).
   Similarly, in network slicing (or capacity planning in general),
   operators could choose to allocate more bandwidth on those links
   that are more influential (i.e., those links that are at lower
   levels in the bottleneck structure) according to the expected
   traffic pattern in the network slice.
   Overall, bottleneck structures provide a mechanism to rank
   bottleneck links according to their impact on the overall
   performance of the network.  This information can be used in a
   variety of optimization problems, such as traffic engineering,
   routing, capacity planning, or resilience analysis, among others.
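The ranking described above can be sketched as a reachability count over the bottleneck structure graph: the more flow vertices a link vertex reaches, the wider its ripple effect. The fragment below is a minimal illustration; the graph is a hypothetical example, not the structure of Figure 2.

```python
from collections import deque

def ripple_reach(bsg, link):
    """Count the flow vertices reachable from `link` in the bottleneck
    structure graph: a proxy for how widely a change in the link's
    capacity ripples through the network."""
    seen, queue = set(), deque([link])
    while queue:
        for nxt in bsg.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sum(1 for v in seen if v.startswith("f"))

# Hypothetical structure: link vertices point to the flows they
# bottleneck, and flow vertices point to the links they influence
# downstream.
bsg = {"l1": ["f1", "f2"], "f1": ["l2"],
       "l2": ["f3", "f4"], "l3": ["f5"]}

ranking = sorted(["l1", "l2", "l3"],
                 key=lambda l: ripple_reach(bsg, l), reverse=True)
# l1 influences four flows, l2 two, l3 one.
```

An ALTO server exposing a BSG could precompute such a ranking so that clients need not traverse the graph themselves.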
3.6.  Quantifying the Ripple Effects
   Bottleneck structures not only allow network operators to reason
skipping to change at page 11, line 43
   path vertex_.  A Q-PGG is slightly larger than a PGG (with about |Q|
   times more vertices, where |Q| is the number of QoS classes
   supported by the network) but still significantly smaller than the
   FGG.

   For most applications, it is recommended to use a PGG, or a Q-PGG if
   the network supports QoS classes, since these bottleneck structures
   are significantly smaller and faster to process than an FGG, and it
   is often the case that the operator does not need to know flow-level
   information in order to make proper application performance
   optimization decisions.  Note also that the PGG and the Q-PGG
   provide the additional security advantage of hiding flow-level
   information from the graph.  This can be important to operators that
   are sensitive about security and privacy.
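The aggregation step that turns an FGG into a PGG can be sketched as grouping flows by the path they follow (a Q-PGG would group by path and QoS class instead). The flow table below is hypothetical; note that the resulting graph exposes only path vertices and flow counts, never flow identifiers.

```python
from collections import defaultdict

def fgg_to_pgg(flows):
    """Collapse flow-level vertices into path vertices: every set of
    flows sharing the same path becomes a single PGG vertex, so
    flow-level information is no longer exposed in the graph."""
    by_path = defaultdict(list)
    for flow_id, path in flows.items():
        by_path[tuple(path)].append(flow_id)
    # Keep only the per-path multiplicity, hiding the flow ids.
    return {path: len(members) for path, members in by_path.items()}

# Hypothetical flow table: three FGG flow vertices collapse into two
# PGG path vertices, since f1 and f2 follow the same path.
flows = {"f1": ["l1", "l2"], "f2": ["l1", "l2"], "f3": ["l2", "l3"]}
pgg = fgg_to_pgg(flows)
```

This also illustrates the privacy property mentioned above: the returned structure carries no flow identifiers.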
3.8.  Computing Optimized Network Reconfigurations
   A central element of the theory of bottleneck structures is the
   ability to efficiently compute derivatives on a network.
   Derivatives are a core building block of the optimization framework,
   as they reveal the directions (gradients) in the feasible set that
   can help bring the network to a higher level of performance.  In
   this
skipping to change at page 16, line 50
      |              |
   +--^---+      +------+
      | +2
      |
   +--v---+
   |      |
   |  f5  |
   |      |
   +------+
  Figure 3: Reducing the rate of flow f1 maximally accelerates flow f5.
   Suppose our goal is to accelerate flow f5.  To achieve this
   objective, we will also assume that we are allowed to traffic shape
   (reduce) the rate of any of the other flows.  Effectively, for each
   flow fi other than f5, we need to compute the following derivative

                               -dr5/d_ri

   and then pick the maximum value.  Note that in the above expression,
   we take the left-derivative (d_), since a traffic shaper reduces the
   rate of a flow.  We also negate the derivative (-d), since we are
   interested in the positive impact induced on flow f5 when reducing
   the rate of another flow fi.
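The selection rule above can be approximated numerically with a progressive-filling (max-min fair) rate model: cap one flow slightly below its current rate, re-run the allocation, and observe the change in the target flow's rate. This is only an illustrative sketch on a hypothetical three-flow tandem network, not the efficient bottleneck-structure computation described in [G2-SIGCOMM].

```python
def maxmin_rates(links, routes, caps=None):
    """Max-min fair allocation by progressive filling.  links maps a
    link to its capacity, routes maps a flow to the links it traverses,
    and caps optionally models a traffic shaper as a per-flow ceiling."""
    caps = caps or {}
    rates, remaining = {}, dict(links)
    active = {f: list(r) for f, r in routes.items()}
    while active:
        # Fair share of each link among its still-unallocated flows.
        share = {l: remaining[l] / sum(l in r for r in active.values())
                 for l in remaining if any(l in r for r in active.values())}
        # A flow is limited by its tightest link or by its shaper cap.
        limit = {f: min(min(share[l] for l in r), caps.get(f, float("inf")))
                 for f, r in active.items()}
        f0 = min(limit, key=limit.get)  # flow that converges first
        rates[f0] = limit[f0]
        for l in active.pop(f0):
            remaining[l] -= rates[f0]
    return rates

def neg_left_derivative(links, routes, target, shaped, eps=1e-3):
    """Approximate -d(rate of target)/d_(rate of shaped) by shaping
    `shaped` down by eps and observing the effect on `target`."""
    base = maxmin_rates(links, routes)
    capped = maxmin_rates(links, routes, caps={shaped: base[shaped] - eps})
    return (capped[target] - base[target]) / eps

# Hypothetical tandem network: shaping f1 frees capacity on l2 that
# flow f3 absorbs one-for-one, so the derivative evaluates to 1.
links = {"l1": 10.0, "l2": 4.0}
routes = {"f1": ["l1", "l2"], "f2": ["l1"], "f3": ["l2"]}
d = neg_left_derivative(links, routes, target="f3", shaped="f1")
```

Computing the derivative against every candidate flow and picking the maximum value reproduces the selection rule stated above.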
   Such a calculation can be efficiently performed using the bottleneck
   structure.  As an example, Figure 3 illustrates how the value of (-
skipping to change at page 19, line 27
   service could help unlock a richer set of machine learning
   algorithms to optimize application performance.

   An ALTO server could help the application service provider implement
   AI-assisted prediction algorithms by exposing the bottleneck
   structure of the network.  Alternatively, ALTO could implement an
   AI-assisted prediction module with the help of bottleneck
   structures.  The application would then query the ALTO server to
   obtain the predicted value.
4.4.  Optimized Joint Routing and Congestion Control
   In traditional IP networks, the problems of flow routing and
   congestion control are resolved separately by following a two-step
   process: first, a routing protocol is used to determine the path
   between any two nodes in a network; then, flows are routed according
   to such paths and their transmission rates are regulated using a
   congestion control algorithm.  This layered and disjoint approach is
   known to be scalable but suboptimal, because the routing algorithm
   identifies paths without taking into account the flow transmission
   rates assigned by the congestion control algorithm.
skipping to change at page 21, line 29
   An ALTO server could help the application service provider optimize
   the placement decision by exposing the bottleneck structure of the
   network.  With this information alone, the provider could compute
   the effect of placing the service in one location versus another.
   Alternatively, the application service could query the ALTO server
   by passing the set of possible locations where it can be placed, and
   the ALTO server could return an ordered list of the locations and
   their expected performance.
4.6. Training Neural Networks and AI Inference for Edge Clouds, Data
Centers and Planet-Scale Networks
   Neural network training and inference using distributed computing
   systems are the subject of intense research and are among the
   leading target applications in today's communication networks
   [TOPOOPT-MIT] [FLEXFLOW-STFORD] [SINGULARITY-MSFT].  To illustrate
   this use case, we will focus our discussion on three types of
   networks: edge clouds, data centers and planet-scale networks.
   5G and edge clouds enable, for the first time, the provision of
   intelligence at the edge of the network.  This capability is
   disruptive in that humans and machines will have access to
   unprecedented compute power to perform AI inference in real time.
   For instance, using augmented reality (AR), humans will be able to
   make better informed decisions as they navigate through an
   environment by leveraging AI inference on video and audio signals
   captured in real time from their user equipment (UEs).  Similarly,
   machines such as vehicles or factory robots will be able to use AI
   inference to optimize their actions.
   Two resources are needed to perform inference: (1) input data from
   the environment (e.g., image and audio signals captured from a video
   camera) and (2) compute (typically in the form of GPUs and CPUs).
   The input data needs to be transmitted from the location where it is
   captured (e.g., a micro-camera running on a human's glasses) to the
   location where it is to be processed for inference.  The
   transmission of the input data requires communication resources,
   whereas the inference process requires computing resources.  Since
   computing resources in the edge cloud (Figure 4) are distributed
   across the user equipment (UE), the radio unit (RU), the distributed
   unit (DU) and the central unit (CU) [PETERSON], the problem of
   efficiently performing AI inference is one of optimizing the
   communication-compute trade-off: compute power is more scarce (and
   communication capacity more abundant) if the inference is performed
   closer to the UE, and compute power is more abundant (and
   communication capacity more scarce) if it is performed closer to the
   CU.  For instance, if an AR application running on a UE needs to
   perform an inference task at a time when the communication path from
   the RU to the DU is highly congested, then it will have an incentive
   to perform such a task directly in the UE or in the RU.  If instead
   the network offers an uncongested path to the DU and the CU, it will
   have an incentive to run the inference task on these other nodes,
   since they offer more compute power.
+------+ +------+ +------+ +------+
| | | | | | | |
| UE +-------+ RU +-------+ DU +-------+ CU +
| | | | | | | |
+------+ Air +------+ +------+ +------+
Interface Fronthaul Backhaul
Figure 4: An AI-inference application in the edge cloud needs to
place the inference task on a compute node location (UE, RU, DU
or CU) that will perform well from both a compute and a
communication standpoint.
   Using the ALTO path vector [I-D.ietf-alto-path-vector] and
   performance metrics [I-D.ietf-alto-performance-metrics] features,
   the application could retrieve the amount of compute resources
   located in the RU, DU and CU.  By extending ALTO to support
   bottleneck structures, the application would also be able to
   estimate in real time the available bandwidth for the paths UE-RU,
   UE-RU-DU, and UE-RU-DU-CU.  Further, using the bottleneck structure
   methods described in [G2-SIGCOMM], the application would be able to
   estimate the time to complete the inference task for each of the
   four possible scenarios (running in the UE, the RU, the DU, or the
   CU) and choose the configuration with the fastest execution.
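That selection can be sketched as follows. The node names follow Figure 4; the bandwidth and compute figures are hypothetical, and in practice the path bandwidths would be the estimates obtained from the bottleneck structure exposed via ALTO.

```python
def best_placement(task_bits, task_flops, path_bw, compute):
    """Choose the node that minimizes transfer time plus compute time
    for an inference task.  path_bw maps RU/DU/CU to the available
    bandwidth (bit/s) of the path from the UE; compute maps every node
    to its processing power (FLOP/s)."""
    def total_time(node):
        # Running on the UE itself requires no data transfer.
        transfer = 0.0 if node == "UE" else task_bits / path_bw[node]
        return transfer + task_flops / compute[node]
    return min(compute, key=total_time)

# Hypothetical figures: the fronthaul (RU-DU) is congested, so the
# paths to the DU and CU are slow and the RU wins despite having far
# less compute power than the CU.
compute = {"UE": 1e11, "RU": 5e11, "DU": 2e12, "CU": 1e13}
path_bw = {"RU": 1e8, "DU": 2e6, "CU": 2e6}
choice = best_placement(task_bits=8e6, task_flops=1e11,
                        path_bw=path_bw, compute=compute)
```

With an uncongested fronthaul and backhaul, the same calculation would instead favor the DU or the CU, matching the incentive structure described above.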
Similar joint compute-communication optimization problems appear when
performing neural network training in large-scale data centers.
Large-scale data centers with millions of compute nodes are used to
train gigantic neural networks (with potentially trillions of
parameters). Such a massive task needs to be broken down into
smaller subtasks that are then executed on the nodes. Once again,
compute and communication need to be jointly optimized (see
[TOPOOPT-MIT] and [FLEXFLOW-STFORD]) in order to ensure regions in
the network don't become bottlenecks. By exposing bottleneck
structure information using ALTO, the AI-training application can
make better subtask placement decisions that avoid potential network
bottlenecks.
   Finally, AI training using planet-scale networks generalizes the
   same joint compute and communication problem to an Internet level
   [SINGULARITY-MSFT], with the need to implement a global scheduler
   that is responsible for placing workloads onto clusters of globally
   distributed compute nodes.  Here too, enabling better network state
   visibility using ALTO and bottleneck structure graphs could help the
   scheduler make better task placement decisions.
5.  Example: Application Layer Traffic Optimization using Bottleneck
    Structures
   In this section we provide an example illustrating how bottleneck
   structures can be used to optimize application performance.  This
   example will then be referenced in Section 6 to discuss and
   introduce the necessary requirements to integrate the BSG service
   into the ALTO standard.  It is worth noticing that, as shown in
   Section 4, bottleneck structures have numerous applications.  This
   section provides a complete example for just one of the use cases.
   In particular, the focus of the next example is on the joint routing
   and congestion control use case (Section 4.4).
   Figure 5 provides a view of Google's B4 network as presented in
   [B4-SIGCOMM], connecting 12 data centers distributed across the
   world (two in Asia, six in America and four in Europe).
   +-----+   +-----+  +-----+  +-----+   +------+    +------+
   |     |   |     |  |     |  |     |   |      |    |      |
   | DC1 +---+ DC3 +--+ DC4 +--+ DC7 +---+ DC11 +----+ DC12 |
   |     |   |     |  |     |  |     |   |      |    |      |
   +---+-+   +--+--+  +--+-++  +----++   +-+--+-+    +----+-+
       |        |        | |        |      |  |           |
       |  +-----+        | +----+   |      |  +--------+  |
skipping to change at page 24, line 25
       |        |        |  |       |      |           |  |
   +---+-+   +--+--+  ++---++   ++---++  +-+---++    +-+---+
   |     |   |     |  |     |   |     |  |      |    |     |
   | DC2 +---+ DC5 +--+ DC6 +---+ DC8 +--+ DC10 +----+ DC9 |
   |     |   |     |  |     |   |     |  |      |    |     |
   +-----+   +-----+  +-----+   +-----+  +------+    +-----+

   |-----|   |-----------------------|   |-----------------|
     Asia             America                 Europe
        Figure 5: Google's B4 network introduced in [B4-SIGCOMM].
   The 12 data centers are connected via a total of 19 links, labeled
   l1, l2, ..., l19.  Table 1 presents the pair of data centers that
   each link connects.
      +======+=======================+======+=======================+
      | Link | Adjacent data centers | Link | Adjacent data centers |
      +======+=======================+======+=======================+
      | l1   | DC1, DC2              | l11  | DC10, DC12            |
      +------+-----------------------+------+-----------------------+
skipping to change at page 25, line 29
      +------+-----------------------+------+-----------------------+
      | l7   | DC7, DC8              | l17  | DC7, DC8              |
      +------+-----------------------+------+-----------------------+
      | l8   | DC8, DC10             | l18  | DC9, DC11             |
      +------+-----------------------+------+-----------------------+
      | l9   | DC9, DC10             | l19  | DC10, DC11            |
      +------+-----------------------+------+-----------------------+
      | l10  | DC7, DC11             |      |                       |
      +------+-----------------------+------+-----------------------+
          Table 1: Link connectivity (adjacency matrix) in the B4
                                  network.
   For the sake of illustration, we will assume a simple configuration
   consisting of a pair of flows (one for each direction) connecting
   every data center in the US with every data center in Europe, with
   all flows routed along a shortest path from source to destination.
   Since there are six data centers in the US and four in Europe, this
   configuration has a total of 48 flows.  All links are assumed to
   have a capacity of 10 Gbps except for the transatlantic links
   (DC7-DC11 and DC8-DC10), which are configured at 25 Gbps.
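The configuration can be sketched programmatically. The US/Europe grouping below follows the regions shown in Figure 5, and l10 (DC7-DC11) and l8 (DC8-DC10) are the transatlantic links per Table 1; the flow enumeration itself is an assumption consistent with the text.

```python
from itertools import product

US = ["DC3", "DC4", "DC5", "DC6", "DC7", "DC8"]
EUROPE = ["DC9", "DC10", "DC11", "DC12"]

# One flow per direction for every US-Europe pair: 6 * 4 * 2 = 48.
flows = [(a, b) for a, b in product(US, EUROPE)]
flows += [(b, a) for a, b in product(US, EUROPE)]

def capacity_gbps(link):
    """All links run at 10 Gbps except the transatlantic ones
    (l10: DC7-DC11, l8: DC8-DC10), configured at 25 Gbps."""
    return 25.0 if link in ("l8", "l10") else 10.0
```

Feeding these flows and capacities into a bottleneck structure computation yields the graph discussed next.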
skipping to change at page 26, line 35
     |    |           |    |          |            |     |
     |    |           |    |          |            |     |
   +-v----v-+      +-v----v-+      +-v----+     +-v----v-+
   |        |      |        |      |      |     |        |
   |   l8   |      |  l10   |      |  l5  |     |   l3   |
   |        |      |        |      |      |     |        |
   +-^---^--+      +-^---^--+      +------+     +--------+
     |   |           |   |
     |   |           |   +-----------------------+
     |   |           |                           |
     |   +-----------|-----------------+         |
     |               |                 |         |
   +-v----+      +---v---+         +---v---+ +---v---+
   |      |      |       |         |       | |       |
   |  f6  |      |  f9   |         |  f23  | |  f18  |  (...)
   |      |      |       |         |       | |       |
   +------+      +-------+         +-------+ +-------+
     Figure 6: Bottleneck structure of Google's B4 network example.
For the sake of compactness, Figure 6 only includes the bottleneck
links and a subset of the flow vertices that are part of the complete
bottleneck structure. In particular, out of the 19 links that are
part of B4, six links (l15, l7, l8, l10, l5, l3) are bottlenecks.
The bottleneck structure graph shows the existence of two bottleneck
levels in our configuration example:

*  The first level at the top of the bottleneck structure includes
   links l15 and l7. Flows f1, f2, f3, f10, etc. are bottlenecked at
   this level.
bottlenecked at level 2. For instance, consider the following
directed path in the bottleneck structure:

   l15 -> f1 -> l8 -> f6

Using the MBA property, we have that since l15 precedes l8, it must
be that s15 < s8, where s15 is the rate of flow f1 bottlenecked at
l15 and s8 is the rate of flow f6 bottlenecked at l8.
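The MBA property can be checked mechanically along any directed link
path of a bottleneck structure. The sketch below uses hypothetical
fair-share rates, since the actual numeric rates for this example are
not given in this excerpt.

```python
# Check the MBA property along a directed path of bottleneck links:
# the fair-share rate of each link must strictly increase as we
# descend the bottleneck structure (here, l15 precedes l8).
def mba_holds(fair_share, link_path):
    rates = [fair_share[link] for link in link_path]
    return all(a < b for a, b in zip(rates, rates[1:]))

# Hypothetical fair-share rates in Gbps (placeholders, not values
# computed from the actual B4 example).
fair_share = {"l15": 2.5, "l8": 5.0}

print(mba_holds(fair_share, ["l15", "l8"]))  # True: s15 < s8
```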
Suppose now that an application needs to place a new flow on Google's
B4 network to transfer a large data set from data center 11 (DC11) to
data center 4 (DC4). The application needs to select and configure a
path from DC11 to DC4 (for instance, this could be done by using SR
[RFC8402]). The shortest path DC11 -> l10 -> DC7 -> l15 -> DC4 is
often used as the default option. Doing so, however, implies that
the flow will be bottlenecked at link l15 at the upper level of the
bottleneck structure, leading to a lower transmission rate. If
instead we choose the non-shortest path DC11 -> l19 -> DC10 -> l8 ->
DC8 -> l16 -> DC4, now the flow will be bottlenecked at link l8 (at
the lower level of the bottleneck structure), leading to a higher
transmission rate.
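The level-based argument above can be sketched as a simple selection
heuristic. This is an illustrative simplification of what QTBS
computes exactly; links absent from the bottleneck structure (e.g.,
l19, l16) are treated as non-bottlenecks.

```python
# Pick the candidate path whose flow would be bottlenecked deepest in
# the bottleneck structure: in this example, deeper bottleneck levels
# imply higher transmission rates.
bottleneck_level = {"l15": 1, "l7": 1, "l8": 2, "l10": 2, "l5": 2, "l3": 2}

candidate_paths = {
    "shortest":     ["l10", "l15"],        # DC11 -> DC7 -> DC4
    "non-shortest": ["l19", "l8", "l16"],  # DC11 -> DC10 -> DC8 -> DC4
}

def path_level(links):
    # A flow is constrained by the shallowest bottleneck link it
    # crosses; non-bottleneck links are ignored.
    levels = [bottleneck_level[l] for l in links if l in bottleneck_level]
    return min(levels) if levels else float("inf")

best = max(candidate_paths, key=lambda name: path_level(candidate_paths[name]))
print(best)  # non-shortest: bottlenecked at l8 (level 2) rather than l15
```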
Using QTBS, we can also numerically compute the transmission rate of
6. Requirements

This section discusses the requirements for integrating the BSG
service into the ALTO standard.
6.1. Requirement 1: Bottleneck Structure Graph (BSG) Abstraction
The first base requirement consists of extending the ALTO server with
the capability to compute bottleneck structures. For instance, with
this capability, given the network configuration in Figure 5, the
ALTO server would be able to compute the bottleneck structure shown
in Figure 6:
*  Requirement 1A (R1A). The ALTO server MUST compute the bottleneck
   structure graph to allow applications to optimize their
   performance using the BSG service.
We note that the alternative, which would consist of ALTO simply
providing the necessary information for applications to compute their
own bottleneck structures, would not scale due to the following
issues:
7. Security Considerations

Future versions of this document may extend the base ALTO protocol,
so the Security Considerations [RFC7285] of the base ALTO protocol
fully apply when this proposed extension is provided by an ALTO
server.
The Bottleneck Structure Graph extension requires additional scrutiny
on three security considerations discussed in the base protocol:
confidentiality of ALTO information (Section 15.3 of [RFC7285]),
potential undesirable guidance from authenticated ALTO information
(Section 15.2 of [RFC7285]), and availability of ALTO service
(Section 15.5 of [RFC7285]).
For confidentiality of ALTO information, a network operator should be
aware that this extension may introduce a new risk: as the Bottleneck
Structure information may reveal more fine-grained internal network
structures than the base protocol, an attacker may identify the
bottleneck link and start a distributed denial-of-service (DDoS)
attack involving minimal flows to cause in-network congestion.
information exposure or obfuscate the real information.
Implementations involving reduction or obfuscation of the Bottleneck
Structure information SHOULD consider reduction/obfuscation
mechanisms that can preserve the integrity of ALTO information, for
example, by using minimal feasible region compression algorithms
[NOVA] or obfuscation protocols such as RESA [MERCATOR]. We note
that these obfuscation methods are experimental and their practical
applicability to the generic capability provided by this extension
is not fully assessed.
We note that operators that are sensitive about disclosing flow-level
information (even if it is anonymized) SHOULD consider using the Path
Gradient Graph (PGG) or the QoS-Path Gradient Graph (Q-PGG), since
these objects provide the additional security advantage of hiding
flow-level information from the graph.
For undesirable guidance, the ALTO server must be aware that, if
information reduction/obfuscation methods are implemented, they may
lead to potentially misleading information from authenticated ALTO
information. In such cases, the Protection Strategies described in
Section 15.2.2 of [RFC7285] MUST be considered.
For availability of ALTO service, an ALTO server should be cognizant
that using Bottleneck Structures might introduce a new risk:
frequently querying the BSG service might consume intolerable amounts
of computation and storage on the server side. For example, if an
ALTO
8. IANA Considerations

Future versions of this document may register new entries to the ALTO
Cost Metric Registry, the ALTO Cost Mode Registry, the ALTO Domain
Entity Type Registry and the ALTO Entity Property Type Registry.

9. References

9.1. Normative References
[I-D.ietf-alto-path-vector]
Gao, K., Lee, Y., Randriamasy, S., Yang, Y. R., and J. J.
Zhang, "An ALTO Extension: Path Vector", Work in Progress,
Internet-Draft, draft-ietf-alto-path-vector-25, 20 March
2022, <https://datatracker.ietf.org/doc/html/draft-ietf-
alto-path-vector-25>.
[I-D.ietf-alto-performance-metrics]
Wu, Q., Yang, Y. R., Lee, Y., Dhody, D., Randriamasy, S.,
and L. M. C. Murillo, "ALTO Performance Cost Metrics",
Work in Progress, Internet-Draft, draft-ietf-alto-
performance-metrics-28, 21 March 2022,
<https://datatracker.ietf.org/doc/html/draft-ietf-alto-
performance-metrics-28>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/rfc/rfc2119>.

[RFC7285] Alimi, R., Ed., Penno, R., Ed., Yang, Y., Ed., Kiesel, S.,
Previdi, S., Roome, W., Shalunov, S., and R. Woundy,
"Application-Layer Traffic Optimization (ALTO) Protocol",
RFC 7285, DOI 10.17487/RFC7285, September 2014,
<https://www.rfc-editor.org/rfc/rfc7285>.
[BE-ONL] "Bandwidth Estimation on OpenNetLab", IETF Plenary 112,
IETF ALTO WG, 2021,
<https://datatracker.ietf.org/meeting/112/materials/
slides-112-alto-bandwidth-estimation-service-00>.

[BE-SIGCOMM]
Kumar, A., et al., "BwE: Flexible, Hierarchical Bandwidth
Allocation for WAN Distributed Computing", ACM SIGCOMM, 2015.
[FLEXFLOW-STFORD]
Jia, Z., et al., "Beyond Data And Model Parallelism For Deep
Neural Networks", n.d., <https://arxiv.org/pdf/1807.05358.pdf>.
[FLOWDIR] Pujol, E., Poese, I., Zerwas, J., Smaragdakis, G., and A.
Feldmann, "Steering Hyper-Giants' Traffic at Scale", ACM
CoNEXT, 2019.

[G2-SC] Amsel, N., Ros-Giralt, J., Yellamraju, S., von Hoffe, B.,
and R. Lethin, "Computing Bottleneck Structures at Scale
for High-Precision Network Performance Analysis", IEEE
International Workshop on Innovating the Network for Data
Intensive Science (INDIS), Supercomputing, 2020.
Networking (TON) Vol 27, no. 2 (2019): 805-818, 2019,
<https://doi.org/10.1109/IWQoS.2017.7969117>.

[PETERSON] Peterson, L. and O. Sunay, "5G Mobile Networks: A Systems
Approach", Open Networking Foundation, 2020.

[SH-SIGCOMM]
Singh, R., et al., "Cost-effective capacity provisioning in
wide area networks with Shoofly", ACM SIGCOMM, 2021.
[SINGULARITY-MSFT]
Shukla, D., et al., "Singularity: Planet-Scale, Preemptive
and Elastic Scheduling of AI Workloads", n.d.,
<https://arxiv.org/pdf/2202.07848.pdf>.

[TOPOOPT-MIT]
Wang, W., et al., "TOPOOPT: Optimizing the Network Topology
for Distributed DNN Training", n.d.,
<https://arxiv.org/pdf/2202.00433.pdf>.
Authors' Addresses

Jordi Ros-Giralt
Qualcomm
Email: jros@qti.qualcomm.com

Sruthi Yellamraju
Qualcomm
Email: yellamra@qti.qualcomm.com

Qin Wu
Huawei
Email: bill.wu@huawei.com

Luis Miguel Contreras
Telefonica