COINRG                                                       D. Kutscher
Internet-Draft              University of Applied Sciences Emden/Leer
Intended status: Experimental                           T. Kaerkkaeinen
Expires: January 9, 2020                                         J. Ott
                                          Technical University Muenchen
                                                           July 08, 2019

              Directions for Computing in the Network
                     draft-kutscher-coinrg-dir-00

Abstract

   In-network computing can be conceived in many different ways - from
   active networking, data plane programmability, running virtualized
   functions, and service chaining, to distributed computing.
   This memo proposes a particular direction for Computing in the
   Network (COIN) research and lists suggested research challenges.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 9, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Computing in the Network vs Networked Computing vs Packet
       Processing
     3.1.  Networked Computing
     3.2.  Packet Processing
     3.3.
           Computing in the Network
     3.4.  Elements for Computing in the Network
   4.  Research Challenges
     4.1.  Categorization of Different Use Cases for Computing in
           the Network
     4.2.  Networking and Remote-Method-Invocation Abstractions
     4.3.  Transport Abstractions
     4.4.  Programming Abstractions
     4.5.  Security, Privacy, Trust Model
     4.6.  Failure Handling, Debugging, Management
   5.  Acknowledgements
   6.  Informative References
   Authors' Addresses

1.  Introduction

   Recent advances in platform virtualization, link layer technologies,
   and data plane programmability have led to a growing set of use
   cases where computation near users or data-consuming applications is
   needed - for example, for meeting low-latency requirements of
   compute-intensive interactive applications (networked Augmented
   Reality, AR), for addressing privacy sensitivity (avoiding raw data
   copies outside a perimeter by processing data locally), and for
   speeding up distributed computation by placing computation at
   convenient places in a network topology.
   In-network computing has mainly been perceived in five variants so
   far: 1) Active Networking [ACTIVE], adapting the per-hop behavior of
   network elements with respect to packets in flows; 2) Edge Computing
   as an extension of virtual-machine (VM) based platform-as-a-service;
   3) programming the data plane of SDN switches (through powerful
   programmable CPUs and programming abstractions such as P4 [SAPIO]);
   4) application-layer data processing frameworks; and 5) Service
   Function Chaining (SFC).

   Active Networking has not found much deployment in the past due to
   its problematic security properties and complexity.

   Programmable data planes can be used in data centers with uniform
   infrastructure, good control over the infrastructure, and the
   feasibility of centralized control over function placement and
   scheduling.  Due to the still limited, packet-based programmability
   model, most applications today are point solutions that can
   demonstrate benefits for particular optimizations, though often
   without addressing the transport protocol services or data security
   that would be required for most applications running in shared
   infrastructure today.

   Edge Computing (like traditional cloud computing) has a fairly
   coarse-grained (VM-based) computation model and hence typically
   deploys centralized positioning/scheduling through virtual
   infrastructure management (VIM) systems.

   Microservices can be seen as a (light-weight) extension of the cloud
   computing model (application logic in containers and orchestrators
   for resource allocation and other management functions), leveraging
   more light-weight platforms and fine-grained functions.
   Compared to traditional VM-based systems, microservice platforms
   typically employ a "stateless" approach, where the service/
   application state is not tied to the compute platform, thus
   achieving fault tolerance with respect to compute platform/process
   failures.

   Application-layer data processing systems such as Apache Flink
   [FLINK] provide attractive dataflow programming models for event-
   based stream processing and light-weight fault-tolerance mechanisms
   - however, systems such as Flink are not designed for dynamic
   scheduling of compute functions.

   Modern distributed application frameworks such as Ray [RAY], Sparrow
   [SPARROW] or Canary [CANARY] are more flexible in this regard - but
   since they are conceived as application-layer frameworks, their
   scheduling logic can only operate with coarse-granular cost
   information.  For example, application-layer frameworks can, in
   general, only infer network performance, anomalies, and optimization
   potential indirectly (through observed performance or failures), so
   most scheduling decisions are based on metrics such as platform
   load.

   Service Function Chaining (SFC, [RFC7665]) is about establishing IP
   tunnels between processing functions that are expected to work on
   packets or flows - for applications such as inspection and
   classification - not for general Computing in the Network purposes.

2.  Terminology

   We are using the following terms in this memo:

   Program:  a set of computations requested by a user

   Program Instance:  one currently executing instance of a program

   Function:  a specific computation that can be invoked as part of a
      program

   Execution Platform:  a specific host platform that can run function
      code

   Execution Environment:  a class of target environments (execution
      platforms) for function execution, for example, a JVM-based
      execution environment that can run functions represented in JVM
      byte code

3.
    Computing in the Network vs Networked Computing vs Packet
    Processing

   Many applications that might intuitively be characterized as
   "computing in the network" are actually either about connecting
   compute nodes/processes or about IP packet processing in fairly
   traditional ways.

   Here, we try to contrast these existing and wildly successful
   systems (which probably do not require new research) with a more
   novel "computing in the network" (COIN) approach that revisits the
   function split between computing and networking.

3.1.  Networked Computing

   Networked Computing exists in various facets today (as described in
   the Introduction).  Fundamentally, these systems use networking to
   connect compute instances - be they VMs, containers, processes or
   other forms of distributed computing instances.

   There are established frameworks for connecting these instances,
   from general-purpose Remote Method/Procedure Invocation to system-
   specific application-layer protocols.  As such, these systems are
   not actually realizing "computing in the network" - they are just
   using the network (and taking connectivity for granted).

   Most of the challenges here are related to compute resource
   allocation, i.e., orchestration methods for instantiating the right
   compute instance on a corresponding platform - for achieving fault
   tolerance, performance optimization and cost reduction.

   Examples of successful applications of networked computing are
   typical overlay systems such as CDNs.  As overlays, they do not need
   to be "in the network" - they are effectively applications.  (Note:
   we sometimes refer to a CDN as an "in-network" service because of
   the mental model of HTTP requests being directed and potentially
   forwarded by CDN systems.  However, none of this happens "in the
   network" - it is just a successful application of HTTP and
   underlying transport protocols.)

3.2.
  Packet Processing

   Packet processing is a function "in the network" - in the sense that
   middleboxes reside in the network as transparent functions that
   apply processing (inspection, classification, filtering, load
   management, etc.) - mostly _transparent_ to endpoints.  Some
   middlebox functions (TCP split proxies, video optimizers) are more
   invasive in the sense that they not only operate on IP flows but
   also try to impersonate transport endpoints (or interfere with their
   behavior).

   Since these systems can have severe impacts on service availability,
   security/privacy, and performance, they are typically not very
   _programmable_.

   Active Networking can be characterized as an attempt to offer
   abstractions for programmable packet processing from an "endpoint
   perspective", i.e., by using data packets to specify intended
   behavior in the network - with the aforementioned security problems.

   Programmable data plane approaches such as P4 provide abstractions
   of different types of network switch hardware (NPUs, CPUs, FPGAs,
   PISA) from a switch/network programming perspective.  Corresponding
   programs are constrained by the capabilities (instruction set,
   memory) of the target platform and typically operate on packet/flow
   abstractions (for example, match-action-style processing).

   Network Functions Virtualization (NFV) is essentially a "Networked
   Computing" approach (after all, Network Functions are just
   virtualized compute functions that get instantiated on compute
   platforms by an orchestrator).  However, some VNFs happen to
   process/forward packets (e.g., gateways in provider networks, NATs
   or firewalls).  Still, that does not affect their fundamental
   properties as virtualized computing functions.

3.3.
  Computing in the Network

   In some deployments, networked computing and packet processing go
   well together, for example when network virtualization (multiplexing
   physical infrastructure for multiple isolated subnetworks) is
   achieved through data-plane programming (SDN-style) to provide
   connectivity for the VMs of a tenant system.

   While such deployments include both computing and networking, they
   are not really doing computing _in the network_.  VMs/containers are
   virtualized hosts/processes using the existing network, and packet
   processing/programmable networking is about packet-level
   manipulation.  While it is possible to implement certain
   optimizations (for example, processing logic for data aggregation),
   the applicability is limited, especially for applications where
   application-data units do not map to packets and where additional
   transport protocol and security requirements have to be considered.

   Distributed Computing (stream processing, edge computing), on the
   other hand, is an area where many application-layer frameworks exist
   that actually _could_ benefit from a better integration of computing
   and networking, i.e., from a new "computing in the network"
   approach.

   For example, when running a distributed application that requires
   dynamic function/process instantiation, traditional frameworks
   typically deploy an orchestrator that keeps track of available host
   platforms and assigned functions/processes.  The orchestrator
   typically has good visibility of the availability of, and current
   load on, host platforms, so it can pick suitable candidates for
   instantiating a new function.

   However, it is typically agnostic of the network itself - as
   application-layer overlays, the function instances and orchestrators
   take the network as a given, assuming full connectivity between all
   hosts and functions.
   While some optimizations may still be feasible (for example, co-
   locating interacting functions/processes on a single host platform),
   these systems cannot easily reason about

   o  shortest paths between function instances;

   o  function off-loading opportunities on topologically convenient
      next-hops; and

   o  availability of new, not yet utilized resources in the network.

   While it is possible to perform optimizations like these in
   application-layer overlays, it involves significant monitoring
   effort and would often duplicate information (topology, latency)
   that is readily available inside the network.  In addition to the
   associated overhead, such systems also operate at different time
   scales, so that direct reaction in fine-grained computing
   environments is difficult to achieve.

   When asking how the network can support distributed computing
   better, it may be helpful to characterize this problem as a resource
   allocation optimization problem: Can we integrate computing and
   networking in a way that enables a joint optimization of computing
   and networking resource usage?  Can we apply this approach to
   achieve certain optimization goals such as:

   o  low latency for certain function calls or compute threads;

   o  high throughput for a pipeline of data processing functions;

   o  high availability for an overall application/service;

   o  load management (balancing, concentration) according to
      performance/cost constraints; and

   o  consideration of security/privacy constraints with respect to
      platform selection and function execution?

   o  Also: can we do this at the speed of network dynamics, which may
      be substantially higher than the rate at which distributed
      computing applications change?
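   As a rough illustration of what such a joint optimization could look
   like, the following sketch picks an offload target by combining a
   network metric (path RTT) with a compute metric (platform load).
   All names, metrics, and weights here are illustrative assumptions,
   not part of any existing protocol or framework:

```python
# Hypothetical sketch: a joint compute/network "next-hop" decision that
# weighs network path latency against platform load. The cost model and
# weights are illustrative assumptions only.

def pick_offload_target(candidates, latency_weight=0.5, load_weight=0.5):
    """Return the candidate with the lowest combined cost.

    candidates: list of dicts with 'name', 'rtt_ms' (network path
    latency) and 'load' (normalized platform load in [0, 1]).
    """
    def cost(c):
        # Scale load to roughly the same magnitude as RTT in ms.
        return latency_weight * c["rtt_ms"] + load_weight * 100 * c["load"]
    return min(candidates, key=cost)

targets = [
    {"name": "edge-a", "rtt_ms": 2.0, "load": 0.90},   # close but busy
    {"name": "edge-b", "rtt_ms": 8.0, "load": 0.10},   # farther but idle
    {"name": "cloud",  "rtt_ms": 40.0, "load": 0.05},  # far and idle
]
print(pick_offload_target(targets)["name"])  # -> edge-b
```

   A network-agnostic application-layer scheduler would typically see
   only the load column of this table; the point of the joint view is
   that the network contributes the latency column at its own (faster)
   time scale.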
   Considering computing and networking resources holistically could be
   the key to achieving these optimization goals (without considerable
   overhead through telemetry, management and orchestration systems).
   If we are able to dissolve the layer boundaries between the
   networking domain (typically concerned with routing, forwarding, and
   packet/flow-level load balancing) and the distributed computing
   domain (typically concerned with "processor" allocation, scaling,
   and reaction to failure for functions/processes), we might get a
   handle to achieve a joint resource optimization and enable the
   distributed computing layer to leverage network-provided mechanisms
   directly.

   For example, if distributing information about available/suitable
   compute platforms were a routing function, we might be able to
   obtain and utilize this information in a distributed fashion.  If
   instantiating a new function (or offloading some piece of
   computation) could consider live performance data obtained from an
   in-network forwarding/offloading service (similar to IP packet
   forwarding in traditional IP networks), the "next-hop" decision
   could be based both on network performance and on node
   load/availability.

   Integrating computing and networking in this manner would not rule
   out highly optimized systems leveraging sophisticated orchestrators.
   Instead, it would provide a (possibly somewhat uniform) framework
   that could allow several operating and optimization modes, including
   totally distributed modes, centralized orchestration, or hybrid
   forms, where policies or intents are injected into the distributed
   decision-making layer, i.e., as parameters for resource allocation
   and forwarding decisions.

3.4.
  Elements for Computing in the Network

   In-network computing requires computing resources (CPUs, possibly
   GPUs, memory, ...), physical or virtualized to some extent by a
   suitable platform.  These computing resources may be available in a
   number of places, as partly already discussed above, including:

   o  They may be found on dedicated machines co-located with the
      routing infrastructure, e.g., a set of servers next to each
      router as one may find in access network concentrators.  This
      would come closest to today's principles of edge computing.

   o  They may be integrated with routers or other network operations
      infrastructure and thus be tightly integrated within the same
      physical device.

   o  They may be integrated within switches, similar to the (limited)
      P4 compute capabilities offered today.

   o  They may be located on NICs (in hosts) or line cards (in routers)
      and be able to proactively perform some application functions, in
      the sense of a generalized variant of the "offloading" that
      protocol stacks perform to reduce main CPU load.

   o  They might add novel types of dedicated hardware to execute
      certain functions more efficiently, e.g., GPU nodes for
      (distributed) analytics.

   o  They may also encompass additional resources at the edge of the
      network, such as sensor nodes.  Associated sensors could be
      physical (as in IoT) or logical (as in MIB data about a network
      device).

   o  Even user devices, along the lines of crowd computing or mist
      computing, may contribute compute resources and dynamically
      become part of the network.

   Depending on the type of execution platform, as already alluded to
   above, a suitable execution framework must be put in place: from
   lambda functions to threads to processes or process VMs to
   unikernels to containers to full-blown VMs.
   This should support mutual isolation and, depending on the service
   in question, a set of security features (e.g., authentication,
   trustworthy execution, accountability).  Further, it may be
   desirable to be able to compose the executable units, e.g., by
   chaining lambda functions or allowing unikernels to provide services
   to each other - both within a local execution platform and between
   remote platform instances across the network.

   The code to be executed may be pre-installed (as firmware, as
   microcode, as operating system functions, as libraries, as *aaS
   offerings, among others) or may be dynamically supplied.  While the
   former is governed by the entity operating the execution device or
   supplying it (the vendor), the code to be executed may have
   different origins.  Fundamentally, we can distinguish between two
   cases:

   1.  The code may be "centrally" provisioned, originating from an
       application or other service provider inside the network.  This
       is analogous to CDNs, in which an application provider contracts
       a CDN provider to host content and service logic on its behalf.
       The deployment is usually long-term, even if instantiations of
       the code may vary.  The code thus originates from rather few -
       known - sources.  In this setting, applications only invoke this
       code and pass on their parameters, context, data, etc.

   2.  The code may be "decentrally" provided from a user device or
       other service that requires a certain function or service to be
       carried out.  At the coarse granularity of entire application
       images, this has been explored as "code offloading"; recent
       approaches have moved towards finer granularities of offloading
       (sets of) functions, for which frameworks for smartphones were
       also developed, leading to granularities down to individual
       functions.  In this setting, applications transfer mobile code -
       along with suitable parameters, etc.
       - into the network, where it is executed by suitable execution
       platforms.  This code is naturally expected to be less trusted,
       as it may come from an arbitrary source.

   Obviously, 1 and 2 may be combined, as mobile code may make use of
   other in-network functions and services, allowing for flexible
   application decomposition.  Essentially, in-network computing may
   support everything from full application offloading to decomposing
   an application into small snippets of code (e.g., at class, object,
   or function granularity) that are fully distributed inside the
   network and executed in a distributed fashion according to the
   control flow of the application.  This may lead to iterative or
   recursive calling from application code on the initiating host to
   mobile code to pre-provisioned code.

   Another dimension, beyond where the code comes from, is how tightly
   the code and the data are coupled.  At one extreme, approaches like
   Active Messages combine the data and the code that operates (only)
   on that data into transmission units, while at the other extreme,
   approaches like Network Functions Virtualization are only concerned
   with the instantiation of the code in the network.  The underlying
   architectural question is whether the goal is to enable the network
   to perform computations on the data passing through it, or whether
   the goal is to enable distributed computational processes to be
   built in the network.

   With these different existing and possibly emerging platforms and
   execution environments and different ways to provision functions in
   the network, it does not seem useful to assume any particular
   platform or any particular "mobile code" representation as _the_
   "computing in the network" environment.
   Instead, it seems more promising to reason about properties that are
   relevant with respect to distributed program semantics and about
   protocols/interfaces that would be used to integrate functions on
   heterogeneous platforms into one application context.  We discuss
   these ideas and associated challenges in the following section.

4.  Research Challenges

   Conceiving computing in the network as a joint resource optimization
   problem as described above raises a set of interesting, novel
   research challenges that are particularly relevant from an Internet
   Research perspective.

4.1.  Categorization of Different Use Cases for Computing in the Network

   There are different applications but also different configuration
   classes of Computing in the Network systems.  For example, a data
   processing pipeline might be different from a distributed
   application employing some stateful actor components.  It is
   worthwhile analyzing different typical use cases and identifying
   commonalities (for example, fundamental protocol elements) and
   differences.

4.2.  Networking and Remote-Method-Invocation Abstractions

   In distributed systems, different classes of functions can be
   distinguished, for example:

   1.  Strictly stateless functions that do not keep any context state
       beyond their activation time

   2.  Stateful functions/modules/programs that can be instantiated,
       invoked and eventually destroyed, and that keep state over a
       series of function invocations

   Modern frameworks such as Ray offer a clear separation of stateless
   functions and stateful actors and provide corresponding abstractions
   in their programming environment.  The aforementioned analysis of
   use cases should provide a diverse set of use cases for deriving a
   minimal yet sufficient set of function classes.
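   The distinction between the two function classes can be sketched in
   plain Python, loosely modeled on the task/actor split in frameworks
   like Ray (the function and class names below are illustrative, not
   any framework's actual API):

```python
# Toy sketch of the two function classes: a stateless function and a
# stateful actor. Names are illustrative assumptions.

def resize(image, scale):
    # Stateless function: the result depends only on its arguments, so
    # the network is free to run it on any suitable platform instance
    # (or instantiate it just for one invocation).
    return [round(px * scale) for px in image]

class Counter:
    # Stateful actor: state survives across invocations, so callers
    # must address one specific instance via an actor identifier.
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

print(resize([10, 20], 0.5))  # -> [5, 10], same on every platform
c = Counter()
c.increment()
print(c.increment())          # -> 2, tied to this particular instance
```

   The scheduling consequences differ: the stateless call can be routed
   to the "best" instance hop by hop, while the actor call must be
   forwarded to the one platform holding its state.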
   Beyond this fundamental categorization of functions/actors, there is
   the question of interfaces and protocol mechanisms - as building
   blocks to utilize functions in programs.  For example, stateless
   functions are typically invoked through some Remote Method
   Invocation (RMI) protocol that identifies functions and allows for
   specifying/transferring parameters and function results.  Stateful
   actors could provide class-like interfaces that offer a set of
   functions (some of which might manipulate actor state).

   Another aspect is the identity (and naming) of functions and actors.
   For actors, which are typically used to achieve real-world effects
   or to enable multiple invocations of functions manipulating actor
   state over time, it is obvious that there needs to be a concept of
   specific instances.  Invoking an actor function would then require
   specifying some actor instance identifier.

   Stateless functions may be different: an invoking instance may be
   oblivious to function identity and locus (on an execution platform)
   and might just want to leave it to the network to find the "best"
   instance or locus for a new instantiation.  Some fine-granular
   functions might be instantiated just for one invocation.  On the
   other hand, a function might be tied to a particular execution
   platform, for example a GPU-supported host system.  The naming and
   identity framework must allow for specifying such a function (or at
   least equivalence classes) accordingly.

   Stateful functions may share state within the same program context,
   i.e., across multiple invocations by the same application (as, e.g.,
   holds for web services that preserve context - locally or on the
   client side).  But stateful functions may also hold state across
   applications and possibly across different instantiations of a
   function on different compute nodes.
   This will require data synchronization mechanisms and the
   implementation of suitable data structures to achieve a certain
   degree of consistency.  The targeted degree of consistency may vary
   depending on the function, and so may the mechanisms used to achieve
   the desired consistency.

   Finally, execution platforms will require efficient resource
   management techniques to operate with different types of stateless
   and stateful functions and their associated resources, as well as
   with dynamically instantiated mobile code.  Besides the
   aforementioned location of suitable compute platforms and the
   scheduling (possibly queuing) of functions and function invocations,
   this also includes resource recovery ("garbage collection").

4.3.  Transport Abstractions

   When implementing Computing in the Network and building blocks such
   as function invocation, it seems that IP packet processing is not
   the right abstraction.  First of all, carrying the context for a
   function invocation might require many IP packets - possibly
   something like Application Data Units (ADUs).  But even if such ADUs
   could be fit into network-layer packets, other problems still need
   to be addressed, for example message formats, reliability
   mechanisms, flow and congestion control, etc.

   It could be argued that today's distributed computing overlays solve
   this by using TCP and corresponding application-layer formats (such
   as HTTP) - however, this raises the question of whether a fine-
   granular distributed computing system, aiming to leverage the
   network for certain tasks, is best served by a TCP/IP-based approach
   that entails issues such as:

   o  the need for an additional resolution/mapping system to find IP
      addresses for functions;

   o  possible overhead for establishing TCP connections for fine-
      granular function invocation; and

   o  a mismatch between TCP end-to-end semantics and the intention to
      defer next-hop selection, etc.,
      to the network.

   Moreover, some Computing in the Network applications such as Big
   Data processing (Hadoop-style, etc.) can benefit significantly from
   data-oriented concepts such as:

   o  in-network caching (of data objects that represent function
      parameters or results);

   o  reasoning about the tradeoffs between moving data to functions
      vs. moving code to data assets; and

   o  sharing data (e.g., function results) between sets of consuming
      entities.

   RMI systems such as RICE [RICE] [I-D.kutscher-icnrg-rice] enable
   Remote Method Invocation on top of ICN (data-oriented
   networking/transport).  Research questions include investigating how
   such approaches can be used to design general-purpose distributed
   computing systems.  More specifically, this would involve questions
   such as:

   o  What is the role of network elements in forwarding RMI requests?

   o  What visibility into load, performance and other properties
      should endpoints and the network have to make
      forwarding/offloading decisions?

   o  What is the notion of transport services in this concept, and how
      intertwined is traditional transport with RMI invocation?

   o  What kind of feedback mechanisms would be desirable for
      supporting corresponding transport services?

4.4.  Programming Abstractions

   When creating SDKs and programming environments (as opposed to
   individual point solutions), questions arise such as:

   o  How to use concepts such as stateless functions, actor models and
      RMI in actual programs, i.e., what are minimal/ideal bindings or
      extensions to programming languages so that programmers can take
      advantage of Computing in the Network?

   o  Are there additional, potentially higher-layer abstractions that
      are needed/useful, for example data set synchronization, or data
      types for distributed computing such as CRDTs?
In addition to programming languages, bindings, and data types, there
is the question of execution environments and mobile code
representation.  With the vast number of different platforms (CPUs,
GPUs, FPGAs, etc.), it does not seem useful to assume exactly one
environment.  Instead, interesting applications might actually
benefit from running one particular function on a highly optimized
platform while being agnostic with respect to platforms for other,
less performance-critical functions.  Being able to support a
heterogeneous, evolving set of execution environments brings about
questions such as:

o  How to discover available platforms (and understand their
   properties)?

o  How to specify application needs and map them to available
   platforms?

o  Can a certain function/application service be provided with
   different fidelity levels, e.g., can an application leverage a GPU
   platform if available and fall back to a reduced feature set in
   case such a platform is not available?

In this context, updates and versioning could entail another
dimension of variability for Computing in the Network:

o  How to manage the coexistence of multiple versions of functions
   and services, also for service routing and request forwarding?

o  Is there potential for fallback and version negotiation if needed
   (considering the risk of "bidding down" attacks)?

o  How to retire old versions?

o  How to securely and reliably deal with function updates and
   corresponding maintenance tasks?

4.5.  Security, Privacy, Trust Model

Computing in the Network has interesting security-related challenges,
including:

o  How can a caller trust that a remote function works as expected?
   This entails several questions such as:

   *  How to securely bind "function names" to actual function code?

   *  How to trust the execution platform (in its entirety)?
   *  How to trust the network to forward requests (and result
      messages) reliably and securely?

o  What levels of authentication are needed for callers (assuming
   that not everybody can invoke any function)?

o  How to authenticate and achieve confidentiality for requests,
   their parameters, and result data (especially when considering
   sharing of results)?

Many of these questions are related to other design decisions such
as:

o  What kind of session concept do we assume, i.e., is there a
   concept of a distributed application session that represents a
   trust domain for its members?

o  Where is trust anchored?  Can the system enable decentralized
   operation?

None of these questions are new, but conceiving networking and
computing holistically calls for revisiting distributed systems and
network security - because some established concepts and technologies
(such as transport layer security and the corresponding web PKI) may
not be directly applicable.

4.6.  Failure Handling, Debugging, Management

Distributed computing naturally exhibits different types of failures
and exceptions.  In fine-granular distributed computing, some
failures may be more tolerable (think microservices), i.e., a
platform crash or function abort due to isolated problems could be
handled by simply re-starting/re-running a particular function.
Similarly, "message loss" or incorrect routing information may be
repairable by the system itself (after some time).

When a failure cannot be repaired (or simply tolerated) by the
distributed computing framework, this raises questions such as:

o  What are strategies for retrying vs. aborting function invocation?

o  How to signal exceptions and enable robust responses to failures?

Failure handling and debugging also has a management aspect that
leads to questions such as:

o  What monitoring and instrumentation interfaces are needed?
o  How can we represent, visualize, and understand the (dynamically
   changing) properties of Computing in the Network infrastructure as
   well as of the currently running/instantiated entities?

5.  Acknowledgements

The authors would like to thank Dave Oran, Michal Krol, Spyridon
Mastorakis, Yiannis Psaras, and Eve Schooler for previous fruitful
discussions on Computing in the Network topics.

6.  Informative References

[ACTIVE]   Tennenhouse, D. and D. Wetherall, "Towards an active
           network architecture", ACM SIGCOMM Computer Communication
           Review Vol. 26, pp. 5-17, DOI 10.1145/231699.231701,
           April 1996.

[CANARY]   Qu, H., et al., "Canary - A scheduling architecture for
           high performance cloud computing", 2016.

[FLINK]    Katsifodimos, A. and S. Schelter, "Apache Flink: Stream
           Analytics at Scale", 2016 IEEE International Conference on
           Cloud Engineering Workshop (IC2EW),
           DOI 10.1109/ic2ew.2016.56, April 2016.

[I-D.kutscher-icnrg-rice]
           Krol, M., Habak, K., Oran, D., Kutscher, D., and I.
           Psaras, "Remote Method Invocation in ICN", draft-kutscher-
           icnrg-rice-00 (work in progress), October 2018.

[RAY]      Moritz, P., et al., "Ray - A Distributed Framework for
           Emerging AI Applications", 2018.

[RFC7665]  Halpern, J., Ed. and C. Pignataro, Ed., "Service Function
           Chaining (SFC) Architecture", RFC 7665,
           DOI 10.17487/RFC7665, October 2015.

[SAPIO]    Sapio, A., Abdelaziz, I., Aldilaijan, A., Canini, M., and
           P. Kalnis, "In-Network Computation is a Dumb Idea Whose
           Time Has Come", Proceedings of the 16th ACM Workshop on
           Hot Topics in Networks - HotNets-XVI,
           DOI 10.1145/3152434.3152461, 2017.

[SPARROW]  Ousterhout, K., Wendell, P., Zaharia, M., and I.
           Stoica, "Sparrow", Proceedings of the Twenty-Fourth ACM
           Symposium on Operating Systems Principles - SOSP '13,
           DOI 10.1145/2517349.2522716, 2013.

Authors' Addresses

   Dirk Kutscher
   University of Applied Sciences Emden/Leer
   Constantiaplatz 4
   Emden D-26723
   Germany

   Email: ietf@dkutscher.net

   Teemu Kaerkkaeinen
   Technical University Muenchen
   Boltzmannstrasse 3
   Munich
   Germany

   Email: kaerkkae@in.tum.de

   Joerg Ott
   Technical University Muenchen
   Boltzmannstrasse 3
   Munich
   Germany

   Email: jo@in.tum.de