COINRG                                                       D. Kutscher
Internet-Draft              University of Applied Sciences Emden/Leer
Intended status: Experimental                           T. Kaerkkaeinen
Expires: January 9, 2020                                         J. Ott
                                          Technical University Muenchen
                                                           July 08, 2019

              Directions for Computing in the Network
                     draft-kutscher-coinrg-dir-00

Abstract

   In-network computing can be conceived in many different ways - from
   active networking, data plane programmability, running virtualized
   functions, and service chaining, to distributed computing.
   This memo proposes a particular direction for Computing in the
   Network (COIN) research and lists suggested research challenges.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 9, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Computing in the Network vs Networked Computing vs Packet
       Processing
     3.1.  Networked Computing
     3.2.  Packet Processing
     3.3.
           Computing in the Network
     3.4.  Elements for Computing in the Network
   4.  Research Challenges
     4.1.  Categorization of Different Use Cases for Computing in
           the Network
     4.2.  Networking and Remote-Method-Invocation Abstractions
     4.3.  Transport Abstractions
     4.4.  Programming Abstractions
     4.5.  Security, Privacy, Trust Model
     4.6.  Failure Handling, Debugging, Management
   5.  Acknowledgements
   6.  Informative References
   Authors' Addresses

1.  Introduction

   Recent advances in platform virtualization, link layer technologies,
   and data plane programmability have led to a growing set of use
   cases where computation near users or data-consuming applications is
   needed - for example, for meeting low-latency requirements of
   compute-intensive interactive applications (networked Augmented
   Reality, AR), for addressing privacy sensitivity (avoiding raw data
   copies outside a perimeter by processing data locally), and for
   speeding up distributed computation by placing computation at
   convenient places in a network topology.
   In-network computing has mainly been perceived in five variants so
   far: 1) Active Networking [ACTIVE], adapting the per-hop behavior of
   network elements with respect to packets in flows; 2) Edge Computing
   as an extension of virtual-machine (VM) based platform-as-a-service;
   3) programming the data plane of SDN switches (through powerful
   programmable CPUs and programming abstractions such as P4 [SAPIO]);
   4) application-layer data processing frameworks; and 5) Service
   Function Chaining (SFC).

   Active Networking has not found much deployment in the past due to
   its problematic security properties and complexity.

   Programmable data planes can be used in data centers with uniform
   infrastructure, good control over the infrastructure, and the
   feasibility of centralized control over function placement and
   scheduling.  Due to the still limited, packet-based programmability
   model, most applications today are point solutions that can
   demonstrate benefits for particular optimizations, though often
   without addressing the transport protocol services or data security
   that would be required for most applications running in shared
   infrastructure today.

   Edge Computing (like traditional cloud computing) has a fairly
   coarse-grained (VM-based) computation model and hence typically
   deploys centralized positioning/scheduling through virtual
   infrastructure management (VIM) systems.

   Microservices can be seen as a (light-weight) extension of the cloud
   computing model (application logic in containers and orchestrators
   for resource allocation and other management functions), leveraging
   more light-weight platforms and fine-grained functions.
   Compared to traditional VM-based systems, microservice platforms
   typically employ a "stateless" approach, where the service/
   application state is not tied to the compute platform, thus
   achieving fault tolerance with respect to compute platform/process
   failures.

   Application-layer data processing systems such as Apache Flink
   [FLINK] provide attractive dataflow programming models for event-
   based stream processing and light-weight fault-tolerance mechanisms
   - however, systems such as Flink are not designed for dynamic
   scheduling of compute functions.

   Modern distributed application frameworks such as Ray [RAY], Sparrow
   [SPARROW] or Canary [CANARY] are more flexible in this regard - but
   since they are conceived as application-layer frameworks, their
   scheduling logic can only operate with coarse-granular cost
   information.  For example, application-layer frameworks can, in
   general, only infer network performance, anomalies, and optimization
   potential indirectly (through observed performance or failures), so
   most scheduling decisions are based on metrics such as platform
   load.

   Service Function Chaining (SFC, [RFC7665]) is about establishing IP
   tunnels between processing functions that are expected to work on
   packets or flows - for applications such as inspection and
   classification - not for general Computing in the Network purposes.

2.  Terminology

   We are using the following terms in this memo:

   Program:  a set of computations requested by a user

   Program Instance:  one currently executing instance of a program

   Function:  a specific computation that can be invoked as part of a
      program

   Execution Platform:  a specific host platform that can run function
      code

   Execution Environment:  a class of target environments (execution
      platforms) for function execution, for example, a JVM-based
      execution environment that can run functions represented in JVM
      byte code

3.
    Computing in the Network vs Networked Computing vs Packet
    Processing

   Many applications that might intuitively be characterized as
   "computing in the network" are actually either about connecting
   compute nodes/processes or about IP packet processing in fairly
   traditional ways.

   Here, we try to contrast these existing and wildly successful
   systems (which probably do not require new research) with a more
   novel "computing in the network" (COIN) approach that revisits the
   function split between computing and networking.

3.1.  Networked Computing

   Networked Computing exists in various facets today (as described in
   the Introduction).  Fundamentally, these systems use networking to
   connect compute instances - be they VMs, containers, processes or
   other forms of distributed computing instances.

   There are established frameworks for connecting these instances,
   from general-purpose Remote Method/Procedure Invocation to system-
   specific application-layer protocols.  As such, these systems are
   not actually realizing "computing in the network" - they are just
   using the network (and taking connectivity for granted).

   Most of the challenges here are related to compute resource
   allocation, i.e., orchestration methods for instantiating the right
   compute instance on a corresponding platform - for achieving fault
   tolerance, performance optimization and cost reduction.

   Examples of successful applications of networked computing are
   typical overlay systems such as CDNs.  As overlays, they do not need
   to be "in the network" - they are effectively applications.  (Note:
   we sometimes refer to a CDN as an "in-network" service because of
   the mental model of HTTP requests being directed and potentially
   forwarded by CDN systems.  However, none of this happens "in the
   network" - it is just a successful application of HTTP and
   underlying transport protocols.)

3.2.
  Packet Processing

   Packet processing is a function "in the network" - in the sense that
   middleboxes reside in the network as transparent functions that
   apply processing (inspection, classification, filtering, load
   management, etc.) - mostly _transparent_ to endpoints.  Some
   middlebox functions (TCP split proxies, video optimizers) are more
   invasive in the sense that they not only operate on IP flows but
   also try to impersonate transport endpoints (or interfere with their
   behavior).

   Since these systems can have severe impacts on service availability,
   security/privacy, and performance, they are typically not very
   _programmable_.

   Active Networking can be characterized as an attempt to offer
   abstractions for programmable packet processing from an "endpoint
   perspective", i.e., by using data packets to specify intended
   behavior in the network - with the aforementioned security problems.

   Programmable data plane approaches such as P4 provide abstractions
   of different types of network switch hardware (NPUs, CPUs, FPGAs,
   PISA) from a switch/network programming perspective.  Corresponding
   programs are constrained by the capabilities (instruction set,
   memory) of the target platform and typically operate on packet/flow
   abstractions (for example, match-action-style processing).

   Network Functions Virtualization (NFV) is essentially a "Networked
   Computing" approach (after all, Network Functions are just
   virtualized compute functions that get instantiated on compute
   platforms by an orchestrator).  However, some VNFs happen to
   process/forward packets (e.g., gateways in provider networks, NATs
   or firewalls).  Still, that does not affect their fundamental
   properties as virtualized computing functions.

3.3.
  Computing in the Network

   In some deployments, networked computing and packet processing go
   well together, for example when network virtualization (multiplexing
   physical infrastructure for multiple isolated subnetworks) is
   achieved through data-plane programming (SDN-style) to provide
   connectivity for the VMs of a tenant system.

   While such deployments include both computing and networking, they
   are not really doing computing _in the network_.  VMs/containers are
   virtualized hosts/processes using the existing network, and packet
   processing/programmable networking is about packet-level
   manipulation.  While it is possible to implement certain
   optimizations (for example, processing logic for data aggregation),
   the applicability is limited, especially for applications where
   application-data units do not map to packets and where additional
   transport protocol and security requirements have to be considered.

   Distributed Computing (stream processing, edge computing), on the
   other hand, is an area where many application-layer frameworks exist
   that actually _could_ benefit from a better integration of computing
   and networking, i.e., from a new "computing in the network"
   approach.

   For example, when running a distributed application that requires
   dynamic function/process instantiation, traditional frameworks
   typically deploy an orchestrator that keeps track of available host
   platforms and assigned functions/processes.  The orchestrator
   typically has good visibility of the availability of, and current
   load on, host platforms, so it can pick suitable candidates for
   instantiating a new function.

   However, it is typically agnostic of the network itself - as
   application-layer overlays, the function instances and orchestrators
   take the network as a given, assuming full connectivity between all
   hosts and functions.
   While some optimizations may still be feasible (for example, co-
   locating interacting functions/processes on a single host platform),
   these systems cannot easily reason about

   o  shortest paths between function instances;

   o  function off-loading opportunities on topologically convenient
      next-hops; and

   o  availability of new, not yet utilized resources in the network.

   While it is possible to perform optimizations like these in
   application-layer overlays, it involves significant monitoring
   effort and would often duplicate information (topology, latency)
   that is readily available inside the network.  In addition to the
   associated overhead, such systems also operate at different time
   scales, so that direct reaction in fine-grained computing
   environments is difficult to achieve.

   When asking how the network can support distributed computing
   better, it may be helpful to characterize this problem as a resource
   allocation optimization problem: Can we integrate computing and
   networking in a way that enables a joint optimization of computing
   and networking resource usage?  Can we apply this approach to
   achieve certain optimization goals such as:

   o  low latency for certain function calls or compute threads;

   o  high throughput for a pipeline of data processing functions;

   o  high availability for an overall application/service;

   o  load management (balancing, concentration) according to
      performance/cost constraints; and

   o  consideration of security/privacy constraints with respect to
      platform selection and function execution?

   o  Also: can we do this at the speed of network dynamics, which may
      be substantially higher than the rate at which distributed
      computing applications change?
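   As a rough illustration of what such a joint optimization could look
   like, the following sketch picks an offload target by combining a
   network metric (path RTT) with a compute metric (platform load).
   All names, metrics, and weights here are illustrative assumptions,
   not part of any existing protocol or framework:

```python
# Hypothetical sketch: a joint compute/network "next-hop" decision that
# weighs network path latency against platform load. The cost model and
# weights are illustrative assumptions only.

def pick_offload_target(candidates, latency_weight=0.5, load_weight=0.5):
    """Return the candidate with the lowest combined cost.

    candidates: list of dicts with 'name', 'rtt_ms' (network path
    latency) and 'load' (normalized platform load in [0, 1]).
    """
    def cost(c):
        # Scale load to roughly the same magnitude as RTT in ms.
        return latency_weight * c["rtt_ms"] + load_weight * 100 * c["load"]
    return min(candidates, key=cost)

targets = [
    {"name": "edge-a", "rtt_ms": 2.0, "load": 0.90},   # close but busy
    {"name": "edge-b", "rtt_ms": 8.0, "load": 0.10},   # farther but idle
    {"name": "cloud",  "rtt_ms": 40.0, "load": 0.05},  # far and idle
]
print(pick_offload_target(targets)["name"])  # -> edge-b
```

   A network-agnostic application-layer scheduler would typically see
   only the load column of this table; the point of the joint view is
   that the network contributes the latency column at its own (faster)
   time scale.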
   Considering computing and networking resources holistically could be
   the key to achieving these optimization goals (without considerable
   overhead through telemetry, management and orchestration systems).
   If we are able to dissolve the layer boundaries between the
   networking domain (typically concerned with routing, forwarding, and
   packet/flow-level load balancing) and the distributed computing
   domain (typically concerned with "processor" allocation, scaling,
   and reaction to failure for functions/processes), we might get a
   handle to achieve a joint resource optimization and enable the
   distributed computing layer to leverage network-provided mechanisms
   directly.

   For example, if distributing information about available/suitable
   compute platforms were a routing function, we might be able to
   obtain and utilize this information in a distributed fashion.  If
   instantiating a new function (or offloading some piece of
   computation) could consider live performance data obtained from an
   in-network forwarding/offloading service (similar to IP packet
   forwarding in traditional IP networks), the "next-hop" decision
   could be based both on network performance and on node
   load/availability.

   Integrating computing and networking in this manner would not rule
   out highly optimized systems leveraging sophisticated orchestrators.
   Instead, it would provide a (possibly somewhat uniform) framework
   that could allow several operating and optimization modes, including
   totally distributed modes, centralized orchestration, or hybrid
   forms, where policies or intents are injected into the distributed
   decision-making layer, i.e., as parameters for resource allocation
   and forwarding decisions.

3.4.
  Elements for Computing in the Network

   In-network computing requires computing resources (CPUs, possibly
   GPUs, memory, ...), physical or virtualized to some extent by a
   suitable platform.  These computing resources may be available in a
   number of places, as partly already discussed above, including:

   o  They may be found on dedicated machines co-located with the
      routing infrastructure, e.g., a set of servers next to each
      router as one may find in access network concentrators.  This
      would come closest to today's principles of edge computing.

   o  They may be integrated with routers or other network operations
      infrastructure and thus be tightly integrated within the same
      physical device.

   o  They may be integrated within switches, similar to the (limited)
      P4 compute capabilities offered today.

   o  They may be located on NICs (in hosts) or line cards (in routers)
      and be able to proactively perform some application functions, in
      the sense of a generalized variant of the "offloading" that
      protocol stacks perform to reduce main CPU load.

   o  They might add novel types of dedicated hardware to execute
      certain functions more efficiently, e.g., GPU nodes for
      (distributed) analytics.

   o  They may also encompass additional resources at the edge of the
      network, such as sensor nodes.  Associated sensors could be
      physical (as in IoT) or logical (as in MIB data about a network
      device).

   o  Even user devices, along the lines of crowd computing or mist
      computing, may contribute compute resources and dynamically
      become part of the network.

   Depending on the type of execution platform, as already alluded to
   above, a suitable execution framework must be put in place: from
   lambda functions to threads to processes or process VMs to
   unikernels to containers to full-blown VMs.
   This should support mutual isolation and, depending on the service
   in question, a set of security features (e.g., authentication,
   trustworthy execution, accountability).  Further, it may be
   desirable to be able to compose the executable units, e.g., by
   chaining lambda functions or allowing unikernels to provide services
   to each other - both within a local execution platform and between
   remote platform instances across the network.

   The code to be executed may be pre-installed (as firmware, as
   microcode, as operating system functions, as libraries, as *aaS
   offerings, among others) or may be dynamically supplied.  While the
   former is governed by the entity operating the execution device or
   supplying it (the vendor), the code to be executed may have
   different origins.  Fundamentally, we can distinguish between two
   cases:

   1.  The code may be "centrally" provisioned, originating from an
       application or other service provider inside the network.  This
       is analogous to CDNs, in which an application provider contracts
       a CDN provider to host content and service logic on its behalf.
       The deployment is usually long-term, even if instantiations of
       the code may vary.  The code thus originates from rather few -
       known - sources.  In this setting, applications only invoke this
       code and pass on their parameters, context, data, etc.

   2.  The code may be "decentrally" provided from a user device or
       other service that requires a certain function or service to be
       carried out.  At the coarse granularity of entire application
       images, this has been explored as "code offloading"; recent
       approaches have moved towards finer granularities of offloading
       (sets of) functions, for which frameworks for smartphones were
       also developed, leading to granularities down to individual
       functions.  In this setting, applications transfer mobile code -
       along with suitable parameters, etc.
       - into the network, where it is executed by suitable execution
       platforms.  This code is naturally expected to be less trusted,
       as it may come from an arbitrary source.

   Obviously, 1 and 2 may be combined, as mobile code may make use of
   other in-network functions and services, allowing for flexible
   application decomposition.  Essentially, in-network computing may
   support everything from full application offloading to decomposing
   an application into small snippets of code (e.g., at class, object,
   or function granularity) that are fully distributed inside the
   network and executed in a distributed fashion according to the
   control flow of the application.  This may lead to iterative or
   recursive calling from application code on the initiating host to
   mobile code to pre-provisioned code.

   Another dimension, beyond where the code comes from, is how tightly
   the code and the data are coupled.  At one extreme, approaches like
   Active Messages combine the data and the code that operates (only)
   on that data into transmission units, while at the other extreme,
   approaches like Network Functions Virtualization are only concerned
   with the instantiation of the code in the network.  The underlying
   architectural question is whether the goal is to enable the network
   to perform computations on the data passing through it, or whether
   the goal is to enable distributed computational processes to be
   built in the network.

   With these different existing and possibly emerging platforms and
   execution environments and different ways to provision functions in
   the network, it does not seem useful to assume any particular
   platform or any particular "mobile code" representation as _the_
   "computing in the network" environment.
   Instead, it seems more promising to reason about properties that are
   relevant with respect to distributed program semantics and about
   protocols/interfaces that would be used to integrate functions on
   heterogeneous platforms into one application context.  We discuss
   these ideas and associated challenges in the following section.

4.  Research Challenges

   Conceiving computing in the network as a joint resource optimization
   problem as described above raises a set of interesting, novel
   research challenges that are particularly relevant from an Internet
   Research perspective.

4.1.  Categorization of Different Use Cases for Computing in the Network

   There are different applications but also different configuration
   classes of Computing in the Network systems.  For example, a data
   processing pipeline might be different from a distributed
   application employing some stateful actor components.  It is
   worthwhile analyzing different typical use cases and identifying
   commonalities (for example, fundamental protocol elements) and
   differences.

4.2.  Networking and Remote-Method-Invocation Abstractions

   In distributed systems, different classes of functions can be
   distinguished, for example:

   1.  Strictly stateless functions that do not keep any context state
       beyond their activation time

   2.  Stateful functions/modules/programs that can be instantiated,
       invoked and eventually destroyed, and that keep state over a
       series of function invocations

   Modern frameworks such as Ray offer a clear separation of stateless
   functions and stateful actors and provide corresponding abstractions
   in their programming environment.  The aforementioned analysis of
   use cases should provide a diverse set of use cases for deriving a
   minimal yet sufficient set of function classes.
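   The distinction between the two function classes can be sketched in
   plain Python, loosely modeled on the task/actor split in frameworks
   like Ray (the function and class names below are illustrative, not
   any framework's actual API):

```python
# Toy sketch of the two function classes: a stateless function and a
# stateful actor. Names are illustrative assumptions.

def resize(image, scale):
    # Stateless function: the result depends only on its arguments, so
    # the network is free to run it on any suitable platform instance
    # (or instantiate it just for one invocation).
    return [round(px * scale) for px in image]

class Counter:
    # Stateful actor: state survives across invocations, so callers
    # must address one specific instance via an actor identifier.
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

print(resize([10, 20], 0.5))  # -> [5, 10], same on every platform
c = Counter()
c.increment()
print(c.increment())          # -> 2, tied to this particular instance
```

   The scheduling consequences differ: the stateless call can be routed
   to the "best" instance hop by hop, while the actor call must be
   forwarded to the one platform holding its state.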
   Beyond this fundamental categorization of functions/actors, there is
   the question of interfaces and protocol mechanisms - as building
   blocks to utilize functions in programs.  For example, stateless
   functions are typically invoked through some Remote Method
   Invocation (RMI) protocol that identifies functions and allows for
   specifying/transferring parameters and function results.  Stateful
   actors could provide class-like interfaces that offer a set of
   functions (some of which might manipulate actor state).

   Another aspect is the identity (and naming) of functions and actors.
   For actors, which are typically used to achieve real-world effects
   or to enable multiple invocations of functions manipulating actor
   state over time, it is obvious that there needs to be a concept of
   specific instances.  Invoking an actor function would then require
   specifying some actor instance identifier.

   Stateless functions may be different: an invoking instance may be
   oblivious to function identity and locus (on an execution platform)
   and might just want to leave it to the network to find the "best"
   instance or locus for a new instantiation.  Some fine-granular
   functions might be instantiated just for one invocation.  On the
   other hand, a function might be tied to a particular execution
   platform, for example a GPU-supported host system.  The naming and
   identity framework must allow for specifying such a function (or at
   least equivalence classes) accordingly.

   Stateful functions may share state within the same program context,
   i.e., across multiple invocations by the same application (as, e.g.,
   holds for web services that preserve context - locally or on the
   client side).  But stateful functions may also hold state across
   applications and possibly across different instantiations of a
   function on different compute nodes.
   This will require data synchronization mechanisms and the
   implementation of suitable data structures to achieve a certain
   degree of consistency.  The targeted degree of consistency may vary
   depending on the function, and so may the mechanisms used to achieve
   the desired consistency.

   Finally, execution platforms will require efficient resource
   management techniques to operate with different types of stateless
   and stateful functions and their associated resources, as well as
   with dynamically instantiated mobile code.  Besides the
   aforementioned location of suitable compute platforms and the
   scheduling (possibly queuing) of functions and function invocations,
   this also includes resource recovery ("garbage collection").

4.3.  Transport Abstractions

   When implementing Computing in the Network and building blocks such
   as function invocation, it seems that IP packet processing is not
   the right abstraction.  First of all, carrying the context for a
   function invocation might require many IP packets - possibly
   something like Application Data Units (ADUs).  But even if such ADUs
   could be fit into network-layer packets, other problems still need
   to be addressed, for example message formats, reliability
   mechanisms, flow and congestion control, etc.

   It could be argued that today's distributed computing overlays solve
   this by using TCP and corresponding application-layer formats (such
   as HTTP) - however, this raises the question of whether a fine-
   granular distributed computing system, aiming to leverage the
   network for certain tasks, is best served by a TCP/IP-based approach
   that entails issues such as:

   o  the need for an additional resolution/mapping system to find IP
      addresses for functions;

   o  possible overhead for establishing TCP connections for fine-
      granular function invocation; and

   o  a mismatch between TCP end-to-end semantics and the intention to
      defer next-hop selection, etc.,
      to the network.

   Moreover, some Computing in the Network applications such as Big
   Data processing (Hadoop-style, etc.) can benefit significantly from
   data-oriented concepts such as:

   o  in-network caching (of data objects that represent function
      parameters or results);

   o  reasoning about the tradeoffs between moving data to functions
      vs. moving code to data assets; and

   o  sharing data (e.g., function results) between sets of consuming
      entities.

   RMI systems such as RICE [RICE] [I-D.kutscher-icnrg-rice] enable
   Remote Method Invocation on top of ICN (data-oriented
   networking/transport).  Research questions include investigating how
   such approaches can be used to design general-purpose distributed
   computing systems.  More specifically, this would involve questions
   such as:

   o  What is the role of network elements in forwarding RMI requests?

   o  What visibility into load, performance and other properties
      should endpoints and the network have to make
      forwarding/offloading decisions?

   o  What is the notion of transport services in this concept, and how
      intertwined is traditional transport with RMI invocation?

   o  What kind of feedback mechanisms would be desirable for
      supporting corresponding transport services?

4.4.  Programming Abstractions

   When creating SDKs and programming environments (as opposed to
   individual point solutions), questions arise such as:

   o  How to use concepts such as stateless functions, actor models and
      RMI in actual programs, i.e., what are minimal/ideal bindings or
      extensions to programming languages so that programmers can take
      advantage of Computing in the Network?

   o  Are there additional, potentially higher-layer abstractions that
      are needed/useful, for example data set synchronization, or data
      types for distributed computing such as CRDTs?
In addition to programming languages, bindings, and data types, there
is the question of execution environments and mobile code
representation.  With the vast number of different platforms (CPUs,
GPUs, FPGAs, etc.), it does not seem useful to assume exactly one
environment.  Instead, interesting applications might actually
benefit from running one particular function on a highly optimized
platform while being agnostic with respect to platforms for other,
less performance-critical functions.  Being able to support a
heterogeneous, evolving set of execution environments brings about
questions such as:

o  How to discover available platforms (and understand their
   properties)?

o  How to specify application needs and map them to available
   platforms?

o  Can a certain function/application service be provided with
   different fidelity levels, e.g., can an application leverage a GPU
   platform if available and fall back to a reduced feature set in
   case such a platform is not available?

In this context, updates and versioning could entail another
dimension of variability for Computing in the Network:

o  How to manage the coexistence of multiple versions of functions
   and services, also for service routing and request forwarding?

o  Is there potential for fallback and version negotiation if needed
   (considering the risk of "bidding down" attacks)?

o  How to retire old versions?

o  How to securely and reliably deal with function updates and
   corresponding maintenance tasks?

4.5.  Security, Privacy, Trust Model

Computing in the Network has interesting security-related challenges,
including:

o  How can a caller trust that a remote function works as expected?
   This entails several questions such as:

   *  How to securely bind "function names" to actual function code?

   *  How to trust the execution platform (in its entirety)?
   *  How to trust the network to forward requests (and result
      messages) reliably and securely?

o  What levels of authentication are needed for callers (assuming
   that not everybody can invoke any function)?

o  How to authenticate and achieve confidentiality for requests,
   their parameters, and result data (especially when considering
   sharing of results)?

Many of these questions are related to other design decisions such
as:

o  What kind of session concept do we assume, i.e., is there a
   concept of a distributed application session that represents a
   trust domain for its members?

o  Where is trust anchored?  Can the system enable decentralized
   operation?

None of these questions are new, but conceiving networking and
computing holistically calls for revisiting distributed systems and
network security - because some established concepts and technologies
(such as transport layer security and the corresponding web PKI) may
not be directly applicable.

4.6.  Failure Handling, Debugging, Management

Distributed computing naturally exhibits different types of failures
and exceptions.  In fine-granular distributed computing, some
failures may be more tolerable (think microservices), i.e., a
platform crash or function abort due to isolated problems could be
handled by simply re-starting/re-running a particular function.
Similarly, "message loss" or incorrect routing information may be
repairable by the system itself (after some time).

When a failure cannot be repaired (or simply tolerated) by the
distributed computing framework, this raises questions such as:

o  What are strategies for retrying vs. aborting function invocation?

o  How to signal exceptions and enable robust responses to failures?

Failure handling and debugging also has a management aspect that
leads to questions such as:

o  What monitoring and instrumentation interfaces are needed?
o  How can we represent, visualize, and understand the (dynamically
   changing) properties of Computing in the Network infrastructure as
   well as of the currently running/instantiated entities?

5.  Acknowledgements

The authors would like to thank Dave Oran, Michal Krol, Spyridon
Mastorakis, Yiannis Psaras, and Eve Schooler for previous fruitful
discussions on Computing in the Network topics.

6.  Informative References

[ACTIVE]   Tennenhouse, D. and D. Wetherall, "Towards an active
           network architecture", ACM SIGCOMM Computer Communication
           Review Vol. 26, pp. 5-17, DOI 10.1145/231699.231701,
           April 1996.

[CANARY]   Qu, H., et al., "Canary - A scheduling architecture for
           high performance cloud computing", 2016.

[FLINK]    Katsifodimos, A. and S. Schelter, "Apache Flink: Stream
           Analytics at Scale", 2016 IEEE International Conference on
           Cloud Engineering Workshop (IC2EW),
           DOI 10.1109/ic2ew.2016.56, April 2016.

[I-D.kutscher-icnrg-rice]
           Krol, M., Habak, K., Oran, D., Kutscher, D., and I.
           Psaras, "Remote Method Invocation in ICN", draft-kutscher-
           icnrg-rice-00 (work in progress), October 2018.

[RAY]      Moritz, P., et al., "Ray - A Distributed Framework for
           Emerging AI Applications", 2018.

[RFC7665]  Halpern, J., Ed. and C. Pignataro, Ed., "Service Function
           Chaining (SFC) Architecture", RFC 7665,
           DOI 10.17487/RFC7665, October 2015.

[SAPIO]    Sapio, A., Abdelaziz, I., Aldilaijan, A., Canini, M., and
           P. Kalnis, "In-Network Computation is a Dumb Idea Whose
           Time Has Come", Proceedings of the 16th ACM Workshop on
           Hot Topics in Networks - HotNets-XVI,
           DOI 10.1145/3152434.3152461, 2017.

[SPARROW]  Ousterhout, K., Wendell, P., Zaharia, M., and I.
           Stoica, "Sparrow", Proceedings of the Twenty-Fourth ACM
           Symposium on Operating Systems Principles - SOSP '13,
           DOI 10.1145/2517349.2522716, 2013.

Authors' Addresses

   Dirk Kutscher
   University of Applied Sciences Emden/Leer
   Constantiaplatz 4
   Emden D-26723
   Germany

   Email: ietf@dkutscher.net

   Teemu Kaerkkaeinen
   Technical University Muenchen
   Boltzmannstrasse 3
   Munich
   Germany

   Email: kaerkkae@in.tum.de

   Joerg Ott
   Technical University Muenchen
   Boltzmannstrasse 3
   Munich
   Germany

   Email: jo@in.tum.de