COIN                                                            I. Kunze
Internet-Draft                                                 K. Wehrle
Intended status: Informational                    RWTH Aachen University
Expires: May 7, 2020                                   November 04, 2019

             Industrial Use Cases for In-Network Computing
                draft-kunze-coin-industrial-use-cases-01

Abstract

   Cyber-physical systems and the Industrial Internet of Things are
   characterized by diverse sets of requirements that can hardly be
   satisfied using standard networking technology.  One example is
   latency-critical computations, which are becoming increasingly
   complex and are consequently outsourced to more powerful cloud
   platforms for feasibility reasons.  The intrinsic physical
   propagation delay to these remote sites can, however, already exceed
   the latency budget of such computations.  The challenge is to
   develop techniques that reconcile these conflicting requirements.
   Utilizing the computational capabilities available within the
   network can be a solution to this challenge, which makes in-network
   computing concepts a promising starting point.  This document
   discusses selected industrial use cases to demonstrate how
   in-network computing concepts can be applied to the industrial
   domain and to point out essential requirements of industrial
   applications.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 7, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  In-Network Control / Time-sensitive applications
     2.1.  Characterization and Requirements
       2.1.1.  Approaches
   3.  Large Volume Applications/ Traffic Filtering
     3.1.  Characterization and Requirements
     3.2.  Approaches
       3.2.1.  Traffic Filters
       3.2.2.  In-Network (Pre-)Processing
   4.  Industrial Safety (Dead Man's Switch)
     4.1.  Characterization and Requirements
       4.1.1.  Approaches
   5.  Security Considerations
   6.  IANA Considerations
   7.  Conclusion
   8.  Informative References
   Authors' Addresses

1.  Introduction

   The Internet is based on a best-effort network that provides limited
   guarantees regarding the timely and successful transmission of
   packets.  This design choice is suitable for general Internet-based
   applications, but specialized industrial applications demand a
   number of strict performance guarantees, e.g., regarding real-time
   capabilities, which cannot be provided over regular best-effort
   networks.

   Enhancements to standard Ethernet, such as Time-Sensitive Networking
   [TSN], try to meet these requirements on the link layer by
   statically reserving shares of the bandwidth.  These concepts are
   well-suited for traditional industrial settings where the
   communication paths are confined to the respective factory sites and
   where the communication patterns are well understood.  Following the
   vision of the Industrial Internet of Things (IIoT), more and more
   parts of the industrial production domain are interconnected.  This
   increases the complexity of industrial networks, making them more
   dynamic and creating more diverse sets of requirements.  Furthermore,
   process control is envisioned to be exercised from remote clouds for
   feasibility reasons, which is why solutions on the link layer alone
   are not sufficient in these scenarios.

   Common components of the IIoT can be divided into three categories,
   as illustrated in Figure 1.  Following
   [I-D.draft-mcbride-edge-data-discovery-overview-01], EDGE DEVICES,
   such as sensors and actuators, constitute the boundary between the
   physical and the digital world.  They communicate the current state
   of the physical world to the digital world by transmitting sensor
   data, or they let the digital world interact with or manipulate the
   physical world by executing actions after receiving (simple) control
   information.  The processing of the sensor data as well as the
   creation of the control information is done on COMPUTING DEVICES.
   They range from low-powered controllers in close proximity to the
   EDGE DEVICES to more powerful edge or remote clouds at larger
   distances.  The connection between the EDGE and COMPUTING DEVICES is
   established by NETWORKING DEVICES.  In the industrial domain, they
   range from standard devices, e.g., typical Ethernet switches, which
   can interconnect all Ethernet-capable hosts, to proprietary equipment
   with proprietary protocols, which only supports hosts of specific
   vendors.

   The challenge is to develop concepts that can include off-premise
   entities (such as distant cloud platforms) as well as proprietary
   hosts in the communication and still satisfy the performance
   requirements of modern industrial networks.  The in-network
   computing paradigm presents a promising starting point because
   (pre-)processing data within the network can speed up the
   communication, e.g., by reducing the amount of transmitted data and
   thus congestion.  Flexibly distributing the computation tasks across
   the network helps to manage dynamic changes.  Specifying general
   requirements for the different application scenarios is difficult
   due to the diversity described above.  To showcase potential
   requirements for the domain of industrial production, we
   characterize and analyze three distinct scenarios to illustrate how
   in-network computations can be helpful.

    --------
    |Sensor| ------------|              ~~~~~~~~~~~~      ------------
    --------       -------------        { Internet } --- |Remote Cloud|
       .           |Access Point|---    ~~~~~~~~~~~~      ------------
    --------       -------------   |          |
    |Sensor| ----|        |        |          |
    --------     |        |       --------    |
       .         |        |       |Switch| ----------------------
       .         |        |       --------                       |
       .         |        |                   ------------       |
    ----------   |        |----------------- | Controller |      |
    |Actuator| ------------                   ------------       |
    ----------   |    --------                            ------------
       .         |----|Switch|---------------------------| Edge Cloud |
    ----------        --------                            ------------
    |Actuator|  ---------|
    ----------

   |-----------|       |------------------|     |-------------------|
    EDGE DEVICES        NETWORKING DEVICES        COMPUTING DEVICES
     Figure 1: Industrial networks show a high level of heterogeneity.

2.  In-Network Control / Time-sensitive applications

   The control of physical processes and components of a production line
   is a cornerstone of the industrial domain.  It is essential for the
   growing automation of production and ideally allows for a consistent
   quality level.  Traditionally, control has been exercised by control
   software running on programmable logic controllers (PLCs) located
   directly next to the controlled process or component.  This approach
   is best suited for settings with a simple model that is focused on a
   single or a few controlled components.

   Modern production lines and shop floors are characterized by an
   increasing number of involved devices and sensors, a growing level of
   dependency between the different components, and more complex control
   models.  Centralized control is desirable to manage the large amount
   of available information, which often has to be pre-processed or
   aggregated with other information before it can be used.  PLCs are
   not designed for this array of tasks, so the computations could
   theoretically be moved to more powerful devices.  However, these
   devices are no longer in close proximity to the controlled objects
   and thus induce additional latency.

   It is worthwhile to investigate whether outsourcing control
   functionality to distant computation platforms is viable, because
   these platforms offer a high level of flexibility and scalability.
   In the following, we describe the requirements and characteristics
   of the control setting in more detail.

2.1.  Characterization and Requirements

   A control process consists of two main components, as illustrated in
   Figure 2: a system under control and a controller.  In feedback
   control, the current state of the system is monitored, e.g., using
   sensors, and the controller influences the system based on the
   difference between the current and the reference state to keep it
   close to this reference state.

   Apart from the control model, the quality of the control primarily
   depends on the timely reception of the sensor feedback, because the
   controller can only react if it is notified about changes in the
   system state.  Depending on the dynamics of the controlled system,
   the control can be subject to tight latency constraints, often in the
   single-digit millisecond range.  While low latencies are important,
   there is an even greater need for stable and deterministic levels of
   latency: controllers can generally cope with a given level of latency
   if they are designed for it, but they are significantly challenged
   by dynamically changing or unstable latencies.  This is especially
   true if off-premise cloud platforms are included, due to the
   unpredictable latency of the Internet.

   The main requirements for the industrial control scenario are low and
   stable latencies to ensure that processes can work continuously and
   that no machines are damaged.

    reference
      state      ------------        --------    Output
   ---------->  | Controller | ---> | System | ---------->
              ^  ------------        --------       |
              |                                     |
              |   observed state                    |
              |                    ---------        |
               -------------------| Sensors | <-----
                                   ---------
            Figure 2: Simple feedback control model
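
   The feedback loop of Figure 2 can be sketched as a small
   discrete-time simulation.  This is a minimal sketch under stated
   assumptions, not a model of any real process: the system is assumed
   to be a first-order process, the controller a textbook
   proportional-integral (PI) controller, and all names and values
   (simulate_loop, kp, ki, a, b, delay_steps) are hypothetical.  The
   delay_steps parameter makes the controller observe a stale state,
   which offers one simple way to explore the feedback latency
   discussed above.

```python
from collections import deque
from typing import List


def simulate_loop(reference: float, steps: int, delay_steps: int = 0,
                  kp: float = 1.0, ki: float = 0.1,
                  a: float = 0.9, b: float = 0.1) -> List[float]:
    """Simulate the loop of Figure 2 and return the true system state per step.

    Assumed (hypothetical) first-order system: x[k+1] = a*x[k] + b*u[k].
    Assumed PI controller: u[k] = kp*e[k] + ki*sum(e[0..k]),
    where e = reference - observed state.
    The sensors report the state with a constant delay of `delay_steps` steps.
    """
    state = 0.0
    integral = 0.0
    # Sensor readings still in transit, pre-filled with the initial state.
    in_flight = deque([state] * delay_steps)
    history = []
    for _ in range(steps):
        in_flight.append(state)          # sensor samples the current state
        observed = in_flight.popleft()   # controller sees a possibly stale sample
        error = reference - observed     # deviation from the reference state
        integral += error
        control = kp * error + ki * integral
        state = a * state + b * control  # system reacts to the control input
        history.append(state)
    return history
```

   With the default values and no delay, the state settles close to the
   reference within a few hundred steps.  Re-running the sketch with
   larger values of delay_steps is one way to experiment with how late
   feedback affects a controller that was not designed for it.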

2.1.1.  Approaches

   Control models can, in general, become complex, but a variety of
   control algorithms are composed of simple computations such as
   matrix multiplication.  As such computations are supported by
   programmable network devices, it is possible to compose simplified
   approximations of the more complex algorithms and deploy them in the
   network.  While the simplified versions result in less accurate
   control, they allow for a quicker response and might be sufficient to
   operate a basic tight control loop while the overall control is
   still exercised from the cloud.  The problem, however, is that
   networking devices typically only allow for integer-precision
   computation, while most control algorithms need floating-point
   precision.  Early approaches like [RUETH] have already shown the
   general applicability of such ideas, but many open research
   questions remain, including but not limited to the following:

   o  How can one derive the simplified versions of the overall
      controller?

      *  How complex can they become?

      *  How can one take the limited computational precision of
         networking devices into account when deriving them?

   o  How does one distribute the simplified versions in the network?

   o  How does the overall controller interact with the simplified
      versions?
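
   To illustrate the precision problem, the following sketch evaluates
   a matrix-vector product, one of the simple building blocks mentioned
   above, once in floating point and once with integer operations only,
   using a fixed-point representation as an integer-only device might.
   The format with 8 fractional bits, the gain matrix K, the error
   vector e, and all function names are hypothetical choices for
   illustration and are not taken from [RUETH].

```python
from typing import List, Sequence

FRAC_BITS = 8            # hypothetical fixed-point format: 8 fractional bits
SCALE = 1 << FRAC_BITS   # 256


def to_fixed(value: float) -> int:
    """Quantize a real number to the nearest fixed-point integer."""
    return int(round(value * SCALE))


def to_float(value: int) -> float:
    """Convert a fixed-point integer back to a real number."""
    return value / SCALE


def matvec_float(matrix: Sequence[Sequence[float]],
                 vector: Sequence[float]) -> List[float]:
    """Reference computation with floating-point precision."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def matvec_fixed(matrix_q: Sequence[Sequence[int]],
                 vector_q: Sequence[int]) -> List[int]:
    """Integer-only computation.

    Each product of two fixed-point numbers carries 2*FRAC_BITS fractional
    bits, so the accumulated sum is shifted right once to return to the
    original format (the shift rounds towards negative infinity).
    """
    return [sum(m * v for m, v in zip(row, vector_q)) >> FRAC_BITS
            for row in matrix_q]


# Hypothetical controller gain matrix K and control error e.
K = [[0.5, -1.25], [0.75, 0.1]]
e = [0.3, -0.8]

exact = matvec_float(K, e)
K_q = [[to_fixed(k) for k in row] for row in K]
e_q = [to_fixed(x) for x in e]
approx = [to_float(u) for u in matvec_fixed(K_q, e_q)]
```

   With these values, the integer-only result is [1.1484375, 0.140625]
   compared to roughly [1.15, 0.145] in floating point.  Whether
   deviations of this size are tolerable depends on the controlled
   process, which relates directly to the open question of how to
   account for limited precision when deriving simplified controllers.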

3.  Large Volume Applications/ Traffic Filtering

   In the IIoT, processes and machines can be monitored more
   effectively, resulting in more available information.  This data can
   be used to deploy machine learning (ML) techniques and consequently
   help to find previously unknown correlations between different
   components of the production, which in turn helps to improve the
   overall production system.  Newly gained knowledge can be shared
   between different sites of the same company or even between
   different companies.

   Traditional company infrastructure is equipped neither for the
   management and storage of such large amounts of data nor for the
   computationally expensive training of ML approaches.  Similar to the
   considerations in Section 2, off-premise cloud platforms offer
   cost-effective solutions with a high degree of flexibility and
   scalability.  While the unpredictable latency of the Internet is only
   a subordinate problem for this use case, moving all data to
   off-premise locations primarily poses infrastructural challenges,
   which are presented in more detail in the following.

3.1.  Characterization and Requirements

   Processes in the industrial domain are monitored by distributed
   sensors, which range from simple binary sensors (e.g., light
   barriers) to complex sensors measuring the system with varying
   degrees of resolution.  Sensors can further serve different purposes,
   as some might be used for time-critical process control while others
   are only used as redundant fallback platforms.  Overall, there is a
   high level of heterogeneity, which makes managing the sensor output a
   challenging task.

   Depending on the deployed sensors and the complexity of the observed
   system, the resulting overall data volume can easily be in the range
   of several Gbit/s [GLEBKE].  Using off-premise clouds for managing
   the data requires uploading or streaming the growing volume of sensor
   data over the companies' Internet access, which is typically limited
   to a few hundred Mbit/s.  While large networking companies can
   simply upgrade their infrastructure, most industrial companies rely
   on traditional ISPs for their Internet access.  Higher access speeds
   are hence tied to higher costs and, above all, subject to the supply
   of the ISPs and consequently not always available.  A major challenge
   is thus to devise methods that are able to handle such amounts of
   data over limited access links.
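
   The mismatch between sensor data volume and access capacity can be
   made concrete with a back-of-the-envelope calculation.  The sensor
   count, sampling rate, sample size, and uplink capacity below are
   hypothetical example values chosen to fall into the ranges named
   above; protocol header overhead is ignored.

```python
def sensor_data_rate_bps(num_sensors: int, sample_rate_hz: int,
                         bytes_per_sample: int) -> int:
    """Aggregate payload rate of all sensors in bit/s (header overhead ignored)."""
    return num_sensors * sample_rate_hz * bytes_per_sample * 8


# Hypothetical plant: 1000 sensors sampling at 50 kHz with 4-byte samples.
offered = sensor_data_rate_bps(num_sensors=1000, sample_rate_hz=50_000,
                               bytes_per_sample=4)

# Hypothetical Internet access link of 300 Mbit/s.
uplink_bps = 300_000_000

reduction_factor = offered / uplink_bps     # how much the data must shrink
max_forwarded_share = uplink_bps / offered  # share of raw data that fits the link

print(f"offered load: {offered / 1e9:.1f} Gbit/s")
print(f"required reduction: {reduction_factor:.2f}x")
```

   With these assumed values, the sensors offer 1.6 Gbit/s, so the data
   would have to shrink by a factor of more than five before it fits
   the uplink, which motivates the approaches discussed in Section 3.2.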

   Another aspect is that business data leaving the premises and control
   of the company raises security concerns, as sensitive information or
   valuable business secrets might be contained in it.  Typical security
   measures such as encrypting the data make in-network computing
   techniques hardly applicable, as these techniques typically work on
   unencrypted data.  Adding security to in-network computing
   approaches, either by adding functionality for handling encrypted
   data or by devising general security measures, is thus a very
   promising field for research, which we describe in more detail in
   Section 5.

3.2.  Approaches

   There are at least two concepts which might be suitable for reducing
   the amount of transmitted data in a meaningful way:

   1.  filtering out redundant or unnecessary data

   2.  aggregating data by applying preprocessing steps within the
       network

   Both concepts require detailed knowledge about the monitoring
   infrastructure at the factories and the purpose of the transmitted
   data.

3.2.1.  Traffic Filters

   Sensors are often set up redundantly, i.e., part of the collected
   data might also be redundant.  Moreover, they are often hard to
   configure or not configurable at all, which is why their resolution
   or sampling frequency is often higher than required.  Consequently,
   it is likely that more data is transmitted than is actually needed
   or desired.  A trivial idea for reducing the amount of data is thus
   to filter out redundant or undesired data before it leaves the
   premises, using simple traffic filters that are deployed in the
   on-premise network.  There are different ways to tackle this.  A
   first step would be to simply scale down the available sensor data to
   the data rate that is needed.  For example, if a sensor transmits
   with a frequency of 5 kHz, but only 1 kHz is needed by the control
   entity, it might make sense to let only every fifth packet containing
   sensor data pass.  Alternatively, sensor data might be filtered down
   to a lower frequency while the sensor value is in an uninteresting
   range, but let through with higher resolution once the sensor value
   enters an interesting range.  What is important at this point is that
   end-hosts are informed about the filtering so that they can
   distinguish between data loss and data filtered out on purpose.
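
   The two filtering strategies and the signaling requirement can be
   combined in a small sketch.  It is an illustration under
   assumptions, not a proposed protocol: the packet layout, the
   hypothetical filtered_total field used to signal filtering to
   end-hosts, and the class and parameter names are all invented for
   this example.  Inside a configurable "quiet" value range, only every
   keep_every-th packet passes (keep_every=5 turns a 5 kHz stream into
   a 1 kHz stream); outside that range, every packet passes.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SensorPacket:
    seq: int                 # sequence number assigned by the sensor
    value: float             # sensor reading
    filtered_total: int = 0  # hypothetical field: packets dropped on purpose so far


class AdaptiveDownsampler:
    """Forward every `keep_every`-th packet while readings stay in a quiet
    range, and every packet whose reading lies outside that range."""

    def __init__(self, keep_every: int, quiet_low: float, quiet_high: float):
        if keep_every < 1:
            raise ValueError("keep_every must be at least 1")
        self.keep_every = keep_every
        self.quiet_low = quiet_low
        self.quiet_high = quiet_high
        self._position = 0        # number of packets seen so far
        self._filtered_total = 0  # number of packets dropped on purpose so far

    def process(self, pkt: SensorPacket) -> Optional[SensorPacket]:
        quiet = self.quiet_low <= pkt.value <= self.quiet_high
        forward = not quiet or self._position % self.keep_every == 0
        self._position += 1
        if not forward:
            self._filtered_total += 1
            return None
        # Stamp the cumulative filter count so end-hosts can tell filtering from loss.
        return replace(pkt, filtered_total=self._filtered_total)


def lost_between(prev: SensorPacket, cur: SensorPacket) -> int:
    """Packets missing between two received packets that were NOT filtered on purpose."""
    missing = cur.seq - prev.seq - 1
    filtered = cur.filtered_total - prev.filtered_total
    return missing - filtered


# 10 readings in the quiet range, followed by 3 readings in an interesting range.
packets = ([SensorPacket(i, 1.0) for i in range(10)]
           + [SensorPacket(10 + i, 50.0) for i in range(3)])
downsampler = AdaptiveDownsampler(keep_every=5, quiet_low=0.0, quiet_high=10.0)
received = [out for out in map(downsampler.process, packets) if out is not None]
```

   A cumulative counter is used instead of a per-packet "dropped since
   the last forwarded packet" count, so that an end-host can still
   separate filtering from loss when a forwarded packet is itself lost:
   the sequence gap minus the counter difference is the number of
   packets that were actually lost.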

   In this context, the following research questions can be of interest:

   o  How can traffic filters be designed?

   o  How can traffic filters be coordinated and deployed?

   o  How can traffic filters be changed dynamically?

   o  How can traffic filtering be signaled to the end-hosts?

3.2.2.  In-Network (Pre-)Processing

   There are manifold computations that can be performed on the sensor
   data in the cloud.  Some of them are very complex or need the
   complete sensor data during the computation, but there are also
   simpler operations which can be done on subsets of the overall
   dataset or earlier on the communication path as soon as all data is
   available.  One example is finding the maximum of all sensor values,
   which can either be done iteratively on each intermediate hop or at
   the first hop where all data is available.
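
   The iterative variant of the maximum example can be sketched as a
   recursion over the forwarding tree: every hop reduces all values it
   receives to a single value before forwarding it upstream.  The Hop
   model, the topology, and the readings below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Hop:
    """A networking device on the path towards the cloud (hypothetical model)."""
    name: str
    sensor_values: List[float] = field(default_factory=list)  # directly attached sensors
    children: List["Hop"] = field(default_factory=list)       # downstream hops


def in_network_max(hop: Hop) -> float:
    """Each hop forwards only the maximum of its own inputs and its children's maxima."""
    partial_maxima = list(hop.sensor_values) + [in_network_max(c) for c in hop.children]
    return max(partial_maxima)  # assumes at least one sensor below every hop


def values_without_aggregation(hop: Hop) -> int:
    """Number of readings leaving `hop` upstream if every reading is forwarded unchanged."""
    return len(hop.sensor_values) + sum(values_without_aggregation(c) for c in hop.children)


# Hypothetical topology: two switches feeding a gateway with one attached sensor.
switch_a = Hop("switch-a", [3.2, 7.9, 1.0])
switch_b = Hop("switch-b", [4.4, 9.1])
gateway = Hop("gateway", [2.0], [switch_a, switch_b])
```

   In this example, the gateway forwards a single value (9.1) towards
   the cloud instead of six raw readings.  Because the maximum is
   associative and commutative, the result does not depend on where in
   the tree the partial maxima are computed, which is what makes this
   operation a good candidate for in-network execution.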

   Using expert knowledge about the exact computation steps and the
   concrete transmission path of the sensor data, simple computation
   steps can be deployed in the on-premise network to reduce the overall
   data volume and potentially speed up the processing time in the
   cloud.

   Related work has already shown that in-network aggregation can help
   to improve the performance of distributed ML applications [SAPIO].
   Investigating the applicability of stream data processing techniques
   to programmable networking devices is also interesting, because
   sensor data is usually streamed.  In this context, the following
   research questions can be of interest:

   o  Which (pre-)processing steps can be deployed in the network?

      *  How complex can they become?

   o  How can applications incorporate the (pre-)processing steps?

   o  How can the programming of the techniques be streamlined?
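
   As one example of a stream processing technique that may fit the
   integer-only arithmetic of programmable networking devices, the
   following sketch summarizes a sample stream in fixed-size,
   non-overlapping (tumbling) windows.  The choice of window type and
   of the (min, max, sum) summary is an assumption for illustration.
   The sum is forwarded instead of the mean so that no division is
   needed in the network; the receiver can divide by the window size
   itself.

```python
from typing import Iterable, Iterator, Tuple


def tumbling_window_summary(samples: Iterable[int],
                            window: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (min, max, sum) for each block of `window` consecutive samples.

    Only comparisons and additions are used, so no floating-point support
    is needed.  Assumes window >= 1; an incomplete trailing window is not
    emitted in this sketch.
    """
    count = 0
    low = high = total = 0
    for sample in samples:
        if count == 0:
            # The first sample of a new window initializes all summary fields.
            low = high = total = sample
        else:
            low = min(low, sample)
            high = max(high, sample)
            total += sample
        count += 1
        if count == window:
            yield low, high, total
            count = 0


# Hypothetical integer sensor readings, summarized in windows of 4 samples.
readings = [5, 1, 9, 2, 4, 4, 6, 8, 7]
summaries = list(tumbling_window_summary(readings, window=4))
```

   Here, nine readings are reduced to two summaries; the incomplete
   trailing window containing the reading 7 is not reported.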

4.  Industrial Safety (Dead Man's Switch)

   Despite increasing automation in production processes, human workers
   are still often necessary.  This gives safety measures a high
   priority to ensure that no human life is endangered.  In traditional
   factories, the regions of contact between humans and machines are
   well-defined and interactions are simple.  Simple safety measures
   like emergency switches at the working positions are enough to
   provide a decent level of safety.

   Modern factories are characterized by increasingly dynamic and
   complex environments with new interaction scenarios between humans
   and robots.  Robots can either directly assist humans or perform
   tasks autonomously.  The overlap between the working areas of humans
   and robots grows, and it is harder for human workers to fully
   observe the complete environment.

   Additional safety measures are important to prevent accidents and
   support humans in observing the environment.  The increased
   availability of sensor data and the detailed monitoring of the
   factories can help to build additional safety measures if the
   corresponding data is collected early at the correct position.

4.1.  Characterization and Requirements

   Industrial safety measures are typically hardware solutions, because
   they have to pass rigorous testing before they are certified and
   deployment-ready.  Common measures include safety switches, which
   need to be triggered manually, and light barriers.  Additionally, the
   working area can be explicitly divided into 'contact' and 'safe'
   areas, indicating when workers have to watch out for interactions
   with machinery.

   These measures are static solutions, potentially relying on special
   hardware, and are challenged by the increased dynamics of modern
   factories where the factory configuration can be changed on demand.
   Software solutions offer higher flexibility, as they can dynamically
   take new information gathered by the sensor systems into account.
   Depending on the corresponding occupational safety laws, the software
   has to satisfy very strict requirements that cannot be met over
   regular best-effort networks.

4.1.1.  Approaches

   Software-based solutions can take advantage of the large amount of
   available sensor data.  Different safety indicators within the
   production hall can be combined within the network so that
   programmable networking devices can give early responses if a
   potential safety breach is detected.  A rather simple possibility
   could be to track the positions of human workers and robots.
   Whenever a robot gets too close to a human in a non-working area, or
   a human enters a certain safety zone, robots are stopped to prevent
   injuries.  More advanced concepts could also include image data or
   combine arbitrary sensor data.
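
   The simple position-based rule described above could be expressed as
   follows.  This sketch covers the decision logic only and assumes
   that positions are already available as 2D coordinates and that
   safety zones are axis-aligned rectangles; it says nothing about how
   such logic would be certified or executed with guaranteed timing,
   which remains one of the open questions of this section.  For
   simplicity, the distance rule is applied everywhere rather than only
   in non-working areas.  All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Position = Tuple[float, float]  # (x, y) in metres on the shop floor (assumed 2D tracking)


@dataclass(frozen=True)
class SafetyZone:
    """Axis-aligned rectangle; robots in `guarded_robots` stop while a human is inside."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    guarded_robots: FrozenSet[str]

    def contains(self, pos: Position) -> bool:
        x, y = pos
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


def robots_to_stop(humans: Dict[str, Position], robots: Dict[str, Position],
                   zones: Iterable[SafetyZone], min_distance: float) -> Set[str]:
    """Return the IDs of all robots that must be stopped."""
    stop: Set[str] = set()
    # Rule 1: a robot is closer than min_distance to any human.
    for robot_id, robot_pos in robots.items():
        if any(math.dist(robot_pos, h) < min_distance for h in humans.values()):
            stop.add(robot_id)
    # Rule 2: a human has entered a safety zone; stop the robots it guards.
    for zone in zones:
        if any(zone.contains(h) for h in humans.values()):
            stop |= zone.guarded_robots
    return stop


humans = {"worker-1": (0.0, 0.0), "worker-2": (10.0, 10.0)}
robots = {"robot-a": (1.0, 0.0), "robot-b": (5.0, 5.0), "robot-c": (20.0, 0.0)}
zones = [SafetyZone(9.0, 9.0, 11.0, 11.0, frozenset({"robot-c"}))]
```

   In this example, robot-a is stopped because it is within 2 m of
   worker-1, and robot-c because worker-2 is inside the zone that
   guards it, while robot-b keeps working.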

   In this context, the following research questions can be of interest:

   o  How can the software give guaranteed safety over best-effort
      networks?

   o  Which sensor information can be combined and how?

5.  Security Considerations

   Current in-network computing approaches typically work on unencrypted
   plain text data, because today's networking devices usually do not
   have crypto capabilities.  As already mentioned in Section 3.1, this
   poses problems above all when business data, potentially containing
   business secrets, is streamed into remote computing facilities and
   consequently leaves the control of the company.  It is thus important
   to at least establish secure communication paths to the remote
   facilities.

   On the shop floor and within the company, data is mostly communicated
   without any security measures.  This makes developing initial
   in-network computing techniques easier, but also has severe
   drawbacks, especially if in-network computing is widely deployed.  In
   this setting, data modifications are not only possible, but even
   encouraged.  Ensuring the correctness of data thus becomes an issue,
   especially if modifications are cooperatively performed by more than
   one device.  Additionally, unintended modifications could also
   occur.  It is thus also important for on-premise communication to
   deploy security or at least authentication functionality.
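
   As a minimal illustration of the authentication functionality
   mentioned above, a sender can attach a message authentication code
   to each reading so that receivers detect unintended modifications.
   The sketch uses HMAC-SHA-256 from Python's standard library; the
   payload layout, the shared key, and the function names are
   hypothetical, and key distribution is out of scope.

```python
import hashlib
import hmac
import struct

# Hypothetical wire format: sensor ID, sequence number, value in milli-units,
# all in network byte order.
READING_FORMAT = "!IIq"


def encode_reading(sensor_id: int, seq: int, value_milli: int) -> bytes:
    return struct.pack(READING_FORMAT, sensor_id, seq, value_milli)


def tag(key: bytes, payload: bytes) -> bytes:
    """Compute an HMAC-SHA-256 tag over the payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(key: bytes, payload: bytes, received_tag: bytes) -> bool:
    """Constant-time comparison of the expected and the received tag."""
    return hmac.compare_digest(tag(key, payload), received_tag)


key = b"hypothetical-shared-key"  # placeholder; real deployments need key management
payload = encode_reading(sensor_id=7, seq=42, value_milli=18_600)
mac = tag(key, payload)
```

   Any in-network device that is meant to modify the data would need
   access to the key to re-tag the modified payload.  With several
   cooperating devices, this raises the question of which device
   vouches for which modification, mirroring the correctness concern
   described above.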

6.  IANA Considerations

   This document has no IANA actions.

7.  Conclusion

   In-network computing concepts have the potential to improve
   industrial applications.  There are at least three scenarios for
   which in-network processing can be beneficial, each having a unique
   set of requirements.

   In the control scenario, tight latency constraints in the
   single-digit millisecond range have to be satisfied despite the use
   of cloud platforms and the associated unstable latency of the
   Internet.

   In a second scenario, large amounts of data have to be transmitted to
   cloud platforms for further evaluation.  One important task here is
   to reduce the amount of data that needs to be transmitted, as the
   available Internet access speed is most likely insufficient.  Apart
   from that, security measures have to be implemented, as business
   data is transmitted over the Internet.

   Regarding safety, software-based measures often lack the required
   guarantees and do not withstand the testing for certification.
   In-network processing, with its potential for early responses, can
   be a solution by combining different sensor outputs early and acting
   quickly.

8.  Informative References

   [GLEBKE]   Glebke, R., "A Case for Integrated Data Processing in
              Large-Scale Cyber-Physical Systems", DOI: 10125/60162, in
              HICSS, January 2019.

   [I-D.draft-mcbride-edge-data-discovery-overview-01]
              McBride, M., Kutscher, D., Schooler, E., and C. Bernardos,
              "Overview of Edge Data Discovery",
              draft-mcbride-edge-data-discovery-overview-01 (work in
              progress), March 2019.

   [RUETH]    Rueth, J., "Towards In-Network Industrial Feedback
              Control", DOI: 10.1145/3229591.3229592, in ACM SIGCOMM
              NetCompute, August 2018.

   [SAPIO]    Sapio, A., "Scaling Distributed Machine Learning with
              In-Network Aggregation", 2019,
              <https://arxiv.org/abs/1903.06701>.

   [TSN]      "Time-Sensitive Networking (TSN) Task Group", 2019,
              <https://1.ieee802.org/tsn/>.

Authors' Addresses

   Ike Kunze
   RWTH Aachen University
   Ahornstr. 55
   Aachen  D-52074
   Germany

   Phone: +49-241-80-21422
   Email: kunze@comsys.rwth-aachen.de

   Klaus Wehrle
   RWTH Aachen University
   Ahornstr. 55
   Aachen  D-52074
   Germany

   Phone: +49-241-80-21401
   Email: wehrle@comsys.rwth-aachen.de