COIN                                                            I. Kunze
Internet-Draft                                                 K. Wehrle
Intended status: Informational                    RWTH Aachen University
Expires: May 7, 2020                                   November 04, 2019


             Industrial Use Cases for In-Network Computing
                draft-kunze-coin-industrial-use-cases-01

Abstract

   Cyber-physical systems and the Industrial Internet of Things are
   characterized by diverse sets of requirements which can hardly be
   satisfied using standard networking technology. One example is
   latency-critical computations, which are becoming increasingly
   complex and are consequently outsourced to more powerful cloud
   platforms for feasibility reasons.
   The intrinsic physical propagation delay to these remote sites can,
   however, already be too high for the given requirements. The
   challenge is to develop techniques that reconcile the need for
   computing power with these latency requirements. Utilizing
   available computational capabilities within the network can be a
   solution to this challenge, which makes in-network computing
   concepts a promising starting point. This document discusses select
   industrial use cases to demonstrate how in-network computing
   concepts can be applied to the industrial domain and to point out
   essential requirements of industrial applications.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 7, 2020.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include
   Simplified BSD License text as described in Section 4.e of the Trust
   Legal Provisions and are provided without warranty as described in
   the Simplified BSD License.

Table of Contents

   1. Introduction
   2. In-Network Control / Time-sensitive applications
     2.1. Characterization and Requirements
       2.1.1. Approaches
   3. Large Volume Applications/ Traffic Filtering
     3.1. Characterization and Requirements
     3.2. Approaches
       3.2.1. Traffic Filters
       3.2.2. In-Network (Pre-)Processing
   4. Industrial Safety (Dead Man's Switch)
     4.1. Characterization and Requirements
       4.1.1. Approaches
   5. Security Considerations
   6. IANA Considerations
   7. Conclusion
   8. Informative References
   Authors' Addresses

1. Introduction

   The Internet is based on a best-effort network that provides limited
   guarantees regarding the timely and successful transmission of
   packets. This design choice is suitable for general Internet-based
   applications, but specialized industrial applications demand a
   number of strict performance guarantees, e.g., regarding real-time
   capabilities, which cannot be provided over regular best-effort
   networks.

   Enhancements to standard Ethernet such as Time-Sensitive Networking
   [TSN] try to meet these requirements on the link layer by statically
   reserving shares of the bandwidth. These concepts are well suited
   for traditional industrial settings where the communication paths
   are confined to the respective factory sites and where the
   communication patterns are well understood. Following the vision of
   the Industrial Internet of Things (IIoT), more and more parts of the
   industrial production domain are interconnected. This increases the
   complexity of the industrial networks, making them more dynamic and
   creating more diverse sets of requirements. Furthermore, process
   control is envisioned to be exercised from remote clouds for
   feasibility reasons, which is why solutions on the link layer alone
   are not sufficient in these scenarios.

   Common components of the IIoT can be divided into three categories
   as illustrated in Figure 1. Following
   [I-D.draft-mcbride-edge-data-discovery-overview-01], EDGE DEVICES,
   such as sensors and actuators, constitute the boundary between the
   physical and the digital world. They communicate the current state
   of the physical world to the digital world by transmitting sensor
   data, or let the digital world interact with or manipulate the
   physical world by executing actions after receiving (simple) control
   information. The processing of the sensor data as well as the
   creation of the control information is done on COMPUTING DEVICES.
   They range from low-powered controllers in close proximity to the
   EDGE DEVICES to more powerful edge or remote clouds at greater
   distances. The connection between the EDGE and COMPUTING DEVICES is
   established by NETWORKING DEVICES. In the industrial domain, they
   range from standard devices, e.g.,
   typical Ethernet switches, which can interconnect all
   Ethernet-capable hosts, to proprietary equipment with proprietary
   protocols which only supports hosts of specific vendors.

   The challenge is to develop concepts which can include off-premise
   entities (such as distant cloud platforms) as well as proprietary
   hosts into the communication and still satisfy the performance
   requirements of modern industrial networks. The in-network computing
   paradigm presents a promising starting point because
   (pre-)processing data within the network can speed up the
   communication, e.g., by reducing the amount of transmitted data and
   thus congestion. Flexibly distributing the computation tasks across
   the network helps to manage dynamic changes. Specifying general
   requirements for the different application scenarios is difficult
   due to the mentioned diversity. In an effort to showcase potential
   requirements for the domain of industrial production, we
   characterize and analyze three distinct scenarios to illustrate how
   in-network computations can be helpful.

   --------
   |Sensor| ------------|              ~~~~~~~~~~~~      ------------
   --------       -------------        { Internet } --- |Remote Cloud|
      .           |Access Point|---    ~~~~~~~~~~~~      ------------
   --------       -------------    |         |
   |Sensor| ----|        |         |         |
   --------     |        |      --------     |
      .         |        |      |Switch| ----------------------
      .         |        |      --------                      |
      .         |        |                   ------------     |
   ----------   |        |----------------- | Controller |    |
   |Actuator| ------------                   ------------     |
   ----------   |    --------                            ------------
      .         |----|Switch|---------------------------| Edge Cloud |
   ----------        --------                            ------------
   |Actuator| ---------|
   ----------

   |-----------|      |------------------|      |-------------------|
    EDGE DEVICES       NETWORKING DEVICES         COMPUTING DEVICES

     Figure 1: Industrial networks show a high level of heterogeneity.

2. In-Network Control / Time-sensitive applications

   The control of physical processes and components of a production
   line is a cornerstone of the industrial domain. It is essential for
   the growing automation of production and ideally allows for a
   consistent quality level. Traditionally, the control has been
   exercised by control software running on programmable logic
   controllers (PLCs) located directly next to the controlled process
   or component. This approach is best suited for settings with a
   simple model that is focused on a single or a few controlled
   components.

   Modern production lines and shop floors are characterized by an
   increasing number of involved devices and sensors, a growing level
   of dependency between the different components, and more complex
   control models. Centralized control is desirable to manage the large
   amount of available information, which often has to be pre-processed
   or aggregated with other information before it can be used. PLCs are
   not designed for this array of tasks, so computations could
   theoretically be moved to more powerful devices. These devices,
   however, are no longer in close proximity to the controlled objects
   and induce additional latency.

   It is worthwhile to investigate whether outsourcing control
   functionality to distant computation platforms is viable, because
   these platforms offer a high level of flexibility and scalability.
   In the following, we describe the requirements and characteristics
   of the control setting in more detail.

2.1. Characterization and Requirements

   A control process consists of two main components, as illustrated
   in Figure 2: a system under control and a controller.
   In feedback control, the current state of the system is monitored,
   e.g., using sensors, and the controller influences the system based
   on the difference between the current and the reference state to
   keep it close to this reference state.

   Apart from the control model, the quality of the control primarily
   depends on the timely reception of the sensor feedback, because the
   controller can only react if it is notified about changes in the
   system state. Depending on the dynamics of the controlled system,
   the control can be subject to tight latency constraints, often in
   the single-digit millisecond range. While low latencies are
   important, there is an even greater need for stable and
   deterministic levels of latency, because controllers can generally
   cope with different levels of latency if they are designed for them,
   but they are significantly challenged by dynamically changing or
   unstable latencies. This is especially true if off-premise cloud
   platforms are included due to the unpredictable latency of the
   Internet.

   The main requirements for the industrial control scenario are low
   and stable latencies to ensure that processes can work continuously
   and that no machines are damaged.

   reference
     state      ------------        --------    Output
   ----------> | Controller | ---> | System | ---------->
              ^ ------------        --------      |
              |                                   |
              |   observed state                  |
              |                   ---------       |
              -------------------| Sensors | <-----
                                  ---------

                Figure 2: Simple feedback control model

2.1.1. Approaches

   Control models in general can become complex, but there is a variety
   of control algorithms that are composed of simple computations such
   as matrix multiplication. As these are supported by programmable
   network devices, it is possible to compose simplified approximations
   of the more complex algorithms and deploy them in the network.
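
   As a minimal sketch of such a simplified computation, the following
   Python snippet evaluates one step of a linear state-feedback law,
   u = -K (x - x_ref), as a matrix-vector product that uses integer
   arithmetic only. The fixed-point scaling (SCALE_BITS), the gain
   values, and all function names are illustrative assumptions; they
   are not taken from [RUETH] or from any particular networking device.

```python
from typing import List

# Assumed fixed-point format: a real value v is stored as round(v * 2**SCALE_BITS).
SCALE_BITS = 8


def to_fixed(value: float) -> int:
    """Convert a real number to the assumed fixed-point integer format."""
    return int(round(value * (1 << SCALE_BITS)))


def from_fixed(value: int) -> float:
    """Convert a fixed-point integer back to a real number (for inspection only)."""
    return value / (1 << SCALE_BITS)


def control_step(gain: List[List[int]], state: List[int],
                 reference: List[int]) -> List[int]:
    """Compute u = -K (x - x_ref) using integer operations only.

    All arguments and the result are fixed-point integers.
    """
    error = [x - r for x, r in zip(state, reference)]
    output = []
    for row in gain:
        # Each product carries 2 * SCALE_BITS fractional bits.
        acc = sum(k * e for k, e in zip(row, error))
        # Shift back to SCALE_BITS fractional bits (rounds toward minus infinity).
        output.append(-(acc >> SCALE_BITS))
    return output


# Hypothetical single-output controller with gains 0.5 and 0.25.
K = [[to_fixed(0.5), to_fixed(0.25)]]
x = [to_fixed(2.0), to_fixed(4.0)]
x_ref = [to_fixed(1.0), to_fixed(0.0)]
u = control_step(K, x, x_ref)
```

   In this sketch, only additions, multiplications, and bit shifts on
   integers are needed per step; the price is the rounding error
   introduced by the chosen scaling.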
   While the simplified versions yield less accurate control, they
   allow for a quicker response and might be sufficient to operate a
   basic tight control loop while the overall control can still be
   exercised from the cloud. The problem, however, is that networking
   devices typically only allow for integer precision computation while
   floating point precision is needed by most control algorithms. Early
   approaches like [RUETH] have already shown the general applicability
   of such ideas, but there are still many open research questions,
   including but not limited to the following:

   o  How can one derive the simplified versions of the overall
      controller?

      *  How complex can they become?

      *  How can one take the limited computational precision of
         networking devices into account when making them?

   o  How does one distribute the simplified versions in the network?

   o  How does the overall controller interact with the simplified
      versions?

3. Large Volume Applications/ Traffic Filtering

   In the IIoT, processes and machines can be monitored more
   effectively, resulting in more available information. This data can
   be used to deploy machine learning (ML) techniques and consequently
   help to find previously unknown correlations between different
   components of the production, which in turn helps to improve the
   overall production system. Newly gained knowledge can be shared
   between different sites of the same company or even between
   different companies.

   Traditional company infrastructure is neither equipped for the
   management and storage of such large amounts of data nor for the
   computationally expensive training of ML approaches. Similar to the
   considerations in Section 2, off-premise cloud platforms offer
   cost-effective solutions with a high degree of flexibility and
   scalability.
   While the unpredictable latency of the Internet is only a
   subordinate problem for this use case, moving all data to
   off-premise locations primarily poses infrastructural challenges,
   which are presented in more detail in the following.

3.1. Characterization and Requirements

   Processes in the industrial domain are monitored by distributed
   sensors which range from simple binary sensors (e.g., light
   barriers) to complex sensors measuring the system with varying
   degrees of resolution. Sensors can further serve different purposes,
   as some might be used for time-critical process control while others
   are only used as redundant fallback platforms. Overall, there is a
   high level of heterogeneity which makes managing the sensor output a
   challenging task.

   Depending on the deployed sensors and the complexity of the observed
   system, the resulting overall data volume can easily be in the range
   of several Gbit/s [GLEBKE]. Using off-premise clouds for managing
   the data requires uploading or streaming the growing volume of
   sensor data using the companies' Internet access, which is typically
   limited to a few hundred Mbit/s. While large networking companies
   can simply upgrade their infrastructure, most industrial companies
   rely on traditional ISPs for their Internet access. Higher access
   speeds are hence tied to higher costs and, above all, subject to the
   supply of the ISPs and consequently not always available. A major
   challenge is thus to devise methods that can handle such amounts of
   data over limited access links.

   Another aspect is that business data leaving the premises and
   control of the company further comes with security concerns, as
   sensitive information or valuable business secrets might be
   contained in it.
   Typical security measures such as encrypting the data make
   in-network computing techniques hardly applicable, as they typically
   work on unencrypted data. Adding security to in-network computing
   approaches, either by adding functionality for handling encrypted
   data or by devising general security measures, is thus a very
   promising field for research, which we describe in more detail in
   Section 5.

3.2. Approaches

   There are at least two concepts which might be suitable for reducing
   the amount of transmitted data in a meaningful way:

   1.  filtering out redundant or unnecessary data

   2.  aggregating data by applying preprocessing steps within the
       network

   Both concepts require detailed knowledge about the monitoring
   infrastructure at the factories and the purpose of the transmitted
   data.

3.2.1. Traffic Filters

   Sensors are often set up redundantly, i.e., part of the collected
   data might also be redundant. Moreover, they are often hard to
   configure or not configurable at all, which is why their resolution
   or sampling frequency is often higher than required. Consequently,
   it is likely that more data is transmitted than is actually needed
   or desired. A straightforward idea for reducing the amount of data
   is thus to filter out redundant or undesired data before it leaves
   the premises using simple traffic filters that are deployed in the
   on-premise network. There are different ways to approach this. A
   first step would be to simply scale down the available sensor data
   to the data rate that is needed. For example, if a sensor transmits
   with a frequency of 5 kHz, but only 1 kHz is needed by the control
   entity, it might make sense to let only every fifth packet
   containing sensor data pass.
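
   The 5 kHz to 1 kHz example can be sketched as a simple decimating
   filter. The snippet below is only an illustration in Python; the
   function name and the representation of packets as items of an
   iterable are assumptions, and a real filter would run on the
   networking device itself.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def decimate(packets: Iterable[T], keep_every: int = 5) -> Iterator[T]:
    """Forward only every keep_every-th packet, starting with the first one."""
    if keep_every < 1:
        raise ValueError("keep_every must be at least 1")
    for index, packet in enumerate(packets):
        if index % keep_every == 0:
            yield packet


# One second of a hypothetical 5 kHz sensor stream (packets numbered 0..4999),
# reduced to the 1 kHz that the control entity needs.
one_second = range(5000)
forwarded = list(decimate(one_second, keep_every=5))
```

   With keep_every set to 5, one in five packets is forwarded, so the
   5000 packets of one second are reduced to 1000.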
   Alternatively, sensor data might be filtered down to a lower
   frequency while the sensor value is in an uninteresting range, but
   let through with higher resolution once the sensor value range
   becomes interesting. What is important at this point is that
   end-hosts are informed about the filtering so that they can
   distinguish between data loss and data filtered out on purpose.

   In this context, the following research questions can be of
   interest:

   o  How can traffic filters be designed?

   o  How can traffic filters be coordinated and deployed?

   o  How can traffic filters be changed dynamically?

   o  How can traffic filtering be signaled to the end-hosts?

3.2.2. In-Network (Pre-)Processing

   There are manifold computations that can be performed on the sensor
   data in the cloud. Some of them are very complex or need the
   complete sensor data during the computation, but there are also
   simpler operations which can be done on subsets of the overall
   dataset or earlier on the communication path as soon as all data is
   available. One example is finding the maximum of all sensor values,
   which can either be done iteratively on each intermediate hop or at
   the first hop where all data is available.

   Using expert knowledge about the exact computation steps and the
   concrete transmission path of the sensor data, simple computation
   steps can be deployed in the on-premise network to reduce the
   overall data volume and potentially speed up the processing time in
   the cloud.

   Related work has already shown that in-network aggregation can help
   to improve the performance of distributed ML applications [SAPIO].
   Investigating the applicability of stream data processing techniques
   to programmable networking devices is also interesting, because
   sensor data is usually streamed.
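
   The maximum example from this section can be sketched hop by hop as
   follows. This is an illustrative Python model, not device code: the
   function name, the list of hops, and the sensor values are
   assumptions made for the example. Each hop combines the readings it
   sees locally with the partial maximum received from the previous hop
   and forwards a single value.

```python
from typing import Iterable, Optional


def forward_max(local_values: Iterable[float],
                upstream_max: Optional[float] = None) -> Optional[float]:
    """Return the value a hop forwards: the maximum of its local readings
    and the partial maximum received from the previous hop (if any)."""
    candidates = list(local_values)
    if upstream_max is not None:
        candidates.append(upstream_max)
    return max(candidates) if candidates else None


# Hypothetical path with three hops, each seeing some sensor readings.
hops = [[3.1, 7.4], [5.0], [9.2, 2.8]]
partial = None
for readings in hops:
    partial = forward_max(readings, partial)
```

   In this model, every link carries one value instead of all readings
   seen so far, which is the data reduction the text aims at.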
   In this context, the following research questions can be of
   interest:

   o  Which (pre-)processing steps can be deployed in the network?

      *  How complex can they become?

   o  How can applications incorporate the (pre-)processing steps?

   o  How can the programming of the techniques be streamlined?

4. Industrial Safety (Dead Man's Switch)

   Despite increasing automation in production processes, human workers
   are still often necessary. This gives safety measures a high
   priority to ensure that no human life is endangered. In traditional
   factories, the regions of contact between humans and machines are
   well-defined and interactions are simple. Simple safety measures
   like emergency switches at the working positions are enough to
   provide a decent level of safety.

   Modern factories are characterized by increasingly dynamic and
   complex environments with new interaction scenarios between humans
   and robots. Robots can either directly assist humans or perform
   tasks autonomously. The intersection between the human working area
   and the robots grows, and it is harder for human workers to fully
   observe the complete environment.

   Additional safety measures are important to prevent accidents and
   support humans in observing the environment. The increased
   availability of sensor data and the detailed monitoring of the
   factories can help to build additional safety measures if the
   corresponding data is collected early at the correct position.

4.1. Characterization and Requirements

   Industrial safety measures are typically hardware solutions, because
   they have to pass rigorous testing before they are certified and
   deployment-ready. Common measures include safety switches, which
   need to be triggered manually, and light barriers.
   Additionally, the working area can be explicitly divided into
   'contact' and 'safe' areas, indicating when workers have to watch
   out for interactions with machinery.

   These measures are static solutions, potentially relying on special
   hardware, and are challenged by the increased dynamics of modern
   factories where the factory configuration can be changed on demand.
   Software solutions offer higher flexibility as they can dynamically
   take into account new information gathered by the sensor systems.
   Depending on the corresponding occupational safety laws, the
   software has to satisfy very strict requirements which cannot be
   satisfied by regular best-effort networks.

4.1.1. Approaches

   Software-based solutions can take advantage of the large amount of
   available sensor data. Different safety indicators within the
   production hall can be combined within the network so that
   programmable networking devices can give early responses if a
   potential safety breach is detected. A rather simple possibility
   could be to track the positions of human workers and robots.
   Whenever a robot gets too close to a human in a non-working area or
   if a human enters a certain safety zone, robots are stopped to
   prevent injuries. More advanced concepts could also include image
   data or combine arbitrary sensor data.

   In this context, the following research questions can be of
   interest:

   o  How can the software give guaranteed safety over best-effort
      networks?

   o  Which sensor information can be combined and how?

5. Security Considerations

   Current in-network computing approaches typically work on
   unencrypted plain text data, because today's networking devices
   usually do not have crypto capabilities.
   As already mentioned in Section 3.1, this above all poses problems
   when business data, potentially containing business secrets, is
   streamed to remote computing facilities and consequently leaves the
   control of the company. It is thus important to at least establish
   secure communication paths to the remote facilities.

   On the shop floor and within the company, data is mostly
   communicated without any security measures. This makes developing
   initial in-network computing techniques easier, but also has severe
   drawbacks, especially if in-network computing is widely deployed. In
   this setting, data modifications are not only possible, but even
   encouraged. Ensuring the correctness of data thus becomes an issue,
   especially if modifications are cooperatively performed by more than
   one device. Additionally, unintended modifications could also be
   executed. It is thus also important for on-premise communication to
   deploy security or at least authentication functionality.

6. IANA Considerations

   N/A

7. Conclusion

   In-network computing concepts have the potential to improve
   industrial applications. There are at least three scenarios for
   which in-network processing can be beneficial, each having a unique
   set of requirements.

   In the control scenario, tight latency constraints in the
   single-digit millisecond range have to be satisfied despite the use
   of cloud platforms and the corresponding unstable latency of the
   Internet.

   In a second scenario, large amounts of data have to be transmitted
   to cloud platforms for further evaluation. One important task here
   is to reduce the amount of data that needs to be transmitted, as the
   available Internet access speed is most likely insufficient. Apart
   from that, security measures have to be implemented as business data
   is transmitted to the Internet.

   Regarding safety, software-based measures often lack the required
   guarantees and do not withstand the testing for certification.
   In-network processing with its potential for early responses can be
   a solution by combining different sensor outputs early and acting
   quickly.

8. Informative References

   [GLEBKE]   Glebke, R., "A Case for Integrated Data Processing in
              Large-Scale Cyber-Physical Systems", DOI: 10125/60162, in
              HICSS, January 2019.

   [I-D.draft-mcbride-edge-data-discovery-overview-01]
              McBride, M., Kutscher, D., Schooler, E., and C. Bernardos,
              "Overview of Edge Data Discovery", draft-mcbride-edge-
              data-discovery-overview-01 (work in progress), March 2019.

   [RUETH]    Rueth, J., "Towards In-Network Industrial Feedback
              Control", DOI: 10.1145/3229591.3229592, in ACM SIGCOMM
              NetCompute, August 2018.

   [SAPIO]    Sapio, A., "Scaling Distributed Machine Learning with
              In-Network Aggregation", 2019.

   [TSN]      "Time-Sensitive Networking (TSN) Task Group", 2019.

Authors' Addresses

   Ike Kunze
   RWTH Aachen University
   Ahornstr. 55
   Aachen D-50274
   Germany

   Phone: +49-241-80-21422
   Email: kunze@comsys.rwth-aachen.de


   Klaus Wehrle
   RWTH Aachen University
   Ahornstr. 55
   Aachen D-50274
   Germany

   Phone: +49-241-80-21401
   Email: wehrle@comsys.rwth-aachen.de