Networking Working Group                                  K. Pister, Ed.
Internet-Draft                                             Dust Networks
Intended status: Informational                           P. Thubert, Ed.
Expires: July 26, 2009                                     Cisco Systems
                                                                S. Dwars
                                                                   Shell
                                                              T. Phinney
                                                        January 22, 2009

     Industrial Routing Requirements in Low Power and Lossy Networks
                  draft-ietf-roll-indus-routing-reqs-04

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups.  Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on July 26, 2009.

Copyright Notice

Copyright (c) 2009 IETF Trust and the persons identified as the document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Abstract

Wireless, low-power field devices enable industrial users to significantly increase the amount of information collected and the number of control points that can be remotely managed.  The deployment of these wireless devices will significantly improve the productivity and safety of the plants while increasing the efficiency of the plant workers by extending the information set available from wired systems.  In an industrial environment, low power, high reliability, and easy installation and maintenance are mandatory qualities for wireless devices.  The aim of this document is to analyze the requirements for the routing protocol used for Low-power and Lossy Networks (LLNs) in industrial environments.

Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

1. Terminology
2. Introduction
   2.1. Applications and Traffic Patterns
   2.2. Network Topology of Industrial Applications
        2.2.1. The Physical Topology
        2.2.2. Logical Topologies
3. Traffic Characteristics
   3.1. Service Parameters
   3.2. Configurable Application Requirement
   3.3. Different Routes for Different Flows
4. Reliability Requirements
5. Device-Aware Routing Requirements
6. Broadcast/Multicast
7. Route Establishment Time
8. Mobility
9. Manageability
10. Security
11. IANA Considerations
12. Acknowledgements
13. References
    13.1. Normative References
    13.2. Informative References
    13.3. External Informative References
Authors' Addresses

1. Terminology

This document employs terminology defined in the ROLL terminology document [I-D.ietf-roll-terminology].  This document also refers to industrial standards:

HART: "Highway Addressable Remote Transducer", a group of specifications for industrial process and control devices administered by the HART Foundation (see [HART]).  The latest version of the specifications is HART7, which includes the additions for WirelessHART.

ISA: "International Society of Automation".
ISA is an ANSI-accredited standards-making society.  ISA100 is an ISA committee whose charter includes defining a family of standards for industrial automation.  [ISA100.11a] is a working group within ISA100 that is working on a standard for monitoring and non-critical process control applications.

2. Introduction

Wireless, low-power field devices enable industrial users to significantly increase the amount of information collected and the number of control points that can be remotely managed.  The deployment of these wireless devices will significantly improve the productivity and safety of the plants while increasing the efficiency of the plant workers.  IPv6 is perceived as a key technology to provide the scalability and interoperability that are required in that space and is increasingly present in standards and in products under development and early deployments.

Cable is perceived as a more proven, safer technology, and existing, operational deployments are very stable over time.  For these reasons, it is not expected that wireless will replace wire in any foreseeable future; the consensus in the industrial space is rather that wireless will tremendously augment the scope and benefits of automation by enabling the control of devices that were not connected in the past for reasons of cost and/or deployment complexity.  But for LLNs to be adopted in the industrial environment, the wireless network needs to have three qualities: low power, high reliability, and easy installation and maintenance.  The routing protocol used for low-power and lossy networks (LLNs) is important to fulfilling these goals.

Industrial automation is segmented into two distinct application spaces, known as "process" or "process control" and "discrete manufacturing" or "factory automation".  In industrial process control, the product is typically a fluid (oil, gas, chemicals, ...).  In factory automation or discrete manufacturing, the products are individual elements (screws, cars, dolls).  While there is some overlap of products and systems between these two segments, they are surprisingly separate communities.  The specifications targeting industrial process control tend to have more tolerance for network latency than what is needed for factory automation.

Irrespective of this different 'process' and 'discrete' plant nature, both plant types will have similar needs for automating the collection of data that used to be collected manually, or was not collected before.  Examples are wireless sensors that report the state of a fuse, report the state of a luminaire, report HVAC status, report vibration levels on pumps, report man-down, and so on.

Other novel application arenas that equally apply to both 'process' and 'discrete' involve mobile sensors that roam in and out of plants, such as active sensor tags on containers or vehicles.

Some if not all of these applications will need to be served by the same low-power and lossy wireless network technology.  This may mean several disconnected, autonomous LLNs connecting to multiple hosts, but sharing the same ether.  Interconnecting such networks, if only to supervise channel and priority allocations, or to fully synchronize, or to share path capacity within a set of physical network components, may be desired, or may not be desired for practical reasons, such as
cyber-security concerns in relation to plant safety and integrity.

All application spaces desire battery-operated networks of hundreds of sensors and actuators communicating with LLN access points.  In an oil refinery, the total number of devices might exceed one million, but the devices will be clustered into smaller networks that in most cases interconnect and report to an existing plant network infrastructure.

Existing wired sensor networks in this space typically use communication protocols with low data rates, from 1,200 baud (e.g., wired HART) to the one-to-two-hundred-kbps range for most of the others.  The existing protocols are often master/slave with command/response.

2.1. Applications and Traffic Patterns

The industrial market classifies process applications into three broad categories and six classes.

o Safety

  * Class 0: Emergency action - Always a critical function

o Control

  * Class 1: Closed-loop regulatory control - Often a critical function

  * Class 2: Closed-loop supervisory control - Usually a non-critical function

  * Class 3: Open-loop control - Operator takes action and controls the actuator (human in the loop)

o Monitoring

  * Class 4: Alerting - Short-term operational effect (for example, event-based maintenance)

  * Class 5: Logging and downloading / uploading - No immediate operational consequence (e.g., history collection, sequence-of-events, preventive maintenance)

Safety-critical functions affect the basic safety integrity of the plant.  These normally dormant functions kick in only when process control systems, or their operators, have failed.  By design and by regular-interval inspection, they have a well-understood probability of failure on demand, typically in the range of once per 10-1000 years.

In-time delivery of messages becomes more relevant as the class number decreases.

Note that for a control application, jitter is just as important as latency and has the potential of destabilizing control algorithms.

Industrial users are interested in deploying wireless networks for the monitoring classes 4 and 5, and in the non-critical portions of classes 3 and 2.

Classes 4 and 5 also include asset monitoring and tracking, which includes equipment monitoring and is essentially separate from process monitoring.  An example of equipment monitoring is the recording of motor vibrations to detect bearing wear.  However, similar sensors detecting excessive vibration levels could be used as safeguarding loops that immediately initiate a trip, and thus end up being class 0.

In the near future, most LLN systems in industrial automation environments will be for low-frequency data collection.  Packets containing samples will be generated continuously, and 90% of the market is covered by packet rates of between 1/s and 1/hour, with the average under 1/min.  In industrial process applications, these sensors include temperature, pressure, fluid flow, tank level, and corrosion.  Some sensors are bursty, such as vibration monitors that may generate and transmit tens of kilobytes (hundreds to thousands of packets) of time-series data at reporting rates of minutes to days.

Almost all of these sensors will have built-in microprocessors that may detect alarm conditions.
Time-critical alarm packets are expected to be granted a lower latency than periodic sensor data streams.

Some devices will transmit a log file every day, again with typically tens of kilobytes of data.  For these applications there is very little "downstream" traffic coming from the LLN access point and traveling to particular sensors.  During diagnostics, however, a technician may be investigating a fault from a control room and expect to have "low" latency (human-tolerable) in a command/response mode.

Low-rate control, often with a "human in the loop" (also referred to as "open loop"), is implemented via communication to a control room, because that is where the human in the loop will be.  The sensor data makes its way through the LLN access point to the centralized controller where it is processed, the operator sees the information and takes action, and the control information is then sent out to the actuator node in the network.

In the future, it is envisioned that some open-loop processes will be automated (closed loop) and packets will flow over local loops and not involve the LLN access point.  These closed-loop controls for non-critical applications will be implemented on LLNs.  Non-critical closed-loop applications have a latency requirement that can be as low as 100 ms, but many control loops are tolerant of latencies above 1 s.

More likely, though, is that loops will be closed in the field entirely, and in such a case, having wireless links within the control loop does not usually add much value.  Most control loops have sensors and actuators within such proximity that a wire between them remains the most sensible option from an economic point of view.  This 'control in the field' architecture is already common practice with wired field busses.  An 'upstream' wireless link would only be used to influence the in-field controller settings, and to occasionally capture diagnostics.  Even though the link back to a control room might be wireless, this architecture reduces the tight latency and availability requirements for the wireless links.

Closing loops in the field:

o does not prevent the same loop from being closed through a remote multi-variable controller during some modes of operation, while being closed directly in the field during other modes of operation (e.g., fallback, or when timing is more critical)

o does not imply that the loop will be closed with a wired connection, or that the wired connection is more energy efficient even when it exists as an alternate to the wireless connection.

A realistic future scenario is for a field device with a battery or ultra-capacitor power storage to have both wireless and unpowered wired communications capability (e.g., galvanically isolated RS-485), where the wireless communication is more flexible and, for local loop operation, more energy efficient, and the wired communication capability serves as a backup interconnect among the loop elements, but without a wired connection back to the operations center blockhouse.  In other words, the loop elements are interconnected through wiring to a nearby junction box, but the 2 km home-run link from the junction box to the control center does not exist.
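
The dual-interconnect arrangement just described, and elaborated in the next paragraph, can be pictured with a short illustrative sketch.  The Python below is not taken from any cited specification; the class, the device names, and the link-quality threshold are invented here purely to make the failover behaviour concrete.

   from __future__ import annotations
   from dataclasses import dataclass

   @dataclass
   class LoopElement:
       """A field device that is part of a local control loop."""
       name: str
       wireless_quality: float   # 0.0 (no link) .. 1.0 (perfect); assumed metric
       rs485_connected: bool     # wired to the junction-box RS-485 segment

   WIRELESS_OK = 0.6             # hypothetical threshold for a "good" wireless link

   def loop_interconnect(element: LoopElement) -> str:
       """Medium used for loop-internal traffic: wireless while the link is
       good, otherwise fall back to the self-powered RS-485 segment."""
       if element.wireless_quality >= WIRELESS_OK:
           return "wireless"
       return "rs485" if element.rs485_connected else "isolated"

   def reporting_router(elements: list[LoopElement]) -> LoopElement | None:
       """One element with good wireless reach reports status and alarms to
       the control center on behalf of the whole loop."""
       candidates = [e for e in elements if e.wireless_quality >= WIRELESS_OK]
       return max(candidates, key=lambda e: e.wireless_quality, default=None)

   loop = [LoopElement("level-sensor", 0.9, True),
           LoopElement("flow-valve", 0.3, True),
           LoopElement("overfill-alarm", 0.7, True)]
   for e in loop:
       print(e.name, "->", loop_interconnect(e))
   print("reports for the loop:", reporting_router(loop).name)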

When wireless communication conditions are good, devices use wireless for the loop interconnect, and either one wireless device reports alarms and other status to the control center for all elements of the loop or each element reports independently.  When wireless communications are sporadic, the loop interconnect uses the self-powered, galvanically isolated RS-485 link, and one of the devices with good wireless communications to the control center serves as a router for those devices which are unable to contact the control center directly.

The above approach is particularly attractive for large storage tanks in tank farms, where devices may not all have good wireless visibility of the control center, and where a home-run cable from the tank to the control center is undesirable due to the electro-potential differences between the tank location and the distant control center that arise during lightning storms.

In fast control, tens of milliseconds of latency is typical.  In many of these systems, if a packet does not arrive within the specified interval, the system enters an emergency shutdown state, often with substantial financial repercussions.  For a one-second control loop in a system with a mean-time-between-shutdowns target of 30 years, the latency requirement implies nine 9s of reliability (30 years is roughly 10^9 one-second loop intervals, so the probability of missing any given interval must stay below about 10^-9).  Given such exposure, given the intrinsic vulnerability of wireless link availability, and given the emergence of control-in-the-field architectures, most users tend not to aim for fast closed-loop control with wireless links within that fast loop.

2.2. Network Topology of Industrial Applications

Although network topology is difficult to generalize, the majority of existing applications can be met by networks of 10 to 200 field devices and a maximum of twenty hops.  It is assumed that the field devices themselves will provide routing capability for the network, and that additional repeaters/routers will not be required in most cases.

For the vast majority of industrial applications, the traffic is mostly composed of real-time publish/subscribe sensor data, also referred to as buffered data, from the field devices over an LLN towards one or more sinks.  Increasingly over time, these sinks will be part of a backbone, but today they are often fragmented and isolated.

The wireless sensor network is an LLN of field devices for which two logical roles are defined: the field routers and the non-routing devices.  It is acceptable and even probable that the distribution of these roles across the field devices changes over time to balance the cost of the forwarding operation amongst the nodes.

In order to scale a control network in terms of density, one possible architecture is to deploy a backbone as a canopy that aggregates multiple smaller LLNs.  The backbone is a high-speed infrastructure network that may interconnect multiple WSNs through backbone routers.  Infrastructure devices can be connected to the backbone.  A gateway / manager that interconnects the backbone to the plant network or the corporate network can be viewed as collapsing the backbone and the infrastructure devices into a single device that operates all the required logical roles.  The backbone is likely to become an option in the industrial network.
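
As a purely illustrative sketch of the logical roles and the backbone 'canopy' described above (and drawn in Figure 1 of Section 2.2.1), the Python below uses invented class and device names; it is not a data model taken from any standard, merely one way to picture how a leaf device reaches the backbone through a field router and a backbone router.

   from __future__ import annotations
   from dataclasses import dataclass, field

   @dataclass
   class BackboneRouter:
       name: str                     # sits on the high-speed backbone

   @dataclass
   class Gateway:
       """Collapses backbone, infrastructure devices, and manager roles."""
       name: str
       backbone_routers: list[BackboneRouter] = field(default_factory=list)

   @dataclass
   class FieldDevice:
       name: str
       is_field_router: bool = False                         # role may change over time
       next_hop: FieldDevice | BackboneRouter | None = None  # toward the backbone

   def hops_to_backbone(dev: FieldDevice) -> int:
       """LLN hops between a field device and the backbone router that
       aggregates its portion of the network."""
       hops, node = 0, dev.next_hop
       while isinstance(node, FieldDevice):
           node, hops = node.next_hop, hops + 1
       if not isinstance(node, BackboneRouter):
           raise ValueError(f"{dev.name} has no path to the backbone")
       return hops + 1

   br = BackboneRouter("backbone-router-1")
   gw = Gateway("plant-gateway", [br])
   router = FieldDevice("pressure-1", is_field_router=True, next_hop=br)
   leaf = FieldDevice("corrosion-7", next_hop=router)
   print(gw.name, "serves", len(gw.backbone_routers), "backbone router(s)")
   print(hops_to_backbone(leaf))     # 2: leaf -> field router -> backbone router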

Typically, such backbones interconnect to the 'legacy' wired plant infrastructure, the plant network, also known as the 'Process Control Domain' (PCD).  These plant automation networks are segregated, domain-wise, from the office network or office domain (OD), which in itself is typically segregated from the Internet.

Sinks for LLN sensor data reside on the plant network (PCD), on the business network (OD), and on the Internet.  Applications close to existing plant automation, such as wired process control and monitoring systems running on fieldbusses, that require high availability and low latencies, and that are managed by 'Control and Automation' departments, typically reside on the PCD.  Other applications, such as automated corrosion monitoring, cathodic protection voltage verification, or machine condition (vibration) monitoring, where one sample per week is considered over-sampling, would more likely deliver their sensor readings into the office domain.  Such applications are 'owned' by, e.g., maintenance departments.

Yet other applications, like third-party-maintained luminaires, or vendor-managed inventory systems where a supplier of chemicals needs access to tank level readings at his customer's site, will be best served with direct Internet connectivity all the way to the sensor at the customer's site.  Temporary 'babysitting' sensors deployed for just a few days, say during startup or troubleshooting, or for ad hoc measurement campaigns for R&D purposes, are other examples where the Internet would be the domain where wireless sensor data should land, and other domains such as office and plant should preferably be circumvented if quick deployment without potentially impacting plant safety integrity is required.

This multiple-domain, multiple-application connectivity creates a significant challenge.  Many different applications will all share the same medium, the ether, within the fence, preferably sharing the same frequency bands, preferably sharing the same protocols, and preferably synchronized to optimize co-existence, yet logically segregated to avoid the creation of intolerable short cuts between existing wired domains.

Given this challenge, LLNs are best treated as all sitting on yet another segregated domain, segregated from all other wired domains where conventional security is organized by perimeter.  Moving away from the traditional perimeter security mindset means moving towards stronger end-device identity authentication, so that LLN access points can split the various wireless data streams and interconnect back to the appropriate domain depending on the identity of, and the trust established by the gateways in the authenticity of, message originators.

Similar considerations are to be given to how multiple applications may or may not be allowed to share routing devices and their potentially redundant bandwidth within the network.  The challenges here are to balance available capacity, required latencies, expected priorities, and, last but not least, the available (battery) energy within the routing devices.

2.2.1. The Physical Topology

There is no specific physical topology for an industrial process control network.
One extreme example is a multi-square-kilometer refinery where isolated tanks, some of them with power but most with no backbone connectivity, compose a farm that spans the surface of the plant.  A few hundred field devices are deployed to ensure global coverage using a wireless, self-forming, self-healing mesh network that might be 5 to 10 hops across.  Local feedback loops and mobile workers tend to span only one or two hops.  The backbone is in the refinery proper, many hops away.  Even there, powered infrastructure is also typically several hops away, so hopping to/from the powered infrastructure will in general be more costly than the direct route.

In the opposite extreme case, the backbone network spans all the nodes and most nodes are in direct sight of one or more backbone routers.  Most communication between field devices and infrastructure devices, as well as field device to field device, occurs across the backbone.  From afar, this model resembles the Wi-Fi ESS (Extended Service Set).  But from a Layer 3 perspective, the issues are the default (backbone) router selection and the routing inside the backbone, whereas the radio hop towards the field device is in fact a simple local delivery.

                ---+------------------------
                   |      Plant Network
                   |
                +-----+
                |     | Gateway
                |     |
                +-----+
                   |
                   |      Backbone
     +--------------------+------------------+
     |                    |                  |
  +-----+             +-----+             +-----+
  |     | Backbone    |     | Backbone    |     | Backbone
  |     | router      |     | router      |     | router
  +-----+             +-----+             +-----+
    o   o   o    o   o    o   o   o   o    o   o   o   o
   o  o  o  o  o  o  o   o  o  o  o  o  o  o  o  o  o  o
    o  o  o  o  o  o  o  o  o  o  o  M  o  o  o  o  o
     o  o  M  o  o  o  o  o  o  o  o  o  o  o  o  o
        o  o  o  o  o  o  o  o  o
             o  o  o  o  o
                      LLN

          Figure 1: Backbone-based Physical Topology

2.2.2. Logical Topologies

Most of the traffic over the LLN is publish/subscribe of sensor data from the field devices towards a sink that can be a backbone router, a gateway, or a controller/manager.  The destination of the sensor data is an infrastructure device that sits on the backbone and is reachable via one or more backbone routers.

For security, reliability, availability, or serviceability reasons, it is often required that the logical topologies not be physically congruent over the radio network, that is, that they form logical partitions of the LLN.  For instance, a routing topology that is set up for control should be isolated from a topology that reports the temperature and the status of the vents, if that second topology has lesser constraints in terms of security policy.  This isolation might be implemented as Virtual LANs and Virtual Routing Tables in shared nodes in the backbone, but it corresponds effectively to physical nodes in the wireless network.

Since publishing the data is the raison d'etre of most of the sensors, in some cases it makes sense to proactively build a set of routes between the sensors and one or more backbone routers and maintain those routes at all times.  Also, because of the lossy nature of the network, the routing in place should attempt to propose multiple paths in the form of Directed Acyclic Graphs oriented towards the destination.
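
To make the last point concrete, the sketch below builds such a Directed Acyclic Graph oriented towards a sink over a small invented topology: every node keeps all neighbours that are strictly closer (in hops) to the sink as potential parents, which yields the multiple, non-congruent paths called for here.  This illustrates the idea only; it is not the mechanism of any particular routing protocol.

   from __future__ import annotations
   from collections import deque

   def hop_counts(links: dict[str, set[str]], sink: str) -> dict[str, int]:
       """Breadth-first hop count from every reachable node to the sink."""
       dist = {sink: 0}
       queue = deque([sink])
       while queue:
           node = queue.popleft()
           for neighbour in links[node]:
               if neighbour not in dist:
                   dist[neighbour] = dist[node] + 1
                   queue.append(neighbour)
       return dist

   def dag_towards(links: dict[str, set[str]], sink: str) -> dict[str, set[str]]:
       """For each node, the set of parents: neighbours strictly closer to the
       sink.  Keeping several parents gives path diversity, and pointing every
       edge toward the sink keeps the structure acyclic."""
       dist = hop_counts(links, sink)
       inf = float("inf")
       return {node: {n for n in neighbours if dist.get(n, inf) < dist.get(node, inf)}
               for node, neighbours in links.items() if node != sink}

   # Toy LLN: 'br' is a backbone router acting as the sink.
   links = {"br": {"a", "b"},
            "a": {"br", "b", "c"},
            "b": {"br", "a", "c"},
            "c": {"a", "b"}}
   for node, parents in sorted(dag_towards(links, "br").items()):
       print(node, "->", sorted(parents))   # c keeps both 'a' and 'b' as parents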

In contrast with the general requirement of maintaining default routes towards the sinks, the need for field device to field device connectivity is very specific and rare, though the traffic associated with it might be of foremost importance.  Field device to field device routes are often the most critical, optimized, and well-maintained routes.  A class 0 control loop requires guaranteed delivery and extremely tight response times.  Both the respect of the criteria in the route computation and the quality of the maintenance of the route are critical to the field devices' operation.  Typically, a control loop will be using a dedicated direct wire that has very different capabilities, cost, and constraints than the wireless medium, with the need to use a wireless path as a backup route only in case of loss of the wired path.

Even though each field device to field device route computation has specific constraints in terms of latency and availability, it can be expected that the shortest possible path will often be selected and that this path will be routed inside the LLN as opposed to via the backbone.  It can also be noted that the lifetimes of the routes might range from minutes for a mobile worker to tens of years for a command-and-control closed loop.  Finally, time-varying user requirements for latency and bandwidth will change the constraints on the routes, which might either trigger a constrained route recomputation, a reprovisioning of the underlying L2 protocols, or both in that order.  For instance, a wireless worker may initiate a bulk transfer to configure or diagnose a field device.  A level sensor device may need to perform a calibration and send a bulk file to a plant.

3. Traffic Characteristics

Industrial applications fall into four large service categories [ISA100.11a]:

1. Periodic data (aka buffered).  Data that is generated periodically and has a well-understood data bandwidth requirement, both deterministic and predictable.  Timely delivery of such data is often the core function of a wireless sensor network, and permanent resources are assigned to ensure that the required bandwidth stays available.  Buffered data usually exhibits a short time to live, and the newer reading obsoletes the previous one.  In some cases, alarms are low-priority information that gets repeated over and over.  The end-to-end latency of this data is not as important as the regularity with which the data is presented to the plant application.

2. Event data.  This category includes alarms and aperiodic data reports with bursty data bandwidth requirements.  In certain cases, alarms are critical and require a priority service from the network.

3. Client/Server.  Many industrial applications are based on a client/server model and implement a command/response protocol.  The data bandwidth required is often bursty.  The acceptable round-trip latency for some legacy systems was based on the time to send tens of bytes over a 1,200-baud link.  Hundreds of milliseconds is typical.  This type of request is statistically multiplexed over the LLN, and a cost-based, fair-share, best-effort service is usually expected.

4. Bulk transfer.  Bulk transfers involve the transmission of blocks of data in multiple packets where temporary resources are assigned to meet a transaction time constraint.
Transient resources are assigned for a limited period of time (related to file size and data rate) to meet the bulk-transfer service requirements.

3.1. Service Parameters

The following service parameters can affect routing decisions in a resource-constrained network:

o Data bandwidth - the bandwidth might be allocated permanently or for a period of time to a specific flow that usually exhibits well-defined properties of burstiness and throughput.  Some bandwidth will also be statistically shared between flows in a best-effort fashion.

o Latency - the time taken for the data to transit the network from the source to the destination.  This may be expressed in terms of a deadline for delivery.  Most monitoring latencies will be in seconds to minutes.

o Transmission phase - process applications can be synchronized to wall-clock time and require coordinated transmissions.  A common coordination frequency is 4 Hz (250 ms).

o Service contract type - revocation priority.  LLNs have limited network resources that can vary with time.  This means the system can become fully subscribed or even oversubscribed.  System policies determine how resources are allocated when resources are oversubscribed.  The choices are blocking and graceful degradation.

o Transmission priority - the means by which limited resources within field devices are allocated across multiple services.  For transmissions, a device has to select which packet in its queue will be sent at the next transmission opportunity.  Packet priority is used as one criterion for selecting the next packet.  For reception, a device has to decide how to store a received packet.  The field devices are memory constrained, and receive buffers may become full.  Packet priority is used to select which packets are stored or discarded.

The routing protocol MUST also support different metric types for each link used to compute the path according to some objective function (e.g., minimize latency).

For these reasons, the ROLL routing infrastructure is required to compute and update constrained routes on demand, and it can be expected that this model will become more prevalent for field device to field device connectivity, as well as for some field device to infrastructure device connectivity, over time.

Industrial application data flows between field devices are not necessarily symmetric.  In particular, asymmetrical cost and unidirectional routes are common for published data and alerts, which represent the major part of the sensor traffic.  The routing protocol MUST be able to compute a set of unidirectional routes with potentially different costs that are composed of one or more non-congruent paths.

3.2. Configurable Application Requirement

Time-varying user requirements for latency and bandwidth may require changes in the provisioning of the underlying L2 protocols.  A technician may initiate a query/response session or bulk transfer to diagnose or configure a field device.  A level sensor device may need to perform a calibration and send a bulk file to a plant.  The routing protocol MUST route on paths that are changed to appropriately provision the application requirements.  The routing protocol MUST support the ability to recompute paths based on Network Layer abstractions of the underlying link attributes/metrics that may change dynamically.

3.3. Different Routes for Different Flows

Because different service categories have different service requirements, it is often desirable to have different routes for different data flows between the same two endpoints.  For example, alarm or periodic data from A to Z may require path diversity with specific latency and reliability.  A file transfer between A and Z may not need path diversity.  The routing algorithm MUST be able to generate different routes with different characteristics (e.g., optimized according to different costs, etc.).

4. Reliability Requirements

LLN reliability comprises several unrelated aspects:

1) Availability of source-to-destination connectivity when the application needs it, expressed as the number of successes / number of attempts,

2) Availability of source-to-destination connectivity when the application might need it, expressed as the number of potential failures / available bandwidth,

3) Ability, expressed as the number of successes divided by the number of attempts, to get data delivered from source to destination within a capped time,

4) How well a network (serving many applications) achieves end-to-end delivery of packets within a bounded latency,

5) Trustworthiness of data that is delivered to the sinks,

6) ...

This makes quantifying reliability the equivalent of plotting it on a three-plus-dimensional graph.  Different applications have different requirements, and expressing reliability as a one-dimensional parameter, like 'the reliability of my wireless network is 99.9%', often creates more confusion than clarity.

The impact of not receiving sensor data due to sporadic network outages can be devastating if this happens unnoticed.  However, if destinations that expect periodic sensor data or alarm status updates fail to get them, then these systems can automatically take appropriate actions that prevent dangerous situations.  Depending on the wireless application, the appropriate action ranges from initiating a shutdown within 100 ms, to using a last known good value for as many as N successive samples, to sending an operator out into the plant to collect monthly data in the conventional way, i.e., with a portable sensor, paper, and a clipboard.

The impact of receiving corrupted data, and not being able to detect that the received data is corrupt, is often more dangerous.  Data corruption can come from random bit errors, i.e., white noise, or from occasional bursty interference sources like thunderstorms or leaky microwave ovens, but also from conscious attacks by adversaries.

Another critical aspect of the routing is the capability to ensure a maximum disruption time and route maintenance.  The maximum disruption time is the time it takes at most for a specific path to be restored when broken.  Route maintenance ensures that a path is monitored so that, when broken, it is restored within the maximum disruption time.  Maintenance should also ensure that a path continues to provide the service for which it was established, for instance in terms of bandwidth, jitter, and latency.

In industrial applications, availability is usually defined with respect to end-to-end delivery of packets within a bounded latency.  Availability requirements vary over many orders of magnitude.  Some non-critical monitoring applications may tolerate an availability of less than 90% with hours of latency.
Most industrial standards, such as HART7, have set user availability expectations at 99.9%.  Regulatory requirements are a driver for some industrial applications.  Regulatory monitoring requires high data integrity, because lost data is assumed to be out of compliance and subject to fines.  This can drive up either availability or trustworthiness requirements.

Because LLN link stability is often low, path diversity is critical.  Hop-by-hop link diversity is used to improve latency-bounded reliability by sending data over diverse paths.

Because data from field devices are aggregated and funneled at the LLN access point before they are routed to plant applications, LLN access point redundancy is an important factor in overall availability.  A route that connects a field device to a plant application may have multiple paths that go through more than one LLN access point.  The routing protocol MUST be able to compute paths of not-necessarily-equal cost toward a given destination so as to enable load balancing across a variety of paths.  The availability of each path in a multipath route can change over time.  Hence, it is important to measure the availability on a per-path basis and select a path (or paths) according to the availability requirements.

5. Device-Aware Routing Requirements

Wireless LLN nodes in industrial environments are powered by a variety of sources.  Battery-operated devices with lifetime requirements of at least five years are the most common.  Battery-operated devices have a cap on their total energy, typically can report an estimate of remaining energy, and typically do not have constraints on the short-term average power consumption.  Energy-scavenging devices are more complex.  These systems contain both a power-scavenging device (such as solar, vibration, or temperature difference) and an energy storage device, such as a rechargeable battery or a capacitor.  These systems, therefore, have limits on the long-term average power consumption (which cannot exceed the average scavenged power over the same interval) as well as the short-term limits imposed by the energy storage requirements.  For solar-powered systems, the energy storage system is generally designed to provide days of power in the absence of sunlight.  Many industrial sensors run off of a 4-20 mA current loop and can scavenge on the order of milliwatts from that source.  Vibration-monitoring systems are a natural choice for vibration scavenging, which typically only provides tens or hundreds of microwatts.  Due to industrial temperature ranges and desired lifetimes, the choices of energy storage devices can be limited, and the resulting stored energy is often comparable to the energy cost of sending or receiving a packet rather than the energy of operating the node for several days.  And of course, some nodes will be line-powered.

Example 1: solar panel, lead-acid battery sized for two weeks of rain.

Example 2: vibration scavenger, 1 mF tantalum capacitor.

Field devices have limited resources.  Low-power, low-cost devices have limited memory for storing route information.  Typical field devices will have a finite number of routes they can support for their embedded sensor/actuator application and for forwarding other devices' packets in a slotted-link mesh network.
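
One way such device characteristics could enter a route computation is sketched below.  The cost model, the weight, and the numbers are invented for illustration and are not taken from this document or from any standard; the idea is simply that a path's cost combines its hop count with a penalty for forwarding through nodes whose stored energy is low, so that routes prefer line-powered or energy-rich forwarders.

   from dataclasses import dataclass

   @dataclass
   class Node:
       name: str
       line_powered: bool      # mains-powered or scavenging with ample margin
       energy_fraction: float  # 0.0 .. 1.0 estimate of remaining stored energy

   def node_penalty(node: Node, weight: float = 4.0) -> float:
       """Extra cost for forwarding through an energy-constrained node."""
       if node.line_powered:
           return 0.0
       return weight * (1.0 - node.energy_fraction)

   def path_cost(path: list) -> float:
       """Hop count plus the penalties of the intermediate (forwarding) nodes."""
       return (len(path) - 1) + sum(node_penalty(n) for n in path[1:-1])

   src  = Node("vibration-12", line_powered=False, energy_fraction=0.8)
   weak = Node("relay-3", line_powered=False, energy_fraction=0.2)
   main = Node("relay-7", line_powered=True, energy_fraction=1.0)
   sink = Node("lln-access-point", line_powered=True, energy_fraction=1.0)

   print(path_cost([src, weak, sink]))   # 2 hops + 4*(1-0.2) penalty = 5.2
   print(path_cost([src, main, sink]))   # 2 hops + no penalty        = 2.0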

Users may strongly prefer that the same device have different lifetime requirements in different locations.  A sensor monitoring a non-critical parameter in an easily accessed location may have a lifetime requirement that is shorter, and may tolerate more statistical variation, than a mission-critical sensor in a hard-to-reach place that requires a plant shutdown in order to replace.

The routing algorithm MUST support node-constrained routing (e.g., taking into account the existing energy state as a node constraint).  Node constraints include power and memory, as well as constraints placed on the device by the user, such as battery life.

6. Broadcast/Multicast

Some existing industrial plant applications do not use broadcast or multicast addressing to communicate to field devices.  Unicast address support is sufficient for them.

In some other industrial process automation environments, multicast over IP is used to deliver to multiple nodes that may or may not be functionally similar.  Example usages are:

1) Delivery of alerts to multiple similar servers in an automation control room.  Alerts are multicast to a group address based on the part of the automation process where the alerts arose (e.g., the multicast address "all-nodes-interested-in-alerts-for-process-unit-X").  This is always a restricted-scope multicast, not a broadcast.

2) Delivery of common packets to multiple routers over a backbone, where the packets result in each receiving router initiating a multicast (sometimes as a full broadcast) within the LLN.  For instance, this can be a byproduct of having potentially physically separated backbone routers that can inject messages into different portions of the same larger LLN.

3) Publication of measurement data to more than one subscriber.  This feature is useful in some peer-to-peer control applications.  For example, a level position may be useful to a controller that operates the flow valve and also to the overfill alarm indicator.  Both the controller and the alarm indicator would receive the same publication sent as a multicast by the level gauge.

These uses also require a 1:N security mechanism; they aren't of any use if the end-to-end security is only point-to-point.

It is quite possible that first-generation wireless automation field networks can be adequately useful without either of these capabilities, but in the near future, wireless field devices with communication controllers and protocol stacks will require control and configuration, such as firmware downloading, that may benefit from broadcast or multicast addressing.

The routing protocol SHOULD support broadcast or multicast addressing.

7. Route Establishment Time

During network formation, installers with no networking skill must be able to determine if their devices are "in the network" with sufficient connectivity to perform their function.  Installers will have sufficient skill to provision the devices with a sample rate or activity profile.  The routing algorithm MUST find the appropriate route(s) and report success or failure within several minutes, and SHOULD report success or failure within tens of seconds.

Network connectivity in real deployments is always time varying, with time constants from seconds to months.
So long as the underlying connectivity has not been compromised, this link churn should not substantially affect network operation.  The routing algorithm MUST respond to normal link failure rates with routes that meet the service requirements (especially latency) throughout the routing response.  The routing algorithm SHOULD always be in the process of recalculating the route in response to changing link statistics.  The routing algorithm MUST recalculate the paths when field devices change due to insertion, removal, or failure, and this recalculation MUST NOT cause latencies greater than the specified constraints (typically seconds to minutes).

8. Mobility

Various economic factors have contributed to a reduction of trained workers in the plant.  The industry as a whole appears to be trying to solve this problem with what is called the "wireless worker".  Carrying a PDA or something similar, this worker will be able to accomplish more work in less time than the older, better-trained workers that he or she replaces.  Whether or not the premise is valid, the use case is commonly presented: the worker will be wirelessly connected to the plant IT system to download documentation, instructions, etc., and will need to be able to connect "directly" to the sensors and control points in or near the equipment on which he or she is working.  It is possible that this "direct" connection could come via the normal LLN data collection network.  This connection is likely to require higher bandwidth and lower latency than the normal data collection operation.

It is not yet decided whether these PDAs will use the LLN network directly to talk to field sensors, or whether they will instead use other wireless connectivity that proxies back into the field, or to anywhere else, via the user interfaces typically used for plant historians, asset management systems, and the like.

The routing protocol SHOULD support the wireless worker with fast network connection times of a few seconds, and low command and response latencies to the plant behind the LLN access points, to applications, and to field devices.  The routing protocol SHOULD also support the bandwidth allocation for bulk transfers between the field device and the handheld device of the wireless worker.  The routing protocol SHOULD support walking speeds for maintaining network connectivity as the handheld device changes position in the wireless network.

Some field devices will be mobile.  These devices may be located on moving parts such as rotating components, or they may be located on vehicles such as cranes or forklifts.  The routing protocol SHOULD support vehicular speeds of up to 35 km/h.

9. Manageability

The process and control industry is manpower constrained.  The aging demographics of plant personnel are causing a looming manpower problem for industry across many markets.  The goal for industrial networks is to have the installation process not require any new skills for the plant personnel.  The person would install the wireless sensor or wireless actuator the same way the wired sensor or wired actuator is installed, except that the step of connecting a wire is eliminated.

Most users in fact demand even further simplified provisioning methods, whereby any new device will automatically connect to and report at the LLN access point.
This requires the availability of open and untrusted side channels for new joiners, and it requires strong and automated authentication so that networks can automatically accept or reject new joiners.  Ideally, for a user, adding new devices should be as easy as dragging and dropping an icon from a pool of authenticated new joiners into a pool for the wired domain that this new sensor should connect to.  Under the hood, invisible to the user, auditable security mechanisms should take care of new device authentication and secret join key distribution.  These more sophisticated 'over the air' secure provisioning methods should eliminate the use of traditional configuration tools for setting up devices prior to their being ready to securely join an LLN access point.

There will be many new applications where, even without any human intervention at the plant, devices that have never been on site before should be allowed, based on their credentials and crypto capabilities, to connect anyway.  Examples are third-party road tankers, rail cargo containers with overfill protection sensors, or consumer cars that need to be refueled with hydrogen by robots at future petrol stations.

The routing protocol for LLNs is expected to be easy to deploy and manage.  Because the number of field devices in a network is large, provisioning the devices manually may not make sense.  The routing MAY require commissioning of information about the node itself, like identity, security tokens, radio standards and frequencies, etc.  The routing protocol SHOULD NOT require the preprovisioning of information about the environment where the node will be deployed.  The routing protocol MUST enable the full discovery and setup of the environment (available links, selected peers, reachable network).  The protocol also MUST support the distribution of configuration from a centralized management controller if operator-initiated configuration change is allowed.

10. Security

Given that wireless sensor networks in industrial automation operate in systems that have substantial financial and human safety implications, security is of considerable concern.  Levels of security violation that are tolerated as a "cost of doing business" in the banking industry are not acceptable when in some cases literally thousands of lives may be at risk.

Security is easily confused with a guarantee of availability.  When discussing wireless security, it is important to distinguish clearly between the risks of temporarily losing connectivity, say due to a thunderstorm, and the risks associated with knowledgeable adversaries attacking a wireless system.  Conscious attacks need to be split between 1) attacks on the actual application served by the wireless devices and 2) attacks that exploit the presence of a wireless access point that may provide connectivity onto legacy wired plant networks, i.e., attacks that have little to do with the wireless devices in the LLNs.  The second type of attack, where access points might act as wireless backdoors allowing an attacker outside the fence to access typically non-secured process control and/or office networks, is typically the one that creates exposures where lives are at risk.
This implies that the LLN access point on its own must possess functionality that guarantees domain segregation, and thus prohibits many types of traffic further upstream.

Current-generation industrial wireless device manufacturers are specifying security at the MAC layer and the transport layer.  A shared key is used to authenticate messages at the MAC layer.  At the transport layer, commands are encrypted with unique, randomly generated, end-to-end session keys.  HART7 and ISA100.11a are examples of security systems for industrial wireless networks.

Although such symmetric-key encryption and authentication mechanisms at the MAC and transport layers may protect reasonably well during the lifecycle, the initial network boot (provisioning) step in many cases requires more sophisticated steps to securely land the initial secret keys in the field devices.  It is vital that, also during these steps, the ease of deployment and the freedom of mixing and matching products from different suppliers do not complicate life for those that deploy and commission.  Given average skill levels in the field, and given serious resource constraints in the market, investing a little bit more in sensor node hardware and software, so that new devices can automatically be deemed trustworthy and thus automatically join the domains that they should join, with just one drag-and-drop action for those in charge of deploying, will yield faster adoption and proliferation of the LLN technology.

Industrial plants may not maintain the same level of physical security for field devices that is associated with traditional network sites such as locked IT centers.  In industrial plants, it must be assumed that the field devices have marginal physical security and might be compromised.  The routing protocol SHOULD limit the risk incurred by one node being compromised, for instance by proposing non-congruent paths for a given route and balancing the traffic across the network.

The routing protocol SHOULD compartmentalize the trust placed in field devices so that a compromised field device does not destroy the security of the whole network.  The routing MUST be configured and managed using secure messages and protocols that prevent outsider attacks and limit insider attacks from field devices installed in insecure locations in the plant.

Wireless typically forces the abandonment of classical 'by perimeter' thinking when trying to secure network domains.  Wireless nodes in LLNs should thus be regarded as little islands with trusted kernels, situated in an ocean of untrusted connectivity, an ocean that might be full of pirate ships.  Consequently, confidence in node identity and the ability to challenge the authenticity of source node credentials become more relevant.  Cryptographic boundaries inside devices that clearly demarcate the border between trusted and untrusted areas need to be drawn.  Protection against compromise of the cryptographic boundaries inside the hardware of devices is outside the scope of this document.

11. IANA Considerations

This document includes no request to IANA.

12. Acknowledgements

Many thanks to Rick Enns, Alexander Chernoguzov, and Chol Su Kang for their contributions.

13. References

13.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

13.2. Informative References

[I-D.ietf-roll-terminology] Vasseur, J., "Terminology in Low power And Lossy Networks", draft-ietf-roll-terminology-00 (work in progress), October 2008.

13.3. External Informative References

[HART] www.hartcomm.org, "Highway Addressable Remote Transducer", a group of specifications for industrial process and control devices administered by the HART Foundation.

[ISA100.11a] ISA, "ISA100, Wireless Systems for Automation", May 2008, <http://www.isa.org/Community/SP100WirelessSystemsforAutomation>.

Authors' Addresses

Kris Pister (editor)
Dust Networks
30695 Huntwood Ave.
Hayward, 94544
USA

Email: kpister@dustnetworks.com

Pascal Thubert (editor)
Cisco Systems
Village d'Entreprises Green Side
400, Avenue de Roumanille
Batiment T3
Biot - Sophia Antipolis  06410
FRANCE

Phone: +33 497 23 26 34
Email: pthubert@cisco.com

Sicco Dwars
Shell Global Solutions International B.V.
Sir Winston Churchilllaan 299
Rijswijk  2288 DC
Netherlands

Phone: +31 70 447 2660
Email: sicco.dwars@shell.com

Tom Phinney
5012 W. Torrey Pines Circle
Glendale, AZ 85308-3221
USA

Phone: +1 602 938 3163
Email: tom.phinney@cox.net