Networking Working Group                                 K. Pister, Ed.
Internet-Draft                                             Dust Networks
Intended status: Informational                          P. Thubert, Ed.
Expires: June 21, 2009                                     Cisco Systems
                                                                S. Dwars
                                                                   Shell
                                                              T. Phinney
                                                       December 18, 2008

     Industrial Routing Requirements in Low Power and Lossy Networks
                  draft-ietf-roll-indus-routing-reqs-03

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with
   the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on June 21, 2009.

Copyright Notice

   Copyright (c) 2008 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.

Abstract

   Wireless, low power field devices enable industrial users to
   significantly increase the amount of information collected and the
   number of control points that can be remotely managed.  The
   deployment of these wireless devices will significantly improve the
   productivity and safety of the plants while increasing the
   efficiency of the plant workers by extending the information set
   available from wired systems.  In an industrial environment, low
   power, high reliability, and easy installation and maintenance are
   mandatory qualities for wireless devices.  The aim of this document
   is to analyze the requirements for the routing protocol used for
   Low power and Lossy Networks (LLN) in industrial environments.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Table of Contents

   1.  Terminology
   2.  Introduction
     2.1.  Applications and Traffic Patterns
     2.2.  Network Topology of Industrial Applications
       2.2.1.  The Physical Topology
       2.2.2.  Logical Topologies
   3.  Traffic Characteristics
     3.1.  Service Parameters
     3.2.  Configurable Application Requirement
     3.3.  Different Routes for Different Flows
   4.  Reliability Requirements
   5.  Device-Aware Routing Requirements
   6.  Broadcast/Multicast
   7.  Route Establishment Time
   8.  Mobility
   9.  Manageability
   10. Security
   11. IANA Considerations
   12. Acknowledgements
   13. References
     13.1.  Normative References
     13.2.  Informative References
     13.3.  External Informative References
   Authors' Addresses

1.  Terminology

   This document employs terminology defined in the ROLL terminology
   document [I-D.ietf-roll-terminology].  This document also refers to
   industrial standards:

   HART: "Highway Addressable Remote Transducer", a group of
   specifications for industrial process and control devices
   administered by the HART Foundation (see [HART]).  The latest
   version of the specifications is HART7, which includes the
   additions for WirelessHART.

   ISA: "International Society of Automation".
   ISA is an ANSI-accredited standards-making society.  ISA100 is an
   ISA committee whose charter includes defining a family of standards
   for industrial automation.  [ISA100.11a] is a working group within
   ISA100 that is working on a standard for monitoring and non-critical
   process control applications.

2.  Introduction

   Wireless, low-power field devices enable industrial users to
   significantly increase the amount of information collected and the
   number of control points that can be remotely managed.  The
   deployment of these wireless devices will significantly improve the
   productivity and safety of the plants while increasing the
   efficiency of the plant workers.

   Cable is perceived as a more proven, safer technology, and existing
   operational deployments are very stable in time.  For these reasons,
   it is not expected that wireless will replace wire in any
   foreseeable future; the consensus in the industrial space is rather
   that wireless will tremendously augment the scope and benefits of
   automation by enabling the control of devices that were not
   connected in the past for reasons of cost and/or deployment
   complexity.  But for LLNs to be adopted in the industrial
   environment, the wireless network needs to have three qualities:
   low power, high reliability, and easy installation and maintenance.
   The routing protocol used for low power and lossy networks (LLN) is
   important to fulfilling these goals.

   Industrial automation is segmented into two distinct application
   spaces, known as "process" or "process control" and "discrete
   manufacturing" or "factory automation".  In industrial process
   control, the product is typically a fluid (oil, gas, chemicals ...).
   In factory automation or discrete manufacturing, the products are
   individual elements (screws, cars, dolls).  While there is some
   overlap of products and systems between these two segments, they
   are surprisingly separate communities.  The specifications targeting
   industrial process control tend to have more tolerance for network
   latency than what is needed for factory automation.

   Irrespective of this different 'process' and 'discrete' plant
   nature, both plant types will have similar needs for automating the
   collection of data that used to be collected manually, or was not
   collected before.  Examples are wireless sensors that report the
   state of a fuse, the state of a luminaire, HVAC status, vibration
   levels on pumps, man-down status, and so on.

   Other novel application arenas that equally apply to both 'process'
   and 'discrete' involve mobile sensors that roam in and out of
   plants, such as active sensor tags on containers or vehicles.

   Some if not all of these applications will need to be served by the
   same low power and lossy wireless network technology.  This may
   mean several disconnected, autonomous LLNs connecting to multiple
   hosts, but sharing the same ether.  Interconnecting such networks,
   if only to supervise channel and priority allocations, or to fully
   synchronize, or to share path capacity within a set of physical
   network components, may or may not be desired for practical
   reasons, such as cyber-security concerns in relation to plant
   safety and integrity.

   All application spaces desire battery-operated networks of hundreds
   of sensors and actuators communicating with LLN access points.
   In an oil refinery, the total number of devices might exceed one
   million, but the devices will be clustered into smaller networks
   that in most cases interconnect and report to an existing plant
   network infrastructure.

   Existing wired sensor networks in this space typically use
   communication protocols with low data rates, from 1,200 baud (e.g.
   wired HART) to the one to two hundred kbps range for most of the
   others.  The existing protocols are often master/slave with
   command/response.

2.1.  Applications and Traffic Patterns

   The industrial market classifies process applications into three
   broad categories and six classes.

   o  Safety

      *  Class 0: Emergency action - Always a critical function

   o  Control

      *  Class 1: Closed loop regulatory control - Often a critical
         function

      *  Class 2: Closed loop supervisory control - Usually a
         non-critical function

      *  Class 3: Open loop control - Operator takes action and
         controls the actuator (human in the loop)

   o  Monitoring

      *  Class 4: Alerting - Short-term operational effect (for
         example event-based maintenance)

      *  Class 5: Logging and downloading / uploading - No immediate
         operational consequence (e.g., history collection,
         sequence-of-events, preventive maintenance)

   Safety-critical functions affect the basic safety integrity of the
   plant.  These normally dormant functions kick in only when process
   control systems, or their operators, have failed.  By design and by
   regular interval inspection, they have a well-understood
   probability of failure on demand, typically in the range of once
   per 10-1000 years.

   In-time delivery of messages becomes more relevant as the class
   number decreases.

   Note that for a control application, jitter is just as important as
   latency and has the potential to destabilize control algorithms.

   Industrial users are interested in deploying wireless networks for
   the monitoring classes 4 and 5, and in the non-critical portions of
   classes 3 and 2.

   Classes 4 and 5 also include asset monitoring and tracking, which
   includes equipment monitoring and is essentially separate from
   process monitoring.  An example of equipment monitoring is the
   recording of motor vibrations to detect bearing wear.  However,
   similar sensors detecting excessive vibration levels could be used
   as safeguarding loops that immediately initiate a trip, and thus
   end up being class 0.

   In the near future, most LLN systems in industrial automation
   environments will be for low frequency data collection.  Packets
   containing samples will be generated continuously, and 90% of the
   market is covered by packet rates of between 1/s and 1/hour, with
   the average under 1/min.  In industrial process, these sensors
   include temperature, pressure, fluid flow, tank level, and
   corrosion.  Some sensors are bursty, such as vibration monitors
   that may generate and transmit tens of kilobytes (hundreds to
   thousands of packets) of time-series data at reporting rates of
   minutes to days.

   Almost all of these sensors will have built-in microprocessors that
   may detect alarm conditions.  Time-critical alarm packets are
   expected to be granted a lower latency than periodic sensor data
   streams.

   Some devices will transmit a log file every day, again with
   typically tens of kilobytes of data.
   For these applications there is very little "downstream" traffic
   coming from the LLN access point and traveling to particular
   sensors.  During diagnostics, however, a technician may be
   investigating a fault from a control room and expect to have "low"
   latency (human tolerable) in a command/response mode.

   Low-rate control, often with a "human in the loop" (also referred
   to as "open loop"), is implemented via communication to a control
   room, because that is where the human in the loop will be.  The
   sensor data makes its way through the LLN access point to the
   centralized controller where it is processed, the operator sees the
   information and takes action, and the control information is then
   sent out to the actuator node in the network.

   In the future, it is envisioned that some open loop processes will
   be automated (closed loop) and packets will flow over local loops
   and not involve the LLN access point.  These closed loop controls
   for non-critical applications will be implemented on LLNs.
   Non-critical closed loop applications have a latency requirement
   that can be as low as 100 ms, but many control loops are tolerant
   of latencies above 1 s.

   It is more likely, though, that loops will be closed entirely in
   the field, and in such a case having wireless links within the
   control loop does not usually add real value.  Most control loops
   have sensors and actuators within such proximity that a wire
   between them remains the most sensible option from an economic
   point of view.  This 'control in the field' architecture is already
   common practice with wired field busses.  An 'upstream' wireless
   link would only be used to influence the in-field controller
   settings, and to occasionally capture diagnostics.  Even though the
   link back to a control room might be wireless, this architecture
   reduces the tight latency and availability requirements for the
   wireless links.

   Closing loops in the field:

   o  does not prevent the same loop from being closed through a
      remote multi-variable controller during some modes of operation,
      while being closed directly in the field during other modes of
      operation (e.g., fallback, or when timing is more critical)

   o  does not imply that the loop will be closed with a wired
      connection, or that the wired connection is more energy
      efficient even when it exists as an alternate to the wireless
      connection.

   A realistic future scenario is for a field device with a battery or
   ultra-capacitor power storage to have both wireless and unpowered
   wired communications capability (e.g., galvanically isolated
   RS-485), where the wireless communication is more flexible and, for
   local loop operation, more energy efficient, and the wired
   communication capability serves as a backup interconnect among the
   loop elements, but without a wired connection back to the
   operations center blockhouse.  In other words, the loop elements
   are interconnected through wiring to a nearby junction box, but the
   2 km home-run link from the junction box to the control center does
   not exist.

   When wireless communication conditions are good, devices use
   wireless for loop interconnect, and either one wireless device
   reports alarms and other status to the control center for all
   elements of the loop or each element reports independently.
   When wireless communications are sporadic, the loop interconnect
   uses the self-powered, galvanically-isolated RS-485 link, and one
   of the devices with good wireless communications to the control
   center serves as a router for those devices which are unable to
   contact the control center directly.

   The above approach is particularly attractive for large storage
   tanks in tank farms, where devices may not all have good wireless
   visibility of the control center, and where a home run cable from
   the tank to the control center is undesirable due to the
   electro-potential differences between the tank location and the
   distant control center that arise during lightning storms.

   In fast control, tens of milliseconds of latency is typical.  In
   many of these systems, if a packet does not arrive within the
   specified interval, the system enters an emergency shutdown state,
   often with substantial financial repercussions.  For a one-second
   control loop in a system with a mean-time between shutdowns target
   of 30 years, the latency requirement implies nine 9s of reliability
   (30 years is roughly 10^9 seconds, so at most about one missed
   one-second deadline may occur per 10^9 attempts, i.e. 99.9999999%
   latency-bounded delivery).  Given such exposure, given the
   intrinsic vulnerability of wireless link availability, and given
   the emergence of control in the field architectures, most users
   tend not to aim for fast closed loop control with wireless links
   within that fast loop.

2.2.  Network Topology of Industrial Applications

   Although network topology is difficult to generalize, the majority
   of existing applications can be met by networks of 10 to 200 field
   devices and a maximum hop count of twenty.  It is assumed that the
   field devices themselves will provide routing capability for the
   network, and additional repeaters/routers will not be required in
   most cases.

   For the vast majority of industrial applications, the traffic is
   mostly composed of real-time publish/subscribe sensor data, also
   referred to as buffered data, flowing from the field devices over
   an LLN towards one or more sinks.  Increasingly over time, these
   sinks will be a part of a backbone, but today they are often
   fragmented and isolated.

   The wireless sensor network is an LLN of field devices for which
   two logical roles are defined: the field routers and the
   non-routing devices.  It is acceptable and even probable that the
   distribution of the roles across the field devices changes over
   time to balance the cost of the forwarding operation amongst the
   nodes.

   In order to scale a control network in terms of density, one
   possible architecture is to deploy a backbone as a canopy that
   aggregates multiple smaller LLNs.  The backbone is a high-speed
   infrastructure network that may interconnect multiple WSNs through
   backbone routers.  Infrastructure devices can be connected to the
   backbone.  A gateway / manager that interconnects the backbone to
   the plant network or the corporate network can be viewed as
   collapsing the backbone and the infrastructure devices into a
   single device that operates all the required logical roles.  The
   backbone is likely to become an option in the industrial network.

   Typically, such backbones interconnect to the 'legacy' wired plant
   infrastructure, the plant network, also known as the 'Process
   Control Domain', the PCD.  These plant automation networks are
   domain-wise segregated from the office network or office domain
   (OD), which is itself typically segregated from the Internet.

   Sinks for LLN sensor data reside on the plant network (PCD), on the
   business network (OD), and on the Internet.  Applications close to
   existing plant automation, such as wired process control and
   monitoring systems running on fieldbusses, that require high
   availability and low latencies, and that are managed by 'Control
   and Automation' departments, typically reside on the PCD.  Other
   applications such as automated corrosion monitoring, cathodic
   protection voltage verification, or machine condition (vibration)
   monitoring, where one sample per week is considered oversampling,
   would more likely deliver their sensor readings in the office
   domain.  Such applications are 'owned' by, e.g., maintenance
   departments.

   Yet other applications, like third-party-maintained luminaires or
   vendor-managed inventory systems where a supplier of chemicals
   needs access to tank level readings at a customer's site, will be
   best served with direct Internet connectivity all the way to the
   sensor at that customer's site.  Temporary 'babysitting sensors'
   deployed for just a few days, say during startup or troubleshooting
   or for ad-hoc measurement campaigns for R and D purposes, are other
   examples where the Internet would be the domain where wireless
   sensor data should land, and where other domains such as office and
   plant should preferably be circumvented if quick deployment without
   potentially impacting plant safety integrity is required.

   This multiple-domain, multiple-application connectivity creates a
   significant challenge.  Many different applications will all share
   the same medium, the ether, within the fence, preferably sharing
   the same frequency bands, preferably sharing the same protocols,
   preferably synchronized to mitigate co-existence challenges, yet
   logically segregated to avoid the creation of intolerable short
   cuts between existing wired domains.

   Given this challenge, LLNs are best treated as all sitting on yet
   another segregated domain, separate from all other wired domains
   where conventional security is organized by perimeter.  Moving away
   from the traditional perimeter security mindset means moving
   towards stronger end-device identity authentication, so that LLN
   access points can split the various wireless data streams and
   interconnect each back to the appropriate domain, depending on the
   identity of the message originators and the trust that the gateways
   have established in their authenticity.

   Similar considerations are to be given to how multiple applications
   may or may not be allowed to share routing devices and their
   potentially redundant bandwidth within the network.  Challenges
   here are to balance available capacity, required latencies,
   expected priorities, and last but not least, available (battery)
   energy within the routing devices.

2.2.1.  The Physical Topology

   There is no specific physical topology for an industrial process
   control network.  One extreme example is a multi-square-kilometer
   refinery where isolated tanks, some of them with power but most
   with no backbone connectivity, compose a farm that spans over the
   surface of the plant.  A few hundred field devices are deployed to
   ensure the global coverage using a wireless self-forming,
   self-healing mesh network that might be 5 to 10 hops across.  Local
   feedback loops and mobile workers tend to be only one or two hops.
   The backbone is in the refinery proper, many hops away.  Even
   there, powered infrastructure is also typically several hops away.
   So hopping to/from the powered infrastructure will in general be
   more costly than the direct route.

   In the opposite extreme case, the backbone network spans all the
   nodes and most nodes are in direct sight of one or more backbone
   routers.  Most communication between field devices and
   infrastructure devices, as well as field device to field device,
   occurs across the backbone.  From afar, this model resembles the
   WiFi ESS (Extended Service Set).  But from a layer 3 perspective,
   the issues are the default (backbone) router selection and the
   routing inside the backbone, whereas the radio hop towards the
   field device is in fact a simple local delivery.

       ---+------------------------
          |      Plant Network
          |
       +-----+
       |     | Gateway
       |     |
       +-----+
          |
          |      Backbone
          +--------------------+------------------+
          |                    |                  |
       +-----+             +-----+            +-----+
       |     | Backbone    |     | Backbone   |     | Backbone
       |     | router      |     | router     |     | router
       +-----+             +-----+            +-----+
         o  o  o    o   o    o   o  o   o   o    o  o   o  o
       o   o  o  o   o    o   o  o   o    o   o  o   o   o   o
         o  o   o  o    o   o  o  o     o   o   M   o   o   o
           o   o  M   o  o   o   o   o   o    o   o    o   o
              o       o   o    o   o     o     o    o   o
                  o      o         o   o  o     o     o
                               LLN

             Figure 1: Backbone-based Physical Topology

2.2.2.  Logical Topologies

   Most of the traffic over the LLN is publish/subscribe of sensor
   data from the field device towards a sink that can be a backbone
   router, a gateway, or a controller/manager.  The destination of the
   sensor data is an infrastructure device that sits on the backbone
   and is reachable via one or more backbone routers.

   For security, reliability, availability or serviceability reasons,
   it is often required that the logical topologies are not physically
   congruent over the radio network, that is, they form logical
   partitions of the LLN.  For instance, a routing topology that is
   set up for control should be isolated from a topology that reports
   the temperature and the status of the vents, if that second
   topology has lesser constraints for the security policy.  This
   isolation might be implemented as Virtual LANs and Virtual Routing
   Tables in shared nodes in the backbone, but corresponds effectively
   to physical nodes in the wireless network.

   Since publishing the data is the raison d'etre for most of the
   sensors, in some cases it makes sense to proactively build a set of
   routes between the sensors and one or more backbone routers and
   maintain those routes at all times.  Also, because of the lossy
   nature of the network, the routing in place should attempt to
   propose multiple paths in the form of Directed Acyclic Graphs
   oriented towards the destination.

   In contrast with the general requirement of maintaining default
   routes towards the sinks, the need for field device to field device
   connectivity is very specific and rare, though the associated
   traffic might be of foremost importance.  Field device to field
   device routes are often the most critical, optimized and
   well-maintained routes.  A class 0 control loop requires guaranteed
   delivery and extremely tight response times.  Both the respect of
   criteria in the route computation and the quality of the
   maintenance of the route are critical for the field devices'
   operation.
   Typically, a control loop will be using a dedicated direct wire
   that has very different capabilities, cost and constraints than the
   wireless medium, with the need to use a wireless path as a backup
   route only in case of loss of the wired path.

   Although each field device to field device route computation has
   specific constraints in terms of latency and availability, it can
   be expected that the shortest possible path will often be selected
   and that this path will be routed inside the LLN as opposed to via
   the backbone.  It can also be noted that the lifetimes of the
   routes might range from minutes for a mobile worker to tens of
   years for a command and control closed loop.  Finally, time-varying
   user requirements for latency and bandwidth will change the
   constraints on the routes, which might either trigger a constrained
   route recomputation, a reprovisioning of the underlying L2
   protocols, or both in that order.  For instance, a wireless worker
   may initiate a bulk transfer to configure or diagnose a field
   device.  A level sensor device may need to perform a calibration
   and send a bulk file to a plant.

3.  Traffic Characteristics

   The industrial applications fall into four large service categories
   [ISA100.11a]:

   1.  Periodic data (aka buffered).  Data that is generated
       periodically and has a well understood data bandwidth
       requirement, both deterministic and predictable.  Timely
       delivery of such data is often the core function of a wireless
       sensor network and permanent resources are assigned to ensure
       that the required bandwidth stays available.  Buffered data
       usually exhibits a short time to live, and the newer reading
       obsoletes the previous one.  In some cases, alarms are low
       priority information that gets repeated over and over.  The
       end-to-end latency of this data is not as important as the
       regularity with which the data is presented to the plant
       application.

   2.  Event data.  This category includes alarms and aperiodic data
       reports with bursty data bandwidth requirements.  In certain
       cases, alarms are critical and require a priority service from
       the network.

   3.  Client/Server.  Many industrial applications are based on a
       client/server model and implement a command response protocol.
       The data bandwidth required is often bursty.  The acceptable
       round-trip latency for some legacy systems was based on the
       time to send tens of bytes over a 1200 baud link.  Hundreds of
       milliseconds is typical.  This type of request is statistically
       multiplexed over the LLN and cost-based, fair-share,
       best-effort service is usually expected.

   4.  Bulk transfer.  Bulk transfers involve the transmission of
       blocks of data in multiple packets where temporary resources
       are assigned to meet a transaction time constraint.  Transient
       resources are assigned for a limited period of time (related to
       file size and data rate) to meet the bulk transfer service
       requirements.

3.1.  Service Parameters

   The following service parameters can affect routing decisions in a
   resource-constrained network:

   o  Data bandwidth - the bandwidth might be allocated permanently or
      for a period of time to a specific flow that usually exhibits
      well defined properties of burstiness and throughput.  Some
      bandwidth will also be statistically shared between flows in a
      best effort fashion.

   o  Latency - the time taken for the data to transit the network
      from the source to the destination.  This may be expressed in
      terms of a deadline for delivery.  Most monitoring latencies
      will be in seconds to minutes.

   o  Transmission phase - process applications can be synchronized to
      wall clock time and require coordinated transmissions.  A common
      coordination frequency is 4 Hz (250 ms).

   o  Service contract type - revocation priority.  LLNs have limited
      network resources that can vary with time.  This means the
      system can become fully subscribed or even oversubscribed.
      System policies determine how resources are allocated when
      resources are oversubscribed.  The choices are blocking and
      graceful degradation.

   o  Transmission priority - the means by which limited resources
      within field devices are allocated across multiple services.
      For transmissions, a device has to select which packet in its
      queue will be sent at the next transmission opportunity.  Packet
      priority is used as one criterion for selecting the next packet.
      For reception, a device has to decide how to store a received
      packet.  The field devices are memory constrained and receive
      buffers may become full.  Packet priority is used to select
      which packets are stored or discarded.

   The routing protocol MUST also support different metric types for
   each link used to compute the path according to some objective
   function (e.g. minimize latency).

   For these reasons, the ROLL routing infrastructure is required to
   compute and update constrained routes on demand, and it can be
   expected that this model will become more prevalent for field
   device to field device connectivity as well as for some field
   device to infrastructure device connectivity over time.

   Industrial application data flows between field devices are not
   necessarily symmetric.  In particular, asymmetrical cost and
   unidirectional routes are common for published data and alerts,
   which represent the major part of the sensor traffic.  The routing
   protocol MUST be able to compute a set of unidirectional routes
   with potentially different costs that are composed of one or more
   non-congruent paths.

3.2.  Configurable Application Requirement

   Time-varying user requirements for latency and bandwidth may
   require changes in the provisioning of the underlying L2 protocols.
   A technician may initiate a query/response session or bulk transfer
   to diagnose or configure a field device.  A level sensor device may
   need to perform a calibration and send a bulk file to a plant.  The
   routing protocol MUST route on paths that are changed to
   appropriately provision the application requirements.  The routing
   protocol MUST support the ability to recompute paths based on
   underlying link attributes/metrics that may change dynamically.

3.3.  Different Routes for Different Flows

   Because different service categories have different service
   requirements, it is often desirable to have different routes for
   different data flows between the same two endpoints.  For example,
   alarm or periodic data from A to Z may require path diversity with
   specific latency and reliability.  A file transfer between A and Z
   may not need path diversity.  The routing algorithm MUST be able to
   generate different routes with different characteristics (e.g.
   optimized according to different costs).
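
   The following non-normative sketch illustrates the intent of
   Sections 3.1 through 3.3: the same connectivity graph yields
   different routes when a flow-specific objective function is applied
   to the per-link metrics.  The topology, metric names and weighting
   factors are hypothetical examples, not part of the requirements.

      # Non-normative sketch: the same graph yields different routes
      # when a flow-specific objective function is applied to per-link
      # metrics.  Topology, metric names and weights are hypothetical.
      import heapq

      links = {
          ("A", "B"): {"latency_ms": 20, "loss": 0.02,  "energy": 1.0},
          ("B", "Z"): {"latency_ms": 20, "loss": 0.02,  "energy": 1.0},
          ("A", "C"): {"latency_ms": 60, "loss": 0.001, "energy": 3.0},
          ("C", "Z"): {"latency_ms": 60, "loss": 0.001, "energy": 3.0},
      }

      objective = {
          # Alarm flow: penalize lossy links heavily, then low latency.
          "alarm": lambda m: m["loss"] * 10000 + m["latency_ms"],
          # Bulk transfer: best-effort flow that minimizes energy cost.
          "bulk":  lambda m: m["energy"],
      }

      def neighbors(node):
          for (a, b), metrics in links.items():
              if a == node:
                  yield b, metrics
              elif b == node:
                  yield a, metrics

      def route(src, dst, cost):
          # Plain Dijkstra over the per-flow cost.
          queue, visited = [(0.0, src, [src])], set()
          while queue:
              c, node, path = heapq.heappop(queue)
              if node == dst:
                  return path, c
              if node in visited:
                  continue
              visited.add(node)
              for nxt, metrics in neighbors(node):
                  if nxt not in visited:
                      heapq.heappush(
                          queue, (c + cost(metrics), nxt, path + [nxt]))
          return None, float("inf")

      for flow, cost in objective.items():
          print(flow, route("A", "Z", cost))
      # alarm -> A-C-Z (low-loss path); bulk -> A-B-Z (low-energy path)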

4.  Reliability Requirements

   LLN reliability comprises several distinct aspects:

   1) Availability of source to destination connectivity when the
      application needs it, expressed as the number of successes /
      number of attempts,

   2) Availability of source to destination connectivity when the
      application might need it, expressed as the number of potential
      failures / available bandwidth,

   3) Ability, expressed as the number of successes divided by the
      number of attempts, to get data delivered from source to
      destination within a capped time,

   4) How well a network (serving many applications) achieves
      end-to-end delivery of packets within a bounded latency,

   5) Trustworthiness of data that is delivered to the sinks,

   6) ...

   This makes quantifying reliability the equivalent of plotting it on
   a three-plus-dimensional graph.  Different applications have
   different requirements, and expressing reliability as a
   one-dimensional parameter, like 'the reliability of my wireless
   network is 99.9%', often creates more confusion than clarity.

   The impact of not receiving sensor data due to sporadic network
   outages can be devastating if this happens unnoticed.  However, if
   destinations that expect periodic sensor data or alarm status
   updates fail to get them, then these systems can automatically take
   appropriate actions that prevent dangerous situations.  Depending
   on the wireless application, the appropriate action ranges from
   initiating a shutdown within 100 ms, to using a last known good
   value for as many as N successive samples, to sending an operator
   into the plant to collect monthly data in the conventional way,
   i.e. with some portable sensor, paper and a clipboard.

   The impact of receiving corrupted data, and not being able to
   detect that received data is corrupt, is often more dangerous.
   Data corruption can come from random bit errors (i.e., white
   noise), from occasional bursty interference sources like
   thunderstorms or leaky microwave ovens, but also from conscious
   attacks by adversaries.

   Another critical aspect for the routing is the capability to ensure
   a maximum disruption time and route maintenance.  The maximum
   disruption time is the time it takes at most for a specific path to
   be restored when broken.  Route maintenance ensures that a path is
   monitored so that it is restored when broken within the maximum
   disruption time.  Maintenance should also ensure that a path
   continues to provide the service for which it was established, for
   instance in terms of bandwidth, jitter and latency.

   In industrial applications, availability is usually defined with
   respect to end-to-end delivery of packets within a bounded latency.
   Availability requirements vary over many orders of magnitude.  Some
   non-critical monitoring applications may tolerate an availability
   of less than 90% with hours of latency.  Most industrial standards,
   such as HART7, have set user availability expectations at 99.9%.
   Regulatory requirements are a driver for some industrial
   applications.  Regulatory monitoring requires high data integrity
   because lost data is assumed to be out of compliance and subject to
   fines.  This can drive up either availability or trustworthiness
   requirements.

   Because LLN link stability is often low, path diversity is
   critical.  Hop-by-hop link diversity is used to improve
   latency-bounded reliability by sending data over diverse paths.
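
   As a rough, non-normative illustration of why diverse paths help
   meet a latency-bounded availability target, the sketch below
   assumes (optimistically) that path failures are independent and
   that each diverse copy of a packet has the same chance of arriving
   in time; both the 0.95 per-path figure and the 99.9% target are
   hypothetical.

      # Non-normative sketch: effect of path diversity on latency-
      # bounded delivery, assuming independent path failures and a
      # hypothetical 0.95 chance that any single copy arrives in time.
      def combined_availability(per_path, n_paths):
          """Probability that at least one of n diverse copies arrives
          within the deadline."""
          return 1.0 - (1.0 - per_path) ** n_paths

      for n in (1, 2, 3):
          print(n, "path(s):", combined_availability(0.95, n))
      # Prints roughly 0.95, 0.9975 and 0.999875: under these assumed
      # numbers, three diverse paths exceed a 99.9% availability target.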

   Because data from field devices are aggregated and funneled at the
   LLN access point before they are routed to plant applications, LLN
   access point redundancy is an important factor in overall
   availability.  A route that connects a field device to a plant
   application may have multiple paths that go through more than one
   LLN access point.  The routing protocol MUST be able to compute
   paths of not-necessarily-equal cost toward a given destination so
   as to enable load balancing across a variety of paths.  The
   availability of each path in a multipath route can change over
   time.  Hence, it is important to measure the availability on a
   per-path basis and select a path (or paths) according to the
   availability requirements.

5.  Device-Aware Routing Requirements

   Wireless LLN nodes in industrial environments are powered by a
   variety of sources.  Battery operated devices with lifetime
   requirements of at least five years are the most common.  Battery
   operated devices have a cap on their total energy; they typically
   can report an estimate of remaining energy, and typically do not
   have constraints on the short-term average power consumption.
   Energy scavenging devices are more complex.  These systems contain
   both a power scavenging device (such as solar, vibration, or
   temperature difference) and an energy storage device, such as a
   rechargeable battery or a capacitor.  These systems, therefore,
   have limits on the long-term average power consumption (which
   cannot exceed the average scavenged power over the same interval)
   as well as short-term limits imposed by the energy storage
   requirements.  For solar-powered systems, the energy storage system
   is generally designed to provide days of power in the absence of
   sunlight.  Many industrial sensors run off of a 4-20 mA current
   loop, and can scavenge on the order of milliwatts from that source.
   Vibration monitoring systems are a natural choice for vibration
   scavenging, which typically only provides tens or hundreds of
   microwatts.  Due to industrial temperature ranges and desired
   lifetimes, the choices of energy storage devices can be limited,
   and the resulting stored energy is often comparable to the energy
   cost of sending or receiving a packet rather than the energy of
   operating the node for several days.  And of course, some nodes
   will be line-powered.

   Example 1: solar panel, lead-acid battery sized for two weeks of
   rain.

   Example 2: vibration scavenger, 1 mF tantalum capacitor.

   Field devices have limited resources.  Low-power, low-cost devices
   have limited memory for storing route information.  Typical field
   devices will have a finite number of routes they can support for
   their embedded sensor/actuator application and for forwarding other
   devices' packets in a mesh network slotted-link.

   Users may strongly prefer that the same device have different
   lifetime requirements in different locations.  A sensor monitoring
   a non-critical parameter in an easily accessed location may have a
   lifetime requirement that is shorter and tolerates more statistical
   variation than that of a mission-critical sensor in a hard-to-reach
   place that requires a plant shutdown in order to replace.

   The routing algorithm MUST support node-constrained routing (e.g.
   taking into account the existing energy state as a node
   constraint).  Node constraints include power and memory, as well as
   constraints placed on the device by the user, such as battery life.
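
   The following non-normative sketch shows one way a node constraint
   such as remaining energy could be folded into next-hop selection.
   The candidate list, the 10% energy floor and the cost penalty are
   hypothetical illustrations, not requirements on any particular
   algorithm.

      # Non-normative sketch: folding a node constraint (remaining
      # energy) into next-hop selection.  Candidates, the 10% energy
      # floor and the cost penalty are hypothetical.
      candidates = [
          # (next_hop, link_cost, remaining_energy_fraction, line_powered)
          ("R1", 1.0, 0.08, False),
          ("R2", 1.4, 0.60, False),
          ("R3", 2.0, 1.00, True),
      ]

      MIN_ENERGY = 0.10   # user constraint: spare batteries below 10%

      def node_aware_cost(link_cost, energy, line_powered):
          if line_powered:
              return link_cost          # no energy penalty for powered nodes
          return link_cost / energy     # scarce energy inflates the cost

      eligible = [c for c in candidates if c[3] or c[2] >= MIN_ENERGY]
      best = min(eligible, key=lambda c: node_aware_cost(c[1], c[2], c[3]))
      print("selected next hop:", best[0])
      # R1 is excluded outright; R3 (line-powered) wins over R2 because
      # the battery penalty outweighs R2's lower raw link cost.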

6.  Broadcast/Multicast

   Some existing industrial plant applications do not use broadcast or
   multicast addressing to communicate to field devices.  Unicast
   address support is sufficient for them.

   In some other industrial process automation environments, multicast
   over IP is used to deliver to multiple nodes that may be
   functionally similar or not.  Example usages are:

   1) Delivery of alerts to multiple similar servers in an automation
      control room.  Alerts are multicast to a group address based on
      the part of the automation process where the alerts arose (e.g.,
      the multicast address
      "all-nodes-interested-in-alerts-for-process-unit-X").  This is
      always a restricted-scope multicast, not a broadcast.

   2) Delivery of common packets to multiple routers over a backbone,
      where the packets result in each receiving router initiating a
      multicast (sometimes as a full broadcast) within the LLN.  For
      instance, this can be a byproduct of having potentially
      physically separated backbone routers that can inject messages
      into different portions of the same larger LLN.

   3) Publication of measurement data to more than one subscriber.
      This feature is useful in some peer-to-peer control
      applications.  For example, level position may be useful to a
      controller that operates the flow valve and also to the overfill
      alarm indicator.  Both controller and alarm indicator would
      receive the same publication sent as a multicast by the level
      gauge.

   These uses require a 1:N security mechanism as well; they are not
   of any use if the end-to-end security is only point-to-point.

   It is quite possible that first-generation wireless automation
   field networks can be adequately useful without these capabilities,
   but in the near future, wireless field devices with communication
   controllers and protocol stacks will require control and
   configuration, such as firmware downloading, that may benefit from
   broadcast or multicast addressing.

   The routing protocol SHOULD support broadcast or multicast
   addressing.

7.  Route Establishment Time

   During network formation, installers with no networking skill must
   be able to determine if their devices are "in the network" with
   sufficient connectivity to perform their function.  Installers will
   have sufficient skill to provision the devices with a sample rate
   or activity profile.  The routing algorithm MUST find the
   appropriate route(s) and report success or failure within several
   minutes, and SHOULD report success or failure within tens of
   seconds.

   Network connectivity in real deployments is always time varying,
   with time constants from seconds to months.  So long as the
   underlying connectivity has not been compromised, this link churn
   should not substantially affect network operation.  The routing
   algorithm MUST respond to normal link failure rates with routes
   that meet the service requirements (especially latency) throughout
   the routing response.  The routing algorithm SHOULD always be in
   the process of recalculating the route in response to changing link
   statistics.  The routing algorithm MUST recalculate the paths when
   field devices change due to insertion, removal or failure, and this
   recalculation MUST NOT cause latencies greater than the specified
   constraints (typically seconds to minutes).
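
   As a non-normative illustration of continuously tracking link
   statistics without reacting to every fluctuation, the sketch below
   smooths an ETX-like link cost and only triggers a route
   recalculation when the smoothed value degrades past a threshold.
   The smoothing factor, threshold and sample values are hypothetical.

      # Non-normative sketch: smooth an ETX-like link cost and trigger
      # a route recalculation only when it degrades past a threshold,
      # so ordinary link churn does not cause constant rerouting.
      class LinkMonitor:
          def __init__(self, alpha=0.2, degrade_ratio=1.5):
              self.alpha = alpha            # EWMA smoothing factor
              self.ratio = degrade_ratio    # degradation forcing reroute
              self.smoothed = None          # smoothed link cost
              self.baseline = None          # cost when route was computed

          def report(self, sample):
              """Feed one link-cost sample; True means reroute needed."""
              if self.smoothed is None:
                  self.smoothed = self.baseline = sample
                  return False
              self.smoothed = ((1 - self.alpha) * self.smoothed
                               + self.alpha * sample)
              if self.smoothed > self.baseline * self.ratio:
                  self.baseline = self.smoothed   # route recomputed here
                  return True
              return False

      monitor = LinkMonitor()
      for sample in (1.0, 1.1, 1.0, 3.0, 3.2, 3.1):  # link degrades later
          if monitor.report(sample):
              print("recalculate route; smoothed cost is now",
                    round(monitor.smoothed, 2))
      # Prints once: brief fluctuations are absorbed, while the
      # sustained degradation triggers a recalculation.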

8.  Mobility

   Various economic factors have contributed to a reduction of trained
   workers in the plant.  The industry as a whole appears to be trying
   to solve this problem with what is called the "wireless worker".
   Carrying a PDA or something similar, this worker will be able to
   accomplish more work in less time than the older, better-trained
   workers that he or she replaces.  Whether or not the premise is
   valid, the use case is commonly presented: the worker will be
   wirelessly connected to the plant IT system to download
   documentation, instructions, etc., and will need to be able to
   connect "directly" to the sensors and control points in or near the
   equipment on which he or she is working.  It is possible that this
   "direct" connection could come via the normal LLN data collection
   network.  This connection is likely to require higher bandwidth and
   lower latency than the normal data collection operation.

   It is still undecided whether these PDAs will use the LLN directly
   to talk to field sensors, or whether they will rather use other
   wireless connectivity that proxies back into the field, or to
   anywhere else, through the user interfaces typically used for plant
   historians, asset management systems, and the like.

   The routing protocol SHOULD support the wireless worker with fast
   network connection times of a few seconds, and low command and
   response latencies to the plant behind the LLN access points, to
   applications, and to field devices.  The routing protocol SHOULD
   also support the bandwidth allocation for bulk transfers between
   the field device and the handheld device of the wireless worker.
   The routing protocol SHOULD support walking speeds for maintaining
   network connectivity as the handheld device changes position in the
   wireless network.

   Some field devices will be mobile.  These devices may be located on
   moving parts such as rotating components or they may be located on
   vehicles such as cranes or fork lifts.  The routing protocol SHOULD
   support vehicular speeds of up to 35 km/h.

9.  Manageability

   The process and control industry is manpower constrained.  The
   aging demographics of plant personnel are causing a looming
   manpower problem for industry across many markets.  The goal for
   the industrial networks is to have the installation process not
   require any new skills for the plant personnel.  The person would
   install the wireless sensor or wireless actuator the same way the
   wired sensor or wired actuator is installed, except that the step
   of connecting the wire is eliminated.

   Most users in fact demand even further simplified provisioning
   methods, whereby any new device automatically connects and reports
   to the LLN access point.  This requires the availability of open
   and untrusted side channels for new joiners, and it requires strong
   and automated authentication so that networks can automatically
   accept or reject new joiners.  Ideally, for a user, adding new
   devices should be as easy as dragging and dropping an icon from a
   pool of authenticated new joiners into a pool for the wired domain
   that this new sensor should connect to.
   Under the hood, invisible to the user, auditable security
   mechanisms should take care of new device authentication and secret
   join key distribution.  These more sophisticated 'over the air'
   secure provisioning methods should eliminate the use of traditional
   configuration tools for setting up devices prior to being ready to
   securely join an LLN access point.

   There will be many new applications where, even without any human
   intervention at the plant, devices that have never been on site
   before should be allowed, based on their credentials and crypto
   capabilities, to connect anyway.  Examples are third-party road
   tankers, rail cargo containers with overfill protection sensors, or
   consumer cars that need to be refueled with hydrogen by robots at
   future petrol stations.

   The routing protocol for LLNs is expected to be easy to deploy and
   manage.  Because the number of field devices in a network is large,
   provisioning the devices manually may not make sense.  The routing
   MAY require commissioning of information about the node itself,
   like identity, security tokens, radio standards and frequencies,
   etc.  The routing protocol SHOULD NOT require preprovisioning of
   information about the environment where the node will be deployed.
   The routing protocol MUST enable the full discovery and setup of
   the environment (available links, selected peers, reachable
   network).  The protocol also MUST support the distribution of
   configuration from a centralized management controller if
   operator-initiated configuration change is allowed.

10.  Security

   Given that wireless sensor networks in industrial automation
   operate in systems that have substantial financial and human safety
   implications, security is of considerable concern.  Levels of
   security violation that are tolerated as a "cost of doing business"
   in the banking industry are not acceptable when in some cases
   literally thousands of lives may be at risk.

   Security is easily confused with a guarantee of availability.  When
   discussing wireless security, it is important to distinguish
   clearly between the risk of temporarily losing connectivity, say
   due to a thunderstorm, and the risks associated with knowledgeable
   adversaries attacking a wireless system.  The conscious attacks
   need to be split between 1) attacks on the actual application
   served by the wireless devices and 2) attacks that exploit the
   presence of a wireless access point that may provide connectivity
   onto legacy wired plant networks, i.e. attacks that have little to
   do with the wireless devices in the LLNs.  The second type of
   attack, where access points might be wireless backdoors allowing an
   attacker outside the fence to access typically non-secured process
   control and/or office networks, is typically the one that creates
   exposures where lives are at risk.  This implies that the LLN
   access point on its own must possess functionality that guarantees
   domain segregation, and thus prohibits many types of traffic
   further upstream.

   Current generation industrial wireless device manufacturers are
   specifying security at the MAC layer and the transport layer.  A
   shared key is used to authenticate messages at the MAC layer.  At
   the transport layer, commands are encrypted with unique,
   randomly-generated, end-to-end session keys.  HART7 and ISA100.11a
   are examples of security systems for industrial wireless networks.
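
   Purely as a non-normative illustration of this layering, and
   explicitly not the HART7 or ISA100.11a construction, the sketch
   below protects a command end-to-end with a session key while each
   radio hop authenticates the resulting frame with a shared network
   key, using generic off-the-shelf primitives (AES-GCM from the
   third-party 'cryptography' package and HMAC-SHA-256 from the
   standard library).  The command string is a made-up example.

      # Non-normative sketch of the layering only; NOT the HART7 or
      # ISA100.11a construction.
      import hmac, hashlib, os
      from cryptography.hazmat.primitives.ciphers.aead import AESGCM

      network_key = os.urandom(16)                       # shared on the LLN
      session_key = AESGCM.generate_key(bit_length=128)  # end-to-end

      def transport_protect(command: bytes) -> bytes:
          """End-to-end: encrypt/authenticate with the session key."""
          nonce = os.urandom(12)
          return nonce + AESGCM(session_key).encrypt(nonce, command, b"")

      def mac_layer_mic(frame: bytes) -> bytes:
          """Hop-by-hop: integrity code using the shared network key."""
          return hmac.new(network_key, frame, hashlib.sha256).digest()[:8]

      frame = transport_protect(b"set-valve 42%")
      mic = mac_layer_mic(frame)

      # Each forwarding field device can verify (and re-compute) the
      # MAC-layer MIC but cannot read the command; only the destination
      # holding the session key can decrypt it.
      assert hmac.compare_digest(mic, mac_layer_mic(frame))
      nonce, ciphertext = frame[:12], frame[12:]
      assert AESGCM(session_key).decrypt(nonce, ciphertext, b"") == \
          b"set-valve 42%"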

   Although such symmetric key encryption and authentication
   mechanisms at the MAC and transport layers may protect reasonably
   well during the lifecycle, the initial network boot (provisioning)
   step in many cases requires more sophisticated steps to securely
   land the initial secret keys in field devices.  It is vital that,
   also during these steps, the ease of deployment and the freedom to
   mix and match products from different suppliers do not complicate
   life for those who deploy and commission.  Given average skill
   levels in the field, and given serious resource constraints in the
   market, investing a little bit more in sensor node hardware and
   software so that new devices can automatically be deemed
   trustworthy, and thus automatically join the domains that they
   should join, with just one drag and drop action for those in charge
   of deploying, will yield faster adoption and proliferation of the
   LLN technology.

   Industrial plants may not maintain the same level of physical
   security for field devices that is associated with traditional
   network sites such as locked IT centers.  In industrial plants it
   must be assumed that the field devices have marginal physical
   security and might be compromised.  The routing protocol SHOULD
   limit the risk incurred by one node being compromised, for instance
   by proposing non-congruent paths for a given route and balancing
   the traffic across the network.

   The routing protocol SHOULD compartmentalize the trust placed in
   field devices so that a compromised field device does not destroy
   the security of the whole network.  The routing MUST be configured
   and managed using secure messages and protocols that prevent
   outsider attacks and limit insider attacks from field devices
   installed in insecure locations in the plant.

   Wireless typically forces the abandonment of classical 'by
   perimeter' thinking when trying to secure network domains.
   Wireless nodes in LLNs should thus be regarded as little islands
   with trusted kernels, situated in an ocean of untrusted
   connectivity, an ocean that might be full of pirate ships.
   Consequently, confidence in node identity and the ability to
   challenge the authenticity of source node credentials become more
   relevant.  Cryptographic boundaries inside devices that clearly
   demarcate the border between trusted and untrusted areas need to be
   drawn.  Protection against compromise of the cryptographic
   boundaries inside the hardware of devices is outside the scope of
   this document.

11.  IANA Considerations

   This document includes no request to IANA.

12.  Acknowledgements

   Many thanks to Rick Enns, Alexander Chernoguzov and Chol Su Kang
   for their contributions.

13.  References

13.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

13.2.  Informative References

   [I-D.ietf-roll-terminology]
              Vasseur, J., "Terminology in Low power And Lossy
              Networks", draft-ietf-roll-terminology-00 (work in
              progress), October 2008.

13.3.  External Informative References

   [HART]     www.hartcomm.org, "Highway Addressable Remote
              Transducer", a group of specifications for industrial
              process and control devices administered by the HART
              Foundation.

   [ISA100.11a]
              ISA, "ISA100, Wireless Systems for Automation", May
              2008, <http://www.isa.org/Community/
              SP100WirelessSystemsforAutomation>.

Authors' Addresses

   Kris Pister (editor)
   Dust Networks
   30695 Huntwood Ave.
   Hayward, 94544
   USA

   Email: kpister@dustnetworks.com

   Pascal Thubert (editor)
   Cisco Systems
   Village d'Entreprises Green Side
   400, Avenue de Roumanille
   Batiment T3
   Biot - Sophia Antipolis  06410
   FRANCE

   Phone: +33 497 23 26 34
   Email: pthubert@cisco.com

   Sicco Dwars
   Shell Global Solutions International B.V.
   Sir Winston Churchilllaan 299
   Rijswijk  2288 DC
   Netherlands

   Phone: +31 70 447 2660
   Email: sicco.dwars@shell.com

   Tom Phinney
   5012 W. Torrey Pines Circle
   Glendale, AZ  85308-3221
   USA

   Phone: +1 602 938 3163
   Email: tom.phinney@cox.net