idnits 2.17.1 draft-ietf-roll-indus-routing-reqs-01.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- ** It looks like you're using RFC 3978 boilerplate. You should update this to the boilerplate described in the IETF Trust License Policy document (see https://trustee.ietf.org/license-info), which is required now. -- Found old boilerplate from RFC 3978, Section 5.1 on line 19. -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on line 1063. -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 1074. -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 1081. -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 1087. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust Copyright Line does not match the current year == Using lowercase 'not' together with uppercase 'MUST', 'SHALL', 'SHOULD', or 'RECOMMENDED' is not an accepted usage according to RFC 2119. Please use uppercase 'NOT' together with RFC 2119 keywords (if that is what you mean). Found 'MUST not' in this paragraph: Network connectivity in real deployments is always time varying, with time constants from seconds to months. So long as the underlying connectivity has not been compromised, this link churn should not substantially affect network operation. The routing algorithm MUST respond to normal link failure rates with routes that meet the Service requirements (especially latency) throughout the routing response. The routing algorithm SHOULD always be in the process of optimizing the system in response to changing link statistics. The routing algorithm MUST re-optimize the paths when field devices change due to insertion, removal or failure, and this re-optimization MUST not cause latencies greater than the specified constraints (typically seconds to minutes). -- The document seems to lack a disclaimer for pre-RFC5378 work, but may have content which was first submitted before 10 November 2008. If you have contacted all the original authors and they are all willing to grant the BCP78 rights to the IETF Trust, then this is fine, and you can ignore this comment. If not, you may need to add the pre-RFC5378 disclaimer. (See the Legal Provisions document at https://trustee.ietf.org/license-info for more information.) -- The document date (July 8, 2008) is 5770 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'HART' is mentioned on line 1003, but not defined == Unused Reference: 'I-D.culler-rl2n-routing-reqs' is defined on line 995, but no explicit reference was found in the text Summary: 1 error (**), 0 flaws (~~), 4 warnings (==), 7 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 Networking Working Group K. Pister, Ed. 
3 Internet-Draft Dust Networks 4 Intended status: Informational P. Thubert, Ed. 5 Expires: January 9, 2009 Cisco Systems 6 S. Dwars 7 Shell 8 T. Phinney 9 July 8, 2008 11 Industrial Routing Requirements in Low Power and Lossy Networks 12 draft-ietf-roll-indus-routing-reqs-01 14 Status of this Memo 16 By submitting this Internet-Draft, each author represents that any 17 applicable patent or other IPR claims of which he or she is aware 18 have been or will be disclosed, and any of which he or she becomes 19 aware will be disclosed, in accordance with Section 6 of BCP 79. 21 Internet-Drafts are working documents of the Internet Engineering 22 Task Force (IETF), its areas, and its working groups. Note that 23 other groups may also distribute working documents as Internet- 24 Drafts. 26 Internet-Drafts are draft documents valid for a maximum of six months 27 and may be updated, replaced, or obsoleted by other documents at any 28 time. It is inappropriate to use Internet-Drafts as reference 29 material or to cite them other than as "work in progress." 31 The list of current Internet-Drafts can be accessed at 32 http://www.ietf.org/ietf/1id-abstracts.txt. 34 The list of Internet-Draft Shadow Directories can be accessed at 35 http://www.ietf.org/shadow.html. 37 This Internet-Draft will expire on January 9, 2009. 39 Abstract 41 Wireless, low power field devices enable industrial users to 42 significantly increase the amount of information collected and the 43 number of control points that can be remotely managed. The 44 deployment of these wireless devices will significantly improve the 45 productivity and safety of the plants while increasing the efficiency 46 of the plant workers. For wireless devices to have a significant 47 advantage over wired devices in an industrial environment the 48 wireless network needs to have three qualities: low power, high 49 reliability, and easy installation and maintenance. The aim of this 50 document is to analyze the requirements for the routing protocol used 51 for low power and lossy networks (L2N) in industrial environments. 53 Requirements Language 55 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 56 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 57 document are to be interpreted as described in RFC 2119 [RFC2119]. 59 Table of Contents 61 1. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 62 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 63 2.1. Applications and Traffic Patterns . . . . . . . . . . . . 5 64 2.2. Network Topology of Industrial Applications . . . . . . . 7 65 2.2.1. The Physical Topology . . . . . . . . . . . . . . . . 9 66 2.2.2. Logical Topologies . . . . . . . . . . . . . . . . . . 10 67 3. Service Requirements . . . . . . . . . . . . . . . . . . . . . 12 68 3.1. Configurable Application Requirement . . . . . . . . . . . 13 69 3.2. Different Routes for Different Flows . . . . . . . . . . . 14 70 4. Reliability Requirements . . . . . . . . . . . . . . . . . . . 14 71 5. Device-Aware Routing Requirements . . . . . . . . . . . . . . 15 72 6. Broadcast/Multicast . . . . . . . . . . . . . . . . . . . . . 17 73 7. Route Establishment Time . . . . . . . . . . . . . . . . . . . 17 74 8. Mobility . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 75 9. Manageability . . . . . . . . . . . . . . . . . . . . . . . . 19 76 10. Security . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 77 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 21 78 12. 
Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 21 79 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 22 80 13.1. Normative References . . . . . . . . . . . . . . . . . . . 22 81 13.2. Informative References . . . . . . . . . . . . . . . . . . 22 82 13.3. External Informative References . . . . . . . . . . . . . 22 83 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22 84 Intellectual Property and Copyright Statements . . . . . . . . . . 24 86 1. Terminology 88 Actuator: a field device that moves or controls plant equipment. 90 Closed Loop Control: A process whereby a device controller controls 91 an actuator based on information sensed by one or more field devices. 93 Downstream: Data direction traveling from the plant application to 94 the field device. 96 PCD: Process Control Domain. The 'legacy' wired plant Network. 98 OD: Office Domain. The office Network. 100 Field Device: physical devices placed in the plant's operating 101 environment (both RF and environmental). Field devices include 102 sensors and actuators as well as network routing devices and L2N 103 access points in the plant. 105 HART: "Highway Addressable Remote Transducer", a group of 106 specifications for industrial process and control devices 107 administered by the HART Foundation (see [HART]). The latest version 108 for the specifications is HART7 which includes the additions for 109 WirelessHART. 111 ISA: "International Society of Automation". ISA is an ANSI 112 accredited standards-making society. ISA100 is an ISA committee 113 whose charter includes defining a family of standards for industrial 114 automation. [ISA100.11a] is a working group within ISA100 that is 115 working on a standard for monitoring and non-critical process control 116 applications. 118 L2N Access Point: The L2N access point is an infrastructure device 119 that connects the low power and lossy network system to a plant's 120 backbone network. 122 Open Loop Control: A process whereby a plant operator manually 123 manipulates an actuator over the network where the decision is 124 influenced by information sensed by field devices. 126 Plant Application: The plant application is a computer process 127 running in the plant that communicates with field devices to perform 128 tasks that may include control, monitoring and data gathering. 130 Upstream: Data direction traveling from the field device to the plant 131 application. 133 RL2N: Routing in Low power and Lossy Networks. 135 2. Introduction 137 Wireless, low-power field devices enable industrial users to 138 significantly increase the amount of information collected and the 139 number of control points that can be remotely managed. The 140 deployment of these wireless devices will significantly improve the 141 productivity and safety of the plants while increasing the efficiency 142 of the plant workers. 144 Wireless field devices enable expansion of networked points by 145 appreciably reducing cost of installing a device. The cost 146 reductions come from eliminating cabling costs and simplified 147 planning. Cabling also carries an overhead cost associated with 148 planning the installation, determining where the cable has to run, 149 and interfacing with the various organizations required to coordinate 150 its deployment. Doing away with the network and power cables reduces 151 the planning and administrative overhead of installing a device. 
153 For wireless devices to have a significant advantage over wired 154 devices in an industrial environment, the wireless network needs to 155 have three qualities: low power, high reliability, and easy 156 installation and maintenance. The routing protocol used for low 157 power and lossy networks (L2N) is important to fulfilling these 158 goals. 160 Industrial automation is segmented into two distinct application 161 spaces, known as "process" or "process control" and "discrete 162 manufacturing" or "factory automation". In industrial process 163 control, the product is typically a fluid (oil, gas, chemicals ...). 164 In factory automation or discrete manufacturing, the products are 165 individual elements (screws, cars, dolls). While there is some 166 overlap of products and systems between these two segments, they are 167 surprisingly separate communities. The specifications targeting 168 industrial process control tend to have more tolerance for network 169 latency than what is needed for factory automation. 171 Irrespective of this different 'process' and 'discrete' plant nature 172 both plant types will have similar needs for automating the 173 collection of data that used to be collected manually, or was not 174 collected before. Examples are wireless sensors that report the 175 state of a fuse, report the state of a luminary, HVAC status, report 176 vibration levels on pumps, report man-down, and so on. 178 Other novel application arenas that equally apply to both 'process' 179 and 'discrete' involve mobile sensors that roam in and out of plants, 180 such as active sensor tags on containers or vehicles. 182 Some if not all of these applications will need to be served by the 183 same low power and lossy wireless network technology. This may mean 184 several disconnected, autonomous L2N networks connecting to multiple 185 hosts, but sharing the same ether. Interconnecting such networks, if 186 only to supervise channel and priority allocations, or to fully 187 synchronize, or to share path capacity within a set of physical 188 network components may be desired, or may not be desired for 189 practical reasons, such as e.g. cyber security concerns in relation 190 to plant safety and integrity. 192 All application spaces desire battery operated networks of hundreds 193 of sensors and actuators communicating with L2N access points. In an 194 oil refinery, the total number of devices might exceed one million, 195 but the devices will be clustered into smaller networks that in most 196 cases interconnect and report to an existing plant network 197 infrastructure. 199 Existing wired sensor networks in this space typically use 200 communication protocols with low data rates, from 1,200 baud (e.g. 201 wired HART) to the one to two hundred Kbps range for most of the 202 others. The existing protocols are often master/slave with command/ 203 response. 205 2.1. Applications and Traffic Patterns 207 The industrial market classifies process applications into three 208 broad categories and six classes. 
o  Safety

   *  Class 0: Emergency action - Always a critical function

o  Control

   *  Class 1: Closed loop regulatory control - Often a critical function

   *  Class 2: Closed loop supervisory control - Usually a non-critical function

   *  Class 3: Open loop control - Operator takes action and controls the actuator (human in the loop)

o  Monitoring

   *  Class 4: Alerting - Short-term operational effect (for example event-based maintenance)

   *  Class 5: Logging and downloading / uploading - No immediate operational consequence (e.g., history collection, sequence-of-events, preventive maintenance)

Safety-critical functions affect the basic safety integrity of the plant. These normally dormant functions kick in only when process control systems, or their operators, have failed. By design and by regular interval inspection, they have a well-understood probability of failure on demand, typically in the range of once per 10-1000 years.

Timely delivery of messages becomes more important as the class number decreases.

Note that for a control application, jitter is just as important as latency and has the potential to destabilize control algorithms.

Industrial users are interested in deploying wireless networks for the monitoring classes 4 and 5, and in the non-critical portions of classes 2 and 3.

Classes 4 and 5 also include asset monitoring and tracking, which covers equipment monitoring and is essentially separate from process monitoring. An example of equipment monitoring is the recording of motor vibrations to detect bearing wear. However, similar sensors detecting excessive vibration levels could be used as safeguarding loops that immediately initiate a trip, and thus end up being class 0.

In the near future, most low power and lossy network systems will be used for low-frequency data collection. Packets containing samples will be generated continuously, and 90% of the market is covered by packet rates of between 1/s and 1/hour, with the average under 1/min. In industrial process applications, these sensors include temperature, pressure, fluid flow, tank level, and corrosion. Some sensors are bursty, such as vibration monitors that may generate and transmit tens of kilobytes (hundreds to thousands of packets) of time-series data at reporting rates of minutes to days.

Almost all of these sensors will have built-in microprocessors that may detect alarm conditions. Time-critical alarm packets are expected to be granted a lower latency than periodic sensor data streams.

Some devices will transmit a log file every day, again with typically tens of Kbytes of data. For these applications there is very little "downstream" traffic coming from the L2N access point and traveling to particular sensors. During diagnostics, however, a technician may be investigating a fault from a control room and expect "low" (human-tolerable) latency in a command/response mode.

Low-rate control, often with a "human in the loop" (also referred to as "open loop"), is implemented via communication to a control room, because that is where the human in the loop is.
The sensor data 284 makes its way through the L2N access point to the centralized 285 controller where it is processed, the operator sees the information 286 and takes action, and the control information is then sent out to the 287 actuator node in the network. 289 In the future, it is envisioned that some open loop processes will be 290 automated (closed loop) and packets will flow over local loops and 291 not involve the L2N access point. These closed loop controls for 292 non-critical applications will be implemented on L2Ns. Non-critical 293 closed loop applications have a latency requirement that can be as 294 low as 100 ms but many control loops are tolerant of latencies above 295 1 s. 297 More likely though is that loops will be closed in the field 298 entirely, which in most cases eliminates the need for having wireless 299 links within the control loop. Most control loops have sensors and 300 actuators within such proximity that a wire between them remains the 301 most sensible option from an economic point of view. This 'control 302 in the field' architecture is already common practice with wired 303 field busses. An 'upstream' wireless link would only be used to 304 influence the in-field controller settings, and to occasionally 305 capture diagnostics. Even though the link back to a control room 306 might be a wireless and L2N-ish, this architecture reduces the tight 307 latency and availability requirements for the wireless links. 309 In fast control, tens of milliseconds of latency is typical. In many 310 of these systems, if a packet does not arrive within the specified 311 interval, the system enters an emergency shutdown state, often with 312 substantial financial repercussions. For a one-second control loop 313 in a system with a mean-time between shutdowns target of 30 years, 314 the latency requirement implies nine 9s of reliability. Given such 315 exposure, given the intrinsic vulnerability of wireless link 316 availability, and given the emergence of control in the field 317 architectures, most users tend to not aim for fast closed loop 318 control with wireless links within that fast loop. 320 2.2. Network Topology of Industrial Applications 322 Although network topology is difficult to generalize, the majority of 323 existing applications can be met by networks of 10 to 200 field 324 devices and maximum number of hops from two to twenty. It is assumed 325 that the field devices themselves will provide routing capability for 326 the network, and additional repeaters/routers will not be required in 327 most cases. 329 For most industrial applications, a manager, gateway or backbone 330 router acts as a sink for the wireless sensor network. The vast 331 majority of the traffic is real time publish/subscribe sensor data 332 from the field devices over a L2N towards one or more sinks. 333 Increasingly over time, these sinks will be a part of a backbone but 334 today they are often fragmented and isolated. 336 The wireless sensor network is a Low Power and Lossy Network of field 337 devices for which two logical roles are defined, the field routers 338 and the non routing devices. It is acceptable and even probable that 339 the repartition of the roles across the field devices change over 340 time to balance the cost of the forwarding operation amongst the 341 nodes. 343 The backbone is a high-speed infrastructure network that may 344 interconnect multiple WSNs through backbone routers. Infrastructure 345 devices can be connected to the backbone. 
A gateway / manager that 346 interconnects the backbone to the plant network of the corporate 347 network can be viewed as collapsing the backbone and the 348 infrastructure devices into a single device that operates all the 349 required logical roles. The backbone is likely to become an 350 important function of the industrial network. 352 Typically, such backbones interconnect to the 'legacy' wired plant 353 infrastructure, the plant network, also known as the 'Process Control 354 Domain', the PCD. These plant automation networks are domain wise 355 segregated from the office network or office domain (OD), which in 356 itself is typically segregated from the Internet. 358 Sinks for L2N sensor data reside on both the plant network PCD, the 359 business network OD, and on the Internet. Applications close to 360 existing plant automation, such as wired process control and 361 monitoring systems running on fieldbusses, that require high 362 availability and low latencies, and that are managed by 'Control and 363 Automation' departments typically reside on the PCD. Other 364 applications such as automated corrosion monitoring, cathodic 365 protection voltage verification, or machine condition (vibration) 366 monitoring where one sample per week is considered over sampling, 367 would more likely deliver their sensor readings in the office domain. 368 Such applications are 'owned' by e.g. maintenance departments. 370 Yet other applications will be best served with direct Internet 371 connectivity. Examples include: third-party-maintained luminaries; 372 vendor-managed inventory systems, where a supplier of chemicals needs 373 access to tank level readings at his customer's site; temporary 374 'Babysitting sensors' deployed for just a few days, perhaps during 375 startup, troubleshooting, or ad-hoc measurement campaigns for R&D 376 purposes. In these cases, the sensor data naturally flows to the 377 Internet, and other domains such as office and plant should be 378 circumvented. This will allow quick deployment without impacting 379 plant safety integrity. 381 This multiple domain multiple applications connectivity creates a 382 significant challenge. Many different applications will all share 383 the same medium, the ether, within the fence, preferably sharing the 384 same frequency bands, and preferably sharing the same protocols, 385 preferably synchronized to optimize co-existence challenges, yet 386 logically segregated to avoid creation of intolerable short cuts 387 between existing wired domains. 389 Given this challenge, L2N networks are best to be treated as all 390 sitting on yet another segregated domain, segregated from all other 391 wired domains where conventional security is organized by perimeter. 392 Moving away from the traditional perimeter security mindset means 393 moving towards stronger end-device identity authentication, so that 394 L2N access points can split the various wireless data streams and 395 interconnect back to the appropriate domain pending identity and 396 trust established by the gateways in the authenticity of message 397 originators. 399 Similar considerations are to be given to how multiple applications 400 may or may not be allowed to share routing devices and their 401 potentially redundant bandwidth within the network. Challenges here 402 are to balance available capacity, required latencies, expected 403 priorities, and last but not least available (battery) energy within 404 the routing devices. 406 2.2.1. 
The Physical Topology 408 There is no specific physical topology for an industrial process 409 control network. One extreme example is a multi-square-kilometer 410 refinery where isolated tanks, some of them with power but most with 411 no backbone connectivity, compose a farm that spans over of the 412 surface of the plant. A few hundred field devices are deployed to 413 ensure the global coverage using a wireless self-forming self-healing 414 mesh network that might be 5 to 10 hops across. Local feedback loops 415 and mobile workers tend to be only one or two hops. The backbone is 416 in the refinery proper, many hops away. Even there, powered 417 infrastructure is also typically several hops away. So hopping to/ 418 from the powered infrastructure will in general be more costly than 419 the direct route. 421 In the opposite extreme case, the backbone network spans all the 422 nodes and most nodes are in direct sight of one or more backbone 423 router. Most communication between field devices and infrastructure 424 devices as well as field device to field device occurs across the 425 backbone. From afar, this model resembles the WIFI ESS (Extended 426 Service Set). But from a layer 3 perspective, the issues are the 427 default (backbone) router selection and the routing inside the 428 backbone whereas the radio hop towards the field device is in fact a 429 simple local delivery. 431 ---+------------------------ 432 | Plant Network 433 | 434 +-----+ 435 | | Gateway 436 | | 437 +-----+ 438 | 439 | Backbone 440 +--------------------+------------------+ 441 | | | 442 +-----+ +-----+ +-----+ 443 | | Backbone | | Backbone | | Backbone 444 | | router | | router | | router 445 +-----+ +-----+ +-----+ 446 o o o o o o o o o o o o o 447 o o o o o o o o o o o o o o o o o o 448 o o o o o o o o o o o M o o o o o 449 o o M o o o o o o o o o o o o o 450 o o o o o o o o o 451 o o o o o 452 L2N 454 Figure 1: The Physical Topology 456 2.2.2. Logical Topologies 458 Most of the traffic over the LLN is publish/subscribe of sensor data 459 from the field device towards the backbone router or gateway that 460 acts as the sink for the WSN. The destination of the sensor data is 461 an Infrastructure device that sits on the backbone and is reachable 462 via one or more backbone router. 464 For security, reliability, availability or serviceability reasons, it 465 is often required that the logical topologies are not physically 466 congruent over the radio network, that is they form logical 467 partitions of the LLN. For instance, a routing topology that is set 468 up for control should be isolated from a topology that reports the 469 temperature and the status of the events, if that second topology has 470 lesser constraints for the security policy. This isolation might be 471 implemented as Virtual LANs and Virtual Routing Tables in shared 472 nodes the backbone, but correspond effectively to physical nodes in 473 the wireless network. 475 Since publishing the data is the raison d'etre for most of the 476 sensors, it makes sense to build proactively a set of default routes 477 between the sensors and one or more backbone router and maintain 478 those routes at all times. Also, because of the lossy nature of the 479 network, the routing in place should attempt to propose multiple 480 forwarding solutions, building forwarding topologies in the form of 481 Directed Acyclic Graphs oriented towards the sinks. 
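As an illustration only, the following Python sketch shows what "a Directed Acyclic Graph oriented towards the sinks" can look like in practice; it is not part of the requirements, and the node names, link costs and the "strictly lower cost" parent rule are invented for the example. Each node keeps every neighbour with a lower cost towards the sink as a usable next hop, so a node such as 'c' ends up with more than one forwarding solution.

   # Sketch: build a sink-oriented DAG by keeping, for every node, all
   # neighbours whose path cost towards the sink is strictly lower than
   # the node's own cost.  Names and link costs are invented.
   import heapq

   links = {                       # undirected link costs (ETX-like)
       "sink": {"a": 1, "b": 1},
       "a": {"sink": 1, "b": 1, "c": 2},
       "b": {"sink": 1, "a": 1, "c": 1},
       "c": {"a": 2, "b": 1},
   }

   def costs_to_sink(links, sink="sink"):
       """Dijkstra: cheapest cost from every node towards the sink."""
       cost = {sink: 0}
       heap = [(0, sink)]
       while heap:
           c, n = heapq.heappop(heap)
           if c > cost.get(n, float("inf")):
               continue
           for m, w in links[n].items():
               if c + w < cost.get(m, float("inf")):
                   cost[m] = c + w
                   heapq.heappush(heap, (c + w, m))
       return cost

   def dag_parents(links, sink="sink"):
       """Every neighbour with strictly lower cost is a usable parent."""
       cost = costs_to_sink(links, sink)
       return {n: [m for m in nbrs if cost[m] < cost[n]]
               for n, nbrs in links.items() if n != sink}

   print(dag_parents(links))   # {'a': ['sink'], 'b': ['sink'], 'c': ['a', 'b']}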
In contrast with the general requirement of maintaining default routes towards the sinks, the need for field device to field device connectivity is very specific and rare, though the associated traffic might be of foremost importance. Field device to field device routes are often the most critical, optimized and well-maintained routes. A class 0 control loop requires guaranteed delivery and extremely tight response times. Both the respect of the criteria in the route computation and the quality of the route maintenance are critical to the operation of the field devices. Typically, a control loop will be using a dedicated direct wire that has very different capabilities, cost and constraints than the wireless medium, with the need to use a wireless path as a back-up route only in case of loss of the wired path.

Even though each field device to field device route computation has specific constraints in terms of latency and availability, it can be expected that the shortest possible path will often be selected and that this path will be routed inside the LLN rather than via the backbone. It can also be noted that the lifetimes of the routes might range from minutes for a mobile worker to tens of years for a command and control closed loop. Finally, time-varying user requirements for latency and bandwidth will change the constraints on the routes, which might trigger a constrained route recomputation, a reprovisioning of the underlying L2 protocols, or both in that order. For instance, a wireless worker may initiate a bulk transfer to configure or diagnose a field device. A level sensor device may need to perform a calibration and send a bulk file to a plant.

For these reasons, the ROLL routing infrastructure MUST be able to compute and update constrained routes on demand (that is, reactively), and it can be expected that this model will become more prevalent over time for field device to field device connectivity as well as for some field device to infrastructure device connectivity.

3. Service Requirements

Industrial applications fall into four broad service categories [ISA100.11a]:

1.  Periodic data (a.k.a. buffered). Data that is generated periodically and has a well-understood data bandwidth requirement, both deterministic and predictable. Timely delivery of such data is often the core function of a wireless sensor network, and permanent resources are assigned to ensure that the required bandwidth stays available. Buffered data usually exhibits a short time to live, and the newer reading obsoletes the previous one. In some cases, alarms are low priority information that gets repeated over and over. The end-to-end latency of this data is not as important as the regularity with which the data is presented to the plant application.

2.  Event data. This category includes alarms and aperiodic data reports with bursty data bandwidth requirements. In certain cases, alarms are critical and require a priority service from the network.

3.  Client/Server. Many industrial applications are based on a client/server model and implement a command/response protocol. The data bandwidth required is often bursty. The acceptable round-trip latency for some legacy systems was based on the time to send tens of bytes over a 1200 baud link; hundreds of milliseconds is typical. This type of request is statistically multiplexed over the L2N, and cost-based fair-share best-effort service is usually expected.

4.  Bulk transfer. Bulk transfers involve the transmission of blocks of data in multiple packets where temporary resources are assigned to meet a transaction time constraint. Transient resources are assigned for a limited period of time (related to file size and data rate) to meet the bulk transfer service requirements.
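As an illustration only (the field names and values below are invented, not taken from [ISA100.11a] or from this document), the four categories above could be captured in a per-flow descriptor handed to the routing and scheduling layers:

   # Sketch: one way a field device might describe a traffic flow.
   # Field names and example values are illustrative only.
   from dataclasses import dataclass
   from enum import Enum
   from typing import Optional

   class Service(Enum):
       PERIODIC = "periodic"            # buffered; permanent resources
       EVENT = "event"                  # alarms; bursty, may need priority
       CLIENT_SERVER = "client_server"  # command/response; best effort
       BULK = "bulk"                    # temporary resources for a transfer

   @dataclass
   class FlowSpec:
       service: Service
       period_s: Optional[float] = None    # publication period, periodic flows
       deadline_s: Optional[float] = None  # end-to-end latency bound, if any
       priority: int = 0                   # transmission / revocation priority
       transient: bool = False             # True for bulk transfers

   # A pressure value published every 30 s with a 60 s delivery deadline,
   # and a firmware download served with temporary resources:
   pressure = FlowSpec(Service.PERIODIC, period_s=30, deadline_s=60, priority=1)
   firmware = FlowSpec(Service.BULK, transient=True)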
For industrial applications, service parameters include but might not be limited to:

o  Data bandwidth - the bandwidth might be allocated permanently, or for a period of time, to a specific flow that usually exhibits well-defined properties of burstiness and throughput. Some bandwidth will also be statistically shared between flows in a best-effort fashion.

o  Latency - the time taken for the data to transit the network from the source to the destination. This may be expressed in terms of a deadline for delivery. Most monitoring latencies will be in seconds to minutes.

o  Transmission phase - process applications can be synchronized to wall clock time and require coordinated transmissions. A common coordination frequency is 4 Hz (250 ms).

o  Service contract type - revocation priority. L2Ns have limited network resources that can vary with time. This means the system can become fully subscribed or even oversubscribed. System policies determine how resources are allocated when resources are oversubscribed. The choices are blocking and graceful degradation.

o  Transmission priority - the means by which limited resources within field devices are allocated across multiple services. For transmissions, a device has to select which packet in its queue will be sent at the next transmission opportunity. Packet priority is used as one criterion for selecting the next packet. For reception, a device has to decide how to store a received packet. The field devices are memory-constrained and receive buffers may become full. Packet priority is used to select which packets are stored or discarded.

The routing protocol MUST also support different metric types for each link used to compute the path according to some objective function (e.g. minimize latency).

Industrial application data flows between field devices are not necessarily symmetric. In particular, asymmetrical cost and unidirectional routes are common for published data and alerts, which represent most of the sensor traffic. The routing protocol MUST be able to set up unidirectional or asymmetrical cost routes that are composed of one or more non-congruent paths.

3.1. Configurable Application Requirement

Time-varying user requirements for latency and bandwidth will require changes in the provisioning of the underlying L2 protocols. A technician may initiate a query/response session or bulk transfer to diagnose or configure a field device. A level sensor device may need to perform a calibration and send a bulk file to a plant. The routing protocol MUST route on paths that are changed to appropriately provision the application requirements. The routing protocol MUST support the ability to recompute paths based on underlying link characteristics that may change dynamically.
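As a purely illustrative sketch of the last requirement (the ETX-style metric and the 20% drift threshold are arbitrary choices, not requirements), a node might trigger a constrained path recomputation only when a link metric has drifted significantly since the last computation:

   # Sketch: recompute a path only when a link metric drifts past a
   # hysteresis threshold, to avoid churn.  Values are invented.
   def needs_recompute(current_etx, etx_at_last_compute, threshold=0.20):
       """Return True when any link metric drifted past the threshold."""
       for link, old in etx_at_last_compute.items():
           new = current_etx.get(link, float("inf"))
           if old > 0 and abs(new - old) / old > threshold:
               return True
       return False

   last = {("a", "b"): 1.3, ("b", "sink"): 1.1}
   now = {("a", "b"): 1.9, ("b", "sink"): 1.1}   # link a-b degraded by ~46%
   print(needs_recompute(now, last))             # True -> recompute the path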
3.2. Different Routes for Different Flows

Because different service categories have different service requirements, it is often desirable to have different routes for different data flows between the same two endpoints. For example, alarm or periodic data from A to Z may require path diversity with specific latency and reliability. A file transfer between A and Z may not need path diversity. The routing algorithm MUST be able to generate different routes for different flows.

4. Reliability Requirements

There are a variety of different ways to look at reliability in an industrial low power and lossy network:

1)  Availability of source to sink connectivity when the application needs it, expressed in #fail / #success,

2)  Availability of source to sink connectivity when the application might need it, expressed in #potential fail / available bandwidth,

3)  Probability of failure on demand,

4)  Ability, expressed in #failures divided by #successes, to get data delivered from source to sink within a capped time,

5)  How well a network (serving many applications) achieves end-to-end delivery of packets within a bounded latency.

The common theme running through all reliability requirements from a user perspective is that they be end-to-end, usually with a time bound.

The impact of not receiving sensor data due to sporadic network outages can be devastating if this happens unnoticed. However, if sinks that expect periodic sensor data or alarm status updates fail to get them, these systems can automatically take appropriate actions that prevent dangerous situations. Depending on the wireless application, the appropriate action ranges from initiating a shutdown within 100 ms, to using a last known good value for as many as N successive samples, to sending an operator into the plant to collect monthly data in the conventional way, i.e. with a portable sensor, paper and a clipboard.

Another critical aspect of the routing is the capability to ensure a maximum disruption time and route maintenance. The maximum disruption time is the time it takes at most for a specific path to be restored when broken. Route maintenance ensures that a path is monitored so that, when broken, it is restored within the maximum disruption time. Maintenance should also ensure that a path continues to provide the service for which it was established, for instance in terms of bandwidth, jitter and latency.

In industrial applications, reliability is usually defined with respect to end-to-end delivery of packets within a bounded latency. Reliability requirements vary over many orders of magnitude. Some non-critical monitoring applications may tolerate an availability of less than 90% with hours of latency. Most industrial standards, such as HART7, have set user reliability expectations at 99.9%. Regulatory requirements are a driver for some industrial applications. Regulatory monitoring requires high data integrity because lost data is assumed to be out of compliance and subject to fines. This can drive up either reliability or trustworthiness requirements.

Hop-by-hop path diversity is used to improve latency-bounded reliability. Additionally, bicasting or pluricasting may be used over multiple non-congruent / non-overlapping paths to increase the likelihood that at least one instance of a critical packet is delivered error free.
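The value of bicasting over non-overlapping paths can be seen with a small, illustrative calculation; the per-path success rates below are invented, and the two paths are assumed to fail independently, which only holds if they share no node or link:

   # Sketch: probability that at least one copy of a bicast packet
   # arrives within the latency bound, assuming independent paths.
   def bicast_success(p_a, p_b):
       return 1.0 - (1.0 - p_a) * (1.0 - p_b)

   print(bicast_success(0.99, 0.99))   # 0.9999: two 99% paths give ~99.99%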
Because data from field devices are aggregated and funneled at the L2N access point before they are routed to plant applications, L2N access point redundancy is an important factor in overall availability. A route that connects a field device to a plant application may have multiple paths that go through more than one L2N access point. The routing protocol MUST support multiple L2N access points and load distribution among L2N access points. The routing protocol MUST support multiple L2N access points when L2N access point redundancy is required. Because L2Ns are lossy in nature, multiple paths in an L2N route MUST be supported. The availability of each path in a route can change over time. Hence, it is important to measure the availability on a per-path basis and select a path (or paths) according to the availability requirements.

5. Device-Aware Routing Requirements

Wireless L2N nodes in industrial environments are powered by a variety of sources. Battery-operated devices with lifetime requirements of at least five years are the most common. Battery-operated devices have a cap on their total energy, typically can report an estimate of remaining energy, and typically do not have constraints on the short-term average power consumption. Energy scavenging devices are more complex. These systems contain both a power scavenging device (such as solar, vibration, or temperature difference) and an energy storage device, such as a rechargeable battery or a capacitor. These systems therefore have limits on the long-term average power consumption (which cannot exceed the average scavenged power over the same interval) as well as short-term limits imposed by the energy storage requirements. For solar-powered systems, the energy storage system is generally designed to provide days of power in the absence of sunlight. Many industrial sensors run off a 4-20 mA current loop, and can scavenge on the order of milliwatts from that source. Vibration monitoring systems are a natural choice for vibration scavenging, which typically only provides tens or hundreds of microwatts. Due to industrial temperature ranges and desired lifetimes, the choices of energy storage devices can be limited, and the resulting stored energy is often comparable to the energy cost of sending or receiving a packet rather than the energy of operating the node for several days. And of course, some nodes will be line-powered.

Example 1: a solar panel and a lead-acid battery sized for two weeks of rain. In this system, the average power consumption over any two-week period must be kept below a threshold defined by the solar panel. The peak power over minutes or hours could be dramatically higher.

Example 2: a 100uA vibration scavenger and a 1mF tantalum capacitor. With very limited storage capability, even the short-term average power consumption of this system must be low. If the cost of sending or receiving a packet is 100uC, and a maximum tolerable capacitor voltage droop of 1V is allowed, then the long-term average must be less than 1 packet sent or received per second, and no more than 5 packets may be forwarded in any given second.

Field devices have limited resources. Low-power, low-cost devices have limited memory for storing route information.
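A back-of-the-envelope check of Example 2 above, using only the figures stated there and assuming that forwarding one packet costs one receive plus one transmit operation:

   # Sketch: verify the forwarding budget of Example 2 (figures from the
   # text; "forward = one receive + one transmit" is an assumption).
   scavenge_current_a = 100e-6   # 100uA vibration scavenger
   storage_f = 1e-3              # 1mF capacitor
   charge_per_op_c = 100e-6      # 100uC to send or to receive one packet
   allowed_droop_v = 1.0         # tolerable capacitor voltage droop

   # Long-term rate: scavenged charge per second / charge per operation.
   print(scavenge_current_a / charge_per_op_c)      # 1.0 operation per second

   # Burst: usable stored charge / charge per forwarded packet (rx + tx).
   stored_charge_c = storage_f * allowed_droop_v    # 1 mC
   print(stored_charge_c / (2 * charge_per_op_c))   # 5.0 forwarded packets

   # Such per-node energy budgets, together with the memory limits above,
   # bound how many routes and how much forwarding a node can take on.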
Typical field 742 devices will have a finite number of routes they can support for 743 their embedded sensor/actuator application and for forwarding other 744 devices packets in a mesh network slotted-link. 746 Users may strongly prefer that the same device have different 747 lifetime requirements in different locations. A sensor monitoring a 748 non-critical parameter in an easily accessed location may have a 749 lifetime requirement that is shorter and tolerate more statistical 750 variation than a mission-critical sensor in a hard-to-reach place 751 that requires a plant shutdown in order to replace. 753 The routing algorithm MUST support node-constrained routing (e.g. 754 taking into account the existing energy state as a node constraint). 755 Node constraints include power and memory, as well as constraints 756 placed on the device by the user, such as battery life. 758 6. Broadcast/Multicast 760 Some existing industrial plant applications do not use broadcast or 761 multicast addressing to communicate to field devices. Unicast 762 address support is sufficient for them. 764 In some other industrial process automation environments, multicast 765 over IP is used to deliver to multiple nodes that may be 766 functionally-similar or not. Example usages are: 768 1) Delivery of alerts to multiple similar servers in an automation 769 control room. Alerts are multicast to a group address based on 770 the part of the automation process where the alerts arose (e.g., 771 the multicast address "all-nodes-interested-in-alerts-for- 772 process-unit-X"). This is always a restricted-scope multicast, 773 not a broadcast 775 2) Delivery of common packets to multiple routers over a backbone, 776 where the packets results in each receiving router initiating 777 multicast (sometimes as a full broadcast) within the LLN. This 778 is byproduct of having potentially physically separated backbone 779 routers that can inject messages into different portions of the 780 same larger LLN. 782 3) Publication of measurement data to more than one subscriber. 783 This feature is useful in some peer to peer control applications. 784 For example, level position may be useful to a controller that 785 operates the flow valve and also to the overfill alarm indicator. 786 Both controller and alarm indicator would receive the same 787 publication sent as a multicast by the level gauge. 789 It is quite possible that first-generation wireless automation field 790 networks can be adequately useful without either of these 791 capabilities, but in the near future, wireless field devices with 792 communication controllers and protocol stacks will require control 793 and configuration, such as firmware downloading, that may benefit 794 from broadcast or multicast addressing. 796 The routing protocol SHOULD support broadcast or multicast 797 addressing. 799 7. Route Establishment Time 801 During network formation, installers with no networking skill must be 802 able to determine if their devices are "in the network" with 803 sufficient connectivity to perform their function. Installers will 804 have sufficient skill to provision the devices with a sample rate or 805 activity profile. The routing algorithm MUST find the appropriate 806 route(s) and report success or failure within several minutes, and 807 SHOULD report success or failure within tens of seconds. 809 Network connectivity in real deployments is always time varying, with 810 time constants from seconds to months. 
So long as the underlying connectivity has not been compromised, this link churn should not substantially affect network operation. The routing algorithm MUST respond to normal link failure rates with routes that meet the service requirements (especially latency) throughout the routing response. The routing algorithm SHOULD always be in the process of optimizing the system in response to changing link statistics. The routing algorithm MUST re-optimize the paths when field devices change due to insertion, removal or failure, and this re-optimization MUST NOT cause latencies greater than the specified constraints (typically seconds to minutes).

8. Mobility

Various economic factors have contributed to a reduction of trained workers in the plant. The industry as a whole appears to be trying to solve this problem with what is called the "wireless worker". Carrying a PDA or something similar, this worker will be able to accomplish more work in less time than the older, better-trained workers that he or she replaces. Whether or not the premise is valid, the use case is commonly presented: the worker will be wirelessly connected to the plant IT system to download documentation, instructions, etc., and will need to be able to connect "directly" to the sensors and control points in or near the equipment on which he or she is working. It is possible that this "direct" connection could come via the normal L2N data collection network. This connection is likely to require higher bandwidth and lower latency than the normal data collection operation.

It is not yet decided whether these PDAs will use the L2N network directly to talk to field sensors, or whether they will instead use other wireless connectivity that proxies back into the field, or to anywhere else, through the user interfaces typically used for plant historians, asset management systems, and the like.

The routing protocol SHOULD support the wireless worker with fast network connection times of a few seconds, and low command and response latencies to the plant behind the L2N access points, to applications, and to field devices. The routing protocol SHOULD also support the bandwidth allocation for bulk transfers between the field device and the handheld device of the wireless worker. The routing protocol SHOULD support walking speeds for maintaining network connectivity as the handheld device changes position in the wireless network.

Some field devices will be mobile. These devices may be located on moving parts such as rotating components or they may be located on vehicles such as cranes or forklifts. The routing protocol SHOULD support vehicular speeds of up to 35 km/h.
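To give a feel for the pace of topology change that mobility implies, the following illustrative calculation assumes a 100 m usable radio range, a figure not taken from this document:

   # Sketch: how quickly a mobile device's neighbour set can turn over.
   # The 100 m radio range and 5 km/h walking speed are assumptions.
   radio_range_m = 100.0

   for label, speed_kmph in (("walking worker", 5), ("vehicle", 35)):
       speed_ms = speed_kmph / 3.6
       crossing_s = (2 * radio_range_m) / speed_ms
       print(f"{label}: about {crossing_s:.0f} s to cross one radio cell")
   # walking worker: ~144 s; vehicle: ~21 s.  Route maintenance for such
   # nodes therefore has to converge well within these times.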
9. Manageability

The process and control industry is manpower constrained. The aging demographics of plant personnel are causing a looming manpower problem for industry across many markets. The goal for industrial networks is that the installation process should not require any new skills of the plant personnel. The person would install the wireless sensor or wireless actuator the same way the wired sensor or wired actuator is installed, except that the step of connecting the wire is eliminated.

Most users in fact demand even further simplified provisioning methods, whereby any new device automatically connects and reports at the L2N access point. This requires the availability of open and untrusted side channels for new joiners, and it requires strong and automated authentication so that networks can automatically accept or reject new joiners. Ideally, for a user, adding new devices should be as easy as dragging and dropping an icon from a pool of authenticated new joiners into a pool for the wired domain that this new sensor should connect to. Under the hood, invisible to the user, auditable security mechanisms should take care of new device authentication and secret join key distribution. These more sophisticated 'over the air' secure provisioning methods should eliminate the use of traditional configuration tools for setting up devices prior to being ready to securely join an L2N access point.

There will be many new applications where, even without any human intervention at the plant, devices that have never been on site before should be allowed to connect, based on their credentials and cryptographic capabilities. Examples are third-party road tankers, rail cargo containers with overfill protection sensors, or consumer cars that need to be refueled with hydrogen by robots at future petrol stations.

The routing protocol for L2Ns is expected to be easy to deploy and manage. Because the number of field devices in a network is large, provisioning the devices manually would not make sense. Therefore, the routing protocol MUST support auto-provisioning of field devices. The protocol also MUST support the distribution of configuration from a centralized management controller if operator-initiated configuration change is allowed.

10. Security

Given that wireless sensor networks in industrial automation operate in systems that have substantial financial and human safety implications, security is of considerable concern. Levels of security violation that are tolerated as a "cost of doing business" in the banking industry are not acceptable when in some cases literally thousands of lives may be at risk.

Security is easily confused with a guarantee of availability. When discussing wireless security, it is important to distinguish clearly between the risks of temporarily losing connectivity, say due to a thunderstorm, and the risks associated with knowledgeable adversaries attacking a wireless system. Deliberate attacks need to be split between 1) attacks on the actual application served by the wireless devices, and 2) attacks that exploit the presence of a wireless access point that MAY provide connectivity onto legacy wired plant networks, i.e. attacks that have little to do with the wireless devices in the L2Ns. The second type of attack, where access points act as wireless backdoors that may allow an attacker outside the fence to access typically non-secured process control and/or office networks, is typically the one that creates exposures where lives are at risk. This implies that the L2N access point on its own must possess functionality that guarantees domain segregation, and thus prohibits many types of traffic further upstream.

Current generation industrial wireless device manufacturers are specifying security at the MAC layer and the transport layer. A shared key is used to authenticate messages at the MAC layer. At the transport layer, commands are encrypted with unique, randomly generated end-to-end session keys.
HART7 and ISA100.11a are examples 933 of security systems for industrial wireless networks. 935 Although such symmetric key encryption and authentication mechanisms 936 at MAC and transport layers may protect reasonably well during the 937 lifecycle, the initial network boot (provisioning) step in many cases 938 requires more sophisticated steps to securely land the initial secret 939 keys in field devices. It is vital that also during these steps, the 940 ease of deployment and the freedom of mixing and matching products 941 from different suppliers doesn't complicate life for those that 942 deploy and commission. Given average skill levels in the field, and 943 given serious resource constraints in the market, investing a little 944 bit more in sensor node hardware and software so that new devices 945 automatically can be deemed trustworthy, and thus automatically join 946 the domains that they should join, with just one drag and drop action 947 for those in charge of deploying, will yield in faster adoption and 948 proliferation of the L2N technology. 950 Industrial plants may not maintain the same level of physical 951 security for field devices that is associated with traditional 952 network sites such as locked IT centers. In industrial plants it 953 must be assumed that the field devices have marginal physical 954 security and the security system needs to have limited trust in them. 955 The routing protocol SHOULD place limited trust in the field devices 956 deployed in the plant network. 958 The routing protocol SHOULD compartmentalize the trust placed in 959 field devices so that a compromised field device does not destroy the 960 security of the whole network. The routing MUST be configured and 961 managed using secure messages and protocols that prevent outsider 962 attacks and limit insider attacks from field devices installed in 963 insecure locations in the plant. 965 Wireless typically forces us to abandon classical 'by perimeter' 966 thinking when trying to secure network domains. Wireless nodes in 967 L2N networks should thus be regarded as little islands with trusted 968 kernels, situated in an ocean of untrusted connectivity, an ocean 969 that might be full of pirate ships. Consequently, confidence in node 970 identity and ability to challenge authenticity of source node 971 credentials gets more relevant. Cryptographic boundaries inside 972 devices that clearly demark the border between trusted and untrusted 973 areas need to be drawn. Protection against compromise of the 974 cryptographic boundaries inside the hardware of devices is outside of 975 the scope this document. Standards exist that address those 976 vulnerabilities. 978 11. IANA Considerations 980 This document includes no request to IANA. 982 12. Acknowledgements 984 Many thanks to Rick Enns, Alexander Chernoguzov and Chol Su Kang for 985 their contributions. 987 13. References 988 13.1. Normative References 990 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 991 Requirement Levels", BCP 14, RFC 2119, March 1997. 993 13.2. Informative References 995 [I-D.culler-rl2n-routing-reqs] 996 Vasseur, J. and D. Cullerot, "Routing Requirements for Low 997 Power And Lossy Networks", 998 draft-culler-rl2n-routing-reqs-01 (work in progress), 999 July 2007. 1001 13.3. External Informative References 1003 [HART] www.hartcomm.org, "Highway Addressable Remote Transducer", 1004 a group of specifications for industrial process and 1005 control devices administered by the HART Foundation". 
1007 [ISA100.11a] 1008 ISA, "ISA100, Wireless Systems for Automation", May 2008, 1009 < http://www.isa.org/Community/ 1010 SP100WirelessSystemsforAutomation>. 1012 Authors' Addresses 1014 Kris Pister (editor) 1015 Dust Networks 1016 30695 Huntwood Ave. 1017 Hayward, 94544 1018 USA 1020 Email: kpister@dustnetworks.com 1022 Pascal Thubert (editor) 1023 Cisco Systems 1024 Village d'Entreprises Green Side 1025 400, Avenue de Roumanille 1026 Batiment T3 1027 Biot - Sophia Antipolis 06410 1028 FRANCE 1030 Phone: +33 497 23 26 34 1031 Email: pthubert@cisco.com 1032 Sicco Dwars 1033 Shell Global Solutions International B.V. 1034 Sir Winston Churchilllaan 299 1035 Rijswijk 2288 DC 1036 Netherlands 1038 Phone: +31 70 447 2660 1039 Email: sicco.dwars@shell.com 1041 Tom Phinney 1042 5012 W. Torrey Pines Circle 1043 Glendale, AZ 85308-3221 1044 USA 1046 Phone: +1 602 938 3163 1047 Email: tom.phinney@cox.net 1049 Full Copyright Statement 1051 Copyright (C) The IETF Trust (2008). 1053 This document is subject to the rights, licenses and restrictions 1054 contained in BCP 78, and except as set forth therein, the authors 1055 retain all their rights. 1057 This document and the information contained herein are provided on an 1058 "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS 1059 OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND 1060 THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS 1061 OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF 1062 THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED 1063 WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. 1065 Intellectual Property 1067 The IETF takes no position regarding the validity or scope of any 1068 Intellectual Property Rights or other rights that might be claimed to 1069 pertain to the implementation or use of the technology described in 1070 this document or the extent to which any license under such rights 1071 might or might not be available; nor does it represent that it has 1072 made any independent effort to identify any such rights. Information 1073 on the procedures with respect to rights in RFC documents can be 1074 found in BCP 78 and BCP 79. 1076 Copies of IPR disclosures made to the IETF Secretariat and any 1077 assurances of licenses to be made available, or the result of an 1078 attempt made to obtain a general license or permission for the use of 1079 such proprietary rights by implementers or users of this 1080 specification can be obtained from the IETF on-line IPR repository at 1081 http://www.ietf.org/ipr. 1083 The IETF invites any interested party to bring to its attention any 1084 copyrights, patents or patent applications, or other proprietary 1085 rights that may cover technology that may be required to implement 1086 this standard. Please address the information to the IETF at 1087 ietf-ipr@ietf.org.