detnet                                                     P. Wetterwald
Internet-Draft                                                     Cisco
Intended status: Informational                                J. Raymond
Expires: January 1, 2016                                    Hydro-Quebec
                                                           June 30, 2015


          Deterministic Networking Utilities Requirements
              draft-wetterwald-detnet-utilities-reqs-02

Abstract

   This document describes the needs of the Smart Grid industry to
   establish multi-hop paths for characterized flows with deterministic
   properties.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on January 1, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

1.  Introduction

   [I-D.finn-detnet-problem-statement] defines the characteristics of a
   deterministic flow as a data communication flow with a bounded
   latency, extraordinarily low frame loss, and a very narrow jitter.
   This document intends to define the utility requirements for
   deterministic networking.

2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

3.  Overview

   Utility Telecom Networks

   The business and technology trends that are sweeping the utility
   industry will drastically transform the utility business from the way
   it has been for many decades. At the core of many of these changes
   is a drive to modernize the electrical grid with an integrated
   telecommunications infrastructure. However, interoperability
   concerns, legacy networks, disparate tools, and stringent security
   requirements all add complexity to the grid transformation. Given
   the range and diversity of the requirements that should be addressed
   by the next generation telecommunications infrastructure, utilities
   need to adopt a holistic architectural approach to integrate the
   electrical grid with digital telecommunications across the entire
   power delivery chain.

   Many utilities still rely on complex environments formed of multiple
   application-specific, proprietary networks. Information is siloed
   between operational areas. This prevents utility operations from
   realizing the operational efficiency benefits, visibility, and
   functional integration of operational information across grid
   applications and data networks. The key to modernizing grid
   telecommunications is to provide a common, adaptable, multi-service
   network infrastructure for the entire utility organization. Such a
   network serves as the platform for current capabilities while
   enabling future expansion of the network to accommodate new
   applications and services.

   To meet this diverse set of requirements, both today and in the
   future, the next generation utility telecommunications network will
   be based on an open-standards-based IP architecture. An end-to-end
   IP architecture takes advantage of nearly three decades of IP
   technology development, facilitating interoperability across
   disparate networks and devices, as has already been demonstrated in
   many mission-critical and highly secure networks.

   The IEC (International Electrotechnical Commission) and different
   National Committees have mandated a specific ad hoc group (AHG8) to
   define the migration strategy to IPv6 for all the IEC TC57 power
   automation standards. IPv6 is seen as the obvious future
   telecommunications technology for the Smart Grid. The ad hoc group
   disclosed its conclusions to the IEC coordination group at the end
   of 2014.

   It is imperative that utilities participate in standards development
   bodies to influence the development of future solutions and to
   benefit from shared experiences of other utilities and vendors.

4.  Telecommunications Trends and General Telecommunications
    Requirements

   These general telecommunications requirements are over and above the
   specific requirements of the use cases that have been addressed so
   far. They include both current and future telecommunications
   related requirements that should be factored into the network
   architecture and design.

4.1.  General Telecommunications Requirements

   o  IP Connectivity everywhere

   o  Monitoring services everywhere and from different remote centers

   o  Move services to a virtual data center

   o  Unify access to applications / information from the corporate
      network

   o  Unify services

   o  Unified Communications Solutions

   o  Mix of fiber and microwave technologies - obsolescence of
      SONET/SDH or TDM

   o  Standardize grid telecommunications protocols on open standards
      to ensure interoperability

   o  Reliable Telecommunications for Transmission and Distribution
      Substations

   o  IEEE 1588 time synchronization Client / Server Capabilities

   o  Integration of Multicast Design

   o  QoS Requirements Mapping

   o  Enable Future Network Expansion

   o  Substation Network Resilience

   o  Fast Convergence Design

   o  Scalable Headend Design

   o  Define Service Level Agreements (SLA) and Enable SLA Monitoring

   o  Integration of 3G/4G Technologies and future technologies

   o  Ethernet Connectivity for Station Bus Architecture

   o  Ethernet Connectivity for Process Bus Architecture

   o  Protection, teleprotection and PMU (Phasor Measurement Unit) on IP

4.1.1.  Migration to Packet-Switched Network

   Throughout the world, utilities are increasingly planning for a
   future based on smart grid applications requiring advanced
   telecommunications systems. Many of these applications utilize
   packet connectivity for communicating information and control signals
   across the utility's Wide Area Network (WAN), made possible by
   technologies such as multiprotocol label switching (MPLS). The data
   that traverses the utility WAN includes:

   o  Grid monitoring, control, and protection data

   o  Non-control grid data (e.g. asset data for condition-based
      monitoring)

   o  Physical safety and security data (e.g. voice and video)

   o  Remote worker access to corporate applications (voice, maps,
      schematics, etc.)

   o  Field area network backhaul for smart metering and distribution
      grid management

   o  Enterprise traffic (email, collaboration tools, business
      applications)

   WANs support this wide variety of traffic to and from substations,
   the transmission and distribution grid, generation sites, between
   control centers, and between work locations and data centers. To
   maintain this rapidly expanding set of applications, many utilities
   are taking steps to evolve present time-division multiplexing (TDM)
   based and frame relay infrastructures to packet systems.
   Packet-based networks are designed to provide greater functionality
   and higher levels of service for applications, while continuing to
   deliver reliability and deterministic (real-time) traffic support.

4.2.  Applications, Use Cases and Traffic Patterns

   Among the numerous applications and use cases that a utility deploys
   today, many rely on high availability and deterministic behaviour of
   the telecommunications networks. Protection use cases and generation
   control are the most demanding and cannot rely on a best effort
   approach.

4.2.1.  Transmission use cases

   Protection means not only the protection of the human operator but
   also the protection of the electrical equipment and the preservation
   of the stability and frequency of the grid. If a fault occurs in
   the transmission or the distribution of electricity, human operators
   could be seriously harmed, very costly electrical equipment could be
   damaged, and the grid could be disturbed, leading to blackouts. The
   timing and reliability requirements are therefore very stringent, to
   avoid severe impacts on the electrical infrastructure.

4.2.1.1.  Teleprotection

   The key criteria for measuring Teleprotection performance are command
   transmission time, dependability and security.
   These criteria are defined by the IEC standard 60834 as follows:

   o  Transmission time (Speed): The time between the moment when the
      state changes at the transmitter input and the moment of the
      corresponding change at the receiver output, including propagation
      delay. Overall operating time for a Teleprotection system
      includes the time for initiating the command at the transmitting
      end, the propagation delay over the network (including equipment)
      and the selection and decision time at the receiving end,
      including any additional delay due to a noisy environment.

   o  Dependability: The ability to issue and receive valid commands in
      the presence of interference and/or noise, by minimizing the
      probability of missing commands (PMC). Dependability targets are
      typically set for a specific bit error rate (BER) level.

   o  Security: The ability to prevent false tripping due to a noisy
      environment, by minimizing the probability of unwanted commands
      (PUC). Security targets are also set for a specific bit error
      rate (BER) level.

   Additional key elements that may impact Teleprotection performance
   include the bandwidth of the Teleprotection system and its
   resiliency or failure recovery capacity. Transmission time,
   bandwidth utilization and resiliency are directly linked to the
   telecommunications equipment and the connections that are used to
   transfer the commands between relays.

4.2.1.1.1.  Latency Budget Consideration

   Delay requirements for utility networks may vary depending upon a
   number of parameters, such as the specific protection equipment
   used. Most power line equipment can tolerate short circuits or
   faults for up to approximately five power cycles before sustaining
   irreversible damage or affecting other segments in the network. This
   translates to a total fault clearance time of 100 ms.
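
   As an illustrative sketch only (it is not taken from IEC 60834 or
   IEC 61850), the arithmetic behind the 100 ms figure can be checked
   with a few lines of Python. The sketch assumes a 50 Hz system, in
   which one power cycle lasts 20 ms; the function names are
   hypothetical. On a 60 Hz system the same five cycles last about
   83 ms.

```python
def cycle_ms(freq_hz: float) -> float:
    """Duration of one power cycle in milliseconds."""
    return 1000.0 / freq_hz


def clearance_ms(cycles: float, freq_hz: float) -> float:
    """Time spanned by a given number of power cycles, in milliseconds."""
    return cycles * cycle_ms(freq_hz)


if __name__ == "__main__":
    # Five cycles is the approximate tolerance quoted in the text.
    print(clearance_ms(5, 50.0))            # 50 Hz grid (assumed here)
    print(round(clearance_ms(5, 60.0), 1))  # 60 Hz grid, for comparison
```
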
   As a safety precaution, however, the actual operation time of
   protection systems is limited to 70-80 percent of this period,
   including fault recognition time, command transmission time and line
   breaker switching time. Some system components, such as large
   electromechanical switches, require a particularly long time to
   operate and take up the majority of the total clearance time, leaving
   only a 10 ms window for the telecommunications part of the protection
   scheme, independent of the distance to travel. Given the sensitivity
   of the issue, new networks impose requirements that are even more
   stringent: IEC standard 61850 limits the transfer time for protection
   messages to 1/4 - 1/2 cycle or 4 - 8 ms (for 60 Hz lines) for the
   most critical messages.

4.2.1.1.2.  Asymmetric delay

   In addition to minimal transmission delay, a differential protection
   telecommunications channel must be synchronous, i.e., experiencing
   symmetrical channel delay in transmit and receive paths. This
   requires special attention in jitter-prone packet networks. While
   optimally Teleprotection systems should support zero asymmetric
   delay, typical legacy relays can tolerate discrepancies of up to
   750 us.

   The main tools available for lowering delay variation below this
   threshold are:

   o  A jitter buffer at the multiplexers on each end of the line can be
      used to offset delay variation by queuing sent and received
      packets. The length of the queues must balance the need to
      regulate the rate of transmission with the need to limit overall
      delay, as larger buffers result in increased latency. This is the
      traditional TDM way of fulfilling this requirement.

   o  Traffic management tools ensure that the Teleprotection signals
      receive the highest transmission priority and minimize the jitter
      added along the path. This is one way to meet the requirement in
      IP networks.

   o  Standard packet-based synchronization technologies, such as
      IEEE 1588-2008 Precision Time Protocol (PTP) and Synchronous
      Ethernet (Sync-E), can help maintain stable networks by keeping a
      highly accurate clock source on the different network devices
      involved.

4.2.1.1.2.1.  Other traffic characteristics

   o  Redundancy: The existence in a system of more than one means of
      accomplishing a given function.

   o  Recovery time: The duration of time within which a business
      process must be restored after any type of disruption in order to
      avoid unacceptable consequences associated with a break in
      business continuity.

   o  Performance management: In networking, a management function
      defined for controlling and analyzing different parameters/metrics
      such as the throughput and the error rate.

   o  Packet loss: One or more packets of data travelling across a
      network fail to reach their destination.

4.2.1.1.2.2.  Teleprotection network requirements

   The following table captures the main network requirements (based on
   the IEC 61850 standard):

   +-----------------------------+-------------------------------------+
   | Teleprotection Requirement  | Attribute                           |
   +-----------------------------+-------------------------------------+
   | One way maximum delay       | 4-10 ms                             |
   |                             |                                     |
   | Asymmetric delay required   | Yes                                 |
   |                             |                                     |
   | Maximum jitter              | less than 250 us (750 us for legacy |
   |                             | IED)                                |
   |                             |                                     |
   | Topology                    | Point to point, point to Multi-     |
   |                             | point                               |
   |                             |                                     |
   | Availability                | 99.9999                             |
   |                             |                                     |
   | Precise timing required     | Yes                                 |
   |                             |                                     |
   | Recovery time on node       | less than 50 ms - hitless           |
   | failure                     |                                     |
   |                             |                                     |
   | Performance management      | Yes, Mandatory                      |
   |                             |                                     |
   | Redundancy                  | Yes                                 |
   |                             |                                     |
   | Packet loss                 | 0.1% to 1%                          |
   +-----------------------------+-------------------------------------+

             Table 1: Teleprotection network requirements

4.2.1.2.  Inter-Trip Protection scheme

   Inter-tripping is the controlled tripping of a circuit breaker to
   complete the isolation of a circuit or piece of apparatus in concert
   with the tripping of other circuit breakers. The main use of such
   schemes is to ensure that protection at both ends of a faulted
   circuit will operate to isolate the equipment concerned.
   Inter-tripping schemes use signaling to convey a trip command to
   remote circuit breakers to isolate circuits.

   +--------------------------------+----------------------------------+
   | Inter-Trip protection          | Attribute                        |
   | Requirement                    |                                  |
   +--------------------------------+----------------------------------+
   | One way maximum delay          | 5 ms                             |
   |                                |                                  |
   | Asymmetric delay required      | No                               |
   |                                |                                  |
   | Maximum jitter                 | Not critical                     |
   |                                |                                  |
   | Topology                       | Point to point, point to Multi-  |
   |                                | point                            |
   |                                |                                  |
   | Bandwidth                      | 64 Kbps                          |
   |                                |                                  |
   | Availability                   | 99.9999                          |
   |                                |                                  |
   | Precise timing required        | Yes                              |
   |                                |                                  |
   | Recovery time on node failure  | less than 50 ms - hitless        |
   |                                |                                  |
   | Performance management         | Yes, Mandatory                   |
   |                                |                                  |
   | Redundancy                     | Yes                              |
   |                                |                                  |
   | Packet loss                    | 0.1%                             |
   +--------------------------------+----------------------------------+

          Table 2: Inter-Trip protection network requirements

4.2.1.3.  Current Differential Protection Scheme

   Current differential protection is commonly used for line protection,
   and is typical for protecting parallel circuits. A main advantage
   of differential protection is that, compared to overcurrent
   protection, it allows only the faulted circuit to be de-energized in
   case of a fault. At both ends of the line, the current is measured
   by the differential relays, and based on Kirchhoff's law, both relays
   will trip the circuit breaker if the current going into the line does
   not equal the current going out of the line.
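
   The comparison performed by the two differential relays can be
   sketched as follows. This is a minimal illustration only: the
   function name, the sample currents and the 50 A tolerance are
   assumptions made for the example and do not come from any relay
   specification or from IEC 61850.

```python
def differential_trip(i_in_amps: float, i_out_amps: float,
                      tolerance_amps: float = 50.0) -> bool:
    """Return True when the currents at the two line ends disagree.

    The tolerance is a hypothetical margin for measurement error; a
    difference larger than it is treated as current leaving the line
    through a fault.
    """
    return abs(i_in_amps - i_out_amps) > tolerance_amps


if __name__ == "__main__":
    # Healthy line: the same current enters and leaves, so no trip.
    print(differential_trip(1200.0, 1200.0))
    # Faulted line: part of the current is diverted, so both relays trip.
    print(differential_trip(1200.0, 300.0))
```

   Both relays need the value measured at the remote end to evaluate
   this comparison, so the measurements have to be exchanged over the
   telecommunications channel.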
   This type of protection scheme assumes that some form of
   communications is present between the relays at both ends of the
   line, to allow both relays to compare measured current values. A
   fault in line 1 will cause overcurrent to flow in both lines, but
   because the current in line 2 is a through-flowing current, this
   current is measured equal at both ends of the line; therefore the
   differential relays on line 2 will not trip line 2. Line 1 will be
   tripped, as the relays will not measure the same currents at both
   ends of the line. Line differential protection schemes assume a very
   low telecommunications delay between both relays, often as low as
   5 ms. Moreover, as those systems are often not time-synchronized,
   they also assume symmetric telecommunications paths with constant
   delay, which allows comparing current measurement values taken at the
   exact same time.

   +----------------------------------+--------------------------------+
   | Current Differential protection  | Attribute                      |
   | Requirement                      |                                |
   +----------------------------------+--------------------------------+
   | One way maximum delay            | 5 ms                           |
   |                                  |                                |
   | Asymmetric delay required        | Yes                            |
   |                                  |                                |
   | Maximum jitter                   | less than 250 us (750 us for   |
   |                                  | legacy IED)                    |
   |                                  |                                |
   | Topology                         | Point to point, point to       |
   |                                  | Multi-point                    |
   |                                  |                                |
   | Bandwidth                        | 64 Kbps                        |
   |                                  |                                |
   | Availability                     | 99.9999                        |
   |                                  |                                |
   | Precise timing required          | Yes                            |
   |                                  |                                |
   | Recovery time on node failure    | less than 50 ms - hitless      |
   |                                  |                                |
   | Performance management           | Yes, Mandatory                 |
   |                                  |                                |
   | Redundancy                       | Yes                            |
   |                                  |                                |
   | Packet loss                      | 0.1%                           |
   +----------------------------------+--------------------------------+

         Table 3: Current Differential Protection requirements

4.2.1.4.  Distance Protection Scheme

   The Distance (Impedance Relay) protection scheme is based on voltage
   and current measurements.
   A fault on a circuit will generally create a sag in the voltage
   level. If the ratio of voltage to current measured at the protection
   relay terminals, which equates to an impedance element, falls within
   a set threshold, the circuit breaker will operate. The operating
   characteristics of this protection are based on the line
   characteristics. This means that when a fault appears on the line,
   the impedance setting in the relay is compared to the apparent
   impedance of the line from the relay terminals to the fault. If the
   apparent impedance is determined to be below the relay setting, the
   fault is determined to be within the zone of protection. When the
   transmission line length is under a minimum length, distance
   protection becomes more difficult to coordinate. In these instances
   the best choice of protection is current differential protection.

   +-------------------------------+-----------------------------------+
   | Distance protection           | Attribute                         |
   | Requirement                   |                                   |
   +-------------------------------+-----------------------------------+
   | One way maximum delay         | 5 ms                              |
   |                               |                                   |
   | Asymmetric delay required     | No                                |
   |                               |                                   |
   | Maximum jitter                | Not critical                      |
   |                               |                                   |
   | Topology                      | Point to point, point to Multi-   |
   |                               | point                             |
   |                               |                                   |
   | Bandwidth                     | 64 Kbps                           |
   |                               |                                   |
   | Availability                  | 99.9999                           |
   |                               |                                   |
   | Precise timing required       | Yes                               |
   |                               |                                   |
   | Recovery time on node failure | less than 50 ms - hitless         |
   |                               |                                   |
   | Performance management        | Yes, Mandatory                    |
   |                               |                                   |
   | Redundancy                    | Yes                               |
   |                               |                                   |
   | Packet loss                   | 0.1%                              |
   +-------------------------------+-----------------------------------+

               Table 4: Distance Protection requirements

4.2.1.5.  Inter-Substation Protection Signaling

   This use case describes the exchange of Sampled Value and/or GOOSE
   (Generic Object Oriented Substation Events) messages between
   Intelligent Electronic Devices (IED) in two substations for
   protection and tripping coordination. The two IEDs are in a
   master-slave mode.

   The Current Transformer or Voltage Transformer (CT/VT) in one
   substation sends the sampled analog voltage or current value to the
   Merging Unit (MU) over hard wire. The merging unit sends the
   time-synchronized 61850-9-2 sampled values to the slave IED. The
   slave IED forwards the information to the master IED in the other
   substation. The master IED makes the determination (for example
   based on sampled value differentials) to send a trip command to the
   originating IED. Once the slave IED/Relay receives the GOOSE trip
   for breaker tripping, it opens the breaker. It then sends a
   confirmation message back to the master. All data exchanges between
   IEDs are through Sampled Value and/or GOOSE messages.

   +----------------------------------+--------------------------------+
   | Inter-Substation protection      | Attribute                      |
   | Requirement                      |                                |
   +----------------------------------+--------------------------------+
   | One way maximum delay            | 5 ms                           |
   |                                  |                                |
   | Asymmetric delay required        | No                             |
   |                                  |                                |
   | Maximum jitter                   | Not critical                   |
   |                                  |                                |
   | Topology                         | Point to point, point to       |
   |                                  | Multi-point                    |
   |                                  |                                |
   | Bandwidth                        | 64 Kbps                        |
   |                                  |                                |
   | Availability                     | 99.9999                        |
   |                                  |                                |
   | Precise timing required          | Yes                            |
   |                                  |                                |
   | Recovery time on node failure    | less than 50 ms - hitless      |
   |                                  |                                |
   | Performance management           | Yes, Mandatory                 |
   |                                  |                                |
   | Redundancy                       | Yes                            |
   |                                  |                                |
   | Packet loss                      | 1%                             |
   +----------------------------------+--------------------------------+

           Table 5: Inter-Substation Protection requirements

4.2.1.6.  Intra-Substation Process Bus Communications

   This use case describes the data flow from the CT/VT to the IEDs in
   the substation via the merging unit (MU). The CT/VT in the
   substation sends the sampled value (analog voltage or current) to the
   Merging Unit (MU) over hard wire. The merging unit sends the
   time-synchronized 61850-9-2 sampled values to the IEDs in the
   substation in GOOSE message format. The GPS Master Clock can send
   1PPS or IRIG-B format to the MU through a serial port, or IEEE 1588
   protocol via the network. Process bus communication using 61850
   simplifies connectivity within the substation, and removes both the
   requirement for multiple serial connections and the slow serial bus
   architectures that are typically used. This also ensures increased
   flexibility and increased speed with the use of multicast messaging
   between multiple devices.

   +----------------------------------+--------------------------------+
   | Intra-Substation protection      | Attribute                      |
   | Requirement                      |                                |
   +----------------------------------+--------------------------------+
   | One way maximum delay            | 5 ms                           |
   |                                  |                                |
   | Asymmetric delay required        | No                             |
   |                                  |                                |
   | Maximum jitter                   | Not critical                   |
   |                                  |                                |
   | Topology                         | Point to point, point to       |
   |                                  | Multi-point                    |
   |                                  |                                |
   | Bandwidth                        | 64 Kbps                        |
   |                                  |                                |
   | Availability                     | 99.9999                        |
   |                                  |                                |
   | Precise timing required          | Yes                            |
   |                                  |                                |
   | Recovery time on node failure    | less than 50 ms - hitless      |
   |                                  |                                |
   | Performance management           | Yes, Mandatory                 |
   |                                  |                                |
   | Redundancy                       | Yes - No                       |
   |                                  |                                |
   | Packet loss                      | 0.1%                           |
   +----------------------------------+--------------------------------+

           Table 6: Intra-Substation Protection requirements

4.2.1.7.  Wide Area Monitoring and Control Systems

   The application of synchrophasor measurement data from Phasor
   Measurement Units (PMU) to Wide Area Monitoring and Control Systems
   promises to provide important new capabilities for improving system
   stability. Access to PMU data enables more timely situational
   awareness over larger portions of the grid than has been possible
   historically with normal SCADA (Supervisory Control and Data
   Acquisition) data. Handling the volume and real-time nature of
   synchrophasor data presents unique challenges for existing
   application architectures. A Wide Area Management System (WAMS)
   makes it possible for the condition of the bulk power system to be
   observed and understood in real-time so that protective,
   preventative, or corrective action can be taken. Because of the very
   high sampling rate of measurements and the strict requirement for
   time synchronization of the samples, WAMS has stringent
   telecommunications requirements in an IP network, which are captured
   in the following table:

   +----------------------+--------------------------------------------+
   | WAMS Requirement     | Attribute                                  |
   +----------------------+--------------------------------------------+
   | One way maximum      | 50 ms                                      |
   | delay                |                                            |
   |                      |                                            |
   | Asymmetric delay     | No                                         |
   | required             |                                            |
   |                      |                                            |
   | Maximum jitter       | Not critical                               |
   |                      |                                            |
   | Topology             | Point to point, point to Multi-point,      |
   |                      | Multi-point to Multi-point                 |
   |                      |                                            |
   | Bandwidth            | 100 Kbps                                   |
   |                      |                                            |
   | Availability         | 99.9999                                    |
   |                      |                                            |
   | Precise timing       | Yes                                        |
   | required             |                                            |
   |                      |                                            |
   | Recovery time on     | less than 50 ms - hitless                  |
   | node failure         |                                            |
   |                      |                                            |
   | Performance          | Yes, Mandatory                             |
   | management           |                                            |
   |                      |                                            |
   | Redundancy           | Yes                                        |
   |                      |                                            |
   | Packet loss          | 1%                                         |
   +----------------------+--------------------------------------------+

           Table 7: WAMS Special Communication Requirements

4.2.1.8.  IEC 61850 WAN engineering guidelines requirement
          classification

   The IEC (International Electrotechnical Commission) has recently
   published a Technical Report which offers guidelines on how to define
   and deploy Wide Area Networks for the interconnection of electric
   substations, generation plants and SCADA operation centers. IEC
   61850-90-12 classifies WAN communication requirements into 4
   classes. The following table summarizes these requirements:

   +----------------+------------+------------+------------+-----------+
   | WAN            | Class WA   | Class WB   | Class WC   | Class WD  |
   | Requirement    |            |            |            |           |
   +----------------+------------+------------+------------+-----------+
   | Application    | EHV (Extra | HV (High   | MV (Medium | General   |
   | field          | High       | Voltage)   | Voltage)   | purpose   |
   |                | Voltage)   |            |            |           |
   |                |            |            |            |           |
   | Latency        | 5 ms       | 10 ms      | 100 ms     | > 100 ms  |
   |                |            |            |            |           |
   | Jitter         | 10 us      | 100 us     | 1 ms       | 10 ms     |
   |                |            |            |            |           |
   | Latency        | 100 us     | 1 ms       | 10 ms      | 100 ms    |
   | Asymmetry      |            |            |            |           |
   |                |            |            |            |           |
   | Time Accuracy  | 1 us       | 10 us      | 100 us     | 10 to 100 |
   |                |            |            |            | ms        |
   |                |            |            |            |           |
   | Bit Error rate | 10^-7 to   | 10^-5 to   | 10^-3      |           |
   |                | 10^-6      | 10^-4      |            |           |
   |                |            |            |            |           |
   | Unavailability | 10^-7 to   | 10^-5 to   | 10^-3      |           |
   |                | 10^-6      | 10^-4      |            |           |
   |                |            |            |            |           |
   | Recovery delay | Zero       | 50 ms      | 5 s        | 50 s      |
   |                |            |            |            |           |
   | Cyber security | extremely  | High       | Medium     | Medium    |
   |                | high       |            |            |           |
   +----------------+------------+------------+------------+-----------+

    Table 8: 61850-90-12 Communication Requirements; Courtesy of IEC

4.2.2.  Distribution use case

4.2.2.1.  Fault Location Isolation and Service Restoration (FLISR)

   As the name implies, Fault Location, Isolation, and Service
   Restoration (FLISR) refers to the ability to automatically locate the
   fault, isolate the fault, and restore service in the distribution
   network. It is a self-healing feature whose purpose is to minimize
   the impact of faults by serving portions of the loads on the affected
   circuit by switching to other circuits. It reduces the number of
   customers that experience a sustained power outage by reconfiguring
   distribution circuits. This will likely be the first widespread
   application of distributed intelligence in the grid. Secondary
   substations can be connected to multiple primary substations.
   Normally, static power switch statuses (open/closed) in the network
   dictate the power flow to secondary substations. Reconfiguring the
   network in the event of a fault is typically done manually on site,
   by operating switchgear to energize/de-energize alternate paths.
   Automating the operation of substation switchgear allows the utility
   to have a more dynamic network where the flow of power can be altered
   under fault conditions but also during times of peak load. It allows
   the utility to shift peak loads around the network or, to be more
   precise, to alter the configuration of the network to move loads
   between different primary substations. The FLISR capability can be
   enabled in two modes:

   o  Managed centrally from the DMS (Distribution Management System),
      or

   o  Executed locally through distributed control via intelligent
      switches and fault sensors.

   There are 3 distinct sub-functions that are performed:

   1. Fault Location Identification

   This sub-function is initiated by SCADA inputs, such as lockouts and
   fault indications/location, by input from the Outage Management
   System (OMS), and in the future by inputs from fault-predicting
   devices.
   It determines the specific protective device that cleared the
   sustained fault, identifies the de-energized sections, and estimates
   the probable location of the actual or the expected fault.  It
   distinguishes faults cleared by controllable protective devices from
   those cleared by fuses, and identifies momentary outages and
   inrush/cold load pick-up currents.  This step is also referred to as
   Fault Detection Classification and Location (FDCL).  It helps to
   expedite the restoration of faulted sections through fast fault
   location identification and improved diagnostic information
   available for crew dispatch.  It also provides visualization of
   fault information to design and implement a switching plan to
   isolate the fault.

   2.  Fault Type Determination

   I.  Indicates faults cleared by controllable protective devices by
   distinguishing between:

   a.  Faults cleared by fuses

   b.  Momentary outages

   c.  Inrush/cold load current

   II.  Determines the faulted sections based on SCADA fault indications
   and protection lockout signals

   III.  Increases the accuracy of the fault location estimation based
   on SCADA fault current measurements and real-time fault analysis

   3.  Fault Isolation and Service Restoration

   Once the location and type of the fault have been pinpointed, the
   system will attempt to isolate the fault and restore the non-faulted
   sections of the network.  This can have three modes of operation:

   I.  Closed-loop mode: This is initiated by the Fault Location
   sub-function.  It generates a switching order (i.e., a sequence of
   switching operations) for the remotely controlled switching devices
   to isolate the faulted section and restore service to the
   non-faulted sections.  The switching order is automatically executed
   via SCADA.

   II.  Advisory mode: This is initiated by the Fault Location
   sub-function.
   It generates a switching order for remotely and manually controlled
   switching devices to isolate the faulted section and restore service
   to the non-faulted sections.  The switching order is presented to
   the operator for approval and execution.

   III.  Study mode: The operator initiates this function.  It analyzes
   a saved case modified by the operator and generates a switching
   order under the operating conditions specified by the operator.

   With the increasing volume of data collected through fault sensors,
   utilities will use Big Data query and analysis tools to study outage
   information.  By detecting failure patterns and their correlation
   with asset age, type, load profiles, time of day, weather, and other
   conditions, they can discover the conditions that lead to faults and
   take the necessary preventive and corrective measures.

   +----------------------+--------------------------------------------+
   | FLISR Requirement    | Attribute                                  |
   +----------------------+--------------------------------------------+
   | One-way maximum      | 80 ms                                      |
   | delay                |                                            |
   |                      |                                            |
   | Asymmetric delay     | No                                         |
   | required             |                                            |
   |                      |                                            |
   | Maximum jitter       | 40 ms                                      |
   |                      |                                            |
   | Topology             | Point to point, point to Multi-point,      |
   |                      | Multi-point to Multi-point                 |
   |                      |                                            |
   | Bandwidth            | 64 kbps                                    |
   |                      |                                            |
   | Availability         | 99.9999%                                   |
   |                      |                                            |
   | Precise timing       | Yes                                        |
   | required             |                                            |
   |                      |                                            |
   | Recovery time on     | Depends on customer impact                 |
   | node failure         |                                            |
   |                      |                                            |
   | Performance          | Yes, mandatory                             |
   | management           |                                            |
   |                      |                                            |
   | Redundancy           | Yes                                        |
   |                      |                                            |
   | Packet loss          | 0.1%                                       |
   +----------------------+--------------------------------------------+

                 Table 9: FLISR Communication Requirements

4.2.3.  Generation use case

4.2.3.1.  Frequency Control / Automatic Generation Control (AGC)

   The system frequency should be maintained within a very narrow band.
   Deviations from the acceptable frequency range are detected and
   forwarded to the Load Frequency Control (LFC) system so that the
   required generation increase or decrease pulses can be sent to the
   power plants for frequency regulation.  The trend in system
   frequency is a measure of mismatch between demand and generation,
   and is a necessary parameter for load control in interconnected
   systems.

   Automatic generation control (AGC) is a system for adjusting the
   power output of generators at different power plants in response to
   changes in the load.  Since a power grid requires that generation
   and load closely balance moment by moment, frequent adjustments to
   the output of generators are necessary.  The balance can be judged
   by measuring the system frequency: if it is increasing, more power
   is being generated than used, and all machines in the system are
   accelerating.  If the system frequency is decreasing, more demand is
   on the system than the instantaneous generation can provide, and all
   generators are slowing down.

   Where the grid has tie lines to adjacent control areas, automatic
   generation control helps maintain the power interchanges over the
   tie lines at the scheduled levels.  The AGC takes into account
   various parameters, including the most economical units to adjust,
   the coordination of thermal, hydroelectric, and other generation
   types, and even constraints related to the stability of the system
   and the capacity of interconnections to other power grids.

   For the purpose of AGC, static frequency measurements are used, and
   averaging methods provide a more precise measure of system frequency
   in steady-state conditions.
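
   The averaging and deviation check described above can be sketched
   as follows.  This is only an illustration under stated assumptions,
   not the method of any particular LFC system: the nominal frequency
   (60 Hz), the deadband (0.02 Hz), and the function names are all
   hypothetical and do not come from this document.

```python
from statistics import fmean

NOMINAL_HZ = 60.0    # assumed nominal frequency (50.0 in many grids)
DEADBAND_HZ = 0.02   # assumed acceptable deviation, illustrative only


def average_frequency(samples_hz):
    """Average steady-state frequency samples to smooth measurement noise."""
    if not samples_hz:
        raise ValueError("at least one frequency sample is required")
    return fmean(samples_hz)


def generation_pulse(samples_hz, nominal_hz=NOMINAL_HZ,
                     deadband_hz=DEADBAND_HZ):
    """Return 'raise', 'lower', or 'none' for the averaged frequency."""
    deviation = average_frequency(samples_hz) - nominal_hz
    if deviation > deadband_hz:
        return "lower"   # frequency high: generation exceeds load
    if deviation < -deadband_hz:
        return "raise"   # frequency low: load exceeds generation
    return "none"        # within the acceptable band


if __name__ == "__main__":
    print(generation_pulse([59.90, 59.92, 59.91]))
```

   A real AGC would also weigh tie-line interchange, unit economics,
   and stability constraints, as noted above; the sketch covers only
   the frequency comparison.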

   During disturbances, more real-time dynamic measurements of system
   frequency are taken using PMUs, especially when different areas of
   the system exhibit different frequencies.  That, however, is outside
   the scope of this use case.

   +---------------------------------------------------+---------------+
   | FCAG (Frequency Control Automatic Generation)     | Attribute     |
   | Requirement                                       |               |
   +---------------------------------------------------+---------------+
   | One-way maximum delay                             | 500 ms        |
   |                                                   |               |
   | Asymmetric delay required                         | No            |
   |                                                   |               |
   | Maximum jitter                                    | Not critical  |
   |                                                   |               |
   | Topology                                          | Point to      |
   |                                                   | point         |
   |                                                   |               |
   | Bandwidth                                         | 20 kbps       |
   |                                                   |               |
   | Availability                                      | 99.999%       |
   |                                                   |               |
   | Precise timing required                           | Yes           |
   |                                                   |               |
   | Recovery time on node failure                     | N/A           |
   |                                                   |               |
   | Performance management                            | Yes,          |
   |                                                   | mandatory     |
   |                                                   |               |
   | Redundancy                                        | Yes           |
   |                                                   |               |
   | Packet loss                                       | 1%            |
   +---------------------------------------------------+---------------+

                 Table 10: FCAG Communication Requirements

4.3.  Specific Network topologies of Smart Grid Applications

   Utilities often have very large private telecommunications networks
   that cover an entire territory or country.  The main purpose of the
   network, until now, has been to support transmission network
   monitoring, control, and automation; remote control of generation
   sites; and FCAPS (Fault, Configuration, Accounting, Performance,
   Security) services from centralized network operation centers.

   Going forward, one network will support operation and maintenance of
   electrical networks (generation, transmission, and distribution),
   voice and data services for tens of thousands of employees and for
   exchange with neighboring interconnections, and administrative
   services.
   To meet those requirements, a utility may deploy several physical
   networks leveraging different technologies across the country: an
   optical network and a microwave network, for instance.

   Each protection and automation system between two points has two
   telecommunications circuits, one on each network.  Path diversity
   between two substations is key.  Regardless of the event type
   (hurricane, ice storm, etc.), one path shall stay available so the
   SPS can still operate.

   In the optical network, signals are transmitted over tens of
   thousands of circuits using fiber optic links, microwave, and
   telephone cables.  This network is the nervous system of the
   utility's power transmission operations.  The optical network
   represents tens of thousands of km of cable deployed along the power
   lines.

   Because transmission substations can be very far apart (for example,
   as far as 280 km), the fiber signal can be amplified to compensate
   for attenuation and reach a distance of 280 km.

4.4.  Precision Time Protocol

   Some utilities do not use GPS clocks in generation substations.  One
   of the main reasons is that some of the generation plants are 30 to
   50 meters underground, where the GPS signal can be weak and
   unreliable.  Instead, atomic clocks are used, and the clocks are
   synchronized with each other.  Rubidium clocks provide clock and
   1 ms timestamps for IRIG-B.  Some companies plan to transition to
   the Precision Time Protocol (IEEE 1588), distributing the
   synchronization signal over the IP/MPLS network.

   The Precision Time Protocol (PTP) is defined in IEEE standard 1588.
   PTP is applicable to distributed systems consisting of one or more
   nodes communicating over a network.  Nodes are modeled as containing
   a real-time clock that may be used by applications within the node
   for various purposes, such as generating time-stamps for data or
   ordering events managed by the node.
   The protocol provides a mechanism for synchronizing the clocks of
   participating nodes to a high degree of accuracy and precision.

   PTP operates based on the following assumptions:

      It is assumed that the network eliminates cyclic forwarding of
      PTP messages within each communication path (e.g., by using a
      spanning tree protocol).  PTP eliminates cyclic forwarding of PTP
      messages between communication paths.

      PTP is tolerant of an occasional missed message, duplicated
      message, or message that arrived out of order.  However, PTP
      assumes that such impairments are relatively rare.

      PTP was designed assuming a multicast communication model.  PTP
      also supports a unicast communication model as long as the
      behavior of the protocol is preserved.

      Like all message-based time transfer protocols, PTP time accuracy
      is degraded by asymmetry in the paths taken by event messages.
      Asymmetry is not detectable by PTP; however, if it is known, PTP
      corrects for it.

   A time-stamp event is generated at the time of transmission and
   reception of any event message.  The time-stamp event occurs when
   the message's timestamp point crosses the boundary between the node
   and the network.

   IEC 61850 will recommend the use of the IEEE PTP 1588 Utility
   Profile (as defined in IEC 62439-3 Annex B [IEC62439-3:2012]), which
   supports the redundant attachment of clocks to Parallel Redundancy
   Protocol (PRP) and High-availability Seamless Redundancy (HSR)
   networks.

5.  IANA Considerations

   This memo includes no request to IANA.

6.  Security Considerations

6.1.  Current Practices and Their Limitations

   Grid monitoring and control devices are already targets for cyber
   attacks, and legacy telecommunications protocols have many intrinsic
   network-related vulnerabilities.  DNP3, Modbus, PROFIBUS/PROFINET,
   and other protocols are designed around a common paradigm of request
   and respond.
   Each protocol is designed for a master device such as an HMI (Human
   Machine Interface) system to send commands to subordinate slave
   devices to retrieve data (reading inputs) or control (writing to
   outputs).  Because many of these protocols lack authentication,
   encryption, or other basic security measures, they are prone to
   network-based attacks, allowing a malicious actor or attacker to
   utilize the request-and-respond system as a mechanism for
   command-and-control-like functionality.  Specific security concerns
   common to most industrial control protocols, including utility
   telecommunication protocols, include the following:

   o  Network or transport errors (e.g., malformed packets or excessive
      latency) can cause protocol failure.

   o  Protocol commands may be available that are capable of forcing
      slave devices into inoperable states, including powering off
      devices, forcing them into a listen-only state, or disabling
      alarming.

   o  Protocol commands may be available that are capable of restarting
      communications and otherwise interrupting processes.

   o  Protocol commands may be available that are capable of clearing,
      erasing, or resetting diagnostic information such as counters and
      diagnostic registers.

   o  Protocol commands may be available that are capable of requesting
      sensitive information about the controllers, their
      configurations, or other need-to-know information.

   o  Most protocols are application layer protocols transported over
      TCP; therefore, it is easy to transport commands over
      non-standard ports or inject commands into authorized traffic
      flows.

   o  Protocol commands may be available that are capable of
      broadcasting messages to many devices at once (i.e., a potential
      DoS).

   o  Protocol commands may be available to query the device network to
      obtain defined points and their values (i.e., a configuration
      scan).

   o  Protocol commands may be available that will list all available
      function codes (i.e., a function scan).

   o  Bump in the wire (BITW) solutions: A hardware device is added to
      provide IPsec services between two routers that are not capable
      of IPsec functions.  This special IPsec device will then
      intercept outgoing datagrams, add IPsec protection to them, and
      strip it off incoming datagrams.  BITW can add IPsec to legacy
      hosts and can retrofit non-IPsec routers to provide security
      benefits.  The disadvantages are complexity and cost.

   These inherent vulnerabilities, along with increasing connectivity
   between IT and OT networks, make network-based attacks very
   feasible.  Simple injection of malicious protocol commands provides
   control over the target process.  Altering legitimate protocol
   traffic can also alter information about a process and disrupt the
   legitimate controls that are in place over that process.  A
   man-in-the-middle attack could provide both control over a process
   and misrepresentation of data back to operator consoles.

6.2.  Security Trends in Utility Networks

   Although advanced telecommunications networks can assist in
   transforming the energy industry, playing a critical role in
   maintaining high levels of reliability, performance, and
   manageability, they also introduce the need for an integrated
   security infrastructure.  Many of the technologies being deployed to
   support smart grid projects, such as smart meters and sensors, can
   increase the vulnerability of the grid to attack.
   Top security concerns for utilities migrating to an intelligent
   smart grid telecommunications platform center on the following
   trends:

   o  Integration of distributed energy resources

   o  Proliferation of digital devices to enable management,
      automation, protection, and control

   o  Regulatory mandates to comply with standards for critical
      infrastructure protection

   o  Migration to new systems for outage management, distribution
      automation, condition-based maintenance, load forecasting, and
      smart metering

   o  Demand for new levels of customer service and energy management

   This development of a diverse set of networks to support the
   integration of microgrids, open-access energy competition, and the
   use of network-controlled devices is driving the need for a
   converged security infrastructure for all participants in the smart
   grid, including utilities, energy service providers, large
   commercial and industrial customers, and residential customers.
   Securing the assets of electric power delivery systems, from the
   control center to the substation, to the feeders, and down to
   customer meters, requires an end-to-end security infrastructure that
   protects the myriad of telecommunications assets used to operate,
   monitor, and control power flow and measurement.  Cyber security
   refers to all the security issues in automation and
   telecommunications that affect any functions related to the
   operation of the electric power systems.
   Specifically, it involves the concepts of:

   o  Integrity: data cannot be altered undetectably

   o  Authenticity: the telecommunications parties involved must be
      validated as genuine

   o  Authorization: only requests and commands from the authorized
      users can be accepted by the system

   o  Confidentiality: data must not be accessible to any
      unauthenticated users

   When designing and deploying new smart grid devices and
   telecommunications systems, it is imperative to understand the
   various impacts of these new components on the power grid under a
   variety of attack situations.  Consequences of a cyber attack on the
   grid telecommunications network can be catastrophic.  This is why
   security for smart grid is not just an ad hoc feature or product; it
   is a complete framework integrating both physical and cyber security
   requirements and covering the entire smart grid network from
   generation to distribution.  Security has therefore become one of
   the main foundations of the utility telecom network architecture and
   must be considered at every layer with a defense-in-depth approach.
   Migrating to IP-based protocols is key to addressing these
   challenges for two reasons:

   1.  IP enables a rich set of features and capabilities to enhance
       the security posture

   2.  IP is based on open standards, which allows interoperability
       between different vendors and products, driving down the costs
       associated with implementing security solutions in OT networks.

   Securing OT (Operational Technology) telecommunications over
   packet-switched IP networks follows the same principles that are
   foundational for securing the IT infrastructure, i.e., consideration
   must be given to enforcing electronic access control for both
   person-to-machine and machine-to-machine communications, and to
   providing the appropriate levels of data privacy, device and
   platform integrity, and threat detection and mitigation.

7.  Acknowledgements

   Faramarz Maghsoodlou, Ph.D., IoT Connected Industries and Energy
   Practice, Cisco

   Pascal Thubert, CTAO, Cisco

8.  References

8.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

8.2.  Informative References

   [I-D.finn-detnet-problem-statement]
              Finn, N. and P. Thubert, "Deterministic Networking Problem
              Statement", draft-finn-detnet-problem-statement-03 (work
              in progress), June 2015.

   [IEC61850-90-12]
              TC57 WG10, IEC., "IEC 61850-90-12 TR: Communication
              networks and systems for power utility automation - Part
              90-12: Wide area network engineering guidelines", 2015.

   [IEC62439-3:2012]
              TC65, IEC., "IEC 62439-3: Industrial communication
              networks - High availability automation networks - Part 3:
              Parallel Redundancy Protocol (PRP) and High-availability
              Seamless Redundancy (HSR)", 2012.

Authors' Addresses

   Patrick Wetterwald
   Cisco Systems
   45 Allees des Ormes
   Mougins  06250
   FRANCE

   Phone: +33 4 97 23 26 36
   Email: pwetterw@cisco.com

   Jean Raymond
   Hydro-Quebec
   1500 University
   Montreal  H3A3S7
   Canada

   Phone: +1 514 840 3000
   Email: raymond.jean@hydro.qc.ca