Network Working Group                                          S. Vallin
Internet-Draft                                          Stefan Vallin AB
Intended status: Standards Track                            M. Bjorklund
Expires: February 9, 2019                                          Cisco
                                                          August 8, 2018

                            YANG Alarm Module
                    draft-ietf-ccamp-alarm-module-02

Abstract

   This document defines a YANG module for alarm management.
   It includes functions for alarm list management, alarm shelving and
   notifications to inform management systems. There are also RPCs to
   manage the operator state of an alarm and administrative alarm
   procedures. The module carefully maps to relevant alarm standards.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on February 9, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology and Notation
   2.  Objectives
   3.  Alarm Module Concepts
     3.1.  Alarm Definition
     3.2.  Alarm Type
     3.3.  Identifying the Alarming Resource
     3.4.  Identifying Alarm Instances
     3.5.  Alarm Life-Cycle
       3.5.1.  Resource Alarm Life-Cycle
       3.5.2.  Operator Alarm Life-cycle
       3.5.3.  Administrative Alarm Life-Cycle
     3.6.  Root Cause, Impacted Resources and Related Alarms
     3.7.  Alarm Shelving
     3.8.  Alarm Profiles
   4.  Alarm Data Model
     4.1.  Alarm Control
       4.1.1.  Alarm Shelving
     4.2.  Alarm Inventory
     4.3.  Alarm Summary
     4.4.  The Alarm List
     4.5.  The Shelved Alarms List
     4.6.  Alarm Profiles
     4.7.  RPCs and Actions
     4.8.  Notifications
   5.  Alarm YANG Module
   6.  X.733 Extensions
   7.  The X.733 Mapping Module
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Vendor-specific Alarm-Types Example
   Appendix B.  Alarm Inventory Example
   Appendix C.  Alarm List Example
   Appendix D.  Alarm Shelving Example
   Appendix E.  X.733 Mapping Example
   Appendix F.  Background and Usability Requirements
     F.1.  Alarm Concepts
       F.1.1.  Alarm type
     F.2.  Usability Requirements
   Authors' Addresses

1.  Introduction

   This document defines a YANG [RFC7950] module for alarm management.
   The purpose is to define a standardised alarm interface for network
   devices that can be easily integrated into management applications.
   The model is also applicable as a northbound alarm interface in the
   management applications.

   Alarm monitoring is a fundamental part of monitoring the network.
   Raw alarms from devices do not always tell the status of the network
   services or necessarily point to the root cause. However, being able
   to feed alarms to the alarm management application in a standardised
   format is a starting point for performing higher level network
   assurance tasks.

   The design of the module is based on experience from using and
   implementing available alarm standards from ITU [X.733], 3GPP
   [ALARMIRP] and ANSI [ISA182].

1.1.  Terminology and Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The following terms are defined in [RFC7950]:

   o  action

   o  client

   o  data tree

   o  RPC

   o  server

   The following terms are used within this document:

   o  Alarm (the general concept): An alarm signifies an undesirable
      state in a resource that requires corrective action.

   o  Alarm Type: An alarm type identifies a possible unique alarm state
      for a resource. Alarm types are names to identify the state like
      "link-alarm", "jitter-violation", "high-disk-utilization".

   o  Resource: A fine-grained identification of the alarming resource,
      for example: an interface, a process.

   o  Alarm Instance: The alarm state for a specific resource and alarm
      type. For example (GigabitEthernet0/15, link-alarm). An entry in
      the alarm list.

   o  Alarm Inventory: A list of all possible alarm types on a system.

   o  Alarm Shelving: Blocking alarms according to specific criteria.

   o  Management System: The alarm management application that consumes
      the alarms, i.e., acts as a client.

   o  System: The system that implements this YANG alarm module, i.e.,
      acts as a server. This corresponds to a network device or a
      management application that provides a north-bound alarm
      interface.

   Tree diagrams used in this document follow the notation defined in
   [RFC8340].

2.  Objectives

   The objectives for the design of the Alarm Module are:

   o  Simple to use. If a system supports this module, it shall be
      straightforward to integrate this into a YANG-based alarm
      manager.

   o  View alarms as states on resources and not as discrete
      notifications.

   o  Clear definition of "alarm" in order to exclude general events
      that should not be forwarded as alarm notifications.

   o  Clear and precise identification of alarm types and alarm
      instances.

   o  A management system should be able to pull all available alarm
      types from a system, i.e., read the alarm inventory from a system.
      This makes it possible to prepare alarm operators with
      corresponding alarm instructions.

   o  Address alarm usability requirements, see Appendix F. While IETF
      has not really addressed alarm management, telecom standards have
      addressed it purely from a protocol perspective. The process
      industry has published several relevant standards addressing
      requirements for a useful alarm interface: [EEMUA] and [ISA182].
      This alarm module defines usability requirements as well as a YANG
      data model.

   o  Mapping to X.733, which is a requirement for some alarm systems.
      Still, keep some of the X.733 concepts out of the core model in
      order to make the model small and easy to understand.

3.  Alarm Module Concepts

   This section defines the fundamental concepts behind the data model.
   This section is rooted in the works of Vallin et al. [ALARMSEM].

3.1.  Alarm Definition

   An alarm signifies an undesirable state in a resource that requires
   corrective action.

   There are two main things to remember from this definition:

   1.  The definition focuses on leaving out events and logging
       information in general. Alarms should only be used for undesired
       states that require action.

   2.  The definition also focuses on alarms as a state on a resource,
       not the notifications that report the state changes.

   See Appendix F for more motivation and consequences around this
   definition.

3.2.  Alarm Type

   This document defines an alarm type with an alarm type id and an
   alarm type qualifier.

   The alarm type id is modeled as a YANG identity. With YANG
   identities, new alarm types can be defined in a distributed fashion.
   YANG identities are hierarchical, which means that a hierarchy of
   alarm types can be defined.

   Standards and vendors should define their own alarm type identities
   based on this definition.
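
   As an informal illustration of what "hierarchical" means for a
   client, the sketch below models an identity hierarchy as a simple
   mapping from each identity to its base and checks derivation. It is
   not part of the YANG module. The identity names under
   "alarm-type-id" and the helper functions are assumptions chosen for
   illustration only.

```python
# Hypothetical model of a YANG identity hierarchy: each identity maps to
# its base identity (None for the root). Only "alarm-type-id" comes from
# the alarm module; the other names are illustrative assumptions.
BASES = {
    "alarm-type-id": None,
    "vendor-alarms": "alarm-type-id",
    "communications-alarm": "vendor-alarms",
    "link-alarm": "communications-alarm",
}


def ancestors(identity, bases=BASES):
    """Return the chain of base identities, nearest base first."""
    chain = []
    current = bases[identity]
    while current is not None:
        chain.append(current)
        current = bases[current]
    return chain


def is_derived_from(identity, base, bases=BASES):
    """True if 'identity' is directly or indirectly derived from 'base'."""
    return base in ancestors(identity, bases)


if __name__ == "__main__":
    print(ancestors("link-alarm"))
    print(is_derived_from("link-alarm", "vendor-alarms"))
```

   A client could use a check of this kind to select every alarm whose
   type is derived from a given abstract identity, without listing each
   concrete alarm type.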

   The use of YANG identities means that all possible alarms are
   identified at design time. This explicit declaration of alarm types
   makes it easier to allow for alarm qualification reviews and
   preparation of alarm actions and documentation.

   There are occasions where the alarm types are not known at design
   time. For example, a system with digital inputs that allows users to
   connect detectors (e.g., smoke detector) to the inputs. In this
   case it is a configuration action that says that certain connectors
   are fire alarms for example. A potential drawback of this is that
   there is a big risk that alarm operators will receive alarm types as
   a surprise: they do not know how to resolve the problem since a
   defined alarm procedure does not necessarily exist. To avoid this
   risk the system MUST publish all possible alarm types in the alarm
   inventory, see Section 4.2.

   In order to allow for dynamic addition of alarm types the alarm
   module also allows for further qualification of the identity-based
   alarm type using a string.

   A vendor or standard can then define their own alarm-type hierarchy.
   The example below shows a hierarchy based on X.733 event types:

     import ietf-alarms {
       prefix al;
     }
     identity vendor-alarms {
       base al:alarm-type-id;
     }
     identity communications-alarm {
       base vendor-alarms;
     }
     identity link-alarm {
       base communications-alarm;
     }

   Alarm types can be abstract. An abstract alarm type is used as a
   base for defining hierarchical alarm types. Concrete alarm types are
   used for alarm states and appear in the alarm inventory. There are
   two kinds of concrete alarm types:

   1.  The last subordinate identity in the "alarm-type-id" hierarchy is
       concrete, for example:
       "alarm-identity.environmental-alarm.smoke".
       In this example "alarm-identity" and "environmental-alarm" are
       abstract YANG identities, whereas "smoke" is a concrete YANG
       identity.

   2.  The YANG identity hierarchy is abstract and the concrete alarm
       type is defined by the dynamic alarm qualifier string, for
       example: "alarm-identity.environmental-alarm.external-detector"
       with alarm-type-qualifier "smoke".

   For example:

     // Alternative 1: concrete alarm type identity
     import ietf-alarms {
       prefix al;
     }
     identity environmental-alarm {
       base al:alarm-type-id;
       description "Abstract alarm type";
     }
     identity smoke {
       base environmental-alarm;
       description "Concrete alarm type";
     }

     // Alternative 2: concrete alarm type qualifier
     import ietf-alarms {
       prefix al;
     }
     identity environmental-alarm {
       base al:alarm-type-id;
       description "Abstract alarm type";
     }
     identity external-detector {
       base environmental-alarm;
       description
         "Abstract alarm type, a run-time configuration
          procedure sets the type of alarm detected. This will
          be reported in the alarm-type-qualifier.";
     }

   A server SHOULD strive to minimize the number of dynamically defined
   alarm types.

3.3.  Identifying the Alarming Resource

   It is of vital importance to be able to refer to the alarming
   resource. This reference must be as fine-grained as possible. If
   the alarming resource exists in the data tree then an
   instance-identifier MUST be used with the full path to the object.

   This module also allows for alternate naming of the alarming resource
   if it is not available in the data tree.

3.4.  Identifying Alarm Instances

   A primary goal of this alarm module is to remove any ambiguity in how
   alarm notifications are mapped to an update of an alarm instance.
   X.733 and especially 3GPP were not really clear on this point.
   This YANG alarm module states that the tuple (resource, alarm type
   identifier, alarm type qualifier) corresponds to a single alarm
   instance. This means that alarm notifications for the same resource
   and same alarm type are matched to update the same alarm instance.
   These three leafs are therefore used as the key in the alarm list:

     list alarm {
       key "resource alarm-type-id alarm-type-qualifier";
       ...
     }

3.5.  Alarm Life-Cycle

   The alarm model clearly separates the resource alarm life-cycle from
   the operator and administrative life-cycles of an alarm.

   o  resource alarm life-cycle: the alarm instrumentation that controls
      alarm raise, clearance, and severity changes.

   o  operator alarm life-cycle: operators acting upon alarms with
      actions like acknowledgment and closing. Closing an alarm implies
      that the operator considers the corrective action performed.
      Operators can also shelve (block/filter) alarms in order to avoid
      nuisance alarms.

   o  administrative alarm life-cycle: purging (deleting) unwanted
      alarms and compressing the alarm status change list. This module
      exposes operations to manage the administrative life-cycle. The
      server may also perform these operations based on other policies,
      but how that is done is out of scope for this document.

   A server SHOULD describe how long it retains cleared/closed alarms:
   until manually purged or if it has an automatic removal policy.

3.5.1.  Resource Alarm Life-Cycle

   From a resource perspective, an alarm can for example have the
   following life-cycle: raise, change severity, change severity, clear,
   being raised again etc. All of these status changes can have
   different alarm texts generated by the instrumentation. Two
   important things to note:

   1.  Alarms are not deleted when they are cleared. Deleting alarms is
       an administrative process. The alarm module defines an rpc
       "purge-alarms" that deletes alarms.

   2.  Alarms are not cleared by operators, only the underlying
       instrumentation can clear an alarm. Operators can close alarms.

   The YANG tree representation below illustrates the resource oriented
   life-cycle:

     +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
        ...
        +--ro is-cleared            boolean
        +--ro last-changed          yang:date-and-time
        +--ro perceived-severity    severity
        +--ro alarm-text            alarm-text
        +--ro status-change* [time]
           +--ro time                  yang:date-and-time
           +--ro perceived-severity    severity-with-clear
           +--ro alarm-text            alarm-text

   For every status change from the resource perspective a row is added
   to the "status-change" list. The last status values are also
   represented as leafs for the alarm. Note well that the alarm
   severity does not include "cleared"; alarm clearance is a boolean
   flag.

   An alarm can therefore look like this: ((GigabitEthernet0/25,
   link-alarm, ""), false, T, major, "Interface GigabitEthernet0/25
   down")

3.5.2.  Operator Alarm Life-cycle

   Operators can also act upon alarms using the set-operator-state
   action:

     +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
        ...
        +--ro operator-state-change* [time] {operator-actions}?
        |  +--ro time        yang:date-and-time
        |  +--ro operator    string
        |  +--ro state       operator-state
        |  +--ro text?       string
        +---x set-operator-state {operator-actions}?
           +---w input
              +---w state    writable-operator-state
              +---w text?    string

   The operator state for an alarm can be: "none", "ack", "shelved", and
   "closed". Alarm deletion (using the rpc "purge-alarms") can use
   this state as a criterion. A closed alarm is an alarm where the
   operator has performed any required corrective actions. Closed
   alarms are good candidates for being purged.

3.5.3.  Administrative Alarm Life-Cycle

   Deleting alarms from the alarm list is considered an administrative
   action. This is supported by the "purge-alarms" rpc.
   The "purge-alarms" rpc takes a filter as input. The filter selects
   alarms based on the operator and resource life-cycle such as "all
   closed cleared alarms older than a time specification". The server
   may also perform these operations based on other policies, but how
   that is done is out of scope for this document.

   Alarms can be compressed. Compressing an alarm deletes all entries
   in the alarm's "status-change" list except for the last status
   change. A client can perform this using the "compress-alarms" rpc.
   The server may also perform these operations based on other policies,
   but how that is done is out of scope for this document.

3.6.  Root Cause, Impacted Resources and Related Alarms

   The general principle of this alarm module is to limit the number of
   alarms. The alarm has two leaf-lists to identify possible impacted
   resources and possible root-cause resources. The system should not
   represent individual alarms for the possible root-cause resources and
   impacted resources. These serve as hints only. It is up to the
   client application to use this information to present the overall
   status.

   A system should always strive to identify the resource that can be
   acted upon as the "resource" leaf. The "impacted-resource" leaf-list
   shall be used to identify any side-effects of the alarm. The
   impacted resources cannot be acted upon to fix the problem. An
   example of this kind of alarm might be a disc full problem which
   impacts a number of databases.

   On some occasions the system might not be capable of detecting the
   root cause, i.e., the resource that can be acted upon. The
   instrumentation in this case only monitors the side-effect and needs
   to represent an alarm that indicates a situation that needs acting
   upon. The instrumentation still might identify possible candidates
   for the root-cause resource.
   In this case the "root-cause-resource" leaf-list can be used to
   indicate the candidate root-cause resources. An example of this kind
   of alarm might be an active test tool that detects an SLA violation
   on a VPN connection and identifies the devices along the chain as
   candidate root causes.

   The alarm module also supports a way to associate different alarms to
   each other with the "related-alarm" list. This list enables the
   server to inform the client that certain alarms are related to other
   alarms.

   Note well that this module does not prescribe any dependencies or
   preference between the above alarm correlation mechanisms. Different
   systems have different capabilities and the above described
   mechanisms are available to support the instrumentation features.

3.7.  Alarm Shelving

   Alarm shelving is an important function in order for alarm management
   applications and operators to stop superfluous alarms. A shelved
   alarm implies that any alarms fulfilling the shelving criteria are
   ignored (blocked/filtered). Shelved alarms appear in a dedicated
   shelved alarm list in order not to disturb the relevant alarms.
   Shelved alarms do not generate notifications.

3.8.  Alarm Profiles

   Alarm profiles are used to configure further information to an alarm
   type. This module supports configuring severity levels overriding
   the system default levels. This corresponds to the Alarm Severity
   Assignment Profile (ASAP) functionality in M.3100 [M.3100] and M.3160
   [M.3160]. Other standard or enterprise modules can augment this list
   with further alarm type information.

4.  Alarm Data Model

   The fundamental parts of the data model are the "alarm-list" with
   associated notifications and the "alarm-inventory" list of all
   possible alarm types. These MUST be implemented by a system.
   The rest of the data model is made conditional with the YANG features
   "operator-actions", "alarm-shelving", "alarm-history",
   "alarm-summary", "alarm-profile", and "severity-assignment".

   The data model has the following overall structure:

4.1.  Alarm Control

   The "/alarms/control/notify-status-changes" choice controls whether
   notifications are sent for all state changes, only raise and clear,
   or only notifications more severe than a configured level. This
   feature in combination with alarm shelving corresponds to the ITU
   Alarm Report Control functionality.

   Every alarm has a list of status changes; this is a circular list.
   The length of this list is controlled by
   "/alarms/control/max-alarm-status-changes".

4.1.1.  Alarm Shelving

   The shelving control tree is shown below:

   Shelved alarms are shown in a dedicated shelved alarm list. The
   instrumentation MUST move shelved alarms from the alarm list
   (/alarms/alarm-list) to the shelved alarm list
   (/alarms/shelved-alarms/). Shelved alarms do not generate any
   notifications. When the shelving criteria are removed or changed the
   alarm list MUST be updated to the correct actual state of the alarms.

   Shelving and unshelving can only be performed by editing the shelf
   configuration. It cannot be performed on individual alarms. The
   server will add an operator state indicating that the alarm was
   shelved/unshelved.

   A leaf (/alarms/summary/shelfs-active) in the alarm summary indicates
   if there are shelved alarms.

   A system can select to not support the shelving feature.

4.2.  Alarm Inventory

   The alarm inventory represents all possible alarm types that may
   occur in the system. A management system may use this to build alarm
   procedures. The alarm inventory is relevant for several reasons:

      The system might not instrument all defined alarm type identities,
      and some alarm identities are abstract.

      The system has configured dynamic alarm types using the alarm
      qualifier. The inventory makes it possible for the management
      system to discover these.

   Note that the mechanism whereby dynamic alarm types are added using
   the alarm type qualifier MUST populate this list.

   The optional leaf-list "resource" in the alarm inventory enables the
   system to publish for which resources a given alarm type may appear.

   A server MUST implement the alarm inventory in order to enable
   controlled alarm procedures in the client.

   The alarm inventory tree is shown below:

4.3.  Alarm Summary

   The alarm summary list summarises alarms per severity; how many
   cleared, cleared and closed, and closed. It also gives an indication
   if there are shelved alarms.

   The alarm summary tree is shown below:

4.4.  The Alarm List

   The alarm list (/alarms/alarm-list) is a function from (resource,
   alarm type, alarm type qualifier) to the current alarm state.

   Every alarm has three important states: the resource clearance state
   "is-cleared", the severity "perceived-severity" and the operator
   state available in the operator state change list.

   In order to see the alarm history the resource state changes are
   available in the "status-change" list and the operator history is
   available in the "operator-state-change" list.

4.5.  The Shelved Alarms List

   The shelved alarm list has the same structure as the alarm list
   above. It shows all the alarms that match the shelving criteria
   (/alarms/control/alarm-shelving).

4.6.  Alarm Profiles

   Alarm profiles (/alarms/alarm-profile/) is a list of configurable
   alarm types. The list supports configurable alarm severity levels in
   the container "alarm-severity-assignment-profile". If an alarm
   matches the configured alarm type it MUST use the configured severity
   level(s) instead of the system default.
   This configuration MUST also be represented in the alarm inventory.

4.7.  RPCs and Actions

   The alarm module supports rpcs and actions to manage the alarms:

      "purge-alarms" (rpc): delete alarms according to specific
      criteria, for example all cleared alarms older than a specific
      date.

      "compress-alarms" (rpc): compress the status-change list for the
      alarms.

      "set-operator-state" (action): change the operator state for an
      alarm: for example acknowledge.

4.8.  Notifications

   The alarm module supports a general notification to report alarm
   state changes. It carries all relevant parameters for the alarm
   management application.

   There is also a notification to report that an operator changed the
   operator state on an alarm, like acknowledge.

   If the alarm inventory is changed, for example a new card type is
   inserted, a notification will tell the management application that
   new alarm types are available.

5.  Alarm YANG Module

   This YANG module references [RFC6991].

   file "ietf-alarms@2018-08-08.yang"
   module ietf-alarms {
     yang-version 1.1;
     namespace "urn:ietf:params:xml:ns:yang:ietf-alarms";
     prefix al;

     import ietf-yang-types {
       prefix yang;
       reference "RFC 6991: Common YANG Data Types.";
     }

     organization
       "IETF CCAMP Working Group";
     contact
       "WG Web:
        WG List:

        Editor:   Stefan Vallin

        Editor:   Martin Bjorklund
       ";
     description
       "This module defines an interface for managing alarms. Main
        inputs to the module design are the 3GPP Alarm IRP, ITU-T X.733
        and ANSI/ISA-18.2 alarm standards.

        Main features of this module include:

          * Alarm list:
              A list of all alarms. Cleared alarms stay in
              the list until explicitly purged.

          * Operator actions on alarms:
              Acknowledging and closing alarms.

          * Administrative actions on alarms:
              Purging alarms from the list according to specific
              criteria.

          * Alarm inventory:
              A management application can read all
              alarm types implemented by the system.

          * Alarm shelving:
              Shelving (blocking) alarms according
              to specific criteria.

          * Alarm profiles:
              A management system can attach further
              information to alarm types, for example
              overriding system default severity
              levels.

        This module uses a stateful view on alarms. An alarm is a state
        for a specific resource (note that an alarm is not a
        notification). An alarm type is a possible alarm state for a
        resource. For example, the tuple:

          ('link-alarm', 'GigabitEthernet0/25')

        is an alarm of type 'link-alarm' on the resource
        'GigabitEthernet0/25'.

        Alarm types are identified using YANG identities and an optional
        string-based qualifier. The string-based qualifier allows for
        dynamic extension of the statically defined alarm types. Alarm
        types identify a possible alarm state and not the individual
        notifications. For example, the traditional 'link-down' and
        'link-up' notifications are two notifications referring to the
        same alarm type 'link-alarm'.

        With this design there is no ambiguity about how alarm and alarm
        clear correlation should be performed: notifications that report
        the same resource and alarm type are considered updates of the
        same alarm, e.g., clearing an active alarm or changing the
        severity of an alarm.

        The instrumentation can update 'severity' and 'alarm-text' on an
        existing alarm.
        The above alarm example can therefore look like:

          (('link-alarm', 'GigabitEthernet0/25'),
           warning,
           'interface down while interface admin state is up')

        There is a clear separation between updates on the alarm from
        the underlying resource, like clear, and updates from an
        operator like acknowledge or closing an alarm:

          (('link-alarm', 'GigabitEthernet0/25'),
           warning,
           'interface down while interface admin state is up',
           cleared,
           closed)

        Administrative actions like removing closed alarms older than a
        given time are supported.

        This alarm module does not define how the underlying
        instrumentation detects and clears the specific alarms.
        That belongs to the SDO or enterprise that owns that
        specific technology.

        Copyright (c) 2018 IETF Trust and the persons identified as
        authors of the code. All rights reserved.

        Redistribution and use in source and binary forms, with or
        without modification, is permitted pursuant to, and subject to
        the license terms contained in, the Simplified BSD License set
        forth in Section 4.c of the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info).

        The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL
        NOT', 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'MAY', and
        'OPTIONAL' in the module text are to be interpreted as described
        in RFC 2119 (https://tools.ietf.org/html/rfc2119).
760 This version of this YANG module is part of RFC XXXX 761 (https://tools.ietf.org/html/rfcXXXX); see the RFC itself for 762 full legal notices."; 764 revision 2018-08-08 { 765 description 766 "Initial revision."; 767 reference "RFC XXXX: YANG Alarm Module"; 768 } 770 /* 771 * Features 772 */ 774 feature operator-actions { 775 description 776 "This feature indicates that the system supports operator 777 states on alarms."; 778 } 780 feature alarm-shelving { 781 description 782 "This feature indicates that the system supports shelving 783 (blocking) alarms."; 784 } 786 feature alarm-history { 787 description 788 "This feature indicates that the server maintains a history of 789 state changes for each alarm. For example, if an alarm 790 toggles between cleared and active 10 times, these state 791 changes are present in a separate list in the alarm."; 792 } 794 feature alarm-summary { 795 description 796 "This feature indicates that the server summarizes the number 797 of alarms per severity and operator state."; 798 } 800 feature alarm-profile { 801 description 802 "The system allows clients to configure further information 803 for each alarm type."; 804 } 806 feature severity-assignment { 807 description 808 "The system supports configurable alarm severity levels."; 809 reference 810 "M.3160/M.3100 Alarm Severity Assignment Profile, ASAP"; 811 } 813 /* 814 * Identities 815 */ 817 identity alarm-type-id { 818 description 819 "Base identity for alarm types. A unique identification of the 820 alarm, not including the resource. Different resources can 821 share alarm types. If a resource reports the same alarm 822 type again, it is considered to be the same alarm. The alarm 823 type is a simplification of the different X.733 and 3GPP alarm 824 IRP alarm correlation mechanisms and it allows for 825 hierarchical extensions.
827 A string-based qualifier can be used in addition to the 828 identity in order to have different alarm types based on 829 information not known at design-time, such as values in 830 textual SNMP Notification var-binds. 832 Standards and vendors can define sub-identities to clearly 833 identify specific alarm types. 835 This identity is abstract and MUST NOT be used for alarms."; 836 } 837 /* 838 * Common types 839 */ 841 typedef resource { 842 type union { 843 type instance-identifier { 844 require-instance false; 845 } 846 type yang:object-identifier; 847 type yang:uuid; 848 type string; 849 } 850 description 851 "This is an identification of the alarming resource, such as an 852 interface. It should be as fine-grained as possible both to 853 guide the operator and to guarantee uniqueness of the alarms. 855 If the alarming resource is modelled in YANG, this type will 856 be an instance-identifier. 858 If the resource is an SNMP object, the type will be an 859 object-identifier. 861 If the resource is anything else, for example a distinguished 862 name or a CIM path, this type will be a string. 864 If the alarming object is identified by a UUID, use the uuid 865 type. Be cautious when using this type, since a UUID is hard 866 for an operator to use. 868 If the server supports several models, the precedence should 869 follow the order given in the union definition."; 870 } 872 typedef resource-match { 873 type union { 874 type yang:xpath1.0; 875 type yang:object-identifier; 876 type string; 877 } 878 description 879 "This type is used to match resources of type 'resource'. 880 Since the type 'resource' is a union of different types, 881 the 'resource-match' type is also a union of corresponding 882 types. 884 If the type is given as an XPath 1.0 expression, a resource 885 of type 'instance-identifier' matches if the instance is part 886 of the node set that is the result of evaluating the XPath 1.0 887 expression.
For example, the XPath 1.0 expression: 889 /if:interfaces/if:interface[if:type='ianaift:ethernetCsmacd'] 891 would match the resource instance-identifier: 893 /if:interfaces/if:interface[if:name='eth1'], 895 assuming that the interface 'eth1' is of type 896 'ianaift:ethernetCsmacd'. 898 If the type is given as an object identifier, a resource of 899 type 'object-identifier' matches if the match object 900 identifier is a prefix of the resource's object identifier. 901 For example, the value: 903 1.3.6.1.2.1.2.2 905 would match the resource object identifier: 907 1.3.6.1.2.1.2.2.1.1.5 909 If the type is given as a UUID or a string, it is interpreted 910 as a W3C regular expression, which matches a resource of type 911 'yang:uuid' or 'string' if the given regular expression 912 matches the resource string. 914 If the type is given as an XPath expression, it is evaluated 915 in the following XPath context: 917 o The set of namespace declarations are those in scope on 918 the leaf element where this type is used. 920 o The set of variable bindings is empty. 922 o The function library is the core function library 923 and the functions defined in Section 10 of RFC 7950. 927 o The context node is the root node in the data tree."; 928 } 930 typedef alarm-text { 931 type string; 932 description 933 "The string used to inform operators about the alarm. This 934 MUST contain enough information for an operator to be able 935 to understand the problem and how to resolve it. If this 936 string contains structure, this format should be clearly 937 documented for programs to be able to parse that 938 information."; 939 } 941 typedef severity { 942 type enumeration { 943 enum indeterminate { 944 value 2; 945 description 946 "Indicates that the severity level could not be 947 determined.
This level SHOULD be avoided."; 948 } 949 enum minor { 950 value 3; 951 description 952 "The 'minor' severity level indicates the existence of a 953 non-service affecting fault condition and that corrective 954 action should be taken in order to prevent a more serious 955 (for example, service affecting) fault. Such a severity 956 can be reported, for example, when the detected alarm 957 condition is not currently degrading the capacity of the 958 resource."; 959 } 960 enum warning { 961 value 4; 962 description 963 "The 'warning' severity level indicates the detection of a 964 potential or impending service affecting fault, before any 965 significant effects have been felt. Action should be 966 taken to further diagnose (if necessary) and correct the 967 problem in order to prevent it from becoming a more 968 serious service affecting fault."; 969 } 970 enum major { 971 value 5; 972 description 973 "The 'major' severity level indicates that a service 974 affecting condition has developed and an urgent corrective 975 action is required. Such a severity can be reported, for 976 example, when there is a severe degradation in the 977 capability of the resource and its full capability must be 978 restored."; 979 } 980 enum critical { 981 value 6; 982 description 983 "The 'critical' severity level indicates that a service 984 affecting condition has occurred and an immediate 985 corrective action is required. Such a severity can be 986 reported, for example, when a resource becomes totally out 987 of service and its capability must be restored."; 988 } 989 } 990 description 991 "The severity level of the alarm. Note well that value 'clear' 992 is not included. 
If an alarm is cleared or not is a separate 993 boolean flag."; 994 reference 995 "ITU Recommendation X.733: Information Technology 996 - Open Systems Interconnection 997 - System Management: Alarm Reporting Function"; 998 } 1000 typedef severity-with-clear { 1001 type union { 1002 type enumeration { 1003 enum cleared { 1004 value 1; 1005 description 1006 "The alarm is cleared by the instrumentation."; 1007 } 1008 } 1009 type severity; 1010 } 1011 description 1012 "The severity level of the alarm including clear. 1013 This is used *only* in notifications reporting state changes 1014 for an alarm."; 1015 } 1017 typedef writable-operator-state { 1018 type enumeration { 1019 enum none { 1020 value 1; 1021 description 1022 "The alarm is not being taken care of."; 1023 } 1024 enum ack { 1025 value 2; 1026 description 1027 "The alarm is being taken care of. Corrective action not 1028 taken yet, or failed"; 1030 } 1031 enum closed { 1032 value 3; 1033 description 1034 "Corrective action taken successfully."; 1035 } 1036 } 1037 description 1038 "Operator states on an alarm. The 'closed' state indicates 1039 that an operator considers the alarm being resolved. This 1040 is separate from the alarm's 'is-cleared' leaf."; 1041 } 1043 typedef operator-state { 1044 type union { 1045 type writable-operator-state; 1046 type enumeration { 1047 enum shelved { 1048 value 4; 1049 description 1050 "The alarm is shelved. Alarms in /alarms/shelved-alarms/ 1051 MUST be assigned this operator state by the server as 1052 the last entry in the operator-state-change list. The 1053 text for that entry SHOULD include the shelf name."; 1054 } 1055 enum un-shelved { 1056 value 5; 1057 description 1058 "The alarm is moved back to 'alarm-list' from a shelf. 1059 Alarms that are moved from /alarms/shelved-alarms/ to 1060 /alarms/alarm-list MUST be assigned this state by the 1061 server as the last entry in the 'operator-state-change' 1062 list. 
The text for that entry SHOULD include the shelf 1063 name."; 1064 } 1065 } 1066 } 1067 description 1068 "Operator states on an alarm. The 'closed' state indicates 1069 that an operator considers the alarm being resolved. This 1070 is separate from the alarm's 'is-cleared' leaf."; 1071 } 1073 /* Alarm type */ 1075 typedef alarm-type-id { 1076 type identityref { 1077 base alarm-type-id; 1079 } 1080 description 1081 "Identifies an alarm type. The description of the alarm type 1082 id MUST indicate if the alarm type is abstract or not. An 1083 abstract alarm type is used as a base for other alarm type ids 1084 and will not be used as a value for an alarm or be present in 1085 the alarm inventory."; 1086 } 1088 typedef alarm-type-qualifier { 1089 type string; 1090 description 1091 "If an alarm type can not be fully specified at design time by 1092 alarm-type-id, this string qualifier is used in addition to 1093 fully define a unique alarm type. 1095 The definition of alarm qualifiers is considered being part 1096 of the instrumentation and out of scope for this module. 1097 An empty string is used when this is part of a key."; 1098 } 1100 /* 1101 * Groupings 1102 */ 1104 grouping common-alarm-parameters { 1105 description 1106 "Common parameters for an alarm. 1108 This grouping is used both in the alarm list and in the 1109 notification representing an alarm state change."; 1110 leaf resource { 1111 type resource; 1112 mandatory true; 1113 description 1114 "The alarming resource. See also 'alt-resource'. 
1115 This could, for example, be a reference to the alarming 1116 interface."; 1117 } 1118 leaf alarm-type-id { 1119 type alarm-type-id; 1120 mandatory true; 1121 description 1122 "This leaf and the leaf 'alarm-type-qualifier' together 1123 provide a unique identification of the alarm type."; 1124 } 1125 leaf alarm-type-qualifier { 1126 type alarm-type-qualifier; 1127 description 1128 "This leaf is used when the 'alarm-type-id' leaf cannot 1129 uniquely identify the alarm type. Normally, this is not 1130 the case, and this leaf is the empty string."; 1131 } 1132 leaf-list alt-resource { 1133 type resource; 1134 description 1135 "Used if the alarming resource is available over other 1136 interfaces. This field can contain SNMP OIDs, CIM paths, or 1137 3GPP distinguished names, for example."; 1138 } 1139 list related-alarm { 1140 key "resource alarm-type-id alarm-type-qualifier"; 1141 description 1142 "References to related alarms. Note that the related alarm 1143 might have been purged from the alarm list."; 1144 leaf resource { 1145 type leafref { 1146 path "/alarms/alarm-list/alarm/resource"; 1147 require-instance false; 1148 } 1149 description 1150 "The alarming resource for the related alarm."; 1151 } 1152 leaf alarm-type-id { 1153 type leafref { 1154 path "/alarms/alarm-list/alarm" 1155 + "[resource=current()/../resource]" 1156 + "/alarm-type-id"; 1157 require-instance false; 1158 } 1159 description 1160 "The alarm type identifier for the related alarm."; 1161 } 1162 leaf alarm-type-qualifier { 1163 type leafref { 1164 path "/alarms/alarm-list/alarm" 1165 + "[resource=current()/../resource]" 1166 + "[alarm-type-id=current()/../alarm-type-id]" 1167 + "/alarm-type-qualifier"; 1168 require-instance false; 1169 } 1170 description 1171 "The alarm qualifier for the related alarm."; 1172 } 1173 } 1174 leaf-list impacted-resource { 1175 type resource; 1176 description 1177 "Resources that might be affected by this alarm.
If the 1178 system creates an alarm on a resource and also has a mapping 1179 to other resources that might be impacted, these resources 1180 can be listed in this leaf-list. In this way the system can 1181 create one alarm instead of several. For example, if an 1182 interface has an alarm, the 'impacted-resource' can 1183 reference the aggregated port channels."; 1184 } 1185 leaf-list root-cause-resource { 1186 type resource; 1187 description 1188 "Resources that are candidates for causing the alarm. If the 1189 system has a mechanism to understand the candidate root 1190 causes of an alarm, this leaf-list can be used to list the 1191 root cause candidate resources. In this way the system can 1192 create one alarm instead of several. For example, if a 1193 logging system (the alarm resource) fails, the alarm can 1194 reference the file-system in the 'root-cause-resource' 1195 leaf-list. Note that the intended use is not to also send 1196 an alarm with the root-cause-resource as alarming resource. 1197 The root-cause-resource leaf-list is a hint and should not 1198 also generate an alarm for the same problem."; 1199 } 1200 } 1202 grouping alarm-state-change-parameters { 1203 description 1204 "Parameters for an alarm state change. 1206 This grouping is used both in the alarm list's 1207 status-change list and in the notification representing an 1208 alarm state change."; 1209 leaf time { 1210 type yang:date-and-time; 1211 mandatory true; 1212 description 1213 "The time the status of the alarm changed. The value 1214 represents the time the real alarm state change appeared 1215 in the resource and not when it was added to the 1216 alarm list. The /alarm-list/alarm/last-changed MUST be 1217 set to the same value."; 1218 } 1219 leaf perceived-severity { 1220 type severity-with-clear; 1221 mandatory true; 1222 description 1223 "The severity of the alarm as defined by X.733.
Note 1224 that this may not be the original severity since the alarm 1225 may have changed severity."; 1226 reference 1227 "ITU Recommendation X.733: Information Technology 1228 - Open Systems Interconnection 1229 - System Management: Alarm Reporting Function"; 1230 } 1231 leaf alarm-text { 1232 type alarm-text; 1233 mandatory true; 1234 description 1235 "A user-friendly text describing the alarm state change."; 1236 reference 1237 "ITU Recommendation X.733: Information Technology 1238 - Open Systems Interconnection 1239 - System Management: Alarm Reporting Function"; 1240 } 1241 } 1243 grouping operator-parameters { 1244 description 1245 "This grouping defines parameters that can be changed by an 1246 operator."; 1247 leaf time { 1248 type yang:date-and-time; 1249 mandatory true; 1250 description 1251 "Timestamp for operator action on alarm."; 1252 } 1253 leaf operator { 1254 type string; 1255 mandatory true; 1256 description 1257 "The name of the operator that has acted on this 1258 alarm."; 1259 } 1260 leaf state { 1261 type operator-state; 1262 mandatory true; 1263 description 1264 "The operator's view of the alarm state."; 1265 } 1266 leaf text { 1267 type string; 1268 description 1269 "Additional optional textual information provided by 1270 the operator."; 1272 } 1273 } 1275 grouping resource-alarm-parameters { 1276 description 1277 "Alarm parameters that originate from the resource view."; 1278 leaf is-cleared { 1279 type boolean; 1280 mandatory true; 1281 description 1282 "Indicates the current clearance state of the alarm. An 1283 alarm might toggle from active alarm to cleared alarm and 1284 back to active again."; 1285 } 1286 leaf last-changed { 1287 type yang:date-and-time; 1288 mandatory true; 1289 description 1290 "A timestamp when the alarm status was last changed.
Status 1291 changes are changes to 'is-cleared', 'perceived-severity', 1292 and 'alarm-text'."; 1293 } 1294 leaf perceived-severity { 1295 type severity; 1296 mandatory true; 1297 description 1298 "The last severity of the alarm. 1300 If an alarm was raised with severity 'warning', but later 1301 changed to 'major', this leaf will show 'major'."; 1302 } 1303 leaf alarm-text { 1304 type alarm-text; 1305 mandatory true; 1306 description 1307 "The last reported alarm text. This text should contain 1308 information for an operator to be able to understand 1309 the problem and how to resolve it."; 1310 } 1311 list status-change { 1312 if-feature "alarm-history"; 1313 key "time"; 1314 min-elements 1; 1315 description 1316 "A list of status change events for this alarm. 1318 The entry with the latest time-stamp in this list MUST 1319 correspond to the leafs 'is-cleared', 'perceived-severity' 1320 and 'alarm-text' for the alarm. The time-stamp for that 1321 entry MUST be equal to the 'last-changed' leaf. 1323 This list is ordered according to the timestamps of 1324 alarm state changes. The last item corresponds to the 1325 latest state change. 1327 The following state changes create an entry in this 1328 list: 1329 - changed severity (warning, minor, major, critical) 1330 - clearance status, this also updates the 'is-cleared' 1331 leaf 1332 - alarm text update"; 1333 uses alarm-state-change-parameters; 1334 } 1335 } 1337 /* 1338 * The /alarms data tree 1339 */ 1341 container alarms { 1342 description 1343 "The top container for this module."; 1344 container control { 1345 description 1346 "Configuration to control the alarm behaviour."; 1347 leaf max-alarm-status-changes { 1348 type union { 1349 type uint16; 1350 type enumeration { 1351 enum infinite { 1352 description 1353 "The status change entries are accumulated 1354 infinitely."; 1355 } 1356 } 1357 } 1358 default "32"; 1359 description 1360 "The status-change entries are kept in a circular list 1361 per alarm.
When this number is exceeded, the oldest 1362 status change entry is automatically removed. If the 1363 value is 'infinite', the status change entries are 1364 accumulated infinitely."; 1365 } 1366 choice notify-status-changes { 1367 description 1368 "This choice controls the notifications sent for alarm status 1369 updates. There are three options: 1370 1. notifications are sent for all updates, severity level 1371 changes and alarm text changes 1372 2. notifications are only sent for alarm raise and clear 1373 3. notifications are sent for status changes equal to or 1374 above the specified severity level. Clear notifications 1375 shall always be sent. 1376 Notifications shall also be sent for state changes that 1377 make an alarm less severe than the specified level. 1378 In option 3, assuming the severity level is set to major, 1379 and that the alarm has the following state changes 1380 [(Time, severity, clear)]: 1381 [(T1, major, -), (T2, minor, -), (T3, warning, -), 1382 (T4, minor, -), (T5, major, -), (T6, critical, -), 1383 (T7, major, -), (T8, major, clear)] 1384 In that case, notifications will be sent at 1385 T1, T2, T5, T6, T7 and T8."; 1386 leaf notify-all-state-changes { 1387 type empty; 1388 description 1389 "Send notifications for all status changes."; 1390 } 1391 leaf notify-raise-and-clear { 1392 type empty; 1393 description 1394 "Send notifications only for raise, clear, and re-raise. 1395 Notifications for severity level changes or alarm text 1396 changes are not sent."; 1397 } 1398 leaf notify-severity-level { 1399 type severity; 1400 description 1401 "Only send notifications for alarm state changes 1402 crossing the specified level. Always send clear 1403 notifications."; 1404 } 1405 } 1406 container alarm-shelving { 1407 if-feature "alarm-shelving"; 1408 description 1409 "The alarm-shelving/shelf list is used to shelve 1410 (block/filter) alarms.
The server will move any alarms 1411 corresponding to the shelving criteria from the 1412 alarms/alarm-list/alarm list to the 1413 alarms/shelved-alarms/shelved-alarm list. It will also 1414 stop sending notifications for the shelved alarms. The 1415 conditions in the shelf criteria are logically ANDed. 1417 When the shelving criteria are deleted or changed, the 1418 non-matching alarms MUST appear in the 1419 alarms/alarm-list/alarm list according to the real state. 1420 This means that the instrumentation MUST maintain states 1421 for the shelved alarms. Alarms that match the criteria 1422 shall have an operator-state 'shelved'. When a change in the 1423 shelf configuration removes an alarm from the shelf, the 1424 server shall add the operator state 'un-shelved'."; 1425 list shelf { 1426 key "name"; 1427 leaf name { 1428 type string; 1429 description 1430 "An arbitrary name for the alarm shelf."; 1431 } 1432 description 1433 "Each entry defines the criteria for shelving alarms. 1434 Criteria are ANDed. If no criteria are specified, 1435 all alarms will be shelved."; 1436 leaf-list resource { 1437 type resource-match; 1438 description 1439 "Shelve alarms for matching resources."; 1440 } 1441 leaf alarm-type-id { 1442 type alarm-type-id; 1443 description 1444 "Shelve all alarms that have an alarm-type-id that is 1445 equal to or derived from the given alarm-type-id."; 1446 } 1447 leaf alarm-type-qualifier-match { 1448 type string; 1449 description 1450 "A W3C regular expression that is used to match 1451 an alarm type qualifier. Shelve all alarms whose 1452 alarm type qualifier matches this regular 1453 expression."; 1454 } 1455 leaf description { 1456 type string; 1457 description 1458 "An optional textual description of the shelf.
This 1459 description should include the reason for shelving 1460 these alarms."; 1461 } 1462 } 1463 } 1464 } 1465 container alarm-inventory { 1466 config false; 1467 description 1468 "This alarm-inventory/alarm-type list contains all possible 1469 alarm types for the system. 1470 If the system knows for which resources a specific alarm 1471 type can appear, this is also identified in the inventory. 1472 The list also tells if each alarm type has a corresponding 1473 clear state. The inventory shall only contain concrete 1474 alarm types. 1476 The alarm inventory MUST be updated by the system when new 1477 alarms can appear. This can be the case when installing new 1478 software modules or inserting new card types. A 1479 notification 'alarm-inventory-changed' is sent when the 1480 inventory is changed."; 1481 list alarm-type { 1482 key "alarm-type-id alarm-type-qualifier"; 1483 description 1484 "An entry in this list defines a possible alarm."; 1485 leaf alarm-type-id { 1486 type alarm-type-id; 1487 description 1488 "The statically defined alarm type identifier for this 1489 possible alarm."; 1490 } 1491 leaf alarm-type-qualifier { 1492 type alarm-type-qualifier; 1493 description 1494 "The optionally dynamically defined alarm type identifier 1495 for this possible alarm."; 1496 } 1497 leaf-list resource { 1498 type resource-match; 1499 description 1500 "Optionally, specifies for which resources the alarm type 1501 is valid."; 1502 } 1503 leaf has-clear { 1504 type boolean; 1505 mandatory true; 1506 description 1507 "This leaf tells the operator if the alarm will be 1508 cleared when the correct corrective action has been 1509 taken. Implementations SHOULD strive for detecting the 1510 cleared state for all alarm types. If this leaf is 1511 true, the operator can monitor the alarm until it 1512 becomes cleared after the corrective action has been 1513 taken. 
If this leaf is false, the operator needs to 1514 validate that the alarm is no longer active using other 1515 mechanisms. Alarms can lack a corresponding clear due 1516 to missing instrumentation or because there is no logical 1517 corresponding clear state."; 1518 } 1519 leaf-list severity-levels { 1520 type severity; 1521 description 1522 "This leaf-list indicates the possible severity levels of 1523 this alarm type. Note well that 'clear' is not part of 1524 the severity type. In general, the severity level should 1525 be defined by the instrumentation based on dynamic state 1526 and not defined statically by the alarm type in order to 1527 provide a relevant severity level based on dynamic state 1528 and context. However, most alarm types have a defined set 1529 of possible severity levels and this should be provided 1530 here."; 1531 } 1532 leaf description { 1533 type string; 1534 mandatory true; 1535 description 1536 "A description of the possible alarm. It SHOULD include 1537 information on possible underlying root causes and 1538 corrective actions."; 1539 } 1540 } 1541 } 1542 container summary { 1543 if-feature "alarm-summary"; 1544 config false; 1545 description 1546 "This container gives a summary of the number of alarms."; 1547 list alarm-summary { 1548 key "severity"; 1549 description 1550 "A global summary of all alarms in the system.
The summary 1551 does not include shelved alarms."; 1552 leaf severity { 1553 type severity; 1554 description 1555 "Alarm summary for this severity level."; 1556 } 1557 leaf total { 1558 type yang:gauge32; 1559 description 1560 "Total number of alarms of this severity level."; 1562 } 1563 leaf cleared { 1564 type yang:gauge32; 1565 description 1566 "For this severity level, the number of alarms that are 1567 cleared."; 1568 } 1569 leaf cleared-not-closed { 1570 if-feature "operator-actions"; 1571 type yang:gauge32; 1572 description 1573 "For this severity level, the number of alarms that are 1574 cleared but not closed."; 1575 } 1576 leaf cleared-closed { 1577 if-feature "operator-actions"; 1578 type yang:gauge32; 1579 description 1580 "For this severity level, the number of alarms that are 1581 cleared and closed."; 1582 } 1583 leaf not-cleared-closed { 1584 if-feature "operator-actions"; 1585 type yang:gauge32; 1586 description 1587 "For this severity level, the number of alarms that are 1588 not cleared but closed."; 1589 } 1590 leaf not-cleared-not-closed { 1591 if-feature "operator-actions"; 1592 type yang:gauge32; 1593 description 1594 "For this severity level, the number of alarms that are 1595 not cleared and not closed."; 1596 } 1597 } 1598 leaf shelves-active { 1599 if-feature "alarm-shelving"; 1600 type empty; 1601 description 1602 "This is a hint to the operator that there are active 1603 alarm shelves. This leaf MUST exist if the 1604 alarms/shelved-alarms/number-of-shelved-alarms is > 0."; 1605 } 1606 } 1607 container alarm-list { 1608 config false; 1609 description 1610 "The alarms in the system."; 1611 leaf number-of-alarms { 1612 type yang:gauge32; 1613 description 1614 "This object shows the total number of 1615 alarms in the system, i.e., the total number 1616 of entries in the alarm list."; 1617 } 1618 leaf last-changed { 1619 type yang:date-and-time; 1620 description 1621 "A timestamp when the alarm list was last 1622 changed. 
The value can be used by a manager to 1623 initiate an alarm resynchronization procedure."; 1624 } 1625 list alarm { 1626 key "resource alarm-type-id alarm-type-qualifier"; 1627 description 1628 "The list of alarms. Each entry in the list holds one 1629 alarm for a given alarm type and resource. 1630 An alarm can be updated from the underlying resource or 1631 by the user. The following leafs are maintained by the 1632 resource: is-cleared, last-changed, perceived-severity, 1633 and alarm-text. An operator can change: operator-state 1634 and operator-text. 1636 Entries appear in the alarm list the first time an 1637 alarm becomes active for a given alarm-type and resource. 1638 Entries do not get deleted when the alarm is cleared; the 1639 cleared status is a boolean state in the alarm. 1641 Alarm entries are removed (purged) from the list by an 1642 explicit purge action. For example, purge all alarms 1643 that are cleared and in closed operator-state that are 1644 older than 24 hours. Systems may also remove alarms based 1645 on locally configured policies, which is out of scope for 1646 this module."; 1647 uses common-alarm-parameters; 1648 leaf time-created { 1649 type yang:date-and-time; 1650 mandatory true; 1651 description 1652 "The time-stamp when this alarm entry was created. This 1653 represents the first time the alarm appeared; it can 1654 also represent that the alarm re-appeared after a purge. 1655 Further state changes of the same alarm do not change 1656 this leaf; these changes will update the 'last-changed' 1657 leaf."; 1659 } 1660 uses resource-alarm-parameters; 1661 list operator-state-change { 1662 if-feature "operator-actions"; 1663 key "time"; 1664 description 1665 "This list is used by operators to indicate 1666 the state of human intervention on an alarm.
1667 For example, if an operator has seen an alarm, 1668 the operator can add a new item to this list indicating 1669 that the alarm is acknowledged."; 1670 uses operator-parameters; 1671 } 1672 action set-operator-state { 1673 if-feature "operator-actions"; 1674 description 1675 "This is a means for the operator to indicate 1676 the level of human intervention on an alarm."; 1677 input { 1678 leaf state { 1679 type writable-operator-state; 1680 mandatory true; 1681 description 1682 "Set this operator state."; 1683 } 1684 leaf text { 1685 type string; 1686 description 1687 "Additional optional textual information."; 1688 } 1689 } 1690 } 1691 notification operator-action { 1692 if-feature "operator-actions"; 1693 description 1694 "This notification is used to report that an operator 1695 acted upon an alarm."; 1696 uses operator-parameters; 1697 } 1698 } 1699 } 1700 container shelved-alarms { 1701 if-feature "alarm-shelving"; 1702 config false; 1703 description 1704 "The shelved alarms. Alarms appear here if they match the 1705 criteria in /alarms/control/alarm-shelving. This list does 1706 not generate any notifications. The list represents alarms 1707 that are considered not relevant by the operator. Alarms in 1708 this list have an operator-state of 'shelved'. This cannot 1709 be changed."; 1710 leaf number-of-shelved-alarms { 1711 type yang:gauge32; 1712 description 1713 "This object shows the total number of currently shelved 1714 alarms, i.e., the total number of entries 1715 in the shelved alarm list."; 1716 } 1717 leaf alarm-shelf-last-changed { 1718 type yang:date-and-time; 1719 description 1720 "A timestamp when the shelved alarm list was last 1721 changed. The value can be used by a manager to 1722 initiate an alarm resynchronization procedure."; 1723 } 1724 list shelved-alarm { 1725 key "resource alarm-type-id alarm-type-qualifier"; 1726 description 1727 "The list of shelved alarms.
Shelved alarms 1728 can only be updated from the underlying resource, 1729 no operator actions are supported."; 1730 uses common-alarm-parameters; 1731 leaf shelf-name { 1732 type leafref { 1733 path "/alarms/control/alarm-shelving/shelf/name"; 1734 require-instance false; 1735 } 1736 description 1737 "The name of the shelf."; 1738 } 1739 uses resource-alarm-parameters; 1740 list operator-state-change { 1741 if-feature "operator-actions"; 1742 key "time"; 1743 description 1744 "This list is used by operators to indicate 1745 the state of human intervention on an alarm. 1746 For shelved alarms, the system has set the list 1747 item in the list to 'shelved'."; 1748 uses operator-parameters; 1749 } 1750 } 1751 } 1752 list alarm-profile { 1753 if-feature "alarm-profile"; 1754 key "alarm-type-id alarm-type-qualifier-match resource"; 1755 ordered-by user; 1756 description 1757 "This list is used to assign further information or 1758 configuration for each alarm type. This module supports 1759 a mechanism where the client can override the system 1760 default alarm severity levels. 
The alarm-profile is 1761 also a useful augmentation point for specific additions 1762 to alarm types."; 1763 leaf alarm-type-id { 1764 type al:alarm-type-id; 1765 description 1766 "The alarm type identifier to match."; 1767 } 1768 leaf alarm-type-qualifier-match { 1769 type string; 1770 description 1771 "A W3C regular expression that is used to 1772 match."; 1773 } 1774 leaf resource { 1775 type al:resource-match; 1776 description 1777 "Specifies which resources to match."; 1778 } 1779 leaf description { 1780 type string; 1781 mandatory true; 1782 description 1783 "A description of the alarm profile."; 1784 } 1785 container alarm-severity-assignment-profile { 1786 if-feature "severity-assignment"; 1787 description 1788 "The client can override the system default 1789 severity level."; 1790 reference 1791 "ITU M.3100, ITU M.3160 1792 - Generic Network Information Model, 1793 Alarm Severity Assignment Profile"; 1794 leaf-list severity-levels { 1795 type al:severity; 1796 ordered-by user; 1797 description 1798 "Specifies the configured severity level(s) for the 1799 matching alarm. If the alarm has several severity 1800 levels the leaf-list shall be given in rising severity 1801 order. The original M3100/M3160 ASAP function only 1802 allows for a one-to-one mapping between alarm type and 1803 severity but since the IETF alarm module supports 1804 stateful alarms the mapping must allow for several 1805 severity levels. 1807 Assume a high-utilisation alarm type with two 1808 thresholds with the system default severity levels of 1809 threshold1 = warning and threshold2 = minor. 
Setting 1810 this leaf-list to (minor, major) will assign the 1811 severity levels threshold1 = minor and 1812 threshold2 = major"; 1813 } 1814 } 1815 } 1816 } 1818 /* 1819 * Operations 1820 */ 1822 rpc compress-alarms { 1823 if-feature "alarm-history"; 1824 description 1825 "This operation requests the server to compress entries in the 1826 alarm list by removing all but the latest state change for all 1827 alarms. Conditions in the input are logically ANDed. If no 1828 input condition is given, all alarms are compressed."; 1829 input { 1830 leaf resource { 1831 type leafref { 1832 path "/alarms/alarm-list/alarm/resource"; 1833 require-instance false; 1834 } 1835 description 1836 "Compress the alarms with this resource."; 1837 } 1838 leaf alarm-type-id { 1839 type leafref { 1840 path "/alarms/alarm-list/alarm/alarm-type-id"; 1841 require-instance false; 1842 } 1843 description 1844 "Compress alarms with this alarm-type-id."; 1845 } 1846 leaf alarm-type-qualifier { 1847 type leafref { 1848 path "/alarms/alarm-list/alarm/alarm-type-qualifier"; 1849 require-instance false; 1850 } 1851 description 1852 "Compress the alarms with this alarm-type-qualifier."; 1853 } 1854 } 1855 output { 1856 leaf compressed-alarms { 1857 type uint32; 1858 description 1859 "Number of compressed alarm entries."; 1860 } 1861 } 1862 } 1863 rpc compress-shelved-alarms { 1864 if-feature "alarm-history and alarm-shelving"; 1865 description 1866 "This operation requests the server to compress entries in the 1867 shelved alarm list by removing all but the latest state change 1868 for all alarms. Conditions in the input are logically ANDed. 
1869 If no input condition is given, all alarms are compressed."; 1870 input { 1871 leaf resource { 1872 type leafref { 1873 path "/alarms/shelved-alarms/shelved-alarm/resource"; 1874 require-instance false; 1875 } 1876 description 1877 "Compress the alarms with this resource."; 1878 } 1879 leaf alarm-type-id { 1880 type leafref { 1881 path "/alarms/shelved-alarms/shelved-alarm/alarm-type-id"; 1882 require-instance false; 1883 } 1884 description 1885 "Compress alarms with this alarm-type-id."; 1886 } 1887 leaf alarm-type-qualifier { 1888 type leafref { 1889 path "/alarms/shelved-alarms/shelved-alarm" 1890 + "/alarm-type-qualifier"; 1891 require-instance false; 1892 } 1893 description 1894 "Compress the alarms with this alarm-type-qualifier."; 1895 } 1896 } 1897 output { 1898 leaf compressed-alarms { 1899 type uint32; 1900 description 1901 "Number of compressed alarm entries."; 1902 } 1903 } 1904 } 1906 grouping filter-input { 1907 description 1908 "Grouping to specify a filter construct on alarm information."; 1909 leaf alarm-status { 1910 type enumeration { 1911 enum any { 1912 description 1913 "Ignore alarm clearance status."; 1914 } 1915 enum cleared { 1916 description 1917 "Filter cleared alarms."; 1918 } 1919 enum not-cleared { 1920 description 1921 "Filter not cleared alarms."; 1922 } 1923 } 1924 mandatory true; 1925 description 1926 "The clearance status of the alarm."; 1927 } 1928 container older-than { 1929 presence "Age specification"; 1930 description 1931 "Matches the 'last-status-change' leaf in the alarm."; 1932 choice age-spec { 1933 description 1934 "Filter using date and time age."; 1935 case seconds { 1936 leaf seconds { 1937 type uint16; 1938 description 1939 "Seconds part"; 1940 } 1941 } 1942 case minutes { 1943 leaf minutes { 1944 type uint16; 1945 description 1946 "Minute part"; 1948 } 1949 } 1950 case hours { 1951 leaf hours { 1952 type uint16; 1953 description 1954 "Hours part."; 1955 } 1956 } 1957 case days { 1958 leaf days { 1959 type 
uint16; 1960 description 1961 "Day part"; 1962 } 1963 } 1964 case weeks { 1965 leaf weeks { 1966 type uint16; 1967 description 1968 "Week part"; 1969 } 1970 } 1971 } 1972 } 1973 container severity { 1974 presence "Severity filter"; 1975 choice sev-spec { 1976 description 1977 "Filter based on severity level."; 1978 leaf below { 1979 type severity; 1980 description 1981 "Severity less than this leaf."; 1982 } 1983 leaf is { 1984 type severity; 1985 description 1986 "Severity level equal this leaf."; 1987 } 1988 leaf above { 1989 type severity; 1990 description 1991 "Severity level higher than this leaf."; 1992 } 1993 } 1994 description 1995 "Filter based on severity."; 1997 } 1998 container operator-state-filter { 1999 if-feature "operator-actions"; 2000 presence "Operator state filter"; 2001 leaf state { 2002 type operator-state; 2003 description 2004 "Filter on operator state."; 2005 } 2006 leaf user { 2007 type string; 2008 description 2009 "Filter based on which operator."; 2010 } 2011 description 2012 "Filter based on operator state."; 2013 } 2014 } 2016 rpc purge-alarms { 2017 description 2018 "This operation requests the server to delete entries from the 2019 alarm list or the shelved alarms list according to the 2020 supplied criteria. To purge alarms in the shelved alarms, 2021 set the operator-state filter input to 'shelved'. 2022 Typically it can be used to delete alarms that are 2023 in closed operator state and older than a specified time. 2024 In the shelved alarm list it makes sense to delete alarms that 2025 are not relevant anymore. 2026 The number of purged alarms is returned as an output 2027 parameter."; 2028 input { 2029 uses filter-input; 2030 } 2031 output { 2032 leaf purged-alarms { 2033 type uint32; 2034 description 2035 "Number of purged alarms."; 2036 } 2037 } 2038 } 2040 /* 2041 * Notifications 2042 */ 2044 notification alarm-notification { 2045 description 2046 "This notification is used to report a state change for an 2047 alarm. 
The same notification is used for reporting a newly 2048 raised alarm, a cleared alarm, or a change in the text and/or 2049 severity of an existing alarm."; 2050 uses common-alarm-parameters; 2051 uses alarm-state-change-parameters; 2052 } 2053 notification alarm-inventory-changed { 2054 description 2055 "This notification is used to report that the list of possible 2056 alarms has changed. This can happen, for example, when a new 2057 software module is installed, or a new physical card is 2058 inserted."; 2059 } 2060 } 2062 2064 6. X.733 Extensions 2066 Many alarm systems are based on the X.733 [X.733] and X.736 [X.736] 2067 alarm standards. This module augments the alarm inventory, the alarm 2068 lists and the alarm notification with X.733 and X.736 parameters. 2070 The module also supports a feature whereby the alarm manager can 2071 configure the mapping from alarm types to X.733 event-type and 2072 probable-cause parameters. This might be needed when the default 2073 mapping provided by the system is in conflict with other management 2074 systems or not considered correct. 2076 Note that the IETF Alarm Module term 'resource' is synonymous with the 2077 ITU term 'managed object'. 2079 7. The X.733 Mapping Module 2081 This YANG module references [X.733] and [X.736]. 2083 file "ietf-alarms-x733@2018-08-08.yang" 2084 module ietf-alarms-x733 { 2085 yang-version 1.1; 2086 namespace "urn:ietf:params:xml:ns:yang:ietf-alarms-x733"; 2087 prefix x733; 2089 import ietf-alarms { 2090 prefix al; 2091 } 2092 import ietf-yang-types { 2093 prefix yang; 2094 reference "RFC 6991: Common YANG Data Types"; 2095 } 2097 organization 2098 "IETF CCAMP Working Group"; 2099 contact 2100 "WG Web: 2101 WG List: 2103 Editor: Stefan Vallin 2104 2106 Editor: Martin Bjorklund 2107 "; 2108 description 2109 "This module augments the ietf-alarms module with X.733 alarm 2110 parameters. 
2112 The following structures are augmented with X.733 event type 2113 and probable cause: 2115 1) alarms/alarm-inventory: all possible alarm types 2116 2) alarms/alarm-list: every alarm in the system 2117 3) alarm-notification: notifications indicating alarm state 2118 changes 2120 The module also optionally allows the alarm management system 2121 to configure the mapping from the IETF Alarm module alarm keys 2122 to the ITU tuple (event-type, probable-cause). 2124 The mapping does not include a corresponding X.733 specific 2125 problem value. The recommendation is to use the 2126 'alarm-type-qualifier' leaf which serves the same purpose. 2128 The module uses an integer and a corresponding string for 2129 probable cause instead of a globally defined enumeration, in 2130 order to be able to manage conflicting enumeration definitions. 2131 A single globally defined enumeration is challenging to 2132 maintain."; 2133 reference 2134 "ITU Recommendation X.733: Information Technology 2135 - Open Systems Interconnection 2136 - System Management: Alarm Reporting Function"; 2138 revision 2018-08-08 { 2139 description 2140 "Initial revision."; 2142 reference "RFC XXXX: YANG Alarm Module"; 2143 } 2145 /* 2146 * Features 2147 */ 2149 feature configure-x733-mapping { 2150 description 2151 "The system supports configurable X733 mapping from 2152 the IETF alarm module alarm-type to X733 event-type 2153 and probable-cause."; 2154 } 2156 /* 2157 * Typedefs 2158 */ 2160 typedef event-type { 2161 type enumeration { 2162 enum other { 2163 value 1; 2164 description 2165 "None of the below."; 2166 } 2167 enum communications-alarm { 2168 value 2; 2169 description 2170 "An alarm of this type is principally associated with the 2171 procedures and/or processes required to convey 2172 information from one point to another."; 2173 } 2174 enum quality-of-service-alarm { 2175 value 3; 2176 description 2177 "An alarm of this type is principally associated with a 2178 degradation in the quality 
of a service."; 2179 } 2180 enum processing-error-alarm { 2181 value 4; 2182 description 2183 "An alarm of this type is principally associated with a 2184 software or processing fault."; 2185 } 2186 enum equipment-alarm { 2187 value 5; 2188 description 2189 "An alarm of this type is principally associated with an 2190 equipment fault."; 2191 } 2192 enum environmental-alarm { 2193 value 6; 2194 description 2195 "An alarm of this type is principally associated with a 2196 condition relating to an enclosure in which the equipment 2197 resides."; 2198 } 2199 enum integrity-violation { 2200 value 7; 2201 description 2202 "An indication that information may have been illegally 2203 modified, inserted or deleted."; 2204 } 2205 enum operational-violation { 2206 value 8; 2207 description 2208 "An indication that the provision of the requested service 2209 was not possible due to the unavailability, malfunction or 2210 incorrect invocation of the service."; 2211 } 2212 enum physical-violation { 2213 value 9; 2214 description 2215 "An indication that a physical resource has been violated 2216 in a way that suggests a security attack."; 2217 } 2218 enum security-service-or-mechanism-violation { 2219 value 10; 2220 description 2221 "An indication that a security attack has been detected by 2222 a security service or mechanism."; 2223 } 2224 enum time-domain-violation { 2225 value 11; 2226 description 2227 "An indication that an event has occurred at an unexpected 2228 or prohibited time."; 2229 } 2230 } 2231 description 2232 "The event types as defined by X.733 and X.736."; 2233 reference 2234 "ITU Recommendation X.733: Information Technology 2235 - Open Systems Interconnection 2236 - System Management: Alarm Reporting Function 2237 ITU Recommendation X.736: Information Technology 2238 - Open Systems Interconnection 2239 - System Management: Security Alarm Reporting Function"; 2240 } 2242 typedef trend { 2243 type enumeration { 2244 enum less-severe { 2245 description 2246 
"There is at least one outstanding alarm of a 2247 severity higher (more severe) than that in the 2248 current alarm."; 2249 } 2250 enum no-change { 2251 description 2252 "The Perceived severity reported in the current 2253 alarm is the same as the highest (most severe) 2254 of any of the outstanding alarms"; 2255 } 2256 enum more-severe { 2257 description 2258 "The Perceived severity in the current alarm is 2259 higher (more severe) than that reported in any 2260 of the outstanding alarms."; 2261 } 2262 } 2263 description 2264 "This type is used to describe the 2265 severity trend of the alarming resource"; 2266 reference "Module Attribute-ASN1Module (X.721:02/1992)"; 2267 } 2269 typedef value-type { 2270 type union { 2271 type int64; 2272 type uint64; 2273 type decimal64 { 2274 fraction-digits 2; 2275 } 2276 } 2277 description 2278 "A generic union type to match ITU choice of integer 2279 and real."; 2280 } 2282 /* 2283 * Groupings 2284 */ 2286 grouping x733-alarm-parameters { 2287 description 2288 "Common X.733 parameters for alarms."; 2289 leaf event-type { 2290 type event-type; 2291 description 2292 "The X.733/X.736 event type for this alarm."; 2293 } 2294 leaf probable-cause { 2295 type uint32; 2296 description 2297 "The X.733 probable cause for this alarm."; 2298 } 2299 leaf probable-cause-string { 2300 type string; 2301 description 2302 "The user friendly string matching 2303 the probable cause integer value. The string 2304 SHOULD match the X.733 enumeration. For example, 2305 value 27 is 'localNodeTransmissionError'."; 2306 } 2307 container threshold-information { 2308 description 2309 "This parameter shall be present when the alarm 2310 is a result of crossing a threshold. 
"; 2311 leaf triggered-threshold { 2312 type string; 2313 description 2314 "The identifier of the threshold attribute that 2315 caused the notification."; 2316 } 2317 leaf observed-value { 2318 type value-type; 2319 description 2320 "The value of the gauge or counter which crossed 2321 the threshold. This may be different from the 2322 threshold value if, for example, the gauge may 2323 only take on discrete values."; 2324 } 2325 choice threshold-level { 2326 description 2327 "In the case of a gauge the threshold level specifies 2328 a pair of threshold values, the first being the value 2329 of the crossed threshold and the second, its corresponding 2330 hysteresis; in the case of a counter the threshold level 2331 specifies only the threshold value."; 2332 case up { 2333 leaf up-high { 2334 type value-type; 2335 description 2336 "The going up threshold for rising the alarm."; 2337 } 2338 leaf up-low { 2339 type value-type; 2340 description 2341 "The threshold level for clearing the alarm. 2342 This is used for hysteresis functions for gauges."; 2343 } 2344 } 2345 case down { 2346 leaf down-low { 2347 type value-type; 2348 description 2349 "The going down threshold for rising the alarm."; 2350 } 2351 leaf down-high { 2352 type value-type; 2353 description 2354 "The threshold level for clearing the alarm. 2355 This is used for hysteresis functions for gauges."; 2356 } 2357 } 2358 } 2359 leaf arm-time { 2360 type yang:date-and-time; 2361 description 2362 "For a gauge threshold, the time at which the threshold 2363 was last re-armed, namely the time after the previous 2364 threshold crossing at which the hysteresis value of the 2365 threshold was exceeded thus again permitting generation 2366 of notifications when the threshold is crossed. 
2367 For a counter threshold, the later of the time at which 2368 the threshold offset was last applied, or the time at 2369 which the counter was last initialized (for resettable 2370 counters)."; 2371 } 2372 } 2373 list monitored-attributes { 2374 uses attribute; 2375 key "id"; 2376 description 2377 "The Monitored attributes parameter, when present, defines 2378 one or more attributes of the resource and their 2379 corresponding values at the time of the alarm."; 2380 } 2381 leaf-list proposed-repair-actions { 2382 type string; 2383 description 2384 "This parameter, when present, is used if the cause is 2385 known and the system being managed can suggest one or 2386 more solutions (such as switch in standby equipment, 2387 retry, replace media)."; 2388 } 2389 leaf trend-indication { 2390 type trend; 2391 description 2392 "This parameter specifies the current 2393 severity trend of the resource. If present it 2394 indicates that there are one or more alarms 2395 ('outstanding alarms') which have not been cleared, 2396 and pertain to the same resource as that to which 2397 this alarm ('current alarm') pertains. 2398 The possible values are: 2400 more-severe: The Perceived severity in the current 2401 alarm is higher (more severe) than that reported in 2402 any of the outstanding alarms. 2404 no-change: The Perceived severity reported in the 2405 current alarm is the same as the highest (most severe) 2406 of any of the outstanding alarms. 2408 less-severe: There is at least one outstanding alarm 2409 of a severity higher (more severe) than that in the 2410 current alarm."; 2411 } 2412 leaf backedup-status { 2413 type boolean; 2414 description 2415 "This parameter, when present, specifies whether or not 2416 the object emitting the alarm has been backed-up, and 2417 services provided to the user have, therefore, not been 2418 disrupted. 
The use of this field in conjunction with the 2419 severity field provides information in an independent form 2420 to qualify the seriousness of the alarm and the ability of 2421 the system as a whole to continue to provide services. 2422 If the value of this parameter is true, it indicates that 2423 the object emitting the alarm has been backed-up; if false, 2424 the object has not been backed-up."; 2425 } 2426 leaf backup-object { 2427 type al:resource; 2428 description 2429 "This parameter shall be present when the Backed-up status 2430 parameter is present and has the value true. This parameter 2431 specifies the managed object instance that is providing 2432 back-up services for the managed object about which the 2433 notification pertains. This parameter is useful, 2434 for example, when the back-up object is from a pool of 2435 objects any of which may be dynamically allocated to 2436 replace a faulty object."; 2437 } 2438 list additional-information { 2439 key "identifier"; 2440 description 2441 "This parameter allows the inclusion of a 2442 set of additional information in the alarm. 
It is 2443 a series of data structures each of which contains three 2444 items of information: an identifier, a significance 2445 indicator, and the problem information."; 2446 leaf identifier { 2447 type string; 2448 description 2449 "Identifies the data-type of the information parameter."; 2450 } 2451 leaf significant { 2452 type boolean; 2453 description 2454 "Set to true if the receiving system must be able to 2455 parse the contents of the information subparameter 2456 for the event report to be fully understood."; 2457 } 2458 leaf information { 2459 type string; 2460 description 2461 "Additional information about the alarm."; 2462 } 2463 } 2464 leaf security-alarm-detector { 2465 type al:resource; 2466 description 2467 "This parameter identifies the detector of the security 2468 alarm."; 2469 } 2470 leaf service-user { 2471 type al:resource; 2472 description 2473 "This parameter identifies the service-user whose request 2474 for service led to the generation of the security alarm."; 2475 } 2476 leaf service-provider { 2477 type al:resource; 2478 description 2479 "This parameter identifies the intended service-provider 2480 of the service that led to the generation of the security 2481 alarm."; 2482 } 2483 reference 2484 "ITU Recommendation X.733: Information Technology 2485 - Open Systems Interconnection 2486 - System Management: Alarm Reporting Function 2487 ITU Recommendation X.736: Information Technology 2488 - Open Systems Interconnection 2489 - System Management: Security Alarm Reporting Function"; 2490 } 2492 grouping x733-alarm-definition-parameters { 2493 description 2494 "Common X.733 parameters for alarm definitions. 
2495 This grouping is used to define those alarm 2496 attributes that can be mapped from the alarm-type 2497 mechanism in the ietf-alarm module."; 2498 leaf event-type { 2499 type event-type; 2500 description 2501 "The alarm type has this X.733/X.736 event type."; 2502 } 2503 leaf probable-cause { 2504 type uint32; 2505 description 2506 "The alarm type has this X.733 probable cause value. 2507 This module defines probable cause as an integer 2508 and not as an enumeration. The reason being that the 2509 primary use of probable cause is in the management 2510 application if it is based on the X.733 standard. 2511 However, most management applications have their own 2512 defined enum definitions and merging enums from 2513 different systems might create conflicts. By using 2514 a configurable uint32 the system can be configured 2515 to match the enum values in the management application."; 2516 } 2517 leaf probable-cause-string { 2518 type string; 2519 description 2520 "This string can be used to give a user friendly string 2521 to the probable cause value."; 2522 } 2523 } 2525 grouping attribute { 2526 description 2527 "A grouping to match the ITU generic reference to 2528 an attribute."; 2529 leaf id { 2530 type al:resource; 2531 description 2532 "The resource representing the attribute."; 2533 } 2534 leaf value { 2535 type string; 2536 description 2537 "The value represented as a string since it could 2538 be of any type."; 2539 } 2540 reference "Module Attribute-ASN1Module (X.721:02/1992)"; 2541 } 2543 /* 2544 * Add X.733 parameters to the alarm definitions, alarms, 2545 * and notification. 2546 */ 2548 augment "/al:alarms/al:alarm-inventory/al:alarm-type" { 2549 description 2550 "Augment X.733 mapping information to the alarm inventory."; 2551 uses x733-alarm-definition-parameters; 2552 } 2554 /* 2555 * Add X.733 configurable mapping. 2556 */ 2558 augment "/al:alarms/al:control" { 2559 description 2560 "Add X.733 mapping capabilities. 
"; 2561 list x733-mapping { 2562 if-feature "configure-x733-mapping"; 2563 key "alarm-type-id alarm-type-qualifier-match"; 2564 description 2565 "This list allows a management application to control the 2566 X.733 mapping for all alarm types in the system. Any entry 2567 in this list will allow the alarm manager to over-ride the 2568 default X.733 mapping in the system and the final mapping 2569 will be shown in the alarm inventory."; 2570 leaf alarm-type-id { 2571 type al:alarm-type-id; 2572 description 2573 "Map the alarm type with this alarm type identifier."; 2575 } 2576 leaf alarm-type-qualifier-match { 2577 type string; 2578 description 2579 "A W3C regular expression that is used when mapping an 2580 alarm type and alarm-type-qualifier to X.733 parameters."; 2581 } 2582 uses x733-alarm-definition-parameters; 2583 } 2584 } 2585 augment "/al:alarms/al:alarm-list/al:alarm" { 2586 description 2587 "Augment X.733 information to the alarm."; 2588 uses x733-alarm-parameters; 2589 } 2590 augment "/al:alarms/al:shelved-alarms/al:shelved-alarm" { 2591 description 2592 "Augment X.733 information to the alarm."; 2593 uses x733-alarm-parameters; 2594 } 2595 augment "/al:alarm-notification" { 2596 description 2597 "Augment X.733 information to the alarm notification."; 2598 uses x733-alarm-parameters; 2599 } 2600 } 2602 2604 8. IANA Considerations 2606 This document registers a URI in the IETF XML registry [RFC3688]. 2607 Following the format in RFC 3688, the following registration is 2608 requested to be made. 2610 URI: urn:ietf:params:xml:ns:yang:ietf-alarms 2612 Registrant Contact: The IESG. 2614 XML: N/A, the requested URI is an XML namespace. 2616 This document registers a YANG module in the YANG Module Names 2617 registry [RFC6020]. 2619 name: ietf-alarms 2620 namespace: urn:ietf:params:xml:ns:yang:ietf-alarms 2621 prefix: al 2622 reference: RFC XXXX 2624 9. 
Security Considerations 2626 The YANG module specified in this document defines a schema for data 2627 that is designed to be accessed via network management protocols such 2628 as NETCONF [RFC6241] or RESTCONF [RFC8040]. The lowest NETCONF layer 2629 is the secure transport layer, and the mandatory-to-implement secure 2630 transport is Secure Shell (SSH) [RFC6242]. The lowest RESTCONF layer 2631 is HTTPS, and the mandatory-to-implement secure transport is TLS 2632 [RFC5246]. 2634 The NETCONF access control model [RFC6536] provides the means to 2635 restrict access for particular NETCONF or RESTCONF users to a 2636 preconfigured subset of all available NETCONF or RESTCONF protocol 2637 operations and content. 2639 There are a number of data nodes defined in this YANG module that are 2640 writable/creatable/deletable (i.e., config true, which is the 2641 default). These data nodes may be considered sensitive or vulnerable 2642 in some network environments. Write operations (e.g., edit-config) 2643 to these data nodes without proper protection can have a negative 2644 effect on network operations. These are the subtrees and data nodes 2645 and their sensitivity/vulnerability: 2647 /alarms/control/notify-status-change: This leaf controls whether an 2648 alarm should notify only raise and clear or all severity level 2649 changes. Unauthorized access to this leaf could have a negative impact 2650 on operational procedures relying on fine-grained alarm state 2651 change reporting. 2653 /alarms/control/alarm-shelving/shelf: This list controls the 2654 shelving (blocking) of alarms. Unauthorized access to this list 2655 could jeopardize the alarm management procedures since shelved 2656 alarms generate no notifications and are not part of the alarm list. 2658 Some of the RPC operations in this YANG module may be considered 2659 sensitive or vulnerable in some network environments. It is thus 2660 important to control access to these operations. 
These are the 2661 operations and their sensitivity/vulnerability: 2663 purge-alarms: This RPC deletes alarms from the alarm list. 2664 Unauthorized use of this RPC could jeopardize the alarm management 2665 procedures since the deleted alarms may be vital for the alarm 2666 management application. 2668 10. Acknowledgements 2670 The authors wish to thank Viktor Leijon and Johan Nordlander for 2671 their valuable input on forming the alarm model. 2673 The authors also wish to thank Nick Hancock, Joey Boyd, Tom Petch and 2674 Balazs Lengyel for their extensive reviews and contributions to this 2675 document. 2677 11. References 2679 11.1. Normative References 2681 [M.3100] International Telecommunications Union, "Generic Network 2682 Information Model", ITU-T Recommendation M.3100, 2005. 2684 [M.3160] International Telecommunications Union, "Generic, 2685 protocol-neutral management information model", ITU-T 2686 Recommendation M.3160, 2008. 2688 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 2689 Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/ 2690 RFC2119, March 1997, 2691 . 2693 [RFC3688] Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688, 2694 DOI 10.17487/RFC3688, January 2004, 2695 . 2697 [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security 2698 (TLS) Protocol Version 1.2", RFC 5246, DOI 10.17487/ 2699 RFC5246, August 2008, . 2702 [RFC6020] Bjorklund, M., Ed., "YANG - A Data Modeling Language for 2703 the Network Configuration Protocol (NETCONF)", RFC 6020, 2704 DOI 10.17487/RFC6020, October 2010, 2705 . 2707 [RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., 2708 and A. Bierman, Ed., "Network Configuration Protocol 2709 (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011, 2710 . 2712 [RFC6242] Wasserman, M., "Using the NETCONF Protocol over Secure 2713 Shell (SSH)", RFC 6242, DOI 10.17487/RFC6242, June 2011, 2714 . 
2716 [RFC6991] Schoenwaelder, J., Ed., "Common YANG Data Types", RFC 2717 6991, DOI 10.17487/RFC6991, July 2013, 2718 . 2720 [RFC7950] Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language", 2721 RFC 7950, DOI 10.17487/RFC7950, August 2016, 2722 . 2724 [RFC8040] Bierman, A., Bjorklund, M., and K. Watsen, "RESTCONF 2725 Protocol", RFC 8040, DOI 10.17487/RFC8040, January 2017, 2726 . 2728 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2729 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2730 May 2017, . 2732 [X.733] International Telecommunications Union, "Information 2733 Technology - Open Systems Interconnection - Systems 2734 Management: Alarm Reporting Function", ITU-T 2735 Recommendation X.733, 1992. 2737 11.2. Informative References 2739 [ALARMIRP] 2740 3GPP, "Telecommunication management; Fault Management; 2741 Part 2: Alarm Integration Reference Point (IRP): 2742 Information Service (IS)", 3GPP TS 32.111-2 3.4.0, March 2743 2005. 2745 [ALARMSEM] 2746 Wallin, S., Leijon, V., Nordlander, J., and N. Bystedt, 2747 "The semantics of alarm definitions: enabling systematic 2748 reasoning about alarms. International Journal of Network 2749 Management, Volume 22, Issue 3, John Wiley and Sons, Ltd, 2750 http://dx.doi.org/10.1002/nem.800", March 2012. 2752 [EEMUA] EEMUA Publication No. 191 Engineering Equipment and 2753 Materials Users Association, London, 2 edition., "Alarm 2754 Systems: A Guide to Design, Management and Procurement.", 2755 2007. 2757 [ISA182] International Society of Automation,ISA, "ANSI/ISA- 2758 18.2-2009 Management of Alarm Systems for the Process 2759 Industries", 2009. 2761 [RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management 2762 Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877, 2763 September 2004, . 2765 [RFC8340] Bjorklund, M. and L. Berger, Ed., "YANG Tree Diagrams", 2766 BCP 215, RFC 8340, DOI 10.17487/RFC8340, March 2018, 2767 . 
2769 [X.736] International Telecommunications Union, "Information 2770 Technology - Open Systems Interconnection - Systems 2771 Management: Security alarm reporting function", ITU-T 2772 Recommendation X.736, 1992. 2774 Appendix A. Vendor-specific Alarm-Types Example 2776 This example shows how to define alarm-types in a vendor-specific 2777 module. In this case the vendor "xyz" has chosen to define top level 2778 identities according to X.733 event types. 2780 module example-xyz-alarms { 2781 namespace "urn:example:xyz-alarms"; 2782 prefix xyz-al; 2784 import ietf-alarms { 2785 prefix al; 2786 } 2788 identity xyz-alarms { 2789 base al:alarm-type-id; 2790 } 2792 identity communications-alarm { 2793 base xyz-alarms; 2794 } 2795 identity quality-of-service-alarm { 2796 base xyz-alarms; 2797 } 2798 identity processing-error-alarm { 2799 base xyz-alarms; 2800 } 2801 identity equipment-alarm { 2802 base xyz-alarms; 2803 } 2804 identity environmental-alarm { 2805 base xyz-alarms; 2806 } 2808 // communications alarms 2809 identity link-alarm { 2810 base communications-alarm; 2811 } 2813 // QoS alarms 2814 identity high-jitter-alarm { 2815 base quality-of-service-alarm; 2816 } 2817 } 2819 Appendix B. Alarm Inventory Example 2821 This example shows an alarm inventory with one alarm type defined only 2822 by its identifier, and another that is dynamically configured. In the 2823 latter case a digital input has been connected to a smoke detector; 2824 therefore the 'alarm-type-qualifier' is set to "smoke-alarm" and 2825 the 'alarm-type-id' to "environmental-alarm". 2827 2830 2831 2832 xyz-al:link-alarm 2833 2834 2835 /dev:interfaces/dev:interface 2836 2837 true 2838 2839 Link failure, operational state down but admin state up 2840 2841 2842 2843 xyz-al:environmental-alarm 2844 smoke-alarm 2845 true 2846 2847 Connected smoke detector to digital input 2848 2849 2850 2851 2853 Appendix C. 
Alarm List Example 2855 In this example we show an alarm that has toggled [major, clear, 2856 major]. An operator has acknowledged the alarm. 2858 2861 2862 1 2863 2015-04-08T08:39:50.00Z 2865 2866 2867 /dev:interfaces/dev:interface[name='FastEthernet1/0'] 2868 2869 xyz-al:link-alarm 2870 2872 2015-04-08T08:39:50.00Z 2873 false 2874 1.3.6.1.2.1.2.2.1.1.17 2875 2015-04-08T08:39:40.00Z 2876 major 2877 2878 Link operationally down but administratively up 2879 2880 2881 2882 major 2883 2884 Link operationally down but administratively up 2885 2886 2887 2888 2889 cleared 2890 2891 Link operationally up and administratively up 2892 2893 2894 2895 2896 major 2897 2898 Link operationally down but administratively up 2899 2900 2901 2902 2903 ack 2904 joe 2905 Will investigate, ticket TR764999 2906 2907 2908 2909 2911 Appendix D. Alarm Shelving Example 2913 This example shows how to shelve alarms. We shelve alarms related to 2914 the smoke detectors since they are being installed and tested. We 2915 also shelve all alarms from FastEthernet1/0. 2917 2920 2921 2922 2923 FE10 2924 2925 /dev:interfaces/dev:interface[name='FastEthernet1/0'] 2926 2927 2928 2929 detectortest 2930 xyz-al:environmental-alarm 2931 2932 smoke-alarm 2933 2934 2935 2936 2937 2939 Appendix E. X.733 Mapping Example 2941 This example shows how to map a dynamic alarm type (alarm-type-identity=environmental-alarm, 2942 alarm-type-qualifier=smoke-alarm) to 2943 the corresponding X.733 event-type and probable cause parameters. 2945 2947 2948 2950 xyz-al:environmental-alarm 2951 2952 smoke-alarm 2953 2954 quality-of-service-alarm 2955 777 2956 2957 2958 2960 Appendix F. Background and Usability Requirements 2962 This section gives background information regarding design choices in 2963 the alarm module. It also defines usability requirements for alarms. 2964 Alarm usability is important for an alarm interface.
A data model 2965 helps define the format, but if the actual alarms are of low 2966 value, the goal of alarm management has not been achieved. 2968 The telecommunication domain has standardised an alarm interface in 2969 ITU-T X.733 [X.733]. This continued in mobile networks within the 2970 3GPP organisation [ALARMIRP]. Although SNMP is the dominant 2971 mechanism for monitoring devices, the IETF did not standardise 2972 an alarm MIB early on. Instead, management systems interpreted the enterprise-specific 2973 traps per MIB and device to build an alarm list. When 2974 the Alarm MIB [RFC3877] was finally published, it had to address the 2975 existence of enterprise traps and map these into alarms. This 2976 requirement led to a MIB that is not always easy to use. 2978 F.1. Alarm Concepts 2980 There are two misconceptions regarding alarms and alarm interfaces 2981 that are important to sort out. The first is that alarms are 2982 confused with events in general. Alarms MUST correspond to an 2983 undesirable state that needs corrective action. Many implementations 2984 of alarm interfaces do not adhere to this principle and just send 2985 events in general. In order to qualify as an alarm, there must exist 2986 a corrective action. If there is none, it is an event that can go 2987 into logs. 2989 The other misconception is that the term "alarm" refers to the 2990 notification itself. Rather, an alarm is a state of a resource in 2991 the system. The alarm notifications report state changes of the 2992 alarm, such as alarm raise and alarm clear. 2994 "One of the most important principles of alarm management is that an 2995 alarm requires an action. This means that if the operator does not 2996 need to respond to an alarm (because unacceptable consequences do not 2997 occur), then it is not an alarm. Following this cardinal rule will 2998 help eliminate many potential alarm management issues." [ISA182] 3000 F.1.1.
Alarm Type 3002 Since every alarm has a corresponding corrective action, a vendor can 3003 prepare a list of available alarms and their corrective actions. 3004 We use the term "alarm type" to refer to every possible alarm that 3005 could be active in the system. 3007 Alarm types are also fundamental in order to provide a state-based 3008 alarm list. The alarm list correlates alarm state changes for the 3009 same alarm type and the same resource into one alarm. 3011 Different alarm interfaces use different mechanisms to define alarm 3012 types, ranging from simple error numbers to more advanced mechanisms 3013 like the X.733 triplet of event type, probable cause and specific 3014 problem. 3016 A common misunderstanding is that individual alarm notifications are 3017 alarm types. This is not correct; e.g., "link-up" and "link-down" 3018 are two notifications reporting different states for the same alarm 3019 type, "link-alarm". 3021 F.2. Usability Requirements 3023 Common alarm problems and their causes are summarised in 3024 Table 1. This summary is adapted to networking based on the ISA 3025 [ISA182] and EEMUA [EEMUA] standards. 3027 +------------------+--------------------------------+---------------+ 3028 | Problem | Cause | How this | 3029 | | | module | 3030 | | | addresses the | 3031 | | | cause | 3032 +------------------+--------------------------------+---------------+ 3033 | Alarms are | "Nuisance" alarms (chattering | Strict | 3034 | generated but | alarms and fleeting alarms), | definition of | 3035 | they are ignored | faulty hardware, redundant | alarms | 3036 | by the operator. | alarms, cascading alarms, | requiring | 3037 | | incorrect alarm settings, | corrective | 3038 | | alarms have not been | response. | 3039 | | rationalised, the alarms | Alarm | 3040 | | represent log information | requirements | 3041 | | rather than true alarms. | in Table 2.
| 3042 | | | | 3043 | When alarms | Insufficient alarm response | The alarm | 3044 | occur, operators | procedures and not well | inventory | 3045 | do not know how | defined alarm types. | lists all | 3046 | to respond. | | alarm types | 3047 | | | and | 3048 | | | corrective | 3049 | | | actions. | 3050 | | | Alarm | 3051 | | | requirements | 3052 | | | in Table 2. | 3053 | | | | 3054 | The alarm | Nuisance alarms, stale alarms, | The alarm | 3055 | display is full | alarms from equipment not in | definition | 3056 | of alarms, even | service. | and alarm | 3057 | when there is | | shelving. | 3058 | nothing wrong. | | | 3059 | | | | 3060 | During a | Incorrect prioritization of | State-based | 3061 | failure, | alarms. Not using advanced | alarm model, | 3062 | operators are | alarm techniques (e.g. state- | alarm rate | 3063 | flooded with so | based alarming). | requirements | 3064 | many alarms that | | in Table 3 | 3065 | they do not know | | and Table 4 | 3066 | which ones are | | | 3067 | the most | | | 3068 | important. | | | 3069 +------------------+--------------------------------+---------------+ 3071 Table 1: Alarm Problems and Causes 3073 Based upon the above problems EEMUA gives the following definition of 3074 a good alarm: 3076 +----------------+--------------------------------------------------+ 3077 | Characteristic | Explanation | 3078 +----------------+--------------------------------------------------+ 3079 | Relevant | Not spurious or of low operational value. | 3080 | | | 3081 | Unique | Not duplicating another alarm. | 3082 | | | 3083 | Timely | Not long before any response is needed or too | 3084 | | late to do anything. | 3085 | | | 3086 | Prioritised | Indicating the importance that the operator | 3087 | | deals with the problem. | 3088 | | | 3089 | Understandable | Having a message which is clear and easy to | 3090 | | understand. | 3091 | | | 3092 | Diagnostic | Identifying the problem that has occurred. 
| 3093 | | | 3094 | Advisory | Indicative of the action to be taken. | 3095 | | | 3096 | Focusing | Drawing attention to the most important issues. | 3097 +----------------+--------------------------------------------------+ 3099 Table 2: Definition of a Good Alarm 3101 Vendors SHOULD rationalise all alarms according to the characteristics in Table 2. Another 3102 crucial requirement is acceptable alarm notification rates. Vendors 3103 SHOULD make sure that they do not exceed the recommendations from 3104 EEMUA below: 3106 +-----------------------------------+-------------------------------+ 3107 | Long Term Alarm Rate in Steady | Acceptability | 3108 | Operation | | 3109 +-----------------------------------+-------------------------------+ 3110 | More than one per minute | Very likely to be | 3111 | | unacceptable. | 3112 | | | 3113 | One per 2 minutes | Likely to be over-demanding. | 3114 | | | 3115 | One per 5 minutes | Manageable. | 3116 | | | 3117 | Less than one per 10 minutes | Very likely to be acceptable. | 3118 +-----------------------------------+-------------------------------+ 3120 Table 3: Acceptable Alarm Rates, Steady State 3122 +----------------------------+--------------------------------------+ 3123 | Number of alarms displayed | Acceptability | 3124 | in 10 minutes following a | | 3125 | major network problem | | 3126 +----------------------------+--------------------------------------+ 3127 | More than 100 | Definitely excessive and very likely | 3128 | | to lead the operator to abandon | 3129 | | the use of the alarm system. | 3130 | | | 3131 | 20-100 | Hard to cope with. | 3132 | | | 3133 | Under 10 | Should be manageable - but may be | 3134 | | difficult if several of the alarms | 3135 | | require a complex operator response.
| 3136 +----------------------------+--------------------------------------+ 3138 Table 4: Acceptable Alarm Rates, Burst 3140 The numbers in Table 3 and Table 4 are the sum of all alarms for a 3141 network being managed from one alarm console. So every individual 3142 system or NMS contributes to these numbers. 3144 Vendors SHOULD make sure that the following rules are used in 3145 designing the alarm interface: 3147 1. Rationalize the alarms in the system to ensure that every alarm 3148 is necessary, has a purpose, and follows the cardinal rule - that 3149 it requires an operator response. Every alarm should also adhere to the characteristics in 3150 Table 2. 3152 2. Audit the quality of the alarms. Talk with the operators about 3153 how well the alarm information supports them. Do they know what 3154 to do in the event of an alarm? Are they able to quickly 3155 diagnose the problem and determine the corrective action? Does 3156 the alarm text adhere to the requirements in Table 2? 3158 3. Analyze and benchmark the performance of the system and compare 3159 it to the recommended metrics in Table 3 and Table 4. Start by 3160 identifying nuisance alarms, standing alarms at normal state and 3161 startup. 3163 Authors' Addresses 3165 Stefan Vallin 3166 Stefan Vallin AB 3168 Email: stefan@wallan.se 3169 Martin Bjorklund 3170 Cisco 3172 Email: mbj@tail-f.com