Network Working Group                                          S. Vallin
Internet-Draft                                          Stefan Vallin AB
Intended status: Standards Track                            M. Bjorklund
Expires: October 13, 2019                                          Cisco
                                                          April 11, 2019

                           YANG Alarm Module
                    draft-ietf-ccamp-alarm-module-09

Abstract

   This document defines a YANG module for alarm management. It
   includes functions for alarm list management, alarm shelving and
   notifications to inform management systems. There are also
   operations to manage the operator state of an alarm and
   administrative alarm procedures. The module carefully maps to
   relevant alarm standards.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on October 13, 2019.

Copyright Notice

   Copyright (c) 2019 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Terminology and Notation
   2. Objectives
   3. Alarm Module Concepts
     3.1. Alarm Definition
     3.2. Alarm Type
     3.3. Identifying the Alarming Resource
     3.4. Identifying Alarm Instances
     3.5. Alarm Lifecycle
       3.5.1. Resource Alarm Lifecycle
       3.5.2. Operator Alarm Lifecycle
       3.5.3. Administrative Alarm Lifecycle
     3.6. Root Cause, Impacted Resources and Related Alarms
     3.7. Alarm Shelving
     3.8. Alarm Profiles
   4. Alarm Data Model
     4.1. Alarm Control
       4.1.1. Alarm Shelving
     4.2. Alarm Inventory
     4.3. Alarm Summary
     4.4. The Alarm List
     4.5. The Shelved Alarms List
     4.6. Alarm Profiles
     4.7. Operations
     4.8. Notifications
   5. Relationship to the ietf-hardware YANG module
   6. Alarm YANG Module
   7. X.733 Extensions
   8. The X.733 Mapping Module
   9. IANA Considerations
   10. Security Considerations
   11. Acknowledgements
   12. References
     12.1. Normative References
     12.2. Informative References
   Appendix A. Vendor-specific Alarm Types Example
   Appendix B. Alarm Inventory Example
   Appendix C. Alarm List Example
   Appendix D. Alarm Shelving Example
   Appendix E. X.733 Mapping Example
   Appendix F. Relationship to other alarm standards
     F.1. Alarm definition
     F.2. Data model
       F.2.1. X.733
       F.2.2. RFC 3877, the Alarm MIB
       F.2.3. 3GPP Alarm IRP
       F.2.4. G.7710
   Appendix G. Alarm Usability Requirements
   Authors' Addresses

1. Introduction

   This document defines a YANG [RFC7950] module for alarm management.
   The purpose is to define a standardized alarm interface for network
   devices that can be easily integrated into management applications.
   The model is also applicable as a northbound alarm interface in the
   management applications.

   Alarm monitoring is a fundamental part of monitoring the network.
   Raw alarms from devices do not always tell the status of the network
   services or necessarily point to the root cause.
   However, being able to feed alarms to the alarm management
   application in a standardized format is a starting point for
   performing higher level network assurance tasks.

   The design of the module is based on experience from using and
   implementing available alarm standards from ITU [X.733], 3GPP
   [ALARMIRP] and ANSI [ISA182].

1.1. Terminology and Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The following terms are defined in [RFC7950]:

   o  action

   o  client

   o  data tree

   o  server

   The following terms are used within this document:

   o  Alarm (the general concept): An alarm signifies an undesirable
      state in a resource that requires corrective action.

   o  Fault: A fault is the underlying cause of an undesired behavior.
      There is no trivial one-to-one mapping between faults and alarms.
      One fault may result in several alarms in case the system lacks
      root-cause and correlation capabilities. An alarm might not have
      an underlying fault as a cause. For example, imagine a bad Mean
      Opinion Score (MOS) alarm from a Voice over IP (VOIP) probe and
      the cause being non-optimal QoS configuration.

   o  Alarm Type: An alarm type identifies a possible unique alarm state
      for a resource. Alarm types are names to identify the state like
      "link-alarm", "jitter-violation", "high-disk-utilization".

   o  Resource: A fine-grained identification of the alarming resource,
      for example: an interface, a process.

   o  Alarm Instance: The alarm state for a specific resource and alarm
      type. For example ("GigabitEthernet0/15", link-alarm). An entry
      in the alarm list.

   o  Cleared alarm: A cleared alarm is an alarm where the system
      considers the undesired state to be cleared. Operators cannot
      clear alarms; clearance is managed by the system. For example, a
      linkUp notification can be considered a clear condition for a
      linkDown state.

   o  Closed alarm: Operators can close alarms irrespective of the alarm
      being cleared or not. A closed alarm indicates that the alarm
      does not need attention, either because the corrective action has
      been taken or because it can be ignored for other reasons.

   o  Alarm Inventory: A list of all possible alarm types on a system.

   o  Alarm Shelving: Blocking alarms according to specific criteria.

   o  Corrective Action: An action taken by an operator or automation
      routine in order to minimize the impact of the alarm or resolve
      the root cause.

   o  Management System: The alarm management application that consumes
      the alarms, i.e., acts as a client.

   o  System: The system that implements this YANG alarm module, i.e.,
      acts as a server. This corresponds to a network device or a
      management application that provides a northbound alarm interface.

   Tree diagrams used in this document follow the notation defined in
   [RFC8340].

2. Objectives

   The objectives for the design of the Alarm Module are:

   o  Simple to use. If a system supports this module, it shall be
      straightforward to integrate this into a YANG-based alarm
      manager.

   o  View alarms as states on resources and not as discrete
      notifications.

   o  To provide a precise definition of "alarm" in order to exclude
      general events that should not be forwarded as alarm
      notifications.

   o  To provide precise identification of alarm types and alarm
      instances.

   o  A management system should be able to pull all available alarm
      types from a system, i.e., read the alarm inventory from a system.
      This makes it possible to prepare alarm operators with
      corresponding alarm instructions.

   o  Address alarm usability requirements, see Appendix G. While IETF
      and telecom standards have addressed alarms mostly from a protocol
      perspective, the process industry has published several relevant
      standards addressing requirements for a useful alarm interface:
      [EEMUA] and [ISA182]. This alarm module defines usability
      requirements as well as a YANG data model.

   o  Mapping to [X.733], which is a requirement for some alarm systems.
      Still, keep some of the X.733 concepts out of the core model in
      order to make the model small and easy to understand.

3. Alarm Module Concepts

   This section defines the fundamental concepts behind the data model.
   This section is rooted in the works of Vallin et al. [ALARMSEM].

3.1. Alarm Definition

   An alarm signifies an undesirable state in a resource that requires
   corrective action.

   There are two main things to remember from this definition:

   1. The definition focuses on leaving out events and logging
      information in general. Alarms should only be used for undesired
      states that require action.

   2. The definition also focuses on alarms as a state on a resource,
      not the notifications that report the state changes.

   See Appendix F for information on how this definition relates to
   other alarm standards.

3.2. Alarm Type

   This document defines an alarm type with an alarm type id and an
   alarm type qualifier.

   The alarm type id is modeled as a YANG identity. With YANG
   identities, new alarm types can be defined in a distributed fashion.
   YANG identities are hierarchical, which means that a hierarchy of
   alarm types can be defined.

   Standards and vendors should define their own alarm type identities
   based on this definition.

   The use of YANG identities means that all possible alarms are
   identified at design time.
   This explicit declaration of alarm types makes it easier to allow
   for alarm qualification reviews and preparation of alarm actions and
   documentation.

   There are occasions where the alarm types are not known at design
   time. An example is a system with digital inputs that allows users
   to connect detectors, such as smoke detectors, to the inputs. In
   this case, a configuration action declares that certain connectors
   are, for example, fire alarms.

   In order to allow for dynamic addition of alarm types, the alarm
   module allows for further qualification of the identity-based alarm
   type using a string. A potential drawback of this is that there is a
   significant risk that alarm operators will receive alarm types as a
   surprise. They may not know how to resolve the problem, since a
   defined alarm procedure does not necessarily exist. To avoid this
   risk the system MUST publish all possible alarm types in the alarm
   inventory, see Section 4.2.

   A vendor or standards organization can define their own alarm type
   hierarchy. The example below shows a hierarchy based on X.733 event
   types:

     import ietf-alarms {
       prefix al;
     }
     identity vendor-alarms {
       base al:alarm-type;
     }
     identity communications-alarm {
       base vendor-alarms;
     }
     identity link-alarm {
       base communications-alarm;
     }

   Alarm types can be abstract. An abstract alarm type is used as a
   base for defining hierarchical alarm types. Concrete alarm types are
   used for alarm states and appear in the alarm inventory. There are
   two kinds of concrete alarm types:

   1. The last subordinate identity in the "alarm-type-id" hierarchy is
      concrete, for example: "alarm-identity.environmental-
      alarm.smoke". In this example "alarm-identity" and
      "environmental-alarm" are abstract YANG identities, whereas
      "smoke" is a concrete YANG identity.

   2. The YANG identity hierarchy is abstract and the concrete alarm
      type is defined by the dynamic alarm qualifier string, for
      example: "alarm-identity.environmental-alarm.external-detector"
      with alarm-type-qualifier "smoke".

   For example:

     // Alternative 1: concrete alarm type identity
     import ietf-alarms {
       prefix al;
     }
     identity environmental-alarm {
       base al:alarm-type;
       description "Abstract alarm type";
     }
     identity smoke {
       base environmental-alarm;
       description "Concrete alarm type";
     }

     // Alternative 2: concrete alarm type qualifier
     import ietf-alarms {
       prefix al;
     }
     identity environmental-alarm {
       base al:alarm-type;
       description "Abstract alarm type";
     }
     identity external-detector {
       base environmental-alarm;
       description
         "Abstract alarm type, a run-time configuration
          procedure sets the type of alarm detected. This will
          be reported in the alarm-type-qualifier.";
     }

   A server SHOULD strive to minimize the number of dynamically defined
   alarm types.

3.3. Identifying the Alarming Resource

   It is of vital importance to be able to refer to the alarming
   resource. This reference must be as fine-grained as possible. If
   the alarming resource exists in the data tree then an instance-
   identifier MUST be used with the full path to the object.

   When the module is used in a controller/orchestrator/manager the
   original device resource identification can be modified to include
   the device in the path. The details depend on how devices are
   identified, and are out of scope for this specification.

   Example:

      The original device alarm might identify the resource as
      "/dev:interfaces/dev:interface[dev:name='FastEthernet1/0']".

      The resource identification in the manager could look something
      like: "/mgr:devices/mgr:device[mgr:name='xyz123']/dev:interfaces/
      dev:interface[dev:name='FastEthernet1/0']"

   This module also allows for alternate naming of the alarming resource
   if it is not available in the data tree.

3.4. Identifying Alarm Instances

   A primary goal of this alarm module is to remove any ambiguity in how
   alarm notifications are mapped to an update of an alarm instance.
   The X.733 and 3GPP documents were not clear on this point. This YANG
   alarm module states that the tuple (resource, alarm type identifier,
   alarm type qualifier) corresponds to a single alarm instance. This
   means that alarm notifications for the same resource and same alarm
   type are matched to update the same alarm instance. These three
   leafs are therefore used as the key in the alarm list:

     list alarm {
       key "resource alarm-type-id alarm-type-qualifier";
       ...
     }

3.5. Alarm Lifecycle

   The alarm model clearly separates the resource alarm lifecycle from
   the operator and administrative lifecycles of an alarm.

   o  resource alarm lifecycle: the alarm instrumentation that controls
      alarm raise, clearance, and severity changes.

   o  operator alarm lifecycle: operators acting upon alarms with
      actions like acknowledgment and closing. Closing an alarm implies
      that the operator considers the corrective action performed.
      Operators can also shelve (block/filter) alarms in order to avoid
      nuisance alarms.

   o  administrative alarm lifecycle: purging (deleting) unwanted alarms
      and compressing the alarm status change list. This module exposes
      operations to manage the administrative lifecycle. The server may
      also perform these operations based on other policies, but how
      that is done is out of scope for this document.
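
   As an informal illustration of how the three lifecycles stay
   separate, the following Python sketch keeps one entry per
   (resource, alarm type id, alarm type qualifier) tuple, as described
   in Section 3.4. Python is used purely for illustration. All class
   and method names are hypothetical and are not defined by this
   module, and the set of operator-settable states is an assumption
   made for the sketch. It only aims to show that the instrumentation
   clears alarms, operators close them, and only a purge removes them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical names throughout: this is an illustration, not the module's API.
# Key tuple: (resource, alarm-type-id, alarm-type-qualifier)
AlarmKey = Tuple[str, str, str]


@dataclass
class Alarm:
    is_cleared: bool
    perceived_severity: str
    alarm_text: str
    operator_states: List[str] = field(default_factory=list)


class AlarmList:
    def __init__(self) -> None:
        self.alarms: Dict[AlarmKey, Alarm] = {}

    # Resource lifecycle: only the instrumentation raises, changes, or clears.
    def report(self, key: AlarmKey, severity: str, text: str) -> None:
        alarm = self.alarms.get(key)
        if severity == "cleared":
            if alarm is not None:
                alarm.is_cleared = True  # cleared, but still in the list
                alarm.alarm_text = text
            return
        if alarm is None:
            self.alarms[key] = Alarm(False, severity, text)
        else:
            # Same key: update the existing instance instead of adding one.
            alarm.is_cleared = False
            alarm.perceived_severity = severity
            alarm.alarm_text = text

    # Operator lifecycle: acknowledging or closing never clears or deletes.
    # Assumption: only these states can be set directly by an operator.
    def set_operator_state(self, key: AlarmKey, state: str) -> None:
        if state not in ("none", "ack", "closed"):
            raise ValueError(f"state not settable by an operator: {state}")
        self.alarms[key].operator_states.append(state)

    # Administrative lifecycle: purging is the only way entries disappear.
    def purge_closed_and_cleared(self) -> int:
        doomed = [k for k, a in self.alarms.items()
                  if a.is_cleared and a.operator_states[-1:] == ["closed"]]
        for k in doomed:
            del self.alarms[k]
        return len(doomed)
```

   In this sketch, reporting "major" and then "minor" for the same key
   leaves a single entry, a "cleared" report only sets the flag, and
   the entry disappears only after an operator has closed it and the
   purge has been run.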

   A server SHOULD describe how long it retains cleared/closed alarms:
   either until they are manually purged or according to an automatic
   removal policy. How this is done is outside the scope of this
   document.

3.5.1. Resource Alarm Lifecycle

   From a resource perspective, an alarm can for example have the
   following lifecycle: raise, change severity, change severity, clear,
   being raised again, etc. All of these status changes can have
   different alarm texts generated by the instrumentation. Two
   important things to note:

   1. Alarms are not deleted when they are cleared. Deleting alarms is
      an administrative process. The alarm module defines an action
      "purge-alarms" that deletes alarms.

   2. Alarms are not cleared by operators; only the underlying
      instrumentation can clear an alarm. Operators can close alarms.

   The YANG tree representation below illustrates the resource oriented
   lifecycle:

     +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
        ...
        +--ro is-cleared            boolean
        +--ro last-changed          yang:date-and-time
        +--ro perceived-severity    severity
        +--ro alarm-text            alarm-text
        +--ro status-change* [time] {alarm-history}?
           +--ro time                  yang:date-and-time
           +--ro perceived-severity    severity-with-clear
           +--ro alarm-text            alarm-text

   For every status change from the resource perspective a row is added
   to the "status-change" list, if the server implements the feature
   "alarm-history". The feature "alarm-history" is optional to
   implement, since keeping the alarm history may have an impact on the
   server's memory resources.

   The last status values are also represented as leafs for the alarm.
   Note well that the alarm severity does not include "cleared"; alarm
   clearance is a boolean flag.

   An alarm can therefore look like this: (("GigabitEthernet0/25", link-
   alarm,""), false, 2018-04-08T08:20:10.00Z, major, "Interface
   GigabitEthernet0/25 down")

3.5.2. Operator Alarm Lifecycle

   Operators can act upon alarms using the set-operator-state action:

     +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
        ...
        +--ro operator-state-change* [time] {operator-actions}?
        |  +--ro time        yang:date-and-time
        |  +--ro operator    string
        |  +--ro state       operator-state
        |  +--ro text?       string
        +---x set-operator-state {operator-actions}?
           +---w input
              +---w state    writable-operator-state
              +---w text?    string

   The operator state for an alarm can be: "none", "ack", "shelved", and
   "closed". Alarm deletion (using the action "purge-alarms") can use
   this state as a criterion. For example, a closed alarm is an alarm
   where the operator has performed any required corrective actions.
   Closed alarms are good candidates for being purged.

3.5.3. Administrative Alarm Lifecycle

   Deleting alarms from the alarm list is considered an administrative
   action. This is supported by the "purge-alarms" action. The "purge-
   alarms" action takes a filter as input. The filter selects alarms
   based on the operator and resource lifecycle such as "all closed
   cleared alarms older than a time specification". The server may also
   perform these operations based on other policies, but how that is
   done is out of scope for this document.

   Purged alarms are removed from the alarm list. Note well, if the
   alarm resource state changes after a purge, the alarm will reappear
   in the alarm list.

   Alarms can be compressed. Compressing an alarm deletes all entries
   in the alarm's "status-change" list except for the last status
   change. A client can perform this using the "compress-alarms"
   action. The server may also perform these operations based on other
   policies, but how that is done is out of scope for this document.

3.6. Root Cause, Impacted Resources and Related Alarms

   The alarm module does not mandate any requirements for the system to
   support alarm correlation or root-cause and service-impact analysis.
   However, if such features are supported, this section describes how
   the results of such analysis are represented in the data model.
   These parts of the model are optional. The module supports three
   scenarios:

   Root cause analysis: An alarm can indicate candidate root cause
      resources, for example: a database issue alarm referring to a full
      disk partition.

   Service impact analysis: An alarm can refer to potentially impacted
      resources, for example: an interface alarm referring to impacted
      network services.

   Alarm correlation: Dependencies between alarms; several alarms can
      be grouped as relating to each other, for example a streaming
      media alarm relating to a high jitter alarm.

   Different systems have varying degrees of alarm correlation and
   analysis capabilities, and the intent of the alarm module is to
   enable any capability, including none.

   The general principle of this alarm module is to limit the number of
   alarms. In many cases several resources are affected for a given
   underlying problem. A full disk will of course impact databases and
   applications as well. The recommendation is to have a single alarm
   for the underlying problem and list the affected resources in the
   alarm, rather than having separate alarms for each resource.

   The alarm has one leaf-list to identify possible impacted resources
   ("impacted-resource") and one leaf-list to identify possible root
   cause resources ("root-cause-resource"). These serve as hints only.
   It is up to the client application to use this information to
   present the overall status. Using the disk full example, a good
   alarm would be to use the hard disk partition as the alarming
   resource and add the database and applications into the
   "impacted-resource" leaf-list.
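
   The disk full recommendation above can be sketched in a few lines of
   Python. This is a minimal sketch under stated assumptions: the class
   "AlarmEntry", the helper "disk_full_alarm", and the resource strings
   used with it are hypothetical and only mimic the leafs named in the
   text. The alarm type name "high-disk-utilization" is borrowed from
   the examples in Section 1.1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AlarmEntry:
    # Hypothetical illustration; the two list fields stand in for the
    # impacted-resource and root-cause-resource leaf-lists.
    resource: str
    alarm_type_id: str
    alarm_text: str
    impacted_resources: List[str] = field(default_factory=list)
    root_cause_resources: List[str] = field(default_factory=list)


def disk_full_alarm(partition: str, dependents: List[str]) -> AlarmEntry:
    """Build ONE alarm on the actionable resource and list side effects.

    The partition is the resource an operator can act upon; databases
    and applications are recorded as hints, not as separate alarms.
    """
    return AlarmEntry(
        resource=partition,
        alarm_type_id="high-disk-utilization",
        alarm_text=f"Partition {partition} is full",
        # De-duplicate and sort so the hint list is stable.
        impacted_resources=sorted(set(dependents)),
    )
```

   A client that receives such an entry sees a single alarm to act on
   (the partition) and can still present the affected database and
   applications as part of the overall status.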

   A system should always strive to identify the resource that can be
   acted upon as the "resource" leaf. The "impacted-resource" leaf-list
   shall be used to identify any side-effects of the alarm. The
   impacted resources cannot be acted upon to fix the problem. The
   disk full example above illustrates the principle: you cannot fix
   the underlying issue by database operations. However, you need to
   pay attention to the database to perform any operations that limit
   the impact of the problem.

   On some occasions the system might not be capable of detecting the
   root cause, that is, the resource that can be acted upon. The
   instrumentation in this case only monitors the side-effect and
   raises an alarm to indicate a situation requiring attention. The
   instrumentation still might identify possible candidates for the
   root-cause resource. In this case the "root-cause-resource"
   leaf-list can be used to indicate the candidate root-cause
   resources. An example of this kind of alarm might be an active test
   tool that detects an SLA violation on a VPN connection and
   identifies the devices along the chain as candidate root causes.

   The alarm module also supports a way to associate different alarms to
   each other with the "related-alarm" list. This list enables the
   server to inform the client that certain alarms are related to other
   alarms.

   Note well that this module does not prescribe any dependencies or
   preference between the above alarm correlation mechanisms. Different
   systems have different capabilities and the above described
   mechanisms are available to support the instrumentation features.

3.7. Alarm Shelving

   Alarm shelving is an important function in order for alarm management
   applications and operators to stop superfluous alarms. Shelving
   means that any alarms matching the configured shelf criteria are
   ignored (blocked/filtered).
   Shelved alarms appear in a dedicated shelved alarm list, so that the
   main alarm list contains only entries of interest. Shelved alarms
   do not generate notifications, but the shelved alarm list is updated
   with any alarm state changes.

   Alarm shelving is optional to implement, since matching alarms
   against shelf criteria may have an impact on the server's processing
   resources.

3.8. Alarm Profiles

   Alarm profiles are used to configure further information for an
   alarm type. This module supports configuring severity levels
   overriding the system default levels. This corresponds to the Alarm
   Severity Assignment Profile (ASAP) functionality in M.3100 [M.3100]
   and M.3160 [M.3160]. Other standard or enterprise modules can
   augment this list with further alarm type information.

4. Alarm Data Model

   The fundamental parts of the data model are the "alarm-list" with
   associated notifications and the "alarm-inventory" list of all
   possible alarm types. These MUST be implemented by a system. The
   rest of the data model is made conditional with the YANG features
   "operator-actions", "alarm-shelving", "alarm-history", "alarm-
   summary", "alarm-profile", and "severity-assignment".

   The data model has the following overall structure:

     +--rw control
     |  +--rw max-alarm-status-changes?   union
     |  +--rw notify-status-changes?      enumeration
     |  +--rw notify-severity-level?      severity
     |  +--rw alarm-shelving {alarm-shelving}?
     |        ...
     +--ro alarm-inventory
     |  +--ro alarm-type* [alarm-type-id alarm-type-qualifier]
     |        ...
     +--ro summary {alarm-summary}?
     |  +--ro alarm-summary* [severity]
     |  |     ...
     |  +--ro shelves-active?   empty {alarm-shelving}?
     +--ro alarm-list
     |  +--ro number-of-alarms?   yang:gauge32
     |  +--ro last-changed?       yang:date-and-time
     |  +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
     |  |     ...
     |  +---x purge-alarms
     |  |     ...
     |  +---x compress-alarms {alarm-history}?
     |        ...
     +--ro shelved-alarms {alarm-shelving}?
     |  +--ro number-of-shelved-alarms?      yang:gauge32
     |  +--ro shelved-alarms-last-changed?   yang:date-and-time
     |  +--ro shelved-alarm*
     |  |       [resource alarm-type-id alarm-type-qualifier]
     |  |     ...
     |  +---x purge-shelved-alarms
     |  |     ...
     |  +---x compress-shelved-alarms {alarm-history}?
     |        ...
     +--rw alarm-profile*
             [alarm-type-id alarm-type-qualifier-match resource]
             {alarm-profile}?
        +--rw alarm-type-id                        alarm-type-id
        +--rw alarm-type-qualifier-match           string
        +--rw resource                             resource-match
        +--rw description                          string
        +--rw alarm-severity-assignment-profile
                {severity-assignment}?
              ...

4.1. Alarm Control

   The "/alarms/control/notify-status-changes" leaf controls whether
   notifications are sent for all state changes, only for raise and
   clear, or only for state changes more severe than a configured
   level. This feature in combination with alarm shelving corresponds
   to the ITU Alarm Report Control functionality, see Appendix F.2.4.

   Every alarm has a list of status changes. The length of this list is
   controlled by "/alarms/control/max-alarm-status-changes". When the
   list is full and a new entry is created, the oldest entry is removed.

4.1.1. Alarm Shelving

   The shelving control tree is shown below:

     +--rw control
        +--rw alarm-shelving {alarm-shelving}?
           +--rw shelf* [name]
              +--rw name           string
              +--rw resource*      resource-match
              +--rw alarm-type*
              |       [alarm-type-id alarm-type-qualifier-match]
              |  +--rw alarm-type-id                 alarm-type-id
              |  +--rw alarm-type-qualifier-match    string
              +--rw description?   string

   Shelved alarms are shown in a dedicated shelved alarm list. Matching
   alarms MUST appear in the /alarms/shelved-alarms/shelved-alarm list,
   and non-matching alarms MUST appear in the /alarms/alarm-list/alarm
   list.
   The server does not send any notifications for shelved alarms.

   Shelving and unshelving can only be performed by editing the shelf
   configuration. It cannot be performed on individual alarms. The
   server will add an operator state indicating that the alarm was
   shelved/unshelved.

   A leaf (/alarms/summary/shelves-active) in the alarm summary
   indicates if there are shelved alarms.

   A system can choose not to support the shelving feature.

4.2. Alarm Inventory

   The alarm inventory represents all possible alarm types that may
   occur in the system. A management system may use this to build alarm
   procedures. The alarm inventory is relevant for several reasons:

      The system might not implement all defined alarm type identities,
      and some alarm identities are abstract.

      The system has configured dynamic alarm types using the alarm
      qualifier. The inventory makes it possible for the management
      system to discover these.

   Note that the mechanism whereby dynamic alarm types are added using
   the alarm type qualifier MUST populate this list.

   The optional leaf-list "resource" in the alarm inventory enables the
   system to publish for which resources a given alarm type may appear.

   A server MUST implement the alarm inventory in order to enable
   controlled alarm procedures in the client.

   A server implementer may want to document the alarm inventory for
   off-line processing by clients. The file format defined in
   [I-D.ietf-netmod-yang-instance-file-format] can be used for this
   purpose.

   The alarm inventory tree is shown below:

     +--ro alarm-inventory
        +--ro alarm-type* [alarm-type-id alarm-type-qualifier]
           +--ro alarm-type-id           alarm-type-id
           +--ro alarm-type-qualifier    alarm-type-qualifier
           +--ro resource*               resource-match
           +--ro will-clear              boolean
           +--ro severity-levels*        severity
           +--ro description             string

4.3. Alarm Summary

   The alarm summary list summarizes alarms per severity: how many are
   cleared, cleared and closed, and closed. It also gives an indication
   if there are shelved alarms.

   The alarm summary tree is shown below:

     +--ro summary {alarm-summary}?
        +--ro alarm-summary* [severity]
        |  +--ro severity                  severity
        |  +--ro total?                    yang:gauge32
        |  +--ro not-cleared?              yang:gauge32
        |  +--ro cleared?                  yang:gauge32
        |  +--ro cleared-not-closed?       yang:gauge32
        |  |       {operator-actions}?
        |  +--ro cleared-closed?           yang:gauge32
        |  |       {operator-actions}?
        |  +--ro not-cleared-closed?       yang:gauge32
        |  |       {operator-actions}?
        |  +--ro not-cleared-not-closed?   yang:gauge32
        |          {operator-actions}?
        +--ro shelves-active?   empty {alarm-shelving}?

4.4. The Alarm List

   The alarm list (/alarms/alarm-list) is a function from the tuple
   (resource, alarm type, alarm type qualifier) to the current composite
   alarm state. The composite state includes states for the resource
   lifecycle, such as severity and clearance flag, and operator states,
   such as acknowledgment. This means that for a given resource and
   alarm type the alarm list shows the current states of the alarm such
   as acknowledged and cleared status.

     +--ro alarm-list
        +--ro number-of-alarms?   yang:gauge32
        +--ro last-changed?       yang:date-and-time
        +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
        |  +--ro resource                 resource
        |  +--ro alarm-type-id            alarm-type-id
        |  +--ro alarm-type-qualifier     alarm-type-qualifier
        |  +--ro alt-resource*            resource
        |  +--ro related-alarm*
        |  |       [resource alarm-type-id alarm-type-qualifier]
        |  |       {alarm-correlation}?
        |  |  +--ro resource
        |  |  |       -> /alarms/alarm-list/alarm/resource
        |  |  +--ro alarm-type-id           leafref
        |  |  +--ro alarm-type-qualifier    leafref
        |  +--ro impacted-resource*       resource
        |  |       {service-impact-analysis}?
        |  +--ro root-cause-resource*     resource
        |  |       {root-cause-analysis}?
781 | +--ro time-created yang:date-and-time 782 | +--ro is-cleared boolean 783 | +--ro last-raised yang:date-and-time 784 | +--ro last-changed yang:date-and-time 785 | +--ro perceived-severity severity 786 | +--ro alarm-text alarm-text 787 | +--ro status-change* [time] {alarm-history}? 788 | | +--ro time yang:date-and-time 789 | | +--ro perceived-severity severity-with-clear 790 | | +--ro alarm-text alarm-text 791 | +--ro operator-state-change* [time] {operator-actions}? 792 | | +--ro time yang:date-and-time 793 | | +--ro operator string 794 | | +--ro state operator-state 795 | | +--ro text? string 796 | +---x set-operator-state {operator-actions}? 797 | | +---w input 798 | | +---w state writable-operator-state 799 | | +---w text? string 800 | +---n operator-action {operator-actions}? 801 | +-- time yang:date-and-time 802 | +-- operator string 803 | +-- state operator-state 804 | +-- text? string 805 +---x purge-alarms 806 | +---w input 807 | | +---w alarm-clearance-status enumeration 808 | | +---w older-than! 809 | | | +---w (age-spec)? 810 | | | +--:(seconds) 811 | | | | +---w seconds? uint16 812 | | | +--:(minutes) 813 | | | | +---w minutes? uint16 814 | | | +--:(hours) 815 | | | | +---w hours? uint16 816 | | | +--:(days) 817 | | | | +---w days? uint16 818 | | | +--:(weeks) 819 | | | +---w weeks? uint16 820 | | +---w severity! 821 | | | +---w (sev-spec)? 822 | | | +--:(below) 823 | | | | +---w below? severity 824 | | | +--:(is) 825 | | | | +---w is? severity 826 | | | +--:(above) 827 | | | +---w above? severity 828 | | +---w operator-state-filter! {operator-actions}? 829 | | +---w state? operator-state 830 | | +---w user? string 831 | +--ro output 832 | +--ro purged-alarms? uint32 833 +---x compress-alarms {alarm-history}? 834 +---w input 835 | +---w resource? resource-match 836 | +---w alarm-type-id? 837 | | -> /alarms/alarm-list/alarm/alarm-type-id 838 | +---w alarm-type-qualifier? leafref 839 +--ro output 840 +--ro compressed-alarms? 
uint32 842 Every alarm has three important states: the resource clearance state 843 "is-cleared", the severity "perceived-severity" and the operator 844 state available in the operator state change list. 846 To see the alarm history, the resource state changes are 847 available in the "status-change" list and the operator history is 848 available in the "operator-state-change" list. 850 4.5. The Shelved Alarms List 852 The shelved alarm list has the same structure as the alarm list 853 above. It shows all the alarms that match the shelving criteria 854 (/alarms/control/alarm-shelving). 856 4.6. Alarm Profiles 858 The alarm profile list (/alarms/alarm-profile) is a list of configurable 859 alarm types. The list supports configurable alarm severity levels in 860 the container "alarm-severity-assignment-profile". If an alarm 861 matches the configured alarm type, it MUST use the configured severity 862 level(s) instead of the system default. This configuration MUST also 863 be represented in the alarm inventory. 865 +--rw alarm-profile* 866 [alarm-type-id alarm-type-qualifier-match resource] 867 {alarm-profile}? 868 +--rw alarm-type-id alarm-type-id 869 +--rw alarm-type-qualifier-match string 870 +--rw resource resource-match 871 +--rw description string 872 +--rw alarm-severity-assignment-profile 873 {severity-assignment}? 874 +--rw severity-levels* severity 876 4.7. Operations 878 The alarm module supports the following actions to manage the alarms: 880 /alarms/alarm-list/purge-alarms: Delete alarms from the "alarm-list" 881 according to specific criteria, for example all cleared alarms 882 older than a specific date. 884 /alarms/alarm-list/compress-alarms: Compress the "status-change" 885 list for the alarms. 887 /alarms/alarm-list/alarm/set-operator-state: Change the operator 888 state for an alarm. For example, an alarm can be acknowledged by 889 setting the operator state to "ack".
891 /alarms/shelved-alarms/purge-shelved-alarms: Delete alarms from 892 the "shelved-alarms" list according to specific criteria, for 893 example all alarms older than a specific date. 895 /alarms/shelved-alarms/compress-shelved-alarms: Compress the 896 "status-change" list for the alarms. 898 4.8. Notifications 900 The alarm module supports a general notification to report alarm 901 state changes. It carries all relevant parameters for the alarm 902 management application. 904 There is also a notification to report that an operator changed the 905 operator state on an alarm, for example by acknowledging it. 907 If the alarm inventory is changed, for example when a new card type is 908 inserted, a notification will tell the management application that 909 new alarm types are available. 911 5. Relationship to the ietf-hardware YANG module 913 RFC 8348 [RFC8348] defines the "ietf-hardware" YANG data model for 914 the management of hardware. The "alarm-state" in RFC 8348 is a 915 summary of the alarm severity levels that may be active on the 916 specific hardware component. It does not say anything about how 917 alarms are reported, and it does not provide any details of the 918 alarms. 920 The mapping between the alarm YANG data model and the "alarm-state" 921 in RFC 8348 is as follows: 923 resource: Corresponds to an entry in the list "/hardware/component/". 925 is-cleared: No bit set in "/hardware/component/state/alarm-state". 927 perceived-severity: Corresponding bit set in 928 "/hardware/component/state/alarm-state". 930 operator-state-change/state: If the alarm is acknowledged by the 931 operator, the bit "under-repair" is set in "/hardware/component/state/ 932 alarm-state". 934 6. Alarm YANG Module 936 This YANG module references [RFC6991] and [XSD-TYPES].
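As a non-normative illustration of the Section 5 mapping, the following Python sketch derives the "alarm-state" bits for one hardware component from its alarms. The Alarm class, the alarm_state_bits function, and the use of a plain string as the resource are hypothetical names introduced only for this example. Treating a cleared alarm as contributing no bits at all, including "under-repair", is an assumption: the mapping does not say how an acknowledged but cleared alarm is reported.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

# Severity names that exist both in the 'severity' typedef of this module
# and as bits in the RFC 8348 'alarm-state' type.
SEVERITY_BITS = {"indeterminate", "warning", "minor", "major", "critical"}


@dataclass
class Alarm:
    """Hypothetical, simplified view of one /alarms/alarm-list/alarm entry."""
    resource: str                         # identifies the hardware component
    perceived_severity: str               # one of SEVERITY_BITS
    is_cleared: bool
    operator_state: Optional[str] = None  # latest operator-state-change/state


def alarm_state_bits(alarms: Iterable[Alarm], resource: str) -> Set[str]:
    """Return the alarm-state bit names for the component 'resource'."""
    bits: Set[str] = set()
    for alarm in alarms:
        # Assumption: alarms on other resources and cleared alarms set no bits.
        if alarm.resource != resource or alarm.is_cleared:
            continue
        if alarm.perceived_severity not in SEVERITY_BITS:
            raise ValueError("unknown severity: " + alarm.perceived_severity)
        # perceived-severity: the corresponding bit is set.
        bits.add(alarm.perceived_severity)
        # An acknowledged alarm sets the 'under-repair' bit.
        if alarm.operator_state == "ack":
            bits.add("under-repair")
    return bits
```

For a component with one active, acknowledged "major" alarm and one cleared "minor" alarm, the sketch yields the bits "major" and "under-repair"; a component without active alarms yields no bits.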
938 file "ietf-alarms@2019-04-10.yang" 939 module ietf-alarms { 940 yang-version 1.1; 941 namespace "urn:ietf:params:xml:ns:yang:ietf-alarms"; 942 prefix al; 944 import ietf-yang-types { 945 prefix yang; 946 reference 947 "RFC 6991: Common YANG Data Types."; 948 } 950 organization 951 "IETF CCAMP Working Group"; 952 contact 953 "WG Web: 954 WG List: 956 Editor: Stefan Vallin 957 959 Editor: Martin Bjorklund 960 "; 962 // RFC Ed.: replace XXXX with actual RFC number and 963 // remove this note. 965 description 966 "This module defines an interface for managing alarms. Main 967 inputs to the module design are the 3GPP Alarm IRP, ITU-T X.733 968 and ANSI/ISA-18.2 alarm standards. 970 Main features of this module include: 972 * Alarm list: 973 A list of all alarms. Cleared alarms stay in 974 the list until explicitly purged. 976 * Operator actions on alarms: 977 Acknowledging and closing alarms. 979 * Administrative actions on alarms: 980 Purging alarms from the list according to specific 981 criteria. 983 * Alarm inventory: 984 A management application can read all 985 alarm types implemented by the system. 987 * Alarm shelving: 988 Shelving (blocking) alarms according 989 to specific criteria. 991 * Alarm profiles: 992 A management system can attach further 993 information to alarm types, for example 994 overriding system default severity 995 levels. 997 This module uses a stateful view on alarms. An alarm is a state 998 for a specific resource (note that an alarm is not a 999 notification). An alarm type is a possible alarm state for a 1000 resource. For example, the tuple: 1002 ('link-alarm', 'GigabitEthernet0/25') 1004 is an alarm of type 'link-alarm' on the resource 1005 'GigabitEthernet0/25'. 1007 Alarm types are identified using YANG identities and an optional 1008 string-based qualifier. The string-based qualifier allows for 1009 dynamic extension of the statically defined alarm types. 
Alarm 1010 types identify a possible alarm state and not the individual 1011 notifications. For example, the traditional 'link-down' and 1012 'link-up' notifications are two notifications referring to the 1013 same alarm type 'link-alarm'. 1015 With this design, there is no ambiguity about how alarm and alarm 1016 clear correlation should be performed: notifications that report 1017 the same resource and alarm type are considered updates of the 1018 same alarm, e.g., clearing an active alarm or changing the 1019 severity of an alarm. 1021 The instrumentation can update 'severity' and 'alarm-text' on an 1022 existing alarm. The above alarm example can therefore look 1023 like: 1025 (('link-alarm', 'GigabitEthernet0/25'), 1026 warning, 1027 'interface down while interface admin state is up') 1029 There is a clear separation between updates on the alarm from 1030 the underlying resource, like clear, and updates from an 1031 operator like acknowledging or closing an alarm: 1033 (('link-alarm', 'GigabitEthernet0/25'), 1034 warning, 1035 'interface down while interface admin state is up', 1036 cleared, 1037 closed) 1039 Administrative actions like removing closed alarms older than a 1040 given time are supported. 1042 This alarm module does not define how the underlying 1043 instrumentation detects and clears the specific alarms. That 1044 belongs to the SDO or enterprise that owns that specific 1045 technology. 1047 The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL 1048 NOT', 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'NOT RECOMMENDED', 1049 'MAY', and 'OPTIONAL' in this document are to be interpreted as 1050 described in BCP 14 (RFC 2119) (RFC 8174) when, and only when, 1051 they appear in all capitals, as shown here. 1053 Copyright (c) 2019 IETF Trust and the persons identified as 1054 authors of the code. All rights reserved.
1056 Redistribution and use in source and binary forms, with or 1057 without modification, is permitted pursuant to, and subject to 1058 the license terms contained in, the Simplified BSD License set 1059 forth in Section 4.c of the IETF Trust's Legal Provisions 1060 Relating to IETF Documents 1061 (https://trustee.ietf.org/license-info). 1063 This version of this YANG module is part of RFC XXXX 1064 (https://tools.ietf.org/html/rfcXXXX); see the RFC itself for 1065 full legal notices."; 1067 // RFC Ed.: update the date below with the date of RFC publication 1068 // and remove this note. 1070 revision 2019-04-10 { 1071 description 1072 "Initial revision."; 1073 reference 1074 "RFC XXXX: YANG Alarm Module"; 1075 } 1077 /* 1078 * Features 1079 */ 1081 feature operator-actions { 1082 description 1083 "This feature indicates that the system supports operator 1084 states on alarms."; 1085 } 1087 feature alarm-shelving { 1088 description 1089 "This feature indicates that the system supports shelving 1090 (blocking) alarms. 1092 Alarm shelving may have an impact on server processing 1093 resources in order to match alarms against shelf 1094 criteria."; 1095 } 1097 feature alarm-history { 1098 description 1099 "This feature indicates that the server maintains a history of 1100 state changes for each alarm. For example, if an alarm 1101 toggles between cleared and active 10 times, these state 1102 changes are present in a separate list in the alarm.
1104 Keeping the alarm history may have an impact on server memory 1105 resources."; 1106 } 1108 feature alarm-summary { 1109 description 1110 "This feature indicates that the server summarizes the number 1111 of alarms per severity and operator state."; 1112 } 1114 feature alarm-profile { 1115 description 1116 "The system supports clients to configure further information 1117 to each alarm type."; 1119 } 1121 feature severity-assignment { 1122 description 1123 "The system supports configurable alarm severity levels."; 1124 reference 1125 "M.3160/M.3100 Alarm Severity Assignment Profile, ASAP"; 1126 } 1128 feature root-cause-analysis { 1129 description 1130 "The system supports identifying candidate root-cause 1131 resources for an alarm, for example a disc partition 1132 root cause for a logger failure alarm."; 1133 } 1135 feature service-impact-analysis { 1136 description 1137 "The system supports identifying candidate impacted 1138 resources for an alarm. For example, an interface state change 1139 resulting in a link alarm which can refer to a link as being 1140 impacted."; 1141 } 1143 feature alarm-correlation { 1144 description 1145 "The system supports correlating/grouping alarms 1146 that belong together."; 1147 } 1149 /* 1150 * Identities 1151 */ 1153 identity alarm-type-id { 1154 description 1155 "Base identity for alarm types. A unique identification of the 1156 alarm, not including the resource. Different resources can 1157 share alarm types. If the resource reports the same alarm 1158 type, it is to be considered to be the same alarm. The alarm 1159 type is a simplification of the different X.733 and 3GPP alarm 1160 IRP alarm correlation mechanisms and it allows for 1161 hierarchical extensions. 1163 A string-based qualifier can be used in addition to the 1164 identity in order to have different alarm types based on 1165 information not known at design-time, such as values in 1166 textual SNMP Notification var-binds. 
1168 Standards and vendors can define sub-identities to clearly 1169 identify specific alarm types. 1171 This identity is abstract and MUST NOT be used for alarms."; 1172 } 1174 /* 1175 * Common types 1176 */ 1178 typedef resource { 1179 type union { 1180 type instance-identifier { 1181 require-instance false; 1182 } 1183 type yang:object-identifier; 1184 type string; 1185 type yang:uuid; 1186 } 1187 description 1188 "This is an identification of the alarming resource, such as an 1189 interface. It should be as fine-grained as possible both to 1190 guide the operator and to guarantee uniqueness of the alarms. 1192 If the alarming resource is modeled in YANG, this type will 1193 be an instance-identifier. 1195 If the resource is an SNMP object, the type will be an 1196 object-identifier. 1198 If the resource is anything else, for example a distinguished 1199 name or a CIM path, this type will be a string. 1201 If the alarming object is identified by a UUID use the uuid 1202 type. Be cautious when using this type, since a UUID is hard 1203 to use for an operator. 1205 If the server supports several models, the precedence should 1206 be in the order as given in the union definition."; 1207 } 1209 typedef resource-match { 1210 type union { 1211 type yang:xpath1.0; 1212 type yang:object-identifier; 1213 type string; 1214 } 1215 description 1216 "This type is used to match resources of type 'resource'. 1217 Since the type 'resource' is a union of different types, the 1218 'resource-match' type is also a union of corresponding types. 1220 If the type is given as an XPath 1.0 expression, a resource of 1221 type 'instance-identifier' matches if the instance is part of 1222 the node set that is the result of evaluating the XPath 1.0 1223 expression. 
For example, the XPath 1.0 expression: 1225 /ietf-interfaces:interfaces/ietf-interfaces:interface 1226 [ietf-interfaces:type='ianaift:ethernetCsmacd'] 1228 would match the resource instance-identifier: 1230 /if:interfaces/if:interface[if:name='eth1'], 1232 assuming that the interface 'eth1' is of type 1233 'ianaift:ethernetCsmacd'. 1235 If the type is given as an object identifier, a resource of 1236 type 'object-identifier' matches if the match object 1237 identifier is a prefix of the resource's object identifier. 1238 For example, the value: 1240 1.3.6.1.2.1.2.2 1242 would match the resource object identifier: 1244 1.3.6.1.2.1.2.2.1.1.5 1246 If the type is given as a UUID or a string, it is interpreted 1247 as an XML Schema regular expression, which matches a resource 1248 of type 'yang:uuid' or 'string' if the given regular 1249 expression matches the resource string. 1251 If the type is given as an XPath expression, it is evaluated 1252 in the following XPath context: 1254 o The set of namespace declarations is the set of prefix 1255 and namespace pairs for all YANG modules implemented by 1256 the server, where the prefix is the YANG module name and 1257 the namespace is as defined by the 'namespace' statement 1258 in the YANG module. 1260 If a leaf of this type is encoded in XML, all namespace 1261 declarations in scope on the leaf element are added to 1262 the set of namespace declarations. If a prefix found in 1263 the XML is already present in the set of namespace 1264 declarations, the namespace in the XML is used. 1266 o The set of variable bindings is empty. 1268 o The function library is the core function library 1269 and the functions defined in Section 10 of RFC 7950.
1271 o The context node is the root node in the data tree."; 1272 reference 1273 "XML Schema Part 2: Datatypes Second Edition, 1274 World Wide Web Consortium Recommendation 1275 REC-xmlschema-2-20041028"; 1276 } 1278 typedef alarm-text { 1279 type string; 1280 description 1281 "The string used to inform operators about the alarm. This 1282 MUST contain enough information for an operator to be able to 1283 understand the problem and how to resolve it. If this string 1284 contains structure, this format should be clearly documented 1285 for programs to be able to parse that information."; 1286 } 1288 typedef severity { 1289 type enumeration { 1290 enum indeterminate { 1291 value 2; 1292 description 1293 "Indicates that the severity level could not be 1294 determined. This level SHOULD be avoided."; 1295 } 1296 enum warning { 1297 value 3; 1298 description 1299 "The 'warning' severity level indicates the detection of a 1300 potential or impending service affecting fault, before any 1301 significant effects have been felt. Action should be 1302 taken to further diagnose (if necessary) and correct the 1303 problem in order to prevent it from becoming a more 1304 serious service affecting fault."; 1305 } 1306 enum minor { 1307 value 4; 1308 description 1309 "The 'minor' severity level indicates the existence of a 1310 non-service affecting fault condition and that corrective 1311 action should be taken in order to prevent a more serious 1312 (for example, service affecting) fault. Such a severity 1313 can be reported, for example, when the detected alarm 1314 condition is not currently degrading the capacity of the 1315 resource."; 1316 } 1317 enum major { 1318 value 5; 1319 description 1320 "The 'major' severity level indicates that a service 1321 affecting condition has developed and an urgent corrective 1322 action is required. 
Such a severity can be reported, for 1323 example, when there is a severe degradation in the 1324 capability of the resource and its full capability must be 1325 restored."; 1326 } 1327 enum critical { 1328 value 6; 1329 description 1330 "The 'critical' severity level indicates that a service 1331 affecting condition has occurred and an immediate 1332 corrective action is required. Such a severity can be 1333 reported, for example, when a resource becomes totally out 1334 of service and its capability must be restored."; 1335 } 1336 } 1337 description 1338 "The severity level of the alarm. Note well that value 'clear' 1339 is not included. If an alarm is cleared or not is a separate 1340 boolean flag."; 1341 reference 1342 "ITU Recommendation X.733: Information Technology 1343 - Open Systems Interconnection 1344 - System Management: Alarm Reporting Function"; 1345 } 1347 typedef severity-with-clear { 1348 type union { 1349 type enumeration { 1350 enum cleared { 1351 value 1; 1352 description 1353 "The alarm is cleared by the instrumentation."; 1354 } 1355 } 1356 type severity; 1357 } 1358 description 1359 "The severity level of the alarm including clear. This is used 1360 only in notifications reporting state changes for an alarm."; 1361 } 1363 typedef writable-operator-state { 1364 type enumeration { 1365 enum none { 1366 value 1; 1367 description 1368 "The alarm is not being taken care of."; 1369 } 1370 enum ack { 1371 value 2; 1372 description 1373 "The alarm is being taken care of. Corrective action not 1374 taken yet, or failed"; 1375 } 1376 enum closed { 1377 value 3; 1378 description 1379 "Corrective action taken successfully."; 1380 } 1381 } 1382 description 1383 "Operator states on an alarm. The 'closed' state indicates 1384 that an operator considers the alarm being resolved. 
This is 1385 separate from the alarm's 'is-cleared' leaf."; 1386 } 1388 typedef operator-state { 1389 type union { 1390 type writable-operator-state; 1391 type enumeration { 1392 enum shelved { 1393 value 4; 1394 description 1395 "The alarm is shelved. Alarms in /alarms/shelved-alarms/ 1396 MUST be assigned this operator state by the server as 1397 the last entry in the operator-state-change list. The 1398 text for that entry SHOULD include the shelf name."; 1399 } 1400 enum un-shelved { 1401 value 5; 1402 description 1403 "The alarm is moved back to 'alarm-list' from a shelf. 1404 Alarms that are moved from /alarms/shelved-alarms/ to 1405 /alarms/alarm-list MUST be assigned this state by the 1406 server as the last entry in the 'operator-state-change' 1407 list. The text for that entry SHOULD include the shelf 1408 name."; 1409 } 1410 } 1411 } 1412 description 1413 "Operator states on an alarm. The 'closed' state indicates 1414 that an operator considers the alarm being resolved. This is 1415 separate from the alarm's 'is-cleared' leaf."; 1416 } 1418 /* Alarm type */ 1420 typedef alarm-type-id { 1421 type identityref { 1422 base alarm-type-id; 1423 } 1424 description 1425 "Identifies an alarm type. The description of the alarm type 1426 id MUST indicate if the alarm type is abstract or not. An 1427 abstract alarm type is used as a base for other alarm type ids 1428 and will not be used as a value for an alarm or be present in 1429 the alarm inventory."; 1430 } 1432 typedef alarm-type-qualifier { 1433 type string; 1434 description 1435 "If an alarm type can not be fully specified at design time by 1436 alarm-type-id, this string qualifier is used in addition to 1437 fully define a unique alarm type. 1439 The definition of alarm qualifiers is considered being part of 1440 the instrumentation and out of scope for this module. 
An 1441 empty string is used when this is part of a key."; 1442 } 1444 /* 1445 * Groupings 1446 */ 1448 grouping common-alarm-parameters { 1449 description 1450 "Common parameters for an alarm. 1452 This grouping is used both in the alarm list and in the 1453 notification representing an alarm state change."; 1454 leaf resource { 1455 type resource; 1456 mandatory true; 1457 description 1458 "The alarming resource. See also 'alt-resource'. This could 1459 for example be a reference to the alarming interface."; 1460 } 1461 leaf alarm-type-id { 1462 type alarm-type-id; 1463 mandatory true; 1464 description 1465 "This leaf and the leaf 'alarm-type-qualifier' together 1466 provide a unique identification of the alarm type."; 1467 } 1468 leaf alarm-type-qualifier { 1469 type alarm-type-qualifier; 1470 description 1471 "This leaf is used when the 'alarm-type-id' leaf cannot 1472 uniquely identify the alarm type. Normally, this is not the 1473 case, and this leaf is the empty string."; 1474 } 1475 leaf-list alt-resource { 1476 type resource; 1477 description 1478 "Used if the alarming resource is available over other 1479 interfaces. This field can contain SNMP OIDs, CIM paths or 1480 3GPP distinguished names, for example."; 1481 } 1482 list related-alarm { 1483 if-feature "alarm-correlation"; 1484 key "resource alarm-type-id alarm-type-qualifier"; 1485 description 1486 "References to related alarms.
Note that the related alarm 1487 might have been purged from the alarm list."; 1488 leaf resource { 1489 type leafref { 1490 path "/alarms/alarm-list/alarm/resource"; 1491 require-instance false; 1492 } 1493 description 1494 "The alarming resource for the related alarm."; 1495 } 1496 leaf alarm-type-id { 1497 type leafref { 1498 path "/alarms/alarm-list/alarm" 1499 + "[resource=current()/../resource]" 1500 + "/alarm-type-id"; 1501 require-instance false; 1502 } 1503 description 1504 "The alarm type identifier for the related alarm."; 1505 } 1506 leaf alarm-type-qualifier { 1507 type leafref { 1508 path "/alarms/alarm-list/alarm" 1509 + "[resource=current()/../resource]" 1510 + "[alarm-type-id=current()/../alarm-type-id]" 1511 + "/alarm-type-qualifier"; 1512 require-instance false; 1513 } 1514 description 1515 "The alarm qualifier for the related alarm."; 1516 } 1517 } 1518 leaf-list impacted-resource { 1519 if-feature "service-impact-analysis"; 1520 type resource; 1521 description 1522 "Resources that might be affected by this alarm. If the 1523 system creates an alarm on a resource and also has a mapping 1524 to other resources that might be impacted, these resources 1525 can be listed in this leaf-list. In this way the system can 1526 create one alarm instead of several. For example, if an 1527 interface has an alarm, the 'impacted-resource' can 1528 reference the aggregated port channels."; 1529 } 1530 leaf-list root-cause-resource { 1531 if-feature "root-cause-analysis"; 1532 type resource; 1533 description 1534 "Resources that are candidates for causing the alarm. If the 1535 system has a mechanism to understand the candidate root 1536 causes of an alarm, this leaf-list can be used to list the 1537 root cause candidate resources. In this way the system can 1538 create one alarm instead of several. An example might be a 1539 logging system (alarm resource) that fails, the alarm can 1540 reference the file-system in the 'root-cause-resource' 1541 leaf-list. 
Note that the intended use is not to also send 1542 an alarm with the root-cause-resource as the alarming 1543 resource. The root-cause-resource leaf list is a hint and 1544 should not also generate an alarm for the same problem."; 1545 } 1546 } 1548 grouping alarm-state-change-parameters { 1549 description 1550 "Parameters for an alarm state change. 1552 This grouping is used both in the alarm list's status-change 1553 list and in the notification representing an alarm state 1554 change."; 1555 leaf time { 1556 type yang:date-and-time; 1557 mandatory true; 1558 description 1559 "The time the status of the alarm changed. The value 1560 represents the time the real alarm state change appeared in 1561 the resource and not when it was added to the alarm 1562 list. The /alarm-list/alarm/last-changed MUST be set to the 1563 same value."; 1564 } 1565 leaf perceived-severity { 1566 type severity-with-clear; 1567 mandatory true; 1568 description 1569 "The severity of the alarm as defined by X.733.
Note that 1570 this may not be the original severity since the alarm may 1571 have changed severity."; 1572 reference 1573 "ITU Recommendation X.733: Information Technology 1574 - Open Systems Interconnection 1575 - System Management: Alarm Reporting Function"; 1576 } 1577 leaf alarm-text { 1578 type alarm-text; 1579 mandatory true; 1580 description 1581 "A user-friendly text describing the alarm state change."; 1582 reference 1583 "ITU Recommendation X.733: Information Technology 1584 - Open Systems Interconnection 1585 - System Management: Alarm Reporting Function"; 1586 } 1587 } 1589 grouping operator-parameters { 1590 description 1591 "This grouping defines parameters that can be changed by an 1592 operator."; 1593 leaf time { 1594 type yang:date-and-time; 1595 mandatory true; 1596 description 1597 "Timestamp for the operator action on the alarm."; 1598 } 1599 leaf operator { 1600 type string; 1601 mandatory true; 1602 description 1603 "The name of the operator that has acted on this alarm."; 1604 } 1605 leaf state { 1606 type operator-state; 1607 mandatory true; 1608 description 1609 "The operator's view of the alarm state."; 1610 } 1611 leaf text { 1612 type string; 1613 description 1614 "Additional optional textual information provided by the 1615 operator."; 1616 } 1617 } 1619 grouping resource-alarm-parameters { 1620 description 1621 "Alarm parameters that originate from the resource view."; 1622 leaf is-cleared { 1623 type boolean; 1624 mandatory true; 1625 description 1626 "Indicates the current clearance state of the alarm. An 1627 alarm might toggle from active alarm to cleared alarm and 1628 back to active again."; 1629 } 1630 leaf last-raised { 1631 type yang:date-and-time; 1632 mandatory true; 1633 description 1634 "An alarm may change severity level and toggle between 1635 active and cleared during its life-time.
This leaf indicates 1636 the last time it was raised (is-cleared = false)."; 1637 } 1638 leaf last-changed { 1639 type yang:date-and-time; 1640 mandatory true; 1641 description 1642 "A timestamp when the 'status-change' or 1643 'operator-state-change' list was last changed."; 1644 } 1645 leaf perceived-severity { 1646 type severity; 1647 mandatory true; 1648 description 1649 "The last severity of the alarm. 1651 If an alarm was raised with severity 'warning', but later 1652 changed to 'major', this leaf will show 'major'."; 1653 } 1654 leaf alarm-text { 1655 type alarm-text; 1656 mandatory true; 1657 description 1658 "The last reported alarm text. This text should contain 1659 information for an operator to be able to understand the 1660 problem and how to resolve it."; 1661 } 1662 list status-change { 1663 if-feature "alarm-history"; 1664 key "time"; 1665 min-elements 1; 1666 description 1667 "A list of status change events for this alarm. 1669 The entry with the latest timestamp in this list MUST 1670 correspond to the leafs 'is-cleared', 'perceived-severity' 1671 and 'alarm-text' for the alarm. 1673 This list is ordered according to the timestamps of alarm 1674 state changes. The first item corresponds to the latest 1675 state change.
1677 The following state changes create an entry in this 1678 list: 1679 - changed severity (warning, minor, major, critical) 1680 - clearance status; this also updates the 'is-cleared' 1681 leaf 1682 - alarm text update"; 1683 uses alarm-state-change-parameters; 1684 } 1685 } 1687 grouping filter-input { 1688 description 1689 "Grouping to specify a filter construct on alarm information."; 1690 leaf alarm-clearance-status { 1691 type enumeration { 1692 enum any { 1693 description 1694 "Ignore alarm clearance status."; 1695 } 1696 enum cleared { 1697 description 1698 "Filter cleared alarms."; 1699 } 1700 enum not-cleared { 1701 description 1702 "Filter not cleared alarms."; 1703 } 1704 } 1705 mandatory true; 1706 description 1707 "The clearance status of the alarm."; 1708 } 1709 container older-than { 1710 presence "Age specification"; 1711 description 1712 "Matches the 'last-status-change' leaf in the alarm."; 1713 choice age-spec { 1714 description 1715 "Filter using date and time age."; 1716 case seconds { 1717 leaf seconds { 1718 type uint16; 1719 description 1720 "Age expressed in seconds."; 1721 } 1722 } 1723 case minutes { 1724 leaf minutes { 1725 type uint16; 1726 description 1727 "Age expressed in minutes."; 1728 } 1729 } 1730 case hours { 1731 leaf hours { 1732 type uint16; 1733 description 1734 "Age expressed in hours."; 1735 } 1736 } 1737 case days { 1738 leaf days { 1739 type uint16; 1740 description 1741 "Age expressed in days."; 1742 } 1743 } 1744 case weeks { 1745 leaf weeks { 1746 type uint16; 1747 description 1748 "Age expressed in weeks."; 1749 } 1750 } 1751 } 1752 } 1753 container severity { 1754 presence "Severity filter"; 1755 choice sev-spec { 1756 description 1757 "Filter based on severity level."; 1758 leaf below { 1759 type severity; 1760 description 1761 "Severity less than this leaf."; 1762 } 1763 leaf is { 1764 type severity; 1765 description 1766 "Severity level equal to this leaf."; 1767 } 1768 leaf above { 1769 type severity; 1770
description 1771 "Severity level higher than this leaf."; 1772 } 1773 } 1774 description 1775 "Filter based on severity."; 1776 } 1777 container operator-state-filter { 1778 if-feature "operator-actions"; 1779 presence "Operator state filter"; 1780 leaf state { 1781 type operator-state; 1782 description 1783 "Filter on operator state."; 1784 } 1785 leaf user { 1786 type string; 1787 description 1788 "Filter based on which operator."; 1789 } 1790 description 1791 "Filter based on operator state."; 1793 } 1794 } 1796 /* 1797 * The /alarms data tree 1798 */ 1800 container alarms { 1801 description 1802 "The top container for this module."; 1803 container control { 1804 description 1805 "Configuration to control the alarm behavior."; 1806 leaf max-alarm-status-changes { 1807 type union { 1808 type uint16; 1809 type enumeration { 1810 enum infinite { 1811 description 1812 "The status change entries are accumulated 1813 infinitely."; 1814 } 1815 } 1816 } 1817 default "32"; 1818 description 1819 "The status-change entries are kept in a circular list per 1820 alarm. When this number is exceeded, the oldest status 1821 change entry is automatically removed. If the value is 1822 'infinite', the status change entries are accumulated 1823 infinitely."; 1824 } 1825 leaf notify-status-changes { 1826 type enumeration { 1827 enum all-state-changes { 1828 description 1829 "Send notifications for all status changes."; 1830 } 1831 enum raise-and-clear { 1832 description 1833 "Send notifications only for raise, clear, and 1834 re-raise. Notifications for severity level changes or 1835 alarm text changes are not sent."; 1836 } 1837 enum severity-level { 1838 description 1839 "Only send notifications for alarm state changes 1840 crossing the level specified in 1841 'notify-severity-level'. Always send clear 1842 notifications."; 1843 } 1844 } 1845 must '. 
!= "severity-level" or ../notify-severity-level' { 1846 description 1847 "When notify-status-changes is 'severity-level', a value 1848 must be given for notify-severity-level."; 1849 } 1850 default "all-state-changes"; 1851 description 1852 "This leaf controls the notifications sent for alarm status 1853 updates. There are three options: 1855 1. Notifications are sent for all updates, severity level 1856 changes and alarm text changes 1858 2. Notifications are only sent for alarm raise and clear 1860 3. Notifications are sent for status changes equal to or 1861 above the specified severity level. Clear 1862 notifications shall always be sent. Notifications shall 1863 also be sent for state changes that make an alarm less 1864 severe than the specified level. 1866 For example, in option 3, assuming the severity level is 1867 set to major and that the alarm has the following state 1868 changes: 1870 [(Time, severity, clear)]: 1871 [(T1, major, -), (T2, minor, -), (T3, warning, -), 1872 (T4, minor, -), (T5, major, -), (T6, critical, -), 1873 (T7, major, -), (T8, major, clear)] 1875 In that case, notifications will be sent at times 1876 T1, T2, T5, T6, T7 and T8."; 1877 } 1878 leaf notify-severity-level { 1879 when '../notify-status-changes = "severity-level"'; 1880 type severity; 1881 description 1882 "Only send notifications for alarm state changes crossing 1883 the specified level. Always send clear notifications."; 1884 } 1885 container alarm-shelving { 1886 if-feature "alarm-shelving"; 1887 description 1888 "The alarm-shelving/shelf list is used to shelve 1889 (block/filter) alarms. The conditions in the shelf 1890 criteria are logically ANDed. The first matching shelf is 1891 used, and an alarm is shelved only for this first match. 1892 Matching alarms MUST appear in the 1893 /alarms/shelved-alarms/shelved-alarm list, and 1894 non-matching alarms MUST appear in the 1895 /alarms/alarm-list/alarm list. 
The server does not send 1896 any notifications for shelved alarms. 1898 The server MUST maintain states (e.g., severity 1899 changes) for the shelved alarms. 1901 Alarms that match the criteria shall have an 1902 operator state 'shelved'. When the shelf 1903 configuration removes an alarm from the shelf, the server 1904 shall add an operator state 'un-shelved'."; 1905 list shelf { 1906 key "name"; 1907 ordered-by user; 1908 leaf name { 1909 type string; 1910 description 1911 "An arbitrary name for the alarm shelf."; 1912 } 1913 description 1914 "Each entry defines the criteria for shelving alarms. 1915 Criteria are ANDed. If no criteria are specified, 1916 all alarms will be shelved."; 1917 leaf-list resource { 1918 type resource-match; 1919 description 1920 "Shelve alarms for matching resources."; 1921 } 1922 list alarm-type { 1923 key "alarm-type-id alarm-type-qualifier-match"; 1924 description 1925 "Any alarm matching the combined criteria of 1926 alarm-type-id and alarm-type-qualifier-match 1927 MUST be matched."; 1928 leaf alarm-type-id { 1929 type alarm-type-id; 1930 description 1931 "Shelve all alarms that have an alarm-type-id that is 1932 equal to or derived from the given alarm-type-id."; 1933 } 1934 leaf alarm-type-qualifier-match { 1935 type string; 1936 description 1937 "An XML Schema regular expression that is used to 1938 match an alarm type qualifier. Shelve all alarms 1939 that match this regular expression for the alarm 1940 type qualifier."; 1941 reference 1942 "XML Schema Part 2: Datatypes Second Edition, 1943 World Wide Web Consortium Recommendation 1944 REC-xmlschema-2-20041028"; 1945 } 1946 } 1947 leaf description { 1948 type string; 1949 description 1950 "An optional textual description of the shelf. 
This 1951 description should include the reason for shelving 1952 these alarms."; 1953 } 1954 } 1955 } 1956 } 1957 container alarm-inventory { 1958 config false; 1959 description 1960 "This alarm-inventory/alarm-type list contains all possible 1961 alarm types for the system. 1963 If the system knows for which resources a specific alarm 1964 type can appear, this is also identified in the inventory. 1965 The list also tells if each alarm type has a corresponding 1966 clear state. The inventory shall only contain concrete 1967 alarm types. 1969 The alarm inventory MUST be updated by the system when new 1970 alarms can appear. This can be the case when installing new 1971 software modules or inserting new card types. A 1972 notification 'alarm-inventory-changed' is sent when the 1973 inventory is changed."; 1974 list alarm-type { 1975 key "alarm-type-id alarm-type-qualifier"; 1976 description 1977 "An entry in this list defines a possible alarm."; 1978 leaf alarm-type-id { 1979 type alarm-type-id; 1980 description 1981 "The statically defined alarm type identifier for this 1982 possible alarm."; 1983 } 1984 leaf alarm-type-qualifier { 1985 type alarm-type-qualifier; 1986 description 1987 "The optionally dynamically defined alarm type identifier 1988 for this possible alarm."; 1989 } 1990 leaf-list resource { 1991 type resource-match; 1992 description 1993 "Optionally, specifies for which resources the alarm type 1994 is valid."; 1995 } 1996 leaf will-clear { 1997 type boolean; 1998 mandatory true; 1999 description 2000 "This leaf tells the operator if the alarm will be 2001 cleared when the correct corrective action has been 2002 taken. Implementations SHOULD strive for detecting the 2003 cleared state for all alarm types. 2005 If this leaf is 'true', the operator can monitor the 2006 alarm until it becomes cleared after the corrective 2007 action has been taken. 
2009 If this leaf is 'false', the operator needs to validate 2010 that the alarm is no longer active using other 2011 mechanisms. Alarms can lack a corresponding clear due 2012 to missing instrumentation or that there is no logical 2013 corresponding clear state."; 2014 } 2015 leaf-list severity-levels { 2016 type severity; 2017 description 2018 "This leaf-list indicates the possible severity levels of 2019 this alarm type. Note well that 'clear' is not part of 2020 the severity type. In general, the severity level 2021 should be defined by the instrumentation based on 2022 dynamic state and not defined statically by the alarm 2023 type in order to provide relevant severity level based 2024 on dynamic state and context. However most alarm types 2025 have a defined set of possible severity levels and this 2026 should be provided here."; 2027 } 2028 leaf description { 2029 type string; 2030 mandatory true; 2031 description 2032 "A description of the possible alarm. It SHOULD include 2033 information on possible underlying root causes and 2034 corrective actions."; 2035 } 2036 } 2037 } 2038 container summary { 2039 if-feature "alarm-summary"; 2040 config false; 2041 description 2042 "This container gives a summary of number of alarms."; 2043 list alarm-summary { 2044 key "severity"; 2045 description 2046 "A global summary of all alarms in the system. 
The summary 2047 does not include shelved alarms."; 2048 leaf severity { 2049 type severity; 2050 description 2051 "Alarm summary for this severity level."; 2052 } 2053 leaf total { 2054 type yang:gauge32; 2055 description 2056 "Total number of alarms of this severity level."; 2057 } 2058 leaf not-cleared { 2059 type yang:gauge32; 2060 description 2061 "Total number of alarms of this severity level 2062 that are not cleared."; 2063 } 2064 leaf cleared { 2065 type yang:gauge32; 2066 description 2067 "For this severity level, the number of alarms that are 2068 cleared."; 2069 } 2070 leaf cleared-not-closed { 2071 if-feature "operator-actions"; 2072 type yang:gauge32; 2073 description 2074 "For this severity level, the number of alarms that are 2075 cleared but not closed."; 2076 } 2077 leaf cleared-closed { 2078 if-feature "operator-actions"; 2079 type yang:gauge32; 2080 description 2081 "For this severity level, the number of alarms that are 2082 cleared and closed."; 2083 } 2084 leaf not-cleared-closed { 2085 if-feature "operator-actions"; 2086 type yang:gauge32; 2087 description 2088 "For this severity level, the number of alarms that are 2089 not cleared but closed."; 2090 } 2091 leaf not-cleared-not-closed { 2092 if-feature "operator-actions"; 2093 type yang:gauge32; 2094 description 2095 "For this severity level, the number of alarms that are 2096 not cleared and not closed."; 2097 } 2098 } 2099 leaf shelves-active { 2100 if-feature "alarm-shelving"; 2101 type empty; 2102 description 2103 "This is a hint to the operator that there are active 2104 alarm shelves. 
This leaf MUST exist if the 2105 /alarms/shelved-alarms/number-of-shelved-alarms is > 0."; 2106 } 2107 } 2108 container alarm-list { 2109 config false; 2110 description 2111 "The alarms in the system."; 2112 leaf number-of-alarms { 2113 type yang:gauge32; 2114 description 2115 "This object shows the total number of 2116 alarms in the system, i.e., the total number 2117 of entries in the alarm list."; 2118 } 2119 leaf last-changed { 2120 type yang:date-and-time; 2121 description 2122 "A timestamp when the alarm list was last 2123 changed. The value can be used by a manager to 2124 initiate an alarm resynchronization procedure."; 2125 } 2126 list alarm { 2127 key "resource alarm-type-id alarm-type-qualifier"; 2128 description 2129 "The list of alarms. Each entry in the list holds one 2130 alarm for a given alarm type and resource. An alarm can 2131 be updated from the underlying resource or by the user. 2132 The following leafs are maintained by the resource: 2133 is-cleared, last-changed, perceived-severity, and 2134 alarm-text. An operator can change: operator-state and 2135 operator-text. 2137 Entries appear in the alarm list the first time an alarm 2138 becomes active for a given alarm-type and resource. 2139 Entries do not get deleted when the alarm is cleared. 2140 Clear status is represented as a boolean flag. 2142 Alarm entries are removed, purged, from the list by an 2143 explicit purge action. For example, purge all alarms that 2144 are cleared and in closed operator-state that are older 2145 than 24 hours. Purged alarms are removed from the alarm 2146 list. If the alarm resource state changes after a purge, 2147 the alarm will reappear in the alarm list. 2149 Systems may also remove alarms based on locally configured 2150 policies, which is out of scope for this module."; 2151 uses common-alarm-parameters; 2152 leaf time-created { 2153 type yang:date-and-time; 2154 mandatory true; 2155 description 2156 "The time-stamp when this alarm entry was created. 
This 2157 represents the first time the alarm appeared; it can 2158 also represent that the alarm re-appeared after a purge. 2159 Further state-changes of the same alarm do not change 2160 this leaf; these changes will update the 'last-changed' 2161 leaf."; 2162 } 2163 uses resource-alarm-parameters; 2164 list operator-state-change { 2165 if-feature "operator-actions"; 2166 key "time"; 2167 description 2168 "This list is used by operators to indicate the state of 2169 human intervention on an alarm. For example, if an 2170 operator has seen an alarm, the operator can add a new 2171 item to this list indicating that the alarm is 2172 acknowledged."; 2173 uses operator-parameters; 2174 } 2175 action set-operator-state { 2176 if-feature "operator-actions"; 2177 description 2178 "This is a means for the operator to indicate the level 2179 of human intervention on an alarm."; 2180 input { 2181 leaf state { 2182 type writable-operator-state; 2183 mandatory true; 2184 description 2185 "Set this operator state."; 2186 } 2187 leaf text { 2188 type string; 2189 description 2190 "Additional optional textual information."; 2191 } 2192 } 2193 } 2194 notification operator-action { 2195 if-feature "operator-actions"; 2196 description 2197 "This notification is used to report that an operator 2198 acted upon an alarm."; 2199 uses operator-parameters; 2200 } 2201 } 2202 action purge-alarms { 2203 description 2204 "This operation requests the server to delete entries from 2205 the alarm list according to the supplied criteria. 2207 Typically this operation is used to delete alarms that are 2208 in closed operator state and older than a specified time. 
2210 The number of purged alarms is returned as an output 2211 parameter."; 2212 input { 2213 uses filter-input; 2214 } 2215 output { 2216 leaf purged-alarms { 2217 type uint32; 2218 description 2219 "Number of purged alarms."; 2220 } 2221 } 2222 } 2223 action compress-alarms { 2224 if-feature "alarm-history"; 2225 description 2226 "This operation requests the server to compress entries in 2227 the alarm list by removing all but the latest 2228 'status-change' entry for all matching alarms. Conditions 2229 in the input are logically ANDed. If no input condition 2230 is given, all alarms are compressed."; 2231 input { 2232 leaf resource { 2233 type resource-match; 2234 description 2235 "Compress the alarms matching this resource."; 2236 } 2237 leaf alarm-type-id { 2238 type leafref { 2239 path "/alarms/alarm-list/alarm/alarm-type-id"; 2240 require-instance false; 2241 } 2242 description 2243 "Compress alarms with this alarm-type-id."; 2244 } 2245 leaf alarm-type-qualifier { 2246 type leafref { 2247 path "/alarms/alarm-list/alarm/alarm-type-qualifier"; 2248 require-instance false; 2249 } 2250 description 2251 "Compress the alarms with this alarm-type-qualifier."; 2252 } 2253 } 2254 output { 2255 leaf compressed-alarms { 2256 type uint32; 2257 description 2258 "Number of compressed alarm entries."; 2259 } 2260 } 2261 } 2262 } 2263 container shelved-alarms { 2264 if-feature "alarm-shelving"; 2265 config false; 2266 description 2267 "The shelved alarms. Alarms appear here if they match the 2268 criteria in /alarms/control/alarm-shelving. This list does 2269 not generate any notifications. The list represents alarms 2270 that are considered not relevant by the operator. Alarms in 2271 this list have an operator-state of 'shelved'. 
This cannot 2272 be changed."; 2274 leaf number-of-shelved-alarms { 2275 type yang:gauge32; 2276 description 2277 "This object shows the total number of currently shelved 2278 alarms, i.e., the total number of entries 2279 in the shelved alarm list."; 2280 } 2281 leaf shelved-alarms-last-changed { 2282 type yang:date-and-time; 2283 description 2284 "A timestamp when the shelved alarm list was last changed. 2285 The value can be used by a manager to initiate an alarm 2286 resynchronization procedure."; 2287 } 2288 list shelved-alarm { 2289 key "resource alarm-type-id alarm-type-qualifier"; 2290 description 2291 "The list of shelved alarms. Shelved alarms can only be 2292 updated from the underlying resource; no operator actions 2293 are supported."; 2294 uses common-alarm-parameters; 2295 leaf shelf-name { 2296 type leafref { 2297 path "/alarms/control/alarm-shelving/shelf/name"; 2298 require-instance false; 2299 } 2300 description 2301 "The name of the shelf."; 2302 } 2303 uses resource-alarm-parameters; 2304 list operator-state-change { 2305 if-feature "operator-actions"; 2306 key "time"; 2307 description 2308 "This list is used by operators to indicate the state of 2309 human intervention on an alarm. For shelved alarms, the 2310 system has set the list item in the list to 'shelved'."; 2311 uses operator-parameters; 2312 } 2313 } 2314 action purge-shelved-alarms { 2315 description 2316 "This operation requests the server to delete entries from 2317 the shelved alarms list according to the supplied 2318 criteria. 2320 In the shelved alarm list it makes sense to delete alarms 2321 that are not relevant anymore. 
2323 The number of purged alarms is returned as an output 2324 parameter."; 2325 input { 2326 uses filter-input; 2327 } 2328 output { 2329 leaf purged-alarms { 2330 type uint32; 2331 description 2332 "Number of purged alarms."; 2333 } 2334 } 2335 } 2336 action compress-shelved-alarms { 2337 if-feature "alarm-history"; 2338 description 2339 "This operation requests the server to compress entries in 2340 the shelved alarm list by removing all but the latest 2341 'status-change' entry for all matching shelved alarms. 2342 Conditions in the input are logically ANDed. If no input 2343 condition is given, all alarms are compressed."; 2344 input { 2345 leaf resource { 2346 type leafref { 2347 path "/alarms/shelved-alarms/shelved-alarm/resource"; 2348 require-instance false; 2349 } 2350 description 2351 "Compress the alarms with this resource."; 2352 } 2353 leaf alarm-type-id { 2354 type leafref { 2355 path "/alarms/shelved-alarms/shelved-alarm" 2356 + "/alarm-type-id"; 2357 require-instance false; 2358 } 2359 description 2360 "Compress alarms with this alarm-type-id."; 2361 } 2362 leaf alarm-type-qualifier { 2363 type leafref { 2364 path "/alarms/shelved-alarms/shelved-alarm" 2365 + "/alarm-type-qualifier"; 2366 require-instance false; 2367 } 2368 description 2369 "Compress the alarms with this alarm-type-qualifier."; 2370 } 2372 } 2373 output { 2374 leaf compressed-alarms { 2375 type uint32; 2376 description 2377 "Number of compressed alarm entries."; 2378 } 2379 } 2380 } 2381 } 2382 list alarm-profile { 2383 if-feature "alarm-profile"; 2384 key "alarm-type-id alarm-type-qualifier-match resource"; 2385 ordered-by user; 2386 description 2387 "This list is used to assign further information or 2388 configuration for each alarm type. This module supports a 2389 mechanism where the client can override the system default 2390 alarm severity levels. 
The alarm-profile is also a useful 2391 augmentation point for specific additions to alarm types."; 2392 leaf alarm-type-id { 2393 type alarm-type-id; 2394 description 2395 "The alarm type identifier to match."; 2396 } 2397 leaf alarm-type-qualifier-match { 2398 type string; 2399 description 2400 "An XML Schema regular expression that is used to match the 2401 alarm type qualifier."; 2402 reference 2403 "XML Schema Part 2: Datatypes Second Edition, 2404 World Wide Web Consortium Recommendation 2405 REC-xmlschema-2-20041028"; 2406 } 2407 leaf resource { 2408 type resource-match; 2409 description 2410 "Specifies which resources to match."; 2411 } 2412 leaf description { 2413 type string; 2414 mandatory true; 2415 description 2416 "A description of the alarm profile."; 2417 } 2418 container alarm-severity-assignment-profile { 2419 if-feature "severity-assignment"; 2420 description 2421 "The client can override the system default severity 2422 level."; 2423 reference 2424 "ITU M.3100, ITU M.3160 2425 - Generic Network Information Model, Alarm Severity 2426 Assignment Profile"; 2427 leaf-list severity-levels { 2428 type severity; 2429 ordered-by user; 2430 description 2431 "Specifies the configured severity level(s) for the 2432 matching alarm. If the alarm has several severity 2433 levels the leaf-list shall be given in rising severity 2434 order. The original M3100/M3160 ASAP function only 2435 allows for a one-to-one mapping between alarm type and 2436 severity but since the IETF alarm module supports 2437 stateful alarms the mapping must allow for several 2438 severity levels. 2440 Assume a high-utilization alarm type with two thresholds 2441 with the system default severity levels of threshold1 = 2442 warning and threshold2 = minor. 
Setting this leaf-list 2443 to (minor, major) will assign the severity levels 2444 threshold1 = minor and threshold2 = major."; 2445 } 2446 } 2447 } 2448 } 2450 /* 2451 * Notifications 2452 */ 2454 notification alarm-notification { 2455 description 2456 "This notification is used to report a state change for an 2457 alarm. The same notification is used for reporting a newly 2458 raised alarm, a cleared alarm or changing the text and/or 2459 severity of an existing alarm."; 2460 uses common-alarm-parameters; 2461 uses alarm-state-change-parameters; 2462 } 2464 notification alarm-inventory-changed { 2465 description 2466 "This notification is used to report that the list of possible 2467 alarms has changed. This can happen, for example, when a new 2468 software module is installed, or a new physical card is 2469 inserted."; 2470 } 2471 } 2473 2475 7. X.733 Extensions 2477 Many alarm systems are based on the X.733 [X.733] and X.736 [X.736] 2478 alarm standards. This module augments the alarm inventory, the alarm 2479 lists and the alarm notification with X.733 and X.736 parameters. 2481 The module also supports a feature whereby the alarm manager can 2482 configure the mapping from alarm types to X.733 "event-type" and 2483 "probable-cause" parameters. This might be needed when the default 2484 mapping provided by the system is in conflict with other management 2485 systems or not considered correct. 2487 Note that the IETF Alarm Module term "resource" is synonymous with the 2488 ITU term "managed object". 2490 8. The X.733 Mapping Module 2492 This YANG module references [X.721], [X.733] and [X.736]. 
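As an informal illustration only (not part of the module), the mapping from an alarm type to the X.733 tuple (event-type, probable-cause) could be sketched as follows. The X733Mapping class, the lookup() helper and the alarm type name "link-alarm" are hypothetical and exist only for this sketch. The event-type values are those of the 'event-type' typedef in the module that follows, and probable cause 27 ('localNodeTransmissionError') is the example given in the 'probable-cause-string' description; pairing it with 'communications-alarm' is an assumption. The sketch compares alarm-type-id by simple equality and uses Python regular expressions as a stand-in for XML Schema regular expressions, which differ in some details.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence

# Enum values as listed in the 'event-type' typedef of ietf-alarms-x733.
EVENT_TYPES = {
    "other": 1,
    "communications-alarm": 2,
    "quality-of-service-alarm": 3,
    "processing-error-alarm": 4,
    "equipment-alarm": 5,
    "environmental-alarm": 6,
    "integrity-violation": 7,
    "operational-violation": 8,
    "physical-violation": 9,
    "security-service-or-mechanism-violation": 10,
    "time-domain-violation": 11,
}


@dataclass(frozen=True)
class X733Mapping:
    """Hypothetical mapping entry; not a structure defined by the module."""
    alarm_type_id: str          # alarm type identity name (assumed)
    qualifier_match: str        # regular expression on the type qualifier
    event_type: str             # one of the EVENT_TYPES keys
    probable_cause: int
    probable_cause_string: str


def lookup(mappings: Sequence[X733Mapping], alarm_type_id: str,
           qualifier: str) -> Optional[X733Mapping]:
    """Return the first entry matching the alarm type and qualifier."""
    for entry in mappings:
        if entry.alarm_type_id != alarm_type_id:
            continue  # simplification: no identity derivation is modelled
        # XML Schema patterns match the whole value, hence fullmatch().
        if re.fullmatch(entry.qualifier_match, qualifier) is not None:
            return entry
    return None


# Hypothetical configuration: one alarm type with an empty qualifier.
MAPPINGS = [
    X733Mapping("link-alarm", "", "communications-alarm", 27,
                "localNodeTransmissionError"),
]

hit = lookup(MAPPINGS, "link-alarm", "")
```

Returning the first matching entry is a choice made for this sketch; a real implementation would follow whatever ordering and identity-derivation rules the server applies.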
2494 file "ietf-alarms-x733@2019-03-21.yang" 2495 module ietf-alarms-x733 { 2496 yang-version 1.1; 2497 namespace "urn:ietf:params:xml:ns:yang:ietf-alarms-x733"; 2498 prefix x733; 2500 import ietf-alarms { 2501 prefix al; 2502 } 2503 import ietf-yang-types { 2504 prefix yang; 2505 reference 2506 "RFC 6991: Common YANG Data Types"; 2507 } 2509 organization 2510 "IETF CCAMP Working Group"; 2511 contact 2512 "WG Web: 2513 WG List: 2515 Editor: Stefan Vallin 2516 2518 Editor: Martin Bjorklund 2519 "; 2520 description 2521 "This module augments the ietf-alarms module with X.733 alarm 2522 parameters. 2524 The following structures are augmented with X.733 event type 2525 and probable cause: 2527 1) alarms/alarm-inventory: all possible alarm types 2528 2) alarms/alarm-list: every alarm in the system 2529 3) alarm-notification: notifications indicating alarm state 2530 changes 2531 4) alarms/shelved-alarms 2533 The module also optionally allows the alarm management system 2534 to configure the mapping from the IETF Alarm module alarm keys 2535 to the ITU tuple (event-type, probable-cause). 2537 The mapping does not include a corresponding X.733 specific 2538 problem value. The recommendation is to use the 2539 'alarm-type-qualifier' leaf which serves the same purpose. 2541 The module uses an integer and a corresponding string for 2542 probable cause instead of a globally defined enumeration, in 2543 order to be able to manage conflicting enumeration definitions. 2544 A single globally defined enumeration is challenging to 2545 maintain. 2547 The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL 2548 NOT', 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'NOT RECOMMENDED', 2549 'MAY', and 'OPTIONAL' in this document are to be interpreted as 2550 described in BCP 14 (RFC 2119) (RFC 8174) when, and only when, 2551 they appear in all capitals, as shown here. 2553 Copyright (c) 2019 IETF Trust and the persons identified as 2554 authors of the code. All rights reserved. 
2556 Redistribution and use in source and binary forms, with or 2557 without modification, is permitted pursuant to, and subject to 2558 the license terms contained in, the Simplified BSD License set 2559 forth in Section 4.c of the IETF Trust's Legal Provisions 2560 Relating to IETF Documents 2561 (https://trustee.ietf.org/license-info). 2563 This version of this YANG module is part of RFC XXXX 2564 (https://tools.ietf.org/html/rfcXXXX); see the RFC itself for 2565 full legal notices."; 2566 reference 2567 "ITU Recommendation X.733: Information Technology 2568 - Open Systems Interconnection 2569 - System Management: Alarm Reporting Function"; 2571 revision 2019-03-21 { 2572 description 2573 "Initial revision."; 2574 reference 2575 "RFC XXXX: YANG Alarm Module"; 2576 } 2578 /* 2579 * Features 2580 */ 2582 feature configure-x733-mapping { 2583 description 2584 "The system supports configurable X733 mapping from 2585 the IETF alarm module alarm-type to X733 event-type 2586 and probable-cause."; 2587 } 2589 /* 2590 * Typedefs 2591 */ 2593 typedef event-type { 2594 type enumeration { 2595 enum other { 2596 value 1; 2597 description 2598 "None of the below."; 2599 } 2600 enum communications-alarm { 2601 value 2; 2602 description 2603 "An alarm of this type is principally associated with the 2604 procedures and/or processes required to convey 2605 information from one point to another."; 2606 } 2607 enum quality-of-service-alarm { 2608 value 3; 2609 description 2610 "An alarm of this type is principally associated with a 2611 degradation in the quality of a service."; 2613 } 2614 enum processing-error-alarm { 2615 value 4; 2616 description 2617 "An alarm of this type is principally associated with a 2618 software or processing fault."; 2619 } 2620 enum equipment-alarm { 2621 value 5; 2622 description 2623 "An alarm of this type is principally associated with an 2624 equipment fault."; 2625 } 2626 enum environmental-alarm { 2627 value 6; 2628 description 2629 "An alarm of 
this type is principally associated with a 2630 condition relating to an enclosure in which the equipment 2631 resides."; 2632 } 2633 enum integrity-violation { 2634 value 7; 2635 description 2636 "An indication that information may have been illegally 2637 modified, inserted or deleted."; 2638 } 2639 enum operational-violation { 2640 value 8; 2641 description 2642 "An indication that the provision of the requested service 2643 was not possible due to the unavailability, malfunction or 2644 incorrect invocation of the service."; 2645 } 2646 enum physical-violation { 2647 value 9; 2648 description 2649 "An indication that a physical resource has been violated 2650 in a way that suggests a security attack."; 2651 } 2652 enum security-service-or-mechanism-violation { 2653 value 10; 2654 description 2655 "An indication that a security attack has been detected by 2656 a security service or mechanism."; 2657 } 2658 enum time-domain-violation { 2659 value 11; 2660 description 2661 "An indication that an event has occurred at an unexpected 2662 or prohibited time."; 2663 } 2664 } 2665 description 2666 "The event types as defined by X.733 and X.736."; 2667 reference 2668 "ITU Recommendation X.733: Information Technology 2669 - Open Systems Interconnection 2670 - System Management: Alarm Reporting Function 2671 ITU Recommendation X.736: Information Technology 2672 - Open Systems Interconnection 2673 - System Management: Security Alarm Reporting Function"; 2674 } 2676 typedef trend { 2677 type enumeration { 2678 enum less-severe { 2679 description 2680 "There is at least one outstanding alarm of a 2681 severity higher (more severe) than that in the 2682 current alarm."; 2683 } 2684 enum no-change { 2685 description 2686 "The Perceived severity reported in the current 2687 alarm is the same as the highest (most severe) 2688 of any of the outstanding alarms"; 2689 } 2690 enum more-severe { 2691 description 2692 "The Perceived severity in the current alarm is 2693 higher (more 
severe) than that reported in any 2694 of the outstanding alarms."; 2695 } 2696 } 2697 description 2698 "This type is used to describe the 2699 severity trend of the alarming resource"; 2700 reference 2701 "ITU Recommendation X.721: Information Technology 2702 - Open Systems Interconnection 2703 - Structure of management information: 2704 Definition of management information 2705 Module Attribute-ASN1Module"; 2706 } 2708 typedef value-type { 2709 type union { 2710 type int64; 2711 type uint64; 2712 type decimal64 { 2713 fraction-digits 2; 2714 } 2715 } 2716 description 2717 "A generic union type to match ITU choice of integer 2718 and real."; 2719 } 2721 /* 2722 * Groupings 2723 */ 2725 grouping x733-alarm-parameters { 2726 description 2727 "Common X.733 parameters for alarms."; 2728 leaf event-type { 2729 type event-type; 2730 description 2731 "The X.733/X.736 event type for this alarm."; 2732 } 2733 leaf probable-cause { 2734 type uint32; 2735 description 2736 "The X.733 probable cause for this alarm."; 2737 } 2738 leaf probable-cause-string { 2739 type string; 2740 description 2741 "The user friendly string matching 2742 the probable cause integer value. The string 2743 SHOULD match the X.733 enumeration. For example, 2744 value 27 is 'localNodeTransmissionError'."; 2745 } 2746 container threshold-information { 2747 description 2748 "This parameter shall be present when the alarm 2749 is a result of crossing a threshold. "; 2750 leaf triggered-threshold { 2751 type string; 2752 description 2753 "The identifier of the threshold attribute that 2754 caused the notification."; 2755 } 2756 leaf observed-value { 2757 type value-type; 2758 description 2759 "The value of the gauge or counter which crossed 2760 the threshold. 
This may be different from the 2761 threshold value if, for example, the gauge may 2762 only take on discrete values."; 2763 } 2764 choice threshold-level { 2765 description 2766 "In the case of a gauge the threshold level specifies 2767 a pair of threshold values, the first being the value 2768 of the crossed threshold and the second, its corresponding 2769 hysteresis; in the case of a counter the threshold level 2770 specifies only the threshold value."; 2771 case up { 2772 leaf up-high { 2773 type value-type; 2774 description 2775 "The going up threshold for raising the alarm."; 2776 } 2777 leaf up-low { 2778 type value-type; 2779 description 2780 "The threshold level for clearing the alarm. 2781 This is used for hysteresis functions for gauges."; 2782 } 2783 } 2784 case down { 2785 leaf down-low { 2786 type value-type; 2787 description 2788 "The going down threshold for raising the alarm."; 2789 } 2790 leaf down-high { 2791 type value-type; 2792 description 2793 "The threshold level for clearing the alarm. 2794 This is used for hysteresis functions for gauges."; 2795 } 2796 } 2797 } 2798 leaf arm-time { 2799 type yang:date-and-time; 2800 description 2801 "For a gauge threshold, the time at which the threshold 2802 was last re-armed, namely the time after the previous 2803 threshold crossing at which the hysteresis value of the 2804 threshold was exceeded thus again permitting generation 2805 of notifications when the threshold is crossed. 
2806 For a counter threshold, the later of the time at which 2807 the threshold offset was last applied, or the time at 2808 which the counter was last initialized (for resettable 2809 counters)."; 2810 } 2811 } 2812 list monitored-attributes { 2813 uses attribute; 2814 key "id"; 2815 description 2816 "The Monitored attributes parameter, when present, defines 2817 one or more attributes of the resource and their 2818 corresponding values at the time of the alarm."; 2819 } 2820 leaf-list proposed-repair-actions { 2821 type string; 2822 description 2823 "This parameter, when present, is used if the cause is 2824 known and the system being managed can suggest one or 2825 more solutions (such as switch in standby equipment, 2826 retry, replace media)."; 2827 } 2828 leaf trend-indication { 2829 type trend; 2830 description 2831 "This parameter specifies the current 2832 severity trend of the resource. If present it 2833 indicates that there are one or more alarms 2834 ('outstanding alarms') which have not been cleared, 2835 and pertain to the same resource as that to which 2836 this alarm ('current alarm') pertains. 2837 The possible values are: 2839 more-severe: The Perceived severity in the current 2840 alarm is higher (more severe) than that reported in 2841 any of the outstanding alarms. 2843 no-change: The Perceived severity reported in the 2844 current alarm is the same as the highest (most severe) 2845 of any of the outstanding alarms. 2847 less-severe: There is at least one outstanding alarm 2848 of a severity higher (more severe) than that in the 2849 current alarm."; 2850 } 2851 leaf backedup-status { 2852 type boolean; 2853 description 2854 "This parameter, when present, specifies whether or not 2855 the object emitting the alarm has been backed-up, and 2856 services provided to the user have, therefore, not been 2857 disrupted. 
The use of this field in conjunction with the 2858 severity field provides information in an independent form 2859 to qualify the seriousness of the alarm and the ability of 2860 the system as a whole to continue to provide services. 2861 If the value of this parameter is true, it indicates that 2862 the object emitting the alarm has been backed-up; if false, 2863 the object has not been backed-up."; 2864 } 2865 leaf backup-object { 2866 type al:resource; 2867 description 2868 "This parameter shall be present when the Backed-up status 2869 parameter is present and has the value true. This parameter 2870 specifies the managed object instance that is providing 2871 back-up services for the managed object about which the 2872 notification pertains. This parameter is useful, 2873 for example, when the back-up object is from a pool of 2874 objects any of which may be dynamically allocated to 2875 replace a faulty object."; 2876 } 2877 list additional-information { 2878 key "identifier"; 2879 description 2880 "This parameter allows the inclusion of a 2881 set of additional information in the alarm. 
It is 2882 a series of data structures each of which contains three 2883 items of information: an identifier, a significance 2884 indicator, and the problem information."; 2885 leaf identifier { 2886 type string; 2887 description 2888 "Identifies the data-type of the information parameter."; 2889 } 2890 leaf significant { 2891 type boolean; 2892 description 2893 "Set to true if the receiving system must be able to 2894 parse the contents of the information subparameter 2895 for the event report to be fully understood."; 2896 } 2897 leaf information { 2898 type string; 2899 description 2900 "Additional information about the alarm."; 2902 } 2903 } 2904 leaf security-alarm-detector { 2905 type al:resource; 2906 description 2907 "This parameter identifies the detector of the security 2908 alarm."; 2909 } 2910 leaf service-user { 2911 type al:resource; 2912 description 2913 "This parameter identifies the service-user whose request 2914 for service led to the generation of the security alarm."; 2915 } 2916 leaf service-provider { 2917 type al:resource; 2918 description 2919 "This parameter identifies the intended service-provider 2920 of the service that led to the generation of the security 2921 alarm."; 2922 } 2923 reference 2924 "ITU Recommendation X.733: Information Technology 2925 - Open Systems Interconnection 2926 - System Management: Alarm Reporting Function 2927 ITU Recommendation X.736: Information Technology 2928 - Open Systems Interconnection 2929 - System Management: Security Alarm Reporting Function"; 2930 } 2932 grouping x733-alarm-definition-parameters { 2933 description 2934 "Common X.733 parameters for alarm definitions. 
2935 This grouping is used to define those alarm 2936 attributes that can be mapped from the alarm-type 2937 mechanism in the ietf-alarm module."; 2938 leaf event-type { 2939 type event-type; 2940 description 2941 "The alarm type has this X.733/X.736 event type."; 2942 } 2943 leaf probable-cause { 2944 type uint32; 2945 description 2946 "The alarm type has this X.733 probable cause value. 2947 This module defines probable cause as an integer 2948 and not as an enumeration. The reason being that the 2949 primary use of probable cause is in the management 2950 application if it is based on the X.733 standard. 2951 However, most management applications have their own 2952 defined enum definitions and merging enums from 2953 different systems might create conflicts. By using 2954 a configurable uint32 the system can be configured 2955 to match the enum values in the management application."; 2956 } 2957 leaf probable-cause-string { 2958 type string; 2959 description 2960 "This string can be used to give a user friendly string 2961 to the probable cause value."; 2962 } 2963 } 2965 grouping attribute { 2966 description 2967 "A grouping to match the ITU generic reference to 2968 an attribute."; 2969 leaf id { 2970 type al:resource; 2971 description 2972 "The resource representing the attribute."; 2973 } 2974 leaf value { 2975 type string; 2976 description 2977 "The value represented as a string since it could 2978 be of any type."; 2979 } 2980 reference 2981 "ITU Recommendation X.721: Information Technology 2982 - Open Systems Interconnection 2983 - Structure of management information: 2984 Definition of management information 2985 Module Attribute-ASN1Module"; 2986 } 2988 /* 2989 * Add X.733 parameters to the alarm definitions, alarms, 2990 * and notification. 
2991 */ 2993 augment "/al:alarms/al:alarm-inventory/al:alarm-type" { 2994 description 2995 "Augment X.733 mapping information to the alarm inventory."; 2996 uses x733-alarm-definition-parameters; 2997 } 2998 /* 2999 * Add X.733 configurable mapping. 3000 */ 3002 augment "/al:alarms/al:control" { 3003 description 3004 "Add X.733 mapping capabilities. "; 3005 list x733-mapping { 3006 if-feature "configure-x733-mapping"; 3007 key "alarm-type-id alarm-type-qualifier-match"; 3008 description 3009 "This list allows a management application to control the 3010 X.733 mapping for all alarm types in the system. Any entry 3011 in this list will allow the alarm manager to over-ride the 3012 default X.733 mapping in the system and the final mapping 3013 will be shown in the alarm inventory."; 3014 leaf alarm-type-id { 3015 type al:alarm-type-id; 3016 description 3017 "Map the alarm type with this alarm type identifier."; 3018 } 3019 leaf alarm-type-qualifier-match { 3020 type string; 3021 description 3022 "A W3C regular expression that is used when mapping an 3023 alarm type and alarm-type-qualifier to X.733 parameters."; 3024 } 3025 uses x733-alarm-definition-parameters; 3026 } 3027 } 3029 augment "/al:alarms/al:alarm-list/al:alarm" { 3030 description 3031 "Augment X.733 information to the alarm."; 3032 uses x733-alarm-parameters; 3033 } 3035 augment "/al:alarms/al:shelved-alarms/al:shelved-alarm" { 3036 description 3037 "Augment X.733 information to the alarm."; 3038 uses x733-alarm-parameters; 3039 } 3041 augment "/al:alarm-notification" { 3042 description 3043 "Augment X.733 information to the alarm notification."; 3044 uses x733-alarm-parameters; 3045 } 3047 } 3049 3051 9. IANA Considerations 3053 This document registers two URIs in the IETF XML registry [RFC3688]. 3054 Following the format in RFC 3688, the following registrations are 3055 requested to be made. 3057 URI: urn:ietf:params:xml:ns:yang:ietf-alarms 3058 Registrant Contact: The IESG. 
3059 XML: N/A, the requested URI is an XML namespace. 3061 URI: urn:ietf:params:xml:ns:yang:ietf-alarms-x733 3062 Registrant Contact: The IESG. 3063 XML: N/A, the requested URI is an XML namespace. 3065 This document registers two YANG modules in the YANG Module Names 3066 registry [RFC6020]. 3068 name: ietf-alarms 3069 namespace: urn:ietf:params:xml:ns:yang:ietf-alarms 3070 prefix: al 3071 reference: RFC XXXX 3073 name: ietf-alarms-x733 3074 namespace: urn:ietf:params:xml:ns:yang:ietf-alarms-x733 3075 prefix: x733 3076 reference: RFC XXXX 3078 10. Security Considerations 3080 The YANG module specified in this document defines a schema for data 3081 that is designed to be accessed via network management protocols such 3082 as NETCONF [RFC6241] or RESTCONF [RFC8040]. The lowest NETCONF layer 3083 is the secure transport layer, and the mandatory-to-implement secure 3084 transport is Secure Shell (SSH) [RFC6242]. The lowest RESTCONF layer 3085 is HTTPS, and the mandatory-to-implement secure transport is TLS 3086 [RFC8446]. 3088 The Network Configuration Access Control model (NACM) [RFC8341] 3089 provides the means to restrict access for particular NETCONF or 3090 RESTCONF users to a preconfigured subset of all available NETCONF or 3091 RESTCONF protocol operations and content. 3093 The list of alarms itself may be potentially sensitive from a 3094 security perspective, in that it potentially gives an attacker an 3095 authoritative picture of the (broken) state of the network. 3097 There are a number of data nodes defined in this YANG module that are 3098 writable/creatable/deletable (i.e., config true, which is the 3099 default). These data nodes may be considered sensitive or vulnerable 3100 in some network environments. Write operations (e.g., edit-config) 3101 to these data nodes without proper protection can have a negative 3102 effect on network operations. 
These are the subtrees and data nodes 3103 and their sensitivity/vulnerability: 3105 /alarms/control/notify-status-changes: This leaf controls whether an 3106 alarm should notify based on various state changes. Unauthorized 3107 access to this leaf could have a negative impact on operational 3108 procedures relying on fine-grained alarm state change reporting. 3110 /alarms/control/alarm-shelving/shelf: This list controls the 3111 shelving (blocking) of alarms. Unauthorized access to this list 3112 could jeopardize the alarm management procedures since these 3113 alarms will neither be notified nor be part of the alarm list. 3115 /alarms/control/alarm-profile/alarm-severity-assignment-profile: 3116 This list controls the severity levels of an alarm. Unauthorized 3117 access to this list could, for example, downgrade the severity of an 3118 alarm and thereby have a negative impact on the alarm monitoring 3119 process. 3121 Some of the operations in this YANG module may be considered 3122 sensitive or vulnerable in some network environments. It is thus 3123 important to control access to these operations. These are the 3124 operations and their sensitivity/vulnerability: 3126 /alarms/alarm-list/purge-alarms: This action deletes alarms from the 3127 alarm list. Unauthorized use of this action could jeopardize the 3128 alarm management procedures since the deleted alarms may be vital 3129 for the alarm management application. 3131 /alarms/alarm-list/alarm/set-operator-state: This action can be used 3132 by the operator to indicate the level of human intervention on an 3133 alarm. Unauthorized use of this action could result in alarms 3134 being ignored by operators. 3136 11. Acknowledgements 3138 The authors wish to thank Viktor Leijon and Johan Nordlander for 3139 their valuable input on forming the alarm model. 3141 The authors also wish to thank Nick Hancock, Joey Boyd, Tom Petch and 3142 Balazs Lengyel for their extensive reviews and contributions to this 3143 document.
3145 12. References 3147 12.1. Normative References 3149 [M.3100] International Telecommunications Union, "Generic Network 3150 Information Model", ITU-T Recommendation M.3100, 2005. 3152 [M.3160] International Telecommunications Union, "Generic, 3153 protocol-neutral management information model", 3154 ITU-T Recommendation M.3160, 2008. 3156 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 3157 Requirement Levels", BCP 14, RFC 2119, 3158 DOI 10.17487/RFC2119, March 1997, 3159 . 3161 [RFC3688] Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688, 3162 DOI 10.17487/RFC3688, January 2004, 3163 . 3165 [RFC6020] Bjorklund, M., Ed., "YANG - A Data Modeling Language for 3166 the Network Configuration Protocol (NETCONF)", RFC 6020, 3167 DOI 10.17487/RFC6020, October 2010, 3168 . 3170 [RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., 3171 and A. Bierman, Ed., "Network Configuration Protocol 3172 (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011, 3173 . 3175 [RFC6242] Wasserman, M., "Using the NETCONF Protocol over Secure 3176 Shell (SSH)", RFC 6242, DOI 10.17487/RFC6242, June 2011, 3177 . 3179 [RFC6991] Schoenwaelder, J., Ed., "Common YANG Data Types", 3180 RFC 6991, DOI 10.17487/RFC6991, July 2013, 3181 . 3183 [RFC7950] Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language", 3184 RFC 7950, DOI 10.17487/RFC7950, August 2016, 3185 . 3187 [RFC8040] Bierman, A., Bjorklund, M., and K. Watsen, "RESTCONF 3188 Protocol", RFC 8040, DOI 10.17487/RFC8040, January 2017, 3189 . 3191 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 3192 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 3193 May 2017, . 3195 [RFC8341] Bierman, A. and M. Bjorklund, "Network Configuration 3196 Access Control Model", STD 91, RFC 8341, 3197 DOI 10.17487/RFC8341, March 2018, 3198 . 3200 [RFC8348] Bierman, A., Bjorklund, M., Dong, J., and D.
Romascanu, "A 3201 YANG Data Model for Hardware Management", RFC 8348, 3202 DOI 10.17487/RFC8348, March 2018, 3203 . 3205 [RFC8446] Rescorla, E., "The Transport Layer Security (TLS) Protocol 3206 Version 1.3", RFC 8446, DOI 10.17487/RFC8446, August 2018, 3207 . 3209 [X.721] International Telecommunications Union, "Information 3210 Technology - Open Systems Interconnection - Structure of 3211 management information: Definition of management 3212 information", ITU-T Recommendation X.721, 1992. 3214 [X.733] International Telecommunications Union, "Information 3215 Technology - Open Systems Interconnection - Systems 3216 Management: Alarm Reporting Function", 3217 ITU-T Recommendation X.733, 1992. 3219 [XSD-TYPES] 3220 Malhotra, A. and P. Biron, "XML Schema Part 2: Datatypes 3221 Second Edition", World Wide Web Consortium Recommendation 3222 REC-xmlschema-2-20041028, October 2004, 3223 . 3225 12.2. Informative References 3227 [ALARMIRP] 3228 3GPP, "Telecommunication management; Fault Management; 3229 Part 2: Alarm Integration Reference Point (IRP): 3230 Information Service (IS)", 3GPP TS 32.111-2 3.4.0, March 3231 2005. 3233 [ALARMSEM] 3234 Wallin, S., Leijon, V., Nordlander, J., and N. Bystedt, 3235 "The semantics of alarm definitions: enabling systematic 3236 reasoning about alarms. International Journal of Network 3237 Management, Volume 22, Issue 3, John Wiley and Sons, Ltd, 3238 http://dx.doi.org/10.1002/nem.800", March 2012. 3240 [EEMUA] EEMUA Publication No. 191 Engineering Equipment and 3241 Materials Users Association, London, 2 edition., "Alarm 3242 Systems: A Guide to Design, Management and Procurement.", 3243 2007. 3245 [G.7710] ITU-T, "SERIES G: TRANSMISSION SYSTEMS AND MEDIA, DIGITAL 3246 SYSTEMS AND NETWORKS Data over Transport - Generic aspects 3247 - Transport network control aspects. Common equipment 3248 management function requirements", 2012. 3250 [I-D.ietf-netmod-yang-instance-file-format] 3251 Lengyel, B. and B. 
Claise, "YANG Instance Data File 3252 Format", draft-ietf-netmod-yang-instance-file-format-02 3253 (work in progress), February 2019. 3255 [ISA182] International Society of Automation,ISA, "ANSI/ISA- 3256 18.2-2009 Management of Alarm Systems for the Process 3257 Industries", 2009. 3259 [RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management 3260 Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877, 3261 September 2004, . 3263 [RFC8340] Bjorklund, M. and L. Berger, Ed., "YANG Tree Diagrams", 3264 BCP 215, RFC 8340, DOI 10.17487/RFC8340, March 2018, 3265 . 3267 [X.736] International Telecommunications Union, "Information 3268 Technology - Open Systems Interconnection - Systems 3269 Management: Security alarm reporting function", 3270 ITU-T Recommendation X.736, 1992. 3272 Appendix A. Vendor-specific Alarm Types Example 3274 This example shows how to define alarm types in a vendor-specific 3275 module. In this case the vendor "xyz" has chosen to define top level 3276 identities according to X.733 event types. 3278 module example-xyz-alarms { 3279 namespace "urn:example:xyz-alarms"; 3280 prefix xyz-al; 3282 import ietf-alarms { 3283 prefix al; 3284 } 3286 identity xyz-alarms { 3287 base al:alarm-type-id; 3288 } 3290 identity communications-alarm { 3291 base xyz-alarms; 3292 } 3293 identity quality-of-service-alarm { 3294 base xyz-alarms; 3295 } 3296 identity processing-error-alarm { 3297 base xyz-alarms; 3298 } 3299 identity equipment-alarm { 3300 base xyz-alarms; 3301 } 3302 identity environmental-alarm { 3303 base xyz-alarms; 3304 } 3306 // communications alarms 3307 identity link-alarm { 3308 base communications-alarm; 3309 } 3311 // QoS alarms 3312 identity high-jitter-alarm { 3313 base quality-of-service-alarm; 3314 } 3315 } 3317 Appendix B. Alarm Inventory Example 3319 This shows an alarm inventory, it shows one alarm type defined only 3320 with the identifier, and another dynamically configured. 
In the 3321 latter case a digital input has been connected to a smoke-detector; 3322 therefore, the "alarm-type-qualifier" is set to "smoke-alarm" and 3323 the "alarm-type-id" to "environmental-alarm". 3325 3328 3329 3330 xyz-al:link-alarm 3331 3332 3333 /dev:interfaces/dev:interface 3334 3335 true 3336 3337 Link failure, operational state down but admin state up 3338 3339 3340 3341 xyz-al:environmental-alarm 3342 smoke-alarm 3343 true 3344 3345 Connected smoke detector to digital input 3346 3347 3348 3349 3351 Appendix C. Alarm List Example 3353 In this example we show an alarm that has toggled [major, clear, 3354 major]. An operator has acknowledged the alarm. 3356 3359 3360 1 3361 2018-04-08T08:39:50.00Z 3362 3363 3364 /dev:interfaces/dev:interface[name='FastEthernet1/0'] 3365 3366 xyz-al:link-alarm 3367 3368 2018-04-08T08:20:10.00Z 3369 false 3370 1.3.6.1.2.1.2.2.1.1.17 3371 2018-04-08T08:39:40.00Z 3372 2018-04-08T08:39:50.00Z 3373 major 3374 3375 Link operationally down but administratively up 3376 3377 3378 3379 major 3380 3381 Link operationally down but administratively up 3382 3383 3384 3385 3386 cleared 3387 3388 Link operationally up and administratively up 3389 3390 3391 3392 3393 major 3394 3395 Link operationally down but administratively up 3396 3397 3398 3399 3400 ack 3401 joe 3402 Will investigate, ticket TR764999 3403 3404 3405 3406 3408 Appendix D. Alarm Shelving Example 3410 This example shows how to shelve alarms. We shelve alarms related to 3411 the smoke-detectors since they are being installed and tested. We 3412 also shelve all alarms from FastEthernet1/0. 3414 3417 3418 3419 3420 FE10 3421 3422 /dev:interfaces/dev:interface[name='FastEthernet1/0'] 3423 3424 3425 3426 detectortest 3427 3428 3429 xyz-al:environmental-alarm 3430 3431 3432 smoke-alarm 3433 3434 3435 3436 3437 3438 3440 Appendix E.
X.733 Mapping Example 3442 This example shows how to map a dynamic alarm type (alarm-type- 3443 id=environmental-alarm, alarm-type-qualifier=smoke-alarm) to the 3444 corresponding X.733 "event-type" and "probable-cause" parameters. 3446 3448 3449 3451 xyz-al:environmental-alarm 3452 3453 smoke-alarm 3454 3455 quality-of-service-alarm 3456 777 3457 3458 3459 3461 Appendix F. Relationship to other alarm standards 3463 This section briefly describes how this alarm module relates to other 3464 relevant standards. 3466 F.1. Alarm definition 3468 The table below summarizes relevant definitions of the term "alarm" 3469 in other alarm standards. 3471 +------------+---------------------------+--------------------------+ 3472 | Standard | Definition | Comment | 3473 +------------+---------------------------+--------------------------+ 3474 | X.733 | error: A deviation of a | The X.733 alarm | 3475 | [X.733] | system from normal | definition is focused on | 3476 | | operation. fault: The | the notification as such | 3477 | | physical or algorithmic | and not the state. X.733 | 3478 | | cause of a malfunction. | defines an alarm as a | 3479 | | Faults manifest | deviation from normal | 3480 | | themselves as errors. | condition, but without | 3481 | | alarm: A notification, of | the requirement that it | 3482 | | the form defined by this | needs corrective | 3483 | | function, of a specific | actions. | 3484 | | event. An alarm may or | | 3485 | | may not represent an | | 3486 | | error. | | 3487 | | | | 3488 | G.7710 | Alarms are indications | The G.7710 definition is | 3489 | [G.7710] | that are automatically | close to the original | 3490 | | generated by a device as | X.733 definition. | 3491 | | a result of the | | 3492 | | declaration of a failure. | | 3493 | | | | 3494 | Alarm MIB | Alarm: Persistent | RFC 3877 defines the | 3495 | [RFC3877] | indication of a fault. 
| term alarm referring | 3496 | | Fault: Lasting error or | back to "a deviation | 3497 | | warning condition. | from normal operation". | 3498 | | Error: A deviation of a | The Alarm YANG model | 3499 | | system from normal | adds the requirement | 3500 | | operation. | that it should require a | 3501 | | | corrective action and | 3502 | | | should be undesired, not | 3503 | | | only a deviation from | 3504 | | | normal. The alarm MIB | 3505 | | | is state oriented in the | 3506 | | | same way as the Alarm | 3507 | | | YANG, it focuses on the | 3508 | | | "lasting condition", | 3509 | | | not the individual | 3510 | | | notifications. | 3511 | | | | 3512 | ISA | Alarm: An audible and/or | The ISA standard adds an | 3513 | [ISA182] | visible means of | important requirement to | 3514 | | indicating to the | the "deviation from | 3515 | | operator an equipment | normal condition state": | 3516 | | malfunction, process | requiring a response. | 3517 | | deviation or abnormal | | 3518 | | condition requiring a | | 3519 | | response. | | 3520 | | | | 3521 | EEMUA | An alarm is an event to | This is the foundation | 3522 | [EEMUA] | which an operator must | for the definition of | 3523 | | knowingly react,respond, | alarm in this document. | 3524 | | and acknowledge - not | It focuses on the core | 3525 | | simply acknowledge and | criteria that an action | 3526 | | ignore. | is really needed. | 3527 | | | | 3528 | 3GPP Alarm | 3GPP v15: An alarm | The latest 3GPP Alarm | 3529 | IRP | signifies an undesired | IRP version uses | 3530 | [ALARMIRP] | condition of a resource | literally the same alarm | 3531 | | (e.g. device, link) for | definition as this alarm | 3532 | | which an operator action | module. It is worth | 3533 | | is required. It | noting that earlier | 3534 | | emphasizes a key | versions used a | 3535 | | requirement that | definition not requiring | 3536 | | operators [...] 
should | an operator action and | 3537 | | not be informed about an | the more broad | 3538 | | undesired condition | definition of deviation | 3539 | | unless it requires | from normal condition. | 3540 | | operator action. 3GPP | The earlier version also | 3541 | | v12: alarm: abnormal | defined an alarm as a | 3542 | | network entity condition, | special case of "event". | 3543 | | which categorizes an | | 3544 | | event as a fault. fault: | | 3545 | | a deviation of a system | | 3546 | | from normal operation, | | 3547 | | which may result in the | | 3548 | | loss of operational | | 3549 | | capabilities [...] | | 3550 +------------+---------------------------+--------------------------+ 3552 Table 1: Definition of alarm in standards 3554 The definition of alarm has evolved from a focus on events 3555 reporting a deviation from normal operation towards a definition based on an 3556 undesired *state* that *requires an operator action*. 3558 F.2. Data model 3560 This section describes how this YANG alarm module relates to other 3561 standard data models. Note that only other data models for 3562 alarm interfaces are covered, not other standards such as 3563 SDO-specific alarms. 3565 F.2.1. X.733 3567 X.733 has acted as a base for several alarm data models over the 3568 years. The YANG alarm module differs in the following ways: 3570 X.733 models the alarm list as a list of notifications. The YANG 3571 alarm module defines the alarm list as the current alarm states 3572 for the resources, which is generated from the state change 3573 reporting notifications. 3575 In X.733 an alarm can have the severity level clear. In the YANG 3576 alarm module "clear" is not a severity level; it is a separate 3577 state of the alarm. An alarm can, for example, have the following states: 3578 (major, cleared), (minor, not cleared). 3580 X.733 uses a flat globally defined enumerated "probable-cause" to 3581 identify alarm types.
This alarm module uses a hierarchical YANG 3582 identity, "alarm-type". This enables delegation of alarm types 3583 within organizations. It also lets management reason about 3584 abstract alarm types corresponding to base identities, see 3585 Section 3.2. 3587 The YANG alarm module has not included the majority of the X.733 3588 alarm attributes. Rather, these are defined in an augmenting 3589 module if "strict" X.733 compliance is needed. 3591 F.2.2. RFC 3877, the Alarm MIB 3593 The MIB in RFC 3877 takes a different approach: rather than defining 3594 a concrete data model for alarms, it defines a model to map existing 3595 SNMP managed objects and notifications into alarm states and alarm 3596 notifications. This was necessary since MIBs were already defined 3597 with both managed objects and notifications indicating alarms, for 3598 example linkUp and linkDown notifications in combination with 3599 ifAdminStatus and ifOperStatus. So RFC 3877 cannot really be compared 3600 to the alarm YANG module in that sense. 3602 The Alarm MIB maps existing MIB definitions into alarms using the 3603 alarmModelTable. The upside of that is that an SNMP manager can read 3604 the possible alarm types at runtime. This corresponds to the 3605 "alarm-inventory" in the alarm YANG module. 3607 F.2.3. 3GPP Alarm IRP 3609 The 3GPP Alarm IRP is an evolution of X.733. The main differences 3610 between the alarm YANG module and 3GPP are: 3612 3GPP keeps the majority of the X.733 attributes; the alarm YANG 3613 module does not. 3615 3GPP introduced overlapping and possibly conflicting keys for 3616 alarms, alarmId and (managed object, event type, probable cause, 3617 specific problem). (See Annex C in [X.733] Example 3). In the 3618 YANG alarm module the key for identifying an alarm instance is 3619 clearly defined by ("resource", "alarm-type-id", 3620 "alarm-type-qualifier"). See also Section 3.4 for more information.
3622 The alarm YANG module clearly separates the resource/instrumentation 3623 life cycle from the operator life cycle. 3GPP 3624 allows operators to set the alarm severity to clear. This is not 3625 allowed by this module; rather, an operator closes an alarm, which 3626 does not affect the severity. 3628 F.2.4. G.7710 3630 G.7710 differs from the previously referenced alarm standards. It 3631 does not define a data model for alarm reporting. It defines common 3632 equipment management function requirements including alarm 3633 instrumentation. The scope is transport networks. 3635 The requirements in G.7710 correspond to features in the alarm YANG 3636 module in the following way: 3638 Alarm Severity Assignment Profile (ASAP): the alarm profile 3639 "/alarms/alarm-profile/". 3641 Alarm Reporting Control (ARC): alarm shelving 3642 "/alarms/control/alarm-shelving/" and the ability to control alarm notifications 3643 "/alarms/control/notify-status-changes". Alarm shelving 3644 corresponds to the use case of turning off alarm reporting for a 3645 specific resource, the NALM state in M.3100. 3647 Appendix G. Alarm Usability Requirements 3649 This section defines usability requirements for alarms. Alarm 3650 usability is important for an alarm interface. A data model will 3651 help in defining the format, but if the actual alarms are of low value 3652 the goal of alarm management has not been achieved. 3654 Common alarm problems and the causes of the problems are summarized in 3655 Table 2. This summary is adapted to networking based on the ISA 3656 [ISA182] and EEMUA [EEMUA] standards.
3658 +------------------+--------------------------------+---------------+ 3659 | Problem | Cause | How this | 3660 | | | module | 3661 | | | addresses the | 3662 | | | cause | 3663 +------------------+--------------------------------+---------------+ 3664 | Alarms are | "Nuisance" alarms (chattering | Strict | 3665 | generated but | alarms and fleeting alarms), | definition of | 3666 | they are ignored | faulty hardware, redundant | alarms | 3667 | by the operator. | alarms, cascading alarms, | requiring | 3668 | | incorrect alarm settings, | corrective | 3669 | | alarms have not been | response. | 3670 | | rationalized, the alarms | Alarm | 3671 | | represent log information | requirements | 3672 | | rather than true alarms. | in Table 3. | 3673 | | | | 3674 | When alarms | Insufficient alarm response | The alarm | 3675 | occur, operators | procedures and not well | inventory | 3676 | do not know how | defined alarm types. | lists all | 3677 | to respond. | | alarm types | 3678 | | | and | 3679 | | | corrective | 3680 | | | actions. | 3681 | | | Alarm | 3682 | | | requirements | 3683 | | | in Table 3. | 3684 | | | | 3685 | The alarm | Nuisance alarms, stale alarms, | The alarm | 3686 | display is full | alarms from equipment not in | definition | 3687 | of alarms, even | service. | and alarm | 3688 | when there is | | shelving. | 3689 | nothing wrong. | | | 3690 | | | | 3691 | During a | Incorrect prioritization of | State-based | 3692 | failure, | alarms. Not using advanced | alarm model, | 3693 | operators are | alarm techniques (e.g. state- | alarm rate | 3694 | flooded with so | based alarming). | requirements | 3695 | many alarms that | | in Table 4 | 3696 | they do not know | | and Table 5. | 3697 | which ones are | | | 3698 | the most | | | 3699 | important.
| | | 3700 +------------------+--------------------------------+---------------+ 3702 Table 2: Alarm Problems and Causes 3704 Based upon the above problems, EEMUA gives the following definition of 3705 a good alarm: 3707 +----------------+--------------------------------------------------+ 3708 | Characteristic | Explanation | 3709 +----------------+--------------------------------------------------+ 3710 | Relevant | Not spurious or of low operational value. | 3711 | | | 3712 | Unique | Not duplicating another alarm. | 3713 | | | 3714 | Timely | Not long before any response is needed or too | 3715 | | late to do anything. | 3716 | | | 3717 | Prioritized | Indicating the importance that the operator | 3718 | | deals with the problem. | 3719 | | | 3720 | Understandable | Having a message which is clear and easy to | 3721 | | understand. | 3722 | | | 3723 | Diagnostic | Identifying the problem that has occurred. | 3724 | | | 3725 | Advisory | Indicative of the action to be taken. | 3726 | | | 3727 | Focusing | Drawing attention to the most important issues. | 3728 +----------------+--------------------------------------------------+ 3730 Table 3: Definition of a Good Alarm 3732 Vendors SHOULD rationalize all alarms according to Table 3. Another 3733 crucial requirement is acceptable alarm notification rates. Vendors 3734 SHOULD make sure that they do not exceed the recommendations from 3735 EEMUA below: 3737 +-----------------------------------+-------------------------------+ 3738 | Long Term Alarm Rate in Steady | Acceptability | 3739 | Operation | | 3740 +-----------------------------------+-------------------------------+ 3741 | More than one per minute | Very likely to be | 3742 | | unacceptable. | 3743 | | | 3744 | One per 2 minutes | Likely to be over-demanding. | 3745 | | | 3746 | One per 5 minutes | Manageable. | 3747 | | | 3748 | Less than one per 10 minutes | Very likely to be acceptable.
| 3749 +-----------------------------------+-------------------------------+ 3751 Table 4: Acceptable Alarm Rates, Steady State 3753 +----------------------------+--------------------------------------+ 3754 | Number of alarms displayed | Acceptability | 3755 | in 10 minutes following a | | 3756 | major network problem | | 3757 +----------------------------+--------------------------------------+ 3758 | More than 100 | Definitely excessive and very likely | 3759 | | to lead the operator to abandon | 3760 | | the use of the alarm system. | 3761 | | | 3762 | 20-100 | Hard to cope with. | 3763 | | | 3764 | Under 10 | Should be manageable - but may be | 3765 | | difficult if several of the alarms | 3766 | | require a complex operator response. | 3767 +----------------------------+--------------------------------------+ 3769 Table 5: Acceptable Alarm Rates, Burst 3771 The numbers in Table 4 and Table 5 are the sum of all alarms for a 3772 network being managed from one alarm console. So every individual 3773 system or NMS contributes to these numbers. 3775 Vendors SHOULD make sure that the following rules are used in 3776 designing the alarm interface: 3778 1. Rationalize the alarms in the system to ensure that every alarm 3779 is necessary, has a purpose, follows the cardinal rule - that 3780 it requires an operator response - and adheres to the rules of 3781 Table 3. 3783 2. Audit the quality of the alarms. Talk with the operators about 3784 how well the alarm information supports them. Do they know what 3785 to do in the event of an alarm? Are they able to quickly 3786 diagnose the problem and determine the corrective action? Does 3787 the alarm text adhere to the requirements in Table 3? 3789 3. Analyze and benchmark the performance of the system and compare 3790 it to the recommended metrics in Table 4 and Table 5. Start by 3791 identifying nuisance alarms, standing alarms at normal state and 3792 startup.
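As a non-normative illustration of rule 3, the following Python sketch rates observed alarm counts against Table 4 and Table 5. The function names and the shortened verdict strings are hypothetical and not part of this document. Table 4 lists only individual rates, so treating each row as the lower edge of a band is an assumption of this sketch. Ranges that neither table lists (between one alarm per 5 minutes and one per 10 minutes, and 10 to 19 alarms in a burst) are reported as unrated instead of being guessed.

```python
from typing import Optional


def classify_steady_state(alarms: int, minutes: int) -> Optional[str]:
    """Rate a long-term alarm rate against Table 4.

    Returns None where Table 4 gives no verdict. The banding between the
    listed rates is an assumption of this sketch, not taken from EEMUA.
    """
    if alarms < 0 or minutes <= 0:
        raise ValueError("alarms must be >= 0 and minutes must be > 0")
    # Integer cross-multiplication avoids floating-point rate comparisons.
    if alarms > minutes:            # more than one alarm per minute
        return "Very likely to be unacceptable"
    if 2 * alarms >= minutes:       # at least one alarm per 2 minutes
        return "Likely to be over-demanding"
    if 5 * alarms >= minutes:       # at least one alarm per 5 minutes
        return "Manageable"
    if 10 * alarms < minutes:       # less than one alarm per 10 minutes
        return "Very likely to be acceptable"
    return None                     # between the last two rows: not rated


def classify_burst(alarms_in_10_min: int) -> Optional[str]:
    """Rate the alarm count in the 10 minutes after a major problem (Table 5)."""
    if alarms_in_10_min < 0:
        raise ValueError("alarm count must be >= 0")
    if alarms_in_10_min > 100:
        return "Definitely excessive"
    if alarms_in_10_min >= 20:
        return "Hard to cope with"
    if alarms_in_10_min < 10:
        return "Should be manageable"
    return None                     # 10-19 alarms: not listed in Table 5


if __name__ == "__main__":
    # 30 alarms over one hour is one alarm per 2 minutes.
    print(classify_steady_state(30, 60))
    # 150 alarms in the 10 minutes following a major network problem.
    print(classify_burst(150))
```

Because Table 4 and Table 5 describe the sum of all alarms shown on one alarm console, the counts passed to these functions would be aggregated across every contributing system or NMS, not taken per device.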
3794 Authors' Addresses 3796 Stefan Vallin 3797 Stefan Vallin AB 3799 Email: stefan@wallan.se 3800 Martin Bjorklund 3801 Cisco 3803 Email: mbj@tail-f.com