Network Working Group                                          S. Vallin
Internet-Draft                                          Stefan Vallin AB
Intended status: Standards Track                            M. Bjorklund
Expires: May 10, 2019                                              Cisco
                                                        November 6, 2018


                           YANG Alarm Module
                    draft-ietf-ccamp-alarm-module-05

Abstract

   This document defines a YANG module for alarm management.  It
   includes functions for alarm list management, alarm shelving, and
   notifications to inform management systems.  There are also RPCs to
   manage the operator state of an alarm and administrative alarm
   procedures.  The module carefully maps to relevant alarm standards.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 10, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology and Notation
   2.  Objectives
   3.  Alarm Module Concepts
     3.1.  Alarm Definition
     3.2.  Alarm Type
     3.3.  Identifying the Alarming Resource
     3.4.  Identifying Alarm Instances
     3.5.  Alarm Life-Cycle
       3.5.1.  Resource Alarm Life-Cycle
       3.5.2.  Operator Alarm Life-cycle
       3.5.3.  Administrative Alarm Life-Cycle
     3.6.  Root Cause, Impacted Resources and Related Alarms
     3.7.  Alarm Shelving
     3.8.  Alarm Profiles
   4.  Alarm Data Model
     4.1.  Alarm Control
       4.1.1.  Alarm Shelving
     4.2.  Alarm Inventory
     4.3.  Alarm Summary
     4.4.  The Alarm List
     4.5.  The Shelved Alarms List
     4.6.  Alarm Profiles
     4.7.  RPCs and Actions
     4.8.  Notifications
   5.  Alarm YANG Module
   6.  X.733 Extensions
   7.  The X.733 Mapping Module
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Vendor-specific Alarm-Types Example
   Appendix B.  Alarm Inventory Example
   Appendix C.  Alarm List Example
   Appendix D.  Alarm Shelving Example
   Appendix E.  X.733 Mapping Example
   Appendix F.  Relationships to other standards
     F.1.  Relationship to RFC 8348
     F.2.  Relationship to other alarm standards
       F.2.1.  Alarm definition
       F.2.2.  Data model
   Appendix G.  Alarm Usability Requirements
   Authors' Addresses

1.  Introduction

   This document defines a YANG [RFC7950] module for alarm management.
   The purpose is to define a standardized alarm interface for network
   devices that can be easily integrated into management applications.
   The model is also applicable as a northbound alarm interface in the
   management applications.

   Alarm monitoring is a fundamental part of monitoring the network.
   Raw alarms from devices do not always tell the status of the network
   services or necessarily point to the root cause.  However, being able
   to feed alarms to the alarm management application in a standardized
   format is a starting point for performing higher-level network
   assurance tasks.

   The design of the module is based on experience from using and
   implementing available alarm standards from ITU [X.733], 3GPP
   [ALARMIRP], and ANSI [ISA182].

1.1.  Terminology and Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The following terms are defined in [RFC7950]:

   o  action

   o  client

   o  data tree

   o  RPC

   o  server

   The following terms are used within this document:

   o  Alarm (the general concept): An alarm signifies an undesirable
      state in a resource that requires corrective action.

   o  Alarm Type: An alarm type identifies a possible unique alarm state
      for a resource.  Alarm types are names that identify the state,
      such as "link-alarm", "jitter-violation", and
      "high-disk-utilization".

   o  Resource: A fine-grained identification of the alarming resource,
      for example, an interface or a process.

   o  Alarm Instance: The alarm state for a specific resource and alarm
      type, for example, (GigabitEthernet0/15, link-alarm).  An alarm
      instance is an entry in the alarm list.

   o  Alarm Inventory: A list of all possible alarm types on a system.

   o  Alarm Shelving: Blocking alarms according to specific criteria.

   o  Corrective Action: An action taken by an operator or automation
      routine in order to minimize the impact of the alarm or resolve
      the root cause.

   o  Management System: The alarm management application that consumes
      the alarms, i.e., acts as a client.

   o  System: The system that implements this YANG alarm module, i.e.,
      acts as a server.  This corresponds to a network device or a
      management application that provides a northbound alarm
      interface.

   Tree diagrams used in this document follow the notation defined in
   [RFC8340].

2.  Objectives

   The objectives for the design of the Alarm Module are:

   o  Simple to use.  If a system supports this module, it shall be
      straightforward to integrate it into a YANG-based alarm manager.

   o  View alarms as states on resources and not as discrete
      notifications.

   o  Clear definition of "alarm" in order to exclude general events
      that should not be forwarded as alarm notifications.

   o  Clear and precise identification of alarm types and alarm
      instances.

   o  A management system should be able to pull all available alarm
      types from a system, i.e., read the alarm inventory from a system.
      This makes it possible to prepare alarm operators with
      corresponding alarm instructions.

   o  Address alarm usability requirements; see Appendix G.  While the
      IETF has not really addressed alarm management, telecom standards
      have addressed it purely from a protocol perspective.  The process
      industry has published several relevant standards addressing
      requirements for a useful alarm interface: [EEMUA] and [ISA182].
      This alarm module defines usability requirements as well as a YANG
      data model.

   o  Mapping to X.733, which is a requirement for some alarm systems.
      Still, keep some of the X.733 concepts out of the core model in
      order to make the model small and easy to understand.

3.  Alarm Module Concepts

   This section defines the fundamental concepts behind the data model.
   This section is rooted in the work of Vallin et al. [ALARMSEM].

3.1.  Alarm Definition

   An alarm signifies an undesirable state in a resource that requires
   corrective action.

   There are two main things to remember from this definition:

   1.  The definition focuses on leaving out events and logging
       information in general.  Alarms should only be used for undesired
       states that require action.

   2.  The definition also focuses on alarms as a state on a resource,
       not the notifications that report the state changes.

   See Appendix F for information on how this definition relates to
   other alarm standards.

3.2.  Alarm Type

   This document defines an alarm type with an alarm type id and an
   alarm type qualifier.

   The alarm type id is modeled as a YANG identity.  With YANG
   identities, new alarm types can be defined in a distributed fashion.
   YANG identities are hierarchical, which means that a hierarchy of
   alarm types can be defined.

   Standards and vendors should define their own alarm type identities
   based on this definition.

   The use of YANG identities means that all possible alarms are
   identified at design time.  This explicit declaration of alarm types
   makes it easier to allow for alarm qualification reviews and
   preparation of alarm actions and documentation.

   There are occasions where the alarm types are not known at design
   time.  For example, a system with digital inputs may allow users to
   connect detectors (e.g., a smoke detector) to the inputs.  In this
   case, it is a configuration action that determines that certain
   connectors are, for example, fire alarms.

   In order to allow for dynamic addition of alarm types, the alarm
   module allows for further qualification of the identity-based alarm
   type using a string.  A potential drawback of this is that there is a
   big risk that alarm operators will receive unexpected alarm types;
   they may not know how to resolve the problem, since a defined alarm
   procedure does not necessarily exist.  To avoid this risk, the system
   MUST publish all possible alarm types in the alarm inventory; see
   Section 4.2.

   A vendor or standards organization can define its own alarm-type
   hierarchy.
   The example below shows a hierarchy based on X.733 event
   types:

     import ietf-alarms {
       prefix al;
     }
     identity vendor-alarms {
       base al:alarm-type;
     }
     identity communications-alarm {
       base vendor-alarms;
     }
     identity link-alarm {
       base communications-alarm;
     }

   Alarm types can be abstract.  An abstract alarm type is used as a
   base for defining hierarchical alarm types.  Concrete alarm types are
   used for alarm states and appear in the alarm inventory.  There are
   two kinds of concrete alarm types:

   1.  The last subordinate identity in the "alarm-type-id" hierarchy is
       concrete, for example: "alarm-identity.environmental-alarm.smoke".
       In this example, "alarm-identity" and "environmental-alarm" are
       abstract YANG identities, whereas "smoke" is a concrete YANG
       identity.

   2.  The YANG identity hierarchy is abstract and the concrete alarm
       type is defined by the dynamic alarm qualifier string, for
       example: "alarm-identity.environmental-alarm.external-detector"
       with alarm-type-qualifier "smoke".

   For example:

     // Alternative 1: concrete alarm type identity
     import ietf-alarms {
       prefix al;
     }
     identity environmental-alarm {
       base al:alarm-type;
       description "Abstract alarm type";
     }
     identity smoke {
       base environmental-alarm;
       description "Concrete alarm type";
     }

     // Alternative 2: concrete alarm type qualifier
     import ietf-alarms {
       prefix al;
     }
     identity environmental-alarm {
       base al:alarm-type;
       description "Abstract alarm type";
     }
     identity external-detector {
       base environmental-alarm;
       description
         "Abstract alarm type, a run-time configuration
          procedure sets the type of alarm detected.  This will
          be reported in the alarm-type-qualifier.";
     }

   A server SHOULD strive to minimize the number of dynamically defined
   alarm types.

3.3.  Identifying the Alarming Resource

   It is of vital importance to be able to refer to the alarming
   resource.  This reference must be as fine-grained as possible.  If
   the alarming resource exists in the data tree, then an
   instance-identifier MUST be used with the full path to the object.

   When the module is used in a controller/orchestrator/manager, the
   original device resource identification can be modified to include
   the device in the path.  The details depend on how devices are
   identified and are out of scope for this specification.

   Example:

      The original device alarm might identify the resource as
      "/dev:interfaces/dev:interface[dev:name='FastEthernet1/0']".

      The resource identification in the manager could look something
      like: "/mgr:devices/mgr:device[mgr:name='xyz123']/dev:interfaces/
      dev:interface[dev:name='FastEthernet1/0']"

   This module also allows for alternate naming of the alarming resource
   if it is not available in the data tree.

3.4.  Identifying Alarm Instances

   A primary goal of this alarm module is to remove any ambiguity in how
   alarm notifications are mapped to an update of an alarm instance.
   X.733 and especially 3GPP were not really clear on this point.  This
   YANG alarm module states that the tuple (resource, alarm type
   identifier, alarm type qualifier) corresponds to a single alarm
   instance.  This means that alarm notifications for the same resource
   and same alarm type are matched to update the same alarm instance.
   These three leafs are therefore used as the key in the alarm list:

     list alarm {
       key "resource alarm-type-id alarm-type-qualifier";
       ...
     }

3.5.  Alarm Life-Cycle

   The alarm model clearly separates the resource alarm life-cycle from
   the operator and administrative life-cycles of an alarm.

   o  resource alarm life-cycle: the alarm instrumentation that controls
      alarm raise, clearance, and severity changes.

   o  operator alarm life-cycle: operators acting upon alarms with
      actions like acknowledgment and closing.  Closing an alarm implies
      that the operator considers the corrective action performed.
      Operators can also shelve (block/filter) alarms in order to avoid
      nuisance alarms.

   o  administrative alarm life-cycle: purging (deleting) unwanted
      alarms and compressing the alarm status change list.  This module
      exposes operations to manage the administrative life-cycle.  The
      server may also perform these operations based on other policies,
      but how that is done is out of scope for this document.

   A server SHOULD describe how long it retains cleared/closed alarms:
   until they are manually purged, or according to an automatic removal
   policy.

3.5.1.  Resource Alarm Life-Cycle

   From a resource perspective, an alarm can, for example, have the
   following life-cycle: raise, change severity, change severity, clear,
   be raised again, etc.  All of these status changes can have
   different alarm texts generated by the instrumentation.  Two
   important things to note:

   1.  Alarms are not deleted when they are cleared.  Deleting alarms is
       an administrative process.  The alarm module defines an rpc
       "purge-alarms" that deletes alarms.

   2.  Alarms are not cleared by operators; only the underlying
       instrumentation can clear an alarm.  Operators can close alarms.

   The YANG tree representation below illustrates the resource-oriented
   life-cycle:

     +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
        ...
        +--ro is-cleared            boolean
        +--ro last-changed          yang:date-and-time
        +--ro perceived-severity    severity
        +--ro alarm-text            alarm-text
        +--ro status-change* [time] {alarm-history}?
           +--ro time                  yang:date-and-time
           +--ro perceived-severity    severity-with-clear
           +--ro alarm-text            alarm-text

   For every status change from the resource perspective, a row is added
   to the "status-change" list.
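
   The resource life-cycle can be illustrated with a small Python
   sketch.  It is non-normative: the names "Alarm", "report", and
   "MAX_STATUS_CHANGES" are hypothetical, times are plain strings, and
   the fixed bound of 16 only stands in for the configurable
   "max-alarm-status-changes" leaf described in Section 4.1.  Alarm
   instances are keyed by (resource, alarm-type-id,
   alarm-type-qualifier), so repeated reports for the same key update
   one instance.  The sketch assumes that a clear sets "is-cleared"
   while "perceived-severity" keeps its last non-cleared value.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Tuple

# Alarm instance key: (resource, alarm-type-id, alarm-type-qualifier).
AlarmKey = Tuple[str, str, str]

# Hypothetical fixed bound; stands in for max-alarm-status-changes.
MAX_STATUS_CHANGES = 16


@dataclass
class Alarm:
    is_cleared: bool = False
    last_changed: str = ""
    perceived_severity: str = ""  # never "cleared" (assumption: keeps last value)
    alarm_text: str = ""
    # Rows of (time, severity-with-clear, alarm-text); the deque drops
    # the oldest row once the bound is reached (a circular list).
    status_change: Deque[Tuple[str, str, str]] = field(
        default_factory=lambda: deque(maxlen=MAX_STATUS_CHANGES))


def report(alarms: Dict[AlarmKey, Alarm], key: AlarmKey,
           severity: str, text: str, time: str) -> Alarm:
    """Apply one resource status change: raise, severity change, or clear."""
    alarm = alarms.get(key)
    if alarm is None:  # first report for this key creates the instance
        alarm = alarms[key] = Alarm()
    alarm.status_change.append((time, severity, text))
    alarm.last_changed = time
    alarm.alarm_text = text
    if severity == "cleared":
        alarm.is_cleared = True  # clearance is a flag, not a severity
    else:
        alarm.is_cleared = False
        alarm.perceived_severity = severity
    return alarm


if __name__ == "__main__":
    alarms: Dict[AlarmKey, Alarm] = {}
    key = ("GigabitEthernet0/25", "link-alarm", "")
    report(alarms, key, "major", "Interface GigabitEthernet0/25 down", "T1")
    a = report(alarms, key, "cleared", "Interface GigabitEthernet0/25 up", "T2")
    print(len(alarms), a.is_cleared, a.perceived_severity, len(a.status_change))
    # prints: 1 True major 2
```

   Both reports land on the same alarm instance because they share the
   same key; the clear adds a row to the history but leaves the alarm
   in the list.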

   The last status values are also represented as leafs of the alarm.
   Note well that the alarm severity does not include "cleared"; alarm
   clearance is a boolean flag.

   An alarm can therefore look like this: ((GigabitEthernet0/25,
   link-alarm, ""), false, T, major, "Interface GigabitEthernet0/25
   down"), where T is the time of the last change.

3.5.2.  Operator Alarm Life-cycle

   Operators can also act upon alarms using the "set-operator-state"
   action:

     +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
        ...
        +--ro operator-state-change* [time] {operator-actions}?
        |  +--ro time        yang:date-and-time
        |  +--ro operator    string
        |  +--ro state       operator-state
        |  +--ro text?       string
        +---x set-operator-state {operator-actions}?
           +---w input
              +---w state    writable-operator-state
              +---w text?    string

   The operator state for an alarm can be: "none", "ack", "shelved", and
   "closed".  Alarm deletion (using the rpc "purge-alarms") can use this
   state as a criterion.  A closed alarm is an alarm where the operator
   has performed any required corrective actions.  Closed alarms are
   good candidates for being purged.

3.5.3.  Administrative Alarm Life-Cycle

   Deleting alarms from the alarm list is considered an administrative
   action.  This is supported by the "purge-alarms" rpc.  The
   "purge-alarms" rpc takes a filter as input.  The filter selects
   alarms based on the operator and resource life-cycle, such as "all
   closed cleared alarms older than a time specification".  The server
   may also perform these operations based on other policies, but how
   that is done is out of scope for this document.

   Purged alarms are removed from the alarm list.  Note well: if the
   alarm resource state changes after a purge, the alarm will reappear
   in the alarm list.

   Alarms can be compressed.  Compressing an alarm deletes all entries
   in the alarm's "status-change" list except for the last status
   change.
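
   The two administrative operations can be sketched in Python as
   follows.  This is a non-normative sketch: "AlarmEntry",
   "purge_closed_cleared", and "compress" are hypothetical names, the
   operator state is simplified to a single field holding the latest
   value, times are plain integers, and the filter hard-codes the one
   example criterion from the text ("all closed cleared alarms older
   than a time specification") rather than the full filter accepted by
   the "purge-alarms" rpc.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (resource, alarm-type-id, alarm-type-qualifier)
AlarmKey = Tuple[str, str, str]


@dataclass
class AlarmEntry:
    is_cleared: bool
    operator_state: str  # latest operator state: "none", "ack", "shelved", "closed"
    last_changed: int    # simplified timestamp
    status_change: List[Tuple[int, str]] = field(default_factory=list)


def purge_closed_cleared(alarms: Dict[AlarmKey, AlarmEntry],
                         older_than: int) -> int:
    """Delete closed, cleared alarms last changed before 'older_than'.

    Returns the number of purged alarms."""
    doomed = [key for key, a in alarms.items()
              if a.is_cleared and a.operator_state == "closed"
              and a.last_changed < older_than]
    for key in doomed:
        del alarms[key]
    return len(doomed)


def compress(alarm: AlarmEntry) -> None:
    """Keep only the most recent entry in the status-change list."""
    del alarm.status_change[:-1]
```

   Purging only removes the list entry; if the resource later reports a
   new status change for the same key, the alarm is simply created
   again, which matches the reappearance rule above.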

   A client can request compression using the "compress-alarms" rpc.
   The server may also perform these operations based on other
   policies, but how that is done is out of scope for this document.

3.6.  Root Cause, Impacted Resources and Related Alarms

   The general principle of this alarm module is to limit the number of
   alarms.  In many cases, several resources are affected by a given
   underlying problem.  A full disk will of course impact databases and
   applications as well.  The recommendation is to have a single alarm
   for the underlying problem and list the affected resources in the
   alarm, rather than having separate alarms for each resource.

   The alarm has one leaf-list to identify possible impacted resources
   ("impacted-resource") and a leaf-list to identify possible root-cause
   resources ("root-cause-resource").  These serve as hints only.  It is
   up to the client application to use this information to present the
   overall status.  Using the disk full example, a "good" alarm would be
   to use the hard disk partition as the alarming resource and add the
   database and applications into the "impacted-resource" leaf-list.

   A system should always strive to identify the resource that can be
   acted upon as the "resource" leaf.  The "impacted-resource" leaf-list
   shall be used to identify any side effects of the alarm.  The
   impacted resources cannot be acted upon to fix the problem.  The disk
   full example above illustrates the principle; you cannot fix the
   underlying issue by database operations.  However, you need to pay
   attention to the database to perform any operations that limit the
   impact of the problem.

   On some occasions, the system might not be capable of detecting the
   root cause, i.e., the resource that can be acted upon.  The
   instrumentation in this case only monitors the side effect and needs
   to represent an alarm that indicates a situation that needs acting
   upon.
   The instrumentation still might identify possible candidates for the
   root-cause resource.  In this case, the "root-cause-resource"
   leaf-list can be used to indicate the candidate root-cause resources.
   An example of this kind of alarm might be an active test tool that
   detects an SLA violation on a VPN connection and identifies the
   devices along the chain as candidate root causes.

   The alarm module also supports a way to associate different alarms to
   each other with the "related-alarm" list.  This list enables the
   server to inform the client that certain alarms are related to other
   alarms.

   Note well that this module does not prescribe any dependencies or
   preference between the above alarm correlation mechanisms.  Different
   systems have different capabilities, and the mechanisms described
   above are available to support the instrumentation features.

3.7.  Alarm Shelving

   Alarm shelving is an important function in order for alarm management
   applications and operators to stop superfluous alarms.  Shelving
   means that any alarms fulfilling specific criteria are ignored
   (blocked/filtered).  Shelved alarms appear in a dedicated shelved
   alarm list in order not to disturb the relevant alarms.  Shelved
   alarms do not generate notifications.

3.8.  Alarm Profiles

   Alarm profiles are used to configure further information for an alarm
   type.  This module supports configuring severity levels overriding
   the system default levels.  This corresponds to the Alarm Severity
   Assignment Profile (ASAP) functionality in M.3100 [M.3100] and M.3160
   [M.3160].  Other standard or enterprise modules can augment this list
   with further alarm type information.

4.  Alarm Data Model

   The fundamental parts of the data model are the "alarm-list" with
   associated notifications and the "alarm-inventory" list of all
   possible alarm types.  These MUST be implemented by a system.
   The rest of the data model is made conditional with the YANG features
   "operator-actions", "alarm-shelving", "alarm-history",
   "alarm-summary", "alarm-profile", and "severity-assignment".

   The data model has the following overall structure:

     +--rw control
     |  +--rw max-alarm-status-changes?   union
     |  +--rw (notify-status-changes)?
     |  |  ...
     |  +--rw alarm-shelving {alarm-shelving}?
     |     ...
     +--ro alarm-inventory
     |  +--ro alarm-type* [alarm-type-id alarm-type-qualifier]
     |     ...
     +--ro summary {alarm-summary}?
     |  +--ro alarm-summary* [severity]
     |  |  ...
     |  +--ro shelves-active?   empty {alarm-shelving}?
     +--ro alarm-list
     |  +--ro number-of-alarms?   yang:gauge32
     |  +--ro last-changed?       yang:date-and-time
     |  +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
     |     ...
     +--ro shelved-alarms {alarm-shelving}?
     |  +--ro number-of-shelved-alarms?      yang:gauge32
     |  +--ro shelved-alarms-last-changed?   yang:date-and-time
     |  +--ro shelved-alarm*
     |          [resource alarm-type-id alarm-type-qualifier]
     |     ...
     +--rw alarm-profile*
             [alarm-type-id alarm-type-qualifier-match resource]
             {alarm-profile}?
        +--rw alarm-type-id                        al:alarm-type-id
        +--rw alarm-type-qualifier-match           string
        +--rw resource                             al:resource-match
        +--rw description                          string
        +--rw alarm-severity-assignment-profile
                {severity-assignment}?
           ...

4.1.  Alarm Control

   The "/alarms/control/notify-status-changes" choice controls whether
   notifications are sent for all state changes, only for raise and
   clear, or only for notifications more severe than a configured level.
   This feature, in combination with alarm shelving, corresponds to the
   ITU Alarm Report Control functionality.

   Every alarm has a list of status changes; this is a circular list.
   The length of this list is controlled by
   "/alarms/control/max-alarm-status-changes".

4.1.1.  Alarm Shelving

   The shelving control tree is shown below:

     +--rw control
        +--rw alarm-shelving {alarm-shelving}?
           +--rw shelf* [name]
              +--rw name                          string
              +--rw resource*                     resource-match
              +--rw alarm-type-id?                alarm-type-id
              +--rw alarm-type-qualifier-match?   string
              +--rw description?                  string

   Shelved alarms are shown in a dedicated shelved alarm list.  The
   instrumentation MUST move shelved alarms from the alarm list
   (/alarms/alarm-list) to the shelved alarm list
   (/alarms/shelved-alarms/).  Shelved alarms do not generate any
   notifications.  When the shelving criteria are removed or changed,
   the alarm list MUST be updated to the correct actual state of the
   alarms.

   Shelving and unshelving can only be performed by editing the shelf
   configuration.  It cannot be performed on individual alarms.  The
   server will add an operator state indicating that the alarm was
   shelved/unshelved.

   A leaf (/alarms/summary/shelves-active) in the alarm summary
   indicates if there are shelved alarms.

   A system can choose not to support the shelving feature.

4.2.  Alarm Inventory

   The alarm inventory represents all possible alarm types that may
   occur in the system.  A management system may use this to build alarm
   procedures.  The alarm inventory is relevant for several reasons:

      The system might not instrument all defined alarm type identities,
      and some alarm identities are abstract.

      The system has configured dynamic alarm types using the alarm
      qualifier.  The inventory makes it possible for the management
      system to discover these.

   Note that the mechanism whereby dynamic alarm types are added using
   the alarm type qualifier MUST populate this list.

   The optional leaf-list "resource" in the alarm inventory enables the
   system to publish for which resources a given alarm type may appear.
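
   The relationship between dynamic alarm types and the inventory can
   be sketched in Python.  This is a non-normative sketch with
   hypothetical names ("AlarmInventory", "configure_detector", and the
   "input-3" connector), reusing the smoke detector example from
   Section 3.2.  Resource restrictions are approximated by simple string
   prefixes; the sketch does not implement the module's
   "resource-match" type.  It shows that the configuration step that
   creates a dynamic qualifier also adds the corresponding
   (alarm-type-id, alarm-type-qualifier) entry, so a management system
   can discover the new type before any alarm of that type is raised.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (alarm-type-id, alarm-type-qualifier)
InventoryKey = Tuple[str, str]


@dataclass
class InventoryEntry:
    resources: List[str] = field(default_factory=list)  # empty: any resource
    description: str = ""


class AlarmInventory:
    def __init__(self) -> None:
        self.types: Dict[InventoryKey, InventoryEntry] = {}

    def declare(self, type_id: str, qualifier: str = "",
                resources: Optional[List[str]] = None,
                description: str = "") -> None:
        """Publish a concrete alarm type (optionally limited to resources)."""
        self.types[(type_id, qualifier)] = InventoryEntry(
            list(resources or []), description)

    def allows(self, resource: str, type_id: str, qualifier: str) -> bool:
        """True if this alarm type is published for this resource."""
        entry = self.types.get((type_id, qualifier))
        if entry is None:
            return False  # not in the inventory: operators have no procedure
        # Prefix matching is an illustrative stand-in for resource-match.
        return not entry.resources or any(
            resource.startswith(prefix) for prefix in entry.resources)


def configure_detector(inv: AlarmInventory, connector: str, kind: str) -> None:
    """Hypothetical run-time configuration of a digital input.

    Creating the dynamic alarm type also populates the inventory."""
    inv.declare("external-detector", kind, [connector],
                f"{kind} detector on {connector}")
```

   Note that the abstract identity "external-detector" on its own (with
   an empty qualifier) never appears in this inventory; only the
   concrete, qualified type does.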
646 A server MUST implement the alarm inventory in order to enable 647 controlled alarm procedures in the client. 649 The alarm inventory tree is shown below: 651 +--ro alarm-inventory 652 +--ro alarm-type* [alarm-type-id alarm-type-qualifier] 653 +--ro alarm-type-id alarm-type-id 654 +--ro alarm-type-qualifier alarm-type-qualifier 655 +--ro resource* resource-match 656 +--ro has-clear boolean 657 +--ro severity-levels* severity 658 +--ro description string 660 4.3. Alarm Summary 662 The alarm summary list summarizes alarms per severity; how many 663 cleared, cleared and closed, and closed. It also gives an indication 664 if there are shelved alarms. 666 The alarm summary tree is shown below: 668 +--ro summary {alarm-summary}? 669 +--ro alarm-summary* [severity] 670 | +--ro severity severity 671 | +--ro total? yang:gauge32 672 | +--ro not-cleared? yang:gauge32 673 | +--ro cleared? yang:gauge32 674 | +--ro cleared-not-closed? yang:gauge32 675 | | {operator-actions}? 676 | +--ro cleared-closed? yang:gauge32 677 | | {operator-actions}? 678 | +--ro not-cleared-closed? yang:gauge32 679 | | {operator-actions}? 680 | +--ro not-cleared-not-closed? yang:gauge32 681 | {operator-actions}? 682 +--ro shelves-active? empty {alarm-shelving}? 684 4.4. The Alarm List 686 The alarm list (/alarms/alarm-list) is a function from (resource, 687 alarm type, alarm type qualifier) to the current composite alarm 688 state. The composite state includes states for the resource life- 689 cycle such as severity, clearance flag and operator states such as 690 acknowledgment. This means that for a given resource and alarm-type 691 the alarm list shows the current states of the alarm such as 692 acknowledged and cleared status. 694 +--ro alarm-list 695 +--ro number-of-alarms? yang:gauge32 696 +--ro last-changed? 
yang:date-and-time 697 +--ro alarm* [resource alarm-type-id alarm-type-qualifier] 698 +--ro resource resource 699 +--ro alarm-type-id alarm-type-id 700 +--ro alarm-type-qualifier alarm-type-qualifier 701 +--ro alt-resource* resource 702 +--ro related-alarm* 703 | [resource alarm-type-id alarm-type-qualifier] 704 | +--ro resource 705 | | -> /alarms/alarm-list/alarm/resource 706 | +--ro alarm-type-id leafref 707 | +--ro alarm-type-qualifier leafref 708 +--ro impacted-resource* resource 709 +--ro root-cause-resource* resource 710 +--ro time-created yang:date-and-time 711 +--ro is-cleared boolean 712 +--ro last-changed yang:date-and-time 713 +--ro perceived-severity severity 714 +--ro alarm-text alarm-text 715 +--ro status-change* [time] {alarm-history}? 716 | +--ro time yang:date-and-time 717 | +--ro perceived-severity severity-with-clear 718 | +--ro alarm-text alarm-text 719 +--ro operator-state-change* [time] {operator-actions}? 720 | +--ro time yang:date-and-time 721 | +--ro operator string 722 | +--ro state operator-state 723 | +--ro text? string 724 +---x set-operator-state {operator-actions}? 725 | +---w input 726 | +---w state writable-operator-state 727 | +---w text? string 728 +---n operator-action {operator-actions}? 729 +-- time yang:date-and-time 730 +-- operator string 731 +-- state operator-state 732 +-- text? string 734 Every alarm has three important states, the resource clearance state 735 "is-cleared", the severity "perceived-severity" and the operator 736 state available in the operator state change list. 738 In order to see the alarm history the resource state changes are 739 available in the "status-change" list and the operator history is 740 available in the "operator-state-change" list. 742 4.5. The Shelved Alarms List 744 The shelved alarm list has the same structure as the alarm list 745 above. It shows all the alarms that matches the shelving criteria 746 (/alarms/control/alarm-shelving). 748 4.6. 
Alarm Profiles 750 The alarm profile list (/alarms/alarm-profile/) holds configurable 751 alarm types. The list supports configurable alarm severity levels in 752 the container "alarm-severity-assignment-profile". If an alarm 753 matches the configured alarm type, it MUST use the configured severity 754 level(s) instead of the system default. This configuration MUST also 755 be represented in the alarm inventory. 757 +--rw alarm-profile* 758 [alarm-type-id alarm-type-qualifier-match resource] 759 {alarm-profile}? 760 +--rw alarm-type-id al:alarm-type-id 761 +--rw alarm-type-qualifier-match string 762 +--rw resource al:resource-match 763 +--rw description string 764 +--rw alarm-severity-assignment-profile 765 {severity-assignment}? 766 +--rw severity-levels* al:severity 768 4.7. RPCs and Actions 770 The alarm module supports RPCs and actions to manage the alarms: 772 "purge-alarms" (rpc): delete alarms according to specific 773 criteria, for example, all cleared alarms older than a specific 774 date. 776 "compress-alarms" (rpc): compress the status-change list for the 777 alarms. 779 "set-operator-state" (action): change the operator state for an 780 alarm, for example, to acknowledge it. 782 4.8. Notifications 784 The alarm module supports a general notification to report alarm 785 state changes. It carries all relevant parameters for the alarm 786 management application. 788 There is also a notification to report that an operator changed the 789 operator state on an alarm, for example, by acknowledging it. 791 If the alarm inventory is changed, for example, when a new card type is 792 inserted, a notification will tell the management application that 793 new alarm types are available. 795 5. Alarm YANG Module 797 This YANG module references [RFC6991].
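The general alarm notification described in Section 4.8 can be throttled by the 'notify-status-changes' choice in /alarms/control, defined in the module below. The following Python sketch replays the worked example from that choice's description for the 'notify-severity-level' option. It is a hypothetical illustration, not part of the module: the function names 'should_notify' and 'notified_times' and the (time, severity, cleared) tuple encoding are assumptions, and the rule it encodes (notify at or above the configured level, on a downward crossing of that level, and on every clear) is one reading of the description text.

```python
from typing import List, Optional, Tuple

# Numeric values from the 'severity' typedef in the module below.
# Only comparisons against the configured level matter for this rule.
SEVERITY = {"indeterminate": 2, "minor": 3, "warning": 4,
            "major": 5, "critical": 6}

# Hypothetical encoding of one alarm state change: (time, severity, cleared).
Change = Tuple[str, str, bool]


def should_notify(level: str, prev: Optional[str], new: str,
                  cleared: bool) -> bool:
    """Decide whether one state change is reported under
    'notify-severity-level' (one reading of the description).

    prev is the severity before the change, or None if the alarm
    was not active (first raise, or re-raise after a clear).
    """
    if cleared:
        return True  # clear notifications are always sent
    threshold = SEVERITY[level]
    if SEVERITY[new] >= threshold:
        return True  # change to a severity at or above the level
    # Below the level: notify only when the change crosses it downward.
    return prev is not None and SEVERITY[prev] >= threshold


def notified_times(level: str, changes: List[Change]) -> List[str]:
    """Replay a sequence of state changes and collect notification times."""
    sent: List[str] = []
    prev: Optional[str] = None
    for time, severity, cleared in changes:
        if should_notify(level, prev, severity, cleared):
            sent.append(time)
        # After a clear the alarm is no longer active.
        prev = None if cleared else severity
    return sent


# The worked example from the 'notify-status-changes' description.
changes = [("T1", "major", False), ("T2", "minor", False),
           ("T3", "warning", False), ("T4", "minor", False),
           ("T5", "major", False), ("T6", "critical", False),
           ("T7", "major", False), ("T8", "major", True)]

print(notified_times("major", changes))
# prints ['T1', 'T2', 'T5', 'T6', 'T7', 'T8']
```

Resetting 'prev' after a clear means that a re-raise below the configured level is not treated as a downward crossing. The description does not state this explicitly, so it is an assumption of the sketch; it does not affect the worked example.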
799 file "ietf-alarms@2018-11-06.yang" 800 module ietf-alarms { 801 yang-version 1.1; 802 namespace "urn:ietf:params:xml:ns:yang:ietf-alarms"; 803 prefix al; 805 import ietf-yang-types { 806 prefix yang; 807 reference "RFC 6991: Common YANG Data Types."; 808 } 810 organization 811 "IETF CCAMP Working Group"; 812 contact 813 "WG Web: 814 WG List: 816 Editor: Stefan Vallin 817 819 Editor: Martin Bjorklund 820 "; 821 description 822 "This module defines an interface for managing alarms. Main 823 inputs to the module design are the 3GPP Alarm IRP, ITU-T X.733 824 and ANSI/ISA-18.2 alarm standards. 826 Main features of this module include: 828 * Alarm list: 829 A list of all alarms. Cleared alarms stay in 830 the list until explicitly purged. 832 * Operator actions on alarms: 833 Acknowledging and closing alarms. 835 * Administrative actions on alarms: 836 Purging alarms from the list according to specific 837 criteria. 839 * Alarm inventory: 840 A management application can read all 841 alarm types implemented by the system. 843 * Alarm shelving: 844 Shelving (blocking) alarms according 845 to specific criteria. 847 * Alarm profiles: 848 A management system can attach further 849 information to alarm types, for example 850 overriding system default severity 851 levels. 853 This module uses a stateful view on alarms. An alarm is a state 854 for a specific resource (note that an alarm is not a 855 notification). An alarm type is a possible alarm state for a 856 resource. For example, the tuple: 858 ('link-alarm', 'GigabitEthernet0/25') 860 is an alarm of type 'link-alarm' on the resource 861 'GigabitEthernet0/25'. 863 Alarm types are identified using YANG identities and an optional 864 string-based qualifier. The string-based qualifier allows for 865 dynamic extension of the statically defined alarm types. Alarm 866 types identify a possible alarm state and not the individual 867 notifications. 
For example, the traditional 'link-down' and 868 'link-up' notifications are two notifications referring to the 869 same alarm type 'link-alarm'. 871 With this design there is no ambiguity about how alarm and alarm 872 clear correlation should be performed: notifications that report 873 the same resource and alarm type are considered updates of the 874 same alarm, e.g., clearing an active alarm or changing the 875 severity of an alarm. 877 The instrumentation can update 'severity' and 'alarm-text' on an 878 existing alarm. The above alarm example can therefore look 879 like: 881 (('link-alarm', 'GigabitEthernet0/25'), 882 warning, 883 'interface down while interface admin state is up') 885 There is a clear separation between updates on the alarm from 886 the underlying resource, like clear, and updates from an 887 operator, like acknowledging or closing an alarm: 889 (('link-alarm', 'GigabitEthernet0/25'), 890 warning, 891 'interface down while interface admin state is up', 892 cleared, 893 closed) 895 Administrative actions like removing closed alarms older than a 896 given time are supported. 898 This alarm module does not define how the underlying 899 instrumentation detects and clears the specific alarms. 900 That belongs to the SDO or enterprise that owns that 901 specific technology. 903 Copyright (c) 2018 IETF Trust and the persons identified as 904 authors of the code. All rights reserved. 906 Redistribution and use in source and binary forms, with or 907 without modification, is permitted pursuant to, and subject to 908 the license terms contained in, the Simplified BSD License set 909 forth in Section 4.c of the IETF Trust's Legal Provisions 910 Relating to IETF Documents 911 (https://trustee.ietf.org/license-info).
913 The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL 914 NOT', 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'MAY', and 915 'OPTIONAL' in the module text are to be interpreted as described 916 in RFC 2119 (https://tools.ietf.org/html/rfc2119). 918 This version of this YANG module is part of RFC XXXX 919 (https://tools.ietf.org/html/rfcXXXX); see the RFC itself for 920 full legal notices."; 922 revision 2018-11-06 { 923 description 924 "Initial revision."; 926 reference "RFC XXXX: YANG Alarm Module"; 927 } 929 /* 930 * Features 931 */ 933 feature operator-actions { 934 description 935 "This feature indicates that the system supports operator 936 states on alarms."; 937 } 939 feature alarm-shelving { 940 description 941 "This feature indicates that the system supports shelving 942 (blocking) alarms."; 943 } 945 feature alarm-history { 946 description 947 "This feature indicates that the server maintains a history of 948 state changes for each alarm. For example, if an alarm 949 toggles between cleared and active 10 times, these state 950 changes are present in a separate list in the alarm."; 951 } 953 feature alarm-summary { 954 description 955 "This feature indicates that the server summarizes the number 956 of alarms per severity and operator state."; 957 } 959 feature alarm-profile { 960 description 961 "The system allows clients to configure further information 962 for each alarm type."; 963 } 965 feature severity-assignment { 966 description 967 "The system supports configurable alarm severity levels."; 968 reference 969 "M.3160/M.3100 Alarm Severity Assignment Profile, ASAP"; 970 } 972 /* 973 * Identities 974 */ 976 identity alarm-type-id { 977 description 978 "Base identity for alarm types. A unique identification of the 979 alarm, not including the resource. Different resources can 980 share alarm types. If the resource reports the same alarm 981 type, it is considered to be the same alarm.
The alarm 982 type is a simplification of the different X.733 and 3GPP alarm 983 IRP alarm correlation mechanisms, and it allows for 984 hierarchical extensions. 986 A string-based qualifier can be used in addition to the 987 identity in order to have different alarm types based on 988 information not known at design time, such as values in 989 textual SNMP Notification var-binds. 991 Standards and vendors can define sub-identities to clearly 992 identify specific alarm types. 994 This identity is abstract and MUST NOT be used for alarms."; 995 } 997 /* 998 * Common types 999 */ 1001 typedef resource { 1002 type union { 1003 type instance-identifier { 1004 require-instance false; 1005 } 1006 type yang:object-identifier; 1007 type string; 1008 type yang:uuid; 1009 } 1010 description 1011 "This is an identification of the alarming resource, such as an 1012 interface. It should be as fine-grained as possible both to 1013 guide the operator and to guarantee uniqueness of the alarms. 1015 If the alarming resource is modelled in YANG, this type will 1016 be an instance-identifier. 1018 If the resource is an SNMP object, the type will be an 1019 object-identifier. 1021 If the resource is anything else, for example, a distinguished 1022 name or a CIM path, this type will be a string. 1024 If the alarming object is identified by a UUID, use the uuid 1025 type. Be cautious when using this type, since a UUID is hard 1026 to use for an operator. 1028 If the server supports several models, the precedence should 1029 be in the order as given in the union definition."; 1030 } 1032 typedef resource-match { 1033 type union { 1034 type yang:xpath1.0; 1035 type yang:object-identifier; 1036 type string; 1037 } 1038 description 1039 "This type is used to match resources of type 'resource'. 1040 Since the type 'resource' is a union of different types, 1041 the 'resource-match' type is also a union of corresponding 1042 types.
1044 If the type is given as an XPath 1.0 expression, a resource 1045 of type 'instance-identifier' matches if the instance is part 1046 of the node set that is the result of evaluating the XPath 1.0 1047 expression. For example, the XPath 1.0 expression: 1049 /if:interfaces/if:interface[if:type='ianaift:ethernetCsmacd'] 1051 would match the resource instance-identifier: 1053 /if:interfaces/if:interface[if:name='eth1'], 1055 assuming that the interface 'eth1' is of type 1056 'ianaift:ethernetCsmacd'. 1058 If the type is given as an object identifier, a resource of 1059 type 'object-identifier' matches if the match object 1060 identifier is a prefix of the resource's object identifier. 1061 For example, the value: 1063 1.3.6.1.2.1.2.2 1065 would match the resource object identifier: 1067 1.3.6.1.2.1.2.2.1.1.5 1069 If the type is given as a string, it is interpreted 1070 as a W3C regular expression, which matches a resource of type 1071 'yang:uuid' or 'string' if the given regular expression 1072 matches the resource string. 1074 If the type is given as an XPath expression, it is evaluated 1075 in the following XPath context: 1077 o The set of namespace declarations is those in scope on 1078 the leaf element where this type is used. 1080 o The set of variable bindings is empty. 1082 o The function library is the core function library 1083 and the functions defined in Section 10 of RFC 7950. 1085 o The context node is the root node in the data tree."; 1086 } 1088 typedef alarm-text { 1089 type string; 1090 description 1091 "The string used to inform operators about the alarm. This 1092 MUST contain enough information for an operator to be able 1093 to understand the problem and how to resolve it.
If this 1094 string contains structure, this format should be clearly 1095 documented for programs to be able to parse that 1096 information."; 1097 } 1099 typedef severity { 1100 type enumeration { 1101 enum indeterminate { 1102 value 2; 1103 description 1104 "Indicates that the severity level could not be 1105 determined. This level SHOULD be avoided."; 1106 } 1107 enum minor { 1108 value 3; 1109 description 1110 "The 'minor' severity level indicates the existence of a 1111 non-service affecting fault condition and that corrective 1112 action should be taken in order to prevent a more serious 1113 (for example, service affecting) fault. Such a severity 1114 can be reported, for example, when the detected alarm 1115 condition is not currently degrading the capacity of the 1116 resource."; 1117 } 1118 enum warning { 1119 value 4; 1120 description 1121 "The 'warning' severity level indicates the detection of a 1122 potential or impending service affecting fault, before any 1123 significant effects have been felt. Action should be 1124 taken to further diagnose (if necessary) and correct the 1125 problem in order to prevent it from becoming a more 1126 serious service affecting fault."; 1127 } 1128 enum major { 1129 value 5; 1130 description 1131 "The 'major' severity level indicates that a service 1132 affecting condition has developed and an urgent corrective 1133 action is required. Such a severity can be reported, for 1134 example, when there is a severe degradation in the 1135 capability of the resource and its full capability must be 1136 restored."; 1137 } 1138 enum critical { 1139 value 6; 1140 description 1141 "The 'critical' severity level indicates that a service 1142 affecting condition has occurred and an immediate 1143 corrective action is required. 
Such a severity can be 1144 reported, for example, when a resource becomes totally out 1145 of service and its capability must be restored."; 1146 } 1147 } 1148 description 1149 "The severity level of the alarm. Note that the value 'clear' 1150 is not included. Whether an alarm is cleared or not is a separate 1151 boolean flag."; 1152 reference 1153 "ITU Recommendation X.733: Information Technology 1154 - Open Systems Interconnection 1155 - System Management: Alarm Reporting Function"; 1156 } 1158 typedef severity-with-clear { 1159 type union { 1160 type enumeration { 1161 enum cleared { 1162 value 1; 1163 description 1164 "The alarm is cleared by the instrumentation."; 1165 } 1167 } 1168 type severity; 1169 } 1170 description 1171 "The severity level of the alarm including clear. 1172 This is used *only* in the 'status-change' list and in notifications reporting state changes 1173 for an alarm."; 1174 } 1176 typedef writable-operator-state { 1177 type enumeration { 1178 enum none { 1179 value 1; 1180 description 1181 "The alarm is not being taken care of."; 1182 } 1183 enum ack { 1184 value 2; 1185 description 1186 "The alarm is being taken care of. Corrective action not 1187 taken yet, or failed."; 1188 } 1189 enum closed { 1190 value 3; 1191 description 1192 "Corrective action taken successfully."; 1193 } 1194 } 1195 description 1196 "Operator states on an alarm. The 'closed' state indicates 1197 that an operator considers the alarm to be resolved. This 1198 is separate from the alarm's 'is-cleared' leaf."; 1199 } 1201 typedef operator-state { 1202 type union { 1203 type writable-operator-state; 1204 type enumeration { 1205 enum shelved { 1206 value 4; 1207 description 1208 "The alarm is shelved. Alarms in /alarms/shelved-alarms/ 1209 MUST be assigned this operator state by the server as 1210 the last entry in the operator-state-change list.
The 1211 text for that entry SHOULD include the shelf name."; 1212 } 1213 enum un-shelved { 1214 value 5; 1215 description 1216 "The alarm is moved back to 'alarm-list' from a shelf. 1217 Alarms that are moved from /alarms/shelved-alarms/ to 1218 /alarms/alarm-list MUST be assigned this state by the 1219 server as the last entry in the 'operator-state-change' 1220 list. The text for that entry SHOULD include the shelf 1221 name."; 1222 } 1223 } 1224 } 1225 description 1226 "Operator states on an alarm. The 'closed' state indicates 1227 that an operator considers the alarm to be resolved. This 1228 is separate from the alarm's 'is-cleared' leaf."; 1229 } 1231 /* Alarm type */ 1233 typedef alarm-type-id { 1234 type identityref { 1235 base alarm-type-id; 1236 } 1237 description 1238 "Identifies an alarm type. The description of the alarm type 1239 id MUST indicate whether the alarm type is abstract or not. An 1240 abstract alarm type is used as a base for other alarm type ids 1241 and will not be used as a value for an alarm or be present in 1242 the alarm inventory."; 1243 } 1245 typedef alarm-type-qualifier { 1246 type string; 1247 description 1248 "If an alarm type cannot be fully specified at design time by 1249 alarm-type-id, this string qualifier is used in addition to 1250 fully define a unique alarm type. 1252 The definition of alarm qualifiers is considered to be part 1253 of the instrumentation and out of scope for this module. 1254 An empty string is used when this is part of a key."; 1255 } 1257 /* 1258 * Groupings 1259 */ 1261 grouping common-alarm-parameters { 1262 description 1263 "Common parameters for an alarm. 1265 This grouping is used both in the alarm list and in the 1266 notification representing an alarm state change."; 1267 leaf resource { 1268 type resource; 1269 mandatory true; 1270 description 1271 "The alarming resource. See also 'alt-resource'.
1272 This could, for example, be a reference to the alarming 1273 interface."; 1274 } 1275 leaf alarm-type-id { 1276 type alarm-type-id; 1277 mandatory true; 1278 description 1279 "This leaf and the leaf 'alarm-type-qualifier' together 1280 provide a unique identification of the alarm type."; 1281 } 1282 leaf alarm-type-qualifier { 1283 type alarm-type-qualifier; 1284 description 1285 "This leaf is used when the 'alarm-type-id' leaf cannot 1286 uniquely identify the alarm type. Normally, this is not 1287 the case, and this leaf is the empty string."; 1288 } 1289 leaf-list alt-resource { 1290 type resource; 1291 description 1292 "Used if the alarming resource is available over other 1293 interfaces. This field can contain, for example, SNMP OIDs, CIM paths, or 1294 3GPP Distinguished Names."; 1295 } 1296 list related-alarm { 1297 key "resource alarm-type-id alarm-type-qualifier"; 1298 description 1299 "References to related alarms. Note that the related alarm 1300 might have been purged from the alarm list."; 1301 leaf resource { 1302 type leafref { 1303 path "/alarms/alarm-list/alarm/resource"; 1304 require-instance false; 1305 } 1306 description 1307 "The alarming resource for the related alarm."; 1308 } 1309 leaf alarm-type-id { 1310 type leafref { 1311 path "/alarms/alarm-list/alarm" 1312 + "[resource=current()/../resource]" 1313 + "/alarm-type-id"; 1314 require-instance false; 1315 } 1316 description 1317 "The alarm type identifier for the related alarm."; 1318 } 1319 leaf alarm-type-qualifier { 1320 type leafref { 1321 path "/alarms/alarm-list/alarm" 1322 + "[resource=current()/../resource]" 1323 + "[alarm-type-id=current()/../alarm-type-id]" 1324 + "/alarm-type-qualifier"; 1325 require-instance false; 1326 } 1327 description 1328 "The alarm qualifier for the related alarm."; 1329 } 1330 } 1331 leaf-list impacted-resource { 1332 type resource; 1333 description 1334 "Resources that might be affected by this alarm.
If the 1335 system creates an alarm on a resource and also has a mapping 1336 to other resources that might be impacted, these resources 1337 can be listed in this leaf-list. In this way, the system can 1338 create one alarm instead of several. For example, if an 1339 interface has an alarm, the 'impacted-resource' can 1340 reference the aggregated port channels."; 1341 } 1342 leaf-list root-cause-resource { 1343 type resource; 1344 description 1345 "Resources that are candidates for causing the alarm. If the 1346 system has a mechanism to understand the candidate root 1347 causes of an alarm, this leaf-list can be used to list the 1348 root cause candidate resources. In this way, the system can 1349 create one alarm instead of several. For example, if a 1350 logging system (alarm resource) fails, the alarm can 1351 reference the file system in the 'root-cause-resource' 1352 leaf-list. Note that the intended use is not to also send 1353 an alarm with the root-cause resource as the alarming resource. 1354 The 'root-cause-resource' leaf-list is a hint and should not 1355 also generate an alarm for the same problem."; 1356 } 1357 } 1358 grouping alarm-state-change-parameters { 1359 description 1360 "Parameters for an alarm state change. 1362 This grouping is used both in the alarm list's 1363 status-change list and in the notification representing an 1364 alarm state change."; 1365 leaf time { 1366 type yang:date-and-time; 1367 mandatory true; 1368 description 1369 "The time the status of the alarm changed. The value 1370 represents the time the real alarm state change appeared 1371 in the resource and not when it was added to the 1372 alarm list. The leaf /alarms/alarm-list/alarm/last-changed MUST be 1373 set to the same value."; 1374 } 1375 leaf perceived-severity { 1376 type severity-with-clear; 1377 mandatory true; 1378 description 1379 "The severity of the alarm as defined by X.733.
Note 1380 that this may not be the original severity, since the alarm 1381 may have changed severity."; 1382 reference 1383 "ITU Recommendation X.733: Information Technology 1384 - Open Systems Interconnection 1385 - System Management: Alarm Reporting Function"; 1386 } 1387 leaf alarm-text { 1388 type alarm-text; 1389 mandatory true; 1390 description 1391 "A user-friendly text describing the alarm state change."; 1392 reference 1393 "ITU Recommendation X.733: Information Technology 1394 - Open Systems Interconnection 1395 - System Management: Alarm Reporting Function"; 1396 } 1397 } 1399 grouping operator-parameters { 1400 description 1401 "This grouping defines parameters that can be changed by an 1402 operator."; 1403 leaf time { 1404 type yang:date-and-time; 1405 mandatory true; 1406 description 1407 "Timestamp for the operator action on the alarm."; 1408 } 1409 leaf operator { 1410 type string; 1411 mandatory true; 1412 description 1413 "The name of the operator that has acted on this 1414 alarm."; 1415 } 1416 leaf state { 1417 type operator-state; 1418 mandatory true; 1419 description 1420 "The operator's view of the alarm state."; 1421 } 1422 leaf text { 1423 type string; 1424 description 1425 "Additional optional textual information provided by 1426 the operator."; 1427 } 1428 } 1430 grouping resource-alarm-parameters { 1431 description 1432 "Alarm parameters that originate from the resource view."; 1433 leaf is-cleared { 1434 type boolean; 1435 mandatory true; 1436 description 1437 "Indicates the current clearance state of the alarm. An 1438 alarm might toggle from active alarm to cleared alarm and 1439 back to active again."; 1440 } 1441 leaf last-changed { 1442 type yang:date-and-time; 1443 mandatory true; 1444 description 1445 "A timestamp when the alarm status was last changed.
Status 1446 changes are changes to 'is-cleared', 'perceived-severity', 1447 and 'alarm-text'."; 1448 } 1449 leaf perceived-severity { 1450 type severity; 1451 mandatory true; 1452 description 1453 "The last severity of the alarm. 1455 If an alarm was raised with severity 'warning', but later 1456 changed to 'major', this leaf will show 'major'."; 1457 } 1458 leaf alarm-text { 1459 type alarm-text; 1460 mandatory true; 1461 description 1462 "The last reported alarm text. This text should contain 1463 information for an operator to be able to understand 1464 the problem and how to resolve it."; 1465 } 1466 list status-change { 1467 if-feature "alarm-history"; 1468 key "time"; 1469 min-elements 1; 1470 description 1471 "A list of status change events for this alarm. 1473 The entry with the latest time-stamp in this list MUST 1474 correspond to the leafs 'is-cleared', 'perceived-severity', 1475 and 'alarm-text' for the alarm. The time-stamp for that 1476 entry MUST be equal to the 'last-changed' leaf. 1478 This list is ordered according to the timestamps of 1479 alarm state changes. The last item corresponds to the 1480 latest state change. 1482 The following state changes create an entry in this 1483 list: 1484 - changed severity (warning, minor, major, critical) 1485 - clearance status; this also updates the 'is-cleared' 1486 leaf 1487 - alarm text update"; 1488 uses alarm-state-change-parameters; 1489 } 1490 } 1492 /* 1493 * The /alarms data tree 1494 */ 1496 container alarms { 1497 description 1498 "The top container for this module."; 1499 container control { 1500 description 1501 "Configuration to control the alarm behaviour."; 1502 leaf max-alarm-status-changes { 1503 type union { 1504 type uint16; 1505 type enumeration { 1506 enum infinite { 1507 description 1508 "The status change entries are accumulated 1509 infinitely."; 1510 } 1511 } 1512 } 1513 default "32"; 1514 description 1515 "The status-change entries are kept in a circular list 1516 per alarm.
When this number is exceeded, the oldest 1517 status change entry is automatically removed. If the 1518 value is 'infinite', the status change entries are 1519 accumulated infinitely."; 1520 } 1521 choice notify-status-changes { 1522 description 1523 "This choice controls the notifications sent for alarm status 1524 updates. There are three options: 1525 1. Notifications are sent for all updates, including severity level 1526 changes and alarm text changes. 1527 2. Notifications are only sent for alarm raise and clear. 1528 3. Notifications are sent for status changes equal to or 1529 above the specified severity level. Clear notifications 1530 shall always be sent. 1531 Notifications shall also be sent for state changes that 1532 make an alarm less severe than the specified level. 1533 For example, in option 3, assume that the severity level is set to major 1534 and that the alarm has the following state changes 1535 [(Time, severity, clear)]: 1536 [(T1, major, -), (T2, minor, -), (T3, warning, -), 1537 (T4, minor, -), (T5, major, -), (T6, critical, -), 1538 (T7, major, -), (T8, major, clear)] 1539 In that case, notifications will be sent at 1540 T1, T2, T5, T6, T7, and T8."; 1541 leaf notify-all-state-changes { 1542 type empty; 1543 description 1544 "Send notifications for all status changes."; 1545 } 1546 leaf notify-raise-and-clear { 1547 type empty; 1548 description 1549 "Send notifications only for raise, clear, and re-raise. 1550 Notifications for severity level changes or alarm text 1551 changes are not sent."; 1552 } 1553 leaf notify-severity-level { 1554 type severity; 1555 description 1556 "Only send notifications for alarm state changes 1557 crossing the specified level. Always send clear 1558 notifications."; 1559 } 1560 } 1561 container alarm-shelving { 1562 if-feature "alarm-shelving"; 1563 description 1564 "The alarm-shelving/shelf list is used to shelve 1565 (block/filter) alarms.
The server will move any alarms 1566 corresponding to the shelving criteria from the 1567 alarms/alarm-list/alarm list to the 1568 alarms/shelved-alarms/shelved-alarm list. It will also 1569 stop sending notifications for the shelved alarms. The 1570 conditions in the shelf criteria are logically ANDed. 1571 When the shelving criteria are deleted or changed, the 1572 non-matching alarms MUST appear in the 1573 alarms/alarm-list/alarm list according to the real state. 1574 This means that the instrumentation MUST maintain states 1575 for the shelved alarms. Alarms that match the criteria 1576 shall have the operator state 'shelved'. When the shelf 1577 configuration removes an alarm from the shelf, the 1578 server shall add the operator state 'un-shelved'."; 1579 list shelf { 1580 key "name"; 1581 leaf name { 1582 type string; 1583 description 1584 "An arbitrary name for the alarm shelf."; 1585 } 1586 description 1587 "Each entry defines the criteria for shelving alarms. 1588 Criteria are ANDed. If no criteria are specified, 1589 all alarms will be shelved."; 1590 leaf-list resource { 1591 type resource-match; 1592 description 1593 "Shelve alarms for matching resources."; 1594 } 1595 leaf alarm-type-id { 1596 type alarm-type-id; 1597 description 1598 "Shelve all alarms that have an alarm-type-id that is 1599 equal to or derived from the given alarm-type-id."; 1600 } 1601 leaf alarm-type-qualifier-match { 1602 type string; 1603 description 1604 "A W3C regular expression that is used to match 1605 an alarm type qualifier. Shelve all alarms that 1606 match this regular expression for the alarm 1607 type qualifier."; 1608 } 1609 leaf description { 1610 type string; 1611 description 1612 "An optional textual description of the shelf.
This 1613 description should include the reason for shelving 1614 these alarms."; 1615 } 1616 } 1617 } 1618 } 1619 container alarm-inventory { 1620 config false; 1621 description 1622 "This alarm-inventory/alarm-type list contains all possible 1623 alarm types for the system. 1624 If the system knows for which resources a specific alarm 1625 type can appear, this is also identified in the inventory. 1626 The list also tells whether each alarm type has a corresponding 1627 clear state. The inventory shall only contain concrete 1628 alarm types. 1630 The alarm inventory MUST be updated by the system when new 1631 alarms can appear. This can be the case when installing new 1632 software modules or inserting new card types. A 1633 notification 'alarm-inventory-changed' is sent when the 1634 inventory is changed."; 1635 list alarm-type { 1636 key "alarm-type-id alarm-type-qualifier"; 1637 description 1638 "An entry in this list defines a possible alarm."; 1639 leaf alarm-type-id { 1640 type alarm-type-id; 1641 description 1642 "The statically defined alarm type identifier for this 1643 possible alarm."; 1644 } 1645 leaf alarm-type-qualifier { 1646 type alarm-type-qualifier; 1647 description 1648 "The optional, dynamically defined alarm type qualifier 1649 for this possible alarm."; 1650 } 1651 leaf-list resource { 1652 type resource-match; 1653 description 1654 "Optionally, specifies for which resources the alarm type 1655 is valid."; 1656 } 1657 leaf has-clear { 1658 type boolean; 1659 mandatory true; 1660 description 1661 "This leaf tells the operator whether the alarm will be 1662 cleared when the correct corrective action has been 1663 taken. Implementations SHOULD strive to detect the 1664 cleared state for all alarm types. If this leaf is 1665 true, the operator can monitor the alarm until it 1666 becomes cleared after the corrective action has been 1667 taken.
If this leaf is false, the operator needs to 1668 validate that the alarm is no longer active using other 1669 mechanisms. Alarms can lack a corresponding clear due 1670 to missing instrumentation or because there is no logical 1671 corresponding clear state."; 1672 } 1673 leaf-list severity-levels { 1674 type severity; 1675 description 1676 "This leaf-list indicates the possible severity levels of 1677 this alarm type. Note that 'clear' is not part of 1678 the severity type. In general, the severity level should 1679 be defined by the instrumentation based on dynamic state 1680 and not defined statically by the alarm type, in order to 1681 provide relevant severity levels based on dynamic state 1682 and context. However, most alarm types have a defined set 1683 of possible severity levels, and this set should be provided 1684 here."; 1685 } 1686 leaf description { 1687 type string; 1688 mandatory true; 1689 description 1690 "A description of the possible alarm. It SHOULD include 1691 information on possible underlying root causes and 1692 corrective actions."; 1693 } 1694 } 1696 } 1697 container summary { 1698 if-feature "alarm-summary"; 1699 config false; 1700 description 1701 "This container gives a summary of the number of alarms."; 1702 list alarm-summary { 1703 key "severity"; 1704 description 1705 "A global summary of all alarms in the system.
The summary 1706 does not include shelved alarms."; 1707 leaf severity { 1708 type severity; 1709 description 1710 "Alarm summary for this severity level."; 1711 } 1712 leaf total { 1713 type yang:gauge32; 1714 description 1715 "Total number of alarms of this severity level."; 1716 } 1717 leaf not-cleared { 1718 type yang:gauge32; 1719 description 1720 "Total number of alarms of this severity level 1721 that are not cleared."; 1722 } 1723 leaf cleared { 1724 type yang:gauge32; 1725 description 1726 "For this severity level, the number of alarms that are 1727 cleared."; 1728 } 1729 leaf cleared-not-closed { 1730 if-feature "operator-actions"; 1731 type yang:gauge32; 1732 description 1733 "For this severity level, the number of alarms that are 1734 cleared but not closed."; 1735 } 1736 leaf cleared-closed { 1737 if-feature "operator-actions"; 1738 type yang:gauge32; 1739 description 1740 "For this severity level, the number of alarms that are 1741 cleared and closed."; 1742 } 1743 leaf not-cleared-closed { 1744 if-feature "operator-actions"; 1745 type yang:gauge32; 1746 description 1747 "For this severity level, the number of alarms that are 1748 not cleared but closed."; 1749 } 1750 leaf not-cleared-not-closed { 1751 if-feature "operator-actions"; 1752 type yang:gauge32; 1753 description 1754 "For this severity level, the number of alarms that are 1755 not cleared and not closed."; 1756 } 1757 } 1758 leaf shelves-active { 1759 if-feature "alarm-shelving"; 1760 type empty; 1761 description 1762 "This is a hint to the operator that there are active 1763 alarm shelves. 
              This leaf MUST exist if the leaf
              /alarms/shelved-alarms/number-of-shelved-alarms is > 0.";
         }
       }
       container alarm-list {
         config false;
         description
           "The alarms in the system.";
         leaf number-of-alarms {
           type yang:gauge32;
           description
             "This object shows the total number of
              alarms in the system, i.e., the total number
              of entries in the alarm list.";
         }
         leaf last-changed {
           type yang:date-and-time;
           description
             "A timestamp when the alarm list was last
              changed. The value can be used by a manager to
              initiate an alarm resynchronization procedure.";
         }
         list alarm {
           key "resource alarm-type-id alarm-type-qualifier";
           description
             "The list of alarms. Each entry in the list holds one
              alarm for a given alarm type and resource.
              An alarm can be updated from the underlying resource or
              by the user. The following leafs are maintained by the
              resource: is-cleared, last-change, perceived-severity,
              and alarm-text. An operator can change: operator-state
              and operator-text.

              Entries appear in the alarm list the first time an
              alarm becomes active for a given alarm-type and resource.
              Entries do not get deleted when the alarm is cleared;
              clearance is a boolean state in the alarm.

              Alarm entries are removed (purged) from the list by an
              explicit purge action. For example, purge all alarms
              that are cleared and in closed operator-state that are
              older than 24 hours. Purged alarms are removed from the
              alarm list. If the alarm resource state changes
              after a purge, the alarm will reappear in the alarm list.

              Systems may also remove alarms based on locally configured
              policies; this is out of scope for this module.";
           uses common-alarm-parameters;
           leaf time-created {
             type yang:date-and-time;
             mandatory true;
             description
               "The time-stamp when this alarm entry was created.
                This represents the first time the alarm appeared; it
                can also represent that the alarm reappeared after a
                purge. Further state changes of the same alarm do not
                change this leaf; these changes will update the
                'last-changed' leaf.";
           }
           uses resource-alarm-parameters;
           list operator-state-change {
             if-feature "operator-actions";
             key "time";
             description
               "This list is used by operators to indicate
                the state of human intervention on an alarm.
                For example, if an operator has seen an alarm,
                the operator can add a new item to this list indicating
                that the alarm is acknowledged.";
             uses operator-parameters;
           }
           action set-operator-state {
             if-feature "operator-actions";
             description
               "This is a means for the operator to indicate
                the level of human intervention on an alarm.";
             input {
               leaf state {
                 type writable-operator-state;
                 mandatory true;
                 description
                   "Set this operator state.";
               }
               leaf text {
                 type string;
                 description
                   "Additional optional textual information.";
               }
             }
           }
           notification operator-action {
             if-feature "operator-actions";
             description
               "This notification is used to report that an operator
                acted upon an alarm.";
             uses operator-parameters;
           }
         }
       }
       container shelved-alarms {
         if-feature "alarm-shelving";
         config false;
         description
           "The shelved alarms. Alarms appear here if they match the
            criteria in /alarms/control/alarm-shelving. This list does
            not generate any notifications. The list represents alarms
            that are considered not relevant by the operator. Alarms in
            this list have an operator-state of 'shelved'.
            This cannot be changed.";
         leaf number-of-shelved-alarms {
           type yang:gauge32;
           description
             "This object shows the total number of currently
              shelved alarms, i.e., the total number of entries
              in the shelved alarm list.";
         }
         leaf shelved-alarms-last-changed {
           type yang:date-and-time;
           description
             "A timestamp when the shelved alarm list was last
              changed. The value can be used by a manager to
              initiate an alarm resynchronization procedure.";
         }
         list shelved-alarm {
           key "resource alarm-type-id alarm-type-qualifier";
           description
             "The list of shelved alarms. Shelved alarms
              can only be updated from the underlying resource;
              no operator actions are supported.";
           uses common-alarm-parameters;
           leaf shelf-name {
             type leafref {
               path "/alarms/control/alarm-shelving/shelf/name";
               require-instance false;
             }
             description
               "The name of the shelf.";
           }
           uses resource-alarm-parameters;
           list operator-state-change {
             if-feature "operator-actions";
             key "time";
             description
               "This list is used by operators to indicate
                the state of human intervention on an alarm.
                For shelved alarms, the system has set the state
                in the list entry to 'shelved'.";
             uses operator-parameters;
           }
         }
       }
       list alarm-profile {
         if-feature "alarm-profile";
         key "alarm-type-id alarm-type-qualifier-match resource";
         ordered-by user;
         description
           "This list is used to assign further information or
            configuration for each alarm type. This module supports
            a mechanism where the client can override the system
            default alarm severity levels.
            The alarm-profile is also a useful augmentation point for
            specific additions to alarm types.";
         leaf alarm-type-id {
           type al:alarm-type-id;
           description
             "The alarm type identifier to match.";
         }
         leaf alarm-type-qualifier-match {
           type string;
           description
             "A W3C regular expression that is used to match the
              alarm type qualifier.";
         }
         leaf resource {
           type al:resource-match;
           description
             "Specifies which resources to match.";
         }
         leaf description {
           type string;
           mandatory true;
           description
             "A description of the alarm profile.";
         }
         container alarm-severity-assignment-profile {
           if-feature "severity-assignment";
           description
             "The client can override the system default
              severity level.";
           reference
             "ITU M.3100, ITU M.3160
              - Generic Network Information Model,
              Alarm Severity Assignment Profile";
           leaf-list severity-levels {
             type al:severity;
             ordered-by user;
             description
               "Specifies the configured severity level(s) for the
                matching alarm. If the alarm has several severity
                levels, the leaf-list shall be given in rising severity
                order. The original M.3100/M.3160 ASAP function only
                allows for a one-to-one mapping between alarm type and
                severity, but since the IETF alarm module supports
                stateful alarms, the mapping must allow for several
                severity levels.

                Assume a high-utilisation alarm type with two
                thresholds with the system default severity levels of
                threshold1 = warning and threshold2 = minor.
                Setting this leaf-list to (minor, major) will assign the
                severity levels threshold1 = minor and
                threshold2 = major.";
           }
         }
       }
     }

     /*
      * Operations
      */

     rpc compress-alarms {
       if-feature "alarm-history";
       description
         "This operation requests the server to compress entries in the
          alarm list by removing all but the latest state change for all
          alarms. Conditions in the input are logically ANDed. If no
          input condition is given, all alarms are compressed.";
       input {
         leaf resource {
           type resource-match;
           description
             "Compress the alarms matching this resource.";
         }
         leaf alarm-type-id {
           type leafref {
             path "/alarms/alarm-list/alarm/alarm-type-id";
             require-instance false;
           }
           description
             "Compress alarms with this alarm-type-id.";
         }
         leaf alarm-type-qualifier {
           type leafref {
             path "/alarms/alarm-list/alarm/alarm-type-qualifier";
             require-instance false;
           }
           description
             "Compress the alarms with this alarm-type-qualifier.";
         }
       }
       output {
         leaf compressed-alarms {
           type uint32;
           description
             "Number of compressed alarm entries.";
         }
       }
     }
     rpc compress-shelved-alarms {
       if-feature "alarm-history and alarm-shelving";
       description
         "This operation requests the server to compress entries in the
          shelved alarm list by removing all but the latest state change
          for all alarms. Conditions in the input are logically ANDed.
          If no input condition is given, all alarms are compressed.";
       input {
         leaf resource {
           type leafref {
             path "/alarms/shelved-alarms/shelved-alarm/resource";
             require-instance false;
           }
           description
             "Compress the alarms with this resource.";
         }
         leaf alarm-type-id {
           type leafref {
             path "/alarms/shelved-alarms/shelved-alarm/alarm-type-id";
             require-instance false;
           }
           description
             "Compress alarms with this alarm-type-id.";
         }
         leaf alarm-type-qualifier {
           type leafref {
             path "/alarms/shelved-alarms/shelved-alarm"
                + "/alarm-type-qualifier";
             require-instance false;
           }
           description
             "Compress the alarms with this alarm-type-qualifier.";
         }
       }
       output {
         leaf compressed-alarms {
           type uint32;
           description
             "Number of compressed alarm entries.";
         }
       }
     }

     grouping filter-input {
       description
         "Grouping to specify a filter construct on alarm information.";
       leaf alarm-status {
         type enumeration {
           enum any {
             description
               "Ignore alarm clearance status.";
           }
           enum cleared {
             description
               "Filter cleared alarms.";
           }
           enum not-cleared {
             description
               "Filter not cleared alarms.";
           }
         }
         mandatory true;
         description
           "The clearance status of the alarm.";
       }
       container older-than {
         presence "Age specification";
         description
           "Matches the 'last-status-change' leaf in the alarm.";
         choice age-spec {
           description
             "Filter using date and time age.";
           case seconds {
             leaf seconds {
               type uint16;
               description
                 "Seconds part.";
             }
           }
           case minutes {
             leaf minutes {
               type uint16;
               description
                 "Minutes part.";
             }
           }
           case hours {
             leaf hours {
               type uint16;
               description
                 "Hours part.";
             }
           }
           case days {
             leaf days {
               type uint16;
               description
                 "Days part.";
             }
           }
           case weeks {
             leaf weeks {
               type uint16;
               description
                 "Weeks part.";
             }
           }
         }
       }
       container severity {
         presence "Severity filter";
         choice sev-spec {
           description
             "Filter based on severity level.";
           leaf below {
             type severity;
             description
               "Severity level lower than this leaf.";
           }
           leaf is {
             type severity;
             description
               "Severity level equal to this leaf.";
           }
           leaf above {
             type severity;
             description
               "Severity level higher than this leaf.";
           }
         }
         description
           "Filter based on severity.";
       }
       container operator-state-filter {
         if-feature "operator-actions";
         presence "Operator state filter";
         leaf state {
           type operator-state;
           description
             "Filter on operator state.";
         }
         leaf user {
           type string;
           description
             "Filter based on which operator.";
         }
         description
           "Filter based on operator state.";
       }
     }

     rpc purge-alarms {
       description
         "This operation requests the server to delete entries from the
          alarm list or the shelved alarms list according to the
          supplied criteria. To purge alarms in the shelved alarms list,
          set the operator-state filter input to 'shelved'.
          Typically, it can be used to delete alarms that are
          in closed operator state and older than a specified time.
          In the shelved alarm list, it makes sense to delete alarms that
          are no longer relevant.
          The number of purged alarms is returned as an output
          parameter.";
       input {
         uses filter-input;
       }
       output {
         leaf purged-alarms {
           type uint32;
           description
             "Number of purged alarms.";
         }
       }
     }

     /*
      * Notifications
      */

     notification alarm-notification {
       description
         "This notification is used to report a state change for an
          alarm.
          The same notification is used for reporting a newly
          raised alarm, a cleared alarm, or a change of the text and/or
          severity of an existing alarm.";
       uses common-alarm-parameters;
       uses alarm-state-change-parameters;
     }
     notification alarm-inventory-changed {
       description
         "This notification is used to report that the list of possible
          alarms has changed. This can happen, for example, when a new
          software module is installed or a new physical card is
          inserted.";
     }
   }

6. X.733 Extensions

   Many alarm systems are based on the X.733 [X.733] and X.736 [X.736]
   alarm standards. The ietf-alarms-x733 module (Section 7) augments
   the alarm inventory, the alarm lists, and the alarm notification
   with X.733 and X.736 parameters.

   The module also supports a feature whereby the alarm manager can
   configure the mapping from alarm types to X.733 event-type and
   probable-cause parameters. This might be needed when the default
   mapping provided by the system is in conflict with other management
   systems or is not considered correct.

   Note that the IETF Alarm Module term 'resource' is synonymous with
   the ITU term 'managed object'.

7. The X.733 Mapping Module

   This YANG module references [X.733] and [X.736].

   file "ietf-alarms-x733@2018-11-06.yang"
   module ietf-alarms-x733 {
     yang-version 1.1;
     namespace "urn:ietf:params:xml:ns:yang:ietf-alarms-x733";
     prefix x733;

     import ietf-alarms {
       prefix al;
     }
     import ietf-yang-types {
       prefix yang;
       reference "RFC 6991: Common YANG Data Types";
     }

     organization
       "IETF CCAMP Working Group";
     contact
       "WG Web:
        WG List:

        Editor: Stefan Vallin

        Editor: Martin Bjorklund
        ";
     description
       "This module augments the ietf-alarms module with X.733 alarm
        parameters.

        The following structures are augmented with X.733 event type
        and probable cause:

        1) alarms/alarm-inventory: all possible alarm types
        2) alarms/alarm-list: every alarm in the system
        3) alarm-notification: notifications indicating alarm state
           changes
        4) alarms/shelved-alarms: every shelved alarm in the system

        The module also optionally allows the alarm management system
        to configure the mapping from the IETF Alarm Module alarm keys
        to the ITU tuple (event-type, probable-cause).

        The mapping does not include a corresponding X.733 specific
        problem value. The recommendation is to use the
        'alarm-type-qualifier' leaf, which serves the same purpose.

        The module uses an integer and a corresponding string for
        probable cause instead of a globally defined enumeration, in
        order to be able to manage conflicting enumeration definitions.
        A single globally defined enumeration is challenging to
        maintain.";
     reference
       "ITU Recommendation X.733: Information Technology
        - Open Systems Interconnection
        - System Management: Alarm Reporting Function";

     revision 2018-11-06 {
       description
         "Initial revision.";
       reference "RFC XXXX: YANG Alarm Module";
     }

     /*
      * Features
      */

     feature configure-x733-mapping {
       description
         "The system supports configurable X.733 mapping from
          the IETF alarm module alarm-type to X.733 event-type
          and probable-cause.";
     }

     /*
      * Typedefs
      */

     typedef event-type {
       type enumeration {
         enum other {
           value 1;
           description
             "None of the below.";
         }
         enum communications-alarm {
           value 2;
           description
             "An alarm of this type is principally associated with the
              procedures and/or processes required to convey
              information from one point to another.";
         }
         enum quality-of-service-alarm {
           value 3;
           description
             "An alarm of this type is principally associated with a
              degradation in the quality of a service.";
         }
         enum processing-error-alarm {
           value 4;
           description
             "An alarm of this type is principally associated with a
              software or processing fault.";
         }
         enum equipment-alarm {
           value 5;
           description
             "An alarm of this type is principally associated with an
              equipment fault.";
         }
         enum environmental-alarm {
           value 6;
           description
             "An alarm of this type is principally associated with a
              condition relating to an enclosure in which the equipment
              resides.";
         }
         enum integrity-violation {
           value 7;
           description
             "An indication that information may have been illegally
              modified, inserted, or deleted.";
         }
         enum operational-violation {
           value 8;
           description
             "An indication that the provision of the requested service
              was not possible due to the unavailability, malfunction, or
              incorrect invocation of the service.";
         }
         enum physical-violation {
           value 9;
           description
             "An indication that a physical resource has been violated
              in a way that suggests a security attack.";
         }
         enum security-service-or-mechanism-violation {
           value 10;
           description
             "An indication that a security attack has been detected by
              a security service or mechanism.";
         }
         enum time-domain-violation {
           value 11;
           description
             "An indication that an event has occurred at an unexpected
              or prohibited time.";
         }
       }
       description
         "The event types as defined by X.733 and X.736.";
       reference
         "ITU Recommendation X.733: Information Technology
          - Open Systems Interconnection
          - System Management: Alarm Reporting Function
          ITU Recommendation X.736: Information Technology
          - Open Systems Interconnection
          - System Management: Security Alarm Reporting Function";
     }

     typedef trend {
       type enumeration {
         enum less-severe {
           description
             "There is at least one outstanding alarm of a
              severity higher (more severe) than that in the
              current alarm.";
         }
         enum no-change {
           description
             "The Perceived severity reported in the current
              alarm is the same as the highest (most severe)
              of any of the outstanding alarms.";
         }
         enum more-severe {
           description
             "The Perceived severity in the current alarm is
              higher (more severe) than that reported in any
              of the outstanding alarms.";
         }
       }
       description
         "This type is used to describe the
          severity trend of the alarming resource.";
       reference "Module Attribute-ASN1Module (X.721:02/1992)";
     }

     typedef value-type {
       type union {
         type int64;
         type uint64;
         type decimal64 {
           fraction-digits 2;
         }
       }
       description
         "A generic union type to match the ITU choice of integer
          and real.";
     }

     /*
      * Groupings
      */

     grouping x733-alarm-parameters {
       description
         "Common X.733 parameters for alarms.";
       leaf event-type {
         type event-type;
         description
           "The X.733/X.736 event type for this alarm.";
       }
       leaf probable-cause {
         type uint32;
         description
           "The X.733 probable cause for this alarm.";
       }
       leaf probable-cause-string {
         type string;
         description
           "The user-friendly string matching
            the probable cause integer value. The string
            SHOULD match the X.733 enumeration. For example,
            value 27 is 'localNodeTransmissionError'.";
       }
       container threshold-information {
         description
           "This parameter shall be present when the alarm
            is a result of crossing a threshold.";
         leaf triggered-threshold {
           type string;
           description
             "The identifier of the threshold attribute that
              caused the notification.";
         }
         leaf observed-value {
           type value-type;
           description
             "The value of the gauge or counter that crossed
              the threshold. This may be different from the
              threshold value if, for example, the gauge may
              only take on discrete values.";
         }
         choice threshold-level {
           description
             "In the case of a gauge, the threshold level specifies
              a pair of threshold values, the first being the value
              of the crossed threshold and the second its corresponding
              hysteresis; in the case of a counter, the threshold level
              specifies only the threshold value.";
           case up {
             leaf up-high {
               type value-type;
               description
                 "The going-up threshold for raising the alarm.";
             }
             leaf up-low {
               type value-type;
               description
                 "The threshold level for clearing the alarm.
                  This is used for hysteresis functions for gauges.";
             }
           }
           case down {
             leaf down-low {
               type value-type;
               description
                 "The going-down threshold for raising the alarm.";
             }
             leaf down-high {
               type value-type;
               description
                 "The threshold level for clearing the alarm.
                  This is used for hysteresis functions for gauges.";
             }
           }
         }
         leaf arm-time {
           type yang:date-and-time;
           description
             "For a gauge threshold, the time at which the threshold
              was last re-armed, namely the time after the previous
              threshold crossing at which the hysteresis value of the
              threshold was exceeded, thus again permitting generation
              of notifications when the threshold is crossed.
              For a counter threshold, the later of the time at which
              the threshold offset was last applied, or the time at
              which the counter was last initialized (for resettable
              counters).";
         }
       }
       list monitored-attributes {
         uses attribute;
         key "id";
         description
           "The Monitored attributes parameter, when present, defines
            one or more attributes of the resource and their
            corresponding values at the time of the alarm.";
       }
       leaf-list proposed-repair-actions {
         type string;
         description
           "This parameter, when present, is used if the cause is
            known and the system being managed can suggest one or
            more solutions (such as switch in standby equipment,
            retry, replace media).";
       }
       leaf trend-indication {
         type trend;
         description
           "This parameter specifies the current
            severity trend of the resource. If present, it
            indicates that there are one or more alarms
            ('outstanding alarms') that have not been cleared
            and that pertain to the same resource as the one to
            which this alarm ('current alarm') pertains.
            The possible values are:

            more-severe: The Perceived severity in the current
            alarm is higher (more severe) than that reported in
            any of the outstanding alarms.

            no-change: The Perceived severity reported in the
            current alarm is the same as the highest (most severe)
            of any of the outstanding alarms.

            less-severe: There is at least one outstanding alarm
            of a severity higher (more severe) than that in the
            current alarm.";
       }
       leaf backedup-status {
         type boolean;
         description
           "This parameter, when present, specifies whether or not
            the object emitting the alarm has been backed up, and
            services provided to the user have, therefore, not been
            disrupted.
            The use of this field in conjunction with the
            severity field provides information in an independent form
            to qualify the seriousness of the alarm and the ability of
            the system as a whole to continue to provide services.
            If the value of this parameter is true, it indicates that
            the object emitting the alarm has been backed up; if false,
            the object has not been backed up.";
       }
       leaf backup-object {
         type al:resource;
         description
           "This parameter shall be present when the Backed-up status
            parameter is present and has the value true. This parameter
            specifies the managed object instance that is providing
            back-up services for the managed object to which the
            notification pertains. This parameter is useful,
            for example, when the back-up object is from a pool of
            objects, any of which may be dynamically allocated to
            replace a faulty object.";
       }
       list additional-information {
         key "identifier";
         description
           "This parameter allows the inclusion of a
            set of additional information in the alarm.
            It is a series of data structures, each of which contains
            three items of information: an identifier, a significance
            indicator, and the problem information.";
         leaf identifier {
           type string;
           description
             "Identifies the data type of the information parameter.";
         }
         leaf significant {
           type boolean;
           description
             "Set to true if the receiving system must be able to
              parse the contents of the information subparameter
              for the event report to be fully understood.";
         }
         leaf information {
           type string;
           description
             "Additional information about the alarm.";
         }
       }
       leaf security-alarm-detector {
         type al:resource;
         description
           "This parameter identifies the detector of the security
            alarm.";
       }
       leaf service-user {
         type al:resource;
         description
           "This parameter identifies the service-user whose request
            for service led to the generation of the security alarm.";
       }
       leaf service-provider {
         type al:resource;
         description
           "This parameter identifies the intended service-provider
            of the service that led to the generation of the security
            alarm.";
       }
       reference
         "ITU Recommendation X.733: Information Technology
          - Open Systems Interconnection
          - System Management: Alarm Reporting Function
          ITU Recommendation X.736: Information Technology
          - Open Systems Interconnection
          - System Management: Security Alarm Reporting Function";
     }

     grouping x733-alarm-definition-parameters {
       description
         "Common X.733 parameters for alarm definitions.
2653 This grouping is used to define those alarm 2654 attributes that can be mapped from the alarm-type 2655 mechanism in the ietf-alarm module."; 2656 leaf event-type { 2657 type event-type; 2658 description 2659 "The alarm type has this X.733/X.736 event type."; 2660 } 2661 leaf probable-cause { 2662 type uint32; 2663 description 2664 "The alarm type has this X.733 probable cause value. 2665 This module defines probable cause as an integer 2666 and not as an enumeration. The reason being that the 2667 primary use of probable cause is in the management 2668 application if it is based on the X.733 standard. 2669 However, most management applications have their own 2670 defined enum definitions and merging enums from 2671 different systems might create conflicts. By using 2672 a configurable uint32 the system can be configured 2673 to match the enum values in the management application."; 2674 } 2675 leaf probable-cause-string { 2676 type string; 2677 description 2678 "This string can be used to give a user friendly string 2679 to the probable cause value."; 2680 } 2681 } 2683 grouping attribute { 2684 description 2685 "A grouping to match the ITU generic reference to 2686 an attribute."; 2687 leaf id { 2688 type al:resource; 2689 description 2690 "The resource representing the attribute."; 2691 } 2692 leaf value { 2693 type string; 2694 description 2695 "The value represented as a string since it could 2696 be of any type."; 2697 } 2698 reference "Module Attribute-ASN1Module (X.721:02/1992)"; 2699 } 2701 /* 2702 * Add X.733 parameters to the alarm definitions, alarms, 2703 * and notification. 2704 */ 2706 augment "/al:alarms/al:alarm-inventory/al:alarm-type" { 2707 description 2708 "Augment X.733 mapping information to the alarm inventory."; 2709 uses x733-alarm-definition-parameters; 2710 } 2712 /* 2713 * Add X.733 configurable mapping. 2714 */ 2716 augment "/al:alarms/al:control" { 2717 description 2718 "Add X.733 mapping capabilities. 
"; 2719 list x733-mapping { 2720 if-feature "configure-x733-mapping"; 2721 key "alarm-type-id alarm-type-qualifier-match"; 2722 description 2723 "This list allows a management application to control the 2724 X.733 mapping for all alarm types in the system. Any entry 2725 in this list will allow the alarm manager to over-ride the 2726 default X.733 mapping in the system and the final mapping 2727 will be shown in the alarm inventory."; 2728 leaf alarm-type-id { 2729 type al:alarm-type-id; 2730 description 2731 "Map the alarm type with this alarm type identifier."; 2732 } 2733 leaf alarm-type-qualifier-match { 2734 type string; 2735 description 2736 "A W3C regular expression that is used when mapping an 2737 alarm type and alarm-type-qualifier to X.733 parameters."; 2738 } 2739 uses x733-alarm-definition-parameters; 2740 } 2741 } 2742 augment "/al:alarms/al:alarm-list/al:alarm" { 2743 description 2744 "Augment X.733 information to the alarm."; 2745 uses x733-alarm-parameters; 2746 } 2747 augment "/al:alarms/al:shelved-alarms/al:shelved-alarm" { 2748 description 2749 "Augment X.733 information to the alarm."; 2751 uses x733-alarm-parameters; 2752 } 2753 augment "/al:alarm-notification" { 2754 description 2755 "Augment X.733 information to the alarm notification."; 2756 uses x733-alarm-parameters; 2757 } 2758 } 2760 2762 8. IANA Considerations 2764 This document registers a URI in the IETF XML registry [RFC3688]. 2765 Following the format in RFC 3688, the following registration is 2766 requested to be made. 2768 URI: urn:ietf:params:xml:ns:yang:ietf-alarms 2770 Registrant Contact: The IESG. 2772 XML: N/A, the requested URI is an XML namespace. 2774 This document registers a YANG module in the YANG Module Names 2775 registry [RFC6020]. 2777 name: ietf-alarms 2778 namespace: urn:ietf:params:xml:ns:yang:ietf-alarms 2779 prefix: al 2780 reference: RFC XXXX 2782 9. 
Security Considerations 2784 The YANG module specified in this document defines a schema for data 2785 that is designed to be accessed via network management protocols such 2786 as NETCONF [RFC6241] or RESTCONF [RFC8040]. The lowest NETCONF layer 2787 is the secure transport layer, and the mandatory-to-implement secure 2788 transport is Secure Shell (SSH) [RFC6242]. The lowest RESTCONF layer 2789 is HTTPS, and the mandatory-to-implement secure transport is TLS 2790 [RFC5246]. 2792 The NETCONF access control model [RFC6536] provides the means to 2793 restrict access for particular NETCONF or RESTCONF users to a 2794 preconfigured subset of all available NETCONF or RESTCONF protocol 2795 operations and content. 2797 There are a number of data nodes defined in this YANG module that are 2798 writable/creatable/deletable (i.e., config true, which is the 2799 default). These data nodes may be considered sensitive or vulnerable 2800 in some network environments. Write operations (e.g., edit-config) 2801 to these data nodes without proper protection can have a negative 2802 effect on network operations. These are the subtrees and data nodes 2803 and their sensitivity/vulnerability: 2805 /alarms/control/notify-status-change: This leaf controls whether an 2806 alarm should notify only raise and clear or all severity level 2807 changes. Unauthorized access to leaf could have a negative impact 2808 on operational procedures relying on fine-grained alarm state 2809 change reporting. 2811 /alarms/control/alarm-shelving/shelf: This list controls the 2812 shelving (blocking) of alarms. Unauthorized access to this list 2813 could jeopardize the alarm management procedures since these 2814 alarms will not be notified and not be part of the alarm list. 2816 Some of the RPC operations in this YANG module may be considered 2817 sensitive or vulnerable in some network environments. It is thus 2818 important to control access to these operations. 
   These are the operations and their sensitivity/vulnerability:

   purge-alarms:  This RPC deletes alarms from the alarm list.
      Unauthorized use of this RPC could jeopardize the alarm management
      procedures, since the deleted alarms may be vital for the alarm
      management application.

10.  Acknowledgements

   The authors wish to thank Viktor Leijon and Johan Nordlander for
   their valuable input on forming the alarm model.

   The authors also wish to thank Nick Hancock, Joey Boyd, Tom Petch, and
   Balazs Lengyel for their extensive reviews and contributions to this
   document.

11.  References

11.1.  Normative References

   [M.3100]   International Telecommunication Union, "Generic Network
              Information Model", ITU-T Recommendation M.3100, 2005.

   [M.3160]   International Telecommunication Union, "Generic,
              protocol-neutral management information model",
              ITU-T Recommendation M.3160, 2008.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3688]  Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
              DOI 10.17487/RFC3688, January 2004.

   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
              (TLS) Protocol Version 1.2", RFC 5246,
              DOI 10.17487/RFC5246, August 2008.

   [RFC6020]  Bjorklund, M., Ed., "YANG - A Data Modeling Language for
              the Network Configuration Protocol (NETCONF)", RFC 6020,
              DOI 10.17487/RFC6020, October 2010.

   [RFC6241]  Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed.,
              and A. Bierman, Ed., "Network Configuration Protocol
              (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011.

   [RFC6242]  Wasserman, M., "Using the NETCONF Protocol over Secure
              Shell (SSH)", RFC 6242, DOI 10.17487/RFC6242, June 2011.

   [RFC6536]  Bierman, A. and M.
              Bjorklund, "Network Configuration Protocol (NETCONF)
              Access Control Model", RFC 6536, DOI 10.17487/RFC6536,
              March 2012.

   [RFC6991]  Schoenwaelder, J., Ed., "Common YANG Data Types",
              RFC 6991, DOI 10.17487/RFC6991, July 2013.

   [RFC7950]  Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language",
              RFC 7950, DOI 10.17487/RFC7950, August 2016.

   [RFC8040]  Bierman, A., Bjorklund, M., and K. Watsen, "RESTCONF
              Protocol", RFC 8040, DOI 10.17487/RFC8040, January 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [X.733]    International Telecommunication Union, "Information
              Technology - Open Systems Interconnection - Systems
              Management: Alarm Reporting Function",
              ITU-T Recommendation X.733, 1992.

11.2.  Informative References

   [ALARMIRP]
              3GPP, "Telecommunication management; Fault Management;
              Part 2: Alarm Integration Reference Point (IRP):
              Information Service (IS)", 3GPP TS 32.111-2 3.4.0,
              March 2005.

   [ALARMSEM]
              Wallin, S., Leijon, V., Nordlander, J., and N. Bystedt,
              "The semantics of alarm definitions: enabling systematic
              reasoning about alarms", International Journal of Network
              Management, Volume 22, Issue 3, John Wiley and Sons, Ltd,
              http://dx.doi.org/10.1002/nem.800, March 2012.

   [EEMUA]    Engineering Equipment and Materials Users Association,
              "Alarm Systems: A Guide to Design, Management and
              Procurement", EEMUA Publication No. 191, 2nd edition,
              London, 2007.

   [G.7710]   ITU-T, "SERIES G: TRANSMISSION SYSTEMS AND MEDIA, DIGITAL
              SYSTEMS AND NETWORKS Data over Transport - Generic aspects
              - Transport network control aspects. Common equipment
              management function requirements", 2012.
   [ISA182]   International Society of Automation (ISA),
              "ANSI/ISA-18.2-2009 Management of Alarm Systems for the
              Process Industries", 2009.

   [RFC3877]  Chisholm, S. and D. Romascanu, "Alarm Management
              Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877,
              September 2004.

   [RFC4268]  Chisholm, S. and D. Perkins, "Entity State MIB", RFC 4268,
              DOI 10.17487/RFC4268, November 2005.

   [RFC8340]  Bjorklund, M. and L. Berger, Ed., "YANG Tree Diagrams",
              BCP 215, RFC 8340, DOI 10.17487/RFC8340, March 2018.

   [RFC8348]  Bierman, A., Bjorklund, M., Dong, J., and D. Romascanu, "A
              YANG Data Model for Hardware Management", RFC 8348,
              DOI 10.17487/RFC8348, March 2018.

   [X.736]    International Telecommunication Union, "Information
              Technology - Open Systems Interconnection - Systems
              Management: Security alarm reporting function",
              ITU-T Recommendation X.736, 1992.

Appendix A.  Vendor-specific Alarm-Types Example

   This example shows how to define alarm types in a vendor-specific
   module.  In this case, the vendor "xyz" has chosen to define
   top-level identities according to X.733 event types.

   module example-xyz-alarms {
     namespace "urn:example:xyz-alarms";
     prefix xyz-al;

     import ietf-alarms {
       prefix al;
     }

     identity xyz-alarms {
       base al:alarm-type-id;
     }

     identity communications-alarm {
       base xyz-alarms;
     }
     identity quality-of-service-alarm {
       base xyz-alarms;
     }
     identity processing-error-alarm {
       base xyz-alarms;
     }
     identity equipment-alarm {
       base xyz-alarms;
     }
     identity environmental-alarm {
       base xyz-alarms;
     }

     // communications alarms
     identity link-alarm {
       base communications-alarm;
     }

     // QoS alarms
     identity high-jitter-alarm {
       base quality-of-service-alarm;
     }
   }

Appendix B.
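Alarm Inventory Example

   This example shows an alarm inventory with two alarm types: one
   defined only by its identifier, and one that is dynamically
   configured.  In the latter case, a digital input has been connected
   to a smoke detector; therefore, the 'alarm-type-qualifier' is set to
   "smoke-alarm" and the 'alarm-type-id' is set to
   "environmental-alarm".

   <alarms xmlns="urn:ietf:params:xml:ns:yang:ietf-alarms"
           xmlns:xyz-al="urn:example:xyz-alarms"
           xmlns:dev="urn:example:device">
     <alarm-inventory>
       <alarm-type>
         <alarm-type-id>xyz-al:link-alarm</alarm-type-id>
         <alarm-type-qualifier/>
         <resource>
           /dev:interfaces/dev:interface
         </resource>
         <will-clear>true</will-clear>
         <description>
           Link failure, operational state down but admin state up
         </description>
       </alarm-type>
       <alarm-type>
         <alarm-type-id>xyz-al:environmental-alarm</alarm-type-id>
         <alarm-type-qualifier>smoke-alarm</alarm-type-qualifier>
         <will-clear>true</will-clear>
         <description>
           Connected smoke detector to digital input
         </description>
       </alarm-type>
     </alarm-inventory>
   </alarms>

Appendix C.  Alarm List Example

   In this example, we show an alarm that has toggled [major, clear,
   major].  An operator has acknowledged the alarm.

   <alarms xmlns="urn:ietf:params:xml:ns:yang:ietf-alarms"
           xmlns:xyz-al="urn:example:xyz-alarms"
           xmlns:dev="urn:example:device">
     <alarm-list>
       <number-of-alarms>1</number-of-alarms>
       <last-changed>2015-04-08T08:39:50.00Z</last-changed>
       <alarm>
         <resource>
           /dev:interfaces/dev:interface[name='FastEthernet1/0']
         </resource>
         <alarm-type-id>xyz-al:link-alarm</alarm-type-id>
         <alarm-type-qualifier></alarm-type-qualifier>
         <time-created>2015-04-08T08:39:50.00Z</time-created>
         <is-cleared>false</is-cleared>
         <alt-resource>1.3.6.1.2.1.2.2.1.1.17</alt-resource>
         <last-changed>2015-04-08T08:39:40.00Z</last-changed>
         <perceived-severity>major</perceived-severity>
         <alarm-text>
           Link operationally down but administratively up
         </alarm-text>
         <status-change>
           <perceived-severity>major</perceived-severity>
           <alarm-text>
             Link operationally down but administratively up
           </alarm-text>
         </status-change>
         <status-change>
           <perceived-severity>cleared</perceived-severity>
           <alarm-text>
             Link operationally up and administratively up
           </alarm-text>
         </status-change>
         <status-change>
           <perceived-severity>major</perceived-severity>
           <alarm-text>
             Link operationally down but administratively up
           </alarm-text>
         </status-change>
         <operator-state-change>
           <state>ack</state>
           <operator>joe</operator>
           <text>Will investigate, ticket TR764999</text>
         </operator-state-change>
       </alarm>
     </alarm-list>
   </alarms>

Appendix D.  Alarm Shelving Example

   This example shows how to shelve alarms.  We shelve alarms related to
   the smoke detectors, since they are being installed and tested.  We
   also shelve all alarms from FastEthernet1/0.

   <alarms xmlns="urn:ietf:params:xml:ns:yang:ietf-alarms"
           xmlns:xyz-al="urn:example:xyz-alarms"
           xmlns:dev="urn:example:device">
     <control>
       <alarm-shelving>
         <shelf>
           <name>FE10</name>
           <resource>
             /dev:interfaces/dev:interface[name='FastEthernet1/0']
           </resource>
         </shelf>
         <shelf>
           <name>detectortest</name>
           <alarm-type-id>xyz-al:environmental-alarm</alarm-type-id>
           <alarm-type-qualifier-match>
             smoke-alarm
           </alarm-type-qualifier-match>
         </shelf>
       </alarm-shelving>
     </control>
   </alarms>

Appendix E.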
X.733 Mapping Example

   This example shows how to map a dynamic alarm type
   (alarm-type-id=environmental-alarm,
   alarm-type-qualifier=smoke-alarm) to the corresponding X.733
   event-type and probable-cause parameters.

   <alarms xmlns="urn:ietf:params:xml:ns:yang:ietf-alarms"
           xmlns:xyz-al="urn:example:xyz-alarms">
     <control>
       <x733-mapping
           xmlns="urn:ietf:params:xml:ns:yang:ietf-alarms-x733">
         <alarm-type-id>xyz-al:environmental-alarm</alarm-type-id>
         <alarm-type-qualifier-match>
           smoke-alarm
         </alarm-type-qualifier-match>
         <event-type>quality-of-service-alarm</event-type>
         <probable-cause>777</probable-cause>
       </x733-mapping>
     </control>
   </alarms>

Appendix F.  Relationships to other standards

   This section briefly describes how this alarm module relates to other
   relevant standards.

F.1.  Relationship to RFC 8348

   RFC 8348 [RFC8348] defines a YANG data model for the management of
   hardware.  The "alarm-state" in RFC 8348 (and EntityAlarmStatus in
   RFC 4268 [RFC4268]) is a summary of the alarm severity levels that
   may be active on the specific hardware component.  It does not say
   anything about how alarms are reported, and it does not provide any
   details of the alarms.

   The mapping between the alarm YANG data model and the alarm-state in
   RFC 8348 is outlined below:

   resource:  corresponds to /hardware/component/

   is-cleared:  no bit set in /hardware/component/state/alarm-state

   perceived-severity:  corresponding bit set in
      /hardware/component/state/alarm-state

   operator-state-change/state:  if the alarm is acknowledged by the
      operator, it may correspond to under-repair

F.2.  Relationship to other alarm standards

F.2.1.  Alarm definition

   The table below summarizes relevant definitions of the term "alarm"
   in other alarm standards.

   +------------+---------------------------+--------------------------+
   | Standard   | Definition                | Comment                  |
   +------------+---------------------------+--------------------------+
   | X.733      | error: A deviation of a   | The X.733 alarm          |
   | [X.733]    | system from normal        | definition is focused on |
   |            | operation. fault: The     | the notification as such |
   |            | physical or algorithmic   | and not the state. It    |
   |            | cause of a malfunction.   | also uses the basic      |
   |            | Faults manifest           | criteria of deviation    |
   |            | themselves as errors.     | from normal condition.   |
   |            | alarm: A notification, of | There is no requirement  |
   |            | the form defined by this  | for an operator action.  |
   |            | function, of a specific   |                          |
   |            | event. An alarm may or    |                          |
   |            | may not represent an      |                          |
   |            | error.                    |                          |
   |            |                           |                          |
   | G.7710     | Alarms are indications    | The G.7710 definition is |
   | [G.7710]   | that are automatically    | close to the original    |
   |            | generated by an NE as a   | X.733 definition.        |
   |            | result of the declaration |                          |
   |            | of a failure.             |                          |
   |            |                           |                          |
   | Alarm MIB  | Alarm: Persistent         | RFC 3877 defines alarm   |
   | [RFC3877]  | indication of a fault.    | referring back to "a     |
   |            | Fault: Lasting error or   | deviation from normal    |
   |            | warning condition.        | operation". This is      |
   |            | Error: A deviation of a   | problematic, since this  |
   |            | system from normal        | might not require an     |
   |            | operation.                | operator action. The     |
   |            |                           | alarm MIB is state       |
   |            |                           | oriented rather than     |
   |            |                           | notification oriented;   |
   |            |                           | an alarm is a "lasting   |
   |            |                           | condition", not a        |
   |            |                           | discrete notification    |
   |            |                           | reporting about a        |
   |            |                           | condition state change.  |
   |            |                           |                          |
   | ISA        | Alarm: An audible and/or  | The ISA standard adds an |
   | [ISA182]   | visible means of          | important requirement to |
   |            | indicating to the         | the "deviation from      |
   |            | operator an equipment     | normal condition state": |
   |            | malfunction, process      | requiring a response.    |
   |            | deviation or abnormal     |                          |
   |            | condition requiring a     |                          |
   |            | response.                 |                          |
   |            |                           |                          |
   | EEMUA      | An alarm is an event to   | This is the foundation   |
   | [EEMUA]    | which an operator must    | for the definition of    |
   |            | knowingly react, respond, | alarm in this document.  |
   |            | and acknowledge - not     | It focuses on the core   |
   |            | simply acknowledge and    | criterion that an action |
   |            | ignore.                   | is really needed.        |
   |            |                           |                          |
   | 3GPP Alarm | 3GPP v15: An alarm        | The latest 3GPP Alarm    |
   | IRP        | signifies an undesired    | IRP version uses         |
   | [ALARMIRP] | condition of a resource   | literally the same alarm |
   |            | (e.g. network element,    | definition as this alarm |
   |            | link) for which an        | module. It is worth      |
   |            | operator action is        | noting that earlier      |
   |            | required. It emphasizes a | versions used a          |
   |            | key requirement that      | definition not requiring |
   |            | operators [...] should    | an operator action and   |
   |            | not be informed about an  | the broader definition   |
   |            | undesired condition       | of deviation from normal |
   |            | unless it requires        | condition. The earlier   |
   |            | operator action. 3GPP     | version also defined an  |
   |            | v12: alarm: abnormal      | alarm as a special case  |
   |            | network entity condition, | of "event".              |
   |            | which categorizes an      |                          |
   |            | event as a fault. fault:  |                          |
   |            | a deviation of a system   |                          |
   |            | from normal operation,    |                          |
   |            | which may result in the   |                          |
   |            | loss of operational       |                          |
   |            | capabilities [...]        |                          |
   +------------+---------------------------+--------------------------+

               Table 1: Definition of alarm in standards

   The definition of alarm has evolved from a focus on events reporting
   a deviation from normal operation towards a definition of an
   undesired *state* that *requires an operator action*.

F.2.2.  Data model

   This section describes how this YANG alarm module relates to other
   standard data models.
   Note that this section covers other data models for alarm
   interfaces, not other standards such as SDO-specific alarms.

F.2.2.1.  X.733

   X.733 has acted as a base for several alarm data models over the
   years.  The YANG alarm module differs in the following ways:

      X.733 models the alarm list as a list of notifications.  The YANG
      alarm module defines the alarm list as the current alarm states
      for the resources, which is generated from the state change
      reporting notifications.

      In X.733, an alarm can have the severity level clear.  In the YANG
      alarm module, "clear" is not a severity level; it is a separate
      state of the alarm.  For example, an alarm can have the states
      (major, cleared) or (minor, not cleared).

      X.733 uses a flat, globally defined enumeration, "probable cause",
      to identify alarm types.  This alarm module uses a hierarchical
      YANG identity, alarm-type.  This enables delegation of alarm types
      within organizations.  It also lets management reason about
      "abstract" alarm types corresponding to base identities; see
      Section 3.2.

      The YANG alarm module does not include the majority of the X.733
      alarm attributes.  Instead, these are defined in an augmenting
      module, for use when "strict" X.733 compliance is needed.

F.2.2.2.  RFC 3877, the Alarm MIB

   The MIB in RFC 3877 takes a different approach: rather than defining
   a concrete data model for alarms, it defines a model to map existing
   SNMP-managed objects and notifications into alarm states and alarm
   notifications.  This was necessary since MIBs were already defined
   with both managed objects and notifications indicating alarms, for
   example, linkUp and linkDown notifications in combination with
   ifAdminStatus and ifOperStatus.  So, RFC 3877 cannot really be
   compared to the alarm YANG module in that sense.

   The Alarm MIB maps existing MIB definitions into alarms using the
   alarmModelTable.
   The upside is that an SNMP manager can read the possible
   alarm types at runtime.  This corresponds to the alarm inventory in
   the alarm YANG module.

F.2.2.3.  3GPP Alarm IRP

   The 3GPP Alarm IRP is an evolution of X.733.  The main differences
   between the alarm YANG module and 3GPP are:

      3GPP keeps the majority of the X.733 attributes; the alarm YANG
      module does not.

      3GPP introduced overlapping and possibly conflicting keys for
      alarms: alarmId and (managed object, event type, probable cause,
      specific problem).  (See Annex C in [X.733], Example 3.)  In the
      YANG alarm module, the key for identifying an alarm instance is
      clearly defined by (resource, alarm-type, alarm-type-qualifier).
      See also Section 3.4 for more information.

      The alarm YANG module clearly separates the resource/
      instrumentation life cycle from the operator life cycle.  3GPP
      allows operators to set the alarm severity to clear; this is not
      allowed by this module.  Instead, an operator closes an alarm,
      which does not affect the severity.

F.2.2.4.  G.7710

   G.7710 is different from the previously referenced alarm standards.
   It does not define a data model for alarm reporting.  Instead, it
   defines common equipment management function requirements, including
   alarm instrumentation.  The scope is transport networks.

   The requirements in G.7710 correspond to features in the alarm YANG
   module in the following way:

      Alarm Severity Assignment Profile (ASAP):  the alarm profile
      "/alarms/alarm-profile/".

      Alarm Reporting Control (ARC):  alarm shelving
      "/alarms/control/alarm-shelving/" and the ability to control alarm
      notifications "/alarms/control/notify-status-changes".  Alarm
      shelving corresponds to the use case of turning off alarm
      reporting for a specific resource, the NALM state in M.3100.

Appendix G.  Alarm Usability Requirements

   This section defines usability requirements for alarms.  Alarm
   usability is important for an alarm interface.  A data model helps
   define the format, but if the actual alarms are of low value, the
   goal of alarm management has not been achieved.

   Common alarm problems and their causes are summarized in Table 2.
   This summary is adapted to networking from the ISA [ISA182] and EEMUA
   [EEMUA] standards.

   +------------------+--------------------------------+---------------+
   | Problem          | Cause                          | How this      |
   |                  |                                | module        |
   |                  |                                | addresses the |
   |                  |                                | cause         |
   +------------------+--------------------------------+---------------+
   | Alarms are       | "Nuisance" alarms (chattering  | Strict        |
   | generated but    | alarms and fleeting alarms),   | definition of |
   | they are ignored | faulty hardware, redundant     | alarms        |
   | by the operator. | alarms, cascading alarms,      | requiring     |
   |                  | incorrect alarm settings,      | corrective    |
   |                  | alarms have not been           | response.     |
   |                  | rationalized, the alarms       | Alarm         |
   |                  | represent log information      | requirements  |
   |                  | rather than true alarms.       | in Table 3.   |
   |                  |                                |               |
   | When alarms      | Insufficient alarm response    | The alarm     |
   | occur, operators | procedures and poorly defined  | inventory     |
   | do not know how  | alarm types.                   | lists all     |
   | to respond.      |                                | alarm types   |
   |                  |                                | and           |
   |                  |                                | corrective    |
   |                  |                                | actions.      |
   |                  |                                | Alarm         |
   |                  |                                | requirements  |
   |                  |                                | in Table 3.   |
   |                  |                                |               |
   | The alarm        | Nuisance alarms, stale alarms, | The alarm     |
   | display is full  | alarms from equipment not in   | definition    |
   | of alarms, even  | service.                       | and alarm     |
   | when there is    |                                | shelving.     |
   | nothing wrong.   |                                |               |
   |                  |                                |               |
   | During a         | Incorrect prioritization of    | State-based   |
   | failure,         | alarms. Not using advanced     | alarm model,  |
   | operators are    | alarm techniques (e.g., state- | alarm rate    |
   | flooded with so  | based alarming).               | requirements  |
   | many alarms that |                                | in Table 4    |
   | they do not know |                                | and Table 5.  |
   | which ones are   |                                |               |
   | the most         |                                |               |
   | important.       |                                |               |
   +------------------+--------------------------------+---------------+

                   Table 2: Alarm Problems and Causes

   Based on the problems above, EEMUA gives the following definition of
   a good alarm:

   +----------------+--------------------------------------------------+
   | Characteristic | Explanation                                      |
   +----------------+--------------------------------------------------+
   | Relevant       | Not spurious or of low operational value.        |
   |                |                                                  |
   | Unique         | Not duplicating another alarm.                   |
   |                |                                                  |
   | Timely         | Not long before any response is needed or too    |
   |                | late to do anything.                             |
   |                |                                                  |
   | Prioritized    | Indicating the importance that the operator      |
   |                | deals with the problem.                          |
   |                |                                                  |
   | Understandable | Having a message which is clear and easy to      |
   |                | understand.                                      |
   |                |                                                  |
   | Diagnostic     | Identifying the problem that has occurred.       |
   |                |                                                  |
   | Advisory       | Indicative of the action to be taken.            |
   |                |                                                  |
   | Focusing       | Drawing attention to the most important issues.  |
   +----------------+--------------------------------------------------+

                   Table 3: Definition of a Good Alarm

   Vendors SHOULD rationalize all alarms according to Table 3.  Another
   crucial requirement is acceptable alarm notification rates.
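   The EEMUA rates in Table 4 and Table 5 lend themselves to a simple
   automated check.  The following Python sketch is hypothetical and is
   not part of the YANG module.  The function names are invented for
   illustration.  The input is assumed to be alarm counts already summed
   over every system or NMS that feeds one console.  Because the tables
   leave gaps between rows, the cut-over points (1, 5, and 10 minutes
   between alarms in steady state, and 10 to 19 alarms in a burst) are
   assumptions rather than EEMUA values.

```python
from datetime import timedelta

# Hypothetical helpers (not part of the ietf-alarms module) that classify
# alarm counts for ONE alarm console against the EEMUA bands in Tables 4
# and 5. Counts must already be summed over every system or NMS feeding
# the console. Boundaries between table rows are assumptions.


def classify_steady_state_rate(alarm_count: int, window: timedelta) -> str:
    """Classify the long-term alarm rate observed over `window` (Table 4)."""
    if window <= timedelta(0):
        raise ValueError("observation window must be positive")
    if alarm_count <= 0:
        return "very likely to be acceptable"
    # Average number of minutes between alarms over the window.
    minutes_per_alarm = (window / timedelta(minutes=1)) / alarm_count
    if minutes_per_alarm < 1:      # more than one alarm per minute
        return "very likely to be unacceptable"
    if minutes_per_alarm < 5:      # around one alarm per 2 minutes
        return "likely to be over-demanding"
    if minutes_per_alarm <= 10:    # around one alarm per 5 minutes
        return "manageable"
    return "very likely to be acceptable"  # less than one per 10 minutes


def classify_burst(alarms_in_first_10_minutes: int) -> str:
    """Classify alarms shown in the 10 minutes after a major problem (Table 5)."""
    if alarms_in_first_10_minutes > 100:
        return "definitely excessive"
    if alarms_in_first_10_minutes >= 20:
        return "hard to cope with"
    if alarms_in_first_10_minutes < 10:
        return "should be manageable"
    return "not covered by the table"  # 10-19 falls between the table rows


if __name__ == "__main__":
    # Hypothetical per-system counts over an 8-hour shift, summed for one console.
    per_system = {"router-a": 40, "router-b": 35, "nms": 21}
    total = sum(per_system.values())  # 96 alarms in 480 minutes
    print(classify_steady_state_rate(total, timedelta(hours=8)))  # manageable
    print(classify_burst(35))  # hard to cope with
```

   With 96 alarms over an 8-hour shift, which is one alarm every 5
   minutes on average, the sketch reports "manageable".  A burst of 35
   alarms in the first 10 minutes reports "hard to cope with".  Keeping
   the thresholds in one place makes them easy to tighten if an operator
   adopts stricter targets.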
   Vendors SHOULD make sure that they do not exceed the recommendations
   from EEMUA below:

   +-----------------------------------+-------------------------------+
   | Long Term Alarm Rate in Steady    | Acceptability                 |
   | Operation                         |                               |
   +-----------------------------------+-------------------------------+
   | More than one per minute          | Very likely to be             |
   |                                   | unacceptable.                 |
   |                                   |                               |
   | One per 2 minutes                 | Likely to be over-demanding.  |
   |                                   |                               |
   | One per 5 minutes                 | Manageable.                   |
   |                                   |                               |
   | Less than one per 10 minutes      | Very likely to be acceptable. |
   +-----------------------------------+-------------------------------+

              Table 4: Acceptable Alarm Rates, Steady State

   +----------------------------+--------------------------------------+
   | Number of alarms displayed | Acceptability                        |
   | in 10 minutes following a  |                                      |
   | major network problem      |                                      |
   +----------------------------+--------------------------------------+
   | More than 100              | Definitely excessive and very likely |
   |                            | to lead the operator to abandon      |
   |                            | the use of the alarm system.         |
   |                            |                                      |
   | 20-100                     | Hard to cope with.                   |
   |                            |                                      |
   | Under 10                   | Should be manageable - but may be    |
   |                            | difficult if several of the alarms   |
   |                            | require a complex operator response. |
   +----------------------------+--------------------------------------+

                 Table 5: Acceptable Alarm Rates, Burst

   The numbers in Table 4 and Table 5 are the sum of all alarms for a
   network being managed from one alarm console.  So, every individual
   system or NMS contributes to these numbers.

   Vendors SHOULD make sure that the following rules are used in
   designing the alarm interface:

   1.  Rationalize the alarms in the system to ensure that every alarm
       is necessary, has a purpose, and follows the cardinal rule: that
       it requires an operator response.  Every alarm should adhere to
       the rules of Table 3.

   2.  Audit the quality of the alarms.  Talk with the operators about
       how well the alarm information supports them.  Do they know what
       to do in the event of an alarm?  Are they able to quickly
       diagnose the problem and determine the corrective action?  Does
       the alarm text adhere to the requirements in Table 3?

   3.  Analyze and benchmark the performance of the system and compare
       it to the recommended metrics in Table 4 and Table 5.  Start by
       identifying nuisance alarms, alarms standing in the normal state,
       and alarms raised at startup.

Authors' Addresses

   Stefan Vallin
   Stefan Vallin AB

   Email: stefan@wallan.se


   Martin Bjorklund
   Cisco

   Email: mbj@tail-f.com