Network Working Group                                          S. Vallin
Internet-Draft                                          Stefan Vallin AB
Intended status: Standards Track                            M. Bjorklund
Expires: March 24, 2019                                            Cisco
                                                      September 20, 2018


                           YANG Alarm Module
                    draft-ietf-ccamp-alarm-module-03

Abstract

   This document defines a YANG module for alarm management.  It
   includes functions for alarm list management, alarm shelving, and
   notifications to inform management systems.  There are also RPCs and
   actions to manage the operator state of an alarm and administrative
   alarm procedures.  The module carefully maps to relevant alarm
   standards.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on March 24, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Terminology and Notation
   2.  Objectives
   3.  Alarm Module Concepts
     3.1.  Alarm Definition
     3.2.  Alarm Type
     3.3.  Identifying the Alarming Resource
     3.4.  Identifying Alarm Instances
     3.5.  Alarm Life-Cycle
       3.5.1.  Resource Alarm Life-Cycle
       3.5.2.  Operator Alarm Life-cycle
       3.5.3.  Administrative Alarm Life-Cycle
     3.6.  Root Cause, Impacted Resources and Related Alarms
     3.7.  Alarm Shelving
     3.8.  Alarm Profiles
   4.  Alarm Data Model
     4.1.  Alarm Control
       4.1.1.  Alarm Shelving
     4.2.  Alarm Inventory
     4.3.  Alarm Summary
     4.4.  The Alarm List
     4.5.  The Shelved Alarms List
     4.6.  Alarm Profiles
     4.7.  RPCs and Actions
     4.8.  Notifications
   5.  Alarm YANG Module
   6.  X.733 Extensions
   7.  The X.733 Mapping Module
   8.  IANA Considerations
   9.  Security Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Vendor-specific Alarm-Types Example
   Appendix B.  Alarm Inventory Example
   Appendix C.  Alarm List Example
   Appendix D.  Alarm Shelving Example
   Appendix E.  X.733 Mapping Example
   Appendix F.  Background and Usability Requirements
     F.1.  Alarm Concepts
       F.1.1.  Alarm type
     F.2.  Relationships to other alarm standards
       F.2.1.  Alarm definition
       F.2.2.  Data model
     F.3.  Usability Requirements
   Authors' Addresses

1.  Introduction

   This document defines a YANG [RFC7950] module for alarm management.
   The purpose is to define a standardized alarm interface for network
   devices that can be easily integrated into management applications.
   The model is also applicable as a northbound alarm interface in the
   management applications.

   Alarm monitoring is a fundamental part of monitoring the network.
   Raw alarms from devices do not always tell the status of the network
   services or necessarily point to the root cause.  However, being able
   to feed alarms to the alarm management application in a standardized
   format is a starting point for performing higher-level network
   assurance tasks.
   The design of the module is based on experience from using and
   implementing available alarm standards from ITU [X.733], 3GPP
   [ALARMIRP], and ANSI [ISA182].

1.1.  Terminology and Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The following terms are defined in [RFC7950]:

   o  action

   o  client

   o  data tree

   o  RPC

   o  server

   The following terms are used within this document:

   o  Alarm (the general concept): An alarm signifies an undesirable
      state in a resource that requires corrective action.

   o  Alarm Type: An alarm type identifies a possible unique alarm state
      for a resource.  Alarm types are names that identify the state,
      such as "link-alarm", "jitter-violation", and
      "high-disk-utilization".

   o  Resource: A fine-grained identification of the alarming resource,
      for example, an interface or a process.

   o  Alarm Instance: The alarm state for a specific resource and alarm
      type, for example, (GigabitEthernet0/15, link-alarm).  An alarm
      instance is an entry in the alarm list.

   o  Alarm Inventory: A list of all possible alarm types on a system.

   o  Alarm Shelving: Blocking alarms according to specific criteria.

   o  Corrective Action: An action taken by an operator or automation
      routine in order to minimize the impact of the alarm or resolve
      the root cause.

   o  Management System: The alarm management application that consumes
      the alarms, i.e., acts as a client.

   o  System: The system that implements this YANG alarm module, i.e.,
      acts as a server.  This corresponds to a network device or a
      management application that provides a northbound alarm
      interface.
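   The "Alarm Instance" term can be illustrated with a short Python
   sketch.  It is not part of the module.  All class, method, and field
   names are hypothetical, and severities are plain strings.  Following
   Section 3.4, the sketch assumes that an instance is keyed by
   (resource, alarm type id, alarm type qualifier).  Every report for
   the same key therefore updates one entry instead of adding a new one.
   A report of "cleared" only sets a flag and adds a history row.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical key of an alarm instance, following Section 3.4:
# (resource, alarm-type-id, alarm-type-qualifier).
AlarmKey = Tuple[str, str, str]


@dataclass
class AlarmInstance:
    """One entry in a hypothetical in-memory alarm list."""
    resource: str
    alarm_type_id: str
    alarm_type_qualifier: str
    is_cleared: bool = False
    perceived_severity: Optional[str] = None  # never "cleared"
    alarm_text: str = ""
    last_changed: Optional[str] = None
    # Resource-side history rows: (time, severity-with-clear, text).
    status_change: List[Tuple[str, str, str]] = field(default_factory=list)


class AlarmList:
    """Maps every alarm instance key to exactly one AlarmInstance."""

    def __init__(self) -> None:
        self.alarms: Dict[AlarmKey, AlarmInstance] = {}

    def report(self, time: str, resource: str, alarm_type_id: str,
               qualifier: str, severity: str, text: str) -> AlarmInstance:
        """Apply one resource state change reported by instrumentation.

        'severity' may be "cleared", as in severity-with-clear.
        """
        key = (resource, alarm_type_id, qualifier)
        alarm = self.alarms.get(key)
        if alarm is None:
            # The first report for a key creates the alarm instance.
            alarm = AlarmInstance(resource, alarm_type_id, qualifier)
            self.alarms[key] = alarm
        if severity == "cleared":
            # Clearance is a flag; this sketch keeps the last raised
            # severity in perceived_severity.
            alarm.is_cleared = True
        else:
            alarm.is_cleared = False
            alarm.perceived_severity = severity
        alarm.alarm_text = text
        alarm.last_changed = time
        alarm.status_change.append((time, severity, text))
        return alarm
```

   Suppose "major" and then "cleared" are reported for the same
   resource, "link-alarm", and an empty qualifier.  The result is one
   entry with is_cleared set, perceived_severity still "major", and two
   rows in status_change.  A report with a different qualifier creates a
   second instance.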
   Tree diagrams used in this document follow the notation defined in
   [RFC8340].

2.  Objectives

   The objectives for the design of the Alarm Module are:

   o  Simple to use.  If a system supports this module, it shall be
      straightforward to integrate it into a YANG-based alarm manager.

   o  View alarms as states on resources and not as discrete
      notifications.

   o  Clear definition of "alarm" in order to exclude general events
      that should not be forwarded as alarm notifications.

   o  Clear and precise identification of alarm types and alarm
      instances.

   o  A management system should be able to pull all available alarm
      types from a system, i.e., read the alarm inventory from a system.
      This makes it possible to prepare alarm operators with
      corresponding alarm instructions.

   o  Address alarm usability requirements; see Appendix F.  While the
      IETF has not really addressed alarm management, telecom standards
      have addressed it purely from a protocol perspective.  The process
      industry has published several relevant standards addressing
      requirements for a useful alarm interface, e.g., [EEMUA] and
      [ISA182].  This alarm module defines usability requirements as
      well as a YANG data model.

   o  Mapping to X.733, which is a requirement for some alarm systems.
      Still, keep some of the X.733 concepts out of the core model in
      order to make the model small and easy to understand.

3.  Alarm Module Concepts

   This section defines the fundamental concepts behind the data model.
   This section is rooted in the works of Vallin et al. [ALARMSEM].

3.1.  Alarm Definition

   An alarm signifies an undesirable state in a resource that requires
   corrective action.

   There are two main things to remember from this definition:

   1.  The definition focuses on leaving out events and logging
       information in general.  Alarms should only be used for
       undesired states that require action.

   2.  The definition also focuses on alarms as a state on a resource,
       not on the notifications that report the state changes.

   See Appendix F for more motivation and consequences around this
   definition as well as how it relates to other alarm standards.

3.2.  Alarm Type

   This document defines an alarm type by an alarm type id and an alarm
   type qualifier.

   The alarm type id is modeled as a YANG identity.  With YANG
   identities, new alarm types can be defined in a distributed fashion.
   YANG identities are hierarchical, which means that a hierarchy of
   alarm types can be defined.

   Standards and vendors should define their own alarm type identities
   based on this definition.

   The use of YANG identities means that all possible alarms are
   identified at design time.  This explicit declaration of alarm types
   makes it easier to allow for alarm qualification reviews and
   preparation of alarm actions and documentation.

   There are occasions where the alarm types are not known at design
   time.  An example is a system with digital inputs that allows users
   to connect detectors (e.g., smoke detectors) to the inputs.  In this
   case, a configuration action specifies, for example, that certain
   connectors are fire alarms.

   In order to allow for dynamic addition of alarm types, the alarm
   module allows for further qualification of the identity-based alarm
   type using a string.  A potential drawback is a significant risk that
   alarm operators will be surprised by alarm types for which no defined
   alarm procedure exists, and will not know how to resolve the problem.
   To avoid this risk, the system MUST publish all possible alarm types
   in the alarm inventory; see Section 4.2.

   A vendor or standards organization can define its own alarm type
   hierarchy.
   The example below shows a hierarchy based on X.733 event types:

     import ietf-alarms {
       prefix al;
     }
     identity vendor-alarms {
       base al:alarm-type;
     }
     identity communications-alarm {
       base vendor-alarms;
     }
     identity link-alarm {
       base communications-alarm;
     }

   Alarm types can be abstract.  An abstract alarm type is used as a
   base for defining hierarchical alarm types.  Concrete alarm types are
   used for alarm states and appear in the alarm inventory.  There are
   two kinds of concrete alarm types:

   1.  The last subordinate identity in the "alarm-type-id" hierarchy is
       concrete, for example: "alarm-identity.environmental-alarm.smoke".
       In this example, "alarm-identity" and "environmental-alarm" are
       abstract YANG identities, whereas "smoke" is a concrete YANG
       identity.

   2.  The YANG identity hierarchy is abstract and the concrete alarm
       type is defined by the dynamic alarm qualifier string, for
       example: "alarm-identity.environmental-alarm.external-detector"
       with alarm-type-qualifier "smoke".

   For example:

     // Alternative 1: concrete alarm type identity
     import ietf-alarms {
       prefix al;
     }
     identity environmental-alarm {
       base al:alarm-type;
       description "Abstract alarm type";
     }
     identity smoke {
       base environmental-alarm;
       description "Concrete alarm type";
     }

     // Alternative 2: concrete alarm type qualifier
     import ietf-alarms {
       prefix al;
     }
     identity environmental-alarm {
       base al:alarm-type;
       description "Abstract alarm type";
     }
     identity external-detector {
       base environmental-alarm;
       description
         "Abstract alarm type; a run-time configuration
          procedure sets the type of alarm detected.  This will
          be reported in the alarm-type-qualifier.";
     }

   A server SHOULD strive to minimize the number of dynamically defined
   alarm types.

3.3.  Identifying the Alarming Resource

   It is of vital importance to be able to refer to the alarming
   resource.  This reference must be as fine-grained as possible.  If
   the alarming resource exists in the data tree, then an
   instance-identifier MUST be used with the full path to the object.

   When the module is used in a controller/orchestrator/manager, the
   original device resource identification can be modified to include
   the device in the path.  The details depend on how devices are
   identified and are out of scope for this specification.

   Example:

      The original device alarm might identify the resource as
      "/dev:interfaces/dev:interface[dev:name='FastEthernet1/0']".

      The resource identification in the manager could look something
      like:
      "/mgr:devices/mgr:device[mgr:name='xyz123']/dev:interfaces/dev:interface[dev:name='FastEthernet1/0']"

   This module also allows for alternate naming of the alarming resource
   if it is not available in the data tree.

3.4.  Identifying Alarm Instances

   A primary goal of this alarm module is to remove any ambiguity in how
   alarm notifications are mapped to an update of an alarm instance.
   X.733, and especially 3GPP, were not clear on this point.  This YANG
   alarm module states that the tuple (resource, alarm type identifier,
   alarm type qualifier) corresponds to a single alarm instance.  This
   means that alarm notifications for the same resource and same alarm
   type are matched to update the same alarm instance.  These three
   leafs are therefore used as the key in the alarm list:

     list alarm {
       key "resource alarm-type-id alarm-type-qualifier";
       ...
     }

3.5.  Alarm Life-Cycle

   The alarm model clearly separates the resource alarm life-cycle from
   the operator and administrative life-cycles of an alarm.

   o  resource alarm life-cycle: the alarm instrumentation that controls
      alarm raise, clearance, and severity changes.
   o  operator alarm life-cycle: operators acting upon alarms with
      actions like acknowledgment and closing.  Closing an alarm implies
      that the operator considers the corrective action performed.
      Operators can also shelve (block/filter) alarms in order to avoid
      nuisance alarms.

   o  administrative alarm life-cycle: purging (deleting) unwanted
      alarms and compressing the alarm status change list.  This module
      exposes operations to manage the administrative life-cycle.  The
      server may also perform these operations based on other policies,
      but how that is done is out of scope for this document.

   A server SHOULD describe how long it retains cleared/closed alarms,
   i.e., whether they are kept until manually purged or are removed
   according to an automatic removal policy.

3.5.1.  Resource Alarm Life-Cycle

   From a resource perspective, an alarm can, for example, have the
   following life-cycle: raise, change severity, change severity, clear,
   raise again, etc.  Each of these status changes can have a different
   alarm text generated by the instrumentation.  Two important things
   to note:

   1.  Alarms are not deleted when they are cleared.  Deleting alarms
       is an administrative process.  The alarm module defines an rpc
       "purge-alarms" that deletes alarms.

   2.  Alarms are not cleared by operators; only the underlying
       instrumentation can clear an alarm.  Operators can close alarms.

   The YANG tree representation below illustrates the resource-oriented
   life-cycle:

     +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
        ...
        +--ro is-cleared             boolean
        +--ro last-changed           yang:date-and-time
        +--ro perceived-severity     severity
        +--ro alarm-text             alarm-text
        +--ro status-change* [time]
           +--ro time                  yang:date-and-time
           +--ro perceived-severity    severity-with-clear
           +--ro alarm-text            alarm-text

   For every status change from the resource perspective, a row is added
   to the "status-change" list.
   The last status values are also represented as leafs for the alarm.
   Note well that the alarm severity does not include "cleared"; alarm
   clearance is a boolean flag.

   An alarm can therefore look like this, where T is the value of
   "last-changed": ((GigabitEthernet0/25, link-alarm, ""), false, T,
   major, "Interface GigabitEthernet0/25 down")

3.5.2.  Operator Alarm Life-cycle

   Operators can also act upon alarms using the "set-operator-state"
   action:

     +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
        ...
        +--ro operator-state-change* [time] {operator-actions}?
        |  +--ro time        yang:date-and-time
        |  +--ro operator    string
        |  +--ro state       operator-state
        |  +--ro text?       string
        +---x set-operator-state {operator-actions}?
           +---w input
              +---w state    writable-operator-state
              +---w text?    string

   The operator state for an alarm can be "none", "ack", "shelved", or
   "closed".  Alarm deletion (using the "purge-alarms" rpc) can use this
   state as a criterion.  A closed alarm is an alarm where the operator
   has performed any required corrective actions.  Closed alarms are
   good candidates for being purged.

3.5.3.  Administrative Alarm Life-Cycle

   Deleting alarms from the alarm list is considered an administrative
   action.  This is supported by the "purge-alarms" rpc.  The
   "purge-alarms" rpc takes a filter as input.  The filter selects
   alarms based on the operator and resource life-cycles, such as "all
   closed cleared alarms older than a time specification".  The server
   may also perform these operations based on other policies, but how
   that is done is out of scope for this document.

   Alarms can be compressed.  Compressing an alarm deletes all entries
   in the alarm's "status-change" list except for the last status
   change.  A client can perform this using the "compress-alarms" rpc.
   The server may also perform these operations based on other policies,
   but how that is done is out of scope for this document.
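   The following Python sketch illustrates the two administrative
   operations described above.  It does not define the "purge-alarms"
   or "compress-alarms" rpcs.  Several things in it are assumptions
   made for illustration:

   o  the function names;

   o  the simplified filter, which uses a clearance flag, one operator
      state, and an age limit, modelled on the "all closed cleared
      alarms older than a time specification" example;

   o  measuring age from "last-changed";

   o  keeping the current operator state in a single field instead of
      deriving it from the "operator-state-change" list.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical alarm list key: (resource, alarm-type-id, alarm-type-qualifier).
AlarmKey = Tuple[str, str, str]


@dataclass
class Alarm:
    """Only the fields that the two administrative operations need."""
    is_cleared: bool
    operator_state: str  # "none", "ack", "shelved", or "closed"
    last_changed: datetime
    # Resource-side history rows: (time, severity-with-clear, text).
    status_change: List[Tuple[datetime, str, str]] = field(default_factory=list)


def purge_alarms(alarms: Dict[AlarmKey, Alarm], *, is_cleared: bool,
                 operator_state: str, older_than: datetime) -> int:
    """Delete every alarm that matches all filter criteria.

    Returns the number of purged alarms.
    """
    matching = [key for key, alarm in alarms.items()
                if alarm.is_cleared == is_cleared
                and alarm.operator_state == operator_state
                and alarm.last_changed < older_than]
    for key in matching:
        del alarms[key]
    return len(matching)


def compress_alarms(alarms: Dict[AlarmKey, Alarm]) -> int:
    """Keep only the most recent status-change row of every alarm.

    Returns the total number of deleted rows.  Alarms are never removed.
    """
    removed = 0
    for alarm in alarms.values():
        if len(alarm.status_change) > 1:
            latest = max(alarm.status_change, key=lambda row: row[0])
            removed += len(alarm.status_change) - 1
            alarm.status_change = [latest]
    return removed
```

   In this sketch, purging with is_cleared=True, operator_state="closed",
   and a cutoff time removes only alarms whose last change precedes the
   cutoff.  Compressing never deletes an alarm; it removes all but the
   newest "status-change" row.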
3.6.  Root Cause, Impacted Resources and Related Alarms

   The general principle of this alarm module is to limit the number of
   alarms.  The alarm has two leaf-lists to identify possible impacted
   resources and possible root-cause resources.  The system should not
   represent individual alarms for the possible root-cause resources and
   impacted resources.  These serve as hints only.  It is up to the
   client application to use this information to present the overall
   status.

   A system should always strive to identify the resource that can be
   acted upon as the "resource" leaf.  The "impacted-resource"
   leaf-list shall be used to identify any side effects of the alarm.
   The impacted resources cannot be acted upon to fix the problem.  An
   example of this kind of alarm might be a disk-full problem that
   impacts a number of databases.

   On some occasions, the system might not be capable of detecting the
   root cause, i.e., the resource that can be acted upon.  The
   instrumentation in this case only monitors the side effect and needs
   to represent an alarm that indicates a situation that needs acting
   upon.  The instrumentation still might identify possible candidates
   for the root-cause resource.  In this case, the "root-cause-resource"
   leaf-list can be used to indicate the candidate root-cause resources.
   An example of this kind of alarm might be an active test tool that
   detects an SLA violation on a VPN connection and identifies the
   devices along the chain as candidate root causes.

   The alarm module also supports a way to associate different alarms
   with each other with the "related-alarm" list.  This list enables the
   server to inform the client that certain alarms are related to other
   alarms.

   Note well that this module does not prescribe any dependencies or
   preference between the above alarm correlation mechanisms.
   Different systems have different capabilities, and the mechanisms
   described above are available to support the instrumentation
   features.

3.7.  Alarm Shelving

   Alarm shelving is an important function in order for alarm management
   applications and operators to stop superfluous alarms.  Shelving means
   that any alarms fulfilling the shelving criteria are ignored
   (blocked/filtered).  Shelved alarms appear in a dedicated shelved
   alarm list in order not to disturb the relevant alarms.  Shelved
   alarms do not generate notifications.

3.8.  Alarm Profiles

   Alarm profiles are used to configure further information for an
   alarm type.  This module supports configuring severity levels that
   override the system default levels.  This corresponds to the Alarm
   Severity Assignment Profile (ASAP) functionality in M.3100 [M.3100]
   and M.3160 [M.3160].  Other standard or enterprise modules can
   augment this list with further alarm type information.

4.  Alarm Data Model

   The fundamental parts of the data model are the "alarm-list" with
   associated notifications and the "alarm-inventory" list of all
   possible alarm types.  These MUST be implemented by a system.  The
   rest of the data model is made conditional with the YANG features
   "operator-actions", "alarm-shelving", "alarm-history",
   "alarm-summary", "alarm-profile", and "severity-assignment".

   The data model has the following overall structure:

     +--rw control
     |  +--rw max-alarm-status-changes?   union
     |  +--rw (notify-status-changes)?
     |  |  ...
     |  +--rw alarm-shelving {alarm-shelving}?
     |     ...
     +--ro alarm-inventory
     |  +--ro alarm-type* [alarm-type-id alarm-type-qualifier]
     |     ...
     +--ro summary {alarm-summary}?
     |  +--ro alarm-summary* [severity]
     |  |  ...
     |  +--ro shelves-active?   empty {alarm-shelving}?
     +--ro alarm-list
     |  +--ro number-of-alarms?   yang:gauge32
     |  +--ro last-changed?       yang:date-and-time
     |  +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
     |     ...
     +--ro shelved-alarms {alarm-shelving}?
     |  +--ro number-of-shelved-alarms?   yang:gauge32
     |  +--ro alarm-shelf-last-changed?   yang:date-and-time
     |  +--ro shelved-alarm*
     |          [resource alarm-type-id alarm-type-qualifier]
     |     ...
     +--rw alarm-profile*
             [alarm-type-id alarm-type-qualifier-match resource]
             {alarm-profile}?
        +--rw alarm-type-id                 al:alarm-type-id
        +--rw alarm-type-qualifier-match    string
        +--rw resource                      al:resource-match
        +--rw description                   string
        +--rw alarm-severity-assignment-profile
                {severity-assignment}?
           ...

4.1.  Alarm Control

   The "/alarms/control/notify-status-changes" choice controls whether
   notifications are sent for all state changes, only for raise and
   clear, or only for notifications more severe than a configured level.
   This feature, in combination with alarm shelving, corresponds to the
   ITU Alarm Report Control functionality.

   Every alarm has a list of status changes; this is a circular list.
   The length of this list is controlled by
   "/alarms/control/max-alarm-status-changes".

4.1.1.  Alarm Shelving

   The shelving control tree is shown below:

     +--rw control
        +--rw alarm-shelving {alarm-shelving}?
           +--rw shelf* [name]
              +--rw name                          string
              +--rw resource*                     resource-match
              +--rw alarm-type-id?                alarm-type-id
              +--rw alarm-type-qualifier-match?   string
              +--rw description?                  string

   Shelved alarms are shown in a dedicated shelved alarm list.  The
   instrumentation MUST move shelved alarms from the alarm list
   (/alarms/alarm-list) to the shelved alarm list
   (/alarms/shelved-alarms/).  Shelved alarms do not generate any
   notifications.  When the shelving criteria are removed or changed,
   the alarm list MUST be updated to the correct actual state of the
   alarms.

   Shelving and unshelving can only be performed by editing the shelf
   configuration.
It cannot be performed on individual alarms. The 607 server will add an operator state indicating that the alarm was 608 shelved/unshelved. 610 A leaf (/alarms/summary/shelfs-active) in the alarm summary indicates 611 if there are shelved alarms. 613 A system can select to not support the shelving feature. 615 4.2. Alarm Inventory 617 The alarm inventory represents all possible alarm types that may 618 occur in the system. A management system may use this to build alarm 619 procedures. The alarm inventory is relevant for several reasons: 621 The system might not instrument all defined alarm type identities, 622 and some alarm identities are abstract. 624 The system has configured dynamic alarm types using the alarm 625 qualifier. The inventory makes it possible for the management 626 system to discover these. 628 Note that the mechanism whereby dynamic alarm types are added using 629 the alarm type qualifier MUST populate this list. 631 The optional leaf-list "resource" in the alarm inventory enables the 632 system to publish for which resources a given alarm type may appear. 634 A server MUST implement the alarm inventory in order to enable 635 controlled alarm procedures in the client. 637 The alarm inventory tree is shown below: 639 +--ro alarm-inventory 640 +--ro alarm-type* [alarm-type-id alarm-type-qualifier] 641 +--ro alarm-type-id alarm-type-id 642 +--ro alarm-type-qualifier alarm-type-qualifier 643 +--ro resource* resource-match 644 +--ro has-clear boolean 645 +--ro severity-levels* severity 646 +--ro description string 648 4.3. Alarm Summary 650 The alarm summary list summarizes alarms per severity; how many 651 cleared, cleared and closed, and closed. It also gives an indication 652 if there are shelved alarms. 654 The alarm summary tree is shown below: 656 +--ro summary {alarm-summary}? 657 +--ro alarm-summary* [severity] 658 | +--ro severity severity 659 | +--ro total? yang:gauge32 660 | +--ro cleared? yang:gauge32 661 | +--ro cleared-not-closed? 
yang:gauge32 662 | | {operator-actions}? 663 | +--ro cleared-closed? yang:gauge32 664 | | {operator-actions}? 665 | +--ro not-cleared-closed? yang:gauge32 666 | | {operator-actions}? 667 | +--ro not-cleared-not-closed? yang:gauge32 668 | {operator-actions}? 669 +--ro shelves-active? empty {alarm-shelving}? 671 4.4. The Alarm List 673 The alarm list (/alarms/alarm-list) is a function from (resource, 674 alarm type, alarm type qualifier) to the current composite alarm 675 state. The composite state includes states for the resource life- 676 cycle such as severity, clearance flag and operator states such as 677 acknowledgment. 679 +--ro alarm-list 680 +--ro number-of-alarms? yang:gauge32 681 +--ro last-changed? yang:date-and-time 682 +--ro alarm* [resource alarm-type-id alarm-type-qualifier] 683 +--ro resource resource 684 +--ro alarm-type-id alarm-type-id 685 +--ro alarm-type-qualifier alarm-type-qualifier 686 +--ro alt-resource* resource 687 +--ro related-alarm* 688 | [resource alarm-type-id alarm-type-qualifier] 689 | +--ro resource 690 | | -> /alarms/alarm-list/alarm/resource 691 | +--ro alarm-type-id leafref 692 | +--ro alarm-type-qualifier leafref 693 +--ro impacted-resource* resource 694 +--ro root-cause-resource* resource 695 +--ro time-created yang:date-and-time 696 +--ro is-cleared boolean 697 +--ro last-changed yang:date-and-time 698 +--ro perceived-severity severity 699 +--ro alarm-text alarm-text 700 +--ro status-change* [time] {alarm-history}? 701 | +--ro time yang:date-and-time 702 | +--ro perceived-severity severity-with-clear 703 | +--ro alarm-text alarm-text 704 +--ro operator-state-change* [time] {operator-actions}? 705 | +--ro time yang:date-and-time 706 | +--ro operator string 707 | +--ro state operator-state 708 | +--ro text? string 709 +---x set-operator-state {operator-actions}? 710 | +---w input 711 | +---w state writable-operator-state 712 | +---w text? string 713 +---n operator-action {operator-actions}? 
714 +-- time yang:date-and-time 715 +-- operator string 716 +-- state operator-state 717 +-- text? string 719 Every alarm has three important states, the resource clearance state 720 "is-cleared", the severity "perceived-severity" and the operator 721 state available in the operator state change list. 723 In order to see the alarm history the resource state changes are 724 available in the "status-change" list and the operator history is 725 available in the "operator-state-change" list. 727 4.5. The Shelved Alarms List 729 The shelved alarm list has the same structure as the alarm list 730 above. It shows all the alarms that matches the shelving criteria 731 (/alarms/control/alarm-shelving). 733 4.6. Alarm Profiles 735 Alarm profiles (/alarms/alarm-profile/) is a list of configurable 736 alarm types. The list supports configurable alarm severity levels in 737 the container "alarm-severity-assignment-profile". If an alarm 738 matches the configured alarm type it MUST use the configured severity 739 level(s) instead of the system default. This configuration MUST also 740 be represented in the alarm inventory. 742 +--rw alarm-profile* 743 [alarm-type-id alarm-type-qualifier-match resource] 744 {alarm-profile}? 745 +--rw alarm-type-id al:alarm-type-id 746 +--rw alarm-type-qualifier-match string 747 +--rw resource al:resource-match 748 +--rw description string 749 +--rw alarm-severity-assignment-profile 750 {severity-assignment}? 751 +--rw severity-levels* al:severity 753 4.7. RPCs and Actions 755 The alarm module supports rpcs and actions to manage the alarms: 757 "purge-alarms" (rpc): delete alarms according to specific 758 criteria, for example all cleared alarms older then a specific 759 date. 761 "compress-alarms" (rpc): compress the status-change list for the 762 alarms. 764 "set-operator-state" (action): change the operator state for an 765 alarm: for example acknowledge. 767 4.8. 
Notifications 769 The alarm module supports a general notification to report alarm 770 state changes. It carries all relevant parameters for the alarm 771 management application. 773 There is also a notification to report that an operator changed the 774 operator state on an alarm, for example, by acknowledging it. 776 If the alarm inventory is changed, for example, when a new card type is 777 inserted, a notification tells the management application that 778 new alarm types are available. 780 5. Alarm YANG Module 782 This YANG module references [RFC6991]. 784 file "ietf-alarms@2018-09-20.yang" 785 module ietf-alarms { 786 yang-version 1.1; 787 namespace "urn:ietf:params:xml:ns:yang:ietf-alarms"; 788 prefix al; 790 import ietf-yang-types { 791 prefix yang; 792 reference "RFC 6991: Common YANG Data Types."; 793 } 795 organization 796 "IETF CCAMP Working Group"; 797 contact 798 "WG Web: 799 WG List: 801 Editor: Stefan Vallin 802 804 Editor: Martin Bjorklund 805 "; 806 description 807 "This module defines an interface for managing alarms. Main 808 inputs to the module design are the 3GPP Alarm IRP, ITU-T X.733 809 and ANSI/ISA-18.2 alarm standards. 811 Main features of this module include: 813 * Alarm list: 814 A list of all alarms. Cleared alarms stay in 815 the list until explicitly purged. 817 * Operator actions on alarms: 818 Acknowledging and closing alarms. 820 * Administrative actions on alarms: 821 Purging alarms from the list according to specific 822 criteria. 824 * Alarm inventory: 825 A management application can read all 826 alarm types implemented by the system. 828 * Alarm shelving: 829 Shelving (blocking) alarms according 830 to specific criteria. 832 * Alarm profiles: 833 A management system can attach further 834 information to alarm types, for example 835 overriding system default severity 836 levels. 838 This module uses a stateful view on alarms. An alarm is a state 839 for a specific resource (note that an alarm is not a 840 notification).
An alarm type is a possible alarm state for a 841 resource. For example, the tuple: 843 ('link-alarm', 'GigabitEthernet0/25') 845 is an alarm of type 'link-alarm' on the resource 846 'GigabitEthernet0/25'. 848 Alarm types are identified using YANG identities and an optional 849 string-based qualifier. The string-based qualifier allows for 850 dynamic extension of the statically defined alarm types. Alarm 851 types identify a possible alarm state and not the individual 852 notifications. For example, the traditional 'link-down' and 853 'link-up' notifications are two notifications referring to the 854 same alarm type 'link-alarm'. 856 With this design, there is no ambiguity about how alarm and alarm 857 clear correlation should be performed: notifications that report 858 the same resource and alarm type are considered updates of the 859 same alarm, e.g., clearing an active alarm or changing the 860 severity of an alarm. 862 The instrumentation can update 'perceived-severity' and 'alarm-text' on an 863 existing alarm. The above alarm example can therefore look 864 like: 866 (('link-alarm', 'GigabitEthernet0/25'), 867 warning, 868 'interface down while interface admin state is up') 870 There is a clear separation between updates on the alarm from 871 the underlying resource, like clear, and updates from an 872 operator, like acknowledging or closing an alarm: 874 (('link-alarm', 'GigabitEthernet0/25'), 875 warning, 876 'interface down while interface admin state is up', 877 cleared, 878 closed) 880 Administrative actions, like removing closed alarms older than a 881 given time, are supported. 883 This alarm module does not define how the underlying 884 instrumentation detects and clears the specific alarms. 885 That belongs to the SDO or enterprise that owns that 886 specific technology. 888 Copyright (c) 2018 IETF Trust and the persons identified as 889 authors of the code. All rights reserved.
891 Redistribution and use in source and binary forms, with or 892 without modification, is permitted pursuant to, and subject to 893 the license terms contained in, the Simplified BSD License set 894 forth in Section 4.c of the IETF Trust's Legal Provisions 895 Relating to IETF Documents 896 (https://trustee.ietf.org/license-info). 898 The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL 899 NOT', 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'MAY', and 900 'OPTIONAL' in the module text are to be interpreted as described 901 in RFC 2119 (https://tools.ietf.org/html/rfc2119). 903 This version of this YANG module is part of RFC XXXX 904 (https://tools.ietf.org/html/rfcXXXX); see the RFC itself for 905 full legal notices."; 907 revision 2018-09-20 { 908 description 909 "Initial revision."; 910 reference "RFC XXXX: YANG Alarm Module"; 911 } 913 /* 914 * Features 915 */ 917 feature operator-actions { 918 description 919 "This feature indicates that the system supports operator 920 states on alarms."; 921 } 923 feature alarm-shelving { 924 description 925 "This feature indicates that the system supports shelving 926 (blocking) alarms."; 927 } 929 feature alarm-history { 930 description 931 "This feature indicates that the server maintains a history of 932 state changes for each alarm. For example, if an alarm
For example, if an alarm 933 toggles between cleared and active 10 times, these state 934 changes are present in a separate list in the alarm."; 935 } 937 feature alarm-summary { 938 description 939 "This feature indicates that the server summarizes the number 940 of alarms per severity and operator state."; 941 } 943 feature alarm-profile { 944 description 945 "The system supports clients to configure further information 946 to each alarm type."; 947 } 949 feature severity-assignment { 950 description 951 "The system supports configurable alarm severity levels."; 952 reference 953 "M.3160/M.3100 Alarm Severity Assignment Profile, ASAP"; 954 } 956 /* 957 * Identities 958 */ 960 identity alarm-type-id { 961 description 962 "Base identity for alarm types. A unique identification of the 963 alarm, not including the resource. Different resources can 964 share alarm types. If the resource reports the same alarm 965 type, it is to be considered to be the same alarm. The alarm 966 type is a simplification of the different X.733 and 3GPP alarm 967 IRP alarm correlation mechanisms and it allows for 968 hierarchical extensions. 970 A string-based qualifier can be used in addition to the 971 identity in order to have different alarm types based on 972 information not known at design-time, such as values in 973 textual SNMP Notification var-binds. 975 Standards and vendors can define sub-identities to clearly 976 identify specific alarm types. 978 This identity is abstract and MUST NOT be used for alarms."; 979 } 981 /* 982 * Common types 983 */ 985 typedef resource { 986 type union { 987 type instance-identifier { 988 require-instance false; 989 } 990 type yang:object-identifier; 991 type yang:uuid; 992 type string; 993 } 994 description 995 "This is an identification of the alarming resource, such as an 996 interface. It should be as fine-grained as possible both to 997 guide the operator and to guarantee uniqueness of the alarms. 
999 If the alarming resource is modelled in YANG, this type will 1000 be an instance-identifier. 1002 If the resource is an SNMP object, the type will be an 1003 object-identifier. 1005 If the resource is anything else, for example, a distinguished 1006 name or a CIM path, this type will be a string. 1008 If the alarming object is identified by a UUID, use the uuid 1009 type. Be cautious when using this type, since a UUID is hard 1010 for an operator to use. 1012 If the server supports several models, the precedence should 1013 be in the order given in the union definition."; 1014 } 1016 typedef resource-match { 1017 type union { 1018 type yang:xpath1.0; 1019 type yang:object-identifier; 1020 type string; 1021 } 1022 description 1023 "This type is used to match resources of type 'resource'. 1024 Since the type 'resource' is a union of different types, 1025 the 'resource-match' type is also a union of corresponding 1026 types. 1028 If the type is given as an XPath 1.0 expression, a resource 1029 of type 'instance-identifier' matches if the instance is part 1030 of the node set that is the result of evaluating the XPath 1.0 1031 expression. For example, the XPath 1.0 expression: 1033 /if:interfaces/if:interface[if:type='ianaift:ethernetCsmacd'] 1035 would match the resource instance-identifier: 1037 /if:interfaces/if:interface[if:name='eth1'], 1039 assuming that the interface 'eth1' is of type 1040 'ianaift:ethernetCsmacd'. 1042 If the type is given as an object identifier, a resource of 1043 type 'object-identifier' matches if the match object 1044 identifier is a prefix of the resource's object identifier. 1045 For example, the value: 1047 1.3.6.1.2.1.2.2 1049 would match the resource object identifier: 1051 1.3.6.1.2.1.2.2.1.1.5 1053 If the type is given as a UUID or a string, it is interpreted 1054 as a W3C regular expression, which matches a resource of type 1055 'yang:uuid' or 'string' if the given regular expression 1056 matches the resource string.
1058 If the type is given as an XPath expression, it is evaluated 1059 in the following XPath context: 1061 o The set of namespace declarations is the set of those in scope on 1062 the leaf element where this type is used. 1064 o The set of variable bindings is empty. 1066 o The function library is the core function library 1067 and the functions defined in Section 10 of RFC 7950. 1071 o The context node is the root node in the data tree."; 1072 } 1074 typedef alarm-text { 1075 type string; 1076 description 1077 "The string used to inform operators about the alarm. This 1078 MUST contain enough information for an operator to be able 1079 to understand the problem and how to resolve it. If this 1080 string contains structure, the format should be clearly 1081 documented so that programs can parse that 1082 information."; 1083 } 1085 typedef severity { 1086 type enumeration { 1087 enum indeterminate { 1088 value 2; 1089 description 1090 "Indicates that the severity level could not be 1091 determined. This level SHOULD be avoided."; 1092 } 1093 enum minor { 1094 value 3; 1095 description 1096 "The 'minor' severity level indicates the existence of a 1097 non-service affecting fault condition and that corrective 1098 action should be taken in order to prevent a more serious 1099 (for example, service affecting) fault. Such a severity 1100 can be reported, for example, when the detected alarm 1101 condition is not currently degrading the capacity of the 1102 resource."; 1103 } 1104 enum warning { 1105 value 4; 1106 description 1107 "The 'warning' severity level indicates the detection of a 1108 potential or impending service affecting fault, before any 1109 significant effects have been felt.
Action should be 1110 taken to further diagnose (if necessary) and correct the 1111 problem in order to prevent it from becoming a more 1112 serious service affecting fault."; 1113 } 1114 enum major { 1115 value 5; 1116 description 1117 "The 'major' severity level indicates that a service 1118 affecting condition has developed and an urgent corrective 1119 action is required. Such a severity can be reported, for 1120 example, when there is a severe degradation in the 1121 capability of the resource and its full capability must be 1122 restored."; 1123 } 1124 enum critical { 1125 value 6; 1126 description 1127 "The 'critical' severity level indicates that a service 1128 affecting condition has occurred and an immediate 1129 corrective action is required. Such a severity can be 1130 reported, for example, when a resource becomes totally out 1131 of service and its capability must be restored."; 1132 } 1133 } 1134 description 1135 "The severity level of the alarm. Note well that the value 'clear' 1136 is not included. Whether an alarm is cleared or not is a separate 1137 boolean flag."; 1138 reference 1139 "ITU Recommendation X.733: Information Technology 1140 - Open Systems Interconnection 1141 - System Management: Alarm Reporting Function"; 1142 } 1144 typedef severity-with-clear { 1145 type union { 1146 type enumeration { 1147 enum cleared { 1148 value 1; 1149 description 1150 "The alarm is cleared by the instrumentation."; 1151 } 1152 } 1153 type severity; 1154 } 1155 description 1156 "The severity level of the alarm including clear. 1158 This is used *only* in notifications reporting state changes 1159 for an alarm."; 1160 } 1162 typedef writable-operator-state { 1163 type enumeration { 1164 enum none { 1165 value 1; 1166 description 1167 "The alarm is not being taken care of."; 1168 } 1169 enum ack { 1170 value 2; 1171 description 1172 "The alarm is being taken care of.
Corrective action not 1173 taken yet, or failed."; 1174 } 1175 enum closed { 1176 value 3; 1177 description 1178 "Corrective action taken successfully."; 1179 } 1180 } 1181 description 1182 "Operator states on an alarm. The 'closed' state indicates 1183 that an operator considers the alarm to be resolved. This 1184 is separate from the alarm's 'is-cleared' leaf."; 1185 } 1187 typedef operator-state { 1188 type union { 1189 type writable-operator-state; 1190 type enumeration { 1191 enum shelved { 1192 value 4; 1193 description 1194 "The alarm is shelved. Alarms in /alarms/shelved-alarms/ 1195 MUST be assigned this operator state by the server as 1196 the last entry in the operator-state-change list. The 1197 text for that entry SHOULD include the shelf name."; 1198 } 1199 enum un-shelved { 1200 value 5; 1201 description 1202 "The alarm is moved back to 'alarm-list' from a shelf. 1203 Alarms that are moved from /alarms/shelved-alarms/ to 1204 /alarms/alarm-list MUST be assigned this state by the 1205 server as the last entry in the 'operator-state-change' 1206 list. The text for that entry SHOULD include the shelf 1207 name."; 1208 } 1209 } 1210 } 1211 description 1212 "Operator states on an alarm. The 'closed' state indicates 1213 that an operator considers the alarm to be resolved. This 1214 is separate from the alarm's 'is-cleared' leaf."; 1215 } 1217 /* Alarm type */ 1219 typedef alarm-type-id { 1220 type identityref { 1221 base alarm-type-id; 1222 } 1223 description 1224 "Identifies an alarm type. The description of the alarm type 1225 id MUST indicate if the alarm type is abstract or not.
An 1226 abstract alarm type is used as a base for other alarm type ids 1227 and will not be used as a value for an alarm or be present in 1228 the alarm inventory."; 1229 } 1231 typedef alarm-type-qualifier { 1232 type string; 1233 description 1234 "If an alarm type cannot be fully specified at design time by 1235 alarm-type-id, this string qualifier is used in addition to 1236 fully define a unique alarm type. 1238 The definition of alarm qualifiers is considered part 1239 of the instrumentation and is out of scope for this module. 1240 An empty string is used when this is part of a key."; 1241 } 1243 /* 1244 * Groupings 1245 */ 1247 grouping common-alarm-parameters { 1248 description 1249 "Common parameters for an alarm. 1251 This grouping is used both in the alarm list and in the 1252 notification representing an alarm state change."; 1253 leaf resource { 1254 type resource; 1255 mandatory true; 1256 description 1257 "The alarming resource. See also 'alt-resource'. 1258 This could, for example, be a reference to the alarming 1259 interface."; 1260 } 1261 leaf alarm-type-id { 1262 type alarm-type-id; 1263 mandatory true; 1264 description 1265 "This leaf and the leaf 'alarm-type-qualifier' together 1266 provide a unique identification of the alarm type."; 1267 } 1268 leaf alarm-type-qualifier { 1269 type alarm-type-qualifier; 1270 description 1271 "This leaf is used when the 'alarm-type-id' leaf cannot 1272 uniquely identify the alarm type. Normally, this is not 1273 the case, and this leaf is the empty string."; 1274 } 1275 leaf-list alt-resource { 1276 type resource; 1277 description 1278 "Used if the alarming resource is available over other 1279 interfaces. This field can contain, for example, SNMP OIDs, CIM paths, or 1280 3GPP Distinguished Names."; 1281 } 1282 list related-alarm { 1283 key "resource alarm-type-id alarm-type-qualifier"; 1284 description 1285 "References to related alarms.
Note that the related alarm 1286 might have been purged from the alarm list."; 1287 leaf resource { 1288 type leafref { 1289 path "/alarms/alarm-list/alarm/resource"; 1290 require-instance false; 1291 } 1292 description 1293 "The alarming resource for the related alarm."; 1294 } 1295 leaf alarm-type-id { 1296 type leafref { 1297 path "/alarms/alarm-list/alarm" 1298 + "[resource=current()/../resource]" 1299 + "/alarm-type-id"; 1300 require-instance false; 1301 } 1302 description 1303 "The alarm type identifier for the related alarm."; 1304 } 1305 leaf alarm-type-qualifier { 1306 type leafref { 1307 path "/alarms/alarm-list/alarm" 1308 + "[resource=current()/../resource]" 1309 + "[alarm-type-id=current()/../alarm-type-id]" 1310 + "/alarm-type-qualifier"; 1311 require-instance false; 1312 } 1313 description 1314 "The alarm qualifier for the related alarm."; 1315 } 1316 } 1317 leaf-list impacted-resource { 1318 type resource; 1319 description 1320 "Resources that might be affected by this alarm. If the 1321 system creates an alarm on a resource and also has a mapping 1322 to other resources that might be impacted, these resources 1323 can be listed in this leaf-list. In this way, the system can 1324 create one alarm instead of several. For example, if an 1325 interface has an alarm, the 'impacted-resource' can 1326 reference the aggregated port channels."; 1327 } 1328 leaf-list root-cause-resource { 1329 type resource; 1330 description 1331 "Resources that are candidates for causing the alarm. If the 1332 system has a mechanism to understand the candidate root 1333 causes of an alarm, this leaf-list can be used to list the 1334 root cause candidate resources. In this way, the system can 1335 create one alarm instead of several. For example, if a 1336 logging system (the alarming resource) fails, the alarm can 1337 reference the file system in the 'root-cause-resource' 1338 leaf-list.
Note that the intended use is not to also send an 1339 alarm with the root-cause-resource as the alarming resource. 1340 The 'root-cause-resource' leaf-list is a hint and should not 1341 also generate an alarm for the same problem."; 1342 } 1343 } 1345 grouping alarm-state-change-parameters { 1346 description 1347 "Parameters for an alarm state change. 1349 This grouping is used both in the alarm list's 1350 status-change list and in the notification representing an 1351 alarm state change."; 1352 leaf time { 1353 type yang:date-and-time; 1354 mandatory true; 1355 description 1356 "The time the status of the alarm changed. The value 1357 represents the time the real alarm state change appeared 1358 in the resource and not when it was added to the 1359 alarm list. The /alarm-list/alarm/last-changed MUST be 1360 set to the same value."; 1361 } 1362 leaf perceived-severity { 1363 type severity-with-clear; 1364 mandatory true; 1365 description 1366 "The severity of the alarm as defined by X.733. Note
Note 1367 that this may not be the original severity since the alarm 1368 may have changed severity."; 1369 reference 1370 "ITU Recommendation X.733: Information Technology 1371 - Open Systems Interconnection 1372 - System Management: Alarm Reporting Function"; 1373 } 1374 leaf alarm-text { 1375 type alarm-text; 1376 mandatory true; 1377 description 1378 "A user friendly text describing the alarm state change."; 1379 reference 1380 "ITU Recommendation X.733: Information Technology 1381 - Open Systems Interconnection 1382 - System Management: Alarm Reporting Function"; 1383 } 1384 } 1386 grouping operator-parameters { 1387 description 1388 "This grouping defines parameters that can be changed by an 1389 operator."; 1390 leaf time { 1391 type yang:date-and-time; 1392 mandatory true; 1393 description 1394 "Timestamp for operator action on alarm."; 1395 } 1396 leaf operator { 1397 type string; 1398 mandatory true; 1399 description 1400 "The name of the operator that has acted on this 1401 alarm."; 1402 } 1403 leaf state { 1404 type operator-state; 1405 mandatory true; 1406 description 1407 "The operator's view of the alarm state."; 1408 } 1409 leaf text { 1410 type string; 1411 description 1412 "Additional optional textual information provided by 1413 the operator."; 1414 } 1415 } 1417 grouping resource-alarm-parameters { 1418 description 1419 "Alarm parameters that originates from the resource view."; 1420 leaf is-cleared { 1421 type boolean; 1422 mandatory true; 1423 description 1424 "Indicates the current clearance state of the alarm. An 1425 alarm might toggle from active alarm to cleared alarm and 1426 back to active again."; 1427 } 1428 leaf last-changed { 1429 type yang:date-and-time; 1430 mandatory true; 1431 description 1432 "A timestamp when the alarm status was last changed. 
Status 1433 changes are changes to 'is-cleared', 'perceived-severity', 1434 and 'alarm-text'."; 1435 } 1436 leaf perceived-severity { 1437 type severity; 1438 mandatory true; 1439 description 1440 "The last severity of the alarm. 1442 If an alarm was raised with severity 'warning', but later 1443 changed to 'major', this leaf will show 'major'."; 1444 } 1445 leaf alarm-text { 1446 type alarm-text; 1447 mandatory true; 1448 description 1449 "The last reported alarm text. This text should contain 1450 information for an operator to be able to understand 1451 the problem and how to resolve it."; 1452 } 1453 list status-change { 1454 if-feature "alarm-history"; 1455 key "time"; 1456 min-elements 1; 1457 description 1458 "A list of status change events for this alarm. 1460 The entry with latest time-stamp in this list MUST 1461 correspond to the leafs 'is-cleared', 'perceived-severity' 1462 and 'alarm-text' for the alarm. The time-stamp for that 1463 entry MUST be equal to the 'last-changed' leaf. 1465 This list is ordered according to the timestamps of 1466 alarm state changes. The last item corresponds to the 1467 latest state change. 1469 The following state changes creates an entry in this 1470 list: 1471 - changed severity (warning, minor, major, critical) 1472 - clearance status, this also updates the 'is-cleared' 1473 leaf 1474 - alarm text update"; 1475 uses alarm-state-change-parameters; 1476 } 1477 } 1479 /* 1480 * The /alarms data tree 1481 */ 1483 container alarms { 1484 description 1485 "The top container for this module."; 1486 container control { 1487 description 1488 "Configuration to control the alarm behaviour."; 1489 leaf max-alarm-status-changes { 1490 type union { 1491 type uint16; 1492 type enumeration { 1493 enum infinite { 1494 description 1495 "The status change entries are accumulated 1496 infinitely."; 1497 } 1498 } 1499 } 1500 default "32"; 1501 description 1502 "The status-change entries are kept in a circular list 1503 per alarm. 
When this number is exceeded, the oldest 1504 status change entry is automatically removed. If the 1505 value is 'infinite', the status change entries are 1506 accumulated infinitely."; 1507 } 1508 choice notify-status-changes { 1509 description 1510 "This choice controls the notifications sent for alarm status 1511 updates. There are three options: 1512 1. notifications are sent for all updates, including severity level 1513 changes and alarm text changes 1514 2. notifications are only sent for alarm raise and clear 1515 3. notifications are sent for status changes equal to or 1516 above the specified severity level. Clear notifications 1517 shall always be sent. 1518 Notifications shall also be sent for state changes that 1519 make an alarm less severe than the specified level. 1520 In option 3, assuming the severity level is set to major, 1521 and that the alarm has the following state changes 1522 [(Time, severity, clear)]: 1523 [(T1, major, -), (T2, minor, -), (T3, warning, -), 1524 (T4, minor, -), (T5, major, -), (T6, critical, -), 1525 (T7, major, -), (T8, major, clear)] 1526 In that case, notifications will be sent at 1527 T1, T2, T5, T6, T7, and T8."; 1528 leaf notify-all-state-changes { 1529 type empty; 1530 description 1531 "Send notifications for all status changes."; 1532 } 1533 leaf notify-raise-and-clear { 1534 type empty; 1535 description 1536 "Send notifications only for raise, clear, and re-raise. 1537 Notifications for severity level changes or alarm text 1538 changes are not sent."; 1539 } 1540 leaf notify-severity-level { 1541 type severity; 1542 description 1543 "Only send notifications for alarm state changes 1544 crossing the specified level. Always send clear 1545 notifications."; 1546 } 1547 } 1548 container alarm-shelving { 1549 if-feature "alarm-shelving"; 1550 description 1551 "The alarm-shelving/shelf list is used to shelve 1552 (block/filter) alarms.
The server will move any alarms 1553 corresponding to the shelving criteria from the 1554 alarms/alarm-list/alarm list to the 1555 alarms/shelved-alarms/shelved-alarm list. It will also 1556 stop sending notifications for the shelved alarms. The 1557 conditions in the shelf criteria are logically ANDed. 1558 When the shelving criteria are deleted or changed, the 1559 non-matching alarms MUST appear in the 1560 alarms/alarm-list/alarm list according to their real state. 1561 This means that the instrumentation MUST maintain states 1562 for the shelved alarms. Alarms that match the criteria 1563 shall have the operator-state 'shelved'. When the shelf 1564 configuration removes an alarm from the shelf, the 1565 server shall add the operator state 'un-shelved'."; 1566 list shelf { 1567 key "name"; 1568 leaf name { 1569 type string; 1570 description 1571 "An arbitrary name for the alarm shelf."; 1572 } 1573 description 1574 "Each entry defines the criteria for shelving alarms. 1575 Criteria are ANDed. If no criteria are specified, 1576 all alarms will be shelved."; 1577 leaf-list resource { 1578 type resource-match; 1579 description 1580 "Shelve alarms for matching resources."; 1581 } 1582 leaf alarm-type-id { 1583 type alarm-type-id; 1584 description 1585 "Shelve all alarms that have an alarm-type-id that is 1586 equal to or derived from the given alarm-type-id."; 1587 } 1588 leaf alarm-type-qualifier-match { 1589 type string; 1590 description 1591 "A W3C regular expression that is used to match 1592 an alarm type qualifier. Shelve all alarms that 1593 match this regular expression for the alarm 1594 type qualifier."; 1595 } 1596 leaf description { 1597 type string; 1598 description 1599 "An optional textual description of the shelf. This
This 1600 description should include the reason for shelving 1601 these alarms."; 1602 } 1603 } 1604 } 1605 } 1606 container alarm-inventory { 1607 config false; 1608 description 1609 "This alarm-inventory/alarm-type list contains all possible 1610 alarm types for the system. 1611 If the system knows for which resources a specific alarm 1612 type can appear, this is also identified in the inventory. 1613 The list also tells if each alarm type has a corresponding 1614 clear state. The inventory shall only contain concrete 1615 alarm types. 1617 The alarm inventory MUST be updated by the system when new 1618 alarm types can appear. This can be the case when installing new 1619 software modules or inserting new card types. A 1620 notification 'alarm-inventory-changed' is sent when the 1621 inventory is changed."; 1622 list alarm-type { 1623 key "alarm-type-id alarm-type-qualifier"; 1624 description 1625 "An entry in this list defines a possible alarm."; 1626 leaf alarm-type-id { 1627 type alarm-type-id; 1628 description 1629 "The statically defined alarm type identifier for this 1630 possible alarm."; 1631 } 1632 leaf alarm-type-qualifier { 1633 type alarm-type-qualifier; 1634 description 1635 "The optional, dynamically defined alarm type qualifier 1636 for this possible alarm."; 1637 } 1638 leaf-list resource { 1639 type resource-match; 1640 description 1641 "Optionally, specifies for which resources the alarm type 1642 is valid."; 1643 } 1644 leaf has-clear { 1645 type boolean; 1646 mandatory true; 1647 description 1648 "This leaf tells the operator if the alarm will be 1649 cleared when the correct corrective action has been 1650 taken. Implementations SHOULD strive to detect the 1651 cleared state for all alarm types. If this leaf is 1652 true, the operator can monitor the alarm until it 1653 becomes cleared after the corrective action has been 1654 taken.
If this leaf is false, the operator needs to 1655 validate that the alarm is no longer active using other 1656 mechanisms. Alarms can lack a corresponding clear due 1657 to missing instrumentation or because there is no logical 1658 corresponding clear state."; 1659 } 1660 leaf-list severity-levels { 1661 type severity; 1662 description 1663 "This leaf-list indicates the possible severity levels of 1664 this alarm type. Note well that 'clear' is not part of 1665 the severity type. In general, the severity level should 1666 be defined by the instrumentation based on dynamic state 1667 and not defined statically by the alarm type in order to 1668 provide a relevant severity level based on dynamic state 1669 and context. However, most alarm types have a defined set 1670 of possible severity levels, and this set should be provided 1671 here."; 1672 } 1673 leaf description { 1674 type string; 1675 mandatory true; 1676 description 1677 "A description of the possible alarm. It SHOULD include 1678 information on possible underlying root causes and 1679 corrective actions."; 1680 } 1681 } 1682 } 1683 container summary { 1684 if-feature "alarm-summary"; 1685 config false; 1686 description 1687 "This container gives a summary of the number of alarms."; 1688 list alarm-summary { 1689 key "severity"; 1690 description 1691 "A global summary of all alarms in the system. The summary
The summary 1692 does not include shelved alarms."; 1693 leaf severity { 1694 type severity; 1695 description 1696 "Alarm summary for this severity level."; 1697 } 1698 leaf total { 1699 type yang:gauge32; 1700 description 1701 "Total number of alarms of this severity level."; 1702 } 1703 leaf cleared { 1704 type yang:gauge32; 1705 description 1706 "For this severity level, the number of alarms that are 1707 cleared."; 1708 } 1709 leaf cleared-not-closed { 1710 if-feature "operator-actions"; 1711 type yang:gauge32; 1712 description 1713 "For this severity level, the number of alarms that are 1714 cleared but not closed."; 1715 } 1716 leaf cleared-closed { 1717 if-feature "operator-actions"; 1718 type yang:gauge32; 1719 description 1720 "For this severity level, the number of alarms that are 1721 cleared and closed."; 1722 } 1723 leaf not-cleared-closed { 1724 if-feature "operator-actions"; 1725 type yang:gauge32; 1726 description 1727 "For this severity level, the number of alarms that are 1728 not cleared but closed."; 1729 } 1730 leaf not-cleared-not-closed { 1731 if-feature "operator-actions"; 1732 type yang:gauge32; 1733 description 1734 "For this severity level, the number of alarms that are 1735 not cleared and not closed."; 1736 } 1737 } 1738 leaf shelves-active { 1739 if-feature "alarm-shelving"; 1740 type empty; 1741 description 1742 "This is a hint to the operator that there are active 1743 alarm shelves. This leaf MUST exist if the 1744 alarms/shelved-alarms/number-of-shelved-alarms is > 0."; 1745 } 1746 } 1747 container alarm-list { 1748 config false; 1749 description 1750 "The alarms in the system."; 1751 leaf number-of-alarms { 1752 type yang:gauge32; 1753 description 1754 "This object shows the total number of 1755 alarms in the system, i.e., the total number 1756 of entries in the alarm list."; 1757 } 1758 leaf last-changed { 1759 type yang:date-and-time; 1760 description 1761 "A timestamp when the alarm list was last 1762 changed. 
The value can be used by a manager to 1763 initiate an alarm resynchronization procedure."; 1764 } 1765 list alarm { 1766 key "resource alarm-type-id alarm-type-qualifier"; 1767 description 1768 "The list of alarms. Each entry in the list holds one 1769 alarm for a given alarm type and resource. 1770 An alarm can be updated from the underlying resource or 1771 by the user. The following leafs are maintained by the 1772 resource: is-cleared, last-changed, perceived-severity, 1773 and alarm-text. An operator can change the operator state 1774 and the associated operator text. 1776 Entries appear in the alarm list the first time an 1777 alarm becomes active for a given alarm-type and resource. 1778 Entries do not get deleted when the alarm is cleared; clearance 1779 is a boolean state in the alarm. 1781 Alarm entries are removed (purged) from the list by an 1782 explicit purge action. For example, purge all alarms 1783 that are cleared and in closed operator-state that are 1784 older than 24 hours. Systems may also remove alarms based 1785 on locally configured policies, which is out of scope for 1786 this module."; 1787 uses common-alarm-parameters; 1788 leaf time-created { 1789 type yang:date-and-time; 1790 mandatory true; 1791 description 1792 "The time-stamp when this alarm entry was created. This 1793 represents the first time the alarm appeared; it can 1794 also represent that the alarm re-appeared after a purge. 1795 Further state-changes of the same alarm do not change 1796 this leaf; these changes update the 'last-changed' 1797 leaf."; 1798 } 1799 uses resource-alarm-parameters; 1800 list operator-state-change { 1801 if-feature "operator-actions"; 1802 key "time"; 1803 description 1804 "This list is used by operators to indicate 1805 the state of human intervention on an alarm.
                For example, if an operator has seen an alarm,
                the operator can add a new item to this list indicating
                that the alarm is acknowledged.";
             uses operator-parameters;
           }
           action set-operator-state {
             if-feature "operator-actions";
             description
               "This is a means for the operator to indicate
                the level of human intervention on an alarm.";
             input {
               leaf state {
                 type writable-operator-state;
                 mandatory true;
                 description
                   "Set this operator state.";
               }
               leaf text {
                 type string;
                 description
                   "Additional optional textual information.";
               }
             }
           }
           notification operator-action {
             if-feature "operator-actions";
             description
               "This notification is used to report that an operator
                acted upon an alarm.";
             uses operator-parameters;
           }
         }
       }
       container shelved-alarms {
         if-feature "alarm-shelving";
         config false;
         description
           "The shelved alarms. Alarms appear here if they match the
            criteria in /alarms/control/alarm-shelving. This list does
            not generate any notifications. The list represents alarms
            that are considered not relevant by the operator. Alarms in
            this list have an operator-state of 'shelved'. This cannot
            be changed.";
         leaf number-of-shelved-alarms {
           type yang:gauge32;
           description
             "This object shows the total number of currently shelved
              alarms, i.e., the total number of entries
              in the shelved alarm list.";
         }
         leaf alarm-shelf-last-changed {
           type yang:date-and-time;
           description
             "A timestamp when the shelved alarm list was last
              changed. The value can be used by a manager to
              initiate an alarm resynchronization procedure.";
         }
         list shelved-alarm {
           key "resource alarm-type-id alarm-type-qualifier";
           description
             "The list of shelved alarms.
              Shelved alarms can only be updated from the underlying
              resource; no operator actions are supported.";
           uses common-alarm-parameters;
           leaf shelf-name {
             type leafref {
               path "/alarms/control/alarm-shelving/shelf/name";
               require-instance false;
             }
             description
               "The name of the shelf.";
           }
           uses resource-alarm-parameters;
           list operator-state-change {
             if-feature "operator-actions";
             key "time";
             description
               "This list is used by operators to indicate
                the state of human intervention on an alarm.
                For shelved alarms, the system has added an entry
                to this list with the operator state 'shelved'.";
             uses operator-parameters;
           }
         }
       }
       list alarm-profile {
         if-feature "alarm-profile";
         key "alarm-type-id alarm-type-qualifier-match resource";
         ordered-by user;
         description
           "This list is used to assign further information or
            configuration for each alarm type. This module supports
            a mechanism where the client can override the system
            default alarm severity levels.
            The alarm-profile is also a useful augmentation point for
            specific additions to alarm types.";
         leaf alarm-type-id {
           type al:alarm-type-id;
           description
             "The alarm type identifier to match.";
         }
         leaf alarm-type-qualifier-match {
           type string;
           description
             "A W3C regular expression that is used to match
              the alarm type qualifier.";
         }
         leaf resource {
           type al:resource-match;
           description
             "Specifies which resources to match.";
         }
         leaf description {
           type string;
           mandatory true;
           description
             "A description of the alarm profile.";
         }
         container alarm-severity-assignment-profile {
           if-feature "severity-assignment";
           description
             "The client can override the system default
              severity level.";
           reference
             "ITU M.3100, ITU M.3160
              - Generic Network Information Model,
              Alarm Severity Assignment Profile";
           leaf-list severity-levels {
             type al:severity;
             ordered-by user;
             description
               "Specifies the configured severity level(s) for the
                matching alarm. If the alarm has several severity
                levels, the leaf-list shall be given in rising severity
                order. The original M.3100/M.3160 ASAP function only
                allows for a one-to-one mapping between alarm type and
                severity, but since the IETF alarm module supports
                stateful alarms, the mapping must allow for several
                severity levels.

                Assume a high-utilisation alarm type with two
                thresholds with the system default severity levels of
                threshold1 = warning and threshold2 = minor.
                Setting this leaf-list to (minor, major) will assign the
                severity levels threshold1 = minor and
                threshold2 = major.";
           }
         }
       }
     }

     /*
      * Operations
      */

     rpc compress-alarms {
       if-feature "alarm-history";
       description
         "This operation requests the server to compress entries in the
          alarm list by removing all but the latest state change for all
          alarms. Conditions in the input are logically ANDed. If no
          input condition is given, all alarms are compressed.";
       input {
         leaf resource {
           type leafref {
             path "/alarms/alarm-list/alarm/resource";
             require-instance false;
           }
           description
             "Compress the alarms with this resource.";
         }
         leaf alarm-type-id {
           type leafref {
             path "/alarms/alarm-list/alarm/alarm-type-id";
             require-instance false;
           }
           description
             "Compress alarms with this alarm-type-id.";
         }
         leaf alarm-type-qualifier {
           type leafref {
             path "/alarms/alarm-list/alarm/alarm-type-qualifier";
             require-instance false;
           }
           description
             "Compress the alarms with this alarm-type-qualifier.";
         }
       }
       output {
         leaf compressed-alarms {
           type uint32;
           description
             "Number of compressed alarm entries.";
         }
       }
     }
     rpc compress-shelved-alarms {
       if-feature "alarm-history and alarm-shelving";
       description
         "This operation requests the server to compress entries in the
          shelved alarm list by removing all but the latest state change
          for all alarms. Conditions in the input are logically ANDed.
          If no input condition is given, all alarms are compressed.";
       input {
         leaf resource {
           type leafref {
             path "/alarms/shelved-alarms/shelved-alarm/resource";
             require-instance false;
           }
           description
             "Compress the alarms with this resource.";
         }
         leaf alarm-type-id {
           type leafref {
             path "/alarms/shelved-alarms/shelved-alarm/alarm-type-id";
             require-instance false;
           }
           description
             "Compress alarms with this alarm-type-id.";
         }
         leaf alarm-type-qualifier {
           type leafref {
             path "/alarms/shelved-alarms/shelved-alarm"
                + "/alarm-type-qualifier";
             require-instance false;
           }
           description
             "Compress the alarms with this alarm-type-qualifier.";
         }
       }
       output {
         leaf compressed-alarms {
           type uint32;
           description
             "Number of compressed alarm entries.";
         }
       }
     }

     grouping filter-input {
       description
         "Grouping to specify a filter construct on alarm information.";
       leaf alarm-status {
         type enumeration {
           enum any {
             description
               "Ignore alarm clearance status.";
           }
           enum cleared {
             description
               "Filter cleared alarms.";
           }
           enum not-cleared {
             description
               "Filter not cleared alarms.";
           }
         }
         mandatory true;
         description
           "The clearance status of the alarm.";
       }
       container older-than {
         presence "Age specification";
         description
           "Matches the 'last-status-change' leaf in the alarm.";
         choice age-spec {
           description
             "Filter using date and time age.";
           case seconds {
             leaf seconds {
               type uint16;
               description
                 "Seconds part.";
             }
           }
           case minutes {
             leaf minutes {
               type uint16;
               description
                 "Minutes part.";
             }
           }
           case hours {
             leaf hours {
               type uint16;
               description
                 "Hours part.";
             }
           }
           case days {
             leaf days {
               type
                 uint16;
               description
                 "Days part.";
             }
           }
           case weeks {
             leaf weeks {
               type uint16;
               description
                 "Weeks part.";
             }
           }
         }
       }
       container severity {
         presence "Severity filter";
         choice sev-spec {
           description
             "Filter based on severity level.";
           leaf below {
             type severity;
             description
               "Severity level lower than this leaf.";
           }
           leaf is {
             type severity;
             description
               "Severity level equal to this leaf.";
           }
           leaf above {
             type severity;
             description
               "Severity level higher than this leaf.";
           }
         }
         description
           "Filter based on severity.";
       }
       container operator-state-filter {
         if-feature "operator-actions";
         presence "Operator state filter";
         leaf state {
           type operator-state;
           description
             "Filter on operator state.";
         }
         leaf user {
           type string;
           description
             "Filter based on which operator.";
         }
         description
           "Filter based on operator state.";
       }
     }

     rpc purge-alarms {
       description
         "This operation requests the server to delete entries from the
          alarm list or the shelved alarm list according to the
          supplied criteria. To purge alarms in the shelved alarm list,
          set the state in the operator-state-filter input to 'shelved'.
          Typically, it is used to delete alarms that are in the
          closed operator state and older than a specified time.
          In the shelved alarm list, it makes sense to delete alarms
          that are no longer relevant.
          The number of purged alarms is returned as an output
          parameter.";
       input {
         uses filter-input;
       }
       output {
         leaf purged-alarms {
           type uint32;
           description
             "Number of purged alarms.";
         }
       }
     }

     /*
      * Notifications
      */

     notification alarm-notification {
       description
         "This notification is used to report a state change for an
          alarm.
          The same notification is used for reporting a newly
          raised alarm, a cleared alarm, or a change in the text and/or
          severity of an existing alarm.";
       uses common-alarm-parameters;
       uses alarm-state-change-parameters;
     }
     notification alarm-inventory-changed {
       description
         "This notification is used to report that the list of possible
          alarms has changed. This can happen, for example, if a new
          software module is installed or a new physical card is
          inserted.";
     }
   }

   <CODE ENDS>

6.  X.733 Extensions

   Many alarm systems are based on the X.733 [X.733] and X.736 [X.736]
   alarm standards. This module augments the alarm inventory, the alarm
   lists, and the alarm notification with X.733 and X.736 parameters.

   The module also supports a feature whereby the alarm manager can
   configure the mapping from alarm types to X.733 event-type and
   probable-cause parameters. This might be needed when the default
   mapping provided by the system is in conflict with other management
   systems or is not considered correct.

   Note that the IETF Alarm Module term 'resource' is synonymous with
   the ITU term 'managed object'.

7.  The X.733 Mapping Module

   This YANG module references [X.733] and [X.736].

   <CODE BEGINS> file "ietf-alarms-x733@2018-09-20.yang"
   module ietf-alarms-x733 {
     yang-version 1.1;
     namespace "urn:ietf:params:xml:ns:yang:ietf-alarms-x733";
     prefix x733;

     import ietf-alarms {
       prefix al;
     }
     import ietf-yang-types {
       prefix yang;
       reference "RFC 6991: Common YANG Data Types";
     }

     organization
       "IETF CCAMP Working Group";
     contact
       "WG Web:
        WG List:

        Editor:   Stefan Vallin

        Editor:   Martin Bjorklund
        ";
     description
       "This module augments the ietf-alarms module with X.733 alarm
        parameters.

        The following structures are augmented with X.733 event type
        and probable cause:

        1) alarms/alarm-inventory: all possible alarm types
        2) alarms/alarm-list: every alarm in the system
        3) alarm-notification: notifications indicating alarm state
           changes

        The module also optionally allows the alarm management system
        to configure the mapping from the IETF Alarm module alarm keys
        to the ITU tuple (event-type, probable-cause).

        The mapping does not include a corresponding X.733 specific
        problem value. The recommendation is to use the
        'alarm-type-qualifier' leaf, which serves the same purpose.

        The module uses an integer and a corresponding string for
        probable cause instead of a globally defined enumeration, in
        order to be able to manage conflicting enumeration definitions.
        A single globally defined enumeration is challenging to
        maintain.";
     reference
       "ITU Recommendation X.733: Information Technology
        - Open Systems Interconnection
        - System Management: Alarm Reporting Function";

     revision 2018-09-20 {
       description
         "Initial revision.";
       reference "RFC XXXX: YANG Alarm Module";
     }

     /*
      * Features
      */

     feature configure-x733-mapping {
       description
         "The system supports configurable X.733 mapping from
          the IETF alarm module alarm-type to X.733 event-type
          and probable-cause.";
     }

     /*
      * Typedefs
      */

     typedef event-type {
       type enumeration {
         enum other {
           value 1;
           description
             "None of the below.";
         }
         enum communications-alarm {
           value 2;
           description
             "An alarm of this type is principally associated with the
              procedures and/or processes required to convey
              information from one point to another.";
         }
         enum quality-of-service-alarm {
           value 3;
           description
             "An alarm of this type is principally associated with a
              degradation in the quality
              of a service.";
         }
         enum processing-error-alarm {
           value 4;
           description
             "An alarm of this type is principally associated with a
              software or processing fault.";
         }
         enum equipment-alarm {
           value 5;
           description
             "An alarm of this type is principally associated with an
              equipment fault.";
         }
         enum environmental-alarm {
           value 6;
           description
             "An alarm of this type is principally associated with a
              condition relating to an enclosure in which the equipment
              resides.";
         }
         enum integrity-violation {
           value 7;
           description
             "An indication that information may have been illegally
              modified, inserted, or deleted.";
         }
         enum operational-violation {
           value 8;
           description
             "An indication that the provision of the requested service
              was not possible due to the unavailability, malfunction, or
              incorrect invocation of the service.";
         }
         enum physical-violation {
           value 9;
           description
             "An indication that a physical resource has been violated
              in a way that suggests a security attack.";
         }
         enum security-service-or-mechanism-violation {
           value 10;
           description
             "An indication that a security attack has been detected by
              a security service or mechanism.";
         }
         enum time-domain-violation {
           value 11;
           description
             "An indication that an event has occurred at an unexpected
              or prohibited time.";
         }
       }
       description
         "The event types as defined by X.733 and X.736.";
       reference
         "ITU Recommendation X.733: Information Technology
          - Open Systems Interconnection
          - System Management: Alarm Reporting Function
          ITU Recommendation X.736: Information Technology
          - Open Systems Interconnection
          - System Management: Security Alarm Reporting Function";
     }

     typedef trend {
       type enumeration {
         enum less-severe {
           description
             "There is at least one outstanding alarm of a
              severity higher (more severe) than that in the
              current alarm.";
         }
         enum no-change {
           description
             "The Perceived severity reported in the current
              alarm is the same as the highest (most severe)
              of any of the outstanding alarms.";
         }
         enum more-severe {
           description
             "The Perceived severity in the current alarm is
              higher (more severe) than that reported in any
              of the outstanding alarms.";
         }
       }
       description
         "This type is used to describe the
          severity trend of the alarming resource.";
       reference "Module Attribute-ASN1Module (X.721:02/1992)";
     }
     typedef value-type {
       type union {
         type int64;
         type uint64;
         type decimal64 {
           fraction-digits 2;
         }
       }
       description
         "A generic union type to match the ITU choice of integer
          and real.";
     }

     /*
      * Groupings
      */

     grouping x733-alarm-parameters {
       description
         "Common X.733 parameters for alarms.";
       leaf event-type {
         type event-type;
         description
           "The X.733/X.736 event type for this alarm.";
       }
       leaf probable-cause {
         type uint32;
         description
           "The X.733 probable cause for this alarm.";
       }
       leaf probable-cause-string {
         type string;
         description
           "The user-friendly string matching
            the probable cause integer value. The string
            SHOULD match the X.733 enumeration. For example,
            value 27 is 'localNodeTransmissionError'.";
       }
       container threshold-information {
         description
           "This parameter shall be present when the alarm
            is a result of crossing a threshold.";
         leaf triggered-threshold {
           type string;
           description
             "The identifier of the threshold attribute that
              caused the notification.";
         }
         leaf observed-value {
           type value-type;
           description
             "The value of the gauge or counter which crossed
              the threshold. This may be different from the
              threshold value if, for example, the gauge may
              only take on discrete values.";
         }
         choice threshold-level {
           description
             "In the case of a gauge, the threshold level specifies
              a pair of threshold values, the first being the value
              of the crossed threshold and the second its corresponding
              hysteresis; in the case of a counter, the threshold level
              specifies only the threshold value.";
           case up {
             leaf up-high {
               type value-type;
               description
                 "The going-up threshold for raising the alarm.";
             }
             leaf up-low {
               type value-type;
               description
                 "The threshold level for clearing the alarm.
                  This is used for hysteresis functions for gauges.";
             }
           }
           case down {
             leaf down-low {
               type value-type;
               description
                 "The going-down threshold for raising the alarm.";
             }
             leaf down-high {
               type value-type;
               description
                 "The threshold level for clearing the alarm.
                  This is used for hysteresis functions for gauges.";
             }
           }
         }
         leaf arm-time {
           type yang:date-and-time;
           description
             "For a gauge threshold, the time at which the threshold
              was last re-armed, namely the time after the previous
              threshold crossing at which the hysteresis value of the
              threshold was exceeded, thus again permitting generation
              of notifications when the threshold is crossed.
              For a counter threshold, the later of the time at which
              the threshold offset was last applied, or the time at
              which the counter was last initialized (for resettable
              counters).";
         }
       }
       list monitored-attributes {
         uses attribute;
         key "id";
         description
           "The Monitored attributes parameter, when present, defines
            one or more attributes of the resource and their
            corresponding values at the time of the alarm.";
       }
       leaf-list proposed-repair-actions {
         type string;
         description
           "This parameter, when present, is used if the cause is
            known and the system being managed can suggest one or
            more solutions (such as switch in standby equipment,
            retry, replace media).";
       }
       leaf trend-indication {
         type trend;
         description
           "This parameter specifies the current
            severity trend of the resource. If present, it
            indicates that there are one or more alarms
            ('outstanding alarms') which have not been cleared
            and pertain to the same resource as that to which
            this alarm ('current alarm') pertains.
            The possible values are:

            more-severe: The Perceived severity in the current
            alarm is higher (more severe) than that reported in
            any of the outstanding alarms.

            no-change: The Perceived severity reported in the
            current alarm is the same as the highest (most severe)
            of any of the outstanding alarms.

            less-severe: There is at least one outstanding alarm
            of a severity higher (more severe) than that in the
            current alarm.";
       }
       leaf backedup-status {
         type boolean;
         description
           "This parameter, when present, specifies whether or not
            the object emitting the alarm has been backed up, and
            services provided to the user have, therefore, not been
            disrupted.
            The use of this field in conjunction with the
            severity field provides information in an independent form
            to qualify the seriousness of the alarm and the ability of
            the system as a whole to continue to provide services.
            If the value of this parameter is true, it indicates that
            the object emitting the alarm has been backed up; if false,
            the object has not been backed up.";
       }
       leaf backup-object {
         type al:resource;
         description
           "This parameter shall be present when the Backed-up status
            parameter is present and has the value true. This parameter
            specifies the managed object instance that is providing
            back-up services for the managed object to which the
            notification pertains. This parameter is useful,
            for example, when the back-up object is from a pool of
            objects, any of which may be dynamically allocated to
            replace a faulty object.";
       }
       list additional-information {
         key "identifier";
         description
           "This parameter allows the inclusion of a
            set of additional information in the alarm.
            It is a series of data structures, each of which contains
            three items of information: an identifier, a significance
            indicator, and the problem information.";
         leaf identifier {
           type string;
           description
             "Identifies the data-type of the information parameter.";
         }
         leaf significant {
           type boolean;
           description
             "Set to true if the receiving system must be able to
              parse the contents of the information subparameter
              for the event report to be fully understood.";
         }
         leaf information {
           type string;
           description
             "Additional information about the alarm.";
         }
       }
       leaf security-alarm-detector {
         type al:resource;
         description
           "This parameter identifies the detector of the security
            alarm.";
       }
       leaf service-user {
         type al:resource;
         description
           "This parameter identifies the service-user whose request
            for service led to the generation of the security alarm.";
       }
       leaf service-provider {
         type al:resource;
         description
           "This parameter identifies the intended service-provider
            of the service that led to the generation of the security
            alarm.";
       }
       reference
         "ITU Recommendation X.733: Information Technology
          - Open Systems Interconnection
          - System Management: Alarm Reporting Function
          ITU Recommendation X.736: Information Technology
          - Open Systems Interconnection
          - System Management: Security Alarm Reporting Function";
     }

     grouping x733-alarm-definition-parameters {
       description
         "Common X.733 parameters for alarm definitions.
          This grouping is used to define those alarm
          attributes that can be mapped from the alarm-type
          mechanism in the ietf-alarms module.";
       leaf event-type {
         type event-type;
         description
           "The alarm type has this X.733/X.736 event type.";
       }
       leaf probable-cause {
         type uint32;
         description
           "The alarm type has this X.733 probable cause value.
            This module defines probable cause as an integer
            and not as an enumeration. The reason is that the
            primary use of probable cause is in the management
            application if it is based on the X.733 standard.
            However, most management applications have their own
            enum definitions, and merging enums from
            different systems might create conflicts. By using
            a configurable uint32, the system can be configured
            to match the enum values in the management application.";
       }
       leaf probable-cause-string {
         type string;
         description
           "This string can be used to give a user-friendly string
            to the probable cause value.";
       }
     }

     grouping attribute {
       description
         "A grouping to match the ITU generic reference to
          an attribute.";
       leaf id {
         type al:resource;
         description
           "The resource representing the attribute.";
       }
       leaf value {
         type string;
         description
           "The value represented as a string since it could
            be of any type.";
       }
       reference "Module Attribute-ASN1Module (X.721:02/1992)";
     }

     /*
      * Add X.733 parameters to the alarm definitions, alarms,
      * and notification.
      */

     augment "/al:alarms/al:alarm-inventory/al:alarm-type" {
       description
         "Augment X.733 mapping information to the alarm inventory.";
       uses x733-alarm-definition-parameters;
     }

     /*
      * Add X.733 configurable mapping.
      */

     augment "/al:alarms/al:control" {
       description
         "Add X.733 mapping capabilities.";
       list x733-mapping {
         if-feature "configure-x733-mapping";
         key "alarm-type-id alarm-type-qualifier-match";
         description
           "This list allows a management application to control the
            X.733 mapping for all alarm types in the system. Any entry
            in this list will allow the alarm manager to override the
            default X.733 mapping in the system, and the final mapping
            will be shown in the alarm inventory.";
         leaf alarm-type-id {
           type al:alarm-type-id;
           description
             "Map the alarm type with this alarm type identifier.";
         }
         leaf alarm-type-qualifier-match {
           type string;
           description
             "A W3C regular expression that is used when mapping an
              alarm type and alarm-type-qualifier to X.733 parameters.";
         }
         uses x733-alarm-definition-parameters;
       }
     }
     augment "/al:alarms/al:alarm-list/al:alarm" {
       description
         "Augment X.733 information to the alarm.";
       uses x733-alarm-parameters;
     }
     augment "/al:alarms/al:shelved-alarms/al:shelved-alarm" {
       description
         "Augment X.733 information to the alarm.";
       uses x733-alarm-parameters;
     }
     augment "/al:alarm-notification" {
       description
         "Augment X.733 information to the alarm notification.";
       uses x733-alarm-parameters;
     }
   }

   <CODE ENDS>

8.  IANA Considerations

   This document registers a URI in the IETF XML registry [RFC3688].
   Following the format in RFC 3688, the following registration is
   requested to be made.

      URI: urn:ietf:params:xml:ns:yang:ietf-alarms

      Registrant Contact: The IESG.

      XML: N/A, the requested URI is an XML namespace.

   This document registers a YANG module in the YANG Module Names
   registry [RFC6020].

      name:         ietf-alarms
      namespace:    urn:ietf:params:xml:ns:yang:ietf-alarms
      prefix:       al
      reference:    RFC XXXX

9.
Security Considerations

   The YANG module specified in this document defines a schema for data
   that is designed to be accessed via network management protocols such
   as NETCONF [RFC6241] or RESTCONF [RFC8040]. The lowest NETCONF layer
   is the secure transport layer, and the mandatory-to-implement secure
   transport is Secure Shell (SSH) [RFC6242]. The lowest RESTCONF layer
   is HTTPS, and the mandatory-to-implement secure transport is TLS
   [RFC5246].

   The NETCONF access control model [RFC6536] provides the means to
   restrict access for particular NETCONF or RESTCONF users to a
   preconfigured subset of all available NETCONF or RESTCONF protocol
   operations and content.

   There are a number of data nodes defined in this YANG module that are
   writable/creatable/deletable (i.e., config true, which is the
   default). These data nodes may be considered sensitive or vulnerable
   in some network environments. Write operations (e.g., edit-config)
   to these data nodes without proper protection can have a negative
   effect on network operations. These are the subtrees and data nodes
   and their sensitivity/vulnerability:

   /alarms/control/notify-status-change:  This leaf controls whether an
      alarm should notify only raise and clear or all severity level
      changes. Unauthorized access to this leaf could have a negative
      impact on operational procedures relying on fine-grained alarm
      state change reporting.

   /alarms/control/alarm-shelving/shelf:  This list controls the
      shelving (blocking) of alarms. Unauthorized access to this list
      could jeopardize the alarm management procedures, since shelved
      alarms are not notified and are not part of the alarm list.

   Some of the RPC operations in this YANG module may be considered
   sensitive or vulnerable in some network environments. It is thus
   important to control access to these operations.
   These are the operations and their sensitivity/vulnerability:

   purge-alarms:  This RPC deletes alarms from the alarm list.
      Unauthorized use of this RPC could jeopardize the alarm management
      procedures, since the deleted alarms may be vital for the alarm
      management application.

10.  Acknowledgements

   The authors wish to thank Viktor Leijon and Johan Nordlander for
   their valuable input on forming the alarm model.

   The authors also wish to thank Nick Hancock, Joey Boyd, Tom Petch and
   Balazs Lengyel for their extensive reviews and contributions to this
   document.

11.  References

11.1.  Normative References

   [M.3100]   International Telecommunication Union, "Generic Network
              Information Model", ITU-T Recommendation M.3100, 2005.

   [M.3160]   International Telecommunication Union, "Generic,
              protocol-neutral management information model", ITU-T
              Recommendation M.3160, 2008.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3688]  Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
              DOI 10.17487/RFC3688, January 2004.

   [RFC5246]  Dierks, T. and E. Rescorla, "The Transport Layer Security
              (TLS) Protocol Version 1.2", RFC 5246,
              DOI 10.17487/RFC5246, August 2008.

   [RFC6020]  Bjorklund, M., Ed., "YANG - A Data Modeling Language for
              the Network Configuration Protocol (NETCONF)", RFC 6020,
              DOI 10.17487/RFC6020, October 2010.

   [RFC6241]  Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed.,
              and A. Bierman, Ed., "Network Configuration Protocol
              (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011.

   [RFC6242]  Wasserman, M., "Using the NETCONF Protocol over Secure
              Shell (SSH)", RFC 6242, DOI 10.17487/RFC6242, June 2011.

   [RFC6991]  Schoenwaelder, J., Ed., "Common YANG Data Types",
              RFC 6991, DOI 10.17487/RFC6991, July 2013.

   [RFC7950]  Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language",
              RFC 7950, DOI 10.17487/RFC7950, August 2016.

   [RFC8040]  Bierman, A., Bjorklund, M., and K. Watsen, "RESTCONF
              Protocol", RFC 8040, DOI 10.17487/RFC8040, January 2017.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [X.733]    International Telecommunication Union, "Information
              Technology - Open Systems Interconnection - Systems
              Management: Alarm Reporting Function", ITU-T
              Recommendation X.733, 1992.

11.2.  Informative References

   [ALARMIRP]
              3GPP, "Telecommunication management; Fault Management;
              Part 2: Alarm Integration Reference Point (IRP):
              Information Service (IS)", 3GPP TS 32.111-2 3.4.0,
              March 2005.

   [ALARMSEM]
              Wallin, S., Leijon, V., Nordlander, J., and N. Bystedt,
              "The semantics of alarm definitions: enabling systematic
              reasoning about alarms", International Journal of Network
              Management, Volume 22, Issue 3, John Wiley and Sons, Ltd,
              http://dx.doi.org/10.1002/nem.800, March 2012.

   [EEMUA]    Engineering Equipment and Materials Users Association,
              "Alarm Systems: A Guide to Design, Management and
              Procurement", EEMUA Publication No. 191, 2nd edition,
              London, 2007.

   [G.7710]   ITU-T, "SERIES G: TRANSMISSION SYSTEMS AND MEDIA, DIGITAL
              SYSTEMS AND NETWORKS Data over Transport - Generic aspects
              - Transport network control aspects. Common equipment
              management function requirements", 2012.

   [ISA182]   International Society of Automation (ISA),
              "ANSI/ISA-18.2-2009 Management of Alarm Systems for the
              Process Industries", 2009.

   [RFC3877]  Chisholm, S. and D.
              Romascanu, "Alarm Management Information Base (MIB)",
              RFC 3877, DOI 10.17487/RFC3877, September 2004.

   [RFC8340]  Bjorklund, M. and L. Berger, Ed., "YANG Tree Diagrams",
              BCP 215, RFC 8340, DOI 10.17487/RFC8340, March 2018.

   [X.736]    International Telecommunications Union, "Information
              Technology - Open Systems Interconnection - Systems
              Management: Security alarm reporting function", ITU-T
              Recommendation X.736, 1992.

Appendix A. Vendor-specific Alarm-Types Example

   This example shows how to define alarm types in a vendor-specific
   module. In this case, the vendor "xyz" has chosen to define top-level
   identities according to X.733 event types.

   module example-xyz-alarms {
     namespace "urn:example:xyz-alarms";
     prefix xyz-al;

     import ietf-alarms {
       prefix al;
     }

     identity xyz-alarms {
       base al:alarm-type-id;
     }

     identity communications-alarm {
       base xyz-alarms;
     }
     identity quality-of-service-alarm {
       base xyz-alarms;
     }
     identity processing-error-alarm {
       base xyz-alarms;
     }
     identity equipment-alarm {
       base xyz-alarms;
     }
     identity environmental-alarm {
       base xyz-alarms;
     }

     // communications alarms
     identity link-alarm {
       base communications-alarm;
     }

     // QoS alarms
     identity high-jitter-alarm {
       base quality-of-service-alarm;
     }
   }

Appendix B. Alarm Inventory Example

   This example shows an alarm inventory with two alarm types: one
   defined only by its identifier, and another configured dynamically.
   In the latter case, a digital input has been connected to a smoke
   detector. Therefore, the 'alarm-type-qualifier' is set to
   "smoke-alarm" and the 'alarm-type-identity' to "environmental-alarm".
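   The identity hierarchy from Appendix A lets management reason about
   abstract alarm types. For example, it can ask for every inventory
   entry under communications-alarm. The Python sketch below
   illustrates this under stated assumptions. The parent map is copied
   by hand from the example-xyz-alarms module. Each identity is assumed
   to have exactly one base, although YANG 1.1 allows several. The
   names IDENTITY_BASE, derived_from_or_self, AlarmType, and
   types_under are hypothetical and are not part of ietf-alarms. The
   lookup loosely mirrors the YANG 1.1 XPath function
   derived-from-or-self().

```python
from dataclasses import dataclass

# Hypothetical parent map copied by hand from example-xyz-alarms;
# "al:alarm-type-id" is the base identity imported from ietf-alarms.
# Assumption: every identity has exactly one base.
IDENTITY_BASE = {
    "xyz-al:xyz-alarms": "al:alarm-type-id",
    "xyz-al:communications-alarm": "xyz-al:xyz-alarms",
    "xyz-al:quality-of-service-alarm": "xyz-al:xyz-alarms",
    "xyz-al:processing-error-alarm": "xyz-al:xyz-alarms",
    "xyz-al:equipment-alarm": "xyz-al:xyz-alarms",
    "xyz-al:environmental-alarm": "xyz-al:xyz-alarms",
    "xyz-al:link-alarm": "xyz-al:communications-alarm",
    "xyz-al:high-jitter-alarm": "xyz-al:quality-of-service-alarm",
}


def derived_from_or_self(identity: str, base: str) -> bool:
    """True if `identity` equals `base` or reaches it via the base chain."""
    current = identity
    while current is not None:
        if current == base:
            return True
        current = IDENTITY_BASE.get(current)  # None once the root is passed
    return False


@dataclass(frozen=True)
class AlarmType:
    """One inventory entry, identified by (identity, qualifier)."""
    alarm_type_id: str
    alarm_type_qualifier: str  # "" when the identity alone defines the type
    resources: tuple[str, ...]
    description: str


# The two alarm types described for this inventory example.
INVENTORY = [
    AlarmType("xyz-al:link-alarm", "", ("/dev:interfaces/dev:interface",),
              "Link failure, operational state down but admin state up"),
    AlarmType("xyz-al:environmental-alarm", "smoke-alarm", (),
              "Connected smoke detector to digital input"),
]


def types_under(base: str) -> list[AlarmType]:
    """All inventory entries whose identity is, or derives from, `base`."""
    return [t for t in INVENTORY if derived_from_or_self(t.alarm_type_id, base)]


if __name__ == "__main__":
    for t in types_under("xyz-al:xyz-alarms"):
        print(t.alarm_type_id, t.alarm_type_qualifier or "-")
```

   Querying the abstract base xyz-al:xyz-alarms returns both entries.
   Querying xyz-al:communications-alarm returns only the link alarm.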
      xyz-al:link-alarm
      /dev:interfaces/dev:interface
      true
      Link failure, operational state down but admin state up
      xyz-al:environmental-alarm
      smoke-alarm
      true
      Connected smoke detector to digital input

Appendix C. Alarm List Example

   This example shows an alarm that has toggled [major, clear, major].
   An operator has acknowledged the alarm.

      1
      2015-04-08T08:39:50.00Z
      /dev:interfaces/dev:interface[name='FastEthernet1/0']
      xyz-al:link-alarm
      2015-04-08T08:39:50.00Z
      false
      1.3.6.1.2.1.2.2.1.1.17
      2015-04-08T08:39:40.00Z
      major
      Link operationally down but administratively up
      major
      Link operationally down but administratively up
      cleared
      Link operationally up and administratively up
      major
      Link operationally down but administratively up
      ack
      joe
      Will investigate, ticket TR764999

Appendix D. Alarm Shelving Example

   This example shows how to shelve alarms. Alarms related to the smoke
   detectors are shelved, since the detectors are being installed and
   tested. All alarms from FastEthernet1/0 are also shelved.

      FE10
      /dev:interfaces/dev:interface[name='FastEthernet1/0']
      detectortest
      xyz-al:environmental-alarm
      smoke-alarm

Appendix E. X.733 Mapping Example

   This example shows how to map a dynamic alarm type
   (alarm-type-identity=environmental-alarm,
   alarm-type-qualifier=smoke-alarm) to the corresponding X.733 event
   type and probable cause parameters.

      xyz-al:environmental-alarm
      smoke-alarm
      quality-of-service-alarm
      777

Appendix F. Background and Usability Requirements

   This section gives background information regarding design choices in
   the alarm module. It also defines usability requirements for alarms.
   Alarm usability is important for an alarm interface. A data model
   helps define the format, but if the actual alarms are of low value,
   the goal of alarm management has not been achieved.

   The telecommunication domain has standardized an alarm interface in
   ITU-T X.733 [X.733]. This continued in mobile networks within the
   3GPP organization [ALARMIRP]. Although SNMP is the dominant
   mechanism for monitoring devices, the IETF did not standardize an
   alarm MIB early on. Instead, management systems interpreted the
   enterprise-specific traps per MIB and device to build an alarm list.
   When the Alarm MIB [RFC3877] was finally published, it had to address
   the existence of enterprise traps and map these into alarms. This
   requirement led to a MIB that is not always easy to use.

F.1. Alarm Concepts

   There are two misconceptions regarding alarms and alarm interfaces
   that are important to sort out. The first problem is that alarms are
   mixed with events in general. Alarms MUST correspond to an
   undesirable state that needs corrective action. Many implementations
   of alarm interfaces do not adhere to this principle and just send
   events in general. To qualify as an alarm, there must exist a
   corrective action. If there is none, it is an event that can go into
   logs.

   "One of the most important principles of alarm management is that an
   alarm requires an action. This means that if the operator does not
   need to respond to an alarm (because unacceptable consequences do not
   occur), then it is not an alarm. Following this cardinal rule will
   help eliminate many potential alarm management issues."
   [ISA182]

   The other misconception is that the term "alarm" refers to the
   notification itself. Rather, an alarm is a state of a resource in
   the system. The alarm notifications report state changes of the
   alarm, such as alarm raise and alarm clear.

F.1.1. Alarm type

   Since every alarm has a corresponding corrective action, a vendor can
   prepare a list of available alarms and their corrective actions. We
   use the term "alarm type" to refer to every possible alarm that could
   be active in the system.

   Alarm types are also fundamental for providing a state-based alarm
   list. The alarm list correlates alarm state changes for the same
   alarm type and the same resource into one alarm.

   Different alarm interfaces use different mechanisms to define alarm
   types, ranging from simple error numbers to more advanced mechanisms
   like the X.733 triplet of event type, probable cause, and specific
   problem.

   A common misunderstanding is that individual alarm notifications are
   alarm types. This is not correct; e.g., "link-up" and "link-down"
   are two notifications reporting different states for the same alarm
   type, "link-alarm".

F.2. Relationships to other alarm standards

   This section briefly describes how this alarm module relates to other
   relevant alarm standards. It covers the definition of the concept of
   an alarm and the data models of the referenced alarm standards.

F.2.1. Alarm definition

   The table below summarizes relevant definitions of the term "alarm".

   +------------+---------------------------+--------------------------+
   | Standard   | Definition                | Comment                  |
   +------------+---------------------------+--------------------------+
   | X.733      | error: A deviation of a   | The X.733 alarm          |
   | [X.733]    | system from normal        | definition is focused on |
   |            | operation. fault: The     | the notification as such |
   |            | physical or algorithmic   | and not the state. It    |
   |            | cause of a malfunction.   | also uses the basic      |
   |            | Faults manifest           | criteria of deviation    |
   |            | themselves as errors.     | from normal condition.   |
   |            | alarm: A notification, of | There is no requirement  |
   |            | the form defined by this  | for an operator action.  |
   |            | function, of a specific   |                          |
   |            | event. An alarm may or    |                          |
   |            | may not represent an      |                          |
   |            | error.                    |                          |
   |            |                           |                          |
   | G.7710     | Alarms are indications    | The G.7710 definition is |
   | [G.7710]   | that are automatically    | close to the original    |
   |            | generated by an NE as a   | X.733 definition.        |
   |            | result of the declaration |                          |
   |            | of a failure.             |                          |
   |            |                           |                          |
   | Alarm MIB  | Alarm: Persistent         | RFC 3877 defines alarm   |
   | [RFC3877]  | indication of a fault.    | referring back to "a     |
   |            | Fault: Lasting error or   | deviation from normal    |
   |            | warning condition.        | operation". This is      |
   |            | Error: A deviation of a   | problematic, since this  |
   |            | system from normal        | might not require an     |
   |            | operation.                | operator action. The     |
   |            |                           | alarm MIB is state       |
   |            |                           | oriented rather than     |
   |            |                           | notification oriented;   |
   |            |                           | an alarm is a "lasting   |
   |            |                           | condition", not a        |
   |            |                           | discrete notification    |
   |            |                           | reporting about a        |
   |            |                           | condition state change.  |
   |            |                           |                          |
   | ISA        | Alarm: An audible and/or  | The ISA standard adds an |
   | [ISA182]   | visible means of          | important requirement to |
   |            | indicating to the         | the "deviation from      |
   |            | operator an equipment     | normal condition state"; |
   |            | malfunction, process      | requiring a response.    |
   |            | deviation or abnormal     |                          |
   |            | condition requiring a     |                          |
   |            | response.                 |                          |
   |            |                           |                          |
   | EEMUA      | An alarm is an event to   | This is the foundation   |
   | [EEMUA]    | which an operator must    | for the definition of    |
   |            | knowingly react, respond, | alarm in this document.  |
   |            | and acknowledge - not     | It focuses on the core   |
   |            | simply acknowledge and    | criteria that an action  |
   |            | ignore.                   | is really needed.        |
   |            |                           |                          |
   | 3GPP Alarm | 3GPP v15: An alarm        | The latest 3GPP Alarm    |
   | IRP        | signifies an undesired    | IRP version uses         |
   | [ALARMIRP] | condition of a resource   | literally the same alarm |
   |            | (e.g. network element,    | definition as this alarm |
   |            | link) for which an        | module. It is worth      |
   |            | operator action is        | noting that earlier      |
   |            | required. It emphasizes a | versions used a          |
   |            | key requirement that      | definition not requiring |
   |            | operators [...] should    | an operator action and   |
   |            | not be informed about an  | the broader              |
   |            | undesired condition       | definition of deviation  |
   |            | unless it requires        | from normal condition.   |
   |            | operator action. 3GPP     | The earlier version also |
   |            | v12: alarm: abnormal      | defined an alarm as a    |
   |            | network entity condition, | special case of "event". |
   |            | which categorizes an      |                          |
   |            | event as a fault. fault:  |                          |
   |            | a deviation of a system   |                          |
   |            | from normal operation,    |                          |
   |            | which may result in the   |                          |
   |            | loss of operational       |                          |
   |            | capabilities [...]        |                          |
   +------------+---------------------------+--------------------------+

                 Table 1: Definition of alarm in standards

   The definition of alarm has evolved from a focus on events reporting
   a deviation from normal operation towards a definition of an
   undesired *state* that *requires an operator action*.

F.2.2. Data model

   This section describes how this YANG alarm module relates to other
   standard data models.
   Only data models for alarm interfaces are covered here, not
   other standards such as SDO-specific alarms.

F.2.2.1. X.733

   X.733 has acted as a base for several alarm data models over the
   years. The YANG alarm module differs in the following ways:

      X.733 models the alarm list as a list of notifications. The YANG
      alarm module defines the alarm list as the current alarm states
      for the resources; this list is generated from the state-change
      reporting notifications.

      In X.733, an alarm can have the severity level clear. In the YANG
      alarm module, "clear" is not a severity level; it is a separate
      state of the alarm. For example, an alarm can have the states
      (major, cleared) or (minor, not cleared).

      X.733 uses a flat, globally defined enumerated "probable cause" to
      identify alarm types. This alarm module uses a hierarchical YANG
      identity, alarm-type. This enables delegation of alarm types
      within organizations. It also lets management reason about
      "abstract" alarm types corresponding to base identities; see
      Section 3.2.

      The YANG alarm module has not included the majority of the X.733
      alarm attributes. Rather, these are defined in an augmenting
      module if "strict" X.733 compliance is needed.

F.2.2.2. RFC3877, the Alarm MIB

   The MIB in RFC3877 takes a different approach. Rather than defining
   a concrete data model for alarms, it defines a model to map existing
   SNMP managed objects and notifications into alarm states and alarm
   notifications. This was necessary since MIBs were already defined
   with both managed objects and notifications indicating alarms, for
   example, linkUp and linkDown notifications in combination with
   ifAdminStatus and ifOperStatus. RFC3877 therefore cannot really be
   compared to the alarm YANG module in that sense.

   The Alarm MIB maps existing MIB definitions into alarms using the
   alarmModelTable.
   An advantage of this approach is that an SNMP manager can read the
   possible alarm types at runtime. This corresponds to the
   alarmInventory in the alarm YANG module.

F.2.2.3. 3GPP Alarm IRP

   The 3GPP Alarm IRP is an evolution of X.733. The main differences
   between the alarm YANG module and 3GPP are:

      3GPP keeps the majority of the X.733 attributes; the alarm YANG
      module does not.

      3GPP introduced overlapping and possibly conflicting keys for
      alarms: alarmId and (managed object, event type, probable cause,
      specific problem). See Annex C in [X.733], Example 3. In the YANG
      alarm module, the key for identifying an alarm instance is clearly
      defined by (resource, alarm-type, alarm-type-qualifier). See also
      Section 3.4 for more information.

      The alarm YANG module clearly separates the
      resource/instrumentation life cycle from the operator life cycle.
      3GPP allows operators to set the alarm severity to clear; this
      module does not allow that. Rather, an operator closes an alarm,
      which does not affect the severity.

F.2.2.4. G.7710

   G.7710 is different from the previously referenced alarm standards.
   It does not define a data model for alarm reporting. Instead, it
   defines common equipment management function requirements, including
   alarm instrumentation. The scope is transport networks.

   The requirements in G.7710 correspond to features in the alarm YANG
   module in the following way:

      Alarm Severity Assignment Profile (ASAP): the alarm profile
      "/alarms/alarm-profile/".

      Alarm Reporting Control (ARC): alarm shelving
      "/alarms/control/alarm-shelving/" and the ability to control
      alarm notifications "/alarms/control/notify-status-changes".

F.3. Usability Requirements

   Common alarm problems and their causes are summarized in Table 2.
   This summary is adapted to networking from the ISA [ISA182] and
   EEMUA [EEMUA] standards.
   +------------------+--------------------------------+---------------+
   | Problem          | Cause                          | How this      |
   |                  |                                | module        |
   |                  |                                | addresses the |
   |                  |                                | cause         |
   +------------------+--------------------------------+---------------+
   | Alarms are       | "Nuisance" alarms (chattering | Strict        |
   | generated but    | alarms and fleeting alarms),   | definition of |
   | they are ignored | faulty hardware, redundant     | alarms        |
   | by the operator. | alarms, cascading alarms,      | requiring     |
   |                  | incorrect alarm settings,      | corrective    |
   |                  | alarms have not been           | response.     |
   |                  | rationalized, the alarms       | Alarm         |
   |                  | represent log information      | requirements  |
   |                  | rather than true alarms.       | in Table 3.   |
   |                  |                                |               |
   | When alarms      | Insufficient alarm response    | The alarm     |
   | occur, operators | procedures and not well        | inventory     |
   | do not know how  | defined alarm types.           | lists all     |
   | to respond.      |                                | alarm types   |
   |                  |                                | and           |
   |                  |                                | corrective    |
   |                  |                                | actions.      |
   |                  |                                | Alarm         |
   |                  |                                | requirements  |
   |                  |                                | in Table 3.   |
   |                  |                                |               |
   | The alarm        | Nuisance alarms, stale alarms, | The alarm     |
   | display is full  | alarms from equipment not in   | definition    |
   | of alarms, even  | service.                       | and alarm     |
   | when there is    |                                | shelving.     |
   | nothing wrong.   |                                |               |
   |                  |                                |               |
   | During a         | Incorrect prioritization of    | State-based   |
   | failure,         | alarms. Not using advanced     | alarm model,  |
   | operators are    | alarm techniques (e.g. state-  | alarm rate    |
   | flooded with so  | based alarming).               | requirements  |
   | many alarms that |                                | in Table 4    |
   | they do not know |                                | and Table 5   |
   | which ones are   |                                |               |
   | the most         |                                |               |
   | important.       |                                |               |
   +------------------+--------------------------------+---------------+

                   Table 2: Alarm Problems and Causes

   Based on the above problems, EEMUA gives the following definition of
   a good alarm:

   +----------------+--------------------------------------------------+
   | Characteristic | Explanation                                      |
   +----------------+--------------------------------------------------+
   | Relevant       | Not spurious or of low operational value.        |
   |                |                                                  |
   | Unique         | Not duplicating another alarm.                   |
   |                |                                                  |
   | Timely         | Not long before any response is needed or too    |
   |                | late to do anything.                             |
   |                |                                                  |
   | Prioritized    | Indicating the importance that the operator      |
   |                | deals with the problem.                          |
   |                |                                                  |
   | Understandable | Having a message which is clear and easy to      |
   |                | understand.                                      |
   |                |                                                  |
   | Diagnostic     | Identifying the problem that has occurred.       |
   |                |                                                  |
   | Advisory       | Indicative of the action to be taken.            |
   |                |                                                  |
   | Focusing       | Drawing attention to the most important issues.  |
   +----------------+--------------------------------------------------+

                  Table 3: Definition of a Good Alarm

   Vendors SHOULD rationalize all alarms according to the above.
   Another crucial requirement is acceptable alarm notification rates.
   Vendors SHOULD make sure that they do not exceed the recommendations
   from EEMUA below:

   +-----------------------------------+-------------------------------+
   | Long Term Alarm Rate in Steady    | Acceptability                 |
   | Operation                         |                               |
   +-----------------------------------+-------------------------------+
   | More than one per minute          | Very likely to be             |
   |                                   | unacceptable.                 |
   |                                   |                               |
   | One per 2 minutes                 | Likely to be over-demanding.  |
   |                                   |                               |
   | One per 5 minutes                 | Manageable.                   |
   |                                   |                               |
   | Less than one per 10 minutes      | Very likely to be acceptable. |
   +-----------------------------------+-------------------------------+

             Table 4: Acceptable Alarm Rates, Steady State

   +----------------------------+--------------------------------------+
   | Number of alarms displayed | Acceptability                        |
   | in 10 minutes following a  |                                      |
   | major network problem      |                                      |
   +----------------------------+--------------------------------------+
   | More than 100              | Definitely excessive and very likely |
   |                            | to lead the operator to abandon      |
   |                            | the use of the alarm system.         |
   |                            |                                      |
   | 20-100                     | Hard to cope with.                   |
   |                            |                                      |
   | Under 10                   | Should be manageable - but may be    |
   |                            | difficult if several of the alarms   |
   |                            | require a complex operator response. |
   +----------------------------+--------------------------------------+

                 Table 5: Acceptable Alarm Rates, Burst

   The numbers in Table 4 and Table 5 are the sum of all alarms for a
   network being managed from one alarm console. Every individual
   system or NMS therefore contributes to these numbers.

   Vendors SHOULD make sure that the following rules are used in
   designing the alarm interface:

   1.  Rationalize the alarms in the system to ensure that every alarm
       is necessary, has a purpose, and follows the cardinal rule that
       it requires an operator response. Ensure that the alarms adhere
       to the rules of Table 3.

   2.  Audit the quality of the alarms. Talk with the operators about
       how well the alarm information supports them. Do they know what
       to do in the event of an alarm? Are they able to quickly
       diagnose the problem and determine the corrective action? Does
       the alarm text adhere to the requirements in Table 3?

   3.  Analyze and benchmark the performance of the system and compare
       it to the recommended metrics in Table 4 and Table 5. Start by
       identifying nuisance alarms, standing alarms at normal state and
       startup.
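   Rule 3 asks vendors to benchmark alarm rates against Table 4 and
   Table 5. The Python sketch below shows one way such a check could
   look. It is not part of any standard. The tables list only reference
   points, so the band boundaries used here are assumptions; for
   example, everything from one alarm per 10 minutes up to one per 2
   minutes is treated as "manageable". Burst counts of 10-19, which
   Table 5 does not cover, are reported as such. Counts are summed over
   all sources feeding one console, as stated above. All function names
   and source names are hypothetical.

```python
def console_total(counts_per_source: dict[str, int]) -> int:
    """Tables 4 and 5 apply to the sum over all systems/NMSs feeding one console."""
    return sum(counts_per_source.values())


def steady_state_acceptability(alarms: int, minutes: float) -> str:
    """Classify a long-term alarm rate against Table 4 (band edges are assumed)."""
    if minutes <= 0:
        raise ValueError("observation window must be positive")
    rate = alarms / minutes  # alarms per minute
    if rate > 1.0:           # more than one per minute
        return "very likely to be unacceptable"
    if rate >= 0.5:          # one per 2 minutes up to one per minute
        return "likely to be over-demanding"
    if rate >= 0.1:          # assumed band that contains one per 5 minutes
        return "manageable"
    return "very likely to be acceptable"  # less than one per 10 minutes


def burst_acceptability(alarms_in_10_min: int) -> str:
    """Classify the count displayed in the 10 minutes after a major problem (Table 5)."""
    if alarms_in_10_min > 100:
        return "definitely excessive"
    if alarms_in_10_min >= 20:
        return "hard to cope with"
    if alarms_in_10_min < 10:
        return "should be manageable"
    return "not covered by Table 5"  # 10-19 falls between the table's rows


if __name__ == "__main__":
    # Hypothetical burst counts from two sources feeding the same console.
    burst = console_total({"router-a": 40, "nms-1": 70})
    print(burst, burst_acceptability(burst))
    print(steady_state_acceptability(alarms=30, minutes=60))
```

   Neither source alone exceeds 100 alarms, but the console total of 110
   falls in the "definitely excessive" row. A steady 30 alarms per hour
   (one per 2 minutes) is classified as likely to be over-demanding.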
Authors' Addresses

   Stefan Vallin
   Stefan Vallin AB

   Email: stefan@wallan.se


   Martin Bjorklund
   Cisco

   Email: mbj@tail-f.com