Network Working Group                                          S. Vallin
Internet-Draft                                          Stefan Vallin AB
Intended status: Standards Track                            M. Bjorklund
Expires: April 12, 2019                                            Cisco
                                                         October 9, 2018

                           YANG Alarm Module
                    draft-ietf-ccamp-alarm-module-04

Abstract

   This document defines a YANG module for alarm management. It
   includes functions for alarm list management, alarm shelving and
   notifications to inform management systems. There are also RPCs to
   manage the operator state of an alarm and administrative alarm
   procedures. The module carefully maps to relevant alarm standards.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 12, 2019.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1. Introduction
     1.1. Terminology and Notation
   2. Objectives
   3. Alarm Module Concepts
     3.1. Alarm Definition
     3.2. Alarm Type
     3.3. Identifying the Alarming Resource
     3.4. Identifying Alarm Instances
     3.5. Alarm Life-Cycle
       3.5.1. Resource Alarm Life-Cycle
       3.5.2. Operator Alarm Life-cycle
       3.5.3. Administrative Alarm Life-Cycle
     3.6. Root Cause, Impacted Resources and Related Alarms
     3.7. Alarm Shelving
     3.8. Alarm Profiles
   4. Alarm Data Model
     4.1. Alarm Control
       4.1.1. Alarm Shelving
     4.2. Alarm Inventory
     4.3. Alarm Summary
     4.4. The Alarm List
     4.5. The Shelved Alarms List
     4.6. Alarm Profiles
     4.7. RPCs and Actions
     4.8. Notifications
   5. Alarm YANG Module
   6. X.733 Extensions
   7. The X.733 Mapping Module
   8. IANA Considerations
   9. Security Considerations
   10. Acknowledgements
   11. References
     11.1. Normative References
     11.2. Informative References
   Appendix A. Vendor-specific Alarm-Types Example
   Appendix B. Alarm Inventory Example
   Appendix C. Alarm List Example
   Appendix D. Alarm Shelving Example
   Appendix E. X.733 Mapping Example
   Appendix F. Relationships to other standards
     F.1. Relationship to RFC 8348
     F.2. Relationship to other alarm standards
       F.2.1. Alarm definition
       F.2.2. Data model
   Appendix G. Alarm Usability Requirements
   Authors' Addresses

1. Introduction

   This document defines a YANG [RFC7950] module for alarm management.
   The purpose is to define a standardized alarm interface for network
   devices that can be easily integrated into management applications.
   The model is also applicable as a northbound alarm interface in the
   management applications.

   Alarm monitoring is a fundamental part of monitoring the network.
   Raw alarms from devices do not always tell the status of the network
   services or necessarily point to the root cause. However, being able
   to feed alarms to the alarm management application in a standardized
   format is a starting point for performing higher level network
   assurance tasks.

   The design of the module is based on experience from using and
   implementing available alarm standards from ITU [X.733], 3GPP
   [ALARMIRP] and ANSI [ISA182].

1.1. Terminology and Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   The following terms are defined in [RFC7950]:

   o  action

   o  client

   o  data tree

   o  RPC

   o  server

   The following terms are used within this document:

   o  Alarm (the general concept): An alarm signifies an undesirable
      state in a resource that requires corrective action.

   o  Alarm Type: An alarm type identifies a possible unique alarm state
      for a resource. Alarm types are names that identify the state,
      such as "link-alarm", "jitter-violation", and
      "high-disk-utilization".

   o  Resource: A fine-grained identification of the alarming resource,
      for example: an interface, a process.

   o  Alarm Instance: The alarm state for a specific resource and alarm
      type. For example (GigabitEthernet0/15, link-alarm). An entry in
      the alarm list.

   o  Alarm Inventory: A list of all possible alarm types on a system.

   o  Alarm Shelving: Blocking alarms according to specific criteria.

   o  Corrective Action: An action taken by an operator or automation
      routine in order to minimize the impact of the alarm or to resolve
      the root cause.

   o  Management System: The alarm management application that consumes
      the alarms, i.e., acts as a client.

   o  System: The system that implements this YANG alarm module, i.e.,
      acts as a server. This corresponds to a network device or a
      management application that provides a north-bound alarm
      interface.

   Tree diagrams used in this document follow the notation defined in
   [RFC8340].

2. Objectives

   The objectives for the design of the Alarm Module are:

   o  Simple to use.
      If a system supports this module, it shall be straightforward to
      integrate this into a YANG-based alarm manager.

   o  View alarms as states on resources and not as discrete
      notifications.

   o  Clear definition of "alarm" in order to exclude general events
      that should not be forwarded as alarm notifications.

   o  Clear and precise identification of alarm types and alarm
      instances.

   o  A management system should be able to pull all available alarm
      types from a system, i.e., read the alarm inventory from a system.
      This makes it possible to prepare alarm operators with
      corresponding alarm instructions.

   o  Address alarm usability requirements, see Appendix G. While the
      IETF has not really addressed alarm management, telecom standards
      have addressed it purely from a protocol perspective. The process
      industry has published several relevant standards addressing
      requirements for a useful alarm interface: [EEMUA], [ISA182].
      This alarm module defines usability requirements as well as a YANG
      data model.

   o  Mapping to X.733, which is a requirement for some alarm systems.
      Still, keep some of the X.733 concepts out of the core model in
      order to make the model small and easy to understand.

3. Alarm Module Concepts

   This section defines the fundamental concepts behind the data model.
   This section is rooted in the works of Vallin et al. [ALARMSEM].

3.1. Alarm Definition

   An alarm signifies an undesirable state in a resource that requires
   corrective action.

   There are two main things to remember from this definition:

   1. The definition focuses on leaving out events and logging
      information in general. Alarms should only be used for undesired
      states that require action.

   2. The definition also focuses on alarms as a state on a resource,
      not the notifications that report the state changes.

   See Appendix F for information on how this definition relates to
   other alarm standards.

3.2. Alarm Type

   This document defines an alarm type with an alarm type id and an
   alarm type qualifier.

   The alarm type id is modeled as a YANG identity. With YANG
   identities, new alarm types can be defined in a distributed fashion.
   YANG identities are hierarchical, which means that a hierarchy of
   alarm types can be defined.

   Standards and vendors should define their own alarm type identities
   based on this definition.

   The use of YANG identities means that all possible alarms are
   identified at design time. This explicit declaration of alarm types
   makes it easier to allow for alarm qualification reviews and
   preparation of alarm actions and documentation.

   There are occasions where the alarm types are not known at design
   time. For example, a system with digital inputs that allows users to
   connect detectors (e.g., smoke detector) to the inputs. In this
   case, a configuration action declares that certain connectors are,
   for example, fire alarms.

   In order to allow for dynamic addition of alarm types the alarm
   module allows for further qualification of the identity based alarm
   type using a string. A potential drawback of this is that there is a
   big risk that alarm operators will receive alarm types as a surprise:
   they do not know how to resolve the problem since a defined alarm
   procedure does not necessarily exist. To avoid this risk the system
   MUST publish all possible alarm types in the alarm inventory, see
   Section 4.2.

   A vendor or standards organization can define their own alarm-type
   hierarchy.
   The example below shows a hierarchy based on X.733 event types:

     import ietf-alarms {
       prefix al;
     }
     identity vendor-alarms {
       base al:alarm-type;
     }
     identity communications-alarm {
       base vendor-alarms;
     }
     identity link-alarm {
       base communications-alarm;
     }

   Alarm types can be abstract. An abstract alarm type is used as a
   base for defining hierarchical alarm types. Concrete alarm types are
   used for alarm states and appear in the alarm inventory. There are
   two kinds of concrete alarm types:

   1. The last subordinate identity in the "alarm-type-id" hierarchy is
      concrete, for example:
      "alarm-identity.environmental-alarm.smoke". In this example
      "alarm-identity" and "environmental-alarm" are abstract YANG
      identities, whereas "smoke" is a concrete YANG identity.

   2. The YANG identity hierarchy is abstract and the concrete alarm
      type is defined by the dynamic alarm qualifier string, for
      example: "alarm-identity.environmental-alarm.external-detector"
      with alarm-type-qualifier "smoke".

   For example:

     // Alternative 1: concrete alarm type identity
     import ietf-alarms {
       prefix al;
     }
     identity environmental-alarm {
       base al:alarm-type;
       description "Abstract alarm type";
     }
     identity smoke {
       base environmental-alarm;
       description "Concrete alarm type";
     }

     // Alternative 2: concrete alarm type qualifier
     import ietf-alarms {
       prefix al;
     }
     identity environmental-alarm {
       base al:alarm-type;
       description "Abstract alarm type";
     }
     identity external-detector {
       base environmental-alarm;
       description
         "Abstract alarm type, a run-time configuration
          procedure sets the type of alarm detected. This will
          be reported in the alarm-type-qualifier.";
     }

   A server SHOULD strive to minimize the number of dynamically defined
   alarm types.

3.3. Identifying the Alarming Resource

   It is of vital importance to be able to refer to the alarming
   resource. This reference must be as fine-grained as possible. If
   the alarming resource exists in the data tree then an
   instance-identifier MUST be used with the full path to the object.

   When the module is used in a controller/orchestrator/manager the
   original device resource identification can be modified to include
   the device in the path. The details depend on how devices are
   identified, and are out of scope for this specification.

   Example:

      The original device alarm might identify the resource as
      "/dev:interfaces/dev:interface[dev:name='FastEthernet1/0']".

      The resource identification in the manager could look something
      like: "/mgr:devices/mgr:device[mgr:name='xyz123']/dev:interfaces/
      dev:interface[dev:name='FastEthernet1/0']"

   This module also allows for alternate naming of the alarming resource
   if it is not available in the data tree.

3.4. Identifying Alarm Instances

   A primary goal of this alarm module is to remove any ambiguity in how
   alarm notifications are mapped to an update of an alarm instance.
   X.733 and especially 3GPP were not really clear on this point. This
   YANG alarm module states that the tuple (resource, alarm type
   identifier, alarm type qualifier) corresponds to a single alarm
   instance. This means that alarm notifications for the same resource
   and same alarm type are matched to update the same alarm instance.
   These three leafs are therefore used as the key in the alarm list:

     list alarm {
       key "resource alarm-type-id alarm-type-qualifier";
       ...
     }

3.5. Alarm Life-Cycle

   The alarm model clearly separates the resource alarm life-cycle from
   the operator and administrative life-cycles of an alarm.

   o  resource alarm life-cycle: the alarm instrumentation that controls
      alarm raise, clearance, and severity changes.

   o  operator alarm life-cycle: operators acting upon alarms with
      actions like acknowledgment and closing. Closing an alarm implies
      that the operator considers the corrective action performed.
      Operators can also shelve (block/filter) alarms in order to avoid
      nuisance alarms.

   o  administrative alarm life-cycle: purging (deleting) unwanted
      alarms and compressing the alarm status change list. This module
      exposes operations to manage the administrative life-cycle. The
      server may also perform these operations based on other policies,
      but how that is done is out of scope for this document.

   A server SHOULD describe how long it retains cleared/closed alarms:
   until manually purged or according to an automatic removal policy.

3.5.1. Resource Alarm Life-Cycle

   From a resource perspective, an alarm can for example have the
   following life-cycle: raise, change severity, change severity, clear,
   raise again, etc. All of these status changes can have different
   alarm texts generated by the instrumentation. Two important things
   to note:

   1. Alarms are not deleted when they are cleared. Deleting alarms is
      an administrative process. The alarm module defines an rpc
      "purge-alarms" that deletes alarms.

   2. Alarms are not cleared by operators, only the underlying
      instrumentation can clear an alarm. Operators can close alarms.

   The YANG tree representation below illustrates the resource oriented
   life-cycle:

     +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
        ...
        +--ro is-cleared            boolean
        +--ro last-changed          yang:date-and-time
        +--ro perceived-severity    severity
        +--ro alarm-text            alarm-text
        +--ro status-change* [time]
           +--ro time                  yang:date-and-time
           +--ro perceived-severity    severity-with-clear
           +--ro alarm-text            alarm-text

   For every status change from the resource perspective a row is added
   to the "status-change" list.
   The last status values are also represented as leafs for the alarm.
   Note well that the alarm severity does not include "cleared"; alarm
   clearance is a boolean flag.

   An alarm can therefore look like this:
   ((GigabitEthernet0/25, link-alarm, ""), false, T, major,
   "Interface GigabitEthernet0/25 down")

3.5.2. Operator Alarm Life-cycle

   Operators can also act upon alarms using the set-operator-state
   action:

     +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
        ...
        +--ro operator-state-change* [time] {operator-actions}?
        |  +--ro time        yang:date-and-time
        |  +--ro operator    string
        |  +--ro state       operator-state
        |  +--ro text?       string
        +---x set-operator-state {operator-actions}?
           +---w input
              +---w state    writable-operator-state
              +---w text?    string

   The operator state for an alarm can be: "none", "ack", "shelved", and
   "closed". Alarm deletion (using the rpc "purge-alarms") can use
   this state as a criterion. A closed alarm is an alarm where the
   operator has performed any required corrective actions. Closed
   alarms are good candidates for being purged.

3.5.3. Administrative Alarm Life-Cycle

   Deleting alarms from the alarm list is considered an administrative
   action. This is supported by the "purge-alarms" rpc. The
   "purge-alarms" rpc takes a filter as input. The filter selects
   alarms based on the operator and resource life-cycle such as "all
   closed cleared alarms older than a time specification". The server
   may also perform these operations based on other policies, but how
   that is done is out of scope for this document.

   Alarms can be compressed. Compressing an alarm deletes all entries
   in the alarm's "status-change" list except for the last status
   change. A client can perform this using the "compress-alarms" rpc.
   The server may also perform these operations based on other policies,
   but how that is done is out of scope for this document.
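
   The purge and compress semantics described in this section can be
   illustrated with a minimal, non-normative Python sketch. It is not
   part of the alarm module. The names AlarmList, report, compress and
   purge are hypothetical, the operator state is reduced to a single
   field, and the purge filter is fixed to "cleared, closed, and older
   than a given time", which is only one of the filters a real server
   might accept.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical key type: (resource, alarm-type-id, alarm-type-qualifier).
AlarmKey = Tuple[str, str, str]


@dataclass
class StatusChange:
    time: datetime
    perceived_severity: str  # "cleared" is allowed in the history
    alarm_text: str


@dataclass
class Alarm:
    is_cleared: bool = False
    operator_state: str = "none"  # simplified: latest operator state only
    status_change: List[StatusChange] = field(default_factory=list)


class AlarmList:
    """Illustrative in-memory alarm list; an assumption, not a real API."""

    def __init__(self) -> None:
        self.alarms: Dict[AlarmKey, Alarm] = {}

    def report(self, key: AlarmKey, time: datetime,
               severity: str, text: str) -> None:
        """Record a resource status change; the same key updates one alarm."""
        alarm = self.alarms.setdefault(key, Alarm())
        alarm.status_change.append(StatusChange(time, severity, text))
        alarm.is_cleared = severity == "cleared"

    def compress(self) -> int:
        """Drop all but the last status change; return alarms compressed."""
        compressed = 0
        for alarm in self.alarms.values():
            if len(alarm.status_change) > 1:
                del alarm.status_change[:-1]
                compressed += 1
        return compressed

    def purge(self, older_than: datetime) -> int:
        """Delete cleared, closed alarms last changed before older_than."""
        doomed = [
            key for key, alarm in self.alarms.items()
            if alarm.is_cleared
            and alarm.operator_state == "closed"
            and alarm.status_change[-1].time < older_than
        ]
        for key in doomed:
            del self.alarms[key]
        return len(doomed)


# Example: raise and clear one alarm, close it, then compress and purge.
t0 = datetime(2018, 10, 9, 12, 0)
alarm_list = AlarmList()
key = ("GigabitEthernet0/25", "link-alarm", "")
alarm_list.report(key, t0, "major", "Interface down")
alarm_list.report(key, t0 + timedelta(minutes=5), "cleared", "Interface up")
alarm_list.alarms[key].operator_state = "closed"
alarm_list.compress()
alarm_list.purge(t0 + timedelta(hours=1))
```

   In this sketch, clearing an alarm never removes it from the list.
   Only the purge step does, and only once the alarm is both cleared by
   the instrumentation and closed by an operator.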

3.6. Root Cause, Impacted Resources and Related Alarms

   The general principle of this alarm module is to limit the number of
   alarms. The alarm has two leaf-lists to identify possible impacted
   resources and possible root-cause resources. The system should not
   represent individual alarms for the possible root-cause resources and
   impacted resources. These serve as hints only. It is up to the
   client application to use this information to present the overall
   status.

   A system should always strive to identify the resource that can be
   acted upon as the "resource" leaf. The "impacted-resource" leaf-list
   shall be used to identify any side-effects of the alarm. The
   impacted resources cannot be acted upon to fix the problem. An
   example of this kind of alarm might be a disk-full problem which
   impacts a number of databases.

   On some occasions the system might not be capable of detecting the
   root cause, that is, the resource that can be acted upon. The
   instrumentation in this case only monitors the side-effect and needs
   to represent an alarm that indicates a situation that needs acting
   upon. The instrumentation still might identify possible candidates
   for the root-cause resource. In this case the "root-cause-resource"
   leaf-list can be used to indicate the candidate root-cause resources.
   An example of this kind of alarm might be an active test tool that
   detects an SLA violation on a VPN connection and identifies the
   devices along the chain as candidate root causes.

   The alarm module also supports a way to associate different alarms to
   each other with the "related-alarm" list. This list enables the
   server to inform the client that certain alarms are related to other
   alarms.

   Note well that this module does not prescribe any dependencies or
   preference between the above alarm correlation mechanisms.
   Different systems have different capabilities and the above described
   mechanisms are available to support the instrumentation features.

3.7. Alarm Shelving

   Alarm shelving is an important function in order for alarm management
   applications and operators to stop superfluous alarms. Shelving
   implies that any alarms fulfilling the shelving criteria are ignored
   (blocked/filtered). Shelved alarms appear in a dedicated shelved
   alarm list in order not to disturb the relevant alarms. Shelved
   alarms do not generate notifications.

3.8. Alarm Profiles

   Alarm profiles are used to configure further information for an alarm
   type. This module supports configuring severity levels overriding
   the system default levels. This corresponds to the Alarm Assignment
   Profile, ASAP, functionality in M.3100 [M.3100] and M.3160 [M.3160].
   Other standard or enterprise modules can augment this list with
   further alarm type information.

4. Alarm Data Model

   The fundamental parts of the data model are the "alarm-list" with
   associated notifications and the "alarm-inventory" list of all
   possible alarm types. These MUST be implemented by a system. The
   rest of the data model is made conditional with the YANG features
   "operator-actions", "alarm-shelving", "alarm-history",
   "alarm-summary", "alarm-profile", and "severity-assignment".

   The data model has the following overall structure:

     +--rw control
     |  +--rw max-alarm-status-changes?   union
     |  +--rw (notify-status-changes)?
     |  |     ...
     |  +--rw alarm-shelving {alarm-shelving}?
     |        ...
     +--ro alarm-inventory
     |  +--ro alarm-type* [alarm-type-id alarm-type-qualifier]
     |        ...
     +--ro summary {alarm-summary}?
     |  +--ro alarm-summary* [severity]
     |  |     ...
     |  +--ro shelves-active?   empty {alarm-shelving}?
     +--ro alarm-list
     |  +--ro number-of-alarms?   yang:gauge32
     |  +--ro last-changed?       yang:date-and-time
     |  +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
     |        ...
     +--ro shelved-alarms {alarm-shelving}?
     |  +--ro number-of-shelved-alarms?   yang:gauge32
     |  +--ro alarm-shelf-last-changed?   yang:date-and-time
     |  +--ro shelved-alarm*
     |          [resource alarm-type-id alarm-type-qualifier]
     |        ...
     +--rw alarm-profile*
             [alarm-type-id alarm-type-qualifier-match resource]
             {alarm-profile}?
        +--rw alarm-type-id                 al:alarm-type-id
        +--rw alarm-type-qualifier-match    string
        +--rw resource                      al:resource-match
        +--rw description                   string
        +--rw alarm-severity-assignment-profile
                {severity-assignment}?
              ...

4.1. Alarm Control

   The "/alarms/control/notify-status-changes" choice controls whether
   notifications are sent for all state changes, only raise and clear,
   or only notifications more severe than a configured level. This
   feature in combination with alarm shelving corresponds to the ITU
   Alarm Report Control functionality.

   Every alarm has a list of status changes; this is a circular list.
   The length of this list is controlled by
   "/alarms/control/max-alarm-status-changes".

4.1.1. Alarm Shelving

   The shelving control tree is shown below:

     +--rw control
        +--rw alarm-shelving {alarm-shelving}?
           +--rw shelf* [name]
              +--rw name                          string
              +--rw resource*                     resource-match
              +--rw alarm-type-id?                alarm-type-id
              +--rw alarm-type-qualifier-match?   string
              +--rw description?                  string

   Shelved alarms are shown in a dedicated shelved alarm list. The
   instrumentation MUST move shelved alarms from the alarm list
   (/alarms/alarm-list) to the shelved alarm list
   (/alarms/shelved-alarms/). Shelved alarms do not generate any
   notifications. When the shelving criteria are removed or changed the
   alarm list MUST be updated to the correct actual state of the alarms.

   Shelving and unshelving can only be performed by editing the shelf
   configuration.
   It cannot be performed on individual alarms. The server will add an
   operator state indicating that the alarm was shelved/unshelved.

   A leaf (/alarms/summary/shelves-active) in the alarm summary
   indicates if there are shelved alarms.

   A system can choose not to support the shelving feature.

4.2. Alarm Inventory

   The alarm inventory represents all possible alarm types that may
   occur in the system. A management system may use this to build alarm
   procedures. The alarm inventory is relevant for several reasons:

      The system might not instrument all defined alarm type identities,
      and some alarm identities are abstract.

      The system has configured dynamic alarm types using the alarm
      qualifier. The inventory makes it possible for the management
      system to discover these.

   Note that the mechanism whereby dynamic alarm types are added using
   the alarm type qualifier MUST populate this list.

   The optional leaf-list "resource" in the alarm inventory enables the
   system to publish the resources for which a given alarm type may
   appear.

   A server MUST implement the alarm inventory in order to enable
   controlled alarm procedures in the client.

   The alarm inventory tree is shown below:

     +--ro alarm-inventory
        +--ro alarm-type* [alarm-type-id alarm-type-qualifier]
           +--ro alarm-type-id           alarm-type-id
           +--ro alarm-type-qualifier    alarm-type-qualifier
           +--ro resource*               resource-match
           +--ro has-clear               boolean
           +--ro severity-levels*        severity
           +--ro description             string

4.3. Alarm Summary

   The alarm summary list summarizes alarms per severity: how many are
   cleared, cleared and closed, and closed. It also gives an indication
   if there are shelved alarms.

   The alarm summary tree is shown below:

     +--ro summary {alarm-summary}?
        +--ro alarm-summary* [severity]
        |  +--ro severity                  severity
        |  +--ro total?                    yang:gauge32
        |  +--ro cleared?                  yang:gauge32
        |  +--ro cleared-not-closed?       yang:gauge32
        |  |       {operator-actions}?
        |  +--ro cleared-closed?           yang:gauge32
        |  |       {operator-actions}?
        |  +--ro not-cleared-closed?       yang:gauge32
        |  |       {operator-actions}?
        |  +--ro not-cleared-not-closed?   yang:gauge32
        |          {operator-actions}?
        +--ro shelves-active?   empty {alarm-shelving}?

4.4. The Alarm List

   The alarm list (/alarms/alarm-list) is a function from (resource,
   alarm type, alarm type qualifier) to the current composite alarm
   state. The composite state includes states for the resource
   life-cycle such as severity, clearance flag and operator states such
   as acknowledgment.

     +--ro alarm-list
        +--ro number-of-alarms?   yang:gauge32
        +--ro last-changed?       yang:date-and-time
        +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
           +--ro resource                 resource
           +--ro alarm-type-id            alarm-type-id
           +--ro alarm-type-qualifier     alarm-type-qualifier
           +--ro alt-resource*            resource
           +--ro related-alarm*
           |       [resource alarm-type-id alarm-type-qualifier]
           |  +--ro resource
           |  |       -> /alarms/alarm-list/alarm/resource
           |  +--ro alarm-type-id           leafref
           |  +--ro alarm-type-qualifier    leafref
           +--ro impacted-resource*       resource
           +--ro root-cause-resource*     resource
           +--ro time-created             yang:date-and-time
           +--ro is-cleared               boolean
           +--ro last-changed             yang:date-and-time
           +--ro perceived-severity       severity
           +--ro alarm-text               alarm-text
           +--ro status-change* [time] {alarm-history}?
           |  +--ro time                  yang:date-and-time
           |  +--ro perceived-severity    severity-with-clear
           |  +--ro alarm-text            alarm-text
           +--ro operator-state-change* [time] {operator-actions}?
           |  +--ro time        yang:date-and-time
           |  +--ro operator    string
           |  +--ro state       operator-state
           |  +--ro text?       string
           +---x set-operator-state {operator-actions}?
           |  +---w input
           |     +---w state    writable-operator-state
           |     +---w text?    string
           +---n operator-action {operator-actions}?
              +-- time        yang:date-and-time
              +-- operator    string
              +-- state       operator-state
              +-- text?       string

   Every alarm has three important states: the resource clearance state
   "is-cleared", the severity "perceived-severity" and the operator
   state available in the operator state change list.

   In order to see the alarm history the resource state changes are
   available in the "status-change" list and the operator history is
   available in the "operator-state-change" list.

4.5. The Shelved Alarms List

   The shelved alarm list has the same structure as the alarm list
   above. It shows all the alarms that match the shelving criteria
   (/alarms/control/alarm-shelving).

4.6. Alarm Profiles

   The alarm profile list (/alarms/alarm-profile/) is a list of
   configurable alarm types. The list supports configurable alarm
   severity levels in the container "alarm-severity-assignment-profile".
   If an alarm matches the configured alarm type it MUST use the
   configured severity level(s) instead of the system default. This
   configuration MUST also be represented in the alarm inventory.

     +--rw alarm-profile*
             [alarm-type-id alarm-type-qualifier-match resource]
             {alarm-profile}?
        +--rw alarm-type-id                 al:alarm-type-id
        +--rw alarm-type-qualifier-match    string
        +--rw resource                      al:resource-match
        +--rw description                   string
        +--rw alarm-severity-assignment-profile
                {severity-assignment}?
           +--rw severity-levels*   al:severity

4.7. RPCs and Actions

   The alarm module supports rpcs and actions to manage the alarms:

      "purge-alarms" (rpc): delete alarms according to specific
      criteria, for example all cleared alarms older than a specific
      date.

      "compress-alarms" (rpc): compress the status-change list for the
      alarms.

      "set-operator-state" (action): change the operator state for an
      alarm: for example acknowledge.

4.8. Notifications

   The alarm module supports a general notification to report alarm
   state changes. It carries all relevant parameters for the alarm
   management application.

   There is also a notification to report that an operator changed the
   operator state on an alarm, like acknowledge.

   If the alarm inventory is changed, for example a new card type is
   inserted, a notification will tell the management application that
   new alarm types are available.

5. Alarm YANG Module

   This YANG module references [RFC6991].

   file "ietf-alarms@2018-10-09.yang"
   module ietf-alarms {
     yang-version 1.1;
     namespace "urn:ietf:params:xml:ns:yang:ietf-alarms";
     prefix al;

     import ietf-yang-types {
       prefix yang;
       reference "RFC 6991: Common YANG Data Types.";
     }

     organization
       "IETF CCAMP Working Group";
     contact
       "WG Web:
        WG List:

        Editor: Stefan Vallin

        Editor: Martin Bjorklund
        ";
     description
       "This module defines an interface for managing alarms. Main
        inputs to the module design are the 3GPP Alarm IRP, ITU-T X.733
        and ANSI/ISA-18.2 alarm standards.

        Main features of this module include:

          * Alarm list:
                    A list of all alarms. Cleared alarms stay in
                    the list until explicitly purged.

          * Operator actions on alarms:
                    Acknowledging and closing alarms.

          * Administrative actions on alarms:
                    Purging alarms from the list according to specific
                    criteria.

          * Alarm inventory:
                    A management application can read all
                    alarm types implemented by the system.

          * Alarm shelving:
                    Shelving (blocking) alarms according
                    to specific criteria.

          * Alarm profiles:
                    A management system can attach further
                    information to alarm types, for example
                    overriding system default severity
                    levels.

        This module uses a stateful view on alarms. An alarm is a state
        for a specific resource (note that an alarm is not a
        notification).
An alarm type is a possible alarm state for a 840 resource. For example, the tuple: 842 ('link-alarm', 'GigabitEthernet0/25') 844 is an alarm of type 'link-alarm' on the resource 845 'GigabitEthernet0/25'. 847 Alarm types are identified using YANG identities and an optional 848 string-based qualifier. The string-based qualifier allows for 849 dynamic extension of the statically defined alarm types. Alarm 850 types identify a possible alarm state and not the individual 851 notifications. For example, the traditional 'link-down' and 852 'link-up' notifications are two notifications referring to the 853 same alarm type 'link-alarm'. 855 With this design there is no ambiguity about how alarm and alarm 856 clear correlation should be performed: notifications that report 857 the same resource and alarm type are considered updates of the 858 same alarm, e.g., clearing an active alarm or changing the 859 severity of an alarm. 861 The instrumentation can update 'severity' and 'alarm-text' on an 862 existing alarm. The above alarm example can therefore look 863 like: 865 (('link-alarm', 'GigabitEthernet0/25'), 866 warning, 867 'interface down while interface admin state is up') 869 There is a clear separation between updates on the alarm from 870 the underlying resource, like clear, and updates from an 871 operator, like acknowledging or closing an alarm: 873 (('link-alarm', 'GigabitEthernet0/25'), 874 warning, 875 'interface down while interface admin state is up', 876 cleared, 877 closed) 879 Administrative actions like removing closed alarms older than a 880 given time are supported. 882 This alarm module does not define how the underlying 883 instrumentation detects and clears the specific alarms. 884 That belongs to the SDO or enterprise that owns that 885 specific technology. 887 Copyright (c) 2018 IETF Trust and the persons identified as 888 authors of the code. All rights reserved.
890 Redistribution and use in source and binary forms, with or 891 without modification, is permitted pursuant to, and subject to 892 the license terms contained in, the Simplified BSD License set 893 forth in Section 4.c of the IETF Trust's Legal Provisions 894 Relating to IETF Documents 895 (https://trustee.ietf.org/license-info). 897 The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL 898 NOT', 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'MAY', and 899 'OPTIONAL' in the module text are to be interpreted as described 900 in RFC 2119 (https://tools.ietf.org/html/rfc2119). 902 This version of this YANG module is part of RFC XXXX 903 (https://tools.ietf.org/html/rfcXXXX); see the RFC itself for 904 full legal notices."; 906 revision 2018-10-09 { 907 description 908 "Initial revision."; 909 reference "RFC XXXX: YANG Alarm Module"; 910 } 912 /* 913 * Features 914 */ 916 feature operator-actions { 917 description 918 "This feature indicates that the system supports operator 919 states on alarms."; 920 } 922 feature alarm-shelving { 923 description 924 "This feature indicates that the system supports shelving 925 (blocking) alarms."; 926 } 928 feature alarm-history { 929 description 930 "This feature indicates that the server maintains a history of 931 state changes for each alarm.
For example, if an alarm 932 toggles between cleared and active 10 times, these state 933 changes are present in a separate list in the alarm."; 934 } 936 feature alarm-summary { 937 description 938 "This feature indicates that the server summarizes the number 939 of alarms per severity and operator state."; 940 } 942 feature alarm-profile { 943 description 944 "The system supports clients to configure further information 945 to each alarm type."; 946 } 948 feature severity-assignment { 949 description 950 "The system supports configurable alarm severity levels."; 951 reference 952 "M.3160/M.3100 Alarm Severity Assignment Profile, ASAP"; 953 } 955 /* 956 * Identities 957 */ 959 identity alarm-type-id { 960 description 961 "Base identity for alarm types. A unique identification of the 962 alarm, not including the resource. Different resources can 963 share alarm types. If the resource reports the same alarm 964 type, it is to be considered to be the same alarm. The alarm 965 type is a simplification of the different X.733 and 3GPP alarm 966 IRP alarm correlation mechanisms and it allows for 967 hierarchical extensions. 969 A string-based qualifier can be used in addition to the 970 identity in order to have different alarm types based on 971 information not known at design-time, such as values in 972 textual SNMP Notification var-binds. 974 Standards and vendors can define sub-identities to clearly 975 identify specific alarm types. 977 This identity is abstract and MUST NOT be used for alarms."; 978 } 980 /* 981 * Common types 982 */ 984 typedef resource { 985 type union { 986 type instance-identifier { 987 require-instance false; 988 } 989 type yang:object-identifier; 990 type yang:uuid; 991 type string; 992 } 993 description 994 "This is an identification of the alarming resource, such as an 995 interface. It should be as fine-grained as possible both to 996 guide the operator and to guarantee uniqueness of the alarms. 
998 If the alarming resource is modelled in YANG, this type will 999 be an instance-identifier. 1001 If the resource is an SNMP object, the type will be an 1002 object-identifier. 1004 If the resource is anything else, for example a distinguished 1005 name or a CIM path, this type will be a string. 1007 If the alarming object is identified by a UUID, use the uuid 1008 type. Be cautious when using this type, since a UUID is hard 1009 to use for an operator. 1011 If the server supports several models, the precedence should 1012 be in the order given in the union definition."; 1013 } 1015 typedef resource-match { 1016 type union { 1017 type yang:xpath1.0; 1018 type yang:object-identifier; 1019 type string; 1020 } 1021 description 1022 "This type is used to match resources of type 'resource'. 1023 Since the type 'resource' is a union of different types, 1024 the 'resource-match' type is also a union of corresponding 1025 types. 1027 If the type is given as an XPath 1.0 expression, a resource 1028 of type 'instance-identifier' matches if the instance is part 1029 of the node set that is the result of evaluating the XPath 1.0 1030 expression. For example, the XPath 1.0 expression: 1032 /if:interfaces/if:interface[if:type='ianaift:ethernetCsmacd'] 1034 would match the resource instance-identifier: 1036 /if:interfaces/if:interface[if:name='eth1'], 1038 assuming that the interface 'eth1' is of type 1039 'ianaift:ethernetCsmacd'. 1041 If the type is given as an object identifier, a resource of 1042 type 'object-identifier' matches if the match object 1043 identifier is a prefix of the resource's object identifier. 1044 For example, the value: 1046 1.3.6.1.2.1.2.2 1048 would match the resource object identifier: 1050 1.3.6.1.2.1.2.2.1.1.5 1052 If the type is given as a UUID or a string, it is interpreted 1053 as a W3C regular expression, which matches a resource of type 1054 'yang:uuid' or 'string' if the given regular expression 1055 matches the resource string.
1057 If the type is given as an XPath expression, it is evaluated 1058 in the following XPath context: 1060 o The set of namespace declarations is those in scope on 1061 the leaf element where this type is used. 1063 o The set of variable bindings is empty. 1065 o The function library is the core function library 1066 and the functions defined in Section 10 of RFC 7950. 1070 o The context node is the root node in the data tree."; 1071 } 1073 typedef alarm-text { 1074 type string; 1075 description 1076 "The string used to inform operators about the alarm. This 1077 MUST contain enough information for an operator to be able 1078 to understand the problem and how to resolve it. If this 1079 string contains structure, this format should be clearly 1080 documented for programs to be able to parse that 1081 information."; 1082 } 1084 typedef severity { 1085 type enumeration { 1086 enum indeterminate { 1087 value 2; 1088 description 1089 "Indicates that the severity level could not be 1090 determined. This level SHOULD be avoided."; 1091 } 1092 enum minor { 1093 value 3; 1094 description 1095 "The 'minor' severity level indicates the existence of a 1096 non-service affecting fault condition and that corrective 1097 action should be taken in order to prevent a more serious 1098 (for example, service affecting) fault. Such a severity 1099 can be reported, for example, when the detected alarm 1100 condition is not currently degrading the capacity of the 1101 resource."; 1102 } 1103 enum warning { 1104 value 4; 1105 description 1106 "The 'warning' severity level indicates the detection of a 1107 potential or impending service affecting fault, before any 1108 significant effects have been felt.
Action should be 1109 taken to further diagnose (if necessary) and correct the 1110 problem in order to prevent it from becoming a more 1111 serious service affecting fault."; 1112 } 1113 enum major { 1114 value 5; 1115 description 1116 "The 'major' severity level indicates that a service 1117 affecting condition has developed and an urgent corrective 1118 action is required. Such a severity can be reported, for 1119 example, when there is a severe degradation in the 1120 capability of the resource and its full capability must be 1121 restored."; 1122 } 1123 enum critical { 1124 value 6; 1125 description 1126 "The 'critical' severity level indicates that a service 1127 affecting condition has occurred and an immediate 1128 corrective action is required. Such a severity can be 1129 reported, for example, when a resource becomes totally out 1130 of service and its capability must be restored."; 1131 } 1132 } 1133 description 1134 "The severity level of the alarm. Note well that value 'clear' 1135 is not included. If an alarm is cleared or not is a separate 1136 boolean flag."; 1137 reference 1138 "ITU Recommendation X.733: Information Technology 1139 - Open Systems Interconnection 1140 - System Management: Alarm Reporting Function"; 1141 } 1143 typedef severity-with-clear { 1144 type union { 1145 type enumeration { 1146 enum cleared { 1147 value 1; 1148 description 1149 "The alarm is cleared by the instrumentation."; 1150 } 1151 } 1152 type severity; 1153 } 1154 description 1155 "The severity level of the alarm including clear. 1157 This is used *only* in notifications reporting state changes 1158 for an alarm."; 1159 } 1161 typedef writable-operator-state { 1162 type enumeration { 1163 enum none { 1164 value 1; 1165 description 1166 "The alarm is not being taken care of."; 1167 } 1168 enum ack { 1169 value 2; 1170 description 1171 "The alarm is being taken care of. 
Corrective action not 1172 taken yet, or failed"; 1173 } 1174 enum closed { 1175 value 3; 1176 description 1177 "Corrective action taken successfully."; 1178 } 1179 } 1180 description 1181 "Operator states on an alarm. The 'closed' state indicates 1182 that an operator considers the alarm being resolved. This 1183 is separate from the alarm's 'is-cleared' leaf."; 1184 } 1186 typedef operator-state { 1187 type union { 1188 type writable-operator-state; 1189 type enumeration { 1190 enum shelved { 1191 value 4; 1192 description 1193 "The alarm is shelved. Alarms in /alarms/shelved-alarms/ 1194 MUST be assigned this operator state by the server as 1195 the last entry in the operator-state-change list. The 1196 text for that entry SHOULD include the shelf name."; 1197 } 1198 enum un-shelved { 1199 value 5; 1200 description 1201 "The alarm is moved back to 'alarm-list' from a shelf. 1202 Alarms that are moved from /alarms/shelved-alarms/ to 1203 /alarms/alarm-list MUST be assigned this state by the 1204 server as the last entry in the 'operator-state-change' 1205 list. The text for that entry SHOULD include the shelf 1206 name."; 1207 } 1208 } 1209 } 1210 description 1211 "Operator states on an alarm. The 'closed' state indicates 1212 that an operator considers the alarm being resolved. This 1213 is separate from the alarm's 'is-cleared' leaf."; 1214 } 1216 /* Alarm type */ 1218 typedef alarm-type-id { 1219 type identityref { 1220 base alarm-type-id; 1221 } 1222 description 1223 "Identifies an alarm type. The description of the alarm type 1224 id MUST indicate if the alarm type is abstract or not. 
An 1225 abstract alarm type is used as a base for other alarm type ids 1226 and will not be used as a value for an alarm or be present in 1227 the alarm inventory."; 1228 } 1230 typedef alarm-type-qualifier { 1231 type string; 1232 description 1233 "If an alarm type cannot be fully specified at design time by 1234 alarm-type-id, this string qualifier is used in addition to 1235 fully define a unique alarm type. 1237 The definition of alarm qualifiers is considered to be part 1238 of the instrumentation and out of scope for this module. 1239 An empty string is used when this is part of a key."; 1240 } 1242 /* 1243 * Groupings 1244 */ 1246 grouping common-alarm-parameters { 1247 description 1248 "Common parameters for an alarm. 1250 This grouping is used both in the alarm list and in the 1251 notification representing an alarm state change."; 1252 leaf resource { 1253 type resource; 1254 mandatory true; 1255 description 1256 "The alarming resource. See also 'alt-resource'. 1257 This could for example be a reference to the alarming 1258 interface."; 1259 } 1260 leaf alarm-type-id { 1261 type alarm-type-id; 1262 mandatory true; 1263 description 1264 "This leaf and the leaf 'alarm-type-qualifier' together 1265 provide a unique identification of the alarm type."; 1266 } 1267 leaf alarm-type-qualifier { 1268 type alarm-type-qualifier; 1269 description 1270 "This leaf is used when the 'alarm-type-id' leaf cannot 1271 uniquely identify the alarm type. Normally, this is not 1272 the case, and this leaf is the empty string."; 1273 } 1274 leaf-list alt-resource { 1275 type resource; 1276 description 1277 "Used if the alarming resource is available over other 1278 interfaces. This field can contain SNMP OIDs, CIM paths or 1279 3GPP Distinguished Names, for example."; 1280 } 1281 list related-alarm { 1282 key "resource alarm-type-id alarm-type-qualifier"; 1283 description 1284 "References to related alarms.
Note that the related alarm 1285 might have been purged from the alarm list."; 1286 leaf resource { 1287 type leafref { 1288 path "/alarms/alarm-list/alarm/resource"; 1289 require-instance false; 1290 } 1291 description 1292 "The alarming resource for the related alarm."; 1293 } 1294 leaf alarm-type-id { 1295 type leafref { 1296 path "/alarms/alarm-list/alarm" 1297 + "[resource=current()/../resource]" 1298 + "/alarm-type-id"; 1299 require-instance false; 1300 } 1301 description 1302 "The alarm type identifier for the related alarm."; 1303 } 1304 leaf alarm-type-qualifier { 1305 type leafref { 1306 path "/alarms/alarm-list/alarm" 1307 + "[resource=current()/../resource]" 1308 + "[alarm-type-id=current()/../alarm-type-id]" 1309 + "/alarm-type-qualifier"; 1310 require-instance false; 1311 } 1312 description 1313 "The alarm qualifier for the related alarm."; 1314 } 1315 } 1316 leaf-list impacted-resource { 1317 type resource; 1318 description 1319 "Resources that might be affected by this alarm. If the 1320 system creates an alarm on a resource and also has a mapping 1321 to other resources that might be impacted, these resources 1322 can be listed in this leaf-list. In this way the system can 1323 create one alarm instead of several. For example, if an 1324 interface has an alarm, the 'impacted-resource' can 1325 reference the aggregated port channels."; 1326 } 1327 leaf-list root-cause-resource { 1328 type resource; 1329 description 1330 "Resources that are candidates for causing the alarm. If the 1331 system has a mechanism to understand the candidate root 1332 causes of an alarm, this leaf-list can be used to list the 1333 root cause candidate resources. In this way the system can 1334 create one alarm instead of several. An example might be a 1335 logging system (alarm resource) that fails, the alarm can 1336 reference the file-system in the 'root-cause-resource' 1337 leaf-list. 
Note that the intended use is not to also send 1338 an alarm with the root-cause-resource as alarming resource. 1339 The root-cause-resource leaf-list is a hint and should not 1340 also generate an alarm for the same problem."; 1341 } 1342 } 1344 grouping alarm-state-change-parameters { 1345 description 1346 "Parameters for an alarm state change. 1348 This grouping is used both in the alarm list's 1349 status-change list and in the notification representing an 1350 alarm state change."; 1351 leaf time { 1352 type yang:date-and-time; 1353 mandatory true; 1354 description 1355 "The time the status of the alarm changed. The value 1356 represents the time the real alarm state change appeared 1357 in the resource and not when it was added to the 1358 alarm list. The /alarms/alarm-list/alarm/last-changed MUST be 1359 set to the same value."; 1360 } 1361 leaf perceived-severity { 1362 type severity-with-clear; 1363 mandatory true; 1364 description 1365 "The severity of the alarm as defined by X.733.
Note 1366 that this may not be the original severity since the alarm 1367 may have changed severity."; 1368 reference 1369 "ITU Recommendation X.733: Information Technology 1370 - Open Systems Interconnection 1371 - System Management: Alarm Reporting Function"; 1372 } 1373 leaf alarm-text { 1374 type alarm-text; 1375 mandatory true; 1376 description 1377 "A user-friendly text describing the alarm state change."; 1378 reference 1379 "ITU Recommendation X.733: Information Technology 1380 - Open Systems Interconnection 1381 - System Management: Alarm Reporting Function"; 1382 } 1383 } 1385 grouping operator-parameters { 1386 description 1387 "This grouping defines parameters that can be changed by an 1388 operator."; 1389 leaf time { 1390 type yang:date-and-time; 1391 mandatory true; 1392 description 1393 "Timestamp for operator action on alarm."; 1394 } 1395 leaf operator { 1396 type string; 1397 mandatory true; 1398 description 1399 "The name of the operator that has acted on this 1400 alarm."; 1401 } 1402 leaf state { 1403 type operator-state; 1404 mandatory true; 1405 description 1406 "The operator's view of the alarm state."; 1407 } 1408 leaf text { 1409 type string; 1410 description 1411 "Additional optional textual information provided by 1412 the operator."; 1413 } 1414 } 1416 grouping resource-alarm-parameters { 1417 description 1418 "Alarm parameters that originate from the resource view."; 1419 leaf is-cleared { 1420 type boolean; 1421 mandatory true; 1422 description 1423 "Indicates the current clearance state of the alarm. An 1424 alarm might toggle from active alarm to cleared alarm and 1425 back to active again."; 1426 } 1427 leaf last-changed { 1428 type yang:date-and-time; 1429 mandatory true; 1430 description 1431 "A timestamp when the alarm status was last changed.
Status 1432 changes are changes to 'is-cleared', 'perceived-severity', 1433 and 'alarm-text'."; 1434 } 1435 leaf perceived-severity { 1436 type severity; 1437 mandatory true; 1438 description 1439 "The last severity of the alarm. 1441 If an alarm was raised with severity 'warning', but later 1442 changed to 'major', this leaf will show 'major'."; 1443 } 1444 leaf alarm-text { 1445 type alarm-text; 1446 mandatory true; 1447 description 1448 "The last reported alarm text. This text should contain 1449 information for an operator to be able to understand 1450 the problem and how to resolve it."; 1451 } 1452 list status-change { 1453 if-feature "alarm-history"; 1454 key "time"; 1455 min-elements 1; 1456 description 1457 "A list of status change events for this alarm. 1459 The entry with the latest time-stamp in this list MUST 1460 correspond to the leafs 'is-cleared', 'perceived-severity' 1461 and 'alarm-text' for the alarm. The time-stamp for that 1462 entry MUST be equal to the 'last-changed' leaf. 1464 This list is ordered according to the timestamps of 1465 alarm state changes. The last item corresponds to the 1466 latest state change. 1468 The following state changes create an entry in this 1469 list: 1470 - changed severity (warning, minor, major, critical) 1471 - clearance status; this also updates the 'is-cleared' 1472 leaf 1473 - alarm text update"; 1474 uses alarm-state-change-parameters; 1475 } 1476 } 1478 /* 1479 * The /alarms data tree 1480 */ 1482 container alarms { 1483 description 1484 "The top container for this module."; 1485 container control { 1486 description 1487 "Configuration to control the alarm behaviour."; 1488 leaf max-alarm-status-changes { 1489 type union { 1490 type uint16; 1491 type enumeration { 1492 enum infinite { 1493 description 1494 "The status change entries are accumulated 1495 infinitely."; 1496 } 1497 } 1498 } 1499 default "32"; 1500 description 1501 "The status-change entries are kept in a circular list 1502 per alarm.
When this number is exceeded, the oldest 1503 status change entry is automatically removed. If the 1504 value is 'infinite', the status change entries are 1505 accumulated infinitely."; 1506 } 1507 choice notify-status-changes { 1508 description 1509 "This choice controls the notifications sent for alarm status 1510 updates. There are three options: 1511 1. notifications are sent for all updates, severity level 1512 changes and alarm text changes 1513 2. notifications are only sent for alarm raise and clear 1514 3. notifications are sent for status changes equal to or 1515 above the specified severity level. Clear notifications 1516 shall always be sent. 1517 Notifications shall also be sent for state changes that 1518 make an alarm less severe than the specified level. 1519 In option 3, assuming the severity level is set to major, 1520 and that the alarm has the following state changes 1521 [(Time, severity, clear)]: 1522 [(T1, major, -), (T2, minor, -), (T3, warning, -), 1523 (T4, minor, -), (T5, major, -), (T6, critical, -), 1524 (T7, major, -), (T8, major, clear)] 1525 In that case, notifications will be sent at 1526 T1, T2, T5, T6, T7 and T8."; 1527 leaf notify-all-state-changes { 1528 type empty; 1529 description 1530 "Send notifications for all status changes."; 1531 } 1532 leaf notify-raise-and-clear { 1533 type empty; 1534 description 1535 "Send notifications only for raise, clear, and re-raise. 1536 Notifications for severity level changes or alarm text 1537 changes are not sent."; 1538 } 1539 leaf notify-severity-level { 1540 type severity; 1541 description 1542 "Only send notifications for alarm state changes 1543 crossing the specified level. Always send clear 1544 notifications."; 1545 } 1546 } 1547 container alarm-shelving { 1548 if-feature "alarm-shelving"; 1549 description 1550 "The alarm-shelving/shelf list is used to shelve 1551 (block/filter) alarms.
The server will move any alarms 1552 corresponding to the shelving criteria from the 1553 alarms/alarm-list/alarm list to the 1554 alarms/shelved-alarms/shelved-alarm list. It will also 1555 stop sending notifications for the shelved alarms. The 1556 conditions in the shelf criteria are logically ANDed. 1557 When the shelving criteria are deleted or changed, the 1558 non-matching alarms MUST appear in the 1559 alarms/alarm-list/alarm list according to the real state. 1560 This means that the instrumentation MUST maintain states 1561 for the shelved alarms. Alarms that match the criteria 1562 shall have an operator-state 'shelved'. When the shelf 1563 configuration removes an alarm from the shelf, the 1564 server shall add an operator state 'un-shelved'."; 1565 list shelf { 1566 key "name"; 1567 leaf name { 1568 type string; 1569 description 1570 "An arbitrary name for the alarm shelf."; 1571 } 1572 description 1573 "Each entry defines the criteria for shelving alarms. 1574 Criteria are ANDed. If no criteria are specified, 1575 all alarms will be shelved."; 1576 leaf-list resource { 1577 type resource-match; 1578 description 1579 "Shelve alarms for matching resources."; 1580 } 1581 leaf alarm-type-id { 1582 type alarm-type-id; 1583 description 1584 "Shelve all alarms that have an alarm-type-id that is 1585 equal to or derived from the given alarm-type-id."; 1586 } 1587 leaf alarm-type-qualifier-match { 1588 type string; 1589 description 1590 "A W3C regular expression that is used to match 1591 an alarm type qualifier. Shelve all alarms that 1592 match this regular expression for the alarm 1593 type qualifier."; 1594 } 1595 leaf description { 1596 type string; 1597 description 1598 "An optional textual description of the shelf.
This 1599 description should include the reason for shelving 1600 these alarms."; 1601 } 1602 } 1603 } 1604 } 1605 container alarm-inventory { 1606 config false; 1607 description 1608 "This alarm-inventory/alarm-type list contains all possible 1609 alarm types for the system. 1610 If the system knows for which resources a specific alarm 1611 type can appear, this is also identified in the inventory. 1612 The list also tells if each alarm type has a corresponding 1613 clear state. The inventory shall only contain concrete 1614 alarm types. 1616 The alarm inventory MUST be updated by the system when new 1617 alarms can appear. This can be the case when installing new 1618 software modules or inserting new card types. A 1619 notification 'alarm-inventory-changed' is sent when the 1620 inventory is changed."; 1621 list alarm-type { 1622 key "alarm-type-id alarm-type-qualifier"; 1623 description 1624 "An entry in this list defines a possible alarm."; 1625 leaf alarm-type-id { 1626 type alarm-type-id; 1627 description 1628 "The statically defined alarm type identifier for this 1629 possible alarm."; 1630 } 1631 leaf alarm-type-qualifier { 1632 type alarm-type-qualifier; 1633 description 1634 "The optionally dynamically defined alarm type identifier 1635 for this possible alarm."; 1636 } 1637 leaf-list resource { 1638 type resource-match; 1639 description 1640 "Optionally, specifies for which resources the alarm type 1641 is valid."; 1642 } 1643 leaf has-clear { 1644 type boolean; 1645 mandatory true; 1646 description 1647 "This leaf tells the operator if the alarm will be 1648 cleared when the correct corrective action has been 1649 taken. Implementations SHOULD strive for detecting the 1650 cleared state for all alarm types. If this leaf is 1651 true, the operator can monitor the alarm until it 1652 becomes cleared after the corrective action has been 1653 taken. 
If this leaf is false, the operator needs to 1654 validate that the alarm is no longer active using other 1655 mechanisms. Alarms can lack a corresponding clear due 1656 to missing instrumentation or because there is no logical 1657 corresponding clear state."; 1658 } 1659 leaf-list severity-levels { 1660 type severity; 1661 description 1662 "This leaf-list indicates the possible severity levels of 1663 this alarm type. Note well that 'clear' is not part of 1664 the severity type. In general, the severity level should 1665 be defined by the instrumentation based on dynamic state 1666 and not defined statically by the alarm type in order to 1667 provide a relevant severity level based on dynamic state 1668 and context. However, most alarm types have a defined set 1669 of possible severity levels and this should be provided 1670 here."; 1671 } 1672 leaf description { 1673 type string; 1674 mandatory true; 1675 description 1676 "A description of the possible alarm. It SHOULD include 1677 information on possible underlying root causes and 1678 corrective actions."; 1679 } 1680 } 1681 } 1682 container summary { 1683 if-feature "alarm-summary"; 1684 config false; 1685 description 1686 "This container gives a summary of the number of alarms."; 1687 list alarm-summary { 1688 key "severity"; 1689 description 1690 "A global summary of all alarms in the system.
The summary 1691 does not include shelved alarms."; 1692 leaf severity { 1693 type severity; 1694 description 1695 "Alarm summary for this severity level."; 1696 } 1697 leaf total { 1698 type yang:gauge32; 1699 description 1700 "Total number of alarms of this severity level."; 1701 } 1702 leaf cleared { 1703 type yang:gauge32; 1704 description 1705 "For this severity level, the number of alarms that are 1706 cleared."; 1707 } 1708 leaf cleared-not-closed { 1709 if-feature "operator-actions"; 1710 type yang:gauge32; 1711 description 1712 "For this severity level, the number of alarms that are 1713 cleared but not closed."; 1714 } 1715 leaf cleared-closed { 1716 if-feature "operator-actions"; 1717 type yang:gauge32; 1718 description 1719 "For this severity level, the number of alarms that are 1720 cleared and closed."; 1721 } 1722 leaf not-cleared-closed { 1723 if-feature "operator-actions"; 1724 type yang:gauge32; 1725 description 1726 "For this severity level, the number of alarms that are 1727 not cleared but closed."; 1728 } 1729 leaf not-cleared-not-closed { 1730 if-feature "operator-actions"; 1731 type yang:gauge32; 1732 description 1733 "For this severity level, the number of alarms that are 1734 not cleared and not closed."; 1735 } 1736 } 1737 leaf shelves-active { 1738 if-feature "alarm-shelving"; 1739 type empty; 1740 description 1741 "This is a hint to the operator that there are active 1742 alarm shelves. This leaf MUST exist if the 1743 alarms/shelved-alarms/number-of-shelved-alarms is > 0."; 1744 } 1745 } 1746 container alarm-list { 1747 config false; 1748 description 1749 "The alarms in the system."; 1750 leaf number-of-alarms { 1751 type yang:gauge32; 1752 description 1753 "This object shows the total number of 1754 alarms in the system, i.e., the total number 1755 of entries in the alarm list."; 1756 } 1757 leaf last-changed { 1758 type yang:date-and-time; 1759 description 1760 "A timestamp when the alarm list was last 1761 changed. 
The value can be used by a manager to 1762 initiate an alarm resynchronization procedure."; 1763 } 1764 list alarm { 1765 key "resource alarm-type-id alarm-type-qualifier"; 1766 description 1767 "The list of alarms. Each entry in the list holds one 1768 alarm for a given alarm type and resource. 1769 An alarm can be updated from the underlying resource or 1770 by the user. The following leafs are maintained by the 1771 resource: is-cleared, last-changed, perceived-severity, 1772 and alarm-text. An operator can change: operator-state 1773 and operator-text. 1775 Entries appear in the alarm list the first time an 1776 alarm becomes active for a given alarm-type and resource. 1777 Entries do not get deleted when the alarm is cleared; this 1778 is a boolean state in the alarm. 1780 Alarm entries are removed, purged, from the list by an 1781 explicit purge action. For example, purge all alarms 1782 that are cleared and in closed operator-state that are 1783 older than 24 hours. Systems may also remove alarms based 1784 on locally configured policies, which is out of scope for 1785 this module."; 1786 uses common-alarm-parameters; 1787 leaf time-created { 1788 type yang:date-and-time; 1789 mandatory true; 1790 description 1791 "The time-stamp when this alarm entry was created. This 1792 represents the first time the alarm appeared; it can 1793 also represent that the alarm re-appeared after a purge. 1794 Further state-changes of the same alarm do not change 1795 this leaf; these changes will update the 'last-changed' 1796 leaf."; 1797 } 1798 uses resource-alarm-parameters; 1799 list operator-state-change { 1800 if-feature "operator-actions"; 1801 key "time"; 1802 description 1803 "This list is used by operators to indicate 1804 the state of human intervention on an alarm.
1805 For example, if an operator has seen an alarm, 1806 the operator can add a new item to this list indicating 1807 that the alarm is acknowledged."; 1808 uses operator-parameters; 1809 } 1810 action set-operator-state { 1811 if-feature "operator-actions"; 1812 description 1813 "This is a means for the operator to indicate 1814 the level of human intervention on an alarm."; 1815 input { 1816 leaf state { 1817 type writable-operator-state; 1818 mandatory true; 1819 description 1820 "Set this operator state."; 1821 } 1822 leaf text { 1823 type string; 1824 description 1825 "Additional optional textual information."; 1826 } 1827 } 1828 } 1829 notification operator-action { 1830 if-feature "operator-actions"; 1831 description 1832 "This notification is used to report that an operator 1833 acted upon an alarm."; 1834 uses operator-parameters; 1835 } 1836 } 1837 } 1838 container shelved-alarms { 1839 if-feature "alarm-shelving"; 1840 config false; 1841 description 1842 "The shelved alarms. Alarms appear here if they match the 1843 criteria in /alarms/control/alarm-shelving. This list does 1844 not generate any notifications. The list represents alarms 1845 that are considered not relevant by the operator. Alarms in 1846 this list have an operator-state of 'shelved'. This cannot 1847 be changed."; 1848 leaf number-of-shelved-alarms { 1849 type yang:gauge32; 1850 description 1851 "This object shows the total number of currently 1852 shelved alarms, i.e., the total number of entries 1853 in the shelved alarm list."; 1854 } 1855 leaf alarm-shelf-last-changed { 1856 type yang:date-and-time; 1857 description 1858 "A timestamp when the shelved alarm list was last 1859 changed. The value can be used by a manager to 1860 initiate an alarm resynchronization procedure."; 1861 } 1862 list shelved-alarm { 1863 key "resource alarm-type-id alarm-type-qualifier"; 1864 description 1865 "The list of shelved alarms.
Shelved alarms 1866 can only be updated from the underlying resource, 1867 no operator actions are supported."; 1868 uses common-alarm-parameters; 1869 leaf shelf-name { 1870 type leafref { 1871 path "/alarms/control/alarm-shelving/shelf/name"; 1872 require-instance false; 1873 } 1874 description 1875 "The name of the shelf."; 1876 } 1877 uses resource-alarm-parameters; 1878 list operator-state-change { 1879 if-feature "operator-actions"; 1880 key "time"; 1881 description 1882 "This list is used by operators to indicate 1883 the state of human intervention on an alarm. 1884 For shelved alarms, the system has set the list 1885 item in the list to 'shelved'."; 1886 uses operator-parameters; 1887 } 1888 } 1889 } 1890 list alarm-profile { 1891 if-feature "alarm-profile"; 1892 key "alarm-type-id alarm-type-qualifier-match resource"; 1893 ordered-by user; 1894 description 1895 "This list is used to assign further information or 1896 configuration for each alarm type. This module supports 1897 a mechanism where the client can override the system 1898 default alarm severity levels. 
The alarm-profile is 1899 also a useful augmentation point for specific additions 1900 to alarm types."; 1901 leaf alarm-type-id { 1902 type al:alarm-type-id; 1903 description 1904 "The alarm type identifier to match."; 1905 } 1906 leaf alarm-type-qualifier-match { 1907 type string; 1908 description 1909 "A W3C regular expression that is used to 1910 match."; 1911 } 1912 leaf resource { 1913 type al:resource-match; 1914 description 1915 "Specifies which resources to match."; 1916 } 1917 leaf description { 1918 type string; 1919 mandatory true; 1920 description 1921 "A description of the alarm profile."; 1922 } 1923 container alarm-severity-assignment-profile { 1924 if-feature "severity-assignment"; 1925 description 1926 "The client can override the system default 1927 severity level."; 1928 reference 1929 "ITU M.3100, ITU M.3160 1930 - Generic Network Information Model, 1931 Alarm Severity Assignment Profile"; 1932 leaf-list severity-levels { 1933 type al:severity; 1934 ordered-by user; 1935 description 1936 "Specifies the configured severity level(s) for the 1937 matching alarm. If the alarm has several severity 1938 levels the leaf-list shall be given in rising severity 1939 order. The original M3100/M3160 ASAP function only 1940 allows for a one-to-one mapping between alarm type and 1941 severity but since the IETF alarm module supports 1942 stateful alarms the mapping must allow for several 1943 severity levels. 1945 Assume a high-utilisation alarm type with two 1946 thresholds with the system default severity levels of 1947 threshold1 = warning and threshold2 = minor. 
Setting 1948 this leaf-list to (minor, major) will assign the 1949 severity levels threshold1 = minor and 1950 threshold2 = major"; 1951 } 1952 } 1953 } 1954 } 1956 /* 1957 * Operations 1958 */ 1960 rpc compress-alarms { 1961 if-feature "alarm-history"; 1962 description 1963 "This operation requests the server to compress entries in the 1964 alarm list by removing all but the latest state change for all 1965 alarms. Conditions in the input are logically ANDed. If no 1966 input condition is given, all alarms are compressed."; 1967 input { 1968 leaf resource { 1969 type leafref { 1970 path "/alarms/alarm-list/alarm/resource"; 1971 require-instance false; 1972 } 1973 description 1974 "Compress the alarms with this resource."; 1975 } 1976 leaf alarm-type-id { 1977 type leafref { 1978 path "/alarms/alarm-list/alarm/alarm-type-id"; 1979 require-instance false; 1980 } 1981 description 1982 "Compress alarms with this alarm-type-id."; 1983 } 1984 leaf alarm-type-qualifier { 1985 type leafref { 1986 path "/alarms/alarm-list/alarm/alarm-type-qualifier"; 1987 require-instance false; 1988 } 1989 description 1990 "Compress the alarms with this alarm-type-qualifier."; 1991 } 1992 } 1993 output { 1994 leaf compressed-alarms { 1995 type uint32; 1996 description 1997 "Number of compressed alarm entries."; 1998 } 1999 } 2000 } 2001 rpc compress-shelved-alarms { 2002 if-feature "alarm-history and alarm-shelving"; 2003 description 2004 "This operation requests the server to compress entries in the 2005 shelved alarm list by removing all but the latest state change 2006 for all alarms. Conditions in the input are logically ANDed. 
2007 If no input condition is given, all alarms are compressed."; 2008 input { 2009 leaf resource { 2010 type leafref { 2011 path "/alarms/shelved-alarms/shelved-alarm/resource"; 2012 require-instance false; 2013 } 2014 description 2015 "Compress the alarms with this resource."; 2016 } 2017 leaf alarm-type-id { 2018 type leafref { 2019 path "/alarms/shelved-alarms/shelved-alarm/alarm-type-id"; 2020 require-instance false; 2022 } 2023 description 2024 "Compress alarms with this alarm-type-id."; 2025 } 2026 leaf alarm-type-qualifier { 2027 type leafref { 2028 path "/alarms/shelved-alarms/shelved-alarm" 2029 + "/alarm-type-qualifier"; 2030 require-instance false; 2031 } 2032 description 2033 "Compress the alarms with this alarm-type-qualifier."; 2034 } 2035 } 2036 output { 2037 leaf compressed-alarms { 2038 type uint32; 2039 description 2040 "Number of compressed alarm entries."; 2041 } 2042 } 2043 } 2045 grouping filter-input { 2046 description 2047 "Grouping to specify a filter construct on alarm information."; 2048 leaf alarm-status { 2049 type enumeration { 2050 enum any { 2051 description 2052 "Ignore alarm clearance status."; 2053 } 2054 enum cleared { 2055 description 2056 "Filter cleared alarms."; 2057 } 2058 enum not-cleared { 2059 description 2060 "Filter not cleared alarms."; 2061 } 2062 } 2063 mandatory true; 2064 description 2065 "The clearance status of the alarm."; 2066 } 2067 container older-than { 2068 presence "Age specification"; 2069 description 2070 "Matches the 'last-status-change' leaf in the alarm."; 2071 choice age-spec { 2072 description 2073 "Filter using date and time age."; 2074 case seconds { 2075 leaf seconds { 2076 type uint16; 2077 description 2078 "Seconds part"; 2079 } 2080 } 2081 case minutes { 2082 leaf minutes { 2083 type uint16; 2084 description 2085 "Minute part"; 2086 } 2087 } 2088 case hours { 2089 leaf hours { 2090 type uint16; 2091 description 2092 "Hours part."; 2093 } 2094 } 2095 case days { 2096 leaf days { 2097 type 
uint16; 2098 description 2099 "Day part."; 2100 } 2101 } 2102 case weeks { 2103 leaf weeks { 2104 type uint16; 2105 description 2106 "Week part."; 2107 } 2108 } 2109 } 2110 } 2111 container severity { 2112 presence "Severity filter"; 2113 choice sev-spec { 2114 description 2115 "Filter based on severity level."; 2116 leaf below { 2117 type severity; 2118 description 2119 "Severity level lower than this leaf."; 2120 } 2121 leaf is { 2122 type severity; 2123 description 2124 "Severity level equal to this leaf."; 2125 } 2126 leaf above { 2127 type severity; 2128 description 2129 "Severity level higher than this leaf."; 2130 } 2131 } 2132 description 2133 "Filter based on severity."; 2134 } 2135 container operator-state-filter { 2136 if-feature "operator-actions"; 2137 presence "Operator state filter"; 2138 leaf state { 2139 type operator-state; 2140 description 2141 "Filter on operator state."; 2142 } 2143 leaf user { 2144 type string; 2145 description 2146 "Filter based on which operator."; 2147 } 2148 description 2149 "Filter based on operator state."; 2150 } 2151 } 2153 rpc purge-alarms { 2154 description 2155 "This operation requests the server to delete entries from the 2156 alarm list or the shelved alarms list according to the 2157 supplied criteria. To purge alarms in the shelved alarms list, 2158 set the operator-state filter input to 'shelved'. 2159 Typically it can be used to delete alarms that are 2160 in closed operator state and older than a specified time. 2161 In the shelved alarm list it makes sense to delete alarms that 2162 are no longer relevant. 2163 The number of purged alarms is returned as an output 2164 parameter."; 2165 input { 2166 uses filter-input; 2167 } 2168 output { 2169 leaf purged-alarms { 2170 type uint32; 2171 description 2172 "Number of purged alarms."; 2173 } 2174 } 2175 } 2177 /* 2178 * Notifications 2179 */ 2181 notification alarm-notification { 2182 description 2183 "This notification is used to report a state change for an 2184 alarm.
The same notification is used for reporting a newly 2185 raised alarm, a cleared alarm or changing the text and/or 2186 severity of an existing alarm."; 2187 uses common-alarm-parameters; 2188 uses alarm-state-change-parameters; 2189 } 2190 notification alarm-inventory-changed { 2191 description 2192 "This notification is used to report that the list of possible 2193 alarms has changed. This can happen, for example, when a new 2194 software module is installed, or a new physical card is 2195 inserted."; 2196 } 2197 } 2199 2201 6. X.733 Extensions 2203 Many alarm systems are based on the X.733 [X.733] and X.736 [X.736] 2204 alarm standards. This module augments the alarm inventory, the alarm 2205 lists and the alarm notification with X.733 and X.736 parameters. 2207 The module also supports a feature whereby the alarm manager can 2208 configure the mapping from alarm types to X.733 event-type and 2209 probable-cause parameters. This might be needed when the default 2210 mapping provided by the system conflicts with other management 2211 systems or is not considered correct. 2213 Note that the IETF Alarm Module term 'resource' is synonymous with the 2214 ITU term 'managed object'. 2216 7. The X.733 Mapping Module 2218 This YANG module references [X.733] and [X.736]. 2220 file "ietf-alarms-x733@2018-10-09.yang" 2221 module ietf-alarms-x733 { 2222 yang-version 1.1; 2223 namespace "urn:ietf:params:xml:ns:yang:ietf-alarms-x733"; 2224 prefix x733; 2226 import ietf-alarms { 2227 prefix al; 2228 } 2229 import ietf-yang-types { 2230 prefix yang; 2231 reference "RFC 6991: Common YANG Data Types"; 2232 } 2234 organization 2235 "IETF CCAMP Working Group"; 2236 contact 2237 "WG Web: 2238 WG List: 2240 Editor: Stefan Vallin 2241 2243 Editor: Martin Bjorklund 2244 "; 2245 description 2246 "This module augments the ietf-alarms module with X.733 alarm 2247 parameters.
2249 The following structures are augmented with X.733 event type 2250 and probable cause: 2252 1) alarms/alarm-inventory: all possible alarm types 2253 2) alarms/alarm-list: every alarm in the system 2254 3) alarm-notification: notifications indicating alarm state 2255 changes 2257 The module also optionally allows the alarm management system 2258 to configure the mapping from the IETF Alarm module alarm keys 2259 to the ITU tuple (event-type, probable-cause). 2261 The mapping does not include a corresponding X.733 specific 2262 problem value. The recommendation is to use the 2263 'alarm-type-qualifier' leaf which serves the same purpose. 2265 The module uses an integer and a corresponding string for 2266 probable cause instead of a globally defined enumeration, in 2267 order to be able to manage conflicting enumeration definitions. 2268 A single globally defined enumeration is challenging to 2269 maintain."; 2270 reference 2271 "ITU Recommendation X.733: Information Technology 2272 - Open Systems Interconnection 2273 - System Management: Alarm Reporting Function"; 2275 revision 2018-10-09 { 2276 description 2277 "Initial revision."; 2278 reference "RFC XXXX: YANG Alarm Module"; 2279 } 2281 /* 2282 * Features 2283 */ 2285 feature configure-x733-mapping { 2286 description 2287 "The system supports configurable X733 mapping from 2288 the IETF alarm module alarm-type to X733 event-type 2289 and probable-cause."; 2290 } 2292 /* 2293 * Typedefs 2294 */ 2296 typedef event-type { 2297 type enumeration { 2298 enum other { 2299 value 1; 2300 description 2301 "None of the below."; 2302 } 2303 enum communications-alarm { 2304 value 2; 2305 description 2306 "An alarm of this type is principally associated with the 2307 procedures and/or processes required to convey 2308 information from one point to another."; 2310 } 2311 enum quality-of-service-alarm { 2312 value 3; 2313 description 2314 "An alarm of this type is principally associated with a 2315 degradation in the quality 
of a service."; 2316 } 2317 enum processing-error-alarm { 2318 value 4; 2319 description 2320 "An alarm of this type is principally associated with a 2321 software or processing fault."; 2322 } 2323 enum equipment-alarm { 2324 value 5; 2325 description 2326 "An alarm of this type is principally associated with an 2327 equipment fault."; 2328 } 2329 enum environmental-alarm { 2330 value 6; 2331 description 2332 "An alarm of this type is principally associated with a 2333 condition relating to an enclosure in which the equipment 2334 resides."; 2335 } 2336 enum integrity-violation { 2337 value 7; 2338 description 2339 "An indication that information may have been illegally 2340 modified, inserted or deleted."; 2341 } 2342 enum operational-violation { 2343 value 8; 2344 description 2345 "An indication that the provision of the requested service 2346 was not possible due to the unavailability, malfunction or 2347 incorrect invocation of the service."; 2348 } 2349 enum physical-violation { 2350 value 9; 2351 description 2352 "An indication that a physical resource has been violated 2353 in a way that suggests a security attack."; 2354 } 2355 enum security-service-or-mechanism-violation { 2356 value 10; 2357 description 2358 "An indication that a security attack has been detected by 2359 a security service or mechanism."; 2360 } 2361 enum time-domain-violation { 2362 value 11; 2363 description 2364 "An indication that an event has occurred at an unexpected 2365 or prohibited time."; 2366 } 2367 } 2368 description 2369 "The event types as defined by X.733 and X.736."; 2370 reference 2371 "ITU Recommendation X.733: Information Technology 2372 - Open Systems Interconnection 2373 - System Management: Alarm Reporting Function 2374 ITU Recommendation X.736: Information Technology 2375 - Open Systems Interconnection 2376 - System Management: Security Alarm Reporting Function"; 2377 } 2379 typedef trend { 2380 type enumeration { 2381 enum less-severe { 2382 description 2383 
"There is at least one outstanding alarm of a 2384 severity higher (more severe) than that in the 2385 current alarm."; 2386 } 2387 enum no-change { 2388 description 2389 "The Perceived severity reported in the current 2390 alarm is the same as the highest (most severe) 2391 of any of the outstanding alarms"; 2392 } 2393 enum more-severe { 2394 description 2395 "The Perceived severity in the current alarm is 2396 higher (more severe) than that reported in any 2397 of the outstanding alarms."; 2398 } 2399 } 2400 description 2401 "This type is used to describe the 2402 severity trend of the alarming resource"; 2403 reference "Module Attribute-ASN1Module (X.721:02/1992)"; 2404 } 2405 typedef value-type { 2406 type union { 2407 type int64; 2408 type uint64; 2409 type decimal64 { 2410 fraction-digits 2; 2411 } 2412 } 2413 description 2414 "A generic union type to match ITU choice of integer 2415 and real."; 2416 } 2418 /* 2419 * Groupings 2420 */ 2422 grouping x733-alarm-parameters { 2423 description 2424 "Common X.733 parameters for alarms."; 2425 leaf event-type { 2426 type event-type; 2427 description 2428 "The X.733/X.736 event type for this alarm."; 2429 } 2430 leaf probable-cause { 2431 type uint32; 2432 description 2433 "The X.733 probable cause for this alarm."; 2434 } 2435 leaf probable-cause-string { 2436 type string; 2437 description 2438 "The user friendly string matching 2439 the probable cause integer value. The string 2440 SHOULD match the X.733 enumeration. For example, 2441 value 27 is 'localNodeTransmissionError'."; 2442 } 2443 container threshold-information { 2444 description 2445 "This parameter shall be present when the alarm 2446 is a result of crossing a threshold. 
"; 2447 leaf triggered-threshold { 2448 type string; 2449 description 2450 "The identifier of the threshold attribute that 2451 caused the notification."; 2452 } 2453 leaf observed-value { 2454 type value-type; 2455 description 2456 "The value of the gauge or counter which crossed 2457 the threshold. This may be different from the 2458 threshold value if, for example, the gauge may 2459 only take on discrete values."; 2460 } 2461 choice threshold-level { 2462 description 2463 "In the case of a gauge the threshold level specifies 2464 a pair of threshold values, the first being the value 2465 of the crossed threshold and the second, its corresponding 2466 hysteresis; in the case of a counter the threshold level 2467 specifies only the threshold value."; 2468 case up { 2469 leaf up-high { 2470 type value-type; 2471 description 2472 "The going up threshold for rising the alarm."; 2473 } 2474 leaf up-low { 2475 type value-type; 2476 description 2477 "The threshold level for clearing the alarm. 2478 This is used for hysteresis functions for gauges."; 2479 } 2480 } 2481 case down { 2482 leaf down-low { 2483 type value-type; 2484 description 2485 "The going down threshold for rising the alarm."; 2486 } 2487 leaf down-high { 2488 type value-type; 2489 description 2490 "The threshold level for clearing the alarm. 2491 This is used for hysteresis functions for gauges."; 2492 } 2493 } 2494 } 2495 leaf arm-time { 2496 type yang:date-and-time; 2497 description 2498 "For a gauge threshold, the time at which the threshold 2499 was last re-armed, namely the time after the previous 2500 threshold crossing at which the hysteresis value of the 2501 threshold was exceeded thus again permitting generation 2502 of notifications when the threshold is crossed. 
2503 For a counter threshold, the later of the time at which 2504 the threshold offset was last applied, or the time at 2505 which the counter was last initialized (for resettable 2506 counters)."; 2507 } 2508 } 2509 list monitored-attributes { 2510 uses attribute; 2511 key "id"; 2512 description 2513 "The Monitored attributes parameter, when present, defines 2514 one or more attributes of the resource and their 2515 corresponding values at the time of the alarm."; 2516 } 2517 leaf-list proposed-repair-actions { 2518 type string; 2519 description 2520 "This parameter, when present, is used if the cause is 2521 known and the system being managed can suggest one or 2522 more solutions (such as switch in standby equipment, 2523 retry, replace media)."; 2524 } 2525 leaf trend-indication { 2526 type trend; 2527 description 2528 "This parameter specifies the current 2529 severity trend of the resource. If present it 2530 indicates that there are one or more alarms 2531 ('outstanding alarms') which have not been cleared, 2532 and pertain to the same resource as that to which 2533 this alarm ('current alarm') pertains. 2534 The possible values are: 2536 more-severe: The Perceived severity in the current 2537 alarm is higher (more severe) than that reported in 2538 any of the outstanding alarms. 2540 no-change: The Perceived severity reported in the 2541 current alarm is the same as the highest (most severe) 2542 of any of the outstanding alarms. 2544 less-severe: There is at least one outstanding alarm 2545 of a severity higher (more severe) than that in the 2546 current alarm."; 2547 } 2548 leaf backedup-status { 2549 type boolean; 2550 description 2551 "This parameter, when present, specifies whether or not 2552 the object emitting the alarm has been backed-up, and 2553 services provided to the user have, therefore, not been 2554 disrupted. 
The use of this field in conjunction with the 2555 severity field provides information in an independent form 2556 to qualify the seriousness of the alarm and the ability of 2557 the system as a whole to continue to provide services. 2558 If the value of this parameter is true, it indicates that 2559 the object emitting the alarm has been backed-up; if false, 2560 the object has not been backed-up."; 2561 } 2562 leaf backup-object { 2563 type al:resource; 2564 description 2565 "This parameter shall be present when the Backed-up status 2566 parameter is present and has the value true. This parameter 2567 specifies the managed object instance that is providing 2568 back-up services for the managed object about which the 2569 notification pertains. This parameter is useful, 2570 for example, when the back-up object is from a pool of 2571 objects any of which may be dynamically allocated to 2572 replace a faulty object."; 2573 } 2574 list additional-information { 2575 key "identifier"; 2576 description 2577 "This parameter allows the inclusion of a 2578 set of additional information in the alarm. 
It is 2579 a series of data structures each of which contains three 2580 items of information: an identifier, a significance 2581 indicator, and the problem information."; 2582 leaf identifier { 2583 type string; 2584 description 2585 "Identifies the data-type of the information parameter."; 2586 } 2587 leaf significant { 2588 type boolean; 2589 description 2590 "Set to true if the receiving system must be able to 2591 parse the contents of the information subparameter 2592 for the event report to be fully understood."; 2593 } 2594 leaf information { 2595 type string; 2596 description 2597 "Additional information about the alarm."; 2598 } 2599 } 2600 leaf security-alarm-detector { 2601 type al:resource; 2602 description 2603 "This parameter identifies the detector of the security 2604 alarm."; 2605 } 2606 leaf service-user { 2607 type al:resource; 2608 description 2609 "This parameter identifies the service-user whose request 2610 for service led to the generation of the security alarm."; 2611 } 2612 leaf service-provider { 2613 type al:resource; 2614 description 2615 "This parameter identifies the intended service-provider 2616 of the service that led to the generation of the security 2617 alarm."; 2618 } 2619 reference 2620 "ITU Recommendation X.733: Information Technology 2621 - Open Systems Interconnection 2622 - System Management: Alarm Reporting Function 2623 ITU Recommendation X.736: Information Technology 2624 - Open Systems Interconnection 2625 - System Management: Security Alarm Reporting Function"; 2626 } 2628 grouping x733-alarm-definition-parameters { 2629 description 2630 "Common X.733 parameters for alarm definitions. 
2631 This grouping is used to define those alarm 2632 attributes that can be mapped from the alarm-type 2633 mechanism in the ietf-alarm module."; 2634 leaf event-type { 2635 type event-type; 2636 description 2637 "The alarm type has this X.733/X.736 event type."; 2638 } 2639 leaf probable-cause { 2640 type uint32; 2641 description 2642 "The alarm type has this X.733 probable cause value. 2643 This module defines probable cause as an integer 2644 and not as an enumeration. The reason being that the 2645 primary use of probable cause is in the management 2646 application if it is based on the X.733 standard. 2647 However, most management applications have their own 2648 defined enum definitions and merging enums from 2649 different systems might create conflicts. By using 2650 a configurable uint32 the system can be configured 2651 to match the enum values in the management application."; 2652 } 2653 leaf probable-cause-string { 2654 type string; 2655 description 2656 "This string can be used to give a user friendly string 2657 to the probable cause value."; 2658 } 2659 } 2661 grouping attribute { 2662 description 2663 "A grouping to match the ITU generic reference to 2664 an attribute."; 2665 leaf id { 2666 type al:resource; 2667 description 2668 "The resource representing the attribute."; 2669 } 2670 leaf value { 2671 type string; 2672 description 2673 "The value represented as a string since it could 2674 be of any type."; 2675 } 2676 reference "Module Attribute-ASN1Module (X.721:02/1992)"; 2677 } 2679 /* 2680 * Add X.733 parameters to the alarm definitions, alarms, 2681 * and notification. 2682 */ 2684 augment "/al:alarms/al:alarm-inventory/al:alarm-type" { 2685 description 2686 "Augment X.733 mapping information to the alarm inventory."; 2687 uses x733-alarm-definition-parameters; 2688 } 2690 /* 2691 * Add X.733 configurable mapping. 2692 */ 2694 augment "/al:alarms/al:control" { 2695 description 2696 "Add X.733 mapping capabilities. 
"; 2697 list x733-mapping { 2698 if-feature "configure-x733-mapping"; 2699 key "alarm-type-id alarm-type-qualifier-match"; 2700 description 2701 "This list allows a management application to control the 2702 X.733 mapping for all alarm types in the system. Any entry 2703 in this list will allow the alarm manager to over-ride the 2704 default X.733 mapping in the system and the final mapping 2705 will be shown in the alarm inventory."; 2706 leaf alarm-type-id { 2707 type al:alarm-type-id; 2708 description 2709 "Map the alarm type with this alarm type identifier."; 2710 } 2711 leaf alarm-type-qualifier-match { 2712 type string; 2713 description 2714 "A W3C regular expression that is used when mapping an 2715 alarm type and alarm-type-qualifier to X.733 parameters."; 2716 } 2717 uses x733-alarm-definition-parameters; 2718 } 2719 } 2720 augment "/al:alarms/al:alarm-list/al:alarm" { 2721 description 2722 "Augment X.733 information to the alarm."; 2723 uses x733-alarm-parameters; 2724 } 2725 augment "/al:alarms/al:shelved-alarms/al:shelved-alarm" { 2726 description 2727 "Augment X.733 information to the alarm."; 2728 uses x733-alarm-parameters; 2729 } 2730 augment "/al:alarm-notification" { 2731 description 2732 "Augment X.733 information to the alarm notification."; 2733 uses x733-alarm-parameters; 2734 } 2735 } 2737 2739 8. IANA Considerations 2741 This document registers a URI in the IETF XML registry [RFC3688]. 2742 Following the format in RFC 3688, the following registration is 2743 requested to be made. 2745 URI: urn:ietf:params:xml:ns:yang:ietf-alarms 2747 Registrant Contact: The IESG. 2749 XML: N/A, the requested URI is an XML namespace. 2751 This document registers a YANG module in the YANG Module Names 2752 registry [RFC6020]. 2754 name: ietf-alarms 2755 namespace: urn:ietf:params:xml:ns:yang:ietf-alarms 2756 prefix: al 2757 reference: RFC XXXX 2759 9. 
Security Considerations 2761 The YANG module specified in this document defines a schema for data 2762 that is designed to be accessed via network management protocols such 2763 as NETCONF [RFC6241] or RESTCONF [RFC8040]. The lowest NETCONF layer 2764 is the secure transport layer, and the mandatory-to-implement secure 2765 transport is Secure Shell (SSH) [RFC6242]. The lowest RESTCONF layer 2766 is HTTPS, and the mandatory-to-implement secure transport is TLS 2767 [RFC5246]. 2769 The NETCONF access control model [RFC6536] provides the means to 2770 restrict access for particular NETCONF or RESTCONF users to a 2771 preconfigured subset of all available NETCONF or RESTCONF protocol 2772 operations and content. 2774 There are a number of data nodes defined in this YANG module that are 2775 writable/creatable/deletable (i.e., config true, which is the 2776 default). These data nodes may be considered sensitive or vulnerable 2777 in some network environments. Write operations (e.g., edit-config) 2778 to these data nodes without proper protection can have a negative 2779 effect on network operations. These are the subtrees and data nodes 2780 and their sensitivity/vulnerability: 2782 /alarms/control/notify-status-change: This leaf controls whether 2783 notifications are sent only when an alarm is raised and cleared, or for all severity level 2784 changes. Unauthorized access to this leaf could have a negative impact 2785 on operational procedures relying on fine-grained alarm state 2786 change reporting. 2788 /alarms/control/alarm-shelving/shelf: This list controls the 2789 shelving (blocking) of alarms. Unauthorized access to this list 2790 could jeopardize the alarm management procedures since shelved 2791 alarms generate no notifications and are not part of the alarm list. 2793 Some of the RPC operations in this YANG module may be considered 2794 sensitive or vulnerable in some network environments. It is thus 2795 important to control access to these operations.
These are the 2796 operations and their sensitivity/vulnerability: 2798 purge-alarms: This RPC deletes alarms from the alarm list. 2799 Unauthorized use of this RPC could jeopardize the alarm management 2800 procedures since the deleted alarms may be vital for the alarm 2801 management application. 2803 10. Acknowledgements 2805 The authors wish to thank Viktor Leijon and Johan Nordlander for 2806 their valuable input on forming the alarm model. 2808 The authors also wish to thank Nick Hancock, Joey Boyd, Tom Petch and 2809 Balazs Lengyel for their extensive reviews and contributions to this 2810 document. 2812 11. References 2814 11.1. Normative References 2816 [M.3100] International Telecommunication Union, "Generic Network 2817 Information Model", ITU-T Recommendation M.3100, 2005. 2819 [M.3160] International Telecommunication Union, "Generic, 2820 protocol-neutral management information model", 2821 ITU-T Recommendation M.3160, 2008. 2823 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 2824 Requirement Levels", BCP 14, RFC 2119, 2825 DOI 10.17487/RFC2119, March 1997. 2828 [RFC3688] Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688, 2829 DOI 10.17487/RFC3688, January 2004. 2832 [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security 2833 (TLS) Protocol Version 1.2", RFC 5246, 2834 DOI 10.17487/RFC5246, August 2008. 2837 [RFC6020] Bjorklund, M., Ed., "YANG - A Data Modeling Language for 2838 the Network Configuration Protocol (NETCONF)", RFC 6020, 2839 DOI 10.17487/RFC6020, October 2010. 2842 [RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., 2843 and A. Bierman, Ed., "Network Configuration Protocol 2844 (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011. 2847 [RFC6242] Wasserman, M., "Using the NETCONF Protocol over Secure 2848 Shell (SSH)", RFC 6242, DOI 10.17487/RFC6242, June 2011. 2851 [RFC6536] Bierman, A. and M.
Bjorklund, "Network Configuration 2852 Protocol (NETCONF) Access Control Model", RFC 6536, 2853 DOI 10.17487/RFC6536, March 2012. 2856 [RFC6991] Schoenwaelder, J., Ed., "Common YANG Data Types", 2857 RFC 6991, DOI 10.17487/RFC6991, July 2013. 2860 [RFC7950] Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language", 2861 RFC 7950, DOI 10.17487/RFC7950, August 2016. 2864 [RFC8040] Bierman, A., Bjorklund, M., and K. Watsen, "RESTCONF 2865 Protocol", RFC 8040, DOI 10.17487/RFC8040, January 2017. 2868 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2869 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2870 May 2017. 2872 [X.733] International Telecommunication Union, "Information 2873 Technology - Open Systems Interconnection - Systems 2874 Management: Alarm Reporting Function", 2875 ITU-T Recommendation X.733, 1992. 2877 11.2. Informative References 2879 [ALARMIRP] 2880 3GPP, "Telecommunication management; Fault Management; 2881 Part 2: Alarm Integration Reference Point (IRP): 2882 Information Service (IS)", 3GPP TS 32.111-2 3.4.0, March 2883 2005. 2885 [ALARMSEM] 2886 Wallin, S., Leijon, V., Nordlander, J., and N. Bystedt, 2887 "The semantics of alarm definitions: enabling systematic 2888 reasoning about alarms", International Journal of Network 2889 Management, Volume 22, Issue 3, John Wiley and Sons, Ltd, 2890 http://dx.doi.org/10.1002/nem.800, March 2012. 2892 [EEMUA] Engineering Equipment and Materials Users Association, 2893 "Alarm Systems: A Guide to Design, Management and 2894 Procurement", EEMUA Publication No. 191, 2nd edition, London, 2895 2007. 2897 [G.7710] ITU-T, "SERIES G: TRANSMISSION SYSTEMS AND MEDIA, DIGITAL 2898 SYSTEMS AND NETWORKS Data over Transport - Generic aspects 2899 - Transport network control aspects. Common equipment 2900 management function requirements", 2012.

   [ISA182]   International Society of Automation (ISA), "ANSI/ISA-
              18.2-2009: Management of Alarm Systems for the Process
              Industries", 2009.

   [RFC3877]  Chisholm, S. and D. Romascanu, "Alarm Management
              Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877,
              September 2004.

   [RFC4268]  Chisholm, S. and D. Perkins, "Entity State MIB", RFC 4268,
              DOI 10.17487/RFC4268, November 2005.

   [RFC8340]  Bjorklund, M. and L. Berger, Ed., "YANG Tree Diagrams",
              BCP 215, RFC 8340, DOI 10.17487/RFC8340, March 2018.

   [RFC8348]  Bierman, A., Bjorklund, M., Dong, J., and D. Romascanu, "A
              YANG Data Model for Hardware Management", RFC 8348,
              DOI 10.17487/RFC8348, March 2018.

   [X.736]    International Telecommunication Union, "Information
              Technology - Open Systems Interconnection - Systems
              Management: Security alarm reporting function",
              ITU-T Recommendation X.736, 1992.

Appendix A.  Vendor-specific Alarm-Types Example

   This example shows how to define alarm-types in a vendor-specific
   module.  In this case the vendor "xyz" has chosen to define top-level
   identities according to X.733 event types.

     module example-xyz-alarms {
       namespace "urn:example:xyz-alarms";
       prefix xyz-al;

       import ietf-alarms {
         prefix al;
       }

       // vendor root identity, derived from the standard base
       identity xyz-alarms {
         base al:alarm-type-id;
       }

       // top-level identities, one per X.733 event type
       identity communications-alarm {
         base xyz-alarms;
       }
       identity quality-of-service-alarm {
         base xyz-alarms;
       }
       identity processing-error-alarm {
         base xyz-alarms;
       }
       identity equipment-alarm {
         base xyz-alarms;
       }
       identity environmental-alarm {
         base xyz-alarms;
       }

       // communications alarms
       identity link-alarm {
         base communications-alarm;
       }

       // QoS alarms
       identity high-jitter-alarm {
         base quality-of-service-alarm;
       }
     }

Appendix B.  Alarm Inventory Example

   This example shows an alarm inventory with two alarm types: one
   defined only by its identifier, and another that is dynamically
   configured.  In the latter case, a digital input has been connected
   to a smoke detector; therefore, the 'alarm-type-qualifier' is set to
   "smoke-alarm" and the 'alarm-type-identity' to "environmental-alarm".

         xyz-al:link-alarm
         /dev:interfaces/dev:interface
         true
         Link failure, operational state down but admin state up

         xyz-al:environmental-alarm
         smoke-alarm
         true
         Connected smoke detector to digital input

Appendix C.  Alarm List Example

   In this example we show an alarm that has toggled [major, clear,
   major].  An operator has acknowledged the alarm.

         1
         2015-04-08T08:39:50.00Z

         /dev:interfaces/dev:interface[name='FastEthernet1/0']
         xyz-al:link-alarm

         2015-04-08T08:39:50.00Z
         false
         1.3.6.1.2.1.2.2.1.1.17
         2015-04-08T08:39:40.00Z
         major
         Link operationally down but administratively up

         major
         Link operationally down but administratively up

         cleared
         Link operationally up and administratively up

         major
         Link operationally down but administratively up

         ack
         joe
         Will investigate, ticket TR764999

Appendix D.  Alarm Shelving Example

   This example shows how to shelve alarms.  We shelve alarms related to
   the smoke detectors since they are being installed and tested.  We
   also shelve all alarms from FastEthernet1/0.

         FE10
         /dev:interfaces/dev:interface[name='FastEthernet1/0']

         detectortest
         xyz-al:environmental-alarm
         smoke-alarm

Appendix E.  X.733 Mapping Example

   This example shows how to map a dynamic alarm type (alarm-type-
   identity=environmental-alarm, alarm-type-qualifier=smoke-alarm) to
   the corresponding X.733 event-type and probable cause parameters.

         xyz-al:environmental-alarm
         smoke-alarm
         quality-of-service-alarm
         777

Appendix F.  Relationships to other standards

   This section briefly describes how this alarm module relates to other
   relevant standards.

F.1.  Relationship to RFC 8348

   RFC 8348 [RFC8348] defines a YANG data model for the management of
   hardware.  The "alarm-state" in RFC 8348 (and EntityAlarmStatus in
   RFC 4268 [RFC4268]) is a summary of the alarm severity levels that
   may be active on the specific hardware component.  It does not say
   anything about how alarms are reported, and it does not provide any
   details of the alarms.

   The mapping between the alarm YANG data model and the alarm-state in
   RFC 8348 is outlined below:

   resource:  corresponds to /hardware/component/

   is-cleared:  no bit set in /hardware/component/state/alarm-state

   perceived-severity:  corresponding bit set in
      /hardware/component/state/alarm-state

   operator-state-change/state:  if the alarm is acknowledged by the
      operator it may correspond to under-repair

F.2.  Relationship to other alarm standards

F.2.1.  Alarm definition

   The table below summarizes relevant definitions of the term "alarm"
   in other alarm standards.

   +------------+---------------------------+--------------------------+
   | Standard   | Definition                | Comment                  |
   +------------+---------------------------+--------------------------+
   | X.733      | error: A deviation of a   | The X.733 alarm          |
   | [X.733]    | system from normal        | definition is focused on |
   |            | operation. fault: The     | the notification as such |
   |            | physical or algorithmic   | and not the state. It    |
   |            | cause of a malfunction.   | also uses the basic      |
   |            | Faults manifest           | criteria of deviation    |
   |            | themselves as errors.     | from normal condition.   |
   |            | alarm: A notification, of | There is no requirement  |
   |            | the form defined by this  | for an operator action   |
   |            | function, of a specific   | to be required.          |
   |            | event. An alarm may or    |                          |
   |            | may not represent an      |                          |
   |            | error.                    |                          |
   |            |                           |                          |
   | G.7710     | Alarms are indications    | The G.7710 definition is |
   | [G.7710]   | that are automatically    | close to the original    |
   |            | generated by an NE as a   | X.733 definition.        |
   |            | result of the declaration |                          |
   |            | of a failure.             |                          |
   |            |                           |                          |
   | Alarm MIB  | Alarm: Persistent         | RFC 3877 defines alarm   |
   | [RFC3877]  | indication of a fault.    | referring back to "a     |
   |            | Fault: Lasting error or   | deviation from normal    |
   |            | warning condition.        | operation". This is      |
   |            | Error: A deviation of a   | problematic, since this  |
   |            | system from normal        | might not require an     |
   |            | operation.                | operator action. The     |
   |            |                           | alarm MIB is state       |
   |            |                           | oriented rather than     |
   |            |                           | notification oriented;   |
   |            |                           | an alarm is a "lasting   |
   |            |                           | condition", not a        |
   |            |                           | discrete notification    |
   |            |                           | reporting about a        |
   |            |                           | condition state change.  |
   |            |                           |                          |
   | ISA        | Alarm: An audible and/or  | The ISA standard adds an |
   | [ISA182]   | visible means of          | important requirement to |
   |            | indicating to the         | the "deviation from      |
   |            | operator an equipment     | normal condition state": |
   |            | malfunction, process      | requiring a response.    |
   |            | deviation or abnormal     |                          |
   |            | condition requiring a     |                          |
   |            | response.                 |                          |
   |            |                           |                          |
   | EEMUA      | An alarm is an event to   | This is the foundation   |
   | [EEMUA]    | which an operator must    | for the definition of    |
   |            | knowingly react, respond, | alarm in this document.  |
   |            | and acknowledge - not     | It focuses on the core   |
   |            | simply acknowledge and    | criteria that an action  |
   |            | ignore.                   | is really needed.        |
   |            |                           |                          |
   | 3GPP Alarm | 3GPP v15: An alarm        | The latest 3GPP Alarm    |
   | IRP        | signifies an undesired    | IRP version uses         |
   | [ALARMIRP] | condition of a resource   | literally the same alarm |
   |            | (e.g. network element,    | definition as this alarm |
   |            | link) for which an        | module. It is worth      |
   |            | operator action is        | noting that earlier      |
   |            | required. It emphasizes a | versions used a          |
   |            | key requirement that      | definition not requiring |
   |            | operators [...] should    | an operator action and   |
   |            | not be informed about an  | the broader              |
   |            | undesired condition       | definition of deviation  |
   |            | unless it requires        | from normal condition.   |
   |            | operator action. 3GPP     | The earlier version also |
   |            | v12: alarm: abnormal      | defined an alarm as a    |
   |            | network entity condition, | special case of "event". |
   |            | which categorizes an      |                          |
   |            | event as a fault. fault:  |                          |
   |            | a deviation of a system   |                          |
   |            | from normal operation,    |                          |
   |            | which may result in the   |                          |
   |            | loss of operational       |                          |
   |            | capabilities [...]        |                          |
   +------------+---------------------------+--------------------------+

                Table 1: Definition of alarm in standards

   The definition of alarm has evolved from a focus on events reporting
   a deviation from normal operation towards an undesired *state* which
   *requires an operator action*.

F.2.2.  Data model

   This section describes how this YANG alarm module relates to other
   standard data models.
   Note that we cover other data models for alarm interfaces, not other
   standards such as SDO-specific alarms.

F.2.2.1.  X.733

   X.733 has acted as a base for several alarm data models over the
   years.  The YANG alarm module differs in the following ways:

      X.733 models the alarm list as a list of notifications.  The YANG
      alarm module defines the alarm list as the current alarm states
      for the resources, which is generated from the state change
      reporting notifications.

      In X.733 an alarm can have the severity level clear.  In the YANG
      alarm module "clear" is not a severity level; it is a separate
      state of the alarm.  An alarm can, for example, have the states
      (major, cleared) or (minor, not cleared).

      X.733 uses a flat globally defined enumerated "probable cause" to
      identify alarm types.  This alarm module uses a hierarchical YANG
      identity, alarm-type.  This enables delegation of alarm types
      within organizations.  It also lets management reason about
      "abstract" alarm-types corresponding to base identities, see
      Section 3.2.

      The YANG alarm module does not include the majority of the X.733
      alarm attributes.  Rather, these are defined in an augmenting
      module if "strict" X.733 compliance is needed.

F.2.2.2.  RFC 3877, the Alarm MIB

   The MIB in RFC 3877 takes a different approach: rather than defining
   a concrete data model for alarms, it defines a model to map existing
   SNMP managed objects and notifications into alarm states and alarm
   notifications.  This was necessary since MIBs were already defined
   with both managed objects and notifications indicating alarms, for
   example linkUp and linkDown notifications in combination with
   ifAdminStatus and ifOperStatus.  So RFC 3877 cannot really be
   compared to the alarm YANG module in that sense.

   The Alarm MIB maps existing MIB definitions into alarms in the
   alarmModelTable.  The upside is that an SNMP manager can read the
   possible alarm types at runtime.  This corresponds to the alarm
   inventory in the alarm YANG module.

F.2.2.3.  3GPP Alarm IRP

   The 3GPP Alarm IRP is an evolution of X.733.  The main differences
   between the alarm YANG module and 3GPP are:

      3GPP keeps the majority of the X.733 attributes; the alarm YANG
      module does not.

      3GPP introduced overlapping and possibly conflicting keys for
      alarms, alarmId and (managed object, event type, probable cause,
      specific problem).  (See Annex C in [X.733], Example 3.)  In the
      YANG alarm module the key for identifying an alarm instance is
      clearly defined by (resource, alarm-type, alarm-type-qualifier).
      See also Section 3.4 for more information.

      The alarm YANG module clearly separates the resource/
      instrumentation life cycle from the operator life cycle.  3GPP
      allows operators to set the alarm severity to clear.  This is not
      allowed by this module; rather, an operator closes an alarm, which
      does not affect the severity.

F.2.2.4.  G.7710

   G.7710 is different from the previously referenced alarm standards.
   It does not define a data model for alarm reporting.  It defines
   common equipment management function requirements including alarm
   instrumentation.  The scope is transport networks.

   The requirements in G.7710 correspond to features in the alarm YANG
   module in the following way:

   Alarm Severity Assignment Profile (ASAP):  the alarm profile
      "/alarms/alarm-profile/".

   Alarm Reporting Control (ARC):  alarm shelving
      "/alarms/control/alarm-shelving/" and the ability to control
      alarm notifications "/alarms/control/notify-status-changes".

Appendix G.  Alarm Usability Requirements

   This section defines usability requirements for alarms.  Alarm
   usability is important for an alarm interface.
   A data model helps define the format, but if the actual alarms are of
   low value, the goal of alarm management has not been achieved.

   Common alarm problems and their causes are summarized in Table 2.
   This summary is adapted to networking based on the ISA [ISA182] and
   EEMUA [EEMUA] standards.

   +------------------+--------------------------------+---------------+
   | Problem          | Cause                          | How this      |
   |                  |                                | module        |
   |                  |                                | addresses     |
   |                  |                                | the cause     |
   +------------------+--------------------------------+---------------+
   | Alarms are       | "Nuisance" alarms (chattering  | Strict        |
   | generated but    | alarms and fleeting alarms),   | definition of |
   | they are ignored | faulty hardware, redundant     | alarms        |
   | by the operator. | alarms, cascading alarms,      | requiring     |
   |                  | incorrect alarm settings,      | corrective    |
   |                  | alarms have not been           | response.     |
   |                  | rationalized, the alarms       | Alarm         |
   |                  | represent log information      | requirements  |
   |                  | rather than true alarms.       | in Table 3.   |
   |                  |                                |               |
   | When alarms      | Insufficient alarm response    | The alarm     |
   | occur, operators | procedures and not well        | inventory     |
   | do not know how  | defined alarm types.           | lists all     |
   | to respond.      |                                | alarm types   |
   |                  |                                | and           |
   |                  |                                | corrective    |
   |                  |                                | actions.      |
   |                  |                                | Alarm         |
   |                  |                                | requirements  |
   |                  |                                | in Table 3.   |
   |                  |                                |               |
   | The alarm        | Nuisance alarms, stale alarms, | The alarm     |
   | display is full  | alarms from equipment not in   | definition    |
   | of alarms, even  | service.                       | and alarm     |
   | when there is    |                                | shelving.     |
   | nothing wrong.   |                                |               |
   |                  |                                |               |
   | During a         | Incorrect prioritization of    | State-based   |
   | failure,         | alarms. Not using advanced     | alarm model,  |
   | operators are    | alarm techniques (e.g. state-  | alarm rate    |
   | flooded with so  | based alarming).               | requirements  |
   | many alarms that |                                | in Table 4    |
   | they do not know |                                | and Table 5.  |
   | which ones are   |                                |               |
   | the most         |                                |               |
   | important.       |                                |               |
   +------------------+--------------------------------+---------------+

                    Table 2: Alarm Problems and Causes

   Based upon the above problems, EEMUA gives the following definition
   of a good alarm:

   +----------------+--------------------------------------------------+
   | Characteristic | Explanation                                      |
   +----------------+--------------------------------------------------+
   | Relevant       | Not spurious or of low operational value.        |
   |                |                                                  |
   | Unique         | Not duplicating another alarm.                   |
   |                |                                                  |
   | Timely         | Not long before any response is needed or too    |
   |                | late to do anything.                             |
   |                |                                                  |
   | Prioritized    | Indicating the importance that the operator      |
   |                | deals with the problem.                          |
   |                |                                                  |
   | Understandable | Having a message which is clear and easy to      |
   |                | understand.                                      |
   |                |                                                  |
   | Diagnostic     | Identifying the problem that has occurred.       |
   |                |                                                  |
   | Advisory       | Indicative of the action to be taken.            |
   |                |                                                  |
   | Focusing       | Drawing attention to the most important issues.  |
   +----------------+--------------------------------------------------+

                   Table 3: Definition of a Good Alarm

   Vendors SHOULD rationalize all alarms according to the
   characteristics above.  Another crucial requirement is acceptable
   alarm notification rates.  Vendors SHOULD make sure that they do not
   exceed the recommendations from EEMUA below:

   +-----------------------------------+-------------------------------+
   | Long Term Alarm Rate in Steady    | Acceptability                 |
   | Operation                         |                               |
   +-----------------------------------+-------------------------------+
   | More than one per minute          | Very likely to be             |
   |                                   | unacceptable.                 |
   |                                   |                               |
   | One per 2 minutes                 | Likely to be over-demanding.  |
   |                                   |                               |
   | One per 5 minutes                 | Manageable.                   |
   |                                   |                               |
   | Less than one per 10 minutes      | Very likely to be acceptable. |
   +-----------------------------------+-------------------------------+

              Table 4: Acceptable Alarm Rates, Steady State

   +----------------------------+--------------------------------------+
   | Number of alarms displayed | Acceptability                        |
   | in 10 minutes following a  |                                      |
   | major network problem      |                                      |
   +----------------------------+--------------------------------------+
   | More than 100              | Definitely excessive and very likely |
   |                            | to lead the operator to abandon      |
   |                            | the use of the alarm system.         |
   |                            |                                      |
   | 20-100                     | Hard to cope with.                   |
   |                            |                                      |
   | Under 10                   | Should be manageable - but may be    |
   |                            | difficult if several of the alarms   |
   |                            | require a complex operator response. |
   +----------------------------+--------------------------------------+

                  Table 5: Acceptable Alarm Rates, Burst

   The numbers in Table 4 and Table 5 are the sum of all alarms for a
   network being managed from one alarm console.  So every individual
   system or NMS contributes to these numbers.

   Vendors SHOULD make sure that the following rules are used in
   designing the alarm interface:

   1.  Rationalize the alarms in the system to ensure that every alarm
       is necessary, has a purpose, follows the cardinal rule that it
       requires an operator response, and adheres to the rules of
       Table 3.

   2.  Audit the quality of the alarms.  Talk with the operators about
       how well the alarm information supports them.  Do they know what
       to do in the event of an alarm?  Are they able to quickly
       diagnose the problem and determine the corrective action?  Does
       the alarm text adhere to the requirements in Table 3?

   3.  Analyze and benchmark the performance of the system and compare
       it to the recommended metrics in Table 4 and Table 5.  Start by
       identifying nuisance alarms, standing alarms at normal state and
       startup.

Authors' Addresses

   Stefan Vallin
   Stefan Vallin AB

   Email: stefan@wallan.se


   Martin Bjorklund
   Cisco

   Email: mbj@tail-f.com
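
   Illustrative note on Appendix G: the benchmarking step in rule 3 can
   be partly automated by comparing measured alarm counts against
   Table 4 and Table 5.  The Python sketch below is not part of the
   alarm module; the function names are hypothetical, and the band
   boundaries between the anchor rates of Table 4 are an assumption,
   since the table lists individual rates rather than ranges.  Table 5
   does not cover 10 to 19 alarms, so the sketch reports that gap
   instead of guessing.

```python
def classify_steady_rate(alarm_count: int, window_minutes: float) -> str:
    """Map a long-term average alarm rate onto the wording of Table 4.

    Assumption: each anchor rate in Table 4 is treated as the lower
    bound of its band (for example, 0.5 alarms/minute and above, up to
    and including 1 per minute, counts as "over-demanding").
    """
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    per_minute = alarm_count / window_minutes
    if per_minute > 1.0:       # more than one per minute
        return "very likely to be unacceptable"
    if per_minute >= 0.5:      # around one per 2 minutes
        return "likely to be over-demanding"
    if per_minute >= 0.1:      # around one per 5 minutes
        return "manageable"
    return "very likely to be acceptable"  # less than one per 10 minutes


def classify_burst(alarms_in_10_minutes: int) -> str:
    """Map the alarm count in the 10 minutes after a major problem onto Table 5."""
    if alarms_in_10_minutes > 100:
        return "definitely excessive"
    if alarms_in_10_minutes >= 20:
        return "hard to cope with"
    if alarms_in_10_minutes < 10:
        return "should be manageable"
    return "not covered by Table 5"  # 10-19 alarms: the table has no row


if __name__ == "__main__":
    # 30 alarms over one hour is one alarm per 2 minutes on average.
    print(classify_steady_rate(30, 60))
    # 150 alarms in the 10 minutes following a major network problem.
    print(classify_burst(150))
```

   As stated in Appendix G, the counts fed into such a check are the
   sum of all alarms reaching one alarm console, not the contribution
   of a single system.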