idnits 2.17.1

draft-ietf-ccamp-alarm-module-01.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  == Line 412 has weird spacing: '...perator str...'

  == Line 417 has weird spacing: '...w state ope...'

  == Line 558 has weird spacing: '...alifier ala...'

  == Line 606 has weird spacing: '...alifier lea...'

  == Line 616 has weird spacing: '...everity sev...'

  == (2 more instances...)

  -- The document date (February 8, 2018) is 2268 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Missing Reference: 'RFC6536' is mentioned on line 2327, but not defined

  ** Obsolete undefined reference: RFC 6536 (Obsoleted by RFC 8341)

  ** Obsolete normative reference: RFC 5246 (Obsoleted by RFC 8446)

  == Outdated reference: A later version (-06) exists of
     draft-ietf-netmod-yang-tree-diagrams-05

     Summary: 2 errors (**), 0 flaws (~~), 9 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

2    Network Working Group                                          S. Vallin
3    Internet-Draft                                          Stefan Vallin AB
4    Intended status: Standards Track                            M. Bjorklund
5    Expires: August 12, 2018                                           Cisco
6                                                            February 8, 2018

8                               YANG Alarm Module
9                        draft-ietf-ccamp-alarm-module-01

11   Abstract

13      This document defines a YANG module for alarm management. It
14      includes functions for alarm list management, alarm shelving and
15      notifications to inform management systems. There are also RPCs to
16      manage the operator state of an alarm and administrative alarm
17      procedures. The module carefully maps to relevant alarm standards.

19   Status of This Memo

21      This Internet-Draft is submitted in full conformance with the
22      provisions of BCP 78 and BCP 79.

24      Internet-Drafts are working documents of the Internet Engineering
25      Task Force (IETF). Note that other groups may also distribute
26      working documents as Internet-Drafts. The list of current Internet-
27      Drafts is at http://datatracker.ietf.org/drafts/current/.

29      Internet-Drafts are draft documents valid for a maximum of six months
30      and may be updated, replaced, or obsoleted by other documents at any
31      time. It is inappropriate to use Internet-Drafts as reference
32      material or to cite them other than as "work in progress."

34      This Internet-Draft will expire on August 12, 2018.

36   Copyright Notice

38      Copyright (c) 2018 IETF Trust and the persons identified as the
39      document authors. All rights reserved.

41      This document is subject to BCP 78 and the IETF Trust's Legal
42      Provisions Relating to IETF Documents
43      (http://trustee.ietf.org/license-info) in effect on the date of
44      publication of this document. Please review these documents
45      carefully, as they describe your rights and restrictions with respect
46      to this document. Code Components extracted from this document must
47      include Simplified BSD License text as described in Section 4.e of
48      the Trust Legal Provisions and are provided without warranty as
49      described in the Simplified BSD License.

51   Table of Contents

53      1.  Introduction
54        1.1.  Terminology and Notation
55      2.  Objectives
56      3.  Alarm Module Concepts
57        3.1.  Alarm Definition
58        3.2.  Alarm Type
59        3.3.  Identifying Resource
60        3.4.  Identifying Alarm Instances
61        3.5.  Alarm Life-Cycle
62          3.5.1.  Resource Alarm Life-Cycle
63          3.5.2.  Operator Alarm Life-cycle
64          3.5.3.  Administrative Alarm Life-Cycle
65        3.6.  Root Cause and Impacted Resources
66        3.7.  Alarm Shelving
67      4.  Alarm Data Model
68        4.1.  Alarm Control
69          4.1.1.  Alarm Shelving
70        4.2.  Alarm Inventory
71        4.3.  Alarm Summary
72        4.4.  The Alarm List
73        4.5.  The Shelved Alarms List
74        4.6.  RPCs and Actions
75        4.7.  Notifications
76      5.  Alarm YANG Module
77      6.  X.733 Alarm Mapping Data Model
78      7.  X.733 Alarm Mapping YANG Module
79      8.  IANA Considerations
80      9.  Security Considerations
81      10. Acknowledgements
82      11. References
83        11.1.  Normative References
84        11.2.  Informative References
85      Appendix A.  Vendor-specific Alarm-Types Example
86      Appendix B.  Alarm Inventory Example
87      Appendix C.  Alarm List Example
88      Appendix D.  Alarm Shelving Example
89      Appendix E.  X.733 Mapping Example
90      Appendix F.  Background and Usability Requirements
91        F.1.  Alarm Concepts
92          F.1.1.  Alarm type
93        F.2.  Usability Requirements
94      Authors' Addresses

96   1.  Introduction

98      This document defines a YANG [RFC7950] module for alarm management.
99      The purpose is to define a standardised alarm interface for network
100     devices that can be easily integrated into management applications.
101     The model is also applicable as a northbound alarm interface in the
102     management applications.

104     Alarm monitoring is a fundamental part of monitoring the network.
105     Raw alarms from devices do not always tell the status of the network
106     services or necessarily point to the root cause. However, being able
107     to feed alarms to the alarm management application in a standardised
108     format is a starting point for performing higher level network
109     assurance tasks.

111     This document defines a standardised YANG module for alarm
112     management. The design of the module is based on experience from
113     using and implementing available alarm standards from ITU [X.733],
114     3GPP [ALARMIRP] and ANSI [ISA182].

116  1.1.  Terminology and Notation

118     The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
119     "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
120     "OPTIONAL" in this document are to be interpreted as described in BCP
121     14 [RFC2119] [RFC8174] when, and only when, they appear in all
122     capitals, as shown here.

124     The following terms are defined in [RFC7950]:

126     o  action

128     o  client

130     o  data tree

132     o  RPC

134     o  server

136     The following terms are used within this document:

138     o  Alarm (the general concept): An alarm signifies an undesirable
139        state in a resource that requires corrective action.

141     o  Alarm Instance: The alarm state for a specific resource and alarm
142        type. For example (GigabitEthernet0/15, link-alarm). An entry in
143        the alarm list.

145     o  Alarm Inventory: A list of all possible alarm types on a system.

147     o  Alarm Shelving: Blocking alarms according to specific criteria.

149     o  Alarm Type: An alarm type identifies a possible unique alarm state
150        for a resource. Alarm types are names to identify the state like
151        "link-alarm", "jitter-violation", "high-disk-utilization".

153     o  Management System: The alarm management application that consumes
154        the alarms, i.e., acts as a client.

156     o  Resource: A fine-grained identification of the alarming resource,
157        for example: an interface, a process.

159     o  System: The system that implements this YANG alarm module, i.e.,
160        acts as a server. This corresponds to a network device or a
161        management application that provides a north-bound alarm
162        interface.

164     Tree diagrams used in this document follow the notation defined in
165     [I-D.ietf-netmod-yang-tree-diagrams].

167  2.  Objectives

169     The objectives for the design of the Alarm Module are:

171     o  Simple to use. If a system supports this module, it shall be
172        straight-forward to integrate this into a YANG based alarm
173        manager.

175     o  View alarms as states on resources and not as discrete
176        notifications.

178     o  Clear definition of "alarm" in order to exclude general events
179        that should not be forwarded as alarm notifications.

181     o  Clear and precise identification of alarm types and alarm
182        instances.

184     o  A management system should be able to pull all available alarm
185        types from a system, i.e., read the alarm inventory from a system.
186        This makes it possible to prepare alarm operators with
187        corresponding alarm instructions.

189     o  Address alarm usability requirements, see Appendix F. While IETF
190        has not really addressed alarm management, telecom standards have
191        addressed it purely from a protocol perspective. The process
192        industry has published several relevant standards addressing
193        requirements for a useful alarm interface; [EEMUA], [ISA182].
194        This alarm module defines usability requirements as well as a YANG
195        data model.

197     o  Mapping to X.733, which is a requirement for some alarm systems.
198        Still, keep some of the X.733 concepts out of the core model in
199        order to make the model small and easy to understand.

201  3.  Alarm Module Concepts

203     This section defines the fundamental concepts behind the data model.
204     This section is rooted in the works of Vallin et al. [ALARMSEM].

206  3.1.  Alarm Definition

208     An alarm signifies an undesirable state in a resource that requires
209     corrective action.

211     There are two main things to remember from this definition:

213     1.  the definition focuses on leaving out events and logging
214         information in general. Alarms should only be used for undesired
215         states that require action.

217     2.  the definition also focuses on alarms as a state on a resource,
218         not the notifications that report the state changes.

220     See Appendix F for more motivation and consequences around this
221     definition.

223  3.2.  Alarm Type

225     This document defines an alarm type with an alarm type id and an
226     alarm type qualifier.

228     The alarm type id is modeled as a YANG identity. With YANG
229     identities, new alarm types can be defined in a distributed fashion.
230     YANG identities are hierarchical, which means that a hierarchy of
231     alarm types can be defined.

233     Standards and vendors should define their own alarm type identities
234     based on this definition.

236     The use of YANG identities means that all possible alarms are
237     identified at design time. This explicit declaration of alarm types
238     makes it easier to allow for alarm qualification reviews and
239     preparation of alarm actions and documentation.

241     There are occasions where the alarm types are not known at design
242     time. For example, a system with digital inputs may allow users to
243     connect detectors (e.g., smoke detectors) to the inputs. In this
244     case it is a configuration action that says that certain connectors
245     are fire alarms for example. The drawback is that alarm operators
246     risk receiving alarm types as a surprise and may not know how to
247     resolve the problem, since a defined alarm procedure does not
248     necessarily exist.

250     In order to allow for dynamic addition of alarm types the alarm
251     module also allows for further qualification of the identity based
252     alarm type using a string.

254     A vendor or standard can then define their own alarm-type hierarchy.
255     The example below shows a hierarchy based on X.733 event types:

257       import ietf-alarms {
258         prefix al;
259       }
260       identity vendor-alarms {
261         base al:alarm-type;
262       }
263       identity communications-alarm {
264         base vendor-alarms;
265       }
266       identity link-alarm {
267         base communications-alarm;
268       }

270     Alarm types can be abstract. An abstract alarm type is used as a
271     base for defining hierarchical alarm types. Concrete alarm types are
272     used for alarm states and appear in the alarm inventory. There are
273     two kinds of concrete alarm types:

275     1.  The last subordinate identity in the "alarm-type-id" hierarchy is
276         concrete, for example: "alarm-identity.environmental-
277         alarm.smoke". In this example "alarm-identity" and
278         "environmental-alarm" are abstract YANG identities, whereas
279         "smoke" is a concrete YANG identity.

281     2.  The YANG identity hierarchy is abstract and the concrete alarm
282         type is defined by the dynamic alarm qualifier string, for
283         example: "alarm-identity.environmental-alarm.external-detector"
284         with alarm-type-qualifier "smoke".

286     For example:

288       // Alternative 1: concrete alarm type identity
289       import ietf-alarms {
290         prefix al;
291       }
292       identity environmental-alarm {
293         base al:alarm-type;
294         description "Abstract alarm type";
295       }
296       identity smoke {
297         base environmental-alarm;
298         description "Concrete alarm type";
299       }

301       // Alternative 2: concrete alarm type qualifier
302       import ietf-alarms {
303         prefix al;
304       }
305       identity environmental-alarm {
306         base al:alarm-type;
307         description "Abstract alarm type";
308       }
309       identity external-detector {
310         base environmental-alarm;
311         description
312           "Abstract alarm type, a run-time configuration
313            procedure sets the type of alarm detected. This will
314            be reported in the alarm-type-qualifier.";
315       }

317  3.3.  Identifying Resource

319     It is of vital importance to be able to refer to the alarming
320     resource. This reference must be as fine-grained as possible. If
321     the alarming resource exists in the data tree then an instance-
322     identifier MUST be used with the full path to the object.

324     This module also allows for alternate naming of the alarming resource
325     if it is not available in the data tree.

327  3.4.  Identifying Alarm Instances

329     A primary goal of this alarm module is to remove any ambiguity in how
330     alarm notifications are mapped to an update of an alarm instance.
331     X.733 and especially 3GPP were not really clear on this point. This
332     YANG alarm module states that the tuple (resource, alarm type
333     identifier, alarm type qualifier) corresponds to a single alarm
334     instance. This means that alarm notifications for the same resource
335     and same alarm type are matched to update the same alarm instance.
336     These three leafs are therefore used as the key in the alarm list:

338       list alarm {
339         key "resource alarm-type-id alarm-type-qualifier";
340         ...
341       }

343  3.5.  Alarm Life-Cycle

345     The alarm model clearly separates the resource alarm life-cycle from
346     the operator and administrative life-cycles of an alarm.

348     o  resource alarm life-cycle: the alarm instrumentation that controls
349        alarm raise, clearance, and severity changes.

351     o  operator alarm life-cycle: operators acting upon alarms with
352        actions like acknowledgment and closing. Closing an alarm implies
353        that the operator considers the corrective action performed.
354        Operators can also shelve (block/filter) alarms in order to avoid
355        nuisance alarms.

357     o  administrative alarm life-cycle: deleting (purging) alarms and
358        compressing the alarm status change list. This module exposes
359        operations to manage the administrative life-cycle. The server
360        may also perform these operations based on other policies, but how
361        that is done is out of scope for this document.

363     A server SHOULD describe how long it retains cleared/closed alarms:
364     until manually purged or if it has an automatic removal policy.

366  3.5.1.  Resource Alarm Life-Cycle

368     From a resource perspective, an alarm can have the following life-
369     cycle: raise, change severity, change severity, clear, being raised
370     again etc. All of these status changes can have different alarm
371     texts generated by the instrumentation. Two important things to
372     note:

374     1.  Alarms are not deleted when they are cleared. Deleting alarms is
375         an administrative process. The alarm module defines an rpc
376         "purge-alarms" that deletes alarms.

378     2.  Alarms are not cleared by operators, only the underlying
379         instrumentation can clear an alarm. Operators can close alarms.

381     The YANG tree representation below illustrates the resource oriented
382     life-cycle:

384       +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
385          ...
386          +--ro is-cleared                boolean
387          +--ro last-changed              yang:date-and-time
388          +--ro perceived-severity        severity
389          +--ro alarm-text                alarm-text
390          +--ro status-change* [time]
391             +--ro time                  yang:date-and-time
392             +--ro perceived-severity    severity
393             +--ro alarm-text            alarm-text

395     For every status change from the resource perspective a row is added
396     to the "status-change" list. The last status values are also
397     represented as leafs for the alarm. Note well that the alarm
398     severity does not include "cleared"; alarm clearance is a flag.

400     An alarm can therefore look like this: ((GigabitEthernet0/25, link-
401     alarm,""), false, T, major, "Interface GigabitEthernet0/25 down")

403  3.5.2.  Operator Alarm Life-cycle

405     Operators can also act upon alarms using the set-operator-state
406     action:

408       +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
409          ...
410          +--ro operator-state-change* [time] {operator-actions}?
411          |  +--ro time        yang:date-and-time
412          |  +--ro operator    string
413          |  +--ro state       operator-state
414          |  +--ro text?       string
415          +---x set-operator-state {operator-actions}?
416             +---w input
417                +---w state    operator-state
418                +---w text?    string

420     The operator state for an alarm can be: "none", "ack", "shelved", and
421     "closed". Alarm deletion (using the rpc "purge-alarms") can use
422     this state as a criterion. A closed alarm is an alarm where the
423     operator has performed any required corrective actions. Closed
424     alarms are good candidates for being deleted.

426  3.5.3.  Administrative Alarm Life-Cycle

428     Deleting alarms from the alarm list is considered an administrative
429     action. This is supported by the "purge-alarms" rpc. The "purge-
430     alarms" rpc takes a filter as input. The filter selects alarms based
431     on the operator and resource life-cycle such as "all closed cleared
432     alarms older than a time specification". The server may also perform
433     these operations based on other policies, but how that is done is out
434     of scope for this document.

436     Alarms can be compressed. Compressing an alarm deletes all entries
437     in the alarm's "status-change" list except for the last status
438     change. A client can perform this using the "compress-alarms" rpc.
439     The server may also perform these operations based on other policies,
440     but how that is done is out of scope for this document.

442  3.6.  Root Cause and Impacted Resources

444     The general principle of this alarm module is to limit the number of
445     alarms. The alarm has two leaf-lists to identify possible impacted
446     resources and possible root-cause resources. The system should not
447     send individual alarms for the possible root-cause resources and
448     impacted resources. These serve as hints only. It is up to the
449     client application to use this information to present the overall
450     status.

452  3.7.  Alarm Shelving

454     Alarm shelving is an important function in order for alarm management
455     applications and operators to stop superfluous alarms. Shelving
456     implies that any alarms matching the shelving criteria are ignored
457     (blocked/filtered). Shelved alarms appear in a dedicated shelved
458     alarm list in order not to disturb the relevant alarms. Shelved
459     alarms do not generate notifications.

461  4.  Alarm Data Model

463     Alarm shelving and operator actions are YANG features so that a
464     server can choose not to support them.

466     The data model has the following overall structure:

468       +--rw alarms
469          +--rw control
470          |  +--rw max-alarm-status-changes?   union
471          |  +--rw notify-status-changes?      boolean
472          |  +--rw alarm-shelving {alarm-shelving}?
473          |        ...
474          +--ro alarm-inventory
475          |  +--ro alarm-type* [alarm-type-id alarm-type-qualifier]
476          |        ...
477          +--ro summary
478          |  +--ro alarm-summary* [severity]
479          |  |     ...
480          |  +--ro shelves-active?   empty {alarm-shelving}?
481          +--ro alarm-list
482          |  +--ro number-of-alarms?   yang:gauge32
483          |  +--ro last-changed?       yang:date-and-time
484          |  +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
485          |        ...
486          +--ro shelved-alarms {alarm-shelving}?
487             +--ro number-of-shelved-alarms?   yang:gauge32
488             +--ro alarm-shelf-last-changed?   yang:date-and-time
489             +--ro shelved-alarm*
490                     [resource alarm-type-id alarm-type-qualifier]
491                   ...

493  4.1.  Alarm Control

495     The "/alarms/control/notify-status-changes" leaf controls whether
496     notifications are sent for all state changes (including severity and
497     alarm text changes) or only for new and cleared alarms.

499     Every alarm has a list of status changes; this is a circular list.
500     The length of this list is controlled by "/alarms/control/max-alarm-
501     status-changes".

503  4.1.1.  Alarm Shelving

505     The shelving control tree is shown below:

507       +--rw alarms
508          +--rw control
509             +--rw alarm-shelving {alarm-shelving}?
510                +--rw shelf* [name]
511                   +--rw name                          string
512                   +--rw resource*                     resource-match
513                   +--rw alarm-type-id?                alarm-type-id
514                   +--rw alarm-type-qualifier-match?   string
515                   +--rw description?                  string

517     Shelved alarms are shown in a dedicated shelved alarm list. The
518     instrumentation MUST move shelved alarms from the alarm list
519     (/alarms/alarm-list) to the shelved alarm list (/alarms/shelved-
520     alarms/). Shelved alarms do not generate any notifications. When
521     the shelving criteria are removed or changed the alarm list MUST be
522     updated to the correct actual state of the alarms.

524     Shelving and unshelving can only be performed by editing the shelf
525     configuration. It cannot be performed on individual alarms. The
526     server will add an operator state indicating that the alarm was
527     shelved/unshelved.
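
   As an illustration of the shelf configuration described above, the
   following instance-data fragment is a minimal sketch of a single
   shelf, assuming the server supports the "alarm-shelving" feature.
   The element names and the ietf-alarms namespace come from this
   document, and the resource expression reuses the XPath example
   from the "resource-match" description. The shelf name, the
   description text, the "xyz" prefix with its namespace, and the
   "link-alarm" identity are hypothetical and only show the shape of
   the data.

```xml
<alarms xmlns="urn:ietf:params:xml:ns:yang:ietf-alarms">
  <control>
    <alarm-shelving>
      <shelf>
        <!-- hypothetical shelf name -->
        <name>ethernet-maintenance</name>
        <!-- XPath 1.0 match; prefixes are resolved from the
             namespace declarations in scope on this element -->
        <resource
            xmlns:if="urn:ietf:params:xml:ns:yang:ietf-interfaces"
            xmlns:ianaift="urn:ietf:params:xml:ns:yang:iana-if-type"
          >/if:interfaces/if:interface[if:type='ianaift:ethernetCsmacd']</resource>
        <!-- hypothetical vendor identity derived from the
             ietf-alarms base identity -->
        <alarm-type-id
            xmlns:xyz="urn:example:xyz-alarms">xyz:link-alarm</alarm-type-id>
        <description>Hypothetical shelf used during planned
          maintenance of Ethernet interfaces.</description>
      </shelf>
    </alarm-shelving>
  </control>
</alarms>
```

   With such a shelf in place, matching alarms would be expected to
   appear under /alarms/shelved-alarms rather than /alarms/alarm-list,
   and deleting the shelf entry would return them to the alarm list.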

529     A leaf (/alarms/summary/shelves-active) in the alarm summary
530     indicates if there are shelved alarms.

532     A system can select to not support the shelving feature.

534  4.2.  Alarm Inventory

536     The alarm inventory represents all possible alarm types that may
537     occur in the system. A management system may use this to build alarm
538     procedures. The alarm inventory is relevant for several reasons:

540        The system might not instrument all alarm type identities.

542        The system has configured dynamic alarm types using the alarm
543        qualifier. The inventory makes it possible for the management
544        system to discover these.

546     Note that the mechanism whereby dynamic alarm types are added using
547     the alarm type qualifier MUST populate this list.

549     The optional leaf-list "resource" in the alarm inventory enables the
550     system to publish for which resources a given alarm type may appear.

552     The alarm inventory tree is shown below:

554       +--rw alarms
555          +--ro alarm-inventory
556             +--ro alarm-type* [alarm-type-id alarm-type-qualifier]
557                +--ro alarm-type-id           alarm-type-id
558                +--ro alarm-type-qualifier    alarm-type-qualifier
559                +--ro resource*               resource-match
560                +--ro has-clear               boolean
561                +--ro severity-levels*        severity
562                +--ro description             string

564  4.3.  Alarm Summary

566     The alarm summary list summarises alarms per severity; how many
567     cleared, cleared and closed, and closed. It also gives an indication
568     if there are shelved alarms.

570     The alarm summary tree is shown below:

572       +--rw alarms
573          +--ro summary
574             +--ro alarm-summary* [severity]
575             |  +--ro severity                  severity
576             |  +--ro total?                    yang:gauge32
577             |  +--ro cleared?                  yang:gauge32
578             |  +--ro cleared-not-closed?       yang:gauge32
579             |  |       {operator-actions}?
580             |  +--ro cleared-closed?           yang:gauge32
581             |  |       {operator-actions}?
582             |  +--ro not-cleared-closed?       yang:gauge32
583             |  |       {operator-actions}?
584             |  +--ro not-cleared-not-closed?   yang:gauge32
585             |          {operator-actions}?
586             +--ro shelves-active?   empty {alarm-shelving}?

588  4.4.  The Alarm List

590     The alarm list (/alarms/alarm-list) is a function from (resource,
591     alarm type, alarm type qualifier) to the current alarm state.

593       +--ro alarm-list
594          +--ro number-of-alarms?   yang:gauge32
595          +--ro last-changed?       yang:date-and-time
596          +--ro alarm* [resource alarm-type-id alarm-type-qualifier]
597             +--ro resource                 resource
598             +--ro alarm-type-id            alarm-type-id
599             +--ro alarm-type-qualifier     alarm-type-qualifier
600             +--ro alt-resource*            resource
601             +--ro related-alarm*
602             |       [resource alarm-type-id alarm-type-qualifier]
603             |  +--ro resource
604             |  |       -> /alarms/alarm-list/alarm/resource
605             |  +--ro alarm-type-id           leafref
606             |  +--ro alarm-type-qualifier    leafref
607             +--ro impacted-resource*       resource
608             +--ro root-cause-resource*     resource
609             +--ro time-created             yang:date-and-time
610             +--ro is-cleared               boolean
611             +--ro last-changed             yang:date-and-time
612             +--ro perceived-severity       severity
613             +--ro alarm-text               alarm-text
614             +--ro status-change* [time] {alarm-history}?
615             |  +--ro time                  yang:date-and-time
616             |  +--ro perceived-severity    severity-with-clear
617             |  +--ro alarm-text            alarm-text
618             +--ro operator-state-change* [time] {operator-actions}?
619             |  +--ro time        yang:date-and-time
620             |  +--ro operator    string
621             |  +--ro state       operator-state
622             |  +--ro text?       string
623             +---x set-operator-state {operator-actions}?
624                +---w input
625                   +---w state    writable-operator-state
626                   +---w text?    string

628     Every alarm has three important states, the resource clearance state
629     "is-cleared", the severity "perceived-severity" and the operator
630     state available in the operator state change list.

632     In order to see the alarm history the resource state changes are
633     available in the "status-change" list and the operator history is
634     available in the "operator-state-change" list.

636  4.5.  The Shelved Alarms List

638     The shelved alarm list has the same structure as the alarm list
639     above. It shows all the alarms that match the shelving criteria
640     (/alarms/control/alarm-shelving).

642  4.6.  RPCs and Actions

644     The alarm module supports rpcs and actions to manage the alarms:

646        "purge-alarms" (rpc): delete alarms according to specific
647        criteria, for example all cleared alarms older than a specific
648        date.

650        "compress-alarms" (rpc): compress the status-change list for the
651        alarms.

653        "set-operator-state" (action): change the operator state for an
654        alarm: for example acknowledge.

656  4.7.  Notifications

658     The alarm module supports a general notification to report alarm
659     state changes. It carries all relevant parameters for the alarm
660     management application.

662     There is also a notification to report that an operator changed the
663     operator state on an alarm, like acknowledge.

665     If the alarm inventory is changed, for example a new card type is
666     inserted, a notification will tell the management application that
667     new alarm types are available.

669  5.  Alarm YANG Module

671     This YANG module references [RFC6991].

673   file "ietf-alarms@2018-02-01.yang"
674     module ietf-alarms {
675       yang-version 1.1;
676       namespace "urn:ietf:params:xml:ns:yang:ietf-alarms";
677       prefix al;

679       import ietf-yang-types {
680         prefix yang;
681         reference "RFC 6991: Common YANG Data Types.";

683       }

685       organization
686         "IETF CCAMP Working Group";

688       contact
689         "WG Web:
690          WG List:

692          Editor: Stefan Vallin
693

695          Editor: Martin Bjorklund
696         ";

698       description
699         "This module defines an interface for managing alarms. Main
700          inputs to the module design are the 3GPP Alarm IRP, ITU-T X.733
701          and ANSI/ISA-18.2 alarm standards.

703          Main features of this module include:

705            * Alarm list:
706                      A list of all alarms. Cleared alarms stay in
707                      the list until explicitly removed.

709            * Operator actions on alarms:
710                      Acknowledging and closing alarms.

712            * Administrative actions on alarms:
713                      Purging alarms from the list according to specific
714                      criteria.

716            * Alarm inventory:
717                      A management application can read all
718                      alarm types implemented by the system.

720            * Alarm shelving:
721                      Shelving (blocking) alarms according
722                      to specific criteria.

724          This module uses a stateful view on alarms. An alarm is a state
725          for a specific resource (note that an alarm is not a
726          notification). An alarm type is a possible alarm state for a
727          resource. For example, the tuple:

729            ('link-alarm', 'GigabitEthernet0/25')

731          is an alarm of type 'link-alarm' on the resource
732          'GigabitEthernet0/25'.

734          Alarm types are identified using YANG identities and an optional
735          string-based qualifier. The string-based qualifier allows for
736          dynamic extension of the statically defined alarm types. Alarm
737          types identify a possible alarm state and not the individual
738          notifications. For example, the traditional 'link-down' and
739          'link-up' notifications are two notifications referring to the
740          same alarm type 'link-alarm'.

742          With this design there is no ambiguity about how alarm and alarm
743          clear correlation should be performed: notifications that report
744          the same resource and alarm type are considered updates of the
745          same alarm, e.g., clearing an active alarm or changing the
746          severity of an alarm.

748          The instrumentation can update 'severity' and 'alarm-text' on an
749          existing alarm. The above alarm example can therefore look
750          like:

752            (('link-alarm', 'GigabitEthernet0/25'),
753             warning,
754             'interface down while interface admin state is up')

756          There is a clear separation between updates on the alarm from
757          the underlying resource, like clear, and updates from an
758          operator like acknowledge or closing an alarm:

760            (('link-alarm', 'GigabitEthernet0/25'),
761             warning,
762             'interface down while interface admin state is up',
763             cleared,
764             closed)

766          Administrative actions like removing closed alarms older than a
767          given time are supported.
769 Copyright (c) 2018 IETF Trust and the persons identified as 770 authors of the code. All rights reserved. 772 Redistribution and use in source and binary forms, with or 773 without modification, is permitted pursuant to, and subject to 774 the license terms contained in, the Simplified BSD License set 775 forth in Section 4.c of the IETF Trust's Legal Provisions 776 Relating to IETF Documents 777 (https://trustee.ietf.org/license-info). 779 The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL 780 NOT', 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'MAY', and 781 'OPTIONAL' in the module text are to be interpreted as described 782 in RFC 2119 (https://tools.ietf.org/html/rfc2119). 784 This version of this YANG module is part of RFC XXXX 785 (https://tools.ietf.org/html/rfcXXXX); see the RFC itself for 786 full legal notices."; 788 revision 2018-02-01 { 789 description 790 "Initial revision."; 791 reference 792 "RFC XXXX: YANG Alarm Module"; 793 } 795 /* 796 * Features 797 */ 799 feature operator-actions { 800 description 801 "This feature indicates that the system supports operator 802 states on alarms."; 803 } 805 feature alarm-shelving { 806 description 807 "This feature indicates that the system supports shelving 808 (blocking) alarms."; 809 } 811 feature alarm-history { 812 description 813 "This feature indicates that server maintains a history of 814 state changes for each alarm. For example, if an alarm 815 toggles between cleared and active 10 times, these state 816 changes are present in a separate list in the alarm."; 817 } 818 /* 819 * Identities 820 */ 822 identity alarm-type-id { 823 description 824 "Base identity for alarm types. A unique identification of the 825 alarm, not including the resource. Different resources can 826 share alarm types. If the resource reports the same alarm 827 type, it is to be considered to be the same alarm. 
The alarm 828 type is a simplification of the different X.733 and 3GPP alarm 829 IRP alarm correlation mechanisms and it allows for 830 hierarchical extensions. 832 A string-based qualifier can be used in addition to the 833 identity in order to have different alarm types based on 834 information not known at design-time, such as values in 835 textual SNMP Notification var-binds. 837 Standards and vendors can define sub-identities to clearly 838 identify specific alarm types. 840 This identity is abstract and MUST NOT be used for alarms."; 841 } 843 /* 844 * Common types 845 */ 847 typedef resource { 848 type union { 849 type instance-identifier { 850 require-instance false; 851 } 852 type yang:object-identifier; 853 type string; 854 } 855 description 856 "This is an identification of the alarming resource, such as an 857 interface. It should be as fine-grained as possible both to 858 guide the operator and to guarantee uniqueness of the alarms. 860 If the alarming resource is modelled in YANG, this type will 861 be an instance-identifier. 863 If the resource is an SNMP object, the type will be an 864 object-identifier. 866 If the resource is anything else, for example a distinguished 867 name or a CIM path, this type will be a string. 869 If the server supports several models, the precedence should 870 be in the order given in the union definition."; 871 } 873 typedef resource-match { 874 type union { 875 type yang:xpath1.0; 876 type yang:object-identifier; 877 type string; 878 } 879 description 880 "This type is used to match resources of type 'resource'. 881 Since the type 'resource' is a union of three different types, 882 the 'resource-match' type is also a union of corresponding 883 types. 885 If the type is given as an XPath 1.0 expression, a resource 886 of type 'instance-identifier' matches if the instance is part 887 of the node set that is the result of evaluating the XPath 1.0 888 expression.
For example, the XPath 1.0 expression: 890 /if:interfaces/if:interface[if:type='ianaift:ethernetCsmacd'] 892 would match the resource instance-identifier: 894 /if:interfaces/if:interface[if:name='eth1'], 896 assuming that the interface 'eth1' is of type 897 'ianaift:ethernetCsmacd'. 899 If the type is given as an object identifier, a resource of 900 type 'object-identifier' matches if the match object 901 identifier is a prefix of the resource's object identifier. 902 For example, the value: 904 1.3.6.1.2.1.2.2 906 would match the resource object identifier: 908 1.3.6.1.2.1.2.2.1.1.5 910 If the type is given as a string, it is interpreted as a W3C 911 regular expression, which matches a resource of type 'string' 912 if the given regular expression matches the resource string. 914 If the type is given as an XPath expression, it is evaluated 915 in the following XPath context: 917 o The set of namespace declarations are those in scope on 918 the leaf element where this type is used. 920 o The set of variable bindings is empty. 922 o The function library is the core function library 923 and the functions defined in Section 10 of RFC 7950. 927 o The context node is the root node in the data tree."; 928 } 930 typedef alarm-text { 931 type string; 932 description 933 "The string used to inform operators about the alarm. This 934 MUST contain enough information for an operator to be able 935 to understand the problem and how to resolve it. If this 936 string contains structure, this format should be clearly 937 documented for programs to be able to parse that 938 information."; 939 } 941 typedef severity { 942 type enumeration { 943 enum indeterminate { 944 value 2; 945 description 946 "Indicates that the severity level could not be 947 determined.
This level SHOULD be avoided."; 948 } 949 enum minor { 950 value 3; 951 description 952 "The 'minor' severity level indicates the existence of a 953 non-service affecting fault condition and that corrective 954 action should be taken in order to prevent a more serious 955 (for example, service affecting) fault. Such a severity 956 can be reported, for example, when the detected alarm 957 condition is not currently degrading the capacity of the 958 resource."; 959 } 960 enum warning { 961 value 4; 962 description 963 "The 'warning' severity level indicates the detection of 964 a potential or impending service affecting fault, before 965 any significant effects have been felt. Action should be 966 taken to further diagnose (if necessary) and correct the 967 problem in order to prevent it from becoming a more 968 serious service affecting fault."; 969 } 970 enum major { 971 value 5; 972 description 973 "The 'major' severity level indicates that a service 974 affecting condition has developed and an urgent 975 corrective action is required. Such a severity can be 976 reported, for example, when there is a severe 977 degradation in the capability of the resource 978 and its full capability must be restored."; 979 } 980 enum critical { 981 value 6; 982 description 983 "The 'critical' severity level indicates that a service 984 affecting condition has occurred and an immediate 985 corrective action is required. Such a severity can be 986 reported, for example, when a resource becomes totally 987 out of service and its capability must be restored."; 988 } 989 } 990 description 991 "The severity level of the alarm. Note well that value 'clear' 992 is not included. 
If an alarm is cleared or not is a separate 993 boolean flag."; 994 reference 995 "ITU Recommendation X.733: Information Technology 996 - Open Systems Interconnection 997 - System Management: Alarm Reporting Function"; 998 } 1000 typedef severity-with-clear { 1001 type union { 1002 type enumeration { 1003 enum cleared { 1004 value 1; 1005 description 1006 "The alarm is cleared by the instrumentation."; 1007 } 1008 } 1009 type severity; 1010 } 1011 description 1012 "The severity level of the alarm including clear. 1013 This is used *only* in notifications reporting state changes 1014 for an alarm."; 1015 } 1017 typedef writable-operator-state { 1018 type enumeration { 1019 enum none { 1020 value 1; 1021 description 1022 "The alarm is not being taken care of."; 1023 } 1024 enum ack { 1025 value 2; 1026 description 1027 "The alarm is being taken care of. Corrective action not 1028 taken yet, or failed"; 1029 } 1030 enum closed { 1031 value 3; 1032 description 1033 "Corrective action taken successfully."; 1034 } 1035 } 1036 description 1037 "Operator states on an alarm. The 'closed' state indicates 1038 that an operator considers the alarm being resolved. This 1039 is separate from the alarm's 'is-cleared' leaf."; 1040 } 1042 typedef operator-state { 1043 type union { 1044 type writable-operator-state; 1045 type enumeration { 1046 enum shelved { 1047 value 4; 1048 description 1049 "The alarm is shelved. Alarms in /alarms/shelved-alarms/ 1050 MUST be assigned this operator state by the server as 1051 the last entry in the operator-state-change list. The 1052 text for that entry SHOULD include the shelf name."; 1053 } 1054 enum un-shelved { 1055 value 5; 1056 description 1057 "The alarm is moved back to 'alarm-list' from a shelf. 1058 Alarms that are moved from /alarms/shelved-alarms/ 1059 to /alarms/alarm-list MUST be assigned this 1060 state by the server as the last entry in the 1061 'operator-state-change' list. 
The text for that 1062 entry SHOULD include the shelf name."; 1063 } 1064 } 1065 } 1066 description 1067 "Operator states on an alarm. The 'closed' state indicates 1068 that an operator considers the alarm being resolved. This 1069 is separate from the alarm's 'is-cleared' leaf."; 1070 } 1072 /* Alarm type */ 1074 typedef alarm-type-id { 1075 type identityref { 1076 base alarm-type-id; 1077 } 1078 description 1079 "Identifies an alarm type. The description of the alarm type 1080 id MUST indicate if the alarm type is abstract or not. An 1081 abstract alarm type is used as a base for other alarm type ids 1082 and will not be used as a value for an alarm or be present in 1083 the alarm inventory."; 1084 } 1086 typedef alarm-type-qualifier { 1087 type string; 1088 description 1089 "If an alarm type can not be fully specified at design time by 1090 alarm-type-id, this string qualifier is used in addition to 1091 fully define a unique alarm type. 1093 The definition of alarm qualifiers is considered being part 1094 of the instrumentation and out of scope for this module. 1095 An empty string is used when this is part of a key."; 1096 } 1098 /* 1099 * Groupings 1100 */ 1102 grouping common-alarm-parameters { 1103 description 1104 "Common parameters for an alarm. 1106 This grouping is used both in the alarm list and in the 1107 notification representing an alarm state change."; 1109 leaf resource { 1110 type resource; 1111 mandatory true; 1112 description 1113 "The alarming resource. See also 'alt-resource'. 
1114 This could for example be a reference to the alarming 1115 interface"; 1116 } 1118 leaf alarm-type-id { 1119 type alarm-type-id; 1120 mandatory true; 1121 description 1122 "This leaf and the leaf 'alarm-type-qualifier' together 1123 provide a unique identification of the alarm type."; 1124 } 1126 leaf alarm-type-qualifier { 1127 type alarm-type-qualifier; 1128 description 1129 "This leaf is used when the 'alarm-type-id' leaf cannot 1130 uniquely identify the alarm type. Normally, this is not 1131 the case, and this leaf is the empty string."; 1132 } 1134 leaf-list alt-resource { 1135 type resource; 1136 description 1137 "Used if the alarming resource is available over other 1138 interfaces. This field can contain SNMP OIDs, CIM paths or 1139 3GPP Distinguished names for example."; 1140 } 1142 list related-alarm { 1143 key "resource alarm-type-id alarm-type-qualifier"; 1145 description 1146 "References to related alarms. Note that the related alarm 1147 might have been removed from the alarm list."; 1149 leaf resource { 1150 type leafref { 1151 path "/alarms/alarm-list/alarm/resource"; 1152 require-instance false; 1153 } 1154 description 1155 "The alarming resource for the related alarm."; 1156 } 1157 leaf alarm-type-id { 1158 type leafref { 1159 path "/alarms/alarm-list/alarm" 1160 + "[resource=current()/../resource]" 1161 + "/alarm-type-id"; 1162 require-instance false; 1164 } 1165 description 1166 "The alarm type identifier for the related alarm."; 1167 } 1168 leaf alarm-type-qualifier { 1169 type leafref { 1170 path "/alarms/alarm-list/alarm" 1171 + "[resource=current()/../resource]" 1172 + "[alarm-type-id=current()/../alarm-type-id]" 1173 + "/alarm-type-qualifier"; 1174 require-instance false; 1175 } 1176 description 1177 "The alarm qualifier for the related alarm."; 1178 } 1179 } 1180 leaf-list impacted-resource { 1181 type resource; 1182 description 1183 "Resources that might be affected by this alarm.
If the 1184 system creates an alarm on a resource and also has a mapping 1185 to other resources that might be impacted, these resources 1186 can be listed in this leaf-list. In this way the system can 1187 create one alarm instead of several. For example, if an 1188 interface has an alarm, the 'impacted-resource' can 1189 reference the aggregated port channels."; 1190 } 1191 leaf-list root-cause-resource { 1192 type resource; 1193 description 1194 "Resources that are candidates for causing the alarm. If the 1195 system has a mechanism to understand the candidate root 1196 causes of an alarm, this leaf-list can be used to list the 1197 root cause candidate resources. In this way the system can 1198 create one alarm instead of several. An example might be a 1199 logging system (alarm resource) that fails; the alarm can 1200 reference the file-system in the 'root-cause-resource' 1201 leaf-list. Note that the intended use is not to also send 1202 an alarm with the root-cause-resource as alarming resource. 1203 The root-cause-resource leaf list is a hint and should not 1204 also generate an alarm for the same problem."; 1205 } 1206 } 1208 grouping alarm-state-change-parameters { 1209 description 1210 "Parameters for an alarm state change. 1212 This grouping is used both in the alarm list's 1213 status-change list and in the notification representing an 1214 alarm state change."; 1216 leaf time { 1217 type yang:date-and-time; 1218 mandatory true; 1219 description 1220 "The time the status of the alarm changed. The value 1221 represents the time the real alarm state change appeared 1222 in the resource and not when it was added to the 1223 alarm list. The /alarm-list/alarm/last-changed MUST be 1224 set to the same value."; 1225 } 1226 leaf perceived-severity { 1227 type severity-with-clear; 1228 mandatory true; 1229 description 1230 "The severity of the alarm as defined by X.733.
Note 1231 that this may not be the original severity since the alarm 1232 may have changed severity."; 1233 reference 1234 "ITU Recommendation X.733: Information Technology 1235 - Open Systems Interconnection 1236 - System Management: Alarm Reporting Function"; 1237 } 1238 leaf alarm-text { 1239 type alarm-text; 1240 mandatory true; 1241 description 1242 "A user friendly text describing the alarm state change."; 1243 reference 1244 "ITU Recommendation X.733: Information Technology 1245 - Open Systems Interconnection 1246 - System Management: Alarm Reporting Function"; 1247 } 1248 } 1250 grouping operator-parameters { 1251 description 1252 "This grouping defines parameters that can 1253 be changed by an operator"; 1254 leaf time { 1255 type yang:date-and-time; 1256 mandatory true; 1257 description 1258 "Timestamp for operator action on alarm."; 1259 } 1260 leaf operator { 1261 type string; 1262 mandatory true; 1263 description 1264 "The name of the operator that has acted on this 1265 alarm."; 1266 } 1267 leaf state { 1268 type operator-state; 1269 mandatory true; 1270 description 1271 "The operator's view of the alarm state."; 1272 } 1273 leaf text { 1274 type string; 1275 description 1276 "Additional optional textual information provided by 1277 the operator."; 1278 } 1279 } 1281 grouping resource-alarm-parameters { 1282 description 1283 "Alarm parameters that originate from the resource view."; 1284 leaf is-cleared { 1285 type boolean; 1286 mandatory true; 1287 description 1288 "Indicates the current clearance state of the alarm. An 1289 alarm might toggle from active alarm to cleared alarm and 1290 back to active again."; 1291 } 1293 leaf last-changed { 1294 type yang:date-and-time; 1295 mandatory true; 1296 description 1297 "A timestamp when the alarm status was last changed.
Status 1298 changes are changes to 'is-cleared', 'perceived-severity', 1299 and 'alarm-text'."; 1300 } 1302 leaf perceived-severity { 1303 type severity; 1304 mandatory true; 1305 description 1306 "The last severity of the alarm. 1308 If an alarm was raised with severity 'warning', but later 1309 changed to 'major', this leaf will show 'major'."; 1310 } 1312 leaf alarm-text { 1313 type alarm-text; 1314 mandatory true; 1315 description 1316 "The last reported alarm text. This text should contain 1317 information for an operator to be able to understand 1318 the problem and how to resolve it."; 1319 } 1321 list status-change { 1322 if-feature alarm-history; 1323 key "time"; 1324 min-elements 1; 1325 description 1326 "A list of status change events for this alarm. 1328 The entry with the latest time-stamp in this list MUST 1329 correspond to the leafs 'is-cleared', 'perceived-severity' 1330 and 'alarm-text' for the alarm. The time-stamp for that 1331 entry MUST be equal to the 'last-changed' leaf. 1333 This list is ordered according to the timestamps of 1334 alarm state changes. The last item corresponds to the 1335 latest state change. 1337 The following state changes create an entry in this 1338 list: 1339 - changed severity (warning, minor, major, critical) 1340 - clearance status, this also updates the 'is-cleared' 1341 leaf 1342 - alarm text update"; 1344 uses alarm-state-change-parameters; 1345 } 1346 } 1348 /* 1349 * The /alarms data tree 1350 */ 1352 container alarms { 1353 description 1354 "The top container for this module"; 1356 container control { 1357 description 1358 "Configuration to control the alarm behaviour."; 1359 leaf max-alarm-status-changes { 1360 type union { 1361 type uint16; 1362 type enumeration { 1363 enum infinite { 1364 description 1365 "The status change entries are accumulated 1366 infinitely."; 1367 } 1368 } 1369 } 1370 default 32; 1371 description 1372 "The status-change entries are kept in a circular list 1373 per alarm.
When this number is exceeded, the oldest 1374 status change entry is automatically removed. If the 1375 value is 'infinite', the status change entries are 1376 accumulated infinitely."; 1377 } 1379 leaf notify-status-changes { 1380 type boolean; 1381 default false; 1382 description 1383 "This leaf controls whether notifications are sent on all 1384 alarm status updates, e.g., updated perceived-severity or 1385 alarm-text. By default the notifications are only sent 1386 when a new alarm is raised, re-raised after being cleared 1387 and when an alarm is cleared."; 1388 } 1389 container alarm-shelving { 1390 if-feature alarm-shelving; 1391 description 1392 "The alarm-shelving/shelf list is used to shelve 1393 (block/filter) alarms. The server will move any alarms 1394 corresponding to the shelving criteria from the 1395 alarms/alarm-list/alarm list to the 1396 alarms/shelved-alarms/shelved-alarm list. It will also 1397 stop sending notifications for the shelved alarms. The 1398 conditions in the shelf criteria are logically ANDed. 1399 When the shelving criteria are deleted or changed, the 1400 non-matching alarms MUST appear in the 1401 alarms/alarm-list/alarm list according to the real state. 1402 This means that the instrumentation MUST maintain states 1403 for the shelved alarms. Alarms that match the criteria 1404 shall have an operator-state 'shelved'. When the shelf 1405 configuration removes an alarm from the shelf, the 1406 server shall add an operator state 'un-shelved'."; 1407 list shelf { 1408 key "name"; 1409 leaf name { 1410 type string; 1411 description 1412 "An arbitrary name for the alarm shelf."; 1413 } 1414 description 1415 "Each entry defines the criteria for shelving alarms. 1416 Criteria are ANDed.
If no criteria are specified, 1417 all alarms will be shelved."; 1419 leaf-list resource { 1420 type resource-match; 1421 description 1422 "Shelve alarms for matching resources."; 1423 } 1424 leaf alarm-type-id { 1425 type alarm-type-id; 1426 description 1427 "Shelve all alarms that have an alarm-type-id that is 1428 equal to or derived from the given alarm-type-id."; 1429 } 1430 leaf alarm-type-qualifier-match { 1431 type string; 1432 description 1433 "A W3C regular expression that is used to match 1434 an alarm type qualifier. Shelve all alarms that 1435 match this regular expression for the alarm 1436 type qualifier."; 1437 } 1438 leaf description { 1439 type string; 1440 description 1441 "An optional textual description of the shelf. This 1442 description should include the reason for shelving 1443 these alarms."; 1444 } 1445 } 1446 } 1447 } 1449 container alarm-inventory { 1450 config false; 1451 description 1452 "The alarm-inventory/alarm-type list contains all possible 1453 alarm types for the system. 1454 If the system knows for which resources a specific alarm 1455 type can appear, this is also identified in the inventory. 1456 The list also tells if each alarm type has a corresponding 1457 clear state. The inventory shall only contain concrete 1458 alarm types. 1460 The alarm inventory MUST be updated by the system when new 1461 alarms can appear. This can be the case when installing new 1462 software modules or inserting new card types.
A 1463 notification 'alarm-inventory-changed' is sent when the 1464 inventory is changed."; 1466 list alarm-type { 1467 key "alarm-type-id alarm-type-qualifier"; 1468 description 1469 "An entry in this list defines a possible alarm."; 1470 leaf alarm-type-id { 1471 type alarm-type-id; 1472 description 1473 "The statically defined alarm type identifier for this 1474 possible alarm."; 1475 } 1476 leaf alarm-type-qualifier { 1477 type alarm-type-qualifier; 1478 description 1479 "The optionally dynamically defined alarm type identifier 1480 for this possible alarm."; 1481 } 1482 leaf-list resource { 1483 type resource-match; 1484 description 1485 "Optionally, specifies for which resources the alarm type 1486 is valid."; 1487 } 1488 leaf has-clear { 1489 type boolean; 1490 mandatory true; 1491 description 1492 "This leaf tells the operator if the alarm will be 1493 cleared when the correct corrective action has been 1494 taken. Implementations SHOULD strive for detecting the 1495 cleared state for all alarm types. If this leaf is 1496 true, the operator can monitor the alarm until it 1497 becomes cleared after the corrective action has been 1498 taken. If this leaf is false, the operator needs to 1499 validate that the alarm is no longer active using other 1500 mechanisms. Alarms can lack a corresponding clear due 1501 to missing instrumentation or because there is no logical 1502 corresponding clear state."; 1503 } 1504 leaf-list severity-levels { 1505 type severity; 1506 description 1507 "This leaf-list indicates the possible severity levels of 1508 this alarm type. Note well that 'clear' is not part of 1509 the severity type. In general, the severity level should 1510 be defined by the instrumentation based on dynamic state 1511 and not defined statically by the alarm type in order to 1512 provide relevant severity level based on dynamic state 1513 and context.
However most alarm types have a defined set 1514 of possible severity levels and this should be provided 1515 here."; 1516 } 1517 leaf description { 1518 type string; 1519 mandatory true; 1520 description 1521 "A description of the possible alarm. It SHOULD include 1522 information on possible underlying root causes and 1523 corrective actions."; 1524 } 1525 } 1526 } 1528 container summary { 1529 config false; 1530 description 1531 "This container gives a summary of number of alarms"; 1532 list alarm-summary { 1533 key "severity"; 1534 description 1535 "A global summary of all alarms in the system. The summary 1536 does not include shelved alarms"; 1538 leaf severity { 1539 type severity; 1540 description 1541 "Alarm summary for this severity level."; 1542 } 1543 leaf total { 1544 type yang:gauge32; 1545 description 1546 "Total number of alarms of this severity level."; 1547 } 1548 leaf cleared { 1549 type yang:gauge32; 1550 description 1551 "For this severity level, the number of alarms that are 1552 cleared."; 1553 } 1554 leaf cleared-not-closed { 1555 if-feature operator-actions; 1556 type yang:gauge32; 1557 description 1558 "For this severity level, the number of alarms that are 1559 cleared but not closed."; 1560 } 1561 leaf cleared-closed { 1562 if-feature operator-actions; 1563 type yang:gauge32; 1564 description 1565 "For this severity level, the number of alarms that are 1566 cleared and closed."; 1567 } 1568 leaf not-cleared-closed { 1569 if-feature operator-actions; 1570 type yang:gauge32; 1571 description 1572 "For this severity level, the number of alarms that are 1573 not cleared but closed."; 1574 } 1575 leaf not-cleared-not-closed { 1576 if-feature operator-actions; 1577 type yang:gauge32; 1578 description 1579 "For this severity level, the number of alarms that are 1580 not cleared and not closed."; 1581 } 1582 } 1583 leaf shelves-active { 1584 if-feature alarm-shelving; 1585 type empty; 1586 description 1587 "This is a hint to the operator that 
there are active 1588 alarm shelves. This leaf MUST exist if the 1589 alarms/shelved-alarms/number-of-shelved-alarms is > 0."; 1590 } 1591 } 1593 container alarm-list { 1594 config false; 1595 description 1596 "The alarms in the system."; 1597 leaf number-of-alarms { 1598 type yang:gauge32; 1599 description 1600 "This object shows the total number of 1601 alarms in the system, i.e., the total number 1602 of entries in the alarm list."; 1603 } 1605 leaf last-changed { 1606 type yang:date-and-time; 1607 description 1608 "A timestamp when the alarm list was last 1609 changed. The value can be used by a manager to 1610 initiate an alarm resynchronization procedure."; 1611 } 1613 list alarm { 1614 key "resource alarm-type-id alarm-type-qualifier"; 1616 description 1617 "The list of alarms. Each entry in the list holds one 1618 alarm for a given alarm type and resource. 1619 An alarm can be updated from the underlying resource or 1620 by the user. The following leafs are maintained by the 1621 resource: is-cleared, last-changed, perceived-severity, 1622 and alarm-text. An operator can change: operator-state 1623 and operator-text. 1625 Entries appear in the alarm list the first time an 1626 alarm becomes active for a given alarm-type and resource. 1627 Entries do not get deleted when the alarm is cleared; this 1628 is a boolean state in the alarm. 1630 Alarm entries are removed, purged, from the list by an 1631 explicit purge action. For example, delete all alarms 1632 that are cleared and in closed operator-state that are 1633 older than 24 hours. Systems may also remove alarms based 1634 on locally configured policies, which is out of scope for 1635 this module."; 1636 uses common-alarm-parameters; 1638 leaf time-created { 1639 type yang:date-and-time; 1640 mandatory true; 1641 description 1642 "The time-stamp when this alarm entry was created. This 1643 represents the first time the alarm appeared; it can 1644 also represent that the alarm re-appeared after a purge.
1645 Further state-changes of the same alarm do not change 1646 this leaf; these changes will update the 'last-changed' 1647 leaf."; 1648 } 1650 uses resource-alarm-parameters; 1652 list operator-state-change { 1653 if-feature operator-actions; 1654 key "time"; 1655 description 1656 "This list is used by operators to indicate 1657 the state of human intervention on an alarm. 1658 For example, if an operator has seen an alarm, 1659 the operator can add a new item to this list indicating 1660 that the alarm is acknowledged."; 1662 uses operator-parameters; 1663 } 1665 action set-operator-state { 1666 if-feature operator-actions; 1667 description 1668 "This is a means for the operator to indicate 1669 the level of human intervention on an alarm."; 1670 input { 1671 leaf state { 1672 type writable-operator-state; 1673 mandatory true; 1674 description 1675 "Set this operator state."; 1676 } 1677 leaf text { 1678 type string; 1679 description 1680 "Additional optional textual information."; 1681 } 1682 } 1683 } 1684 } 1685 } 1687 container shelved-alarms { 1688 if-feature alarm-shelving; 1689 config false; 1690 description 1691 "The shelved alarms. Alarms appear here if they match the 1692 criteria in /alarms/control/alarm-shelving. This list does 1693 not generate any notifications. The list represents alarms 1694 that are considered not relevant by the operator. Alarms in 1695 this list have an operator-state of 'shelved'. This can not 1696 be changed."; 1697 leaf number-of-shelved-alarms { 1698 type yang:gauge32; 1699 description 1700 "This object shows the total number of currently 1701 shelved alarms, i.e., the total number of entries 1702 in the shelved alarm list."; 1703 } 1705 leaf alarm-shelf-last-changed { 1706 type yang:date-and-time; 1707 description 1708 "A timestamp when the shelved alarm list was last 1709 changed.
The value can be used by a manager to 1710 initiate an alarm resynchronization procedure."; 1711 } 1713 list shelved-alarm { 1714 key "resource alarm-type-id alarm-type-qualifier"; 1716 description 1717 "The list of shelved alarms. Each entry in the list holds 1718 one alarm for a given alarm type and resource. An alarm 1719 can be updated from the underlying resource or by the 1720 user. These changes are reflected in different lists 1721 below the corresponding alarm."; 1723 uses common-alarm-parameters; 1725 leaf shelf-name { 1726 type leafref { 1727 path "/alarms/control/alarm-shelving/shelf/name"; 1728 require-instance false; 1729 } 1730 description 1731 "The name of the shelf."; 1732 } 1733 uses resource-alarm-parameters; 1735 list operator-state-change { 1736 if-feature operator-actions; 1737 key "time"; 1738 description 1739 "This list is used by operators to indicate 1740 the state of human intervention on an alarm. 1741 For example, if an operator has seen an alarm, 1742 the operator can add a new item to this list indicating 1743 that the alarm is acknowledged."; 1745 uses operator-parameters; 1746 } 1747 } 1748 } 1749 } 1751 /* 1752 * Operations 1753 */ 1755 rpc compress-alarms { 1756 if-feature alarm-history; 1757 description 1758 "This operation requests the server to compress entries in the 1759 alarm list by removing all but the latest state change for all 1760 alarms. Conditions in the input are logically ANDed. 
If no 1761 input condition is given, all alarms are compressed."; 1762 input { 1763 leaf resource { 1764 type leafref { 1765 path "/alarms/alarm-list/alarm/resource"; 1766 require-instance false; 1767 } 1768 description 1769 "Compress the alarms with this resource."; 1770 } 1771 leaf alarm-type-id { 1772 type leafref { 1773 path "/alarms/alarm-list/alarm/alarm-type-id"; 1774 } 1775 description 1776 "Compress alarms with this alarm-type-id."; 1777 } 1778 leaf alarm-type-qualifier { 1779 type leafref { 1780 path "/alarms/alarm-list/alarm/alarm-type-qualifier"; 1781 } 1782 description 1783 "Compress the alarms with this alarm-type-qualifier."; 1784 } 1785 } 1786 output { 1787 leaf compressed-alarms { 1788 type uint32; 1789 description 1790 "Number of compressed alarm entries."; 1791 } 1792 } 1793 } 1795 grouping filter-input { 1796 description 1797 "Grouping to specify a filter construct on alarm information."; 1798 leaf alarm-status { 1799 type enumeration { 1800 enum any { 1801 description 1802 "Ignore alarm clearance status."; 1803 } 1804 enum cleared { 1805 description 1806 "Filter cleared alarms."; 1807 } 1808 enum not-cleared { 1809 description 1810 "Filter not cleared alarms."; 1811 } 1812 } 1813 mandatory true; 1814 description 1815 "The clearance status of the alarm."; 1816 } 1818 container older-than { 1819 presence "Age specification"; 1820 description 1821 "Matches the 'last-changed' leaf in the alarm."; 1822 choice age-spec { 1823 description 1824 "Filter using date and time age."; 1825 case seconds { 1826 leaf seconds { 1827 type uint16; 1828 description 1829 "Seconds part"; 1830 } 1831 } 1832 case minutes { 1833 leaf minutes { 1834 type uint16; 1835 description 1836 "Minute part"; 1837 } 1838 } 1839 case hours { 1840 leaf hours { 1841 type uint16; 1842 description 1843 "Hours part."; 1844 } 1845 } 1846 case days { 1847 leaf days { 1848 type uint16; 1849 description 1850 "Day part"; 1851 } 1852 } 1853 case weeks { 1854 leaf weeks { 1855 type
uint16; 1856 description 1857 "Week part"; 1858 } 1859 } 1860 } 1861 } 1862 container severity { 1863 presence "Severity filter"; 1864 choice sev-spec { 1865 description 1866 "Filter based on severity level."; 1867 leaf below { 1868 type severity; 1869 description 1870 "Severity less than this leaf."; 1871 } 1872 leaf is { 1873 type severity; 1874 description 1875 "Severity level equal this leaf."; 1876 } 1877 leaf above { 1878 type severity; 1879 description 1880 "Severity level higher than this leaf."; 1881 } 1882 } 1883 description 1884 "Filter based on severity."; 1885 } 1886 container operator-state-filter { 1887 if-feature operator-actions; 1888 presence "Operator state filter"; 1889 leaf state { 1890 type operator-state; 1891 description 1892 "Filter on operator state."; 1893 } 1894 leaf user { 1895 type string; 1896 description 1897 "Filter based on which operator."; 1898 } 1899 description 1900 "Filter based on operator state."; 1901 } 1902 } 1904 rpc purge-alarms { 1905 description 1906 "This operation requests the server to delete entries from the 1907 alarm list according to the supplied criteria. Typically it 1908 can be used to delete alarms that are in closed operator state 1909 and older than a specified time. The number of purged alarms 1910 is returned as an output parameter"; 1911 input { 1912 uses filter-input; 1913 } 1914 output { 1915 leaf purged-alarms { 1916 type uint32; 1917 description 1918 "Number of purged alarms."; 1919 } 1920 } 1921 } 1923 /* 1924 * Notifications 1925 */ 1927 notification alarm-notification { 1928 description 1929 "This notification is used to report a state change for an 1930 alarm. 
The same notification is used for reporting a newly
1931           raised alarm, a cleared alarm or changing the text and/or
1932           severity of an existing alarm.";

1934        uses common-alarm-parameters;
1935        uses alarm-state-change-parameters;
1936      }

1938      notification alarm-inventory-changed {
1939        description
1940          "This notification is used to report that the list of possible
1941           alarms has changed.  This can happen, for example, when a new
1942           software module is installed or a new physical card is
1943           inserted.";
1944      }

1946      notification operator-action {
1947        if-feature operator-actions;
1948        description
1949          "This notification is used to report that an operator
1950           acted upon an alarm.";

1952        leaf resource {
1953          type leafref {
1954            path "/alarms/alarm-list/alarm/resource";
1955            require-instance false;
1956          }
1957          description
1958            "The alarming resource.";
1959        }
1960        leaf alarm-type-id {
1961          type leafref {
1962            path "/alarms/alarm-list/alarm"
1963               + "[resource=current()/../resource]"
1964               + "/alarm-type-id";
1965            require-instance false;
1966          }
1967          description
1968            "The alarm type identifier for the alarm.";
1969        }
1970        leaf alarm-type-qualifier {
1971          type leafref {
1972            path "/alarms/alarm-list/alarm"
1973               + "[resource=current()/../resource]"
1974               + "[alarm-type-id=current()/../alarm-type-id]"
1975               + "/alarm-type-qualifier";
1976            require-instance false;
1977          }
1978          description
1979            "The alarm qualifier for the alarm.";
1981        }
1982        uses operator-parameters;
1983      }
1984    }

1986    <CODE ENDS>

1988 6.  X.733 Alarm Mapping Data Model

1990    Many alarm management systems are based on the X.733 alarm standard.
1991    This YANG module allows a mapping from alarm types to X.733 event-
1992    type and probable-cause.

1994    The module augments the alarm inventory, the alarm list and the alarm
1995    notification with X.733 parameters.

1997    The module also supports a feature whereby the alarm manager can
1998    configure the mapping.  This might be needed when the default mapping
1999    provided by the system conflicts with other systems or is otherwise
2000    considered unsuitable.

2002 7.  X.733 Alarm Mapping YANG Module

2004    This YANG module references [X.733].

2006    <CODE BEGINS> file "ietf-alarms-x733@2017-10-30.yang"
2007    module ietf-alarms-x733 {
2008      yang-version 1.1;
2009      namespace "urn:ietf:params:xml:ns:yang:ietf-alarms-x733";
2010      prefix x733;

2012      import ietf-alarms {
2013        prefix al;
2014      }

2016      organization
2017        "IETF CCAMP Working Group";

2019      contact
2020        "WG Web:
2021         WG List:

2023         Editor:   Stefan Vallin

2026         Editor:   Martin Bjorklund
2027         ";

2029      description
2030        "This module augments the ietf-alarms module with X.733 mapping
2031         information.  The following structures are augmented with
2032         event type and probable cause:

2034         1) alarm inventory: all possible alarms.
2035         2) alarm: every alarm in the system.
2036         3) alarm notification: notifications indicating alarm state
2037            changes.

2039         The module also optionally allows the alarm management system
2040         to configure the mapping.  The mapping does not include a
2041         corresponding specific problem value.
The recommendation is 2042 to use alarm-type-qualifier which serves the same purpose."; 2043 reference 2044 "ITU Recommendation X.733: Information Technology 2045 - Open Systems Interconnection 2046 - System Management: Alarm Reporting Function"; 2048 revision 2017-10-30 { 2049 description 2050 "Initial revision."; 2051 reference 2052 "RFC XXXX: YANG Alarm Module"; 2053 } 2055 /* 2056 * Features 2057 */ 2059 feature configure-x733-mapping { 2060 description 2061 "The system supports configurable X733 mapping from 2062 alarm type to event type and probable cause."; 2063 } 2065 /* 2066 * Typedefs 2067 */ 2069 typedef event-type { 2070 type enumeration { 2071 enum other { 2072 value 1; 2073 description 2074 "None of the below."; 2075 } 2076 enum communications-alarm { 2077 value 2; 2078 description 2079 "An alarm of this type is principally associated with the 2080 procedures and/or processes required to convey 2081 information from one point to another."; 2082 reference 2083 "ITU Recommendation X.733: Information Technology 2084 - Open Systems Interconnection 2085 - System Management: Alarm Reporting Function"; 2086 } 2087 enum quality-of-service-alarm { 2088 value 3; 2089 description 2090 "An alarm of this type is principally associated with a 2091 degradation in the quality of a service."; 2092 reference 2093 "ITU Recommendation X.733: Information Technology 2094 - Open Systems Interconnection 2095 - System Management: Alarm Reporting Function"; 2096 } 2097 enum processing-error-alarm { 2098 value 4; 2099 description 2100 "An alarm of this type is principally associated with a 2101 software or processing fault."; 2102 reference 2103 "ITU Recommendation X.733: Information Technology 2104 - Open Systems Interconnection 2105 - System Management: Alarm Reporting Function"; 2106 } 2107 enum equipment-alarm { 2108 value 5; 2109 description 2110 "An alarm of this type is principally associated with an 2111 equipment fault."; 2112 reference 2113 "ITU Recommendation X.733: 
Information Technology 2114 - Open Systems Interconnection 2115 - System Management: Alarm Reporting Function"; 2116 } 2117 enum environmental-alarm { 2118 value 6; 2119 description 2120 "An alarm of this type is principally associated with a 2121 condition relating to an enclosure in which the equipment 2122 resides."; 2123 reference 2124 "ITU Recommendation X.733: Information Technology 2125 - Open Systems Interconnection 2126 - System Management: Alarm Reporting Function"; 2127 } 2128 enum integrity-violation { 2129 value 7; 2130 description 2131 "An indication that information may have been illegally 2132 modified, inserted or deleted."; 2133 reference 2134 "ITU Recommendation X.736: Information Technology 2135 - Open Systems Interconnection 2136 - System Management: Security Alarm Reporting Function"; 2137 } 2138 enum operational-violation { 2139 value 8; 2140 description 2141 "An indication that the provision of the requested service 2142 was not possible due to the unavailability, malfunction or 2143 incorrect invocation of the service."; 2144 reference 2145 "ITU Recommendation X.736: Information Technology 2146 - Open Systems Interconnection 2147 - System Management: Security Alarm Reporting Function"; 2148 } 2149 enum physical-violation { 2150 value 9; 2151 description 2152 "An indication that a physical resource has been violated 2153 in a way that suggests a security attack."; 2154 reference 2155 "ITU Recommendation X.736: Information Technology 2156 - Open Systems Interconnection 2157 - System Management: Security Alarm Reporting Function"; 2158 } 2159 enum security-service-or-mechanism-violation { 2160 value 10; 2161 description 2162 "An indication that a security attack has been detected by 2163 a security service or mechanism."; 2164 reference 2165 "ITU Recommendation X.736: Information Technology 2166 - Open Systems Interconnection 2167 - System Management: Security Alarm Reporting Function"; 2168 } 2169 enum time-domain-violation { 2170 value 11; 
2171 description 2172 "An indication that an event has occurred at an unexpected 2173 or prohibited time."; 2174 reference 2175 "ITU Recommendation X.736: Information Technology 2176 - Open Systems Interconnection 2177 - System Management: Security Alarm Reporting Function"; 2178 } 2179 } 2180 description 2181 "The event types as defined by X.733 and X.736. The use of the 2182 term 'event' is a bit confusing. In an alarm context these 2183 are top level alarm types."; 2184 } 2186 /* 2187 * Groupings 2188 */ 2190 grouping x733-alarm-parameters { 2191 description 2192 "Common X.733 parameters for alarms."; 2194 leaf event-type { 2195 type event-type; 2196 description 2197 "The X.733/X.736 event type for this alarm."; 2198 } 2199 leaf probable-cause { 2200 type uint32; 2201 description 2202 "The X.733 probable cause for this alarm."; 2203 } 2204 } 2206 grouping x733-alarm-definition-parameters { 2207 description 2208 "Common X.733 parameters for alarm definitions."; 2210 leaf event-type { 2211 type event-type; 2212 description 2213 "The alarm type has this X.733/X.736 event type."; 2214 } 2215 leaf probable-cause { 2216 type uint32; 2217 description 2218 "The alarm type has this X.733 probable cause value. 2219 This module defines probable cause as an integer 2220 and not as an enumeration. The reason being that the 2221 primary use of probable cause is in the management 2222 application if it is based on the X.733 standard. 2223 However, most management applications have their own 2224 defined enum definitions and merging enums from 2225 different systems might create conflicts. By using 2226 a configurable uint32 the system can be configured 2227 to match the enum values in the manager."; 2228 } 2229 } 2231 /* 2232 * Add X.733 parameters to the alarm definitions, alarms, 2233 * and notification. 
2234 */ 2236 augment "/al:alarms/al:alarm-inventory/al:alarm-type" { 2237 description 2238 "Augment X.733 mapping information to the alarm inventory."; 2240 uses x733-alarm-definition-parameters; 2241 } 2243 augment "/al:alarms/al:control" { 2244 description 2245 "Add X.733 mapping capabilities. "; 2246 list x733-mapping { 2247 if-feature configure-x733-mapping; 2248 key "alarm-type-id alarm-type-qualifier-match"; 2249 description 2250 "This list allows a management application to control the 2251 X.733 mapping for all alarm types in the system. Any entry 2252 in this list will allow the alarm manager to over-ride the 2253 default X.733 mapping in the system and the final mapping 2254 will be shown in the alarm-inventory"; 2256 leaf alarm-type-id { 2257 type al:alarm-type-id; 2258 description 2259 "Map the alarm type with this alarm type identifier."; 2260 } 2261 leaf alarm-type-qualifier-match { 2262 type string; 2263 description 2264 "A W3C regular expression that is used when mapping an 2265 alarm type and alarm-type-qualifier to X.733 parameters."; 2266 } 2268 uses x733-alarm-definition-parameters; 2270 } 2271 } 2273 augment "/al:alarms/al:alarm-list/al:alarm" { 2274 description 2275 "Augment X.733 information to the alarm."; 2277 uses x733-alarm-parameters; 2278 } 2280 augment "/al:alarms/al:shelved-alarms/al:shelved-alarm" { 2281 description 2282 "Augment X.733 information to the alarm."; 2284 uses x733-alarm-parameters; 2285 } 2287 augment "/al:alarm-notification" { 2288 description 2289 "Augment X.733 information to the alarm notification."; 2291 uses x733-alarm-parameters; 2292 } 2293 } 2295 2297 8. IANA Considerations 2299 This document registers a URI in the IETF XML registry [RFC3688]. 2300 Following the format in RFC 3688, the following registration is 2301 requested to be made. 2303 URI: urn:ietf:params:xml:ns:yang:ietf-alarms 2305 Registrant Contact: The IESG. 2307 XML: N/A, the requested URI is an XML namespace. 
2309    This document registers a YANG module in the YANG Module Names
2310    registry [RFC6020].

2312       name:         ietf-alarms
2313       namespace:    urn:ietf:params:xml:ns:yang:ietf-alarms
2314       prefix:       al
2315       reference:    RFC XXXX

2317 9.  Security Considerations

2319    The YANG module specified in this document defines a schema for data
2320    that is designed to be accessed via network management protocols such
2321    as NETCONF [RFC6241] or RESTCONF [RFC8040].  The lowest NETCONF layer
2322    is the secure transport layer, and the mandatory-to-implement secure
2323    transport is Secure Shell (SSH) [RFC6242].  The lowest RESTCONF layer
2324    is HTTPS, and the mandatory-to-implement secure transport is TLS
2325    [RFC5246].

2327    The NETCONF access control model [RFC6536] provides the means to
2328    restrict access for particular NETCONF or RESTCONF users to a
2329    preconfigured subset of all available NETCONF or RESTCONF protocol
2330    operations and content.

2332    There are a number of data nodes defined in this YANG module that are
2333    writable/creatable/deletable (i.e., config true, which is the
2334    default).  These data nodes may be considered sensitive or vulnerable
2335    in some network environments.  Write operations (e.g., edit-config)
2336    to these data nodes without proper protection can have a negative
2337    effect on network operations.  These are the subtrees and data nodes
2338    and their sensitivity/vulnerability:

2340       /alarms/control/notify-status-change: This leaf controls whether
2341       notifications are sent only for alarm raise and clear or for all
2342       severity level changes.  Unauthorized access to this leaf could
2343       have a negative impact on operational procedures relying on
2344       fine-grained alarm state change reporting.

2346       /alarms/control/alarm-shelving/shelf: This list controls the
2347       shelving (blocking) of alarms.  Unauthorized access to this list
2348       could jeopardize the alarm management procedures since shelved
2349       alarms are not notified and are not part of the alarm list.
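As an illustration of how the data nodes listed above might be
protected, the following is a minimal sketch of a NETCONF access
control rule that denies write access to the alarm shelving
configuration.  It is an example under stated assumptions and not a
recommended policy: the rule-list name, the rule name and the group
"operators" are hypothetical (the group is assumed to be defined
elsewhere in the access control configuration), and the element names
are those of the ietf-netconf-acm module.

```xml
<!-- Sketch only: "alarm-operators", "deny-shelf-write" and the group
     "operators" are hypothetical names chosen for this example. -->
<nacm xmlns="urn:ietf:params:xml:ns:yang:ietf-netconf-acm">
  <rule-list>
    <name>alarm-operators</name>
    <group>operators</group>
    <rule>
      <name>deny-shelf-write</name>
      <module-name>ietf-alarms</module-name>
      <!-- Covers the shelf list and everything below alarm-shelving. -->
      <path xmlns:al="urn:ietf:params:xml:ns:yang:ietf-alarms">/al:alarms/al:control/al:alarm-shelving</path>
      <access-operations>create update delete</access-operations>
      <action>deny</action>
    </rule>
  </rule-list>
</nacm>
```

Read access is left untouched by this rule, so members of the group
can still inspect which alarms are shelved without being able to
change the shelving criteria.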
2351 Some of the RPC operations in this YANG module may be considered 2352 sensitive or vulnerable in some network environments. It is thus 2353 important to control access to these operations. These are the 2354 operations and their sensitivity/vulnerability: 2356 purge-alarms: This RPC deletes alarms from the alarm list. 2357 Unauthorized use of this RPC could jeopardize the alarm management 2358 procedures since the deleted alarms may be vital for the alarm 2359 management application. 2361 10. Acknowledgements 2363 The authors wish to thank Viktor Leijon and Johan Nordlander for 2364 their valuable input on forming the alarm model. 2366 The authors also wish to thank Nick Hancock, Joey Boyd, Tom Petch and 2367 Balazs Lengyel for their extensive reviews and contributions to this 2368 document. 2370 11. References 2372 11.1. Normative References 2374 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 2375 Requirement Levels", BCP 14, RFC 2119, 2376 DOI 10.17487/RFC2119, March 1997, . 2379 [RFC3688] Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688, 2380 DOI 10.17487/RFC3688, January 2004, . 2383 [RFC5246] Dierks, T. and E. Rescorla, "The Transport Layer Security 2384 (TLS) Protocol Version 1.2", RFC 5246, 2385 DOI 10.17487/RFC5246, August 2008, . 2388 [RFC6020] Bjorklund, M., Ed., "YANG - A Data Modeling Language for 2389 the Network Configuration Protocol (NETCONF)", RFC 6020, 2390 DOI 10.17487/RFC6020, October 2010, . 2393 [RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., 2394 and A. Bierman, Ed., "Network Configuration Protocol 2395 (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011, 2396 . 2398 [RFC6242] Wasserman, M., "Using the NETCONF Protocol over Secure 2399 Shell (SSH)", RFC 6242, DOI 10.17487/RFC6242, June 2011, 2400 . 2402 [RFC6991] Schoenwaelder, J., Ed., "Common YANG Data Types", 2403 RFC 6991, DOI 10.17487/RFC6991, July 2013, 2404 . 
2406 [RFC7950] Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language", 2407 RFC 7950, DOI 10.17487/RFC7950, August 2016, 2408 . 2410 [RFC8040] Bierman, A., Bjorklund, M., and K. Watsen, "RESTCONF 2411 Protocol", RFC 8040, DOI 10.17487/RFC8040, January 2017, 2412 . 2414 [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2415 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, 2416 May 2017, . 2418 [X.733] International Telecommunications Union, "Information 2419 Technology - Open Systems Interconnection - Systems 2420 Management: Alarm Reporting Function", 2421 ITU-T Recommendation X.733, 1992. 2423 11.2. Informative References 2425 [ALARMIRP] 2426 3GPP, "Telecommunication management; Fault Management; 2427 Part 2: Alarm Integration Reference Point (IRP): 2428 Information Service (IS)", 3GPP TS 32.111-2 3.4.0, March 2429 2005. 2431 [ALARMSEM] 2432 Wallin, S., Leijon, V., Nordlander, J., and N. Bystedt, 2433 "The semantics of alarm definitions: enabling systematic 2434 reasoning about alarms. International Journal of Network 2435 Management, Volume 22, Issue 3, John Wiley and Sons, Ltd, 2436 http://dx.doi.org/10.1002/nem.800", March 2012. 2438 [EEMUA] EEMUA Publication No. 191 Engineering Equipment and 2439 Materials Users Association, London, 2 edition., "Alarm 2440 Systems: A Guide to Design, Management and Procurement.", 2441 2007. 2443 [I-D.ietf-netmod-yang-tree-diagrams] 2444 Bjorklund, M. and L. Berger, "YANG Tree Diagrams", draft- 2445 ietf-netmod-yang-tree-diagrams-05 (work in progress), 2446 January 2018. 2448 [ISA182] International Society of Automation,ISA, "ANSI/ISA- 2449 18.2-2009 Management of Alarm Systems for the Process 2450 Industries", 2009. 2452 [RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management 2453 Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877, 2454 September 2004, . 2456 Appendix A. Vendor-specific Alarm-Types Example 2458 This example shows how to define alarm-types in a vendor-specific 2459 module. 
In this case the vendor "xyz" has chosen to define top level 2460 identities according to X.733 event types. 2462 module example-xyz-alarms { 2463 namespace "urn:example:xyz-alarms"; 2464 prefix xyz-al; 2466 import ietf-alarms { 2467 prefix al; 2468 } 2470 identity xyz-alarms { 2471 base al:alarm-type-id; 2472 } 2474 identity communications-alarm { 2475 base xyz-alarms; 2476 } 2477 identity quality-of-service-alarm { 2478 base xyz-alarms; 2479 } 2480 identity processing-error-alarm { 2481 base xyz-alarms; 2482 } 2483 identity equipment-alarm { 2484 base xyz-alarms; 2485 } 2486 identity environmental-alarm { 2487 base xyz-alarms; 2488 } 2490 // communications alarms 2491 identity link-alarm { 2492 base communications-alarm; 2493 } 2495 // QoS alarms 2496 identity high-jitter-alarm { 2497 base quality-of-service-alarm; 2498 } 2499 } 2501 Appendix B. Alarm Inventory Example 2503 This shows an alarm inventory, it shows one alarm type defined only 2504 with the identifier, and another dynamically configured. In the 2505 latter case a digital input has been connected to a smoke-detector, 2506 therefore the 'alarm-type-qualifier' is set to "smoke-detector" and 2507 the 'alarm-type-identity' to "environmental-alarm". 2509 2512 2513 2514 xyz-al:link-alarm 2515 2516 2517 /dev:interfaces/dev:interface 2518 2519 true 2520 2521 Link failure, operational state down but admin state up 2522 2523 2524 2525 xyz-al:environmental-alarm 2526 smoke-alarm 2527 true 2528 2529 Connected smoke detector to digital input 2530 2531 2532 2533 2535 Appendix C. Alarm List Example 2537 In this example we show an alarm that has toggled [major, clear, 2538 major]. An operator has acknowledged the alarm. 
2544        1
2545        2015-04-08T08:39:50.00Z
2549          /dev:interfaces/dev:interface[name='FastEthernet1/0']
2551        xyz-al:link-alarm
2554        2015-04-08T08:39:50.00Z
2555        false
2556        1.3.6.1.2.1.2.2.1.1.17
2557        2015-04-08T08:39:40.00Z
2558        major
2560          Link operationally down but administratively up
2564        major
2566          Link operationally down but administratively up
2571        cleared
2573          Link operationally up and administratively up
2578        major
2580          Link operationally down but administratively up
2585        ack
2586        joe
2587        Will investigate, ticket TR764999

2593 Appendix D.  Alarm Shelving Example

2595    This example shows how to shelve alarms.  We shelve alarms related to
2596    the smoke-detectors since they are being installed and tested.  We
2597    also shelve all alarms from FastEthernet1/0.

2605        FE10
2607          /dev:interfaces/dev:interface[name='FastEthernet1/0']
2611        detectortest
2612        xyz-al:environmental-alarm
2614          smoke-alarm

2621 Appendix E.  X.733 Mapping Example

2623    This example shows how to map a dynamic alarm type (alarm-type-
2624    identity=environmental-alarm, alarm-type-qualifier=smoke-alarm) to
2625    the corresponding X.733 event-type and probable cause parameters.

2632        xyz-al:environmental-alarm
2634          smoke-alarm
2636        quality-of-service-alarm
2637        777

2642 Appendix F.  Background and Usability Requirements

2644    This section gives background information regarding design choices in
2645    the alarm module.  It also defines usability requirements for alarms.
2646    Alarm usability is important for an alarm interface.  A data model
2647    helps in defining the format, but if the actual alarms are of low
2648    value, the goal of alarm management has not been achieved.

2650    The telecommunication domain has standardised an alarm interface in
2651    ITU-T X.733 [X.733].  This work continued for mobile networks within
2652    3GPP [ALARMIRP].  Although SNMP is the dominant mechanism for
2653    monitoring devices, the IETF did not standardise an alarm MIB early
2654    on.  Instead, management systems interpreted the enterprise-specific
2655    traps per MIB and device to build an alarm list.  When the Alarm MIB
2656    [RFC3877] was finally published, it had to address the existence of
2657    enterprise traps and map these into alarms.  This requirement led to
2658    a MIB that is not always easy to use.

2660 F.1.  Alarm Concepts

2662    There are two misconceptions regarding alarms and alarm interfaces
2663    that are important to sort out.  The first is that alarms are mixed
2664    up with events in general.  Alarms MUST correspond to an undesirable
2665    state that needs corrective action.  Many implementations of alarm
2666    interfaces do not adhere to this principle and just send events in
2667    general.  In order to qualify as an alarm, there must exist a
2668    corrective action.  If that is not true, it is an event that can go
2669    into logs.

2671    The other misconception is that the term "alarm" refers to the
2672    notification itself.  Rather, an alarm is a state of a resource in
2673    the system.  The alarm notifications report state changes of the
2674    alarm, such as alarm raise and alarm clear.

2676    "One of the most important principles of alarm management is that an
2677    alarm requires an action.  This means that if the operator does not
2678    need to respond to an alarm (because unacceptable consequences do not
2679    occur), then it is not an alarm.  Following this cardinal rule will
2680    help eliminate many potential alarm management issues."  [ISA182]

2682 F.1.1.  Alarm type

2684    Since every alarm has a corresponding corrective action, a vendor can
2685    prepare a list of available alarms and their corrective actions.
2686    We use the term "alarm type" to refer to every possible alarm that
2687    could be active in the system.
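The distinction between an alarm (a state) and its notifications
(reports of state changes) can be sketched with two notification
messages that concern one and the same alarm.  This is an
illustration only: the leaf names "time", "perceived-severity" and
"alarm-text" are assumed to be the ones provided by the
alarm-state-change-parameters grouping, and the "dev" namespace and
the timestamps are made up for the example.

```xml
<!-- Message 1, raise: the link-alarm state of the interface is "major".
     The urn:example:device namespace and all timestamps are hypothetical. -->
<notification xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">
  <eventTime>2015-04-08T08:20:10.00Z</eventTime>
  <alarm-notification
      xmlns="urn:ietf:params:xml:ns:yang:ietf-alarms"
      xmlns:xyz-al="urn:example:xyz-alarms"
      xmlns:dev="urn:example:device">
    <resource>/dev:interfaces/dev:interface[name='FastEthernet1/0']</resource>
    <alarm-type-id>xyz-al:link-alarm</alarm-type-id>
    <alarm-type-qualifier/>
    <time>2015-04-08T08:20:10.00Z</time>
    <perceived-severity>major</perceived-severity>
    <alarm-text>Link operationally down but administratively up</alarm-text>
  </alarm-notification>
</notification>

<!-- Message 2, clear: same resource and same alarm type, so this is a
     state change of the same alarm and not a new alarm. -->
<notification xmlns="urn:ietf:params:xml:ns:netconf:notification:1.0">
  <eventTime>2015-04-08T08:31:45.00Z</eventTime>
  <alarm-notification
      xmlns="urn:ietf:params:xml:ns:yang:ietf-alarms"
      xmlns:xyz-al="urn:example:xyz-alarms"
      xmlns:dev="urn:example:device">
    <resource>/dev:interfaces/dev:interface[name='FastEthernet1/0']</resource>
    <alarm-type-id>xyz-al:link-alarm</alarm-type-id>
    <alarm-type-qualifier/>
    <time>2015-04-08T08:31:45.00Z</time>
    <perceived-severity>cleared</perceived-severity>
    <alarm-text>Link operationally up and administratively up</alarm-text>
  </alarm-notification>
</notification>
```

Both messages carry the same resource and alarm type, so a management
system would update a single alarm entry rather than create a second
one.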
2689    Alarm types are also fundamental in order to provide a state-based
2690    alarm list.  The alarm list correlates alarm state changes for the
2691    same alarm type and the same resource into one alarm.

2693    Different alarm interfaces use different mechanisms to define alarm
2694    types, ranging from simple error numbers to more advanced mechanisms
2695    like the X.733 triplet of event type, probable cause and specific
2696    problem.

2698    A common misunderstanding is that individual alarm notifications are
2699    alarm types.  This is not correct; e.g., "link-up" and "link-down"
2700    are two notifications reporting different states for the same alarm
2701    type, "link-alarm".

2703 F.2.  Usability Requirements

2705    Common alarm problems and their causes are summarised in Table 1.
2706    This summary is adapted to networking from the ISA [ISA182] and
2707    EEMUA [EEMUA] standards.

2709    +------------------+--------------------------------+---------------+
2710    | Problem          | Cause                          | How this      |
2711    |                  |                                | module        |
2712    |                  |                                | addresses the |
2713    |                  |                                | cause         |
2714    +------------------+--------------------------------+---------------+
2715    | Alarms are       | "Nuisance" alarms (chattering  | Strict        |
2716    | generated but    | alarms and fleeting alarms),   | definition of |
2717    | they are ignored | faulty hardware, redundant     | alarms        |
2718    | by the operator. | alarms, cascading alarms,      | requiring     |
2719    |                  | incorrect alarm settings,      | corrective    |
2720    |                  | alarms have not been           | response.     |
2721    |                  | rationalised, the alarms       | Alarm         |
2722    |                  | represent log information      | requirements  |
2723    |                  | rather than true alarms.       | in Table 2.   |
2724    |                  |                                |               |
2725    | When alarms      | Insufficient alarm response    | The alarm     |
2726    | occur, operators | procedures and not well        | inventory     |
2727    | do not know how  | defined alarm types.           | lists all     |
2728    | to respond.      |                                | alarm types   |
2729    |                  |                                | and           |
2730    |                  |                                | corrective    |
2731    |                  |                                | actions.      |
2732    |                  |                                | Alarm         |
2733    |                  |                                | requirements  |
2734    |                  |                                | in Table 2.   |
2735    |                  |                                |               |
2736    | The alarm        | Nuisance alarms, stale alarms, | The alarm     |
2737    | display is full  | alarms from equipment not in   | definition    |
2738    | of alarms, even  | service.                       | and alarm     |
2739    | when there is    |                                | shelving.     |
2740    | nothing wrong.   |                                |               |
2741    |                  |                                |               |
2742    | During a         | Incorrect prioritization of    | State-based   |
2743    | failure,         | alarms.  Not using advanced    | alarm model,  |
2744    | operators are    | alarm techniques (e.g. state-  | alarm rate    |
2745    | flooded with so  | based alarming).               | requirements  |
2746    | many alarms that |                                | in Table 3    |
2747    | they do not know |                                | and Table 4   |
2748    | which ones are   |                                |               |
2749    | the most         |                                |               |
2750    | important.       |                                |               |
2751    +------------------+--------------------------------+---------------+

2753                     Table 1: Alarm Problems and Causes

2755    Based upon the above problems EEMUA gives the following definition of
2756    a good alarm:

2758    +----------------+--------------------------------------------------+
2759    | Characteristic | Explanation                                      |
2760    +----------------+--------------------------------------------------+
2761    | Relevant       | Not spurious or of low operational value.        |
2762    |                |                                                  |
2763    | Unique         | Not duplicating another alarm.                   |
2764    |                |                                                  |
2765    | Timely         | Not long before any response is needed or too    |
2766    |                | late to do anything.                             |
2767    |                |                                                  |
2768    | Prioritised    | Indicating the importance that the operator      |
2769    |                | deals with the problem.                          |
2770    |                |                                                  |
2771    | Understandable | Having a message which is clear and easy to      |
2772    |                | understand.                                      |
2773    |                |                                                  |
2774    | Diagnostic     | Identifying the problem that has occurred.       |
2775    |                |                                                  |
2776    | Advisory       | Indicative of the action to be taken.            |
2777    |                |                                                  |
2778    | Focusing       | Drawing attention to the most important issues.  |
2779    +----------------+--------------------------------------------------+

2781                    Table 2: Definition of a Good Alarm

2783    Vendors SHOULD rationalise all alarms according to the
2784    characteristics in Table 2.  Another crucial requirement is
2785    acceptable alarm rates.  Vendors SHOULD make sure that they do not
        exceed the recommendations from EEMUA below:

2787    +-----------------------------------+-------------------------------+
2788    | Long Term Alarm Rate in Steady    | Acceptability                 |
2789    | Operation                         |                               |
2790    +-----------------------------------+-------------------------------+
2791    | More than one per minute          | Very likely to be             |
2792    |                                   | unacceptable.                 |
2793    |                                   |                               |
2794    | One per 2 minutes                 | Likely to be over-demanding.  |
2795    |                                   |                               |
2796    | One per 5 minutes                 | Manageable.                   |
2797    |                                   |                               |
2798    | Less than one per 10 minutes      | Very likely to be acceptable. |
2799    +-----------------------------------+-------------------------------+

2801               Table 3: Acceptable Alarm Rates, Steady State

2803    +----------------------------+--------------------------------------+
2804    | Number of alarms displayed | Acceptability                        |
2805    | in 10 minutes following a  |                                      |
2806    | major network problem      |                                      |
2807    +----------------------------+--------------------------------------+
2808    | More than 100              | Definitely excessive and very likely |
2809    |                            | to lead the operator to abandon      |
2810    |                            | the use of the alarm system.         |
2811    |                            |                                      |
2812    | 20-100                     | Hard to cope with.                   |
2813    |                            |                                      |
2814    | Under 10                   | Should be manageable - but may be    |
2815    |                            | difficult if several of the alarms   |
2816    |                            | require a complex operator response. |
2817    +----------------------------+--------------------------------------+

2819                   Table 4: Acceptable Alarm Rates, Burst

2821    The numbers in Table 3 and Table 4 are the sum of all alarms for a
2822    network being managed from one alarm console.  So every individual
2823    system or NMS contributes to these numbers.

2825    Vendors SHOULD make sure that the following rules are used in
2826    designing the alarm interface:

2828    1.  Rationalize the alarms in the system to ensure that every alarm
2829        is necessary, has a purpose, and follows the cardinal rule - that
2830        it requires an operator response.  Ensure that the alarms adhere
2831        to the rules of Table 2.

2833    2.  Audit the quality of the alarms.  Talk with the operators about
2834        how well the alarm information supports them.  Do they know what
2835        to do in the event of an alarm?  Are they able to quickly
2836        diagnose the problem and determine the corrective action?  Does
2837        the alarm text adhere to the requirements in Table 2?

2839    3.  Analyze and benchmark the performance of the system and compare
2840        it to the recommended metrics in Table 3 and Table 4.  Start by
2841        identifying nuisance alarms, standing alarms at normal state and
2842        startup.

2844 Authors' Addresses

2846    Stefan Vallin
2847    Stefan Vallin AB

2849    Email: stefan@wallan.se
2850    Martin Bjorklund
2851    Cisco

2853    Email: mbj@tail-f.com