idnits 2.17.1

draft-thaler-gdt-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.
     Expected boilerplate is as follows today (2024-03-29) according to
     https://trustee.ietf.org/license-info :

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:

          This Internet-Draft is submitted in full conformance with the
          provisions of BCP 78 and BCP 79.

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:

          Copyright (c) 2024 IETF Trust and the persons identified as the
          document authors. All rights reserved.

       IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:

          This document is subject to BCP 78 and the IETF Trust's Legal
          Provisions Relating to IETF Documents
          (https://trustee.ietf.org/license-info) in effect on the date of
          publication of this document. Please review these documents
          carefully, as they describe your rights and restrictions with respect
          to this document. Code Components extracted from this document must
          include Simplified BSD License text as described in Section 4.e of
          the Trust Legal Provisions and are provided without warranty as
          described in the Simplified BSD License.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date. The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents.

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts.

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories.

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an IANA Considerations section. (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References. All references will be assumed normative when checking for
     downward references.

  ** The document seems to lack a both a reference to RFC 2119 and the
     recommended RFC 2119 boilerplate, even if it appears to use RFC 2119
     keywords.

     RFC 2119 keyword, line 58: '...y one or more of the MUST requirements...'
     RFC 2119 keyword, line 239: '...rent" ELS. This table SHOULD be saved...'
     RFC 2119 keyword, line 298: '...owC. The expert MAY also have one or ...'
     RFC 2119 keyword, line 319: '...the expert MUST fulfill the following ...'
     RFC 2119 keyword, line 395: '...server, an agent SHOULD first attempt ...'
     (22 more instances...)

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008. If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment. If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (24 January 1997) is 9926 days in the past. Is this
     intentional?

  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Looks like a reference, but probably isn't: 'Ack-Timeout' on line 647
  -- Looks like a reference, but probably isn't: 'Expert-Timeout' on line 651
  -- Looks like a reference, but probably isn't: 'Hypothesis-Period' on line
     656
  -- Looks like a reference, but probably isn't: 'Deletion-Timeout' on line
     1399
  -- Looks like a reference, but probably isn't: 'Origin-Timeout' on line 1394
  -- Looks like a reference, but probably isn't: 'SR-Period' on line 1403
  -- Looks like a reference, but probably isn't: 'Advertisement-Period' on
     line 1413
  -- Looks like a reference, but probably isn't: 'Capability-Holdtime' on line
     1802
  -- Looks like a reference, but probably isn't: 'Whois-Delay' on line 1416

  -- Possible downref: Non-RFC (?) normative reference: ref. '1'
  -- Possible downref: Non-RFC (?) normative reference: ref. '2'
  -- Possible downref: Non-RFC (?) normative reference: ref. '3'

     Summary: 10 errors (**), 0 flaws (~~), 1 warning (==), 14 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1  Internet Engineering Task Force                              Dave Thaler
2  INTERNET-DRAFT                                                     Merit
3  Expires July 1998                                        24 January 1997

5     Globally-Distributed Troubleshooting (GDT): Protocol Specification
6

8  Status of this Memo

10  This document is an Internet Draft. Internet Drafts are working
11  documents of the Internet Engineering Task Force (IETF), its Areas, and
12  its Working Groups. Note that other groups may also distribute working
13  documents as Internet Drafts.

15  Internet Drafts are valid for a maximum of six months and may be
16  updated, replaced, or obsoleted by other documents at any time. It is
17  inappropriate to use Internet Drafts as reference material or to cite
18  them other than as a "work in progress".

20  To learn the current status of any Internet-Draft, please check the
21  "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
22  Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
23  munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
24  ftp.isi.edu (US West Coast).

26  Abstract

28  This document describes a protocol for "globally-distributed
29  troubleshooting" (GDT). GDT automates, where possible, the process of
30  problem reporting and referral between customers and Internet Service
31  Providers (ISPs), as well as between ISPs. GDT also provides an
32  automatic mechanism for periodic status reports, and allows an ISP to
33  make information (such as expected time to repair) on a current problem
34  readily accessible to those directly affected by it, without requiring
35  human intervention.

37  Draft                  GDT Protocol Specification           January 1997

39  1. Introduction

41  1.1. Purpose

43  The GDT protocol automates, where possible, the process of problem
44  reporting and referral between customers and ISPs, as well as between
45  multiple ISPs. GDT provides an automatic mechanism for periodic status
46  reports, and allows an ISP to make information (such as expected time to
47  repair) on a current problem readily accessible to those directly
48  affected by it, without requiring human intervention.

50  1.2. Terminology

52  This document uses the same words as RFC 2119 for defining the
53  significance of each particular requirement. These words are:

55  MUST:
56     This word or the adjective "required" means that the item is an
57     absolute requirement of the specification. An implementation is not
58     compliant if it fails to satisfy one or more of the MUST requirements
59     for the protocols it implements.

61  SHOULD:
62     This word or the adjective "recommended" means there may exist valid
63     reasons in particular circumstances to ignore this item, but the full
64     implications should be understood and the case carefully weighed
65     before choosing a different course.

67  MAY:
68     This word or the adjective "optional" means that this item is truly
69     optional. One implementation may choose to include the item because
70     a particular application requires it or because it enhances the
71     product, for example; another implementation may omit the same item.

73  This document also uses the following technical terms:

75  agent:
76     An entity supporting the GDT protocol. There are three levels of
77     sophistication of agents: simple agents, experts, and expert location
78     servers (ELSs).

80  network element, or element:
81     A logical network object (e.g., an Ethernet, a process, or a TCP
85     connection).

87  area of expertise:
88     An identifier representing a specific set of network elements.

90  capability:
91     An (access mode, area of expertise) pair representing the knowledge
92     and permissions necessary to troubleshoot problems with a set of
93     network elements. Access mode may be either diagnosis-only or
94     diagnosis-and-repair.

96  expert:
97     An agent with one or more capabilities.

99  expert location server (ELS):
100     An agent which has the responsibility of locating experts with
101     capabilities covering a given problem.

103  hypothesis:
104     A guess that a specific problem, identified by a (problem
105     type, network element) pair, exists. Each hypothesis is submitted to
106     an expert with an area of expertise covering the given network
107     element.

109  problem type:
110     A general classification of problems. Problem types are classified
111     as follows.

113     Superficial problem types:

115     lowH
116        The element is experiencing degraded performance (low health).
117        This is the most general problem type, of which all others are
118        specific instances.

120     highU
121        The element is experiencing degraded performance due to unusually
122        high utilization (congestion), as opposed to the element being
123        down or malfunctioning.

125     Intermediate problem types:

127     lowerH
128        Degraded performance is due to degraded performance of an element
129        upon which the current element depends.

133     downstreamH
134        Degraded performance is due to degraded performance at one or more
135        downstream elements.

137     higherU
138        Unusually high utilization is due to unusually high resource
139        demands from a higher layer.

141     upstreamU
142        Unusually high utilization is due to unusually high throughput
143        demands from one or more upstream elements.

145  We assume that every problem observed is ultimately caused by the
146  presence of one or more of the following primary problem types at one or
147  more network elements:

149     highD
150        The element is demanding too many resources from other elements.

152     lowC
153        The element's capacity is insufficient to efficiently support
154        normal operation.

156     badHW
157        The hardware or software implementation is malfunctioning.

159  status:
160     Problem status values are classified as follows:

162     Unconfirmed
163        A test is in progress to confirm that the problem exists.

165     Diagnosis-Deferred
166        The test for the existence of the problem has been deferred until
167        a later time.

169     Rejected
170        The test failed, indicating that the problem does not exist, and
171        the problem state will shortly go away.

173     Indeterminate
174        Either no test is known, or the test was inconclusive, and the
175        problem state will shortly go away.

179     Confirmed
180        Existence of the problem is acknowledged, and causes are currently
181        being investigated.

183     Covered
184        Another problem is known to be causing the current problem, and
185        hence the expert is waiting for the cause to be repaired.

187     CantRepair
188        No repair is possible for the problem.

190     Isolated
191        A repair and retest are in progress.

193     Repair-Deferred
194        The repair has been deferred until a later time.

196     Repaired
197        The problem was successfully repaired, and the problem state will
198        shortly go away.

200     WentAway
201        The problem disappeared, and hence the problem state will shortly
202        go away.

204     Retesting
205        All previously-confirmed causes are gone, and a retest is in
206        progress.

208     Deleted
209        No state for the problem exists.

211  2. Protocol Overview

213  Problem reporting and resolution is a multi-phase procedure. In the
214  first phase, a Hypothesis message is submitted to an expert whose area
215  of expertise includes the network element which is experiencing
216  problems. Since there is no restriction on where such hypotheses may
217  originate, the expert next attempts to verify the existence of the
218  reported problem. If the problem is confirmed, the expert then
219  generates additional hypotheses about potential causes, which are in
220  turn submitted to appropriate experts. This process continues until one
221  or more problems are confirmed which have no causes. Repairs are then
222  requested for these problems. If repairs cannot be immediately
225  initiated, repairs are then attempted at their immediate effects, and so
226  on back down the cause tree.

228  2.1. Agent Requirements

230  There are three (logical) tables which must be present in some form in
231  all GDT agents. While there may be more efficient ways of representing
232  the same information, example representations are given below.

234  Capability Table
235     The entry for each known capability has, associated with the
236     capability itself, the IP address of an expert, and a Capability-
237     Timer. This table is initialized upon startup to hold the agent's
238     own capabilities (if any), plus at least one all-encompassing
239     (default) capability for a "parent" ELS. This table SHOULD be saved
240     to stable storage.

242  Hypothesis Table
243     This table holds information on problems which the agent believes may
244     exist and which it is waiting to have repaired or otherwise resolved.
245     The entry for each (proposed problem type, network element) pair has,
246     associated with it, the last known status and sequence number
247     received from a remote expert, an expert list, and an Expert-Timer.
248     The expert list is a set of experts with capabilities covering the
249     given hypothesis.

251  Reliable Message Table
252     This table keeps state for those messages which must be transmitted
253     reliably. The entry for each message to be sent reliably has,
254     associated with a given combination of message, destination IP
255     address, and port number, a transmission count and an Ack-Timer.

257  Any GDT agent may also be an expert. In addition to fulfilling all of
258  the requirements for simple agents, GDT experts have the following
259  additional table:

261  Problem Table
262     This table holds information on current problems, within the agent's
263     areas of expertise, on which the agent is currently working to
264     diagnose or repair. The entry for each active (problem type, network
265     element) pair has, associated with it, a problem status value, an
266     origin list, a cause list, and a Deletion-Timer. Each entry in the
267     origin list contains an IP address and port number from which a
268     Hypothesis about this problem was received, and an Origin-Timer. The
272     cause list is a set of hypotheses in the hypothesis table which are
273     potential causes of the given problem, each with an associated
274     Retracted-bit.

276  Finally, some experts may be Expert Location Servers (ELSs). In
277  addition to fulfilling all of the requirements for experts, ELSs have
278  the following additional table:

280  Child Table
281     This table holds information on the ELS's current children in the
282     hierarchy. The entry for each child has, associated with the child's
283     address, a maximal prefix length, an active prefix length, and a
284     Child-Timer.

286  2.1.1. Advertising Capabilities

288  An expert may advertise capabilities with an access mode of either
289  diagnosis-only or diagnosis-and-repair. To advertise a capability with
290  an access mode of diagnosis-only for some area of expertise, the expert
291  MUST fulfill the following requirements for any element within the area
292  of expertise:

294  (1) Be able to perform a test, or request that an operator perform a
295      test, to Confirm or Reject (or report that the test was
296      Indeterminate) a Hypothesis that the given network element is
297      experiencing each of the following problem types: lowH, highU,
298      highD, lowC. The expert MAY also have one or more tests for badHW.

300  (2) Be able to obtain the list of elements (if any) above the element
301      (i.e., those elements which depend upon the given element), or
302      report that the list resolution was Indeterminate.

304  (3) Be able to obtain the list of elements (if any) below the element
305      (i.e., those elements upon which the given element depends), or
306      report that the list resolution was Indeterminate.

308  (4) Be able to obtain the list of elements (if any) upstream of the
309      element (i.e., those adjacent elements which send data to the given
310      element), or report that the list resolution was Indeterminate.

312  (5) Be able to obtain the list of elements (if any) downstream from the
313      element (i.e., those adjacent elements to which the given element
314      sends data), or report that the list resolution was Indeterminate.

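The five diagnosis-only requirements above amount to an interface that a
domain-expertise module offers to the GDT Controller. The following Python
sketch is illustrative only and is not part of the specification: the names
Outcome, DiagnosisOnlyModule, StaticModule, test, and related are
hypothetical, and returning None to signal an Indeterminate list resolution
is an assumption.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple

# Problem types for which a diagnosis-only expert MUST have a test.
REQUIRED_TESTS = ("lowH", "highU", "highD", "lowC")


class Outcome(Enum):
    CONFIRMED = "Confirmed"
    REJECTED = "Rejected"
    INDETERMINATE = "Indeterminate"


class DiagnosisOnlyModule(ABC):
    """Hypothetical interface mirroring requirements (1) through (5)."""

    @abstractmethod
    def test(self, problem_type: str, element: str) -> Outcome:
        """Requirement (1): confirm or reject a hypothesis."""

    @abstractmethod
    def related(self, relation: str, element: str) -> Optional[List[str]]:
        """Requirements (2)-(5): relation is 'above', 'below', 'upstream',
        or 'downstream'. None means the resolution was Indeterminate."""


class StaticModule(DiagnosisOnlyModule):
    """Toy module backed by fixed tables, for demonstration only."""

    def __init__(self, unhealthy: Set[Tuple[str, str]],
                 topology: Dict[Tuple[str, str], List[str]]):
        self.unhealthy = unhealthy    # set of (problem type, element)
        self.topology = topology      # (relation, element) -> elements

    def test(self, problem_type: str, element: str) -> Outcome:
        if problem_type not in REQUIRED_TESTS:
            return Outcome.INDETERMINATE   # e.g., no badHW test available
        if (problem_type, element) in self.unhealthy:
            return Outcome.CONFIRMED
        return Outcome.REJECTED

    def related(self, relation: str, element: str) -> Optional[List[str]]:
        return self.topology.get((relation, element))


module = StaticModule(
    unhealthy={("lowH", "eth0")},
    topology={("above", "eth0"): ["ip-if-1"], ("below", "eth0"): []},
)
print(module.test("lowH", "eth0").value)    # Confirmed
print(module.related("upstream", "eth0"))   # None (Indeterminate)
```

An empty list and None are kept distinct on purpose: the former says that no
related elements exist, the latter that the module could not tell.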
318  To advertise a capability with an access mode of diagnosis-and-repair,
319  the expert MUST fulfill the following requirement in addition to those
320  listed above:

322  (1) Be able to request that an operator perform, report on the automatic
323      performance (if the element is self-correcting) of, or itself
324      perform each of the following types of repairs:

326      o When one element is imposing too many resource demands on another
327        element (higherU, upstreamU, highD), the amount of resources
328        available to it should be decreased.

330      o When capacity is insufficient to meet normal demand (lowC), the
331        capacity should be increased.

333      o When an element at a lower layer is faulty (lowerH), the higher
334        layer element may be reconfigured so that it no longer depends on
335        the faulty element (e.g., using a backup link instead).

337      o When a hardware fault or software bug exists (badHW), the problem
338        should be corrected, or the element replaced.

340            +----------+               +---------+
341            |   GDT    |<------------->|Scheduler|
342            |Controller|               +---------+
343            +----------+
344              ^  ...  ^  ...
345              |       +--------------------+
346              v                            v
347     +-----------------+          +-----------------+
348     | Test Supervisor |          | Test Supervisor |
349     +=================+ . . .    +=================+ . . .
350     |  List Resolver  |          |  List Resolver  |
351     +=================+          +-----------------+
352     |Repair Supervisor|            Diagnosis-Only
353     +-----------------+       Domain Expertise Modules
354    Diagnosis-and-Control
355  Domain Expertise Modules

357  Figure 1: Logical Architecture of a GDT Expert

359  Figure 1 shows the logical architecture of a single GDT expert. This
360  document specifies the behavior of the (logical) GDT Controller module.
361  A (logical) scheduler is used to schedule tests and repairs. For each
364  of its capabilities, if any, the agent has a (logical) domain-expertise
365  module which is able to conduct tests, resolve lists of related
366  elements, and potentially supervise repairs.

368  2.2. Expert Location

370  Expert location servers are organized into a hierarchy by location,
371  where the leaves are actual experts. At startup time, experts and
372  location servers locate an appropriate parent, and add themselves to the
373  hierarchy. Agents starting up also locate one or more appropriate
374  servers and install default entries for these servers in their local
375  capability cache.

377  To do this, each expert location server is configured with a "maximal
378  policy prefix", denoting the range of addresses of experts for which it
379  is willing to be a parent. (This allows some policy control over the
380  server hierarchy.) Root servers should have a maximal policy prefix of
381  0/0. The length of the associated mask will be called the "maximal mask
382  length".

384  A hierarchy of experts is then constructed dynamically subject to the
385  policy constraints. The hierarchy thus constructed has the following
386  properties which prevent loops in the hierarchy:

388  o Every expert has an "active mask length" which is the longer of: its
389    own maximal mask length, and its parent's active mask length + 1.

391  o Every child has an address within its parent's active policy prefix.

393  An analysis of this scheme can be found in [1].

395  To locate an initial server, an agent SHOULD first attempt to locate a
396  nearby server using an expanding-ring multicast search over a well-known
397  group address to which all experts listen. If this search fails, the
398  agent then uses a manually-configured list of servers, such as a list of
399  local servers.

401  2.3. Reliable Message Transport

403  All messages are sent using UDP. Experts listen on a well-known port
404  number, while clients may use any local port number. Since GDT is a
405  soft-state, connectionless protocol, some messages must be retransmitted
406  to achieve reliability. Hypothesis and Retract messages are always
409  acknowledged by receiving Status-Report messages. Status-Report
410  messages are sometimes sent reliably, and are acknowledged by Status-Ack
411  messages. Each message also contains a 16-bit sequence number to detect
412  out-of-order packets.

414  When a message is to be sent reliably, an entry for it is created in the
415  (logical) reliable message table. The message will then be transmitted
416  up to three times at intervals of [Ack-Timeout] seconds before giving
417  up.

419  3. Basic Behavior

421  In this section, we describe the detailed protocol which all GDT agents
422  must perform. Packet formats are described in Section 6.

424  3.1. Startup

426  When an agent first starts up, it must determine one or more expert
427  location servers and create default entries in its capability cache for
428  them. To do this, it SHOULD first join the RootAdv group and then
429  perform an expanding-ring multicast to locate a nearby server; if this
430  fails, it may wait until an Am-Root message is received and use the
431  root.

433  To perform an expanding-ring search, the agent joins the GDT-Server-
434  Location group, and multicasts periodic Whois-Server messages to the
435  All-GDT-Servers group with increasing TTLs until either a maximum TTL is
436  reached, or an Am-Server message is received from a legal parent.

438  When an Am-Server message is received from a legal parent (and the agent
439  is not an ELS), the agent may leave the GDT-Server-Location group. The
440  source of the message is then chosen as the agent's parent, and the
441  rules in Section 3.2 are followed.

443  3.2. Setting One's Parent

445  Whenever an agent changes its parent, any existing "default" entries are
446  first removed from the capability table. If the new parent is not null,
447  then a default entry for it is added to the capability table.

451  3.3. Detecting a problem and sending a Hypothesis

453  Problems are detected in an implementation-dependent manner (e.g., by
454  other protocols). When a problem is observed, the agent identifies the
455  network element experiencing a problem (element naming is discussed in a
456  separate document [2]), and checks its hypothesis table for an existing
457  hypothesis entry.

459  If an existing entry is found:

461  (1) If the agent is an expert, and the hypothesis was proposed by an
462      expert as the cause of a problem in the problem table, processing
463      continues (for that problem only) as if a Status-Report had been
464      received with the last known status of the hypothesis entry, using
465      the rules in Section 4.7.

467  (2) If the agent is not an expert, then the problem has already been
468      reported, and nothing further need be done at this point.

470  If no entry is found in the hypothesis table:

472  (1) If an entry exists in the reliable message table for a Retract of
473      the same Hypothesis, the entry is deleted.

475  (2) A new hypothesis entry is created with a last known status of
476      Unconfirmed.

478  (3) The agent then checks its capability table for a list of one or
479      more experts whose capabilities cover that element, and places them
480      in the expert list of the hypothesis entry. A capability covers an
481      element if all required attributes match, and no optional
482      attributes conflict.

484  (4) If no experts were found, processing continues as if a Status-
485      Report with a status value of Indeterminate (which we will refer
486      to as an SR:Indeterminate message) had been received (Sections
487      3.6.2 and 4.7.2).

489  (5) If any experts were found with diagnosis-and-repair capability, a
490      Hypothesis message is reliably sent to one of those experts. If
491      all experts found had diagnosis-only capability, a Hypothesis
492      message is reliably sent to one of them. To choose among multiple
493      equivalent experts, the following algorithm is employed (an
494      analysis of which can be found in [3]):

498      (a) Compute the CRC-32 checksum (X) of the network element
499          identifier (as discussed in [2]).

501      (b) For each possible expert address E_i, compute a value:
502             Value(X,E_i) = (1103515245 * ((1103515245 * E_i + 12345) XOR X)
503                            + 12345) mod 2^31

505      (c) The expert with the highest resulting value is then chosen as
506          the destination expert. This algorithm ensures that all agents
507          reporting the same problem use the same destination expert as
508          long as they see the same set of expert capabilities, while
509          dividing up problems among equivalent experts.

511  3.4. Sending a Retract

513  Any agent terminating gracefully SHOULD retract all outstanding hypotheses.
514  An agent MAY also retract any outstanding hypothesis at any time. (For
515  example, as discussed in Section 4.7.3, experts retract hypotheses as a
516  result of another hypothesis being confirmed.)

518  When an agent wishes to retract a hypothesis, it sends a Retract message
519  to the same expert to which it sent the Hypothesis message. Unless the
520  agent is terminating, this Retract is sent reliably. The corresponding
521  entry is then deleted from the Hypothesis Table. If an entry exists in
522  the reliable message table for the original Hypothesis message, that
523  entry is deleted as well.

525  3.5. Receiving a Redirect

527  When an agent receives a Redirect containing a list of capabilities, it
528  MAY add the included capabilities to its capability table.

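As an aside on the expert-selection computation in step (5) of Section 3.3,
the calculation can be sketched in Python as follows. This is a minimal
illustration, not a normative implementation: the function names are
hypothetical, zlib.crc32 is assumed to be the intended CRC-32, expert
addresses are assumed to be IPv4 addresses taken as 32-bit integers, and ties
(which the text does not address) are broken here by the higher address.

```python
import ipaddress
import zlib
from typing import Iterable, Tuple

MULTIPLIER = 1103515245
INCREMENT = 12345
MODULUS = 2 ** 31


def expert_value(x: int, expert: int) -> int:
    """Value(X, E) as written in step (b)."""
    return (MULTIPLIER * ((MULTIPLIER * expert + INCREMENT) ^ x)
            + INCREMENT) % MODULUS


def choose_expert(element_id: bytes, experts: Iterable[str]) -> str:
    """Return the expert address with the highest Value(X, E)."""
    x = zlib.crc32(element_id)                  # step (a)

    def key(addr: str) -> Tuple[int, int]:
        e = int(ipaddress.IPv4Address(addr))
        return (expert_value(x, e), e)          # tie-break is an assumption

    return max(experts, key=key)                # step (c)


experts = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
chosen = choose_expert(b"example-element", experts)
# Agents that see the same expert set pick the same expert, regardless of
# the order in which they learned the capabilities.
assert chosen == choose_expert(b"example-element", reversed(experts))
```

Because the result depends only on the element identifier and the set of
candidate addresses, no coordination between reporting agents is needed.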
530 The agent then searches for a hypothesis entry which matches the problem 531 type and network element in the Redirect. If none is found, the Redirect 532 is dropped. Otherwise, the redirect origin is removed from the 533 hypothesis' list of experts, the experts listed in the Redirect are 534 added to the list, the Hypothesis is reliably sent to one of the 535 experts, and the Expert-Timer is cancelled if it was running. 537 Draft GDT Protocol Specification January 1997 539 3.6. Receiving a Status-Report (SR) 541 When an agent receives a Status-Report, it does the following: 543 (1) If the Ack-Request bit is set, a Status-Ack is sent to the origin 544 of the message. 546 (2) The agent then searches for a matching hypothesis entry. If none is 547 found, or if the source of the SR was not the expert to which the 548 matched hypothesis was sent, the SR is silently dropped and no 549 further processing of the Status Report is done. Otherwise, 551 (3) The hypothesis entry's Expert-Timer is restarted. 553 (4) If an entry exists in the (logical) reliable message table for the 554 Hypothesis, the reliable message entry is deleted. 556 (5) If the status in the SR is neither Unconfirmed nor Diagnosis- 557 Deferred, and an entry exists in the reliable message table for a 558 Retract of the hypothesis, then the reliable message entry is 559 deleted. If the status in the SR is Deleted, the hypothesis state 560 is deleted (see Section 4.9). 562 When a problem whose status is being reported is the same as the problem 563 in the hypothesis and the signed 16-bit difference between the included 564 sequence number and the stored sequence number is positive, the new 565 status and sequence number are stored in the hypothesis entry, and 566 additional processing for some SR's is done as follows: 568 3.6.1. Receiving SR:Rejected 570 When a SR:Rejected message is received, the matched hypothesis state is 571 deleted (see Section 4.9). 573 3.6.2. 
Receiving SR:Indeterminate 575 When a SR:Indeterminate message is received, the expert is removed from 576 the matched hypothesis entry's expert list. If any experts remain in 577 the expert list, a Hypothesis message is sent to one of them, the 578 Expert-Timer is cancelled if it was running, and the matched hypothesis 579 is marked as Unconfirmed. If the expert list is empty, the hypothesis 580 entry's Expert-Timer is stopped; in addition, if no problem table 581 entries have the hypothesis in the cause list, the hypothesis state is 582 Draft GDT Protocol Specification January 1997 584 deleted (see Section 4.9). 586 3.6.3. Receiving SR:Repaired, SR:CantRepair, or SR:WentAway 588 When a SR:Repaired, or SR:CantRepair, or SR:WentAway message is 589 received, the hypothesis entry's Expert-Timer is stopped. If the agent 590 is not an expert (or the agent is an expert and no problem entries have 591 the hypothesis in the cause list), the hypothesis state is deleted (see 592 Section 4.9). 594 3.7. Timers 596 Timers are implemented in an implementation-specific manner. For 597 example, a timer may count up or down, or may simply expire at a 598 specific time. Setting a timer to a value T means that it will expire 599 after T seconds. 601 Ack-Timer: 602 An Ack-Timer is kept for each reliable message entry. It is 603 initialized to [Ack-Timeout] when the entry is created (i.e., when a 604 message is to be sent reliably). It is cancelled if the entry is 605 deleted (i.e., when an acknowledgement is received). The first and 606 second times it expires, the timer is reset to [Ack-Timeout], and the 607 message is resent. If it expires a third time, and the message sent 608 was a Status-Report, the destination is removed from the origin list 609 of the matching problem state. If it expires a third time, and the 610 message sent was a Hypothesis, processing continues as if an 611 SR:Indeterminate message had been received from the destination. 
If 612 it expires a third time, and the message sent was a Retract, the 613 hypothesis entry is deleted (see Section 4.9). If it expires a third 614 time, and the destination was the current parent, the agent expires 615 all state for its old parent, resets its parent to be null (if it is 616 not an expert) or equal to the root (if it is an expert), and repeats 617 the parent selection process in Section 3.1. 619 Expert-Timer: 620 An Expert-Timer is associated with each hypothesis entry. It is set 621 to [Expert-Timeout] whenever a Status-Report message is received with 622 a matching problem type and network element. If it expires, the 623 hypothesis is processed as if an SR:Indeterminate message had been 624 received (Sections 3.6.2 and 4.7.2), and a new hypothesis is reported 625 of lowH with unicast connectivity between the local agent and the 626 remote expert (Section 3.3) unless this (new) problem is identical to 628 Draft GDT Protocol Specification January 1997 630 the hypothesis whose Expert-Timer expired. 632 Hypothesis-Timer: 633 At startup time, the Hypothesis-Timer is initialized to a random 634 value between 0 and [Hypothesis-Period] seconds. When it expires, 635 the timer is immediately reset to [Hypothesis-Period] seconds, and a 636 Hypothesis is sent to the first expert in the expert list of each 637 hypothesis state entry. 639 Capability-Timer: 640 For each non-local capability in the capability table, its 641 Capability-Timer is reset to the associated Holdtime in any Redirects 642 or (if the agent is an ELS) Am-Child messages received which contain 643 it. When it expires, the capability entry is deleted. 645 3.7.1. Default Values 647 [Ack-Timeout] 648 The time after which a message will be resent unless an 649 acknowledgement was received. Default: 5 seconds. 651 [Expert-Timeout] 652 The time after which a hypothesis will be marked Indeterminate unless 653 a Status-Report for it is received. 
     Default: 190 (= default [SR-Period]*3 + 10) seconds.

[Hypothesis-Period]
     The time between sending periodic Hypothesis messages for all
     hypotheses. Default: 300 seconds.

4. Expert Behavior

In addition to following all of the rules for simple agents, GDT experts
must do additional processing as follows.

4.1. Startup

The root and parent are both initialized to be null, and the expert
joins the RootAdv group. The root will be set as soon as an Am-Root
message is received from a legal parent.

The expert SHOULD also begin an expanding-ring search, as described in
Section 3.1. The parent will be set as soon as an Am-Root or Am-Server
message is received from a legal parent.

4.2. Receiving an Am-Root message

When an Am-Root message is received, the following actions are taken.

The source's address and active mask length are extracted from the
message. If the agent's own address does not fall within this prefix,
the message is silently dropped and no further processing is done.

If the agent has no root state, or if the source's maximal mask length
is less than the stored root's maximal mask length, or if the mask
lengths are equal but the source has a lower address than the stored
root, then:

(1) The source's information is stored as the new root.

(2) If the agent has no parent, the source's information is also stored
    as the new parent, and the actions in Section 3.2 are performed.

If the source is the current root, the Root-Timer is restarted.

4.3. Receiving a Transfer

When an expert receives a Transfer, it searches its reliable message
table for an Am-Child message with the same sequence number as in the
acknowledgement. If none is found, the Transfer is silently
dropped, and no further processing is done.

Otherwise, the reliable message entry is deleted.

The expert then checks to see whether the message was sent by its
current parent. If not, the message is silently dropped, and no further
processing is done.

The new parent's address, maximal prefix length, and active prefix
lengths are then extracted from the message and stored, and the actions
in Section 3.2 are performed.

Finally, it reliably sends an Am-Child message to its new parent.

4.4. Receiving an Am-Parent message

When an Am-Parent message is received, the reliable message table is
searched for an Am-Child message with the same sequence number as in the
acknowledgement. If none is found, the Am-Parent message is silently
dropped, and no further processing is done.

Otherwise, the reliable message entry is deleted.

The expert then checks to see whether the message was sent by its
current parent. If not, the message is silently dropped, and no further
processing is done.

The Parent-Timer is then refreshed.

4.5. Receiving a Hypothesis message

When an expert receives a Hypothesis message, it first checks to see if
one of its own capabilities covers the given hypothesis. If a local
capability was found, the expert searches for any matching problem
entry, and proceeds as follows:

(1) If a matching problem entry was found, the origin of the Hypothesis
    is added to the problem entry's origin list, and a Status-Report
    with the problem's current status is then returned to the origin.
    Else,

(2) If no matching entry was found, a new problem entry is created with
    the problem type and element copied from the Hypothesis received.
    The problem status is set to Unconfirmed, and the origin of the
    Hypothesis included in the entry's origin list. The expert then
    schedules a test of the Hypothesis (see Section 4.10.1).
    Finally, if the problem status is still Unconfirmed after the test is
    scheduled, an SR:Unconfirmed message is (unreliably) sent to the
    origin.

If no local capability was found, it next checks its capability table
for any other experts with capabilities covering the given hypothesis.
If none are found, an SR:Indeterminate message is (unreliably) sent to
the origin of the Hypothesis.

If no local capability was matched, but one or more remote experts were
found, the expert must either act as a proxy and forward the Hypothesis,
or reply to the origin with a Redirect message containing the list of
remote experts. In either case, the list of experts is first ordered in
the same manner described in Section 3.3.

4.6. Receiving a Retract message

When an expert receives a Retract message, it first searches for a
matching problem entry. If a problem is found whose status is Confirmed,
Covered, Retesting, Isolated, or Repair-Deferred, the Retract is
silently dropped.

Otherwise, if a problem entry was found, the origin is removed from its
origin list. If the origin list becomes null for an entry whose status
is Diagnosis-Deferred, the test SHOULD be unscheduled, a Retract
reliably sent for any hypothesis in the cause list which is marked as
Unconfirmed or Diagnosis-Deferred, and the problem entry deleted. If the
origin list becomes null for a problem entry whose status is
Unconfirmed, the test in progress MAY be aborted, a Retract reliably
sent for any hypothesis in the cause list which is marked as Unconfirmed
or Diagnosis-Deferred, and the entry deleted.

Finally, an SR:Deleted message is sent to the origin of the Retract.

4.7.
Receiving a Status-Report (SR)

When a Status-Report is received, in addition to following the rules
given in Section 3.6, an expert also follows the rules outlined in this
section for each problem entry, P, which previously (before any
deletions described in Section 3.6) contained the matched
hypothesis, H, in its cause list.

(1) If the status in the Status-Report is not Deleted, Unconfirmed, or
    Diagnosis-Deferred, P's Retract-bit for H is cleared.

(2) If P is the same as the problem (R) whose status was reported in
    the SR, then a circular dependency must exist. To break the loop in
    the cause tree, the hypothesis H SHOULD be deleted (see Section
    4.9).

If the Relay-bit was set in the received message, the Status-Report is
relayed to all origins of problems with the hypothesis in the cause
list. Furthermore, if the Ack-Request bit was set in the message
received, the message is relayed reliably.

When a problem whose status is being reported (R) is the same as the
problem in the hypothesis H, and the signed 16-bit difference between
the included sequence number and the stored sequence number is positive,
additional processing is done as described below.

4.7.1. Receiving SR:Diagnosis-Deferred

If there are any other hypotheses in the cause list which were not
previously submitted, the expert may submit one or more of them.

4.7.2. Receiving SR:Indeterminate

If, after normal processing, no alternative experts exist, and P's
status is Confirmed, Covered, or CantRepair, and all hypotheses left in
P's cause list have been marked as Indeterminate:

(1) If there is only one hypothesis in the cause list, then this
    hypothesis (H) is used below.
    Otherwise, if a hypothesis (H) of
    badHW of the same element is not already in the cause list, one is
    created and added to the cause list, and an entry is created for it
    in the problem table as well.

(2) If the expert (if any) for H is the local agent itself, then the
    associated problem table entry for H is treated as having been
    confirmed (see Section 4.10.5). Otherwise, H is marked as
    Repair-Deferred, and a repair is scheduled for P (see Section 4.12.1).

If, after normal processing, no alternative experts exist, and P's
status is Unconfirmed, Diagnosis-Deferred, or Intermediate, and all
hypotheses left in P's cause list have been marked as Indeterminate, the
problem status is set to Indeterminate, and a Status-Report with the
Ack-Request bit set is reliably sent to all origins. The problem entry
is then scheduled for deletion by setting its Deletion-Timer to
[Deletion-Timeout].

4.7.3. Receiving SR:Confirmed, SR:Covered, SR:Retesting, or
SR:Isolated

(1) If the current problem status is either Unconfirmed or
    Diagnosis-Deferred, the problem status is set to Confirmed, and an
    SR:Confirmed message with the Relay and Ack-Request bits set is
    reliably sent to all origins.

(2) For each Unconfirmed or Diagnosis-Deferred cause of the current
    problem, if the Retract-bit was not set, the Retract-bit is set.
    If, as a result, all problem entries with the hypothesis in the
    cause list have the Retract-bit set, then a Retract message is
    reliably sent to the same expert to which the Hypothesis was
    submitted.

(3) If the current problem status is then Confirmed, it is changed to
    Covered.

(4) If the current problem status is Repair-Deferred, the scheduled
    repair SHOULD be unscheduled and the status changed to Covered.

4.7.4.
Receiving SR:Repair-Deferred

(1) If the current problem status is either Unconfirmed or
    Diagnosis-Deferred, the problem status is set to Confirmed, and an
    SR:Confirmed message with the Relay and Ack-Request bits set is
    reliably sent to all origins.

(2) If the current problem status is Confirmed, it is changed to
    Covered.

(3) If P's status is Covered, and all hypotheses in the cause list have
    been marked as Indeterminate or Repair-Deferred, then a repair is
    scheduled for P (see Section 4.12.1).

4.7.5. Receiving SR:Repaired

(1) If the current problem status is either Unconfirmed or
    Diagnosis-Deferred, the problem status is set to Confirmed, and an
    SR:Confirmed message with the Relay and Ack-Request bits set is
    reliably sent to all origins.

(2) If the current problem status is Confirmed, it is changed to
    Covered.

(3) If the current problem status is Isolated, the repair in progress
    MAY be aborted and the state changed to Covered.

(4) If the current problem status is Repair-Deferred, the scheduled
    repair SHOULD be unscheduled and the state changed to Covered.

(5) If the current problem status is Covered, it is changed to
    Retesting, and a retest is scheduled (Section 4.10.1).

(6) If the current problem status is Retesting, the problem entry is of
    intermediate type, and there are no other hypotheses in the cause
    list which are marked as Confirmed, Covered, Retesting,
    Repair-Deferred, or Isolated, then the problem entry's status is set to
    Repaired, a Status-Report with the Ack-Request bit set is reliably
    sent to all agents in the entry's origin list, and the problem
    entry is scheduled for deletion by setting its Deletion-Timer to
    [Deletion-Timeout]. In addition, for any hypotheses in the cause
    list whose last known status is Unconfirmed or Diagnosis-Deferred,
    the associated Retract-bit is set.
    If, as a result, all problem
    entries with the hypothesis in the cause list have the Retract-bit
    set, then a Retract message is reliably sent to the same expert to
    which the Hypothesis was submitted.

4.7.6. Receiving SR:WentAway

(1) If the current problem status is either Unconfirmed or
    Diagnosis-Deferred, the problem status is set to Confirmed, and an
    SR:Confirmed message with the Relay and Ack-Request bits set is
    reliably sent to all origins.

(2) If the current problem status is Confirmed, it is changed to
    Covered.

(3) If the current problem status is Isolated, the repair in progress
    MAY be aborted and the state changed to Covered.

(4) If the current problem status is Repair-Deferred, the scheduled
    repair SHOULD be unscheduled and the state changed to Covered.

(5) If the current problem status is Covered, it is changed to
    Retesting, and a retest is scheduled (Section 4.10.1).

(6) If the current problem status is Retesting, the problem entry is of
    intermediate type, and there are no other hypotheses in the cause
    list which are marked as Confirmed, Covered, Retesting,
    Repair-Deferred, or Isolated, then the problem entry's status is set to
    WentAway, a Status-Report with the Ack-Request bit set is reliably
    sent to all agents in the entry's origin list, and the problem
    entry is scheduled for deletion by setting its Deletion-Timer to
    [Deletion-Timeout]. In addition, for any hypotheses in the cause
    list whose last known status is Unconfirmed or Diagnosis-Deferred,
    the associated Retract-bit is set. If, as a result, all problem
    entries with the hypothesis in the cause list have the Retract-bit
    set, then a Retract message is reliably sent to the same expert to
    which the Hypothesis was submitted.

4.8.
Receiving a Status-Ack

When a Status-Ack is received, the reliable message table is searched
for a Status-Report with the same sequence number as in the
acknowledgement. If none is found, the Status-Ack is silently dropped.
Otherwise, the reliable message entry is deleted.

4.9. Deleting hypothesis state

Whenever hypothesis state is deleted at an expert, the following actions
are performed for each problem entry, P, with the hypothesis in its
cause list:

(1) Remove the hypothesis from the cause list.

(2) If P's status is Unconfirmed or Diagnosis-Deferred (which may
    happen with intermediate problem types), and the cause list is
    empty, the problem status is set to Rejected, and a Status-Report
    with the Ack-Request bit set is reliably sent to all origins. The
    problem entry is then scheduled for deletion by setting its
    Deletion-Timer to [Deletion-Timeout].

(3) If P's status is Unconfirmed or Diagnosis-Deferred (which may
    happen with intermediate problem types), and its cause list
    contains one or more hypotheses, all of which have been marked as
    Indeterminate, the problem status is set to Indeterminate, and a
    Status-Report with the Ack-Request bit set is reliably sent to all
    origins. The problem entry is then scheduled for deletion by
    setting its Deletion-Timer to [Deletion-Timeout].

(4) If P's status is Confirmed and the cause list is empty, a repair is
    scheduled (Section 4.12.1).

(5) If P's status is Confirmed and the cause list is non-empty, P is
    treated as if an SR:Indeterminate message had been received for a
    cause, by following the rules given in Section 4.7.2.

4.10. Supervising Tests

4.10.1. Scheduling a test

A test is scheduled for a problem whenever a new Hypothesis is received
or an SR:Repaired or SR:WentAway message is received for a cause.

To schedule a test, the following steps are performed:

(1) If the current problem status is Covered, it is changed to
    Retesting.

(2) If the problem type is higherU, upstreamU, downstreamH, or lowerH,
    a resolution is begun for the higher list, upstream list,
    downstream list, or lower list, respectively. If the current
    problem status is Unconfirmed, and resolution is done in a
    non-blocking fashion, then the problem status is changed to
    Diagnosis-Deferred.

(3) If the problem is a primary or superficial type, an expert may
    elect to begin a test immediately, or to defer it until a later
    time. (This decision is made by the Scheduler in an
    implementation-specific manner). If the current problem status is
    Unconfirmed, and the test was deferred, the problem status is
    changed to Diagnosis-Deferred.

4.10.2. When the time for a deferred test arrives

If the current problem status is Diagnosis-Deferred, the problem status
is set to Unconfirmed. The test is then begun.

4.10.3. When a test completes, with negative results

When a test completes, rejecting the hypothesis tested:

(1) If the problem status was Unconfirmed, the status is set to
    Rejected.

(2) If the problem status was Retesting, then the status is set to
    Repaired if any cause was marked as Repaired, and to WentAway
    otherwise.

(3) If the problem status was Repair-Deferred, the status is set to
    WentAway.

(4) If the problem status was Isolated, the status is set to Repaired.

If the problem status changed, a Status-Report with the Ack-Request bit
set is reliably sent to all agents in the entry's origin list.

The problem entry is then scheduled for deletion by setting its
Deletion-Timer to [Deletion-Timeout].
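
As an illustration only, the status changes in items (1) through (4) of
Section 4.10.3 can be sketched as a small lookup function. The function
name, the use of plain strings for status values, and the decision to
leave any other status unchanged are assumptions made for this sketch;
the specification does not define an API.

```python
def status_after_negative_test(status, any_cause_repaired=False):
    """Return the new problem status after a test rejects the hypothesis.

    Hypothetical helper following items (1)-(4) of Section 4.10.3.
    """
    if status == "Unconfirmed":
        return "Rejected"
    if status == "Retesting":
        # Repaired if any cause was marked as Repaired, WentAway otherwise.
        return "Repaired" if any_cause_repaired else "WentAway"
    if status == "Repair-Deferred":
        return "WentAway"
    if status == "Isolated":
        return "Repaired"
    # Assumption: statuses not listed in the rules are left unchanged.
    return status
```

A caller would compare the returned value with the old status to decide
whether a Status-Report must be sent to the origins.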

For any hypotheses in the cause list whose last known status is not
Indeterminate, Rejected, WentAway, or Repaired, the Retract-bit is set.
If, as a result, all problem entries with the hypothesis in the cause
list have the Retract-bit set, then a Retract message is reliably sent
to the same expert to which the Hypothesis was submitted.

4.10.4. When a test is indeterminate

When a test completes, and the result is indeterminate, or when no test
can be done:

(1) The problem status is set to Indeterminate, and an SR:Indeterminate
    message with the Ack-Request bit set is reliably sent to all agents
    in the problem entry's origin list.

(2) The problem entry is then scheduled for deletion by setting its
    Deletion-Timer to [Deletion-Timeout].

4.10.5. When a test completes, with positive results

When a test completes, confirming a hypothesis:

(1) If the previous problem status was Unconfirmed, the problem status
    is set to Confirmed, a triggered SR:Confirmed with the Relay-bit
    set is reliably sent to all origins of the problem entry, and
    causal hypotheses are generated (see Section 4.10.6). Else,

(2) If the previous problem status was Retesting, the problem status is
    set to Confirmed, and causal hypotheses are generated (see Section
    4.10.6). Else,

(3) If the previous problem status was Repair-Deferred, the problem
    status is set to Isolated and a repair is immediately initiated.
    Else,

(4) If the previous problem status was Isolated:

    If other possible repairs exist, the next repair MAY be scheduled
    (Section 4.12.1).

    Otherwise, the problem status is set to Confirmed and causal
    hypotheses are generated (see Section 4.10.6).
    (This covers the case
    where a new cause arose during the repair, but can result in
    redoing the same repair when multiple repairs are possible, unless
    the expert remembers that it has just tried the repair and failed.)

4.10.6. Generating causal hypotheses

If the problem type is lowH, hypotheses of highU, lowerH, and
downstreamH (and optionally badHW) are generated about the current
element. These hypotheses are added to the problem entry's cause list,
and one or more (recommend all) of them are submitted to itself (Section
3.3).

If the problem type is highU, hypotheses of upstreamU, higherU, lowC,
and highD (and optionally badHW) are generated about the current
element. These hypotheses are added to the problem entry's cause list,
and one or more (recommend all) of them are submitted to itself (Section
3.3).

If the problem type is lowC, highD, or badHW, no hypotheses are
submitted since they represent primary-type problems. Instead, a repair
is scheduled (Section 4.12.1).

4.11. Resolving lists of elements

Intermediate problem types are those whose immediate causes are problems
with other elements. For intermediate problems, a list of other,
potentially problematic, elements must be resolved.

4.11.1. When a list resolution succeeds

If the list is empty, then the test is rejected (Section 4.10.3).

If the list is not empty, then hypotheses of lowH (if the problem type
was lowerH or downstreamH) or highU (if the problem type was upstreamU
or higherU) of each element in the list are added to the problem entry's
cause list, and one or more (recommend all) of them are submitted to an
expert (Section 3.3).

4.11.2.
When a list resolution fails

When a list resolution fails for an intermediate problem, the test for
the intermediate problem is declared to be Indeterminate (Section
4.10.4).

4.12. Supervising Repairs

A "repair" may entail performing an automated procedure, interacting
with an operator, or simply alerting an operator and waiting until the
operator notifies the agent that a manual repair has completed.

4.12.1. Scheduling a repair

A repair is scheduled for a problem whenever the cause list of a
Confirmed problem entry becomes empty, or when all hypotheses left in
the cause list of a Covered problem entry have been marked as
Repair-Deferred.

If the expert has diagnosis-only capability for the given problem,
status is set to CantRepair, and a Status-Report with the Ack-Request
and Relay bits set is reliably sent to all origins. (This ensures that
repairs will be attempted at its immediate effects.) The problem entry
is then scheduled for deletion by setting its Deletion-Timer to
[Deletion-Timeout].

An expert may elect to begin a repair immediately, or to defer it until
a later time. (This decision is made by the Scheduler in an
implementation-specific manner.) If the repair is initiated immediately,
the problem status is changed to Isolated. If the repair is deferred,
the problem status is changed to Repair-Deferred. In either case, a
Status-Report with the Relay-bit set is then sent to all origins.

4.12.2. When the time for a deferred repair arrives

The problem status is first changed to Isolated. In an
implementation-specific (or domain-specific) manner, the expert then decides
whether the repair has been deferred long enough that the problem must be
reconfirmed (e.g., if the time elapsed since the problem was initially
confirmed is greater than some threshold).
If the problem must be
reconfirmed, a test is immediately begun. Otherwise, the repair is
immediately begun.

4.12.3. When a repair completes

Another test is immediately initiated to verify that the repair was
successful.

4.13. Timers

Origin-Timer:
     An Origin-Timer is associated with each origin of a problem entry.
     It is set to [Origin-Timeout] when the origin is first added to the
     entry's origin list, and is reset to that value whenever a subsequent
     Hypothesis message is received from that origin for the given
     problem. When it expires, the origin is removed from the entry's
     origin list. If the origin list becomes null for an entry whose
     status is Diagnosis-Deferred, the test SHOULD be unscheduled and the
     entry deleted. If the list becomes null for an entry whose status is
     Unconfirmed, the test in progress MAY be aborted and the entry
     deleted.

Deletion-Timer:
     A Deletion-Timer is kept for each problem entry. It is initialized
     to [Deletion-Timeout] when the problem status is set to any of:
     Rejected, Indeterminate, Repaired, or WentAway. When it expires, the
     entry is deleted, and any hypotheses in the cause list which are
     referenced in no other problem entry cause lists are deleted (in
     addition, a Retract with the Ack-Request bit set is also sent for
     each of these hypotheses which is marked Unconfirmed or
     Diagnosis-Deferred).

Status-Report-Timer:
     At startup time, the Status-Report-Timer is initialized to a random
     value between 0 and [SR-Period] seconds. When it expires, the timer
     is immediately reset to [SR-Period] seconds, and a Status-Report
     (with the Relay-bit set if the status is Isolated or Repair-Deferred)
     is then sent to all origins for each problem state entry. This timer
     should not be reset by other events.
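
The periodic timers in this document (for example, the
Status-Report-Timer above) share one pattern: a random initial delay
followed by a fixed period. The sketch below is only an illustration of
that pattern; the function name and the idea of returning a list of
expiry times are assumptions for this example, and a real agent would
use whatever timer facility its platform provides.

```python
import random


def periodic_expirations(period, count, rng=random):
    """Return the first `count` expiry times, in seconds after startup.

    The first expiry is drawn uniformly from [0, period]; each later
    expiry follows exactly one period after the previous one, because
    the timer is reset to the full period as soon as it expires.
    """
    first = rng.uniform(0, period)
    return [first + i * period for i in range(count)]


# Example: four Status-Report-Timer expirations with a 60-second period.
schedule = periodic_expirations(60, 4)
```

Randomizing only the first expiry spreads the periodic messages of
agents that start at the same moment, while later expirations stay
evenly spaced.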

Advertisement-Timer:
     At startup time, the Advertisement-Timer is initialized to a random
     value between 0 and [Advertisement-Period] seconds. When it expires,
     the timer is immediately reset to [Advertisement-Period] seconds, and
     an Am-Child message is sent to the expert's parent, containing a list
     of the expert's capabilities. If the expert is also an ELS, and has
     no parent, then an Am-Root message is multicast to the RootAdv group.
     This timer should not be reset by other events.

Root-Timer:
     A Root-Timer is associated with the current root. It is set to the
     Holdtime included in the Am-Root message when the root is first set,
     and is reset to the included Holdtime whenever a subsequent Am-Root
     message is received from it. When the Root-Timer expires, if the
     root is also the expert's parent, and the expert is an ELS, then the
     root is set to the expert itself. Otherwise, the root is set to be
     empty.

4.13.1. Default Values

[Origin-Timeout]
     The time after which state for an origin will be removed unless a
     periodic Hypothesis is received from it. Default: 910 (= default
     [Hypothesis-Period]*3 + 10) seconds.

[Deletion-Timeout]
     The time between scheduling deletion of an entry, and the actual
     deletion. Default: 5 seconds.

[SR-Period]
     The time between sending periodic Status-Reports for all entries.
     Default: 60 seconds.

[Advertisement-Period]
     The time between sending periodic Am-Child messages to the parent
     ELS. Default: 1 day.

[Capability-Holdtime]
     The holdtime for one's capabilities included in Am-Child messages
     sent. This should be set to 2.5 * [Advertisement-Period]. Default:
     2.5 days.

5. Expert Location Server (ELS) Behavior

In addition to following all of the rules for experts, ELSs must do
additional processing as follows.

5.1.
Startup

The ELS initializes its active mask length to be equal to its maximal
mask length, and initializes its root to be itself and its parent to be
null.

The ELS then joins the RootAdv group.

The ELS SHOULD also begin an expanding-ring search, as described in
Section 3.1. The parent will be set as soon as an Am-Root or Am-Server
message is received from a legal parent. When the expanding ring search
completes (or [Capability-Holdtime] seconds after startup, if no such
search is done), the ELS should join the All-GDT-Servers group so it may
receive Whois-Server messages.

5.2. Receiving a Whois-Server message

When a Whois-Server message is received, the ELS checks to see if the
included address is within the ELS's active policy prefix. If not, the
message is silently dropped.

If the included mask length is shorter than the ELS's own maximal mask
length, or if they are equal but the origin's address is lower than the
ELS's own address, then the message is silently dropped.

If the Whois-Timer is already running, the message is silently dropped.

Otherwise, the ELS starts its Whois-Timer to a random value between 0
and [Whois-Delay] seconds, using the smallest clock granularity
available.

5.3. Receiving an Am-Server message

The message is first processed as with an Am-Root message, following the
rules in Section 4.2; afterwards, if the origin is the root, then the
ELS leaves the GDT-Server-Location group and any expanding-ring search
in progress is ended.

Otherwise, if the Whois-Timer is running, and the Am-Server message
specifies that the sender's active policy prefix is equal to or less
specific than the ELS's own active policy prefix, then the Whois-Timer
is cancelled.

5.4.
Receiving an Am-Parent message

If the message was not dropped according to the rules of Section 4.4,
then the following additional steps are taken:

(1) The expert's own active prefix length is reset to the maximum of:
    its own maximal prefix length, and (new parent's active prefix
    length + 1).

(2) For each child in the child table whose address no longer falls
    within the expert's own active prefix, the child is removed from the
    table and a Transfer is sent to it, redirecting it to the expert's
    parent.

(3) For each child in the child table whose active mask length is not
    greater than the expert's own active mask length, an Am-Parent
    message is sent to the child.

5.5. Receiving an Am-Child message

When an ELS receives an Am-Child message, it compares the sender's
address and maximal prefix length included in the message with the
active policy prefixes of its own children.

(1) If any child is a legal parent of the sender, the ELS replies to
    the sender with a Transfer, redirecting it to that child, and, if
    the sender was also in the child table, it is removed. Else,

(2) If no child is a legal parent of the sender, but the ELS itself is
    a legal parent, then:

    (a) It adds the included capabilities to its capability table and
        initializes Capability-Timers to the associated holdtimes.

    (b) It then sends an Am-Parent message back to the sender with the
        same sequence number as in the advertisement.

    (c) If the sender was not in the child table, the sender is added.
        For each other child for which the sender is a legal parent, a
        Transfer is then sent to the other child, redirecting it to the
        sender, and the other child is removed from the child table.

    (d) The Child-Timer for the new child is then restarted.
    Else,

(3) If the ELS is not a legal parent of the sender, then the ELS
    replies to the sender with a Transfer, redirecting it to the ELS's
    parent (or an address of 0 if it has no parent), and if the sender
    was also in the child table, it is removed.

5.5.1. Sending Am-Child messages with Aggregate Capabilities

If the ELS has a parent ELS, then every time the Advertisement-Timer
expires, the ELS will send an Am-Child to its parent (as all experts
do).

The included capabilities MUST cover all capabilities which have been
learned through Am-Child messages. The ELS should aggregate
capabilities where possible, and must not include duplicate capabilities
in the advertisement.

5.6. Timers

Child-Timer:
     A Child-Timer is associated with each entry in the child table. It
     is set to the Holdtime included in the Am-Child message when the
     child is first added to the child table, and is reset to the included
     Holdtime whenever a subsequent Am-Child message is received from that
     child. When it expires, the entry is removed from the child table.

Whois-Timer:
     The Whois-Timer is started when a Whois-Server message is received.
     It is not reset when another Whois-Server message is received while
     the timer is already running. The timer is cancelled if an Am-Root
     message is received. If the timer expires, the ELS multicasts an
     Am-Root message (whether or not it is the root) to the RootAdv group.

5.6.1. Default Values

[Origin-Timeout]
     The time after which an origin will be removed unless a periodic
     Hypothesis is received from it. Default: 910 (= default
     [Hypothesis-Period]*3 + 10) seconds.

[Deletion-Timeout]
     The time between scheduling deletion of an entry, and the actual
     deletion. Default: 5 seconds.

[SR-Period]
     The time between sending periodic Status-Reports for all entries.
     Default: 60 seconds.

[Advertisement-Period]
     The time between sending periodic Am-Child messages to the parent
     ELS. Default: 1 day.

[Capability-Holdtime]
     The holdtime for one's capabilities included in Am-Child messages
     sent. This should be set to 2.5 * [Advertisement-Period]. Default:
     2.5 days.

[Whois-Delay]
     The maximum delay before responding to a Whois-Server message.
     Default: 2 seconds.

6. Packet Formats

The header of each GDT message has the format illustrated below. The
source IP address, port number, and message length are all contained in
the encapsulating IP and UDP headers.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|GDT Ver| Rsvd  | MType | Rsvd  |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

GDT Ver
     Identifies the protocol version. The version number of the protocol
     defined in this memo is zero (0).

Rsvd
     Some messages use these fields for special purposes. Unless
     otherwise specified, these bits are transmitted as zero and ignored
     upon receipt.

MType
     Types for specific GDT messages. GDT message types are:

          0    Hypothesis
          1    Retract
          2    Redirect
          3    Status-Report
          4    Status-Ack
          5    Am-Child
          6    Am-Parent
          7    Transfer
          8    Am-Root
          9    Whois-Server
         10    Am-Server

Sequence Number
     The sequence number is used by the receiver to detect out-of-order
     packets. The sequence number MUST increment by at least one for
     each GDT message sent to the same destination concerning the same
     problem. (It MAY, for example, increment by one for each message
     sent regardless of the destination or problem.)
     The sequence number is only used by the receiver to detect
     out-of-order packets.

An Encoded-Unicast-Address has the following format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Addr Family  | Encoding Type |     Unicast Address         ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +

Addr Family
     The address family of the `Unicast Address' field of this address.

     The address family numbers currently assigned by IANA are:

     Number  Description
     ------  ---------------------------------------------------------
        0    Reserved
        1    IP (IP version 4)
        2    IP6 (IP version 6)
        3    NSAP
        4    HDLC (8-bit multidrop)
        5    BBN 1822
        6    802 (includes all 802 media plus Ethernet "canonical format")
        7    E.163
        8    E.164 (SMDS, Frame Relay, ATM)
        9    F.69 (Telex)
       10    X.121 (X.25, Frame Relay)
       11    IPX
       12    Appletalk
       13    Decnet IV
       14    Banyan Vines
       15    E.164 with NSAP format subaddress

Encoding Type
     The type of encoding used within a specific Address Family. The
     value `0' is reserved for this field, and represents the native
     encoding of the Address Family.

Unicast Address
     The unicast address as represented by the given Address Family and
     Encoding Type.

In addition, an Encoded-Network-Element and an Encoded-Capability both
have the following format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Length     |                    Value                    ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +

Options, each composed of a type, length (of the value only), and value,
can be included in some messages.
An option has the following format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Type      |    Length     |            Value            ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +

Legal option types include:

0--8 Cause
     The option identifies a cause of the reported problem, where the
     type is a problem type (see Section 6.1), and the value is a
     network element identifier. This option is typically included in
     relayed Status-Report messages.

16 Expected time to confirm (ETC)
     The length is set to 4, and the value is the expected number of
     seconds until a test for a problem's existence is completed. This
     option is typically included in SR:Unconfirmed and
     SR:Diagnosis-Deferred messages.

17 Expected time to repair (ETR)
     The length is set to 4, and the value is the expected number of
     seconds until the problem is repaired. This option is typically
     included in SR:Isolated and SR:Repair-Deferred messages.

18 Attributes
     The value is a set of additional (optional) attributes known for a
     network element. This option is typically included in SR:Hypothesis
     and SR:Redirects.

Other option types are reserved. Unknown options should be ignored, but
should be propagated in relayed Status-Reports.

6.1. Hypothesis Message

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|GDT Ver| Rsvd  |MType=0| Rsvd  |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ProblemType  |          Encoded-Network-Element            ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +

GDT Ver, Rsvd, Sequence Number, Encoded-Network-Element
     See above.

ProblemType
     Legal values are:

          0    lowH
          1    highU
          2    lowerH
          3    higherU
          4    upstreamU
          5    highD
          6    lowC
          7    badHW
          8    downstreamH

6.2. Retract Message

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|GDT Ver| Rsvd  |MType=1| Rsvd  |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ProblemType  |          Encoded-Network-Element            ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +
|                 (optional) Attribute-Option                 ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +

GDT Ver, Rsvd, Sequence Number, ProblemType, Encoded-Network-Element
     See above.

6.3. Redirect Message

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|GDT Ver|X|X|O|X|MType=2| Rsvd  |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ProblemType  |          Encoded-Network-Element            ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +
|                 (optional) Attribute-Option                 ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +
|                   Encoded-Expert-Address-1                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Holdtime-1                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Rsvd     |R|          Encoded-Capability-1               ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +
|                             . . .                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Encoded-Expert-Address-n                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Holdtime-n                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Rsvd     |R|          Encoded-Capability-n               ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +

GDT Ver, Rsvd, Sequence Number, ProblemType, Encoded-Network-Element
     See above.

Option present-bit (O)
     If set, an Attribute option is included; if cleared, no options are
     included.

Encoded-Expert-Address
     The address of an expert whose capability follows. The format of
     this field is an Encoded-Unicast-Address as shown above.

Holdtime
     The time-to-live (in seconds) of the following capability.

Encoded-Capability
     The encoded capability of the indicated expert.

6.4. Status-Report Message

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|GDT Ver|A|R|X|X|MType=3| Status|        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ProblemType  |          Encoded-Network-Element            ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +
|                          Option-1                           ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +
|                             . . .                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Option-n                           ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +

GDT Ver, Sequence Number, ProblemType, Encoded-Network-Element
     See above.

Ack-Request bit (A)
     This bit indicates that a Status-Ack is requested. Currently, this
     bit is only set in triggered SR:Confirmed messages.

Relay-bit (R)
     This bit indicates that the Status-Report is to be relayed to
     origins of the problem's effects. When relayed, the Ack-Request and
     Relay bits, Status field, and any Options are preserved in the
     relayed Status-Report. If no Cause option is present in a
     Status-Report received with the Relay-bit set, a Cause option is
     added to the Status-Report sent, using the problem type and network
     element from the original Status-Report. Currently, the Relay-bit
     is set in SR:Isolated, SR:Repair-Deferred, and triggered
     SR:Confirmed messages.

Reserved-bits (X)
     Transmitted as zero. Ignored upon receipt.

Status
     Status values are:

          0    Unconfirmed
          1    Diagnosis-Deferred
          2    Rejected
          3    Indeterminate
          4    Confirmed
          5    Covered
          6    CantRepair
          7    Isolated
          8    Repair-Deferred
          9    Repaired
         10    WentAway
         11    Retesting
         12    Deleted

Options
     A Cause option MUST be included if (and only if) the Status-Report
     is a relayed version of another Status-Report. The Cause option
     includes the problem type and network element of the original
     Status-Report. The Status field thus indicates the status of the
     cause when this option is present, rather than the status of the
     reported problem.

     An ETC option SHOULD be included if the Status is Unconfirmed or
     Diagnosis-Deferred.

     An ETR option SHOULD be included if the Status is Isolated or
     Repair-Deferred.

6.5. Status-Ack Message

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|GDT Ver| Rsvd  |MType=4| Status|        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ProblemType  |          Encoded-Network-Element            ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +

Sequence Number
     The sequence number of the Status-Report which is being
     acknowledged.

All other fields are described above.

6.6. Am-Child Message

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|GDT Ver| Rsvd  |MType=5| Rsvd  |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Holdtime                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Rsvd     |R|          Encoded-Capability-1               ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +
|             | |                     . . .                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Rsvd     |R|          Encoded-Capability-n               ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +

Holdtime
     The amount of time (in seconds) the capabilities are valid. This
     field allows capabilities to be aged out, and should be set to
     [Capability-Holdtime].

Repair-bit (R)
     If set, this bit indicates that the following capability is valid
     for both diagnosis and repair; otherwise, the capability is valid
     for diagnosis only.

6.7. Am-Parent Message

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|GDT Ver| Rsvd  |MType=6| Rsvd  |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Sequence Number
     The sequence number of the Am-Child message which is being
     acknowledged.

6.8. Transfer Message

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|GDT Ver| Rsvd  |MType=7| Rsvd  |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Encoded-Expert-Address                    ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +
|  MaxMaskLen   |  CurrMaskLen  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Encoded-Expert-Address
     The address of the expert which the receiver should try as a
     parent.

MaxMaskLen
     The length of the indicated expert's maximal policy prefix.

CurrMaskLen
     The length of the indicated expert's active policy prefix.

6.9. Am-Root Message

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|GDT Ver| Rsvd  |MType=8| Rsvd  |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Holdtime                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  MaxMaskLen   |  CurrMaskLen  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Holdtime
     The amount of time (in seconds) this announcement is valid. This
     field allows the Root to be aged out, and should be set to the
     sender's [Capability-Holdtime].

MaxMaskLen
     The length of the sender's maximal policy prefix.

CurrMaskLen
     The length of the sender's active policy prefix.

6.10. Whois-Server Message

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|GDT Ver| Rsvd  |MType=9| Rsvd  |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  MaxMaskLen   |      TTL      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

MaxMaskLen
     For an ELS, this is the length of the sender's maximal policy
     prefix. All other agents should set this field to 255.

TTL
     The TTL at which the Whois-Server message is being sent, and at
     which the server should respond with an Am-Server message.

All other fields are described above.

6.11. Am-Server Message

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|GDT Ver| Rsvd  |MTyp=10| Rsvd  |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Holdtime                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  MaxMaskLen   |  CurrMaskLen  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

All fields are as described for the Am-Root message.

7. References

[1] Thaler, D., and C.V. Ravishankar, "Distributed Top-Down Hierarchy
    Construction", INFOCOM'98.

[2] Thaler, D., "GDT Element Naming", Work in Progress, Jan. 1998.

[3] Thaler, D., and C.V. Ravishankar, "Using Name-Based Mappings to
    Increase Hit Rates", IEEE/ACM Transactions on Networking, Feb.
    1998.

8. Security Considerations

In general, Hypothesis messages need not be authenticated, since problem
reports may be accepted from all sources. Before taking any further
action, however, an expert will verify the existence of a reported
problem.

One potential denial-of-service attack is sending a large number of
Hypotheses for non-existent problems. Experts may combat such attacks by
caching results of previous tests, and by deferring tests when a
denial-of-service attack is suspected. In extreme cases where not
enough resources exist to keep state for such origins, the expert may
reply by sending an SR:Indeterminate message, implying that the test was
indeterminate. This option does not require any state to be kept.

If Status-Report messages are unauthenticated, an attacker could either
cause a non-existent problem to be falsely confirmed, in which case the
origin will continue to wait for more feedback until the expert times
out, or cause a true problem to be falsely rejected, in which case the
origin must simply deal with the symptom (just as if the remote expert
were unreachable).

9. Address of Author

Dave Thaler
Merit Network, Inc
4251 Plymouth Rd., Suite C
Ann Arbor, MI 48105-2785
Phone: +1 313 647 4813
EMail: thalerd@merit.net

Table of Contents

1 Introduction
1.1 Purpose
1.2 Terminology
2 Protocol Overview
2.1 Agent Requirements
2.1.1 Advertising Capabilities
2.2 Expert Location
2.3 Reliable Message Transport
3 Basic Behavior
3.1 Startup
3.2 Setting One's Parent
3.3 Detecting a problem and sending a Hypothesis
3.4 Sending a Retract
3.5 Receiving a Redirect
3.6 Receiving a Status-Report (SR)
3.6.1 Receiving SR:Rejected
3.6.2 Receiving SR:Indeterminate
3.6.3 Receiving SR:Repaired, SR:CantRepair, or SR:WentAway
3.7 Timers
3.7.1 Default Values
4 Expert Behavior
4.1 Startup
4.2 Receiving an Am-Root message
4.3 Receiving a Transfer
4.4 Receiving an Am-Parent message
4.5 Receiving a Hypothesis message
4.6 Receiving a Retract message
4.7 Receiving a Status-Report (SR)
4.7.1 Receiving SR:Diagnosis-Deferred
4.7.2 Receiving SR:Indeterminate
4.7.3 Receiving SR:Confirmed, SR:Covered, SR:Retesting, or SR:Isolated
4.7.4 Receiving SR:Repair-Deferred
4.7.5 Receiving SR:Repaired
4.7.6 Receiving SR:WentAway
4.8 Receiving a Status-Ack
4.9 Deleting hypothesis state
4.10 Supervising Tests
4.10.1 Scheduling a test
4.10.2 When the time for a deferred test arrives
4.10.3 When a test completes, with negative results
4.10.4 When a test is indeterminate
4.10.5 When a test completes, with positive results
4.10.6 Generating causal hypotheses
4.11 Resolving lists of elements
4.11.1 When a list resolution succeeds
4.11.2 When a list resolution fails
4.12 Supervising Repairs
4.12.1 Scheduling a repair
4.12.2 When the time for a deferred repair arrives
4.12.3 When a repair completes
4.13 Timers
4.13.1 Default Values
5 Expert Location Server (ELS) Behavior
5.1 Startup
5.2 Receiving a Whois-Server message
5.3 Receiving an Am-Server message
5.4 Receiving an Am-Parent message
5.5 Receiving an Am-Child message
5.5.1 Sending Am-Child messages with Aggregate Capabilities
5.6 Timers
5.6.1 Default Values
6 Packet Formats
6.1 Hypothesis Message
6.2 Retract Message
6.3 Redirect Message
6.4 Status-Report Message
6.5 Status-Ack Message
6.6 Am-Child Message
6.7 Am-Parent Message
6.8 Transfer Message
6.9 Am-Root Message
6.10 Whois-Server Message
6.11 Am-Server Message
7 References
8 Security Considerations
9 Address of Author
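
Illustrative example (not part of the protocol)

As an aid to implementers, the fixed 32-bit header described in
Section 6 could be packed and parsed roughly as sketched below. This is
a minimal sketch under stated assumptions, not a normative encoding:
the names pack_header, unpack_header, flags, and low_nibble are
hypothetical, and the sketch assumes network byte order with bit 0 of
the diagram as the most significant bit. The field widths (4, 4, 4, 4,
and 16 bits) are read from the header diagram.

```python
import struct
from enum import IntEnum


class MType(IntEnum):
    """Message types as listed under MType in Section 6."""
    HYPOTHESIS = 0
    RETRACT = 1
    REDIRECT = 2
    STATUS_REPORT = 3
    STATUS_ACK = 4
    AM_CHILD = 5
    AM_PARENT = 6
    TRANSFER = 7
    AM_ROOT = 8
    WHOIS_SERVER = 9
    AM_SERVER = 10


GDT_VERSION = 0          # version defined in this memo

# Flag positions within the 4 bits that follow GDT Ver (assumed MSB first).
FLAG_ACK_REQUEST = 0x8   # A bit, Status-Report
FLAG_RELAY = 0x4         # R bit, Status-Report
FLAG_OPTION = 0x2        # O bit, Redirect


def pack_header(mtype, seq, flags=0, low_nibble=0, version=GDT_VERSION):
    """Build the 4-byte header.

    flags      -- 4 bits after GDT Ver (Rsvd, A/R/X/X, or X/X/O/X)
    low_nibble -- 4 bits after MType (Rsvd, or Status in SR/Status-Ack)
    """
    checks = (("version", version, 16), ("flags", flags, 16),
              ("mtype", int(mtype), 16), ("low_nibble", low_nibble, 16),
              ("seq", seq, 1 << 16))
    for name, value, limit in checks:
        if not 0 <= value < limit:
            raise ValueError(f"{name} out of range: {value}")
    byte0 = (version << 4) | flags
    byte1 = (int(mtype) << 4) | low_nibble
    return struct.pack("!BBH", byte0, byte1, seq)


def unpack_header(data):
    """Split the first 4 bytes of a GDT message into header fields."""
    if len(data) < 4:
        raise ValueError("message shorter than the GDT header")
    byte0, byte1, seq = struct.unpack("!BBH", data[:4])
    return {
        "version": byte0 >> 4,
        "flags": byte0 & 0x0F,
        "mtype": byte1 >> 4,
        "low_nibble": byte1 & 0x0F,
        "seq": seq,
    }


# Usage: a triggered SR:Confirmed (Status 4) with the A and R bits set.
header = pack_header(MType.STATUS_REPORT, 7,
                     flags=FLAG_ACK_REQUEST | FLAG_RELAY, low_nibble=4)
fields = unpack_header(header)
```

Under these assumptions the example header is the four octets 0c 34 00
07. Reserved bits are passed as zero by default, matching the rule that
they are transmitted as zero and ignored upon receipt.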