P2PSIP Working Group                                             H. Song
Internet-Draft                                                  X. Jiang
Intended status: Standards Track                                 R. Even
Expires: September 25, 2016                                       Huawei
                                                                D. Bryan
                                                            ethernot.org
                                                                  Y. Sun
                                                                     ICT
                                                          March 24, 2016


                        P2P Overlay Diagnostics
                    draft-ietf-p2psip-diagnostics-22

Abstract

   This document describes mechanisms for P2P overlay diagnostics. It
   defines extensions to the RELOAD base protocol to collect diagnostic
   information, and details the protocol specifications for these
   extensions. Useful diagnostic information for connection and node
   status monitoring is also defined. The document also describes the
   usage scenarios and provides examples of how these methods are used
   to perform diagnostics.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 25, 2016.

Copyright Notice

   Copyright (c) 2016 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your
   rights and restrictions with respect to this document. Code
   Components extracted from this document must include Simplified BSD
   License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Diagnostic Scenarios
   4.  Data Collection Mechanisms
     4.1.  Overview of Operations
     4.2.  "Ping-like" Behavior: Extending Ping
       4.2.1.  RELOAD Request Extension: Ping
     4.3.  "Traceroute-like" Behavior: The Path_Track Method
       4.3.1.  New RELOAD Request: PathTrack
     4.4.  Error Code Extensions
   5.  Diagnostic Data Structures
     5.1.  DiagnosticsRequest Data Structure
     5.2.  DiagnosticsResponse Data Structure
     5.3.  dMFlags and Diagnostic Kind ID Types
   6.  Message Processing
     6.1.  Message Creation and Transmission
     6.2.  Message Processing: Intermediate Peers
     6.3.  Message Response Creation
     6.4.  Interpreting Results
   7.  Authorization through Overlay Configuration
   8.  Security Considerations
   9.  IANA Considerations
     9.1.  Diagnostics Flag
     9.2.  Diagnostic Kind ID
     9.3.  Message Codes
     9.4.  Error Code
     9.5.  Message Extension
     9.6.  XML Name Space Registration
   10. Acknowledgments
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Examples
     A.1.  Example 1
     A.2.  Example 2
     A.3.  Example 3
   Appendix B.  Problems with Generating Multiple Responses on Path
   Appendix C.  Changes to the Draft
     C.1.  Changes since -00 version
     C.2.  Changes since -01 version
     C.3.  Changes since -02 version
     C.4.  Changes since -03 version
     C.5.  Changes since -04 version
     C.6.  Changes since -05 version
     C.7.  Changes in version -10
     C.8.  Changes in version -15
     C.9.  Changes in version -20
     C.10. Changes in version -22
   Authors' Addresses

1.  Introduction

   In the last few years, overlay networks have rapidly evolved and
   emerged as a promising platform for deployment of new applications
   and services in the Internet.
   One of the reasons overlay networks are seen as an excellent
   platform for large scale distributed systems is their resilience in
   the presence of failures. This resilience has three aspects: data
   replication, routing recovery, and static resilience. Routing
   recovery algorithms are used to repopulate the routing table with
   live nodes when failures are detected. Static resilience measures
   the extent to which an overlay can route around failures even before
   the recovery algorithm repairs the routing table. Both routing
   recovery and static resilience rely on accurate and timely detection
   of failures.

   There are a number of situations in which some nodes in a
   Peer-to-Peer (P2P) overlay may malfunction or behave badly. For
   example, these nodes may be disabled, congested, or may be
   misrouting messages. The impact of these malfunctions on the overlay
   network may be a degradation of quality of service provided
   collectively by the peers in the overlay network or an interruption
   of the overlay services. It is desirable to identify malfunctioning
   or badly behaving peers through diagnostic tools, and exclude or
   reject them from the P2P system. Node failures may also be caused by
   failures of underlying layers. For example, recovery from an
   incorrect overlay topology may be slow when the speed at which IP
   routing recovers after link failures is very slow. Moreover, if a
   backbone link fails and the failover is slow, the network may be
   partitioned, leading to partitions of overlay topologies and
   inconsistent routing results between different partitioned
   components.

   Some keep-alive algorithms based on periodic probe and acknowledge
   mechanisms enable accurate and timely detection of failures of one
   node's neighbors [Overlay-Failure-Detection], but these algorithms by
   themselves can only detect the disabled neighbors using the periodic
   method.
   This may not be sufficient for the service provider operating the
   overlay network.

   A P2P overlay diagnostic framework supporting periodic and on-demand
   methods for detecting node failures and network failures is
   desirable. This document describes a general P2P overlay diagnostic
   extension to the base protocol RELOAD [RFC6940] and is intended as a
   complement to keep-alive algorithms in the P2P overlay itself.
   Readers are advised to consult [I-D.ietf-p2psip-concepts] for further
   background on the problem domain.

2.  Terminology

   This document uses the concepts defined in RELOAD [RFC6940].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Diagnostic Scenarios

   P2P systems are self-organizing, and ideally the setup and
   configuration of individual P2P nodes requires no network management
   in the traditional sense. However, users of an overlay, as well as
   P2P service providers, may contemplate usage scenarios where some
   monitoring and diagnostics are required. We present a simple
   connectivity test and some useful diagnostic information that may be
   used in such diagnostics.

   The common usage scenarios for P2P diagnostics can be broadly
   categorized in three classes:

   a.  Automatic diagnostics built into the P2P overlay routing
       protocol. Nodes perform periodic checks of known neighbors and
       remove those nodes from the routing tables that fail to respond
       to connectivity checks [Handling_Churn_in_a_DHT]. Unresponsive
       nodes may only be temporarily disabled, for example due to a
       local cryptographic processing overload, disk processing
       overload or link overload. It is therefore useful to repeat the
       connectivity checks to see whether nodes have recovered and can
       again be placed in the routing tables.
       This process is known as 'failed node recovery' and can be
       optimized as described in the paper "Handling Churn in a DHT"
       [Handling_Churn_in_a_DHT].

   b.  Diagnostics used by a particular node to follow up on an
       individual user complaint or failure. For example, a technical
       support staff member may use a desktop sharing application (with
       the permission of the user) to remotely determine the health of,
       and possible problems with, the malfunctioning node. Part of the
       remote diagnostics may consist of simple connectivity tests with
       other nodes in the P2P overlay and retrieval of statistics from
       nodes in the overlay. The simple connectivity tests are not
       dependent on the type of P2P overlay. Note that other tests may
       be required as well, including checking the health and
       performance of the user's computer or mobile device and checking
       the bandwidth of the link connecting the user to the Internet.

   c.  P2P system-wide diagnostics used to check the overall health of
       the P2P overlay network. These include checking the consumption
       of network bandwidth, checking for the presence of problem links
       and checking for abusive or malicious nodes. This is not a
       trivial problem and has been studied in detail for content and
       streaming P2P overlays [Diagnostic_Framework], and has not been
       addressed in earlier documents
       [Diagnostics_and_NAT_traversal_in_P2PP]. While this is a
       difficult problem, a great deal of information that can help in
       diagnosing these problems can be obtained by collecting basic
       diagnostic information for peers and the network. This document
       provides a framework for obtaining this information.

4.  Data Collection Mechanisms

4.1.  Overview of Operations

   The diagnostic mechanisms described in this document are primarily
   intended to detect and locate failures or monitor performance in P2P
   overlay networks.
   They provide mechanisms to detect and locate malfunctioning or badly
   behaving nodes including disabled nodes, congested nodes and
   misrouting peers. They provide a mechanism to detect direct
   connectivity or connectivity to a specified node, a mechanism to
   detect the availability of specified resource records and a
   mechanism to discover P2P overlay topology and the underlay topology
   failures.

   The RELOAD diagnostics extensions define two mechanisms to collect
   data. The first is an extension to the RELOAD Ping mechanism,
   allowing diagnostic data to be queried from a node, as well as to
   diagnose the path to that node. The second is a new method,
   PathTrack, for collecting diagnostic information iteratively.
   Payloads for these mechanisms, which allow diagnostic data to be
   collected and represented, are presented, and additional error codes
   are introduced. Essentially, this document reuses the RELOAD
   [RFC6940] specification and extends it to introduce the new
   diagnostics methods. The extensions strictly follow how RELOAD
   specifies message routing, transport, NAT traversal, and other RELOAD
   protocol features.

   This document primarily describes how to detect and locate failures
   including disabled nodes, congested nodes, misrouting behaviors and
   underlying network faults in P2P overlay networks through a simple
   and efficient mechanism. This mechanism is modeled after the
   ping/traceroute paradigm: ping [RFC0792] is used for connectivity
   checks, and traceroute is used for hop-by-hop fault localization as
   well as path tracing. This document specifies a "ping-like" mode (by
   extending the RELOAD Ping method to gather diagnostics) and a
   "traceroute-like" mode (by defining the new PathTrack method) for
   diagnosing P2P overlay networks.

   One way these tools can be used is to detect the connectivity to the
   specified node or the availability of the specified resource-record
   through the extended Ping operation. Once the overlay network
   receives some alarms about overlay service degradation or
   interruption, a Ping is sent. If the Ping fails, one can then send a
   PathTrack to determine where the fault lies.

   The diagnostic information can only be provided to authorized nodes.
   Some diagnostic information can be provided to all the participants
   in the P2P overlay, and some other diagnostic information can only be
   provided to the nodes authorized by the local or overlay policy. The
   authorization depends on the type of the diagnostic information and
   the administrative considerations, and is application specific.

   This document considers the general administrative scenario based on
   diagnostic Kind, where a whole overlay can authorize a certain kind
   of diagnostic information to a small list of particular nodes (e.g.,
   administrative nodes). That means, if a node gets the authorization
   to access a diagnostic Kind, it can access that information from all
   nodes in the overlay network. It leaves the scenario where a
   particular node authorizes its diagnostic information to a particular
   list of nodes out of scope. This could be achieved by extension of
   this document if there is a requirement in the near future. The
   default policy or access rule for a type of diagnostic information is
   "deny" unless specified in the diagnostics extension document. As
   the RELOAD protocol already requires that each message carries the
   message signature of the sender, the receiver of the diagnostics
   requests can use the signature to identify the sender. It can then
   use the overlay configuration file with this signature to determine
   which types of diagnostic information that node is authorized for.
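
   The Kind-based, default-deny rule above can be illustrated with a
   short Python sketch. It is not part of the protocol: the policy
   table, the ALL_NODES marker, and the is_authorized helper are
   hypothetical names, the Kind values are taken from the
   overlay-specific range purely for illustration, and a real
   implementation would derive the policy from the overlay
   configuration file and the sender identity from the verified
   message signature.

```python
from typing import Dict, FrozenSet, Union

# Hypothetical marker meaning "every participant in the overlay may read this Kind".
ALL_NODES = "*"

# Hypothetical policy table: diagnostic Kind ID -> ALL_NODES or a set of node IDs.
Policy = Dict[int, Union[str, FrozenSet[str]]]


def is_authorized(policy: Policy, kind: int, sender_node_id: str) -> bool:
    """Return True if the sender may read the given diagnostic Kind."""
    rule = policy.get(kind)
    if rule is None:
        return False  # default policy is "deny" for Kinds with no rule
    if rule == ALL_NODES:
        return True
    return sender_node_id in rule


# Example policy using illustrative Kind values from the overlay-specific range.
policy: Policy = {
    0xF000: ALL_NODES,
    0xF001: frozenset({"admin-node-1"}),
}

print(is_authorized(policy, 0xF000, "node-42"))       # Kind open to all participants
print(is_authorized(policy, 0xF001, "node-42"))       # sender not in the authorized list
print(is_authorized(policy, 0xF002, "admin-node-1"))  # no rule for this Kind
```

   Because authorization is granted per Kind for the whole overlay, the
   lookup is keyed only on the Kind and the sender, never on the node
   that is being queried.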

   In the remainder of this section we define mechanisms for collecting
   data, as well as the specific protocol extensions (message
   extensions, new methods, and error codes) required to collect this
   information. In Section 5 we discuss the format of the data
   collected, and in Section 6 we discuss detailed message processing.

   It is important to note that the mechanisms described in this
   document do not guarantee that the information collected is in fact
   related to the previous failures. However, using the information
   from previously traversed nodes, the user (or management system) may
   be able to infer the problem. Symmetric routing can be achieved by
   using the Via List [RFC6940] (or an alternate DHT routing algorithm),
   but the response path is not guaranteed to be the same.

4.2.  "Ping-like" Behavior: Extending Ping

   To provide "ping-like" behavior, the RELOAD Ping method is extended
   to collect diagnostic data along the path. The request message is
   forwarded by the intermediate peers along the path and then
   terminated by the responsible peer. After optional local
   diagnostics, the responsible peer returns a response message. If an
   error is found when routing, an Error response is sent to the
   initiator node by the intermediate peer.

   The message flow of a Ping message (with diagnostic extensions) is as
   follows:

   Peer A               Peer B               Peer C               Peer D
     |                    |                    |                    |
     |(1). PingReq        |                    |                    |
     |------------------->|(2). PingReq        |                    |
     |                    |------------------->|(3). PingReq        |
     |                    |                    |------------------->|
     |                    |                    |                    |
     |                    |                    |<-------------------|
     |                    |<-------------------|(4). PingAns        |
     |<-------------------|(5). PingAns        |                    |
     |(6). PingAns        |                    |                    |
     |                    |                    |                    |

                  Figure 1: Ping Diagnostic Message Flow

4.2.1.  RELOAD Request Extension: Ping

   To extend the ping request for use in diagnostics, a new extension of
   RELOAD is defined.
   The structure for a MessageExtension in RELOAD is defined as:

        struct {
          MessageExtensionType  type;
          Boolean               critical;
          opaque                extension_contents<0..2^32-1>;
        } MessageExtension;

   For the Ping request extension, we define a new MessageExtensionType,
   extension 0x0002 named Diagnostic_Ping, as specified in Table 4. The
   extension contents consist of a DiagnosticsRequest structure,
   defined later in this document in Section 5.1. This extension MAY be
   used for new requests of the Ping method and MUST NOT be included in
   requests using any other method.

   This extension is not critical. If a peer does not support the
   extension, it will simply ignore the diagnostic portion of the
   message, and will treat the message as if it were a normal ping.
   Senders MUST accept a response that lacks diagnostic information and
   SHOULD NOT resend the message expecting a reply. Receivers who
   receive a method other than Ping including this extension MUST ignore
   the extension.

4.3.  "Traceroute-like" Behavior: The Path_Track Method

   We define a simple PathTrack method for retrieving diagnostic
   information iteratively.

   The operation of this request is shown below in Figure 2. The
   initiator node A asks its neighbor B which is the next hop peer to
   the destination ID, and B returns a message with the next hop peer C
   information, along with optional diagnostic information for B to the
   initiator node. Then the initiator node A asks the next hop peer C
   (direct response routing [RFC7263] or via symmetric routing) to
   return next hop peer D information and diagnostic information of C.
   Unless a failure prevents the message from being forwarded, this step
   can be iteratively repeated until the request reaches responsible
   peer D for the destination ID, and retrieves diagnostic information
   of peer D.
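
   The iterative procedure described above can be sketched in Python as
   follows. This is an illustration under stated assumptions, not a
   definitive implementation: the send callable stands in for an
   unspecified transport that issues one PathTrackReq and returns the
   responding node ID together with the next_hop from the PathTrackAns,
   node IDs are plain strings, and the max_hops guard is an added
   safety limit that this document does not define. The loop stops when
   next_hop equals the responding node ID, which is the termination
   rule given in Section 4.3.1.2.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical transport hook: send(peer, destination) issues one PathTrackReq
# to `peer` and returns (responding_node_id, next_hop_node_id) from the answer.
SendPathTrack = Callable[[str, str], Tuple[str, str]]


def trace_path(first_hop: str, destination: str, send: SendPathTrack,
               max_hops: int = 32) -> List[str]:
    """Query peers one at a time and return the list of peers that answered."""
    path: List[str] = []
    peer = first_hop
    for _ in range(max_hops):
        responder, next_hop = send(peer, destination)
        path.append(responder)
        if next_hop == responder:
            return path  # the responder is the responsible peer: stop iterating
        peer = next_hop
    raise RuntimeError("no responsible peer reached within max_hops")


def make_simulated_overlay(next_hops: Dict[str, str]) -> SendPathTrack:
    """Build a fake transport from a static next-hop table (testing only)."""
    def send(peer: str, destination: str) -> Tuple[str, str]:
        return peer, next_hops[peer]
    return send


# Simulated overlay matching Figure 2: A's neighbor is B, and D is responsible.
simulated_send = make_simulated_overlay({"B": "C", "C": "D", "D": "D"})
print(trace_path("B", "some-destination-id", simulated_send))
```

   The hop limit is a defensive choice for the sketch: a misrouting
   peer could otherwise keep the initiator iterating indefinitely.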

   The message flow of a PathTrack message (with diagnostic extensions)
   is as follows:

   Peer-A               Peer-B               Peer-C               Peer-D
     |                    |                    |                    |
     |(1).PathTrackReq    |                    |                    |
     |------------------->|                    |                    |
     |(2).PathTrackAns    |                    |                    |
     |<-------------------|                    |                    |
     |                    |(3).PathTrackReq    |                    |
     |--------------------|------------------->|                    |
     |                    |(4).PathTrackAns    |                    |
     |<-------------------|--------------------|                    |
     |                    |                    |(5).PathTrackReq    |
     |--------------------|--------------------|------------------->|
     |                    |                    |(6).PathTrackAns    |
     |<-------------------|--------------------|--------------------|
     |                    |                    |                    |

                Figure 2: PathTrack Diagnostic Message Flow

   There have been proposals that RouteQuery and a series of Fetch
   requests can be used to replace the PathTrack mechanism, but in the
   presence of high rates of churn, such an operation would not,
   strictly speaking, provide identical results, as the path may change
   between RouteQuery and Fetch operations. While obviously the path
   could change between steps of PathTrack as well, with a single
   message rather than two messages for query and fetch, less
   inconsistency is likely, and thus the use of a single message is
   preferred.

   Given that in a typical diagnostic scenario the peer sending the
   PathTrack request desires to obtain information about the current
   path to the destination, in the event that successive calls to
   PathTrack return different paths, the results should be discarded and
   the request resent, ensuring that the second request traverses the
   appropriate path.

4.3.1.  New RELOAD Request: PathTrack

   This document defines a new RELOAD method, PathTrack, to retrieve the
   diagnostic information from the intermediate peers along the routing
   path. At each step of the PathTrack request, the responsible peer
   responds to the initiator node with requested status information.
   Status information can include a peer's congestion state, processing
   power, available bandwidth, the number of entries in its neighbor
   table, uptime, identity, network address information, and next hop
   peer information.

   A PathTrack request specifies which diagnostic information is
   requested using a DiagnosticsRequest data structure, defined and
   discussed in detail later in this document in Section 5.1. Base
   information is requested by setting the appropriate flags in the data
   structure in the request. If all flags are clear (no bits are set),
   then the PathTrack request is only used for requesting the next hop
   information. In this case the iterative mode of PathTrack is
   degraded to a RouteQuery method which is only used for checking the
   liveness of the peers along the routing path. The PathTrack request
   can be routed using direct response routing or other routing methods
   chosen by the initiator node.

   A response to a successful PathTrackReq is a PathTrackAns message.
   The PathTrackAns contains general diagnostic information in the
   payload, returned using a DiagnosticsResponse data structure. This
   data structure is defined and discussed in detail later in this
   document in Section 5.2. The information returned is determined
   based on the information requested in the flags in the corresponding
   request.

4.3.1.1.  PathTrack Request

   The structure of the PathTrack request is as follows:

        struct{
          Destination destination;
          DiagnosticsRequest request;
        }PathTrackReq;

   The fields of the PathTrackReq are as follows:

   destination :  The destination which the initiator node is
      interested in. This may be any valid destination object,
      including a NodeID, opaque ids, or ResourceID. Note that, for
      debugging purposes, the initiator will use the destination ID as
      it was used when the failure happened.

   request :  A DiagnosticsRequest, as discussed in Section 5.1.

4.3.1.2.  PathTrack Response

   The structure of the PathTrack response is as follows:

        struct{
          Destination next_hop;
          DiagnosticsResponse response;
        }PathTrackAns;

   The fields of the PathTrackAns are as follows:

   next_hop :  The information of the next hop node from the
      responding intermediate peer to the destination. If the
      responding peer is the responsible peer for the destination ID,
      then the next_hop node ID equals the responding node ID, and after
      receiving a PathTrackAns where the next_hop node ID equals the
      responding node ID, the initiator MUST stop the iterative process.

   response :  A DiagnosticsResponse, as discussed in Section 5.2.

4.4.  Error Code Extensions

   This document extends the Error response method defined in the RELOAD
   specification to support error cases resulting from diagnostic
   queries. When an error is encountered in RELOAD, the Message Code
   0xFFFF is returned. The ErrorResponse structure includes an error
   code. We define new error codes to report possible error conditions
   detected while performing diagnostics:

        Code Value      Error Code Name
        TBD1            Underlay Destination Unreachable
        TBD2            Underlay Time exceeded
        TBD3            Message Expired
        TBD4            Upstream Misrouting
        TBD5            Loop detected
        TBD6            TTL hops exceeded

   The final error codes will be assigned by IANA as specified in the
   RELOAD protocol [RFC6940]. The error code is returned by the
   upstream node before the failure node. The upstream node uses the
   normal ping to detect the failure type and returns it to the
   initiator node. This helps the user (initiator node) to understand
   where the failure happened and what kind of error happened, as the
   failure may happen at the same location and for the same reason when
   sending the normal message and the diagnostics message.

   As defined in RELOAD, additional information may be stored (in an
   implementation-specific way) in the optional error_info byte string.
   While the specifics are obviously left to the implementation, as an
   example, in the case of TBD1, the error_info could be used to
   provide additional information as to why the underlay destination is
   unreachable (net unreachable, host unreachable, fragmentation needed,
   etc.).

5.  Diagnostic Data Structures

   Both the extended Ping method and PathTrack method use the following
   common diagnostics data structures to collect data. Two common
   structures are defined: DiagnosticsRequest for requesting data, and
   DiagnosticsResponse for returning the information.

5.1.  DiagnosticsRequest Data Structure

   The DiagnosticsRequest data structure is used to request diagnostic
   information and has the following form:

        enum{ (2^16-1) } DiagnosticKindId;

        struct{
          DiagnosticKindId kind;
          opaque diagnostic_extension_contents<0..2^32-1>;
        }DiagnosticExtension;

        struct{
          uint64 expiration;
          uint64 timestamp_initiated;
          uint64 dMFlags;
          uint32 ext_length;
          DiagnosticExtension diagnostic_extensions_list<0..2^32-1>;
        }DiagnosticsRequest;

   The fields in the DiagnosticsRequest are as follows:

   expiration :  The time when the request will expire represented as
      the number of milliseconds elapsed since midnight Jan 1, 1970 UTC
      not counting leap seconds. This will have the same values for
      seconds as standard UNIX time or POSIX time. More information can
      be found at UnixTime [UnixTime]. This value MUST be between 1 and
      600 seconds in the future. This value is used to prevent replay
      attacks.

   timestamp_initiated :  The time when the diagnostics request was
      initiated represented as the number of milliseconds elapsed since
      midnight Jan 1, 1970 UTC not counting leap seconds.
      This will have the same values for seconds as standard UNIX time
      or POSIX time.

   dMFlags :  A mandatory field which is an unsigned 64-bit integer
      indicating which base diagnostic information the request initiator
      node is interested in. The initiator sets different bits to
      retrieve different kinds of diagnostic information. If dMFlags is
      set to zero, then no base diagnostic information is conveyed in
      the PathTrack response. If dMFlags is set to all '1's, then all
      base diagnostic information values are requested. A request may
      set any number of the flags to request the corresponding
      diagnostic information.

      Note that this memo specifies the initial set of flags; the flags
      can be extended. The dMFlags indicate general diagnostic
      information. The mapping between the bits in the dMFlags and the
      diagnostic information kind presented is as described in
      Section 9.1.

   ext_length :  the length of the extended diagnostic request
      information in bytes. If the value is greater than or equal to 1,
      then some extended diagnostic information is being requested, on
      the assumption this information will be included in the response
      if the recipient understands the extended request and is willing
      to provide it. The specific diagnostic information requested is
      defined in the diagnostic_extensions_list below. A value of zero
      indicates no extended diagnostic information is being requested.
      The value of ext_length MUST NOT be negative. Note that it is not
      the length of the entire DiagnosticsRequest data structure, but of
      the data making up the diagnostic_extensions_list.

   diagnostic_extensions_list :  consists of one or more
      DiagnosticExtension structures (see below) documenting additional
      diagnostic information being requested.
      Each DiagnosticExtension consists of the following fields:

      kind :  a numerical code indicating the type of extension
         diagnostic information (see Section 9.2). Note that kinds
         0xF000 - 0xFFFE are reserved for overlay specific diagnostics
         and may be used without IANA registration for local diagnostic
         information. Kinds from 0x0000 to 0x003F MUST NOT be indicated
         in the diagnostic_extensions_list in the message request, as
         they may be represented using the dMFlags in a much simpler
         (and more space efficient) way.

      diagnostic_extension_contents :  the opaque data containing the
         request for this particular extension. This data is extension
         dependent.

5.2.  DiagnosticsResponse Data Structure

        enum { (2^16-1) } DiagnosticKindId;

        struct{
          DiagnosticKindId kind;
          opaque diagnostic_info_contents<0..2^16-1>;
        }DiagnosticInfo;

        struct{
          uint64 expiration;
          uint64 timestamp_initiated;
          uint64 timestamp_received;
          uint8 hop_counter;
          uint32 ext_length;
          DiagnosticInfo diagnostic_info_list<0..2^32-1>;
        }DiagnosticsResponse;

   The fields in the DiagnosticsResponse are as follows:

   expiration :  The time when the response will expire represented as
      the number of milliseconds elapsed since midnight Jan 1, 1970 UTC
      not counting leap seconds. This will have the same values for
      seconds as standard UNIX time or POSIX time. This value MUST be
      between 1 and 600 seconds in the future.

   timestamp_initiated :  This value is copied from the diagnostics
      request message. The benefit of containing such a value in the
      response message is that the initiator node does not have to
      maintain the state.

   timestamp_received :  The time when the diagnostic request was
      received represented as the number of milliseconds elapsed since
      midnight Jan 1, 1970 UTC not counting leap seconds.
This will 631 have the same values for seconds as standard UNIX time or POSIX 632 time. 634 hop_counter : This field only appears in diagnostic responses. It 635 MUST be exactly copied from the TTL field of the forwarding header 636 in the received request. This information is sent back to the 637 request initiator, allowing it to compute the number of hops that 638 the message traversed in the overlay. 640 ext_length : the length of the returned DiagnosticInfo information 641 in bytes. If the value is greater than or equal to 1, then some 642 extended diagnostic information (as specified in the 643 DiagnosticsRequest) was available and is being returned. In that 644 case, this value indicates the length of the returned information. 645 A value of zero indicates no extended diagnostic information is 646 included, either because none was requested or the request could 647 not be accommodated. The value of ext_length MUST NOT be 648 negative. Note that it is not the length of the entire 649 DiagnosticsResponse data structure, but of the data making up the 650 diagnostic_info_list. 652 diagnostic_info_list : consists of one or more DiagnosticInfo 653 structures containing the requested diagnostic_info_contents. The 654 fields in the DiagnosticInfo structure are as follows: 656 kind : A numeric code indicating the type of information being 657 returned. For base data requested using the dMFlags, this code 658 corresponds to the dMFlags bit that was set, as described in Section 5.1. 659 For diagnostic extensions, this code will be identical to the 660 value of the DiagnosticKindId set in the "kind" field of the 661 DiagnosticExtension of the request. See Section 9.2. 663 diagnostic_info_contents : Data containing the value for the 664 diagnostic information being reported. Various kinds of 665 diagnostic information can be retrieved. Please refer to 666 Section 5.3 for details of the diagnostic Kind ID for the base 667 diagnostic information that may be reported. 669 5.3.
dMFlags and Diagnostic Kind ID Types 671 The dMFlags field described above is a 64-bit field that allows 672 initiator nodes to identify up to 62 items of base information to 673 request in a request message (the first and last flags being 674 reserved). The dMFlags field also reserves the all-"0"s value, which means nothing is 675 requested, and the all-"1"s value, which means everything is requested. At 676 the same time, the first and last bits cannot be used for other 677 purposes, and they MUST be set to 0 when other particular diagnostic 678 information kinds are requested. When the requested base information 679 is returned in the response, the value of the diagnostic Kind ID will 680 correspond to the numeric field marked in the dMFlags in the request. 681 The values for the dMFlags are defined in Section 9.1 and the 682 diagnostic Kind IDs are defined in Section 9.2. The information 683 contained for each value is described in this section. Access to 684 each kind of diagnostic information MUST NOT be allowed unless 685 it complies with the rules defined in Section 7. 687 STATUS_INFO (8 bits): A single value element containing an unsigned 688 byte representing whether or not the node is in congestion status. 689 An example usage of STATUS_INFO is for congestion-aware routing. 690 In this scenario, each peer has to update its congestion status 691 periodically. An intermediate peer in the distributed hash table 692 (DHT) network will choose its next hop according to both the DHT 693 routing algorithm and the status information. This is done to 694 avoid increasing load on congested peers. The rightmost 4 bits 695 are used and other bits MUST be cleared to "0"s for future use. 697 There are 16 levels of congestion status, with "0x00" representing 698 zero load and "0x0F" representing congested. This document does not 699 provide a specific method for measuring congestion, leaving this decision to 700 each overlay implementation.
One possible option for an overlay 701 implementation would be to take the node's CPU/memory/bandwidth usage 702 percentage in the past 600 seconds and normalize the highest value 703 to the range from 0x00 to 0x0F. An overlay implementation can 704 also decide not to use all 16 values from 0x00 to 0x0F. A 705 future draft may define an objective measure or specific algorithm 706 for this. 708 ROUTING_TABLE_SIZE (32 bits): A single value element containing an 709 unsigned 32-bit integer representing the number of peers in the 710 peer's routing table. The administrator of the overlay may be 711 interested in statistics of this value for reasons such as routing 712 efficiency. 714 PROCESS_POWER (64 bits): A single value element containing an 715 unsigned 64-bit integer specifying the processing power of the 716 node in units of MIPS. Fractional values are rounded up. 718 UPSTREAM_BANDWIDTH (64 bits): A single value element containing an 719 unsigned 64-bit integer specifying the upstream network bandwidth 720 (provisioned or maximum, not available) of the node in units of 721 Kbps. Fractional values are rounded up. For multihomed hosts, 722 this should be the link used to send the response. 724 DOWNSTREAM_BANDWIDTH (64 bits): A single value element containing 725 an unsigned 64-bit integer specifying the downstream network 726 bandwidth (provisioned or maximum, not available) of the node in 727 units of Kbps. Fractional values are rounded up. For multihomed 728 hosts, this should be the link the request was received from. 730 SOFTWARE_VERSION: A single value element containing a US-ASCII 731 string that identifies the manufacturer, model, operating system 732 information and the version of the software.
Given that there is a 733 very large number of peers in some networks, and no peer is likely 734 to know every other peer's software, this information may be very 735 useful to help determine if the cause of certain groups of 736 misbehaving peers is related to specific software versions. While 737 the format is peer-defined, a suggested format is as follows: 738 "ApplicationProductToken (Platform; OS-or-CPU) VendorProductToken 739 (VendorComment)". For example: "MyReloadApp/1.0 (Unix; Linux 740 x86_64) libreload-java/0.7.0 (Stonyfish Inc.)". The string is a 741 C-style string, and MUST be terminated by "\0". "\0" MUST NOT be 742 included in the string itself to prevent confusion with the 743 delimiter. 745 MACHINE_UPTIME (64 bits): A single value element containing an 746 unsigned 64-bit integer specifying the time the node's underlying 747 system has been up in seconds. 749 APP_UPTIME (64 bits): A single value element containing an 750 unsigned 64-bit integer specifying the time the P2P application 751 has been up in seconds. 753 MEMORY_FOOTPRINT (64 bits): A single value element containing an 754 unsigned 64-bit integer representing the memory footprint of the 755 peer program in kilobytes (1024 bytes). Fractional values are 756 rounded up. 758 DATASIZE_STORED (64 bits): An unsigned 64-bit integer representing 759 the number of bytes of data being stored by this node. 761 INSTANCES_STORED: An array element containing the number of 762 instances of each kind stored. The array is indexed by Kind-ID. 763 Each entry is an unsigned 64-bit integer. 765 MESSAGES_SENT_RCVD: An array element containing the number of 766 messages sent and received. The array is indexed by method code. 767 Each entry in the array is a pair of unsigned 64-bit integers 768 (packed end to end) representing sent and received.
770 EWMA_BYTES_SENT (32 bits): A single value element containing an 771 unsigned 32-bit integer representing an exponentially weighted moving 772 average of bytes sent per second by this peer. sent = alpha x 773 sent_present + (1 - alpha) x sent_last where sent_present 774 represents the bytes sent per second since the last calculation 775 and sent_last represents the last calculation of bytes sent per 776 second. A suitable value for alpha is 0.8 (an implementation may 777 choose another suitable value). This value is calculated 778 every five seconds (an implementation may also choose a different 779 time period). The value for the very first time 780 period should simply be the average of bytes sent in that time 781 period. 783 EWMA_BYTES_RCVD (32 bits): A single value element containing an 784 unsigned 32-bit integer representing an exponentially weighted moving 785 average of bytes received per second by this peer. rcvd = alpha x 786 rcvd_present + (1 - alpha) x rcvd_last where rcvd_present 787 represents the bytes received per second since the last 788 calculation and rcvd_last represents the last calculation of bytes 789 received per second. A suitable value for alpha is 0.8 (an 790 implementation may choose another suitable value). This 791 value is calculated every five seconds (an implementation may 792 also choose a different time period). The value for the 793 very first time period should simply be the average of bytes 794 received in that time period. 796 UNDERLAY_HOP (8 bits): Indicates the IP layer hops from the 797 intermediate peer which receives the diagnostics message to the 798 next hop peer for this message. (Note: RELOAD does not require 799 the intermediate peers to look into the message body. So here we 800 use PathTrack to gather underlay hops for diagnostics purposes). 802 BATTERY_STATUS (8 bits): The left-most bit is used to indicate 803 whether this peer is using a battery or not.
If this bit is clear 804 (set to '0'), then the peer is using a battery for power. The 805 other 7 bits are to be determined by specific applications. 807 6. Message Processing 809 6.1. Message Creation and Transmission 811 When constructing either a Ping message with diagnostic extensions or 812 a PathTrack message, the sender first creates and populates a 813 DiagnosticsRequest data structure. The timestamp_initiated field is 814 set to the current time, and the expiration field is constructed 815 based on this time. The sender includes the dMFlags field in the 816 structure, setting any number (including all) of the flags to request 817 particular diagnostic information. The sender MAY leave all the bits 818 unset, requesting no particular diagnostic information. 820 The sender MAY also include diagnostic extensions in the 821 DiagnosticsRequest data structure to request additional information. 822 If the sender includes any extensions, it MUST calculate the length 823 of these extensions and set the ext_length field to this value. If 824 no extensions are included, the sender MUST set ext_length to zero. 826 The format of the DiagnosticsRequest data structure and its fields 827 MUST follow the restrictions defined in Section 5.1. 829 When constructing a Ping message with diagnostic extensions, the 830 sender MUST create a MessageExtension structure as defined in RELOAD 831 [RFC6940], setting the value of type to 0x0002, and the value of 832 critical to FALSE. The value of extension_contents MUST be a 833 DiagnosticsRequest structure as defined above. The message MAY be 834 directed to a particular NodeId or ResourceID, but MUST NOT be sent 835 to the broadcast NodeID. 837 When constructing a PathTrack message, the sender MUST set the 838 message_code for the RELOAD MessageContents structure to 839 path_track_req TBD7. The request field of the PathTrackReq MUST be 840 set to the DiagnosticsRequest data structure defined above.
The 841 destination field MUST be set to the desired destination, which MAY 842 be either a NodeId or ResourceID but SHOULD NOT be the broadcast 843 NodeID. 845 6.2. Message Processing: Intermediate Peers 847 When a request arrives at a peer, if the peer's responsible ID space 848 does not cover the destination ID of the request, then the peer MUST 849 continue processing this request according to the overlay-specified 850 routing mode of the RELOAD protocol. 852 In a P2P overlay, error responses to a message can be generated by 853 either an intermediate peer or the responsible peer. When a request 854 is received at a peer, the peer may find connectivity failures or 855 malfunctioning peers through the pre-defined rules of the overlay 856 network, e.g. by analyzing the Via list or underlay error messages. In 857 this case, the intermediate peer returns an error response to the 858 initiator node, reporting any available information about the malfunctioning node 859 in the error message payload. All error responses generated MUST 860 contain the appropriate error code. 862 Each intermediate peer receiving a Ping message with extensions (and 863 which understands the extension) or receiving a PathTrack request/ 864 response MUST check the expiration value (Unix time format) to 865 determine if the message is expired. If the message has expired, the 866 intermediate peer MUST generate a response with Error Code TBD3 867 "Message Expired", return the response to the initiator node, and 868 discard the message. 870 The intermediate peer MUST return an error response with the Error 871 Code TBD1 "Underlay Destination Unreachable" when it receives an ICMP 872 message with "Destination Unreachable" information after forwarding 873 the received request to the destination peer. 875 The intermediate peer MUST return an error response with the Error 876 Code TBD2 "Underlay Time Exceeded" when it receives an ICMP message 877 with "Time Exceeded" information after forwarding the received 878 request.
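The expiration check and the two ICMP-triggered error cases described above can be sketched as follows. This is a minimal illustration, not part of the specification: the names DiagError, check_expiration, and error_for_icmp are hypothetical, the error code values are still TBD in this document and are therefore kept as placeholder strings, the ICMP type numbers are the ICMPv4 values from [RFC0792], and treating a message as expired only when the current time is strictly later than the expiration value is an assumption.

```python
import time
from enum import Enum
from typing import Optional


class DiagError(Enum):
    """Hypothetical names for three of the new error codes.

    The numeric values are still TBD in this document, so the
    placeholders are carried as strings rather than invented numbers.
    """
    UNDERLAY_DESTINATION_UNREACHABLE = "TBD1"
    UNDERLAY_TIME_EXCEEDED = "TBD2"
    MESSAGE_EXPIRED = "TBD3"


# ICMPv4 message types as defined in RFC 792 (ICMPv6 uses other values).
ICMP_DESTINATION_UNREACHABLE = 3
ICMP_TIME_EXCEEDED = 11


def check_expiration(expiration_ms: int,
                     now_ms: Optional[int] = None) -> Optional[DiagError]:
    """Return MESSAGE_EXPIRED if the expiration time has passed, else None.

    Both arguments are milliseconds since the Unix epoch. Using a strict
    comparison at the boundary is an assumption of this sketch.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return DiagError.MESSAGE_EXPIRED if now_ms > expiration_ms else None


def error_for_icmp(icmp_type: int) -> Optional[DiagError]:
    """Map an ICMP type received after forwarding to the error to report."""
    if icmp_type == ICMP_DESTINATION_UNREACHABLE:
        return DiagError.UNDERLAY_DESTINATION_UNREACHABLE
    if icmp_type == ICMP_TIME_EXCEEDED:
        return DiagError.UNDERLAY_TIME_EXCEEDED
    return None  # other ICMP messages trigger none of these errors
```

An intermediate peer would call check_expiration before forwarding, and error_for_icmp when the underlay reports a failure for a request it has already forwarded.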
880 The peer MUST return an Error response with Error Code TBD4 "Upstream 881 Misrouting" when it finds its upstream peer disobeys the routing 882 rules defined in the overlay. The immediate upstream peer 883 information MUST also be conveyed to the initiator node. 885 The peer MUST return an Error response with Error Code TBD5 "Loop 886 detected" when it finds a loop through the analysis of the Via list. 888 The peer MUST return an Error response with Error Code TBD6 "TTL hops 889 exceeded" when it finds that the TTL field value is no more than 0 890 when forwarding. 892 6.3. Message Response Creation 894 When a diagnostic request message arrives at a peer that is 895 responsible for the destination ID specified in the forwarding 896 header, and assuming the peer understands the extension (in the case of 897 Ping) or the new request type PathTrack, it MUST follow the 898 specifications defined in RELOAD to form the response header, and 899 perform the following operations: 901 When constructing a PathTrack response, the sender MUST set the 902 message_code for the RELOAD MessageContents structure to 903 path_track_ans TBD8. 905 The receiver MUST check the expiration value (Unix time format) in 906 the DiagnosticsRequest to determine if the message is expired. If 907 the message is expired, the peer MUST generate a response with the 908 Error Code TBD3 "Message Expired", return the response to the 909 initiator node, and discard the message. 911 If the message is not expired, the receiver MUST construct a 912 DiagnosticsResponse structure, as follows: The TTL value from the 913 forwarding header is copied to the hop_counter field of the 914 DiagnosticsResponse structure. Note that the default initial value for TTL 915 is 100 hops unless the overlay configuration has 916 overridden the value.
The receiver generates a Unix time format 917 timestamp for the current time of day and places it in the 918 timestamp_received field, and constructs a new expiration time and 919 places it in the expiration field of the DiagnosticsResponse. 921 The destination peer MUST check if the initiator node has the 922 authority to request specific types of diagnostic information, and if 923 appropriate, append the diagnostic information requested in the 924 dMFlags and diagnostic_extensions (if any) using the 925 diagnostic_info_list field to the DiagnosticsResponse structure. If 926 any information is returned, the receiver MUST calculate the length of 927 the response and set ext_length appropriately. If no diagnostic 928 information is returned, ext_length MUST be set to zero. 930 The format of the DiagnosticsResponse data structure and its fields 931 MUST follow the restrictions defined in Section 5.2. 933 In the event of an error, an error response containing the error code 934 followed by the description (if one exists) MUST be created and sent 935 to the sender. If the initiator node asks for diagnostic information 936 that it is not authorized to query, the receiving peer MUST return 937 an Error response with the Error Code 2 "Error_Forbidden". 939 6.4. Interpreting Results 941 The initiator node, as well as the responding peer, may compute the 942 overlay One-Way-Delay time through the value in timestamp_received 943 and the timestamp_initiated field. However, for a single hop 944 measurement, the traditional measurement methods (IP layer ping) MUST 945 be used instead of the overlay layer diagnostics methods. 947 The P2P overlay network using the diagnostics methods specified in 948 this document MUST enforce time synchronization with a central time 949 server.
Network Time Protocol [RFC5905] can usually maintain time to 950 within tens of milliseconds over the public Internet, and can achieve 951 better than one millisecond accuracy in local area networks under 952 ideal conditions. However, this document does not specify the choice 953 for time resolution and synchronization, leaving it to the 954 implementation. 956 The initiator node receiving the Ping response may check the 957 hop_counter field and compute the overlay hops to the destination 958 peer for the statistics of connectivity quality from the perspective 959 of overlay hops. 961 7. Authorization through Overlay Configuration 963 Different levels of access control can be applied to different users/ 964 nodes. For example, diagnostic information A can be accessed by nodes 965 1 and 2, but diagnostic information B can only be accessed by node 2. 967 The overlay configuration file MUST contain the following XML 968 elements for authorizing a node to access the relevant diagnostic 969 Kinds. 971 diagnostic-kind: This has the attribute "kind" with the hexadecimal 972 number indicating the diagnostic Kind ID (this attribute has the same 973 value as in Section 9.2), and at least one sub-element "access-node". 975 access-node: This element contains one hexadecimal number indicating 976 a NodeID, and the node with this NodeID is allowed to access the 977 diagnostic "kind" under the same diagnostic-kind element. 979 8. Security Considerations 981 The authorization for diagnostic information must be designed with 982 care to prevent it becoming a method to retrieve information for bot 983 attacks. It should also be noted that attackers can use diagnostics 984 to analyze overlay information to attack certain key peers.
For 985 example, diagnostic information might be used to fingerprint a peer, 986 in which case the peer will lose its anonymity characteristics, but 987 anonymity might be very important for some P2P overlay networks, and 988 defenses against such fingerprinting are probably very hard. As 989 such, networks where anonymity is of very high importance may find 990 implementation of diagnostics problematic or even undesirable, 991 despite the many advantages it offers. As this document is a RELOAD 992 extension, it follows RELOAD message header and routing 993 specifications; the common security considerations described in the 994 base document [RFC6940] are also applicable to this document. 995 Overlays may define their own requirements on who can collect/share 996 diagnostic information. 998 9. IANA Considerations 1000 9.1. Diagnostics Flag 1002 IANA is asked to create a "RELOAD Diagnostics Flag" Registry under 1003 protocol RELOAD. Entries in this registry are 1-bit flags contained 1004 in a 64-bit integer dMFlags denoting diagnostic information to 1005 be retrieved as described in Section 5.3. New entries SHALL be 1006 defined via [RFC5226] Standards Action.
The initial contents of this 1007 registry are:
1009 +-------------------------+----------------------------+----------+
1010 | diagnostic information  |diagnostic flag in dMFlags  |   RFC    |
1011 |-------------------------+----------------------------+----------|
1012 |Reserved All 0s value    |  0x 0000 0000 0000 0000    |RFC-[TBDX]|
1013 |Reserved First Bit       |  0x 0000 0000 0000 0001    |RFC-[TBDX]|
1014 |STATUS_INFO              |  0x 0000 0000 0000 0002    |RFC-[TBDX]|
1015 |ROUTING_TABLE_SIZE       |  0x 0000 0000 0000 0004    |RFC-[TBDX]|
1016 |PROCESS_POWER            |  0x 0000 0000 0000 0008    |RFC-[TBDX]|
1017 |UPSTREAM_BANDWIDTH       |  0x 0000 0000 0000 0010    |RFC-[TBDX]|
1018 |DOWNSTREAM_BANDWIDTH     |  0x 0000 0000 0000 0020    |RFC-[TBDX]|
1019 |SOFTWARE_VERSION         |  0x 0000 0000 0000 0040    |RFC-[TBDX]|
1020 |MACHINE_UPTIME           |  0x 0000 0000 0000 0080    |RFC-[TBDX]|
1021 |APP_UPTIME               |  0x 0000 0000 0000 0100    |RFC-[TBDX]|
1022 |MEMORY_FOOTPRINT         |  0x 0000 0000 0000 0200    |RFC-[TBDX]|
1023 |DATASIZE_STORED          |  0x 0000 0000 0000 0400    |RFC-[TBDX]|
1024 |INSTANCES_STORED         |  0x 0000 0000 0000 0800    |RFC-[TBDX]|
1025 |MESSAGES_SENT_RCVD       |  0x 0000 0000 0000 1000    |RFC-[TBDX]|
1026 |EWMA_BYTES_SENT          |  0x 0000 0000 0000 2000    |RFC-[TBDX]|
1027 |EWMA_BYTES_RCVD          |  0x 0000 0000 0000 4000    |RFC-[TBDX]|
1028 |UNDERLAY_HOP             |  0x 0000 0000 0000 8000    |RFC-[TBDX]|
1029 |BATTERY_STATUS           |  0x 0000 0000 0001 0000    |RFC-[TBDX]|
1030 |Reserved Last Bit        |  0x 8000 0000 0000 0000    |RFC-[TBDX]|
1031 |Reserved All 1s value    |  0x FFFF FFFF FFFF FFFF    |RFC-[TBDX]|
1032 +-------------------------+----------------------------+----------+
1034 [To RFC editor: Please replace all RFC-[TBDX] in this document with 1035 the RFC number of this document.] 1037 9.2. Diagnostic Kind ID 1039 IANA is asked to create a "RELOAD Diagnostic Kind ID" Registry under 1040 protocol RELOAD. Entries in this registry are 16-bit integers 1041 denoting diagnostics extension data kinds carried in the diagnostic 1042 request and response message, as described in Section 5.2.
IANA is asked to assign code 1043 points from 0x0001 to 0x003E together with the 1044 flags in the "RELOAD Diagnostics Flag" registry via RFC 5226 1045 [RFC5226] Standards Action. Code points in the range 0x003F to 1046 0xEFFF SHALL be registered via RFC 5226 Standards Action.
1048 +---------------------------+---------------+---------------+
1049 | Diagnostic Kind           | Code          | Specification |
1050 +---------------------------+---------------+---------------+
1051 | reserved                  | 0x0000        | RFC-[TBDX]    |
1052 | STATUS_INFO               | 0x0001        | RFC-[TBDX]    |
1053 | ROUTING_TABLE_SIZE        | 0x0002        | RFC-[TBDX]    |
1054 | PROCESS_POWER             | 0x0003        | RFC-[TBDX]    |
1055 | UPSTREAM_BANDWIDTH        | 0x0004        | RFC-[TBDX]    |
1056 | DOWNSTREAM_BANDWIDTH      | 0x0005        | RFC-[TBDX]    |
1057 | SOFTWARE_VERSION          | 0x0006        | RFC-[TBDX]    |
1058 | MACHINE_UPTIME            | 0x0007        | RFC-[TBDX]    |
1059 | APP_UPTIME                | 0x0008        | RFC-[TBDX]    |
1060 | MEMORY_FOOTPRINT          | 0x0009        | RFC-[TBDX]    |
1061 | DATASIZE_STORED           | 0x000A        | RFC-[TBDX]    |
1062 | INSTANCES_STORED          | 0x000B        | RFC-[TBDX]    |
1063 | MESSAGES_SENT_RCVD        | 0x000C        | RFC-[TBDX]    |
1064 | EWMA_BYTES_SENT           | 0x000D        | RFC-[TBDX]    |
1065 | EWMA_BYTES_RCVD           | 0x000E        | RFC-[TBDX]    |
1066 | UNDERLAY_HOP              | 0x000F        | RFC-[TBDX]    |
1067 | BATTERY_STATUS            | 0x0010        | RFC-[TBDX]    |
1068 | reserved for future flags | 0x0011-3E     | RFC-[TBDX]    |
1069 | local use (reserved)      | 0xF000-0xFFFE | RFC-[TBDX]    |
1070 | reserved                  | 0xFFFF        | RFC-[TBDX]    |
1071 +---------------------------+---------------+---------------+
1073 Table 1: Diagnostic Kind 1075 9.3. Message Codes 1077 This document introduces two new types of messages and their 1078 responses, requiring the following additions to the "RELOAD Message 1079 Code" Registry defined in RELOAD [RFC6940].
These additions are:
1081 +-------------------+------------+----------+
1082 | Message Code Name | Code Value |   RFC    |
1083 +-------------------+------------+----------+
1084 | path_track_req    | [TBD7]     | RFC-AAAA |
1085 | path_track_ans    | [TBD8]     | RFC-AAAA |
1086 +-------------------+------------+----------+
1088 Table 2: Extensions to RELOAD Message Codes 1090 [To RFC editor: Values starting at TBD1 were used to prevent 1091 collisions with RELOAD base values and other extensions. Please 1092 replace with the next highest available values. The final message 1093 codes will be assigned by IANA. All RFC-AAAA should be replaced 1094 with the RFC number of RELOAD upon publication.] 1096 9.4. Error Code 1098 This document introduces the following new error codes, extending the 1099 "RELOAD Message Code" registry as described below:
1101 +----------------------------------------+------------+----------+
1102 | Message Code Name                      | Code Value |   RFC    |
1103 +----------------------------------------+------------+----------+
1104 | Error_Underlay_Destination_Unreachable | [TBD1]     | RFC-AAAA |
1105 | Error_Underlay_Time_Exceeded           | [TBD2]     | RFC-AAAA |
1106 | Error_Message_Expired                  | [TBD3]     | RFC-AAAA |
1107 | Error_Upstream_Misrouting              | [TBD4]     | RFC-AAAA |
1108 | Error_Loop_Detected                    | [TBD5]     | RFC-AAAA |
1109 | Error_TTL_Hops_Exceeded                | [TBD6]     | RFC-AAAA |
1110 +----------------------------------------+------------+----------+
1112 Table 3: Extensions to RELOAD Error Codes 1114 [To RFC editor: Values starting at TBD1 were used to prevent 1115 collisions with RELOAD base values and other extensions. Please 1116 replace with the next highest available values. The final message 1117 codes will be assigned by IANA. All RFC-AAAA should be replaced 1118 with the RFC number of RELOAD upon publication.] 1120 9.5.
Message Extension 1122 This document introduces the following new RELOAD extension code:
1124 +-----------------+------------+----------+
1125 | Extension Name  | Code Value |   RFC    |
1126 +-----------------+------------+----------+
1127 | Diagnostic_Ping | 0x0002     | RFC-AAAA |
1128 +-----------------+------------+----------+
1130 Table 4: New RELOAD Extension Code 1132 [To RFC editor: The value 0x0002 was used to prevent collisions with 1133 other extensions. Please replace with the next highest available 1134 value. The final codes will be assigned by IANA. All RFC-AAAA 1135 should be replaced with the RFC number of RELOAD upon publication.] 1137 9.6. XML Name Space Registration 1139 This document registers a URI for the config-diagnostics XML 1140 namespace in the IETF XML registry defined in [RFC3688]. All the 1141 elements defined in this document belong to this namespace. 1143 URI: urn:ietf:params:xml:ns:p2p:config-diagnostics 1144 Registrant Contact: The IESG. 1145 XML: N/A, the requested URIs are XML namespaces 1147 The overlay configuration file MUST also contain the following XML 1148 declaring P2P diagnostics as a mandatory extension to 1149 RELOAD.
1151 <mandatory-extension>
1152 urn:ietf:params:xml:ns:p2p:config-diagnostics
1153 </mandatory-extension>
1155 10. Acknowledgments 1157 We would like to thank Zheng Hewen for the contribution of the 1158 initial version of this document. We would also like to thank Bruce 1159 Lowekamp, Salman Baset, Henning Schulzrinne, Jiang Haifeng and Marc 1160 Petit-Huguenin for the email discussion and their valued comments, 1161 and special thanks to Henry Sinnreich for contributing to the usage 1162 scenarios text. We would like to thank the authors of the RELOAD 1163 protocol for transferring text about diagnostics to this document. 1165 11. References 1167 11.1. Normative References 1169 [RFC0792] Postel, J., "Internet Control Message Protocol", STD 5, 1170 RFC 792, DOI 10.17487/RFC0792, September 1981, 1171 .
1173 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1174 Requirement Levels", BCP 14, RFC 2119, 1175 DOI 10.17487/RFC2119, March 1997, 1176 . 1178 [RFC3688] Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688, 1179 DOI 10.17487/RFC3688, January 2004, 1180 . 1182 [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an 1183 IANA Considerations Section in RFCs", BCP 26, RFC 5226, 1184 DOI 10.17487/RFC5226, May 2008, 1185 . 1187 [RFC5905] Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch, 1188 "Network Time Protocol Version 4: Protocol and Algorithms 1189 Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010, 1190 . 1192 [RFC6940] Jennings, C., Lowekamp, B., Ed., Rescorla, E., Baset, S., 1193 and H. Schulzrinne, "REsource LOcation And Discovery 1194 (RELOAD) Base Protocol", RFC 6940, DOI 10.17487/RFC6940, 1195 January 2014, . 1197 [RFC7263] Zong, N., Jiang, X., Even, R., and Y. Zhang, "An Extension 1198 to the REsource LOcation And Discovery (RELOAD) Protocol 1199 to Support Direct Response Routing", RFC 7263, 1200 DOI 10.17487/RFC7263, June 2014, 1201 . 1203 11.2. Informative References 1205 [UnixTime] 1206 "UnixTime", .>. 1209 [I-D.ietf-p2psip-concepts] 1210 Bryan, D., Matthews, P., Shim, E., Willis, D., and S. 1211 Dawkins, "Concepts and Terminology for Peer to Peer SIP", 1212 draft-ietf-p2psip-concepts-08 (work in progress), February 1213 2016. 1215 [Overlay-Failure-Detection] 1216 Zhuang, S., "On failure detection algorithms in overlay 1217 networks", Proc. IEEE Infocomm, Mar 2005. 1219 [Handling_Churn_in_a_DHT] 1220 Rhea, S., "Handling Churn in a DHT", USENIX Annual 1221 Conference, June 2004. 1223 [Diagnostic_Framework] 1224 Jin, X., "A Diagnostic Framework for Peer-to-Peer 1225 Streaming", 2005. 1227 [Diagnostics_and_NAT_traversal_in_P2PP] 1228 Gupta, G., "Diagnostics and NAT Traversal in P2PP - Design 1229 and Implementation", Columbia University Report , June 1230 2008. 1232 Appendix A. 
Examples 1234 Below, we sketch how these metrics can be used. 1236 A.1. Example 1 1238 A peer may set EWMA_BYTES_SENT and EWMA_BYTES_RCVD flags in the 1239 PathTrackReq to its direct neighbors. A peer can use EWMA_BYTES_SENT 1240 and EWMA_BYTES_RCVD of another peer to infer whether it is acting as 1241 a media relay. It may then choose not to forward any requests for 1242 media relay to this peer. Similarly, among the various candidates 1243 for filling up routing table, a peer may prefer a peer with a large 1244 UPTIME value, small RTT, and small LAST_CONTACT value. 1246 A.2. Example 2 1248 A peer may set the STATUS_INFO Flag in the PathTrackReq to a remote 1249 destination peer. The overlay has its own threshold definition for 1250 congestion. The peer can obtain knowledge of all the status 1251 information of the intermediate peers along the path. Then it can 1252 choose other paths to that node for the subsequent requests. 1254 A.3. Example 3 1256 A peer may use Ping to evaluate the average overlay hops to other 1257 peers by sending PingReq to a set of random resource or node IDs in 1258 the overlay. A peer may adjust its timeout value according to the 1259 change of average overlay hops. 1261 Appendix B. Problems with Generating Multiple Responses on Path 1263 An earlier version of this document considered an approach where a 1264 response was generated by each intermediate peer as the message 1265 traversed the overlay. This approach was discarded. One reason this 1266 approach was discarded was that it could provide a DoS mechanism, 1267 whereby an attacker could send an arbitrary message claiming to be 1268 from a spoofed "sender" the real sender wished to attack. As a 1269 result of sending this one message, many messages would be generated 1270 and sent back to the spoofed "sender" - one from each intermediate 1271 peer on the message path. 
While authentication mechanisms could 1272 reduce some risk of this attack, it still resulted in a fundamental 1273 break from the request-response nature of the RELOAD protocol, as 1274 multiple responses are generated to a single request. Although one 1275 request with responses from all the peers in the route would be more 1276 efficient, it was determined to be too great a security risk and 1277 deviation from the RELOAD architecture. 1279 Appendix C. Changes to the Draft 1281 To RFC editor: This section is to track the changes. Please remove 1282 this section before publication. 1284 C.1. Changes since -00 version 1286 1. Changed title from "Diagnose P2PSIP Overlay Network" to "P2PSIP 1287 Overlay Diagnostics". 1289 2. Changed the table of contents. Added a section about message 1290 processing and a section of examples. 1292 3. Merged diagnostics text from the p2psip base draft -01. 1294 4. Removed ECHO method for security reasons. 1296 C.2. Changes since -01 version 1298 Added BATTERY_STATUS as diagnostic information. 1300 Removed UnderlayTTL test from the Ping method, instead adding an 1301 UNDERLAY_HOP diagnostic information for PathTrack method. 1303 Gave some examples for diagnostic information, and gave some 1304 editor's notes for further work. 1306 C.3. Changes since -02 version 1308 Provided further explanation as to why the base draft Ping in the 1309 current form cannot be used to replace Ping, and why some combination 1310 of methods cannot replace PathTrack. 1312 C.4. Changes since -03 version 1314 Modified structure used to share information collected. Both 1315 mechanisms now use a common data structure to convey information. 1317 C.5. Changes since -04 version 1319 Updated the authors' addresses and modified the last sentence in . 1320 (Section 4.3.1.2) 1322 C.6. Changes since -05 version 1324 Resolved Marc's comments from the mailing list and defined the 1325 details of STATUS_INFO. 1327 C.7.
Changes in version -10 1329 Resolve the authorization issue and other comments (e.g. define 1330 diagnostics as a mandatory extension) from WGLC. And check for the 1331 languages. 1333 C.8. Changes in version -15 1335 Changed several diagnostic Kind return values to be 64 bit vs. 32 bit 1336 to provide headroom. Split bandwidth into upstream and downstream. 1337 Renamed length in diagnostic request object to ext_length, added 1338 ext_length to response object, and clarified that ext_length is 1339 length of diagnostic info/extensions being returned, not the length 1340 of the object. 1342 Aligned many flags/values with RELOAD by using hex vs decimal values. 1344 Significant reorganization and edit for readability. 1346 C.9. Changes in version -20 1348 Addressed the IESG comments: 1350 (1) this document does not update RFC 6940, but is an extension 1352 (2) remove "p2psip" from the document, according to Ben and 1353 Benoit's comments 1355 (3) update Roni's email address 1357 (4) re-check the document to make sure that access control policy 1358 is the same 1360 (5) change Trust policy from "pre-5378" to "200902" 1362 (6) adress the EWMA_BYTES_RCVD and EWMA_BYTES_SENT equation 1363 problem rasied by Alisa 1365 (7) replace "IANA SHALL" with "IANA is asked to" according to 1366 Spencer and Barry's concern 1368 (8) replace "SHOULD's with "MUST"s in Section 6.2, change "MAY" to 1369 "may" in Section 6.4 according to Ben's comments 1371 (9) add a paragraph in Section 4.3 to explain this document does 1372 not gurantee the same path fro Path_Track, but only provides 1373 information for analysis, according to the list discussion with 1374 Alvaro 1376 (10) change "directly or via symmetric routing" in Section 4.3 to 1377 "direct response routing or via symmetric routing", and give a 1378 reference to direct response routing RFC, according to the list 1379 discussion with Alvaro 1380 (11) change Section 5.3 and 9.1 about the reserved dMFlags bits 1381 issue according to 
   Jari and Alexey's comment

   (12) replace "diagnostic kind type" with "diagnostic Kind"

   (13) correct other minor editorial issues

C.10.  Changes in version -22

   (1) fix the bugs in IANA section

Authors' Addresses

   Haibin Song
   Huawei

   Email: haibin.song@huawei.com


   Jiang Xingfeng
   Huawei

   Email: jiangxingfeng@huawei.com


   Roni Even
   Huawei
   14 David Hamelech
   Tel Aviv 64953
   Israel

   Email: ron.even.tlv@gmail.com


   David A. Bryan
   ethernot.org
   Cedar Park, Texas
   United States of America

   Email: dbryan@ethernot.org


   Yi Sun
   ICT

   Email: sunyi@ict.ac.cn