P2PSIP Working Group                                             H. Song
Internet-Draft                                                  X. Jiang
Updates: 6940 (if approved)                                      R. Even
Intended status: Standards Track                                  Huawei
Expires: May 29, 2016                                           D. Bryan
                                                            ethernot.org
                                                                  Y. Sun
                                                                     ICT
                                                       November 26, 2015


                        P2P Overlay Diagnostics
                    draft-ietf-p2psip-diagnostics-19

Abstract

   This document describes mechanisms for P2P overlay diagnostics. It
   defines extensions to the RELOAD P2PSIP base protocol to collect
   diagnostic information, and details the protocol specifications for
   these extensions.
   Useful diagnostic information for connection and
   node status monitoring is also defined. The document also describes
   the usage scenarios and provides examples of how these methods are
   used to perform diagnostics in P2PSIP overlay networks.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on May 29, 2016.

Copyright Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Diagnostic Scenarios
   4.  Data Collection Mechanisms
     4.1.  Overview of Operations
     4.2.  "Ping-like" Behavior: Extending Ping
       4.2.1.  RELOAD Request Extension: Ping
     4.3.  "Traceroute-like" Behavior: The Path_Track Method
       4.3.1.  New RELOAD Request: PathTrack
     4.4.  Error Code Extensions
   5.  Diagnostic Data Structures
     5.1.  DiagnosticsRequest Data Structure
     5.2.  DiagnosticsResponse Data Structure
     5.3.  dMFlags and Diagnostic Kind ID Types
   6.  Message Processing
     6.1.  Message Creation and Transmission
     6.2.  Message Processing: Intermediate Peers
     6.3.  Message Response Creation
     6.4.  Interpreting Results
   7.  Authorization through Overlay Configuration
   8.  Security Considerations
   9.  IANA Considerations
     9.1.  Diagnostics Flag
     9.2.
           Diagnostic Kind ID Types
     9.3.  Message Codes
     9.4.  Error Code
     9.5.  Message Extension
     9.6.  XML Name Space Registration
   10. Acknowledgments
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Examples
     A.1.  Example 1
     A.2.  Example 2
     A.3.  Example 3
   Appendix B.  Problems with Generating Multiple Responses on Path
   Appendix C.  Changes to the Draft
     C.1.  Changes since -00 version
     C.2.  Changes since -01 version
     C.3.  Changes since -02 version
     C.4.  Changes since -03 version
     C.5.  Changes since -04 version
     C.6.  Changes since -05 version
     C.7.  Changes in version -10
     C.8.  Changes in version -15
   Authors' Addresses

1.  Introduction

   In the last few years, overlay networks have rapidly evolved and
   emerged as a promising platform for deployment of new applications
   and services in the Internet.
   One of the reasons overlay networks
   are seen as an excellent platform for large scale distributed systems
   is their resilience in the presence of failures. This resilience has
   three aspects: data replication, routing recovery, and static
   resilience. Routing recovery algorithms are used to repopulate the
   routing table with live nodes when failures are detected. Static
   resilience measures the extent to which an overlay can route around
   failures even before the recovery algorithm repairs the routing
   table. Both routing recovery and static resilience rely on accurate
   and timely detection of failures.

   There are a number of situations in which some nodes in a
   Peer-to-Peer (P2P) overlay may malfunction or behave badly. For
   example, these nodes may be disabled, congested, or may be misrouting
   messages. The impact of these malfunctions on the overlay network
   may be a degradation of the quality of service provided collectively
   by the peers in the overlay network or an interruption of the overlay
   services. It is desirable to identify malfunctioning or badly
   behaving peers through diagnostic tools, and exclude or reject them
   from the P2P system. Node failures may also be caused by failures of
   underlying layers. For example, recovery from an incorrect overlay
   topology may be slow when IP routing recovers slowly after link
   failures. Moreover, if a backbone link fails and the failover is
   slow, the network may be partitioned, leading to partitions of
   overlay topologies and inconsistent routing results between the
   partitioned components.

   Some keep-alive algorithms based on periodic probe and acknowledge
   mechanisms enable accurate and timely detection of failures of a
   node's neighbors [Overlay-Failure-Detection], but these algorithms by
   themselves can only detect disabled neighbors using the periodic
   method.
   This may not be sufficient for the service provider
   operating the overlay network.

   For Peer-to-Peer SIP (P2PSIP), a single, general P2PSIP overlay
   diagnostic framework supporting periodic and on-demand methods for
   detecting node failures and network failures is desirable. This
   document describes a general P2PSIP overlay diagnostic extension to
   the P2PSIP base protocol RELOAD [RFC6940] and is intended as a
   complement to keep-alive algorithms in the P2PSIP overlay itself.
   Readers are advised to consult [I-D.ietf-p2psip-concepts] for further
   background on the problem domain.

2.  Terminology

   This document uses the concepts defined in RELOAD [RFC6940].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Diagnostic Scenarios

   P2P systems are self-organizing, and ideally the setup and
   configuration of individual P2P nodes require no network management
   in the traditional sense. However, users of an overlay, as well as
   P2P service providers, may contemplate usage scenarios where some
   monitoring and diagnostics are required. We present a simple
   connectivity test and some useful diagnostic information that may be
   used in such diagnostics.

   The common usage scenarios for P2P diagnostics can be broadly
   categorized in three classes:

   a.  Automatic diagnostics built into the P2P overlay routing
       protocol. Nodes perform periodic checks of known neighbors and
       remove those nodes from the routing tables that fail to respond
       to connectivity checks [Handling_Churn_in_a_DHT]. Unresponsive
       nodes may only be temporarily disabled, for example due to a
       local cryptographic processing overload, disk processing overload
       or link overload.
       It is therefore useful to repeat the
       connectivity checks to see whether nodes have recovered and can
       again be placed in the routing tables. This process is known as
       'failed node recovery' and can be optimized as described in the
       paper "Handling Churn in a DHT" [Handling_Churn_in_a_DHT].

   b.  Diagnostics used by a particular node to follow up on an
       individual user complaint or failure. For example, a technical
       support staff member may use a desktop sharing application (with
       the permission of the user) to remotely determine the health of,
       and possible problems with, the malfunctioning node. Part of the
       remote diagnostics may consist of simple connectivity tests with
       other nodes in the P2PSIP overlay and retrieval of statistics
       from nodes in the overlay. The simple connectivity tests are not
       dependent on the type of P2PSIP overlay. Note that other tests
       may be required as well, including checking the health and
       performance of the user's computer or mobile device and checking
       the bandwidth of the link connecting the user to the Internet.

   c.  P2P system-wide diagnostics used to check the overall health of
       the P2P overlay network. These include checking the consumption
       of network bandwidth, checking for the presence of problem links
       and checking for abusive or malicious nodes. This is not a
       trivial problem; it has been studied in detail for content and
       streaming P2P overlays [Diagnostic_Framework], but has not been
       addressed in earlier P2PSIP documents
       [Diagnostics_and_NAT_traversal_in_P2PP]. While this is a
       difficult problem, a great deal of information that can help in
       diagnosing these problems can be gathered by collecting basic
       diagnostic information for peers and the network. This document
       provides a framework for obtaining this information.

4.  Data Collection Mechanisms

4.1.
Overview of Operations

   The diagnostic mechanisms described in this document are primarily
   intended to detect and locate failures or monitor performance in
   P2PSIP overlay networks. They provide mechanisms to detect and
   locate malfunctioning or badly behaving nodes, including disabled
   nodes, congested nodes, and misrouting peers. They also provide a
   mechanism to detect direct connectivity or connectivity to a
   specified node, a mechanism to detect the availability of specified
   resource records, and a mechanism to discover the P2PSIP overlay
   topology and underlay topology failures.

   The P2PSIP diagnostics extensions define two mechanisms to collect
   data. The first is an extension to the RELOAD Ping mechanism,
   allowing diagnostic data to be queried from a node, as well as to
   diagnose the path to that node. The second is a new method and
   response, PathTrack, for collecting diagnostic information
   iteratively. Payloads for these mechanisms, which allow diagnostic
   data to be collected and represented, are presented, and additional
   error codes are introduced. Essentially, this document reuses the
   RELOAD [RFC6940] specification and extends it to introduce the new
   diagnostics methods. The extensions strictly follow how RELOAD
   specifies message routing, transport, NAT traversal, and other RELOAD
   protocol features. The diagnostic methods are, however, P2PSIP
   protocol independent.

   This document primarily describes how to detect and locate failures
   including disabled nodes, congested nodes, misrouting behaviors and
   underlying network faults in P2PSIP overlay networks through a simple
   and efficient mechanism. This mechanism is modeled after the
   ping/traceroute paradigm: ping [RFC0792] is used for connectivity
   checks, and traceroute is used for hop-by-hop fault localization as
   well as path tracing.
   This document specifies a "ping-like" mode (by
   extending the RELOAD Ping method to gather diagnostics) and a
   "traceroute-like" mode (by defining the new PathTrack method) for
   diagnosing P2PSIP overlay networks.

   One way these tools can be used is to detect the connectivity to the
   specified node or the availability of the specified resource-record
   through the extended P2PSIP Ping operation. Once the overlay network
   receives some alarms about overlay service degradation or
   interruption, a Ping is sent. If the Ping fails, one can then send a
   PathTrack to determine where the fault lies.

   The diagnostic information can only be provided to authorized nodes.
   Some diagnostic information can be provided to all the participants
   in the P2PSIP overlay, and some other diagnostic information can only
   be provided to the nodes authorized by the local or overlay policy.
   The authorization depends on the type of the diagnostic information
   and the administrative considerations, and is application specific.

   This document considers the general administrative scenario based on
   diagnostic kind type, where a whole overlay can authorize a certain
   type of diagnostic information to a small list of particular nodes
   (e.g. administrative nodes). That means, if a node gets the
   authorization to access a diagnostic kind type, it can access that
   information from all nodes in the overlay network. This document
   leaves out of scope the scenario where a particular node authorizes
   its diagnostic information to a particular list of nodes. This could
   be achieved by an extension of this document if such a requirement
   arises. The default policy or access rule for a type of
   diagnostic information is "permit" unless specified in the
   diagnostics extension document.
   As the RELOAD protocol already
   requires that each message carries the message signature of the
   sender, the receiver of the diagnostics requests can use the
   signature to identify the sender. It can then use the overlay
   configuration file with this signature to determine which types of
   diagnostic information that node is authorized for.

   In the remainder of this section we define mechanisms for collecting
   data, as well as the specific protocol extensions (message
   extensions, new methods, and error codes) required to collect this
   information. In Section 5 we discuss the format of the data
   collected, and in Section 6 we discuss detailed message processing.

4.2.  "Ping-like" Behavior: Extending Ping

   To provide "ping-like" behavior, the RELOAD Ping method is extended
   to collect diagnostic data along the path. The request message is
   forwarded by the intermediate peers along the path and then
   terminated by the responsible peer. After optional local
   diagnostics, the responsible peer returns a response message. If an
   error is found when routing, an Error response is sent to the
   initiator node by the intermediate peer.

   The message flow of a Ping message (with diagnostic extensions) is as
   follows:

   Peer A               Peer B               Peer C               Peer D
     |                    |                    |                    |
     |(1). PingReq        |                    |                    |
     |------------------->|(2). PingReq        |                    |
     |                    |------------------->|(3). PingReq        |
     |                    |                    |------------------->|
     |                    |                    |                    |
     |                    |                    |<-------------------|
     |                    |<-------------------|(4). PingAns        |
     |<-------------------|(5). PingAns        |                    |
     |(6). PingAns        |                    |                    |
     |                    |                    |                    |

                 Figure 1: Ping Diagnostic Message Flow

4.2.1.  RELOAD Request Extension: Ping

   To extend the ping request for use in diagnostics, a new extension of
   RELOAD is defined.
   The structure for a MessageExtension in RELOAD is
   defined as:

         struct {
           MessageExtensionType  type;
           Boolean               critical;
           opaque                extension_contents<0..2^32-1>;
         } MessageExtension;

   For the Ping request extension, we define a new MessageExtensionType,
   extension 0x0002 named Diagnostic_Ping, as specified in Table 4. The
   extension contents consist of a DiagnosticsRequest structure,
   defined later in this document in Section 5.1. This extension MAY be
   used for new requests of the Ping method and MUST NOT be included in
   requests using any other method.

   This extension is not critical. If a peer does not support the
   extension, it will simply ignore the diagnostic portion of the
   message, and will treat the message as if it were a normal ping.
   Senders MUST accept a response that lacks diagnostic information and
   SHOULD NOT resend the message expecting a reply. Receivers who
   receive a method other than Ping including this extension MUST ignore
   the extension.

4.3.  "Traceroute-like" Behavior: The Path_Track Method

   We define a simple PathTrack method for retrieving diagnostic
   information iteratively.

   The operation of this request is shown below in Figure 2. The
   initiator node A asks its neighbor B which is the next hop peer to
   the destination ID, and B returns a message with the next hop peer C
   information, along with optional diagnostic information for B to the
   initiator node. Then the initiator node A asks the next hop peer C
   (directly or via symmetric routing) to return next hop peer D
   information and diagnostic information of C. Unless a failure
   prevents the message from being forwarded, this step can be
   iteratively repeated until the request reaches responsible peer D for
   the destination ID, and retrieves diagnostic information of peer D.
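
   The iterative procedure just described can be illustrated with a
   minimal Python sketch. It is not part of the protocol definition:
   the PathTrackAns class here is a toy stand-in that carries only a
   responder ID and a next hop, and path_track, make_toy_send, the
   send callback, and the max_hops guard are hypothetical names
   introduced purely for illustration. The stop condition (next hop
   equal to the responding node) follows the next_hop rule given for
   the PathTrack response.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class PathTrackAns:
    """Toy stand-in for a PathTrackAns; the diagnostics payload is omitted."""
    responder: str  # hypothetical: ID of the peer that answered
    next_hop: str   # next hop toward the destination, as reported by responder


# Transport callback: (peer to ask, destination) -> answer, or None on failure.
SendFn = Callable[[str, str], Optional[PathTrackAns]]


def path_track(first_hop: str, destination: str, send: SendFn,
               max_hops: int = 32) -> Tuple[List[str], Optional[str]]:
    """Ask each hop in turn for its next hop toward `destination`.

    Returns (peers that answered, peer that failed to answer or None).
    """
    path: List[str] = []
    current = first_hop
    for _ in range(max_hops):
        ans = send(current, destination)
        if ans is None:
            # No answer: the fault is localized at `current`.
            return path, current
        path.append(ans.responder)
        if ans.next_hop == ans.responder:
            # The responder is the responsible peer; stop iterating.
            return path, None
        current = ans.next_hop
    raise RuntimeError("hop limit reached without finding the responsible peer")


def make_toy_send(routes: Dict[str, str], alive: Set[str]) -> SendFn:
    """Build a fake transport over a static next-hop table (illustration only)."""
    def send(peer: str, destination: str) -> Optional[PathTrackAns]:
        if peer not in alive:
            return None
        return PathTrackAns(responder=peer, next_hop=routes[peer])
    return send


# Toy overlay matching Figure 2: B -> C -> D, with D responsible.
routes = {"B": "C", "C": "D", "D": "D"}
print(path_track("B", "resource-x", make_toy_send(routes, {"B", "C", "D"})))
print(path_track("B", "resource-x", make_toy_send(routes, {"B", "D"})))
```

   With all peers alive the sketch reports the full path B, C, D; with
   peer C disabled it stops after B and names C as the point of
   failure, which is the fault localization PathTrack is meant to
   provide.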

   The message flow of a PathTrack message (with diagnostic extensions)
   is as follows:

   Peer-A               Peer-B               Peer-C               Peer-D
     |                    |                    |                    |
     |(1).PathTrackReq    |                    |                    |
     |------------------->|                    |                    |
     |(2).PathTrackAns    |                    |                    |
     |<-------------------|                    |                    |
     |                    |(3).PathTrackReq    |                    |
     |--------------------|------------------->|                    |
     |                    |(4).PathTrackAns    |                    |
     |<-------------------|--------------------|                    |
     |                    |                    |(5).PathTrackReq    |
     |--------------------|--------------------|------------------->|
     |                    |                    |(6).PathTrackAns    |
     |<-------------------|--------------------|--------------------|
     |                    |                    |                    |

               Figure 2: PathTrack Diagnostic Message Flow

   There have been proposals that RouteQuery and a series of Fetch
   requests can be used to replace the PathTrack mechanism, but in the
   presence of high rates of churn, such an operation would not,
   strictly speaking, provide identical results, as the path may change
   between RouteQuery and Fetch operations. While obviously the path
   could change between steps of PathTrack as well, with a single
   message rather than two messages for query and fetch, less
   inconsistency is likely, and thus the use of a single message is
   preferred.

   Given that in a typical diagnostic scenario the peer sending the
   PathTrack request desires to obtain information about the current
   path to the destination, in the event that successive calls to
   PathTrack return different paths, the results should be discarded and
   the request resent, ensuring that the second request traverses the
   appropriate path.

4.3.1.  New RELOAD Request: PathTrack

   This document defines a new RELOAD method, PathTrack, to retrieve the
   diagnostic information from the intermediate peers along the routing
   path. At each step of the PathTrack request, the responsible peer
   responds to the initiator node with requested status information.
   Status information can include a peer's congestion state, processing
   power, available bandwidth, the number of entries in its neighbor
   table, uptime, identity, network address information, and next hop
   peer information.

   A PathTrack request specifies which diagnostic information is
   requested using a DiagnosticsRequest data structure, defined and
   discussed in detail later in this document in Section 5.1. Base
   information is requested by setting the appropriate flags in the data
   structure in the request. If all flags are clear (no bits are set),
   then the PathTrack request is only used for requesting the next hop
   information. In this case the iterative mode of PathTrack is
   degraded to a RouteQuery method which is only used for checking the
   liveness of the peers along the routing path. The PathTrack request
   can be routed directly or through the overlay based on the routing
   mode chosen by the initiator node.

   A response to a successful PathTrackReq is a PathTrackAns message.
   The PathTrackAns contains general diagnostic information in the
   payload, returned using a DiagnosticsResponse data structure. This
   data structure is defined and discussed in detail later in this
   document in Section 5.2. The information returned is determined
   based on the information requested in the flags in the corresponding
   request.

4.3.1.1.  PathTrack Request

   The structure of the PathTrack request is as follows:

         struct{
           Destination destination;
           DiagnosticsRequest request;
         }PathTrackReq;

   The fields of the PathTrackReq are as follows:

   destination : The destination which the initiator node is
      interested in. This may be any valid destination object,
      including a NodeID, opaque ids, or ResourceID. Note that, for
      debugging purposes, the initiator will use the destination ID
      that was in use when the failure occurred.

   request : A DiagnosticsRequest, as discussed in Section 5.1.

4.3.1.2.  PathTrack Response

   The structure of the PathTrack response is as follows:

         struct{
           Destination next_hop;
           DiagnosticsResponse response;
         }PathTrackAns;

   The fields of the PathTrackAns are as follows:

   next_hop : The information of the next hop node from the
      responding intermediate peer to the destination. If the
      responding peer is the responsible peer for the destination ID,
      then the next_hop node ID equals the responding node ID, and after
      receiving a PathTrackAns where the next_hop node ID equals the
      responding node ID the initiator MUST stop the iterative process.

   response : A DiagnosticsResponse, as discussed in Section 5.2.

4.4.  Error Code Extensions

   This document extends the Error response method defined in the RELOAD
   specification to support error cases resulting from diagnostic
   queries. When an error is encountered in RELOAD, the Message Code
   0xFFFF is returned. The ErrorResponse structure includes an error
   code. We define new error codes to report possible error conditions
   detected while performing diagnostics:

         Code Value          Error Code Name
         [TBD1]              Underlay Destination Unreachable
         [TBD2]              Underlay Time exceeded
         [TBD3]              Message Expired
         [TBD4]              Upstream Misrouting
         [TBD5]              Loop detected
         [TBD6]              TTL hops exceeded

   The final error codes will be assigned by IANA as specified in the
   RELOAD protocol [RFC6940]. The error code is returned by the
   upstream node before the failure node. The upstream node uses the
   normal ping to detect the failure type and returns it to the
   initiator node. This helps the user (initiator node) understand
   where the failure happened and what kind of error occurred, as the
   failure may happen at the same location and for the same reason when
   sending the normal message and the diagnostics message.
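
   As a rough illustration of how an initiator might present these
   errors, the following Python sketch models the six new error names
   symbolically. The numeric code values are deliberately absent,
   since they are still to be assigned by IANA. The DiagnosticError
   class and the summarize helper are hypothetical names introduced
   for this sketch only; the "downstream" hint simply restates that
   the error is returned by the upstream node before the failure node.

```python
from enum import Enum


class DiagnosticError(Enum):
    """Symbolic names only; numeric values are TBD (assigned by IANA)."""
    UNDERLAY_DESTINATION_UNREACHABLE = "Underlay Destination Unreachable"
    UNDERLAY_TIME_EXCEEDED = "Underlay Time exceeded"
    MESSAGE_EXPIRED = "Message Expired"
    UPSTREAM_MISROUTING = "Upstream Misrouting"
    LOOP_DETECTED = "Loop detected"
    TTL_HOPS_EXCEEDED = "TTL hops exceeded"


def summarize(error: DiagnosticError, reporting_node: str,
              error_info: bytes = b"") -> str:
    """Build a one-line report for the user of the initiator node.

    The reporting node is the upstream node before the failure, so the
    fault itself is downstream of it.
    """
    text = (f"{error.value} reported by {reporting_node}; "
            "fault is downstream of this node")
    if error_info:
        # error_info is implementation specific; shown here as text if present.
        text += f" (error_info: {error_info.decode('utf-8', errors='replace')})"
    return text


print(summarize(DiagnosticError.UNDERLAY_DESTINATION_UNREACHABLE,
                "peer-B", b"host unreachable"))
```

   Keeping the names symbolic avoids hard-coding values that may
   change once the registry entries are assigned.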

   As defined in RELOAD, additional information may be stored (in an
   implementation-specific way) in the optional error_info byte string.
   While the specifics are obviously left to the implementation, as an
   example, in the case of [TBD1], the error_info could be used to
   provide additional information as to why the underlay destination is
   unreachable (net unreachable, host unreachable, fragmentation needed,
   etc.).

5.  Diagnostic Data Structures

   Both the extended Ping method and PathTrack method use the following
   common diagnostics data structures to collect data. Two common
   structures are defined: DiagnosticsRequest for requesting data, and
   DiagnosticsResponse for returning the information.

5.1.  DiagnosticsRequest Data Structure

   The DiagnosticsRequest data structure is used to request diagnostic
   information and has the following form:

         enum{ (2^16-1) } DiagnosticKindId;

         struct{
           DiagnosticKindId kind;
           opaque diagnostic_extension_contents<0..2^32-1>;
         }DiagnosticExtension;

         struct{
           uint64 expiration;
           uint64 timestamp_initiated;
           uint64 dMFlags;
           uint32 ext_length;
           DiagnosticExtension diagnostic_extensions_list<0..2^32-1>;
         }DiagnosticsRequest;

   The fields in the DiagnosticsRequest are as follows:

   expiration : The time when the request will expire represented as
      the number of milliseconds elapsed since midnight Jan 1, 1970 UTC
      not counting leap seconds. This will have the same values for
      seconds as standard UNIX time or POSIX time. More information can
      be found at UnixTime [UnixTime]. This value MUST be between 1 and
      600 seconds in the future. This value is used to prevent replay
      attacks.

   timestamp_initiated : The time when the P2PSIP diagnostics request
      was initiated represented as the number of milliseconds elapsed
      since midnight Jan 1, 1970 UTC not counting leap seconds.
      This will have the same values for seconds as standard UNIX time
      or POSIX time.

   dMFlags : A mandatory field which is an unsigned 64-bit integer
      indicating which base diagnostic information the request initiator
      node is interested in. The initiator sets different bits to
      retrieve different kinds of diagnostic information. If dMFlags is
      set to zero, then no base diagnostic information is conveyed in
      the PathTrack response. If dMFlags is set to all '1's, then all
      base diagnostic information values are requested. A request may
      set any number of the flags to request the corresponding
      diagnostic information.

      Note that this memo specifies the initial set of flags; the flags
      can be extended. The dMFlags indicate general diagnostic
      information. The mapping between the bits in the dMFlags and the
      diagnostic information kind presented is as described in
      Section 9.1.

   ext_length : the length of the extended diagnostic request
      information in bytes. If the value is greater than or equal to 1,
      then some extended diagnostic information is being requested, on
      the assumption this information will be included in the response
      if the recipient understands the extended request and is willing
      to provide it. The specific diagnostic information requested is
      defined in the diagnostic_extensions_list below. A value of zero
      indicates no extended diagnostic information is being requested.
      The value of ext_length MUST NOT be negative. Note that it is not
      the length of the entire DiagnosticsRequest data structure, but of
      the data making up the diagnostic_extensions_list.

   diagnostic_extensions_list : consists of one or more
      DiagnosticExtension structures (see below) documenting additional
      diagnostic information being requested.
588        Each DiagnosticExtension consists of the following fields:

590        kind :  A numerical code indicating the type of extension
591           diagnostic information (see Section 9.2).  Note that kinds
592           0xF000 - 0xFFFE are reserved for overlay-specific diagnostics
593           and may be used without IANA registration for local diagnostic
594           information.  Kinds from 0x0000 to 0x003F MUST NOT be indicated
595           in the diagnostic_extensions_list in the message request, as
596           they may be represented using the dMFlags in a much simpler
597           (and more space-efficient) way.

599        diagnostic_extension_contents :  The opaque data containing the
600           request for this particular extension.  This data is extension
601           dependent.

603  5.2.  DiagnosticsResponse Data Structure
604       enum { (2^16-1) } DiagnosticKindId;
605       struct{
606         DiagnosticKindId kind;
607         opaque diagnostic_info_contents<0..2^16-1>;
608       }DiagnosticInfo;

610       struct{
611         uint64 expiration;
612         uint64 timestamp_initiated;
613         uint64 timestamp_received;
614         uint8 hop_counter;
615         uint32 ext_length;
616         DiagnosticInfo diagnostic_info_list<0..2^32-1>;
617       }DiagnosticsResponse;

619     The fields in the DiagnosticsResponse are as follows:

621     expiration :  The time when the response will expire, represented as
622        the number of milliseconds elapsed since midnight Jan 1, 1970 UTC,
623        not counting leap seconds.  This will have the same values for
624        seconds as standard UNIX time or POSIX time.  This value MUST be
625        between 1 and 600 seconds in the future.

627     timestamp_initiated :  This value is copied from the diagnostics
628        request message.  The benefit of containing such a value in the
629        response message is that the initiator node does not have to
630        maintain the state.

632     timestamp_received :  The time when the P2PSIP overlay diagnostic
633        request was received, represented as the number of milliseconds
634        elapsed since midnight Jan 1, 1970 UTC, not counting leap seconds.
635        This will have the same values for seconds as standard UNIX time
636        or POSIX time.

638     hop_counter :  This field only appears in diagnostic responses.  It
639        MUST be exactly copied from the TTL field of the forwarding header
640        in the received request.  This information is sent back to the
641        request initiator, allowing it to compute the number of hops that
642        the message traversed in the overlay.

644     ext_length :  The length of the returned DiagnosticInfo information
645        in bytes.  If the value is greater than or equal to 1, then some
646        extended diagnostic information (as specified in the
647        DiagnosticsRequest) was available and is being returned.  In that
648        case, this value indicates the length of the returned information.
649        A value of zero indicates no extended diagnostic information is
650        included, either because none was requested or the request could
651        not be accommodated.  The value of ext_length MUST NOT be
652        negative.  Note that it is not the length of the entire
653        DiagnosticsResponse data structure, but of the data making up the
654        diagnostic_info_list.

656     diagnostic_info_list :  Consists of zero or more DiagnosticInfo
657        structures containing the requested diagnostic_info_contents.  The
658        fields in the DiagnosticInfo structure are as follows:

660        kind :  A numeric code indicating the type of information being
661           returned.  For base data requested using the dMFlags, this code
662           corresponds to the dMFlag set, and is described in Section 5.1.
663           For diagnostic extensions, this code will be identical to the
664           value of the DiagnosticKindId set in the "kind" field of the
665           DiagnosticExtension of the request.  See Section 9.2.

667        diagnostic_info_contents :  Data containing the value for the
668           diagnostic information being reported.  Various kinds of
669           diagnostic information can be retrieved.  Please refer to
670           Section 5.3 for details of the diagnostic kind ID for the base
671           diagnostic information that may be reported.

673  5.3.  dMFlags and Diagnostic Kind ID Types

675     The dMFlags field described above is a 64-bit field that allows
676     initiator nodes to identify up to 62 items of base information to
677     request in a request message (the first and last flags being
678     reserved).  When the requested base information is returned in the
679     response, the value of the diagnostic kind ID will correspond to the
680     numeric field marked in the dMFlags in the request.  The values for
681     the dMFlags are defined in Section 9.1 and the diagnostic kind IDs
682     are defined in Section 9.2.  The information contained for each value
683     is described in this section.  Access to each kind of diagnostic
684     information MUST NOT be allowed unless compliant with the rules
685     defined in Section 7.

687     STATUS_INFO (8 bits):  A single value element containing an unsigned
688        byte representing whether or not the node is in congestion status.
689        An example usage of STATUS_INFO is for congestion-aware routing.
690        In this scenario, each peer has to update its congestion status
691        periodically.  An intermediate peer in the distributed hash table
692        (DHT) network will choose its next hop according to both the DHT
693        routing algorithm and the status information.  This is done to
694        avoid increasing load on congested peers.  The rightmost 4 bits
695        are used and other bits MUST be cleared to "0"s for future use.
696        There are 16 levels of congestion status, with "0x00" representing
697        zero load and "0x0F" representing congested.  This document does
698        not provide a specific method for determining congestion, leaving
699        this decision to each overlay implementation.  One possible option
700        would be to take the node's CPU/memory/bandwidth usage
701        percentage in the past 600 seconds and normalize the highest value
702        to the range [0x00, 0x0F].  An overlay implementation can also
703        decide not to use all 16 values from 0x00 to 0x0F.
704        A future draft may define an objective measure or specific
705        algorithm for this.

707     ROUTING_TABLE_SIZE (32 bits):  A single value element containing an
708        unsigned 32-bit integer representing the number of peers in the
709        peer's routing table.  The administrator of the overlay may be
710        interested in statistics of this value for reasons such as routing
711        efficiency.

713     PROCESS_POWER (64 bits):  A single value element containing an
714        unsigned 64-bit integer specifying the processing power of the
715        node in units of MIPS.  Fractional values are rounded up.

717     UPSTREAM_BANDWIDTH (64 bits):  A single value element containing an
718        unsigned 64-bit integer specifying the upstream network bandwidth
719        (provisioned or maximum, not available) of the node in units of
720        Kbps.  Fractional values are rounded up.  For multihomed hosts,
721        this should be the link used to send the response.

723     DOWNSTREAM_BANDWIDTH (64 bits):  A single value element containing
724        an unsigned 64-bit integer specifying the downstream network
725        bandwidth (provisioned or maximum, not available) of the node in
726        units of Kbps.  Fractional values are rounded up.  For multihomed
727        hosts, this should be the link the request was received from.

729     SOFTWARE_VERSION:  A single value element containing a US-ASCII
730        string that identifies the manufacturer, model, operating system
731        information and the version of the software.  Given that there is
732        a very large number of peers in some networks, and no peer is
733        likely to know all other peers' software, this information may be
734        very useful to help determine if the cause of certain groups of
735        misbehaving peers is related to specific software versions.  While
736        the format is peer-defined, a suggested format is as follows:
737        "ApplicationProductToken (Platform; OS-or-CPU) VendorProductToken
738        (VendorComment)".  For example: "MyReloadApp/1.0 (Unix; Linux
739        x86_64) libreload-java/0.7.0 (Stonyfish Inc.)".
740        The string is a C-style string, and MUST be delimited by "\0".
741        "\0" MUST NOT be included in the string itself to prevent
742        confusion with the delimiter.

744     MACHINE_UPTIME (64 bits):  A single value element containing an
745        unsigned 64-bit integer specifying the time the node's underlying
746        system has been up, in seconds.

748     APP_UPTIME (64 bits):  A single value element containing an
749        unsigned 64-bit integer specifying the time the P2P application
750        has been up, in seconds.

752     MEMORY_FOOTPRINT (64 bits):  A single value element containing an
753        unsigned 64-bit integer representing the memory footprint of the
754        peer program in kilobytes (1024 bytes).  Fractional values are
755        rounded up.

757     DATASIZE_STORED (64 bits):  An unsigned 64-bit integer representing
758        the number of bytes of data being stored by this node.

760     INSTANCES_STORED:  An array element containing the number of
761        instances of each kind stored.  The array is indexed by Kind-ID.
762        Each entry is an unsigned 64-bit integer.

764     MESSAGES_SENT_RCVD:  An array element containing the number of
765        messages sent and received.  The array is indexed by method code.
766        Each entry in the array is a pair of unsigned 64-bit integers
767        (packed end to end) representing sent and received.

769     EWMA_BYTES_SENT (32 bits):  A single value element containing an
770        unsigned 32-bit integer representing an exponentially weighted
771        moving average of bytes sent per second by this peer.  sent =
772        alpha x sent_present + (1 - alpha) x sent, where sent_present
773        represents the bytes sent per second since the last calculation
774        and sent (on the right-hand side) represents the last calculation
775        of bytes sent per second.  A suitable value for alpha is 0.8.
776        This value is calculated every five seconds.

778     EWMA_BYTES_RCVD (32 bits):  A single value element containing an
779        unsigned 32-bit integer representing an exponentially weighted
780        moving average of bytes received per second by this peer.
781        rcvd = alpha x rcvd_present + (1 - alpha) x rcvd, where
782        rcvd_present represents the bytes received per second since the
783        last calculation and rcvd (on the right-hand side) represents the
784        last calculation of bytes received per second.  A suitable value
785        for alpha is 0.8.  This value is calculated every five seconds.

787     UNDERLAY_HOP (8 bits):  Indicates the IP layer hops from the
788        intermediate peer which receives the diagnostics message to the
789        next hop peer for this message.  (Note: RELOAD does not require
790        the intermediate peers to look into the message body.  So here we
791        use PathTrack to gather underlay hops for diagnostic purposes.)

793     BATTERY_STATUS (8 bits):  The leftmost bit is used to indicate
794        whether this peer is using a battery or not.  If this bit is clear
795        (set to '0'), then the peer is using a battery for power.  The
796        other 7 bits are to be determined by specific applications.

798  6.  Message Processing

800  6.1.  Message Creation and Transmission

802     When constructing either a Ping message with diagnostic extensions or
803     a PathTrack message, the sender first creates and populates a
804     DiagnosticsRequest data structure.  The timestamp_initiated field is
805     set to the current time, and the expiration field is constructed
806     based on this time.  The sender includes the dMFlags field in the
807     structure, setting any number (including all) of the flags to request
808     particular diagnostic information.  The sender MAY leave all the bits
809     unset, requesting no particular diagnostic information.

811     The sender MAY also include diagnostic extensions in the
812     DiagnosticsRequest data structure to request additional information.
813     If the sender includes any extensions, it MUST calculate the length
814     of these extensions and set the ext_length field to this value.  If
815     no extensions are included, the sender MUST set ext_length to zero.
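
As an illustration of the construction steps above, the following Python sketch builds the body of a DiagnosticsRequest. It is not part of the specification. The helper names (encode_extension, build_diagnostics_request), the 30-second default lifetime, and the assumption that each variable-length vector is preceded by a 4-byte big-endian length (in the manner of the RELOAD presentation language) are choices made for this example only. The flag values are taken from Section 9.1.

```python
import struct
import time

# Flag values from the registry in Section 9.1 (subset).
STATUS_INFO = 0x0001
ROUTING_TABLE_SIZE = 0x0002
EWMA_BYTES_SENT = 0x1000


def encode_extension(kind, contents):
    """Encode one DiagnosticExtension: 16-bit kind, then length-prefixed data."""
    if 0x0000 <= kind <= 0x003F:
        # Section 5.1: base kinds are requested through dMFlags instead.
        raise ValueError("kinds 0x0000-0x003F MUST NOT appear as extensions")
    # Assumption: the opaque vector carries a 4-byte length prefix.
    return struct.pack("!HI", kind, len(contents)) + contents


def build_diagnostics_request(dm_flags, extensions=(), lifetime_ms=30_000,
                              now_ms=None):
    """Return an encoded DiagnosticsRequest (hypothetical helper)."""
    if not 1_000 <= lifetime_ms <= 600_000:
        raise ValueError("expiration MUST be 1 to 600 seconds in the future")
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # milliseconds since the Unix epoch
    ext_list = b"".join(encode_extension(k, c) for k, c in extensions)
    ext_length = len(ext_list)  # zero when no extensions are included
    header = struct.pack("!QQQI", now_ms + lifetime_ms, now_ms,
                         dm_flags, ext_length)
    # Assumption: the extension list itself is also length-prefixed.
    return header + struct.pack("!I", ext_length) + ext_list
```

A caller might request STATUS_INFO and EWMA_BYTES_SENT together with one locally defined extension by calling build_diagnostics_request(STATUS_INFO | EWMA_BYTES_SENT, [(0xF000, b"\x01")]). Passing now_ms explicitly keeps the output deterministic, which is convenient for testing.
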
817     The format of the DiagnosticsRequest data structure and its fields
818     MUST follow the restrictions defined in Section 5.1.

820     When constructing a Ping message with diagnostic extensions, the
821     sender MUST create a MessageExtension structure as defined in RELOAD
822     [RFC6940], setting the value of type to 0x0002, and the value of
823     critical to FALSE.  The value of extension_contents MUST be a
824     DiagnosticsRequest structure as defined above.  The message MAY be
825     directed to a particular NodeId or ResourceID, but MUST NOT be sent
826     to the broadcast NodeID.

828     When constructing a PathTrack message, the sender MUST set the
829     message_code for the RELOAD MessageContents structure to
830     path_track_req ([TBD7]).  The request field of the PathTrackReq MUST
831     be set to the DiagnosticsRequest data structure defined above.  The
832     destination field MUST be set to the desired destination, which MAY
833     be either a NodeId or ResourceID but SHOULD NOT be the broadcast
834     NodeID.

836  6.2.  Message Processing: Intermediate Peers

838     When a request arrives at a peer, if the peer's responsible ID space
839     does not cover the destination ID of the request, then the peer MUST
840     continue processing this request according to the overlay-specified
841     routing mode of the RELOAD protocol.

843     In a P2PSIP overlay, error responses to a message can be generated by
844     either an intermediate peer or the responsible peer.  When a request
845     is received at a peer, the peer may find connectivity failures or
846     malfunctioning peers through the pre-defined rules of the overlay
847     network, e.g., by analyzing the via list or underlay error messages.
848     In this case, the intermediate peer SHOULD return an error response
849     to the initiator node, reporting any available information about the
850     malfunctioning node in the error message payload.  All error
851     responses generated MUST contain the appropriate error code.
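
The checks that an intermediate peer applies before forwarding a diagnostic request, as described in this section, can be sketched as follows. This is an illustrative Python sketch only. The function name check_before_forwarding, the use of strings for the error names (their numeric code points are not yet assigned), the order of the checks, and the rule that a loop exists when the peer's own NodeID already appears in the via list are assumptions made for this example.

```python
from typing import Optional, Sequence


def check_before_forwarding(expiration_ms: int, ttl: int,
                            via_list: Sequence[bytes], own_node_id: bytes,
                            now_ms: int) -> Optional[str]:
    """Return the name of the error to report, or None if forwarding is fine."""
    if now_ms >= expiration_ms:
        # The expiration value is in milliseconds since the Unix epoch.
        return "Error_Message_Expired"
    if ttl <= 0:
        return "Error_TTL_Hops_Exceeded"
    if own_node_id in via_list:
        # Assumed loop test: this peer has already forwarded the message.
        return "Error_Loop_Detected"
    return None
```

Underlay errors (ICMP "Destination Unreachable" and "Time Exceeded") are not covered by this sketch, because they are only known after the request has been forwarded.
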
853     Each intermediate peer receiving a Ping message with extensions (and
854     which understands the extension) or receiving a PathTrack request/
855     response SHOULD check the expiration value (Unix time format) to
856     determine if the message is expired.  If the message is expired, the
857     intermediate peer SHOULD generate a response with Error Code [TBD3]
858     "Message Expired", return the response to the initiator node, and
859     discard the message.

861     The intermediate peer SHOULD return an error response with the Error
862     Code [TBD1] "Underlay Destination Unreachable" when it receives an
863     ICMP message with "Destination Unreachable" information after
864     forwarding the received request to the destination peer.

866     The intermediate peer SHOULD return an error response with the Error
867     Code [TBD2] "Underlay Time Exceeded" when it receives an ICMP message
868     with "Time Exceeded" information after forwarding the received
869     request.

871     The peer SHOULD return an Error response with Error Code [TBD4]
872     "Upstream Misrouting" when it finds its upstream peer disobeys the
873     routing rules defined in the overlay.  The immediate upstream peer
874     information SHOULD also be conveyed to the initiator node.

876     The peer SHOULD return an Error response with Error Code [TBD5] "Loop
877     detected" when it finds a loop through analysis of the via list.

879     The peer SHOULD return an Error response with Error Code [TBD6] "TTL
880     hops exceeded" when it finds that the TTL field value is not greater
881     than 0 when forwarding.

883  6.3.  Message Response Creation

885     When a diagnostic request message arrives at a peer that is
886     responsible for the destination ID specified in the forwarding
887     header, and assuming it understands the extension (in the case of
888     Ping) or the new request type PathTrack, it MUST follow the
889     specifications defined in RELOAD to form the response header, and
890     perform the following operations:

892     When constructing a PathTrack response, the sender MUST set the
893     message_code for the RELOAD MessageContents structure to
894     path_track_ans ([TBD8]).

896     The receiver MUST check the expiration value (Unix time format) in
897     the DiagnosticsRequest to determine if the message is expired.  If
898     the message is expired, the peer MUST generate a response with the
899     Error Code [TBD3] "Message Expired", return the response to the
900     initiator node, and discard the message.

902     If the message is not expired, the receiver MUST construct a
903     DiagnosticsResponse structure, as follows: The TTL value from the
904     forwarding header is copied to the hop_counter field of the
905     DiagnosticsResponse structure.  Note that the default value for TTL
906     at the beginning represents 100 hops unless overlay configuration has
907     overridden the value.  The receiver generates a Unix time format
908     timestamp for the current time of day and places it in the
909     timestamp_received field, and constructs a new expiration time and
910     places it in the expiration field of the DiagnosticsResponse.

912     The destination peer MUST check if the initiator node has the
913     authority to request specific types of diagnostic information, and if
914     appropriate, append the diagnostic information requested in the
915     dMFlags and diagnostic_extensions (if any) using the
916     diagnostic_info_list field to the DiagnosticsResponse structure.  If
917     any information is returned, the receiver MUST calculate the length
918     of the response and set ext_length appropriately.
919     If no diagnostic information is returned, ext_length MUST be set to
920     zero.

921     The format of the DiagnosticsResponse data structure and its fields
922     MUST follow the restrictions defined in Section 5.2.

924     In the event of an error, an error response containing the error code
925     followed by the description (if one exists) MUST be created and sent
926     to the sender.  If the initiator node asks for diagnostic information
927     that it is not authorized to query, the receiving peer MUST return
928     an Error response with the Error Code 2 "Error_Forbidden".

930  6.4.  Interpreting Results

932     The initiator node, as well as the responding peer, MAY compute the
933     overlay One-Way-Delay time through the value in timestamp_received
934     and the timestamp_initiated field.  However, for a single hop
935     measurement, the traditional measurement methods (IP layer ping) MUST
936     be used instead of the overlay layer diagnostics methods.

938     The P2P overlay network using the diagnostics methods specified in
939     this document MUST enforce time synchronization with a central time
940     server.  Network Time Protocol [RFC5905] can usually maintain time to
941     within tens of milliseconds over the public Internet, and can achieve
942     better than one millisecond accuracy in local area networks under
943     ideal conditions.  However, this document does not specify the choice
944     for time synchronization, leaving it to the implementation.

946     The initiator node receiving the Ping response MAY check the
947     hop_counter field and compute the overlay hops to the destination
948     peer for the statistics of connectivity quality from the perspective
949     of overlay hops.

951  7.  Authorization through Overlay Configuration

953     Different levels of access control can be applied to different users/
954     nodes.  For example, diagnostic information A can be accessed by node
955     1 and 2, but diagnostic information B can only be accessed by node 2.
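
The example above can be expressed as a small lookup table. The following Python sketch is illustrative only: the ACCESS table, the is_authorized and check_request helpers, the shortened NodeID strings, and the choice of STATUS_INFO (0x0001) and SOFTWARE_VERSION (0x0006) to stand for diagnostic information A and B are all assumptions made for this example. A real peer would populate the table from the overlay configuration file.

```python
# Kind codes from Table 1 (Section 9.2), standing in for "A" and "B".
KIND_A = 0x0001  # STATUS_INFO
KIND_B = 0x0006  # SOFTWARE_VERSION

# Hypothetical NodeIDs, shortened for readability.
NODE_1 = "node-1"
NODE_2 = "node-2"

# Diagnostic kind -> set of NodeIDs allowed to query it.
ACCESS = {
    KIND_A: {NODE_1, NODE_2},
    KIND_B: {NODE_2},
}


def is_authorized(node_id, kind):
    """True if node_id may query the given diagnostic kind."""
    return node_id in ACCESS.get(kind, set())


def check_request(node_id, kinds):
    """Return "Error_Forbidden" if any requested kind is not permitted."""
    if all(is_authorized(node_id, kind) for kind in kinds):
        return None
    return "Error_Forbidden"
```

With this table, check_request(NODE_1, [KIND_A, KIND_B]) reports Error_Forbidden because node 1 may not query B, while the same request from node 2 is accepted. Kinds that are absent from the table are treated as forbidden for every node, which is a conservative default chosen for the sketch.
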
957     The overlay configuration file MUST contain the following XML
958     elements for authorizing a node to access the relevant diagnostic
959     kinds.

961     diagnostic-kind:  This element has the attribute "kind", a
962        hexadecimal number indicating the diagnostic Kind Type (with the
963        same values as those in Section 9.2), and at least one
964        sub-element "access-node".

966     access-node:  This element contains one hexadecimal number indicating
967        a NodeID, and the node with this NodeID is allowed to access the
968        diagnostic "kind" under the same diagnostic-kind element.

970  8.  Security Considerations

972     The authorization for diagnostic information must be designed with
973     care to prevent it becoming a method to retrieve information for bot
974     attacks.  It should also be noted that attackers can use diagnostics
975     to analyze overlay information to attack certain key peers.  For
976     example, diagnostic information might be used to fingerprint a peer,
977     causing the peer to lose its anonymity.  Anonymity
978     might be very important for some P2P overlay networks, and
979     defenses against such fingerprinting are probably very hard.  As
980     such, networks where anonymity is of very high importance may find
981     implementation of diagnostics problematic or even undesirable,
982     despite the many advantages it offers.  As this document is a RELOAD
983     extension, it follows the RELOAD message header and routing
984     specifications; the common security considerations described in the
985     base document [RFC6940] are also applicable to this document.
986     Overlays may define their own requirements on who can collect/share
987     diagnostic information.

989  9.  IANA Considerations

991  9.1.  Diagnostics Flag

993     IANA SHALL create a "RELOAD Diagnostics Flag" Registry.  Entries in
994     this registry are 1-bit flags contained in a 64-bit integer
995     dMFlags denoting diagnostic information to be retrieved as described
996     in Section 4.3.1.
997     New entries SHALL be defined via [RFC5226] Standards Action.  The
998     initial contents of this registry are:

999   +-------------------------+------------------------------+------------+
1000  | diagnostic information  | diagnostic flag in dMFlags   | RFC        |
1001  +-------------------------+------------------------------+------------+
1002  | Reserved                | 0x 0000 0000 0000 0000       | RFC-[TBDX] |
1003  | STATUS_INFO             | 0x 0000 0000 0000 0001       | RFC-[TBDX] |
1004  | ROUTING_TABLE_SIZE      | 0x 0000 0000 0000 0002       | RFC-[TBDX] |
1005  | PROCESS_POWER           | 0x 0000 0000 0000 0004       | RFC-[TBDX] |
1006  | UPSTREAM_BANDWIDTH      | 0x 0000 0000 0000 0008       | RFC-[TBDX] |
1007  | DOWNSTREAM_BANDWIDTH    | 0x 0000 0000 0000 0010       | RFC-[TBDX] |
1008  | SOFTWARE_VERSION        | 0x 0000 0000 0000 0020       | RFC-[TBDX] |
1009  | MACHINE_UPTIME          | 0x 0000 0000 0000 0040       | RFC-[TBDX] |
1010  | APP_UPTIME              | 0x 0000 0000 0000 0080       | RFC-[TBDX] |
1011  | MEMORY_FOOTPRINT        | 0x 0000 0000 0000 0100       | RFC-[TBDX] |
1012  | DATASIZE_STORED         | 0x 0000 0000 0000 0200       | RFC-[TBDX] |
1013  | INSTANCES_STORED        | 0x 0000 0000 0000 0400       | RFC-[TBDX] |
1014  | MESSAGES_SENT_RCVD      | 0x 0000 0000 0000 0800       | RFC-[TBDX] |
1015  | EWMA_BYTES_SENT         | 0x 0000 0000 0000 1000       | RFC-[TBDX] |
1016  | EWMA_BYTES_RCVD         | 0x 0000 0000 0000 2000       | RFC-[TBDX] |
1017  | UNDERLAY_HOP            | 0x 0000 0000 0000 4000       | RFC-[TBDX] |
1018  | BATTERY_STATUS          | 0x 0000 0000 0000 8000       | RFC-[TBDX] |
1019  | Reserved                | 0x FFFF FFFF FFFF FFFF       | RFC-[TBDX] |
1020  +-------------------------+------------------------------+------------+

1022    [To RFC editor: Please replace all RFC-[TBDX] in this document with
1023    the RFC number of this document.]

1025  9.2.  Diagnostic Kind ID Types

1027    IANA SHALL create a "RELOAD Diagnostic Kind ID Types" Registry.
1028    Entries in this registry are 16-bit integers denoting diagnostics
1029    extension data kind types carried in the diagnostic request and
1030    response message, as described in Section 5.2.
1031    Code points from 0x0000 to 0x003F SHALL be assigned together with
1032    flags within the "RELOAD Diagnostics Flag" registry via RFC 5226
1033    [RFC5226] standards action.  Code points in the range 0x0040 to
1034    0xEFFF SHALL be registered via RFC 5226 standards action.

1036  +---------------------------+---------------+---------------+
1037  | Diagnostic Kind Type      | Code          | Specification |
1038  +---------------------------+---------------+---------------+
1039  | reserved                  | 0x0000        | RFC-[TBDX]    |
1040  | STATUS_INFO               | 0x0001        | RFC-[TBDX]    |
1041  | ROUTING_TABLE_SIZE        | 0x0002        | RFC-[TBDX]    |
1042  | PROCESS_POWER             | 0x0003        | RFC-[TBDX]    |
1043  | UPSTREAM_BANDWIDTH        | 0x0004        | RFC-[TBDX]    |
1044  | DOWNSTREAM_BANDWIDTH      | 0x0005        | RFC-[TBDX]    |
1045  | SOFTWARE_VERSION          | 0x0006        | RFC-[TBDX]    |
1046  | MACHINE_UPTIME            | 0x0007        | RFC-[TBDX]    |
1047  | APP_UPTIME                | 0x0008        | RFC-[TBDX]    |
1048  | MEMORY_FOOTPRINT          | 0x0009        | RFC-[TBDX]    |
1049  | DATASIZE_STORED           | 0x000A        | RFC-[TBDX]    |
1050  | INSTANCES_STORED          | 0x000B        | RFC-[TBDX]    |
1051  | MESSAGES_SENT_RCVD        | 0x000C        | RFC-[TBDX]    |
1052  | EWMA_BYTES_SENT           | 0x000D        | RFC-[TBDX]    |
1053  | EWMA_BYTES_RCVD           | 0x000E        | RFC-[TBDX]    |
1054  | UNDERLAY_HOP              | 0x000F        | RFC-[TBDX]    |
1055  | BATTERY_STATUS            | 0x0010        | RFC-[TBDX]    |
1056  | reserved for future flags | 0x0011-40     | RFC-[TBDX]    |
1057  | local use (reserved)      | 0xF000-0xFFFE | RFC-[TBDX]    |
1058  | reserved                  | 0xFFFF        | RFC-[TBDX]    |
1059  +---------------------------+---------------+---------------+

1061                     Table 1: Diagnostic Kind Types

1063  9.3.  Message Codes

1065    This document introduces two new types of messages and their
1066    responses, requiring the following additions to the "RELOAD Message
1067    Code" Registry defined in RELOAD [RFC6940].
1068    These additions are:

1069  +-------------------+------------+----------+
1070  | Message Code Name | Code Value | RFC      |
1071  +-------------------+------------+----------+
1072  | path_track_req    | [TBD7]     | RFC-AAAA |
1073  | path_track_ans    | [TBD8]     | RFC-AAAA |
1074  +-------------------+------------+----------+

1076               Table 2: Extensions to RELOAD Message Codes

1078    [To RFC editor: Values starting at [TBD1] were used to prevent
1079    collisions with RELOAD base values and other extensions.  Please
1080    replace with the next highest available values.  The final message
1081    codes will be assigned by IANA.  All RFC-AAAA should be replaced
1082    with the RFC number of RELOAD upon publication.]

1084  9.4.  Error Code

1086    This document introduces the following new error codes, extending the
1087    "RELOAD Message Code" registry as described below:

1089  +----------------------------------------+------------+----------+
1090  | Message Code Name                      | Code Value | RFC      |
1091  +----------------------------------------+------------+----------+
1092  | Error_Underlay_Destination_Unreachable | [TBD1]     | RFC-AAAA |
1093  | Error_Underlay_Time_Exceeded           | [TBD2]     | RFC-AAAA |
1094  | Error_Message_Expired                  | [TBD3]     | RFC-AAAA |
1095  | Error_Upstream_Misrouting              | [TBD4]     | RFC-AAAA |
1096  | Error_Loop_Detected                    | [TBD5]     | RFC-AAAA |
1097  | Error_TTL_Hops_Exceeded                | [TBD6]     | RFC-AAAA |
1098  +----------------------------------------+------------+----------+

1100                Table 3: Extensions to RELOAD Error Codes

1102    [To RFC editor: Values starting at [TBD1] were used to prevent
1103    collisions with RELOAD base values and other extensions.  Please
1104    replace with the next highest available values.  The final message
1105    codes will be assigned by IANA.  All RFC-AAAA should be replaced
1106    with the RFC number of RELOAD upon publication.]

1108  9.5.  Message Extension

1110    This document introduces the following new RELOAD extension code:

1112  +-----------------+------------+----------+
1113  | Extension Name  | Code Value | RFC      |
1114  +-----------------+------------+----------+
1115  | Diagnostic_Ping | 0x0002     | RFC-AAAA |
1116  +-----------------+------------+----------+

1118                   Table 4: New RELOAD Extension Code

1120    [To RFC editor: The value 0x0002 was used to prevent collisions with
1121    other extensions.  Please replace with the next highest available
1122    value.  The final codes will be assigned by IANA.  All RFC-AAAA
1123    should be replaced with the RFC number of RELOAD upon publication.]

1125  9.6.  XML Name Space Registration

1127    This document registers a URI for the config-diagnostics XML
1128    namespace in the IETF XML registry defined in [RFC3688].  All the
1129    elements defined in this document belong to this namespace.

1131    URI:  urn:ietf:params:xml:ns:p2p:config-diagnostics
1132    Registrant Contact:  The IESG.
1133    XML:  N/A, the requested URIs are XML namespaces

1135    The overlay configuration file MUST contain the following XML
1136    declaring P2PSIP diagnostics as a mandatory extension to
1137    RELOAD.

1139
1140      urn:ietf:params:xml:ns:p2p:config-diagnostics
1141

1143  10.  Acknowledgments

1145    We would like to thank Zheng Hewen for the contribution of the
1146    initial version of this document.  We would also like to thank Bruce
1147    Lowekamp, Salman Baset, Henning Schulzrinne, Jiang Haifeng and Marc
1148    Petit-Huguenin for the email discussion and their valued comments,
1149    and special thanks to Henry Sinnreich for contributing to the usage
1150    scenarios text.  We would like to thank the authors of the RELOAD
1151    protocol for transferring text about diagnostics to this document.

1153  11.  References

1155  11.1.  Normative References

1157    [RFC0792]  Postel, J., "Internet Control Message Protocol", STD 5,
1158               RFC 792, DOI 10.17487/RFC0792, September 1981.
1161    [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1162               Requirement Levels", BCP 14, RFC 2119,
1163               DOI 10.17487/RFC2119, March 1997.

1166    [RFC3688]  Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
1167               DOI 10.17487/RFC3688, January 2004.

1170    [RFC5905]  Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch,
1171               "Network Time Protocol Version 4: Protocol and Algorithms
1172               Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010.

1175    [RFC6940]  Jennings, C., Lowekamp, B., Ed., Rescorla, E., Baset, S.,
1176               and H. Schulzrinne, "REsource LOcation And Discovery
1177               (RELOAD) Base Protocol", RFC 6940, DOI 10.17487/RFC6940,
1178               January 2014.

1180  11.2.  Informative References

1182    [UnixTime]
1183               "UnixTime".

1186    [I-D.ietf-p2psip-concepts]
1187               Bryan, D., Matthews, P., Shim, E., Willis, D., and S.
1188               Dawkins, "Concepts and Terminology for Peer to Peer SIP",
1189               draft-ietf-p2psip-concepts-07 (work in progress), May
1190               2015.

1192    [Overlay-Failure-Detection]
1193               Zhuang, S., "On failure detection algorithms in overlay
1194               networks", Proc. IEEE INFOCOM, Mar 2005.

1196    [Handling_Churn_in_a_DHT]
1197               Rhea, S., "Handling Churn in a DHT", USENIX Annual
1198               Conference, June 2004.

1200    [Diagnostic_Framework]
1201               Jin, X., "A Diagnostic Framework for Peer-to-Peer
1202               Streaming", 2005.

1204    [Diagnostics_and_NAT_traversal_in_P2PP]
1205               Gupta, G., "Diagnostics and NAT Traversal in P2PP - Design
1206               and Implementation", Columbia University Report, June
1207               2008.

1209    [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
1210               IANA Considerations Section in RFCs", BCP 26, RFC 5226,
1211               DOI 10.17487/RFC5226, May 2008.

1214  Appendix A.  Examples

1216    Below, we sketch how these metrics can be used.

1218  A.1.  Example 1

1220    A peer may set EWMA_BYTES_SENT and EWMA_BYTES_RCVD flags in the
1221    PathTrackReq to its direct neighbors.
1222    A peer can use EWMA_BYTES_SENT and EWMA_BYTES_RCVD of another peer
1223    to infer whether it is acting as a media relay.  It may then choose
1224    not to forward any requests for media relay to this peer.
1225    Similarly, among the various candidates for filling up the routing
1226    table, a peer may prefer a peer with a large UPTIME value, small
1227    RTT, and small LAST_CONTACT value.

1228  A.2.  Example 2

1230    A peer may set the STATUS_INFO flag in the PathTrackReq to a remote
1231    destination peer.  The overlay has its own threshold definition for
1232    congestion.  The peer can obtain knowledge of all the status
1233    information of the intermediate peers along the path.  Then it can
1234    choose other paths to that node for the subsequent requests.

1236  A.3.  Example 3

1238    A peer may use Ping to evaluate the average overlay hops to other
1239    peers by sending PingReq to a set of random resource or node IDs in
1240    the overlay.  A peer may adjust its timeout value according to the
1241    change of average overlay hops.

1243  Appendix B.  Problems with Generating Multiple Responses on Path

1245    An earlier version of this document considered an approach where a
1246    response was generated by each intermediate peer as the message
1247    traversed the overlay.  This approach was discarded.  One reason this
1248    approach was discarded was that it could provide a DoS mechanism,
1249    whereby an attacker could send an arbitrary message claiming to be
1250    from a spoofed "sender" the real sender wished to attack.  As a
1251    result of sending this one message, many messages would be generated
1252    and sent back to the spoofed "sender" - one from each intermediate
1253    peer on the message path.  While authentication mechanisms could
1254    reduce some risk of this attack, it still resulted in a fundamental
1255    break from the request-response nature of the RELOAD protocol, as
1256    multiple responses are generated to a single request.
   Although one request with responses from all the peers in the route
   would be more efficient, it was determined to be too great a
   security risk and too great a deviation from the RELOAD
   architecture.

Appendix C.  Changes to the Draft

   To RFC editor: This section is to track the changes.  Please remove
   this section before publication.

C.1.  Changes since -00 version

   1.  Changed title from "Diagnose P2PSIP Overlay Network" to "P2PSIP
       Overlay Diagnostics".

   2.  Changed the table of contents.  Added a section about message
       processing and a section of examples.

   3.  Merged diagnostics text from the p2psip base draft -01.

   4.  Removed ECHO method for security reasons.

C.2.  Changes since -01 version

   Added BATTERY_STATUS as diagnostic information.

   Removed UnderlayTTL test from the Ping method, instead adding an
   UNDERLAY_HOP diagnostic information for PathTrack method.

   Gave some examples for diagnostic information, and added some
   editor's notes for further work.

C.3.  Changes since -02 version

   Provided further explanation as to why the base draft Ping in the
   current form cannot be used to replace Ping, and why some combination
   of methods cannot replace PathTrack.

C.4.  Changes since -03 version

   Modified structure used to share information collected.  Both
   mechanisms now use a common data structure to convey information.

C.5.  Changes since -04 version

   Updated the authors' addresses and modified the last sentence in
   Section 4.3.1.2.

C.6.  Changes since -05 version

   Resolved Marc's comments from the mailing list and defined the
   details of STATUS_INFO.

C.7.  Changes in version -10

   Resolved the authorization issue and other comments (e.g., defining
   diagnostics as a mandatory extension) from WGLC, and checked the
   language.

C.8.  Changes in version -15

   Changed several diagnostic kind return values to be 64 bit vs.
   32 bit to provide headroom.  Split bandwidth into upstream and
   downstream.  Renamed length in diagnostic request object to
   ext_length, added ext_length to response object, and clarified that
   ext_length is length of diagnostic info/extensions being returned,
   not the length of the object.

   Aligned many flags/values with RELOAD by using hex vs decimal values.

   Significant reorganization and edit for readability.

Authors' Addresses

   Haibin Song
   Huawei

   Email: haibin.song@huawei.com


   Jiang Xingfeng
   Huawei

   Email: jiangxingfeng@huawei.com


   Roni Even
   Huawei
   14 David Hamelech
   Tel Aviv 64953
   Israel

   Email: roni.even@mail101.huawei.com


   David A. Bryan
   ethernot.org
   Cedar Park, Texas
   United States of America

   Email: dbryan@ethernot.org


   Yi Sun
   ICT

   Email: sunyi@ict.ac.cn