P2PSIP Working Group                                             H. Song
Internet-Draft                                                  X. Jiang
Intended status: Standards Track                                 R. Even
Expires: July 27, 2015                                            Huawei
                                                                D. Bryan
                                                            Ethernot.org
                                                                  Y. Sun
                                                                     ICT
                                                        January 23, 2015

                        P2P Overlay Diagnostics
                    draft-ietf-p2psip-diagnostics-16

Abstract

   This document describes mechanisms for P2P overlay diagnostics. It
   defines extensions to the RELOAD P2PSIP base protocol to collect
   diagnostic information, and details the protocol specifications for
   these extensions. Useful diagnostic information for connection and
   node status monitoring is also defined. The document also describes
   the usage scenarios and provides examples of how these methods are
   used to perform diagnostics in P2PSIP overlay networks.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on July 27, 2015.

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008. The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Diagnostic Scenarios
   4.  Data Collection Mechanisms
     4.1.  Overview of Operations
     4.2.  "Ping-like" Behavior: Extending Ping
       4.2.1.  RELOAD Request Extension: Ping
     4.3.  "Traceroute-like" Behavior: The Path_Track Method
       4.3.1.  New RELOAD Request: PathTrack
         4.3.1.1.  PathTrack Request
         4.3.1.2.  PathTrack Response
     4.4.  Error Code Extensions
   5.  Diagnostic Data Structures
     5.1.  DiagnosticsRequest Data Structure
     5.2.  DiagnosticsResponse Data Structure
     5.3.  dMFlags and Diagnostic Kind ID Types
   6.  Message Processing
     6.1.  Message Creation and Transmission
     6.2.  Message Processing: Intermediate Peers
     6.3.  Message Response Creation
     6.4.  Interpreting Results
   7.  Authorization through Overlay Configuration
   8.  Security Considerations
   9.  IANA Considerations
     9.1.  Diagnostics Flag
     9.2.  Diagnostic Kind ID Types
     9.3.  Message Codes
     9.4.  Error Code
     9.5.  Message Extension
     9.6.  XML Name Space Registration
   10. Acknowledgments
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Examples
     A.1.  Example 1
     A.2.  Example 2
     A.3.  Example 3
   Appendix B.  Problems with Generating Multiple Responses on Path
   Appendix C.  Changes to the Draft
     C.1.  Changes since -00 version
     C.2.  Changes since -01 version
     C.3.  Changes since -02 version
     C.4.  Changes since -03 version
     C.5.  Changes since -04 version
     C.6.  Changes since -05 version
     C.7.  Changes in version -10
     C.8.  Changes in version -15
   Authors' Addresses

1.  Introduction

   In the last few years, overlay networks have rapidly evolved and
   emerged as a promising platform for deployment of new applications
   and services in the Internet.
   One of the reasons overlay networks
   are seen as an excellent platform for large scale distributed systems
   is their resilience in the presence of failures. This resilience has
   three aspects: data replication, routing recovery, and static
   resilience. Routing recovery algorithms are used to repopulate the
   routing table with live nodes when failures are detected. Static
   resilience measures the extent to which an overlay can route around
   failures even before the recovery algorithm repairs the routing
   table. Both routing recovery and static resilience rely on accurate
   and timely detection of failures.

   There are a number of situations in which some nodes in a
   Peer-to-Peer (P2P) overlay may malfunction or behave badly. For
   example, these nodes may be disabled, congested, or may be misrouting
   messages. The impact of these malfunctions on the overlay network
   may be a degradation of the quality of service provided collectively
   by the peers in the overlay network or an interruption of the overlay
   services. It is desirable to identify malfunctioning or badly
   behaving peers through diagnostic tools, and exclude or reject them
   from the P2P system. Node failures may also be caused by failures of
   underlying layers. For example, recovery from an incorrect overlay
   topology may be slow when IP routing recovers slowly after link
   failures. Moreover, if a backbone link fails and the failover is
   slow, the network may be partitioned, leading to partitions of
   overlay topologies and inconsistent routing results between the
   different partitioned components.

   Some keep-alive algorithms based on periodic probe and acknowledge
   mechanisms enable accurate and timely detection of failures of one
   node's neighbors [Overlay-Failure-Detection], but these algorithms by
   themselves can only detect the disabled neighbors using the periodic
   method.
   This may not be sufficient for the service provider
   operating the overlay network.

   For Peer-to-Peer SIP (P2PSIP), a single, general P2PSIP overlay
   diagnostic framework supporting periodic and on-demand methods for
   detecting node failures and network failures is desirable. This
   document describes a general P2PSIP overlay diagnostic extension to
   the P2PSIP base protocol RELOAD [RFC6940] and is intended as a
   complement to keep-alive algorithms in the P2PSIP overlay itself.

2.  Terminology

   This document uses the concepts defined in RELOAD [RFC6940].

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  Diagnostic Scenarios

   P2P systems are self-organizing and, ideally, setup and configuration
   of individual P2P nodes requires no network management in the
   traditional sense. However, users of an overlay, as well as P2P
   service providers, may contemplate usage scenarios where some
   monitoring and diagnostics are required. We present a simple
   connectivity test and some useful diagnostic information that may be
   used in such diagnostics.

   The common usage scenarios for P2P diagnostics can be broadly
   categorized in three classes:

   1.  Automatic diagnostics built into the P2P overlay routing
       protocol. Nodes perform periodic checks of known neighbors and
       remove those nodes from the routing tables that fail to respond
       to connectivity checks [Handling_Churn_in_a_DHT]. Unresponsive
       nodes may only be temporarily disabled, for example due to a
       local cryptographic processing overload, disk processing overload
       or link overload. It is therefore useful to repeat the
       connectivity checks to see whether nodes have recovered and can
       again be placed in the routing tables.
       This process is known as 'failed
       node recovery' and can be optimized as described in the paper
       "Handling Churn in a DHT" [Handling_Churn_in_a_DHT].

   2.  Diagnostics used by a particular node to follow up on an
       individual user complaint or failure. For example, a technical
       support staff member may use a desktop sharing application (with
       the permission of the user) to remotely determine the health of,
       and possible problems with, the malfunctioning node. Part of the
       remote diagnostics may consist of simple connectivity tests with
       other nodes in the P2PSIP overlay and retrieval of statistics
       from nodes in the overlay. The simple connectivity tests are not
       dependent on the type of P2PSIP overlay. Note that other tests
       may be required as well, including checking the health and
       performance of the user's computer or mobile device and checking
       the bandwidth of the link connecting the user to the Internet.

   3.  P2P system-wide diagnostics used to check the overall health of
       the P2P overlay network. These include checking the consumption
       of network bandwidth, checking for the presence of problem links
       and checking for abusive or malicious nodes. This is not a
       trivial problem; it has been studied in detail for content and
       streaming P2P overlays [Diagnostic_Framework], but has not been
       addressed in earlier P2PSIP documents
       [Diagnostics_and_NAT_traversal_in_P2PP]. While this is a
       difficult problem, a great deal of information that can help in
       diagnosing these problems can be gained by collecting basic
       diagnostic information for peers and the network. This document
       provides a framework for obtaining this information.

4.  Data Collection Mechanisms

4.1.  Overview of Operations

   The diagnostic mechanisms described in this document are primarily
   intended to detect and locate failures or monitor performance in
   P2PSIP overlay networks.
   They provide mechanisms to detect and locate
   malfunctioning or badly behaving nodes including disabled nodes,
   congested nodes and misrouting peers. They provide a mechanism to
   detect direct connectivity or connectivity to a specified node, a
   mechanism to detect the availability of specified resource records
   and a mechanism to discover P2PSIP overlay topology and the underlay
   topology failures.

   The P2PSIP diagnostics extensions define two mechanisms to collect
   data. The first is an extension to the RELOAD Ping mechanism,
   allowing diagnostic data to be queried from a node, as well as to
   diagnose the path to that node. The second is a new method and
   response, PathTrack, for collecting diagnostic information
   iteratively. Payloads for these mechanisms, which allow diagnostic
   data to be collected and represented, are presented, and additional
   error codes are introduced. Essentially, this document reuses the
   RELOAD [RFC6940] specification and extends it to introduce the new
   diagnostics methods. The extensions strictly follow the RELOAD
   specification on message routing, transport, NAT traversal, etc.
   The diagnostic methods are, however, P2PSIP protocol independent.

   This document primarily describes how to detect and locate failures
   including disabled nodes, congested nodes, misrouting behaviors and
   underlying network faults in P2PSIP overlay networks through a simple
   and efficient mechanism. This mechanism is modeled after the
   ping/traceroute paradigm: ping [RFC0792] is used for connectivity
   checks, and traceroute is used for hop-by-hop fault localization as
   well as path tracing. This document specifies a "ping-like" mode (by
   extending the RELOAD Ping method to gather diagnostics) and a
   "traceroute-like" mode (by defining the new PathTrack method) for
   diagnosing P2PSIP overlay networks.

   One way these tools can be used is to detect the connectivity to the
   specified node or the availability of the specified resource record
   through the extended P2PSIP Ping operation. Once the overlay network
   receives some alarms about overlay service degradation or
   interruption, a Ping is sent. If the Ping fails, one can then send a
   PathTrack to determine where the fault lies.

   The diagnostic information can only be provided to authorized nodes.
   Some diagnostic information can be provided to all the participants
   in the P2PSIP overlay, and some other diagnostic information can only
   be provided to the nodes authorized by the local or overlay policy.
   The authorization depends on the type of the diagnostic information
   and the administrative considerations, and is application specific.

   This document considers the general administrative scenario based on
   diagnostic kind type, where a whole overlay can authorize a certain
   type of diagnostic information to a small list of particular nodes
   (e.g., administrative nodes). That means that if a node is
   authorized to access a diagnostic kind type, it can access that
   information from all nodes in the overlay network. The scenario
   where a particular node authorizes its diagnostic information to a
   particular list of nodes is out of scope. This could be achieved by
   an extension of this document if such a requirement arises. The
   default policy or access rule for a type of diagnostic information is
   "permit" unless specified in the diagnostics extension document. As
   the RELOAD protocol already requires that each message carries the
   message signature of the sender, the receiver of the diagnostics
   requests can use the signature to identify the sender. It can then
   use the overlay configuration file with this signature to determine
   which types of diagnostic information that node is authorized for.
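
   As an informal illustration of the kind-based policy described above
   (not part of the protocol), the following Python sketch shows how a
   receiver might filter the requested diagnostic kinds once it has
   identified the sender from the message signature. The dictionary
   layout, the function names, the node names, and the kind values
   0xF001 and 0x0040 are assumptions made for this example only. What a
   node does with kinds that are refused is not covered here.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical in-memory form of the overlay policy: each restricted
# diagnostic kind maps to the set of node IDs allowed to read it.
# Kinds that do not appear here fall under the default "permit" rule.
RestrictedKinds = Dict[int, FrozenSet[str]]


def is_authorized(sender: str, kind: int, policy: RestrictedKinds) -> bool:
    """Return True if `sender` may read diagnostic kind `kind`."""
    allowed = policy.get(kind)
    if allowed is None:
        return True  # no restriction configured: default is "permit"
    return sender in allowed


def authorized_kinds(sender: str, requested: Iterable[int],
                     policy: RestrictedKinds) -> List[int]:
    """Keep only the requested kinds that `sender` is authorized for."""
    return [k for k in requested if is_authorized(sender, k, policy)]


# Example values (assumed): kind 0xF001 is restricted to one
# administrative node, kind 0x0040 has no restriction.
EXAMPLE_POLICY: RestrictedKinds = {0xF001: frozenset({"admin-node"})}
```

   With this example policy, a request from "admin-node" keeps both
   kinds, while a request from any other node keeps only 0x0040.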

   In the remainder of this section we define mechanisms for collecting
   data, as well as the specific protocol extensions (message
   extensions, new methods, and error codes) required to collect this
   information. In Section 5 we discuss the format of the data
   collected, and in Section 6 we discuss detailed message processing.

4.2.  "Ping-like" Behavior: Extending Ping

   To provide "ping-like" behavior, the RELOAD Ping method is extended
   to collect diagnostic data along the path. The request message is
   forwarded by the intermediate peers along the path and then
   terminated by the responsible peer. After optional local
   diagnostics, the responsible peer returns a response message. If an
   error is found when routing, an Error response is sent to the
   initiator node by the intermediate peer. Please refer to RELOAD
   [RFC6940] for details of the protocol.

   The message flow of a Ping message (with diagnostic extensions) is as
   follows:

  Peer A               Peer B               Peer C               Peer D
     |                    |                    |                    |
     |(1). PingReq        |                    |                    |
     |------------------->|(2). PingReq        |                    |
     |                    |------------------->|(3). PingReq        |
     |                    |                    |------------------->|
     |                    |                    |                    |
     |                    |                    |<-------------------|
     |                    |<-------------------|(4). PingAns        |
     |<-------------------|(5). PingAns        |                    |
     |(6). PingAns        |                    |                    |
     |                    |                    |                    |

                 Figure 1: Ping Diagnostic Message Flow

4.2.1.  RELOAD Request Extension: Ping

   To extend the ping request for use in diagnostics, a new extension of
   RELOAD is defined. The structure for a MessageExtension in RELOAD is
   defined as:

      struct {
        MessageExtensionType  type;
        Boolean               critical;
        opaque                extension_contents<0..2^32-1>;
      } MessageExtension;

   For the Ping request extension, we define a new MessageExtensionType,
   extension 0x0002 named Diagnostic_Ping, as specified in Table 4 and
   in the RELOAD specification.
   The extension contents consist of a
   DiagnosticsRequest structure, defined later in this document in
   Section 5.1. This extension MAY be used for new requests of the
   Ping method and MUST NOT be included in requests using any other
   method.

   This extension is not critical. If a peer does not support the
   extension, it will simply ignore the diagnostic portion of the
   message, and will treat the message as if it was a normal ping.
   Senders MUST accept a response that lacks diagnostic information and
   SHOULD NOT resend the message expecting a reply. Receivers of a
   request for any method other than Ping that includes this extension
   MUST ignore the extension.

4.3.  "Traceroute-like" Behavior: The Path_Track Method

   We define a simple PathTrack method for retrieving diagnostic
   information iteratively. The mechanism defined in this document
   follows the RELOAD specification; the new request and response
   messages use the message format specified in RELOAD. Please
   refer to RELOAD [RFC6940] for details of the protocol.

   The operation of this request is shown below in Figure 2. The
   initiator node A asks its neighbor B, the next hop peer toward the
   destination ID, and B returns a message with information about the
   next hop peer C, along with optional diagnostic information for B.
   Then the initiator node A asks the next hop peer C (directly or via
   symmetric routing) to return information about the next hop peer D
   and diagnostic information of C. Unless a failure prevents the
   message from being forwarded, this step can be iteratively repeated
   until the request reaches the responsible peer D for the destination
   ID, and retrieves diagnostic information of peer D.
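
   The iteration described above can be sketched informally in Python.
   This is only an illustration under stated assumptions: the "query"
   callback stands in for sending one PathTrackReq and reading the
   matching PathTrackAns, the max_hops limit is an added safety guard
   that the text does not define, and the toy routing table and
   diagnostic values are made up. The stop condition (the returned next
   hop equals the responding peer) follows the rule given for
   PathTrackAns in Section 4.3.1.2.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical transport hook (an assumption of this sketch): sends one
# PathTrackReq for `destination` to `peer` and returns the pair
# (next_hop, diagnostics) taken from that peer's PathTrackAns.
Query = Callable[[str, str], Tuple[str, Dict[str, int]]]


def path_track(first_hop: str, destination: str, query: Query,
               max_hops: int = 32) -> List[Tuple[str, Dict[str, int]]]:
    """Query each hop in turn and collect (peer, diagnostics) pairs."""
    hops: List[Tuple[str, Dict[str, int]]] = []
    peer = first_hop
    for _ in range(max_hops):  # max_hops is an assumed safety limit
        next_hop, diagnostics = query(peer, destination)
        hops.append((peer, diagnostics))
        if next_hop == peer:  # responder is the responsible peer: stop
            return hops
        peer = next_hop
    raise RuntimeError("responsible peer not reached within max_hops")


# Toy overlay matching the A -> B -> C -> D example; D is responsible.
DEMO_NEXT_HOP = {"B": "C", "C": "D", "D": "D"}


def demo_query(peer: str, destination: str) -> Tuple[str, Dict[str, int]]:
    # The diagnostics content is made up for the demonstration.
    return DEMO_NEXT_HOP[peer], {"status": 0}
```

   Calling path_track("B", "dest-id", demo_query) visits B, C, and D in
   that order and stops at D, because D names itself as the next hop.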

   The message flow of a PathTrack message (with diagnostic extensions)
   is as follows:

  Peer-A               Peer-B               Peer-C               Peer-D
     |                    |                    |                    |
     |(1).PathTrackReq    |                    |                    |
     |------------------->|                    |                    |
     |(2).PathTrackAns    |                    |                    |
     |<-------------------|                    |                    |
     |                    |(3).PathTrackReq    |                    |
     |--------------------|------------------->|                    |
     |                    |(4).PathTrackAns    |                    |
     |<-------------------|--------------------|                    |
     |                    |                    |(5).PathTrackReq    |
     |--------------------|--------------------|------------------->|
     |                    |                    |(6).PathTrackAns    |
     |<-------------------|--------------------|--------------------|
     |                    |                    |                    |

               Figure 2: PathTrack Diagnostic Message Flow

   There have been proposals that RouteQuery and a series of Fetch
   requests can be used to replace the PathTrack mechanism, but in the
   presence of churn such an operation would not, strictly speaking,
   provide identical results, as the path may change between RouteQuery
   and Fetch operations (although obviously the path could change
   between steps of PathTrack as well).

4.3.1.  New RELOAD Request: PathTrack

   This document defines a new RELOAD method, PathTrack, to retrieve the
   diagnostic information from the intermediate peers along the routing
   path. At each step of the PathTrack request, the responsible peer
   responds to the initiator node with requested status information.
   Status information can include a peer's congestion state, processing
   power, available bandwidth, the number of entries in its neighbor
   table, uptime, identity, network address information, and next hop
   peer information.

   A PathTrack request specifies which diagnostic information is
   requested using a DiagnosticsRequest data structure, defined and
   discussed in detail later in this document in Section 5.1. Base
   information is requested by setting the appropriate flags in the data
   structure in the request.
   If all flags are clear (no bits are set),
   then the PathTrack request is only used for requesting the next hop
   information. In this case the iterative mode of PathTrack is
   reduced to a RouteQuery method, which is only used for checking the
   liveness of the peers along the routing path. The PathTrack request
   can be routed directly or through the overlay based on the routing
   mode chosen by the initiator node.

   A response to a successful PathTrackReq is a PathTrackAns message.
   The PathTrackAns contains general diagnostic information in the
   payload, returned using a DiagnosticsResponse data structure. This
   data structure is defined and discussed in detail later in this
   document in Section 5.2. The information returned is determined
   based on the information requested in the flags in the corresponding
   request.

4.3.1.1.  PathTrack Request

   The structure of the PathTrack request is as follows:

      struct {
        Destination         destination;
        DiagnosticsRequest  request;
      } PathTrackReq;

   The fields of the PathTrackReq are as follows:

   destination : The destination which the initiator node is
      interested in. This may be any valid destination object,
      including a NodeID, opaque ids, or ResourceID.

   request : A DiagnosticsRequest, as discussed in Section 5.1.

4.3.1.2.  PathTrack Response

   The structure of the PathTrack Response is as follows:

      struct {
        Destination          next_hop;
        DiagnosticsResponse  response;
      } PathTrackAns;

   The fields of the PathTrackAns are as follows:

   next_hop : The information of the next hop node from the
      responding intermediate peer to the destination node. If the
      responding peer is the responsible peer for the destination ID,
      then the next_hop node ID equals the responding node ID, and after
      that the initiator MUST stop the iterative process.

   response : A DiagnosticsResponse, as discussed in Section 5.2.

4.4.  Error Code Extensions

   This document extends the Error response method defined in the RELOAD
   specification to support error cases resulting from diagnostic
   queries. When an error is encountered in RELOAD, the Message Code
   0xFFFF is returned. The ErrorResponse structure includes an error
   code, and we define new error codes to report possible error
   conditions detected while performing diagnostics:

      Code Value    Error Code Name
      0x65          Underlay Destination Unreachable
      0x66          Underlay Time exceeded
      0x67          Message Expired
      0x68          Upstream Misrouting
      0x69          Loop detected
      0x70          TTL hops exceeded

   The final error codes will be assigned by IANA as specified in the
   RELOAD protocol [RFC6940].

   In addition, this document introduces several types of error
   information in the error_info field in the case of Code 0x65. These
   are represented as an opaque UTF-8 text string. Here are some
   examples of the error info.

   error_info:

      net unreachable
      host unreachable
      protocol unreachable
      port unreachable
      fragmentation needed
      source route failed

   The error_info field values of the Codes 0x66 to 0x70 are
   application specific and defined by the particular overlay.

5.  Diagnostic Data Structures

   Both the extended Ping method and the PathTrack method use the
   following common diagnostics data structures to collect data. Two
   common structures are defined: DiagnosticsRequest for requesting
   data, and DiagnosticsResponse for returning the information.

5.1.  DiagnosticsRequest Data Structure

   The DiagnosticsRequest data structure is used to request diagnostic
   information and has the following form:

      enum { (2^16-1) } DiagnosticKindId;

      struct {
        DiagnosticKindId  kind;
        opaque            diagnostic_extension_contents<0..2^32-1>;
      } DiagnosticExtension;

      struct {
        uint64               expiration;
        uint64               timestamp_initiated;
        uint64               dMFlags;
        uint32               ext_length;
        DiagnosticExtension  diagnostic_extensions_list<0..2^32-1>;
      } DiagnosticsRequest;

   The fields in the DiagnosticsRequest are as follows:

   expiration : The time when the request will expire represented as
      the number of milliseconds elapsed since midnight Jan 1, 1970 UTC
      not counting leap seconds. This will have the same values for
      seconds as standard UNIX time or POSIX time. More information can
      be found at UnixTime [UnixTime]. This value MUST be between 1 and
      600 seconds in the future.

   timestamp_initiated : The time when the P2PSIP diagnostics request
      was initiated represented as the number of milliseconds elapsed
      since midnight Jan 1, 1970 UTC not counting leap seconds. This
      will have the same values for seconds as standard UNIX time or
      POSIX time.

   dMFlags : A mandatory field which is an unsigned 64-bit integer
      indicating which base diagnostic information the request initiator
      node is interested in. The initiator sets different bits to
      retrieve different kinds of diagnostic information. If dMFlags is
      set to zero, then no base diagnostic information is conveyed in
      the PathTrack response. If dMFlags is set to all '1's, then all
      base diagnostic information values are requested. A request may
      set any number of the flags to request the corresponding
      diagnostic information.

   ext_length : The length of the extended diagnostic request
      information in bytes. If the value is greater than or equal to 1,
      then some extended diagnostic information is requested.
      A value
      of zero indicates no extended diagnostic information is included.
      The value of ext_length MUST NOT be negative. Note that this is
      NOT the length of the entire DiagnosticsRequest data structure.

      Note that this memo specifies the initial set of flags; the set
      can be extended. The dMFlags indicate general diagnostic
      information. The mapping between the bits in the dMFlags and the
      diagnostic information kind presented is as described in
      Section 9.1.

   diagnostic_extensions_list : Consists of one or more
      DiagnosticExtension structures (see below) documenting additional
      diagnostic information being requested. Each DiagnosticExtension
      consists of the following fields:

      kind : A numerical code indicating the type of extension
         diagnostic information (see Section 9.2). Note that kinds
         0xF000 - 0xFFFE are reserved for overlay specific diagnostics
         and may be used without IANA registration for local diagnostic
         information. Kinds from 0x0000 to 0x003F MUST NOT be indicated
         in the diagnostic_extensions_list in the message request
         because they can be represented using the dMFlags in a much
         simpler way.

      diagnostic_extension_contents : The opaque data containing the
         request for this particular extension. This data is extension
         dependent.

5.2.  DiagnosticsResponse Data Structure

      enum { (2^16-1) } DiagnosticKindId;

      struct {
        DiagnosticKindId  kind;
        opaque            diagnostic_info_contents<0..2^16-1>;
      } DiagnosticInfo;

      struct {
        uint64          expiration;
        uint64          timestamp_received;
        uint8           hop_counter;
        uint32          ext_length;
        DiagnosticInfo  diagnostic_info_list<0..2^32-1>;
      } DiagnosticsResponse;

   The fields in the DiagnosticsResponse are as follows:

   expiration : The time when the response will expire represented as
      the number of milliseconds elapsed since midnight Jan 1, 1970 UTC
      not counting leap seconds.
This will have the same values for 612 seconds as standard UNIX time or POSIX time. This value MUST be 613 between 1 and 600 seconds in the future. 615 timestamp_received : The time when the P2PSIP Overlay diagnostic 616 request was received, represented as the number of milliseconds 617 elapsed since midnight Jan 1, 1970 UTC not counting leap seconds. 618 This will have the same values for seconds as standard UNIX time 619 or POSIX time. 621 hop_counter : This field only appears in diagnostic responses. It 622 MUST be exactly copied from the TTL field of the forwarding header 623 in the received request. This information is sent back to the 624 request initiator, allowing it to compute the number of hops that 625 the message traversed in the overlay. 627 ext_length : the length of the returned DiagnosticInfo information 628 in bytes. If the value is greater than or equal to 1, then some 629 extended diagnostic information is returned. A value of zero 630 indicates no extended diagnostic information is included. The 631 value of ext_length MUST NOT be negative. Note that this is NOT the 632 length of the entire DiagnosticsResponse data structure. 634 diagnostic_info_list : consists of one or more DiagnosticInfo 635 structures containing the requested diagnostic information. The 636 fields in the DiagnosticInfo structure are as follows: 638 kind : A numeric code indicating the type of information being 639 returned. For base data requested using the dMFlags, this code 640 corresponds to the dMFlag set, and is described in Section 5.1. 641 For diagnostic extensions, this code will be identical to the 642 value of the DiagnosticKindId set in the "kind" field of the 643 DiagnosticExtension of the request. See Section 9.2. 645 diagnostic_info_contents : Data containing the value for the 646 diagnostic information being reported.
Various kinds of 647 diagnostic information can be retrieved. Please refer to 648 Section 5.3 for details of the diagnostic kind ID for the base 649 diagnostic information that may be reported. 651 5.3. dMFlags and Diagnostic Kind ID Types 653 The dMFlags field described above is a 64-bit field that allows 654 initiator nodes to identify up to 62 items of base information to 655 request in a request message (the first and last flags being 656 reserved). When the requested base information is returned in the 657 response, the value of the diagnostic kind ID will correspond to the 658 numeric field marked in the dMFlags in the request. The values for 659 the dMFlags are defined in Section 9.1 and the diagnostic kind IDs 660 are defined in Section 9.2. The information contained for each value 661 is described in this section. 663 STATUS_INFO (8 bits): A single value element containing an 664 unsigned byte representing whether or not the node is in 665 congestion status. An example usage of STATUS_INFO is for 666 congestion-aware routing. In this scenario, each peer has to 667 update its congestion status periodically. An intermediate peer 668 in the distributed hash table (DHT) network will choose its next 669 hop according to both the DHT routing algorithm and the status 670 information. This is done to avoid increasing load on congested 671 peers. The rightmost 4 bits are used and the other bits MUST be 672 cleared to "0"s for future use. There are 16 levels of congestion 673 status, with "0x00" representing zero load and "0x0F" representing 674 congested. This document does not provide a specific method for 675 measuring congestion, leaving this decision to each node. One possible 676 option for a node would be to take its CPU/memory/bandwidth usage 677 percentage in the past 600 seconds and normalize the highest value 678 to the range [0x00, 0x0F]. A future draft may define an objective 679 measure or specific algorithm for this.
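The normalization suggested for STATUS_INFO can be sketched as follows. This is only an illustration under stated assumptions: the function name status_info, the use of three percentages in the range 0 to 100, and the truncating (rather than rounding) scale are choices made for this sketch; this document leaves the actual congestion measure to each node.

```python
def status_info(cpu_pct: float, mem_pct: float, bw_pct: float) -> int:
    """Return a STATUS_INFO byte from usage percentages (hypothetical helper).

    The three inputs are assumed to be usage percentages (0-100) observed
    over the past 600 seconds; how they are measured is up to the node.
    """
    # Take the highest of the three usage figures, as the text suggests.
    highest = max(cpu_pct, mem_pct, bw_pct)
    # Clamp to 0..100 so out-of-range inputs cannot overflow the 4-bit level.
    highest = min(max(highest, 0.0), 100.0)
    # Scale 0..100 onto 0x00..0x0F (truncating is an assumption of this sketch).
    level = int(highest * 0x0F // 100)
    # Only the rightmost 4 bits are used; the other bits stay cleared.
    return level & 0x0F


if __name__ == "__main__":
    print(hex(status_info(10.0, 50.0, 20.0)))
```

A node that prefers to report congestion earlier could round up instead of truncating; either choice keeps the upper four bits clear.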
681 ROUTING_TABLE_SIZE (32 bits): A single value element containing an 682 unsigned 32-bit integer representing the number of peers in the 683 peer's routing table. The administrator of the overlay may be 684 interested in statistics of this value for reasons such as routing 685 efficiency. Access to this kind of diagnostic information MUST 686 NOT be allowed unless compliant to the rules defined in Section 7. 688 PROCESS_POWER (64 bits): A single value element containing an 689 unsigned 64-bit integer specifying the processing power of the 690 node in units of MIPS. Fractional values are rounded up. 692 UPSTREAM_BANDWIDTH (64 bits): A single value element containing an 693 unsigned 64-bit integer specifying the upstream network bandwidth 694 (provisioned or maximum, not available) of the node in units of 695 Kbps. Fractional values are rounded up. For multihomed hosts, 696 this should be the link used to send the response. 698 DOWNSTREAM_BANDWIDTH (64 bits): A single value element containing 699 an unsigned 64-bit integer specifying the downstream network 700 bandwidth (provisioned or maximum, not available) of the node in 701 units of Kbps. Fractional values are rounded up. For multihomed 702 hosts, this should be the link the request was received from. 704 SOFTWARE_VERSION: A single value element containing a US-ASCII 705 string that identifies the manufacturer, model, operating system 706 information and the version of the software. While the format is 707 peer-defined, a suggested format is as follows: 709 "ApplicationProductToken (Platform; OS-or-CPU) VendorProductToken 710 (VendorComment)". For example: "MyReloadApp/1.0 (Unix; Linux 711 x86_64) libreload-java/0.7.0 (Stonyfish Inc.)". Access to this 712 kind of diagnostic information MUST NOT be allowed unless 713 compliant to the rules defined in Section 7.
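As an aside on how the base kinds in this section relate to the dMFlags bits, the sketch below builds a dMFlags value from kind names. It assumes, based only on comparing the tables in Sections 9.1 and 9.2, that kind ID k corresponds to bit (k - 1); the helper names flag_for_kind, build_dmflags, and kinds_in are hypothetical and not part of this specification.

```python
# Kind IDs as listed in the "RELOAD Diagnostic Kind ID Types" table (Section 9.2).
BASE_KINDS = {
    "STATUS_INFO": 0x0001,
    "ROUTING_TABLE_SIZE": 0x0002,
    "PROCESS_POWER": 0x0003,
    "UPSTREAM_BANDWIDTH": 0x0004,
    "DOWNSTREAM_BANDWIDTH": 0x0005,
    "SOFTWARE_VERSION": 0x0006,
    "MACHINE_UPTIME": 0x0007,
    "APP_UPTIME": 0x0008,
    "MEMORY_FOOTPRINT": 0x0009,
    "DATASIZE_STORED": 0x000A,
    "INSTANCES_STORED": 0x000B,
    "MESSAGES_SENT_RCVD": 0x000C,
    "EWMA_BYTES_SENT": 0x000D,
    "EWMA_BYTES_RCVD": 0x000E,
    "UNDERLAY_HOP": 0x000F,
    "BATTERY_STATUS": 0x0010,
}


def flag_for_kind(kind_id: int) -> int:
    """Return the dMFlags bit for a base kind (assumed rule: bit k - 1)."""
    if kind_id not in BASE_KINDS.values():
        raise ValueError(f"not a base diagnostic kind: {kind_id:#06x}")
    return 1 << (kind_id - 1)


def build_dmflags(names) -> int:
    """OR together the flags for the named base kinds."""
    flags = 0
    for name in names:
        flags |= flag_for_kind(BASE_KINDS[name])
    return flags


def kinds_in(dmflags: int) -> list:
    """List the base kind IDs whose flag is set in dmflags, in ascending order."""
    return [k for k in sorted(BASE_KINDS.values()) if dmflags & flag_for_kind(k)]


if __name__ == "__main__":
    flags = build_dmflags(["STATUS_INFO", "EWMA_BYTES_SENT"])
    print(f"{flags:#018x}", [hex(k) for k in kinds_in(flags)])
```

A responder could use something like kinds_in to decide which DiagnosticInfo entries to place in diagnostic_info_list, subject to the authorization rules in Section 7.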
715 MACHINE_UPTIME (64 bits): A single value element containing an 716 unsigned 64-bit integer specifying the time the node's underlying 717 system has been up in seconds. 719 APP_UPTIME (64 bits): A single value element containing an 720 unsigned 64-bit integer specifying the time the P2P application 721 has been up in seconds. 723 MEMORY_FOOTPRINT (64 bits): A single value element containing an 724 unsigned 64-bit integer representing the memory footprint of the 725 peer program in kilobytes (1024 bytes). Fractional values are 726 rounded up. Access to this kind of diagnostic information MUST 727 NOT be allowed unless compliant to the rules defined in Section 7. 729 DATASIZE_STORED (64 bits): An unsigned 64-bit integer representing 730 the number of bytes of data being stored by this node. Access to 731 this kind of diagnostic information MUST NOT be allowed unless 732 compliant to the rules defined in Section 7. 734 INSTANCES_STORED: An array element containing the number of 735 instances of each kind stored. The array is indexed by Kind-ID. 736 Each entry is an unsigned 64-bit integer. Access to this kind of 737 diagnostic information MUST NOT be allowed unless compliant to the 738 rules defined in Section 7. 740 MESSAGES_SENT_RCVD: An array element containing the number of 741 messages sent and received. The array is indexed by method code. 742 Each entry in the array is a pair of unsigned 64-bit integers 743 (packed end to end) representing sent and received. Access to 744 this kind of diagnostic information MUST NOT be allowed unless 745 compliant to the rules defined in Section 7. 747 EWMA_BYTES_SENT (32 bits): A single value element containing an 748 unsigned 32-bit integer representing an exponential weighted 749 average of bytes sent per second by this peer. 
sent = alpha x 750 sent_present + (1 - alpha) x sent where sent_present represents 751 the bytes sent per second since the last calculation and sent 752 represents the last calculation of bytes sent per second. A 753 suitable value for alpha is 0.8. This value is calculated every 754 five seconds. Access to this kind of diagnostic information MUST 755 NOT be allowed unless compliant to the rules defined in Section 7. 757 EWMA_BYTES_RCVD (32 bits): A single value element containing an 758 unsigned 32-bit integer representing an exponential weighted 759 average of bytes received per second by this peer. rcvd = alpha x 760 rcvd_present + (1 - alpha) x rcvd where rcvd_present represents 761 the bytes received per second since the last calculation and rcvd 762 represents the last calculation of bytes received per second. A 763 suitable value for alpha is 0.8. This value is calculated every 764 five seconds. Access to this kind of diagnostic information MUST 765 NOT be allowed unless compliant to the rules defined in Section 7. 767 UNDERLAY_HOP (8 bits): Indicates the IP layer hops from the 768 intermediate peer which receives the diagnostics message to the 769 next hop peer for this message. (Note: RELOAD does not require 770 the intermediate peers to look into the message body. So here we 771 use PathTrack to gather underlay hops for diagnostics purpose). 773 BATTERY_STATUS (8 bits): The left-most bit is used to indicate 774 whether this peer is using a battery or not. If this bit is clear 775 (set to '0'), then the peer is using a battery for power. The 776 other 7 bits are to be determined by specific applications. 778 6. Message Processing 780 6.1. Message Creation and Transmission 782 When constructing either a Ping message with diagnostic extensions or 783 a PathTrack message, the sender first creates and populates a 784 DiagnosticsRequest data structure. 
The timestamp_initiated field is 785 set to the current time, and the expiration field is constructed 786 based on this time. The sender includes the dMFlags field in the 787 structure, setting any number (including all) of the flags to request 788 particular diagnostic information. The sender MAY leave all the bits 789 unset, requesting no particular diagnostic information. 791 The sender MAY also include diagnostic extensions in the 792 DiagnosticsRequest data structure to request additional information. 793 If the sender includes any extensions, it MUST calculate the length 794 of these extensions and set the ext_length field to this value. If 795 no extensions are included, the sender MUST set ext_length to zero. 797 The format of the DiagnosticsRequest data structure and its fields 798 MUST follow the restrictions defined in Section 5.1. 800 When constructing a Ping message with diagnostic extensions, the 801 sender MUST create a MessageExtension structure as defined in RELOAD 802 [RFC6940], setting the value of type to 0x0002, and the value of 803 critical to FALSE. The value of extension_contents MUST be a 804 DiagnosticsRequest structure as defined above. The message MAY be 805 directed to a particular NodeId or ResourceID, but SHOULD NOT be sent 806 to the broadcast NodeID. 808 When constructing a PathTrack message, the sender MUST set the 809 message_code for the RELOAD MessageContents structure to 810 path_track_req (0x65). The request field of the PathTrackReq MUST be 811 set to the DiagnosticsRequest data structure defined above. The 812 destination field MUST be set to the desired destination, which MAY 813 be either a NodeId or ResourceID but SHOULD NOT be the broadcast 814 NodeID. 816 6.2.
Message Processing: Intermediate Peers 818 When a request arrives at a peer, if the peer's responsible ID space 819 does not cover the destination ID of the request, then the peer MUST 820 continue processing this request according to the overlay-specified 821 routing mode from the RELOAD protocol. 823 In a P2PSIP overlay, error responses to a message can be generated by 824 either an intermediate peer or the responsible peer. When a request 825 is received at a peer, the peer may find connectivity failures or 826 malfunctioning peers through the pre-defined rules of the overlay 827 network, e.g. by analyzing the via list or underlay error messages. In 828 this case, the intermediate peer SHOULD return an error response to 829 the initiator node, reporting any malfunctioning node information 830 available in the error message payload. All error responses 831 generated MUST contain the appropriate error code. 833 Each intermediate peer receiving a Ping message with extensions (and 834 which understands the extension) or receiving a PathTrack request/ 835 response SHOULD check the expiration value (Unix time format) to 836 determine if the message is expired. If the message is expired, the 837 intermediate peer SHOULD generate a response with Error Code 0x67 838 "Message Expired", return the response to the initiator node, and 839 discard the message. 841 The intermediate peer SHOULD return an error response with the Error 842 Code 0x65 "Underlay Destination Unreachable" when it receives an ICMP 843 message with "Destination Unreachable" information after forwarding 844 the received request to the destination peer. 846 The intermediate peer SHOULD return an error response with the Error 847 Code 0x66 "Underlay Time Exceeded" when it receives an ICMP message 848 with "Time Exceeded" information after forwarding the received 849 request.
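The expiration check and the ICMP-driven error responses described above might be sketched as follows. The error code values are the ones given in this section; the ICMP type numbers 3 and 11 are the IPv4 values from RFC 792. The function names, and the choice to treat a message as expired once the current time reaches the expiration value, are assumptions of this sketch rather than requirements of this document.

```python
import time

# Error codes named in this section.
ERROR_UNDERLAY_DESTINATION_UNREACHABLE = 0x65
ERROR_UNDERLAY_TIME_EXCEEDED = 0x66
ERROR_MESSAGE_EXPIRED = 0x67

# ICMP (IPv4) message types from RFC 792.
ICMP_DESTINATION_UNREACHABLE = 3
ICMP_TIME_EXCEEDED = 11


def now_ms() -> int:
    """Current time in milliseconds since midnight Jan 1, 1970 UTC."""
    return int(time.time() * 1000)


def is_expired(expiration_ms: int, current_ms=None) -> bool:
    """Return True if the expiration field lies at or before the current time."""
    if current_ms is None:
        current_ms = now_ms()
    return current_ms >= expiration_ms


def error_for_icmp(icmp_type: int):
    """Map an ICMP type seen after forwarding to an error code, else None."""
    mapping = {
        ICMP_DESTINATION_UNREACHABLE: ERROR_UNDERLAY_DESTINATION_UNREACHABLE,
        ICMP_TIME_EXCEEDED: ERROR_UNDERLAY_TIME_EXCEEDED,
    }
    return mapping.get(icmp_type)


if __name__ == "__main__":
    # A request that expired one second ago would trigger "Message Expired".
    if is_expired(now_ms() - 1000):
        print(hex(ERROR_MESSAGE_EXPIRED))
```

Because the expiration field is an absolute time, this check is only as reliable as the clock synchronization between peers.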
851 The peer SHOULD return an Error response with Error Code 0x68 852 "Upstream Misrouting" when it finds its upstream peer disobeys the 853 routing rules defined in the overlay. The immediate upstream peer 854 information SHOULD also be conveyed to the initiator node. 856 The peer SHOULD return an Error response with Error Code 0x69 "Loop 857 detected" when it finds a loop through the analysis of the via list. 859 The peer SHOULD return an Error response with Error Code 0x70 "TTL 860 hops exceeded" when it finds that the TTL field value is no more than 861 0 when forwarding. 863 6.3. Message Response Creation 865 When a diagnostic request message arrives at a peer, if it is 866 responsible for the destination ID specified in the forwarding 867 header, and assuming it understands the extension (in the case of 868 Ping) or the new request type PathTrack, it MUST follow the 869 specifications defined in RELOAD [RFC6940] to form the response 870 header, and perform the following operations: 872 When constructing a PathTrack response, the sender MUST set the 873 message_code for the RELOAD MessageContents structure to 874 path_track_ans (0x66). 876 The receiver MUST check the expiration value (Unix time format) in 877 the DiagnosticsRequest to determine if the message is expired. If 878 the message is expired, the peer MUST generate a response with the 879 Error Code 0x67 "Message Expired", return the response to the 880 initiator node, and discard the message. 882 If the message is not expired, the receiver MUST construct a 883 DiagnosticsResponse structure, as follows: The TTL value from the 884 forwarding header is copied to the hop_counter field of the 885 DiagnosticsResponse structure. Note that the default value for TTL 886 at the beginning represents 100 hops unless the overlay configuration has 887 overridden the value.
The receiver generates a Unix time format 888 timestamp for the current time of day and places it in the 889 timestamp_received field, and constructs a new expiration time and 890 places it in the expiration field of the DiagnosticsResponse. 892 The destination peer MUST check if the initiator node has the 893 authority to request specific types of diagnostic information, and if 894 appropriate, append the diagnostic information requested in the 895 dMFlags and diagnostic_extensions (if any) using the 896 diagnostic_info_list field to the DiagnosticsResponse structure. If 897 any information is returned, the receiver MUST calculate the length of 898 the response and set ext_length appropriately. If no diagnostic 899 information is returned, ext_length MUST be set to zero. 901 The format of the DiagnosticsResponse data structure and its fields 902 MUST follow the restrictions defined in Section 5.2. 904 In the event of an error, an error response containing the error code 905 followed by the description (if they exist) MUST be created and sent 906 to the sender. If the initiator node asks for diagnostic information 907 that it is not authorized to query, the receiving peer MUST return 908 an Error response with the Error Code 2 "Error_Forbidden". 910 6.4. Interpreting Results 912 The initiator node, as well as the responding peer, MAY compute the 913 overlay One-Way-Delay time through the value in timestamp_received 914 and the timestamp_initiated field. However, for a single hop 915 measurement, the traditional measurement methods MUST be used instead 916 of the overlay layer diagnostics methods. 918 The P2P overlay network using the diagnostics methods specified in 919 this document MUST enforce time synchronization with a central time 920 server.
Network Time Protocol [RFC5905] can usually maintain time to 921 within tens of milliseconds over the public Internet, and can achieve 922 better than one millisecond accuracy in local area networks under 923 ideal conditions. However, this document does not specify the choice 924 for time synchronization, leaving it to the implementation. 926 The initiator node receiving the Ping response MAY check the 927 hop_counter field and compute the overlay hops to the destination 928 peer for the statistics of connectivity quality from the perspective 929 of overlay hops. 931 7. Authorization through Overlay Configuration 933 The overlay configuration file MUST contain the following XML 934 elements for authorizing a node to access the relative diagnostic 935 kinds. 937 diagnostic-kind: This has the attribute "kind" with the hexadecimal 938 number indicating the diagnostic Kind Type, this attribute has the 939 same value with Section 9.2, and at least one sub element "access- 940 node". 942 access-node: This element contains one hexadecimal number indicating 943 a NodeID, and the node with this NodeID is allowed to access the 944 diagnostic "kind" under the same diagnostic-kind element. 946 8. Security Considerations 948 The authorization for diagnostic information must be designed with 949 care to prevent it becoming a method to retrieve information for bot 950 attacks. It should also be noted that attackers can use diagnostics 951 to analyze overlay information to attack certain key peers. As this 952 document is a RELOAD extension, it follows RELOAD message header and 953 routing specifications, the common security considerations described 954 in the base document [RFC6940] are also applicable to this document. 955 Overlays may define their own requirements on who can collect/share 956 diagnostic information. 958 9. IANA Considerations 960 9.1. Diagnostics Flag 962 IANA SHALL create a "RELOAD Diagnostics Flag" Registry. 
Entries in 963 this registry are 1-bit flags contained in a 64-bits long integer 964 dMFlags denoting diagnostic information to be retrieved as described 965 in Section 4.3.1. New entries SHALL be defined via [RFC5226] 966 Standards Action. The initial contents of this registry are: 968 +-------------------------+------------------------------+--------+ 969 | diagnostic information |diagnostic flag in dMFlags | RFC | 970 |-------------------------+------------------------------+--------| 971 |Reserved | 0x 0000 0000 0000 0000 |RFC-XXXX| 972 |STATUS_INFO | 0x 0000 0000 0000 0001 |RFC-XXXX| 973 |ROUTING_TABLE_SIZE | 0x 0000 0000 0000 0002 |RFC-XXXX| 974 |PROCESS_POWER | 0x 0000 0000 0000 0004 |RFC-XXXX| 975 |UPSTREAM_BANDWIDTH | 0x 0000 0000 0000 0008 |RFC-XXXX| 976 |DOWNSTREAM_ BANDWIDTH | 0x 0000 0000 0000 0010 |RFC-XXXX| 977 |SOFTWARE_VERSION | 0x 0000 0000 0000 0020 |RFC-XXXX| 978 |MACHINE_UPTIME | 0x 0000 0000 0000 0040 |RFC-XXXX| 979 |APP_UPTIME | 0x 0000 0000 0000 0080 |RFC-XXXX| 980 |MEMORY_FOOTPRINT | 0x 0000 0000 0000 0100 |RFC-XXXX| 981 |DATASIZE_STORED | 0x 0000 0000 0000 0200 |RFC-XXXX| 982 |INSTANCES_STORED | 0x 0000 0000 0000 0400 |RFC-XXXX| 983 |MESSAGES_SENT_RCVD | 0x 0000 0000 0000 0800 |RFC-XXXX| 984 |EWMA_BYTES_SENT | 0x 0000 0000 0000 1000 |RFC-XXXX| 985 |EWMA_BYTES_RCVD | 0x 0000 0000 0000 2000 |RFC-XXXX| 986 |UNDERLAY_HOP | 0x 0000 0000 0000 4000 |RFC-XXXX| 987 |BATTERY_STATUS | 0x 0000 0000 0000 8000 |RFC-XXXX| 988 |Reserved | 0x FFFF FFFF FFFF FFFF |RFC-XXXX| 989 +-------------------------+------------------------------+--------+ 991 [To RFC editor: Please replace all RFC-XXXX in this document with the 992 RFC number of this document.] 994 9.2. Diagnostic Kind ID Types 996 IANA SHALL create a "RELOAD Diagnostic Kind ID Types" Registry. 997 Entries in this registry are 16-bit integers denoting diagnostics 998 extension data kind types carried in the diagnostic request and 999 response message, as described in Section 5.2. 
Code points from 1000 0x0000 to 0x003F SHALL be assigned together with flags within "RELOAD 1001 Diagnostics Flag" registry via RFC 5226 [RFC5226] standards action. 1002 Code points in the range 0x0040 to 0xEFFF SHALL be registered via RFC 1003 5226 standards action. 1005 +---------------------------+---------------+---------------+ 1006 | Diagnostic Kind Type | Code | Specification | 1007 +---------------------------+---------------+---------------+ 1008 | reserved | 0x0000 | RFC-XXXX | 1009 | STATUS_INFO | 0x0001 | RFC-XXXX | 1010 | ROUTING_TABLE_SIZE | 0x0002 | RFC-XXXX | 1011 | PROCESS_POWER | 0x0003 | RFC-XXXX | 1012 | UPSTREAM_BANDWIDTH | 0x0004 | RFC-XXXX | 1013 | DOWNSTREAM_BANDWIDTH | 0x0005 | RFC-XXXX | 1014 | SOFTWARE_VERSION | 0x0006 | RFC-XXXX | 1015 | MACHINE_UPTIME | 0x0007 | RFC-XXXX | 1016 | APP_UPTIME | 0x0008 | RFC-XXXX | 1017 | MEMORY_FOOTPRINT | 0x0009 | RFC-XXXX | 1018 | DATASIZE_STORED | 0x000A | RFC-XXXX | 1019 | INSTANCES_STORED | 0x000B | RFC-XXXX | 1020 | MESSAGES_SENT_RCVD | 0x000C | RFC-XXXX | 1021 | EWMA_BYTES_SENT | 0x000D | RFC-XXXX | 1022 | EWMA_BYTES_RCVD | 0x000E | RFC-XXXX | 1023 | UNDERLAY_HOP | 0x000F | RFC-XXXX | 1024 | BATTERY_STATUS | 0x0010 | RFC-XXXX | 1025 | reserved for future flags | 0x0011-40 | RFC-XXXX | 1026 | local use (reserved) | 0xF000-0xFFFE | RFC-XXXX | 1027 | reserved | 0xFFFF | RFC-XXXX | 1028 +---------------------------+---------------+---------------+ 1030 Table 1: Diagnostic Kind Types 1032 9.3. Message Codes 1034 This document introduces two new types of messages and their 1035 responses, requiring the following additions to the "RELOAD Message 1036 Code" Registry defined in RELOAD [RFC6940]. 
These additions are: 1038 +-------------------+------------+----------+ 1039 | Message Code Name | Code Value | RFC | 1040 +-------------------+------------+----------+ 1041 | path_track_req | 0x65 | RFC-AAAA | 1042 | path_track_ans | 0x66 | RFC-AAAA | 1043 +-------------------+------------+----------+ 1045 Table 2: Extensions to RELOAD Message Codes 1047 [To RFC editor: Values starting at 0x65 were used to prevent 1048 collisions with RELOAD base values and other extensions. Please 1049 replace with the next highest available values. The final message 1050 codes will be assigned by IANA. And all RFC-AAAA should be replaced 1051 with the RFC number of RELOAD when publication.] 1053 9.4. Error Code 1055 This document introduces the following new error codes, extending the 1056 "RELOAD Message Code" registry as described below: 1058 +----------------------------------------+------------+----------+ 1059 | Message Code Name | Code Value | RFC | 1060 +----------------------------------------+------------+----------+ 1061 | Error_Underlay_Destination_Unreachable | 0x65 | RFC-AAAA | 1062 | Error_Underlay_Time_Exceeded | 0x66 | RFC-AAAA | 1063 | Error_Message_Expired | 0x67 | RFC-AAAA | 1064 | Error_Upstream_Misrouting | 0x68 | RFC-AAAA | 1065 | Error_Loop_Detected | 0x69 | RFC-AAAA | 1066 | Error_TTL_Hops_Exceeded | 0x70 | RFC-AAAA | 1067 +----------------------------------------+------------+----------+ 1069 Table 3: Extensions to RELOAD Error Codes 1071 [To RFC editor: Values starting at 0x65 were used to prevent 1072 collisions with RELOAD base values and other extensions. Please 1073 replace with the next highest available values. The final message 1074 codes will be assigned by IANA. And all RFC-AAAA should be replaced 1075 with the RFC number of RELOAD when publication.] 1077 9.5. 
Message Extension 1079 This document introduces the following new RELOAD extension code: 1081 +-----------------+------------+----------+ 1082 | Extension Name | Code Value | RFC | 1083 +-----------------+------------+----------+ 1084 | Diagnostic_Ping | 0x0002 | RFC-AAAA | 1085 +-----------------+------------+----------+ 1087 Table 4: New RELOAD Extension Code 1089 [To RFC editor: The value 0x0002 was used to prevent collisions with 1090 other extensions. Please replace with the next highest available 1091 value. The final codes will be assigned by IANA. And all RFC-AAAA 1092 should be replaced with the RFC number of RELOAD when publication.] 1094 9.6. XML Name Space Registration 1096 This document registers a URI for the config-diagnostics XML 1097 namespaces in the IETF XML registry defined in [RFC3688]. All the 1098 elements defined in this document belong to this namespace. 1100 URI: urn:ietf:params:xml:ns:p2p:config-diagnostics 1101 Registrant Contact: The IESG. 1102 XML: N/A, the requested URIs are XML namespaces 1104 And the overlay configuration file MUST contain the following xml 1105 language declaring P2PSIP diagnostics as a mandatory extension to 1106 RELOAD. 1108 1109 urn:ietf:params:xml:ns:p2p:config-diagnostics 1110 1112 10. Acknowledgments 1114 We would like to thank Zheng Hewen for the contribution of the 1115 initial version of this document. We would also like to thank Bruce 1116 Lowekamp, Salman Baset, Henning Schulzrinne, Jiang Haifeng and Marc 1117 Petit-Huguenin for the email discussion and their valued comments, 1118 and special thanks to Henry Sinnreich for contributing to the usage 1119 scenarios text. We would like to thank the authors of the RELOAD 1120 protocol for transferring text about diagnostics to this document. 1122 11. References 1124 11.1. Normative References 1126 [RFC0792] Postel, J., "Internet Control Message Protocol", STD 5, 1127 RFC 792, September 1981. 
1129 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1130 Requirement Levels", BCP 14, RFC 2119, March 1997. 1132 [RFC3688] Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688, 1133 January 2004. 1135 [RFC5905] Mills, D., Martin, J., Burbank, J., and W. Kasch, "Network 1136 Time Protocol Version 4: Protocol and Algorithms 1137 Specification", RFC 5905, June 2010. 1139 [RFC6940] Jennings, C., Lowekamp, B., Rescorla, E., Baset, S., and 1140 H. Schulzrinne, "REsource LOcation And Discovery (RELOAD) 1141 Base Protocol", RFC 6940, January 2014. 1143 11.2. Informative References 1145 [UnixTime] 1146 "UnixTime", .>. 1149 [I-D.ietf-p2psip-self-tuning] 1150 Maenpaa, J. and G. Camarillo, "Self-tuning Distributed 1151 Hash Table (DHT) for REsource LOcation And Discovery 1152 (RELOAD)", draft-ietf-p2psip-self-tuning-15 (work in 1153 progress), June 2014. 1155 [I-D.ietf-p2psip-concepts] 1156 Bryan, D., Matthews, P., Shim, E., Willis, D., and S. 1157 Dawkins, "Concepts and Terminology for Peer to Peer SIP", 1158 draft-ietf-p2psip-concepts-06 (work in progress), June 1159 2014. 1161 [Overlay-Failure-Detection] 1162 Zhuang, S., "On failure detection algorithms in overlay 1163 networks", Proc. IEEE Infocomm, Mar 2005. 1165 [Handling_Churn_in_a_DHT] 1166 Rhea, S., "Handling Churn in a DHT", USENIX Annual 1167 Conference, June 2004. 1169 [Diagnostic_Framework] 1170 Jin, X., "A Diagnostic Framework for Peer-to-Peer 1171 Streaming", 2005. 1173 [Diagnostics_and_NAT_traversal_in_P2PP] 1174 Gupta, G., "Diagnostics and NAT Traversal in P2PP - Design 1175 and Implementation", Columbia University Report , June 1176 2008. 1178 [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an 1179 IANA Considerations Section in RFCs", BCP 26, RFC 5226, 1180 May 2008. 1182 Appendix A. Examples 1184 Below, we sketch how these metrics can be used. 1186 A.1. Example 1 1188 A peer may set EWMA_BYTES_SENT and EWMA_BYTES_RCVD flags in the 1189 PathTrackReq to its direct neighbors. 
A peer can use EWMA_BYTES_SENT 1190 and EWMA_BYTES_RCVD of another peer to infer whether it is acting as 1191 a media relay. It may then choose not to forward any requests for 1192 media relay to this peer. Similarly, among the various candidates 1193 for filling up routing table, a peer may prefer a peer with a large 1194 UPTIME value, small RTT, and small LAST_CONTACT value. 1196 A.2. Example 2 1198 A peer may set the STATUS_INFO Flag in the PathTrackReq to a remote 1199 destination peer. The overlay has its own threshold definition for 1200 congestion. The peer can obtain knowledge of all the status 1201 information of the intermediate peers along the path. Then it can 1202 choose other paths to that node for the subsequent requests. 1204 A.3. Example 3 1206 A peer may use Ping to evaluate the average overlay hops to other 1207 peers by sending PingReq to a set of random resource or node IDs in 1208 the overlay. A peer may adjust its timeout value according to the 1209 change of average overlay hops. 1211 Appendix B. Problems with Generating Multiple Responses on Path 1213 An earlier version of this document considered an approach where a 1214 response was generated by each intermediate peer as the message 1215 traversed the overlay. This approach was discarded. One reason this 1216 approach was discarded was that it could provide a DoS mechanism, 1217 whereby an attacker could send an arbitrary message claiming to be 1218 from a spoofed "sender" the real sender wished to attack. As a 1219 result of sending this one message, many messages would be generated 1220 and sent back to the spoofed "sender" - one from each intermediate 1221 peer on the message path. While authentication mechanisms could 1222 reduce some risk of this attack, it still resulted in a fundamental 1223 break from the request-response nature of the RELOAD protocol, as 1224 multiple responses are generated to a single request. 
Although one 1225 request with responses from all the peers in the route would be more 1226 efficient, it was determined to be too great a security risk and 1227 deviation from the RELOAD architecture. 1229 Appendix C. Changes to the Draft 1231 To RFC editor: This section is to track the changes. Please remove 1232 this section before publication. 1234 C.1. Changes since -00 version 1236 1. Changed title from "Diagnose P2PSIP Overlay Network" to "P2PSIP 1237 Overlay Diagnostics". 1239 2. Changed the table of contents. Added a section about message 1240 processing and a section of examples. 1242 3. Merged diagnostics text from the p2psip base draft -01. 1244 4. Removed ECHO method for security reasons. 1246 C.2. Changes since -01 version 1248 Added BATTERY_STATUS as diagnostic information. 1250 Removed UnderlayTTL test from the Ping method, instead adding an 1251 UNDERLAY_HOP diagnostic information for PathTrack method. 1253 Gave some examples for diagnostic information, and gave some 1254 editor's notes for further work. 1256 C.3. Changes since -02 version 1258 Provided further explanation as to why the base draft Ping in the 1259 current form cannot be used to replace Ping, and why some combination 1260 of methods cannot replace PathTrack. 1262 C.4. Changes since -03 version 1264 Modified structure used to share information collected. Both 1265 mechanisms now use a common data structure to convey information. 1267 C.5. Changes since -04 version 1269 Updated the authors' addresses and modified the last sentence in . 1270 (Section 4.3.1.2) 1272 C.6. Changes since -05 version 1274 Resolved Marc's comments from the mailing list. Defined the 1275 details of STATUS_INFO. 1277 C.7. Changes in version -10 1279 Resolved the authorization issue and other comments (e.g. define 1280 diagnostics as a mandatory extension) from WGLC. Checked the 1281 language. 1283 C.8. Changes in version -15 1285 Changed several diagnostic kind return values to be 64 bit vs.
32 bit 1286 to provide headroom. Split bandwidth into upstream and downstream. 1287 Renamed length in diagnostic request object to ext_length, added 1288 ext_length to response object, and clarified that ext_length is 1289 length of diagnostic info/extensions being returned, not the length 1290 of the object. 1292 Aligned many flags/values with RELOAD by using hex vs decimal values. 1294 Significant reorganization and edit for readability. 1296 Authors' Addresses 1298 Haibin Song 1299 Huawei 1301 Email: haibin.song@huawei.com 1303 Jiang Xingfeng 1304 Huawei 1306 Email: jiangxignfeng@huawei.com 1308 Roni Even 1309 Huawei 1310 14 David Hamelech 1311 Tel Aviv 64953 1312 Israel 1314 Email: roni.even@mail101.huawei.com 1315 David A. Bryan 1316 Ethernot.org 1317 Cedar Park, Texas 1318 United States of America 1320 Email: dbryan@ethernot.org 1322 Yi Sun 1323 ICT 1325 Email: sunyi@ict.ac.cn