idnits 2.17.1 draft-bierman-core-yang-hash-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- No issues found here. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year == Line 298 has weird spacing: '...newhash uin...' -- The document date (February 10, 2016) is 2998 days in the past. Is this intentional? Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) ** Obsolete normative reference: RFC 7049 (Obsoleted by RFC 8949) == Outdated reference: A later version (-18) exists of draft-ietf-netconf-restconf-07 == Outdated reference: A later version (-11) exists of draft-vanderstok-core-comi-08 Summary: 1 error (**), 0 flaws (~~), 4 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 core A. Bierman 3 Internet-Draft YumaWorks 4 Intended status: Standards Track P. van der Stok 5 Expires: August 13, 2016 consultant 6 February 10, 2016 8 YANG Hash 9 draft-bierman-core-yang-hash-00 11 Abstract 13 This document describes an encoding of YANG names to 30 bit hashes. 14 This document extends the CoMI draft to reduce payload and URI of 15 CoMI network requests. The technique can be applied to other YANG 16 based applications to reduce the payload by replacing the YANG names 17 with 30 bit numbers. 19 Note 21 Discussion and suggestions for improvement are requested, and should 22 be sent to core@ietf.org. 24 Status of This Memo 26 This Internet-Draft is submitted in full conformance with the 27 provisions of BCP 78 and BCP 79. 29 Internet-Drafts are working documents of the Internet Engineering 30 Task Force (IETF). Note that other groups may also distribute 31 working documents as Internet-Drafts. The list of current Internet- 32 Drafts is at http://datatracker.ietf.org/drafts/current/. 34 Internet-Drafts are draft documents valid for a maximum of six months 35 and may be updated, replaced, or obsoleted by other documents at any 36 time. It is inappropriate to use Internet-Drafts as reference 37 material or to cite them other than as "work in progress." 39 This Internet-Draft will expire on August 13, 2016. 41 Copyright Notice 43 Copyright (c) 2016 IETF Trust and the persons identified as the 44 document authors. All rights reserved. 46 This document is subject to BCP 78 and the IETF Trust's Legal 47 Provisions Relating to IETF Documents 48 (http://trustee.ietf.org/license-info) in effect on the date of 49 publication of this document. Please review these documents 50 carefully, as they describe your rights and restrictions with respect 51 to this document. Code Components extracted from this document must 52 include Simplified BSD License text as described in Section 4.e of 53 the Trust Legal Provisions and are provided without warranty as 54 described in the Simplified BSD License. 56 Table of Contents 58 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 59 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3 60 1.1.1. Tree Diagrams . . . . . . . . . . . . . . . . . . . . 4 61 2. YANG Hash Generation . . . . . . . . . . . . . . . . . . . . 4 62 3. Re-Hash Error Procedure . . . . . . . . . . . . . . . . . . . 5 63 4. Re-Hashing Names Procedure . . . . . . . . . . . . . . . . . 6 64 5. ietf-yang-hash YANG Module . . . . . . . . . . . . . . . . . 7 65 6. YANG Re-Hash Examples . . . . . . . . . . . . . . . . . . . . 9 66 6.1. Multiple Modules . . . . . . . . . . . . . . . . . . . . 11 67 6.2. Same Module . . . . . . . . . . . . . . . . . . . . . . . 11 68 7. Retrieval of Rehashed Data . . . . . . . . . . . . . . . . . 12 69 8. YANG Hash representations . . . . . . . . . . . . . . . . . . 13 70 8.1. YANG Hash in payload . . . . . . . . . . . . . . . . . . 13 71 8.2. YANG Hash in URL . . . . . . . . . . . . . . . . . . . . 13 72 9. YANG Hash Examples . . . . . . . . . . . . . . . . . . . . . 14 73 10. Security Considerations . . . . . . . . . . . . . . . . . . . 16 74 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16 75 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 16 76 13. Changelog . . . . . . . . . . . . . . . . . . . . . . . . . . 17 77 14. References . . . . . . . . . . . . . . . . . . . . . . . . . 17 78 14.1. Normative References . . . . . . . . . . . . . . . . . . 17 79 14.2. Informative References . . . . . . . . . . . . . . . . . 18 80 Appendix A. Hash clash probability . . . . . . . . . . . . . . . 18 81 Appendix B. Hash clash storage overhead . . . . . . . . . . . . 21 82 B.1. Server tables . . . . . . . . . . . . . . . . . . . . . . 21 83 B.2. Client tables . . . . . . . . . . . . . . . . . . . . . . 22 84 B.3. Table summary . . . . . . . . . . . . . . . . . . . . . . 22 85 Appendix C. payload reduction . . . . . . . . . . . . . . . . . 23 86 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 27 88 1. Introduction 90 The Constrained Application Protocol (CoAP) [RFC7252] is designed for 91 Machine to Machine (M2M) applications such as smart energy and 92 building control. Constrained devices need to be managed in an 93 automatic fashion to handle the large quantities of devices that are 94 expected in future installations. The messages between devices need 95 to be as small and infrequent as possible. The implementation 96 complexity and runtime resources need to be as small as possible. 98 The drafts [I-D.ietf-netconf-restconf] and [I-D.vanderstok-core-comi] 99 describe REST-like interfaces to access structured data defined in 100 YANG [RFC6020]. 102 The payload format CBOR [RFC7049] can be used to reduce the size of 103 the transported payload. In that case the size of the payload 104 depends for a large part on the YANG names. Reducing the names 105 significantly reduces the payload size further (see Appendix C). 106 This draft proposes a hashing technique to encode the YANG names into 107 30 bit numbers. 109 1.1. Terminology 111 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 112 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 113 document are to be interpreted as described in [RFC2119]. 115 Readers of this specification should be familiar with all the terms 116 and concepts discussed in [RFC2578]. 118 The following terms are defined in the NETCONF protocol [RFC6241]: 119 client, configuration data, data-store, and server. 121 The following terms are defined in the YANG data modelling language 122 [RFC6020]: container, data node, key, key leaf, leaf, leaf-list, and 123 list. 125 The following terms are defined in RESTCONF protocol 126 [I-D.ietf-netconf-restconf]: data resource, data-store resource, edit 127 operation, query parameter, target resource, and unified data-store. 129 The following terms are defined in this document: 131 YANG hash: CoMI object identifier, which is a 30-bit numeric hash of 132 the YANG object identifier string for the object. 134 Rehash bit: Bit 31. If a particular YANG hash value is a re-hash 135 for an identifier, then the rehash bit will be set in the object 136 identifier. This allows the server to return descendant nodes 137 that have been rehashed, instead of returning an error for an 138 entire GET request. 140 Data-node instance: An instance of a data-node specified in a YANG 141 module present in the server. The instance is stored in the 142 memory of the server. 144 Notification-node instance: An instance of a schema node of type 145 notification, specified in a YANG module present in the server. 147 The instance is generated in the server at the occurrence of the 148 corresponding event and appended to a stream. 150 1.1.1. Tree Diagrams 152 A simplified graphical representation of the data model is used in 153 the YANG modules specified in this document. The meaning of the 154 symbols in these diagrams is as follows: 156 Brackets "[" and "]" enclose list keys. 158 Abbreviations before data node names: "rw" means configuration 159 data (read-write) and "ro" state data (read-only). 161 Symbols after data node names: "?" means an optional node, "!" 162 means a presence container, and "*" denotes a list and leaf-list. 164 Parentheses enclose choice and case nodes, and case nodes are also 165 marked with a colon (":"). 167 Ellipsis ("...") stands for contents of subtrees that are not 168 shown. 170 2. YANG Hash Generation 172 The association between string value and string number is done 173 through a hash algorithm. The 30 least significant bits of the 174 "murmur3" 32-bit hash algorithm are used. This hash algorithm is 175 described online at [murmur3]. Implementations are available online 176 [murmur-imp]. When converting 4 input bytes to a 32-bit integer in 177 the hash algorithm, the Little-Endian convention MUST be used. 179 The "murmur3_32" hash function is executed for the entire path 180 string. The value '42' is used as the seed for the hash function. 181 The YANG hash is subsequently calculated by taking the 30 least 182 significant bits. 184 The resulting 30-bit number is used by the server, unless the value 185 is already being used for a different object by the server. In this 186 case, the re-hash procedure in Section 3 is executed. 188 The hash is generated for the string representing the object path 189 identifier. A canonical representation of the path identifier is 190 used. 192 The module name is used to identify the namespace of the object 193 node. The prefix cannot be used because it is allowed to change 194 over time. The module name is never allowed to change. 196 The module name MUST be present in the identifier for the first 197 node in the object path identifier. 199 If a child node in the object path identifier is from the same 200 module namespace as its parent, then the module-name MUST NOT be 201 used in the identifier. 203 If a child node in the object path identifier is not from the same 204 module namespace as its parent, then the module-name MUST be used 205 in the identifier. 207 Choice and case node names are not included in the path 208 expression. Only 'container', 'list', 'leaf', 'leaf-list', and 209 'anyxml' nodes are listed in the path expression. 211 The YANG Hash value is calculated for all data nodes in the 212 module, even if the server only implements a subset of these 213 objects. This includes all "data-def", "rpc", "notification", and 214 external data nodes derived from "augment" statements. 216 Example: the following canonical identifier is used for the 'mtu' 217 leaf in the ietf-interfaces module: 219 /ietf-interfaces:interfaces/interface/mtu 221 Example: the following canonical identifier is used for the 'ipv4' 222 container in the ietf-ip module, which augments the 'interface' list 223 in the ietf-interfaces module: 225 /ietf-interfaces:interfaces/interface/ietf-ip:ipv4 227 3. Re-Hash Error Procedure 229 In most cases, the hash function is expected to produce unique values 230 for all the node names supported by a constrained server. Given a 231 known set of YANG modules, both server and client can calculate the 232 YANG hashes independently, and offline. 234 Even though collisions are expected to happen rather rarely, they 235 need to be considered (see Appendix A for clash probabilities). 236 Collisions can be detected before deployment, if the vendor knows 237 which modules are supported by the server, and hence all YANG hashes 238 can be calculated. Collisions occur at a given server dependent on 239 the set of modules supported by the server. The client needs to 240 discover any re-hash mappings on a per server basis. 242 If the server needs to re-hash any YANG name, then it MUST create a 243 "rehash" entry for all its rehashed node names, as described in 244 Section 4. 246 A re-hashed object identifier has the rehash bit set in the 247 identifier, every time it is sent from the server to the client. 248 This allows the client to identify nodes for which a "reverse rehash" 249 entry may need to be retrieved (see Section 6). A client does not 250 need to retrieve the rehash map before retrieving or configuring 251 rehashed data nodes. 253 If any node identifier provided by the client is not available 254 because it has been rehashed, the server MUST return a rehash error, 255 containing the 'rehash' entries for all the invalid nodes which were 256 specified by the client. 258 It is possible that none of the node identifiers provided by the 259 client in a GET method are invalid and rehashed, but rather one or 260 more descendant nodes within the selected subtree(s) have been 261 rehashed. In this case, a rehash error is not returned. Instead the 262 requested subtree(s) are returned, and the rehash bit is set for any 263 descendant node(s) that have been rehashed. The client will strip 264 off the rehash bit and retrieve the 'revhash' entry for these nodes 265 (if not already done). 267 4. Re-Hashing Names Procedure 269 A hash collision occurs if two different path identifier strings have 270 the same hash value. If the server has around 1000 node names in its 271 YANG modules, then the probability of a collision is a half per mil 272 (see Appendix A). If a hash collision occurs on the server, then the 273 node name that is causing the conflict has to be altered, such that 274 the new hash value does not conflict with any value already in use by 275 the server. 277 For example, rehashing could be done by prefixing a "~" character in 278 front of the clashing name and execute murmur3 on the thus modified 279 name. If necessary the prefixing can be done multiple times until 280 the clashes are resolved. Using the prefixing is not needed from an 281 inter-operability point of view but provides a procedure for client 282 and server to calculate the rehash without reading the "rehash" 283 entry. 285 5. ietf-yang-hash YANG Module 287 The "ietf-yang-hash" YANG module is used by the server to report any 288 objects that have been mapped to produce a new hash value that does 289 not conflict with any other YANG hash values used by the server. 291 YANG tree diagram for "ietf-yang-hash" module: 293 +--ro yang-hash 294 +--ro rehash* [hash] 295 +--ro hash uint32 296 +--ro object* 297 +--ro module string 298 +--ro newhash uint32 299 +--ro path? string 301 file "ietf-yang-hash@2016-02-10.yang" 303 module ietf-yang-hash { 304 namespace "urn:ietf:params:xml:ns:yang:ietf-yang-hash"; 305 prefix "yh"; 307 organization 308 "IETF CORE (Constrained RESTful Environments) Working Group"; 310 contact 311 "WG Web: 312 WG List: 314 WG Chair: Carsten Bormann 315 317 WG Chair: Andrew McGregor 318 320 Editor: Peter van der Stok 321 323 Editor: Andy Bierman 324 326 description 327 "This module contains re-hash information for the CoMI protocol. 329 Copyright (c) 2016 IETF Trust and the persons identified as 330 authors of the code. All rights reserved. 332 Redistribution and use in source and binary forms, with or 333 without modification, is permitted pursuant to, and subject 334 to the license terms contained in, the Simplified BSD License 335 set forth in Section 4.c of the IETF Trust's Legal Provisions 336 Relating to IETF Documents 337 (http://trustee.ietf.org/license-info). 339 This version of this YANG module is part of RFC XXXX; see 340 the RFC itself for full legal notices."; 342 // RFC Ed.: replace XXXX with actual RFC number and remove this 343 // note. 345 // RFC Ed.: remove this note 346 // Note: extracted from draft-bierman-core-yang-hash-00.txt 348 // RFC Ed.: update the date below with the date of RFC publication 349 // and remove this note. 350 revision 2016-02-10 { 351 description 352 "Initial revision."; 353 reference 354 "RFC XXXX: YANG Hash."; 355 } 357 container yang-hash { 358 config false; 359 description 360 "Contains information on the YANG Hash values used by 361 the server."; 363 list rehash { 364 key hash; 365 description 366 "Each entry describes an re-hash mapping in use by 367 the server."; 369 leaf hash { 370 type uint32; 371 description 372 "The hash value that has a collision. This hash value 373 cannot be used on the server. The rehashed 374 value for each affected object must be used instead."; 375 } 377 list object { 378 min-elements 2; 380 description 381 "Each entry identifies one of the objects involved in the 382 hash collision and contains the rehash information for 383 that object."; 385 leaf module { 386 type string; 387 mandatory true; 388 description 389 "The module name identifying the module namespace 390 for this object."; 391 } 393 leaf newhash { 394 type uint32; 395 mandatory true; 396 description 397 "The new hash value for this object. The rehash bit is 398 not set in this value."; 399 } 401 leaf path { 402 type string; 403 description 404 "The object path identifier string used in the original 405 YANG hash calculation. This object MUST be included for 406 any objects in the rehash entry with the same 'module' 407 value."; 408 } 409 } 410 } 411 } 413 } 415 417 6. YANG Re-Hash Examples 419 In this example there are two YANG modules, "foo" and "bar". 421 module foo { 422 namespace "http://example.com/ns/foo"; 423 prefix "f"; 424 revision 2015-06-07; 426 container A { 427 list B { 428 key name; 429 leaf name { type string; } 430 leaf counter1 { type uint32; } 431 } 432 } 433 } 435 module bar { 436 namespace "http://example.com/ns/bar"; 437 prefix "b"; 438 import foo { prefix f; } 439 revision 2015-06-07; 441 augment /f:A/f:B { 442 leaf counter2 { type uint32; } 443 } 444 } 446 This set of 3 YANG modules containing a total of 7 objects produces 447 the following object list. Note that actual hash values are not 448 shown, since these modules do not actually cause the YANG Hash 449 clashes described in the examples. 451 Object Path Hash 453 foo: 455 container /foo:A h1 456 list /foo:A/B h2 457 leaf /foo:A/B/name h3 458 leaf /foo:A/B/counter1 h4 460 bar: 462 leaf /foo:A/B/bar1:counter2 h5 464 6.1. Multiple Modules 466 In this example, assume that the 'B' and 'counter2' objects produce 467 the same hash value, so 'h2' and 'h5' both have the same value (e.g. 468 '1234'): 470 The client might retrieve an entry from the list "/foo:A/B", which 471 would cause this subtree to be returned. Instead, the server will 472 return a message with the resource type "core.mg.yang-hash", 473 representing the "yang-hash" data structure. Only the entry for the 474 requested identifier is returned, even if multiple 'rehash' list 475 entries exist. 477 REQ: GET example.com/mg/h2?keys="entry2" 479 RES: 4.00 "Bad Request" (Content-Format: application/cbor) 480 { 481 "ietf-yang-hash:yang-hash" : { 482 "rehash" : [ 483 { 484 "hash" : 1234, 485 "object" : [ 486 { 487 "module" : "foo", 488 "newhash" : 5678 489 }, 490 { 491 "module" : "bar", 492 "newhash" : 8182 493 } 494 ] 495 } 496 ] 497 } 498 } 500 6.2. Same Module 502 In this example, assume that the 'B', 'counter1', and 'counter2' 503 objects produce the same hash value, so 'h2', 'h4', and 'h5' objects 504 all have the same value (e.g. '1234'): 506 The client might retrieve an entry from the list "/foo:A/B", which 507 would cause this subtree to be returned. Instead, the server will 508 return a message with the resource type "core.mg.yang-hash", 509 representing the "yang-hash" data structure. Only the entry for the 510 requested identifier is returned, even if multiple 'rehash' list 511 entries exist. 513 REQ: GET example.com/mg/h2?keys="entry2" 515 RES: 4.00 "Bad Request" (Content-Format: application/cbor) 516 { 517 "ietf-yang-hash:yang-hash" : { 518 "rehash" : [ 519 { 520 "hash" : 1234, 521 "object" : [ 522 { 523 "module" : "foo", 524 "newhash" : 5678, 525 "path" : "/foo:A/B" 527 }, 528 { 529 "module" : "foo", 530 "newhash" : 2134, 531 "path" : "/foo:A/B/counter1" 532 }, 533 { 534 "module" : "bar", 535 "newhash" : 8182, 536 "path" : "/foo:A/B/bar:counter2" 537 } 538 ] 539 } 540 ] 541 } 542 } 544 7. Retrieval of Rehashed Data 546 In this example, assume that the 'B', 'counter1', and 'counter2' 547 objects produce the same hash value, so 'h2', 'h4', and 'h5' objects 548 all have the same value (e.g. '1234'): 550 The client might retrieve the top-level container "/foo:A", which 551 would cause this subtree to be returned. Since the identifier (h1) 552 has not been re-hashed, the server will return the requested data. 553 The new hashes for 'h2', 'h4',and 'h5' will be returned, except the 554 rehash bit will be set for these identifiers. 556 The notation "R+" indicates that the rehash bit is set. 558 REQ: GET example.com/mg/h1 560 RES: 2.05 Content (Content-Format: application/cbor) 561 { 562 h1 : { 563 R+5678 : { 564 { h3 : "entry1"}: 565 {R+2134: 615, 566 R+8182: 7}, 567 { h3 : "entry2"}: 568 {R+2134: 491, 569 R+8182: 26} 570 } 571 } 572 } 574 The client will notice that the rehash bit is set for 3 nodes. The 575 client will need to retrieve the full "yang-hash" container at this 576 point, if that has not already been done. The rehashed identifiers 577 will be in "rehash" list, contained in the "newhash" leaf for the 578 "object" list. 580 8. YANG Hash representations 582 YANG hashes are represented in two fashions. 584 8.1. YANG Hash in payload 586 When a YANG hash value is printed in the payload, error-path or other 587 string, then the lowercase hexadecimal representation is used. 588 Leading zeros are used so the value uses 8 hex characters. 590 8.2. YANG Hash in URL 592 When a URL contains a YANG hash, it is encoded using base64url "URL 593 and Filename safe" encoding as specified in [RFC4648]. 595 The hash H is represented as a 30-bit integer, divided into five 596 6-bit integers as follows: 598 B1 = (H & 0x3f000000) >> 24 599 B2 = (H & 0xfc0000) >> 18 600 B3 = (H & 0x03f000) >> 12 601 B4 = (H & 0x000fc0) >> 6 602 B5 = H & 0x00003f 603 Subsequently, each 6-bit integer Bx is translated into a character Cx 604 using Table 2 from [RFC4648], and a string is formed by concatenating 605 the characters in the order C1, C2, C3, C4, C5. 607 For example, the YANG hash 0x29abdcca is encoded as "pq9zK". 609 9. YANG Hash Examples 611 The YANG hash value for 'current-datetime' is calculated by 612 constructing the schema node identifier for the object: 614 /ietf-system:system-state/clock/current-datetime 616 The 30 bit murmur3 hash value (see Section 2) is calculated on this 617 string with hash: 0x047c468b and EfEaM. The request using this hash 618 value is shown below: 620 REQ: GET example.com/mg/EfEaM 622 RES: 2.05 Content (Content-Format: application/cbor) 623 { 624 0x047c468b : "2014-10-26T12:16:31Z" 625 } 627 The YANG hash values for 'clock', 'current-datetime', and 'boot- 628 datetime' are calculated by constructing the schema node identifier 629 for the objects, and then calculating the 30 bit murmur3 hash values 630 (shown in parenthesis): 632 /ietf-system:system-state/clock (0x021ca491 and CDKSQ) 633 /ietf-system:system-state/clock/current-datetime (0x047c468b) 634 /ietf-system:system-state/clock/boot-datetime (0x1fb5f4f8) 636 The YANG hash values for 'neighbor', 'ip', and 'link-layer-address' 637 are calculated by constructing the schema node identifier for the 638 objects, and then calculating the 30 bit murmur3 hash values (shown 639 in parenthesis): 641 /ietf-interfaces:interfaces/interface/ietf-ip:ipv6/neighbor 642 (0x2445e478 and kReR4) 643 /ietf-interfaces:interfaces/interface/ietf-ip:ipv6/neighbor/ip 644 (0x2283ed40 and ig-la) 645 /ietf-interfaces:interfaces/interface/ietf-ip:ipv6/neighbor/ 646 link-layer-address (0x3d6915c7) 648 The YANG translation of the SMI specifying the ipNetToMediaTable 649 [RFC4293] yields: 651 container IP-MIB { 652 container ipNetToPhysicalTable { 653 list ipNetToPhysicalEntry { 654 key "ipNetToPhysicalIfIndex 655 ipNetToPhysicalNetAddressType 656 ipNetToPhysicalNetAddress"; 657 leaf ipNetToMediaIfIndex { 658 type: int32; 659 } 660 leaf ipNetToPhysicalIfIndex { 661 type if-mib:InterfaceIndex; 662 } 663 leaf ipNetToPhysicalNetAddressType { 664 type inet-address:InetAddressType; 665 } 666 leaf ipNetToPhysicalNetAddress { 667 type inet-address:InetAddress; 668 } 669 leaf ipNetToPhysicalPhysAddress { 670 type yang:phys-address { 671 length "0..65535"; 672 } 673 } 674 leaf ipNetToPhysicalLastUpdated { 675 type yang:timestamp; 676 } 677 leaf ipNetToPhysicalType { 678 type enumeration { ... } 679 } 680 leaf ipNetToPhysicalState { 681 type enumeration { ... } 682 } 683 leaf ipNetToPhysicalRowStatus { 684 type snmpv2-tc:RowStatus; 685 } 686 } 687 } 689 The YANG hash values for 'ipNetToPhysicalEntry' and its child nodes 690 are calculated by constructing the schema node identifier for the 691 objects, and then calculating the 30 bit murmur3 hash values (shown 692 in parenthesis): 694 /IP-MIB:IP-MIB/ipNetToPhysicalTable (0x0aba15cc and kuhXM) 695 /IP-MIB:IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry 696 (0xo6aaddbc and Gqt28) 697 /IP-MIB:IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry/ 698 ipNetToPhysicalIfIndex (0x346b3071) 700 /IP-MIB:IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry/ 701 ipNetToPhysicalNetAddressType (0x3650bb64) 702 /IP-MIB:IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry/ 703 ipNetToPhysicalNetAddress (0x06fd4d91) 704 /IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry/ 705 ipNetToPhysicalPhysAddress (0x26180bcb) 706 /IP-MIB:IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry/ 707 ipNetToPhysicalLastUpdated (0x3d6bbe90) 708 /IP-MIB:IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry/ 709 ipNetToPhysicalType (0x35ecbb3d) 710 /IP-MIB:IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry/ 711 ipNetToPhysicalState (0x13038bb5) 712 /IP-MIB:IP-MIB/ipNetToPhysicalTable/ipNetToPhysicalEntry/ 713 ipNetToPhysicalRowStatus (0x09e1fa37) 715 The YANG Hash values for the YANG Patch request objects are 716 calculated as follows: 718 0x2c3f93c7: /ietf-yang-patch:yang-patch 719 0x2fb8873e: /ietf-yang-patch:yang-patch/patch-id 720 0x011640f0: /ietf-yang-patch:yang-patch/comment 721 0x16804b72: /ietf-yang-patch:yang-patch/edit 722 0x2bd93228: /ietf-yang-patch:yang-patch/edit/edit-id 723 0x1959d8c9: /ietf-yang-patch:yang-patch/edit/operation 724 0x1346e0aa: /ietf-yang-patch:yang-patch/edit/target 725 0x0750e196: /ietf-yang-patch:yang-patch/edit/point 726 0x0b45277e: /ietf-yang-patch:yang-patch/edit/where 727 0x2822c407: /ietf-yang-patch:yang-patch/edit/value 729 10. Security Considerations 731 The replacement of name-strings by numbers does not affect the 732 security of the transmitted requests. 734 11. IANA Considerations 736 No considerations for IANA apply. 738 12. Acknowledgements 740 We are very grateful to Bert Greevenbosch who suggested the use of 741 hashes and specified the use of murmur3. Many thanks for their 742 contributions go to Alexander Pelov, Juergen Schonwalder, Anuj Sehgal 743 and Michel Veillette. 745 This material is based upon work supported by the The Space & 746 Terrestrial Communications Directorate (S&TCD) under Contract No. 747 W15P7T-13-C-A616. Any opinions, findings and conclusions or 748 recommendations expressed in this material are those of the author(s) 749 and do not necessarily reflect the views of The Space & Terrestrial 750 Communications Directorate (S&TCD) . 752 13. Changelog 754 Version 0 is extracted from comi draft version 8 755 [I-D.vanderstok-core-comi]. Changed Appendix A, and added 756 Appendix C. 758 14. References 760 14.1. Normative References 762 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 763 Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/ 764 RFC2119, March 1997, 765 . 767 [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data 768 Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006, 769 . 771 [RFC6020] Bjorklund, M., Ed., "YANG - A Data Modeling Language for 772 the Network Configuration Protocol (NETCONF)", RFC 6020, 773 DOI 10.17487/RFC6020, October 2010, 774 . 776 [RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object 777 Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049, 778 October 2013, . 780 [RFC7252] Shelby, Z., Hartke, K., and C. Bormann, "The Constrained 781 Application Protocol (CoAP)", RFC 7252, DOI 10.17487/ 782 RFC7252, June 2014, 783 . 785 [murmur3] , "murmurhash family", Web 786 http://en.wikipedia.org/wiki/MurmurHash, . 788 [murmur-imp] 789 , "murmurhash implementation", Web https://code.google.com 790 /p/smhasher/, . 792 14.2. Informative References 794 [RFC2578] McCloghrie, K., Ed., Perkins, D., Ed., and J. 795 Schoenwaelder, Ed., "Structure of Management Information 796 Version 2 (SMIv2)", STD 58, RFC 2578, DOI 10.17487/ 797 RFC2578, April 1999, 798 . 800 [RFC4293] Routhier, S., Ed., "Management Information Base for the 801 Internet Protocol (IP)", RFC 4293, DOI 10.17487/RFC4293, 802 April 2006, . 804 [RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., 805 and A. Bierman, Ed., "Network Configuration Protocol 806 (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011, 807 . 809 [I-D.ietf-netconf-restconf] 810 Bierman, A., Bjorklund, M., and K. Watsen, "RESTCONF 811 Protocol", draft-ietf-netconf-restconf-07 (work in 812 progress), July 2015. 814 [I-D.vanderstok-core-comi] 815 Stok, P., Bierman, A., Schoenwaelder, J., and A. Sehgal, 816 "CoAP Management Interface", draft-vanderstok-core-comi-08 817 (work in progress), October 2015. 819 [coll-prob] 820 Preshing, j., "Hash collision probabilities", Web 821 http://preshing.com/20110504/hash-collision-probabilities, 822 May 2011. 824 [birthday] 825 Wikipedia, , "Birthday problem", Web https:// 826 en.wikipedia.org/wiki/Birthday_problem, . 828 Appendix A. Hash clash probability 830 +--------+---------+---------+---------+---------+---------+--------+ 831 | Number | 28 bits | 29 bits | 30 bits | 31 bits | 32 bits | 33 | 832 | of | | | | | | bits | 833 | names | | | | | | | 834 +--------+---------+---------+---------+---------+---------+--------+ 835 | 10 | 1,7E-07 | 8,4E-08 | 4,2E-08 | 2,1E-08 | 1,1E-08 | 5,2E-0 | 836 | | | | | | | 9 | 837 | | | | | | | | 838 | 100 | 1,8E-05 | 9,2E-06 | 4,6E-06 | 2,3E-06 | 1,2E-06 | 5,8E-0 | 839 | | | | | | | 7 | 840 | | | | | | | | 841 | 200 | 7,4E-05 | 3,7E-05 | 1,9E-05 | 9,3E-06 | 4,6E-06 | 2,3E-0 | 842 | | | | | | | 6 | 843 | | | | | | | | 844 | 10^3 | 1,9E-03 | 9,3E-04 | 4,7E-04 | 2,3E-04 | 1,2E-04 | 5,8E-0 | 845 | | | | | | | 5 | 846 | | | | | | | | 847 | 4000 | 3,0E-02 | 1,5E-02 | 7,5E-03 | 3,7E-03 | 1,9E-03 | 9,3E-0 | 848 | | | | | | | 4 | 849 | | | | | | | | 850 | 10^4 | 1,9E-01 | 9,3E-02 | 4,6E-02 | 2,3E-02 | 1,2E-02 | 5,8E-0 | 851 | | | | | | | 3 | 852 +--------+---------+---------+---------+---------+---------+--------+ 854 Table 1: Probability of one or more clashes 856 This appendix calculates the probability of a hash clash as function 857 of the hash size and the number of YANG names. The standard way to 858 calculate the probability of a clash is to calculate the probability 859 that no clashes occur [birthday], [coll-prob]. 861 The probability of no clashes when generating k numbers with a hash 862 size of N=2^bits is given by: 864 D(N,k) = ((N-1)/N)*((N-2)/N)*....(N-(k-1))/N (1) 866 which can be approximated with: 868 exp(-k*(k-1)/2N) (2) 870 The probability that one or more clashes occur is given by: 872 1 - exp(-k*(k-1)/2N) ~ k*(k-1)/2N (3) 874 Table 1 shows the probabilities for a given set of values of N=2^bits 875 and number of YANG node names k. Probabilities which are larger than 876 0.5 are not shown because the used approximations are not accurate 877 any more. 879 The overhead in servers and clients depends on the number of clashes. 880 Therefore it is interesting to know the probability that more than 881 one clash occurs. The probability of generating k numbers with a 882 hash size of N=2^bits, where 2 numbers are identical and all the rest 883 is different, is composed of the following parts. The probability 884 that the second number is equal to the first is 1/N. The possible 885 number of configurations of 2 equal numbers out of k is given by 886 SUM_i=1,k-1 (i). The probability of k-1 different numbers is given 887 by D(N,k-1). The probability of generating exactly one clash of two 888 numbers is given by: 890 (SUM_1,k-1 (i))*D(N,k-1)/N 892 Where we used formula (1). Working out the summation and using (2), 893 the probability that exactly one pair of hashes clashes is given by: 895 (k*(k-1)/2N)*exp(-(k-1)*(k-2)/2N) 897 The probability that more than one pair clashes is given by the 898 probability that a clash occurs minus the probability that only one 899 pair clashes. This leads to: 901 1 - exp(-k*(k-1)/2N) - (k*(k-1)/2N)*exp(-(k-1)*(k-2)/2N) 903 Substituting formula 3, gives: 905 k*(k-1)/2N - k*(k-1)/2N + (k*(k-1)^2*(k-2)/4N^2 = 907 (k*(k-1)^2*(k-2)/4N^2 909 +--------+---------+---------+---------+---------+---------+--------+ 910 | Number | 28 bits | 29 bits | 30 bits | 31 bits | 32 bits | 33 | 911 | of | | | | | | bits | 912 | names | | | | | | | 913 +--------+---------+---------+---------+---------+---------+--------+ 914 | 10 | 2,3E-14 | 5,6E-15 | 1,4E-15 | 3,5E-16 | 8,8E-17 | 2,2E-1 | 915 | | | | | | | 7 | 916 | | | | | | | | 917 | 100 | 3,3E-10 | 8,3E-11 | 2,1E-11 | 5,2E-12 | 1,3E-12 | 3,3E-1 | 918 | | | | | | | 3 | 919 | | | | | | | | 920 | 200 | 5,4E-09 | 1,4E-09 | 3,4E-10 | 8,5E-1 | 2,1E-11 | 5,3E-1 | 921 | | | | | | | 2 | 922 | | | | | | | | 923 | 10^3 | 3,5E-06 | 8,6E-07 | 2,2E-07 | 5,4E-08 | 1,4E-08 | 3,4E-0 | 924 | | | | | | | 9 | 925 | | | | | | | | 926 | 4000 | 8,9E-04 | 2,2E-04 | 5,6E-05 | 1,4E-05 | 3,5E-06 | 8,7E-0 | 927 | | | | | | | 7 | 928 | | | | | | | | 929 | 10^4 | 3,5E-02 | 8,7E-03 | 2,2E-03 | 5,4E-04 | 1,4E-04 | 3,4E-0 | 930 | | | | | | | 5 | 931 +--------+---------+---------+---------+---------+---------+--------+ 933 Table 2: Probability of more than 2 entries equal clashes 935 The corresponding probabilities are shown in Table 2. Assuming a 936 hash size of 2^30, and about 1000 YANG nodes in a server, the 937 probability of one clashing pair is 0.5*10^-3, and the probability 938 that more clashes occur is 2*10^-7. 940 Appendix B. Hash clash storage overhead 942 Clashes may occur in servers dynamically during the operation of 943 their clients, and clashes must be handled on a per server basis in 944 the client. When rehashing is possible, clashing names on a given 945 server are prefixed with a character (for example "~") and are 946 rehashed, thus leading to hash values which uniquely identify the 947 data nodes in the server. This appendix calculates the storage space 948 needed when a clash occurs in a set of servers running the same 949 server code. Appendix A shows that more than one clash in a server 950 set is exceptional, which suggests at most two clashing object names 951 in a given server. 953 The sizes of server and client tables needed to handle the clashes in 954 client and server are calculated separately, because they differ 955 significantly. 957 B.1. Server tables 959 When a request arrives at the server, the server must relate the 960 incoming hash value to the memory locations where the related values 961 are stored. In the server a translation table must be provided that 962 relates a hash value to a memory address where either the raw data or 963 a description of the data (as prescribed by the YANG compiler) are 964 stored. The required storage space is a sequence of (32 bit yang 965 hash, 64 bit memory address) for every YANG data node. The 966 translation table size in a server is 12 bytes times the number of 967 YANG data nodes in the server. 969 For every clashing hash value the following server clash table 970 entries are needed: Clashed hash value, module name, and new hash. 971 To reduce table size in the client, module name can be replaced with 972 a 1 byte module identifier. The module identifier represents the 973 index value of an array of module names. Server clash table size is: 974 2 hashes (8 bytes) + 1 module identifier (1 byte) 976 B.2. Client tables 978 In the client, the compiled code must refer to a hash value. To cope 979 with on-the-fly rehashing, the compiled code needs to invoke a 980 procedure that returns the possibly rehashed value as function of the 981 original hash value, module name, and server address. The client 982 needs to store a client clash table containing: the clashed hash 983 value, module name, server IPv6 address (or name), and rehash value 984 for as many rehashes occurring in a given server. Many servers 985 contain an identical set of YANG modules. The servers containing the 986 same module set belong to the same server type. The server type is 987 used to administrate the hash clash occurrence. To reduce client 988 clash table size, module name can be replaced with a 1 byte module 989 identifier. The module identifier represents the index value of an 990 array of module names. A table of IPv6 server addresses must already 991 exist in the client. To reduce client clash table size further, the 992 server IPv6 address can be replaced with a 1 byte server type 993 identifier. The server table can be ordered according to server 994 type. A table with server type and pointer to sub-table start 995 suffices to find all IPv6 addresses belonging to a server type. 997 The client clash table reduces to clashed hash value (4 bytes), 998 module identifier (1 byte), server type identifier (1 byte) and 999 rehash value (4 bytes). 1001 B.3. Table summary 1003 Sizes of all the tables are: 1005 Server clash table: 9 bytes per clashing object name. 1007 Client clash table: 10 bytes per server type, per clashing object 1008 name. 1010 Array of module names: Sum of module name sizes. 1012 Server identifier table: 1 byte server type + 4 bytes pointer per 1013 server type. 1015 The existence of the translation table in a server is required 1016 independent of rehashing. The table sizes calculated to estimate the 1017 storage requirements coming from CoMI clashes. Assume the following 1018 numbers: 1020 o 500 data nodes per server 1022 o 10 server types 1023 o 30 modules 1025 o Module name is on average 20 bytes 1027 o Maximum of 2 clashing object names occurring in 2 server types 1029 This yields the following overhead estimates: 1031 Server tables size: 1033 * Server clash table: 2*9 bytes represents 18 bytes 1035 * Module name array: 30*20 represents 600 bytes 1037 Client table sizes: 1039 * Client clash table: 10*2*2 represents 40 bytes for 2 object 1040 names in 2 server types. 1042 * Module name array: 30*20 represents 600 bytes. 1044 * Server identifier table: 10*5 = 50 bytes 1046 In conclusion: 1048 1. Storage space size in client is independent of number of servers 1049 but depends on number of server types. 1051 2. There is a common storage size for the module array of 600 bytes. 1053 3. Assuming 2 clashing object names in 2 server types, additional 1054 storage space in client is 40 bytes and in server 18 bytes. 1056 4. When the module array is suppressed (removing 600 bytes storage 1057 space), the server clash table and the client clash table 1058 increase with 40 bytes and 80 bytes respectively. 1060 Appendix C. payload reduction 1062 The hashing of the YANG identifier in the transported payload reduces 1063 the size of the YANG objects transported in the payload. This note 1064 calculates the payload size reduction and the number of YANG objects 1065 that can be transported in a single 802.154 CoAP packet. The payload 1066 is assumed to be a sequence of maps, where the entry ("ident" : 1067 value) is referred to as a map, as shown below. The value can be an 1068 integer or a string. 1070 { 1071 "ident1" : value 1, 1072 "ident2" : value 2. 1073 Etc ..... 1074 } 1076 In general, we can assume that the payload size (SI) of a YANG 1077 identifier string lies between 6 to 128 bytes with an average of 1078 30-40 bytes. The payload size (SV) of a value can range between 1 1079 byte to 128 bytes. The transport format is CBOR. The overhead 1080 coming from CBOR is composed of: 1082 o One byte to indicate the number of maps ( > 2 maps in the example 1083 above). 1085 o Two bytes per map: one describing the identifier, and one 1086 describing the value. 1088 When the CBOR byte describes an integer (major type 0), the size of 1089 the following integer is 0, when integer < 24. Otherwise the size of 1090 the following integer is 1 to 8 bytes. The size of the string 1091 remains unchanged when preceded by a CBOR byte (major type 2). When 1092 the string size < 24 , no extra bytes are needed; when the string 1093 size lies between 24 and 128 one extra CBOR byte overhead is needed. 1094 The parameters SE, SV and SI are introduced with the following 1095 meaning: 1097 o SE is the size of the identifier encoded by a hash or other 1098 unsigned integer, with 0 < SE < 5; because the hash size is 4 1099 bytes and the minimum managed encoded identifier is assumed to 1100 take 1 byte. 1102 o SV is the size of the value, with 0 <= SV < 128. 1104 o SI is the size of the original YANG hash identifier string, with 4 1105 < SI < 64. 1107 The impact of the conversion from YANG identifier string to unsigned 1108 integer is straightforward to calculate per map. It is assumed that 1109 one CBOR byte or two CBOR bytes are needed dependent on the sizes SI 1110 and SV. A map size with the original YANG identifier string is given 1111 by: 1113 o Map size is: 2 + SI + SV, when SI, SV < 24; 1115 o Map size is: 3 + SI + SV, when SI < 24 and SV > 23, or SI > 23 and 1116 SV < 24; 1118 o Map size is: 4 + SI + SV, when SI, SV > 23. 1120 The map size when SI is converted to an unsigned integer with size SE 1121 is given by: 1123 o Encoded map size is: 2 + SE + SV, when SV < 24; 1125 o Encoded map size is: 3 + SE + SV, when SV > 23. 1127 The improvement of the conversion can be written as 1129 o (2+SE+SV)/(2+SI+SV) when SI, SV < 24; 1131 o (2+SE+SV)(3+SI+SV) when SI > 23 and SV < 24; 1133 o (3+SE+SV)/(3+SI+SV) when SI < 24, and SV > 23; 1135 o (3+SE+SV)/(4+SI+SV) when SI, SV > 23. 1137 Table 3 shows the size reduction for different SI and SV values and 1138 SE = 4: 1140 +-------+------+------+------+------+------+-------+------+ 1141 | SV-> | 0 | 4 | 10 | 20 | 30 | 40 | 60 | 1142 +-------+------+------+------+------+------+-------+------+ 1143 | SI=6 | 0,75 | 0,83 | 0,89 | 0,93 | 0,95 | 0,96 | 0,97 | 1144 | | | | | | | | | 1145 | SI=10 | 0,83 | 0,88 | 0,91 | 0,94 | 0,95 | 0,96 | 0,97 | 1146 | | | | | | | | | 1147 | SI=20 | 0,91 | 0,92 | 0,94 | 0,95 | 0,96 | 0,97 | 0,98 | 1148 | | | | | | | | | 1149 | SI=30 | 0,97 | 0,97 | 0,98 | 0,98 | 0,98 | 0,99 | 0,99 | 1150 | | | | | | | | | 1151 | SI=40 | 0,98 | 0,98 | 0,98 | 0,98 | 0,99 | 0,99 | 0,99 | 1152 +-------+------+------+------+------+------+-------+------+ 1154 Table 3: Payload size reduction as function of SI and SV 1156 Another way to look at it is to see how many maps fit in a packet. 1157 Assuming a 802.15.4 packet with short addresses, the IEEE header is 1158 13 bytes. The CoAP header assuming only mesh traffic takes 6 bytes. 1159 The URI is composed of 3 options, where each option header takes 1 1160 byte: total of 3 bytes. The URI is assumed to be split up in the 1161 following 3 parts: 1163 o URI-host "example.net" takes 13 bytes, which necessitates 1 length 1164 byte: total of 14 bytes. 1166 o URI-path = "mg" takes 4 bytes. 1168 o URI-path = "hash value" takes 7 bytes. 1170 Total URI takes 3+14+4+7 = 28 bytes. For the CoMI payload, there 1171 remains 127 -13 - 6 - 28 = 80 bytes. Subtracting the byte indicating 1172 the number of CBOR maps, the payload size for the maps is 79 bytes. 1174 N.B. no security overhead is included. 1176 When the YANG string identifier needs to be stored, the number of 1177 storable maps is given by: 1179 o 79/(2+SI+SV) for SI, SV < 24; 1181 o 79/(3+SI+SV) for SI < 24 and SV > 23, or SI > 23 and SV < 24; 1183 o 79/(4+SI+SV) for SI, SV > 23. 1185 Table 4 shows the number of transported maps for different values of 1186 SI and SV. 1188 +-------+----+---+-----+-----+-----+-----+-----+ 1189 | SV-> | 0 | 4 | 10 | 20 | 30 | 40 | 60 | 1190 +-------+----+---+-----+-----+-----+-----+-----+ 1191 | SI=6 | 9 | 6 | 4 | 2 | 2 | 1 | 1 | 1192 | | | | | | | | | 1193 | SI=10 | 6 | 4 | 3 | 2 | 1 | 1 | 1 | 1194 | | | | | | | | | 1195 | SI=20 | 3 | 3 | 2 | 1 | 1 | 1 | 0 | 1196 | | | | | | | | | 1197 | SI=30 | 2 | 2 | 1 | 1 | 1 | 1 | 0 | 1198 | | | | | | | | | 1199 | SI=40 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1200 +-------+----+---+-----+-----+-----+-----+-----+ 1202 Table 4: Nr of transported maps as function of SI and SV 1204 Assuming the encoding of the YANG identifier to an unsigned integer 1205 with size 0 < SE < 5, the number of storable maps is given by: 1207 o 79/(2+SE+SV) for SV < 24; 1209 o 79/(3+SE+SV) for SV > 23. 1211 Table 5 shows the number of transported maps for different values of 1212 SE and SV, independent of SI. 1214 +------+----+----+-----+-----+-----+-----+-----+ 1215 | SV-> | 0 | 4 | 10 | 20 | 30 | 40 | 60 | 1216 +------+----+----+-----+-----+-----+-----+-----+ 1217 | SE=1 | 26 | 11 | 6 | 3 | 2 | 1 | 1 | 1218 | | | | | | | | | 1219 | SE=2 | 19 | 9 | 5 | 3 | 2 | 1 | 1 | 1220 | | | | | | | | | 1221 | SE=3 | 15 | 8 | 5 | 3 | 2 | 1 | 1 | 1222 | | | | | | | | | 1223 | SE=4 | 13 | 7 | 4 | 3 | 2 | 1 | 1 | 1224 +------+----+----+-----+-----+-----+-----+-----+ 1226 Table 5: Nr of transported maps as function of SE and SV, 1228 The value of SI for the average YANG string identifier is 30. When 1229 the size of the value part of the map is less than 10 (SV < 10) and 1230 SI=30, the impact of the hashing is significant: 10 maps can be 1231 transported instead of 1 or 2. Further reducing the value of SE from 1232 4 to 1 increases the number of transported maps with a factor 2 to 1233 1,2. 1235 Authors' Addresses 1237 Andy Bierman 1238 YumaWorks 1239 685 Cochran St. 1240 Suite #160 1241 Simi Valley, CA 93065 1242 USA 1244 Email: andy@yumaworks.com 1246 Peter van der Stok 1247 consultant 1249 Phone: +31-492474673 (Netherlands), +33-966015248 (France) 1250 Email: consultancy@vanderstok.org 1251 URI: www.vanderstok.org