ROLL                                                      R. Jadhav, Ed.
Internet-Draft
Intended status: Standards Track                                R. Sahoo
Expires: June 3, 2021                                            Juniper
                                                                   Y. Wu
                                                                  Huawei
                                                       November 30, 2020

                            RPL Observations
                  draft-ietf-roll-rpl-observations-05

Abstract

   This document describes RPL protocol design issues, various
   observations and possible consequences of the design and
   implementation choices.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 3, 2021.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Motivation
   2.  Introduction
     2.1.  Requirements Language and Terminology
   3.  DTSN increment in storing MOP
     3.1.  Deliberations
   4.  DAO retransmission and use of DAO-ACK in storing MOP
     4.1.  Significance of bidirectional Path establishment
           indication and relevance of DAO-ACK
     4.2.
           Problems with hop-by-hop DAO-ACK
     4.3.  Problems with end-to-end DAO-ACK
     4.4.  Deliberations
     4.5.  Implementation Notes
   5.  Interpreting Trickle Timer
   6.  Handling resource unavailability
     6.1.  Deliberations
   7.  Handling aggregated targets
     7.1.  Deliberations
   8.  RPL Transit Information in DAO
     8.1.  Deliberations
   9.  Upgrades or Extensions to RPL protocol
   10. Path Control bits handling
   11. Asymmetric Links and RPL
   12. Adjacencies probing with RPL
     12.1.  Deliberations
   13. Control Options eliding mechanism in RPL
   14. Managing persistent variables across node reboots
     14.1.  Persistent storage and RPL state information
     14.2.  Lollipop Counters
     14.3.  RPL State variables
       14.3.1.  DODAG Version
       14.3.2.  DTSN field in DIO
       14.3.3.  PathSequence
     14.4.  State variables update frequency
     14.5.  Deliberations
     14.6.  Implementation Notes
   15. Capabilities and its role in RPL
     15.1.  Handshaking node capabilities
     15.2.
            How do Capabilities differ from MOP and Configuration
            Option?
     15.3.  Deliberations
   16. Backward Compatibility issues with RPL Options
   17. RPL under-specification
   18. Acknowledgements
   19. IANA Considerations
   20. Security Considerations
   21. References
     21.1.  Normative References
     21.2.  Informative References
   Appendix A.  Additional Stuff
   Authors' Addresses

1.  Motivation

   The primary motivation for this draft is to list different issues
   with RPL operation and prompt a discussion within the working group.
   This draft by itself is not intended for the RFC track but as a WG
   discussion document.  This draft may in turn result in other work
   items taken up by the WG which may improve on the issues mentioned
   here.

2.  Introduction

   RPL [RFC6550] specifies a proactive distance-vector routing scheme
   designed for LLNs (Low Power and Lossy Networks).  RPL enables the
   network to be formed as a DODAG and supports storing and non-storing
   modes of operation.  Non-storing mode reduces memory usage on the
   nodes by allowing non-BR nodes to operate without managing a routing
   table, and relies on source routing by the Root to direct the
   traffic along a specific path.  In storing mode of operation,
   intermediate routers maintain routing tables.

   This work aims to highlight various issues with RPL which make it
   difficult to handle certain scenarios.
   This work will highlight such issues in the context of RPL's modes
   of operation (storing versus non-storing).  There are cases where
   RPL does not provide clear rules and implementations have to make
   their own choices, hindering interoperability and performance.

   [I-D.clausen-lln-rpl-experiences] provides some interesting points.
   Some sections in this draft may overlap with observations in
   [I-D.clausen-lln-rpl-experiences], but this has been done to further
   extend some scenarios or observations.  Readers are encouraged to
   also read [I-D.clausen-lln-rpl-experiences] for other insights.
   Regardless, this draft is self-contained and does not expect the
   reader to have read [I-D.clausen-lln-rpl-experiences].

2.1.  Requirements Language and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   NS-MOP = RPL Non-storing Mode of Operation

   S-MOP = RPL Storing Mode of Operation

   This document uses terminology described in [RFC6550] and [RFC6775].

3.  DTSN increment in storing MOP

   DTSN increment has a major impact on the overall RPL control traffic
   and on the efficiency of downstream route updates.  DTSN is sent as
   part of the DIO message and signals the downstream nodes to trigger
   the target advertisement.  The 6LR needs to decide when to update
   the DTSN, and usually it should do so conservatively.  The DTSN
   update mechanism determines how soon the downward routes are
   established along the new path.  The RPL specification does not
   provide any clear mechanism for how the DTSN update should happen in
   storing mode.
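
   The open policy question can be made concrete with a small sketch.
   The following is only an illustrative model, not taken from any RPL
   implementation: the class StoringNode, the flag propagate_dtsn and
   the helper count_daos are hypothetical names, a DTSN change is
   detected by simple inequality, and counter wrap-around is ignored.
   It counts how many nodes schedule a DAO when a router with one child
   and two grandchildren increments its DTSN, under the two possible
   answers to "does a child also increment its own DTSN?".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StoringNode:
    """Hypothetical storing-mode node; every name here is illustrative."""
    name: str
    propagate_dtsn: bool                  # policy the specification leaves open
    dtsn: int = 240                       # DTSN advertised in this node's DIOs
    parent_dtsn: Optional[int] = None     # last DTSN seen from preferred parent
    children: List["StoringNode"] = field(default_factory=list)
    daos_sent: int = 0

    def on_parent_dio(self, dtsn: int) -> None:
        """Process the DTSN carried in a DIO from the preferred parent."""
        changed = self.parent_dtsn is not None and dtsn != self.parent_dtsn
        self.parent_dtsn = dtsn
        if not changed:
            return
        self.daos_sent += 1               # a DTSN change triggers a DAO
        if self.propagate_dtsn:
            self.increment_dtsn()         # pass the trigger further down

    def increment_dtsn(self) -> None:
        """Increment own DTSN and deliver the resulting DIO to children."""
        self.dtsn += 1                    # assumption: wrap-around omitted
        for child in self.children:
            child.on_parent_dio(self.dtsn)


def count_daos(propagate: bool) -> int:
    """Total DAOs triggered below a router that increments its DTSN once."""
    g1 = StoringNode("grandchild1", propagate)
    g2 = StoringNode("grandchild2", propagate)
    child = StoringNode("child", propagate, children=[g1, g2])
    router = StoringNode("router", propagate, children=[child])
    # Prime every node with its parent's current DTSN (no DAO is triggered).
    child.on_parent_dio(router.dtsn)
    g1.on_parent_dio(child.dtsn)
    g2.on_parent_dio(child.dtsn)
    router.increment_dtsn()
    return child.daos_sent + g1.daos_sent + g2.daos_sent


if __name__ == "__main__":
    print("propagating:", count_daos(True))
    print("not propagating:", count_daos(False))
```

   In this model, propagating the increment makes every node below the
   router send a DAO, while not propagating limits the DAO to the
   direct child and leaves the grandchildren's routes unrefreshed.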

                             (6LBR)
                               |
                               |
                               |
                              (A)
                              / \
                             /   \
                            /     \
                          (B)    -(C)
                           |    /  |
                           |   /   |
                           |  /    |
                          (D)-    (E)
                            \      ;
                             \    ;
                              \  ;
                              (F)
                              / \
                             /   \
                            /     \
                          (G)     (H)

                       Figure 1: Sample topology

   Consider the example topology shown in Figure 1, and assume that
   node D switches its parent from node B to C.  Ideally, node D and
   its descendants should send their target advertisements along the
   new path via node C.  Achieving this in an efficient way is a
   challenge.  Incrementing DTSN is the only way to trigger the DAO on
   downstream nodes.  But this trigger should be sent not only on the
   first hop but to all the grandchild nodes.  Thus DTSN has to be
   incremented in the complete sub-DODAG rooted at node D, resulting in
   a DIO/DAO storm along the sub-DODAG.  This is a particularly big
   issue in high-density networks, where metric deterioration might
   happen transiently even though the signal strength is good.

   The primary implementation issue is whether a child node should
   increment its own DTSN when it receives a DTSN update from its
   parent node.  Doing so would result in DAO updates throughout the
   sub-DODAG, so the cost could be very high.  Not doing so may result
   in serious loss of connectivity for nodes in the sub-DODAG.

3.1.  Deliberations

   (1)  In S-MOP, should the child node increment its DTSN on seeing
        that its preferred parent has updated its DTSN?

   (2)  What are the rules for DTSN increment in S-MOP which multiple
        implementations can follow, thus allowing consistent
        performance across different implementations?

4.  DAO retransmission and use of DAO-ACK in storing MOP

   [RFC6550] has an optional DAO-ACK mechanism by which an upstream
   parent confirms the reception of a DAO from the downstream child.
   In storing mode, the DAO is addressed to the immediate upstream
   parent, resulting in a DAO-ACK from that parent.
   There are two possible implementations:

   (1)  Hop-by-hop ACK: A parent responds with a DAO-ACK immediately
        after receiving the DAO.

   (2)  End-to-End ACK: A node waits for the upstream parent to send
        a DAO-ACK before responding with a DAO-ACK downstream.  The
        upstream parent may make as many attempts as needed to send
        this DAO upstream.  In other words, the moment the parent node
        responds with its own ACK to the child, it accepts the
        responsibility of sending the DAO upstream until it is ACKed.

                  1->           3->
                  DAO           DAO
        (TgtNode)--------(6LR)-------(root)
                  ACK           ACK
                  <-2           <-4

                     Figure 2: Hop-by-hop DAO-ACK

                  1->           2->
                  DAO           DAO
        (TgtNode)--------(6LR)-------(root)
                  ACK           ACK
                  <-4           <-3

                     Figure 3: End-to-End DAO-ACK

4.1.  Significance of bidirectional Path establishment indication and
      relevance of DAO-ACK

   Many application traffic patterns require that a bidirectional path
   be established between the target node and the root.  A typical
   example is a CoAP request sent as a Confirmable message, which
   requires an acknowledgement from the end receiver and thus warrants
   bidirectional path establishment.  It is imperative that the target
   node first ascertains whether such a bidirectional path is
   established before initiating such application traffic.  In case of
   non-storing MOP, the DAO-ACK works perfectly well to ascertain such
   bidirectional connectivity, since it indicates that the root, which
   usually is the direct destination of the DAO, has received the DAO.
   But in case of storing MOP, things are more complicated, since the
   DAO is sent hop-by-hop and the DAO-ACK semantics are not clear
   enough in the current specification.  As mentioned in the section
   above, an implementation can choose to implement hop-by-hop ACK or
   end-to-end ACK.

4.2.
Problems with hop-by-hop DAO-ACK

   The primary issue with this mode is that the target node cannot
   ascertain bidirectional path connectivity on reception of the
   DAO-ACK.

4.3.  Problems with end-to-end DAO-ACK

   In this case, it is possible for the target node to ascertain
   whether the DAO has indeed reached the root, since the reception of
   the DAO-ACK on the target node confirms this.  However, extra state
   information needs to be maintained on the 6LRs on behalf of all the
   child nodes.  Also, it is very difficult for the target node to
   choose a timer value for deciding that the DAO transmission has
   failed to reach the root.

4.4.  Deliberations

   (1)  How should an implementation interpret the DAO-ACK semantics?

   (2)  What is the best way for the target node to know that the
        end-to-end bidirectional path is successfully installed or
        updated?  In NS-MOP, the DAO-ACK provides a clear way to do
        this.  Can the same be achieved for S-MOP?

   (3)  What happens if an ancestor node responds with a DAO-ACK with
        Status != 0?

   (4)  How to selectively NACK a subset of targets in case target
        options are aggregated?

4.5.  Implementation Notes

   Current open source RPL implementations include both types of
   DAO-ACK handling.  For example, RIOT supports hop-by-hop DAO-ACK.
   Older versions of Contiki supported hop-by-hop ACK, but recent
   versions have changed to an end-to-end ACK implementation.

   The sequence of sending no-path DAO and DAO matters when updating
   the routing adjacencies on a parent switch.  If an implementation
   chooses to send the no-path DAO before the DAO, it results in
   significantly more overhead for route invalidation.  This is because
   the no-path DAO would traverse all the way up to the BR, clearing
   the routes on the way.
   In case there is a common ancestor beyond which the old and new
   paths remain the same, it is better to send the regular DAO first,
   thus limiting the propagation of the subsequent no-path DAO to this
   common ancestor.

5.  Interpreting Trickle Timer

   The Trickle algorithm defines a mechanism to reset the timer.  A
   Trickle timer reset is unlike that of regular periodic timers,
   wherein the timer is simply restarted.  Resetting the Trickle timer
   implies setting the interval back to Imin and starting a new
   interval, as mentioned in Section 4.2 of [RFC6206].

   |----|--------|----------------|------------------------------| . . .
    Imin    I2           I3                      I4                  I5

                   Figure 4: Trickle Timer Operation

   The above figure shows an example of Trickle intervals.  Each
   interval is double the size of the previous one.  Section 4.2 of
   [RFC6206] states that:

   "If Trickle hears a transmission that is "inconsistent" and I is
   greater than Imin, it resets the Trickle timer.  To reset the timer,
   Trickle sets I to Imin and starts a new interval as in step 2.  If I
   is equal to Imin when Trickle hears an "inconsistent" transmission,
   Trickle does nothing.  Trickle can also reset its timer in response
   to external "events"."

   Thus if the Trickle timer has advanced to subsequent intervals,
   i.e., >= I2, then a reset of the Trickle timer implies going back to
   Imin.  However, if the Trickle timer is currently in Imin and it
   hears an inconsistent transmission, then it does nothing.

   In the context of multicast DIS/DIO operation, this implies that if
   the DIO Trickle timer is already at Imin and the node hears a
   multicast DIS, then the node does nothing.  It MUST NOT reset the
   timer again in this case.

   An implementation MUST NOT restart the timer within an interval.
   For example, in the above figure, if the timer is in interval I2,
   the implementation MUST NOT restart the timer at the beginning of
   the current interval, i.e., I2.  If the timer is in interval I2 and
   a reset is to be done, then the interval is set back to Imin.  If
   the timer is already in Imin, then the reset should do nothing.

6.  Handling resource unavailability

   The nodes in constrained networks have to maintain various records,
   such as neighbor cache entries and routing entries, on behalf of
   other targets to facilitate packet forwarding.  Because of the
   constrained nature of the devices, the memory available may be very
   limited, and thus the path selection algorithm may have to take such
   resource constraints into consideration as well.

   RPL currently does not have any mechanism to advertise such resource
   indicator metrics.  The primary tables associated with RPL are the
   routing table and the neighbor cache.  Even though the neighbor
   cache is not directly linked with the RPL protocol, the maintenance
   of routing adjacencies results in updates to the neighbor cache.

6.1.  Deliberations

      Is it possible to know that an upstream parent/ancestor cannot
      hold enough routing entries and thus this path should not be
      used?

      Is it possible to know that an upstream parent cannot hold any
      more neighbor cache entries and thus this upstream parent should
      not be used?

7.  Handling aggregated targets

   RPL allows and defines specific procedures to aid target aggregation
   in DAO.  Having said that, the specification does not mandate the
   use of aggregated targets, nor does it make any comment on whether a
   receiving node needs to handle them.  Target aggregation is a useful
   tool and especially helps with link layer technologies that do not
   suffer from low MTUs, such as PLC.
   Even if an implementation does not support aggregating targets,
   reception of aggregated targets in DAO should at least be mandated.

   RPL currently has a mechanism to ACK the DAO, but it does not have a
   mechanism to ACK individual target options.  Thus, in case of
   aggregated targets in the DAO, if a subset of the targets fails, it
   is impossible for the DAO-ACK to signal this to the DAO sender.

7.1.  Deliberations

      Even if the implementation does not support aggregating targets,
      should reception and handling of aggregated targets in DAO at
      least be mandated?

      There is good scope for compressing aggregated targets, which can
      significantly reduce the RPL control overhead.

      How to selectively NACK a subset of targets in case target
      options are aggregated?

      The DEFAULT_DAO_DELAY of 1 second does not help much with
      aggregation.  The upstream parent nodes should wait longer than
      the child nodes so as to aggregate effectively.  Can
      DEFAULT_DAO_DELAY be a function of the level/rank the node is at?

8.  RPL Transit Information in DAO

   RPL allows associating a target or set of targets with a Transit
   Information Option, which contains attributes for a path to one or
   more destinations identified by the set of targets.  In case of
   NS-MOP, the Transit Information will contain the all-critical Parent
   Address, which allows the common ancestor (usually the root) to
   identify the source route header for the target node.  The Transit
   Information also contains other information, such as Path Sequence
   and Path Lifetime, which are critical for maintaining route
   adjacencies.

   RPL, however, does not mandate the use of the Transit Information
   Option for targets.

8.1.  Deliberations

      Is it ok to let implementations decide on the inclusion of the
      Transit Information Option?

      Is it possible to achieve interop without mandating use of the
      Transit Information Option?

      If the Transit Information Option is sent, should the handling of
      PathSequence be mandated?

9.  Upgrades or Extensions to RPL protocol

   RPL extensibility is highly desirable and is controlled by protocol
   elements within the messaging framework.  In the pursuit of keeping
   the signalling overhead low, the RPL specification has been
   restrictive in its approach to extending its field ranges, thus in
   some cases putting extensibility at stake.  Consider, for example,
   the Mode of Operation field, which is three bits in the RPL
   specification.  These bits are already saturated, and it may be
   difficult to add major upgrades without extending them.

   Addition of new Control Options or new RPL Codes almost certainly
   results in backward compatibility issues.  RFC6550 clearly mentions
   that a message with an unknown RPL Code MUST be silently discarded.
   However, no explicit handling is suggested for unknown RPL control
   option types.  In some cases, implementations simply copy and
   forward an unknown option as is, while in other cases the unknown
   option is stripped off before forwarding the message.

   Deliberations:

   (1)  What are the extensibility options RPL could implement?  How
        much overhead would they incur?

   (2)  Most of the extensions are in the form of new control options.
        Should RPL have a mechanism to handle such extensions in a
        backward-compatible and generic manner?

10.  Path Control bits handling

   RPL uses Path Control bits in the DAO's Transit Information Option
   for installing multiple downward routes to the nodes.  These
   multiple routes could be used for reliability, latency or traffic
   load-balancing within a DAG.  The Path Control bits are usable in
   both storing and non-storing modes of operation.

   RFC6550 Section 9.9 bullet point 9 requires a mandatory setting of
   Path Control bits in all the unicast DAOs sent by the Target node.
   However, no existing implementation of RPL supports this.  There is
   no reason for a network which only requires a single path to the
   root to mandatorily support Path Control bits.

   Deliberations:

   (1)  Should the mandatory clause for supporting Path Control bits in
        RFC6550 Section 9.9 point 9 be removed?

   (2)  Handling Path Control bits may be complex.  An implementation
        guideline explaining the use-cases and resource (memory
        requirements) assumptions would help implementors decide the
        utility of this technique.

11.  Asymmetric Links and RPL

   Section 3.1 of [I-D.ietf-intarea-adhoc-wireless-com] explains
   asymmetric link characteristics and what it takes for a protocol to
   support asymmetric links.  RPL depends on bi-directional links for
   control, even though near-perfect symmetry is not expected.  The
   implication of this is that the upstream and downstream paths remain
   the same within a given RPL instance for any pair of nodes.  The
   following questions arise from this design:

   (1)  Is it possible to detect asymmetric links?

   (2)  In the presence of asymmetric links, what is the impact on the
        control overhead, and is there a way to mitigate or alleviate
        any negative impact?

   [I-D.ietf-roll-aodv-rpl] defines a mechanism to use a pair of
   instances which are coupled.  This allows disjoint upstream and
   downstream paths between a pair of nodes, assuming that the link
   asymmetry is detected using some outside technique.  That document
   assumes that the link asymmetry is already known to the nodes in the
   form of static configuration.  In case of 6TiSCH networks, the
   availability of transmission slot information can be used to
   identify link asymmetry.  The challenge with regard to detecting
   link asymmetry arises from scenarios where, for example, the nodes
   transmit with unequal power levels.

12.
Adjacencies probing with RPL

   RPL avoids periodic hello messaging, unlike other distance-vector
   protocols.  It uses a Trickle timer based mechanism to update
   configuration parameters.  This significantly reduces the RPL
   control overhead.  One consequence of this design choice is that, in
   the absence of regular traffic, the adjacencies cannot be tested and
   repaired if broken.

   RPL provides a mechanism in the form of unicast DIS to query a
   particular node for its DIO.  A node receiving a unicast DIS MUST
   respond with a unicast DIO with Configuration Option.  This
   mechanism could also be used for probing adjacencies, and certain
   implementations such as Contiki do so.  The periodicity of the
   probing is implementation dependent, but the node is expected to
   invoke probing only when

   (1)  There is no data traffic based on which the links could be
        tested.

   (2)  There is no L2 feedback.  In some cases, L2 might provide
        periodic beacons at the link layer, and the absence of beacons
        could be used for link tests.

12.1.  Deliberations

   (1)  Should the probing scheme be standardized?

   (2)  In some cases using multicast based probing may prove
        advantageous.  Currently RPL does not have multicast based
        probing.  Multicast DIS/DIO may not be suitable for probing
        because it could possibly lead to a change of states.

13.  Control Options eliding mechanism in RPL

   RPL configuration changes are rare, and thus various configuration
   options may not change over a long period of time.  RPL provides a
   way for the configuration options to be elided, but there are no
   clear guidelines on how the eliding should be handled.  In the
   absence of such guidelines, it is possible that certain nodes may
   end up using stale configuration in the event of transient link
   failures.

14.
Managing persistent variables across node reboots

14.1.  Persistent storage and RPL state information

   Devices are required to be functional for several years without
   manual maintenance.  Usually battery power consumption is considered
   key for operating the devices for several (tens of) years.  But
   apart from the battery, flash memory endurance may prove to be a
   lifetime bottleneck in constrained networks.  Endurance is defined
   as the maximum number of erase-write cycles that a NAND/NOR cell can
   undergo before losing its 'guaranteed' write operation.  In some
   cases (cheaper NAND-MLC/TLC), the endurance can be as low as 2K
   cycles.  For example, if a given cell is written 5 times a day, a
   NAND flash cell with an endurance of 10K cycles may last for less
   than 6 years (10,000 / 5 = 2,000 days).

   Wear leveling is a popular technique used in flash memory to
   minimize the impact of limited cell endurance.  Wear leveling works
   by arranging data so that erasures and re-writes are distributed
   evenly across the medium.  The memory sectors are over-provisioned
   so that the writes are distributed across multiple sectors.  Many
   IoT platforms do not necessarily consider this over-provisioning and
   usually provision the memory only to what is required.  Some
   scenarios, such as street-lighting, may not require the application
   layer to write any information to the persistent storage, and thus
   the over-provisioning is often ignored.  In such cases, if the
   network stack ends up using persistent storage for maintaining its
   state information, then it becomes counter-productive.

   In a star topology, the amount of persistent data written by network
   protocols is very limited.  But ad-hoc networks employing routing
   protocols such as RPL assume certain state information to be
   retained across node reboots.  In IoT devices, this storage is
   mostly floating-gate NAND/NOR flash memory.
   The impact of loss of this state information differs depending upon
   the type (6LN/6LR/6LBR) of the node.

14.2.  Lollipop Counters

   [RFC6550] Section 7.2 explains sequence counter operation, defining
   lollipop [Perlman83] style counters.  Lollipop counters specify a
   mechanism in which, even if the counter value wraps, the algorithm
   is able to tell whether the received value is the latest or not.
   This mechanism also helps in "some cases" to recover from node
   reboot, but is not foolproof.

   Consider an example where node A boots up and initialises the seqcnt
   to 240 as recommended in [RFC6550].  Node A communicates with node B
   using this seqcnt, and node B uses this seqcnt to determine whether
   the information node A sent in the packet is the latest.  Now let us
   assume the counter value reaches 250 after some operations on node
   A, and node B keeps receiving the updated seqcnt from node A.  Now
   consider that node A reboots, reinitializes the seqcnt value to 240,
   and sends the information to node B (which has a seqcnt of 250
   stored on behalf of node A).  As per Section 7.2 of [RFC6550], when
   node B receives this packet it will consider the information to be
   old (since 240 < 250).

                       +-----+-----+----------+
                       |  A  |  B  |  Output  |
                       +-----+-----+----------+
                       | 240 | 240 | AB, new  |
                       | 240 |  :: | A>B, new |
                       | 240 | 127 | A>B, new |
                       +-----+-----+----------+

      Default values for lollipop counters considered from [RFC6550]
                               Section 7.2.

               Table 1: Example lollipop counter operation

   Based on this table, there is a dead zone (240 to 0) in which, if A
   operates after reboot, the seqcnt will always be considered smaller.
   Thus node A needs to maintain the seqcnt in persistent storage and
   reuse it on reboot.

14.3.  RPL State variables

   The impact of loss of RPL state information differs depending upon
   the node type (6LN/6LR/6LBR).
   The following sections explain different state variables and the
   impact in case this information is lost on reboot.

14.3.1.  DODAG Version

   The tuple (RPLInstanceID, DODAGID, DODAGVersionNumber) uniquely
   identifies a DODAG Version.  DODAGVersionNumber is incremented every
   time a global repair is initiated for the instance (global or
   local).  A node receiving an older DODAGVersionNumber will ignore
   the DIO message, assuming it to be from an old DODAG version.  Thus
   a 6LBR node (and a 6LR node in case of a local DODAG) needs to
   maintain the DODAGVersionNumber in persistent storage, so that it is
   available on reboot.  In case the 6LBR cannot use the latest
   DODAGVersionNumber, the implication is that it won't be able to
   recover/re-establish the routing table.

14.3.2.  DTSN field in DIO

   DTSN (Destination advertisement Trigger Sequence Number) is a DIO
   message field used as part of the procedure to maintain Downward
   routes.  A 6LBR/6LR node may increment the DTSN in case it requires
   the downstream nodes to send DAOs and thus update downward routes on
   the 6LBR/6LR node.  In case of RPL NS-MOP, only the 6LBR maintains
   the downward routes and thus controls this field update.  In case of
   S-MOP, 6LRs additionally keep downward routes and thus control this
   field update.

   In S-MOP, when a 6LR node switches parent, it may have to issue a
   DIO with an incremented DTSN to trigger downstream child nodes to
   send DAOs so that the downward routes are established in the whole
   parent/ancestor set.  Thus in S-MOP, the frequency of DTSN update
   might be relatively high (given the node density and the hysteresis
   set by the objective function to switch parent).

14.3.3.  PathSequence

   PathSequence is part of the RPL Transit Information Option and is
   associated with the RPL Target Option.  A node which owns a target
   address can associate a PathSequence in the DAO message to denote
   freshness of the target information.
   This is especially useful when a node uses multiple paths or multiple
   parents to advertise its reachability.

   Loss of PathSequence information maintained on the target node can
   result in routing adjacencies being lost on 6LRs/6LBR/6BBR.

14.4.  State variables update frequency

   +--------------------+-------------------+------------------------+
   | State variable     | Update frequency  | Impacts node type      |
   +--------------------+-------------------+------------------------+
   | DODAGVersionNumber | Low               | 6LBR, 6LR(local DODAG) |
   | DTSN               | High(SM),Low(NSM) | 6LBR, 6LR              |
   | PathSequence       | High(SM),Low(NSM) | 6LR, 6LN               |
   +--------------------+-------------------+------------------------+

   Low = fewer than 5 per day, High = more than 5 per day; SM = Storing
   MOP, NSM = Non-Storing MOP

                      Table 2: RPL State variables

14.5.  Deliberations

   (1)  Is it possible that RPL removes the use of persistent storage
        for maintaining state information?

   (2)  In most cases, node reboots will happen very rarely.  Thus
        doing persistent-storage book-keeping to handle node reboots
        might not make sense.  Is it possible to consider signaling
        (especially after the node reboots) so as to avoid maintaining
        this persistent state?  Is it possible to use one-time on-reboot
        signaling to recover some state information?

   (3)  It is necessary that RPL avoids using persistent storage as far
        as possible.  Ideally, extensions to RPL should consider this as
        a design requirement, especially for 6LR and 6LN nodes.  DTSN
        and PathSequence are the primary state variables which have
        major impact.

14.6.  Implementation Notes

   An implementation should use a random DAOSequence number on reboot so
   as to avoid the risk of reusing the same DAOSequence on reboot.
   Regardless, the sequence counter size of 8 bits does not provide much
   guarantee towards choosing a good random number.
   A parent node will not respond with a DAO-ACK in case it sees a DAO
   with the same previous DAOSequence.

   Write-Before-Use: The state information should be written to flash
   before it is used in messaging.  If it is done the other way around,
   there is a chance that the node powers down before writing to
   persistent storage.

15.  Capabilities and its role in RPL

   RPL is a distributed protocol and it requires that the participating
   nodes agree on a basic set of primitives to follow.  RPL currently
   handles this using the MOP (Mode of Operation) bits in the DIO.  The
   MOP bits inform the nodes of the basic mode of operation a node MUST
   support to join the Instance as a 6LR.  The MOP is decided and
   advertised by the root of the RPL Instance.  A node not supporting
   the given MOP may still join the Instance as a leaf node or 6LN.

   RPL further uses the DIO Configuration Option to advertise the
   configuration each node needs to use (e.g., for the trickle timer).

15.1.  Handshaking node capabilities

   Currently there exists no mechanism to handshake capabilities of the
   root, 6LRs, or 6LNs.  If a feature is optional and is supported by
   6LRs/6LNs, there is currently no mechanism to signal it.  There are
   several RPL extension proposals which are possibly optional
   features.  The root needs to know whether the 6LR/6LN supports these
   optional features in order to enable the extension in that path
   context.  Similarly, 6LRs and 6LNs need to know whether the root
   supports certain extensions that they can make use of.

15.2.  How do Capabilities differ from MOP and Configuration Option?

   Unlike MOP and the Configuration Option, which are issued by the root
   of the Instance, Capabilities can be issued by any node.  A 6LN/6LR
   node can advertise its capabilities such that those can be seen by
   intermediate 6LRs and the root of the Instance.

15.3.
Deliberations

   (1)  Is it possible for leaf nodes to advertise their set of
        capabilities, which can be used by the root and/or intermediate
        6LRs to make run-time decisions?

   (2)  How should these capabilities be carried?  Should they be
        carried in DAO/DIO/DAO-ACK?

   (3)  Should the definition of capabilities be the same in both
        directions (upstream/downstream)?

16.  Backward Compatibility issues with RPL Options

   Most of the new work in ROLL requires the addition of new control
   options.  Every time a new control option is added, all the nodes
   are required to upgrade to support this option.  In many cases, the
   new specification declares a flag day to switch to the new
   functionality.

   New control options may not require mandatory handling on every
   node, but they require at least some processing.  For example, assume
   that a new control option is added to the DIO message.  The option
   does not require any handling on the nodes not supporting it, but it
   requires these nodes at least to forward the new control option
   downstream.  Currently the new control option may be stripped off.

   It should be possible for unknown control options to be copied as-is
   to the downstream/upstream node(s).  The specification defining the
   new control option will decide whether a node should strip off or
   copy the unknown control option.

17.  RPL under-specification

   (a)  PathSequence: Is it mandatory to use PathSequence in the DAO
        Transit Information Option?  RPL mentions that a 6LR/6LBR
        hosting the routing entry on behalf of a target node should
        refresh the lifetime on reception of a new Path Sequence.  But
        RPL does not necessarily mandate the use of Path Sequence.  Most
        of the open-source implementations [RIOT] [CONTIKI] currently do
        not issue Path Sequence in the DAO message.

   (b)  Target Option aggregation in DAO: RPL allows multiple targets to
        be aggregated in a single DAO message and has introduced the
        notion of DelayDAO, using which a 6LR node can delay its DAO to
        enable such aggregation.  But RPL does not have clear text on
        the handling of aggregated DAOs, and this hinders
        interoperability.

   (c)  DTSN Update: RPL does not clearly define in which cases the DTSN
        should be updated in the storing mode of operation.  More
        details are presented in Section 3.

18.  Acknowledgements

   Many thanks to Pascal Thubert for hallway chats and for helping
   understand the existing design rationales.  Thanks to Michael
   Richardson for the Unstrung RPL implementation rationale.  Thanks to
   the ML discussions, in particular
   (https://www.ietf.org/mail-archive/web/roll/current/msg09443.html).

19.  IANA Considerations

   This memo includes no request to IANA.

20.  Security Considerations

   This is an informational draft and does not add any changes to the
   existing specifications.

21.  References

21.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC6206]  Levis, P., Clausen, T., Hui, J., Gnawali, O., and J. Ko,
              "The Trickle Algorithm", RFC 6206, DOI 10.17487/RFC6206,
              March 2011.

   [RFC6550]  Winter, T., Ed., Thubert, P., Ed., Brandt, A., Hui, J.,
              Kelsey, R., Levis, P., Pister, K., Struik, R., Vasseur,
              JP., and R. Alexander, "RPL: IPv6 Routing Protocol for
              Low-Power and Lossy Networks", RFC 6550,
              DOI 10.17487/RFC6550, March 2012.

   [RFC6775]  Shelby, Z., Ed., Chakrabarti, S., Nordmark, E., and C.
              Bormann, "Neighbor Discovery Optimization for IPv6 over
              Low-Power Wireless Personal Area Networks (6LoWPANs)",
              RFC 6775, DOI 10.17487/RFC6775, November 2012.

21.2.
Informative References

   [I-D.clausen-lln-rpl-experiences]
              Clausen, T., Verdiere, A., Yi, J., Herberg, U., and Y.
              Igarashi, "Observations on RPL: IPv6 Routing Protocol for
              Low power and Lossy Networks",
              draft-clausen-lln-rpl-experiences-11 (work in progress),
              March 2018.

   [I-D.ietf-intarea-adhoc-wireless-com]
              Baccelli, E. and C. Perkins, "Multi-hop Ad Hoc Wireless
              Communication", draft-ietf-intarea-adhoc-wireless-com-02
              (work in progress), July 2016.

   [I-D.ietf-roll-aodv-rpl]
              Anamalamudi, S., Zhang, M., Perkins, C., Anand, S., and B.
              Liu, "AODV based RPL Extensions for Supporting Asymmetric
              P2P Links in Low-Power and Lossy Networks",
              draft-ietf-roll-aodv-rpl-08 (work in progress), May 2020.

   [Perlman83]
              Perlman, R., "Fault-Tolerant Broadcast of Routing
              Information", North-Holland Computer Networks, Vol.7,
              December 1983.

Appendix A.  Additional Stuff

Authors' Addresses

   Rahul Arvind Jadhav (editor)
   Marathahalli
   Bangalore, Karnataka  560037
   India

   Email: rahul.ietf@gmail.com

   Rabi Narayan Sahoo
   Juniper
   Whitefield
   Bangalore, Karnataka  560037
   India

   Email: rabinarayans0828@gmail.com

   Yuefeng Wu
   Huawei
   No.101, Software Avenue, Yuhuatai District,
   Nanjing, Jiangsu  210012
   China

   Phone: +86-15251896569
   Email: wuyuefeng@huawei.com