ROLL                                                      R.
Jadhav, Ed.
Internet-Draft                                                  R. Sahoo
Intended status: Standards Track                                   Y. Wu
Expires: August 8, 2018                                           Huawei
                                                        February 4, 2018


                            RPL Observations
                  draft-rahul-roll-rpl-observations-00

Abstract

   This document describes RPL protocol design issues, various
   observations and possible consequences of the design and
   implementation choices.  Also mentioned are implementation notes for
   the developers to be used in specific contexts.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 8, 2018.

Copyright Notice

   Copyright (c) 2018 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
     1.1.  Requirements Language and Terminology
   2.  Managing persistent variables across node reboots
     2.1.  Persistent storage and RPL state information
     2.2.  Lollipop Counters
     2.3.  RPL State variables
       2.3.1.  DODAG Version
       2.3.2.  DTSN field in DIO
       2.3.3.  PathSequence
     2.4.  State variables update frequency
     2.5.  Recommendations
     2.6.  Implementation Notes
   3.  DTSN increment in storing MOP
   4.  DAO retransmission and use of DAO-ACK
   5.  Handling resource unavailability
   6.  Traffic Types observations
   7.  RPL under-specification
   8.  Acknowledgements
   9.  IANA Considerations
   10. Security Considerations
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Additional Stuff
   Authors' Addresses

1.  Introduction

   RPL [RFC6550] specifies a proactive distance-vector routing scheme
   designed for LLNs (Low-Power and Lossy Networks).  RPL enables the
   network to be formed as a DODAG and supports storing and non-storing
   modes of operation.
   Non-storing mode allows reduced memory usage on the nodes by letting
   non-BR nodes operate without managing a routing table, and relies on
   source routing by the 6LBR to direct traffic along a specific path.
   In storing mode of operation, intermediate routers maintain routing
   tables.

   This work aims to highlight various issues with RPL that make it
   difficult to handle certain scenarios.  The issues are discussed in
   the context of RPL's modes of operation (storing versus
   non-storing).  There are cases where RPL does not provide clear
   rules and implementations have to make their own choices, hindering
   interoperability and performance.

   [I-D.clausen-lln-rpl-experiences] provides some interesting points.
   Some sections in this draft may overlap with observations in
   [I-D.clausen-lln-rpl-experiences], but this has been done to further
   extend some scenarios or observations.  Readers are encouraged to
   also consult [I-D.clausen-lln-rpl-experiences] for other insights.
   Regardless, this draft is self-sufficient in that it does not expect
   the reader to have read [I-D.clausen-lln-rpl-experiences].

1.1.  Requirements Language and Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   NS-MOP = RPL Non-storing Mode of Operation

   S-MOP = RPL Storing Mode of Operation

   This document uses terminology described in [RFC6550] and [RFC6775].

2.  Managing persistent variables across node reboots

2.1.  Persistent storage and RPL state information

   Devices are required to be functional for several years without
   manual maintenance.  Usually battery power consumption is considered
   key for operating the devices for several (tens of) years.
   But apart from the battery, flash memory endurance may prove to be a
   lifetime bottleneck in constrained networks.  Endurance is defined
   as the maximum number of erase-write cycles that a NAND/NOR cell can
   undergo before losing its 'guaranteed' write operation.  In some
   cases (cheaper NAND-MLC/TLC), the endurance can be as low as 2K
   cycles.  For example, if a given cell is written 5 times a day, a
   NAND-flash cell with an endurance of 10K cycles may last for less
   than 6 years (10,000 / 5 = 2,000 days, or about 5.5 years).

   In a star topology, the amount of persistent data written by network
   protocols is very limited.  But ad-hoc networks employing routing
   protocols such as RPL assume certain state information to be
   retained across node reboots.  In IoT devices this storage is mostly
   floating-gate NAND/NOR flash memory.  The impact of losing this
   state information differs depending upon the type (6LN/6LR/6LBR) of
   the node.

2.2.  Lollipop Counters

   [RFC6550] Section 7.2 explains sequence counter operation, defining
   lollipop [Perlman83] style counters.  Lollipop counters specify a
   mechanism by which, even if the counter value wraps, the algorithm
   can tell whether a received value is the latest or not.  This
   mechanism also helps in "some cases" to recover from node reboot,
   but is not foolproof.

   Consider an example where node A boots up and initializes the seqcnt
   to 240 as recommended in [RFC6550].  Node A communicates with node B
   using this seqcnt, and node B uses this seqcnt to determine whether
   the information node A sent in the packet is the latest.  Now assume
   the counter value reaches 250 after some operations on node A, and
   node B keeps receiving the updated seqcnt from node A.  Now consider
   that node A reboots, reinitializes the seqcnt to 240, and sends the
   information to node B (which has a seqcnt of 250 stored on behalf of
   node A).  As per Section 7.2
   of [RFC6550], when node B receives this packet it will consider the
   information to be old (since 240 < 250).

                        +-----+-----+----------+
                        | A   | B   | Output   |
                        +-----+-----+----------+
                        | 240 | 240 | A<B, old |
                        | ... | ... | ...      |
                        | 240 | ::  | A>B, new |
                        | 240 | 127 | A>B, new |
                        +-----+-----+----------+

     Default values for lollipop counters considered from [RFC6550]
                              Section 7.2.

               Table 1: Example lollipop counter operation

   Based on this table, there is a dead zone (240 to 0) in which, if A
   operates after reboot, its seqcnt will always be considered smaller.
   Thus node A needs to maintain the seqcnt in persistent storage and
   reuse it on reboot.

2.3.  RPL State variables

   The impact of losing RPL state information differs depending upon
   the node type (6LN/6LR/6LBR).  The following sections explain the
   different state variables and the impact if this information is lost
   on reboot.

2.3.1.  DODAG Version

   The tuple (RPLInstanceID, DODAGID, DODAGVersionNumber) uniquely
   identifies a DODAG Version.  DODAGVersionNumber is incremented every
   time a global repair is initiated for the instance (global or
   local).  A node receiving an older DODAGVersionNumber will ignore
   the DIO message, assuming it to be from an old DODAG version.  Thus
   a 6LBR node (and a 6LR node in case of a local DODAG) needs to
   maintain the DODAGVersionNumber in persistent storage so that it is
   available on reboot.  If the 6LBR cannot use the latest
   DODAGVersionNumber, the implication is that it won't be able to
   recover/re-establish the routing table.

2.3.2.  DTSN field in DIO

   DTSN (Destination Advertisement Trigger Sequence Number) is a DIO
   message field used as part of the procedure to maintain Downward
   routes.  A 6LBR/6LR node may increment the DTSN when it requires the
   downstream nodes to send DAOs and thus update downward routes on the
   6LBR/6LR node.
   In case of RPL NS-MOP, only the 6LBR maintains the downward routes
   and thus controls this field update.  In case of S-MOP, 6LRs
   additionally keep downward routes and thus also control this field
   update.

   In S-MOP, when a 6LR node switches parent it may have to issue a DIO
   with an incremented DTSN to trigger downstream child nodes to send
   DAOs, so that the downward routes are established in the whole
   parent/ancestor set.  Thus in S-MOP, the frequency of DTSN updates
   might be relatively high (depending on the node density and the
   hysteresis set by the objective function for switching parents).

2.3.3.  PathSequence

   PathSequence is part of the RPL Transit Option and is associated
   with the RPL Target option.  A node which owns a target address can
   associate a PathSequence in the DAO message to denote freshness of
   the target information.  This is especially useful when a node uses
   multiple paths or multiple parents to advertise its reachability.

   Loss of the PathSequence information maintained on the target node
   can result in routing adjacencies being lost on 6LRs/6LBR/6BBR.

2.4.  State variables update frequency

   +--------------------+-------------------+------------------------+
   | State variable     | Update frequency  | Impacts node type      |
   +--------------------+-------------------+------------------------+
   | DODAGVersionNumber | Low               | 6LBR, 6LR(local DODAG) |
   | DTSN               | High(SM),Low(NSM) | 6LBR, 6LR              |
   | PathSequence       | High(SM),Low(NSM) | 6LR, 6LN               |
   +--------------------+-------------------+------------------------+

   Low=<5 per day, High=>5 per day; SM=Storing MOP, NSM=Non-Storing MOP

                       Table 2: RPL State variables

2.5.  Recommendations

   It is necessary that RPL avoids using persistent storage as far as
   possible.  Ideally, extensions to RPL should consider this as a
   design requirement, especially for 6LR and 6LN nodes.  DTSN and
   PathSequence are the primary state variables which have major
   impact.

2.6.
Implementation Notes

   An implementation should use a random DAOSequence number on reboot
   so as to avoid the risk of reusing the same DAOSequence.  A parent
   node will not respond with a DAO-ACK if it sees a DAO with the same
   DAOSequence as the previous one.

   Write-Before-Use: The state information should be written to flash
   before it is used in messaging.  If it is done the other way round,
   the node may power down before the value is written to persistent
   storage.

3.  DTSN increment in storing MOP

   DTSN increment has a major impact on the overall RPL control traffic
   and on the efficiency of downstream route updates.  DTSN is sent as
   part of the DIO message and signals the downstream nodes to trigger
   their target advertisements.  The 6LR needs to decide when to update
   the DTSN, and usually it should do so conservatively.  The DTSN
   update mechanism determines how soon the downward routes are
   established along the new path.  The RPL specification does not
   provide any clear mechanism for how the DTSN update should happen in
   storing mode.

                                 (6LBR)
                                   |
                                   |
                                   |
                                  (A)
                                  / \
                                 /   \
                                /     \
                              (B) -  (C)
                               |   /  |
                               |  /   |
                               | /    |
                              (D)-   (E)
                                \     ;
                                 \   ;
                                  \ ;
                                  (F)
                                  / \
                                 /   \
                                /     \
                              (G)     (H)

                        Figure 1: Sample topology

   Consider the example topology shown in Figure 1, and assume that
   node D switches its parent from node B to node C.  Ideally, node D
   and the nodes in its sub-DODAG should send their target
   advertisements along the new path via node C.  Achieving this
   efficiently is a challenge.  Incrementing DTSN is the only way to
   trigger DAOs on downstream nodes.  But this trigger has to reach not
   only the first hop but also all the grandchild nodes.  Thus DTSN has
   to be incremented throughout the sub-DODAG rooted at node D,
   resulting in a DIO/DAO storm along the sub-DODAG.
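
   As a rough illustration of this cost, the following Python sketch
   counts the nodes involved when node D switches parents.  It is a
   minimal sketch and not part of RPL: the CHILDREN map is an assumed
   preferred-parent tree read off the solid links of Figure 1, and
   sub_dodag and storm_cost are hypothetical helper names introduced
   here.  DAO forwarding and aggregation along the path are ignored.

```python
from collections import deque

# Assumed preferred-parent tree taken from the solid links of Figure 1.
# The dotted E-F link and the B-C / D-E links are ignored (assumption).
CHILDREN = {
    "6LBR": ["A"],
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["E"],
    "D": ["F"],
    "E": [],
    "F": ["G", "H"],
    "G": [],
    "H": [],
}


def sub_dodag(root, children=CHILDREN):
    """Return every node in the sub-DODAG rooted at `root`, breadth-first."""
    seen, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        seen.append(node)
        queue.extend(children[node])
    return seen


def storm_cost(switching_node, children=CHILDREN):
    """Count (DIO senders, DAO senders) if DTSN is bumped in the whole sub-DODAG."""
    nodes = sub_dodag(switching_node, children)
    # Only routers that have children need to emit a DIO with the new DTSN.
    dio_senders = [n for n in nodes if children[n]]
    # Every node in the sub-DODAG, including the switching node, re-sends a DAO.
    dao_senders = nodes
    return len(dio_senders), len(dao_senders)


if __name__ == "__main__":
    dios, daos = storm_cost("D")
    print(f"DIO senders: {dios}, DAO senders: {daos}")
```

   Under these assumptions a single parent switch at node D involves
   every node below it (F, G and H), so the cost grows with the size of
   the sub-DODAG rather than with the number of direct children.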
   This is particularly a big issue in high-density networks, where
   metric deterioration might happen transiently even though the signal
   strength is good.

   The primary implementation issue is whether a child node should
   increment its own DTSN when it receives a DTSN update from its
   parent node.  Doing so results in DAO updates throughout the
   sub-DODAG, so the cost could be very high.  Not doing so may result
   in serious loss of connectivity for nodes in the sub-DODAG.

4.  DAO retransmission and use of DAO-ACK

   [RFC6550] has an optional DAO-ACK mechanism by which an upstream
   parent confirms the reception of a DAO from the downstream child.
   In storing mode, the DAO is addressed to the immediate upstream
   parent, resulting in a DAO-ACK from that parent.  There are two
   possible implementations:

   (1)  A parent responds with a DAO-ACK immediately after receiving
        the DAO.

   (2)  A node waits for the upstream parent to send a DAO-ACK before
        responding with a DAO-ACK downstream.  This may not be feasible
        on constrained devices because it requires additional state
        information and timers to be handled on behalf of multiple
        downstream nodes whose DAOs are in transit.

   The following scenarios do not have clear handling in the specs:

   (1)  What happens if the DAO-ACK for the target is lost on an
        ancestor node's link?

   (2)  What happens if an ancestor node responds with a DAO-ACK with
        Status!=0?

   (3)  Is there any way for the target node to know that the DAO it
        sent has reached the 6LBR successfully?

   Note that none of these inefficiencies are present in NS-MOP, in
   which the DAO is addressed directly to the 6LBR.

5.  Handling resource unavailability

   The nodes in constrained networks have to maintain various records,
   such as neighbor cache entries and routing entries, on behalf of
   other targets to facilitate packet forwarding.
   Because of the constrained nature of the devices, the memory
   available may be very limited, and thus the path selection algorithm
   may have to take such resource constraints into consideration as
   well.

   RPL currently does not have any mechanism to advertise such resource
   indicator metrics.  The primary tables associated with RPL are the
   routing table and the neighbor cache.  Even though the neighbor
   cache is not directly linked with the RPL protocol, the maintenance
   of routing adjacencies results in updates to the neighbor cache.

   The following need to be handled by the specs:

      Is it possible to know that an upstream parent/ancestor cannot
      hold enough routing entries and thus this path should not be
      used?

      Is it possible to know that an upstream parent cannot hold any
      more neighbor cache entries and thus this upstream parent should
      not be used?

6.  Traffic Types observations

   RPL is more suited to MP2P (multipoint-to-point) traffic; the
   central point is usually a grounded root/6LBR node.  [RFC6997]
   allows establishing P2P paths within the DODAG.  There are
   situations where an MP2P network needs to be established within the
   DODAG.  For example, there could be multiple switches connecting to
   the same light bulb.  Currently, to achieve this, every switch needs
   to establish a P2P path to the bulb.  When the number of nodes
   connecting to the same node is high, the cost of establishing P2P
   paths could be very high.  RPL allows a 'floating' DODAG to be
   created, but the specification defines it for use under other
   circumstances.  To quote [RFC6550],

   "A grounded DODAG offers connectivity to hosts that are required for
   satisfying the application-defined goal.  ___A floating DODAG is not
   expected to satisfy the goal; in most cases, it only provides routes
   to nodes within the DODAG.
   Floating DODAGs may be used, for example, to preserve
   interconnectivity during repair.___"

   Thus it is not clear whether floating DODAGs can be put to use for
   establishing MP2P paths within the DODAG.

7.  RPL under-specification

   (a)  PathSequence: Is it mandatory to use PathSequence in the DAO
        Transit container?  RPL mentions that a 6LR/6LBR hosting the
        routing entry on behalf of a target node should refresh the
        lifetime on reception of a new Path Sequence.  But RPL does not
        necessarily mandate the use of Path Sequence.  Most of the open
        source implementations [RIOT] [CONTIKI] currently do not issue
        Path Sequence in the DAO message.

   (b)  Target Container aggregation in DAO: RPL allows multiple
        targets to be aggregated in a single DAO message and has
        introduced the notion of DelayDAO, using which a 6LR node can
        delay its DAO to enable such aggregation.  But RPL does not
        have clear text on the handling of aggregated DAOs, and this
        hinders interoperability.

   (c)  DTSN Update: RPL does not clearly define in which cases DTSN
        should be updated in storing mode of operation.  More details
        are presented in Section 3.

8.  Acknowledgements

9.  IANA Considerations

   This memo includes no request to IANA.

10.  Security Considerations

   This is an informational draft and does not add any changes to the
   existing specifications.

11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC6550]  Winter, T., Ed., Thubert, P., Ed., Brandt, A., Hui, J.,
              Kelsey, R., Levis, P., Pister, K., Struik, R., Vasseur,
              JP., and R. Alexander, "RPL: IPv6 Routing Protocol for
              Low-Power and Lossy Networks", RFC 6550,
              DOI 10.17487/RFC6550, March 2012.

   [RFC6551]  Vasseur, JP., Ed., Kim, M., Ed., Pister, K., Dejean, N.,
              and D.
              Barthel, "Routing Metrics Used for Path Calculation
              in Low-Power and Lossy Networks", RFC 6551,
              DOI 10.17487/RFC6551, March 2012.

   [RFC6552]  Thubert, P., Ed., "Objective Function Zero for the Routing
              Protocol for Low-Power and Lossy Networks (RPL)",
              RFC 6552, DOI 10.17487/RFC6552, March 2012.

   [RFC6775]  Shelby, Z., Ed., Chakrabarti, S., Nordmark, E., and C.
              Bormann, "Neighbor Discovery Optimization for IPv6 over
              Low-Power Wireless Personal Area Networks (6LoWPANs)",
              RFC 6775, DOI 10.17487/RFC6775, November 2012.

   [RFC6997]  Goyal, M., Ed., Baccelli, E., Philipp, M., Brandt, A., and
              J. Martocci, "Reactive Discovery of Point-to-Point Routes
              in Low-Power and Lossy Networks", RFC 6997,
              DOI 10.17487/RFC6997, August 2013.

11.2.  Informative References

   [I-D.clausen-lln-rpl-experiences]
              Clausen, T., Verdiere, A., Yi, J., Herberg, U., and Y.
              Igarashi, "Observations on RPL: IPv6 Routing Protocol for
              Low power and Lossy Networks",
              draft-clausen-lln-rpl-experiences-10 (work in progress),
              January 2018.

   [Perlman83]
              Perlman, R., "Fault-Tolerant Broadcast of Routing
              Information", North-Holland Computer Networks, Vol.7,
              December 1983.

Appendix A.  Additional Stuff

Authors' Addresses

   Rahul Arvind Jadhav (editor)
   Huawei
   Kundalahalli Village, Whitefield,
   Bangalore, Karnataka  560037
   India

   Phone: +91-080-49160700
   Email: rahul.ietf@gmail.com


   Rabi Narayan Sahoo
   Huawei
   Kundalahalli Village, Whitefield,
   Bangalore, Karnataka  560037
   India

   Phone: +91-080-49160700
   Email: rabinarayans@huawei.com


   Yuefeng Wu
   Huawei
   No.101, Software Avenue, Yuhuatai District,
   Nanjing, Jiangsu  210012
   China

   Phone: +86-15251896569
   Email: wuyuefeng@huawei.com