NETCONF Data Modeling Language Working Group (netmod)            E. Voit
Internet-Draft                                                  A. Clemm
Intended status: Informational                             Cisco Systems
Expires: September 10, 2015                                   S. Mertens
                                                                Prismtech
                                                            March 9, 2015

 Requirements for Peer Mounting of YANG subtrees from Remote Datastores
               draft-voit-netmod-peer-mount-requirements-02

Abstract

Network integrated applications want simple ways to access YANG
objects and subtrees which might be distributed across the network.
Performance requirements may dictate that it is unaffordable for a
subset of these applications to go through existing centralized
management brokers.  For such applications, development complexity
must be minimized.  Specific aspects of complexity developers want to
ignore include:

o  whether authoritative information is actually sourced from remote
   datastores (as well as how to get to those datastores),

o  whether such information has been locally cached or not,

o  whether there are zero, one, or more controllers asserting
   ownership of information, and

o  whether there are interactions with other applications
   concurrently running elsewhere.

The solution requirements described in this document detail what is
needed to support application access to authoritative network YANG
objects from controllers (star) or peering network devices (mesh) in
such a way as to meet these goals.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF).  Note that other groups may also distribute
working documents as Internet-Drafts.  The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on September 10, 2015.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.  Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.

Table of Contents

1.  Business Problem
2.  Terminology
3.  Solution Context
    3.1.  Peer Mount
    3.2.  Eventual Consistency and YANG 1.1
4.  Example Use Cases
    4.1.  Cloud Policer
    4.2.  DDoS Thresholding
    4.3.  Service Chain Classification, Load Balancing and Capacity
          Management
5.  Requirements
    5.1.  Application Simplification
    5.2.  Caching
    5.3.  Subscribing to Remote Object Updates
    5.4.  Lifecycle of the Mount Topology
          5.4.1.  Discovery and Creation of Mount Topology
          5.4.2.  Restrictions on the Mount Topology
    5.5.  Mount Filter
    5.6.  Auto-Negotiation of Peer Mount Client QoS
    5.7.  Datastore Qualification
    5.8.  Local Mounting
    5.9.  Mount Cascades
    5.10. Transport
    5.11. Security Considerations
    5.12. High Availability
          5.12.1.  Reliability
          5.12.2.  Alignment to late joining peers
          5.12.3.  Liveliness
          5.12.4.  Merging of datasets
          5.12.5.  Distributed Mount Servers
    5.13. Configuration
    5.14. Assurance and Monitoring
6.  IANA Considerations
7.  Acknowledgements
8.  References
    8.1.  Normative References
    8.2.  Informative References
    8.3.  URIs
Authors' Addresses
1.  Business Problem

Instrumenting Physical and Virtual Network Elements purely along
device boundaries is insufficient for today's requirements.  Instead,
users, applications, and operators are asking for the ability to
interact with varying subsets of network information at the highest
viable level of abstraction.  Likewise, applications that run locally
on devices may require access to data that transcends the boundaries
of the device on which they are deployed.  Achieving this can be
difficult since a running network comprises a distributed mesh of
object ownership.  (I.e., the authoritative device owning a particular
object will vary.)  Solutions require the transparent assembly of
different objects from across a network in order to provide the
consolidated, time synchronized, and consistent views required for
that abstraction.

Recent approaches have focused on a Network Controller as the arbiter
of new network-wide abstractions.  Controller based solutions are
supported by the requirements outlined in this document.  However,
this is not the only deployment model covered by this document.
Equally valid are deployment models where Network Elements exchange
information in a way which allows one or more of those Elements to
provide the desired network level abstraction.  This is not a new
idea.  Examples of Network Element based protocols which already
provide network level abstractions include VRRP [RFC3768],
mLACP/ICCP [ICCP], and Anycast-RP [RFC4610].  As network elements
increase their compute power and support Linux based compute
virtualization, we should expect additional local applications to
emerge as well (such as Distributed Analytics [1]).

Ultimately network application programming must be simplified.  To do
this:

o  we must provide APIs to both controller and network element based
   applications in a way which allows access to network objects as if
   they were coming from a cloud,

o  we must enable these local applications to interact with network
   level abstractions,

o  we must hide the mesh of interdependencies and consistency
   enforcement mechanisms between devices which will underpin a
   particular abstraction,

o  we must enable flexible deployment models, in which applications
   are able to run not only on controller and OSS frameworks but also
   on network devices without requiring heavy middleware with large
   footprints, and

o  we need to maintain clear authoritative ownership of individual
   data items while not burdening applications with the need to
   reconcile and synchronize information replicated in different
   systems, nor needing to maintain redundant data models that operate
   on the same underlying data.

These steps will eliminate much unnecessary overhead currently
required of today's network programmer.

2.  Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

Authoritative Datastore - A datastore containing the authoritative
copy of an object, i.e. the source and the "owner" of the object.
Client Datastore - A datastore containing an object whose source and
"owner" is a remote datastore.

Data Node - An instance of management information in a YANG
datastore.

Datastore - A conceptual store of instantiated information, with
individual data items represented by data nodes which are arranged in
a hierarchical manner.

Data Subtree - An instantiated data node and the data nodes that are
hierarchically contained within it.

Mount Client - The system at which the mount point resides, into
which one or more remote subtrees may be mounted.

Mount Binding - An instance of mounting from a specific Mount Point
to a remote datastore.  Types include:

o  On-demand: the Mount Client only pulls information when an
   application requests it

o  Periodic: the Mount Server pushes current state at a pre-defined
   interval

o  Unsolicited: the Mount Server maintains active bindings and sends
   updates to the client cache upon change

Mount Point - A point in the local datastore which may reference a
single remote subtree.

Mount Server - The server with which the Mount Client communicates
and which provides the Mount Client with access to the mounted
information.  Can be used synonymously with Mount Target.

Peer Mount - The act of representing remote objects in the local
datastore.

Target Data Node - The Data Node on the Mount Server against which a
Mount Binding is established.

3.  Solution Context

YANG modeling has emerged as a preferred way to offer network
abstractions.  The requirements in this document can be enabled by
expanding the syntax of the YANG capabilities embodied within RFC 6020
[RFC6020] and YANG 1.1 [rfc6020bis].  A companion draft, which details
a potential set of YANG technology extensions that can support key
requirements within this document, is [draft-clemm-mount].

To date, systems built upon YANG models have been missing two
capabilities:

1.  Peer Datastore Mount: Datastores have not been able to proxy
    objects located elsewhere.  This puts additional burden upon
    applications which then need to find and access multiple
    (potentially remote) systems.

2.  Eventual Consistency: YANG Datastore implementations have
    typically assumed ACID [2] transaction models.  There is nothing
    inherent in YANG itself which demands ACID transactional
    guarantees.  YANG models can also expose information which might
    be in the process of undergoing convergence.  Since IP networking
    has been designed with convergence in mind, this is a useful
    capability since some types of applications must participate
    where there is dynamically changing state.

3.1.  Peer Mount

First this document will dive deeper into Peer Datastore Mount
(a.k.a., "Peer Mount").  Contrary to existing YANG datastores, where
hierarchical data trees are local in scope and only include data that
is "owned" by the local system, we need an agent or interface on one
system which is able to refer to managed resources that reside on
another system.  This allows applications on the same system as the
YANG datastore server, as well as remote clients that access the
datastore through a management protocol such as NETCONF, to access
all data as if it were local to that same server.  This must be done
in a manner that is transparent to users and applications, without
requiring them to be aware that some data resides in a different
location or to directly access that other system.  In this way, the
user is projected an image of one virtual consolidated datastore.
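As a purely illustrative sketch of the concept, the fragment below
shows how a local module might declare an attachment point under
which a subtree owned by a remote peer appears.  The module name, the
"mountpoint" and "target" extensions, and the target path are all
hypothetical and are defined inline only to keep the sketch
self-contained; they do not represent the syntax defined in
[draft-clemm-mount].

   module example-peer-mount {
     namespace "urn:example:peer-mount";
     prefix epm;

     // Hypothetical extensions marking a Peer Mount attachment point.
     extension mountpoint {
       argument name;
       description "Declares an attachment point for a remote subtree.";
     }
     extension target {
       argument path;
       description "Path to the Target Data Node on the Mount Server.";
     }

     container network-view {
       description
         "Locally optimized hierarchy projected to applications.";

       container edge-interfaces {
         description
           "Interface counters whose authoritative copies reside on a
            remote edge router; applications read them as local data.";
         epm:mountpoint "edge-if-counters" {
           epm:target "/interfaces-state/interface/statistics";
         }
       }
     }
   }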
The value in such a datastore comes from its under-the-covers
federation.  The datastore transparently exposes information from
multiple systems across the network.  The user does not need to be
aware of the precise distribution and ownership of the data, nor is
there a need for the application to discover those data sources,
maintain separate associations with them, and partition its
operations to fit along remote system boundaries.  The effect is that
a network device can broaden and customize the information available
for local access.  Life for the application is easier.

Any object type can be included in such a datastore.  This can
include configuration data that is either persistent or ephemeral,
and which is valid within only a single device or across a domain of
devices.  This can include operational data that represents state
across a single device or across multiple devices.

Another useful aspect of "Peer Mount" is its ability to embed
information from external YANG models which haven't necessarily been
normalized.  Normalization is a good thing.  But the massive human
efforts invested in uber-data-models have never gained industry
traction due to the resulting models' brittle nature and complexity.
By mounting remote trees/objects into local datastores it is possible
to expose remote objects under a locally optimized hierarchy without
having to transpose remote objects into a separate local model.  Once
this exists, object translation and normalization become optional
capabilities which may also be hidden.

Another useful aspect of "Peer Mount" is its ability to mount remote
trees where the local datastore does not know the full subtree being
installed.  In fact, the remote datastore might be dynamically
changing the mounted tree.  These dynamic changes can be reflected as
needed under the "attachment points" within the namespace hierarchy
where the data subtrees from remote systems have been mounted.  In
this case, the precise details of what these subtrees exactly contain
do not need to be understood by the system implementing the
attachment point; it simply acts as a single point of entry and
"proxy" for the attached data.

3.2.  Eventual Consistency and YANG 1.1

The CAP theorem [3] states that it is impossible for a distributed
computer system to simultaneously provide Consistency, Availability,
and Partition tolerance.  (I.e., distributed network state management
is hard.)  Mostly for this reason, YANG implementations have shied
away from distributed datastore implementations where ACID
transactional guarantees cannot be given.  This of course limits the
universe of applicability for YANG technology.

Leveraging YANG concepts, syntax, and models for objects which might
be undergoing network convergence is valuable.  Such reuse greatly
expands the universe of information visible to networking
applications.  The good news is that there is nothing in YANG 1.1
syntax that prohibits its reapplication for distributed datastores.
Extensions are needed, however.
Requirements described within this document can be used to define
technology extensions to YANG 1.1 for remote datastore mounting.
Because of the CAP theorem, it must be recognized that systems built
upon these extensions MAY choose to support eventual consistency
rather than ACID guarantees.  Some applications do not demand ACID
guarantees (examples are contained in this document's Use Case
section).  Therefore, for certain classes of applications, eventual
consistency [4] should be viewed as a cornerstone feature capability
rather than a bug.

Other industries have been able to identify and realize the value in
such a model.  The Object Management Group Data-Distribution Service
for Real-Time Systems has even standardized these capabilities for
non-YANG deployments [OMG-DDS].  Commercial deployments exist.

4.  Example Use Cases

Many types of applications can benefit from the simple and quick
availability of objects from peer network devices.  Because network
management and orchestration systems have been fulfilling a subset of
the requirements for decades, it is important to focus on what has
changed.  Changes include:

o  SDN applications wish to interact with local datastore(s) as if
   they represent the real-time state of the distributed network.

o  Independent sets of applications and SDN controllers might care
   about the same authoritative data node or subtree.

o  Changes in the real-time state of objects can announce themselves
   to subscribing applications.

o  The union of an ever increasing number of abstractions provided
   from different layers of the network are assumed to be consistent
   with each other (at least once a reasonable convergence time has
   been factored in).

o  CPU and VM improvements make running Linux based applications on
   network elements viable.

Such changes can enable a new class of applications.  These
applications are built upon fast-feedback-loops which dynamically
tune the network based on iterative interactions upon a distributed
datastore.

4.1.  Cloud Policer

A Cloud Policer enables a single aggregated data rate for tenants/
users of a data center cloud, a rate that applies across their VMs
independent of where specific VMs are physically hosted.  This works
by having edge router based traffic counters available to a
centralized application, which can then maintain an aggregate across
those counters.  Based on the sum of the counters across the set of
edge routers, new values for each device based Policer can be
recalculated and installed.  Effectively, policing rates are
continuously rebalanced based on the most recent traffic offered to
the aggregate set of edge devices.

The cloud policer provides a very simple cloud QoS model.  Many other
QoS models could also be implemented.  Example extensions include:

o  CIR/PIR guarantees for a tenant,

o  hierarchical QoS treatment,

o  providing traffic delivery guarantees for specific enterprise
   branch offices, and

o  adjusting the prioritization of one application based on the
   activity of another application which perhaps is in a completely
   different location.

It is possible to implement such a cloud policer application with
maximum application developer simplicity using peer mount.  To do
this, the application accesses a local datastore which in turn peer
mounts, from the edge routers, the objects which house current
traffic counter statistics.  These counters are accessed as if they
were part of the local datastore structures, without concern for the
fact that the actual authoritative copies reside on remote systems.
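The following hypothetical module sketches what the controller-side
datastore for such a cloud policer might look like: the per-edge-
router counters appear under an attachment point whose contents are
peer mounted from the edge routers, while the recalculated per-device
rates are locally owned and can in turn be mounted back by those
routers.  All names are illustrative only and are not defined by any
referenced draft.

   module example-cloud-policer {
     namespace "urn:example:cloud-policer";
     prefix ecp;

     container tenant-traffic {
       config false;
       description
         "Attachment point: per edge router byte counters whose
          authoritative copies are peer mounted from each edge router.";
       list edge-router {
         key "name";
         leaf name       { type string; }
         leaf in-octets  { type uint64; }
         leaf out-octets { type uint64; }
       }
     }

     container policer-rates {
       description
         "Locally owned results of the central recalculation.  Each
          edge router peer mounts its own entry and applies it as if
          it were local configuration.";
       list device {
         key "name";
         leaf name           { type string; }
         leaf committed-rate { type uint64; units "bits/second"; }
         leaf peak-rate      { type uint64; units "bits/second"; }
       }
     }
   }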
Beyond this centralized counter collection peer mount, it is also
possible to have distributed edge routers mount information in the
reverse direction.  In this case, local edge routers can peer mount
the centrally calculated policer rates for the device, and access
these objects as if they were locally configured.

For both directions of mounting, the authoritative copy resides in a
single system and is mounted by peers.  Therefore issues with regards
to inconsistent configuration of the same redundant data across the
network are avoided.  As can also be seen in this use case, the same
system can act as a mount client of some objects while acting as a
server for other objects.

4.2.  DDoS Thresholding

Another extension of the "Cloud Policer" application is the creation
of additional action thresholds at bandwidth rates far greater than
might be expected.  If these higher thresholds are hit, it is
possible to connect in DDoS scrubbers to ingress traffic.  This can
be done in seconds after a bandwidth spike.  This can also be done if
non-bandwidth counters are available.  For example, if TCP flag
counts are available it is possible to look for changes in SYN/ACK
ratios which might signal a different type of attack.  In all cases,
when network counters indicate a return to normal traffic profiles
the DDoS scrubbers can be automatically disconnected.

Benefits of only connecting a DDoS scrubber in the rare event an
attack might be underway include:

o  marking down traffic for an out-of-profile tenant so that a
   potential attack doesn't adversely impact others,

o  applying DDoS scrubbing across many devices when an attack is
   detected in one,

o  reducing DDoS scrubber CPU, power, and licensing requirements
   (during the vast majority of time, spikes are not occurring), and

o  dynamic management and allocation of scarce platform resources
   (such as optimizing span port usage, or limiting IP-FIX reporting
   to levels where devices can do full flow detail exporting).

4.3.  Service Chain Classification, Load Balancing and Capacity
      Management

Service chains will dynamically change ingress classification filters
and allocate paths from many ingress devices across shared resources.
This information needs to be updated in real time as available
capacity is allocated or failures are discovered.  It is possible to
simplify service chain configuration and dynamic topology maintenance
by transparently updating remote cached topologies when an
authoritative object is changed within a central repository.  For
example, if the CPU in one VM spikes, you might want to recalculate
and adjust many chained paths to relieve the pressure.  Or perhaps
after the recalculation you want to spin up a new VM, and then adjust
chains when that capacity is on-line.

A key value here is central calculation and transparent auto-
distribution.
In other words, a change need only be made by an application in a
single location, and the infrastructure will automatically
synchronize changes across any number of subscribing devices without
application involvement.  In fact, the application need not even know
how many devices are monitoring the object which has been changed.

Beyond 1:n policy distribution, applications can step back from
aspects of failure recovery.  What happens if a device is rebooting
or simply misses a distribution of new information?  With peer mount
there is no doubt as to where the authoritative information resides
if things get out of synch.

While this ability is certainly useful for dynamic service chain
filtering classification and next hop mapping, this use case has more
general applicability.  With a distributed datastore, diverse
applications and hosts can locally access a single device's current
VM CPU and bandwidth values.  They can do it without needing to
explicitly query that remote machine.  Updates from a device would
come either from a periodic push of statistics to a transparent cache
to which others have subscribed, or via an unsolicited update which
is only sent when these values exceed established norms.

5.  Requirements

To achieve the objectives described above, the network needs to
support a number of requirements.

5.1.  Application Simplification

A major obstacle to network programmability is any requirement which
forces applications to use abstractions more complicated than the
developer cares to touch.  To simplify application development and
reduce unnecessary code, the following needs must be met.

Applications MUST be able to access a local datastore which includes
objects whose authoritative source is located in a remote datastore
hosted on a different server.

Local datastores MUST be able to provide a hierarchical view of
objects assembled from objects whose authoritative source may
originate from potentially different and overlapping namespaces.

Applications MUST be able to access all objects of a datastore
without concern for where the actual object is located, i.e. whether
the authoritative copy of the object is hosted on the same system as
the local datastore or whether it is hosted in a remote datastore.

With two exceptions, a datastore's application facing interfaces MUST
make no differentiation as to whether individual objects exposed are
authoritatively owned by the datastore or mounted from a remote
datastore.  This includes NETCONF and RESTCONF as well as other,
possibly proprietary interfaces (such as CLI generated from
corresponding YANG data models).  The two exceptions, for which it is
acceptable to make a distinction between an object authoritatively
owned by the datastore and a remote object, are as follows:

o  Object updates: editing, creation, and deletion.  E.g., via edit-
   config, conditions and constraints are assessed at the
   authoritative datastore when the update/create/delete is
   conducted.  Any conditions or constraints at remote client
   datastores are NOT assessed.
o  Locks obtained at a client datastore: it is conceivable for the
   interface to distinguish between two lock modes: locking the
   entire subtree including remote data (in which case the
   datastore's mount client needs to explicitly obtain and release
   locks from mounted authoritative datastores), or locking only
   authoritatively owned data, excluding remote data from the lock.

These exceptions should not be very problematic, as non-authoritative
copies will typically be marked as read-only.  This will not violate
any considerations of "no differentiation" between local and remote.

When a change is made to an object, that change will be reflected in
any datastore in which the object is included.  This means that a
change made to the object through a remote datastore will affect the
object in the authoritative datastore.  Likewise, changes to an
object in the authoritative datastore will be reflected at any client
datastores.

The distributed datastore MUST be able to include objects from
multiple remote datastores.  The same object may be included in
multiple remote datastores; in other words, an object's authoritative
datastore MUST support multiple clients.

The distributed datastore infrastructure MUST enable access to some
subset of the same objects on different devices.  (This includes
multiple controllers as well as multiple physical and virtual peer
devices.)

Applications SHOULD be able to extract a time synchronized set of
operational data from the datastore.  (In other words, the
application asks for a subset of network state at time-stamp or time-
range "X".  The datastore would then deliver time synchronized
snapshots of the network state per the request.  The datastore may
work with NTP and operational counters to optimize the
synchronization results of such a query.  It is understood that some
types of data might be undergoing convergence conditions.)

Authoritative datastores retain full ownership of "their" objects.
This means that while remote datastores may access the data, any
modifications to objects that are initiated at those remote
datastores need to be authorized by the authoritative owner of the
data.  Likewise, the authoritative owner of the data may make changes
to objects, including modifications, additions, and deletions,
without needing to first ask for permission from remote clients.

Applications MUST be designed to deal with incomplete data if remote
objects are not accessible, e.g. due to temporary connectivity issues
preventing access to the authoritative source.  (This will be true
for many protocols and programming languages.  Mount is unlikely to
add anything new here beyond the need for extra error handling
routines which deal with cases where there is no response from a
remote system.)

5.2.  Caching

Remote objects in a datastore can be accessed "on demand", when the
application interacting with the datastore demands it.  In that case,
a request made to the local datastore is forwarded to the remote
system.  The response from the remote system, e.g. the retrieved
data, is subsequently merged and collated with the other data to
return a consolidated response to the invoking application.

A downside of a datastore which is distributed across devices can be
the latency induced when remote object acquisition is necessary.
There are plenty of applications which have requirements which simply
cannot be served when latency is introduced.  The good news is that
the concept of caching lends itself well to distributed datastores.
It is possible to transparently store some types of objects locally
even when the authoritative copy is remote.  Instead of fetching data
on demand when an application requests it, the application is simply
provided with the local copy.  It is then up to the datastore
infrastructure to keep selected replicated information in sync, e.g.
by prefetching information, or by having the remote system publish
updates which are then locally stored.  At this point, it is expected
that a preferred method of subscribing to and publishing updates will
be accomplished via [yang-pub-sub-reqts] and
[draft-clemm-datastore-push].  Other methods could work equally well.

This is not a new idea.  Caching and Content Delivery Networks (CDN)
have sped read access for objects within the Internet for years.
This has enabled greater performance and scale for certain content.
Just as important, these technologies have been employed without end
user applications being explicitly aware of their involvement.  Such
concepts are applicable for scaling the performance of a distributed
datastore.

Where caching occurs, it MUST be possible for the Mount Client to
store object copies of a remote data node or subtree in such a way
that applications are unaware that any caching is occurring.
However, the interface to a datastore MAY provide applications with a
special mode/flag to allow them to force a read-through.

Where caching occurs, system administration facilities SHOULD allow
the flushing of either the entire cache, or of information associated
with selected Mount Points.
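As a purely illustrative sketch of the kind of administrative knobs
these caching requirements imply (none of the names below are defined
by any referenced draft), a Mount Client might expose per Mount Point
cache controls along the following lines:

   module example-mount-cache {
     yang-version 1.1;
     namespace "urn:example:mount-cache";
     prefix emc;

     container mount-cache {
       list mount-point {
         key "id";
         leaf id { type string; }

         leaf mode {
           type enumeration {
             enum on-demand;   // no caching; fetch per request
             enum prefetch;    // infrastructure refreshes periodically
             enum subscribe;   // Mount Server publishes changes
           }
           default on-demand;
         }

         leaf read-through {
           type boolean;
           default false;
           description
             "When true, application reads bypass the cache and are
              forwarded to the authoritative datastore.";
         }
       }

       action flush {
         description
           "Flush the entire cache, or only the entry named by
            'mount-point-id' if it is supplied.";
         input {
           leaf mount-point-id { type string; }
         }
       }
     }
   }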
5.3.  Subscribing to Remote Object Updates

When caching occurs, data can go stale.  [draft-clemm-datastore-push]
provides a mechanism where changes in an authoritative data node or
subtree can be monitored.  If changes occur, these changes can be
delivered to any subscribing datastores.  In this way remote caches
can be kept up-to-date, and remote applications directly monitoring
the data can quickly receive notifications without continuous
polling.

A Mount Server SHOULD support [draft-clemm-datastore-push] Periodic
or On-Change pub/sub capabilities in which one or more remote clients
subscribe to updates of a target data node / subtree, which are then
automatically published by the Mount Server.

It MUST be possible for applications to bind to subscribed Data Nodes
/ Subtrees so that upon Mount Client receipt of subscribed
information, it is immediately passed to the application.

It MUST be possible for a Target Data Node to support 1:n Mount
Bindings to many subscribed Mount Points.

5.4.  Lifecycle of the Mount Topology

Mount can drive a dynamic and richly interconnected mesh of peer-to-
peer object relationships.  Each of these Mounts will be
independently established by a Mount Client.

It MUST be possible to bootstrap the Mount Client by providing the
YANG paths to resources on the Mount Server.

There SHOULD be the ability to add Mount Client bindings during run-
time.

A Mount Client MUST be able to create, delete, and time out Mount
Bindings.

Any Subscription MUST be able to inform the Mount Client of an
intentional/graceful disconnect.

A Mount Client MUST be able to verify the status of Subscriptions,
and drive re-establishment if one has disappeared.
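To make the lifecycle requirements above more concrete, the following
sketch shows one possible client-side configuration of Mount
Bindings, covering bootstrap of the Mount Server address and target
path, the binding types from the Terminology section, the Datastore
Qualification discussed in Section 5.7, and a timeout.  The module
and all of its nodes are hypothetical; an implementation could choose
a very different structure.

   module example-mount-bindings {
     namespace "urn:example:mount-bindings";
     prefix emb;

     container mount-bindings {
       list binding {
         key "mount-point";

         leaf mount-point {
           type string;
           description "Local Mount Point this binding populates.";
         }
         leaf mount-server {
           type string;
           description "Address or name of the Mount Server.";
         }
         leaf target-path {
           type string;
           description
             "YANG path of the Target Data Node on the Mount Server.";
         }
         leaf datastore {
           type string;
           default "running";
           description
             "Datastore Qualification: which remote datastore to mount
              when the server exposes several (see Section 5.7).";
         }
         leaf type {
           type enumeration {
             enum on-demand;
             enum periodic;
             enum unsolicited;
           }
           default on-demand;
         }
         leaf period {
           type uint32;
           units "seconds";
           description "Push interval when type is 'periodic'.";
         }
         leaf idle-timeout {
           type uint32;
           units "seconds";
           description
             "Time without liveliness after which the Mount Client may
              tear down the binding.";
         }
       }
     }
   }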
5.4.1.  Discovery and Creation of Mount Topology

Application visibility into an ever-changing set of network objects
is not trivial.  While some applications can be easily configured to
know the Devices and available Mount Points of interest, other
applications will have to balance many aspects of dynamic device
availability, capabilities, and interconnectedness.  For the most
part, maintenance of these dynamic elements can be done on the YANG
objects themselves without anything new needed for Peer Mount.  Such
technologies are covered in other standards initiatives [reference
needed].  Therefore this draft does not delve deeply into the needs
for auto-discovery of YANG objects which may be advertised.

However, it will likely become interesting for a network element to
limit the Data Subtrees which might be subscribed for Unsolicited and
Periodic Updates.  It is assumed these capabilities will be included
as part of [draft-clemm-datastore-push].

5.4.2.  Restrictions on the Mount Topology

Mount Clients MUST NOT create recursive Mount Bindings (i.e., the
Mount Client should not load any object or subtree which it has
already delivered to another in the role of a Mount Server).  Note:
objects mounted from a controller as part of orchestration are *not*
considered the same objects as those which might be mounted back from
a network device showing the actual running config.

5.5.  Mount Filter

The Mount Server default MUST be to deliver the same Data Node /
Subtree that would have been delivered via direct YANG access.

It SHOULD be possible for a Mount Client to request something less
than the full subtree of a target node, as defined in
[yang-pub-sub-reqts].

5.6.  Auto-Negotiation of Peer Mount Client QoS

The interest that a Mount Client expresses in a particular subtree
SHOULD include the non-functional data delivery requirements (QoS) on
the data that is being mounted.  Additionally, Mount Servers SHOULD
advertise their data delivery capabilities.  With this information
the Mount Client can decide whether the quality of the delivered data
is sufficient to serve the applications residing above the Mount
Client.

An example here is reliability.  A reliable protocol might be
overkill for state that is republished with high frequency.
Therefore a Mount Server may sometimes choose not to provide a
reliable method of communication for certain objects.  It is up to
the Mount Client to determine whether what is offered is sufficiently
reliable for its application.  Only when the Mount Server offers data
delivery QoS better than or equal to what is requested shall a mount
binding be established.

Another example is where subscribed objects must be pushed from the
Mount Server within a certain interval from when an object change is
identified.  In such a scenario the interval period of the Mount
Server must be equal to or smaller than what is requested by a Mount
Client.  If this "deadline" is not met by the Mount Server, the
infrastructure MAY take action to notify clients.
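A hypothetical sketch of the QoS parameters that both a Mount Client
request and a Mount Server advertisement might carry follows; the
names are invented for illustration, and a mount binding would only
be established when the advertised values satisfy the requested ones.

   module example-mount-qos {
     namespace "urn:example:mount-qos";
     prefix emq;

     grouping data-delivery-qos {
       description
         "Non-functional data delivery characteristics; used both in a
          Mount Client request and in a Mount Server advertisement.";
       leaf reliable-delivery {
         type boolean;
         description
           "Whether updates are delivered over a reliable method.";
       }
       leaf max-push-interval {
         type uint32;
         units "milliseconds";
         description
           "Longest acceptable (requested) or guaranteed (advertised)
            delay between an object change and the corresponding push.";
       }
     }
   }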
5.7.  Datastore Qualification

It is conceivable to differentiate between different datastores on
the remote server, that is, to designate the name of the actual
datastore to mount, e.g. "running" or "startup".  If there are
multiple datastores available on the target node, but no specific
datastore has been identified by the Mount Client, then the running
or "effective" datastore is the assumed target.

It is conceivable to use such Datastore Qualification in conjunction
with ephemeral datastores, to address requirements being worked in
the I2RS WG [draft-haas].

5.8.  Local Mounting

It is conceivable that the mount target does not reside in a remote
datastore, but that data nodes in the same datastore as the
mountpoint are targeted for mounting.  This amounts to introducing an
"aliasing" capability in a datastore.  While this is not the scenario
that is primarily targeted, it is supported and there may be valid
use cases for it.

5.9.  Mount Cascades

It is possible for the mounted subtree to in turn contain a
mountpoint.  However, circular mount relationships MUST NOT be
introduced.  For this reason, a mounted subtree MUST NOT contain a
mountpoint that refers back to the mounting system with a mount
target that directly or indirectly contains the originating
mountpoint.  As part of a mount operation, the mount points of the
mounted system need to be checked accordingly.

5.10.  Transport

Many secured transports are viable assuming transport, data security,
scale, and performance objectives are met.  NETCONF is recommended as
a starting point.  Other transports may be proposed over time.

It MUST be possible to support NETCONF transport of subscribed Nodes
and Subtrees.

5.11.  Security Considerations

Many security mechanisms exist to protect data access for CLI and API
on network devices.  To the degree possible, these mechanisms should
transparently protect data when performing a Peer Mount.

The same mechanisms used to determine whether a remote host has
access to a particular YANG Data Node or Subtree MUST be invoked to
determine whether a Mount Client has access to that information.

The same traditional transport level security mechanisms used for
YANG over a particular transport MUST be used for the delivery of
objects from a Mount Server to a Mount Client.

A Mount Server implementation MUST NOT change any credentials passed
by the Mount Client system for any Mount Binding request.

The Mount Server MUST deliver no more objects from a Data Node or
Subtree than allowable based on the security credentials provided by
the Mount Client.

To ensure that maximum scale limits are respected, it MUST be
possible for a Mount Server to limit the number of bindings and to
impose transactional limits.

It SHOULD be possible to prioritize which Mount Binding instances
should be serviced first if there are CPU, bandwidth, or other
capacity constraints.

5.12.  High Availability

A key intent for Peer Mount is to allow access to an authoritative
copy of an object for a particular domain.  Of course, system and
software failures or scheduled upgrades might mean that the primary
copy is not consistently accessible from a single device.  In
addition, system failovers might mean that the authoritative copy
might be housed on a different device than the one where the binding
was originally established.
Peer Mount architectures must be built to enable Mount Clients to
transparently provide access to objects where the authoritative copy
moves due to dynamic network reconfigurations.

A Peer Mount architecture MUST guarantee that mount bindings between
a Mount Server and Mount Clients are eventually consistent.  The
infrastructure providing this level of consistency MUST be able to
operate in scenarios where a system is (temporarily) not fully
connected.  Furthermore, Mount Clients MAY have various requirements
on the boundaries under which eventual consistency is allowed to take
place.  This subject can be decomposed into the following items:

5.12.1.  Reliability

Eventual consistency can only be guaranteed when peers are
communicating using a reliable method of data delivery.  A scenario
that deserves attention in particular is when only a subset of Mount
Clients has received a pushed subscription update.  If a Mount Server
loses connectivity, cross network element consistency can be lost.
In such a scenario Mount Clients MAY elect a new designated Mount
Server from the set of Mount Clients which have received the latest
state.

5.12.2.  Alignment to late joining peers

When a mount binding is established, a Mount Server SHOULD provide
the Mount Client with the latest state of the requested data.  In
order to increase availability and fault tolerance, an infrastructure
MAY support the capability to have multiple alignment sources.  In
the (temporary) absence of a Mount Server, Mount Clients MAY elect a
temporary Mount Server to service late joining Mount Clients.

5.12.3.  Liveliness

Upon losing liveliness and being unable to refresh cached data
provided from a Mount Server, a Mount Client MAY decide to purge the
mount bindings of that server.  Purging mount bindings under such
conditions however makes a system vulnerable to losing network-wide
consistency.  A Mount Client can take proactive action based on the
assumption that the Mount Server is no longer available.  When
connectivity is only temporarily lost, this assumption could be false
for other datastores.  This can introduce a potential for decision-
making based on semantic disagreement.  To properly handle these
scenarios, application behavior MUST be designed accordingly and
timeouts with regards to liveliness detection MUST be carefully
determined.

5.12.4.  Merging of datasets

A traditional problem with merging replicated datasets during the
failover and recovery of Mount Servers is handling the corresponding
target data node lifecycle management.  When two replicas of a
dataset experience a prolonged loss of connectivity, a merge between
the two is required upon re-establishing connectivity.  A replica
might have been modifying the contents of the set, including deletion
of objects.  A naive merge of the two replicas would discard these
deletes by re-aligning the now stale, deleted objects onto the
replica that deleted them.

Authoritative ownership is an elegant solution to this problem since
modifications of content can only take place at the owner.  Therefore
a Mount Client SHOULD, upon reestablishing connectivity with a newly
authoritative Mount Server, replace any existing cache contents from
a mount binding with the latest version.
5.12.5.  Distributed Mount Servers

For selected objects, Mount Bindings SHOULD be allowed to use Anycast
addresses so that a Distributed Mount Server implementation can
transparently provide (a) availability during failure events to Mount
Clients, and (b) load balancing on behalf of Mount Clients.

5.13.  Configuration

At the Mount Client, it MUST be possible to configure all Mount
Bindings such that applications need no knowledge of the underlying
details.  This configuration will include a diverse list of elements,
such as the YANG URI path to the remote subtree.

5.14.  Assurance and Monitoring

API usage for YANG should be tracked via existing mechanisms.  There
is no intent to require more transaction tracking than would normally
be provided.  However, there are additional requirements which should
allow the state of existing and historical bindings to be provided.

A Mount Client MUST be able to poll a Mount Server for the state of
Subscriptions maintained between the two devices.

A Mount Server MUST be able to publish the set of Subscriptions which
are currently established on or below any identified data node.

6.  IANA Considerations

This document makes no request of IANA.

7.  Acknowledgements

We wish to acknowledge the helpful contributions, comments, and
suggestions that were received from Ambika Prasad Tripathy, Shashi
Kumar Bansal, Prabhakara Yellai, Dinkar Kunjikrishnan, Harish
Gumaste, Rohit M., Shruthi V., Sudarshan Ganapathi, and Swaroop
Shastri.

8.  References

8.1.  Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3768]  Hinden, R., "Virtual Router Redundancy Protocol (VRRP)",
           RFC 3768, April 2004.

[RFC4610]  Farinacci, D. and Y. Cai, "Anycast-RP Using Protocol
           Independent Multicast (PIM)", RFC 4610, August 2006.

[RFC6020]  Bjorklund, M., "YANG - A Data Modeling Language for the
           Network Configuration Protocol (NETCONF)", RFC 6020,
           October 2010.

8.2.  Informative References

[ICCP]     Martini, L., "Inter-Chassis Communication Protocol for
           L2VPN PE Redundancy", March 2014.

[OMG-DDS]  "Data Distribution Service for Real-time Systems, version
           1.2", January 2007.

[draft-clemm-datastore-push]
           Clemm, A., "Subscribing to datastore push updates", March
           2015.

[draft-clemm-mount]
           Clemm, A., "Mounting YANG-Defined Information from Remote
           Datastores", October 2014.

[draft-haas]
           Haas, J., "I2RS requirements for netmod/netconf
           draft-haas-i2rs-netmod-netconf-requirements-00", September
           2014.

[rfc6020bis]
           Bjorklund, M., "YANG - A Data Modeling Language for the
           Network Configuration Protocol (NETCONF)", January 2015.

[yang-pub-sub-reqts]
           Voit, E., Clemm, A., and A. Gonzalez Prieto, "Requirements
           for Subscription to YANG Datastores", March 2015.

8.3.  URIs

[1] http://thomaswdinsmore.com/2014/05/01/distributed-analytics-
    primer/

[2] http://en.wikipedia.org/wiki/ACID

[3] http://robertgreiner.com/2014/08/cap-theorem-revisited/

[4] http://guide.couchdb.org/draft/consistency.html

Authors' Addresses

Eric Voit
Cisco Systems

Email: evoit@cisco.com

Alex Clemm
Cisco Systems

Email: alex@cisco.com

Sander Mertens
Prismtech

Email: sander.mertens@prismtech.com