NFVRG                                                        C. Meirosu
Internet Draft                                                 Ericsson
Intended status: Informational                             A. Manzalini
Expires: January 2016                                    Telecom Italia
                                                                 J. Kim
                                                       Deutsche Telekom
                                                            R. Steinert
                                                                   SICS
                                                              S. Sharma
                                                                 iMinds
                                                           G. Marchetto
                                                  Politecnico di Torino
                                                            I. Papafili
                                 Hellenic Telecommunications Organization
                                                         K. Pentikousis
                                                                   EICT
                                                              S. Wright
                                                                   AT&T

                                                           July 6, 2015

          DevOps for Software-Defined Telecom Infrastructures
                    draft-unify-nfvrg-devops-02.txt

Abstract

Carrier-grade network management was optimized for environments built
with monolithic physical nodes and involves significant deployment,
integration and maintenance efforts from network service providers.
The introduction of virtualization technologies, from the physical
layer all the way up to the application layer, however, invalidates
several well-established assumptions in this domain. This draft opens
the discussion in NFVRG about challenges related to transforming the
telecom network infrastructure into an agile, model-driven production
environment for communication services. We take inspiration from data
center DevOps regarding how to simplify and automate management
processes for a telecom service provider software-defined
infrastructure (SDI). Finally, we introduce challenges associated with
operationalizing DevOps principles at scale in software-defined
telecom networks in three key areas: monitoring, verification and
troubleshooting processes.

Status of this Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF), its areas, and its working groups. Note that other
groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html

This Internet-Draft will expire on January 6, 2016.

Copyright Notice

Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents carefully,
as they describe your rights and restrictions with respect to this
document. Code Components extracted from this document must include
Simplified BSD License text as described in Section 4.e of the Trust
Legal Provisions and are provided without warranty as described in the
Simplified BSD License.

Table of Contents

1. Introduction
2. Software-Defined Telecom Infrastructure: Roles and DevOps Principles
   2.1. Service Developer Role
   2.2. VNF Developer Role
   2.3. Operator Role
   2.4. DevOps Principles
3. Continuous Integration
4. Continuous Delivery
5. Stability Challenges
6. Consistency, Availability and Partitioning Challenges
7. Observability Challenges
8. Verification Challenges
9. Troubleshooting Challenges
10. Programmable Network Management
11. DevOps Performance Metrics
12. Security Considerations
13. IANA Considerations
14. Informative References
15. Acknowledgments

1. Introduction

Carrier-grade network management was developed as an incremental
solution once a particular network technology matured and came to be
deployed in parallel with legacy technologies. This approach requires
significant integration efforts when new network services are
launched. Both centralized and distributed algorithms have been
developed in order to solve very specific problems related to
configuration, performance and fault management. However, such
algorithms consider a network that is by and large functionally
static. Thus, management processes related to introducing new
functionality or maintaining existing functionality are complex and
costly due to the significant efforts required for verification and
integration.

Network virtualization, by means of Software-Defined Networking (SDN)
and Network Function Virtualization (NFV), creates an environment
where network functions are no longer static or strictly embedded in
physical boxes deployed at fixed points. The virtualized network is
dynamic and open to fast-paced innovation, enabling efficient network
management and a reduction of operating costs for network operators. A
significant part of network capabilities is expected to become
available through interfaces that resemble the APIs widespread within
datacenters, instead of traditional telecom means of management such
as the Simple Network Management Protocol, Command Line Interfaces or
CORBA. Such an API-based approach, combined with the programmability
offered by SDN interfaces [RFC7426], opens opportunities for handling
infrastructure, resources, and Virtual Network Functions (VNFs) as
code, employing techniques from software engineering.
The efficiency and integration of existing management techniques in
virtualized and dynamic network environments are limited, however.
Monitoring tools, e.g. based on simple counters, physical network taps
and active probing, do not scale well and provide only a small part of
the observability features required in such a dynamic environment.
Although huge amounts of monitoring data can be collected from the
nodes, the typical granularity is rather coarse. Debugging and
troubleshooting techniques for software-defined environments have
gathered interest in the research community in recent years, but it is
yet to be explored how to integrate them into an operational network
management system. Moreover, research tools developed in academia
(such as NetSight [H2014], OFRewind [W2011], FlowChecker [S2010],
etc.) were limited to solving very particular, well-defined problems
and oftentimes are not built for automation and integration into
carrier-grade network operations workflows.

The topics at hand have already attracted several standardization
organizations to look into the issues arising in this new environment.
For example, IETF working groups have activities in the area of OAM
and verification for Service Function Chaining
[I-D.aldrin-sfc-oam-framework] [I-D.lee-sfc-verification]. [RFC7149]
asks a set of relevant questions regarding the operation of SDNs. The
ETSI NFV ISG defines the MANO interfaces [NFVMANO], and TMForum
investigates gaps between these interfaces and existing specifications
in [TR228]. The need for programmatic APIs in the orchestration of
compute, network and storage resources is discussed in
[I-D.unify-nfvrg-challenges].

From a research perspective, problems related to the operation of
software-defined networks are in part outlined in [SDNsurvey], and
research referring to both cloud and software-defined networks is
discussed in [D4.1].

The purpose of this document is to act as a discussion opener in NFVRG
by describing a set of principles that are relevant for applying
DevOps ideas to managing software-defined telecom network
infrastructures. We identify a set of challenges related to developing
tools, interfaces and protocols that would support these principles,
and discuss how standard APIs could be leveraged to simplify
management tasks.
2. Software-Defined Telecom Infrastructure: Roles and DevOps Principles

Agile methods used in many software-focused companies emphasize
releasing small iterations of code that implement VNFs, with high
velocity and high quality, into a production environment. Similarly,
service providers are interested in releasing incremental improvements
in the network services that they create from virtualized network
functions. The cycle time for DevOps as applied in many open source
projects is on the order of one quarter year, or 13 weeks.

The code needs to undergo a significant amount of automated testing
and verification with pre-defined templates in a realistic setting.
From the point of view of infrastructure management, the verification
of the network configuration resulting from network policy
decomposition and refinement, as well as of the configuration of
virtual functions, is one of the most sensitive operations. When
troubleshooting the cause of unexpected behavior, fine-grained
visibility into all resources supporting the virtual functions (either
compute or network-related) is paramount to facilitating fast
resolution times. While compute resources are typically very well
covered by debugging and profiling toolsets based on many years of
advances in software engineering, programmable network resources are
still a novelty and tools exploiting their potential are scarce.

2.1. Service Developer Role

We identify two dimensions of the "developer" role in software-defined
infrastructure (SDI). One dimension relates to determining which
high-level functions should be part of a particular service, deciding
what logical interconnections are needed between these blocks, and
defining a set of high-level constraints or goals related to
parameters that define, for instance, a Service Function Chain. This
could be determined by the product owner for a particular family of
services offered by a telecom provider, or it might be a key account
representative that adapts an existing service template to the
requirements of a particular customer by adding or removing a small
number of functional entities. We refer to this person as the Service
Developer and, for simplicity (access control, training on technical
background, etc.), we consider the role to be internal to the telecom
provider.

2.2. VNF Developer Role

The other dimension of the "developer" role is a person that writes
the software code for a new virtual network function (VNF). Depending
on the actual VNF being developed, this person might be internal or
external to the telecom provider. We refer to them as VNF Developers.

2.3. Operator Role

The role of an Operator in SDI is to ensure that the deployment
processes are successful and that a set of performance indicators
associated with a service are met while the service is supported on
virtual infrastructure within the domain of a telecom provider.

System integration roles are also important, and we intend to approach
them in a future revision of this draft.
2.4. DevOps Principles

In line with the generic DevOps concept outlined in [DevOpsP], we
consider the following four principles important for adapting DevOps
ideas to SDI:

* Deploy with repeatable, reliable processes: Service and VNF
Developers should be supported by automated build, orchestrate and
deploy processes that are identical in the development, test and
production environments. Such processes need to be made reliable and
trusted in the sense that they should reduce the chance of human error
and provide visibility at each stage of the process, while still
allowing manual interaction at certain key stages.

* Develop and test against production-like systems: both Service
Developers and VNF Developers need to have the opportunity to verify
and debug their respective SDI code in systems that have
characteristics very close to the production environment where the
code is expected to be ultimately deployed. Customizations of Service
Function Chains or VNFs could thus be released frequently to a
production environment in compliance with policies set by the
Operators. The production environment should provide adequate
isolation and protection of the services active in the infrastructure
from the services being tested or debugged.

* Monitor and validate operational quality: Service Developers, VNF
Developers and Operators must be equipped with tools, automated as
much as possible, that enable them to continuously monitor the
operational quality of the services deployed on SDI. Monitoring tools
should be complemented by tools that allow verifying and validating
the operational quality of the service in line with established
procedures, which might be standardized (for example, Y.1564 Ethernet
service activation [Y1564]) or defined through best practices specific
to a particular telecom operator.

* Amplify development cycle feedback loops: An integral part of the
DevOps ethos is building a cross-cultural environment that bridges the
gap between the desire for continuous change by the Developers and the
demand by the Operators for stability and reliability of the
infrastructure. Feedback from customers is collected and transmitted
throughout the organization. From a technical perspective, such
cultural aspects could be addressed through common sets of tools and
APIs that provide a shared vocabulary for both Developers and
Operators, as well as simplify the reproduction of problematic
situations in the development, test and operations environments.

Network operators that would like to move to agile methods to deploy
and manage their networks and services face a different environment
compared to typical software companies, where simplified trust
relationships between personnel are the norm. In such companies, it is
not uncommon that the same person rotates between different roles. In
contrast, in a telecom service provider, there are strong
organizational boundaries between suppliers (whether in Developer
roles for network functions, or in Operator roles for outsourced
services) and the carrier's own personnel, which might also take both
Developer and Operator roles. How DevOps principles reflect on these
trust relationships, and to what extent initiatives such as
co-creation could transform the environment to facilitate closer Dev
and Ops integration across business boundaries, is an interesting area
for business studies, but we could not for now identify a specific
technological challenge here.
3. Continuous Integration

Software integration is the process of bringing together the software
component subsystems into one software system and ensuring that the
subsystems function together as a system. Software integration can
apply regardless of the size of the software components. The objective
of Continuous Integration is to prevent integration problems close to
the expected release of a software development project into a
production (operations) environment. Continuous Integration is
therefore closely coupled with the notion of DevOps as a mechanism to
ease the transition from development to operations.

Continuous Integration may result in multiple builds per day. It is
also typically used in conjunction with test-driven development
approaches that integrate unit testing into the build process. The
unit testing is typically automated through build servers. Such
servers may implement a variety of additional static and dynamic tests
as well as other quality control and documentation extraction
functions. The reduced cycle times of Continuous Integration enable
improved software quality by applying small efforts frequently.

Continuous Integration applies to developers of VNFs as they integrate
the components that they need to deliver their VNFs. The VNFs may
contain components developed by different teams within the VNF
Provider, or may integrate code developed externally, e.g. in
commercial code libraries or in open source communities.

Service Providers also apply Continuous Integration in the development
of network services. Network services comprise various aspects,
including VNFs, connectivity within and between them, as well as
various associated resource authorizations. The components of the
network service are all dynamic and largely represented by software
that must be integrated regularly to maintain consistency. Some of the
software components used by Service Providers may be sourced from VNF
Providers or from open source communities. Service Providers are
increasingly motivated to engage with open source communities
[OSandS]. Open source interfaces supported by open source communities
may be more useful than traditional paper interface specifications.
Even where Service Providers are deeply engaged in an open source
community (e.g. OPNFV), many service providers may prefer to obtain
the code through some software provider as a business practice. Such
software providers have the same interests in software integration as
other VNF providers.

4. Continuous Delivery

The practice of Continuous Delivery extends Continuous Integration by
ensuring that the software checked in on the mainline is always in a
user-deployable state and enables rapid deployment by those users.
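As an illustration of the kind of automated gate that Continuous
Integration and Continuous Delivery rely on, the following minimal
Python sketch uses hypothetical stage names and placeholder checks
(not tied to any particular build server) and promotes a VNF or
service component only when every stage succeeds. It is a sketch of
the principle rather than a definitive implementation.

   # Minimal sketch of an automated CI/CD gate for a VNF component.
   # Stage names and checks are hypothetical; a real pipeline would
   # invoke build servers, test frameworks and an orchestrator.

   from typing import Callable, List, Tuple

   def build() -> bool:
       # Placeholder for compiling/packaging the VNF image.
       return True

   def unit_tests() -> bool:
       # Placeholder for automated unit tests run by the build server.
       return True

   def integration_tests() -> bool:
       # Placeholder for tests against a production-like environment.
       return True

   def run_pipeline(stages: List[Tuple[str, Callable[[], bool]]]) -> bool:
       """Run stages in order; stop at the first failure so that only
       fully verified artifacts are promoted further."""
       for name, stage in stages:
           print(f"running stage: {name}")
           if not stage():
               print(f"stage failed: {name} -- artifact not promoted")
               return False
       print("all stages passed -- artifact ready for delivery")
       return True

   if __name__ == "__main__":
       run_pipeline([("build", build),
                     ("unit-tests", unit_tests),
                     ("integration-tests", integration_tests)])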
5. Stability Challenges

The dimensions, dynamicity and heterogeneity of networks are growing
continuously. Monitoring and managing the network behavior in order to
meet technical and business objectives is becoming increasingly
complicated and challenging, especially when considering the need of
predicting and taming potential instabilities.

In general, instability in networks may have primary effects that both
jeopardize performance and compromise an optimized use of resources,
even across multiple layers: in fact, the instability of end-to-end
communication paths may depend both on the underlying transport
network and on the higher-level components specific to flow control
and dynamic routing. For example, arguments for introducing advanced
flow admission control are essentially derived from the observation
that the network otherwise behaves in an inefficient and potentially
unstable manner. Even with resource over-provisioning, a network
without efficient flow admission control has instability regions that
can even lead to congestion collapse in certain configurations.
Another example is the instability which is characteristic of any
dynamically adaptive routing system. Routing instability, which can be
(informally) defined as the quick change of network reachability and
topology information, has a number of possible origins, including
problems with connections, router failures, high levels of congestion,
software configuration errors, transient physical and data link
problems, and software bugs.

As a matter of fact, the states monitored and used to implement the
different control and management functions in network nodes are
governed by several low-level configuration commands (today still
issued mostly manually). Further, there are several dependencies among
these states and the logic updating the states (most of which are not
kept aligned automatically). Normally, high-level network goals (such
as the connectivity matrix, load balancing, traffic engineering goals,
survivability requirements, etc.) are translated into low-level
configuration commands (mostly manually) individually executed on the
network elements (e.g., forwarding tables, packet filters,
link-scheduling weights, and queue-management parameters, as well as
tunnels and NAT mappings). Network instabilities due to configuration
errors can spread from node to node and propagate throughout the
network.

DevOps in the data center is a source of inspiration regarding how to
simplify and automate management processes for software-defined
infrastructure.

As a specific example, automated configuration functions are expected
to take the form of a "control loop" that monitors (i.e., measures)
the current state of the network, performs a computation, and then
reconfigures the network. These types of functions must work correctly
even in the presence of failures, variable delays in communicating
with a distributed set of devices, and frequent changes in network
conditions. Nevertheless, cascading and nesting of automated
configuration processes can lead to the emergence of non-linear
network behavior, and as such to sudden instabilities (i.e., identical
local dynamics can give rise to widely different global dynamics).
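The sketch below illustrates such a control loop in Python; the
measurement and reconfiguration functions are hypothetical
placeholders for monitoring and controller APIs. It also shows one
common safeguard against the instabilities discussed above: a deadband
that suppresses reconfiguration when the measured state is already
close to the target, which damps oscillations when several such loops
are cascaded. This is a minimal sketch under these assumptions, not a
prescription.

   # Minimal sketch of a damped monitor-compute-reconfigure loop.
   # measure_load() and apply_config() are hypothetical stand-ins for
   # the monitoring and configuration interfaces of an SDI controller.

   import random
   import time

   TARGET_LOAD = 0.6   # desired utilization of a monitored resource
   DEADBAND = 0.1      # damping: ignore small deviations from target

   def measure_load() -> float:
       # Placeholder: would query counters or probes in a real system.
       return random.uniform(0.0, 1.0)

   def compute_weight(current: float, weight: float) -> float:
       # Simple proportional adjustment of a routing/scheduling weight.
       return max(0.1, weight - 0.5 * (current - TARGET_LOAD))

   def apply_config(weight: float) -> None:
       # Placeholder: would push configuration through a controller API.
       print(f"reconfiguring with weight={weight:.2f}")

   def control_loop(iterations: int = 5) -> None:
       weight = 1.0
       for _ in range(iterations):
           load = measure_load()
           if abs(load - TARGET_LOAD) > DEADBAND:  # deadband check
               weight = compute_weight(load, weight)
               apply_config(weight)
           time.sleep(0.1)  # measurement interval

   if __name__ == "__main__":
       control_loop()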
6. Consistency, Availability and Partitioning Challenges

The CAP theorem [CAP] states that any networked shared-data system can
have at most two of the following three properties: 1) Consistency
(C), equivalent to having a single up-to-date copy of the data; 2)
high Availability (A) of that data (for updates); and 3) tolerance to
network Partitions (P).

Looking at a telecom SDI as a distributed computational system
(routing/forwarding packets can be seen as a computational problem),
only two of the three CAP properties will be achievable at the same
time: CP systems favor consistency, AP systems favor availability, and
CA systems assume that no partitions occur. This has profound
implications for technologies that need to be developed in line with
the "deploy with repeatable, reliable processes" principle for
configuring SDI states. Latency or delay and partitioning properties
are closely related, and this relation becomes more important in the
case of telecom service providers, where Devs and Ops interact with
widely distributed infrastructure. Limitations of interactions between
centralized management and distributed control need to be carefully
examined in such environments. Traditionally, connectivity was the
main concern: C and A were about delivering packets to the
destination. The features and capabilities of SDN and NFV are changing
the concerns: for example, in SDN, control plane partitions no longer
imply data plane partitions, so A does not imply C. In practice, CAP
reflects the need for a balance between local/distributed operations
and remote/centralized operations.

In addition to CAP aspects related to individual protocols, the
interdependencies between CAP choices for resources and VNFs that are
interconnected in a forwarding graph need to be considered. This is
particularly relevant for the "monitor and validate operational
quality" principle because, apart from transport protocols, most OAM
functionality is generally configured in processes that are separate
from the configuration of the monitored entities. Also, partitioning
in a monitoring plane implemented through VNFs executed on compute
resources does not necessarily mean that the dataplane of the
monitored VNF was partitioned as well.

7. Observability Challenges

Monitoring algorithms need to operate in a scalable manner while
providing the specified level of observability in the network, either
for operation purposes (Ops part) or for debugging in a development
phase (Dev part). We consider the following challenges:

* Scalability - relates to the granularity of network observability,
computational efficiency, communication overhead, and strategic
placement of monitoring functions.

* Distributed operation and information exchange between monitoring
functions - monitoring functions supported by the nodes may perform
specific operations (such as aggregation or filtering) locally on the
collected data, or within a defined data neighborhood, and forward
only the result to a management system. Such operation may require
modifications of existing standards and the development of protocols
for efficient information exchange and messaging between monitoring
functions. Different levels of granularity may need to be offered for
the data exchanged through the interfaces, depending on the Dev or Ops
role.
* Configurability and conditional observability - monitoring functions
that go beyond measuring simple metrics (such as delay or packet loss)
require expressive monitoring annotation languages for describing the
functionality such that it can be programmed by a controller.
Monitoring algorithms implementing self-adaptive monitoring behavior
relative to local network situations may employ such annotation
languages to receive high-level objectives (KPIs controlling tradeoffs
between accuracy and measurement frequency, for example) and
conditions for varying the measurement intensity; a minimal sketch of
such a function is given after this list.

* Automation - includes the mapping of monitoring functionality from a
logical forwarding graph to virtual or physical instances executing in
the infrastructure, as well as the placement and re-placement of
monitoring functionality for required observability coverage and
configuration consistency upon updates in a dynamic network
environment.
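The following Python sketch (with hypothetical sampling and reporting
interfaces) illustrates a node-local monitoring function along the
lines discussed above: raw samples are aggregated locally and only a
summary is forwarded to the management system, and the measurement
interval is adapted depending on how variable the observed metric is.
It is an illustration under these assumptions, not a proposal for a
specific tool.

   # Minimal sketch of a node-local monitoring function that
   # aggregates samples locally, forwards only summaries, and adapts
   # its sampling interval. All interfaces are hypothetical.

   import random
   import statistics

   class LocalMonitor:
       def __init__(self, base_interval: float, threshold: float):
           self.interval = base_interval  # seconds between samples
           self.threshold = threshold     # delay (ms) considered anomalous
           self.samples = []

       def sample_delay(self) -> float:
           # Placeholder: would run an active probe or read a counter.
           return random.gauss(10.0, 2.0)

       def collect(self, n: int) -> None:
           self.samples = [self.sample_delay() for _ in range(n)]

       def adapt(self) -> None:
           # Self-adaptive behavior: sample more often when the metric
           # is volatile, relax the interval when it is stable.
           if statistics.pstdev(self.samples) > 1.0:
               self.interval = max(0.1, self.interval / 2)
           else:
               self.interval = min(10.0, self.interval * 2)

       def report(self) -> dict:
           # Forward only an aggregate, not the raw samples.
           return {"mean_delay_ms": statistics.mean(self.samples),
                   "max_delay_ms": max(self.samples),
                   "violations": sum(s > self.threshold for s in self.samples),
                   "next_interval_s": self.interval}

   if __name__ == "__main__":
       mon = LocalMonitor(base_interval=1.0, threshold=15.0)
       mon.collect(n=20)
       mon.adapt()
       print(mon.report())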
8. Verification Challenges

Enabling ongoing verification of code is an important goal of
Continuous Integration as part of the data center DevOps concept. In a
telecom SDI, service definitions, decompositions and configurations
need to be expressed in machine-readable encodings. For example,
configuration parameters could be expressed in terms of YANG data
models. However, the infrastructure management layers (such as
Software-Defined Network Controllers and orchestration functions)
might not always export such machine-readable descriptions of the
runtime configuration state. In this case, the management layer itself
could be expected to include a verification process that faces the
same challenges as the stand-alone verification processes we outline
later in this section. In that sense, verification can be considered a
set of features providing gatekeeper functions to verify both the
abstract service models and the proposed resource configuration before
or right after the actual instantiation on the infrastructure layer
takes place.

A verification process can involve different layers of the network and
service architecture. Starting from a high-level verification of the
customer input (for example, a Service Graph as defined in
[I-D.unify-nfvrg-challenges]), the verification process could go more
in depth to reflect on the Service Function Chain configuration. At
the lowest layer, the verification would handle the actual set of
forwarding rules and other configuration parameters associated with a
Service Function Chain instance. This enables the verification of more
quantitative properties (e.g. compliance with resource availability),
as well as a more detailed and precise verification of the
above-mentioned topological ones. Existing SDN verification tools
could be deployed in this context, but the majority of them only
operate on flow space rules, commonly expressed using OpenFlow syntax.

Moreover, such verification tools were designed for networks where the
flow rules are necessary and sufficient to determine the forwarding
state. This assumption is valid in networks composed only of network
functions that forward traffic by analyzing only the packet headers
(e.g. simple routers, stateless firewalls, etc.). Unfortunately, most
real networks contain active network functions, represented by
middleboxes that dynamically change the forwarding path of a flow
according to function-local algorithms and an internal state that is
based on the received packets, e.g. load balancers, packet marking
modules and intrusion detection systems. The existing verification
tools do not consider active network functions because they do not
incorporate the dynamic transformation of such internal state into the
verification process.

Defining a set of verification tools that can account for active
network functions is a significant challenge. In order to perform
verification based on formal properties of the system, the internal
states of an active (virtual or not) network function would need to be
represented. Although these states increase the complexity of the
verification process (e.g., using simple model checking would not be
feasible due to state explosion), they help to better represent the
forwarding behavior in real networks. A way to address this challenge
is to attempt to summarize the internal state of an active network
function in a way that allows the verification process to finish
within a reasonable time interval.
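As a toy illustration of such state summarization, the Python sketch
below (with a hypothetical topology) represents a load balancer only
by the set of next hops it may select, instead of by its full internal
state, and then checks a simple property: that every possible
forwarding choice still delivers traffic from the ingress to the
destination. Real verification tools would handle far richer rule sets
and properties.

   # Minimal sketch: verify that traffic entering at 'in' always
   # reaches 'dst' in a forwarding graph containing an active function
   # ('lb', a load balancer) summarized by the set of next hops it may
   # choose. Topology and node names are hypothetical.

   # Each node maps to its set of possible next hops: a singleton for
   # stateless forwarding, several entries for a summarized active
   # network function.
   forwarding = {
       "in":   {"fw"},
       "fw":   {"lb"},
       "lb":   {"srv1", "srv2"},  # summarized balancer state
       "srv1": {"dst"},
       "srv2": {"dst"},
       "dst":  set(),
   }

   def always_reaches(graph: dict, src: str, dst: str, visited=None) -> bool:
       """True if every possible forwarding choice leads from src to dst."""
       if visited is None:
           visited = set()
       if src == dst:
           return True
       if src in visited or not graph.get(src):
           return False  # loop or black hole on some branch
       visited = visited | {src}
       return all(always_reaches(graph, nxt, dst, visited)
                  for nxt in graph[src])

   if __name__ == "__main__":
       print("property holds:", always_reaches(forwarding, "in", "dst"))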
9. Troubleshooting Challenges

One of the problems brought up by the complexity introduced by NFV and
SDN is pinpointing the cause of a failure in an infrastructure that is
under continuous change. Developing an agile and low-maintenance
debugging mechanism for an architecture that is comprised of multiple
layers and discrete components is a particularly challenging task to
carry out. Verification, observability, and probe-based tools are key
to troubleshooting processes, regardless of whether they are followed
by Dev or Ops personnel.

* Automated troubleshooting workflows

Failure is a frequently occurring event in network operation.
Therefore, it is crucial to monitor components of the system
periodically. Moreover, the troubleshooting system should search for
the cause automatically in the case of failure. If the system follows
a multi-layered architecture, monitoring and debugging actions should
be performed on components from the topmost layer to the bottom layer
in a chain. Likewise, the results of these operations should be
reported in reverse order. In this regard, one should be able to
define monitoring and debugging actions through a common interface
that employs such layer-hopping logic. In addition, this interface
should allow fine-grained and automatic on-demand control for the
integration of other monitoring and verification mechanisms and tools.

* Troubleshooting with active measurement methods

Besides detecting network changes based on passively collected
information, active probes to quantify delay, network utilization and
loss rate are important to debug errors and to evaluate the
performance of network elements. While tools that are effective in
determining such conditions for particular technologies were specified
by the IETF and other standardization organizations, their use
requires a significant amount of manual labor in terms of both
configuration and interpretation of the results; see also Section 10.

In contrast, methods that test and debug networks systematically,
based on models generated from the router configuration, router
interface tables or forwarding tables, would significantly simplify
management. They could be made usable by Dev personnel that have
little expertise in diagnosing network defects. Such tools naturally
lend themselves to integration into complex troubleshooting workflows
that could be generated automatically based on the description of a
particular service chain. However, there are scalability challenges
associated with deploying such tools in a network. Some tools may poll
each networking device for the forwarding table information in order
to calculate the minimum number of test packets to be transmitted in
the network. Therefore, as the network size and the forwarding table
size increase, forwarding table updates for the tools may put a
non-negligible load on the network.
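The Python sketch below illustrates one possible shape of such an
automated, layered troubleshooting workflow; the per-layer checks are
hypothetical placeholders for verification, observability and
probe-based tools. Checks run from the topmost layer downwards, the
workflow stops at the first failing layer (one possible policy for
localizing the suspected cause), and the results are reported in
reverse order.

   # Minimal sketch of an automated, layered troubleshooting workflow.
   # The per-layer checks are hypothetical placeholders.

   from typing import Callable, List, Tuple

   def check_service_chain() -> bool:
       return True    # e.g., verify the service graph configuration

   def check_vnf_instances() -> bool:
       return True    # e.g., query VNF health/monitoring functions

   def check_forwarding() -> bool:
       return False   # e.g., active probes along the forwarding path

   def troubleshoot(layers: List[Tuple[str, Callable[[], bool]]]) -> List[str]:
       """Run checks from the topmost layer downwards; stop at the
       first failure and report results in reverse (bottom-up) order."""
       results = []
       for name, check in layers:
           ok = check()
           results.append(f"{name}: {'ok' if ok else 'FAILED'}")
           if not ok:
               break                   # suspected root cause located
       return list(reversed(results))  # notify in reverse order

   if __name__ == "__main__":
       report = troubleshoot([("service chain", check_service_chain),
                              ("VNF instances", check_vnf_instances),
                              ("forwarding layer", check_forwarding)])
       for line in report:
           print(line)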
10. Programmable Network Management

The ability to automate a set of actions to be performed on the
infrastructure, be it virtual or physical, is key to the productivity
increases that follow from applying DevOps principles. Previous
sections in this document touched on different dimensions of
programmability:

- Section 7 approached programmability in the context of developing
  new capabilities for monitoring and for dynamically setting
  configuration parameters of deployed monitoring functions

- Section 8 reflected on the need to determine the correctness of
  actions that are to be inflicted on the infrastructure as the result
  of executing a set of high-level instructions

- Section 9 considered programmability from the perspective of an
  interface to facilitate dynamic orchestration of troubleshooting
  steps towards building workflows and reducing the manual steps
  required in troubleshooting processes

We expect that programmable network management - along the lines of
[RFC7426] - will draw more interest as we move forward. For example,
in [I-D.unify-nfvrg-challenges], the authors identify the need for
presenting programmable interfaces that accept instructions in a
standards-supported manner for the Two-Way Active Measurement Protocol
(TWAMP). More specifically, an excellent example in this case is
traffic measurements, which are extensively used today to determine
SLA adherence as well as to debug and troubleshoot pain points in
service delivery. TWAMP is both widely implemented by all established
vendors and deployed by most global operators. However, TWAMP
management and control today relies solely on diverse and proprietary
tools provided by the respective vendors of the equipment. For large,
virtualized, and dynamically instantiated infrastructures where
network functions are placed according to orchestration algorithms,
proprietary mechanisms for managing TWAMP measurements have severe
limitations. For example, today's TWAMP implementations are managed
through vendor-specific, typically command-line interfaces (CLI),
which can be scripted on a platform-by-platform basis. As a result,
although the control and test measurement protocols are standardized,
their respective management is not. This dramatically hinders the
possibility to integrate such deployed functionality in the SP-DevOps
concept. In this particular case, recent efforts in the IPPM WG
[I-D.cmzrjp-ippm-twamp-yang] aim to define a standard TWAMP data model
and effectively increase the programmability of TWAMP deployments in
the future.

Data center DevOps tools, such as those surveyed in [D4.1], developed
proprietary methods for describing and interacting through interfaces
with the managed infrastructure. Within certain communities, they
became de-facto standards, in the same way particular CLIs became
de-facto standards for Internet professionals. Although open source
components and strong community involvement exist, the diversity of
the new languages and interfaces creates a burden both for vendors, in
terms of choosing which ones to prioritize for support and then
developing the functionality, and for operators, who have to determine
what fits best the requirements of their systems.

11. DevOps Performance Metrics

Defining a set of metrics that are used as performance indicators is
important for service providers to ensure the successful deployment
and operation of a service in the software-defined telecom
infrastructure.

We identify three types of considerations that are particularly
relevant for these metrics: 1) technical considerations directly
related to the service provided, 2) process-related considerations
regarding the deployment, maintenance and troubleshooting of the
service, i.e. concerning the operation of VNFs, and 3) cost-related
considerations associated with the benefits from using a
software-defined telecom infrastructure.

First, technical performance metrics shall be
service-dependent/-oriented and may address, inter alia, service
performance in terms of delay, throughput, congestion, energy
consumption, availability, etc. Acceptable performance levels should
be mapped to SLAs and the requirements of the service users. Metrics
in this category were defined in IETF working groups and other
standardization organizations with responsibility over particular
service or infrastructure descriptions.

Second, process-related metrics shall serve a wider perspective in the
sense that they shall be applicable to multiple types of services. For
instance, process-related metrics may include: number of probes for
end-to-end QoS monitoring, number of on-site interventions, number of
unused alarms, number of configuration mistakes, incident/trouble
resolution delay, delay between service order and delivery, or number
of self-care operations.

Third, cost-related metrics shall be used to monitor and assess the
benefit of employing SDI compared to the usage of legacy hardware
infrastructure with respect to operational costs, e.g. possible
man-hour reductions, elimination of deployment and configuration
mistakes, etc.

Finally, identifying a number of highly relevant metrics for DevOps,
and especially monitoring and measuring them, is highly challenging
because of the amount and availability of the data sources that would
need to be aggregated within one such metric, e.g. the quantification
of human intervention, or confidential aspects of costs.
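As a small illustration of how process-related metrics of this kind
could be derived from operations records, the Python sketch below
(with hypothetical, hand-written records and field names) computes a
mean incident resolution delay and a mean order-to-delivery delay.

   # Minimal sketch: computing two process-related DevOps metrics from
   # hypothetical, timestamped operations records.

   from datetime import datetime
   from statistics import mean

   incidents = [
       {"opened": datetime(2015, 7, 1, 9, 0),
        "resolved": datetime(2015, 7, 1, 11, 30)},
       {"opened": datetime(2015, 7, 2, 14, 0),
        "resolved": datetime(2015, 7, 3, 9, 0)},
   ]

   orders = [
       {"ordered": datetime(2015, 7, 1, 8, 0),
        "delivered": datetime(2015, 7, 1, 8, 20)},
       {"ordered": datetime(2015, 7, 4, 10, 0),
        "delivered": datetime(2015, 7, 4, 12, 0)},
   ]

   def mean_delay_hours(records, start_key, end_key) -> float:
       """Average delay in hours between two timestamps per record."""
       delays = [(r[end_key] - r[start_key]).total_seconds() / 3600.0
                 for r in records]
       return mean(delays)

   if __name__ == "__main__":
       print("mean incident resolution delay (h):",
             round(mean_delay_hours(incidents, "opened", "resolved"), 2))
       print("mean order-to-delivery delay (h):",
             round(mean_delay_hours(orders, "ordered", "delivered"), 2))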
12. Security Considerations

TBD

13. IANA Considerations

This memo includes no request to IANA.

14. Informative References

[NFVMANO] ETSI, "Network Function Virtualization (NFV) Management and
          Orchestration V0.6.1 (draft)", July 2014.

[I-D.aldrin-sfc-oam-framework] S. Aldrin, R. Pignataro, N. Akiya,
          "Service Function Chaining Operations, Administration and
          Maintenance Framework", draft-aldrin-sfc-oam-framework-01
          (work in progress), July 2014.

[I-D.lee-sfc-verification] S. Lee and M. Shin, "Service Function
          Chaining Verification", draft-lee-sfc-verification-00 (work
          in progress), February 2014.

[RFC7426] E. Haleplidis (Ed.), K. Pentikousis (Ed.), S. Denazis, J.
          Hadi Salim, D. Meyer, and O. Koufopavlou, "Software-Defined
          Networking (SDN): Layers and Architecture Terminology", RFC
          7426, January 2015.

[RFC7149] M. Boucadair and C. Jacquenet, "Software-Defined Networking:
          A Perspective from within a Service Provider Environment",
          RFC 7149, March 2014.

[TR228]   TMForum, "Gap Analysis Related to MANO Work", TR228, May
          2014.

[I-D.unify-nfvrg-challenges] R. Szabo et al., "Unifying Carrier and
          Cloud Networks: Problem Statement and Challenges",
          draft-unify-nfvrg-challenges-02 (work in progress), July
          2015.

[I-D.cmzrjp-ippm-twamp-yang] Civil, R., Morton, A., Zheng, L., Rahman,
          R., Jethanandani, M., and K. Pentikousis, "Two-Way Active
          Measurement Protocol (TWAMP) Data Model",
          draft-cmzrjp-ippm-twamp-yang-01 (work in progress), July
          2015.

[D4.1]    W. John et al., "D4.1 Initial requirements for the SP-DevOps
          concept, universal node capabilities and proposed tools",
          August 2014.

[SDNsurvey] D. Kreutz, F. M. V. Ramos, P. Verissimo, C. Esteve
          Rothenberg, S. Azodolmolky, and S. Uhlig, "Software-Defined
          Networking: A Comprehensive Survey", to appear in Proceedings
          of the IEEE, 2015.

[DevOpsP] "DevOps, the IBM Approach", 2013. [Online].

[Y1564]   ITU-T Recommendation Y.1564, "Ethernet service activation
          test methodology", March 2011.

[CAP]     E. Brewer, "CAP twelve years later: How the 'rules' have
          changed", IEEE Computer, vol. 45, no. 2, pp. 23-29, February
          2012.

[H2014]   N. Handigol, B. Heller, V. Jeyakumar, D. Mazieres, and N.
          McKeown, "I Know What Your Packet Did Last Hop: Using Packet
          Histories to Troubleshoot Networks", in Proceedings of the
          11th USENIX Symposium on Networked Systems Design and
          Implementation (NSDI 14), pp. 71-95.

[W2011]   A. Wundsam, D. Levin, S. Seetharaman, and A. Feldmann,
          "OFRewind: Enabling Record and Replay Troubleshooting for
          Networks", in Proceedings of the USENIX Annual Technical
          Conference (USENIX ATC '11), pp. 327-340.

[S2010]   E. Al-Shaer and S. Al-Haj, "FlowChecker: Configuration
          Analysis and Verification of Federated OpenFlow
          Infrastructures", in Proceedings of the 3rd ACM Workshop on
          Assurable and Usable Security Configuration (SafeConfig '10),
          pp. 37-44.

[OSandS]  S. Wright and D. Druta, "Open Source and Standards: The Role
          of Open Source in the Dialogue between Research and
          Standardization", Globecom Workshops (GC Wkshps), pp.
          650-655, 8-12 December 2014.

15. Acknowledgments

The research leading to these results has received funding from the
European Union Seventh Framework Programme FP7/2007-2013 under grant
agreement no. 619609 - the UNIFY project. The views expressed here are
those of the authors only. The European Commission is not liable for
any use that may be made of the information in this document.

We would like to thank in particular the UNIFY WP4 contributors, the
internal reviewers of the UNIFY WP4 deliverables, and Wolfgang John
from Ericsson for the useful discussions and insightful comments.
This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

Catalin Meirosu
Ericsson Research
S-16480 Stockholm, Sweden
Email: catalin.meirosu@ericsson.com

Antonio Manzalini
Telecom Italia
Via Reiss Romoli, 274
10148 - Torino, Italy
Email: antonio.manzalini@telecomitalia.it

Juhoon Kim
Deutsche Telekom AG
Winterfeldtstr. 21
10781 Berlin, Germany
Email: J.Kim@telekom.de

Rebecca Steinert
SICS Swedish ICT AB
Box 1263, SE-16429 Kista, Sweden
Email: rebste@sics.se

Sachin Sharma
Ghent University-iMinds
Research group IBCN - Department of Information Technology
Zuiderpoort Office Park, Blok C0
Gaston Crommenlaan 8 bus 201
B-9050 Gent, Belgium
Email: sachin.sharma@intec.ugent.be

Guido Marchetto
Politecnico di Torino
Corso Duca degli Abruzzi 24
10129 - Torino, Italy
Email: guido.marchetto@polito.it

Ioanna Papafili
Hellenic Telecommunications Organization
Measurements and Wireless Technologies Section
Laboratories and New Technologies Division
2, Spartis & Pelika str., Maroussi,
GR-15122, Attica, Greece
Building E, Office 102
Email: iopapafi@oteresearch.gr

Kostas Pentikousis
EICT GmbH
Torgauer Strasse 12-15
Berlin 10829
Germany
Email: k.pentikousis@eict.de

Steven Wright
AT&T Services Inc.
1057 Lenox Park Blvd NE, STE 4D28
Atlanta, GA 30319
USA
Email: sw3588@att.com