Benchmarking Workgroup                                             S. Wu
Internet-Draft                                          Juniper Networks
Intended status: Informational                              May 18, 2018
Expires: November 19, 2018

                  Network Service Layer Abstract Model
                        draft-xwu-bmwg-nslam-01

Abstract

While networking technologies have evolved over the years, the
layered approach has remained dominant in many network solutions.
Each layer may have multiple interchangeable, competing alternatives
that deliver a similar set of functionality.  In order to provide
objective benchmarking data across various implementations, the need
arises for a common abstract model of each network service layer,
with a set of required and optional specifications in the respective
layers.  Many overlay and/or underlay solutions can be described
using these models.

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF).  Note that other groups may also distribute
working documents as Internet-Drafts.  The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on November 19, 2018.

Copyright Notice

Copyright (c) 2018 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
Code Components extracted from this document must include Simplified
BSD License text as described in Section 4.e of the Trust Legal
Provisions and are provided without warranty as described in the
Simplified BSD License.

Table of Contents

1.  Introduction
    1.1.  Terminology
    1.2.  Purpose of The Document
    1.3.  Conventions Used in This Document
2.  Network Service Framework
    2.1.  Node
    2.2.  Topology
    2.3.  Infrastructure
    2.4.  Services
3.  Service Models
    3.1.  Layer 2 Ethernet Service Model
    3.2.  Layer 3 Service Model
    3.3.  Infrastructure Service Model
    3.4.  Node Level Features
    3.5.  Common Service Specification
    3.6.  Common Network Events
          3.6.1.  Event Attributes
          3.6.2.  Hardware Related
          3.6.3.  Software Component
          3.6.4.  Protocol Events
          3.6.5.  Redundancy Failover
4.  Use of Network Service Layer Abstract Model
5.  Acknowledgements
6.  IANA Considerations
7.  Security Considerations
8.  References
    8.1.  Normative References
    8.2.  Informative References
Author's Address

1.  Introduction

This document provides a reference model for a common network service
framework.  The main purpose is to abstract a service model for each
network layer with a small set of key specifications.  This is
essential to characterize the capability and capacity of a production
network or a target network design.  A complete service model mainly
includes:

Infrastructure - devices, links, and other equipment.

Services - network applications provisioned.  A service is often
defined as device configuration and/or resource allocation.

Capacity - a set of objects dynamically created for both the control
and forwarding planes, such as routes, traffic, and subscribers.  In
some cases, the amount and types of traffic may affect control plane
objects, for example in multicast or ethernet networks.

Performance Metrics - infrastructure resource utilization.

1.1.  Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119].

1.2.  Purpose of The Document

Many YANG modeling efforts and OpenConfig collaborations are well
under way.  This document specifies a higher layer abstraction that
reuses a small subset of YANG keywords for service description
purposes.  It SHALL NOT be used for production provisioning purposes.
Instead, it can be adopted for design specifications, capacity
planning, product benchmarking, and test setup.

The specification described in this document SHALL be used to outline
service requirements from the customer's perspective, rather than
network implementation mechanisms from the operator's perspective.

1.3.
Conventions Used in This Document

Descriptive terms can quickly become overloaded.  For consistency,
the following definitions are used.

o Node - the name of an attribute.

o Brackets "[" and "]" enclose list keys.

o Abbreviations before data node names: "rw" means configuration
  data (read-write), and "ro" means state data (read-only).

o Parentheses enclose choice and case nodes, and case nodes are also
  marked with a colon (":").

2.  Network Service Framework

The network service layer abstract model is illustrated by Figure 1.
It shows a stack of components that enable end-to-end services.  Not
all components are required for a given service.  A common use case
is to pick one component as the target service with a detailed
profile, with the remaining components as supporting technologies
using default profiles.

           Network Service Layer Abstract Model

                +---+-+-+-+---+-+-+-+-+
                | S |L|L| |   |L| |L| |
                | E |2|2|V| E |3|M|U|6|
                | R |V|C|P| V |V|V|B|P|
                | V |P|K|L| P |P|P|G|E|
                | I |N|T|S| N |N|N|P| |
                | C +-+-+-+-+-+-+-+-+-+
                | E | L2SVC |  L3SVC  |
                +---+-------+---------+
                |   |       BGP       |
                | I +-----------------+
                | N |    Transport    |
                | F +-----------------+
                | R |       IGP       |
                | A +-----------------+
                |   |    Bridging     |
                +---+---------+-------+
                | T |         | Active|
                | O | Logical |Standby|
                | P +---------+ M-Home|
                | O | Physical|  L/B  |
                +---+-+-+-+-+-+-+-+-+-+
                | N |I| | | | |C| | |I|
                | O |N|F|Q| | |G|N|N|S|
                | D |T|A|O|F|G|S|S|S|S|
                | E |F|B|S|W|R|S|R|F|U|
                +---+-+-+-+-+-+-+-+-+-+

                        Figure 1

2.1.  Node

A network node or network device processes and forwards traffic
based on predefined or dynamically learned policies.  This section
covers standalone features like the following:

o INTF - Network interfaces that provide internal or external
  connectivity to adjacent devices.  Only physical properties of the
  interfaces are of concern at this level.  The interfaces can be
  physical or logical, wired or wireless.

o FAB - Fabric capacity.  The fabric provides redundancy and cross
  connect within the same network device among various linecards or
  interfaces.

o QOS - Quality of Service.  Traffic queuing, buffering, and
  congestion management technologies are used at this level.

o FW - Firewall filters or access control lists.  This commonly
  refers to stateless packet inspection and filtering.  Stateful
  firewalls are out of scope of this document.  The number of
  filters daisy chained on a given protocol family, the number of
  terms within the same filter, and the depth of packet inspection
  can all affect the speed and latency of traffic forwarding.
  Filters also provide necessary security protection of the node,
  where protocol traffic may be affected.

o GR - Graceful Restart per protocol.  It needs to cooperate with
  the adjacent node.

o CGSS - Controller Graceful / Stateful Switchover.  A network
  device often has two redundant controllers to minimize the
  disruption in the event of a catastrophic failure on the primary
  controller.  This is accomplished via real time state
  synchronization from the primary to the backup controller.  It
  should, however, be used along with either NSR or NSF to achieve
  optimal redundancy.

o NSR - Non-Stop Routing - hitless failover of the route processor.
  It maintains an active copy of the routing information base (RIB)
  as well as state for protocol exchange so that protocol
  adjacencies are not reset.

o NSF - Non-Stop Forwarding for layer 2 traffic, including layer 2
  protocols such as spanning tree state.

o ISSU - In Service Software Upgrade - sub-second traffic loss in
  many modern routing platforms.  The demand for this feature
  continues to grow from the field.  Some studies show that the
  downtime due to software upgrades is greater than that caused by
  unplanned outages.

2.2.  Topology

Placement of network devices and corresponding links plays an
important role in optimal traffic forwarding.  There are two types
of topologies:

o Physical Topology - Actual physical connectivity via fiber, coax,
  cat5, or even wireless.  It could be a ring, bus, star, or matrix
  topology, though all can be modeled using point-to-point
  connections.

o Logical Topology - With aggregated ethernet, extended dot1q
  tunneling, or VxLAN, a logical or virtual topology can be easily
  created spanning geographic boundaries.  Recent developments in
  multi-chassis, virtual-chassis, and node-slicing technologies, as
  well as multiple logical units within a single physical node, have
  made logical topology deployment more flexible and agile.

With various topologies, the following functionalities need to be
taken into consideration for feature design and validation.

o Active-Standby - 1:1 or 1:n support.  Liveness detection is
  essential to trigger failover.

o M-Home - Multi-homing support.  A customer edge (CE) device can be
  homed to 2 or more provider edge (PE) devices at the same time.
  This is a common redundancy design in layer 2 service offerings.

o L/B - Load Balancing - When multiple diverse paths exist for a
  given destination, it is important to achieve load balancing based
  on multiple criteria, such as per packet or per prefix.
  Sometimes, cascading effects can make issues more complex and
  harder to resolve.

A topology, whether physical or virtual, can be depicted using a
collection of nodes and point-to-point links.  A broadcast network
or ring topology can also be abstracted using the same collection of
point-to-point links.  For example, in a wireless LAN network, each
client is a node with a wireless LAN NIC as its physical interface.
The access point is the node at which all WLAN clients terminate
over the air.  The Service Set IDentifier (SSID) on the access point
can be considered part of a broadcast domain with many pseudo-ports
taking airwave terminations from clients.

The default link id can use srcnode-dstnode-linkseq to uniquely
identify a link in this topology.  If a link connects two ports on
the same node, it can use a link-id of
srcnode-srcnode-linkseq-portseq.  Additional attributes of the node
can be added with proper placement for automatic topology diagrams.

    Network Topology Definition

    node-id-1 {
        maker: maker_name,
        model: model_name,
        controller: controller-type,
        mgmt_ip: mgmt_ip_address,
        links: {
            link-id-1 {
                name: link_name,
                connector: 'sfpp',
                attr: ['10G', 'Ethernet'],
                node_dst: destination node-id,
                link_seq: sequence number for links between the node pair
                ...
            }
        }
        ...
    }
    node-id-2 {
        ...
    }

                        Figure 2

2.3.  Infrastructure

Network infrastructure here refers to a list of protocols and
policies for a data center network, an enterprise network, or a core
backbone in a service provider network.

o Bridging - Spanning Tree Protocol (STP) and its various flavors,
  802.1q tunneling, Q-in-Q, VRRP, etc.

o IGP - Interior Gateway Protocol - some common choices are OSPF,
  IS-IS, RIP, RIPng.
For multicast, choices are PIM and its various flavors, including
  MSDP, Bootstrap, DVMRP.

o Transport - Tunnel technologies, including:

  * MPLS - Multi-Protocol Label Switching - most commonly used in
    service provider networks.

    + LDP - Label Distribution Protocol - including mLDP and LDP
      tunneling through RSVP LSPs.

    + RSVP - Resource Reservation Protocol - including P2MP and its
      various features like Fast ReRoute (FRR).

  * IPSec - IPSec tunnels with AH or ESP.

  * GRE - Generic Routing Encapsulation (GRE) tunnels provide a
    flexible direct adjacency between two remote routers.

  * VxLAN - In data center interconnect (DCI) solutions, VxLAN
    encapsulation provides the data plane for layer 2 frames.

o BGP - Define the families and sub-SAFIs deployed, as well as the
  route reflector topology.

2.4.  Services

The previous sections mostly outline an operator's implementation of
the network, which customers may not necessarily care about.  This
section defines service profiles from the customer's view.

o Layer 2 Services

  * Layer 2 VPN - RFC 6624

  * Martini Layer 2 Circuit - RFC 4906

  * Virtual Private LAN Services - RFC 4761

  * Ethernet VPN - RFC 7432

o Layer 3 Services

  * Type 5 Route for EVPN -
    draft-ietf-bess-evpn-prefix-advertisement-05

  * Layer 3 VPN - RFC 4364

  * Labeled Unicast BGP - RFC 3107

  * Draft Rosen MVPN - RFC 6037

  * NG MVPN - RFC 6513

  * 6PE - RFC 4798

In the next section, an abstract model is proposed to identify key
metrics for both the layer 2 and layer 3 models.

3.  Service Models

A service model is a high level abstraction of network deployment
from the bottom up.  It defines a set of common key characteristics
of the customer traffic profile in both the control and forwarding
planes.  The network itself should be considered a blackbox that
delivers the services regardless of the network equipment vendor or
network technologies.
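One way to make such a blackbox model executable is to expand its
aggregated specifications into the concrete objects a protocol
emulator or traffic generator needs.  A minimal sketch, where the
function name and attribute layout are hypothetical illustrations
rather than part of the model:

```python
# Hypothetical sketch: expanding an aggregated service-model metric
# (a total prefix count plus a prefix-length distribution, in the
# spirit of the prefix-count / prefix-distrib attributes) into
# concrete per-length route counts for a route generator.

def expand_prefix_distrib(prefix_count, prefix_distrib):
    """Map an aggregated distribution to absolute route counts.

    prefix_distrib is a list of (prefix_length, percentage) pairs;
    the percentages are expected to sum to 100.
    """
    total_pct = sum(pct for _, pct in prefix_distrib)
    if total_pct != 100:
        raise ValueError("prefix-distrib percentages must sum to 100")
    counts = {length: prefix_count * pct // 100
              for length, pct in prefix_distrib}
    # Assign any integer-rounding remainder to the most common length,
    # so the expanded table matches the aggregate exactly.
    remainder = prefix_count - sum(counts.values())
    top_length = max(prefix_distrib, key=lambda lp: lp[1])[0]
    counts[top_length] += remainder
    return counts

# A distribution loosely resembling a public IPv4 table profile.
print(expand_prefix_distrib(700000, [(24, 55), (23, 10), (22, 15),
                                     (21, 5), (16, 15)]))
```

The same expansion idea applies to other aggregated attributes, such
as frame-size mixtures or BGP path distributions.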
The abstraction removes some details like specific IP address
assignment, and favors address ranges and their distribution.  The
goal is to describe aggregated network behavior instead of granular
network element configuration.  It is up to the implementation to
map aggregated metrics to actual configuration for the network
devices, protocol emulator, and traffic generator.

A single network may be comprised of multiple instances of the
service models defined below.

3.1.  Layer 2 Ethernet Service Model

The metrics outlined below are for layer 2 network services,
typically within a data center, data center interconnect, metro
ethernet, or a layer 2 domain over WAN or even inter-carrier.

o service-type: identityref, ELAN, ELINE, ETREE

o sites-per-instance: uint32, an average number of sites a layer 2
  instance may span

o global-mac-count: uint32, global MAC count from all attachment
  circuits, local and remote.  This is probably the most important
  metric that determines the capacity requirements in layer 2 for
  both the control and forwarding planes.

o interface-mac-max: uint32, maximum number of locally learned MAC
  addresses per logical interface, aka attachment circuit

o single-home-segments: uint32, number of single homed ethernet
  segments per service instance

o multi-home-segments: uint32, number of multi homed ethernet
  segments per layer 2 service instance

o service-instance-count: uint32, total number of layer 2 service
  instances.  Typically, one customer is provisioned as one service
  instance.

o traffic-type: list, {known-unicast: %, multicast: %, broadcast: %,
  unknown-unicast: %}

o traffic-frame-size: list, predefined mixture of traffic frame size
  distribution

o traffic-load: speed of traffic being sent towards the network.
  This can be defined as frames per second (fps), or actual speed in
  bps.  This is particularly important whenever some component along
  the forwarding path is implemented in software, where throughput
  might be affected significantly at high speed.

o traffic-flow: a distribution of flows.  This may affect efficient
  use of load-balancing techniques and resource consumption.  More
  details are discussed in a later section of this document.

o layer3-gateway-count: uint32, number of layer 2 service instances
  that also provide layer 3 gateway service

o arpnp-table-size: uint32.  This is only relevant in the presence
  of a layer 3 gateway.

Integrated routing and bridging (IRB) and the EVPN Type 5 route have
blurred the boundaries between layer 2 and layer 3 services.

3.2.  Layer 3 Service Model

This section outlines traffic type, layer 3 protocol families, layer
3 prefix distribution, and layer 3 traffic flow and packet size
distributions.

o proto-family: protocol families are defined with three sub-
  attributes.  The list may grow as complexity grows.

  * proto - list: inet, inet6, iso

  * type - list: unicast, mcast, segment, labeled

  * vpn - list: true, false

o prefix-count: uint32, total unique prefixes

o prefix-distrib: list of prefix length and percentage.  This could
  be a distribution pattern, such as uniform or random, or simply
  the top representation of prefix lengths.

o bgp-path-count: uint32, total BGP paths

o bgp-path-distrib: top representation of the number of paths per
  prefix

o traffic-frame-size: similar to traffic-frame-size in the layer 2
  model.  The focus is on the MTU size of each protocol interface
  and the impact of fragmentation.

o traffic-flow: similar to traffic-flow in the layer 2 model; it
  focuses on a set of labels, source and destination addresses, as
  well as ports.

o traffic-load: similar to traffic-load in the layer 2 model

o ifl-count: uint32

o vpn-count: uint32

3.3.
Infrastructure Service Model

o bgp-peer-ext-count: uint32, number of eBGP peers

o bgp-peer-int-count: uint32, number of iBGP peers

o bgp-path-mtu: list, true or false.  A larger path MTU helps
  convergence.

o bgp-hold-time-distrib: list of top hold-time values and their
  respective percentage out of all peers

o bgp-as-path-distrib: list of top as-path lengths and their
  respective percentage of all BGP paths

o bgp-community-distrib: list of top community sizes and their
  respective percentage out of all BGP paths

o mpls-sig: list, MPLS signaling protocol, rsvp or ldp

o rsvp-lsp-count-ingress: uint32, total ingress lsp count

o rsvp-lsp-count-transit: uint32, total transit lsp count

o rsvp-lsp-count-egress: uint32, total egress lsp count

o ldp-fec-count: uint32, total forwarding equivalence classes

o rsvp-lsp-protection: list, link-node, link, frr

o ospf-interface-type: list, point-to-point, broadcast,
  non-broadcast multi-access

o ospf-lsa-distrib: list.  The OSPF Link State Advertisement
  distribution is comprised of LSAs for core routers in the backbone
  area and internal routers in non-backbone areas.  A common model
  can include the number of LSAs per OSPF LSA type.

o ospf-route-count: list, total OSPF routes in both backbone and
  non-backbone areas

o isis-lsp-distrib: list, similar to ospf-lsa-distrib

o isis-route-count: list, total IS-IS routes in both level-1 and
  level-2 areas

TODO: bridging, OAM, EOAM, BFD, etc.

3.4.  Node Level Features

TODO: node level feature set

3.5.  Common Service Specification

For most network services, regardless of layer 2 or layer 3 or
protocol families, the following needs to be considered when
measuring network capacity and baseline.

o rib-learning-time: uint32, in seconds.  This indicates how quickly
  the route processor learns routing objects, both locally and
  remotely.

o fib-learning-time: in a large routing system, the forwarding
  engine, residing on separate hardware from the controller, takes
  additional time to install all forwarding entries learned by the
  controller.

o convergence-time: this could be the result of many events, such as
  uplink failure, AE member link failure, fast reroute, local
  repair, upstream node failure, etc.

o multihome-failover-time: this refers to traffic convergence in a
  topology where a customer edge (CE) device is connected to two or
  more provider edge (PE) devices.

o issu-dark-window-size: unlike NSR, the goal of ISSU is not zero
  packet loss.  Instead, there will be a dark window of a few
  seconds, or in some cases sub-second, with total packet loss for
  both transit and host bound protocol traffic.

o cpu-util: total CPU utilization of the controllers in steady state

o cpu-util-peak: peak CPU utilization on the controller in the event
  of failure and convergence

o mem-util: total memory utilization of the controllers in steady
  state

o mem-util-peak: total memory utilization on the controller in the
  event of failure and convergence

o processes: list of top processes running on the controllers with
  their CPU and memory utilization

o lc-cpu-util: top CPU utilization on the line cards

o lc-cpu-util-peak: maximum peak CPU utilization among all line
  cards in the event of failure and convergence

o lc-mem-util: top memory utilization on the line cards

o lc-mem-util-peak: maximum peak memory utilization among all line
  cards in the event of failure and convergence

o throughput: in both pps and bps.  This is measured with zero
  packet loss.  For virtualized environments, throughput is
  sometimes measured with a small loss tolerance given the nature of
  shared resources.

o traffic performance: in both pps and bps.  It is measured as the
  rate of traffic received while pumping oversubscribed traffic at
  ingress.

o latency: in us.  This is more important within a local data center
  environment than for DCI over a wide area network.  Use of
  extensive firewall filters or access control lists may affect
  latency.

o Out of Order Packets - These can happen intra-node or over ECMP
  where different paths have large latency/delay variations.

This list of metrics can be used for network monitoring during
network resiliency tests, to understand how quickly a network
service can be restored during various events and failures.

3.6.  Common Network Events

A list of events is defined to characterize network resiliency.
These attributes require that the provider networks have diverse
paths and node redundancy built in.  They directly affect service
level agreements and network availability.

3.6.1.  Event Attributes

Each network or system event may be defined with the following
aspects:

o event-iteration: uint16, number of times the event is repeated

o event-interval: uint16, seconds between consecutive events

o event-dist: list, random, equal, or other type of event scheduling

o event-timeout: uint16, seconds within which a single event is
  expected to complete

o event-convergence: uint16, seconds before the network is expected
  to recover

3.6.2.  Hardware Related

Some hardware failures cannot easily be replicated, or even
simulated, in a lab environment, such as memory errors.  A system or
network should be equipped to monitor, detect, and contain the
impact to avoid a global catastrophic failure that may propagate
beyond a single node or the regional network.

o hw-mod-yank: hardware module removal and insertion

o hw-interface: transceiver on/off or any other simulated link
  failure

o hw-storage: storage failure, either local or network attached

o hw-power: unplanned power failure

o hw-controller: controller failure

o hw-memory: memory errors

3.6.3.
Software Component

o sw-daemon-watchdog-loss: induced CPU hog that triggers a watchdog
  failure

o sw-daemon-restart-graceful: graceful software daemon restart

o sw-daemon-restart-kill: the process is killed and the daemon is
  forced to restart

o sw-daemon-panic: sometimes a panic can be introduced to trigger a
  coredump of a software daemon along with a restart

o sw-os-panic: a network operating system may panic under various
  situations.  Many network products with console access support an
  OS panic triggered by a special keystroke sequence.  Sometimes it
  may also generate a coredump for further debugging.

o sw-upgrade: in many provider networks, most downtime actually
  comes from scheduled maintenance, especially software or firmware
  upgrades to provide a better feature set or to apply a security
  patch.  It is important to understand the downtime requirement for
  a routine software upgrade on a given network device or the
  devices in the network.  This often presents a challenge to the
  access network.

3.6.4.  Protocol Events

o protocol-keepalive-loss: loss of keepalives, such as hellos for
  routing protocols like OSPF and BGP

o oam-keepalive-loss: there are many OAM protocols, such as EOAM,
  MPLS OAM, and LACP; one of their main purposes is to detect
  reachability.  This is different from routing protocol keepalives.

o protocol-adjacency-reset: clear all protocol neighbors and any
  routes or link states learned from the neighbors

o protocol-db-purge: remove all database objects learned from a
  particular neighbor, a group of neighbors, or all neighbors.  The
  database may be the original set of state learned from neighbors,
  or the consolidated database.

3.6.5.  Redundancy Failover

Network protocols and designs have a lot of redundancy built in.  It
is important to benchmark their effectiveness.
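The failover events below are commonly quantified by converting
counted packet loss into an outage duration at a known offered load.
A minimal sketch of that calculation, with hypothetical counter
values:

```python
# Hypothetical sketch: deriving a failover dark window from traffic
# counters, as used for events like ha-lag-links or ha-controller.
# With a constant offered load, packets lost divided by the offered
# rate approximates the outage duration.

def outage_ms(tx_packets, rx_packets, offered_pps):
    """Estimate outage duration in milliseconds from counter deltas."""
    lost = tx_packets - rx_packets
    if lost < 0:
        raise ValueError("received more packets than sent")
    return lost * 1000.0 / offered_pps

# Example: 50,000 packets lost at 1 Mpps offered load.
print(outage_ms(tx_packets=10_000_000, rx_packets=9_950_000,
                offered_pps=1_000_000))
```

This assumes a constant offered rate during the event; bursty or
rate-limited traffic would need timestamped loss measurements
instead.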
o ha-lag-links: measure packet loss in milliseconds when member
  link(s) of a link aggregation group are torn down.  In the case of
  a protected interface, traffic should fail over seamlessly to the
  backup interface in the event of a primary link failure.

o ha-controller: in a system with redundant controllers, measure the
  network recovery time when the primary controller fails.  If
  advanced non-stop routing/forwarding is enabled, the network
  should experience only zero or sub-second traffic loss.

o ha-multihome: in addition to device level redundancy, many
  protocols support network layer redundancy through multihoming,
  such as EVPN.

o ha-mpls-frr: MPLS RSVP Fast ReRoute provides core network
  redundancy.

o ha-uplink: the core network is typically designed to provide path
  diversity for edge devices, for either layer 2 or layer 3
  connectivity.  The resiliency of the network is measured by how
  fast the system detects the failure and reroutes the traffic.

4.  Use of Network Service Layer Abstract Model

The primary goal is to characterize and document a complex network
using a simplified service model.  While eliminating many details
such as address assignment and actual route or MAC entries, it
retains a set of key network information, including services, scale,
and performance profiles.  This can be used to validate how well
each underlying solution performs when delivering the same set of
services.

The model can also be used to build a virtualized topology with both
static and dynamic scale that closely resembles a real network.
This eases network design and benchmarking, and helps capacity
planning by studying the impact of changes to a specific dimension.

5.  Acknowledgements

The authors appreciate and acknowledge comments from Al Morton and
others based on initial discussions.

6.  IANA Considerations

This memo includes no request to IANA.

7.  Security Considerations

This document proposes an abstract model for describing and
benchmarking network services; it does not itself define new
protocol elements.  See RFC 3552 [RFC3552] for a guide to security
considerations.

8.  References

8.1.  Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
          Requirement Levels", BCP 14, RFC 2119,
          DOI 10.17487/RFC2119, March 1997,
          <https://www.rfc-editor.org/info/rfc2119>.

[RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC
          Text on Security Considerations", BCP 72, RFC 3552,
          DOI 10.17487/RFC3552, July 2003,
          <https://www.rfc-editor.org/info/rfc3552>.

[RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an
          IANA Considerations Section in RFCs", RFC 5226,
          DOI 10.17487/RFC5226, May 2008,
          <https://www.rfc-editor.org/info/rfc5226>.

8.2.  Informative References

[L2SM]    B. Wu et al., "A Yang Data Model for L2VPN Service
          Delivery", 2017.

[RFC8049] Litkowski, S., Tomotaki, L., and K. Ogaki, "YANG Data
          Model for L3VPN Service Delivery", RFC 8049,
          DOI 10.17487/RFC8049, February 2017,
          <https://www.rfc-editor.org/info/rfc8049>.

Author's Address

Sean Wu
Juniper Networks
2251 Corporate Park Dr.
Suite #200
Herndon, VA 20171
US

Phone: +1 571 203 1898
Email: xwu@juniper.net