2	Network Working Group                                   G. Fioccola, Ed.
3	Internet-Draft                                                A. Capello
4	Intended status: Experimental                                M. Cociglio
5	Expires: March 15, 2018                                   L. Castaldelli
6	                                                          Telecom Italia
7	                                                                 M. Chen
8	                                                                L. Zheng
9	                                                     Huawei Technologies
10	                                                               G. Mirsky
11	                                                                     ZTE
12	                                                              T. Mizrahi
13	                                                                 Marvell
14	                                                      September 11, 2017

16	 Alternate Marking method for passive and hybrid performance monitoring
17	                      draft-ietf-ippm-alt-mark-10

19	Abstract

21	   This document describes a method to perform packet loss, delay and
22	   jitter measurements on live traffic.  This method is based on the
23	   Alternate Marking (Coloring) technique.  A report is provided in
24	   order to explain an example and show the method applicability.  This
25	   technique can be applied in various situations as detailed in this
26	   document and could be considered passive or hybrid depending on the
27	   application.

29	Requirements Language

31	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
32	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
33	   "OPTIONAL" in this document are to be interpreted as described in BCP
34	   14 [RFC2119] [RFC8174] when, and only when, they appear in all
35	   capitals, as shown here.

37	Status of This Memo

39	   This Internet-Draft is submitted in full conformance with the
40	   provisions of BCP 78 and BCP 79.

42	   Internet-Drafts are working documents of the Internet Engineering
43	   Task Force (IETF).  Note that other groups may also distribute
44	   working documents as Internet-Drafts.  The list of current Internet-
45	   Drafts is at https://datatracker.ietf.org/drafts/current/.

47	   Internet-Drafts are draft documents valid for a maximum of six months
48	   and may be updated, replaced, or obsoleted by other documents at any
49	   time.  It is inappropriate to use Internet-Drafts as reference
50	   material or to cite them other than as "work in progress."

52	   This Internet-Draft will expire on March 15, 2018.

54	Copyright Notice

56	   Copyright (c) 2017 IETF Trust and the persons identified as the
57	   document authors.  All rights reserved.

59	   This document is subject to BCP 78 and the IETF Trust's Legal
60	   Provisions Relating to IETF Documents
61	   (https://trustee.ietf.org/license-info) in effect on the date of
62	   publication of this document.  Please review these documents
63	   carefully, as they describe your rights and restrictions with respect
64	   to this document.  Code Components extracted from this document must
65	   include Simplified BSD License text as described in Section 4.e of
66	   the Trust Legal Provisions and are provided without warranty as
67	   described in the Simplified BSD License.

69	Table of Contents

71	   1.  Introduction
72	   2.  Overview of the method
73	   3.  Detailed description of the method
74	     3.1.  Packet loss measurement
75	       3.1.1.  Coloring the packets
76	       3.1.2.  Counting the packets
77	       3.1.3.  Collecting data and calculating packet loss
78	     3.2.  Timing aspects
79	     3.3.  One-way delay measurement
80	       3.3.1.  Single marking methodology
81	       3.3.2.  Double marking methodology
82	     3.4.  Delay variation measurement
83	   4.  Considerations
84	     4.1.  Synchronization
85	     4.2.  Data Correlation
86	     4.3.  Packet Re-ordering
87	   5.  Implementation and deployment
88	     5.1.  Report on the operational experiment at Telecom Italia
89	       5.1.1.  Metric transparency
90	     5.2.  IP flow performance measurement (IPFPM)
91	     5.3.  OAM Passive Performance Measurement
92	     5.4.  RFC6374 Use Case
93	     5.5.  Application to active performance measurement
94	   6.  Hybrid measurement
95	   7.  Summary
96	   8.  Compliance with RFC6390 guidelines
97	   9.  Security Considerations
98	   10. IANA Considerations
99	   11. Acknowledgements
100	   12. References
101	     12.1.  Normative References
102	     12.2.  Informative References
103	   Authors' Addresses

105	1.  Introduction

107	   Nowadays, most Service Providers' networks carry traffic with
108	   contents that are highly sensitive to packet loss [RFC7680], delay
109	   [RFC7679], and jitter [RFC3393].

111	   In view of this scenario, Service Providers need methodologies and
112	   tools to monitor and measure network performance with adequate
113	   accuracy, in order to constantly control the quality of experience
114	   perceived by their customers.  In addition, performance
115	   monitoring provides useful information for improving network
116	   management (e.g., isolation of network problems and
117	   troubleshooting).

119	   A lot of work related to OAM, which also includes performance
120	   monitoring techniques, has been done by Standards Developing
121	   Organizations (SDOs): [RFC7276] provides a good overview of existing
122	   OAM mechanisms defined in IETF, ITU-T and IEEE.  Considering IETF, a
123	   lot of work has been done on fault detection and connectivity
124	   verification, while less effort has been dedicated so far to
125	   performance monitoring.  The IPPM WG has defined standard metrics to
126	   measure network performance; however, the methods developed in this
127	   WG mainly focus on active measurement techniques.  More
128	   recently, the MPLS WG has defined mechanisms for measuring packet
129	   loss, one-way and two-way delay, and delay variation in MPLS
130	   networks [RFC6374], but their applicability to passive measurements
131	   has some limitations, especially for pure connection-less networks.

133	   The lack of adequate tools to measure packet loss with the desired
134	   accuracy drove an effort to design a new method for the performance
135	   monitoring of live traffic, possibly easy to implement and deploy.
136	   The effort led to the method described in this document: basically,
137	   it is a passive performance monitoring technique, potentially
138	   applicable to any kind of packet based traffic, including Ethernet,
139	   IP, and MPLS, both unicast and multicast.  The method addresses
140	   primarily packet loss measurement, but it can be easily extended to
141	   one-way delay and delay variation measurements as well.

143	   The method has been explicitly designed for passive measurements but
144	   it can also be used with active probes.  Passive measurements are
145	   usually more easily understood by customers and provide much better
146	   accuracy, especially for packet loss measurements.

148	   RFC 7799 [RFC7799] defines passive and hybrid methods of measurement.
149	   In particular, Passive Methods of Measurement are based solely on
150	   observations of an undisturbed and unmodified packet stream of
151	   interest; Hybrid Methods are Methods of Measurement that use a
152	   combination of Active Methods and Passive Methods.

154	   Taking these definitions into consideration, the Alternate Marking
155	   Method could be considered Hybrid or Passive depending on the case.
156	   If the marking field is obtained by changing existing field values
157	   of the packets (e.g., the DSCP field), the technique is Hybrid.  If
158	   the marking field is dedicated, reserved, and included in the
159	   protocol specification, the Alternate Marking technique can be
160	   considered Passive (e.g., RFC6374 Synonymous Flow Label or OAM
161	   Marking Bits in BIER Header).

163	   This document is organized as follows:

165	   o  Section 2 gives an overview of the method, including a comparison
166	      with different measurement strategies;

168	   o  Section 3 describes the method in detail;

170	   o  Section 4 reports considerations about synchronization, data
171	      correlation and packet re-ordering;

173	   o  Section 5 reports examples of implementation and deployment of the
174	      method.  Furthermore the operational experiment done at Telecom
175	      Italia is described;

177	   o  Section 6 introduces Hybrid measurement aspects;

179	   o  Section 7 summarizes some concluding remarks;

181	   o  Section 8 is about the compliance with RFC6390 guidelines;

183	   o  Section 9 includes some security aspects.

185	2.  Overview of the method

187	   In order to perform packet loss measurements on a live traffic flow,
188	   different approaches exist.  The most intuitive one consists of
189	   numbering the packets, so that each router that receives the flow can
190	   immediately detect a missing packet.  This approach, though very
191	   simple in theory, is not simple to achieve: it requires the insertion
192	   of a sequence number into each packet and the devices must be able to
193	   extract the number and check it in real time.  Such a task can be
194	   difficult to implement on live traffic: if UDP is used as the
195	   transport protocol, the sequence number is not available; on the
196	   other hand, if a higher layer sequence number (e.g., in the RTP
197	   header) is used, extracting that information from each packet and
198	   processing it in real time could overload the device.

200	   An alternate approach is to count the number of packets sent on one
201	   end, the number of packets received on the other end, and to compare
202	   the two values.  This operation is much simpler to implement, but
203	   requires that the devices performing the measurement are in sync: in
204	   order to compare two counters it is required that they refer exactly
205	   to the same set of packets.  Since a flow is continuous and cannot be
206	   stopped when a counter has to be read, it could be difficult to
207	   determine exactly when to read the counter.  A possible solution to
208	   overcome this problem is to virtually split the flow into consecutive
209	   blocks by periodically inserting a delimiter so that each counter
210	   refers exactly to the same block of packets.  The delimiter could be,
211	   for example, a special packet inserted artificially into the flow.
212	   However, delimiting the flow using specific packets has some
213	   limitations.  First, it requires generating additional packets within
214	   the flow and requires the equipment to be able to process those
215	   packets.  In addition, the method is vulnerable to out-of-order
216	   reception of delimiting packets and, to a lesser extent, to their
217	   loss.

219	   The method proposed in this document follows the second approach, but
220	   it doesn't use additional packets to virtually split the flow into
221	   blocks.  Instead, it "marks" the packets so that the packets
222	   belonging to the same block will have the same color, whilst
223	   consecutive blocks will have different colors.  Each change of color
224	   represents a sort of auto-synchronization signal that guarantees the
225	   consistency of measurements taken by different devices along the path
226	   (see also [I-D.cociglio-mboned-multicast-pm] and
227	   [I-D.tempia-opsawg-p3m], where this technique was introduced).

229	   Figure 1 represents a very simple network and shows how the method
230	   can be used to measure packet loss on different network segments: by
231	   enabling the measurement on several interfaces along the path, it is
232	   possible to perform link monitoring, node monitoring or end-to-end
233	   monitoring.  The method is flexible enough to measure packet loss on
234	   any segment of the network and can be used to isolate the faulty
235	   element.

237	                            Traffic flow
238	        ========================================================>
239	          +------+       +------+       +------+       +------+
240	      ---<>  R1  <>-----<>  R2  <>-----<>  R3  <>-----<>  R4  <>---
241	          +------+       +------+       +------+       +------+
242	          .              .      .              .       .      .
243	          .              .      .              .       .      .
244	          .              <------>              <------->      .
245	          .          Node Packet Loss      Link Packet Loss   .
246	          .                                                   .
247	          <--------------------------------------------------->
248	                           End-to-End Packet loss

250	                     Figure 1: Available measurements

252	3.  Detailed description of the method

254	   This section describes in detail how the method operates.  Special
255	   emphasis is given to the measurement of packet loss, which represents
256	   the core application of the method, but applicability to delay and
257	   jitter measurements is also considered.

259	3.1.  Packet loss measurement

261	   The basic idea is to virtually split traffic flows into consecutive
262	   blocks: each block represents a measurable entity unambiguously
263	   recognizable by all network devices along the path.  By counting the
264	   number of packets in each block and comparing the values measured by
265	   different network devices along the path, it is possible to measure
266	   packet loss that occurred in any single block between any two points.

268	   As discussed in the previous section, a simple way to create the
269	   blocks is to "color" the traffic (two colors are sufficient) so that
270	   packets belonging to different consecutive blocks will have different
271	   colors.  Whenever the color changes, the previous block terminates
272	   and the new one begins.  Hence, all the packets belonging to the same
273	   block will have the same color and packets of different consecutive
274	   blocks will have different colors.  The number of packets in each
275	   block depends on the criterion used to create the blocks:

277	   o  if the color is switched after a fixed number of packets, then
278	      each block will contain the same number of packets (except for any
279	      losses);

281	   o  if the color is switched according to a fixed timer, then the
282	      number of packets may be different in each block depending on the
283	      packet rate.
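
   The two switching criteria above can be sketched as follows.  This is
   only an illustration, not part of the method's specification: the
   function names, the use of seconds as the time unit and the
   convention that the first block is colored A are assumptions made
   for the example.

```python
def color_by_count(packet_index: int, block_size: int) -> str:
    """Color of the packet_index-th packet (0-based) when the color
    switches after a fixed number of packets."""
    return "A" if (packet_index // block_size) % 2 == 0 else "B"


def color_by_timer(elapsed: float, block_duration: float) -> str:
    """Color of a packet sent `elapsed` seconds after marking started
    when the color switches on a fixed timer."""
    return "A" if int(elapsed // block_duration) % 2 == 0 else "B"
```

   With the count-based rule every block holds block_size packets at the
   marking point; with the timer-based rule the block size follows the
   packet rate.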

285	   The following figure shows what a flow looks like when it is split
286	   into traffic blocks with colored packets.

288	   A: packet with A coloring
289	   B: packet with B coloring

291	            |           |           |           |           |
292	            |           |    Traffic flow       |           |
293	    ------------------------------------------------------------------->
294	     BBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA
295	    ------------------------------------------------------------------->
296	       ...  |  Block 5  |  Block 4  |  Block 3  |  Block 2  |  Block 1
297	            |           |           |           |           |

299	                        Figure 2: Traffic coloring

301	   Figure 3 shows how the method can be used to measure link packet loss
302	   between two adjacent nodes.

304	   Referring to the figure, let's assume we want to monitor the packet
305	   loss on the link between two routers: router R1 and router R2.
306	   According to the method, the traffic is colored alternately with
307	   two different colors, A and B.  Whenever the color changes, the
308	   transition generates a sort of square-wave signal, as depicted in the
309	   following figure.

311	   Color A   ----------+           +-----------+           +----------
312	                       |           |           |           |
313	   Color B             +-----------+           +-----------+
314	              Block n        ...      Block 3     Block 2     Block 1
315	            <---------> <---------> <---------> <---------> <--------->

317	                                Traffic flow
318	            ===========================================================>
319	   Color ...AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA...
320	            ===========================================================>

322	                 Figure 3: Computation of link packet loss

324	   Traffic coloring could be done by R1 itself or by an upstream router.
325	   R1 needs two counters, C(A)R1 and C(B)R1, on its egress interface:
326	   C(A)R1 counts the packets with color A and C(B)R1 counts those with
327	   color B.  As long as traffic is colored A, only counter C(A)R1 will
328	   be incremented, while C(B)R1 is not incremented; vice versa, when the
329	   traffic is colored as B, only C(B)R1 is incremented.  C(A)R1 and
330	   C(B)R1 can be used as reference values to determine the packet loss
331	   from R1 to any other measurement point down the path.  Router R2,
332	   similarly, will need two counters on its ingress interface, C(A)R2
333	   and C(B)R2, to count the packets received on that interface and
334	   colored with color A and B respectively.  When an A block ends, it is
335	   possible to compare C(A)R1 and C(A)R2 and calculate the packet loss
336	   within the block; similarly, when the successive B block terminates,
337	   it is possible to compare C(B)R1 with C(B)R2, and so on for every
338	   successive block.
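
   The counter pair described above could be modeled as in the sketch
   below.  The MeasurementPoint class and the block_loss helper are
   hypothetical names introduced for illustration; a real device would
   keep these counters in its forwarding plane.

```python
from collections import Counter


class MeasurementPoint:
    """Two per-color packet counters on one interface (hypothetical helper)."""

    def __init__(self) -> None:
        self.count = Counter({"A": 0, "B": 0})

    def observe(self, color: str) -> None:
        # Only the counter matching the packet's color is incremented.
        self.count[color] += 1


def block_loss(upstream: MeasurementPoint, downstream: MeasurementPoint,
               color: str) -> int:
    """Packets of a finished block of `color` lost between the two points."""
    return upstream.count[color] - downstream.count[color]


r1, r2 = MeasurementPoint(), MeasurementPoint()
for _ in range(5):   # five A-colored packets leave R1 ...
    r1.observe("A")
for _ in range(4):   # ... but only four of them reach R2
    r2.observe("A")
lost = block_loss(r1, r2, "A")
```

   The comparison is meaningful only once the A block has ended, so that
   both C(A) counters are at rest.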

340	   Likewise, by using two counters on the R2 egress interface it is
341	   possible to count the packets sent out of the R2 interface and use
342	   them as reference values to calculate the packet loss from R2 to any
343	   measurement point downstream of R2.

345	   Using a fixed timer for color switching offers better control over
346	   the method: the (time) length of the blocks can be chosen large
347	   enough to simplify the collection and the comparison of measures
348	   taken by different network devices.  It's preferable not to read the
349	   value of the counters immediately after the color switch: some
350	   packets could arrive out of order and increment the counter
351	   associated with the previous block (color), so it is worth waiting
352	   for some time.  A safe choice is to wait L/2 time units (where L is
353	   the duration of each block) after the color switch before reading
354	   the still counter of the previous color, so the possibility of
355	   reading a running counter instead of a still one is minimized.  The
356	   drawback is that the longer the duration of the block, the less
357	   frequently the measurement can be taken.
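
   As a small worked illustration of the L/2 rule, the sketch below
   computes when the counter of a given block would be read.  The
   function name, the 0-based block index and the common start time are
   assumptions of the example.

```python
def counter_read_time(start: float, block_duration: float,
                      block_index: int) -> float:
    """Time at which the counter of block `block_index` (0-based) is read:
    L/2 after the color switch that ends the block."""
    block_end = start + (block_index + 1) * block_duration
    return block_end + block_duration / 2
```

   For example, with L = 300 time units and marking started at time 0,
   the first block ends at 300 and its counter is read at 450, halfway
   through the following block of the other color.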

359	   The following table shows how the counters can be used to calculate
360	   the packet loss between R1 and R2.  The first column lists the
361	   sequence of traffic blocks while the other columns contain the
362	   counters of A-colored packets and B-colored packets for R1 and R2.
363	   In this example, we assume that the values of the counters are reset
364	   to zero whenever a block ends and its associated counter has been
365	   read: with this assumption, the table shows only relative values,
366	   that is the exact number of packets of each color within each block.
367	   If the values of the counters were not reset, the table would contain
368	   cumulative values, but the relative values could be determined simply
369	   by difference from the value of the previous block of the same color.

371	   The color is switched on the basis of a fixed timer (not shown in the
372	   table), so the number of packets can differ from block to block.

374	           +-------+--------+--------+--------+--------+------+
375	           | Block | C(A)R1 | C(B)R1 | C(A)R2 | C(B)R2 | Loss |
376	           +-------+--------+--------+--------+--------+------+
377	           | 1     | 375    | 0      | 375    | 0      | 0    |
378	           |       |        |        |        |        |      |
379	           | 2     | 0      | 388    | 0      | 388    | 0    |
380	           |       |        |        |        |        |      |
381	           | 3     | 382    | 0      | 381    | 0      | 1    |
382	           |       |        |        |        |        |      |
383	           | 4     | 0      | 377    | 0      | 374    | 3    |
384	           |       |        |        |        |        |      |
385	           | ...   | ...    | ...    | ...    | ...    | ...  |
386	           |       |        |        |        |        |      |
387	           | 2n    | 0      | 387    | 0      | 387    | 0    |
388	           |       |        |        |        |        |      |
389	           | 2n+1  | 379    | 0      | 377    | 0      | 2    |
390	           +-------+--------+--------+--------+--------+------+

392	       Table 1: Evaluation of counters for packet loss measurements
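
   Table 1 assumes that counters are reset after each reading.  For the
   alternative case of counters that are never reset, the per-block
   values can be recovered by difference, as in this sketch.  The helper
   name is hypothetical and counter wrap-around is ignored for
   simplicity.

```python
def per_block_counts(cumulative: list[int]) -> list[int]:
    """Turn successive cumulative readings of one color counter into
    per-block packet counts."""
    previous = 0
    counts = []
    for reading in cumulative:
        counts.append(reading - previous)
        previous = reading
    return counts


# C(A)R1 read after blocks 1 and 3 of Table 1 without ever being reset.
a_blocks_r1 = per_block_counts([375, 375 + 382])
```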

394	   During an A block (blocks 1, 3 and 2n+1), all the packets are
395	   A-colored, therefore the C(A) counters are incremented to the number
396	   seen on the interface, while C(B) counters are zero.  Vice versa,
397	   during a B block (blocks 2, 4 and 2n), all the packets are B-colored:
398	   C(A) counters are zero, while C(B) counters are incremented.

400	   When a block ends (because of color switching) the relative counters
401	   stop incrementing and it is possible to read them, compare the values
402	   measured on router R1 and R2 and calculate the packet loss within
403	   that block.

405	   For example, looking at the table above, during the first block
406	   (A-colored), C(A)R1 and C(A)R2 have the same value (375), which
407	   corresponds to the exact number of packets of the first block (no
408	   loss).  Also during the second block (B-colored) R1 and R2 counters
409	   have the same value (388), which corresponds to the number of packets
410	   of the second block (no loss).  During blocks three and four, R1 and
411	   R2 counters are different, meaning that some packets have been lost:
412	   in the example, one single packet (382-381) was lost during block
413	   three and three packets (377-374) were lost during block four.
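
   The arithmetic of this example can be reproduced with a few lines of
   code.  The values are those of the first four rows of Table 1; the
   data layout is an assumption of the sketch.

```python
# (block, color, counter on R1, counter on R2) for blocks 1-4 of Table 1
TABLE_1 = [
    (1, "A", 375, 375),
    (2, "B", 388, 388),
    (3, "A", 382, 381),
    (4, "B", 377, 374),
]

# Loss per block is the difference between the two same-color counters.
losses = {block: r1 - r2 for block, _color, r1, r2 in TABLE_1}
```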

415	   The method applied to R1 and R2 can be extended to any other router
416	   and applied to more complex networks, as long as the measurement is
417	   enabled on the path followed by the traffic flow(s) being observed.

419	   It's worth mentioning two different strategies that can be used when
420	   implementing the method:

422	   o  flow-based: the flow-based strategy is used when only a limited
423	      number of traffic flows need to be monitored.  According to this
424	      strategy, only a subset of the flows is colored.  Counters for
425	      packet loss measurements can be instantiated for each single flow,
426	      or for the set as a whole, depending on the desired granularity.
427	      A relevant problem with this approach is the necessity to know in
428	      advance the path followed by flows that are subject to
429	      measurement.  Path rerouting and traffic load-balancing increase
430	      the issue complexity, especially for unicast traffic.  The problem
431	      is easier to solve for multicast traffic where load balancing is
432	      seldom used and static joins are frequently used to force traffic
433	      forwarding and replication.

435	   o  link-based: measurements are performed on all the traffic on a
436	      link by link basis.  The link could be a physical link or a
437	      logical link.  Counters could be instantiated for the traffic as a
438	      whole or for each traffic class (in case it is desired to monitor
439	   each class separately), but in the second case a pair of
440	      counters is needed for each class.

442	   As mentioned, the flow-based measurement requires the identification
443	   of the flow to be monitored and the discovery of the path followed by
444	   the selected flow.  It is possible to monitor a single flow or
445	   multiple flows grouped together, but in this case measurement is
446	   consistent only if all the flows in the group follow the same path.
447	   Moreover, if a measurement is performed by grouping many flows, it is
448	   not possible to determine exactly which flow was affected by packet
449	   loss.  In order to have measures per single flow it is necessary to
450	   configure counters for each specific flow.  Once the flow(s) to be
451	   monitored have been identified, it is necessary to configure the
452	   monitoring on the proper nodes.  Configuring the monitoring means
453	   configuring the rule to intercept the traffic and configuring the
454	   counters to count the packets.  To have just an end-to-end
455	   monitoring, it is sufficient to enable the monitoring on the first
456	   and the last hop routers of the path: the mechanism is completely
457	   transparent to intermediate nodes and independent from the path
458	   followed by traffic flows.  On the contrary, to monitor the flow on a
459	   hop-by-hop basis along its whole path it is necessary to enable the
460	   monitoring on every node from the source to the destination.  In case
461	   the exact path followed by the flow is not known a priori (i.e. the
462	   flow has multiple paths to reach the destination) it is necessary to
463	   enable the monitoring system on every path: counters on interfaces
464	   traversed by the flow will report packet counts, while counters on
465	   other interfaces will be null.

467	3.1.1.  Coloring the packets

469	   The coloring operation is fundamental in order to create packet
470	   blocks.  This implies choosing where to activate the coloring and how
471	   to color the packets.

473	   In case of flow-based measurements, it is desirable, in general, to
474	   have a single coloring node because it is easier to manage and
475	   doesn't raise any risk of conflict (consider the case where two nodes
476	   color the same flow).  Thus it is advantageous to color the flow as
477	   close as possible to the source.  In addition, coloring a flow close
478	   to the source allows an end-to-end measure if a measurement point is
479	   enabled on the last-hop router as well.  The only requirement is that
480	   the coloring must change periodically and every node along the path
481	   must be able to identify unambiguously the colored packets.  For
482	   link-based measurements, all traffic needs to be colored when
483	   transmitted on the link.  If the traffic had already been colored,
484	   then it has to be re-colored because the color must be consistent on
485	   the link.  This means that each hop along the path must (re-)color
486	   the traffic; the color is not required to be consistent along
487	   different links.

489	   Traffic coloring can be implemented by setting a specific bit in the
490	   packet header and changing the value of that bit periodically.  How
491	   to choose the marking field depends on the application and is out of
492	   scope here.
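
   Purely as an illustration of marking with a single bit, the sketch
   below toggles one bit of a header byte.  The bit position (MARK_BIT)
   and the mapping of bit value 0 to color A are hypothetical; as stated
   above, the actual marking field depends on the application.

```python
MARK_BIT = 0x01  # hypothetical position of the marking bit in a header byte


def set_color(header_byte: int, color: str) -> int:
    """Return header_byte with the marking bit cleared for color A
    and set for color B; all other bits are left untouched."""
    if color == "B":
        return header_byte | MARK_BIT
    return header_byte & ~MARK_BIT & 0xFF


def get_color(header_byte: int) -> str:
    """Read the color back from the marking bit."""
    return "B" if header_byte & MARK_BIT else "A"
```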

494	3.1.2.  Counting the packets

496	   Assuming that the coloring of the packets is performed only by the
497	   source node, the nodes between source and destination (included) have
498	   to count the colored packets that they receive and forward: this
499	   operation can be enabled on every router along the path or only on a
500	   subset, depending on which network segment is being monitored (a
501	   single link, a particular metro area, the backbone, the whole path).

503	   Since the color switches periodically between two values, two
504	   counters (one for each value) are needed: one counter for packets
505	   with color A and one counter for packets with color B.  For each flow
506	   (or group of flows) being monitored and for every interface where the
507	   monitoring is active, a pair of counters is needed.  For example,
508	   in order to monitor separately 3 flows on a router with 4 interfaces
509	   involved, 24 counters are needed (2 counters for each of the 3 flows
510	   on each of the 4 interfaces).
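
   The counter budget of this example can be checked by enumerating one
   counter per (flow, interface, color) combination.  The flow and
   interface names below are placeholders.

```python
from itertools import product

flows = ["flow1", "flow2", "flow3"]         # placeholder flow identifiers
interfaces = ["if1", "if2", "if3", "if4"]   # placeholder interface names
colors = ["A", "B"]

# One counter, initially zero, for every combination.
counters = {key: 0 for key in product(flows, interfaces, colors)}
```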

512	   In case of link-based measurements the behaviour is similar except
513	   that coloring and counting operations are performed on a link by link
514	   basis at each endpoint of the link.

516	   Another important aspect to take into consideration is when to read
517	   the counters: in order to count the exact number of packets of a
518	   block the routers must perform this operation when that block has
519	   ended: in other words, the counter for color A must be read when the
520	   current block has color B, in order to be sure that the value of the
521	   counter is stable.  This task can be accomplished in two ways.  The
522	   general approach is to read the counters periodically, many
523	   times during a block duration, and to compare these successive
524	   readings: when the counter stops incrementing, the current
525	   block has ended and its value can be processed safely.
526	   Alternatively, if the coloring operation is performed on the basis of
527	   a fixed timer, it is possible to configure the reading of the
528	   counters according to that timer: for example, reading the counter
529	   for color A every period in the middle of the subsequent block with
530	   color B is a safe choice.  A sufficient margin should be considered
531	   between the end of a block and the reading of the counter, in order
532	   to take into account any out-of-order packets.
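
   Both reading strategies can be sketched as follows.  The function
   names, the "quiet_reads" threshold and the 0-based block index are
   assumptions made for illustration; a real deployment would also add
   the margin for out-of-order packets mentioned above.

```python
def stable_value(readings, quiet_reads=2):
    """General approach: return the counter value once it has stayed
    unchanged for `quiet_reads` consecutive comparisons, else None."""
    unchanged = 0
    for prev, cur in zip(readings, readings[1:]):
        unchanged = unchanged + 1 if cur == prev else 0
        if unchanged >= quiet_reads:
            return cur
    return None


def read_time_for_block(block_index, period, start=0.0):
    """Fixed-timer approach: read the counter of block `block_index`
    (0-based) in the middle of the subsequent block."""
    block_end = start + (block_index + 1) * period
    return block_end + period / 2


print(stable_value([10, 57, 98, 100, 100, 100]))
print(read_time_for_block(0, 300))
```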

534	3.1.3.  Collecting data and calculating packet loss

536	   The nodes enabled to perform performance monitoring collect the value
537	   of the counters, but they are not able to directly use this
538	   information to measure packet loss, because they only have their own
539	   samples.  For this reason, an external Network Management System
540	   (NMS) can be used to collect and process data and to perform packet
541	   loss calculation.  The NMS compares the values of counters from
542	   different nodes and can calculate if some packets were lost (even a
543	   single packet) and also where packets were lost.
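
   A minimal sketch of the comparison performed by the NMS is given
   here.  The node names and counter values are invented for the
   example; the counters are assumed to refer to the same block and
   color and to have been read once stable.

```python
def loss_per_segment(path, counts):
    """Return packets lost between each pair of consecutive nodes.

    path   -- node names ordered from source to destination
    counts -- node name -> counter value read for one block
    """
    return {
        (up, down): counts[up] - counts[down]
        for up, down in zip(path, path[1:])
    }


path = ["R1", "R2", "R3"]                           # hypothetical path
block_counts = {"R1": 1000, "R2": 1000, "R3": 997}  # hypothetical values
segments = loss_per_segment(path, block_counts)
end_to_end = block_counts[path[0]] - block_counts[path[-1]]
print(segments, end_to_end)
```

   In this example the loss is localized on the R2-R3 segment, which
   is how the NMS can tell where packets were lost.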

545	   The value of the counters needs to be transmitted to the NMS as soon
546	   as it has been read.  This can be accomplished by using SNMP or FTP
547	   and can be done in Push Mode or Polling Mode.  In the first case,
548	   each router periodically sends the information to the NMS, in the
549	   latter case it is the NMS that periodically polls routers to collect
550	   information.  In any case, the NMS has to collect all the relevant
551	   values from all the routers within one cycle of the timer.

553	   If link-based measurement is used, it would be possible to use a
554	   protocol to exchange values of counters between the two endpoints in
555	   order to let them perform the packet loss calculation for each
556	   traffic direction.  A similar approach could be also applied to a
557	   flow-based measurement.

559	3.2.  Timing aspects

561	   This document introduces two color switching methods: one is based
562	   on a fixed number of packets, the other on a fixed timer.  The
563	   timer-based method is preferable because it is more deterministic,
564	   and it is the one considered in the rest of the document.

566	   To account for the clock error between network devices R1 and R2,
567	   they must be synchronized to the same clock reference with an
568	   accuracy of +/- L/2 time units, where L is the time duration of the
569	   block.  Each router can then assign every colored packet to the
570	   right batch, because the minimum time distance between two packets
571	   of the same color but belonging to different batches is L time
572	   units.

574	   In practice, packets can also arrive out of order at batch
575	   boundaries, depending on the delay between measurement points.  This
576	   means that, without considering clock error, we wait L/2 after color
577	   switching to be sure to read a stable counter.

579	   In summary, we need to take into account two contributions: the
580	   clock error between network devices and the interval we need to wait
581	   to avoid out-of-order packets caused by network delay.

583	   The following figure explains both issues.

585	   ...BBBBBBBBB | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA | BBBBBBBBB...
586	                |<======================================>|
587	                |                   L                    |
588	   ...=========>|<==================><==================>|<==========...
589	                |       L/2                   L/2        |
590	                |<===>|                            |<===>|
591	                   d  |                            |   d
592	                      |<==========================>|
593	                       available counting interval

595	                         Figure 4: Timing aspects

597	   It is assumed that all network devices are synchronized to a common
598	   reference time with an accuracy of +/- A/2.  Thus, the difference
599	   between the clock values of any two network devices is bounded by A.

601	   The guardband d is given by:

603	   d = A + D_max - D_min,

605	   where A is the clock accuracy, D_max is an upper bound on the network
606	   delay between the network devices, and D_min is a lower bound on the
607	   delay.

609	   The available counting interval is L - 2d, which must be > 0.

611	   The condition that must be satisfied and is a requirement on the
612	   synchronization accuracy is:

614	   d < L/2.
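
   The guardband computation can be checked numerically with a short
   sketch.  The function name and the sample values (expressed in
   milliseconds) are assumptions chosen only to illustrate the
   formulas above.

```python
def counting_interval(L, A, d_max, d_min):
    """Return (d, available) where d = A + D_max - D_min and
    available = L - 2d; raise if the condition d < L/2 is violated."""
    d = A + d_max - d_min
    if d >= L / 2:
        raise ValueError("guardband too large: need d < L/2")
    return d, L - 2 * d


# Hypothetical values in ms: 1 s block, 10 ms clock accuracy,
# network delay between 5 ms and 50 ms.
d, available = counting_interval(L=1000, A=10, d_max=50, d_min=5)
print(d, available)
```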

616	3.3.  One-way delay measurement

618	   The same principle used to measure packet loss can be applied also to
619	   one-way delay measurement.  There are three alternatives, as
620	   described hereinafter.

622	3.3.1.  Single marking methodology

624	   The alternation of colors can be used as a time reference to
625	   calculate the delay.  Whenever the color changes (that means that a
626	   new block has started) a network device can store the timestamp of
627	   the first packet of the new block; that timestamp can be compared
628	   with the timestamp of the same packet on a second router to compute
629	   packet delay.  Considering Figure 2, R1 stores a timestamp TS(A1)R1
630	   when it sends the first packet of block 1 (A-colored), a timestamp
631	   TS(B2)R1 when it sends the first packet of block 2 (B-colored) and so
632	   on for every other block.  R2 performs the same operation on the
633	   receiving side, recording TS(A1)R2, TS(B2)R2 and so on.  Since the
634	   timestamps refer to specific packets (the first packet of each block)
635	   we are sure that timestamps compared to compute delay refer to the
636	   same packets.  By comparing TS(A1)R1 with TS(A1)R2 (and similarly
637	   TS(B2)R1 with TS(B2)R2 and so on) it is possible to measure the delay
638	   between R1 and R2.  In order to have more measurements, it is
639	   possible to take and store more timestamps, referring to other
640	   packets within each block.

642	   In order to coherently compare timestamps collected on different
643	   routers, the clocks on the network nodes must be in sync.
644	   Furthermore, a measurement is valid only if no packet loss occurs and
645	   if packet misordering can be avoided, otherwise the first packet of a
646	   block on R1 could be different from the first packet of the same
647	   block on R2 (e.g., if that packet is lost between R1 and R2 or it
648	   arrives after the next one).

650	   The following table shows how timestamps can be used to calculate the
651	   delay between R1 and R2.  The first column lists the sequence of
652	   blocks while other columns contain the timestamp referring to the
653	   first packet of each block on R1 and R2.  The delay is computed as a
654	   difference between timestamps.  For the sake of simplicity, all the
655	   values are expressed in milliseconds.

657	      +-------+---------+---------+---------+---------+-------------+
658	      | Block | TS(A)R1 | TS(B)R1 | TS(A)R2 | TS(B)R2 | Delay R1-R2 |
659	      +-------+---------+---------+---------+---------+-------------+
660	      | 1     | 12.483  | -       | 15.591  | -       | 3.108       |
661	      |       |         |         |         |         |             |
662	      | 2     | -       | 6.263   | -       | 9.288   | 3.025       |
663	      |       |         |         |         |         |             |
664	      | 3     | 27.556  | -       | 30.512  | -       | 2.956       |
665	      |       |         |         |         |         |             |
666	      | 4     | -       | 18.113  | -       | 21.269  | 3.156       |
667	      |       |         |         |         |         |             |
668	      | ...   | ...     | ...     | ...     | ...     | ...         |
669	      |       |         |         |         |         |             |
670	      | 2n-1  | 77.463  | -       | 80.501  | -       | 3.038       |
671	      |       |         |         |         |         |             |
672	      | 2n    | -       | 24.333  | -       | 27.433  | 3.100       |
673	      +-------+---------+---------+---------+---------+-------------+

675	         Table 2: Evaluation of timestamps for delay measurements

677	   The first row shows timestamps taken on R1 and R2 respectively and
678	   referring to the first packet of block 1 (which is A-colored).  Delay
679	   can be computed as a difference between the timestamp on R2 and the
680	   timestamp on R1.  Similarly, the second row shows timestamps (in
681	   milliseconds) taken on R1 and R2 and referring to the first packet of
682	   block 2 (which is B-colored).  Comparing timestamps taken on
683	   different nodes in the network and referring to the same packets
684	   (identified using the alternation of colors) it is possible to
685	   measure delay on different network segments.
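
   The computation behind the "Delay R1-R2" column can be reproduced
   with a few lines of Python.  The values are the first rows of
   Table 2; the use of Decimal (to avoid binary rounding) and the
   function name are choices made for this sketch.

```python
from decimal import Decimal

# (block, timestamp on R1, timestamp on R2) in milliseconds, Table 2
samples = [
    (1, Decimal("12.483"), Decimal("15.591")),
    (2, Decimal("6.263"), Decimal("9.288")),
    (3, Decimal("27.556"), Decimal("30.512")),
]


def one_way_delays(rows):
    """Delay per block = timestamp on R2 minus timestamp on R1."""
    return {block: ts_r2 - ts_r1 for block, ts_r1, ts_r2 in rows}


delays = one_way_delays(samples)
print(delays)
```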

687	   For the sake of simplicity, in the above example a single measurement
688	   is provided within a block, taking into account only the first packet
689	   of each block.  The number of measurements can be easily increased by
690	   considering multiple packets in the block: for instance, a timestamp
691	   could be taken every N packets, thus generating multiple delay
692	   measurements.  Taking this to the limit, in principle the delay could
693	   be measured for each packet, by taking and comparing the
694	   corresponding timestamps (possible but impractical from an
695	   implementation point of view).

697	3.3.1.1.  Mean delay

699	   As mentioned before, the method described above for measuring the
700	   delay is sensitive to out-of-order reception of packets.  In order to
701	   overcome this problem, a different approach has been considered: it
702	   is based on the concept of mean delay.  The mean delay is calculated
703	   by considering the average arrival time of the packets within a
704	   single block.  The network device locally stores a timestamp for each
705	   packet received within a single block: summing all the timestamps and
706	   dividing by the total number of packets received, the average arrival
707	   time for that block of packets can be calculated.  By subtracting the
708	   average arrival times of two adjacent devices it is possible to
709	   calculate the mean delay between those nodes.  When computing the
710	   mean delay, measurement error could be augmented by accumulating
711	   measurement error of a lot of packets.  This method is robust to out
712	   of order packets and also to packet loss (only a small error is
713	   introduced).  Moreover, it greatly reduces the number of timestamps
714	   (only one per block for each network device) that have to be
715	   collected by the management system.  On the other hand, it only gives
716	   one measure for the duration of the block (e.g., 5 minutes), and it
717	   doesn't give the minimum, maximum and median delay values (RFC 6703
718	   [RFC6703]).  This limitation could be overcome by reducing the
719	   duration of the block (e.g., from 5 minutes to a few seconds), which
720	   requires a highly optimized implementation of the method.

722	   By summing the mean delays of the two directions of a path, it is
723	   also possible to measure the two-way mean delay (round-trip delay).
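
   The mean delay computation can be sketched as follows.  The
   timestamps are invented for the example; the second call shows that
   the result does not depend on the order in which the packets of the
   block arrive.

```python
def mean_arrival(timestamps):
    """Average arrival time of the packets of one block."""
    return sum(timestamps) / len(timestamps)


def mean_delay(ts_upstream, ts_downstream):
    """Difference between the average arrival times at two nodes."""
    return mean_arrival(ts_downstream) - mean_arrival(ts_upstream)


sent = [0, 10, 20, 30]           # hypothetical timestamps on R1 (ms)
in_order = [3, 13, 23, 33]       # same packets seen on R2
reordered = [3, 23, 13, 33]      # two packets swapped on arrival
print(mean_delay(sent, in_order), mean_delay(sent, reordered))
```

   Only the sum of the timestamps and the packet count need to be
   kept per block, so a node does not have to store every timestamp.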

725	3.3.2.  Double marking methodology

727	   The Single marking methodology for one-way delay measurement is
728	   sensitive to out of order reception of packets.  The first approach
729	   to overcome this problem is described before and is based on the
730	   concept of mean delay.  But the limitation of mean delay is that it
731	   doesn't give information about the delay values distribution for the
732	   duration of the block.  Additionally it may be useful to have not
733	   only the mean delay but also the minimum, maximum and median delay
734	   values and, in wider terms, to know more about the statistical
735	   distribution of delay values.  So in order to have more information
736	   about the delay and to overcome out of order issues, a different
737	   approach can be introduced: it is based on double marking
738	   methodology.

740	   Basically, the idea is to use the first marking to create the
741	   alternate flow and, within this colored flow, a second marking to
742	   select the packets for measuring delay/jitter.  The first marking is
743	   needed for packet loss and mean delay measurement.  The second
744	   marking creates a new set of marked packets that are fully identified
745	   over the network, so that a network device can store the timestamps
746	   of these packets; these timestamps can be compared with the
747	   timestamps of the same packets on a second router to compute packet
748	   delay values for each packet.  The number of measurements can be
749	   easily increased by changing the frequency of the second marking.
750	   But the frequency of the second marking must not be too high, in
751	   order to avoid out-of-order issues.  Between packets with the second
752	   marking there should be a safety time gap (e.g., this gap could be,
753	   at the minimum, the mean network delay calculated with the previous
754	   methodology) to avoid out of order issues and also to have a number
755	   of measurement packets that is rate independent.  If a second marking
756	   packet is lost, the delay measurement for the considered block is
757	   corrupted and should be discarded.

759	   Mean delay is calculated on all the packets of a sample and is a
760	   simple computation to be performed for single marking method.  In
761	   some cases the mean delay measure is not sufficient to characterize
762	   the sample, and more statistics on the delay data are needed, e.g.,
763	   percentiles, variance and median delay values.  The conventional
764	   range (maximum-minimum) should be avoided for several reasons,
765	   including stability of the maximum delay due to the influence by
766	   outliers.  RFC 5481 [RFC5481] section 6.5 highlights how the 99.9th
767	   percentile of delay and delay variation is more helpful to
768	   performance planners.  To overcome this drawback the idea is to
769	   couple the mean delay measure for the entire batch with double
770	   marking method, where a subset of batch packets are selected for
771	   extensive delay calculation by using a second marking.  In this way
772	   it is possible to perform a detailed analysis on these double marked
773	   packets.  Please note that there are classic algorithms for median
774	   and variance calculation, but they are out of the scope of this
775	   document.  The comparison between the mean delay for the entire batch
776	   and the mean delay on these double marked packets gives useful
777	   information, since it shows whether the double marking measurements
778	   are actually representative of the delay trends.
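
   As an illustration only, the statistics on the double marked
   packets and the comparison with the batch mean could be computed
   with Python's standard "statistics" module.  The delay values, the
   tolerance and the function names are assumptions of this sketch,
   and the percentile uses the module's "inclusive" interpolation.

```python
import statistics


def delay_stats(delays):
    """Summary statistics for the delays of double marked packets."""
    return {
        "mean": statistics.fmean(delays),
        "median": statistics.median(delays),
        "variance": statistics.pvariance(delays),
        "p99": statistics.quantiles(delays, n=100, method="inclusive")[98],
    }


def is_representative(batch_mean, delays, tolerance):
    """True if the double marked mean is close to the batch mean."""
    return abs(statistics.fmean(delays) - batch_mean) <= tolerance


marked = [3.0, 3.1, 2.9, 3.2, 3.3]   # hypothetical delays in ms
stats = delay_stats(marked)
print(stats, is_representative(3.05, marked, tolerance=0.2))
```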

780	3.4.  Delay variation measurement

782	   Similarly to one-way delay measurement (both for single marking and
783	   double marking), the method can also be used to measure the inter-
784	   arrival jitter.  We refer to the definition in RFC 3393 [RFC3393].
785	   The alternation of colors, for single marking method, can be used as
786	   a time reference to measure delay variations.  In case of double
787	   marking, the time reference is given by the second marked packets.
788	   Considering the example depicted in Figure 2, R1 stores a timestamp
789	   TS(A)R1 whenever it sends the first packet of a block and R2 stores a
790	   timestamp TS(A)R2 whenever it receives the first packet of a block.
791	   The inter-arrival jitter can be easily derived from one-way delay
792	   measurement, by evaluating the delay variation of consecutive
793	   samples.

795	   The concept of mean delay can also be applied to delay variation, by
796	   evaluating the average variation of the interval between consecutive
797	   packets of the flow from R1 to R2.
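
   Deriving the delay variation from consecutive one-way delay samples
   can be sketched as follows, reusing the first four delay values of
   Table 2.  The function name is hypothetical, and the sign
   convention (later sample minus earlier sample) is an assumption of
   this sketch.

```python
from decimal import Decimal


def delay_variation(delays):
    """Difference between each delay sample and the previous one."""
    return [later - earlier for earlier, later in zip(delays, delays[1:])]


# One-way delays of blocks 1 to 4 in milliseconds (Table 2)
delays = [Decimal("3.108"), Decimal("3.025"),
          Decimal("2.956"), Decimal("3.156")]
variation = delay_variation(delays)
print(variation)
```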

799	4.  Considerations

801	   This section highlights some considerations about the methodology.

803	4.1.  Synchronization

805	   The Alternate Marking technique does not require a strong
806	   synchronization, especially for packet loss and two-way delay
807	   measurement.  Only one-way delay measurement requires network devices
808	   to have synchronized clocks.

810	   The color switching is the reference for all the network devices, and
811	   the only requirement to be achieved is that all network devices have
812	   to recognize the right batch along the path.

814	   If the length of the measurement period is L time units, then all
815	   network devices must be synchronized to the same clock reference with
816	   an accuracy of +/- L/2 time units (without considering network
817	   delay).  This level of accuracy guarantees that all network devices
818	   consistently match the color bit to the correct block.  For example,
819	   if the color is toggled every second (L = 1 second), then clocks
820	   must be synchronized with an accuracy of +/- 0.5 second to a common
821	   time reference.

823	   This synchronization requirement can be satisfied even with a
824	   relatively inaccurate synchronization method.  This is true for
825	   packet loss and two-way delay measurement; for one-way delay
826	   measurement, instead, clock synchronization must be accurate.

828	   Therefore, a system that uses only packet loss and two-way delay
829	   measurement does not require accurate synchronization.  This is
830	   because the value of the clocks of network devices does not affect
831	   the computation of the two-way delay measurement.

833	4.2.  Data Correlation

835	   Data Correlation is the mechanism to compare counters and timestamps
836	   for packet loss, delay and delay variation calculation.  It could be
837	   performed in several ways depending on the alternate marking
838	   application and use case.

840	   o  A possibility is to use a centralized solution using Network
841	      Management System (NMS) to correlate data;

843	   o  Another possibility is to define a protocol-based distributed
844	      solution, by defining a new protocol or by extending the existing
845	      protocols (e.g.  RFC6374, TWAMP, OWAMP) in order to communicate
846	      the counters and timestamps between nodes.

848	   In the following paragraphs an example data correlation mechanism is
849	   explained; it can be used independently of the adopted solution.

851	   When data is collected on the upstream and downstream node, e.g.,
852	   packet counts for packet loss measurement or timestamps for packet
853	   delay measurement, and periodically reported to or pulled by other
854	   nodes or NMS, a certain data correlation mechanism SHOULD be in use
855	   to help the nodes or NMS to tell whether any two or more packet
856	   counts are related to the same block of markers, or any two
857	   timestamps are related to the same marked packet.

859	   The alternate marking method described in this document splits the
860	   packets of the measured flow into different measurement blocks; in
861	   addition, a Block Number (BN) could be assigned to each such
862	   measurement block.  The BN is generated each time a node reads the
863	   data (packet counts or timestamps), and is associated with each
864	   packet count and timestamp reported to or pulled by other nodes or
865	   NMS.  The value of BN could be calculated as the integer quotient of
866	   the local time (when the data are read) divided by the interval of
867	   the marking time period.

869	   When the nodes or NMS see, for example, the same BN associated with
870	   two packet counts from an upstream and a downstream node, they
871	   consider that these two packet counts correspond to the same block,
872	   i.e. that these two packet counts belong to the same block of
873	   markers from the upstream and downstream node.  The assumption of
874	   this BN mechanism is that the measurement nodes are time
875	   synchronized.  This requires the measurement nodes to have a certain
876	   time synchronization capability (e.g., the Network Time Protocol
877	   (NTP) RFC 5905 [RFC5905], or the IEEE 1588 Precision Time Protocol
878	   (PTP) [IEEE-1588]).  Synchronization aspects are further discussed in
879	   Section 4.1.
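
   A possible reading of the BN mechanism is sketched here.  It
   assumes that the BN is obtained by integer division of the local
   read time by the marking period, so that all reads falling in the
   same period share one BN; this formula, the report layout and the
   sample values are assumptions of the sketch.

```python
def block_number(read_time, period):
    """Index of the marking period in which the data were read."""
    return int(read_time // period)


def correlate(upstream_reports, downstream_reports):
    """Pair the packet counts that carry the same BN.

    Each report list holds (bn, packet_count) tuples."""
    down = dict(downstream_reports)
    return {
        bn: (count, down[bn])
        for bn, count in upstream_reports
        if bn in down
    }


period = 300  # seconds, hypothetical marking period
up = [(block_number(450.2, period), 1000),
      (block_number(750.1, period), 1200)]
dn = [(block_number(451.0, period), 998),
      (block_number(749.8, period), 1200)]
pairs = correlate(up, dn)
print(pairs)
```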

881	4.3.  Packet Re-ordering

883	   Due to ECMP, packet re-ordering is very common in IP networks.  The
884	   accuracy of marking-based PM, especially packet loss measurement, may
885	   be affected by packet re-ordering.  Take a look at the following
886	   example:

888	   Block   :    1    |    2    |    3    |    4    |    5    |...
889	   --------|---------|---------|---------|---------|---------|---
890	   Node R1 : AAAAAAA | BBBBBBB | AAAAAAA | BBBBBBB | AAAAAAA |...
891	   Node R2 : AAAAABB | AABBBBA | AAABAAA | BBBBBBA | ABAAABA |...

893	                        Figure 5: Packet Reordering

895	   In Figure 5 the packet stream for Node R1 isn't being reordered, and
896	   can be safely assigned to interval blocks, but the packet stream for
897	   Node R2 is being reordered, so, looking at the packet with the marker
898	   of "B" in block 3, there is no safe way to tell whether the packet
899	   belongs to block 2 or block 4.

901	   In general there is the need to assign packets with the marker of "B"
902	   or "A" to the right interval blocks.  Most packet re-ordering occurs
903	   at the edge of adjacent blocks, and it is easy to handle if the
904	   interval of each block is sufficiently large.  It can then be assumed
905	   that packets with a different marker belong to the block that they
906	   are closest to.  If the interval is small, it is difficult and
907	   sometimes impossible to determine to which block a packet belongs.
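
   The "closest block" rule can be sketched as follows for the
   fixed-timer case.  The sketch assumes that time is measured from
   the start of block 0, that blocks are numbered from 0 and that
   even-numbered blocks carry the first color; all names are
   hypothetical.

```python
def assign_block(arrival_time, color, period, first_color="A"):
    """Return the index of the block with a matching color that is
    closest in time to the packet arrival."""
    current = int(arrival_time // period)
    block_is_first = current % 2 == 0
    packet_is_first = color == first_color
    if block_is_first == packet_is_first:
        return current            # color matches the current block
    if current == 0:
        return 1                  # no earlier block exists
    # Mismatch: pick the previous or next block, whichever is closer.
    offset = arrival_time - current * period
    return current - 1 if offset < period / 2 else current + 1


# A "B" packet seen 1 time unit into an A-colored block (index 2)
print(assign_block(21, "B", period=10))
# A "B" packet seen 9 time units into the same block
print(assign_block(29, "B", period=10))
```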

909	   Choosing a proper interval is important, but how to choose it is out
910	   of the scope of this document.  An implementation SHOULD nevertheless
911	   provide a way to configure the interval and allow a certain degree of
912	   packet re-ordering.

914	5.  Implementation and deployment

916	   The methodology described in the previous sections can be applied in
917	   various situations.  Basically, the Alternate Marking technique could
918	   be used in many cases for performance measurement.  The only
919	   requirement is to select and mark the flow to be monitored; in this
920	   way packets are batched by the sender and each batch is alternately
921	   marked so that it can be easily recognized by the receiver.

923	   An example of implementation and deployment is explained in the next
924	   section, just to clarify how the method can work.

926	5.1.  Report on the operational experiment at Telecom Italia

928	   The method described in this document, also called PNPM (Packet
929	   Network Performance Monitoring), has been invented and engineered in
930	   Telecom Italia and it's currently being used in Telecom Italia's
931	   network.  The methodology has been applied by leveraging functions
932	   and tools available on IP routers and it's currently being used to
933	   monitor packet loss in some portions of Telecom Italia's network.
934	   The application of the method to delay measurement is currently being
935	   evaluated in Telecom Italia's labs.  This section describes how the
936	   features currently available on existing routing platforms can be
937	   used to apply the method, in order to give an example of
938	   implementation and deployment.

940	   The current implementation in Telecom Italia uses the flow-based
941	   strategy, as defined in section 3.  The link-based strategy could be
942	   applied to a physical link or a logical link (e.g., an Ethernet VLAN
943	   or an MPLS PW).

945	   The method is applied in Telecom Italia's network to multicast IPTV
946	   channels or other specific traffic flows with high QoS requirements
947	   (i.e.  Mobile Backhauling traffic implemented with a VPN MPLS).

949	   The implementation of the method by a Service Provider needs to use
950	   the router features.  With current router implementations, only QoS
951	   related fields and features offer the required flexibility to set
952	   bits in the packet header.  In case a Service Provider only uses the
953	   three most significant bits of the DSCP field (corresponding to IP
954	   Precedence) for QoS classification and queuing, it is possible to use
955	   the two least significant bits of the DSCP field (bit 0 and bit 1) to
956	   implement the method without affecting QoS policies.  One of the two
957	   bits (bit 0) could be used to identify flows subject to traffic
958	   monitoring (set to 1 if the flow is under monitoring, otherwise it is
959	   set to 0), while the second (bit 1) can be used for coloring the
960	   traffic (switching between values 0 and 1, corresponding to color A
961	   and B) and creating the blocks.
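
   The bit usage described above can be illustrated with a small
   sketch.  It assumes that "bit 0" denotes the least significant bit
   of the 6-bit DSCP value; the function names and the sample DSCP
   value are hypothetical, and on a router this logic is expressed
   through QoS policies rather than code.

```python
MONITOR_BIT = 0b000001  # DSCP bit 0: flow is under monitoring
COLOR_BIT = 0b000010    # DSCP bit 1: 0 = color A, 1 = color B


def mark(dscp, color):
    """Flag the packet as monitored and set its color bit."""
    dscp |= MONITOR_BIT
    if color == "B":
        dscp |= COLOR_BIT
    else:
        dscp &= ~COLOR_BIT
    return dscp & 0x3F  # DSCP is a 6-bit field


def classify(dscp):
    """Return "A" or "B" for monitored packets, None otherwise."""
    if not dscp & MONITOR_BIT:
        return None
    return "B" if dscp & COLOR_BIT else "A"


base = 0b101000  # hypothetical class using only the 3 upper bits
print(mark(base, "A"), mark(base, "B"), classify(base))
```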

963	   In practice, coloring the traffic using the DSCP field can be
964	   implemented by configuring on the router output interface an access
965	   list that intercepts the flow(s) to be monitored and applies to them
966	   a policy that sets the DSCP field accordingly.  Since traffic
967	   coloring has to be switched between the two values over time, the
968	   policy needs to be modified periodically: an automatic script is used
969	   to perform this task on the basis of a fixed timer.

971	   In Telecom Italia's implementation the timer is set to 5 minutes:
972	   this value has proven to be a good compromise between measurement
973	   frequency and stability of the measurement (i.e. possibility to
974	   collect all the measures referring to the same block).
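
   The decision taken by such a timer-driven script can be sketched
   as a pure function of time.  The function name, the "start"
   reference and the choice of color A for the first block are
   assumptions of this sketch; the 300-second default corresponds to
   the 5-minute timer.

```python
def color_at(now, period=300.0, start=0.0):
    """Return the color ("A" or "B") to apply at time `now`,
    with times expressed in seconds."""
    block = int((now - start) // period)
    return "A" if block % 2 == 0 else "B"


print(color_at(0), color_at(299.9), color_at(300), color_at(600))
```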

976	   If traffic is colored using the DSCP field, an access-list that
977	   matches specific DSCP values can be used to count the packets of the
978	   flow(s) being monitored.  The access-list is installed on all the
979	   routers of the path.  Also, a 5-minute timer for color switching is
980	   a safe choice for reading the counters.

982	   The counters are collected by using an automatic script that sends
983	   them to a Network Management System (NMS).  The NMS is
984	   responsible for packet loss calculation, performed by comparing the
985	   values of counters from the routers along the flow(s) path.

987	5.1.1.  Metric transparency

989	   Since a Service Provider application is described here, the method
990	   can be applied to end-to-end services supplied to Customers.  So it
991	   is important to highlight that the method SHOULD be transparent
992	   outside the Service Provider domain.

994	   In Telecom Italia's implementation the source node colors the packets
995	   with a policy that is modified periodically via an automatic script
996	   in order to alternate the DSCP field of the packets.  The nodes
997	   between source and destination (included) have to count with an
998	   access-list the colored packets that they receive and forward.

1000	   Moreover the destination node has an important role: the colored
1001	   packets are intercepted and a policy restores and sets the DSCP field
1002	   of all the packets to the initial value.  In this way the metric is
1003	   transparent because outside the section of the network under
1004	   monitoring the traffic flow is unchanged.
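
   The restoring step at the destination node can be sketched as a
   simple mask operation.  The sketch assumes that the two marking
   bits are the two least significant bits of the DSCP value and that
   they were both 0 before marking; the names are hypothetical.

```python
MARKING_BITS = 0b000011  # DSCP bit 0 (monitoring flag) and bit 1 (color)


def restore_dscp(dscp):
    """Clear the two marking bits, leaving the upper bits untouched."""
    return dscp & ~MARKING_BITS & 0x3F


print(restore_dscp(0b101001), restore_dscp(0b101011))
```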

1006	   In such a case, thanks to this restoring technique, network elements
1007	   outside the Alternate Marking monitoring domain (e.g. the two
1008	   Provider Edge nodes of the Mobile Backhauling VPN MPLS) are totally
1009	   unaware that packets were marked.  So this restoring technique makes
1010	   Alternate Marking completely transparent outside its monitoring
1011	   domain.

1013	5.2.  IP flow performance measurement (IPFPM)

1015	   This application of marking method is described in
1016	   [I-D.chen-ippm-coloring-based-ipfpm-framework].

1018	5.3.  OAM Passive Performance Measurement

1020	   In [I-D.ietf-bier-mpls-encapsulation] two OAM bits from Bit Index
1021	   Explicit Replication (BIER) Header are reserved for the passive
1022	   performance measurement marking method.  [I-D.ietf-bier-pmmm-oam]
1023	   details the measurement for multicast service over BIER domain.

1025	   In addition, the alternate marking method could also be used in a
1026	   Service Function Chaining (SFC) domain.

1028	   The application of the marking method to Network Virtualization
1029	   Overlays (NVO3) protocols is a work in progress (see
1030	   [I-D.ietf-nvo3-encap]).

1032	5.4.  RFC6374 Use Case

1034	   RFC6374 [RFC6374] uses the LM packet as the packet accounting
1035	   demarcation point.  Unfortunately this gives rise to a number of
1036	   problems that may lead to significant packet accounting errors in
1037	   certain situations.  [I-D.ietf-mpls-flow-ident] discusses the desired
1038	   capabilities for MPLS flow identification in order to perform a
1039	   better in-band performance monitoring of user data packets.  A method
1040	   of accomplishing identification is Synonymous Flow Labels (SFL)
1041	   introduced in [I-D.bryant-mpls-sfl-framework], while
1042	   [I-D.ietf-mpls-rfc6374-sfl] describes RFC6374 performance
1043	   measurements with SFL.

1045	5.5.  Application to active performance measurement

1047	   [I-D.fioccola-ippm-alt-mark-active] describes how to extend the
1048	   existing Active Measurement Protocol, in order to implement alternate
1049	   marking methodology.  [I-D.fioccola-ippm-rfc6812-alt-mark-ext]
1050	   describes an extension to the Cisco SLA Protocol Measurement-Type
1051	   UDP-Measurement.

1053	6.  Hybrid measurement

1055	   The method has been explicitly designed for passive measurements but
1056	   it can also be used with active measurements.  In order to have both
1057	   end to end measurements and intermediate measurements (hybrid
1058	   measurements) two end points can exchange artificial traffic flows
1059	   and apply alternate marking over these flows.  In the intermediate
1060	   points artificial traffic is managed in the same way as real traffic
1061	   and measured as specified before.  So the application of the marking
1062	   method can also simplify active measurement, as explained in
1063	   [I-D.fioccola-ippm-alt-mark-active].

1065	7.  Summary

1067	   The advantages of the method described in this document are:

1069	   o  easy implementation: it can be implemented using features already
1070	      available on major routing platforms;

1072	   o  low computational effort: the additional load on processing is
1073	      negligible;

1075	   o  accurate packet loss measurement: single packet loss granularity
1076	      is achieved with a passive measurement;

1078	   o  potential applicability to any kind of packet/frame-based
1079	      traffic: Ethernet, IP, MPLS, etc., both unicast and multicast;

1081	   o  robustness: the method can tolerate out of order packets and it's
1082	      not based on "special" packets whose loss could have a negative
1083	      impact;

1085	   o  no interoperability issues: the features required to implement the
1086	      method are available on all current routing platforms.

1088	   The method doesn't raise any specific need for protocol extension,
1089	   but it could be further improved by means of some extension to
1090	   existing protocols.  Specifically, the use of DiffServ bits for
1091	   coloring the packets might not be a viable solution in some cases: a
1092	   standard method to color the packets for this specific application
1093	   could be beneficial.

1095	8.  Compliance with RFC6390 guidelines

1097	   RFC6390 [RFC6390] defines a framework and a process for developing
1098	   Performance Metrics for protocols above and below the IP layer (such
1099	   as IP-based applications that operate over reliable or datagram
1100	   transport protocols).

1102	   This document doesn't aim to propose a new Performance Metric but a
1103	   new method of measurement for a few Performance Metrics that have
1104	   already been standardized.  Nevertheless, it's worth applying
1105	   [RFC6390] guidelines to the present document, in order to provide a
1106	   more complete and coherent description of the proposed method.  We
1107	   used a subset of the Performance Metric Definition template defined
1108	   by [RFC6390].

1110	   o  Metric name and description: as already stated, this document
1111	      doesn't propose any new Performance Metric.  On the contrary, it
1112	      describes a novel method for measuring packet loss [RFC7680].  The
1113	      same concept, with small differences, can also be used to measure
1114	      delay [RFC7679], and jitter [RFC3393].  The document mainly
1115	      describes the applicability to packet loss measurement.

1117	   o  Method of Measurement or Calculation: according to the method
1118	      described in the previous sections, the number of packets lost is
1119	      calculated by subtracting the value of the counter on the
1120	      destination node from the value of the counter on the source
1121	      node.  Both counters must refer to the same color.  The
1122	      calculation is performed when the counters are in a steady state.

1124	   o  Units of Measurement: the method calculates and reports the exact
1125	      number of packets sent by the source node and not received by the
1126	      destination node.

1128	   o  Measurement Points: the measurement can be performed between
1129	      adjacent nodes, on a per-link basis, or along a multi-hop path,
1130	      provided that the traffic under measurement follows that path.  In
1131	      case of a multi-hop path, the measurements can be performed both
1132	      end-to-end and hop-by-hop.

1134	   o  Measurement Timing: the method has a constraint on the frequency
1135	      of measurements.  To perform a measurement, the counter must
1136	      be in a steady state: this happens when the traffic is being
1137	      colored with the alternate color; for example in the Telecom
1138	      Italia application of the method the time interval is set to 5
1139	      minutes.

1141	   o  Implementation: the Telecom Italia application of the method uses
1142	      two encodings of the DSCP field to color the packets; this enables
1143	      the use of policy configurations on the router to color the
1144	      packets and accordingly configure the counter for each color.  The
1145	      path followed by traffic being measured should be known in advance
1146	      in order to configure the counters along the path and be able to
1147	      compare the correct values.

1149	   o  Use and Applications: the method can be used to measure packet
1150	      loss with high precision on live traffic; moreover, by combining
1151	      end-to-end and per-link measurements, the method is useful to
1152	      pinpoint the single link that is experiencing loss events.

1154	   o  Reporting Model: the values of the counters have to be sent to a
1155	      centralized management system that performs the calculations; such
1156	      samples must contain a reference to the time interval they refer
1157	      to, so that the management system can perform the correct
1158	      correlation; the samples have to be sent while the corresponding
1159	      counter is in a steady state (within a time interval), otherwise
1160	      the value of the sample should be stored locally.

1162	   o  Dependencies: the values of the counters have to be correlated to
1163	      the time interval they refer to; moreover, since the Telecom
1164	      Italia application of the method is based on DSCP values, there
1165	      are significant dependencies on the usage of the DSCP field: it
1166	      must be possible to rely on unused DSCP values without affecting
1167	      QoS-related configuration and behavior; moreover, the intermediate
1168	      nodes must not change the value of the DSCP field, so as not to
1169	      alter the measurement.

1171	   o  Organization of Results: the method of measurement produces
1172	      singletons.

1174	   o  Parameters: currently, the main parameter of the method is the
1175	      time interval used to alternate the colors and read the counters.
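
   As an illustration of the Measurement Timing, Method of Measurement
   and Reporting Model items above, the following minimal Python sketch
   shows how a management system might select a steady counter and
   compute the loss for one block.  The names block_index, color_of,
   Sample and packet_loss, the color labels "A" and "B", and the sample
   layout are assumptions made for this example; they are not defined by
   this document.

```python
from dataclasses import dataclass

INTERVAL_S = 300  # marking interval in seconds (5 minutes, as in the text)


def block_index(t_s: int, interval_s: int = INTERVAL_S) -> int:
    """Index of the marking interval that contains time t_s (seconds)."""
    return t_s // interval_s


def color_of(index: int) -> str:
    """Colors alternate between consecutive intervals (labels assumed)."""
    return "A" if index % 2 == 0 else "B"


@dataclass(frozen=True)
class Sample:
    """One counter reading sent to the management system (assumed layout)."""
    interval: int  # marking interval the counter refers to
    color: str     # color counted during that interval
    packets: int   # counter value, read while the counter was steady


def packet_loss(source: Sample, destination: Sample) -> int:
    """Packets sent by the source and not received by the destination."""
    # Both samples must refer to the same interval and the same color.
    if (source.interval, source.color) != (destination.interval, destination.color):
        raise ValueError("samples refer to different blocks")
    return source.packets - destination.packets


# At t = 650 s interval 2 is being colored, so the counters of
# interval 1 are in a steady state and can be read and compared.
previous = block_index(650) - 1
src = Sample(previous, color_of(previous), packets=10_000)
dst = Sample(previous, color_of(previous), packets=9_997)
loss = packet_loss(src, dst)
print(loss)  # 3
```

   Tagging each sample with its interval lets the management system
   reject a comparison between counters that refer to different blocks,
   which is the correlation the Reporting Model item asks for.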

1177	9.  Security Considerations

1179	   This document specifies a method to perform measurements in the
1180	   context of a Service Provider's network and has not been developed to
1181	   conduct Internet measurements, so it does not directly affect
1182	   Internet security nor applications which run on the Internet.
1183	   However, implementation of this method must be mindful of security
1184	   and privacy concerns.

1186	   There are two types of security concerns: potential harm caused by
1187	   the measurements and potential harm to the measurements.

1189	   o  Harm caused by the measurement: the measurements described in this
1190	      document are passive, so there are no new packets injected into
1191	      the network causing potential harm to the network itself and to
1192	      data traffic.  Nevertheless, the method implies modifications on
1193	      the fly to the IP header of data packets: this must be performed
1194	      in a way that doesn't alter the quality of service experienced by
1195	      packets subject to measurements and that preserves stability and
1196	      performance of routers doing the measurements.  One of the main
1197	      security threats in OAM protocols is network reconnaissance; an
1198	      attacker can gather information about the network performance by
1199	      passively eavesdropping on OAM messages.  The advantage of the
1200	      methods described in this document is that the marking bits are
1201	      the only information that is exchanged between the network
1202	      devices.  Therefore, passive eavesdropping on data plane traffic
1203	      does not allow attackers to gain information about the network
1204	      performance.

1206	   o  Harm to the measurement: the measurements could be harmed by
1207	      routers altering the marking of the packets, or by an attacker
1208	      injecting artificial traffic.  Authentication techniques, such as
1209	      digital signatures, may be used where appropriate to guard against
1210	      injected traffic attacks.  Since the measurement itself may be
1211	      affected by routers (or other network devices) along the path of
1212	      IP packets intentionally altering the value of marking bits of
1213	      packets, as mentioned above, the mechanism specified in this
1214	      document can be applied only in the context of a controlled
1215	      domain, and thus the routers (or other network devices) are
1216	      locally administered and this type of attack can be avoided.  In
1217	      addition, an attacker can't gain information about network
1218	      performance from a single monitoring point, and must use
1219	      synchronized monitoring points at multiple points on the path,
1220	      because they have to do the same kind of measurement and
1221	      aggregation that Service Providers using Alternate Marking must
1222	      do.

1224	   The privacy concerns of network measurement are limited because the
1225	   method only relies on information contained in the IP header without
1226	   any release of user data.

1228	   Delay attacks are another potential threat in the context of this
1229	   document.  Delay measurement is performed using a specific packet in
1230	   each block, marked by a dedicated color bit.  Therefore, a man-in-
1231	   the-middle attacker can selectively induce synthetic delay only to
1232	   delay-colored packets, causing systematic error in the delay
1233	   measurements.  As discussed in previous sections, the methods
1234	   described in this document rely on an underlying time synchronization
1235	   protocol.  Thus, by attacking the time protocol an attacker can
1236	   potentially compromise the integrity of the measurement.  A detailed
1237	   discussion about the threats against time protocols and how to
1238	   mitigate them is presented in RFC 7384 [RFC7384].
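
   The systematic error caused by a delay attack can be illustrated with
   a short Python sketch.  The timestamps, the injected delay and the
   function name one_way_delay_us are hypothetical values and names
   chosen for this example; clocks are assumed to be synchronized.

```python
def one_way_delay_us(tx_us: int, rx_us: int) -> int:
    """Delay of the delay-colored packet of one block, in microseconds."""
    return rx_us - tx_us


# Hypothetical (tx, rx) timestamps of the delay-colored packet in three blocks.
blocks = [(0, 12_000), (300_000_000, 300_011_500), (600_000_000, 600_012_400)]
INJECTED_US = 5_000  # synthetic delay added by the attacker (assumed value)

true_delays = [one_way_delay_us(tx, rx) for tx, rx in blocks]
observed_delays = [one_way_delay_us(tx, rx + INJECTED_US) for tx, rx in blocks]

# Every sample is shifted by the same amount: a systematic error.
errors = [o - t for o, t in zip(observed_delays, true_delays)]
print(errors)  # [5000, 5000, 5000]
```

   Because only the delay-colored packets are affected, the packet
   counters used for loss measurement stay unchanged in this scenario,
   so the bias would not be visible in the loss results.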

1240	10.  IANA Considerations

1242	   There are no IANA actions required.

1244	11.  Acknowledgements

1246	   The previous IETF drafts about this technique were:
1247	   [I-D.cociglio-mboned-multicast-pm] and [I-D.tempia-opsawg-p3m].

1249	   The authors would like to thank Alberto Tempia Bonda, Domenico
1250	   Laforgia, Daniele Accetta and Mario Bianchetti for their contribution
1251	   to the definition and the implementation of the method.

1253	12.  References

1255	12.1.  Normative References

1257	   [IEEE-1588]
1258	              IEEE 1588-2008, "IEEE Standard for a Precision Clock
1259	              Synchronization Protocol for Networked Measurement and
1260	              Control Systems", July 2008.

1262	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1263	              Requirement Levels", BCP 14, RFC 2119,
1264	              DOI 10.17487/RFC2119, March 1997,
1265	              <https://www.rfc-editor.org/info/rfc2119>.

1267	   [RFC3393]  Demichelis, C. and P. Chimento, "IP Packet Delay Variation
1268	              Metric for IP Performance Metrics (IPPM)", RFC 3393,
1269	              DOI 10.17487/RFC3393, November 2002,
1270	              <https://www.rfc-editor.org/info/rfc3393>.

1272	   [RFC5905]  Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch,
1273	              "Network Time Protocol Version 4: Protocol and Algorithms
1274	              Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010,
1275	              <https://www.rfc-editor.org/info/rfc5905>.

1277	   [RFC7679]  Almes, G., Kalidindi, S., Zekauskas, M., and A. Morton,
1278	              Ed., "A One-Way Delay Metric for IP Performance Metrics
1279	              (IPPM)", STD 81, RFC 7679, DOI 10.17487/RFC7679, January
1280	              2016, <https://www.rfc-editor.org/info/rfc7679>.

1282	   [RFC7680]  Almes, G., Kalidindi, S., Zekauskas, M., and A. Morton,
1283	              Ed., "A One-Way Loss Metric for IP Performance Metrics
1284	              (IPPM)", STD 82, RFC 7680, DOI 10.17487/RFC7680, January
1285	              2016, <https://www.rfc-editor.org/info/rfc7680>.

1287	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
1288	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
1289	              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

1291	12.2.  Informative References

1293	   [I-D.bryant-mpls-sfl-framework]
1294	              Bryant, S., Chen, M., Li, Z., Swallow, G., Sivabalan, S.,
1295	              and G. Mirsky, "Synonymous Flow Label Framework", draft-
1296	              bryant-mpls-sfl-framework-05 (work in progress), June
1297	              2017.

1299	   [I-D.chen-ippm-coloring-based-ipfpm-framework]
1300	              Chen, M., Zheng, L., Mirsky, G., Fioccola, G., and T.
1301	              Mizrahi, "IP Flow Performance Measurement Framework",
1302	              draft-chen-ippm-coloring-based-ipfpm-framework-06 (work in
1303	              progress), March 2016.

1305	   [I-D.cociglio-mboned-multicast-pm]
1306	              Cociglio, M., Capello, A., Bonda, A., and L. Castaldelli,
1307	              "A method for IP multicast performance monitoring", draft-
1308	              cociglio-mboned-multicast-pm-01 (work in progress),
1309	              October 2010.

1311	   [I-D.fioccola-ippm-alt-mark-active]
1312	              Fioccola, G., Clemm, A., Bryant, S., Cociglio, M.,
1313	              Chandramouli, M., and A. Capello, "Alternate Marking
1314	              Extension to Active Measurement Protocol", draft-fioccola-
1315	              ippm-alt-mark-active-01 (work in progress), March 2017.

1317	   [I-D.fioccola-ippm-rfc6812-alt-mark-ext]
1318	              Fioccola, G., Clemm, A., Cociglio, M., Chandramouli, M.,
1319	              and A. Capello, "Alternate Marking Extension to Cisco SLA
1320	              Protocol RFC6812", draft-fioccola-ippm-rfc6812-alt-mark-
1321	              ext-01 (work in progress), March 2016.

1323	   [I-D.ietf-bier-mpls-encapsulation]
1324	              Wijnands, I., Rosen, E., Dolganow, A., Tantsura, J.,
1325	              Aldrin, S., and I. Meilik, "Encapsulation for Bit Index
1326	              Explicit Replication in MPLS and non-MPLS Networks",
1327	              draft-ietf-bier-mpls-encapsulation-07 (work in progress),
1328	              June 2017.

1330	   [I-D.ietf-bier-pmmm-oam]
1331	              Mirsky, G., Zheng, L., Chen, M., and G. Fioccola,
1332	              "Performance Measurement (PM) with Marking Method in Bit
1333	              Index Explicit Replication (BIER) Layer", draft-ietf-bier-
1334	              pmmm-oam-02 (work in progress), July 2017.

1336	   [I-D.ietf-mpls-flow-ident]
1337	              Bryant, S., Pignataro, C., Chen, M., Li, Z., and G.
1338	              Mirsky, "MPLS Flow Identification Considerations", draft-
1339	              ietf-mpls-flow-ident-05 (work in progress), July 2017.

1341	   [I-D.ietf-mpls-rfc6374-sfl]
1342	              Bryant, S., Chen, M., Li, Z., Swallow, G., Sivabalan, S.,
1343	              Mirsky, G., and G. Fioccola, "RFC6374 Synonymous Flow
1344	              Labels", draft-ietf-mpls-rfc6374-sfl-00 (work in
1345	              progress), June 2017.

1347	   [I-D.ietf-nvo3-encap]
1348	              Boutros, S., Ganga, I., Garg, P., Manur, R., Mizrahi, T.,
1349	              Mozes, D., and E. Nordmark, "NVO3 Encapsulation
1350	              Considerations", draft-ietf-nvo3-encap-00 (work in
1351	              progress), June 2017.

1353	   [I-D.tempia-opsawg-p3m]
1354	              Capello, A., Cociglio, M., Castaldelli, L., and A. Bonda,
1355	              "A packet based method for passive performance
1356	              monitoring", draft-tempia-opsawg-p3m-04 (work in
1357	              progress), February 2014.

1359	   [RFC5481]  Morton, A. and B. Claise, "Packet Delay Variation
1360	              Applicability Statement", RFC 5481, DOI 10.17487/RFC5481,
1361	              March 2009, <https://www.rfc-editor.org/info/rfc5481>.

1363	   [RFC6374]  Frost, D. and S. Bryant, "Packet Loss and Delay
1364	              Measurement for MPLS Networks", RFC 6374,
1365	              DOI 10.17487/RFC6374, September 2011,
1366	              <https://www.rfc-editor.org/info/rfc6374>.

1368	   [RFC6390]  Clark, A. and B. Claise, "Guidelines for Considering New
1369	              Performance Metric Development", BCP 170, RFC 6390,
1370	              DOI 10.17487/RFC6390, October 2011,
1371	              <https://www.rfc-editor.org/info/rfc6390>.

1373	   [RFC6703]  Morton, A., Ramachandran, G., and G. Maguluri, "Reporting
1374	              IP Network Performance Metrics: Different Points of View",
1375	              RFC 6703, DOI 10.17487/RFC6703, August 2012,
1376	              <https://www.rfc-editor.org/info/rfc6703>.

1378	   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
1379	              Weingarten, "An Overview of Operations, Administration,
1380	              and Maintenance (OAM) Tools", RFC 7276,
1381	              DOI 10.17487/RFC7276, June 2014,
1382	              <https://www.rfc-editor.org/info/rfc7276>.

1384	   [RFC7384]  Mizrahi, T., "Security Requirements of Time Protocols in
1385	              Packet Switched Networks", RFC 7384, DOI 10.17487/RFC7384,
1386	              October 2014, <https://www.rfc-editor.org/info/rfc7384>.

1388	   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with
1389	              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
1390	              May 2016, <https://www.rfc-editor.org/info/rfc7799>.

1392	Authors' Addresses

1394	   Giuseppe Fioccola (editor)
1395	   Telecom Italia
1396	   Via Reiss Romoli, 274
1397	   Torino  10148
1398	   Italy

1400	   Email: giuseppe.fioccola@telecomitalia.it

1402	   Alessandro Capello
1403	   Telecom Italia
1404	   Via Reiss Romoli, 274
1405	   Torino  10148
1406	   Italy

1408	   Email: alessandro.capello@telecomitalia.it

1409	   Mauro Cociglio
1410	   Telecom Italia
1411	   Via Reiss Romoli, 274
1412	   Torino  10148
1413	   Italy

1415	   Email: mauro.cociglio@telecomitalia.it

1417	   Luca Castaldelli
1418	   Telecom Italia
1419	   Via Reiss Romoli, 274
1420	   Torino  10148
1421	   Italy

1423	   Email: luca.castaldelli@telecomitalia.it

1425	   Mach(Guoyi) Chen
1426	   Huawei Technologies

1428	   Email: mach.chen@huawei.com

1430	   Lianshu Zheng
1431	   Huawei Technologies

1433	   Email: vero.zheng@huawei.com

1435	   Greg Mirsky
1436	   ZTE
1437	   USA

1439	   Email: gregimirsky@gmail.com

1441	   Tal Mizrahi
1442	   Marvell
1443	   6 Hamada st.
1444	   Yokneam
1445	   Israel

1447	   Email: talmi@marvell.com