idnits 2.17.1 

draft-ietf-ippm-alt-mark-14.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (December 7, 2017) is 2331 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Outdated reference: A later version (-04) exists of
     draft-fioccola-ippm-multipoint-alt-mark-01

  == Outdated reference: A later version (-15) exists of
     draft-ietf-bier-pmmm-oam-03

  == Outdated reference: A later version (-07) exists of
     draft-ietf-mpls-flow-ident-05

  == Outdated reference: A later version (-10) exists of
     draft-ietf-mpls-rfc6374-sfl-01

  == Outdated reference: A later version (-12) exists of
     draft-ietf-nvo3-encap-01


     Summary: 0 errors (**), 0 flaws (~~), 6 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.
--------------------------------------------------------------------------------


2	Network Working Group                                   G. Fioccola, Ed.
3	Internet-Draft                                                A. Capello
4	Intended status: Experimental                                M. Cociglio
5	Expires: June 10, 2018                                    L. Castaldelli
6	                                                          Telecom Italia
7	                                                                 M. Chen
8	                                                                L. Zheng
9	                                                     Huawei Technologies
10	                                                               G. Mirsky
11	                                                                     ZTE
12	                                                              T. Mizrahi
13	                                                                 Marvell
14	                                                        December 7, 2017

16	 Alternate Marking method for passive and hybrid performance monitoring
17	                      draft-ietf-ippm-alt-mark-14

19	Abstract

21	   This document describes a method to perform packet loss, delay and
22	   jitter measurements on live traffic.  This method is based on the
23	   Alternate Marking (Coloring) technique.  A report is provided to
24	   explain an example and show the method's applicability.  This
25	   technology can be applied in various situations as detailed in this
26	   document and could be considered passive or hybrid depending on the
27	   application.

29	Requirements Language

31	   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
32	   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
33	   "OPTIONAL" in this document are to be interpreted as described in BCP
34	   14 [RFC2119] [RFC8174] when, and only when, they appear in all
35	   capitals, as shown here.

37	Status of This Memo

39	   This Internet-Draft is submitted in full conformance with the
40	   provisions of BCP 78 and BCP 79.

42	   Internet-Drafts are working documents of the Internet Engineering
43	   Task Force (IETF).  Note that other groups may also distribute
44	   working documents as Internet-Drafts.  The list of current Internet-
45	   Drafts is at https://datatracker.ietf.org/drafts/current/.

47	   Internet-Drafts are draft documents valid for a maximum of six months
48	   and may be updated, replaced, or obsoleted by other documents at any
49	   time.  It is inappropriate to use Internet-Drafts as reference
50	   material or to cite them other than as "work in progress."

52	   This Internet-Draft will expire on June 10, 2018.

54	Copyright Notice

56	   Copyright (c) 2017 IETF Trust and the persons identified as the
57	   document authors.  All rights reserved.

59	   This document is subject to BCP 78 and the IETF Trust's Legal
60	   Provisions Relating to IETF Documents
61	   (https://trustee.ietf.org/license-info) in effect on the date of
62	   publication of this document.  Please review these documents
63	   carefully, as they describe your rights and restrictions with respect
64	   to this document.  Code Components extracted from this document must
65	   include Simplified BSD License text as described in Section 4.e of
66	   the Trust Legal Provisions and are provided without warranty as
67	   described in the Simplified BSD License.

69	Table of Contents

71	   1.  Introduction
72	   2.  Overview of the method
73	   3.  Detailed description of the method
74	     3.1.  Packet loss measurement
75	       3.1.1.  Coloring the packets
76	       3.1.2.  Counting the packets
77	       3.1.3.  Collecting data and calculating packet loss
78	     3.2.  Timing aspects
79	     3.3.  One-way delay measurement
80	       3.3.1.  Single marking methodology
81	       3.3.2.  Double marking methodology
82	     3.4.  Delay variation measurement
83	   4.  Considerations
84	     4.1.  Synchronization
85	     4.2.  Data Correlation
86	     4.3.  Packet Re-ordering
87	   5.  Applications, implementation and deployment
88	     5.1.  Report on the operational experiment
89	       5.1.1.  Metric transparency
90	   6.  Hybrid measurement
91	   7.  Compliance with RFC6390 guidelines
92	   8.  Security Considerations
93	   9.  IANA Considerations
94	   10. Acknowledgements
95	   11. References
96	     11.1.  Normative References
97	     11.2.  Informative References
98	   Authors' Addresses

100	1.  Introduction

102	   Nowadays, most Service Providers' networks carry traffic with
103	   contents that are highly sensitive to packet loss [RFC7680], delay
104	   [RFC7679], and jitter [RFC3393].

106	   In view of this scenario, Service Providers need methodologies and
107	   tools to monitor and measure network performance with adequate
108	   accuracy, in order to constantly control the quality of experience
109	   perceived by their customers.  In addition, performance
110	   monitoring provides useful information for improving network
111	   management (e.g., isolation of network problems, troubleshooting,
112	   etc.).

114	   A lot of work related to OAM, which also includes performance
115	   monitoring techniques, has been done by Standards Developing
116	   Organizations (SDOs): [RFC7276] provides a good overview of existing
117	   OAM mechanisms defined in IETF, ITU-T and IEEE.  Within the IETF, a
118	   lot of work has been done on fault detection and connectivity
119	   verification, while less effort has been dedicated so far to
120	   performance monitoring.  The IPPM WG has defined standard metrics to
121	   measure network performance; however, the methods developed in this
122	   WG mainly focus on active measurement techniques.  More
123	   recently, the MPLS WG has defined mechanisms for measuring packet
124	   loss, one-way and two-way delay, and delay variation in MPLS
125	   networks [RFC6374], but their applicability to passive measurements
126	   has some limitations, especially for pure connection-less networks.

128	   The lack of adequate tools to measure packet loss with the desired
129	   accuracy drove an effort to design a new method for the performance
130	   monitoring of live traffic, easy to implement and deploy.  The effort
131	   led to the method described in this document: basically, it is a
132	   passive performance monitoring technique, potentially applicable to
133	   any kind of packet based traffic, including Ethernet, IP, and MPLS,
134	   both unicast and multicast.  The method addresses primarily packet
135	   loss measurement, but it can be easily extended to one-way delay and
136	   delay variation measurements as well.

138	   The method has been explicitly designed for passive measurements but
139	   it can also be used with active probes.  Passive measurements are
140	   usually more easily understood by customers and provide a much better
141	   accuracy, especially for packet loss measurements.

143	   RFC 7799 [RFC7799] defines passive and hybrid methods of measurement.
144	   In particular, Passive Methods of Measurement are based solely on
145	   observations of an undisturbed and unmodified packet stream of
146	   interest; Hybrid Methods are Methods of Measurement that use a
147	   combination of Active Methods and Passive Methods.

149	   Taking into consideration these definitions, the Alternate Marking
150	   Method could be considered Hybrid or Passive depending on the case.
151	   If the marking is obtained by changing existing field values of the
152	   packets (e.g., the DSCP field), the technique is Hybrid.  If the
153	   marking field is dedicated, reserved, and included in the protocol
154	   specification, the Alternate Marking technique can be considered
155	   Passive (e.g., RFC6374 Synonymous Flow Label or OAM Marking Bits in
156	   BIER Header).

158	   The advantages of the method described in this document are:

160	   o  easy implementation: it can be implemented either by using
161	      features already available on major routing platforms, as
162	      described in Section 5.1, or by applying an optimized
163	      implementation of the method for both legacy and new technologies;

165	   o  low computational effort: the additional load on processing is
166	      negligible;

168	   o  accurate packet loss measurement: single packet loss granularity
169	      is achieved with a passive measurement;

171	   o  potential applicability to any kind of packet/frame-based
172	      traffic: Ethernet, IP, MPLS, etc., both unicast and multicast;

174	   o  robustness: the method can tolerate out-of-order packets and is
175	      not based on "special" packets whose loss could have a negative
176	      impact;

178	   o  flexibility: all the timestamp formats are allowed, because they
179	      are managed out-of-band.  The format (the Network Time Protocol
180	      (NTP) [RFC5905] or the IEEE 1588 Precision Time Protocol (PTP)
181	      [IEEE-1588]) depends on the required precision;

183	   o  no interoperability issues: the features required to experiment
184	      and test the method (as described in Section 5.1) are available on
185	      all current routing platforms.  Either a centralized or a
186	      distributed solution can be used to harvest data from the routers.

188	   The method doesn't raise any specific need for protocol extension,
189	   but it could be further improved by means of some extension to
190	   existing protocols.  Specifically, the use of DiffServ bits for
191	   coloring the packets might not be a viable solution in some cases: a
192	   standard method to color the packets for this specific application
193	   could be beneficial.

195	2.  Overview of the method

197	   In order to perform packet loss measurements on a production traffic
198	   flow, different approaches exist.  The most intuitive one consists in
199	   numbering the packets, so that each router that receives the flow can
200	   immediately detect a packet missing.  This approach, though very
201	   simple in theory, is not simple to achieve: it requires the insertion
202	   of a sequence number into each packet and the devices must be able to
203	   extract the number and check it in real time.  Such a task can be
204	   difficult to implement on live traffic: if UDP is used as the
205	   transport protocol, the sequence number is not available; on the
206	   other hand, if a higher layer sequence number (e.g. in the RTP
207	   header) is used, extracting that information from each packet and
208	   processing it in real time could overload the device.

210	   An alternate approach is to count the number of packets sent on one
211	   end, the number of packets received on the other end, and to compare
212	   the two values.  This operation is much simpler to implement, but
213	   requires that the devices performing the measurement are in sync: in
214	   order to compare two counters it is required that they refer exactly
215	   to the same set of packets.  Since a flow is continuous and cannot be
216	   stopped when a counter has to be read, it can be difficult to
217	   determine exactly when to read the counter.  A possible solution to
218	   overcome this problem is to virtually split the flow in consecutive
219	   blocks by inserting periodically a delimiter so that each counter
220	   refers exactly to the same block of packets.  The delimiter could be
221	   for example a special packet inserted artificially into the flow.
222	   However, delimiting the flow using specific packets has some
223	   limitations.  First, it requires generating additional packets within
224	   the flow and requires the equipment to be able to process those
225	   packets.  In addition, the method is vulnerable to out-of-order
226	   reception of delimiting packets and, to a lesser extent, to their
227	   loss.

229	   The method proposed in this document follows the second approach, but
230	   it doesn't use additional packets to virtually split the flow in
231	   blocks.  Instead, it "marks" the packets so that the packets
232	   belonging to the same block will have the same color, whilst
233	   consecutive blocks will have different colors.  Each change of color
234	   represents a sort of auto-synchronization signal that guarantees the
235	   consistency of measurements taken by different devices along the path
236	   (see also [I-D.cociglio-mboned-multicast-pm] and
237	   [I-D.tempia-opsawg-p3m], where this technique was introduced).

239	   Figure 1 represents a very simple network and shows how the method
240	   can be used to measure packet loss on different network segments: by
241	   enabling the measurement on several interfaces along the path, it is
242	   possible to perform link monitoring, node monitoring or end-to-end
243	   monitoring.  The method is flexible enough to measure packet loss on
244	   any segment of the network and can be used to isolate the faulty
245	   element.

247	                            Traffic flow
248	        ========================================================>
249	          +------+       +------+       +------+       +------+
250	      ---<>  R1  <>-----<>  R2  <>-----<>  R3  <>-----<>  R4  <>---
251	          +------+       +------+       +------+       +------+
252	          .              .      .              .       .      .
253	          .              .      .              .       .      .
254	          .              <------>              <------->      .
255	          .          Node Packet Loss      Link Packet Loss   .
256	          .                                                   .
257	          <--------------------------------------------------->
258	                           End-to-End Packet loss

260	                     Figure 1: Available measurements

262	3.  Detailed description of the method

264	   This section describes in detail how the method operates.  Special
265	   emphasis is given to the measurement of packet loss, which represents
266	   the core application of the method, but applicability to delay and
267	   jitter measurements is also considered.

269	3.1.  Packet loss measurement

271	   The basic idea is to virtually split traffic flows into consecutive
272	   blocks: each block represents a measurable entity unambiguously
273	   recognizable by all network devices along the path.  By counting the
274	   number of packets in each block and comparing the values measured by
275	   different network devices along the path, it is possible to measure
276	   packet loss occurring in any single block between any two points.

278	   As discussed in the previous section, a simple way to create the
279	   blocks is to "color" the traffic (two colors are sufficient) so that
280	   packets belonging to different consecutive blocks will have different
281	   colors.  Whenever the color changes, the previous block terminates
282	   and the new one begins.  Hence, all the packets belonging to the same
283	   block will have the same color and packets of different consecutive
284	   blocks will have different colors.  The number of packets in each
285	   block depends on the criterion used to create the blocks:

287	   o  if the color is switched after a fixed number of packets, then
288	      each block will contain the same number of packets (except for any
289	      losses);

291	   o  if the color is switched according to a fixed timer, then the
292	      number of packets may be different in each block depending on the
293	      packet rate.

295	   The following figure shows what a flow looks like when it is split
296	   into traffic blocks with colored packets.

298	   A: packet with A coloring
299	   B: packet with B coloring

301	            |           |           |           |           |
302	            |           |    Traffic flow       |           |
303	    ------------------------------------------------------------------->
304	     BBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA
305	    ------------------------------------------------------------------->
306	       ...  |  Block 5  |  Block 4  |  Block 3  |  Block 2  |  Block 1
307	            |           |           |           |           |

309	                        Figure 2: Traffic coloring
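
   The two block-creation criteria listed above (fixed packet count and
   fixed timer) can be illustrated with a minimal Python sketch.  It is
   not part of the method's specification: the function names, the
   block size, and the timer period are assumptions chosen only for
   illustration.

```python
from typing import Iterable, List

COLORS = ("A", "B")  # two colors are sufficient, as stated above


def color_by_count(num_packets: int, block_size: int) -> List[str]:
    """Switch color after every `block_size` packets (hypothetical helper)."""
    return [COLORS[(i // block_size) % 2] for i in range(num_packets)]


def color_by_timer(timestamps: Iterable[float], period: float) -> List[str]:
    """Switch color every `period` seconds (hypothetical helper).

    `timestamps` are packet transmission times in seconds, measured
    from the start of the first block.
    """
    return [COLORS[int(t // period) % 2] for t in timestamps]


# Count-based blocks always have the same size.
print("".join(color_by_count(10, 4)))  # AAAABBBBAA

# Timer-based blocks depend on the packet rate within each period.
print("".join(color_by_timer([0.0, 0.4, 0.9, 1.1, 1.2, 1.3, 2.5], 1.0)))  # AAABBBA
```

   With the timer-based criterion, the block size varies with the
   packet rate, which is the behavior assumed in the examples that
   follow.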

311	   Figure 3 shows how the method can be used to measure link packet loss
312	   between two adjacent nodes.

314	   Referring to the figure, let's assume we want to monitor the packet
315	   loss on the link between two routers: router R1 and router R2.
316	   According to the method, the traffic is colored alternately with
317	   two different colors, A and B.  Whenever the color changes, the
318	   transition generates a sort of square-wave signal, as depicted in the
319	   following figure.

321	   Color A   ----------+           +-----------+           +----------
322	                       |           |           |           |
323	   Color B             +-----------+           +-----------+
324	              Block n        ...      Block 3     Block 2     Block 1
325	            <---------> <---------> <---------> <---------> <--------->

327	                                Traffic flow
328	            ===========================================================>
329	   Color ...AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA...
330	            ===========================================================>

332	                 Figure 3: Computation of link packet loss

334	   Traffic coloring can be done by R1 itself or may already have been
335	   done upstream.  R1 needs two counters, C(A)R1 and C(B)R1, on its
336	   egress interface: C(A)R1 counts the packets with color A and C(B)R1
337	   counts those with color B.  As long as traffic is colored A, only
338	   counter C(A)R1 will be incremented, while C(B)R1 is not incremented;
339	   vice versa, when the traffic is colored as B, only C(B)R1 is
340	   incremented.  C(A)R1 and C(B)R1 can be used as reference values to
341	   determine the packet loss from R1 to any other measurement point
342	   down the path.  Router R2, similarly, will need two counters on its
343	   ingress interface, C(A)R2 and C(B)R2, to count the packets received
344	   on that interface and colored with color A and B respectively.  When
345	   an A block ends, it is possible to compare C(A)R1 and C(A)R2 and
346	   calculate the packet loss within the block; similarly, when the
347	   successive B block terminates, it is possible to compare C(B)R1 with
348	   C(B)R2, and so on for every successive block.
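
   As a rough illustration of the per-color counters on R1 and R2, the
   following Python sketch models a measurement point with one counter
   per color and compares the two routers when a block ends.  The class
   and function names are hypothetical, and the counters are assumed to
   be reset after each read.

```python
from collections import Counter


class MeasurementPoint:
    """Hypothetical model of a router interface with one counter per color."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.counters = Counter()

    def observe(self, color: str) -> None:
        # Only the counter of the packet's color is incremented.
        self.counters[color] += 1

    def read_and_reset(self, color: str) -> int:
        # Assumption: the counter is reset once its block has been read.
        value = self.counters[color]
        self.counters[color] = 0
        return value


def block_loss(upstream: MeasurementPoint, downstream: MeasurementPoint,
               color: str) -> int:
    """Packet loss for the block of `color` that has just ended."""
    return upstream.read_and_reset(color) - downstream.read_and_reset(color)


r1, r2 = MeasurementPoint("R1"), MeasurementPoint("R2")
sent = ["A"] * 5 + ["B"] * 4
for index, color in enumerate(sent):
    r1.observe(color)      # counted on R1 egress
    if index != 2:         # simulate the loss of the third packet
        r2.observe(color)  # counted on R2 ingress

loss_a = block_loss(r1, r2, "A")
loss_b = block_loss(r1, r2, "B")
print(loss_a, loss_b)  # 1 0
```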

350	   Likewise, by using two counters on R2's egress interface it is
351	   possible to count the packets sent out of that interface and use
352	   them as reference values to calculate the packet loss from R2 to any
353	   measurement point downstream of R2.

355	   Using a fixed timer for color switching offers better control over
356	   the method: the (time) length of the blocks can be chosen large
357	   enough to simplify the collection and the comparison of measures
358	   taken by different network devices.  It's preferable to read the
359	   value of the counters not immediately after the color switch: some
360	   packets could arrive out of order and increment the counter
361	   associated with the previous block (color), so it is worth waiting
362	   for some time.  A safe choice is to wait L/2 time units (where L is
363	   the duration of each block) after the color switch before reading
364	   the now-idle counter of the previous color, which minimizes the
365	   chance of reading a counter that is still being incremented.  The
366	   drawback is that the longer the duration of the block, the less
367	   frequently the measurement can be taken.
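
   The L/2 rule can be sketched as follows.  This is only an
   illustration under stated assumptions: blocks are assumed to start
   at time 0 with color A, time is expressed in seconds, and the
   function names are hypothetical.

```python
from typing import Optional


def safe_read_time(switch_time: float, block_duration: float) -> float:
    """Time at which the counter of the block ended at `switch_time`
    is read: L/2 after the color switch."""
    return switch_time + block_duration / 2.0


def counter_to_read(now: float, block_duration: float) -> Optional[str]:
    """Color of the most recently closed block at time `now`.

    Assumes the first block starts at t=0 with color A and that colors
    alternate every `block_duration` seconds.  Returns None while the
    first block is still running (no closed block yet).
    """
    current_block = int(now // block_duration)
    if current_block == 0:
        return None
    return ("A", "B")[(current_block - 1) % 2]


L = 300.0  # hypothetical block duration in seconds
t_read = safe_read_time(switch_time=300.0, block_duration=L)
print(t_read)                      # 450.0
print(counter_to_read(t_read, L))  # A
```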

369	   The following table shows how the counters can be used to calculate
370	   the packet loss between R1 and R2.  The first column lists the
371	   sequence of traffic blocks while the other columns contain the
372	   counters of A-colored packets and B-colored packets for R1 and R2.
373	   In this example, we assume that the values of the counters are reset
374	   to zero whenever a block ends and its associated counter has been
375	   read: with this assumption, the table shows only relative values,
376	   that is, the exact number of packets of each color within each block.
377	   If the values of the counters were not reset, the table would contain
378	   cumulative values, but the relative values could be determined simply
379	   by difference from the value of the previous block of the same color.

381	   The color is switched on the basis of a fixed timer (not shown in the
382	   table), so the number of packets in each block is different.

384	           +-------+--------+--------+--------+--------+------+
385	           | Block | C(A)R1 | C(B)R1 | C(A)R2 | C(B)R2 | Loss |
386	           +-------+--------+--------+--------+--------+------+
387	           | 1     | 375    | 0      | 375    | 0      | 0    |
388	           |       |        |        |        |        |      |
389	           | 2     | 0      | 388    | 0      | 388    | 0    |
390	           |       |        |        |        |        |      |
391	           | 3     | 382    | 0      | 381    | 0      | 1    |
392	           |       |        |        |        |        |      |
393	           | 4     | 0      | 377    | 0      | 374    | 3    |
394	           |       |        |        |        |        |      |
395	           | ...   | ...    | ...    | ...    | ...    | ...  |
396	           |       |        |        |        |        |      |
397	           | 2n    | 0      | 387    | 0      | 387    | 0    |
398	           |       |        |        |        |        |      |
399	           | 2n+1  | 379    | 0      | 377    | 0      | 2    |
400	           +-------+--------+--------+--------+--------+------+

402	       Table 1: Evaluation of counters for packet loss measurements

404	   During an A block (blocks 1, 3 and 2n+1), all the packets are
405	   A-colored, therefore the C(A) counters are incremented to the number
406	   seen on the interface, while C(B) counters are zero.  Vice versa,
407	   during a B block (blocks 2, 4 and 2n), all the packets are B-colored:
408	   C(A) counters are zero, while C(B) counters are incremented.

410	   When a block ends (because of color switching) the relative counters
411	   stop incrementing and it is possible to read them, compare the values
412	   measured on router R1 and R2 and calculate the packet loss within
413	   that block.

415	   For example, looking at the table above, during the first block
416	   (A-colored), C(A)R1 and C(A)R2 have the same value (375), which
417	   corresponds to the exact number of packets of the first block (no
418	   loss).  Also during the second block (B-colored) R1 and R2 counters
419	   have the same value (388), which corresponds to the number of packets
420	   of the second block (no loss).  During blocks three and four, R1 and
421	   R2 counters are different, meaning that some packets have been lost:
422	   in the example, one single packet (382-381) was lost during block
423	   three and three packets (377-374) were lost during block four.
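
   The arithmetic of Table 1 can be reproduced with a short Python
   sketch.  The counter values are taken from the table; the data
   layout and function names are assumptions made for illustration.
   The second helper shows the variant mentioned above in which
   counters are not reset and per-block values are obtained by
   difference from the previous reading of the same color.

```python
from typing import Dict, List, Tuple

# (block, color, counter on R1, counter on R2), values from Table 1
TABLE_1: List[Tuple[str, str, int, int]] = [
    ("1", "A", 375, 375),
    ("2", "B", 388, 388),
    ("3", "A", 382, 381),
    ("4", "B", 377, 374),
    ("2n", "B", 387, 387),
    ("2n+1", "A", 379, 377),
]


def per_block_loss(rows: List[Tuple[str, str, int, int]]) -> Dict[str, int]:
    """Loss per block: packets counted on R1 minus packets counted on R2."""
    return {block: r1 - r2 for block, _color, r1, r2 in rows}


def deltas_from_cumulative(readings: List[int]) -> List[int]:
    """Per-block values from cumulative readings of one color counter."""
    deltas, previous = [], 0
    for value in readings:
        deltas.append(value - previous)
        previous = value
    return deltas


losses = per_block_loss(TABLE_1)
print(losses["3"], losses["4"], losses["2n+1"])  # 1 3 2

# C(A)R1 read after blocks 1 and 3 without resetting: 375, then 375+382.
print(deltas_from_cumulative([375, 757]))  # [375, 382]
```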

425	   The method applied to R1 and R2 can be extended to any other router
426	   and applied to more complex networks, as long as the measurement is
427	   enabled on the path followed by the traffic flow(s) being observed.

429	   It's worth mentioning two different strategies that can be used when
430	   implementing the method:

432	   o  flow-based: the flow-based strategy is used when only a limited
433	      number of traffic flows need to be monitored.  According to this
434	      strategy, only a subset of the flows is colored.  Counters for
435	      packet loss measurements can be instantiated for each single flow,
436	      or for the set as a whole, depending on the desired granularity.
437	      A relevant problem with this approach is the necessity to know in
438	      advance the path followed by flows that are subject to
439	      measurement.  Path rerouting and traffic load-balancing increase
440	      the issue complexity, especially for unicast traffic.  The problem
441	      is easier to solve for multicast traffic where load balancing is
442	      seldom used and static joins are frequently used to force traffic
443	      forwarding and replication.

445	   o  link-based: measurements are performed on all the traffic on a
446	      link by link basis.  The link could be a physical link or a
447	      logical link.  Counters could be instantiated for the traffic as a
448	      whole or for each traffic class (in case it is desired to monitor
449	      each class separately), but in the second case a pair of
450	      counters is needed for each class.

452	   As mentioned, the flow-based measurement requires the identification
453	   of the flow to be monitored and the discovery of the path followed by
454	   the selected flow.  It is possible to monitor a single flow or
455	   multiple flows grouped together, but in this case measurement is
456	   consistent only if all the flows in the group follow the same path.
457	   Moreover, if a measurement is performed by grouping many flows, it is
458	   not possible to determine exactly which flow was affected by packet
459	   loss.  In order to have measures per single flow it is necessary to
460	   configure counters for each specific flow.  Once the flow(s) to be
461	   monitored have been identified, it is necessary to configure the
462	   monitoring on the proper nodes.  Configuring the monitoring means
463	   configuring the rule to intercept the traffic and configuring the
464	   counters to count the packets.  To have just an end-to-end
465	   monitoring, it is sufficient to enable the monitoring on the first
466	   and the last hop routers of the path: the mechanism is completely
467	   transparent to intermediate nodes and independent from the path
468	   followed by traffic flows.  On the contrary, to monitor the flow on a
469	   hop-by-hop basis along its whole path it is necessary to enable the
470	   monitoring on every node from the source to the destination.  In case
471	   the exact path followed by the flow is not known a priori (i.e., the
472	   flow has multiple paths to reach the destination) it is necessary to
473	   enable the monitoring system on every path: counters on interfaces
474	   traversed by the flow will report packet counts, while counters on
475	   other interfaces will remain at zero.

477	3.1.1.  Coloring the packets

479	   The coloring operation is fundamental in order to create packet
480	   blocks.  This implies choosing where to activate the coloring and how
481	   to color the packets.

483	   In case of flow-based measurements, the flow to monitor can be
484	   defined by a set of selection rules (e.g., header fields) used to
485	   match a subset of the packets; in this way it is possible to control
486	   the number of involved nodes, the path followed by the packets and
487	   the size of the flows.  It is possible, in general, to have multiple
488	   coloring nodes or a single coloring node, which is easier to manage
489	   and doesn't raise any risk of conflict.  Coloring in multiple nodes
490	   can be done; the requirement is that the coloring must change
491	   periodically between the nodes according to the timing considerations
492	   in Section 3.2, so that every node designated as a measurement point
493	   along the path should be able to identify the colored packets
494	   unambiguously.  Furthermore, [I-D.fioccola-ippm-multipoint-alt-mark]
495	   generalizes the coloring for multipoint-to-multipoint flows.  In
496	   addition, it can be advantageous to color the flow as close as
497	   possible to the source because it allows an end-to-end measure if a
498	   measurement point is enabled on the last-hop router as well.

500	   For link-based measurements, all traffic needs to be colored when
501	   transmitted on the link.  If the traffic had already been colored,
502	   then it has to be re-colored because the color must be consistent on
503	   the link.  This means that each hop along the path must (re-)color
504	   the traffic; the color is not required to be consistent along
505	   different links.

507	   Traffic coloring can be implemented by setting a specific bit in the
508	   packet header and changing the value of that bit periodically.  How
509	   to choose the marking field depends on the application and is out of
510	   scope here.  However, some applications are reported in Section 5.
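
   A minimal Python sketch of bit-based coloring follows.  Since the
   choice of the marking field is out of scope here, the sketch works
   on a generic one-octet header field, and the bit position MARK_BIT
   is purely hypothetical; it does not correspond to any field defined
   by a protocol specification.

```python
MARK_BIT = 0x01  # hypothetical position of the marking bit in the octet


def set_color(field: int, color: str) -> int:
    """Return the one-octet `field` with the marking bit set for color B
    and cleared for color A; the other bits are left untouched."""
    if color == "B":
        return (field | MARK_BIT) & 0xFF
    return field & ~MARK_BIT & 0xFF


def get_color(field: int) -> str:
    """Read the color back at a measurement point."""
    return "B" if field & MARK_BIT else "A"


octet = 0b10100000
marked = set_color(octet, "B")
print(format(marked, "08b"), get_color(marked))  # 10100001 B

unmarked = set_color(marked, "A")
print(format(unmarked, "08b"), get_color(unmarked))  # 10100000 A
```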

512	3.1.2.  Counting the packets

514	   For flow-based measurements, assuming that the coloring of the
515	   packets is performed only by the source nodes, the nodes between
516	   source and destination (inclusive) have to count the colored packets
517	   that they receive and forward: this operation can be enabled on every
518	   router along the path or only on a subset, depending on which network
519	   segment is being monitored (a single link, a particular metro area,
520	   the backbone, the whole path).  Since the color switches periodically
521	   between two values, two counters (one for each value) are needed: one
522	   counter for packets with color A and one counter for packets with
523	   color B.  For each flow (or group of flows) being monitored and for
524	   every interface where the monitoring is active, a pair of counters
525	   is needed.  For example, in order to monitor separately 3 flows on a
526	   router with 4 interfaces involved, 24 counters are needed (2 counters
527	   for each of the 3 flows on each of the 4 interfaces).  Furthermore,
528	   [I-D.fioccola-ippm-multipoint-alt-mark] generalizes the counting for
529	   multipoint-to-multipoint flows.
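
   The counter bookkeeping described above can be sketched as follows.
   This is a minimal illustration, not a router implementation; the
   class name ColorCounters and the string identifiers used for
   interfaces and flows are assumptions made for the example.

```python
from collections import defaultdict


class ColorCounters:
    """One pair of packet counters (color A, color B) per (interface, flow)."""

    def __init__(self):
        self._counts = defaultdict(lambda: {"A": 0, "B": 0})

    def count(self, interface: str, flow: str, color: str) -> None:
        """Account for one received or forwarded packet of the given color."""
        self._counts[(interface, flow)][color] += 1

    def read(self, interface: str, flow: str, color: str) -> int:
        """Read a counter without creating a new entry."""
        return self._counts.get((interface, flow), {"A": 0, "B": 0})[color]

    def total_counters(self) -> int:
        """Number of counters in use: two per monitored (interface, flow)."""
        return 2 * len(self._counts)
```

   Monitoring 3 flows on 4 interfaces creates 12 (interface, flow)
   entries, hence the 24 counters of the example in the text.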

531	   In case of link-based measurements the behaviour is similar except
532	   that coloring and counting operations are performed on a link by link
533	   basis at each endpoint of the link.

535	   Another important aspect to take into consideration is when to read
536	   the counters: in order to count the exact number of packets of a
537	   block the routers must perform this operation when that block has
538	   ended: in other words, the counter for color A must be read when the
539	   current block has color B, in order to be sure that the value of the
540	   counter is stable.  This task can be accomplished in two ways.  The
541	   general approach is to read the counters periodically, many
542	   times during a block duration, and to compare these successive
543	   readings: when the counter stops incrementing, the current
544	   block has ended and its value can be processed safely.
545	   Alternatively, if the coloring operation is performed on the basis of
546	   a fixed timer, it is possible to configure the reading of the
547	   counters according to that timer: for example, reading the counter
548	   for color A every period in the middle of the subsequent block with
549	   color B is a safe choice.  A sufficient margin should be considered
550	   between the end of a block and the reading of the counter, in order
551	   to take into account any out-of-order packets.
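
   Both ways of deciding when to read a counter can be sketched as
   follows.  The sketch is deliberately naive: stable_value assumes the
   readings were taken after traffic of that color started and that the
   flow has no idle gaps within a block, and read_time assumes blocks
   numbered from time zero.  Both function names are hypothetical.

```python
def stable_value(readings):
    """General approach: return the value once two successive readings match.

    Returns None if the counter was still incrementing across all readings.
    """
    for prev, cur in zip(readings, readings[1:]):
        if cur == prev:
            return cur
    return None


def read_time(block_index: int, block_len: float) -> float:
    """Timer approach: read the counter of a block in the middle of the next one."""
    return (block_index + 1) * block_len + block_len / 2
```

   For example, with 300-second blocks the counter of block 0 would be
   read at t = 450 seconds, which leaves a margin of half a block for
   out-of-order packets.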

553	3.1.3.  Collecting data and calculating packet loss

555	   The nodes enabled to perform performance monitoring collect the value
556	   of the counters, but they are not able to directly use this
557	   information to measure packet loss, because they only have their own
558	   samples.  For this reason, an external Network Management System
559	   (NMS) can be used to collect and process data and to perform packet
560	   loss calculation.  The NMS compares the values of counters from
561	   different nodes and can calculate if some packets were lost (even a
562	   single packet) and also where packets were lost.
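
   The comparison performed by the NMS can be sketched as follows, for
   the counters of a single block and a single color.  The function
   names and the list-of-tuples input format are assumptions made for
   the example.

```python
def packet_loss(upstream: int, downstream: int) -> int:
    """Packets of one block lost between two measurement points."""
    return upstream - downstream


def locate_loss(counts_along_path):
    """Per-segment loss for [(node, count), ...] ordered along the path."""
    return [
        (a, b, packet_loss(ca, cb))
        for (a, ca), (b, cb) in zip(counts_along_path, counts_along_path[1:])
    ]
```

   With counts of 1000, 1000 and 997 on R1, R2 and R3, the sketch
   reports no loss between R1 and R2 and 3 packets lost between R2 and
   R3.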

564	   The value of the counters needs to be transmitted to the NMS as soon
565	   as it has been read.  This can be accomplished by using SNMP or FTP
566	   and can be done in Push Mode or Polling Mode.  In the first case,
567	   each router periodically sends the information to the NMS; in the
568	   latter case, it is the NMS that periodically polls routers to collect
569	   information.  In any case, the NMS has to collect all the relevant
570	   values from all the routers within one cycle of the timer.

572	   It would also be possible to use a protocol to exchange values of
573	   counters between the two endpoints in order to let them perform the
574	   packet loss calculation for each traffic direction.

576	   A possible approach for the performance measurement architecture is
577	   explained in [I-D.chen-ippm-coloring-based-ipfpm-framework], while
578	   [I-D.chen-ippm-ipfpm-report] introduces new information elements of
579	   IPFIX (RFC 7011 [RFC7011]).

581	3.2.  Timing aspects

583	   This document introduces two color switching methods: one is based on
584	   a fixed number of packets, the other is based on a fixed timer.  The
585	   method based on a fixed timer is preferable because it is more
586	   deterministic, and it will be considered in the rest of the document.

588	   To account for the clock error between network devices R1 and R2,
589	   they must be synchronized to the same clock reference with an
590	   accuracy of +/- L/2 time units, where L is the time duration of the
591	   block, so that each colored packet can be assigned to the right batch
592	   by each router.  This is because the minimum time distance between two
593	   packets of the same color but belonging to different batches is L
594	   time units.

596	   In practice, there are also out-of-order packets at batch boundaries,
597	   strictly related to the delay between measurement points.  This means
598	   that, without considering clock error, we wait L/2 after color
599	   switching to be sure to read a stable counter.

601	   In summary, we need to take into account two contributions: clock
602	   error between network devices and the interval we need to wait to
603	   avoid out-of-order packets because of network delay.

605	   The following figure explains both issues.

607	   ...BBBBBBBBB | AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA | BBBBBBBBB...
608	                |<======================================>|
609	                |                   L                    |
610	   ...=========>|<==================><==================>|<==========...
611	                |       L/2                   L/2        |
612	                |<===>|                            |<===>|
613	                   d  |                            |   d
614	                      |<==========================>|
615	                       available counting interval

617	                         Figure 4: Timing aspects

619	   It is assumed that all network devices are synchronized to a common
620	   reference time with an accuracy of +/- A/2.  Thus, the difference
621	   between the clock values of any two network devices is bounded by A.

623	   The guardband d is given by:

625	   d = A + D_max - D_min,

627	   where A is the clock accuracy, D_max is an upper bound on the network
628	   delay between the network devices, and D_min is a lower bound on the
629	   delay.

631	   The available counting interval is L - 2d, which must be > 0.

633	   The condition that must be satisfied and is a requirement on the
634	   synchronization accuracy is:

636	   d < L/2.
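
   The guardband formula and the resulting condition can be checked
   with a short sketch.  All values must use the same time unit; the
   function names are hypothetical and the numbers in the usage note
   are illustrative, not measured.

```python
def guardband(A, d_max, d_min):
    """d = A + D_max - D_min, all in the same time unit."""
    return A + d_max - d_min


def available_interval(L, A, d_max, d_min):
    """Return L - 2d, raising ValueError if the condition d < L/2 fails."""
    d = guardband(A, d_max, d_min)
    if not d < L / 2:
        raise ValueError("guardband d must be smaller than L/2")
    return L - 2 * d
```

   For instance, with L = 1000 ms, A = 100 ms, D_max = 50 ms and
   D_min = 10 ms, the guardband is d = 140 ms and the available
   counting interval is 720 ms.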

638	3.3.  One-way delay measurement

640	   The same principle used to measure packet loss can be applied also to
641	   one-way delay measurement.  There are three alternatives, as
642	   described hereinafter.

644	3.3.1.  Single marking methodology

646	   The alternation of colors can be used as a time reference to
647	   calculate the delay.  Whenever the color changes (that means that a
648	   new block has started) a network device can store the timestamp of
649	   the first packet of the new block; that timestamp can be compared
650	   with the timestamp of the same packet on a second router to compute
651	   packet delay.  Considering Figure 2, R1 stores a timestamp TS(A1)R1
652	   when it sends the first packet of block 1 (A-colored), a timestamp
653	   TS(B2)R1 when it sends the first packet of block 2 (B-colored) and so
654	   on for every other block.  R2 performs the same operation on the
655	   receiving side, recording TS(A1)R2, TS(B2)R2 and so on.  Since the
656	   timestamps refer to specific packets (the first packet of each block)
657	   we are sure that timestamps compared to compute delay refer to the
658	   same packets.  By comparing TS(A1)R1 with TS(A1)R2 (and similarly
659	   TS(B2)R1 with TS(B2)R2 and so on) it is possible to measure the delay
660	   between R1 and R2.  In order to have more measurements, it is
661	   possible to take and store more timestamps, referring to other
662	   packets within each block.

664	   In order to coherently compare timestamps collected on different
665	   routers, the clocks on the network nodes must be in sync.
666	   Furthermore, a measurement is valid only if no packet loss occurs and
667	   if packet misordering can be avoided, otherwise the first packet of a
668	   block on R1 could be different from the first packet of the same
669	   block on R2 (e.g., if that packet is lost between R1 and R2 or it
670	   arrives after the next one).

672	   The following table shows how timestamps can be used to calculate the
673	   delay between R1 and R2.  The first column lists the sequence of
674	   blocks while other columns contain the timestamp referring to the
675	   first packet of each block on R1 and R2.  The delay is computed as a
676	   difference between timestamps.  For the sake of simplicity, all the
677	   values are expressed in milliseconds.

679	      +-------+---------+---------+---------+---------+-------------+
680	      | Block | TS(A)R1 | TS(B)R1 | TS(A)R2 | TS(B)R2 | Delay R1-R2 |
681	      +-------+---------+---------+---------+---------+-------------+
682	      | 1     | 12.483  | -       | 15.591  | -       | 3.108       |
683	      |       |         |         |         |         |             |
684	      | 2     | -       | 6.263   | -       | 9.288   | 3.025       |
685	      |       |         |         |         |         |             |
686	      | 3     | 27.556  | -       | 30.512  | -       | 2.956       |
687	      |       |         |         |         |         |             |
688	      | 4     | -       | 18.113  | -       | 21.269  | 3.156       |
689	      |       |         |         |         |         |             |
690	      | ...   | ...     | ...     | ...     | ...     | ...         |
691	      |       |         |         |         |         |             |
692	      | 2n-1  | 77.463  | -       | 80.501  | -       | 3.038       |
693	      |       |         |         |         |         |             |
694	      | 2n    | -       | 24.333  | -       | 27.433  | 3.100       |
695	      +-------+---------+---------+---------+---------+-------------+

697	         Table 2: Evaluation of timestamps for delay measurements

699	   The first row shows timestamps taken on R1 and R2 respectively and
700	   referring to the first packet of block 1 (which is A-colored).  Delay
701	   can be computed as a difference between the timestamp on R2 and the
702	   timestamp on R1.  Similarly, the second row shows timestamps (in
703	   milliseconds) taken on R1 and R2 and referring to the first packet of
704	   block 2 (which is B-colored).  Comparing timestamps taken on
705	   different nodes in the network and referring to the same packets
706	   (identified using the alternation of colors) it is possible to
707	   measure delay on different network segments.
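
   The computation behind the last column of Table 2 can be reproduced
   with a short sketch.  The timestamps are copied from the first three
   rows of the table; Decimal is used only to avoid binary
   floating-point rounding in the subtraction, and the names SAMPLES
   and one_way_delay are hypothetical.

```python
from decimal import Decimal

# (block, timestamp on R1, timestamp on R2), in milliseconds, from Table 2.
SAMPLES = [
    (1, "12.483", "15.591"),
    (2, "6.263", "9.288"),
    (3, "27.556", "30.512"),
]


def one_way_delay(ts_r1: str, ts_r2: str) -> Decimal:
    """Delay R1-R2 as the timestamp on R2 minus the timestamp on R1."""
    return Decimal(ts_r2) - Decimal(ts_r1)


delays = {block: one_way_delay(t1, t2) for block, t1, t2 in SAMPLES}
```

   The resulting delays are 3.108, 3.025 and 2.956 milliseconds, as in
   the table.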

709	   For the sake of simplicity, in the above example a single measurement
710	   is provided within a block, taking into account only the first packet
711	   of each block.  The number of measurements can be easily increased by
712	   considering multiple packets in the block: for instance, a timestamp
713	   could be taken every N packets, thus generating multiple delay
714	   measurements.  Taking this to the limit, in principle the delay could
715	   be measured for each packet, by taking and comparing the
716	   corresponding timestamps (possible but impractical from an
717	   implementation point of view).

719	3.3.1.1.  Mean delay

721	   As mentioned before, the method described above for measuring the
722	   delay is sensitive to out-of-order reception of packets.  In order to
723	   overcome this problem, a different approach has been considered: it
724	   is based on the concept of mean delay.  The mean delay is calculated
725	   by considering the average arrival time of the packets within a
726	   single block.  The network device locally stores a timestamp for each
727	   packet received within a single block: summing all the timestamps and
728	   dividing by the total number of packets received, the average arrival
729	   time for that block of packets can be calculated.  By subtracting the
730	   average arrival times of two adjacent devices it is possible to
731	   calculate the mean delay between those nodes.  When computing the
732	   mean delay, the measurement error could be augmented by accumulating
733	   the measurement errors of many packets.  This method is robust to
734	   out-of-order packets and also to packet loss (only a small error is
735	   introduced).  Moreover, it greatly reduces the number of timestamps
736	   (only one per block for each network device) that have to be
737	   collected by the management system.  On the other hand, it only gives
738	   one measure for the duration of the block (e.g., 5 minutes), and it
739	   doesn't give the minimum, maximum and median delay values (RFC 6703
740	   [RFC6703]).  This limitation could be overcome by reducing the
741	   duration of the block (e.g., from 5 minutes to a few seconds), which
742	   implies a highly optimized implementation of the method.
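
   A minimal sketch of the mean delay computation follows.  It assumes
   that both devices timestamp every packet of the block against
   synchronized clocks; the function names and the sample values are
   illustrative only.

```python
def mean_arrival(timestamps):
    """Average arrival time of the packets of one block on one device."""
    if not timestamps:
        raise ValueError("no packets observed in this block")
    return sum(timestamps) / len(timestamps)


def mean_delay(ts_upstream, ts_downstream):
    """Mean delay as the difference of the two average arrival times."""
    return mean_arrival(ts_downstream) - mean_arrival(ts_upstream)
```

   For example, mean_delay([0, 10, 20, 30], [3, 23, 13, 33]) gives 3.0
   even though the second and third packets were swapped in transit,
   because the sum of the timestamps does not depend on their order.  A
   device only needs to keep a running sum and a packet count per
   block, not the individual timestamps.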

744	   By summing the mean delays of the two directions of a path, it is
745	   also possible to measure the two-way mean delay (round-trip delay).

747	3.3.2.  Double marking methodology

749	   The Single marking methodology for one-way delay measurement is
750	   sensitive to out-of-order reception of packets.  The first approach
751	   to overcome this problem is described above and is based on the
752	   concept of mean delay.  But the limitation of mean delay is that it
753	   doesn't give information about the distribution of delay values for
754	   the duration of the block.  Additionally, it may be useful to have not
755	   only the mean delay but also the minimum, maximum and median delay
756	   values and, in wider terms, to know more about the statistical
757	   distribution of delay values.  So in order to have more information
758	   about the delay and to overcome out-of-order issues, a different
759	   approach can be introduced: it is based on double marking
760	   methodology.

762	   Basically, the idea is to use the first marking to create the
763	   alternate flow and, within this colored flow, a second marking to
764	   select the packets for measuring delay/jitter.  The first marking is
765	   needed for packet loss and mean delay measurement.  The second
766	   marking creates a new set of marked packets that are fully identified
767	   over the network, so that a network device can store the timestamps
768	   of these packets; these timestamps can be compared with the
769	   timestamps of the same packets on a second router to compute packet
770	   delay values for each packet.  The number of measurements can be
771	   easily increased by changing the frequency of the second marking.
772	   But the frequency of the second marking must not be too high, in order
773	   to avoid out-of-order issues.  Between packets with the second
774	   marking there should be a security time gap (e.g., this gap could be,
775	   at the minimum, the mean network delay calculated with the previous
776	   methodology) to avoid out-of-order issues and also to have a number
777	   of measurement packets that is rate independent.  If a second marking
778	   packet is lost, the delay measurement for the considered block is
779	   corrupted and should be discarded.

781	   Mean delay is calculated on all the packets of a sample and is a
782	   simple computation to be performed for the single marking method.  In
783	   some cases the mean delay measure is not sufficient to characterize
784	   the sample, and more statistics on the delay data are needed, e.g.
785	   percentiles, variance and median delay values.  The conventional
786	   range (maximum-minimum) should be avoided for several reasons,
787	   including stability of the maximum delay, which is influenced by
788	   outliers.  RFC 5481 [RFC5481] Section 6.5 highlights how the 99.9th
789	   percentile of delay and delay variation is more helpful to
790	   performance planners.  To overcome this drawback the idea is to
791	   couple the mean delay measure for the entire batch with the double
792	   marking method, where a subset of batch packets is selected for
793	   extensive delay calculation by using a second marking.  In this way
794	   it is possible to perform a detailed analysis on these double marked
795	   packets.  Please note that there are classic algorithms for median
796	   and variance calculation, but they are out of the scope of this
797	   document.  The comparison between the mean delay for the entire batch
798	   and the mean delay on these double marked packets gives useful
799	   information, since it makes it possible to understand if the double
800	   marking measurements are actually representative of the delay trends.
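
   As a hedged illustration, the statistics on double marked packets
   and the comparison with the batch mean could be computed as below.
   The sketch relies on the Python standard library statistics module;
   the function names and the tolerance parameter are assumptions, and
   this document does not prescribe any particular algorithm.

```python
import statistics


def delay_stats(delays):
    """Summary statistics over per-packet delays of double marked packets."""
    return {
        "mean": statistics.fmean(delays),
        "median": statistics.median(delays),
        "variance": statistics.pvariance(delays),
    }


def representative(batch_mean, sample_mean, tolerance):
    """True if the double marked sample mean is close to the batch mean."""
    return abs(batch_mean - sample_mean) <= tolerance
```

   The tolerance is a local policy choice; a large gap between the two
   means suggests that the double marked packets do not reflect the
   delay of the whole batch.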

802	3.4.  Delay variation measurement

804	   Similarly to one-way delay measurement (both for single marking and
805	   double marking), the method can also be used to measure the inter-
806	   arrival jitter.  We refer to the definition in RFC 3393 [RFC3393].
807	   The alternation of colors, for single marking method, can be used as
808	   a time reference to measure delay variations.  In case of double
809	   marking, the time reference is given by the second marked packets.

811	   Considering the example depicted in Figure 2, R1 stores a timestamp
812	   TS(A)R1 whenever it sends the first packet of a block and R2 stores a
813	   timestamp TS(B)R2 whenever it receives the first packet of a block.
814	   The inter-arrival jitter can be easily derived from one-way delay
815	   measurement, by evaluating the delay variation of consecutive
816	   samples.

818	   The concept of mean delay can also be applied to delay variation, by
819	   evaluating the average variation of the interval between consecutive
820	   packets of the flow from R1 to R2.
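
   Deriving delay variation from consecutive one-way delay samples can
   be sketched as follows.  The input is assumed to be a list of
   one-way delays, one per selected packet, in measurement order; the
   function name is hypothetical.

```python
def delay_variation(delays):
    """Differences between the one-way delays of consecutive samples."""
    return [later - earlier for earlier, later in zip(delays, delays[1:])]
```

   Applied to the first four delays of Table 2 expressed in
   microseconds (3108, 3025, 2956, 3156), the sketch yields -83, -69
   and 200 microseconds.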

822	4.  Considerations

824	   This section highlights some considerations about the methodology.

826	4.1.  Synchronization

828	   The Alternate Marking technique does not require a strong
829	   synchronization, especially for packet loss and two-way delay
830	   measurement.  Only one-way delay measurement requires network devices
831	   to have synchronized clocks.

833	   The color switching is the reference for all the network devices, and
834	   the only requirement to be achieved is that all network devices have
835	   to recognize the right batch along the path.

837	   If the length of the measurement period is L time units, then all
838	   network devices must be synchronized to the same clock reference with
839	   an accuracy of +/- L/2 time units (without considering network
840	   delay).  This level of accuracy guarantees that all network devices
841	   consistently match the color bit to the correct block.  For example,
842	   if the color is toggled every second (L = 1 second), then clocks
843	   must be synchronized with an accuracy of +/- 0.5 second to a common
844	   time reference.

846	   This synchronization requirement can be satisfied even with a
847	   relatively inaccurate synchronization method.  This is true for
848	   packet loss and two-way delay measurement; for one-way delay
849	   measurement, instead, clock synchronization must be accurate.

851	   Therefore, a system that uses only packet loss and two-way delay
852	   measurement does not require precise synchronization.  This is because
853	   the value of the clocks of network devices does not affect the
854	   computation of the two-way delay measurement.

856	4.2.  Data Correlation

858	   Data Correlation is the mechanism to compare counters and timestamps
859	   for packet loss, delay and delay variation calculation.  It could be
860	   performed in several ways depending on the alternate marking
861	   application and use case.

863	   o  A possibility is to use a centralized solution using Network
864	      Management System (NMS) to correlate data;

866	   o  Another possibility is to define a protocol based distributed
867	      solution, by defining a new protocol or by extending the existing
868	      protocols (e.g.  RFC6374, TWAMP, OWAMP) in order to communicate
869	      the counters and timestamps between nodes.

871	   In the following paragraphs an example data correlation mechanism is
872	   explained and could be used independently of the adopted solutions.

874	   When data is collected on the upstream and downstream node, e.g.,
875	   packet counts for packet loss measurement or timestamps for packet
876	   delay measurement, and periodically reported to or pulled by other
877	   nodes or NMS, a certain data correlation mechanism SHOULD be in use
878	   to help the nodes or NMS to tell whether any two or more packet
879	   counts are related to the same block of markers, or any two
880	   timestamps are related to the same marked packet.

882	   The alternate marking method described in this document literally
883	   splits the packets of the measured flow into different measurement
884	   blocks; in addition, a Block Number (BN) could be assigned to each
885	   such measurement block.  The BN is generated each time a node reads the
886	   data (packet counts or timestamps), and is associated with each
887	   packet count and timestamp reported to or pulled by other nodes or
888	   NMS.  The value of BN could be calculated as the modulo of the local
889	   time (when the data are read) and the interval of the marking time
890	   period.

892	   When the nodes or NMS see, for example, the same BNs associated with
893	   two packet counts from an upstream and a downstream node respectively,
894	   they consider that these two packet counts correspond to the same
895	   block, i.e. that these two packet counts belong to the same block of
896	   markers from the upstream and downstream node.  The assumption of
897	   this BN mechanism is that the measurement nodes are time
898	   synchronized.  This requires the measurement nodes to have a certain
899	   time synchronization capability (e.g., the Network Time Protocol
900	   (NTP) RFC 5905 [RFC5905], or the IEEE 1588 Precision Time Protocol
901	   (PTP) [IEEE-1588]).  Synchronization aspects are further discussed in
902	   Section 4.1.
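
   One possible reading of the BN mechanism is sketched below.  The
   text leaves the exact derivation of the BN open; the sketch assumes
   that the BN is the index of the marking period in which the data are
   read (the integer quotient of the local time by the period), reduced
   modulo a hypothetical wrap value so that it fits a small field.  The
   function names and the wrap parameter are not defined by this
   document.

```python
def block_number(read_time: float, period: float, wrap: int = 256) -> int:
    """BN of the marking period in which the data were read (assumed rule)."""
    return int(read_time // period) % wrap


def correlate(upstream, downstream):
    """Pair the counts that carry the same BN; inputs map BN -> packet count."""
    shared = upstream.keys() & downstream.keys()
    return {bn: (upstream[bn], downstream[bn]) for bn in shared}
```

   Two nodes that read their counters at local times 450 and 455 with
   a 300-second period both obtain BN 1, so their counts are compared
   with each other as long as the clock error stays within the bounds
   discussed in this document.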

904	4.3.  Packet Re-ordering

906	   Due to ECMP, packet re-ordering is very common in IP networks.  The
907	   accuracy of marking based PM, especially packet loss measurement, may
908	   be affected by packet re-ordering.  Take a look at the following
909	   example:

911	   Block   :    1    |    2    |    3    |    4    |    5    |...
912	   --------|---------|---------|---------|---------|---------|---
913	   Node R1 : AAAAAAA | BBBBBBB | AAAAAAA | BBBBBBB | AAAAAAA |...
914	   Node R2 : AAAAABB | AABBBBA | AAABAAA | BBBBBBA | ABAAABA |...

916	                        Figure 5: Packet Reordering

918	   In Figure 5 the packet stream for Node R1 isn't being reordered, and
919	   can be safely assigned to interval blocks, but the packet stream for
920	   Node R2 is being reordered, so, looking at the packet with the marker
921	   of "B" in block 3, there is no safe way to tell whether the packet
922	   belongs to block 2 or block 4.

924	   In general there is the need to assign packets with the marker of "B"
925	   or "A" to the right interval blocks.  Most packet re-ordering
926	   occurs at the edge of adjacent blocks, and it is easy to handle if
927	   the interval of each block is sufficiently large.  Then, it can be
928	   assumed that packets with a different marker belong to the block that
929	   they are closest to.  If the interval is small, it is difficult and
930	   sometimes impossible to determine to which block a packet belongs.
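
   The "closest block" rule can be sketched as follows.  The sketch
   assumes fixed-timer blocks numbered from time zero, with
   even-numbered blocks A-colored, and arrival times taken from the
   local clock of the measurement point; the function name is
   hypothetical.

```python
def assign_block(arrival: float, color: str, block_len: float) -> int:
    """Assign a packet to the nearest block whose color matches its marker."""
    current = int(arrival // block_len)
    current_color = "A" if current % 2 == 0 else "B"
    if current_color == color:
        return current
    # The marker differs from the color of the current block, so the packet
    # belongs to an adjacent block: pick the one whose boundary is closer.
    offset = arrival - current * block_len
    return current - 1 if offset < block_len / 2 else current + 1
```

   With 10-second blocks, an A-marked packet arriving at t = 12 is
   attributed to block 0, while an A-marked packet arriving at t = 19
   is attributed to block 2.  The shorter the block, the more often a
   packet arrives near the middle of a block of the other color, where
   this rule becomes unreliable.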

932	   Choosing a proper interval is important, but how to choose a proper
933	   interval is out of the scope of this document.  However, an implementation
934	   SHOULD provide a way to configure the interval and allow a certain
935	   degree of packet re-ordering.

937	5.  Applications, implementation and deployment

939	   The methodology described in the previous sections can be applied in
940	   various situations.  Basically, the Alternate Marking technique could
941	   be used in many cases for performance measurement.  The only
942	   requirement is to select and mark the flow to be monitored; in this way
943	   packets are batched by the sender and each batch is alternately marked
944	   so that it can be easily recognized by the receiver.

946	   Some recent alternate marking method applications are listed below:

948	   o  IP flow performance measurement (IPFPM): this application of
949	      marking method is described in
950	      [I-D.chen-ippm-coloring-based-ipfpm-framework].  As an example, in
951	      this document, the last reserved bit of the Flag field of the IPv4
952	      header is proposed to be used for marking, while a solution for
953	      IPv6 could be to leverage the IPv6 extension header for marking.

955	   o  OAM Passive Performance Measurement: In
956	      [I-D.ietf-bier-mpls-encapsulation] two OAM bits from Bit Index
957	      Explicit Replication (BIER) Header are reserved for the passive
958	      performance measurement marking method.  [I-D.ietf-bier-pmmm-oam]
959	      details the measurement for multicast service over BIER domain.
960	      In addition, the alternate marking method could also be used in a
961	      Service Function Chaining (SFC) domain.  Lastly the application of
962	      the marking method to Network Virtualization Overlays (NVO3)
963	      protocols is considered by [I-D.ietf-nvo3-encap].

965	   o  RFC6374 Use Case: RFC6374 [RFC6374] uses the LM packet as the
966	      packet accounting demarcation point.  Unfortunately this gives
967	      rise to a number of problems that may lead to significant packet
968	      accounting errors in certain situations.
969	      [I-D.ietf-mpls-flow-ident] discusses the desired capabilities for
970	      MPLS flow identification in order to perform a better in-band
971	      performance monitoring of user data packets.  A method of
972	      accomplishing identification is Synonymous Flow Labels (SFL)
973	      introduced in [I-D.bryant-mpls-sfl-framework], while
974	      [I-D.ietf-mpls-rfc6374-sfl] describes RFC6374 performance
975	      measurements with SFL.

977	   o  Active performance measurement:
978	      [I-D.fioccola-ippm-alt-mark-active] describes how to extend the
979	      existing Active Measurement Protocol, in order to implement
980	      alternate marking methodology.
981	      [I-D.fioccola-ippm-rfc6812-alt-mark-ext] describes an extension to
982	      the Cisco SLA Protocol Measurement-Type UDP-Measurement.

984	   An example of implementation and deployment is explained in the next
985	   section, just to clarify how the method can work.

987	5.1.  Report on the operational experiment

989	   The method described in this document, also called PNPM (Packet
990	   Network Performance Monitoring), has been invented and engineered at
991	   Telecom Italia.

993	   It is important to highlight that the general description of the
994	   methodology in this document is a consequence of the operational
995	   experiment.  The foundational elements of the technique have been
996	   tested and the lessons learnt from the operational experiment
997	   inspired the formalization of the Alternate Marking Method as
998	   detailed in the previous sections.

1000	   The methodology has been tested in Telecom Italia's network and is
1001	   applied to multicast IPTV channels or other specific traffic flows
1002	   with high QoS requirements (i.e.  Mobile Backhauling traffic realized
1003	   with a VPN MPLS).

1005	   This technology has been employed by leveraging functions and tools
1006	   available on IP routers and it's currently being used to monitor
1007	   packet loss in some portions of Telecom Italia's network.  The
1008	   application of the method to delay measurement has also been
1009	   evaluated in Telecom Italia's labs.

1011	   This Section describes how the experiment has been executed, in
1012	   particular how the features currently available on existing routing
1013	   platforms can be used to apply the method, in order to give an
1014	   example of implementation and deployment.

1016	   The operational test, here described, uses the flow-based strategy,
1017	   as defined in Section 3.  Instead, the link-based strategy could be
1018	   applied to a physical link or a logical link (e.g.  Ethernet VLAN or
1019	   an MPLS PW).

1021	   The implementation of the method leverages the available router
1022	   functions, since the experiment has been done by a Service Provider
1023	   (as Telecom Italia is) on its own network.  So, with current router
1024	   implementations, only QoS related fields and features offer the
1025	   required flexibility to set bits in the packet header.  In case a
1026	   Service Provider only uses the three most significant bits of the
1027	   DSCP field (corresponding to IP Precedence) for QoS classification
1028	   and queuing, it is possible to use the two least significant bits of
1029	   the DSCP field (bit 0 and bit 1) to implement the method without
1030	   affecting QoS policies.  That is the approach used for the
1031	   experiment.  One of the two bits (bit 0) could be used to identify
1032	   flows subject to traffic monitoring (set to 1 if the flow is under
1033	   monitoring, otherwise it is set to 0), while the second (bit 1) can
1034	   be used for coloring the traffic (switching between values 0 and 1,
1035	   corresponding to color A and B) and creating the blocks.
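
   The bit assignment described above can be sketched as follows.  The
   sketch treats the DSCP as a 6-bit integer and assumes that "bit 0"
   is its least significant bit and "bit 1" the next one; the constant
   and function names are hypothetical and do not correspond to any
   router configuration syntax.

```python
MONITOR_BIT = 0b000001  # bit 0: set to 1 if the flow is under monitoring
COLOR_BIT = 0b000010    # bit 1: 0 for color A, 1 for color B


def color_dscp(dscp: int, color: str) -> int:
    """Flag a packet as monitored and apply the color of the current block."""
    dscp |= MONITOR_BIT
    return dscp | COLOR_BIT if color == "B" else dscp & ~COLOR_BIT


def classify(dscp: int):
    """Return None for unmonitored traffic, otherwise 'A' or 'B'."""
    if not dscp & MONITOR_BIT:
        return None
    return "B" if dscp & COLOR_BIT else "A"
```

   The three most significant bits, which carry the QoS
   classification, are left untouched by both functions.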

1037	   The experiment considers a flow as all the packets sharing the same
1038	   source IP address or the same destination IP address, depending on
1039	   the direction.  In practice, once the flow has been defined, coloring
1040	   the traffic using the DSCP field can be implemented by configuring on
1041	   the router output interface an access list that intercepts the
1042	   flow(s) to be monitored and applies to them a policy that sets the
1043	   DSCP field accordingly.  Since traffic coloring has to be switched
1044	   between the two values over time, the policy needs to be modified
1045	   periodically: an automatic script is used to perform this task on the
1046	   basis of a fixed timer.  The automatic script is loaded on board of
1047	   the router and automates the basic operations that are needed to
1048	   realize the methodology.

1050	   After the traffic is colored using the DSCP field, all the routers on
1051	   the path can perform the counting.  For this purpose an access-list
1052	   that matches specific DSCP values can be used to count the packets of
1053	   the flow(s) being monitored.  The same access-list can be installed
1054	   on all the routers of the path.  In addition, network flow
1055	   monitoring, such as provided by IPFIX (RFC 7011 [RFC7011]), can be
1056	   used to recognize timestamps of first/last packet of a batch in order
1057	   to enable one of the alternatives to measure the delay as detailed in
1058	   Section 3.3.

1060	   In Telecom Italia's experiment the timer is set to 5 minutes, so
1061	   the sequence of actions of the script is also executed every 5
1062	   minutes.  This value has proven to be a good compromise between
1063	   measurement frequency and stability of the measurement (i.e.
1064	   possibility to collect all the measures referring to the same block).

1066	   For this experiment, both counters and any other data are collected
1067	   by using the automatic script that sends these out to a Network
1068	   Management System (NMS).  The NMS is responsible for packet loss
1069	   calculation, performed by comparing the values of counters from the
1070	   routers along the flow(s) path.  A 5-minute timer for color switching
1071	   is a safe choice for reading the counters and is also coherent with
1072	   the reporting window of the NMS.

1074	   Note that the use of the DSCP field for marking implies that the
1075	   method in this case works reliably only within a single management
1076	   and operation domain.

1078	   Lastly, the Telecom Italia experiment scales up to 1000 flows
1079	   monitored together on a single router.  An implementation on
1080	   dedicated hardware scales further, but so far it has been tested
1081	   only in labs.

1083	5.1.1.  Metric transparency

1085	   Since a Service Provider application is described here, the method
1086	   can be applied to end-to-end services supplied to Customers.  So it
1087	   is important to highlight that the method MUST be transparent outside
1088	   the Service Provider domain.

1090	   In Telecom Italia's implementation the source node colors the packets
1091	   with a policy that is modified periodically via an automatic script
1092	   in order to alternate the DSCP field of the packets.  The nodes
1093	   between source and destination (inclusive) have to count, with an
1094	   access-list, the colored packets that they receive and forward.

1096	   Moreover the destination node has an important role: the colored
1097	   packets are intercepted and a policy restores and sets the DSCP field
1098	   of all the packets to the initial value.  In this way the metric is
1099	   transparent because outside the section of the network under
1100	   monitoring the traffic flow is unchanged.

1102	   In such a case, thanks to this restoring technique, network elements
1103	   outside the Alternate Marking monitoring domain (e.g., the two
1104	   Provider Edge nodes of the Mobile Backhauling VPN MPLS) are totally
1105	   unaware that packets were marked.  So this restoring technique makes
1106	   Alternate Marking completely transparent outside its monitoring
1107	   domain.
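
   The mark-and-restore behavior described in this section can be
   sketched as below.  The sketch assumes that the monitored flow
   enters the domain with a single, known DSCP value that the
   destination node can restore from configuration; that value, the
   two color code points, and all names are hypothetical.

```python
from dataclasses import dataclass, replace

ORIGINAL_DSCP = 0      # assumed DSCP of the flow before marking
COLOR_DSCPS = (1, 2)   # hypothetical code points used as colors


@dataclass(frozen=True)
class Packet:
    flow_id: str
    dscp: int


def ingress_mark(pkt: Packet, color_dscp: int) -> Packet:
    """Source node: overwrite the DSCP field with the current color."""
    return replace(pkt, dscp=color_dscp)


def egress_restore(pkt: Packet) -> Packet:
    """Destination node: set colored packets back to the initial DSCP.

    Packets that do not carry a color are forwarded unchanged.
    """
    if pkt.dscp in COLOR_DSCPS:
        return replace(pkt, dscp=ORIGINAL_DSCP)
    return pkt
```

   A packet that is marked at the source and restored at the
   destination leaves the domain identical to how it entered, which is
   the transparency property this section requires.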

1109	6.  Hybrid measurement

1111	   The method has been explicitly designed for passive measurements but
1112	   it can also be used with active measurements.  In order to have both
1113	   end-to-end measurements and intermediate measurements (hybrid
1114	   measurements), two end points can exchange artificial traffic flows
1115	   and apply alternate marking over these flows.  In the intermediate
1116	   points artificial traffic is managed in the same way as real traffic
1117	   and measured as specified before.  So applying the marking method
1118	   can also simplify the active measurement, as explained in
1119	   [I-D.fioccola-ippm-alt-mark-active].

1121	7.  Compliance with RFC6390 guidelines

1123	   RFC6390 [RFC6390] defines a framework and a process for developing
1124	   Performance Metrics for protocols above and below the IP layer (such
1125	   as IP-based applications that operate over reliable or datagram
1126	   transport protocols).

1128	   This document doesn't aim to propose a new Performance Metric but a
1129	   new method of measurement for a few Performance Metrics that have
1130	   already been standardized.  Nevertheless, it's worth applying
1131	   [RFC6390] guidelines to the present document, in order to provide a
1132	   more complete and coherent description of the proposed method.  We
1133	   used a subset of the Performance Metric Definition template defined
1134	   by [RFC6390].

1136	   o  Metric name and description: as already stated, this document
1137	      doesn't propose any new Performance Metric.  On the contrary, it
1138	      describes a novel method for measuring packet loss [RFC7680].  The
1139	      same concept, with small differences, can also be used to measure
1140	      delay [RFC7679], and jitter [RFC3393].  The document mainly
1141	      describes the applicability to packet loss measurement.

1143	   o  Method of Measurement or Calculation: according to the method
1144	      described in the previous sections, the number of packets lost is
1145	      calculated by subtracting the value of the counter on the
1146	      destination node from the value of the counter on the source node.
1147	      Both counters must refer to the same color.  The calculation is
1148	      performed when the value of the counters is in a steady state.
1149	      The steady state is an intrinsic characteristic of the marking
1150	      method counters because the alternation of color leaves the
1151	      counter associated with each color still, one at a time, for the
1152	      duration of a marking period.

1154	   o  Units of Measurement: the method calculates and reports the exact
1155	      number of packets sent by the source node and not received by the
1156	      destination node.

1158	   o  Measurement Points: the measurement can be performed between
1159	      adjacent nodes, on a per-link basis, or along a multi-hop path,
1160	      provided that the traffic under measurement follows that path.  In
1161	      case of a multi-hop path, the measurements can be performed both
1162	      end-to-end and hop-by-hop.

1164	   o  Measurement Timing: the method has a constraint on the frequency
1165	      of measurements.  This is detailed in Section 3.2, where it is
1166	      specified that the marking period and the guardband interval are
1167	      strictly related to each other to avoid out-of-order issues.  That
1168	      is because, in order to perform a measure, the counter must be in
1169	      a steady state and this happens when the traffic is being colored
1170	      with the alternate color.  As an example, in the experiment of the
1171	      method the time interval is set to 5 minutes, while other
1172	      optimized implementations can also use a marking period of a few
1173	      seconds.

1175	   o  Implementation: the experiment of the method uses two encodings of
1176	      the DSCP field to color the packets; this enables the use of
1177	      policy configurations on the router to color the packets and
1178	      accordingly configure the counter for each color.  The path
1179	      followed by traffic being measured should be known in advance in
1180	      order to configure the counters along the path and be able to
1181	      compare the correct values.

1183	   o  Verification: both in the lab and in the operational network the
1184	      methodology has been tested for packet loss and delay
1185	      measurements by using traffic generators together with
1186	      precision test instruments and network emulators.

1188	   o  Use and Applications: the method can be used to measure packet
1189	      loss with high precision on live traffic; moreover, by combining
1190	      end-to-end and per-link measurements, the method is useful to
1191	      pinpoint the single link that is experiencing loss events.

1193	   o  Reporting Model: the value of the counters has to be sent to a
1194	      centralized management system that performs the calculations; such
1195	      samples must contain a reference to the time interval they refer
1196	      to, so that the management system can perform the correct
1197	      correlation; the samples have to be sent while the corresponding
1198	      counter is in a steady state (within a time interval), otherwise
1199	      the value of the sample should be stored locally.

1201	   o  Dependencies: the values of the counters have to be correlated to
1202	      the time interval they refer to; moreover, since the experiment
1203	      of the method is based on DSCP values, there are significant
1204	      dependencies on the usage of the DSCP field: it must be possible
1205	      to rely on unused DSCP values without affecting QoS-related
1206	      configuration and behavior; moreover, the intermediate nodes must
1207	      not change the value of the DSCP field, so as not to alter the
1208	      measurement.

1210	   o  Organization of Results: the method of measurement produces
1211	      singletons.

1213	   o  Parameters: currently, the main parameter of the method is the
1214	      time interval used to alternate the colors and read the counters.
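
   The Measurement Timing and Reporting Model items above both depend
   on knowing when a counter is in a steady state.  A minimal sketch of
   that check follows.  It assumes that a guard band is applied after
   each color switch and again before the next one, and that it is
   shorter than half the marking period; the exact relation is the one
   given in Section 3.2, and the function name and parameters are
   hypothetical.

```python
from typing import Optional


def readable_block(now_s: float, period_s: float,
                   guard_s: float) -> Optional[int]:
    """Index of the block whose counter may be read at now_s, or None.

    The counter of the previous block is considered steady once the
    guard band has elapsed after the last color switch, and until the
    guard band before the next switch begins.
    """
    if not 0 <= guard_s < period_s / 2:
        raise ValueError("guard band must be shorter than half the period")
    current = int(now_s // period_s)
    if current == 0:
        return None  # no block has completed yet
    elapsed = now_s - current * period_s
    if guard_s <= elapsed <= period_s - guard_s:
        return current - 1
    return None
```

   A node would attach the returned block index to the sample it sends,
   so that the management system can correlate counters that refer to
   the same time interval; when the function returns None, the sample
   would be kept locally.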

1216	8.  Security Considerations

1218	   This document specifies a method to perform measurements in the
1219	   context of a Service Provider's network and has not been developed to
1220	   conduct Internet measurements, so it does not directly affect
1221	   Internet security nor applications which run on the Internet.
1222	   However, implementation of this method must be mindful of security
1223	   and privacy concerns.

1225	   There are two types of security concerns: potential harm caused by
1226	   the measurements and potential harm to the measurements.

1228	   o  Harm caused by the measurement: the measurements described in this
1229	      document are passive, so there are no new packets injected into
1230	      the network causing potential harm to the network itself and to
1231	      data traffic.  Nevertheless, the method implies modifications on
1232	      the fly to a header or encapsulation of the data packets: this
1233	      must be performed in a way that doesn't alter the quality of
1234	      service experienced by packets subject to measurements and that
1235	      preserves stability and performance of routers doing the
1236	      measurements.  One of the main security threats in OAM protocols
1237	      is network reconnaissance; an attacker can gather information
1238	      about the network performance by passively eavesdropping on OAM
1239	      messages.  The advantage of the methods described in this document
1240	      is that the marking bits are the only information that is
1241	      exchanged between the network devices.  Therefore, passive
1242	      eavesdropping on data plane traffic does not allow attackers to
1243	      gain information about the network performance.

1245	   o  Harm to the measurement: the measurements could be harmed by
1246	      routers altering the marking of the packets, or by an attacker
1247	      injecting artificial traffic.  Authentication techniques, such as
1248	      digital signatures, may be used where appropriate to guard against
1249	      injected traffic attacks.  Since the measurement itself may be
1250	      affected by routers (or other network devices) along the path of
1251	      IP packets intentionally altering the value of marking bits of
1252	      packets, as mentioned above, the mechanism specified in this
1253	      document can be applied only in the context of a controlled
1254	      domain; there, the routers (or other network devices) are
1255	      locally administered and this type of attack can be avoided.  In
1256	      addition, an attacker can't gain information about network
1257	      performance from a single monitoring point, and must use
1258	      synchronized monitoring points at multiple points on the path,
1259	      because they have to do the same kind of measurement and
1260	      aggregation that Service Providers using Alternate Marking must
1261	      do.

1263	   The privacy concerns of network measurement are limited because the
1264	   method only relies on information contained in the header or
1265	   encapsulation without any release of user data.  Although information
1266	   in the header or encapsulation is metadata that can be used to
1267	   compromise the privacy of users, the limited marking technique in
1268	   this document seems unlikely to substantially increase the existing
1269	   privacy risks from header or encapsulation metadata.  It might be
1270	   theoretically possible to modulate the marking to serve as a covert
1271	   channel, but it would have a very low data rate if it is to avoid
1272	   adversely affecting the measurement systems that monitor the marking.

1274	   Delay attacks are another potential threat in the context of this
1275	   document.  Delay measurement is performed using a specific packet in
1276	   each block, marked by a dedicated color bit.  Therefore, a man-in-
1277	   the-middle attacker can selectively induce synthetic delay only to
1278	   delay-colored packets, causing systematic error in the delay
1279	   measurements.  As discussed in previous sections, the methods
1280	   described in this document rely on an underlying time synchronization
1281	   protocol.  Thus, by attacking the time protocol an attacker can
1282	   potentially compromise the integrity of the measurement.  A detailed
1283	   discussion about the threats against time protocols and how to
1284	   mitigate them is presented in RFC 7384 [RFC7384].

1286	9.  IANA Considerations

1288	   There are no IANA actions required.

1290	10.  Acknowledgements

1292	   The previous IETF drafts about this technique were:
1293	   [I-D.cociglio-mboned-multicast-pm] and [I-D.tempia-opsawg-p3m].

1295	   The authors would like to thank Alberto Tempia Bonda, Domenico
1296	   Laforgia, Daniele Accetta and Mario Bianchetti for their contribution
1297	   to the definition and the implementation of the method.

1299	   The authors would also like to thank Spencer Dawkins, Carlos
1300	   Pignataro, Brian Haberman and Eric Vyncke for their assistance and
1301	   their detailed and valuable reviews.

1303	11.  References

1305	11.1.  Normative References

1307	   [IEEE-1588]
1308	              IEEE 1588-2008, "IEEE Standard for a Precision Clock
1309	              Synchronization Protocol for Networked Measurement and
1310	              Control Systems", July 2008.

1312	   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
1313	              Requirement Levels", BCP 14, RFC 2119,
1314	              DOI 10.17487/RFC2119, March 1997,
1315	              <https://www.rfc-editor.org/info/rfc2119>.

1317	   [RFC3393]  Demichelis, C. and P. Chimento, "IP Packet Delay Variation
1318	              Metric for IP Performance Metrics (IPPM)", RFC 3393,
1319	              DOI 10.17487/RFC3393, November 2002,
1320	              <https://www.rfc-editor.org/info/rfc3393>.

1322	   [RFC5905]  Mills, D., Martin, J., Ed., Burbank, J., and W. Kasch,
1323	              "Network Time Protocol Version 4: Protocol and Algorithms
1324	              Specification", RFC 5905, DOI 10.17487/RFC5905, June 2010,
1325	              <https://www.rfc-editor.org/info/rfc5905>.

1327	   [RFC7679]  Almes, G., Kalidindi, S., Zekauskas, M., and A. Morton,
1328	              Ed., "A One-Way Delay Metric for IP Performance Metrics
1329	              (IPPM)", STD 81, RFC 7679, DOI 10.17487/RFC7679, January
1330	              2016, <https://www.rfc-editor.org/info/rfc7679>.

1332	   [RFC7680]  Almes, G., Kalidindi, S., Zekauskas, M., and A. Morton,
1333	              Ed., "A One-Way Loss Metric for IP Performance Metrics
1334	              (IPPM)", STD 82, RFC 7680, DOI 10.17487/RFC7680, January
1335	              2016, <https://www.rfc-editor.org/info/rfc7680>.

1337	   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
1338	              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
1339	              May 2017, <https://www.rfc-editor.org/info/rfc8174>.

1341	11.2.  Informative References

1343	   [I-D.bryant-mpls-sfl-framework]
1344	              Bryant, S., Chen, M., Li, Z., Swallow, G., Sivabalan, S.,
1345	              and G. Mirsky, "Synonymous Flow Label Framework", draft-
1346	              bryant-mpls-sfl-framework-05 (work in progress), June
1347	              2017.

1349	   [I-D.chen-ippm-coloring-based-ipfpm-framework]
1350	              Chen, M., Zheng, L., Mirsky, G., Fioccola, G., and T.
1351	              Mizrahi, "IP Flow Performance Measurement Framework",
1352	              draft-chen-ippm-coloring-based-ipfpm-framework-06 (work in
1353	              progress), March 2016.

1355	   [I-D.chen-ippm-ipfpm-report]
1356	              Chen, M., Zheng, L., and G. Mirsky, "IP Flow Performance
1357	              Measurement Report", draft-chen-ippm-ipfpm-report-01 (work
1358	              in progress), April 2016.

1360	   [I-D.cociglio-mboned-multicast-pm]
1361	              Cociglio, M., Capello, A., Bonda, A., and L. Castaldelli,
1362	              "A method for IP multicast performance monitoring", draft-
1363	              cociglio-mboned-multicast-pm-01 (work in progress),
1364	              October 2010.

1366	   [I-D.fioccola-ippm-alt-mark-active]
1367	              Fioccola, G., Clemm, A., Bryant, S., Cociglio, M.,
1368	              Chandramouli, M., and A. Capello, "Alternate Marking
1369	              Extension to Active Measurement Protocol", draft-fioccola-
1370	              ippm-alt-mark-active-01 (work in progress), March 2017.

1372	   [I-D.fioccola-ippm-multipoint-alt-mark]
1373	              Fioccola, G., Cociglio, M., Sapio, A., and R. Sisto,
1374	              "Multipoint Alternate Marking method for passive and
1375	              hybrid performance monitoring", draft-fioccola-ippm-
1376	              multipoint-alt-mark-01 (work in progress), October 2017.

1378	   [I-D.fioccola-ippm-rfc6812-alt-mark-ext]
1379	              Fioccola, G., Clemm, A., Cociglio, M., Chandramouli, M.,
1380	              and A. Capello, "Alternate Marking Extension to Cisco SLA
1381	              Protocol RFC6812", draft-fioccola-ippm-rfc6812-alt-mark-
1382	              ext-01 (work in progress), March 2016.

1384	   [I-D.ietf-bier-mpls-encapsulation]
1385	              Wijnands, I., Rosen, E., Dolganow, A., Tantsura, J.,
1386	              Aldrin, S., and I. Meilik, "Encapsulation for Bit Index
1387	              Explicit Replication in MPLS and non-MPLS Networks",
1388	              draft-ietf-bier-mpls-encapsulation-12 (work in progress),
1389	              October 2017.

1391	   [I-D.ietf-bier-pmmm-oam]
1392	              Mirsky, G., Zheng, L., Chen, M., and G. Fioccola,
1393	              "Performance Measurement (PM) with Marking Method in Bit
1394	              Index Explicit Replication (BIER) Layer", draft-ietf-bier-
1395	              pmmm-oam-03 (work in progress), October 2017.

1397	   [I-D.ietf-mpls-flow-ident]
1398	              Bryant, S., Pignataro, C., Chen, M., Li, Z., and G.
1399	              Mirsky, "MPLS Flow Identification Considerations", draft-
1400	              ietf-mpls-flow-ident-05 (work in progress), July 2017.

1402	   [I-D.ietf-mpls-rfc6374-sfl]
1403	              Bryant, S., Chen, M., Li, Z., Swallow, G., Sivabalan, S.,
1404	              Mirsky, G., and G. Fioccola, "RFC6374 Synonymous Flow
1405	              Labels", draft-ietf-mpls-rfc6374-sfl-01 (work in
1406	              progress), December 2017.

1408	   [I-D.ietf-nvo3-encap]
1409	              Boutros, S., Ganga, I., Garg, P., Manur, R., Mizrahi, T.,
1410	              Mozes, D., Nordmark, E., Smith, M., Aldrin, S., and I.
1411	              Bagdonas, "NVO3 Encapsulation Considerations", draft-ietf-
1412	              nvo3-encap-01 (work in progress), October 2017.

1414	   [I-D.tempia-opsawg-p3m]
1415	              Capello, A., Cociglio, M., Castaldelli, L., and A. Bonda,
1416	              "A packet based method for passive performance
1417	              monitoring", draft-tempia-opsawg-p3m-04 (work in
1418	              progress), February 2014.

1420	   [RFC5481]  Morton, A. and B. Claise, "Packet Delay Variation
1421	              Applicability Statement", RFC 5481, DOI 10.17487/RFC5481,
1422	              March 2009, <https://www.rfc-editor.org/info/rfc5481>.

1424	   [RFC6374]  Frost, D. and S. Bryant, "Packet Loss and Delay
1425	              Measurement for MPLS Networks", RFC 6374,
1426	              DOI 10.17487/RFC6374, September 2011,
1427	              <https://www.rfc-editor.org/info/rfc6374>.

1429	   [RFC6390]  Clark, A. and B. Claise, "Guidelines for Considering New
1430	              Performance Metric Development", BCP 170, RFC 6390,
1431	              DOI 10.17487/RFC6390, October 2011,
1432	              <https://www.rfc-editor.org/info/rfc6390>.

1434	   [RFC6703]  Morton, A., Ramachandran, G., and G. Maguluri, "Reporting
1435	              IP Network Performance Metrics: Different Points of View",
1436	              RFC 6703, DOI 10.17487/RFC6703, August 2012,
1437	              <https://www.rfc-editor.org/info/rfc6703>.

1439	   [RFC7011]  Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
1440	              "Specification of the IP Flow Information Export (IPFIX)
1441	              Protocol for the Exchange of Flow Information", STD 77,
1442	              RFC 7011, DOI 10.17487/RFC7011, September 2013,
1443	              <https://www.rfc-editor.org/info/rfc7011>.

1445	   [RFC7276]  Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
1446	              Weingarten, "An Overview of Operations, Administration,
1447	              and Maintenance (OAM) Tools", RFC 7276,
1448	              DOI 10.17487/RFC7276, June 2014,
1449	              <https://www.rfc-editor.org/info/rfc7276>.

1451	   [RFC7384]  Mizrahi, T., "Security Requirements of Time Protocols in
1452	              Packet Switched Networks", RFC 7384, DOI 10.17487/RFC7384,
1453	              October 2014, <https://www.rfc-editor.org/info/rfc7384>.

1455	   [RFC7799]  Morton, A., "Active and Passive Metrics and Methods (with
1456	              Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
1457	              May 2016, <https://www.rfc-editor.org/info/rfc7799>.

1459	Authors' Addresses

1461	   Giuseppe Fioccola (editor)
1462	   Telecom Italia
1463	   Via Reiss Romoli, 274
1464	   Torino  10148
1465	   Italy

1467	   Email: giuseppe.fioccola@telecomitalia.it

1468	   Alessandro Capello
1469	   Telecom Italia
1470	   Via Reiss Romoli, 274
1471	   Torino  10148
1472	   Italy

1474	   Email: alessandro.capello@telecomitalia.it

1476	   Mauro Cociglio
1477	   Telecom Italia
1478	   Via Reiss Romoli, 274
1479	   Torino  10148
1480	   Italy

1482	   Email: mauro.cociglio@telecomitalia.it

1484	   Luca Castaldelli
1485	   Telecom Italia
1486	   Via Reiss Romoli, 274
1487	   Torino  10148
1488	   Italy

1490	   Email: luca.castaldelli@telecomitalia.it

1492	   Mach(Guoyi) Chen
1493	   Huawei Technologies

1495	   Email: mach.chen@huawei.com

1497	   Lianshu Zheng
1498	   Huawei Technologies

1500	   Email: vero.zheng@huawei.com

1502	   Greg Mirsky
1503	   ZTE
1504	   USA

1506	   Email: gregimirsky@gmail.com

1507	   Tal Mizrahi
1508	   Marvell
1509	   6 Hamada st.
1510	   Yokneam
1511	   Israel

1513	   Email: talmi@marvell.com