idnits 2.17.1 

draft-tempia-opsawg-p3m-04.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust and authors Copyright Line does not
     match the current year

  -- The document date (February 13, 2014) is 3722 days in the past.  Is this
     intentional?


  Checking references for intended status: Experimental
  ----------------------------------------------------------------------------

  == Unused Reference: 'RFC6390' is defined on line 848, but no explicit
     reference was found in the text

  ** Obsolete normative reference: RFC 2679 (Obsoleted by RFC 7679)

  ** Obsolete normative reference: RFC 2680 (Obsoleted by RFC 7680)

  == Outdated reference: A later version (-16) exists of
     draft-ietf-opsawg-oam-overview-13


     Summary: 2 errors (**), 0 flaws (~~), 3 warnings (==), 1 comment (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------


2	Network Working Group                                         A. Capello
3	Internet-Draft                                               M. Cociglio
4	Intended status: Experimental                             L. Castaldelli
5	Expires: August 17, 2014                                  Telecom Italia
6	                                                         A. Tempia Bonda

8	                                                       February 13, 2014

10	        A packet based method for passive performance monitoring
11	                     draft-tempia-opsawg-p3m-04.txt

13	Abstract

15	   This document describes a passive method to perform packet loss,
16	   delay and jitter measurements on live traffic.  Implementation and
17	   deployment details are also explained in order to clarify how the
18	   tools and features currently available on existing routing platforms
19	   can be used to implement the method.  This method has been invented
20	   and engineered in Telecom Italia and it's currently being used in
21	   Telecom Italia's network.

23	Status of This Memo

25	   This Internet-Draft is submitted in full conformance with the
26	   provisions of BCP 78 and BCP 79.

28	   Internet-Drafts are working documents of the Internet Engineering
29	   Task Force (IETF).  Note that other groups may also distribute
30	   working documents as Internet-Drafts.  The list of current Internet-
31	   Drafts is at http://datatracker.ietf.org/drafts/current/.

33	   Internet-Drafts are draft documents valid for a maximum of six months
34	   and may be updated, replaced, or obsoleted by other documents at any
35	   time.  It is inappropriate to use Internet-Drafts as reference
36	   material or to cite them other than as "work in progress."

38	   This Internet-Draft will expire on August 17, 2014.

40	Copyright Notice

42	   Copyright (c) 2014 IETF Trust and the persons identified as the
43	   document authors.  All rights reserved.

45	   This document is subject to BCP 78 and the IETF Trust's Legal
46	   Provisions Relating to IETF Documents
47	   (http://trustee.ietf.org/license-info) in effect on the date of
48	   publication of this document.  Please review these documents
49	   carefully, as they describe your rights and restrictions with respect
50	   to this document.

52	Table of Contents

54	   1.  Introduction
55	   2.  Overview of the method
56	   3.  Detailed description of the method
57	     3.1.  Packet loss measurement
58	     3.2.  One-way delay measurement
59	       3.2.1.  Average delay
60	     3.3.  Delay variation measurement
61	   4.  Implementation and deployment
62	     4.1.  Coloring the packets
63	     4.2.  Counting the packets
64	     4.3.  Collecting data and calculating packet loss
65	   5.  Compliance with RFC6390 guidelines
66	   6.  Security Considerations
67	   7.  Conclusions
68	   8.  IANA Considerations
69	   9.  Acknowledgements
70	   10. References
71	     10.1.  Normative References
72	     10.2.  Informative References
73	   Authors' Addresses

75	1.  Introduction

77	   Nowadays, most of the traffic in Service Providers' networks carries
78	   multimedia content.  Video content is highly sensitive to packet
79	   loss [RFC2680], while interactive content is sensitive to delay
80	   [RFC2679] and jitter [RFC3393].

82	   Given this scenario, Service Providers need methodologies and
83	   tools to monitor and measure network performance with adequate
84	   accuracy, in order to constantly control the quality of experience
85	   perceived by their customers.  In addition, performance
86	   monitoring provides useful information for improving network
87	   management (e.g. isolation of network problems and
88	   troubleshooting).

90	   A lot of work related to OAM, which also includes performance
91	   monitoring techniques, has been done by Standards Developing
92	   Organizations: [I-D.ietf-opsawg-oam-overview] provides a good
93	   overview of existing OAM mechanisms defined in IETF, ITU-T and IEEE.
94	   Within the IETF, a lot of work has been done on fault detection and
95	   connectivity verification, while less effort has been dedicated so
96	   far to performance monitoring.  The IPPM WG has defined standard
97	   metrics to measure network performance; however, the methods
98	   developed in the WG mainly refer to active measurement techniques.
99	   More recently, the MPLS WG has defined mechanisms for measuring
100	   packet loss, one-way and two-way delay, and delay variation in MPLS
101	   networks [RFC6374], but their applicability to passive measurements
102	   has some limitations, especially for pure connection-less networks.

104	   The lack of adequate tools to measure packet loss with the desired
105	   accuracy drove an effort in Telecom Italia to design a new method for
106	   the performance monitoring of live traffic that would be easy to
107	   implement and deploy.  The effort led to the method described in this
108	   document: basically, it is a passive performance monitoring
109	   technique, potentially applicable to any kind of packet based
110	   traffic, including Ethernet, IP, and MPLS, both unicast and
111	   multicast.  The method addresses primarily packet loss measurement,
112	   but it can be easily extended to one-way delay and delay variation
113	   measurements as well.  It doesn't require any protocol extension or
114	   interaction with existing protocols, thus avoiding any
115	   interoperability issue.  Even if the method doesn't raise any
116	   specific need for standardization, it could be further improved by
117	   means of some extension to existing protocols, but this aspect is
118	   left for further study and it is out of the scope of this document.

120	   The method has been explicitly designed for passive measurements but
121	   it can also be used with active probes.  Passive measurements are
122	   usually more easily understood by customers and provide much better
123	   accuracy, especially for packet loss measurements.

125	   The method described in this document has been invented and
126	   engineered in Telecom Italia and it's currently being used in Telecom
127	   Italia's network.

129	   This document is organized as follows:

131	   o  Section 2 gives an overview of the method, including a comparison
132	      with alternate measurement strategies;

134	   o  Section 3 describes the method in detail;

136	   o  Section 4 discusses implementation and deployment considerations,
137	      with special regard to the choices adopted in Telecom Italia's own
138	      implementation;

140	   o  Section 5 covers RFC6390 compliance and Section 6 security;

142	   o  Section 7 finally summarizes some concluding remarks.

144	2.  Overview of the method

146	   In order to perform packet loss measurements on a live traffic flow,
147	   different approaches exist.  The most intuitive one consists of
148	   numbering the packets, so that each router that receives the flow can
149	   immediately detect a missing packet.  This approach, though very
150	   simple in theory, is not simple to achieve: it requires the insertion
151	   of a sequence number into each packet and the devices must be able to
152	   extract the number and check it in real time.  Such a task can be
153	   difficult to implement on live traffic: if UDP is used as the
154	   transport protocol, the sequence number is not available; on the
155	   other hand, if a higher layer sequence number (e.g. in the RTP
156	   header) is used, extracting that information from each packet and
157	   processing it in real time could overload the device.

159	   An alternate approach is to count the number of packets sent on one
160	   end, the number of packets received on the other end, and to compare
161	   the two values.  This operation is much simpler to implement, but
162	   requires that the devices performing the measurement are in sync: in
163	   order to compare two counters it is required that they refer exactly
164	   to the same set of packets.  Since a flow is continuous and cannot be
165	   stopped when a counter has to be read, it could be difficult to
166	   determine exactly when to read the counter.  A possible solution to
167	   overcome this problem is to virtually split the flow into consecutive
168	   blocks by periodically inserting a delimiter so that each counter
169	   refers exactly to the same block of packets.  The delimiter could be
170	   for example a special packet inserted artificially into the flow.
171	   However, delimiting the flow using specific packets has some
172	   limitations.  First, it requires generating additional packets within
173	   the flow and requires the equipment to be able to process those
174	   packets.  In addition, the method is vulnerable to out of order
175	   reception of delimiting packets and, to a lesser extent, to their
176	   loss.

178	   The method proposed in this document follows the second approach, but
179	   it doesn't use additional packets to virtually split the flow into
180	   blocks.  Instead, it "colors" the packets so that the packets
181	   belonging to the same block will have the same color, whilst
182	   consecutive blocks will have different colors.  Each change of color
183	   represents a sort of auto-synchronization signal that guarantees the
184	   consistency of measurements taken by different devices along the
185	   path.

187	   Figure 1 represents a very simple network and shows how the method
188	   can be used to measure packet loss on different network segments: by
189	   enabling the measurement on several interfaces along the path, it is
190	   possible to perform link monitoring, node monitoring or end-to-end
191	   monitoring.  The method is flexible enough to measure packet loss on
192	   any segment of the network and can be used to isolate the faulty
193	   element.

195	                            Traffic flow
196	        ========================================================>
197	          +------+       +------+       +------+       +------+
198	      ---<>  R1  <>-----<>  R2  <>-----<>  R3  <>-----<>  R4  <>---
199	          +------+       +------+       +------+       +------+
200	          .              .      .              .       .      .
201	          .              .      .              .       .      .
202	          .              <------>              <------->      .
203	          .          Node Packet Loss      Link Packet Loss   .
204	          .                                                   .
205	          <--------------------------------------------------->
206	                           End-to-End Packet loss

208	                     Figure 1: Available measurements

210	3.  Detailed description of the method

212	   This section describes in detail how the method works.  Special
213	   emphasis is given to the measurement of packet loss, which represents
214	   the core application of the method, but applicability to delay and
215	   jitter measurements is also considered.

217	3.1.  Packet loss measurement

219	   The basic idea is to virtually split traffic flows into consecutive
220	   blocks: each block represents a measurable entity unambiguously
221	   recognizable by all network devices along the path.  By counting the
222	   number of packets in each block and comparing the values measured by
223	   different network devices along the path, it is possible to measure
224	   the packet loss in any single block between any two points.

226	   As discussed in the previous section, a simple way to create the
227	   blocks is to "color" the traffic (two colors are sufficient) so that
228	   packets belonging to different consecutive blocks will have different
229	   colors.  Whenever the color changes, the previous block terminates
230	   and the new one begins.  Hence, all the packets belonging to the same
231	   block will have the same color and packets of different consecutive
232	   blocks will have different colors.  The number of packets in each
233	   block depends on the criterion used to create the blocks: if the
234	   color is switched after a fixed number of packets, then each block
235	   will contain the same number of packets (except for any losses); but
236	   if the color is switched according to a fixed timer, then the number
237	   of packets may be different in each block depending on the packet
238	   rate.
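
The two block-creation criteria can be illustrated with a minimal
sketch.  The function names, the choice of starting with color A, and
the representation of send times as seconds since the start of the
first block are assumptions made for illustration; they are not part
of the method as deployed.

```python
def color_by_count(num_packets, block_size):
    """Color of each packet when the color is switched after a fixed
    number of packets (every block holds block_size packets)."""
    return ["A" if (i // block_size) % 2 == 0 else "B"
            for i in range(num_packets)]


def color_by_timer(send_times, period):
    """Color of each packet when the color is switched by a fixed
    timer; send_times are seconds since the first block started."""
    return ["A" if int(t // period) % 2 == 0 else "B"
            for t in send_times]


# Count-based: blocks always contain 3 packets (the last one is cut short
# only because the example stops after 8 packets).
by_count = color_by_count(8, block_size=3)

# Timer-based: block sizes follow the packet rate (3, 2 and 1 packets).
by_timer = color_by_timer([0.0, 0.4, 0.9, 1.1, 1.2, 2.5], period=1.0)
```

With the timer-based criterion the three blocks in the example carry
3, 2 and 1 packets, which mirrors the dependency on packet rate
described above.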

240	   The following figure shows what a flow looks like when it is split
241	   into traffic blocks with colored packets.

243	   A: packet with A coloring
244	   B: packet with B coloring

246	            |           |           |           |           |
247	            |           |    Traffic flow       |           |
248	    ------------------------------------------------------------------->
249	     BBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA
250	    ------------------------------------------------------------------->
251	       ...  |  Block 5  |  Block 4  |  Block 3  |  Block 2  |  Block 1
252	            |           |           |           |           |

254	                        Figure 2: Traffic coloring

256	   Figure 3 shows how the method can be used to measure link packet loss
257	   between two adjacent nodes.

259	   Referring to the figure, let's assume we want to monitor the packet
260	   loss on the link between two routers: router R1 and router R2.
261	   According to the method, the traffic is colored alternately with
262	   two different colors, A and B. Whenever the color changes, the
263	   transition generates a sort of square-wave signal, as depicted in the
264	   following figure.

266	 Color A    ----------+           +-----------+           +----------
267	                      |           |           |           |
268	 Color B              +-----------+           +-----------+
269	              Block n        ...      Block 3     Block 2     Block 1
270	            <---------> <---------> <---------> <---------> <--------->

272	                                Traffic flow
273	            ===========================================================>
274	 Color ... AAAAAAAAAAA BBBBBBBBBBB AAAAAAAAAAA BBBBBBBBBBB AAAAAAA...
275	            ===========================================================>

277	      Figure 3: Application of the method to compute link packet loss

279	   Traffic coloring could be done by R1 itself or by an upstream router.
280	   R1 needs two counters, C(A)R1 and C(B)R1, on its egress interface:
281	   C(A)R1 counts the packets with color A and C(B)R1 counts those with
282	   color B. As long as traffic is colored A, only counter C(A)R1 will be
283	   incremented, while C(B)R1 is not incremented; vice versa, when the
284	   traffic is colored as B, only C(B)R1 is incremented.  C(A)R1 and
285	   C(B)R1 can be used as reference values to determine the packet loss
286	   from R1 to any other measurement point down the path.  Router R2,
287	   similarly, will need two counters on its ingress interface, C(A)R2
288	   and C(B)R2, to count the packets received on that interface and
289	   colored with color A and B respectively.  When an A block ends, it is
290	   possible to compare C(A)R1 and C(A)R2 and calculate the packet loss
291	   within the block; similarly, when the successive B block terminates,
292	   it is possible to compare C(B)R1 with C(B)R2, and so on for every
293	   successive block.

295	   Likewise, by using two counters on R2's egress interface it is
296	   possible to count the packets sent out of that interface and use them
297	   as reference values to calculate the packet loss from R2 to any
298	   measurement point downstream of R2.

300	   Using a fixed timer for color switching offers better control over
301	   the method: the (time) length of the blocks can be chosen large
302	   enough to simplify the collection and the comparison of measures
303	   taken by different network devices.  It is preferable not to read
304	   the counters immediately after the color switch: some packets could
305	   arrive out of order and increment the counter associated with the
306	   previous block (color), so it is worth waiting for some seconds.
307	   The drawback is that the longer the duration of the block, the less
308	   frequently measurements can be taken.
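
As a rough illustration of the timing involved, the sketch below
computes when a counter can be read under timer-based switching with
two colors.  It assumes that the counter of a color stays idle for the
whole following block (which uses the other color) and that a guard
time shorter than the block period is enough to absorb out-of-order
packets; the function name and the parameter values are hypothetical.

```python
def read_window(block_index, period, guard):
    """Return (earliest, latest) read times, in seconds from the start
    of block 0, for the counter of the color used in block_index.

    earliest: end of the block plus the guard time for late packets.
    latest:   start of the next block of the same color, when the
              counter begins incrementing again.
    """
    if not 0 <= guard < period:
        raise ValueError("guard time must be shorter than the block period")
    block_end = (block_index + 1) * period
    return block_end + guard, block_end + period


# Hypothetical values: 5-minute blocks and a 10-second guard time.
earliest, latest = read_window(block_index=0, period=300, guard=10)
```

In this example the counter of block 0 can be read between 310 and 600
seconds after the measurement started.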

310	   The following table shows how the counters can be used to calculate
311	   the packet loss between R1 and R2.  The first column lists the
312	   sequence of traffic blocks while the other columns contain the
313	   counters of A-colored packets and B-colored packets for R1 and R2.
314	   In this example, we assume that the values of the counters are reset
315	   to zero whenever a block ends and its associated counter has been
316	   read: with this assumption, the table shows only relative values,
317	   that is the exact number of packets of each color within each block.
318	   If the values of the counters were not reset, the table would contain
319	   cumulative values, but the relative values could be determined simply
320	   by difference from the value of the previous block of the same color.
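
When the counters are not reset, the per-block values can be recovered
by difference, as in the following sketch.  The cumulative readings are
hypothetical values chosen to match the A-colored blocks 1 and 3 of
the example; the function name is an assumption.

```python
def per_block_counts(cumulative_readings):
    """Turn cumulative readings of one color counter (one reading per
    block of that color) into the number of packets in each block."""
    counts = []
    previous = 0
    for reading in cumulative_readings:
        counts.append(reading - previous)
        previous = reading
    return counts


# Hypothetical cumulative readings of C(A)R1 taken after blocks 1 and 3.
relative = per_block_counts([375, 757])
```

Here the second block of color A contains 757 - 375 = 382 packets.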

322	   The color is switched on the basis of a fixed timer (not shown in the
323	   table), so the number of packets in each block is different.

325	           +-------+--------+--------+--------+--------+------+
326	           | Block | C(A)R1 | C(B)R1 | C(A)R2 | C(B)R2 | Loss |
327	           +-------+--------+--------+--------+--------+------+
328	           | 1     | 375    | 0      | 375    | 0      | 0    |
329	           |       |        |        |        |        |      |
330	           | 2     | 0      | 388    | 0      | 388    | 0    |
331	           |       |        |        |        |        |      |
332	           | 3     | 382    | 0      | 381    | 0      | 1    |
333	           |       |        |        |        |        |      |
334	           | 4     | 0      | 377    | 0      | 374    | 3    |
335	           |       |        |        |        |        |      |
336	           | ...   | ...    | ...    | ...    | ...    | ...  |
337	           |       |        |        |        |        |      |
338	           | n     | 0      | 387    | 0      | 387    | 0    |
339	           |       |        |        |        |        |      |
340	           | n+1   | 379    | 0      | 377    | 0      | 2    |
341	           +-------+--------+--------+--------+--------+------+

343	       Table 1: Evaluation of counters for packet loss measurements

345	   During an A block (blocks 1, 3 and n+1), all the packets are
346	   A-colored, therefore the C(A) counters are incremented to the number
347	   seen on the interface, while C(B) counters are zero.  Vice versa,
348	   during a B block (blocks 2, 4 and n), all the packets are B-colored:
349	   C(A) counters are zero, while C(B) counters are incremented.

351	   When a block ends (because of color switching) the corresponding
352	   counters stop incrementing and it is possible to read them, compare
353	   the values measured on router R1 and R2 and calculate the packet
354	   loss within that block.

356	   For example, looking at the table above, during the first block
357	   (A-colored), C(A)R1 and C(A)R2 have the same value (375), which
358	   corresponds to the exact number of packets of the first block (no
359	   loss).  Also during the second block (B-colored) R1 and R2 counters
360	   have the same value (388), which corresponds to the number of packets
361	   of the second block (no loss).  During blocks three and four, R1 and
362	   R2 counters are different, meaning that some packets have been lost:
363	   in the example, one single packet (382-381) was lost during block
364	   three and three packets (377-374) were lost during block four.
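
The comparison can be reproduced with a few lines of code.  The sketch
below uses the counter values of the first four blocks of Table 1; the
data layout and names are illustrative assumptions, not a description
of the collection system actually deployed.

```python
# (block, color, packets counted at R1, packets counted at R2)
table_1 = [
    (1, "A", 375, 375),
    (2, "B", 388, 388),
    (3, "A", 382, 381),
    (4, "B", 377, 374),
]


def block_loss(sent, received):
    """Packets lost within one block between two measurement points."""
    return sent - received


# Loss per block, computed on the counter of the block's own color.
losses = {block: block_loss(r1, r2) for block, _color, r1, r2 in table_1}
total_lost = sum(losses.values())
```

The result matches the Loss column of the table: no loss in blocks 1
and 2, one packet lost in block 3 and three packets lost in block 4.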

366	   The method applied to R1 and R2 can be extended to any other router
367	   and applied to more complex networks, as long as the measurement is
368	   enabled on the path followed by the traffic flow(s) being observed.

370	3.2.  One-way delay measurement

372	   The same principle used to measure packet loss can be applied also to
373	   one-way delay measurement: the alternation of colors can be used as a
374	   time reference to calculate the delay.  Whenever the color changes
375	   (that means that a new block has started) a network device can store
376	   the timestamp of the first packet of the new block; that timestamp
377	   can be compared with the timestamp of the same packet on a second
378	   router to compute packet delay.  Considering Figure 4, R1 stores a
379	   timestamp TS(A1)R1 when it sends the first packet of block 1
380	   (A-colored), a timestamp TS(B2)R1 when it sends the first packet of
381	   block 2 (B-colored) and so on for every other block.  R2 performs the
382	   same operation on the receiving side, recording TS(A1)R2, TS(B2)R2
383	   and so on.  Since the timestamps refer to specific packets (the first
384	   packet of each block) we are sure that timestamps compared to compute
385	   delay refer to the same packets.  By comparing TS(A1)R1 with TS(A1)R2
386	   (and similarly TS(B2)R1 with TS(B2)R2 and so on) it is possible to
387	   measure the delay between R1 and R2.  In order to have more
388	   measurements, it is possible to take and store more timestamps,
389	   referring to other packets within each block.

391	   In order to coherently compare timestamps collected on different
392	   routers, the network nodes must be in sync.  Furthermore, a
393	   measurement is valid only if no packet loss occurs and if packet
394	   misordering can be avoided, otherwise the first packet of a block on
395	   R1 could be different from the first packet of the same block on R2
396	   (e.g. if that packet is lost between R1 and R2 or it arrives after
397	   the next one).

399	   The following table shows how timestamps can be used to calculate the
400	   delay between R1 and R2.  The first column lists the sequence of
401	   blocks while other columns contain the timestamp referring to the
402	   first packet of each block on R1 and R2.  The delay is computed as a
403	   difference between timestamps.  For the sake of simplicity, all the
404	   values are expressed in milliseconds.

406	      +-------+---------+---------+---------+---------+-------------+
407	      | Block | TS(A)R1 | TS(B)R1 | TS(A)R2 | TS(B)R2 | Delay R1-R2 |
408	      +-------+---------+---------+---------+---------+-------------+
409	      | 1     | 12.483  | -       | 15.591  | -       | 3.108       |
410	      |       |         |         |         |         |             |
411	      | 2     | -       | 6.263   | -       | 9.288   | 3.025       |
412	      |       |         |         |         |         |             |
413	      | 3     | 27.556  | -       | 30.512  | -       | 2.956       |
414	      |       |         |         |         |         |             |
415	      | 4     | -       | 18.113  | -       | 21.269  | 3.156       |
416	      |       |         |         |         |         |             |
417	      | ...   | ...     | ...     | ...     | ...     | ...         |
418	      |       |         |         |         |         |             |
419	      | n     | 77.463  | -       | 80.501  | -       | 3.038       |
420	      |       |         |         |         |         |             |
421	      | n+1   | -       | 24.333  | -       | 27.433  | 3.100       |
422	      +-------+---------+---------+---------+---------+-------------+

424	         Table 2: Evaluation of timestamps for delay measurements

426	   The first row shows timestamps taken on R1 and R2 respectively and
427	   referring to the first packet of block 1 (which is A-colored).  Delay
428	   can be computed as a difference between the timestamp on R2 and the
429	   timestamp on R1.  Similarly, the second row shows timestamps (in
430	   milliseconds) taken on R1 and R2 and referring to the first packet of
431	   block 2 (which is B-colored).  Comparing timestamps taken on
432	   different nodes in the network and referring to the same packets
433	   (identified using the alternation of colors) it is possible to
434	   measure delay on different network segments.
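
The delay computation can be sketched as follows, using the timestamps
of the first four rows of Table 2 (in milliseconds).  Decimal
arithmetic is used only to keep the example values exact; the data
layout and function name are assumptions, and the sketch presumes that
R1 and R2 are synchronized and that the first packet of each block is
neither lost nor reordered.

```python
from decimal import Decimal

# (block, color, timestamp at R1, timestamp at R2), in milliseconds
table_2 = [
    (1, "A", "12.483", "15.591"),
    (2, "B", "6.263", "9.288"),
    (3, "A", "27.556", "30.512"),
    (4, "B", "18.113", "21.269"),
]


def one_way_delay(ts_r1, ts_r2):
    """Delay of the first packet of a block: R2 timestamp minus R1's."""
    return Decimal(ts_r2) - Decimal(ts_r1)


delays = {block: one_way_delay(t1, t2) for block, _color, t1, t2 in table_2}
```

The computed values (3.108, 3.025, 2.956 and 3.156 ms) correspond to
the Delay R1-R2 column.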

436	   For the sake of simplicity, in the above example a single measurement
437	   is provided within a block, taking into account only the first packet
438	   of each block.  The number of measurements can be easily increased by
439	   considering multiple packets in the block: for instance, a timestamp
440	   could be taken every N packets, thus generating multiple delay
441	   measurements.  Taking this to the limit, in principle the delay could
442	   be measured for each packet, by taking and comparing the
443	   corresponding timestamps (possible but impractical from an
444	   implementation point of view).

446	3.2.1.  Average delay

448	   As mentioned before, the method described above for measuring the
449	   delay is sensitive to out of order reception of packets.  In order to
450	   overcome this problem, a different approach has been considered: it
451	   is based on the concept of average delay.  The average delay is
452	   calculated by considering the average arrival time of the packets
453	   within a single block.  The network device locally stores a timestamp
454	   for each packet received within a single block: summing all the
455	   timestamps and dividing by the total number of packets received, the
456	   average arrival time for that block of packets can be calculated.  By
457	   subtracting the average arrival times of two adjacent devices it is
458	   possible to calculate the average delay between those nodes.  This
459	   method is robust to out of order packets and also to packet loss
460	   (only a small error is introduced).  Moreover, it greatly reduces the
461	   number of timestamps (only one per block for each network device)
462	   that have to be collected by the management system.  On the other
463	   hand, it only gives one measure for the duration of the block (e.g. 5
464	   minutes), and it doesn't give the minimum and maximum delay values.
465	   This limitation could be overcome by reducing the duration of the
466	   block (e.g. from 5 minutes to a few seconds) by means of a highly
467	   optimized implementation of the method.

469	   By summing the average delays of the two directions of a path, it is
470	   also possible to measure the two-way delay (round-trip delay).
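
A minimal sketch of the average delay computation is given below.  The
timestamps are hypothetical values in milliseconds, the function names
are assumptions, and the reverse-direction delay is an invented figure
used only to show the round-trip sum.  The example also shows that
reordering within a block does not change the result.

```python
def average_time(timestamps):
    """Average of the per-packet timestamps collected within one block."""
    if not timestamps:
        raise ValueError("block contains no packets")
    return sum(timestamps) / len(timestamps)


def average_delay(upstream, downstream):
    """Average delay of a block between two devices: difference of the
    average arrival times measured on each of them."""
    return average_time(downstream) - average_time(upstream)


# Hypothetical block of four packets, each delayed by 3 ms.
r1_times = [0.0, 1.0, 2.0, 3.0]
r2_in_order = [3.0, 4.0, 5.0, 6.0]
r2_reordered = [3.0, 5.0, 4.0, 6.0]  # second and third packet swapped

forward = average_delay(r1_times, r2_reordered)

# Two-way delay: sum of the average delays of the two directions.
reverse = 2.5  # hypothetical average delay measured from R2 to R1
round_trip = forward + reverse
```

Each device only needs to export one value per block (the average, or
equivalently the sum of timestamps and the packet count), which is
where the reduction in collected timestamps comes from.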

472	3.3.  Delay variation measurement

474	   Similarly to one-way delay measurement, the method can also be used
475	   to measure the inter-arrival jitter.  The alternation of colors can
476	   be used as a time reference to measure delay variations.  Considering
477	   the example depicted in Figure 4, R1 stores a timestamp TS(A)R1
478	   whenever it sends the first packet of a block and R2 stores a
479	   timestamp TS(B)R2 whenever it receives the first packet of a block.
480	   The inter-arrival jitter can be easily derived from one-way delay
481	   measurement, by evaluating the delay variation of consecutive
482	   samples.

484	   The concept of average delay can also be applied to delay variation,
485	   by evaluating the variation of consecutive measures of the average
486	   delay.
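
As a simple illustration, delay variation can be derived from a list
of consecutive one-way delay samples (or consecutive average delays).
The sketch below reuses the first four delay values of Table 2, in
milliseconds; the function name is an assumption, and a signed
difference between consecutive samples is only one possible way to
express the variation.

```python
from decimal import Decimal


def delay_variation(delays):
    """Signed difference between each delay sample and the previous one."""
    return [later - earlier for earlier, later in zip(delays, delays[1:])]


samples = [Decimal("3.108"), Decimal("3.025"),
           Decimal("2.956"), Decimal("3.156")]
variation = delay_variation(samples)
```

For these samples the variation is -0.083, -0.069 and +0.200 ms.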

488	4.  Implementation and deployment

490	   The methodology described in the previous sections has been
491	   implemented in Telecom Italia by leveraging functions and tools
492	   available on IP routers and it's currently being used to monitor
493	   packet loss in some portions of Telecom Italia's network.  The
494	   application of the method to delay measurement is currently being
495	   evaluated in Telecom Italia's labs.

497	   The fundamental steps for the implementation of the method can be
498	   summarized in the following items:

500	   o  coloring the packets;
501	   o  counting the packets;

503	   o  collecting data and calculating the packet loss.

505	   Before going deeper into the implementation details, it's worth
506	   mentioning two different strategies that can be used when
507	   implementing the method:

509	   o  flow-based: the flow-based strategy is used when only a limited
510	      number of traffic flows need to be monitored.  This could be the
511	      case, for example, of IPTV channels or other specific application
512	      traffic with high QoS requirements.  According to this strategy,
513	      only a subset of the flows is colored.  Counters for packet loss
514	      measurements can be instantiated for each single flow, or for the
515	      set as a whole, depending on the desired granularity.  A relevant
516	      problem with this approach is the necessity to know in advance the
517	      path followed by flows that are subject to measurement.  Path
518	      rerouting and traffic load-balancing make the issue more
519	      complex, especially for unicast traffic.  The problem is easier
520	      to solve for multicast traffic where load balancing is seldom
521	      used, especially for IPTV traffic where static joins are
522	      frequently used to force traffic forwarding and replication.

524	   o  link-based: measurements are performed on all the traffic on a
525	      link by link basis.  The link could be a physical link or a
526	      logical link (for instance an Ethernet VLAN or an MPLS PW).
527	      Counters could be instantiated for the traffic as a whole or for
528	      each traffic class (in case it is desired to monitor each class
529	      separately), but in the second case a pair of counters is needed
530	      for each class.

532	   The current implementation in Telecom Italia uses the first strategy.
533	   As mentioned, the flow-based measurement requires the identification
534	   of the flow to be monitored and the discovery of the path followed by
535	   the selected flow.  It is possible to monitor a single flow or
536	   multiple flows grouped together, but in this case measurement is
537	   consistent only if all the flows in the group follow the same path.
538	   Moreover, a Service Provider should be aware that, if a measurement
539	   is performed by grouping many flows, it is not possible to determine
540	   exactly which flow was affected by packet loss.  In order to have
541	   per-flow measurements it is necessary to configure counters for
542	   each specific flow.  Once the flow(s) to be monitored have been
543	   identified, it is necessary to configure the monitoring on the proper
544	   nodes.  Configuring the monitoring means configuring the policy to
545	   intercept the traffic and configuring the counters to count the
546	   packets.  To have just an end-to-end monitoring, it is sufficient to
547	   enable the monitoring on the first and the last hop routers of the
548	   path: the mechanism is completely transparent to intermediate nodes
549	   and independent from the path followed by traffic flows.  On the
550	   contrary, to monitor the flow on a hop-by-hop basis along its whole
551	   path it is necessary to enable the monitoring on every node from the
552	   source to the destination.  In case the exact path followed by the
553	   flow is not known a priori (i.e. the flow has multiple paths to reach
554	   the destination) it is necessary to enable the monitoring system on
555	   every path: counters on interfaces traversed by the flow will report
556	   packet counts; counters on other interfaces will remain at zero.

558	4.1.  Coloring the packets

560	   The coloring operation is fundamental in order to create packet
561	   blocks.  This implies choosing where to activate the coloring and how
562	   to color the packets.

564	   In case of flow-based measurements, it is desirable, in general, to
565	   have a single coloring node because it is easier to manage and
566	   doesn't raise any risk of conflict (consider the case where two nodes
567	   color the same flow).  Thus it is necessary to color the flow as
568	   close as possible to the source.  In addition, coloring a flow close
569	   to the source allows end-to-end measurement if a measurement point is
570	   enabled on the last-hop router as well.  The only requirement is that
571	   the coloring must change periodically and every node along the path
572	   must be able to identify unambiguously the colored packets.  For
573	   link-based measurements, all traffic needs to be colored when
574	   transmitted on the link.  If the traffic had already been colored,
575	   then it has to be re-colored because the color must be consistent on
576	   the link.  This means that each hop along the path must (re-)color
577	   the traffic; the color is not required to be consistent along
578	   different links.

580	   Traffic coloring can be implemented by setting a specific bit in the
581	   packet header and changing the value of that bit periodically.  With
582	   current router implementations, only QoS-related fields and features
583	   offer the required flexibility to explicitly set the value of some
584	   bits in the packet header from the Command Line Interface (CLI).  In
585	   case a Service Provider only uses the three most significant bits of
586	   the DSCP field (corresponding to IP Precedence) for QoS
587	   classification and queuing, it is possible to use the two least
588	   significant bits of the DSCP field (bit 0 and bit 1) to implement the
589	   method without affecting QoS policies.  One of the two bits (bit 0)
590	   could be used to identify flows subject to traffic monitoring (set to
591	   1 if the flow is under monitoring, otherwise it is set to 0), while
592	   the second (bit 1) can be used for coloring the traffic (switching
593	   between values 0 and 1, corresponding to color A and B) and creating
594	   the blocks.
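
   As an illustration only, the bit assignment described above can be
   sketched in Python.  The sketch assumes that bit 0 is the least
   significant bit of the 6-bit DSCP value and bit 1 the next one; the
   names MONITOR_BIT, COLOR_BIT, mark_dscp and classify are
   hypothetical and do not correspond to any router feature.

```python
# Assumed bit positions inside the 6-bit DSCP value (not from the draft:
# bit 0 is taken to be the least significant bit).
MONITOR_BIT = 0b000001  # bit 0: 1 = flow is under monitoring
COLOR_BIT = 0b000010    # bit 1: 0 = color A, 1 = color B


def mark_dscp(dscp, color):
    """Return the DSCP value of a monitored packet with the given color.

    Only the two least significant bits are rewritten; the remaining
    bits, including the three used for QoS, are left untouched.
    """
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit field")
    if color not in ("A", "B"):
        raise ValueError("color must be 'A' or 'B'")
    marked = (dscp & ~(MONITOR_BIT | COLOR_BIT)) | MONITOR_BIT
    if color == "B":
        marked |= COLOR_BIT
    return marked


def classify(dscp):
    """Return None for unmonitored traffic, otherwise 'A' or 'B'."""
    if not dscp & MONITOR_BIT:
        return None
    return "B" if dscp & COLOR_BIT else "A"
```

   Under these assumptions a packet with DSCP 40 (binary 101000) is
   re-marked to 41 for color A and to 43 for color B, and the three
   most significant bits are unchanged in both cases.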

596	   In practice, coloring the traffic using the DSCP field can be
597	   implemented by configuring on the router output interface an access
598	   list that intercepts the flow(s) to be monitored and applies to them
599	   a policy that sets the DSCP field accordingly.  Since traffic
600	   coloring has to be switched between the two values over time, the
601	   policy needs to be modified periodically: an automatic script can be
602	   used to perform this task on the basis of a fixed timer.  In Telecom
603	   Italia's implementation this timer is set to 5 minutes: this value
604	   proved to be a good compromise between measurement frequency and
605	   stability of the measurement (i.e. possibility to collect all the
606	   measures referring to the same block).
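
   A minimal sketch of the fixed-timer color selection follows.  It
   assumes that blocks are aligned to multiples of the timer counted
   from time zero and that even-numbered blocks use color A; both
   choices are arbitrary and the function names are hypothetical.  On
   a real router the script would rewrite the coloring policy rather
   than return a value.

```python
BLOCK_SECONDS = 300  # 5-minute timer, the value reported in the text


def block_index(timestamp, block_seconds=BLOCK_SECONDS):
    """Number of the block that contains the given time (in seconds)."""
    return int(timestamp // block_seconds)


def current_color(timestamp, block_seconds=BLOCK_SECONDS):
    """Color to apply at the given time: A on even blocks, B on odd."""
    if block_index(timestamp, block_seconds) % 2 == 0:
        return "A"
    return "B"
```

   Deriving the color from a shared clock, instead of toggling local
   state, is one possible way to keep the block numbering comparable
   across nodes; it presumes that node clocks are roughly aligned.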

608	4.2.  Counting the packets

610	   Assuming that the coloring of the packets is performed only by the
611	   source node, the nodes between source and destination (included) have
612	   to count the colored packets that they receive and forward: this
613	   operation can be enabled on every router along the path or only on a
614	   subset, depending on which network segment is being monitored (a
615	   single link, a particular metro area, the backbone, the whole path).

617	   Since the color switches periodically between two values, two
618	   counters (one for each value) are needed: one counter for packets
619	   with color A and one counter for packets with color B.  For each flow
620	   (or group of flows) being monitored and for every interface where the
621	   monitoring is active, a pair of counters is needed.  For example,
622	   in order to monitor separately 3 flows on a router with 4 interfaces
623	   involved, 24 counters are needed (2 counters for each of the 3 flows
624	   on each of the 4 interfaces).  If traffic is colored using the DSCP
625	   field, as in Telecom Italia's implementation, an access-list that
626	   matches specific DSCP values can be used to count the packets of the
627	   flow(s) being monitored.
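
   The counter arithmetic above can be checked with a short Python
   sketch.  The flow and interface names are placeholders, and
   counter_keys is a hypothetical helper used only for illustration.

```python
from itertools import product

COLORS = ("A", "B")


def counter_keys(flows, interfaces):
    """One (flow, interface, color) key for every counter required."""
    return list(product(flows, interfaces, COLORS))


flows = ["flow1", "flow2", "flow3"]
interfaces = ["if1", "if2", "if3", "if4"]

# All counters start at zero: 3 flows x 4 interfaces x 2 colors.
counters = {key: 0 for key in counter_keys(flows, interfaces)}
```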

629	   In case of link-based measurements the behavior is similar except
630	   that coloring and counting operations are performed on a link by link
631	   basis at each endpoint of the link.

633	   Another important aspect to take into consideration is when to read
634	   the counters: in order to count the exact number of packets of a
635	   block the routers must perform this operation when that block has
636	   ended: in other words, the counter for color A must be read when the
637	   current block has color B, in order to be sure that the value of the
638	   counter is stable.  This task can be accomplished in two ways.  The
639	   general approach is to read the counters periodically, many
640	   times during a block duration, and to compare these successive
641	   readings: when the counter stops incrementing, the current
642	   block has ended and its value can be processed safely.
643	   Alternatively, if the coloring operation is performed on the basis of
644	   a fixed timer, it is possible to configure the reading of the
645	   counters according to that timer: for example, if each block is 5
646	   minutes long, reading the counter for color A every 5 minutes in the
647	   middle of the subsequent block (with color B) is a safe choice.  A
648	   sufficient margin should be considered between the end of a block and
649	   the reading of the counter, in order to take into account any out-of-
650	   order packets.  The choice of a 5-minute timer for color switching
651	   was also suggested by these considerations.
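
   Both ways of deciding when a counter can be read are sketched below
   in Python, as an illustration only.  The function names and the
   min_unchanged parameter are hypothetical.  The block numbering
   assumes blocks of fixed length counted from time zero.  Note that a
   counter that stops incrementing may also indicate an idle flow, so
   the first check is a heuristic.

```python
def is_stable(readings, min_unchanged=2):
    """True if the last min_unchanged + 1 readings are all equal.

    `readings` is the list of successive values read from one counter.
    """
    if len(readings) < min_unchanged + 1:
        return False
    tail = readings[-(min_unchanged + 1):]
    return all(value == tail[0] for value in tail)


def read_time_for_block(index, block_seconds=300):
    """Time (in seconds) at which the counter of block `index` is read:
    the midpoint of the following block, which has the other color."""
    return (index + 1) * block_seconds + block_seconds / 2
```

   With 5-minute blocks, the counter for block 0 would be read at 450
   seconds, leaving a margin of 150 seconds for out-of-order packets.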

653	4.3.  Collecting data and calculating packet loss

655	   The nodes enabled to perform performance monitoring collect the value
656	   of the counters, but they are not able to directly use this
657	   information to measure packet loss, because they only have their own
658	   samples.  For this reason, an external Network Management System
659	   (NMS) is required to collect and process data and to perform packet
660	   loss calculation.  The NMS compares the values of counters from
661	   different nodes and can calculate if some packets were lost (even a
662	   single packet) and also where packets were lost.

664	   The value of the counters needs to be transmitted to the NMS as soon
665	   as it has been read.  This can be accomplished by using SNMP or FTP
666	   and can be done in Push Mode or Polling Mode.  In the first case,
667	   each router periodically sends the information to the NMS, in the
668	   latter case it is the NMS that periodically polls routers to collect
669	   information.  In any case, the NMS has to collect all the relevant
670	   values from all the routers within one cycle of the timer (5
671	   minutes).
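
   The comparison performed by the NMS can be sketched as follows, in
   Python.  This is an illustration under stated assumptions, not the
   deployed system: locate_loss is a hypothetical function, the node
   names are placeholders, and the input is assumed to hold the stable
   counter values of one block of one color, ordered from source to
   destination along a single known path.

```python
def locate_loss(path_counters):
    """Compare counters of consecutive measurement points.

    path_counters: list of (node_name, packet_count), source first.
    Returns (total_lost, segments), where segments lists the
    (upstream_node, downstream_node, lost) hops that lost packets.
    """
    if len(path_counters) < 2:
        raise ValueError("need at least two measurement points")
    segments = []
    for upstream, downstream in zip(path_counters, path_counters[1:]):
        lost = upstream[1] - downstream[1]
        if lost > 0:
            segments.append((upstream[0], downstream[0], lost))
    total_lost = path_counters[0][1] - path_counters[-1][1]
    return total_lost, segments


example = [("R1", 1000), ("R2", 1000), ("R3", 997), ("R4", 997)]
total, where = locate_loss(example)
```

   In this example three packets are lost end-to-end, and the
   hop-by-hop comparison attributes all of them to the segment between
   R2 and R3.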

673	5.  Compliance with RFC6390 guidelines

675	   RFC6390 [RFC6390] defines a framework and a process for developing
676	   Performance Metrics for protocols above and below the IP layer (such
677	   as IP-based applications that operate over reliable or datagram
678	   transport protocols).

680	   This document doesn't aim to propose a new Performance Metric but a
681	   new method of measurement for a few Performance Metrics that have
682	   already been standardized.  Nevertheless, it's worth applying RFC6390
683	   guidelines to the present document, in order to provide a more
684	   complete and coherent description of the proposed method.  We used a
685	   subset of the Performance Metric Definition template defined by
686	   RFC6390.

688	   o  Metric name and description: as already stated, this document
689	      doesn't propose any new Performance Metric.  On the contrary, it
690	      describes a novel method for measuring packet loss [RFC2680].  The
691	      same concept, with small differences, can also be used to measure
692	      delay [RFC2679] and jitter [RFC3393].  The document mainly
693	      describes the applicability to packet loss measurement.

695	   o  Method of Measurement or Calculation: according to the method
696	      described in the previous sections, the number of packets lost is
697	      calculated by subtracting the value of the destination node's
698	      counter from the value of the source node's counter.  Both
699	      counters must refer to the same color.  The calculation is
700	      performed when the value of the counters is in a steady state.

702	   o  Units of Measurement: the method calculates and reports the exact
703	      number of packets sent by the source node and not received by the
704	      destination node.

706	   o  Measurement Points: the measurement can be performed between
707	      adjacent nodes, on a per-link basis, or along a multi-hop path,
708	      provided that the traffic under measurement follows that path.  In
709	      case of a multi-hop path, the measurements can be performed both
710	      end-to-end and hop-by-hop.

712	   o  Measurement Timing: the method has a constraint on the frequency
713	      of measurements.  To perform a measurement, the counter must
714	      be in a steady state: this happens when the traffic is being
715	      colored with the alternate color; in the current implementation
716	      the time interval is set to 5 minutes.

718	   o  Implementation: the current implementation of the method uses two
719	      encodings of the DSCP field to color the packets; this enables the
720	      use of policy configurations on the router to color the packets
721	      and accordingly configure the counter for each color.  The path
722	      followed by traffic being measured should be known in advance in
723	      order to configure the counters along the path and be able to
724	      compare the correct values.

726	   o  Use and Applications: the method can be used to measure packet
727	      loss with high precision (i.e. 10exp(-7)) on live traffic;
728	      moreover, by combining end-to-end and per-link measurements, the
729	      method is useful to pinpoint the single link that is experiencing
730	      loss events.

732	   o  Reporting Model: the value of the counters has to be sent to a
733	      centralized management system that performs the calculations; such
734	      samples must contain a reference to the time interval they refer
735	      to, so that the management system can perform the correct
736	      correlation; the samples have to be sent while the corresponding
737	      counter is in a steady state (within a time interval), otherwise
738	      the value of the sample should be stored locally.

740	   o  Dependencies: the values of the counters have to be correlated to
741	      the time interval they refer to; moreover, since the current
742	      implementation is based on DSCP values, there are significant
743	      dependencies on the usage of the DSCP field: it must be possible
744	      to rely on unused DSCP values without affecting QoS-related
745	      configuration and behavior; moreover, the intermediate nodes must
746	      not change the value of the DSCP field, so as not to alter the
747	      measurement.

749	   o  Organization of Results: the method of measurement produces
750	      singletons.

752	   o  Parameters: currently, the main parameter of the method is the
753	      time interval used to alternate the colors and read the counters.
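
   The correlation described under "Reporting Model" can be sketched
   in Python as follows.  The sample layout (node, block number,
   color, count) and the function name correlate are assumptions made
   for this illustration; the document does not define a sample
   format.  Loss is computed only for blocks reported by both the
   source and the destination.

```python
from collections import defaultdict


def correlate(samples, source, destination):
    """Group samples by (block, color) and compute loss per block.

    samples: iterable of (node, block_index, color, count) tuples.
    Returns {(block_index, color): packets_lost}.
    """
    by_block = defaultdict(dict)
    for node, index, color, count in samples:
        by_block[(index, color)][node] = count
    losses = {}
    for key, counts in sorted(by_block.items()):
        if source in counts and destination in counts:
            losses[key] = counts[source] - counts[destination]
    return losses


samples = [
    ("src", 0, "A", 500), ("dst", 0, "A", 498),
    ("src", 1, "B", 700), ("dst", 1, "B", 700),
    ("src", 2, "A", 300),  # destination sample not yet received
]
result = correlate(samples, "src", "dst")
```

   Block 2 is omitted from the result because only one of the two
   samples is available.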

755	6.  Security Considerations

757	   This document specifies a method to perform measurements in the
758	   context of a Service Provider's network and has not been developed to
759	   conduct Internet measurements, so it does not directly affect
760	   Internet security nor applications which run on the Internet.
761	   However, implementation of this method must be mindful of security
762	   and privacy concerns.

764	   There are two types of security concerns: potential harm caused by
765	   the measurements and potential harm to the measurements.  As for
766	   the first point, the measurements described in this document
767	   are passive, so there are no packets injected into the network
768	   causing potential harm to the network itself and to data traffic.
769	   Nevertheless, the method implies modifications on the fly to the IP
770	   header of data packets: this must be performed in a way that doesn't
771	   alter the quality of service experienced by packets subject to
772	   measurements and that preserves stability and performance of routers
773	   doing the measurements.  The measurements themselves could be harmed
774	   by routers altering the coloring of the packets, or by an attacker
775	   injecting artificial traffic.  Authentication techniques, such as
776	   digital signatures, may be used where appropriate to guard against
777	   injected traffic attacks.

779	   The privacy concerns of network measurement are limited because the
780	   method only relies on information contained in the IP header without
781	   any release of user data.

783	7.  Conclusions

785	   The advantages of the method described in this document are:

787	   o  easy implementation: it can be implemented using features already
788	      available on major routing platforms;

790	   o  low computational effort: the additional load on processing is
791	      negligible;

793	   o  accurate packet loss measurement: single packet loss granularity
794	      is achieved with a passive measurement;

796	   o  potential applicability to any kind of packet/frame-based
797	      traffic: Ethernet, IP, MPLS, etc., both unicast and multicast;

799	   o  robustness: the method can tolerate out-of-order packets and it's
800	      not based on "special" packets whose loss could have a negative
801	      impact;

803	   o  no interoperability issues: the features required to implement the
804	      method are available on all current routing platforms.

806	   The method doesn't raise any specific need for standardization, but
807	   it could be further improved by means of some extension to existing
808	   protocols.  Specifically, the use of DiffServ bits for coloring the
809	   packets might not be a viable solution in some cases: a standard
810	   method to color the packets for this specific application could be
811	   beneficial.

813	8.  IANA Considerations

815	   There are no IANA actions required.

817	9.  Acknowledgements

819	   The authors would like to thank Domenico Laforgia, Daniele Accetta
820	   and Mario Bianchetti for their contribution to the definition and the
821	   implementation of the method.

823	10.  References

825	10.1.  Normative References

827	   [RFC2679]  Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
828	              Delay Metric for IPPM", RFC 2679, September 1999.

830	   [RFC2680]  Almes, G., Kalidindi, S., and M. Zekauskas, "A One-way
831	              Packet Loss Metric for IPPM", RFC 2680, September 1999.

833	   [RFC3393]  Demichelis, C. and P. Chimento, "IP Packet Delay Variation
834	              Metric for IP Performance Metrics (IPPM)", RFC 3393,
835	              November 2002.

837	10.2.  Informative References

839	   [I-D.ietf-opsawg-oam-overview]
840	              Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
841	              Weingarten, "An Overview of Operations, Administration,
842	              and Maintenance (OAM) Tools", draft-ietf-opsawg-oam-
843	              overview-13 (work in progress), January 2014.

845	   [RFC6374]  Frost, D. and S. Bryant, "Packet Loss and Delay
846	              Measurement for MPLS Networks", RFC 6374, September 2011.

848	   [RFC6390]  Clark, A. and B. Claise, "Guidelines for Considering New
849	              Performance Metric Development", BCP 170, RFC 6390,
850	              October 2011.

852	Authors' Addresses

854	   Alessandro Capello
855	   Telecom Italia
856	   Via Reiss Romoli, 274
857	   Torino  10148
858	   Italy

860	   Email: alessandro.capello@telecomitalia.it

862	   Mauro Cociglio
863	   Telecom Italia
864	   Via Reiss Romoli, 274
865	   Torino  10148
866	   Italy

868	   Email: mauro.cociglio@telecomitalia.it

870	   Luca Castaldelli
871	   Telecom Italia
872	   Via Reiss Romoli, 274
873	   Torino  10148
874	   Italy

876	   Email: luca.castaldelli@telecomitalia.it

877	   Alberto Tempia Bonda

879	   Email: alberto.tempia@gmail.com