NVO3 Working Group                                               H. Chen
INTERNET-DRAFT                                       Huawei Technologies
Intended Status: Informational                        September 19, 2016
Expires: March 23, 2017


         Using IPID for Performance Monitoring in VxLAN Network
                       draft-chen-nvo3-ipid-pm-01

Abstract

   IP Identification (IPID) is a field in the IP header primarily used
   to uniquely identify the group of fragments of a single IP packet.
   The IPID value in packets from a specific traffic flow or source IP
   address keeps increasing until it wraps around.

   This document specifies a method that monitors the performance of a
   VxLAN network by examining the IPID value.
   This memo mainly considers packet loss measurement.  The method
   requires no extra hardware support, which means it is compatible
   with most deployed routers and switches.  The mechanism is
   applicable to IPv4 networks and potentially useful in overlay
   networks that use other data encapsulations.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/1id-abstracts.html

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

Copyright and License Notice

   Copyright (c) 2015 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  IPID Overview
   4.  Packet Loss Measurement
   5.  Security Considerations
   6.  IANA Considerations
   7.  References
       7.1  Normative References
       7.2  Informative References

1.  Introduction

   Performance Monitoring (PM) is a crucial part of network OAM and
   mainly covers packet loss and delay measurement.  PM methods are
   usually classified into two categories: active (test traffic is
   added) and passive (normal traffic is observed without
   interference).  Each category has its own strengths.  An active
   method injects test traffic from one measurement point to another,
   and where Equal Cost Multiple Paths (ECMP) exist this test traffic
   is not guaranteed to follow the same path as the data traffic.
   ECMP is common in overlay networks (e.g., VxLAN), so a passive
   method is more appropriate there.

   IP Identification (IPID) is a field in the IP header that can be
   used to implement a passive PM method.  An example IPv4 header is
   shown in Figure 1.  IPID is primarily used to uniquely identify the
   group of fragments of a single IP packet.  The IPID value in packets
   from a specific traffic flow or source IP address keeps increasing
   until it wraps around.
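
   As a minimal sketch of where the field sits, the Python fragment
   below reads the Identification value from a raw IPv4 header, using
   the RFC 791 layout in which the field occupies octets 4 and 5 in
   network byte order.  The function name read_ipid is hypothetical,
   and the sketch handles only a bare IPv4 header; locating the inner
   header of a VxLAN packet is outside its scope.

```python
import struct


def read_ipid(ipv4_header: bytes) -> int:
    """Return the 16-bit Identification field of a raw IPv4 header."""
    if len(ipv4_header) < 20:
        raise ValueError("an IPv4 header is at least 20 octets long")
    if ipv4_header[0] >> 4 != 4:
        raise ValueError("not an IPv4 header")
    # Octets 4-5 hold Identification, in network (big-endian) byte order.
    (ipid,) = struct.unpack_from("!H", ipv4_header, 4)
    return ipid
```

   For a header whose octets 4 and 5 are 0x00 and 0xFC, the function
   returns 0x00FC.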

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 1: Example IPv4 Header

   The IPID is required to be unique within the maximum datagram
   lifetime for all packets with a given source address/destination
   address/protocol tuple.  Hence, each packet in a specific flow has a
   unique IPID.  Within a flow, the IPID value increases from packet to
   packet until it reaches the maximum value; it then wraps around and
   increases again.

   An example Controller-based VxLAN network is shown in Figure 2.  A
   Controller connects to NVE A and NVE B.  Assume a flow is
   transmitted from VM1 to VM3 (VM1->NVE A->SW M->SW N->NVE B->VM3);
   packet loss then needs to be measured at NVE A and NVE B.

   This document specifies a method that monitors the performance of a
   Controller-based VxLAN network by examining the IPID value.  This
   memo mainly considers packet loss measurement.  The Controller
   specifies which flow is to be monitored.  Before monitoring starts,
   it sends the flow information to the relevant NVEs.  During the
   monitoring period, the Controller collects statistics from those
   NVEs in order to measure packet loss and delay.

                   ***************************
                   *     +--------------+    *
                   *     |  Controller  |    *
                   *     +-|---------|--+    *
                   *    /  |         |   \   *
                   *   /   |         |    \  *
   +---------+     *  /    |         |     \ *     +---------+
   |+---+    |     * /     |         |      \*     |    +---+|
   ||VM1|    |   +--/+   +-|-+     +-|-+   +-\-+   |    |VM3||
   |+---+    +---+NVE+---+SW +-----+SW +---+NVE+---+    +---+|
   |    +---+|   +-A-+   +-M-+     +-N-+   +-B-+   |+---+    |
   |    |VM2||     *                         *     ||VM4|    |
   |    +---+|     *      VxLAN Overlay      *     |+---+    |
   +---------+     *         Network         *     +---------+
     Tenant        *                         *       Tenant
     System        *                         *       System
                   ***************************

           Figure 2: Example Controller-based VxLAN Network

   This method requires no extra hardware support, which means it is
   compatible with most deployed routers and switches.  The mechanism
   is applicable to IPv4 networks and potentially useful in overlay
   networks that use other data encapsulations.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   This document uses the following terms; additional terms are
   defined in [RFC7348].

   o  ECMP - Equal Cost Multiple Paths

   o  IPID - IP Identification

   o  MSB - Most Significant Bit

   o  OG - Observation Group

   o  PM - Performance Monitoring

3.  IPID Overview

   This document mainly considers the IPID in the IPv4 header.  As
   defined in [RFC791], the IPID field is 16 bits wide.  It is used
   together with the source address, destination address, and protocol
   fields to identify datagram fragments for reassembly.

   Some experimental work has used the IPID field for other purposes,
   such as carrying packet-tracing information to help trace packets
   with spoofed source addresses [Savage_2000].  However, [RFC6864]
   prohibits this kind of use: it states that the IPv4 ID field MUST
   NOT be used for purposes other than fragmentation and reassembly.
   In addition, [Chen_2004] describes the 16-bit IPID field as carrying
   a copy of the current value of a counter in a host's IP stack.
   Current versions of Windows implement this counter as a global
   counter; that is, the IPID value increases continuously per source
   IP address.  Current versions of Linux, in contrast, implement it as
   a per-flow counter; that is, the IPID value increases continuously
   within each flow.  The authors also ran extensive experiments that
   confirm the incremental behavior of the IPID value.  To sum up, the
   IPID field is set only by the Tenant System and can be used as a
   sequence number within a packet flow.

   Given this incremental behavior, one bit of the IPID field can be
   taken as the Criterion bit (C bit) to divide a packet flow into
   several Observation Groups (OGs).  By collecting the observed packet
   count and starting time of each OG from the relevant NVEs, the
   Controller is able to measure the packet loss and delay of each
   flow.

   The VxLAN encapsulation [RFC7348] includes an outer IP header and an
   inner IP header, each with its own IPID field, i.e., the outer IPID
   and the inner IPID.  Because it is the inner header that reflects
   the real flow information, this memo uses only the inner IPID for
   performance monitoring.

   Theoretically, any bit of the IPID field can be used as the C bit.
   Selecting it is somewhat tricky, however, because high-order bits
   vary slowly while low-order bits vary quickly.  The selection has to
   take the flow rate into consideration.  To illustrate, as Figure 3
   shows, if the most significant bit (MSB) of the IPID is taken as the
   C bit, each OG contains up to 2^15 = 32,768 packets.  In real data
   center deployments, most user traffic runs below 1 Gbps.  At 1 Gbps,
   the IPID wraps around in approximately 0.8 s (this figure is
   consistent with packets of about 1500 octets: 65,536 x 1500 x 8 bits
   is roughly 786 Mbit).
   When user traffic reaches 10 Gbps, the IPID wraps around more
   quickly, possibly in less than 80 ms.

                    0                   1
                    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                   | | | | | | | | | | | | | | | |C|
                   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                    Figure 3: Example Criterion Bit

   Figure 4 is a simple example of how the C bit divides a packet flow
   into sequential OGs.  Here bit 8 (counting from the least
   significant bit as bit 0) is the C bit, and the first packet
   observed carries the IPID value 0x00FC (bit 8 = 0).  The first 4
   packets share C = 0, while the last 4 packets share C = 1.

   Index   H               C                   L
                          +-+
     1     0 0 0 0 | 0 0 0|0|| 1 1 1 1 | 1 1 0 0  <-+
     2     0 0 0 0 | 0 0 0|0|| 1 1 1 1 | 1 1 0 1    | Group 1
     3     0 0 0 0 | 0 0 0|0|| 1 1 1 1 | 1 1 1 0    | (C = 0)
     4     0 0 0 0 | 0 0 0|0|| 1 1 1 1 | 1 1 1 1  <-+
     5     0 0 0 0 | 0 0 0|1|| 0 0 0 0 | 0 0 0 0  <-+
     6     0 0 0 0 | 0 0 0|1|| 0 0 0 0 | 0 0 0 1    | Group 2
     7     0 0 0 0 | 0 0 0|1|| 0 0 0 0 | 0 0 1 0    | (C = 1)
     8     0 0 0 0 | 0 0 0|1|| 0 0 0 0 | 0 0 1 1  <-+
    ...         ...       +-+       ...    ...        Group k

               Figure 4: Example C bit based OG division

   To illustrate, as shown in Figure 2, VM1 initiates communication
   with VM3.  The packet flow from VM1 to VM3 passes through NVE A,
   underlay switches M and N, and NVE B.  The Controller sends a PM
   command to NVE A and NVE B simultaneously.  The PM command specifies
   the following information:

   1. which bit in the IPID field is taken as the C bit;

   2. the basic flow information, including the IP addresses of VM1 and
      VM3 and the protocol type (e.g., TCP or UDP).

   On receipt of this command, NVE A and NVE B count the transmitted
   and received packets, respectively, in each OG.  The OGs are divided
   based on the value of the C bit.  An integrated (i.e., complete) OG
   is delimited by two adjacent reversals of the C bit.
   To illustrate, as shown in Figure 4, the reversal from 0 to 1 marks
   the start of Group 2, while the following reversal from 1 to 0 marks
   its end.

   When NVE A and NVE B start to count, they first have to determine
   the integrated OGs.  NVE A and NVE B then report the counting
   results to the Controller.

   Example counting results of NVE A are shown below.

                  +-------------+-------+---------+
                  | Group index | C bit | pkt num |
                  +-------------+-------+---------+
                  |      1      |   1   |    a    |
                  |      2      |   0   |    b    |
                  |      3      |   1   |    c    |
                  |      4      |   0   |    d    |
                  +-------------+-------+---------+

               Table 1: Example counting results of NVE A

   Each time an integrated OG has been counted, NVE A reports the
   result to the Controller.  The Controller records the time at which
   it receives the result as t_A.

   Example counting results of NVE B are shown below.

                  +-------------+-------+---------+
                  | Group index | C bit | pkt num |
                  +-------------+-------+---------+
                  |      1      |   0   |   k'    |
                  |      2      |   1   |   a'    |
                  |      3      |   0   |   b'    |
                  |      4      |   1   |   c'    |
                  +-------------+-------+---------+

               Table 2: Example counting results of NVE B

   NVE B reports its counting results to the Controller in the same way
   as NVE A.  The Controller likewise records the time of receipt as
   t_B.

   To determine whether an OG reported by NVE A and an OG reported by
   NVE B match, the Controller goes through the following two steps:

   1. compare the C bit values of the two OGs;

   2. compare |t_A - t_B| with T, where T is the duration of a single
      OG.  T is determined by the choice of C bit and the flow rate.

   For example, OG(1) in Table 1 has C = 1 while OG(1) in Table 2 has
   C = 0.  These two OGs do not have the same C bit value, so the
   Controller does not consider them matched.  On the other hand,
   OG(2) in Table 2 is the next OG and has C = 1.
   OG(1) in Table 1 and OG(2) in Table 2 have the same C bit value, so
   the Controller proceeds to the next step and compares |t_A - t_B|
   with T.  If |t_A - t_B| < T, the Controller considers the two OGs
   matched.  Otherwise, the Controller considers them unmatched and
   simply ignores them.  When two OGs are matched, their packet counts
   can be used to determine whether packet loss has occurred between
   NVE A and NVE B.

4.  Packet Loss Measurement

   Packet loss is measured by comparing the packet counts of matched
   OGs.  In the example of Section 3, packet loss is computed as
   follows:

      Pkt_Loss = |a - a'| + |b - b'| + |c - c'|.

5.  Security Considerations

   Security considerations are not addressed in this document.

6.  IANA Considerations

   No IANA action is needed for this document.

7.  References

7.1  Normative References

   [RFC791]   Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

7.2  Informative References

   [Chen_2004]  Chen, W., Huang, Y., Ribeiro, B., Suh, K., Zhang, H.,
              Silva, E., Kurose, J. and D. Towsley, "Exploiting the IPID
              field to infer network path and end-system
              characteristics", 2004.

   [RFC6864]  Touch, J., "Updated Specification of the IPv4 ID Field",
              RFC 6864, February 2013.

   [RFC7348]  Mahalingam, M., Dutt, D., Duda, K., Agarwal, P., Kreeger,
              L., Sridhar, T., Bursell, M. and C. Wright, "Virtual
              eXtensible Local Area Network (VXLAN): A Framework for
              Overlaying Virtualized Layer 2 Networks over Layer 3
              Networks", RFC 7348, August 2014.

   [Savage_2000]  Savage, S., Wetherall, D., Karlin, A. and T. Anderson,
              "Practical Network Support for IP Traceback", October
              2000.

Authors' Addresses

   Hao Chen
   Huawei Technologies
   101 Software Ave., Yuhuatai Dist.
   Nanjing, Jiangsu  210012
   China

   Phone: +86-25-56628107
   EMail: philips.chenhao@huawei.com
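
   The OG counting of Section 3 and the loss computation of Section 4
   can be illustrated with a short Python sketch.  It is not part of
   the method's specification: the names c_bit, count_integrated_ogs,
   OgReport and packet_loss are hypothetical, bits are numbered from
   the least significant bit as bit 0 (as in Figure 4), and the sketch
   assumes in-order arrival.  It also interprets "integrated OG" as a
   run bounded by two C bit reversals, so the partial run before the
   first reversal and the unfinished run at the end are dropped.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


def c_bit(ipid: int, bit: int) -> int:
    """Return the C bit of a 16-bit IPID; bit 0 is the least significant."""
    return (ipid >> bit) & 1


def count_integrated_ogs(ipids: Iterable[int], bit: int) -> List[Tuple[int, int]]:
    """Return (C bit, packet count) for each integrated OG, in order."""
    groups: List[Tuple[int, int]] = []
    current = None   # C bit of the run being counted
    count = 0        # packets seen in the current run
    reversals = 0    # C bit reversals seen so far
    for ipid in ipids:
        c = c_bit(ipid, bit)
        if current is not None and c != current:
            # A run is integrated only if a reversal also opened it.
            if reversals >= 1:
                groups.append((current, count))
            reversals += 1
            count = 0
        current = c
        count += 1
    return groups


@dataclass
class OgReport:
    c: int          # C bit value of the OG
    pkt_num: int    # packets counted in the OG
    t_recv: float   # time the Controller received the report (t_A or t_B)


def packet_loss(reports_a: List[OgReport], reports_b: List[OgReport],
                t_og: float) -> int:
    """Sum |a - a'| over OG pairs with equal C bit and |t_A - t_B| < T."""
    loss = 0
    next_b = 0  # reports from NVE B before this index are already matched
    for og_a in reports_a:
        for k in range(next_b, len(reports_b)):
            og_b = reports_b[k]
            if og_a.c == og_b.c and abs(og_a.t_recv - og_b.t_recv) < t_og:
                loss += abs(og_a.pkt_num - og_b.pkt_num)
                next_b = k + 1
                break
    return loss


if __name__ == "__main__":
    # IPIDs 0x00FC..0x0204 with bit 8 as the C bit: the leading 4 packets
    # and the trailing 5 packets belong to partial runs and are dropped.
    print(count_integrated_ogs(range(0x00FC, 0x0205), bit=8))  # [(1, 256)]
```

   Unmatched OGs are skipped, mirroring the rule that the Controller
   ignores them.  A real NVE would also have to cope with reordering
   and with loss at an OG boundary, which this sketch does not attempt.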