Network Working Group                                      Chia Yuan Cho
Internet-document                                    Sukanta Kumar Hazra
Expires: August 2004
                                                        February 9, 2004

                 Statistical Inter-flow Field Behaviour
                  for Context Replication in ROHC-TCP

Status of This Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.

Copyright Notice

Copyright (C) The Internet Society (2004). All Rights Reserved.

Abstract

Context replication increases header compression gains by reducing the redundancy between flows via efficient replicate (IR-CR) packets. The optimum design of IR-CR packet formats requires a detailed understanding of inter-flow redundancy. As context replication is best suited to TCP, this document presents a statistical analysis of TCP/IP inter-flow field behaviour. Based on the analysis, recommendations on ROHC-TCP packet format specifications for context replication are made. It is also shown that inter-flow field behaviour is inherently and significantly asymmetrical, and various ways of handling this asymmetry are considered. Finally, based on the inter-flow behaviour of the TCP Window field, it is noted that current encoding methods do not compress it efficiently.

Internet-document   Statistical Inter-flow Field Behaviour     February 2004
                    for Context Replication in ROHC-TCP

Table of contents

1. Introduction
2. Terminology
3. Header Compression Model
4. Methodology
5. Results
   5.1. IPv4 Identification
   5.2. IP Don't Fragment and Time To Live
   5.3. IP Destination Address
   5.4. TCP Source Port
   5.5. TCP Destination Port
   5.6. TCP Sequence Number and Acknowledgement Number
   5.7. TCP Flags and Urgent Pointer
   5.8. TCP Window
   5.9. TCP Checksum
   5.10. TCP Options
   5.11. Mean Sizes of Compressed Fields
6. Handling Asymmetrical Inter-flow Behaviour
7. Security Considerations
8. References
9. Authors' Addresses
Appendix A. State Transition Threshold

1. Introduction

Context replication offers an alternative to the conventional context initialization procedure by performing context initialization via more efficient IR-CR packets. In contrast to IR packets, which contain mostly uncompressed fields, IR-CR packets carry compressed header fields, obtained by reducing the redundancy between packets of different flows. As such, header compression can potentially start from the very first packet of a flow, and compression efficiency is improved.

The motivations for context replication, as well as a detailed description of the context replication mechanism, are already given in [ROHC-CR]. Although context replication is a general ROHC mechanism, this document focuses on its application to the ROHC-TCP profile in particular. This is because the motivation for context replication originated from the ROHC-TCP profile and, furthermore, because TCP flows tend to be short-lived, context replication improves header compression gains most significantly for the ROHC-TCP profile.

Context replication is possible because of the significant redundancy between multiple simultaneous or near-simultaneous flows passing through the same compressor-decompressor pair. For any header compression scheme to work, the first step has to be understanding the field behaviour in order to recognize areas of redundancy. By its nature, context replication focuses on the relatively unexplored inter-flow field behaviour, rather than on the well-understood intra-flow field behaviour. In this respect, [TCP-BEH] provides an elaborate qualitative analysis of TCP/IP field behaviour. However, it focuses more on the intra-flow aspect than on the inter-flow aspect, and this document is meant in part as an extension of it in that direction. The difficulty of understanding and describing inter-flow field behaviour is compounded by the fact that it depends on human usage patterns in addition to the underlying protocol characteristics. This gives inter-flow field behaviour a much larger variance and a higher degree of uncertainty.

In this document, a method of extracting the inter-flow field behaviour relevant to context replication is presented, together with the quantitative results of a statistical analysis of TCP/IP inter-flow behaviour, based on four Tcpdump traces containing 1.9 million TCP/IP packet samples. From the results, a number of recommendations are made. Firstly, a possibly optimal combination of encoding methods for each field during context replication is recommended, together with parameters and estimated probabilities of success for each encoding method. Secondly, it is shown that inter-flow field behaviour is significantly asymmetrical, and ways of handling this behaviour are explored. Finally, it is noted that current encoding methods can be improved upon to compress the Window field more efficiently.

For verification of the replicate packet format specifications prescribed in this document, the EPIC-LITE implementation [EPIC-IMPL] from the University of Split was modified to support context replication.

2. Terminology

This document reuses some of the terminology found in [RFC-3095], [ROHC-TCP], [ROHC-CR], [TCP-BEH], [EPIC-LITE] and [ROHC-FN]. In addition, this document defines the following terms:

'Incoming' and 'Outgoing' Packets
   'Incoming' packets are packets traveling towards client hosts through the channel of interest over which ROHC is employed. 'Outgoing' packets are packets traveling away from client hosts through the channel of interest over which ROHC is employed.

Asymmetrical Header Compression
   Header compression is performed asymmetrically when 'incoming' and 'outgoing' packets are compressed differently. This requires the packet format specifications for compressor-decompressor pairs to be configured differently depending on the direction of packet flow they handle.

Replication Match Rate
   The replication match rate for a trace is defined as the percentage of uni-directional flows within the trace which can be context replicated. A new flow is replicable when at least one suitable base context is present in the compressor upon arrival of the first packet of the flow. The replication match rate serves as an estimate of the probability of using context replication for context initialization.

State Transition Threshold
   The State Transition Threshold for a uni-directional flow is the number of initial TCP/IP packets (near the start of a flow) converted into IR or IR-CR packets.

3. Header Compression Model

With the objective of extracting the TCP/IP inter-flow field behaviour, we focus on the deployment of ROHC over the final hop. The ROHC compressor-decompressor pair is deployed at the two endpoints of the (possibly wireless) low-bandwidth channel and cooperates to transmit packets efficiently in the direction towards the decompressor. Since TCP requires a full-duplex channel, another compressor-decompressor pair may be present to compress packets in the reverse direction. Considering the direction of packet flow with respect to the clients using the low-bandwidth channel, packets can thus be classified as 'incoming' or 'outgoing'. 'Incoming' and 'outgoing' packets use different compressor-decompressor pairs. This is shown in Fig. 1.

Although ROHC was originally targeted at cellular links, the convergence of the telecommunication and computer communication industries means that it may be employed over wireless links in general. As such, the header compression model in Fig. 1 does not define the target 'low-bandwidth' channel explicitly. Mobile terminal clients are connected to the Internet via a last-hop router node, as seen in Fig. 1; we focus on the 'header compression entity' situated on the data link layer of that node.
This entity can have different manifestations depending on the nature of the wireless link. For example, in the Universal Mobile Telecommunications System (UMTS), the ROHC entity is part of the Packet Data Convergence Protocol (PDCP) sub-layer in the Radio Network Controller (RNC); if ROHC is employed over Wireless Ethernet (IEEE 802.11), it can be part of the data link layer on a wireless router; in Mobile Ad Hoc networks, the ROHC entity can reside on a 'forwarding node'.

      +---+  'outgoing'
      | C |---
      +---+   ---          +-------+                             +------+
      | D |<--   ---       | +---+ |                          -->|Server|
      +---+   ---   ------>| | D | |    - - - - - - - - - -  --  +------+
                   ---     | +---+ |   /                   \ --
      'incoming'        ---| | C | |   |                   |<--
                           | +---+ |   |                   |     +------+
      Clients              |       |<->|     Internet      |<--->|Server|
                           | +---+ |   |                   |     +------+
      'outgoing'        -->| | D | |   |                   |<--
                   ---     | +---+ |   \                   / --
      +---+   ---   -------| | C | |    - - - - - - - - - -  --  +-------+
      | C |---   ---       | +---+ |                          -->| Other |
      +---+   ---          +-------+                             |Clients|
      | D |<--             Last-hop                              +-------+
      +---+ 'incoming'      Router
           |______________|         |___________________________|________|
             Low-bandwidth                      Wired        Wired or Wireless
                Channel

   C - Compressor
   D - Decompressor

   Fig. 1: Header compression model showing 'incoming' and 'outgoing' flows

Due to the nature of the protocol suite under study, we expect client-server computing to dominate over peer-to-peer computing, as is currently the case. As such, 'incoming' and 'outgoing' flows are inherently asymmetrical. As noted in [ROHC-TCP], some asymmetry is already present in TCP/IP intra-flow field behaviour.
An example is the relationship between TCP Sequence Number and Acknowledgement Number: 'outgoing' flows are likely to exhibit large deltas between consecutive packets in Acknowledgement Number and small deltas in Sequence Number, while the converse is likely for 'incoming' flows. With respect to context replication, [ROHC-TCP] also acknowledges some inter-flow asymmetry in the TCP source and destination ports.

As will be shown in Section 5, asymmetry becomes even more pronounced between flows. Fig. 1 also serves to illustrate that asymmetrical header compression, if desired, can be achieved by configuring compressor-decompressor pairs differently based on their 'incoming' or 'outgoing' role.

Finally, it should be noted that the focus on ROHC over the final hop in Fig. 1 does not limit the applicability of the results obtained on inter-flow behaviour. In general, header compression may be deployed over any hop, e.g. over core network links using Multiprotocol Label Switching (MPLS), or over intermediate hops in Mobile Ad Hoc networks. Regardless of where ROHC is deployed, the TCP/IP endpoints remain the same. The advantage of focusing on the last hop, then, is that it allows any asymmetrical behaviour to be isolated. Over intermediate hops, traffic in each direction may mix 'incoming' and 'outgoing' flows, so the inherent asymmetrical behaviour is lost. Nevertheless, over intermediate hops the inter-flow results remain applicable using the symmetric treatment prescribed in Section 6.

4. Methodology

Given the wide variability of inter-flow field behaviour, a suitable methodology for obtaining the inter-flow field behaviour relevant to context replication is proposed.

Inter-flow field behaviour can be obtained by emulating a context-replication-enabled compressor. To observe any asymmetrical behaviour, Tcpdump traces are fed into the 'compressor emulator' separately according to the direction in which packets flow, i.e. 'incoming' or 'outgoing'. Thus, the emulator simulates the compressors found on client terminals and routers in the 'outgoing' and 'incoming' directions respectively. In the same way as a compressor, the emulator creates, maintains and updates a list of contexts dynamically for each arriving packet.

The emulator keeps an extensible list of contexts, one for each unique TCP connection, arranged in a Most Recently Used (MRU) stack. Each TCP/IP packet updates the context unique to its flow. A context retrieved for updating or referencing is placed at the top of the stack; if a base context has simultaneously been used as a reference, it is placed directly below. Whenever possible, each new flow is context replicated. Context replication is possible when a base context exists; the implementation-dependent selection criteria used here require the base context to share the IP source address, and prefer, but do not require, the same IP destination address. For simplicity, all contexts are assumed to be acknowledged by default. Furthermore, if the first packet of a flow can be context replicated, it is assumed that the subsequent two packets of the flow are also replicated. This means that up to the first 3 packets of each flow are converted into IR-CR packets. This number is the upper bound of the State Transition Threshold range, and is based on an estimate of the upper bound of the number of TCP/IP packets that may be converted into IR-CR packets. This is elaborated in Appendix A.
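The emulator described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the modified EPIC-LITE code used for this study: each packet is reduced to a flow identifier (src_ip, dst_ip, src_port, dst_port) plus a dictionary of numeric header fields, and all class, method and packet-type names (CompressorEmulator, process, replication_match_rate, "CO") are hypothetical. It models the MRU stack, the base context selection criteria (shared IP source required, same IP destination preferred), acknowledged contexts, and a State Transition Threshold of 3.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Optional

# Upper bound of the State Transition Threshold range used in this document.
STATE_TRANSITION_THRESHOLD = 3


@dataclass
class Context:
    flow_id: tuple                   # (src_ip, dst_ip, src_port, dst_port)
    fields: dict                     # field values of the last packet seen
    base_id: Optional[tuple] = None  # flow_id of the base context, if replicated
    packets_seen: int = 0


class CompressorEmulator:
    """Hypothetical emulator of a context-replication-enabled compressor.

    One instance is fed only 'incoming' or only 'outgoing' packets, and
    every context is assumed to be acknowledged.
    """

    def __init__(self):
        self.stack = OrderedDict()   # MRU stack: the last entry is the top
        self.new_flows = 0
        self.replicated_flows = 0
        self.deltas = []             # (field name, inter-flow delta) per IR-CR field

    def _select_base(self, src_ip, dst_ip):
        # Search from the top of the stack downwards. A shared IP source is
        # required; the same IP destination is preferred but not required.
        fallback = None
        for ctx in reversed(self.stack.values()):
            if ctx.flow_id[0] != src_ip:
                continue
            if ctx.flow_id[1] == dst_ip:
                return ctx
            if fallback is None:
                fallback = ctx
        return fallback

    def process(self, flow_id, fields):
        """Update the flow's context and return the packet type that would be sent."""
        ctx = self.stack.get(flow_id)
        if ctx is None:
            self.new_flows += 1
            base = self._select_base(flow_id[0], flow_id[1])
            ctx = Context(flow_id, dict(fields), base.flow_id if base else None)
            self.stack[flow_id] = ctx
            if base is not None:
                self.replicated_flows += 1
        ctx.packets_seen += 1

        if ctx.packets_seen > STATE_TRANSITION_THRESHOLD:
            ptype = "CO"             # ordinary compressed packet
        elif ctx.base_id is None:
            ptype = "IR"
        else:
            ptype = "IR-CR"
            base = self.stack[ctx.base_id]
            # Inter-flow delta: value in the current packet minus the value
            # stored in the base context (both contexts carry the same fields).
            for name, value in fields.items():
                self.deltas.append((name, value - base.fields[name]))
            self.stack.move_to_end(base.flow_id)  # base ends up just below the top

        ctx.fields = dict(fields)
        self.stack.move_to_end(flow_id)           # current context on top
        return ptype

    def replication_match_rate(self):
        """Percentage of new flows for which a base context was found."""
        if self.new_flows == 0:
            return 0.0
        return 100.0 * self.replicated_flows / self.new_flows
```

Feeding two flows from the same source to the same destination yields "IR" for the first packet of the first flow and "IR-CR" for the first packet of the second, giving a replication match rate of 50.0; the recorded deltas are what the frequency tables in Section 5 summarize.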

Although results are shown at the upper bound of the State Transition Threshold, the inter-flow field behaviour was also found to remain invariant at smaller State Transition Threshold values.

For the purpose of this study, four Tcpdump traces totaling 1.9 million packets were captured within the Local Area Network (LAN) of the Institute for Infocomm Research. The LAN configuration is shown in Fig. 2. Macro statistics of each trace are shown in Table 1.

   +--------+
   | Client |
   |Terminal|<-
   +--------+  -
                -  +--------+
                 ->|Last-Hop|
                 ->| Router |<-
                -  +--------+  -
   +--------+  -                -  +--------+
   | Client |<-                  ->|  NAT   |
   |Terminal|                    ->| Router |<-
   +--------+                   -  +--------+  -
                   +--------+  -                -  +--------+
                   |Last-Hop|<-                  ->| Border |<->Internet
                 ->| Router |                    ->|Gateway |
                -  +--------+      +--------+   -  +--------+
               -                   |  NAT   |<--
         ... <-                  ->| Router |
                                -  +--------+
                               -
                         ... <-

   Fig. 2: Configuration of Local Area Network

Three of the four traces were captured at the Border Gateway, so that traffic from a large number of client terminals could be gathered in each single trace. However, as in many LANs, Network Address Translation (NAT) is in use. NAT transparently changes the Source IP Address and Port of 'outgoing' packets, as well as the Destination IP Address and Port of 'incoming' packets. Thus, packets captured at the Border Gateway reflect the translated values rather than the original values. To deal with this, the fourth trace, TCP180903, captured at a client terminal, was used to investigate these fields as well as to verify results from the traces captured at the Border Gateway.

   +---------------+-----------+------------+------------+-----------+
   | Trace         | TCP180803 | TCP080903a | TCP080903b | TCP180903 |
   |Identification |           |            |            |           |
   +---------------+-----------+------------+------------+-----------+
   | Duration      | 30 min    | 30 min     | 30 min     | 27.4 hrs  |
   +---------------+-----------+------------+------------+-----------+
   | Location      | Gateway   | Gateway    | Gateway    | Client    |
   |               | Router    | Router     | Router     | Terminal  |
   +---------------+-----------+------------+------------+-----------+
   |No. of packets | 516172    | 509281     | 507293     | 383594    |
   +---------------+-----------+------------+------------+-----------+
   | Replication   | 97.5      | 94.4       | 94.3       | 93.4      |
   | Match Rate(%) |           |            |            |           |
   +---------------+-----------+------------+------------+-----------+

   Table 1: Macro statistics of Tcpdump traces

By using packets captured from our LAN, it is assumed that TCP/IP inter-flow field behaviour does not vary significantly between the wired Ethernet-based channel and the target low-bandwidth, possibly less reliable channel where header compression takes place. Provided the header compression layer is sufficiently robust to be transparent, this is reasonable, because the upper (network, transport and application) layer protocol characteristics and human usage behaviour remain the same.

The inter-flow behaviour of TCP/IP fields should be mapped using a classification system in which fields within a category share the same characteristics. [TCP-BEH] already provides a good classification system for intra-flow field behaviour: INFERRED, STATIC, STATIC-DEF, STATIC-KNOWN and CHANGING, where each category follows some general trend(s) hinting at how fields in that category may be compressed.
For inter-flow behaviour, [TCP-BEH] uses a different classification system, 'N/A', 'No' and 'Yes', which unfortunately is less effective: it only indicates whether a field is compressible for context replication, not how to compress it suitably. Therefore, in this document, inter-flow field behaviour is classified using the same categories as for intra-flow behaviour: INFERRED, STATIC, STATIC-KNOWN and CHANGING. It should be noted, however, that here the categories describe inter-flow field behaviour. Furthermore, STATIC-DEF is merged into STATIC here, because it is meaningless to define a separate STATIC category for fields that define a packet stream when inter-flow field behaviour is concerned.

Classification can be done by observing the range of deltas. Here, delta is defined as the difference between the field value in the current packet and the field value stored in the base context. Delta analysis is useful for the following reasons. For any field not known to be INFERRED or STATIC-KNOWN, if delta = 0 in all samples, then the field is a STATIC field; otherwise, the field is categorized as CHANGING. For CHANGING fields, further analysis of the range of deltas shows whether the field can still be encoded using the STATIC encoding method with significant probability. Where deltas tend to be small, the number of least significant bits needed (in LSB encoding) to encode the field with a significant probability of success can be determined. Fields which tend to have uniformly distributed deltas may only be suitably encoded as IRREGULAR.
Finally, where certain unique trends are observed for a field, raw and/or network-byte-order converted versions of the field values are also studied.

5. Results

Our initial categorization is shown in Table 2. Differences between the intra-flow classification (in [TCP-BEH]) and the inter-flow classification here are marked with '(2)'. At this stage, no asymmetry is observed in the categorization between 'incoming' and 'outgoing' flows.

   +-----------------------------------+------------+
   |Field                              |Category    |
   +-----------------------------------+------------+
   |IPv4 Version                       |STATIC      |
   |IPv4 Header Length                 |STATIC-KNOWN|
   |IPv4 Type Of Service               |STATIC(1)   |
   |IPv4 ECN Capable Transport         |STATIC(1)   |
   |IPv4 Congestion Experienced        |STATIC(1)   |
   |IPv4 Packet Length                 |INFERRED    |
   |IPv4 Identification                |CHANGING    |
   |IPv4 Reserved Flag                 |STATIC(1)   |
   |IPv4 Don't Fragment Flag           |CHANGING    |
   |IPv4 More Fragments Flag           |STATIC-KNOWN|
   |IPv4 Fragment Offset               |STATIC-KNOWN|
   |IPv4 Time To Live                  |CHANGING    |
   |IPv4 Protocol                      |STATIC      |
   |IPv4 Header Checksum               |INFERRED    |
   |IPv4 Source Address                |STATIC      |
   |IPv4 Destination Address           |CHANGING(2) |
   |TCP Source Port                    |CHANGING(2) |
   |TCP Destination Port               |CHANGING(2) |
   |TCP Sequence Number                |CHANGING    |
   |TCP Acknowledgement Number         |CHANGING    |
   |TCP Data Offset                    |INFERRED    |
   |TCP Reserved                       |STATIC(1)   |
   |TCP Congestion Window Reduced      |STATIC(1)   |
   |TCP Echo Congestion Experienced    |STATIC(1)   |
   |TCP URG flag                       |CHANGING    |
   |TCP ACK flag                       |CHANGING    |
   |TCP PSH flag                       |CHANGING    |
   |TCP RST flag                       |CHANGING    |
   |TCP SYN flag                       |CHANGING    |
   |TCP FIN flag                       |CHANGING    |
   |TCP Window                         |CHANGING    |
   |TCP Checksum                       |CHANGING    |
   |TCP Urgent Pointer                 |CHANGING    |
   |TCP Options                        |CHANGING    |
   +-----------------------------------+------------+
   (1) These fields were found to be STATIC in the samples, but context replication should follow the classification in [TCP-BEH] for future-proofing.
   (2) Differs from the intra-flow classification in [TCP-BEH] due to context replication.

   Table 2: TCP/IP Fields and Classifications

Some fields are categorized differently in this study because of the currently slow adoption of the IP and TCP congestion notification fields. However, these fields are expected to be used in future and should then be treated as CHANGING rather than STATIC.

The encoding methods to be used for STATIC, STATIC-KNOWN and INFERRED fields are straightforward, but CHANGING fields need further analysis, which is presented in the following sub-sections. CHANGING fields can sometimes be encoded with STATIC, LSB or other encoding methods with significant probability. For LSB encoding, the suitable number of least significant bits used to encode the field must be determined. Therefore, our frequency bins are defined by increasing ceil(log2(|delta|+1)) (the reason for this expression is explained in Section 5.1), which is effectively the minimum number of bits needed to encode the delta values within that bin. Negative delta values are mapped to -ceil(log2(|delta|+1)), and are useful for defining the offset value used in LSB encoding. From our frequency tables, we can also derive a suitable combination of encoding methods to use, as well as the estimated probability of each encoding method being used.

The inter-flow behaviour of CHANGING fields can be summarized directly in the form of packet format specifications for IR-CR packets. This is shown in Fig. 3, in EPIC-LITE notation [EPIC-LITE], which is derived from the Augmented BNF (ABNF) notation [RFC-2234].
To illustrate asymmetrical inter-flow behaviour, packet format
specifications that differ between 'incoming' and 'outgoing' flows
are defined separately for each field, with the postfix '_in' or
'_out'. Note, however, that if the same set of encoding methods is
used in both directions for a field and only the probabilities
differ, this may mean that significant asymmetrical behaviour has not
been observed for that field.

Internet-document  Statistical Inter-flow Field Behaviour   February 2004
                   for Context Replication in ROHC-TCP

Identification_in  ::= NBO(16)   ;network byte order
                       LSB(3,-1,50%) | LSB(8,-1,17%) | IRREGULAR(16,33%)
Identification_out ::= NBO(16)
                       LSB(3,-1,65%) | LSB(8,-1,14%) | IRREGULAR(16,21%)

Don't_Fragment_in  ::= STATIC(73%) | IRREGULAR(1,27%)
Don't_Fragment_out ::= STATIC(99%) | IRREGULAR(1,1%)

Time_To_Live_in  ::= STATIC(98%) | IRREGULAR(8,2%)
Time_To_Live_out ::= STATIC(97%) | IRREGULAR(8,3%)

Destination_Address_in  ::= STATIC(100%)
Destination_Address_out ::= STATIC(86%) | IRREGULAR(32,14%)

Source_Port_in  ::= STATIC(70%) | IRREGULAR(16,30%)
Source_Port_out ::= LSB(3,0,73%) | LSB(8,0,14%) | IRREGULAR(16,13%)

Destination_Port_in  ::= LSB(3,0,73%) | LSB(8,0,14%) |
                         IRREGULAR(16,13%)
Destination_Port_out ::= STATIC(70%) | IRREGULAR(16,30%)

Sequence_Number ::= IRREGULAR(32,100%)

Acknowledgement_Number_in  ::= IRREGULAR(32,100%)
Acknowledgement_Number_out ::= VALUE(32,0,33%) | IRREGULAR(32,67%)

URG_flag ::= IRREGULAR(1,100%)

ACK_flag ::= IRREGULAR(1,100%)

PSH_flag ::= IRREGULAR(1,100%)

RST_SYN_FIN_flag ::= VALUE(3,2,30%) | VALUE(3,0,65%) |
                     IRREGULAR(3,5%)

Urgent_Pointer ::= STATIC(99%) | IRREGULAR(16,1%)

Window_in  ::= STATIC(30%) | IRREGULAR(16,70%)
Window_out ::= STATIC(43%) | IRREGULAR(16,57%)

Fig. 3. Packet format specifications for CHANGING fields.

In Fig. 3, specifications are expressed in the notation used by
EPIC-LITE rather than the Formal Notation [ROHC-FN], for a number of
reasons. Firstly, the basic encoding methods used in both remain the
same, so EPIC-LITE expressions can easily be converted into the
Formal Notation. Moreover, the equivalent of the
'multiple_packet_formats' encoding method in ROHC-FN, used to specify
multiple encoding methods for a field, can be represented in a more
compact form using the OR operator '|' in EPIC-LITE. Also, because
EPIC-LITE involves Huffman coding, it allows the probability of each
encoding method being successful to be expressed as a parameter,
which is also useful for expressing how frequently an encoding method
is used. Finally, it allows the packet format specifications to be
readily verified via a context replication implementation in
EPIC-LITE.

Details of the inter-flow behaviour of each CHANGING field are
elaborated in the following sub-sections.

5.1. IPv4 Identification

Table 3 shows the distribution of delta values on a logarithmic
scale. Note that for delta > 0, the number of bits used to encode the
delta may be expressed as n = ceil(log2(|delta|+1)), as we are trying
to find the smallest n for which delta <= 2^n - 1. For delta < 0, the
equivalent mapping is n = -ceil(log2(|delta|+1)). A delta of 0 falls
in bin n = 0.
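The bin mapping above can be checked with a short Python sketch. It is illustrative only, and the function names are hypothetical. It relies on the identity that, for any non-negative integer x, int.bit_length(x) equals ceil(log2(x+1)). This avoids floating-point rounding at exact powers of two.

```python
from collections import Counter


def delta_bin(delta):
    """Signed bin n for an inter-flow delta, as used in Tables 3 and 6-8.

    n = ceil(log2(|delta| + 1)) for delta >= 0, negated for delta < 0.
    For a non-negative integer x, x.bit_length() equals ceil(log2(x + 1)),
    so no floating-point logarithm is needed.
    """
    n = abs(delta).bit_length()
    return -n if delta < 0 else n


def bin_frequencies(deltas):
    """Map each occupied bin n to the percentage of deltas falling in it."""
    counts = Counter(delta_bin(d) for d in deltas)
    total = len(deltas)
    return {n: 100.0 * counts[n] / total for n in sorted(counts)}


if __name__ == "__main__":
    # Bin edges follow the Delta Range column: [2:3] -> 2, [4:7] -> 3, ...
    for d in (0, 1, 3, 4, 65535, -1, -32768):
        print(d, delta_bin(d))
```

With these inputs, a delta of 4 lands in bin 3 (range [4:7]) and -32768 lands in bin -16 (range [-65535:-32768]). Both results are consistent with the table rows.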
+--------+---------------+-----------+-----------+
|Encoded | Delta Range   | Incoming  | Outgoing  |
|Bits, n |               | Frequency | Frequency |
+--------+---------------+-----------+-----------+
|-16     |[-65535:-32768]| 6.0%      | 2.3%      |
|-15     |[-32767:-16384]| 4.5%      | 2.1%      |
|-14     |[-16383:-8192] | 2.4%      | 2.1%      |
|-13     |[-8191:-4096]  | 1.5%      | 0.8%      |
|-12     |[-4095:-2048]  | 0.7%      | 0.6%      |
|-11     |[-2047:-1024]  | 0.3%      | 0.3%      |
|-10     |[-1023:-512]   | 0.2%      | 0.1%      |
|-9      |[-511:-256]    | 0.1%      | 0.1%      |
|-8      |[-255:-128]    | 0.1%      | 0.1%      |
|-7      |[-127:-64]     | 0.0%      | 0.0%      |
|-6      |[-63:-32]      | 0.0%      | 0.0%      |
|-5      |[-31:-16]      | 0.0%      | 0.0%      |
|-4      |[-15:-8]       | 0.0%      | 0.0%      |
|-3      |[-7:-4]        | 0.1%      | 0.0%      |
|-2      |[-3:-2]        | 0.2%      | 0.2%      |
|-1      |[-1]           | 0.6%      | 0.4%      |
|0       |[0]            | 0.3%      | 0.0%      |
|1       |[1]            | 23.4%     | 33.7%     |
|2       |[2:3]          | 20.6%     | 20.8%     |
|3       |[4:7]          | 6.6%      | 10.5%     |
|4       |[8:15]         | 3.9%      | 4.3%      |
|5       |[16:31]        | 3.6%      | 3.3%      |
|6       |[32:63]        | 3.6%      | 2.4%      |
|7       |[64:127]       | 3.4%      | 2.0%      |
|8       |[128:255]      | 2.3%      | 1.6%      |
|9       |[256:511]      | 2.3%      | 1.2%      |
|10      |[512:1023]     | 1.7%      | 1.2%      |
|11      |[1024:2047]    | 1.4%      | 1.1%      |
|12      |[2048:4095]    | 0.9%      | 1.0%      |
|13      |[4096:8191]    | 1.3%      | 1.1%      |
|14      |[8192:16383]   | 2.5%      | 2.3%      |
|15      |[16384:32767]  | 3.0%      | 2.4%      |
|16      |[32768:65535]  | 2.4%      | 1.9%      |
+--------+---------------+-----------+-----------+

Table 3: Frequency distribution of Identification delta

Slightly asymmetrical behaviour can be observed in Table 3. 'Incoming'
replicated packets are less likely to be encodable within 3 bits than
'outgoing' replicated packets. Moreover, 'incoming' delta values are
more widely distributed, with a higher occurrence of negative deltas
as well as of deltas encodable in 6 to 10 bits.
This is reasonable because 'incoming' replicated packets face larger
deltas, as busy servers handle multiple connections simultaneously or
near-simultaneously.

Inter-flow Identification deltas for 'outgoing' replicated packets
tend to be smaller than for 'incoming' ones, as clients do not usually
maintain a large number of simultaneous or near-simultaneous TCP
connections.

It should be noted that Table 3 depicts network-byte-order corrected
Identification deltas. Typical implementation policies for the IPv4
Identification increment are: sequential (increments by 1),
sequential-jump (typically increments by 256) and random. Linux-based
implementations usually implement the sequential policy, and older
versions of Microsoft Windows usually implement the sequential-jump
policy with a jump size of 256. This is equivalent to incrementing
the more significant byte of the two-byte Identification field by 1.
From a compression viewpoint, sequential-jump implementations can be
network-byte-order corrected at the compressor end and reverted to
the original form at the decompressor end. This approach has the
advantage of compressing Identification fields generated by both
policies efficiently using the same encoding method. A network byte
order (NBO) flag is communicated to differentiate between the two
policies. Randomly incremented Identification implementations cannot
be efficiently compressed and are sent as-is.

Current proposals for context replication compress the IPv4
Identification field into 0 or 16 bits, using the VALUE and IRREGULAR
encoding methods respectively. The VALUE encoding method is suitable
for protocols like DHCP, and is not seen in Fig. 3 because we are
focusing on TCP/IP.
However, the inter-flow behaviour above shows that this field can
also be compressed more efficiently using LSB encoding, with the
recommended parameters shown in Fig. 3.

5.2. IP Don't Fragment Flag and Time to Live

The DF flag is a single bit which may be set or unset. Although it
may be impractical to allow multiple encoding methods for a
single-bit field, STATIC and IRREGULAR encoding methods are used here
for the sake of characterizing its behaviour. The IPv4 TTL (or
equivalently, the IPv6 Hop Limit) is an 8-bit field which remains
constant while the route between the two endpoints is unchanged; when
the route does change (e.g. due to congestion), it is better to
simply send the field uncompressed. Therefore, DF can be analyzed in
the same category as TTL: each is either encoded as STATIC, or sent
uncompressed as IRREGULAR. The actual probabilities associated with
each encoding method, based on the samples, are shown in Table 4.

+----------------+--------+-----------+
|Encoding Method | STATIC | IRREGULAR |
+----------------+--------+-----------+
|          'Incoming' flows           |
+----------------+--------+-----------+
|Don't Fragment  | 72.8%  | 27.2%     |
|Time To Live    | 98.1%  | 1.9%      |
+----------------+--------+-----------+
|          'Outgoing' flows           |
+----------------+--------+-----------+
|Don't Fragment  | 98.5%  | 1.5%      |
|Time To Live    | 96.9%  | 3.1%      |
+----------------+--------+-----------+

Table 4: Percentage frequency of STATIC and IRREGULAR for DF and TTL

5.3. IP Destination Address

We have allowed an implementation to use context replication in
scenarios where packets share at least the same Source IP Address,
but where the Destination IP Address may be different.
Therefore, the Destination IP Address may be STATIC or IRREGULAR in
these two scenarios.

The proportion of IR-CR packets replicable with the same or a
different Destination IP Address is of interest, as it determines how
effective context replication across different Destination IP
Addresses can be. This proportion is tabulated in Table 5.

+------------+----------+-----------+
|            | STATIC   | IRREGULAR |
+------------+----------+-----------+
| 'Incoming' | 100.0%   | 0.0%      |
| 'Outgoing' | 85.8%    | 14.2%     |
+------------+----------+-----------+

Table 5: Percentage frequency of STATIC and IRREGULAR for IP
         Destination Address

As can be noted from Table 5, the results are skewed towards STATIC
(same Destination IP Address). This is because our emulator selects
the base context with a preference for sharing the same Source and
Destination IP Addresses, although it is much easier to find contexts
sharing only the same Source IP Address. For some intervals, the
proportion of 'outgoing' IRREGULAR cases reached as high as 48%.

Asymmetry is again observed to be inherent between 'incoming' and
'outgoing' flows. 'Incoming' flows originating from Internet servers
are not likely to engage multiple clients in a common subnet within a
short period of time. However, the converse is true for 'outgoing'
flows, corresponding to prevalent usage patterns.

Our results also justify an implementation which considers context
replication even when the Destination IP Address is different. This
maximizes context replication efficiency gains for 'outgoing' flows.

5.4. TCP Source Port

As can be seen from Table 6, clearly asymmetrical inter-flow
behaviour is observed for the TCP Source Port field.
This behaviour is seen mainly because ports at servers are well-known
ports which remain unchanged.

+--------+---------------+-----------+-----------+
|Encoded | Delta Range   | Incoming  | Outgoing  |
|Bits, n |               | Frequency | Frequency |
+--------+---------------+-----------+-----------+
|-16     |[-65535:-32768]| 0.0%      | 0.0%      |
|-15     |[-32767:-16384]| 0.0%      | 0.0%      |
|-14     |[-16383:-8192] | 0.0%      | 0.0%      |
|-13     |[-8191:-4096]  | 0.0%      | 0.3%      |
|-12     |[-4095:-2048]  | 5.8%      | 0.2%      |
|-11     |[-2047:-1024]  | 1.8%      | 0.6%      |
|-10     |[-1023:-512]   | 0.1%      | 1.7%      |
|-9      |[-511:-256]    | 1.0%      | 0.0%      |
|-8      |[-255:-128]    | 0.5%      | 0.0%      |
|-7      |[-127:-64]     | 0.7%      | 0.0%      |
|-6      |[-63:-32]      | 0.7%      | 0.0%      |
|-5      |[-31:-16]      | 0.0%      | 0.0%      |
|-4      |[-15:-8]       | 0.0%      | 0.0%      |
|-3      |[-7:-4]        | 0.3%      | 0.1%      |
|-2      |[-3:-2]        | 0.0%      | 0.1%      |
|-1      |[-1]           | 0.0%      | 2.3%      |
|0       |[0]            | 72.0%     | 15.8%     |
|1       |[1]            | 0.0%      | 31.9%     |
|2       |[2:3]          | 0.0%      | 17.4%     |
|3       |[4:7]          | 0.0%      | 7.8%      |
|4       |[8:15]         | 0.1%      | 4.7%      |
|5       |[16:31]        | 0.1%      | 3.3%      |
|6       |[32:63]        | 0.3%      | 2.0%      |
|7       |[64:127]       | 0.3%      | 3.0%      |
|8       |[128:255]      | 0.7%      | 1.1%      |
|9       |[256:511]      | 0.8%      | 2.7%      |
|10      |[512:1023]     | 3.0%      | 3.2%      |
|11      |[1024:2047]    | 10.5%     | 1.5%      |
|12      |[2048:4095]    | 1.2%      | 0.1%      |
|13      |[4096:8191]    | 0.0%      | 0.3%      |
|14      |[8192:16383]   | 0.0%      | 0.0%      |
|15      |[16384:32767]  | 0.0%      | 0.0%      |
|16      |[32768:65535]  | 0.1%      | 0.0%      |
+--------+---------------+-----------+-----------+

Table 6: Frequency distribution of Source Port delta

5.5. TCP Destination Port

The inter-flow behaviour of the TCP Destination Port field is shown
in Table 7. The trend is the opposite of that observed for the TCP
Source Port in Section 5.4.
This is expected, because the Destination Ports of 'outgoing'
packets are the Source Ports of the corresponding 'incoming' reply
packets.

+--------+---------------+-----------+-----------+
|Encoded | Delta Range   | Incoming  | Outgoing  |
|Bits, n |               | Frequency | Frequency |
+--------+---------------+-----------+-----------+
|-16     |[-65535:-32768]| 0.0%      | 0.0%      |
|-15     |[-32767:-16384]| 0.0%      | 0.0%      |
|-14     |[-16383:-8192] | 0.0%      | 0.0%      |
|-13     |[-8191:-4096]  | 0.3%      | 0.0%      |
|-12     |[-4095:-2048]  | 0.0%      | 0.4%      |
|-11     |[-2047:-1024]  | 0.0%      | 4.1%      |
|-10     |[-1023:-512]   | 0.0%      | 2.0%      |
|-9      |[-511:-256]    | 0.0%      | 0.1%      |
|-8      |[-255:-128]    | 0.0%      | 0.9%      |
|-7      |[-127:-64]     | 0.0%      | 0.4%      |
|-6      |[-63:-32]      | 0.0%      | 0.5%      |
|-5      |[-31:-16]      | 0.0%      | 1.9%      |
|-4      |[-15:-8]       | 0.0%      | 0.0%      |
|-3      |[-7:-4]        | 0.3%      | 0.0%      |
|-2      |[-3:-2]        | 0.2%      | 0.2%      |
|-1      |[-1]           | 6.8%      | 0.0%      |
|0       |[0]            | 23.3%     | 74.3%     |
|1       |[1]            | 33.4%     | 0.0%      |
|2       |[2:3]          | 8.4%      | 0.1%      |
|3       |[4:7]          | 6.9%      | 0.0%      |
|4       |[8:15]         | 3.8%      | 0.1%      |
|5       |[16:31]        | 2.8%      | 0.1%      |
|6       |[32:63]        | 2.3%      | 0.8%      |
|7       |[64:127]       | 3.4%      | 0.2%      |
|8       |[128:255]      | 1.2%      | 0.4%      |
|9       |[256:511]      | 2.7%      | 0.8%      |
|10      |[512:1023]     | 2.4%      | 2.1%      |
|11      |[1024:2047]    | 1.4%      | 8.2%      |
|12      |[2048:4095]    | 0.0%      | 1.8%      |
|13      |[4096:8191]    | 0.4%      | 0.4%      |
|14      |[8192:16383]   | 0.0%      | 0.1%      |
|15      |[16384:32767]  | 0.0%      | 0.0%      |
|16      |[32768:65535]  | 0.0%      | 0.0%      |
+--------+---------------+-----------+-----------+

Table 7: Frequency distribution of Destination Port delta

5.6. TCP Sequence Number and Acknowledgement Number

The TCP Sequence Number (SEQNUM) cannot benefit from context
replication, as the inter-flow delta is random with a uniform
probability density function, regardless of the direction of flow.
The TCP Acknowledgement Number (ACKNUM) generally follows the
randomness of SEQNUM, but a particular behaviour can be exploited to
compress the first packet of most 'outgoing' flows. All handshaking
packets with SYN set but ACK clear (the first packet of a TCP
connection) carry an ACKNUM of zero. This behaviour is unique to
'outgoing' flows because service-requesting clients typically send
the first packet of a TCP connection. The first 'incoming' packet
typically has both SYN and ACK set, and a non-zero ACKNUM. Because up
to the first three packets of each flow may be replicated (see
Appendix A), these SYN packets represent roughly 33% to 100% of all
'outgoing' replicated packets. Thus, ACKNUM can at worst be
compressed as shown in Fig. 3.

Alternatively, instead of basing the specifications on asymmetry, all
compressor-decompressor pairs can treat the SYN-set, ACK-clear case
as a flag from which ACKNUM is inferred to be 0. These fields are
already handled appropriately as prescribed in [ROHC-TCP].

5.7. TCP Flags and Urgent Pointer

'TCP Flags' refers to the group of six TCP flags: URG (Urgent), ACK
(Acknowledgement), PSH (Push), RST (Reset), SYN (Synchronize) and FIN
(Finish).

The URG flag was found to be unset in almost the entire sample, i.e.
it is much more likely to be 0 than 1. In some applications, however,
the URG flag may be used extensively. Thus, it can be encoded as
IRREGULAR(1,100%). The URG flag is also useful for indicating the
presence of the Urgent Pointer field: the compressor-decompressor
pair can treat the Urgent Pointer as IRREGULAR when URG is set, and
as zero when URG is not set.

ACK is unset only in the first handshaking packet of each connection
(as discussed for ACKNUM), and in a minority of packets with RST set.
Since the proportion of IR-CR packets with ACK unset can range from
33% to 100%, ACK should be sent as IRREGULAR(1,100%).

PSH was found to vary unpredictably between 0 and 1, and is thus best
left as IRREGULAR(1,100%).

There is a high correlation between the behaviour of RST, SYN and
FIN, allowing them to be encoded together. RST and FIN are unset in
almost 100% of replicated packets. These three flags can therefore be
encoded as: VALUE(3,2,30%) | VALUE(3,0,65%) | IRREGULAR(3,5%).
Equivalently, these three flags can also be encoded as prescribed in
[ROHC-TCP] using the 'index' encoding method, with FIN or RST
exclusively set as the two other common values.

5.8. TCP Window

Table 8 shows the delta distribution. For flows in both directions,
the main peak is at delta = 0, with an amplitude of 43% for
'outgoing' replicated packets and 30% for 'incoming' packets. These
cases can be encoded with STATIC encoding.

+--------+---------------+-----------+-----------+
|Encoded | Delta Range   | Incoming  | Outgoing  |
|Bits, n |               | Frequency | Frequency |
+--------+---------------+-----------+-----------+
|-16     |[-65535:-32768]| 0.0%      | 0.0%      |
|-15     |[-32767:-16384]| 3.4%      | 2.8%      |
|-14     |[-16383:-8192] | 0.2%      | 0.4%      |
|-13     |[-8191:-4096]  | 14.0%     | 2.1%      |
|-12     |[-4095:-2048]  | 20.7%     | 0.9%      |
|-11     |[-2047:-1024]  | 1.3%      | 0.1%      |
|-10     |[-1023:-512]   | 6.6%      | 1.7%      |
|-9      |[-511:-256]    | 4.4%      | 2.3%      |
|-8      |[-255:-128]    | 4.1%      | 0.8%      |
|-7      |[-127:-64]     | 0.6%      | 2.6%      |
|-6      |[-63:-32]      | 0.4%      | 1.2%      |
|-5      |[-31:-16]      | 0.2%      | 0.7%      |
|-4      |[-15:-8]       | 0.1%      | 0.5%      |
|-3      |[-7:-4]        | 0.1%      | 0.1%      |
|-2      |[-3:-2]        | 0.2%      | 0.0%      |
|-1      |[-1]           | 0.2%      | 0.0%      |
|0       |[0]            | 30.4%     | 43.2%     |
|1       |[1]            | 0.1%      | 0.0%      |
|2       |[2:3]          | 0.1%      | 0.1%      |
|3       |[4:7]          | 0.1%      | 0.1%      |
|4       |[8:15]         | 0.1%      | 0.2%      |
|5       |[16:31]        | 0.2%      | 0.2%      |
|6       |[32:63]        | 0.1%      | 0.8%      |
|7       |[64:127]       | 0.4%      | 1.7%      |
|8       |[128:255]      | 0.2%      | 3.4%      |
|9       |[256:511]      | 1.1%      | 4.0%      |
|10      |[512:1023]     | 1.1%      | 6.8%      |
|11      |[1024:2047]    | 2.0%      | 3.0%      |
|12      |[2048:4095]    | 0.5%      | 0.1%      |
|13      |[4096:8191]    | 2.3%      | 0.3%      |
|14      |[8192:16383]   | 2.5%      | 3.2%      |
|15      |[16384:32767]  | 0.1%      | 3.5%      |
|16      |[32768:65535]  | 2.2%      | 13.1%     |
+--------+---------------+-----------+-----------+

Table 8: Frequency distribution of Window delta

Unlike the other fields, Window delta values tend not to cluster near
the main peak. This is expected behaviour. Consequently, LSB is not a
suitable encoding method for the Window field. A number of secondary
peaks can be observed in Table 8, which suggests that Window values
tend to vary among a few discontinuous but commonly used values.

We determine the most common Window values for 'incoming' and
'outgoing' flows separately and obtain the distribution of these
common values, shown in Table 9. Again, asymmetry is observed to be
inherent between 'incoming' and 'outgoing' flows. In this case, it is
due to the use of different ranges of popular Window values in the
two directions. 'Incoming' advertised Window fields typically come
from HTTP servers, which send more data than they receive. Servers
typically advertise their receive window conservatively and are slow
to grow their windows. They do so to prevent data overload when
handling multiple clients concurrently, and because of the congestion
window slow start algorithm [RFC-2581]. On the other hand, sources of
'outgoing' traffic are normally clients downloading data from
servers.
To utilize bandwidth efficiently, the advertised window is usually
large right from the first packet. This is consistent with recent
proposals for increasing the TCP initial window size [RFC-3390].

+----------------------+----------------------+
|       Incoming       |       Outgoing       |
+--------+-------------+--------+-------------+
| Value  | Probability | Value  | Probability |
|        |     (%)     |        |     (%)     |
+--------+-------------+--------+-------------+
| 1380   | 1.1         | 1460   | 1.6         |
| 1460   | 23.5        | 2920   | 1.6         |
| 2760   | 1.3         | 8192   | 3.1         |
| 2920   | 22.2        | 8280   | 6.6         |
| 5840   | 2.2         | 16384  | 10.3        |
| 8280   | 11.7        | 16560  | 8.0         |
| 11680  | 4.9         | 64240  | 26.3        |
| 16384  | 6.9         | 64860  | 8.8         |
| 16560  | 2.1         | 65520  | 2.6         |
| 65535  | 4.6         | 65535  | 18.3        |
+--------+-------------+--------+-------------+
| Total  | 80.4        | -      | 87.2        |
+--------+-------------+--------+-------------+

Table 9: Common Window field values

The common values of the Window field, including all the values
found in Table 9, can typically be expressed either as (i) a multiple
of the Maximum Segment Size of the end-to-end channel, or (ii) a
power of 2, possibly offset by 1.

The Maximum Segment Size (MSS) is negotiated between the two TCP
endpoints through the TCP Options in the TCP handshaking packets. The
negotiated MSS is in turn derived from the Maximum Transmission Unit
(MTU) of the underlying network [RFC-1122]. The MTU over Ethernet is
1500 bytes, or 1492 bytes if used with the Subnetwork Access Protocol
(SNAP), or 1300 bytes if used with PPP over Ethernet (for ADSL
links). Subtracting 40 bytes for the TCP/IPv4 protocol stack, 60
bytes for the TCP/IPv6 protocol stack, or 120 bytes for the maximum
TCP/IP header size, typically advertised MSS values are 1460, 1380,
1260, 1440 or 1452 bytes, in decreasing popularity.
From the above set of MSS values, 1460 and 1380 are used almost
exclusively. Consequently, almost all the Window values found in
Table 9 can be expressed as multiples of either 1460 or 1380. The
exceptions are 8192, 16384 and 65535, which are powers of 2 possibly
offset by 1, and 65520, which is a multiple of 1260.

Thus, commonly used Window values that cannot be expressed as
multiples of the MSS values are powers of 2, possibly offset by 1.
From Table 9, 8192, 16384 and 65535 are 2^13, 2^14 and 2^16 - 1
respectively.

Also, the TCP Window is always 0 when RST (the Reset flag) is set.
Therefore, the decompressor can infer the Window value whenever RST
is set, and there is no need to send it.

The TCP Window field is used in both congestion and flow control. The
use of congestion control can partly account for the commonly used
values discussed above, as congestion control changes are in
multiples of the MSS. However, values due to flow control do not
follow this pattern, and are typically small offsets from the
commonly used values.

Currently, the Window field is encoded either as STATIC or as
IRREGULAR for context replication [ROHC-TCP]. The above observations
illustrate that the current encoding methods do not sufficiently
exploit the unique behaviour of the Window field. They also provide
the motivation for devising a more efficient way of encoding the
Window field. This encoding method is elaborated upon in [TCP-WIN].

5.9. TCP Checksum

The TCP Checksum field covers the pseudo-header, the TCP header and
the payload, and varies between packets. Although ROHC packets may
contain a CRC field, the CRC does not cover the payload.
Since it is important to preserve data integrity, the Checksum field
is sent uncompressed as IRREGULAR(16,100%).

5.10. TCP Options

TCP options cover a wide variety of optional fields, but commonly
used options include the MSS, Window Scale and SACK-Permitted options
found in handshaking packets. These fields do not change between
replicated packets and can thus be compressed efficiently as STATIC
for context replication.

5.11. Mean Sizes of Compressed Fields

Table 10 shows the TCP/IP fields found in 'incoming' IR-CR packets
and calculates the mean sizes of their encoded forms. Compressed
TCP/IP fields take up a mean size of 107.3 bits for 'incoming' flows.
By repeating the calculation based on the 'outgoing' packet format
specifications, it can be shown that the mean 'outgoing' IR-CR size
is 97.5 bits.

+---------------------+------+--------------------------+-------+
|                     | Size | Encoded size (bits) &    | Mean  |
|       Field         |      | probability              |Encoded|
|                     |(bits)|                          | Size  |
|                     |      |                          | (bits)|
+---------------------+------+--------------------------+-------+
|IPv4 Identification  |  16  | 3(50%), 8(17%), 16(33%)  |  8.14 |
|IPv4 Don't Fragment  |   1  | 0(73%), 1(27%)           |  0.27 |
|IPv4 Time To Live    |   8  | 0(98%), 8(2%)            |  0.16 |
|IPv4 Dest. Address   |  32  | 0(98%), 32(2%)           |  0.64 |
|TCP Source Port      |  16  | 0(70%), 16(30%)          |  4.80 |
|TCP Dest. Port       |  16  | 3(73%), 8(14%), 16(13%)  |  5.39 |
|TCP Sequence Number  |  32  | 32(100%)                 | 32.00 |
|TCP Ack. Number      |  32  | 32(100%)                 | 32.00 |
|TCP Flags            |   8  | 2(95%), 5(5%)            |  2.15 |
|TCP Window           |  16  | 0(30%), 6(47%), 4(8%),   |  5.54 |
|                     |      | 16(15%)                  |       |
|TCP Checksum         |  16  | 16(100%)                 | 16.00 |
|TCP Urgent Pointer   |  16  | 0(99%), 16(1%)           |  0.16 |
+---------------------+------+--------------------------+-------+
|TOTAL                | 209  | -                        | 107.3 |
+---------------------+------+--------------------------+-------+

Table 10: Mean Encoded Sizes of 'incoming' TCP/IP Fields

6. Handling Asymmetrical Inter-flow Behaviour

From the previous section, and as summarized in Fig. 3, some TCP/IP
fields exhibit inherently asymmetrical behaviour. The issue, then, is
to explore ways of handling such asymmetrical behaviour so that the
gain versus complexity tradeoff can be optimized.

As observable from the header compression model in Fig. 1 and the
asymmetrical packet format specifications in Fig. 3, asymmetrical
inter-flow behaviour can be handled by asymmetrical header
compression. This can be done by configuring each
compressor-decompressor pair with a different set of packet format
specifications, based on its 'incoming' or 'outgoing' role. While
this treatment has the highest compression efficiency, its main
disadvantage is that it may be more complicated than symmetrical
header compression.

Alternatively, asymmetrical behaviour can be handled using
symmetrical packet format specifications, by expanding the use of the
'multiple_packet_formats' encoding method [ROHC-FN] to cover
asymmetrical behaviour, at the cost of a few more 'discriminator
bits'. This is the methodology adopted in current ROHC drafts.

From Fig. 3, the fields exhibiting significant asymmetrical behaviour
are the IP Destination Address, TCP Source Port, Destination Port and
Acknowledgement Number. (The behaviour of the TCP Window is in fact
also asymmetrical, but this asymmetry cannot be expressed using the
current encoding methods.) To handle these fields symmetrically, the
following packet format specifications can be used instead:

Destination_Address ::= STATIC(.) | IRREGULAR(32,.)
                        %1 discriminator bit

Source_Port ::= STATIC(.) | LSB(3,0,.) | LSB(8,0,.) |
                IRREGULAR(16,.)   %2 discriminator bits

Destination_Port ::= STATIC(.) | LSB(3,0,.) | LSB(8,0,.) |
                     IRREGULAR(16,.)   %2 discriminator bits

Acknowledgement_Number ::= VALUE(32,0,.) | IRREGULAR(32,.)
                           %1 discriminator bit

Fig. 4: Symmetrical packet format specifications for fields with
        asymmetrical behaviour

The asymmetrical behaviour of the Window field may be handled
efficiently using a proposed encoding method elaborated in [TCP-WIN].
This encoding method can be either symmetrical or asymmetrical.

7. Security Considerations

This document does not raise any new security considerations.

8. References

[RFC-3390]  Allman, M., Floyd, S. and C. Partridge, "Increasing TCP's
            Initial Window", RFC 3390, October 2002.

[RFC-3095]  Bormann, C., Burmeister, C., Degermark, M., Fukushima,
            H., Hannu, H., Jonsson, L-E., Hakenberg, R., Koren, T.,
            Le, K., Liu, Z., Martensson, A., Miyazaki, A., Svanbro,
            K., Wiebke, T., Yoshimura, T. and H. Zheng, "RObust
            Header Compression (ROHC): Framework and four profiles:
            RTP, UDP, ESP, and uncompressed", RFC 3095, July 2001.

[RFC-2581]  Allman, M., Paxson, V. and W. Stevens, "TCP Congestion
            Control", RFC 2581, April 1999.

[RFC-2234]  Crocker, D., et al., "Augmented BNF for Syntax
            Specifications: ABNF", RFC 2234, 1997.
1203 Internet-document Statistical Inter-flow Field Behaviour February 2004 1204 for Context Replication in ROHC-TCP 1206 [RFC-1122] R. Braden, Editor, �Requirements for Internet Hosts � 1207 Communication Layers�, RFC 1122, 1989. 1209 [ROHC-TCP] Pelletier, G., Zhang, Q., Jonsson, L-E., Liao, H., West, 1210 M., "RObust Header Compression (ROHC): TCP/IP Profile 1211 (ROHC-TCP)", Internet Draft (work in progress), , May 2003. 1214 [TCP-BEH] West, M. and S. McCann, "TCP/IP Field behavior", Internet 1215 Draft (work in progress), , March 2003. 1218 [ROHC-CR] Pelletier, G., "RObust Header Compression (ROHC): Context 1219 Replication for ROHC Profiles", Internet Draft (work in 1220 progress), , 1221 October 2003. 1223 [ROHC-FN] "Formal Notation for Robust Header Compression 1224 (ROHC-FN)", R. Price et al., (work in progress), March 2003 1227 [EPIC-LITE] Price, R., Hancock, R., McCann, S., Surtees, A., Ollis, 1228 P., West, M., "Framework for EPIC-LITE", Internet Draft 1229 (work in progress), , 1230 2002. 1232 [EPIC-IMPL] L. Vidjak, M. Stula, J. Ozegovic, "Program Structures 1233 for EPIC-LITE Experimental Implementation", SoftCOM 2002. 1235 [TCP-WIN] Cho, C.Y., Hazra, S.K., �Encoding Method for TCP Window 1236 in Context Replication�, Internet Draft, to be submitted. 1238 9. Authors' Addresses 1240 Chia Yuan Cho 1241 Institute for Infocomm Research (I2R) 1242 21 Heng Mui Keng Terrace 1243 Singapore 119613 1245 Phone: +65 6874 6643 1246 Email: stucyc2@i2r.a-star.edu.sg 1248 Internet-document Statistical Inter-flow Field Behaviour February 2004 1249 for Context Replication in ROHC-TCP 1251 Sukanta Kumar Hazra 1252 Institute for Infocomm Research (I2R) 1253 21 Heng Mui Keng Terrace 1254 Singapore 119613 1256 Phone: +65 6874 1953 1257 Email: sukanta@i2r.a-star.edu.sg 1259 Internet-document Statistical Inter-flow Field Behaviour February 2004 1260 for Context Replication in ROHC-TCP 1262 Appendix A. 
State Transition Threshold 1264 The aim of this section is to determine a reasonable range for the 1265 number of initial TCP/IP packets possibly converted into IR or IR-CR 1266 packets, which is defined as the State Transition Threshold. 1268 The compressor state machine controls the type of packet transmitted 1269 to the decompressor. As elaborated in [ROHC-TCP], transition from the 1270 CR state to CO state at the compressor is initiated optimistically or 1271 explicitly through reception of an ROHC ACK from the decompressor. 1272 Because at least 1 IR/IR-CR packet must be sent before state 1273 transition, the State Transition Threshold, H is such that H: H >= 1. 1274 The State Transition Threshold is different from simply the number of 1275 context initializing IR/IR-CR packets sent because in uni-directional 1276 mode or optimistic bidirectional mode, a single TCP/IP packet may be 1277 sent as a number of duplicate IR/IR-CR packets (To allow the 1278 compressor to gain the optimistism necessary for upwards transition). 1280 A range of suitable values for H is derived the protocol stack nature 1281 and channel characteristics. For the TCP/IP protocol stack, we begin 1282 by looking at the first few packets exchanged for a TCP connection. 1284 Fig. 4 shows a TCP connection using TCP/IP header compression over a 1285 low-bandwidth channel. Packets in the forward direction are numbered. 1286 The first TCP packet is always converted into an IR/IR-CR packet. In 1287 the following analysis, we focus on the compressor at the client and 1288 the decompressor at the router. 1290 Suppose the channel is full-duplex, and an ROHC ACK is sent upon the 1291 successful decompression of the first packet. ROHC ACKs may be 1292 piggybacked. The earliest possible ROHC ACK sent is indicated in Fig. 1293 4 as a dotted arrow. When the compressor receives the ROHC ACK, it 1294 transits from IR/CR to CO state. Subsequently, it starts sending CO 1295 packets instead. 
If the channel is reliable, then the compressor 1296 receives its ROHC ACK before it sends the second TCP/IP packet and 1297 only a single TCP/IP packet becomes an IR/IR-CR packet, i.e. H = 1. 1298 This is also likely if the router-server RTT >> client-router RTT, 1299 for which case even if the first ROHC ACK is lost, the compressor may 1300 be offered ample opportunity to receive retransmitted ROHC ACKs 1301 before it sends the packet #2. Conversely, if the channel is 1302 unreliable, and/or if client-router RTT >> router-server RTT (as is 1303 likely the case for cellular links), then it is likely that the ROCH 1304 ACK is not received immediately and subsequent TCP/IP packets are 1305 still sent as IR-CR packets. However, as seen from Fig. 4, the time 1306 lapse between TCP/IP packet #1 and packet #4 is long compared to all 1307 subsequent packets (when the TCP sliding window mechanism kicks in), 1308 and it is reasonable to assume that the ROHC ACK is received before 1309 packet #4 is sent. Thus, a reasonable range is 1 <= H <= 3. 1311 Internet-document Statistical Inter-flow Field Behaviour February 2004 1312 for Context Replication in ROHC-TCP 1314 Client Router Server 1315 | | | 1316 SYN |--- #1 | | 1317 | --- | | 1318 | -->|--- | 1319 | ...| --- | 1320 | ... | -->| 1321 +-- ROHC ACK |<.. | ---| SYN,ACK 1322 | (best case) | | --- | 1323 | | ---|<-- | 1324 | | --- | | 1325 | |<-- | | 1326 | ACK |--- #2 | | 1327 | | --- | | 1328 | request |--- #3-->|--- | 1329 | | --- | --- | 1330 | | -->|--- -->| 1331 | large | | --- | 1332 | time | | -->| 1333 | lapse | | ---| reply 1334 | | | --- | 1335 | | ---|<-- | 1336 | | --- | | 1337 +--(worst case)|<-- | | 1338 |--- #4 | | 1339 | --- | | 1340 | -->|--- | 1341 | | --- | 1342 | | -->| 1343 Compressor Decompressor 1345 |_________|_________| 1346 Low Wired 1347 Bandwidth or 1348 Channel Wireless 1350 Fig. 
4: TCP handshaking and ROHC ACKs 1352 Finally, because TCP/IP contains bi-directional traffic, header 1353 compression may occur in both directions and in this case the overall 1354 state transition threshold is Ho = 2H. For uni-directional protocol 1355 stacks like RTP/UDP/IP, the overall state transition threshold Ho 1356 remains at H.
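   The range 1 <= H <= 3 and the overall threshold Ho can be checked
   with a rough timing model of Fig. 4. The Python sketch below makes
   three assumptions:

      o  packets #2 and #3 leave the client one client-server round trip
         after packet #1;

      o  packet #4 leaves after a second round trip;

      o  processing delays are negligible.

   The function names send_times, threshold, and overall_threshold are
   hypothetical. The RTT values and ACK arrival times in the examples
   are illustrative, not measurements.

```python
from bisect import bisect_left
from typing import List


def send_times(rtt_client_router: float, rtt_router_server: float) -> List[float]:
    """Client send times of TCP/IP packets #1..#4 in Fig. 4 (processing ignored)."""
    rtt = rtt_client_router + rtt_router_server  # client-server round trip
    # SYN at 0; ACK (#2) and request (#3) when SYN,ACK arrives; #4 after the reply.
    return [0.0, rtt, rtt, 2 * rtt]


def threshold(times: List[float], ack_arrival: float) -> int:
    """H: packets sent before the ROHC ACK arrives; at least one IR/IR-CR is sent."""
    # Packets sent at or after the ACK's arrival are assumed to go out as CO.
    return max(1, bisect_left(sorted(times), ack_arrival))


def overall_threshold(h: int, bidirectional: bool) -> int:
    """Ho = 2H when both directions are compressed (TCP/IP), else Ho = H."""
    return 2 * h if bidirectional else h


# Reliable link, router-server RTT >> client-router RTT (values in ms).
print(threshold(send_times(20, 200), ack_arrival=20))  # 1

# Slow cellular link; the first ACK is lost and a later one arrives at 1200 ms.
h = threshold(send_times(600, 50), ack_arrival=1200)
print(h, overall_threshold(h, bidirectional=True))  # 3 6
```

   In the first example, the router-server RTT dominates and H = 1. In
   the second, a lost first ACK on a slow client-router link pushes H to
   the upper bound of 3. Ho = 6 when both directions are compressed. An
   ACK arriving after packet #4 would give H = 4. The analysis above
   treats that case as unlikely.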