idnits 2.17.1

draft-ietf-rohc-formal-notation-13.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** It looks like you're using RFC 3978 boilerplate.  You should update this
     to the boilerplate described in the IETF Trust License Policy document
     (see https://trustee.ietf.org/license-info), which is required now.

  -- Found old boilerplate from RFC 3978, Section 5.1 on line 16.

  -- Found old boilerplate from RFC 3978, Section 5.5, updated by RFC 4748 on
     line 2804.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 1 on line 2815.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 2 on line 2822.

  -- Found old boilerplate from RFC 3979, Section 5, paragraph 3 on line 2828.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

     No issues found here.

  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

     No issues found here.

  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == The copyright year in the IETF Trust Copyright Line does not match the
     current year

  -- The document seems to lack a disclaimer for pre-RFC5378 work, but may
     have content which was first submitted before 10 November 2008.  If you
     have contacted all the original authors and they are all willing to grant
     the BCP78 rights to the IETF Trust, then this is fine, and you can ignore
     this comment.  If not, you may need to add the pre-RFC5378 disclaimer.
     (See the Legal Provisions document at
     https://trustee.ietf.org/license-info for more information.)

  -- The document date (November 2006) is 6371 days in the past.  Is this
     intentional?


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  -- Possible downref: Non-RFC (?) normative reference: ref. 'C90'

  == Outdated reference: A later version (-04) exists of
     draft-ietf-rohc-rfc3095bis-framework-01

  ** Obsolete normative reference: RFC 2822 (Obsoleted by RFC 5322)

  ** Obsolete normative reference: RFC 4234 (Obsoleted by RFC 5234)


     Summary: 3 errors (**), 0 flaws (~~), 2 warnings (==), 8 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

Robust Header Compression                                     R. Finking
Internet-Draft                                        Siemens/Roke Manor
Intended status: Standards Track                            G. Pelletier
Expires: May 5, 2007                                            Ericsson
                                                           November 2006


         Formal Notation for Robust Header Compression (ROHC-FN)
                   draft-ietf-rohc-formal-notation-13

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 5, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2006).

Abstract

   This document defines ROHC-FN (RObust Header Compression - Formal
   Notation): a formal notation to specify field encodings for
   compressed formats when defining new profiles within the ROHC
   framework.  ROHC-FN offers a library of encoding methods that are
   often used in ROHC profiles and can thereby help simplify future
   profile development work.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Overview of ROHC-FN
     3.1.  Scope of the Formal Notation
     3.2.  Fundamentals of the Formal Notation
       3.2.1.  Fields and Encodings
       3.2.2.  Formats and Encoding Methods
     3.3.  Example using IPv4
   4.  Normative Definition of ROHC-FN
     4.1.  Structure of a Specification
     4.2.  Identifiers
     4.3.  Constant Definitions
     4.4.  Fields
       4.4.1.  Attribute References
       4.4.2.  Representation of Field Values
     4.5.  Grouping of Fields
     4.6.  "THIS"
     4.7.  Expressions
       4.7.1.  Integer Literals
       4.7.2.  Integer Operators
       4.7.3.  Boolean Literals
       4.7.4.  Boolean Operators
       4.7.5.  Comparison Operators
     4.8.  Comments
     4.9.  "ENFORCE" Statements
     4.10. Formal Specification of Field Lengths
     4.11. Library of Encoding Methods
       4.11.1. uncompressed_value
       4.11.2. compressed_value
       4.11.3. irregular
       4.11.4. static
       4.11.5. lsb
       4.11.6. crc
     4.12. Definition of Encoding Methods
       4.12.1. Structure
       4.12.2. Arguments
       4.12.3. Multiple Formats
     4.13. Profile-specific Encoding Methods
   5.  Security considerations
   6.  IANA Considerations
   7.  Contributors
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Formal Syntax of ROHC-FN
   Appendix B.  Bit-level Worked Example
     B.1.  Example Packet Format
     B.2.  Initial Encoding
     B.3.  Basic Compression
     B.4.  Inter-packet compression
     B.5.  Specifying Initial Values
     B.6.  Multiple Packet Formats
     B.7.  Variable Length Discriminators
     B.8.  Default encoding
     B.9.  Control Fields
     B.10. Use Of "ENFORCE" Statements As Conditionals
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   ROHC-FN is a formal notation designed to help with the definition of
   ROHC [I-D.ietf-rohc-rfc3095bis-framework] header compression
   profiles.  Previous header compression profiles have so far been
   specified using a combination of English text together with ASCII Box
   notation.  Unfortunately, this was sometimes unclear and ambiguous,
   revealing the limitations of defining complex structures and
   encodings for compressed formats this way.  The primary objective of
   the Formal Notation is to provide a more rigorous means to define
   header formats -- compressed and uncompressed -- as well as the
   relationships between them.  No other formal notation exists which
   meets these requirements, so ROHC-FN aims to meet them.

   In addition, ROHC-FN offers a library of encoding methods that are
   often used in ROHC profiles, so that the specification of new
   profiles using the formal notation can be done without having to
   redefine this library from scratch.  Informally, an encoding method
   defines a two-way mapping between uncompressed data and compressed
   data.

2.  Terminology

   o  Compressed format

      A compressed format consists of a list of fields that provides
      bindings between encodings and the fields it compresses.  One or
      more compressed formats can be combined to represent an entire
      compressed header format.

   o  Context

      Context is information about the current (de)compression state of
      the flow.  Specifically, a context for a specific field can be
      either uninitialized, or it can include a set of one or more
      values for the field's attributes defined by the compression
      algorithm, where a value may come from the field's attributes
      corresponding to a previous packet.  See also a more generalized
      definition in section 2.2 of [I-D.ietf-rohc-rfc3095bis-framework].

   o  Control field

      Control fields are transmitted from a ROHC compressor to a ROHC
      decompressor, but are not part of the uncompressed header itself.

   o  Encoding method, encodings

      Encoding methods are two-way relations that can be applied to
      compress and decompress fields of a protocol header.

   o  Field

      The protocol header is divided into a set of contiguous bit
      patterns known as fields.  Each field is defined by a collection
      of attributes which indicate its value and length in bits for both
      the compressed and uncompressed headers.  The way the header is
      divided into fields is specific to the definition of a profile,
      and it is not necessary for the field divisions to be identical to
      the ones given by the specification(s) for the protocol header
      being compressed.

   o  Library of encoding methods

      The library of encoding methods contains a number of commonly used
      encoding methods for compressing header fields.

   o  Profile

      A ROHC [I-D.ietf-rohc-rfc3095bis-framework] profile is a
      description of how to compress a certain protocol stack.  Each
      profile consists of a set of formats (e.g. uncompressed and
      compressed formats) along with a set of rules that control
      compressor and decompressor behaviour.

   o  ROHC-FN specification

      The specification of the set of formats of a ROHC profile using
      ROHC-FN.

   o  Uncompressed format

      An uncompressed format consists of a list of fields that provides
      the order of the fields to be compressed for a contiguous set of
      bits whose bit layout corresponds to the protocol header being
      compressed.

3.  Overview of ROHC-FN

   This section gives an overview of ROHC-FN.  It also explains how
   ROHC-FN can be used to specify the compression of header fields as
   part of a ROHC profile.

3.1.  Scope of the Formal Notation

   This section explains how the formal notation relates to the ROHC
   framework and to specifications of ROHC profiles.

   The ROHC framework [I-D.ietf-rohc-rfc3095bis-framework] provides the
   general principles for performing robust header compression.  It
   defines the concept of a profile, which makes ROHC a general platform
   for different compression schemes.  It sets link layer requirements,
   and in particular negotiation requirements, for all ROHC profiles.
   It defines a set of common functions such as Context Identifiers
   (CIDs), padding and segmentation.  It also defines common formats
   (IR, IR-DYN, Feedback, Add-CID, etc.), and finally it defines a
   generic, profile-independent feedback mechanism.

   A ROHC profile is a description of how to compress a certain protocol
   stack.  For example, ROHC profiles are available for RTP/UDP/IP and
   many other protocol stacks.

   At a high level, each ROHC profile consists of a set of formats
   (defining the bits to be transmitted) along with a set of rules that
   control compressor and decompressor behaviour.  The purpose of the
   formats is to define how to compress and decompress headers.
   The formats define one or more compressed versions of each
   uncompressed header, and simultaneously define the inverse: how to
   relate a compressed header back to the original uncompressed header.

   The set of formats will typically define compression of headers
   relative to a context of field values from previous headers in a
   flow, improving the overall compression by taking into account
   redundancies between headers of successive packets.  Therefore, in
   addition to defining the formats, a profile has to:

   o  specify how to manage the context, for both the compressor and the
      decompressor,
   o  define when and what to send in feedback messages, if any, from
      decompressor to compressor,
   o  outline compression principles to make the profile robust against
      bit errors and dropped packets.

   All this is needed to ensure that the compressor and decompressor
   contexts are kept consistent with each other, while still
   facilitating the best possible compression performance.

   The ROHC-FN is designed to help in the specification of compressed
   formats that, when put together based on the profile definition, make
   up the formats used in a ROHC profile.  It offers a library of
   encoding methods for compressing fields, and a mechanism for
   combining these encoding methods to create compressed formats
   tailored to a specific protocol stack.

   The scope of ROHC-FN is limited to specifying the relationship
   between the compressed and uncompressed formats.  To form a complete
   profile specification, the control logic for the profile behaviour
   needs to be defined by other means.

3.2.  Fundamentals of the Formal Notation

   There are two fundamental elements to the formal notation:

   1.  Fields and their encodings, which define the mapping between a
       header's uncompressed and compressed forms.
   2.  Encoding methods, which define the way headers are broken down
       into fields.  Encoding methods define lists of uncompressed
       fields and the lists of compressed fields they map onto.

   These two fundamental elements are at the core of the notation and
   are outlined below.

3.2.1.  Fields and Encodings

   Headers are made up of fields.  For example, version number, header
   length and sequence number are all fields used in real protocols.

   Fields have attributes.  Attributes describe various things about the
   field, including the length of the field and where the field appears
   in the header.  For example:

      field.ULENGTH

   indicates the uncompressed length of the field.  A field is said to
   have a value attribute, i.e. a compressed value or an uncompressed
   value, if the corresponding length attribute is greater than zero.
   See Section 4.4 for more details on field attributes.

   The relationship between the compressed and uncompressed attributes
   of a field is specified with encoding methods, using the following
   notation:

      field   =:=   encoding_method;

   In the field definition above, the symbol "=:=" means "is encoded
   by".  This field definition does not represent an assignment
   operation from the right hand side to the left side.  Instead, it is
   a two-way mapping between the compressed and uncompressed attributes
   of the field.  It represents both the compression and the
   decompression operation in a single field definition, through a
   process of two-way matching.

   Two-way matching is a binary operation that attempts to make the
   operands (i.e. the compressed and uncompressed attributes) the same.
   This is similar to the unification process in logic.  The operands
   represent one unspecified data object and one specified object.
   Values can be matched from either operand.

   During compression, the uncompressed attributes of the field are
   already defined.  The given encoding matches the compressed
   attributes against them.
   During decompression, the compressed attributes of the field are
   already defined, so the uncompressed attributes are matched to the
   compressed attributes using the given encoding method.  Thus both
   compression and decompression are defined by a single field
   definition.

   Therefore, an encoding method (including any parameters specified)
   creates a reversible binding between the attributes of a field.  At
   the compressor, a format can be used if a set of bindings that is
   successful for all the attributes in all its fields can be found.  At
   the decompressor, the operation is reversed using the same bindings
   and the attributes in each field are filled according to the
   specified bindings; decoding fails if the binding for an attribute
   fails.

   For example, the "static" encoding method creates a binding between
   the attribute corresponding to the uncompressed value of the field
   and the attribute corresponding to the value of the field in the
   context.

   o  For the compressor, the "static" binding is successful when both
      the context value and the uncompressed value are the same.  If the
      two values differ then the binding fails.
   o  For the decompressor, the "static" binding succeeds only if a
      valid context entry containing the value of the uncompressed field
      exists.  Otherwise, the binding will fail.

   Both the compressed and uncompressed forms of each field are
   represented as a string of bits, most significant bit first, of the
   length specified by the length attribute.  The bit string is the
   binary representation of the value attribute of the field, modulo
   "2^length", where "length" is the length attribute of the field.
   This is, however, only the representation of the bits exchanged
   between the compressor and the decompressor, designed to allow
   maximum compression efficiency.  The FN itself uses the full range of
   integers.  See Section 4.4.2 for further details.

3.2.2.  Formats and Encoding Methods

   The ROHC-FN provides a library of commonly used encoding methods.
   Encoding methods can be defined using plain English, or using a
   formal definition consisting of e.g. a collection of expressions
   (Section 4.7) and "ENFORCE" statements (Section 4.9).

   ROHC-FN also provides mechanisms for combining fields and their
   encoding methods into higher level encoding methods following a well-
   defined structure.  This is similar to the definition of functions
   and procedures in an ordinary programming language.  It allows
   complexity to be handled by being broken down into manageable parts.
   New encoding methods are defined at the top level of a profile.
   These can then be used in the definition of other higher level
   encoding methods, and so on.

     new_encoding_method         // This block is an encoding method
     {
       UNCOMPRESSED {            // This block is an uncompressed format
         field_1   [ 16 ];
         field_2   [ 32 ];
         field_3   [ 48 ];
       }

       CONTROL {                 // This block defines control fields
         ctrl_field_1;
         ctrl_field_2;
       }

       DEFAULT {                 // This block defines default encodings
                                 // for specified fields
         ctrl_field_2 =:= encoding_method_2;
         field_1      =:= encoding_method_1;
       }

       COMPRESSED format_0 {     // This block is a compressed format
         field_1;
         field_2      =:= encoding_method_2;
         field_3      =:= encoding_method_3;
         ctrl_field_1 =:= encoding_method_4;
         ctrl_field_2;
       }

       COMPRESSED format_1 {     // This block is a compressed format
         field_1;
         field_2      =:= encoding_method_3;
         field_3      =:= encoding_method_4;
         ctrl_field_2 =:= encoding_method_5;
         ctrl_field_3 =:= encoding_method_6;  // This is a control field
                                              // with no uncompressed value
       }
     }

   In the example above, the encoding method being defined is called
   "new_encoding_method".  The section headed "UNCOMPRESSED" indicates
   the order of fields in the uncompressed header, i.e. the uncompressed
   header format.  The number of bits in each of the fields is indicated
   in square brackets.  After this is another section, "CONTROL", which
   defines two control fields.  Following this is the "DEFAULT" section
   which defines default encoding methods for two of the fields (see
   below).  Finally, two alternative compressed formats follow, each
   defined in sections headed "COMPRESSED".  The fields that occur in
   the compressed formats are either:

   o  fields that occur in the uncompressed format; or
   o  control fields that have an uncompressed value and that occur in
      the CONTROL section; or
   o  control fields that do not have an uncompressed value and are thus
      defined as part of the compressed format.

   Central to each of these formats is a "field list", which defines the
   fields contained in the format and also the order in which those
   fields appear in that format.  For the "DEFAULT" and "CONTROL"
   sections, the field order is not significant.

   In addition to specifying field order, the field list may also
   specify bindings for any or all of the fields it contains.  Fields
   that have no bindings defined for them are bound using the default
   bindings specified in the "DEFAULT" section (see Section 4.12.1.5).

   Fields from the compressed format have the same name as they do in
   the uncompressed format.  If there are any fields which are present
   exclusively in the compressed format but which do have an
   uncompressed value, they must be declared in the "CONTROL" section of
   the definition of the encoding method (see Section 4.12.1.3 for more
   details on defining control fields).

   Fields which have no uncompressed value do not appear in an
   "UNCOMPRESSED" field list and do not have to appear in the "CONTROL"
   field list either.  Instead they are only declared in the compressed
   field lists where they are used.
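
   The fallback from a format's field list to the "DEFAULT" section
   described above can be sketched in Python.  This is an illustrative
   model only, not ROHC-FN and not part of the notation: the function
   name resolve_bindings, the tuple and dictionary layout, and the use
   of None for "no binding given" are all assumptions made for this
   sketch.  The field and encoding names are taken from the example
   above.

```python
# Illustrative model only; not ROHC-FN.  All Python names are hypothetical.

# DEFAULT section of the example: field name -> default encoding method.
DEFAULT = {
    "ctrl_field_2": "encoding_method_2",
    "field_1": "encoding_method_1",
}

# COMPRESSED format_0 of the example, in field-list order.
# None marks a field listed without an explicit "=:=" binding.
FORMAT_0 = [
    ("field_1", None),
    ("field_2", "encoding_method_2"),
    ("field_3", "encoding_method_3"),
    ("ctrl_field_1", "encoding_method_4"),
    ("ctrl_field_2", None),
]


def resolve_bindings(field_list, defaults):
    """Return the field list with default encodings filled in.

    Field order is preserved, since order is significant in a format.
    A field with neither an explicit nor a default encoding stays None;
    in this sketch that only means "bound by some other means".
    """
    return [
        (name, encoding if encoding is not None else defaults.get(name))
        for name, encoding in field_list
    ]


resolved = resolve_bindings(FORMAT_0, DEFAULT)
for name, encoding in resolved:
    print(f"{name} =:= {encoding};")
```

   In this sketch field_1 and ctrl_field_2 pick up encoding_method_1 and
   encoding_method_2 from the defaults, while the remaining fields keep
   the encodings given explicitly in format_0.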

   In the example above, all the fields that appear in the compressed
   format are also found in the uncompressed format, or the control
   field list, except for ctrl_field_3; this is possible because
   ctrl_field_3 has no "uncompressed" value at all.  Fields such as a
   checksum on the compressed information fall into this category.

3.3.  Example using IPv4

   This section gives an overview of how the notation is used by means
   of an example.  The example will develop the formal notation for an
   encoding method capable of compressing a single, well-known header:
   the IPv4 header [RFC791].

   The first step is to specify the overall structure of the IPv4
   header.  To do this, we use an encoding method which we will call
   "ipv4_header".  More details on definitions of encoding methods can
   be found in Section 4.12.  This is notated as follows:

     ipv4_header
     {

   The fragment of notation above defines the encoding method
   "ipv4_header", the definition of which follows the opening brace (see
   Section 4.12).

   Definitions within the pair of braces are local to "ipv4_header".
   This scoping mechanism helps to clarify which fields belong to which
   formats: it is also useful when compressing complex protocol stacks
   with several headers, often with the same field names occurring in
   multiple formats (see Section 4.2).

   The next step is to specify the fields contained in the uncompressed
   IPv4 header to represent the uncompressed format for which the
   encoding method will define one or more compressed formats.
   This is accomplished using ROHC-FN as follows:

       UNCOMPRESSED {
         version         [ 4 ];
         header_length   [ 4 ];
         tos             [ 6 ];
         ecn             [ 2 ];
         length          [ 16 ];
         id              [ 16 ];
         reserved        [ 1 ];
         dont_frag       [ 1 ];
         more_fragments  [ 1 ];
         offset          [ 13 ];
         ttl             [ 8 ];
         protocol        [ 8 ];
         checksum        [ 16 ];
         src_addr        [ 32 ];
         dest_addr       [ 32 ];
       }

   The width of each field is indicated in square brackets.  This part
   of the notation is used in the example for illustration, to help the
   reader's understanding.  However, indicating the field lengths in
   this way is optional since the width of each field can normally also
   be derived from the encoding that is used to compress/decompress it,
   for a specific format.  This part of the notation is formally defined
   in Section 4.10.

   The next step is to specify the compressed format.  This includes the
   encodings for each field which map between the compressed and
   uncompressed forms of the field.  In the example, these encoding
   methods are mainly taken from the ROHC-FN library (see Section 4.11).
   Since the intention here is to illustrate the use of the notation,
   rather than to describe the optimum method of compressing IPv4
   headers, this example uses only three encoding methods.

   The "uncompressed_value" encoding method (defined in Section 4.11.1)
   can compress any field whose uncompressed length and value are fixed,
   or can be calculated using an expression.  No compressed bits need to
   be sent because the uncompressed field can be reconstructed using its
   known size and value.
   The "uncompressed_value" encoding method is used to compress five
   fields in the IPv4 header, as described below:

       COMPRESSED {
         header_length  =:= uncompressed_value(4, 5);
         version        =:= uncompressed_value(4, 4);
         reserved       =:= uncompressed_value(1, 0);
         offset         =:= uncompressed_value(13, 0);
         more_fragments =:= uncompressed_value(1, 0);

   The first parameter indicates the length of the uncompressed field in
   bits, and the second parameter gives its integer value.

   Note that the order of the fields in the compressed format is
   independent of the order of the fields in the uncompressed format.

   The "irregular" encoding method (defined in Section 4.11.3) can be
   used to encode any field for which both uncompressed attributes
   (ULENGTH and UVALUE) are defined, and whose ULENGTH attribute is
   either fixed or can be calculated using an expression.  It is a
   fail-safe encoding method that can be used for such fields in the
   case where no other encoding method applies.  All of the bits in the
   uncompressed form of the field are present in the compressed form as
   well; hence this encoding does not achieve any compression.

         src_addr       =:= irregular(32);
         dest_addr      =:= irregular(32);
         length         =:= irregular(16);
         id             =:= irregular(16);
         ttl            =:= irregular(8);
         protocol       =:= irregular(8);
         tos            =:= irregular(6);
         ecn            =:= irregular(2);
         dont_frag      =:= irregular(1);

   Finally, the third encoding method is specific only to the
   uncompressed format defined above for the IPv4 header,
   "inferred_ip_v4_header_checksum":

         checksum       =:= inferred_ip_v4_header_checksum [ 0 ];
       }
     }

   The "inferred_ip_v4_header_checksum" encoding method is different
   from the other two encoding methods in that it is not defined in the
   ROHC-FN library of encoding methods.
   Its definition could be given either using the formal notation as
   part of the profile definition itself (see Section 4.12) or using
   plain English text (see Section 4.13).

   In our example, the "inferred_ip_v4_header_checksum" is a specific
   encoding method that calculates the IP checksum from the rest of the
   header values.  Like the "uncompressed_value" encoding method, no
   compressed bits need to be sent, since the field value can be
   reconstructed at the decompressor.  This is notated explicitly by
   specifying, in square brackets, a length of 0 for the checksum field
   in the compressed format.  Again, this notation is optional since the
   encoding method itself would be defined as sending zero compressed
   bits; however, it is useful to the reader to include such notation
   (see Section 4.10 for details on this part of the notation).

   Finally, the definition of the format is terminated with a closing
   brace.  At this point, the above example has defined a compressed
   format that can be used to represent the entire compressed IPv4
   header, and provided enough information to allow an implementation to
   construct the compressed format from an uncompressed format
   (compression) and vice versa (decompression).

4.  Normative Definition of ROHC-FN

   This section gives the normative definition of ROHC-FN.  ROHC-FN is a
   declarative language that is referentially transparent, with no side
   effects.  This means that whenever an expression is evaluated, there
   are no other effects from obtaining the value of the expression; the
   same expression is thus guaranteed to have the same value wherever it
   appears in the notation, and it can always be interchanged with its
   value in any of the formats it appears in (subject to the scope rules
   of identifiers of Section 4.2).
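
   To illustrate the declarative character described above, the
   following Python sketch uses a single table of field bindings to
   drive both compression and decompression.  It is an illustrative
   model only, not ROHC-FN and not part of the notation: the names
   BINDINGS, compress and decompress, the tuple layout, and the choice
   of a four-field subset of the IPv4 example from Section 3.3 are
   assumptions made for this sketch.  Only the behaviour of
   "uncompressed_value" and "irregular" as described in Section 3.3 is
   modelled.

```python
# Illustrative model only; not ROHC-FN.  All Python names are hypothetical.

# One declarative table: (field, encoding, uncompressed length, fixed value).
BINDINGS = [
    ("version", "uncompressed_value", 4, 4),
    ("header_length", "uncompressed_value", 4, 5),
    ("ttl", "irregular", 8, None),
    ("protocol", "irregular", 8, None),
]


def compress(header):
    """Return the compressed fields as (value, nbits) pairs.

    Returns None when a binding fails, i.e. when this format
    cannot be used for the given header.
    """
    sent = []
    for name, encoding, ulength, fixed in BINDINGS:
        value = header[name]
        if encoding == "uncompressed_value":
            if value != fixed:
                return None          # binding fails: value is not the fixed one
            # binding succeeds and zero compressed bits are sent
        else:                        # "irregular": all bits are sent as-is
            sent.append((value, ulength))
    return sent


def decompress(sent):
    """Rebuild the header from the compressed fields using the same table."""
    remaining = iter(sent)
    header = {}
    for name, encoding, ulength, fixed in BINDINGS:
        if encoding == "uncompressed_value":
            header[name] = fixed     # reconstructed from known size and value
        else:
            value, nbits = next(remaining)
            if nbits != ulength:
                raise ValueError(f"unexpected length for {name}")
            header[name] = value
    return header


original = {"version": 4, "header_length": 5, "ttl": 64, "protocol": 17}
sent = compress(original)
restored = decompress(sent)
```

   Neither function contains field-specific logic; both directions
   follow from the same table, which is the property the single "=:="
   field definition is meant to capture.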

   The formal notation describes the structure of the formats and the
   relationships between their uncompressed and compressed forms, rather
   than describing how compression and decompression are performed.

   In various places within this section, text inside angle brackets has
   been used as a descriptive placeholder.  The use of angle brackets in
   this way is solely for the benefit of the reader of this draft.
   Neither the angle brackets nor their contents form a part of the
   notation.

4.1.  Structure of a Specification

   The specification of the compressed formats of a ROHC profile using
   ROHC-FN is called a ROHC-FN specification.  ROHC-FN specifications
   are case sensitive and are written in the 7-bit ASCII character set
   (as defined in [RFC2822]) and consist of a sequence of zero or more
   constant definitions (Section 4.3), an optional global control field
   list (Section 4.12.1.3) and one or more encoding method definitions
   (Section 4.12).

   Encoding methods can be defined using the formal notation or can be
   predefined encoding methods.

   Encoding methods are defined using the formal notation by giving one
   or more uncompressed formats to represent the uncompressed header and
   one or more compressed formats.  These formats are related to each
   other by "fields", each of which describes a certain part of an
   uncompressed and/or a compressed header.  In addition to the formats,
   each encoding method may contain control fields and default field
   encodings sections.  The attributes of a field are bound by using an
   encoding method for it and/or by using "ENFORCE" statements
   (Section 4.9) within the formats.  Each of these is terminated by a
   semi-colon.

   Predefined encoding methods are not defined in the formal notation.
   Instead they are defined by giving a short textual reference
   explaining where the encoding method is defined.
   It is not necessary
   to define the library of encoding methods contained in this document
   in this way; their definition is implicit to the usage of the formal
   notation.

4.2.  Identifiers

   In ROHC-FN identifiers are used for any of the following:

   o  encoding methods
   o  formats
   o  fields
   o  parameters
   o  constants

   All identifiers may be of any length and may contain any combination
   of alphanumeric characters and underscores, within the restrictions
   defined in this section.

   All identifiers must start with an alphabetic character.

   It is illegal to have two or more identifiers that differ from each
   other only in capitalisation, in the same scope.

   All letters in identifiers for constants must be upper case.

   It is illegal to use any of the following as identifiers (including
   alternative capitalisations):

   o  "false", "true"
   o  "ENFORCE", "THIS", "VARIABLE"
   o  "ULENGTH", "UVALUE"
   o  "CLENGTH", "CVALUE"
   o  "UNCOMPRESSED", "COMPRESSED", "CONTROL", "INITIAL" or "DEFAULT"

   Format names can not be referred to in the notation, although they
   are considered to be identifiers. See Section 4.12.3.1 for more
   details on format names.

   All identifiers used in ROHC-FN have a "scope". The scope of an
   identifier defines the parts of the specification where that
   identifier applies and from which it can be referred to. If an
   identifier has "global" scope, then it applies throughout the
   specification which contains it and can be referred to from anywhere
   within it. If an identifier has "local" scope, then it only applies
   to the encoding method in which it is defined; it cannot be
   referenced from outside the local scope of that encoding method. If
   an identifier has local scope, that identifier can therefore be used
   in multiple different local scopes to refer to different items.

   All instances of an identifier within its scope refer to the same
   item.
   It is not possible to have different items referred to by a
   single identifier within any given scope. For this reason, if there
   is an identifier which has global scope it can not be used separately
   in a local scope, since a globally scoped identifier is already
   applicable in all local scopes.

   The identifiers for each encoding method and each constant all have
   global scope. Each format and field also has an identifier. The
   scope of format and field identifiers is local, with the exception of
   global control fields which have global scope. Therefore it is
   illegal for a format or field to have the same identifier as another
   format or field within the same scope, or as an encoding method or a
   constant (since they have global scope).

   Note that although format names (see Section 4.12.3.1) are considered
   to be identifiers, they are not referred to in the notation, but are
   primarily for the benefit of the reader.

4.3.  Constant Definitions

   Constant values can be defined using the "=" operator. Identifiers
   for constants must be all upper case. For example:

     SOME_CONSTANT = 3;

   Constants are defined by an expression (see Section 4.7) on the right
   hand side of the "=" operator. The expression must yield a constant
   value. That is, the expression must be one whose terms are all
   either constants or literals and must not vary depending on the
   header being compressed.

   Constants have global scope. Constants must be defined at the top
   level, outside any encoding method definition. Constants are
   entirely equivalent to the value they refer to, and are completely
   interchangeable with that value. Unlike field attributes, which may
   change from packet to packet, constants have the same value for all
   packets.

4.4.  Fields

   Fields are the basic building blocks of a ROHC-FN specification.
   Fields are the units into which headers are divided.
   Each field may
   have two forms: a compressed form and an uncompressed form. Both
   forms are represented as bits exchanged between the compressor and
   the decompressor in the same way, as an unsigned string of bits, most
   significant bit first.

   The properties of the compressed form of a field are defined by an
   encoding method and/or "ENFORCE" statements. This entirely
   characterises the relationship between the uncompressed and
   compressed forms of that field. This is achieved by specifying the
   relationships between the field's attributes.

   The notation defines four field attributes, two for the uncompressed
   form and a corresponding two for the compressed form. The attributes
   available for each field are:

   uncompressed attributes of a field:
   o  "UVALUE" and "ULENGTH",

   compressed attributes of a field:
   o  "CVALUE" and "CLENGTH".

   The two value attributes contain the respective numerical values of
   the field, i.e. "UVALUE" gives the numerical value of the
   uncompressed form of the field, and the attribute "CVALUE" gives the
   numerical value of the compressed form of the field. The numerical
   values are derived by interpreting the bit string representations of
   the field as bit strings, most significant bit first.

   The two length attributes indicate the length in bits of the
   associated bit string; "ULENGTH" for the uncompressed form, and
   "CLENGTH" for the compressed form.

   Attributes are undefined unless they are bound to a value, in which
   case they become defined. If two conflicting bindings are given for
   a field attribute then the bindings fail along with the (combination
   of) formats in which those bindings were defined.

   Uncompressed attributes do not always reflect an aspect of the
   uncompressed header. Some fields do not originate from the
   uncompressed header, but are control fields.

4.4.1.  Attribute References

   Attributes of a particular field are formally referred to by using
   the field's name followed by a "." and the attribute's identifier.

   For example:

     rtp_seq_number.UVALUE

   gives the uncompressed value of the rtp_seq_number field. The
   primary reason for referencing attributes is for use in expressions,
   which are explained in Section 4.7.

4.4.2.  Representation of Field Values

   Fields are represented as bit strings. The bit string is calculated
   using the value attribute ("val") and the length attribute ("len").
   The bit string is the binary representation of "val % (2 ^ len)".

   For example if a field's "CLENGTH" attribute was 8, and its "CVALUE"
   attribute was -1, the compressed representation of the field would be
   "-1 % (2 ^ 8)", which equals "-1 % 256", which equals 255, 11111111
   in binary.

   ROHC-FN supports the full range of integers for use in expressions
   (see Section 4.7), but the representation of the formats (i.e. the
   bits exchanged between the compressor and the decompressor) is in the
   above form.

4.5.  Grouping of Fields

   Since the order of fields in a "COMPRESSED" field list
   (Section 4.12.1.2) does not have to be the same as the order of
   fields in an "UNCOMPRESSED" field list (Section 4.12.1.1), it is
   possible to group together any number of fields which are contiguous
   in a "COMPRESSED" format, to allow them all to be encoded using a
   single encoding method. The group of fields is specified immediately
   to the left of "=:=" in place of a single field name.

   The group is notated by giving a colon separated list of the fields
   to be grouped together.
   For example there may be two non-contiguous
   fields in an uncompressed header which are two halves of what is
   effectively a single sequence number:

     grouping_example
     {
       UNCOMPRESSED {
         minor_seq_num; // 12 bits
         other_field;   //  8 bits
         major_seq_num; //  4 bits
       }

       COMPRESSED {
         other_field     =:= irregular(8);
         major_seq_num
         : minor_seq_num =:= lsb(3, 0);
       }
     }

   The group of fields is presented to the encoding method as a
   contiguous group of bits, assembled by the concatenation of the
   fields in the order they are given in the group. The most
   significant bit of the combined field is the most significant bit of
   the first field in the list, and the least significant bit of the
   combined field is the least significant bit of the last field in the
   list.

   Finally, the length attributes of the combined field are equal to the
   sum of the corresponding length attributes for all the fields in the
   group.

4.6.  "THIS"

   Within the definition of an encoding method it is possible to refer
   to the field (i.e. the group of contiguous bits) the method is
   encoding, using the keyword "THIS".

   This is useful for gaining access to the attributes of the field
   being encoded. For example it is often useful to know the total
   uncompressed length of the uncompressed format which is being
   encoded:

     THIS.ULENGTH

4.7.  Expressions

   ROHC-FN includes the usual infix style of expressions, with
   parentheses "(" and ")" used for grouping. Expressions can be made
   up of any of the components described in the following subsections.

   The semantics of expressions are generally similar to the expressions
   in the ANSI-C programming language [C90].
   The definitive list of
   expressions in ROHC-FN follows in the next subsections; the list
   below provides some examples of the difference between expressions in
   ANSI-C and expressions in ROHC-FN:

   o  There is no limit on the range of integers.
   o  "x ^ y" evaluates to x raised to the power of y. This has a
      precedence higher than *, / and %, but lower than unary - and is
      right to left associative.
   o  There is no comma operator.
   o  There are no "modify" operators (no assignment operators and no
      increment or decrement).
   o  There are no bitwise operators.

   Expressions may refer to any of the attributes of a field (as
   described in Section 4.4), to any defined constant (see Section 4.3)
   and also to encoding method parameters, if any are in scope (see
   Section 4.12).

   If any of the attributes, constants or parameters used in the
   expression are undefined, the value of the expression is undefined.
   Undefined expressions cause the environment (e.g. the compressed
   format) in which they are used to fail if a defined value is
   required. Defined values are required for all compressed attributes
   of fields which appear in the compressed format. Defined values are
   not required for all uncompressed attributes of fields which appear
   in the uncompressed format. It is up to the profile creator to
   define what happens to the unbound field attributes in this case. It
   should be noted that in such a case, transparency of the compression
   process will be lost: i.e. it will not be possible for the
   decompressor to reproduce the original header.

   Expressions cannot be used as encoding methods directly because they
   do not completely characterise a field. Expressions only specify a
   single value whereas a field is made up of several values: its
   attributes.
   For example, the following is illegal:

     tcp_list_length =:= (data_offset + 20) / 4;

   There is only enough information here to define a single attribute of
   "tcp_list_length". Although this makes no sense formally, this could
   intuitively be read as defining the "UVALUE" attribute. However,
   that would still leave the length of the uncompressed field undefined
   at the decompressor. Such usage is therefore prohibited.

4.7.1.  Integer Literals

   Integers can be expressed as decimal values, binary values (prefixed
   by "0b"), or hexadecimal values (prefixed by "0x"). Negative
   integers are prefixed by a "-" sign. For example "10", "0b1010" and
   "-0x0a" are all valid integer literals, having the values ten, ten
   and minus ten respectively.

4.7.2.  Integer Operators

   The following "integer" operators are available, which take integer
   arguments and return an integer result:

   o  ^, for exponentiation. "x ^ y" returns the value of "x" to the
      power of "y".
   o  *, / for multiplication and division. "x * y" returns the product
      of "x" and "y". "x / y" returns the quotient, rounded down to the
      next integer (the next one towards negative infinity).
   o  +, - for addition and subtraction. "x + y" returns the sum of "x"
      and "y". "x - y" returns the difference.
   o  % for modulo. "x % y" returns "x" modulo "y"; x - y * (x / y).

4.7.3.  Boolean Literals

   The boolean literals are "false", and "true".

4.7.4.  Boolean Operators

   The following "boolean" operators are available, which take boolean
   arguments and return a boolean result:

   o  &&, for logical "and". Returns true if both arguments are true.
      Returns false otherwise.
   o  ||, for logical "or". Returns true if at least one argument is
      true. Returns false otherwise.
   o  !, for logical not. Returns true if its argument is false.
      Returns false otherwise.

4.7.5.  Comparison Operators

   The following "comparison" operators are available, which take
   integer arguments and return a boolean result:

   o  ==, !=, for equality and its negative. "x == y" returns true if x
      is equal to y. Returns false otherwise. "x != y" returns true if
      x is not equal to y. Returns false otherwise.
   o  <, >, for less than and greater than. "x < y" returns true if x is
      less than y. Returns false otherwise. "x > y" returns true if x
      is greater than y. Returns false otherwise.
   o  >=, <=, for greater than or equal and less than or equal, the
      inverse functions of <, >. "x >= y" returns false if x is less
      than y. Returns true otherwise. "x <= y" returns false if x is
      greater than y. Returns true otherwise.

4.8.  Comments

   Free English text can be inserted into a ROHC-FN specification to
   explain why something has been done a particular way, to clarify the
   intended meaning of the notation, or to elaborate on some point.

   The FN uses an end of line comment style, which makes use of the "//"
   comment marker. Any text between the "//" marker and the end of the
   line has no formal meaning. For example:

     //-----------------------------------------------------------------
     // IR-REPLICATE header formats
     //-----------------------------------------------------------------

     // The following fields are included in all of the IR-REPLICATE
     // header formats:
     //
     UNCOMPRESSED {
       discriminator;     //  8 bits
       tcp_seq_number;    // 32 bits
       tcp_flags_ecn;     //  2 bits

   Comments do not affect the formal meaning of what is notated, but can
   be used to improve readability. Their use is optional.

   Comments may help to provide clarifications to the reader, and serve
   different purposes to implementers.
   Comments should thus not be
   considered of lesser importance when inserting them into a ROHC-FN
   specification; they should be consistent with the normative part of
   the specification.

4.9.  "ENFORCE" Statements

   The "ENFORCE" statement provides a way to add predicates to a format,
   all of which must be fulfilled for the format to succeed. An
   "ENFORCE" statement shares some similarities with an encoding method.
   Specifically, whereas an encoding method binds several field
   attributes at once, an "ENFORCE" statement typically binds just one
   of them. In fact, all the bindings that encoding methods create can
   be expressed in terms of a collection of "ENFORCE" statements. Here
   is an example "ENFORCE" statement which binds the "UVALUE" attribute
   of a field to 5.

     ENFORCE(field.UVALUE == 5);

   An "ENFORCE" statement must only be used inside a field list (see
   Section 4.12). It attempts to force the expression given to be true
   for the format which it belongs to.

   An abbreviated form of "ENFORCE" statement is available for binding
   length attributes using "[" and "]", see Section 4.10.

   Like an encoding method, an "ENFORCE" statement can only be
   successfully used in a format if the binding it describes is
   achievable. A format containing the example "ENFORCE" statement
   above would not be usable if the field had also been bound within
   that same format with "uncompressed_value" encoding which gave it a
   "UVALUE" other than 5.

   An "ENFORCE" statement takes a boolean expression as a parameter. It
   can be used to assert that the expression is true, in order to choose
   a particular format from a list of possible formats specified in an
   encoding method (see Section 4.12), or just to bind an expression as
   in the example above.
   The general form of an "ENFORCE" statement is
   therefore:

     ENFORCE(<boolean expression>);

   There are three possible conditions that the expression may be in:

   1.  The boolean expression evaluates to false, in which case the
       local scope of the format that contains the "ENFORCE" statement
       cannot be used,
   2.  The boolean expression evaluates to true, in which case the
       binding is created and successful,
   3.  The value of the boolean expression is undefined. In this case,
       the binding is also created and successful.

   In all three cases, any undefined terms become bound by the
   expression. Generally speaking an "ENFORCE" statement is either
   being used as an assignment (condition 3 above) or else it is being
   used to test if a particular format is usable, as is the case with
   conditions 1 and 2.

4.10.  Formal Specification of Field Lengths

   In many of the preceding examples each field has been followed by a
   comment indicating the length of the field. Indicating the length of
   a field like this is optional, but can be very helpful for the
   reader. However, whilst useful to the reader, comments have no
   formal meaning.

   One of the most common uses for "ENFORCE" statements (see
   Section 4.9) is to explicitly define the length of a field within a
   header. Using "ENFORCE" statements for this purpose has formal
   meaning but is not so easy to read. Therefore an abbreviated form is
   provided for this use of "ENFORCE", which is both easy to read and
   has formal meaning.

   An expression defining the length of a field can be specified in
   square brackets after the appearance of that field in a format. If
   the field can take several alternative lengths then the expressions
   defining those lengths can be enumerated as a comma separated list
   within the square brackets.
   For example,

     field_1                 [ 4 ];
     field_2                 [ a+b, 2 ];
     field_3 =:= lsb(16, 16) [ 26 ];

   The actual length attribute which is bound by this notation depends
   on whether it appears in a "COMPRESSED", "UNCOMPRESSED" or "CONTROL"
   field list (see Section 4.12.1 and its subsections). In a
   "COMPRESSED" field list, the field's "CLENGTH" attribute is bound.
   In "UNCOMPRESSED" and "CONTROL" field lists, the field's "ULENGTH"
   attribute is bound. Abbreviated "ENFORCE" statements are not allowed
   in "DEFAULT" sections (see Section 4.12.1.5). Therefore the above
   notation would not be allowed to appear in a "DEFAULT" section.
   However if the above appeared in an "UNCOMPRESSED" or "CONTROL"
   section it would be equivalent to:

     field_1;                 ENFORCE(field_1.ULENGTH == 4);
     field_2;                 ENFORCE((field_2.ULENGTH == 2)
                                   || (field_2.ULENGTH == a+b));
     field_3 =:= lsb(16, 16); ENFORCE(field_3.ULENGTH == 26);

   A special case exists for fields which have a variable length, that
   the notator does not wish to define or is not able to define using an
   expression. The keyword "VARIABLE" can be used in this case:

     variable_length_field [ VARIABLE ];

   Formally this provides no restrictions on the field length, but maps
   onto any positive integer or to a value of zero. It will therefore
   be necessary to define the length of the field elsewhere (see the
   final paragraphs of Section 4.12.1.1 and Section 4.12.1.2). This may
   either be in the notation or in the English text of the profile
   within which the FN is contained. Within the square brackets, the
   keyword "VARIABLE" may be used as a term in an expression, just like
   any other term that normally appears in an expression. For example:

     field [ 8 * (5 + VARIABLE) ];

   This defines a field whose length is a whole number of octets and at
   least 40 bits (5 octets) long.

4.11.  Library of Encoding Methods

   A number of common techniques for compressing header fields are
   defined as part of the ROHC-FN library so that they can be reused
   when creating new ROHC-FN specifications. Their notation is
   described below.

   As an alternative or a complement to this library of encoding
   methods, a ROHC-FN specification can define its own set of encoding
   methods, using the formal notation (see Section 4.12) or using a
   textual definition (see Section 4.13).

4.11.1.  uncompressed_value

   The "uncompressed_value" encoding method is used to encode header
   fields for which the uncompressed value can be defined using a
   mathematical expression (including constant values). This encoding
   method is defined as follows:

     uncompressed_value(len, val) {
       UNCOMPRESSED {
         field;
         ENFORCE(field.ULENGTH == len);
         ENFORCE(field.UVALUE == val);
       }
       COMPRESSED {
         field;
         ENFORCE(field.CLENGTH == 0);
       }
     }

   To exemplify the usage of "uncompressed_value" encoding, the IPv6
   header version number is a four bit field that always has the value
   6:

     version =:= uncompressed_value(4, 6);

   Here is another example of value encoding, using an expression to
   calculate the length:

     padding =:= uncompressed_value(nbits - 8, 0);

   The expression above uses an encoding method parameter, "nbits",
   which in this example specifies how many significant bits there are
   in the data, to calculate how many pad bits to use. See
   Section 4.12.2 for more information on encoding method parameters.

4.11.2.  compressed_value

   The "compressed_value" encoding method is used to define fields in
   compressed formats for which there is no counterpart in the
   uncompressed format (i.e. control fields). It can be used to specify
   compressed fields whose value can be defined using a mathematical
   expression (including constant values).
   This encoding method is
   defined as follows:

     compressed_value(len, val) {
       UNCOMPRESSED {
         field;
         ENFORCE(field.ULENGTH == 0);
       }
       COMPRESSED {
         field;
         ENFORCE(field.CLENGTH == len);
         ENFORCE(field.CVALUE == val);
       }
     }

   One possible use of this encoding method is to define padding in a
   compressed format:

     pad_to_octet_boundary =:= compressed_value(3, 0);

   A more common use is to define a discriminator field to make it
   possible to differentiate between different compressed formats within
   an encoding method (see Section 4.12). For convenience, the notation
   provides syntax for specifying "compressed_value" encoding in the
   form of a binary string. The binary string to be encoded is simply
   given in single quotes; the "CLENGTH" attribute of the field binds
   with the number of bits in the string, while its "CVALUE" attribute
   binds with the value given by the string. For example:

     discriminator =:= '01101';

   This has exactly the same meaning as:

     discriminator =:= compressed_value(5, 13);

4.11.3.  irregular

   The "irregular" encoding method is used to encode a field in the
   compressed format with a bit pattern identical to the uncompressed
   field. This encoding method is defined as follows:

     irregular(len) {
       UNCOMPRESSED {
         field;
         ENFORCE(field.ULENGTH == len);
       }
       COMPRESSED {
         field;
         ENFORCE(field.CLENGTH == len);
         ENFORCE(field.CVALUE == field.UVALUE);
       }
     }

   For example, the checksum field of the TCP header is a sixteen bit
   field that does not follow any predictable pattern from one header to
   another (and so cannot be compressed):

     tcp_checksum =:= irregular(16);

   Note that the length does not have to be constant; for example, the
   length expression can be used to derive the length of the field from
   the value of another field.

4.11.4.  static

   The "static" encoding method compresses a field whose length and
   value are the same as for a previous header in the flow, i.e. where
   the field completely matches an existing entry in the context:

     field =:= static;

   The field's "UVALUE" and "ULENGTH" attributes bind with their
   respective values in the context and the "CLENGTH" attribute is bound
   to zero.

   Since the field value is the same as a previous field value, the
   entire field can be reconstructed from the context, so it is
   compressed to zero bits and does not appear in the compressed format.

   For example, the source port of the TCP header is a field whose value
   does not change from one packet to the next for a given flow:

     src_port =:= static;

4.11.5.  lsb

   The least significant bits encoding method, "lsb", compresses a field
   whose value differs by a small amount from the value stored in the
   context. The least significant bits of the field value are
   transmitted instead of the original field value.

     field =:= lsb(<num_lsbs_param>, <offset_param>);

   Here, "num_lsbs_param" is the number of least significant bits to
   use, and "offset_param" is the interpretation interval offset as
   defined below.

   The parameter "num_lsbs_param" binds with the "CLENGTH" attribute,
   and the "UVALUE" attribute binds to the value within the interval
   whose least significant bits match the "CVALUE" attribute. The value
   of the "ULENGTH" can be derived from the information stored in the
   context.

   For example, the TCP sequence number:

     tcp_sequence_number =:= lsb(14, 8192);

   This takes up 14 bits, and can communicate any value which is between
   8192 lower than the value of the field stored in context and 8191
   above it.
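
   The arithmetic behind this example can be checked with a short
   Python sketch. It is illustrative only: it assumes the
   interpretation interval [ref_value - offset_param, ref_value +
   (2^num_lsbs_param - 1) - offset_param] used by this section, and the
   helper names "lsb_interval", "lsb_compress" and "lsb_decompress" are
   hypothetical and not part of ROHC-FN.

```python
def lsb_interval(ref_value, num_lsbs, offset):
    """Return the (lower, upper) bounds of the interpretation interval."""
    lower = ref_value - offset
    upper = ref_value + (2 ** num_lsbs - 1) - offset
    return lower, upper


def lsb_compress(uvalue, num_lsbs):
    """CVALUE: the num_lsbs least significant bits of the field value."""
    return uvalue % (2 ** num_lsbs)


def lsb_decompress(cvalue, ref_value, num_lsbs, offset):
    """Return the one value in the interval whose LSBs equal cvalue."""
    lower, _ = lsb_interval(ref_value, num_lsbs, offset)
    # Python's % is never negative for a positive modulus, so the result
    # always lies in [lower, lower + 2^num_lsbs - 1].
    return lower + (cvalue - lower) % (2 ** num_lsbs)


# lsb(14, 8192) around an assumed reference value of 100000
ref = 100000
lower, upper = lsb_interval(ref, 14, 8192)
for value in (lower, ref, upper):
    assert lsb_decompress(lsb_compress(value, 14), ref, 14, 8192) == value
```

   Because the interval contains exactly 2^num_lsbs_param consecutive
   integers, each possible "CVALUE" matches exactly one candidate value,
   which is why decompression in this sketch is unambiguous whenever the
   original value lies inside the interval.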

   The interpretation interval can be described as a function of a value
   stored in the context, ref_value, and of num_lsbs_param:

     f(ref_value, num_lsbs_param) = [ref_value - offset_param,
                      ref_value + (2^num_lsbs_param - 1) - offset_param]

   where offset_param is an integer.

        <-- interpretation interval (size is 2^num_lsbs_param) -->
        |---------------------------+----------------------------|
      lower                     ref_value                      upper
      bound                                                    bound

   where:

     lower bound = ref_value - offset_param
     upper bound = ref_value + (2^num_lsbs_param-1) - offset_param

   The "lsb" encoding method can therefore compress a field whose value
   lies between the lower and the upper bounds, inclusive, of the
   interpretation interval. In particular, if offset_param = 0 then the
   field value can only stay the same or increase relative to the
   reference value ref_value. If offset_param = -1 then it can only
   increase, whereas if offset_param = 2^num_lsbs_param then it can only
   decrease.

   The compressed field takes up the specified number of bits in the
   compressed format (i.e. num_lsbs_param).

   The compressor may not be able to determine the exact reference value
   that is stored in the decompressor context and that will be used by
   the decompressor, since some packets that would have updated the
   context may have been lost or damaged. However, from feedback
   received or by making assumptions, the compressor can limit the
   candidate set of values. The compressor can then select a format
   that uses an "lsb" encoding defined with suitable values for its
   parameters num_lsbs_param and offset_param, such that no matter which
   context value in the candidate set the decompressor uses, the
   resulting decompression is correct. If that is not possible, the
   "lsb" encoding method fails (which typically results in a less
   efficient compressed format being chosen by the compressor).
   How the
   compressor determines what reference values it stores and maintains
   in its set of candidate references is outside the scope of the
   notation.

4.11.6.  crc

   The "crc" encoding method provides a CRC calculated over a block of
   data. The algorithm used to calculate the CRC is the one specified
   in [I-D.ietf-rohc-rfc3095bis-framework]. The "crc" method takes a
   number of parameters:

   o  the number of bits for the CRC (crc_bits),
   o  the bit-pattern for the polynomial (bit_pattern),
   o  the initial value for the CRC register (initial_value),
   o  the value of the block of data, represented using either the
      "UVALUE" or "CVALUE" attribute of a field (block_data_value); and
   o  the size in octets of the block of data (block_data_length).

   i.e.:

     field =:= crc(<crc_bits>, <bit_pattern>, <initial_value>,
                   <block_data_value>, <block_data_length>);

   When specifying the bit pattern for the polynomial, each bit
   represents the coefficient for the corresponding term in the
   polynomial. Note that the highest order term is always present (by
   definition) and therefore does not need specifying in the bit
   pattern. Therefore a CRC polynomial with n terms in it is
   represented by a bit pattern with n-1 bits set.

   The CRC is calculated in least significant bit (LSB) order.

   For example:

     // 3 bit CRC, C(x) = x^0 + x^1 + x^3
     crc_field =:= crc(3, 0x6, 0xF, THIS.CVALUE, THIS.CLENGTH);

   Usage of the "THIS" keyword (see Section 4.6) as shown above, is
   typical when using "crc" encoding. For example, when used in the
   encoding method for an entire header, it causes the CRC to be
   calculated over all fields in the header.

4.12.  Definition of Encoding Methods

   New encoding methods can be defined in a formal specification. These
   compose groups of individual fields into a contiguous block.
1352 Encoding methods have names and may have parameters; they can also be 1353 used in the same way as any other encoding method from the library of 1354 encoding methods. Since they can contain references to other 1355 encoding methods, complicated formats can be broken down into 1356 manageable pieces in a hierarchical fashion. 1358 This section describes the various features used to define new 1359 encoding methods. 1361 4.12.1. Structure 1363 This simplest form of defining an encoding method is to specify a 1364 single encoding. For example: 1366 compound_encoding_method 1367 { 1368 UNCOMPRESSED { 1369 field_1; // 4 bits 1370 field_2; // 12 bits 1371 } 1373 COMPRESSED { 1374 field_2 =:= uncompressed_value(12, 9); // 0 bits 1375 field_1 =:= irregular(4); // 4 bits 1376 } 1377 } 1379 The above begins with the new method's identifier, 1380 "compound_encoding_method". The definition of the method then 1381 follows inside curly braces, "{" and "}". The first item in the 1382 definition is the "UNCOMPRESSED" field list, which gives the order of 1383 the fields in the uncompressed format. This is followed by the 1384 compressed format field list ("COMPRESSED"). This list gives the 1385 order of fields in the compressed format and also gives the encoding 1386 method for each field. 1388 In the example both the formats list each field exactly once. 1389 Sometimes however it is necessary to specify more than one binding 1390 for a given field, which means it appears more than once in the field 1391 list. In this case it is the first occurrence of the field in the 1392 list which indicates its position in the field order. The subsequent 1393 occurrences of the field only specify binding information, not field 1394 order information. 1396 The different components of this example are described in more detail 1397 below. Other components that can be used in the definition of 1398 encoding methods are also defined thereafter. 1400 4.12.1.1. 
Uncompressed Format - "UNCOMPRESSED"

   The uncompressed field list is defined by "UNCOMPRESSED", which
   specifies the fields of the uncompressed format in the order that
   they appear in the uncompressed header.  The sum of the lengths of
   the individual uncompressed fields in the list must be equal to the
   length of the field being encoded.  Finally, the representation of
   the uncompressed format described using the list of fields in the
   "UNCOMPRESSED" section, for which compressed formats are being
   defined, always consists of one single contiguous block of bits.

   In the example above in Section 4.12.1, the uncompressed field list
   is "field_1" followed by "field_2".  This means that a field being
   encoded by this method is divided into two subfields, "field_1" and
   "field_2".  The total uncompressed length of these two fields
   therefore equals the length of the field being encoded:

     field_1.ULENGTH + field_2.ULENGTH == THIS.ULENGTH

   In the example, there are only two fields, but any number of
   subfields may be used.  This relationship applies to however many
   fields are actually used.  Any arrangement of fields that efficiently
   describes the content of the uncompressed header may be chosen --
   this need not be the same as the one described in the specifications
   for the protocol header being compressed.

   For example, there may be a protocol whose header contains a 16 bit
   sequence number, but whose sessions tend to be short lived.  This
   would mean that the high bits of the sequence number are almost
   always constant.  The "UNCOMPRESSED" format could reflect this by
   splitting the original uncompressed field into two fields, one field
   to represent the almost-always-zero part of the sequence number, and
   a second field to represent the salient part.
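
   The sequence number split described above can be illustrated
   numerically.  The following Python sketch is not part of ROHC-FN;
   the function name "split_field" and the 12/4 bit split are
   assumptions chosen purely for illustration.  It shows that the
   lengths of the two subfields sum to the length of the original
   field, mirroring the "ULENGTH" relationship given earlier.

```python
def split_field(value, total_bits, low_bits):
    """Split an unsigned field into (high, low) parts.

    Each part is returned as a (value, length_in_bits) pair.
    """
    if not 0 <= low_bits <= total_bits:
        raise ValueError("low_bits must lie between 0 and total_bits")
    if not 0 <= value < (1 << total_bits):
        raise ValueError("value does not fit in total_bits")
    high_bits = total_bits - low_bits
    high = value >> low_bits               # almost-always-zero part
    low = value & ((1 << low_bits) - 1)    # salient part
    return (high, high_bits), (low, low_bits)


# A short-lived session: sequence number 7 carried in a 16-bit field.
(high, high_len), (low, low_len) = split_field(7, 16, 4)
```

   With this hypothetical split, the high part would be a natural
   candidate for a zero-bit encoding, while only the low part would
   need to be carried in the compressed format.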

   An "UNCOMPRESSED" field list may specify encoding methods in the same
   way as the "COMPRESSED" field list in the example.  Encoding methods
   specified therein are used whenever a packet with that uncompressed
   format is being encoded.  The encoding of a packet with a given
   uncompressed format can only succeed if all of its encoding methods
   and "ENFORCE" statements succeed (see Section 4.9).

   The total length of an uncompressed format must always be defined.
   The length of each of the fields in an uncompressed format must also
   be defined.  This means that the bindings in the "UNCOMPRESSED",
   "COMPRESSED" (see Section 4.12.1.2 below), "CONTROL" (see
   Section 4.12.1.3 below), "INITIAL" (see Section 4.12.1.4 below) and
   "DEFAULT" (see Section 4.12.1.5 below) field lists must between them
   define the "ULENGTH" attribute of every field in an uncompressed
   format so that there is an unambiguous mapping from the bits in the
   uncompressed format to the fields listed in each "UNCOMPRESSED" field
   list.

4.12.1.2.  Compressed Format - "COMPRESSED"

   Similar to the uncompressed field list, the compressed header will
   appear in the order specified by the compressed field list given for
   a compressed format.  Each individual field is encoded in the manner
   given for that field.  The total length of the compressed data will
   be the sum of the compressed lengths of all the individual fields.
   In the example from Section 4.12.1, the encoding methods used for
   these fields indicate that they are zero and 4 bits long, making a
   total of 4 bits.

   The order of the fields specified in a "COMPRESSED" field list does
   not have to match the order they appear in the "UNCOMPRESSED" field
   list.  It may be desirable to reorder the fields in the compressed
   format to align the compressed header to the octet boundary, or for
   other reasons.
   In the above example, the order is in fact the opposite of that in
   the uncompressed format.

   The compressed field list specifies that the encoding for "field_1"
   is "irregular", and takes up four bits in both the compressed format
   and uncompressed format.  The encoding for "field_2" is
   "uncompressed_value", which means that the field has a fixed value,
   so it can be compressed to zero bits.  The value it takes is 9, and
   it is 12 bits wide in the uncompressed format.

   Fields like "field_2", which compress to zero bits in length, may
   appear anywhere in the field list without changing the compressed
   format because their position in the list is not significant.  In
   fact, if the encoding method for this field were defined elsewhere
   (e.g. in the "UNCOMPRESSED" section), this field could be omitted
   from the "COMPRESSED" section altogether:

     compound_encoding_method
     {
       UNCOMPRESSED {
         field_1;                                 //  4 bits
         field_2 =:= uncompressed_value(12, 9);   // 12 bits
       }

       COMPRESSED {
         field_1 =:= irregular(4);                //  4 bits
       }
     }

   The total length of a compressed format must always be defined.  The
   length of each of the fields in a compressed format must also be
   defined.  This means that the bindings in the "UNCOMPRESSED",
   "COMPRESSED", "CONTROL" (see Section 4.12.1.3 below), "INITIAL" (see
   Section 4.12.1.4 below) and "DEFAULT" (see Section 4.12.1.5 below)
   field lists must between them define the "CLENGTH" attribute of every
   field in a compressed format so that there is an unambiguous mapping
   from the bits in the compressed format to the fields listed in each
   "COMPRESSED" field list.

4.12.1.3.  Control Fields - "CONTROL"

   Control fields are defined using the "CONTROL" field list.
   The control field list specifies all fields that do not appear in the
   uncompressed format but which have an uncompressed value
   (specifically those with an "ULENGTH" greater than zero).  Such
   fields may be used to help compress fields from the uncompressed
   format more efficiently.  A control field could be used to improve
   efficiency by representing some commonality between a number of the
   uncompressed fields, or by representing some information about the
   flow that is not explicitly contained in the protocol headers.

   For example in IPv4, the behaviour of the IP-ID field in a flow
   varies depending on how the endpoints handle IP-IDs.  Sometimes the
   behaviour is effectively random and sometimes the IP-ID follows a
   predictable sequence.  The type of IP-ID behaviour is information
   that is never communicated explicitly in the uncompressed header.

   However, a profile can still be designed to identify the behaviour
   and adjust the compression strategy according to the identified
   behaviour, thereby improving the compression performance.  To do so,
   the ROHC-FN specification can introduce an explicit field to
   communicate the IP-ID behaviour in compressed format -- this is done
   by introducing a control field:

     ipv4
     {
       UNCOMPRESSED {
         version;       //  4 bits
         hdr_length;    //  4 bits
         protocol;      //  8 bits
         tos_tc;        //  6 bits
         ip_ecn_flags;  //  2 bits
         ttl_hopl;      //  8 bits
         df;            //  1 bit
         mf;            //  1 bit
         rf;            //  1 bit
         frag_offset;   // 13 bits
         ip_id;         // 16 bits
         src_addr;      // 32 bits
         dst_addr;      // 32 bits
         checksum;      // 16 bits
         length;        // 16 bits
       }

       CONTROL {
         ip_id_behavior; // 1 bit
           :
           :

   The "CONTROL" field list is equivalent to the "UNCOMPRESSED" field
   list for fields that do not appear in the uncompressed format.  It
   defines a field that has the same properties (the same defined
   attributes etc.)
   as fields appearing in the uncompressed format.

   Control fields are initialised by using the appropriate encoding
   methods and/or by using "ENFORCE" statements.  This may be done
   inside the "CONTROL" field list.

   For example:

     example_encoding_method_definition
     {
       UNCOMPRESSED {
         field_1 =:= some_encoding;
       }

       CONTROL {
         scaled_field;
         ENFORCE(scaled_field.UVALUE == field_1.UVALUE / 8);
         ENFORCE(scaled_field.ULENGTH == field_1.ULENGTH - 3);
       }

       COMPRESSED {
         scaled_field =:= lsb(4, 0);
       }
     }

   This control field is used to scale down a field in the uncompressed
   format by a factor of 8 before encoding it with the "lsb" encoding
   method.  Scaling it down makes the "lsb" encoding more efficient.

   Control fields may also be used with global scope.  In this case
   their declaration must be outside of any encoding method definition.
   They are then visible within any encoding method, thus allowing
   information to be shared between encoding methods directly.

4.12.1.4.  Initial Values - "INITIAL"

   In order to allow fields in the very first usage of a specific format
   to be compressed with "static", "lsb", or other encoding methods
   which depend on the context, it is possible to specify initial
   bindings for such fields.  This is done using "INITIAL", for example:

     INITIAL {
       field =:= uncompressed_value(4, 6);
     }

   This initialises the "UVALUE" of "field" to 6 and initialises its
   "ULENGTH" to 4.  Unlike all other bindings specified in the formal
   notation, these bindings are applied to the context of the field, if
   the field's context is undefined.  This is particularly useful when
   using encoding methods which rely on context being present, such as
   "static" or "lsb", for example for the first packet in a flow.
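
   The rule that "INITIAL" bindings are applied only when a field's
   context is undefined can be sketched as follows.  This Python
   fragment is not part of ROHC-FN; the dictionary-based context and
   the name "apply_initial" are assumptions made for illustration, and
   a real implementation may represent context quite differently.

```python
def apply_initial(context, initial_bindings):
    """Return a copy of the context with INITIAL bindings filled in.

    Both arguments map a field name to a (uvalue, ulength) pair.  A
    field name missing from the context models an undefined context.
    """
    updated = dict(context)
    for field, binding in initial_bindings.items():
        # INITIAL only takes effect where no context exists yet.
        updated.setdefault(field, binding)
    return updated


# Models: field =:= uncompressed_value(4, 6) inside INITIAL { ... }
initial = {"field": (6, 4)}

first_packet = apply_initial({}, initial)                 # no context yet
later_packet = apply_initial({"field": (9, 4)}, initial)  # context exists
```

   In this sketch the first call adopts the initial binding, while the
   second leaves the existing context value untouched.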

   Because the "INITIAL" field list is used to bind the context alone,
   it makes no sense to specify initial bindings which themselves rely
   on the context (e.g. lsb).  Such usage is not allowed.

4.12.1.5.  Default Field Bindings - "DEFAULT"

   Default bindings may be specified for each field or attribute.  The
   default encoding methods specify the encoding method to use for a
   field if no binding is given elsewhere for the value of that field.
   This is helpful to keep the definition of the formats concise, as the
   same encoding method need not be repeated for every format, when for
   example defining multiple formats (see Section 4.12.3).

   Default bindings are optional and may be given for any combination of
   fields and attributes which are in scope.

   The syntax for specifying default bindings is similar to that used to
   specify a compressed or uncompressed format.  However, the order of
   the fields in the field list does not affect the order of the fields
   in either the compressed or uncompressed format.  This is because the
   field order is specified individually for each "COMPRESSED" format
   and "UNCOMPRESSED" format.

   Here is an example:

     DEFAULT {
       field_1 =:= uncompressed_value(4, 1);
       field_2 =:= uncompressed_value(4, 2);
       field_3 =:= lsb(3, -1);
       ENFORCE(field_4.ULENGTH == 4);
     }

   Here default bindings are specified for fields 1 to 3.  A default
   binding for the "ULENGTH" attribute of field_4 is also specified.

   Fields for which there is a default encoding method do not need their
   bindings to be specified in the field list of any format that uses
   the default encoding method for that field.  Any format that does not
   use the default encoding method must explicitly specify a binding for
   the value of that field's attributes.

   If a binding is not specified for the attributes of a field, the
   default encoding method is used.
   If the default encoding method always compresses the field down to
   zero bits, the field can be omitted from the compressed format's
   field list.  Like any other zero bit field, its position in the field
   list is not significant.

   The "DEFAULT" field list may contain default bindings for individual
   attributes by using "ENFORCE" statements.  A default binding for an
   individual attribute will only be used if there is no binding given
   for that attribute nor the field to which it belongs.  If there is an
   "ENFORCE" statement binding that attribute, or an encoding method
   binding the field to which it belongs, the default binding for the
   attribute will not be used.  This applies even if the specified
   encoding method does not bind the particular attribute given in the
   "DEFAULT" section.  However an "ENFORCE" statement which just binds
   the length of the field still allows the default bindings to be used,
   except for default "ENFORCE" statements which bind nothing but the
   field's length.

   To clarify, assuming the default methods given in the example above,
   the first three of the following four compressed formats would not
   use the default binding for "field_4.ULENGTH":

     COMPRESSED format1 {
       ENFORCE(field_4.ULENGTH == 3);  // set ULENGTH to 3
       ENFORCE(field_4.UVALUE == 7);   // set UVALUE to 7
     }

     COMPRESSED format2 {
       field_4 =:= irregular(3);       // set ULENGTH to 3
     }

     COMPRESSED format3 {
       field_4 =:= '1010';             // set ULENGTH to zero
     }

     COMPRESSED format4 {
       ENFORCE(field_4.UVALUE == 12);  // use default ULENGTH
     }

   The fourth format is the only one which uses the default binding for
   "field_4.ULENGTH".

   In summary, the default bindings of an encoding method are only used
   for formats which do not already specify an encoding for the value of
   all of their fields.
   For the formats that do use the default methods, only those fields
   and attributes whose bindings are not specified are looked up in the
   default methods.

4.12.2.  Arguments

   Encoding methods may take arguments that control the mapping between
   compressed and uncompressed fields.  These are specified immediately
   after the method's name, in parentheses, as a comma separated list.

   For example:

     poor_mans_lsb(variable_length)
     {
       UNCOMPRESSED {
         constant_bits;
         variable_bits;
       }

       COMPRESSED {
         variable_bits =:= irregular(variable_length);
         constant_bits =:= static;
       }
     }

   As with any encoding method, all arguments take individual values
   such as an integer literal or a field attribute, rather than entire
   fields.  Although entire fields cannot be passed as arguments, it is
   possible to pass each of their attributes instead, which is
   equivalent.

   Recall that all bindings are two-way so that rather than the
   arguments acting as "inputs" to the encoding method, the result of an
   encoding method may be to bind the parameters passed to it.

   For example:

     set_to_double(arg1, arg2)
     {
       CONTROL {
         ENFORCE(arg1 == 2 * arg2);
       }
     }

   This encoding method will attempt to bind the first argument to twice
   the value of the second.  In fact this "encoding" method is
   pathological.  Since it defines no fields, it does not do any actual
   encoding at all.  "CONTROL" sections are more appropriate to use for
   this purpose than "UNCOMPRESSED".

4.12.3.  Multiple Formats

   Encoding methods can also define multiple formats for a given header.
   This allows different compression methods to be used depending on
   what is the most efficient way of compressing a particular header.

   For example, a field may have a fixed value most of the time, but the
   value may occasionally change.
   Using a single format for the encoding, this field would have to be
   encoded using "irregular" (see Section 4.11.3), even though the value
   only changes rarely.  However, by defining multiple formats, we can
   provide two alternative encodings: one for when the value remains
   fixed and another for when the value changes.

   This is the topic of the following sub-sections.

4.12.3.1.  Naming Convention

   When compressed formats are defined, they must be defined using the
   reserved word "COMPRESSED".  Similarly uncompressed formats must be
   defined using the reserved word "UNCOMPRESSED".  After each of these
   keywords, a name may be given for the format.  If no name is given to
   the format, the name of the format is empty.

   Format names, except for the case where the name is empty, follow the
   syntactic rules of identifiers as described in Section 4.2.

   Format names must be unique within the scope of the encoding method
   to which they belong, except for the empty name which may be used for
   one "COMPRESSED" and one "UNCOMPRESSED" format.

4.12.3.2.  Format Discrimination

   Each of the compressed formats has its own field list.  A compressor
   may pick any of these alternative formats to compress a header, as
   long as the field bindings it employs can be used with the
   uncompressed format.  For example, the compressor could not choose to
   use a compressed format that had a "static" encoding for a field
   whose "UVALUE" attribute differs from its corresponding value in the
   context.

   More formally, the compressor can choose any combination of an
   uncompressed format and a compressed format for which no binding for
   any of the field's attributes "fail", i.e. the encoding methods and
   "ENFORCE" statements (see Section 4.9) which bind their compressed
   attributes succeed.  If there are multiple successful combinations,
   the compressor can choose any one.
   Otherwise if there are no successful combinations, the encoding
   method "fails".  A format will never fail due to it not defining an
   uncompressed attribute of a field.  A format only fails if it fails
   to define one of the compressed attributes of one of the fields in
   the compressed format.

   Because the compressor has a choice, it must be possible for the
   decompressor to discriminate between the different compressed formats
   that the compressor could have chosen.  A simple approach to this
   problem is for each compressed format to include a "discriminator"
   that uniquely identifies that particular "COMPRESSED" format.  A
   discriminator is a control field; it is not derived from any of the
   uncompressed field values (see Section 4.11.2).

4.12.3.3.  Example of Multiple Formats

   Putting this all together, here is a complete example of the
   definition of an encoding method with multiple compressed formats:

     example_multiple_formats
     {
       UNCOMPRESSED {
         field_1;   //  4 bits
         field_2;   //  4 bits
         field_3;   // 24 bits
       }

       DEFAULT {
         field_1 =:= static;
         field_2 =:= uncompressed_value(4, 2);
         field_3 =:= lsb(4, 0);
       }

       COMPRESSED format0 {
         discriminator =:= '0';        //  1 bit
         field_3;                      //  4 bits
       }

       COMPRESSED format1 {
         discriminator =:= '1';        //  1 bit
         field_1 =:= irregular(4);     //  4 bits
         field_3 =:= irregular(24);    // 24 bits
       }
     }

   Note the following:

   o  "field_1" and "field_3" both have default encoding methods
      specified for them, which are used in "format0", but are
      overridden in "format1"; the default encoding method of "field_2"
      however, is not overridden.
   o  "field_1" and "field_2" have default encoding methods which
      compress to zero bits.  When these are used in "format0", the
      field names do not appear in the field list.
   o  "field_3" has an encoding method which does not compress to zero
      bits, so whilst "field_3" has no encoding specified for it in the
      field list of "format0", it still needs to appear in the field
      list to specify where it goes in the compressed format.
   o  In the example, all the fields in the uncompressed format have
      default encoding methods specified for them, but this is not a
      requirement.  Default encodings can be specified for only some or
      even none of the fields of the uncompressed format.
   o  In the example, all the default encoding methods are on fields
      from the uncompressed format, but this is not a requirement.
      Default encoding methods can be specified for control fields.

4.13.  Profile-specific Encoding Methods

   The library of encoding methods defined by ROHC-FN in Section 4.11
   provides a basic and generic set of field encoding methods.  When
   using a ROHC-FN specification in a ROHC profile, some additional
   encodings specific to the particular protocol header being compressed
   may however be needed, such as methods that infer the value of a
   field from other values.

   These methods are specific to the properties of the protocol being
   compressed and will thus have to be defined within the profile
   specification itself.  Such profile-specific encoding methods,
   defined either in ROHC-FN syntax or rigorously in plain text, can be
   referred to in the ROHC-FN specification of the profile's formats in
   the same way as any other method in the ROHC-FN library.

   Encoding methods which are not defined in the formal notation are
   specified by giving their name, followed by a short description of
   where they are defined, in double quotes, and a semi-colon.

   For example:

     inferred_ip_v4_header_checksum "defined in RFCxxxx Section 6.4.1";

5.
Security Considerations

   This draft describes a formal notation similar to ABNF [RFC4234], and
   hence is not believed to raise any security issues (note that ABNF
   has a completely separate purpose to the ROHC formal notation).

6.  IANA Considerations

   This document has no actions for IANA.

7.  Contributors

   Richard Price did much of the foundational work on the formal
   notation.  He authored the initial internet draft describing a formal
   notation on which this document is based.

   Kristofer Sandlund contributed to this work by applying new ideas to
   the ROHC-TCP profile, by providing feedback and by helping to resolve
   different issues during the entire development of the notation.

   Carsten Bormann provided the translation of the formal notation
   syntax using ABNF in Appendix A, and also contributed feedback and
   reviews to validate the completeness and the correctness of the
   notation.

8.  Acknowledgements

   A number of important concepts and ideas have been borrowed from ROHC
   [RFC3095].

   Thanks to Mark West, Eilert Brinkmann, Alan Ford and Lars-Erik
   Jonsson for their contribution, reviews and feedback which led to
   significant improvements to the readability, completeness and overall
   quality of the notation.

   Thanks to Stewart Sadler, Caroline Daniels, Alan Finney and David
   Findlay for their reviews and comments.  Thanks to Rob Hancock and
   Stephen McCann for early work on the formal notation.  The authors
   would also like to thank Christian Schmidt, Qian Zhang, Hongbin Liao
   and Max Riegel for their comments and valuable input.

   Additional thanks: this document was reviewed during working group
   last-call by committed reviewers Mark West, Carsten Bormann and Joe
   Touch, as well as by Sally Floyd who provided a review at the request
   of the Transport Area Directors.
   Thanks also to Magnus Westerlund for his feedback in preparation for
   the IESG review.

9.  References

9.1.  Normative References

   [C90]      ISO/IEC, "ISO/IEC 9899:1990 Information technology --
              Programming Language C", ISO 9899:1990, April 1990.

   [I-D.ietf-rohc-rfc3095bis-framework]
              Jonsson, L., "The RObust Header Compression (ROHC)
              Framework", draft-ietf-rohc-rfc3095bis-framework-01 (work
              in progress), July 2006.

   [RFC2822]  Resnick, P., Ed., "STANDARD FOR THE FORMAT OF ARPA
              INTERNET TEXT MESSAGES", RFC 2822, April 2001.

   [RFC4234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", RFC 4234, October 2005.

9.2.  Informative References

   [RFC3095]  Bormann, C., Burmeister, C., Degermark, M., Fukushima, H.,
              Hannu, H., Jonsson, L-E., Hakenberg, R., Koren, T., Le,
              K., Liu, Z., Martensson, A., Miyazaki, A., Svanbro, K.,
              Wiebke, T., Yoshimura, T., and H. Zheng, "RObust Header
              Compression (ROHC): Framework and four profiles: RTP, UDP,
              ESP, and uncompressed", RFC 3095, July 2001.

   [RFC791]   University of Southern California, "DARPA INTERNET PROGRAM
              PROTOCOL SPECIFICATION", RFC 791, September 1981.

Appendix A.  Formal Syntax of ROHC-FN

   This section gives a definition of the syntax of ROHC-FN in ABNF
   [RFC4234], using "fnspec" as the start rule.

   ; overall structure
   fnspec       = S *(constdef S) [globctl S] 1*(methdef S)
   constdef     = constname S "=" S expn S ";"
   globctl      = CONTROL S formbody
   methdef      = id S [parmlist S] "{" S 1*(formatdef S) "}"
                / id S [parmlist S] STRQ *STRCHAR STRQ S ";"
   parmlist     = "(" S id S *( "," S id S ) ")"
   formatdef    = formhead S formbody
   formhead     = UNCOMPRESSED [ 1*WS id ]
                / COMPRESSED [ 1*WS id ]
                / CONTROL / INITIAL / DEFAULT
   formbody     = "{" S *((fielddef/enforcer) S) "}"
   fielddef     = fieldgroup S ["=:=" S encspec S] [lenspec S] ";"
   fieldgroup   = fieldname *( S ":" S fieldname )
   fieldname    = id
   encspec      = "'" *("0"/"1") "'"
                / id [ S "(" S expn S *( "," S expn S ) ")"]
   lenspec      = "[" S expn S *("," S expn S) "]"
   enforcer     = ENFORCE S "(" S expn S ")" S ";"

   ; expressions
   expn         = *(expnb S "||" S) expnb
   expnb        = *(expna S "&&" S) expna
   expna        = *(expn7 S ("=="/"!=") S) expn7
   expn7        = *(expn6 S ("<"/"<="/">"/">=") S) expn6
   expn6        = *(expn4 S ("+"/"-") S) expn4
   expn4        = *(expn3 S ("*"/"/"/"%") S) expn3
   expn3        = expn2 [S "^" S expn3]
   expn2        = ["!" S] expn1
   expn1        = expn0 / attref / constname / litval / id
   expn0        = "(" S expn S ")" / VARIABLE
   attref       = fieldnameref "."
                  attname
   fieldnameref = fieldname / THIS
   attname      = ( U / C ) ( LENGTH / VALUE )
   litval       = ["-"] "0b" 1*("0"/"1")
                / ["-"] "0x" 1*(DIGIT/"a"/"b"/"c"/"d"/"e"/"f")
                / ["-"] 1*DIGIT
                / false / true

   ; lexical categories
   constname    = UPCASE *(UPCASE / DIGIT / "_")
   id           = ALPHA *(ALPHA / DIGIT / "_")
   ALPHA        = %x41-5A / %x61-7A
   UPCASE       = %x41-5A
   DIGIT        = %x30-39
   COMMENT      = "//" *(SP / HTAB / VCHAR) CRLF
   SP           = %x20
   HTAB         = %x09
   VCHAR        = %x21-7E
   CRLF         = %x0A / %x0D.0A
   NL           = COMMENT / CRLF
   WS           = SP / HTAB / NL
   S            = *WS
   STRCHAR      = SP / HTAB / %x21 / %x23-7E
   STRQ         = %x22

   ; case-sensitive literals
   C            = %d67
   COMPRESSED   = %d67.79.77.80.82.69.83.83.69.68
   CONTROL      = %d67.79.78.84.82.79.76
   DEFAULT      = %d68.69.70.65.85.76.84
   ENFORCE      = %d69.78.70.79.82.67.69
   INITIAL      = %d73.78.73.84.73.65.76
   LENGTH       = %d76.69.78.71.84.72
   THIS         = %d84.72.73.83
   U            = %d85
   UNCOMPRESSED = %d85.78.67.79.77.80.82.69.83.83.69.68
   VALUE        = %d86.65.76.85.69
   VARIABLE     = %d86.65.82.73.65.66.76.69
   false        = %d102.97.108.115.101
   true         = %d116.114.117.101

Appendix B.  Bit-level Worked Example

   This section gives a worked example at the bit level, showing how a
   simple ROHC-FN specification describes the compression of real data
   from an imaginary protocol header.  The example used has been kept
   fairly simple, whilst still aiming to illustrate some of the
   intricacies that arise in use of the notation.  In particular, fields
   have been kept short to make it possible to read the binary
   representation of the headers without too much difficulty.

B.1.  Example Packet Format

   Our imaginary header is just 16 bits long, and consists of the
   following fields:

   1.  version number -- 2 bits
   2.  type -- 2 bits
   3.  flow id -- 4 bits
   4.  sequence number -- 4 bits
   5.
       flag bits -- 4 bits

   So for example 0101000100010000 indicates a header with a version
   number of one, a type of one, a flow id of one, a sequence number of
   one, and all flag bits set to zero.

   Here is an ASCII box notation diagram of the imaginary header:

       0   1   2   3   4   5   6   7
     +---+---+---+---+---+---+---+---+
     |version| type  |    flow_id    |
     +---+---+---+---+---+---+---+---+
     |  sequence_no  |   flag_bits   |
     +---+---+---+---+---+---+---+---+

B.2.  Initial Encoding

   An initial definition based solely on the above information is:

     eg_header
     {
       UNCOMPRESSED {
         version_no   [ 2 ];
         type         [ 2 ];
         flow_id      [ 4 ];
         sequence_no  [ 4 ];
         flag_bits    [ 4 ];
       }

       COMPRESSED initial_definition {
         version_no  =:= irregular(2);
         type        =:= irregular(2);
         flow_id     =:= irregular(4);
         sequence_no =:= irregular(4);
         flag_bits   =:= irregular(4);
       }
     }

   This defines the format nicely, but doesn't actually offer any
   compression.  If we use it to encode the above header, we get:

     Uncompressed header: 0101000100010000
     Compressed header:   0101000100010000

   This is because we have stated that all fields are "irregular" --
   i.e. we haven't specified anything about their behaviour.

   Note that since we have only one compressed format and one
   uncompressed format, it makes no difference whether the encoding
   methods for each field are specified in the compressed or
   uncompressed format.  It would make no difference at all if we wrote
   the following instead:

     eg_header
     {
       UNCOMPRESSED {
         version_no  =:= irregular(2);
         type        =:= irregular(2);
         flow_id     =:= irregular(4);
         sequence_no =:= irregular(4);
         flag_bits   =:= irregular(4);
       }

       COMPRESSED initial_definition {
         version_no   [ 2 ];
         type         [ 2 ];
         flow_id      [ 4 ];
         sequence_no  [ 4 ];
         flag_bits    [ 4 ];
       }
     }

B.3.
Basic Compression

   In order to achieve any compression we need to notate more knowledge
   about the header and its behaviour in a flow.  For example, we may
   know the following facts about the header:

   1.  version number -- indicates which version of the protocol this
       is: always one for this version of the protocol.
   2.  type -- may take any value.
   3.  flow id -- may take any value.
   4.  sequence number -- may take any value.
   5.  flag bits -- contains three flags, a, b and c, each of which may
       be set or clear, and a reserved flag bit, which is always clear
       (i.e. zero).

   We could notate this knowledge as follows:

     eg_header
     {
       UNCOMPRESSED {
         version_no     [ 2 ];
         type           [ 2 ];
         flow_id        [ 4 ];
         sequence_no    [ 4 ];
         abc_flag_bits  [ 3 ];
         reserved_flag  [ 1 ];
       }

       COMPRESSED basic {
         version_no    =:= uncompressed_value(2, 1) [ 0 ];
         type          =:= irregular(2)             [ 2 ];
         flow_id       =:= irregular(4)             [ 4 ];
         sequence_no   =:= irregular(4)             [ 4 ];
         abc_flag_bits =:= irregular(3)             [ 3 ];
         reserved_flag =:= uncompressed_value(1, 0) [ 0 ];
       }
     }

   Using this simple scheme, we have successfully encoded the fact that
   one of the fields has a permanently fixed value of one, and therefore
   contains no useful information.  We have also encoded the fact that
   the final flag bit is always zero, which again contains no useful
   information.  Both of these facts have been notated using the
   "uncompressed_value" encoding method (see Section 4.11.1).

   Using this new encoding on the above header, we get:

     Uncompressed header: 0101000100010000
     Compressed header:   0100010001000

   which reduces the amount of data we need to transmit by roughly 20%
   (from 16 bits to 13 bits).  However, this encoding fails to take
   advantage of relationships between values of a field in one packet
   and its value in subsequent packets.
   For example, every header in the following sequence is compressed by
   the same amount despite the similarities between them:

     Uncompressed header: 0101000100010000
     Compressed header:   0100010001000

     Uncompressed header: 0101000101000000
     Compressed header:   0100010100000

     Uncompressed header: 0110000101110000
     Compressed header:   1000010111000

B.4.  Inter-packet compression

   The profile we have defined so far has not compressed the sequence
   number or flow ID fields at all, since they can take any value.
   However the value of each of these fields in one header has a very
   simple relationship to their values in previous headers:

   o  the sequence number is unusual -- it increases by three each time,
   o  the flow_id stays the same -- it always has the same value that it
      did in the previous header in the flow,
   o  the abc_flag_bits stay the same most of the time -- they usually
      have the same value that they did in the previous header in the
      flow.

   An obvious way of notating this is as follows:

     // This obvious encoding will not work (correct encoding below)
     eg_header
     {
       UNCOMPRESSED {
         version_no     [ 2 ];
         type           [ 2 ];
         flow_id        [ 4 ];
         sequence_no    [ 4 ];
         abc_flag_bits  [ 3 ];
         reserved_flag  [ 1 ];
       }

       COMPRESSED obvious {
         version_no    =:= uncompressed_value(2, 1);
         type          =:= irregular(2);
         flow_id       =:= static;
         sequence_no   =:= lsb(0, -3);
         abc_flag_bits =:= irregular(3);
         reserved_flag =:= uncompressed_value(1, 0);
       }
     }

   The dependency on previous packets is notated using the "static" and
   "lsb" encoding methods (see Section 4.11.4 and Section 4.11.5
   respectively).  However there are a few problems with the above
   notation.

   Firstly, and most importantly, the "flow_id" field is notated as
   "static" which means that it doesn't change from packet to packet.
   However, the notation does not indicate how to communicate the value
   of the field initially.  There is no point saying "it's the same
   value as last time", if there has not been a first time where we
   define what that value is, so that it can be referred back to.  The
   above notation provides no way of communicating that.  Similarly with
   the sequence number -- there needs to be a way of communicating its
   initial value.  In fact, except for the explicit notation indicating
   their lengths, even the lengths of these two fields would be left
   undefined.  This problem will be solved below, in Appendix B.5.

   Secondly, the sequence number field is communicated very efficiently
   in zero bits, but it is not at all robust against packet loss.  If a
   packet is lost then there is no way to handle the missing sequence
   number.  When communicating sequence numbers, or any other field
   encoded with LSB encoding, a very important consideration for the
   notator is how robust against packet loss the compressed protocol
   should be.  This will vary a lot from protocol stack to protocol
   stack.  For the example protocol we'll assume short, low overhead
   flows and say we need to be robust to the loss of just one packet,
   which we can achieve with two bits of LSB encoding (one bit isn't
   enough since the sequence number increases by three each time, see
   Section 4.11.5).  This will be solved below in Appendix B.5.

   Finally, although the flag bits are usually the same as in the
   previous header in the flow, the profile doesn't make any use of this
   fact; since they are sometimes not the same as those in the previous
   header, it is not safe to say that they are always the same, so
   "static" encoding can't be used exclusively.  This problem will be
   solved later through the use of multiple formats in Appendix B.6.

B.5.  Specifying Initial Values

   To communicate initial values for fields compressed with a context
   dependent encoding such as "static" or "lsb" we use an "INITIAL"
   field list.  This can help with fields whose start value is fixed and
   known.  For example if we knew that at the start of the flow,
   "flow_id" would always be 1 and "sequence_no" would always be 0, we
   could notate that like this:

     // This encoding will not work either (correct encoding below)
     eg_header
     {
       UNCOMPRESSED {
         version_no     [ 2 ];
         type           [ 2 ];
         flow_id        [ 4 ];
         sequence_no    [ 4 ];
         abc_flag_bits  [ 3 ];
         reserved_flag  [ 1 ];
       }

       INITIAL {
         // set initial values of fields before flow starts
         flow_id        =:= uncompressed_value(4, 1);
         sequence_no    =:= uncompressed_value(4, 0);
       }

       COMPRESSED obvious {
         version_no     =:= uncompressed_value(2, 1);
         type           =:= irregular(2);
         flow_id        =:= static;
         sequence_no    =:= lsb(2, -3);
         abc_flag_bits  =:= irregular(3);
         reserved_flag  =:= uncompressed_value(1, 0);
       }
     }

   However, this use of "INITIAL" is no good since the initial values of
   both "flow_id" and "sequence_no" vary from flow to flow.  "INITIAL"
   is only applicable where the initial value of a field is fixed, as is
   often the case with control fields.

B.6.  Multiple Packet Formats

   To communicate initial values for the sequence number and flow ID
   fields correctly, and to take advantage of the fact that the flag
   bits are usually the same as in the previous header, we need to
   depart from the single format encoding we are currently using and
   instead use multiple formats.  Here, we have expressed the encodings
   for two of the fields in the uncompressed format, since they will
   always be true for uncompressed headers of that format.
   The remaining fields, whose encoding method may depend on exactly how
   the header is being compressed, have their encodings specified in the
   compressed formats.

     eg_header
     {
       UNCOMPRESSED {
         version_no     =:= uncompressed_value(2, 1) [ 2 ];
         type                                        [ 2 ];
         flow_id                                     [ 4 ];
         sequence_no                                 [ 4 ];
         abc_flag_bits                               [ 3 ];
         reserved_flag  =:= uncompressed_value(1, 0) [ 1 ];
       }

       COMPRESSED irregular_format {
         discriminator  =:= '0'                      [ 1 ];
         version_no                                  [ 0 ];
         type           =:= irregular(2)             [ 2 ];
         flow_id        =:= irregular(4)             [ 4 ];
         sequence_no    =:= irregular(4)             [ 4 ];
         abc_flag_bits  =:= irregular(3)             [ 3 ];
         reserved_flag                               [ 0 ];
       }

       COMPRESSED compressed_format {
         discriminator  =:= '1'                      [ 1 ];
         version_no                                  [ 0 ];
         type           =:= irregular(2)             [ 2 ];
         flow_id        =:= static                   [ 0 ];
         sequence_no    =:= lsb(2, -3)               [ 2 ];
         abc_flag_bits  =:= static                   [ 0 ];
         reserved_flag                               [ 0 ];
       }
     }

   Note that we have had to add a discriminator field, so that the
   decompressor can tell which format has been used by the compressor.
   The format with a "static" flow ID and "lsb" encoded sequence number
   is now 5 bits long.  Note that despite having to add the
   discriminator field, this format is still the same size as the
   original incorrect naive notation, because this notation takes
   advantage of the fact that the abc flag bits rarely change.

   However, the original format (with an "irregular" flow ID and
   sequence number) has also grown by one bit due to the addition of the
   discriminator.  An important consideration when creating multiple
   formats is whether each format occurs frequently enough that the
   average compressed header length is shorter as a result of its usage.
   For example, if in fact the flag bits always changed between packets,
   the "static" encoding could never be used; all we would have achieved
   is to lengthen the "irregular" format by one bit.

   Using the above notation, we now get:

     Uncompressed header: 0101000100010000
     Compressed header:   00100010001000

     Uncompressed header: 0101000101000000
     Compressed header:   10100 ; 00100010100000

     Uncompressed header: 0110000101110000
     Compressed header:   11011 ; 01000010111000

   The first header in the stream is compressed the same way as before,
   except that it now has the extra 1 bit discriminator at the start
   (0).  When a second header arrives, with the same flow ID as the
   first and its sequence number three higher, it can now be compressed
   in two possible ways, either using "compressed_format" or in the same
   way as previously, using "irregular_format".

   Note that we show all theoretically possible encodings of a header as
   defined by the ROHC-FN specification, separated by semicolons.
   Either of the above encodings for each header could be produced by a
   valid implementation, although a good implementation would always aim
   to pick the encoding which led to the best compression.  A good
   implementation would also take robustness into account and so
   probably wouldn't assume on the second packet that the decompressor
   had available the context necessary to decompress the shorter form of
   the packet.

   Finally, note that the fields whose encoding methods are specified in
   the uncompressed format have zero length when compressed.  This means
   their position in the compressed format is not significant.  In this
   case there is no need to notate them when defining the compressed
   formats.  In the next part of the example we will see that they have
   been removed from the compressed formats altogether.

B.7.  Variable Length Discriminators

   Suppose we do some analysis on flows of our example protocol and
   discover that whilst it is usual for successive packets to have the
   same flags, on the occasions when they don't, the packet is almost
   always a "flags set" packet in which all three of the abc flags are
   set.  To encode the flow more efficiently a format needs to be
   written to reflect this.

   This now gives a total of three formats, which means we need three
   discriminators to differentiate between them.  The obvious solution
   here is to increase the number of bits in the discriminator from one
   to two and for example use discriminators 00, 01, and 10.  However we
   can do slightly better than this.

   Any uniquely identifiable discriminator will suffice, so we can use
   00, 01 and 1.  If the discriminator starts with 1, that's the whole
   thing.  If it starts with 0 the decompressor knows it has to check
   one more bit to determine the kind of format.

   Note that care must be taken when using variable length
   discriminators.  For example, it would be erroneous to use 0, 01 and
   10 as discriminators since after reading an initial 0, the
   decompressor would have no way of knowing if the next bit was a
   second bit of discriminator, or the first bit of the next field in
   the format.  0, 10 and 11 however would be correct as the first bit
   again indicates whether or not there are further discriminator bits
   to follow.
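
   The rule described above amounts to requiring that no discriminator
   is a prefix of another.  The following Python sketch checks that
   property for the three candidate sets discussed in this section.  It
   is an illustration only: the helper names is_prefix_free and
   read_discriminator are hypothetical and do not come from ROHC-FN.

```python
from itertools import permutations


def is_prefix_free(discriminators):
    """Return True if no discriminator is a prefix of a different one."""
    return not any(b.startswith(a)
                   for a, b in permutations(discriminators, 2))


def read_discriminator(bits, discriminators):
    """Return the unique discriminator that starts the bit string.

    Returns None when zero or several discriminators match, which is
    exactly the ambiguity a prefix-free set rules out.
    """
    matches = [d for d in discriminators if bits.startswith(d)]
    return matches[0] if len(matches) == 1 else None


print(is_prefix_free(["00", "01", "1"]))   # True:  usable
print(is_prefix_free(["0", "01", "10"]))   # False: 0 is a prefix of 01
print(is_prefix_free(["0", "10", "11"]))   # True:  usable
```

   With the set 0, 01 and 10, read_discriminator("0110", ...) finds two
   matching discriminators and returns None, which mirrors the
   decompressor's inability to tell where the discriminator ends.
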
   This gives us the following:

     eg_header
     {
       UNCOMPRESSED {
         version_no     =:= uncompressed_value(2, 1) [ 2 ];
         type                                        [ 2 ];
         flow_id                                     [ 4 ];
         sequence_no                                 [ 4 ];
         abc_flag_bits                               [ 3 ];
         reserved_flag  =:= uncompressed_value(1, 0) [ 1 ];
       }

       COMPRESSED irregular_format {
         discriminator  =:= '00'                     [ 2 ];
         type           =:= irregular(2)             [ 2 ];
         flow_id        =:= irregular(4)             [ 4 ];
         sequence_no    =:= irregular(4)             [ 4 ];
         abc_flag_bits  =:= irregular(3)             [ 3 ];
       }

       COMPRESSED flags_set {
         discriminator  =:= '01'                     [ 2 ];
         type           =:= irregular(2)             [ 2 ];
         flow_id        =:= static                   [ 0 ];
         sequence_no    =:= lsb(2, -3)               [ 2 ];
         abc_flag_bits  =:= uncompressed_value(3, 7) [ 0 ];
       }

       COMPRESSED flags_static {
         discriminator  =:= '1'                      [ 1 ];
         type           =:= irregular(2)             [ 2 ];
         flow_id        =:= static                   [ 0 ];
         sequence_no    =:= lsb(2, -3)               [ 2 ];
         abc_flag_bits  =:= static                   [ 0 ];
       }
     }

   Here is some example output:

     Uncompressed header: 0101000100010000
     Compressed header:   000100010001000

     Uncompressed header: 0101000101000000
     Compressed header:   10100 ; 000100010100000

     Uncompressed header: 0110000101110000
     Compressed header:   11011 ; 001000010111000

     Uncompressed header: 0111000110101110
     Compressed header:   011110 ; 001100011010111

   Here we have a very similar sequence to last time, except that there
   is now an extra message on the end which has the flag bits set.  The
   encoding for the first message in the stream is now one bit larger,
   but the encoding for the next two messages is the same as before,
   since that format has not grown, thanks to the use of variable length
   discriminators.  Finally the packet that comes through with all the
   flag bits set can be encoded in just six bits, only one bit more than
   the most common format.
   Without the extra format, this last packet
   would have to be encoded using the longest format and would have
   taken up 14 bits.

B.8.  Default encoding

   Some of the common encoding methods used so far have been "factored
   out" into the definition of the uncompressed format meaning that they
   don't need to be defined for every compressed format.  However, there
   is still some redundancy in the notation.  For a number of fields,
   the same encoding method is used several times in different formats
   (though not necessarily in all of them), but the field encoding is
   redefined explicitly each time.  If the encoding for any of these
   fields changed in the future (e.g. if the reserved flag took on some
   new role), then every format which uses that encoding would have to
   be modified to reflect this change.

   This problem can be avoided by specifying default encoding methods
   for these fields.  Doing so can also lead to a more concisely notated
   profile:

     eg_header
     {
       UNCOMPRESSED {
         version_no     =:= uncompressed_value(2, 1) [ 2 ];
         type                                        [ 2 ];
         flow_id                                     [ 4 ];
         sequence_no                                 [ 4 ];
         abc_flag_bits                               [ 3 ];
         reserved_flag  =:= uncompressed_value(1, 0) [ 1 ];
       }

       DEFAULT {
         type           =:= irregular(2);
         flow_id        =:= static;
         sequence_no    =:= lsb(2, -3);
       }

       COMPRESSED irregular_format {
         discriminator  =:= '00'         [ 2 ];
         type                            [ 2 ]; // Uses default
         flow_id        =:= irregular(4) [ 4 ]; // Overrides default
         sequence_no    =:= irregular(4) [ 4 ]; // Overrides default
         abc_flag_bits  =:= irregular(3) [ 3 ];
       }

       COMPRESSED flags_set {
         discriminator  =:= '01'         [ 2 ];
         type                            [ 2 ]; // Uses default
         sequence_no                     [ 2 ]; // Uses default
         abc_flag_bits  =:= uncompressed_value(3, 7);
       }

       COMPRESSED flags_static {
         discriminator  =:= '1'          [ 1 ];
         type                            [ 2 ]; // Uses default
         sequence_no                     [ 2 ]; // Uses default
         abc_flag_bits  =:= static;
       }
     }

   The above profile behaves in exactly the same way as the one notated
   previously, since it has the same meaning.  Note that the purpose
   behind the different formats becomes clearer with the default
   encoding methods factored out: all that remains are the encodings
   which are specific to each format.  Note also that default encoding
   methods which compress down to zero bits have become completely
   implicit.  For example the compressed formats using the default
   encoding for "flow_id" don't mention it (the default is "static"
   encoding which compresses to zero bits).

B.9.  Control Fields

   One inefficiency in the compression scheme we have produced thus far
   is that it uses two bits to provide the LSB encoded sequence number
   with robustness for the loss of just one packet.  In theory only one
   bit should be needed.  The root of the problem is the unusual
   sequence number that the protocol uses -- it counts up in increments
   of three.  In order to encode it at maximum efficiency we need to
   translate this into a field that increments by one each time.  We do
   this using a control field.

   A control field is extra data that is communicated in the compressed
   format, but which is not a direct encoding of part of the
   uncompressed header.  Control fields can be used to communicate extra
   information in the compressed format, that allows other fields to be
   compressed more efficiently.

   The control field which we introduce scales the sequence number down
   by a factor of three.
   Instead of encoding the original sequence
   number in the compressed packet, we encode the scaled sequence
   number, allowing us to have robustness to the loss of one packet by
   using just one bit of LSB encoding:

     eg_header
     {
       UNCOMPRESSED {
         version_no     =:= uncompressed_value(2, 1) [ 2 ];
         type                                        [ 2 ];
         flow_id                                     [ 4 ];
         sequence_no                                 [ 4 ];
         abc_flag_bits                               [ 3 ];
         reserved_flag  =:= uncompressed_value(1, 0) [ 1 ];
       }

       CONTROL {
         // need modulo maths to calculate scaling correctly,
         // due to 4 bit wrap around
         scaled_seq_no  [ 4 ];
         ENFORCE(sequence_no.UVALUE
                    == (scaled_seq_no.UVALUE * 3) % 16);
       }

       DEFAULT {
         type           =:= irregular(2);
         flow_id        =:= static;
         scaled_seq_no  =:= lsb(1, -1);
       }

       COMPRESSED irregular_format {
         discriminator  =:= '00'         [ 2 ];
         type                            [ 2 ];
         flow_id        =:= irregular(4) [ 4 ];
         scaled_seq_no  =:= irregular(4) [ 4 ]; // Overrides default
         abc_flag_bits  =:= irregular(3) [ 3 ];
       }

       COMPRESSED flags_set {
         discriminator  =:= '01'         [ 2 ];
         type                            [ 2 ];
         scaled_seq_no                   [ 1 ]; // Uses default
         abc_flag_bits  =:= uncompressed_value(3, 7);
       }

       COMPRESSED flags_static {
         discriminator  =:= '1'          [ 1 ];
         type                            [ 2 ];
         scaled_seq_no                   [ 1 ]; // Uses default
         abc_flag_bits  =:= static;
       }
     }

   Normally, the encoding method(s) used to encode a field specify the
   length of the field.  In the above notation, since there is no
   encoding method using "sequence_no" directly, its length needs to be
   defined explicitly using an "ENFORCE" statement.  This is done using
   the abbreviated syntax, both for consistency and also for ease of
   readability.  Note that this is unusual: whereas the majority of
   field length indications are redundant (and thus optional), this one
   isn't.  If it were removed from the above notation, the length of the
   "sequence_no" field would be undefined.
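
   The scaling relation in the "CONTROL" block above can be explored
   numerically.  The Python sketch below is an illustration under stated
   assumptions, not a definition: the function names are hypothetical,
   the use of 11 as the inverse of 3 modulo 16 is our own shortcut for
   solving the "ENFORCE" relation, and lsb_decode assumes the usual ROHC
   interpretation interval for "lsb(1, -1)", namely the two values
   reference + 1 and reference + 2 (modulo 16).

```python
def scale(sequence_no):
    """Find scaled_seq_no with sequence_no == (scaled_seq_no * 3) % 16."""
    # 3 * 11 = 33 = 1 (mod 16), so 11 is the multiplicative inverse of 3.
    return (sequence_no * 11) % 16


def unscale(scaled_seq_no):
    """Apply the ENFORCE relation to recover the original sequence number."""
    return (scaled_seq_no * 3) % 16


def lsb_decode(received_bit, reference):
    """Decode one LSB of scaled_seq_no against the previous value.

    Assumption: lsb(1, -1) means the value lies in the interval
    [reference + 1, reference + 2], taken modulo 16.
    """
    for candidate in ((reference + 1) % 16, (reference + 2) % 16):
        if (candidate & 1) == received_bit:
            return candidate


# Sequence numbers 1, 4, 7, 10 from the example flow become 11, 12, 13, 14.
print([scale(n) for n in (1, 4, 7, 10)])  # prints [11, 12, 13, 14]
```

   Because the scaled value advances by one per packet, the two
   candidates in the interval always differ in their lowest bit, so a
   single transmitted bit still identifies the value when one packet has
   been lost: lsb_decode(1, 11) returns 13.
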
   Here is some example output:

     Uncompressed header: 0101000100010000
     Compressed header:   000100011011000

     Uncompressed header: 0101000101000000
     Compressed header:   1010 ; 000100011100000

     Uncompressed header: 0110000101110000
     Compressed header:   1101 ; 001000011101000

     Uncompressed header: 0111000110101110
     Compressed header:   01110 ; 001100011110111

   In this form, we see that this gives us a saving of a further bit in
   most packets.  Assuming the bulk of a flow is made up of
   "flags_static" headers, the mean size of the headers in a compressed
   flow is now just over a quarter of their size in an uncompressed
   flow.

B.10.  Use Of "ENFORCE" Statements As Conditionals

   Earlier, we created a new format "flags_set" to handle packets with
   all three of the flag bits set.  As it happens, these three flags are
   always all set for "type 3" packets, and are never all set for other
   packet types (a "type 3" packet is one where the type field is set to
   three).

   This allows extra efficiency in encoding such packets.  We know the
   type is three, so we don't need to encode the type field in the
   compressed header.  The type field was previously encoded as
   "irregular(2)" which is two bits long.  Removing this reduces the
   size of the "flags_set" format from five bits to three, making it the
   smallest format in the encoding method definition.

   In order to notate that the "flags_set" format should only be used
   for "type 3" headers, and the "flags_static" format only when the
   type isn't three, it is necessary to state these conditions inside
   each format.
   This can be done with an "ENFORCE" statement:

     eg_header
     {
       UNCOMPRESSED {
         version_no     =:= uncompressed_value(2, 1) [ 2 ];
         type                                        [ 2 ];
         flow_id                                     [ 4 ];
         sequence_no                                 [ 4 ];
         abc_flag_bits                               [ 3 ];
         reserved_flag  =:= uncompressed_value(1, 0) [ 1 ];
       }

       CONTROL {
         // need modulo maths to calculate scaling correctly,
         // due to 4 bit wrap around
         scaled_seq_no  [ 4 ];
         ENFORCE(sequence_no.UVALUE
                    == (scaled_seq_no.UVALUE * 3) % 16);
       }

       DEFAULT {
         type           =:= irregular(2);
         scaled_seq_no  =:= lsb(1, -1);
         flow_id        =:= static;
       }

       COMPRESSED irregular_format {
         discriminator  =:= '00'                     [ 2 ];
         type                                        [ 2 ];
         flow_id        =:= irregular(4)             [ 4 ];
         scaled_seq_no  =:= irregular(4)             [ 4 ];
         abc_flag_bits  =:= irregular(3)             [ 3 ];
       }

       COMPRESSED flags_set {
         ENFORCE(type.UVALUE == 3); // redundant condition
         discriminator  =:= '01'                     [ 2 ];
         type           =:= uncompressed_value(2, 3) [ 0 ];
         scaled_seq_no                               [ 1 ];
         abc_flag_bits  =:= uncompressed_value(3, 7) [ 0 ];
       }

       COMPRESSED flags_static {
         ENFORCE(type.UVALUE != 3);
         discriminator  =:= '1'                      [ 1 ];
         type                                        [ 2 ];
         scaled_seq_no                               [ 1 ];
         abc_flag_bits  =:= static                   [ 0 ];
       }
     }

   The two "ENFORCE" statements in the last two formats act as "guards".
   Guards prevent formats from being used under the wrong circumstances.
   In fact the "ENFORCE" statement in "flags_set" is redundant.  The
   condition it guards for is already enforced by the new encoding
   method used for the "type" field.  The encoding method
   "uncompressed_value(2,3)" binds the "UVALUE" attribute to three.
   This is exactly what the "ENFORCE" statement does, so it can be
   removed without any change in meaning.  The "uncompressed_value"
   encoding method on the other hand is not redundant.
   It specifies other bindings on the type field in addition to the one
   which the "ENFORCE" statement specifies.  Therefore it would not be
   possible to remove the encoding method and leave just the "ENFORCE"
   statement.

   Note that a guard is solely preventative.  A guard can never force a
   format to be chosen by the compressor.  A format can only be
   guaranteed to be chosen in a given situation if there are no other
   formats which can be used instead.  This is demonstrated in the
   example output below.  The compressor can still choose the
   "irregular" format if it wishes:

     Uncompressed header: 0101000100010000
     Compressed header:   000100011011000

     Uncompressed header: 0101000101000000
     Compressed header:   1010 ; 000100011100000

     Uncompressed header: 0110000101110000
     Compressed header:   1101 ; 001000011101000

     Uncompressed header: 0111000110101110
     Compressed header:   010 ; 001100011110111

   This saves just two extra bits (a 7% saving) in the example flow.

Authors' Addresses

   Robert Finking
   Siemens/Roke Manor
   Roke Manor Research Ltd.
   Romsey, Hampshire  SO51 0ZN
   UK

   Phone: +44 (0)1794 833189
   Email: robert.finking@roke.co.uk
   URI:   http://www.roke.co.uk

   Ghyslain Pelletier
   Ericsson
   Box 920
   Lulea  SE-971 28
   Sweden

   Phone: +46 (0) 8 404 29 43
   Email: ghyslain.pelletier@ericsson.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2006).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).