Robust Header Compression                                     R. Finking
Internet-Draft                                        Siemens/Roke Manor
Intended status: Standards Track                            G. Pelletier
Expires: May 5, 2007                                            Ericsson
                                                           November 2006

        Formal Notation for Robust Header Compression (ROHC-FN)
                   draft-ietf-rohc-formal-notation-13

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 5, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2006).

Abstract

   This document defines ROHC-FN (RObust Header Compression - Formal
   Notation): a formal notation to specify field encodings for
   compressed formats when defining new profiles within the ROHC
   framework.  ROHC-FN offers a library of encoding methods that are
   often used in ROHC profiles and can thereby help simplify future
   profile development work.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Overview of ROHC-FN
     3.1.  Scope of the Formal Notation
     3.2.  Fundamentals of the Formal Notation
       3.2.1.  Fields and Encodings
       3.2.2.  Formats and Encoding Methods
     3.3.  Example using IPv4
   4.  Normative Definition of ROHC-FN
     4.1.  Structure of a Specification
     4.2.  Identifiers
     4.3.  Constant Definitions
     4.4.  Fields
       4.4.1.  Attribute References
       4.4.2.  Representation of Field Values
     4.5.  Grouping of Fields
     4.6.  "THIS"
     4.7.  Expressions
       4.7.1.  Integer Literals
       4.7.2.  Integer Operators
       4.7.3.  Boolean Literals
       4.7.4.  Boolean Operators
       4.7.5.  Comparison Operators
     4.8.  Comments
     4.9.  "ENFORCE" Statements
     4.10. Formal Specification of Field Lengths
     4.11. Library of Encoding Methods
       4.11.1. uncompressed_value
       4.11.2. compressed_value
       4.11.3. irregular
       4.11.4. static
       4.11.5. lsb
       4.11.6. crc
     4.12. Definition of Encoding Methods
       4.12.1. Structure
       4.12.2. Arguments
       4.12.3. Multiple Formats
     4.13. Profile-specific Encoding Methods
   5.  Security considerations
   6.  IANA Considerations
   7.  Contributors
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Formal Syntax of ROHC-FN
   Appendix B.  Bit-level Worked Example
     B.1.  Example Packet Format
     B.2.  Initial Encoding
     B.3.  Basic Compression
     B.4.  Inter-packet compression
     B.5.  Specifying Initial Values
     B.6.  Multiple Packet Formats
     B.7.  Variable Length Discriminators
     B.8.  Default encoding
     B.9.  Control Fields
     B.10. Use Of "ENFORCE" Statements As Conditionals
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   ROHC-FN is a formal notation designed to help with the definition of
   ROHC [I-D.ietf-rohc-rfc3095bis-framework] header compression
   profiles.  Header compression profiles have so far been specified
   using a combination of English text together with ASCII Box
   notation.  Unfortunately, this approach was sometimes unclear and
   ambiguous, revealing the limitations of defining complex structures
   and encodings for compressed formats this way.  The primary objective
   of the Formal Notation is to provide a more rigorous means to define
   header formats -- compressed and uncompressed -- as well as the
   relationships between them.  No other formal notation exists that
   meets these requirements, so ROHC-FN aims to meet them.

   In addition, ROHC-FN offers a library of encoding methods that are
   often used in ROHC profiles, so that the specification of new
   profiles using the formal notation can be done without having to
   redefine this library from scratch.  Informally, an encoding method
   defines a two-way mapping between uncompressed data and compressed
   data.

2.  Terminology

   o  Compressed format

      A compressed format consists of a list of fields that provides
      bindings between encodings and the fields it compresses.  One or
      more compressed formats can be combined to represent an entire
      compressed header format.

   o  Context

      Context is information about the current (de)compression state of
      the flow.  Specifically, a context for a specific field can be
      either uninitialized, or it can include a set of one or more
      values for the field's attributes defined by the compression
      algorithm, where a value may come from the field's attributes
      corresponding to a previous packet.  See also a more generalized
      definition in Section 2.2 of [I-D.ietf-rohc-rfc3095bis-framework].

   o  Control field

      Control fields are transmitted from a ROHC compressor to a ROHC
      decompressor, but are not part of the uncompressed header itself.

   o  Encoding method, encodings

      Encoding methods are two-way relations that can be applied to
      compress and decompress fields of a protocol header.

   o  Field

      The protocol header is divided into a set of contiguous bit
      patterns known as fields.  Each field is defined by a collection
      of attributes that indicate its value and length in bits for both
      the compressed and uncompressed headers.  The way the header is
      divided into fields is specific to the definition of a profile,
      and it is not necessary for the field divisions to be identical to
      the ones given by the specification(s) for the protocol header
      being compressed.

   o  Library of encoding methods

      The library of encoding methods contains a number of commonly used
      encoding methods for compressing header fields.

   o  Profile

      A ROHC [I-D.ietf-rohc-rfc3095bis-framework] profile is a
      description of how to compress a certain protocol stack.  Each
      profile consists of a set of formats (e.g., uncompressed and
      compressed formats) along with a set of rules that control
      compressor and decompressor behaviour.

   o  ROHC-FN specification

      The specification of the set of formats of a ROHC profile using
      ROHC-FN.

   o  Uncompressed format

      An uncompressed format consists of a list of fields that provides
      the order of the fields to be compressed for a contiguous set of
      bits whose bit layout corresponds to the protocol header being
      compressed.

3.  Overview of ROHC-FN

   This section gives an overview of ROHC-FN.  It also explains how
   ROHC-FN can be used to specify the compression of header fields as
   part of a ROHC profile.

3.1.  Scope of the Formal Notation

   This section explains how the formal notation relates to the ROHC
   framework and to specifications of ROHC profiles.

   The ROHC framework [I-D.ietf-rohc-rfc3095bis-framework] provides the
   general principles for performing robust header compression.  It
   defines the concept of a profile, which makes ROHC a general platform
   for different compression schemes.  It sets link-layer requirements,
   and in particular negotiation requirements, for all ROHC profiles.
   It defines a set of common functions such as Context Identifiers
   (CIDs), padding and segmentation.  It also defines common formats
   (IR, IR-DYN, Feedback, Add-CID, etc.), and finally it defines a
   generic, profile-independent feedback mechanism.

   A ROHC profile is a description of how to compress a certain protocol
   stack.  For example, ROHC profiles are available for RTP/UDP/IP and
   many other protocol stacks.

   At a high level, each ROHC profile consists of a set of formats
   (defining the bits to be transmitted) along with a set of rules that
   control compressor and decompressor behaviour.  The purpose of the
   formats is to define how to compress and decompress headers.  The
   formats define one or more compressed versions of each uncompressed
   header, and simultaneously define the inverse: how to relate a
   compressed header back to the original uncompressed header.

   The set of formats will typically define compression of headers
   relative to a context of field values from previous headers in a
   flow, improving the overall compression by taking into account
   redundancies between headers of successive packets.  Therefore, in
   addition to defining the formats, a profile has to:

   o  specify how to manage the context, for both the compressor and the
      decompressor,
   o  define when and what to send in feedback messages, if any, from
      decompressor to compressor,
   o  outline compression principles to make the profile robust against
      bit errors and dropped packets.

   All this is needed to ensure that the compressor and decompressor
   contexts are kept consistent with each other, while still
   facilitating the best possible compression performance.

   ROHC-FN is designed to help in the specification of compressed
   formats that, when put together based on the profile definition, make
   up the formats used in a ROHC profile.  It offers a library of
   encoding methods for compressing fields, and a mechanism for
   combining these encoding methods to create compressed formats
   tailored to a specific protocol stack.

   The scope of ROHC-FN is limited to specifying the relationship
   between the compressed and uncompressed formats.  To form a complete
   profile specification, the control logic for the profile behaviour
   needs to be defined by other means.

3.2.  Fundamentals of the Formal Notation

   There are two fundamental elements to the formal notation:

   1.  Fields and their encodings, which define the mapping between a
       header's uncompressed and compressed forms.
   2.  Encoding methods, which define the way headers are broken down
       into fields.  Encoding methods define lists of uncompressed
       fields and the lists of compressed fields they map onto.

   These two fundamental elements are at the core of the notation and
   are outlined below.

3.2.1.  Fields and Encodings

   Headers are made up of fields.  For example, version number, header
   length and sequence number are all fields used in real protocols.

   Fields have attributes.  Attributes describe various things about the
   field, including the length of the field and where the field appears
   in the header.  For example:

     field.ULENGTH

   indicates the uncompressed length of the field.  A field is said to
   have a value attribute, i.e., a compressed value or an uncompressed
   value, if the corresponding length attribute is greater than zero.
   See Section 4.4 for more details on field attributes.
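   The rule above, that a value attribute exists only when the
   corresponding length attribute is greater than zero, can be modeled
   with a small Python sketch.  The class FieldAttrs and its lowercase
   attribute names are hypothetical: "ulength" stands for field.ULENGTH
   and "clength" for the compressed length attribute described in
   Section 4.4.  The example uses the checksum field of Section 3.3,
   which is 16 bits long uncompressed but sends 0 compressed bits.

```python
from dataclasses import dataclass


@dataclass
class FieldAttrs:
    """Hypothetical model of a field's length attributes, in bits."""
    name: str
    ulength: int  # uncompressed length (field.ULENGTH)
    clength: int  # compressed length (see Section 4.4)

    @property
    def has_uncompressed_value(self) -> bool:
        # A value attribute exists only if the matching length is > 0.
        return self.ulength > 0

    @property
    def has_compressed_value(self) -> bool:
        return self.clength > 0


# The inferred IPv4 checksum of Section 3.3: 16 bits uncompressed, 0 sent.
checksum = FieldAttrs("checksum", ulength=16, clength=0)
print(checksum.has_uncompressed_value, checksum.has_compressed_value)  # True False
```

   A control field such as ctrl_field_3 in Section 3.2.2 is the mirror
   case: it has no uncompressed value, so its ulength would be 0.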

   The relationship between the compressed and uncompressed attributes
   of a field is specified with encoding methods, using the following
   notation:

     field   =:=   encoding_method;

   In the field definition above, the symbol "=:=" means "is encoded
   by".  This field definition does not represent an assignment
   operation from the right-hand side to the left-hand side.  Instead,
   it is a two-way mapping between the compressed and uncompressed
   attributes of the field.  It represents both the compression and the
   decompression operation in a single field definition, through a
   process of two-way matching.

   Two-way matching is a binary operation that attempts to make the
   operands (i.e., the compressed and uncompressed attributes) the same.
   This is similar to the unification process in logic.  One operand
   represents an unspecified data object and the other a specified
   object; values can be matched from either operand.

   During compression, the uncompressed attributes of the field are
   already defined, and the given encoding method matches the compressed
   attributes against them.  During decompression, the compressed
   attributes of the field are already defined, so the uncompressed
   attributes are matched to the compressed attributes using the given
   encoding method.  Thus both compression and decompression are defined
   by a single field definition.

   Therefore, an encoding method (including any parameters specified)
   creates a reversible binding between the attributes of a field.  At
   the compressor, a format can be used if a set of bindings that is
   successful for all the attributes in all its fields can be found.  At
   the decompressor, the operation is reversed using the same bindings
   and the attributes in each field are filled according to the
   specified bindings; decoding fails if the binding for an attribute
   fails.

   For example, the "static" encoding method creates a binding between
   the attribute corresponding to the uncompressed value of the field
   and the attribute corresponding to the value of the field in the
   context.

   o  For the compressor, the "static" binding is successful when both
      the context value and the uncompressed value are the same.  If the
      two values differ, then the binding fails.
   o  For the decompressor, the "static" binding succeeds only if a
      valid context entry containing the value of the uncompressed field
      exists.  Otherwise, the binding will fail.
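   The two cases above, together with the rule that a format is usable
   only if every binding succeeds, can be sketched in Python as follows.
   All names are hypothetical, None is an assumed convention for a
   failed binding (and for a missing context entry), and the sketch
   assumes that a successful "static" binding sends no compressed bits
   (see Section 4.11.4).  It illustrates the behaviour described here
   and is not the library's normative definition.

```python
from typing import Iterable, Optional

FAIL = None  # hypothetical marker for a failed binding


def static_compress(uvalue: int, context_value: Optional[int]) -> Optional[str]:
    """Compressor side: succeed only if the value equals the context value."""
    if context_value is not None and uvalue == context_value:
        return ""  # binding succeeds; assumed to send zero compressed bits
    return FAIL


def static_decompress(context_value: Optional[int]) -> Optional[int]:
    """Decompressor side: succeed only if a valid context entry exists."""
    return context_value  # None models an uninitialized context entry


def format_usable(binding_results: Iterable[Optional[str]]) -> bool:
    """A format can be used only if every field binding succeeded."""
    return all(result is not FAIL for result in binding_results)


ok = static_compress(4, context_value=4)    # '' : binding succeeds
bad = static_compress(6, context_value=4)   # None: values differ
print(format_usable([ok]), format_usable([ok, bad]))  # True False
print(static_decompress(4), static_decompress(None))  # 4 None
```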

   Both the compressed and uncompressed forms of each field are
   represented as a string of bits, most significant bit first, of the
   length specified by the length attribute.  The bit string is the
   binary representation of the value attribute of the field, modulo
   "2^length", where "length" is the length attribute of the field.
   However, this is only the representation of the bits exchanged
   between the compressor and the decompressor, designed to allow
   maximum compression efficiency.  ROHC-FN itself uses the full range
   of integers.  See Section 4.4.2 for further details.
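   A minimal Python sketch of this mapping follows; the function names
   are illustrative only.  Decoding the four bits sent for -1 yields 15,
   which shows why the integer value a field takes in the notation has
   to be kept distinct from the bit string actually exchanged.

```python
def to_bits(value: int, length: int) -> str:
    """MSB-first bit string of `value` modulo 2**length (the transmitted form)."""
    if length == 0:
        return ""  # a zero-length field contributes no bits
    return format(value % (1 << length), f"0{length}b")


def from_bits(bits: str) -> int:
    """Read an MSB-first bit string back as a non-negative integer."""
    return int(bits, 2) if bits else 0


print(to_bits(5, 4))              # 0101
print(to_bits(20, 4))             # 0100  (20 mod 16 = 4)
print(to_bits(-1, 4))             # 1111  (-1 mod 16 = 15)
print(from_bits(to_bits(-1, 4)))  # 15, not -1
```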

3.2.2.  Formats and Encoding Methods

   ROHC-FN provides a library of commonly used encoding methods.
   Encoding methods can be defined using plain English, or using a
   formal definition consisting of, e.g., a collection of expressions
   (Section 4.7) and "ENFORCE" statements (Section 4.9).

   ROHC-FN also provides mechanisms for combining fields and their
   encoding methods into higher-level encoding methods following a
   well-defined structure.  This is similar to the definition of
   functions and procedures in an ordinary programming language, and it
   allows complexity to be handled by breaking it down into manageable
   parts.  New encoding methods are defined at the top level of a
   profile.  These can then be used in the definition of other
   higher-level encoding methods, and so on.

   new_encoding_method         // This block is an encoding method
   {
     UNCOMPRESSED {            // This block is an uncompressed format
       field_1   [ 16 ];
       field_2   [ 32 ];
       field_3   [ 48 ];
     }

     CONTROL {                 // This block defines control fields
       ctrl_field_1;
       ctrl_field_2;
     }

     DEFAULT {                 // This block defines default encodings
                               // for specified fields
       ctrl_field_2 =:= encoding_method_2;
       field_1      =:= encoding_method_1;
     }

     COMPRESSED format_0 {     // This block is a compressed format
       field_1;
       field_2      =:= encoding_method_2;
       field_3      =:= encoding_method_3;
       ctrl_field_1 =:= encoding_method_4;
       ctrl_field_2;
     }

     COMPRESSED format_1 {     // This block is a compressed format
       field_1;
       field_2      =:= encoding_method_3;
       field_3      =:= encoding_method_4;
       ctrl_field_2 =:= encoding_method_5;
       ctrl_field_3 =:= encoding_method_6; // This is a control field
                                           // with no uncompressed value
     }
   }

   In the example above, the encoding method being defined is called
   "new_encoding_method".  The section headed "UNCOMPRESSED" indicates
   the order of fields in the uncompressed header, i.e., the
   uncompressed header format.  The number of bits in each of the fields
   is indicated in square brackets.  After this is another section,
   "CONTROL", which defines two control fields.  Following this is the
   "DEFAULT" section, which defines default encoding methods for two of
   the fields (see below).  Finally, two alternative compressed formats
   follow, each defined in a section headed "COMPRESSED".  The fields
   that occur in the compressed formats are either:

   o  fields that occur in the uncompressed format; or
   o  control fields that have an uncompressed value and that occur in
      the CONTROL section; or
   o  control fields that do not have an uncompressed value and are
      thus defined as part of the compressed format.

   Central to each of these formats is a "field list", which defines the
   fields contained in the format and also the order in which those
   fields appear in that format.  For the "DEFAULT" and "CONTROL"
   sections, the field order is not significant.

   In addition to specifying field order, the field list may also
   specify bindings for any or all of the fields it contains.  Fields
   that have no bindings defined for them are bound using the default
   bindings specified in the "DEFAULT" section (see Section 4.12.1.5).
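   The fallback rule just described can be approximated as a simple
   lookup.  In the Python sketch below, the field lists of
   "new_encoding_method" are transcribed into dictionaries and lists (a
   layout assumed for illustration), encodings are treated as opaque
   names, and a field without an explicit binding takes the one given in
   "DEFAULT".  The sketch ignores the finer rules of Section 4.12.1.5,
   so it should be read as an approximation only.

```python
# Field lists transcribed from "new_encoding_method"; None = no binding given.
DEFAULT = {
    "ctrl_field_2": "encoding_method_2",
    "field_1": "encoding_method_1",
}

FORMATS = {
    "format_0": [
        ("field_1", None),
        ("field_2", "encoding_method_2"),
        ("field_3", "encoding_method_3"),
        ("ctrl_field_1", "encoding_method_4"),
        ("ctrl_field_2", None),
    ],
    "format_1": [
        ("field_1", None),
        ("field_2", "encoding_method_3"),
        ("field_3", "encoding_method_4"),
        ("ctrl_field_2", "encoding_method_5"),
        ("ctrl_field_3", "encoding_method_6"),
    ],
}


def resolve_bindings(field_list, defaults):
    """Return (field, encoding) pairs in format order, using DEFAULT as fallback."""
    resolved = []
    for name, encoding in field_list:
        if encoding is None:
            if name not in defaults:
                raise KeyError(f"no binding for {name} in format or DEFAULT")
            encoding = defaults[name]
        resolved.append((name, encoding))
    return resolved


for fmt_name, fields in FORMATS.items():
    print(fmt_name, resolve_bindings(fields, DEFAULT))
```

   For format_0 this gives field_1 and ctrl_field_2 their "DEFAULT"
   encodings, while in format_1 the explicit binding of ctrl_field_2 to
   encoding_method_5 takes precedence over the default.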

   Fields from the compressed format have the same name as they do in
   the uncompressed format.  If there are any fields that are present
   exclusively in the compressed format but that do have an
   uncompressed value, they must be declared in the "CONTROL" section of
   the definition of the encoding method (see Section 4.12.1.3 for more
   details on defining control fields).

   Fields that have no uncompressed value do not appear in an
   "UNCOMPRESSED" field list and do not have to appear in the "CONTROL"
   field list either.  Instead, they are only declared in the compressed
   field lists where they are used.

   In the example above, all the fields that appear in the compressed
   format are also found in the uncompressed format, or the control
   field list, except for ctrl_field_3; this is possible because
   ctrl_field_3 has no "uncompressed" value at all.  Fields such as a
   checksum on the compressed information fall into this category.

3.3.  Example using IPv4

   This section gives an overview of how the notation is used by means
   of an example.  The example will develop the formal notation for an
   encoding method capable of compressing a single, well-known header:
   the IPv4 header [RFC791].

   The first step is to specify the overall structure of the IPv4
   header.  To do this, we use an encoding method that we will call
   "ipv4_header".  More details on definitions of encoding methods can
   be found in Section 4.12.  This is notated as follows:

     ipv4_header
     {

   The fragment of notation above defines the encoding method
   "ipv4_header", the definition of which follows the opening brace (see
   Section 4.12).

   Definitions within the pair of braces are local to "ipv4_header".
   This scoping mechanism helps to clarify which fields belong to which
   formats; it is also useful when compressing complex protocol stacks
   with several headers, often with the same field names occurring in
   multiple formats (see Section 4.2).

   The next step is to specify the fields contained in the uncompressed
   IPv4 header to represent the uncompressed format for which the
   encoding method will define one or more compressed formats.  This is
   accomplished using ROHC-FN as follows:

       UNCOMPRESSED {
         version         [  4 ];
         header_length   [  4 ];
         tos             [  6 ];
         ecn             [  2 ];
         length          [ 16 ];
         id              [ 16 ];
         reserved        [  1 ];
         dont_frag       [  1 ];
         more_fragments  [  1 ];
         offset          [ 13 ];
         ttl             [  8 ];
         protocol        [  8 ];
         checksum        [ 16 ];
         src_addr        [ 32 ];
         dest_addr       [ 32 ];
       }

   The width of each field is indicated in square brackets.  This part
   of the notation is used in the example for illustration, to help the
   reader's understanding.  However, indicating the field lengths in
   this way is optional, since the width of each field can normally also
   be derived from the encoding that is used to compress/decompress it,
   for a specific format.  This part of the notation is formally defined
   in Section 4.10.
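   As an illustration only, the following Python sketch uses the names
   and widths of the UNCOMPRESSED field list above to split a raw
   20-byte IPv4 header (no options) into fields, most significant bit
   first.  The helper name split_fields and the example header bytes are
   assumptions chosen for the sketch.  The 15 widths add up to 160 bits,
   i.e. 20 bytes, which is consistent with a header_length of five
   32-bit words.

```python
# (name, width in bits), in the order of the UNCOMPRESSED field list.
IPV4_UNCOMPRESSED = [
    ("version", 4), ("header_length", 4), ("tos", 6), ("ecn", 2),
    ("length", 16), ("id", 16), ("reserved", 1), ("dont_frag", 1),
    ("more_fragments", 1), ("offset", 13), ("ttl", 8), ("protocol", 8),
    ("checksum", 16), ("src_addr", 32), ("dest_addr", 32),
]


def split_fields(header: bytes, layout) -> dict:
    """Split `header` into the named fields of `layout`, MSB first."""
    total_bits = sum(width for _, width in layout)
    if len(header) * 8 != total_bits:
        raise ValueError(f"expected {total_bits} bits, got {len(header) * 8}")
    as_int = int.from_bytes(header, "big")
    fields, remaining = {}, total_bits
    for name, width in layout:
        remaining -= width
        fields[name] = (as_int >> remaining) & ((1 << width) - 1)
    return fields


# Example 20-byte header without options (bytes chosen for illustration).
hdr = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
f = split_fields(hdr, IPV4_UNCOMPRESSED)
print(f["version"], f["header_length"], f["dont_frag"], f["ttl"], f["protocol"])
# 4 5 1 64 17
```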

   The next step is to specify the compressed format.  This includes the
   encodings for each field which map between the compressed and
   uncompressed forms of the field.  In the example, these encoding
   methods are mainly taken from the ROHC-FN library (see Section 4.11).
   Since the intention here is to illustrate the use of the notation,
   rather than to describe the optimum method of compressing IPv4
   headers, this example uses only three encoding methods.

   The "uncompressed_value" encoding method (defined in Section 4.11.1)
   can compress any field whose uncompressed length and value are fixed,
   or can be calculated using an expression.  No compressed bits need to
   be sent because the uncompressed field can be reconstructed using its
   known size and value.  The "uncompressed_value" encoding method is
   used to compress five fields in the IPv4 header, as described below:

       COMPRESSED {
         header_length  =:= uncompressed_value(4, 5);
         version        =:= uncompressed_value(4, 4);
         reserved       =:= uncompressed_value(1, 0);
         offset         =:= uncompressed_value(13, 0);
         more_fragments =:= uncompressed_value(1, 0);

   The first parameter indicates the length of the uncompressed field in
   bits, and the second parameter gives its integer value.
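   Taking header_length as an example, the behaviour of
   "uncompressed_value" can be sketched in Python.  The function names,
   and the use of an empty string for "no compressed bits" and None for
   a failed binding, are assumptions made for illustration.  A header
   carrying IPv4 options would have a header_length other than 5, so
   this binding, and with it this compressed format, would fail for such
   a header.

```python
from typing import Optional


def uncompressed_value_compress(length: int, value: int,
                                field_ulength: int, field_uvalue: int) -> Optional[str]:
    """Compressor: succeed with zero compressed bits iff the field matches exactly."""
    if field_ulength == length and field_uvalue == value:
        return ""   # binding succeeds; nothing is transmitted
    return None     # binding fails; this format cannot be used


def uncompressed_value_decompress(length: int, value: int) -> tuple:
    """Decompressor: rebuild (ULENGTH, UVALUE) from the parameters alone."""
    return length, value


# header_length =:= uncompressed_value(4, 5);
assert uncompressed_value_compress(4, 5, field_ulength=4, field_uvalue=5) == ""
assert uncompressed_value_compress(4, 5, field_ulength=4, field_uvalue=6) is None
assert uncompressed_value_decompress(4, 5) == (4, 5)
```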

526	   Note that the order of the fields in the compressed format is
527	   independent of the order of the fields in the uncompressed format.

529	   The "irregular" encoding method (defined in Section 4.11.3) can be
530	   used to encode any field for which both uncompressed attributes
531	   (ULENGTH and UVALUE) are defined, and whose ULENGTH attribute is
532	   either fixed or it can be calculated using an expression.  It is a
533	   fail-safe encoding method that can be used for such fields in the
534	   case where no other encoding method applies.  All of the bits in the
535	   uncompressed form of the field are present in the compressed form as
536	   well; hence this encoding does not achieve any compression.

538	         src_addr       =:= irregular(32);
539	         dest_addr      =:= irregular(32);
540	         length         =:= irregular(16);
541	         id             =:= irregular(16);
542	         ttl            =:= irregular(8);
543	         protocol       =:= irregular(8);
544	         tos            =:= irregular(6);
545	         ecn            =:= irregular(2);
546	         dont_frag      =:= irregular(1);

548	   Finally, the third encoding method is specific only to the
549	   uncompressed format defined above for the IPv4 header,
550	   "inferred_ip_v4_header_checksum":

552	         checksum       =:= inferred_ip_v4_header_checksum [ 0 ];
553	       }
554	     }

556	   The "inferred_ip_v4_header_checksum" encoding method is different
557	   from the other two encoding methods in that it is not defined in the
558	   ROHC-FN library of encoding methods.  Its definition could be given
559	   either using the formal notation as part of the profile definition
560	   itself (see Section 4.12) or using plain English text (see
561	   Section 4.13).

   In our example, "inferred_ip_v4_header_checksum" is a specific
   encoding method that calculates the IP checksum from the rest of the
   header values.  As with the "uncompressed_value" encoding method, no
   compressed bits need to be sent, since the field value can be
   reconstructed at the decompressor.  This is notated explicitly by
   specifying, in square brackets, a length of 0 for the checksum field
   in the compressed format.  Again, this notation is optional, since
   the encoding method itself would be defined as sending zero
   compressed bits; however, it is useful to the reader to include such
   notation (see Section 4.10 for details on this part of the notation).
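   To make the checksum inference concrete, the following Python
   sketch assumes that "inferred_ip_v4_header_checksum" computes the
   standard IPv4 header checksum (RFC 791, computed as described in
   RFC 1071).  That checksum is the ones' complement of the ones'
   complement sum of the header's 16-bit words, taken with the checksum
   field set to zero.  The function name is hypothetical and is not
   part of ROHC-FN.

```python
def inferred_ipv4_header_checksum(header: bytes) -> int:
    """Recompute an IPv4 header checksum (RFC 1071 Internet checksum).

    `header` is the whole uncompressed IPv4 header, with the checksum
    field (octets 10 and 11) set to zero.  Returns the 16-bit value.
    """
    if len(header) % 2:
        raise ValueError("header must be a whole number of 16-bit words")
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]  # 16-bit word, MSB first
    while total >> 16:                              # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF                          # ones' complement


# 20-octet IPv4 header with the checksum field zeroed.
hdr = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(inferred_ipv4_header_checksum(hdr)))  # 0xb861
```

   Summing a header that already carries a correct checksum in the same
   way yields zero, which gives a simple consistency check on the
   reconstructed field.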

   Finally, the definition of the format is terminated with a closing
575	   brace.  At this point, the above example has defined a compressed
576	   format that can be used to represent the entire compressed IPv4
577	   header, and provided enough information to allow an implementation to
578	   construct the compressed format from an uncompressed format
579	   (compression) and vice versa (decompression).

581	4.  Normative Definition of ROHC-FN

583	   This section gives the normative definition of ROHC-FN.  ROHC-FN is a
584	   declarative language that is referentially transparent, with no side
585	   effects.  This means that whenever an expression is evaluated, there
586	   are no other effects from obtaining the value of the expression; the
587	   same expression is thus guaranteed to have the same value wherever it
588	   appears in the notation, and it can always be interchanged with its
589	   value in any of the formats it appears in (subject to the scope rules
590	   of identifiers of Section 4.2).

592	   The formal notation describes the structure of the formats and the
593	   relationships between their uncompressed and compressed forms, rather
   than describing how compression and decompression are performed.

596	   In various places within this section, text inside angle brackets has
597	   been used as a descriptive placeholder.  The use of angle brackets in
598	   this way is solely for the benefit of the reader of this draft.
599	   Neither the angle brackets nor their contents form a part of the
600	   notation.

602	4.1.  Structure of a Specification

   The specification of the compressed formats of a ROHC profile using
   ROHC-FN is called a ROHC-FN specification.  ROHC-FN specifications
   are case sensitive and are written in the 7-bit ASCII character set
   (as defined in [RFC2822]).  A specification consists of a sequence
   of zero or more constant definitions (Section 4.3), an optional
   global control field list (Section 4.12.1.3), and one or more
   encoding method definitions (Section 4.12).

612	   Encoding methods can be defined using the formal notation or can be
613	   predefined encoding methods.

615	   Encoding methods are defined using the formal notation by giving one
616	   or more uncompressed formats to represent the uncompressed header and
617	   one or more compressed formats.  These formats are related to each
618	   other by "fields", each of which describes a certain part of an
   uncompressed and/or a compressed header.  In addition to the formats,
   each encoding method may contain control field and default field
   encoding sections.  The attributes of a field are bound by using an
622	   encoding method for it and/or by using "ENFORCE" statements
623	   (Section 4.9) within the formats.  Each of these is terminated by a
624	   semi-colon.

626	   Predefined encoding methods are not defined in the formal notation.
627	   Instead they are defined by giving a short textual reference
628	   explaining where the encoding method is defined.  It is not necessary
629	   to define the library of encoding methods contained in this document
   in this way; their definition is implicit in the use of the formal
   notation.

633	4.2.  Identifiers

   In ROHC-FN, identifiers are used for any of the following:

637	   o  encoding methods
638	   o  formats
639	   o  fields
640	   o  parameters
641	   o  constants

643	   All identifiers may be of any length and may contain any combination
644	   of alphanumeric characters and underscores, within the restrictions
645	   defined in this section.

647	   All identifiers must start with an alphabetic character.

649	   It is illegal to have two or more identifiers that differ from each
650	   other only in capitalisation, in the same scope.

652	   All letters in identifiers for constants must be upper case.

654	   It is illegal to use any of the following as identifiers (including
655	   alternative capitalisations):

657	   o  "false", "true"
658	   o  "ENFORCE", "THIS", "VARIABLE"
659	   o  "ULENGTH", "UVALUE"
660	   o  "CLENGTH", "CVALUE"
661	   o  "UNCOMPRESSED", "COMPRESSED", "CONTROL", "INITIAL" or "DEFAULT"

   Format names cannot be referred to in the notation, although they
   are considered to be identifiers.  See Section 4.12.3.1 for more
   details on format names.

667	   All identifiers used in ROHC-FN have a "scope".  The scope of an
668	   identifier defines the parts of the specification where that
669	   identifier applies and from which it can be referred to.  If an
670	   identifier has "global" scope, then it applies throughout the
671	   specification which contains it and can be referred to from anywhere
   within it.  If an identifier has "local" scope, then it only applies
   to the encoding method in which it is defined and cannot be
   referenced from outside the local scope of that encoding method.  If
675	   an identifier has local scope, that identifier can therefore be used
676	   in multiple different local scopes to refer to different items.

678	   All instances of an identifier within its scope refer to the same
679	   item.  It is not possible to have different items referred to by a
680	   single identifier within any given scope.  For this reason, if there
   is an identifier which has global scope, it cannot be used separately
682	   in a local scope, since a globally scoped identifier is already
683	   applicable in all local scopes.

685	   The identifiers for each encoding method and each constant all have
686	   global scope.  Each format and field also has an identifier.  The
687	   scope of format and field identifiers is local, with the exception of
   global control fields, which have global scope.  Therefore, it is
689	   illegal for a format or field to have the same identifier as another
690	   format or field within the same scope, or as an encoding method or a
691	   constant (since they have global scope).

693	   Note that although format names (see Section 4.12.3.1) are considered
694	   to be identifiers, they are not referred to in the notation, but are
695	   primarily for the benefit of the reader.

697	4.3.  Constant Definitions

699	   Constant values can be defined using the "=" operator.  Identifiers
700	   for constants must be all upper case.  For example:

702	      SOME_CONSTANT = 3;

704	   Constants are defined by an expression (see Section 4.7) on the right
705	   hand side of the "=" operator.  The expression must yield a constant
706	   value.  That is, the expression must be one whose terms are all
707	   either constants or literals and must not vary depending on the
708	   header being compressed.

710	   Constants have global scope.  Constants must be defined at the top
711	   level, outside any encoding method definition.  Constants are
712	   entirely equivalent to the value they refer to, and are completely
713	   interchangeable with that value.  Unlike field attributes, which may
714	   change from packet to packet, constants have the same value for all
715	   packets.

717	4.4.  Fields

719	   Fields are the basic building blocks of a ROHC-FN specification.
720	   Fields are the units into which headers are divided.  Each field may
721	   have two forms: a compressed form and an uncompressed form.  Both
722	   forms are represented as bits exchanged between the compressor and
723	   the decompressor in the same way, as an unsigned string of bits, most
724	   significant bit first.

726	   The properties of the compressed form of a field are defined by an
727	   encoding method and/or "ENFORCE" statements.  This entirely
728	   characterises the relationship between the uncompressed and
729	   compressed forms of that field.  This is achieved by specifying the
730	   relationships between the field's attributes.

732	   The notation defines four field attributes, two for the uncompressed
733	   form and a corresponding two for the compressed form.  The attributes
734	   available for each field are:

736	   uncompressed attributes of a field:
737	   o  "UVALUE" and "ULENGTH",

739	   compressed attributes of a field:
740	   o  "CVALUE" and "CLENGTH".

742	   The two value attributes contain the respective numerical values of
743	   the field, i.e.  "UVALUE" gives the numerical value of the
744	   uncompressed form of the field, and the attribute "CVALUE" gives the
745	   numerical value of the compressed form of the field.  The numerical
746	   values are derived by interpreting the bit string representations of
   the field as unsigned binary integers, most significant bit first.

749	   The two length attributes indicate the length in bits of the
750	   associated bit string; "ULENGTH" for the uncompressed form, and
751	   "CLENGTH" for the compressed form.

   Attributes are undefined unless they are bound to a value, in which
   case they become defined.  If two conflicting bindings are given for
   a field attribute, then the bindings fail, along with the
   (combination of) formats in which those bindings were defined.

758	   Uncompressed attributes do not always reflect an aspect of the
759	   uncompressed header.  Some fields do not originate from the
760	   uncompressed header, but are control fields.

762	4.4.1.  Attribute References

764	   Attributes of a particular field are formally referred to by using
765	   the field's name followed by a "." and the attribute's identifier.

767	   For example:

769	     rtp_seq_number.UVALUE

771	   gives the uncompressed value of the rtp_seq_number field.  The
772	   primary reason for referencing attributes is for use in expressions,
773	   which are explained in Section 4.7.

775	4.4.2.  Representation of Field Values

777	   Fields are represented as bit strings.  The bit string is calculated
778	   using the value attribute ("val") and the length attribute ("len").
779	   The bit string is the binary representation of "val % (2 ^ len)".

   For example, if a field's "CLENGTH" attribute was 8 and its "CVALUE"
782	   attribute was -1, the compressed representation of the field would be
783	   "-1 % (2 ^ 8)", which equals "-1 % 256", which equals 255, 11111111
784	   in binary.
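   The representation rule can be checked with a short Python sketch.
   The helper "field_bits" is hypothetical.  It relies on Python's "%"
   operator, which, like the ROHC-FN "%" operator (Section 4.7.2),
   returns a non-negative result for a positive modulus.

```python
def field_bits(val: int, length: int) -> str:
    """Bits sent for a field: binary form of val % (2 ^ len), MSB first."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if length == 0:
        return ""  # a zero-length field contributes no bits
    # Python's % is non-negative for a positive modulus, as in ROHC-FN.
    return format(val % (2 ** length), f"0{length}b")


print(field_bits(-1, 8))   # 11111111
print(field_bits(300, 8))  # 00101100 (300 % 256 == 44)
```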

786	   ROHC-FN supports the full range of integers for use in expressions
787	   (see Section 4.7), but the representation of the formats (i.e. the
788	   bits exchanged between the compressor and the decompressor) is in the
789	   above form.

791	4.5.  Grouping of Fields

793	   Since the order of fields in a "COMPRESSED" field list
   (Section 4.12.1.2) does not have to be the same as the order of fields
795	   in an "UNCOMPRESSED" field list (Section 4.12.1.1), it is possible to
796	   group together any number of fields which are contiguous in a
797	   "COMPRESSED" format, to allow them all to be encoded using a single
798	   encoding method.  The group of fields is specified immediately to the
799	   left of "=:=" in place of a single field name.

801	   The group is notated by giving a colon separated list of the fields
   to be grouped together.  For example, there may be two non-contiguous
803	   fields in an uncompressed header which are two halves of what is
804	   effectively a single sequence number:

806	     grouping_example
807	     {
808	       UNCOMPRESSED {
809	         minor_seq_num;  // 12 bits
810	         other_field;    //  8 bits
811	         major_seq_num;  //  4 bits
812	       }

814	       COMPRESSED {
815	         other_field     =:= irregular(8);
816	         major_seq_num
817	         : minor_seq_num =:= lsb(3, 0);
818	       }
819	     }

821	   The group of fields is presented to the encoding method as a
822	   contiguous group of bits, assembled by the concatenation of the
823	   fields in the order they are given in the group.  The most
824	   significant bit of the combined field is the most significant bit of
825	   the first field in the list, and the least significant bit of the
826	   combined field is the least significant bit of the last field in the
827	   list.

829	   Finally, the length attributes of the combined field are equal to the
830	   sum of the corresponding length attributes for all the fields in the
831	   group.
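   The assembly of a group can be sketched in Python as follows,
   assuming each field is given as a hypothetical (value, length) pair
   of its uncompressed attributes.  Each value is first reduced modulo
   2 ^ length, as in Section 4.4.2.

```python
def group_fields(fields):
    """Concatenate (value, length) pairs into one grouped field.

    The first pair supplies the most significant bits and the last pair
    the least significant bits; the grouped length is the sum of the
    individual lengths.
    """
    value, length = 0, 0
    for field_value, field_length in fields:
        value = (value << field_length) | (field_value % (1 << field_length))
        length += field_length
    return value, length


# major_seq_num (4 bits) : minor_seq_num (12 bits), as in the example
value, length = group_fields([(0x3, 4), (0xABC, 12)])
print(hex(value), length)  # 0x3abc 16
```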

833	4.6.  "THIS"

835	   Within the definition of an encoding method it is possible to refer
836	   to the field (i.e. the group of contiguous bits) the method is
837	   encoding, using the keyword "THIS".

839	   This is useful for gaining access to the attributes of the field
   being encoded.  For example, it is often useful to know the total
841	   uncompressed length of the uncompressed format which is being
842	   encoded:

844	       THIS.ULENGTH

846	4.7.  Expressions

848	   ROHC-FN includes the usual infix style of expressions, with
849	   parentheses "(" and ")" used for grouping.  Expressions can be made
850	   up of any of the components described in the following subsections.

852	   The semantics of expressions are generally similar to the expressions
853	   in the ANSI-C programming language [C90].  The definitive list of
854	   expressions in ROHC-FN follows in the next subsections; the list
855	   below provides some examples of the difference between expressions in
856	   ANSI-C and expressions in ROHC-FN:

   o  There is no limit on the range of integers.
   o  "x ^ y" evaluates to x raised to the power of y.  This has a
      precedence higher than *, / and %, but lower than unary -, and is
      right-to-left associative.
   o  There is no comma operator.
   o  There are no "modify" operators (no assignment operators and no
      increment or decrement operators).
   o  There are no bitwise operators.

867	   Expressions may refer to any of the attributes of a field (as
868	   described in Section 4.4), to any defined constant (see Section 4.3)
869	   and also to encoding method parameters, if any are in scope (see
870	   Section 4.12).

872	   If any of the attributes, constants or parameters used in the
873	   expression are undefined, the value of the expression is undefined.
874	   Undefined expressions cause the environment (e.g. the compressed
875	   format) in which they are used to fail if a defined value is
876	   required.  Defined values are required for all compressed attributes
   of fields which appear in the compressed format.  However, defined
   values are not necessarily required for every uncompressed attribute
   of the fields which appear in the uncompressed format.  It is up to
   the profile creator to define what happens to the unbound field
   attributes in this case.  It should be noted that in such a case,
   transparency of the compression process will be lost, i.e. it will
   not be possible for the decompressor to reproduce the original
   header.

885	   Expressions cannot be used as encoding methods directly because they
886	   do not completely characterise a field.  Expressions only specify a
887	   single value whereas a field is made up of several values: its
888	   attributes.  For example, the following is illegal:

890	      tcp_list_length =:= (data_offset + 20) / 4;

892	   There is only enough information here to define a single attribute of
893	   "tcp_list_length".  Although this makes no sense formally, this could
894	   intuitively be read as defining the "UVALUE" attribute.  However,
895	   that would still leave the length of the uncompressed field undefined
896	   at the decompressor.  Such usage is therefore prohibited.

898	4.7.1.  Integer Literals

900	   Integers can be expressed as decimal values, binary values (prefixed
901	   by "0b"), or hexadecimal values (prefixed by "0x").  Negative
902	   integers are prefixed by a "-" sign.  For example "10", "0b1010" and
903	   "-0x0a" are all valid integer literals, having the values ten, ten
904	   and minus ten respectively.
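   A hypothetical Python parser for the three literal forms is shown
   below.  As an illustrative assumption, it accepts only the lowercase
   "0b" and "0x" prefixes given above and rejects anything else.

```python
def parse_integer_literal(text: str) -> int:
    """Parse a decimal, 0b-binary or 0x-hex literal with optional '-'."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if body.startswith("0b"):
        base, digits = 2, body[2:]
    elif body.startswith("0x"):
        base, digits = 16, body[2:]
    else:
        base, digits = 10, body
    # Reject input that int() would tolerate but the literal forms do
    # not: empty digits, '+', '_', whitespace, non-ASCII digits.
    if not (digits.isascii() and digits.isalnum()):
        raise ValueError("not an integer literal: " + repr(text))
    value = int(digits, base)
    return -value if negative else value


for lit in ("10", "0b1010", "-0x0a"):
    print(lit, parse_integer_literal(lit))  # 10, 10 and -10
```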

906	4.7.2.  Integer Operators

908	   The following "integer" operators are available, which take integer
909	   arguments and return an integer result:

911	   o  ^, for exponentiation. "x ^ y" returns the value of "x" to the
912	      power of "y".
913	   o  *, / for multiplication and division. "x * y" returns the product
914	      of "x" and "y". "x / y" returns the quotient, rounded down to the
915	      next integer (the next one towards negative infinity).
916	   o  +, - for addition and subtraction. "x + y" returns the sum of "x"
917	      and "y". "x - y" returns the difference.
918	   o  % for modulo. "x % y" returns "x" modulo "y"; x - y * (x / y).
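   Because "/" rounds towards negative infinity and "%" is defined in
   terms of it, results differ from C99 (where "-7 / 2" is -3) whenever
   an operand is negative.  The Python helpers below are hypothetical
   and simply mirror the definitions; Python's "//" and "%" happen to
   follow the same floor convention.

```python
def fn_div(x: int, y: int) -> int:
    """ROHC-FN "x / y": quotient rounded towards negative infinity."""
    return x // y  # Python's // floors, matching the definition


def fn_mod(x: int, y: int) -> int:
    """ROHC-FN "x % y", defined as x - y * (x / y)."""
    return x - y * fn_div(x, y)


print(fn_div(-7, 2), fn_mod(-7, 2))  # -4 1
print(fn_div(7, -2), fn_mod(7, -2))  # -4 -1
# Caution: ROHC-FN's "^" binds less tightly than unary minus, so
# "-2 ^ 2" is 4, whereas Python's -2 ** 2 is -4; write (-2) ** 2.
print((-2) ** 2, 2 ** 3 ** 2)        # 4 512 (both right-associative)
```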

920	4.7.3.  Boolean Literals

   The boolean literals are "false" and "true".

924	4.7.4.  Boolean Operators

926	   The following "boolean" operators are available, which take boolean
927	   arguments and return a boolean result:

   o  &&, for logical "and".  Returns true if both arguments are true.
      Returns false otherwise.
   o  ||, for logical "or".  Returns true if at least one argument is
      true.  Returns false otherwise.
   o  !, for logical "not".  Returns true if its argument is false.
      Returns false otherwise.

937	4.7.5.  Comparison Operators

939	   The following "comparison" operators are available, which take
940	   integer arguments and return a boolean result:

942	   o  ==, !=, for equality and its negative. "x == y" returns true if x
943	      is equal to y.  Returns false otherwise. "x != y" returns true if
944	      x is not equal to y.  Returns false otherwise.
945	   o  <, >, for less than and greater than. "x < y" returns true if x is
946	      less than y.  Returns false otherwise. "x > y" returns true if x
947	      is greater than y.  Returns false otherwise.
   o  >=, <=, for greater than or equal and less than or equal, the
      negations of < and > respectively. "x >= y" returns false if x is
      less than y.  Returns true otherwise. "x <= y" returns false if x
      is greater than y.  Returns true otherwise.

953	4.8.  Comments

955	   Free English text can be inserted into a ROHC-FN specification to
956	   explain why something has been done a particular way, to clarify the
957	   intended meaning of the notation, or to elaborate on some point.

959	   The FN uses an end of line comment style, which makes use of the "//"
960	   comment marker.  Any text between the "//" marker and the end of the
961	   line has no formal meaning.  For example:

963	     //-----------------------------------------------------------------
964	     //    IR-REPLICATE header formats
965	     //-----------------------------------------------------------------

967	     // The following fields are included in all of the IR-REPLICATE
968	     // header formats:
969	     //
970	     UNCOMPRESSED {
971	       discriminator;    //  8 bits
972	       tcp_seq_number;   // 32 bits
973	       tcp_flags_ecn;    //  2 bits

975	   Comments do not affect the formal meaning of what is notated, but can
976	   be used to improve readability.  Their use is optional.

   Comments may help to clarify the specification for the reader, and
   can serve different purposes for implementers.  Comments should
   therefore not be considered of lesser importance when inserting them
   into a ROHC-FN specification; they should be consistent with the
   normative part of the specification.

984	4.9.  "ENFORCE" Statements

986	   The "ENFORCE" statement provides a way to add predicates to a format,
987	   all of which must be fulfilled for the format to succeed.  An
988	   "ENFORCE" statement shares some similarities with an encoding method.
989	   Specifically, whereas an encoding method binds several field
990	   attributes at once, an "ENFORCE" statement typically binds just one
991	   of them.  In fact, all the bindings that encoding methods create can
992	   be expressed in terms of a collection of "ENFORCE" statements.  Here
993	   is an example "ENFORCE" statement which binds the "UVALUE" attribute
994	   of a field to 5.

996	     ENFORCE(field.UVALUE == 5);

998	   An "ENFORCE" statement must only be used inside a field list (see
999	   Section 4.12).  It attempts to force the expression given to be true
1000	   for the format which it belongs to.

1002	   An abbreviated form of "ENFORCE" statement is available for binding
1003	   length attributes using "[" and "]", see Section 4.10.

1005	   Like an encoding method, an "ENFORCE" statement can only be
1006	   successfully used in a format if the binding it describes is
1007	   achievable.  A format containing the example "ENFORCE" statement
1008	   above would not be usable if the field had also been bound within
1009	   that same format with "uncompressed_value" encoding which gave it a
1010	   "UVALUE" other than 5.

1012	   An "ENFORCE" statement takes a boolean expression as a parameter.  It
1013	   can be used to assert that the expression is true, in order to choose
1014	   a particular format from a list of possible formats specified in an
1015	   encoding method (see Section 4.12), or just to bind an expression as
1016	   in the example above.  The general form of an "ENFORCE" statement is
1017	   therefore:

1019	     ENFORCE(<boolean expression>);

1021	   There are three possible conditions that the expression may be in:

   1.  The boolean expression evaluates to false, in which case the
       local scope of the format that contains the "ENFORCE" statement
       cannot be used.
   2.  The boolean expression evaluates to true, in which case the
       binding is created and successful.
   3.  The value of the boolean expression is undefined.  In this case,
       the binding is also created and successful.

   In all three cases, any undefined terms become bound by the
   expression.  Generally speaking, an "ENFORCE" statement is either
   being used as an assignment (condition 3 above) or else it is being
   used to test if a particular format is usable, as is the case with
   conditions 1 and 2.
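   For the simple case of an attribute compared for equality with a
   value, the three conditions can be modelled in a few lines of
   Python.  This is a hypothetical sketch, not a general ROHC-FN
   evaluator: bound attributes are kept in a dictionary, and a missing
   entry stands for an undefined attribute.

```python
def enforce_equal(bindings: dict, attr: str, value: int) -> bool:
    """Model ENFORCE(<attr> == <value>) for a single attribute.

    `bindings` maps names such as "field.UVALUE" to bound values; a
    missing name is undefined.  Returns False if the format fails.
    """
    if attr not in bindings:
        bindings[attr] = value      # undefined (condition 3): bind it
        return True
    return bindings[attr] == value  # defined: condition 2 or 1


fmt = {}
print(enforce_equal(fmt, "field.UVALUE", 5), fmt)  # True {'field.UVALUE': 5}
print(enforce_equal(fmt, "field.UVALUE", 6))       # False: format unusable
```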

1038	4.10.  Formal Specification of Field Lengths

1040	   In many of the preceding examples each field has been followed by a
1041	   comment indicating the length of the field.  Indicating the length of
1042	   a field like this is optional, but can be very helpful for the
1043	   reader.  However, whilst useful to the reader, comments have no
1044	   formal meaning.

1046	   One of the most common uses for "ENFORCE" statements (see
1047	   Section 4.9) is to explicitly define the length of a field within a
1048	   header.  Using "ENFORCE" statements for this purpose has formal
1049	   meaning but is not so easy to read.  Therefore an abbreviated form is
1050	   provided for this use of "ENFORCE", which is both easy to read and
1051	   has formal meaning.

1053	   An expression defining the length of a field can be specified in
1054	   square brackets after the appearance of that field in a format.  If
1055	   the field can take several alternative lengths then the expressions
1056	   defining those lengths can be enumerated as a comma separated list
1057	   within the square brackets.  For example,

1059	     field_1                  [ 4 ];
1060	     field_2                  [ a+b, 2 ];
1061	     field_3 =:= lsb(16, 16)  [ 26 ];

1063	   The actual length attribute which is bound by this notation depends
1064	   on whether it appears in a "COMPRESSED", "UNCOMPRESSED" or "CONTROL"
1065	   field list (see Section 4.12.1 and its subsections).  In a
1066	   "COMPRESSED" field list, the field's "CLENGTH" attribute is bound.
1067	   In "UNCOMPRESSED" and "CONTROL" field lists, the field's "ULENGTH"
1068	   attribute is bound.  Abbreviated "ENFORCE" statements are not allowed
1069	   in "DEFAULT" sections (see Section 4.12.1.5).  Therefore the above
1070	   notation would not be allowed to appear in a "DEFAULT" section.
   However, if the above appeared in an "UNCOMPRESSED" or "CONTROL"
   section, it would be equivalent to:

     field_1;                 ENFORCE(field_1.ULENGTH == 4);
     field_2;                 ENFORCE((field_2.ULENGTH == 2)
                                   || (field_2.ULENGTH == a+b));
     field_3 =:= lsb(16, 16); ENFORCE(field_3.ULENGTH == 26);

   A special case exists for fields which have a variable length that
   the notator does not wish to, or is not able to, define using an
   expression.  The keyword "VARIABLE" can be used in this case:

1084	     variable_length_field  [ VARIABLE ];

   Formally, this places no restriction on the field length: "VARIABLE"
   maps onto any non-negative integer.  It will therefore
1088	   be necessary to define the length of the field elsewhere (see the
1089	   final paragraphs of Section 4.12.1.1 and Section 4.12.1.2).  This may
1090	   either be in the notation or in the English text of the profile
1091	   within which the FN is contained.  Within the square brackets, the
1092	   keyword "VARIABLE" may be used as a term in an expression, just like
1093	   any other term that normally appears in an expression.  For example:

1095	         field  [ 8 * (5 + VARIABLE) ];

1097	   This defines a field whose length is a whole number of octets and at
1098	   least 40 bits (5 octets) long.
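   The lengths permitted by "8 * (5 + VARIABLE)" can be enumerated with
   the following Python sketch.  The function is hypothetical and
   specific to this one expression, with VARIABLE ranging over the
   non-negative integers.

```python
def satisfies_octet_length(length: int, min_octets: int = 5) -> bool:
    """True if length == 8 * (min_octets + VARIABLE) for some VARIABLE >= 0."""
    return length % 8 == 0 and length // 8 >= min_octets


print([n for n in range(64) if satisfies_octet_length(n)])  # [40, 48, 56]
```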

1100	4.11.  Library of Encoding Methods

1102	   A number of common techniques for compressing header fields are
1103	   defined as part of the ROHC-FN library so that they can be reused
1104	   when creating new ROHC-FN specifications.  Their notation is
1105	   described below.

1107	   As an alternative or a complement to this library of encoding
1108	   methods, a ROHC-FN specification can define its own set of encoding
1109	   methods, using the formal notation (see Section 4.12) or using a
1110	   textual definition (see Section 4.13).

1112	4.11.1.  uncompressed_value

1114	   The "uncompressed_value" encoding method is used to encode header
1115	   fields for which the uncompressed value can be defined using a
1116	   mathematical expression (including constant values).  This encoding
1117	   method is defined as follows:

1119	     uncompressed_value(len, val) {
1120	       UNCOMPRESSED {
1121	         field;
1122	         ENFORCE(field.ULENGTH == len);
1123	         ENFORCE(field.UVALUE == val);
1124	       }
1125	       COMPRESSED {
1126	         field;
1127	         ENFORCE(field.CLENGTH == 0);
1128	       }
1129	     }

1131	   To exemplify the usage of "uncompressed_value" encoding, the IPv6
1132	   header version number is a four bit field that always has the value
1133	   6:

1135	     version   =:=   uncompressed_value(4, 6);

1137	   Here is another example of value encoding, using an expression to
1138	   calculate the length:

1140	     padding =:= uncompressed_value(nbits - 8, 0);

1142	   The expression above uses an encoding method parameter, "nbits",
1143	   which in this example specifies how many significant bits there are
1144	   in the data, to calculate how many pad bits to use.  See
1145	   Section 4.12.2 for more information on encoding method parameters.
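   The two sides of "uncompressed_value" can be sketched in Python as
   below, using hypothetical function names.  The compressor can only
   use the format when both "ENFORCE" statements hold, and the
   decompressor rebuilds the field from the parameters alone.

```python
def uncompressed_value_compress(uvalue, ulength, len_param, val_param):
    """Compressor side of uncompressed_value(len, val).

    Returns the compressed bits ("" since CLENGTH == 0) or None when the
    field violates ENFORCE(ULENGTH == len) or ENFORCE(UVALUE == val).
    """
    if ulength != len_param or uvalue != val_param:
        return None
    return ""


def uncompressed_value_decompress(len_param, val_param):
    """Decompressor side: (UVALUE, ULENGTH) come from the parameters."""
    return val_param, len_param


# IPv6 "version =:= uncompressed_value(4, 6);"
print(repr(uncompressed_value_compress(6, 4, 4, 6)))  # ''
print(uncompressed_value_compress(4, 4, 4, 6))        # None
print(uncompressed_value_decompress(4, 6))            # (6, 4)
```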

1147	4.11.2.  compressed_value

1149	   The "compressed_value" encoding method is used to define fields in
1150	   compressed formats for which there is no counterpart in the
1151	   uncompressed format (i.e. control fields).  It can be used to specify
1152	   compressed fields whose value can be defined using a mathematical
1153	   expression (including constant values).  This encoding method is
1154	   defined as follows:

1156	     compressed_value(len, val) {
1157	       UNCOMPRESSED {
1158	         field;
1159	         ENFORCE(field.ULENGTH == 0);
1160	       }
1161	       COMPRESSED {
1162	         field;
1163	         ENFORCE(field.CLENGTH == len);
1164	         ENFORCE(field.CVALUE == val);
1165	       }
1166	     }

1168	   One possible use of this encoding method is to define padding in a
1169	   compressed format:

1171	     pad_to_octet_boundary      =:=   compressed_value(3, 0);

1173	   A more common use is to define a discriminator field to make it
1174	   possible to differentiate between different compressed formats within
1175	   an encoding method (see Section 4.12).  For convenience, the notation
1176	   provides syntax for specifying "compressed_value" encoding in the
1177	   form of a binary string.  The binary string to be encoded is simply
1178	   given in single quotes; the "CLENGTH" attribute of the field binds
1179	   with the number of bits in the string, while its "CVALUE" attribute
1180	   binds with the value given by the string.  For example:

1182	     discriminator     =:=   '01101';

1184	   This has exactly the same meaning as:

1186	     discriminator     =:=   compressed_value(5, 13);
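   The mapping from a binary string to the parameters of the equivalent
   "compressed_value" encoding can be sketched in Python as follows
   (the helper name is hypothetical):

```python
def binary_string_params(bits: str):
    """Return (len, val) for the compressed_value equivalent to 'bits'."""
    if set(bits) - {"0", "1"}:
        raise ValueError("binary string may only contain 0 and 1")
    # CLENGTH is the number of bits; CVALUE is their value, MSB first.
    return len(bits), (int(bits, 2) if bits else 0)


print(binary_string_params("01101"))  # (5, 13)
```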

1188	4.11.3.  irregular

1190	   The "irregular" encoding method is used to encode a field in the
1191	   compressed format with a bit pattern identical to the uncompressed
1192	   field.  This encoding method is defined as follows:

1194	     irregular(len) {
1195	       UNCOMPRESSED {
1196	         field;
1197	         ENFORCE(field.ULENGTH == len);
1198	       }
1199	       COMPRESSED {
1200	         field;
1201	         ENFORCE(field.CLENGTH == len);
1202	         ENFORCE(field.CVALUE == field.UVALUE);
1203	       }
1204	     }

1206	   For example, the checksum field of the TCP header is a sixteen bit
1207	   field that does not follow any predictable pattern from one header to
1208	   another (and so cannot be compressed):

1210	     tcp_checksum  =:=   irregular(16);

   Note that the length does not have to be constant; for example, the
   length expression can be used to derive the length of the field from
   the value of another field.
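   A hypothetical Python sketch of both directions of "irregular",
   assuming the uncompressed value fits in "len" bits:

```python
def irregular_compress(uvalue: int, ulength: int, len_param: int):
    """irregular(len): CVALUE == UVALUE and CLENGTH == ULENGTH == len.

    Returns the compressed bit string, or None if ULENGTH != len.
    Assumes 0 <= uvalue < 2 ** ulength, as for a real header field.
    """
    if ulength != len_param:
        return None
    return format(uvalue, f"0{len_param}b") if len_param else ""


def irregular_decompress(bits: str):
    """Return (UVALUE, ULENGTH) recovered from the transmitted bits."""
    return (int(bits, 2) if bits else 0), len(bits)


bits = irregular_compress(0xBEEF, 16, 16)          # e.g. a TCP checksum
print(bits)                                        # 1011111011101111
print(irregular_decompress(bits) == (0xBEEF, 16))  # True
```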

1216	4.11.4.  static

1218	   The "static" encoding method compresses a field whose length and
1219	   value are the same as for a previous header in the flow, i.e. where
1220	   the field completely matches an existing entry in the context:

1222	     field            =:=   static;

1224	   The field's "UVALUE" and "ULENGTH" attributes bind with their
1225	   respective values in the context and the "CLENGTH" attribute is bound
1226	   to zero.

1228	   Since the field value is the same as a previous field value, the
1229	   entire field can be reconstructed from the context, so it is
1230	   compressed to zero bits and does not appear in the compressed format.

1232	   For example, the source port of the TCP header is a field whose value
1233	   does not change from one packet to the next for a given flow:

1235	     src_port  =:=   static;
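   A minimal Python sketch of "static", assuming (hypothetically) that
   the context is a dictionary from field names to (UVALUE, ULENGTH)
   pairs taken from earlier headers in the flow:

```python
# Hypothetical context: field name -> (UVALUE, ULENGTH) from earlier packets.
context = {"src_port": (5001, 16)}


def static_compress(name: str, uvalue: int, ulength: int, ctx: dict):
    """static: usable only if the field matches the context; 0 bits sent."""
    return "" if ctx.get(name) == (uvalue, ulength) else None


def static_decompress(name: str, ctx: dict):
    """Rebuild (UVALUE, ULENGTH) entirely from the context."""
    return ctx[name]


print(repr(static_compress("src_port", 5001, 16, context)))  # ''
print(static_compress("src_port", 5002, 16, context))        # None
print(static_decompress("src_port", context))                # (5001, 16)
```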

1237	4.11.5.  lsb

1239	   The least significant bits encoding method, "lsb", compresses a field
1240	   whose value differs by a small amount from the value stored in the
1241	   context.  The least significant bits of the field value are
1242	   transmitted instead of the original field value.

1244	     field  =:=   lsb(<num_lsbs_param>, <offset_param>);

1246	   Here, "num_lsbs_param" is the number of least significant bits to
1247	   use, and "offset_param" is the interpretation interval offset as
1248	   defined below.

   The parameter "num_lsbs_param" binds with the "CLENGTH" attribute;
   the "UVALUE" attribute binds to the value within the interval whose
   least significant bits match the "CVALUE" attribute.  The value of
   the "ULENGTH" attribute can be derived from the information stored
   in the context.

1256	   For example, the TCP sequence number:

1258	     tcp_sequence_number   =:=   lsb(14, 8192);

   This takes up 14 bits, and can communicate any value from 8192 below
   the value of the field stored in the context up to 8191 above it.

1264	   The interpretation interval can be described as a function of a value
1265	   stored in the context, ref_value, and of num_lsbs_param:

1267	     f(ref_value, num_lsbs_param) = [ref_value - offset_param,
1268	                ref_value + (2^num_lsbs_param - 1) - offset_param]

1270	   where offset_param is an integer.

1272	          <-- interpretation interval (size is 2^num_lsbs_param) -->
1273	          |---------------------------+----------------------------|
1274	        lower                     ref_value                      upper
1275	        bound                                                    bound

1277	   where:

1279	        lower bound = ref_value - offset_param
1280	        upper bound = ref_value + (2^num_lsbs_param-1) - offset_param

1282	   The "lsb" encoding method can therefore compress a field whose value
1283	   lies between the lower and the upper bounds, inclusively, of the
1284	   interpretation interval.  In particular, if offset_param = 0 then the
1285	   field value can only stay the same or increase relative to the
1286	   reference value ref_value.  If offset_param = -1 then it can only
1287	   increase, whereas if offset_param = 2^num_lsbs_param then it can only
1288	   decrease.
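   The interval arithmetic above can be checked with a short Python
   sketch.  It treats field values as unbounded integers and therefore
   ignores wraparound of fixed-width fields such as the 32-bit TCP
   sequence number.  The function names are hypothetical.

```python
from typing import Optional, Tuple


def lsb_interval(ref_value: int, k: int, p: int) -> Tuple[int, int]:
    """Interpretation interval of lsb(k, p), bounds inclusive."""
    return ref_value - p, ref_value + (1 << k) - 1 - p


def lsb_compress(value: int, ref_value: int, k: int, p: int) -> Optional[int]:
    """Return the k least significant bits of value, or None if lsb fails."""
    lower, upper = lsb_interval(ref_value, k, p)
    if not lower <= value <= upper:
        return None
    return value & ((1 << k) - 1)


def lsb_decompress(cvalue: int, ref_value: int, k: int, p: int) -> int:
    """Return the unique value in the interval whose k LSBs equal cvalue."""
    lower, _ = lsb_interval(ref_value, k, p)
    return lower + ((cvalue - lower) & ((1 << k) - 1))


# Usage with tcp_sequence_number =:= lsb(14, 8192) and a reference of 100000.
ref = 100000
for seq in (ref - 8192, ref, ref + 8191):
    c = lsb_compress(seq, ref, 14, 8192)
    print(seq, c, lsb_decompress(c, ref, 14, 8192))
```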

1290	   The compressed field takes up the specified number of bits in the
1291	   compressed format (i.e. num_lsbs_param).

1293	   The compressor may not be able to determine the exact reference value
1294	   that the decompressor holds in its context and will use for
1295	   decompression, since some packets that would have updated the context
1296	   may have been lost or damaged.  However, from feedback received or by
1297	   making assumptions, the compressor can limit the candidate set of
1298	   values.  The compressor can then select a format that uses an "lsb"
1299	   encoding defined with suitable values for its parameters
1300	   num_lsbs_param and offset_param, such that no matter which context
1301	   value in the candidate set the decompressor uses, the resulting
1302	   decompression is correct.  If that is not possible, the "lsb"
1303	   encoding method fails (which typically results in a less efficient
1304	   compressed format being chosen by the compressor).  How the
1305	   compressor determines what reference values it stores and maintains
1306	   in its set of candidate references is outside the scope of the
1307	   notation.
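   A compressor could apply this reasoning as in the Python sketch
   below.  The set of candidate references and the list of
   (num_lsbs_param, offset_param) pairs offered by the profile's formats
   are hypothetical inputs.  As stated above, how they are obtained is
   outside the scope of the notation.  Choosing the cheapest pair is one
   possible policy, not a requirement.

```python
from typing import Iterable, List, Optional, Tuple


def in_interval(value: int, ref_value: int, k: int, p: int) -> bool:
    """True if value lies in the lsb(k, p) interval around ref_value."""
    return ref_value - p <= value <= ref_value + (1 << k) - 1 - p


def choose_lsb_params(value: int, candidate_refs: Iterable[int],
                      allowed: List[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Return the (k, p) pair with the fewest bits that decodes correctly
    whichever candidate reference the decompressor uses, or None."""
    refs = list(candidate_refs)
    for k, p in sorted(allowed):
        # If value lies inside every candidate's interval, each candidate
        # decodes the same k LSBs back to value, because each interval
        # holds exactly one value with those LSBs.
        if all(in_interval(value, r, k, p) for r in refs):
            return k, p
    return None  # "lsb" fails; a less efficient format must be used


# Usage: the decompressor may hold 100 or 103 (an update may have been lost).
print(choose_lsb_params(105, {100, 103}, [(2, 0), (4, 0), (8, 0)]))
```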

1309	4.11.6.  crc

1311	   The "crc" encoding method provides a CRC calculated over a block of
1312	   data.  The algorithm used to calculate the CRC is the one specified
1313	   in [I-D.ietf-rohc-rfc3095bis-framework].  The "crc" method takes a
1314	   number of parameters:

1316	   o  the number of bits for the CRC (num_bits),
1317	   o  the bit-pattern for the polynomial (bit_pattern),
1318	   o  the initial value for the CRC register (initial_value),
1319	   o  the value of the block of data, represented using either the
1320	      "UVALUE" or "CVALUE" attribute of a field (block_data_value); and
1321	   o  the size in octets of the block of data (block_data_length).

1323	   i.e.:

1325	     field   =:=   crc(<num_bits>, <bit_pattern>, <initial_value>,
1326	                       <block_data_value>, <block_data_length>);

1328	   When specifying the bit pattern for the polynomial, each bit
1329	   represents the coefficient for the corresponding term in the
1330	   polynomial.  Note that the highest order term is always present (by
1331	   definition) and therefore does not need specifying in the bit
1332	   pattern.  Therefore a CRC polynomial with n terms in it is
1333	   represented by a bit pattern with n-1 bits set.

1335	   The CRC is calculated in least significant bit (LSB) order.

1337	   For example:

1339	     // 3 bit CRC, C(x) = x^0 + x^1 + x^3
1340	     crc_field =:= crc(3, 0x6, 0xF, THIS.CVALUE, THIS.CLENGTH);

1342	   Usage of the "THIS" keyword (see Section 4.6), as shown above, is
1343	   typical when using "crc" encoding.  For example, when used in the
1344	   encoding method for an entire header, it causes the CRC to be
1345	   calculated over all fields in the header.
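   The normative CRC algorithm is the one specified in
   [I-D.ietf-rohc-rfc3095bis-framework].  The Python sketch below is
   only an illustration, under stated assumptions, of how the "crc"
   parameters fit together in a bitwise, least-significant-bit-first
   (reflected) CRC.  It assumes that each octet is fed LSB first and
   that the register is masked to num_bits bits, so in this sketch the
   initial value 0xF of the example above acts as 0x7 in a 3-bit
   register.  Function and parameter names are hypothetical.

```python
def crc_lsb_first(data: bytes, num_bits: int, bit_pattern: int,
                  initial_value: int) -> int:
    """Bitwise reflected CRC over data, least significant bit of each octet first.

    bit_pattern holds the polynomial without its highest-order term, in
    reflected order (the x^0 coefficient is the most significant bit).
    """
    mask = (1 << num_bits) - 1
    crc = initial_value & mask       # the register is only num_bits wide
    for octet in data:
        for i in range(8):
            bit = (octet >> i) & 1   # LSB first
            if (crc ^ bit) & 1:
                crc = (crc >> 1) ^ bit_pattern
            else:
                crc >>= 1
    return crc & mask


# Usage: the 3-bit CRC of the example, C(x) = x^0 + x^1 + x^3 (pattern 0x6).
block = bytes.fromhex("4500001c")
print(crc_lsb_first(block, 3, 0x6, 0xF))
```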

1347	4.12.  Definition of Encoding Methods

1349	   New encoding methods can be defined in a formal specification.  These
1350	   compose groups of individual fields into a contiguous block.

1352	   Encoding methods have names and may have parameters; they can also be
1353	   used in the same way as any other encoding method from the library of
1354	   encoding methods.  Since they can contain references to other
1355	   encoding methods, complicated formats can be broken down into
1356	   manageable pieces in a hierarchical fashion.

1358	   This section describes the various features used to define new
1359	   encoding methods.

1361	4.12.1.  Structure

1363	   The simplest form of defining an encoding method is to specify a
1364	   single encoding.  For example:

1366	     compound_encoding_method
1367	     {
1368	       UNCOMPRESSED {
1369	         field_1;  //  4 bits
1370	         field_2;  // 12 bits
1371	       }

1373	       COMPRESSED {
1374	         field_2 =:= uncompressed_value(12, 9); //  0 bits
1375	         field_1 =:= irregular(4);              //  4 bits
1376	       }
1377	     }

1379	   The above begins with the new method's identifier,
1380	   "compound_encoding_method".  The definition of the method then
1381	   follows inside curly braces, "{" and "}".  The first item in the
1382	   definition is the "UNCOMPRESSED" field list, which gives the order of
1383	   the fields in the uncompressed format.  This is followed by the
1384	   compressed format field list ("COMPRESSED").  This list gives the
1385	   order of fields in the compressed format and also gives the encoding
1386	   method for each field.

1388	   In the example, both formats list each field exactly once.
1389	   Sometimes however it is necessary to specify more than one binding
1390	   for a given field, which means it appears more than once in the field
1391	   list.  In this case it is the first occurrence of the field in the
1392	   list which indicates its position in the field order.  The subsequent
1393	   occurrences of the field only specify binding information, not field
1394	   order information.

1396	   The different components of this example are described in more detail
1397	   below.  Other components that can be used in the definition of
1398	   encoding methods are also defined thereafter.

1400	4.12.1.1.  Uncompressed Format - "UNCOMPRESSED"

1402	   The uncompressed field list is defined by "UNCOMPRESSED", which
1403	   specifies the fields of the uncompressed format in the order that
1404	   they appear in the uncompressed header.  The sum of the lengths of
1405	   the individual uncompressed fields in the list must be equal to the
1406	   length of the field being encoded.  Finally, the uncompressed format
1407	   described by the list of fields in the "UNCOMPRESSED" section, for
1408	   which compressed formats are being defined, is always represented as
1409	   a single contiguous block of bits.

1411	   In the example above in Section 4.12.1, the uncompressed field list
1412	   is "field_1" followed by "field_2".  This means that a field being
1413	   encoded by this method is divided into two subfields, "field_1" and
1414	   "field_2".  The total uncompressed length of these two fields
1415	   therefore equals the length of the field being encoded:

1417	     field_1.ULENGTH + field_2.ULENGTH == THIS.ULENGTH

1419	   In the example, there are only two fields, but any number of
1420	   subfields may be used.  This relationship applies to however many
1421	   fields are actually used.  Any arrangement of fields that efficiently
1422	   describes the content of the uncompressed header may be chosen --
1423	   this need not be the same as the one described in the specifications
1424	   for the protocol header being compressed.

1426	   For example, there may be a protocol whose header contains a 16-bit
1427	   sequence number, but whose sessions tend to be short-lived.  This
1428	   would mean that the high bits of the sequence number are almost
1429	   always constant.  The "UNCOMPRESSED" format could reflect this by
1430	   splitting the original uncompressed field into two fields, one field
1431	   to represent the almost-always-zero part of the sequence number, and
1432	   a second field to represent the salient part.

1434	   An "UNCOMPRESSED" field list may specify encoding methods in the same
1435	   way as the "COMPRESSED" field list in the example.  Encoding methods
1436	   specified therein are used whenever a packet with that uncompressed
1437	   format is being encoded.  The encoding of a packet with a given
1438	   uncompressed format can only succeed if all of its encoding methods
1439	   and "ENFORCE" statements succeed (see Section 4.9).

1441	   The total length of an uncompressed format must always be defined.
1442	   The length of each of the fields in an uncompressed format must also
1443	   be defined.  This means that the bindings in the "UNCOMPRESSED",
1444	   "COMPRESSED" (see Section 4.12.1.2 below), "CONTROL" (see
1445	   Section 4.12.1.3 below), "INITIAL" (see Section 4.12.1.4 below) and
1446	   "DEFAULT" (see Section 4.12.1.5 below) field lists must between them
1447	   define the "ULENGTH" attribute of every field in an uncompressed
1448	   format so that there is an unambiguous mapping from the bits in the
1449	   uncompressed format to the fields listed in each "UNCOMPRESSED" field
1450	   list.

1452	4.12.1.2.  Compressed Format - "COMPRESSED"

1454	   Similar to the uncompressed field list, the compressed header will
1455	   appear in the order specified by the compressed field list given for
1456	   a compressed format.  Each individual field is encoded in the manner
1457	   given for that field.  The total length of the compressed data will
1458	   be the sum of the compressed lengths of all the individual fields.
1459	   In the example from Section 4.12.1, the encoding methods used for
1460	   these fields indicate that they are zero and 4 bits long, making a
1461	   total of 4 bits.

1463	   The order of the fields specified in a "COMPRESSED" field list does
1464	   not have to match the order in which they appear in the
1465	   "UNCOMPRESSED" field list.  It may be desirable to reorder the fields
1466	   in the compressed format to align the compressed header to an octet
1467	   boundary, or for other reasons.  In the above example, the order is
1468	   in fact the opposite of that in the uncompressed format.

1470	   The compressed field list specifies that the encoding for "field_1"
1471	   is "irregular", and takes up four bits in both the compressed format
1472	   and uncompressed format.  The encoding for "field_2" is
1473	   "uncompressed_value", which means that the field has a fixed value,
1474	   so it can be compressed to zero bits.  The value it takes is 9, and
1475	   it is 12 bits wide in the uncompressed format.
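   To make the bit accounting concrete, the Python sketch below mimics
   "compound_encoding_method" for a 16-bit field.  It assumes that the
   first field in the "UNCOMPRESSED" list occupies the most significant
   bits and that the input fits in 16 bits.  The helper names are
   hypothetical.

```python
from typing import Optional

FIELD_1_LEN, FIELD_2_LEN = 4, 12   # field_1.ULENGTH + field_2.ULENGTH == 16


def compress(uvalue: int) -> Optional[str]:
    """Return the 4-bit compressed format as a bit string, or None on failure."""
    field_1 = uvalue >> FIELD_2_LEN               # first field: top 4 bits
    field_2 = uvalue & ((1 << FIELD_2_LEN) - 1)   # second field: low 12 bits
    if field_2 != 9:
        return None                               # uncompressed_value(12, 9) fails
    # field_2 contributes 0 bits; field_1 =:= irregular(4) is sent verbatim.
    return format(field_1, f"0{FIELD_1_LEN}b")


def decompress(bits: str) -> int:
    """Rebuild the 16-bit field from the 4 transmitted bits."""
    field_1 = int(bits, 2)
    field_2 = 9                                   # implied by uncompressed_value(12, 9)
    return (field_1 << FIELD_2_LEN) | field_2


print(compress(0x3009), hex(decompress("0011")))
```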

1477	   Fields like "field_2", which compress to zero bits in length, may
1478	   appear anywhere in the field list without changing the compressed
1479	   format because their position in the list is not significant.  In
1480	   fact, if the encoding method for this field were defined elsewhere
1481	   (e.g. in the "UNCOMPRESSED" section), this field could be omitted
1482	   from the "COMPRESSED" section altogether:

1484	     compound_encoding_method
1485	     {
1486	       UNCOMPRESSED {
1487	         field_1;                                //  4 bits
1488	         field_2 =:= uncompressed_value(12, 9);  // 12 bits
1489	       }

1491	       COMPRESSED {
1492	         field_1 =:= irregular(4);               //  4 bits
1493	       }
1494	     }

1496	   The total length of a compressed format must always be defined.  The
1497	   length of each of the fields in a compressed format must also be
1498	   defined.  This means that the bindings in the "UNCOMPRESSED",
1499	   "COMPRESSED", "CONTROL" (see Section 4.12.1.3 below), "INITIAL" (see
1500	   Section 4.12.1.4 below) and "DEFAULT" (see Section 4.12.1.5 below)
1501	   field lists must between them define the "CLENGTH" attribute of every
1502	   field in a compressed format so that there is an unambiguous mapping
1503	   from the bits in the compressed format to the fields listed in each
1504	   "COMPRESSED" field list.

1506	4.12.1.3.  Control Fields - "CONTROL"

1508	   Control fields are defined using the "CONTROL" field list.  The
1509	   control field list specifies all fields that do not appear in the
1510	   uncompressed format but which have an uncompressed value
1511	   (specifically those with an "ULENGTH" greater than zero).  Such
1512	   fields may be used to help compress fields from the uncompressed
1513	   format more efficiently.  A control field could be used to improve
1514	   efficiency by representing some commonality between a number of the
1515	   uncompressed fields, or by representing some information about the
1516	   flow that is not explicitly contained in the protocol headers.

1518	   For example, in IPv4, the behaviour of the IP-ID field in a flow
1519	   varies depending on how the endpoints handle IP-IDs.  Sometimes the
1520	   behaviour is effectively random and sometimes the IP-ID follows a
1521	   predictable sequence.  The type of IP-ID behaviour is information
1522	   that is never communicated explicitly in the uncompressed header.

1524	   However, a profile can still be designed to identify the behaviour
1525	   and adjust the compression strategy according to the identified
1526	   behaviour, thereby improving the compression performance.  To do so,
1527	   the ROHC-FN specification can introduce an explicit field to
1528	   communicate the IP-ID behaviour in compressed format -- this is done
1529	   by introducing a control field:

1531	     ipv4
1532	     {
1533	       UNCOMPRESSED {
1534	         version;       // 4 bits
1535	         hdr_length;    // 4 bits
1536	         protocol;      // 8 bits
1537	         tos_tc;        // 6 bits
1538	         ip_ecn_flags;  // 2 bits
1539	         ttl_hopl;      // 8 bits
1540	         df;            // 1 bit
1541	         mf;            // 1 bit
1542	         rf;            // 1 bit
1543	         frag_offset;   // 13 bits
1544	         ip_id;         // 16 bits
1545	         src_addr;      // 32 bits
1546	         dst_addr;      // 32 bits
1547	         checksum;      // 16 bits
1548	         length;        // 16 bits
1549	       }

1551	       CONTROL {
1552	         ip_id_behavior; // 1 bit
1553	            :
1554	            :

1556	   The "CONTROL" field list is equivalent to the "UNCOMPRESSED" field
1557	   list for fields that do not appear in the uncompressed format.  It
1558	   defines a field that has the same properties (the same defined
1559	   attributes etc.) as fields appearing in the uncompressed format.

1561	   Control fields are initialised by using the appropriate encoding
1562	   methods and/or by using "ENFORCE" statements.  This may be done
1563	   inside the "CONTROL" field list.

1565	   For example:

1567	     example_encoding_method_definition
1568	     {
1569	       UNCOMPRESSED {
1570	         field_1 =:= some_encoding;
1571	       }

1573	       CONTROL {
1574	         scaled_field;
1575	         ENFORCE(scaled_field.UVALUE == field_1.UVALUE / 8);
1576	         ENFORCE(scaled_field.ULENGTH == field_1.ULENGTH - 3);
1577	       }

1579	       COMPRESSED {
1580	         scaled_field =:= lsb(4, 0);
1581	       }
1582	     }

1584	   This control field is used to scale down a field in the uncompressed
1585	   format by a factor of 8 before encoding it with the "lsb" encoding
1586	   method.  Scaling it down makes the "lsb" encoding more efficient.
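   The effect of the scaling can be sketched in Python.  Purely for
   illustration, the sketch assumes that "field_1" is always a multiple
   of 8 (for example, a counter that advances in steps of 8), since here
   the decompressor rebuilds "field_1" only as scaled_field * 8.  The
   unspecified "some_encoding" of the example is not modelled.  With
   scaling, the 4 bits of "lsb(4, 0)" cover 16 steps of 8, whereas
   covering the same range on the unscaled field with "lsb" would need
   7 bits.

```python
from typing import Optional

K = 4  # scaled_field =:= lsb(4, 0)


def compress_scaled(field_1: int, ctx_field_1: int) -> Optional[int]:
    """Return the 4 transmitted bits of scaled_field, or None on failure."""
    if field_1 % 8 or ctx_field_1 % 8:
        return None                   # outside this sketch's assumption
    scaled = field_1 // 8             # ENFORCE(scaled_field.UVALUE == field_1.UVALUE / 8)
    scaled_ref = ctx_field_1 // 8
    if not scaled_ref <= scaled <= scaled_ref + (1 << K) - 1:
        return None                   # lsb(4, 0) fails
    return scaled & ((1 << K) - 1)


def decompress_scaled(cvalue: int, ctx_field_1: int) -> int:
    """Recover scaled_field with lsb(4, 0), then scale it back up by 8."""
    scaled_ref = ctx_field_1 // 8
    scaled = scaled_ref + ((cvalue - scaled_ref) & ((1 << K) - 1))
    return scaled * 8


print(compress_scaled(880, 800), decompress_scaled(14, 800))
```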

1588	   Control fields may also be used with global scope.  In this case
1589	   their declaration must be outside of any encoding method definition.
1590	   They are then visible within any encoding method, thus allowing
1591	   information to be shared between encoding methods directly.

1593	4.12.1.4.  Initial Values - "INITIAL"

1595	   In order to allow fields in the very first usage of a specific format
1596	   to be compressed with "static", "lsb", or other encoding methods
1597	   which depend on the context, it is possible to specify initial
1598	   bindings for such fields.  This is done using "INITIAL", for example:

1600	     INITIAL {
1601	        field =:= uncompressed_value(4, 6);
1602	     }

1604	   This initialises the "UVALUE" of "field" to 6 and initialises its
1605	   "ULENGTH" to 4.  Unlike all other bindings specified in the formal
1606	   notation, these bindings are applied to the context of the field, if
1607	   the field's context is undefined.  This is particularly useful when
1608	   using encoding methods which rely on context being present, such as
1609	   "static" or "lsb", e.g., for the first packet in a flow.

1611	   Because the "INITIAL" field list is used to bind the context alone,
1612	   it makes no sense to specify initial bindings which themselves rely
1613	   on the context (e.g. lsb).  Such usage is not allowed.
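   In implementation terms, "INITIAL" behaves like a fill-if-missing
   operation on the context.  The Python sketch below assumes a
   hypothetical dictionary-based context and is illustrative only.

```python
from typing import Dict

Binding = Dict[str, int]   # e.g. {"UVALUE": 6, "ULENGTH": 4}


def apply_initial(ctx: Dict[str, Binding],
                  initial: Dict[str, Binding]) -> Dict[str, Binding]:
    """Bind context entries from INITIAL only where the context is undefined."""
    for name, binding in initial.items():
        if name not in ctx:          # an existing context entry always wins
            ctx[name] = dict(binding)
    return ctx


# field =:= uncompressed_value(4, 6) in the INITIAL field list
initial = {"field": {"UVALUE": 6, "ULENGTH": 4}}
print(apply_initial({}, initial))                                     # first packet
print(apply_initial({"field": {"UVALUE": 9, "ULENGTH": 4}}, initial))  # later packet
```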

1615	4.12.1.5.  Default Field Bindings - "DEFAULT"

1617	   Default bindings may be specified for each field or attribute.  The
1618	   default encoding methods specify the encoding method to use for a
1619	   field if no binding is given elsewhere for the value of that field.
1620	   This helps keep the definition of the formats concise: when defining
1621	   multiple formats (see Section 4.12.3), for example, the same encoding
1622	   method need not be repeated in every format.

1624	   Default bindings are optional and may be given for any combination of
1625	   fields and attributes which are in scope.

1627	   The syntax for specifying default bindings is similar to that used to
1628	   specify a compressed or uncompressed format.  However, the order of
1629	   the fields in the field list does not affect the order of the fields
1630	   in either the compressed or uncompressed format.  This is because the
1631	   field order is specified individually for each "COMPRESSED" format
1632	   and "UNCOMPRESSED" format.

1634	   Here is an example:

1636	       DEFAULT {
1637	         field_1 =:= uncompressed_value(4, 1);
1638	         field_2 =:= uncompressed_value(4, 2);
1639	         field_3 =:= lsb(3, -1);
1640	         ENFORCE(field_4.ULENGTH == 4);
1641	       }

1643	   Here default bindings are specified for fields 1 to 3.  A default
1644	   binding for the "ULENGTH" attribute of field_4 is also specified.

1646	   Fields for which there is a default encoding method do not need their
1647	   bindings to be specified in the field list of any format that uses
1648	   the default encoding method for that field.  Any format that does not
1649	   use the default encoding method must explicitly specify a binding for
1650	   the value of that field's attributes.

1652	   If a binding is not specified for the attributes of a field, the
1653	   default encoding method is used.  If the default encoding method
1654	   always compresses the field down to zero bits, the field can be
1655	   omitted from the compressed format's field list.  Like any other zero
1656	   bit field, its position in the field list is not significant.

1658	   The "DEFAULT" field list may contain default bindings for individual
1659	   attributes by using "ENFORCE" statements.  A default binding for an
1660	   individual attribute will only be used if there is no binding given
1661	   either for that attribute or for the field to which it belongs.  If
1662	   there is an "ENFORCE" statement binding that attribute, or an
1663	   encoding method binding the field to which it belongs, the default
1664	   binding for the attribute will not be used.  This applies even if the
1665	   specified encoding method does not bind the particular attribute
1666	   given in the "DEFAULT" section.  However, an "ENFORCE" statement that
1667	   only binds the length of the field still allows the default bindings
1668	   for that field to be used, with the exception of default "ENFORCE"
1669	   statements that bind nothing but the field's length.

1671	   To clarify, assuming the default methods given in the example above,
1672	   the first three of the following four compressed formats would not
1673	   use the default binding for "field_4.ULENGTH":

1675	       COMPRESSED format1 {
1676	         ENFORCE(field_4.ULENGTH == 3); // set ULENGTH to 3
1677	         ENFORCE(field_4.UVALUE == 7);  // set UVALUE to 7
1678	       }

1680	       COMPRESSED format2 {
1681	         field_4 =:= irregular(3);      // set ULENGTH to 3
1682	       }

1684	       COMPRESSED format3 {
1685	         field_4 =:= '1010';            // set ULENGTH to zero
1686	       }

1688	       COMPRESSED format4 {
1690	         ENFORCE(field_4.UVALUE == 12); // use default ULENGTH
1691	       }

1693	   The fourth format is the only one which uses the default binding for
1694	   "field_4.ULENGTH".
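   The precedence rule for a default "ENFORCE" on a single attribute can
   be written as a small predicate.  The Python sketch below is one
   interpretation of the rule above.  It uses a hypothetical
   representation of a format: the set of fields bound by encoding
   methods, plus the set of (field, attribute) pairs bound by "ENFORCE"
   statements.  It reproduces the outcome for the four example formats.

```python
from typing import Dict, Set, Tuple


class Format:
    """Hypothetical summary of the explicit bindings in one format."""

    def __init__(self, methods: Set[str], enforced: Set[Tuple[str, str]]):
        self.methods = methods     # fields bound by an explicit encoding method
        self.enforced = enforced   # (field, attribute) pairs bound by ENFORCE


def uses_default_attribute(fmt: Format, field: str, attr: str) -> bool:
    """True if a default ENFORCE binding for field.attr applies to fmt."""
    if field in fmt.methods:
        return False   # any encoding method on the field blocks the default
    if (field, attr) in fmt.enforced:
        return False   # an explicit ENFORCE on this attribute blocks it
    return True


formats: Dict[str, Format] = {
    "format1": Format(set(), {("field_4", "ULENGTH"), ("field_4", "UVALUE")}),
    "format2": Format({"field_4"}, set()),   # irregular(3)
    "format3": Format({"field_4"}, set()),   # '1010'
    "format4": Format(set(), {("field_4", "UVALUE")}),
}
for name, fmt in formats.items():
    print(name, uses_default_attribute(fmt, "field_4", "ULENGTH"))
```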

1696	   In summary, the default bindings of an encoding method are only used
1697	   for formats which do not already specify an encoding for the value of
1698	   all of their fields.  For the formats that do use the default
1699	   methods, only those fields and attributes whose bindings are not
1700	   specified are looked up in the default methods.

1702	4.12.2.  Arguments

1704	   Encoding methods may take arguments that control the mapping between
1705	   compressed and uncompressed fields.  These are specified immediately
1706	   after the method's name, in parentheses, as a comma-separated list.

1708	   For example:

1710	     poor_mans_lsb(variable_length)
1711	     {
1712	       UNCOMPRESSED {
1713	         constant_bits;
1714	         variable_bits;
1715	       }

1717	       COMPRESSED {
1718	         variable_bits =:= irregular(variable_length);
1719	         constant_bits =:= static;
1720	       }
1721	     }

1723	   As with any encoding method, all arguments take individual values
1724	   such as an integer literal or a field attribute, rather than entire
1725	   fields.  Although entire fields cannot be passed as arguments, it is
1726	   possible to pass each of their attributes instead, which is
1727	   equivalent.
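   The following Python sketch shows what "poor_mans_lsb" amounts to for
   an integer field.  It assumes that "constant_bits" are the high-order
   bits, that "variable_bits" are the low-order "variable_length" bits,
   and that the context holds the previous value of the whole field.
   Unlike "lsb", this method fails whenever the high-order bits change,
   for example when the low-order bits wrap around.

```python
from typing import Optional


def poor_mans_lsb_compress(uvalue: int, ctx_value: int,
                           variable_length: int) -> Optional[int]:
    """Return variable_bits, or None if constant_bits differ from the context."""
    if (uvalue >> variable_length) != (ctx_value >> variable_length):
        return None                                # constant_bits =:= static fails
    return uvalue & ((1 << variable_length) - 1)   # variable_bits =:= irregular(...)


def poor_mans_lsb_decompress(cvalue: int, ctx_value: int,
                             variable_length: int) -> int:
    """Rebuild the field from the context's high bits and the received low bits."""
    constant_bits = ctx_value >> variable_length   # taken from the context
    return (constant_bits << variable_length) | cvalue


print(poor_mans_lsb_compress(0x12FF, 0x1234, 8),   # high bits unchanged
      poor_mans_lsb_compress(0x1300, 0x1234, 8))   # carry into the high bits
```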

1729	   Recall that all bindings are two-way so that rather than the
1730	   arguments acting as "inputs" to the encoding method, the result of an
1731	   encoding method may be to bind the parameters passed to it.

1733	   For example:

1735	     set_to_double(arg1, arg2)
1736	     {
1737	       CONTROL {
1738	         ENFORCE(arg1 == 2 * arg2);
1739	       }
1740	     }

1742	   This encoding method will attempt to bind the first argument to twice
1743	   the value of the second.  In fact, this "encoding" method is
1744	   pathological.  Since it defines no fields, it does not do any actual
1745	   encoding at all.  "CONTROL" sections are more appropriate to use for
1746	   this purpose than "UNCOMPRESSED".
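   Two-way binding can be pictured as constraint solving rather than
   function evaluation.  The Python sketch below is an illustration, not
   a prescription for implementations.  It resolves
   "ENFORCE(arg1 == 2 * arg2)" in whichever direction is possible, uses
   None for an unbound argument, and assumes integer-valued arguments.

```python
from typing import Optional, Tuple

Pair = Tuple[Optional[int], Optional[int]]


def set_to_double(arg1: Optional[int] = None,
                  arg2: Optional[int] = None) -> Optional[Pair]:
    """Bind the unbound argument, or check both; None means the binding fails."""
    if arg1 is None and arg2 is None:
        return (None, None)          # nothing to bind from yet
    if arg1 is None:
        return (2 * arg2, arg2)      # arg1 bound from arg2
    if arg2 is None:
        if arg1 % 2:
            return None              # no integer arg2 satisfies the constraint
        return (arg1, arg1 // 2)     # arg2 bound from arg1
    return (arg1, arg2) if arg1 == 2 * arg2 else None


print(set_to_double(arg2=5), set_to_double(arg1=10), set_to_double(7))
```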

1748	4.12.3.  Multiple Formats

1750	   Encoding methods can also define multiple formats for a given header.
1751	   This allows different compression methods to be used depending on
1752	   what is the most efficient way of compressing a particular header.

1754	   For example, a field may have a fixed value most of the time, but the
1755	   value may occasionally change.  Using a single format for the
1756	   encoding, this field would have to be encoded using "irregular" (see
1757	   Section 4.11.3), even though the value only changes rarely.  However,
1758	   by defining multiple formats, we can provide two alternative
1759	   encodings: one for when the value remains fixed and another for when
1760	   the value changes.

1762	   This is the topic of the following sub-sections.

1764	4.12.3.1.  Naming Convention

1766	   When compressed formats are defined, they must be defined using the
1767	   reserved word "COMPRESSED".  Similarly uncompressed formats must be
1768	   defined using the reserved word "UNCOMPRESSED".  After each of these
1769	   keywords, a name may be given for the format.  If no name is given to
1770	   the format, the name of the format is empty.

1772	   Format names, except for the case where the name is empty, follow the
1773	   syntactic rules of identifiers as described in Section 4.2.

1775	   Format names must be unique within the scope of the encoding method
1776	   to which they belong, except for the empty name which may be used for
1777	   one "COMPRESSED" and one "UNCOMPRESSED" format.

1779	4.12.3.2.  Format Discrimination

1781	   Each of the compressed formats has its own field list.  A compressor
1782	   may pick any of these alternative formats to compress a header, as
1783	   long as the field bindings it employs can be used with the
1784	   uncompressed format.  For example, the compressor could not choose to
1785	   use a compressed format that had a "static" encoding for a field
1786	   whose "UVALUE" attribute differs from its corresponding value in the
1787	   context.

1789	   More formally, the compressor can choose any combination of an
1790	   uncompressed format and a compressed format for which no binding of
1791	   any field's attributes "fails", i.e., the encoding methods and
1792	   "ENFORCE" statements (see Section 4.9) which bind their compressed
1793	   attributes succeed.  If there are multiple successful combinations,
1794	   the compressor can choose any one.  If there are no successful
1795	   combinations, the encoding method "fails".  A format will
1796	   never fail due to it not defining an uncompressed attribute of a
1797	   field.  A format only fails if it fails to define one of the
1798	   compressed attributes of one of the fields in the compressed format.

1800	   Because the compressor has a choice, it must be possible for the
1801	   decompressor to discriminate between the different compressed formats
1802	   that the compressor could have chosen.  A simple approach to this
1803	   problem is for each compressed format to include a "discriminator"
1804	   that uniquely identifies that particular "COMPRESSED" format.  A
1805	   discriminator is a control field; it is not derived from any of the
1806	   uncompressed field values (see Section 4.11.2).

1808	4.12.3.3.  Example of Multiple Formats

1810	   Putting this all together, here is a complete example of the
1811	   definition of an encoding method with multiple compressed formats:

1813	     example_multiple_formats
1814	     {
1815	       UNCOMPRESSED {
1816	         field_1;  //  4 bits
1817	         field_2;  //  4 bits
1818	         field_3;  // 24 bits
1819	       }

1821	       DEFAULT {
1822	         field_1 =:= static;
1823	         field_2 =:= uncompressed_value(4, 2);
1824	         field_3 =:= lsb(4, 0);
1825	       }

1827	       COMPRESSED format0 {
1828	         discriminator =:= '0'; // 1 bit
1829	         field_3;               // 4 bits
1830	       }

1832	       COMPRESSED format1 {
1833	         discriminator =:= '1';           //  1 bit
1834	         field_1       =:= irregular(4);  //  4 bits
1835	         field_3       =:= irregular(24); // 24 bits
1836	       }
1837	     }

1839	   Note the following:

1841	   o  "field_1" and "field_3" both have default encoding methods
1842	      specified for them, which are used in "format0" but are
1843	      overridden in "format1"; the default encoding method of "field_2",
1844	      however, is not overridden.
1845	   o  "field_1" and "field_2" have default encoding methods which
1846	      compress to zero bits.  When these are used in "format0", the
1847	      field names do not appear in the field list.
1848	   o  "field_3" has an encoding method which does not compress to zero
1849	      bits, so whilst "field_3" has no encoding specified for it in the
1850	      field list of "format0", it still needs to appear in the field
1851	      list to specify where it goes in the compressed format.
1853	   o  In the example, all the fields in the uncompressed format have
1854	      default encoding methods specified for them, but this is not a
1855	      requirement.  Default encodings can be specified for only some or
1856	      even none of the fields of the uncompressed format.
1857	   o  In the example, all the default encoding methods are on fields
1858	      from the uncompressed format, but this is not a requirement.
1859	      Default encoding methods can also be specified for control fields.
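   A Python sketch of a compressor and decompressor for
   "example_multiple_formats" follows.  Bit strings stand in for packed
   bits, the context is a hypothetical dictionary, field values are
   assumed to fit their uncompressed widths, and updating the context
   after each packet is omitted.  Trying "format0" first is a compressor
   policy (it is the smaller format), not something the notation
   mandates.  The decompressor relies only on the discriminator.

```python
from typing import Dict, Optional, Tuple


def compress(f1: int, f2: int, f3: int, ctx: Dict[str, int]) -> Optional[str]:
    """Pick a compressed format for (field_1, field_2, field_3), or None."""
    if f2 != 2:
        return None   # default uncompressed_value(4, 2) fails in both formats
    ref = ctx["field_3"]
    # format0: field_1 =:= static (default), field_3 =:= lsb(4, 0) (default)
    if f1 == ctx["field_1"] and ref <= f3 <= ref + 15:
        return "0" + format(f3 & 0xF, "04b")                  # 5 bits
    # format1: field_1 =:= irregular(4), field_3 =:= irregular(24)
    return "1" + format(f1, "04b") + format(f3, "024b")       # 29 bits


def decompress(bits: str, ctx: Dict[str, int]) -> Tuple[int, int, int]:
    """Use the discriminator bit to select the format, then rebuild the fields."""
    if bits[0] == "0":                                 # discriminator '0'
        ref = ctx["field_3"]
        f3 = ref + ((int(bits[1:5], 2) - ref) & 0xF)
        return ctx["field_1"], 2, f3
    return int(bits[1:5], 2), 2, int(bits[5:29], 2)    # discriminator '1'


ctx = {"field_1": 5, "field_3": 1000}
for packet in [(5, 2, 1007), (6, 2, 1007), (5, 2, 999)]:
    bits = compress(*packet, ctx)
    print(packet, bits, len(bits), decompress(bits, ctx))
```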

1861	4.13.  Profile-specific Encoding Methods

1863	   The library of encoding methods defined by ROHC-FN in Section 4.11
1864	   provides a basic and generic set of field encoding methods.  When
1865	   using a ROHC-FN specification in a ROHC profile, some additional
1866	   encodings specific to the particular protocol header being compressed
1867	   may, however, be needed, such as methods that infer the value of a
1868	   field from other values.

1870	   These methods are specific to the properties of the protocol being
1871	   compressed and will thus have to be defined within the profile
1872	   specification itself.  Such profile-specific encoding methods,
1873	   defined either in ROHC-FN syntax or rigorously in plain text, can be
1874	   referred to in the ROHC-FN specification of the profile's formats in
1875	   the same way as any other method in the ROHC-FN library.

1877	   Encoding methods which are not defined in the formal notation are
1878	   specified by giving their name, followed by a short description of
1879	   where they are defined, in double quotes, and a semi-colon.

1881	   For example:

1883	     inferred_ip_v4_header_checksum "defined in RFCxxxx Section 6.4.1";
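   As an example of what such a method might specify, the decompressor
   can recompute the IPv4 header checksum [RFC791] from the other header
   fields instead of receiving it.  The Python sketch below computes the
   standard one's complement header checksum.  It only illustrates the
   idea of an inferred field and is not the (elided) definition referred
   to in the example above.

```python
def ipv4_header_checksum(header: bytes) -> int:
    """One's complement of the one's complement sum of the header's 16-bit
    words, computed with the checksum field (octets 10-11) set to zero."""
    if len(header) < 20 or len(header) % 4:
        raise ValueError("not a valid IPv4 header length")
    data = bytearray(header)
    data[10:12] = b"\x00\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                        # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


hdr = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(ipv4_header_checksum(hdr)))
```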

1885	5.  Security Considerations

1887	   This draft describes a formal notation similar to ABNF [RFC4234], and
1888	   hence is not believed to raise any security issues (note that ABNF
1889	   has a completely separate purpose from the ROHC formal notation).

1891	6.  IANA Considerations

1893	   This document has no actions for IANA.

1895	7.  Contributors

1897	   Richard Price did much of the foundational work on the formal
1898	   notation.  He authored the initial internet draft describing a formal
1899	   notation on which this document is based.

1901	   Kristofer Sandlund contributed to this work by applying new ideas to
1902	   the ROHC-TCP profile, by providing feedback and by helping to resolve
1903	   different issues during the entire development of the notation.

1905	   Carsten Bormann provided the translation of the formal notation
1906	   syntax using ABNF in Appendix A, and also contributed feedback
1907	   and reviews to validate the completeness and the correctness of the
1908	   notation.

1910	8.  Acknowledgements

1912	   A number of important concepts and ideas have been borrowed from ROHC
1913	   [RFC3095].

1915	   Thanks to Mark West, Eilert Brinkmann, Alan Ford and Lars-Erik
1916	   Jonsson for their contribution, reviews and feedback which led to
1917	   significant improvements to the readability, completeness and overall
1918	   quality of the notation.

1920	   Thanks to Stewart Sadler, Caroline Daniels, Alan Finney and David
1921	   Findlay for their reviews and comments.  Thanks to Rob Hancock and
1922	   Stephen McCann for early work on the formal notation.  The authors
1923	   would also like to thank Christian Schmidt, Qian Zhang, Hongbin Liao
1924	   and Max Riegel for their comments and valuable input.

1926	   Additional thanks: this document was reviewed during working group
1927	   last-call by committed reviewers Mark West, Carsten Bormann and Joe
1928	   Touch, as well as by Sally Floyd who provided a review at the request
1929	   of the Transport Area Directors.  Thanks also to Magnus Westerlund
1930	   for his feedback in preparation for the IESG review.

1932	9.  References

1934	9.1.  Normative References

1936	   [C90]      ISO/IEC, "ISO/IEC 9899:1990 Information technology --
1937	              Programming Language C", ISO 9899:1990, April 1990.

1939	   [I-D.ietf-rohc-rfc3095bis-framework]
1940	              Jonsson, L., "The RObust Header Compression (ROHC)
1941	              Framework", draft-ietf-rohc-rfc3095bis-framework-01 (work
1942	              in progress), July 2006.

1944	   [RFC2822]  Resnick, P., Ed., "STANDARD FOR THE FORMAT OF ARPA
1945	              INTERNET TEXT MESSAGES", RFC 2822, April 2001.

1947	   [RFC4234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
1948	              Specifications: ABNF", RFC 4234, October 2005.

1950	9.2.  Informative References

1952	   [RFC3095]  Bormann, C., Burmeister, C., Degermark, M., Fukushima, H.,
1953	              Hannu, H., Jonsson, L-E., Hakenberg, R., Koren, T., Le,
1954	              K., Liu, Z., Martensson, A., Miyazaki, A., Svanbro, K.,
1955	              Wiebke, T., Yoshimura, T., and H. Zheng, "RObust Header
1956	              Compression (ROHC): Framework and four profiles: RTP, UDP,
1957	              ESP, and uncompressed", RFC 3095, July 2001.

1959	   [RFC791]   University of Southern California, "DARPA INTERNET PROGRAM
1960	              PROTOCOL SPECIFICATION", RFC 791, September 1981.

1962	Appendix A.  Formal Syntax of ROHC-FN

1964	   This section gives a definition of the syntax of ROHC-FN in ABNF
1965	   [RFC4234], using "fnspec" as the start rule.
1966	   ; overall structure
1967	   fnspec     = S *(constdef S) [globctl S] 1*(methdef S)
1968	   constdef   = constname S "=" S expn S ";"
1969	   globctl    = CONTROL S formbody
1970	   methdef    = id S [parmlist S] "{" S 1*(formatdef S) "}"
1971	              / id S [parmlist S] STRQ *STRCHAR STRQ S ";"
1972	   parmlist   = "(" S id S *( "," S id S ) ")"
1973	   formatdef  = formhead S formbody
1974	   formhead   = UNCOMPRESSED [ 1*WS id ]
1975	              / COMPRESSED [ 1*WS id ]
1976	              / CONTROL / INITIAL / DEFAULT
1977	   formbody   = "{" S *((fielddef/enforcer) S) "}"
1978	   fielddef   = fieldgroup S ["=:=" S encspec S] [lenspec S] ";"
1979	   fieldgroup = fieldname *( S ":" S fieldname )
1980	   fieldname  = id
1981	   encspec    = "'" *("0"/"1") "'"
1982	              / id [ S "(" S expn S *( "," S expn S ) ")"]
1983	   lenspec    = "[" S expn S *("," S expn S) "]"
1984	   enforcer   = ENFORCE S "(" S expn S ")" S ";"
1985	   ; expressions
1986	   expn  = *(expnb S "||" S) expnb
1987	   expnb = *(expna S "&&" S) expna
1988	   expna = *(expn7 S ("=="/"!=") S) expn7
1989	   expn7 = *(expn6 S ("<"/"<="/">"/">=") S) expn6
1990	   expn6 = *(expn4 S ("+"/"-") S) expn4
1991	   expn4 = *(expn3 S ("*"/"/"/"%") S) expn3
1992	   expn3 = expn2 [S "^" S expn3]
1993	   expn2 = ["!" S] expn1
1994	   expn1 = expn0 / attref / constname / litval / id
1995	   expn0 = "(" S expn S ")" / VARIABLE
1996	   attref       = fieldnameref "." attname
1997	   fieldnameref = fieldname / THIS
1998	   attname      = ( U / C ) ( LENGTH / VALUE )
1999	   litval       = ["-"] "0b" 1*("0"/"1")
2000	                / ["-"] "0x" 1*(DIGIT/"a"/"b"/"c"/"d"/"e"/"f")
2001	                / ["-"] 1*DIGIT
2002	                / false / true

2004	   ; lexical categories
2005	   constname = UPCASE *(UPCASE / DIGIT / "_")
2006	   id        = ALPHA *(ALPHA / DIGIT / "_")
2007	   ALPHA     = %x41-5A / %x61-7A
2008	   UPCASE    = %x41-5A
2009	   DIGIT     = %x30-39
2010	   COMMENT   = "//" *(SP / HTAB / VCHAR) CRLF
2011	   SP        = %x20
2012	   HTAB      = %x09
2013	   VCHAR     = %x21-7E
2014	   CRLF      = %x0A / %x0D.0A
2015	   NL        = COMMENT / CRLF
2016	   WS        = SP / HTAB / NL
2017	   S         = *WS
2018	   STRCHAR   = SP / HTAB / %x21 / %x23-7E
2019	   STRQ      = %x22
2020	   ; case-sensitive literals
2021	   C            = %d67
2022	   COMPRESSED   = %d67.79.77.80.82.69.83.83.69.68
2023	   CONTROL      = %d67.79.78.84.82.79.76
2024	   DEFAULT      = %d68.69.70.65.85.76.84
2025	   ENFORCE      = %d69.78.70.79.82.67.69
2026	   INITIAL      = %d73.78.73.84.73.65.76
2027	   LENGTH       = %d76.69.78.71.84.72
2028	   THIS         = %d84.72.73.83
2029	   U            = %d85
2030	   UNCOMPRESSED = %d85.78.67.79.77.80.82.69.83.83.69.68
2031	   VALUE        = %d86.65.76.85.69
2032	   VARIABLE     = %d86.65.82.73.65.66.76.69
2033	   false        = %d102.97.108.115.101
2034	   true         = %d116.114.117.101
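
   As an illustration of how the lexical rules read, the "litval" and
   "constname" rules can be approximated with regular expressions.  The
   helpers below (LITVAL, CONSTNAME, is_litval, is_constname) are
   hypothetical and not part of ROHC-FN.  The sketch assumes the ABNF
   convention that quoted strings such as "0b", "0x" and "a" through "f"
   are case-insensitive, while "false" and "true", being defined with %d
   values, are case-sensitive.

```python
import re

# Hypothetical sketch of the "litval" rule.  ABNF quoted strings ("0b",
# "0x", "a".."f") are case-insensitive, hence the scoped (?i:...) groups;
# "false" and "true" are %d-defined and therefore case-sensitive.
LITVAL = re.compile(
    r"-?(?i:0b)[01]+"          # binary literal, optional leading minus
    r"|-?(?i:0x[0-9a-f]+)"     # hexadecimal literal
    r"|-?[0-9]+"               # decimal literal
    r"|false|true"             # boolean literals
)

# "constname": an upper-case letter followed by upper-case letters,
# digits or underscores.
CONSTNAME = re.compile(r"[A-Z][A-Z0-9_]*")


def is_litval(token: str) -> bool:
    """Return True if the whole token matches the litval rule."""
    return LITVAL.fullmatch(token) is not None


def is_constname(token: str) -> bool:
    """Return True if the whole token matches the constname rule."""
    return CONSTNAME.fullmatch(token) is not None
```

   Under this reading, "0X1F" is a valid literal but "TRUE" is not.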

2036	Appendix B.  Bit-level Worked Example

2038	   This section gives a worked example at the bit level, showing how a
2039	   simple ROHC-FN specification describes the compression of real data
2040	   from an imaginary protocol header.  The example used has been kept
2041	   fairly simple, whilst still aiming to illustrate some of the
2042	   intricacies that arise in the use of the notation.  In particular,
2043	   fields have been kept short to make it possible to read the binary
2044	   representation of the headers without too much difficulty.

2046	B.1.  Example Packet Format

2048	   Our imaginary header is just 16 bits long, and consists of the
2049	   following fields:

2051	   1.  version number -- 2 bits
2052	   2.  type -- 2 bits
2053	   3.  flow id -- 4 bits
2054	   4.  sequence number -- 4 bits
2055	   5.  flag bits -- 4 bits

2057	   So, for example, 0101000100010000 indicates a header with a version
2058	   number of one, a type of one, a flow id of one, a sequence number of
2059	   one, and all flag bits set to zero.

2061	   Here is an ASCII box notation diagram of the imaginary header:

2063	     0   1   2   3   4   5   6   7
2064	   +---+---+---+---+---+---+---+---+
2065	   |version| type  |    flow_id    |
2066	   +---+---+---+---+---+---+---+---+
2067	   |  sequence_no  |   flag_bits   |
2068	   +---+---+---+---+---+---+---+---+
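
   The field layout above can be written as a small table of widths.
   The sketch below is illustrative only: the names FIELDS, parse_header
   and build_header are hypothetical, and headers are represented as
   strings of '0' and '1' characters, as in the examples of this
   appendix.

```python
# Field layout of the imaginary header: (name, width in bits), in
# transmission order, most significant bit first.
FIELDS = [("version_no", 2), ("type", 2), ("flow_id", 4),
          ("sequence_no", 4), ("flag_bits", 4)]


def parse_header(bits: str) -> dict:
    """Split a 16-character bit string into named integer fields."""
    if len(bits) != sum(width for _, width in FIELDS):
        raise ValueError("expected a 16-bit header")
    fields, pos = {}, 0
    for name, width in FIELDS:
        fields[name] = int(bits[pos:pos + width], 2)
        pos += width
    return fields


def build_header(fields: dict) -> str:
    """Inverse of parse_header: concatenate the fields as bit strings."""
    return "".join(format(fields[name], f"0{width}b")
                   for name, width in FIELDS)
```

   Because the "initial_definition" format in Appendix B.2 encodes every
   field as "irregular", its compressed form is simply
   build_header(parse_header(h)), i.e. the original 16 bits.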

2070	B.2.  Initial Encoding

2072	   An initial definition based solely on the above information is:

2074	     eg_header
2075	     {
2076	       UNCOMPRESSED {
2077	         version_no   [ 2 ];
2078	         type         [ 2 ];
2079	         flow_id      [ 4 ];
2080	         sequence_no  [ 4 ];
2081	         flag_bits    [ 4 ];
2082	       }

2084	       COMPRESSED initial_definition {
2085	         version_no  =:= irregular(2);
2086	         type        =:= irregular(2);
2087	         flow_id     =:= irregular(4);
2088	         sequence_no =:= irregular(4);
2089	         flag_bits   =:= irregular(4);
2090	       }
2091	     }

2093	   This defines the format nicely, but doesn't actually offer any
2094	   compression.  If we use it to encode the above header, we get:

2096	     Uncompressed header: 0101000100010000
2097	     Compressed header:   0101000100010000

2099	   This is because we have stated that all fields are "irregular" --
2100	   i.e. we haven't specified anything about their behaviour.

2102	   Note that since we have only one compressed format and one
2103	   uncompressed format, it makes no difference whether the encoding
2104	   methods for each field are specified in the compressed or
2105	   uncompressed format.  It would make no difference at all if we wrote
2106	   the following instead:

2108	     eg_header
2109	     {
2110	       UNCOMPRESSED {
2111	         version_no  =:= irregular(2);
2112	         type        =:= irregular(2);
2113	         flow_id     =:= irregular(4);
2114	         sequence_no =:= irregular(4);
2115	         flag_bits   =:= irregular(4);
2116	       }

2118	       COMPRESSED initial_definition {
2119	         version_no   [ 2 ];
2120	         type         [ 2 ];
2121	         flow_id      [ 4 ];
2122	         sequence_no  [ 4 ];
2123	         flag_bits    [ 4 ];
2124	       }
2125	     }

2127	B.3.  Basic Compression

2129	   In order to achieve any compression, we need to notate more knowledge
2130	   about the header and its behaviour in a flow.  For example, we may
2131	   know the following facts about the header:

2133	   1.  version number -- indicates which version of the protocol this
2134	       is: always one for this version of the protocol.
2135	   2.  type -- may take any value.
2136	   3.  flow id -- may take any value.
2137	   4.  sequence number -- may take any value.
2138	   5.  flag bits -- contains three flags, a, b and c, each of which may
2139	       be set or clear, and a reserved flag bit, which is always clear
2140	       (i.e. zero).

2142	   We could notate this knowledge as follows:

2144	     eg_header
2145	     {
2146	       UNCOMPRESSED {
2147	         version_no     [ 2 ];
2148	         type           [ 2 ];
2149	         flow_id        [ 4 ];
2150	         sequence_no    [ 4 ];
2151	         abc_flag_bits  [ 3 ];
2152	         reserved_flag  [ 1 ];
2153	       }

2155	       COMPRESSED basic {
2156	         version_no    =:= uncompressed_value(2, 1)  [ 0 ];
2157	         type          =:= irregular(2)              [ 2 ];
2158	         flow_id       =:= irregular(4)              [ 4 ];
2159	         sequence_no   =:= irregular(4)              [ 4 ];
2160	         abc_flag_bits =:= irregular(3)              [ 3 ];
2161	         reserved_flag =:= uncompressed_value(1, 0)  [ 0 ];
2162	       }
2163	     }

2165	   Using this simple scheme, we have successfully encoded the fact that
2166	   one of the fields has a permanently fixed value of one, and therefore
2167	   contains no useful information.  We have also encoded the fact that
2168	   the final flag bit is always zero, which again contains no useful
2169	   information.  Both of these facts have been notated using the
2170	   "uncompressed_value" encoding method (see Section 4.11.1).

2172	   Using this new encoding on the above header, we get:

2174	     Uncompressed header: 0101000100010000
2175	     Compressed header:   0100010001000

2177	   which reduces the data to transmit by 3 bits out of 16 (roughly
2178	   20%).  However, this encoding does not exploit the relationship
2179	   between the value of a field in one packet and its value in
2180	   subsequent packets.  For example, every header in the following
2181	   sequence is compressed by the same amount despite their similarity:

2183	     Uncompressed header: 0101000100010000
2184	     Compressed header:   0100010001000

2186	     Uncompressed header: 0101000101000000
2187	     Compressed header:   0100010100000

2189	     Uncompressed header: 0110000101110000
2190	     Compressed header:   1000010111000
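
   The "basic" format can be mimicked by deleting the bits that
   "uncompressed_value" makes implicit.  The sketch below (hypothetical
   function names, bit-string representation) reproduces the three
   example outputs above.

```python
def compress_basic(header: str) -> str:
    """Hypothetical sketch of the "basic" format: omit version_no (always
    1, i.e. '01') and reserved_flag (always 0) and send the remaining 13
    bits (type, flow_id, sequence_no, abc_flag_bits) unchanged."""
    if len(header) != 16:
        raise ValueError("expected a 16-bit header")
    if header[:2] != "01" or header[15] != "0":
        # uncompressed_value() cannot represent any other value.
        raise ValueError("version_no must be 1 and reserved_flag must be 0")
    return header[2:15]


def decompress_basic(compressed: str) -> str:
    """Re-insert the two constant fields around the 13 transmitted bits."""
    if len(compressed) != 13:
        raise ValueError("expected a 13-bit compressed header")
    return "01" + compressed + "0"
```

   Three bits out of sixteen are saved, i.e. 18.75%, which is the
   "roughly 20%" quoted above.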

2192	B.4.  Inter-packet compression

2194	   The profile we have defined so far has not compressed the sequence
2195	   number or flow ID fields at all, since they can take any value.
2196	   However, the value of each of these fields in one header has a very
2197	   simple relationship to their values in previous headers:
2198	   o  the sequence number is unusual -- it increases by three each time,
2199	   o  the flow_id stays the same -- it always has the same value that it
2200	      did in the previous header in the flow,
2201	   o  the abc_flag_bits stay the same most of the time -- they usually
2202	      have the same value that they did in the previous header in the
2203	      flow.

2205	   An obvious way of notating this is as follows:

2207	     // This obvious encoding will not work (correct encoding below)
2208	     eg_header
2209	     {
2210	       UNCOMPRESSED {
2211	         version_no     [ 2 ];
2212	         type           [ 2 ];
2213	         flow_id        [ 4 ];
2214	         sequence_no    [ 4 ];
2215	         abc_flag_bits  [ 3 ];
2216	         reserved_flag  [ 1 ];
2217	       }

2219	       COMPRESSED obvious {
2220	         version_no    =:= uncompressed_value(2, 1);
2221	         type          =:= irregular(2);
2222	         flow_id       =:= static;
2223	         sequence_no   =:= lsb(0, -3);
2224	         abc_flag_bits =:= irregular(3);
2225	         reserved_flag =:= uncompressed_value(1, 0);
2226	       }
2227	     }

2229	   The dependency on previous packets is notated using the "static" and
2230	   "lsb" encoding methods (see Section 4.11.4 and Section 4.11.5
2231	   respectively).  However there are a few problems with the above
2232	   notation.

2234	   Firstly, and most importantly, the "flow_id" field is notated as
2235	   "static", which means that it doesn't change from packet to packet.
2236	   However, the notation does not indicate how to communicate the value
2237	   of the field initially.  There is no point saying "it's the same
2238	   value as last time" if there has not been a first time where we
2239	   define what that value is, so that it can be referred back to.  The
2240	   above notation provides no way of communicating that.  Similarly with
2241	   the sequence number -- there needs to be a way of communicating its
2242	   initial value.  In fact, except for the explicit notation indicating
2243	   their lengths, even the lengths of these two fields would be left
2244	   undefined.  Appendix B.5 and Appendix B.6 address this problem.

2246	   Secondly, the sequence number field is communicated very efficiently
2247	   in zero bits, but it is not at all robust against packet loss.  If a
2248	   packet is lost then there is no way to handle the missing sequence
2249	   number.  When communicating sequence numbers, or any other field
2250	   encoded with LSB encoding, a very important consideration for the
2251	   notator is how robust against packet loss the compressed protocol
2252	   should be.  This will vary a lot from protocol stack to protocol
2253	   stack.  For the example protocol we'll assume short, low overhead
2254	   flows and say we need to be robust to the loss of just one packet,
2255	   which we can achieve with two bits of LSB encoding (one bit isn't
2256	   enough since the sequence number increases by three each time, see
2257	   Section 4.11.5).  This will be solved below in Appendix B.5.
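
   The robustness argument can be checked numerically.  The sketch below
   assumes the "lsb" interpretation interval described in Section
   4.11.5, namely [ref - offset, ref - offset + 2^k - 1], where ref is
   the value held in the context and k is the number of transmitted
   bits, with 4-bit wrap-around because "sequence_no" is 4 bits wide.
   The function names are hypothetical.

```python
FIELD_BITS = 4  # sequence_no is a 4-bit field, so values wrap modulo 16


def lsb_interval(ref: int, k: int, offset: int) -> list:
    """Values that k LSBs can stand for, given context value ref:
    [ref - offset, ref - offset + 2**k - 1], modulo 2**FIELD_BITS."""
    low = ref - offset
    return [(low + i) % (1 << FIELD_BITS) for i in range(1 << k)]


def lsb_encode(value: int, ref: int, k: int, offset: int) -> int:
    """Return the k least significant bits of value, provided value lies
    in the interval around ref; otherwise this encoding cannot be used."""
    if value not in lsb_interval(ref, k, offset):
        raise ValueError("value outside the interpretation interval")
    return value % (1 << k)


def lsb_decode(lsbs: int, ref: int, k: int, offset: int) -> int:
    """Pick the unique value in the interval whose k LSBs equal lsbs."""
    for candidate in lsb_interval(ref, k, offset):
        if candidate % (1 << k) == lsbs:
            return candidate
    raise ValueError("no value in the interval matches")
```

   For the sequence 1, 4, 7 with the header carrying 4 lost, two bits
   with offset -3 give the interval [4, 7] around ref = 1, so 7 is still
   recovered.  With one bit the interval shrinks to [4, 5] and 7 cannot
   be decoded.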

2259	   Finally, although the flag bits are usually the same as in the
2260	   previous header in the flow, the profile doesn't make any use of this
2261	   fact; since they are sometimes not the same as those in the previous
2262	   header, it is not safe to say that they are always the same, so
2263	   "static" encoding can't be used exclusively.  This problem will be
2264	   solved later through the use of multiple formats in Appendix B.6.

2266	B.5.  Specifying Initial Values

2268	   To communicate initial values for fields compressed with a
2269	   context-dependent encoding such as "static" or "lsb", we use an
2270	   "INITIAL" field list.  This can help with fields whose start value
2271	   is fixed and known.  For example, if we knew that at the start of
2272	   the flow, "flow_id" would always be 1 and "sequence_no" would
2273	   always be 0, we could notate that like this:

2275	     // This encoding will not work either (correct encoding below)
2276	     eg_header
2277	     {
2278	       UNCOMPRESSED {
2279	         version_no     [ 2 ];
2280	         type           [ 2 ];
2281	         flow_id        [ 4 ];
2282	         sequence_no    [ 4 ];
2283	         abc_flag_bits  [ 3 ];
2284	         reserved_flag  [ 1 ];
2285	       }

2287	       INITIAL {
2288	         // set initial values of fields before flow starts
2289	         flow_id     =:= uncompressed_value(4, 1);
2290	         sequence_no =:= uncompressed_value(4, 0);
2291	       }

2293	       COMPRESSED obvious {
2294	         version_no    =:= uncompressed_value(2, 1);
2295	         type          =:= irregular(2);
2296	         flow_id       =:= static;
2297	         sequence_no   =:= lsb(2, -3);
2298	         abc_flag_bits =:= irregular(3);
2299	         reserved_flag =:= uncompressed_value(1, 0);
2300	       }
2301	     }

2303	   However, this use of "INITIAL" does not work, since the initial
2304	   values of both "flow_id" and "sequence_no" vary from flow to flow.
2305	   "INITIAL" is only applicable where the initial value of a field is
2306	   fixed, as is often the case with control fields.

2308	B.6.  Multiple Packet Formats

2310	   To communicate initial values for the sequence number and flow ID
2311	   fields correctly, and to take advantage of the fact that the flag
2312	   bits are usually the same as in the previous header, we need to
2313	   depart from the single format encoding we are currently using and
2314	   instead use multiple formats.  Here, we have expressed the encodings
2315	   for two of the fields in the uncompressed format, since they will
2316	   always be true for uncompressed headers of that format.  The
2317	   remaining fields, whose encoding method may depend on exactly how the
2318	   header is being compressed, have their encodings specified in the
2319	   compressed formats.

2321	     eg_header
2322	     {
2323	       UNCOMPRESSED {
2324	         version_no    =:= uncompressed_value(2, 1) [ 2 ];
2325	         type                                       [ 2 ];
2326	         flow_id                                    [ 4 ];
2327	         sequence_no                                [ 4 ];
2328	         abc_flag_bits                              [ 3 ];
2329	         reserved_flag =:= uncompressed_value(1, 0) [ 1 ];
2330	       }

2332	       COMPRESSED irregular_format {
2333	         discriminator =:= '0'          [ 1 ];
2334	         version_no                     [ 0 ];
2335	         type          =:= irregular(2) [ 2 ];
2336	         flow_id       =:= irregular(4) [ 4 ];
2337	         sequence_no   =:= irregular(4) [ 4 ];
2338	         abc_flag_bits =:= irregular(3) [ 3 ];
2339	         reserved_flag                  [ 0 ];
2340	       }

2342	       COMPRESSED compressed_format {
2343	         discriminator =:= '1'          [ 1 ];
2344	         version_no                     [ 0 ];
2345	         type          =:= irregular(2) [ 2 ];
2346	         flow_id       =:= static       [ 0 ];
2347	         sequence_no   =:= lsb(2, -3)   [ 2 ];
2348	         abc_flag_bits =:= static       [ 0 ];
2349	         reserved_flag                  [ 0 ];
2350	       }
2351	     }

2353	   Note that we have had to add a discriminator field, so that the
2354	   decompressor can tell which format has been used by the compressor.
2355	   The format with a "static" flow ID and an "lsb"-encoded sequence
2356	   number is now 5 bits long.  Note that despite having to add the
2357	   discriminator field, this format is still the same size as the
2358	   original incorrect naive notation, because this notation takes
2359	   advantage of the fact that the abc flag bits rarely change.

2361	   However, the original format (with an "irregular" flow ID and
2362	   sequence number) has also grown by one bit due to the addition of the
2363	   discriminator.  An important consideration when creating multiple
2364	   formats is whether each format occurs frequently enough that the
2365	   average compressed header length is shorter as a result of its usage.
2366	   For example, if in fact the flag bits always changed between packets,
2367	   the "static" encoding could never be used; all we would have achieved
2368	   is to lengthen the "irregular" format by one bit.

2370	   Using the above notation, we now get:

2372	     Uncompressed header: 0101000100010000
2373	     Compressed header:   00100010001000

2375	     Uncompressed header: 0101000101000000
2376	     Compressed header:   10100 ; 00100010100000

2378	     Uncompressed header: 0110000101110000
2379	     Compressed header:   11011 ; 01000010111000

2381	   The first header in the stream is compressed the same way as before,
2382	   except that it now has the extra 1 bit discriminator at the start
2383	   (0).  When a second header arrives, with the same flow ID as the
2384	   first and its sequence number three higher, it can now be compressed
2385	   in two possible ways, either using "compressed_format" or in the same
2386	   way as previously, using "irregular_format".

2388	   Note that we show all theoretically possible encodings of a header as
2389	   defined by the ROHC-FN specification, separated by semi-colons.
2390	   Either of the above encodings for each header could be produced by a
2391	   valid implementation, although a good implementation would always aim
2392	   to pick the encoding which led to the best compression.  A good
2393	   implementation would also take robustness into account and so
2394	   probably wouldn't assume on the second packet that the decompressor
2395	   had available the context necessary to decompress the shorter form of
2396	   the packet.
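
   To make the choice between the two formats concrete, the following
   sketch enumerates every valid encoding of a header, mirroring the
   semicolon-separated lists above.  It simplifies under stated
   assumptions: the context is taken to be the whole previous header,
   "static" means "equal to the value in the context", and lsb(2, -3)
   accepts values in [ref + 3, ref + 6] modulo 16.  The names are
   hypothetical.

```python
from typing import Optional


def _fields(header: str) -> dict:
    """Split a 16-bit header string into the fields used by the formats."""
    return {"version": header[0:2], "type": header[2:4],
            "flow_id": header[4:8], "seq": int(header[8:12], 2),
            "abc": header[12:15], "reserved": header[15]}


def valid_encodings(header: str, context: Optional[str] = None) -> list:
    """All encodings allowed by the two-format eg_header, shortest first."""
    f = _fields(header)
    if f["version"] != "01" or f["reserved"] != "0":
        raise ValueError("header not representable by eg_header")
    encodings = []
    if context is not None:
        c = _fields(context)
        in_window = (f["seq"] - c["seq"] - 3) % 16 < 4  # lsb(2, -3)
        if f["flow_id"] == c["flow_id"] and f["abc"] == c["abc"] and in_window:
            # compressed_format: '1', type, 2 LSBs of sequence_no
            encodings.append("1" + f["type"] + format(f["seq"] % 4, "02b"))
    # irregular_format: '0', then type, flow_id, sequence_no, abc_flag_bits
    encodings.append("0" + header[2:15])
    return encodings
```

   A real compressor would also weigh whether the decompressor is likely
   to hold the required context, as discussed above, before picking the
   shorter form.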

2398	   Finally, note that the fields whose encoding methods are specified in
2399	   the uncompressed format have zero length when compressed.  This means
2400	   their position in the compressed format is not significant.  In this
2401	   case there is no need to notate them when defining the compressed
2402	   formats.  In the next part of the example we will see that they have
2403	   been removed from the compressed formats altogether.

2405	B.7.  Variable Length Discriminators

2407	   Suppose we do some analysis on flows of our example protocol and
2408	   discover that whilst it is usual for successive packets to have the
2409	   same flags, on the occasions when they don't, the packet is almost
2410	   always a "flags set" packet in which all three of the abc flags are
2411	   set.  To encode the flow more efficiently, a new format needs to be
2412	   written to reflect this.

2414	   This now gives a total of three formats, which means we need three
2415	   discriminators to differentiate between them.  The obvious solution
2416	   here is to increase the number of bits in the discriminator from one
2417	   to two and for example use discriminators 00, 01, and 10.  However we
2418	   can do slightly better than this.

2420	   Any uniquely identifiable discriminator will suffice, so we can use
2421	   00, 01 and 1.  If the discriminator starts with 1, that's the whole
2422	   thing.  If it starts with 0 the decompressor knows it has to check
2423	   one more bit to determine the kind of format.

2425	   Note that care must be taken when using variable length
2426	   discriminators.  For example, it would be erroneous to use 0, 01 and
2427	   10 as discriminators since after reading an initial 0, the
2428	   decompressor would have no way of knowing if the next bit was a
2429	   second bit of discriminator, or the first bit of the next field in
2430	   the format.  However, 0, 10 and 11 would be correct, as the first bit
2431	   again indicates whether or not there are further discriminator bits
2432	   to follow.
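
   The rule illustrated above is that no discriminator may be a prefix
   of another, i.e. the discriminators form a prefix-free code.  A quick
   check of that property, using a hypothetical helper, could look like
   this:

```python
from itertools import permutations


def is_prefix_free(discriminators: list) -> bool:
    """True if the discriminators are distinct and none of them is a
    prefix of another, so a decoder reading bit by bit can always tell
    where the discriminator ends."""
    if len(set(discriminators)) != len(discriminators):
        return False
    return not any(b.startswith(a)
                   for a, b in permutations(discriminators, 2))
```

   This checks only the discriminator bits themselves; for example,
   is_prefix_free(["0", "01", "10"]) is false, while
   is_prefix_free(["00", "01", "1"]) is true.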

2434	   This gives us the following:

2436	     eg_header
2437	     {
2438	       UNCOMPRESSED {
2439	         version_no    =:= uncompressed_value(2, 1) [ 2 ];
2440	         type                                       [ 2 ];
2441	         flow_id                                    [ 4 ];
2442	         sequence_no                                [ 4 ];
2443	         abc_flag_bits                              [ 3 ];
2444	         reserved_flag =:= uncompressed_value(1, 0) [ 1 ];
2445	       }

2447	       COMPRESSED irregular_format {
2448	         discriminator =:= '00'         [ 2 ];
2449	         type          =:= irregular(2) [ 2 ];
2450	         flow_id       =:= irregular(4) [ 4 ];
2451	         sequence_no   =:= irregular(4) [ 4 ];
2452	         abc_flag_bits =:= irregular(3) [ 3 ];
2453	       }

2455	       COMPRESSED flags_set {
2456	         discriminator =:= '01'                     [ 2 ];
2457	         type          =:= irregular(2)             [ 2 ];
2458	         flow_id       =:= static                   [ 0 ];
2459	         sequence_no   =:= lsb(2, -3)               [ 2 ];
2460	         abc_flag_bits =:= uncompressed_value(3, 7) [ 0 ];
2461	       }

2463	       COMPRESSED flags_static {
2464	         discriminator =:= '1'          [ 1 ];
2465	         type          =:= irregular(2) [ 2 ];
2466	         flow_id       =:= static       [ 0 ];
2467	         sequence_no   =:= lsb(2, -3)   [ 2 ];
2468	         abc_flag_bits =:= static       [ 0 ];
2469	       }
2470	     }

2472	   Here is some example output:

2474	     Uncompressed header: 0101000100010000
2475	     Compressed header:   000100010001000

2477	     Uncompressed header: 0101000101000000
2478	     Compressed header:   10100 ; 000100010100000

2480	     Uncompressed header: 0110000101110000
2481	     Compressed header:   11011 ; 001000010111000

2483	     Uncompressed header: 0111000110101110
2484	     Compressed header:   011110 ; 001100011010111

2486	   Here we have a very similar sequence to last time, except that there
2487	   is now an extra message on the end which has the flag bits set.  The
2488	   encoding for the first message in the stream is now one bit larger.
2489	   The encoding for the next two messages is the same as before, since
2490	   that format has not grown, thanks to the use of variable-length
2491	   discriminators.  Finally, the packet that comes through with all the
2492	   flag bits set can be encoded in just six bits, only one bit more than
2493	   the most common format.  Without the extra format, this last packet
2494	   would have to be encoded using the longest format and would have
2495	   taken up 14 bits.
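
   The decompressor side of the variable-length discriminators can be
   sketched as follows.  It reads the first bit and, only if that bit is
   0, a second one.  It then rebuilds the 16-bit header from the
   compressed bits and the context.  The assumptions are the same as
   before: the context is the last reconstructed header, "static" fields
   are copied from it, and lsb(2, -3) selects the value in
   [ref + 3, ref + 6] modulo 16.  The function names are hypothetical.

```python
def _decode_lsb2(lsbs: int, ref: int) -> int:
    """lsb(2, -3): the value in [ref + 3, ref + 6] (mod 16) with these LSBs."""
    for step in range(4):
        candidate = (ref + 3 + step) % 16
        if candidate % 4 == lsbs:
            return candidate
    raise ValueError("lsbs must be in the range 0..3")


def decompress(packet: str, context: str) -> str:
    """Rebuild a 16-bit header from a B.7 compressed header and the
    context (the previously reconstructed header)."""
    # Variable-length discriminator: '1', else look at the second bit.
    if packet.startswith("1"):
        name, expected_len = "flags_static", 5
    elif packet.startswith("01"):
        name, expected_len = "flags_set", 6
    else:
        name, expected_len = "irregular_format", 15
    if len(packet) != expected_len:
        raise ValueError("packet length does not match " + name)

    flow_id, seq, abc = context[4:8], int(context[8:12], 2), context[12:15]
    if name == "flags_static":
        type_bits = packet[1:3]
        seq = _decode_lsb2(int(packet[3:5], 2), seq)
    elif name == "flags_set":
        type_bits = packet[2:4]
        seq = _decode_lsb2(int(packet[4:6], 2), seq)
        abc = "111"                       # uncompressed_value(3, 7)
    else:
        type_bits = packet[2:4]
        flow_id, seq, abc = packet[4:8], int(packet[8:12], 2), packet[12:15]
    # version_no is always 1 and reserved_flag always 0.
    return "01" + type_bits + flow_id + format(seq, "04b") + abc + "0"
```

   Feeding the four example outputs through decompress(), each with the
   previous header as context, yields the original headers.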

2497	B.8.  Default encoding

2499	   Some of the common encoding methods used so far have been "factored
2500	   out" into the definition of the uncompressed format, so that they
2501	   don't need to be defined for every compressed format.  However, there
2502	   is still some redundancy in the notation.  For a number of fields,
2503	   the same encoding method is used several times in different formats
2504	   (though not necessarily in all of them), but the field encoding is
2505	   redefined explicitly each time.  If the encoding for any of these
2506	   fields changed in the future (e.g. if the reserved flag took on some
2507	   new role), then every format which uses that encoding would have to
2508	   be modified to reflect this change.

2510	   This problem can be avoided by specifying default encoding methods
2511	   for these fields.  Doing so can also lead to a more concisely notated
2512	   profile:

2514	     eg_header
2515	     {
2516	       UNCOMPRESSED {
2517	         version_no    =:= uncompressed_value(2, 1) [ 2 ];
2518	         type                                       [ 2 ];
2519	         flow_id                                    [ 4 ];
2520	         sequence_no                                [ 4 ];
2521	         abc_flag_bits                              [ 3 ];
2522	         reserved_flag =:= uncompressed_value(1, 0) [ 1 ];
2523	       }

2525	       DEFAULT {
2526	         type          =:= irregular(2);
2527	         flow_id       =:= static;
2528	         sequence_no   =:= lsb(2, -3);
2529	       }

2531	       COMPRESSED irregular_format {
2532	         discriminator =:= '00'         [ 2 ];
2533	         type                           [ 2 ]; // Uses default
2534	         flow_id       =:= irregular(4) [ 4 ]; // Overrides default
2535	         sequence_no   =:= irregular(4) [ 4 ]; // Overrides default
2536	         abc_flag_bits =:= irregular(3) [ 3 ];
2537	       }

2539	       COMPRESSED flags_set {
2540	         discriminator =:= '01' [ 2 ];
2541	         type                   [ 2 ]; // Uses default
2542	         sequence_no            [ 2 ]; // Uses default
2543	         abc_flag_bits =:= uncompressed_value(3, 7);
2544	       }

2546	       COMPRESSED flags_static {
2547	         discriminator =:= '1' [ 1 ];
2548	         type                  [ 2 ]; // Uses default
2549	         sequence_no           [ 2 ]; // Uses default
2550	         abc_flag_bits =:= static;
2551	       }
2552	     }

2554	   The above profile behaves in exactly the same way as the one notated
2555	   previously, since it has the same meaning.  Note that the purpose
2556	   behind the different formats becomes clearer with the default
2557	   encoding methods factored out: all that remains are the encodings
2558	   which are specific to each format.  Note also that default encoding
2559	   methods which compress down to zero bits have become completely
2560	   implicit.  For example the compressed formats using the default
2561	   encoding for "flow_id" don't mention it (the default is "static"
2562	   encoding which compresses to zero bits).
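
   One way to picture the meaning of "DEFAULT" is as a per-field
   fallback.  A field's encoding in a given compressed format is the one
   written in that format if present, and the "DEFAULT" one otherwise.
   Discriminators and fields encoded in the uncompressed format are left
   out here.  The hypothetical sketch below treats encoding names as
   plain strings, resolves the effective encodings of this profile, and
   shows that they match those written out explicitly in Appendix B.7.

```python
# Encodings from the DEFAULT section of the B.8 profile.
DEFAULTS = {
    "type": "irregular(2)",
    "flow_id": "static",
    "sequence_no": "lsb(2, -3)",
}

# Encodings written explicitly inside each COMPRESSED format of B.8.
FORMATS = {
    "irregular_format": {"flow_id": "irregular(4)",
                         "sequence_no": "irregular(4)",
                         "abc_flag_bits": "irregular(3)"},
    "flags_set": {"abc_flag_bits": "uncompressed_value(3, 7)"},
    "flags_static": {"abc_flag_bits": "static"},
}


def effective_encodings(format_name: str) -> dict:
    """Format-specific encodings override the DEFAULT ones field by field."""
    return {**DEFAULTS, **FORMATS[format_name]}
```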

2564	B.9.  Control Fields

2566	   One inefficiency in the compression scheme we have produced thus far
2567	   is that it uses two bits to provide the LSB encoded sequence number
2568	   with robustness for the loss of just one packet.  In theory, only one
2569	   bit should be needed.  The root of the problem is the unusual
2570	   sequence number that the protocol uses -- it counts up in increments
2571	   of three.  In order to encode it at maximum efficiency we need to
2572	   translate this into a field that increments by one each time.  We do
2573	   this using a control field.

2575	   A control field is extra data that is communicated in the compressed
2576	   format, but which is not a direct encoding of part of the
2577	   uncompressed header.  Control fields can be used to communicate extra
2578	   information in the compressed format, that allows other fields to be
2579	   compressed more efficiently.

2581	   The control field which we introduce scales the sequence number down
2582	   by a factor of three.  Instead of encoding the original sequence
2583	   number in the compressed packet, we encode the scaled sequence
2584	   number, allowing us to have robustness to the loss of one packet by
2585	   using just one bit of LSB encoding:

2587	     eg_header
2588	     {
2589	       UNCOMPRESSED {
2590	         version_no    =:= uncompressed_value(2, 1) [ 2 ];
2591	         type                                       [ 2 ];
2592	         flow_id                                    [ 4 ];
2593	         sequence_no                                [ 4 ];
2594	         abc_flag_bits                              [ 3 ];
2595	         reserved_flag =:= uncompressed_value(1, 0) [ 1 ];
2596	       }

2598	       CONTROL {
2599	         // need modulo maths to calculate scaling correctly,
2600	         // due to 4 bit wrap around
2601	         scaled_seq_no   [ 4 ];
2602	         ENFORCE(sequence_no.UVALUE
2603	                   == (scaled_seq_no.UVALUE * 3) % 16);
2604	       }

2606	       DEFAULT {
2607	         type          =:= irregular(2);
2608	         flow_id       =:= static;
2609	         scaled_seq_no =:= lsb(1, -1);
2610	       }

2612	       COMPRESSED irregular_format {
2613	         discriminator =:= '00'         [ 2 ];
2614	         type                           [ 2 ];
2615	         flow_id       =:= irregular(4) [ 4 ];
2616	         scaled_seq_no =:= irregular(4) [ 4 ]; // Overrides default
2617	         abc_flag_bits =:= irregular(3) [ 3 ];
2618	       }

2620	       COMPRESSED flags_set {
2621	         discriminator =:= '01' [ 2 ];
2622	         type                   [ 2 ];
2623	         scaled_seq_no          [ 1 ]; // Uses default
2624	         abc_flag_bits =:= uncompressed_value(3, 7);
2625	       }

2627	       COMPRESSED flags_static {
2628	         discriminator =:= '1' [ 1 ];
2629	         type                  [ 2 ];
2630	         scaled_seq_no         [ 1 ]; // Uses default
2631	         abc_flag_bits =:= static;
2632	       }
2633	     }

2635	   Normally, the encoding method(s) used to encode a field specify the
2636	   length of the field.  In the above notation, since no encoding
2637	   method uses "sequence_no" directly, its length needs to be defined
2638	   explicitly using an "ENFORCE" statement.  This is done with the
2639	   abbreviated "[ 4 ]" syntax in the uncompressed format, both for
2640	   consistency and for readability.  Note that this is unusual: whereas
2641	   most field length indications are redundant (and thus optional),
2642	   this one is not.  If it were removed from the above notation, the
2643	   length of the "sequence_no" field would be undefined.

2645	   Here is some example output:

2647	     Uncompressed header: 0101000100010000
2648	     Compressed header:   000100011011000

2650	     Uncompressed header: 0101000101000000
2651	     Compressed header:   1010 ; 000100011100000

2653	     Uncompressed header: 0110000101110000
2654	     Compressed header:   1101 ; 001000011101000

2656	     Uncompressed header: 0111000110101110
2657	     Compressed header:   01110 ; 001100011110111

2659	   This form saves a further bit in most packets.  Assuming the bulk
2660	   of a flow is made up of "flags_static" headers (four bits each,
2661	   compared with 16 bits uncompressed), the mean size of the headers
2662	   in a compressed flow is now just over a quarter of their size in an
2663	   uncompressed flow.
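   The example output above can be reproduced with a short Python
   sketch.  It is a hypothetical model of this particular "eg_header"
   definition, not a general ROHC-FN compressor.  It makes three
   assumptions:

   o  The previous header is the only context.

   o  "lsb(1, -1)" accepts a new "scaled_seq_no" one or two above the
      reference value, with the difference assumed to be taken modulo
      16.

   o  The usable encodings are listed smallest first, with
      "irregular_format" always last.

```python
from typing import List, Optional


def parse(bits: str) -> dict:
    """Split a 16-bit uncompressed eg_header into the fields used here."""
    return {
        "type": bits[2:4],                  # version_no is bits[0:2]
        "flow_id": bits[4:8],
        "sequence_no": int(bits[8:12], 2),
        "abc_flag_bits": bits[12:15],       # reserved_flag is bits[15]
    }


def scaled(sequence_no: int) -> int:
    """scaled_seq_no such that sequence_no == (scaled_seq_no * 3) % 16."""
    return (sequence_no * 11) % 16          # 11 is the inverse of 3 mod 16


def encodings(bits: str, prev: Optional[dict]) -> List[str]:
    """Usable compressed headers for one packet, smallest first."""
    f = parse(bits)
    s = scaled(f["sequence_no"])
    irregular = ("00" + f["type"] + f["flow_id"]
                 + format(s, "04b") + f["abc_flag_bits"])
    candidates = []
    # flow_id =:= static (the default) needs a context with the same flow_id.
    if prev is not None and f["flow_id"] == prev["flow_id"]:
        # Assumed lsb(1, -1) interval: reference + 1 or + 2, modulo 16.
        if (s - scaled(prev["sequence_no"])) % 16 in (1, 2):
            lsb = str(s & 1)
            if f["abc_flag_bits"] == prev["abc_flag_bits"]:
                candidates.append("1" + f["type"] + lsb)    # flags_static
            if f["abc_flag_bits"] == "111":
                candidates.append("01" + f["type"] + lsb)   # flags_set
    candidates.sort(key=len)
    return candidates + [irregular]                         # irregular_format


headers = [
    "0101000100010000",
    "0101000101000000",
    "0110000101110000",
    "0111000110101110",
]

prev = None
for h in headers:
    print(" ; ".join(encodings(h, prev)))
    prev = parse(h)
```

   Consider a long flow of N such packets in which every packet after
   the first uses "flags_static".  It needs 15 + 4(N - 1) bits, compared
   with 16N bits uncompressed.  This ratio is just over a quarter and
   approaches it as N grows.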

2665	B.10.  Use Of "ENFORCE" Statements As Conditionals

2667	   Earlier, we created a new format "flags_set" to handle packets with
2668	   all three of the flag bits set.  As it happens, these three flags are
2669	   always all set for "type 3" packets, and are never all set for other
2670	   packet types (a "type 3" packet is one where the type field is set to
2671	   three).

2673	   This allows extra efficiency in encoding such packets.  We know the
2674	   type is three, so we do not need to encode the type field in the
2675	   compressed header.  The type field was previously encoded as
2676	   "irregular(2)", which is two bits long.  Removing it reduces the
2677	   size of the "flags_set" format from five bits to three, making it the
2678	   smallest format in the encoding method definition.
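   The size arithmetic can be checked by summing the compressed field
   lengths of each format.  The earlier lengths come from the "[ n ]"
   annotations in the earlier notation.  The later lengths assume only
   that the type field is no longer sent in "flags_set".  Fields sent in
   zero bits, such as the default "static" "flow_id", are not listed.
   This is an illustrative Python sketch with hypothetical names.

```python
# Compressed field lengths in bits for each format.
old_fields = {
    # discriminator, type, flow_id, scaled_seq_no, abc_flag_bits
    "irregular_format": [2, 2, 4, 4, 3],
    # discriminator, type, scaled_seq_no
    "flags_set": [2, 2, 1],
    # discriminator, type, scaled_seq_no
    "flags_static": [1, 2, 1],
}

# Once the type is known to be three, flags_set sends it in zero bits.
new_fields = dict(old_fields, flags_set=[2, 0, 1])


def sizes(fields: dict) -> dict:
    """Total compressed size of each format, in bits."""
    return {fmt: sum(lengths) for fmt, lengths in fields.items()}


for label, fields in (("old", old_fields), ("new", new_fields)):
    s = sizes(fields)
    print(label, s, "smallest:", min(s, key=s.get))
```

   With these lengths, "flags_static" (four bits) is the smallest format
   before the change.  After it, "flags_set" (three bits) is the
   smallest.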

2680	   To indicate that the "flags_set" format should only be used for
2681	   "type 3" headers, and the "flags_static" format only when the type
2682	   is not three, these conditions must be stated inside each format.
2683	   This can be done with an "ENFORCE" statement:

2685	     eg_header
2686	     {
2687	       UNCOMPRESSED {
2688	         version_no    =:= uncompressed_value(2, 1) [ 2 ];
2689	         type                                       [ 2 ];
2690	         flow_id                                    [ 4 ];
2691	         sequence_no                                [ 4 ];
2692	         abc_flag_bits                              [ 3 ];
2693	         reserved_flag =:= uncompressed_value(1, 0) [ 1 ];
2694	       }

2696	       CONTROL {
2697	         // need modulo maths to calculate scaling correctly,
2698	         // due to 4-bit wrap around
2699	         scaled_seq_no   [ 4 ];
2700	         ENFORCE(sequence_no.UVALUE
2701	                   == (scaled_seq_no.UVALUE * 3) % 16);
2702	       }

2704	       DEFAULT {
2705	         type          =:= irregular(2);
2706	         scaled_seq_no =:= lsb(1, -1);
2707	         flow_id       =:= static;
2708	       }

2710	       COMPRESSED irregular_format {
2711	         discriminator =:= '00'         [ 2 ];
2712	         type                           [ 2 ];
2713	         flow_id       =:= irregular(4) [ 4 ];
2714	         scaled_seq_no =:= irregular(4) [ 4 ];
2715	         abc_flag_bits =:= irregular(3) [ 3 ];
2716	       }

2718	       COMPRESSED flags_set {
2719	         ENFORCE(type.UVALUE == 3); // redundant condition
2720	         discriminator =:= '01'                      [ 2 ];
2721	         type          =:= uncompressed_value(2, 3)  [ 0 ];
2722	         scaled_seq_no                               [ 1 ];
2723	         abc_flag_bits =:= uncompressed_value(3, 7)  [ 0 ];
2724	       }

2726	       COMPRESSED flags_static {
2727	         ENFORCE(type.UVALUE != 3);
2728	         discriminator =:= '1'    [ 1 ];
2729	         type                     [ 2 ];
2730	         scaled_seq_no            [ 1 ];
2731	         abc_flag_bits =:= static [ 0 ];
2732	       }
2733	     }

2735	   The two "ENFORCE" statements in the last two formats act as "guards".
2736	   Guards prevent formats from being used under the wrong circumstances.
2737	   In fact, the "ENFORCE" statement in "flags_set" is redundant.  The
2738	   condition it guards for is already enforced by the new encoding
2739	   method used for the "type" field.  The encoding method
2740	   "uncompressed_value(2, 3)" binds the "UVALUE" attribute to three.
2741	   This is exactly what the "ENFORCE" statement does, so it can be
2742	   removed without any change in meaning.  The "uncompressed_value"
2743	   encoding method, on the other hand, is not redundant.  It also binds
2744	   the uncompressed length of the type field to two bits and its
2745	   compressed length to zero, so it would not be possible to remove the
2746	   encoding method and leave just the "ENFORCE" statement.
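   Guards can be thought of as a filter over the available formats.  The
   Python sketch below is a hypothetical illustration, not a ROHC-FN
   construct.  It models only the conditions placed on the "type" field
   and ignores the constraints on the other fields.  It lists the
   formats that remain permitted for each type value.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Format:
    """One compressed format and the condition it places on type."""
    name: str
    allows: Callable[[int], bool]


# Hypothetical model: only the conditions on the type field are included.
FORMATS = [
    # No guard at all.
    Format("irregular_format", lambda t: True),
    # Bound by uncompressed_value(2, 3); its ENFORCE is redundant.
    Format("flags_set", lambda t: t == 3),
    # ENFORCE(type.UVALUE != 3)
    Format("flags_static", lambda t: t != 3),
]


def permitted(type_value: int) -> List[str]:
    """Formats that no guard rules out for this value of type."""
    return [f.name for f in FORMATS if f.allows(type_value)]


for t in range(4):
    print(t, permitted(t))
```

   Every type value leaves two permitted formats, one of which is always
   "irregular_format".  The guards only remove candidates.  Choosing
   among the formats that remain is still up to the compressor.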

2748	   Note that a guard is solely preventative: it can never force the
2749	   compressor to choose a format.  A format can only be guaranteed to
2750	   be chosen in a given situation if no other format can be used
2751	   instead.  This is demonstrated in the example output below, where
2752	   the compressor can still choose "irregular_format" for any packet
2753	   if it wishes:

2755	     Uncompressed header: 0101000100010000
2756	     Compressed header:   000100011011000

2758	     Uncompressed header: 0101000101000000
2759	     Compressed header:   1010 ; 000100011100000

2761	     Uncompressed header: 0110000101110000
2762	     Compressed header:   1101 ; 001000011101000

2764	     Uncompressed header: 0111000110101110
2765	     Compressed header:   010 ; 001100011110111

2767	   This saves two more bits in the example flow (28 to 26, about 7%).
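   The saving can be checked by summing the lengths of the smallest
   compressed header offered for each packet.  This is done twice: once
   for the earlier example output without guards, and once for the
   output above with them.  This is a minimal Python sketch, and its
   variable names are illustrative.

```python
# Smallest compressed header offered for each packet, copied from the
# two example outputs.  The first packet has no context yet, so it can
# only use irregular_format.
without_guards = ["000100011011000", "1010", "1101", "01110"]
with_guards = ["000100011011000", "1010", "1101", "010"]

before = sum(len(h) for h in without_guards)  # 15 + 4 + 4 + 5 = 28 bits
after = sum(len(h) for h in with_guards)      # 15 + 4 + 4 + 3 = 26 bits
saving = (before - after) / before

print(before, after, f"{saving:.1%}")         # 28 26 7.1%
```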

2769	Authors' Addresses

2771	   Robert Finking
2772	   Siemens/Roke Manor
2773	   Roke Manor Research Ltd.
2774	   Romsey, Hampshire  SO51 0ZN
2775	   UK

2777	   Phone: +44 (0)1794 833189
2778	   Email: robert.finking@roke.co.uk
2779	   URI:   http://www.roke.co.uk

2781	   Ghyslain Pelletier
2782	   Ericsson
2783	   Box 920
2784	   Lulea  SE-971 28
2785	   Sweden

2787	   Phone: +46 (0) 8 404 29 43
2788	   Email: ghyslain.pelletier@ericsson.com

2790	Full Copyright Statement

2792	   Copyright (C) The IETF Trust (2006).

2794	   This document is subject to the rights, licenses and restrictions
2795	   contained in BCP 78, and except as set forth therein, the authors
2796	   retain all their rights.

2798	   This document and the information contained herein are provided on an
2799	   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
2800	   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
2801	   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
2802	   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
2803	   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
2804	   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

2806	Intellectual Property

2808	   The IETF takes no position regarding the validity or scope of any
2809	   Intellectual Property Rights or other rights that might be claimed to
2810	   pertain to the implementation or use of the technology described in
2811	   this document or the extent to which any license under such rights
2812	   might or might not be available; nor does it represent that it has
2813	   made any independent effort to identify any such rights.  Information
2814	   on the procedures with respect to rights in RFC documents can be
2815	   found in BCP 78 and BCP 79.

2817	   Copies of IPR disclosures made to the IETF Secretariat and any
2818	   assurances of licenses to be made available, or the result of an
2819	   attempt made to obtain a general license or permission for the use of
2820	   such proprietary rights by implementers or users of this
2821	   specification can be obtained from the IETF on-line IPR repository at
2822	   http://www.ietf.org/ipr.

2824	   The IETF invites any interested party to bring to its attention any
2825	   copyrights, patents or patent applications, or other proprietary
2826	   rights that may cover technology that may be required to implement
2827	   this standard.  Please address the information to the IETF at
2828	   ietf-ipr@ietf.org.

2830	Acknowledgment

2832	   Funding for the RFC Editor function is provided by the IETF
2833	   Administrative Support Activity (IASA).