Network Working Group
M. Nossik, F. Welfeld, M. Richardson
Solidum Systems Corporation
INTERNET-DRAFT
draft-nossik-pax-pdl-00.txt
October 16, 1998
Expires in six months

PAX PDL - a non-procedural packet description language

Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as ``work in progress.''

To learn the current status of any Internet-Draft, please check the
``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or
ftp.isi.edu (US West Coast).

Abstract

This document describes the PAX Pattern Description Language (PDL). PAX
is a special-purpose language for defining filters (recognizers) for
sequential inputs. The language is suitable for describing pattern
matching criteria in policy-based networking devices such as QoS routers
and switches, packet filters, RMON probes, traffic shapers, etc. It
provides a consistent means of programming policy-based networking
devices based on different hardware and software platforms. Programs
written in PAX can be built incrementally, with elementary patterns used
as building blocks for more complex ones. The language encourages
modular and object-oriented design.

M. Nossik, F. Welfeld, M. Richardson [page 1] INTERNET-DRAFT September 30, 1998

Table of Contents

1. Scope and purpose
   1.1. Specification of Requirements
2. Language structure
   2.1. Characters, separators and tokens
   2.2. Comments
   2.3. Preprocessing
      2.3.1. Preprocessor directives
      2.3.2. Source file inclusion
      2.3.3. Conditional compilation
      2.3.4. Defines and macros
   2.4. Keywords and names
   2.5. Program structure
      2.5.1. PATTERN statement
      2.5.2. IMPORT statement
      2.5.3. EXPORT statement
3. Patterns
   3.1. Elementary patterns
   3.2. Field concatenation
   3.3. Field combination
   3.4. Naming patterns
   3.5. Pattern references
      3.5.1. Specialized fields
      3.5.2. Conditional fields
      3.5.3. Variant Fields
   3.6. Pattern length adjustment
   3.7. Parametric patterns
      3.7.1. Parameters declared in the formal parameter
      3.7.2. Parameters declared by reference
4. Elementary patterns and Atomic conditional expressions
   4.2. Relations
   4.3. Atomic conditional expressions
5. Pattern Conditional Expressions
   5.1. Field references
   5.2. Pattern Conditional Expressions
6. Pattern Libraries
   6.1. EXPORT statement
   6.2. IMPORT statement
7. References
8. Editors' Addresses
9. Appendix A. Formal Grammar

1. Scope and purpose

This document provides a reference guide for using the PAX language.
Numerous examples illustrate the capabilities and expressive power of
the language.

PAX PDL is a non-procedural Fourth Generation Language (4GL). This means
that it describes the patterns that need to be recognized, but does not
describe how to recognize them. The details of how recognition is
performed are hidden in the compiler. PAX PDL allows concise, flexible
description of headers and data contained in packets.

Although general enough for any kind of pattern recognition, such as
database searches, PAX is intended primarily for use in data
communication networks and devices.

This document does not describe the compiler for the language,
implementation details or the API used to classify the data packets.

Readers should be familiar with Backus Naur Form (BNF) notation, which
is used to define the language grammar. PAX is defined in terms of a
left-recursive context-free grammar.

1.1. Specification of Requirements

The keywords "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", and
"MAY" that appear in this document are to be interpreted as described in
Bradner97.

2. Language structure

In terms of form and organization, PAX resembles the data structure
definition part of the C programming language. Given the widespread
acceptance of C, the authors have tried to borrow as much as possible
from it.
This gives PAX greater expressive power than competing specifications
(e.g. BPF, PATHFINDER and APF).

2.1. Characters, separators and tokens

The character set used in source programs consists of the whitespace
characters plus the following 89 graphical characters:

    a b c d e f g h i j k l m n o p q r s t u v w x y z
    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
    0 1 2 3 4 5 6 7 8 9
    _ { } [ ] ( ) # < > : ; . ? * + - / & | ~ ! = , \ " '

Whitespace characters (space, horizontal TAB, vertical TAB, form-feed
and new-line) are used as separators between tokens, identifiers and
language constructs.

2.2. Comments

PAX supports both C-style and C++-style comments.

A C-style comment is any string of characters starting with /* and
running up to the nearest */. C-style comments are treated as
whitespace.

A C++-style comment is any string of characters following //, up to the
nearest new-line character. C++-style comments are treated as
whitespace.

2.3. Preprocessing

PAX supports some useful constructs through a preprocessor. As
discussed below, the major preprocessor features are:

o  source file inclusion,
o  conditional compilation, and
o  defines and macros.

2.3.1. Preprocessor directives

Lines beginning with # (perhaps preceded by whitespace) are interpreted
by the preprocessor as preprocessing directives. Line boundaries are
significant. A preprocessing directive extends to the next newline
character not immediately preceded by a backslash ( \ ). The syntax of
these lines is independent of the rest of the language.

2.3.2. Source file inclusion

The PAX preprocessor supports the #include directive. The directive
specifies that the compiler must switch processing from the current file
to the file specified on this line. This allows modular, hierarchical
program design.
The PAX preprocessor supports both the conventional quoted ( "file" )
and the angle-bracketed ( <file> ) notation, for example:

    #include
    #include "project_file.pax"

2.3.3. Conditional compilation

The PAX preprocessor supports the following conditional compilation
directives:

    #ifdef
    #if
    #else
    #endif

Conditional compilation may be used in "standard header" files to
prevent multiple inclusion.

2.3.4. Defines and macros

The PAX preprocessor supports the #define directive, which specifies a
macro/alias for a specific string. This feature improves code layout and
readability by letting PAX files be configured through literals defined
in PAX header files.

ANSI-C preprocessor capabilities SHOULD NOT be used beyond those
described in this document.

2.4. Keywords and names

The following keywords are used in the language constructs:

    AND  ANYOF  BIT  EXPORT  IMPORT  NOT  OR  PATTERN  UINT  WHERE

Note: These keywords are reserved and cannot be used as identifiers of
any kind. They are not case sensitive, but by convention are shown in
upper case in this document.

Identifiers are used within the language to convey pattern names,
parameter names and field names.

2.5. Program structure

A PAX program is a non-empty sequence of statements. There are three
types of statements available in PAX:

    PATTERN
    IMPORT
    EXPORT

2.5.1. PATTERN statement

The PATTERN statement is the main building block of a PAX program. It
describes both the prototype and the layout of the pattern.

The name of the pattern defined by the PATTERN statement has global
scope and can be used to reference this pattern at any point in the
program past this PATTERN statement. Names of the parameters used in the
prototype have the scope of the pattern body.

2.5.2. IMPORT statement

The IMPORT statement specifies a prototype for a pattern residing in an
external library.
The IMPORT statement describes the pattern prototype (name and
parameters), allowing it to be referenced.

Note: Pattern conditionals cannot be used on imported patterns. Use
#include statements for conditionals.

The name of the pattern defined by the IMPORT statement has global scope
and can be used to reference this pattern at any point in the program
past this IMPORT statement. Names of the parameters used in the
prototype are meaningless, as actual parameters are passed in pattern
references by position.

2.5.3. EXPORT statement

The EXPORT statement specifies one or more patterns to be written to the
output file (pattern library). Any number of EXPORT statements can
appear anywhere in the program. A pattern can be named in an EXPORT
statement even before it is defined.

3. Patterns

The pattern is the basic PAX concept. In C, the variable can be seen as
the basic low-level construct and the function as the basic high-level
construct; in PAX, the pattern is both the low-level and the high-level
construct.

3.1. Elementary patterns

The following describes the basic elementary patterns:

    := := := BIT | UINT :=

The simplest pattern looks like:

    BIT 16

This matches any 16 bits in the input. BIT is a builtin field type. Here
we could just as well use:

    UINT 16

UINT, another builtin field type, is used for unsigned numeric fields.

    :=

Matching anything is not very useful. Here is a more valuable pattern:

    UINT 4 >= 5

This pattern matches a 4-bit field when its numerical value is 5 or
greater (e.g. the ihl field in the IP header). >= is a relational
operator and 5 is a literal value. Both the relational operator and the
literal value are appropriate for the UINT field type.

3.2. Field concatenation

The most common way of building more complex patterns is by
concatenating simpler ones, such as those below:

    := } | ; } := { | ; | ; | ; := WHEN :=

A sample concatenation follows:

    {
        version            UINT 4 == 4;
        ihl                UINT 4 == 5;  /* length == 5 : no options */
        typeOfService      UINT 8;
        totalLength        UINT 16;
        identification     UINT 16;
        flagReserved       BIT 1 == 0;
        flagDontFragment   BIT 1;
        flagMoreFragments  BIT 1 == 0;   /* last fragments only */
        fragmentOffset     UINT 13 == 0; /* first fragments only */
        timeToLive         UINT 8;
        protocol           BIT 8 == 6;   /* TCP */
        headerChecksum     BIT 16;
        sourceAddress      BIT 32;
        destinationAddress BIT 32;       /* this semicolon is optional */
    }

This pattern matches the IP version 4 headers of non-fragmented TCP/IP
packets without IP options.

When patterns are put in sequence in the list, the resulting pattern
matches the input only when the first list element matches the beginning
of the input, the next element matches the bits that follow, and so on.

Elements of the above list are not simple patterns, but fields: patterns
with names attached, according to the following rule of the language
grammar:

    :=

Note: Naming fields allows you to reference them later. See section 5,
Pattern Conditional Expressions.

3.3. Field combination

Compare the use of the opening and closing square brackets ( [ ] ) in
the following to the braces ( { } ) used in Field concatenation, in 3.2.

    := ] | ; ] := [ | ;

Data often has alternative layouts, as discussed following the sample
IEEE 802.2 LLC header code below:

    {
        dsap BIT 8 <> 0xFF;
        ssap BIT 8 <> 0xFF;
        control [
            longControl  { control1 BIT 2 <> 0b11; control2 BIT 14 };
            shortControl { control1 BIT 2 == 0b11; control2 BIT 6 }
        ]
    }

The control field of this pattern can be either 8 or 16 bits long. This
is represented by the combination of two patterns: longControl and
shortControl. Input is accepted if it matches either of these patterns.
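As a rough illustration of how concatenation and combination behave at
match time, the following Python sketch applies the LLC pattern above to
a byte string. It is not part of PAX and says nothing about how a PAX
compiler works. The helper names (read_bits, match_concat,
match_combination, match_llc), the MSB-first bit order, and the
convention of returning the number of bits consumed are all assumptions
made for this sketch.

```python
from typing import Callable, List, Optional, Tuple

# A field is (width in bits, predicate on the unsigned field value).
Field = Tuple[int, Callable[[int], bool]]


def read_bits(data: bytes, offset: int, width: int) -> Optional[int]:
    """Return `width` bits starting at bit `offset` (MSB first, an
    assumption), or None if the input is too short."""
    total = len(data) * 8
    if offset + width > total:
        return None
    value = int.from_bytes(data, "big")
    shift = total - offset - width
    return (value >> shift) & ((1 << width) - 1)


def match_concat(data: bytes, offset: int, fields: List[Field]) -> Optional[int]:
    """Concatenation: every field must match, one after another.
    Returns the bit offset after the last field, or None on failure."""
    for width, predicate in fields:
        value = read_bits(data, offset, width)
        if value is None or not predicate(value):
            return None
        offset += width
    return offset


def match_combination(data: bytes, offset: int,
                      alternatives: List[List[Field]]) -> Optional[int]:
    """Combination: accept if one alternative matches. The alternatives
    are assumed never to overlap, so the order tried does not matter."""
    for fields in alternatives:
        end = match_concat(data, offset, fields)
        if end is not None:
            return end
    return None


def anything(_value: int) -> bool:
    return True


LONG_CONTROL: List[Field] = [(2, lambda v: v != 0b11), (14, anything)]
SHORT_CONTROL: List[Field] = [(2, lambda v: v == 0b11), (6, anything)]


def match_llc(data: bytes) -> Optional[int]:
    """Match dsap, ssap, then either the long or the short control field."""
    end = match_concat(data, 0, [(8, lambda v: v != 0xFF),
                                 (8, lambda v: v != 0xFF)])
    if end is None:
        return None
    return match_combination(data, end, [LONG_CONTROL, SHORT_CONTROL])
```

In this sketch, match_llc returns the number of bits consumed when the
header is accepted and None when it is rejected or the input runs out
before a decision can be made.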
The operands of the combination must be mutually exclusive; that is, no
input can match more than one of them. In the example code, this is
guaranteed by the distinct conditions on longControl.control1 and
shortControl.control1.

C programmers may notice that the syntactic similarity between
concatenation and combination in PAX is somewhat analogous to the
similarity between struct and union in C. Do not carry the analogy too
far: the semantics of concatenation resemble struct much more closely
than those of combination resemble union.

3.4. Naming patterns

As shown in the preceding example, any pattern may be used to build more
complex patterns. This is a powerful concept: PAX patterns can be
created in a hierarchical fashion.

Although possible, it is impractical to retype component patterns
verbatim each time they are used. Similar to typedefs in C, PAX allows
you to assign a name to a pattern and subsequently refer to that pattern
by its name:

    := := PATTERN :=

Any identifier can be used as a pattern name, as long as all names
within a compilation unit are unique.

The simplest prototype consists of just the keyword and the name. For
more complex patterns, see 3.7, Parametric patterns.
The pattern definition is just the prototype followed by the pattern
itself, as shown below:

    /* IP */
    #define ETHERTYPE_IP 0x0800
    /* Internet Control Message Protocol */
    #define IPPROTO_ICMP 1

    PATTERN Ethernet_Hdr_For_IP {
        destination BIT 48;
        source      BIT 48;
        type        UINT 16 == ETHERTYPE_IP
    }

    PATTERN IP_v4_Hdr_For_ICMP {
        version            UINT 4 == 4;
        ihl                UINT 4 == 5;  /* length == 5 : no options */
        typeOfService      UINT 8;
        totalLength        UINT 16;
        identification     UINT 16;
        flagReserved       BIT 1 == 0;
        flagDontFragment   BIT 1;
        flagMoreFragments  BIT 1 == 0;   /* last fragments only */
        fragmentOffset     UINT 13 == 0; /* first fragments only */
        timeToLive         UINT 8;
        protocol           BIT 8 == IPPROTO_ICMP; /* ICMP */
        headerChecksum     BIT 16;
        sourceAddress      BIT 32;
        destinationAddress BIT 32
    }

3.5. Pattern references

A pattern reference (i.e. the name of the pattern) can be used instead
of the pattern itself:

    := :=

    PATTERN ICMP_Over_IP_Over_Ethernet_Hdr {
        eh       Ethernet_Hdr_For_IP;
        iph      IP_v4_Hdr_For_ICMP;
        type     UINT 8;
        code     UINT 8;
        checksum BIT 16;
    }

3.5.1. Specialized fields

You do not have to build and name all the patterns you are going to use
later. While packet layouts are defined by protocols and are mostly
known in advance, the conditions applied to data in the packets can vary
widely.
PAX allows you to define a pattern and to apply conditions to its data
only when it is referenced:

    := WHERE

Thus, you may define more generic patterns and customize them as
required, as shown below:

    PATTERN Ethernet_Hdr {
        destination BIT 48;
        source      BIT 48;
        type        UINT 16
    }

    PATTERN IP_v4_Hdr {
        version            UINT 4 == 4;
        ihl                UINT 4 == 5;  /* length == 5 : no options */
        typeOfService      UINT 8;
        totalLength        UINT 16;
        identification     UINT 16;
        flagReserved       BIT 1 == 0;
        flagDontFragment   BIT 1;
        flagMoreFragments  BIT 1 == 0;   /* last fragments only */
        fragmentOffset     UINT 13 == 0; /* first fragments only */
        timeToLive         UINT 8;
        protocol           BIT 8;
        headerChecksum     BIT 16;
        sourceAddress      BIT 32;
        destinationAddress BIT 32
    }

    PATTERN ICMP_Over_IP_Over_Ethernet_Hdr {
        eh       Ethernet_Hdr WHERE type == ETHERTYPE_IP;
        iph      IP_v4_Hdr WHERE protocol == IPPROTO_ICMP;
        type     UINT 8;
        code     UINT 8;
        checksum BIT 16;
    }

3.5.2. Conditional fields

A conditional field is used to describe patterns that have different
layouts depending on certain conditions.

    := WHEN

Consider the pattern for matching the MAC LLC header:

    PATTERN IEEE_802_2_LLC {
        DSAP         BIT 8 <> 0xFF;
        SSAP         BIT 8 <> 0xFF;
        Control1     BIT 2;
        LongControl  BIT 14 WHEN Control1 <> 0b11;
        ShortControl BIT 6 WHEN Control1 == 0b11;
    }

Without conditional fields, we would have to express the same pattern in
a less elegant way through the use of combination.

    PATTERN IEEE_802_2_LLC {
        DSAP BIT 8 <> 0xFF;
        SSAP BIT 8 <> 0xFF;
        Control [
            LongControl  { Control1 BIT 2 <> 0b11; Control2 BIT 14 };
            ShortControl { Control1 BIT 2 == 0b11; Control2 BIT 6 }
        ]
    }

3.5.3. Variant Fields

Variant fields describe variants of the pattern layout that depend on
the value of another field or on the value of an arbitrarily complex
conditional expression involving multiple fields.

    := } | ; } := ANYOF { | ; := : :=

The must be different from the names of all cases already included in .
The scope for the case selector is the pattern of which the ANYOF field
is a member. Only the fields preceding this ANYOF field in the
definition, and their subfields, may be referenced in the case selector.

ANYOF is a convenient shorthand for the combination operator. The
following demonstrates the semantics:

    PATTERN original {
        pre_field_1 ...;
        ...;
        pre_field_K ...;
        anyOfField ANYOF {
            case_selector_1 : pattern_1;
            ...;
            case_selector_M : pattern_M
        }
        post_field_1 ...;
        ...;
        post_field_N ...
    }

    PATTERN equivalent [
        case1 {
            /* conditions corresponding to case_selector_1 */
            pre_field_1 ... WHERE ... ;
            ...;
            /* conditions corresponding to case_selector_1 */
            pre_field_K ... WHERE ... ;
            anyOfField pattern_1;
            post_field_1 ...;
            ...;
            post_field_N ...
        }
        ...;
        caseM {
            /* conditions corresponding to case_selector_M */
            pre_field_1 ... WHERE ... ;
            ...;
            /* conditions corresponding to case_selector_M */
            pre_field_K ... WHERE ... ;
            anyOfField pattern_M;
            post_field_1 ...;
            ...;
            post_field_N ...
        };
    ]

Note: Patterns corresponding to different case selectors MUST be
mutually exclusive.

3.6. Pattern length adjustment

Patterns of variable length can be adjusted to a user-specified length.

    :=

The pattern adjusted to a particular length n will accept the packet:

1. if the unadjusted pattern will accept the evaluated data within the
   first n bits;

2. if the evaluated data would not be rejected by the unadjusted pattern
   within the n bits.

In other words, if the pattern matches before the adjusted length is
exhausted, the remaining bits are skipped (ignored, treated as padding).
If the pattern has not yet matched when the adjusted length is
exhausted, the remaining packet bits are not evaluated (truncated).

Let's consider the options field in the IP header.
    OptionsAndPadding ANYOF {
        IHL == 5  : opt1  IP_4_16Options 0;
        IHL == 6  : opt2  IP_4_16Options 32;
        IHL == 7  : opt3  IP_4_16Options 64;
        IHL == 8  : opt4  IP_4_16Options 96;
        IHL == 9  : opt5  IP_4_16Options 128
                          WHERE FirstOption.type == SECURITY_OPT;
        IHL == 10 : opt6  IP_4_16Options 160;
        IHL == 11 : opt7  IP_4_16Options 192;
        IHL == 12 : opt8  IP_4_16Options 224;
        IHL == 13 : opt9  IP_4_16Options 256;
        IHL == 14 : opt10 IP_4_16Options 288;
        IHL == 15 : opt11 IP_4_16Options 320
    }

Here the options field is adjusted depending on the total header length
defined by the IHL field. For an IHL of 9 words, the packet is accepted
if the first option is the security option; otherwise it is rejected.

3.7. Parametric patterns

Parametric patterns have the following structure:

    := | , := PATTERN ( )

WHERE clauses can be applied only to patterns that are defined by
PATTERN statements in the same compilation unit. The internal structure
of patterns IMPORT'ed from pattern libraries is not visible to the
compiler.

The following sample prototype has two parameters:

    PATTERN IP_4_Hdr ( srcAddr BIT 32, dstAddr BIT 32 )

3.7.1. Parameters declared in the formal parameter

The definition of a parametric pattern consists of the prototype with a
formal parameter list and the pattern (the body). Parameters declared in
the formal parameter list can be used within the body of the pattern
wherever a literal of the same type and length can occur, as shown in
the following:

    PATTERN IP_4_Hdr ( srcAddr BIT 32, dstAddr BIT 32 ) {
        version            UINT 4 == 4;
        ihl                UINT 4 == 5;  /* length == 5 : no options */
        typeOfService      UINT 8;
        totalLength        UINT 16;
        flagReserved       BIT 1 == 0;
        flagDontFragment   BIT 1;
        fragmentOffset     UINT 13;
        timeToLive         UINT 8;
        protocol           BIT 8;
        headerChecksum     BIT 16;
        sourceAddress      BIT 32 == srcAddr;
        destinationAddress BIT 32 == dstAddr;
    }

3.7.2. Parameters declared by reference

When parametric patterns are referenced, the actual parameters
corresponding to the formal parameters have to be supplied. They can be
literals (or parameter references from the outer scope) of the
appropriate type:

    := | := | , := ( := 0b0 | 0b1 | 0B0 | 0B1 | 0 | 1 := 0 | 1 | 2 | 3 |
    4 | 5 | 6 | 7 := 0 | := | 8 | 9 := 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
    | := | a | b | c | d | e | f | A | B | C | D | E | F := 0x | 0X | :=
    | | | :=

For UINT fields, literals can be expressed as binary, octal, decimal or
hexadecimal numbers. In the above notation:

o  a binary number is any sequence of 0s and 1s preceded by 0b or 0B,

o  an octal number is any sequence of octal digits starting with 0,

o  a decimal number is any sequence of decimal digits not starting with
   0, and

o  a hexadecimal number is any sequence of hexadecimal digits preceded
   by 0x or 0X.

    := 0b0 | 0b1 | 0b* | 0B0 | 0B1 | 0B* | 0 | 1 | * := 0 | | * := 0x |
    0X | 0X | | * := | | :=

For BIT fields, literals can be expressed as binary, octal or
hexadecimal masked strings. The notation is very similar to that of
binary, octal or hexadecimal numbers, with the addition of * as a don't
care digit. * represents one don't care bit in a binary string, one
don't care octal digit (3 bits) in an octal string, and one don't care
hexadecimal nibble (4 bits) in a hexadecimal string. Any valid number is
also a valid masked string of the same type.

A binary masked string is any sequence of 0s, 1s, and *s preceded by 0b
or 0B. An octal masked string is any sequence of octal digits and *s
starting with 0. A hexadecimal masked string is any sequence of
hexadecimal digits and *s preceded by 0x or 0X.

4.2. Relations

The following shows basic PAX relations:

    := | := == | <> | <= | >= | < | > := :=

Conditions applied to the elementary (builtin) fields depend on the type
of the field.
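To make the masked-string notation described above more concrete, the
following Python sketch turns a literal into a (value, mask) pair and
tests a BIT field against it. It is an illustration only, not the PAX
compiler's method. The function names parse_masked and matches are
hypothetical, and two points the text leaves open are filled in by
assumption: the literal is right-aligned in the field, and field bits to
the left of the literal are compared against zero. A lone * is not
handled.

```python
from typing import Tuple


def parse_masked(literal: str, width: int) -> Tuple[int, int]:
    """Translate a number or masked string into (value, mask) for a BIT
    field of `width` bits. A 1 bit in mask means "compare this bit";
    a 0 bit means "don't care"."""
    text = literal.lower()
    field_mask = (1 << width) - 1

    if text.startswith("0b"):
        digits, bits = text[2:], 1      # one bit per binary digit
    elif text.startswith("0x"):
        digits, bits = text[2:], 4      # four bits per hexadecimal digit
    elif len(text) > 1 and text.startswith("0"):
        digits, bits = text[1:], 3      # three bits per octal digit
    elif text.isdigit():
        digits, bits = "", 0            # plain decimal number (or a lone 0)
    else:
        raise ValueError(f"not a number or masked string: {literal!r}")

    if bits == 0:
        # Decimal numbers have no don't-care digits: compare every bit.
        value, care, covered = int(text), field_mask, field_mask
    else:
        if not digits:
            raise ValueError(f"no digits in literal: {literal!r}")
        value = care = 0
        for ch in digits:
            value <<= bits
            care <<= bits
            if ch != "*":
                value |= int(ch, 1 << bits)   # base 2, 8 or 16
                care |= (1 << bits) - 1
        covered = (1 << (bits * len(digits))) - 1

    if value & ~field_mask:
        raise ValueError(f"literal {literal!r} does not fit in {width} bits")
    # Assumption: bits to the left of the literal must be zero.
    return value, (care | ~covered) & field_mask


def matches(field_value: int, literal: str, width: int) -> bool:
    """True if the field equals the literal on every compared bit."""
    value, mask = parse_masked(literal, width)
    return (field_value & mask) == (value & mask)
```

Under these assumptions, a condition such as BIT 8 == 0x*F would
compare only the low four bits of the field, while BIT 8 == 6 would
compare all eight.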
For UINT fields, the relational operator can be any of:

o  <
o  <=
o  >
o  >=
o  ==
o  <>

The right-hand operand can be a number or a reference to a UINT
parameter.

For BIT fields, the relational operator can be only == or <>; the
right-hand operand has to be a masked string or a reference to a BIT
parameter. Any valid number is also a valid masked string.

4.3. Atomic conditional expressions

Several conditions can be applied to the same field using more complex
atomic conditional expressions.

    := !

If the operand accepts the input, then its negation rejects it, and vice
versa. If the input is too short to be either accepted or rejected by
the operand, then it is too short to be either accepted or rejected by
the negation.

    := &&

Input is accepted by the conjunction if it is accepted by both operands.

    := ||

Input is accepted by the disjunction if it is accepted by either
operand.

    := ( )

Parentheses can be used to control the order of operations in the usual
way. In their absence, the precedence of the operators is:

1. !
2. &&
3. ||

5. Pattern Conditional Expressions

The following pattern was used as an example in 3.5, Pattern references:

    PATTERN ICMP_Over_IP_Over_Ethernet_Hdr {
        eh       Ethernet_Hdr WHERE type == ETHERTYPE_IP;
        iph      IP_v4_Hdr WHERE protocol == IPPROTO_ICMP;
        type     UINT 8;
        code     UINT 8;
        checksum BIT 16;
    }

In the above, the eh and iph fields are qualified by pattern conditional
expressions in their simplest form. The above pattern is equivalent to:

    PATTERN ICMP_Over_IP_Over_Ethernet_Hdr {
        eh       Ethernet_Hdr_For_IP;
        iph      IP_v4_Hdr_For_ICMP;
        type     UINT 8;
        code     UINT 8;
        checksum BIT 16;
    }

Note: In the above example, the patterns Ethernet_Hdr_For_IP and
IP_v4_Hdr_For_ICMP are defined as:

    PATTERN Ethernet_Hdr_For_IP {
        ...
        type UINT 16 == ETHERTYPE_IP
    }

    PATTERN IP_v4_Hdr_For_ICMP {
        ...
        protocol BIT 8 == IPPROTO_ICMP;
        ...
    }

5.1. Field references

The examples shown in Pattern Conditional Expressions, above, put
conditions only on the pattern fields directly referred to in the field
definition.
PAX also allows you to address embedded fields through any level of
nesting, as shown below:

    := | . :=

If the field reference points to a non-elementary pattern, then any of
its fields can be addressed by appending a period and the name of that
field.

    PATTERN ftp_header {
        f1 ip4TcpConn( 209.67.1.1, 10.10.1.1, 1, 21 )
           WHERE tcph.ctrl.syn == 1
    }

tcph is a field of ip4TcpConn, ctrl is a field of TcpHeader, and syn is
a field of Control. See the pattern definitions in 3.7, Parametric
patterns.

5.2. Pattern Conditional Expressions

You may combine pattern conditional expressions using the usual logical
operators shown below:

    := | ( ) | NOT | AND | OR

6. Pattern Libraries

Patterns may be written, by name, to a single output file, the pattern
library. Pattern libraries serve as the input for further steps in the
classification process, but they can also be used as a source of
patterns for the compiler through IMPORT statements.

6.1. EXPORT statement

    := EXPORT |

Pattern construction is hierarchical. Many of the patterns named in a
program may serve only as intermediate steps towards an ultimate
pattern. #include'd files may also contain many definitions that are not
needed. To prevent littering of the result, the compiler writes to the
output file only the patterns explicitly named in EXPORT statements,
such as:

    EXPORT Ethernet_Hdr IP_v4_Hdr

Any number of EXPORT statements can appear anywhere in the program.
Names used in EXPORT statements do not need to be defined beforehand, as
processing of all EXPORT statements takes place only at the end of the
compiler run.

6.2. IMPORT statement

    := IMPORT | IMPORT ( )

The IMPORT statement declares the pattern to be available in one of the
import libraries. Such patterns can be referenced in the statements that
follow, and the prototype provides enough information for the compiler
to process references to parametric library patterns. WHERE clauses
cannot be applied to library patterns, whereas length adjustment can.

The patterns named in IMPORT statements are retrieved from the libraries
(and the agreement between the prototype from the library and the one
from the IMPORT statement is checked) only at the end of the compiler
run, and only if such patterns are (directly or indirectly) used by one
of the patterns to be exported.

    PATTERN ip4TcpConn ( srcIpAddress BIT 32, dstIpAddress BIT 32,
                         srcPort UINT 16, dstPort UINT 16 )

7. References

Bradner97
    S. Bradner, "Key words for use in RFCs to indicate Requirement
    Levels", RFC 2119, March 1997.

APF96
    H. Dan Lambright and Saumya K. Debray, "APF : A Modular language for
    Fast Packet Classification". Dept of Computer Science, University of
    Arizona, Tucson, August 30, 1996.

BPF93
    S. McCanne and V. Jacobson, "The BSD packet filter: A new
    architecture for user-level packet capture". In USENIX Technical
    Conference Proceedings, pages 259-269, San Diego, CA, Jan. 1993.

CSPF87
    J. C. Mogul, R. F. Rashid and M. J. Accetta, "The packet filter: An
    efficient mechanism for user-level network code". In Proceedings of
    the Eleventh ACM Symposium on Operating Systems Principles, pages
    39-51, Nov. 1987.

DPF96
    Dawson R. Engler and M. Frans Kaashoek, "DPF: Fast, Flexible Message
    Demultiplexing using Dynamic Code Generation", Proc. of ACM Sigcomm,
    August 1996.

MPF94
    M. Yuhara, B. N. Bershad, C. Maeda, and J. E. Moss, "Efficient
    packet demultiplexing for multiple endpoints and large messages".
    Proc. Winter 1994 USENIX Conference, Jan. 1994.

PATHFINDER94
    M. L. Bailey, B. Gopal, M. A. Pagels, L. L. Peterson, and P. Sarkar,
    "PATHFINDER: A pattern-based packet classifier". In Proceedings of
    the First Symposium on Operating Systems Design and Implementation,
    pages 115-123, November 1994.

8. Editors' Addresses

Misha Nossik
mnossik@solidum.com
Solidum Systems Corp
+1 (613) 244-4804

Michael Richardson
mcr@solidum.com

Feliks J. Welfeld
feliks@solidum.com

9. Appendix A. Formal Grammar

    := | <statement> := | | := EXPORT | := := | := a | b | c | d | e | f
    | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w
    | x | y | z | A | B | C | D | E | F | G | H | I | J | K | L | M | N
    | O | P | Q | R | S | T | U | V | W | X | Y | Z := | 0 | 1 | 2 | 3 |
    4 | 5 | 6 | 7 | 8 | 9 | _ := IMPORT | IMPORT ( ) := | , := := := BIT
    | UINT := := | | | := 0b0 | 0b1 | 0B0 | 0B1 | 0 | 1 := 0 | := 0 | 1
    | 2 | 3 | 4 | 5 | 6 | 7 := 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | := |
    8 | 9 := 0x | 0X | := | a | b | c | d | e | f | A | B | C | D | E |
    F := := PATTERN | PATTERN ( ) := | | | | | WHERE | | WHERE := ] | ;
    ] := [ | ; := } | ; } := { | ; | ; | ; := WHEN := := | ( ) | ! | &&
    | || := == | <> | <= | >= | < | > := | := | := | | := 0b0 | 0b1 |
    0b* | 0B0 | 0B1 | 0B* | 0 | 1 | * := 0 | | * := 0x | 0X | 0X* | | *
    := := | ; | ; := | := } | ; } := ANYOF { | ; := : := := := | ; | ;
    := : := := | ( ) | NOT | AND | OR := | . := | ( ) := | ,