Network Working Group                              D. Crocker (editor)
Internet-Draft: DRAFT-DRUMS-ABNF-04.txt       Internet Mail Consortium
Expiration <4/98>                                         Paul Overell
                                                    Demon Internet Ltd


            Augmented BNF for Syntax Specifications: ABNF


STATUS OF THIS MEMO

   This document is an Internet-Draft.  Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its
   areas, and its working groups.  Note that other groups may also
   distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time.
   It is inappropriate to use Internet-Drafts as reference material
   or to cite them other than as ``work in progress.''

   To learn the current status of any Internet-Draft, please check
   the ``1id-abstracts.txt'' listing contained in the Internet-Drafts
   Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net
   (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
   Coast), or ftp.isi.edu (US West Coast).

TABLE OF CONTENTS

   1. INTRODUCTION

   2. RULE DEFINITION
      2.1 RULE NAMING
      2.2 RULE FORM
      2.3 END-OF-RULE
      2.4 TERMINAL VALUES
      2.5 EXTERNAL ENCODINGS

   3. OPERATORS
      3.1 CONCATENATION                RULE1 RULE2
      3.2 ALTERNATIVES                 RULE1 / RULE2
      3.3 INCREMENTAL ALTERNATIVES     RULE1 =/ RULE2
      3.4 VALUE RANGE ALTERNATIVES     %C##-##
      3.5 SEQUENCE GROUP               (RULE1 RULE2)
      3.6 VARIABLE REPETITION          *RULE
      3.7 SPECIFIC REPETITION          NRULE
      3.8 OPTIONAL SEQUENCE            [RULE]
      3.9 ; COMMENT
      3.10 OPERATOR PRECEDENCE

   4. ABNF DEFINITION OF ABNF

   5. SECURITY CONSIDERATIONS

   6. APPENDIX A - CORE

   7. ACKNOWLEDGMENTS

   8. REFERENCES

   9. CONTACT

1. INTRODUCTION

   Internet technical specifications often need to define a format
   syntax and are free to employ whatever notation their authors
   deem useful.  Over the years, a modified version of Backus-Naur
   Form (BNF), called Augmented BNF (ABNF), has been popular among
   many Internet specifications.  It balances compactness and
   simplicity with reasonable representational power.  In the early
   days of the Arpanet, each specification contained its own
   definition of ABNF.  This included the email specifications,
   [RFC733] and then [RFC822], which have come to be the common
   citations for defining ABNF.  The current document separates out
   that definition to permit selective reference.  Predictably, it
   also provides some modifications and enhancements.

   The differences between standard BNF and ABNF involve naming
   rules, repetition, alternatives, order-independence, and value
   ranges.
   Appendix A (Core) supplies rule definitions for a core lexical
   analyzer of the type common to several Internet specifications.
   It is provided as a convenience and is otherwise separate from
   the meta language defined in the body of this document, and
   separate from its formal status.

2. RULE DEFINITION

2.1 Rule Naming

   The name of a rule is simply the name itself; that is, a
   sequence of characters, beginning with an alphabetic character,
   and followed by a combination of alphabetics, digits and hyphens
   (dashes).

        RULE NAMES ARE CASE-INSENSITIVE.

   The names <rulename>, <Rulename>, <RULENAME> and <rUlENamE> all
   refer to the same rule.

   Unlike original BNF, angle brackets ("<", ">") are not required.
   However, angle brackets may be used around a rule reference
   whenever their presence will facilitate discerning the use of a
   rule name.  This is typically restricted to rule name references
   in free-form prose, or to distinguish partial rules that combine
   into a string not separated by white space, such as shown in the
   discussion about repetition, below.

2.2 Rule Form

   A rule is defined by the following sequence:

        name = elements

   where <name> is the name of the rule and <elements> is one or
   more rule names or terminal specifications.  The equal sign
   separates the name from the definition of the rule.  The elements
   form a sequence of one or more rule names and/or value
   definitions, combined according to the various operators defined
   in this document, such as alternative and repetition.

2.3 End-of-Rule

   Formally, the grammar requires a one-token look-ahead to find the
   "=" token, which indicates that the previous token is the name of
   a new rule.  For visual ease, rule definitions are left aligned.
   When a rule requires multiple lines, the continuation lines are
   indented.

2.4 Terminal Values

   Rules resolve into a string of terminal values, sometimes called
   characters.
   Within ABNF, each character is simply a non-negative
   integer value, which may be written in binary, decimal, or
   hexadecimal notation as described below.  Hence, an ABNF parser
   processes a sequence of numeric character values.  A string of
   values is in "network byte order", with the higher-valued bytes
   represented on the left-hand side and being sent over the network
   first.

   Terminals are specified by one or more numeric characters, with
   the base interpretation of those characters indicated explicitly.
   The following bases are currently defined:

        b  =  binary

        d  =  decimal

        x  =  hexadecimal

   Hence:

        CR =  %d13

        CR =  %x0D

   respectively specify the decimal and hexadecimal representation
   of [US-ASCII] for carriage return.

   A concatenated string of such values is specified compactly,
   using a period (".") to indicate separation of characters within
   that value.  Hence:

        CRLF =  %d13.10

   ABNF permits specifying literal text strings directly, enclosed in
   quotation marks.  Hence:

        command =  "command string"

   Literal text strings are interpreted as a concatenated set of
   printable characters.

        ABNF STRINGS ARE CASE-INSENSITIVE AND THE
        CHARACTER SET FOR THESE STRINGS IS US-ASCII.

   Hence:

        rulename =  "abc"

   and:

        rulename =  "aBc"

   will each match "abc", "Abc", "aBc", "abC", "ABc", "aBC", "AbC"
   and "ABC".

        TO SPECIFY A RULE WHICH IS CASE SENSITIVE,
        SPECIFY THE CHARACTERS INDIVIDUALLY.

   For example:

        rulename =  %d97 %d98 %d99

   or

        rulename =  %d97.98.99

   will match only the string that comprises only lowercase
   characters, abc.

2.5 External Encodings

   External representations of these characters will vary according
   to constraints in the storage or transmission environment.
   Hence, the same ABNF-based grammar may have multiple external
   encodings, such as one for a 7-bit US-ASCII environment, another
   for a binary octet environment and still a different one when
   16-bit Unicode is used.
   Encoding details are beyond the scope of ABNF, although
   Appendix A (Core) provides definitions for a 7-bit US-ASCII
   environment, as has been common to much of the Internet.

   Separating external encoding from the syntax is intended to
   allow alternate encoding environments to be used for the same
   syntax.

3. OPERATORS

3.1 Concatenation                                    Rule1 Rule2

   A rule can define a simple, ordered string of values (i.e., a
   concatenation of contiguous characters) by listing a sequence of
   rule names.  For example:

        foo    =  %x61           ; a

        bar    =  %x62           ; b

        mumble =  foo bar foo

   so that the rule <mumble> matches the lowercase string "aba".

   LINEAR WHITE SPACE: Concatenation is at the core of the ABNF
   parsing model.  A string of contiguous characters (values) is
   parsed according to the rules defined in ABNF.  For Internet
   specifications, there is some history of permitting linear white
   space (space and horizontal tab) to be freely, and implicitly,
   interspersed around major constructs, such as delimiting special
   characters or atomic strings.

        THIS SPECIFICATION FOR ABNF DOES NOT PROVIDE
        FOR IMPLICIT SPECIFICATION OF LINEAR WHITE SPACE.

   Any grammar that wishes to permit linear white space around
   delimiters or string segments must specify it explicitly.  It is
   often useful to provide for such white space in "core" rules that
   are then used variously among higher-level rules.  The "core"
   rules might be formed into a lexical analyzer or simply be part
   of the main ruleset.

3.2 Alternatives                                   Rule1 / Rule2

   Elements separated by a forward slash ("/") are alternatives.
   Therefore,

        foo / bar

   will accept <foo> or <bar>.

        REMINDER: A quoted string containing alphabetic characters
        is a non-terminal representing the set of combinatorial
        strings of those characters, in the specified order but with
        any mixture of upper and lower case.
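   To make the matching behavior of Sections 2.4, 3.1 and 3.2
   concrete, the following Python sketch models each ABNF element as
   a function from an input position to the set of positions where a
   match can end.  It is illustrative only: the helper names (num,
   lit, cat, alt, matches) are hypothetical and not part of ABNF, the
   input is assumed to be US-ASCII text, and only numeric values,
   quoted strings, concatenation and alternatives are covered.  It
   reflects the rules stated above: quoted strings ignore case,
   numeric values are exact, and concatenation adds no implicit
   linear white space.

```python
from typing import Callable, FrozenSet

# Hypothetical helper names for illustration only; ABNF defines no API.
# A parser maps (text, pos) to the set of positions where a match can end.
Parser = Callable[[str, int], FrozenSet[int]]


def _ascii_lower(s: str) -> str:
    """Fold only A-Z, since ABNF case-insensitivity is defined for US-ASCII."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def num(*codes: int) -> Parser:
    """Numeric value(s) such as %x61 or %d97.98.99: exact, case-sensitive."""
    def parse(text: str, pos: int) -> FrozenSet[int]:
        end = pos + len(codes)
        if [ord(c) for c in text[pos:end]] == list(codes):
            return frozenset({end})
        return frozenset()
    return parse


def lit(s: str) -> Parser:
    """Quoted string such as "abc": matched case-insensitively."""
    def parse(text: str, pos: int) -> FrozenSet[int]:
        end = pos + len(s)
        if _ascii_lower(text[pos:end]) == _ascii_lower(s):
            return frozenset({end})
        return frozenset()
    return parse


def cat(*parts: Parser) -> Parser:
    """Rule1 Rule2: contiguous concatenation, no implicit white space."""
    def parse(text: str, pos: int) -> FrozenSet[int]:
        ends = frozenset({pos})
        for part in parts:
            ends = frozenset(e for p in ends for e in part(text, p))
        return ends
    return parse


def alt(*choices: Parser) -> Parser:
    """Rule1 / Rule2: union of what each alternative can match."""
    def parse(text: str, pos: int) -> FrozenSet[int]:
        return frozenset(e for c in choices for e in c(text, pos))
    return parse


def matches(rule: Parser, text: str) -> bool:
    """True if the rule can consume the entire input."""
    return len(text) in rule(text, 0)


foo = num(0x61)               # foo = %x61 ; a
bar = num(0x62)               # bar = %x62 ; b
mumble = cat(foo, bar, foo)   # mumble = foo bar foo
either = alt(foo, bar)        # foo / bar
any_case = lit("abc")         # rulename = "abc"
exact = num(97, 98, 99)       # rulename = %d97.98.99

if __name__ == "__main__":
    print(matches(mumble, "aba"), matches(mumble, "a ba"))  # prints: True False
    print(matches(any_case, "AbC"), matches(exact, "AbC"))  # prints: True False
```

   Returning a set of end positions, rather than a single position,
   lets alternatives of different lengths be tried without separate
   backtracking logic; repetition and optional elements could be
   added in the same style.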
3.3 Incremental Alternatives                     Rule1 =/ Rule2

   It is sometimes convenient to specify a list of alternatives in
   fragments.  That is, an initial rule may match one or more
   alternatives, with later rule definitions adding to the set of
   alternatives.  This is particularly useful for otherwise-independent
   specifications that derive from the same parent rule set, as
   often occurs with parameter lists.  ABNF permits this incremental
   definition through the construct:

        oldrule =/ additional-alternatives

   so that the rule set

        ruleset =  alt1 / alt2

        ruleset =/ alt3

        ruleset =/ alt4 / alt5

   is the same as specifying

        ruleset =  alt1 / alt2 / alt3 / alt4 / alt5

3.4 Value Range Alternatives                           %c##-##

   A range of alternative numeric values can be specified compactly,
   using a dash ("-") to indicate the range of alternative values.
   Hence:

        DIGIT =  %x30-39

   is equivalent to:

        DIGIT =  "0" / "1" / "2" / "3" / "4" / "5" / "6" /
                 "7" / "8" / "9"

   Concatenated numeric values and numeric value ranges cannot be
   specified in the same string.  A numeric value may use the dotted
   notation for concatenation, or it may use the dash notation to
   specify one value range.  Hence, to specify a line containing one
   printable character, the specification could be:

        onechar-line =  %x0D.0A %x20-7E %x0D.0A

3.5 Sequence Group                                 (Rule1 Rule2)

   Elements enclosed in parentheses are treated as a single element,
   whose contents are STRICTLY ORDERED.  Thus,

        elem (foo / bar) blat

   matches (elem foo blat) or (elem bar blat), whereas

        elem foo / bar blat

   matches (elem foo) or (bar blat).

        IT IS STRONGLY ADVISED TO USE GROUPING
        NOTATION, RATHER THAN TO RELY ON PROPER
        READING OF "BARE" ALTERNATIONS, WHEN
        ALTERNATIVES CONSIST OF MULTIPLE RULE NAMES
        OR LITERALS.

   Hence, it is strongly recommended that, instead of the above
   form, the form:

        (elem foo) / (bar blat)

   be used.
   This avoids misinterpretation by casual readers.

   The local grouping notation is also used within free text to set
   off an element sequence from the prose.

3.6 Variable Repetition                                    *Rule

   The operator "*" preceding an element indicates repetition.  The
   full form is:

        <a>*<b>element

   where <a> and <b> are optional decimal values, indicating at
   least <a> and at most <b> occurrences of element.

   Default values are 0 and infinity, so that <*element> allows any
   number, including zero; <1*element> requires at least one;
   <3*3element> allows exactly 3; and <1*2element> allows one or two.

3.7 Specific Repetition                                    nRule

   A rule of the form:

        <n>element

   is equivalent to

        <n>*<n>element

   That is, exactly <n> occurrences of <element>.  Thus, 2DIGIT is
   a 2-digit number, and 3ALPHA is a string of three alphabetic
   characters.

3.8 Optional Sequence                                     [RULE]

   Square brackets enclose an optional element sequence:

        [foo bar]

   is equivalent to

        *1(foo bar)

3.9 ; Comment

   A semicolon starts a comment that continues to the end of the
   line.  This is a simple way of including useful notes in parallel
   with the specifications.

3.10 Operator Precedence

   The various mechanisms described above have the following
   precedence, from highest (binding tightest) at the top, to
   lowest (binding loosest) at the bottom:

        Strings, Names formation
        Comment
        Value range
        Repetition
        Grouping, Optional
        Concatenation
        Alternative

   Use of the alternative operator, freely mixed with concatenations,
   can be confusing.

        IT IS STRONGLY RECOMMENDED THAT THE GROUPING
        OPERATOR BE USED TO MAKE EXPLICIT
        CONCATENATION GROUPS.

4. ABNF DEFINITION OF ABNF

   This syntax uses the rules provided in Appendix A (Core).
        rulelist       =  1*( rule / (*c-wsp c-nl) )

        rule           =  rulename defined-as elements c-nl
                               ; continues if next line starts
                               ;  with white space

        rulename       =  ALPHA *(ALPHA / DIGIT / "-")

        defined-as     =  *c-wsp ("=" / "=/") *c-wsp
                               ; basic rules definition and
                               ;  incremental alternatives

        elements       =  alternation *c-wsp

        c-wsp          =  WSP / (c-nl WSP)

        c-nl           =  comment / CRLF
                               ; comment or newline

        comment        =  ";" *(WSP / VCHAR) CRLF

        alternation    =  concatenation
                          *(*c-wsp "/" *c-wsp concatenation)

        concatenation  =  repetition *(1*c-wsp repetition)

        repetition     =  [repeat] element

        repeat         =  1*DIGIT / (*DIGIT "*" *DIGIT)

        element        =  rulename / group / option /
                          char-val / num-val / prose-val

        group          =  "(" *c-wsp alternation *c-wsp ")"

        option         =  "[" *c-wsp alternation *c-wsp "]"

        char-val       =  DQUOTE *(%x20-21 / %x23-7E) DQUOTE
                               ; quoted string of SP and VCHAR
                               ;  without DQUOTE

        num-val        =  "%" (bin-val / dec-val / hex-val)

        bin-val        =  "b" 1*BIT
                          [ 1*("." 1*BIT) / ("-" 1*BIT) ]
                               ; series of concatenated bit values
                               ;  or single ONEOF range

        dec-val        =  "d" 1*DIGIT
                          [ 1*("." 1*DIGIT) / ("-" 1*DIGIT) ]

        hex-val        =  "x" 1*HEXDIG
                          [ 1*("." 1*HEXDIG) / ("-" 1*HEXDIG) ]

        prose-val      =  "<" *(%x20-3D / %x3F-7E) ">"
                               ; bracketed string of SP and VCHAR
                               ;  without angles

5. SECURITY CONSIDERATIONS

   Security is truly believed to be irrelevant to this document.

6. APPENDIX A - CORE

   This Appendix is provided as a convenient core for specific
   grammars.  The definitions may be used as a core set of rules.

   Certain basic rules are in uppercase, such as SP, HTAB, CRLF,
   DIGIT, ALPHA, etc.
        ALPHA          =  %x41-5A / %x61-7A   ; A-Z / a-z

        BIT            =  "0" / "1"

        CHAR           =  %x01-7F
                               ; any 7-bit US-ASCII character,
                               ;  excluding NUL

        CR             =  %x0D
                               ; carriage return

        CRLF           =  CR LF
                               ; Internet standard newline

        CTL            =  %x00-1F / %x7F
                               ; controls

        DIGIT          =  %x30-39
                               ; 0-9

        DQUOTE         =  %x22
                               ; " (Double Quote)

        HEXDIG         =  DIGIT / "A" / "B" / "C" / "D" / "E" / "F"

        HTAB           =  %x09
                               ; horizontal tab

        LF             =  %x0A
                               ; linefeed

        LWSP           =  *(WSP / CRLF WSP)
                               ; linear white space (past newline)

        OCTET          =  %x00-FF
                               ; 8 bits of data

        SP             =  %x20
                               ; space

        VCHAR          =  %x21-7E
                               ; visible (printing) characters

        WSP            =  SP / HTAB
                               ; white space

   Externally, data are represented as "network virtual ASCII",
   namely 7-bit US-ASCII in an 8-bit field, with the high (8th) bit
   set to zero.

7. ACKNOWLEDGMENTS

   The syntax for ABNF was originally specified in RFC 733.  Ken L.
   Harrenstien, of SRI International, was responsible for re-coding
   the BNF into an augmented BNF that makes the representation
   smaller and easier to understand.

   This recent project began as a simple effort to cull out the
   portion of RFC 822 that has been repeatedly cited by non-email
   specification writers, namely the description of augmented BNF.
   Rather than simply and blindly converting the existing text into
   a separate document, the working group chose to give careful
   consideration to the deficiencies, as well as benefits, of the
   existing specification and related specifications available over
   the last 15 years, and therefore to pursue enhancement.  This
   turned the project into something rather more ambitious than
   first intended.  Interestingly, the result is not massively
   different from that original, although decisions such as removing
   the list notation came as a surprise.
   The current round of specification was part of the DRUMS working
   group, with significant contributions from Roger Fajman, Bill
   McQuillan, Keith Moore, Pete Resnick, Jerome Abela and Chris
   Newman.

8. REFERENCES

   [US-ASCII] Coded Character Set--7-Bit American Standard Code for
              Information Interchange, ANSI X3.4-1986.

   [RFC733]   Crocker, D.H., Vittal, J.J., Pogran, K.T., and
              D.A. Henderson, "Standard for the Format of ARPA
              Network Text Messages", RFC 733, November 1977.

   [RFC822]   Crocker, D., "Standard for the Format of ARPA Internet
              Text Messages", RFC 822, August 1982.

9. CONTACT

   David H. Crocker                    Paul Overell

   Internet Mail Consortium            Demon Internet Ltd
   675 Spruce Dr.                      Dorking Business Park
   Sunnyvale, CA 94086 USA             Dorking
                                       Surrey, RH4 1HN
                                       UK

   Phone: +1 408 246 8253
   Fax:   +1 408 249 6205