idnits 2.17.1 draft-ietf-dnsop-dns-capture-format-06.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There are 11 instances of too long lines in the document, the longest one being 9 characters in excess of 72. == There are 3 instances of lines with non-RFC6890-compliant IPv4 addresses in the document. If these are example addresses, they should be changed. -- The document has examples using IPv4 documentation addresses according to RFC6890, but does not use any IPv6 documentation addresses. Maybe there should be IPv6 examples, too? Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (March 5, 2018) is 2243 days in the past. Is this intentional? 
Checking references for intended status: Proposed Standard ---------------------------------------------------------------------------- (See RFCs 3967 and 4897 for information about using normative references to lower-maturity documents in RFCs) -- Looks like a reference, but probably isn't: '1' on line 1964 -- Looks like a reference, but probably isn't: '2' on line 1967 -- Looks like a reference, but probably isn't: '3' on line 1970 -- Looks like a reference, but probably isn't: '4' on line 1973 -- Looks like a reference, but probably isn't: '5' on line 1976 -- Looks like a reference, but probably isn't: '6' on line 1979 -- Looks like a reference, but probably isn't: '7' on line 1982 -- Looks like a reference, but probably isn't: '8' on line 1984 -- Looks like a reference, but probably isn't: '9' on line 1986 -- Looks like a reference, but probably isn't: '10' on line 1988 -- Looks like a reference, but probably isn't: '11' on line 1991 -- Looks like a reference, but probably isn't: '12' on line 2547 -- Looks like a reference, but probably isn't: '13' on line 2553 -- Looks like a reference, but probably isn't: '14' on line 2596 -- Looks like a reference, but probably isn't: '15' on line 2607 -- Looks like a reference, but probably isn't: '16' on line 2620 -- Looks like a reference, but probably isn't: '17' on line 2646 -- Looks like a reference, but probably isn't: '18' on line 2647 -- Looks like a reference, but probably isn't: '19' on line 2649 -- Looks like a reference, but probably isn't: '20' on line 2652 -- Looks like a reference, but probably isn't: '21' on line 2654 -- Looks like a reference, but probably isn't: '22' on line 2656 -- Looks like a reference, but probably isn't: '23' on line 2841 -- Looks like a reference, but probably isn't: '24' on line 2843 ** Obsolete normative reference: RFC 7049 (Obsoleted by RFC 8949) == Outdated reference: A later version (-16) exists of draft-hoffman-dns-in-json-13 == Outdated reference: A later version (-08) exists 
of draft-ietf-cbor-cddl-02 -- Obsolete informational reference (is this intentional?): RFC 7159 (Obsoleted by RFC 8259) Summary: 2 errors (**), 0 flaws (~~), 4 warnings (==), 27 comments (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 2 dnsop J. Dickinson 3 Internet-Draft J. Hague 4 Intended status: Standards Track S. Dickinson 5 Expires: September 6, 2018 Sinodun IT 6 T. Manderson 7 J. Bond 8 ICANN 9 March 5, 2018 11 C-DNS: A DNS Packet Capture Format 12 draft-ietf-dnsop-dns-capture-format-06 14 Abstract 16 This document describes a data representation for collections of DNS 17 messages. The format is designed for efficient storage and 18 transmission of large packet captures of DNS traffic; it attempts to 19 minimize the size of such packet capture files but retain the full 20 DNS message contents along with the most useful transport metadata. 21 It is intended to assist with the development of DNS traffic 22 monitoring applications. 24 Status of This Memo 26 This Internet-Draft is submitted in full conformance with the 27 provisions of BCP 78 and BCP 79. 29 Internet-Drafts are working documents of the Internet Engineering 30 Task Force (IETF). Note that other groups may also distribute 31 working documents as Internet-Drafts. The list of current Internet- 32 Drafts is at http://datatracker.ietf.org/drafts/current/. 34 Internet-Drafts are draft documents valid for a maximum of six months 35 and may be updated, replaced, or obsoleted by other documents at any 36 time. It is inappropriate to use Internet-Drafts as reference 37 material or to cite them other than as "work in progress." 39 This Internet-Draft will expire on September 6, 2018. 41 Copyright Notice 43 Copyright (c) 2018 IETF Trust and the persons identified as the 44 document authors. All rights reserved. 
46 This document is subject to BCP 78 and the IETF Trust's Legal 47 Provisions Relating to IETF Documents 48 (http://trustee.ietf.org/license-info) in effect on the date of 49 publication of this document. Please review these documents 50 carefully, as they describe your rights and restrictions with respect 51 to this document. Code Components extracted from this document must 52 include Simplified BSD License text as described in Section 4.e of 53 the Trust Legal Provisions and are provided without warranty as 54 described in the Simplified BSD License. 56 Table of Contents 58 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 59 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 4 60 3. Data collection use cases . . . . . . . . . . . . . . . . . . 5 61 4. Design considerations . . . . . . . . . . . . . . . . . . . . 7 62 5. Choice of CBOR . . . . . . . . . . . . . . . . . . . . . . . 8 63 6. C-DNS format conceptual overview . . . . . . . . . . . . . . 9 64 6.1. Block Parameters . . . . . . . . . . . . . . . . . . . . 10 65 6.2. Storage Parameters . . . . . . . . . . . . . . . . . . . 10 66 6.2.1. Optional data items . . . . . . . . . . . . . . . . . 10 67 6.2.2. Optional RRs and OPCODES . . . . . . . . . . . . . . 11 68 6.2.3. Storage flags . . . . . . . . . . . . . . . . . . . . 12 69 6.2.4. IP Address storage . . . . . . . . . . . . . . . . . 12 70 7. C-DNS format detailed description . . . . . . . . . . . . . . 12 71 7.1. Map quantities and indexes . . . . . . . . . . . . . . . 12 72 7.2. Tabular representation . . . . . . . . . . . . . . . . . 13 73 7.3. "File" . . . . . . . . . . . . . . . . . . . . . . . . . 14 74 7.4. "FilePreamble" . . . . . . . . . . . . . . . . . . . . . 14 75 7.4.1. "BlockParameters" . . . . . . . . . . . . . . . . . . 15 76 7.4.2. "CollectionParameters" . . . . . . . . . . . . . . . 18 77 7.5. "Block" . . . . . . . . . . . . . . . . . . . . . . . . . 19 78 7.5.1. "BlockPreamble" . . . . . . . . . . . . . . 
. . . . . 20 79 7.5.2. "BlockStatistics" . . . . . . . . . . . . . . . . . . 21 80 7.5.3. "BlockTables" . . . . . . . . . . . . . . . . . . . . 22 81 7.6. "QueryResponse" . . . . . . . . . . . . . . . . . . . . . 27 82 7.6.1. "ResponseProcessingData" . . . . . . . . . . . . . . 29 83 7.6.2. "QueryResponseExtended" . . . . . . . . . . . . . . . 29 84 7.7. "AddressEventCount" . . . . . . . . . . . . . . . . . . . 30 85 7.8. "MalformedMessage" . . . . . . . . . . . . . . . . . . . 31 86 8. Malformed messages . . . . . . . . . . . . . . . . . . . . . 32 87 9. C-DNS to PCAP . . . . . . . . . . . . . . . . . . . . . . . . 33 88 9.1. Name compression . . . . . . . . . . . . . . . . . . . . 34 89 10. Data collection . . . . . . . . . . . . . . . . . . . . . . . 35 90 10.1. Matching algorithm . . . . . . . . . . . . . . . . . . . 35 91 10.2. Message identifiers . . . . . . . . . . . . . . . . . . 36 92 10.2.1. Primary ID (required) . . . . . . . . . . . . . . . 36 93 10.2.2. Secondary ID (optional) . . . . . . . . . . . . . . 36 94 10.3. Algorithm parameters . . . . . . . . . . . . . . . . . . 36 95 10.4. Algorithm requirements . . . . . . . . . . . . . . . . . 36 96 10.5. Algorithm limitations . . . . . . . . . . . . . . . . . 37 97 10.6. Workspace . . . . . . . . . . . . . . . . . . . . . . . 37 98 10.7. Output . . . . . . . . . . . . . . . . . . . . . . . . . 37 99 10.8. Post processing . . . . . . . . . . . . . . . . . . . . 37 100 11. Implementation guidance . . . . . . . . . . . . . . . . . . . 37 101 11.1. Optional data . . . . . . . . . . . . . . . . . . . . . 38 102 11.2. Trailing data in TCP . . . . . . . . . . . . . . . . . . 38 103 11.3. Limiting collection of RDATA . . . . . . . . . . . . . . 38 104 12. Implementation status . . . . . . . . . . . . . . . . . . . . 38 105 12.1. DNS-STATS Compactor . . . . . . . . . . . . . . . . . . 39 106 13. IANA considerations . . . . . . . . . . . . . . . . . . . . . 39 107 14. Security considerations . . . . . . . . . . . . 
. . . . . . . 39 108 15. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 39 109 16. Changelog . . . . . . . . . . . . . . . . . . . . . . . . . . 40 110 17. References . . . . . . . . . . . . . . . . . . . . . . . . . 42 111 17.1. Normative References . . . . . . . . . . . . . . . . . . 42 112 17.2. Informative References . . . . . . . . . . . . . . . . . 42 113 17.3. URIs . . . . . . . . . . . . . . . . . . . . . . . . . . 43 114 Appendix A. CDDL . . . . . . . . . . . . . . . . . . . . . . . . 45 115 Appendix B. DNS Name compression example . . . . . . . . . . . . 54 116 B.1. NSD compression algorithm . . . . . . . . . . . . . . . . 55 117 B.2. Knot Authoritative compression algorithm . . . . . . . . 56 118 B.3. Observed differences . . . . . . . . . . . . . . . . . . 56 119 Appendix C. Comparison of Binary Formats . . . . . . . . . . . . 56 120 C.1. Comparison with full PCAP files . . . . . . . . . . . . . 59 121 C.2. Simple versus block coding . . . . . . . . . . . . . . . 60 122 C.3. Binary versus text formats . . . . . . . . . . . . . . . 60 123 C.4. Performance . . . . . . . . . . . . . . . . . . . . . . . 60 124 C.5. Conclusions . . . . . . . . . . . . . . . . . . . . . . . 61 125 C.6. Block size choice . . . . . . . . . . . . . . . . . . . . 61 126 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 62 128 1. Introduction 130 There has long been a need to collect DNS queries and responses on 131 authoritative and recursive name servers for monitoring and analysis. 132 This data is used in a number of ways including traffic monitoring, 133 analyzing network attacks and "day in the life" (DITL) [ditl] 134 analysis. 136 A wide variety of tools already exist that facilitate the collection 137 of DNS traffic data, such as DSC [dsc], packetq [packetq], dnscap 138 [dnscap] and dnstap [dnstap]. However, there is no standard exchange 139 format for large DNS packet captures. 
The PCAP [pcap] or PCAP-NG 140 [pcapng] formats are typically used in practice for packet captures, 141 but these file formats can contain a great deal of additional 142 information that is not directly pertinent to DNS traffic analysis 143 and thus unnecessarily increases the capture file size. 145 There has also been work on using text based formats to describe DNS 146 packets such as [I-D.daley-dnsxml], [I-D.hoffman-dns-in-json], but 147 these are largely aimed at producing convenient representations of 148 single messages. 150 Many DNS operators may receive hundreds of thousands of queries per 151 second on a single name server instance so a mechanism to minimize 152 the storage size (and therefore upload overhead) of the data 153 collected is highly desirable. 155 The format described in this document, C-DNS (Compacted-DNS), 156 focusses on the problem of capturing and storing large packet capture 157 files of DNS traffic, with the following goals in mind: 159 o Minimize the file size for storage and transmission 161 o Minimize the overhead of producing the packet capture file and 162 the cost of any further (general purpose) compression of the file 164 This document contains: 166 o A discussion of some common use cases in which such DNS data 167 is collected (Section 3) 169 o A discussion of the major design considerations in developing an 170 efficient data representation for collections of DNS messages 171 (Section 4) 173 o A description of why CBOR [RFC7049] was chosen for this format 174 (Section 5) 176 o A conceptual overview of the C-DNS format (Section 6) 178 o The definition of the C-DNS format for the collection of DNS 179 messages (Section 7) 181 o Notes on converting C-DNS data to PCAP format (Section 9) 183 o Some high level implementation considerations for applications 184 designed to produce C-DNS (Section 10) 186 2. 
Terminology 188 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 189 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 190 document are to be interpreted as described in [RFC2119]. 192 "Packet" refers to individual IPv4 or IPv6 packets. Typically these 193 are UDP, but may be constructed from a TCP packet. "Message", unless 194 otherwise qualified, refers to a DNS payload extracted from a UDP or 195 TCP data stream. 197 The parts of DNS messages are named as they are in [RFC1035]. In 198 specific, the DNS message has five sections: Header, Question, 199 Answer, Authority, and Additional. 201 Pairs of DNS messages are called a Query and a Response. 203 3. Data collection use cases 205 In an ideal world, it would be optimal to collect full packet 206 captures of all packets going in or out of a name server. However, 207 there are several design choices or other limitations that are common 208 to many DNS installations and operators. 210 o DNS servers are hosted in a variety of situations 212 * Self-hosted servers 214 * Third party hosting (including multiple third parties) 216 * Third party hardware (including multiple third parties) 218 o Data is collected under different conditions 220 * On well-provisioned servers running in a steady state 222 * On heavily loaded servers 224 * On virtualized servers 226 * On servers that are under DoS attack 228 * On servers that are unwitting intermediaries in DoS attacks 230 o Traffic can be collected via a variety of mechanisms 232 * On the same hardware as the name server itself 234 * Using a network tap on an adjacent host to listen to DNS 235 traffic 237 * Using port mirroring to listen from another host 239 o The capabilities of data collection (and upload) networks vary 240 * Out-of-band networks with the same capacity as the in-band 241 network 243 * Out-of-band networks with less capacity than the in-band 244 network 246 * Everything being on the in-band network 248 Thus, there is a 
wide range of use cases from very limited data 249 collection environments (third party hardware, servers that are under 250 attack, packet capture on the name server itself and no out-of-band 251 network) to "limitless" environments (self hosted, well provisioned 252 servers, using a network tap or port mirroring with an out-of-band 253 network with the same capacity as the in-band network). In the 254 former case, it is infeasible to reliably collect full packet captures, 255 especially if the server is under attack. In the latter case, 256 collection of full packet captures may be reasonable. 258 As a result of these restrictions, the C-DNS data format was designed 259 with the most limited use case in mind such that: 261 o data collection will occur on the same hardware as the name server 262 itself 264 o collected data will be stored on the same hardware as the name 265 server itself, at least temporarily 267 o collected data being returned to some central analysis system will 268 use the same network interface as the DNS queries and responses 270 o there can be multiple third party servers involved 272 Because of these considerations, a major factor in the design of the 273 format is minimal storage size of the capture files. 275 Another significant consideration for any application that records 276 DNS traffic is that the running of the name server software and the 277 transmission of DNS queries and responses are the most important jobs 278 of a name server; capturing data is not. Any data collection system 279 co-located with the name server needs to be intelligent enough to 280 carefully manage its CPU, disk, memory and network utilization. This 281 leads to designing a format that requires a relatively low overhead 282 to produce and minimizes the requirement for further potentially 283 costly compression. 285 However, it was also essential that interoperability with less 286 restricted infrastructure was maintained. 
In particular, it is 287 highly desirable that the collection format should facilitate the re- 288 creation of common formats (such as PCAP) that are as close to the 289 original as is realistic given the restrictions above. 291 4. Design considerations 293 This section presents some of the major design considerations used in 294 the development of the C-DNS format. 296 1. The basic unit of data is a combined DNS Query and the associated 297 Response (a "Q/R data item"). The same structure will be used 298 for unmatched Queries and Responses. Queries without Responses 299 will be captured omitting the response data. Responses without 300 queries will be captured omitting the Query data (but using the 301 Question section from the response, if present, as an identifying 302 QNAME). 304 * Rationale: A Query and Response pair represents the basic level of 305 a client's interaction with the server. Also, combining the 306 Query and Response into one item often reduces storage 307 requirements due to commonality in the data of the two 308 messages. 310 2. All top level fields in each Q/R data item will be optional. 312 * Rationale: Different users will have different requirements 313 for data to be available for analysis. Users with minimal 314 requirements should not have to pay the cost of recording full 315 data; however, this will limit the ability to perform certain 316 kinds of data analysis and also to reconstruct packet captures. 317 For example, omitting the resource records from a Response 318 will reduce the C-DNS file size, and in principle responses 319 can be synthesized if there is enough context. 321 3. Multiple Q/R data items will be collected into blocks in the 322 format. Common data in a block will be abstracted and referenced 323 from individual Q/R data items by indexing. The maximum number 324 of Q/R data items in a block will be configurable. 
326 * Rationale: This blocking and indexing provides a significant 327 reduction in the volume of file data generated. Although this 328 introduces complexity, it provides compression of the data 329 that makes use of knowledge of the DNS message structure. 331 * It is anticipated that the files produced can be subject to 332 further compression using general purpose compression tools. 333 Measurements show that blocking significantly reduces the CPU 334 required to perform such strong compression. See 335 Appendix C.2. 337 * [TODO: Further discussion of commonality between DNS messages 338 e.g. common query signatures, a finite set of valid responses 339 from authoritatives] 341 4. Traffic metadata can optionally be included in each block. 342 Specifically, counts of some types of non-DNS packets (e.g. 343 ICMP, TCP resets) sent to the server may be of interest. 345 5. The wire format content of malformed DNS messages can optionally 346 be recorded. 348 * Rationale: Any structured capture format that does not capture 349 the DNS payload byte for byte will be limited to some extent 350 in that it cannot represent "malformed" DNS messages (see 351 Section 8). Only those messages that can be fully parsed and 352 transformed into the structured format can be fully 353 represented. Therefore it can greatly aid downstream analysis 354 to have the wire format of the malformed DNS messages 355 available directly in the C-DNS file. Note, however, this can 356 result in rather misleading statistics. For example, a 357 malformed query which cannot be represented in the C-DNS 358 format will lead to the (well formed) DNS responses with error 359 code FORMERR appearing as 'unmatched'. 361 5. Choice of CBOR 363 This document presents a detailed format description using CBOR, the 364 Concise Binary Object Representation defined in [RFC7049]. 366 The choice of CBOR was made taking a number of factors into account. 
368 o CBOR is a binary representation, and thus is economical in storage 369 space. 371 o Other binary representations were investigated, and whilst all had 372 attractive features, none had a significant advantage over CBOR. 373 See Appendix C for some discussion of this. 375 o CBOR is an IETF standard and familiar to IETF participants. It is 376 based on the now-common ideas of lists and objects, and thus 377 requires very little familiarization for those in the wider 378 industry. 380 o CBOR is a simple format, and can easily be implemented from 381 scratch if necessary. More complex formats require library 382 support which may present problems on unusual platforms. 384 o CBOR can also be easily converted to text formats such as JSON 385 ([RFC7159]) for debugging and other human inspection requirements. 387 o CBOR data schemas can be described using CDDL 388 [I-D.ietf-cbor-cddl]. 390 6. C-DNS format conceptual overview 392 The following figures show purely schematic representations of the 393 C-DNS format to convey the high-level structure of the C-DNS format. 394 Section 7 provides a detailed discussion of the CBOR representation 395 and individual elements. 397 Figure showing the C-DNS format (PNG) [1] 399 Figure showing the C-DNS format (SVG) [2] 401 Figure showing the Query/Response data item and Block Tables format 402 (PNG) [3] 404 Figure showing the Query/Response item and Block Tables format (SVG) 405 [4] 407 A C-DNS file begins with a file header containing a File Type 408 Identifier and a File Preamble. The File Preamble contains 409 information on the file Format Version and an array of Block 410 Parameters items (the contents of which include Collection and 411 Storage Parameters used for one or more blocks). 413 The file header is followed by a series of data Blocks. 
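As an informal illustration of the layout just described, the following Python sketch models a file header (File Type Identifier plus File Preamble) followed by an array of Blocks. The class and function names ("FilePreamble", "CdnsFile", "check_header") are assumptions made for this example and are not part of the format. The string "C-DNS", the version numbers 1 and 0, and the requirement for at least one Block Parameters item are taken from Sections 7.3 and 7.4. The actual encoding is CBOR with integer map keys defined in the CDDL, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class FilePreamble:
    # Format Version of the file (major and minor parts).
    major_format_version: int
    minor_format_version: int
    # Array of Block Parameters items; their contents are left opaque here.
    block_parameters: List[Dict[str, Any]]


@dataclass
class CdnsFile:
    # The file header: a File Type Identifier and a File Preamble.
    file_type_id: str
    file_preamble: FilePreamble
    # The series of data Blocks that follows the header (opaque here).
    file_blocks: List[Dict[str, Any]] = field(default_factory=list)


def check_header(f: CdnsFile) -> List[str]:
    """Return a list of problems found in the file header (empty if none)."""
    problems = []
    if f.file_type_id != "C-DNS":
        problems.append("unexpected file type identifier")
    if not f.file_preamble.block_parameters:
        problems.append("File Preamble has no Block Parameters items")
    return problems


# A header-only file: one (empty, placeholder) Block Parameters item, no Blocks.
example = CdnsFile(
    file_type_id="C-DNS",
    file_preamble=FilePreamble(1, 0, block_parameters=[{}]),
)
```

A consumer following this sketch would call "check_header" before touching any Block, so that a file of the wrong type or with an unusable preamble is rejected early.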
415 A Block consists of a Block Preamble item, some Block Statistics for 416 the traffic stored within the Block and then various arrays of common 417 data collectively called the Block Tables. This is then followed by 418 an array of the Query/Response data items detailing the queries and 419 responses stored within the Block. The array of Query/Response data 420 items is in turn followed by the Address/Event Counts data items (an 421 array of per-client counts of particular IP events) and then 422 Malformed Message data items (an array of malformed messages that are 423 stored in the Block). 425 The exact nature of the DNS data will affect what block size is the 426 best fit; however, sample data for a root server indicated that block 427 sizes up to 10,000 Q/R data items give good results. See 428 Appendix C.6 for more details. 430 6.1. Block Parameters 432 The details of the Block Parameters items are not shown in the 433 diagrams but are discussed here for context. 435 An array of Block Parameters items is stored in the File Preamble 436 (with a minimum of one item at index 0); a Block Parameters item 437 consists of a collection of Storage and Collection Parameters that 438 applies to any given Block. An array is used in order to support use 439 cases such as wanting to merge C-DNS files from different sources. 440 The Block Preamble item then contains an optional index for the Block 441 Parameters item that applies for that Block; if not present the index 442 defaults to 0. Hence, in effect, a global Block Parameters item is 443 defined which can then be overridden per Block. 445 6.2. Storage Parameters 447 The Block Parameters item includes a Storage Parameters item - this 448 contains information about the specific data fields stored in the 449 C-DNS file. 451 These parameters include: 453 o The sub-second timing resolution used by the data. 455 o Information (hints) on which optional data items can be expected 456 to appear in the data. See Section 6.2.1. 
458 o Recorded OPCODES and RR types. See Section 6.2.2. 460 o Flags indicating, for example, whether the data is sampled or 461 anonymised. See Section 6.2.3. 463 o Client and server IPv4 and IPv6 address prefixes. See 464 Section 6.2.4. 466 6.2.1. Optional data items 468 To enable applications to store data to their precise requirements in 469 as space-efficient a manner as possible, all fields in the following 470 arrays are optional: 472 o Query/Response 474 o Query Signature 476 o Malformed messages 477 In other words, an application can choose to omit any data item that 478 is not required for its use case. In addition, implementations may 479 be configured to not record all RRs, or only record messages with 480 certain OPCODES. 482 This does, however, mean that a consumer of a C-DNS file faces two 483 problems: 485 1. How can it quickly determine whether a file contains the data 486 items it requires to complete a particular task (e.g. 487 reconstructing query traffic or performing a specific piece of 488 data analysis)? 490 2. How can it determine if a data item is not present because it was 492 * explicitly not recorded, or 494 * either was not present in the original data stream or the data 495 item was not available to the collecting application? 497 For example, an application capturing C-DNS data from within a 498 nameserver implementation is unlikely to be able to record the Client 499 Hoplimit. Or, if there is no query ARCount recorded and no query OPT 500 RDATA recorded, is that because no query contained an OPT RR, or 501 because that data was not stored? 503 The Storage Parameters therefore also contains a Storage Hints item 504 which specifies whether the encoder of the file recorded each data 505 item if it was present. An application decoding that file can then 506 use these to quickly determine whether the input data is rich enough 507 for its needs. 509 QUESTION: Should the items within certain tables also be optional 510 e.g. 
within the RR table should all of Name index, ClassType, TTL and 511 RDATA be optional? 513 6.2.2. Optional RRs and OPCODES 515 Also included in the Storage Parameters is an explicit array of the 516 RR types and OPCODES that were recorded. Using an explicit array 517 removes any ambiguity about whether the OPCODE/RR type was not 518 recognised by the collecting implementation or whether it was 519 specifically configured not to record it. 521 In the case of RR records, each record must be parsable, including 522 parsing the record RDATA, to determine whether it is correctly 523 formed. Otherwise it has to be regarded as at least potentially 524 partially malformed. See Section 8 for further discussion of storing 525 partially parsed messages. 527 For the case of unrecognised OPCODES the message may be parsable (for 528 example, if it has a format similar enough to the one described in 529 [RFC1035]) or it may not. See Section 8 for further discussion of 530 storing partially parsed messages. 532 6.2.3. Storage flags 534 The Storage Parameters contains flags that can be used to indicate 535 if: 537 o the data is anonymised, 539 o the data is produced from sample data, or 541 o names in the data have been normalised (converted to uniform 542 case). 544 The Storage Parameters also contains optional fields holding details 545 of the sampling method used and the anonymisation method used. It is 546 RECOMMENDED these fields contain URIs pointing to resources 547 describing the methods used. 549 6.2.4. IP Address storage 551 The format contains fields to indicate if only IP prefixes were 552 stored. If IP address prefixes are given, only the prefix bits of 553 addresses are stored. For example, if a client IPv4 prefix of 16 is 554 specified, a client address of 192.0.2.1 will be stored as 0xc000 555 (192.0), reducing address storage space requirements. 557 7. C-DNS format detailed description 559 The CDDL definition for the C-DNS format is given in Appendix A. 561 7.1. 
Map quantities and indexes 563 All map keys are integers with values specified in the CDDL. String 564 keys would significantly bloat the file size. 566 All key values specified are positive integers under 24, so their 567 CBOR representation is a single byte. Positive integer values not 568 currently used as keys in a map are reserved for use in future 569 standard extensions. 571 Implementations may choose to add additional implementation-specific 572 entries to any map. Negative integer map keys are reserved for these 573 values. Key values from -1 to -24 also have a single byte CBOR 574 representation, so such implementation-specific extensions are not at 575 any space efficiency disadvantage. 577 An item described as an index is the index of the data item in the 578 referenced array. Indexes are 0-based. 580 7.2. Tabular representation 582 The following sections present the C-DNS specification in tabular 583 format with a detailed description of each item. 585 In all quantities that contain bit flags, bit 0 indicates the least 586 significant bit, i.e. flag "n" in quantity "q" is on if "(q & (1 << 587 n)) != 0". 589 For the sake of readability, all type and field names defined in the 590 CDDL definition are shown in double quotes. Type names are by 591 convention camel case (e.g. "BlockTable"), field names are lower- 592 case with hyphens (e.g. "block-tables"). 594 For the sake of brevity, the following conventions are used in the 595 tables: 597 o The column O marks whether items in a map are optional. 599 * O - Optional. The item may be omitted. 601 * M - Mandatory. The item must be present. 603 o The column T gives the CBOR data type of the item. 605 * U - Unsigned integer 607 * I - Signed integer 609 * B - Byte string 611 * T - Text string 613 * M - Map 615 * A - Array 617 In the case of maps and arrays, more information on the type of each 618 value, including the CDDL definition name if applicable, is given in 619 the description. 621 7.3. 
"File" 623 A C-DNS file has an outer structure "File", a map that contains the 624 following: 626 +---------------+---+---+-------------------------------------------+ 627 | Field | O | T | Description | 628 +---------------+---+---+-------------------------------------------+ 629 | file-type-id | M | T | String "C-DNS" identifying the file type. | 630 | | | | | 631 | file-preamble | M | M | Version and parameter information for the | 632 | | | | whole file. Map of type "FilePreamble", | 633 | | | | see Section 7.4. | 634 | | | | | 635 | file-blocks | M | A | Array of items of type "Block", see | 636 | | | | Section 7.5. The array may be empty if | 637 | | | | the file contains no data. | 638 +---------------+---+---+-------------------------------------------+ 640 7.4. "FilePreamble" 642 Information about data in the file. A map containing the following: 644 +----------------------+---+---+------------------------------------+ 645 | Field | O | T | Description | 646 +----------------------+---+---+------------------------------------+ 647 | major-format-version | M | U | Unsigned integer '1'. The major | 648 | | | | version of format used in file. | 649 | | | | | 650 | minor-format-version | M | U | Unsigned integer '0'. The minor | 651 | | | | version of format used in file. | 652 | | | | | 653 | private-version | O | U | Version indicator available for | 654 | | | | private use by applications. | 655 | | | | | 656 | block-parameters | M | A | Array of items of type | 657 | | | | "BlockParameters", see Section | 658 | | | | 7.4.1. The array must contain at | 659 | | | | least one entry. (The "block- | 660 | | | | parameters-index" item in each | 661 | | | | "BlockPreamble" indicates which | 662 | | | | array entry applies to that | 663 | | | | "Block".) | 664 +----------------------+---+---+------------------------------------+ 666 7.4.1. "BlockParameters" 668 Parameters relating to data storage and collection which apply to one 669 or more items of type "Block". 
   A map containing the following:

   +-----------------------+---+---+-----------------------------------+
   | Field                 | O | T | Description                       |
   +-----------------------+---+---+-----------------------------------+
   | storage-parameters    | M | M | Parameters relating to data       |
   |                       |   |   | storage in a "Block" item. Map    |
   |                       |   |   | of type "StorageParameters", see  |
   |                       |   |   | Section 7.4.1.1.                  |
   |                       |   |   |                                   |
   | collection-parameters | O | M | Parameters relating to collection |
   |                       |   |   | of the data in a "Block" item.    |
   |                       |   |   | Map of type                       |
   |                       |   |   | "CollectionParameters", see       |
   |                       |   |   | Section 7.4.2.                    |
   +-----------------------+---+---+-----------------------------------+

7.4.1.1. "StorageParameters"

   Parameters relating to how data is stored in the items of type
   "Block". A map containing the following:

   +------------------+---+---+----------------------------------------+
   | Field            | O | T | Description                            |
   +------------------+---+---+----------------------------------------+
   | ticks-per-second | M | U | Sub-second timing is recorded in       |
   |                  |   |   | ticks. This specifies the number of    |
   |                  |   |   | ticks in a second.                     |
   |                  |   |   |                                        |
   | max-block-items  | M | U | The maximum number of items stored in  |
   |                  |   |   | any of the arrays in a "Block" item    |
   |                  |   |   | (Q/R items, address event counts or    |
   |                  |   |   | malformed messages). An indication to  |
   |                  |   |   | a decoder of the resources needed to   |
   |                  |   |   | process the file.                      |
   |                  |   |   |                                        |
   | storage-hints    | M | M | Collection of hints as to which fields |
   |                  |   |   | are present in the arrays that have    |
   |                  |   |   | optional fields. Map of type           |
   |                  |   |   | "StorageHints", see Section 7.4.1.1.1. |
   |                  |   |   |                                        |
   | opcodes          | M | A | Array of OPCODES (unsigned integers)   |
   |                  |   |   | recorded by the collection             |
   |                  |   |   | application. See Section 6.2.2.        |
   |                  |   |   |                                        |
   | rr-types         | M | A | Array of RR types (unsigned integers)  |
   |                  |   |   | recorded by the collection             |
   |                  |   |   | application. See Section 6.2.2.        |
   |                  |   |   |                                        |
   | storage-flags    | O | U | Bit flags indicating attributes of     |
   |                  |   |   | stored data.                           |
   |                  |   |   | Bit 0. The data has been anonymised.   |
   |                  |   |   | Bit 1. The data is sampled data.       |
   |                  |   |   | Bit 2. Names have been normalised      |
   |                  |   |   | (converted to uniform case).           |
   |                  |   |   |                                        |
   | client-address   | O | U | IPv4 client address prefix length. If  |
   | -prefix-ipv4     |   |   | specified, only the address prefix     |
   |                  |   |   | bits are stored.                       |
   |                  |   |   |                                        |
   | client-address   | O | U | IPv6 client address prefix length. If  |
   | -prefix-ipv6     |   |   | specified, only the address prefix     |
   |                  |   |   | bits are stored.                       |
   |                  |   |   |                                        |
   | server-address   | O | U | IPv4 server address prefix length. If  |
   | -prefix-ipv4     |   |   | specified, only the address prefix     |
   |                  |   |   | bits are stored.                       |
   |                  |   |   |                                        |
   | server-address   | O | U | IPv6 server address prefix length. If  |
   | -prefix-ipv6     |   |   | specified, only the address prefix     |
   |                  |   |   | bits are stored.                       |
   |                  |   |   |                                        |
   | sampling-method  | O | T | Information on the sampling method     |
   |                  |   |   | used. See Section 6.2.3.               |
   |                  |   |   |                                        |
   | anonymisation    | O | T | Information on the anonymisation       |
   | -method          |   |   | method used. See Section 6.2.3.        |
   +------------------+---+---+----------------------------------------+

7.4.1.1.1. "StorageHints"

   An indicator of which fields the collecting application stores in the
   arrays with optional fields.
   A map containing the following:

   +------------------+---+---+----------------------------------------+
   | Field            | O | T | Description                            |
   +------------------+---+---+----------------------------------------+
   | query-response   | M | U | Hints indicating which "QueryResponse" |
   | -hints           |   |   | fields are stored, see Section 7.6. If |
   |                  |   |   | the field is stored the bit is set.    |
   |                  |   |   | Bit 0. time-offset                     |
   |                  |   |   | Bit 1. client-address-index            |
   |                  |   |   | Bit 2. client-port                     |
   |                  |   |   | Bit 3. transaction-id                  |
   |                  |   |   | Bit 4. qr-signature-index              |
   |                  |   |   | Bit 5. client-hoplimit                 |
   |                  |   |   | Bit 6. response-delay                  |
   |                  |   |   | Bit 7. query-name-index                |
   |                  |   |   | Bit 8. query-size                      |
   |                  |   |   | Bit 9. response-size                   |
   |                  |   |   | Bit 10. response-processing-data       |
   |                  |   |   | Bit 11. query-question-sections        |
   |                  |   |   | Bit 12. query-answer-sections          |
   |                  |   |   | Bit 13. query-authority-sections       |
   |                  |   |   | Bit 14. query-additional-sections      |
   |                  |   |   | Bit 15. response-answer-sections       |
   |                  |   |   | Bit 16. response-authority-sections    |
   |                  |   |   | Bit 17. response-additional-sections   |
   |                  |   |   |                                        |
   | query-response   | M | U | Hints indicating which                 |
   | -signature-hints |   |   | "QueryResponseSignature" fields are    |
   |                  |   |   | stored, see Section 7.5.3.2. If the    |
   |                  |   |   | field is stored the bit is set.        |
   |                  |   |   | Bit 0. server-address                  |
   |                  |   |   | Bit 1. server-port                     |
   |                  |   |   | Bit 2. qr-transport-flags              |
   |                  |   |   | Bit 3. qr-type                         |
   |                  |   |   | Bit 4. qr-sig-flags                    |
   |                  |   |   | Bit 5. query-opcode                    |
   |                  |   |   | Bit 6. dns-flags                       |
   |                  |   |   | Bit 7. query-rcode                     |
   |                  |   |   | Bit 8. query-class-type                |
   |                  |   |   | Bit 9. query-qdcount                   |
   |                  |   |   | Bit 10. query-ancount                  |
   |                  |   |   | Bit 11. query-nscount                  |
   |                  |   |   | Bit 12. query-arcount                  |
   |                  |   |   | Bit 13. query-edns-version             |
   |                  |   |   | Bit 14. query-udp-size                 |
   |                  |   |   | Bit 15. query-opt-rdata                |
   |                  |   |   | Bit 16. response-rcode                 |
   |                  |   |   |                                        |
   | rr-hints         | M | U | Hints indicating which optional "RR"   |
   |                  |   |   | fields are stored, see Section         |
   |                  |   |   | 7.5.3.4. If the data type is stored    |
   |                  |   |   | the bit is set.                        |
   |                  |   |   | Bit 0. ttl                             |
   |                  |   |   |                                        |
   | other-data-hints | M | U | Hints indicating which other data      |
   |                  |   |   | types are stored. If the data type is  |
   |                  |   |   | stored the bit is set.                 |
   |                  |   |   | Bit 0. malformed-messages              |
   |                  |   |   | Bit 1. address-event-counts            |
   +------------------+---+---+----------------------------------------+

   TODO: Revise non-QueryResponse hints to cover optional fields in
   malformed message data maps.

7.4.2. "CollectionParameters"

   Parameters relating to how data in the file was collected.

   These parameters have no default. If they do not appear, nothing can
   be inferred about their value.

   A map containing the following items:

   +------------------+---+---+----------------------------------------+
   | Field            | O | T | Description                            |
   +------------------+---+---+----------------------------------------+
   | query-timeout    | O | U | To be matched with a query, a response |
   |                  |   |   | must arrive within this number of      |
   |                  |   |   | seconds.                               |
   |                  |   |   |                                        |
   | skew-timeout     | O | U | The network stack may report a         |
   |                  |   |   | response before the corresponding      |
   |                  |   |   | query. A response is not considered to |
   |                  |   |   | be missing a query until after this    |
   |                  |   |   | many microseconds.                     |
   |                  |   |   |                                        |
   | snaplen          | O | U | Collect up to this many bytes per      |
   |                  |   |   | packet.                                |
   |                  |   |   |                                        |
   | promisc          | O | U | 1 if promiscuous mode was enabled on   |
   |                  |   |   | the interface, 0 otherwise.            |
   |                  |   |   |                                        |
   | interfaces       | O | A | Array of identifiers (of type text     |
   |                  |   |   | string) of the interfaces used for     |
   |                  |   |   | collection.                            |
   |                  |   |   |                                        |
   | server-addresses | O | A | Array of server collection IP          |
   |                  |   |   | addresses (of type byte string). Hint  |
   |                  |   |   | for downstream analysers; does not     |
   |                  |   |   | affect collection.                     |
   |                  |   |   |                                        |
   | vlan-ids         | O | A | Array of identifiers (of type unsigned |
   |                  |   |   | integer) of VLANs selected for         |
   |                  |   |   | collection.                            |
   |                  |   |   |                                        |
   | filter           | O | T | "tcpdump" [pcap] style filter for      |
   |                  |   |   | input.                                 |
   |                  |   |   |                                        |
   | generator-id     | O | T | String identifying the collection      |
   |                  |   |   | method.                                |
   |                  |   |   |                                        |
   | host-id          | O | T | String identifying the collecting      |
   |                  |   |   | host. Empty if converting an existing  |
   |                  |   |   | packet capture file.                   |
   +------------------+---+---+----------------------------------------+

7.5. "Block"

   Container for data with common collection and storage parameters.
   A map containing the following:

   +--------------------+---+---+--------------------------------------+
   | Field              | O | T | Description                          |
   +--------------------+---+---+--------------------------------------+
   | block-preamble     | M | M | Overall information for the "Block"  |
   |                    |   |   | item. Map of type "BlockPreamble",   |
   |                    |   |   | see Section 7.5.1.                   |
   |                    |   |   |                                      |
   | block-statistics   | O | M | Statistics about the "Block" item.   |
   |                    |   |   | Map of type "BlockStatistics", see   |
   |                    |   |   | Section 7.5.2.                       |
   |                    |   |   |                                      |
   | block-tables       | O | M | The arrays containing data           |
   |                    |   |   | referenced by individual             |
   |                    |   |   | "QueryResponse" or                   |
   |                    |   |   | "MalformedMessage" items. Map of     |
   |                    |   |   | type "BlockTables", see Section      |
   |                    |   |   | 7.5.3.                               |
   |                    |   |   |                                      |
   | query-responses    | O | A | Details of individual DNS Q/R data   |
   |                    |   |   | items. Array of items of type        |
   |                    |   |   | "QueryResponse", see Section 7.6. If |
   |                    |   |   | present, the array must not be       |
   |                    |   |   | empty.                               |
   |                    |   |   |                                      |
   | address-event      | O | A | Per client counts of ICMP messages   |
   | -counts            |   |   | and TCP resets. Array of items of    |
   |                    |   |   | type "AddressEventCount", see        |
   |                    |   |   | Section 7.7. If present, the array   |
   |                    |   |   | must not be empty.                   |
   |                    |   |   |                                      |
   | malformed-messages | O | A | Details of malformed DNS messages.   |
   |                    |   |   | Array of items of type               |
   |                    |   |   | "MalformedMessage", see Section 7.8. |
   |                    |   |   | If present, the array must not be    |
   |                    |   |   | empty.                               |
   +--------------------+---+---+--------------------------------------+

7.5.1. "BlockPreamble"

   Overall information for a "Block" item. A map containing the
   following:

   +------------------+---+---+----------------------------------------+
   | Field            | O | T | Description                            |
   +------------------+---+---+----------------------------------------+
   | earliest-time    | O | A | A timestamp (2 unsigned integers,      |
   |                  |   |   | "Timestamp") for the earliest record   |
   |                  |   |   | in the "Block" item. The first integer |
   |                  |   |   | is the number of seconds since the     |
   |                  |   |   | POSIX epoch ("time_t"). The second     |
   |                  |   |   | integer is the number of ticks since   |
   |                  |   |   | the start of the second. This          |
   |                  |   |   | timestamp can only be omitted if all   |
   |                  |   |   | block items containing a time offset   |
   |                  |   |   | from the start of the block also omit  |
   |                  |   |   | the timestamp.                         |
   |                  |   |   |                                        |
   | block-parameters | O | U | The index of the item in the "block-   |
   | -index           |   |   | parameters" array (in the "file-       |
   |                  |   |   | preamble" item) applicable to this     |
   |                  |   |   | block. If not present, index 0 is      |
   |                  |   |   | used. See Section 7.4.1.               |
   +------------------+---+---+----------------------------------------+

7.5.2. "BlockStatistics"

   Basic statistical information about a "Block" item.
   A map containing the following:

   +---------------------+---+---+-------------------------------------+
   | Field               | O | T | Description                         |
   +---------------------+---+---+-------------------------------------+
   | total-messages      | O | U | Total number of DNS messages        |
   |                     |   |   | processed from the input traffic    |
   |                     |   |   | stream during collection of data in |
   |                     |   |   | this "Block" item.                  |
   |                     |   |   |                                     |
   | total-pairs         | O | U | Total number of Q/R data items in   |
   |                     |   |   | this "Block" item.                  |
   |                     |   |   |                                     |
   | unmatched-queries   | O | U | Number of unmatched queries in this |
   |                     |   |   | "Block" item.                       |
   |                     |   |   |                                     |
   | unmatched-responses | O | U | Number of unmatched responses in    |
   |                     |   |   | this "Block" item.                  |
   |                     |   |   |                                     |
   | malformed-messages  | O | U | Number of malformed messages found  |
   |                     |   |   | in input for this "Block" item.     |
   +---------------------+---+---+-------------------------------------+

7.5.3. "BlockTables"

   Arrays containing data referenced by individual "QueryResponse" or
   "MalformedMessage" items in this "Block". Each element is an array
   which, if present, must not be empty.

   An item in the "qlist" array contains indexes to values in the "qrr"
   array. Therefore, if "qlist" is present, "qrr" must also be present.
   Similarly, if "rrlist" is present, "rr" must also be present.

   The map contains the following items:

   +-------------------+---+---+---------------------------------------+
   | Field             | O | T | Description                           |
   +-------------------+---+---+---------------------------------------+
   | ip-address        | O | A | Array of IP addresses, in network     |
   |                   |   |   | byte order (of type byte string). If  |
   |                   |   |   | client or server address prefixes are |
   |                   |   |   | set, only the address prefix bits are |
   |                   |   |   | stored. Each string is therefore up   |
   |                   |   |   | to 4 bytes long for an IPv4 address,  |
   |                   |   |   | or up to 16 bytes long for an IPv6    |
   |                   |   |   | address. See Section 7.4.1.1.         |
   |                   |   |   |                                       |
   | classtype         | O | A | Array of RR class and type            |
   |                   |   |   | information. Type is "ClassType", see |
   |                   |   |   | Section 7.5.3.1.                      |
   |                   |   |   |                                       |
   | name-rdata        | O | A | Array where each entry is the         |
   |                   |   |   | contents of a single NAME or RDATA    |
   |                   |   |   | (of type byte string). Note that      |
   |                   |   |   | NAMEs, and labels within RDATA        |
   |                   |   |   | contents, are full domain names or    |
   |                   |   |   | labels; no DNS style name compression |
   |                   |   |   | is used on the individual             |
   |                   |   |   | names/labels within the format.       |
   |                   |   |   |                                       |
   | qr-sig            | O | A | Array of Q/R data item signatures.    |
   |                   |   |   | Type is "QueryResponseSignature", see |
   |                   |   |   | Section 7.5.3.2.                      |
   |                   |   |   |                                       |
   | qlist             | O | A | Array of type "QuestionList". A       |
   |                   |   |   | "QuestionList" is an array of         |
   |                   |   |   | unsigned integers, indexes to         |
   |                   |   |   | "Question" items in the "qrr" array.  |
   |                   |   |   |                                       |
   | qrr               | O | A | Array of type "Question". Each entry  |
   |                   |   |   | is the contents of a single question, |
   |                   |   |   | where a question is the second or     |
   |                   |   |   | subsequent question in a query. See   |
   |                   |   |   | Section 7.5.3.3.                      |
   |                   |   |   |                                       |
   | rrlist            | O | A | Array of type "RRList". An "RRList"   |
   |                   |   |   | is an array of unsigned integers,     |
   |                   |   |   | indexes to "RR" items in the "rr"     |
   |                   |   |   | array.                                |
   |                   |   |   |                                       |
   | rr                | O | A | Array of type "RR". Each entry is the |
   |                   |   |   | contents of a single RR. See Section  |
   |                   |   |   | 7.5.3.4.                              |
   |                   |   |   |                                       |
   | malformed-message | O | A | Array of the contents of malformed    |
   | -data             |   |   | messages. Array of type               |
   |                   |   |   | "MalformedMessageData", see Section   |
   |                   |   |   | 7.5.3.5.                              |
   +-------------------+---+---+---------------------------------------+

7.5.3.1. "ClassType"

   RR class and type information.
   A map containing the following:

   +-------+---+---+--------------+
   | Field | O | T | Description  |
   +-------+---+---+--------------+
   | type  | M | U | TYPE value.  |
   |       |   |   |              |
   | class | M | U | CLASS value. |
   +-------+---+---+--------------+

7.5.3.2. "QueryResponseSignature"

   Elements of a Q/R data item that are often common between multiple
   individual Q/R data items. A map containing the following:

   +--------------------+---+---+--------------------------------------+
   | Field              | O | T | Description                          |
   +--------------------+---+---+--------------------------------------+
   | server-address     | O | U | The index of the item in the "ip-    |
   | -index             |   |   | address" array of the server IP      |
   |                    |   |   | address. See Section 7.5.3.          |
   |                    |   |   |                                      |
   | server-port        | O | U | The server port.                     |
   |                    |   |   |                                      |
   | qr-transport-flags | O | U | Bit flags describing the transport   |
   |                    |   |   | used to service the query.           |
   |                    |   |   | Bit 0. IP version. 0 = IPv4, 1 =     |
   |                    |   |   | IPv6                                 |
   |                    |   |   | Bit 1-4. Transport. 0 = UDP, 1 =     |
   |                    |   |   | TCP, 2 = TLS, 3 = DTLS.              |
   |                    |   |   | Bit 5. Trailing bytes in query       |
   |                    |   |   | payload. The DNS query message in    |
   |                    |   |   | the UDP or TCP payload was followed  |
   |                    |   |   | by some additional bytes, which were |
   |                    |   |   | discarded.                           |
   |                    |   |   |                                      |
   | qr-type            | O | U | Type of Query/Response transaction.  |
   |                    |   |   | 0 = Stub. A query from a stub        |
   |                    |   |   | resolver.                            |
   |                    |   |   | 1 = Client. An incoming query to a   |
   |                    |   |   | recursive resolver.                  |
   |                    |   |   | 2 = Resolver. A query sent from a    |
   |                    |   |   | recursive resolver to an             |
   |                    |   |   | authoritative resolver.              |
   |                    |   |   | 3 = Authoritative. A query to an     |
   |                    |   |   | authoritative resolver.              |
   |                    |   |   | 4 = Forwarder. A query sent from a   |
   |                    |   |   | recursive resolver to an upstream    |
   |                    |   |   | recursive resolver.                  |
   |                    |   |   | 5 = Tool. A query sent to a server   |
   |                    |   |   | by a server tool.                    |
   |                    |   |   |                                      |
   | qr-sig-flags       | O | U | Bit flags indicating information     |
   |                    |   |   | present in this Q/R data item.       |
   |                    |   |   | Bit 0. 1 if a Query is present.      |
   |                    |   |   | Bit 1. 1 if a Response is present.   |
   |                    |   |   | Bit 2. 1 if one or more Questions    |
   |                    |   |   | are present.                         |
   |                    |   |   | Bit 3. 1 if a Query is present and   |
   |                    |   |   | it has an OPT Resource Record.       |
   |                    |   |   | Bit 4. 1 if a Response is present    |
   |                    |   |   | and it has an OPT Resource Record.   |
   |                    |   |   | Bit 5. 1 if a Response is present    |
   |                    |   |   | but has no Question.                 |
   |                    |   |   |                                      |
   | query-opcode       | O | U | Query OPCODE.                        |
   |                    |   |   |                                      |
   | qr-dns-flags       | O | U | Bit flags with values from the Query |
   |                    |   |   | and Response DNS flags. Flag values  |
   |                    |   |   | are 0 if the Query or Response is    |
   |                    |   |   | not present.                         |
   |                    |   |   | Bit 0. Query Checking Disabled (CD). |
   |                    |   |   | Bit 1. Query Authenticated Data      |
   |                    |   |   | (AD).                                |
   |                    |   |   | Bit 2. Query reserved (Z).           |
   |                    |   |   | Bit 3. Query Recursion Available     |
   |                    |   |   | (RA).                                |
   |                    |   |   | Bit 4. Query Recursion Desired (RD). |
   |                    |   |   | Bit 5. Query TrunCation (TC).        |
   |                    |   |   | Bit 6. Query Authoritative Answer    |
   |                    |   |   | (AA).                                |
   |                    |   |   | Bit 7. Query DNSSEC answer OK (DO).  |
   |                    |   |   | Bit 8. Response Checking Disabled    |
   |                    |   |   | (CD).                                |
   |                    |   |   | Bit 9. Response Authenticated Data   |
   |                    |   |   | (AD).                                |
   |                    |   |   | Bit 10. Response reserved (Z).       |
   |                    |   |   | Bit 11. Response Recursion Available |
   |                    |   |   | (RA).                                |
   |                    |   |   | Bit 12. Response Recursion Desired   |
   |                    |   |   | (RD).                                |
   |                    |   |   | Bit 13. Response TrunCation (TC).    |
   |                    |   |   | Bit 14. Response Authoritative       |
   |                    |   |   | Answer (AA).                         |
   |                    |   |   |                                      |
   | query-rcode        | O | U | Query RCODE. If the Query contains   |
   |                    |   |   | OPT, this value incorporates any     |
   |                    |   |   | EXTENDED_RCODE_VALUE.                |
   |                    |   |   |                                      |
   | query-classtype    | O | U | The index of the item in the         |
   | -index             |   |   | "classtype" array of the CLASS and   |
   |                    |   |   | TYPE of the first Question. See      |
   |                    |   |   | Section 7.5.3.                       |
   |                    |   |   |                                      |
   | query-qd-count     | O | U | The QDCOUNT in the Query, or         |
   |                    |   |   | Response if no Query present.        |
   |                    |   |   |                                      |
   | query-an-count     | O | U | Query ANCOUNT.                       |
   |                    |   |   |                                      |
   | query-ns-count     | O | U | Query NSCOUNT.                       |
   |                    |   |   |                                      |
   | query-ar-count     | O | U | Query ARCOUNT.                       |
   |                    |   |   |                                      |
   | edns-version       | O | U | The Query EDNS version.              |
   |                    |   |   |                                      |
   | udp-buf-size       | O | U | The Query EDNS sender's UDP payload  |
   |                    |   |   | size.                                |
   |                    |   |   |                                      |
   | opt-rdata-index    | O | U | The index in the "name-rdata" array  |
   |                    |   |   | of the OPT RDATA. See Section 7.5.3. |
   |                    |   |   |                                      |
   | response-rcode     | O | U | Response RCODE. If the Response      |
   |                    |   |   | contains OPT, this value             |
   |                    |   |   | incorporates any                     |
   |                    |   |   | EXTENDED_RCODE_VALUE.                |
   +--------------------+---+---+--------------------------------------+

   QUESTION: Currently we collect OPT RDATA as a blob as this is
   consistent with and re-uses the generic mechanism for RDATA storage.
   Should we break individual EDNS(0) options into Option code and data
   and store the data separately in a new array within the Block type?
   This would potentially allow exploitation of option data commonality.

7.5.3.3. "Question"

   Details on individual Questions in a Question section. A map
   containing the following:

   +-----------------+---+---+-----------------------------------------+
   | Field           | O | T | Description                             |
   +-----------------+---+---+-----------------------------------------+
   | name-index      | M | U | The index in the "name-rdata" array of  |
   |                 |   |   | the QNAME. See Section 7.5.3.           |
   |                 |   |   |                                         |
   | classtype-index | M | U | The index in the "classtype" array of   |
   |                 |   |   | the CLASS and TYPE of the Question. See |
   |                 |   |   | Section 7.5.3.                          |
   +-----------------+---+---+-----------------------------------------+

7.5.3.4. "RR"

   Details on individual Resource Records in RR sections. A map
   containing the following:

   +-----------------+---+---+-----------------------------------------+
   | Field           | O | T | Description                             |
   +-----------------+---+---+-----------------------------------------+
   | name-index      | M | U | The index in the "name-rdata" array of  |
   |                 |   |   | the NAME. See Section 7.5.3.            |
   |                 |   |   |                                         |
   | classtype-index | M | U | The index in the "classtype" array of   |
   |                 |   |   | the CLASS and TYPE of the RR. See       |
   |                 |   |   | Section 7.5.3.                          |
   |                 |   |   |                                         |
   | ttl             | O | U | The RR Time to Live.                    |
   |                 |   |   |                                         |
   | rdata-index     | M | U | The index in the "name-rdata" array of  |
   |                 |   |   | the RR RDATA. See Section 7.5.3.        |
   +-----------------+---+---+-----------------------------------------+

7.5.3.5. "MalformedMessageData"

   Details on malformed message items in this "Block" item. A map
   containing the following:

   +--------------------+---+---+--------------------------------------+
   | Field              | O | T | Description                          |
   +--------------------+---+---+--------------------------------------+
   | server-address     | O | U | The index in the "ip-address" array  |
   | -index             |   |   | of the server IP address. See        |
   |                    |   |   | Section 7.5.3.                       |
   |                    |   |   |                                      |
   | server-port        | O | U | The server port.                     |
   |                    |   |   |                                      |
   | mm-transport-flags | O | U | Bit flags describing the transport   |
   |                    |   |   | used to service the query. Bit 0 is  |
   |                    |   |   | the least significant bit.           |
   |                    |   |   | Bit 0. IP version. 0 = IPv4, 1 =     |
   |                    |   |   | IPv6                                 |
   |                    |   |   | Bit 1-4. Transport. 0 = UDP, 1 =     |
   |                    |   |   | TCP, 2 = TLS, 3 = DTLS.              |
   |                    |   |   |                                      |
   | mm-payload         | O | B | The payload (raw bytes) of the DNS   |
   |                    |   |   | message.                             |
   +--------------------+---+---+--------------------------------------+

7.6. "QueryResponse"

   Details on individual Q/R data items.

   Note that there is no requirement that the elements of the "query-
   responses" array are presented in strict chronological order.

   A map containing the following items:

   +----------------------+---+---+------------------------------------+
   | Field                | O | T | Description                        |
   +----------------------+---+---+------------------------------------+
   | time-offset          | O | U | Q/R timestamp as an offset in      |
   |                      |   |   | ticks from "earliest-time". The    |
   |                      |   |   | timestamp is the timestamp of the  |
   |                      |   |   | Query, or the Response if there is |
   |                      |   |   | no Query.                          |
   |                      |   |   |                                    |
   | client-address-index | O | U | The index in the "ip-address"      |
   |                      |   |   | array of the client IP address.    |
   |                      |   |   | See Section 7.5.3.                 |
   |                      |   |   |                                    |
   | client-port          | O | U | The client port.                   |
   |                      |   |   |                                    |
   | transaction-id       | O | U | DNS transaction identifier.        |
   |                      |   |   |                                    |
   | qr-signature-index   | O | U | The index in the "qr-sig" array of |
   |                      |   |   | the "QueryResponseSignature" item. |
   |                      |   |   | See Section 7.5.3.                 |
   |                      |   |   |                                    |
   | client-hoplimit      | O | U | The IPv4 TTL or IPv6 Hoplimit from |
   |                      |   |   | the Query packet.                  |
   |                      |   |   |                                    |
   | response-delay       | O | I | The time difference between Query  |
   |                      |   |   | and Response, in ticks. Only       |
   |                      |   |   | present if there is a query and a  |
   |                      |   |   | response. The delay can be         |
   |                      |   |   | negative if the network            |
   |                      |   |   | stack/capture library returns      |
   |                      |   |   | packets out of order.              |
   |                      |   |   |                                    |
   | query-name-index     | O | U | The index in the "name-rdata"      |
   |                      |   |   | array of the item containing the   |
   |                      |   |   | QNAME for the first Question. See  |
   |                      |   |   | Section 7.5.3.                     |
   |                      |   |   |                                    |
   | query-size           | O | U | DNS query message size (see        |
   |                      |   |   | below).                            |
   |                      |   |   |                                    |
   | response-size        | O | U | DNS response message size (see     |
   |                      |   |   | below).                            |
   |                      |   |   |                                    |
   | response-processing  | O | M | Data on response processing. Map   |
   | -data                |   |   | of type "ResponseProcessingData",  |
   |                      |   |   | see Section 7.6.1.                 |
   |                      |   |   |                                    |
   | query-extended       | O | M | Extended Query data. Map of type   |
   |                      |   |   | "QueryResponseExtended", see       |
   |                      |   |   | Section 7.6.2.                     |
   |                      |   |   |                                    |
   | response-extended    | O | M | Extended Response data. Map of     |
   |                      |   |   | type "QueryResponseExtended", see  |
   |                      |   |   | Section 7.6.2.                     |
   +----------------------+---+---+------------------------------------+

   The "query-size" and "response-size" fields hold the DNS message
   size. For UDP this is the size of the UDP payload that contained the
   DNS message. For TCP it is the size of the DNS message as specified
   in the two-byte message length header. Trailing bytes with queries
   are routinely observed in traffic to authoritative servers and this
   value allows a calculation of how many trailing bytes were present.

7.6.1. "ResponseProcessingData"

   Information on the server processing that produced the response. A
   map containing the following:

   +------------------+---+---+----------------------------------------+
   | Field            | O | T | Description                            |
   +------------------+---+---+----------------------------------------+
   | bailiwick-index  | O | U | The index in the "name-rdata" array of |
   |                  |   |   | the owner name for the response        |
   |                  |   |   | bailiwick. See Section 7.5.3.          |
   |                  |   |   |                                        |
   | processing-flags | O | U | Flags relating to response processing. |
   |                  |   |   | Bit 0. 1 if the response came from     |
   |                  |   |   | cache.                                 |
   +------------------+---+---+----------------------------------------+

   QUESTION: Should this be an item in the "QueryResponseSignature"?

7.6.2. "QueryResponseExtended"

   Extended data on the Q/R data item.

   Each item in the map is present only if collection of the relevant
   details is configured.

   A map containing the following items:

   +------------------+---+---+----------------------------------------+
   | Field            | O | T | Description                            |
   +------------------+---+---+----------------------------------------+
   | question-index   | O | U | The index in the "qlist" array of the  |
   |                  |   |   | entry listing any second and           |
   |                  |   |   | subsequent Questions in the Question   |
   |                  |   |   | section for the Query or Response. See |
   |                  |   |   | Section 7.5.3.                         |
   |                  |   |   |                                        |
   | answer-index     | O | U | The index in the "rrlist" array of the |
   |                  |   |   | entry listing the Answer Resource      |
   |                  |   |   | Record sections for the Query or       |
   |                  |   |   | Response. See Section 7.5.3.           |
   |                  |   |   |                                        |
   | authority-index  | O | U | The index in the "rrlist" array of the |
   |                  |   |   | entry listing the Authority Resource   |
   |                  |   |   | Record sections for the Query or       |
   |                  |   |   | Response. See Section 7.5.3.           |
   |                  |   |   |                                        |
   | additional-index | O | U | The index in the "rrlist" array of the |
   |                  |   |   | entry listing the Additional Resource  |
   |                  |   |   | Record sections for the Query or       |
   |                  |   |   | Response. See Section 7.5.3.           |
   +------------------+---+---+----------------------------------------+

7.7. "AddressEventCount"

   Counts of various IP related events relating to traffic with
   individual client addresses. A map containing the following:

   +------------------+---+---+----------------------------------------+
   | Field            | O | T | Description                            |
   +------------------+---+---+----------------------------------------+
   | ae-type          | M | U | The type of event. The following       |
   |                  |   |   | event types are currently defined:     |
   |                  |   |   | 0. TCP reset.                          |
   |                  |   |   | 1. ICMP time exceeded.                 |
   |                  |   |   | 2. ICMP destination unreachable.       |
   |                  |   |   | 3. ICMPv6 time exceeded.               |
   |                  |   |   | 4. ICMPv6 destination unreachable.     |
   |                  |   |   | 5. ICMPv6 packet too big.              |
   |                  |   |   |                                        |
   | ae-code          | O | U | A code relating to the event.          |
   |                  |   |   |                                        |
   | ae-address-index | M | U | The index in the "ip-address" array of |
   |                  |   |   | the client address. See Section 7.5.3. |
   |                  |   |   |                                        |
   | ae-count         | M | U | The number of occurrences of this      |
   |                  |   |   | event during the block collection      |
   |                  |   |   | period.                                |
   +------------------+---+---+----------------------------------------+

7.8. "MalformedMessage"

   Details of malformed messages. See Section 8. A map containing the
   following:

   +----------------------+---+---+------------------------------------+
   | Field                | O | T | Description                        |
   +----------------------+---+---+------------------------------------+
   | time-offset          | O | U | Message timestamp as an offset in  |
   |                      |   |   | ticks from "earliest-time".        |
   |                      |   |   |                                    |
   | client-address-index | O | U | The index in the "ip-address"      |
   |                      |   |   | array of the client IP address.    |
   |                      |   |   | See Section 7.5.3.                 |
   |                      |   |   |                                    |
   | client-port          | O | U | The client port.                   |
   |                      |   |   |                                    |
   | message-data-index   | O | U | The index in the "malformed-       |
   |                      |   |   | message-data" array of the message |
   |                      |   |   | data for this message. See Section |
   |                      |   |   | 7.5.3.                             |
   +----------------------+---+---+------------------------------------+

8. Malformed messages

   In the context of generating a C-DNS file it is assumed that only
   those DNS messages which can be parsed to produce a well-formed DNS
   message are stored in the C-DNS format and that all other messages
   will be recorded (if at all) as malformed messages.

   As a minimum, parsing a well-formed message means that:

   o  The packet has a well-formed 12 byte DNS Header

   o  The section counts are consistent with the section contents

   o  All of the resource records can be parsed

   In principle, packets that do not meet these criteria could be
   classified into two categories:

   o  Partially malformed: those packets which can be decoded
      sufficiently to extract

      *  a well-formed 12 byte DNS header (and therefore a DNS
         transaction ID)

      *  the first Question in the Question section if QDCOUNT is
         greater than 0

      but suffer other issues while parsing. This is the minimum
      information required to attempt Query/Response matching as
      described in Section 10.1.

   o  Completely malformed: those packets that cannot be decoded to this
      extent.

   An open question is whether there is value in attempting to process
   partially malformed messages in an analogous manner to well-formed
   messages in terms of attempting to match them with the corresponding
   query or response. This could be done by creating 'placeholder'
   records during Query/Response matching with just the information
   extracted as above. If the packet were then matched, the resulting
   C-DNS Q/R data item would include flags to indicate a malformed
   query, a malformed response, or both (in addition to capturing the
   wire format of the packet).

   An advantage of this would be that it would result in more meaningful
   statistics about matched packets because, for example, some partially
   malformed queries could be matched to responses. However, it would
   only apply to those queries where the first Question is well-formed.

   It could also simplify the downstream analysis of C-DNS files and the
   reconstruction of packet streams from C-DNS.
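
   The three categories described above (well-formed, partially
   malformed, completely malformed) can be illustrated with a minimal
   Python sketch. It is not part of the format and makes simplifying
   assumptions: the function names and the category strings are
   hypothetical, only the framing of each record is checked (RDATA
   contents are not validated per type), and trailing bytes after the
   last record are tolerated.

```python
import struct

# ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (12 bytes, network order).
HEADER = struct.Struct("!HHHHHH")


def skip_name(msg: bytes, pos: int) -> int:
    """Return the offset just past the (possibly compressed) name at pos."""
    while True:
        length = msg[pos]            # IndexError if the name runs off the end
        if length == 0:
            return pos + 1           # root label terminates the name
        if length & 0xC0 == 0xC0:    # compression pointer: 2 bytes, ends name
            if pos + 2 > len(msg):
                raise ValueError("truncated compression pointer")
            return pos + 2
        if length & 0xC0:
            raise ValueError("unsupported label type")
        pos += 1 + length


def skip_question(msg: bytes, pos: int) -> int:
    pos = skip_name(msg, pos) + 4    # QTYPE + QCLASS
    if pos > len(msg):
        raise ValueError("truncated question")
    return pos


def skip_rr(msg: bytes, pos: int) -> int:
    pos = skip_name(msg, pos)
    if pos + 10 > len(msg):          # TYPE, CLASS, TTL, RDLENGTH
        raise ValueError("truncated RR header")
    (rdlength,) = struct.unpack_from("!H", msg, pos + 8)
    pos += 10 + rdlength
    if pos > len(msg):
        raise ValueError("truncated RDATA")
    return pos


def classify(msg: bytes) -> str:
    """Classify a DNS payload using the (hypothetical) category names."""
    if len(msg) < HEADER.size:
        return "completely-malformed"
    _id, _flags, qd, an, ns, ar = HEADER.unpack_from(msg)
    pos = HEADER.size
    try:
        if qd > 0:                   # first Question must be extractable
            pos = skip_question(msg, pos)
    except (IndexError, ValueError):
        return "completely-malformed"
    try:
        for _ in range(max(qd - 1, 0)):
            pos = skip_question(msg, pos)
        for _ in range(an + ns + ar):
            pos = skip_rr(msg, pos)
    except (IndexError, ValueError):
        return "partially-malformed"
    return "well-formed"
```

   A collector following this sketch would store "well-formed" messages
   as Q/R data items and record the other two categories, if at all, as
   malformed messages.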
1455 A disadvantage is that this adds complexity to the Query/Response 1456 matching and data representation, could potentially lead to false 1457 matches, and would require some additional statistics (e.g. counts 1458 for matched-partially-malformed, unmatched-partially-malformed, 1459 completely-malformed). 1461 NOTE: Within these definitions a message that contained an 1462 unrecognised OPCODE or RR code would be treated as malformed. It may 1463 be the case that the OPCODE/RR is not recognised just because the 1464 implementation does not support it yet, rather than it not being 1465 standardized. For the case of unrecognised OPCODEs the message may 1466 be parsable (for example, if it has a format similar enough to the 1467 one described in [RFC1035]) or it may not. Similarly for 1468 unrecognised RR types the RDATA can still be stored, but the 1469 collector will not be able to process it to remove, for example, name 1470 compression pointers. 1472 QUESTION: There has been no feedback to date requesting further work 1473 on the processing of partially malformed messages. The editors are 1474 inclined not to include it in this version. It could be the subject 1475 of a future extension. 1477 9. C-DNS to PCAP 1479 It is possible to re-construct PCAP files from the C-DNS format in a 1480 lossy fashion. Some of the issues with reconstructing both the DNS 1481 payload and the full packet stream are outlined here. 1483 The reconstruction depends on whether or not all the optional 1484 sections of both the query and response were captured in the C-DNS 1485 file. Clearly, if they were not all captured, the reconstruction 1486 will be imperfect. 1488 Even if all sections of the response were captured, one cannot 1489 reconstruct the DNS response payload exactly due to the fact that 1490 some DNS names in the message on the wire may have been compressed. 1491 Section 9.1 discusses this in more detail.
1493 Some transport information is not captured in the C-DNS format. For 1494 example, the following aspects of the original packet stream cannot 1495 be re-constructed from the C-DNS format: 1497 o IP fragmentation 1499 o TCP stream information: 1501 * Multiple DNS messages may have been sent in a single TCP 1502 segment 1504 * A DNS payload may have been split across multiple TCP segments 1506 * Multiple DNS messages may have been sent on a single TCP session 1508 o Malformed DNS messages if the wire format is not recorded 1510 o Any non-DNS messages that were in the original packet stream e.g. 1511 ICMP 1513 Simple assumptions can be made on the reconstruction: fragmented and 1514 DNS-over-TCP messages can be reconstructed into single packets and a 1515 single TCP session can be constructed for each TCP packet. 1517 Additionally, if malformed messages and non-DNS packets are captured 1518 separately, they can be merged with packet captures reconstructed 1519 from C-DNS to produce a more complete packet stream. 1521 9.1. Name compression 1523 All the names stored in the C-DNS format are full domain names; no 1524 DNS-style name compression is used on the individual names within the 1525 format. Therefore when reconstructing a packet, name compression 1526 must be used in order to reproduce the on-the-wire representation of 1527 the packet. 1529 [RFC1035] name compression works by substituting trailing sections of 1530 a name with a reference back to the occurrence of those sections 1531 earlier in the message. Not all name server software uses the same 1532 algorithm when compressing domain names within the responses. Some 1533 attempt maximum recompression at the expense of runtime resources, 1534 others use heuristics to balance compression and speed and others use 1535 different rules for what is a valid compression target.
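The substitution of trailing sections with a back-reference, as described above, can be sketched in Python as follows. This is an illustrative assumption of one simple strategy (a table of every name suffix already written, matched case-sensitively), not the algorithm of any particular name server. The function name "encode_name" and the "seen" table are hypothetical. Label length limits and non-ASCII labels are not handled.

```python
def encode_name(name: str, offset: int, seen: dict) -> bytes:
    """Encode name in DNS wire format with [RFC1035] compression pointers.

    offset is the position in the message where the name will be written.
    seen maps a tuple of labels (a name suffix) to the message offset at
    which that suffix was previously written; it is updated in place.
    """
    labels = [label for label in name.rstrip(".").split(".") if label]
    out = bytearray()
    for i in range(len(labels)):
        suffix = tuple(labels[i:])
        target = seen.get(suffix)
        if target is not None:
            # Replace the rest of the name with a 2-byte pointer:
            # two high bits set, followed by a 14-bit message offset.
            out += (0xC000 | target).to_bytes(2, "big")
            return bytes(out)
        position = offset + len(out)
        if position < 0x4000:            # only 14-bit offsets are addressable
            seen[suffix] = position
        data = labels[i].encode("ascii")
        out += bytes([len(data)]) + data
    out.append(0)                        # root label: no suffix was reusable
    return bytes(out)
```

Writing "example.com" at offset 12, then "bar.com" at offset 25, encodes the second name as the label "bar" followed by a pointer to the "com" label of the first name. A reconstruction tool could use the length of the output of such a function, summed over a message, when comparing candidate compression strategies against a recorded response size.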
1537 This means that responses to the same question from different name 1538 server software which match in terms of DNS payload content (header, 1539 counts, RRs with name compression removed) do not necessarily match 1540 byte-for-byte on the wire. 1542 Therefore, it is not possible to ensure that the DNS response payload 1543 is reconstructed byte-for-byte from C-DNS data. However, it can at 1544 least, in principle, be reconstructed to have the correct payload 1545 length (since the original response length is captured) if there is 1546 enough knowledge of the commonly implemented name compression 1547 algorithms. For example, a simplistic approach would be to try each 1548 algorithm in turn to see if it reproduces the original length, 1549 stopping at the first match. This would not guarantee the correct 1550 algorithm has been used as it is possible to match the length whilst 1551 still not matching the on the wire bytes but, without further 1552 information added to the C-DNS data, this is the best that can be 1553 achieved. 1555 Appendix B presents an example of two different compression 1556 algorithms used by well-known name server software. 1558 10. Data collection 1560 This section describes a non-normative proposed algorithm for the 1561 processing of a captured stream of DNS queries and responses and 1562 matching queries/responses where possible. 1564 For the purposes of this discussion, it is assumed that the input has 1565 been pre-processed such that: 1567 1. All IP fragmentation reassembly, TCP stream reassembly, and so 1568 on, has already been performed 1570 2. Each message is associated with transport metadata required to 1571 generate the Primary ID (see Section 10.2.1) 1573 3. Each message has a well-formed DNS header of 12 bytes and (if 1574 present) the first Question in the Question section can be parsed 1575 to generate the Secondary ID (see below). 
As noted earlier, this 1576 requirement can result in a malformed query being removed in the 1577 pre-processing stage, but the correctly formed response with 1578 RCODE of FORMERR being present. 1580 DNS messages are processed in the order they are delivered to the 1581 application. It should be noted that packet capture libraries do not 1582 necessarily provide packets in strict chronological order. 1584 TODO: Discuss the corner cases resulting from this in more detail. 1586 10.1. Matching algorithm 1588 A schematic representation of the algorithm for matching Q/R data 1589 items is shown in the following diagram: 1591 Figure showing the Query/Response matching algorithm format (PNG) [5] 1593 Figure showing the Query/Response matching algorithm format (SVG) [6] 1595 Further details of the algorithm are given in the following sections. 1597 10.2. Message identifiers 1599 10.2.1. Primary ID (required) 1601 A Primary ID is constructed for each message. It is composed of the 1602 following data: 1604 1. Source IP Address 1606 2. Destination IP Address 1608 3. Source Port 1610 4. Destination Port 1612 5. Transport 1614 6. DNS Message ID 1616 10.2.2. Secondary ID (optional) 1618 If present, the first Question in the Question section is used as a 1619 secondary ID for each message. Note that there may be well-formed 1620 DNS queries that have a QDCOUNT of 0, and some responses may have a 1621 QDCOUNT of 0 (for example, responses with RCODE=FORMERR or NOTIMP). 1622 In this case the secondary ID is not used in matching. 1624 10.3. Algorithm parameters 1626 1. Query timeout 1628 2. Skew timeout 1630 10.4. Algorithm requirements 1632 The algorithm is designed to handle the following input data: 1634 1. Multiple queries with the same Primary ID (but different 1635 Secondary ID) arriving before any responses for these queries are 1636 seen. 1638 2. Multiple queries with the same Primary and Secondary ID arriving 1639 before any responses for these queries are seen. 1641 3.
Queries for which no later response can be found within the 1642 specified timeout. 1644 4. Responses for which no previous query can be found within the 1645 specified timeout. 1647 10.5. Algorithm limitations 1649 For cases 1 and 2 listed in the above requirements, it is not 1650 possible to unambiguously match queries with responses. This 1651 algorithm chooses to match to the earliest query with the correct 1652 Primary and Secondary ID. 1654 10.6. Workspace 1656 A FIFO structure is used to hold the Q/R data items during 1657 processing. 1659 10.7. Output 1661 The output is a list of Q/R data items. Both the Query and Response 1662 elements are optional in these items, therefore Q/R data items have 1663 one of three types of content: 1665 1. A matched pair of query and response messages 1667 2. A query message with no response 1669 3. A response message with no query 1671 The timestamp of a list item is that of the query for cases 1 and 2 1672 and that of the response for case 3. 1674 10.8. Post processing 1676 When ending capture, all remaining entries in the Q/R data item FIFO 1677 should be treated as timed out queries. 1679 11. Implementation guidance 1681 Whilst this document makes no specific recommendations with respect 1682 to Canonical CBOR (see Section 3.9 of [RFC7049]) the following 1683 guidance may be of use to implementors. 1685 Adherence to the first two rules given in Section 3.9 of [RFC7049] 1686 will minimise file sizes. 1688 Adherence to the last two rules given in Section 3.9 of [RFC7049] for 1689 all maps and arrays would unacceptably constrain implementations, for 1690 example, in the use case of real-time data collection in constrained 1691 environments. 1693 NOTE: With this clarification to the use of Canonical CBOR, we could 1694 consider re-ordering fields in maps to improve readability. 1696 11.1. Optional data 1698 When decoding data some items required for a particular function the 1699 consumer wishes to perform may be missing. 
Consumers should consider 1700 providing configurable default values to be used in place of the 1701 missing values in their output. 1703 11.2. Trailing data in TCP 1705 TODO: Clarify the impact of processing wire captures which include 1706 trailing data in TCP. What will appear as trailing data, what will 1707 appear as malformed messages? 1709 11.3. Limiting collection of RDATA 1711 Implementations should consider providing a configurable maximum 1712 RDATA size for capture, for example, to avoid memory issues when 1713 confronted with large XFR records. 1715 12. Implementation status 1717 [Note to RFC Editor: please remove this section and reference to 1718 [RFC7942] prior to publication.] 1720 This section records the status of known implementations of the 1721 protocol defined by this specification at the time of posting of this 1722 Internet-Draft, and is based on a proposal described in [RFC7942]. 1723 The description of implementations in this section is intended to 1724 assist the IETF in its decision processes in progressing drafts to 1725 RFCs. Please note that the listing of any individual implementation 1726 here does not imply endorsement by the IETF. Furthermore, no effort 1727 has been spent to verify the information presented here that was 1728 supplied by IETF contributors. This is not intended as, and must not 1729 be construed to be, a catalog of available implementations or their 1730 features. Readers are advised to note that other implementations may 1731 exist. 1733 According to [RFC7942], "this will allow reviewers and working groups 1734 to assign due consideration to documents that have the benefit of 1735 running code, which may serve as evidence of valuable experimentation 1736 and feedback that have made the implemented protocols more mature. 1737 It is up to the individual working groups to use this information as 1738 they see fit". 1740 12.1.
DNS-STATS Compactor 1742 ICANN/Sinodun IT have developed an open source implementation called 1743 DNS-STATS Compactor. The Compactor is a suite of tools which can 1744 capture DNS traffic (from either a network interface or a PCAP file) 1745 and store it in the Compacted-DNS (C-DNS) file format. PCAP files 1746 for the captured traffic can also be reconstructed. See Compactor 1747 [7]. 1749 This implementation: 1751 o is mature but has only been deployed for testing in a single 1752 environment so is not yet classified as production ready. 1754 o covers the whole of the specification described in the -03 draft 1755 with the exception of support for malformed messages (Section 8) 1756 and picosecond time resolution. (Note: this implementation does 1757 allow malformed messages to be dumped to a PCAP file). 1759 o is released under the Mozilla Public License Version 2.0. 1761 o has a users mailing list available, see dns-stats-users [8]. 1763 There is also some discussion of issues encountered during 1764 development available at Compressing Pcap Files [9] and Packet 1765 Capture [10]. 1767 This information was last updated on 29th of June 2017. 1769 13. IANA considerations 1771 None 1773 14. Security considerations 1775 Any control interface MUST perform authentication and encryption. 1777 Any data upload MUST be authenticated and encrypted. 1779 15. Acknowledgements 1781 The authors wish to thank CZ.NIC, in particular Tomas Gavenciak, for 1782 many useful discussions on binary formats, compression and packet 1783 matching. Also Jan Vcelak and Wouter Wijngaards for discussions on 1784 name compression and Paul Hoffman for a detailed review of the 1785 document and the C-DNS CDDL. 1787 Thanks also to Robert Edmonds, Jerry Lundstroem, Richard Gibson, 1788 Stephane Bortzmeyer and many other members of DNSOP for review. 1790 Also, Miek Gieben for mmark [11] 1792 16.
Changelog 1794 draft-ietf-dnsop-dns-capture-format-06 1796 o Correct BlockParameters type to map 1798 o Make RR ttl optional 1800 o Add storage flag indicating name normalisation 1802 o Add storage parameter fields for sampling and anonymisation 1803 methods 1805 o Editorial clarifications and improvements 1807 draft-ietf-dnsop-dns-capture-format-05 1809 o Make all data items in Q/R, QuerySignature and Malformed Message 1810 arrays optional 1812 o Re-structure the FilePreamble and ConfigurationParameters into 1813 BlockParameters 1815 o BlockParameters has separate Storage and Collection Parameters 1817 o Storage Parameters includes information on what optional fields 1818 are present, and flags specifying anonymisation or sampling 1820 o Addresses can now be stored as prefixes. 1822 o Switch to using a variable sub-second timing granularity 1824 o Add response bailiwick and query response type 1826 o Add specifics of how to record malformed messages 1828 o Add implementation guidance 1830 o Improve terminology and naming consistency 1832 draft-ietf-dnsop-dns-capture-format-04 1834 o Correct query-d0 to query-do in CDDL 1835 o Clarify that map keys are unsigned integers 1837 o Add Type to Class/Type table 1839 o Clarify storage format in section 7.12 1841 draft-ietf-dnsop-dns-capture-format-03 1843 o Added an Implementation Status section 1845 draft-ietf-dnsop-dns-capture-format-02 1847 o Update qr_data_format.png to match CDDL 1849 o Editorial clarifications and improvements 1851 draft-ietf-dnsop-dns-capture-format-01 1853 o Many editorial improvements by Paul Hoffman 1855 o Included discussion of malformed message handling 1857 o Improved Appendix C on Comparison of Binary Formats 1859 o Now using C-DNS field names in the tables in section 8 1861 o A handful of new fields included (CDDL updated) 1863 o Timestamps now include optional picoseconds 1865 o Added details of block statistics 1867 draft-ietf-dnsop-dns-capture-format-00 1869 o Changed dnstap.io to 
dnstap.info 1871 o qr_data_format.png was cut off at the bottom 1873 o Update authors address 1875 o Improve wording in Abstract 1877 o Changed DNS-STAT to C-DNS in CDDL 1879 o Set the format version in the CDDL 1881 o Added a TODO: Add block statistics 1882 o Added a TODO: Add extend to support pico/nano. Also do this for 1883 Time offset and Response delay 1885 o Added a TODO: Need to develop optional representation of malformed 1886 messages within C-DNS and what this means for packet matching. 1887 This may influence which fields are optional in the rest of the 1888 representation. 1890 o Added section on design goals to Introduction 1892 o Added a TODO: Can Class be optimised? Should a class of IN be 1893 inferred if not present? 1895 draft-dickinson-dnsop-dns-capture-format-00 1897 o Initial commit 1899 17. References 1901 17.1. Normative References 1903 [RFC1035] Mockapetris, P., "Domain names - implementation and 1904 specification", STD 13, RFC 1035, DOI 10.17487/RFC1035, 1905 November 1987, . 1907 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 1908 Requirement Levels", BCP 14, RFC 2119, 1909 DOI 10.17487/RFC2119, March 1997, . 1912 [RFC7049] Bormann, C. and P. Hoffman, "Concise Binary Object 1913 Representation (CBOR)", RFC 7049, DOI 10.17487/RFC7049, 1914 October 2013, . 1916 17.2. Informative References 1918 [ditl] DNS-OARC, "DITL", 2016, . 1921 [dnscap] DNS-OARC, "DNSCAP", 2016, . 1924 [dnstap] dnstap.info, "dnstap", 2016, . 1926 [dsc] Wessels, D. and J. Lundstrom, "DSC", 2016, 1927 . 1929 [I-D.daley-dnsxml] 1930 Daley, J., Morris, S., and J. Dickinson, "dnsxml - A 1931 standard XML representation of DNS data", draft-daley- 1932 dnsxml-00 (work in progress), July 2013. 1934 [I-D.hoffman-dns-in-json] 1935 Hoffman, P., "Representing DNS Messages in JSON", draft- 1936 hoffman-dns-in-json-13 (work in progress), October 2017. 1938 [I-D.ietf-cbor-cddl] 1939 Birkholz, H., Vigano, C., and C. 
Bormann, "Concise data 1940 definition language (CDDL): a notational convention to 1941 express CBOR data structures", draft-ietf-cbor-cddl-02 1942 (work in progress), February 2018. 1944 [packetq] .SE - The Internet Infrastructure Foundation, "PacketQ", 1945 2014, . 1947 [pcap] tcpdump.org, "PCAP", 2016, . 1949 [pcapng] Tuexen, M., Risso, F., Bongertz, J., Combs, G., and G. 1950 Harris, "pcap-ng", 2016, . 1953 [RFC7159] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data 1954 Interchange Format", RFC 7159, DOI 10.17487/RFC7159, March 1955 2014, . 1957 [RFC7942] Sheffer, Y. and A. Farrel, "Improving Awareness of Running 1958 Code: The Implementation Status Section", BCP 205, 1959 RFC 7942, DOI 10.17487/RFC7942, July 2016, 1960 . 1962 17.3. URIs 1964 [1] https://github.com/dns-stats/draft-dns-capture- 1965 format/blob/master/draft-06/cdns_format.png 1967 [2] https://github.com/dns-stats/draft-dns-capture- 1968 format/blob/master/draft-06/cdns_format.svg 1970 [3] https://github.com/dns-stats/draft-dns-capture- 1971 format/blob/master/draft-06/qr_data_format.png 1973 [4] https://github.com/dns-stats/draft-dns-capture- 1974 format/blob/master/draft-06/qr_data_format.svg 1976 [5] https://github.com/dns-stats/draft-dns-capture- 1977 format/blob/master/draft-06/packet_matching.png 1979 [6] https://github.com/dns-stats/draft-dns-capture- 1980 format/blob/master/draft-06/packet_matching.svg 1982 [7] https://github.com/dns-stats/compactor/wiki 1984 [8] https://mm.dns-stats.org/mailman/listinfo/dns-stats-users 1986 [9] https://www.sinodun.com/2017/06/compressing-pcap-files/ 1988 [10] https://www.sinodun.com/2017/06/more-on-debian-jessieubuntu- 1989 trusty-packet-capture-woes/ 1991 [11] https://github.com/miekg/mmark 1993 [12] https://www.nlnetlabs.nl/projects/nsd/ 1995 [13] https://www.knot-dns.cz/ 1997 [14] https://avro.apache.org/ 1999 [15] https://developers.google.com/protocol-buffers/ 2001 [16] http://cbor.io 2003 [17] https://github.com/kubo/snzip 2005 [18] 
http://google.github.io/snappy/ 2007 [19] http://lz4.github.io/lz4/ 2009 [20] http://www.gzip.org/ 2011 [21] http://facebook.github.io/zstd/ 2013 [22] http://tukaani.org/xz/ 2015 [23] https://github.com/dns-stats/draft-dns-capture- 2016 format/blob/master/file-size-versus-block-size.png 2018 [24] https://github.com/dns-stats/draft-dns-capture- 2019 format/blob/master/file-size-versus-block-size.svg 2021 Appendix A. CDDL 2023 ; CDDL specification of the file format for C-DNS, 2024 ; which describes a collection of DNS messages and 2025 ; traffic meta-data. 2027 ; 2028 ; The overall structure of a file. 2029 ; 2030 File = [ 2031 file-type-id : tstr .regexp "C-DNS", 2032 file-preamble : FilePreamble, 2033 file-blocks : [* Block], 2034 ] 2036 ; 2037 ; The file preamble. 2038 ; 2039 FilePreamble = { 2040 major-format-version => uint .eq 1, 2041 minor-format-version => uint .eq 0, 2042 ? private-version => uint, 2043 block-parameters => [+ BlockParameters], 2044 } 2045 major-format-version = 0 2046 minor-format-version = 1 2047 private-version = 2 2048 block-parameters = 3 2050 BlockParameters = { 2051 storage-parameters => StorageParameters, 2052 ? collection-parameters => CollectionParameters, 2053 } 2054 storage-parameters = 0 2055 collection-parameters = 1 2057 StorageParameters = { 2058 ticks-per-second => uint, 2059 max-block-items => uint, 2060 storage-hints => StorageHints, 2061 opcodes => [+ uint], 2062 rr-types => [+ uint], 2063 ? storage-flags => StorageFlags, 2064 ? client-address-prefix-ipv4 => uint, 2065 ? client-address-prefix-ipv6 => uint, 2066 ? server-address-prefix-ipv4 => uint, 2067 ? server-address-prefix-ipv6 => uint, 2068 ? sampling-method => tstr, 2069 ? 
anonymisation-method => tstr, 2070 } 2071 ticks-per-second = 0 2072 max-block-items = 1 2073 storage-hints = 2 2074 opcodes = 3 2075 rr-types = 4 2076 storage-flags = 5 2077 client-address-prefix-ipv4 = 6 2078 client-address-prefix-ipv6 = 7 2079 server-address-prefix-ipv4 = 8 2080 server-address-prefix-ipv6 = 9 2081 sampling-method = 10 2082 anonymisation-method = 11 2084 ; A hint indicates if the collection method will output the 2085 ; item or will ignore the item if present. 2086 StorageHints = { 2087 query-response-hints => QueryResponseHints, 2088 query-response-signature-hints => QueryResponseSignatureHints, 2089 rr-hints => RRHints, 2090 other-data-hints => OtherDataHints, 2091 } 2092 query-response-hints = 0 2093 query-response-signature-hints = 1 2094 rr-hints = 2 2095 other-data-hints = 3 2097 QueryResponseHintValues = &( 2098 time-offset : 0, 2099 client-address-index : 1, 2100 client-port : 2, 2101 transaction-id : 3, 2102 qr-signature-index : 4, 2103 client-hoplimit : 5, 2104 response-delay : 6, 2105 query-name-index : 7, 2106 query-size : 8, 2107 response-size : 9, 2108 response-processing-data : 10, 2109 query-question-sections : 11, ; Second & subsequent questions 2110 query-answer-sections : 12, 2111 query-authority-sections : 13, 2112 query-additional-sections : 14, 2113 response-answer-sections : 15, 2114 response-authority-sections : 16, 2115 response-additional-sections : 17, 2116 ) 2117 QueryResponseHints = uint .bits QueryResponseHintValues 2119 QueryResponseSignatureHintValues =&( 2120 server-address : 0, 2121 server-port : 1, 2122 qr-transport-flags : 2, 2123 qr-type : 3, 2124 qr-sig-flags : 4, 2125 query-opcode : 5, 2126 dns-flags : 6, 2127 query-rcode : 7, 2128 query-class-type : 8, 2129 query-qdcount : 9, 2130 query-ancount : 10, 2131 query-arcount : 11, 2132 query-nscount : 12, 2133 query-edns-version : 13, 2134 query-udp-size : 14, 2135 query-opt-rdata : 15, 2136 response-rcode : 16, 2137 ) 2138 QueryResponseSignatureHints = uint .bits 
QueryResponseSignatureHintValues 2140 RRHintValues = &( 2141 ttl : 0, 2142 ) 2143 RRHints = uint .bits RRHintValues 2145 OtherDataHintValues = &( 2146 malformed-messages : 0, 2147 address-event-counts : 1, 2148 ) 2149 OtherDataHints = uint .bits OtherDataHintValues 2151 StorageFlagValues = &( 2152 anonymised-data : 0, 2153 sampled-data : 1, 2154 normalised-names : 2, 2155 ) 2156 StorageFlags = uint .bits StorageFlagValues 2158 CollectionParameters = { 2159 ? query-timeout => uint, 2160 ? skew-timeout => uint, 2161 ? snaplen => uint, 2162 ? promisc => uint, 2163 ? interfaces => [+ tstr], 2164 ? server-addresses => [+ IPAddress], ; Hint for later analysis 2165 ? vlan-ids => [+ uint], 2166 ? filter => tstr, 2167 ? generator-id => tstr, 2168 ? host-id => tstr, 2169 } 2170 query-timeout = 0 2171 skew-timeout = 1 2172 snaplen = 2 2173 promisc = 3 2174 interfaces = 4 2175 server-addresses = 5 2176 vlan-ids = 6 2177 filter = 7 2178 generator-id = 8 2179 host-id = 9 2181 ; 2182 ; Data in the file is stored in Blocks. 2183 ; 2184 Block = { 2185 block-preamble => BlockPreamble, 2186 ? block-statistics => BlockStatistics, ; Much of this could be derived 2187 ? block-tables => BlockTables, 2188 ? query-responses => [+ QueryResponse], 2189 ? address-event-counts => [+ AddressEventCount], 2190 ? malformed-messages => [+ MalformedMessage], 2191 } 2192 block-preamble = 0 2193 block-statistics = 1 2194 block-tables = 2 2195 query-responses = 3 2196 address-event-counts = 4 2197 malformed-messages = 5 2199 ; 2200 ; The (mandatory) preamble to a block. 2201 ; 2202 BlockPreamble = { 2203 ? earliest-time => Timestamp, 2204 ? block-parameters-index => uint .default 0, 2205 } 2206 earliest-time = 0 2207 block-parameters-index = 1 2209 ; Ticks are subsecond intervals. The number of ticks in a second is file/block 2210 ; metadata. Signed and unsigned tick types are defined. 
2211 ticks = int 2212 uticks = uint 2213 Timestamp = [ 2214 timestamp-secs : uint, 2215 timestamp-uticks : uticks, 2216 ] 2218 ; 2219 ; Statistics about the block contents. 2220 ; 2221 BlockStatistics = { 2222 ? total-messages => uint, 2223 ? total-pairs => uint, 2224 ? total-unmatched-queries => uint, 2225 ? total-unmatched-responses => uint, 2226 ? total-malformed-messages => uint, 2227 } 2228 total-messages = 0 2229 total-pairs = 1 2230 total-unmatched-queries = 2 2231 total-unmatched-responses = 3 2232 total-malformed-messages = 4 2234 ; 2235 ; Tables of common data referenced from records in a block. 2236 ; 2237 BlockTables = { 2238 ? ip-address => [+ IPAddress], 2239 ? classtype => [+ ClassType], 2240 ? name-rdata => [+ bstr], ; Holds both Name RDATA and RDATA 2241 ? qr-sig => [+ QueryResponseSignature], 2242 ? QuestionTables, 2243 ? RRTables, 2244 ? malformed-message-data => [+ MalformedMessageData], 2245 } 2246 ip-address = 0 2247 classtype = 1 2248 name-rdata = 2 2249 qr-sig = 3 2250 qlist = 4 2251 qrr = 5 2252 rrlist = 6 2253 rr = 7 2254 malformed-message-data = 8 2256 IPv4Address = bstr .size 4 2257 IPv6Address = bstr .size 16 2258 IPAddress = IPv4Address / IPv6Address 2260 ClassType = { 2261 type => uint, 2262 class => uint, 2263 } 2264 type = 0 2265 class = 1 2267 QueryResponseSignature = { 2268 ? server-address-index => uint, 2269 ? server-port => uint, 2270 ? qr-transport-flags => QueryResponseTransportFlags, 2271 ? qr-type => QueryResponseType, 2272 ? qr-sig-flags => QueryResponseFlags, 2273 ? query-opcode => uint, 2274 ? qr-dns-flags => DNSFlags, 2275 ? query-rcode => uint, 2276 ? query-classtype-index => uint, 2277 ? query-qd-count => uint, 2278 ? query-an-count => uint, 2279 ? query-ns-count => uint, 2280 ? query-ar-count => uint, 2281 ? edns-version => uint, 2282 ? udp-buf-size => uint, 2283 ? opt-rdata-index => uint, 2284 ? 
response-rcode => uint, 2285 } 2286 server-address-index = 0 2287 server-port = 1 2288 qr-transport-flags = 2 2289 qr-type = 3 2290 qr-sig-flags = 4 2291 query-opcode = 5 2292 qr-dns-flags = 6 2293 query-rcode = 7 2294 query-classtype-index = 8 2295 query-qd-count = 9 2296 query-an-count = 10 2297 query-ns-count = 11 2298 query-ar-count = 12 2299 edns-version = 13 2300 udp-buf-size = 14 2301 opt-rdata-index = 15 2302 response-rcode = 16 2304 Transport = &( 2305 udp : 0, 2306 tcp : 1, 2307 tls : 2, 2308 dtls : 3, 2310 ) 2312 TransportFlagValues = &( 2313 ip-version : 0, ; 0=IPv4, 1=IPv6 2314 ; Transport value bits 1-4 2315 ) / (1..4) 2316 TransportFlags = uint .bits TransportFlagValues 2318 QueryResponseTransportFlagValues = &( 2319 query-trailingdata : 5, 2320 ) / TransportFlagValues 2321 QueryResponseTransportFlags = uint .bits QueryResponseTransportFlagValues 2323 QueryResponseType = &( 2324 stub : 0, 2325 client : 1, 2326 resolver : 2, 2327 auth : 3, 2328 forwarder : 4, 2329 tool : 5, 2330 ) 2332 QueryResponseFlagValues = &( 2333 has-query : 0, 2334 has-response : 1, 2335 query-has-question : 2, 2336 query-has-opt : 3, 2337 response-has-opt : 4, 2338 response-has-no-question: 5, 2339 ) 2340 QueryResponseFlags = uint .bits QueryResponseFlagValues 2342 DNSFlagValues = &( 2343 query-cd : 0, 2344 query-ad : 1, 2345 query-z : 2, 2346 query-ra : 3, 2347 query-rd : 4, 2348 query-tc : 5, 2349 query-aa : 6, 2350 query-do : 7, 2351 response-cd: 8, 2352 response-ad: 9, 2353 response-z : 10, 2354 response-ra: 11, 2355 response-rd: 12, 2356 response-tc: 13, 2357 response-aa: 14, 2359 ) 2360 DNSFlags = uint .bits DNSFlagValues 2362 QuestionTables = ( 2363 qlist => [+ QuestionList], 2364 qrr => [+ Question] 2365 ) 2367 QuestionList = [+ uint] ; Index of Question 2369 Question = { ; Second and subsequent questions 2370 name-index => uint, ; Index to a name in the name-rdata table 2371 classtype-index => uint, 2372 } 2373 name-index = 0 2374 classtype-index = 1 2376 RRTables = (
2377 rrlist => [+ RRList], 2378 rr => [+ RR] 2379 ) 2381 RRList = [+ uint] ; Index of RR 2383 RR = { 2384 name-index => uint, ; Index to a name in the name-rdata table 2385 classtype-index => uint, 2386 ? ttl => uint, 2387 rdata-index => uint, ; Index to RDATA in the name-rdata table 2388 } 2389 ; Other map key values already defined above. 2390 ttl = 2 2391 rdata-index = 3 2393 MalformedMessageData = { 2394 ? server-address-index => uint, 2395 ? server-port => uint, 2396 ? mm-transport-flags => TransportFlags, 2397 ? mm-payload => bstr, 2398 } 2399 ; Other map key values already defined above. 2400 mm-transport-flags = 2 2401 mm-payload = 3 2403 ; 2404 ; A single query/response pair. 2405 ; 2406 QueryResponse = { 2407 ? time-offset => uticks, ; Time offset from start of block 2408 ? client-address-index => uint, 2409 ? client-port => uint, 2410 ? transaction-id => uint, 2411 ? qr-signature-index => uint, 2412 ? client-hoplimit => uint, 2413 ? response-delay => ticks, 2414 ? query-name-index => uint, 2415 ? query-size => uint, ; DNS size of query 2416 ? response-size => uint, ; DNS size of response 2417 ? response-processing-data => ResponseProcessingData, 2418 ? query-extended => QueryResponseExtended, 2419 ? response-extended => QueryResponseExtended, 2420 } 2421 time-offset = 0 2422 client-address-index = 1 2423 client-port = 2 2424 transaction-id = 3 2425 qr-signature-index = 4 2426 client-hoplimit = 5 2427 response-delay = 6 2428 query-name-index = 7 2429 query-size = 8 2430 response-size = 9 2431 response-processing-data = 10 2432 query-extended = 11 2433 response-extended = 12 2435 ResponseProcessingData = { 2436 ? bailiwick-index => uint, 2437 ? processing-flags => ResponseProcessingFlags, 2438 } 2439 bailiwick-index = 0 2440 processing-flags = 1 2442 ResponseProcessingFlagValues = &( 2443 from-cache : 0, 2444 ) 2445 ResponseProcessingFlags = uint .bits ResponseProcessingFlagValues 2447 QueryResponseExtended = { 2448 ? 
question-index => uint, ; Index of QuestionList 2449 ? answer-index => uint, ; Index of RRList 2450 ? authority-index => uint, 2451 ? additional-index => uint, 2452 } 2453 question-index = 0 2454 answer-index = 1 2455 authority-index = 2 2456 additional-index = 3 2458 ; 2459 ; Address event data. 2460 ; 2461 AddressEventCount = { 2462 ae-type => &AddressEventType, 2463 ? ae-code => uint, 2464 ae-address-index => uint, 2465 ae-count => uint, 2466 } 2467 ae-type = 0 2468 ae-code = 1 2469 ae-address-index = 2 2470 ae-count = 3 2472 AddressEventType = ( 2473 tcp-reset : 0, 2474 icmp-time-exceeded : 1, 2475 icmp-dest-unreachable : 2, 2476 icmpv6-time-exceeded : 3, 2477 icmpv6-dest-unreachable: 4, 2478 icmpv6-packet-too-big : 5, 2479 ) 2481 ; 2482 ; Malformed messages. 2483 ; 2484 MalformedMessage = { 2485 ? time-offset => uticks, ; Time offset from start of block 2486 ? client-address-index => uint, 2487 ? client-port => uint, 2488 ? message-data-index => uint, 2489 } 2490 ; Other map key values already defined above. 2491 message-data-index = 3 2493 Appendix B. DNS Name compression example 2495 The basic algorithm, which follows the guidance in [RFC1035], is 2496 simply to collect each name, and the offset in the packet at which it 2497 starts, during packet construction. As each name is added, it is 2498 offered to each of the collected names in order of collection, 2499 starting from the first name. If labels at the end of the name can 2500 be replaced with a reference back to part (or all) of the earlier 2501 name, and if the uncompressed part of the name is shorter than any 2502 compression already found, the earlier name is noted as the 2503 compression target for the name. 2505 The following tables illustrate the process. In an example packet, 2506 the first name is example.com. 
2508 +---+-------------+--------------+--------------------+ 2509 | N | Name | Uncompressed | Compression Target | 2510 +---+-------------+--------------+--------------------+ 2511 | 1 | example.com | | | 2512 +---+-------------+--------------+--------------------+ 2514 The next name added is bar.com. This is matched against example.com. 2515 The com part of this can be used as a compression target, with the 2516 remaining uncompressed part of the name being bar. 2518 +---+-------------+--------------+--------------------+ 2519 | N | Name | Uncompressed | Compression Target | 2520 +---+-------------+--------------+--------------------+ 2521 | 1 | example.com | | | 2522 | 2 | bar.com | bar | 1 + offset to com | 2523 +---+-------------+--------------+--------------------+ 2525 The third name added is www.bar.com. This is first matched against 2526 example.com, and as before this is recorded as a compression target, 2527 with the remaining uncompressed part of the name being www.bar. It 2528 is then matched against the second name, which again can be a 2529 compression target. Because the remaining uncompressed part of the 2530 name is www, this is an improved compression, and so it is adopted. 2532 +---+-------------+--------------+--------------------+ 2533 | N | Name | Uncompressed | Compression Target | 2534 +---+-------------+--------------+--------------------+ 2535 | 1 | example.com | | | 2536 | 2 | bar.com | bar | 1 + offset to com | 2537 | 3 | www.bar.com | www | 2 | 2538 +---+-------------+--------------+--------------------+ 2540 As an optimization, if a name is already perfectly compressed (in 2541 other words, the uncompressed part of the name is empty), then no 2542 further names will be considered for compression. 2544 B.1. NSD compression algorithm 2546 Using the above basic algorithm the packet lengths of responses 2547 generated by NSD [12] can be matched almost exactly. 
At the time of 2548 writing, a tiny number (<0.01%) of the reconstructed packets had 2549 incorrect lengths. 2551 B.2. Knot Authoritative compression algorithm 2553 The Knot Authoritative [13] name server uses different compression 2554 behavior, which is the result of internal optimization designed to 2555 balance runtime speed with compression size gains. In brief, and 2556 omitting complications, Knot Authoritative will only consider the 2557 QNAME and names in the immediately preceding RR section in an RRSET 2558 as compression targets. 2560 A set of heuristics, described below, can be implemented to 2561 mimic this. While not perfect, they produce output that is nearly, but not 2562 quite, as good a match as with NSD. The heuristics are: 2564 1. A match is only perfect if the name is completely compressed AND 2565 the TYPE of the section in which the name occurs matches the TYPE 2566 of the name used as the compression target. 2568 2. If the name occurs in RDATA: 2570 * If the compression target name is in a query, then only the 2571 first RR in an RRSET can use that name as a compression 2572 target. 2574 * The compression target name MUST be in RDATA. 2576 * The name section TYPE must match the compression target name 2577 section TYPE. 2579 * The compression target name MUST be in the immediately 2580 preceding RR in the RRSET. 2582 Using this algorithm, less than 0.1% of the reconstructed packets had 2583 incorrect lengths. 2585 B.3. Observed differences 2587 In sample traffic collected on a root name server, around 2-4% of 2588 responses generated by Knot had different packet lengths to those 2589 produced by NSD. 2591 Appendix C. Comparison of Binary Formats 2593 Several binary serialisation formats were considered, and for 2594 completeness were also compared to JSON. 2596 o Apache Avro [14]. Data is stored according to a pre-defined 2597 schema. The schema itself is always included in the data file.
2599 Data can therefore be stored untagged, for a smaller serialisation 2600 size, and be written and read by an Avro library. 2602 * At the time of writing, Avro libraries are available for C, 2603 C++, C#, Java, Python, Ruby and PHP. Optionally, tools are 2604 available for C++, Java and C# to generate code for encoding 2605 and decoding. 2607 o Google Protocol Buffers [15]. Data is stored according to a pre- 2608 defined schema. The schema is used by a generator to generate 2609 code for encoding and decoding the data. Data can therefore be 2610 stored untagged, for a smaller serialisation size. The schema is 2611 not stored with the data, so, unlike Avro data, the data cannot be read with a 2612 generic library. 2614 * Code must be generated for a particular data schema to read 2615 and write data using that schema. At the time of writing, the 2616 Google code generator can generate code for encoding 2617 and decoding a schema for C++, Go, Java, Python, Ruby, C#, 2618 Objective-C, JavaScript and PHP. 2620 o CBOR [16]. Defined in [RFC7049], this serialisation format is 2621 comparable to JSON but with a binary representation. It does not 2622 use a pre-defined schema, so data is always stored tagged. 2623 However, CBOR data schemas can be described using CDDL 2624 [I-D.ietf-cbor-cddl] and tools exist to verify data files conform 2625 to the schema. 2627 * CBOR is a simple format, and simple to implement. At the time 2628 of writing, the CBOR website lists implementations for 16 2629 languages. 2631 Avro and Protocol Buffers both allow storage of untagged data, but 2632 because they rely on the data schema for this, their implementation 2633 is considerably more complex than CBOR. Using Avro or Protocol 2634 Buffers in an unsupported environment would require notably greater 2635 development effort compared to CBOR.
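As a rough illustration of the implementation effort involved, the following Python sketch hand-encodes a small QueryResponse-like map as CBOR. It is not part of the test program described in this appendix. The map keys are the integer values given in the CDDL (time-offset = 0, client-address-index = 1, client-port = 2, transaction-id = 3); the function names encode_head and encode and the sample field values are assumptions made for illustration, and only unsigned integers, byte strings, arrays and maps are handled.

```python
import struct


def encode_head(major, value):
    """Return the CBOR initial byte and argument for a major type (0-7)."""
    if value < 24:
        # Small arguments are packed directly into the initial byte.
        return bytes([(major << 5) | value])
    # Otherwise the argument follows in 1, 2, 4 or 8 big-endian bytes.
    for additional, fmt in ((24, ">B"), (25, ">H"), (26, ">I"), (27, ">Q")):
        if value < (1 << (8 * struct.calcsize(fmt))):
            return bytes([(major << 5) | additional]) + struct.pack(fmt, value)
    raise ValueError("value too large for CBOR")


def encode(item):
    """Encode an unsigned int, bytes, list or dict as CBOR (sketch only)."""
    if isinstance(item, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(item, int):
        if item < 0:
            raise ValueError("negative integers are not handled in this sketch")
        return encode_head(0, item)                 # major type 0: unsigned int
    if isinstance(item, bytes):
        return encode_head(2, len(item)) + item     # major type 2: byte string
    if isinstance(item, list):
        return encode_head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):
        out = encode_head(5, len(item))             # major type 5: map
        for key, value in item.items():
            out += encode(key) + encode(value)
        return out
    raise TypeError("unsupported type: %r" % type(item))


# Hypothetical sample values; the keys follow the CDDL key assignments.
query_response = {
    0: 1500,    # time-offset
    1: 0,       # client-address-index
    2: 53124,   # client-port
    3: 4660,    # transaction-id
}

encoded = encode(query_response)
print(len(encoded), encoded.hex())  # prints "15 a4001905dc01000219cf8403191234"
```

Every item carries its own type and length in the leading byte, which is why no schema is needed to read the data back, and why the four fields above still cost a key byte each.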
2637 A test program was written which reads input from a PCAP file and 2638 writes output using one of two basic structures; either a simple 2639 structure, where each query/response pair is represented in a single 2640 record entry, or the C-DNS block structure. 2642 The resulting output files were then compressed using a variety of 2643 common general-purpose lossless compression tools to explore the 2644 compressibility of the formats. The compression tools employed were: 2646 o snzip [17]. A command line compression tool based on the Google 2647 Snappy [18] library. 2649 o lz4 [19]. The command line compression tool from the reference C 2650 LZ4 implementation. 2652 o gzip [20]. The ubiquitous GNU zip tool. 2654 o zstd [21]. Compression using the Zstandard algorithm. 2656 o xz [22]. A popular compression tool noted for high compression. 2658 In all cases the compression tools were run using their default 2659 settings. 2661 Note that this draft does not mandate the use of compression, nor any 2662 particular compression scheme, but it anticipates that in practice 2663 output data will be subject to general-purpose compression, and so 2664 this should be taken into consideration. 2666 "test.pcap", a 662Mb capture of sample data from a root instance was 2667 used for the comparison. The following table shows the formatted 2668 size and size after compression (abbreviated to Comp. in the table 2669 headers), together with the task resident set size (RSS) and the user 2670 time taken by the compression. File sizes are in Mb, RSS in kb and 2671 user time in seconds. 2673 +-------------+-----------+-------+------------+-------+-----------+ 2674 | Format | File size | Comp. | Comp. 
size | RSS | User time | 2675 +-------------+-----------+-------+------------+-------+-----------+ 2676 | PCAP | 661.87 | snzip | 212.48 | 2696 | 1.26 | 2677 | | | lz4 | 181.58 | 6336 | 1.35 | 2678 | | | gzip | 153.46 | 1428 | 18.20 | 2679 | | | zstd | 87.07 | 3544 | 4.27 | 2680 | | | xz | 49.09 | 97416 | 160.79 | 2681 | | | | | | | 2682 | JSON simple | 4113.92 | snzip | 603.78 | 2656 | 5.72 | 2683 | | | lz4 | 386.42 | 5636 | 5.25 | 2684 | | | gzip | 271.11 | 1492 | 73.00 | 2685 | | | zstd | 133.43 | 3284 | 8.68 | 2686 | | | xz | 51.98 | 97412 | 600.74 | 2687 | | | | | | | 2688 | Avro simple | 640.45 | snzip | 148.98 | 2656 | 0.90 | 2689 | | | lz4 | 111.92 | 5828 | 0.99 | 2690 | | | gzip | 103.07 | 1540 | 11.52 | 2691 | | | zstd | 49.08 | 3524 | 2.50 | 2692 | | | xz | 22.87 | 97308 | 90.34 | 2693 | | | | | | | 2694 | CBOR simple | 764.82 | snzip | 164.57 | 2664 | 1.11 | 2695 | | | lz4 | 120.98 | 5892 | 1.13 | 2696 | | | gzip | 110.61 | 1428 | 12.88 | 2697 | | | zstd | 54.14 | 3224 | 2.77 | 2698 | | | xz | 23.43 | 97276 | 111.48 | 2699 | | | | | | | 2700 | PBuf simple | 749.51 | snzip | 167.16 | 2660 | 1.08 | 2701 | | | lz4 | 123.09 | 5824 | 1.14 | 2702 | | | gzip | 112.05 | 1424 | 12.75 | 2703 | | | zstd | 53.39 | 3388 | 2.76 | 2704 | | | xz | 23.99 | 97348 | 106.47 | 2705 | | | | | | | 2706 | JSON block | 519.77 | snzip | 106.12 | 2812 | 0.93 | 2707 | | | lz4 | 104.34 | 6080 | 0.97 | 2708 | | | gzip | 57.97 | 1604 | 12.70 | 2709 | | | zstd | 61.51 | 3396 | 3.45 | 2710 | | | xz | 27.67 | 97524 | 169.10 | 2711 | | | | | | | 2712 | Avro block | 60.45 | snzip | 48.38 | 2688 | 0.20 | 2713 | | | lz4 | 48.78 | 8540 | 0.22 | 2714 | | | gzip | 39.62 | 1576 | 2.92 | 2715 | | | zstd | 29.63 | 3612 | 1.25 | 2716 | | | xz | 18.28 | 97564 | 25.81 | 2717 | | | | | | | 2718 | CBOR block | 75.25 | snzip | 53.27 | 2684 | 0.24 | 2719 | | | lz4 | 51.88 | 8008 | 0.28 | 2720 | | | gzip | 41.17 | 1548 | 4.36 | 2721 | | | zstd | 30.61 | 3476 | 1.48 | 2722 | | | xz | 18.15 | 97556 | 38.78 
| 2723 | | | | | | | 2724 | PBuf block | 67.98 | snzip | 51.10 | 2636 | 0.24 | 2725 | | | lz4 | 52.39 | 8304 | 0.24 | 2726 | | | gzip | 40.19 | 1520 | 3.63 | 2727 | | | zstd | 31.61 | 3576 | 1.40 | 2728 | | | xz | 17.94 | 97440 | 33.99 | 2729 +-------------+-----------+-------+------------+-------+-----------+ 2731 The above results are discussed in the following sections. 2733 C.1. Comparison with full PCAP files 2735 An important first consideration is whether moving away from PCAP 2736 offers significant benefits. 2738 The simple binary formats are typically larger than PCAP, even though 2739 they omit some information such as Ethernet MAC addresses. However, 2740 they require less CPU to compress than PCAP, and the resulting 2741 compressed files are smaller than compressed PCAP. 2743 C.2. Simple versus block coding 2745 The intention of the block coding is to perform data de-duplication 2746 on query/response records within the block. The simple and block 2747 formats above store exactly the same information for each query/ 2748 response record. This information is parsed from the DNS traffic in 2749 the input PCAP file, and in all cases each field has an identifier 2750 and the field data is typed. 2752 The data de-duplication in the block formats yields an order of 2753 magnitude reduction in format file size compared with the 2754 simple formats. As would be expected, the compression tools are able 2755 to find and exploit a lot of this duplication, but as the de- 2756 duplication process uses knowledge of DNS traffic, it is able to 2757 retain a size advantage. This advantage reduces as stronger 2758 compression is applied, as again would be expected, but even with the 2759 strongest compression applied the block formatted data remains around 2760 75% of the size of the simple format and its compression requires 2761 roughly a third of the CPU time. 2763 C.3.
Binary versus text formats 2765 Text data formats offer many advantages over binary formats, 2766 particularly in the areas of ad-hoc data inspection and extraction. 2767 It was therefore felt worthwhile to carry out a direct comparison, 2768 implementing JSON versions of the simple and block formats. 2770 Concentrating on the JSON block format, the format files produced are a 2771 significant fraction of an order of magnitude larger than binary 2772 formats. The impact on file size after compression is as might be 2773 expected from that starting point; the stronger compression produces 2774 files that are 150% of the size of similarly compressed binary 2775 formats, and require over 4x more CPU to compress. 2777 C.4. Performance 2779 Concentrating again on the block formats, all three produce format 2780 files that are close to an order of magnitude smaller than the 2781 original "test.pcap" file. CBOR produces the largest files and Avro 2782 the smallest, 20% smaller than CBOR. 2784 However, once compression is taken into account, the size difference 2785 narrows. At medium compression (with gzip), the size difference is 2786 4%. Using strong compression (with xz) the difference reduces to 2%, 2787 with Avro the largest and Protocol Buffers the smallest, although 2788 CBOR and Protocol Buffers require slightly more compression CPU. 2790 The measurements presented above do not include data on the CPU 2791 required to generate the format files. Measurements indicate that 2792 writing Avro requires 10% more CPU than CBOR or Protocol Buffers. It 2793 appears, therefore, that Avro's advantage in compression CPU usage is 2794 probably offset by a larger CPU requirement in writing Avro. 2796 C.5. Conclusions 2798 The above assessments lead us to the choice of a binary format file 2799 using blocking. 2801 As noted previously, this draft anticipates that output data will be 2802 subject to compression.
There is no compelling case for one 2803 particular binary serialisation format in terms of either final file 2804 size or machine resources consumed, so the choice must be largely 2805 based on other factors. CBOR was therefore chosen as the binary 2806 serialisation format for the reasons listed in Section 5. 2808 C.6. Block size choice 2810 Given the choice of a CBOR format using blocking, the question arises 2811 of what an appropriate default value for the maximum number of query/ 2812 response pairs in a block should be. This has two components; what 2813 is the impact on performance of using different block sizes in the 2814 format file, and what is the impact on the size of the format file 2815 before and after compression. 2817 The following table addresses the performance question, showing the 2818 impact on the performance of a C++ program converting "test.pcap" to 2819 C-DNS. File size is in Mb, resident set size (RSS) in kb. 2821 +------------+-----------+--------+-----------+ 2822 | Block size | File size | RSS | User time | 2823 +------------+-----------+--------+-----------+ 2824 | 1000 | 133.46 | 612.27 | 15.25 | 2825 | 5000 | 89.85 | 676.82 | 14.99 | 2826 | 10000 | 76.87 | 752.40 | 14.53 | 2827 | 20000 | 67.86 | 750.75 | 14.49 | 2828 | 40000 | 61.88 | 736.30 | 14.29 | 2829 | 80000 | 58.08 | 694.16 | 14.28 | 2830 | 160000 | 55.94 | 733.84 | 14.44 | 2831 | 320000 | 54.41 | 799.20 | 13.97 | 2832 +------------+-----------+--------+-----------+ 2834 Increasing block size, therefore, tends to increase maximum RSS a 2835 little, with no significant effect (if anything a small reduction) on 2836 CPU consumption. 2838 The following figure plots the effect of increasing block size on 2839 output file size for different compressions. 
2841 Figure showing effect of block size on file size (PNG) [23] 2843 Figure showing effect of block size on file size (SVG) [24] 2845 From the above, there is obviously scope for tuning the default block 2846 size to the compression being employed, traffic characteristics, 2847 frequency of output file rollover etc. Using a strong compression, 2848 block sizes over 10,000 query/response pairs would seem to offer 2849 limited improvements. 2851 Authors' Addresses 2853 John Dickinson 2854 Sinodun IT 2855 Magdalen Centre 2856 Oxford Science Park 2857 Oxford OX4 4GA 2858 United Kingdom 2860 Email: jad@sinodun.com 2862 Jim Hague 2863 Sinodun IT 2864 Magdalen Centre 2865 Oxford Science Park 2866 Oxford OX4 4GA 2867 United Kingdom 2869 Email: jim@sinodun.com 2871 Sara Dickinson 2872 Sinodun IT 2873 Magdalen Centre 2874 Oxford Science Park 2875 Oxford OX4 4GA 2876 United Kingdom 2878 Email: sara@sinodun.com 2879 Terry Manderson 2880 ICANN 2881 12025 Waterfront Drive 2882 Suite 300 2883 Los Angeles CA 90094-2536 2885 Email: terry.manderson@icann.org 2887 John Bond 2888 ICANN 2889 12025 Waterfront Drive 2890 Suite 300 2891 Los Angeles CA 90094-2536 2893 Email: john.bond@icann.org