idnits 2.17.1 draft-mcfadden-smart-rfc3552-textual-research-02.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There is 1 instance of too long lines in the document, the longest one being 12 characters in excess of 72. ** The abstract seems to contain references ([RFC3552]), which it shouldn't. Please replace those with straight textual mentions of the documents in question. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (September 9, 2020) is 1296 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'RFC3410' is mentioned on line 454, but not defined == Missing Reference: 'RFC4301' is mentioned on line 456, but not defined == Missing Reference: 'RFC7258' is mentioned on line 618, but not defined == Unused Reference: '2' is defined on line 596, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2223 (Obsoleted by RFC 7322) Summary: 3 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 SMART M. McFadden 2 Internet-Draft internet policy advisors ltd 3 A. 
Mills 4 UWE - Bristol 6 Intended status: Informational September 9, 2020 7 Expires: March 9, 2021 9 Textual Analysis Methodology for Security Considerations Sections 10 draft-mcfadden-smart-rfc3552-textual-research-02.txt 12 Status of this Memo 14 This Internet-Draft is submitted in full conformance with the 15 provisions of BCP 78 and BCP 79. 17 Internet-Drafts are working documents of the Internet Engineering 18 Task Force (IETF), its areas, and its working groups. Note that 19 other groups may also distribute working documents as Internet- 20 Drafts. 22 Internet-Drafts are draft documents valid for a maximum of six 23 months and may be updated, replaced, or obsoleted by other documents 24 at any time. It is inappropriate to use Internet-Drafts as 25 reference material or to cite them other than as "work in progress." 27 The list of current Internet-Drafts can be accessed at 28 http://www.ietf.org/ietf/1id-abstracts.txt 30 The list of Internet-Draft Shadow Directories can be accessed at 31 http://www.ietf.org/shadow.html 33 This Internet-Draft will expire on March 9, 2021. 35 Copyright Notice 37 Copyright (c) 2020 IETF Trust and the persons identified as the 38 document authors. All rights reserved. 40 This document is subject to BCP 78 and the IETF Trust's Legal 41 Provisions Relating to IETF Documents 42 (http://trustee.ietf.org/license-info) in effect on the date of 43 publication of this document. Please review these documents 44 carefully, as they describe your rights and restrictions with 45 respect to this document. Code Components extracted from this 46 document must include Simplified BSD License text as described in 47 Section 4.e of the Trust Legal Provisions and are provided without 48 warranty as described in the Simplified BSD License. 50 Abstract 52 [RFC3552] provides guidance to authors in crafting RFC text on 53 Security Considerations. The RFC is more than fifteen years old. 
54 With the threat landscape and security ecosystem significantly 55 changed since the RFC was published, RFC3552 is a candidate for 56 update. This draft proposes that, prior to drafting an update to 57 RFC3552, an examination of recent, published Security Considerations 58 sections be carried out as a baseline for how to improve RFC3552. It 59 suggests a methodology for examining Security Considerations 60 sections in published RFCs and the extraction of both quantitative 61 and qualitative information that could inform a revision of the 62 older guidance. It also reports on a recent experiment on textual 63 analysis of sixteen years of RFC Security Consideration sections. 65 Table of Contents 67 1. Introduction...................................................3 68 2. Conventions used in this document..............................3 69 3. Motivation.....................................................4 70 3.1. Non-goals and scoping.....................................5 71 3.2. Research Group............................................5 72 4. Goals for Surveying Existing Security Considerations Sections..5 73 5. Methodology....................................................5 74 5.1. Methodology Overview......................................5 75 5.2. Quantitative Methodology..................................6 76 5.3. Qualitative Methodology...................................7 77 5.4. Implications of the Size of n-set.........................7 78 6. Experimental Activity..........................................8 79 6.1. Experiment Methodology....................................8 80 6.2. Stopword List.............................................8 81 6.3. Resulting Characterization...............................10 82 6.4. Indicative Results.......................................11 83 6.4.1. Top Ten Word Counts in Four Sample Years............11 84 6.4.2. 
Top Ten Word Counts Without RFC2119 Words in Four 85 Sample Years...............................................12 86 6.4.3. Normative RFC2119 Words in Security Considerations..12 87 7. Security Considerations.......................................13 88 8. IANA Considerations...........................................13 89 9. References....................................................13 90 9.1. Normative References.....................................13 91 9.2. Informative References...................................13 92 Appendix A. Document History.....................................14 93 Appendix B. 75 Most Common Words in Security Considerations Sections 94 .................................................................15 96 1. Introduction 98 [RFC2223] requires that all RFCs have a Security Consideration 99 section. The motivation of the section is both to encourage RFC 100 authors to consider security in protocol design and to inform 101 readers of relevant security issues. RFC3552 was published in July 102 of 2003 to give guidance to RFC authors on how to write a good 103 Security Considerations section. It is structured in three parts: a 104 tutorial and definitional section, then a series of guidelines, and 105 finally a series of examples. 107 It is possible to observe that the Internet security landscape has 108 changed significantly since the publication of RFC3552. Rather than 109 an immediate attempt to draft and discuss a revision to the older 110 RFC, it may be prudent to learn from the experience of more than 111 fifteen years of documents published since RFC3552 was approved for 112 publication. 114 It is possible that an examination of published Security 115 Considerations sections of existing documents could give both 116 quantitative and qualitative insight on how to proceed with a newer 117 version of the Security Considerations guidelines. 
The motivation is 118 to inform any discussion of a revision with quantitative and 119 qualitative data gleaned from years of published RFCs. 121 This document proposes a methodology for such research. 123 The scope of this proposal is the research itself. Discussion 124 of relevant issues, document organization and revised content for a 125 revision of RFC3552 is out of scope. Instead, the motivation is to 126 guide a piece of research that would later form part of the 127 foundation for a discussion of a revision to RFC3552. 129 2. Conventions used in this document 131 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 132 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 133 document are to be interpreted as described in RFC 2119 [RFC2119]. 135 In this document, these words will appear with that interpretation 136 only when in ALL CAPS. Lower case uses of these words are not to be 137 interpreted as carrying the significance described in RFC 2119. 139 3. Motivation 141 Since 1998, all RFCs have been required to have a Security 142 Considerations section. The authors of RFC3552 observed that 143 "historically, such sections have been relatively weak." The 144 motivation for RFC3552 was, in part, to improve the quality of 145 Security Considerations sections. 147 Today the Internet threat model, the landscape of attacks, and our 148 understanding of how to craft protocols that are more robust and 149 resilient have changed significantly. Experience in both protocol 150 design and implementation has greatly improved our understanding of 151 the security implications of choices made during protocol design. 153 It is possible that a revision of RFC3552, reflecting the changes to 154 the Internet and our understanding of the evolved security landscape 155 and threat model, is appropriate. The IAB is currently examining and 156 reassessing the Internet's threat model [1].
158 The IAB has previously discussed a potential revision to RFC3552 in 159 its report from the Strengthening the Internet (STRINT) Workshop. In 160 section 2 of [RFC7687], the editors report that "...the IETF may be 161 in a position to start to develop an update to BCP 72 [RFC3552], 162 most likely as a new RFC enhancing that BCP and dealing with 163 recommendations on how to mitigate PM and how to reflect that in 164 IETF work." 166 If a revision were to be contemplated, it would be useful to learn 167 from the body of experience of crafting Security Considerations 168 sections in recent years. That body of experience could inform the 169 discussion of what makes up a good Security Considerations section 170 by collecting real-world data from existing RFCs. It would be 171 possible to have a survey of the existing Security Considerations 172 sections in published RFCs. The data collected from that survey 173 could provide one source of information for discussion of how to 174 improve upon RFC3552 in the current environment. 176 For such a survey to be successful, an outline of some basic goals 177 and a methodology would be required. This document provides those 178 goals and methodology. The intent is that individuals or 179 organizations could then carry out such a survey, publish the 180 results and use that data to inform any discussion of a potential 181 3552bis. 183 This draft also documents the results of a recent experiment to 184 conduct an automated survey of words in Security Considerations 185 sections. 187 3.1. Non-goals and scoping 189 This document specifically does not make suggestions for changes to 190 RFC3552. It also does not identify changes to the Internet threat 191 model or the general security landscape that has changed since that 192 RFC has been published. 
194 The scope of this document is to provide a basic set of goals for 195 research on existing Security Considerations sections and establish 196 a methodology for conducting that research. 198 3.2. Research Group 200 The research work suggested in this document was envisioned and 201 intended to be carried out as a research activity of the proposed 202 Stopping Malware and Researching Threats (SMART) research group in 203 the IRTF. The work could also be conducted independently and 204 submitted as an Independent Submission in the IETF. 206 4. Goals for Surveying Existing Security Considerations Sections 208 A cursory examination of recent years' Security Considerations 209 sections shows that authors publish a wide variety of these 210 sections. This is natural since the RFC series has a diverse set of 211 purposes and readership. 213 However, even a cursory examination shows that published Security 214 Considerations sections have some clear characteristics. Identifying 215 useful characteristics and then surveying the existing base of 216 published RFCs may provide a useful base of information for a later 217 discussion of revising RFC3552. 219 The goal of surveying existing Security Considerations sections is 220 to provide quantitative and qualitative data, from existing, 221 published RFCs, that can be used to inform a discussion of revising 222 RFC3552. 224 5. Methodology 226 5.1. Methodology Overview 228 The survey of existing Security Considerations sections would 229 examine a subset of RFCs published since the publication of RFC3552. 230 RFCs obsoleted by later publications, RFCs that are reports from IAB 231 activities, and IETF, IRTF, and IESG administrative RFCs are omitted 232 from consideration. 234 The survey should select a specific timeframe and examine all 235 RFCs published in that period.
237 The examination proceeds in two parts: a quantitative examination of 238 the Security Considerations sections and then a qualitative 239 examination. 241 As an example, the quantitative examination might survey and collect 242 data on the source of the RFC (e.g. Security Area, Routing Area, 243 Transport Area), whether the RFC extends the Security Considerations 244 section of a previously published document, the word count of the 245 section, and the existence of specific keywords. 247 The qualitative analysis might group Security Considerations 248 sections by particular characteristics - those characteristics being 249 discovered, in part, during an initial examination of the published 250 documents. 252 5.2. Quantitative Methodology 254 Once the set of RFCs to be considered is established (the size of 255 that set is referred to as n-set), the quantitative analysis proceeds 256 as follows for each item in the set: 258 o recording the date of publication 260 o recording the source of the original draft 262 o recording the category of the RFC (e.g. Informational) 264 o recording the size of the Security Considerations section in 265 words and paragraphs 267 o recording whether or not the section updates or extends the 268 Security Considerations section of a previously published 269 document 271 o recording whether or not examples exist in the Security 272 Considerations section 274 o recording whether or not example code appears in the Security 275 Considerations section 277 o extracting the text and creating a new text with the 100 most 278 common English words removed 280 o performing text analytics against the new text created in the 281 step above - for instance, creating a count of the number of 282 occurrences of expected keywords 284 The result would be a series of metrics for n-set that establish 285 certain characteristics of the Security Considerations sections of 286 published RFCs.
Once the quantitative data is gathered, further 287 analysis of the data could be conducted (for instance, finding 288 relationships between certain features of the RFCs). 290 5.3. Qualitative Methodology 292 The documents could also be assigned qualitative characteristics as 293 a result of the survey. For instance, based on characteristics of 294 the document, the Security Considerations could be characterized as 295 "extensive" or "limited." 297 It is also clear that analysis of the Security Considerations could 298 lead to other groupings. For instance, an analysis of recent RFCs 299 shows that those documents which focus on cipher suites have quite 300 different security considerations sections compared to those that 301 extend an existing protocol. Identification of those 302 characteristics might be possible during an initial survey. In 303 other cases, those characteristics might emerge during the survey 304 execution. 306 5.4. Implications of the Size of n-set 308 Since part of the execution of the survey has to be done via human 309 intervention, the size of n-set has an effect on whether or not 310 volunteers or organizations take on the effort. While it would be 311 helpful to have as large a sample size as possible for the 312 collection of data to support the analysis, it may be necessary to 313 limit the size of n-set in practice. 315 One way to do this is to limit the range of dates for the RFCs being 316 analyzed. A cursory, initial examination of Security Considerations 317 sections seems to indicate that, in recent years, a clear set of 318 prototypical security considerations sections has emerged and that 319 there are distinct types of sections. Limiting the set of 320 considered documents to a specific, recent timeframe focuses 321 the analysis on recent practice in crafting Security 322 Considerations sections and moving them through the document 323 approval process.
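As an illustration of the per-document measurements listed in Section 5.2, the following is a minimal Python sketch and not the authors' tooling. The names SectionMetrics, measure_section, COMMON_WORDS and KEYWORDS are hypothetical; COMMON_WORDS is a ten-word stand-in for the 100 most common English words, and KEYWORDS is an arbitrary example of "expected keywords".

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical stand-in for the 100 most common English words (Section 5.2).
COMMON_WORDS = {"the", "of", "and", "to", "a", "in", "is", "that", "be", "it"}
# Hypothetical examples of "expected keywords"; the draft does not list any.
KEYWORDS = ("attack", "authentication", "encryption")


@dataclass
class SectionMetrics:
    words: int            # size of the section in words
    paragraphs: int       # size of the section in paragraphs
    keyword_counts: dict  # occurrences of each expected keyword


def measure_section(text):
    """Collect a few of the Section 5.2 metrics for one section's text."""
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    tokens = re.findall(r"[a-z]+", text.lower())
    # The "new text": the same tokens with the common words removed.
    reduced = Counter(t for t in tokens if t not in COMMON_WORDS)
    return SectionMetrics(
        words=len(tokens),
        paragraphs=len(paragraphs),
        keyword_counts={k: reduced[k] for k in KEYWORDS},
    )
```

The remaining metrics (publication date, source, category, and whether the section extends an earlier one) would come from RFC metadata or human review rather than from the section text.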
325 Another approach to solving the potential problem of the size of 326 n-set is to incorporate a sampling regime for the selection of RFCs to 327 be examined. This would be a meaningful approach in the event where 328 the timeframe was extended, but where it was still desirable to 329 reduce the size of n-set. 331 This proposal suggests using the timeframe limitation without 332 incorporating sampling. 334 6. Experimental Activity 336 One of the authors has conducted an experiment that is consistent 337 with many of the features of the methodology in Section 5 above. 338 This experiment used a pair of Python scripts to extract the 339 Security Considerations sections from historic RFCs and then parse 340 those sections to get word frequency information from those Security 341 Considerations. 343 The initial experiment was motivated by a desire to see if one could 344 detect changes in Security Considerations section wording after 345 significant security incidents in the public Internet. In 346 particular, the experiment was designed to detect changes in the 347 frequency of words over time. 349 6.1. Experiment Methodology 351 The RFC series was grouped into input files based on the year of 352 publication of the RFC. 354 HTML versions of the RFC series documents were used as input and 355 put through an open source parser. The parser then identified the 356 words "Security Consideration" or "Security" in header text. It then 357 output that text to a temporary file in UTF-8 encoding until the 358 parser encountered the next section. 360 The parser removed non-textual material from the temporary files 361 including hyphens, RFC references, anchor URLs, other section 362 references, standalone letters and other characters that were not 363 words. 365 It then built a frequency list for all words not in a designated 366 list of words not to be counted. This list is held in a variable, 367 so words can be added to or removed from it.
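The cleaning and counting steps described in Section 6.1 can be sketched in Python as follows. This is an illustrative sketch, not the scripts used in the experiment. The function names and regular expressions are hypothetical, STOPWORDS is only a subset of the designated list, and the handling of hyphens (kept inside a word, dropped when free-standing) is an assumption, since the experiment's exact rule is not stated.

```python
import re
from collections import Counter

# Subset of the designated list of words not to be counted, lowercased.
STOPWORDS = {"also", "could", "would", "however", "one", "see", "use"}

URL = re.compile(r"https?://\S+")
RFC_REF = re.compile(r"\[?RFC\s?\d+\]?", re.IGNORECASE)
SECTION_REF = re.compile(r"\bsection\s+\d+(?:\.\d+)*", re.IGNORECASE)
# Assumption: hyphens inside a word are kept; free-standing ones are dropped.
WORD = re.compile(r"[a-z]+(?:-[a-z]+)*")


def clean_words(text):
    """Remove URLs, RFC and section references, then return the words."""
    for pattern in (URL, RFC_REF, SECTION_REF):
        text = pattern.sub(" ", text)
    words = WORD.findall(text.lower())
    # Drop standalone letters.
    return [w for w in words if len(w) > 1]


def word_frequencies(text, stopwords=STOPWORDS):
    """Build the frequency list for all words not in the stopword list."""
    return Counter(w for w in clean_words(text) if w not in stopwords)
```

Calling word_frequencies(text).most_common(10) on the concatenated sections for one year would give a top-ten list of the kind reported in Section 6.4.1. Passing a different stopwords set corresponds to changing the designated list.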
369 6.2. Stopword List 371 The following list of words was used as the designated list of 372 words not to be counted: 374 . Also 376 . Could 378 . Would 380 . However 382 . One 384 . See 386 . Use 388 . Therefore 390 . Discussed 392 . New 394 . March 396 . Type 398 . Even 400 . Following 402 . Without 404 . Bradner 406 . Using 408 . Described 410 . Might 412 . Thus 414 . Two 416 . Since 418 . Different 420 . Number 421 . Via 423 . Mechanism 425 . Used 427 . Tl 429 . Header 431 . Field 433 . Name 435 . Sent 437 6.3. Resulting Characterization 439 The result of this experiment is a pair of files for each year 440 starting in 2003. The two files for each year are: 442 . A word frequency file sorted by the number of times a 443 particular word appears in the Security Considerations section 444 of RFCs published in that year; and, 446 . An RFC Count file that counts how many times each RFC was 447 mentioned within the Security Considerations sections. 449 The idea behind the second file was to see if there was a trend or 450 change in the RFCs cited and what this might suggest 451 with regard to the content of these sections. For example, in 2004 the 452 most referenced RFC was [RFC3410], Applicability Statements for 453 SNMP; in 2009 it was [RFC4301], Security Architecture for IP, though 454 [RFC3410] was also referenced a high number of times. 456 As [RFC4301] came out in 2005, we would not expect it to be 457 referenced in 2004, but the reference count in 2009 could indicate 458 that a number of RFCs simply referred to the 459 Security Considerations section of this RFC in a line similar to 460 "this extends the security consideration of ." This 461 could then be used to help narrow the qualitative focus to these 462 highly referenced RFCs and also to see whether, in some cases, lip service 463 is all that is occurring within other Security Considerations 464 sections.
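A count of this kind can be produced with a short Python sketch such as the one below. It is an assumption about how an RFC Count file might be built, not the experiment's actual script. The function name rfc_mention_counts and the regular expression are hypothetical, and the input is taken to be a list of already-extracted Security Considerations texts for one year.

```python
import re
from collections import Counter

# Matches "RFC4301", "RFC 4301" and "[RFC4301]"; the pattern is an assumption.
RFC_REF = re.compile(r"\[?RFC\s?(\d{1,5})\]?", re.IGNORECASE)


def rfc_mention_counts(sections):
    """Count how many times each RFC is mentioned across the given texts."""
    counts = Counter()
    for text in sections:
        # Normalize every mention to the form "RFCnnnn" before counting.
        counts.update(f"RFC{int(num)}" for num in RFC_REF.findall(text))
    return counts
```

Sorting the result with most_common() gives the most referenced RFCs for the year, which is the comparison made above between 2004 and 2009.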
466 Another result, included with the word frequency file, is a list of 467 words similar to the word "security" based on context analysis. This 468 is another indicator that can be used to look at how the language of 469 the RFC series is changing. For example, looking at 2004, the most 470 similar words are: 472 . Used, ipsec, mode, authentication, implementation, message, 473 may, watcher, method and block. 475 In 2009: 477 . Message, attacker, syslog, used, attack, information, 478 transport, gruu, may and case. 480 Yet another result was a file that provides comparative data for 481 word counts in the Security Considerations and Privacy 482 Considerations sections of published RFCs. The result provides a 483 look at whether the length of those sections might have changed over 484 time. 486 A final result was a frequency count over the entire period examined 487 for Internet Standards, BCPs, and Proposed Standards. This result 488 gives an indication of whether or not the average length of these 489 sections has changed - either over time, or in response to specific 490 security incidents on the public Internet. 492 6.4. Indicative Results 494 This draft is focused on proposing a methodology and not on the 495 experiment being reported on here. However, there are some 496 indicative results that may be of use as a future methodology is 497 considered. It is worth observing that, for the original motivation of 498 the experiment - to see if Security Considerations sections changed 499 in the face of security-related events on the public Internet - the 500 results showed that no significant re-wording took place over the timeframe 501 studied. 503 6.4.1. Top Ten Word Counts in Four Sample Years 505 Choosing four sample years (2019, 2014, 2009 and 2004) as examples, 506 the experiment found the following most frequent words in Security 507 Considerations sections (the lists run from most frequent to tenth 508 most frequent). 510 .
2019 - security, server, data, message, may, network, attack, 511 information, client, xmpp-grid 513 . 2014 - security, information, attack, message, may, used, 514 server, data, authentication, network 516 . 2009 - security, may, message, address, attack, used, packet, 517 protocol, network, information 519 . 2004 - security, may, key, authentication, object, used, 520 information, message, attack, access 522 6.4.2. Top Ten Word Counts Without RFC2119 Words in Four Sample Years 524 Taking the same data and removing the normative words that are 525 defined in RFC2119 leads to slightly different results. 527 . 2019 - security, server, data, message, network, attack, 528 information, client, xmpp-grid, document 530 . 2014 - security, information, message, used, server, data, 531 authentication, network, attacker 533 . 2009 - security, message, address, attack, used, packet, 534 protocol, network, information, object 536 . 2004 - security, key, authentication, object, used, 537 information, message, attack, access, user 539 6.4.3. Normative RFC2119 Words in Security Considerations 541 The word MAY always appears more often than any other RFC2119 word 542 in Security Considerations sections. The word MUST is most often 543 the next most frequent after MAY and is often in the top 15 words sorted by 544 frequency. 546 However, the word SHOULD hardly ever appears in the top 100 most 547 frequent words for any year of published RFCs. 549 Most Frequent Words in Proposed Standards Security Considerations 551 Over the entire period 2003-2019, the most frequent non-normative 552 words in Security Considerations sections were: 554 . Security, message, attack, server, information, key, 555 authentication, network, protocol, client 557 A list of the 75 most common non-normative words is provided in 558 Appendix B. 560 7. Security Considerations 562 This document describes goals and a methodology for surveying the 563 existing body of Security Considerations in published RFCs.
It does 564 not create, extend or modify any protocols. Its intent is to provide 565 a foundation for a data-driven discussion of the guidelines for 566 writing a Security Considerations section in an RFC. 568 8. IANA Considerations 570 Upon publication, this document has no required actions for IANA. 572 9. References 574 9.1. Normative References 576 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 577 Requirement Levels", BCP 14, RFC 2119, March 1997. 579 [RFC2223] Postel, J. and Reynolds, J., "Instructions to RFC 580 Authors", RFC 2223, October 1997. 582 [RFC3552] Rescorla, E. and Korver, B. (Editors), "Guidelines for 583 Writing RFC Text on Security Considerations", BCP 72, 584 RFC 3552, July 2003. 586 [RFC7687] Farrell, S., Wenning, R., Bos, B., Blanchet, M. and Tschofenig, 587 H., "Report from the Strengthening the Internet (STRINT) 588 Workshop", RFC 7687, December 2015. 590 9.2. Informative References 592 [1] Model-t -- Discussions of changes in Internet deployment 593 patterns and their impact on the Internet threat model, 594 https://www.ietf.org/mailman/listinfo/model-t 596 [2] Acknowledgments 598 This document was prepared using 2-Word-v2.0.template.dot. 600 Appendix A. Document History 602 -00 604 Initial Internet Draft 606 -01 608 Section 6 and Appendix B are added. Significant editing of Section 3 609 on Motivation and Section 5 on Methodology. Several typos fixed. 611 -02 613 This draft was presented at the combined RIPE MAT and IRTF MAPRG 614 meeting held on August 8, 2020. Many of the comments during that 615 session focused on the fact that words like privacy and surveillance 616 did not appear in the lists of most commonly used words. After the 617 presentation, a quick check indicates that, after the 618 publication of [RFC7258], those words do not appear in the Security 619 Considerations sections of those documents.
621 The MAPRG presentation of this draft is available at the following 622 URL: 624 https://trac.ietf.org/trac/irtf/wiki/map 626 Appendix B. 75 Most Common Words in Security Considerations Sections 628 Over the entire period 2003-2019, the 75 most frequent words in 629 Security Considerations sections were (in order by frequency): 631 security, message, attack, data, used, may, authentication, key, 632 access, protocol, information, must, address, transport, process, 633 model, client, server, network, ipfix, tl, user, traffic, packet, 634 object, operation, control, service, ipp, example, document, 635 implementation, measurement, collecting, secure, header, attacker, 636 identity, value, job, need, support, snmp, provide, printer, uri, 637 certificate, authenticated, possible, name, content, source, 638 connection, field, set, system, dtls, cause, sensitive, domain, 639 provides, configuration, router, privacy, protection, peer, nacm, 640 layer, ip, device, exporting, within, request, large, and signature. 642 Authors' Addresses 644 Mark McFadden 645 Internet policy advisors ltd 646 Madison Wisconsin US 648 Email: mark@internetpolicyadvisors.com 650 Alan Mills 651 University of the West of England, Bristol 652 Bristol BS16 1QY United Kingdom 654 Email: Alan2.Mills@live.uwe.ac.uk