idnits 2.17.1 draft-mcfadden-rfc3552-research-methodology-00.txt: Checking boilerplate required by RFC 5378 and the IETF Trust (see https://trustee.ietf.org/license-info): ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt: ---------------------------------------------------------------------------- No issues found here. Checking nits according to https://www.ietf.org/id-info/checklist : ---------------------------------------------------------------------------- ** There is 1 instance of too long lines in the document, the longest one being 12 characters in excess of 72. Miscellaneous warnings: ---------------------------------------------------------------------------- == The copyright year in the IETF Trust and authors Copyright Line does not match the current year -- The document date (March 4, 2020) is 1513 days in the past. Is this intentional? Checking references for intended status: Informational ---------------------------------------------------------------------------- == Missing Reference: 'X' is mentioned on line 342, but not defined == Missing Reference: 'RFC3410' is mentioned on line 475, but not defined == Missing Reference: 'RFC4301' is mentioned on line 477, but not defined == Unused Reference: '2' is defined on line 617, but no explicit reference was found in the text ** Obsolete normative reference: RFC 2223 (Obsoleted by RFC 7322) Summary: 2 errors (**), 0 flaws (~~), 5 warnings (==), 1 comment (--). Run idnits with the --verbose option for more detailed information about the items above. -------------------------------------------------------------------------------- 1 Independent Submission M. McFadden 2 Internet-Draft internet policy advisors ltd 3 Intended status: Informational A. 
Mills 4 Expires: September 4, 2020 UWE - Bristol 5 March 4, 2020 7 Methodology for Researching Security Considerations Sections 8 draft-mcfadden-rfc3552-research-methodology-00.txt 10 Status of this Memo 12 This Internet-Draft is submitted in full conformance with the 13 provisions of BCP 78 and BCP 79. 15 Internet-Drafts are working documents of the Internet Engineering 16 Task Force (IETF), its areas, and its working groups. Note that 17 other groups may also distribute working documents as Internet- 18 Drafts. 20 Internet-Drafts are draft documents valid for a maximum of six 21 months and may be updated, replaced, or obsoleted by other documents 22 at any time. It is inappropriate to use Internet-Drafts as 23 reference material or to cite them other than as "work in progress." 25 The list of current Internet-Drafts can be accessed at 26 http://www.ietf.org/ietf/1id-abstracts.txt 28 The list of Internet-Draft Shadow Directories can be accessed at 29 http://www.ietf.org/shadow.html 31 This Internet-Draft will expire on September 4, 2020. 33 Copyright Notice 35 Copyright (c) 2020 IETF Trust and the persons identified as the 36 document authors. All rights reserved. 38 This document is subject to BCP 78 and the IETF Trust's Legal 39 Provisions Relating to IETF Documents 40 (http://trustee.ietf.org/license-info) in effect on the date of 41 publication of this document. Please review these documents 42 carefully, as they describe your rights and restrictions with 43 respect to this document. Code Components extracted from this 44 document must include Simplified BSD License text as described in 45 Section 4.e of the Trust Legal Provisions and are provided without 46 warranty as described in the Simplified BSD License. 48 Abstract 50 RFC3552 provides guidance to authors in crafting RFC text on 51 Security Considerations. The RFC is more than fifteen years old. 
52 With the threat landscape and security ecosystem significantly 53 changed since the RFC was published, RFC3552 is a candidate for 54 update. This draft proposes that, prior to drafting an update to 55 RFC3552, an examination of recent, published Security Considerations 56 sections be carried out as a baseline for how to improve RFC3552. It 57 suggests a methodology for examining Security Considerations 58 sections in published RFCs and extracting both quantitative 59 and qualitative information that could inform a revision of the 60 older guidance. It also reports on a recent experiment on textual 61 analysis of sixteen years of RFC Security Considerations sections. 63 Table of Contents 65 1. Introduction...................................................3 66 2. Conventions used in this document..............................3 67 3. Motivation.....................................................4 68 3.1. Non-goals and scoping.....................................5 69 3.2. Research Group............................................5 70 4. Goals for Surveying Existing Security Considerations Sections..5 71 5. Methodology....................................................5 72 5.1. Methodology Overview......................................5 73 5.2. Quantitative Methodology..................................6 74 5.3. Qualitative Methodology...................................7 75 5.4. Implications of the Size of n-set.........................7 76 5.5. Potential Additional Metrics..............................8 77 6. Experimental Activity..........................................8 78 6.1. Experiment Methodology....................................9 79 6.2. Stopword List.............................................9 80 6.3. Resulting Characterization...............................10 81 6.4. Indicative Results.......................................12 82 6.4.1. Top Ten Word Counts in Four Sample Years............12 83 6.4.2. 
Top Ten Word Counts Without RFC2119 Words in Four 84 Sample Years...............................................12 85 6.4.3. Normative RFC2119 Words in Security Considerations..13 86 7. Security Considerations.......................................13 87 8. IANA Considerations...........................................13 88 9. References....................................................13 89 9.1. Normative References.....................................13 90 9.2. Informative References...................................14 91 Appendix A. Document History.....................................15 92 Appendix B. 75 Most Common Words in Security Considerations Sections 93 .................................................................16 95 1. Introduction 97 [RFC2223] requires that all RFCs have a Security Considerations 98 section. The purpose of the section is both to encourage RFC 99 authors to consider security in protocol design and to inform 100 readers of relevant security issues. RFC3552 was published in July 101 of 2003 to give guidance to RFC authors on how to write a good 102 Security Considerations section. It is structured in three parts: a 103 tutorial and definitional section, then a series of guidelines, and 104 finally a series of examples. 106 The Internet security landscape has 107 changed significantly since the publication of RFC3552. Rather than 108 an immediate attempt to draft and discuss a revision to the older 109 RFC, it may be prudent to learn from the experience of more than 110 fifteen years of documents published since RFC3552 was approved for 111 publication. 113 It is possible that an examination of published Security 114 Considerations sections of existing documents could give both 115 quantitative and qualitative insight into how to proceed with a newer 116 version of the Security Considerations guidelines. 
The motivation is 117 to inform any discussion of a revision with quantitative and 118 qualitative data gleaned from years of published RFCs. 120 This document proposes a methodology for such research. 122 The scope of this proposal is the research itself. Discussion 123 of relevant issues, document organization and revised content for a 124 revision of RFC3552 is out of scope. Instead, the motivation is to 125 guide a piece of research that would later form part of the 126 foundation for a discussion of a revision to RFC3552. 128 2. Conventions used in this document 130 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 131 "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 132 document are to be interpreted as described in RFC 2119 [RFC2119]. 134 In this document, these words will appear with that interpretation 135 only when in ALL CAPS. Lower case uses of these words are not to be 136 interpreted as carrying the significance described in RFC 2119. 138 3. Motivation 140 Since 1998, all RFCs have been required to have a Security 141 Considerations section. The authors of RFC3552 observed that 142 "historically, such sections have been relatively weak." The 143 motivation for RFC3552 was, in part, to improve the quality of 144 Security Considerations sections. 146 Today the Internet threat model, the landscape of attacks, and our 147 understanding of how to craft protocols that are more robust and 148 resilient have changed significantly. Experience in both protocol 149 design and implementation has greatly improved our understanding of 150 the security implications of choices made during protocol design. 152 It is possible that a revision of RFC3552, reflecting the changes to 153 the Internet and our understanding of the evolved security landscape 154 and threat model, is appropriate. The IAB is currently examining and 155 reassessing the Internet's threat model [1]. 
157 The IAB has previously discussed a potential revision to RFC3552 in 158 its report from the Strengthening the Internet (STRINT) Workshop. In 159 section 2 of [RFC7687], the editors report that "...the IETF may be 160 in a position to start to develop an update to BCP 72 [RFC3552], 161 most likely as a new RFC enhancing that BCP and dealing with 162 recommendations on how to mitigate PM and how to reflect that in 163 IETF work." 165 If a revision were to be contemplated, it would be useful to learn 166 from the body of experience of crafting Security Considerations 167 sections in recent years. That body of experience could inform the 168 discussion of what makes up a good Security Considerations section 169 by collecting real-world data from existing RFCs. One way to do this 170 is to survey the existing Security Considerations 171 sections in published RFCs. The data collected from that survey 172 could provide one source of information for discussion of how to 173 improve upon RFC3552 in the current environment. 175 For such a survey to be successful, an outline of some basic goals 176 and a methodology would be required. This document provides those 177 goals and methodology. The intent is that individuals or 178 organizations could then carry out such a survey, publish the 179 results and use that data to inform any discussion of a potential 180 3552bis. 182 This draft also documents the results of a recent experiment to 183 conduct an automated survey of words in Security Considerations 184 sections. 186 3.1. Non-goals and scoping 188 This document specifically does not make suggestions for changes to 189 RFC3552. It also does not identify changes to the Internet threat 190 model or the general security landscape that have occurred since that 191 RFC was published. 
193 The scope of this document is to provide a basic set of goals for 194 research on existing Security Considerations sections and establish 195 a methodology for conducting that research. 197 3.2. Research Group 199 The research work suggested in this document was envisioned and 200 intended to be carried out as a research activity of the proposed 201 Stopping Malware and Researching Threats (SMART) research group in 202 the IRTF. The work could also be conducted independently and 203 submitted as an Independent Submission in the IETF. 205 4. Goals for Surveying Existing Security Considerations Sections 207 A cursory examination of recent years' Security Considerations 208 sections shows that authors publish a wide variety of these 209 sections. This is natural since the RFC series has a diverse set of 210 purposes and readership. 212 However, even a cursory examination shows that published Security 213 Considerations sections have some clear characteristics. Identifying 214 useful characteristics and then surveying the existing base of 215 published RFCs may provide a useful base of information for a later 216 discussion of revising RFC3552. 218 The goal of surveying existing Security Considerations sections is 219 to provide quantitative and qualitative data, from existing, 220 published RFCs, that can be used to inform a discussion of revising 221 RFC3552. 223 5. Methodology 225 5.1. Methodology Overview 227 The survey of existing Security Considerations sections would 228 examine a subset of RFCs published since the publication of RFC3552. 229 RFCs obsoleted by later publications, RFCs that are reports from IAB 230 activities, and IETF, IRTF, and IESG administrative RFCs are omitted 231 from consideration. 233 The survey should select a specific timeframe and examine all 234 RFCs published in that period.
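The selection rules in Section 5.1 can be illustrated with a minimal Python sketch. It assumes a catalogue of records that already carry a publication date and three exclusion flags. The RfcRecord type, its field names, the select_n_set function and the sample entries are all hypothetical and are not part of this document or of any RFC index format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RfcRecord:
    """Hypothetical catalogue entry; field names are illustrative only."""
    doc_id: str
    published: date
    obsoleted: bool = False       # obsoleted by a later publication
    iab_report: bool = False      # report from an IAB activity
    administrative: bool = False  # IETF, IRTF or IESG administrative RFC


def select_n_set(records, start, end):
    """Return the records published in [start, end] that are not excluded."""
    return [
        r for r in records
        if start <= r.published <= end
        and not (r.obsoleted or r.iab_report or r.administrative)
    ]


# Made-up entries for illustration, not real RFCs.
catalogue = [
    RfcRecord("doc-a", date(2016, 3, 1)),
    RfcRecord("doc-b", date(2017, 6, 15), obsoleted=True),
    RfcRecord("doc-c", date(2018, 9, 30), iab_report=True),
    RfcRecord("doc-d", date(2012, 1, 10)),
    RfcRecord("doc-e", date(2019, 12, 31)),
]

n_set = select_n_set(catalogue, date(2015, 1, 1), date(2019, 12, 31))
print(len(n_set), [r.doc_id for r in n_set])
```

In practice the three flags would have to be derived from RFC metadata. How to do that is left open here.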
236 The examination proceeds in two parts: a quantitative examination of 237 the Security Considerations sections and then a qualitative 238 examination. 240 As an example, the quantitative examination might survey and collect 241 data on the source of the RFC (e.g. Security Area, Routing Area, 242 Transport Area), whether the RFC extends the Security Considerations 243 section of a previously published document, the word count of the 244 section, and the existence of specific keywords. 246 The qualitative analysis might group Security Considerations 247 sections by particular characteristics - those characteristics being 248 discovered, in part, during an initial examination of the published 249 documents. 251 5.2. Quantitative Methodology 253 Once the set of RFCs (where the size of the set is said to be n-set) 254 to be considered is established, the quantitative analysis proceeds 255 as follows for each item in the set: 257 o recording the date of publication 259 o recording the source of the original draft 261 o recording the category of the RFC (e.g. Informational, etc.) 263 o recording the size of the Security Considerations section in 264 words and paragraphs 266 o recording whether or not the section updates or extends the 267 Security Considerations section of a previously published 268 document 270 o recording whether or not examples exist in the Security 271 Considerations section 273 o recording whether or not example code appears in the Security 274 Considerations section 276 o extracting the text and creating a new text with the 100 most 277 common English words removed 279 o performing text analytics against the new text created in the step 280 above - for instance, counting the number of 281 occurrences of expected keywords 283 The result would be a series of metrics for n-set that establish 284 certain characteristics of the Security Considerations sections of 285 published RFCs. 
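Some of the per-document steps above (size in words and paragraphs, removal of common words, keyword counts) can be sketched in Python as follows. This is only an illustration under stated assumptions: COMMON_WORDS is a ten-word stand-in for the 100 most common English words, the tokenization rule is a guess, and the function name section_metrics and the sample text are hypothetical.

```python
import re
from collections import Counter

# Stand-in only: a real run would load a list of the 100 most common
# English words. These ten are an assumption made for illustration.
COMMON_WORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "it", "be"}


def section_metrics(text, keywords):
    """Compute size and keyword metrics for one Security Considerations text."""
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Assumed tokenization: lowercase words that start with a letter.
    words = re.findall(r"[a-z][a-z0-9-]*", text.lower())
    # The "new text" with common words removed, used for keyword counting.
    filtered = [w for w in words if w not in COMMON_WORDS]
    counts = Counter(filtered)
    return {
        "words": len(words),
        "paragraphs": len(paragraphs),
        "keyword_counts": {k: counts[k] for k in keywords},
    }


sample = (
    "The protocol is open to attack.\n\n"
    "An attacker can replay a message; the attack is cheap."
)
metrics = section_metrics(sample, ["attack", "message", "key"])
print(metrics)
```

The remaining items in the list (publication date, source, category, and whether the section extends an earlier one) are metadata lookups or human judgments and are not covered by this sketch.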
Once the quantitative data is gathered, further 286 analysis of the data could be conducted (for instance, finding 287 relationships between certain features of the RFCs). 289 5.3. Qualitative Methodology 291 The documents could also be assigned qualitative characteristics as 292 a result of the survey. For instance, based on characteristics of 293 the document, the Security Considerations could be characterized as 294 "extensive" or "limited." 296 It is also clear that analysis of the Security Considerations could 297 lead to other groupings. For instance, an analysis of recent RFCs 298 shows that those documents which focus on cipher suites have quite 299 different security considerations sections compared to those that 300 extend an existing protocol. Identification of those 301 characteristics might be possible during an initial survey. In 302 other cases, those characteristics might emerge during the survey 303 execution. 305 5.4. Implications of the Size of n-set 307 Since part of the execution of the survey has to be done via human 308 intervention, the size of n-set has an effect on whether or not 309 volunteers or organizations take on the effort. While it would be 310 helpful to have as large a sample size as possible for the 311 collection of data to support the analysis, it may be necessary to 312 limit the size of n-set in practice. 314 One way to do this is to limit the range of dates for the RFCs being 315 analyzed. A cursory, initial examination of Security Considerations 316 sections seems to indicate that, in recent years, a clear set of 317 prototypical security considerations sections has emerged and that 318 there are distinct types of sections. By limiting the 319 set of considered documents to a specific, recent timeframe, the goal 320 is to focus the analysis on recent practice in crafting Security 321 Considerations sections and moving them through the document 322 approval process. 
324 Another approach to solving the potential problem of the size of 325 n-set is to incorporate a sampling regime for the selection of RFCs to 326 be examined. This would be a meaningful approach if 327 the timeframe were extended but it was still desirable to 328 reduce the size of n-set. 330 This proposal suggests using the timeframe limitation without 331 incorporating sampling. 333 5.5. Potential Additional Metrics 335 It is also possible to consider other metrics to be examined. The 336 idea would be to allow for answers to open questions that have not 337 been resolved. As an example: 339 . "How long do you go before you mention X?" 341 o split the data by year: how many words into each RFC's 342 Security Considerations section do you go before you find the word [X] or a 343 variant/related word from set {X}? 345 o (or for how many RFCs is that word absent?) 347 o then take the average over each year 349 o plot a trend to see if, for example, authors are much 350 quicker to jump to communications security words in recent 351 years [perhaps a seed list taken from RFC3552?], or getting 352 slower to mention "systems security" words. 354 . Analysis per working group / area? 355 6. Experimental Activity 357 One of the authors has conducted an experiment that is consistent 358 with many of the features of the methodology in Section 5 above. 359 This experiment uses a pair of Python scripts to extract the 360 Security Considerations sections from historic RFCs and then parse 361 those sections to get word frequency information from those Security 362 Considerations. 364 The initial experiment was motivated by a desire to see if one could 365 detect changes in Security Considerations section wording after 366 significant security incidents in the public Internet. In 367 particular, the experiment was designed to detect changes in the 368 frequency of words over time. 370 6.1. 
Experiment Methodology 372 The RFC series was grouped into input files based on the year of 373 publication of the RFC. 375 HTML versions of the RFC series documents were used as input and 376 put through an open source parser. The parser then identified the 377 words "Security Consideration" or "Security" in header text. It then 378 output that text to a temporary file in UTF-8 encoding until the 379 parser encountered the next section. 381 The parser removed non-textual material from the temporary files 382 including hyphens, RFC references, anchor URLs, other section 383 references, standalone letters and other characters that were not 384 words. 386 It then built a frequency list for all words not in a designated 387 list of words not to be counted. This list is configurable: words 388 can be added to, or removed from, the designated list. 390 6.2. Stopword List 392 The following list of words was used as the designated list of 393 words not to be counted: 395 . Also 397 . Could 399 . Would 401 . However 403 . One 405 . See 407 . Use 409 . Therefore 411 . Discussed 413 . New 415 . March 416 . Type 418 . Even 420 . Following 422 . Without 424 . Bradner 426 . Using 428 . Described 430 . Might 432 . Thus 434 . Two 436 . Since 438 . Different 440 . Number 442 . Via 444 . Mechanism 446 . Used 448 . Tl 450 . Header 452 . Field 454 . Name 456 . Sent 458 6.3. Resulting Characterization 460 The result of this experiment is a pair of files for each year 461 starting in 2003. The two files for each year are: 463 . A word frequency file sorted by the number of times a 464 particular word appears in the Security Considerations section 465 of RFCs published in that year; and, 467 . An RFC Count file that counts how many times each RFC was 468 mentioned within the Security Considerations sections. 
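The two per-year outputs described above can be approximated with the short Python sketch below. It is not the authors' scripts, which are not reproduced in this document. It assumes the Security Considerations text has already been extracted from the HTML, it uses the stopword list of Section 6.2 in lower case, and the RFC_REF pattern, the tokenization rule, the function name characterize and the sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

# Stopword list from Section 6.2, lower-cased.
STOPWORDS = {
    "also", "could", "would", "however", "one", "see", "use", "therefore",
    "discussed", "new", "march", "type", "even", "following", "without",
    "bradner", "using", "described", "might", "thus", "two", "since",
    "different", "number", "via", "mechanism", "used", "tl", "header",
    "field", "name", "sent",
}

# Assumed citation forms: "RFC4301", "RFC 4301" and "[RFC4301]".
RFC_REF = re.compile(r"\[?RFC\s?(\d{1,5})\]?", re.IGNORECASE)


def characterize(sections):
    """Return (word frequency, RFC mention count) for one year's sections."""
    word_freq = Counter()
    rfc_count = Counter()
    for text in sections:
        # Count RFC mentions first, then strip them from the text.
        rfc_count.update(int(n) for n in RFC_REF.findall(text))
        cleaned = RFC_REF.sub(" ", text).lower()
        # Keep alphabetic runs of two or more letters, which drops
        # standalone letters, digits and punctuation (an assumption).
        words = re.findall(r"[a-z]{2,}", cleaned)
        word_freq.update(w for w in words if w not in STOPWORDS)
    return word_freq, rfc_count


# Made-up sentences standing in for extracted section text.
sections = [
    "See [RFC4301] for IPsec. An attacker could replay a message.",
    "This extends RFC 4301; the attacker may replay traffic.",
]
word_freq, rfc_count = characterize(sections)
print(word_freq.most_common(3))
print(rfc_count.most_common(1))
```

Sorting word_freq with most_common() corresponds to the sorted word frequency file, and rfc_count corresponds to the RFC Count file.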
470 The idea behind the second file was to see if there was a trend or 471 change in the RFCs cited and what this might suggest about 472 the content of these sections. For example, in 2004 the 473 most-referenced RFC was [RFC3410], Applicability Statements for 474 SNMP; in 2009 it was [RFC4301], Security Architecture for IP, though 475 [RFC3410] was also referenced a high number of times. 477 As [RFC4301] came out in 2005, we would not expect it to be 478 referenced in 2004, but the reference count in 2009 could indicate 479 that there were a number of RFCs which likely simply referred to the 480 Security Considerations Section of this RFC in a line similar to 481 "this extends the security consideration of ." This 482 could then be used to help narrow down qualitative focus on 483 these highly referenced RFCs and also to see if, in some cases, lip service 484 is all that is occurring within other Security Considerations 485 Sections. 487 Another result, included with the word frequency file, is a list of 488 words similar to the word "security" based on context analysis. This 489 is another indicator that can be used to look at how the language of 490 the RFC series is changing. For example, looking at 2004, the most 491 similar words are: 493 . Used, ipsec, mode, authentication, implementation, message, 494 may, watcher, method and block. 496 In 2009: 498 . Message, attacker, syslog, used, attack, information, 499 transport, gruu, may and case. 501 Yet another result was a file that provides comparative data for 502 word counts in the Security Considerations and Privacy 503 Considerations sections of published RFCs. The result provides a 504 look at whether the length of those sections might have changed over 505 time. 507 A final result was a frequency count over the entire period examined 508 for Internet Standards, BCPs, and Proposed Standards. 
This result 509 gives an indication of whether or not the average length of these 510 sections has changed - either over time, or in response to specific 511 security incidents on the public Internet. 513 6.4. Indicative Results 515 This draft is focused on proposing a methodology and not on the 516 experiment being reported on here. However, there are some 517 indicative results that may be of use as a future methodology is 518 considered. It is worth observing that, with respect to the original motivation for 519 the experiment - to see if Security Considerations sections changed 520 in the face of security-related events on the public Internet - 521 the results showed that no significant re-wording took place over the timeframe 522 studied. 524 6.4.1. Top Ten Word Counts in Four Sample Years 526 Choosing four sample years (2019, 2014, 2009 and 2004) as examples, 527 the experiment found the following most frequent words in Security 528 Considerations sections (each list runs from most frequent to tenth 529 most frequent). 531 . 2019 - security, server, data, message, may, network, attack, 532 information, client, xmpp-grid 534 . 2014 - security, information, attack, message, may, used, 535 server, data, authentication, network 537 . 2009 - security, may, message, address, attack, used, packet, 538 protocol, network, information 540 . 2004 - security, may, key, authentication, object, used, 541 information, message, attack, access 543 6.4.2. Top Ten Word Counts Without RFC2119 Words in Four Sample Years 545 Taking the same data and removing the normative words that are 546 defined in RFC2119 leads to slightly different results. 548 . 2019 - security, server, data, message, network, attack, 549 information, client, xmpp-grid, document 551 . 2014 - security, information, message, used, server, data, 552 authentication, network, attacker 554 . 2009 - security, message, address, attack, used, packet, 555 protocol, network, information, object 557 . 
2004 - security, key, authentication, object, used, 558 information, message, attack, access, user 560 6.4.3. Normative RFC2119 Words in Security Considerations 562 The word MAY always appears more often than any other RFC2119 word 563 in Security Considerations sections. The word MUST most often 564 appears after MAY and is often in the top 15 words sorted by 565 frequency. 567 However, the word SHOULD hardly ever appears in the top 100 most 568 frequent words for any year of published RFCs. 570 Most Frequent Words in Proposed Standards Security Considerations 572 Over the entire period 2003-2019, the most frequent non-normative 573 words in Security Considerations sections were: 575 . Security, message, attack, server, information, key, 576 authentication, network, protocol, client 578 A list of the 75 most common non-normative words is provided in 579 Appendix B. 581 7. Security Considerations 583 This document describes goals and a methodology for surveying the 584 existing body of Security Considerations in published RFCs. It does 585 not create, extend or modify any protocols. Its intent is to provide 586 a foundation for a data-driven discussion of the guidelines for 587 writing a Security Considerations section in an RFC. 589 8. IANA Considerations 591 Upon publication, this document has no required actions for IANA. 593 9. References 595 9.1. Normative References 597 [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate 598 Requirement Levels", BCP 14, RFC 2119, March 1997. 600 [RFC2223] Postel J. and Reynolds J., ISI, "Instructions to RFC 601 Authors", RFC 2223, October 1997. 603 [RFC3552] Rescorla E. and Korver B.(Editors), "Guidelines for 604 Writing RFC Text on Security Considerations", BCP 72, 605 RFC 3552, July 2003. 607 [RFC7687] Farrell S., Wenning R., Bos B., Blanchet M. and Tschofenig 608 H., "Report from the Strengthening the Internet (STRINT) 609 Workshop", RFC 7687, December 2015. 611 9.2. 
Informative References 613 [1] Model-t -- Discussions of changes in Internet deployment 614 patterns and their impact on the Internet threat model, 615 https://www.ietf.org/mailman/listinfo/model-t 617 [2] Acknowledgments 619 This document was prepared using 2-Word-v2.0.template.dot. 621 Appendix A. Document History 623 [[ To be removed from the final document ]] 625 -00 627 Initial Internet Draft 629 -01 631 Section 6 and Appendix B are added. Significant editing of Section 3 632 on Motivation and Section 5 on Methodology. Several typos fixed. 634 Appendix B. 75 Most Common Words in Security Considerations Sections 636 Over the entire period 2003-2019, the 75 most frequent words in 637 Security Considerations sections were (in order by frequency): 639 security, message, attack, data, used, may, authentication, key, 640 access, protocol, information, must, address, transport, process, 641 model, client, server, network, ipfix, tl, user, traffic, packet, 642 object, operation, control, service, ipp, example, document, 643 implementation, measurement, collecting, secure, header, attacker, 644 identity, value, job, need, support, snmp, provide, printer, uri, 645 certificate, authenticated, possible, name, content, source, 646 connection, field, set, system, dtls, cause, sensitive, domain, 647 provides, configuration, router, privacy, protection, peer, nacm, 648 layer, ip, device, exporting, within, request, large, and signature. 650 Authors' Addresses 652 Mark McFadden 653 Internet policy advisors ltd 654 Madison Wisconsin US 656 Email: mark@internetpolicyadvisors.com 658 Alan Mills 659 University of the West of England, Bristol 660 Bristol BS16 1QY United Kingdom 662 Email: Alan2.Mills@live.uwe.ac.uk