Independent Submission                                      M. McFadden
Internet-Draft                             internet policy advisors ltd
Intended status: Informational                                 A. Mills
Expires: September 4, 2020                                UWE - Bristol
                                                          March 4, 2020

       Methodology for Researching Security Considerations Sections
            draft-mcfadden-rfc3552-research-methodology-00.txt

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html

   This Internet-Draft will expire on September 4, 2020.

Copyright Notice

   Copyright (c) 2020 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Simplified BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Simplified BSD License.

Abstract

   RFC3552 provides guidance to authors in crafting RFC text on
   Security Considerations. The RFC is more than fifteen years old.
   With the threat landscape and security ecosystem significantly
   changed since the RFC was published, RFC3552 is a candidate for
   update. This draft proposes that, prior to drafting an update to
   RFC3552, an examination of recent, published Security Considerations
   sections be carried out as a baseline for how to improve RFC3552. It
   suggests a methodology for examining Security Considerations
   sections in published RFCs and for extracting both quantitative
   and qualitative information that could inform a revision of the
   older guidance. It also reports on a recent experiment in textual
   analysis of sixteen years of RFC Security Considerations sections.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. Motivation
      3.1. Non-goals and scoping
      3.2. Research Group
   4. Goals for Surveying Existing Security Considerations Sections
   5. Methodology
      5.1. Methodology Overview
      5.2. Quantitative Methodology
      5.3. Qualitative Methodology
      5.4. Implications of the Size of n-set
      5.5. Potential Additional Metrics
   6. Experimental Activity
      6.1. Experiment Methodology
      6.2. Stopword List
      6.3. Resulting Characterization
      6.4. Indicative Results
         6.4.1. Top Ten Word Counts in Four Sample Years
         6.4.2. Top Ten Word Counts Without RFC2119 Words in Four
                Sample Years
         6.4.3. Normative RFC2119 Words in Security Considerations
   7. Security Considerations
   8. IANA Considerations
   9. References
      9.1. Normative References
      9.2. Informative References
   Appendix A. Document History
   Appendix B. 75 Most Common Words in Security Considerations Sections

1. Introduction

   [RFC2223] requires that all RFCs have a Security Considerations
   section.  The purpose of the section is both to encourage RFC
   authors to consider security in protocol design and to inform
   readers of relevant security issues.  RFC3552 [RFC3552] was
   published in July 2003 to give guidance to RFC authors on how to
   write a good Security Considerations section.  It is structured in
   three parts: a tutorial and definitional section, a series of
   guidelines, and a series of examples.

   The Internet security landscape has changed significantly since the
   publication of RFC3552. Rather than immediately attempting to draft
   and discuss a revision of the older RFC, it may be prudent to learn
   from the experience of more than fifteen years of documents
   published since RFC3552 was approved for publication.

   An examination of the Security Considerations sections of existing
   documents could give both quantitative and qualitative insight into
   how to proceed with a newer version of the Security Considerations
   guidelines. The motivation is to inform any discussion of a revision
   with quantitative and qualitative data gleaned from years of
   published RFCs.

   This document proposes a methodology for such research.

   The scope of this proposal is the research itself. Discussion of
   relevant issues, document organization, and revised content for a
   revision of RFC3552 is out of scope. Instead, the motivation is to
   guide a piece of research that would later form part of the
   foundation for a discussion of a revision to RFC3552.

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   In this document, these words will appear with that interpretation
   only when in ALL CAPS. Lower case uses of these words are not to be
   interpreted as carrying significance described in RFC 2119.

3. Motivation

   Since 1998, all RFCs have been required to have a Security
   Considerations section. The authors of RFC3552 observed that
   "historically, such sections have been relatively weak."  The
   motivation for RFC3552 was, in part, to improve the quality of
   Security Considerations sections.

   Today the Internet threat model, the landscape of attacks, and our
   understanding of how to craft protocols that are more robust and
   resilient have all changed significantly. Experience in both
   protocol design and implementation has greatly improved our
   understanding of the security implications of choices made during
   protocol design.

   A revision of RFC3552, reflecting the changes to the Internet and
   our understanding of the evolved security landscape and threat
   model, may be appropriate. The IAB is currently examining and
   reassessing the Internet's threat model [1].

   The IAB has previously discussed a potential revision to RFC3552 in
   its report from the Strengthening the Internet (STRINT) Workshop. In
   Section 2 of [RFC7687], the editors report that "...the IETF may be
   in a position to start to develop an update to BCP 72 [RFC3552],
   most likely as a new RFC enhancing that BCP and dealing with
   recommendations on how to mitigate PM and how to reflect that in
   IETF work." (PM here refers to pervasive monitoring.)

   If a revision were to be contemplated, it would be useful to learn
   from the body of experience of crafting Security Considerations
   sections in recent years. Real-world data collected from existing
   RFCs could inform the discussion of what makes up a good Security
   Considerations section.  One way to collect that data is a survey
   of the existing Security Considerations sections in published RFCs.
   The data collected from that survey could provide one source of
   information for discussion of how to improve upon RFC3552 in the
   current environment.

   For such a survey to be successful, some basic goals and a
   methodology would be required. This document provides those goals
   and that methodology. The intent is that individuals or
   organizations could then carry out such a survey, publish the
   results, and use that data to inform any discussion of a potential
   3552bis.

   This draft also documents the results of a recent experiment that
   conducted an automated survey of words in Security Considerations
   sections.

3.1. Non-goals and scoping

   This document specifically does not make suggestions for changes to
   RFC3552. Nor does it identify the changes to the Internet threat
   model or the general security landscape that have occurred since
   that RFC was published.

   The scope of this document is to provide a basic set of goals for
   research on existing Security Considerations sections and to
   establish a methodology for conducting that research.

3.2. Research Group

   The research work suggested in this document was envisioned and
   intended to be carried out as a research activity of the proposed
   Stopping Malware and Researching Threats (SMART) research group in
   the IRTF. The work could also be conducted independently and
   submitted as an Independent Submission.

4. Goals for Surveying Existing Security Considerations Sections

   A cursory examination of recent years' Security Considerations
   sections shows wide variation in how authors write these sections.
   This is natural, since the RFC series has a diverse set of purposes
   and readers.

   However, even a cursory examination shows that published Security
   Considerations sections share some clear characteristics.
   Identifying useful characteristics and then surveying the existing
   base of published RFCs may provide a useful base of information for
   a later discussion of revising RFC3552.

   The goal of surveying existing Security Considerations sections is
   to provide quantitative and qualitative data, from existing,
   published RFCs, that can be used to inform a discussion of revising
   RFC3552.

5. Methodology

5.1. Methodology Overview

   The survey of existing Security Considerations sections would
   examine a subset of RFCs published since the publication of RFC3552.
   RFCs obsoleted by later publications, RFCs that are reports from IAB
   activities, and administrative RFCs from the IETF, IRTF, and IESG
   are omitted from consideration.

   The survey should select a specific timeframe and examine every RFC
   published in that period that is not excluded above.
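
   The selection rules above could be expressed as a simple filter,
   as in the following Python sketch. It assumes that each candidate
   RFC has already been described by a record giving its publication
   year, the number of any obsoleting RFC, and flags for IAB reports
   and administrative documents. RfcRecord, its fields, and
   select_rfcs are hypothetical names, not part of any existing RFC
   index tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RfcRecord:
    """Hypothetical per-RFC record; field names are illustrative."""
    number: int
    year: int
    obsoleted_by: Optional[int] = None  # obsoleting RFC, if any
    iab_report: bool = False      # report from an IAB activity
    administrative: bool = False  # IETF/IRTF/IESG administrative RFC


def select_rfcs(records: List[RfcRecord], first_year: int,
                last_year: int) -> List[RfcRecord]:
    """Keep RFCs published in [first_year, last_year] that are not
    obsoleted, not IAB reports, and not administrative."""
    return [
        r for r in records
        if first_year <= r.year <= last_year
        and r.obsoleted_by is None
        and not r.iab_report
        and not r.administrative
    ]
```

   Populating those flags, in particular deciding which RFCs count as
   administrative, remains a judgment that the survey would need to
   make and document.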

   The examination proceeds in two parts: a quantitative examination of
   the Security Considerations sections, followed by a qualitative
   examination.

   As an example, the quantitative examination might survey and collect
   data on the source of the RFC (e.g., Security Area, Routing Area,
   Transport Area), whether the RFC extends the Security Considerations
   section of a previously published document, the word count of the
   section, and the presence of specific keywords.

   The qualitative analysis might group Security Considerations
   sections by particular characteristics, some of which would be
   discovered during an initial examination of the published
   documents.

5.2. Quantitative Methodology

   Once the set of RFCs to be considered is established (the size of
   this set is referred to as n-set), the quantitative analysis
   proceeds as follows for each item in the set:

   o  record the date of publication

   o  record the source of the original draft

   o  record the category of the RFC (e.g., Informational, Proposed
      Standard, BCP)

   o  record the size of the Security Considerations section in words
      and paragraphs

   o  record whether or not the section updates or extends the
      Security Considerations section of a previously published
      document

   o  record whether or not examples exist in the Security
      Considerations section

   o  record whether or not example code appears in the Security
      Considerations section

   o  extract the text of the section and create a new text with the
      100 most common English words removed

   o  perform text analytics against the new text created in the step
      above, for instance, counting the occurrences of expected
      keywords
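
   As an illustration, the mechanical steps in the list above (section
   size in words and paragraphs, removal of common words, and keyword
   counts) could be computed along the lines of the following Python
   sketch. It assumes that the section text has already been
   extracted, that paragraphs are separated by blank lines, and that
   COMMON_ENGLISH stands in for a real list of the 100 most common
   English words (only a few entries are shown). The function name
   section_metrics is hypothetical.

```python
import re
from collections import Counter

# Stand-in for a published list of the 100 most common English
# words; only a few entries are shown (an assumption).
COMMON_ENGLISH = {"the", "of", "and", "a", "to", "in", "is", "be",
                  "that", "this", "it", "for", "by", "on", "with"}

WORD = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def section_metrics(text, keywords):
    """Compute some of the Section 5.2 metrics for one section."""
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    words = WORD.findall(text.lower())
    # The "new text": the same words minus the common English words.
    reduced = [w for w in words if w not in COMMON_ENGLISH]
    counts = Counter(reduced)
    return {
        "words": len(words),
        "paragraphs": len(paragraphs),
        "keyword_counts": {k: counts[k] for k in keywords},
    }
```

   Metrics such as the source of the draft or whether examples appear
   would still need to be recorded by hand or taken from document
   metadata.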

   The result would be a series of metrics for n-set that establish
   certain characteristics of the Security Considerations sections of
   published RFCs. Once the quantitative data has been gathered,
   further analysis of the data could be conducted (for instance,
   finding relationships between certain features of the RFCs).

5.3. Qualitative Methodology

   The documents could also be assigned qualitative characteristics as
   a result of the survey. For instance, based on characteristics of
   the document, the Security Considerations section could be
   characterized as "extensive" or "limited."

   Analysis of the Security Considerations sections could also lead to
   other groupings.  For instance, an analysis of recent RFCs shows
   that documents which focus on cipher suites have quite different
   Security Considerations sections from those that extend an existing
   protocol.  Some of these characteristics might be identified during
   an initial survey; others might emerge only during the execution of
   the survey.

5.4. Implications of the Size of n-set

   Since part of the survey has to be carried out by hand, the size of
   n-set affects whether or not volunteers or organizations take on
   the effort. While it would be helpful to have as large a sample as
   possible to support the analysis, it may be necessary to limit the
   size of n-set in practice.

   One way to do this is to limit the range of dates for the RFCs being
   analyzed. A cursory, initial examination of Security Considerations
   sections seems to indicate that, in recent years, a clear set of
   prototypical Security Considerations sections has emerged and that
   there are distinct types of sections. Limiting the set of considered
   documents to a specific, recent timeframe focuses the analysis on
   recent practice in crafting Security Considerations sections and
   moving them through the document approval process.

   Another approach to limiting the size of n-set is to use a sampling
   regime for selecting the RFCs to be examined. This approach would
   be useful if the timeframe were extended but it was still desirable
   to reduce the size of n-set.

   This proposal suggests using the timeframe limitation without
   incorporating sampling.
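
   Although this proposal does not adopt sampling, a sampling regime
   of the kind described above could be as simple as drawing a fixed
   number of RFCs from each publication year. The Python sketch below
   assumes the input is a list of (RFC number, year) pairs; the name
   sample_by_year, the per-year quota, and the fixed random seed (so
   that others can reproduce the sample) are illustrative choices.

```python
import random
from collections import defaultdict


def sample_by_year(rfcs, per_year, seed=0):
    """Draw up to `per_year` RFCs from each publication year.

    `rfcs` is an iterable of (rfc_number, year) pairs. A fixed seed
    makes the sample reproducible.
    """
    by_year = defaultdict(list)
    for number, year in rfcs:
        by_year[year].append(number)
    rng = random.Random(seed)
    sample = []
    for year in sorted(by_year):
        numbers = sorted(by_year[year])
        if len(numbers) > per_year:
            numbers = sorted(rng.sample(numbers, per_year))
        sample.extend((n, year) for n in numbers)
    return sample
```

   Sampling within each year, rather than across the whole period,
   keeps every year represented even when publication volumes differ.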

5.5. Potential Additional Metrics

   Other metrics could also be examined. The idea would be to allow
   for answers to open questions that have not yet been resolved.  As
   an example:

   . "How far into a Security Considerations section does an author
     go before mentioning X?"

      o  split the data by year; for each RFC, count how many words
         into its Security Considerations section one must read
         before finding the word X or a variant or related word from
         the set {X}

      o  (or count the RFCs in which that word is absent)

      o  then take the average over each year

      o  plot a trend to see whether (for example) authors are much
         quicker to reach communications security words in recent
         years (perhaps using a seed list taken from RFC3552), or
         slower to mention "systems security" words.

   . Analysis per working group or area.
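
   The first of these metrics could be computed as sketched below in
   Python. The sketch assumes each section is supplied as a (year,
   text) pair and that targets holds X together with its chosen
   variants and related words; how that set is chosen (for example,
   from RFC3552) is left open. The function names are hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean

WORD = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def first_mention_offset(text, targets):
    """Number of words before the first word in `targets`, or None
    if no target word occurs in the section."""
    for i, word in enumerate(WORD.findall(text.lower())):
        if word in targets:
            return i
    return None


def yearly_summary(sections, targets):
    """sections: iterable of (year, section_text) pairs.

    Returns {year: (mean offset over RFCs that mention a target,
    number of RFCs in which no target word appears)}."""
    offsets = defaultdict(list)
    absent = defaultdict(int)
    for year, text in sections:
        offset = first_mention_offset(text, targets)
        if offset is None:
            absent[year] += 1
        else:
            offsets[year].append(offset)
    years = sorted(set(offsets) | set(absent))
    return {y: (mean(offsets[y]) if offsets[y] else None, absent[y])
            for y in years}
```

   Counting the RFCs with no match separately keeps those sections
   from distorting the per-year average.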

6. Experimental Activity

   One of the authors has conducted an experiment that is consistent
   with many of the features of the methodology in Section 5 above.
   This experiment uses a pair of Python scripts to extract the
   Security Considerations sections from historic RFCs and then parse
   those sections to obtain word frequency information.

   The initial experiment was motivated by a desire to see whether one
   could detect changes in the wording of Security Considerations
   sections after significant security incidents in the public
   Internet.  In particular, the experiment was designed to detect
   changes in the frequency of words over time.

6.1. Experiment Methodology

   The RFC series was grouped into input files based on the year of
   publication of each RFC.

   HTML versions of the RFC series documents were used as input to an
   open source parser.  The parser identified header text containing
   the words "Security Consideration" or "Security", and then wrote the
   text that followed to a temporary file in UTF-8 encoding until it
   encountered the next section.

   The parser removed non-textual material from the temporary files,
   including hyphens, RFC references, anchor URLs, references to other
   sections, standalone letters, and other characters that were not
   words.

   It then built a frequency list for all words not in a designated
   list of words not to be counted.  This list is configurable and can
   be changed to include or exclude words.
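
   The extraction and cleaning steps could be approximated with
   Python's standard library, as in the following sketch. It is not
   the open source parser used in the experiment. It assumes that
   section headings are marked up as <h1> to <h6> elements (many HTML
   renderings of older RFCs are not structured this way), and its
   regular expressions are illustrative approximations of the cleaning
   rules above. Hyphens inside words are kept, so a term such as
   "xmpp-grid" survives as one token. SecConsExtractor and clean are
   hypothetical names.

```python
import re
from html.parser import HTMLParser


class SecConsExtractor(HTMLParser):
    """Collect the text after any heading that mentions "Security",
    up to the next heading. Assumes <h1>-<h6> heading markup."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.capturing = False
        self.heading_text = []
        self.sections = []  # one list of text chunks per section

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            # Any new heading ends the section being captured.
            self.in_heading = True
            self.capturing = False
            self.heading_text = []

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self.in_heading:
            self.in_heading = False
            if "security" in "".join(self.heading_text).lower():
                self.capturing = True
                self.sections.append([])

    def handle_data(self, data):
        if self.in_heading:
            self.heading_text.append(data)
        elif self.capturing:
            self.sections[-1].append(data)


def clean(text):
    """Approximate the cleaning rules of Section 6.1; return a list
    of lower-case word tokens."""
    # Remove URLs, RFC references, and "Section N.N" references.
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"\[?RFC ?\d+\]?", " ", text)
    text = re.sub(r"\b[Ss]ection \d+(?:\.\d+)*", " ", text)
    # Replace punctuation, keeping letters, digits, and hyphens.
    text = re.sub(r"[^A-Za-z0-9\s-]", " ", text)
    # Strip hyphens that are not inside a word.
    tokens = [t.strip("-") for t in text.split()]
    # Drop standalone letters and tokens with no letters at all.
    return [t.lower() for t in tokens
            if len(t) > 1 and re.search(r"[A-Za-z]", t)]
```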

6.2. Stopword List

   The following words were used as the designated list of words not
   to be counted:

     . Also

     . Could

     . Would

     . However

     . One

     . See

     . Use

     . Therefore

     . Discussed

     . New

     . March

     . Type

     . Even

     . Following

     . Without

     . Bradner

     . Using

     . Described

     . Might

     . Thus

     . Two

     . Since

     . Different

     . Number

     . Via

     . Mechanism

     . Used

     . Tl

     . Header

     . Field

     . Name

     . Sent
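
   Applied to cleaned tokens, the stopword list above yields a
   frequency list along the lines of the Python sketch below. Matching
   is case-insensitive, which is an assumption about the experiment. A
   general English stopword list would normally be applied as well; it
   could be merged into STOPWORDS but is not reproduced here.
   frequency_list is a hypothetical name.

```python
from collections import Counter

# The designated list from Section 6.2, lower-cased for matching.
STOPWORDS = {w.lower() for w in """
    Also Could Would However One See Use Therefore Discussed New
    March Type Even Following Without Bradner Using Described Might
    Thus Two Since Different Number Via Mechanism Used Tl Header
    Field Name Sent""".split()}


def frequency_list(tokens, stopwords=STOPWORDS):
    """Return (word, count) pairs, most frequent first, ignoring
    case and skipping any word in the designated stopword list."""
    counts = Counter(t.lower() for t in tokens
                     if t.lower() not in stopwords)
    return counts.most_common()
```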

6.3. Resulting Characterization

   The result of this experiment is a pair of files for each year,
   starting in 2003. The two files for each year are:

     . A word frequency file sorted by the number of times a
       particular word appears in the Security Considerations sections
       of RFCs published in that year; and,

     . An RFC Count file that counts how many times each RFC was
       mentioned within the Security Considerations sections.
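
   An RFC Count file of this kind could be produced with a regular
   expression over the raw section text, as in the Python sketch
   below. Because the cleaning step in Section 6.1 removes RFC
   references, the sketch assumes the count is taken before cleaning.
   The pattern, which accepts both "RFC4301" and "RFC 4301", and the
   name rfc_citation_counts are illustrative assumptions.

```python
import re
from collections import Counter

RFC_REF = re.compile(r"\bRFC ?(\d{1,5})\b")


def rfc_citation_counts(sections):
    """Count how often each RFC is mentioned across a year's
    Security Considerations sections (raw text, before cleaning)."""
    counts = Counter()
    for text in sections:
        counts.update(int(n) for n in RFC_REF.findall(text))
    return counts.most_common()
```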

   The idea behind the second file was to see whether there was a
   trend or change in the RFCs cited and what this might suggest about
   the content of these sections. For example, in 2004 the most
   referenced RFC was [RFC3410] (Applicability Statements for SNMP); in
   2009 it was [RFC4301] (Security Architecture for IP), though
   [RFC3410] was also referenced many times.

   As [RFC4301] was published in 2005, it would not be expected to be
   referenced in 2004. The reference count in 2009, however, could
   indicate that a number of RFCs simply referred to the Security
   Considerations section of this RFC with a line similar to "this
   extends the security considerations of <insert RFC here>." This
   could then be used to help narrow the qualitative focus on these
   highly referenced RFCs, and also to see whether, in some cases,
   other Security Considerations sections offer little more than lip
   service.

   Another result, included with the word frequency file, is a list of
   words similar to the word "security" based on context analysis. This
   is another indicator that can be used to look at how the language of
   the RFC series is changing. For example, in 2004 the most similar
   words were:

     . Used, ipsec, mode, authentication, implementation, message,
       may, watcher, method and block.

   In 2009:

     . Message, attacker, syslog, used, attack, information,
       transport, gruu, may and case.
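
   The context analysis used in the experiment is not specified here.
   As one simple stand-in, words can be compared by the cosine
   similarity of their co-occurrence counts within a small window, as
   in the Python sketch below; word-embedding tools are a common
   alternative and would give different rankings. The window size and
   the function names are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(token_lists, window=2):
    """Map each word to a Counter of the words that appear within
    `window` positions of it in the same section."""
    vectors = defaultdict(Counter)
    for tokens in token_lists:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * b[k] for k, count in a.items() if k in b)
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(target, vectors, n=10):
    """Words whose contexts most resemble those of `target`."""
    if target not in vectors:
        return []
    scored = [(cosine(vectors[target], vec), word)
              for word, vec in vectors.items() if word != target]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [word for _, word in scored[:n]]
```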

   Yet another result was a file that provides comparative data for
   word counts in the Security Considerations and Privacy
   Considerations sections of published RFCs. The result indicates
   whether the length of those sections has changed over time.

   A final result was a frequency count over the entire period examined
   for Internet Standards, BCPs, and Proposed Standards. This result
   indicates whether or not the average length of these sections has
   changed, either over time or in response to specific security
   incidents on the public Internet.
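
   Per-year averages of this kind could be derived from per-section
   word counts as in the following Python sketch, which assumes input
   tuples of (year, category, word count). The same grouping could
   compare Security Considerations and Privacy Considerations lengths
   by using the section type in place of the category. The function
   name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_length_by_year(sections, categories):
    """sections: iterable of (year, category, word_count) tuples.

    Returns {category: {year: mean word count}} for the requested
    categories, so that each category's trend can be compared."""
    grouped = defaultdict(lambda: defaultdict(list))
    for year, category, words in sections:
        if category in categories:
            grouped[category][year].append(words)
    return {cat: {year: mean(counts)
                  for year, counts in sorted(by_year.items())}
            for cat, by_year in grouped.items()}
```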

6.4. Indicative Results

   This draft is focused on proposing a methodology and not on the
   experiment reported here.  However, there are some indicative
   results that may be of use as a future methodology is considered.
   With respect to the original motivation for the experiment (to see
   whether Security Considerations sections changed in the face of
   security-related events on the public Internet), the results showed
   that no significant re-wording took place over the timeframe
   studied.

6.4.1. Top Ten Word Counts in Four Sample Years

   Choosing four sample years (2019, 2014, 2009, and 2004), the
   experiment found the following most frequent words in Security
   Considerations sections (each list runs from the most frequent to
   the tenth most frequent word).

     . 2019 - security, server, data, message, may, network, attack,
       information, client, xmpp-grid

     . 2014 - security, information, attack, message, may, used,
       server, data, authentication, network

     . 2009 - security, may, message, address, attack, used, packet,
       protocol, network, information

     . 2004 - security, may, key, authentication, object, used,
       information, message, attack, access

6.4.2. Top Ten Word Counts Without RFC2119 Words in Four Sample Years

   Removing the normative words defined in RFC2119 from the same data
   leads to slightly different results.

     . 2019 - security, server, data, message, network, attack,
       information, client, xmpp-grid, document

     . 2014 - security, information, message, used, server, data,
       authentication, network, attacker

     . 2009 - security, message, address, attack, used, packet,
       protocol, network, information, object

     . 2004 - security, key, authentication, object, used,
       information, message, attack, access, user
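
   The filtering step in Section 6.4.2 could be sketched in Python as
   below, assuming a year's (word, count) list already sorted by
   frequency. Assuming the counts are case-folded, as the lower-case
   lists above suggest, this also removes ordinary, non-normative uses
   of words such as "may"; a study interested only in normative uses
   would need to count before case folding. The word set and the
   function name are illustrative assumptions.

```python
# Single-word forms of the RFC 2119 key words, lower-cased. "NOT" is
# left out on purpose: removing every "not" would discard ordinary
# uses, so phrases such as "MUST NOT" are removed only through their
# other word.
RFC2119_WORDS = {"must", "required", "shall", "should",
                 "recommended", "may", "optional"}


def top_n_without_rfc2119(freq_list, n=10):
    """Drop RFC 2119 key words from a (word, count) list that is
    sorted most frequent first, and return the top n words left."""
    return [w for w, _ in freq_list if w not in RFC2119_WORDS][:n]
```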

6.4.3. Normative RFC2119 Words in Security Considerations

   The word MAY always appears more often than any other RFC2119 word
   in Security Considerations sections. MUST is usually the next most
   frequent RFC2119 word after MAY, and it often ranks among the 15
   most frequent words.

   By contrast, the word SHOULD hardly ever appears among the 100 most
   frequent words for any year of published RFCs.
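
   Observations such as these can be checked by looking up each key
   word's rank in a year's frequency list, as in the Python sketch
   below; for example, the SHOULD observation corresponds to a rank
   that is missing or greater than 100. The input format and the
   names are illustrative assumptions.

```python
# RFC 2119 key words (single-word forms), lower-cased.
NORMATIVE_KEYWORDS = ("may", "must", "should", "shall", "required",
                      "recommended", "optional")


def keyword_ranks(freq_list, keywords=NORMATIVE_KEYWORDS):
    """1-based rank of each keyword in a (word, count) list sorted
    most frequent first; None if the keyword does not appear."""
    rank = {w: i for i, (w, _) in enumerate(freq_list, start=1)}
    return {k: rank.get(k) for k in keywords}


def outside_top(freq_list, keyword, limit=100):
    """True if `keyword` is absent from the top `limit` words."""
    r = keyword_ranks(freq_list, (keyword,))[keyword]
    return r is None or r > limit
```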

   Most Frequent Words in Proposed Standards Security Considerations

   Over the entire period 2003-2019, the most frequent non-normative
   words in Security Considerations sections were:

     . Security, message, attack, server, information, key,
       authentication, network, protocol, client

   A list of the 75 most common words is provided in Appendix B.

7. Security Considerations

   This document describes goals and a methodology for surveying the
   existing body of Security Considerations in published RFCs. It does
   not create, extend or modify any protocols. Its intent is to provide
   a foundation for a data-driven discussion of the guidelines for
   writing a Security Considerations section in an RFC.

8. IANA Considerations

   Upon publication, this document has no required actions for IANA.

9. References

9.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2223] Postel, J. and J. Reynolds, "Instructions to RFC
             Authors", RFC 2223, October 1997.

   [RFC3552] Rescorla, E. and B. Korver, Eds., "Guidelines for Writing
             RFC Text on Security Considerations", BCP 72, RFC 3552,
             July 2003.

   [RFC7687] Farrell, S., Wenning, R., Bos, B., Blanchet, M., and H.
             Tschofenig, "Report from the Strengthening the Internet
             (STRINT) Workshop", RFC 7687, December 2015.

9.2. Informative References

   [1]   Model-t -- Discussions of changes in Internet deployment
         patterns and their impact on the Internet threat model,
         https://www.ietf.org/mailman/listinfo/model-t

   [RFC3410] Case, J., Mundy, R., Partain, D., and B. Stewart,
             "Introduction and Applicability Statements for
             Internet-Standard Management Framework", RFC 3410,
             December 2002.

   [RFC4301] Kent, S. and K. Seo, "Security Architecture for the
             Internet Protocol", RFC 4301, December 2005.

Acknowledgments

   This document was prepared using 2-Word-v2.0.template.dot.

Appendix A. Document History

   [[ To be removed from the final document ]]

   -00

   Initial Internet Draft

   -01

   Section 6 and Appendix B are added. Significant editing of Section 3
   on Motivation and Section 5 on Methodology. Several typos fixed.

Appendix B. 75 Most Common Words in Security Considerations Sections

   Over the entire period 2003-2019, the 75 most frequent words in
   Security Considerations sections were (in order of frequency):

   security, message, attack, data, used, may, authentication, key,
   access, protocol, information, must, address, transport, process,
   model, client, server, network, ipfix, tl, user, traffic, packet,
   object, operation, control, service, ipp, example, document,
   implementation, measurement, collecting, secure, header, attacker,
   identity, value, job, need, support, snmp, provide, printer, uri,
   certificate, authenticated, possible, name, content, source,
   connection, field, set, system, dtls, cause, sensitive, domain,
   provides, configuration, router, privacy, protection, peer, nacm,
   layer, ip, device, exporting, within, request, large, and signature.

Authors' Addresses

   Mark McFadden
   Internet policy advisors ltd
   Madison Wisconsin US

   Email: mark@internetpolicyadvisors.com

   Alan Mills
   University of the West of England, Bristol
   Bristol BS16 1QY United Kingdom

   Email: Alan2.Mills@live.uwe.ac.uk