Network Working Group                                            M. Rose
Internet-Draft                              Dover Beach Consulting, Inc.
Expires: May 7, 2003                                          D. Crocker
                                             Brandenburg InternetWorking
                                                        November 6, 2002


          Toward a Quantitative Analysis of IETF Productivity
                     draft-etal-ietf-analysis-02

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 7, 2003.

Copyright Notice

   Copyright (C) The Internet Society (2002). All Rights Reserved.

Abstract

   This memo presents an initial quantitative analysis of the IETF's
   working groups, using RFC publication as the primary metric. Such
   basic indicators are sufficient for an initial assessment of IETF
   performance. We can discuss our expectations for the numbers and our
   reaction to them. Where there is a discrepancy, we can decide whether
   to change our expectations or whether to look for ways to improve
   the numbers.

   In other words, the purpose of this effort is to encourage community
   discussion about measuring IETF productivity, detecting possible
   problems, and fixing them.

Rose & Crocker             Expires May 7, 2003                  [Page 1]

Internet-Draft     Quantifying of IETF Productivity        November 2002

Table of Contents

   1.   Introduction
   2.   Statistics and Methodology
   2.1  What to measure
   3.   The Model
   4.   The Queries
   5.   The Results
   5.1  Days until 1st RFC published
   5.2  WG duration in days
   5.3  WG duration normalized over #-RFCs produced
   5.4  Number of RFCs produced
   6.   Conclusions
   6.1  Two Suggestions for the WG Charter Document System
   7.   Security Considerations
        Authors' Addresses
   A.   Data Gathering and Processing
   A.1  Charters, Messages, and Documents
   A.2  Reconstructing Activity
   A.3  Running the Analysis
        Full Copyright Statement

1. Introduction

   An important part of management is measurement. When you can measure
   something, you can make reasoned decisions as to how to improve it
   -- if not "reasoned", then at least "better informed". Imagine trying
   to achieve the last 15 years of improvements to TCP's algorithms
   without being able to measure the results at each step!

   The Internet depends on the IETF producing useful specifications in a
   timely manner. Many folks contribute considerable resources to that
   goal, so a lot of folks should be interested in understanding how
   well the IETF is working.

   Historically, the IETF has had an excellent track record.
   However, as it has grown, there has been increasing concern that IETF
   efforts are becoming less efficient and less effective. To date, the
   community has relied solely on its subjective sense of this change.
   It is time to get help from evaluation tools that are both objective
   and useful.

   Obviously, measuring the effectiveness of an organization is
   different from measuring the effectiveness of a protocol. However,
   there are some fairly simple, objective metrics that we can apply to
   get a first- or second-order approximation. (Previously, the only
   public analysis of the IETF, per se, has been of the number of
   working groups and meeting size. While interesting, these metrics are
   useful primarily for logistics planning.)

   This memo is intended to encourage community discussion about
   measuring IETF productivity, detecting possible problems, and fixing
   them. Therefore, simple measurements and simple analyses are used. It
   is hoped that the community will focus on the question of IETF
   productivity, rather than on the question of methodological
   imperfections.

   The ultimate test of the IETF is whether its work gets used by the
   Internet community. Given the size of the Internet, this should mean
   that the work is employed on millions or even hundreds of millions of
   platforms. However, it takes years before adoption and use can be
   measured, and even then, we do not have objective methods for
   assessing that success.

   Consider these two assertions:

   o  from a work product perspective, the IETF is simply the sum of its
      working groups; and,

   o  while a working group might do many valuable things, the only
      quantifiable metric is the number and timing of the RFCs that it
      produces.
   So, to measure something now, we'll take the middle road: while
   agreeing that there are significant qualitative aspects of the IETF
   that are not tied to RFC publication, and while hoping that IETF
   participants are significant stakeholders in terms of needing
   successful implementation and provisioning, we'll focus on measuring
   IETF production of RFCs. In particular, does the IETF produce
   specifications efficiently? To be useful, answers to such a question
   need to identify activities or areas that might be problematic.

2. Statistics and Methodology

   The first question is: what gives guidance about IETF productivity
   and is also reasonably easy to measure objectively?

   Statistical measurement involves the choice of what to measure and
   the choice of how to analyze the measurements. In this memo we keep
   both choices as simple as possible. The first needs to be intuitively
   reasonable and procedurally easy. The second also needs to be
   minimalist. Because this process of IETF measurement is new, no one
   has enough knowledge about it to use sophisticated statistical tools.
   (For that matter, the very small quantity of data and its non-normal
   distribution make the usual statistical tools inappropriate.)

   In fact, the IETF approach to rough consensus is helpful here. It
   means that we need the data and the analyses to be straightforward,
   so that the bulk of the community can understand them easily and
   agree with them. Therefore we limit ourselves to the most basic
   calculations: mean, minimum and maximum, and standard deviation
   (sigma). The mean tells us what is "usual", whilst the minimum and
   maximum tell us the lower and upper bounds of performance. The
   standard deviation gives us gradations of "better" and "worse",
   "faster" and "slower". These basic indicators are sufficient for an
   initial assessment of IETF performance.
   We can discuss our expectations for the numbers and our reaction to
   them. Where there is a discrepancy, we can decide whether to change
   our expectations or whether to look for ways to improve the numbers.

   The approach taken in this memo has three steps:

   1.  Develop a model and populate it with data.

   2.  Decide on some interesting queries (analyses) to run against the
       data.

   3.  Look at the results.

   We hope that such tidbits prompt community discussion about useful
   metrics and their preferred values. Even better will be the
   development of community consensus on such measures, and their
   on-going use to improve IETF performance.

2.1 What to measure

   Since we can't measure exactly what we'd like, namely
   "productivity", we'll measure something that's close.

   It is easy to create mechanical measurements of IETF activity. Such a
   measure can be entirely objective and entirely meaningless. Size of
   meeting sessions, number of messages on the mailing list, number of
   words per specification, and number of RFCs cited in specifications
   are all examples. Some of these measures might be interesting, but
   they do not really tell us much about productivity; they do not tell
   us how well specifications are produced.

   So it is difficult to find statistics that say something both
   meaningful and accurate. They need to tell us how well one effort, or
   a number of efforts, is working. They might also tell us where we
   have a problem.

   Although we try, here, to select particular measures that seem
   useful, we carefully avoid drawing any qualitative conclusions about
   the results. Therefore, while we might show that it takes a working
   group about 740 days to produce its first RFC, we will not say that
   this is either "sweet" or "sour". As with everything else in life,
   readers are free to draw their own conclusions (which naturally
   reinforce their own perspectives).
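   As a minimal sketch (in Python, which is not the tooling the authors
   used), the basic indicators named above, plus the median defined in
   Section 5, can be computed with the standard "statistics" module.
   Using the sample standard deviation (n - 1 in the denominator) for
   sigma is an assumption; it is consistent with the two-working-group
   rows of the tables in Section 5, e.g., User Services in Section 5.1,
   whose two values are its min and max, 1029 and 1040 days.

```python
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Return size, min, avg, median, sigma and max for one set of values.

    ASSUMPTION: sigma is the sample standard deviation (n - 1 in the
    denominator); the memo itself does not say which variant it uses.
    """
    if len(values) < 2:
        raise ValueError("at least two values are needed to compute sigma")
    return {
        "size": len(values),
        "min": min(values),
        "avg": statistics.mean(values),
        "median": statistics.median(values),
        "sigma": statistics.stdev(values),
        "max": max(values),
    }


if __name__ == "__main__":
    # Days until 1st RFC for the two User Services working groups
    # (the "usv" row of Section 5.1: min 1029, max 1040).
    for key, value in summarize([1029, 1040]).items():
        print(f"{key:>6}: {value:8.1f}")
```

   With these inputs sigma comes out at about 7.8 days, which rounds to
   the 8 reported in the table.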
3. The Model

   Given our second assertion, that RFC publication is the only thing a
   working group does that we care to measure, how can we go about
   measuring it?

   The first step is to develop a model of a working group. For this
   analysis, we're going to use XML as the description language, which
   provides a document-centric and, to a lesser extent, a role-centric
   model of a working group. Of course, it's not a complete model (e.g.,
   BOF and meeting information is absent).

   To make this data model a bit more concrete, here's an abbreviated
   example:

      ...

      ...

   In case it isn't obvious, throughout this memo the presence of an
   ellipsis ("...") in an example indicates that some information was
   omitted.

   It turns out this information can be gathered in a fairly automated
   fashion! Consult the Appendix for the details (including how to get a
   copy of the dataset along with the tools that synthesized it).

4. The Queries

   Although the IETF secretariat does a great job of documenting things,
   the process has evolved over more than a decade. It turns out that
   the data starts to get "really clean", for quantitative analysis, in
   late 1998. Moving further back, it becomes progressively harder to
   construct the written record. So, the very first thing to appreciate
   is that, for the purposes of this analysis, the epoch for any query
   is February 12, 1997 -- getting the data clean from that date moving
   forward is straightforward. With a lot more work, we can move the
   epoch backward in time. This will be particularly helpful for trend
   analysis, to discern changes in IETF productivity.
   Regardless of the data cleanliness issue, February 1997 turns out to
   be a natural choice for an epoch: it covers the last five years.

   Besides cleanliness, there may be other things to consider when
   limiting the input domain. For example, if a working group was
   created very recently, the fact that it hasn't published any RFCs yet
   is hardly of interest. The question, of course, is where the cutoff
   is for "very recently". For the purposes of this analysis, we'll
   consider any working group that's at least two years old as being of
   interest.

   Another thing to consider is whether a working group is actually
   "active". The IESG often does not conclude a working group at the end
   of a document production cycle -- instead, the working group may
   remain on "the books" as active, even though it isn't allowed to
   produce any more RFCs until some external event occurs (e.g., a
   period of implementation and experimentation before re-examining a
   document). Since this may affect some queries (e.g., "what's the
   lifetime of a working group?"), we'll use the following heuristic: if
   the working group has published at least one RFC, and if all of its
   Internet-Drafts have been published as RFCs, then we consider it
   "inactive".

   So, here are the queries to consider:

   o  How long does it take a working group to get its first RFC
      published?

   o  How long is a working group "active"?

   o  How many RFCs does a working group publish?

   o  For these quantities, what is the average, the distribution, and
      the extrema?

   o  Do these relationships change if we aggregate the working groups
      into areas?
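   The "inactive" heuristic above, together with the rule in Appendix
   A.2 that such a group counts as inactive as of its most recent RFC
   publication date, can be sketched as a small function. The mapping
   from document name to RFC publication date is a hypothetical
   representation chosen for illustration, not the memo's XML or SQL
   model, and the draft names are made up.

```python
from datetime import date
from typing import Dict, Optional


def inactive_since(docs: Dict[str, Optional[date]]) -> Optional[date]:
    """Apply the "inactive" heuristic to one working group.

    HYPOTHETICAL input: `docs` maps each of the group's document names to
    the date it was published as an RFC, or to None if it is still an
    Internet-Draft.

    Returns the most recent RFC publication date if the group has at least
    one RFC and every document has become an RFC; otherwise returns None,
    meaning the group is still considered active.
    """
    rfc_dates = [d for d in docs.values() if d is not None]
    if not rfc_dates or len(rfc_dates) != len(docs):
        return None
    return max(rfc_dates)


if __name__ == "__main__":
    finished = {"draft-ietf-example-proto": date(2000, 5, 1),
                "draft-ietf-example-mib": date(2001, 2, 15)}
    pending = {"draft-ietf-example-proto": date(2000, 5, 1),
               "draft-ietf-example-ext": None}
    print(inactive_since(finished))  # 2001-02-15
    print(inactive_since(pending))   # None
```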
   In order to answer questions such as these, we'll also need a
   relational model, for which we'll use SQL:

      CREATE TABLE person (
        id int(11) NOT NULL auto_increment,
        name varchar(255) NOT NULL default '',
        PRIMARY KEY (id)
      );

      -- "groop" rather than "group": GROUP is an SQL reserved word
      CREATE TABLE groop (
        id int(11) NOT NULL auto_increment,
        name varchar(8) NOT NULL default '',
        area varchar(8) NOT NULL default '',
        chartered date NOT NULL,
        concluded date default NULL,
        title varchar(255) NOT NULL default '',
        PRIMARY KEY (id)
      );

      CREATE TABLE role (
        id int(11) NOT NULL auto_increment,
        groupID int(11) NOT NULL default 0,
        personID int(11) NOT NULL default 0,
        name varchar(25) NOT NULL default '',
        PRIMARY KEY (id)
      );

      CREATE TABLE doc (
        id int(11) NOT NULL auto_increment,
        groupID int(11) NOT NULL default 0,
        status varchar(25) NOT NULL default '',
        name varchar(255) NOT NULL default '',
        PRIMARY KEY (id)
      );

      CREATE TABLE revision (
        id int(11) NOT NULL auto_increment,
        groupID int(11) NOT NULL default 0,
        docID int(11) NOT NULL default 0,
        size int(11) NOT NULL default 0,
        published date NOT NULL,
        uri varchar(255) NOT NULL default '',
        PRIMARY KEY (id)
      );

5. The Results

   We now look at the results from the queries. Note that for a working
   group to be considered in this analysis:

   o  it must have been chartered since the epoch; and,

   o  either:

      *  have published at least one RFC; or,

      *  be at least two years old.

   Seventy-five working groups meet these criteria:

                          area  size
      =================   ====  ====
      Applications        app     19
      Internet            int      7
      Operations          ops     11
      Routing             rtg      5
      Security            sec      9
      Sub-IP              sub      2
      Transport           tsv     20
      User Services       usv      2

   With the exception of the Sub-IP and User Services areas, each of
   which has only two working groups in the sample, we should be able to
   make comparisons between areas.

   For each of the queries, a tabular summary of the results is
   presented.
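   The selection criteria above can be written as a single query against
   the relational model. The following is a minimal sketch using
   Python's built-in sqlite3 module rather than MySQL, with a cut-down
   SQLite version of the "groop" and "doc" tables. Two things are
   assumptions, not taken from this memo: that a document published as
   an RFC is marked by doc.status = 'RFC', and the sample working groups
   and documents, which are hypothetical.

```python
import sqlite3
from typing import List

# Cut-down SQLite version of the "groop" and "doc" tables. Dates are
# ISO-8601 text ("YYYY-MM-DD"), which sorts and compares correctly.
SCHEMA = """
CREATE TABLE groop (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    area      TEXT NOT NULL,
    chartered TEXT NOT NULL,
    concluded TEXT
);
CREATE TABLE doc (
    id      INTEGER PRIMARY KEY,
    groupID INTEGER NOT NULL REFERENCES groop(id),
    status  TEXT NOT NULL,  -- ASSUMPTION: 'RFC' marks a published RFC
    name    TEXT NOT NULL
);
"""

EPOCH = "1997-02-12"

# Chartered since the epoch, and either at least one RFC published or at
# least two years old as of :as_of.
SELECT_ANALYZED = """
SELECT g.name
  FROM groop AS g
 WHERE g.chartered >= :epoch
   AND (EXISTS (SELECT 1 FROM doc AS d
                 WHERE d.groupID = g.id AND d.status = 'RFC')
        OR g.chartered <= date(:as_of, '-2 years'))
 ORDER BY g.name
"""


def analyzed_groups(as_of: str = "2002-11-06") -> List[str]:
    """Return the names of the (hypothetical) groups meeting the criteria."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Hypothetical working groups and documents, for illustration only.
    conn.executemany(
        "INSERT INTO groop (id, name, area, chartered) VALUES (?, ?, ?, ?)",
        [(1, "alpha", "app", "1998-03-01"),   # has published an RFC
         (2, "beta", "tsv", "2001-06-01"),    # no RFC yet, young group
         (3, "gamma", "ops", "2002-01-15"),   # young, but has an RFC
         (4, "delta", "sec", "1995-05-01")],  # chartered before the epoch
    )
    conn.executemany(
        "INSERT INTO doc (groupID, status, name) VALUES (?, ?, ?)",
        [(1, "RFC", "draft-ietf-alpha-proto"),
         (2, "I-D", "draft-ietf-beta-reqs"),
         (3, "RFC", "draft-ietf-gamma-bcp"),
         (3, "I-D", "draft-ietf-gamma-ext")],
    )
    params = {"epoch": EPOCH, "as_of": as_of}
    names = [row[0] for row in conn.execute(SELECT_ANALYZED, params)]
    conn.close()
    return names


if __name__ == "__main__":
    print(analyzed_groups())              # ['alpha', 'gamma']
    print(analyzed_groups("2003-07-01"))  # ['alpha', 'beta', 'gamma']
```

   The default as_of date is simply this memo's own date, chosen for
   illustration; moving it forward lets a young group without RFCs
   (here "beta") age into the analysis.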
   Interested readers may also consult [1] for graphical summaries of
   these results. Readers are strongly encouraged to examine these
   summaries; they provide some very clear insight into the numbers.

   Although the statistical terms used in this memo are basic
   measurements, they are not part of typical IETF parlance.
   Accordingly, we remind readers of their meanings:

   size:   the number of values measured

   min:    the smallest value measured

   max:    the largest value measured

   avg:    the arithmetic mean (simple average) of all values measured

   median: the dividing point of the values, with half of the values
           below it and the other half above

   mode:   a peak (local maximum) in the distribution; the primary mode
           is the highest peak

   sigma:  the standard deviation of the values measured

   1s:     one sigma from the mean

   2s:     two sigmas from the mean

   3s:     three sigmas from the mean (and so on, for 4s, 5s, and 6s)

   In each table, the "*" row aggregates all 75 working groups across
   all areas.

   Although looking at the median reduces the effect of extreme minimum
   or maximum values, it is also useful to normalize the data to permit
   better comparisons between areas. To do this, we present the
   percentage of working groups that fall at a given number of standard
   deviations from the mean. Percentiles and standard deviations
   normalize the results across the disparate groups, permitting fair
   comparison among them. (Note that, due to rounding, the percentages
   for each area may not add up to 100%.)
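   One way to compute the 1s, 2s, ... columns is sketched below. The
   memo does not spell out exactly how the bands are bounded, so this
   sketch assumes that band k holds the values whose distance from the
   mean is greater than k-1 and at most k sigmas (a value at the mean
   counts in band 1), and again uses the sample standard deviation.

```python
import math
import statistics
from collections import Counter
from typing import Dict, Sequence


def sigma_bands(values: Sequence[float]) -> Dict[int, float]:
    """Map band k (the "ks" column) to the percentage of values in it.

    ASSUMPTION: band k holds values whose distance from the mean is more
    than k-1 and at most k sample standard deviations; values at the mean
    (or every value, when sigma is zero) fall in band 1.
    """
    mean = statistics.mean(values)
    sigma = statistics.stdev(values)
    bands = Counter()
    for v in values:
        distance = abs(v - mean) / sigma if sigma else 0.0
        bands[max(1, math.ceil(distance))] += 1
    return {k: 100.0 * n / len(values) for k, n in sorted(bands.items())}


if __name__ == "__main__":
    print(sigma_bands([1029, 1040]))       # usv values from Section 5.1
    print(sigma_bands([1, 2, 3, 4, 100]))  # made-up, skewed sample
```

   For example, the two User Services values in Section 5.1 (1029 and
   1040 days) both lie within one sigma of their mean, so the sketch puts
   100% of them in band 1, as the "usv" row there does.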
5.1 Days until 1st RFC published

   area  size   min   avg  median  sigma   max
   ====  ====  ====  ====  ======  =====  ====
   *       75   193   777     741    418  1849
   app     19   239   897     842    459  1705
   int      7   193   731     772    419  1466
   ops     11   287   867     794    481  1807
   rtg      5   601  1106    1227    339  1438
   sec      9   297   812     741    482  1849
   sub      2   937  1040    1040    146  1144
   tsv     20   206   480     473    161   791
   usv      2  1029  1034    1034      8  1040

   area   1s   2s   3s
   ====  ===  ===  ===
   app    47   47    5
   int    71   28
   ops    63   27    9
   rtg    40   60
   sec    66   22   11
   sub   100
   tsv    75   25
   usv   100

   This measures how quickly the working groups in an area produce their
   first RFC. All of the areas, except Transport, show very inconsistent
   durations between the start of the working group and issuance of its
   first RFC. Applications and Operations show some similarity in the
   shape of their distributions, with similar variance but very
   different means. Internet and Security also have variances that are
   similar but means that are quite different. Transport is distinctive,
   with a lower average and narrower variance than the other areas.
   Routing is distinctive, with the highest average and the highest
   median. Note, however, that it also has the smallest number of
   working groups in this calculation.

   The graphs of these measures, with the number of working groups
   normalized as a percentage and the duration normalized in sigmas, are
   striking. All of the areas, except Internet, have a primary mode
   (highest peak) at 1.5 to 1.75 sigmas. The Internet area has a primary
   mode at about 2.25 sigmas. Routing shows a distinctive secondary mode
   at about 4.25 sigmas.
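   The "days until 1st RFC" measure can be sketched as a single query.
   Again this uses SQLite via Python's sqlite3 module with a cut-down
   schema, and it assumes (hypothetically) that doc.status = 'RFC' marks
   a published RFC whose publication date is its latest
   revision.published value; the data are made up.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE groop (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                    area TEXT NOT NULL, chartered TEXT NOT NULL);
CREATE TABLE doc (id INTEGER PRIMARY KEY, groupID INTEGER NOT NULL,
                  status TEXT NOT NULL);
CREATE TABLE revision (id INTEGER PRIMARY KEY, docID INTEGER NOT NULL,
                       published TEXT NOT NULL);
"""

# ASSUMPTION: an RFC is a doc with status 'RFC', published on the date of
# its latest revision. For each group chartered since the epoch, report
# the number of days from chartering to its earliest RFC.
DAYS_TO_FIRST_RFC = """
WITH rfc AS (
    SELECT d.groupID AS groupID, MAX(r.published) AS published
      FROM doc AS d JOIN revision AS r ON r.docID = d.id
     WHERE d.status = 'RFC'
     GROUP BY d.id
)
SELECT g.area, g.name,
       CAST(ROUND(julianday(MIN(rfc.published)) - julianday(g.chartered))
            AS INTEGER) AS days
  FROM groop AS g JOIN rfc ON rfc.groupID = g.id
 WHERE g.chartered >= :epoch
 GROUP BY g.id
 ORDER BY g.area, g.name
"""


def days_to_first_rfc(epoch: str = "1997-02-12") -> List[Tuple[str, str, int]]:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Hypothetical data, for illustration only.
    conn.executemany("INSERT INTO groop VALUES (?, ?, ?, ?)",
                     [(1, "alpha", "app", "1998-03-01"),
                      (2, "beta", "tsv", "2000-01-01"),
                      (3, "gamma", "tsv", "1999-05-01")])  # no RFC yet
    conn.executemany("INSERT INTO doc VALUES (?, ?, ?)",
                     [(10, 1, "RFC"), (11, 1, "RFC"),
                      (20, 2, "RFC"), (30, 3, "I-D")])
    conn.executemany("INSERT INTO revision (docID, published) VALUES (?, ?)",
                     [(10, "1998-06-01"), (10, "1999-01-10"),
                      (11, "1999-05-01"), (11, "2000-02-01"),
                      (20, "2000-03-01"), (30, "1999-08-01")])
    rows = conn.execute(DAYS_TO_FIRST_RFC, {"epoch": epoch}).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for area, name, days in days_to_first_rfc():
        print(area, name, days)
```

   Groups without an RFC drop out of the inner join, so they contribute
   nothing to this measure; the per-area summaries above would then be
   computed over the returned day counts.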
5.2 WG duration in days

   area  size   min   avg  median  sigma   max
   ====  ====  ====  ====  ======  =====  ====
   *       75   235  1074    1059    427  1849
   app     19   240  1094     959    477  1829
   int      7   235   820     772    344  1334
   ops     11   307  1054    1109    479  1823
   rtg      5   934  1409    1331    366  1809
   sec      9   515  1038     952    444  1849
   sub      2  1144  1494    1494    495  1844
   tsv     20   471  1088    1106    347  1684
   usv      2   633   638     638      8   644

   area   1s   2s
   ====  ===  ===
   app    57   42
   int    71   28
   ops    54   45
   rtg    60   40
   sec    66   33
   sub    50   50
   tsv    80   20
   usv     0  100

   This measures the longevity of a working group, from the time it is
   chartered until it is concluded.

   The most significant observation about the data is that it has
   little coherence. The only consistent pattern is that none of the
   areas shows any real "shape" to the distribution of its working group
   durations; the distributions are nearly flat. Transport might be seen
   as having a bit of a distribution curve, but too few working groups
   (3) form the mode to make such an assessment meaningful.

5.3 WG duration normalized over #-RFCs produced

   area  size   min   avg  median  sigma   max
   ====  ====  ====  ====  ======  =====  ====
   *       75   106   687     623    435  1849
   app     19   120   784     812    453  1810
   int      7   235   575     644    255   920
   ops     11   132   652     533    426  1393
   rtg      5   934  1409    1331    366  1809
   sec      9   106   796     741    511  1849
   sub      2   108   626     626    733  1144
   tsv     20   159   435     434    189   875
   usv      2   633   638     638      8   644

   area   1s   2s   3s
   ====  ===  ===  ===
   app    68   21   10
   int    85   14
   ops    72   27
   rtg    20   40   40
   sec    66   22   11
   sub     0  100
   tsv    80   20
   usv   100

   Dividing the duration of a working group by the number of RFCs it
   produced gives a unit of measure for the average time needed to
   produce each RFC.

   The distributions show very little pattern, except for Transport,
   which has a distinctive primary mode at one year per RFC. In the
   other areas, most working groups take longer than two years to
   produce each RFC.

   The effect of normalizing both axes is remarkable.
   Normalizing to percentage of working groups, and to number of sigmas,
   shows all of the areas to be relatively similar to each other in the
   shape and mode of their distribution curves, with the primary mode at
   approximately 1.75 sigmas. Operations is distinctive, with a lower
   percentage of working groups at the mode and a high-end tail that
   descends more gradually. Routing is remarkable in having two modes of
   nearly the same height: one at the same position as for the rest of
   the IETF, though with a markedly smaller percentage of working
   groups, and a second at approximately 4 sigmas.

5.4 Number of RFCs produced

   area  size  min  avg  median  sigma  max
   ====  ====  ===  ===  ======  =====  ===
   *       75    0    2       1      3   17
   app     19    0    1       1      2    7
   int      7    0    1       1      1    3
   ops     11    0    2       1      4   12
   rtg      5    0    0       0      0    1
   sec      9    0    2       0      5   15
   sub      2    0    8       8     12   17
   tsv     20    1    3       2      2    9
   usv      2    0    0       0      0    0

   area   1s   2s   3s   4s   5s   6s
   ====  ===  ===  ===  ===  ===  ===
   app    84   15
   int   100
   ops    81    9    0    9
   rtg   100
   sec    88    0    0    0   11
   sub    50    0    0    0    0   50
   tsv    75   20    5
   usv   100

   The distributions show very little pattern. Only Transport and
   Applications show a real mode, both at three RFCs. All areas show
   extremely long, flat, high-end tails.

   Again, the effect of normalizing both axes is remarkable. Normalizing
   to percentage of working groups, and to number of sigmas, shows all
   of the areas to be extremely similar to each other, with well-shaped
   and equivalent curves having a primary mode at about 1.75 sigmas.
   Internet has a much lower, but distinctive, secondary mode at about
   3.5 sigmas.

6. Conclusions

   If these sorts of measures are useful, what additional measurements
   and analyses should be pursued?

   First, note that the data used for this analysis covers only the last
   five years.
   The modern IETF was formed in 1989, and these analyses should be
   applied across at least the last 10 years, looking for trends (for
   example, with a rolling 3- or 5-year analysis) to see how things have
   changed.

   Second, note that we have looked only at broad assessments of the
   IETF. These same techniques can be used to look more closely at the
   history of particular working groups and even of particular
   contributors to IETF efforts. Obviously, statistics about people can
   be sensitive, and the dangers of inappropriate use are particularly
   serious. Still, we wonder whether the mere fact that a working group
   has produced many specifications is a good thing, or whether the fact
   that a particular person has their name on many specifications is a
   good thing. Ultimately, the question is whether those specifications
   get used.

6.1 Two Suggestions for the WG Charter Document System

   Please note that the current system works fine for its original,
   intended purpose. However, here are two concrete suggestions for
   improving the charter documents maintained by the IETF secretariat:

   o  each charter document should include information for each event in
      its lifetime (and in between); and,

   o  in addition to making charter documents available in both text and
      HTML, to facilitate automatic processing, each charter document
      should also be available in XML.

   Consider this (abbreviated) example, which pretty much captures the
   whole of a working group's past and present activity:

      subscribe ... ...text... Prepare ... ...

   This is actually a straightforward generalization of the model used
   in this analysis. However, since the DTD for this model isn't germane
   to this analysis, it isn't presented in this memo.

7. Security Considerations

   This memo has nothing, whatsoever, to do with security; nor does it
   have anything to do with insecurity.
URIs

   [1]

   [2]

   [3]

Authors' Addresses

   Marshall T. Rose
   Dover Beach Consulting, Inc.
   POB 255268
   Sacramento, CA  95865-5268
   US

   Phone: +1 916 483 8878
   EMail: mrose@dbc.mtview.ca.us


   David H. Crocker
   Brandenburg InternetWorking
   675 Spruce Drive
   Sunnyvale, CA  94086
   US

   Phone: +1 408 246 8253
   EMail: dcrocker@brandenburg.com
   URI:   http://www.brandenburg.com/

Appendix A. Data Gathering and Processing

   Because the IETF "written history" wasn't explicitly designed for
   productivity analysis, three different sources are consulted to
   synthesize a uniform dataset. Even so, there are still some gaps in
   the data, primarily before 1997.

A.1 Charters, Messages, and Documents

   The charter document is the fundamental description of a working
   group. Since its earliest days, the IETF secretariat has been quite
   diligent in using a consistent charter format, which allows for
   automatic processing of the charter documents.

   Unfortunately, there is no known public archive of charter document
   revisions. Accordingly, if a working group is active, the charter
   document reflects only the present state of affairs. In other words,
   the charter document indicates the current chairs, advisors and
   directors, but it does not indicate when the working group was
   created, or who was involved with the working group before the
   present. Fortunately, chair and technical advisor turnover is
   (anecdotally) rather low. Further, in the case of area directors and
   advisors, the IESG membership rotates fairly infrequently -- even
   with a two-year term.

   In addition to lacking personnel history, the charter document
   indicates only the latest I-Ds produced by a working group. (For
   example, if an I-D expires or moves to another working group, this
   fact isn't noted by the charter document.)
   Similarly, when a working group reaches a milestone, the charter
   document is updated to replace the planned date of the milestone with
   the string "Done". (Obviously, knowing both the planned and actual
   dates is rather useful for gauging a working group's ability to work
   towards deadlines.)

   A third issue in mining charter documents is that when a working
   group is concluded, the charter document is severely condensed; for
   example, all personnel information other than the chairs is removed.
   The difficulty here is that when a working group is concluded and
   then re-activated, all of the history is lost.

   Fortunately, three other data sources are available to help minimize
   these deficiencies.

   First, there are the archives for the IETF general and announcement
   mailing lists. Although there are some gaps in the archives prior to
   mid-1998, when announcements were split off from the general
   discussion list, many charter announcements (and conclusion notices)
   are archived. (For this analysis, approximately 54,250 messages were
   examined to find 127 charter and 65 conclusion announcements.)

   Second, there are several archival I-D repositories, which can be
   examined to reconstruct the document history of each working group.
   (This analysis uses the "watersprings" archive, which indexes
   approximately 11,700 I-Ds and RFCs since late 1991.)

   Finally, a small exception file was built by consulting the IETF
   proceedings. Of the 126 working groups chartered since the epoch, the
   charter announcement messages for 25 were not archived. However, by
   manually examining the IETF proceedings and the earliest I-Ds
   produced by those working groups, it is possible to estimate when
   these working groups were created.

A.2 Reconstructing Activity

   Here are the steps to produce a uniform dataset:

   1.  The document history for each working group is constructed by
       parsing the index for the I-D archive repository. Although
       straightforward, there are a few nuances:

       *  Some I-Ds were published with incorrect prefixes, so an
          exception list is consulted so that the correct working group
          gets the "credit".

       *  The index lists documents and their revision history, but
          neither publication dates nor sizes, so the appropriate
          instances are retrieved in order to determine this
          information.

   2.  The creation and conclusion dates for each working group are
       determined by parsing the mail archives for the IETF general and
       announcement mailing lists. It turns out that (only) eight of
       these announcements are poorly formatted, so an additional
       archive containing these eight messages, properly formatted, is
       also consulted.

   3.  Additional information for each working group is determined by
       parsing the current charter document. Because of the
       deficiencies in the format, some additional steps are taken:

       *  If a working group is concluded, then the charter announcement
          is examined (if available) to determine the area associated
          with the working group.

       *  If the creation date of a working group isn't known, or if a
          working group has produced a document that predates the
          charter announcement, then the creation date of the working
          group is set to the date of the working group's earliest
          document, and a note is made of this. (This may happen if the
          creation date of a working group is estimated using the IETF
          proceedings.) However, to avoid skewing the results, these
          working groups were not analyzed.

       *  If the conclusion date of a working group is known, and if the
          working group produced a document after that date, then the
          conclusion date of the working group is updated, and a note is
          made of this.
          (This happens when a working group is concluded before one of
          its documents is published as an RFC -- ultimately, only RFCs
          count, so even if the IESG formally concludes a working group,
          this analysis doesn't consider the working group concluded
          until the RFC editor publishes...)

   4.  Finally, the XML and corresponding SQL datasets are produced.
       When the SQL dataset is produced, for each working group
       considered active, a check is made to see if all of the working
       group's documents have been published as RFCs. If so, the working
       group is considered inactive, as of the most recent RFC
       publication date. (This reflects the fact that the IESG often
       does not conclude a working group until its documents reach final
       standardization status.)

A.3 Running the Analysis

   The software and data for this analysis are available at [2].

   If you want to build the datasets yourself, you'll need a UNIX system
   (e.g., NetBSD) and these packages:

   o  'tcl', a powerful scripting language;

   o  the 'mbox' package for Tcl; and,

   o  the GNU 'wget' utility.

   If you just want the resulting XML and SQL datasets, they're also
   available at [3].

   You'll also need database software. This analysis was generated using
   the excellent "MySQL" package, although any "modern" SQL software
   should work. Similarly, any "postmodern" XML database software should
   work, e.g., "Tamino" or "eXist", although the authors used only SQL
   for database access.

   Finally, to query the database and visualize the results, this
   analysis uses the "fbsql" extension to "Tcl" and the friendly
   "ploticus" graphics package.

Full Copyright Statement

   Copyright (C) The Internet Society (2002). All Rights Reserved.
   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.