Network Working Group                                            M. Rose
Internet-Draft                              Dover Beach Consulting, Inc.
Expires: May 7, 2003                                          D. Crocker
                                             Brandenburg InternetWorking
                                                        November 6, 2002


          Toward a Quantitative Analysis of IETF Productivity
                      draft-etal-ietf-analysis-02

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on May 7, 2003.

Copyright Notice

   Copyright (C) The Internet Society (2002).  All Rights Reserved.

Abstract

   This memo presents an initial quantitative analysis of the IETF's
   working groups using RFC publication as the primary metric.  These
   basic indicators are sufficient for an initial assessment of IETF
   performance.  We can discuss our expectations for the numbers and our
   reaction to them.  Where there is a discrepancy, we can decide
   whether to change our expectations or whether to look for ways to
   improve the numbers.  In other words, the purpose of this effort is
   to encourage community discussion about measuring IETF productivity,
   detecting possible problems and fixing them.


Rose & Crocker            Expires May 7, 2003                   [Page 1]

Internet-Draft      Quantifying of IETF Productivity       November 2002


Table of Contents

   1.  Introduction
   2.  Statistics and Methodology
   2.1 What to measure
   3.  The Model
   4.  The Queries
   5.  The Results
   5.1 Days until 1st RFC published
   5.2 WG duration in days
   5.3 WG duration normalized over #-RFCs produced
   5.4 Number of RFCs produced
   6.  Conclusions
   6.1 Two Suggestions for the WG Charter Document System
   7.  Security Considerations
       Authors' Addresses
   A.  Data Gathering and Processing
   A.1 Charters, Messages, and Documents
   A.2 Reconstructing Activity
   A.3 Running the Analysis
       Full Copyright Statement

1. Introduction

   An important part of management is measurement.

   When you can measure something, you can make reasoned decisions as to
   how to improve it -- if not "reasoned", then at least "better
   informed".  Imagine trying to achieve the last 15 years of
   improvements to TCP's algorithms without being able to measure the
   results at each step!

   The Internet depends upon the IETF's producing useful specifications
   in a timely manner.  Many folks contribute considerable resources to
   that goal, so a lot of folks should be interested in understanding
   how well the IETF is working.  Historically the IETF has had an
   excellent track record.  However, as it has grown, there has been
   increasing concern that IETF efforts are becoming less efficient and
   less effective.  To date the community has relied solely on its
   subjective sense of this change.  It is time to get help from
   evaluation tools that are both objective and useful.

   Obviously, measuring the effectiveness of an organization is
   different from measuring the effectiveness of a protocol.  However,
   there are some fairly simple, objective metrics that we can apply to
   get a first- or second-order approximation.  (Previously, the only
   public analysis of the IETF, per se, has been of the number of
   working groups and meeting size.  While interesting, these metrics
   are useful primarily for logistics planning.)

   The purpose of this memo is to encourage community discussion about
   measuring IETF productivity, detecting possible problems and fixing
   them.  Therefore simple measurements and simple analyses are used.
   It is hoped that the community will focus on the question of IETF
   productivity, rather than on methodological imperfections.

   The ultimate test of the IETF is that its work gets used by the
   Internet community.  Given the size of the Internet, this should mean
   that the work is employed on millions or even hundreds of millions of
   platforms.  However, adoption and use cannot be measured until years
   later, and even then we do not have objective methods for assessing
   that success.

   Consider these two assertions:

   o  from a work product perspective, the IETF is simply the sum of its
      working groups; and,

   o  while a working group might do many valuable things, the only
      quantifiable metric is the number and timing of the RFCs that it
      produces.

   So to measure something now, we'll take the middle road: while
   agreeing that there are significant qualitative aspects of the IETF
   that are not tied to RFC publication, and while hoping that IETF
   participants are significant stakeholders in terms of needing
   successful implementation and provisioning, we'll focus on measuring
   IETF production of RFCs.  In particular, does the IETF produce
   specifications efficiently?  To be useful, answers to such a question
   need to identify activities or areas that might be problematic.

2. Statistics and Methodology

   The first question is: what gives guidance about IETF productivity
   and is also reasonably easy to measure objectively?

   Statistical measurement involves the choice of what to measure and
   the choice of how to analyze the measurements.  In this memo we keep
   both choices as simple as possible.  The first needs to be
   intuitively reasonable and procedurally easy.  The second also needs
   to be minimalist.  Because this process of IETF measurement is new,
   no one has enough knowledge about it to use sophisticated statistical
   tools.  (For that matter, the very small quantity of data and its
   non-normal distribution make the usual statistical tools
   inappropriate.)

   In fact, the IETF approach to rough consensus is helpful here.  It
   means that we need the data and the analyses to be straightforward,
   so that the bulk of the community can understand them easily and
   agree with them.  Therefore we limit ourselves to the most basic
   calculations: mean, minimum and maximum, and standard deviation
   (sigma).

   The mean tells us what is "usual" whilst the minimum and maximum tell
   us the upper and lower bounds of performance.  The standard deviation
   gives us gradations of "better" and "worse", "faster" and "slower".
   These basic indicators are sufficient for an initial assessment of
   IETF performance.  We can discuss our expectations for the numbers
   and our reaction to them.  Where there is a discrepancy, we can
   decide whether to change our expectations or whether to look for ways
   to improve the numbers.

   The analysis itself proceeds in three steps:

   1.  Develop a model and populate it with data.

   2.  Decide on some interesting queries (analyses) to run against the
       data.

   3.  Look at the results.

   We hope that such tidbits prompt community discussion about useful
   metrics and their preferred values.  Even better will be the
   development of community consensus for such measures and on-going use
   of them to improve IETF performance.

2.1 What to measure

   Since we can't measure exactly what we'd like, namely "productivity",
   we'll measure something that's close.

   It is easy to create mechanical measurements of IETF activity.  Such
   measures can be entirely objective and yet entirely meaningless.
   Size of meeting sessions, number of messages on the mailing list,
   number of words per specification, and number of RFCs cited in
   specifications are all examples.  Some of these measures might be
   interesting, but they do not really tell us much about productivity.
   They do not tell us how well specifications are produced.

   So it is difficult to find statistics that are both meaningful and
   accurate.  They need to tell us how well one or more efforts are
   working.  They might also tell us where we have a problem.  Although
   we try, here, to select particular measures that seem useful, we
   carefully avoid drawing any qualitative conclusions about the
   results.  Therefore, while we might show that it takes a working
   group about 740 days to produce its first RFC, we will not say that
   this is either "sweet" or "sour".  As with everything else in life,
   readers are free to draw their own conclusions (that naturally
   reinforce their own perspectives).

3. The Model

   Given our second assertion, that RFC publication is the only thing a
   working group does that we care to measure, how can we go about
   measuring it?

   The first step is to develop a model of a working group.  For this
   analysis, we're going to use XML as the description language:


   <!ENTITY % DATE      "CDATA">
   <!ENTITY % URI       "CDATA">
   <!ENTITY % SIZE      "CDATA">

   <!ELEMENT ietf        (group*)>

   <!ELEMENT group       (person*,doc*)>
   <!ATTLIST group
             name        CDATA              #REQUIRED
             title       CDATA              #REQUIRED
             area        CDATA              #REQUIRED
             chartered   %DATE;             #REQUIRED
             concluded   %DATE;             #IMPLIED
             estimated   NMTOKENS           #IMPLIED>

   <!ELEMENT person      EMPTY>
   <!ATTLIST person
             name        CDATA              #REQUIRED
             role        (director|area-advisor|chair|technical-advisor)
                                            "chair">

   <!ELEMENT doc         (revision*)>
   <!ATTLIST doc
             name        CDATA              #REQUIRED
             status     (rfc|expired|inprogress|moved) "inprogress">

   <!ELEMENT revision    EMPTY>
   <!ATTLIST revision
             published   %DATE;             #REQUIRED
             uri         %URI;              #REQUIRED
             size        %SIZE;             #REQUIRED>


   which provides a document-centric and, to a lesser extent, a
   role-centric model of a working group.  Of course, it's not a
   complete model (e.g., BOF and meeting information is absent).

   To make this data model a bit more concrete, here's an abbreviated
   example:


   <group name='beep' area='app' chartered='20000705'
          title='Blocks Extensible Exchange Protocol'>

     <person name='Pete Resnick' role='chair' />
     <person name='Ned Freed' role='director' />
     <person name='Patrik Faltstrom' role='director' />
     <person name='Ned Freed' role='area-advisor' />

     <doc name='draft-ietf-beep-framework' status='rfc'>
       <revision published='20010301' size='82089'
                 uri='http://.../rfc3080.txt' />
       <revision published='20010105' size='83266'
                 uri='http://.../draft-ietf-beep-framework-11.txt' />
       ...
       <revision published='20000825' size='77321'
                 uri='http://.../draft-ietf-beep-framework-00.txt' />
     </doc>
     ...
   </group>


   In case it isn't obvious, throughout this memo the presence of an
   ellipsis ("...") in an example indicates that some information was
   omitted.
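   As a minimal sketch of how the XML model can be consumed, the
   following Python reads a trimmed copy of the beep example (the "..."
   elisions are removed so that it parses) with the standard
   xml.etree.ElementTree module.  The function name summarize_group and
   the shape of its result are illustrative assumptions, not part of the
   tools described in the Appendix.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the beep example above; the "..." elisions are removed
# because they are not well-formed XML.
SAMPLE = """
<group name='beep' area='app' chartered='20000705'
       title='Blocks Extensible Exchange Protocol'>
  <person name='Pete Resnick' role='chair' />
  <person name='Ned Freed' role='director' />
  <doc name='draft-ietf-beep-framework' status='rfc'>
    <revision published='20010301' size='82089'
              uri='http://.../rfc3080.txt' />
    <revision published='20010105' size='83266'
              uri='http://.../draft-ietf-beep-framework-11.txt' />
    <revision published='20000825' size='77321'
              uri='http://.../draft-ietf-beep-framework-00.txt' />
  </doc>
</group>
"""


def summarize_group(xml_text):
    """Return (name, chartered, {doc name: (status, number of revisions)})."""
    group = ET.fromstring(xml_text.strip())
    docs = {}
    for doc in group.findall("doc"):
        # The DTD makes "inprogress" the default status.
        status = doc.get("status", "inprogress")
        docs[doc.get("name")] = (status, len(doc.findall("revision")))
    return group.get("name"), group.get("chartered"), docs


print(summarize_group(SAMPLE))
# -> ('beep', '20000705', {'draft-ietf-beep-framework': ('rfc', 3)})
```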

   It turns out this information can be gathered in a fairly automated
   fashion!  Consult the Appendix for the details (including how to get
   a copy of the dataset along with the tools that synthesized it).

4. The Queries

   Although the IETF secretariat does a great job of documenting things,
   the process has evolved over more than a decade.  It turns out that
   the data starts to get "really clean", for quantitative analysis, in
   late 1998.  Moving further back, it becomes progressively harder to
   construct the written record.  So, the very first thing to appreciate
   is that, for the purposes of this analysis, the epoch for any query
   is February 12, 1997 -- getting the data clean from that date forward
   is straightforward.  With a lot more work, we can move the epoch
   backward in time.  This will be particularly helpful for trend
   analysis, which tries to discern changes in IETF productivity.
   Regardless of the data cleanliness issue, February 1997 is a natural
   choice for an epoch -- it covers roughly the last five years.

   Besides cleanliness, there may be other things to consider when
   limiting the input domain.  For example, if a working group was
   created very recently, the fact that it hasn't published any RFCs yet
   is hardly of interest.  The question, of course, is where the cutoff
   is for "very recently".  For the purposes of this analysis, we'll
   consider any working group that's at least two years old as being of
   interest.

   Another thing to consider is whether a working group is actually
   "active".  The IESG often does not conclude a working group at the
   end of a document production cycle -- instead, the working group may
   remain on "the books" as active, even though it isn't allowed to
   produce any more RFCs until some external event occurs (e.g., a
   period of implementation and experimentation before re-examining a
   document).  Since this may affect some queries (e.g., "what's the
   lifetime of a working group?"), we'll use the following heuristic: if
   the working group has published at least one RFC, and if all of its
   Internet-Drafts have been published as RFCs, then we consider it
   "inactive".
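   The "inactive" heuristic can be written as a small predicate.  This
   Python sketch takes the status values defined by the DTD (rfc,
   expired, inprogress, moved) and reads "all of its Internet-Drafts"
   literally, so an expired or moved draft keeps a group active; whether
   such drafts should instead be ignored is a choice the text leaves
   open.  The function name is hypothetical.

```python
def is_inactive(doc_statuses):
    """Return True if a working group counts as "inactive".

    doc_statuses: the status attribute of each of the group's documents,
    using the DTD's values ("rfc", "expired", "inprogress", "moved").
    """
    statuses = list(doc_statuses)
    published_one = any(s == "rfc" for s in statuses)   # at least one RFC
    all_published = all(s == "rfc" for s in statuses)   # every draft is an RFC
    return published_one and all_published


print(is_inactive(["rfc", "rfc"]))         # True
print(is_inactive(["rfc", "inprogress"]))  # False
```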

   So, here are the queries to consider:

   o  How long does it take a working group to get its first RFC
      published?

   o  How long is a working group "active"?

   o  How many RFCs does a working group publish?

   o  For these quantities, what is the average, the distribution, and
      the extrema?

   o  Do these relationships change if we aggregate the working groups
      into areas?

   In order to answer questions such as these, we'll also need a
   relational model, for which we'll use SQL:


   CREATE TABLE person (
       id        int(11)      NOT NULL auto_increment,
       name      varchar(255) NOT NULL default '',
       PRIMARY KEY (id)
   );

   -- GROUP is an SQL reserved word, hence the table name "groop"
   CREATE TABLE groop (
       id        int(11)      NOT NULL auto_increment,
       name      varchar(8)   NOT NULL default '',
       area      varchar(8)   NOT NULL default '',
       chartered date         NOT NULL,
       concluded date                  default NULL,
       title     varchar(255) NOT NULL default '',
       PRIMARY KEY (id)
   );

   CREATE TABLE role (
       id        int(11)      NOT NULL auto_increment,
       groupID   int(11)      NOT NULL default 0,
       personID  int(11)      NOT NULL default 0,
       name      varchar(25)  NOT NULL default '',
       PRIMARY KEY (id)
   );

   CREATE TABLE doc (
       id        int(11)      NOT NULL auto_increment,
       groupID   int(11)      NOT NULL default 0,
       status    varchar(25)  NOT NULL default '',
       name      varchar(255) NOT NULL default '',
       PRIMARY KEY (id)
   );

   CREATE TABLE revision (
       id        int(11)      NOT NULL auto_increment,
       groupID   int(11)      NOT NULL default 0,
       docID     int(11)      NOT NULL default 0,
       size      int(11)      NOT NULL default 0,
       published date         NOT NULL,
       uri       varchar(255) NOT NULL default '',
       PRIMARY KEY (id)
   );
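   To illustrate the kind of query this schema supports, the sketch
   below uses Python's built-in sqlite3 module with a simplified SQLite
   rendering of the groop, doc, and revision tables (SQLite has no
   int(11) or auto_increment, and dates are stored as ISO-8601 text such
   as '2000-07-05' so that julianday() can subtract them).  It computes
   the days from charter to first RFC, assuming that a document with
   status 'rfc' became an RFC on the date of its latest revision.  The
   table rendering, that assumption, and the names SCHEMA,
   DAYS_TO_FIRST_RFC, and load_example are all hypothetical.

```python
import sqlite3

# SQLite stand-ins for three of the tables above (simplified column types).
SCHEMA = """
CREATE TABLE groop (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                    area TEXT NOT NULL, chartered TEXT NOT NULL,
                    concluded TEXT, title TEXT NOT NULL DEFAULT '');
CREATE TABLE doc (id INTEGER PRIMARY KEY, groupID INTEGER NOT NULL,
                  status TEXT NOT NULL, name TEXT NOT NULL);
CREATE TABLE revision (id INTEGER PRIMARY KEY, groupID INTEGER NOT NULL,
                       docID INTEGER NOT NULL, size INTEGER NOT NULL,
                       published TEXT NOT NULL, uri TEXT NOT NULL);
"""

# Days from charter to first RFC for each group with at least one RFC.
# Assumption: an 'rfc' document was published on its latest revision date.
DAYS_TO_FIRST_RFC = """
SELECT g.name, g.area,
       CAST(julianday(MIN(r.rfc_date)) - julianday(g.chartered) AS INTEGER)
  FROM groop AS g
  JOIN (SELECT d.groupID AS groupID, MAX(v.published) AS rfc_date
          FROM doc AS d JOIN revision AS v ON v.docID = d.id
         WHERE d.status = 'rfc'
         GROUP BY d.id) AS r ON r.groupID = g.id
 GROUP BY g.id, g.name, g.area
"""


def load_example():
    """Build an in-memory database holding the beep example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO groop (id, name, area, chartered) "
                 "VALUES (1, 'beep', 'app', '2000-07-05')")
    conn.execute("INSERT INTO doc VALUES (1, 1, 'rfc', "
                 "'draft-ietf-beep-framework')")
    conn.executemany(
        "INSERT INTO revision VALUES (?, 1, 1, ?, ?, ?)",
        [(1, 77321, "2000-08-25", "http://.../draft-ietf-beep-framework-00.txt"),
         (2, 83266, "2001-01-05", "http://.../draft-ietf-beep-framework-11.txt"),
         (3, 82089, "2001-03-01", "http://.../rfc3080.txt")])
    return conn


print(load_example().execute(DAYS_TO_FIRST_RFC).fetchall())
# -> [('beep', 'app', 239)]
```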

5. The Results

   We now look at the results from the queries.  Note that for a working
   group to be considered in this analysis:

   o  it must have been chartered since the epoch; and,

   o  it must either:

      *  have published at least one RFC; or,

      *  be at least two years old.
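   The inclusion rule can be expressed as a predicate.  In this Python
   sketch the epoch is the February 12, 1997 date from Section 4;
   treating "two years" as 730 days, and measuring a group's age as of a
   caller-supplied date (here, the date of this memo), are assumptions,
   as is the function name.

```python
from datetime import date, timedelta

EPOCH = date(1997, 2, 12)          # the epoch chosen in Section 4
TWO_YEARS = timedelta(days=730)    # assumption: "two years" as 730 days


def is_included(chartered, rfc_count, as_of):
    """Return True if a working group is considered in this analysis.

    chartered: date the group was chartered
    rfc_count: number of RFCs the group has published
    as_of:     date on which the group's age is measured
    """
    if chartered < EPOCH:          # must be chartered since the epoch
        return False
    return rfc_count >= 1 or (as_of - chartered) >= TWO_YEARS


as_of = date(2002, 11, 6)
print(is_included(date(2000, 7, 5), 1, as_of))   # True: has an RFC
print(is_included(date(2001, 6, 1), 0, as_of))   # False: too young, no RFC
```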

   Seventy-five working groups meet these criteria:


                area         size
       =================     ====
        Applications app       19
            Internet int        7
          Operations ops       11
             Routing rtg        5
            Security sec        9
              Sub-IP sub        2
           Transport tsv       20
       User Services usv        2


   With the exception of the Sub-IP and User Services areas, which have
   only two working groups each, we should be able to make comparisons
   between areas.

   For each of the queries, a tabular summary of the results is
   presented.  Interested readers may also consult [1] for graphical
   summaries of these results.  Readers are strongly encouraged to
   examine these summaries.  They provide some very clear insight into
   the numbers.

   Although the statistical terms used in this memo are basic
   measurements, they are not part of typical IETF parlance.
   Accordingly, we remind readers of their meanings:

   size: the number of values measured

   min: the smallest value measured

   max: the largest value measured

   avg: the arithmetic mean (simple average) of all values measured

   median: the dividing point of values, with half of the values below,
      and the other half above

   mode: a peak (local maximum) in the distribution; the primary mode
      is the highest peak

   sigma: the standard deviation of the values measured

   1s: one sigma from mean

   2s: two sigmas from mean

   3s: three sigmas from mean
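   The size, min, avg, median, sigma, and max columns can be computed
   with Python's standard statistics module, as in this sketch.  It uses
   the sample standard deviation (statistics.stdev) because that is
   consistent with the two-group rows: for example, the User Services
   values in Section 5.1 (1029 and 1040 days) give a sample sigma of
   about 7.8, shown as 8, whereas the population sigma would be 5.5.
   The function name is hypothetical, and the tables show these values
   as whole numbers.

```python
import statistics


def summarize(values):
    """Compute the size, min, avg, median, sigma, and max columns."""
    return {
        "size": len(values),
        "min": min(values),
        "avg": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation; undefined for a single value.
        "sigma": statistics.stdev(values) if len(values) > 1 else 0.0,
        "max": max(values),
    }


row = summarize([1029, 1040])   # User Services, Section 5.1
print(row["avg"], round(row["sigma"], 1))   # 1034.5 7.8
```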

   Although looking at the median reduces the effect of extreme minimum
   or maximum values, it's also useful to try to normalize the data to
   permit better comparisons between areas.  To do this, we present the
   percentage of working groups that fall at a given number of standard
   deviations.  Percentages and standard deviations normalize the
   results across the disparate groups, permitting fair comparison among
   them.  (Note that due to rounding, the percentages for each area may
   not add up to 100%).
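   One reading of the 1s/2s/3s definitions ("one sigma from mean", and
   so on) is to bucket each value by its distance from the mean in units
   of sigma and report the percentage of working groups in each bucket.
   The Python sketch below does exactly that, truncating percentages to
   whole numbers (one way they can fail to add up to 100%).  This is an
   assumption about the method, not the code behind the tables, and it
   does not reproduce every row: for instance, it would place both
   Sub-IP working groups in Section 5.2 within one sigma, whereas that
   table splits them 50/50.

```python
import math
import statistics
from collections import Counter


def sigma_bands(values):
    """Percentage of values in each sigma band, keyed 1, 2, 3, ...

    Band k holds values whose distance from the mean is greater than k-1
    sigmas but at most k sigmas (a value equal to the mean is in band 1).
    Requires at least two values, since the sample sigma is used.
    """
    mean = statistics.mean(values)
    sigma = statistics.stdev(values)
    if sigma == 0:
        return {1: 100}
    bands = Counter(max(1, math.ceil(abs(v - mean) / sigma)) for v in values)
    # Integer division truncates, so the percentages may not add up to 100.
    return {k: 100 * n // len(values) for k, n in sorted(bands.items())}


print(sigma_bands(list(range(1, 11))))   # {1: 60, 2: 40}
```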

5.1 Days until 1st RFC published


   area    size     min     avg    median  sigma    max
   ====    ====     ===     ===    ======  =====    ===
   *         75     193     777       741    418   1849
   app       19     239     897       842    459   1705
   int        7     193     731       772    419   1466
   ops       11     287     867       794    481   1807
   rtg        5     601    1106      1227    339   1438
   sec        9     297     812       741    482   1849
   sub        2     937    1040      1040    146   1144
   tsv       20     206     480       473    161    791
   usv        2    1029    1034      1034      8   1040

                   area  1s  2s  3s
                   ==== === === ===
                   app   47  47   5
                   int   71  28
                   ops   63  27   9
                   rtg   40  60
                   sec   66  22  11
                   sub  100
                   tsv   75  25
                   usv  100


   This measures how quickly the working groups in an area produce their
   first RFC.

   All of the areas, except Transport, show very inconsistent durations
   between the start of the working group and issuance of the first
   RFC.  Applications and Operations show some similarity in the shape
   of their distributions, with similar variances and close means (897
   and 867 days).  Internet and Security also have variances that are
   similar but means that are quite different.  Transport is
   distinctive, with a lower average and narrower variance than the
   other areas.  Routing is distinctive, with the highest average and
   highest median.  Note, however, that it also has the smallest number
   of working groups among the areas being compared.

   The graphs of these measures, with the number of working groups
   normalized as a percentage and the duration normalized in sigmas, are
   striking.  All of the areas, except Internet, have a primary mode
   (highest peak) of 1.5 to 1.75 sigmas.  The Internet area has a
   primary mode at about 2.25 sigmas.  Routing shows a distinctive
   secondary mode at about 4.25 sigmas.

5.2 WG duration in days


   area    size     min     avg    median  sigma    max
   ====    ====     ===     ===    ======  =====    ===
   *         75     235    1074      1059    427   1849
   app       19     240    1094       959    477   1829
   int        7     235     820       772    344   1334
   ops       11     307    1054      1109    479   1823
   rtg        5     934    1409      1331    366   1809
   sec        9     515    1038       952    444   1849
   sub        2    1144    1494      1494    495   1844
   tsv       20     471    1088      1106    347   1684
   usv        2     633     638       638      8    644

                       area  1s  2s
                       ==== === ===
                       app   57  42
                       int   71  28
                       ops   54  45
                       rtg   60  40
                       sec   66  33
                       sub   50  50
                       tsv   80  20
                       usv    0 100


   This measures the longevity of a working group, from the time it is
   chartered until it is concluded.

   The most significant observation about the data is that it has little
   coherence.  The only pattern that is consistent is that all the areas
   show no real "shape" to the distribution of their working group
   durations, so that their distributions are nearly flat.  Transport
   might be seen as having a bit of a distribution curve, but too few
   working groups (3) form the mode to make such an assessment
   meaningful.
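   Working-group duration reduces to date arithmetic on the YYYYMMDD
   strings used in the XML model.  The text does not say which end date
   is used for a group that has not been concluded, so this Python
   sketch assumes one: the conclusion date if present, otherwise the
   date the group became inactive (per the heuristic in Section 4),
   otherwise the date of the analysis.  The function names are
   hypothetical.

```python
from datetime import datetime


def parse_day(text):
    """Parse a YYYYMMDD date string such as '20000705'."""
    return datetime.strptime(text, "%Y%m%d").date()


def end_date(concluded, inactive_since, as_of):
    """Assumed precedence: conclusion, then inactivity, then analysis date."""
    return concluded or inactive_since or as_of


def duration_days(chartered, end):
    """Number of days from charter to the chosen end date."""
    return (parse_day(end) - parse_day(chartered)).days


# beep dates from the charter example in Section 6.1
print(duration_days("20000705", end_date(None, "20010302", "20021106")))  # 240
```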

5.3 WG duration normalized over #-RFCs produced


   area    size     min     avg    median  sigma    max
   ====    ====     ===     ===    ======  =====    ===
   *         75     106     687       623    435   1849
   app       19     120     784       812    453   1810
   int        7     235     575       644    255    920
   ops       11     132     652       533    426   1393
   rtg        5     934    1409      1331    366   1809
   sec        9     106     796       741    511   1849
   sub        2     108     626       626    733   1144
   tsv       20     159     435       434    189    875
   usv        2     633     638       638      8    644

                   area  1s  2s  3s
                   ==== === === ===
                   app   68  21  10
                   int   85  14
                   ops   72  27
                   rtg   20  40  40
                   sec   66  22  11
                   sub    0 100
                   tsv   80  20
                   usv  100


   Dividing the duration of a working group by the total number of RFCs
   it produced gives a unit of measure for the average time needed to
   produce each RFC.

   The distributions show very little pattern, except for Transport,
   which has a distinctive primary mode at one year per RFC.  In the
   other areas, most working groups take longer than two years to
   produce each RFC.
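   The per-RFC normalization is a single division, but groups that have
   published no RFCs need a rule.  This Python sketch divides by
   max(1, number of RFCs), an assumption that is consistent with the
   Routing and User Services rows being identical in Sections 5.2 and
   5.3 (those groups published at most one RFC each).  The function
   name is hypothetical.

```python
def days_per_rfc(duration_days, rfc_count):
    """Average days per RFC; a group with no RFCs keeps its full duration."""
    return duration_days / max(1, rfc_count)


groups = [(240, 1), (1000, 4), (638, 0)]   # (duration in days, RFC count)
print([days_per_rfc(d, n) for d, n in groups])   # [240.0, 250.0, 638.0]
```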

   The effect of normalizing both axes is remarkable.  Normalizing to
   percentage of working groups, and number of sigmas, shows all of the
   areas to be relatively similar to each other in shape and mode of
   their distribution curves, with the primary mode being approximately
   1.75 sigmas.  Operations is distinctive with a lower percentage of
   working groups at the mode and a high-end tail having a more gradual
   descent.  Routing is remarkable with two modes that are nearly the
   same height, one that is the same as the rest of the IETF, though
   with a markedly smaller percentage of working groups, and a second
   mode at approximately 4 sigmas.

5.4 Number of RFCs produced


   area    size     min     avg    median  sigma    max
   ====    ====     ===     ===    ======  =====    ===
   *         75       0       2         1      3     17
   app       19       0       1         1      2      7
   int        7       0       1         1      1      3
   ops       11       0       2         1      4     12
   rtg        5       0       0         0      0      1
   sec        9       0       2         0      5     15
   sub        2       0       8         8     12     17
   tsv       20       1       3         2      2      9
   usv        2       0       0         0      0      0

               area  1s  2s  3s  4s  5s  6s
               ==== === === === === === ===
               app   84  15
               int  100
               ops   81   9   0   9
               rtg  100
               sec   88   0   0   0  11
               sub   50   0   0   0   0  50
               tsv   75  20   5
               usv  100


   The distributions show very little pattern.  Only Transport and
   Applications show a real mode, both at three RFCs.  All areas show
   extremely long, flat, high-end tails.

   Again, the effect of normalizing both axes is remarkable.
   Normalizing to percentage of working groups, and number of sigmas,
   shows all of the areas to be extremely similar to each other, with
   well-shaped and equivalent curves having a primary mode at about 1.75
   sigmas.  Internet has a much lower, but distinctive, secondary mode
   at about 3.5 sigmas.

6. Conclusions

   If these sorts of measures are useful, the next question is which
   additional measurements and analyses should be pursued.

   First, note that the data used for this analysis is only from the
   last 5 years.  The modern IETF was formed in 1989, and these analyses
   should be applied across at least the last 10 years, looking for
   trends, such as with a rolling 3- or 5-year analysis, to see how
   things have changed.
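   A rolling analysis of the kind suggested above could group working
   groups into overlapping windows of charter years and recompute a
   statistic for each window.  In this Python sketch, keying groups by
   charter year, the (year, value) input format, and omitting empty
   windows are all assumptions; any of the per-group measures from
   Section 5 could serve as the value.

```python
from statistics import mean


def rolling_means(groups, first_year, last_year, width=3):
    """Mean value per window of `width` charter years.

    groups: iterable of (charter_year, value) pairs, one per working group.
    Returns {window start year: mean of the values chartered in
    [start, start + width)}, for windows lying within first_year..last_year.
    Windows that contain no groups are omitted.
    """
    groups = list(groups)
    result = {}
    for start in range(first_year, last_year - width + 2):
        window = [v for year, v in groups if start <= year < start + width]
        if window:
            result[start] = mean(window)
    return result


sample = [(1997, 100), (1998, 200), (1999, 300), (2000, 400)]
print(rolling_means(sample, 1997, 2000))   # {1997: 200, 1998: 300}
```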

   Second, note that we have only looked for broad assessments of the
   IETF.  These same techniques can be used to look more closely at the
   history of particular working groups and even particular contributors
   to IETF efforts.  Obviously, statistics about people can be sensitive,
   and the dangers of inappropriate use are particularly serious.

   Still, we wonder whether the mere fact that a working group has
   produced many specifications is a good thing, or whether having a
   particular person's name on many specifications is a good thing.
   Ultimately the question is whether those specifications get used.

6.1 Two Suggestions for the WG Charter Document System

   Please note the current system works fine for its original and
   intended purpose.  However, here are two concrete suggestions for
   improving the charter documents maintained by the IETF secretariat:

   o  each charter document should include information for each event in
      its lifetime (and in between); and,

   o  in addition to making charter documents available in both text and
      HTML, to facilitate automatic processing, each charter document
      should also be available in XML.

   Consider this (abbreviated) example, which pretty much captures the
   whole of a working group's past and present activity:


   <group name='beep' area='app'
          title='Blocks Extensible Exchange Protocol'>
     <events>
       <event type='chartered' date='20000705' />
       <event type='inactive'  date='20010302' />
     </events>

     <mail mailto='beepwg@lists.beepcore.org'
           archive='http://lists.beepcore.org/mailman/listinfo/beepwg/'>
       <subscribe mailto='beepwg-request@...'>subscribe</subscribe>
     </mail>

     <person name='Keith McCloghrie' mailto='...'
             role='chair'        begin='20000705' end='20010201' />
     <person name='Pete Resnick' mailto='...'
             role='chair'        begin='20010201' />
     <person name='Ned Freed' mailto='...'
             role='director'     begin='20010201' />
     <person name='Patrik Faltstrom' mailto='...'
             role='director'     begin='20010201' />
     <person name='Ned Freed' mailto='...'
             role='area-advisor' begin='20010201' />

     <doc name='draft-ietf-beep-framework' status='rfc'>
       <revision published='20010301' size='82089'
                 uri='http://.../rfc3080.txt' />
       <revision published='20010105' size='83266'
                 uri='http://.../draft-ietf-beep-framework-11.txt' />
       ...
     </doc>

     <description begin='20000705'> ...text... </description>

     <milestones begin='20000705'>
       <milestone planned='20000705' actual='20000731'>Prepare
                                                          ...</milestone>
       ...
     </milestones>
   </group>


   This is a straightforward generalization of the model used in this
   analysis.  Because the DTD for this model isn't germane to the
   analysis, it isn't presented in this memo.
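
   As a rough illustration of how such a record supports automatic
   processing, the following Python sketch parses an abbreviated,
   well-formed excerpt of the example above with the standard
   xml.etree.ElementTree module.  It then derives the group's lifetime
   and the publication dates of its RFCs.  It relies only on the element
   and attribute names shown in the example.  The helper names (to_date,
   group_summary) are hypothetical, and the sketch assumes that the
   latest revision of a document whose status is 'rfc' is the RFC
   itself.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Abbreviated, well-formed excerpt of the example above.
SAMPLE = """
<group name='beep' area='app'
       title='Blocks Extensible Exchange Protocol'>
  <events>
    <event type='chartered' date='20000705' />
    <event type='inactive'  date='20010302' />
  </events>
  <doc name='draft-ietf-beep-framework' status='rfc'>
    <revision published='20010301' size='82089'
              uri='http://.../rfc3080.txt' />
    <revision published='20010105' size='83266'
              uri='http://.../draft-ietf-beep-framework-11.txt' />
  </doc>
</group>
"""

def to_date(yyyymmdd):
    """Convert a YYYYMMDD attribute value into a datetime.date."""
    return date(int(yyyymmdd[:4]), int(yyyymmdd[4:6]),
                int(yyyymmdd[6:]))

def group_summary(xml_text):
    """Hypothetical helper: lifetime and RFC dates for one <group>."""
    group = ET.fromstring(xml_text.strip())
    events = {e.get("type"): to_date(e.get("date"))
              for e in group.iter("event")}
    rfc_dates = []
    for doc in group.iter("doc"):
        if doc.get("status") == "rfc":
            # Assumption: the latest revision of an RFC-status document
            # is the RFC itself.
            rfc_dates.append(max(to_date(r.get("published"))
                                 for r in doc.iter("revision")))
    # A group with no 'inactive' event is measured up to today.
    end = events.get("inactive", date.today())
    return {
        "name": group.get("name"),
        "lifetime_days": (end - events["chartered"]).days,
        "rfc_dates": sorted(rfc_dates),
    }

print(group_summary(SAMPLE))
```

   For the excerpt shown, group_summary reports a lifetime of 240 days
   (from 2000-07-05 to 2001-03-02) and a single RFC published on
   2001-03-01.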


7. Security Considerations

   This memo has nothing, whatsoever, to do with security; nor does it
   have anything to do with insecurity.


URIs

   [1]  <http://xml.resource.org/ietf-analysis/current/>

   [2]  <http://xml.resource.org/ietf-analysis/analysis.tgz>

   [3]  <http://xml.resource.org/ietf-analysis/analysis.tgz>


Authors' Addresses

   Marshall T. Rose
   Dover Beach Consulting, Inc.
   POB 255268
   Sacramento, CA  95865-5268
   US

   Phone: +1 916 483 8878
   EMail: mrose@dbc.mtview.ca.us


   David H. Crocker
   Brandenburg InternetWorking
   675 Spruce Drive
   Sunnyvale, CA  94086
   US

   Phone: +1 408 246 8253
   EMail: dcrocker@brandenburg.com
   URI:   http://www.brandenburg.com/


Appendix A. Data Gathering and Processing

   Because the IETF "written history" wasn't explicitly designed for
   productivity analysis, three different sources are consulted to
   synthesize a uniform dataset.  Even so, there are still some gaps in
   the data, primarily before 1997.

A.1 Charters, Messages, and Documents

   The charter document is the fundamental description of a working
   group.  Since its earliest days, the IETF secretariat has been quite
   diligent in using a consistent charter format, which allows for
   automatic processing of the charter documents.  Unfortunately, there
   is no known public archive of charter document revisions.

   Accordingly, if a working group is active, its charter document
   reflects only the present state of affairs: it indicates the current
   chairs, advisors, and directors, but not when the working group was
   created or who was involved with it before the present.  Fortunately,
   turnover among chairs and technical advisors is (anecdotally) rather
   low.  Further, in the case of area directors and advisors, IESG
   membership rotates fairly infrequently, even with a two-year term.

   In addition to lacking personnel history, the charter document lists
   only the latest I-Ds produced by a working group.  (For example, if
   an I-D expires or moves to another working group, the charter
   document doesn't note this.)  Similarly, when a working group reaches
   a milestone, the charter document is updated to replace the
   milestone's planned date with the string "Done".  (Obviously, knowing
   both the planned and actual dates is rather useful for gauging a
   working group's ability to work towards deadlines.)

   A third issue in mining charter documents is that when a working
   group is concluded, its charter document is severely condensed; for
   example, all personnel information other than the chairs is removed.
   The difficulty here is that if a working group is concluded and then
   re-activated, all of its history is lost.

   Fortunately, three other data sources are available to help minimize
   these deficiencies.

   First, there are the archives for the IETF general and announcement
   mailing lists.  Although there are some gaps in the archives prior to
   mid-1998, when announcements were split off from the general
   discussion list, many charter announcements (and conclusion notices)
   are archived.  (For this analysis, approximately 54,250 messages were
   examined to find 127 charter announcements and 65 conclusion
   announcements.)
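
   A minimal sketch of this kind of search follows, in Python, using
   only the standard email package.  It treats a message as a charter or
   conclusion announcement if its Subject matches one of two patterns.
   Those patterns ("WG Action: Formed ..." and "WG Action: Conclusion of
   ...", with the working group acronym in parentheses) are assumptions
   for illustration, not a statement of the exact wording the
   secretariat used, and the helper names are hypothetical.  Messages
   whose Date header cannot be parsed are counted and set aside rather
   than guessed at.

```python
import re
from email import message_from_string
from email.utils import parsedate_to_datetime

# Assumed subject-line patterns, for illustration only; the wording of
# real announcements varied and would have to be checked.
PATTERNS = [
    ("chartered",
     re.compile(r"WG Action:\s*Formed\b.*\(([\w-]+)\)", re.I)),
    ("concluded",
     re.compile(r"WG Action:\s*Conclusion of\b.*\(([\w-]+)\)", re.I)),
]

def classify(subject):
    """Return (event, acronym) if the subject matches, else None."""
    subject = " ".join(str(subject or "").split())  # unfold header
    for event, pattern in PATTERNS:
        match = pattern.search(subject)
        if match:
            return event, match.group(1).lower()
    return None

def scan(messages):
    """Collect (acronym, event, YYYYMMDD) tuples from messages."""
    found, skipped = [], 0
    for msg in messages:
        hit = classify(msg.get("Subject"))
        if hit is None:
            continue
        try:
            when = parsedate_to_datetime(msg.get("Date"))
        except (TypeError, ValueError):
            skipped += 1  # unparseable date: set aside for manual repair
            continue
        found.append((hit[1], hit[0], when.strftime("%Y%m%d")))
    return found, skipped

sample = message_from_string(
    "Subject: WG Action: Formed Blocks Extensible Exchange\n"
    " Protocol (beep)\n"
    "Date: Wed, 05 Jul 2000 10:00:00 -0400\n\nbody\n")
print(scan([sample]))
```

   An archive in mbox format could be passed to scan() directly, e.g.,
   scan(mailbox.mbox("ietf-announce.mbox")), since the standard mailbox
   module yields message objects; the filename here is hypothetical.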

   Second, there are several archival I-D repositories, which can be
   examined to reconstruct the document history of each working group.
   (This analysis uses the "watersprings" archive, which indexes
   approximately 11,700 I-Ds and RFCs since late 1991.)
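
   Reconstructing a document history from such an index can be sketched
   as grouping file names by working group.  The Python below assumes
   the conventional "draft-ietf-<wg>-<topic>-<NN>.txt" naming and a
   small exception table for drafts filed under the wrong prefix.  Both
   the table entry and the function name are hypothetical, and the
   format of the watersprings index itself is not modeled.

```python
import re
from collections import defaultdict

# Conventional WG draft name: draft-ietf-<wg>-<topic>-<NN>.txt
DRAFT_RE = re.compile(
    r"^(draft-ietf-([a-z0-9]+)-[a-z0-9-]+?)-(\d{2})\.txt$")

# Hypothetical exception list: drafts published under an incorrect
# prefix, mapped to the working group that should get the "credit".
EXCEPTIONS = {"draft-ietf-bep-framework": "beep"}

def document_history(filenames):
    """Map WG acronym -> {draft name: sorted revision numbers}."""
    history = defaultdict(lambda: defaultdict(list))
    for filename in filenames:
        m = DRAFT_RE.match(filename)
        if m is None:
            continue  # individual submissions, RFCs, etc.
        draft, wg, rev = m.group(1), m.group(2), int(m.group(3))
        history[EXCEPTIONS.get(draft, wg)][draft].append(rev)
    return {wg: {d: sorted(revs) for d, revs in docs.items()}
            for wg, docs in history.items()}

print(document_history(["draft-ietf-beep-framework-10.txt",
                        "draft-ietf-beep-framework-11.txt"]))
```

   Files that don't follow the convention (individual submissions,
   RFCs) are simply skipped here; linking an RFC back to its final draft
   would need additional data.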

   Finally, a small exception file was built by consulting the IETF
   proceedings.  Of the 126 working groups chartered since the epoch,
   the charter announcements for 25 were not archived.  However, by
   manually examining the IETF proceedings and the earliest I-Ds
   produced by those working groups, it is possible to estimate when
   they were created.

A.2 Reconstructing Activity

   Here are the steps to produce a uniform dataset:

   1.  The document history for each working group is constructed by
       parsing the index of the I-D archive repository.  Although this
       is straightforward, there are a few nuances:

       *  Some I-Ds were published with incorrect prefixes, so an
          exception list is consulted so that the correct working group
          gets the "credit".

       *  The index lists documents and their revision history, but not
          their publication dates or sizes, so the corresponding
          documents are retrieved in order to determine this
          information.

   2.  The creation and conclusion dates for each working group are
       determined by parsing the mail archives for the IETF general and
       announcement mailing lists.  It turns out that only eight of
       these announcements are poorly formatted, so an additional
       archive containing properly formatted copies of those eight
       messages is also consulted.

   3.  Additional information for each working group is determined by
       parsing the current charter document.  Because of the
       deficiencies described in Appendix A.1, some additional steps are
       taken:

       *  If a working group is concluded, then the charter announcement
          is examined (if available) to determine the area associated
          with the working group.

       *  If the creation date of a working group isn't known, or if a
          working group produced a document dated before its charter
          announcement, then the creation date of the working group is
          set to the date of its earliest document, and a note is made
          of this.  (This may happen if the creation date of a working
          group is estimated using the IETF proceedings.)  However, to
          avoid skewing the results, these working groups were not
          analyzed.

       *  If the conclusion date of a working group is known, and the
          working group produced a document after that date, then the
          conclusion date of the working group is updated, and a note is
          made of this.  (This happens when a working group is concluded
          before one of its documents is published as an RFC.
          Ultimately, only RFCs count, so even if the IESG formally
          concludes a working group, this analysis doesn't consider the
          working group concluded until the RFC Editor publishes...)

   4.  Finally, the XML and corresponding SQL datasets are produced.
       When the SQL dataset is produced, each working group considered
       active is checked to see whether all of its documents have been
       published as RFCs.  If so, the working group is considered
       inactive as of the most recent RFC publication date.  (This
       reflects the fact that the IESG often does not conclude a working
       group until its documents reach final standardization status.)
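
   The date adjustments in steps 3 and 4 can be summarized as a small
   Python function.  This is a sketch under stated assumptions, not the
   authors' Tcl code.  The WorkingGroup record and its field names
   (including unpublished_docs) are hypothetical.  The excluded flag
   reflects one reading of which groups "were not analyzed", namely
   those whose creation date had to be taken from their earliest
   document.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class WorkingGroup:                  # hypothetical record layout
    name: str
    chartered: Optional[date]        # from the charter announcement
    concluded: Optional[date]        # from the conclusion notice
    doc_dates: List[date]            # publication dates of I-Ds and RFCs
    rfc_dates: List[date] = field(default_factory=list)
    unpublished_docs: int = 0        # documents not yet RFCs
    notes: List[str] = field(default_factory=list)
    excluded: bool = False

def reconcile(wg: WorkingGroup) -> WorkingGroup:
    """Apply the date adjustments of steps 3 and 4 to one group."""
    if not wg.doc_dates:
        return wg
    first, last = min(wg.doc_dates), max(wg.doc_dates)
    # Step 3: unknown creation date, or a document predating the charter.
    if wg.chartered is None or first < wg.chartered:
        wg.chartered = first
        wg.notes.append("creation date set to earliest document")
        wg.excluded = True           # such groups are not analyzed
    # Step 3: a document (typically the RFC) published after conclusion.
    if wg.concluded is not None and last > wg.concluded:
        wg.concluded = last
        wg.notes.append("conclusion date moved to latest document")
    # Step 4: an "active" group whose documents are all RFCs becomes
    # inactive as of its most recent RFC publication date.
    if wg.concluded is None and wg.rfc_dates and wg.unpublished_docs == 0:
        wg.concluded = max(wg.rfc_dates)
        wg.notes.append("inactive as of most recent RFC")
    return wg

beep = reconcile(WorkingGroup("beep", date(2000, 7, 5), None,
                              [date(2001, 1, 5), date(2001, 3, 1)],
                              rfc_dates=[date(2001, 3, 1)]))
print(beep.concluded, beep.notes)
```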


A.3 Running the Analysis

   The software and data for this analysis are available at [2].

   If you want to build the datasets yourself, you'll need a UNIX system
   (e.g., NetBSD) and these packages:

   o  'tcl', a powerful scripting language;

   o  the 'mbox' package for Tcl; and,

   o  the GNU 'wget' utility.

   If you just want the resulting XML and SQL datasets, they're also
   available at [3].  You'll also need database software.  This
   analysis was generated using the excellent "MySQL" package, although
   any "modern" SQL software should work.  Similarly, any "postmodern"
   XML database software should work, e.g., "Tamino" or "eXist",
   although the authors used only SQL for database access.

   Finally, to query the database and visualize the results, this
   analysis uses the "fbsql" extension to "Tcl" and the friendly
   "ploticus" graphics package.
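
   The analysis itself uses Tcl and MySQL.  Purely to illustrate the
   claim that other SQL software should work, here is a Python sketch
   using the standard sqlite3 module.  The two tables and their columns
   are hypothetical and are not the schema of the published dataset.
   The query counts each working group's RFCs and the number of days
   from its charter to its first RFC.

```python
import sqlite3

# Hypothetical schema for illustration; not the published dataset's.
SCHEMA = """
CREATE TABLE wg  (name TEXT PRIMARY KEY, area TEXT, chartered TEXT);
CREATE TABLE rfc (wg TEXT REFERENCES wg(name), number INTEGER,
                  published TEXT);
"""

# RFC count and days from charter to first RFC, per working group.
QUERY = """
SELECT wg.name,
       COUNT(rfc.number) AS rfcs,
       CAST(julianday(MIN(rfc.published)) - julianday(wg.chartered)
            AS INTEGER) AS days_to_first_rfc
FROM wg LEFT JOIN rfc ON rfc.wg = wg.name
GROUP BY wg.name
ORDER BY wg.name
"""

def productivity(conn):
    """Run QUERY and return a list of (name, rfcs, days) tuples."""
    return conn.execute(QUERY).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO wg VALUES ('beep', 'app', '2000-07-05')")
conn.execute("INSERT INTO rfc VALUES ('beep', 3080, '2001-03-01')")
print(productivity(conn))
```

   With the single 'beep' row shown, the query returns one RFC
   published 239 days after chartering.  Because of the LEFT JOIN, a
   working group with no RFCs still appears, with a count of zero and no
   elapsed time.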


Full Copyright Statement

   Copyright (C) The Internet Society (2002).  All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works.  However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.

   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assigns.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.