[xml2rfc] vspace usage statistics

Julian Reschke <julian.reschke@gmx.de> Sun, 16 February 2014 18:42 UTC

...obtained from XML versions of RFCs in AUTH48, starting with RFC6000 
(only RFCs with parseable XML counted).


"vspace.xslt" attempts to list all instances of vspace and to categorize 
them by known usage patterns. The output file is "log".

   <xsl:choose>
     <xsl:when test="parent::t[@hangText] and normalize-space(preceding-sibling::node())=''">
       <xsl:text> DIAG-dictionary-list</xsl:text>
     </xsl:when>
     <xsl:when test="parent::t and @blankLines='1'">
       <xsl:text> DIAG-paragraph-break</xsl:text>
     </xsl:when>
     <xsl:when test="parent::t and @blankLines='0'">
       <xsl:text> DIAG-line-break</xsl:text>
     </xsl:when>
     <xsl:when test="parent::t and @blankLines>=10">
       <xsl:text> DIAG-form-feed</xsl:text>
     </xsl:when>
     <xsl:otherwise>
       <xsl:text> UNKNOWN-USE</xsl:text>
     </xsl:otherwise>
   </xsl:choose>

DIAG-dictionary-list is the use case we want to address with a new list 
style.

DIAG-paragraph-break is for cases where vspace apparently was used to 
emulate a paragraph break; this is needed in list items (but some users 
seem to use it for regular paragraph breaks as well).

DIAG-line-break is about forcing ... a line break; I haven't 
investigated all of these; some were used for creating entries in a 
"contributors" section, which we should address separately.

DIAG-form-feed is for cases where the author tried to force a page 
break, something we won't need in the future.
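
For readers who want to experiment without an XSLT processor, the same 
decision order can be approximated in Python. This is only a sketch, not 
the attached stylesheet: the names classify_vspace and classify_document 
are made up for illustration, and the "nothing precedes the vspace" check 
is simplified to "vspace is the first child element and only whitespace 
text comes before it", which is not exactly what the XPath test does.

```python
import xml.etree.ElementTree as ET


def classify_vspace(parent, vspace):
    """Return a category label for one <vspace>, given its parent element."""
    if parent.tag != "t":
        return "UNKNOWN-USE"
    blank = vspace.get("blankLines")
    # Simplification (assumption): "leading" means vspace is the first child
    # element of <t> and only whitespace text comes before it.
    leading = parent[0] is vspace and not (parent.text or "").strip()
    if parent.get("hangText") is not None and leading:
        return "DIAG-dictionary-list"
    # As in the stylesheet, a missing blankLines attribute matches nothing here.
    if blank == "1":
        return "DIAG-paragraph-break"
    if blank == "0":
        return "DIAG-line-break"
    if blank is not None and blank.strip().isdigit() and int(blank) >= 10:
        return "DIAG-form-feed"
    return "UNKNOWN-USE"


def classify_document(xml_text):
    """Classify every <vspace> in an xml2rfc source string, grouped by parent."""
    root = ET.fromstring(xml_text)
    return [
        classify_vspace(parent, child)
        for parent in root.iter()
        for child in parent
        if child.tag == "vspace"
    ]
```

The order of the checks mirrors the xsl:choose above, so a vspace that 
opens a hanging list item is reported as a dictionary list even when its 
blankLines value would also match a later branch.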

log2 contains those entries that were outside the known categories, 
sorted by number of instances:

       1 /rfc/middle/section/list/vspace[@blankLines=1] UNKNOWN-USE
       1 /rfc/middle/section/section/section/texttable/ttcol/vspace[@blankLines=0] UNKNOWN-USE
       1 /rfc/middle/section/section/t/vspace[@blankLines=5] UNKNOWN-USE
       1 /rfc/middle/section/t/vspace[@blankLines=2] UNKNOWN-USE
       1 /rfc/middle/section/t/vspace[@blankLines=5] UNKNOWN-USE
       1 /rfc/middle/section/vspace[@blankLines=5] UNKNOWN-USE
       2 /rfc/middle/section/section/vspace[@blankLines=1] UNKNOWN-USE
       2 /rfc/middle/section/vspace[@blankLines=1] UNKNOWN-USE
       3 /rfc/back/section/figure/preamble/vspace[@blankLines=1] UNKNOWN-USE
      38 /rfc/middle/section/vspace[@blankLines=99] UNKNOWN-USE
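
A ranking like the one above can be produced by counting identical log 
lines. The sketch below assumes, hypothetically, that the log holds one 
line per vspace occurrence in the form "path category"; the actual layout 
of the attached "log" file is not shown in this message, and 
tally_unknown is an illustrative name.

```python
from collections import Counter


def tally_unknown(log_lines):
    """Count identical UNKNOWN-USE lines; return (count, line) pairs, rarest first."""
    counts = Counter(
        line.strip()
        for line in log_lines
        if line.strip().endswith("UNKNOWN-USE")
    )
    # Sort by count, then by line text, so ties come out in a stable order.
    return sorted((n, text) for text, n in counts.items())
```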

These seem to fall into these categories:

- use of vspace in a place where it's neither allowed nor needed: list, 
section

- use in elements where it's not allowed but might be useful: ttcol, preamble

- attempts to control vertical white space


Best regards, Julian