Internet Engineering Task Force (IETF) Phillip Hallam-Baker Internet-Draft Comodo Group Inc. Intended Status: Standards Track Rob Stradling Expires: April 10, 2015 Comodo CA Ltd. October 7, 2014 Compressed CRL Sets draft-hallambaker-compressedcrlset-00 Abstract An efficient encoding for Certificate Revocation Lists is described based on a Disjoint Set encoding technique. When applied to the population of currently issued certificates enroled in the Certificate Transparency Notary network, the data set was reduced by 97% without introduction of false positives or false negatives. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." Copyright Notice Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Hallam-Baker April 10, 2015 [Page 1] Internet-Draft Compressed CRLSets October 2014 Table of Contents 1. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3 2. The Revocation Problem . . . . . . . . . . . . . . . . . . . . 3 2.1. PKIX Certificate Status Mechanisms . . . . . . . . . . . 3 2.2. Short Lived Certificates . . . . . . . . . . . . . . . . 4 2.3. CRL Sets . . . . . . . . . . . . . . . . . . . . . . . . 4 2.4. Compressed CRL Sets . . . . . . . . . . . . . . . . . . . 5 3. Efficient Encoding of Disjoint Sets . . . . . . . . . . . . . 5 3.1. Problem Statement . . . . . . . . . . . . . . . . . . . . 5 3.2. Standard Hash Function. . . . . . . . . . . . . . . . . . 6 3.3. Parameterized Hash Function. . . . . . . . . . . . . . . 6 3.4. Shortest Discrimination Prefix . . . . . . . . . . . . . 6 3.4.1. Sorted List of Hashes . . . . . . . . . . . . . . . 7 3.4.2. Difference Encoding . . . . . . . . . . . . . . . . 7 3.4.3. Further optimizations. . . . . . . . . . . . . . . . 7 4. Compressed CRL Set Encoding . . . . . . . . . . . . . . . . . 8 5. Experimental Results . . . . . . . . . . . . . . . . . . . . . 8 6. References . . . . . . . . . . . . . . . . . . . . . . . . . . 8 6.1. Normative References . . . . . . . . . . . . . . . . . . 8 6.2. Informative References . . . . . . . . . . . . . . . . . 8 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 8 Hallam-Baker April 10, 2015 [Page 2] Internet-Draft Compressed CRLSets October 2014 1. Definitions 1.1. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. 2. The Revocation Problem Certificate revocation is a critical element of the Web PKI design. A certificate issuer may revoke a certificate for many reasons including: * The certificate was issued in error or due to a CA breach. * The certificate holder has breached some condition set by the issuer. * The certificate is being used for or in connection with illegal purposes. * The certificate holder has requested the certificate be revoked. * The private key corresponding to the public key in the certificate has been compromised. While certificates are occasionally issued in error or due to an issuer breach, these cases are very rare. Impersonating another company is much more difficult than stealing a private key from a legitimate company. The ability to revoke certificates is therefore a critical element in the WebPKI risk mitigation strategy. At present, there are approximately TBS certificates unexpired certificates issued in the WebPKI of which TBS (approximately 10%) have been revoked. 2.1. PKIX Certificate Status Mechanisms PKIX [RFC5280] describes two mechanisms for communicating certificate status. * Certificate Revocation Lists (CRLs) list the serial numbers of certificates that a CA has issued that are not currently valid. * Online Certificate Status Protocol (OCSP) describes an online service that a relying party can query to request the current status of a certificate. Hallam-Baker April 10, 2015 [Page 3] Internet-Draft Compressed CRLSets October 2014 The scaling problems inherent in the use of CRLs quickly became apparent as the size of the WebPKI grew. The length of the CRLs issued by the large CAs quickly grew to several Mb. While techniques such as delta encoding and partitioning had the protential to provide temporary relief, the scaling issue was only mitigated, not solved. OCSP [RFC6960] provides a scalable solution but in its original conception for TLS introduces a new party to every communication. Each time a client attempts to establish a TLS connection it looks for an unexpired OCSP token to establish the validity of the certificate presented. If no unexpired cached token is available, a fresh one is obtained from the OCSP distribution. This introduces a risk of traffic analysis by either the OCSP service provider itself or any party that can view the traffic to the OCSP service. OCSP Stapling [RFC6066] was introduced to the TLS protocol as a mechanism to avoid the need to obtain OCSP tokens from a third party by allowing them to be passed in band in the TLS conversation. 2.2. Short Lived Certificates While OCSP stapling removes the privacy, reliability and performance concerns of the original OCSP design, the certificate issuer is still required to issue the OCSP token and the server using the certificate is still required to fetch it. Since issuing an OCSP token on a daily basis is tantamount to issuing a new certificate, why not reissue the certificate on a daily basis and do away with the OCSP token entirely? In principle, use of short lived certificates is simpler for all the parties involved: CAs, subjects and relying parties. But before short lived certificates can be used in practice, new infrastructure is required to support their use. In particular client software has to be rewritten so that it recognizes a certificate with a short validity interval (e.g. 36 hours) as carrying an implicit assertion of validity and a mechanism by which the server obtains fresh certificates must be specified and implemented. For work on the latter problem, see [I-D.hallambaker-omnipublish]. 2.3. CRL Sets CRL Sets are curated CRLs, typically issued by an application provider that list the revoked certificates regarded to pose the greatest concern of abuse. Use of CRL Sets has the advantage that they may allow an application provider to react to a security incident more quickly than the issuing CA. But they only provide revocation information for a limited number of certificates. Hallam-Baker April 10, 2015 [Page 4] Internet-Draft Compressed CRLSets October 2014 As with traditional CRLs, various techniques such as delta encoding may be used to limit the size of individual CRLs but these do not reduce the total volume of data that must be exchanged. 2.4. Compressed CRL Sets Compressed CRL Sets are an alternative encoding for CRLs that permit very much higher encoding density than traditional CRLs. Rather than encoding the set of revoked certificates, Compressed CRLs encode the difference between the set of valid unexpired certificates and the set of invalid unexpired certificates. This allows the average number of bits required to encode the set to be reduced from 96 bits per certificate (using the certificate serial number) to approximately 6. While Compressed CRL Sets currently offer a practical answer to the problem of revocation in the WebPKI, short lived certificates actually scale better. Although Compressed CRL Sets provide much better scaling properties than traditional CRLs, the size of the CRL Set still increases linearly with the product of the number of revoked certificates and the log of the revocation ratio. Thus while use of Compressed CRL Sets will remain practical as long as the growth in the number of WebPKI certificates issued grows at a slower rate than additional communications bandwidth becomes available, this cannot be guaranteed. Short lived certificates in contrast do not require any additional revocation data and may be regarded as offering perfect scaling. The chief drawback to relying on short lived certificates is that the minimum practical validity interval is 48 hours which is also the time period in which the vast majority of the harm caused in a criminal attack occurs. Also deployment of short lived certificates relies on new certificate lifecycle management infrastructure that is not yet developed. Use of Compressed CRLs in combination with Short Lived certificates offers advantages over both. In the short term Compressed CRLs provide a mechanism for reporting status on the population of long expiry certificates until the infrastructure is deployed to enable use of the short lived approach. In the long term, Compressed CRLs provide an efficient mechanism to narrow the residual vulnerability window from days to minutes. 3. Efficient Encoding of Disjoint Sets CRL Sets are compressed using a selected variant of the approach described in [[HBS2014 - To be specified]]. Hallam-Baker April 10, 2015 [Page 5] Internet-Draft Compressed CRLSets October 2014 3.1. Problem Statement Given two disjoint sets of data entries A and B in which |A| >= |B|, provide a boolean function f (x) that is efficient in both memory and time such that: * f(x) is true if x is a member of A * f(x) is false if x is a member of B * f(x) can return true or false otherwise For ease of comparison we will consider an example in which set A has 10 million members and set B has 1 million. This corresponds to a revocation use case in which there are 11 million unexpired certificates and 10% have been revoked. Note that while it is in theory possible for the revocation ratio to exceed 50% and thus for |A| < |B|, this case is easily handled with a binary flag to indicate that Set A represents the revoked certificates and B those that are valid. 3.2. Standard Hash Function. One approach to this problem is to first reduce the data entries using a hash function and compile a list of the hash values corresponding to the entries of set B. This has the advantage of simplicity but the corresponding data set is very large. For the example case, the implementation of f(x) requires 32MB of data if a 256 bit hash is used. 3.3. Parameterized Hash Function. While using a 256 bit hash leaves an infintesimal probability of collisions, the same is true of 128 bit or even shorter hashes. If a parameterized hash function (e.g. MAC-SHA1) is used we can repeat the hash construction process multiple times with different parameters and check to see the shortest truncation that allows the sets to be distinguished. This allows the example case to be addressed using only 6MB of data. 3.4. Shortest Discrimination Prefix The parameterized hash function encodes enough information to distinguish any member of Set A from any member of set B, but this is more information than is necessary. It is sufficient to distinguish a member of set B from the two members of set A that are adjacent to it. Hallam-Baker April 10, 2015 [Page 6] Internet-Draft Compressed CRLSets October 2014 We begin by calculating a hash value H(x) of each member of A and B using a hash function that returns a unique value for each entry to create the corresponding hash sets AH, BH. The hash function used for this purpose does not need to meet any special cryptographic properties such as first or second pre-image resistance. Let P(x,y) be a function of two bit strings, x, y such that P(x,y) is true if and only if y is a prefix of x. That if y is n bits long, the first n bits of x and y are identical. Let M(b, AH) be a function that returns the shortest prefix of b that is not a prefix of any member of AH. We apply the function M to each member of BH and remove duplicates to create the set BP. The function f(A,B) can now be implemented as follows: f(x) = false if there is a member p of BP such that p is a prefix of the hash value of x (i/e/ P(H(x),p) is true)f(x) = true otherwise Note that the value of f(x) returned if x is not a member of A or B is undefined. A space and time efficient implementation of the function f(x) is arrived at by an efficient encoding of the set BP. 3.4.1. Sorted List of Hashes The set BP may be encoded as a simple list. Sorting the list permits efficient access using standard techniques applicable to searching a hash table. For the example case, the average shortest prefix is 24?? bits giving a total data set size of 3Mb. 3.4.2. Difference Encoding In a sorted list of hashes, adjacent hashes will typically only vary in the last few bits. Since it is only the difference between the hashes that contains information, we can encode this information without loss of information overall. For the example case, the average number of bits required for a difference encoding is 6 bits and the total data volume required is 750KB. Hallam-Baker April 10, 2015 [Page 7] Internet-Draft Compressed CRLSets October 2014 3.4.3. Further optimizations. Further optimizations have been developed permitting an additional reduction in the data volume by half over the difference technique. 4. Compressed CRL Set Encoding TBS specify an encoding of CRL data based on the above. Following the general trend for JSON and the need for compactness, the use of JSON-B is ideal. 5. Experimental Results TBS here extract the lab data from the prototype implementations using the encoding described above. 6. References 6.1. Normative References [I-D.hallambaker-omnipublish] Hallam-Baker, P, "OmniBroker Publication Protocol", Internet-Draft draft-hallambaker- omnipublish-00, 22 May 2014. [RFC6960] Santesson, S.,Myers, M.,Ankney, R.,Malpani, A.,Galperin, S.,Adams, C., "X.509 Internet Public Key Infrastructure Online Certificate Status Protocol - OCSP", RFC 6960, June 2013. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. 6.2. Informative References [RFC6066] Eastlake, D., "Transport Layer Security (TLS) Extensions: Extension Definitions", RFC 6066, January 2011. [RFC5280] Cooper, D.,Santesson, S.,Farrell, S.,Boeyen, S.,Housley, R.,Polk, W., "Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile", RFC 5280, May 2008. Authors' Addresses Phillip Hallam-Baker Comodo Group Inc. philliph@comodo.com Hallam-Baker April 10, 2015 [Page 8] Internet-Draft Compressed CRLSets October 2014 Rob Stradling Comodo CA Ltd. rob.stradling@comodo.com Hallam-Baker April 10, 2015 [Page 9]