<?xml version="1.0"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
     There has to be one entity for each item to be referenced. 
          An alternate method (rfc include) is described in the references.
     --><!ENTITY TLS SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5246.xml">
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC8018 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8018.xml">
<!ENTITY RFC7613 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7613.xml">
<!ENTITY RFC7292 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7292.xml">
<!ENTITY RFC3629 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml">
]>
<?rfc toc="no"?>
<?rfc symrefs="yes"?>
<rfc ipr="trust200902" category="info" updates="7292,8018" docName="draft-mavrogiannopoulos-pkcs5-passwords-00">
  <front>
    <title abbrev="Internationalized passwords in PKCS#5">Internationalized passwords in Password-Based Cryptography Specification</title>
    <author initials="N." surname="Mavrogiannopoulos" fullname="Nikos Mavrogiannopoulos">
      <organization abbrev="Red Hat">Red Hat, Inc.</organization>
      <address>
        <postal>
          <street/>
          <city>Brno</city>
          <region/>
          <country>Czech Republic</country>
        </postal>
        <email>nmav@redhat.com</email>
      </address>
    </author>
    <date month="May" year="2017"/>
    <area>Security</area>
    <!--      <workgroup>TLS Working Group</workgroup> -->
    <keyword>I-D</keyword>
    <keyword>Internet-Draft</keyword>
    <keyword>Password-Based Cryptography Specification</keyword>
    <keyword>PKCS#5</keyword>
    <abstract>
      <t>
This memo clarifies the requirements for using internationalized strings
as passwords in the Password-Based Cryptography Specification version 2.1 <xref target="RFC8018"/>
(PKCS#5) and the Personal Information Exchange Syntax <xref target="RFC7292"/> (PKCS#12).
         </t>
    </abstract>
  </front>
  <middle>
    <section anchor="intro" title="Introduction">
      <t>
Internationalized passwords are not known to give a consistent
user experience. US-ASCII passwords are usually preferred because applications interpret
them unambiguously, even though UTF-8 <xref target="RFC3629"/> extends
US-ASCII in a backwards-compatible way.</t>
<t>
US-ASCII passwords are preferred because conformance to UTF-8 does not guarantee that
a string has a single, unambiguous representation. The same
string can take several forms that look identical to an observer, even though each form is represented by a different
byte string. The following issues arise when passwords are expressed in UTF-8.
<list style="symbols">
<t>There exist various normalization forms, which result in different data for the same input.</t>
<t>Different systems do not share a consistent input form.</t>
<t>There are various deprecated alphabets which should not be allowed, for the sake of future compatibility.</t>
</list>
      </t>
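<t>
As a minimal illustration of the first issue, the following Python sketch
shows that one visible character yields different UTF-8 byte strings under
NFC and NFD. It uses only the standard unicodedata module; the sample
character is an arbitrary choice made for this illustration.
</t>
<figure><artwork><![CDATA[
```python
import unicodedata

# Arbitrary sample chosen for illustration: "e with acute" as one code point.
password = "\u00e9"

nfc = unicodedata.normalize("NFC", password).encode("utf-8")
nfd = unicodedata.normalize("NFD", password).encode("utf-8")

# Both forms render identically, yet the byte strings differ, so a key
# derived from one form will not match a key derived from the other.
print(nfc.hex())  # c3a9
print(nfd.hex())  # 65cc81
```
]]></artwork></figure>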
    </section>
    <section anchor="terms" title="Terminology">
      <t>
         The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
         NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and
         "OPTIONAL" in this document are to be interpreted as described in
         <xref target="RFC2119"/>.
         </t>
    </section>
    
        <section anchor="pkcs5" title="Passwords in PKCS#5">
<t>
The existing PKCS#5 <xref target="RFC8018"/> methods (PBES1, PBES2, PBMAC1) treat the password as an opaque
string and mention ASCII and UTF-8 as possible ways of encoding it.
In the interest of interoperability, applications conforming to this specification should
encode passwords in UTF-8 NFC form <xref target="NFC"/> and SHOULD adhere to the
OpaqueString profile (Section 4.2 of <xref target="RFC7613"/>).
</t>
<t>
As an exception to the OpaqueString profile, empty (zero-length) passwords MAY
be used when they are not the result of the <xref target="RFC7613"/> processing.
That is, an empty string generated from any non-empty input MUST NOT be used.
</t>
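<t>
The preparation described above could be sketched in Python as follows.
The function name prepare_pkcs5_password is hypothetical, and the sketch
applies only the NFC step: the remaining OpaqueString rules of
<xref target="RFC7613"/>, such as rejecting disallowed code points, are
assumed to be enforced elsewhere.
</t>
<figure><artwork><![CDATA[
```python
import unicodedata


def prepare_pkcs5_password(password: str) -> bytes:
    """Return UTF-8 NFC octets suitable as the PKCS#5 password input.

    Hypothetical helper: only NFC normalization is applied here; the
    other OpaqueString rules of RFC 7613 are NOT enforced by this sketch.
    """
    prepared = unicodedata.normalize("NFC", password)
    # An empty password is acceptable only if the input was empty too.
    if password and not prepared:
        raise ValueError("preparation produced an empty password")
    return prepared.encode("utf-8")
```
]]></artwork></figure>
<t>
With this sketch, the decomposed input "e" followed by U+0301 COMBINING
ACUTE ACCENT and the precomposed U+00E9 both yield the octets 0xC3 0xA9.
</t>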
</section>

        <section anchor="pkcs12" title="Passwords in PKCS#12">
<t>
The PKCS#12 document <xref target="RFC7292"/> defines the use of BMPString passwords (a subset of
UTF-16) for its defined encryption methods. This document does not add any further restrictions
to the input passwords of these methods; however, it is RECOMMENDED to use the (big-endian) UTF-16
NFC form <xref target="NFC"/> for encoding the password.
</t>
<t>
Furthermore, when PKCS#12 container files are combined with methods from PKCS#5 <xref target="RFC8018"/>,
e.g., AES-CBC-Pad, the passwords SHOULD adhere to the recommendations in <xref target="pkcs5"/>.
In that case, since the password of the MacData field and the password of the encrypted data typically match,
applications that restrict the MacData password to the BMPString character set SHOULD fail when the input password
cannot be expressed in that set.
</t>
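<t>
A minimal Python sketch of the BMPString encoding and of the failure case
described above follows. The function name to_bmpstring is hypothetical.
The sketch returns only the character octets; the two zero octets that
<xref target="RFC7292"/> appends to the password for key derivation are
assumed to be added by the caller.
</t>
<figure><artwork><![CDATA[
```python
import unicodedata


def to_bmpstring(password: str) -> bytes:
    """Encode a password as big-endian UTF-16 in NFC, limited to the BMP.

    Hypothetical helper: returns the character octets only, without the
    trailing two zero octets used in PKCS#12 key derivation.
    """
    normalized = unicodedata.normalize("NFC", password)
    for ch in normalized:
        # A BMPString cannot carry code points above U+FFFF.
        if ord(ch) > 0xFFFF:
            raise ValueError("password is not expressible as a BMPString")
    return normalized.encode("utf-16-be")
```
]]></artwork></figure>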
</section>

        <section anchor="notes" title="Compatibility notes">
<t>
Note that software wishing to decrypt files with internationalized passwords MAY
be prepared to handle password encoding methods that do not adhere to this document.
The following paragraphs document existing practices and known bugs in popular software.
</t>
        <section anchor="notes-nfc" title="Attempting the password in NFC">
	<t>
	  The recommendations in the PKCS#5 document are not sufficient to deduce
	  the UTF-8 input form of internationalized passwords. Implementations
	  receiving an internationalized password may attempt decrypting using
	  the password in UTF-8 NFC form.
	</t>
	</section>
        <section anchor="notes-openssl" title="OpenSSL's incorrect password conversion">
	<t>
	  OpenSSL versions prior to 1.1.0 had a bug which always assumed the input
	  password was in the ISO8859-1 character set regardless of the
	  actual character set used on the system. This occurred because
	  it attempted to convert to UTF-16 for the BMPString merely by
	  prefixing each byte from the input string with a zero byte
	  to expand it to 16 bits.
	</t>
	<t>
	  As an example, consider a PKCS#12 file for which the password
	  is intended to be the following two characters:
	  <list>
	    <t>U+0102 LATIN CAPITAL LETTER A WITH BREVE</t>
	    <t>U+017B LATIN CAPITAL LETTER Z WITH DOT ABOVE</t>
	  </list>
	  For the purpose of this example, the user is operating in a
	  legacy 8-bit locale using the ISO8859-2 character set. The
	  above two characters are thus provided to the application as
	  the bytes 0xC3 0xAF.
	</t>
	<t>
	  The correct form of that password for PKCS#12 key derivation
	  includes precisely those characters in UTF-16 big-endian
	  form as required for a BMPString: the bytes 0x01 0x02 0x01
	  0x7B. This is the correct version which any application
	  supporting the use of files for certificates and keys MUST
	  support.
	</t>
	<t>
	  Historical versions of OpenSSL, as noted, would assume that
	  the input bytes were in the ISO8859-1 character set. So the
	  input bytes 0xC3 0xAF would therefore be interpreted as the
	  two characters:
	  <list>
	    <t>U+00C3 LATIN CAPITAL LETTER A WITH TILDE</t>
	    <t>U+00AF MACRON</t>
	  </list>
	  The BMPString used for key derivation in this case would
	  include the bytes 0x00 0xC3 0x00 0xAF.
	</t>
	<t>
	  An application in a non-ISO8859-1 locale can therefore attempt
	  to decrypt such wrongly-created files by treating the input
	  password as if it is a sequence of bytes in ISO8859-1 rather
	  than the locale character set in which it really was
	  provided. The application can generate the BMPString by
	  converting from ISO8859-1 to big-endian UTF-16, and attempt to
	  decrypt the file by deriving the key using that rendition of
	  the password.
	</t>
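	<t>
	  The fallback described above could be sketched in Python as
	  follows, using the example bytes from this section. The function
	  name candidate_bmpstrings is hypothetical, and the sketch neither
	  normalizes the password nor checks that it fits in a BMPString.
	</t>
<figure><artwork><![CDATA[
```python
from typing import List


def candidate_bmpstrings(raw: bytes, locale_charset: str) -> List[bytes]:
    """Return BMPString candidates for password bytes typed in a locale.

    Hypothetical helper: the first entry is the correct rendition, the
    second reproduces the legacy ISO8859-1 assumption.
    """
    correct = raw.decode(locale_charset).encode("utf-16-be")
    # In ISO8859-1 every byte maps to the code point of the same value,
    # which is equivalent to prefixing each byte with a zero byte.
    legacy = raw.decode("iso8859-1").encode("utf-16-be")
    return [correct, legacy]


candidates = candidate_bmpstrings(b"\xc3\xaf", "iso8859-2")
print([c.hex() for c in candidates])  # ['0102017b', '00c300af']
```
]]></artwork></figure>
	<t>
	  An application would try the first candidate and fall back to the
	  second only if decryption or MAC verification fails.
	</t>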
	</section>
</section>

    <section anchor="security" title="Security Considerations">
<t>
All the considerations in <xref target="RFC8018"/> and <xref target="RFC7292"/> apply.
</t>
</section>
    <section anchor="IANA" title="IANA Considerations">
      <t>
              None.
          </t>
    </section>
  </middle>
  <back>
    <references title="Normative References">

	<!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?-->
	&RFC2119;

	<!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.7613.xml"?-->
	&RFC7613;

    <!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.8018.xml"?-->
	&RFC8018;

	<!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.7292.xml"?-->
	&RFC7292;

	<reference anchor='NFC'>
	<front>
	    <title>Unicode Standard Annex #15: Unicode Normalization Forms r.44</title>
	    <author initials='M.' surname='Davis'
		    fullname='Mark Davis'>
	    </author>
	    <author initials='K.' surname='Whistler'
		    fullname='Ken Whistler'>
	    </author>

	    <date month='February' year='2016' />
	</front>
	<seriesInfo name="Unicode" value=""/>
	</reference>

	
   	  </references>
    <references title="Informative References">
	<!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.3629.xml"?-->
	&RFC3629;

	
   	  </references>
      <section title="Acknowledgements">
      <t>
      The compatibility notes section is based on David Woodhouse's compatibility notes on certificate
      best practices.
      </t>
      </section>
  </back>
</rfc>
