[websec] RFC6454 (Origin) vs URI schemes unlike "http"

Bjoern Hoehrmann <derhoermi@gmx.net> Wed, 13 February 2013 14:09 UTC

Hi,

  https://tools.ietf.org/html/rfc6454 fails to properly account for a
number of cases where URIs and URI schemes are slightly unusual, e.g.

   The origin of a URI is the value computed by the following algorithm:

   1.  If the URI does not use a hierarchical element as a naming
       authority (see [RFC3986], Section 3.2) or if the URI is not an
       absolute URI, then generate a fresh globally unique identifier
       and return that value.
   ...
   2.  Let uri-scheme be the scheme component of the URI, converted to
       lowercase.

   3.  If the implementation doesn't support the protocol given by uri-
       scheme, then generate a fresh globally unique identifier and
       return that value.

Consider `javascript://example.org`. In order to make the determination
whether "the URI" uses "a hierarchical element as a naming authority"
you have to know the scheme, but the scheme is not mentioned until after
the first step, which may lead one to believe that you can make this
determination without knowing the scheme.
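
To illustrate the point: a scheme-agnostic parser following the generic
RFC 3986 syntax reports an authority for this URI without ever looking
at what the scheme means. A minimal sketch, assuming Python's
`urllib.parse` as the generic parser (the helper name
`generic_authority` is made up for this example):

```python
from urllib.parse import urlsplit

def generic_authority(uri):
    """Return (scheme, authority) as seen by a scheme-agnostic parser."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc

# Both URIs look "hierarchical" to the generic syntax, although only
# one of the two schemes actually names a network authority.
print(generic_authority("javascript://example.org"))
print(generic_authority("http://example.org"))
```

Whether step 1 is meant to be answered this way, purely syntactically,
or with knowledge of the scheme, is exactly what the text leaves open.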

For 'javascript' in particular there is no "over the wire" "protocol",
so it's not clear what to do in the third step. Consider this from the
perspective of someone making a generic URI library and giving URI
objects some `.origin` property: how would that work? A browser might
support "ftp" but a user might disable loading resources over FTP in the
browser; or it might phase out FTP support but keep supporting 'ftp'
URIs (say, by still knowing the default port). What is the "Origin of a
URI" then? Does it matter if you do not actually load content from such
a URI, or don't do it in a web-browser-like fashion? I am not sure...
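
A rough sketch of what such a library function might look like, to show
how the result depends on a notion of "supported" that the library has
to invent. Everything here is an assumption for illustration: the
`origin` function, the `DEFAULT_PORTS` table, and the choice to treat
"supported" as "present in that table" are not taken from RFC 6454.

```python
import uuid
from urllib.parse import urlsplit

# Hypothetical per-implementation table: scheme -> default port.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

def origin(uri, supported=DEFAULT_PORTS):
    """One possible reading of the algorithm: a triple, or a fresh GUID."""
    parts = urlsplit(uri)
    scheme = parts.scheme                 # urlsplit lowercases the scheme
    if not parts.netloc or scheme not in supported:
        return uuid.uuid4().hex           # steps 1 and 3, collapsed
    host = parts.hostname or ""           # hostname is lowercased
    port = parts.port if parts.port is not None else supported[scheme]
    return (scheme, host, port)

print(origin("ftp://example.org/pub"))
# Same URI, but with 'ftp' dropped from the table:
print(origin("ftp://example.org/pub", {"http": 80, "https": 443}))
```

The same URI yields a stable triple under one configuration and a fresh
identifier on every call under the other, so two "origins" of one URI
need not even compare equal.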

Further down there is

   5.  Let uri-host be the host component of the URI, converted to lower
       case (using the i;ascii-casemap collation defined in [RFC4790]).

What if there is no `host` component? `news:de.comp.text.xml` does not
have one, even though the scheme does use "a hierarchical element as a
naming authority" and the URI is valid. For that matter, what if there
is such a component but it is the empty string (as in `file:///`, if
you ignore the specific provision for 'file')? It seems the empty string
would pass through the "algorithm", but it is unclear whether that is
intentional and what the security considerations are in this regard.
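
The two cases can be told apart with the regular expression from
RFC 3986, Appendix B. The sketch below (the `authority` helper is a name
chosen for this example) returns the whole authority component, which
may also carry userinfo and a port, not just the host:

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

def authority(uri):
    """None if the URI has no authority component, else its text."""
    return URI_RE.match(uri).group(4)

for uri in ("news:de.comp.text.xml", "file:///", "http://Example.ORG/"):
    print(uri, "->", repr(authority(uri)))
```

An absent authority and an empty one are distinct results here, while
step 5 of the algorithm does not say what to do with either.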

   6.  If there is no port component of the URI:

       1.  Let uri-port be the default port for the protocol given by
           uri-scheme.

       Otherwise:

       2.  Let uri-port be the port component of the URI.

Per RFC 3986 schemes may define a default port but do not have to. What
if a scheme does not define a default port? Also, what if the component
is present, but is the empty string? In section 6.1 I'm told

       1.  Append a U+003A COLON code point (":") and the given port, in
           base ten, to result.

I can't give the empty string in base ten. Per RFC 3986 the port
component should be omitted when it is the empty string, which would
lead to use of the default port, if any. But RFC 6454 has no provision
for normalising URIs, and the empty string is a valid value, so it is
valid input into the "Origin of a URI" "algorithm".
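
A small sketch of the three cases (no port, empty port, numeric port),
taking step 6 literally. The helper names are invented for this example,
and IP literals in brackets are ignored for brevity:

```python
def split_port(authority):
    """Split an authority into (host, port).

    port is None when there is no ':' at all and '' when the ':' is
    present but nothing follows it. Bracketed IP literals are not
    handled in this sketch.
    """
    hostport = authority.rpartition("@")[2]     # drop any userinfo
    host, sep, port = hostport.partition(":")
    return host, (port if sep else None)

def effective_port(port, default):
    """Step 6 read literally; default is None if the scheme has none."""
    if port is None:
        return default          # may itself be None
    return int(port, 10)        # raises ValueError for the empty string

for auth in ("example.org", "example.org:8080", "example.org:"):
    host, port = split_port(auth)
    try:
        print(auth, "->", effective_port(port, 80))
    except ValueError:
        print(auth, "-> port component present but empty")
```

A scheme without a default port would make `effective_port` return
`None` in the first case, which has no base-ten serialization either.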

regards,
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/