[apps-discuss] Review of: draft-ietf-appsawg-uri-get-off-my-lawn-00

Dave Crocker <dhc@dcrocker.net> Tue, 31 December 2013 15:53 UTC

Message-ID: <52C2E825.9010405@dcrocker.net>
Date: Tue, 31 Dec 2013 07:52:05 -0800
From: Dave Crocker <dhc@dcrocker.net>
Organization: Brandenburg InternetWorking
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: draft-ietf-appsawg-uri-get-off-my-lawn.all@tools.ietf.org, apps-discuss@ietf.org
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Subject: [apps-discuss] Review of: draft-ietf-appsawg-uri-get-off-my-lawn-00
Precedence: list
Reply-To: dcrocker@bbiw.net

G'day.

I have been selected as the Applications Area Review Team reviewer for 
this draft (for background on apps-review, please see 
http://www.apps.ietf.org/content/applications-area-review-team).

Please resolve these comments along with any other comments you may 
receive. Please wait for direction from your document shepherd or AD 
before posting a new version of the draft.



    NOTE:        This was assigned as an 'early' review.  So the review
                 is not being sent to the IESG.

    DISCLAIMER:  I've read the discussion thread that took place on the
                 apps list with Tim Bray's review.  To the extent that
                 my comments match his, that's great.  To the extent
                 that they don't /and/ my comments seem to be wrong or
                 misguided, please consider that many readers of this
                 document will not be experts in URI technical details
                 and probably won't put the effort into reading the spec
                 that I just did.  In other words, if I am confused
                 after reading the doc, others probably will be too...



Document:     draft-ietf-appsawg-uri-get-off-my-lawn-00
Title:        Standardising Structure in URIs
Reviewer:     D. Crocker
Review Date:  31 December 2013



Summary:

      URIs are the core citation and client transaction (request) form 
on the Internet.  Anything used that widely and in such variable 
circumstances is likely to get ad hoc tailoring for use with particular 
applications and/or particular implementations and/or particular 
servers.  The "and/or" combinations highlight the potential for chaotic 
tailoring which, therefore, creates ambiguities and conflicts.   The 
current draft seeks to impose better global discipline on the use of URIs.

     For a construct as important as a URI, the importance of moving 
towards clean and unconflicted use can't be overstated.  However the 
requirements that prompt the ad hoc tailoring typically are legitimate, 
even if the method of dealing with it is problematic.  So the challenges 
here are in clarity and precision about what should /not/ be done, 
sufficiency in the available alternatives, and clarity in the way all 
this is explained.  Any specification needs to worry that it is 
adequately understandable to a reader new to the topic, but in the case 
of a document attempting to repair existing confusion and misbehavior, 
it is especially important:  the document needs an added degree of tight 
and coherent organization, with very pedantic logical sequences to what 
is said, and very careful wording that says it.

Unfortunately, at the highest level, I'm not sure how this document can 
be used by those who most need to use it.  It's not a question of 
whether the document is chock full of specific advice.  And the issue 
is separate from agreeing or disagreeing with any of the points made in 
the document.  It's a question of how it is organized for long-term use. 
  The current form of the document doesn't seem tightly-enough organized 
for that use, but I'm not sure what to suggest.  In addition, it varies 
between portions that appear to be protocol specification, versus 
portions that appear to be best current practices.

Also I found myself periodically confused about the exact meaning of 
text in the document.  Some of this might merely need somewhat more 
careful wording.  Some of the confusion might be due to deeper issues 
with the document.  I'm not sure.  Obviously some of the confusion could 
be my own limitations, but I spent some time trying to track things down 
and was still left confused, as noted below.

The title of the document asserts that it is /creating/ structure for 
URI's, but I believe it is, instead, mostly clarifying /existing/ 
structure.  Beyond that I believe that it is at most making -- or, at 
least, intending to make -- modest tweaks.  (I see that as a Good Thing, 
for something like this.  Anything more ambitious would probably disrupt 
current, legitimate uses of URIs...)



Detailed Comments:

> Abstract
>
>    Sometimes, it is attractive to add features to protocols or
>    applications by specifying a particular structure for URIs (or parts
>    thereof).  This document cautions against this practice in standards
>    (sometimes called "URI Squatting").

This implies that URIs must not have any 'features' or 'structure'. I 
think that what is meant is more limited than that.  So what is needed 
is a balancing statement that indicates what is normal and acceptable. 
  Maybe that's just a pointer to Section 3, but I suspect there's more 
needed than that.

Note, for example, that the URI spec (RFC 3986) delegates the role of 
providing a "generative grammar for URIs; that task is performed by the 
individual specifications of each URI scheme."  A grammar is, by 
definition, a structure.  So some sorts of structures are acceptable, in 
some sorts of specifications.  What is ok and what isn't ok needs to be 
clarified.

That leaves a basic question for the current draft:  when/where is it 
acceptable to specify functions and structures in a URI and when/where 
isn't it?

In addition, the use of sub-structure in URIs is integral to many 
established web practices.  Again:  this document needs to distinguish 
between those and whatever others are /not/ acceptable.
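
To make the question concrete, here is a minimal Python sketch of the 
generic component structure that RFC 3986 already defines for every 
URI.  The example URI is hypothetical and is not taken from the draft. 
The point is only that some structure is always present, so the draft 
needs to say which further layers of structure it objects to.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical URI, chosen only for illustration.
uri = "http://example.com:8080/v1/items/42?sort=asc&sig=abc#top"

# The five generic components named by RFC 3986.
parts = urlsplit(uri)
components = {
    "scheme": parts.scheme,
    "authority": parts.netloc,
    "path": parts.path,
    "query": parts.query,
    "fragment": parts.fragment,
}
for name, value in components.items():
    print(f"{name:9} = {value}")

# Splitting the query into name=value pairs follows the HTML forms
# convention; the generic syntax treats the query as an opaque string.
query_pairs = parse_qsl(parts.query)
print(query_pairs)
```

Everything below the component level, such as the name=value pairs in 
the query, is a convention layered on top, which seems to be the layer 
the draft is concerned with.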



> 1.  Introduction
>
>    URIs [RFC3986] very often include structured application data.  This

In spite of the examples provided in the next sentence, it isn't obvious 
to me what "structured application data" means.  Yet it's the key 
construct for the document.


>    might include artifacts from filesystems (often occurring in the path
>    component), and user information (often in the query component).  In
>    some cases, there can even be application-specific data in the
>    authority component (e.g., some applications are spread across
>    several hostnames to enable a form of partitioning or dispatch).
>
>    Furthermore, constraints upon the structure of URIs can be imposed by
>    an implementation; for example, many Web servers use the filename
>    extension of the last path segment to determine the media type of the
>    response.  Likewise, pre-packaged applications often have highly
>    structured URIs that can only be changed in limited ways (often, just
>    the hostname and port they are deployed upon).
>
>    Because the owner of the URI is choosing to use the server or the
>    software, this can be seen as reasonable delegation of authority.
>    When such conventions are mandated by standards, however, it can have
>    several potentially detrimental effects:
>
>    o  Collisions - As more conventions for URI structure become
>       standardised, it becomes more likely that there will be collisions

If indeed they are being standardized, then the real issue is to have 
coordination of the standards documents, through a registry or the like.


>       between such conventions (especially considering that servers,
>       applications and individual deployments will have their own
>       conventions).
>    o  Dilution - Adorning URIs with extra information to support new
>       standard features dilutes their usefulness as identifiers when
>       that information is ephemeral (as URIs ought to be stable; see
>       [webarch] Section 3.5.1), or its inclusion causes several
>       alternate forms of the URI to exist (see [webarch] Section 2.3.1).

hmmm.  The more I look the document over, the more it seems that the 
issue here is a conflict between use of a URI as an actual "identifier" 
for some data, versus use as a query mechanism (which produces some 
data).  The latter lends itself to extra parameters and ad hoc 
techniques, whereas the former seems less likely to.


>    o  Brittleness - A standard that specifies a static URI cannot change
>       its form in future revisions.

I don't really understand what the explanation of brittleness means.

But I think the word that fits what is described is "rigidity", not 
"brittleness".  The URI doesn't break.  Rather, use of ad hoc 
conventions means the range of operational uses isn't as flexible as 
desired.

I've searched through a number of the documents cited in the References 
section.  None of them defines "static URI", nor does it appear to be in 
common use online.  I don't have a good guess about its meaning. 
Perhaps it means "tied to a specific use"?  Anyhow, it needs to be defined.


>    o  Operational Difficulty - Supporting some URI conventions can be
>       difficult in some implementations.  For example, specifying that a
>       particular query parameter be used precludes the use of Web
>       servers that serve the response from a filesystem.

Huh?

>                     Likewise, an
>       application that fixes a base path for its operation (e.g., "/v1")
>       makes it impossible to deploy other applications with the same
>       prefix on the same host.
>    o  Client Assumptions - When conventions are standardised, some
>       clients will inevitably assume that the standards are in use when
>       those conventions are seen.

Well, uh, yeah.

I suspect the word "standardize" is being used ambiguously in this 
document, to mean both "formal" standards and "common practice" (de facto).

For the text here, perhaps:

    When ad hoc conventions are used, some clients are likely to assume 
that they are universal, rather than ad hoc, and so will assume that the 
conventions apply in scenarios that do not work.


>                 This can lead to interoperability
>       problems; for example, if a specification documents that the "sig"
>       URI query parameter indicates that its payload is a cryptographic
>       signature for the URI, it can lead to false positives.

The example seems to presume that the attribute associated with 'sig' 
will incorrectly validate as a signature.  What are the odds of that, if 
the sig is using competent modern cryptography?  False negatives seem 
more likely.
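
A minimal sketch of why a false negative seems the likelier outcome. 
It assumes, hypothetically, that the convention is an HMAC-SHA256 over 
the URI with its "sig" parameter removed.  The key, the URIs and the 
function names are invented for illustration; none of them comes from 
the draft.

```python
import hashlib
import hmac
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

KEY = b"hypothetical-shared-key"  # assumption: a key known to the verifier


def _split_sig(uri):
    """Return (URI without its 'sig' parameter, value of 'sig' or None)."""
    parts = urlsplit(uri)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    sig = next((v for k, v in pairs if k == "sig"), None)
    rest = [(k, v) for k, v in pairs if k != "sig"]
    return urlunsplit(parts._replace(query=urlencode(rest))), sig


def _digest(unsigned):
    return hmac.new(KEY, unsigned.encode(), hashlib.sha256).hexdigest()


def sign(uri):
    """Append a 'sig' query parameter computed over the unsigned URI."""
    unsigned, _ = _split_sig(uri)
    parts = urlsplit(unsigned)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.append(("sig", _digest(unsigned)))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


def verify(uri):
    """True only if 'sig' is present and matches the recomputed HMAC."""
    unsigned, sig = _split_sig(uri)
    if sig is None:
        return False
    return hmac.compare_digest(sig.encode(), _digest(unsigned).encode())


signed = sign("http://example.com/doc?id=7")
# A site that uses "sig" for something unrelated (hypothetical):
unrelated = "http://example.com/photo?sig=alice-autograph"
print(verify(signed), verify(unrelated))
```

Under these assumptions the unrelated "sig" value simply fails to 
verify.  The client sees a rejection (a false negative for that URI's 
usability), not an accidental acceptance.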


>    While it is not ideal when a server or a deployed application
>
>
>
> Nottingham               Expires March 21, 2014                 [Page 3]
> 
> Internet-Draft           URI Structure Policies           September 2013
>
>
>    constrains URI structure (indeed, this is not recommended practice,
>    but that discussion is out of scope for this document), publishing
>    standards that mandate URI structure is inappropriate because the
>    structure of a URI needs to be firmly under the control of its owner,

See comment above, in Abstract:  the URI spec assigns exactly that role 
to exactly such documents.  But perhaps I've misunderstood that doc or 
this one?


>    and the IETF (as well as other organisations) should not usurp this
>    ownership; see [webarch] Section 2.2.2.1.

Huh?


>
>    This document explains best current practices for establishing URI
>    structures, conventions and formats in standards.  It also offers
>    strategies for specifications to avoid violating these guidelines in
>    Section 3.
>
> 1.1.  Who This Document Is For
>
>    These guidelines are IETF Best Current Practice, and are therefore
>    binding upon IETF standards-track documents, as well as submissions
>    to the RFC Editor on the Independent and IRTF streams.  See [RFC2026]
>    and [RFC4844] for more information.

Having a document like this make such a broad and absolute assertion 
seems inappropriate.


>
>    Other Open Standards organisations (in the sense of [RFC2026]) are
>    encouraged to adopt them.  Questions as to their applicability ought
>    to be handled through the liaison relationship, if present.
>
>    Ad hoc efforts are also encouraged to adopt them, as this RFC
>    reflects Best Current Practice.

Instead of making the above statements of "authority" over other 
documents, it would be more useful for the text to describe the nature 
of its utility and, therefore, /when/ it should apply.  I think the 
draft text that follows this probably suffices for that.


>
>    This document's requirements specifically targets a few different

    specifically targets -> target


>    types of specifications:
>
>    o  URI Scheme Definitions ("scheme definitions") - specifications
>       that define and register URI schemes, as per [RFC4395].
>    o  Protocol Extensions ("extensions") - specifications that offer new
>       capabilities to potentially any identifier, or a large subset;
>       e.g., a new signature mechanism for 'http' URIs, or metadata for
>       any URI.

How are these formally/notationally distinguished?  For example, must 
they be issued as updates to existing definitions?


>    o  Applications Using URIs ("applications") - specifications that use
>       URIs to meet specific needs; e.g., a HTTP interface to particular
>       information on a host.
>
>    Requirements that target the generic class "Specifications" apply to
>    all specifications, including both those enumerated above above and
>    others.

This document makes requirements on all other specifications ever written?


>
>    Note that this specification ought not be interpreted as preventing
>    the allocation of control of URIs by parties that legitimately own
>    them, or have delegated that ownership; for example, a specification
>    might legitimately specify the semantics of a URI on the IANA.ORG Web
>    site as part of the establishment of a registry.

How is that meaningfully different from the specifications done by the 
IETF, or others, that are /not/ legitimate?


> 1.2.  Notational Conventions
>
>    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
>    "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
>    document are to be interpreted as described in [RFC2119].
>
>
> 2.  Best Current Practices for Standardising Structured URIs
>
>    Different components of a URI have differing practices recommended.
>
> 2.1.  URI Schemes
>
>    Applications and extensions MAY require use of specific URI
>    scheme(s); for example, it is perfectly acceptable to require that an
>    application support 'http' and 'https' URIs.  However, applications
>    SHOULD NOT preclude the use of other URI schemes in the future, to
>    promote reuse, unless they are clearly specific to the nominated
>    schemes.

What does it mean, in practical, technical terms, to not preclude use of 
other schemes?  (A desire to avoid limiting future specification choices 
is always nice, but knowing what will and won't limit things isn't 
always obvious.)

For example, how is it reasonable for a file transfer application to be 
required to permit use of a mailto: or sip: scheme?  The current 
wording seems to imply such flexibility.

Also, "reuse"?  What does that mean?


>    Specifications MUST NOT define substructure within URI schemes,
>    unless they do so by modifying [RFC4395], or they are the
>    registration document for the URI scheme(s) in question.

This should be stated as an affirmative, rather than a negative.  For 
example:

    A specification that defines substructure within a URI scheme MUST do 
so by modifying [RFC4395], or as a registration document for the URI 
scheme in question.


>
> 2.2.  URI Authorities
>
>    Scheme definitions define the presence, format and semantics of an
>    authority component in URIs; all other specifications MUST NOT
>    constrain, define structure or semantics for them.

'them'?

I think the phrasing needs to be:

      all other specifications MUST NOT constrain or define URI 
authority structure or semantics.


>
> 2.3.  URI Paths
>
>    Scheme definitions define the presence, format, and semantics of a
>    path component in URIs; all other specifications MUST NOT constrain,
>    define structure or semantics for any path component.

same suggested rewording, as above.


>    The only exception to this requirement is registered "well-known"
>    URIs, as specified by [RFC5785].  See that document for a description
>    of the applicability of that mechanism.
>
> 2.4.  URI Queries
>
>    The presence, format and semantics of the query component of URIs is
>    dependent upon many factors, and MAY be constrained by a scheme
>    definition.  Often, they are determined by the implementation of a
>    resource itself.

And again...


>    Applications SHOULD NOT directly specify the syntax of queries, as
>    this can cause operational difficulties for deployments that do not
>    support a particular form of a query.

The implication of this is that all applications that accept any queries 
must accept all queries.  But that doesn't make sense.  The nature of an 
application constrains what types of queries it can process usefully.


>    Extensions MUST NOT specify the format or semantics of queries.  In
>    particular, extensions MUST NOT assume that all HTTP(S) resources are
>    capable of accepting queries in the format defined by [HTML4],
>    Section 17.13.4.

The first sentence seems to mean that a "protocol extension" is not 
allowed to extend a protocol to support queries.  But the second 
sentence clearly means otherwise.  I'm obviously misunderstanding what 
this means or how it is to be applied.


> 2.5.  URI Fragment Identifiers
>
>    Media type definitions (as per [RFC6838] SHOULD specify the fragment
>    identifier syntax(es) to be used with them; other specifications MUST
>    NOT define structure within the fragment identifier, unless they are
>    explicitly defining one for reuse by media type definitions.

Cite where fragment identifiers are defined: RFC 3986, Section 3.5.


> 3.  Alternatives to Specifying Static URIs

Per the comment earlier in the review:  What is a static URI?


>    Given the issues above, the most successful strategy for applications
>    and extensions that wish to use URIs is to use them in the fashion
>    they were designed; as run-time artifacts that are exchanged as part
>    of the protocol, rather than statically specified syntax.
>
>    For example, if a specific URI needs to be known to interact with an

I'm not sure what it means for a URI to be "known to interact" with an 
application.  A URI isn't a protocol; it's a format.  And so I don't 
think of it as "interacting".

Do you mean that an application accepts a particular URI or that the 
application uses the URI?


>    application, its "shape" can be determined by interacting with the

I've searched a number of the documents cited in the references and none 
of them uses the word 'shape'.  What does it mean here?


>    application's more general interface (in Web terms, its "home page")
>    to learn about that URI.

How?  This seems a hugely important point but it contains no technical 
detail.

In general, this document would be greatly aided by many more examples, 
including examples that show the wrong way something is often done and 
then the right way it should be done.
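
As one hedged illustration of the kind of paired example I have in 
mind, the Python sketch below contrasts a URI fixed by a specification 
with a URI discovered at run time through a link relation.  The host, 
paths, relation type and function names are all hypothetical, and the 
regular expression handles only the simple Link header form shown; it 
is not a complete parser for the RFC 5988 syntax.

```python
import re
from urllib.parse import urljoin

HOME = "http://example.com/"  # hypothetical entry point ("home page")


# Pattern the draft appears to discourage: the specification itself
# fixes the path, so every deployment must use /v1/orders.
def orders_uri_hardcoded(host):
    return f"http://{host}/v1/orders"


# Alternative: the server advertises the URI and the client finds it
# by relation type, so the server keeps control of its own paths.
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def find_link(link_header, rel, base):
    """Return the absolute target of the first link carrying `rel`."""
    for target, rels in LINK_RE.findall(link_header):
        if rel in rels.split():
            return urljoin(base, target)
    return None


# Hypothetical Link header, as it might accompany the response for HOME.
header = ('</shop/orders>; rel="http://example.com/rels/orders", '
          '</help>; rel="help"')

print(orders_uri_hardcoded("example.com"))
print(find_link(header, "http://example.com/rels/orders", HOME))
```

In the second form the only thing a specification would need to fix is 
the relation type, not the path.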


>    [RFC5988] describes a framework for identifying the semantics of a

It seems to be more substantive/precise than a 'framework'.  Indeed, it 
says it "specifies relation types" and a registry for them.

So:

     [RFC5988] specifies relation types for Web links.


>    link in a "link relation type" to aid this.  [RFC6570] provides a
>    standard syntax for "link templates" that can be used to dynamically

That document does not use the term "link template", although it does 
refer to using URI templates with reference to links:

    URI Templates can have many uses, including the discovery
    of available services, configuring resource mappings, defining
    computed links, specifying interfaces, and other forms of
    programmatic interaction with resources.

I think the sentence here works adequately by simply using that 
document's own term "URI templates".
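
For readers unfamiliar with URI templates, a minimal Python sketch of 
the simplest case (RFC 6570 Level 1, simple string expansion) follows. 
The function name is my own, and variable names are restricted to 
letters, digits and underscore, which is narrower than the RFC allows. 
It is an illustration under those assumptions, not a conforming 
implementation.

```python
import re
from urllib.parse import quote


def expand_level1(template, variables):
    """Expand {var} expressions, percent-encoding each value.

    Only unreserved characters are left unencoded, and an undefined
    variable expands to the empty string.
    """
    def substitute(match):
        value = variables.get(match.group(1))
        if value is None:
            return ""
        return quote(str(value), safe="")

    # Simplification: only plain {name} expressions are recognised.
    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)


print(expand_level1("http://example.com/~{username}/", {"username": "fred"}))
print(expand_level1("{hello}", {"hello": "Hello World!"}))
```

The server publishes the template and the client fills in the 
variables, so the server still chooses where in the URI they appear.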


>    insert application-specific variables into a URI to enable such
>    applications while avoiding impinging upon URI owners' control of
>    them.
>
>    [RFC5785] allows specific paths to be 'reserved' for standard use on
>    URI schemes that opt into that mechanism ('http' and 'https' by
>    default).  Note, however, that this is not a general "escape valve"
>    for applications that need structured URIs; see that specification
>    for more information.
>
>    Specifying more elaborate structures in an attempt to avoid
>    collisions is not adequate to conform to this document.  For example,
>    prefixing query parameters with "myapp_" does not help.

This last paragraph is a useful, but small, tidbit.  It belongs 
elsewhere in the document, possibly in the introduction, where the case 
for the current problem is being made.  Hence, an example of some hack 
that doesn't solve the problem makes sense there.  However I suggest 
that the paragraph be expanded to explain why this particular hack does 
not suffice.  (Even if the explanation seems obvious.)


> 4.  Security Considerations
>
>    This document does not introduce new protocol artifacts with security
>    considerations.

Hmmm.  This draft touches on ambiguity and false positives/negatives.  I 
would think that these have some potential security implications.


>
>
> 5.  IANA Considerations
>
>    This document clarifies appropriate registry policy for new URI
>    schemes, and potentially for the creation of new URI-related
>    registries, if they attempt to mandate structure within URIs.  There
>    are no direct IANA actions specified in this document.
>
>
> 6.  References
>
> 6.1.  Normative References
>
>    [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
>               Requirement Levels", BCP 14, RFC 2119, March 1997.
>
>    [RFC3986]  Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform
>               Resource Identifier (URI): Generic Syntax", STD 66,
>               RFC 3986, January 2005.
>
>    [RFC4395]  Hansen, T., Hardie, T., and L. Masinter, "Guidelines and
>               Registration Procedures for New URI Schemes", BCP 35,
>               RFC 4395, February 2006.
>
>    [RFC6838]  Freed, N., Klensin, J., and T. Hansen, "Media Type
>               Specifications and Registration Procedures", BCP 13,
>               RFC 6838, January 2013.
>
> 6.2.  Informative References
>
>    [HTML4]    Jacobs, I., Le Hors, A., and D. Raggett, "HTML 4.01
>               Specification", December 1999,
>               <http://www.w3.org/TR/REC-html40/>.
>
>    [RFC2026]  Bradner, S., "The Internet Standards Process -- Revision
>               3", BCP 9, RFC 2026, October 1996.
>
>    [RFC4844]  Daigle, L. and Internet Architecture Board, "The RFC
>               Series and RFC Editor", RFC 4844, July 2007.
>
>    [RFC5785]  Nottingham, M. and E. Hammer-Lahav, "Defining Well-Known
>               Uniform Resource Identifiers (URIs)", RFC 5785,
>               April 2010.
>
>    [RFC5988]  Nottingham, M., "Web Linking", RFC 5988, October 2010.
>
>    [RFC6570]  Gregorio, J., Fielding, R., Hadley, M., Nottingham, M.,
>               and D. Orchard, "URI Template", RFC 6570, March 2012.
>
>    [webarch]  Jacobs, I. and N. Walsh, "Architecture of the World Wide
>               Web, Volume One", December 2004,
>               <http://www.w3.org/TR/2004/REC-webarch-20041215>.
>
>
> Appendix A.  Acknowledgments
>
>    Thanks to David Booth, Anne van Kesteren and Erik Wilde for their
>    suggestions and feedback.
>
>
> Author's Address
>
>    Mark Nottingham
>
>    Email: mnot@mnot.net
>    URI:   http://www.mnot.net/

-- 
Dave Crocker
Brandenburg InternetWorking
bbiw.net
