RE: host-meta: template syntax hassles



Thanks James.

 

The examples are obviously limited to URIs with specific characteristics. If these examples are misleading, I will be happy to change them to put the {uri} at the end as a query parameter.

 

Defining a more complex syntax is absolutely out of scope. The template syntax is intentionally kept simple, as a subset of Roy’s proposal for a URI template standard, in the hope that it will be forward compatible once that work is completed (code written for the new proposal will be able to handle old files).

 

EHL

 

From: apps-discuss-bounces at ietf.org [mailto:apps-discuss-bounces at ietf.org] On Behalf Of Manger, James H
Sent: Monday, October 26, 2009 10:02 PM
To: apps-discuss at ietf.org
Subject: host-meta: template syntax hassles

 

draft-hammer-hostmeta-02 “Host-Meta: Web Host Metadata” defines a URI template syntax and variables based on parts of a URI [§3.2.1.].

 

I don’t think the current arrangement can work as easily as the examples in the draft suggest.

 

The example of a link to each resource’s author presumably wants “;by” to be appended to the resource URI’s path — but that is not what the “{+uri};by” template does.

  <Link> <Rel>author</Rel> <URITemplate>{+uri};by</URITemplate> </Link>

For URIs with query strings, “;by” is appended to the last query parameter value. This is almost certainly NOT desired. The order of query parameters is normally irrelevant but this template changes the value of whichever parameter happens to be last.
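
A rough Python sketch of the effect described above. The expand() helper is a hypothetical stand-in for the draft's template processing (plain, unescaped {+name} substitution), and the resource URI is made up for illustration:

```python
from urllib.parse import urlsplit

def expand(template: str, variables: dict) -> str:
    # Naive {+name} substitution with no escaping; a stand-in for the
    # draft's template processing, not a full implementation of it.
    for name, value in variables.items():
        template = template.replace("{+" + name + "}", value)
    return template

resource = "http://example.net/article?id=7&lang=en"  # illustrative URI
link = expand("{+uri};by", {"uri": resource})
parts = urlsplit(link)
last_param = parts.query.split("&")[-1]

print(link)        # ";by" lands at the very end of the URI
print(parts.path)  # the path is unchanged
print(last_param)  # the last query parameter now carries ";by"
```

The path stays “/article” while the last parameter becomes “lang=en;by”.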

 

A more correct template is probably: “{+scheme}://{+authority}{+path};by?{+query}”

I am not sure that even this is correct if the path is empty (eg you may get the invalid URI “http://example.net;by?”)

There will also be a trailing “?” when there are no query parameters, which may be mostly harmless, but is strictly different from a URI without the “?” so caching and comparisons are adversely affected.
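
The edge cases can be checked with a small sketch, assuming the component variables map onto what Python's urlsplit() returns (an assumption made for illustration; the draft defines the variables itself). The link_for() helper is hypothetical:

```python
from urllib.parse import urlsplit

TEMPLATE = "{+scheme}://{+authority}{+path};by?{+query}"

def link_for(uri: str) -> str:
    # Split the resource URI and substitute each part, unescaped.
    p = urlsplit(uri)
    values = {
        "scheme": p.scheme,
        "authority": p.netloc,
        "path": p.path,
        "query": p.query,
    }
    link = TEMPLATE
    for name, value in values.items():
        link = link.replace("{+" + name + "}", value)
    return link

with_query = link_for("http://example.net/article?id=7")
no_query = link_for("http://example.net/article")
empty_path = link_for("http://example.net")

print(with_query)  # the intended result
print(no_query)    # trailing "?"
print(empty_path)  # ";by" ends up inside the authority
```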

 

Another example from the draft is “{+uri}&test”. This either makes “&test” part of the path, or a new query parameter (depending on whether or not the URI already has a query string). This is probably never desirable. If a template wants to add a query parameter it probably needs to specify

  “{+scheme}://{+authority}{+path}?{+query}&test”.

Even this is not ideal as it produces unusual URIs such as “http://example.net/article?&test”, which again may be mostly harmless but is strictly different from a URI with just “?test” so caching and comparisons are adversely affected.

 

Another example from the spec is “http://meta.{host}:8080{+path}?{+query}”.

If some URIs might have a userinfo component then the template needs to be “http://{+userinfo}@meta.{host}:8080{+path}?{+query}”. However, when there is no userinfo the resulting URI starts with http://@…, which (in some clients at least, eg curl) causes an “Authorization: Basic Og==” header to be included when requesting the URI. That could be harmful.

P.S. It should probably also use {+host}, instead of {host}.
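
For reference, “Og==” is just the Base64 encoding of a lone colon, ie an empty username and an empty password. A short Python check, using an illustrative URI (how any particular client treats such a URI is not shown here):

```python
import base64
from urllib.parse import urlsplit

# Illustrative result of expanding the template with no userinfo.
link = "http://@meta.example.net:8080/article?id=7"
parts = urlsplit(link)

# Basic credentials are base64("username:password").
decoded = base64.b64decode("Og==")
encoded_empty = base64.b64encode(b":").decode("ascii")

print(repr(parts.username))  # userinfo is present but empty
print(parts.hostname)
print(decoded, encoded_empty)
```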

 

 

The minimal (partial) solution is to change the examples in the draft to be realistic (eg work for all URIs in the scope, with or without query strings).

 

Another solution is to define a more sophisticated template syntax. A syntax that lists a prefix that is only substituted when the variable is defined (non-null) may be almost sufficient. That would address the problem that the defined variables for URI parts (scheme, authority, query, fragment, userinfo, host, port) don’t include the separator characters used when combining the parts.
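
A minimal sketch of what such a prefix syntax could look like. The “{prefix|+name}” notation is invented purely for illustration (it is not from the draft or from any URI template proposal), and an empty value is treated as undefined, which is also an assumption:

```python
import re
from urllib.parse import urlsplit

# Matches {+name} or {prefix|+name}; the prefix form is hypothetical.
TOKEN = re.compile(r"\{(?:([^{}|]*)\|)?\+(\w+)\}")

def expand(template: str, variables: dict) -> str:
    def substitute(match):
        prefix = match.group(1) or ""
        value = variables.get(match.group(2))
        # Emit the prefix only when the variable has a value.
        return prefix + value if value else ""
    return TOKEN.sub(substitute, template)

def link_for(uri: str, template: str) -> str:
    p = urlsplit(uri)
    return expand(template, {
        "scheme": p.scheme,
        "authority": p.netloc,
        "path": p.path,
        "query": p.query,
    })

template = "{+scheme}://{+authority}{+path};by{?|+query}"
with_query = link_for("http://example.net/article?id=7", template)
no_query = link_for("http://example.net/article", template)

print(with_query)
print(no_query)  # no trailing "?"
```

This removes the trailing “?”, but an empty path still yields “http://example.net;by”, which is why a prefix alone is only almost sufficient.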

 

 

In the host-meta case a better solution would be to define the translation from resource URI to link URI with a regular expression and replacement string. The replacement string can reference “capturing groups” in the regular expression. I am sure all modern programming languages offer broadly similar regex-based replacement functionality. For instance, for Java see String#replaceFirst(regex, replacement) and Matcher.appendReplacement.

 

Then the question is where is the regex specified: in the <Link> or as the <Subject>?

Using a regular expression to indicate all the URIs that the metadata (XRD) applies to (ie the XRD’s subject) feels flexible and appropriate. Host-meta would not need to be a special case with its own <host-meta:Scope>. Metadata that applied to all the URIs in a subdirectory (eg all under /blog/*) would be easy to define.

 

<XRD>

 <SubjectRegex>https?://(?:www\.)?example\.com/([^?#]*)(\?[^#]*)?(#.*)?</SubjectRegex>

 <Link><Rel>author</Rel><URIPattern>https://www.example.com/\1;by\2</URIPattern></Link>

</XRD>
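
A sketch of how a consumer might apply such an XRD, here in Python rather than Java. It assumes the whole URI must match the subject regex, reads the fragment group as “(#.*)?”, and relies on Python (3.5+) expanding an unmatched group to an empty string. The link_for() helper is hypothetical:

```python
import re

# Values taken from the <SubjectRegex> and <URIPattern> elements.
SUBJECT = re.compile(
    r"https?://(?:www\.)?example\.com/([^?#]*)(\?[^#]*)?(#.*)?"
)
PATTERN = r"https://www.example.com/\1;by\2"

def link_for(uri: str):
    # Return the author link, or None when the XRD does not apply.
    match = SUBJECT.fullmatch(uri)
    if match is None:
        return None
    return match.expand(PATTERN)

with_query = link_for("http://example.com/blog/post?id=7")
no_query = link_for("https://www.example.com/blog/post")
out_of_scope = link_for("http://other.example.org/blog/post")

print(with_query)
print(no_query)  # no stray "?" when there is no query
print(out_of_scope)
```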

 

It is a pity regexes can be awkward to write (& harder to read). Including an example (or default value?) in the spec that captures the scheme, authority, path, query & fragment could be almost equivalent to the current template proposal (with much more flexibility, but less friendly variable names).

 

 

James Manger
James.H.Manger at team.telstra.com
Identity and security team Chief Technology Office Telstra

