
Re: [Uri-review] [hybi] ws: and wss: schemes



Ian Hickson wrote:
...
Then the encoding considerations should be something like:

  Because many characters are not permitted with this syntax, the
  "hier-part" and "query" elements may contain characters from the
  Unicode Character Set [UCS] as suggested by URI [RFC3986] using the
  reg-name and percent-encoding translations of IRI to URI
  mapping [RFC3987]. Translation is performed by first encoding those
  Unicode characters as octets using the UTF-8 character
  encoding [RFC3629]. Replace the reg-name part of the hier-part by
  the part converted using the ToASCII operation specified in section
  4.1 of [RFC3490] on each dot-separated label, and by using U+002E
  (FULL STOP) as a label separator, with the flag UseSTD3ASCIIRules
  set to TRUE, and with the flag AllowUnassigned set to TRUE. Then
  only those octets that do not correspond to characters in the
  unreserved set should be percent-encoded.

  By using UTF-8 encoding, there are no known compatibility issues
  with mapping Internationalized Resource Identifiers to websocket
  URIs according to [RFC3987].

I've used the above as a guide for what to put in the spec. I didn't use it literally because it seemed to misuse RFC2119 terminology, and it wasn't clear to me where the descriptive ended and the normative started. I hope the text now in the spec makes sense. Let me know if it needs more work.
...

It now says:

   Encoding considerations.
      Characters in the host component that are excluded by the syntax
      defined above must be converted from Unicode to ASCII by applying
      the IDNA ToASCII algorithm to the Unicode host name, with both the
      AllowUnassigned and UseSTD3ASCIIRules flags set, and using the
      result of this algorithm as the host in the URI.

      Characters in other components that are excluded by the syntax
      defined above must be converted from Unicode to ASCII by first
      encoding the characters as UTF-8 and then replacing the
      corresponding bytes using their percent-encoded form as defined in
      the URI and IRI specification.  [RFC3986] [RFC3987]

I think that's good, except that the mention of IRI in the last sentence seems to be superfluous. RFC3986 already defines everything that is needed here. Or is there something specific from the IRI spec you think is relevant? If so, the text should state that more clearly.
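
To make the two conversion rules in the quoted text concrete, here is a rough Python sketch. It is an illustration only, not spec text. The function names host_to_ascii and encode_component are made up for this example. It assumes Python's built-in "idna" codec (RFC 3490 ToASCII) is an acceptable stand-in for the host rule; as far as I know that codec does not apply UseSTD3ASCIIRules, so the sketch adds a simple letter-digit-hyphen check on the result as an approximation. The percent-encoding step uses only what RFC3986 defines.

```python
import re
from urllib.parse import quote

# Approximation of the STD3 host name rules (an assumption of this sketch):
# each label is letters, digits and hyphens, with no leading or trailing hyphen.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def host_to_ascii(host: str) -> str:
    """Convert a Unicode host name to ASCII, label by label (hypothetical helper)."""
    # The stdlib "idna" codec runs ToASCII on each dot-separated label.
    ascii_host = host.encode("idna").decode("ascii")
    for label in ascii_host.split("."):
        if not _LDH_LABEL.fullmatch(label):
            raise ValueError(f"label {label!r} is not a valid host label")
    return ascii_host


def encode_component(text: str, safe: str = "/") -> str:
    """Encode as UTF-8, then percent-encode the excluded bytes (hypothetical helper)."""
    # quote() leaves unreserved characters and anything listed in `safe` untouched.
    return quote(text, safe=safe, encoding="utf-8")


if __name__ == "__main__":
    print(host_to_ascii("b\u00fccher.example"))   # host component
    print(encode_component("/chat/caf\u00e9"))    # path component
```

Which characters belong in `safe` depends on the component being encoded; the default above is only suitable for a path.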

BR, Julian