On Tue, 27 Oct 2009, Greg Wilkins wrote:
> Ian Hickson wrote:
> > This would have some pretty major costs:
> >
> >  - It requires length delimiting for text frames, which is more
> >    complicated to implement (it's non-trivial to tell the difference
> >    between characters and bytes).
>
> But length delimited frames are in the protocol anyway, so they have to
> be implemented anyway.

Only by generic clients (i.e. the browsers). Servers only have to
implement the binary frames if they want to support binary frames in the
protocol they implement, and dedicated clients (that just implement
WebSocket as a necessary part of implementing whatever protocol it is
that they are written for) would only need the binary frame support if
the protocol they implement uses binary frames.

> If there was only 1 framing type, then we have approximately half the
> framing complexity.

I've previously explained the reasoning for wanting to minimise the
length measurements for UTF-8 data, so I won't repeat it here.

> >  - It requires parsing using a presized buffer for variable-encoding
> >    text, which risks character/byte mismatches and thus buffer overruns.
>
> With sentinel encoding, sending data might be marginally simpler, but
> receiving data is much harder.

Not particularly. In languages with automatic dynamic strings (like Perl,
Python, ObjectPascal, etc) you just concatenate and all the complexity is
hidden from you by the compiler or language runtime. If you are using
explicit buffers, then the complexity consists of just doubling the
buffer size when you reach the size of the buffer; it's not a big deal
either.

> You will still have buffers of fixed sizes when you receive bytes. You
> don't know how much data is coming, so you don't know how big to make
> your buffer or when to start turning bytes into characters.

You don't know that anyway, if your internal representation isn't UTF-8,
since UTF-8 is a variable-length encoding (e.g. Win32 uses UTF-16
internally, which is variable-length encoded in a different way from
UTF-8, so you can't know how big the destination buffer should be without
examining the whole byte string).

> I can see implementations reading a buffer.... scanning for 0x00, not
> finding it... allocating a larger buffer... copying the bytes ...
> reading more bytes ... scanning again for 0x00.... still not finding
> it.... allocating yet another larger buffer... copying the bytes ....
> etc. etc. until either you get a denial of service or you find 0x00,
> when you can finally scan over all the bytes again to convert to
> characters.

It's trivial to impose an arbitrary limit. It doesn't even have to be
that arbitrary -- it can be whatever the server knows its protocol needs
to support. Indeed, if the implementation uses fixed-size buffers like
this, then it could just set its per-connection buffer to the maximum
size it knows its protocol will ever handle, and just discard data if the
limit is reached (or close the connection).

(Martin responded to the rest of your comments.)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
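
The receive-side argument above (append bytes as they arrive, look for the sentinel, and enforce a per-connection size limit) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the message or from any WebSocket draft: the class name `SentinelFrameReader`, the exception `FrameTooLarge`, the default limit of 4096 bytes, and the choice of 0xFF as the sentinel (picked because that byte never occurs in valid UTF-8). Each incoming byte is examined once, so earlier data is never rescanned, and the text is decoded only when the sentinel is seen.

```python
class FrameTooLarge(Exception):
    """Raised when a frame exceeds the limit chosen by the application."""


class SentinelFrameReader:
    """Hypothetical receiver for sentinel-terminated UTF-8 text frames."""

    def __init__(self, max_frame_bytes=4096, sentinel=0xFF):
        # Both defaults are assumptions for this sketch, not protocol values.
        self.max_frame_bytes = max_frame_bytes
        self.sentinel = sentinel
        self._buf = bytearray()  # grows dynamically, like a dynamic string

    def feed(self, chunk):
        """Consume one chunk of received bytes; return completed messages."""
        messages = []
        for byte in chunk:  # each byte is inspected exactly once
            if byte == self.sentinel:
                # End of frame: convert the accumulated bytes to text.
                messages.append(self._buf.decode("utf-8"))
                self._buf.clear()
            elif len(self._buf) >= self.max_frame_bytes:
                # The limit is whatever the application protocol needs;
                # the caller can discard the data or close the connection.
                raise FrameTooLarge(
                    "frame exceeds %d bytes" % self.max_frame_bytes
                )
            else:
                self._buf.append(byte)
        return messages
```

A caller would invoke `feed()` with whatever each socket read returns; a frame split across several reads is returned by the call that delivers its sentinel byte.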