Even though I said that discussion about protocol requirements is more important, here is some more debate about protocol design:

Ian Hickson wrote:
> This would have some pretty major costs:
>
> - It requires length delimiting for text frames, which is more complicated
>   to implement (it's non-trivial to tell the difference between characters
>   and bytes).

But length-delimited frames are in the protocol anyway, so they have to be implemented anyway. If there were only one framing type, we would have approximately half the framing complexity.

> - It requires parsing using a presized buffer for variable-encoding text,
>   which risks character/byte mismatches and thus buffer overruns.

With sentinel encoding, sending data might be marginally simpler, but receiving data is much harder.

You will still have buffers of fixed sizes when you receive bytes. You don't know how much data is coming, so you don't know how big to make your buffer or when to start turning bytes into characters.

I can see implementations reading a buffer... scanning for 0x00, not finding it... allocating a larger buffer... copying the bytes... reading more bytes... scanning again for 0x00... still not finding it... allocating yet another larger buffer... copying the bytes... etc. etc., until either you get a denial of service or you find 0x00, when you can finally scan over all the bytes again to convert to characters.

If the message is larger than you want to read, then you don't know this until you have read all the bytes you can and scanned for a 0x00. Resource limits are always better applied before the resource is consumed than after.

To avoid wasteful copying, you have to allocate wastefully large buffers, which can still fill up and might need to grow.

Existing UTF-8 to character conversions will error if they get a partial multibyte character, so you can't start conversion to characters until you have the entire message.
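To make the receive-side difference concrete, here is a rough sketch of the two read loops in Python. It is only an illustration under assumptions of mine, not anything taken from the draft: the 0x00 terminator follows the description above, while the fixed 4-byte big-endian length prefix, the MAX_MESSAGE limit and the function names are all hypothetical.

```python
import io

SENTINEL = 0x00          # terminator byte, as described in this message
MAX_MESSAGE = 64 * 1024  # hypothetical resource limit, not from the draft


def read_sentinel_frame(recv, chunk_size=4096):
    """Read bytes until SENTINEL; the size is unknown until the end is seen."""
    buf = bytearray()
    scanned = 0
    while True:
        chunk = recv(chunk_size)
        if not chunk:
            raise EOFError("connection closed mid-frame")
        buf += chunk  # the buffer grows (and may be copied) as data arrives
        end = buf.find(SENTINEL, scanned)  # only scan the new bytes
        if end >= 0:
            # Conversion to characters can only start now.
            # Bytes after the sentinel are dropped in this sketch.
            return buf[:end].decode("utf-8")
        scanned = len(buf)
        if scanned > MAX_MESSAGE:
            # The limit is only noticed after the bytes were buffered.
            raise ValueError("message too large")


def read_exact(recv, n):
    """Read exactly n bytes or raise EOFError."""
    buf = bytearray()
    while len(buf) < n:
        chunk = recv(n - len(buf))
        if not chunk:
            raise EOFError("connection closed mid-frame")
        buf += chunk
    return bytes(buf)


def read_length_frame(recv):
    """Read a frame with an assumed 4-byte big-endian length prefix."""
    length = int.from_bytes(read_exact(recv, 4), "big")
    if length > MAX_MESSAGE:
        # The limit is applied before any payload is buffered.
        raise ValueError("message too large")
    return read_exact(recv, length).decode("utf-8")


# Usage, with an in-memory stream standing in for a socket.
payload = "héllo".encode("utf-8")
print(read_sentinel_frame(io.BytesIO(payload + b"\x00").read, chunk_size=2))
print(read_length_frame(io.BytesIO(len(payload).to_bytes(4, "big") + payload).read))
```

With the length prefix, the receiver can size its buffer once and refuse an oversized message after reading only the header. With the sentinel, it has to keep buffering and scanning until the terminator or the limit is reached.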
Efficient implementations will be forced to reimplement UTF-8 converters to scan for 0x00 while converting. This will allow smaller fixed byte buffers, but will still need growing and/or large character buffers. This is probably also what needs to be done to be able to use direct kernel buffers.

I don't think there is a clear winner between the two types of framing. Both have their pros and cons. But implementing both will be approximately twice as complex, and it will mean that you have to deal with both sets of cons.

regards
Note Well: Messages sent to this mailing list are the opinions of the senders and do not imply endorsement by the IETF.