Hi Robert -
Thanks for your responsiveness on this. I'll close the loop on just those points that Vijay didn't already touch on in his response.
<snip>
2) The description of Timestamp and Fractional Seconds is very confusing.
What does it mean to be represented in big-endian fashion with a most significant octet and to be decimal encoded? It's very far from clear that you intend this to be an ASCII string representing an integer, and it's not clear that you intend it to be prefix-padded with 0s if necessary.
Did the group consider using ABNF to describe these fields (or the whole log record) to avoid this kind of ambiguity?
Good point. The text needs clarifying. I will remove the
portion of the text
that states "Represented in big-endian fashion with most
significant octet
first from zero starting at the left, or high-order,
position. Decimal encoded."
from the description of both Timestamp and Fractional
Seconds fields.
I will also supplement the text with the following summary
text:
The timestamp is represented in the log file as a UTF-8 string
representing the date and time of the request or response
represented as the number of seconds and milliseconds since the
Unix epoch. The number of milliseconds MUST be separated by a
"." from the number of seconds. It is not required to zero-pad
the seconds and milliseconds, although if they are zero padded,
a SIP CLF reader MUST be able to interpret them by discarding
leading zeroes.
Actually, I was under the impression that the zero padding was
expected. The examples have them consistently - why did that happen?
Either way, group members should say something about what the intent
was.
The examples will be updated to reflect the updated text regarding the zero padding.
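To make the proposed timestamp text concrete, here is a minimal sketch of how a SIP CLF reader might handle the field. It assumes the part after the "." is an integer count of milliseconds (as the proposed text reads), not a decimal fraction. The function name parse_clf_timestamp is hypothetical and does not come from the draft.

```python
def parse_clf_timestamp(field):
    """Split a 'seconds.milliseconds' string into a (seconds, milliseconds) pair.

    Hypothetical helper, not part of the draft. Assumes both parts are
    plain ASCII decimal digits separated by a single "." character.
    """
    seconds_text, sep, millis_text = field.partition(".")
    parts_ok = all(
        part.isascii() and part.isdigit()
        for part in (seconds_text, millis_text)
    )
    if sep != "." or not parts_ok:
        raise ValueError("not a seconds.milliseconds timestamp: %r" % field)
    # int() ignores leading zeroes, so padded and unpadded forms
    # yield the same values.
    return int(seconds_text), int(millis_text)
```

With this sketch, parse_clf_timestamp("0001275930743.007") and parse_clf_timestamp("1275930743.7") both return (1275930743, 7), which is the "discard leading zeroes" behaviour the proposed text asks of a reader.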
<snip/> <snip>
7) Why does Tag -01 restrict message bodies to just the
listed types, especially
with the open ended "miscellaneous text content" type
category? If there's a real
restriction here, it needs to be much more clearly stated.
Why are you allowing
any LWS for the separator between the content type header
field value and the
message body itself? Why isn't this simply a space? It is
not clear what
"byte encoded" means for representing binary bodies - do you
mean expanding
it into a hex representation, a uuencode representation, or
something else?
The decision to restrict logging of bodies to the specific
types listed in the draft
was a decision made by the WG [1].
The document needs to explain _why_ this particular subset was
chosen.
What basis did the group use for that choice? My read of the thread
was that Vijay
threw an initial list together and that just ended up in the
document. It's not clear
that the list was intended to be exclusive, just that whatever
subsequent decisions
that were made allowed at least those types.
WHY are you not allowing images? Why are you not allowing
multipart/mime?
(If you ever did actually see an S/MIME protected body, you would
not be able to log it
the way the document stands now.)
We floated some text on the list that contained a list of those body types the WG thought MUST be logged. Nobody responded with an alternative, nor did anyone object to the proposal. So we went with it. I'm happy to reopen this discussion on the list, but if no one else answers then we are no better off than before.
I share your concern that we are being needlessly restrictive so I'll put forth a proposal to the WG in the next version of the draft that removes the constraint of logging bodies of those specified types. The logging mechanism will remain unchanged but without body type restrictions.
The usage of an LWS separator was specified as part of that proposal for logging bodies. Making the separator a space instead of LWS is fine; in fact, it is preferable. I'll update the draft to reflect that change.
I'll also draw a line in the sand and update the text
indicating that "byte encoded"
in this context means that bodies with binary data MUST be
base64 encoded for
logging (since MIME uses base64 encoding).
Please be careful to be specific in the instructions on when to
encode
(for example, would you re-encode a binary format that was already
base64 encoded?)
Sounds good. I'll add text to address this.
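For reference, the base64 step being proposed for binary bodies could look roughly like the sketch below. The function name encode_binary_body is hypothetical, and the sketch assumes the caller has already decided that the body is binary. It does not answer the open question above of what to do with a body that arrives already base64 encoded.

```python
import base64


def encode_binary_body(body):
    """Return the base64 text form of a binary SIP body for logging.

    Hypothetical helper, not part of the draft. 'body' is the raw
    bytes of the message body; the result is ASCII text.
    """
    return base64.b64encode(body).decode("ascii")


def decode_logged_body(logged):
    """Recover the original bytes from the logged base64 text."""
    return base64.b64decode(logged)
```

The encoded form uses only ASCII letters, digits, "+", "/" and "=", so it contains no CRLF or space that would clash with the log field separators.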
9) When describing repeating tag-01 or tag-02 multiple
times to deal with
large entities, the document should be clear that the
fragments get logged
in order, and how to reassemble the actual entity. It should
also say whether
it's ok to interleave (for example) tag-01 fragments with
other optional fields.
This is a great callout. There is text in the description
of Tag=01 that needs to
be updated to clearly state that if your body or the entire
SIP
message is > 4096 bytes, then tough luck. We can't chunk this
as it would open
up a can of worms that we don't want to deal with, like
indexing the chunks
and a reassembly scheme. Better not to go there. I will
update the text
to clearly state this point.
Now I'm confused. Why do you have the ability to repeat these tags?
It *is* currently confusing. The updated text will remove the ability to have multiple Tag -01 and -02 entries. The only Tag that may be repeated will be -00, and only when logging an optional field that occurs more than once in a SIP message (e.g. Contact, Route, Record-Route, etc.).
11) Where does the document talk about how to log fields
that are line folded
in the original message? Is it ok for the logged field to
contain CRLFs, or is
the intent that the field be unfolded before logging?
I will update the text to indicate that:
For mandatory fields:
The mandatory fields MUST NOT contain CRLF when logged.
For optional fields:
If an optional field contains a CRLF as part of its normal
production rule, the CRLF MUST be escaped by using the
URI encoded equivalent value of "%0D%0A".
Why are you treating optional fields and mandatory fields
differently here?
If there's a good reason, fine, but be sure to include the reason in
the document.
I'm simply laying out the escape rules for the various CLF log elements.
For logging bodies:
All CRLF digraphs in the SIP body MUST be escaped by using
the
URI encoded equivalent value of "%0D%0A"
For logging the entire SIP message:
All instances of CRLF digraphs, whether they appear in the SIP
headers or the body MUST be escaped by using the URI encoded
equivalent value of "%0D%0A".
Even when the body is binary (see previous discussion about
escaping)?
The group should weigh in on whether this is what they were
expecting for
the usual application/sdp case.
I think _ANY_ time a CRLF is present in a body it is escaped. If a binary body contains no CRLF, then there is nothing to escape.
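As a rough illustration of the CRLF escaping rule for bodies and whole messages, here is a minimal sketch using an application/sdp-style body. The function name escape_crlf and the sample SDP lines are made up for the example. The sketch only replaces CRLF digraphs; it leaves a lone CR or LF alone and does not handle a body that already contains the literal text "%0D%0A".

```python
def escape_crlf(text):
    """Replace every CRLF digraph with its URI-encoded form "%0D%0A".

    Hypothetical helper, not part of the draft. Only the two-character
    sequence CR LF is replaced; a lone CR or LF is left unchanged.
    """
    return text.replace("\r\n", "%0D%0A")


# Sample body for illustration only.
sdp_body = "v=0\r\ns=-\r\nt=0 0\r\n"
logged_body = escape_crlf(sdp_body)
```

Here logged_body is the single-line string "v=0%0D%0As=-%0D%0At=0 0%0D%0A", so the logged record no longer contains a raw line break.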
Minor
There is a maintenance risk in having all the detail of each
block duplicated
in Figure 1. Please consider taking that figure up a level,
showing the three
blocks and how they're separated, and let the other figures
show the internal
detail.
This is something that we have discussed and have not found an
elegant solution for.
It was thought to be very valuable to display the pertinent
portion of the complete
CLF within the section where that was discussed. I'll do my
best to ensure that they
are all copied over properly from Figure 1.
This has been a common theme in the IESG reviews of many recent
documents. The issue
is not the care you take while editing this document. It's the
burden you put on whoever
ends up with the pen on an update.
I certainly understand. I just don't have a better solution at this time.
There is a lot of descriptive text in 4.2's sections on
the server and client
transaction that is a copy of what's in the
problem-statement/information model
document. It would be better to just point to the
description in the other document.
This is something that we debated about quite a bit. In the end
we thought it more
useful for implementers to have the description of all
mandatory fields right in the
format draft.
Again, this is something that IESG review has put pressure against
in many recent
documents. Expect to defend this during that review. At the very
least, make it clear
which document is authoritative should they conflict.
I'll do my best to defend this because I think it is what is best for this document and its 'one stop shop' readability for implementers. If this turns out to be a battle not worth waging with the IESG then I will concede and add a reference.
Please consider scrubbing the message used in example 5
back to the minimal
number of octets you need to make the example useful. In
particular, consider
removing elements that have not already been standardized.
I'm surprised to see
the literal strings "server-tx" and "client-tx" in the
example log message. Was
that what you really intended?
I'll look to trim down the sample message in Section 5 to a
minimal set of fields.
The use of server-txn and client-txn was intentional and done
to parallel the usage in
the data model draft. If you think a specific value is best,
then I'll make one up.
It's confusing as is - the example is supposed to be an actual log
message, not a
pseudo-code description of one, isn't it?
Yes, it should be an actual log message. I'll update as requested.
Thanks,
Gonzalo