
Re: [Sip] ABNF issue



Hi,

> Blindly doing a reverse token substitution will produce an incorrect
> parse. You can only apply it when you have proven the left side of
> the rule fits in a valid parse of the message.

[CHH] Correct. I "shift" a message-header as soon as I have found one CRLF (the end of
the header), and then I am done with that header. I then continue parsing, and if the
previous message-header was the last one I expect to parse a second CRLF, which
identifies the end of the SIP header part of the message. So far I agree. What we
disagree on is your claim that the definition of LWS is no longer valid at that point
(even though we are still parsing the SIP part of the message), and that in this
case a CRLF is a CRLF, no matter what comes after it.
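
To make the shifting concrete, here is a minimal sketch of a header loop that shifts
one message-header per CRLF and stops at the CRLF that follows the last header. The
function name shift_headers is hypothetical, the start line is assumed to have been
consumed already, and a CRLF followed by SP/HTAB is treated as a fold only while a
header is still open. It is one possible reading, not a statement of what the spec
requires.

```python
def shift_headers(data: bytes):
    """Split the header section into headers and the remaining octets.

    Assumption: `data` starts right after the Request-Line/Status-Line.
    Raises ValueError if the terminating empty line is missing.
    """
    headers = []
    pos = 0
    while True:
        # A CRLF where a new header would start marks the end of the headers.
        if data.startswith(b"\r\n", pos):
            return headers, data[pos + 2:]
        end = data.index(b"\r\n", pos)
        # Folding: a CRLF followed by SP or HTAB continues the open header.
        while data[end + 2:end + 3] in (b" ", b"\t"):
            end = data.index(b"\r\n", end + 2)
        headers.append(data[pos:end])  # "shift" the completed header
        pos = end + 2


folded = b"Call-ID: 67890@sip.com\r\nSubject: a\r\n b\r\n\r\nbody"
print(shift_headers(folded))
```

With this loop, whitespace that follows the empty line is never offered to the LWS
rule at all; it ends up in the second element of the returned tuple.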

> So from your original example. LWS is not defined inside a body - the
> body is just a sequence of octets from the perspective of the SIP
> grammar. Your SIP parser _cannot_ apply a match against LWS inside it -
> there is no valid expansion of the SIP grammar that has LWS appearing
> after CRLF CRLF

[CHH] I don't think there is such a thing as CRLF CRLF in the current SIP grammar (I
think there used to be one, though). Again, the message-header is shifted once I have
found one CRLF, so there will be no CRLF CRLF shift.

> >>The BNF specifies that message-header ends with a CRLF.
> >
> >Yes.
> >
> >>What follows that is one of two things - either another CRLF, marking the end
> >>of the headers, or another message header, which never begins with LWS.
> >
> >From a SIP grammar point of view that is correct. However, I don't see it said
> >anywhere that the definition of LWS isn't valid at the beginning of a new line.
>
> It _is_ valid after CRLF (which is what I think you mean by a new line)
> if (and only if) you are still in the expansion of message-header.

[CHH] That is what I am asking: where is it said that LWS is only valid within the
context of a message-header?

I do agree that LWS is not valid within the message-body part of the message, once we
get there, but I don't see it said anywhere that I can't parse an LWS AFTER I've
shifted a message-header, while I'm still in the SIP part of the message. True, the SIP
grammar may not allow it, and we may return a parser error, but that is a separate issue.

> >Also, just for the record, where is it said that a sip-header can't start with
> >LWS? Where is it said that line folding can't be used at the beginning of a line?
> >The text only says that the sip-header field values can be folded onto multiple
> >lines.
> I don't know what you mean by beginning of a line here. I assume you are
> asking why we cant have a header that looks like (\b for blank):
> \b\bNewHeader: foo which is a "\b\bNewheader"?
>
> The grammar answers that. extension-header starts with a token.

[CHH] Of course the LWS would not be part of the message-header grammar itself. LWS is
not even part of the message-header grammar within a header (i.e. the SIP grammar
doesn't require any LWS to be parsed). The grammar only says that LWS can be used in
message-header parts for line folding, but it's not part of the grammar.

Let's take an example:

INVITE sip:12345@sip.com SIP/2.0 <CRLF_1>
Call-ID: 67890@sip.com <CRLF_2>
<CRLF_3>
<SP_1>CSeq: 100 INVITE <CRLF_4>

Now, the parser shifts the Call-ID header when it has found CRLF_2.

Then, it finds CRLF_3 and SP_1, and returns LWS. So, there is a line-folding before the
beginning of the CSeq header. Again, the LWS is NOT part of the CSeq header itself, but
my question is: where is it said that this LWS is NOT allowed at the beginning of a
line, before the actual message-header, but only WITHIN a message-header? Have I missed
it in the text?
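
The token-level reading in this example can be reproduced with a small sketch. It
assumes the RFC 3261 definition LWS = [*WSP CRLF] 1*WSP and simply tries that pattern
at the position right after a given CRLF, without consulting the rest of the grammar.
The helper name lws_after is hypothetical, and the spaces shown before the <CRLF_n>
markers above are treated as layout only and left out of the byte string.

```python
import re

# LWS = [*WSP CRLF] 1*WSP, with WSP = SP / HTAB
LWS = re.compile(rb"(?:[ \t]*\r\n)?[ \t]+")

MESSAGE = (
    b"INVITE sip:12345@sip.com SIP/2.0\r\n"  # ends with CRLF_1
    b"Call-ID: 67890@sip.com\r\n"            # ends with CRLF_2
    b"\r\n"                                  # CRLF_3
    b" CSeq: 100 INVITE\r\n"                 # SP_1 ... CRLF_4
)


def lws_after(msg: bytes, marker: bytes):
    """Try LWS right after the CRLF that ends the first line containing `marker`."""
    pos = msg.index(b"\r\n", msg.index(marker)) + 2
    m = LWS.match(msg, pos)
    return m.group(0) if m else None


print(lws_after(MESSAGE, b"Call-ID"))  # b'\r\n '  (CRLF_3 followed by SP_1)
print(lws_after(MESSAGE, b"INVITE"))   # None     (CRLF_1 is followed by "Call-ID")
```

Taken in isolation, the pattern does match CRLF_3 + SP_1. Whether a parser is allowed
to try the pattern at that position is exactly the question being discussed.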

> >>Thus, if a CRLF is followed by something OTHER than WSP, the next construction
> >>cannot be LWS.
> >
> >Maybe I missunderstand you, but the case I try to describe is when the CRLF IS
> >followed by WSP.
> If its in a valid expansion of message-header (which will not happen
> across a CRLFCRLF string), then you are looking at LWS.

[CHH] I may be wrong, but again I don't think there is such a thing as CRLFCRLF
(DOUBLE-CRLF) in the current SIP grammar.

Each CRLF is treated separately, and if there is another CRLF after a message-header
has been shifted we have found the end of the SIP part of the message. I don't argue on
that point.

My issue is that once we've shifted a message-header, why is the definition of LWS no
longer valid, since we are still parsing the SIP part of the message?

> The _only_ place that second CRLF matches in the grammar is the literal in
> the Request and Response rules.

[CHH] Again, the point is not where it matches in the grammar. I am not talking about
the SIP grammar, but about the definition of the parts/tokens building the grammar.

> So for your example, any whitespace following that second CRLF matches message-body =
> *OCTET, NOT LWS.

[CHH] Again, I don't think we have any token called the "second CRLF" in the SIP
grammar.

I also think it's important to have a look at the definition of OCTET. RFC2234 says:

OCTET          =  %x00-FF

SP                  =  %x20

Now, if we say that CRLF/SP is LWS, but that CRLF/OCTET is not LWS, don't we have a
conflict here, since %x20 is part of both SP and OCTET?
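
The overlap itself is easy to check. The sketch below only encodes the two RFC 2234
ranges quoted above and lists which of them a given octet value falls into; the
function name rules_matching is hypothetical. It says nothing about which rule a
parser should pick, which would have to come from its position in the grammar.

```python
OCTET = range(0x00, 0x100)  # OCTET = %x00-FF
SP = range(0x20, 0x21)      # SP    = %x20


def rules_matching(byte: int):
    """Return the names of the core rules whose value range contains `byte`."""
    names = []
    if byte in SP:
        names.append("SP")
    if byte in OCTET:
        names.append("OCTET")
    return names


print(rules_matching(0x20))  # ['SP', 'OCTET']
print(rules_matching(0x41))  # ['OCTET']
```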

Anyway, I don't want to go on with this argument forever. I am not an ABNF expert, and
it is possible to fix this on the implementation level, so...

Regards,

Christer Holmberg
Ericsson Finland


