[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Sip] Is body handling NP-complete?
Referring back to last March, in
http://www.ietf.org/mail-archive/web/sip/current/msg22426.html, I was
contemplating whether the rules for testing a message body required
searching through combinations of choices regarding which alternatives
of multipart/alternatives are chosen. As I then conceptualized the
problem, the question was "Is there a choice of alternatives such that
all the constraints for 'successful processing' can be met?" It turns
out that if you phrase the question that way, the problem is
NP-complete, which means in practice that any algorithm to solve it
must do backtracking.
The new formulation of the problem attempts to avoid this by dividing
the problem into three phases: (1) assemble a tree which represents
the dependencies of super-parts' being processable upon that of
sub-parts, (2) walk the tree, tagging parts from the bottom up as to
whether they are processable, (3) if the overall tree is processable,
selecting for processing the correct subset of parts as directed by
the tags in the tree. As long as the size of the tree is commensurate
with the number of body parts, the algorithm is inherently linear, and
clearly it doesn't involve backtracking.
In regard to step (1), the tree is assembled:
- the root node represents the message body
- a node for a multipart/alternative part has one child for each
component
- a node for a multipart/mixed part has one child for each
component
- a note for a multipart/related part is not automatically given
children for its components
- a header in a part that rFrom sip-bounces at ietf.org Mon Sep 29 20:09:56 2008
Return-Path: <sip-bounces at ietf.org>
X-Original-To: sip-archive at optimus.ietf.org
Delivered-To: ietfarch-sip-archive at core3.amsl.com
Received: from [127.0.0.1] (localhost [127.0.0.1])
by core3.amsl.com (Postfix) with ESMTP id CC4E93A689D;
Mon, 29 Sep 2008 20:09:56 -0700 (PDT)
X-Original-To: sip at core3.amsl.com
Delivered-To: sip at core3.amsl.com
Received: from localhost (localhost [127.0.0.1])
by core3.amsl.com (Postfix) with ESMTP id C57F13A6955
for <sip at core3.amsl.com>; Mon, 29 Sep 2008 20:09:53 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.999
X-Spam-Level:
X-Spam-Status: No, score=-2.999 tagged_above=-999 required=5
tests=[AWL=-0.600, BAYES_00=-2.599, J_CHICKENPOX_14=0.6,
J_CHICKENPOX_15=0.6, RCVD_IN_DNSWL_LOW=-1]
Received: from mail.ietf.org ([64.170.98.32])
by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024)
with ESMTP id rfHMfP0Z7JaO for <sip at core3.amsl.com>;
Mon, 29 Sep 2008 20:09:51 -0700 (PDT)
Received: from TheWorld.com (pcls1.std.com [192.74.137.141])
by core3.amsl.com (Postfix) with ESMTP id E3EEC3A6924
for <sip at ietf.org>; Mon, 29 Sep 2008 20:09:50 -0700 (PDT)
Received: from shell.TheWorld.com (root at shell01.theworld.com [192.74.137.71])
by TheWorld.com (8.13.6/8.13.6) with ESMTP id m8U39w9U003332
for <sip at ietf.org>; Mon, 29 Sep 2008 23:10:00 -0400
Received: from shell01.TheWorld.com (localhost.theworld.com [127.0.0.1])
by shell.TheWorld.com (8.13.6/8.12.8) with ESMTP id m8U39whX37110736
for <sip at ietf.org>; Mon, 29 Sep 2008 23:09:58 -0400 (EDT)
Received: (from worley at localhost)
by shell01.TheWorld.com (8.13.6/8.13.6/Submit) id m8U39wbI37097791;
Mon, 29 Sep 2008 23:09:58 -0400 (EDT)
Date: Mon, 29 Sep 2008 23:09:58 -0400 (EDT)
Message-Id: <200809300309.m8U39wbI37097791 at shell01.TheWorld.com>
To: sip at ietf.org
From: Dale.Worley at comcast.net
Subject: [Sip] Is body handling NP-complete?
X-BeenThere: sip at ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Session Initiation Protocol <sip.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/sip>,
<mailto:sip-request at ietf.org?subject=unsubscribe>
List-Post: <mailto:sip at ietf.org>
List-Help: <mailto:sip-request at ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/sip>,
<mailto:sip-request at ietf.org?subject=subscribe>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: sip-bounces at ietf.org
Errors-To: sip-bounces at ietf.org
Referring back to last March, in
http://www.ietf.org/mail-archive/web/sip/current/msg22426.html, I was
contemplating whether the rules for testing a message body required
searching through combinations of choices regarding which alternatives
of multipart/alternatives are chosen. As I then conceptualized the
problem, the question was "Is there a choice of alternatives such that
all the constraints for 'successful processing' can be met?" It turns
out that if you phrase the question that way, the problem is
NP-complete, which means in practice that any algorithm to solve it
must do backtracking.
The new formulation of the problem attempts to avoid this by dividing
the problem into three phases: (1) assemble a tree which represents
the dependencies of super-parts' being processable upon that of
sub-parts, (2) walk the tree, tagging parts from the bottom up as to
whether they are processable, (3) if the overall tree is processable,
selecting for processing the correct subset of parts as directed by
the tags in the tree. As long as the size of the tree is commensurate
with the number of body parts, the algorithm is inherently linear, and
clearly it doesn't involve backtracking.
In regard to step (1), the tree is assembled:
- the root node represents the message body
- a node for a multipart/alternative part has one child for each
component
- a node for a multipart/mixed part has one child for each
component
- a note for a multipart/related part is not automatically given
children for its components
- a header in a part that refers toefers to another component generates a
child of that part that represents the referred-to component
- similarly, a reference within a part that refers to another
component generates a child of that part that represents the
referred-to component (This is how multipart/related nodes get
children.)
Of course, if a component is referred to multiple times, the component
is represented by multiple nodes in the tree, one node for each of its
roles.
However, there are subtleties I might be overlooking, so I am going to
revisit the embedding of the NP-complete problem "3-satisfiability"
into MIME to see what it reveals about my sketch of an algorithm.
Referring to the previous message:
The reduction is as follows:
http://en.wikipedia.org/wiki/NP-complete
http://en.wikipedia.org/wiki/Reduction_%28complexity%29
http://en.wikipedia.org/wiki/3-satisfiability#3-satisfiability
Choose an instance of the 3-satisfiability problem, that is: a set of
boolean variables A, B, C, etc. and a formula composed of:
a series of "clauses" joined by AND, consisting of
a series of variables and their negations joined by OR.
The problem is to determine if there is any assignment of truth values
to the variables which makes the formula true, that is, which makes
true one of the variables or negations in each of the clauses.
For example, choose 4 variables A, B, C, and D, and the formula
(A OR B OR C) AND (B OR NOT-C OR D) AND (NOT-A OR NOT-B OR NOT-D)
This can be satisfied by setting A=TRUE, B=FALSE, C=TRUE, and D=FALSE.
Construct a multipart/mixed with each component labeled
handling=required. Within that, for each variable, have a part which
is multipart/alternate with 2 parts. If processing chooses the first
part of the part corresponding to a variable, that will model the
variable being TRUE. If processing chooses the second part of the
part corresponding to a variable, that will model the variable being
FALSE.
For example, our body starts:
multipart/mixed
multipart/alternative;handling=required
first part corresponding to A (not yet specified)
second part corresponding to NOT-A
multipart/alternative;handling=required
first part corresponding to B
second part corresponding to NOT-B
multipart/alternative;handling=required
first part corresponding to C
second part corresponding to NOT-C
multipart/alternative;handling=required
first part corresponding to D
second part corresponding to NOT-D
For each clause of the formula, add an additional body part, marked
"Content-Disposition: by-reference; handling=required". Give the part
a 'cid' URL. The ability of the recipient to successfully process
this part will model making the corresponding clause TRUE.
For example, our body is now:
multipart/mixed
multipart/alternative;handling=required
first part corresponding to A (not yet specified)
second part corresponding to NOT-A
multipart/alternative;handling=required
first part corresponding to B
second part corresponding to NOT-B
multipart/alternative;handling=required
first part corresponding to C
second part corresponding to NOT-C
multipart/alternative;handling=required
first part corresponding to D
second part corresponding to NOT-D
part (not otherwise specified);by-reference;handling=required;cid=clause1
part (not otherwise specified);by-reference;handling=required;cid=clause2
part (not otherwise specified);by-reference;handling=required;cid=clause3
We now want to arrange things so that one of the last 3 parts can be
successfully processed only if we choose for processing one of the
earlier parts that corresponds to a variable value that will make the
corresponding clause true. This is done by embedding forward
references in the variable-corresponding parts that refer to the
clause-corresponding parts.
For example:
1 multipart/mixed
2 multipart/alternative;handling=required
3 first part corresponding to A
references cid:clause1
4 second part corresponding to NOT-A
references cid:clause3
5 multipart/alternative;handling=required
6 first part corresponding to B
references cid:clause1
references cid:clause2
7 second part corresponding to NOT-B
references cid:clause3
8 multipart/alternative;handling=required
9 first part corresponding to C
references cid:clause1
10 second part corresponding to NOT-C
references cid:clause2
11 multipart/alternative;handling=required
12 first part corresponding to D
references cid:clause2
13 second part corresponding to NOT-D
references cid:clause3
14 part;by-reference;handling=required;cid=clause1
15 part;by-reference;handling=required;cid=clause2
16 part;by-reference;handling=required;cid=clause3
In order to process this body, the recipient must choose exactly one
of each pair of bodies in the multipart/alternative body parts. This
choice is equivalent to choosing a set of values for the variables.
The references embedded in the chosen parts refer to the body parts
that correspond to the clauses which are made true by the
corresponding variable's value. In order to successfully process the
final body parts, the recipient must have chosen a set of variable
values that makes all of the clauses true.
The resulting tree is:
1 -+- 2 --+- 3 ---- 14
| |
| +- 4 ---- 16
|
+- 5 --+- 6 --+- 14
| | |
| | +- 15
| |
| +- 7 ---- 16
|
+- 8 --+- 9 ---- 14
| |
| +- 10 --- 15
|
+- 11 -+- 12 --- 15
| |
| +- 13 --- 16
|
+- 14
|
+- 15
|
+- 16
Though the number of nodes is greater than the number of body parts,
it's limited approximately by the number of appearances of variables
in the original boolean formula, so it's linear in the number of body
parts. So in building the tree, we don't have exponential effort.
In the new approach, we walk the tree, tagging each node as to whether
it is processable. In the above model, we assume that all the simple
parts are processable, so all nodes labeled 2 through 16 are marked
processable. And as we tag the nodes for processability, we choose
for each multipart/alternative node which child will be processed
(namely, the last child that is processable). In this tree, the
choices are 2 -> 4, 5 -> 7, 8 -> 10, and 11 -> 13.
Where things get interesting is deciding if node 1 is processable.
All of its children are processable, of course. But the crux of the
construction is how the "by-reference;handling=required" constraints
on nodes 14, 15, and 16 are handled. In this algorithm, we check
whether each part is referenced by a part that will be processed.
Fortunately, all references to these nodes must have been tree-walked
before node 1 is examined, so can determine unambiguously whether used
references point to those nodes. In this case, we have chosen to
process 2, 4, 5, 7, 8, 10, 11, and 13, which means that there are used
references to 16, 16, 15, and 16. This leaves 17 unreferenced, and so
node 1 is determined to be unprocessable.
Note that the new algorithm avoids being NP-complete essentially by
making choices as soon as they are seen. In this model, values NOT-A,
NOT-B, NOT-C, and NOT-D are chosen outright. Since the example
formula is not satisfied, the body that encodes it is not processable.
This example also points out that the requirement that references
appear before the parts they reference is needed to make the problem
tractable. Otherwise, there could be circular dependencies between
the choices for different multipart/alternatives, which might be
impossible to analyze by an algorithm as simple as a tree-walk.
Dale
_______________________________________________
Sip mailing list https://www.ietf.org/mailman/listinfo/sip
This list is for NEW development of the core SIP Protocol
Use sip-implementors at cs.columbia.edu for questions on current sip
Use sipping at ietf.org for new developments on the application of sip
another component generates a
child of that part that represents the referred-to component
- similarly, a reference within a part that refers to another
component generates a child of that part that represents the
referred-to component (This is how multipart/related nodes get
children.)
Of course, if a component is referred to multiple times, the component
is represented by multiple nodes in the tree, one node for each of its
roles.
However, there are subtleties I might be overlooking, so I am going to
revisit the embedding of the NP-complete problem "3-satisfiability"
into MIME to see what it reveals about my sketch of an algorithm.
Referring to the previous message:
The reduction is as follows:
http://en.wikipedia.org/wiki/NP-complete
http://en.wikipedia.org/wiki/Reduction_%28complexity%29
http://en.wikipedia.org/wiki/3-satisfiability#3-satisfiability
Choose an instance of the 3-satisfiability problem, that is: a set of
boolean variables A, B, C, etc. and a formula composed of:
a series of "clauses" joined by AND, consisting of
a series of variables and their negations joined by OR.
The problem is to determine if there is any assignment of truth values
to the variables which makes the formula true, that is, which makes
true one of the variables or negations in each of the clauses.
For example, choose 4 variables A, B, C, and D, and the formula
(A OR B OR C) AND (B OR NOT-C OR D) AND (NOT-A OR NOT-B OR NOT-D)
This can be satisfied by setting A=TRUE, B=FALSE, C=TRUE, and D=FALSE.
Construct a multipart/mixed with each component labeled
handling=required. Within that, for each variable, have a part which
is multipart/alternate with 2 parts. If processing chooses the first
part of the part corresponding to a variable, that will model the
variable being TRUE. If processing chooses the second part of the
part corresponding to a variable, that will model the variable being
FALSE.
For example, our body starts:
multipart/mixed
multipart/alternative;handling=required
first part corresponding to A (not yet specified)
second part corresponding to NOT-A
multipart/alternative;handling=required
first part corresponding to B
second part corresponding to NOT-B
multipart/alternative;handling=required
first part corresponding to C
second part corresponding to NOT-C
multipart/alternative;handling=required
first part corresponding to D
second part corresponding to NOT-D
For each clause of the formula, add an additional body part, marked
"Content-Disposition: by-reference; handling=required". Give the part
a 'cid' URL. The ability of the recipient to successfully process
this part will model making the corresponding clause TRUE.
For example, our body is now:
multipart/mixed
multipart/alternative;handling=required
first part corresponding to A (not yet specified)
second part corresponding to NOT-A
multipart/alternative;handling=required
first part corresponding to B
second part corresponding to NOT-B
multipart/alternative;handling=required
first part corresponding to C
second part corresponding to NOT-C
multipart/alternative;handling=required
first part corresponding to D
second part corresponding to NOT-D
part (not otherwise specified);by-reference;handling=required;cid=clause1
part (not otherwise specified);by-reference;handling=required;cid=clause2
part (not otherwise specified);by-reference;handling=required;cid=clause3
We now want to arrange things so that one of the last 3 parts can be
successfully processed only if we choose for processing one of the
earlier parts that corresponds to a variable value that will make the
corresponding clause true. This is done by embedding forward
references in the variable-corresponding parts that refer to the
clause-corresponding parts.
For example:
1 multipart/mixed
2 multipart/alternative;handling=required
3 first part corresponding to A
references cid:clause1
4 second part corresponding to NOT-A
references cid:clause3
5 multipart/alternative;handling=required
6 first part corresponding to B
references cid:clause1
references cid:clause2
7 second part corresponding to NOT-B
references cid:clause3
8 multipart/alternative;handling=required
9 first part corresponding to C
references cid:clause1
10 second part corresponding to NOT-C
references cid:clause2
11 multipart/alternative;handling=required
12 first part corresponding to D
references cid:clause2
13 second part corresponding to NOT-D
references cid:clause3
14 part;by-reference;handling=required;cid=clause1
15 part;by-reference;handling=required;cid=clause2
16 part;by-reference;handling=required;cid=clause3
In order to process this body, the recipient must choose exactly one
of each pair of bodies in the multipart/alternative body parts. This
choice is equivalent to choosing a set of values for the variables.
The references embedded in the chosen parts refer to the body parts
that correspond to the clauses which are made true by the
corresponding variable's value. In order to successfully process the
final body parts, the recipient must have chosen a set of variable
values that makes all of the clauses true.
The resulting tree is:
1 -+- 2 --+- 3 ---- 14
| |
| +- 4 ---- 16
|
+- 5 --+- 6 --+- 14
| | |
| | +- 15
| |
| +- 7 ---- 16
|
+- 8 --+- 9 ---- 14
| |
| +- 10 --- 15
|
+- 11 -+- 12 --- 15
| |
| +- 13 --- 16
|
+- 14
|
+- 15
|
+- 16
Though the number of nodes is greater than the number of body parts,
it's limited approximately by the number of appearances of variables
in the original boolean formula, so it's linear in the number of body
parts. So in building the tree, we don't have exponential effort.
In the new approach, we walk the tree, tagging each node as to whether
it is processable. In the above model, we assume that all the simple
parts are processable, so all nodes labeled 2 through 16 are marked
processable. And as we tag the nodes for processability, we choose
for each multipart/alternative node which child will be processed
(namely, the last child that is processable). In this tree, the
choices are 2 -> 4, 5 -> 7, 8 -> 10, and 11 -> 13.
Where things get interesting is deciding if node 1 is processable.
All of its children are processable, of course. But the crux of the
construction is how the "by-reference;handling=required" constraints
on nodes 14, 15, and 16 are handled. In this algorithm, we check
whether each part is referenced by a part that will be processed.
Fortunately, all references to these nodes must have been tree-walked
before node 1 is examined, so can determine unambiguously whether used
references point to those nodes. In this case, we have chosen to
process 2, 4, 5, 7, 8, 10, 11, and 13, which means that there are used
references to 16, 16, 15, and 16. This leaves 17 unreferenced, and so
node 1 is determined to be unprocessable.
Note that the new algorithm avoids being NP-complete essentially by
making choices as soon as they are seen. In this model, values NOT-A,
NOT-B, NOT-C, and NOT-D are chosen outright. Since the example
formula is not satisfied, the body that encodes it is not processable.
This example also points out that the requirement that references
appear before the parts they reference is needed to make the problem
tractable. Otherwise, there could be circular dependencies between
the choices for different multipart/alternatives, which might be
impossible to analyze by an algorithm as simple as a tree-walk.
Dale
_______________________________________________
Sip mailing list https://www.ietf.org/mailman/listinfo/sip
This list is for NEW development of the core SIP Protocol
Use sip-implementors at cs.columbia.edu for questions on current sip
Use sipping at ietf.org for new developments on the application of sip