[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Sip] Is body handling NP-complete?



Referring back to last March, in
http://www.ietf.org/mail-archive/web/sip/current/msg22426.html, I was
contemplating whether the rules for testing a message body required
searching through combinations of choices regarding which alternatives
of multipart/alternatives are chosen.  As I then conceptualized the
problem, the question was "Is there a choice of alternatives such that
all the constraints for 'successful processing' can be met?"  It turns
out that if you phrase the question that way, the problem is
NP-complete, which means in practice that any algorithm to solve it
must do backtracking.

The new formulation of the problem attempts to avoid this by dividing
the problem into three phases:  (1) assemble a tree which represents
the dependencies of super-parts' being processable upon that of
sub-parts, (2) walk the tree, tagging parts from the bottom up as to
whether they are processable, (3) if the overall tree is processable,
selecting for processing the correct subset of parts as directed by
the tags in the tree.  As long as the size of the tree is commensurate
with the number of body parts, the algorithm is inherently linear, and
clearly it doesn't involve backtracking.

In regard to step (1), the tree is assembled:

- the root node represents the message body
- a node for a multipart/alternative part has one child for each
  component
- a node for a multipart/mixed part has one child for each
  component
- a note for a multipart/related part is not automatically given
  children for its components
- a header in a part that rFrom sip-bounces at ietf.org  Mon Sep 29 20:09:56 2008
Return-Path: <sip-bounces at ietf.org>
X-Original-To: sip-archive at optimus.ietf.org
Delivered-To: ietfarch-sip-archive at core3.amsl.com
Received: from [127.0.0.1] (localhost [127.0.0.1])
	by core3.amsl.com (Postfix) with ESMTP id CC4E93A689D;
	Mon, 29 Sep 2008 20:09:56 -0700 (PDT)
X-Original-To: sip at core3.amsl.com
Delivered-To: sip at core3.amsl.com
Received: from localhost (localhost [127.0.0.1])
	by core3.amsl.com (Postfix) with ESMTP id C57F13A6955
	for <sip at core3.amsl.com>; Mon, 29 Sep 2008 20:09:53 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.999
X-Spam-Level: 
X-Spam-Status: No, score=-2.999 tagged_above=-999 required=5
	tests=[AWL=-0.600, BAYES_00=-2.599, J_CHICKENPOX_14=0.6,
	J_CHICKENPOX_15=0.6, RCVD_IN_DNSWL_LOW=-1]
Received: from mail.ietf.org ([64.170.98.32])
	by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024)
	with ESMTP id rfHMfP0Z7JaO for <sip at core3.amsl.com>;
	Mon, 29 Sep 2008 20:09:51 -0700 (PDT)
Received: from TheWorld.com (pcls1.std.com [192.74.137.141])
	by core3.amsl.com (Postfix) with ESMTP id E3EEC3A6924
	for <sip at ietf.org>; Mon, 29 Sep 2008 20:09:50 -0700 (PDT)
Received: from shell.TheWorld.com (root at shell01.theworld.com [192.74.137.71])
	by TheWorld.com (8.13.6/8.13.6) with ESMTP id m8U39w9U003332
	for <sip at ietf.org>; Mon, 29 Sep 2008 23:10:00 -0400
Received: from shell01.TheWorld.com (localhost.theworld.com [127.0.0.1])
	by shell.TheWorld.com (8.13.6/8.12.8) with ESMTP id m8U39whX37110736
	for <sip at ietf.org>; Mon, 29 Sep 2008 23:09:58 -0400 (EDT)
Received: (from worley at localhost)
	by shell01.TheWorld.com (8.13.6/8.13.6/Submit) id m8U39wbI37097791;
	Mon, 29 Sep 2008 23:09:58 -0400 (EDT)
Date: Mon, 29 Sep 2008 23:09:58 -0400 (EDT)
Message-Id: <200809300309.m8U39wbI37097791 at shell01.TheWorld.com>
To: sip at ietf.org
From: Dale.Worley at comcast.net
Subject: [Sip] Is body handling NP-complete?
X-BeenThere: sip at ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: Session Initiation Protocol <sip.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/sip>,
	<mailto:sip-request at ietf.org?subject=unsubscribe>
List-Post: <mailto:sip at ietf.org>
List-Help: <mailto:sip-request at ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/sip>,
	<mailto:sip-request at ietf.org?subject=subscribe>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: sip-bounces at ietf.org
Errors-To: sip-bounces at ietf.org

Referring back to last March, in
http://www.ietf.org/mail-archive/web/sip/current/msg22426.html, I was
contemplating whether the rules for testing a message body required
searching through combinations of choices regarding which alternatives
of multipart/alternatives are chosen.  As I then conceptualized the
problem, the question was "Is there a choice of alternatives such that
all the constraints for 'successful processing' can be met?"  It turns
out that if you phrase the question that way, the problem is
NP-complete, which means in practice that any algorithm to solve it
must do backtracking.

The new formulation of the problem attempts to avoid this by dividing
the problem into three phases:  (1) assemble a tree which represents
the dependencies of super-parts' being processable upon that of
sub-parts, (2) walk the tree, tagging parts from the bottom up as to
whether they are processable, (3) if the overall tree is processable,
selecting for processing the correct subset of parts as directed by
the tags in the tree.  As long as the size of the tree is commensurate
with the number of body parts, the algorithm is inherently linear, and
clearly it doesn't involve backtracking.

In regard to step (1), the tree is assembled:

- the root node represents the message body
- a node for a multipart/alternative part has one child for each
  component
- a node for a multipart/mixed part has one child for each
  component
- a note for a multipart/related part is not automatically given
  children for its components
- a header in a part that refers toefers to another component generates a
  child of that part that represents the referred-to component
- similarly, a reference within a part that refers to another
  component generates a child of that part that represents the
  referred-to component (This is how multipart/related nodes get
  children.)

Of course, if a component is referred to multiple times, the component
is represented by multiple nodes in the tree, one node for each of its
roles.

However, there are subtleties I might be overlooking, so I am going to
revisit the embedding of the NP-complete problem "3-satisfiability"
into MIME to see what it reveals about my sketch of an algorithm.

Referring to the previous message:

    The reduction is as follows:

    http://en.wikipedia.org/wiki/NP-complete
    http://en.wikipedia.org/wiki/Reduction_%28complexity%29
    http://en.wikipedia.org/wiki/3-satisfiability#3-satisfiability

    Choose an instance of the 3-satisfiability problem, that is: a set of
    boolean variables A, B, C, etc. and a formula composed of:
	    a series of "clauses" joined by AND, consisting of
		    a series of variables and their negations joined by OR.
    The problem is to determine if there is any assignment of truth values
    to the variables which makes the formula true, that is, which makes
    true one of the variables or negations in each of the clauses.

    For example, choose 4 variables A, B, C, and D, and the formula

    (A OR B OR C) AND (B OR NOT-C OR D) AND (NOT-A OR NOT-B OR NOT-D)

    This can be satisfied by setting A=TRUE, B=FALSE, C=TRUE, and D=FALSE.

    Construct a multipart/mixed with each component labeled
    handling=required.  Within that, for each variable, have a part which
    is multipart/alternate with 2 parts.  If processing chooses the first
    part of the part corresponding to a variable, that will model the
    variable being TRUE.  If processing chooses the second part of the
    part corresponding to a variable, that will model the variable being
    FALSE.

    For example, our body starts:

	    multipart/mixed
		    multipart/alternative;handling=required
			    first part corresponding to A (not yet specified)
			    second part corresponding to NOT-A
		    multipart/alternative;handling=required
			    first part corresponding to B
			    second part corresponding to NOT-B
		    multipart/alternative;handling=required
			    first part corresponding to C
			    second part corresponding to NOT-C
		    multipart/alternative;handling=required
			    first part corresponding to D
			    second part corresponding to NOT-D

    For each clause of the formula, add an additional body part, marked
    "Content-Disposition: by-reference; handling=required".  Give the part
    a 'cid' URL.  The ability of the recipient to successfully process
    this part will model making the corresponding clause TRUE.

    For example, our body is now:

	    multipart/mixed
		    multipart/alternative;handling=required
			    first part corresponding to A (not yet specified)
			    second part corresponding to NOT-A
		    multipart/alternative;handling=required
			    first part corresponding to B
			    second part corresponding to NOT-B
		    multipart/alternative;handling=required
			    first part corresponding to C
			    second part corresponding to NOT-C
		    multipart/alternative;handling=required
			    first part corresponding to D
			    second part corresponding to NOT-D
		    part (not otherwise specified);by-reference;handling=required;cid=clause1
		    part (not otherwise specified);by-reference;handling=required;cid=clause2
		    part (not otherwise specified);by-reference;handling=required;cid=clause3

    We now want to arrange things so that one of the last 3 parts can be
    successfully processed only if we choose for processing one of the
    earlier parts that corresponds to a variable value that will make the
    corresponding clause true.  This is done by embedding forward
    references in the variable-corresponding parts that refer to the
    clause-corresponding parts.

    For example:

 1	    multipart/mixed
 2		    multipart/alternative;handling=required
 3			    first part corresponding to A
				    references cid:clause1
 4			    second part corresponding to NOT-A
				    references cid:clause3
 5		    multipart/alternative;handling=required
 6			    first part corresponding to B
				    references cid:clause1
				    references cid:clause2
 7			    second part corresponding to NOT-B
				    references cid:clause3
 8		    multipart/alternative;handling=required
 9			    first part corresponding to C
				    references cid:clause1
10			    second part corresponding to NOT-C
				    references cid:clause2
11		    multipart/alternative;handling=required
12			    first part corresponding to D
				    references cid:clause2
13			    second part corresponding to NOT-D
				    references cid:clause3
14		    part;by-reference;handling=required;cid=clause1
15		    part;by-reference;handling=required;cid=clause2
16		    part;by-reference;handling=required;cid=clause3

    In order to process this body, the recipient must choose exactly one
    of each pair of bodies in the multipart/alternative body parts.  This
    choice is equivalent to choosing a set of values for the variables.
    The references embedded in the chosen parts refer to the body parts
    that correspond to the clauses which are made true by the
    corresponding variable's value.  In order to successfully process the
    final body parts, the recipient must have chosen a set of variable
    values that makes all of the clauses true.

The resulting tree is:

	1 -+- 2 --+- 3 ---- 14
	   |      |
	   |      +- 4 ---- 16
	   |
	   +- 5 --+- 6 --+- 14
	   |      |      |
	   |      |      +- 15
	   |      |
	   |      +- 7 ---- 16
	   |
	   +- 8 --+- 9 ---- 14
	   |      |
	   |      +- 10 --- 15
	   |
	   +- 11 -+- 12 --- 15
	   |      |
	   |      +- 13 --- 16
	   |
	   +- 14
	   |
	   +- 15
	   |
	   +- 16

Though the number of nodes is greater than the number of body parts,
it's limited approximately by the number of appearances of variables
in the original boolean formula, so it's linear in the number of body
parts.  So in building the tree, we don't have exponential effort.

In the new approach, we walk the tree, tagging each node as to whether
it is processable.  In the above model, we assume that all the simple
parts are processable, so all nodes labeled 2 through 16 are marked
processable.  And as we tag the nodes for processability, we choose
for each multipart/alternative node which child will be processed
(namely, the last child that is processable).  In this tree, the
choices are 2 -> 4, 5 -> 7, 8 -> 10, and 11 -> 13.

Where things get interesting is deciding if node 1 is processable.
All of its children are processable, of course.  But the crux of the
construction is how the "by-reference;handling=required" constraints
on nodes 14, 15, and 16 are handled.  In this algorithm, we check
whether each part is referenced by a part that will be processed.
Fortunately, all references to these nodes must have been tree-walked
before node 1 is examined, so can determine unambiguously whether used
references point to those nodes.  In this case, we have chosen to
process 2, 4, 5, 7, 8, 10, 11, and 13, which means that there are used
references to 16, 16, 15, and 16.  This leaves 17 unreferenced, and so
node 1 is determined to be unprocessable.

Note that the new algorithm avoids being NP-complete essentially by
making choices as soon as they are seen.  In this model, values NOT-A,
NOT-B, NOT-C, and NOT-D are chosen outright.  Since the example
formula is not satisfied, the body that encodes it is not processable.

This example also points out that the requirement that references
appear before the parts they reference is needed to make the problem
tractable.  Otherwise, there could be circular dependencies between
the choices for different multipart/alternatives, which might be
impossible to analyze by an algorithm as simple as a tree-walk.

Dale
_______________________________________________
Sip mailing list  https://www.ietf.org/mailman/listinfo/sip
This list is for NEW development of the core SIP Protocol
Use sip-implementors at cs.columbia.edu for questions on current sip
Use sipping at ietf.org for new developments on the application of sip


 another component generates a
  child of that part that represents the referred-to component
- similarly, a reference within a part that refers to another
  component generates a child of that part that represents the
  referred-to component (This is how multipart/related nodes get
  children.)

Of course, if a component is referred to multiple times, the component
is represented by multiple nodes in the tree, one node for each of its
roles.

However, there are subtleties I might be overlooking, so I am going to
revisit the embedding of the NP-complete problem "3-satisfiability"
into MIME to see what it reveals about my sketch of an algorithm.

Referring to the previous message:

    The reduction is as follows:

    http://en.wikipedia.org/wiki/NP-complete
    http://en.wikipedia.org/wiki/Reduction_%28complexity%29
    http://en.wikipedia.org/wiki/3-satisfiability#3-satisfiability

    Choose an instance of the 3-satisfiability problem, that is: a set of
    boolean variables A, B, C, etc. and a formula composed of:
	    a series of "clauses" joined by AND, consisting of
		    a series of variables and their negations joined by OR.
    The problem is to determine if there is any assignment of truth values
    to the variables which makes the formula true, that is, which makes
    true one of the variables or negations in each of the clauses.

    For example, choose 4 variables A, B, C, and D, and the formula

    (A OR B OR C) AND (B OR NOT-C OR D) AND (NOT-A OR NOT-B OR NOT-D)

    This can be satisfied by setting A=TRUE, B=FALSE, C=TRUE, and D=FALSE.

    Construct a multipart/mixed with each component labeled
    handling=required.  Within that, for each variable, have a part which
    is multipart/alternate with 2 parts.  If processing chooses the first
    part of the part corresponding to a variable, that will model the
    variable being TRUE.  If processing chooses the second part of the
    part corresponding to a variable, that will model the variable being
    FALSE.

    For example, our body starts:

	    multipart/mixed
		    multipart/alternative;handling=required
			    first part corresponding to A (not yet specified)
			    second part corresponding to NOT-A
		    multipart/alternative;handling=required
			    first part corresponding to B
			    second part corresponding to NOT-B
		    multipart/alternative;handling=required
			    first part corresponding to C
			    second part corresponding to NOT-C
		    multipart/alternative;handling=required
			    first part corresponding to D
			    second part corresponding to NOT-D

    For each clause of the formula, add an additional body part, marked
    "Content-Disposition: by-reference; handling=required".  Give the part
    a 'cid' URL.  The ability of the recipient to successfully process
    this part will model making the corresponding clause TRUE.

    For example, our body is now:

	    multipart/mixed
		    multipart/alternative;handling=required
			    first part corresponding to A (not yet specified)
			    second part corresponding to NOT-A
		    multipart/alternative;handling=required
			    first part corresponding to B
			    second part corresponding to NOT-B
		    multipart/alternative;handling=required
			    first part corresponding to C
			    second part corresponding to NOT-C
		    multipart/alternative;handling=required
			    first part corresponding to D
			    second part corresponding to NOT-D
		    part (not otherwise specified);by-reference;handling=required;cid=clause1
		    part (not otherwise specified);by-reference;handling=required;cid=clause2
		    part (not otherwise specified);by-reference;handling=required;cid=clause3

    We now want to arrange things so that one of the last 3 parts can be
    successfully processed only if we choose for processing one of the
    earlier parts that corresponds to a variable value that will make the
    corresponding clause true.  This is done by embedding forward
    references in the variable-corresponding parts that refer to the
    clause-corresponding parts.

    For example:

 1	    multipart/mixed
 2		    multipart/alternative;handling=required
 3			    first part corresponding to A
				    references cid:clause1
 4			    second part corresponding to NOT-A
				    references cid:clause3
 5		    multipart/alternative;handling=required
 6			    first part corresponding to B
				    references cid:clause1
				    references cid:clause2
 7			    second part corresponding to NOT-B
				    references cid:clause3
 8		    multipart/alternative;handling=required
 9			    first part corresponding to C
				    references cid:clause1
10			    second part corresponding to NOT-C
				    references cid:clause2
11		    multipart/alternative;handling=required
12			    first part corresponding to D
				    references cid:clause2
13			    second part corresponding to NOT-D
				    references cid:clause3
14		    part;by-reference;handling=required;cid=clause1
15		    part;by-reference;handling=required;cid=clause2
16		    part;by-reference;handling=required;cid=clause3

    In order to process this body, the recipient must choose exactly one
    of each pair of bodies in the multipart/alternative body parts.  This
    choice is equivalent to choosing a set of values for the variables.
    The references embedded in the chosen parts refer to the body parts
    that correspond to the clauses which are made true by the
    corresponding variable's value.  In order to successfully process the
    final body parts, the recipient must have chosen a set of variable
    values that makes all of the clauses true.

The resulting tree is:

	1 -+- 2 --+- 3 ---- 14
	   |      |
	   |      +- 4 ---- 16
	   |
	   +- 5 --+- 6 --+- 14
	   |      |      |
	   |      |      +- 15
	   |      |
	   |      +- 7 ---- 16
	   |
	   +- 8 --+- 9 ---- 14
	   |      |
	   |      +- 10 --- 15
	   |
	   +- 11 -+- 12 --- 15
	   |      |
	   |      +- 13 --- 16
	   |
	   +- 14
	   |
	   +- 15
	   |
	   +- 16

Though the number of nodes is greater than the number of body parts,
it's limited approximately by the number of appearances of variables
in the original boolean formula, so it's linear in the number of body
parts.  So in building the tree, we don't have exponential effort.

In the new approach, we walk the tree, tagging each node as to whether
it is processable.  In the above model, we assume that all the simple
parts are processable, so all nodes labeled 2 through 16 are marked
processable.  And as we tag the nodes for processability, we choose
for each multipart/alternative node which child will be processed
(namely, the last child that is processable).  In this tree, the
choices are 2 -> 4, 5 -> 7, 8 -> 10, and 11 -> 13.

Where things get interesting is deciding if node 1 is processable.
All of its children are processable, of course.  But the crux of the
construction is how the "by-reference;handling=required" constraints
on nodes 14, 15, and 16 are handled.  In this algorithm, we check
whether each part is referenced by a part that will be processed.
Fortunately, all references to these nodes must have been tree-walked
before node 1 is examined, so can determine unambiguously whether used
references point to those nodes.  In this case, we have chosen to
process 2, 4, 5, 7, 8, 10, 11, and 13, which means that there are used
references to 16, 16, 15, and 16.  This leaves 17 unreferenced, and so
node 1 is determined to be unprocessable.

Note that the new algorithm avoids being NP-complete essentially by
making choices as soon as they are seen.  In this model, values NOT-A,
NOT-B, NOT-C, and NOT-D are chosen outright.  Since the example
formula is not satisfied, the body that encodes it is not processable.

This example also points out that the requirement that references
appear before the parts they reference is needed to make the problem
tractable.  Otherwise, there could be circular dependencies between
the choices for different multipart/alternatives, which might be
impossible to analyze by an algorithm as simple as a tree-walk.

Dale
_______________________________________________
Sip mailing list  https://www.ietf.org/mailman/listinfo/sip
This list is for NEW development of the core SIP Protocol
Use sip-implementors at cs.columbia.edu for questions on current sip
Use sipping at ietf.org for new developments on the application of sip