[apps-discuss] Review of: draft-ietf-decade-problem-statement-05

Dave Crocker <dcrocker@bbiw.net> Thu, 22 March 2012 10:48 UTC

Message-ID: <4F6A96A5.3070608@bbiw.net>
Date: Thu, 22 Mar 2012 04:04:05 +0100
From: Dave Crocker <dcrocker@bbiw.net>
Organization: Brandenburg InternetWorking
To: apps-discuss@ietf.org, draft-ietf-decade-problem-statement.all@tools.ietf.org
Cc: iesg <iesg@ietf.org>
Subject: [apps-discuss] Review of: draft-ietf-decade-problem-statement-05

I have been selected as the Applications Area Directorate reviewer for this
draft (for background on appsdir, please see
http://trac.tools.ietf.org/area/app/trac/wiki/ApplicationsAreaDirectorate).


Document:

    draft-ietf-decade-problem-statement-05

Title:

    DECoupled Application Data Enroute (DECADE) Problem Statement

Reviewer:

    D. Crocker <dcrocker@bbiw.net>

Review Date:

    22 March 2012


Summary:

    This draft needs substantial work, due to ambiguous terminology and a lack
    of technical detail.


The document serves as an initial statement of the problem(s) that a DECADE 
protocol will solve.

If I understand that problem correctly, it concerns creation of a standardized 
caching protocol, in order to gain performance and network efficiencies when a 
collection of content consumers are topologically near.  Proprietary caching 
already exists, but the current effort is to develop an open specification.



Major Issues:

The document suffers a number of significant problems.  Caches are useful.  It 
seems obvious that they would be useful for the types of configurations that the 
working group intends to cover.  It also seems that stating that basic point as a
'problem' requires no more than a few sentences, or at most a few paragraphs. 
The interesting parts of the problem are the technical details and constraints 
that are specific to the current effort.

The document is deficient with respect to its assumptions and its details.  The 
issue is not whether caching can be useful.  It's that the engineering issues 
about this particular environment are not stated or explored.

The document assumes that the reader already knows the meanings and implications
of a number of terms of art, since they are used heavily but not really 
explained. (I also suspect that some of the terms do not have the kind of 
widespread, rigorous technical definitions that one would like.) The document 
needs to define these terms and the definitions need to be sufficient to help 
the reader understand what possible cases are excluded, not just those that are 
included.

In addition, the document could benefit from using some different terms, such as 
producer (or provider) and consumer, rather than 'peer'.  For a given type of 
activity or role, the terminology should clearly distinguish one actor from 
another and the terminology should be terse.  And roles change.  For example, in
BitTorrent, a machine that is initially a consumer, when receiving the document,
then becomes a provider as other machines consume copies from it.

I also was sometimes confused about the relationships between the cache and 
producers or consumers.  It is unclear which actor needs what control over which 
caches.

More seriously, the problem statement only has superficial technical content, as 
summarized above.  It states things redundantly, but does not elaborate.  It 
needs much greater technical depth in order to guide requirements and protocol
choices.

For example, what sorts of usage scenarios are to be covered?  The included 
examples of BitTorrent and Content Distribution are offered more as functional 
tags than as networking technical scenarios.  What sort of related scenarios 
will not be covered and why?  I'll include my usual suggestion that diagrams for 
this sort of thing could help quite a bit.


Detailed comments:


> 2.  Terminology and Concepts
>
>    The following terms have special meaning in the definition of the in-
>    network storage system.
>
>       in-network storage: A service inside a network that provides
>       storage and bandwidth to network applications.  In-network storage
>       may reduce upload/transit/backbone traffic and improve network
>       application performance.

The problem with this definition is that the most obvious meaning of 'in 
network' is wrong and the correct meaning is not provided.

For example, I suspect it has nothing to do with routers, and certainly nothing
to do with the contents or code of routers.

In reality, the model being espoused sounds quite a bit like what we are used to 
for the DNS and possibly similar to common Web arrangements.  I strongly urge 
discussion of these two explicitly, though probably in the requirements 
documents, since that moves more towards solution than problem statement.

I suspect that "in network" merely means 'in the query/response handling 
sequence between consumer and provider.'



>       P2P cache (Peer to Peer cache): A kind of in-network storage that
>       understands the signaling and transport of specific P2P
>       application protocols.  It caches the content for those specific
>       P2P applications in order to serve peers and reduce traffic on
>       certain links.


The problem with this definition is that P2P is not defined.  It needs to be, 
since the term is at the core of the stated problem-space. Use technical terms 
that distinguish it from other kinds of applications.  How is P2P different from 
email, web, DNS or anything else?  The definition should be sufficient to tell 
what qualifies as P2P and what doesn't.


> 3.  The Problems
>
>    The emergence of peer-to-peer (P2P) as a major network application
>    (especially P2P file sharing and streaming) has led to substantial
>    opportunities.  The P2P paradigm can be utilized to design highly
>    scalable and robust applications at low cost, compared to the
>    traditional client-server paradigm.  For example, CNN reported that
>    P2P streaming by Octoshape played a major role in its distribution of
>    the historic inauguration address of President Obama[Octoshape].
>    PPLive, one of the largest P2P streaming vendors, is able to
>    distribute large-scale, live streaming programs to more than 2
>    million users with only a handful of servers [PPLive].
>
>    However, P2P applications also face substantial design challenges.  A
>    particular problem facing P2P applications is the additional stress
>    that they place on the network infrastructure.  Furthermore, lack of
>    infrastructure support can lead to unstable P2P application
>    performance during peer churns and flash crowds, when a large group
>    of users begin to retrieve the content during a short period of time.
>    These problems are now discussed in further detail.
>
> 3.1.  P2P infrastructural stress and inefficiency
>
>    A particular problem of the P2P paradigm is the stress that P2P
>    application traffic places on the infrastructure of Internet service
>    providers (ISPs).  Multiple measurements (e.g., [Internet Study 2008/
>    2009][Internet_Study_2008-2009]) have shown that P2P traffic has
>    become a major type of traffic on some networks.  Furthermore, the
>    inefficiency of network-agnostic peering (at the P2P transmission
>    level) leads to unnecessary traversal across network domains or
>    spanning the backbone of a network [RFC5693].
>
>
>
> Song, et al.             Expires August 11, 2012                [Page 4]
> 
> Internet-Draft          DECADE Problem Statement           February 2012
>
>
>    Using network information alone to construct more efficient P2P
>    swarms is not sufficient to reduce P2P traffic in access networks, as

"Swarms"?  This probably needs a technical definition, since it points towards
the implied technical solution that is being proposed.

Again, there is an implied underlying architecture here that is not stated, yet 
is fundamental to the assumptions and proposals of the work.


>    the total access upload traffic is equal to the total access download
>    traffic in a traditional P2P system.  On the other hand, it is
>    reported that P2P traffic is becoming the dominant traffic on the
>    access networks of some networks, reaching as high as 50-60% on the
>    downlinks and 60-90% on the uplinks ([DCIA], [ICNP],
>    [ipoque.P2P_survey.], [P2P_file_sharing]).  Consequently, it becomes
>    increasingly important to reduce upload access traffic, in addition
>    to cross-domain and backbone traffic.
>
>    The inefficiency is also represented when traffic is sent upstream as

I have no idea what 'upstream' or 'downstream' mean in this context.  Does it 
mean from provider to consumer?  Or perhaps it's a query, going from consumer to 
provider?


>    many times as there are remote peers interested in getting the
>    corresponding information.  For example, the P2P application transfer
>    completion times remain affected by potentially (relatively) slow
>    upstream transmission.  Similarly, the performance of real-time P2P
>    applications may be affected by potentially (relatively) higher
>    upstream latencies.

This should say 'transit latencies'.  Up-vs-down is irrelevant.

I'm also not 100% convinced that 'latency' is the precisely correct technical 
concern.

From a practical standpoint, although caching is an obvious and generally
helpful way to address the performance concerns being raised here, note that
congestion is often worst at the leaf portions of a network, and caching doesn't
fix that.


> 3.2.  P2P cache: a complex in-network storage

>    An effective technique to reduce P2P infrastructural stress and
>    inefficiency is to introduce in-network storage.
>
>    In the current Internet, in-network storage is introduced as P2P
>    caches, either transparently or explicitly as a P2P peer.  To provide
>    service to a specific P2P application, the P2P cache server must
>    support the specific signaling and transport protocols of the
>    specific P2P application.  This can lead to substantial complexity
>    for the P2P Cache vendor.
>
>    First, there are many P2P applications on the Internet (e.g.,
>    BitTorrent, eMule, Flashget, and Thunder for file sharing; Abacast,
>    Kontiki, Octoshape, PPLive, PPStream, and UUSee for P2P streaming).
>    Consequently, a P2P cache vendor faces the challenge of supporting a
>    large number of P2P application protocols, leading to product
>    complexity and increased development cost.
>
>    Furthermore, a specific P2P application protocol may evolve
>    continuously, to add new features or fix bugs.  This forces a P2P
>    cache vendor to continuously update to track the changes of the P2P
>    application, leading to product complexity and increased costs.

While I can imagine that this is true, I am pretty sure that it is not 
automatically true.  Does the creation of a new DNS RR type automatically
require enhancement of DNS caches?  I suspect it does only for overly
knowledgeable caches.
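
As a rough illustration of the distinction, here is a minimal Python sketch of a
cache that is not 'overly knowledgeable': it stores payloads as opaque bytes
keyed by name and type code, so a type it has never seen needs no code change.
The class name OpaqueCache, its methods, and the type code used in the usage
note are hypothetical, invented for this example; this is not DNS code and not
anything the draft proposes.

```python
import time
from typing import Dict, Optional, Tuple


class OpaqueCache:
    """Cache that never parses payloads; entries are keyed by (name, type code)."""

    def __init__(self) -> None:
        # Maps (lower-cased name, type code) -> (payload bytes, expiry timestamp).
        self._entries: Dict[Tuple[str, int], Tuple[bytes, float]] = {}

    def put(self, name: str, type_code: int, payload: bytes,
            ttl: float, now: Optional[float] = None) -> None:
        """Store a payload for ttl seconds without inspecting its contents."""
        now = time.time() if now is None else now
        self._entries[(name.lower(), type_code)] = (payload, now + ttl)

    def get(self, name: str, type_code: int,
            now: Optional[float] = None) -> Optional[bytes]:
        """Return the cached payload, or None if it is absent or expired."""
        now = time.time() if now is None else now
        key = (name.lower(), type_code)
        entry = self._entries.get(key)
        if entry is None:
            return None
        payload, expires_at = entry
        if now >= expires_at:
            # Expired entries are dropped lazily on lookup.
            del self._entries[key]
            return None
        return payload
```

For example, cache.put("Example.org", 65280, b"\x01\x02", ttl=30) followed by
cache.get("example.org", 65280) returns the same bytes until the TTL runs out,
even though nothing in the cache knows what type 65280 means.  A cache that
parsed each payload would instead need an update for every new type, which is
the cost the draft attributes to P2P caches.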


>    Third, many P2P applications use proprietary protocols or support
>    end-to-end encryption.  This can render P2P caches ineffective.
>
>    Finally, a P2P cache is likely to be much better connected to end
>    hosts than to remote peers.  Without the ability to manage bandwidth

"End hosts"?  "Remote peers"?  This statement is probably important and
therefore should assume less.  It needs to explain itself /much/ better.


> 4.1.  BitTorrent
>
>    When a BitTorrent client A uploads a block to multiple peers, the
>    block traverses the last-mile uplink once for each peer.  And after
>    that, the peer B who just received the block from A also needs to
>    upload through its own last-mile uplink to others when sharing this
>    block.  This is not an efficient use of the last-mile uplink.  With
>    in-network storage server however, the BitTorrent client may upload
>    the block to its in-network storage.  Peers may retrieve the block
>    from the in-network storage, reducing the amount of data on the last-

"Its"?  Is any given producer of content (the one doing the 'uploading')
supposed to know about all of the caches near "swarms" of consumers?  Note that
this also means knowing about the swarms.

And all these producers are supposed to have administrative rights for 
'controlling' those caches?
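
To make the arithmetic in the quoted paragraph concrete, a small Python sketch
follows.  The function names and the block size in the usage note are
assumptions chosen for illustration.  The model counts only the bytes crossing
client A's own last-mile uplink; it deliberately ignores how A would discover a
cache or be authorized to use it, which is the open question raised above.

```python
def uplink_bytes_direct(block_size: int, num_peers: int) -> int:
    """Bytes on A's uplink when A sends the block to each peer itself."""
    # The block crosses the last-mile uplink once per receiving peer.
    return block_size * num_peers


def uplink_bytes_with_storage(block_size: int, num_peers: int) -> int:
    """Bytes on A's uplink when A uploads once to in-network storage."""
    # A uploads a single copy; peers then fetch from the storage, not from A.
    return block_size if num_peers > 0 else 0
```

With a 256 KiB block and 20 peers, the direct case puts 20 copies of the block
on A's uplink and the storage case puts one.  The saving depends entirely on
the storage being reachable and usable by A and by those peers.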



>    mile uplink.  If supported by the in-network storage, a peer can also
>    save the block in its own in-network storage while it is being
>    retrieved; the block can then be uploaded from the in-network storage
>    to other peers.

It occurs to me that having infrastructure support, such as a cache, would seem 
to be at odds with the basic philosophy of BitTorrent, which is entirely about 
having ad hoc, consumer machines used for the functionality.  Rather than 
defining an infrastructure caching service, I'd expect the BitTorrent protocols
to support similar functionality in the consumer machines...


>    As previously discussed, BitTorrent or other P2P applications
>    currently cannot explicitly manage which content is placed in the
>    existing P2P caches, nor can they manage access and resource control
>    polices.  Applications need to retain flexibility to control the
>    content distribution policies and topology among peers.
>
> 4.2.  Content Publisher
>
>    Content publishers may also utilize in-network storage.  For example,
>    consider a P2P live streaming application.  A Content Publisher
>    typically maintains a small number of sources, each of which
>    distributes blocks in the current play buffer to a set of the P2P
>    peers.
>
>    Some content publishers use another hybrid content distribution
>    approach incorporating both P2P and CDN modes.  As an example,

Modes?  What does this mean?


>    Internet TV may be implemented as a hybrid CDN/P2P application by
>    distributing content from central servers via a CDN, and also
>    incorporating a P2P mode amongst endhosts and set-top boxes.  In-
>    network storage may be beneficial to hybrid CDN/P2P applications as
>    well to support P2P distribution and to enable content publisher
>    standard interfaces and controls.
>
>    However, there is no standard interface for different content
>    publishers to access in-network storage.  One streaming content
>    publisher may need the existing in-network storage to support
>    streaming signaling or such capability, such as transcoding
>    capability, bitmap information, intelligent retransmission, etc,
>    while a different content publisher may only need the in-network
>    storage to distribute files.  However it is reasonable that the
>    application services are only supported by content publisher's
>    original servers and clients, and intelligent data plane transport
>    for those content publishers are supported by in-network storage.
>
>    A content publisher also benefits from a standard interface to access
>    in-network storage servers provided by different providers.  The
>    standard interface must allow the content publisher to retain control
>    over content placed in their own in-network storage, and grant access
>    and resources only to the desired endhosts and peers.
>
>    In the hybrid CDN/P2P scenario, if only the endhosts can store

I don't know what this 'hybrid' scenario is.  There needs to be a statement of
the actual technical and/or operational details that make it a hybrid.

There should be a discussion of the differences between BitTorrent and Content 
Provider models, to highlight what problems are shared between them and what 
problems are specific to them.


>    content in the in-network storage server, the content must be
>    downloaded and then uploaded over the last-mile access link before
>    another peer may retrieve it from a in-network storage server.  Thus,
>    in this deployment scenario, it may be advantageous for a content
>    publisher or CDN provider to store content in in-network storage
>    servers.
>
>
> 5.  Security Considerations


Good discussion.  There is probably an accidental security risk from scaling 
possibilities.  I'm imagining that having every BitTorrent content host 
dictating the policies of a cache is likely to have serious scaling problems 
that wind up constituting a de facto DoS attack.



d/
-- 

   Dave Crocker
   Brandenburg InternetWorking
   bbiw.net