Hi Robb,

Thanks for the excellent comments. Please see below.

Robb Topolski wrote:
> It seems like a broken statistic given this experiment. If it were tightly controlled (and thus less Internet-like), all of the downloads would have completed and the download byte amounts would be virtually the same.

Welcome to the real world :-) Large-scale yet tightly controlled experiments are hard to conduct (if you have an idea or are interested in more discussion, that would be fantastic). We did quite a few controlled experiments using clients on PlanetLab, but we could get a couple hundred users only if we were lucky. Nor are tightly controlled experiments necessarily desirable, because then we would not know what we were missing (your example of P2P streaming is an excellent one). One objective of real experiments is unexpected discovery. Sometimes the discoveries are pleasant, and sometimes they are not. Download abort was one unexpected discovery. But given the small fraction, I think it does not change the reported results.

For full disclosure, another unexpected user behavior we discovered during our log processing was that users sometimes paused their download process (e.g., put the machines to sleep). We spent quite a lot of time on this issue. Unfortunately the logs we had did not capture such events. We were careful to use statistically more robust metrics (e.g., the download rates at different percentiles instead of the average). We tried to check the impact of such events. For example, we also computed statistics after filtering to make sure our statistics did not change much. One filter we applied was that if a user had no activity for too long (we tried two minutes and some other thresholds), we flagged that the user might have paused, and we ignored that user's data. We did not see a large change in the statistics we computed before and after such filtering.

Of course, we learn from such experiences. In our current experiments, we are designing a new log format to try to capture more events. We might be surprised again and go back to log more. We will be happy to share and work with others on robust experiment design and statistics collection.
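As a rough illustration of the filtering and percentile comparison described above, here is a minimal Python sketch. The log layout (a dict mapping a user id to a list of `(timestamp_s, bytes_received)` events) and all function names are assumptions made for the example; this is not the actual Pando log format or our processing code. Only the two-minute idle threshold comes from the text.

```python
from statistics import quantiles

# "Two minutes" is the threshold mentioned in the text; other values were also tried.
IDLE_LIMIT_S = 120


def max_idle_gap(timestamps):
    """Largest gap in seconds between consecutive activity timestamps."""
    ts = sorted(timestamps)
    return max((b - a for a, b in zip(ts, ts[1:])), default=0)


def download_rate(events):
    """Average rate in bytes/s over a list of (timestamp_s, bytes_received)."""
    times = [t for t, _ in events]
    duration = max(times) - min(times)
    total = sum(n for _, n in events)
    return total / duration if duration > 0 else 0.0


def rate_percentiles(logs, idle_limit=None):
    """Return the 25th/50th/75th percentile download rates.

    logs is a hypothetical mapping: user id -> list of (timestamp_s, bytes).
    If idle_limit is given, users with a longer idle gap are treated as
    possibly paused and dropped. Needs at least two remaining users.
    """
    rates = [
        download_rate(events)
        for events in logs.values()
        if idle_limit is None
        or max_idle_gap(t for t, _ in events) <= idle_limit
    ]
    return quantiles(rates, n=4)
```

Calling `rate_percentiles(logs)` and `rate_percentiles(logs, idle_limit=IDLE_LIMIT_S)` on the same logs and comparing the two results is the kind of before/after check meant here. Percentiles are used rather than the mean so that a few very slow or paused users do not dominate the summary.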
> One might argue that there is virtue to these observations because a downloader who cancels a slow download is a bad thing, because they ultimately did not get what they attempted to download. But is there virtue in an aborted download? If they canceled the download in frustration or because they simply couldn't stay connected to the swarm long enough before being forced to quit by some other personal obligation, then one could say that an aborted download is a bad thing. OTOH, an incomplete download is a good thing when a downloader may have changed his mind (wanted to hear/see/do something else) with less bandwidth wasted. There's just no way to know why the user aborted the download.

I agree it would be fantastic if we could know the reason for an aborted download. One thought that came to mind immediately is a pop-up window after a download abort asking the user for the reason(s). I do not know of any P2P clients doing this right now, and I am not sure how many users would respond (e.g., not click cancel). But it is an excellent suggestion! We will certainly talk to the P2P developers we are working with to see if we can add such an option in one of our log plug-ins.
> Even if we did know, this experiment doesn't deal with other possible P4P-advantaged uses such as streaming-P2P video delivery (a mode more prone to user "taste-testing" before completing a download than traditional file-transfer models).

Absolutely, good comment. This is why we are focusing on P2P streaming, where channel hopping is a common user behavior (http://ccr.sigcomm.org/online/?q=node/404), and startup delay is a major performance metric.

Hope to hear more such good comments!

Richard
> Robb

2008/11/6 Ye WANG <wangye.thu at gmail.com>:

Hi Haibin,

Yes, the Random swarm has notably fewer finished downloads than Generic or Coarse Grained during the period. Since the swarm sizes (# downloading peers) are roughly equal across all five swarms (Richard explained this in detail), we suspect that a portion of slow peers terminate/discard their downloads. This is almost the same hypothesis pointed out by Rich Woundy.

Another piece of evidence is that we do notice significantly slower peers in the Random swarm, e.g., the slowest Random peer took 7268s (>2 hours) to download the video, but the slowest Generic peer took 2725s (<1 hour) and the slowest Coarse Grained peer took 3114s (<1 hour). Hours of downloading may make users impatient. If the "tail" peers in the Random swarm suffer much lower download rates, then presumably the number of "terminated" peers is larger in the Random swarm.

On Thu, Nov 6, 2008 at 4:09 AM, Song Haibin <melodysong at huawei.com> wrote:

Hi Richard and all,

> The access download of each swarm should be equal to the sum of those downloaded by the clients in each swarm. So if the number of downloads in each swarm is the same and the amount downloaded is the same, then each swarm should have the same access download.

Song Haibin: From section 4.1, we can see that "The results of the trial indicated that P4P can improve the speed of downloads to P2P clients", so if the statistics are collected during a fixed period (from July 2 to July 17, 2008), then the amount downloaded will be greater than in the random swarm.
I don't think each swarm has downloaded the same amount of chunk files during the statistics period.

Best Regards,
Song Haibin
Email: melodysong at huawei.com
Skype: alexsonghw

-----Original Message-----
From: p2pi-bounces at ietf.org [mailto:p2pi-bounces at ietf.org] On Behalf Of Y.R.Yang
Sent: Thursday, November 06, 2008 10:40 AM
To: Woundy, Richard
Cc: p2pi at ietf.org; Livingood, Jason
Subject: Re: [p2pi] draft-livingood-woundy-p4p-experiences-02 posted

Hi Rich and others,

The access download of each swarm should be equal to the sum of those downloaded by the clients in each swarm. So if the number of downloads in each swarm is the same and the amount downloaded is the same, then each swarm should have the same access download.

First look at the amount downloaded. There can be some differences due to duplicated chunks and the way we detected data chunks (there is also control data in the logs), but the difference appears to be small.

Now let's look at the number of downloads. During the test, each client is uniformly assigned to a swarm. Given the large number of clients, each swarm should have about the same number of clients. But there can be two factors for us to see different numbers of *reported* downloads: (1) some clients are old and may not report, or the reporting of logs was not successful; and (2) a different # of clients finished downloading (if a client does not finish downloading, it does not report. Laird, please correct me if I am wrong). I believe the first factor should be small due to uniform random assignment of peers to swarms.

I just looked at the data available. We did detect a smaller number of finished downloads with Random than with the P4P swarms. For example, from July 3 to July 10, the detected # of finished downloads of Generic is about 5% more than Random, and Coarse is 7% more than Random. From July 10 to July 17, Generic is 10.5% more than Random, and Coarse is 4.7% more.
Looking at the traffic volume in Section 4.2, I see that Generic is about 7% higher, and Coarse is about 8.5% higher. Note that the # of finished downloads and the volume differ due to duplicated chunks and missing logs.

So I would like to support the theory/guess of Rich that some users terminated the download prematurely and faster downloads may result in fewer such terminations. But the difference may also include factors such as (1) differences in initial assignment due to random numbers; and (2) the # of finished but non-reporting clients. If you have any other suggestions, we will be more than happy to look into the available data more.

Richard

On Wed, 5 Nov 2008, 6:26pm -0500, Woundy, Richard wrote:

My current theory/guess is that some users may have terminated the download prematurely, e.g., due to user impatience. So faster downloads (e.g., thanks to P4P) may result in fewer user terminations.

Laird is checking the data to see if we can confirm that, or find another explanation.

-- Rich

________________________________
From: Laird Popkin [mailto:laird at pando.com]
Sent: Wednesday, November 05, 2008 6:12 PM
To: Robb Topolski
Cc: Livingood, Jason; p2pi at ietf.org; Woundy, Richard
Subject: Re: [p2pi] draft-livingood-woundy-p4p-experiences-02 posted

That's a good question, and Richard and I spoke about this yesterday. I'll be looking into the data to see what the cause is.

- Laird Popkin, CTO, Pando Networks
mobile: 646/465-0570

----- Original Message -----
From: "Robb Topolski" <robb at funchords.com>
To: "Richard Woundy" <Richard_Woundy at cable.comcast.com>
Cc: "Jason Livingood" <Jason_Livingood at cable.comcast.com>, p2pi at ietf.org
Sent: Wednesday, November 5, 2008 5:42:51 PM (GMT-0500) America/New_York
Subject: Re: [p2pi] draft-livingood-woundy-p4p-experiences-02 posted

I don't get the part where access network download consumption increased as a result of using P4P (section 4.2). Can someone explain how that could happen?
Robb

On Wed, Nov 5, 2008 at 2:29 PM, Woundy, Richard <Richard_Woundy at cable.comcast.com> wrote:

Reinaldo,

I can answer the easy questions. We will need some assistance from Pando (and Yale) for some of the other ones.

> What was the file size in those experiments?

21 megabytes. From section 2: "Pando distributed a special 21 MB licensed video file as in order to measure the effectiveness of P4P iTrackers."

> How long would it take to download the file in the three different scenarios? I know that more consumed bandwidth in access might lead one to conclude that file was downloaded faster...

To clarify, most of the raw data (download speed and Internet peering/transit traffic volumes) were collected by Pando Networks from their P2P clients, not collected by Comcast across its links. So my assumption is that the Pando client took the content size (21 MB) and divided it by the download time to get the speed.

> Was the file already seeded in Comcast's network? More specifically, how was file propagation done?

Any seeding happened outside of Comcast's network, and outside of Comcast's control. That's really a question for Pando.

> Was PEX, DHT and others enabled in the clients?

Pando would know whether PEX was enabled. It would be safe to assume that with respect to this trial, DHT was NOT enabled, since Pando supplied the tracker. (The pTracker in the draft is a tracker operated by Pando.)

> Was local peer discovery enabled in the clients?

Pando would know.

> BTW, can broadcast/multicast peer discovery work in Cable networks?

Do you mean something like this: http://bittorrent.org/beps/bep_0026.html? If so, peer discovery probably would not work over the typical last mile cable network. Maybe I'm wrong, but I see this protocol as intended for peer discovery within one's home network / LAN / WiFi network, not over a cable network.

> So, were clients allowed to become seeders to the outside of Comcast's network?

Yes, they were. As a related item, look closely at section 4.2.
The amount of aggregate uploaded data from Comcast clients (per swarm) was about 140,000 MB. The amount of aggregate downloaded data from Comcast clients (per swarm) was about 60,000 MB or so. So the typical Comcast client uploaded more than twice the amount of data that it downloaded.

> How much of the swarm was within Comcast and outside?

Most of the swarm was outside of Comcast. Unfortunately I don't have access to the size of the global swarm, but I would guess that Comcast clients represented no more than 15% of the swarm, and maybe as little as 5%. Those guesses are based on the behavior of the random swarm, e.g., Comcast clients uploaded to non-Comcast clients 94% of the time in the random swarm.

-- Rich

-----Original Message-----
From: p2pi-bounces at ietf.org [mailto:p2pi-bounces at ietf.org] On Behalf Of Reinaldo Penno
Sent: Wednesday, November 05, 2008 11:23 AM
To: Livingood, Jason; p2pi at ietf.org
Subject: Re: [p2pi] draft-livingood-woundy-p4p-experiences-02 posted

Hello Jason/Rich,

This is such an interesting draft. I'm surprised there are no questions about it. Maybe everybody else is part of P4P one way or another and I'm not in the 'in' crowd (;-) so I have questions.

* What was the file size in those experiments? Some post long ago said the file size in some P4P experiments was really small, as opposed to the top 100 torrents where the file size is ~1Gb. I was curious what the optimization payback is in terms of download time for large files as opposed to small files.

* How long would it take to download the file in the three different scenarios? I know that more consumed bandwidth in access might lead one to conclude that file was downloaded faster but I'm not sure this is a straightforward conclusion.

* Was the file already seeded in Comcast's network? More specifically, how was file propagation done? All clients started from scratch and had to start pulling the file from some other side of the world and then exchanging pieces?
This is mainly due to the discussion in 4.2.

* Was PEX, DHT and others enabled in the clients?

* Was local peer discovery enabled in the clients? BTW, can broadcast/multicast peer discovery work in Cable networks?

* If more clients finish downloading faster and become seeders you would think that for popular content Comcast's upstream bandwidth would increase due to the number of seeders in its network. So, were clients allowed to become seeders to the outside of Comcast's network? How much of the swarm was within Comcast and outside?

Thanks,

Reinaldo

On 11/3/08 12:49 PM, "Livingood, Jason" <Jason_Livingood at cable.comcast.com> wrote:

For some reason the URL was cut to two lines - trying again:

http://www.ietf.org/internet-drafts/draft-livingood-woundy-p4p-experiences-02.txt

-----Original Message-----
From: p2pi-bounces at ietf.org [mailto:p2pi-bounces at ietf.org] On Behalf Of Livingood, Jason
Sent: Monday, November 03, 2008 3:35 PM
To: p2pi at ietf.org
Subject: [p2pi] draft-livingood-woundy-p4p-experiences-02 posted

A draft at http://www.ietf.org/internet-drafts/draft-livingood-woundy-p4p -experienc es-02.txt may be of interest to folks that have been interested in P2Pi and ALTO. We have requested time on the ALTO agenda at IETF 73 to present this.
Regards
Jason

_______________________________________________
p2pi mailing list
p2pi at ietf.org
https://www.ietf.org/mailman/listinfo/p2pi

--
Robb Topolski (robb at funchords.com)
Hillsboro, Oregon USA
http://www.funchords.com/