Re: [p2pi] Content Identifiers sent to ALTO server?




Laird Popkin <laird at pando.com> wrote:
> 
>> I see two very different use cases for what we're discussing here.
>> 
>> 1. Client has a list of IP addresses, and wants to know which to
>>    try first;
>> 
>> 2. Client doesn't have a list of IP addresses, but has a Content
>>    Identifier, and wants an ordered list of IP addresses to try.
> 
> I don't think that either of the scenarios that you list is particularly
> likely for p2p swarm file downloading.

   I'm perfectly happy for "client" to mean the tracker itself, BTW.

> I'll explain why, then outline the model of ALTO that seems most
> valuable for that application.
> 
> 1. If the client has a list of IP addresses, the list is short enough
>    that it can test throughput with each of them all fairly quickly,
>    so the value of ALTO is low in terms of optimizing p2p performance.

   There's no functional problem with such a list exceeding 100 IPs.
Many ISPs consider 100 simultaneous TCP connections abusive. (Unless
they actually overlap, I don't see how they can be tried "quickly".)

> This scenario makes sense for some other applications (e.g. finding
> a good network relay, or optimizing small swarm downloads).

   Exactly. We should not design something useful only for p2p swarm.

> 2. I think that it's safe to assume that p2p networks will never send
>    a list of all IP addresses associated with a particular content
>    swarm to an ISP's ALTO server. This raises obvious business,
>    privacy and legal issues, which are at best quite concerning.

   Nonetheless, privacy is compromised as soon as they send it to a
potential user. More likely, they would send an incomplete list,
possibly adding non-swarm IPs to obscure things.

> In addition, putting the ALTO protocol in the middle of a peer request
> creates unacceptable operational problems for the p2p network that
> would be far worse than any optimization that ALTO could provide.

   I don't follow you. I see neither "unacceptable operational problems"
nor a lack of potential optimization to outweigh them.

> Let's compare a standard BitTorrent "tracker announce", a "tracker
> announce with ALTO IP lookup" and a "tracker announce with ALTO
> guidance" for a peer in a swarm of 10,000 peers.

   Please do...

> standard: peer announces, Tracker responds from list in memory (one
> MTU each way, UDP). Fast, reliable in-memory operation.

   One Round-Trip-Time; but client must try the IPs in turn. Often it
will go through several TCP connections on paths the ISP would prefer
weren't used.

> ALTO IP list: In this model, information sent into or out of ALTO
> is lists of IP addresses. Peer announces. Tracker sends request,
> with the peer address and a complete list of the 10,000 peers in
> the swarm, to the ALTO server.

   I doubt they'd send all 10,000, but if they did we're talking no
more than tens of milliseconds, still all from memory.

> ALTO server computes optimal list and returns 50 peers.

   50 seems enough for an end-user to actually try.

> Tracker returns list to peer. Tracker does this (send swarm peer list,
> receive ALTO peer list) 2,000 times per second.

   I _really_ doubt they'd send 10,000 IPs 2,000 times a second; but
that's for them to optimize. (An obvious optimization is to determine
the AS number of the end-user.)

> This makes the p2p tracker slow and unreliable (by adding external
> communication to what is now an in-memory operation),

   Any tracker that busy would no doubt cache the information useful
for ordering the list, and serve from cache rather than wait for a
response from the ALTO server if a similar IP pairing hadn't expired
from the cache.
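
   A minimal sketch of such a cache, in Python, may make the idea
concrete. The names (RankCache, order_peers), the use of a numeric
"cost" per pairing, and the expiry policy are all assumptions made for
illustration; nothing here is taken from an ALTO specification.

```python
import time


class RankCache:
    """Hypothetical cache of ALTO-derived costs keyed by (source, destination)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._entries = {}          # (src, dst) -> (cost, expiry time)

    def put(self, src, dst, cost):
        """Store a cost learned from the ALTO server."""
        self._entries[(src, dst)] = (cost, self.clock() + self.ttl)

    def get(self, src, dst):
        """Return the cached cost, or None if it is absent or has expired."""
        entry = self._entries.get((src, dst))
        if entry is None:
            return None
        cost, expires = entry
        if self.clock() >= expires:
            del self._entries[(src, dst)]
            return None
        return cost


def order_peers(cache, client, peers, limit=50):
    """Order peers by cached cost (lowest first); peers with no cached
    cost keep their original relative order at the end of the list."""
    def sort_key(peer):
        cost = cache.get(client, peer)
        return (cost is None, 0 if cost is None else cost)

    return sorted(peers, key=sort_key)[:limit]
```

   The tracker answers the announce from whatever is in the cache and
never blocks on the ALTO server; refreshing expired entries would
happen separately and is not shown.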

> and places a huge operational burden on the ISP's ALTO server.

   ISPs _could_ manage such a load, but certainly would not choose to
do so unless they saw a return on the investment.

> I suspect that ISPs would have problems with this as well, because
> it would effectively make their ALTO server a p2p tracker, with all that
> implies.

   I don't follow...

> ALTO guidance: Tracker (updated periodically, asynchronously) receives
> optimization rules from the ALTO server.

   Certainly a reasonable design.

> Peer announces.

   I don't follow...

> Tracker picks optimal list from known swarm peers based on the ISP's
> rules, and returns 50 peers to announcing peer. This is slightly slower
> than a standard announce, but it's all in memory in the Tracker,
> and only imposes a low volume of ALTO communications.
> Communication is the same as the first case (one MTU each way, UDP),
> plus one ALTO message every hour (or day, etc.) to update the
> optimization rules.

   The devil is in the details...

   Obviously, there could be a simple rule along the lines of "always
prefer ASs in this order". But that won't cover all situations, and
risks causing the whole service to be ignored.
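
   As a rough illustration of the simple rule, assuming the tracker
already knows each peer's AS number, the ordering could be sketched as
follows. The function name, the (address, AS number) tuple layout and
the example values are hypothetical.

```python
def order_by_as_preference(peers, preferred_as_order, limit=50):
    """Sort (address, as_number) pairs by the ISP's AS preference list.

    Peers whose AS is not in the list are placed last, keeping their
    original relative order.
    """
    rank = {asn: position for position, asn in enumerate(preferred_as_order)}
    unknown = len(rank)
    return sorted(peers, key=lambda peer: rank.get(peer[1], unknown))[:limit]


# Documentation-range addresses and AS numbers, for illustration only.
swarm = [
    ("192.0.2.1", 64500),
    ("198.51.100.7", 64501),
    ("203.0.113.9", 64502),
]
ordered = order_by_as_preference(swarm, preferred_as_order=[64501, 64500])
```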

   A more useful rule would state that certain CIDR blocks can reach
other CIDR blocks over a path with these characteristics. While this
implies some computation on the part of the Tracker making the query,
that can easily be cached by CIDR-block-pairs.
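
   One way to picture the CIDR-block rule, again only as a sketch: the
rules are assumed to arrive as (source block, destination block, cost)
triples, with a single numeric cost standing in for "a path with these
characteristics". The class and function names are hypothetical.

```python
import ipaddress


class CidrPairCosts:
    """Hypothetical table of path costs keyed by (source block, destination block)."""

    def __init__(self, rules):
        # rules: iterable of (src_cidr, dst_cidr, cost) triples.
        self.costs = {
            (ipaddress.ip_network(src), ipaddress.ip_network(dst)): cost
            for src, dst, cost in rules
        }
        self.blocks = {block for pair in self.costs for block in pair}

    def block_of(self, address):
        """Return the longest known block containing the address, or None."""
        ip = ipaddress.ip_address(address)
        matches = [block for block in self.blocks if ip in block]
        if not matches:
            return None
        return max(matches, key=lambda block: block.prefixlen)

    def cost(self, src, dst):
        """Look up the cost for the block pair covering src and dst."""
        return self.costs.get((self.block_of(src), self.block_of(dst)))


def rank_peers(table, client, peers, limit=50):
    """Order peers by block-pair cost, lowest first; unknown pairs go last."""
    def sort_key(peer):
        cost = table.cost(client, peer)
        return (cost is None, 0 if cost is None else cost)

    return sorted(peers, key=sort_key)[:limit]
```

   Because the lookup is keyed by block pair rather than by address
pair, one stored entry serves every client and peer inside the same
two blocks.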

   So, you have made a case for returning CIDR length on ALTO replies.
I agree that looks like a good idea.

   I didn't mention caching, because it seemed a local optimization.
I still don't think it _needs_ to be in the spec, but it very likely
deserves to be mentioned.

   (And please don't forget that ALTO will be used for more than just
p2p swarms.)

--
John Leslie <john at jlc.net>
_______________________________________________
p2pi mailing list
p2pi at ietf.org
https://www.ietf.org/mailman/listinfo/p2pi


