
Re: [Sipping] Overload/Congestion mechanisms - design choices and issues



Jonathan,

some more thoughts on issues 6 and 7.

ISSUE 6: What is the temporal scope of reported congestion information?

When an upstream element receives congestion information, how long does it consider it valid for? What does it do when it receives no information from a downstream element? What does it do if it cannot reach a downstream element?

How long does the upstream proxy consider congestion information as valid? A simple way to address this problem is to have the reporting proxy attach a duration parameter. The assumption here is that it is easier for the reporting proxy to determine how stable the congestion status information is (e.g. based on history) than for the receiving proxy. If the current status deviates too much from the last reported status, the reporting proxy can simply create a new report. Or the proxy can simply report the current congestion value in every response (if responses are used to convey this information).
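To make the duration idea concrete, here is a minimal sketch in Python. The names (CongestionReport, needs_new_report) and the 0.1 deviation threshold are assumptions for illustration only; nothing here is a proposed header or parameter format.

```python
from dataclasses import dataclass


@dataclass
class CongestionReport:
    """Congestion status as stored by the upstream (receiving) proxy."""
    load: float         # reported load, 0.0 (idle) to 1.0 (fully loaded)
    received_at: float  # time the report arrived, in seconds
    duration: float     # validity period chosen by the reporting proxy, in seconds

    def is_valid(self, now: float) -> bool:
        # The report counts as valid until its duration has elapsed.
        return now < self.received_at + self.duration


def needs_new_report(last_reported: float, current: float,
                     threshold: float = 0.1) -> bool:
    """Reporting-proxy side: send a fresh report once the current status
    deviates from the last reported one by more than the (assumed) threshold."""
    return abs(current - last_reported) > threshold
```

The reporting proxy picks the duration because it can observe its own load history; the receiving proxy only has to compare timestamps.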

What happens if congestion information times out? An upstream proxy might simply remove this congestion status when it times out. However, this would remove all throttling and cause the full traffic load to be forwarded to the downstream proxy - which is a problem in particular if the last report was overload. An alternative approach is to use a slow start: instead of timing out congestion information, the upstream proxy slowly starts to increase the load (i.e. decreases the overload state). If the downstream proxy is still under overload, it reports the current status and throttling can be adjusted. If it has failed, this is detected by the upstream proxy on the transport or SIP level. If the downstream proxy processes transactions correctly and does not report overload, the forwarded load is increased slowly.
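The slow-start alternative could look roughly like the following Python sketch. The class name, the per-tick step of 0.05, and the use of a "discard fraction" as the throttling state are all assumptions made for illustration.

```python
class SlowStartThrottle:
    """Upstream-proxy throttling state for one downstream neighbor."""

    def __init__(self, step: float = 0.05):
        self.discard_fraction = 0.0  # fraction of traffic currently held back
        self.step = step             # how much throttling is released per tick

    def on_report(self, discard_fraction: float) -> None:
        # A fresh report from the downstream proxy overrides the local state.
        self.discard_fraction = min(1.0, max(0.0, discard_fraction))

    def on_expiry_tick(self) -> None:
        # Called periodically once the last report has expired and no new one
        # has arrived: release throttling gradually instead of all at once.
        self.discard_fraction = max(0.0, self.discard_fraction - self.step)
```

If the downstream proxy is still overloaded, its next report calls on_report and resets the throttle before the forwarded load has grown much.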

If congestion information is reported in SIP responses, there is the problem that a proxy can't report its congestion state unless it receives a request from the upstream proxy. This is not a problem if the upstream proxy throttles early, before the downstream neighbor reaches 100% load. It can then still forward an occasional request to probe the overload status even under heavy load. Slow start also helps to alleviate this problem since it gives the downstream neighbor time to report the current status before the load goes up significantly. However, if a proxy receives requests from many different sources, even one request from each source may be enough to cause overload (this is essentially ISSUE 7).

ISSUE 7: How does the system work for upstream elements which are not "servers"?

I tend to think of the overload mechanism as having two modes. One mode is where an element knows of all of the upstream elements sending it traffic. It uses this knowledge to figure out what kind of rate to proportionally allocate to each upstream element.

This is a mode in which a proxy receives many requests from a limited number of sources.

I think that the proxy does not necessarily need to know all of the upstream neighbors. It can, e.g., respond to each request it receives and tell the upstream neighbor to discard 20% of the traffic it would otherwise route to this proxy. Each upstream proxy then discards 20% of the traffic it would send to this downstream neighbor, and the downstream proxy sheds 20% of its total traffic. A more sophisticated scheme may allocate a higher share to certain known proxies and use a lower share for all others. However, this works even without tracking (or configuring) all upstream neighbors.
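As a sketch of the upstream side, each neighbor could apply the reported percentage independently by dropping requests at random. The function name and the injectable random source are assumptions for illustration; a deterministic (e.g. counter-based) scheme would work just as well.

```python
import random


def should_forward(discard_fraction: float, rng: random.Random) -> bool:
    """Decide for a single request whether to forward it downstream.

    discard_fraction is the share of traffic the downstream proxy asked
    this neighbor to discard, e.g. 0.2 for 20%.
    """
    # rng.random() is uniform on [0.0, 1.0), so a request is forwarded with
    # probability 1 - discard_fraction.
    return rng.random() >= discard_fraction
```

Because every upstream proxy applies the same fraction to its own traffic, the aggregate load at the downstream proxy drops by about that fraction without it having to know who its neighbors are.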

In the other mode, we have something like an edge proxy, which has many, many clients that connect to it. How does that edge element implement overload handling? Certainly it does; if the network as a whole is overloaded, you need to push back to the actual source so that the overall load being sent into the network is reduced. Those sources are the endpoints themselves. In this mode, the existing 503 mechanism works better. But, you want the edge proxy to push back gradually, sending 503 (or something else) to some fraction of the clients, so that you reduce the input load proportionally.

I agree that this scenario can probably be handled well by 503. If each client only sends one request there is not much to throttle and a proxy can push back by selectively sending 503s to some clients.
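One way an edge proxy might pick which clients get a 503 is sketched below in Python. Selecting clients by hashing an identifier is an assumption of this sketch (it keeps the choice stable per client); picking clients at random per request would serve the same purpose. The function name and the client_id parameter are hypothetical.

```python
import hashlib
from typing import Optional


def overload_response(client_id: str, reject_fraction: float) -> Optional[int]:
    """Return 503 if this client's request should be rejected, else None
    (meaning the request is processed normally)."""
    digest = hashlib.sha256(client_id.encode("utf-8")).digest()
    # Map the client to a stable bucket in [0.0, 1.0) using the first 4 bytes.
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return 503 if bucket < reject_fraction else None
```

Raising reject_fraction step by step pushes back on a growing share of clients, which gives the gradual, proportional reduction of input load described above.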

Thanks,

Volker



