
Re: [Sipping] Overload/Congestion mechanisms - design choices and issues



Volker, Jonathan,

"Load balancing" and "overload control" are two related but distinct issues. I get the feeling that we need to make the distinction clear before plunging into this discussion. I don't pretend to know the answers, but let me give it a go.

"Load balancing" is about resource optimization and scalability. What makes load balancing for SIP (or proxy systems in general) difficult is that behind a SIP element there is typically a cloud of other systems. An upstream SIP peer typically has no knowledge of the structure of this cloud, especially when it is in another domain. My hunch is that to enable a peer to make good load balancing decisions, it will need load information that is based not only on its direct next hop but also on the systems behind that hop. So a major issue (for load balancing) is the propagation and aggregation of this information (and the underlying model).
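
To make the aggregation point a bit more concrete, here is a rough Python sketch of one possible model. Everything in it is an assumption for illustration only: the Node structure, the requests-per-second unit, and the rule that a node advertises the smaller of its own headroom and the summed headroom of the systems behind it. It is not a proposal for the actual model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical SIP element with the systems it forwards to."""
    name: str
    capacity: float  # requests/s the element itself can process (assumed unit)
    load: float      # requests/s it is currently handling
    downstream: List["Node"] = field(default_factory=list)


def spare_capacity(node: Node) -> float:
    """Headroom a node could advertise upstream under the assumed rule.

    The node cannot accept more than it can process itself, nor more
    than the systems behind it can absorb in total.
    """
    own = max(node.capacity - node.load, 0.0)
    if not node.downstream:
        return own
    behind = sum(spare_capacity(child) for child in node.downstream)
    return min(own, behind)


# The proxy has 600 requests/s of headroom itself, but the two servers
# behind it only have 50 + 200 between them.
proxy = Node("proxy", 1000.0, 400.0, [
    Node("server-a", 300.0, 250.0),
    Node("server-b", 300.0, 100.0),
])
advertised = spare_capacity(proxy)
```

With these numbers the proxy would advertise 250 requests/s, which is what an upstream peer looking only at the proxy's own load would overestimate.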

"Overload/congestion control" is about reducing the load on a particular system that is receiving more than it is capable of handling. The two mechanisms that can be applied here are traffic diversion and throttling. To decide that overload control is needed, information from a single element (i.e. the system that is experiencing overload) suffices. However, to enable good diversion/throttling decisions, you may need the same information that is used for load balancing.
I wager that the upstream system does not need to know exactly which part of the peer system (disk, network, etc.) is experiencing overload. This information would be useful for diagnosis, but not for deciding what to do (i.e. divert and/or throttle). Since the system under overload would know best what effect certain decisions would have on its overload status (and perhaps also on the rest of its network), I reckon that the system under overload should make a suggestion to the upstream peer on what to do. In other words, not "help, I am 98% loaded, and it is because my disks are failing" but concrete suggestions / parameters on what to do (see next).


Regarding traffic diversion: what we currently have is forwarding or 3xx responses. What we might need here is something that Diameter, for example, also offers, which I would call "selective temporary redirection". In Diameter, you can say "for the next 10 minutes, send all traffic for {this session, this domain, this user, ...} to this and that node". You could do the same for SIP; the selection criteria could, e.g., be pre-established, standardized profiles or fully dynamic.
A solution would probably need to support both implicit and explicit alternative hosts, i.e. the system in overload must be able to express "send your traffic to these nodes instead" but also "to any node other than me".
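
A rough sketch of how an upstream peer might apply such a redirection, in Python. The Redirection record, its field names, and the dictionary form of a request are all hypothetical and made up for illustration; nothing like this is defined for SIP today. An empty list of alternatives stands for the implicit "any node other than me" case.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Redirection:
    """Hypothetical 'selective temporary redirection' sent by a node."""
    issuer: str               # the node that asked to be relieved
    match: Dict[str, str]     # selection criteria; empty matches everything
    alternatives: List[str]   # explicit hosts; empty means "anyone but me"
    expires_at: float         # time (seconds) after which the rule lapses


def next_hops(request: Dict[str, str], candidates: List[str],
              rules: List[Redirection], now: float) -> List[str]:
    """Return the hosts the request may be sent to at time `now`."""
    hops = list(candidates)
    for rule in rules:
        if now >= rule.expires_at:
            continue  # the suggested time period is over
        if any(request.get(k) != v for k, v in rule.match.items()):
            continue  # request is outside the selection criteria
        if rule.alternatives:
            return list(rule.alternatives)  # explicit alternative hosts
        hops = [h for h in hops if h != rule.issuer]  # implicit: avoid issuer
    return hops


request = {"domain": "example.com", "user": "alice"}
explicit = [Redirection("p1", {"domain": "example.com"}, ["p3"], 600.0)]
implicit = [Redirection("p1", {}, [], 600.0)]

during = next_hops(request, ["p1", "p2"], explicit, now=0.0)      # ["p3"]
after = next_hops(request, ["p1", "p2"], explicit, now=600.0)     # ["p1", "p2"]
avoiding = next_hops(request, ["p1", "p2"], implicit, now=0.0)    # ["p2"]
```

The 600 seconds mirror the "for the next 10 minutes" example; whether expiry should be absolute or relative is one of the things that would need deciding.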


For throttling we have 503, which only applies to a single request. What we might need here is a mechanism that allows a node to respond with classification criteria (for example "all dialog creating requests", or something more dynamic like IMS initial filter criteria) and something like shaper settings, e.g. minimum intervals between requests, max requests per time interval, etc. Again, such a filter would probably have a suggested time period.
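
To illustrate, a minimal Python sketch of how an upstream peer could enforce such a filter. The Throttle class, its parameter names, and the method check used as a stand-in for "dialog creating requests" are assumptions of mine; a real classifier would need more than the method name.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict


@dataclass
class Throttle:
    """Hypothetical throttle filter suggested by an overloaded node."""
    matches: Callable[[Dict[str, str]], bool]  # classification criteria
    min_interval: float   # minimum seconds between matching requests
    max_requests: int     # maximum matching requests per window
    window: float         # length of the window in seconds
    expires_at: float     # end of the suggested time period
    _sent: Deque[float] = field(default_factory=deque)

    def allow(self, request: Dict[str, str], now: float) -> bool:
        """Decide whether the request may be forwarded at time `now`."""
        if now >= self.expires_at or not self.matches(request):
            return True  # filter lapsed, or request is not in its class
        # Forget requests that have dropped out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if self._sent and now - self._sent[-1] < self.min_interval:
            return False  # too soon after the previous matching request
        if len(self._sent) >= self.max_requests:
            return False  # budget for this window is used up
        self._sent.append(now)
        return True


# Simplified stand-in for "all dialog creating requests".
throttle = Throttle(
    matches=lambda r: r.get("method") in ("INVITE", "SUBSCRIBE"),
    min_interval=1.0, max_requests=2, window=10.0, expires_at=60.0,
)
invite = {"method": "INVITE"}
decisions = [throttle.allow(invite, t) for t in (0.0, 0.5, 1.0, 3.0)]
```

Here the second INVITE is held back by the minimum interval and the fourth by the per-window maximum, while requests outside the class (a BYE, say) pass untouched.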

My 2 cents,

Jeroen



_______________________________________________
Sipping mailing list  https://www1.ietf.org/mailman/listinfo/sipping
This list is for NEW development of the application of SIP
Use sip-implementors at cs.columbia.edu for questions on current sip
Use sip at ietf.org for new developments of core SIP