Re: [Sipping] Overload/Congestion mechanisms - design choices and issues
Volker, Jonathan,
"load balancing" and "overload control" are two related but different
issues. I get the feeling that we need to get the distinction clear before
plunging into this discussion. I don't pretend to know the answers, but let
me give it a go.
"Load balancing" is about resource optimization and scalability. What makes
load balancing for SIP (or proxy systems in general) difficult is that
behind a SIP element there is typically a cloud of other systems. An
upstream SIP peer typically has no knowledge of the structure of this cloud,
especially when it is in another domain. My hunch is that to enable a peer
to make good load balancing decisions, it will need load information that is
not only based on its direct next hop but also on the systems behind that.
So a major issue (for load balancing) is the propagation and aggregation of
this information (and the underlying model).
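
To make the aggregation problem a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration and not part of any SIP specification: the `Node` type, the requests-per-second units, and the model itself, in which every request passes through the next hop and then through one of the systems behind it, so the usable headroom is the smaller of the two.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical SIP element with the cloud of systems behind it."""
    name: str
    capacity: float  # requests/second the element can handle (assumed unit)
    load: float      # requests/second it is currently handling
    downstream: List["Node"] = field(default_factory=list)

    def headroom(self) -> float:
        """Spare capacity as seen by an upstream peer.

        Assumed model: a request needs the element itself plus one of
        its downstream systems, so the aggregate is the minimum of the
        element's own spare capacity and the total spare capacity behind it.
        """
        own = max(self.capacity - self.load, 0.0)
        if not self.downstream:
            return own
        behind = sum(n.headroom() for n in self.downstream)
        return min(own, behind)


def pick_next_hop(candidates: List[Node]) -> Node:
    """Choose the next hop with the most aggregated headroom."""
    return max(candidates, key=lambda n: n.headroom())
```

Under this model a next hop that looks lightly loaded on its own can still be a poor choice if the systems behind it are nearly full, which is why information about the direct next hop alone is not enough.
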
"Overload/congestion control" is about reducing the load on a particular
system that is getting more than it is capable of handling. The two
mechanisms that can be applied here are traffic diversion and throttling. To
decide that overload control is needed, information from a single element
(i.e. the system that is experiencing overload) suffices. However, to enable
good diversion/throttling decisions, you may need the same information that
is used for load balancing.
I wager that the upstream system does not need to know exactly which part of
the peer system (disk, network, etc.) is experiencing overload. This
information would be useful for diagnosis, but not for deciding what to do
(i.e. divert and/or throttle). Since the system under overload would know best
what effect certain decisions would have on its overload status (and perhaps
also on the rest of its network), I reckon that the system under overload
should make a suggestion to the upstream peer on what to do. In other words,
not "help, I am 98% loaded, and it is because my disks are failing" but
concrete suggestions / parameters on what to do (see below).
Regarding traffic diversion: what we currently have is forwarding or 3xx
responses. What we might need here is something that e.g. Diameter also
offers, which I would call "selective temporary redirection". In Diameter, you
can say "for the next 10 minutes, send all traffic for {this session, this
domain, this user, ...} to this and that node". You could do the same for SIP;
the selection criteria could be e.g. pre-established, standardized profiles
or fully dynamic.
A solution would probably need to support both implicit and explicit
alternative hosts, i.e. the system in overload must be able to express "send
your traffic to these nodes instead" but also "to any node other than me".
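
A rough Python sketch of what such a selective temporary redirection could look like on the upstream side follows. The `DiversionRule` type, its field names and the dictionary-based request are all hypothetical; no such SIP mechanism exists in this form. An empty `targets` list stands for the implicit case ("any node other than me").

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DiversionRule:
    """Hypothetical diversion suggestion sent by an overloaded node."""
    origin: str         # the node that issued the rule (the one in overload)
    match_field: str    # selection criterion, e.g. "domain" or "user"
    match_value: str    # value the criterion must equal
    expires_at: float   # time (seconds) after which the rule lapses
    targets: List[str]  # explicit alternatives; empty means "anyone but me"


def candidates(rule: DiversionRule, request: Dict[str, str],
               known_nodes: List[str], now: float) -> List[str]:
    """Return the nodes the upstream peer may send this request to."""
    expired = now >= rule.expires_at
    selected = request.get(rule.match_field) == rule.match_value
    if expired or not selected:
        return list(known_nodes)   # rule does not apply: no restriction
    if rule.targets:
        return list(rule.targets)  # explicit alternative hosts
    # Implicit alternative: any known node except the overloaded one.
    return [n for n in known_nodes if n != rule.origin]
```

For example, a rule with `match_field="domain"`, `match_value="example.com"` and `expires_at` set 600 seconds ahead would correspond to "for the next 10 minutes, send all traffic for this domain elsewhere".
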
For throttling we have 503, which only applies to a single request. What we
might need here is a mechanism that allows a node to respond with
classification criteria (for example "all dialog-creating requests", or
something more dynamic like IMS initial filter criteria) and something like
shaper settings, e.g. minimum intervals between requests, max requests per
time interval, etc. Again, such a filter would probably have a suggested
time period.
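
As a sketch of how an upstream peer might apply such a filter, assuming it receives a classification predicate, two shaper settings and a validity period: the `ThrottleFilter` type, the dictionary-based request and the simplified "dialog-creating" test below are illustrative assumptions, not an existing SIP feature.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ThrottleFilter:
    """Hypothetical throttle suggestion from an overloaded node."""
    matches: Callable[[Dict[str, str]], bool]  # classification criteria
    min_interval: float   # minimum seconds between matching requests
    max_per_window: int   # maximum matching requests per window
    window: float         # window length in seconds
    expires_at: float     # time (seconds) after which the filter lapses
    _sent: List[float] = field(default_factory=list)

    def allow(self, request: Dict[str, str], now: float) -> bool:
        """Decide whether the request may be sent to the node at `now`."""
        if now >= self.expires_at or not self.matches(request):
            return True  # filter expired or request is outside its class
        # Forget send times that have fallen out of the sliding window.
        self._sent = [t for t in self._sent if now - t < self.window]
        if self._sent and now - self._sent[-1] < self.min_interval:
            return False  # too soon after the previous matching request
        if len(self._sent) >= self.max_per_window:
            return False  # window budget already used up
        self._sent.append(now)
        return True


def dialog_creating(request: Dict[str, str]) -> bool:
    """Simplified assumption: INVITE or SUBSCRIBE without a To tag."""
    return request.get("method") in {"INVITE", "SUBSCRIBE"} and not request.get("to_tag")
```

Requests outside the class (a BYE, say) pass through untouched, so an overloaded node could shed new dialogs while still letting existing ones complete.
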
My 2 cents,
Jeroen
_______________________________________________
Sipping mailing list https://www1.ietf.org/mailman/listinfo/sipping
This list is for NEW development of the application of SIP
Use sip-implementors at cs.columbia.edu for questions on current sip
Use sip at ietf.org for new developments of core SIP