Summary of decisions.

•      draft-ietf-soc-overload-design-00: New version to be submitted after the meeting to reflect discussion on message prioritization.

•      draft-gurbani-soc-overload-control-01: Inconclusive hum on WG adoption. A new hum will be taken on the mailing list after the meeting. New version to be submitted after the meeting.

•      The scope of the SoC WG, as stated in the charter:
"Overload occurs if a SIP server does not have sufficient resources to process all incoming SIP messages.
These resources may include CPU, memory, input/output, or disk resources...

Overload control is used by a SIP server if it is unable to process all SIP requests due to resource constraints...

There are other failure cases in which a SIP server can successfully process incoming requests but has to reject them for other reasons...
Overload control mechanisms do not apply in these cases and SIP provides appropriate response codes for them."

•      draft-partha-soc-overload-resource-availability-00: Authors encouraged to discuss draft with MEDIACTRL.
Recommendation to take draft to DISPATCH. The draft is out of scope for the SoC WG.

•      draft-jones-sip-overload-sce-00: Recommendation to take draft to DISPATCH. The draft is out of scope for the SoC WG.

 

 

 

Notes from Spencer Dawkins

 

1.1                SOC

SOC is new. The working group had a virtual interim four weeks ago, but this is the first face-to-face meeting.

 

1.1.1            Administrative/Agenda bash

Order of presentations was switched ... no other bashing happened.

 

1.1.2            SIP Load Control Event Package

(draft-shen-soc-load-control-event-package-00): Volker speaking for Charles Shen and Henning who aren't present

 

This draft adds controls for overload events that are known in advance, complementing control based on feedback.

Proposal is for XML document that describes the event, in an RFC 3265 subscribe/notify implementation.

 

1.1.3            A Mechanism for SIP Avalanche Restart Overload Control

(draft-shen-soc-avalanche-restart-overload-00): Volker speaking for Charles Shen and Henning who aren't present

Problem is simultaneous power-on and resulting floods of messages – how to determine how long you back off before sending traffic into an avalanche situation.

 

1.1.4            Design Consideration

(draft-ietf-soc-overload-design-00)

Volker did a longer presentation on this draft in the interim meeting.

01 version is being prepared for submission (expect this after IETF 78).

No major changes to 00 version (added reference, addressed comments from Geoff).

SIP Overload Control is specifically for SIP-level message processing overload. Other mechanisms can be required for network elements that have additional resources beyond SIP message processing (DSPs, trunk lines). These mechanisms are out of scope for the working group at this time.

Want to do better than choosing messages to throttle at random.

Seeing messages with RPH isn't enough to identify what to do – in some networks, all messages may carry RPH. Need RPH with known namespace and specific priority values, and RPH COULD be used to lower the priority (not just raise it).

How many levels of RPH are we going to target? Just two, or lots?

What unit of feedback do we target? Number of sessions/calls, number of messages ...

Should use both numbers and rates ... and these are orthogonal.

Do we need a single unit for overload control? Different units may be limited by different SIP message processing resources.

We talk about calls per second, but Hadriel thinks it's safer to talk about out-of-dialog requests.

Robert thinks that it would be bad for these mechanisms to have to know about sessions. John Elwell agreed.

Robert (as AD) – confused about continued discussion about this – when you came up with an algorithm that didn't collapse, what did it do? :-) We should be further down the path than this level of noodling.

Simulations were targeting INVITE-based dialog-creating messages.

Partha - How can we make this generic? Call flows can be very different (including PRACK, for example).

Hadriel – doesn't matter how many messages it takes to create a call, we just whack on the base of dialog-creating requests. We can't know how much work we avoided by whacking any specific INVITE, for example.

Vijay – Partha is asking if we are going to standardize a behavior.

Jonathan – need to specify what happens on the sending side – TCP wouldn't work if you just said "oops, loss happened, please respond appropriately". Document is useless without this, so please include this with the mechanism.

Will submit 01 "soon". Will add text around message prioritization and units, to capture discussion on mailing list and during this meeting.

Milestone is to ship this draft to IESG by September – is it ready for WGLC?

Robert – don't ask people to hum on a draft they haven't seen – that's awkward.

Keith – ask "are there any open issues with the document?"

Chairs will wait for 01 version of the draft, please read and comment.

Hadriel – we've been talking about RPH in another draft, but that also affects this draft. Needs to be included in this draft.

Please send text for RPH to Volker – Janet Gunn said (on Jabber) that she would send text.

Heshem – is there any enforcement for pushback? There's text about this in the Design Considerations draft – you can reject messages from senders who aren't behaving, etc.

 

1.1.5            SIP Overload Control

(draft-gurbani-soc-overload-control-01)

Work began about four years ago (and 8 revisions of an individual draft ago).

We use Via, because the reaction needs to be hop-by-hop.

Oc-accept – why not follow rport semantics?

Is this feedback in one direction? Yes.

Make ip+port the default, and get rid of the oc-port parameter altogether? Sure.

Simplify oc-seq parameter, so it always uses the same format? Using UDP, so thinking that we should sequence responses to retransmitted requests. Robert, speaking as individual – this is fundamentally broken, don't even go there.

Jonathan – TCP RTT retransmission has the same problem, and they ignore retransmitted segments when calculating RTT. Need to do the same kind of thing here? Jonathan now thinking that the issue isn't quite the same...

How fine a granularity do we want these servers to be working with? Need to think about this before we spend a lot of time thinking about sequencing.

Should we just use timestamping? Receiver is just figuring out if this message has feedback that we've already seen in another packet previously received.

Paul – this is just a number with a decimal point, or without one. In the end, it's just a number to the recipient.

Hadriel – it's just encoding a floating point. You might use a timestamp to create the floating point, but the receiver doesn't care how you created it.
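
The receiver-side check described here (has this feedback already been superseded by something seen earlier?) can be sketched roughly as follows. This is only an illustration, assuming the oc-seq value arrives as a decimal string, with or without a fractional part, and that a larger value means newer feedback. The `FeedbackTracker` class and its `update` method are hypothetical names, not taken from the draft.

```python
from decimal import Decimal


class FeedbackTracker:
    """Tracks the newest overload feedback seen from one downstream hop.

    Hypothetical helper: the draft does not define this interface.
    """

    def __init__(self):
        self.last_seq = None  # highest oc-seq accepted so far
        self.loss_rate = 0    # most recently accepted oc value (percent)

    def update(self, oc, oc_seq):
        """Apply feedback if it is newer than anything seen so far.

        Returns True if the feedback was applied, False if it was stale.
        """
        # The recipient treats oc-seq as "just a number": it does not
        # care whether the sender derived it from a timestamp or a counter.
        seq = Decimal(oc_seq)
        if self.last_seq is not None and seq <= self.last_seq:
            return False  # duplicate or reordered response, ignore it
        self.last_seq = seq
        self.loss_rate = int(oc)
        return True
```

With this sketch, feedback carrying oc-seq "99.5" is ignored once "100" has been accepted, while "100.25" replaces it.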

R-P header summary – need to agree which R-P namespace upstream and downstream will support.

Keith – remember that multiple namespaces can be supported.

Open issues:

Message prioritization – how do you know what to drop? Lots of discussion. Identifying messages which must not be dropped (RPH, for example) and which can be dropped (out-of-dialog messages). ("dropped" means "handled specially").

            •           John Elwell – might not be able to achieve 30-percent drop without dropping RPH messages – correct, then you need to drop them.

            •           Keith thinks this is based on the business arrangement you have for the namespace, can't write business arrangements here.

            •           Hadriel – you're only meeting a percentage; if you have only one call, you can't drop 30 percent of a call. Question is "dropping 30 percent of NEW requests, or 30 percent of ALL requests"? Should we signal two levels of dropping? If you have two levels, you signal oc-low and oc-high, for example.

            •           Keith – possible to define a priority scheme that LOWERS priorities. Different business arrangements describe different things.

            •           Hadriel – yes, spam might be low, for example.

            •           Paul – read RPH after discussion started. Understanding is that if you start understanding RPH, you have to understand everything about relative priority ordering. I agree with out of dialog vs. in dialog, but it all comes down to the details. We're not stating our assumptions about the impact of different messages on the system. Can't talk about this if we just assert what the result is. Not all in-dialog messages are equally important – PRACK, BYE ... and that's only in the space we know!

            •           Janet Gunn via Jabber – more than two levels, needs to be a local policy.
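
One way to picture the prioritization debate is the sketch below. It is an illustration under stated assumptions only: the requested loss rate applies to new (out-of-dialog) requests, requests carrying an RPH value that local policy honors are shed last, and in-dialog requests are never shed. The `Request` type, the `select_drops` function and the batch-based approach are hypothetical. Neither draft specifies this algorithm.

```python
from dataclasses import dataclass


@dataclass
class Request:
    method: str
    in_dialog: bool = False
    priority: bool = False  # carries an RPH value that local policy honors


def select_drops(batch, loss_percent):
    """Pick the requests to reject from one batch.

    Sheds loss_percent of the NEW (out-of-dialog) requests in the batch,
    taking non-priority requests before priority ones.
    """
    new = [r for r in batch if not r.in_dialog]
    # Integer division rounds down, so 30 percent of one call is zero calls.
    quota = (len(new) * loss_percent) // 100
    # Stable sort on the boolean: non-priority (False) requests come first.
    candidates = sorted(new, key=lambda r: r.priority)
    return candidates[:quota]
```

If 8 of 10 new requests carry an honored RPH value and the target is 30 percent, the sketch has to shed one priority request, which is the situation John Elwell raised. With a single call in the batch nothing is shed, which is Hadriel's point about only approximating a percentage.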

Corner case – downstream server that is so overloaded that it can't even respond, so you just stop sending. Would we send a NOTIFY saying "go ahead"?

            •           Hadriel – what dialog information is available? Would be implementation-specific.

            •           Do we worry about the seconds where a rebooted server is available and no load is presented?

            •           Partha – NOTIFY could be used to pass OC values – same parameters, only traffic is different.

OC parameter currently carries loss rates. Could support other measures.

            •           What did testing show? Other measures behave the same way.

Can we adopt as working group draft? Hums, but nothing conclusive.

How many people have read the draft? About 8. Hum on the mailing list? We'll do that.

Hadriel – can you ask about any possible objections when you do the hum on the list?

Robert – is anyone thinking of alternatives to this draft?

Adam – there are five drafts on the agenda, and at least two that do the same thing, aren't there? Salvatore doesn't think they have the same subject. Volker agrees that the drafts do overlap. Adam thinks we have to look at interactions if we're going to have multiple mechanisms – that's why I was silent.

 

1.1.6            SIP Resource availability Event package

(draft-partha-soc-overload-resource-availability-00)

Each device has a variety of resources, and the protocols used affect what resources are needed, and how much of each resource.

This proposal isn't in scope for the working group at this time – but is it interesting?

Spencer – MRB in MEDIACTRL had to worry about the same kind of things (licensing, DSPs, codec type, etc.) Talk to those guys and see what you can steal from each other?

Dale – people are going to have to figure this stuff out in the real world, would be great to have a standard for it.

John Elwell – might be able to accept audio on audio/video call – may not want to push back at the call level.

Dale – load balancers need to understand this kind of stuff dynamically.

(Spencer at mike)

Hadriel – take this to DISPATCH.

Robert – right answer. Most of the time, SIP UAs aren't going to care about the information you're exposing.

Spencer – talk to the guys in MEDIACTRL in the hallway on the way to the DISPATCH meeting :-) ... Media Resource Brokers are exactly the kind of SIP UAs (B2BUAs) that need to know stuff like you're talking about in your proposal.

 

1.1.7            Transmission of a Session Capacity Estimate (SCE) to Prevent SIP Server Overload

 (draft-jones-sip-overload-sce-00)

SCE is conveyed through normal SIP signaling (including SIP OPTIONS when no other traffic is being sent).

Hadriel – like the OPTIONS mechanism, and there are a lot of them in wire traces (no matter what we'd like to believe). This is an absolute number of calls for capacity – it's like trunk capacity. Not quite in scope for us.

Partha – when you have one kind of call, SCE will work. If you have different kinds of calls, it won't work.

Hadriel – it will work, but it will block calls that shouldn't have been blocked. Every INVITE you get creates a different amount of load, even for INVITEs to the same UA. Could this capacity be dynamic?

 

1.1.8            Open Discussion

Expecting another virtual interim meeting at the end of September – watch the mailing list for details.

 

 

 

 

Notes from Paul Kyzivat

 

Stream of consciousness (very rough) notes from SoC session 26-july.

Backup note taker: Paul Kyzivat

 

First time this wg has met face to face. Interim meeting previously.

 

 

* Volker speaking for Charles Shen and Henning who aren't present.

 

Just an overview of two documents they submitted.

For both, request to read and send comments to list.

 

* Draft: ...overload_design:

 

Volker: Status report. Only minor changes.

 

* Draft: ...overload_control:

 

Volker: Summary of scope. Mention of discussion about session overload, etc. – that for now it is out of scope.

 

Reference from mailing list – should have some guidelines for what should be preserved and what dropped. Vijay mentioned that RPH must be taken into account with its priority. Keith mentioned that on network boundaries messages must be authorized for the RPH too. Hadriel agrees the selection is going to be configured, but wants guidance on how many levels of prioritization. Partha asked about the units of reduction – could it be sessions rather than requests. Volker acks these candidates, suggests that selection of unit is needed, and we need a choice rather than it being negotiated or otherwise variable.

 

Sohel Kahen(?) asked about absolute limits, rather than percentage reduction.

 

Hadriel commented on that too. He favors messages/sec or percentage rate.

 

Robert as individual objects to upstream element needing to be aware of sessions – some things won't know that.

 

Robert as AD – wonders about discussion on selection algorithm – what did simulations use? Volker's answer was that simulations used only INVITE-based call sequences, and only dropped INVITEs.

 

Partha noted that not all call sequences are the same, some are more lengthy, and this makes it difficult to know how much to push back. There was then some extended discussion on this. Jonathan Rosenberg compared this to TCP congestion control with only loss indicated – need the algorithm to be specified to make it work.

 

Volker will issue -01 version – will have text around message prioritization and units. After some procedural discussions, decided that after -01 is published will query if this is ready for WGLC.

 

Hadriel states that some considerations are needed re RPH.

Vijay reports that Janet, on jabber, says will send text on RPH.

 

* vijay: overload-control-mechanism draft:

 

Robert as indiv contrib. objected to using sequence numbers. Then JDR said this is a solved problem in other protocols. There was some discussion about this. This was decided to be still unsolved – requires more list discussion.

 

Regarding R-P header, Keith comments that multiple namespaces may need to be supported.

 

Major open issue is which messages to drop.

 

Open issue 3: units of reduction – loss rate or absolute rate. No new conclusion.

 

Request to adopt as a WG item, hum: weak for, none against. Robert asked if people understood the question. All seemed to. Asked who read the draft. Eight had.

 

Hadriel asked if we could ask on the list why people didn't hum. Robert speculated that people may be looking for alternatives. Adam commented that he thought there were other drafts that are alternatives. He is looking for some evaluation, or vision of how these things work together. Will take another hum on mailing list.

 

* Draft: ...Partha on overload-resource-availability:

 

Victor ? asked in middle of presentation if we agree that this is also a problem we need to solve. Spencer mentioned that MEDIACTRL had similar issues – perhaps we need to talk to them about it. Dale Worley also thought this could be an important problem to solve. John Elwell asked if you would always want to reject a video call if you couldn't handle it – you might just accept only the audio. Dale mentioned a load balancer hasn't been addressed here. Hadriel thought this is something different – would like this taken to DISPATCH – or dropped as impossible. Robert says this information is useful to an *application* that needs a lot of application knowledge to make use of it. Robert agrees this should be taken to DISPATCH, with view that this is application-application communication. So feedback to move it there.

 

* Draft: ...SCE draft by Victor Pascual:

 

Hadriel in favor of OPTIONS ping. But objects to the info being sent – doesn't believe it can be estimated usefully. Didn't get very far. Lots of audience discussion – recommended to go to DISPATCH.