
[Megaco] Conclusion "MGC Out-of-Service" - Immediate vs Delayed Failover; RE: Which MGC should be contacted first on retransmission timeout?



Hello Vikram,

I guess there is by now consensus within the H.248 community that there
is no such thing as a "better behaviour" in *general*, because we have
multiple useful but "conditional" actions, as you also describe.

Having multiple possible actions is in line with H.248.
[The underlying problem in practice is H.248 implementations that
support just "one action", which is H.248 compliant on the one hand, but
"not the best action" under some particular conditions.]

In this scenario the conditions might be the following:
C1: H.248 transport mode (reliable vs non-assured transport)
C2: IPLR (IP packet loss ratio) in the network segment of the H.248 CA
C3: Node availability of primary MGC implementation
C4: Node availability of secondary MGC implementation
(C5: Node availability of MG implementation)
C6: ...

The next step would be a quantitative estimation of time periods, in
order to distinguish re-attempts with the primary MGC vs failover action
vs ...
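
As a rough illustration of such an estimate, here is a minimal sketch.
The function name and parameters are hypothetical, and the 30 s figures
are simply borrowed from Raphael's "half a minute" example quoted further
down; none of this comes from H.248 itself.

```python
def time_to_failover(request_timeout_s: float,
                     disconnected_attempts: int,
                     disconnected_timeout_s: float) -> float:
    """Seconds from the first unanswered request until the MG starts failover.

    request_timeout_s      -- retransmission timeout of the original request
                              (e.g. a Notify); assumed value
    disconnected_attempts  -- number of SC "Disconnected" attempts sent to
                              the same MGC before giving up (0 = immediate
                              failover)
    disconnected_timeout_s -- retransmission timeout of each such attempt
    """
    return request_timeout_s + disconnected_attempts * disconnected_timeout_s


# Immediate failover: no SC "Disconnected" attempt towards the same MGC.
immediate = time_to_failover(30.0, 0, 30.0)
# Delayed failover: one SC "Disconnected" attempt first (Raphael's example).
delayed = time_to_failover(30.0, 1, 30.0)
print(immediate, delayed)  # 30.0 60.0
```

Whether the extra 30 s is worthwhile depends on conditions such as C1 to
C4 above, which is why the figures belong in a profile rather than in
code.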

Conclusion: a real-world MGC-MG tandem (= H.248 CA) must provide a
coordinated set of "service change rules" (rule Rx: a set of conditions C
leads to action Ay ...), compliant with H.248, of course.
Such coordination can only be achieved via an H.248 profile
specification, i.e. the mutual agreement between MGC and MG.
A standardized profile is just the first step.
You need to add more detail on call-independent procedures (e.g. using
the macros from ETSI TR 183 025).
Then you also have to specify the "conditional actions" in the profile.
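
To make the "rule Rx: conditions C lead to action Ay" idea concrete, here
is a minimal sketch of such a rule table. All names, thresholds and action
labels are hypothetical and chosen only for illustration; a real profile
would define its own conditions and actions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Conditions:
    reliable_transport: bool  # C1: reliable vs non-assured transport
    iplr: float               # C2: IP packet loss ratio, 0.0 .. 1.0
    silent_for_s: float       # how long the MGC has been unresponsive


# A rule is (name, predicate over the conditions, resulting action).
Rule = Tuple[str, Callable[[Conditions], bool], str]

# Hypothetical rule set; the first matching rule wins.
RULES: List[Rule] = [
    # Reliable transport reports connection loss quickly: fail over early.
    ("R1", lambda c: c.reliable_transport and c.silent_for_s >= 5.0,
     "FAILOVER"),
    # Lossy non-assured transport: keep trying the primary MGC for longer.
    ("R2", lambda c: (not c.reliable_transport and c.iplr > 0.01
                      and c.silent_for_s < 60.0),
     "RETRY_PRIMARY"),
    # Non-assured transport, long silence: fail over.
    ("R3", lambda c: not c.reliable_transport and c.silent_for_s >= 30.0,
     "FAILOVER"),
]


def select_action(c: Conditions,
                  rules: List[Rule] = RULES,
                  default: str = "RETRY_PRIMARY") -> str:
    """Return the action of the first rule whose conditions hold."""
    for _name, predicate, action in rules:
        if predicate(c):
            return action
    return default


print(select_action(Conditions(True, 0.0, 10.0)))    # FAILOVER (R1)
print(select_action(Conditions(False, 0.05, 40.0)))  # RETRY_PRIMARY (R2)
```

The point of the table form is that MGC and MG can be checked against the
same ordered rule list, which is what the profile agreement would have to
pin down.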

This is the only way forward in my opinion.
Otherwise, both H.248 entities may correctly claim an H.248-compliant
ServiceChange implementation while covering only a subset of real-world
use cases.
Otherwise, you may end up in protocol deadlocks.
MGCs in particular, as masters, should be capable of differentiating
between multiple actions, or the profile spec should specify the set of
actions expected from their controlled MGs.

Not to forget the management plane, which requires similar coordination
between the fault/alarm management of all controlled MGs and the
primary/secondary MGCs. The alarm management behaviour (related to H.248
ServiceChanges) might also be covered in the H.248 profile spec.

BR, Albrecht

PS
Some time ago we made a high-level proposal on this SC aspect; see the
ETSI TISPAN contribution:

13tTD374 WI-03051 H.248 System Management - Conclusion "MGC
Out-of-Service" - Immediate vs Delayed Failover
http://docbox.etsi.org/TISPAN/TISPAN/99-Archive/2007/50-20070514-Sophia-13ter/13tTD374_WI-03051_H.248_System_Management_%E2%80%93_Conclusion_%E2%80%9CMGC_Out-of-Service%E2%80%9D_%E2%80%93_Immediate_vs_Delayed_Failover.doc



> -----Original Message-----
> From: Vikram Guleria [mailto:vguleria at redback.com] 
> Sent: Tuesday, 9 September 2008 19:56
> To: Schwarz Albrecht
> Cc: Raphael Tryster; megaco at ietf.org
> Subject: Re: [Megaco] Which MGC should be contacted first on 
> retransmission timeout?
> 
> Hi Albrecht,
> 
> After going through this thread, would it be correct to assume:
>
> 1. There is no reason to retry the same MGC with which a
> transaction failure has happened for Notify requests sent from the MG.
> 2. For reliable transport protocols it is possible to
> find out if the link/connection to the MGC has gone down. I think
> the reason F.3.6 mentions retrying the same MGC is that we
> do not want to go into a cycle of retrying all other MGCs
> during transient failures. But don't you think that the
> reliable transport layer itself should handle such transient
> failures? The transient failures should be transparent to the
> H.248 layer.
>
> Isn't it better to have the MG always try a different MGC, starting
> from the primary (first in the list), irrespective of the
> transport protocol? For reliable transport protocols, the
> transport layer will have the capability to recover from
> transient failures transparently, so that the MG does not enter
> the disconnect procedure.
> 
> Thanks,
> Vikram
> 
> 
> 
> 
> Schwarz Albrecht wrote:
> > Hi Raphael,
> >
> > you may be right for a particular network instantiation, whereas
> > H.248.1 considers the more general case.
> >
> > Here: I guess "Notify" relates to the H.248.14-based "MGC polling
> > mechanism by MG", which in turn is a technology required only for
> > non-assured transport protocols.
> > Any assured transport technology does not require H.248.14.
> >
> > Annex F (F.3.6) is transport-independent, and thus does not provide all
> > potential logic when coupling non-SC based call-independent procedures
> > with SC procedures.
> >
> > That said, your point is valid, and I would try to address it in a
> > profile spec (because a profile is transport dependent).
> >
> > BR, Albrecht
> >
> >
> >   
> >> -----Original Message-----
> >> From: megaco-bounces at ietf.org
> >> [mailto:megaco-bounces at ietf.org] On Behalf Of Raphael Tryster
> >> Sent: Tuesday, 29 July 2008 07:25
> >> To: Christian Groves
> >> Cc: megaco at ietf.org
> >> Subject: Re: [Megaco] Which MGC should be contacted first on
> >> retransmission timeout?
> >>
> >>
> >> Thanks, Christian.  So to give a concrete example, if MG receives no
> >> reply to retransmissions of a Notify for half a minute, it will
> >> transmit SC Disconnected for another half a minute before trying
> >> failover to another MGC.  I don't see any reason why retransmission
> >> timeout of SC is a more reliable indication of MGC failure than
> >> retransmission timeout of a Notify (except a bug in the MGC), but
> >> this mechanism provides some extra time to verify the MGC or the
> >> connection is dead before trying a different MGC.  Is this extra time
> >> actually a Good Thing?
> >>
> >> Regards,
> >>
> >> Raphael
> >>
> >> -----Original Message-----
> >> From: Christian Groves [mailto:Christian.Groves at nteczone.com]
> >> Sent: Tuesday, July 29, 2008 8:06 AM
> >> To: Raphael Tryster
> >> Cc: megaco at ietf.org
> >> Subject: Re: [Megaco] Which MGC should be contacted first on 
> >> retransmission timeout?
> >>
> >> Hello Raphael,
> >>
> >> In Amendment 1 to H.248.1v3 two notes were added to the F.3.6 text 
> >> that may provide further explanation. The text reads:
> >>
> >>
> >>       F.3.6 MG Lost Communication
> >>
> >> When the MG has detected a loss and subsequent re-establishment of
> >> communication with the MGC (NOTE 1), the MG sends a ServiceChange
> >> Command (NOTE 2) with a ServiceChangeMethod of "Disconnected" to the
> >> MGC in the current control association. If the MGC fails to respond,
> >> the MG then sends a ServiceChange Command with a ServiceChangeMethod
> >> of "Failover" and ServiceChangeReason 909 ("MGC Impending Failure")
> >> to each MGC in its list in turn until it has successfully established
> >> a new control association, or it has exhausted its list of MGCs. If
> >> the MGC does respond, the control association continues as if it were
> >> not interrupted.
> >>
> >> NOTE 1: The two main causes for lost communications between the MGC
> >> and MG are 1) failures or short-term interruptions of the H.248
> >> transport connection, or 2) the primary MGC going "OutOfService". The
> >> MG will not necessarily be able to discriminate between the two,
> >> therefore the ServiceChange procedures are the same in both cases.
> >>
> >> NOTE 2: The MG may send one or more ServiceChange Commands.
> >> The transmission of subsequent ServiceChange Commands may be
> >> timer-controlled. Multiple re-establishment attempts may help in
> >> situations with short-term failures, either of the transport
> >> connection or of the MGC, thereby avoiding the invocation of failover
> >> procedures when they are not warranted.
> >>
> >> ....
> >>
> >> With regards to "Disconnected", despite its name it's actually to
> >> indicate that the Control association was lost and is now
> >> re-established, thus the first sentence of F.3.6. I don't think
> >> there's a problem between 11.5 and F.3.6 because 11.5 indicates
> >> firstly that it's already determined that the MGC has failed, thus
> >> the procedure goes into failover. In F.3.6 the first part of the
> >> procedure is that communication is restored, and if this isn't
> >> successful then go into a failover.
> >> With regards to the "hint" of another method I guess it's relying on
> >> a transport level indication of connectivity between a MGC and MG.
> >>
> >> Regards, Christian
> >>
> >> Raphael Tryster wrote:
> >>     
> >>> Is there a contradiction between 11.5 and F.3.6?
> >>>
> >>> 11.5 says:
> >>>
> >>> "If the MG detects a failure of its controlling MGC, it attempts to
> >>> contact the next MGC on its pre provisioned list. It starts its
> >>> attempts at the beginning (primary MGC), unless that was the MGC that
> >>> failed, in which case it starts at its first secondary MGC".
> >>>
> >>> F.3.6 says:
> >>>
> >>> "When the MG has detected a loss and subsequent re-establishment of
> >>> communication with the MGC, the MG sends a ServiceChange Command with
> >>> a ServiceChangeMethod of 'Disconnected' to the MGC in the current
> >>> control association. If the MGC fails to respond, the MG then sends a
> >>> ServiceChange Command with a ServiceChangeMethod of 'Failover' and
> >>> ServiceChangeReason 909 ('MGC Impending Failure') to each MGC in its
> >>> list in turn until it has successfully established a new control
> >>> association, or it has exhausted its list of MGCs".
> >>>
> >>> So, suppose MG sent a Notify and failed to receive a reply before
> >>> retransmissions timed out.  According to 11.5, it would send SC to a
> >>> DIFFERENT MGC.  According to F.3.6, it would send SC to the SAME MGC,
> >>> and only try a different one if the SC also timed out.  Which is it?
> >>>
> >>> I am also puzzled by the language of F.3.6.  "When the MG has detected
> >>> a loss and subsequent re-establishment of communication with the MGC".
> >>> Usually, SC Disconnected is sent as a trial to see whether the MGC will
> >>> now reply, and not as a result of re-establishment of communication.
> >>> Is some other method of detecting re-establishment of communication
> >>> being hinted at here?
> >>>
> >>> Raphael Tryster
> >>>
> >>> ************************************************************************
> >>> IMPORTANT: The contents of this email and any attachments are
> >>> confidential. They are intended for the named recipient(s) only.
> >>> If you have received this email in error, please notify the system
> >>> manager or the sender immediately and do not disclose the contents to
> >>> anyone or make copies thereof.
> >>> *** eSafe scanned this email for viruses, vandals, and malicious
> >>> content. ***
> >>> ************************************************************************
> >>> _______________________________________________
> >>> Megaco mailing list
> >>> Megaco at ietf.org
> >>> https://www.ietf.org/mailman/listinfo/megaco
> >>>
> >>>   
> >>>       
> >>
> >>     
> >
> >   
> 
> 
_______________________________________________
Megaco mailing list
Megaco at ietf.org
https://www.ietf.org/mailman/listinfo/megaco