Meeting Minutes

Note takers: John Scudder <jgs@juniper.net>
             Sue Hares <skh@ndzh.com>
             Christian Martin <christianm@juniper.net>
             Warren Kumari <warren@kumari.net>
             Rob Shakir <rjs@rob.sh>
             David Freedman <david.freedman@uk.clara.net>
Compiled and edited by John Scudder

Agenda:

Interdomain Routing (IDR) WG
 
WEDNESDAY, July 27, 2011
0900-1130 Morning Session I
206 A
=====================================================
 
CHAIR(s):  Susan Hares <shares@ndzh.com>
           John Scudder <jgs@juniper.net>
 
o Administrivia
  Chairs                                               10 minutes
   
  - Note Well
  - Scribe
  - Blue Sheets
  - Document Status

o draft-gredler-bgp-te-01
  Slides
  Stefano Previdi                                      15 minutes

o draft-frs-bgp-operational-message-00
  Slides
  Rob Shakir                                           20 minutes

o draft-raszuk-wide-bgp-communities-02
  Slides
  Jeff Haas/Robert Raszuk                              10 minutes

o draft-keyur-idr-enhanced-gr-00
  Slides
  Keyur Patel                                          15 minutes

o draft-bashandy-bgp-edge-node-frr-00
  draft-bashandy-idr-bgp-repair-label-02
  Slides
  Jakob Heitz                                          10 minutes

o draft-zeng-idr-bgp-mtu-extension-00
  Slides
  Jie Dong                                              5 minutes

o draft-ietf-idr-add-paths-guidelines-01
  Slides
  Adam Simpson                                         10 minutes

o Path Exploration Damping
  Slides
  Mattia Rossi                                         20 minutes

Speaker shuffling time                                  5 minutes 
  
Total                                                   2 hours


Editor's Notes:

o Times indicated below are local time and are approximate.  They may be useful
  for finding the corresponding section in the audio recording, 
  http://www.ietf.org/audio/ietf81/ietf81-206a-20110727-0855-am.mp3
o The notation ~"quotation"~ (a tilde next to a quotation mark) is used to
  indicate an approximate quotation.


Chairs/Introduction
-------------------
Start time 9:00

Pending work items:
   - Clarifications are required.
   - GR required, and merged error handling document needs to be done.
   - Route reflection also may need some clarification work.

It seemed important that we address the individual drafts with GR for error
handling, but that all go into a merged document at the end (Sue). Combining
them into one consistent space lets us see the error handling across the RFCs,
and then make changes there.

Status:
o: draft-ietf-idr-deprecate-as-sets: Move from informational -> BCP.
o: draft-ietf-idr-bgp-issues -- WGLC done, awaiting minor edits.
o: draft-ietf-idr-link-bandwidth -- waiting on implementation report.


Stefano Previdi: Advertising TE Information in BGP (draft-gredler-bgp-te-01)
----------------------------------------------------------------------------
Start time 9:07

One thing that is required is a change of name of the draft to reduce
confusion. The draft defines an API for components outside of the routing layer
to get the topology information - e.g. for ALTO.

No intention to do things between ___routers___, rather to out-of-band devices.
(~"we have no intention, out of scope, to leak routing information between
*routing* layers. it is for extracting information from the routing layer"~)

Brief overview of ALTO - collecting topology information, doing some magic, and
then presenting a map that gives the overlay topology - or giving an application
some information about the topology where it requires this.

From a deployment and operations perspective, putting this into the IGP at the
right places (e.g. different ASNs, or different IGP areas/levels) is quite
difficult. There are also perhaps some security concerns with this. There are
no passive mechanisms in the IGP in general - therefore BGP is mainly used in
the deployments of ALTO currently. They need more information than is currently
available in BGP.

Added a new NLRI - for both global table and VPN contexts, therefore both SAFIs
are included. Node and link attributes are then added in the SAFI. The problem
is how to encode this without re-inventing the wheel.

Encoding is quite link-centric - using the existing IS-IS encoding format where
possible.

No changes to be made in BGP operations or state machinery - use the standard
BGP path selection and distribution mechanism. BGP path selection will be used
where the ALTO server sees multiple paths.

There are some TODOs for this draft - especially some around the intended
behaviour so that we don't see the same level of messages as we might see in an
IGP. Some question as to whether they want to encode the area ID in the NLRI,
or whether they should use community or not.

9:13
Q: Sue: Why not timestamp updates?
A: We considered it.  Rather than timestamp, why not use IGP sequence numbers?  
   We're not sure we need it at all.  We're extracting a topo from one
   component and putting it into another. It's like redistribution.  We rely
   on BGP's native mechanisms to converge to the latest information.  We
   could later add a timestamp or sequence number TLV.  I'm not yet convinced
   that we really need it.  The timescale we're talking about is minutes or
   tens of minutes.
Q: Sue: We can take the rest off line, it's a question of synchronization of data 
   from multiple sources.
   
9:20
Q: (Name not captured): I don't understand the purpose of the draft.  We already 
   have TE types in the IGP, type 10 or type 11.
A: I think you're referring to OSPF LSAs.  First, this really isn't a TE draft.
Q: Basically it seems you can get what you need using IGP.
A: Yes, we know the IGP can do that, but there are operational reasons to avoid 
   having the ALTO server participate in the IGP.  Putting in the information
   into BGP helps secure the IGP.
   
9:22
Q: Acee Lindem: Another reason not to flood all the TE in OSPF is that doing so 
   would be ... surprising ... from scaling PoV.  But another thing.  Why not 
   just use PCE?
A: We did talk with PCE folks.  Our use case is different.  The PCE collects from 
   the IGP and then serves up fully-computed paths.
Q: Are you saying this was just expeditious?
A: According to my understanding, there is no mechanism in PCE to extract 
   information from different areas.
   
9:24
Q: Rob Shakir asked if there could be any analysis of the security requirements in
   the draft.  "Often, security gets waved as a big flag".
A: ALTO servers are not permitted in the IGP today.
Q: Rob restates the point that the security requirements should be specified in 
   a more clear fashion.  (~"Since you keep talking about security, you
   probably should expand your security section to answer these questions and
   provide your justification."~) 

9:26
Q: Sue: It would help if you meet with PCE authors.
A: We did
Q: Please check back with them.
A: OK


Rob Shakir: BGP operational message
-----------------------------------
Start time 9:27

Comes from a request from the chairs in Prague to look at the diagnostic and
advisory drafts and put together a merged framework. Motivations: firstly,
error handling; secondly, improving operations. In the first case, if we make
the error handling in BGP more complex, then we need some way for our NOC to
see this condition between the two nodes. In the second case: decoupling of
session and messaging, used in peering, useful for NOC-to-NOC; at an IXP, send
a static message.

Capability signalled, TLV-based message. Four TLV types: Advise, State, Dump
and Control; these can be easily extended. Advise TLVs come from advisory;
State and Dump TLVs come from diagnostic; Control TLVs deny access and provide
rate limiting.
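
As a rough illustration of the TLV structure described above, the following
Python sketch packs and walks a sequence of TLVs. The numeric type codes and
the 2-byte type and length fields are assumptions made for this example only;
the minutes name the four TLV classes but do not record the draft's wire
encoding.

```python
import struct

# Hypothetical type codes: the four class names come from the presentation,
# the numeric values do not.
TLV_TYPES = {"ADVISE": 1, "STATE": 2, "DUMP": 3, "CONTROL": 4}
TLV_NAMES = {code: name for name, code in TLV_TYPES.items()}

# Assumed header layout: 2-byte type, 2-byte length, network byte order.
_HEADER = struct.Struct("!HH")


def encode_tlv(kind, value):
    """Return one TLV as bytes: header followed by the raw value."""
    return _HEADER.pack(TLV_TYPES[kind], len(value)) + value


def decode_tlvs(payload):
    """Split a message body into a list of (type name, value) pairs."""
    tlvs = []
    offset = 0
    while offset < len(payload):
        if len(payload) - offset < _HEADER.size:
            raise ValueError("truncated TLV header")
        code, length = _HEADER.unpack_from(payload, offset)
        offset += _HEADER.size
        if len(payload) - offset < length:
            raise ValueError("truncated TLV value")
        value = payload[offset:offset + length]
        offset += length
        # Unknown types are kept and labelled, not treated as an error.
        tlvs.append((TLV_NAMES.get(code, "UNKNOWN(%d)" % code), value))
    return tlvs
```

Keeping unknown types rather than rejecting them is one way a receiver could
skip TLVs it does not understand without disturbing the session.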

Why in band?  BGP is authenticated and is an existing control plane channel;
especially in the IXP case, the information propagated is relevant to the
carrier session. Security and convergence concerns exist: interleaving is a
convergence concern; use of control TLVs and the ability to completely ignore
a message.

Overlap with BMP: two separate ideas, both useful, and both combine well. BMP
is out of band, on a separate socket, so there is no overlap.  Intra-domain and
inter-domain scenarios are very different. Does not replace BMP at all.

Soliciting feedback, FAQ online, next steps, revise to -01 based on issues
raised, requesting IDR adoption. 

John considers this an extension to advisory; request/response is the new
work.

9:39
Q: (Name not captured, Huawei): Interesting. Last year I gave a similar draft 
   in OSPF. The focus should be information that we can't get other ways.
A: OK

9:39
Q: Jeff Haas: My comments are that the additional complexity seems pretty 
   heavy weight as far as the new stuff goes.  That is a big concern.  Secondary 
   concern is that many of the things look as though they have disclosure
   issues.  
A: Agreed.  There is potentially an information leak issue.  You might address 
   it by limiting request-response interdomain, i.e. non-reply.
Q: Jeff: Another thought from implementation PoV.  If we're putting in a query-
   response mechanism to ask "did you get this thing I sent you" there may be a
   problem associated with the fact that some implementations throw away
   received information even for routes stored in adj-rib-in.  
Q: Jeff asks how this non-reply works with the request/reply form, and proposes
   a lighter mechanism, e.g. send back a sequence number or something like
   that instead of the full malformed update. Jeff also asks about malformed
   updates: will it cause a local crash when a MUP is received?  Jeff's
   feedback is "Strongly reconsider anything which pushes back an entire
   update".
A: Rob suggests that draft should state clearly "don't re-parse errors".  Also 
   mentions draft includes list-of-nlri option.  Suggests maybe just hex dump 
   any returned malformed updates?
Q: Seems like what you really want is a transaction acknowledgement.

9:44
Q: Enke: NOTIFICATION doesn't work for Jeff's case because it brings down the 
   session
(Continued debate between Enke and Jeff regarding where malformed update should
be logged and whether it should be sent back to sender.)

9:46
Q: Enke: please go back to slide 2.  In the case of planned maint, there is already
   a well-defined CEASE message but it may not be adequate because it doesn't include
   the expected duration.  Maybe CEASE needs a subcode to give the expected duration?
A: CEASE doesn't provide enough context to be useful.
Q: Right, so augment CEASE with everything you need.
A: Problem is we don't know what we want.  The flexibility provided in OPERATIONAL
   is a benefit -- the TLV is just a string.

9:48
Q: Jeff: I think Enke's idea is great. You could put the exact format you're proposing
   as a subcode.  Put a string in the CEASE if you want.
A: Another use case is quiescing a connection but not actually downing it.  It still
   becomes unusable but no CEASE.
Q: Warren: yes I agree completely that is a valid use case.

Q: John: Following up on security/disclosure...
A: ... we have the option not to respond at all.
Q: My question for operators in the room is, would it be more common to turn off all
   possible disclosure features on EBGP, or to leave it on?
A: Dave Freedman: There are recommendations in the draft about this (default to
   off).
A: Rob: In Internet deployments, off, but in VPN EBGP might be on. 
Warren: Also could be used by organizations that have multiple ASes who use EBGP
   internally.

9:53
John: Poll for those who've read.  ("A fair number".) 
   Having done that, how many are interested in adding the request/response 
   functionality?
   (Speaking as author of Advisory likes structured message part better.)
   ("I see one half hand")
Robert Raszuk: Request/response already today with one-time ORF for example.  So
   not fundamentally new.
Randy Bush: Agree generic TLV is better although Jabber/Skype peanut gallery is
   saying "just what we needed! IRC and Twitter for BGP!"  
Rob: Problem with request/response is IF you need it, better to have it in base
   framework.  Unless we add a mechanism for it later.
John: My sense of the room is comfort level with message structuring is good,
   request/response, discomfort.  Move that conversation to list.
   

Jeff Haas: Wide communities
---------------------------
Start time 9:55

(no q's)


Keyur Patel: Accelerated Graceful Restart
-----------------------------------------
Start time 10:01

10:09
Q: Jeff Tantsura: seems kind of complex compared to NSR
A: NSR doesn't protect you from unwanted session resets.  Question is, can you 
   get away with incremental updates

10:10
Q: Jeff Haas: Rob, this is the acknowledgement I was talking about.  Could be 
   leveraged for operational message ack.
Q: Rob Shakir: But in this proposal not each update has a specific ID, right?
Q: Warren Kumari: You could do something like an offset from a sequence number.
   Pretty funky though.
A: It's an implementation issue how often you increment your version.
Q: Jeff: Will you discuss why you chose it as a separate message instead of some 
   other choice such as updating the marker.
A: It's not just updates you have to checkpoint, ORFs for example.  Doing it as a 
   separate message decouples it from any specific message.
A: Enke: here we are really talking about the routing state, the superset of routing
   updates.
   
10:12
Q: Sriram, NIST: This will be very useful in the context of BGPSEC since updates are
   going to get more expensive.
Q: Sriram: You say you may need to do a full exchange in case your table is corrupt.
   Please elaborate.
A: I have to do it any time I don't preserve my Adj-RIB-In or -Out.  For example if 
   my policy changes such that I might have accepted some prefixes that were dropped by
   the previous policy.
A: Enke: One more scenario, specific to NSR, it's implementation specific.  Can 
   simplify existing implementations.
   
10:14
Q: (Name not given) seems very complicated, cpu time, memory to do enhanced GR.  Seems 
   like spending a lot during runtime to save work at restart time.
A: Enke: Please be more specific about what seems complex.
Q: Versions have to be generated and saved.  Uses cpu and memory.
A: Enke: If you are already doing BGP today, you are already keeping track of
   incremental data. All you do now is add a number to each update.
A: Keyur: describes the details more for session restart
Sue: Let's move to list. Question for operators, how often do GRs occur in real life?
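
The "add a number to each update" idea from the exchange above can be
pictured with a small Python sketch. This illustrates the general
checkpointing idea only; it is not the mechanism specified in
draft-keyur-idr-enhanced-gr-00. The class and method names are hypothetical.

```python
class VersionedUpdateLog:
    """Sender-side log that tags each outgoing update with a version number."""

    def __init__(self):
        self._version = 0
        self._log = []  # list of (version, update) in send order

    def record(self, update):
        """Store an update and return the version assigned to it."""
        self._version += 1
        self._log.append((self._version, update))
        return self._version

    def updates_since(self, acked_version):
        """Return updates newer than the peer's last acknowledged version.

        acked_version == 0 means nothing was acknowledged, so the whole
        log is returned (the full-exchange case).
        """
        if not 0 <= acked_version <= self._version:
            raise ValueError("acknowledged version out of range")
        return [update for version, update in self._log
                if version > acked_version]
```

After a session reset, a peer that last acknowledged version 2 would be sent
only the updates recorded after version 2. A real implementation would also
have to handle the cases raised in the discussion where state was not
preserved and a full exchange is required.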

10:18
Q: Rob Shakir: this can happen often, and in a short space of time.  Therefore 
   I think this is a really good idea.  
Q: Acee Lindem: This is more robust than NSR since not so many things need to be 
   done perfectly.  I think this is good.
   
Keyur: requesting WG adoption

John: How many read draft? (hands showed more than half the room).  How many want to 
   move it forward to WG?  (roughly the same number that read it).  Of course, take to 
   the list.


Loop Free BGP with Repair Label: Jakob Heitz
--------------------------------------------
Start time 10:22

Q: Rob Shakir: do you have any data showing how often this actually happens?  Where you
   have one CE dual-attached to two PEs?  Basically the CE is the cheapest piece so
   this scenario just doesn't exist in my network.  So is this really an existing problem?  
   I don't have it in my network.
A: You don't have dual-homed CEs?
Q: Rob: No because CEs are cheap and tail circuits are expensive.  I would have two
   CEs.
Q: Keyur: it's not a common case but severe
Q: Rob: but for how long
A: Jakob: In short I don't have any data for how common it is.

10:27
Q: (Name not captured, Huawei): current implementations can solve this issue.  
   (essentially directed forwarding with 3107, long description)
A: Yes that works in an active/standby case.  But in an active-active case
   both PEs must do an IP lookup.
Q: label allocation behavior should always be consistent.  Use IPFRR to cover 
   the active-active case.
A: How is that different from what you said the last time?
(further debate of this point)
John: please take it off-line

John: This kind of work will likely get moved into RTGWG
John: This seems like one particular use case, we should understand whether or not 
   this is a sufficiently general solution


Jie Dong: MTU Extended Community for BGP
----------------------------------------
Start time 10:34

10:39
Q: Jeff: comments -- from IDR pov this is somewhat interesting and I've heard it spoken of
   before.  Is there an assumption this will be fed back into LDP after the inter-AS
   hop?  
A: It can be but not currently part of the proposal.
Q: I would suggest adding that.
Q: I am co-author of a draft to use BFD for PMTUD.  This is not the only solution.

10:41
Q: Jeff Tantsura: I think you should clearly differentiate between 2 and 3 label when
   you run labeled IBGP vs. distribute LDP into BGP.  Quite different, different
   procedures for passing MTU. 

John: How many read draft?  ... a few.  Comments/feedback to authors/list.


Adam Simpson: Add Path Applicability
------------------------------------
Start time 10:42

No questions/comments


Mattia Rossi: Path Exploration Damping
--------------------------------------
Start time 10:57

11:17
Q: Keyur: Good work.  One comment on slide 36 -- knowing a few different 
   implementations ...
A: ... AS 12 is announcing a longer path to AS 11
Q: ... right, and you want to announce that quicker to AS 11 but delay it
   to AS 13 ...
A: exactly
Q: ... then if they are in the same update-group... it would be interesting
   as to how you stop update on one side but not the other.  For this reason,
   might want to consider this more as an inbound rather than outbound
   processing thing.  Send it, but let the other guy process it a little later.
   We can continue off line. 

11:19
Q: Jeff Haas: for path length comparison are you using absolute length or number of
   unique AS?
A: absolute
Q: suggest rerunning with unique-as path length, i.e. factor out prepending
Q: second comment, the further you move from the origin, the more path-hunting becomes
   a problem.  as a heuristic can you decrease your timers as a function of distance
   from origin?  maybe low-exponential with distance
A: great idea, thanks
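
The two suggestions in this exchange can be sketched in Python as follows.
The function names, the decay factor, and the exact shape of the timer
function are assumptions made for illustration; the minutes record only the
ideas of factoring out prepending and of shrinking timers with distance from
the origin.

```python
def absolute_path_length(as_path):
    """Length of the AS path as received, counting prepended repeats."""
    return len(as_path)


def unique_as_path_length(as_path):
    """Number of distinct ASes in the path, i.e. with prepending factored out."""
    return len(set(as_path))


def scaled_timer(base_seconds, distance, decay=0.8):
    """Shrink a damping timer geometrically with distance from the origin.

    decay is a hypothetical tuning knob: 1.0 leaves the timer unchanged,
    smaller values shrink it faster per AS hop.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    if not 0 < decay <= 1:
        raise ValueError("decay must be in (0, 1]")
    return base_seconds * decay ** distance
```

For a path such as [13, 12, 12, 12, 11], the absolute length is 5 while the
unique-AS length is 3, so the two comparisons can disagree whenever a
neighbour prepends.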

11:21
Q: Jeff Haas: this works best when AS path length is the main determinant.  but in 
   many networks other policy is at work.
A: Yes we know.  We've observed the data some and do see the longer path updates
   are common.  Needs more investigation.
Q: skh: need to investigate larger data set.
A: Yes; audience, please share your data sets.  Mail to author.

11:23
Q: Randy Bush: most operators don't know where their tie-break is.  Experimental
   code in Cisco and Juniper to answer that question.  I did a something-NOG 
   presentation on this about a year ago.  I would *love* to deploy this code 
   in a richly-connected router.  (Contact randy.)
Q: Randy: also I am not RECOMMENDING flap damping.  We recommend changes to make
   RFD less harmful if it's used.  I think your approach is quite interesting.  
   I'm less confident than Jeff about how much state you have to keep.

11:25
Q: Jeff Haas: MRAI isn't as common, or implemented as strictly to the RFC, as you
   may think.
A: Yes, I know it may differ from implementation to implementation.  In my 
   experiments I used Quagga as-is.
Q: Randy: also many people have turned it off or down
Q: Jeff: We have it off by default.  Some customers turn it on.  But as it impacts
   convergence, sometimes it is not turned on.  Convergence trumps control plane
   load as long as control plane can keep up at all.  Tragedy of the commons.
A: Planning more core network experiments

11:28
Q: Rob Shakir: About wanting more MRT data, are you aware of RIPE RIS project?
A: No, please send pointer.