Meeting Minutes: ANCP. IETF76.
Tuesday, November 10, 2009
--------------------------------
CHAIRS: Wojciech Dec  & Matthew Bocci

Milestones:
Revised since the last IETF. We are looking for a new version of the PON draft before putting a call to the list to accept it as a WG draft.
Woj: A new version of the PON draft has been posted but no discussion so far.

Other milestone update was the addition of ANCP applicability to PON.

One ANCP draft (security) in RFC editor queue. One draft in IESG processing (ANCP framework), though we understand the outstanding issue is now resolved. Three WG drafts work-in-progress.


ANCP Protocol Draft (Roberta):
----

Between the last meeting and this one we posted two revisions of the draft; the current revision is version 08. In this presentation you will see the changes between versions 06 and 07, and between versions 07 and 08. There are just a few changes in version 08; we posted them this week, reflecting feedback and comments from the mailing list.
From 06 to 07 we made a lot of editorial changes and fixes related to TLVs. The main change is that the multicast status message was replaced with a general status message. At the last meeting we decided to use the generic response message to handle different types of errors. There are just a few changes not in the slides: no one commented on the changes on the mailing list (posted by Tom a long time ago).
Decided to remove all the multicast text from the protocol draft and put it into the ANCP-specific multicast draft. 
Currently we are using tech-type 05 to indicate DSL, but in the original version there are other values allocated. We will use value 01 for PON. 
Target-type is a generic TLV, special case is a single access port. It would be good to have a range of 1000-1020.
There was a correction to status-info from 012, however current implementations are using 016, so changed the status-info to 016. Asked for feedback on the mailing list and replies were positive.

There was a discussion on versioning in Stockholm, and we decided to change the sub-version number for the ANCP standard protocol. You will find the sub-version will become 2 (from 1) on publication. We added text in section 5 and section 5.2 to describe this issue. Section 2 describes the history (ANCP pre-standard implementation) whereas section 5 describes the negotiation mechanism.
Added an appendix which summarised the versioning, and a clarification on what version 3.1 really is.

Woj: clarification, the non-normative appendix that will contain clarification of what version 3.1 is, is there anything in there right now?
Roberta: right now there is just a description of what is in section 2 and 5.2. We have version 3.1 for this reason, and a negotiation process we reuse, etc.
Woj: for the WG, in terms of non-normative references we have this issue of defining which of the past drafts defines version 3.1. It seems a bit odd to have this in a standards document. We have an RFC which will be 3.2, and all implementations are currently 3.1. Everyone then objected that 3.1 wasn't documented; now we have to document 3.1?
Ralph: I understand this is 3.2, but who cares what 3.1 was? What difference does it make? Does anything in 3.2 depend on what 3.1 was?
Woj: No
Ralph: In that case, a non-normative appendix that states what version 3.1 was seems perfectly fine.
Woj: My concern is arguments about what 3.1 was
Matthew: There are no 3.1 implementations, there are many pre 3.2 implementations
Ralph: So why does anyone want a definition of what 3.1 is? Write something down as 3.1 and see who complains.
X: 3.1 was implemented and is deployed. Because it's already deployed you need to describe it for migration purposes.
Woj: but not everyone was implementing the same 3.1. 
Ralph: you could also say "there are pre-3.2 versions" and leave it at that?
Woj: why don't we just leave it out?
-- no room objection --
Roberta: we will just leave out the appendix

Version 08 contains a small change with partition ID. We didn't remove the partition ID field in 07, but removed the two options to select the partition ID. In version 08 we re-inserted the two options.
Woj: regarding partition ID, the current state of the specification is not useful: while it's defined, it's not all that useful
Roberta: we asked for further discussion but have had none
XX: there is no use case defined in the draft for the partition ID. It seems to be of local significance to the AN, and this could be achieved using a socket instead of a partition ID. Is the partition ID unique per AN, or can the same AN have multiple partition IDs? If there is only one partition ID per AN, then there is no value.
Woj: continue discussion on mailing list
Roberta: are we ready for last call?
Woj: I think it is ready for last call
Roberta: I will post a new version without the appendix


Gap analysis between TR-147 and ANCP
-----

TR-147 is why we have ANCP here; in it we define a lot of functional requirements. The BBF chose GSMP as the protocol to implement the layer 2 control, which is why we have the ANCP working group here. The Broadband Forum does not define protocols, just requirements. This gap analysis shows the functional requirements that are in WT-147 but not defined by ANCP, and also the things ANCP can do that are not listed as requirements.


Regarding topology discovery: there are three requirements not yet in the current ANCP draft. They are not mandatory requirements and they are closely related to DSL technology. The first is that, as the DSL line could be unstable, the report should only be sent when it is stable. Perhaps we could add words in the draft to this effect. The second defines a threshold of change in line rate, etc., and when this threshold is exceeded a report should be triggered. The last one is that the DSL line could be unstable and bounce between port-down and port-up; in this case the AN could tell the BNG that the line is unstable (another SHOULD requirement). If we want to do this it would need to be defined in the ANCP protocol.
Line configuration: there are several parameters in WT-147 which are not in ANCP relating to line configuration. Again we would need to define these parameters in the protocol [ANCP]. Multicast: ANCP drafts do more than what is defined in WT-147, but there are some MC requirements not defined, including mapping between ports and multicast VLANs and several other requirements related to MC which are not covered. These may be closer to a protocol part vs. the multicast extensions because they are line configuration.
How are we going to deal with all these gaps?
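
As an illustration only (not presented at the meeting), the first two topology discovery requirements above, report only when the line is stable and only when the change exceeds a threshold, could be sketched on the AN side as follows. The class name, the use of kbps, and the rule that "stable" means a number of identical consecutive readings are all assumptions made for this sketch; WT-147 and the ANCP drafts do not define them this way. The third requirement (signalling an unstable line) is not covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LineReporter:
    """Hypothetical AN-side helper deciding when to send a line report."""
    rate_threshold_kbps: int            # assumed name and unit
    stable_samples_needed: int = 3      # assumed definition of "stable"
    last_reported_kbps: Optional[int] = None
    _last_sample: Optional[int] = None
    _stable_count: int = 0

    def sample(self, rate_kbps: int) -> bool:
        """Feed one line-rate reading; return True if a report should be sent."""
        if rate_kbps == self._last_sample:
            self._stable_count += 1
        else:
            self._last_sample = rate_kbps
            self._stable_count = 1

        # Requirement 1: do not report while the line is still changing.
        if self._stable_count < self.stable_samples_needed:
            return False

        # Requirement 2: report only if the change exceeds the threshold
        # (the very first stable reading is always reported).
        if (self.last_reported_kbps is not None
                and abs(rate_kbps - self.last_reported_kbps)
                <= self.rate_threshold_kbps):
            return False

        self.last_reported_kbps = rate_kbps
        return True
```

With a threshold of 500 kbps, a line that settles at 8000, drifts to 8100 and later retrains to 9000 would produce two reports in this sketch: one for the initial stable rate and one for the retrain.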

Woj: when looking at these slides it would appear that these gaps fall into two categories: the line change conditions - they seem to be more about AN functional behaviour as opposed to protocol design. We could address that as part of the last call comments. Would that work? [Yes]. Are there any things you think that require protocol changes? 
Resp: Maybe we do need to define one for the BNG line unstable as it is a change to the protocol
Woj: I propose that protocol modifications/extensions could be captured in a separate draft rather than the protocol draft. Like a line configuration draft, as it's a use case not addressed today.
Resp: I agree, the line configuration and multicast can be in different drafts. What about the last one, should we define new parameters to say this line is unstable? It's not a configuration action, it's just a report. Do you think it is valid in this last call phase?
Woj: I would like to understand more what it entails. Discuss during last-call phase.
Roberta: We do not have all these requirements in the framework draft, is that a problem?
Woj: It's only a problem if we want to make it a problem. Let's put the protocol draft into WG last call and ...
Comments: The TR-147 requirements for topology discovery and versioning are also covered in the framework - to keep things aligned we need to keep these requirements.
Dave: The notion of having multiple multicast VPN mapping to a port implies the ability to steer control plane traffic - seems like a piece is missing. How do we sort out colliding multicast address space on the access node? There are some gaps that exist.
Woj: Are these TR-147 gaps?
Dave: that's entirely possible, I don't remember a means to associate ranges of multicast addresses with a multicast VLAN.
Woj: Are these MC use cases being worked on as a followup to TR-147 in BBF?
Dave: We did just kick off 207, but it may or may not incorporate it. The issue of resolving colliding MC address space comes up quite often.
Woj: For the MC topic, if we say here is the applicability and descriptive scenario on how this applies, would this help?
Dave: Yes, I think having multiple MC VLANs associated with a single access port is not a good idea without describing a mechanism on how I would actually use that in the AN. 
Woj: If there was interest then a new draft could come; it's not part of the protocol draft. Interested parties should get together and create a draft.
Dave: The issue of colliding address spaces comes up frequently enough that I need to disambiguate what is coming from the home and I need to define which VLAN to send the request up.
DM: Could we use IGMPv3/SSM instead of using static mappings? Perhaps this isn't best put in ANCP.
Dave: I don't think ANs should be full-blown routers; there seem to be simpler solutions.


Graceful restart draft
----
Sri:
There is a mention of base multicast in the ANCP framework and base protocol. We are trying to address issues when the AN or BNG restarts in this draft. In the current draft there are three broad types of messages exchanged between AN and NAS: capabilities exchange; topology data pushed from the AN to the NAS for QoS policies; and multicast data being stored in the NAS about which flows are active on the AN (bandwidth utilisation, etc.).
Whenever a session is established between AN and NAS, the current draft doesn't say the AN should push information for all active ports. The ports could have already come up and the AN may not send information about these ports to the NAS. This is more of a problem when the NAS restarts and the AN has been up for some amount of time.
In some cases it is possible no new ports may come up on the DSLAM, and the AN may not send any port information [because they are already up]. Graceful restart is essential when there is a cluster of NAS. Currently, using ANCP keepalives, it takes quite some time to detect that an adjacency is down; we could use TCP socket state for quicker detection.
In multicast, the view on the NAS and the AN may not be synchronised. If the NAS is to restart, the NAS may not be aware of what flows are active.
I have solutions for two issues:
Topology discovery: the AN will push all the information for all ports whenever a new adjacency is established. This solves the problem with already active ports.
Adjacency loss detection: TCP socket state will be much quicker.
Multicast: the solution is not very simple, and we are still working it out. One possible method is to ask the AN to send the information to the NAS. Instead of reauthorising all the flows, perhaps you just sync the data. [Deal with those flows that require reauthorisation by exception].
The solutions proposed here do not cause a drastic change on the current mechanisms, there are no new messages and the changes are minimal.
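
As an illustration only, the topology discovery proposal above (the AN pushes information for all active ports whenever a new adjacency is established, reusing the existing port-up message) could be sketched as below. The class, method names and the in-memory "sent" list are hypothetical stand-ins; no real ANCP encoding or transport is shown.

```python
from typing import Dict, List, Tuple


class AccessNode:
    """Hypothetical AN model; messages are recorded instead of being sent."""

    def __init__(self) -> None:
        self.active_ports: Dict[str, dict] = {}
        self.adjacency_up = False
        self.sent: List[Tuple[str, str, dict]] = []

    def _send_port_up(self, port: str, attrs: dict) -> None:
        # Stand-in for encoding and sending an existing port-up message.
        self.sent.append(("port-up", port, dict(attrs)))

    def port_came_up(self, port: str, attrs: dict) -> None:
        """A line trained up; report it only if a NAS is listening."""
        self.active_ports[port] = attrs
        if self.adjacency_up:
            self._send_port_up(port, attrs)

    def adjacency_lost(self) -> None:
        self.adjacency_up = False

    def adjacency_established(self) -> None:
        """Proposed behaviour: resend state for every already-active port."""
        self.adjacency_up = True
        for port, attrs in sorted(self.active_ports.items()):
            self._send_port_up(port, attrs)
```

In this sketch a NAS that restarts after the ports have come up still learns about them, because the full push is tied to adjacency establishment rather than to the port event itself.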

We cannot assume all implementations could use TCP HA, there will still be gaps such as with topology discovery. [So TCP HA is not really going to work.]

David Miles: On multicast, do you want to reuse the existing port-up message?
Sri: So you do push the existing port-up message.
David Miles: Could we put additional multicast TLVs in that existing port-up message, so port, line characteristics and multicast are all conveyed?
Sri: It is possible, but you would then have problems processing on message type. Multicast messages would be confused if the TLVs were put into port messages. It talks of having a separate message.
David: I see message types as defining events, and isn't the purpose of TLV to give us flexibility to put these into different types of message?
Woj: What difference do you think it would make if the port-up message and the unsolicited multicast message were sent separately?
David: Not really, it's just a design issue.
Woj: we would need to change the existing semantics of port ups.
David: We need a port up to be extensible though, right?
Woj: yes, a port up is an informational event sent to the NAS. 
Sri: it seems ANCP is a request-response protocol. It could be possible to define a new message type.
Woj: I think what David is saying is, why not put all the information in the port-up message?
David: Indeed, but I'm not too worried about making a new message, call it bulk-update, we need a new implementation to support this anyway so why not?
Sri: This looks more natural
Francois: I too wanted to know what was the strong motivation for doing so. I expect that there will be additional multicast things that may need to be reported, such as for bandwidth delegation. Perhaps a bulk update is a way to go?
Woj: we discussed a bulk update, but it's basically closed because of a TCP issue. However, having a different behaviour, such as putting multicast information into a port-up, is open for discussion.
Sri: Yes, we could look at that and see how it may be implemented all in one message. Perhaps it is possible.


Multicast control draft
----
A summary of the biggest changes, (quite a lot in this round):
- A restructuring of the different multicast use cases/capabilities into four different capabilities, before we had three incremental capabilities, and now we have four and they can be negotiated individually.
- Redefined the admission control operations and how we split the responsibilities between the NAS and the AN
- There is a change in how we use the result code
- We started using the generic response message

The capabilities have been restructured into four independent ones:
- NAS initiated replication
- CA with white/blacklist
- CA with greylist
- Bandwidth delegation

Each has a different capability type so each can be independently negotiated. For each of these capabilities there is a subsection that describes the protocol requirements needed to support that capability, along with a full procedure of how to implement that capability.
In addition to the description for each capability, we also describe how interactions occur when multiple capabilities are simultaneously active. Whenever there are changes because of interaction this is pointed out specifically.

Admission control operations:
The principle now is that the activation of the admission control function on the AN is explicitly controlled at provisioning time by the NAS. When you look at a provisioning message you now know what the AN is supposed to do with admission control. It is not just an on/off capability; there are actually two knobs, if you will: one controls whether the AN will do admission control for IGMP joins, the other whether the AN does admission control for messages from the NAS in the form of a multicast replication control message. You have one, the other or both.
The activation of admission control on the access node is purely at the access node level. You turn it on for whitelists and it turns it on for all ports, at the whole AN level. To achieve this there are two new TLVs. The whitelist CAC TLV is included when you want the AN to do admission control for IGMP joins. The MC replication control CAC TLV tells the AN to do admission control for joins in MC replication control messages from the NAS.

Q: Are we giving up port-level admission control?
A: Yes you are, there is still the idea of BW on a per-port basis, but you don't turn it on/off
Q: If there is the concept of a whitelist, why do whitelisted flows have to pass admission control?
A: It's going to succeed the conditional access check, but it does not systematically succeed for bandwidth.
Q: So it's admitted for the flow, but the bandwidth?
A: Whitelist defines conditional access rights, it says that flow is allowed to set it up. Then bandwidth admission control must occur. In the case of whitelist BW admission control is done by the AN and it can succeed or fail.
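
As an illustration only, the two-step decision described in this answer (whitelist conditional access first, then bandwidth admission control on the AN, which can succeed or fail) could be sketched as below. The function name, parameters and result strings are hypothetical and are not taken from the multicast draft.

```python
from typing import Set


def admit_join(flow: str, whitelist: Set[str],
               flow_bw_kbps: int, port_free_kbps: int) -> str:
    """Return the outcome of an IGMP join for one access port (sketch)."""
    # Step 1: conditional access. The whitelist says whether the flow
    # is allowed to be set up at all.
    if flow not in whitelist:
        return "denied: conditional access"
    # Step 2: bandwidth admission control against the per-port allocation.
    if flow_bw_kbps > port_free_kbps:
        return "denied: bandwidth"
    return "admitted"
```

A whitelisted flow can therefore still be refused in this sketch when the port does not have enough remaining bandwidth.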

NAS Initiated Replication: At provisioning time, there is a MC replication control CAC TLV in the provisioning message for the NAS to tell the AN to do admission control for MC replication control joins. In the port-management message, you see bandwidth allocation occurs per-port. Whenever there is a MC replication control message containing an add, the AN will do admission control for that message.

NAS does admission control: The NAS does NOT include the MC replication control CAC TLV, it tells the AN not to do admission control on MC replication control adds, and there is no need to put a bandwidth on the port. It is the NAS which does the admission control decision itself before deciding to replicate on the access node.

At the moment we assumed only one capability, CA with white and black lists. 
Admission control by AN: at provisioning time the NAS includes a MC service profile and a whitelist CAC TLV to perform admission control for whitelist IGMP joins. In the port management message, the NAS will include a MC service profile name TLV and provide a bandwidth allocation TLV. Then the AN is ready, has a whitelist and possibly a blacklist to do CA, and a per-port bandwidth to do admission control.  

When the NAS does not include a whitelist CAC TLV in the provisioning message, it is telling the AN not to do CAC. In that case the AN will not do admission control.

These are just two capabilities, but the principle is the same. The TLVs in the provisioning message control the AN's admission control behaviour.
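
As an illustration only, the principle above (presence or absence of each CAC TLV in the provisioning message switches the corresponding admission control on or off for the whole AN) could be sketched as below. The string constants are hypothetical labels for the two TLVs, not their real type codes.

```python
from typing import Dict, Iterable

# Hypothetical labels standing in for the two new TLVs.
WHITELIST_CAC = "whitelist-cac"
MRC_CAC = "mc-replication-control-cac"


def an_admission_control(provisioning_tlvs: Iterable[str]) -> Dict[str, bool]:
    """Derive AN-wide admission control behaviour from a provisioning message."""
    tlvs = set(provisioning_tlvs)
    return {
        # AN does admission control for whitelisted IGMP joins.
        "igmp_joins": WHITELIST_CAC in tlvs,
        # AN does admission control for adds in MC replication control
        # messages from the NAS; if absent, the NAS decides itself.
        "nas_replication_adds": MRC_CAC in tlvs,
    }
```

The two knobs are independent, so a provisioning message can enable one, the other, both or neither.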

Q: Just to mention bandwidth delegation on its own: when it is not combined with other capabilities you always have admission control. Otherwise why bother doing it?
Q2: Is there a supported use case to support CA in the access node based on whitelists without doing bandwidth allocation? Can you just admit the flow without doing a bandwidth check?
A: Yes, you can do conditional access without bandwidth admission control

Result code: There was a principle that all MC messages would use a result code of 0, meaning ignore, with the behaviour of the recipient specified as part of the message specification. It turned out that for some messages the behaviour we want is that of a nak-all or ack-all result code. When a message has an ack-all or nak-all we use the existing error codes, perhaps another one? The broader question is whether this is okay: should we stick to the previous approach of using a result code of 0, or should we use ack-all and nak-all where there are cases that match that behaviour?
So it sounds like we don't have objections to this, so we will use ack-all/nak-all.

The use of the new generic response. Not much new here as it was all discussed on the list, the new message was added to the protocol documents and the MC draft now uses it when appropriate.

Q: We did have MC status in a couple of places, the delegated bandwidth query and the multicast flow thing. In particular for the delegated bandwidth query, you could get either a delegated bandwidth query response or a generic response, so now it's just one.
A: Yes, it's a choice whether you use a generic message or a specific response.

There was a restructuring of the white/black/greylist encoding as a result of meeting some of the requirements in the framework document such as being able to dynamically update the lists. Now you can have an action associated with an entry, such as add and delete. You can also mix IPv4 and IPv6 entries. The actions are add, delete and replace. Replace completely overrides the list.

Woj: Wouldn't the add with an override semantic have been sufficient?
Tom: Add specifies flows to the previous list, so there is no overriding semantic, it just expands the list. 
Woj: I know for MC replication control we have add, delete and modify.
Tom: This is different. Add takes the flows and places them on the list, so it could be a way of creating a list in the first place. If you have a pre-existing list, replace takes the pre-existing list, discards it and replaces it with the content of the action.
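
As an illustration only, the add, delete and replace semantics Tom describes could be sketched as set operations on a list of flows. The function name and the use of plain address strings for IPv4 and IPv6 entries are assumptions for this sketch; the actual TLV encoding is not shown.

```python
from typing import Iterable, Set


def apply_list_action(current: Set[str], action: str,
                      flows: Iterable[str]) -> Set[str]:
    """Return the new white/black/greylist after applying one action (sketch)."""
    entries = set(flows)
    if action == "add":
        # Expands the existing list; also creates it if it was empty.
        return current | entries
    if action == "delete":
        return current - entries
    if action == "replace":
        # Discards the pre-existing list entirely.
        return entries
    raise ValueError(f"unknown action: {action}")
```

For example, adding "ff3e::1" to a list holding "232.1.1.1" yields both entries, whereas replacing with "ff3e::1" leaves only that entry.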

The next change is the reset procedure; this one was discussed on the list. It's not contentious so let's skip to editorials. There was a restructuring inside the specification of every message type, and now there is a specification of what the sender and receiver have to do for every message type. We moved all example messages and call flows to an appendix. We have discussed on the list the things still to do: a more explicit description of admission control done on the NAS, and more editorials. We then need to get some feedback on all the changes etc.