I am the assigned Gen-ART reviewer for this draft. For background on Gen-ART, please see the FAQ at < http://wiki.tools.ietf.org/area/gen/trac/wiki/GenArtfaq> . Please wait for direction from your document shepherd or AD before posting a new version of the draft. Document:  draft-ietf-savi-dhcp-32.txt Reviewer:  Elwyn Davies Review Date:  2015/01/30 IETF LC End Date:  [2014/08/07] IESG Telechat date: 2015/02/05 Summary: I regret to say that although the draft has considerably improved since I last reviewed it, there are still a number of significant issues that need to be addressed before it can be considered ready for the IESG.  My main concern for the technical quality of the document is the specification of the Data Snooping Process where the state machine is just not properly specified.  There are numerous more local problems.  I am unsure how significant the issue with temporary disconnections if the data snooping process has to be used repeatedly on a particular attachment point.  The various suggestions on limiting the rate of data snooping and the probabilistic back off may mean that the disconnection disrupts connections, whereas if the disconnection are only a couple of packets long, the usual congestion avoidance mechanisms will cover up the gap.  Likewise, in the likely deployment scenarios of SAVI-DHCP, fragmented DHCP messages may be a non-problem: However the issue should be noted.  Previous reviews have noted that the attribute setting requirements could be contradictory; this has not been fixed, although I think the points of conflict have moved.  I have suggested some text to cover the necessity for security protection of lease query transactions which were also noted previously. Major issues: =========== None Minor issues: =========== Temporary disconnections when Data Snooping Process is relevant: ----------------------------------------------------------------------------------------- The intention in the Data Snooping process appears to be that when the state reaches BOUND, the state machine 'merges' with the DHCP Snooping Process machine and thereafter responds to snooped DHCP messages as if the BOUND state had been reached purely by snooping DHCP messages. However, there is no guarantee, in the exceptional cases where the Data Snooping Process is invoked in the first place, that the SAVI device will 'see' any relevant DHCP messages after reaching the BOUND state since the lack of such messages is what triggered the Data Snooping Process in the first place.  Consequently, there is a significant likelihood that when the lease time (derived via lease querying) expires, the SAVI device won't have seen any of the DHCP messages that would indicate that the lease had been extended.   The timeout event will therefore trigger the removal of the SAVI binding.  Thereafter the messages from the attached device will be dropped at least until the Data Snooping Process can kick in again, assuming that the attached device still has a lease and the SAVI device is still not seeing the DHCP messages. This will mean that a device managed exclusively by Data Snooping  will suffer periodic packet loss.  How much of an effect this will have on the attached device depends on the probabilistic process used to start the Data Snooping Process and the applications being run on it.  If only a couple of packets are lost then the disconnection will probably be no worse than the effects of temporary congestion and the user will likely be unaware.  Longer outages could be very irritating, depending on their frequency, and would be difficult to diagnose/explain to the user. One possibility would be to remember that a binding anchor was initially set up by data snooping and turn up the Data Snooping Probability to one for a period after its lease timed out.  This would minimise the disconnection period at the expense of a little more complexity. Note, also that if not all the packets go through the SAVI device, those not passing through will not be validated and could have spoofed source addresses. I think these issues should be mentioned even if it is thought they do not need any action. Fragmented DHCP messages: --------------------------------------- The brief mention of draft-ietf-opsec-dhcpv6-shield at the end of s4.2.2 triggered a thought about a potential problem for SAVI-DHCP that doesn't seem to be considered in the draft.  It is possible that the DHCP messages that savi-dhcp has to snoop on might be fragmented before they pass through the SAVI device.  Unlike the scenario in the "shield" draft, it is not sufficient for the SAVI device to consider only the first fragment in a v6 DHCP message as it needs to know what type of DHCP message is passing by and that is not guaranteed to be deducible from the first fragment.  It may well not be a frequent occurrence, but it should probably be brought out - AFAICS it can apply to both DHCPv4 and DHCPv6 - it seems likely that the SAVI device would have to defragment the DHCP message in order to analyse it, but I haven't looked into this in detail. s4.1, para 3: Traffic from unprotected links should be checked by an unprotected system or [ RFC2827 ] mechanisms. What does "an unprotected system" imply?  Does "system" mean a (SAVI) technique and devices other than the DHCP scheme or just a device outside the protection perimeter.   I would have assumed that the protection scheme could also be implemented on the (DHCP) SAVI device in parallel with the DHCP scheme.  Some different words are needed but I am not sure what. s4.2.3: If it is FALSE, either the Trust attribute must be TRUE (so that bindings become irrelevant) or another SAVI mechanism such as SAVI-FCFS must be used on the point of attachment. Does the protection mechanism *have* to be another SAVI mechanism?  Would RFC 2827 not be an alternative? s4.2.3 (DHCP-Snooping Attribute)/s4.2.4 (Data-Snooping Attribute)/s4.2.5(Validating Attribute): Last para of s4.2.3: Whenever this attribute is set TRUE on a point of attachment, the "Validating Attribute" MUST also be set TRUE. Last para sf s4.2.4: Whenever this attribute is set on an attachment, the "Validating Attribute" MUST be set on the same attachment. Last para of s4.2.5: The expected use case is when SAVI is used to monitor but not block unvalidated transmissions. The network manager, in that case, may set the DHCP-Snooping and/or Data-Snooping attribute TRUE but the VALIDATING attribute FALSE. These statements are inconsistent: - s4.2.3 says DHCP-Snooping == TRUE => Validating == TRUE - s4.2.3 says Data-Snooping == TRUE => Validating == TRUE - s4.2.5 says DHCP=Snooping == TRUE and/or Data-Snooping == TRUE allows VALIDATING to be TRUE or FALSE. I believe the statements in s4.2.3 and s4.2.4 can be deleted. s4.2.4, last para: Since some networks require DHCP deployment and others avoid it, there is no obvious universal default value for the Data-Snooping Attribute. However, note that deployment of SLAAC (and therefore SAVI-FCFS) is generally configuration-free, while the deployment of DHCP involves at minimum the deployment of a server. Hence, the Data-Snooping Attribute should default to FALSE, and a mechanism should be implemented to conveniently set it to TRUE on all points of attachment for which the Trust attribute is FALSE. If this text remains as it is the acronym SLAAC needs to be expanded (probably back in s1).  However, introducing SLAAC at this point seems inappropriate.  SAVI-DHCP is specifically stated in s1 to be intended for 'pure DHCP scenarios'.  Further, it is not at all clear why the issue of zero-configuration or otherwise suddenly appears here.  I suggest that the second sentence is removed unless there is some really good reason that I have missed.  It seems to me that maybe there is something to be said about indirectly connected hosts here. Nits/editorial comments: General: Both 'validate' and 'check' are used in the text.  'Validating' (as in 'Validating Attribute') appears to have the specific meaning of ensuring that the IP source address is legitimate, whereas 'checking' has a variety of more general meanings.  It would be wise to ensure that wherever ensuring that the process of ensuring the IP source address is legitimate the term 'Validating' is used (possibly with a capital letter) and check is used in all other cases. General: s/LEASEQUERY_REPLY/LEASEQUERY-REPLY/ (7 places) [There are also 8 places where it is already right.] s1, para 1, first sentence: (has two 'source's) OLD:    This document describes a fine-grained source IPv4 or IPv6 source    address validation mechanism. NEW:    This document describes a fine-grained source    address validation mechanism for IPv4 and IPv6 packets. s1, para 1, sentence 4: OLD: assigned to the other attachment points or invalid in the network. NEW: assigned to any other attachment point in or not associated with the network. s1, para 1, sentence 6: s/If [RFC2827]/Whereas [RFC2827]/ s1, para 2, sentence 2: s/the header of data packet/on the IP header of data packets/ s1, para 2, sentence 3: s/a permanent block/the permanent blockage/ s1, para 3: OLD: The appropriate SAVI method must be used in those cases. NEW: An alternative SAVI method would have be used in those cases. s1, para 3: s/Besides, this mechanism/This mechanism/ s1, para 3: s/enable a SAVI solution for link-local/deploy an alternative SAVI solution for link-local/ s1, last para: OLD: This mechanism works for DHCPv4-only, DHCPv6-only, or both DHCPv4 and  DHCPv6. NEW: This mechanism works for networks that use DHCPv4 only, DHCPv6 only, or both DHCPv4 and  DHCPv6. s3, Client-Server messages, 6th bullet:  It would be good to distinguish this bullet for example making it a next level sub-list with no bullet. s3, Direct attachment/Indirect attachment: s/e.g./i.e./ (one in each entry) s3, Unprotected link/Protected link: I believe the intention is that these are links connected to SAVI devices. Hence: s/that the device/that the SAVI device/ s3, Cut Vertex: s/connected components/connected components in a (network) graph/ s3, Cut Vertex; s6.1, para 2; s7.1, 1st and 2nd  bullets: s/DHCP snooping process[procedure]/DHCP Snooping Process/ s3, Detection message: "by the Data Snooping Process."  I think "by" should be "during" but  could be "that triggers". s3: The various messages associated with the DHCP lease query process are not mentioned here.  It would probably be appropriate to add the DHCPv4 DHCPLEASEQUERY and DHCPv6 LEASEQUERY messages to the Client-Server set as they will occasionally be used by DHCP clients but are mostly used by SAVI devices in this context but should be allowed from clients.  On the other hand, it may be sensible to have a separate category for the lease query responses as treated specially on return flows - I am not quite sure which is best. However they are categorized they need to be filtered in the same way as Server-to-Client messages and only allowed on attachment points that have the Trust or DHCP-Trust attributes. s4.1, para 1: I assume that a SAVI protection perimeter could contain more than one DHCP server (I think the last para of s4.2 implies this).  In this case s/include a DHCP server/include at least one DHCP server/ s4.2.1, para 1: Examples of a trusted binding anchor would be a port to another SAVI device, Surely an attachment that is trusted doesn't have a binding anchor just because it is trusted - and also because, with multiple source addresses expected on the link, a binding anchor is not relevant.  This is said in the second para of the section! I think s/trusted binding anchor/trusted attachment/. s4.2.1, para 2: Two points: - Presumably the intention is that *no* messages from attached devices will be checked - singling out DHCP and 'data' messages is confusing - better something like 'any packets, including DHCP messages' would be better. Also I assume that 'checked' is a synonym for 'validated' here (see General point above) - Para 1 of s4.2.6 states that "there is no need to set DHCP-Trust to TRUE on an attachment point that sets Trust TRUE   It would be clearer if this was made explicit here. I suggest: OLD:    SAVI devices will not set up bindings for points of attachment with    the Trust attribute set TRUE; DHCP messages and data packets from    attached devices with this attribute will not be checked.  If the    DHCP Server-to-Client messages from attached devices with this    attribute can trigger the state transitions specified in Section 6    and Section 7, the corresponding processes in Section 6 and Section 7    will handle these messages. NEW:    SAVI devices will not set up bindings for points of attachment with    the Trust attribute set TRUE; no packets, including DHCP messages, from    devices with this attribute on their attachments will be validated.  However DHCP    Server-to-Client messages will be snooped on attachment points with the Trust    attribute set TRUE in the same way as if they had the DHCP-Trust attribute set (see    Section 4.2.2) s4.2.2: "ports that are trusted would have it set TRUE."  As discussed in the previous comment Trust effectively implies DHCP-Trust.  Suggest changing this to: NEW:    attachment points that have Trust set TRUE are implicitly treated as if DHCP-Trust is TRUE. s4.2.5, s8.1: s/VALIDATING/Validating/ (for consistency naming attributes) s4.2.4, para 2: s/Data-Snooping process/Data Snooping Process/ s4.3.1, bullet #2: s/Each SAVI device only need/Each SAVI device only needs/ s4.3.1, last para: Add a sentence after sentence 1 to emphasise that incoming  DHCP Server-to-Client messages are filtered at the boundary.  Suggest:      DHCP server response messages incoming across the perimeter will be dropped      (see Section 8). [Note: This needs to be more general as the DHCP Server-to-Client messages do not include LEASEQUERY responses and maybe some others.] s4.3.2, item (4)/Figure 1: The link to Non-SAVI Device 2 is marked as "protected" in Fig 1 and according to s4.3.2, item(4) it appears that the attachment point of this device on  SAVI Device A would have Trust set to TRUE.  I would have thought this meant  Non-SAVI Device 2 was inside the perimeter, but Fig 1 show it outside.  Please explain what is going on, as some further text may be needed. s4.3.2, last para: OLD:    The DHCP-Trust attribute is only TRUE on the inside links of the perimeter. NEW:    The DHCP-Trust attribute is only TRUE on links inside the perimeter. s4.3.3, para 1: s/guideline/guidelines/ s4.3.5, para 1: s/and the SAVI switch/and the SAVI device/ (as it isn't necessarily a switch). s4.3.5, para 2: s/IP spoofing traffic/spoofed IP traffic/ s4.4, item (1): Address configuration. For DHCPv4: the client of a SAVI device MUST have an IPv4 address. Comparing this with the following words for IPv6, I think this ought to be "For DHCPv4: the SAVI device MUST have an IPv4 address."  Otherwise it is really a non-sequitur, since if DHCPv4 is being used, presumably the idea is that the client will obtain an IPv4 address from the DHCP server - the text implies that it needs one even before it gets one via DHCPv4. s4.4, item (2):  Add "A SAVI device may also require security parameters, such as pre-configured keys to establish a secure connection for the Lease query process (see [RFC4388,RFC 5007]). connection s5, bullet #1: In the light of the discussion in s4.3.5: OLD: s5, bullet #2 in second set: s/data-snooping/data snooping/ OLD:    o  Binding Anchor(Anchor): the binding anchor, i.e., a physical and/       or link-layer property of the attachment. NEW:    o  Binding Anchor(Anchor): the binding anchor, i.e., one or more physical and/       or link-layer properties of the attachment. s5, Figure 4: s/instance/example/ (2 places - sentence before figure and caption of figure) s6.1, sentence 1: s/on the client's point of attachment./via the client's point of attachment./ s6.1, sentence 2: s/basis/assumption/ s6.3: It would be worth noting here: "The DHCP message categories (e.g., DHCPv4 Discover) defined in Section 3 are used extensively in the definitions of events and elsewhere in the state machine definition." s6.3: Possibly add equivalent text to that in s7.4: If an event will trigger the creation of a new binding entry, the binding entry limit on the binding anchor MUST NOT be exceeded. s6.3.2, EVE_DHCP_LEASEQUERY: It would be worth noting that DHCPv4 DHCPLEASEQUERY is not used in the DHCP Snooping process to avoid confusion with s7.  Also since the LEASEQUERY should have been originated by the SAVI Device itself, the destination check should verify that the message is directed to this SAVI device  - and it should not be forwarded once it has been processed here. s6.3.2: o Attribute check: ...; the DHCP Client-Server messages should be from attachments with DHCP-Snooping attribute. o Destination check: the DHCP Server-to-Client messages should be destined to attachments with DHCP-Snooping attribute. ... If I understand correctly, SAVI DHCP will allow devices on protected links (e.g., Non-SAVI device 2 in Figure 1) to obtain an IP address via DHCP without triggering the binding anchor set up.  The rules cited above would mean that the DHCP interactions for these devices would not trigger events, which I think is intended.  It might be worth making this explicit (assuming I have it correct). [Check: Should certain messages of types that might have triggered events but didn't, because of the above checks, be logged?] s6.4.1.1: Two issues:  - Refers to DHCPv4 Reboot.  This is not a message that triggers this event and so should not be mentioned.   - Should the address in any IA's carried by the DHCPv6 Request message that can trigger this message also be copied into the BST? If so should it also refer to Figure 6? s6.4.1.2:  Refers to DHCPv4 Request.  This is not a message that triggers this event and so should not be mentioned. s6.4.1.3: Three issues:  - Refers to DHCPv4 Request and DHCPv4 Reboot.  These are not messages that trigger this event and so should not be mentioned.   - Should the address in any IA's carried by the DHCPv6 Solicitation message that can trigger this message also be copied into the BST?  If so should it also refer to Figure 6? s6.4.3.7: The LEASEQUERY message is destined for this SAVI device.  It is not clear where it would be forwarded to (maybe some DHCPv6 infrastructure on the SAVI device?) Figure 9 Caption and Figure 10 Caption: s/Transit/Transition/ s6.4.4/Figure 9/Figure 10: Events EVE_DHCP_RENEW and EVE_DHCP_REBIND are missing from the table, list and diagram.  They should be marked as cycling in the BOUND state.  Putting them in here is desirable so that it is consistent with the table/diagrams in s7.9 etc. s7.1, para 1: s/two case when this does not work/two cases when this does not work/ s7.1, bullet #1: s/everyone/every one/;                            s/passing by the SAVI device/passing through the SAVI device/ s7.1, bullet #2: s/turns broken/becomes broken/;                            s/as planned/some planned change/ s7.1, para after bullets: OLD:    Data Snooping Process prevents permanently blocking legitimate    traffic in case of these two exceptions. NEW:    The Data Snooping Process can avoid the permanent blocking of legitimate    traffic in case one of these two exceptions occurs. s7.1, next to last para: s/a conditional SHOULD/OPTIONAL/;         s/without the above exceptions/where the exceptions cannot occur/ s7.1, last para:  Mention that the probability of unmatched packets triggering the Data Snooping Process should be a configurable parameter of implementations.  Presumably the default should be zero so it is normally turned off. s7.2, last para: s/about this process is discussed is/concerning this process are discussed in/ s7.3: The way in which the Data Snooping process integrates with the DHCP Snooping Process is not explained until the very end of s7 (in s7.8) and even then very tersely.  I suggest adding the following to s7.3: NEW:     The Data Snooping Process provides an alternative path for binding entries     to reach the BOUND state in the exceptional cases explained in Section 7.1     when there are no DHCP messages that can be snooped by the SAVI device.     In some of the exceptional cases (especially the dynamic topology case), by     the time the binding has reached the BOUND state the DHCP messages may     be passing through the SAVI device.  In this case the events driven by DHCP     messages that are expected in the BOUND state in the DHCP Snooping Process     may occur and the binding can be handled by the DHCP Snooping Process     state machine.     In any event, the lease expiry timeout event will occur even if no others do.      This will cause the binding to be deleted and the state to logically return to     NO_BIND state.  Either the DHCP or the Data Snooping Process will be     reinvoked if the lease is still place.  If DHCP messages are still not passing     through the SAVI device, there will be a brief disconnection during which     data packets passing through the SAVI device will be dropped.  The     probabilistic initiation of the Data Snooping Process can then take over again     and return the binding state to BOUND in due course.    s7.4, para 1: s/In addition to EVE_DATA_LEASEQUERY and EVE_DHCP_REBIND,/In addition to EVE_DHCP_RENEW and EVE_DHCP_REBIND,/ s7.4, para 1:  To make the BOUND state consistent with the DHCP Snooping Process case, the events EVE_DHCP_RELEASE, EVE_DHCP_DECLINE, EVE_DHCP_REPLY, and EVE_DHCP_LEASEQUERY should also be processed in the BOUND state.  The actions in the BOUND state can be explained by a reference to s6.4.3 rather than having to repeat them in s7. s7.5-s7.7:  The textual description of the actions is not an accurate representation of the evolution of the state machine, primarily because the sending of the second neighbour detection and second lease query messages is shown in the wrong state.  This means that the detection messages would not be received in the expected states.  The way to fix this is to define three "functions" (as is normally done for state machines).    The functions would subsume the text about sending ARP/DAD messages in s7.5.1, the text about sending lease queries in s7.6.1 and sending ARP requests in s7.7.1.  A third intermediate state is needed to handle the three response messages or timeouts from the three remote messages.  As it stands, the actions after an UNMATCH event (s7.5.1) involve transmitting messages and waiting for responses or timeouts.  The text is unclear exactly when the transition to the DETECTION state occurs, and the EVE_ENTRY_EXPIRE actions in DETECTION (s7.6.1) do not handle the retransmission of the ARP/DAD requests (see s7.5.1) but set about sending lease queries.  The BST needs an expire counter entry to simplify the number of states.  The sequence (for v4) needs to go something like: [NO_BIND] UNMATCH recd and decided to act [NO_BIND] Set expiry count -> 0; Send ARP; Set timeout to  DETECTION_TIMEOUT; Transition to [DETECTION] [DETECTION] CONFLICT recd => abort and discard binding; Transition to [NO_BIND] [DETECTION] EXPIRY: Increment expiry count;                          if count == 1: Send ARP; Set timeout to  DETECTION_TIMEOUT; Remain in state [DETECTION];                          else if count ==2: Send lease query; Set timeout to LEASEQUERY_DELAY; set count to 0; Transition to state [RECOVERY] ... and similar in state RECOVERY.. an extra state is needed to handle the ARP etc after successful lease query. s7.5.1: This local conflict process SHOULD be performed. If it is not performed, the state of the entry is set to RECOVERY, the lifetime is set to 2*MAX_LEASEQUERY_DELAY, and the lease query process specified in the following section will be performed directly. Under what circumstances would the local conflict process not be performed? If the sequence noted above is used, the lease query process can be triggered by  setting the expiry count to 0 and sending the lease query request before transitioning to state RECOVERY. s7.6.1, item (1): s/A list of authorized DHCP servers are kept/A list of authorized DHCP servers is kept/ s7.8/s7.9/Figure 13/Figure 14: s/Transit/Transition/ (4 places) s8: The filtering of DHCP messages needs to apply to the various lease query and lease query response messages.  The relevant messages need to be either added to the appropriate one from Client-Server and server-to-Client category, or a new category created and mentioned with the existing categories (see comment on s3 above).   s9.1, para 2: s/attribute can be found/attributes can be found/ s9.2, para 1: s/discard legitimate/to discard legitimate/; s/Purely/Simply/; s/is of/is a/;                       s/considerable/may cause considerable/ s9.2, last para: OLD:    Immediately after reboot, the SAVI device SHOULD restore binding    states from the non-volatile storage.  The system time of save    process MUST be stored.  After rebooting, the SAVI device MUST check    whether each entry has been obsolete by comparing the saved lifetime    and the difference between the current time and time when the binding    entry is established. NEW    Immediately after reboot, the SAVI device SHOULD restore the binding    states from the non-volatile storage.   Using the time when each binding    entry was saved, the SAVI device should check whether the entry has    become obsolete by comparing the saved lifetime and the difference    between the current time and time when the binding entry was established.     Obsolete entries which would have expired before the reboot MUST be removed. s11.2:  This section adds additional states/events/actions into the state machine.  The link-down event and its consequences aren't really a security consideration. s11.3: I am unclear how the processes described could lead to multiple binding anchors being established on the same SAVI device.  Could you quote an example please?  I am unsure how a client with multiple attachments could be using the same address on each of them. s11:  As discussed on various previous occasions, lease query operations are considered security sensitive [RFC5007] [RFC4388].  It is recommended that an IPsec channel is opened to carry out lease query enquiries.   However, since the number of SAVI devices and DHCP servers/relays in a typical network is relatively small and they will all be under the control of a single administrative authority, keying material can be prepositioned out of band and used as necessary by SAVI devices that know the addresses of the DHCP entities.  This needs to be described in an extra section in s11.  It may also be the case that in some circumstances, the SAVI protection itself could be considered sufficient to obviate the need for using IPsec channels - however, that needs to be discussed and  I suggest that the authors consult with a security expert (i.e., not me) to decide what is appropriate.