Re: [IPsec] draft-kivinen-ipsecme-esp-null-heuristics comments



Some comments inline...

Thanks, 
Ken
 
>-----Original Message-----
>From: ipsec-bounces at ietf.org [mailto:ipsec-bounces at ietf.org] On Behalf Of
>Tero Kivinen
>Sent: Tuesday, February 03, 2009 5:12 AM
>To: Dragan Grebovich
>Cc: ipsec at ietf.org
>Subject: [IPsec] draft-kivinen-ipsecme-esp-null-heuristics comments
>
>Dragan Grebovich writes:
>> Hi Tero
>> I reviewed your heuristics draft and I believe it is interesting and
>> doable, however:
>>
>> 1) I believe the actual implementation would require more code than
>> current front-end hardware allows.  The hardware we work with have a
>> limited space for that much heuristics code.   Our preference is to find
>> a quick direct match (in a few instructions max).  We would have to
>> respin hardware to allow your heuristics implementation.  This might not
>> be implementable today, as hardware respins are costly and take
>> relatively long.
>
>In normal cases, I would expect that heuristics are not run on the
>hardware, only the SPI cache is run there (i.e. the Appendix A.1 part
>of the operations). This kind of processing is already run on the
>firewalls, deep inspection devices and so on, usually per TCP flow
>basis. Adding that to be run on per IPsec SA basis should be very
>small addition to the front-end hardware.
>
>Are you saying that even that is too complicated?
Complexity is one thing, but there is a cost vector to consider too and the answer may be different based on the device being targeted. 

Some devices may be capable of doing the heuristics processing in SW as an exception case, whereas other devices (e.g. requiring high bandwidth DPI) need to do this in HW to maintain line rates. 

From a cost perspective, if a device is stateless in nature (e.g. traffic monitoring/logging for auditing purposes), then that device will now need to become stateful to maintain all these (potentially hundreds of thousands of) 3-tuples as state, which may not be practical. 
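
To make the state requirement concrete, here is a minimal Python sketch of the per-SA table such a device would have to hold. It assumes the 3-tuple is (source address, destination address, SPI); the `SpiCache` class, its method names and the verdict label are hypothetical and not taken from the draft.

```python
from typing import Dict, Optional, Tuple

# Assumed key layout: (source address, destination address, SPI).
FlowKey = Tuple[str, str, int]


class SpiCache:
    """Hypothetical per-SA state table for an on-path inspection device."""

    def __init__(self) -> None:
        self._verdicts: Dict[FlowKey, str] = {}

    def lookup(self, src: str, dst: str, spi: int) -> Optional[str]:
        # Fast path: one exact-match lookup per packet.
        return self._verdicts.get((src, dst, spi))

    def learn(self, src: str, dst: str, spi: int, verdict: str) -> None:
        # Slow path: remember what the heuristics decided for this SA.
        self._verdicts[(src, dst, spi)] = verdict

    def __len__(self) -> int:
        return len(self._verdicts)


cache = SpiCache()
cache.learn("192.0.2.1", "198.51.100.7", 0x1234ABCD, "esp-null")
hit = cache.lookup("192.0.2.1", "198.51.100.7", 0x1234ABCD)   # known SA
miss = cache.lookup("192.0.2.1", "198.51.100.7", 0x00000001)  # unseen SPI
```

Every SA the device sees adds one entry, so the table grows with the number of SAs on the path rather than with anything the device itself controls.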

Cost also reflects the expensive SRAM needed to store these cached policies, which may be pennies, but in a low-end NIC can be cost-prohibitive to implement. 

>
>> 2) On low-end products, e.g. Access (aka Edge) it is unlikely that
>> today's and even tomorrow's implementations will have stateful DPI
>> implemented; these devices have low target COGS and usually do not have
>> simple packet filters and stateless firewalls and signature-based DPI
>> and IDS/IPS.  It is unlikely that this would be the potential
>> implementation target for heuristics, as the cost/price would go up.
>
>If they do not have "simple packet filters and stateless firewalls and
>signature-based DPI and IDS/IPS", then they do not need heuristics at
>all, as there is no need to differentiate the ESP vs ESP-NULL.
>
>Heuristics solution do not require any changes to those low-end
>product. On the other hand the new protocol number for ESP most likely
>will require changes to those devices, as they need to be sure to pass
>that protocol number also.
Any addition of a new protocol will require changes to any device that does not support that protocol, so this argument is not unique to WESP, but applicable to all new protocols - e.g. how would these devices act when seeing the recently introduced HIP protocol or UDP-Lite? If we use this argument that a new protocol addition will impact legacy devices, we should not be allocating ANY new protocols - full stop!

Also see previous argument for low end devices above.
>
>> 3) High-end products, e.g. Data Center front-ends, would allow a
>> heuristics implementation; it may have a stateful firewall and/or
>> IDS/IPS and other DPI capabilities, however this class of devices is
>> migrating towards virtualized platforms (major vendors are moving in
>> that direction), and they would be terminating tens or even hundreds of
>> thousands of SAs.
>
>Heuristics are not needed on the devices which are terminating IPsec
>SAs. Devices terminating IPsec SAs already know all IPsec SA state,
>thus they do not need heuristics.
>
>Heuristics is only needed on devices which do all the following:
>
>1) Do want to do some kind of deep inspection on the ESP-NULL packets.
>
>2) Are on the path of the IPsec flow, but not participating in the
>   IPsec SA (i.e. is not a node where IPsec SA terminates).
>

Agreed - any device terminating the SA has all the info it needs already.

>Note, that usually one IPsec SA has tens or hundreds of the TCP or UDP
>flows inside, thus for each entry added to the SPI cache, the TCP or
>UDP caches require much more. 

This is typically true for VPN/Tunneling, but not true for transport mode, E2E solutions, where an IPsec SA will be as granular as the traffic it is carrying (e.g. TCP port 123 to TCP port 456 between two endpoints). This will result in a 1:1 correlation between a flow and an SA, hence the number of SAs can and will be in the 10s and 100s of thousands, especially at intermediate nodes in the network. 

>This means that high-end products need
>to have memory for storing hundreds of thousands or even millions of
>TCP/UDP flows anyways, and adding some for IPsec SA flows does not
>significantly increase the number (in the worst case scenario, where
>each IPsec SA has exactly one TCP flow inside, it might double the
>number of flows needed to be stored).

Agree with the last part for doubling the state, BUT this additional state may be in the HW (e.g. dedicated SRAM on the NIC, instead of DRAM accessible by the DPI inference engines), so even though the state doubles, the cost associated with where that state is kept goes up by a larger factor (cost of SRAM is many times the cost of DRAM).  

>
>> 4) Even if these high-end devices have implementations running on
>> multicores, everyone is struggling to provide low latency and high
>> throughput, squeezing the last possible bit through of the pipe.
>> Keeping state until one finds that there is a match or not may introduce
>> unwanted latency and create large memory consumption (multiply
>> everything by e.g. 100K SAs) which we may not be able to afford.
>
>During normal operation there is no increased latency or slow
>throughput, as the used IPsec SAs are already in the SPI flow cache,
>and the parameters are fetched from there.
>
>Even if there are hundreds of thousands of SAs, the rate of new IPsec
>SAs created per second is very low. I.e. if IPsec SAs have a lifetime of
>1 hour, that means that every second we create on average of ~30 new
>IPsec SA flows to keep up 100 000 IPsec SA flows running. Thus the
>actual heuristics processing only requires to process around that
>amount of flows.

The SA lifetimes will depend on policy and the type of traffic underneath. Again, we are talking about transport mode and transient traffic - I connect to a server to retrieve some data and then move on to something else - so the life of the connection may be a few minutes or even seconds. How about VOIP calls protected by IPsec? The usage model is securing traffic for all connections using transport mode IPsec, and the typical connections from the client to server are short lived, mixed in with a small percentage of long-lived connections - e.g. you may be connected to a resource continuously during a given work day (check your email, open an IM channel with a colleague during a meeting, keep an eye on the ever-decreasing stock price in order to decide when to empty your mattress and buy!, and so on...)
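
As a rough illustration of how much the lifetime matters, the steady-state arithmetic from the quoted text (active SAs divided by lifetime) can be rerun for shorter lifetimes. The 100 000 SA and 1 hour figures come from the quoted text; the 5 minute and 10 second lifetimes are assumptions chosen only for illustration, and the function name is made up.

```python
def new_sas_per_second(active_sas: int, lifetime_seconds: float) -> float:
    """Average SA creation rate needed to keep `active_sas` alive in steady state."""
    return active_sas / lifetime_seconds


ACTIVE_SAS = 100_000  # figure used in the quoted text

one_hour = new_sas_per_second(ACTIVE_SAS, 3600)     # lifetime from the quoted text
five_minutes = new_sas_per_second(ACTIVE_SAS, 300)  # assumed short-lived connection
ten_seconds = new_sas_per_second(ACTIVE_SAS, 10)    # assumed very short connection

for label, rate in [
    ("1 hour", one_hour),
    ("5 minutes", five_minutes),
    ("10 seconds", ten_seconds),
]:
    print(f"lifetime {label}: about {rate:.0f} new SAs per second")
```

Under these assumptions the heuristics load goes from roughly 28 new SAs per second to several hundred or several thousand, which is the difference between an exception path and a sustained workload.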

Which brings up another important point...
Cache eviction - how will this work?
We can keep adding SAs (based on heuristics), but how do we decide when a given SA is no longer needed? This compounds the issues with keeping state, as in the best case, cache eviction will likely be policy based. How is the policy determined and how do we differentiate between short-lived and long-lived SAs? As the SA cache will be of a finite size, this WILL lead to cache thrashing (add SA, evict SA, ...), causing further resource consumption.
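
The thrash scenario can be sketched with a toy model. This assumes least-recently-used eviction purely for illustration (the draft does not mandate any eviction policy as far as this thread shows), and the `LruSaCache` class, `runs_for` helper and placeholder verdict are hypothetical.

```python
from collections import OrderedDict
from typing import Hashable


class LruSaCache:
    """Hypothetical fixed-size SA cache with least-recently-used eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[Hashable, str]" = OrderedDict()
        self.heuristic_runs = 0  # number of times the slow path was taken

    def classify(self, key: Hashable) -> str:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        # Miss: the heuristics would have to run again for this SA.
        self.heuristic_runs += 1
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        self.entries[key] = "esp-null"  # placeholder verdict
        return self.entries[key]


def runs_for(capacity: int, active_sas: int, rounds: int) -> int:
    cache = LruSaCache(capacity)
    for _ in range(rounds):
        for sa in range(active_sas):  # each active SA sends one packet per round
            cache.classify(sa)
    return cache.heuristic_runs


fits = runs_for(capacity=4, active_sas=4, rounds=10)      # working set fits
thrashes = runs_for(capacity=4, active_sas=5, rounds=10)  # one SA too many
```

In this toy model, when the working set fits the heuristics run once per SA (4 runs); with a single extra active SA every packet misses and the heuristics run on all 50 packets.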

>
>Of course during the monday morning problem the rate will be higher,
>but even so the heuristics is cheaper than the Diffie-Hellman required
>to actually create those IPsec SAs, so even low powered CPU should be
>able to easily to run heuristics on thousands or tens of thousands
>packets per second.
>
>After the heuristics is run then the processing is moved to the actual
>fast path which only inspects the IPsec SPI flow cache and the src/dst
>addresses and SPI number found from the packet.


How about dynamic route changes for already established SAs? The same problem will exist as the Monday morning problem, but without the Diffie-Hellman overhead. Because we are caching state on one device, a change in route where the packets take a different path will force the new device to 'relearn' all the cached info. 

>
>> 5) On these High-end appliances, there is a definite need for
>> load-balancing between flows.  That means the amount of the required
>> memory space will be at least 2x or 3x more, and also there is need for
>> additional CPU and I/O overhead to maintain two or more load balancers in
>> sync.
>
>Heuristics can be run separately on each device without any problems,
>but the deep inspection usually cannot be run separately. Thus I do
>not think deep inspection engines will be load balanced so that same
>flow can end up in multiple devices. Usually the load balancing there
>is done based on the IP addresses or similar, where they can guarantee
>that each flow ends to one specific node of the load balancing
>cluster.

It may also be desirable to load balance based on ports. E.g. traffic flowing between two domains with a NAT may see a lot of flows with the same src/dest IPs, so disambiguating by ports is the next step.
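
A small sketch of why address-only balancing degenerates in that case. Everything here is an assumption for illustration: the `pick_node` helper, the hash choice, the addresses and the port range are made up, and it presumes the balancer can actually read the inner ports (i.e. the flow has already been identified as ESP-NULL).

```python
import hashlib
from typing import Tuple


def pick_node(fields: Tuple, node_count: int) -> int:
    """Map a flow key to a node index with a stable hash."""
    digest = hashlib.sha256(repr(fields).encode()).digest()
    return int.from_bytes(digest[:4], "big") % node_count


NODES = 4
SRC, DST = "203.0.113.10", "198.51.100.20"  # assumed NAT-to-NAT address pair

# 1000 flows that differ only in source port.
flows = [(SRC, DST, 40000 + i, 443) for i in range(1000)]

# Balancing on addresses alone: every flow has the same key.
by_address = {pick_node((s, d), NODES) for s, d, _, _ in flows}

# Balancing on addresses plus ports: keys differ per flow.
by_ports = {pick_node(f, NODES) for f in flows}

print("nodes used, addresses only:", len(by_address))
print("nodes used, with ports:", len(by_ports))
```

With addresses only, all flows land on a single node; adding the ports spreads them out while still keeping each individual flow pinned to one node.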

High availability is another one to consider, especially during maintenance periods. One device is powered down and a backup device takes over. This is similar to route changes in principle, where the backup device will need to relearn all the state. 

>
>> 6) Another problem I see is that more traffic today is carried on UDP,
>> and those are inherently difficult to track as "flows".  One must keep a
>> significant portion of traffic to determine ESP vs. ESP-NULL.  Years ago
>> it was rare and mostly used for implementations such as SNMP, but these
>> days new applications are coming out (e.g. Bittorrent-over-UDP).
>> SIP-over-UDP is another such implementation that may cause problems, if
>> heuristics are implemented in intermediate devices.
>
>As you said the UDP is hard to keep track, but heuristics does not
>NEED to keep track of UDP flows, it just need to find enough
>information in the beginning of the flow, to create the flow
>information for the ESP-NULL flow. After that it does not care what
>kind of UDP traffic go inside. The deep inspection device inspecting
>the traffic inside ESP-NULL flow, still need to keep track of UDP
>flows, but heuristics has already been done in that case.

Agree, but the heuristics engine may be heavily used as VOIP becomes prevalent, especially within the Enterprise. Lots and lots of UDP traffic that is short lived, which ties back to earlier points on cache maintenance and frequency of exercising the heuristics engine. 

>
>> 7) There will be stateful appliances in the future, but stateless
>> devices will remain in the future;  these will not be able to perform
>> heuristics as defined in the heuristics draft.  They may have to be
>> redeployed to other places in the network, causing forklift replacements
>
>Normal stateless devices do not need to do heuristics, as they do not
>do deep inspection or things that require looking in to the ESP-NULL
>packets.

Auditing / logging / sniffing / sampling are some examples of stateless devices that do need to peek into the packets. There are probably lots more, so look for others to provide examples...

>
>> 8) Even by your own conclusion, heuristics are not 100% accurate.  There
>> are some "false positives" and some "false negatives" situations.
>
>Not sure what you mean by "false positives" or "false negatives", but
>as explained in the draft it can only make errors that valid ESP
>packet is detected as being ESP-NULL packet. It will never detect
>valid ESP-NULL packet as ESP packet.
>
>> In summary, I find your draft to be interesting, and I am looking
>> forward to seeing it progress on Informational track.  I believe also,
>> that a deterministic approach would be quicker and easier.  I suggest
>> the "visibility" draft remain on the WG Standards track as it is more
>> implementable.
>
>It might be easier, but it definitely will not be quicker. 
The claim about speed of deployment may or may not hold. If the stars are aligned, then it could be deployed within one or two refresh cycles, which is about the same as for heuristics in intermediary devices. After all, only a handful of vendors own a large percentage of the market for OS / intermediary devices. Adoption is obviously based on customer pull/perceived usefulness.


> Heuristics
>can also be implemented regardless of IKEv1 or IKEv2. Modifying the
>ESP will be IKEv2 only, thus it will require end nodes to start using
>that too. It seems that quite a lot of devices are still using already
>obsoleted IKEv1 still, even when IKEv2 has been out for 3 years...
>--

This point is somewhat moot, as we are trying to address a new use case for ubiquitous transport mode IPsec, which is not the case today. It is a new use case, so if people want it, they will use the correct version of IKE. 

In fact, the same argument is true for all the other changes we are putting into IKE under the different charter items...


Note: Messages sent to this list are the opinions of the senders and do not imply endorsement by the IETF.