
Re: pf vs ASIC firewalls



On Mar 14, 2005, at 2:26 PM, Mike Frantzen wrote:

Could someone please tell me the advantages of PF over firewalls
using ASIC technology, in terms of security and performance?

Many (most? all?) vendors shipping what they call ASIC firewalls are actually running software on a network processor (NPU). The benefit is that most NPUs process packets in real time, so if they claim to support X gigabits per second then they can probably sustain that even with minimum-sized 64-byte Ethernet frames.

You think? I've been a bit curious about this, especially in low-end ("cheap") consumer-grade hardware. Just because a device "supports" 10/100 Ethernet doesn't mean it can saturate all ports simultaneously. Assuming nothing is being funneled (inbound traffic on two ports destined for the same output port), which is "legitimate" lossage, can one of those sub-$100 10-to-16-port 10/100 switches really saturate half its ports at 100 Mbit/s? That is, you might get full throughput at 100 Mbit from the first port to the second, but there has to be some central IC that keeps track of minimal forwarding tables and locks and releases access to the transceivers. Can something like that actually pass 100 Mbit simultaneously from 0->1, 2->3, 4->5, and so on? I may be completely wrong, but I'd bet the design specs for those things basically say that the entire "device" can move 100 Mbit total. After all, that's more than sufficient for 99% of the people who would buy them. Hell, half the people who buy them probably wouldn't notice the difference between 10 Mbit and 100 Mbit. Just a curiosity.
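A quick back-of-the-envelope for that pairwise-saturation question (my arithmetic, with assumed numbers; the function name is just for illustration):

```python
# Rough arithmetic for the cheap-switch question above (assumed numbers):
# to saturate half the ports pairwise (0->1, 2->3, ...), the switch fabric
# must carry one full-rate stream per port pair.

def fabric_demand_mbps(ports, port_mbps=100):
    """Aggregate fabric bandwidth needed for unidirectional pairwise
    full-rate forwarding across half the ports."""
    pairs = ports // 2
    return pairs * port_mbps

# A 16-port 10/100 switch would need 800 Mbit/s of fabric capacity just
# for unidirectional pairwise traffic -- double that if every pair talks
# in both directions at once.
demand = fabric_demand_mbps(16)
```

If the shared fabric can really only move ~100 Mbit total, as speculated above, it falls short of that by nearly an order of magnitude.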



The down side to NPUs is that they have to service every packet in a
fixed amount of time so they can't do much. They need to have fixed
sized state and fragment reassembly tables. They also aren't allowed to
do much work per packet. You will also be able to surf Moore's law
better with a normal x86 processor than with an NPU.

Well, the only difference between that and the requirements for a PF-style setup is that there's more room for buffering. That is, you don't strictly have to finish servicing a packet before the next one arrives (or refill a hardware buffer before it would be drained again), but if you want to saturate the link, your per-packet "service time" has to be less than the time the packet takes to transmit. (If your service time is less than the packet's transmit duration, you get 100% saturation. If it takes 5-6 us to transmit a 64-byte Ethernet frame at 100 Mbit/s, and you take 50-60 us to process it and hand it off to the outgoing transceiver (if necessary), then you only get about 10% saturation.)
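To make that arithmetic concrete, here's a sketch in Python; the 100 Mbit/s link speed and the 50-60 us service time are the numbers from the text:

```python
# Saturation fraction achievable when each packet costs a fixed amount of
# processing time, per the argument above.

def saturation(transmit_us, service_us):
    """Fraction of line rate sustainable when each packet takes service_us
    to process; capped at 1.0 once processing keeps up with the wire."""
    return min(1.0, transmit_us / service_us)

# A 64-byte frame is 512 bits: ~5.12 us on the wire at 100 Mbit/s.
wire_us = 64 * 8 / 100e6 * 1e6

# Servicing in ~51 us gives ~10% saturation; servicing in 5 us or less
# gives full line rate.
slow = saturation(wire_us, 51.2)
fast = saturation(wire_us, 5.0)
```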


The strict time restrictions are what limit functionality, of course. The more complex your ruleset, the longer each packet takes to process. Rules that, like PF's, can look inside protocol data for additional blocking are even more processor-intensive. Even things like keeping statistics add some minimal overhead.
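To illustrate why ruleset size eats directly into the per-packet time budget (this is a toy model of last-match-wins evaluation, not pf's actual data structures; the rules are hypothetical):

```python
# Toy illustration: a naive last-match linear scan over N rules costs O(N)
# work per packet, so every added rule shrinks the per-packet time budget.

RULES = [                       # (action, proto, dst_port); None = wildcard
    ("block", None,  None),     # default deny
    ("pass",  "tcp", 22),       # allow ssh
    ("pass",  "tcp", 80),       # allow http
]

def evaluate(proto, dst_port):
    """Scan every rule; the last matching rule's action wins,
    as in pf without 'quick'."""
    verdict = None
    for action, r_proto, r_port in RULES:
        if r_proto in (None, proto) and r_port in (None, dst_port):
            verdict = action
    return verdict
```

Here `evaluate("tcp", 22)` returns `"pass"` while `evaluate("udp", 53)` falls through to the default deny; either way, every packet pays for a walk over the whole list.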

And of course, buffering only makes things more complex: every system bus is added overhead and a potential bottleneck. Is that packet sitting in an on-chip cache, or did the OS copy it out to main memory? Etc. Add to that the fact that one processor has to do this for all interfaces, and moving from 2 to 6 interfaces is going to give you, what, perhaps a third of the per-interface theoretical bandwidth (not even counting the extra routing-table overhead, or added rule complexity), because every one of a pair of interfaces may be saturated with traffic that needs servicing to meet theoretical maximum capacity.
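That 2-to-6 estimate, made explicit (a deliberate simplification that assumes processing capacity is fixed and split evenly across interfaces):

```python
# Simplified model: one processor with a fixed aggregate packet-processing
# capacity, shared evenly among all saturated interfaces.

def per_interface_share(capacity_mbps, interfaces):
    """Per-interface throughput when fixed capacity is split evenly."""
    return capacity_mbps / interfaces

# A CPU that can drive 2 interfaces at 100 Mbit/s each (200 Mbit/s total)
# gives each of 6 interfaces only about a third of that per-interface rate.
two_if = per_interface_share(200, 2)   # 100 Mbit/s each
six_if = per_interface_share(200, 6)   # ~33 Mbit/s each
```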

That doesn't even get into things like logging, or restrictions imposed by the OS design. If you want to saturate AND log, you need to consider what you're logging to: writing to disk, or sending messages to a loghost, even on a dedicated interface, adds extra system latency or traffic. In theory, verbose logging of all 64-byte Ethernet frames matching some rule or another could generate MORE traffic than the traffic you are logging.
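That amplification point as arithmetic (the log-record size is an assumed number, not any particular logger's format):

```python
# If each logged frame produces a log record bigger than the frame itself,
# the logging byte volume exceeds the logged byte volume.

def log_amplification(frame_bytes, record_bytes):
    """Ratio of logging bytes emitted to traffic bytes logged, per frame."""
    return record_bytes / frame_bytes

# A hypothetical 128-byte record (headers plus timestamp, rule number,
# interface, verdict) per 64-byte frame is 2x amplification: the firewall
# emits twice as many bytes of logs as it passes of matching traffic.
amp = log_amplification(64, 128)
```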

I have no idea if the BSDs--or any general-purpose host OS, for that matter--will, for example, prevent logging to disk during bursts of traffic. Perhaps there's a kernel option for prioritizing I/O servicing.

An under-designed PF-style system might give you added functionality by buffering traffic, and still handle a moderately loaded network without dropping frames, provided it can "catch up" when traffic bursts fall off. And that might be sufficient for most purposes, offering more features than a hardware-only solution, but only a strictly designed real-time system can make any guarantees. And those can only do so by imposing limits on the size of the "rules" applied.
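A toy model of that "catch up after a burst" behavior (all rates here are assumed numbers, and the model ignores finite buffer space, which is exactly where a real under-designed box would start dropping frames):

```python
# During a burst, packets arrive faster than they can be serviced and
# queue up; afterwards, the backlog drains if arrivals fall below the
# service rate.

def max_backlog(burst_pps, service_pps, burst_s):
    """Packets queued by the end of a burst lasting burst_s seconds."""
    return max(0.0, (burst_pps - service_pps) * burst_s)

def drain_time_s(backlog_pkts, idle_arrival_pps, service_pps):
    """Seconds to clear the backlog once arrivals drop below service rate."""
    spare = service_pps - idle_arrival_pps
    return float('inf') if spare <= 0 else backlog_pkts / spare

# A 1-second burst at 20k pps against 15k pps of service capacity leaves
# a 5,000-packet backlog; with quiet-time arrivals of 5k pps, the spare
# 10k pps of capacity drains it in half a second -- if the buffers held.
backlog = max_backlog(20_000, 15_000, 1.0)
drain = drain_time_s(backlog, 5_000, 15_000)
```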

All of that said, I wonder if there isn't some way to implement something vaguely PF-ish in an FPGA that would allow more control over the rulesets than an off-the-shelf ASIC.


JMF