
Re: strange packet loss

On Mar 14 01:37 PM, Matt Provost wrote:
> We're running an OpenBSD 3.1 pf firewall between our network and our
> ISP. Lately we've had trouble with packet loss - it occurs at random
> intervals, but almost always lasts around 18 seconds (+-1 sec). In
> between, everything looks fine. The drops wreak havoc with ssh sessions
> though.
> We've got a direct cat5 patch to our ISP at the datacenter going into
> the firewall. I can watch the packet loss by running a ping from the
> firewall to the router on the other end of the cable.
> We're pushing about 15Mbits of traffic over this connection. We've got
> 245 filter rules, and every pass rule is stateful.
> If we disable pf, the packet loss seems to go away, although we haven't
> been able to run very extensive tests with it wide open.
> We have a standby firewall that I upgraded to the latest 3.3 snapshot
> (March 12?) and loaded the same ruleset on - and had the same packet
> loss.
> Has anyone ever seen this problem before? I doubt that it's strictly a pf
> problem, since it's been rewritten so much since 3.1.
> Here's my dmesg in case anyone sees anything. I did see that the USB
> controller shares an interrupt with the network cards, but we're not
> using USB and vmstat -i shows that it sends no interrupts.
> Thanks,
> Matt
OK, since posting this email we've discovered that our DNS servers are
still delegated for some spammer domain that we kicked off months ago.
He's still using it as the From: address on his spam, so we're getting
several hundred thousand requests per hour for his DNS info. I'm
guessing that it's the massive number of new connections that is killing
the packet filter. We still get the 18-second dropouts all the time.
We've got the lame delegation problem taken care of with his registrar,
and are just waiting for the glue record to update.
In the meantime, what can we do to deal with the problem? Our timeout
settings are the default:
udp.first            60s
udp.single           30s
udp.multiple         60s
Should I decrease these? I already switched from using stateful
connections to just passing DNS requests through.
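For reference, here's a sketch of what I'm considering in pf.conf -
lower UDP timeouts, a hard cap on the state table, and a stateless pass
for DNS (the numbers are just guesses, and $ext_if / $dns_server are
placeholder macros for our setup):

    # shorter UDP state lifetimes (defaults are 60/30/60)
    set timeout udp.first 30
    set timeout udp.single 15
    set timeout udp.multiple 30

    # cap total states so a query flood can't eat all memory
    set limit states 20000

    # pass DNS without keeping state (pf rules are stateless
    # unless "keep state" is given)
    pass in quick on $ext_if proto udp from any to $dns_server port 53
    pass out quick on $ext_if proto udp from $dns_server port 53 to any

Whether lowering the timeouts actually helps depends on whether it's
state-table pressure or state insert/removal churn that's hurting us.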
Are there any queues on the system side that would make a difference?
The sysctl UDP parameters are all default too:
net.inet.udp.recvspace = 41600
net.inet.udp.sendspace = 9216
It is my understanding that these only apply to traffic destined for
the machine itself, though, not to traffic passing through it.
I'm sure this is going to happen again, so I'd like to get it resolved
now, especially since updating the root servers takes forever.
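In the meantime I'll keep an eye on the state table with pfctl to see
whether we're actually hitting a limit (assuming the 3.3 pfctl still
takes these flags):

    pfctl -s info           # counters: state searches/inserts/removals
    pfctl -s memory         # configured hard limits (states, frags)
    pfctl -s state | wc -l  # rough count of current state entries

If the state count is pinned at the hard limit when the dropouts hit,
that would explain the pattern.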