
Re: Problems with stalling sessions



On Tuesday 08 November 2005 15.30, Jon Hart wrote:
> On Tue, Nov 08, 2005 at 01:39:21AM +0100, Per-Olov Sjöholm wrote:
> > Hi
> >
> > I have a redundant firewall with CARP. 3.6 STABLE plus all patches from
> > CVS for stable (updated last week). The firewalls have 7 nic ports each.
> > External, internal, pfsync and 4 dmz interfaces. The servers are
> > firewalls, DNS, mail relay, antivirus, spamkiller, ntp and dhcp for
> > internal hosts. Everything works perfectly, except that sessions
> > stall during transfers of big files. I have tried to
> > remove "aggressive timeouts", "adaptive timeouts" and "scrub" without
> > success. It doesn't matter if the transfer goes over NAT from Lan to
> > internet or from a real IP on dmz2 to the internet. We have tried many
> > different protocols such as SSH, amanda and more.
> >
> > Turning on "pfctl -x loud" gives a lot of the below (maybe irrelevant?)
> > --snip--
> > Nov  8 00:49:53 san /bsd: pfsync: ignoring stale update (3) id:
> > 4367413c000b4c76 creatorid: e31b4f22
> > Nov  8 00:49:53 san /bsd: pfsync: ignoring stale update (3) id:
> > 4367413c000b4c75 creatorid: e31b4f22
> > Nov  8 00:49:53 san /bsd: pfsync: ignoring stale update (3) id:
> > --snip--
>
> Do you get these all the time or just when the system is under load?
> For some reason your primary carp host is getting old updates from
> someone else, presumably the other carp machine.  Something seems out of
> whack here.
>
> > Nothing comes up as blocked in the firewall log when a session is
> > stalling. I have Intel 10/100 (fxp nics) and Soekris lan1641 quad boards
> > (sis nics)
>
> When I read 'sis' I immediately suspected those cards as the problem as
> I know others on the list have had problems with those cards under load
> in the past.  I believe this may have been fixed in more recent releases
> though, but don't quote me on it.
>
> > Don't look too closely at the queuing stuff as it's not complete.
> > The ruleset from firewall 1's pf.conf (primary) is at the link below.
> > http://www.incedo.org/~sjoholmp/pf/pf.conf
> > (the secondary FW has exactly the same pf.conf)
>
> The only comment I have about that ruleset that may be relevant is the
> max states.   Even though you've got it commented out it will still
> default to 10k states unless you say otherwise.  This may not even be
> relevant because a large transfer should not necessarily drive the
> number of states through the roof.  Depends on the method used to
> download, of course.
>
> > Any suggestions?
>
> In no particular order...
>
> Figure out why you are getting stale updates from pfsync.  Do a simple
> test.  Your two carp hosts, ONE other client machine.  From the client,
> initiate a connection outbound and ensure that the two pfsync hosts have
> similar (if not identical) state tables.
>
> When downloading, keep an eye on carp and see if the two hosts are
> flopping between master and slave.  If you don't feel like doing this
> manually, use ifstated (may not have been available in 3.6 though).
>
> Use systat/vmstat to see how the system is acting under load.  Looks
> like we are dealing with a 2M pipe so it shouldn't be an issue but worth
> looking at anyway.
>
> Take the second host out of the picture entirely and see if your
> problems persist.
>
> -jon
Running "tcpdump -i fxp0 proto carp" on all relevant interfaces on both
servers looks ok: CARP does not change state when the session stalls.
However, the traffic on the pfsync interfaces of both firewalls looks
strange. Captures taken with "tcpdump -i xl0 -w /root/pfsync1" on server 1
and "tcpdump -i xl0 -w /root/pfsync2" on server 2 are available at:
http://www.incedo.org/~sjoholmp/pf/
When the session stalls, these capture files grow VERY fast. I think the
session stalls somewhere around (or just after) 11:48.
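A rough way to pin down when the pfsync burst starts is to count packets
per minute in each capture. This is only a minimal sketch: it assumes
tcpdump prints its default hh:mm:ss.frac timestamp as the first field, and
the per_minute helper name is made up for illustration.

```sh
# Count packets per minute from tcpdump's text output on stdin.
# Assumption: the first field of each line is a timestamp of the
# form hh:mm:ss.frac (tcpdump's default), so its first 5 characters
# are hh:mm.
per_minute() {
    awk '{ print substr($1, 1, 5) }' | sort | uniq -c
}

# Hypothetical usage on server 1 (same idea with pfsync2 on server 2):
#   tcpdump -n -r /root/pfsync1 | per_minute
```

A sudden jump in the per-minute count near 11:48 on both hosts would
suggest the pfsync traffic and the stall are related, though it does not
by itself say which one causes the other.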
More suggestions?
Thanks in advance
/Per-Olov