
Re: pf expiring states way too fast (2 hosts using carp+pfsync)



Hi Justin.
OK, I'll top-post as well.
I shut down my primary fw yesterday because of this problem. Today I did the 
following:
1. Ran pfctl -x misc on the secondary, fw2, which we are running on right now.
2. Started up the primary, fw1, again.
3. On the console of fw1, ran pfctl -x misc.
4. Tried to log in to fw1 with ssh from the lan (directly connected, 
192.168.8.153 to 192.168.8.2). My session hung. I copied 
the /var/log/messages file and shut fw1 down again.
As you can see in the following files, fw1 is complaining about bad state and my 
sessions hang.
http://www.incedo.org/~sjoholmp/pf/messages-fw1-pfctl-xm
http://www.incedo.org/~sjoholmp/pf/messages-fw2-pfctl-xm
As said in the conversation below, it had worked for days before this happened. 
An ssh login to the secondary fw always works, regardless of whether the master 
is up or not. The only solutions I can come up with are to remove the "flags 
S/SA" from all tcp rules or to run only one of the firewalls.
Is this a system bug? My wild guess is pfsync.
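For anyone following along, the rules in question have roughly this shape. This 
is a sketch only, not my actual ruleset: the lan_if macro, the fxp0 interface 
and port 22 are placeholders chosen for illustration.

```
# Hypothetical macro; substitute the real lan interface.
lan_if = "fxp0"

# With "flags S/SA": only an initial SYN (SYN set, ACK clear) can create state.
pass in on $lan_if proto tcp from any to any port 22 flags S/SA keep state

# Without "flags S/SA": any matching tcp packet can create state, so a
# mid-connection packet that finds no state entry is passed and gets a new one.
# pass in on $lan_if proto tcp from any to any port 22 keep state
```

If that reading is right, dropping the flags may only hide a state mismatch 
rather than fix it, which would fit the sessions surviving only without them.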
Help appreciated
Regards
/Per-Olov
-- 
GPG keyID: 4DB283CE
GPG fingerprint: 45E8 3D0E DE05 B714 D549 45BC CFB4 BBE9 4DB2 83CE
On Wednesday 17 November 2004 06.18, Justin Krejci wrote:
> Wow! I am experiencing an almost identical problem just in a different
> scenario. I am not using carp, just PF. This is a totally strange problem
> that only just started happening one day at random. I have two servers that
> were humming along just fine for weeks without change or incident, just the
> occasional reboot for updates or power outages. Then one day both of my
> servers started having the identical problem at the same time where ssh
> sessions would stall early on, http traffic would halt after about 10%
> downloading one page.
>
> I have very similar tcpdump log entries with lots of packets with the P
> flag set.
>
> The only thing I have been able to figure out is that if I remove "keep
> state" on all of my pass rules, there are no problems.
>
> Try setting everything up as normal, then do
> pfctl -xm
>
> at this point monitor /var/log/messages for pf entries.
>
> On Tuesday 16 November 2004 11:59 am, you wrote:
> > On Wednesday 10 November 2004 20.11, Per-Olov Sjöholm wrote:
> > > On Wednesday 10 November 2004 19.46, you wrote:
> > > > On Wed, Nov 10, 2004 at 04:14:59PM +0100, Per-Olov Sjöholm wrote:
> > > > > >> http://marc.theaimsgroup.com/?l=openbsd-pf&m=109351242125764&w=2
> > > > > >>
> > > > > >> This has been fixed in -current, you might want to try that.
> > > > >
> > > > > Is this fixed in 3.6 release ?
> > > >
> > > > Yes.
> > > >
> > > > > Wonder as I have random disconnects when the two firewalls are up
> > > > > at the same time.
> > > >
> > > > Which version are you running?
> > >
> > > I use 2 HP Intel servers running 3.6 with carp for the lan, dmz and
> > > external interfaces, plus one dedicated interface for pfsync.
> > >
> > > But the random disconnects seem to be less frequent now (I
> > > changed the lan port in the switch and the lan cable on one of the
> > > firewalls). It is strange, though, that the redundant firewalls passed
> > > the initial tests and ran perfectly for 2 days before the random
> > > disconnects started. When it started to act strangely I did not see any
> > > errors with netstat -s. And it worked perfectly when just one firewall
> > > was running, no matter which one. The random disconnects affected
> > > tcp-based sessions such as ssh, both through and to the firewall
> > > from the lan. But a console login on the firewall and an ssh session
> > > out to the internet worked. So I really hope it was the lan switch
> > > port or the cable...
> > >
> > > The reason for asking was that I use adaptive timeouts...
> > >
> > > Tnx
> > > /Per-Olov
> >
> > Well, my random disconnect problem still persists.
> >
> > The firewalls can run perfectly for a couple of days. Then, just like
> > that, we have a problem that only non-web users notice (ssh, telnet users
> > etc.): random quick disconnects. We even see this when
> > going directly to the firewall's own interface with ssh rather than to
> > the carp interface. It's not the switches. I have tried several different
> > ones, and also other interfaces in the firewalls (xl* and fxp*
> > interfaces).
> >
> > The file:
> >  http://www.incedo.org/~sjoholmp/pf/real_fuckup.txt
> > shows a tcpdump of a hang when I did an ssh from the lan to
> > the lan interface of the firewall that holds the primary carp for the lan
> > net. When the problem occurs you can see that the system is sending a lot
> > of packets with the "P" flag set. An ssh to the lan interface
> > of the backup firewall, on the other hand, seems to work.
> >
> > It seems like the problem goes away when I remove "flags S/SA" from
> > the rules. But the strange thing is that it had worked for days before
> > the problem appeared. Rebooting the master firewall won't help. Only two
> > things help:
> > Either - remove the "flags S/SA" from the rules.
> > Or - shut down the primary fw and just use the backup.
> >
> >
> >
> > Could this be a bug with carp, pf or pfsync that I am not aware of yet?
> > (btw, I have disabled adaptive timeouts even though I use 3.6 where it
> > should be fixed)
> >
> >
> > Any help very much appreciated.
> >
> > Thanks in advance
> > Per-Olov Sjöholm
