
load balancing outgoing traffic: 1st TCP connection RESET



Hi,
My goal is to write a PF setup that load-balances outbound TCP connections 
in an environment with multiple external connections and a transparent 
squid proxy.
Since squid acts as a proxy, all the real Internet web access is carried 
out by TCP connections opened by the firewall itself. 
So, instead of using route-to at the internal interface, I am trying to 
use it at the default external interface, as recommended by Daniel in 
previous messages about this issue.
At the moment, I am trying to find out why a TCP connection reset occurs 
every time PF chooses the second external interface. This slows down 
payload transfer. For applications like pop3, smtp and ssh, where only one 
TCP connection is needed, the delay is acceptable. However, for web 
traffic, where many TCP connections are sometimes needed, this behaviour 
is quite annoying.
Here is my setup:
int_if1=rl1
IP=192.168.1.254
ext_if1=rl0(default)
IP=200.177.74.24
GW=200.177.74.1
ext_if2=tun0 (vr0)
IP=201.9.166.61
GW=200.164.195.8
I am going to use a simple test to show what is going on. Basically, I 
will try to telnet (from a client station in the internal LAN) to an 
internet host on port 25 (smtp). 
When the PF load-balancing rule chooses the default interface, everything 
works as expected without any delay. However, when it re-routes the TCP 
connection to the second external interface, the problem occurs.
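As a rough illustration of what I expect round-robin to do with the two
gateways, here is a toy Python model. It is not PF's implementation: the
function name assign_round_robin and the connection labels are
hypothetical, and only the interface names and gateway addresses come from
my setup.

```python
from itertools import cycle


def assign_round_robin(connections, gateways):
    """Pair each new connection with the next gateway in rotation."""
    rotation = cycle(gateways)
    return [(conn, next(rotation)) for conn in connections]


# Gateways taken from the setup above; connection labels are made up.
GATEWAYS = [("rl0", "200.177.74.1"), ("tun0", "200.164.195.8")]

if __name__ == "__main__":
    for conn, (iface, gw) in assign_round_robin(["c1", "c2", "c3", "c4"], GATEWAYS):
        print(conn, "->", iface, gw)
```

Under this model every second new connection is handed to tun0, which is
the case where I see the reset.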
I sniffed the traffic on the internal side and noticed that a timeout at 
the application level (about 3 seconds) causes the client station to try 
to open a second TCP connection. Strangely, the second TCP connection is 
always successful, whereas the first one always times out.
When I capture the traffic on the second external interface, I notice
that PF sends a TCP RST packet immediately after it receives a TCP
SYN+ACK from the remote peer. When a second request then comes from the 
local client, the TCP connection with the remote peer is established 
without a problem.
So my questions are:
1. Why is the first TCP connection RST'ed by PF?
2. Why is the second TCP connection established?
Below is some information I captured during this test. Ignore the 
timestamps, because the captures were taken at different moments. There 
may also be some inconsistency regarding port numbers from one capture to 
the other. That is because both my connections get dynamic IP addresses, 
and these change quite frequently.
First, what tun0 shows:
tcpdump: listening on tun0
Apr 08 12:33:02.259794 201.9.166.61.62030 > 200.154.55.5.25: S [tcp sum ok] 
1188473903:1188473903(0) win 16384 <mss 1460,nop,nop,sackOK> (DF) 
(ttl 127, id 24724)
Apr 08 12:33:02.349561 200.154.55.5.25 > 201.9.166.61.62030: S[tcp sum ok] 
1682044118:1682044118(0) ack 1188473904 win 5840 <mss 1452,nop,nop,sackOK> (DF)
(ttl 53, id 0)
Apr 08 12:33:02.349666 201.9.166.61.62030 > 200.154.55.5.25: R [tcp sum ok] 
1188473904:1188473904(0) win 0 (DF) (ttl 64, id 47380)
Apr 08 12:33:05.532382 201.9.166.61.62547 > 200.154.55.5.25: S [tcp sum ok] 
1188473903:1188473903(0) win 16384 <mss 1460,nop,nop,sackOK> (DF) 
(ttl 127, id 24710)
Apr 08 12:33:05.622442 200.154.55.5.25 > 201.9.166.61.62547: S [tcp sum ok] 
1681492751:1681492751(0) ack 1188473904 win 5840 <mss 1452,nop,nop,sackOK> (DF)
(ttl 53, id 0)
Apr 08 12:33:05.622813 201.9.166.61.62547 > 200.154.55.5.25: . [tcp sum ok] 
1:1(0) ack 1 win 17424 (DF) (ttl 127, id 55193)
Apr 08 12:33:05.763136 200.154.55.5.25 > 201.9.166.61.62547: P [tcp sum ok] 
1:32(31) ack 1 win 5840 (DF) (ttl 53, id 32667)
Apr 08 12:33:05.969835 201.9.166.61.62547 > 200.154.55.5.25: . [tcp sum ok] 
1:1(0) ack 32 win 17393 (DF) (ttl 127, id 49352)
This is the output of tcpdump after an attempt to "telnet remote-host 
smtp" from a client station in the internal network. About 3 seconds after 
the first TCP connection is RST'ed, a second one is successfully 
established.
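To spot this pattern in a longer capture, something like the following
Python sketch could be used. It is only an illustration: the function name
find_reset_handshakes is my own, and the regular expression assumes the
tcpdump header layout shown above ("src > dst: FLAGS [..."), with a SYN
travelling in the reverse direction treated as the SYN+ACK.

```python
import re

# Matches the first line of each tcpdump record: "src > dst: FLAGS [".
HEADER = re.compile(r"(\S+) > (\S+): ([SFRP.]+) ?\[")


def find_reset_handshakes(lines):
    """Return (client, server) pairs whose SYN+ACK was answered with an RST."""
    state = {}   # (client, server) -> "syn" or "synack"
    resets = []
    for line in lines:
        m = HEADER.search(line)
        if not m:
            continue  # continuation line (sequence numbers, ttl, ...)
        src, dst, flags = m.groups()
        if "S" in flags:
            if state.get((dst, src)) == "syn":
                state[(dst, src)] = "synack"   # reply SYN from the server
            else:
                state[(src, dst)] = "syn"      # initial SYN from the client
        elif "R" in flags and state.get((src, dst)) == "synack":
            resets.append((src, dst))
    return resets
```

Fed the tun0 capture above, this reports only the first connection
(201.9.166.61.62030 to 200.154.55.5.25); the second one, from port 62547,
completes its handshake and is not flagged.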
Now, this is what pflog shows:
43.330244 rule 19/0(match): pass in on rl1: 192.168.1.50.4285 > 
200.154.55.5.25: S [tcp sum ok] 3538843289:3538843289(0) win 16384 
<mss 1460,nop,nop,sackOK> (DF) (ttl 128, id 31687)
43.330327 rule 20/0(match): pass out on rl0: 200.177.74.24.64146 > 
200.154.55.5.25: S [tcp sum ok] 3538843289:3538843289(0) win 16384 
<mss 1460,nop,nop,sackOK>(DF) (ttl 127, id 31687)
43.330365 rule 28/0(match): pass out on tun0: 201.9.166.61.51921 > 
200.154.55.5.25: S [tcp sum ok] 3538843289:3538843289(0) win 16384 
<mss 1460,nop,nop,sackOK>(DF) (ttl 127, id 31687)
43.421156 rule 28/0(match): pass in on tun0: 200.154.55.5.25 > 
200.177.74.24.64146: S [tcp sum ok] 1411088670:1411088670(0) ack 3538843290 
win 5840 <mss 1452,nop,nop,sackOK> (DF) (ttl 53, id 39397)
43.421198 rule 20/0(match): pass out on rl0: 200.177.74.24.64146 > 
200.154.55.5.25: R [tcp sum ok] 3538843290:3538843290(0) win 0 
(DF) (ttl 64, id 54273)
43.421231 rule 28/0(match): pass out on tun0: 201.9.166.61.51921 > 
200.154.55.5.25: R [tcp sum ok] 3538843290:3538843290(0) win 0 (DF) 
(ttl 64, id 54273)
46.509515 rule 19/0(match): pass in on rl1: 192.168.1.50.4285 > 
200.154.55.5.25: S [tcp sum ok] 3538843289:3538843289(0) win 16384 
<mss 1460,nop,nop,sackOK> (DF) (ttl 128, id 36033)
46.509541 rule 20/0(match): pass out on rl0: 192.168.1.50.4285 > 
200.154.55.5.25: S [tcp sum ok] 3538843289:3538843289(0) win 16384 
<mss 1460,nop,nop,sackOK> (DF) (ttl 127, id 36033)
46.509611 rule 28/0(match): pass out on tun0: 201.9.166.61.62409 > 
200.154.55.5.25: S [tcp sum ok] 3538843289:3538843289(0) win 16384 
<mss 1460,nop,nop,sackOK>(DF) (ttl 127, id 36033)
46.753637 rule 28/0(match): pass in on tun0: 200.154.55.5.25 > 
192.168.1.50.4285: S [tcp sum ok] 1417804841:1417804841(0) ack 3538843290 
win 5840 <mss 1452,nop,nop,sackOK> (DF) (ttl 53, id 2454)
46.753659 rule 19/0(match): pass out on rl1: 200.154.55.5.25 > 
192.168.1.50.4285: S [tcp sum ok] 1417804841:1417804841(0) ack 3538843290 
win 5840 <mss 1452,nop,nop,sackOK> (DF) (ttl 52, id 2454)
46.753933 rule 19/0(match): pass in on rl1: 192.168.1.50.4285 > 
200.154.55.5.25: . [tcp sum ok] 3538843290:3538843290(0) ack 1417804842 
win 17424 (DF) (ttl 128, id 65204)
46.753940 rule 20/0(match): pass out on rl0: 192.168.1.50.4285 > 
200.154.55.5.25: . [tcp sum ok] 3538843290:3538843290(0) ack 1417804842 
win 17424 (DF) (ttl 127, id 65204)
46.753954 rule 28/0(match): pass out on tun0: 201.9.166.61.62409 > 
200.154.55.5.25: . [tcp sum ok] 3538843290:3538843290(0) ack 1417804842 
win 17424 (DF) (ttl 127, id 65204)
47.086834 rule 28/0(match): pass in on tun0: 200.154.55.5.25 > 
192.168.1.50.4285: P [tcp sum ok] 1417804842:1417804873(31) ack 3538843290 
win 5840 (DF) (ttl 53, id 7609)
47.086854 rule 19/0(match): pass out on rl1: 200.154.55.5.25 > 
192.168.1.50.4285: P [tcp sum ok] 1417804842:1417804873(31) ack 3538843290 
win 5840 (DF) (ttl 52, id 7609)
47.275202 rule 19/0(match): pass in on rl1: 192.168.1.50.4285 > 
200.154.55.5.25: . [tcp sum ok] 3538843290:3538843290(0) ack 1417804873 
win 17393 (DF) (ttl 128, id 1006)
47.275216 rule 20/0(match): pass out on rl0: 192.168.1.50.4285 > 
200.154.55.5.25: . [tcp sum ok] 3538843290:3538843290(0) ack 1417804873 
win 17393 (DF) (ttl 127, id 1006)
47.275233 rule 28/0(match): pass out on tun0: 201.9.166.61.62409 > 
200.154.55.5.25: . [tcp sum ok] 3538843290:3538843290(0) ack 1417804873 
win 17393 (DF) (ttl 127, id 1006)
Below are rules 19, 20 and 28 mentioned above:
@19 pass in log-all quick on rl1 inet from 192.168.1.0/24 to any flags 
S/SA keep state (if-bound)
@20 pass out log-all quick on rl0 route-to { (rl0 <gws_if1>), (tun0 <gws_if2>) }
 round-robin inet proto tcp all keep state (if-bound) 
@28 pass out log-all on tun0 inet proto tcp all flags S/SA keep state (if-bound)
I find it strange that some of the packets shown by pflog above do not
actually show up on the physical interface. Also, the second TCP connection
setup shows source IP address 192.168.1.50, which is the address of the
client station (internal address). This should not show up on an external
interface (we are doing NAT). Nevertheless, this is the connection that
gets established in the end.
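To see how a single packet is logged on several interfaces, the pflog
records can be grouped by IP id. The following is a minimal Python sketch
for that. The function name group_by_ip_id and the regular expression are
my own illustration and only assume the record layout shown above.

```python
import re
from collections import defaultdict

# One pflog record: rule number, direction, interface ... IP id.
# re.S lets ".*?" span the line breaks inside a wrapped record.
RECORD = re.compile(
    r"rule (\d+)/\d+\(match\): pass (in|out) on (\w+):.*?id (\d+)\)", re.S)


def group_by_ip_id(pflog_text):
    """Map each IP id to the (rule, direction, interface) entries logged for it."""
    seen = defaultdict(list)
    for rule, direction, iface, ip_id in RECORD.findall(pflog_text):
        seen[int(ip_id)].append((int(rule), direction, iface))
    return dict(seen)
```

For the first SYN above (id 31687) this yields rule 19 in on rl1, rule 20
out on rl0 and rule 28 out on tun0, so the same packet is logged once for
each rule it matched.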
This is the output of pfsync0 immediately after the RESET is sent:
Apr 07 11:49:22.567051 PFSYNCv2 count 3: INS ST:
rl1 6 200.154.55.5:25 <- 192.168.1.50:2061
   CLOSED:SYN_SENT
   [0 + 1]  [518960201 + 2]
   age 00:00:00, expires in 00:00:00, 0:0 pkts, 0:0 bytes, rule 19
   id: 425543dc000000a8 creatorid: 186c1f0d
rl0 6 192.168.1.50:2061 -> 200.177.74.24:62834 -> 200.154.55.5:25
   SYN_SENT:CLOSED
   [518960201 + 2]  [0 + 1]
   age 00:00:00, expires in 00:00:00, 0:0 pkts, 0:0 bytes, rule 20
   id: 425543dc000000a9 creatorid: 186c1f0d
tun0 6 200.177.74.24:62834 -> 201.9.166.61:51412 -> 200.154.55.5:25
   SYN_SENT:CLOSED
   [518960201 + 2]  [0 + 1]
   age 00:00:00, expires in 00:00:00, 0:0 pkts, 0:0 bytes, rule 28
   id: 425543dc000000aa creatorid: 186c1f0d
Apr 07 11:49:23.560006 PFSYNCv2 count 1: UPD ST:
tun0 6 200.177.74.24:62834 -> 201.9.166.61:51412 -> 200.154.55.5:25
   TIME_WAIT:TIME_WAIT
   [518960201 + 5841]  [2181922309 + 16384]
   age 00:00:00, expires in 00:00:00, 1:0 pkts, 48:0 bytes, rule 28
   id: 425543dc000000aa creatorid: 186c1f0d updates: 1
If it is of any help, here is the route table:
Internet:
Destination        Gateway            Flags     Refs     Use    Mtu  Interface
default            200.177.74.1       UGS         3      235      -   rl0
127/8              127.0.0.1          UGRS        0        0  33224   lo0
127.0.0.1          127.0.0.1          UH          2        0  33224   lo0
192.168.1/24       link#2             UC          1        0      -   rl1
192.168.1.50       0:e0:7d:da:8f:48   UHLc        2        7      -   rl1
200.164.195.8      201.9.166.61       UH          1        0   1492   tun0
200.165.104.14     200.164.195.8      UGHS        0        6      -   tun0
200.177.74/24      link#1             UC          1        0      -   rl0
200.177.74.1       0:5:9a:d2:34:54    UHLc        1        0      -   rl0
200.177.74.24      127.0.0.1          UGHS        0        0  33224   lo0
224/4              127.0.0.1          URS         0        0  33224   lo0
One more thing: when this happens, I see log messages like the following
in /var/log/messages:
Apr  8 15:30:57 blt-ha /bsd: pf: state insert failed: tree_ext_gwy lan: 
200.177.74.24:60846 gwy: 200.177.74.24:60846 ext: 200.154.55.5:25
In fact, it shows up twice during this process.
Thanks in advance for any help.
Regards,
Emilio
--------------
ext_if1="rl0"
gw_if1="200.177.74.1"
ext_if2="tun0"
gw_if2="200.164.195.8"
int_if="rl1"
lan_net=$int_if:network
table <gws_if1> { $gw_if1 }
table <gws_if2> { $gw_if2 }
tcp_INservices = "{ 22 }"
icmp_types = "echoreq"
set loginterface $int_if
set state-policy if-bound
scrub in all random-id fragment reassemble
nat on $ext_if1 from !($ext_if1) to any -> ($ext_if1)
nat on $ext_if2 from !($ext_if2) to any -> ($ext_if2)
block in log from any to any
block out log from any to any
pass quick on lo0 all
block drop in log quick on $ext_if1 from $priv_nets to any
block drop out log quick on $ext_if1 from any to $priv_nets
block drop in log quick on $ext_if2 from $priv_nets to any
block drop out log quick on $ext_if2 from any to $priv_nets
pass in quick log-all on $int_if from $lan_net to any flags S/SA keep state
pass out quick log-all on $ext_if1 route-to \
    { ($ext_if1 <gws_if1>) , ($ext_if2 <gws_if2>) } round-robin \
    inet proto tcp from any to any keep state label lb
pass in log-all on $int_if from $lan_net to $int_if flags S/SA keep state
pass out log on $int_if from $int_if to $lan_net flags S/SA keep state
pass in log-all quick on $ext_if1 reply-to ($ext_if1 $gw_if1) \
    inet proto tcp from any to ($ext_if1) port $tcp_INservices \
    flags S/SA keep state
pass in log-all quick on $ext_if2 reply-to ($ext_if2 $gw_if2) \
    inet proto tcp from any to ($ext_if2) port $tcp_INservices \
    flags S/SA keep state
pass in log-all quick on $ext_if1 reply-to ($ext_if1 $gw_if1) \
    inet proto icmp all icmp-type $icmp_types keep state label icmp-if1
pass in log-all quick on $ext_if2 reply-to ($ext_if2 $gw_if2) \
    inet proto icmp all icmp-type $icmp_types keep state label icmp-if2
pass out on $ext_if1 inet proto tcp all flags S/SA keep state 
pass out log-all on $ext_if2 inet proto tcp all flags S/SA keep state 
pass out on $ext_if1 proto {udp icmp} all keep state
pass out on $ext_if2 proto {udp icmp} all keep state