Re: Bordermanager & TCPIP version

Mysterious

Feb 2, 2010, 5:42:42 AM

On 02/02/2010 09:06 AM, marcoba wrote:
>
> At the moment we are struggling with Bordermanager not handling a wide
> range of sites. After some testing most problems seem to come down to
> the TCPIP stack version and different SET params (Nagle/DACK).

tid3321740

Nagle, Minshall and delay can be set to OFF

> We run BM39SP2IR1 now and the TCPIP version from SP8. We use HTTP proxy
> only with some Filters and Access rules on the fastest HPG6 server with
> dedicated cache controller/disks.
>
> The IR1 readme states there should be a new TCPIP version available. I
> cannot find the download for this file. Anyone?

It is not yet released. It should be out this week or next, but the only
fixes in it are for VPN.

> How do you handle "problem" sites?

1. Check that BM is properly configured. See the TID above.
2. Check that your proxy.cfg is using the correct switches. See tid3988333,
or use the one from Craig's site (http://www.craigjconsulting.com/).
3. Search support and the forums to find out if the problem is already known.
4. Take a LAN trace on the BM server to find out why it fails.

> This is our short-list:
> - Citrix/terminal server is slow with TCP6.8x. The fastest (and only
> workable) version is TCP6.57

tid7004603. Use the domestic version.

> - Sites with upload options don't work/are slow (Viadesk for example)

Probably a proxy.cfg setting. A LAN trace will confirm that.

> - Redirects within sites result in 504-errors

Probably a proxy.cfg setting, or another device in the middle (like Trend
Micro) messing with the 302 header. A LAN trace will confirm that.

> - NTLM site authentication

tid10061203. NTLM does not work through BM.

Massimo Rosen

Feb 2, 2010, 2:00:07 PM

Gonzalo,

Mysterious wrote:
> tid3321740
>
> Nagle, Minshall and delay can be set to OFF

Luckily, the TID doesn't state that, or I would have had to change it.
With today's internet line speeds, delayed ACK, Minshall and Nagle
absolutely *MUST* be on, otherwise performance will be abysmal on all
ends. It is simply not possible to achieve anything like normal internet
performance today when you have to acknowledge every single TCP packet.
That is pure maths. Even with a relatively fast ping time of 30ms, the
maximum performance reachable with delayed ACK off would be roughly
500kb/s, completely regardless of how fast your internet connection is
in terms of bandwidth. With higher latency, the maximum throughput drops
in proportion: a target 60ms away would only be able to achieve
250kb/s, and so on. In 2001, when that TID (and this paragraph) already
existed, 250 or 500kb/s were basically nonexistent on the internet for
normal use on individual connections. Today, it's nothing. I have
100mbit/s at home and in my office.
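
The arithmetic behind those figures can be sketched roughly as follows.
This is only an illustration under the strict assumption that exactly one
segment is in flight per round trip. The function name and the 1460-byte
segment size (a 1500-byte Ethernet MTU minus 40 bytes of IP and TCP
headers) are assumptions, not values taken from the TID.

```python
def stop_and_wait_throughput_kbit(rtt_ms: float, segment_bytes: int = 1460) -> float:
    """Upper bound in kbit/s when only one segment travels per round trip.

    segment_bytes is an assumed payload size, not a figure from the thread.
    """
    bits_per_round_trip = segment_bytes * 8
    round_trips_per_second = 1000.0 / rtt_ms
    return bits_per_round_trip * round_trips_per_second / 1000.0


if __name__ == "__main__":
    for rtt in (30, 60, 120):
        rate = stop_and_wait_throughput_kbit(rtt)
        print(f"RTT {rtt:>3} ms: about {rate:.0f} kbit/s")
```

With these assumptions the bound comes out at about 389 kbit/s for 30ms
and about 195 kbit/s for 60ms, the same order of magnitude as the rough
figures quoted above, and it halves each time the round-trip time doubles.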

In fact, the TID clearly explains that the switch should *not* be off on
Proxy.nlm versions newer than 7/16/01.

That said, the statement in that TID that proxy itself does not use
delayed ACK internally is luckily wrong; otherwise BorderManager would
long since have become unusable.

CU,
--
Massimo Rosen
Novell Product Support Forum Sysop
No emails please!
http://www.cfc-it.de

mysterious

Feb 2, 2010, 3:07:44 PM

Massimo

We can discuss whatever you want, but Nagle and delayed ACK interact badly
with each other, creating a deadlock. With both algorithms enabled,
applications which do two successive writes to a TCP connection,
followed by a read which will not be fulfilled until after the data from
the second write has reached the destination, experience a constant
delay of up to 500 milliseconds, the "ACK delay". For this reason, TCP
implementations usually provide applications with an interface to
disable the Nagle algorithm. This is typically called the TCP_NODELAY
option. Applications that expect real time responses can react poorly
with Nagle's algorithm. Applications such as networked multiplayer video
games expect that actions in the game are sent immediately, while the
algorithm purposefully delays transmission, increasing bandwidth at the
expense of latency. For this reason applications with low-bandwidth
time-sensitive transmissions typically use TCP_NODELAY to bypass the
Nagle delay.
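
As a rough illustration of the TCP_NODELAY option mentioned above, here
is a minimal Python sketch using the standard socket module. The helper
name make_nodelay_socket is hypothetical. It only shows how an
application asks its own TCP stack to skip Nagle for one socket, and says
nothing about how proxy.nlm or the NetWare stack behave internally.

```python
import socket


def make_nodelay_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled for that socket only."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 tells the stack to send small writes immediately
    # instead of holding them back until outstanding data is acknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_nodelay_socket()
    # A non-zero value means the option is set on this socket.
    print("TCP_NODELAY:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))
    s.close()
```

Note that this is a per-socket choice made by the application, which is
different from switching the algorithm off server-wide.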
The interaction is explained really nicely here:

(http://www.stuartcheshire.org/papers/NagleDelayedAck/)

or here:

https://datatracker.ietf.org/drafts/draft-doupnik-nagle-mode/

Both Nagle mode and delayed ACKs attempt to conserve network and host
machine resources by delaying transmissions in the expectation that the
current material can be piggybacked onto a future transmission.
Unfortunately when both mechanisms are active at the same time on either
end of a connection a deadlock can exist, which is broken by arrival of
new data for transmission or firing of the delayed ACK timer. This
produces classical timer based ACKing, which for the common 200ms ACK
delay yields five exchanges per second.
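
The "five exchanges per second" figure is simple arithmetic, sketched
here in Python. The function name and the 1460-byte segment size used for
the byte rate are assumptions made for illustration; only the 200ms
delay comes from the text above.

```python
def timer_paced_rate(ack_delay_ms: float = 200.0, segment_bytes: int = 1460):
    """Return (exchanges per second, bytes per second) when every exchange
    has to wait for the delayed ACK timer to fire.

    segment_bytes is an assumed size for one small exchange.
    """
    exchanges_per_second = 1000.0 / ack_delay_ms
    return exchanges_per_second, exchanges_per_second * segment_bytes


if __name__ == "__main__":
    exchanges, rate = timer_paced_rate()
    print(f"{exchanges:.0f} exchanges/s, about {rate:.0f} bytes/s")
```

With a 200ms timer that is 5 exchanges per second, or 7300 bytes per
second if each exchange carries one 1460-byte segment.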

Massimo Rosen

Feb 2, 2010, 3:54:46 PM

Gonzalo,

mysterious wrote:
>
> Massimo
>
> We can discuss whatever you want, nagle and delay interact badly with
> each other creating a deadlock.

Correct. That's what Minshall is there for, to avoid said deadlock.
Unfortunately, Novell ships the worst possible combination of default
settings, in that delayed ACK and Nagle are on, and Minshall is off. You
enable Minshall, and all is well and dandy.

> Applications that expect real time responses can react poorly
> with Nagle's algorithm.

Right, like Telnet for example. Streaming applications like HTTP couldn't
care less.

> Applications such as networked multiplayer video
> games expect that actions in the game are sent immediately, while the
> algorithm purposefully delays transmission, increasing bandwidth at the
> expense of latency.

Yep, online games are another example. Still, with Minshall enabled, no
problem. And apart from this, as you correctly state, these applications
disable Nagle on their own via the IP API. I just don't know what that
has to do with Border.

> For this reason applications with low-bandwidth
> time-sensitive transmissions typically use TCP_NODELAY to bypass the
> Nagle delay.

Correct.

But you wouldn't call HTTP "low bandwidth", would you?

As I said, Proxy does *not* disable delayed ACK (any LAN trace of an
HTTP stream immediately proves that), and disabling delayed ACK in the
OS *seriously* degrades the performance of proxy for individual streams.
Apart from that, it also kills any other TCP connectivity to the server,
and a lot of Border servers are also used for other stuff that suffers
massively from delayed ACKs being disabled.

Massimo Rosen

Feb 5, 2010, 4:20:31 AM

Hi,

marcoba wrote:
>
> At the moment we have only NAGLE set to OFF (Minshall is OFF by
> default). With these settings Citrix (ICAoverHTTP) flows acceptably (not
> as fast as with TCP657!)

So you are saying that with *just* delayed ACKs enabled, and Nagle and
Minshall both off, it works? That doesn't really make a lot of sense. The
same goes for a difference between the SP8 tcpip and TCP657 in that
regard. That *could* potentially point to a bug in the SP8 tcpip that you
have discovered, but that's uncertain at this point.

> through the proxy and a certain upload site works
> fine. With all options ON Citrix just gets unworkable and the upload
> site fails.

Outright failure should never be the result of any combination of these
settings. They affect performance, yes. They should under no
circumstances cause complete failure. There must be something else going
on here, and the only way to find out what is to take a LAN trace.

> Our network engineers suggest to change the MTU-size on the
> bordermanager servers to 1500. Any comments on that? We do not use
> BM-VPN.

The MTU *is* 1500. That's the Ethernet standard. And a potential MTU
issue would never cause such effects, or be in any way influenced by ACK
settings.

> Finally, looking for a confirmation on bm-tuning. We use the CJ-ncf for
> tuning but we had to change the numbers for:
> Set Maximum Packet Receive Buffers = 120000
> Set Minimum Packet Receive Buffers = 32767
> Max running value seen = 77656

That sounds highly suspicious. You should *never* see a server
allocating 70,000+ ECBs (packet receive buffers). That points to an
issue.

Massimo Rosen

Feb 5, 2010, 4:24:57 AM

Hi,

marcoba wrote:
>
> Above problems are easily solved by using a SQUID proxy directly or
> SQUID via CERN.

And the latter (that SQUID via CERN works) is another pointer to some odd
problem at your site (vs. some general issue in Border, NetWare, or
TCP/IP). In theory, from Border's or the clients' POV, this shouldn't
make a difference. All it does is change the target BorderManager uses
from the "real" website to another proxy. If that works, but using the
original website doesn't, then that points to a communication problem
between BM and the original website that exists neither between BM and
Squid nor between Squid and the target.

Craig Johnson

Feb 8, 2010, 1:36:50 PM

In article <marcoba...@no-mx.forums.novell.com>, Marcoba wrote:
> We found this to be a disk channel issue! When we start about 8
> large-file downloads the dirty-cache fills memory completely and then
> the receive buffers start climbing until the server gets unresponsive.
> The cache diskchannel has a dedicated array controller with 3 disks,
> setup to be raid0 each (3 logical drives). BBWcache is enabled at
> 50/50%. This setup can't cope with the downloads. We tested the
> raid0-setup with 3 disks (1 logical drive) and this does the job much
> better. Then it only starts failing at 10+ downloads, where we fill the
> GB-connection to internet.
>
You should try 1 cache volume per physical drive, no array, and see how
that works. Ideally, 6 or 7 fast drives, each with a dedicated cache
volume, for simultaneous reads and writes split over the drives.

Craig Johnson
Novell Knowledge Partner
*** For a current patch list, tips, handy files and books on
BorderManager, go to http://www.craigjconsulting.com ***

Craig Johnson

Feb 12, 2010, 1:26:31 PM

In article <marcoba...@no-mx.forums.novell.com>, Marcoba wrote:

> Is it wise to create one single cache volume on this 1 logical drive,
> or will creating more cache volumes be more effective? What about the
> sizes of these volumes (we now have 3 12GB volumes).
>
I'm not keen on the idea of RAID0. If any of the drives fail, you lose
the logical array, and then proxy blows up (even if it is only using
the RAID0 for caching - proxy will unload if it doesn't see at least
one cache volume to use).

The idea of having RAID0 on three drives is to split writes (and reads)
simultaneously over three drives. The idea of having 3 cache volumes
on 3 drives is to split writes across 3 drives, but it is safer.

The 'ideal' cache size is now 15GB per cache volume. The number of
volumes depends on the loading. Try to get close to having enough cache
capacity to hold 1 week of browsing. More is wasted, less is not as
efficient.
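
As a back-of-the-envelope sketch of that sizing rule, the number of 15GB
cache volumes can be estimated from the daily traffic in Python. The
function name, the 6 GB/day example figure, and the simplification that
all traffic counts toward the cache are assumptions made for
illustration only.

```python
import math


def cache_volumes_needed(daily_traffic_gb: float,
                         days: int = 7,
                         volume_size_gb: float = 15.0) -> int:
    """Estimate how many cache volumes hold `days` of browsing.

    Always returns at least 1, since proxy needs one cache volume to run.
    """
    total_gb = daily_traffic_gb * days
    return max(1, math.ceil(total_gb / volume_size_gb))


if __name__ == "__main__":
    # Hypothetical site moving 6 GB of web traffic per day.
    print(cache_volumes_needed(6.0))
```

For the hypothetical 6 GB/day site, one week is 42GB, which rounds up to
3 volumes of 15GB each.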

Massimo Rosen

Feb 25, 2010, 4:56:25 AM

Hi,

marcoba wrote:
>
> marcoba;1934657 Wrote:
> > Anyway... too little too late :*(
> > Although the performance is fine now, the BM-servers are being stripped
> > down and will be replaced this week with squid3 on suse10. Way too many
> > problems with specific sites that just work fine with squid.
> > Marco
>
> Interesting news... the new squids installed on the former
> bordermanager hardware now show the same site problems. We think we are
> experiencing nic issues with the proliant G6 servers.

That's *extremely* unlikely. That Squid has even more trouble keeping
up with your load than BM on similar hardware, however, was something I
totally expected. Squid cannot compete with BM in terms of performance.
Good luck.