Very slow read performance after upgrading to Windows Server 2012 R2


Marcos

Jan 25, 2016, 7:43:44 PM
to Harbour Users
Hello,

We run our apps via a Terminal Services connection to
a VM hosted in a datacenter (XenServer 6.5).

After upgrading this VM from Windows Server 2008 R2 to
Windows Server 2012 R2, we saw a BIG performance penalty,
but ONLY in Harbour or xHarbour apps!

To analyze the problem, we created a small Harbour PRG
that reads a 100 MB file in two different languages (Harbour and C),
using the same algorithm:
   - Using a "C" procedure (inside the PRG, via a #pragma command),
      which calls the C fread() function, and;
   - Using a "Harbour" procedure (in the same PRG),
      which calls Harbour's wrapped fread() function.

Of course, the read task will run faster in C code than in PRG code...
But how much difference is normal, and how much is strange?

a) In the mentioned server (in a VM, now with Windows Server 2012 R2), the results are:
   fread() via C..:          1.47 seconds
   fread() via PRG:         57.15 seconds
      PRG fread() 38.90x slower then C fread()

b) In a new machine (Windows 7, not in a VM, Sata disk), the results are:
   fread() via C..:          1.69 seconds
   fread() via PRG:          5.23 seconds
      PRG fread() 3.09x slower then C fread()

c) In a very old machine (Windows 7, not in a VM, SATA disk), the results are:
   fread() via C..:          2.27 seconds
   fread() via PRG:         31.23 seconds
      PRG fread() 13.75x slower then C fread()

The test program is a single executable. Note that in C code, read performance
is good on all machines. But performance in PRG code is very bad on the
server (a), good on the new machine (b) and medium on the old machine (c).

Third-party benchmark utilities display very good performance marks in server (a). 

The PRG test executable was built using a clean Harbour 3.0 install, directly from
the distributed binaries. I attached the source code.

We do not understand why the slowdown varies so much
(from 3.09x to 38.90x), given that within each run the EXE, the call,
the hardware and the moment are the same.

If possible, could someone compile the PRG and check whether the time difference is similar?

This is probably not a hardware performance problem, because
reads from C code are fine. Maybe a Windows configuration issue?
Maybe a Harbour compiler switch?

Thanks in advance,

Marcos
Double_fread.prg

CV

Jan 25, 2016, 10:10:59 PM
to Harbour Users
Hi Marcos

If it helps, on the computer where I'm typing this message I have only xharbour.com Professional installed (not Harbour).

Your test program gives me this:

C:\performance>double_fread.exe
fread() via C..:          0.37 seconds
fread() via PRG:          2.31 seconds
PRG fread() 6.17x slower then C fread()

This computer is a Dell desktop running Windows 7 Pro 64-bit, 16 GB RAM, 2 TB SATA HD (Western Digital Black, 7,200 rpm), Intel i5 2.8 GHz (Sandy Bridge).

In my server at the office, via TS:

Q:\performance>double_fread.exe
fread() via C..:          1.06 seconds
fread() via PRG:          2.06 seconds
PRG fread() 1.94x slower then C fread()

The server is an IBM desktop server running Windows Server 2003 Standard 64-bit, 10 GB RAM, Intel Xeon 2.1 GHz, 300 GB SAS HD (10,000 rpm).

Tomorrow I will test this program compiled with Harbour.

Regards,
---
Claudio Voskian
Buenos Aires - Argentina

Ash

Jan 25, 2016, 11:41:06 PM
to Harbour Users
Hello Marcos,

I get a better result by simply changing the block size.

Regards.
Ash

#define F_BLOCK  100

fread() via C..:          0.10 seconds
fread() via PRG:          1.24 seconds

PRG fread() 12.44x slower then C fread()

#define F_BLOCK  1024

fread() via C..:          0.10 seconds
fread() via PRG:          0.15 seconds

PRG fread() 1.47x slower then C fread()

Francesco Perillo

Jan 26, 2016, 3:33:26 AM
to harbou...@googlegroups.com
I'd check:
- which flags are passed when the file is actually opened
- using Wireshark, how many packets are moved on the wire, and how long they are.

I'm having a strange situation with just ONE Windows 8 notebook that is way, way slower than all the others doing the same operation, even though, as far as I can tell, everything is set up the same way.


Przemyslaw Czerpak

Jan 26, 2016, 5:08:16 AM
to harbou...@googlegroups.com
Hi Marcos,

Is this a joke?
Don't you know that the C function fread() uses a read-ahead buffer,
while the PRG function FRead() does a raw file read, so it can only
be compared with the C function read()?
To extend this idiotic test I suggest adding a speed comparison
for MemoRead() too, and then please inform us which method is
the fastest.
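
The difference between the two C calls can be sketched roughly as follows. This is only an illustration, not the code from the attached PRG: the function names total_via_fread() and total_via_read() are made up for this example, and it assumes a POSIX system (with MSVC on Windows the raw calls are _open()/_read() from <io.h>). Both functions walk the file in blocks of the given size and return the number of bytes read, so each one can be timed separately.

```c
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

/* Read the whole file through stdio. fread() is served from the FILE
   buffer, so most small calls never reach the operating system. */
long total_via_fread(const char *path, size_t block)
{
    FILE *fp = fopen(path, "rb");
    char *buf = malloc(block);
    long total = 0;
    size_t n;

    if (fp == NULL || buf == NULL) {
        if (fp != NULL)
            fclose(fp);
        free(buf);
        return -1;
    }
    while ((n = fread(buf, 1, block, fp)) > 0)
        total += (long) n;
    fclose(fp);
    free(buf);
    return total;
}

/* Read the whole file with raw read(): one system call per block. */
long total_via_read(const char *path, size_t block)
{
    int fd = open(path, O_RDONLY);
    char *buf = malloc(block);
    long total = 0;
    ssize_t n;

    if (fd < 0 || buf == NULL) {
        if (fd >= 0)
            close(fd);
        free(buf);
        return -1;
    }
    while ((n = read(fd, buf, block)) > 0)
        total += (long) n;
    close(fd);
    free(buf);
    return total;
}
```

Calling both with a block size of 100 on the same file and timing each call (for example with clock() or a wall-clock timer) should show the gap discussed here: the raw version pays for a system call on every block, the stdio version only when its buffer runs empty.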

best regards,
Przemek

Marcos

Jan 26, 2016, 10:18:51 AM
to Harbour Users
Hi Przemek,

I didn't know that the C fread() function uses a read-ahead buffer.

I changed my test program, and results were:
   a) In the mentioned server (in a VM, now with Windows Server 2012 R2), the results are:
       read() via C......:         50.37 seconds
       memoread() via PRG:          0.09 seconds
       fread() via PRG...:         55.20 seconds
       PRG fread() 1.10x slower then C read()

My conclusions about my test program are now:
- Harbour PRG code performs only a little slower than C code when both read 100 bytes per call
- Harbour MemoRead() performs much better, because it reads the whole file in just one call
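
To put numbers on the "per call" point, here is a small sketch of the arithmetic. The helper names are invented for this example, and it assumes that "100 MB" means 100 * 1024 * 1024 bytes; the exact size of the test file is not stated in the thread.

```c
#include <stdio.h>

/* Number of read calls needed to consume file_size bytes when each
   call asks for at most block bytes (ceiling division). */
unsigned long read_calls_needed(unsigned long file_size, unsigned long block)
{
    if (block == 0)
        return 0;               /* no progress possible; treat as "no calls" */
    return (file_size + block - 1) / block;
}

/* Average cost of one call in microseconds, given a total elapsed time. */
double usec_per_call(double seconds, unsigned long calls)
{
    return calls ? seconds * 1e6 / (double) calls : 0.0;
}

/* Print the call counts for the block sizes mentioned in this thread. */
void print_call_counts(void)
{
    /* Assumption: "100 MB" means 100 * 1024 * 1024 bytes. */
    unsigned long size = 100UL * 1024UL * 1024UL;

    printf("100-byte blocks : %lu calls\n", read_calls_needed(size, 100));
    printf("1024-byte blocks: %lu calls\n", read_calls_needed(size, 1024));
    printf("whole file      : %lu call\n", read_calls_needed(size, size));
}
```

Under that size assumption, 100-byte blocks mean 1,048,576 calls, 1024-byte blocks mean 102,400 calls, and a whole-file read is a single call. With the 50.37 seconds measured for read() via C on server (a), usec_per_call(50.37, 1048576) works out to roughly 48 microseconds per call, which suggests the time is dominated by per-call overhead rather than by the amount of data.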

My conclusions about the VM-hosted server:
- Aggressive read-ahead optimizations make the server perform well on cached reads
- But uncached reads perform very badly.

Thanks for all,

Marcos
double_read_2.prg

Paola Bruccoleri

Jan 27, 2016, 6:59:29 AM
to harbou...@googlegroups.com
Hello Marcos
I tested your demo:

a) in a local machine (core i5, 6G ram)
read() via C......:          1.69 seconds
memoread() via PRG:          0.07 seconds
fread() via PRG...:          2.29 seconds

PRG fread() 1.36x slower then C read()


b) connected via Terminal Services to a Windows Server 2012 R2 server (the server where I work)
read() via C......:          8.68 seconds
memoread() via PRG:          0.12 seconds
fread() via PRG...:          8.09 seconds

PRG fread() 0.93x slower then C read()


Paola Bruccoleri

Jan 27, 2016, 7:03:44 AM
to harbou...@googlegroups.com
with FBLOCK 1024

a)
read() via C......:          1.72 seconds

memoread() via PRG:          0.07 seconds
fread() via PRG...:          0.37 seconds

PRG fread() 0.21x slower then C read()

b)
read() via C......:          7.43 seconds
memoread() via PRG:          0.08 seconds
fread() via PRG...:          0.83 seconds

PRG fread() 0.11x slower then C read()


Przemyslaw Czerpak

Jan 27, 2016, 8:28:04 AM
to harbou...@googlegroups.com
On Wed, 27 Jan 2016, Paola Bruccoleri wrote:

Hi,

> with FBLOCK 1024
> a)
> read() via C......: 1.72 seconds
> memoread() via PRG: 0.07 seconds
> fread() via PRG...: 0.37 seconds
>
> PRG fread() 0.21x slower then C read()
> b)
> read() via C......: 7.43 seconds
> memoread() via PRG: 0.08 seconds
> fread() via PRG...: 0.83 seconds
> PRG fread() 0.11x slower then C read()

I guess you wanted to write that PRG FRead() is faster.
But these are not realistic tests. The results should be
nearly the same. Probably the infamous opportunistic locks,
which allow the file body to be cached on the network client side,
strongly interfere with your results, so "fread() via PRG"
gives such a good result in comparison to "read() via C".
Just move "memoread() via PRG:" before "read() via C" and
see what happens to the C results.
It's a well-known fact that in MS-Windows any IO operations
are much slower than in *nixes. The best performance you
can reach is by remotely running a *nix application which operates
on local files. This gives an incredible speed difference.
If that's not possible and your applications use RDDs, then
you can try a remote RDD like LETO or ADS RDD. If this
is not possible either, then you can use HBNETIO, and finally
a simple network driver like the one you were testing. Anyhow,
in all cases, if possible use Linux as the server to maximize
performance.

best regards,
Przemek

Marcos

Jan 28, 2016, 12:45:58 PM
to Harbour Users
Hi All,

After we saw that the slow performance was not related to Harbour, we started doing a step-by-step installation of Windows Server 2012 R2, checking performance after each step.

Just after installing "RDS - Remote Desktop Services" in Windows Server 2012 R2, I/O performance dropped (up to 8 times slower than before installing RDS), with only one user connected to the VM...

After some research, we saw that in Windows Server 2012 a feature named "Dynamic Fair Share Scheduling (DFSS) / CPU Fairshare" is enabled by default, and tries to share CPU/network resources equally among all processes.

As we didn't find any specific group policy for this in Windows Server 2012, we disabled it directly via registry settings.

Without a reboot, I/O performance returned to normal just a second later!

Disk (1 - enable, 0 - disable)
HKLM\SYSTEM\CurrentControlSet\Services\TSFairShare\Disk\EnableFairShare

NetFS (1 - enable, 0 - disable)
HKLM\SYSTEM\CurrentControlSet\Services\TSFairShare\NetFS\EnableFairShare
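
For anyone who wants to check the current state before changing anything, a read-only query could look roughly like the sketch below. It is an illustration, not something from our tests: the function name query_fair_share() is invented here, and it assumes EnableFairShare is stored as a DWORD value under each TSFairShare subkey, as the paths above suggest. On Windows it needs to be linked against Advapi32; on any other system it simply reports that the value is unavailable.

```c
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Returns the EnableFairShare value (0 or 1) for the given TSFairShare
   subkey ("Disk" or "NetFS"), or -1 if it cannot be read.
   Assumption: the value is a REG_DWORD. This function only reads the
   registry; it never modifies it. */
int query_fair_share(const char *subkey)
{
#ifdef _WIN32
    char path[256];
    DWORD value = 0;
    DWORD size = sizeof value;

    snprintf(path, sizeof path,
             "SYSTEM\\CurrentControlSet\\Services\\TSFairShare\\%s", subkey);
    if (RegGetValueA(HKEY_LOCAL_MACHINE, path, "EnableFairShare",
                     RRF_RT_REG_DWORD, NULL, &value, &size) != ERROR_SUCCESS)
        return -1;
    return (int) value;
#else
    (void) subkey;              /* the registry only exists on Windows */
    return -1;
#endif
}
```

Calling query_fair_share("Disk") and query_fair_share("NetFS") before and after the change gives a quick way to confirm which setting is actually in effect on the server.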

IMPORTANT NOTES:
- Read the TechNet texts about this carefully before making any change!
- This change can help only if your server has RDS and I/O performance is much slower than normal.
- We don't know whether this change can cause any side effects.
- In other Windows versions (e.g. 2008), the registry keys are different.

We will post any additional information if we find any side effects.

Thanks,

Marcos

==========

References:
What's New in Remote Desktop Services in Windows Server

Group Policy to disable DFSS is not functional

You find high CPU usage for the Wmiprvse.exe process on a terminal server that is running Windows Server 2008 when you run the Windows System Resource Manager

Using Windows System Resource Manager

Remote Desktop Session Host