>I wonder if it's the security descriptor
> you're using on the CreateFileMapping
it's possible - I should check
>on an NT4 TSE machine here are the results.
>
> Relative to Legacy TCP speed = 1.0
> SharedMem: 0.9 times faster than Legacy TCP
> Blocking TCP: 1.2 times faster than Legacy TCP
> Legacy SUP: 1.8 times faster than Legacy TCP
>
> The TSE machine is a dual Pentium 600 with half a gig of RAM.
that's odd
--
,-._|\ Paul Motyer [TPX]
/ Oz \ SoftStuff P/L
\_,--._/ [paul.motyer at tpx.turbopower dot com]
v [ http://members.optushome.com.au/paulmotyer ]
I've just been testing the protocols on a TSE system. Unfortunately SharedMem
doesn't appear to work between sessions (I thought it would, because I
thought SharedMem was IPC per machine). The Windows help on IPC does say
it's processes per machine, but I wonder if it's the security descriptor
you're using on the CreateFileMapping & CreateMutex?
But what really baffled me was the benchmark results. On my XP machine the
speed increase was amazing, but on an NT4 TSE machine here are the
results:
Relative to Legacy TCP speed = 1.0
SharedMem: 0.9 times faster than Legacy TCP
Blocking TCP: 1.2 times faster than Legacy TCP
Legacy SUP: 1.8 times faster than Legacy TCP
The TSE machine is a dual Pentium 600 with half a gig of RAM.
Keith..
> that's odd
Hi Paul,
That's what I think. It's weird: the slowest protocol appeared to be
SharedMem, but this is normally the fastest. The only thing I can think of
is that it's maybe the dual processors; maybe a strategic placing of SleepEx
is what's required..
Would you like to send the source, so that I can tinker? I'll post any
findings to you, and if you want I could do the PayPal first.
Also, I assume you're using mutexes, so you might want to check the security
of these too..
You know me, I'm usually good at finding OS oddities :-), e.g.
SendMessage/PostMessage, etc..
I've a couple of questions:
How does your SharedMem work? Are you doing a sort of FIFO buffer?
Does each connection have its own 2 SharedMem objects, 1 for read & 1 for
write?
How are you doing Timed Reads in TCP/IP? Are you using GetSockOpt, or
Overlapped I/O?
Did you end up using SO_KEEPALIVE? This would be handy for threads that
block, because I think they might still get fired during a blocking
operation.
Regards
Keith..
> Would you like to send the source
The licensing issues are still being sorted out
> Also I assume your using Mutex's
nope - Events
> so you might want to check the security
> of these too..
good point - but I'm already granting full access in the Event's
constructor
> How does your SharedMem work
>are you doing a sort of FiFo buffer?
nope, no buffering at all - I rely on messaging being sequenced: the
client sends a message and awaits a reply. The approach is even
simpler than what I did in FF1
> does each connection have its own 2 sharedMem objects,
>1 for read & 1 for write?.
no and yes. I use one object per connection, but divide it logically
into 2: one half for the Client, the other for the Server. Then I use
Events for signalling - the same as in the FF1 code
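A rough sketch of that "one object, two halves" layout, assuming each half holds a 4K send buffer and a 4K receive buffer (the 4K size is the one given later in this post; the struct, field and function names are hypothetical, not the transport's actual code). In the real transport the memory would come from CreateFileMapping/MapViewOfFile and the hand-offs would be signalled with Events; here it's just a plain C struct so the offsets can be checked at compile time.

```c
#include <assert.h>
#include <stddef.h>

#define XFER_BLOCK_SIZE 4096  /* 4K transfer block, as stated in the thread */

/* One side's half of the shared region (hypothetical layout). */
struct shm_half {
    unsigned char send_buf[XFER_BLOCK_SIZE]; /* data this side writes */
    unsigned char recv_buf[XFER_BLOCK_SIZE]; /* data this side reads  */
};

/* The whole per-connection region: one mapping, split logically in two. */
struct shm_conn_region {
    struct shm_half client; /* first half: owned by the client  */
    struct shm_half server; /* second half: owned by the server */
};

/* 2 sides x (4K send + 4K receive) = 16K per connection. */
_Static_assert(sizeof(struct shm_conn_region) == 4 * XFER_BLOCK_SIZE,
               "region should be exactly 16K");

enum shm_side { SIDE_CLIENT, SIDE_SERVER };

/* Return the half that belongs to the given side. */
static struct shm_half *shm_half_for(struct shm_conn_region *region,
                                     enum shm_side side)
{
    assert(region != NULL);
    return side == SIDE_CLIENT ? &region->client : &region->server;
}
```

Because both halves live in one mapping, each connection needs only one named object; the server half simply starts 8K into it.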
> How are you doing Timed Reads in TCP/IP
Do you mean how do I time how long a read takes? I just use
QueryPerformanceCounter() before and after the recv()
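For comparison, a POSIX sketch of the same measure-around-the-call approach. clock_gettime(CLOCK_MONOTONIC) is swapped in for QueryPerformanceCounter so it runs outside Windows; the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds (POSIX stand-in for
   QueryPerformanceCounter/QueryPerformanceFrequency). */
static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Call recv() once and report how long it took, in nanoseconds.
   The byte count (or -1 on error) is stored in *received. */
static long long timed_recv(int fd, void *buf, size_t len, ssize_t *received)
{
    assert(received != NULL);
    long long start = now_ns();
    *received = recv(fd, buf, len, 0);
    return now_ns() - start;
}
```

A monotonic clock is used rather than wall-clock time so that clock adjustments during a benchmark can't produce negative or inflated intervals.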
> are you using GetSockOpt
only during an accept() - to set the new socket type to blocking
> or Overlapped I/O?
this is blocking - how does overlapped i/o apply?
> Did you end up using SO_KEEPALIVE
nope
> how does function like GetRecordBatch work,
It's no different to any other message:
- Client sends variable length request
- Server builds and returns a variable length reply
- until the reply is fully sent, the Client won't send another
message
- the server doesn't send unsolicited messages
- and the client doesn't listen for them
Each connection is managed independently of the others (with its own
thread) so I don't need to buffer
How the Transport sends is up to it. I send in 4K blocks - this means
there's a 16K overhead on SharedMem for each connection (4K send and
receive buffers for Client & Server)
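A minimal sketch of one variable-length message going out in 4K blocks. The thread doesn't say how the receiver learns a message's length, so this assumes a 4-byte big-endian length prefix; all names are hypothetical. It uses POSIX sockets rather than Winsock so it compiles on a Unix box (Winsock's send/recv take and return int lengths, but the loop is the same).

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>   /* htonl, ntohl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

#define XFER_BLOCK_SIZE 4096

/* Send exactly len bytes, looping over short sends. Returns 0 or -1. */
static int send_all(int fd, const unsigned char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Receive exactly len bytes. Returns 0, or -1 on error/peer close. */
static int recv_exact(int fd, unsigned char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical framing: 4-byte big-endian length, then the payload
   pushed out in blocks of at most 4K. */
static int send_message(int fd, const void *msg, uint32_t len)
{
    uint32_t hdr = htonl(len);
    if (send_all(fd, (const unsigned char *)&hdr, sizeof hdr) != 0) return -1;
    const unsigned char *p = msg;
    while (len > 0) {
        uint32_t chunk = len < XFER_BLOCK_SIZE ? len : XFER_BLOCK_SIZE;
        if (send_all(fd, p, chunk) != 0) return -1;
        p += chunk;
        len -= chunk;
    }
    return 0;
}

/* Read one framed message into buf (capacity cap). Returns the payload
   length, or -1 on error or if the payload does not fit. */
static long recv_message(int fd, void *buf, size_t cap)
{
    uint32_t hdr;
    if (recv_exact(fd, (unsigned char *)&hdr, sizeof hdr) != 0) return -1;
    uint32_t len = ntohl(hdr);
    assert(buf != NULL || len == 0);
    if (len > cap) return -1;   /* stream is left unsynchronised here */
    if (recv_exact(fd, buf, len) != 0) return -1;
    return (long)len;
}
```

Because the client never sends a new request until the previous reply is fully read, a strictly sequential send-then-receive like this needs no queueing on either side.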
> I'm not sure what the performance hit for a Named Event is
Whilst I don't know, MSDN seemed to suggest Events were one of the
fastest options - not as fast as Spinlocks, but Spinlocks chew up CPU
cycles
I use a Spinlock for SharedMem connections. Of course a Spinlock is
faster - but I need to preserve some CPU cycles for other processes!
So I couldn't really use them for general packet transfers
> but recv() can block forever, or do you do a Select first?
A raw recv is the fastest - but it's a disaster client side to block
the main thread indefinitely - and no good for performance to switch
threads.
So the code has two paths: if the Timeout is 0 --> a raw (unprotected)
recv(); otherwise a receive protected by a select().
I tried to optimize for speed - but it must be usable. The defaults
are that the Server side uses raw recv() - and then has to kill the
threads violently at shutdown, whilst the clients use whatever timeout
is supplied.
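A POSIX sketch of those two paths: a zero timeout goes straight to recv(), anything else waits in select() first. The function name and the "return -1 with errno = ETIMEDOUT" convention are assumptions. On Winsock, select()'s first parameter is ignored and errors come from WSAGetLastError() rather than errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* timeout_ms == 0: raw recv(), which may block indefinitely.
   timeout_ms  > 0: wait with select() first; on timeout return -1
   with errno set to ETIMEDOUT instead of blocking. */
static ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    assert(timeout_ms >= 0);
    if (timeout_ms == 0)
        return recv(fd, buf, len, 0);          /* fast, unprotected path */

    fd_set readable;
    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int ready = select(fd + 1, &readable, NULL, NULL, &tv);
    if (ready < 0)
        return -1;                             /* select() failed; errno set */
    if (ready == 0) {
        errno = ETIMEDOUT;                     /* nothing arrived in time */
        return -1;
    }
    return recv(fd, buf, len, 0);              /* data (or EOF) is waiting */
}
```

The cost of the protected path is one extra system call per receive, which is why a server that can afford to block (and be torn down at shutdown) might prefer the raw path.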
> what exactly is a spinlock?
It's a thread synchronization construct
A Spinlock is the name MS give to a DWORD of memory that is accessed
by InterlockedIncrement, InterlockedDecrement and InterlockedExchange
and used to synchronize threads.
If the DWORD is just in the memory space of a process then it can be
used to synchronize the threads of that process.
If the DWORD is in SharedMemory then all processes and their threads
that map that SharedMemory can use it for synchronization
This is from MSDN:
"One way to implement a spinlock is to use a value of zero to
represent a free spinlock. When a thread needs to acquire the
spinlock, it uses InterlockedExchange to set its value to 1. The
spinlock is acquired if the result of the InterlockedExchange is 0,
otherwise the attempt has failed and must be retried. There are many
different strategies for retrying the lock acquisition (or,
"spinning"). The best method depends on many factors."
[this is from section 4.3 of this page:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndllpro/html/msdn_scalabil.asp ]
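The scheme MSDN describes maps directly onto C11 atomics. In this sketch atomic_exchange stands in for InterlockedExchange (a swapped-in API), and the type and function names are hypothetical. The inner read-only wait loop ("test and test-and-set") is just one of the many retry strategies MSDN alludes to. If the lock word lived in a SharedMem mapping instead of ordinary process memory, every process mapping it could use it the same way, assuming atomic_int is lock-free on that platform (C11 atomic_is_lock_free can check this).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = free, 1 = held, as in the MSDN description. */
typedef atomic_int spin_word;

static void spin_init(spin_word *lock) { atomic_init(lock, 0); }

/* One acquisition attempt: swap in 1; we own the lock only if the
   previous value was 0. */
static bool spin_try_acquire(spin_word *lock)
{
    return atomic_exchange_explicit(lock, 1, memory_order_acquire) == 0;
}

/* Keep retrying ("spinning") until an attempt succeeds. This is the
   loop that burns CPU while the lock is contended. */
static void spin_acquire(spin_word *lock)
{
    while (!spin_try_acquire(lock)) {
        /* Wait with plain loads until the lock looks free, then
           retry the exchange. */
        while (atomic_load_explicit(lock, memory_order_relaxed) != 0)
            ;
    }
}

static void spin_release(spin_word *lock)
{
    assert(atomic_load_explicit(lock, memory_order_relaxed) == 1);
    atomic_store_explicit(lock, 0, memory_order_release);
}
```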
It's this "retrying" that burns up the CPU cycles.
If the spinlock is guarding something infrequently used then it's a
pearler for speed compared to a CriticalSection or a Mutex - because
there are minimal retries. But if the resource is "popular" then the
retries can easily push CPU utilization to 100%
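One common way to cap that cost is to spin only a bounded number of times and then give up the time slice. This sketch uses POSIX sched_yield(); on Win32 the rough analogues would be SwitchToThread() or Sleep(0). The spin limit, the names, and the choice of strategy are all assumptions for illustration, not what the transport does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

#define SPINS_BEFORE_YIELD 1000  /* hypothetical tuning knob */

/* Acquire a 0/1 lock word; returns how many exchange attempts it took.
   After every SPINS_BEFORE_YIELD failed attempts the thread yields its
   time slice, so a busy lock does not pin a CPU at 100%. */
static unsigned long spin_acquire_yielding(atomic_int *lock)
{
    assert(lock != NULL);
    unsigned long attempts = 0;
    for (;;) {
        ++attempts;
        if (atomic_exchange_explicit(lock, 1, memory_order_acquire) == 0)
            return attempts;                 /* was free: now ours */
        if (attempts % SPINS_BEFORE_YIELD == 0)
            sched_yield();                   /* let other threads run */
    }
}

static void spin_release_word(atomic_int *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

An uncontended acquire still costs a single exchange, so the fast case stays fast; only a long wait pays for the yields.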
That's sort of what I thought it might be ...
It reminds me of the Burroughs 6000 machine series. They were large
computers designed by software engineers instead of hardware engineers.
They were multi-processor, multi-programming machines. They
implemented a very nice DBS system in the 70s and quickly learned they
had to worry about the deadly embrace (deadlock) that could occur when locking
records: machine 1 has record A locked and needs B, while machine 2 has
record B locked and needs A. They implemented a hardware locking instruction
that in one memory cycle would test and/or switch the memory location
to and from zero and one. This prevented conflicts in the locking
situation especially with competing processors and allowed them to
implement a resolution for the deadly embrace. It was a great system,
and it really taught me that the software engineers were the ones who
should spec out the hardware requirements.
Have a nice day ...