On Wednesday, September 5, 2012 2:10:44 AM UTC-4, Rod Pemberton wrote:
> > char example[8] = "zot";
> > Then the string isn't null terminated. To be precise [...]
> > the string here is zero-padded.
> Since you apparently think that is true, feel free to cite K&R, ANSI, or ISO
> C specifications.
You're the kind of copy-and-paste, and I wouldn't dare try to dethrone you. You're free to provide evidence to the contrary.
> > Thanks for playing, Rod.
> I'm not sure why you're thanking me. Every time you say something, I've
> clearly demonstrated what you say is false, despite your attempts to present
> your falsehoods as correct. I'm not an authority, and I keep finding
> provable errors with what you say.
I'm waiting to see the error here. At most, you pointed out an imprecise use of language on my part. You haven't yet "proved" that the larger statement (that C programmers-- especially embedded systems programmers-- are in no way required to use any of the standard library functions to work with strings.)
> That's ignoring the fact you never post anything Forth related, i.e., you're
> just a nuisance.
Correct. Ignore me. Do you need help setting up a kill file?
> On Wednesday, September 5, 2012 2:10:44 AM UTC-4, Rod Pemberton wrote:
>> JP
...
> > > char example[8] = "zot";
> > > Then the string isn't null terminated. To be precise [...]
> > > the string here is zero-padded.
> > Since you apparently think that is true, feel free to cite K&R,
> > ANSI, or ISO C specifications.
> You're the kind of copy-and-paste, and I wouldn't dare try to
> dethrone you. You're free to provide evidence to the contrary.
You could learn something. It's clear I didn't want to look it up. So, you
might learn you're correct. Why the fear of finding out? Are you saying
you can dish out insults to everyone, but not prove your claims?
> You haven't yet "proved" that the larger statement (that C
> programmers-- especially embedded systems programmers--
> are in no way required to use any of the standard library
> functions to work with strings.)
Well, that's because I agree with the larger statement. Did you see me take
issue with it?
The C language can be used on it's own. However, it's a bit more difficult
to do so without C's file I/O functions. They are about all you need from
the C libraries.
If you actually had a memory and remembered what you read, you'd know that
was my position since it has been stated here on multiple occasions.
On Wednesday, September 5, 2012 4:09:08 PM UTC-4, Rod Pemberton wrote:
> You could learn something. It's clear I didn't want to look it up. So, you
> might learn you're correct. Why the fear of finding out? Are you saying
> you can dish out insults to everyone, but not prove your claims?
At this point, I have no idea what you think I got incorrect. I know it's some incredibly subtle use or misuse of language that nobody but you and your /seriously/ pompous assness cares about. Is it the word "use"? Is is that when talking about standard library functions that use strings, I didn't include printf? Please Rod, let me know what incredibly tiny meaningless waste of time argument you're making here.
> > You haven't yet "proved" that the larger statement (that C
> > programmers-- especially embedded systems programmers--
> > are in no way required to use any of the standard library
> > functions to work with strings.)
> Well, that's because I agree with the larger statement. Did > you see me take issue with it?
Ah, I should have known. The real problem here is that I wrote a message and felt the need to knee-jerk out a tedious pendant response.
No, that's not fair. Please, do let me know what horrible misrepresentation of C that I've provided here. After all, there are children here.
> The C language can be used on it's own. However, it's a bit more difficult
> to do so without C's file I/O functions. They are about all you need from
> the C libraries.
Not in an embedded system. Most of the systems I've coded over the years (and many of the others that I've worked on from others) were "bare metal" systems without an operating system. The I/O library of functions is useless when your front panel is composed of LEDs, knobs, and switches. Tell me how the I/O library is at all useful in such a system.
More recently, some of these systems have had serial ports for debugging. I typically implement one of the C-based Forths and use that as my monitor program. But even when that's not the case, I don't find it terribly useful to use printf and friends.
> If you actually had a memory and remembered what you read, you'd know that
> was my position since it has been stated here on multiple occasions.
This assumes I read everything you've written. I don't. I know it may come as a shock, but there are probably others in this newsgroup who also don't rush to the computer when another message from Rod is posted in comp.lang.forth. I
Bernd Paysan <bernd.pay...@gmx.de> writes:
> By t2 < t1, all new events get scheduled into queue[1] or later, so the > total order of dependent events is maintained. The total order of > independent events isn't, but we don't need to care about.
Well, you may have to be careful about dependencies, but it's probably
doable. How many queue slots do you plan to have? I guess unlike OS
tasks, these gate operations won't schedule events far in the future, so
you can have small timeslices while still not needing a very large
queue.
> Yes, it could be done that way. The linear version still has ample > opportunities for tuning.
I suspect the binary heap version also can be tuned. I did finally
manage to look at it and there may be considerable overhead from the OO
stuff and a few other things. You'd know better than me about that
though. Is there a reasonable way to profile it?
>> around 180 connections ("netstat | wc").
> Active, TCP, not waiting? Local AF_UNIX socket connections don't count.
Oh, good point, there are only around 27 connections.
>> I don't know how many is typical for something like BitTorrent.
> Usually in the one-digit order.
Numbers I've seen are higher, but I'm not very knowledgeable about this.
> The algorithm to spead data quickly is "fastest peer
> first". ... Unlike in economy, the "trickle down effect" works in P2P
> networks, it's because there, everybody shares all he has ;-).
"fastest peer" in that case should mean fastest uploader rather than
downloader, I guess.
Paul Rubin wrote:
> Bernd Paysan <bernd.pay...@gmx.de> writes:
>> By t2 < t1, all new events get scheduled into queue[1] or later, so
>> the
>> total order of dependent events is maintained. The total order of
>> independent events isn't, but we don't need to care about.
> Well, you may have to be careful about dependencies, but it's probably
> doable. How many queue slots do you plan to have? I guess unlike OS
> tasks, these gate operations won't schedule events far in the future,
> so you can have small timeslices while still not needing a very large
> queue.
The worst gate delay you get is the "delay element", and you probably have to calculate the queue size at netlist read (by the next power of two). Without the delay element (which is difficult to calculate, as it is deliberately constructed to be slow), 16 slots should be more than enough.
>> Yes, it could be done that way. The linear version still has ample
>> opportunities for tuning.
> I suspect the binary heap version also can be tuned. I did finally
> manage to look at it and there may be considerable overhead from the
> OO stuff and a few other things.
heap1.fs is without OO stuff.
> You'd know better than me about that
> though. Is there a reasonable way to profile it?
ATM, gforth-prof doesn't build, but the net2o.fs code has a nice timing-
based profiler, which is intented to measure how much time different activities in the same program consume - search for "timing measurements".
>>> around 180 connections ("netstat | wc").
>> Active, TCP, not waiting? Local AF_UNIX socket connections don't
>> count.
> Oh, good point, there are only around 27 connections.
And most of them are probably just waiting, and therefore wouldn't have a timeout queued.
>>> I don't know how many is typical for something like BitTorrent.
>> Usually in the one-digit order.
> Numbers I've seen are higher, but I'm not very knowledgeable about
> this.
>> The algorithm to spead data quickly is "fastest peer
>> first". ... Unlike in economy, the "trickle down effect" works in P2P
>> networks, it's because there, everybody shares all he has ;-).
> "fastest peer" in that case should mean fastest uploader rather than
> downloader, I guess.
I would mix it, something like download_time + 2*upload_time, lowest number gets highest priority. As you usually upload and download at the same time (leecher mode), you can measure both rates of your peers - how fast do they upload to you, and how fast can you upload to them.
-- Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/
m...@iae.nl (Marcel Hendrix) writes Re: Function Points
[..]
> Here are the iForth results [..]
> The trend is the same.
[..]
I forgot to the test heap1.fs (non-OOP)
binary heap linear "heap" heap1
-----------------------------------------+---------------------------------
gForth iForth | gForth iForth iForth
-----------------------------------------+---------------------------------
10 inserts 17,666ns 14 us | 9,287ns 13 us 11 us
10 deletes 15,941ns 9 us | 6,960ns 6 us 7 us
256 inserts 164,048ns 55 us | 452,609ns 142 us 33 us
256 deletes 476,276ns 133 us | 251,134ns 99 us 83 us
4096 inserts 2,609,171ns 743 us | 93,231,767ns 15,493 us 368 us
4096 deletes 12,061,005ns 2,886 us | 55,881,857ns 7,712 us 1,895 us
Bernd Paysan <bernd.pay...@gmx.de> writes:
> heap1.fs is without OO stuff.
OK. Is that the one you benchmarked? It still looks pretty painful,
e.g. you have
index cell /
in the bubble-up loop on every insert, which must be painful in an
interpreter. I don't know if iforth optimizes this stuff. There's
also a word hsize@ that is never used.
In the linear version, why is the list kept sorted? The simplest
implementation would be an unsorted list, that you scan to find the
smallest element. Insertion is then trivial (just append the new
element).
>> Oh, good point, there are only around 27 connections.
> And most of them are probably just waiting, and therefore wouldn't have > a timeout queued.
Hm, if TCP keepalive is active then there's a wakeup every 20 minutes or
so on each connection, I think.
Paul Rubin wrote:
> Bernd Paysan <bernd.pay...@gmx.de> writes:
>> heap1.fs is without OO stuff.
> OK. Is that the one you benchmarked?
I've posted benchmarks of that here, yes.
> It still looks pretty painful, e.g. you have
> index cell /
> in the bubble-up loop on every insert, which must be painful in an
> interpreter.
Unfortunately, there's no standard word that does the opposite of cells.
However,
index 2/ cell negate and
should to it, too.
> I don't know if iforth optimizes this stuff. There's
> also a word hsize@ that is never used.
It's used for the tests.
> In the linear version, why is the list kept sorted? The simplest
> implementation would be an unsorted list, that you scan to find the
> smallest element. Insertion is then trivial (just append the new
> element).
Yes, but that adds a factor of 2, because you always have to scan the full list. With this insert function, you have to scan half of the list (on average), and scanning is way more expensive than the MOVE to shift the rest. As I said, the linear version has ample opportunities to tune it, and the point to write it is only to show that even though, it is almost as good for up to 256 elements, and it is much more likely to be correct.
>>> Oh, good point, there are only around 27 connections.
>> And most of them are probably just waiting, and therefore wouldn't
>> have a timeout queued.
> Hm, if TCP keepalive is active then there's a wakeup every 20 minutes
> or so on each connection, I think.
Which is not really a performance-critical issue...
-- Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/
Bernd Paysan <bernd.pay...@gmx.de> writes:
> Unfortunately, there's no standard word that does the opposite of cells.
> However,
> index 2/ cell negate and
> should to it, too.
Best would be to generate a suitable inlined shift at compile time, I
guess.
> As I said, the linear version has ample opportunities to tune
> it... it is almost as good for up to 256 elements, and it is much more
> likely to be correct.
The linear version can be tuned (as you mentioned) with binary search to
find the insert point, plus O(1) deletion, plus choosing the move
direction to average n/4 moves instead of n/2. The log n version can
also be tuned a lot, though. I'm not sure what we have really learned.
A tuned-vs-tuned version would be more interesting, as would be
untuned-vs-untuned (i.e. both versions use OOF with generic ordering)
but neither seems worth the effort.
>> Hm, if TCP keepalive is active then there's a wakeup every 20 minutes
>> or so on each connection, I think.
> Which is not really a performance-critical issue...
It may be more of an issue when there's a large number of connections.
Bernd Paysan <bernd.pay...@gmx.de> writes:
>Paul Rubin wrote:
>> It still looks pretty painful, e.g. you have
>> index cell /
>> in the bubble-up loop on every insert, which must be painful in an
>> interpreter.
>Unfortunately, there's no standard word that does the opposite of cells.
Yes, strange that nobody has proposed CELL/ FLOAT/ SFLOAT/ DFLOAT/
yet. Even stranger that we have not implemented them in Gforth yet.
Concerning optimizations:
1) The IFs in BUBBLE-DOWN are hard to predict and could be replaced
with arithmetic, which might be faster (especially on native-code
systems).
2) It seems to me that the hysteresis of the HRESIZE words is to
tight. In the worst case you have one RESIZE per HINSERT or HDELETE.
>Yes, but that adds a factor of 2, because you always have to scan the >full list. With this insert function, you have to scan half of the list >(on average), and scanning is way more expensive than the MOVE to shift >the rest.
If the list is not in the cache, memory costs dominate, and it's not
clear that scanning N elements is then more expensive than moving N/2
elements.
Anton Ertl wrote:
>>Unfortunately, there's no standard word that does the opposite of
>>cells.
> Yes, strange that nobody has proposed CELL/ FLOAT/ SFLOAT/ DFLOAT/
> yet. Even stranger that we have not implemented them in Gforth yet.
Yes, the latter is really strange. I have CELL/ in bigForth, but not the float versions. Looking at Gforth's sources, CELL / is used several times, 1 FLOATS / only once.
> Concerning optimizations:
> 1) The IFs in BUBBLE-DOWN are hard to predict and could be replaced
> with arithmetic, which might be faster (especially on native-code
> systems).
A native code system could translate an IF DROP somevalue THEN to a movcc. VFX doesn't.
> 2) It seems to me that the hysteresis of the HRESIZE words is to
> tight. In the worst case you have one RESIZE per HINSERT or HDELETE.
The hysteresis is factor 2. Except for empty queues, not many RESIZEs needed.
>>Yes, but that adds a factor of 2, because you always have to scan the
>>full list. With this insert function, you have to scan half of the
>>list (on average), and scanning is way more expensive than the MOVE to
>>shift the rest.
> If the list is not in the cache, memory costs dominate, and it's not
> clear that scanning N elements is then more expensive than moving N/2
> elements.
Move operations are highly optimized.
-- Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/
Bernd Paysan <bernd.pay...@gmx.de> writes:
>Anton Ertl wrote:
>> 1) The IFs in BUBBLE-DOWN are hard to predict and could be replaced
>> with arithmetic, which might be faster (especially on native-code
>> systems).
>A native code system could translate an IF DROP somevalue THEN to a >movcc. VFX doesn't.
I think the Forthier approach is to have a selection word, equivalent
to:
: select ( x1 x2 f -- x )
if drop else nip then ;
which would be implemented with movcc.
>> 2) It seems to me that the hysteresis of the HRESIZE words is to
>> tight. In the worst case you have one RESIZE per HINSERT or HDELETE.
>The hysteresis is factor 2. Except for empty queues, not many RESIZEs >needed.
It looks to me that you check for the same size before and after doubling.
E.g., if you check for x before doubling, after doubling hmaxsize is
x*2, and you check for hmaxsize/2, which is x. So if the size wobbles
around x, you get very many RESIZEs.
>> If the list is not in the cache, memory costs dominate, and it's not
>> clear that scanning N elements is then more expensive than moving N/2
>> elements.
>Move operations are highly optimized.
If the stuff is not in the cache, memory operations dominate, and it
does not matter whether in-cache moves are highly optimized; they get
swamped by the memory operations.
Anton Ertl wrote:
>>A native code system could translate an IF DROP somevalue THEN to a
>>movcc. VFX doesn't.
> I think the Forthier approach is to have a selection word, equivalent
> to:
> : select ( x1 x2 f -- x )
> if drop else nip then ;
> which would be implemented with movcc.
We could add one into Gforth, let GCC do the movcc magic, and check if it is worth the effort... If f is a well formed flag, you can do
: mux ( x1 x2 flag -- x ) tuck invert and >r and r> or ;
and have highly predictable code.
>>> 2) It seems to me that the hysteresis of the HRESIZE words is to
>>> tight. In the worst case you have one RESIZE per HINSERT or
>>> HDELETE.
>>The hysteresis is factor 2. Except for empty queues, not many RESIZEs
>>needed.
> It looks to me that you check for the same size before and after
> doubling.
> E.g., if you check for x before doubling, after doubling hmaxsize is
> x*2, and you check for hmaxsize/2, which is x. So if the size wobbles
> around x, you get very many RESIZEs.
Yes, probably. The benchmarks would not exhibit that behavior.
>>Move operations are highly optimized.
> If the stuff is not in the cache, memory operations dominate, and it
> does not matter whether in-cache moves are highly optimized; they get
> swamped by the memory operations.
Move operations are even highly optimized for out-of-cache, by having the apropriate prefetchers. You simply get all available bandwidth with a move, and you can't predict if you get the same bandwidth with a scan.
Well, maybe you get it, because it's an easy enough to predict pattern.
-- Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/
Gerry Jackson <ge...@jackson9000.fsnet.co.uk> writes:
>On 07/09/2012 20:34, Bernd Paysan wrote:
>> Anton Ertl wrote:
>>>> A native code system could translate an IF DROP somevalue THEN to a
>>>> movcc. VFX doesn't.
>>> I think the Forthier approach is to have a selection word, equivalent
>>> to:
>>> : select ( x1 x2 f -- x )
>>> if drop else nip then ;
>>> which would be implemented with movcc.
>> We could add one into Gforth, let GCC do the movcc magic, and check if
>> it is worth the effort...
The effort is small, the benefit will be hard to measure. How do you
want to check this? This would be a natural factor of MIN, MAX, UMIN
and UMAX, though.
>> If f is a well formed flag, you can do
>> : mux ( x1 x2 flag -- x ) tuck invert and >r and r> or ;
>> and have highly predictable code.
Anything wrong with the name "SELECT"? Is there some naming conflict?
Or is this just the "name NIH" syndrom that's even more rampant in
Forth than in other areas.
Another alternative (one word less:-):
: select ( x1 x2 f -- x )
>r tuck xor r> and xor ;
>or even
>: mux ( x1 x2 flag -- x ) 1+ roll drop ;
Cute, but probably slower than the above on many systems, particularly
on the faster ones.
Bernd Paysan <bernd.pay...@gmx.de> writes:
>Anton Ertl wrote:
>> If the stuff is not in the cache, memory operations dominate, and it
>> does not matter whether in-cache moves are highly optimized; they get
>> swamped by the memory operations.
>Move operations are even highly optimized for out-of-cache, by having >the apropriate prefetchers.
I never heard about separate prefetchers for move operations. The
prefetchers I have read about would benefit a scan at least as well as
a move. Someone on comp.arch described a direction change (from read
to write or vice versa) for DDR2 and DDR3 SDRAM as "train wreck"; I
would expect fewer direction changes with the scanning approach.
> You simply get all available bandwidth with >a move, and you can't predict if you get the same bandwidth with a scan.
Due to the train wreck thing, I would not be so sure about getting all
the bandwidth when using moves, while scanning seems to be ideal for
hardware prefetchers. I also remember funny results when doing
multi-processor microbenchmarks with moves.
Let's just measure it:
[~/gforth:78229] time vfxlin "20000008 allocate throw constant pad : scanning ( n addr ) begin 2dup @ < while cell+ repeat 2drop ; : bench 1000 0 do -1 pad scanning loop ; pad 20000000 erase -2 pad 20000000 + ! bench bye" >/dev/null
real 0m4.209s
user 0m4.132s
sys 0m0.016s
[~/gforth:78230] time vfxlin "20000008 allocate throw constant pad : bench 1000 0 do pad cell+ pad 10000000 move loop ; pad 20000000 erase bench bye" >/dev/null
real 0m3.687s
user 0m3.600s
sys 0m0.024s
Moving x/2 memory is indeed a little (factor 1.15) faster than scanning.
Anton Ertl wrote:
> Anything wrong with the name "SELECT"? Is there some naming conflict?
No. The name for a multiplexer is MUX. As in AND, OR, XOR, which are all named by their logic equivalent. Though Forth chose to have an inverter called INVERT instead of just INV.
> Or is this just the "name NIH" syndrom
The name of that thing has long been invented. It's MUX. Please don't try to reinvent a different name for that thing.
> that's even more rampant in Forth than in other areas.
I've never seen a multiplexer named SELECT - it usually has an input line called S or select or so, but itself it is called MUX. SELECT is used for SQL statements to query a data base.
> Another alternative (one word less:-):
> : select ( x1 x2 f -- x )
> >r tuck xor r> and xor ;
Yes, but less obvious.
>>or even
>>: mux ( x1 x2 flag -- x ) 1+ roll drop ;
> Cute, but probably slower than the above on many systems, particularly
> on the faster ones.
On vfx, this one is 3 times slower.
-- Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/