On 05/11/2022 20:15, Bart wrote:
> On 05/11/2022 17:54, James Harris wrote:
>> On 05/11/2022 16:51, Bart wrote:
>
>>> Your figures are too confused to comment on meaningfully. For a
>>> start, I'd like to know the reason for that 25 times speedup! What
>>> was it spending 96% of its time doing in those earlier versions?
>>
>> In a word, IO. It was reading and writing one character at a time -
>> which was enough to start with. You may remember we discussed this a
>> few years ago:
>>
>> https://groups.google.com/g/comp.lang.misc/c/nABLfzd08dA/m/WImDDyDUCAAJ
>>
>>>
>>> I don't believe there's any real buffering involved in 18KB input.
>>
>> It was definitely buffering. With the new compiler if I change the
>> buffer size to 1 then it takes as long as it did before.
>
> > bufsize, approx time in ms
> > 1, 650
> > 2, 340
>
> I assumed that such a small file would be loaded in one go by the OS
> anyway. Then any calls you do, even reading a character at a time, would
> read from the OS's in-memory buffer.
>
> So to take 0.65 seconds to read 18KB seems puzzling. 28KB per second?
> That's roughly the transfer rate from a floppy disk! Yet this is memory
> to memory on a modern PC with GHz clock rates. Something funny is going on.
Don't forget all the context switching to and from kernel mode - for
both read (18k) and write (100k).
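As a rough sanity check, assume a syscall costs on the order of a few
microseconds: 118,000 one-byte reads and writes at, say, 5 µs each is
c. 590 ms - the same order of magnitude as the 650 ms above.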
>
> A loop like this:
>
>     to 18'000 do
>         c:=fgetc(f)
>     od
>
> in /interpreted/ code (calling the C function) is too fast to measure.
Two points:
1. That only reads - c. 18k characters. To be a fair test you would also
have to write the c. 100k of output - see the sketch below.
2. I'd expect fgetc to be buffered anyway.
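As a minimal sketch of the fair version (file names invented, and the
1:6 expansion just approximates 18k in, 100k out):

    #include <stdio.h>
    #include <time.h>

    int main(void) {
        FILE *in  = fopen("infile", "rb");   /* c. 18k source  */
        FILE *out = fopen("outfile", "wb");  /* c. 100k output */
        if (!in || !out) return 1;

        clock_t t0 = clock();
        int c;
        while ((c = fgetc(in)) != EOF)       /* buffered reads  */
            for (int i = 0; i < 6; i++)
                fputc(c, out);               /* buffered writes */

        fprintf(stderr, "%.0f ms\n",
                (clock() - t0) * 1000.0 / CLOCKS_PER_SEC);
        fclose(in);
        fclose(out);
    }

With stdio's default buffering that should take only a few ms; make
both streams unbuffered with setvbuf(f, NULL, _IONBF, 0) and the
per-character syscalls - and the slowdown - should come back.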
You can see the difference if you run under strace, which shows the
individual syscalls. Without buffering there should be something like
118,000 reads/writes. With a buffer size of 512 there should be only
about 230 such syscalls - about 1/500th of the number. I suspect kernel
calls and returns are where a lot of the time goes if there's no
buffering.
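For example (program and file names invented):

    strace -c -e trace=read,write ./mycompiler test.src

The -c option prints a per-syscall summary with call counts, so the
totals with and without buffering can be compared at a glance.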
Still not convinced? Take a look at dd. When run on a file with 112,000
bytes:
    time dd if=infile of=/dev/null bs=1
    time dd if=infile of=/dev/null bs=512

The first takes 370 ms; the second just 5 ms. QED, I think. :)
Perhaps more interesting is where an individual compiler spends its
time: how long in lexing, statement parsing, expression parsing, IR
generation, IR alterations, optimisation, code gen, etc. At some point I
may add code to gather such info.
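A rough sketch of the idea (the phase functions here are stand-ins for
the real ones):

    #include <stdio.h>
    #include <time.h>

    static double now_ms(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
    }

    static void lex(void)     { /* ... */ }
    static void parse(void)   { /* ... */ }
    static void codegen(void) { /* ... */ }

    int main(void) {
        struct { const char *name; void (*fn)(void); } phase[] = {
            { "lexing",   lex     },
            { "parsing",  parse   },
            { "code gen", codegen },
        };
        for (int i = 0; i < 3; i++) {
            double t = now_ms();
            phase[i].fn();
            printf("%-8s %8.2f ms\n", phase[i].name, now_ms() - t);
        }
    }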
>
> But reading a 7.8MB input file with such a loop, a character at a time
> in scripting code, takes 0.47 seconds.
>
>>> If I tell it to generate ASM, it drops to 260K lines per second
>>> (taking 2.8 seconds to generate 2.2M lines). (The input file is 9MB,
>>> and the generated ASM is 87MB; the EXE was 8MB.)
>>
>> Based on the above I make mine 32k lines per second.
>
> This is the sort of speed of compilers like gcc. Is yours still written
> in Python? I thought you had it self-hosted.
The compiler is self-hosted. The first one, which I now call cda, was
written in asm. The others, cdb to cdd, are written in my language and
compiled via asm. No other language is used in compilation.
What you may be remembering is that Python is used for running test scripts.
>
>>> These were all done on a slower machine. My current one (where I can
>>> get 1Mlps) uses an 'AMD Ryzen 3 2650U 2.6GHz', which while faster, I
>>> think is still low-end.
>>
>> Surprisingly, that processor doesn't come up at
>>
>> https://www.cpubenchmark.net/singleCompare.php
>>
>> From /proc/cpuinfo I have the following.
>>
>>     Intel(R) Pentium(R) Silver J5005 CPU @ 1.50GHz
>
>
> Probably because it's 3650U not 2650U. The rating on that site shows it
> at 3900 compared with 3050 of your device.
I see from your other post that it's the 3250U.

As for comparing, unless your compiler is multithreaded it's probably
best to use the CPU's single-thread rating. Mine comes in at 1206,
yours at 1812 - about 50% faster (1812 / 1206 ≈ 1.5).
--
James Harris