
ecosystem


muta...@gmail.com
Nov 29, 2022, 3:05:58 AM
Hi Waldek (mainly).

You made a comment that writing an assembler is
not difficult. I'm wondering what IS difficult for a
basic OS and tools - not necessarily exactly MSDOS
which needs to cope with segmentation, but
something that looks like MSDOS, regardless of
whether it runs on ARM or S/370.

Here is what I have for MSDOS for the 8086:

PDOS/86 (OS): about 30,000 lines
PDPCLIB (C library): About 17,000 lines
SubC (C compiler): About 5,500 lines
as86 (assembler): About 13,000 lines
pdar (archiver): About 1000 lines
ld86 (linker): About 3000 lines
pdmake (make): About 2000 lines

I am not very good with algorithms, nor do I know
much of the theory, so at the moment only the first
two items on that list (the OS and the C library) are
within my capability.

Note that I am running up against the 640k limit
with PDOS/86. The OS and command processor
are taking up 300k or something, and when I
try to run pdmake (which opens another command
processor before running another program), I run
out of memory.

I refuse to change the fundamental design to try to
alleviate the memory problems, and instead wish to
run the exact (*) same toolchain in either PM16 or
PM32 with the D bit set to indicate 16-bit.

If I go the PM32 route I am wondering whether I can
make fairly small (LOC) changes to PDOS/386 to make
it accommodate 16-bit (only) programs - the specific
MSDOS tools that have been linked with PDPCLIB - I
don't care about other MSDOS programs that don't
follow "the rules" (*).

(*) The rules aren't set in stone yet. PDPCLIB still
hardcodes 4-bit shifts which won't work on either
a Turbo 186 (8-bit shifts) or the above PM16/32
scenario, and it is only when the rules exist, and
PDPCLIB follows the rules, that I wish to throw
64 MB (to start with) at my MSDOS executables.
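
Just to illustrate the kind of rule I mean (a rough sketch with a
made-up SEG_SHIFT constant, not anything PDPCLIB actually defines
today): the segment shift becomes a build-time parameter instead of
a hardcoded 4.

    /* Hypothetical sketch: compute a linear address from segment:offset
       without hardcoding the 8086's 4-bit segment shift.  SEG_SHIFT is
       an assumed configuration constant (4 on a real 8086, 8 on a
       Turbo 186 style machine). */
    #include <stdio.h>

    #ifndef SEG_SHIFT
    #define SEG_SHIFT 4          /* default to classic 8086 behaviour */
    #endif

    static unsigned long linear_addr(unsigned int seg, unsigned int off)
    {
        return ((unsigned long)seg << SEG_SHIFT) + off;
    }

    int main(void)
    {
        /* 0x1234:0x0010 is 0x12350 on an 8086, 0x123410 with an 8-bit shift */
        printf("%lx\n", linear_addr(0x1234u, 0x0010u));
        return 0;
    }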

Any comment?

Note that the LOC are mostly the same for PDOS/386.
Only the assembler and linker change for those.

pdas - 6000 lines
pdld - 2000 lines

Thanks. Paul.

Joe Monk
Nov 29, 2022, 5:49:31 AM

> Note that I am running up against the 640k limit
> with PDOS/86. The OS and command processor
> are taking up 300k or something, and when I
> try to run pdmake (which opens another command
> processor before running another program), I run
> out of memory.
>
> I refuse to change the fundamental design to try to
> alleviate the memory problems, and instead wish to
> run the exact (*) same toolchain in either PM16 or
> PM32 with the D bit set to indicate 16-bit.
>

Back in the day, when I was doing application work before I got into systems, I wrote code that did accounting for pensions.

We had one piece of mainline code, and then we had subroutines that did certain parts of the work. We dynamically built the code each time we started a run. In addition, we had code that did file IO, but one of the features of the system was that account numbers could be either 12 or 20 bytes with no change in the code.

The mainline had hooks (our term), but what they really were was a common API - a well-defined interface for passing information among code. Since we were on memory-limited systems (most of the time 256K or less), we would swap in and out portions of code that didn't need to be in memory.

I'd be willing to bet your code is probably suffering from the same issues... You have everything in memory, versus a "resident" portion of the OS that provides services, and swaps in and out portions of code that don't need to be there all the time - passing a buffer among subroutines for input and output as those subroutines are swapped in and out.
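
Roughly like this (just a sketch with made-up names, not your code
or ours): a small resident loop keeps only one "phase" in memory at
a time and hands each phase the shared buffer.

    /* Hedged sketch of an overlay-style dispatcher (hypothetical names).
       A small resident loop keeps only one "phase" of the program in
       memory at a time; each phase reads and writes the shared buffer. */
    #include <stdio.h>
    #include <string.h>

    #define BUF_SIZE 4096

    typedef int (*phase_fn)(char *buf, size_t len);

    /* In a real overlay system these would be loaded from disk into the
       same memory region; here they are ordinary functions for brevity. */
    static int phase_read(char *buf, size_t len)  { strncpy(buf, "input", len); return 0; }
    static int phase_calc(char *buf, size_t len)  { (void)len; buf[0] = 'X'; return 0; }
    static int phase_write(char *buf, size_t len) { (void)len; puts(buf); return 0; }

    int main(void)
    {
        static char buffer[BUF_SIZE];          /* shared between phases */
        phase_fn phases[] = { phase_read, phase_calc, phase_write };
        size_t i;

        for (i = 0; i < sizeof phases / sizeof phases[0]; i++) {
            /* here a real system would swap the next overlay in from disk */
            if (phases[i](buffer, sizeof buffer) != 0)
                return 1;
        }
        return 0;
    }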

Joe


muta...@gmail.com
Nov 29, 2022, 11:51:33 PM
On Tuesday, November 29, 2022 at 6:49:31 PM UTC+8, Joe Monk wrote:

> I'd be willing to bet your code is probably suffering from the same issues...
> You have everything in memory, versus a "resident" portion of the OS that
> provides services,

Absolutely. It's a very simple design. And I'm happy
with that.

If someone wants to port PDOS to a system with
minimal memory, which apparently includes the
IBM PC XT - they are free to modify it to do swapping
etc.

I just don't want to do that in my copy of the code.
Anyone who wants to run my copy of the code is
sort of required to provide 2 MB realistically.

BFN. Paul.

anti...@math.uni.wroc.pl
Nov 30, 2022, 11:46:25 PM
muta...@gmail.com <muta...@gmail.com> wrote:
> Hi Waldek (mainly).
>
> You made a comment that writing an assembler is
> not difficult. I'm wondering what IS difficult for a
> basic OS and tools - not necessarily exactly MSDOS
> which needs to cope with segmentation, but
> something that looks like MSDOS, regardless of
> whether it runs on ARM or S/370.

Well, in a real OS the big thing is device drivers. If you
look at a recent Linux kernel source tree you will see that 964MB
of it is driver source code. The whole source tree is 1465MB,
so drivers are more than 65%. And the rest includes build
machinery, utilities and documentation which support everything,
so also drivers. And there is a 146MB arch subdirectory, which
contains support for various architectures. Much of the architecture-
dependent code is "driver-like": its task is to handle device-like
things such as buses, timers, interrupt controllers, etc.

In compilers the hard part is optimization. When I compare gcc-4.8
to gcc-12.0 it seems that code produced by gcc-12.0 is probably
about 10% more efficient than code from gcc-4.8. But the C compiler
in gcc-12.0 is twice as large as the C compiler in gcc-4.8. And
looking back, gcc-4.8 is much bigger than gcc-1.42 (IIRC the C
compiler in gcc-1.42 was of the order of one megabyte in size).
gcc-12.0 produces more efficient code than gcc-1.42, but
probably no more than 2 times more efficient. Certainly,
code from gcc-12.0 is not 26 times more efficient than code
from gcc-1.42 (which would be the case if the speed of object
code were simply proportional to compiler size). And in
turn gcc-1.42 generates more efficient code than simpler
compilers.

Both of the above are quite different from MSDOS, so let
me mention another aspect. "Bug compatibility" with a
different system is hard. Namely, the original developers
(in the case of MSDOS, Microsoft) code in a way that is
convenient to them, and say that the "product is as is".
If you want to compete with MSDOS you need to carefully establish
what MSDOS is doing and then find a way to implement exactly the
same behaviour in your product. This was learned the hard
way by the Wine folks. The original idea was: Linux system calls
provide equivalent functionality to Windows system calls,
so let us create a loader which can load a PE executable and
provide a tiny translation layer from Windows system calls
to Linux calls. The loader part went smoothly, but the Wine
folks quickly discovered that there were no "well written"
Windows programs: even "trivial" programs depended on
various tiny details of the Windows interface. Do it differently
and the program will not work.

For MSDOS there are some specific troubles:
- interfaces were specified in assembler
- the OS had to run acceptably on small and, by modern standards, slow
  machines

Looking at this, I think that there were a lot of companies
which could create something with comparable functionality
to MSDOS, so in this sense replicating MSDOS was not hard.
If you want good compatibility, and efficiency, then things
get harder, but IIUC there were several companies that could
do this and some that actually did. But there is also the
business aspect: Microsoft from the start used the "tax" method.
Namely, a manufacturer had to pay a moderate fee for each PC they
sold. So even if you got an alternative to DOS you effectively
paid for DOS. And since Microsoft kept prices moderate, there
was price pressure on competitors: a competing product had
to be significantly better than MSDOS to justify its price.
And when a competitor (DR DOS) was doing well, Microsoft put
extra code in Windows to detect that Windows was not
running on top of MSDOS and produce an error message.

Of course, there is also the issue of the size of the whole enterprise.
An MSDOS-class system is approachable by a single person, but not
in a weekend (and probably not in a month). Most people
lack sufficient motivation to spend the needed effort given that
a quite good alternative (FreeDOS) is available with sources.

> Here is what I have for MSDOS for the 8086:
>
> PDOS/86 (OS): about 30,000 lines
> PDPCLIB (C library): About 17,000 lines
> SubC (C compiler): About 5,500 lines
> as86 (assembler): About 13,000 lines
> pdar (archiver): About 1000 lines
> ld86 (linker): About 3000 lines
> pdmake (make): About 2000 lines

The line counts look a bit high to me, given the limited functionality
of what you have. Especially the line counts for PDOS and as86
look high.

I have the Minix sources; they have 6192 lines in header files, which
include the C library headers. IIUC some include files are generated,
so it is not clear if they should be counted as true sources.
There are 7651 lines for bootloaders, 331 lines in mandatory
system configuration files, 38282 lines for the kernel proper,
19868 lines for networking support, 47361 for system libraries
(including the C library). There are also 18345 lines of test
code (I am not sure if you include test code in your line
counts).

Note that Minix includes its own drivers for popular devices
and the source code is both for 8086 and 386 (there are two
versions of assembler code, C code is common).

Originally Minix was written during 3 years of part-time
work by Andrew Tanenbaum. He had a full-time job at a university
and simultaneously wrote a book about operating systems,
using Minix as the example. The code I have is for an expanded
version compared to the original, but probably not more than
twice as large as the original.

Tanenbaum took advantage of the fact that his university had developed
a compiler and related tools (linker, assembler) and used those
for Minix. He also used available Unix utilities. I am
not sure if the command processor (shell) was written specially
for Minix, but it was not included in the counts above.

Linux-0.01 is about 11000 lines of code; this includes drivers
for the "standard" hard disc, keyboard and serial port (it looks
like there is no floppy driver). There is paging and
multitasking. There is a filesystem (Minix compatible). There
are no user-level commands or compilers; one needs to get them
separately. IIUC this is essentially the original version as written
by Linus Torvalds in 6 months.

Wirth and Gutknecht in the 1986-1988 period created the Oberon system.
That included device drivers, an Oberon compiler (Oberon is both the
name of the language used for the implementation and the name of the
whole system), a file system and a GUI. Many things in Oberon look
primitive compared to modern systems. But it could probably
do more than DOS. There was some cost: Oberon requires a
32-bit machine with a graphic (bitmapped) display. Originally
Oberon was written for a processor from National Semiconductor
which is essentially forgotten now. However, the code was ported
to the 386 (IIUC it was not much more than retargeting the compiler)
and there is a more modern version using a custom RISC processor.

> I am not very good with algorithms,

Do you really mean "I am not very good with programming"?
When programming you deal with algorithms all the time.
Frankly, it seems that you spent quite a lot of time to
get to the point where you are now. And IIUC a substantial
part of your codebase came from other folks. The examples above
show that other folks in 2-3 years got systems that
look more advanced than yours.

> nor do I know
> much of the theory, so at the moment, only numbers
> 1 and 2 are within my capability.
>
> Note that I am running up against the 640k limit
> with PDOS/86. The OS and command processor
> are taking up 300k or something, and when I
> try to run pdmake (which opens another command
> processor before running another program), I run
> out of memory.

Real MSDOS kept COMMAND.COM on disk and loaded it only
when needed. In memory there was only a small resident
stub (and of course the kernel).

BTW: Do you mean 300k when compiled by Watcom or when
compiled by SubC? I would expect the Watcom result to be
significantly smaller than the result from SubC.

> I refuse to change the fundamental design to try to
> alleviate the memory problems, and instead wish to
> run the exact (*) same toolchain in either PM16 or
> PM32 with the D bit set to indicate 16-bit.
>
> If I go the PM32 route I am wondering whether I can
> make fairly small (LOC) changes to PDOS/386 to make
> it accommodate 16-bit (only) programs - the specific
> MSDOS tools that have been linked with PDPCLIB - I
> don't care about other MSDOS programs that don't
> follow "the rules" (*).
>
> (*) The rules aren't set in stone yet. PDPCLIB still
> hardcodes 4-bit shifts which won't work on either
> a Turbo 186 (8-bit shifts) or the above PM16/32
> scenario, and it is only when the rules exist, and
> PDPCLIB follows the rules, that I wish to throw
> 64 MB (to start with) at my MSDOS executables.
>
> Any comment?

Well, it seems that you want to have troubles and
you have them. When writing for small and slow machines
you either need a good optimizing compiler or hand-
optimized assembly, at least for critical parts. When
size is the main concern there are ways to trade some
speed to decrease code size. In particular, using
interpreted byte code one can reduce code size 2-3
times compared to good assembly. With an appropriate mix
of a small amount of fast code (hand-written assembly or
output from an optimizing compiler) and byte code one
can get a small and relatively fast program. Both
segmentation and MSDOS "compatibility" are liabilities;
they bring unnecessary complications.
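
As a rough illustration (my own sketch, with made-up opcodes, not
code from any real system), the core of a byte-code interpreter is
just a dispatch loop; the size saving comes from each one-byte
opcode standing in for a longer native instruction sequence.

    /* Minimal byte-code interpreter sketch (assumed opcodes). */
    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    static void run(const unsigned char *code)
    {
        int stack[32];
        int sp = 0;

        for (;;) {
            switch (*code++) {
            case OP_PUSH:  stack[sp++] = *code++;            break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
            case OP_PRINT: printf("%d\n", stack[sp - 1]);    break;
            case OP_HALT:  return;
            }
        }
    }

    int main(void)
    {
        /* computes and prints 2 + 3 */
        static const unsigned char prog[] = {
            OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT
        };
        run(prog);
        return 0;
    }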

> Note that the LOC are mostly the same for PDOS/386.
> Only the assembler and linker change for those.
>
> pdas - 6000 lines
> pdld - 2000 lines
>
> Thanks. Paul.

--
Waldek Hebisch

muta...@gmail.com
Dec 1, 2022, 4:22:04 AM
On Thursday, December 1, 2022 at 12:46:25 PM UTC+8, anti...@math.uni.wroc.pl wrote:

> Well, is real OS the big thing is device drivers. If you

Ok.

> In compilers hard part is optimization. When I compare gcc-4.8

Ok.

> Both of the above are quite different than MSDOS, so let
> me mention another aspect. "Bug compatibility" with
> different system is hard.

Ok, thanks.

I'm happy to bypass all of those things, basically.

> For MSDOS there are some specific troubles:
> - interfaces were specified in assembler

Another problem I am happy to bypass by creating
an API.

> - OS hand to run acceptably on small and by modern standard slow
> machines

Slow should not be a problem for an OS. Applications
should be bottlenecked in the application, not OS calls.

And I'm also happy to bypass the "small" issue.

> do this and some that actually did. But there is also
> business aspect: Microsoft from the start used "tax" method.
> Namely, manufactur had to pay moderate fee for each PC they
> sold. So even if you got alternative to DOS you effectively
> payed for DOS.

I see.

> Of course, there is also issue of size of whole enterprise.
> MSDOS class system is approachable by single person, but not
> in a weekend (and probably not in a month). Most people
> lack sufficient motivation to spend needed effort given that
> quite good alternative (Free DOS) is available with sources.

Ok. Plus most people are unwilling to make the code
public domain.

> > Here is what I have for MSDOS for the 8086:
> >
> > PDOS/86 (OS): about 30,000 lines
> > PDPCLIB (C library): About 17,000 lines
> > SubC (C compiler): About 5,500 lines
> > as86 (assembler): About 13,000 lines
> > pdar (archiver): About 1000 lines
> > ld86 (linker): About 3000 lines
> > pdmake (make): About 2000 lines

> The line counts look a bit high to me, given limited functionality
> of what you have. Especially line counts for PDOS and as86
> look high.

Here are the biggest files in PDOS:

21/11/2022 09:38 am 20,480 ntdll.c
21/11/2022 09:38 am 23,724 format.c
21/11/2022 09:38 am 24,876 liballoc.c
27/11/2022 11:57 am 26,211 memmgr.c
21/11/2022 09:38 am 32,445 bos.c
21/11/2022 09:38 am 40,865 kernel32.c
21/11/2022 09:38 am 42,017 minifat.c
25/11/2022 04:59 pm 46,549 pos.c
25/11/2022 04:48 pm 47,529 int21.c
21/11/2022 09:38 am 58,808 exeloado.c
25/11/2022 05:00 pm 91,099 pcomm.c
27/11/2022 11:49 am 139,899 fat.c
30/11/2022 12:33 pm 179,584 pdos.c

pdos.c is 6500 lines (that includes VM support which
I have conditionally disabled). fat.c is 3800 lines (that
includes LFN and FAT12-32).

pcomm.c is the command processor.

It all adds up.

None of it is test code.

> > I am not very good with algorithms,

> Do you really mean "I am not very good with programming"?
> When programming you all time deal with algorithms.

Possibly. I'm pretty good at debugging, and can do it
at the assembler level. In a typical workplace I'm the
go-to guy. I make things work. But that's when an
existing algorithm is not working due to a wild pointer
or buffer overflow or something, not when the problem is
understanding the business logic. If the "business" is actually
a utility, I can understand that. E.g. the most recent bug I fixed
in a commercial setting was file synchronization.

They had put in a 2 second delay to ensure processes
didn't clash with each other. Another programmer had
"solved" the problem by creating a hash to make
filenames unique, which basically meant we were now
playing Russian Roulette, and the manager had sensed
that and asked me to look at it.

I saw the original 2 second delay, i.e. the original design,
and wondered why it wasn't working as designed. It took a
while because it all looked correct to me. And normal testing
showed that it was working - the second process dutifully queued
on the first. I wanted to see evidence of genuine clashing,
because I didn't believe it wasn't working. I saw the
evidence, so switched belief.

I eventually figured out that they were deleting the original
file after it had served its purpose, and that deletion was
allowing a second task to reuse the same filename if the timing
was right. I can't remember the exact algorithm. The
only way I could solve it under the original design was
to not auto-delete all the files. I think I had to leave one
per "group" in /tmp. Like I said, I can't remember exactly.

And here's a comment from someone in one of the groups:

https://groups.io/g/hercules-os380/message/16004

As I said, I encourage you to work on your PDPCLIB. You've obviously
put a huge amount into it already; you are without question one of the
most productive people I know, even when I disagree with what you're
doing. There are all kinds of things you could do that would further
the ability to port existing programs. Just because I think many of
your proposals are wacky doesn't mean that I don't respect your work
and skill and "get it done" approach.

(That's an MVS-related group, and that was after he watched me
work on PDPCLIB and related for about 15 years)

Note that I wanted PDOS to work on S/370 too, and was
looking for a common code base for all my systems.

It took all that 15 years, and more, to understand AMODEs
properly to the point that I could design PDPCLIB "properly".

A lot of the things I was doing I was told were impossible,
even by the people who were doing some of the coding.
Gerhard was writing the 370 assembler code in PDPCLIB and
constantly telling me that it wouldn't work. Then it
started working, and I showed him the result. He
mustn't have realized what I was showing him, because he
still insisted that it wouldn't work, and I told him I was
surprised that he was still saying it wouldn't work when
I had just shown him that it was already working.

> Frankly, it seems that you spent quite a lot of time to
> get to the point were you are now. And IIUC substantial
> part of your codebase came from other folks.

Probably about 30% came from others. I managed to get
FAT12 and FAT16 reading working, but writing was terrible. I managed
to get FAT16 writing working after many years. Then
someone else (Alica) came along and voom, FAT32 existed,
writing of all of them worked, and LFN worked.

The transition into PM32 came from someone else too. I
wrote the code, but it wasn't working. Someone else
figured out what I was doing wrong. There were no
diagnostics available for why the transition was failing.
You either get it right or you don't.

> Examples above
> shows that other folks in 2-3 years time got systems that
> look more advanced than yours.

Ok.

> BTW: Do you mean 300k when compiled by Watcom or when
> compiled by SubC?

SubC can't handle my code yet, as SubC isn't fully C90-compliant.
And it only does the small memory model at the moment.

Here is Watcom for PDOS/86:

Memory size: 00030370 (197488.)

And pcomm:

Memory size: 00019a10 (104976.)

The latter wastes about 15k on big buffers that it doesn't
really need. But 15k isn't going to make or break this design,
so I'm not attempting to save that. I need to upgrade from
RM16, it's as simple as that. PDOS isn't designed for 640k.
Microsoft and FreeDOS can have that market.

BFN. Paul.

Scott Lurndal
Dec 1, 2022, 10:15:57 AM
anti...@math.uni.wroc.pl writes:
>muta...@gmail.com <muta...@gmail.com> wrote:
>> Hi Waldek (mainly).

>For MSDOS there are some specific troubles:
>- interfaces were specified in assembler
>- OS hand to run acceptably on small and by modern standard slow
> machines
>
>Looking at this, I think that there were a lot of companies
>which could create something with comparable functionality
>to MSDOS, so in this sense replicating MSDOS was not hard.

I would argue that UEFI is the modern equivalent of MSDOS.

anti...@math.uni.wroc.pl
Dec 1, 2022, 12:23:12 PM
muta...@gmail.com <muta...@gmail.com> wrote:
> On Thursday, December 1, 2022 at 12:46:25 PM UTC+8, anti...@math.uni.wroc.pl wrote:
>
> > - OS hand to run acceptably on small and by modern standard slow
> > machines
>
> Slow should not be a problem for an OS. Applications
> should be bottlenecked in the application, not OS calls.

You write "should". Toi have this OS must be efficient enough.
As an example let me mention Chi-Writer. This was an editor
running under MSDOS on 8086 class machines. Saving files to
floppies were quite slow. I looked at clock during saving
and time to save could be explained by assuming that Chi-Writer
wrote 1 sector per disk revolution. Technically floppy hardware
was capable of continiousy writing data giving 9 sectors
per revolution on standard (low density) floppies and more
for high density ones. But apparenty Chi-Writer issued
an OS call per sector and DOS/BIOS was too slow to immediately
write next sector after previous one and had to wait full
revolution. Here blame was shared by DOS/BIOS and Chi-Writer,
IIUC it was possible to do full track read/writes using BIOS.
But Chi-Writer was not the only program with such a problem.
AFAIK this was common problem with MSDOS programs, I just
had first hand experience with Chi-Writer.
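
To put rough numbers on it (my arithmetic, assuming a 300 RPM drive
and 512-byte sectors): one revolution takes 200 ms, so writing one
sector per revolution is about 2.5 KB/s, while streaming a full
9-sector track per revolution would be about 23 KB/s, roughly nine
times faster.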

To put this in a slightly different light, my first own PC was a 486.
It was pretty fast compared to other machines from the same time.
At first I used MSDOS as my OS. I got DJ GCC so I could do 32-bit
programming and avoid 16-bit limitations. But I noticed that
many things were slower than I hoped. I tried 386BSD and later
Linux on this machine. And many things became much faster.

And do not be fooled by emulation. When you run Hercules
the emulated CPU may be 20 MIPS. But a major part of a mainframe
is the I/O subsystem, and using Hercules you are in fact using
the I/O subsystem of a modern OS. There is some overhead to
emulate IBM CKD discs, but compared to the overhead of CPU
emulation the I/O overhead is tiny. So in Hercules you get
excellent I/O performance. There is no way that a real 20
MIPS mainframe could match that. Similar things apply
to Bochs: in Bochs the emulated CPU is quite slow while I/O
runs with all modern improvements. There is a good chance
that a program which runs fast enough under Bochs would be
dog slow on real hardware, even with a faster hardware
processor.

Reiterating: to make your "should" reasonably close to the
truth requires serious work from the OS implementer, in
particular in the area of I/O.

With a good algorithmic background such things immediately
"stink" when you read the code. It does not mean that
with an algorithmic background you will get things right
the first time, but the knowledge helps you to concentrate on the
critical parts and get them right faster.

> Another programmer had
> "solved" the problem by creating a hash to make
> filenames unique, which basically meant we were now
> playing Russian Roulette, and the manager had sensed
> that and asked me to look at it. I saw the original 2
> second delay, ie the original design, and wondered why
> it wasn't working as designed. It took a while because
> it all looked correct to me. And normal testing showed
> that it was working - the second process dutifully queued
> on the first. I wanted to see evidence of genuine clashing,
> because I didn't believe it wasn't working. I saw the
> evidence, so switched belief. I eventually figured out that
> they were deleting the original file after it had served its
> purpose, and that process of deletion was allowing a
> second task to reuse the same filename if the timing
> was right. I can't remember the exact algorithm. The
> only way I could solve it under the original design was
> to not auto-delete all the files. I think I had to leave one
> per "group" in /tmp. Like I said, I can't remember exactly.

In Unix the standard way to solve such problems is to build
the name from the process identifier. The OS makes sure that
the process identifiers of running processes are distinct, so
names produced by different processes cannot clash. Things
get more tricky if the file is supposed to live after the process
exits.
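
A sketch of that convention (made-up file names, obviously not the
code from your job): build the name from getpid(), or let mkstemp()
pick a name that is also guaranteed unused at creation time.

    /* Sketch of the Unix convention described above. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        char pidname[64];
        char template[] = "/tmp/syncXXXXXX";
        int fd;

        /* name built from the pid: unique among running processes */
        snprintf(pidname, sizeof pidname, "/tmp/sync.%ld", (long)getpid());
        printf("pid-based name: %s\n", pidname);

        /* mkstemp replaces the XXXXXX and opens the file atomically */
        fd = mkstemp(template);
        if (fd == -1) {
            perror("mkstemp");
            return 1;
        }
        printf("mkstemp name:   %s\n", template);
        close(fd);
        return 0;
    }
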
I did the transition to PM as part of my toy OS many years ago. You
basically do it "by the book": the Intel docs said exactly what needs
to be done. Of course, you get a sequence of something like 20
instructions and it is possible to make a mistake here. IIRC it
took me a few trials to get it right. I used writes to screen
memory and also the keyboard LEDs to get feedback, so the part without
feedback was quite short. It took something like a few hours to get
it right, and in the process I fixed some problems in the code before
and after the switch.
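
The screen-memory feedback is just a direct store to text-mode video
RAM; a sketch, assuming flat 32-bit addressing after the switch and
standard colour text mode with video memory at 0xB8000:

    /* Debug beacon: write one character directly to text-mode video
       memory.  Each text cell is a character byte followed by an
       attribute byte. */
    void debug_beacon(int slot, char ch)
    {
        volatile unsigned char *video = (volatile unsigned char *)0xB8000;

        video[slot * 2]     = (unsigned char)ch; /* character */
        video[slot * 2 + 1] = 0x0F;              /* white on black */
    }

    /* e.g. debug_beacon(0, '1'); before the far jump and
       debug_beacon(1, '2'); after it, to see how far the switch
       code gets. */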

> > Examples above
> > shows that other folks in 2-3 years time got systems that
> > look more advanced than yours.
>
> Ok.
>
> > BTW: Do you mean 300k when compiled by Watcom or when
> > compiled by SubC?
>
> SubC can't handle my code yet as it isn't C90-compliant.
> And it only does small memory model too at the moment.
>
> Here is Watcom for PDOS/86:
>
> Memory size: 00030370 (197488.)
>
> And pcomm:
>
> Memory size: 00019a10 (104976.)

Late Microsoft COMMAND.COM is 54619 bytes. Earlier versions were much
smaller (but had less functionality). COMMAND.COM from
early FreeDOS is 67399 bytes. This one is probably larger
than necessary because it looks as if it was compiled by
Turbo C (which generates relatively poor code).

> The latter wastes about 15k on big buffers that it doesn't
> really need. But 15k isn't going to make or break this design,
> so I'm not attempting to save that. I need to upgrade from
> RM16, it's as simple as that. PDOS isn't designed for 640k.
> Microsoft and Freedos can have that market.

I see. So you want PM16 and bigger machines? Or just assume
at least 32 bits with a reasonable amount of memory?

--
Waldek Hebisch

muta...@gmail.com
Dec 1, 2022, 1:50:08 PM
On Friday, December 2, 2022 at 1:23:12 AM UTC+8, anti...@math.uni.wroc.pl wrote:

> for high density ones. But apparenty Chi-Writer issued
> an OS call per sector and DOS/BIOS was too slow to immediately
> write next sector after previous one and had to wait full
> revolution. Here blame was shared by DOS/BIOS and Chi-Writer,
> IIUC it was possible to do full track read/writes using BIOS.

Crikey - you expect the OS to be so fast that it can get
the next sector ready? I'm not trying to compete in that
market.

> And do not be fooled by emulation. When you run Hercules
> emulated CPU may be 20 MIPS. But major part of mainframe
> is I/O subsytem and using Hercules you are in fact using
> I/O subsystem of modern OS. There is some overhead to
> emulate IBM CKD discs, but compared to overhead for CPU
> emulation I/O overhead is tiny. So in Hercules you get
> excelenet I/O performace. There is no way that real 20
> MIPS mainframe could match that. Similar things applay
> to Bochs: in Bochs emulated CPU is quite slow while I/O
> runs with all modern improvements. There is good chance
> that program which runs fast enough under Bochs would be
> dog slow on real hardware, even with faster hardware
> processor.

I run PDOS on real hardware too. The bottleneck (for the
work I do) is in the GCC optimized compiles, not I/O.

But yes, I agree that the OS should do caching to allow
the application to bottleneck on CPU (actually, memory
access I think). PDOS doesn't do that caching, but
real I/O hardware is compensating for that currently.

I'm still more interested in bugs and theory than
performance.

> > Memory size: 00019a10 (104976.)

> Late Microsoft COMMAND.COM is 54619 bytes. Earlier were much
> smaller (but had less functionality). COMMAND.COM from
> early FreeDos is 67399 bytes. This one is probably larger

Sure. I'm not competing there. I'm happy to link in the
entire C library.

I didn't know in advance how big things would be.
Now I know.

> > The latter wastes about 15k on big buffers that it doesn't
> > really need. But 15k isn't going to make or break this design,
> > so I'm not attempting to save that. I need to upgrade from
> > RM16, it's as simple as that. PDOS isn't designed for 640k.
> > Microsoft and Freedos can have that market.

> I see. So you want PM16 and bigger machines? Or just assume
> at least 32 bits with resonable amout of memory?

I want both solutions.

32-bit won't run my 16-bit MSDOS executables that "follow the rules".
Yes, I understand that I can recompile from source, but I want both
things to work.

I want my 16-bit programs that "follow the rules" to work on
MSDOS in 640k, work on a Turbo 186 with access to 16 MiB
of memory, and work on an 80386 with access to 256 or
512 MiB depending on whether I choose PM16 or PM32 with
the D-bit set to 16-bit. And a theoretical 4 GiB on other hardware
too. I may or may not do 80286 too, using a triple-fault to get
back to RM16 quickly so that I can continue to make BIOS
calls. I researched timings previously and triple fault is fine.

I guess that is something I am doing - fleshing out combinations.
That's why I am also interested in running on other systems like
the Amiga (68020, not sure about 68000 as it requires extra
runtime support functions).

Basically I want to have a basic OS, that runs anywhere, that
can then be used to write a much better OS.

BFN. Paul.

muta...@gmail.com
Dec 9, 2022, 10:05:14 PM
On Thursday, December 1, 2022 at 12:46:25 PM UTC+8, anti...@math.uni.wroc.pl wrote:

> In compilers hard part is optimization. When I compare gcc-4.8
> to gcc-12.0 it seems that code produced by gcc-12.0 is probaby
> about 10% more efficient than code from gcc-4.8. But C compiler
> in gcc-12.0 is twice as large as C compiler in gcc-4.8. And
> looking back, gcc-4.8 is much bigger than gcc-1.42 (IIRC C
> compiler in gcc-1.42 was of order one megabyte in size).
> gcc-12.0 produces more efficient code than gcc-1.42, but
> probably no more than 2 times more efficient. Certainly,
> code from gcc-12.0 is not 26 time more efficient than code
> from gcc-1.42 (which would be the case if speed of object
> code were simply proportional to compiler size). And in
> turn gcc-1.42 generates more efficient code than simpler
> compilers.

Someone said you can get 80% of the performance of a
modern compiler with a handful of "easy" optimizations,
and gave these as references "the dragon book" and
Frances Allen's "Seven Optimising Transformations".
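
For concreteness, here is one of those "easy" transformations
(loop-invariant code motion) done by hand - my own example, not
taken from either reference:

    /* before: the multiply is re-done on every iteration */
    void scale_before(int *a, int n, int scale, int factor)
    {
        int i;
        for (i = 0; i < n; i++)
            a[i] = a[i] * (scale * factor);
    }

    /* after: the invariant product is computed once, outside the loop */
    void scale_after(int *a, int n, int scale, int factor)
    {
        int i;
        int k = scale * factor;
        for (i = 0; i < n; i++)
            a[i] = a[i] * k;
    }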

Any comment?

Also, we have a new contender:

https://github.com/wxwisiasdf/cc23/tree/master

5000 lines just for the compiler (relying on pdcc as
the independent preprocessor), must be pretty close
to C90 now. Currently in very active development.

Both 80386 and i370. :-)

And he said he would do 8086 huge memory model too!

BFN. Paul.

muta...@gmail.com
Dec 9, 2022, 11:01:53 PM
On Saturday, December 10, 2022 at 11:05:14 AM UTC+8, muta...@gmail.com wrote:

> 5000 lines just for the compiler (relying on pdcc as
> the independent preprocessor), must be pretty close
> to C90 now. Currently in very active development.

Assuming Microsoft's goons don't assassinate him like
they did Alica.

Fortunately this time around I've arranged for more
international spooks to keep tabs on Microsoft shenanigans.

BFN. Paul.

anti...@math.uni.wroc.pl
Dec 13, 2022, 6:25:20 PM
muta...@gmail.com <muta...@gmail.com> wrote:
> On Thursday, December 1, 2022 at 12:46:25 PM UTC+8, anti...@math.uni.wroc.pl wrote:
>
> > In compilers hard part is optimization. When I compare gcc-4.8
> > to gcc-12.0 it seems that code produced by gcc-12.0 is probaby
> > about 10% more efficient than code from gcc-4.8. But C compiler
> > in gcc-12.0 is twice as large as C compiler in gcc-4.8. And
> > looking back, gcc-4.8 is much bigger than gcc-1.42 (IIRC C
> > compiler in gcc-1.42 was of order one megabyte in size).
> > gcc-12.0 produces more efficient code than gcc-1.42, but
> > probably no more than 2 times more efficient. Certainly,
> > code from gcc-12.0 is not 26 time more efficient than code
> > from gcc-1.42 (which would be the case if speed of object
> > code were simply proportional to compiler size). And in
> > turn gcc-1.42 generates more efficient code than simpler
> > compilers.
>
> Someone said you can get 80% of the performance of a
> modern compiler with a handful of "easy" optimizations,
> and gave these as references "the dragon book" and
> Frances Allen's "Seven Optimising Transformations".
>
> Any comment?

Well, it depends on your code. Compilers normally
will perform accesses to non-local variables as written
in the program. More precisely, it is easy to write a program
in a way that there are no redundant non-local memory accesses,
and it is hard to detect and eliminate redundant ones.
If the memory access pattern is bad enough, then runtime will
be dominated by memory accesses and speeding up other parts
has little effect. But there are also small benchmarks
and well-behaved programs which correlate with benchmarks.
On a small benchmark Tiny C generated code which was about
6 times slower than code from gcc. But Tiny C compiled
by gcc ran about two times faster than self-compiled Tiny C.
So was object code from Tiny C 6 times slower than object
code from gcc, or was it 2 times slower?

If on well-behaved programs you can get about half the speed
of optimal code, then on badly behaved ones you will probably
get 80%.

A lot also depends on programming style. Compare

    for (i = 0; i < N; i++) {
        a[i] += b[i];
    }

with

    int *ap = a;
    int *bp = b;
    int *ep = a + N;
    while (ap < ep) {
        *ap++ += *bp++;
    }

In both cases a and b are arrays of integers (but a similar effect
would appear for other types). A good optimizing compiler will
generate similar or maybe the same code from both versions.
But a naive compiler is expected to produce faster code from the second
version.

The above may look like a small thing, but the topic is much bigger.
Namely, the modern tendency is to write code that at first glance
may look quite inefficient, but which the compiler can transform
into much faster, frequently close to optimal, code. You may
ask why one should write "slow" code and depend on the compiler to "fix"
it. The reason is that this "slow" code is easier to write
and to understand. For example, you use small functions or
maybe macros. Code using small helper functions may be
shorter and easier to get correct. But with a naive compiler
function calls cost. Expanding inline (say using macros)
alone is of limited help: a function in general must do
more work than is needed in special cases. And using macros risks
bigger object code (which may also make the program slower).
An optimizing compiler, after expanding a function inline,
effectively produces a special-case version for the given call.
In particular, if one of the arguments is constant, then the
compiler may find substantial simplifications.
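
A small made-up illustration of that effect (the helper round_up is
hypothetical):

    /* The general helper has to handle any width; once the compiler
       inlines it at a call site where width is the constant 8, the
       general code path can be folded away. */
    static int round_up(int value, int width)
    {
        /* general case: works for any positive width */
        return ((value + width - 1) / width) * width;
    }

    int round_to_8(int value)
    {
        /* after inlining with width == 8 a good compiler reduces this
           to something like (value + 7) & ~7 - no division at all */
        return round_up(value, 8);
    }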

There is also a different trend: autovectorization. Modern
PCs have "vector" instructions which take arguments from
vector registers, which may be 16, 32 or 64 bytes long, and
treat them as arrays of numbers, say 4, 8 or 16 integers.
Vector operations perform the same operation (say addition
or multiplication) on corresponding integers in both
vectors. In effect, a program may do computations many times
faster than using normal operations. Currently the most
extreme case would be parallel operation on bytes, which
can give a 64 times speedup. Compilers like gcc now have
extensions which allow the programmer to say that some
operations should be done using vector operations. But
it would be nicer if the compiler could automatically use
vector operations when that gives faster code. This is
called autovectorization. In some cases it works
nicely and gives the expected speedup. In some other cases
it does not work. Still, it is better to have support
for such a thing even if it does not always give a speedup.
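
To make it concrete (my own example): the loop below is the textbook
autovectorization candidate; with gcc you would normally compile
with -O3 (or -O2 plus -ftree-vectorize) for a target with vector
instructions.

    /* Independent element-wise work over arrays.  The restrict
       qualifiers tell the compiler the arrays do not overlap, which
       makes the vectorization easier to prove safe. */
    void add_arrays(int * restrict a, const int * restrict b, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            a[i] += b[i];
    }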

--
Waldek Hebisch