reading a file from disk

luser...@nospicedham.gmail.com

May 16, 2021, 4:48:21 PM
Hello all,

I'm planning to resume work on my partly written 8086 emulator after a
long hiatus. I want to add the ability to read a file, but I'm having some
difficulty understanding how it's supposed to work under MS-DOS. I've found the
listing of int 13h in Ralf Brown's Interrupt List (http://www.ctyme.com/intr/cat-003.htm),
but it all seems very complicated and perhaps unnecessary.

For the simplest working test, I think I can skip the CHS addressing and
just use Logical Block Addressing with a single "disk" file on the host.
Is there a good resource to understand how this all should work?

I need to implement the BIOS routines and have them call host functions, probably
just mmap'ing the file and using memcpy for both read and write.
I have this sort of thing partly working for keyboard read and screen
write by using ESC instructions in the BIOS routines. The emulator
implements the ESC instructions to call the host functions getchar() and
putchar().
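
For reference, the CHS parameters that int 13h passes can be turned into a
linear block number with one line of arithmetic, so a flat host file can still
serve as the "disk". A minimal sketch, assuming a 1.44 MB floppy geometry
(2 heads, 18 sectors per track, 512-byte sectors) and a disk image that is
already available in a host buffer (for instance via mmap). The names
chs_to_lba and read_sector are hypothetical, not taken from the emulator:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed geometry: 1.44 MB floppy. */
enum { HEADS = 2, SECTORS_PER_TRACK = 18, SECTOR_SIZE = 512 };

/* CHS sectors are numbered from 1; cylinders and heads from 0. */
static uint32_t chs_to_lba(uint32_t cyl, uint32_t head, uint32_t sector)
{
    assert(sector >= 1 && sector <= SECTORS_PER_TRACK && head < HEADS);
    return (cyl * HEADS + head) * SECTORS_PER_TRACK + (sector - 1);
}

/* Copy one sector from the disk image into guest memory.
   Returns 0 on success, -1 if the sector lies outside the image. */
static int read_sector(const uint8_t *image, size_t image_size,
                       uint32_t lba, uint8_t *dest)
{
    size_t offset = (size_t)lba * SECTOR_SIZE;
    if (offset + SECTOR_SIZE > image_size)
        return -1;
    memcpy(dest, image + offset, SECTOR_SIZE);
    return 0;
}
```

A write would be the same memcpy with source and destination swapped.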

My emulator code is at https://github.com/luser-dr00g/8086
with some overview and explanations in https://github.com/luser-dr00g/8086/pres

TIA
--
droog

luser...@nospicedham.gmail.com

May 24, 2021, 2:22:49 AM
On Sunday, May 16, 2021 at 3:48:21 PM UTC-5, luser...@nospicedham.gmail.com wrote:
> Hello all,
>
> I'm planning to resume work on my partly written 8086 emulator after a
> long hiatus. I want to add the ability to read a file, but I'm having some
> difficulty
[...]
> Is there a good resource to understand how this all should work?

I suppose this isn't really related to assembly language. I've ordered Peter Norton's
Guide to the IBM PC. I guess I'll post in comp.os.msdos.programmer if I run into
trouble.

anti...@nospicedham.math.uni.wroc.pl

May 24, 2021, 9:40:27 PM
luser...@nospicedham.gmail.com <luser...@nospicedham.gmail.com> wrote:
> Hello all,
>
> I'm planning to resume work on my partly written 8086 emulator after a
> long hiatus. I want to add the ability to read a file, but I'm having some
> difficulty
> understanding how it's supposed to work under MS-DOS. I've found the
> listing of int 13h in Ralf Brown's Interrupt List (http://www.ctyme.com/intr/cat-003.htm)
> but it all seems very complicated and perhaps unnecessary.
>
> For the simplest working test, I think I can skip the CHS addressing and
> just use Logical Block Addressing with a single "disk" file on the host.
> Is there a good resource to understand how this all should work?
>
> I need to implement the BIOS routines and call host functions, probably
> just mmap'ing the file and using memcpy for both read and write.
> I have this sort of thing partly working for keyboard read and and screen
> write by using ESC instructions in the BIOS routines. The emulator
> implements the ESC instructions to call host functions getchar() and
> putchar().

The basic question is: what do you want to emulate? The 8086 by itself cannot
do file I/O. You may do a PC emulator in the style of Bochs, that is,
emulate common hardware. You may do a BIOS emulator. You may
do a DOS emulator, that is, emulate file I/O at the DOS level. Or
you may emulate a different system. For example, QEMU in user mode (qemu-i386)
emulates Linux system calls. If you want your emulator to be simple,
you may define your own system calls (say, using something like INT 80h)
for things like open, close, read, and write. The advantage is simplicity.
The disadvantage is that the large body of existing 8086 assembler programs
assumes a DOS environment and will not work with a different system.
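
To make the "own system calls" option concrete, here is a minimal sketch of a
host-side dispatcher that the emulator could run when the guest executes the
chosen INT. Everything in it is an assumption made for illustration: the
struct guest layout, the call numbers, and the register convention (AX = call
number, BX = descriptor, CX = byte count, DX = buffer offset, result in AX,
carry set on error):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

enum { SYS_READ = 0, SYS_WRITE = 1 };   /* call numbers chosen arbitrarily */
enum { MAX_FILES = 8 };

/* Hypothetical guest state: only what this sketch needs. */
struct guest {
    uint16_t ax, bx, cx, dx;
    int      carry;                 /* set on error, as DOS does with CF */
    uint8_t  mem[0x10000];          /* a single 64 KiB segment */
    FILE    *files[MAX_FILES];      /* host streams indexed by guest descriptor */
};

static void host_syscall(struct guest *g)
{
    assert(g != NULL);
    g->carry = 1;                   /* assume failure until a call succeeds */
    if (g->bx >= MAX_FILES || g->files[g->bx] == NULL)
        return;
    if ((uint32_t)g->dx + g->cx > sizeof g->mem)
        return;                     /* buffer would run past guest memory */

    FILE *f = g->files[g->bx];
    uint8_t *buf = g->mem + g->dx;
    switch (g->ax) {
    case SYS_READ:
        g->ax = (uint16_t)fread(buf, 1, g->cx, f);
        g->carry = ferror(f) != 0;
        break;
    case SYS_WRITE:
        g->ax = (uint16_t)fwrite(buf, 1, g->cx, f);
        g->carry = ferror(f) != 0;
        break;
    default:
        break;                      /* unknown call number: carry stays set */
    }
}
```

Open and close would follow the same pattern, filling and clearing slots in
the files table.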

--
Waldek Hebisch

luser...@nospicedham.gmail.com

May 26, 2021, 6:47:10 PM
Very good points. Thanks. You're right that the 8086 by itself only has
in/out instructions, ESC instructions, and memory-mapped I/O, all of which
depend on the rest of the system. It appears that the IBM BIOS operates
at the very low level of cylinders, heads, and sectors. But the DOS functions
appear to map pretty closely to stdio.h functions. So that seems to be
the easiest path forward.

My goal at this stage is just to get some kind of read/write ability so the
Forth interpreter can read Forth source from a file. So far, all my Forth code
is written in a sort of "pre-compiled" form directly in the C code that
implements the CPU emulator.

Rod Pemberton

May 27, 2021, 2:17:47 AM
On Wed, 26 May 2021 15:33:53 -0700 (PDT)
"luser...@nospicedham.gmail.com" <luser...@nospicedham.gmail.com>
wrote:

<follow ups set to comp.lang.forth, from comp.lang.asm.x86>

> My goal at this stage is just to get some kind of read/write ability
> so the Forth interpreter can read Forth source from a file. So far,
> all my Forth code is written in a sort of "pre-compiled" form
> directly in the C code that implements the CPU emulator.

Since your Forth interpreter is coded in C, you might start by using
custom Forth words for standard C file I/O functions. Over time, you
could implement modern standard Forth words for loading a file, by
transforming and rewriting the custom words, as these mostly match C's
functionality. Personally, I'd avoid loading blocks of ancient text
screens like fig-Forth, unless you already have the functionality.
E.g., set an ANS Forth word like OPEN-FILE to C's fopen() so you can
build other ANS Forth file I/O words like INCLUDED, INCLUDE-FILE, etc.
You might be able to do this by setting the CFA for a primitive (or
low-level Forth word) to the address of the C function. I.e., if you
have some Forth words coded in C (or assembly), you should be able to
do this.

http://lars.nocrew.org/dpans/dpans11.htm#11.6.1.1718
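
A minimal sketch of that idea, assuming a Forth whose cells are wide enough to
hold a host pointer (which is not the case for 16-bit cells inside the
emulated 8086, where a handle table would be needed instead). The dictionary
layout, the stack helpers, and the fam numbering are all hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef intptr_t cell;               /* wide enough to hold a host pointer */
typedef void (*primitive)(void);

static cell stack[32];
static int  sp = 0;                  /* number of cells on the data stack */
static void push(cell x) { assert(sp < 32); stack[sp++] = x; }
static cell pop(void)    { assert(sp > 0);  return stack[--sp]; }

/* Hypothetical file access methods standing in for R/O, W/O and R/W.
   Note that fopen's "wb" truncates an existing file. */
static const char *const fam_modes[] = { "rb", "wb", "r+b" };

/* OPEN-FILE ( c-addr u fam -- fileid ior ), implemented with fopen(). */
static void open_file_prim(void)
{
    cell fam = pop(), u = pop();
    const char *addr = (const char *)pop();
    char name[256];
    FILE *f = NULL;
    if (fam >= 0 && fam < 3 && u >= 0 && u < (cell)sizeof name) {
        memcpy(name, addr, (size_t)u);   /* Forth strings are not NUL-terminated */
        name[u] = '\0';
        f = fopen(name, fam_modes[fam]);
    }
    push((cell)f);
    push(f ? 0 : -1);                    /* ior: zero means success */
}

/* A dictionary entry whose code field holds the address of the C function. */
struct word { const char *name; primitive cfa; };
static const struct word dictionary[] = {
    { "OPEN-FILE", open_file_prim },
};

/* Look a word up by name and run its code field; returns 0 if found. */
static int execute(const char *name)
{
    for (size_t i = 0; i < sizeof dictionary / sizeof *dictionary; i++)
        if (strcmp(dictionary[i].name, name) == 0) {
            dictionary[i].cfa();
            return 0;
        }
    return -1;
}
```

Words such as INCLUDED can then be written in Forth on top of primitives like
this one.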

--
The SALT deduction is a kickback of taxes to wealthy people in wealthy
states.

James Harris

May 27, 2021, 3:02:52 AM
On 26/05/2021 23:33, luser...@nospicedham.gmail.com wrote:
> On Monday, May 24, 2021 at 8:40:27 PM UTC-5, anti...@nospicedham.math.uni.wroc.pl wrote:
>> luser...@nospicedham.gmail.com <luser...@nospicedham.gmail.com> wrote:

...

>> Basic question is what do you want to emulate? 8086 by itself can not
>> do file I/O. You may do PC emulator in style of Bochs, that is
>> emulate common hardware. You may do BIOS emulator. You may
>> do DOS emulator, that is emulate file I/O at DOS level. Or
>> you may emulate different system. For example QEMU 386 emulates
>> Linux system calls. If you want your emulator to be simple,
>> you may define your own system calls (say using something like INT 0x80h),
>> thing like open, close, read, write. Advantage is simplicity.
>> Disadvantage is that large body of existing 8086 assembler programs
>> assumes DOS environment and will not work with different system.
>>
>
> Very good points. Thanks. You're right that the 8086 by itself only has
> in/out instructions, ESC instructions, and memory mapped IO, all of which
> depend on the rest of the system. It appears that the IBM BIOS operates
> at the very low level of clusters, heads, and sectors. But the DOS functions
> appear to map pretty closely to stdio.h functions. So that seems to be
> the easiest path forward.

There's a guy on alt.os.development who has been posting about very much
the same sort of thing. Maybe there's value in the two of you getting
in touch, if you aren't already.


--
James Harris

luser...@nospicedham.gmail.com

May 29, 2021, 3:21:09 PM
Thanks. I have read some of his postings in comp.lang.c and have just
now started to browse some in AOD. We have a similar set of interests
but diverge on many of the details. E.g., I'm using C99 tools[1] rather than C90
and targeting just 8086 for now. I still haven't implemented the full
instruction set, just the ones initially required for the codegolf.stackexchange.com
challenge and additional ones needed to get the Forth up and running.
So, while we're both working in the same sort of space we're on different
peaks of the mountain range.

[1] I'm pretty much addicted to the C99 designated initializers and variable
argument macros. I can't really imagine doing without those for hobby work.

luser...@nospicedham.gmail.com

Jun 17, 2021, 9:43:57 PM
For posterity, Norton's Guide really seems to be the perfect book to learn all about
this stuff. It looks like I want to bypass DOS 1.0 stuff, too, and go straight for the DOS 2.0
additions to have a file handle and less fiddly business.

luser...@nospicedham.gmail.com

Jul 4, 2021, 12:32:14 AM
So I read the whole book except some of the video and keyboard stuff that I may not need
(or at least don't need yet).

Here's my rough draft of DOS support functions. The vv array is the payload of the ESC
bytes from the ESC instruction. My interrupt handlers fill it in with the interrupt number.
U is a uintptr_t.

It needs to do more error checking and reporting, but this should be enough to access
files from my Forth CODEs. Maybe exiting a DOS program ought not to exit() the whole
emulator.


static int keyboard_input_with_echo(){
    bput(al, fgetc(stdin));             // the host terminal supplies the echo
    return 0;
}

static int display_output(){
    fputs( cp437tounicode( bget(dl) ), stdout );
    bput(al,bget(dl)); if(bget(al)=='\t')bput(al,' ');
    return 0;
}

static int display_string(){
    U f = ds_(dx);                      // the '$'-terminated string is at DS:DX
    while(mem[f]!='$')fputs( cp437tounicode( mem[f++] ), stdout );
    bput(al,'$');
    return 0;
}

static int get_date(){
    time_t t=time(NULL);struct tm*tm=localtime(&t);
    wput(cx,tm->tm_year+1900);          // tm_year counts from 1900
    bput(dh,tm->tm_mon+1);              // tm_mon is 0..11, DOS wants 1..12
    bput(dl,tm->tm_mday);
    bput(al,tm->tm_wday);               // 0 = Sunday in both conventions
    return 0;
}

static int get_time(){
    struct timeval tv;gettimeofday(&tv,0);
    time_t t=tv.tv_sec;struct tm*tm=localtime(&t);
    bput(ch,tm->tm_hour);
    bput(cl,tm->tm_min);
    bput(dh,tm->tm_sec);
    bput(dl,tv.tv_usec/10000);          // DL = hundredths of a second
    return 0;
}

static int open_file(){
    U mode = bget(al);
    // fopen() has no "rw" mode, and "w" would truncate the file.
    // "r+b" opens an existing file for update without truncating it.
    FILE *f = fopen((char*)(mem + ds_(dx)), (mode & 7) == 0? "rb": "r+b");
    if( f ){
        U handle = next_handle ++;
        handles[ handle ] = f;
        wput(ax, handle);
        clc();
        return 0;
    }
    wput(ax, 2); //file not found
    stc();
    return 0;
}

static int close_file_handle(){
    U handle = wget(bx);
    if( !handles[ handle ] ){
        wput(ax, 6); //invalid handle
        stc();
        return 0;
    }
    fclose( handles[ handle ] );
    handles[ handle ] = 0;
    // next_handle is not decremented here: doing so after closing any handle
    // but the newest would hand out a number that is still in use.
    clc();
    return 0;
}

static int read_file(){
    U handle = wget(bx);
    U count = fread(mem + ds_(dx), 1, wget(cx), handles[ handle ]);
    if( ferror( handles[ handle ] ) ){
        wput(ax, 5); //access denied
        stc();
        return 0;
    }
    wput(ax, count); // 0 means end of file, which is not an error
    clc();
    return 0;
}

static int write_file(){
    U handle = wget(bx);
    U count = fwrite(mem + ds_(dx), 1, wget(cx), handles[ handle ]);
    if( count == wget(cx) ){
        wput(ax, count); // AX = number of bytes written
        clc();
        return 0;
    }
    wput(ax, 5); //access denied
    stc();
    return 0;
}

static int move_file_pointer(){
    U handle = wget(bx);
    U whence = bget(al);
    fseek( handles[ handle ], qget(cx,dx), whence == 0? SEEK_SET:
                                           whence == 1? SEEK_CUR:
                                           whence == 2? SEEK_END: 0);
    U pos = ftell( handles[ handle ] );
    qput(dx, ax, pos);                  // new position is returned in DX:AX
    clc();
    return 0;
}

static int dos( UC vv[7] ){
    switch(bget(ah)){
    CASE 0x01: return keyboard_input_with_echo();
    CASE 0x02: return display_output();
    CASE 0x09: return display_string();

    CASE 0x2A: return get_date();
    CASE 0x2C: return get_time();

    CASE 0x3C: // create file
    CASE 0x3D: return open_file();
    CASE 0x3E: return close_file_handle();
    CASE 0x3F: return read_file();
    CASE 0x40: return write_file();
    CASE 0x42: return move_file_pointer();
    CASE 0x44: // ioctl
    CASE 0x4B: // load/execute program
    CASE 0x4C: exit(bget(al));
    CASE 0x5B: // create new file
        ;
    }
    return 0;
}
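
A bare next_handle counter never reuses the numbers of closed files. One
alternative is to scan for the lowest free slot, which is also how DOS chooses
handle numbers. A minimal sketch, assuming a 20-entry table (the default
per-process limit in DOS) with handles 0-4 reserved for the predefined
stdin, stdout, stderr, stdaux and stdprn. The names handle_table,
handle_alloc, handle_get and handle_free are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

enum { MAX_HANDLES = 20, FIRST_USER_HANDLE = 5 };  /* 0-4 are predefined in DOS */

static FILE *handle_table[MAX_HANDLES];

/* Store f in the lowest free slot; returns the handle, or -1 if the table is full. */
static int handle_alloc(FILE *f)
{
    assert(f != NULL);
    for (int h = FIRST_USER_HANDLE; h < MAX_HANDLES; h++)
        if (!handle_table[h]) {
            handle_table[h] = f;
            return h;
        }
    return -1;
}

/* Look up a guest-supplied handle; returns NULL for anything invalid. */
static FILE *handle_get(unsigned h)
{
    return h < MAX_HANDLES ? handle_table[h] : NULL;
}

/* Close the stream and release its slot so a later open can reuse it.
   Returns -1 for an invalid handle or a failed fclose(). */
static int handle_free(unsigned h)
{
    FILE *f = handle_get(h);
    if (!f)
        return -1;
    handle_table[h] = NULL;
    return fclose(f) == 0 ? 0 : -1;
}
```

Because handle_get rejects out-of-range and unused numbers, the read, write
and seek functions could use it to report "invalid handle" instead of
dereferencing an empty slot.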



And this bit is silly but fun:

unsigned cp437table[256] = {
' ', 0x263A,0x263B,0x2665,0x2666,0x2663,0x2660,0x2022,
0x25D8,0x25CB,0x25D9,0x2642,0x2640,0x266A,0x266B,0x263C,
0x25BA,0x25C4,0x2195,0x203C,0x00B6,0x00A7,0x25AC,0x21A8,
0x2191,0x2193,0x2192,0x2190,0x221F,0x2194,0x25B2,0x25BC,
32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47,
48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,
64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79,
80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95,
96, 97, 98, 99,100,101,102,103,104,105,106,107,108,109,110,111,
112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,0x2302,
0xC7,0xFC,0xE9,0xE2,0xE4,0xE0,0xE5,0xE7,0xEA,0xEB,0xE8,0xEF,0xEE,0xEC,0xC4,0xC5,
0xC9,0xE6,0xC6,0xF4,0xF6,0xF2,0xFB,0xF9,0xFF,0xD6,0xDC,0xA2,0xA3,0xA5,0x20A7,0x192,
0xE1,0xED,0xF3,0xFA,0xF1,0xD1,0xAA,0xBA,0xBF,0x2310,0xAC,0xBD,0xBC,0xA1,0xAB,0xBB,
0x2591,0x2592,0x2593,0x2502,0x2524,0x2561,0x2562,0x2556,
0x2555,0x2563,0x2551,0x2557,0x255D,0x255C,0x255B,0x2510,
0x2514,0x2534,0x252C,0x251C,0x2500,0x253C,0x255E,0x255F,
0x255A,0x2554,0x2569,0x2566,0x2560,0x2550,0x256C,0x2567,
0x2568,0x2564,0x2565,0x2559,0x2558,0x2552,0x2553,0x256B,
0x256A,0x2518,0x250C,0x2588,0x2584,0x258C,0x2590,0x2580,
0x3B1,0xDF,0x393,0x3C0,0x3A3,0x3C3,0x3BC,0x3C4,
0x3A6,0x398,0x3A9,0x3B4,0x221E,0x3C6,0x3B5,0x2229,
0x2261,0xB1,0x2265,0x2264,0x2320,0x2321,0xF7,0x2248,
0xB0,0x2219,0xB7,0x221A,0x207F,0xB2,0x25A0,0xA0
};

static
char *cp437tounicode( unsigned int c ){
    static char buf[4] = "";                // overwritten by every call
    unsigned ucs4 = cp437table[ c & 0xFF ]; // keep the index inside the table
    if( ucs4 < 0x80 ){ // 0... ....
        buf[0] = ucs4;
        buf[1] = 0;
    } else
    if( ucs4 < 0x800 ){ // 110. .... 10.. ....
        buf[0] = 0xC0 | (ucs4 >> 6);
        buf[1] = 0x80 | (ucs4 & 0x3F);
        buf[2] = 0;
    } else
    if( ucs4 < 0x10000 ){ // 1110 .... 10.. .... 10.. ....
        buf[0] = 0xE0 | ((ucs4 >> 12) & 0xF);
        buf[1] = 0x80 | ((ucs4 >> 6) & 0x3F);
        buf[2] = 0x80 | (ucs4 & 0x3F);
        buf[3] = 0;
    }
    return buf;
}
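
Because the function returns a pointer to a static buffer, each call
overwrites the previous result. A variant that writes into a buffer supplied
by the caller avoids that. This is only a sketch of the same encoding, and the
name utf8_encode is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Encode one code point below U+10000 as UTF-8 into out (at least 4 bytes).
   Returns the number of bytes written, not counting the terminating NUL. */
static size_t utf8_encode(unsigned ucs4, char *out)
{
    size_t n = 0;
    assert(ucs4 < 0x10000);
    if (ucs4 < 0x80) {                                   /* 0... .... */
        out[n++] = (char)ucs4;
    } else if (ucs4 < 0x800) {                           /* 110. .... 10.. .... */
        out[n++] = (char)(0xC0 | (ucs4 >> 6));
        out[n++] = (char)(0x80 | (ucs4 & 0x3F));
    } else {                                             /* 1110 .... 10.. .... 10.. .... */
        out[n++] = (char)(0xE0 | (ucs4 >> 12));
        out[n++] = (char)(0x80 | ((ucs4 >> 6) & 0x3F));
        out[n++] = (char)(0x80 | (ucs4 & 0x3F));
    }
    out[n] = '\0';
    return n;
}
```

For example, U+263A (the smiley at CP437 position 1) encodes as the three
bytes E2 98 BA.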

Frank Kotler

Jul 4, 2021, 1:31:05 AM
On 07/04/2021 12:17 AM, luser...@nospicedham.gmail.com wrote:
...
>>> I suppose this isn't really related to assembly language.

No...

I hate to chase you away, where you're doing low level stuff, but keep
it in assembly, okay?

If you post to multiple groups and I reject it, it won't post at all, so
don't...

Best,
Frank
{moderator}
