Re: Clarification of PO's UTM architecture and his H and H

olcott

unread,

Sep 6, 2020, 1:23:16 PM9/6/20

to

On 9/6/2020 11:00 AM, Mike Terry wrote:
> On 06/09/2020 04:16, olcott wrote:
>> On 9/5/2020 4:11 PM, Mike Terry wrote:
>>> Some questions for PO, concerning the architecture for his "UTM", H,
>>> H_Hat, etc..
>>>
>>> Here is my understanding:   [please correct me if I've misunderstood,
>>> but there's no need to explain about TM equivalent computations - this
>>> is just trying to get clear /what/ you are doing, not particularly
>>> /why/, although I expect we'll stray in that direction!]
>>>
>>> You will be using an appropriate (e.g. no library calls) subset of "C"
>>> as the TM language, and COFF files as the description of a "TM
>>> Program".   Or perhaps more accurately we would say it is the byte
>>> array contents of the COFF file which is the description, rather than
>>> the file itself. (That would make more sense.)
>>>
>>> E.g. if I want a TM that was to loop indefinitely, producing no
>>> output, I could code a C program file with
>>>
>>> void Loop (void)
>>> {
>>>     for (;;);
>>> }
>>>
>>> This is then to be compiled by Microsoft C compiler for i386 32-bit
>>> target platform, producing a COFF file output. (No linkage to .exe
>>> performed.) Or at least, I would do this if I wanted to run it in
>>> your emulator.
>>>
>>> From here I'm getting progressively less certain...
>>>
>>> If I wish to run this program I could use your emulator somehow. How
>>> exactly? I'm thinking that your emulator must be some other program
>>> you've written (in C/C++?) which /is/ compiled and linked to an
>>> executable.
>>
>> The emulator itself was written over many years by a very highly skilled
>> team of several developers. With my COFF library this system can now
>> directly execute COFF object files.
>>
>
> Will you be making the emulator (source code) available as part of your
> delivery?

Better than that, I will be making it available first.

>
>>> How exactly would I use it to emulate my COFF program running? (I'm
>>> thinking I might run it from a shell command line, with some
>>> command-line parameters, but /what/ exactly? I'm not expecting you to
>>> commit to a finalised command line syntax - just a rough idea will do.)
>>
>> Simply execute this on the command line:
>> C:\>x86utm.exe Filename.obj
>>
>> Although the same system will also run on Linux it will only be able to
>> execute COFF object files representing x86 code.
>>
>
> A minor question (just curious): a COFF file can contain many function
> implementations, so how would the emulator know which was the entry point?

The assumption is that the COFF file is generated by a Microsoft "C"
compiler, thus the entry point is always main().

>
>>>
>>> And now, what if I want a TM that adds two numbers (let's say unsigned
>>> numbers fitting in 32-bit unsigned int range) and add them (mod 2^32)
>>> and "return" the output. If this were a TM, I would put the two input
>>> numbers on the tape in a suitable format with the rest of the tape
>>> blank, then the TM would run and eventually halt, leaving the answer
>>> on the tape in a suitable format. What is the equivalent procedure
>>> with your emulator? Does your emulator have a virtual tape for this
>>> purpose?   (There's no reason it has to, just asking...)
>>>
>>
>> Return values from functions are shown when a value is assign to the EAX
>> register. All of "C" besides library calls is allowed. All globals must
>> be initialized or they don't exist at all.
>>
>> The only output is an assembly language execution trace.
>>
>
> OK, but how does my C code get a value into EAX? If I included a return

Here are all of the details of a complete example:

// "C" Source code
unsigned int Test()
{
return 0x12345678;
}

int main()
{
Test();
__asm hlt
}

// Intel XED Disassembly
b4: 55 push ebp
b5: 8BEC mov ebp, esp
b7: E803000000 call 0xbf
bc: F4 hlt
bd: 5D pop ebp
be: C3 ret

bf: 55 push ebp
c0: 8BEC mov ebp, esp
c2: B878563412 mov eax, 0x12345678
c7: 5D pop ebp
c8: C3 ret

// Debug Trace
[00b4] 55 push ebp
[00b5] 8bec mov ebp,esp
[00b7] e803000000 call 000000bf
[00bf] 55 push ebp
[00c0] 8bec mov ebp,esp
[00c2] b878563412 mov eax,12345678
[00c7] 5d pop ebp
[00c8] c3 ret
[00bc] f4 hlt

The halt decider is executed this same way, returning a value to main()
this same way.

> statement, the ABI would require the return value to be returned through
> EAX (at least for integer returns), but then it's too late for the HLT
> instruction. Also how does the C program get to include the HLT
> instruction anyway? (If this is embedded assembler, I guess there could
> just be a mov eax,retval embedded assembler just before HLT.)
>
> [Another approach would be to have the program call a routine declared
> extern, e.g. " Halt (1); /* represents halting in state 1 */ " and
> your emulator would recognise the Halt call as a request to halt, and
> also understand the halt code passed. There wouldn't need to be an
> actual routine Halt() in the COFF. Similar for other calls the C
> program may need to make back to the emulator. Might be neater...]
>
> The bigger mystery for me though is how are the two input numbers to be
> passed? You didn't comment on that. (Perhaps you saw my comment below
> saying "ignore that bit", but I only meant that for the example below
> since I had already asked about input in the example above.)
>
> So how do the two input numbers get passed in to my add "TM"? This is a
> question for the C coding itself (do I just code void Add (int a, int b)
> {...} and assume inputs are passed as a,b), and also for the scenario
> where I want to run the program in your emulator.
>

Its just regular ordinary "C" code that generates regular ordinary COFF
object files with the Microsoft "C" compiler.

The only thing that makes an x86 emulator into a UTM equivalent are four
operating system function calls that the master UTM equivalent provides
to its user programs:

// Saves the state of the current virtual-machine
void SaveState(u32* s);

// Loads the state of a saved virtual-machine
void LoadState(u32* s);

// Allocates memory from the heap
u32* Allocate(u32 size);

// The master machine executes the slave machine
void DebugStep(u32* master_state, u32* slave_state);

Each user code UTM equivalent can be the master to its own slave to an
arbitrary recursion depth.

>>> Also, what if I have some decider TM that (following tradition)
>>> examines its input, then terminates in one of two states, representing
>>> accept/reject. I understand the terminating aspect is handled by the
>>> COFF object code hitting a HLT instruction, but what of the halt states?
>>
>> Return int values of 1 or 0 from the halt decider function.
>>
>
> But the decider function does not return, right? it somehow executes a
> HLT instruction.
>
> Mike.
>

unsigned int H() is invoked from main() and returns to main() where
main() reaches HLT and halts. (see complete example above).

--
Copyright 2020 Pete Olcott

olcott

unread,

Sep 6, 2020, 1:41:02 PM9/6/20

to

Here is a slightly more complete example.

// "C" source code
unsigned int Test2()
{
return 0x12345678;
}

int main()
{
unsigned int M = Test2();

__asm hlt
}

// Intel XED Disassembly
b4: 55 push ebp
b5: 8BEC mov ebp, esp

b7: 51 push ecx
b8: E808000000 call 0xc5
bd: 8945FC mov dword ptr [ebp-0x4], eax
c0: F4 hlt
c1: 8BE5 mov esp, ebp
c3: 5D pop ebp
c4: C3 ret

c5: 55 push ebp
c6: 8BEC mov ebp, esp
c8: B878563412 mov eax, 0x12345678
cd: 5D pop ebp
ce: C3 ret

// Execution trace

[00b4] 55 push ebp
[00b5] 8bec mov ebp,esp

[00b7] 51 push ecx
[00b8] e808000000 call 000000c5
[00c5] 55 push ebp
[00c6] 8bec mov ebp,esp
[00c8] b878563412 mov eax,12345678
[00cd] 5d pop ebp
[00ce] c3 ret
[00bd] 8945fc mov [ebp-04],eax
[00c0] f4 hlt

Mike Terry

unread,

Sep 6, 2020, 8:47:26 PM9/6/20

to

That's good!

>>
>>>> How exactly would I use it to emulate my COFF program running? (I'm
>>>> thinking I might run it from a shell command line, with some
>>>> command-line parameters, but /what/ exactly? I'm not expecting you to
>>>> commit to a finalised command line syntax - just a rough idea will do.)
>>>
>>> Simply execute this on the command line:
>>> C:\>x86utm.exe Filename.obj
>>>
>>> Although the same system will also run on Linux it will only be able to
>>> execute COFF object files representing x86 code.
>>>
>>
>> A minor question (just curious): a COFF file can contain many
>> function implementations, so how would the emulator know which was the
>> entry point?
>
> The assumption is that the COFF file is generated by a Microsoft "C"
> compiler, thus the entry point is always main().
>

ok

ok, that's not how I would do it, but it's workable I guess. At least,
it's enough for deciders which only have to return a binary decision.

You've missed the point of my question. I wanted to know how my add
routine would receive its two inputs. What I /originally/ expected was
that I would code my Add "TM" in C as

int AddTM (int a, int b)
{
return a+b;
}

and somehow the values for a and b would be supplied on the shell
command line when invoking the emulator, so the emulator would start
execution of AddTM having placed a and b on the stack as expected by the
C code of AddTM. E.g. if I wanted it to add 3 and 4, something like:
x86utm.exe Filename.obj parm(int:3) parm(int:4)

NOW, building on how you've explained handling of return values, I'm
guessing input parms will be handled in essentially the same way: main()
will act as a wrapper for the actual "TM" (the AddTM routine) as follows:

// "C" Source code
int AddTM (int a, int b)
{
return a+b;
}

int main()
{
AddTM (3, 4);
__asm hlt
}

...so the source code contains not just the "TM", but a little wrapper
to call it, including the input supplied to the "TM". (In my example
the wrapper is main() which has hard coded literals for 3, 4.) If I
change my mind and want to run it to add 5, 84, I can (must) change the
code and recompile it to a new COFF file.

Is this right? (It's an ok approach, easy for you to develop.)

If so, I can move on to my first serious question!

Mike.

olcott

unread,

Sep 6, 2020, 9:19:22 PM9/6/20

to

int main()
int N = AddTM(56,93);

}

> and somehow the values for a and b would be supplied on the shell
> command line when invoking the emulator, so the emulator would start
> execution of AddTM having placed a and b on the stack as expected by the
> C code of AddTM. E.g. if I wanted it to add 3 and 4, something like:
> x86utm.exe Filename.obj parm(int:3) parm(int:4)
>
> NOW, building on how you've explained handling of return values, I'm
> guessing input parms will be handled in essentially the same way: main()
> will act as a wrapper for the actual "TM" (the AddTM routine) as follows:
>
> // "C" Source code
> int AddTM (int a, int b)
> {
> return a+b;
> }
>
> int main()
> {
> AddTM (3, 4);
> __asm hlt
> }
>

Exactly!

> ...so the source code contains not just the "TM", but a little wrapper

Just the cconventional main()

> to call it, including the input supplied to the "TM". (In my example
> the wrapper is main() which has hard coded literals for 3, 4.) If I
> change my mind and want to run it to add 5, 84, I can (must) change the
> code and recompile it to a new COFF file.
>
> Is this right? (It's an ok approach, easy for you to develop.)
>

Yes.

> If so, I can move on to my first serious question!
>
> Mike.
>

OK

olcott

unread,

Sep 7, 2020, 1:16:55 AM9/7/20

to

The idea is to make it as much like a Turing machine as possible and yet
allow "C" to be its programming language. All of its memory is a single
contiguous block as if it was a Turing machine tape.

For Turing machines all the "input" is already on the tape or it doesn't
exist.

Mike Terry

unread,

Sep 7, 2020, 2:04:52 PM9/7/20

to

ok, so now what about the (partial) halt decider H. How does H receive
its input data? As above, it would be passed the input by main(), but
how will that input be represented?

This has to take as input two parameters:

1) A TM description
2) A runtime data parm

It seems to me there are at least a couple of reasonable approaches:

- Represent both as arrays of bytes with a supplied length. The TM
description would be the /contents/ of the COFF file.

- Represent both as strings. The TM description would be the /name/ of
the COFF file.

The most "logical" approach (which respects the original intention of
the Linz definitions) is the array of bytes, but due to the other
architectural choices you've made, this way leads to a few headaches!

The strings approach /could/ work specifically for your purposes,
although not as a /general/ approach for working with halting TMs. The
problem with this as a general approach is that the TM ought to be able
to examine all aspects of the input TM to make its decision, but it
can't do that if all it has is a file name. (Or at least, it can't do
this unless you also include file I/O in your TM C spec, and you've
already said "no library functions". But maybe your specific H doesn't
need to examine the COFF data in this fashion.

Anyway, which way are you planning to go on this question?

Regards,
Mike.

olcott

unread,

Sep 7, 2020, 2:44:21 PM9/7/20

to

I have a perfect answer that I was going to post that I just now decided
to make trade secret for about one month. I don't want someone to take
my hints and publish my solution before I do.

Mike Terry

unread,

Sep 7, 2020, 3:51:47 PM9/7/20

to

I could ask how revealing how you intend to represent a "TM" description
within your C program constitutes a hint. But I know that once you get
these strange ideas there's no reasoning with you, so just get on with
it then.

Mike.

olcott

unread,

Sep 7, 2020, 4:25:36 PM9/7/20

to

I can't afford one chance in million of ruining my life's work.

olcott

unread,

Sep 7, 2020, 4:36:37 PM9/7/20

to

One thing that I have accomplished with you is that I am fairly
confident that you can see that I really am building an actual x86 based
UTM equivalent and not merely endlessly bull shitting about this.

Ben should have been able to see this on the basis of my x86utm
operating system interface:

// Saves the state of the current virtual-machine
void SaveState(u32* s);

// Loads the state of a saved virtual-machine
void LoadState(u32* s);

// Allocates memory from the heap
u32* Allocate(u32 size);

// The master machine executes the slave machine
void DebugStep(u32* master_state, u32* slave_state);

olcott

unread,

Sep 7, 2020, 8:46:07 PM9/7/20

to

On 9/7/2020 7:07 PM, Andy Walker wrote:
> On 07/09/2020 19:43, olcott wrote:

>> I have a perfect answer that I was going to post that I just now
>> decided to make trade secret for about one month. I don't want
>> someone to take my hints and publish my solution before I do.
>

>     If you don't want someone to publish before you, the best way
> is to establish priority by publishing at least something. You may
> recall that Andrew Wiles's proof of FLT was initially flawed, and
> there was a degree of scepticism; but once it was "out there", the
> proof was quite rapidly made correct. No-one disputes that it was
> his proof. Similarly, the Appel-Haken proof of the four-colour
> theorem followed more than a century of attempts, and was initially
> distrusted because of the compexity of the proof; but others
> produced simpler and independent proofs. Appel and Haken are still
> universally acknowledged; if they had waited, others might easily
> have got in first.
>

I may pre-publish the essence of my work through email to a few comp
theory college professors. The email should sufficiently establish an
audit trail. Now that I am so close to finishing I am sure that a
summation of this work would be construed as establishing priority at
least in light of the very soon to be completed work.

>     Publish what you have. It will be either refuted or refined.
> If you have something interesting, you will be grateful of the help
> in refining it. If you don't, then you need to stop wasting time
> pursuing red herrings.
>
>     If you want to publish while retaining "trade secrets", you
> may recall the 17thC precedent of [eg] Hooke publishing his law in
> the form of an anagram.
>

I may be able to publish everything except a PhD level quality verbiage
by next month. Since my code will fully speak for itself not having the
best words to describe it will merely delay academic publication and
thus have no lasting impact.

I may have something like 1500 hours in my x86 based UTM equivalent. At
least two months of that work (trying to do everything from scratch) was
discarded.

My x86utm has the superb basis of a really well written x86 emulator
that was written by a team of highly skilled developers over several
decades. It fully implements the both the 16-bit and 32-bit entire basic
instruction set of the Intel x86 language and no more.

I developed a COFF library for it so that the slightly enhanced system
can directly execute the COFF object files generated by the Microsoft
"C" compiler. It will probably also directly execute COFF object files
targeted for x86 by GCC. I am focusing on the Microsoft "C" because it
generates code that is easier to understand.

x86utm is concurrently implemented on both Windows and Linux. It will be
published as open source. I may also make x86utm into a web-service.

Re: Clarification of PO's UTM architecture and his H and H_Hat

olcott

olcott

Mike Terry

olcott

olcott

Mike Terry

olcott

Mike Terry

olcott

olcott

olcott