status update: x86 interpreter...

BGB / cr88192

Nov 9, 2009, 2:23:49 PM

seems I had not been telling people here about this, so, I just figured I
would give a status update...

x86 ISA, still "mostly" supported:

x87 has not been well tested (and even then, does not currently support
"subtle issues" such as rounding modes, ...);
SSE/SSE2, still not gotten to finishing up support as of yet (mostly lacking
is packed integer support, ...).

the x86 core ISA thus far seems to work in tests.

performance:

currently about 13 MIPS in tests (testing a simple loop doing lots of memory
IO and bit-twiddling);
in terms of time, the interpreter is about 76x slower than native (vs the
same loop compiled in native code).

the interpreter is currently pure C, and is not particularly
micro-optimized, so this factor could still be improved some (ASM and/or JIT
being options, but at the moment I am not considering them, since as I see
it, having the thing work acceptably is more important than speed at
present).

similarly, I have currently also excluded optimizations which would
unnecessarily mess up my current design practices (such as violating
modularity, or leading to overly large amounts of redundancy or special
cases, ...).

functionality:

I have a C library on the thing (PDPCLIB), and the code making use of said C
library seems to start up and work ok. effort is currently underway to be
able to support JNI as well (I am using JNI to interface with my native
object system, although it is not an exact match with the JVM object
system).

note that, in this case, I am generally importing APIs via a JNI-style
"structs of function pointers" strategy (although this is rather
inconvenient and tedious). my main reason for doing so is to avoid needing
physical linkage with other parts of my framework, and to more easily
allow falling back to native facilities (or automatically disabling
stuff), should stuff not be available.
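A minimal sketch of the "struct of function pointers" import style with a native fallback, for readers unfamiliar with the pattern. The names `HostFileApi` and `host_api_resolve`, the two slots, and the version field are all hypothetical; the post does not show the actual interface structs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical interface struct: the host fills in whatever it can provide.
   A NULL slot means "this facility is not available from the host". */
typedef struct HostFileApi {
    int version;
    void *(*open)(const char *name, const char *mode);
    int (*close)(void *fd);
} HostFileApi;

/* Fallbacks built on the native C library. */
static void *native_open(const char *name, const char *mode)
{
    return fopen(name, mode);
}

static int native_close(void *fd)
{
    return fclose((FILE *)fd);
}

/* Start from the native fallbacks, then override each slot the
   provided table actually implements. No link-time dependency on
   the provider is needed; only the struct layout is shared. */
HostFileApi host_api_resolve(const HostFileApi *provided)
{
    HostFileApi api = { 0, native_open, native_close };
    if (provided != NULL) {
        api.version = provided->version;
        if (provided->open != NULL)
            api.open = provided->open;
        if (provided->close != NULL)
            api.close = provided->close;
    }
    assert(api.open != NULL && api.close != NULL);
    return api;
}
```

Callers only ever go through the resolved table, so a missing host facility degrades to the native one instead of failing at link time.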

internally, some facilities are being marshalled into the interpreter via
JNI and JVMTI, and I am implementing similarly designed interfaces for other
pieces of functionality (VFS, ASM / native-code reflection, ...).

(note that my framework is NOT a JVM, even if I am using JNI and JVMTI to
marshal certain facilities, ...).

I am currently also working some on POSIX functionality (within the
virtualized world). I don't expect a "complete" implementation, but
hopefully enough for my uses.

currently supported POSIX features: basic file IO, dlopen/dlsym, ...
still lacking: more advanced file features (stat, readdir, ...), sockets,
pthreads, ...

I am making use of a posix-style "process" model (AKA: multiple running
processes, each with a PID, assigned UID and GID, ...).

I am considering using UID and GID for the security model, but may
"re-interpret" how they work (nevermind lacking support in my framework's
VFS).

"POSIX shell" support is nebulous (some "shell" related functionality has
been written, but I am not sure what here is actually relevant or what all
is worth trying to implement). shells may be attached to processes, but
currently are not themselves processes (IOW: they exist as native code).

I am still using PE/COFF EXE's and DLL's (but may consider leaving off the
'.EXE' extension, or allowing use of '.SO' for DLL's). little beyond
inconvenience (having the means to compile code as ELF on Windows, or
writing a loader) prevents using ELF. I "could" take a "flex" strategy and
use whatever is more convenient (vs forcing Linux builds to use PE/COFF),
although it can be noted that they are not strictly equivalent (and could
also pose issues for dumb build tools).

(technically, mostly all this is more a matter of aesthetics and available
compilers...).

for technical reasons, I may impose limits as to how much memory can be used
by virtual processes (otherwise, I would need a "proper" MMU and support for
swapping in order to work effectively on 32-bit hosts).

or such...

Rod Pemberton

Nov 10, 2009, 10:13:26 AM

"BGB / cr88192" <cr8...@hotmail.com> wrote in message
news:hd9q87$6cp$1...@news.albasani.net...

>
> x86 ISA, still "mostly" supported:

...

> x87 has not been well tested (and even then, does not currently support
> "subtle issues" such as rounding modes, ...);

Is this really needed?

What x86 code are you running through your emulator? BIOS? Video BIOS?

> SSE/SSE2, still not gotten to finishing up support as of yet (mostly
> lacking is packed integer support, ...).

Ditto.

> currently supported POSIX features: basic file IO, dlopen/dlsym, ...
> still lacking: more advanced file features (stat, readdir, ...), sockets,
> pthreads, ...

Doug Gwyn's PD libndir has readdir, opendir, seekdir, etc. directory
functions. It's usually named libndir.tar.z or libndir-posix.tar.z. The two
versions are slightly different. There is also posix-1.0-src-11.00.tar.gz
which has more files.

I had to locate a new link for them. This site has them:
http://ftp.lahtermaher.org/pub/unix-c/languages/c/

posix-1.0-src-11.00.tar.gz:
http://hpux.connect.org.uk/ftp/hpux/Languages/posix-1.0/

Rod Pemberton

BGB / cr88192

Nov 10, 2009, 11:28:36 AM

"Rod Pemberton" <do_no...@nohavenot.cmm> wrote in message
news:hdc024$939$1...@aioe.org...

> "BGB / cr88192" <cr8...@hotmail.com> wrote in message
> news:hd9q87$6cp$1...@news.albasani.net...
>>
>> x86 ISA, still "mostly" supported:
> ...
>
>> x87 has not been well tested (and even then, does not currently support
>> "subtle issues" such as rounding modes, ...);
>
> Is this really needed?
>
> What x86 code are you running through your emulator? BIOS? Video BIOS?
>

mostly userspace code, since it is essentially interpreting/emulating C code
compiled into a form usable from the simulated userspace.

it currently does not (completely) simulate either Ring-0 features, or the
hardware.
like a traditional userspace, it is assumed that all communication with the
"kernel" (in this case, the interpreter itself, and everything outside said
interpreter) is via system calls (since, as can be noted, traditional
userspace apps don't have any access to the HW anyways, only the flat 4GB
address space, of which I am currently dedicating 2GB as usable for the app,
1GB as "shared", and 1GB for internal interpreter use, mostly for swizzling
handles between address spaces, ...).
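The 2GB / 1GB / 1GB split described above can be pictured with a small classifier. This is only a sketch: the ordering of the regions (application lowest, then shared, then interpreter-internal at the top) and the rule that guest code may touch the shared region are assumptions, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { REGION_APP, REGION_SHARED, REGION_INTERNAL } Region;

/* Assumed layout of the flat 4GB guest address space. */
#define APP_LIMIT    0x80000000u  /* [0, 2GB): usable by the app        */
#define SHARED_LIMIT 0xC0000000u  /* [2GB, 3GB): "shared" region        */
                                  /* [3GB, 4GB): interpreter internal   */

Region classify_address(uint32_t addr)
{
    if (addr < APP_LIMIT)
        return REGION_APP;
    if (addr < SHARED_LIMIT)
        return REGION_SHARED;
    return REGION_INTERNAL;
}

/* Nonzero if a guest access of len bytes at addr stays below the
   internal region. The end is computed in 64 bits so that an access
   near the top of the address space cannot wrap around. */
int guest_may_access(uint32_t addr, uint32_t len)
{
    assert(len > 0);
    uint64_t end = (uint64_t)addr + len;  /* exclusive end */
    return end <= SHARED_LIMIT;
}
```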

FWIW, I could give a simulated app up to 3.5GB of address space (essentially
giving up the idea of a "common region" and limiting the raw number of
references which may be swizzled), but this is not needed, and OS's such as
Windows typically impose a 2GB app address-space limit anyways...

I may also consider address-space randomization as well, but have not done
so as of yet...

as for x87:
typically, it is locked into full 80-bit mode, and the low order bits are
subject to "whatever" roundoff.

some operations only have 64 bit accuracy (currently, mostly for things like
sin/cos/...).

>> SSE/SSE2, still not gotten to finishing up support as of yet (mostly
>> lacking is packed integer support, ...).
>
> Ditto.
>

yep.

I also have a few "bits and pieces" of MMX functionality, but this is not a
particularly high priority (since, I figure, almost nothing really uses MMX
anyways...).

>> currently supported POSIX features: basic file IO, dlopen/dlsym, ...
>> still lacking: more advanced file features (stat, readdir, ...), sockets,
>> pthreads, ...
>
> Doug Gwyn's PD libndir has readdir, opendir, seekdir, etc. directory
> functions. It's usually named libndir.tar.z or libndir-posix.tar.z. The
> two versions are slightly different. There is also
> posix-1.0-src-11.00.tar.gz which has more files.
>
> I had to locate a new link for them. This site has them:
> http://ftp.lahtermaher.org/pub/unix-c/languages/c/
>
> posix-1.0-src-11.00.tar.gz:
> http://hpux.connect.org.uk/ftp/hpux/Languages/posix-1.0/
>

well, what is to say they are compatible with my implementation?...

(well, even if not entirely, maybe the headers and some of the code may be
useful, and I can hack on the rest...).

I may look, but thus far most operations of this sort are simply wrappers
around system calls, with the actual machinery taking place in the "host
app". the main exception is the C library, since it does a lot of things
which can't be directly provided via system calls, and allows marshalling a
much more complex interface (the C90 runtime) through a simpler interface (a
relatively smaller number of basic system calls).

the main reason I don't have readdir already is because the interpreter is
connected to the VFS via an older interface struct which does not provide
the needed API calls (basically, I am using the same interface which was
used for my assembler, which only provided for basic file IO), and I am in
the process of designing a new VFS interface struct (which may also provide
for sockets), but this has not yet been finished.

this would be followed by re-routing the traffic via the new interface, and
maybe implementing the needed syscalls (in the interpreter, and also in the
'vxcore' DLL).

note: 'vxcore.dll' essentially serves a similar role to 'kernel32.dll' and
'ntdll.dll' in Windows (AKA: mostly raw system calls and wrappers and
similar).

for example, I emulate some calls via system calls which are often following
a mix of POSIX and Win32 conventions.

for example, the libdl is faked via syscalls which fake, for example,
'LoadLibrary' and 'GetProcAddress', and others via a faked 'VirtualAlloc',
... ('mmap' would be a wrapper over 'VirtualAlloc' and 'CreateFileMapping'
and similar...).

this is partly because, at the basic level, I am implementing syscalls using
whichever style is easiest to emulate...

similarly, there is a big chunk of syscall space currently dedicated to JNI
(I am marshalling the calls, so both struct-based JNI, and a more direct
interface to the host object system, may be provided via this means).

an awkwardness though is that x86 is not JBC, and so there is no good way at
present to handle class and interface definition in 'virtual' code
('classloader' doesn't really work, and the facilities provided by 'dyClass'
don't route through JNI, ...).

(I may instead resort to a trick involving statically defining data via
statically-initialized structs and strings and using special exports to
describe it...).

or such...

>
> Rod Pemberton
>
>

Rod Pemberton

Nov 10, 2009, 5:56:06 PM

"BGB / cr88192" <cr8...@hotmail.com> wrote in message

news:hdc5b7$p2m$1...@news.albasani.net...
>
> [POSIX support code]

>
> well, what is to say they are compatible with my implementation?...
>

Nothing... But, you mentioned PDPCLIB somewhere. I don't think it has them
either, does it?

Rod Pemberton

BGB / cr88192

Nov 10, 2009, 6:37:44 PM

"Rod Pemberton" <do_no...@nohavenot.cmm> wrote in message

news:hdcr5j$dis$1...@aioe.org...

nope, but it is not strictly a thin wrapper either.

PDPCLIB essentially wraps a small number of syscalls, and exports a much
more complex API.

much of POSIX, however, would seem to be things which would be directly
implemented via syscalls.

but, as I said earlier, I would look into this, and see if a lot can be
reused, I have just not been able to today as I had been off at classes and
stuff...

and, worst case, it will probably at least provide some usable headers (for
which I can fill in the backend logic, ...).

so, more status for later.

or such...

>
> Rod Pemberton
>
>

BGB / cr88192

Nov 10, 2009, 8:40:50 PM

"Rod Pemberton" <do_no...@nohavenot.cmm> wrote in message

news:hdcr5j$dis$1...@aioe.org...

checking the provided links...

checking:
yeah... a lot of this code is almost as old as I am...
also generally pre-ANSI, ...

the readdir code apparently opens directories as files, and reads them
directly.
this much will not exactly work in my case, as my VFS does not exactly
implement "this" ability.

but, maybe a few 'bits and pieces' could be usable though...

>
> Rod Pemberton
>
>

Rod Pemberton

Nov 11, 2009, 10:47:15 AM

"BGB / cr88192" <cr8...@hotmail.com> wrote in message

news:hdd4n6$bcj$1...@news.albasani.net...

>
> a lot of this code is almost as old as I am...

:-)

The GNU project started in 1983. The Linux kernel started in 1991.

Hmm, I wonder if I can find code from the 1970's in a modern GNU codebase such as
DJGPP? Yes. After five minutes of looking, I found 1976 in GAWK's io.c.

Copyright dates from the code in the DJGPP (GNU codebase) compiler: 2004,
2003, 2001, 2000, (many 1990's), 1989, 1987, 1986, 1985, 1984, 1983, ...,
1980, 1979, 1976

Admittedly, many of the dates are not of the compiler proper (GCC), but of
other GNU packages, e.g., GAWK, and DJGPP contributed packages. The bulk of
the dates in DJGPP seem to be from 1987 to 2002, with many around the early
1990's (Linux), mid-to-late 1990's, and early 2000's. DJGPP also uses its
own C library, so my search didn't get dates for GLIBC. I'd suspect some
old code still lingers there.

Rod Pemberton

BGB / cr88192

Nov 11, 2009, 1:43:41 PM

"Rod Pemberton" <do_no...@nohavenot.cmm> wrote in message

news:hdemdf$ft7$1...@aioe.org...

> "BGB / cr88192" <cr8...@hotmail.com> wrote in message
> news:hdd4n6$bcj$1...@news.albasani.net...
>>
>> a lot of this code is almost as old as I am...
>
> :-)
>
> The GNU project started in 1983. The Linux kernel started in 1991.
>

yep...

nevermind that the GNU project started shortly before I was born, and I was
not exactly all that old when Linux started...

> Hmm, I wonder if I can find code from the 1970's in a modern GNU codebase
> such as DJGPP? Yes. After five minutes of looking, I found 1976 in GAWK's
> io.c.
>

yes, maybe this being why K&R style declarations never die...

> Copyright dates from the code in the DJGPP (GNU codebase) compiler: 2004,
> 2003, 2001, 2000, (many 1990's), 1989, 1987, 1986, 1985, 1984, 1983, ...,
> 1980, 1979, 1976
>
> Admittedly, many of the dates are not of the compiler proper (GCC), but of
> other GNU packages, e.g., GAWK, and DJGPP contributed packages. The bulk
> of the dates in DJGPP seem to be from 1987 to 2002, with many around the
> early 1990's (Linux), mid-to-late 1990's, and early 2000's. DJGPP also
> uses its own C library, so my search didn't get dates for GLIBC. I'd
> suspect some old code still lingers there.
>

yes, ok.

I did go and beat together readdir support now, where a lot more of this
work is going on in the interpreter than in the VM code...

one 'slight' issue that popped up was that the interface I am using (in
native land) has no analogue of seekdir/telldir, but quickly enough I had
the idea that 'readdir' would keep count of how many directory entries it
had seen, and seekdir would basically just do a rewinddir followed by N
readdir calls (thus meaning no need to go bother with all of the costs of
going and adding this concept to the lower end APIs...).
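The counting trick described above can be sketched as follows. The low-level handle here (`LowDir`, with only "rewind" and "read next") is a hypothetical stand-in for the native-side interface, which the post does not show; the `v_*` wrapper names are also made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical low-level directory handle: it can only rewind and
   read the next entry, with no notion of a seekable position. */
typedef struct LowDir {
    const char **names;   /* entries, in a fixed order */
    size_t count;
    size_t cursor;
} LowDir;

static void low_rewind(LowDir *d)
{
    d->cursor = 0;
}

static const char *low_read(LowDir *d)
{
    return d->cursor < d->count ? d->names[d->cursor++] : NULL;
}

/* Wrapper that adds telldir/seekdir by counting entries returned. */
typedef struct VDir {
    LowDir *low;
    long pos;             /* number of entries returned so far */
} VDir;

const char *v_readdir(VDir *d)
{
    const char *name = low_read(d->low);
    if (name != NULL)
        d->pos++;
    return name;
}

void v_rewinddir(VDir *d)
{
    low_rewind(d->low);
    d->pos = 0;
}

long v_telldir(const VDir *d)
{
    return d->pos;
}

/* seekdir = rewinddir followed by pos readdir calls. */
void v_seekdir(VDir *d, long pos)
{
    assert(pos >= 0);
    v_rewinddir(d);
    while (d->pos < pos && v_readdir(d) != NULL)
        ;
}
```

Each seek costs as many reads as the target position, and the result is only meaningful if the directory contents do not change between the tell and the seek. For small directories that trade-off is usually acceptable.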

involved in the process was a rather severe rewrite of the native-land
file-IO code (mostly changing the method of interfacing the interpreter with
my VFS), ...

the header I am using (came from MinGW, also PD), also includes _WDIR,
_wreaddir, ... (apparently for unicode versions), and I am not sure if I
should bother implementing them.

technically, in my case the 'ASCII' versions are using UTF-8 (actually, it
is the 'Modified UTF-8' scheme from the JVM, which is sort of the de-facto
charset in my codebase), so the main issue would be to add code for doing
string conversions to this library.

however, if I were to do this, I would likely have to address pdpclib's lack
of 'wstring.h' (AKA: go and find some "nice" way to address adding in C99
functionality), ...

the original author expressed doubts about adding C99 functionality (I guess
because it could interfere with simplicity and usability on more
minimalistic systems), and I would personally like to avoid an unnecessary
fork. however, I would like to avoid needing a separate DLL for this (AKA: I
would like it if all the core C runtime stuff was in the same DLL), ...

oh well, other details:
for some not easily understood reason, using a linear search is currently
faster for address-mapping resolution than a binary search.

my guess is that with (currently) only about 5 mappings (the EXE, the
pdpclib and vxcore DLLs, the stack, and the heap), the overhead of the
binary search is higher than that of just a plain 'for' loop.

actually, I think it is because the linear loop in this case will find the
item in an average of 2.5 steps, whereas the binary search would take 3
steps + a final adjustment.

I added partial address randomization (currently for heap and stack), but
was having difficulty with randomizing the EXE and DLL's (I would get a VM
GPF...).

I realize now that I had not verified that the EXE was not
relocation-stripped, and in this case the relocations are necessary to
properly re-base the PE/COFF image (I guess I could add a check and test,
and if this is the case maybe look into telling the linker to produce
relocatable EXE's...).

I may also need to look into "minimum alignment" issues (for tests, I was
keeping a 16-byte alignment). I am not sure if any code can "reasonably"
depend on the exact alignment of its load address though ("I'm so totally
going to GPF if my global array is not 4KiB aligned..."), so maybe it is ok
for now.

just checked, yep.

it now verifies that the image has relocs before randomizing the address,
and this makes the thing work. granted, in this case, the EXE is not
randomized, but DLL's seem to work ok...
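One way to make the "has relocs" check concrete is the `IMAGE_FILE_RELOCS_STRIPPED` bit (0x0001) in the COFF file header's Characteristics field, which the PE/COFF format defines for images whose base relocations were removed. Whether the loader discussed here tests this flag or looks for the base relocation directory instead is not stated, so treat the sketch below as an assumption; `choose_load_base` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Defined by the PE/COFF format: set in the COFF file header's
   Characteristics field when base relocations have been stripped. */
#define IMAGE_FILE_RELOCS_STRIPPED 0x0001u

/* Pick a load base: only use the randomized address if the image
   still carries the relocations needed to rebase it. */
uint32_t choose_load_base(uint16_t characteristics,
                          uint32_t preferred_base,
                          uint32_t random_base)
{
    assert(preferred_base != 0);
    if (characteristics & IMAGE_FILE_RELOCS_STRIPPED)
        return preferred_base;  /* cannot rebase: keep the linked address */
    return random_base;
}
```

With this check a relocation-stripped EXE simply loads at its linked address, while DLLs (which normally keep their relocations) get the randomized base.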

basically, this is a simplistic version, where it mostly just jitters things
in 16-byte steps, with a max of 256 steps, meaning a 4KiB overhead per
structure (loaded DLL, stack, heap-segment, ...). this should be small
enough to be ignorable (an 0.4% heap overhead, ...).
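The jitter arithmetic above works out as 256 steps of 16 bytes, so 4096 bytes reserved per structure. A minimal sketch, assuming the random value comes from some external source and using a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

#define JITTER_STEP  16u   /* bytes per step; keeps 16-byte alignment */
#define JITTER_STEPS 256u  /* number of distinct offsets              */

/* Shift an aligned base address by a random multiple of 16 bytes.
   The offset ranges over 0, 16, ..., 4080, so reserving
   JITTER_STEP * JITTER_STEPS = 4096 extra bytes always suffices. */
uint32_t jitter_base(uint32_t base, uint32_t rnd)
{
    assert(base % JITTER_STEP == 0);
    return base + (rnd % JITTER_STEPS) * JITTER_STEP;
}
```

A 4KiB reservation is about 0.39% of 1MiB, which would match the quoted 0.4% figure if heap segments are around 1MiB; that segment size is an inference and is not stated in the post.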

or such...

Rod Pemberton

Nov 12, 2009, 12:02:14 AM

"BGB / cr88192" <cr8...@hotmail.com> wrote in message

news:hdf0kv$d18$1...@news.albasani.net...

>
> I did go and beat together readdir support now

Already? Young dude, you are fast...

> however, if I were to do this, I would likely have to address pdpclib's
> lack of 'wstring.h' (AKA: go and find some "nice" way to address adding in
> C99 functionality), ...

I'm not sure about wstring.h, but I'm sure I posted the link a few times for
Doug Gwyn's "Instant C9x":
http://www.lysator.liu.se/c/q8/index.html

> the original author expressed doubts about adding C99 functionality (I
> guess because it could interfere with simplicity and usability on more
> minimalistic systems),

PDPCLIB? That could be one issue. Another issue might be size. Size is
why Paul Edwards (kerravon) said they are using PDPCLIB for GCCMVS project
instead of GLIBC. Sigh. That's one more thing I still need to do: setup an
MVS380 environment so I can test some of my code with EBCDIC...

http://gccmvs.sourceforge.net/
http://mvs380.sourceforge.net/
http://sourceforge.net/projects/pdos/

> and I would personally like to avoid an unnecessary
> fork, however, I would like to avoid needing a separate DLL for this (AKA:
> I would like it if all the core C runtime stuff was in the same DLL), ...

MSVCRT... Oops, wrong library. :)

> for some not easily understood reason, using a linear search is currently
> faster for address-mapping resolution than a binary search.

...

While I understand what big-O representation is for, I don't know what
either would be or which should take more time in this case. As the length
of the linear search gets longer, it should take more time. One would think
a binary search was more "compact" or shorter due to searching fewer nodes.
Binary trees can get quite large also. It's possible the binary tree is not
always the optimal method, but the optimal method on average. I.e.,
something at the end of a large linear search won't be optimal, but
something at the beginning will be. In a binary tree, many values will be
"average", i.e., the search will neither be very long, nor be very short.
I'm fairly sure the graph of a linear search is linear while the graph of
the binary search could be very chaotic plot or very smooth curve in any
direction or shape depending on how the nodes are balanced, where they are
inserted, or on what branch decision is used.

What I do know is that a repeat prefix on an x86 string instruction is very
fast. I believe it's fast enough that converting from a rep to a loop or
using a jcc branch, e.g., needed to implement some theoretically more
optimal search solution, can completely kill the performance of the
theoretically optimal solution. Now, it seems you didn't have much data in
your search. What happens if you put in lots of data just for kicks? Is my
belief correct? I.e., that the linear search with rep and string
instructions is faster than a binary search when it shouldn't be in
theory... Well, you're doing this in C, right? Never mind.

> I added partial address randomization (currently for heap and stack), but
> was having difficulty with randomizing the EXE and DLL's (I would get a VM
> GPF...).

In the compiler? I.e., not inside an OS. Interesting. It's probably the
best place to do so. I'm unaware of anyone else doing so.

> I may also need to look into "minimum alignment" issues (for tests, I was
> keeping a 16-byte alignment). I am not sure if any code can "reasonably"
> depend on the exact alignment of its load address though ("I'm so totally
> going to GPF if my global array is not 4KiB aligned..."),

Self re-aligning code? Nah, you'd need the code segment to be rewritable...

Rod Pemberton

BGB / cr88192

Nov 12, 2009, 1:44:16 AM

"Rod Pemberton" <do_no...@nohavenot.cmm> wrote in message

news:hdg501$2gf$1...@aioe.org...

> "BGB / cr88192" <cr8...@hotmail.com> wrote in message
> news:hdf0kv$d18$1...@news.albasani.net...
>>
>> I did go and beat together readdir support now
>
> Already? Young dude, you are fast...
>

it is mostly about going and patching things together...

the low-level VFS already had something analogous to readdir, so I first had
to be able to route it into the interpreter (via a new interface struct),
write some utility wrappers, and the logic necessary for the syscalls.

>> however, if I were to do this, I would likely have to address pdpclib's
>> lack of 'wstring.h' (AKA: go and find some "nice" way to address adding in
>> C99 functionality), ...
>
> I'm not sure about wstring.h, but I'm sure I posted the link a few times
> for
> Doug Gwyn's "Instant C9x":
> http://www.lysator.liu.se/c/q8/index.html
>

didn't see this, but will look at it.

>> the original author expressed doubts about adding C99 functionality (I
>> guess because it could interfere with simplicity and usability on more
>> minimalistic systems),
>
> PDPCLIB? That could be one issue. Another issue might be size. Size is
> why Paul Edwards (kerravon) said they are using PDPCLIB for GCCMVS project
> instead of GLIBC. Sigh. That's one more thing I still need to do: setup
> an
> MVS380 environment so I can test some of my code with EBCDIC...
>
> http://gccmvs.sourceforge.net/
> http://mvs380.sourceforge.net/
> http://sourceforge.net/projects/pdos/
>

I don't support EBCDIC, so I presume to live in a world where only UTF-8
exists...

granted, proper C support requires baseline support for codepages/... but oh
well...

>> and I would personally like to avoid an unnecessary
>> fork, however, I would like to avoid needing a separate DLL for this
>> (AKA: I would like it if all the core C runtime stuff was in the same
>> DLL), ...
>
> MSVCRT... Oops, wrong library. :)
>

MSVCRT is not open source, and likely depends on a lot of facilities I don't
provide (core Windows syscalls, ...).

>> for some not easily understood reason, using a linear search is currently
>> faster for address-mapping resolution than a binary search.
> ...
>
> While I understand what big-O representation is for, I don't know what
> either would be or which should take more time in this case. As the
> length
> of the linear search gets longer, it should take more time. One would
> think
> a binary search was more "compact" or shorter due to searching fewer
> nodes.
> Binary trees can get quite large also. It's possible the binary tree is
> not
> always the optimal method, but the optimal method on average. I.e.,
> something at the end of a large linear search won't be optimal, but
> something at the beginning will be. In a binary tree, many values will be
> "average", i.e., the search will neither be very long, nor be very short.
> I'm fairly sure the graph of a linear search is linear while the graph of
> the binary search could be very chaotic plot or very smooth curve in any
> direction or shape depending on how the nodes are balanced, where they are
> inserted, or on what branch decision is used.
>

I was doing the array version of a binary search (no trees, only a sorted
array).

the issue is that I think that the average number of steps is larger, since
for small n,
ceil(log2 n) > n/2

this is because the variant of binary search I use always takes ceil(log2 n)
steps (I had failed to improve upon this, as an "early out check" cost a lot
more than it saved, ...), whereas on average a linear search will find the
result "somewhere near the middle".

 n   binary: ceil(log2 n)   linear: n/2
 1   -                      -
 2   1                      1
 3   2                      1.5
 4   2                      2
 5   3                      2.5
 6   3                      3
 7   3                      3.5
 8   3                      4
 9   4                      4.5
10   4                      5
11   4                      5.5
12   4                      6
13   4                      6.5
14   4                      7
15   4                      7.5
16   4                      8

so, for n<10 or so, it is really a toss up in terms of complexity.

for 5, binary has a slightly higher complexity than linear, as well as being
an otherwise more complicated piece of code...

similarly, I had also made a slight tweak in the linear search to exploit
the sorting (the first 'if' check either results in a 'continue', or if it
fails, indicates a terminal position in the loop meaning either the item is
found or absent).

so, for a binary search to be faster, n would need to be larger...
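The two lookups being compared can be sketched like this. The `Span` layout (base plus inclusive last address) and the function names are hypothetical; the linear version uses the sorted-order tweak described above, and the binary version is the array form without an early-out check, with one final adjustment at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Span {
    uint32_t base;   /* first address covered by the mapping */
    uint32_t last;   /* last address covered (inclusive)     */
} Span;

/* Linear search over spans sorted by address. The first test either
   continues, or marks the terminal position: the address is in this
   span or in no span at all. Returns the index, or -1 if unmapped. */
int span_find_linear(const Span *spans, int n, uint32_t addr)
{
    assert(spans != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        if (addr > spans[i].last)
            continue;
        return addr >= spans[i].base ? i : -1;
    }
    return -1;
}

/* Binary search over the same sorted array: always narrows the window
   down to one candidate (about ceil(log2 n) steps), then checks it. */
int span_find_binary(const Span *spans, int n, uint32_t addr)
{
    assert(spans != NULL || n == 0);
    int lo = 0, hi = n;                 /* candidate window [lo, hi) */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (addr > spans[mid].last)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the first span whose last address is >= addr. */
    if (lo < n && addr >= spans[lo].base)
        return lo;
    return -1;
}
```

With about five spans, the linear loop does one compare per step and is trivially predictable, which is consistent with the observation that it wins at this size.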

> What I do know is that a repeat prefix on an x86 string instruction is
> very
> fast. I believe it's fast enough that converting from a rep to a loop or
> using a jcc branch, e.g., needed to implement some theoretically more
> optimal search solution, can completely kill the performance of the
> theoretically optimal solution. Now, it seems you didn't have much data
> in
> your search. What happens if you put in lots of data just for kicks? Is
> my
> belief correct? I.e., that the linear search with rep and string
> instructions is faster than a binary search when it shouldn't be in
> theory... Well, you're doing this in C, right? Never mind.
>

yep...

this is not for string lookups, this is for mapping addresses to memory
spans.
hence, it is an operation performed pretty much every time an opcode
attempts to access a memory operand...

mov eax, [ebp+8]
add [eax+16], ecx
...

as is, this operation is also significant in the running time, but there
seems not much way at present to speed it up (even a hash would not likely
help here, as it would likely add more overhead than it saves...).

>> I added partial address randomization (currently for heap and stack), but
>> was having difficulty with randomizing the EXE and DLL's (I would get a
>> VM
>> GPF...).
>
> In the compiler? I.e., not inside an OS. Interesting. It's probably the
> best place to do so. I'm unaware of anyone else doing so.
>

no, it is in the PE/COFF loader...

>> I may also need to look into "minimum alignment" issues (for tests, I was
>> keeping a 16-byte alignment). I am not sure if any code can "reasonably"
>> depend on the exact alignment of its load address though ("I'm so totally
>> going to GPF if my global array is not 4KiB aligned..."),
>
> Self re-aligning code? Nah, you'd need the code segment to be
> rewritable...
>

I meant, just loading in an EXE and DLL and shifting it by a random multiple
of 16 bytes, and doing the DLL rebase magic...

for now, it seems to work ok, but I am not sure if anything will depend on a
coarser alignment, and thus risk breaking...

>
> Rod Pemberton
>
>

Steve

Nov 12, 2009, 8:04:12 AM

"BGB / cr88192" <cr8...@hotmail.com> writes:
>
>technically, in my case the 'ASCII' versions are using UTF-8 (actually, it
>is the 'Modified UTF-8' scheme from the JVM, which is sort of the de-facto
>charset in my codebase), so the main issue would be to add code for doing
>string conversions to this library.

Hello,

Could you explain the modification, and its advantage?

Thanks

Steve N.

BGB / cr88192

Nov 12, 2009, 9:45:20 AM

"Steve" <Bo...@Embarq.com> wrote in message news:hdh14b$1li$1...@aioe.org...

main differences:
it is possible to encode literal 0 characters in a string as the bytes 0xC0
0x80 (although this is rarely used in practice, as it is problematic in
handling code...).

in standard UTF-8, characters >= 65536 are encoded directly (as 4 bytes);
in UTF-16, they are encoded via "surrogate pairs", which extend the space by
about 1M code points (up to U+10FFFF).
in Modified UTF-8, characters >= 65536 are encoded as surrogate pairs, with
each surrogate then encoded as UTF-8.

the main advantage of M/UTF-8 in this case is that it is much closer to a
1:1 mapping with UTF-16 (since they are converted back and forth simply by
converting the values, rather than having to understand the more subtle
rules of each).

however, as a cost, it implies that chars >= 65536 are multiple characters,
and also may inflate the string some if many of these characters are used
(for example, because a character uses 6 bytes rather than 4, ...).

note that, technically the formats are relatively compatible, and most of my
handling code can deal with both formats fairly transparently.
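A small encoder makes the two differences concrete: code point 0 becomes the bytes 0xC0 0x80, and anything at or above 65536 is split into a UTF-16 surrogate pair whose halves are each written as a 3-byte sequence. This is a sketch of the JVM-style scheme as described above, not code from the author's framework; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write one 16-bit unit as 1 to 3 bytes. Zero deliberately takes the
   2-byte form (C0 80) so encoded strings never contain a 0 byte. */
static size_t put_unit(uint16_t u, unsigned char *out)
{
    if (u != 0 && u < 0x80) {
        out[0] = (unsigned char)u;
        return 1;
    }
    if (u < 0x800) {
        out[0] = (unsigned char)(0xC0 | (u >> 6));
        out[1] = (unsigned char)(0x80 | (u & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (u >> 12));
    out[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (u & 0x3F));
    return 3;
}

/* Encode one code point as Modified UTF-8. out needs room for 6
   bytes. Returns the number of bytes written. */
size_t mutf8_encode(uint32_t cp, unsigned char *out)
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x10000)
        return put_unit((uint16_t)cp, out);
    cp -= 0x10000;                       /* 20 bits remain */
    size_t n = put_unit((uint16_t)(0xD800 | (cp >> 10)), out);
    return n + put_unit((uint16_t)(0xDC00 | (cp & 0x3FF)), out + n);
}
```

For example, U+10400 comes out as the six bytes ED A0 81 ED B0 80, where standard UTF-8 would use the four bytes F0 90 90 80.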

> Thanks
>
> Steve N.

BGB / cr88192

Nov 12, 2009, 10:45:49 AM

"BGB / cr88192" <cr8...@hotmail.com> wrote in message

news:hd9q87$6cp$1...@news.albasani.net...

> seems I had not been telling people here about this, so, I just figured I
> would give a status update...
>

and now is another update...

>
> currently about 13 MIPS in tests (testing a simple loop doing lots of
> memory IO and bit-twiddling);
> in terms of time, the interpreter is about 76x slower than native (vs the
> same loop compiled in native code).
>

still about the same, but it is variable as "trivial" changes end up making
it faster or slower...

> the interpreter is currently pure C, and is not particularly
> micro-optimized, so this factor could still be improved some (ASM and/or
> JIT being options, but at the moment I am not considering them, since as I
> see it, having the thing work acceptably is more important than speed at
> present).
>

a few idle thoughts for how to approach JIT have occurred though, and so I
may do this eventually.
current thinking is that a piece of logic would be connected between the
decoder and interpreter, which would scan forwards, and possibly replace a
group of instructions with a single 'pseudo-instruction' which would encode
this whole group.

the external behavior would be about the same as the single instruction
case, only that the 'handler' function is actually a chunk of JITed code.
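The shape of this idea can be sketched without a real code generator: each decoded instruction carries a handler pointer, and a pass between decoder and interpreter rewrites a recognized group into one pseudo-instruction. In this sketch the fused handler is an ordinary C function standing in for JITed code, the `Op` layout and all names are hypothetical, and x86 flag updates are ignored.

```c
#include <assert.h>
#include <stdint.h>

typedef struct Cpu { uint32_t reg[8]; } Cpu;

typedef struct Op Op;
typedef void (*Handler)(Cpu *cpu, const Op *op);

struct Op {
    Handler run;     /* per-instruction handler, or a fused one     */
    int dst, src;    /* register operands                           */
    uint32_t imm;    /* immediate (shift count here)                */
    int length;      /* how many decoded ops this entry covers      */
};

static void op_add(Cpu *c, const Op *o)
{
    c->reg[o->dst] += c->reg[o->src];
}

static void op_shl_imm(Cpu *c, const Op *o)
{
    c->reg[o->dst] <<= o->imm;
}

/* Stand-in for generated code: "add dst, src" then "shl dst, imm". */
static void op_add_shl(Cpu *c, const Op *o)
{
    c->reg[o->dst] = (c->reg[o->dst] + c->reg[o->src]) << o->imm;
}

/* Scan forward; where an add is followed by a shl of the same
   register, rewrite the first op to cover both. Returns the number
   of groups fused. */
int fuse_ops(Op *ops, int n)
{
    int fused = 0;
    for (int i = 0; i + 1 < n; i++) {
        if (ops[i].run == op_add && ops[i + 1].run == op_shl_imm &&
            ops[i].dst == ops[i + 1].dst) {
            ops[i].run = op_add_shl;
            ops[i].imm = ops[i + 1].imm;
            ops[i].length = 2;
            fused++;
            i++;                 /* skip the op that was absorbed */
        }
    }
    return fused;
}

/* The dispatch loop is unchanged: it calls the handler and advances
   by however many original ops that handler accounted for. */
void run_ops(Cpu *cpu, const Op *ops, int n)
{
    for (int i = 0; i < n; i += ops[i].length) {
        assert(ops[i].length >= 1);
        ops[i].run(cpu, &ops[i]);
    }
}
```

The point of the design is that the interpreter loop does not need to know whether a handler is a plain C function or a generated chunk; only the entry in the decoded stream changes.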

however, this is not an immediate priority.

JNI, still not fully implemented...

>
> I am currently also working some on POSIX functionality (within the
> virtualized world). I don't expect a "complete" implementation, but
> hopefully enough for my uses.
>
> currently supported POSIX features: basic file IO, dlopen/dlsym, ...
> still lacking: more advanced file features (stat, readdir, ...), sockets,
> pthreads, ...
>

readdir has been added.

stat has not (stat would actually have to be implemented within my VFS, as
my VFS does not presently include this feature, in the general sense).
sockets is similar to stat.

granted, I could try to map it directly to the OS level sockets (rather than
trying to add it to and route it through the VFS). this would require a
little fiddling to avoid breaking the POSIX-style handles, which can refer
to either files or sockets.

pthreads, at present, would require adding a scheduler, and possibly moving
the interpreter logic into its own (OS-level) thread (may make this part
optional).
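
one possible shape for such a scheduler, as a minimal sketch only: a round-robin loop that gives each virtual thread a fixed quantum of instructions. the names, the quantum size, and the stubbed "run" step are all assumptions made for illustration; a real version would call into the interpreter core and handle blocking, priorities, and so on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VM_QUANTUM 100   /* assumed number of instructions per time slice */

/* Hypothetical virtual thread; 'remaining' stands in for real CPU state. */
typedef struct VmThread {
    int      alive;      /* nonzero while the thread has work left */
    uint32_t remaining;  /* instructions left to run (stub) */
    uint32_t executed;   /* instructions run so far */
} VmThread;

/* Run one thread for up to 'quantum' instructions (stubbed interpreter step). */
static void vm_thread_run(VmThread *t, uint32_t quantum) {
    uint32_t n = t->remaining < quantum ? t->remaining : quantum;
    t->remaining -= n;
    t->executed  += n;
    if (t->remaining == 0)
        t->alive = 0;
}

/* Round-robin until every thread has finished; returns the slices used. */
static unsigned vm_schedule(VmThread *threads, size_t count) {
    assert(threads != NULL || count == 0);
    unsigned slices = 0;
    int any_alive;
    do {
        any_alive = 0;
        for (size_t i = 0; i < count; i++) {
            if (!threads[i].alive)
                continue;
            vm_thread_run(&threads[i], VM_QUANTUM);
            slices++;
            any_alive = 1;
        }
    } while (any_alive);
    return slices;
}
```

this variant runs entirely inside one OS-level thread; moving the loop into its own OS-level thread would be a separate step.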

UID/GID/shell: no change.

>
> I am still using PE/COFF EXE's and DLL's (but may consider leaving off the
> '.EXE' extension, or allowing use of '.SO' for DLL's). little beyond
> inconvenience (having the means to compile code as ELF on Windows, or
> writing a loader) prevents using ELF. I "could" take a "flex" strategy and
> use whatever is more convenient (vs forcing Linux builds to use PE/COFF),
> although it can be noted that they are not strictly equivalent (and could
> also pose issues for dumb build tools).
>

I have now added a basic form of ASLR:
http://en.wikipedia.org/wiki/ASLR

at present, it mostly just jitters the base addresses of some structures
(the load base of DLLs, the stack top, ...). for the heap, it reserves a
small random-size space at the front (0-4kB), thus affecting the exact
positions of subsequent alignments.
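
a minimal sketch of this kind of jitter, under stated assumptions: 4 kB pages, a 16-byte heap granularity, and the C library `rand()` as the random source (fine for illustration, but not a strong source of randomness). the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE  0x1000u  /* assumed 4 kB page */
#define HEAP_ALIGN 16u      /* assumed allocation granularity */

/* Jitter a preferred base address by a random number of whole pages
 * in the range [0, max_pages). */
static uint32_t aslr_jitter_base(uint32_t preferred, uint32_t max_pages) {
    assert(max_pages > 0);
    return preferred + ((uint32_t)rand() % max_pages) * PAGE_SIZE;
}

/* Size of the random pad reserved at the front of the heap:
 * 0..4 kB inclusive, in HEAP_ALIGN steps, so later allocations shift
 * but stay aligned. */
static uint32_t aslr_heap_pad(void) {
    return ((uint32_t)rand() % (PAGE_SIZE / HEAP_ALIGN + 1)) * HEAP_ALIGN;
}
```

the same `aslr_jitter_base` call could be used for a DLL load base or a stack top; only the preferred address and the allowed range would differ.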

I am currently trying to think up a way to do "generalized" address space
randomization without risking introducing too much fragmentation.

>
> for technical reasons, I may impose limits as to how much memory can be
> used by virtual processes (otherwise, I would need a "proper" MMU and
> support for swapping in order to work effectively on 32-bit hosts).
>

not yet addressed...

wolfgang kern


Nov 12, 2009, 3:06:13 PM

"BGB / cr88192" posted here and in AOD:

...

> and now is another update...

...

I'm still not sure that I'm able to follow/understand your ideas,
but anyway all new ideas are worth thinking over at least ...

Yeah there could be a one and only interpreter for all CPU's
and OS's, but this would need to shrink all sourcecode down to a
very limited (yours? or whoever's?) functionality.

In general I like this idea, but we will easily find ourselves apart when
it comes to speed and size related to performance.

My solution for OS (version-)independent coding of addons/applications
is an OS-related script-language. So whenever I upgrade my OS, the
already sold applications may gain speed, but nothing else ...

Sure there might be new features which allow other methods,
but the customers decide to use it or keep the old method.

Please give us an answer (perhaps first to yourself) what kind of
programming style you have in mind here [fast?/smart?/short?/easy?].

a common saying:
We can't get everything, and for sure not all at the very same time!

my personal preference [very well accepted by my clients] is
[fast/smart/short and transparent], and as a matter of fact:
my clients never care (never had to care) about "easy to program".
__
wolfgang

BGB / cr88192


Nov 12, 2009, 5:49:55 PM

"wolfgang kern" <now...@never.at> wrote in message
news:hdhps3$2iq$1...@newsreader2.utanet.at...

>
> "BGB / cr88192" posted here and in AOD:
>
> ...
>
>> and now is another update...
>
> ...
>
> I'm still not sure that I'm able to follow/understand your ideas,
> but anyway all new ideas are worth thinking over at least ...
>
> Yeah there could be a one and only interpreter for all CPU's
> and OS's, but this would need to shrink all sourcecode down to a
> very limited (yours? or whoever's?) functionality.
>

this interpreter is not intended to replace native code...

it is no more intended to replace the existence of native code and OS's than
DOSBox is to replace Windows...

Windows serves one role, DOSBox another, and my project would serve a
different role than either...

the goal is also different from that of the JVM or .NET VM (which aim to
replace one world with another...).

the role then is to do what my prior compiler has aimed to do:
to allow application extensions and scripts.

they differ in the types of extensions and the means of implementing them:
the compiler implemented extensions as code running directly in the host
address space, whereas the interpreter will run them essentially in
sandboxes (hence the idea of using a VFS as opposed to directly using the
host filesystem, ...).

in particular, I am considering the possibility of "untrusted" extensions.

similarly, there are cases where pre-compiled DLLs may be a convenient means
of distributing code (plugins, scripts, ...), ...

> In general I like this idea, but we will easily find ourselves apart when
> it comes to speed and size related to performance.
>

if I don't replace the native OS, it is not as much of a worry.
scripts are slower than native, granted, but I will assume here that most
performance critical parts of an app are likely to be written in native
code, and so the role of scripts is not to implement an entire app's
functionality.

the app will instead plug some of the interpreter's functionality into its
own provided backends (or, as is, they default to trying to use my compiler
framework's facilities).
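
a hedged sketch of what "plugging in backends with a default fallback" could look like, using a JNI-style struct of function pointers. every name here is hypothetical, and the defaults are trivial stubs rather than anything from the actual framework.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical backend table; slots the host leaves NULL fall back
 * to built-in defaults. */
typedef struct VmBackend {
    int (*open_file)(const char *path);  /* returns a handle, or -1 */
    int (*log_msg)(const char *msg);     /* returns bytes accepted */
} VmBackend;

/* Default stubs (illustrative only): deny file access, accept log text. */
static int default_open_file(const char *path) { (void)path; return -1; }
static int default_log_msg(const char *msg) { return (int)strlen(msg); }

static const VmBackend vm_default_backend = { default_open_file, default_log_msg };

/* Example host-provided backend function: exposes a single virtual path. */
static int host_open_file(const char *path) {
    return strcmp(path, "/vfs/ok") == 0 ? 3 : -1;
}

/* Merge the host table over the defaults; host may be NULL. */
static VmBackend vm_resolve_backend(const VmBackend *host) {
    VmBackend b = vm_default_backend;
    if (host != NULL) {
        if (host->open_file) b.open_file = host->open_file;
        if (host->log_msg)   b.log_msg   = host->log_msg;
    }
    assert(b.open_file != NULL && b.log_msg != NULL);
    return b;
}
```

a host that only cares about file access would fill in `open_file`, leave `log_msg` as NULL, and get the default for the latter; no physical linkage between the two sides is needed beyond the struct definition.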

> My solution for OS (version-)independent coding of addons/applications
> is an OS-related script-language. So whenever I upgrade my OS, the
> already sold applications may gain speed, but nothing else ...
>
> Sure there might be new features which allow other methods,
> but the customers decide to use it or keep the old method.
>
> Please give us an answer (perhaps first to yourself) what kind of
> programming style you have in mind here [fast?/smart?/short?/easy?].
>

I don't know...
I just figured people would use C for both the host app, and for any
extensions.

so, the idea is to hopefully allow code to be moved fairly easily and
transparently between native-land and the VM world.

it also helps here if many common facilities are available in both cases,
and if many of the facilities existing in the interpreted world will be
familiar (however, virtualized and potentially sandboxed).

I would also like it if most facilities can be available at similar "levels
of abstraction".
using a language like Java or C# should not require a traditional VM, as
the VM should be optional (for example, we could compile the Java directly
to native code, ...).

one's choice to use C should not prevent them from being able to use things
such as operating within a virtualized or sandboxed environment if needed,
or being able to use "eval", ...

C and x86 do not require retooling, and they don't require a fundamental
change in coding practices or mindset.

we already have a VM with a long and proven track record: x86...

> a common saying:
> We can't get everything, and for sure not all at the very same time!
>
> my personal preference [very well accepted by my clients] is
> [fast/smart/short and transparent], and as a matter of fact:
> my clients never care (never had to care) about "easy to program".

ok.

> __
> wolfgang
>
>
