
Awfully quiet in here lately...


Tony

Apr 19, 2011, 11:56:54 PM
What are you working on?

(Please do not disclose patentable information). ;)


Jacko

Apr 20, 2011, 1:21:15 AM
I'm stripping down small C for ISA testing. It might change enough to unbecome C. You'd have thought gcc would double-pass the symbol table by now.

Cheers Jacko

BGB

Apr 20, 2011, 1:44:47 AM
On 4/19/2011 8:56 PM, Tony wrote:
> What are you working on?
>
> (Please do not disclose patentable information). ;)
>


A: trying to make a game, that is actually functional, so that maybe I
can sell it on Steam and get some money or similar, and maybe try to
make bigger fancier games if I can get things like, a budget...

current effort is like a simplistic FPS+ARPG hybrid.


B: redesigned the bytecode for my BGBScript2 project, which is otherwise
a bit stalled at the moment.

the prior bytecode has a drawback in that it requires a complex
interpreter or JIT (as it required type-flow analysis and similar, which
turned out to make things more painful than originally imagined), and I
switched to a new bytecode design (where most opcodes are type-specific,
more like JBC) which would allow a simpler direct-interpreter and/or
JIT, which does not require complex type and scope-handling machinery to
work (albeit this complexity would be migrated to the compiler instead).

both A and B require a large investment of effort though.


in the new interpreter, a function like, say:
int foo(int x, int y)
{
    int z;

    z=3*x+y;
    return(z);
}
would produce bytecode which would look something like:
lloadi 0
pushi_3
muli
lloadi 1
addi
lstorei 2
lloadi 2
ret

or:
int fib(int x)
{
    if(x>2)return(fib(x-2)+fib(x-1));
    return(1);
}
as:
lloadi 0
pushi_2
cmpi
jle L0
mark
lloadi 0
pushi_2
subi
scall fib(i)i
mark
lloadi 0
pushi_1
subi
scall fib(i)i
addi
ret
L0:
pushi_1
ret


or such...

tm

Apr 20, 2011, 3:27:04 AM
On 20 Apr., 07:44, BGB <cr88...@hotmail.com> wrote:
> On 4/19/2011 8:56 PM, Tony wrote:
>
> > What are you working on?
>
> > (Please do not disclose patentable information). ;)
>
> A: trying to make a game, that is actually functional, so that maybe I
> can sell it on Steam and get some money or similar, and maybe try to
> make bigger fancier games if I can get things like, a budget...

Thank you for this information. I was not aware of Steam
and this possibility to sell games. Since I have written
some games I could possibly get some money too.

I have looked at the wikipedia entry of Steam and its
homepage, but I did not find information about what is
necessary to release a game (please give me some links).

Is a windows executable enough or is it necessary to
change the sourcecode to make it work with the Steam
platform?

Is there a copy protection?

Is there a market for 2d retro games at all?


Greetings Thomas Mertes

--
Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.

BGB

Apr 20, 2011, 5:07:34 AM
On 4/20/2011 12:27 AM, tm wrote:
> On 20 Apr., 07:44, BGB<cr88...@hotmail.com> wrote:
>> On 4/19/2011 8:56 PM, Tony wrote:
>>
>>> What are you working on?
>>
>>> (Please do not disclose patentable information). ;)
>>
>> A: trying to make a game, that is actually functional, so that maybe I
>> can sell it on Steam and get some money or similar, and maybe try to
>> make bigger fancier games if I can get things like, a budget...
>
> Thank you for this information. I was not aware of Steam
> and this possibility to sell games. Since I have written
> some games I could possibly get some money too.
>

sadly, at the moment I lack any sufficiently-complete games.
what I have is in-progress, but still a long way from a usable game.
most of the engine is in plain C.


I had something more functionally complete before, but it used a lot of
GPL code, and dropping a lot of this code left many things I have to
re-implement myself. as-is, all code is either written by myself, or is
under BSD-style licenses.

many GPL data files remain for now, but I have little certainty as to
what effect the GPL has on data files.

> I have looked at the wikipedia entry of Steam and its
> homepage, but I did not find information about what is
> necessary to release a game (please give me some links).
>

I don't have any good links off-hand.
it all requires a bit of digging to find much of anything.
I have not actually released anything with Steam yet though, as I would
need a mostly working game first.

> Is a windows executable enough or is it necessary to
> change the sourcecode to make it work with the Steam
> platform?
>

AFAICT, it does support raw directories, so unmodified games work.

there are also their GCF files, which basically launch the game from a
VFS (Virtual File-System), but from what I have gathered, once the main
binary is launched, there is an ugly/awkward process effectively needed
to connect back into the steam-platform, which is needed to fetch any
subsequent files.

basically, it involves going to steam's bin directory, loading the steam
DLL, and fetching and calling some function pointers.

well, and in their great wisdom, they put the needed interface headers
in the Source SDK, and used a C++ based API interface.
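
roughly the sort of thing involved (just a sketch; the DLL path and the
export/interface names below are made up, not the real ones):

#include <windows.h>
#include <stdio.h>

typedef void *(*steam_getapi_fn)(const char *iface);

int main(void)
{
    HMODULE h;
    steam_getapi_fn getapi;
    void *iface;

    /* load the DLL out of steam's bin directory */
    h=LoadLibraryA("C:\\Program Files\\Steam\\bin\\steam.dll");
    if(!h) { printf("can't load the steam DLL\n"); return(1); }

    /* fetch a function pointer by export name (placeholder name) */
    getapi=(steam_getapi_fn)GetProcAddress(h, "SteamGetSomeAPI");
    if(!getapi) { printf("export not found\n"); return(1); }

    /* call back into the steam platform through it */
    iface=getapi("SomeInterface001");
    (void)iface;
    return(0);
}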


> Is there a copy protection?
>

partly, mostly related to using the GCF format.
basic encryption is supported, but is only used during distribution (the
GCF files are locally decrypted when activated). also, GCF files don't
work in Steam unless they are activated with the currently logged-in
account, which presumably mostly prevents people just copy/pasting the
GCF files (they can get the files, but Steam won't willingly launch them).

otherwise, then, it mostly amounts to people needing a special tool to
read/unpack the GCF files (which IIRC are a vaguely FAT-like filesystem
style format).

a game could prevent this from working by refusing to work from a raw
directory if built for GCF-based distribution.

in the above case, then, probably one would need a hacked or cloned
Steam to be able to launch the GCF files (providing a dummy steam DLL
and implementing custom versions of all the Steam API calls).


> Is there a market for 2d retro games at all?
>

well, there are some 2D games on there, but beyond this (how many people
buy them, ...), I don't know...

Rod Pemberton

Apr 20, 2011, 5:55:36 AM
"Jacko" <jacko...@gmail.com> wrote in message
news:1363cdd2-f6c8-4a35...@glegroupsg2000goo.googlegroups.com...

>
> I'm stripping down small c for ISA testing. it might change enough to
unbecome c. You'd have thought gcc would double pass the symbol table by
now.
>

Aren't you Forth oriented?

SmallC or a C subset?

ISA = abbreviation for ????


IIRC, some modern parser toolsets have visualization tools for the grammar.
I'd guess that one is capable of entering the full ISO C grammar and reducing it
to just what is needed with such tools.

E.g., ANTLR or Gold.
http://www.antlr.org/
http://www.devincook.com/goldparser/


Rod Pemberton


Nils M Holm

Apr 20, 2011, 8:42:04 AM
Tony <nos...@myisp.net> wrote:
> What are you working on?

I have designed and implemented a mostly useless language
one afternoon and now I am having a lot of fun with it.
By now it has an editor with syntax highlighting and a
useless->C compiler. Of course it is self-hosting. The
whole thing is rather small: ~350 lines of C for the
interpreter, ~300 lines of useless code for the compiler,
another 300 for various libraries.

It's on my homepage in case anyone wants to have a look.

--
Nils M Holm < n m h @ t 3 x . o r g > www.t3x.org
May all beings be happy, peaceful, and free from suffering!

Richard Harter

Apr 20, 2011, 10:38:33 AM
On Tue, 19 Apr 2011 22:56:54 -0500, "Tony" <nos...@myisp.net> wrote:

>What are you working on?
>
>(Please do not disclose patentable information). ;)

I've been working off and on on a flow-based programming engine - more
off than on lately. I've also been participating in FBP forums.

BGB

Apr 20, 2011, 1:40:02 PM
On 4/20/2011 5:42 AM, Nils M Holm wrote:
> Tony<nos...@myisp.net> wrote:
>> What are you working on?
>
> I have designed and implemented a mostly useless language
> one afternoon and now I am having a lot of fun with it.
> By now it has an editor with syntax highlighting and a
> useless->C compiler. Of course it is self-hosting. The
> whole thing is rather small: ~350 lines of C for the
> interpreter, ~300 lines of useless code for the compiler,
> another 300 for various libraries.
>
> It's on my homepage in case anyone wants to have a look.
>

and, meanwhile, my VM project is around 600 kloc (kloc == 1000 lines of
code), and is not self-hosting...

so, alas...


granted, its currently supported scripting languages are:
BGBScript, which is mostly a JS variant;
C99, but C support has been a road of pain this whole time;
Java, albeit mainly compiled with a standard Java compiler;
BS2, which is not complete (WRT the implementation), but is sort of like
a Java/AS3 hybrid, but with a bunch of features added.

most serious code is written in C though, but C has drawbacks which make
it a poor choice for a scripting language (hence, the C frontend at this
point is more used to generate a semi-transparent C FFI for BGBScript).


then, with this, there is another 250 kloc of 3D engine code, ...
but, my 3D engine currently delivers unimpressive performance (mostly
due I think to using a Doom3-like lighting and shadowing model, and a
lot of this code is not particularly performance-tuned).


I am not entirely sure how much is a result of having a complex
feature-set though, and how much is a result of just tending to write
bulky code.


or such...

Jacko

Apr 20, 2011, 2:36:55 PM
On Wednesday, 20 April 2011 10:55:36 UTC+1, Rod Pemberton wrote:
> "Jacko" <jacko...@gmail.com> wrote in message
> news:1363cdd2-f6c8-4a35...@glegroupsg2000goo.googlegroups.com...
> >
> > I'm stripping down small c for ISA testing. it might change enough to
> unbecome c. You'd have thought gcc would double pass the symbol table by
> now.
> >
>
> Aren't you Forth oriented?

It could be best expressed as a hate for stack frames and a single parameter return, with a dislike of excessive code overhead, especially in embedded where mW per device can be MW planet-wide. And hence a single-stack model.

> SmallC or a C subset?

At the moment it's fixing small C to compile on gcc, then it's on to de-extending small C to accept *var as a function argument, and make int default. (or remove other types altogether and start that bit again)
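
Roughly, for illustration (a sketch only, not my actual code):

#include <stdio.h>

/* stock small C / K&R style: */
putstr(buf)
char *buf;
{
    while(*buf) putchar(*buf++);
}

/* the de-extended form might look more like:
putstr(*buf) { while(*buf) putchar(*buf++); }
   with int as the implied type throughout */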

> ISA = abbreviation for ????

Then altering the code gen to emit a different Instruction Set Architecture.

> IIRC, some modern parser toolsets have visualization tools for the grammar.
> I'd guess that one is capable of entering the full ISO C grammar and reducing it
> to just what is needed with such tools.

This may be true, but I've looked at SDCC and LCC and GCC as possible options, but when I found small C I got that magic feeling.

Cheers Jacko

James Harris

Apr 20, 2011, 3:26:12 PM
On Apr 20, 4:56 am, "Tony" <nos...@myisp.net> wrote:
> What are you working on?

I recently started a new job. Almost everything else has had to be
parked for a while.

James

Rod Pemberton

Apr 20, 2011, 4:00:52 PM
"BGB" <cr8...@hotmail.com> wrote in message
news:ion5te$io4$1...@news.albasani.net...

>
> most serious code is written in C though,
>

Did you mean your serious code or serious code in general?

> most serious code is written in C though,
> but C has drawbacks which make
> it a poor choice for a scripting language
> (hence, the C frontend at this point is more
> used to generate a semi-transparent C FFI
> for BGBScript).
>

In what way is it a poor choice? I.e., C language's syntax is a poor choice
for a scripting language, or, C language is a poor choice to use to
implement a scripting language? E.g., too pointer oriented?


Rod Pemberton


Rod Pemberton

Apr 20, 2011, 4:32:04 PM
"Jacko" <jacko...@gmail.com> wrote in message
news:c54ef1e7-99ac-475a...@glegroupsg2000goo.googlegroups.com...

> On Wednesday, 20 April 2011 10:55:36 UTC+1, Rod Pemberton wrote:
> > "Jacko" <jacko...@gmail.com> wrote in message
> >
news:1363cdd2-f6c8-4a35...@glegroupsg2000goo.googlegroups.com...
> > >
> > > I'm stripping down small c
> >
> > SmallC or a C subset?
>
> At the moment it's fixing small C to compile on gcc, then it's on to
> de-extending small C to accept *var as a function argument, and
> make int default. (or remove other types altogether and start
> that bit again)
>

So, you mean a version of Ron Cain's SmallC... Yes?

If so, have you been following Steve Dubrovich's work on SmallC?
http://www.project-fbin.hostoii.com/index.htm

Sorry, don't click that link. In that link, you'll need to remove one 'i'
from the end of hostxx.com. It seem URIBL's spam list has that domain
blocked which prevents AIOE.org from accepting the correct link in my post.

Steve and my posts on SmallC, for reasons unknown, are on alt.lang.asm
(older) and alt.os.development (recent). You can use Google Group's
advanced search to find them. We've both added various C syntax, e.g., '&&'
and '||', etc. I modified mine to ANSI C parameters instead of K&R, and a
"faked" - syntax only - 'int' return type. Steve added prolog/epilog code
to his, etc. I.e., maybe some useful info in our posts. E.g., here is
where I describe my "fake" 'int' return:
http://groups.google.com/group/alt.os.development/msg/982d5ff3d2db42cd
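
For reference, the two parameter styles look like this (plain example
code, not from either of our compilers):

int add_kr(a, b)       /* K&R-style: parameter types declared after the list */
int a, b;
{
    return a + b;
}

int add_ansi(int a, int b)    /* ANSI-style: types inside the parameter list */
{
    return a + b;
}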

I've been working on another version of SmallC with input from Steve and use
of his I/O library. You'll need a small I/O library to bootstrap SmallC.
His is for CP/M and or Windows "dosbox" consoles which support FCBs. The
version I started with came from Evgueniy Vitchev. It's for Linux. It
seems he made some mistakes with his version, or maybe they were correct for
Linux and not other environments... You may need to locate and fix them, if
you use that version. But, he also adapted some things in a way that I
liked. I reworked it for DJGPP and NASM. EV's version is here:
http://www.physics.rutgers.edu/~vitchev/smallc-i386.html

> > IIRC, some modern parser toolsets have visualization tools for the
> > grammar. I'd guess that one is capable of entering the full ISO C
> > grammar and reducing it to just what is needed with such tools.
>
> This may be true, but I've looked at SDCC and LCC and GCC as
> possible options,
>

After GCC, LCC seems to have the most variants (Pelles C, LCC-Win32,
LCC-Linux32, RadiOS LCC, Q3's LCC-Q3VM). IIRC, LCC v4.1 and v4.2 were
"fixed" to match the C specification in regards to proper implementation of
pointers. But, LCC v3.5 and v3.6 compiled for DOS, and the source should
still be available. From what I saw, there really wasn't a huge amount of
code changes. I.e., one could probably rework v4.2 for DOS, if desired.
Fabrice Bellard's TCC is another smaller C compiler. But, it uses some
features of the GNU C library I wasn't familiar with. I.e., my porting
efforts didn't start off too well. There are a couple minimal C compilers
in the IOCCC contest archives. There was a small C compiler called WCC by
Lutz Hamel from DECUS.

> but when I found small C I got that magic feeling.
>

There are many versions of SmallC. I've got 21 in my collection. I'm not
sure how many Steve's got, probably more. He seems to know about all of
them. Steve classifies SmallC as either version 1 (Ron Cain) or version 2
(James Hendrix). JH added other features. Most versions of Cain's SmallC
lack structs, which is a serious drawback. Small-C/Plus by RM Yorston for
the Z80 has them. Unfortunately, the main compiler code is somewhat
different from the other SmallC versions. Fortunately, most other versions
are similar to Cain's or Hendrix's. I.e., you can "cut-n-paste" the needed
code or features rather easily. I recently found that Geof Cooper posted an
"informal EBNF" grammar for version 2 way back in '87. It may help in
understanding the recursive descent parser.
http://groups.google.com/group/net.sources/browse_thread/thread/7e70ce70b4d4dd17/13cb3e331ece1bd7


Rod Pemberton


BGB

Apr 20, 2011, 6:22:00 PM
On 4/20/2011 1:00 PM, Rod Pemberton wrote:
> "BGB"<cr8...@hotmail.com> wrote in message
> news:ion5te$io4$1...@news.albasani.net...
>>
>> most serious code is written in C though,
>>
>
> Did you mean your serious code or serious code in general?
>

my serious code...


>> most serious code is written in C though,
>> but C has drawbacks which make
>> it a poor choice for a scripting language
>> (hence, the C frontend at this point is more
>> used to generate a semi-transparent C FFI
>> for BGBScript).
>>
>
> In what way is it a poor choice? I.e., C language's syntax is a poor choice
> for a scripting language, or, C language is a poor choice to use to
> implement a scripting language? E.g., too pointer oriented?
>

note: I mean standards-conformant C (IOW: properly follows ISO-9899:1990
and 1999 with minimal variance), not "some vaguely C-like language...".

and being used *as* the scripting language.


some problem areas are:

preprocessing headers is slow (causes a big delay loading C code from
source, and using "#include" in an "eval" expression is worse...).

one has to deal with all sorts of system and often compiler-dependent
cruftiness (load Linux headers and see piles of GCC'isms, load
MSVC/VisualStudio headers and see piles of MS-specific extensions).

there is no good way to do precompiled headers (they either have to
dance around the standard or introduce subtle non-standard behaviors).

there is some amount of legacy cruft in the language (K&R declarations,
implicit int, ...) which to really be "proper", one needs to support.


the inability to have top-level expressions can be rather limiting (but
allowing expressions would both violate the standard, and introduce very
non-C behavior). at-best, one could add Java-style "static { ... }" blocks.

proper C code can't be readily typed interactively by a user (proper
syntax would be too cumbersome), and interactive console entry doesn't
constitute a "compilation unit" as per the standard.

subtle bugs can cause serious problems.

trying to hot-patch running code is liable to cause it to blow up in
ones' face.

...

the natural result then being that I ended up mostly relegating the C
compiler to being a source code processing/information-mining tool
(where the delays don't matter as much), and I just use BGBScript or
similar as the main scripting language....

another considered language idea of mine ("C-Aux") would keep a mostly
similar syntax, but would alter some things (mostly to make a C-like
language which would be more "generalized" and easier to make binary
portable, and also allow a faster parser).

one considered feature would be the "reasonable declaration type" rule,
whereby:
if a declaration type can be taken to be a self-contained type, then the
next token will be assumed to be a part of the declarator (it will not
be checked to see if it is a typedef).

hence:
"unsigned short foo;"

the 'foo' token will be assumed to be a declarator token, rather than a
type token, on the basis that "unsigned short" can be inferred to be a
self-contained type, and thus would have no need for a following
declared type.

whereas, in:
"static foo bar;"
at the point foo is seen, there is no complete type, and hence foo will
be assumed to be a part of the type.
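
in parser terms, the decision might look something like this (hypothetical
names, just a sketch, not my parser's actual code):

int next_is_declarator(int have_complete_type, int tok_is_identifier)
{
    /* the specifiers seen so far already form a self-contained type
       ("unsigned short"): take the next identifier as the declarator,
       without consulting the typedef table */
    if(have_complete_type)
        return(tok_is_identifier);

    /* otherwise ("static foo ..."): assume it is part of the type */
    return(0);
}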


however, this will disallow certain things, such as:
"
typedef int foo;
unsigned short foo bar; //syntax error in C-Aux
"

which would be allowed in C, but would be a syntax error in C-Aux.

also, ordering will be constrained somewhat, namely in that if a custom
type is used, no type modifiers may follow it:
"
typedef int foo;
foo static unsigned *x; //syntax error in C-Aux
"

there would also be some more alterations in terms of preprocessor
behavior and include semantics as well.


also:
C-Aux would not support either K&R declarations, nor implicit int;
function prototypes or declarations will be mandatory;
"()" in function declarations will be interpreted equivalently to
"(void)" in all cases (a "()" function may not accept arguments).

...


or such...

aury

Apr 22, 2011, 8:35:05 AM
On Apr 21, 12:22 am, BGB <cr88...@hotmail.com> wrote:
> On 4/20/2011 1:00 PM, Rod Pemberton wrote:
>
> > "BGB"<cr88...@hotmail.com> wrote in message

And what will be the main purpose of BGBscript?
Do you receive any feedback from potential users?

BGB

Apr 22, 2011, 12:05:17 PM

<snip>

>>
>> ...
>>
>> or such...
>
> And what will be the main purpose of BGBscript?

thus far, it is mostly used for interactive evaluation, and for script
fragments (for my apps and game project).

as is, I can type "; expr" at the console to evaluate things.

for example:
"; btParticleExplosion(player.origin);"
would cause a particle explosion effect where the player is standing.
(previously "eval" was being used as the command-name, but this was more
effort to type).

although BGBScript is not technically C, with the current machinery
which mostly gives it access to C-land functions and data-structures, it
gets "fairly close" to like what I would get if I could have interactive
C command-entry anyways (the main places where C and BS syntax differs
are not generally those which are involved with interactively typing
commands into the console).


a drawback of BS though is that it is a little weak and has poor enough
performance to where it is not particularly well-suited for non-trivial
implementation code (nearly all implementation code then being mostly C,
and some C++ and Java here and there, albeit the Java via a
custom-written mini-JVM, which only really implements J2ME functionality).

as an example, at present it takes several seconds to evaluate fib(28)
or fib(30) via a recursive function, vs under 1s for C.

example (C):

int fib(int x)
{
    if(x>2)return(fib(x-2)+fib(x-1));
    else return(1);
}

or (BS):

function fib(x)
{
    if(x>2)return(fib(x-2)+fib(x-1));
    else return(1);
}

granted, most real-life uses are not this sensitive to performance (a
1000x slowdown being apparently acceptable for many script languages).


the BS2 language would have been a possible alternative to BS, C, and
Java, while still retaining similar capabilities to all of them (more
readily portable than C, faster and more solid than BS, and like Java
without the horrid FFI issues and stupid single-class-per-file
limitations). however, the BS2 effort is proving to be a bit much work,
and I have more pressing issues to deal with...


> Do you receive any feedback from potential users?

not so much...
I am currently the sole/main user of the language, and so at this point
it is mostly my opinion which matters.

although it is currently my plan to release the BGBScript VM and similar
under the MIT license, the current main apps and game which uses it, is
intended more to be a closed-source / proprietary app.


admittedly, I would almost prefer to just make everything open source
(for benefit to humanity and so on), but I am faced with a greater problem:
I need some way to support myself (hopefully), as presently I lack a job.

trying for a sellable project (a game, and maybe my custom DCC tools,
which would probably come along with the game) is a possible way to get
some money.


I also have "target_load" and "target_eval" entity types which will load
scripts and evaluate code fragments when triggered.

the idea being here that it makes more sense to have map-specific
behavior as scripts rather than as C code and hard-coded into the game.
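
so a map entity might look something like this (a made-up example; these
key names are placeholders, not necessarily what the engine actually uses):

{
"classname" "target_eval"
"targetname" "trigger_secret_door"
"script" "btParticleExplosion(player.origin);"
}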

otherwise, the world structure is mostly similar to the Quake-series
games (and at the moment I am mostly using the Quake1 maps for engine
development/testing). however, the present engine is almost entirely my
own code (excepting some VM-related code, which is generally either
public-domain or uses BSD-style licenses).


later, I will probably need to make my own maps, mostly as:
the Quake maps are GPL, being ominous;
trying to make an original game would be defeated by just running along
through the Quake maps (then it would just be a "from the ground up
Quake emulator").

but, given these maps often use around 1000-2000 brushes and 500 or so
light sources, they have some merit in helping point out performance
problems (most of my own maps are less complicated).

note that most 3D models are in a custom skeletal format (actually, a
slightly tweaked AC3D format, with the skeleton and animations in
separate files), but the engine itself also supports Valve SMD (the main
model format used in Half-Life and HL2 / Source-Engine games).

mostly I am using my own model format for several reasons:
having separate meshes and skeletons is more generic, and allows a 3D
modeler that doesn't need to (itself) care about skeletal animation
issues (or, for that matter, having my animation tool also have to deal
with 3D modeling issues);
SMD uses angles and animations store the pose in every frame, both of
which are drawbacks WRT animation (my present animation format encodes
transformation matrices instead).

as a drawback:
my tweaked AC3D format is not strictly compatible with existing tools
which use AC3D. the files will load (no major incompatible structural
changes were made), however, the texture-system differs some (so
textures/shaders don't work).


I wrote my own 3D modeler/animator tools, mostly as:
most existing (non-crap) 3D modelers cost piles of money (and the
market is at this point almost entirely controlled by Autodesk), and of
all things, I prefer to avoid piracy when possible;
most of the free modelers were either crap, unusable for my purposes, or
generally so broken as to not be worth bothering with.

my tools generally suck as well (and can't deal with models with more
than around 10k-20k polygons without lag), but at least they work for my
purposes (and the 10k-20k poly limit is far above the 3k limit in-game,
mostly as much above 3k and character models will start causing lag
in-game...).


or such...

ed_davis2

Apr 22, 2011, 12:07:38 PM

This thing is just way too cool! I've spent far too many hours
playing with it the last two days.

Thanks so much for sharing it!

Is it going to become an IOCCC entry?

BartC

Apr 22, 2011, 5:31:53 PM
"BGB" <cr8...@hotmail.com> wrote in message
news:ios93s$3uk$1...@news.albasani.net...

> a drawback of BS though is that it is a little weak and has poor enough
> performance to where it is not particularly well-suited for non-trivial
> implementation code (nearly all implementation code then being mostly C,
> and some C++ and Java here and there, albeit the Java via a custom-written
> mini-JVM, which only really implements J2ME functionality).
>
> as an example, at present it takes several seconds to evaluate fib(28) or
> fib(30) via a recursive function, vs under 1s for C.

You've either got a quite slow machine, or you must mean fib(40), which
takes some 1.2-1.6 seconds on my low-end desktop machine.

(And my interpreted code takes 3-8 times as long, which is not unexpected.)

--
Bartc

BGB

Apr 22, 2011, 8:56:07 PM

I was talking about BGBScript here, which in this case is both
interpreted and dynamically typed (and does most of its type-checking
via "strcmp()" magic...). there is a JIT, but it is not currently used
(and when using dynamic types is not that much faster).

fib(28) in C is in the millisecond range though.
but, it does take several seconds in BGBScript.

note that current CPU is an Athlon II X4 2.8GHz, with 4GB of PC2-6400
RAM, with an nVidia GeForce 460 GTX, and running Windows 7.

part of this is, after all, why I was trying to implement a replacement
language and VM, which would use static-typing instead, but I have more
important stuff going on...

or such...

Mike Austin

Apr 22, 2011, 9:27:55 PM
On Apr 19, 8:56 pm, "Tony" <nos...@myisp.net> wrote:
> What are you working on?
>
> (Please do not disclose patentable information). ;)

Great topic, thanks for asking :) Recently been refining the syntax
by porting benchmark programs like n-body, and looking for ambiguities
and what the language looks like when really used. Also, updated the
examples with syntax highlighting to get a better feel of what it
would look like in an editor:

https://docs.google.com/document/d/1gjvuc4wWEEJ_9zb3ks_RNyoSOBL1pR-jnTFqls6OZ3Q/edit?hl=en

I know people like "." dot notation, but I still like the visual
separation of objects and methods like Smalltalk/Io methods.

canvas draw-string: "Hello", size: 16px
vs
canvas.draw-string "Hello", size: 16.px

The dot notation makes the arguments look like they're "floating",
while the Smalltalk version all floats so it doesn't matter. Also,
you can say things like "16px", which I like. Notice the comma - it's
not really like Smalltalk, since the method signature is the first
identifier and the arguments are just keyword arguments.

Mike

tm

Apr 23, 2011, 6:07:32 AM
On 23 Apr., 02:56, BGB <cr88...@hotmail.com> wrote:
> On 4/22/2011 2:31 PM, BartC wrote:
>
> > "BGB" <cr88...@hotmail.com> wrote in message

> >news:ios93s$3uk$1...@news.albasani.net...
>
> >> a drawback of BS though is that it is a little weak and has poor
> >> enough performance to where it is not particularly well-suited for
> >> non-trivial implementation code (nearly all implementation code then
> >> being mostly C, and some C++ and Java here and there, albeit the Java
> >> via a custom-written mini-JVM, which only really implements J2ME
> >> functionality).
>
> >> as an example, at present it takes several seconds to evaluate fib(28)
> >> or fib(30) via a recursive function, vs under 1s for C.
>
> > You've either got a quite slow machine, or you must mean fib(40), which
> > takes some 1.2-1.6 seconds on my low-end desktop machine.
>
> > (And my interpreted code takes 3-8 times as long, which is not unexpected.)
>
> I was talking about BGBScript here, which in this case is both
> interpreted and dynamically typed (and does most of its type-checking
> via "strcmp()" magic...). there is a JIT, but it is not currently used
> (and when using dynamic types is not that much faster).
>
> fib(28) in C is in the millisecond range though.
> but, it does take several seconds in BGBScript.

fib(28) with interpreted Seed7 takes 400 milliseconds.
A compiled fib(28) Seed7 program takes 5 milliseconds.

BGB

Apr 23, 2011, 3:11:01 PM
On 4/23/2011 3:07 AM, tm wrote:
> On 23 Apr., 02:56, BGB<cr88...@hotmail.com> wrote:
>> On 4/22/2011 2:31 PM, BartC wrote:
>>
>>> "BGB"<cr88...@hotmail.com> wrote in message
>>> news:ios93s$3uk$1...@news.albasani.net...
>>
>>>> a drawback of BS though is that it is a little weak and has poor
>>>> enough performance to where it is not particularly well-suited for
>>>> non-trivial implementation code (nearly all implementation code then
>>>> being mostly C, and some C++ and Java here and there, albeit the Java
>>>> via a custom-written mini-JVM, which only really implements J2ME
>>>> functionality).
>>
>>>> as an example, at present it takes several seconds to evaluate fib(28)
>>>> or fib(30) via a recursive function, vs under 1s for C.
>>
>>> You've either got a quite slow machine, or you must mean fib(40), which
>>> takes some 1.2-1.6 seconds on my low-end desktop machine.
>>
>>> (And my interpreted code takes 3-8 times as long, which is not unexpected.)
>>
>> I was talking about BGBScript here, which in this case is both
>> interpreted and dynamically typed (and does most of its type-checking
>> via "strcmp()" magic...). there is a JIT, but it is not currently used
>> (and when using dynamic types is not that much faster).
>>
>> fib(28) in C is in the millisecond range though.
>> but, it does take several seconds in BGBScript.
>
> fib(28) with interpreted Seed7 takes 400 milliseconds.
> A compiled fib(28) Seed7 program takes 5 milliseconds.
>

yes, but it probably isn't dynamically typed either...
or, if it is, it probably uses a more efficient tagging system...


as-is, my present typesystem handles types via a process like:
lookup the heap object or memory region identified by the pointer (this
process itself has been micro-optimized some);
return the type name as a string.

use "!strcmp()" to see if it is the wanted type.
usually this is wrapped in a predicate function.

the arithmetic operators typically invoke lots of predicates, which
looks up the type a number of times (to match the correct operator logic
with the given arguments).


say:
BGBDY_API int dyintp(dyt p)
{
    if(dyfixintp(p))return(1);
    return(dytypep(p, "_int_t"));
//  return(dytypep(p, "_fixint_t") || dytypep(p, "_int_t"));
}

BGBDY_API int dytypep(dyt p, char *ty)
{
    char *ty1;

    ty1=dygettype(p);

    if(!ty && !ty1)return(1);
    if(!ty || !ty1)return(0);

    if(!strcmp(ty1, ty))return(1);
    return(0);
}

"dygettype()" invokes the GC to look up the memory object and return its
type-name.

...


BGBDY_API dyt dyadd(dyt a, dyt b)
{
    dyt c;
    if(dyintp(a) && dyintp(b))
        { c=dyint(dyintv(a)+dyintv(b)); return(c); }
    if(dylonglongp(a) && dylonglongp(b))
        { c=dylonglong(dylonglongv(a)+dylonglongv(b)); return(c); }
    if(dyfloatp(a) && dyfloatp(b))
        { c=dyfloat(dyfloatv(a)+dyfloatv(b)); return(c); }
    if(dyrealp(a) && dyrealp(b))
        { c=dydouble(dydoublev(a)+dydoublev(b)); return(c); }
    if(dycomplexp(a) && dycomplexp(b))
        { c=dydcomplex(dtyDCadd(dycomplexv(a), dycomplexv(b))); return(c); }
    //...
}

and operations like "dyintv()" may do additional type-checks, to see if
the type is something that can be converted to an integer, ...
(going and looking, "dyintv()" redirects to complex/nasty logic which I
will not post here...).

yeah... maybe some optimization could make sense here, like reducing how
often or how many type-lookups are done or similar...

granted, this would be along the lines of trying to shave off how much
heap garbage is produced, ... it would help performance, but it is also
a pain.


one realizes just how long/winding a lot of this can get if they step
through it in a debugger, which itself motivated some optimizations
within the GC.

but, usually I use profiler-driven optimization, so I don't usually go
and optimize things unless they show up notably in the profiler, but
this does miss some things.

a lot of this code was written over a long time, and the process is
often more driven by convenience than performance (more efficient GC's
or type-systems are generally more awkward to use...).

note that the BGBScript interpreter was originally written back in 2004,
at the time initially as a set of kludges (and initially the performance
was so bad as to make it nearly unusable...). however, it has since been
much improved from its original form, but is still not exactly the
fastest thing around...


it was largely rewritten ("refactored" may be more accurate) in 2006 (to
use a newer/faster design, and a much faster GC, ...) but another
partial-rewrite in 2008 reverted it to the old GC and typesystem, mostly
as the new (precise) GC was painful to work with. but some of the
new/improved core machinery was kept, such as a cons-list based parser,
rather than one built on XML/DOM (originally, because this is what I had
on-hand at the time), albeit an XML/DOM based parser is used for the C
and BS2 parser (derived from the 2004-era BGBScript parser...).


granted, in theory, "fib()" could be made faster by entering it as:
function fib(x:int):int
{
    if(x>2)return(fib(x-2)+fib(x-1));
    return(1);
}
where then the interpreter will use type-specialized operations
(instead of generic dynamically-typed ones).

however, last time I tried this the interpreter went off into wonky
land, and I would have to go debug it to make all this work correctly
again (I normally just use plain dynamic types here and call it "good
enough"...).

even if it was working correctly though, these don't implement proper
static-type semantics, and hence using them would essentially violate
type-safety...


otherwise, I am off trying to get a 3D game into working order, as all
my interpreter/compiler/VM stuff has eaten up a lot of time over the
years (much of which may have been better spent elsewhere in my projects).


or such...

Tony

Apr 24, 2011, 12:37:01 AM

Doing language development via a grammar is no fun, IMO. I'm doing
recursive descent, manually. Indeed, perhaps it's because of grammar
technologies that we have such complex monstrous programming languages.


Tony

Apr 24, 2011, 12:48:54 AM
Jacko wrote:

> when I found small C I got that magic feeling.

Expound on that please. How can something so old give "magic feelings"?


Tony

Apr 24, 2011, 12:41:35 AM

We've discussed this before in here: C is only good for C, but is a
piss-poor choice for an intermediate language (the same can be said of
CLI, IMO, and for the same reasons).


Tony

Apr 24, 2011, 12:52:08 AM

The Small C code is horrendous by today's coding standards (IMO)... it
rates right up there with cfront! (I also have a lot of the Small C
versions in my historical code library).


Tony

Apr 24, 2011, 12:54:38 AM
James Harris wrote:
> On Apr 20, 4:56 am, "Tony" <nos...@myisp.net> wrote:
>> What are you working on?
>
> I recently started a new job.

I'm not sure if congrats are in order or regrets (!).

BGB

Apr 24, 2011, 2:10:42 AM

find old tome and archaic cards;
start getting magic feelings;
strike a pose and suddenly transform into wearing a magical-school-girl
outfit (regardless of age or gender);
proceed to battle evil magical cotton plants (which attack with
spike-covered cotton seeds).

and, all is good...

Nils M Holm

Apr 24, 2011, 2:33:06 AM
X-Posted to c.l.f. with apologies for the caricature of Forth!

ed_davis2 <ed_d...@yahoo.com> wrote:
> This thing is just way too cool! I've spent far too many hours
> playing with it the last two days.

I am glad to hear so! Often the "toys" and the "half-baked ideas"
are the coolest things to play with.

> Thanks so much for sharing it!

You are very welcome!

> Is it going to become an IOCCC entry?

Ah, I don't think so. I believe it is neither sufficiently
obfuscated nor short enough (but feel free to improve it! ;-))

Anyway, here is a little bit background for those who are
interested:

My hypothesis is that any programming language is "readable"
once you get used to it. Long keywords like CALL-WITH-CURRENT-
CONTINUATION or statements like SUBTRACT A FROM B GIVING C. do
not improve readability at all. So I started an experiment to
find out whether I could get used to a programming language
with single-character keywords.

Feel free to skip to the next vertical line, if you are only
interested in the conclusion.

----------------------------------------------------------------

The language -- let's call it "useless" or just U -- is a
forthish language where (almost) each ASCII character is
an operator. Instead of

: abs dup 0 < if negate then ;

you would write

:abs d'0<[%]

where d=dup, '0=0, [=if, %=negate and ]=then.

Because all characters are taken up by operators, there has
to be a separate operator for procedure application, so you
write

'123%_abs

instead of

-123 abs

The _ calls a procedure instead of interpreting the letters
of its name as operators.

Here are some further samples:

:max oo<[s]x '( maximum of two numbers )

o copies the second stack element to the top (Forth: over)
s swaps the top values
x drops the top value

:pow'1s(o*)sx '( a b --> a^b )

N(X) repeats X N times.

:fac'1s(i*) '( n --> n! )

i copies the index of the current iteration to the stack;
note: () counts *down*, i.e.:

z'10(i) --> 10 9 8 7 6 5 4 3 2 1
'( z clears the stack)

'(X) is short for '0(X) and repeats X 0 times, i.e. it
is a comment.

A more complex example:

:snext d[\-s\+s]
:sskipw`ob?k =o##&{_snext}

\- is '1-, \+ is '1+, so _snext goes to the next character
of a string, which is represented by its address and length
on the stack.

`P{X} executes X while P is true and then repeats;
b? copies a byte from the given memory address to the stack;
kC places character C on the stack;
# is logical not, i.e. 0-->1 and (not 0)-->0;
& is bitwise and.

So _sskipw skips over leading white space in a string.

----------------------------------------------------------------

Of course, the U language may look confusing or obfuscated
to a programmer who is not used to it. But so does FORTRAN,
or C, or Java. The useless language certainly confused *me*,
its inventor, when I wrote the first program in it. :-)
BTW, it was this one:

:line'6(_bl_bl.d_pr8_bl.df!'16+)x_nl
'32`d'48<{d_line\+}x.q

(_bl prints a blank, _pr8 prints a byte, f! prints a character,
_nl prints a newline, . does nothing, q quits.)

In the following days, though, after translating a few
trivial examples, I started writing programs like an ASCII
mandelbrot explorer, a shar(1) utility, an editor with
U syntax highlighting, and a useless->C compiler. (They
are all contained in the U archive.) It was not much
harder than writing similar programs in more familiar
languages.

After getting used to the syntax, I find U programs no
less readable than Forth programs, but I can squeeze lots
of code in a single line, which I consider to be a good
thing, because most objects I might want to look up are
already on the screen. (And then my vision does not improve,
either (I already use an 18pt font in my browser and xterms)).

To counterbalance brevity, U forces you to factor (or chain)
by limiting procedures to 64 characters (including the :
and the name). This is why there is no terminating ; like
in Forth: every routine ends at the end of a line.

All in all I came to the conclusion that "readable keywords"
and "familiar syntax" are red herrings. Brevity is most
important. This is something that math and APL definitely
got right (but unfortunately APL destroyed the idea immediately
by using an obscure character set).

Feel free to discuss. :-)

Disclaimer: I am no longer a professional programmer, and
I do not take the whole topic terribly seriously. So please
be patient with me!

--
Nils M Holm < n m h @ t 3 x . o r g > www.t3x.org

BGB

Apr 24, 2011, 4:58:47 AM


I have some mini-languages which work sort of along these lines, but are
usually limited to a certain task and are often not turing complete.

the main merit though is that usually then entire interpreter boils down
to a few while loops and switch statements.

these were not generally designed to be human readable, but with basic
familiarity they can be understood...

BartC

Apr 24, 2011, 6:03:04 AM

"BGB" <cr8...@hotmail.com> wrote in message

news:iot876$73t$1...@news.albasani.net...


> On 4/22/2011 2:31 PM, BartC wrote:
>> "BGB" <cr8...@hotmail.com> wrote in message
>> news:ios93s$3uk$1...@news.albasani.net...
>>
>>> a drawback of BS though is that it is a little weak and has poor
>>> enough performance to where it is not particularly well-suited for
>>> non-trivial implementation code (nearly all implementation code then
>>> being mostly C, and some C++ and Java here and there, albeit the Java
>>> via a custom-written mini-JVM, which only really implements J2ME
>>> functionality).
>>>
>>> as an example, at present it takes several seconds to evaluate fib(28)
>>> or fib(30) via a recursive function, vs under 1s for C.
>>
>> You've either got a quite slow machine, or you must mean fib(40), which
>> takes some 1.2-1.6 seconds on my low-end desktop machine.
>>
>> (And my interpreted code takes 3-8 times as long, which is not
>> unexpected.)
>>
>
> I was talking about BGBScript here, which in this case is both interpreted
> and dynamically typed (and does most of its type-checking via "strcmp()"
> magic...). there is a JIT, but it is not currently used (and when using
> dynamic types is not that much faster).
>
> fib(28) in C is in the millisecond range though.
> but, it does take several seconds in BGBScript.

OK, so when you said under 1 second for C, you meant quite a lot under 1
second...

> note that current CPU is an Athlon II X4 2.8GHz, with 4GB of PC2-6400 RAM,
> with an nVidia GeForce 460 GTX, and running Windows 7.

After a quick check (as usually I have no idea what I'm running) I seem to
have the X2 version of the same CPU.

My current project, also interpreted and dynamically typed, takes about 35ms
to do fib(28). (An older version, with type hinting, but still interpreted,
took 16ms. Full static typing might bring that down further, but then, I'm
planning to use JIT compilation for that).

Python took about 300ms, and Ruby (surprisingly) only about 250ms. Lua was
150ms, and LuaJIT about 11ms

> part of this is, after all, why I was trying to implement a replacement
> language and VM, which would use static-typing instead, but I have more
> important stuff going on...

Your timings are not so surprising if you're using strcmp() to check types!
And presumably you check a range of types one at a time using strcmp()?

Any interpreted language ought to be able to beat Ruby and Python at
least...

--
Bartc

Marco van de Voort

Apr 24, 2011, 8:21:31 AM
On 2011-04-24, Nils M Holm <news...@t3x.org> wrote:
>
> After getting used to the syntax, I find U programs no
> less readable than Forth programs, but I can squeeze lots
> of code in a single line, which I consider to be a good
> thing, because most objects I might want to look up are
> already on the screen. (And then my vision does not improve,
> either (I already use an 18pt font in my browser and xterms)).

I've been reading the post with some interest, since I've had some thoughts about
this with an in-language scripting language a while back.

Actually, the one letter keyword was more an accident than design, and more
due to a bit of organic growth in the first days, and an initial simplistic
assumption.

Anyway, my conclusion was the opposite.

> To counterbalance brevity, U forces you to factor (or chain)
> by limiting procedures to 64 characters (including the :
> and the name). This is why there is no terminating ; like
> in Forth: every routine ends at the end of a line.

That is probably partially the reason for the lesser readability, short
procedures, and enforcing it. A one line description comment can then go a
long way.

But structuring programs in a different way only shovels complexity
around. Forcing short procedures gives you simpler procedures, but hoists
the complexity of the problem out of the procedure implementation to
complicate the global state, calling chain and (if you try to avoid global
state) parameter transmission.

The classic problem with such statements is that they are demonstrated to be
true using a few math-centric example programs, which are fairly localised
and often easily divisible into subprograms.

It is the real-life problems, with their business rule exceptions, large
amounts of state, legacy concerns, multiple programmers, communication with
the outside world and code that is not written by you, that pose the real
challenges though.

Not the little math world where everything is controlled, written by you,
and where you have complete knowledge about the language, its implementation
and the test problems.

That is a situation that simply never happens in professional programming.

> All in all I came to the conclusion that "readable keywords"
> and "familiar syntax" are red herrings. Brevity is most
> important. This is something that math and APL definitely
> got right (but unfortunately APL destroyed the idea immediately
> by using an obscure character set).

Math is a different problem. It is not a programming language.

I'd say the reason that short notation works so good for math is that the
typical application is following some derivation where readers constant
compare how to go from one equation to the next. The short but welldesigned
syntax lends itself perfectly for comparing equation N-1 with N. Moreover
while there is some global state (border conditions, approximations etc) it
is fairly finite.

But when programming you are not constantly comparing procedures, so I see
no need to drag math syntax into it.

> Feel free to discuss. :-)

As you wish:-)

> Disclaimer: I am no longer a professional programmer, and
> I do not take the whole topic terribly seriously. So please
> be patient with me!

To appreciate such things, you need to have taken over a program of several
hundred thousand lines from other programmer(s), while being under
tremendous time pressure to make alterations on the one hand, and keep it stable
on the other hand.

BGB

Apr 24, 2011, 1:50:15 PM
to BartC

yeah.


>> note that current CPU is an Athlon II X4 2.8GHz, with 4GB of PC2-6400
>> RAM,
>> with an nVidia GeForce 460 GTX, and running Windows 7.
>
> After a quick check (as usually I have no idea what I'm running) I seem to
> have the X2 version of the same CPU.
>

yeah. prior computer ran an Athlon64 X2 1.8GHz, which was slower on pure
CPU activities than my newer laptop, but said laptop came with a
terrible (Intel) GPU...


> My current project, also interpreted and dynamically typed, takes about
> 35ms
> to do fib(28). (An older version, with type hinting, but still interpreted,
> took 16ms. Full static typing might bring that down further, but then, I'm
> planning to use JIT compilation for that).
>
> Python took about 300ms, and Ruby (surprisingly) only about 250ms. Lua
> was 150ms, and LuaJIT about 11ms
>

hence, why I was trying for a statically typed language, or using a
custom-written mini-JVM at one point, because it is much faster...


>> part of this is, after all, why I was trying to implement a replacement
>> language and VM, which would use static-typing instead, but I have more
>> important stuff going on...
>
> Your timings are not so surprising if you're using strcmp() to check types!
> And presumably you check a range of types one at a time using strcmp()?
>

actually, worse, in many cases the type-name is not even cached between
operations, and so needs to be endlessly re-fetched from the object,
which is much slower than just the "strcmp()" calls...

but, I guess maybe there are some of these areas that need a bit of
optimizing.


in other parts of the VM framework, there are many "short circuit"
optimizations.

such as, before I went and optimized class/instance method dispatch,
getting (static) method calls down to about 75ns, but in so doing ended
up cutting through several layers of abstractions.

note that although one can use numerical type-ids, they are much more
painful to work with, especially with a somewhat ad-hoc typesystem, and
there is no good (decentralized) way to assign unique type-numbers (hash
codes also have their own drawbacks).

in the GC, type-numbers do exist, but are not generally usable in switch
statements because they are not constant.

however, potentially prefetching type numbers and going through if/else
chains would still be much faster than the current strategy (or, hell,
even prefetching the type names and doing all checks against these,
rather than using piles of predicates which constantly re-fetch them...).
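
say, something vaguely like this (just a sketch, not what the VM actually
does, and it glosses over the fixint fast-path):

dyt dyadd_prefetch(dyt a, dyt b)
{
    char *ta, *tb;

    /* one GC lookup per operand, reused for all the checks below */
    ta=dygettype(a); tb=dygettype(b);

    if(ta && tb && !strcmp(ta, "_int_t") && !strcmp(tb, "_int_t"))
        return(dyint(dyintv(a)+dyintv(b)));

    /* ...other type pairs checked the same way against ta/tb... */

    /* fall back to the existing generic path otherwise */
    return(dyadd(a, b));
}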


sadly, slow top-level logic is harder to diagnose in a profiler, which
will tend to just show the leaf functions eating up the most time,
rather than the actual logic which is calling them.


> Any interpreted language ought to be able to beat Ruby and Python at
> least...
>

ideally, yes...

but, trying to keep code clean and modular is also sometimes the
enemy of good performance, as is providing a nice API, ...


one of the worst areas at the moment is actually in the
dynamically-typed arithmetic operations (and similar), but I have not
gotten around to a lot of this (so many other higher-priority things to
do...).

foxchip

Apr 24, 2011, 11:54:05 PM
On Apr 23, 10:33 pm, Nils M Holm <news2...@t3x.org> wrote:
> X-Posted to c.l.f. with apologies for the caricature of Forth!
>
> ed_davis2 <ed_dav...@yahoo.com> wrote:
> > This thing is just way too cool!  I've spent far too many hours
> > playing with it the last two days.
>
> I am glad to hear so! Often the "toys" and the "half-baked ideas"
> are the coolest things to play with.
>
> > Thanks so much for sharing it!
>
> You are very welcome!
>
> > Is it going to become an IOCCC entry?
>
> Ah, I don't think so. I believe it is neither sufficiently
> obfuscated nor short enough (but feel free to improve it! ;-))
>
> Anyway, here is a little bit background for those who are
> interested:
>
> My hypothesis is that any programming language is "readable"
> once you get used to it. Long keywords like CALL-WITH-CURRENT-
> CONTINUATION or statements like SUBTRACT A FROM B GIVING C. do
> not improve readability at all. So I started an experiment to
> find out whether I could get used to a programming language
> with single-character keywords.

Chuck used that idea on the original machineForth for P20
back in 1990. He mostly used single character names.

DUP was T. OVER was S. R> was R. etc.
DROP was T!. >R was R!. The LIT was N.

It has been published and discussed before.

As an experiment I took the idea to the next level using
single character names for NOT, BEGIN, UNTIL, WHILE, IF,
ELSE, THEN, -IF, -UNTIL, -WHILE, FOR, NEXT, DROP, I, >R,
R>, and the rest of the words.

It made for very very compact source. I also realized
that since all those words were single characters that
the spaces between them were not needed.

I made a version of machineForth that worked that way
and used it for a few projects over the years to follow
up on the experiment and retain my ability to read it.

I showed it to Chuck and he found it amusing. I then
joked that since the original OKAD had a very odd
character set that included all the symbols needed to
lay out pictures of circuits, those symbols were
also available as single-character names when not in
code layout mode. I must say it really looked strange
and reminded me of APL.

Yes. It was a half-baked idea even if I baked it
quite a bit. But I concluded that people already
had so much trouble adjusting to the new words in
the improved set of Forth primitives designed to
make code smaller, faster, and clearer from the
late eighties and early nineties that I knew it
would never seem acceptable to people to whom
DUP was and always would be DUP.

I convinced Chuck to go back to many traditional
Forth word names although he did insist that he
never liked >R and R> and has used PUSH and POP
ever since. He preferred the name - to NOT
since it was meant to convey the idea that one
way to reduce code size and make code run faster
was to use one's complement math in places and
get the same result as two's complement yet
avoid a 1+ in one place and a 1- in another.
But since it is both a minus and NOT he pronounced
it "dash."

And of course he insisted that inclusive-OR was
very uncommon in code while exclusive-OR was very
common and that exclusive-OR should be a primitive
and just named OR.

These are things he continued to do for the last
twenty years and despite the dozens or hundreds
of explanations they still confuse people who
didn't pay attention to the last twenty years
or who just remember the Forth they did in the
seventies.

> ----------------------------------------------------------------
>
> Of course, the U language may look confusing or obfuscated
> to a programmer who is not used to it. But so does FORTRAN,
> or C, or Java.

Or Forth. BTW I used U for Until.

> To counterbalance brevity, U forces you to factor (or chain)
> by limiting procedures to 64 characters (including the :
> and the name). This is why there is no terminating ; like
> in Forth: every routine ends at the end of a line.

There is no need to terminate with ; in machineForth which
allows for fall-through and accounts for a few percent of
an improvement on short definitions.

; also functions as EXIT in machineForth and colorforth.

> All in all I came to the conclusion that "readable keywords"
> and "familiar syntax" are red herrings. Brevity is most
> important. This is something that math and APL definitely
> got right (but unfortunately APL destroyed the idea immediately
> by using an obscure character set).

That's why I joked with Chuck that if he wanted to make
code look obscure he could use the original OKAD font
and have it resemble compressed APL.

> Feel free to discuss. :-)

For me the bottom line was that it is hard for most
programmers to learn new things once things become
habit that they don't think about anymore. Perhaps
when teaching children who haven't been ruined yet
it might work well.

> Disclaimer: I am no longer a professional programmer, and
> I do not take the whole topic terribly seriously. So please
> be patient with me!

Best Wishes

Brad

unread,
Apr 25, 2011, 12:48:22 PM4/25/11
to
On Apr 23, 11:33 pm, Nils M Holm <news2...@t3x.org> wrote:
>
> :max oo<[s]x  '( maximum of two numbers )
>
The classic Forth interpreter needs whitespace between words. Take a
look at colorForth. Each word is encoded as a data structure
containing attributes and characters. The interpreter doesn't need
whitespace between tokens, nor does it need to keep track of STATE.
The colors (attributes) replace STATE manipulation.

Your syntax could benefit a lot from the ideas in colorForth, although
you might want to use something simpler than the Huffman coding chosen
by Chuck Moore.

You'd need an editor for your code. Maybe an HTML editor would be
sufficient, if you set up a CSS that simplifies editing. Or, you can
make your own editor. Set it up so when you mouse-over a particular 1-
character operator, a help window shows the stack and description of
it.

-Brad

Brad

unread,
Apr 25, 2011, 2:05:17 PM4/25/11
to
On Apr 23, 11:33 pm, Nils M Holm <news2...@t3x.org> wrote:
> After getting used to the syntax, I find U programs no
> less readable than Forth programs, but I can squeeze lots
> of code in a single line, which I consider to be a good
> thing, because most objects I might want to look up are
> already on the screen. (And then my vision does not improve,
> either (I already use an 18pt font in my browser and xterms)).

You have covered a lot of ideas explored by Chuck Moore (his eyesight
isn't great either) in colorForth.

> important. This is something that math and APL definitely
> got right (but unfortunately APL destroyed the idea immediately
> by using an obscure character set).

colorForth defines its own glyphs, so the idea of giving Forth
primitives their own symbols was tried and rejected. I like your
syntax because it doesn't need color. An editor could color it for you
to make it more readable. A clever editor could display documentation
in a mouseover pane so you could just hover over a character to see
what it means.

You might want to check out colorForth to see what features might be
useful. The ability to switch between immediate and compile states
(done at edit time in colorForth) would be useful, but you are already
using [ and ].

-Brad

BartC

unread,
Apr 25, 2011, 6:08:19 PM4/25/11
to
"BGB" <cr8...@hotmail.com> wrote in message
news:4DB462D7...@hotmail.com...

> On 4/24/2011 3:03 AM, BartC wrote:

[Speed of fib(28) with interpreted code]

>> My current project, also interpreted and dynamically typed, takes about
>> 35ms to do fib(28).

>> Python took about 300ms, and Ruby (surprisingly) only about 250ms. Lua


>> was 150ms, and LuaJIT about 11ms
>>
>
> hence, why I was trying for a statically typed language, or using a
> custom-written mini-JVM at one point, because it is much faster...

Yes, but all the above timings are for *dynamic* types. Sure, you can switch
to static typing, but that is avoiding the problem (and dynamic typing has
many advantages over static that would be lost).

> actually, worse, in many cases the type-name is not even cached between
> operations, and so needs to be endlessly re-fetched from the object, which
> is much slower than just the "strcmp()" calls...
>
> but, I guess maybe there are some of these areas that need a bit of
> optimizing.

Hmm, I think it would take more of a totally different approach than
optimising!

> note that although one can use numerical type-ids, they are much more
> painful to work with, especially with a somewhat ad-hoc typesystem, and
> there is no good (decentralized) way to assign unique type-numbers (hash
> codes also have their own drawbacks).

I do use a set of unique type-numbers to represent each type, but my method
(which generates far too many types at present) ceases to be too practical
at around 1000 types; then, a different, somewhat slower approach is needed
(although that doesn't involve calling strcmp()...)

These have to be fixed-up using a centralised method (why would you want it
decentralised?), which ensures that type "T" in one module, has the same
type-number as "T" in another module.

I also have a type 'signature' which I use to mean a combination of two
types, created at runtime as needed; so each distinct combination of two
type numbers given a unique signature code. This is used for dispatching of
operations with two operands.

And, there is a whole bunch of jump-, call- and other tables which means I
don't need to bother with 'switch' statements too much (although fundamental
types *do* have fixed type-numbers useable in a switch statement).
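
Roughly, the two-operand dispatch can be sketched as below. The packed
32-bit key, the table size, and all the names are only for illustration
(and type numbers are assumed to fit in 16 bits); the real scheme assigns
compact signature codes per pair rather than packing them:

#include <stdint.h>
#include <stddef.h>

typedef struct Value { int type; double d; } Value;     /* toy value */
typedef Value (*binop_fn)(Value a, Value b);

#define SIG(t1, t2) (((uint32_t)(t1) << 16) | (uint32_t)(t2))

#define TABSIZE 1024            /* open-addressed table, one per operator */
static struct { uint32_t sig; binop_fn fn; } addtab[TABSIZE];

static void add_handler(int t1, int t2, binop_fn fn)
{
    uint32_t i = SIG(t1, t2) % TABSIZE;
    while (addtab[i].fn) i = (i + 1) % TABSIZE;   /* linear probing */
    addtab[i].sig = SIG(t1, t2);
    addtab[i].fn = fn;
}

static binop_fn find_handler(int t1, int t2)
{
    uint32_t sig = SIG(t1, t2), i = sig % TABSIZE;
    while (addtab[i].fn) {
        if (addtab[i].sig == sig) return addtab[i].fn;
        i = (i + 1) % TABSIZE;
    }
    return NULL;     /* unknown pair: fall back to slower generic handling */
}

The point is that the two type numbers are combined into one key once, so
operator dispatch becomes a single table lookup rather than a cascade of
type tests.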

> sadly, slow top-level logic is harder to diagnose in a profiler, which
> will tend to just show the leaf functions eating up the most time, rather
> than the actual logic which is calling them.

I'm thinking that you don't need a profiler to tell you that using string
processing for type-dispatching might be slowing things down. But then I
don't know how the rest of your language works; maybe there are other,
bigger overheads. (And for a 'script' language, the performance may not be
crucial, as you say.)

--
Bartc

BGB

unread,
Apr 25, 2011, 10:47:08 PM4/25/11
to
On 4/25/2011 3:08 PM, BartC wrote:
> "BGB" <cr8...@hotmail.com> wrote in message
> news:4DB462D7...@hotmail.com...
>> On 4/24/2011 3:03 AM, BartC wrote:
>
> [Speed of fib(28) with interpreted code]
>
>>> My current project, also interpreted and dynamically typed, takes about
>>> 35ms to do fib(28).
>
>>> Python took about 300ms, and Ruby (surprisingly) only about 250ms. Lua
>>> was 150ms, and LuaJIT about 11ms
>>>
>>
>> hence, why I was trying for a statically typed language, or using a
>> custom-written mini-JVM at one point, because it is much faster...
>
> Yes, but all the above timings are for *dynamic* types. Sure, you can
> switch
> to static typing, but that is avoiding the problem (and dynamic typing has
> many advantages over static that would be lost).
>

yeah.

for a long time, I have been trying to switch to a primarily
statically-typed model, rather than the present system which mixes a lot
of stuff...


>> actually, worse, in many cases the type-name is not even cached between
>> operations, and so needs to be endlessly re-fetched from the object,
>> which
>> is much slower than just the "strcmp()" calls...
>>
>> but, I guess maybe there are some of these areas that need a bit of
>> optimizing.
>
> Hmm, I think it would take more of a totally different approach than
> optimising!
>


well, usually there are ways to make things faster which may not involve
changing around all the code or compromising the behavior of the
external interface.

BGB

unread,
Apr 25, 2011, 11:34:01 PM4/25/11
to
disregard the post above; accidentally hit "send" when I meant file/save...


On 4/25/2011 3:08 PM, BartC wrote:

> "BGB" <cr8...@hotmail.com> wrote in message
> news:4DB462D7...@hotmail.com...
>> On 4/24/2011 3:03 AM, BartC wrote:
>
> [Speed of fib(28) with interpreted code]
>
>>> My current project, also interpreted and dynamically typed, takes about
>>> 35ms to do fib(28).
>
>>> Python took about 300ms, and Ruby (surprisingly) only about 250ms. Lua
>>> was 150ms, and LuaJIT about 11ms
>>>
>>
>> hence, why I was trying for a statically typed language, or using a
>> custom-written mini-JVM at one point, because it is much faster...
>
> Yes, but all the above timings are for *dynamic* types. Sure, you can
> switch
> to static typing, but that is avoiding the problem (and dynamic typing has
> many advantages over static that would be lost).
>

yeah.

for a long time, I have been trying to switch to a primarily
statically-typed model, rather than the present system which mixes a lot
of stuff...

>> actually, worse, in many cases the type-name is not even cached between
>> operations, and so needs to be endlessly re-fetched from the object,
>> which
>> is much slower than just the "strcmp()" calls...
>>
>> but, I guess maybe there are some of these areas that need a bit of
>> optimizing.
>
> Hmm, I think it would take more of a totally different approach than
> optimising!
>

well, usually there are ways to make things faster which may not involve
changing around all the code or compromising the behavior of the
external interface.

>> note that although one can use numerical type-ids, they are much more
>> painful to work with, especially with a somewhat ad-hoc typesystem, and
>> there is no good (decentralized) way to assign unique type-numbers (hash
>> codes also have their own drawbacks).
>
> I do use a set of unique type-numbers to represent each type, but my method
> (which generates far too many types at present) ceases to be too practical
> at around 1000 types; then, a different, somewhat slower approach is needed
> (although that doesn't involve calling strcmp()...)
>

well, I meant statically generated unique type-numbers, since although a
run-time number exists, it is not so usable for switch statements, which
are a major way of doing type dispatch.


> These have to be fixed-up using a centralised method (why would you want it
> decentralised?), which ensures that type "T" in one module, has the same
> type-number as "T" in another module.
>

the reason for decentralizing is partly because:
the GC, typesystem, and BGBScript VM, exist in different DLLs which
should theoretically not depend on each others' internals.

most other types are created by independent DLLs, ...

hence, there is no good central location which would be a good place to
put this, and so I currently don't have any centralized type-list.

> I also have a type 'signature' which I use to mean a combination of two
> types, created at runtime as needed; so each distinct combination of two
> type numbers given a unique signature code. This is used for dispatching
> of operations with two operands.
>
> And, there is a whole bunch of jump-, call- and other tables which means I
> don't need to bother with 'switch' statements too much (although
> fundamental
> types *do* have fixed type-numbers useable in a switch statement).
>

no direct analogue of the above at present, but I was considering
switching over to an operator dispatch system which would involve
using pairs of types for all operators, and a generalized lookup algo,
rather than a big mess of if/else logic and predicate functions.

>> sadly, slow top-level logic is harder to diagnose in a profiler, which
>> will tend to just show the leaf functions eating up the most time, rather
>> than the actual logic which is calling them.
>
> I'm thinking that you don't need a profiler to tell you that using string
> processing for type-dispatching might be slowing things down. But then I
> don't know how the rest of your language works; maybe there are other,
> bigger overheads. (And for a 'script' language, the performance may not be
> crucial, as you say.)
>

yeah, the GC is also really slow when it runs...


but, anyways, I originally wrote the type-system well before I
considered building a VM on it (originally, it was used for XML-RPC and
as a C-level utility API), and despite being slow in many places, the
current system is fairly convenient to work with.

later, when the first form of the BGBScript VM was written, it was just
sort of thrown together from what code I had on-hand.

as noted, I made a far more efficient version in late 2006 (the
interpreter was around 15x slower than C, and the JIT was around 1.5-3x
slower than C), but its VM interface was so awkward to use that I
largely ended up not using it (it required registering roots, keeping
track of reference counts, and similar...).

the (current) 2008 version was just sort of hacked back together with
code from the original (2004) version, which largely regained ease of
use (I can use the same MM/GC for both the interpreter and for generic C
code), but lost most of its performance gain.

in 2010, I added a spiffy/fancy new FFI, which made it in general a much
more attractive option, but its performance still left a lot to be desired.

but, for light-weight scripting tasks (evaluating interactively-typed
commands and simple high-level scripts), it is "fast enough".

albeit, it is not exactly a language one would want to write much
serious (program implementation) code in, as otherwise the performance
will suck, and the GC will probably end up running too often, ...


but, I may try to optimize it some, as at the moment optimizing involves
a lot less effort than trying to write an almost entirely new VM to replace it...


or such...

Ian Osgood

unread,
Apr 26, 2011, 10:52:24 AM4/26/11
to
On Apr 23, 11:33 pm, Nils M Holm <news2...@t3x.org> wrote:
> X-Posted to c.l.f. with apologies for the caricature of Forth!
>
> My hypothesis is that any programming language is "readable"
> once you get used to it. Long keywords like CALL-WITH-CURRENT-
> CONTINUATION or statements like SUBTRACT A FROM B GIVING C. do
> not improve readability at all. So I started an experiment to
> find out whether I could get used to a programming language
> with single-character keywords.

Of course, this is nothing new. The entire field of esoteric
programming languages began in 1993 with Wouter van Oortmerssen's FALSE
language, a 1K compiler which embodies similar ideals. A host of
similar languages have sprung up since then.

http://esolangs.org/wiki/FALSE
http://esolangs.org/wiki/Category:Stack-based

The esolangs site is a wiki, and they would welcome the addition of
your "U" language.

Ian

Brad

unread,
Apr 26, 2011, 11:05:16 AM4/26/11
to
On Apr 23, 11:33 pm, Nils M Holm <news2...@t3x.org> wrote:
> :abs d'0<[%]

Usually brackets are used to manipulate STATE. What terminates the
definition besides EOL?

It's kind of like colorForth without the color. Maybe you should mix
your 1-char notation with colors. You could have the editor display
glossary information in a mouseover pane.

You have covered some of the same ground as Chuck Moore, so you might
be onto something.

-Brad

Ian Osgood

unread,
Apr 26, 2011, 11:31:48 AM4/26/11
to
On Apr 23, 11:33 pm, Nils M Holm <news2...@t3x.org> wrote:
> X-Posted to c.l.f. with apologies for the caricature of Forth!

> All in all I came to the conclusion that "readable keywords"


> and "familiar syntax" are red herrings. Brevity is most
> important. This is something that math and APL definitely
> got right (but unfortunately APL destroyed the idea immediately
> by using an obscure character set).
>
> Feel free to discuss. :-)

If you are serious about combining brevity with readability, there is
another experiment you should be aware of.

There have been many experiments to create minimal languages, to
determine the minimal number of operators and keywords required to
still have a usable language. Usually stack language semantics are
chosen so that data-naming syntax can be eliminated. One of the
experimenters noticed that programs written in such a language often
contained common "phrases", oft repeated short sequences of words for
doing common data operations. Although most people think of such
sequences in terms of actual sentences or fragments, he thought of it
as a word in itself. By naming the keywords as syllables such as "BA"
and "RE", such common phrases become pronounceable words themselves.
This may be more readable because it draws upon our innate linguistic
skills.

The project in question, Nibz, was a minimal instruction set for a
custom VLSI core with 16 opcodes named as syllables.

http://code.google.com/p/nibz/wiki/InstructionSet

(In fact, I believe the author Jacko replied to you in c.l.f!)

AshleyF

unread,
Apr 26, 2011, 1:57:59 PM4/26/11
to
Try out: http://www.lkjsdf.com/archive/turtle/?(320dg0(1adf89r)720x1lc)1000x
Compact, single-character words, interactive Logo-like toy language.

One interesting aspect is the use of quotations (similar to Joy or
Factor). You can define new words with (abc)Z then use Z elsewhere in
place of abc or you can push quotations on the stack and apply
combinators such as "repeat" - x. For example, (abc)10x repeats abc 10
times.

Anyway, might be interesting to you...

Mat

unread,
Apr 26, 2011, 3:45:32 PM4/26/11
to
Nice language ;) It has some similarities to FALSE:
http://en.wikipedia.org/wiki/FALSE

I think both profit from eliminating white space as a separator, but if
I rewrite your examples they come close to readable code for me:

:abs d '0 < [%]
:fac '1 s (i *)

You can obfuscate your language much more by replacing (optional?)
white spaces through other characters like #,°,^,§ at will:

:abs&d°'0#<^[%]
:fac#'1^s§(i²*)

Looks like a postfix version of Perl ;)

Rod Pemberton

unread,
Apr 26, 2011, 8:11:25 PM4/26/11
to
"Ian Osgood" <ia...@quirkster.com> wrote in message
news:fd723c5e-6e16-4b57...@f31g2000pri.googlegroups.com...

>
> The entire field of esoteric
> programming languages began in 1993 with Wouter van
> Oortmerssen's FALSE language, a 1K compiler which
> embodies similar ideals. A host of
> similar languages have sprung up since then.
>
> http://esolangs.org/wiki/FALSE
> http://esolangs.org/wiki/Category:Stack-based
>

Did you forget about INTERCAL?
http://esolangs.org/wiki/INTERCAL

Is there an esoteric language listed on esolangs.org that is actually
_useful_ as a language? FALSE? Befunge? Brainfuck? Stlisp? Toadskin?
Brainfuck is somewhat usable; however, most assembly languages are far more
powerful. FALSE doesn't look too bad, but it's still very simple, i.e., not
powerful. Most Esolang languages seem to be in the spirit of INTERCAL...


Rod Pemberton

Rod Pemberton

unread,
Apr 26, 2011, 8:12:20 PM4/26/11
to
"Ian Osgood" <ia...@quirkster.com> wrote in message
news:f7438818-1ec1-44b1...@l2g2000prg.googlegroups.com...

>
> There have been many experiments to create minimal languages, to
> determine the minimal number of operators and keywords required to
> still have a usable language.
>

Yes.

E.g.,
Brainfuck using 8 commands
Malbolge using 7 commands
a minimal Lisp using 8 to 14 operators
realistic minimal Forths using 11 to 31 primitives
OISC - one instruction set computer

http://en.wikipedia.org/wiki/One_instruction_set_computer


OISC allows you to reduce all functionality to one instruction. But, once
you've got a bunch of one-instruction operations, how do you turn them into
something useful? I.e., at that point, assembly is a much higher level
construct. You need to use assembly effectively to get work done. So,
except for bootstrapping a language and implementation convenience, I'm not
sure if there is any purpose to having a language that can be constructed
from a minimalist set of functionality. It's not going to effectively use
the host platform's assembly language. Since it won't, it'll produce
bloated or slow code.
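
For reference, a SUBLEQ-style machine (one of the classic OISC choices) fits
in a few lines of C. The halt convention and the little "add two numbers"
program below are only for illustration:

#include <stdio.h>

int main(void)
{
    /* each instruction is three addresses a,b,c meaning:
     *   mem[b] -= mem[a];  if (mem[b] <= 0) goto c;
     * a negative 'a' is used here (a common convention, not a standard)
     * to halt.  the program adds 7 and 5 into mem[15]. */
    long mem[32] = {
        12, 14,  3,     /* tmp -= X   (tmp = -7)  */
        13, 14,  6,     /* tmp -= Y   (tmp = -12) */
        14, 15,  9,     /* Z   -= tmp (Z   = 12)  */
        -1, -1, -1,     /* halt                   */
         7,  5,  0, 0   /* X=7, Y=5, tmp=0, Z=0   */
    };
    long pc = 0;

    for (;;) {
        long a = mem[pc], b = mem[pc+1], c = mem[pc+2];
        if (a < 0) break;
        mem[b] -= mem[a];
        pc = (mem[b] <= 0) ? c : pc + 3;
    }
    printf("%ld\n", mem[15]);        /* prints 12 */
    return 0;
}

It illustrates the point above: everything is expressible, but even trivial
operations turn into several instructions, so the win is conceptual rather
than practical.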

> stack language
>

I.e., 0-operand instruction set
http://en.wikipedia.org/wiki/Instruction_set

> Usually stack language semantics are
> chosen so that data-naming syntax can be eliminated.
>

Do you consider Brainfuck to be stack or array?

> The project in question, Nibz, was a minimal instruction set
> for a custom VSLI core with 16 opcodes named as syllables.
>

On paper, decades ago, I reduced the 6502 instruction set to 13
instructions. For integer programming, even on x86 today, you only
need a handful of instructions. Yet, the x86 has many instructions, i.e.,
more work per instruction (CISC).


Rod Pemberton


Nomen Nescio

unread,
Apr 27, 2011, 4:42:44 PM4/27/11
to
"Rod Pemberton" <do_no...@noavailemail.cmm> wrote:

> Do you consider Brainfuck to be stack or array?

BF is definitely NOT a stack based language. There's nothing in the spec
that is remotely related to a stack. It's based on the idea of a paper tape
rolling along, which is best represented by an array or vector.

You CAN use stacks to do stuff like checking nesting levels (not required by
the language spec btw) but you don't need to. There are other ways to do
that. I have a BF interpreter written in Ada without any use of stacks and
AFAIK it passes all the BF test suite.

> On paper, decades ago, I reduced the 6502 instruction set to 13
> instructions. For integer programming, even on x86 today, you only
> need a handful of instructions. Yet, the x86 has many instructions, i.e.,
> more work per instruction (CISC).

Now that you say that, a neat project might be a RISC assembler for x86. It
would implement all the thousands of less needed opcodes as macros expanding
to a small set of basic instructions. After you prove to Intel they're full
of shit with all these instructions they'll hire you to design their next
CPU family...either that or they put out a hit on you ;-)

BGB

unread,
Apr 27, 2011, 4:49:52 PM4/27/11
to

well, the complexity has to go somewhere...
one can have a high-level ISA, which allows simpler compilers and is
easier to write ASM for, but at the cost of a more complex processor.
a simpler processor can have a smaller ISA, but is much harder to use
(for humans and compilers), or results in reduced performance.

granted, whether or not a processor needs 1000s of instructions, or only
a few hundred, is a less certain issue.


I have observed that most of my custom interpreters seem to level off
with around a few hundred opcodes, even with statically-typed opcodes
(where one ends up with many variations on an instruction differing only
in type).


I recently designed a "new" statically typed bytecode (for a possible
alternate way of approaching my "BGBScript2" language implementation,
if/when I get back to it), which dropped the prior strategy (which used
implied/inferred static types) to use an explicitly-typed bytecode
design (more like Java ByteCode, but with more features...).

the initial design managed to level off at around 320 opcodes (it may
expand some more, but not likely drastically, apart from adding
micro-optimization opcodes such as combined compare/jump opcodes and
similar).

the advantage of using explicit types though would be that it would
allow a much simpler interpreter and JIT (as both are relieved from
having to deal with an otherwise complex type and scope-management system).
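
for example, a trivial direct interpreter over explicitly-typed opcodes
needs little more than a switch, since it never has to inspect value types
at runtime (opcode names and encoding here are invented, not the actual
bytecode):

#include <stdio.h>

enum { OP_PUSHI, OP_ADDI, OP_MULI, OP_RETI };

static int run(const int *code)
{
    int stack[64], sp=0;
    for(;;){
        switch(*code++){
        case OP_PUSHI: stack[sp++]=*code++;            break;
        case OP_ADDI:  sp--; stack[sp-1]+=stack[sp];   break;
        case OP_MULI:  sp--; stack[sp-1]*=stack[sp];   break;
        case OP_RETI:  return stack[--sp];
        }
    }
}

int main(void)
{
    /* straight-line code for 3*x+y with x=4, y=5 */
    const int code[]={ OP_PUSHI,4, OP_PUSHI,3, OP_MULI,
                       OP_PUSHI,5, OP_ADDI, OP_RETI };
    printf("%d\n", run(code));       /* prints 17 */
    return 0;
}

with implied/inferred types, by contrast, the interpreter (or JIT) first has
to reconstruct what type each stack slot holds before it can pick the
operation.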

I largely wrote out an interpreter as well, but didn't finish it up (nor
write any compiler code to produce the new bytecode format it would need).


for now, I have ended up working some on making some more modest
improvements to my existing BGBScript VM (which is mostly dynamically
typed, with type-specialized opcodes as optimization special cases).

recently I also went and fixed the type-tagging and type-inference
machinery, as apparently these had probably broken back during the 2008
rewrite (which was a bit hack-and-slash, as BS was not a high priority
at the time, given how much effort I was investing into trying to use C
as a viable scripting language...), and made a few (modest) alterations
which should make performance less dismal (allowing basic arithmetic
operations to skip type-lookups and "strcmp()" calls...).

it turned out the type-tagging problem was due to a mismatch between the
parser and compiler code as to what form the type-tags would take in
the AST.


quick check:
fib(28) dynamically-typed runs about 7.2s to 7.9s.
fib(28) with type-tags is around 5.4s.

it is still prone to triggering GC (I suspect because the interpreter is
presently bulk-leaking cons cells... which are used for setting up the
lexical environment frames). it will run around 18s-22s if a GC triggers
right in the middle of the operation...

probably doing something about the cons-cell leak would be better.
previously, there was a feature where non-captured frames would use the
stack (instead of cons-based lists) for storing lexical variables, but
this logic is disabled at the moment (I think I disabled it previously
in an effort to hunt down bugs, and disabling non-essential features
helps make bugs easier to identify).


technical note: this is in a 3D engine (a 3D FPS/ARPG style game
project) which currently uses around 300MB of heap data, which makes the
GC a bit slower... (I was running the fib-checks from the in-game
console...).

so... it is slow, but maybe I can improve it...


here is the code for the test I was running (BGBScript):
function fib(x:int):int
{
    if(x>2)return(fib(x-2)+fib(x-1));
    else return(1);
}

var t0, t1, dt;

function fibtst(x)
{
    t0=clock();
    printf("%f\n", t0);
    fib(x);
    t1=clock();
    dt=t1-t0;
    printf("%d\n", t0);
    printf("%d\n", t1);
    printf("%d\n", dt);
}


as can be noted, most things I am using the scripting for at the moment
don't depend so much on raw speed...

but, it works...

BGB

unread,
Apr 27, 2011, 6:06:06 PM4/27/11
to

I wrote an x86 interpreter, but this is not quite the same thing.

Transmeta marketed a chip (during their short lifespan) which translated
x86 into a RISC-like form, however they couldn't match true x86 chips
performance-wise, and this was a major problem.

modern x86 chips do virtualize many of their less common instructions,
so it is not like they handle the entire ISA directly in the HW.

...


I had considered a mini-x86 before for other reasons (mostly to have an
x86 variant which could be more readily JITed to different ISAs,
possibly of different word sizes or endianness, vs full x86 which sort of
gets in the way of allowing this to be possible).

for example, a GPU running a minimalist x86 cores, could be interesting
(then one could more easily run custom languages on the GPU, rather than
being limited to GLSL and HLSL and a C subset and similar, due mostly at
this point to different GPUs using different ISAs internally...).

or, maybe, a JIT/translator from such an x86 subset to whatever the GPU
in question happens to be running...


or such...

Brad

unread,
Apr 28, 2011, 11:11:29 AM4/28/11
to
I hear crickets too.

You should reserve [ and ] for their traditional role of manipulating
STATE. Other than that, it's a nice syntax. Like colorForth without
the color.

IMHO colorForth's use of color is more powerful. Each word header is a
Shannon-coded data structure including a color and a name. I'd lose
the Shannon coding. Your syntax is even more compact. If you used
colors you wouldn't need [ and ]. You would benefit from an editor
that displays glossary information in a mouseover pane.

-Brad

Nils M Holm

unread,
Apr 28, 2011, 12:53:41 PM4/28/11
to

Hello everybody, and thank you for your inspiring responses! They are
appreciated!

At the moment, there is some rather heavy restructuring going on in
my life, so unfortunately I do not have the resources to check and
write news as often as I would like to. This is why I respond in
a single posting here.

X-posted to c.l.f. once more, in case you are not all following c.l.m.
F'up-To: c.l.m.

------------------------------------------------------------------------

Marco van de Voort <mar...@turtle.stack.nl> wrote:
> That is probably partially the reason for the lesser readability, short
> procedures, and enforcing it. A one line description comment can then go a
> long way.

I do not think I understand this. In U you can have any number
of comment lines before a single-line procedure, but I guess you
meant something else.

> Forcing short procedure gives you simpler procedures, but hoists
> the complexity of the problem from the procedure implementation to
> complicate the global state

I agree that short procedures only shift the problem to another
level. However, short procedures offer more possibilities to
introduce meaningful names for pieces of code that would otherwise
be inlined. (Of course this practice does not guarantee good code;
any rule can be bent or broken.)

> That is a situation that simply never happens in professional programming.

I agree that "real-world" programming is an entirely different
beast than tinkering in a lab. However, I would not call tinkering
"non-professional."

I find it a bit sad to hear that people consider a programming
language as no good when it cannot be used to maintain code that is
basically a mess (and, having spent a few years in business software
development, I would say that most professional code *is* a mess).

The most maintenance-friendly language for messy code may indeed
be COBOL, Pascal, or any other least-surprise language, but not
because of its syntax, but because of its predictable semantics.

> Math is a different problem. It is not a programming language.

Is this math or a program?

f(0) |--> 1
f(x) |--> f(x-1)*x

And this?

fun f 0 = 1
| f n = n * f (n - 1)

Yes, only a limited example, but I would say that the border is
blurring more and more.

------------------------------------------------------------------------

foxchip <f...@ultratechnology.com> wrote:
> Chuck used that idea on the original machineForth for P20
> back in 1990

> [ Lots of interesting trivia ...]

I was sure that the idea was not new! :-)

> Or Forth. BTW I used U for Until.

I had thought about using F and N for FOR and NEXT, but then
it occurred to me that all flow control operators have a
beginning and an end, so I decided to use the various brackets
for those: () = for/next, `{} = begin/while/repeat,
[;] = if/else/then.

> For me the bottom line was that it is hard for most
> programmers to learn new things once things become
> habit that they don't think about anymore.

Yes, this is probably true for most professional programmers.
Hobbyists seem to be different in this regard, maybe due to
their less purpose-oriented motivation (which I consider to
be a good thing).

------------------------------------------------------------------------

Brad <hwf...@gmail.com> wrote:
> Or, you can
> make your own editor. Set it up so when you mouse-over a particular 1-
> character operator, a help window shows the stack and description of
> it.

I had thought about a similar idea. Some block-oriented FORTH systems
used something which was called a shadow screen (?) that held the
stack relations and comments for the words in the corresponding principal
screen. I found this a pretty nifty feature, but could never come up
with a straight-forward idea to implement it in a more dynamic (i.e.
not block-oriented) environment.

> I like your
> syntax because it doesn't need color. An editor could color it for you

> to make it more readable. [...]

The UE editor that is contained in the U package does syntax
highlighting. It's only a tiny vi subset, though, just 300 lines
of U.

> The ability to switch between immediate and compile states
> (done at edit time in colorForth) would be useful, but you are already
> using [ and ].

[] is if/then, but you discovered that later. I am not using state
at all, because I rarely used it anyway.

> Usually brackets are used to manipulate STATE. What terminates the
> definition besides EOL?

Only the EOL terminates definitions.

> You should reserve [ and ] for their traditional role of manipulating
> STATE.

Because U does not use state, I recycled it for if/then.

> Other than that, it's a nice syntax. Like colorForth without
> the color.

Thank you! I have never had a closer look at colorForth, but
I certainly liked the idea.

------------------------------------------------------------------------

Ian Osgood <ia...@quirkster.com> wrote:
> Of course, this is nothing new.

I thought so!

> The esolangs site is a wiki, and they would welcome the addition of
> your "U" language.

I will keep that in mind!

> One of the
> experimenters noticed that programs written in such a language often
> contained common "phrases", oft repeated short sequences of words for
> doing common data operations. Although most people think of such
> sequences in terms of actual sentences or fragments, he thought of it
> as a word in itself. By naming the keywords as syllables such as "BA"
> and "RE", such common phrases become pronounceable words themselves.

This idea went through my head for a brief moment while thinking
about the U operators, but I discarded it immediately, because I
thought that it would add an unmanageable level of complexity to
the language design. Nice to see that somebody else accepted the
challenge and even succeeded!

------------------------------------------------------------------------

AshleyF <feni...@gmail.com> wrote:

> Try out: http://www.lkjsdf.com/archive/turtle/?(320dg0(1adf89r)720x1lc)1000x
> Compact, single-character words, interactive Logo-like toy language.

Very nice, thank you! (Although the animated example almost made my
browser freeze (ancient hardware at work here)).

> One interesting aspect is the use of quotations (similar to Joy or
> Factor). You can define new words with (abc)Z then use Z elsewhere in
> place of abc or you can push quotations on the stack and apply
> combinators such as "repeat" - x. For example, (abc)10x repeats abc 10
> times.

Yes, this is an interesting feature indeed. However, it places
the language in an entirely different category, at least from an
implementor's POV, because you need a garbage collector, which
makes the runtime much more complex.

Incidentally, though, '10(abc) repeats abc 10 times in U. :-)

------------------------------------------------------------------------

Mat <dam...@web.de> wrote:

> I think both profit from eliminating white space as a separator, but if
> I rewrite your examples they come close to readable code for me:

When I wrote the first U programs, I used some white space, too,
but when I became more proficient, it started to look redundant.

> You can obfuscate your language much more by replacing (optional?)

> white spaces through other characters like #,°,^,§ at will:

:-)

------------------------------------------------------------------------

jacko <jacko...@gmail.com> wrote:

> Have you tried J instead of APL?

Yes, I have tried J and liked it.

I also liked K. I did not use either of them much, though, due
to their proprietary nature. The last proprietary language I
have used, ever, was Turbo C 2.x (not C++!) in the late 80's.

Best Wishes, Nils

Fritz Wuehler

unread,
Apr 28, 2011, 3:09:31 PM4/28/11
to
> Transmeta marketed a chip (during their short lifespan) which translated
> x86 into a RISC-like form, however they couldn't match true x86 chips
> performance-wise, and this was a major problem.

Sounds like poor design and probably a lot of complexity.

>
> modern x86 chips do virtualize many of their less common instructions,
> so it is not like they directly handle the entire ISA directly in the HW.

I think that's true of most non-RISC designs. Mainframe CPUs are heavily
microcoded.

> for example, a GPU running a minimalist x86 cores, could be interesting
> (then one could more easily run custom languages on the GPU, rather than
> being limited to GLSL and HLSL and a C subset and similar, due mostly at
> this point to different GPUs using different ISAs internally...).

Isn't nvidia doing that already?

BGB

unread,
Apr 28, 2011, 4:58:26 PM4/28/11
to
On 4/28/2011 12:09 PM, Fritz Wuehler wrote:
>> Transmeta marketed a chip (during their short lifespan) which translated
>> x86 into a RISC-like form, however they couldn't match true x86 chips
>> performance-wise, and this was a major problem.
>
> Sounds like poor design and probably alot of complexity.
>

AFAIK they basically used a flash-ROM in the processor, where there
would be a translator within the CPU (in the ROM), which would try to
rewrite the x86 code into its native ISA within its caches.

the downside was that there was a notable performance overhead.


>>
>> modern x86 chips do virtualize many of their less common instructions,
>> so it is not like they handle the entire ISA directly in the HW.
>
> I think that's true of most non-RISC designs. Mainframe CPUs are heavily
> microcoded.
>

yep.


>> for example, a GPU running a minimalist x86 cores, could be interesting
>> (then one could more easily run custom languages on the GPU, rather than
>> being limited to GLSL and HLSL and a C subset and similar, due mostly at
>> this point to different GPUs using different ISAs internally...).
>
> Isn't nvidia doing that already?
>

AFAIK, their GPU cores are ARM-based.
shaders/'kernels'/... then have to be compiled into ARM to run on them
(via OpenGL, OpenCL, or CUDA).

but, ATI uses a different ISA AFAIK, so effectively one has to produce
code specific to each GPU (if they could bypass the drivers, that is).

whereas, if there was a more-standardized x86-based ISA, then people
could probably write more generic JITs, rather than having to rely on
the drivers and similar to do all the translation work.


granted, one could trans-compile custom languages into the C variant and
feed them into OpenCL, but alas...


or such...

Rod Pemberton

unread,
Apr 28, 2011, 5:19:55 PM4/28/11
to
"Nomen Nescio" <nob...@dizum.com> wrote in message
news:ff612e0d5d7a1cf8...@dizum.com...

> > Do you consider Brainfuck to be stack or array?
>
> BF is definitely NOT a stack based language.
>

The reason I asked Ian was two-fold. First, ISTM that BF programmers
basically use the array like a (non-moving) stack, except there is no direct
PUSH, POP or DROP. Second, much of the BF code I've seen copies and moves
cell values, in essence, it functions like some less common stack
operations, e.g., PICK, PLACE, and maybe ROLL or ROT, in Forth.

> BF test suite
>

Uh, ... I'm not familiar with a BF test-suite. I do recall one BF IDE
that had a larger collection of programs. I collected a handful of BF
programs to test my code.

> I have a BF interpreter [...] and AFAIK it passes all the BF test suite.
>

That it passes all the tests is good though. Of what I believe to be
the best five BF interpreters from the Internet, all of them fail with
one program or another. I've fixed a few trivial errors in them for my use
that prevented some programs from running. However, they still fail with
certain programs. They are:

BCCI.C by Daniel B. Cristofani
PBRAIN.C by Daniel B. Cristofani
QDB.C by Daniel B. Cristofani
SBI.C by Daniel B. Cristofani
BFF4.C by Oleg Mazonka - based on Daniel B. Cristofani's DBFI

> Ada
>

Ada ... ? Ok, that would make you the 3rd person, AIR, maybe 4th..., in the
NGs I've come across with Ada experience, if you aren't one of the others...

> I have a BF interpreter written [...] without any use of stacks
>

Ok.

I have a couple variations of a BF interpreter in C. I also have a handful
of BF to C converters that emit code using C arrays. The most advanced one
optimizes redundant BF commands into integer constants, indents the emitted
C code correctly, doesn't allocate any memory when converting, and does so
in a single pass. In addition to those, I have a couple variations of
another type of BF to C converter that don't emit code which use C arrays.
They emitted C code that used "static" cells, i.e., file-scope declared C
variables not actually declared static. When it worked correctly, it
produced C code that optimized very well. Unfortunately, BF code can change
the cell used as the loop variable, which made the static naming a
problem... I also learned that BF programmers use a number of various
side-effects of the array, such as array wrap-around, etc. All of these
only work with 8-bit cells and a 32KB array. I've run across code of larger
array sizes and different integer sizes. They fail with what I've done.
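
Roughly, the run-length part of such a converter can be sketched as below.
It collapses runs of + - < > into one emitted statement each, in a single
pass and with no allocation. This is only a toy illustration, not any of
the converters mentioned above:

#include <stdio.h>

static void emit(const char *src)
{
    int depth = 1;
    printf("#include <stdio.h>\n"
           "static unsigned char m[32768]; static int p;\n"
           "int main(void){\n");
    while (*src) {
        int c = *src, n = 0;
        switch (c) {
        case '+': case '-': case '>': case '<':
            while (*src == c) { n++; src++; }   /* collapse the run */
            if (c == '+') printf("%*sm[p] += %d;\n", depth*2, "", n);
            if (c == '-') printf("%*sm[p] -= %d;\n", depth*2, "", n);
            if (c == '>') printf("%*sp += %d;\n",    depth*2, "", n);
            if (c == '<') printf("%*sp -= %d;\n",    depth*2, "", n);
            continue;                           /* src already advanced */
        case '.': printf("%*sputchar(m[p]);\n",    depth*2, ""); break;
        case ',': printf("%*sm[p] = getchar();\n", depth*2, ""); break;
        case '[': printf("%*swhile(m[p]){\n", depth*2, ""); depth++; break;
        case ']': depth--; printf("%*s}\n",   depth*2, ""); break;
        default: break;                         /* anything else is a comment */
        }
        src++;
    }
    printf("  return 0;\n}\n");
}

int main(void)
{
    emit("++++++++[>++++++++<-]>+.");           /* emits C that prints 'A' */
    return 0;
}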

> Now that you say that a neat project might be an RISC assembler for x86.
> It would implement all the thousands of less needed opcodes as macros
> expanding to a small set of basic instructions. After you prove to Intel
> they're full of shit with all these instructions they'll hire you to

> design their next CPU family... [...]
>

AFAIK, the x86 micro-code isn't available. QEMU breaks down each
instruction into C code. GCC's backend has to choose instructions for it's
intermediate representation (IR). So, it probably has an x86 instruction
breakdown but in terms of it's IR. So, I'd expect other emulators and
binary translators to have something also. Whether it is re-useable or
effective is another topic.


Rod Pemberton

BGB

unread,
Apr 29, 2011, 12:39:55 AM4/29/11
to

yeah...

my x86 interpreter converted it into lists of structs containing
function pointers...

it worked well enough...


arrays of function pointers could also be interesting:

void (**fpa)(Context *ctx);
...
while(*fpa)
(*fpa++)(ctx);

albeit this doesn't itself make any provision for jumps, unless maybe:
while(*ctx->fpa)
(*ctx->fpa++)(ctx);
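
e.g., a small self-contained sketch (all the names are made up) where a jump
is taken simply by having the handler overwrite the context's "instruction
pointer":

#include <stdio.h>

typedef struct Context Context;
typedef void (*op_fn)(Context *ctx);

struct Context {
    op_fn *fpa;        /* next handler to run                        */
    op_fn *target;     /* jump target, filled in when code is built  */
    int counter;
};

static void op_dec(Context *ctx)   { ctx->counter--; }
static void op_print(Context *ctx) { printf("%d\n", ctx->counter); }
static void op_jnz(Context *ctx)   { if(ctx->counter) ctx->fpa=ctx->target; }

int main(void)
{
    op_fn code[]={ op_print, op_dec, op_jnz, NULL };
    Context ctx={ code, code /* loop back to op_print */, 3 };

    while(*ctx.fpa)
        (*ctx.fpa++)(&ctx);    /* handlers may overwrite ctx.fpa to branch */
    return 0;                  /* prints 3, 2, 1 */
}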

but, alas, the script interpreter I am currently stuck with (due
to my inability to effectively replace it) has a very different problem:
most of its running-time overhead is currently going into type-checking
and GC-related stuff (and running the GC, as it still spews cons cells
at a high rate...).

but, I have shaved its time to evaluate fib(28) to 3 seconds, which is a
little better...

earlier today, evaluated fib(34) and it took several minutes...

or such...

BGB

unread,
Apr 29, 2011, 2:23:59 AM4/29/11
to
(quick status update...)

On 4/28/2011 9:39 PM, BGB wrote:

>
> but, I have shaved its time to evaluate fib(28) to 3 seconds, which is a
> little better...
>

now, fib(28) is down to 400ms, once I plugged a spot where it was
spewing garbage, and then noted that a large amount of time was being
eaten up by Windows' mutex locking...


> earlier today, evaluated fib(34) and it took several minutes...

now fib(34) is down to a few seconds...

fib(38) is now being used for the profiler instead (since I need
something with a decent running time).


now, the majority of time is actually in the interpreter (rather than in
the GC or elsewhere...).

BartC

unread,
Apr 29, 2011, 4:51:45 AM4/29/11
to

"BGB" <cr8...@hotmail.com> wrote in message

news:ipdlm4$sq2$1...@news.albasani.net...


> (quick status update...)
>
> On 4/28/2011 9:39 PM, BGB wrote:
>
>>
>> but, I have shaved its time to evaluate fib(28) to 3 seconds, which is a
>> little better...
>>
>
> now, fib(28) is down to 400ms, once I plugged a spot where it was

That's much better, and what you might expect of an interpreter. Although
it's now a surprise that using string processing on types can be so quick...

>> earlier today, evaluated fib(34) and it took several minutes...
>
> now fib(34) is down to a few seconds...
>
> fib(38) is now being used for the profiler instead (since I need something
> with a decent running time).
>
>
> now, the majority of time is actually in the interpreter (rather than in
> the GC or elsewhere...).

Yes, 90% overhead for garbage collection, especially for a fib() function on
values which easily fit into 32 bits, doesn't sound right.

--
Bartc

Nomen Nescio

unread,
Apr 29, 2011, 5:02:51 AM4/29/11
to
BGB <cr8...@hotmail.com> wrote:

> well, the complexity has to go somewhere...
> one can have a high-level ISA, which allows simpler compilers and is
> easier to write ASM for, but at the cost of a more complex processor.
> a simpler processor can have a smaller ISA, but is much harder to use
> (for humans and compilers), or results in reduced performance.

That's pretty general. So general I think it might not even be accurate.

> granted, whether or not a processor needs 1000s of instructions, or only
> a few hundred, is a less certain issue.

Now we're getting closer.

>
>
> I have observed that most of my custom interpreters seem to level off
> with around a few hundred opcodes, even with statically-typed opcodes
> (where one ends up with many variations on an instruction differing only
> in type).

That's interesting. I think I will write some code to analyze my assembler
code and count opcodes. I bet 90% of my code uses much less than 100
different instructions, probably closer to 50.

> the initial design managed to level off at around 320 opcodes (it may
> expand some more, but not likely drastically, apart from adding
> micro-optimization opcodes such as combined compare/jump opcodes and
> similar).

That's mind boggling. I can't imagine needing so many opcodes.


BGB

unread,
Apr 29, 2011, 6:08:41 AM4/29/11
to
On 4/29/2011 1:51 AM, BartC wrote:
>
>
> "BGB" <cr8...@hotmail.com> wrote in message
> news:ipdlm4$sq2$1...@news.albasani.net...
>> (quick status update...)
>>
>> On 4/28/2011 9:39 PM, BGB wrote:
>>
>>>
>>> but, I have shaved its time to evaluate fib(28) to 3 seconds, which is a
>>> little better...
>>>
>>
>> now, fib(28) is down to 400ms, once I plugged a spot where it was
>
> That's much better, and what you might expect of an interpreter.
> Although it's now a surprise that using string processing on types can
> be so quick...
>

I added more optimization cases in the arithmetic handling so that now
many of the string-based type-checks can be skipped for fixnum and
flonum types (because the predicates for these can use address
range-checks instead...).

this way, it will not fall back to the slower type handling unless one
of the arguments in an arithmetic expression is not a fixnum or a
flonum. I also added a "dyfixrealp()" predicate which detects this case.

the cons-cell predicate ("dyconsp()") also avoids checking a type by
name, but is still a little costly as it does address range-checks
against known cons-cell blocks (cons cells are presently allocated in
512KiB chunks), which is also not free (however, I did reorganize the
logic some so that it was a little more efficient).
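
the range-check trick itself is roughly the following (region setup and all
the names here are only for illustration, not the actual GC internals); the
same pattern extends to checking a pointer against the known cons-cell
chunks:

#include <stdint.h>
#include <stdlib.h>

static uintptr_t fix_base, fix_end;   /* bounds of the reserved fixnum range */

static void init_fixnum_range(size_t bytes)
{
    void *p=malloc(bytes);      /* stand-in for however the range is reserved */
    fix_base=(uintptr_t)p;
    fix_end=fix_base+bytes;
}

/* hypothetical predicate: two compares, no type-name fetch, no strcmp() */
static int is_fixnum_addr(const void *p)
{
    uintptr_t a=(uintptr_t)p;
    return (a>=fix_base) && (a<fix_end);
}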


name-based handling is still used for function-types though (the "call"
handler in the interpreter will itself check if it is a recognized
function/method type, and then pass it off to more "generic" logic
elsewhere...).

however, this is not presently near the top of the list in the profiler...

>>> earlier today, evaluated fib(34) and it took several minutes...
>>
>> now fib(34) is down to a few seconds...
>>
>> fib(38) is now being used for the profiler instead (since I need
>> something with a decent running time).
>>
>>
>> now, the majority of time is actually in the interpreter (rather than
>> in the GC or elsewhere...).
>
> Yes, 90% overhead for garbage collection, especially for a fib()
> function on values which easily fit into 32 bits, doesn't sound right.
>

actually, the fixint/fixnum type is only 28 bits (on 32-bit x86),
meaning that going outside this range will result in type-boxing
(allocating heap objects for the value).

sadly, ECMA-262 mandates always using doubles, which would be very
costly in my case (as doubles need to be boxed), so the use of fixnums
and flonums instead is technically a violation of the ECMA-262 standard,
but whatever...


granted, but the GC library is responsible for memory
allocation/freeing, for the internal aspects of type-handling (looking
up the type names, ...). so pretty much anything memory or types related
goes through the GC (although, it deals with the low-level aspects of
the typesystem, as the high-level typesystem is located in a different
library).

also, the interpreter does do a fair number of type-checks, and also a
good amount of allocating and freeing memory in tight loops, so the GC
has its finger in it...


actually, the main one of these "tight loop types" is cons cells, which
are used to build many internal structures for the running program (such
as argument lists and environment frames). the result is that (sadly) the
interpreter's present/overall performance is highly dependent on the
performance of cons-cell operations.
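
for reference, chunked cons-cell allocation off a free list looks roughly
like the following (the cell layout, chunk size, and names are assumptions,
not the actual MM/GC internals):

#include <stdlib.h>

typedef struct ConsCell {
    struct ConsCell *car, *cdr;     /* cdr doubles as the free-list link */
} ConsCell;

#define CHUNK_BYTES (512*1024)
#define CELLS_PER_CHUNK (CHUNK_BYTES/sizeof(ConsCell))

static ConsCell *free_list;

static int grow_heap(void)
{
    ConsCell *chunk=malloc(CHUNK_BYTES);
    size_t i;
    if(!chunk)return(0);
    for(i=0; i<CELLS_PER_CHUNK; i++){   /* thread the new chunk onto the free list */
        chunk[i].cdr=free_list;
        free_list=&chunk[i];
    }
    return(1);
}

static ConsCell *alloc_cons(ConsCell *car, ConsCell *cdr)
{
    ConsCell *c;
    if(!free_list && !grow_heap())return(NULL);  /* a real VM would run the GC here */
    c=free_list;
    free_list=c->cdr;
    c->car=car; c->cdr=cdr;
    return(c);
}

allocation is then a couple of pointer moves, so the real cost ends up being
how fast cells can be reclaimed (or not leaked) rather than handed out.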

a major aspect of this optimization effort was actually tracking down
and fixing a spot where the interpreter was leaking cons cells
(actually, I had found several such spots...).

for example:
function-argument lists were being leaked (in a past GC, ref-counting
was used here, but ref-counting is not used at present);
lexical binding scopes were being leaked if the return statement was in
expression context;
...

misc:
http://en.wikipedia.org/wiki/Cons_cell


could be worse though...


or such...

BGB

unread,
Apr 29, 2011, 6:27:31 AM4/29/11
to
On 4/29/2011 2:02 AM, Nomen Nescio wrote:
> BGB<cr8...@hotmail.com> wrote:
>
>> well, the complexity has to go somewhere...
>> one can have a high-level ISA, which allows simpler compilers and is
>> easier to write ASM for, but at the cost of a more complex processor.
>> a simpler processor can have a smaller ISA, but is much harder to use
>> (for humans and compilers), or results in reduced performance.
>
> That's pretty general. So general I think it might not even be accurate.
>

these sorts of triangle-like trade-offs seem common IME...


>> granted, whether or not a processor needs 1000s of instructions, or only
>> a few hundred, is a less certain issue.
>
> Now we're getting closer.
>
>>
>>
>> I have observed that most of my custom interpreters seem to level off
>> with around a few hundred opcodes, even with statically-typed opcodes
>> (where one ends up with many variations on an instruction differing only
>> in type).
>
> That's interesting I think I will write some code to analyze my assembler
> code and count opcodes. I bet 90% of my code uses much less than 100
> different instructions, probably closer to 50.
>

dunno...


>> the initial design managed to level off at around 320 opcodes (it may
>> expand some more, but not likely drastically, apart from adding
>> micro-optimization opcodes such as combined compare/jump opcodes and
>> similar).
>
> That's mind boggling. I can't imagine needing so many opcodes.
>

well, mostly it is simple patterns...

say, the arithmetic operators:
add, sub, mul, div, mod, and, or, xor, shl, shr, ushr

and the types:
int, long, float, double, pointer, reference

right there are 66 opcodes, although one can shave off a few cases that
don't make sense (division with pointers, bitwise ops or shifts with
float and double, ...).

then when one throws in a few other things:
typed variable loads/stores;
typed array-index load/stores;
...

36 more for typed compare operators:
cmpeqi, cmpltf, ...

36 more for typed compare+jump:
jmpnei, jmpgtd, ...

...


as well as different forms of function/method calls (static vs virtual
vs dynamic methods, ...).

one may soon find themselves looking at several hundred opcodes.
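
FWIW, one way to keep such an op-times-type cross product manageable in C is
to generate the opcode enum (and any parallel tables) from a pair of
X-macros; the macro names here are invented:

#define EACH_TYPE(X, op) X(op,I) X(op,L) X(op,F) X(op,D) X(op,P) X(op,R)

#define EACH_OP(X) \
    EACH_TYPE(X,ADD) EACH_TYPE(X,SUB) EACH_TYPE(X,MUL) EACH_TYPE(X,DIV) \
    EACH_TYPE(X,MOD) EACH_TYPE(X,AND) EACH_TYPE(X,OR)  EACH_TYPE(X,XOR) \
    EACH_TYPE(X,SHL) EACH_TYPE(X,SHR) EACH_TYPE(X,USHR)

#define DEF_OPCODE(op, ty) OP_##op##_##ty,

enum opcode {
    EACH_OP(DEF_OPCODE)    /* OP_ADD_I ... OP_USHR_R: the 66 arithmetic cases */
    OP_LAST
};

the nonsensical combinations would still get enum slots this way, but they
can simply be left unimplemented (or filtered with a second macro).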


granted, having higher-level opcodes (or dynamic types) does reduce the
count some.

Nomen Nescio

unread,
Apr 29, 2011, 8:19:27 AM4/29/11
to
"Rod Pemberton" <do_no...@noavailemail.cmm> wrote:

> "Nomen Nescio" <nob...@dizum.com> wrote in message
> news:ff612e0d5d7a1cf8...@dizum.com...
> > > Do you consider Brainfuck to be stack or array?
> >
> > BF is definitely NOT a stack based language.
> >
>
> The reason I asked Ian was two-fold. First, ISTM that BF programmers
> basically use the array like a (non-moving) stack, except there is no direct
> PUSH, POP or DROP. Second, much of the BF code I've seen copies and moves
> cell values, in essence, it functions like some less common stack
> operations, e.g., PICK, PLACE, and maybe ROLL or ROT, in Forth.

I really don't know since I haven't been bored enough to try to actually
write BF code. I was laughing my ass off just running some of it. I don't
know Forth but it seems worth looking into; I like weird languages.

>
> > BF test suite
> >
>
> Uh, ... I'm not familiar with a BF test-suite. I do recall one BF IDE
> that had a larger collection of programs. I collected a handful of BF
> programs to test my code.

It's on one of Cristofani's pages. I'll send you a link if I can find it
again.

>
> > I have a BF interpreter [...] and AFAIK it passes all the BF test suite.
> >
>
> That it passes all the tests is good though. Of what I believe to be
> the best five BF interpreters from the Internet, all of them fail with
> one program or another. I've fixed a few trivial errors in them for my use
> that prevented some programs from running. However, they still fail with
> certain programs. They are:
>
> BCCI.C by Daniel B. Cristofani
> PBRAIN.C by Daniel B. Cristofani
> QDB.C by Daniel B. Cristofani
> SBI.C by Daniel B. Cristofani
> BFF4.C by Oleg Mazonka - based on Daniel B. Cristofani's DBFI

What code do they fail on? I'm surprised Cristofani's would fail.

>
> > Ada
> >
>
> Ada ... ? Ok, that would make you the 3rd person, AIR, maybe 4th..., in the
> NGs I've come across with Ada experience, if you aren't one of the
> others...

I just messed around with it a little, since it's one of the few mainframe
languages that runs on PCs. It's pretty nice, although the usual GPL
bullshit applies.

>
> > I have a BF interpreter written [...] without any use of stacks
> >
>
> Ok.
>
> I have a couple variations of a BF interpreter in C. I also have a handful
> of BF to C converters that emit code using C arrays. The most advanced one
> optimizes redundant BF commands into integer constants, indents the emitted
> C code correctly, doesn't allocate any memory when converting, and does so
> in a single pass. In addition to those, I have a couple variations of
> another type of BF to C converter that don't emit code which use C arrays.
> They emitted C code that used "static" cells, i.e., file-scope declared C
> variables not actually declared static. When it worked correctly, it
> produced C code that optimized very well. Unfortunately, BF code can change
> the cell used as the loop variable, which made the static naming a
> problem... I also learned that BF programmers use a number of various
> side-effects of the array, such as array wrap-around, etc. All of these
> only work on 8-bit cells of 32KB in size. I've run across code of larger
> array sizes and different integer sizes. They fail with what I've done.

I think some of that behaviour is not defined; for example, I'm not sure the
tape MUST wrap, although it can. Also, the tape size should be AT LEAST some
value, but it can be any value larger. I ran into that with a big program
somebody wrote that didn't fit in a standard tape, so I adjusted my code; but
without prescanning the source, that is of course going to break any BF
compiler or interpreter that follows the standard.
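
To make it concrete, here's roughly the skeleton I mean (just a sketch; the
tape length and the wrap-on-'<'/'>' behaviour are exactly the knobs that
differ between implementations, and 8-bit wrapping cells are another such
choice):

#include <stdio.h>
#include <string.h>

#define TAPE_LEN 65536   /* "at least 30000" is the usual figure; bigger is fine */

static unsigned char tape[TAPE_LEN];

static void run(const char *src)
{
    long pc, ptr = 0, depth;
    long len = (long)strlen(src);

    for(pc = 0; pc < len; pc++)
    {
        switch(src[pc])
        {
        case '>': ptr = (ptr + 1) % TAPE_LEN; break;            /* wrap: optional */
        case '<': ptr = (ptr + TAPE_LEN - 1) % TAPE_LEN; break;
        case '+': tape[ptr]++; break;    /* 8-bit cells, wrap on overflow */
        case '-': tape[ptr]--; break;
        case '.': putchar(tape[ptr]); break;
        case ',': { int c = getchar(); if(c != EOF) tape[ptr] = (unsigned char)c; } break;
        case '[':                        /* skip forward to matching ] if cell==0 */
            if(!tape[ptr])
                for(depth = 1; depth && ++pc < len; )
                {
                    if(src[pc] == '[') depth++;
                    if(src[pc] == ']') depth--;
                }
            break;
        case ']':                        /* jump back past matching [ if cell!=0 */
            if(tape[ptr])
                for(depth = 1; depth && --pc >= 0; )
                {
                    if(src[pc] == ']') depth++;
                    if(src[pc] == '[') depth--;
                }
            break;
        }
    }
}

int main(void)
{
    run("++++++++[>++++<-]>+.");   /* prints '!' */
    return 0;
}

Swap the '%' wrap for a bounds check (or a bigger TAPE_LEN) and you get the
other common behaviours people's programs rely on.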

I saw something else you will like if you haven't seen it already: somebody
wrote a BF compiler that produces a linux ELF in one go. I mean no
intermediate steps - the guy takes BF source and spits out an ELF! That's a
seriously smart coder! If you haven't seen it, tell me and I will try to find
the link and post it.

>
> > Now that you say that a neat project might be an RISC assembler for x86.
> > It would implement all the thousands of less needed opcodes as macros
> > expanding to a small set of basic instructions. After you prove to Intel
> > they're full of shit with all these instructions they'll hire you to
> > design their next CPU family... [...]
> >
>
> AFAIK, the x86 micro-code isn't available. QEMU breaks down each
> instruction into C code. GCC's backend has to choose instructions for it's
> intermediate representation (IR). So, it probably has an x86 instruction
> breakdown but in terms of it's IR. So, I'd expect other emulators and
> binary translators to have something also. Whether it is re-useable or
> effective is another topic.

I wasn't thinking of doing it at the microcode level, just the functional
level. In other words, write an assembler that supports the whole x86
instruction set by cross-compiling to a RISC-like subset of x86
instructions. Not sure if it's reasonable, but it might be a nice proof of
concept.


Marco van de Voort

unread,
Apr 29, 2011, 12:07:44 PM4/29/11
to
On 2011-04-28, Nils M Holm <news...@t3x.org> wrote:
>
> Marco van de Voort <mar...@turtle.stack.nl> wrote:
>> That is probably partially the reason for the lesser readability, short
>> procedures, and enforcing it. A one line description comment can then go a
>> long way.
>
> I do not think I understand this. In U you can have any number
> of comment lines before a single-line procedure, but I guess you
> meant something else.

I meant that a short (and even possibly one-line) procedure can be described
fairly easily.

>> Forcing short procedure gives you simpler procedures, but hoists
>> the complexity of the problem from the procedure implementation to
>> complicate the global state
>
> I agree that short procedures only shift the problem to another
> level. However, short procedures offer more possibilities to
> introduce meaningful names for pieces of code that would otherwise
> be inlined. (Of course this practice does not guarantee good code;
> any rule can be bent or broken.)

In theory it could, yes, and such arguments are routinely used for nearly
every aspect of coding standards and language design.

However, from what I've seen, such benefits are extremely small, if any.

Moreover, you would actually need a tool or some administration to find such
redundancies, since you have to know something exists before you can use
it.

>> That is a situation that simply never happens in professional programming.
>
> I agree that "real-world" programming is an entirely different
> beast than tinkering in a lab. However, I would not call tinkering
> "non-professional."

I do all kinds of tinkering. I have my main production apps, but also a lot
of microcontroller programs and utility stuff. (which are much more
play-ready than the main apps).

Deadlines, those are the difference. Actually I do more high tech work at
home for various open source projects than professionally. But at work it is
always a race with the clock, and heaps of legacy concerns.

> I find it a bit sad to hear that people consider a programming
> language as no good when it cannot be used to maintain code that is
> basically a mess (and, having spent a few years in business software
> development, I would say that most professional code *is* a mess).

True. IMHO the illusion that that can be avoided evaporates if your code
gets out of hand a few times. Simply because you find that the reasons for
that are usually external to the programmer (deadlines, changes in direction
in the company, dealing with external or bought in software).

The main reason is that such software is usually too large, and its use too
critical, to rewrite or refactor in a major way whenever requirements change.
And usually the validation and introduction of the rewrite to customers who
run old versions, claim nothing is wrong with them, and have no time is an
even bigger problem than finding the time for the rewrite.

Then you see posts here, or in other language-debate areas, that start with
nicely defined, overviewable problems with all border conditions known, and
a rewrite of toolchain + program on each major problem.

THAT IS NOT SOLVING A PROBLEM, that is micromanaging. Go play an RTS or so :-)

> The most maintenance-friendly language for messy code may indeed
> be COBOL, Pascal, or any other least-surprise language, but not
> because of its syntax, but because of its predictable semantics.

Ok. Then I made a

>> Math is a different problem. It is not a programming language.
>
> Is this math or a program?
>
> f(0) |--> 1
> f(x) |--> f(x-1)*x

There is a fine line between a problem description language and an
implementation language.

> And this?
>
> fun f 0 = 1
> | f n = n * f (n - 1)

Does this have all the same border conditions as the math example?
Infinitely deep numbers, etc.? Because that is the difference between a
problem description and an implementation of a solution: enough implicit
or explicit information to actually solve the problem automatically.

But anyway, I don't think math notation was invented for the above example,
so my original statement still holds.

> Yes, only a limited example, but I would say that the border is
> blurring more and more.

Since there are certain languages working hard to make it happen, yes. None of
them have really broken through in the general-purpose area. (and no, LINQ
and friends don't count)

Rod Pemberton

unread,
Apr 29, 2011, 1:55:58 PM4/29/11
to
"BGB" <cr8...@hotmail.com> wrote in message
news:ipdfj0$fvr$1...@news.albasani.net...

> On 4/28/2011 2:19 PM, Rod Pemberton wrote:
>
> > AFAIK, the x86 micro-code isn't available. QEMU breaks down each
> > instruction into C code. GCC's backend has to choose instructions for
> > it's intermediate representation (IR). So, it probably has an x86
> > instruction breakdown but in terms of it's IR. So, I'd expect other
> > emulators and binary translators to have something also. Whether it
> > is re-useable or effective is another topic.
> >
>
> yeah...
>
> my x86 interpreter converted it into lists of structs containing
> function pointers...
>

That works. It's probably pretty flexible too. Just change the pointers as
needed. Branching is about the only issue, I'd guess, based on my
experience with implementing a Forth interpreter.

> arrays of function pointers could also be interesting:
>
> void (**fpa)(Context *ctx);
> ...
> while(*fpa)
> (*fpa++)(ctx);
>
> albeit this doesn't itself make any provision for jumps, unless maybe:
> while(*ctx->fpa)
> (*ctx->fpa++)(ctx);
>
>

Maybe use an escape code to indicate a special instruction sequence, e.g., a
branch. One could use NULL or a small value (1, 2, or 3, etc.), i.e., not a
typical pointer value, for the escape code.
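
Something like this, maybe (purely illustrative; the Context fields and
handler names are made up, NULL ends the run, and here the "branch" is just
a handler that repoints the cursor - a small escape value in the array would
work much the same way, with an extra test before the call):

#include <stdio.h>

typedef struct Context Context;
typedef void (*OpFn)(Context *ctx);

struct Context {
    OpFn *fpa;     /* cursor into the handler array ("instruction pointer") */
    OpFn *base;    /* start of the array, for branch targets */
    int   acc;     /* a toy register */
};

static void op_inc(Context *ctx)   { ctx->acc++; }
static void op_print(Context *ctx) { printf("%d\n", ctx->acc); }

/* "branch": while acc < 5, jump back to the start of the run
   (a real encoding would carry the target as an immediate) */
static void op_loop(Context *ctx)
{
    if(ctx->acc < 5)
        ctx->fpa = ctx->base;
}

int main(void)
{
    OpFn prog[] = { op_inc, op_loop, op_print, NULL };
    Context ctx;

    ctx.fpa = ctx.base = prog;
    ctx.acc = 0;
    while(*ctx.fpa)
        (*ctx.fpa++)(&ctx);        /* prints 5 */
    return 0;
}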

> but, alas, the current script interpreter I am currently stuck with (due
> to my inability to effectively replace it), has a very different problem:
> most of its running-time overhead is currently going into type-checking
> and GC-related stuff (and running the GC, as it still spews cons cells
> at a high rate...).
>

type-checking? for an interpreter? One doesn't usually think of an
interpreter doing that.

Forth got around it by putting a small header on each Forth "word" that has
a pointer to a function. If the Forth "word" represents a type, i.e.,
variable or constant, instead of a function or procedure, the header's
pointer to a function contains a pointer to the code for processing that
specific type. If it's a function or procedure, there is another routine
for calling that function or procedure, etc. If it's a "compiled" word,
there is another routine which "interprets", i.e., sequentially calls, the
addresses of the compiled word. Forth users and implementors use their
own lingo, so this is not well explained anywhere. Basically, AFAICT, Forth
uses a space-delimited syntax, which generates an AST/CST very simply, and
then the interpreter walks the AST/CST tree calling the function in the
Forth "word's" header, i.e., node. So, Forth combines lexing, parsing,
AST/CST, type/function/state info, interpreter, command line, etc. all into
one.
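
In C terms, a Forth word header is roughly this sort of thing (names and
layout are mine, not from any particular Forth):

#include <stdio.h>

typedef struct Word Word;

/* every word starts with a pointer to the code that "executes" it: for a
   constant that code pushes the value, for a colon definition it walks the
   body (the classic inner interpreter) */
struct Word {
    const char *name;
    void      (*code)(Word *self);   /* code field */
    long        value;               /* parameter field: constants/variables */
    Word      **body;                /* parameter field: colon definitions */
};

static long stack[64];
static int  sp = 0;

static void do_constant(Word *self) { stack[sp++] = self->value; }
static void do_dot(Word *self)      { (void)self; printf("%ld ", stack[--sp]); }
static void do_colon(Word *self)
{
    Word **w;
    for(w = self->body; *w; w++)
        (*w)->code(*w);
}

int main(void)
{
    Word ten    = { "ten",  do_constant, 10, NULL };
    Word dot    = { ".",    do_dot,       0, NULL };
    Word *body[] = { &ten, &dot, NULL };
    Word demo   = { "demo", do_colon,     0, body };

    demo.code(&demo);   /* prints "10 " */
    printf("\n");
    return 0;
}

The 'code' member is what Forth people call the code field; the value/body
members stand in for the parameter field.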


Rod Pemberton


Rod Pemberton

unread,
Apr 29, 2011, 1:56:21 PM4/29/11
to
"BGB" <cr8...@hotmail.com> wrote in message
news:ipe2ri$qse$1...@news.albasani.net...

>
> I added more optimization cases in the arithmetic handling so that now
> many of the strings-based type-checks can be skipped for fixnum and
> flonum types (because the predicates for these can use address
> range-checks instead...).
>

IIRC, you are using hash values first, then string comparisons second, yes?
I.e., to reduce string comparisons?

>
> sadly, ECMA-262 mandates always using doubles,
>

Fake it. Use integers. Or, use hash values except where doubles are
required, e.g., display. One can probably get "close enough" by faking it
for 99.999% of situations.

> so pretty much anything memory or types related
> goes through the GC
>

Is there much overhead with this? Or, is it infrequently used?

> good amount of allocating and freeing memory in tight loops,
>

Not optimal?

> actually, the main one of these "tight loop types" is actually cons
> cells, which are used to build many internal structures for the running
> program (such as argument lists and environment frames). the result is
> that (sadly) the interpreters' present/overall performance is highly
> dependent on the performance of cons-cell operations.
>

Do you add an in-use flag to the cons cell for GC? memory compaction?


Rod Pemberton


Rod Pemberton

unread,
Apr 29, 2011, 1:56:37 PM4/29/11
to
"Nomen Nescio" <nob...@dizum.com> wrote in message
news:e53d6e72eebeffe8...@dizum.com...

> "Rod Pemberton" <do_no...@noavailemail.cmm> wrote:
> > "Nomen Nescio" <nob...@dizum.com> wrote in message
> > news:ff612e0d5d7a1cf8...@dizum.com...
>
> > > BF test suite
> > >
> >
> > Uh, ... I'm not familiar with a BF test-suite. I do recall one BF IDE
> > that had a larger collection of programs. I collected a handful of BF
> > programs to test my code.
>
> It's on one of Cristofani's pages. I'll send you a link if I can find it
> again.
>

That's probably enough for me to find it.

> >
> > > I have a BF interpreter [...] and AFAIK it passes all the BF test
> > > suite.
> >
> > That it passes all the tests is good though. Of what I believe to be
> > the best five BF interpreters from the Internet, all of them fail with
> > one program or another. I've fixed a few trivial errors in them for my
> > use that prevented some programs from running. However, they still
> > fail with certain programs. They are:
> >
> > BCCI.C by Daniel B. Cristofani
> > PBRAIN.C by Daniel B. Cristofani
> > QDB.C by Daniel B. Cristofani
> > SBI.C by Daniel B. Cristofani
> > BFF4.C by Oleg Mazonka - based on Daniel B. Cristofani's DBFI
>
> What code do they fail on? I'm surprised Cristofani's would fail.
>

Oh boy... Sorry, I don't recall the specifics, and didn't keep track. It's
been a while since I've worked with BF. I mention problems I've had in the
past and everyone wants examples or proof... I was using various apps from
the small collection to test with. I could probably locate them with some
effort... I can make diffs of the code to see what fixes I made. Of the
five, it looks like QDB is the only one I didn't make changes to.

>
> I think some of that behaviour is not defined for example I'm not sure the
> tape MUST wrap although it can. Also the tape size should be AT LEAST some
> value but it can be any value larger also. I ran into that with a big
> program somebody wrote that didn't fit in a standard tape so I adjusted my
> code but without prescanning the source of course that is going to break
> any BF compiler or interpreter that follows the standard.
>

Yeah, there are a number of boundary conditions that they use. IIRC, there
was one that had something to do with finding the start or end of a string in
some BF code.

> I saw something else you will like if you haven't seen it already,
> somebody wrote a BF compiler that produces a linux ELF in one go.
> I mean no intermediate steps the guy takes BF source and spits out an
> ELF! That's a seriously smart coder! If you haven't seen it tell me and
> I will try to find the link and post it.
>

Off hand, I don't recall that one. There are only 8 instructions that he
needs assembly sequences for, and a minimal ELF header.
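
Right. For reference, the per-command sequences could be as simple as
something like this (a hypothetical mapping using ecx as the cell pointer;
whatever that compiler actually emits is surely different, and '['/']' need
generated labels which I've elided):

#include <stdio.h>

/* sketch: map each BF command to a line of x86 assembly text */
static const char *emit(char c)
{
    switch(c)
    {
    case '>': return "inc ecx";
    case '<': return "dec ecx";
    case '+': return "inc byte [ecx]";
    case '-': return "dec byte [ecx]";
    case '.': return "; emit write(1, ecx, 1) syscall here";
    case ',': return "; emit read(0, ecx, 1) syscall here";
    case '[': return "cmp byte [ecx], 0 / jz <matching ]>";
    case ']': return "cmp byte [ecx], 0 / jnz <matching [>";
    default:  return NULL;   /* anything else is a comment in BF */
    }
}

int main(void)
{
    const char *src = "+[->+<]", *p;
    for(p = src; *p; p++)
    {
        const char *s = emit(*p);
        if(s) printf("%c\t%s\n", *p, s);
    }
    return 0;
}

Glue a minimal ELF header in front of the assembled bytes and that's more or
less the whole compiler.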

>
> I wasn't thinking of doing it at the microcode level just the functional
> level. In other words write an assembler that supports all the x86
> instruction set by cross compiling to a RISC-like subset of x86
> instructions. Not sure if it's reasonable but might be a nice proof of
> concept.
>

The problem is equivalence of x86 code. It's not equivalent across cpu
modes. Register only instructions are OK. But, memory modes and offsets
aren't. 64-bit code throws in REX.W. I.e., one might want access to 16-bit
BIOS and video BIOS calls, and also want 32-bit PCI/PNP BIOS calls, but
their OS is in 64-bit assembly.

E.g., because of restrictions in 16-bit addressing modes and changes in
register encoding, if you want the code using a memory address to work the
same in 32-bit mode, you've got to do something like:

;16-bit code
00000028 89FB mov bx,di
0000002A 8B4720 mov ax,[bx+0x20]

;32-bit code
00000028 89FB mov ebx,edi
0000002A 8B4720 mov eax,[edi+32]

BX is used as the address offset for 16-bit code. EDI is used for 32-bit
code. I've not attempted to work out an example that works for all three
modes: 16-bit, 32-bit, 64-bit. REX.W gets thrown into the mix. Offsets
are differently sized. The stack is differently sized. Etc.


Rod Pemberton

BGB

unread,
Apr 29, 2011, 4:50:26 PM4/29/11
to
On 4/29/2011 10:56 AM, Rod Pemberton wrote:
> "BGB"<cr8...@hotmail.com> wrote in message
> news:ipe2ri$qse$1...@news.albasani.net...
>>
>> I added more optimization cases in the arithmetic handling so that now
>> many of the strings-based type-checks can be skipped for fixnum and
>> flonum types (because the predicates for these can use address
>> range-checks instead...).
>>
>
> IIRC, you are using hash values first, then string comparisons second, yes?
> I.e., to reduce string comparisons?
>

hash values are used in many places, but are not really used for
general-purpose type compares, for similar reasons to why I don't use
integer handles: namely, they can get nasty and awkward to work with, and
saving a "strcmp()" here or there often doesn't amount to *that*
much of a speedup.


for example, a while back I did a test involving interning strings
(resolving them via a hash so that strings with the same value end up
with the same pointer).

the observation was that this only really offered a speedup for lists
longer than around 10 "strcmp()" calls...

this was observed while optimizing my assembler. the same goes for
trying to intern strings in a lexer (or even having separate lexer and
parser stages), which I have observed to be less efficient than not doing
so (albeit eliminating the re-reading of tokens can be aided via a small
hash-table, as my parsers will often end up re-reading the same tokens
multiple times in small localized bursts, at the modest cost of needing
to make a call to flush the tokenizer-hash whenever the input stream changes).
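
fwiw, the interning itself is nothing fancy, roughly this sort of thing (a
simplified sketch, not my actual code; no locking, nothing ever freed):

#include <stdlib.h>
#include <string.h>

#define INTERN_HASH 4096

typedef struct InternNode InternNode;
struct InternNode { InternNode *next; char *str; };

static InternNode *intern_tab[INTERN_HASH];

static unsigned intern_hashfn(const char *s)
{
    unsigned h = 0;
    while(*s) h = h * 31 + (unsigned char)*s++;
    return h % INTERN_HASH;
}

/* equal strings come back as the same pointer, so later compares
   can be plain "a == b" */
const char *intern_string(const char *s)
{
    unsigned h = intern_hashfn(s);
    InternNode *n;

    for(n = intern_tab[h]; n; n = n->next)
        if(!strcmp(n->str, s))
            return n->str;

    n = malloc(sizeof(InternNode));
    n->str = malloc(strlen(s) + 1);
    strcpy(n->str, s);
    n->next = intern_tab[h];
    intern_tab[h] = n;
    return n->str;
}

the hash and the bucket walk aren't free either, which is where the
"only pays off past ~10 strcmp() calls" observation comes from.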

theoretically, one could intern strings with the GC, so that they could
do "A==B" comparisons with the returned values, but this would not
likely offer a notable speedup, considering that, at present, fetching
the type-name from the pointer is actually the main expensive part (in
particular, the much bigger cost of validating that the pointer is to a
valid heap object, and in the process, identifying the object base).

there is a "fast" version of the above process, but one has to already
know in advance a few things, such as that it is a valid pointer to the
start of a small-cell-based heap-object (it may blow up with either
large-objects, with cons cells, or with fixnums or flonums), ...
effectively eliminating its usage.


sadly, the highest performance strategy I know of is:
use tagged references (rather than raw pointers);
represent objects as an encoded heap cell index, rather than an address;
...

but, this has its own mess of drawbacks (suddenly, one now needs an API
call to get an object's base-pointer, or get the handle for a pointed-to
object, ...).

which, FWIW, may very well be painful enough to not be worth the
performance gain...


>>
>> sadly, ECMA-262 mandates always using doubles,
>>
>
> Fake it. Use integers. Or, use hash values except where doubles are
> required, e.g., display. One can probably get "close enough" by faking it
> for 99.999% of situations.
>

my stuff uses integers wherever integers will work.

currently, any integer will default to being a 28-bit fixnum if it will
fit, or will be auto-promoted to a 64-bit boxed integer, albeit at the
moment there are no automatic larger dynamic integer types...
there is statically-typed int128 support, but no present dynamic type
for this (could add easily enough).
there is also a much larger BCD-based large-number type (but it is not
used automatically, IIRC it currently has a max of 2048.2048 decimal
digits or similar). I could address a lot of this if needed.
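
for reference, the fixnum trick is basically just reserving a chunk of fake
"address space" for them, so the type predicate is a range check rather than
a heap lookup - a rough sketch (the base/range constants here are made up,
not my actual layout, and it assumes that range can never be a real heap
address):

#include <stdio.h>
#include <stdint.h>

#define FIXNUM_BASE  0x10000000u
#define FIXNUM_RANGE 0x10000000u       /* 2^28 encodable values */
#define FIXNUM_BIAS  (1 << 27)         /* so negative values fit too */

typedef void *dyvalue;

static int dy_fixnump(dyvalue v)
{
    uintptr_t p = (uintptr_t)v;
    return p >= FIXNUM_BASE && p < (FIXNUM_BASE + FIXNUM_RANGE);
}

static dyvalue dy_wrapint(long i)
{
    if(i >= -FIXNUM_BIAS && i < FIXNUM_BIAS)
        return (dyvalue)(uintptr_t)(FIXNUM_BASE + (uint32_t)(i + FIXNUM_BIAS));
    return NULL;   /* sketch: would fall back to a boxed 64-bit integer */
}

static long dy_unwrapint(dyvalue v)
{
    return (long)((uintptr_t)v - FIXNUM_BASE) - FIXNUM_BIAS;
}

int main(void)
{
    dyvalue v = dy_wrapint(-1234);
    if(dy_fixnump(v))
        printf("%ld\n", dy_unwrapint(v));   /* -1234 */
    return 0;
}

flonums work the same way, with a second reserved range and a few bits of
precision shaved off.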


the issue then is for floating point code, but I have found that even
the current 28-bit flonums are "usually good enough" for the stuff I use
the scripting for (in my 3D engine, which mostly uses 32-bit floats for
nearly everything anyways...).

but, using integer and float types, rather than full doubles (at least
for any real-number tasks) is technically a violation of the ECMA-262
standard (which specifies ECMAScript/JavaScript).

granted, there are other technical variations as well:
BGBScript uses an object for the toplevel, rather than a lexical scope
frame, as would be required by the standard;
there are differences in the object model, as BGBScript uses an object
model more closely related to that of Self (namely, N-way and
potentially cyclic delegation chains, vs only having "parent" and an
acyclic graph);
...

I am also considering dropping whitespace-sensitive semicolon insertion
and similar, as in a few cases this logic has led to misparsing
(particularly, in cases where intermediate processing has fouled up
whitespace, either preventing it from working, or worse, creating
unintended breaks, which the parser misinterprets as an expression-break).


most of these issues should not actually be noticeable from code in
general, although the flonum/double issue could be noticed when dealing
with calculations with large floating-point numbers, where the use of
flonums will reduce accuracy.


granted, there are ways to force double precision in BGBScript, such as:
"x=1.0 as double;"
and I may add:
"x=1.0D;"
which would do the same as the above.


so, it may make sense to just keep it JS/ES-like, rather than trying to
be JS proper...


>> so pretty much anything memory or types related
>> goes through the GC
>>
>
> Is there much overhead with this? Or, is it infrequently used?
>

C code uses dynamic type-checking rarely enough so that performance
isn't really an issue. the vast majority of the time, C is using
statically-typed operations, usually only doing dynamic type-checks to
determine what sort of object it was given.

for example, my 3D renderer currently uses dynamic type-checking to
render scene objects, but only at the coarse-grain (scene object level),
rather than, say, using dynamic types for representing the scene or
model geometry.

hence, the current usual strategy of giving dynamic type-names to
structs, but using static types for all of the struct fields.

BGBScript also follows this convention when interfacing with C-side
data. machinery is in place so that BGBScript can work with raw C
structs, with a few minor restrictions:
namely, that a dynamic type-name has to be assigned to the struct and
visible to the GC, and also that at present, physically/directly-nested
structs will not work;
it *only* works for C-style structs, C++ classes or similar will not work;
...

so:
typedef struct Foo_s Foo dytname("myapp_foo_t");
struct Foo_s
{
    int x, y;
    Bar *bar;
    ...
};

will work ("dytname" is a magic macro, which expands to "something
relevant" for my own tools, but is no-op in standard C). basically, when
allocating memory for an instance of this struct, the same name will be
given, so that the typesystem will know which struct is being defined
here. it may either be placed on the typedef or on the struct itself
(and if on the typedef, it may also be a pointer).
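
for a stock C compiler the macro can simply expand to nothing, with the name
showing up again only at allocation time - something like this ("dytAlloc"
is a made-up stand-in name here, not the real allocation API):

#include <stdlib.h>

#define dytname(n)   /* no-op for a plain C compiler; my tools expand it */

typedef struct Foo2_s Foo2 dytname("myapp_foo2_t");
struct Foo2_s { int x, y; };

/* stand-in allocator: a real one registers the type name with the GC */
void *dytAlloc(const char *tyname, size_t sz)
{
    (void)tyname;
    return calloc(1, sz);
}

Foo2 *newFoo2(void)
{
    /* the allocator records "myapp_foo2_t", so the typesystem can later
       answer "which struct does this pointer hold?" */
    return (Foo2 *)dytAlloc("myapp_foo2_t", sizeof(Foo2));
}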

but, this is bad:

struct Bar_s
{
    struct {
        double x, y;
    } baz;
    double z;
};

since with the current implementation, it can't currently see into baz
(although 'z' should still be visible).

although, the above *can* be fudged over, if one registers a custom
lookup handler, which will itself handle these accesses (a registered
lookup handler will overload the default behavior).


currently, "dytname" will not work on non-structs:
typedef unsigned long long myull dytname("myapp_ull_t");

and, as noted above, this is bad:
class MyClass dytname("myapp_MyClass_t")
{
public:
    MyClass();
    ~MyClass();
    virtual void myMethod();
private:
    double x, y;
};

as is:
struct MyCPPStruct dytname("myapp_MyCPPStruct_t")
{
    virtual void myMethod();
    double x, y;
};

since including methods, effectively, makes it a class (the C++ ABI will
then go put in a vtable and so on, and may do some particularly
funky/nasty stuff if MI or virtual inheritance is used...).

nevermind that not having a typedef here is not valid in plain C anyways
(my tools will overlook this, but not the virtual method...).

non-virtual methods are a potential fudge area, as this "could work":
#ifndef __cplusplus
typedef struct MyCPPStruct MyCPPStruct;
#endif
struct MyCPPStruct dytname("myapp_MyCPPStruct_t")
{
#ifdef __cplusplus
    MyCPPStruct();
    ~MyCPPStruct();
    void myMethod();
#endif
    double x, y;
};

which "may" work, but is not required to (little requires that the C++
compiler produce the same physical layout in both cases, as the C++
compiler may well still insert a vtable for sake of allowing inheriting
and RTTI, since this would not be a C++ "POD" case, where the compiler
is obligated to generate the same layout).


>> good amount of allocating and freeing memory in tight loops,
>>
>
> Not optimal?
>

yep, but sadly there is no good way to address this at the moment.
the problem with interpreters like this is that, at best, one will have
to manage frame structures, and in this case, one also needs to manage
lexical environment chains (this can be optimized out for plain
functions though, which is supported by the interpreter but is not
presently enabled, whereby arguments and locals are placed
onto/left on the main evaluation stack).

var w; //toplevel

function foo(x, y)
{
    var z;
    z=x*y+w;
    return(z);
}

ok, no variable capture:
the toplevel is not lexical, and no closures are involved, so a simple
fixed-frame environment can be used (and raw-memory or stack space can
be used for locals).

however:
function foo(x, y)
{
    var z;
    z=x*y;
    return fun()z+w;
}

creates a closure, which will mean that the lexical frame will have to
be created in a way which will outlive the call-frame.

at present, this essentially forces both functions to allocate their
lexical environments as lists (this applies to all parent functions,
and naturally enough, to any closures created).

a much worse situation would exist if BGBScript supported full call/cc
("call-with-current-continuation") style functionality, as then it would
also be needed to be able to capture any call frames (full call/cc is a
nasty thing to implement, but exit-only call/cc is a bit nicer...).
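
roughly, what the captured case turns into is heap-allocated frames chained
by a parent pointer, along these lines (just a sketch of the idea; malloc
stands in for the GC allocation and the real layout differs):

#include <stdio.h>
#include <stdlib.h>

typedef struct Frame Frame;
struct Frame {
    Frame *parent;     /* enclosing lexical scope, NULL at the toplevel */
    double vars[4];    /* slots for x, y, z, ... (fixed size for the sketch) */
};

typedef struct Closure {
    Frame *env;        /* captured frame chain */
    double (*code)(struct Closure *self, double w);
} Closure;

/* body of "fun()z+w": reads z out of the captured frame */
static double fn_body(Closure *self, double w)
{
    return self->env->vars[2] + w;
}

/* foo(x,y): computes z=x*y, then returns a closure capturing the frame */
static Closure *foo(double x, double y)
{
    Frame   *f = malloc(sizeof(Frame));
    Closure *c = malloc(sizeof(Closure));

    f->parent  = NULL;
    f->vars[0] = x; f->vars[1] = y;
    f->vars[2] = x * y;            /* z */
    c->env  = f;
    c->code = fn_body;
    return c;
}

int main(void)
{
    Closure *c = foo(3, 4);
    printf("%g\n", c->code(c, 5));   /* 3*4 + 5 = 17 */
    return 0;
}

the point being that the frame holding z has to outlive foo's call frame,
which is why it can't just sit on the stack.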

>> actually, the main one of these "tight loop types" is actually cons
>> cells, which are used to build many internal structures for the running
>> program (such as argument lists and environment frames). the result is
>> that (sadly) the interpreters' present/overall performance is highly
>> dependent on the performance of cons-cell operations.
>>
>
> Do you add an in-use flag to the cons cell for GC? memory compaction?
>

yeah, external bitmaps are used to manage both cons cells and heap
(small-object) cells.

no memory compaction is performed, as the GC is conservative and
concurrent, and hence the GC has no idea which references to an
object may exist at any given moment, or whether a given found reference
is actually a valid reference or just happens to look like one.

so, the GC is a conservative concurrent mark/sweep collector (sort of
like Boehm or similar...).
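
the bitmap side of it is basically this sort of thing (a sketch; one bit per
cell, with the chunk size and field names made up):

#include <stdint.h>
#include <string.h>

#define CHUNK_CELLS 65536

typedef struct { void *car, *cdr; } ConsCell;

/* cons cells live in big arrays ("chunks"); mark and allocation state is
   kept in external bitmaps rather than in the cells themselves */
typedef struct {
    ConsCell cells[CHUNK_CELLS];
    uint32_t markbits [CHUNK_CELLS / 32];   /* set during the mark phase */
    uint32_t allocbits[CHUNK_CELLS / 32];   /* which cells are in use */
} ConsChunk;

void mark_cell(ConsChunk *ch, ConsCell *p)
{
    size_t i = (size_t)(p - ch->cells);
    ch->markbits[i >> 5] |= 1u << (i & 31);
}

int cell_marked(ConsChunk *ch, ConsCell *p)
{
    size_t i = (size_t)(p - ch->cells);
    return (ch->markbits[i >> 5] >> (i & 31)) & 1;
}

/* sweep: anything allocated but not marked gets its alloc bit cleared,
   making the cell available again; no objects move, so no compaction */
void sweep_chunk(ConsChunk *ch)
{
    size_t i;
    for(i = 0; i < CHUNK_CELLS / 32; i++)
        ch->allocbits[i] &= ch->markbits[i];
    memset(ch->markbits, 0, sizeof(ch->markbits));
}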

BGB

unread,
Apr 29, 2011, 5:53:21 PM4/29/11
to
On 4/29/2011 10:55 AM, Rod Pemberton wrote:
> "BGB"<cr8...@hotmail.com> wrote in message
> news:ipdfj0$fvr$1...@news.albasani.net...
>> On 4/28/2011 2:19 PM, Rod Pemberton wrote:
>>
>>> AFAIK, the x86 micro-code isn't available. QEMU breaks down each
>>> instruction into C code. GCC's backend has to choose instructions for
>>> it's intermediate representation (IR). So, it probably has an x86
>>> instruction breakdown but in terms of it's IR. So, I'd expect other
>>> emulators and binary translators to have something also. Whether it
>>> is re-useable or effective is another topic.
>>>
>>
>> yeah...
>>
>> my x86 interpreter converted it into lists of structs containing
>> function pointers...
>>
>
> That works. It's probably pretty flexible too. Just change the pointers as
> needed. Branching is about the only issue, I'd guess, based on my
> experience with implementing a Forth interpreter.
>

IIRC, I did a special case for this:
I used special "interrupts" whenever a call or jump was performed, which
would re-hash the EIP value (to look up the target opcode, or trigger
instruction decoding);
interrupts were also used to call into native land, where I had a
(technically not really used) "intx" (or something like this) opcode
which would do 16-bit interrupts.

interrupts >=256 were generally used for operations internal to the
interpreter (calls, jumps, and IIRC I remapped at least one of the
normal CPU exceptions there, like #DE or similar, I forget, I think
because it conflicted with something).

but, anyways, as is common in my interpreters, usually there is a
special status value that is set to a non-zero value whenever special
handling is needed (whereas a value of 0 just means to continue on
executing instructions...).
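
the "re-hash the EIP" part is essentially just a hash table from EIP to an
already-decoded instruction list, something like this (sketch only;
DecodedOp and decode_at() are stand-ins for the real decoded-op struct and
decoder):

#include <stdlib.h>
#include <stdint.h>

#define TRACE_HASH 4096

typedef struct DecodedOp { int dummy; } DecodedOp;   /* stand-in */

typedef struct TraceEnt {
    uint32_t eip;
    DecodedOp *ops;
    struct TraceEnt *next;
} TraceEnt;

static TraceEnt *trace_tab[TRACE_HASH];

/* stand-in: the real decoder builds the op list starting at eip */
static DecodedOp *decode_at(uint32_t eip) { (void)eip; return NULL; }

DecodedOp *lookup_eip(uint32_t eip)
{
    unsigned h = (unsigned)((eip * 2654435761u) % TRACE_HASH);
    TraceEnt *t;

    for(t = trace_tab[h]; t; t = t->next)
        if(t->eip == eip)
            return t->ops;               /* hit: reuse the decoded ops */

    t = malloc(sizeof(TraceEnt));        /* miss: decode and remember */
    t->eip  = eip;
    t->ops  = decode_at(eip);
    t->next = trace_tab[h];
    trace_tab[h] = t;
    return t->ops;
}

the call/jump "interrupt" then just replaces the current op list with
whatever this returns.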


>> arrays of function pointers could also be interesting:
>>
>> void (**fpa)(Context *ctx);
>> ...
>> while(*fpa)
>> (*fpa++)(ctx);
>>
>> albeit this doesn't itself make any provision for jumps, unless maybe:
>> while(*ctx->fpa)
>> (*ctx->fpa++)(ctx);
>>
>>
>
> Maybe use an escape code to indicate a special instruction sequence, e.g.,
> branch. One could use NULL or a small value: 1, 2, or 3 etc. , i.e., not a
> typical pointer value for the escape code.
>

in the loop:
    while(*ctx->fpa)
        (*ctx->fpa++)(ctx);

a jump can be handled inline by the handler function: just alter the
function-pointer-array value.

setting the value to point at a null entry breaks the loop, which could
either happen as the natural result of hitting the "end" of the
sequence, or be done to trigger some special action...

my planned (currently halted) BGBScript2 interpreter would have had this as
its dispatch loop:
BS2I_API void BS2I_RunOpcodes(BS2I_VMState *ctx, BS2I_Opcode *fst)
{
    BS2I_Opcode *cur;

    cur=fst;
    while(cur)
        cur=cur->fcn(ctx, cur);
}

so, as can be noted, it works on a similar concept.
most normal operations will simply return a different opcode-structure
pointer, and returning NULL will break the loop.


>> but, alas, the current script interpreter I am currently stuck with (due
>> to my inability to effectively replace it), has a very different problem:
>> most of its running-time overhead is currently going into type-checking
>> and GC-related stuff (and running the GC, as it still spews cons cells
>> at a high rate...).
>>
>
> type-checking? for an interpreter? One doesn't usually think of an
> interpreter doing that.
>

it is dynamically typed...
luckily, I have largely fixed the above problems now, so it is a bit
faster, albeit still a good deal slower than some of my other (more
experimental) interpreters.

BGBScript is currently around 278x slower than C.

my x86 interpreter was 70x slower than C.

some of my experimental interpreters (using special opcode handler
functions, ...) have gotten within 5x-10x of raw C (so, fairly close to
JIT-like speeds), however, I have noted the major drawback that these
style of interpreters are damn near as complicated to implement and work
on (when one tries to build a full-language VM) as a JIT would be.


so, the big-ass switch table is slower, but it is also a bit easier to
implement. a partial hybrid strategy, though, is to simply use such a
big-ass switch for any cases where there are no (micro-optimized)
special-case handlers, which is pretty much what I did for the x86
interpreter (I really didn't need to have special-case handlers to try
to make "cpuid" and similar go extra fast...).


> Forth got around it by putting a small header on each Forth "word" that has
> a pointer to a function. If the Forth "word" represents a type, i.e.,
> variable or constant, instead of a function or procedure, the header's
> pointer to a function contains a pointer to the code for processing that
> specific type. If it's a function or procedure, there is another routine
> for calling that function or procedure, etc. If it's a "compiled" word,
> there is another routine which "interprets", i.e., sequentially calls, the
> addresses of the compiled word. Forth users and implementors use their
> own lingo, so this is not well explained anywhere. Basically, AFAICT, Forth
> uses a space-delimited syntax, which generates an AST/CST very simply, and
> then the interpreter walks the AST/CST tree calling the function in the
> Forth "word's" header, i.e., node. So, Forth combines lexing, parsing,
> AST/CST, type/function/state info, interpreter, command line, etc. all into
> one.
>

I have usually just used plain bytecode...


usually, my bytecode encodings follow the format:

opcode [immediates]

where opcode is a byte:
0x00 .. 0xBF: single-byte opcode (0 .. 191);
0xC0 .. 0xFF: 2 (or more) byte opcode, usually with the secondary
byte(s) encoding the low-order value bits.

as I have thus far never gone beyond what can be encoded in 2 bytes
(16384 opcodes in this encoding), I usually only leave 3-byte or longer
opcodes as a theoretical possibility.


potentially better could be:
0x00..0xEF: single-byte opcodes (0..239);
0xF0..0xFE: dual-byte opcodes (240..3839);
0xFF: 3-byte (or more) opcodes (3840+).

this is because it is unlikely an interpreter will ever need more than
4096 opcodes, and this leaves more space for the (generally far more
common/important) single-byte opcodes. but, I have usually stuck with my
traditional encoding, as there is little real reason to change it, and
any interpreter which pre-converts the bytecode would likely render the
difference moot anyways...
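
decoding either scheme is only a few lines; e.g. the 192-based one (a
sketch, and I'm guessing at whether the one-byte and two-byte ranges
overlap, since that detail is left open above):

#include <stdio.h>

/* 0x00..0xBF is the opcode directly; 0xC0..0xFF is a prefix with the
   second byte carrying the low-order bits, giving a 14-bit opcode number */
static int decode_opcode(const unsigned char **pp)
{
    const unsigned char *p = *pp;
    int op;

    if(p[0] < 0xC0) {
        op = p[0];
        p += 1;
    } else {
        op = ((p[0] - 0xC0) << 8) | p[1];
        p += 2;
    }
    *pp = p;
    return op;
}

int main(void)
{
    unsigned char code[] = { 0x05, 0xC1, 0x2A };
    const unsigned char *p = code;
    int a = decode_opcode(&p);
    int b = decode_opcode(&p);
    printf("%d %d\n", a, b);    /* 5 298 */
    return 0;
}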


immediates are usually hard-coded in the opcode-listing tables.

the earlier (2004) version of BGBScript had actually used word-code
instead, which had used 2 opcode bits to encode the size of the
immediates (allowing 0-3 words of immediate data).

IIRC the 2006 version switched to bytes and the 192-based encoding (and
most of my later encodings did the same), because it was then a
"proper" bytecode, in the common case only took about 1/2 as much memory
for bytecode (a weak reason, probably...), and because encoding the
opcode-length in the opcode was sort of redundant (since if this much
can't be figured out by other means, then there is not a whole lot
useful that can reasonably be done with the opcode anyways, as it is by
this point simply a glob of several numbers...).

my extended JBC (Java ByteCode) encoding had switched to using 0xE0
instead (0xE0..0xEF were 2-byte, 0xF0..0xFD were 3 byte, 0xFE/0xFF were
reserved). this was partly because the JVM had used opcodes past 0xC0,
but not past 0xE0.


although, there was a downside:
any 192-based bytecodes generally need an extra "if()" statement
preceding the main "switch()" statement, which doesn't come for free
either (sadly...).

treating them as escape codes is also possible, but more of a hassle.


or such...

foxchip

unread,
May 1, 2011, 1:41:54 PM5/1/11
to
On Apr 28, 8:53 am, Nils M Holm <news2...@t3x.org> wrote:
> foxchip <f...@ultratechnology.com> wrote:
> > Chuck used that idea on the original machineForth for P20
> > back in 1990
> > [ Lots of interesting trivia ...]
>
> I was sure that the idea was not new! :-)

Google seems to have lost that post. I am glad that you got to
see it.

> > Or Forth.  BTW I used U for Until.
>
> I had thought about using F and N for FOR and NEXT, but then
> it occurred to me that all flow control operators have a
> beginning and an end, so I decided to use the various brackets
> for those: () = for/next, `{} = begin/while/repeat,
> [;] = if/else/then.

I did something similar for those too.

Best Wishes

Mike Austin

unread,
Jul 17, 2011, 4:11:27 AM7/17/11
to
A new girlfriend... I'll be back when the honeymoon phase is over :)

Peace!
Mike

fmn...@gmail.com

unread,
Sep 11, 2012, 12:48:17 PM9/11/12
to
Am Mittwoch, 27. April 2011 02:11:25 UTC+2 schrieb Rod Pemberton:
> "Ian Osgood" <ia...@quirkster.com> wrote in message
> news:fd723c5e-6e16-4b57...@f31g2000pri.googlegroups.com...
> >
> > The entire field of esoteric
> > programming languages began in 1993 with Wouter van
> > Oortmussen's FALSE language, a 1K compiler which
> > embodies similar ideals. A host of
> > similar languages have sprung up since then.
> >
> > http://esolangs.org/wiki/FALSE
> > http://esolangs.org/wiki/Category:Stack-based
> >
>
> Did you forget about INTERCAL?
> http://esolangs.org/wiki/INTERCAL
>
> Is there an esoteric language listed on esolangs.org that is actually
> _useful_ as a language? FALSE? Befunge? Brainfuck? Stlisp? Toadskin?
> Brainfuck is somewhat useable, however, most assembly languages are far more
> powerful. FALSE doesn't look too bad, but it's still very simple, i.e., not
> powerful. Most Esolang languages seem to be in the spirit of INTERCAL...
>
>
> Rod Pemberton

Being the creator of Stlisp, I feel obliged to state that Stlisp can be used as a useful language. It had everything necessary to be a useful language, as well as support for loading additional modules like cgi and mysql, which made it comfortable enough to write webpages ;). If I look at http://mroman.ch/cgi/cgitest.slisp?name=eso and http://mroman.ch/cgi/mysql.slisp, it doesn't look bad or uncomfortable at all. It might not even be considered an esoteric language at all :(, sadly.