Designing a memory image for language runtime

luserdroog

unread,

Dec 24, 2022, 12:16:39 AM12/24/22

to

One of the parts of the Adobe PostScript interpreter source that I've
been taking inspiration from is designing the initial state of the
virtual memory (the systemdict holding all the operators and other
initialized data structures) to be cached and loaded from a file.

In my PostScript clone xpost I really tried to make this feature work
and I never quite managed it. Worse, mine creates multiple image
files every time it runs and just leaves them cluttering up the working
directory.

The first problem is pointers. The memory image has to use integer
offset and things instead of pointers. Function pointers for operators
have to be mediated through a table that lives outside of the image.
And it's coordinating the operator codes across the memory image
and the compiled code that leads to coupling the two with a version
number or password or something. If you change the ordering of
operators in the source code, the recompiled program will not work
with an old image.

Then, some amount of the top-level interpreter state needs to be
stashed in a known location (Adobe's code uses a struct VMInfo for
this top-level data).

What other considerations are there in designing such a thing?

In xpost there's a tiny issue of pointers to posix objects related
to file globbing. But I don't know how much I really need to worry
about cleaning those out of the image. If you're trying to resume
an image that stopped in the middle of a file globbing loop, then ...
I suppose I should explicitly disavow any guarantees about the
results in the documentation.

Tim Rentsch

unread,

Dec 24, 2022, 4:05:54 AM12/24/22

to

luserdroog <mij...@yahoo.com> writes:

> One of the parts of the Adobe PostScript interpreter source that
> I've been taking inspiration from is designing the initial state
> of the virtual memory (the systemdict holding all the operators
> and other initialized data structures) to be cached and loaded
> from a file.
>
> In my PostScript clone xpost I really tried to make this feature
> work and I never quite managed it. Worse, mine creates multiple
> image files every time it runs and just leaves them cluttering up

> the working directory. [...]

If you want help, define the problem completely and clearly enough
so people who don't know anything about PostScript can understand
what it is you want to accomplish.

Bart

unread,

Dec 24, 2022, 5:34:00 AM12/24/22

to

This doesn't sound any interpreter I've ever used. Usually there is the
intepreter itself - a executable file image; the program to be run, and
that's it.

Everything needed is created in-memory and stays there; you don't
attempt to write that stuff to a file.

What is the special requirement here? Is there a large program prelude
involved for PostScript, which you don't want to have to repeat each
time it runs a script, so you want cache that state?

Does it need to be able to recover from an aborted script? Or is there
other state that needs to be persistent across activations? In short,
what is the problem to be solved that might need saving a memory image
to disk and reloading later?

Spiros Bousbouras

unread,

Dec 24, 2022, 8:25:30 AM12/24/22

to

Disclaimer : I don't know postscript so I'm kind of guessing here. Also
it might have made sense to crosspost this on comp.lang.postscript .

On Fri, 23 Dec 2022 21:16:38 -0800 (PST)
luserdroog <mij...@yahoo.com> wrote:
> One of the parts of the Adobe PostScript interpreter source that I've
> been taking inspiration from is designing the initial state of the
> virtual memory (the systemdict holding all the operators and other
> initialized data structures) to be cached and loaded from a file.

What for ?

> In my PostScript clone xpost I really tried to make this feature work
> and I never quite managed it. Worse, mine creates multiple image
> files every time it runs and just leaves them cluttering up the working
> directory.

Images shouldn't be in the working directory , they should be at a fixed
location unless the user has specified otherwise in some way.

> The first problem is pointers. The memory image has to use integer
> offset and things instead of pointers. Function pointers for operators
> have to be mediated through a table that lives outside of the image.
> And it's coordinating the operator codes across the memory image
> and the compiled code that leads to coupling the two with a version
> number or password or something.

Don't the operators have names ? Why can't you use a hash table saying
that the code corresponding to operator XYZ is at location N ? I don't
see what the ordering of the operators in some source code would have
anything to do with it.

> If you change the ordering of
> operators in the source code, the recompiled program will not work
> with an old image.

Source code of what ?

> Then, some amount of the top-level interpreter state needs to be
> stashed in a known location (Adobe's code uses a struct VMInfo for
> this top-level data).

Why does interpreter state need to be stored ?

> What other considerations are there in designing such a thing?

I don't know but Common Lisp implementations have the facility to save an
image including all the new stuff the user has defined while running the
image which is the point. If you google for "X save image" where X would be
some Common Lisp implementation like SBCL or clisp or ECL , you will get
matches. I have no idea how useful this will be to your own goals but it's an
idea.

> In xpost there's a tiny issue of pointers to posix objects related
> to file globbing. But I don't know how much I really need to worry
> about cleaning those out of the image. If you're trying to resume
> an image that stopped in the middle of a file globbing loop, then ...
> I suppose I should explicitly disavow any guarantees about the
> results in the documentation.

I do think that a cleaner option would be cleaning all objects whose
operation depends on external state (i.e. state outside the control of the
interpreter itself) as opposed to relying on user diligence. If you can't
remove the (pointers to) objects then surely they can have a flag which says
that the object has been invalidated and if the user tries to use an
invalidated object , they would get an error message. So when saving an image
, you would traverse all objects (which I assume you have to do anyway) and
set the "invalidate" flag for all objects dealing with external state.

--
It is better to be young , rich , beautiful and healthy than
old , poor , ugly and diseased.
Anonymous

luserdroog

unread,

Dec 24, 2022, 2:26:24 PM12/24/22

to

On Saturday, December 24, 2022 at 7:25:30 AM UTC-6, Spiros Bousbouras wrote:
> Disclaimer : I don't know postscript so I'm kind of guessing here. Also
> it might have made sense to crosspost this on comp.lang.postscript .
> On Fri, 23 Dec 2022 21:16:38 -0800 (PST)
> luserdroog <mij...@yahoo.com> wrote:
> > One of the parts of the Adobe PostScript interpreter source that I've
> > been taking inspiration from is designing the initial state of the
> > virtual memory (the systemdict holding all the operators and other
> > initialized data structures) to be cached and loaded from a file.
> What for ?

For the original application, as the firmware for a laser printer, I think the
motivation was fast boot up. They could write the program more simply
to do all the initial allocations and load everything to the point that it can
start processing programs, and then do a *core dump* right there so that
state can be instantly resumed after loading that core image.

For my clone I also wanted to be able to perform postmortem debugging
with this core dump file.

> > In my PostScript clone xpost I really tried to make this feature work
> > and I never quite managed it. Worse, mine creates multiple image
> > files every time it runs and just leaves them cluttering up the working
> > directory.
> Images shouldn't be in the working directory , they should be at a fixed
> location unless the user has specified otherwise in some way.

Yes. I can't justify it. I'm really good at procrastinating. It's been a problem
for a long time and I keep on not doing the work to fix it.

> > The first problem is pointers. The memory image has to use integer
> > offset and things instead of pointers. Function pointers for operators
> > have to be mediated through a table that lives outside of the image.
> > And it's coordinating the operator codes across the memory image
> > and the compiled code that leads to coupling the two with a version
> > number or password or something.
> Don't the operators have names ? Why can't you use a hash table saying
> that the code corresponding to operator XYZ is at location N ? I don't
> see what the ordering of the operators in some source code would have
> anything to do with it.

The operator objects are created dynamically and allocated sequential
codes at "ram initialization time". Under modern operating systems you
can't reliably store function pointers across separate invocations of the
(interpreter) program because of ASLR. cf.

https://stackoverflow.com/questions/18040735/will-statically-linked-functions-have-the-same-pointer-values-across-runs-on-the

> > If you change the ordering of
> > operators in the source code, the recompiled program will not work
> > with an old image.
> Source code of what ?

The C code that implements the interpreter.

> > Then, some amount of the top-level interpreter state needs to be
> > stashed in a known location (Adobe's code uses a struct VMInfo for
> > this top-level data).
> Why does interpreter state need to be stored ?

I'm not sure what I've failed to describe. The goal is for the entire state
of execution to be suspended to disk and resumed from the same point
after power cycling. Or, for a hosted interpreter that runs on an OS, you'd
resume by running the interpreter program again with an option to resume
the suspended session.

> > What other considerations are there in designing such a thing?
> I don't know but Common Lisp implementations have the facility to save an
> image including all the new stuff the user has defined while running the
> image which is the point. If you google for "X save image" where X would be
> some Common Lisp implementation like SBCL or clisp or ECL , you will get
> matches. I have no idea how useful this will be to your own goals but it's an
> idea.

Yes, I think this is probably the closest examples I'm likely to find. I think
emacs does something similar.

> > In xpost there's a tiny issue of pointers to posix objects related
> > to file globbing. But I don't know how much I really need to worry
> > about cleaning those out of the image. If you're trying to resume
> > an image that stopped in the middle of a file globbing loop, then ...
> > I suppose I should explicitly disavow any guarantees about the
> > results in the documentation.
> I do think that a cleaner option would be cleaning all objects whose
> operation depends on external state (i.e. state outside the control of the
> interpreter itself) as opposed to relying on user diligence. If you can't
> remove the (pointers to) objects then surely they can have a flag which says
> that the object has been invalidated and if the user tries to use an
> invalidated object , they would get an error message. So when saving an image
> , you would traverse all objects (which I assume you have to do anyway) and
> set the "invalidate" flag for all objects dealing with external state.

Good idea. Yep, I'll need to do this. Any open file handles except stdin, stdout,
stderr should be closed, too. Oh gawd, that means Window handles, too. Ich.

luserdroog

unread,

Dec 24, 2022, 2:38:23 PM12/24/22

to

On Saturday, December 24, 2022 at 4:34:00 AM UTC-6, Bart wrote:

> This doesn't sound any interpreter I've ever used. Usually there is the
> intepreter itself - a executable file image; the program to be run, and
> that's it.
>
> Everything needed is created in-memory and stays there; you don't
> attempt to write that stuff to a file.
>
> What is the special requirement here? Is there a large program prelude
> involved for PostScript, which you don't want to have to repeat each
> time it runs a script, so you want cache that state?

Yes. For the application of being the resident print server inside the printer,
some versions had the ability to run a special `exitserver` operator that
allowed storing persistent data on the printer, like fonts and forms or
emulations of new operators for the next version of PS.

> Does it need to be able to recover from an aborted script?

That's something I want my clone to do. Or at least get some debugging
info out of it. I'd love to be able to get bug reports with the exact state
of the interpreter at the point of the problem and have the tools to diagnose
and fix it. That's the pie-in-the-sky dream anyway.

> Or is there
> other state that needs to be persistent across activations? In short,
> what is the problem to be solved that might need saving a memory image
> to disk and reloading later?

I suppose there are just 2 or 3 use cases I can think of:

1. quick loading of the interpreter without having to run through all of it's
dynamic initialization of all it's many various data structures.
2. storing fonts and stuff into this image.
3. debugging an image after a crash.

Spiros Bousbouras

unread,

Dec 25, 2022, 7:31:51 AM12/25/22

to

On Sat, 24 Dec 2022 11:38:22 -0800 (PST)
luserdroog <mij...@yahoo.com> wrote:
> On Saturday, December 24, 2022 at 4:34:00 AM UTC-6, Bart wrote:
> > Does it need to be able to recover from an aborted script?
>
> That's something I want my clone to do. Or at least get some debugging
> info out of it. I'd love to be able to get bug reports with the exact state
> of the interpreter at the point of the problem and have the tools to diagnose
> and fix it. That's the pie-in-the-sky dream anyway.

Surely the ultimate aspiration should be not to have any bugs at all rather
than how to fix them. By the way , have you considered giving it an interactive
debugger ?

Spiros Bousbouras

unread,

Dec 25, 2022, 7:46:54 AM12/25/22

to

On Sat, 24 Dec 2022 11:26:23 -0800 (PST)

luserdroog <mij...@yahoo.com> wrote:
> On Saturday, December 24, 2022 at 7:25:30 AM UTC-6, Spiros Bousbouras wrote:
> > Disclaimer : I don't know postscript so I'm kind of guessing here. Also
> > it might have made sense to crosspost this on comp.lang.postscript .
> > On Fri, 23 Dec 2022 21:16:38 -0800 (PST)
> > luserdroog <mij...@yahoo.com> wrote:
> > > The first problem is pointers. The memory image has to use integer
> > > offset and things instead of pointers. Function pointers for operators
> > > have to be mediated through a table that lives outside of the image.
> > > And it's coordinating the operator codes across the memory image
> > > and the compiled code that leads to coupling the two with a version
> > > number or password or something.
> > Don't the operators have names ? Why can't you use a hash table saying
> > that the code corresponding to operator XYZ is at location N ? I don't
> > see what the ordering of the operators in some source code would have
> > anything to do with it.
>
> The operator objects are created dynamically and allocated sequential
> codes at "ram initialization time". Under modern operating systems you
> can't reliably store function pointers across separate invocations of the
> (interpreter) program because of ASLR. cf.
>
> https://stackoverflow.com/questions/18040735/will-statically-linked-functions-have-the-same-pointer-values-across-runs-on-the

You haven't answered the core of my question and that is why can't you use
operator names and you have to rely on sequential codes.

Your link says

I want to add a quick-launch feature to my postscript interpreter so it
can bypass the long (-ish) initialization routines and start executing
user programs straightaway.

How long do these take anyway on modern hardware ?

> > > If you change the ordering of
> > > operators in the source code, the recompiled program will not work
> > > with an old image.
> > Source code of what ?
>
> The C code that implements the interpreter.
>
> > > Then, some amount of the top-level interpreter state needs to be
> > > stashed in a known location (Adobe's code uses a struct VMInfo for
> > > this top-level data).
> > Why does interpreter state need to be stored ?
>
> I'm not sure what I've failed to describe. The goal is for the entire state
> of execution to be suspended to disk and resumed from the same point
> after power cycling. Or, for a hosted interpreter that runs on an OS, you'd
> resume by running the interpreter program again with an option to resume
> the suspended session.

No , it's ok , I get it now.

> > > What other considerations are there in designing such a thing?
> > I don't know but Common Lisp implementations have the facility to save an
> > image including all the new stuff the user has defined while running the
> > image which is the point. If you google for "X save image" where X would be
> > some Common Lisp implementation like SBCL or clisp or ECL , you will get
> > matches. I have no idea how useful this will be to your own goals but it's an
> > idea.
>
> Yes, I think this is probably the closest examples I'm likely to find. I think
> emacs does something similar.

firefox does it too up to a point.

--
The parties are advised to chill.
Alex Kozinski in the "MATTEL INC v. MCA RECORDS INC" court decision

Spiros Bousbouras

unread,

Dec 25, 2022, 9:06:05 AM12/25/22

to

Of course for 3 to be possible , the interpreter would have to regularly save
state as opposed to waiting for a user command to do so.

luserdroog

unread,

Dec 25, 2022, 11:45:25 AM12/25/22

to

True, xpost is able to do this much by using an mmap'ed file as the image.
So, when it crashes the file is just there already on the disk.

The Adobe source code has a VMFlush() function to sync the memory to disk.
Oddly, it's disabled when compiled for APPLE.

luserdroog

unread,

Dec 25, 2022, 12:06:08 PM12/25/22

to

You right. I didn't realize how scatterbrained and incomplete my post was.
From the start, I think I need to separate the new features from the bug fix.
And prioritize the latter.

The bug.

When xpost begins executing it creates two mmap'ed files named
gmemXXXXXX and lmemXXXXXX (and if logging is enabled there's
also a third xdumpXXXXXX file created). Each file has its own unique
XXXXXX string. So when there's more than one set of these files in
the directory it'd be very difficult to know which ones are related.

So what I need is a function like tmpfile() that takes a set of filename
templates and fills in each with the same unique XXXXXXs so that
none of them conflict with existing files. That much is probably just
a programming task and not really on topic in this ng. I also need to
change them to live in a tmp directory somewhere (and do this portably
through the autotools).

The language design issue I'm really fuzzy on is what options to have
for the interpreter so it can clean up after itself -- or preserve the
evidence -- as needed. As a first draft, does the following seem
reasonable?

xpost --clean-up % the default
xpost --preserve-temps % or --keep-memory-files ?

For the --clean-up option, there's a programming question of whether
to remove() the files at exit or just use an anonymous mmap() to begin
with.

Tim Rentsch

unread,

Dec 25, 2022, 9:58:33 PM12/25/22

to

> so that none of them conflict with existing files. [...]

The suggestion that comes to mind is to use a combination of a
timestamp and a tmpnam() result to create a directory name (call
it D) and put the files in D/gmem, D/lmem, D/xdump, etc.

> The language design issue I'm really fuzzy on is what options to
> have for the interpreter so it can clean up after itself -- or
> preserve the evidence -- as needed. As a first draft, does the
> following seem reasonable?
>
> xpost --clean-up % the default
> xpost --preserve-temps % or --keep-memory-files ?
>
> For the --clean-up option, there's a programming question of
> whether to remove() the files at exit or just use an anonymous
> mmap() to begin with.

I'm still hoping to see a summary description (concise but complete)
of all the different parts of the environment, what operations need
to be performed on the different kinds of "snapshots", and which
parts of the environment you think need to go in each kind of
snapshot. I have done some work in and on PostScript in the past,
but that was some years ago, so the safest assumption is that I
don't remember anything, and also may not have understood what all
the different parts are even when I was fully immersed in PostScript
work.

luserdroog

unread,

Dec 28, 2022, 12:00:17 AM12/28/22

to

On Sunday, December 25, 2022 at 8:58:33 PM UTC-6, Tim Rentsch wrote:
> luserdroog <mij...@yahoo.com> writes:
>
> > On Saturday, December 24, 2022 at 3:05:54 AM UTC-6, Tim Rentsch wrote:
> >
> >> luserdroog <mij...@yahoo.com> writes:
> >>
> >>> One of the parts of the Adobe PostScript interpreter source that
> >>> I've been taking inspiration from is designing the initial state
> >>> of the virtual memory (the systemdict holding all the operators
> >>> and other initialized data structures) to be cached and loaded
> >>> from a file.
> >>>
> >>> In my PostScript clone xpost I really tried to make this feature
> >>> work and I never quite managed it. Worse, mine creates multiple
> >>> image files every time it runs and just leaves them cluttering up
> >>> the working directory. [...]
> >>
> >> If you want help, define the problem completely and clearly enough
> >> so people who don't know anything about PostScript can understand
> >> what it is you want to accomplish.
> >
> > You right. I didn't realize how scatterbrained and incomplete my
> > post was. From the start, I think I need to separate the new
> > features from the bug fix. And prioritize the latter.
> >
> > The bug.

[snip]

> I'm still hoping to see a summary description (concise but complete)
> of all the different parts of the environment, what operations need
> to be performed on the different kinds of "snapshots", and which
> parts of the environment you think need to go in each kind of
> snapshot. I have done some work in and on PostScript in the past,
> but that was some years ago, so the safest assumption is that I
> don't remember anything, and also may not have understood what all
> the different parts are even when I was fully immersed in PostScript
> work.

Xpost follows the PostScript Level 2 standard with some Display PostScript
Extensions. The Level 2 standard requires a garbage collector, and DPS
requires cooperative multitasking with various ways of forking to share
one or both of Local and Global memory with the parent. The very top
level data structure encompassing the total state is

A collection of execution contexts,
the id of the currently executing context,
a collection of global memory files,
a collection of local memory files
and a flag for whether we're in an error state that hasn't finished being handled.

Already, I'm not sure if full generality is necessary for the present. The existence
and behavior of the multiple contexts is still experimental and not fully
implemented. So the endeavor could be simplified by ignoring this top level
structure and focusing on a single context which is associated with one global
memory file and one local memory file. I think the top level error flag is not
important enough to preserve and can be simply reset to zero when resuming
a session from files.

An execution context is

A collection of opcodes for internal use,
the currently executing object,
pointer to global memory file,
pointer to local memory file,
context id,
a bunch of unsigned integers some of which are addresses in local memory
(including addresses of the stacks, random number seed, flags)
the event handler object,
window device object,
device name (from the command line, like "raster:argb", "mswindows", "xcb").

And some remaining stuff that probably isn't important to preserve.
There's an ignoreinvalidaccess flag that is only used for a brief period during
initialization while building the system dictionary, so it can just be reset to zero.
And some function pointers that don't need to be preserved, they can be filled in
with default functions.

The memory file contains the contents of all the stacks, dictionaries, arrays, strings.
It has an associated memory table that describes what's actually in there. I had
originally designed the memory table to live inside the memory file's arena. But I
pulled the table out of the arena to improve performance. When it lived in the arena,
it was organized as a linked list of small fixed size tables, but that meant chasing
several pointers to access later table segments to find the right table entry. And this
table lookup is needed in every operation involving a dictionary, array, or string.
So the tables came out of the arena, because I couldn't think of a way to make
them grow-able without making them slow to use for common uses.

The memory file structure itself is

a file descriptor,
a filename string of 20 chars max,
a pointer to the data or base pointer,
size used,
size allocated,
memory table descriptor, //originally the address of the first table segment
index of the start of regular table entries
(the first several entries are for special things like the free list and name search tree),

And there's also a bunch of function pointers and flags but I think they're not
important to preserve and can be reset to default values.

The memory table descriptor is

index of next unused entry,
size allocated,
pointer to table entries.

Where a table entry itself is

address of allocation,
size in use,
size of allocation,
an integer for GC to mark,
a tag field used for GC to close files or other destruction.

And that is the complete state of the interpreter. I've included the C structs
themselves below, but hopefully my description above is at the appropriate
level of detail.

typedef struct
{
Xpost_Context ctab[MAXCONTEXT];
unsigned int cid;
Xpost_Memory_File gtab[MAXMFILE];
Xpost_Memory_File ltab[MAXMFILE];
int in_onerror;
} Xpost_Interpreter;

struct _Xpost_Context {

struct
{
int contfilenameforall;
int cvx;
int opfor;
int forall;
int load;
int loop;
int repeat;
int token;
int transform;
int itransform;
int rotate;
int concatmatrix;
} opcode_shortcuts; /**< opcodes for internal use, to avoid lookups */

Xpost_Object currentobject; /**< currently-executing object, for error() */

/*@dependent@*/
Xpost_Memory_File *gl; /**< global VM */
/*@dependent@*/
Xpost_Memory_File *lo; /**< local VM */

unsigned int id; /**< cid for this context */

unsigned int os, es, ds, hold; /**< stack addresses in local VM */
unsigned long rand_next; /**< random number seed */
unsigned int vmmode; /**< allocating in GLOBAL or LOCAL */
unsigned int state; /**< process state: running, blocked, iowait */
unsigned int quit; /**< if 1 cause mainloop() to return, if 0 keep looping */

Xpost_Object event_handler;
Xpost_Object window_device;
const char *device_str;

int ignoreinvalidaccess; //briefly allow invalid access to put userdict in systemdict (per PLRM)

int (*xpost_interpreter_cid_init)(unsigned int *cid);
Xpost_Memory_File *(*xpost_interpreter_alloc_local_memory)(void);
Xpost_Memory_File *(*xpost_interpreter_alloc_global_memory)(void);
int (*garbage_collect_function)(Xpost_Memory_File *mem, int dosweep, int markall);
};

typedef struct Xpost_Memory_File
{
int fd; /**< file descriptor associated with this memory/file,
or -1 if not used. */
char fname[20]; /**< file name associated with this memory/file,
or "" if not used. */
/*@dependent@*/
unsigned char *base; /**< pointer to mapped memory */
unsigned int used; /**< size used, cursor to free space */
unsigned int max; /**< size available in memory pointed to by base */

struct Xpost_Memory_Table table;

unsigned int start; /**< first 'live' entry in the memory_table. */
/* the domain of the collector is entries >= start */

int period;
int threshold;
int free_list_alloc_is_installed;
int (*free_list_alloc)(struct Xpost_Memory_File *mem,
unsigned sz,
unsigned tag,
unsigned int *entity);

int garbage_collect_is_installed;
int (*garbage_collect)(struct Xpost_Memory_File *mem,
int dosweep,
int markall);
int interpreter_cid_get_context_is_installed;
struct _Xpost_Context *(*interpreter_cid_get_context)(unsigned int cid);
int (*interpreter_get_initializing)(void);
void (*interpreter_set_initializing)(int);
} Xpost_Memory_File;

typedef struct Xpost_Memory_Table
{
unsigned int nextent; /**< next slot in table */
unsigned int max; /**< allocated size */
struct
{
unsigned int adr; /**< allocation address */
unsigned int used; /**< size in use */
unsigned int sz; /**< size of allocation */
unsigned int mark; /**< garbage collection metadata */
unsigned int tag; /**< type of object using this allocation, if needed */
} *tab; /**< table entries */
} Xpost_Memory_Table;

luserdroog

unread,

Dec 28, 2022, 12:14:15 AM12/28/22

to

On Sunday, December 25, 2022 at 8:58:33 PM UTC-6, Tim Rentsch wrote:

[...]

> what operations need
> to be performed on the different kinds of "snapshots", and which
> parts of the environment you think need to go in each kind of
> snapshot.

Forgot to answer this part in my last post. In fact I forgot to think about
this before writing my last post. Different kinds of snapshots ... that is
a very good question. My initial thought is that I'm interested in taking
a snapshot just after the operator table and system dictionary are fully
populated, the point where it can actually start executing PostScript
programs. But I need to think more about other kinds of snapshots.
Maybe a few key bits of info would be useful in a bug report where a 1GB
coredump would just be a waste of bandwidth.

luserdroog

unread,

Dec 28, 2022, 11:36:37 PM12/28/22

to

On Friday, December 23, 2022 at 11:16:39 PM UTC-6, luserdroog wrote:
> One of the parts of the Adobe PostScript interpreter source that I've
> been taking inspiration from is designing the initial state of the
> virtual memory (the systemdict holding all the operators and other
> initialized data structures) to be cached and loaded from a file.
>
> In my PostScript clone xpost I really tried to make this feature work
> and I never quite managed it. Worse, mine creates multiple image
> files every time it runs and just leaves them cluttering up the working
> directory.

As a stepping stone, and also to avoid doing the real work, I've been
attempting to add this feature to my horribly golfed lisp interpreter.
This has the advantage that the memory image was 90% there already.
And the whole source code is under 300 lines.

There were only 2 kinds of object with pointers: strings and functions.
So I changed the strings to store an offset instead of a pointer, and
changed the functions pointers to index a global table of function
pointers.

single file:
https://raw.githubusercontent.com/luser-dr00g/sexp.c/0f21549c84b5080bfda940412e54ccf3dd0fe431/sexp.c
whole project directory
https://github.com/luser-dr00g/sexp.c/archive/0f21549c84b5080bfda940412e54ccf3dd0fe431.zip

For this simpler interpreter, the entire state is described by
the function table and the memory image.

struct state {
int*m,*n,msz, /*memory next mem-size*/
env, /* global environment for REPL, modified by SET, SETQ and DEFUN */
atoms; /* head of atom list */
char linebuf[BUFSIZ];
char *inputptr;
} global = { .linebuf = { 0 }, .inputptr = global.linebuf };

#define MAX_FUNCTIONS 30
typedef int function();
function *ftab[MAX_FUNCTIONS];
int num_functions;

So, really the data from the state that need to be preserved in a snapshot here are
the size in use (n-m),
index to env,
index to atoms

They need to placed in a strategic "known location" I think. Maybe just appended
to the image data, so upon reloading they're the last 3 ints? The size in use
would point back to the same location, offering a sanity check.

A remaining issue is that the image data doesn't identify which version of the
interpreter source (the C source) it is intended to work with. In particular, the
ftab needs to have been populated in the same order as the indices in the
function objects that exist in the memory, so their association is preserved.

Tim Rentsch

unread,

Dec 29, 2022, 9:06:37 PM12/29/22

to

luserdroog <mij...@yahoo.com> writes:

> On Sunday, December 25, 2022 at 8:58:33 PM UTC-6, Tim Rentsch wrote:

[...]

>> I'm still hoping to see a summary description (concise but complete)

>> of all the different parts of the environment, [...]
>
> [.. long and detailed description ..]

A blizzard of information... more than is needed in some areas,
and (I suspect) less than is needed in others.

Part of the reason for asking for a _concise_ but complete
summary description is to get you to organize the information so
it can be so presented. Going through the effort of organizing
the information in this way should go a long way towards helping
you implement the state-saving functionality.

For example, any state that is held in integer data types can all
be lumped together, because saving integers is well-understood
and pretty easy.

Conversely, function pointers need special care, because they
cannot just be stored directly.

The long description given mentions "objects" but as far as I can
tell what an "object" (Xpost_Object?) is is never defined.

Are references to objects done with pointers or by means of an
object table? If there were an object table that would greatly
simplify (probably) the state-saving operations.

You mention "operations" but don't say what an operation is.

Also, there is some amount of state for the garbage collector.
Probably that state does not need to be (directly) saved for
a state-saving operation. What information is important to save,
and what information is incidental and can be ignored?

Do these comments help you see what I'm getting at?

luserdroog

unread,

Jan 2, 2023, 9:13:31 PM1/2/23

to

On Thursday, December 29, 2022 at 8:06:37 PM UTC-6, Tim Rentsch wrote:
> luserdroog <mij...@yahoo.com> writes:
>
> > On Sunday, December 25, 2022 at 8:58:33 PM UTC-6, Tim Rentsch wrote:
> [...]
> >> I'm still hoping to see a summary description (concise but complete)
> >> of all the different parts of the environment, [...]
> >
> > [.. long and detailed description ..]
>
> A blizzard of information... more than is needed in some areas,
> and (I suspect) less than is needed in others.
>
> Part of the reason for asking for a _concise_ but complete
> summary description is to get you to organize the information so
> it can be so presented. Going through the effort of organizing
> the information in this way should go a long way towards helping
> you implement the state-saving functionality.
>

[snip]

>
> Do these comments help you see what I'm getting at?

Yes. It's taking more time to make this description shorter. But I'm working on it.
I've discovered a few other pointers in there that I'd forgotten about.

I'm also trying to think about the situation from the POV of regular readers
of this newsgroup. It'd probably be of more enduring interest if I kept the
description as close as possible to what the PostScript standard requires
and to use its terminology. But that goal is at odds with my personal goal
of making improvements to this existing implementation -- howsoever flawed
and idiosyncratic it might be. OTOH nitty gritty implementation details crop
up in virtually every topic.

luserdroog

unread,

Jan 8, 2023, 12:07:49 AM1/8/23

to

On Thursday, December 29, 2022 at 8:06:37 PM UTC-6, Tim Rentsch wrote:
> luserdroog <mij...@yahoo.com> writes:
>
> > On Sunday, December 25, 2022 at 8:58:33 PM UTC-6, Tim Rentsch wrote:
> [...]
> >> I'm still hoping to see a summary description (concise but complete)
> >> of all the different parts of the environment, [...]
> >
> > [.. long and detailed description ..]
>
> A blizzard of information... more than is needed in some areas,
> and (I suspect) less than is needed in others.
>
> Part of the reason for asking for a _concise_ but complete
> summary description is to get you to organize the information so
> it can be so presented. Going through the effort of organizing
> the information in this way should go a long way towards helping
> you implement the state-saving functionality.
>

> For example, any state that is held in integer data types can all
> be lumped together, because saving integers is well-understood
> and pretty easy.
>
> Conversely, function pointers need special care, because they
> cannot just be stored directly.
>
> The long description given mentions "objects" but as far as I can
> tell what an "object" (Xpost_Object?) is is never defined.
>
> Are references to objects done with pointers or by means of an
> object table? If there were an object table that would greatly
> simplify (probably) the state-saving operations.
>
> You mention "operations" but don't say what an operation is.
>
> Also, there is some amount of state for the garbage collector.
> Probably that state does not need to be (directly) saved for
> a state-saving operation. What information is important to save,
> and what information is incidental and can be ignored?
>

> Do these comments help you see what I'm getting at?

Yes. Let me try again having cleaned and straightened up my bifocals.

From the top level, and abstracting away all the fiddlybits, the whole
interpreter is just a collection of execution contexts. Since it's
just a simple round robin scheduling algorithm, it doesn't even really
need to remember the current context to resume execution.

Interpreter
collection of Contexts

Next, an execution context has a global memory and a local memory,
where there is a rule that global memory ought not to contain any
references to things in local memory. So, the global memory can be
considered self-contained with the local memory forming a shell around
it.

Context
Global Memory
Local Memory
a collection of integers (flags, offsets into local memory)
a collection of Objects (current object, window device, window
device event handler)

Skipping ahead to the Objects themselves, these are designed to be 64
bits long. There are Simple Objects which contain their value entirely
within the 64 bit representation. And there are Composite Objects,
such as arrays, strings, and dictionaries, which have their values in
either Global or Local Memory. There are File objects which have a
pointer to a C structure in memory, indexed by an entity number.
There are also Name objects which have associated strings in one of
the memories. Operator objects contain an integer code which indexes
the operator table which is in Global memory (although this is not a
requirement, perhaps it would make more sense to have the operator
table exist outside the memory arena). There is a Glob object which
is not directly accessible to the user but exists during the execution
of the `filenameforall` looping operator. There is also a Magic
object which is intended to exist only in the value part of a
key/value pair in a dictionary, in order to implement the Magic
Dictionaries from Sun's NeWS (where something like `canvas /mapped
true put` would instantly make the window visible).

Object = Integer int-val
| Boolean bool-val
| Composite Global? Entity Size Offset
| File Global? Entity
| Name Global? name-index
| Operator opcode
| Glob pointer-to-POSIX-glob_t
| Magic pointer-to-struct{(*get)();(*set)();}

A composite object always has a bit specifying whether it's in Global
or Local memory, then an Entity number which is an index into the
Memory Table for that memory, Size (in bytes for a String, objects for
an Array, key/value pairs for a Dictionary), and an Offset which will
be added to the address looked up from the Memory Table.

[Aside: I think both kinds of pointer need to be removed from the
Object representation and replaced with indexes into global tables.
With 64bit pointers, these violate the "64bit design" and force the
objects to be larger than intended.]

Each Memory has an associated Memory Table, indexed by the Entity
number, and a flat area of raw memory with size in use and total size
available.

Memory
Memory arena (big block of raw data, size in use, size available)
Memory Table

The Memory Table has a size in use and total size available, and an
array of allocation records containing an address (offset into the
arena data), size in use, size available, GC mark

Memory Table
size in use, size available
Array of (address (==offset), size used, size available, mark, tag)

For better or worse, all the other features of the PostScript Virual
Memory are grafted on top of this basic structure of Objects which
index the Memory Table to get the offset into the raw data arena.

sidebar: How some other features are grafted on:

The first few slots of the Memory Table hold Special Entities:
[0]: Free List (32bit word at address is the index of next free slot)
[1]: Save Stack (address locates head of stack of stacks of Save Records)
[2]: Context List (array of ids of all contexts sharing this memory)
[3]: Name Stack (address locates stack of string objects)
[4]: Name Tree (address locates head of Ternary Search Tree)
[5]: Bogus Name (special internal string returned by a failed name lookup)
( [6]: Operator Table if this is a Global Memory )
[...]: Live Allocations of Entities

The Operator Table is organized as an array of records

Operator Table
Array of (name stack index of operator's name,
number of operator Signatures,
address of array of Signatures)

Signature
pointer to function which implements the operator's action
number of argument objects
address of array of tag patterns
pointer to stack checking function (or NULL)
number of output objects

The window device is a Dictionary whose contents are in Local Memory.
The window device event handler is an Operator object which indexes
into the Operator table like any other operator to locate its function
pointer. One complication is that a window object has a block of internal
data that it stores in a PostScript String object. For an xcb device
this block of data contains an xcb_connection_t * and xcb_screen_t *
which would no longer be valid. Although some crucial information
would still be stored in the dictionary, like the dimensions. So, it
shouldn't be too difficult to create a new window with the old specs.

What needs to be done for the interpreter to resume a stored memory
image.

The original design was to have all the various pieces naturally
live inside the memory arena, so then the Saving/Resuming behavior
would just happen automatically by saving and loading the raw data.
The memory arena is allocated using mmap() so the saving part is
already done. Without any extra effort, exiting the interpreter
leaves it's final gmemXXXXXX and lmemXXXXXX files sitting right
there on the disk.

I put the Operator table in the arena so the memory image would
naturally correspond to the operator definitions it would work
with... but that doesn't really solve anything it seems. The Memory
Tables were originally implemented as a linked list of fixed sized
tables but it was pulled out of the arena for better performance.

So, in broad strokes, the Memory Table needs to go back into the arena
and the Operator Table needs to come out. Perhaps some kind of CRC or
hash could be computed on the operator names to establish the
correspondence between the codes in memory and the functions they
reference. And the Magic pointers need to be replaced with an
index into a table.

Upon resuming, a sweep needs to be done to invalidate all
Glob pointers. And something needs to be done about FILE *s.
Stdio files like stdin, stdout, stderr should be possible
to reconnect -- if they are stored in a recognizable form.
And it seems possible -- with much additional work -- to
remember the position and filename of a repositionable file
and to fopen() and fseek() to the same place. But maybe FILE *s
should just be invalidated except for stdio files which do
seem essential.

And finally, all the top level structures need to be packaged
up into a record and stashed into the Local Memory arena.
Well, wait a second. I think it does actually need one special
top-level file to list all the contexts. Each context is associated
with exactly one Local memory so all the per-context info can be
stored there. But two contexts may be sharing that same Local memory.

Files to store on Disk:
stateXXX/
interpreter.config
gmem001
lmem001
lmem002
...

Where interpreter.config would need to show which memories are used
by a context and where to find the context info in the (local) memory.
Something like:

cid001: gmem001 lmem001 info:<address>
cid002: gmem001 lmem002 info:<address>

Tim Rentsch

unread,

Jan 16, 2023, 1:04:25 PM1/16/23

to

Okay. It might be good to say whether the order is important,
especially if the ordering information needs to be held extrinsic
to a Context rather than being stored in the Context itself.

> Next, an execution context has a global memory and a local memory,
> where there is a rule that global memory ought not to contain any
> references to things in local memory. So, the global memory can be
> considered self-contained with the local memory forming a shell
> around it.
>
> Context
> Global Memory
> Local Memory
> a collection of integers (flags, offsets into local memory)
> a collection of Objects (current object, window device, window
> device event handler)
>

> [another 150 lines]

The names Global Memory and Local Memory don't say very much.

Here the word "collection" feels wrong. For example a struct
is a kind of "collection", but normally we wouldn't call it
a collection. The descriptions here are both a little big
vague and have unnecessary (?) information.

Some general comments...

Probably anything related to internal memory management (e.g.,
garbage collector, free space) can be left out. Unless there
is some compelling reason they need to be stored on the disk,
they shouldn't be, and hence there is no need to describe them.

Presumably all the stuff that needs to be stored is made up of
"atoms" and some sort of links between atoms. (Atoms are just
blobs of memory that are referred to only as a whole, never as a
subatom except for things like integers and links. In particular
atoms are not the same as an atom in Lisp.) All the ordinary
data in an atom can be lumped together and doesn't need to be
mentioned further. The questions that come up are

1. How do we know what kind of atom a particular atom is?

2. Given a particular kind of atom, how do we know where
the links are in that atom?

3. What are the different kind of links?

4. Do some atoms live in "spaces", which could affect
the nature of links to those atoms?

5. Are spaces of atoms some sort of compound structures
to be stored, or are they implicit somehow?

6. Do we also need to store some sort of links to
external entities that are not going to be held
in the database but are understood to reference
some sort of thing that the interpreter will
know about, without needing to store it?

These questions need accurate answers but not necessarily
precise answers. For example, we might say

Each atom has a tag that says what kind of atom it is.
The tag has enough information to be able to locate
where are the links are, and what is the kind of each
link.

Probably that description isn't accurate, but if it were then
that is most of what we need to say about atoms. The description
isn't precise enough that we could write code from it, but it has
enough information that we could start to design code to read a
stored image back into memory. Looking at the structure from the
other end of the spectrum, we need to know how the data storage
is compartmentalized. Are links allowed to cross compartment
boundaries (probably some are), and if so what kinds of links?
How are the kinds of different compartments to be identified?
Compartments may be the same as spaces or they may be different
somehow, probably the main difference being how they participate
(or do not participate) in different kinds of links.

The key is to give just enough information so the overall
structure stands out, but not more information than that. You
might try writing descriptions at three different levels: one
with not enough information, one with too much information, and
one somewhere in between those two. By adding bits to the "too
little" description and taking away bits from the "too much"
description you may find it easier to discover the "just right"
description.

I hope this is enough to help you in the next iteration.

Tim Rentsch

unread,

Jan 16, 2023, 1:05:38 PM1/16/23

to

P.S. Tsk, tsk.. posting through google groups...