
Code Review requested: Postscript Interpreter


luserXtrog

Dec 19, 2010, 6:44:17 AM
I feel I need a fresh perspective (or many, ideally)
on my program. It's grown to where I can't quite keep
it all in my head and making new additions has become
a game of "how did I do this elsewhere?"

A zip file containing C and PostScript source and a makefile
is available at:
http://code.google.com/p/xpost/downloads/list

I chose a BSD licence because I don't know any better.

There are probably too few comments.
So even comments like "this part needs more comments"
are desirable.

And in more than a few places I'm certainly guilty
of attempting to be cute and/or clever. But to all
appearances, it all works somehow.

A little toc:
arr.c       arr.h       array operators (functions)
bool.c      bool.h      boolean operators
color.c     color.h     color operators
control.c   control.h   control operators
dic.c       dic.h       dictionary operators
err.c       err.h       error handling (c-part)
err.ps                  error handling (ps-part)
file.c      file.h      file operators
global.c    global.h    global variables (yes, I know. bad. sorry)
                        all the stacks are here
init.c      init.h      initialize (c-part)
init.ps                 initialize (ps-part)
lim.h                   implementation limits
main.c                  main function and central loop
math.c      math.h      math operators
matrix.c    matrix.h    matrix operators
obj.h                   the object structure
oper.c      oper.h      the operator interface (function-pointer-objects)
paint.c     paint.h     just stroke
path.c      path.h      path construction (no curves)
poly.c      poly.h      polymorphic operators
squiggle.ps             a doodle (no showpage)
sta.c       sta.h       stack manipulation operators
str.c       str.h       string operators
tok.c       tok.h       the token operator (the lexical scanner)
type.c      type.h      type and attribute operators
vm.c        vm.h        virtual memory (mmap requires POSIX)
x.c         x.h         the X11 functions


TIA
--
Oh, to watch one's favorite show again
for the first time.

Barry Schwarz

Dec 19, 2010, 1:36:19 PM
On Sun, 19 Dec 2010 03:44:17 -0800 (PST), luserXtrog
<mij...@yahoo.com> wrote:

snip

>A little toc:
>arr.c arr.h array operators (functions)
>bool.c bool.h boolean operators
>color.c color.h color operators
>control.c control.h control operators
>dic.c dic.h dictionary operators
>err.c err.h error handling (c-part)
>err.ps error handling (ps-part)
>file.c file.h file operators
>global.c global.h global variables (yes, I know. bad. sorry)
> all the stacks are here
>init.c init.h initialize (c-part)
>init.ps initialize (ps-part)
>lim.h implementation limits
>main.c main function and central loop
>math.c math.h math operators

Why would you introduce unnecessary confusion by naming your header
file the same as a standard header?

--
Remove del for email

Gene

Dec 19, 2010, 7:02:34 PM
On Dec 19, 6:44 am, luserXtrog <mijo...@yahoo.com> wrote:
> I feel I need a fresh perspective (or many, ideally)
> on my program. It's grown to where I can't quite keep
> it all in my head and making new additions has become
> a game of "how did I do this elsewhere?"
>
> A zip file containing c and postscript source and a makefile
> are available at:http://code.google.com/p/xpost/downloads/list
>
> I chose a BSD licence because I don't know any better.
>
> There are probably too few comments.
> So even comments like "this part needs more comments"
> are desirable.
>
> And in more than a few places I'm certainly guilty
> of attempting to be cute and/or clever. But to all
> appearances, it all works somehow.
>

I count a bit over 2,700 sloc. It's typical for an inexperienced
programmer to start losing control of a program at about this size if
there's been no design work or scaffolding before coding. If that's
what's happened, you won't regain control. Get a good book on data
structures and another on software design. Read them. Start over.
Chalk this one up to a learning experience.

BartC

Dec 19, 2010, 7:48:33 PM

"Gene" <gene.r...@gmail.com> wrote in message
news:8c538e83-4c6b-4818...@y23g2000yqd.googlegroups.com...

I had a quick look. 2700 loc seems tiny for any sort of interpreter.

My main criticism might be that it is split up into too many files,
averaging just 100 lines per module and 23 lines per header file.

I'm not surprised it's difficult to keep it all together. In fact I'd be
tempted to put it all into one file.

--
bartc

luserXtrog

Dec 20, 2010, 12:50:04 AM
On Dec 19, 6:48 pm, "BartC" <b...@freeuk.com> wrote:
> "Gene" <gene.ress...@gmail.com> wrote in message

>
> news:8c538e83-4c6b-4818...@y23g2000yqd.googlegroups.com...
>
>
>
> > On Dec 19, 6:44 am, luserXtrog <mijo...@yahoo.com> wrote:
> >> I feel I need a fresh perspective (or many, ideally)
> >> on my program. It's grown to where I can't quite keep
> >> it all in my head and making new additions has become
> >> a game of "how did I do this elsewhere?"
>
> >> A zip file containing c and postscript source and a makefile
> >> are available at:http://code.google.com/p/xpost/downloads/list
>
> >> I chose a BSD licence because I don't know any better.
>
> >> There are probably too few comments.
> >> So even comments like "this part needs more comments"
> >> are desirable.
>
> >> And in more than a few places I'm certainly guilty
> >> of attempting to be cute and/or clever. But to all
> >> appearances, it all works somehow.
>
> > I count a bit over 2,700 sloc.  It's typical for an inexperienced
> > programmer to start losing control of a program at about this size if
> > there's been no design work or scaffolding before coding.   If that's
> > what's happened, you won't regain control.  Get a good book on data
> > structures and another on software design.  Read them. Start over.
> > Chalk this one up to a learning experience.

This is already the fourth start-over! For a sense of where I started,
you could search for the thread "Embarrassing Spaghetti Code Needs
Stylistic Advice" in clc about a year ago.

> I had a quick look. 2700 loc seems tiny for any sort of interpreter.

Well it's still incomplete. I imagine the size will more than double
by the time I get all the standard operators finished.

> My main criticism might be that it is split up into too many files,
> averaging just 100 lines per module and 23 lines per header file.
>
> I'm not surprised it's difficult to keep it all together. In fact I'd be
> tempted to put it all into one file.

The first version was a single file. Then I read somewhere that
source files should be no more than 200-300 lines (Art of Unix
Programming, maybe?). Thereafter, the single file began to seem
unweildy. So when I started the first rewrite I tried to keep things
smaller.

With version 3, I tried partitioning along "logical groupings" of
functions (mostly following the categories from the Postscript
manual itself). With this fourth try, I've tried to strictly
follow the categories from the manual and let the file sizes take
care of themselves. Learning how to use ctags has made editing
multiple files almost as easy as the single file was.

Adding a new operator is probably the most troublesome part, lately.
It requires an operator function in the .c file, a declaration in
the .h file, and an entry in the OPERATORS macro (at the end, to
avoid recompiling everything) in oper.h. And the makefile doesn't
really know all the dependencies so init.c has to be touched
so the new operator can get installed in the dictionary at startup.
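
To make the shape of that plumbing concrete, here is a stand-alone
toy (nothing below is actual xpost code; Object and all the names are
made up for illustration): an operator is a C function plus a table
entry, and something at startup walks the table to install each one
by name.

/* toy_oper.c -- illustrative sketch only, not xpost's real code */
#include <stdio.h>

typedef struct { int i; } Object;         /* stand-in for the real Object */
typedef void (*OpFunc)(Object *);

void Oincr (Object *o) { o[0].i += 1; }   /* 1. the operator body (.c file)  */
void Odecr (Object *o) { o[0].i -= 1; }   /*    (its declaration goes in .h) */

/* 2. the OPERATORS-style table: a new operator is one body above
      plus one entry here */
struct { char *name; OpFunc fn; } operators[] = {
    { "incr", Oincr },
    { "decr", Odecr },
};

/* 3. the startup step that init.c drives: install each entry by name */
int main (void) {
    Object st[1] = { { 41 } };
    int k;
    for (k = 0; k < (int)(sizeof operators / sizeof *operators); k++)
        printf("installing %s\n", operators[k].name);
    operators[0].fn(st);
    printf("after incr: %d\n", st[0].i);  /* prints 42 */
    return 0;
}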

luser- -droog

Dec 20, 2010, 1:19:37 AM
On Dec 19, 12:36 pm, Barry Schwarz <schwa...@dqel.com> wrote:
> On Sun, 19 Dec 2010 03:44:17 -0800 (PST), luserXtrog
>
> <mijo...@yahoo.com> wrote:
>
> snip
>
>
>
> >A little toc:
> >arr.c arr.h array operators (functions)
> >bool.c bool.h boolean operators
> >color.c color.h color operators
> >control.c control.h control operators
> >dic.c dic.h dictionary operators
> >err.c err.h error handling (c-part)
> >err.ps error handling (ps-part)
> >file.c file.h file operators
> >global.c global.h global variables (yes, I know. bad. sorry)
> > all the stacks are here
> >init.c init.h initialize (c-part)
> >init.ps initialize (ps-part)
> >lim.h implementation limits
> >main.c main function and central loop
> >math.c math.h math operators
>
> Why would you introduce unnecessary confusion by naming your header
> file the same as a standard header?
>

I must have thought that if I could keep them straight, everyone
else could too. Fallacious perhaps, but all too common among us
introverts.

Oh, and I apologize for forgetting to set followups in the original.
I fully intended to decide which group should house the thread
and set the followup right up until I completely forgot and just
hit send.

Malcolm McLean

Dec 20, 2010, 2:19:47 AM
On Dec 20, 7:50 am, luserXtrog <mijo...@yahoo.com> wrote:
>
> The first version was a single file. Then I read somewhere that
> source files should be no more than 200-300 lines (Art of Unix
> Programming, maybe?). Thereafter, the single file began to seem
> unweildy. So when I started the first rewrite I tried to keep things
> smaller.
>
What's important is that source files should be organised logically,
holding related functions (which you seem to have done), and with
controlled dependencies - the last is the hard part and often there
are forces pulling you both ways.

luserXtrog

Dec 20, 2010, 3:35:54 AM
On Dec 20, 1:19 am, Malcolm McLean <malcolm.mcle...@btinternet.com>
wrote:

My use of header files got rather convoluted while trying to avoid
circular dependencies. So I stuffed everything in a controlled order
in one place (global.h) and had everything else include that. I'm
hoping if I side-step the issue long enough eventually I'll be
running rings around it.
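
Roughly, the shape is something like this (a minimal sketch of the
idea only: the project header names are the ones from the toc, but
the ordering comments and the extern names are illustrative):

/* global.h -- sketch of the one-aggregate-header approach */
#ifndef GLOBAL_H
#define GLOBAL_H

#include <stddef.h>   /* system headers first */

#include "lim.h"      /* then project headers in one hand-picked order, */
#include "obj.h"      /* so no header ever needs to include another     */
#include "sta.h"
#include "dic.h"
#include "err.h"

/* the shared stacks (names illustrative) live behind this header */
extern Object *opstack, *execstack, *dictstack;

#endif /* GLOBAL_H */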

BartC

Dec 20, 2010, 7:48:09 AM
"luserXtrog" <mij...@yahoo.com> wrote in message
news:6bd8060b-e0bc-4845...@i18g2000yqn.googlegroups.com...

> On Dec 19, 6:48 pm, "BartC" <b...@freeuk.com> wrote:

>> My main criticism might be that it is split up into too many files,
>> averaging just 100 lines per module and 23 lines per header file.
>>
>> I'm not surprised it's difficult to keep it all together. In fact I'd be
>> tempted to put it all into one file.
>
> The first version was a single file. Then I read somewhere that
> source files should be no more than 200-300 lines (Art of Unix
> Programming, maybe?). Thereafter, the single file began to seem
> unweildy. So when I started the first rewrite I tried to keep things
> smaller.

If your editing tools present all the entities in the project (functions,
variables, types, macros, etc) as a kind of database, then their location in
a specific file, and the number of such files, becomes less important.

With the low-level tools I use, which rely on me *remembering* where
everything is, too many files can generate real problems.

As it is, the byte-sizes of your modules are more like the line-counts of
mine...

But the 200-300 lines per source file rule sounds nonsense to me, if you
have to know yourself where everything lives (Unix -- or is it Linux --
source code is supposed to be 4Mloc, which would make it some 16,000 files
according to that rule; a tad unmanageable.)

(In one interpreter project of mine, the core of it occupies three modules:
the interpreter itself (7500 lines, half of that in-line asm), implementing
its operators (4500) and implementing its built-in functions (3500). And
I still have trouble knowing where a function lives! (This is for bytecode,
so no parsing is needed.) A newer, more ambitious project however averages
1300 lines per file, but is in early stages so that figure will grow.)

The other thing I noticed is that you have a lot of names starting with "O",
which look a bit like "0" (apart from names which are just "o"); in other
words, a bit strange...

--
Bartc

ImpalerCore

Dec 20, 2010, 11:27:00 AM
On Dec 19, 6:44 am, luserXtrog <mijo...@yahoo.com> wrote:
> I feel I need a fresh perspective (or many, ideally)
> on my program. It's grown to where I can't quite keep
> it all in my head and making new additions has become
> a game of "how did I do this elsewhere?"
>
> A zip file containing c and postscript source and a makefile
> are available at:http://code.google.com/p/xpost/downloads/list
>
> I chose a BSD licence because I don't know any better.
>
> There are probably too few comments.
> So even comments like "this part needs more comments"
> are desirable.
>
> And in more than a few places I'm certainly guilty
> of attempting to be cute and/or clever. But to all
> appearances, it all works somehow.

<snip>

I think you're getting to the point where you're going to need a
documentation system. First off, browsing for functionality in a
browser is much easier than grepping files. You forget things like
order of arguments, semantics for elements in a struct, or the meaning
of return values. I recommend spending some time learning and
creating some documentation in Doxygen (or similar) to get a feel for
what's possible. Without a good system of documentation, you will
likely spend lots of additional time rereading code to learn how to
use the functions you've created months ago. Here's an example of
some doxygenated comments from my list library.

\code snippet
/*!
 * \struct c_list
 * \brief The \c c_list struct is used as the list node for a
 *        double-linked list.
 */
struct c_list
{
  /*!
   * \brief This variable references the list node's object, which
   *        can be a pointer to any type, and may point to a
   *        dynamically allocated object.
   */
  void* object;

  /*! \brief This variable links to the previous object in the list. */
  struct c_list* prev;

  /*! \brief This variable links to the next object in the list. */
  struct c_list* next;
};

#if defined(C_ALIAS_TYPES)
/*! \brief Alias the <tt>struct c_list</tt> type. */
typedef struct c_list c_list;
#endif

/*!
 * \brief Adds a new object at the front of a \c c_list.
 * \param list A \c c_list.
 * \param object The reference to the new object.
 * \return The start of the new \c c_list.
 *
 * \usage
 * \include list/c_list_insert_front_example.c
 *
 * The example above should display the following.
 *
 * \code
 * A coders haiku
 * --------------
 * A double linked list
 * In the right circumstances
 * Points to good design
 * \endcode
 */
struct c_list* c_list_insert_front( struct c_list* list, void* object );

/*!
 * \brief Adds a new object at the end of a \c c_list.
 * \param list A \c c_list.
 * \param object The reference to the new object.
 * \return The start of the new \c c_list.
 *
 * The return value is the start of the new list, which may have
 * changed.
 *
 * Note that \c c_list_insert_back has to traverse the entire list to
 * find the end, which is inefficient when adding multiple objects.
 * A common idiom to avoid the inefficiency is to insert the objects
 * at the front of the list and reverse the list when all the objects
 * have been added.
 *
 * \usage
 * \include list/c_list_insert_back_example.c
 *
 * The example above should display the following.
 *
 * \code
 * A coders haiku
 * --------------
 * Inserting objects
 * End of a very long list
 * Extra long coffee break
 * \endcode
 */
struct c_list* c_list_insert_back( struct c_list* list, void* object );
\endcode

\code snippet c_list_insert_front_example.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <VH/common/config.h>
#include <VH/common/macros.h>
#include <VH/common/alloc.h>
#include <VH/common/strops.h>
#include <VH/common/list.h>

int main( void )
{
  c_list* haiku = NULL;
  c_list* l;
  size_t i;
  char* s;

  char* haiku_strings[] = {
    "Points to good design",
    "In the right circumstances",
    "A double linked list"
  };

  for ( i = 0; i < C_ARRAY_N( haiku_strings ); ++i )
  {
    s = c_strdup( haiku_strings[i] );
    if ( s ) {
      haiku = c_list_insert_front( haiku, s );
    }
  }

  printf( "A coders haiku\n" );
  printf( "--------------\n" );
  for ( l = haiku; l != NULL; l = l->next ) {
    printf( "%s\n", (char*)l->object );
  }

  c_list_free( haiku, c_free );

  return EXIT_SUCCESS;
}
\endcode

In my documentation, I describe the parameters, return values,
semantic details if needed, and provide an example that demonstrates
its usage with expected results.

The drawback of this is that it is a *lot* more work, especially if
you want to create (non-boring) examples that demonstrate something
interesting about the semantics of the function. The payoff is that
you can go back at a later time and grok the function much easier than
having to reread code you wrote months or years ago. And if you make
the time to go through the documentation, any semantic quirks that pop
up (like NULL pointers) will be addressed in the documentation, and
hopefully won't bite you again or someone that follows you. Again let
me re-emphasize, doing what I do is a *lot* of extra work, and may not
be compatible in some work environments.

Second, I think you may want to look at partitioning functionality
into a couple of libraries. A library is the basic component of reuse
in C, and if or when you do something new, the work you put into the
functionality for the postscript parser will be easier to apply to
something else if pertinent pieces are nicely encapsulated in a
library.

That's all I got for now.

Best regards,
John D.

Gene

Dec 20, 2010, 6:25:04 PM
On Monday, December 20, 2010 7:48:09 AM UTC-5, Bart wrote:
> But the 200-300 lines per source file rule sounds nonsense to me, if you
> have to know yourself where everything lives (Unix -- or is it Linux --
> source code is supposed to be 4Mloc, which would make it some 16,000 files
> according to that rule; a tad unmanageable.)

Yes, the 200-300 was common advice circa mid-80's when compilers were much slower and editors didn't have features (bookmarks, source navigation, etc.) to deal with longer files. These days 2,000-3,000 lines is perfectly fine and 20,000-30,000 occasionally if the code content has boilerplate similarity.

Ben Pfaff

Dec 20, 2010, 7:08:31 PM
Gene <gene.r...@gmail.com> writes:

> Yes, the 200-300 was common advice circa mid-80's when
> compilers were much slower and editors didn't have features
> (bookmarks, source navigation, etc.) to deal with longer files.
> These days 2,000-3,000 lines is perfectly fine and
> 20,000-30,000 occasionally if the code content has boilerplate
> similarity.

I'd be really uncomfortable with 20,000-30,000 lines of
boilerplate, unless it was automatically generated and maintained
by modifying the generator, not by modifying the generated code.
--
A competent C programmer knows how to write C programs correctly,
a C expert knows enough to argue with Dan Pop, and a C expert
expert knows not to bother.

luser- -droog

Dec 20, 2010, 8:41:23 PM
On Dec 20, 10:27 am, ImpalerCore <jadil...@gmail.com> wrote:
> On Dec 19, 6:44 am, luserXtrog <mijo...@yahoo.com> wrote:
>
>
>
> > I feel I need a fresh perspective (or many, ideally)
> > on my program. It's grown to where I can't quite keep
> > it all in my head and making new additions has become
> > a game of "how did I do this elsewhere?"
>
> > A zip file containing c and postscript source and a makefile
> > are available at:http://code.google.com/p/xpost/downloads/list
> <snip>
>
> I think you're getting to the point where you're going to need a
> documentation system.  First off, browsing for functionality in a
> browser is much easier than grepping files.  You forget things like
> order of arguments, semantics for elements in a struct, or the meaning
> of return values.  I recommend spending some time learning and
> creating some documentation in Doxygen (or similar) to get a feel for
> what's possible.  Without a good system of documentation, you will
> likely spend lots of additional time rereading code to learn how to
> use the functions you've created months ago.  

Agreed. One of my dreams for the project is to make it a
self-documenting literate program (the uber-quine) producing a pdf
book describing itself. But I really should learn some sort of
documenting system now to get the whole process started.

>Here's an example of
> some doxygenated comments from my list library.
>

<snip very nice example for space>

> In my documentation, I describe the parameters, return values,
> semantic details if needed, and provide an example that demonstrates
> its usage with expected results.

I've tried to avoid the need for this level of detail in comments
by building, as directly as I could, a mapping between the published
standard and the semantics of the program. Hence parameters and return
values for the O* functions are directly from the Adobe book.
But, of course, that has the drawback that anyone who doesn't own
the book can't make as much sense of the program.

Point taken. Each function should have some description.

> The drawback of this is that it is a *lot* more work, especially if
> you want to create (non-boring) examples that demonstrate something
> interesting about the semantics of the function.  The payoff is that
> you can go back at a later time and grok the function much easier than
> having to reread code you wrote months or years ago.  And if you make
> the time to go through the documentation, any semantic quirks that pop
> up (like NULL pointers) will be addressed in the documentation, and
> hopefully won't bite you again or someone that follows you.  Again let
> me re-emphasize, doing what I do is a *lot* of extra work, and may not
> be compatible in some work environments.
>
> Second, I think you may want to look at partitioning functionality
> into a couple of libraries.  Library is the basic component of reuse
> in C, and if or when you do something new, the work you put into the
> functionality for the postscript parser will be easier to apply to
> something else if pertinent pieces are nicely encapsulated in a
> library.

Indeed. I've been trying to build up the graphics functionality as
a library. It's a lot of work just to track down the sources (texts
and journals from 70s-80s), let alone understanding and implementing
the algorithms. (I've lost count of how many times I've read about
the Bresenham line drawing algorithm; I'm still not sure I "get it.")
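
(For reference, here is the textbook all-octant integer form of that
algorithm; this is the standard version from the literature, not
xpost's drawing code, and plot() is just a stand-in for whatever
actually sets a pixel.)

#include <stdio.h>
#include <stdlib.h>

void plot (int x, int y) { printf("(%d,%d)\n", x, y); }  /* stand-in */

void bresenham (int x0, int y0, int x1, int y1) {
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;   /* dy is kept negative so err straddles zero */
    int e2;
    for (;;) {
        plot(x0, y0);
        if (x0 == x1 && y0 == y1) break;
        e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }  /* error says: step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }  /* error says: step in y */
    }
}

int main (void) {
    /* a shallow line: visits (0,0)(1,1)(2,1)(3,2)(4,2)(5,3)(6,3) */
    bresenham(0, 0, 6, 3);
    return 0;
}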

As for this project, I'm having some trouble envisioning which pieces
should be partitioned off. They all seem so interrelated! The parser
(just a scanner, really; I think there's one point where it recurses
and that's only for scanning literal procedures) has to know about
the object types and how to create each of them.

I think the virtual memory store for composite objects (dictionaries,
arrays, and strings) might be the best thing to break off first. I've
just discovered in another thread that my simplistic implementation
(with each save-level as an anonymous mmap) doesn't duplicate a legacy
quirk of the original Adobe implementation (restoring an earlier
save-level doesn't roll back string contents) which all other
interpreters have followed.

So I need to modify the implementation of this part without disturbing
the rest of the program. "Modularity to the rescue?"

> That's all I got for now.

Much obliged.

luser- -droog

Dec 20, 2010, 8:47:48 PM
On Dec 20, 6:08 pm, b...@cs.stanford.edu (Ben Pfaff) wrote:

> Gene <gene.ress...@gmail.com> writes:
> > Yes, the 200-300 was common advice circa mid-80's when
> > compilers were much slower and editors didn't have features
> > (bookmarks, source navigation, etc.) to deal with longer files.
> > These days 2,000-3,000 lines is perfectly fine and
> > 20,000-30,000 occasionally if the code content has boilerplate
> > similarity.
>
> I'd be really uncomfortable with 20,000-30,000 lines of
> boilerplate, unless it was automatically generated and maintained
> by modifying the generator, not by modifying the generated code.

I'm just using vim. It's probably got some crazy navigation tools
but I haven't explored them. I'm afraid of emacs.

And since my olpc xo-1 laptop has a quirk of deactivating the mouse
if you happen to be touching it during a recalibration, I'm afraid
of gui tools in general. Once the mouse is off, you have to reboot
to turn it back on.

So for my own purposes, I'm quite pleased with the small file
sizes. To me it suggests that the code is concise. Perhaps Strunk
and White isn't the best style guide for coding.

luser- -droog

Dec 20, 2010, 8:59:34 PM

Just before it begins executing user statements, init.ps defines a
procedure that shows off its one trick. The code suggests you can run
it two ways, but 'fill' isn't implemented yet in this version, so only
'stroke' works:

634(1)02:44 AM:xpost 0> xpost
initgraphics...found a TrueColor class visual at default depth.
drawWindow()
Xpost Version 0c
PS>{stroke}wheel

I probably should've turned off the 'printf's before uploading.
The text just indicates how the lines are being batched up for
X11.

Ian Collins

Dec 20, 2010, 9:43:13 PM
On 12/21/10 02:47 PM, luser- -droog wrote:
> On Dec 20, 6:08 pm, b...@cs.stanford.edu (Ben Pfaff) wrote:
>> Gene<gene.ress...@gmail.com> writes:
>>> Yes, the 200-300 was common advice circa mid-80's when
>>> compilers were much slower and editors didn't have features
>>> (bookmarks, source navigation, etc.) to deal with longer files.
>>> These days 2,000-3,000 lines is perfectly fine and
>>> 20,000-30,000 occasionally if the code content has boilerplate
>>> similarity.
>>
<snip>

>
> So for my own purposes, I'm quite pleased with the small file
> sizes. To me it suggests that the code is concise. Perhaps Strunk
> and White isn't the best style guide for coding.

Small file sizes are good - it's easier to read multiple files side by
side than to be at several places in one file, and if you ever use a
parallel or distributed build system, things go faster.

--
Ian Collins

Gene

Dec 20, 2010, 11:06:18 PM
On Monday, December 20, 2010 9:43:13 PM UTC-5, Ian Collins wrote:
> On 12/21/10 02:47 PM, luser- -droog wrote:
> > On Dec 20, 6:08 pm, b....@cs.stanford.edu (Ben Pfaff) wrote:
> > So for my own purposes, I'm quite pleased with the small file
> > sizes. To me it suggests that the code is concise. Perhaps Strunk
> > and White isn't the best style guide for coding.
>
> Small file sizes is good - it's easier to read multiple files side by
> side than to be a several place in one file and if you ever use a
> parallel or distributed build system, things go faster.

Okay, sure. So now I will count angels on pinheads, but there isn't much difference between working in multiple files and working with multiple frames viewing the same file at different points. I find that being able to make interfaces (read header files) as small and simple as possible is an advantage of bigger files, esp. since C doesn't allow anything like a module hierarchy a la Ada. For example, as other have said, putting most of an interpreter (parser, simulator, etc.; take your pick) in one file requires exposing only the main interface. Interfaces among subsystems remain invisible. This, avoids complexity of headers and their dependencies, yada, yada. For example, a year or so ago I broke a single 10,000 line module down into pieces of ~600 lines. It took 2 or 3 hours to get the headers right, generate make dependencies, etc. and get everything through the regression tests. Build time advantage was tiny and negative when clean-building. On the whole, wish I'd stayed with original setup.

ImpalerCore

Dec 21, 2010, 12:38:09 AM

The main advantage that you gain with smaller files is better compile-
time granularity. This is primarily useful for enabling trace, debug,
or constraint checking functionality to a subset of an interface.
This may or may not be useful depending on the code infrastructure in
place.

For example, consider implementing a list interface with a pair of
files: a header (list.h) and source file (list.c). Furthermore, this
file can be compiled using a compile-time flag (say -DENABLE_TRACE)
that can enable trace messages when calling functions within the list
library. Typically, when the interface is all in a single file, it's
quite tedious to enable trace messages for only a subset of functions,
which can be quite useful in debugging or logging to limit the
overhead or information overload. In my limited experience, trying to
control this at run-time (at least the way I was trying to do it) was
a huge pain.

Contrast that with an implementation that separates the large module
into a set of smaller modules. One can split up list.c into
list_insert_front.c, list_insert_back.c, list_free.c, list_sort.c,
etc., where each file corresponds to a single function (with
associated helper functions if needed). You can in essence compile
each individual "function" with a specific set of compiler flags. For
instance, one can compile list_sort.c with -DENABLE_TRACE to just
trace through sorting function calls without adding the tracing
overhead to other list functions that may be extraneous to the
problem. If a bug is found in list_insert_back, one could try
compiling list_insert_back.c with -DENABLE_CONSTRAINTS to verify
function arguments. But, in addition to the dependency complexity
described above, it also requires making the build system (Makefiles)
more complicated to support compiling each object file with compile-
time specific flags.
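
A bare-bones sketch of that kind of per-file switch (ENABLE_TRACE and
every name below are made up for illustration, not from any particular
library): compile just this one file with -DENABLE_TRACE and it prints
trace messages; every other object file in the build stays silent.

/* list_sort.c (sketch) */
#include <stdio.h>

#ifdef ENABLE_TRACE
#define TRACE(msg) fprintf(stderr, "trace: %s\n", (msg))
#else
#define TRACE(msg) ((void)0)
#endif

int list_sort_stub (int *a, int n)   /* stand-in for the real list_sort */
{
    TRACE("entering list_sort");
    /* ... the actual sorting work would go here ... */
    (void)a;
    TRACE("leaving list_sort");
    return n;
}

int main (void)                      /* tiny driver so the sketch runs */
{
    int a[3] = { 3, 1, 2 };
    return list_sort_stub(a, 3) == 3 ? 0 : 1;
}

So something like "cc -DENABLE_TRACE -c list_sort.c" turns tracing on
for that one translation unit, while the rest of the objects are built
with the plain flags.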

I definitely recommend using the single file approach for major
components until the majority of the interface and design work is
complete, to simplify development. Function names, arguments,
structures, and return values get changed, and splitting up a module too
early is, in my opinion, more hassle than it's worth. Large header files
don't particularly bother me since I use a web browser to lookup
functionality rather than grepping headers. When the interface is
pretty stable, one can consider whether the kind of granularity
presented above is useful enough to warrant splitting up the interface
into more source files. I typically wouldn't bother subdividing
an interface just because of build times, but I'm also not in an
environment where build times are terribly long so that's out of my
personal experience.

Best regards,
John D.

Keith Thompson

Dec 21, 2010, 1:08:19 PM
Gene <gene.r...@gmail.com> writes:
[...]

> Okay, sure. So now I will count angels on pinheads, but there isn't much difference between working in multiple files and working with multiple frames viewing the same file at different points. I find that being able to make interfaces (read header files) as small and simple as possible is an advantage of bigger files, esp. since C doesn't allow anything like a module hierarchy a la Ada. For example, as other have said, putting most of an interpreter (parser, simulator, etc.; take your pick) in one file requires exposing only the main interface. Interfaces among subsystems remain invisible. This, avoids complexity of headers and their dependencies, yada, yada. For example, a year or so ago I broke a single 10,000 line module down into pieces of ~600 lines. It took 2 or 3 hours to get the headers right, generate make dependencies, etc. and get everything through the regression tests. Build time advantage was tiny and negative when clean-building. On the whole, wish I'd stayed with original setup.

Gene, not all news reader software copes well with very long lines.
Keeping your text down to 72 columns or so is helpful.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

pete

Dec 22, 2010, 7:54:41 AM
Keith Thompson wrote:
>
> Gene <gene.r...@gmail.com> writes:
> [...]

>
>
> Gene, not all news reader software copes well with very long lines.
> Keeping your text down to 72 columns or so is helpful.

My news reader showed me the line,
but wouldn't quote it in a reply.
But my newsreader is from the previous millennium.

--
pete

David Resnick

Dec 22, 2010, 10:47:48 AM
On Dec 20, 11:06 pm, Gene <gene.ress...@gmail.com> wrote:

There are advantages to smaller files though, which have nothing to do
with editing. These include:
1) Parallel make of big projects works better (I usually kick off
parallel makes with the number of jobs equal to my number of cores,
which works well for me and is rather faster than a single-threaded make)
2) Small files make merging different developers' work easier.
Less likely that you will have silly merge conflicts if you aren't
touching the same files. Of course, if you are changing the exact
same code in different branches there will always be stuff to
resolve. Doesn't apply to personal projects of course.
3) Smaller, more specialized headers will have fewer includers, and
hence will not trigger the global rebuild that a monolithic header
might.

There are of course tradeoffs. Having everything in a single header
can make a simpler interface for clients. And huge numbers of files
means more metadata when using version control systems that do
labelling etc. And for a small project with a few thousand lines,
really doesn't matter much anyway.

-David

luser- -droog

Dec 22, 2010, 6:35:10 PM
On Dec 19, 5:44 am, luserXtrog <mijo...@yahoo.com> wrote:
> I feel I need a fresh perspective (or many, ideally)
> on my program. It's grown to where I can't quite keep
> it all in my head and making new additions has become
> a game of "how did I do this elsewhere?"
>
> A zip file containing c and postscript source and a makefile
> are available at:http://code.google.com/p/xpost/downloads/list

I have uploaded a revised version which includes
commentary for all functions and at the tops of
all files. Should increase legibility.

I'm investigating using Cairo for the graphics.
That would eliminate 10 files.

luser- -droog

Dec 29, 2010, 4:13:33 PM
Switching to cairo has dramatically accelerated my efforts.
As per suggestions, I have
- reduced the number of files (by consolidating the graphics)
- increased file sizes (by writing more functions)
- added comments for all operators (even those that don't exist)

http://code.google.com/p/xpost/downloads/list

Any advice or comments are greatly appreciated.

One question.
When including a header from a location not in the compiler
search path, is it better to pack the path into the #include
directive, thus

#include <cairo/cairo.h>

or as a command-line option via the makefile, thus

CFLAGS=-I/usr/include/cairo

?

Jorgen Grahn

Dec 29, 2010, 5:02:25 PM
["Followup-To:" header set to comp.lang.c.]

On Wed, 2010-12-29, luser- -droog wrote:
...


> One question.
> When including a header from a location not in the compiler
> search path, is it better to pack the path into the #include
> directive, thus
>
> #include <cairo/cairo.h>
>
> or as a command-line option via the makefile, thus
>
> CFLAGS=-I/usr/include/cairo
>
> ?

IMHO,

#include <cairo/cairo.h>

is the better one. It says the file is cairo/cairo.h, relative to some
base include path. One well-known such path is /usr/include/.

Alternatively, do what the cairo documentation says.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

luser- -droog

Dec 29, 2010, 6:10:50 PM
On Dec 29, 4:02 pm, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:
> ["Followup-To:" header set to comp.lang.c.]
>
> On Wed, 2010-12-29, luser- -droog wrote:
>
> ...
>
> > One question.
> > When including a header from a location not in the compiler
> > search path, is it better to pack the path into the #include
> > directive, thus
>
> > #include <cairo/cairo.h>
>
> > or as a command-line option via the makefile, thus
>
> > CFLAGS=-I/usr/include/cairo
>
> > ?
>
> IMHO,
>
> #include <cairo/cairo.h>
>
> is the better one. It says the file is cairo/cairo.h, relative to some
> base include path. One well-known such path is /usr/include/.

Sadly, it doesn't work. cairo.h can't find its other files.

> Alternatively, do what the cairo documentation says.

Yeah. I saw all that pkg-config stuff at
http://cairographics.org/FAQ/ .
I'm not sure why, but I don't like it.
Probably misunderstanding masquerading as fear.

tlvp

Dec 30, 2010, 12:57:15 AM
On Wed, 29 Dec 2010 16:13:33 -0500, luser- -droog <mij...@yahoo.com> wrote:

> Switching to cairo has dramatically accelerated my efforts.
> As per suggestions, I have
> - reduced the number of files (by consolidating the graphics)
> - increased file sizes (by writing more functions)
> - added comments for all operators (even those that don't exist)
>
> http://code.google.com/p/xpost/downloads/list
>
> Any advice or comments are greatly appreciated.
>
> One question.
> When including a header from a location not in the compiler
> search path, is it better to pack the path into the #include
> directive, thus
>
> #include <cairo/cairo.h>

Like Jorgen, I'd prefer the include line above -- but on the grounds that any human being reviewing the code learns at one glance where cairo.h is, and needn't go chasing through the makefile's CFLAGS options.

Cheers, -- tlvp

> or as a command-line option via the makefile, thus
>
> CFLAGS=-I/usr/include/cairo
>
> ?
>


--
Avant de repondre, jeter la poubelle, SVP

Nick Keighley

Dec 30, 2010, 2:12:19 AM

it's not impossible to regain control. Refactorise madly. Though
personally I'd do the redesign. Hopefully there will be lots of
utilities to salvage from the first design.

Nick Keighley

Dec 30, 2010, 2:13:53 AM
On Dec 20, 12:48 am, "BartC" <b...@freeuk.com> wrote:
> "Gene" <gene.ress...@gmail.com> wrote in message

>
> news:8c538e83-4c6b-4818...@y23g2000yqd.googlegroups.com...
>
>
>
>
>
> > On Dec 19, 6:44 am, luserXtrog <mijo...@yahoo.com> wrote:
> >> I feel I need a fresh perspective (or many, ideally)
> >> on my program. It's grown to where I can't quite keep
> >> it all in my head and making new additions has become
> >> a game of "how did I do this elsewhere?"
>
> >> A zip file containing c and postscript source and a makefile
> >> are available at:http://code.google.com/p/xpost/downloads/list
>
> >> I chose a BSD licence because I don't know any better.
>
> >> There are probably too few comments.
> >> So even comments like "this part needs more comments"
> >> are desirable.
>
> >> And in more than a few places I'm certainly guilty
> >> of attempting to be cute and/or clever. But to all
> >> appearances, it all works somehow.
>
> > I count a bit over 2,700 sloc.  It's typical for an inexperienced
> > programmer to start losing control of a program at about this size if
> > there's been no design work or scaffolding before coding.   If that's
> > what's happened, you won't regain control.  Get a good book on data
> > structures and another on software design.  Read them. Start over.
> > Chalk this one up to a learning experience.
>
> I had a quick look. 2700 loc seems tiny for any sort of interpreter.

well certainly for a Postscript interpreter!

> My main criticism might be that it is split up into too many files,
> averaging just 100 lines per module and 23 lines per header file.
>
> I'm not surprised it's difficult to keep it all together. In fact I'd be
> tempted to put it all into one file.

sailing perilously close to my personal limits on file size. If he's
going to implement a complete Postscript interpreter I'd expect it to
get too large for a single file.

Nick Keighley

Dec 30, 2010, 2:17:30 AM
On Dec 21, 12:08 am, b...@cs.stanford.edu (Ben Pfaff) wrote:

> Gene <gene.ress...@gmail.com> writes:
> > Yes, the 200-300 was common advice circa mid-80's when
> > compilers were much slower and editors didn't have features
> > (bookmarks, source navigation, etc.) to deal with longer files.
> > These days 2,000-3,000 lines is perfectly fine and
> > 20,000-30,000 occasionally if the code content has boilerplate
> > similarity.
>
> I'd be really uncomfortable with 20,000-30,000 lines of
> boilerplate, unless it was automatically generated and maintained
> by modifying the generator, not by modifying the generated code.

I have trouble imagining that much "boiler plate similarity". I'm
actually not even sure what it means...

Nick Keighley

Dec 30, 2010, 2:25:12 AM

everyone seems to be treating good modularisation as purely a solution
to long compilation times (not a problem I'd have thought with a
couple of kloc...). How about it's simply good design! Modularistaion,
information hiding etc. etc. You should be able to understand what a
module does just by reading its header file. The messy details of how
it does it shouldn't concern you (until it breaks).
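
For instance (a throwaway sketch, not taken from the OP's code), the
header can carry the whole story while the representation stays
private:

/* stack.h -- everything a caller should need to know about the module */
#ifndef STACK_H
#define STACK_H

typedef struct stack stack;   /* opaque: the layout lives in stack.c */

stack *stack_new (int capacity);
void stack_push (stack *s, int value);
int stack_pop (stack *s);     /* caller must not pop an empty stack */
void stack_free (stack *s);

#endif /* STACK_H */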

Take a look at the Single Responsibility Principle. In fact start
reading up on software design in general.

Jorgen Grahn

Dec 31, 2010, 2:51:59 AM
On Wed, 2010-12-29, luser- -droog wrote:
> On Dec 29, 4:02 pm, Jorgen Grahn <grahn+n...@snipabacken.se> wrote:
>> ["Followup-To:" header set to comp.lang.c.]
>>
>> On Wed, 2010-12-29, luser- -droog wrote:
>>
>> ...
>>
>> > One question.
>> > When including a header from a location not in the compiler
>> > search path, is it better to pack the path into the #include
>> > directive, thus
>>
>> > #include <cairo/cairo.h>
>>
>> > or as a command-line option via the makefile, thus
>>
>> > CFLAGS=-I/usr/include/cairo
>>
>> > ?
>>
>> IMHO,
>>
>> #include <cairo/cairo.h>
>>
>> is the better one. It says the file is cairo/cairo.h, relative to some
>> base include path. One well-known such path is /usr/include/.
>
> Sadly, it doesn't work. cairo.h can't find its other files.

*Checking on my own system, which has this "cairo" thing installed*

Yeah, /usr/include/cairo/cairo.h contains lines like

#include <cairo-features.h>
#include <cairo-deprecated.h>

It's very unclear to me why they chose to do it that way -- it would
have worked if they had written

#include "cairo-features.h"
#include "cairo-deprecated.h"

because that causes the search to include the directory where
<cairo/cairo.h> was found. Perhaps that's a gcc-ism, and they need to
support some compiler which doesn't do it like that? Most other
libraries on my system either use (a) the second form above, or
(b) the equivalent of #include <cairo/cairo-features.h>.

Anyway, then they're really not intending you to say <cairo/cairo.h> and you
cannot do it. It's an unfortunate choice IMHO to *both* make that
decision and install in a non-standard location (/usr/include/cairo/)
because that forces them to ...

>> Alternatively, do what the cairo documentation says.
>
> Yeah. I saw all that pkg-config stuff at
> http://cairographics.org/FAQ/ .
> I'm not sure why, but I don't like it.
> Probably misunderstanding masquerading as fear.

... invent strange ways for your build system to find the right
compiler flags. That's what that pkg-config stuff is, nothing more.

% pkg-config --cflags --libs cairo
-D_REENTRANT -I/usr/include/cairo -I/usr/include/freetype2
-I/usr/include/directfb -I/usr/include/libpng12
-I/usr/include/pixman-1 -lcairo

luser- -droog

Dec 31, 2010, 4:45:59 AM
A productive 3-day weekend! Counting everything in systemdict,
it's got 181 operators.

http://code.google.com/p/xpost/downloads/list

The one I'd been dreading turned out pretty well, I think.
After I wrote it, I just kind of stared at it for a while.
But when I dared to try it, it worked!

/*
(15, 2)
(7, 2) 1
(3, 2) 1 1
(1, 2) 1 1 1
1 1 1 1
*/

int conv_rad (int num, int rad, char *s, int n) {
    char *vec =
        "0123456789"
        "ABCDEFGHIJKLM"
        "NOPQRSTUVWXYZ";
    int off;
    if (n == 0) return 0;
    if (num < rad) {
        *s = vec[num];
        return 1;
    }
    off = conv_rad(num/rad, rad, s, n);
    if ((off == n) || (off == -1)) return -1;
    s[off] = vec[num%rad];
    return off+1;
}

/* num radix string  cvrs  substring
   convert to string with radix
*/
void Ocvrs (Object *o) {
    int rad, n;
    char *s;

    if (type(o[0]) == realtype) o[0] = makeinteger(o[0].r);
    if (!typearg3(integer,integer,string)) error(typecheck, OP_cvrs);
    rad = o[1].i;
    if ((rad < 2) || (rad > 36)) error(rangecheck, OP_cvrs);
    s = (char *)VM(o[2].s);
    n = o[2].n;
    n = conv_rad(o[0].i, rad, s, n);
    if (n == -1) error(rangecheck, OP_cvrs);
    o[0] = substring(o[2], 0, n);
}
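
(If anyone wants to poke at conv_rad on its own, it only needs the
standard library; a throwaway harness like this is enough, whereas
Ocvrs of course needs the rest of the interpreter around it.)

#include <stdio.h>

int conv_rad (int num, int rad, char *s, int n);  /* as defined above */

int main (void) {
    char buf[32];
    int len = conv_rad(15, 2, buf, (int)sizeof buf);
    if (len > 0) printf("%.*s\n", len, buf);      /* prints 1111 */
    return 0;
}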

Robert Bonomi

Jan 1, 2011, 12:23:25 PM
In article <2e537a0c-2c87-4baa...@o4g2000yqd.googlegroups.com>,

luser- -droog <mij...@yahoo.com> wrote:
>
>One question.
>When including a header from a location not in the compiler
>search path, is it better to pack the path into the #include
>directive, thus
>
>#include <cairo/cairo.h>
>
>or as a command-line option via the makefile, thus
>
>CFLAGS=-I/usr/include/cairo

Authoritative Answer: "It depends." <grin>

If the include file is generally found in a 'standard' location (which the
use of the '<' ">" tokens presumes), and that location is a subdirectory
in the standard include location, by all means specify it in-line in the
#include directive.

If it is something that is _not_ in the standard include hierarchy, and/or
likely to be in _different_ places in different installations, then it
is *strongly* advised to either use the "quoted" full path-name in the
include directive, *OR* specify it via the -I compile-time directive.
In -either- method, be sure to have *LOUD* notice in the 'build the program'
directions that the setting _must_ be adjusted to reflect the user's
environment.

Keith Thompson

Jan 1, 2011, 6:46:08 PM
bon...@host122.r-bonomi.com (Robert Bonomi) writes:
> In article <2e537a0c-2c87-4baa...@o4g2000yqd.googlegroups.com>,
> luser- -droog <mij...@yahoo.com> wrote:
>>
>>One question.
>>When including a header from a location not in the compiler
>>search path, is it better to pack the path into the #include
>>directive, thus
>>
>>#include <cairo/cairo.h>
>>
>>or as a command-line option via the makefile, thus
>>
>>CFLAGS=-I/usr/include/cairo
>
> Authoritative Answer: "It depends." <grin>

Even more authoritative answer: Do whatever the library's documentation
recommends, so your code will be compatible with other code that uses
the same library.

In this particular case, <http://cairographics.org/FAQ/> says:

You will need to instruct the compiler where to find the cairo.h
file, (generally something like -I /usr/include/cairo), and tell
the linker where to find the cairo library and to link with it,
(often something like -L /usr/lib -lcairo).

which implies that you should use "-I/usr/include/cairo" and
"#include <cairo.h>".

[...]
