Correction, that's 16 bytes, not 8.

Jonathan Revusky

Aug 11, 2008, 9:31:43 PM

to kawadd...@googlegroups.com

Actually, there are 4 int fields in Token.java for the location info,
beginLine, endLine, beginColumn, endColumn. That's 16 bytes, not 8.
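
To make the arithmetic concrete, here is a minimal sketch. The class name
TokenLocationSketch and the helper locationBytes() are made up for
illustration; only the four field names come from Token.java. It counts
the raw field payload only and ignores object header and alignment
padding, which vary by JVM.

```java
// Hypothetical stand-in for the location fields of a generated Token class.
// Only the four field names come from the thread; the rest is illustrative.
class TokenLocationSketch {
    int beginLine, endLine, beginColumn, endColumn;

    // Raw size of the four int fields, ignoring object header and padding.
    static int locationBytes() {
        return 4 * Integer.BYTES; // Integer.BYTES is 4 on every JVM
    }

    public static void main(String[] args) {
        System.out.println(locationBytes() + " bytes of location info per token");
    }
}
```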

Still, I think that throwing away location info will almost always be a
very bad tradeoff.

JR

brianegge

Aug 12, 2008, 8:58:28 AM

to KawaDD Development

I agree with removing the option to turn off column information. I
think it's pretty rare to need to optimize a parser for memory or
speed these days. For memory, I find it easier to write memory-intensive
apps in Java than in C these days. A typical JVM setting for me is
"-d64 -Xmx12g". In C my problem is that I do a lot of pointer arithmetic
and use 32-bit values to store intermediate values. The programs will
compile fine in 64-bit mode, but will fail on data sets larger than 1GB.
This can occur in Java as well, but is less common.

For speed concerns, I think these days most people will try to run
multiple parsers in parallel rather than get a single parser to run
faster. With every performance tweak turned on, you might get 2x the
speed. My apps today run on a 16 core machine, and I expect they will
be on a 100+ core before their end of life.
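
As a rough sketch of what "multiple parsers in parallel" can look like in
Java: one parser instance per input, submitted to a thread pool.
ToyParser and parseAll() are hypothetical stand-ins, not generated code;
the toy "parse" just counts whitespace-separated tokens. The only
assumption that matters is that the parser keeps all of its state in
instance fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelParseSketch {
    // Hypothetical stand-in for a generated parser: all state is per-instance.
    static final class ToyParser {
        private final String input;
        ToyParser(String input) { this.input = input; }

        // Toy "parse": counts whitespace-separated tokens.
        int countTokens() {
            String trimmed = input.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    // Parses every input on its own parser instance, one task per input.
    static List<Integer> parseAll(List<String> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String in : inputs) {
                Callable<Integer> task = () -> new ToyParser(in).countTokens();
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // results come back in input order
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAll(List.of("a b c", "x y")));
    }
}
```

Nothing here makes any single parse faster; the throughput gain comes
entirely from the number of cores available.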

Jonathan Revusky

Aug 13, 2008, 12:06:08 AM

to kawadd...@googlegroups.com

On Tue, Aug 12, 2008 at 5:58 AM, brianegge <bria...@gmail.com> wrote:
>
> I agree with removing the option to turn off column information. I
> think it's pretty rare to need to optimize a parser for memory or
> speed these days.

I would say that sensible people only care about performance to the
nearest order of magnitude.

Anyway, I removed the KEEP_LINE_COLUMN option. Anybody who really very
badly wanted to remove the location info could edit the templates, but
I don't think it's a very sensible option to be offering people. The
one good thing about it though was that it had the right default,
which was to keep the location info. The STATIC option (which I
removed a while ago) was on by default which was just terrible. It
meant that you generated non-thread-safe code by default.
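
To illustrate why static state is a problem, here is a small sketch.
StaticLexer and InstanceLexer are invented for this example and are not
what the generator actually emits. The clobbering is shown with two users
interleaved on a single thread so the result is deterministic; with real
threads the same sharing shows up as a race.

```java
// Illustrative only: neither class is real generator output.
class StaticStateSketch {
    // Mimics a STATIC-style lexer: every caller shares one input and position.
    static final class StaticLexer {
        private static String input;
        private static int pos;
        static void reInit(String s) { input = s; pos = 0; }
        static char next() { return input.charAt(pos++); }
    }

    // Instance-style lexer: each object owns its input and position.
    static final class InstanceLexer {
        private final String input;
        private int pos;
        InstanceLexer(String s) { this.input = s; }
        char next() { return input.charAt(pos++); }
    }

    // What the first user reads from "ab" when a second user starts on "xy".
    static String firstUserSeesWithStatic() {
        StaticLexer.reInit("ab");
        char first = StaticLexer.next();
        StaticLexer.reInit("xy");          // second user overwrites the shared state
        char second = StaticLexer.next();  // first user's read now comes from "xy"
        return "" + first + second;
    }

    // Same interleaving, but each user has its own lexer object.
    static String firstUserSeesWithInstances() {
        InstanceLexer a = new InstanceLexer("ab");
        InstanceLexer b = new InstanceLexer("xy");
        char first = a.next();
        b.next();                          // second user reads from its own lexer
        char second = a.next();
        return "" + first + second;
    }

    public static void main(String[] args) {
        System.out.println("static:   " + firstUserSeesWithStatic());
        System.out.println("instance: " + firstUserSeesWithInstances());
    }
}
```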

There are various configuration options that don't make a whole lot of
sense. Some of them are basically the result of laziness on the part
of the implementor. For example, COMMON_TOKEN_ACTION makes hardly any
sense. When the generator reads in the grammar, it can see whether the
person put a CommonTokenAction(Token t) method in his TOKEN_MGR_DECLS
section and set the option itself on that basis.

The same goes for these options like NODE_SCOPE_HOOK. I mean, when you
read in the file, you can see whether the parser contains a method
called jjtreeOpenNodeScope() and/or a jjtreeCloseNodeScope() and if it
does, you simply add in the call to that in the appropriate places.
AFAICS, there is no need for a separate configuration option, since
the presence or absence of the method of that name can be checked
perfectly well. So you can expect those options to melt away.
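
A minimal sketch of the idea of deriving these settings from the grammar
instead of asking for them. Everything here is hypothetical:
HookDetectionSketch, declaresMethod() and deriveOptions() are names made
up for illustration, and the regex scan over raw source text is only a
stand-in for inspecting the code that the tool has already parsed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class HookDetectionSketch {
    // True if userCode appears to declare (not merely call) a method `name`.
    // Crude heuristic: name, parameter list, optional throws clause, then '{'.
    static boolean declaresMethod(String userCode, String name) {
        Pattern p = Pattern.compile(
                "\\b" + Pattern.quote(name)
                + "\\s*\\([^)]*\\)\\s*(throws\\s+[\\w.,\\s]+)?\\{");
        return p.matcher(userCode).find();
    }

    // Derives the two settings from what the user actually wrote.
    static Map<String, Boolean> deriveOptions(String parserCode, String tokenMgrDecls) {
        Map<String, Boolean> options = new LinkedHashMap<>();
        options.put("COMMON_TOKEN_ACTION",
                declaresMethod(tokenMgrDecls, "CommonTokenAction"));
        options.put("NODE_SCOPE_HOOK",
                declaresMethod(parserCode, "jjtreeOpenNodeScope")
                || declaresMethod(parserCode, "jjtreeCloseNodeScope"));
        return options;
    }

    public static void main(String[] args) {
        String parserCode = "void jjtreeOpenNodeScope(Node n) { }";
        String tokenMgrDecls = "void CommonTokenAction(Token t) { count++; }";
        System.out.println(deriveOptions(parserCode, tokenMgrDecls));
    }
}
```

With something along these lines, leaving the method out is the same as
turning the option off, so there is no separate setting to get out of
sync with the code.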

The other very odd thing about the jjtree scheme is that it provides
no way to include custom code in the generated Node files besides
post-editing the generated files. This is in complete contrast to
XXXParser.java and XXXParserTokenManager.java, where you can put
your custom code in specific sections of the grammar file.

So I intend to add the ability to have sections analogous to
TOKEN_MGR_DECLS and PARSER_BEGIN...PARSER_END for the other files that
get generated, so that, at least with best practice, you don't
post-edit generated files.

Of course, once that is done, there is a general need for
documentation of these various things, but once I have something
together, I think it will be time to roll up a release and start
announcing it in some of the typical places.