extremely slow parsing on large files


Arthur Blake

Sep 13, 2011, 8:23:35 PM
to CFEclipse Developers
I would like to take a stab at fixing this issue which I previously
complained about on the cfeclipse users list:

http://groups.google.com/group/cfeclipse-users/browse_thread/thread/4499f77e9bacd00d#

In the above posting I was complaining about CFEclipse 1.4.5, but I've
since discovered it's happening in all versions of CFEclipse (or at
least all versions that I've used...)

I believe the corresponding trac issue is:

http://trac.cfeclipse.org/ticket/152

Although the trac issue is assigned to markd, it has been open for 5
years and has not been touched for 2. I get the impression that
CFEclipse is languishing a bit since Denny started working on Railo,
and there don't appear to be any checkins on the trunk since Feb. and
that's a shame cause it's a great tool even though it's quite buggy in
some areas. So all I can do is roll up my sleeves and try and fix it
myself.

At any rate, the problem is driving me bonkers as I use cfeclipse
daily and against many large files (they would not be quite so large
if I had my say, but many were in place well before I inherited the
code base.)

If anyone has any pointers or ideas on fixing this issue, by all means
please let me know.

I am new to Eclipse plugin development, but I am a very strong Java
programmer (far stronger than CF in fact). I have already gotten the
project checked out and set up in Eclipse and I already fixed a couple
of really small, but annoying, other bugs.

Thanks

denstar

Sep 18, 2011, 4:29:57 AM
to cfecli...@googlegroups.com
Hi Arthur!

On Tue, Sep 13, 2011 at 6:23 PM, Arthur Blake wrote:
> I would like to take a stab at fixing this issue which I previously
> complained about on the cfeclipse users list:
>
> http://groups.google.com/group/cfeclipse-users/browse_thread/thread/4499f77e9bacd00d#
>
> In the above posting I was complaining about CFEclipse 1.4.5, but I've
> since discovered it's happening in all versions of CFEclipse (or at
> least all versions that I've used...)

Yeah, I reckon it stems from our sub-par parsing, really. Quite hard
to fix as it's deep in the guts.

That doesn't mean we ain't been trying, but be forewarned, this
probably won't be an easy thing to jump into. :)

I see somebody created a trac account for you-- thanks whoever did that!

> I believe the corresponding trac issue is:
>
> http://trac.cfeclipse.org/ticket/152
>
> Although the trac issue is assigned to markd, it has been open for 5
> years and has not been touched for 2.  I get the impression that
> CFEclipse is languishing a bit since Denny started working on Railo,
> and there don't appear to be any checkins on the trunk since Feb. and
> that's a shame cause it's a great tool even though it's quite buggy in
> some areas.  So all I can do is roll up my sleeves and try and fix it
> myself.

Woohoo! We're always happy to have more coders on board!

Maybe if we add some problems, we'll sucker some more heads in! ;)p

Fer reals though, there are a few aspects to what's going on here.

As for the commits, we've moved the source to GitHub. This is an
attempt to garner some more contributions, basically, and make the
source more accessible. I think we'll have to show consistent
activity to capture folks' cycles, if at all, but it has the potential
to help.
This kinda puts us in a bit of a bind ATM, as the ticket tracking
system is in trac, linked to commits and mylyn contexts and whatnot,
and not easily transferred. Also, the build process, including
checking out code, has changed. So a couple of major things to
address, just infrastructure wise. I only recently got my SVN sources
synced to what we put in github, and still have some commits I need to
push (funny side story-- I pushed a new build of the snippets plugin a
bit ago, only to realize that the stuff I pushed was old. Hrm. Ok,
that wasn't funny in the "ha ha" sense, but whatever :]).

As for the actual parsing problem(s), I've been working on it for more
than 2 years at this point, at a guess. This has included learning
about CS type stuff I'd managed to avoid prior, such as compilers and
language interpreters. I think ANTLR is the way to go here (the
cfscript ANTLR stuff is from OpenBD, and is quite nice. Beats the
hell out of what we were rocking before!), as it's pretty freaking
awesome.

See, this sub-optimal parsing extends its tentacles into many aspects
of CFE, like content proposals and whatnot. I've kind of been hacking
little improvements in, but the end game is something totally
different (a true CFML language parser, leverage-able from outside
eclipse), and it's a balancing act of how much time to spend to try to
patch something that's never going to really float, vs. working on
something that will fly.

We need to fly. CFML *needs* a true parser. We can sorta hack our
way around code coverage tools and whatnot, maybe leaning on engines
like Railo and OpenBD, but that's just not going to lift off in
Language Tools Land I'm afraid.

That said, there is some freakishly awesome stuff on the way, via a
tight bond with a running engine. But still, that can only go so far.

If you really want to try to get in on the action as far as that
ticket is concerned, I'd suggest picking up the ANTLR book and looking
at the grammar the OpenBD folks have come up with, as well as the
latest updates Terence has done to ANTLR, and start trying to wrap
your head around it. If you've done JavaCC, or other AST generating
type thingies, it should be a bit easier for you than the average Joe
(like me (Honest. I'm normal, I swear!)).

If you haven't done language lexing/parsing type stuff before, your
Java knowledge probably won't help very much. If you took CS courses,
that might-- conceptually at the least!

BUT-- there are millions and billions of places (well, maybe not that
many, but oodles) that can use improvement and where java knowledge
*will* be useful, and lots of them are what make CFE great. Even tho
the guts are an ancient parser, there's a whole lot of meat in there,
and things that can be done to make the average CFML coder's life a
bit easier. And Eclipse plugin development is almost a language of
its own, so you won't be "wasting" any time trying to bend it to your
will. I've deleted more code than I've committed, but the journey is
burned into my neural pathways, and feeds the stuff that makes
it past dev/null, even if not directly (sometimes it is directly,
too).

Shit, I don't mean to scare you off, which is sorta how this comes
across. The problem you're trying to fix is a hard nut to crack
though, and I want to give you a bit of background to keep in your
subconscious as you're looking over the base.

I think the idea of doing small bits here and there, getting a feel
for the flow of [CF]Eclipse, is perfecto. If it's not too much to
ask, can you pull the latest from github, copy over your changes, and
do a pull request? github.com/cfeclipse

The empowerment you'll feel getting some stuff in, and then out in a
build for the CFML masses, will go a long ways towards keeping the
fire burning!

> At any rate, the problem is driving me bonkers as I use cfeclipse
> daily and against many large files (they would not be quite so large
> if I had my say, but many were in place well before I inherited the
> code base.)
>
> If anyone has any pointers or ideas on fixing this issue, by all means
> please let me know.

...

It's come up a few times on the list, and several people have reported
that changing the parser settings has greatly improved things. Take a
gander through the list archive and see what settings to change, or
else do a bit of experimentation by turning off various things like
variable parsing, etc., until you see where the bottleneck is. Then
go rummaging through the code to see why that specific bit is killing
performance.
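
A minimal sketch of that turn-one-thing-off-at-a-time experiment, in
Java. Everything named here is hypothetical: the Feature names are
placeholders rather than CFEclipse's real parser settings, and the
parser is passed in as a callback because this is not the plugin's
actual API. It times one run with everything on, then one run per
feature with just that feature disabled.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;

class ParserBisect {
    /** Hypothetical optional parser features; NOT CFEclipse's real setting names. */
    enum Feature { VARIABLE_PARSING, CFSCRIPT_PARSING, PROBLEM_MARKERS }

    /** Written by the stand-in parser so its loop has an observable result. */
    static long sink;

    /**
     * Runs the parser once with every feature on, then once per feature with
     * only that feature switched off. Returns elapsed nanoseconds per run,
     * keyed by "ALL_ON" or "without <feature>", in the order the runs happened.
     */
    static Map<String, Long> bisect(String source,
                                    BiConsumer<String, Set<Feature>> parser) {
        Map<String, Long> timings = new LinkedHashMap<>();
        timings.put("ALL_ON", time(source, EnumSet.allOf(Feature.class), parser));
        for (Feature f : Feature.values()) {
            Set<Feature> enabled = EnumSet.allOf(Feature.class);
            enabled.remove(f);
            timings.put("without " + f, time(source, enabled, parser));
        }
        return timings;
    }

    private static long time(String source, Set<Feature> enabled,
                             BiConsumer<String, Set<Feature>> parser) {
        long start = System.nanoTime();
        parser.accept(source, enabled);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Build a large fake CFML document.
        StringBuilder big = new StringBuilder();
        for (int i = 0; i < 20_000; i++) {
            big.append("<cfset x").append(i).append(" = ").append(i).append(">\n");
        }

        // Stand-in parser: pretends VARIABLE_PARSING is the expensive part by
        // rescanning the text many times when that feature is enabled.
        BiConsumer<String, Set<Feature>> fakeParser = (src, on) -> {
            int passes = on.contains(Feature.VARIABLE_PARSING) ? 200 : 1;
            long total = 0;
            for (int p = 0; p < passes; p++) {
                for (int i = 0; i < src.length(); i++) {
                    total += src.charAt(i);
                }
            }
            sink = total;
        };

        bisect(big.toString(), fakeParser).forEach((label, nanos) ->
                System.out.println(label + ": " + (nanos / 1_000_000) + " ms"));
    }
}
```

Whichever "without ..." run drops furthest below ALL_ON points at the
feature worth reading the code for. Single runs are noisy (JIT warm-up,
GC), so repeating each run a few times would give more trustworthy
numbers.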

You may find that it's a string that once you start to pull, leads
into interesting mazes ("dance magic dance"- David Bowie)... or maybe
you'll be like "Ah ha! If I just use a bitwise or here, I can improve
this routine 336%!". I don't /really/ know, even when I sound like I
do, which I apologize for doing, as I try to avoid it but love to
converse about this stuff, so forget to make a point of pointing out.
<-- should read "I just got involved because I wanted to make CFE do
my bidding, the rest is all semi-educated guesses and experimentation"
:)

Anyways, feel free to hit me up directly if you've got questions, and
I'll do my best to answer 'em lucidly.

:DeN

--
I left Delhi, in 1971, shortly after Collective Choice and Social
Welfare was published in 1970.
Amartya Sen

Arthur Blake

Sep 19, 2011, 11:02:42 AM
to CFEclipse Developers
I'm glad someone (especially you) replied!

I was beginning to think that your old girlfriend CFEclipse was long
since dumped and abandoned for that sexy new Dojo...

My trac account actually already existed from some earlier bugs I had
reported-- I just forgot about it and Jim Priest reset the password
for me.

I am aware that the issue is not at all a simple one to solve,
especially after browsing through the parser code for a while. So I'm
not under any illusions that this might be a quick simple fix.

I do have the ANTLR book from a previous project (though I ended up not
actually using it), and I do have a little experience with parsers,
compilers, etc., as well as a formal CS education.

I also picked up the Eclipse plugin book that was recommended on the
wiki, and have been working my way through that one. I think that
could pay huge dividends whether or not I actually use much of the
knowledge for CFEclipse.

I think maybe starting over with a new parser (perhaps borrowing the
Railo parser) might be a good way to go. I'm saying this knowing
nothing about the actual Railo parser and also realizing that doing
something like that could be a monumental task.

So, I may not succeed in what I set out to do, but I'm going to gnaw
on it a little bit in my spare time. Nothing bad could come out of
that (right?)

Glad to know the source is now on github. I didn't see a single
reference to that on the wiki (maybe I missed it... or it was probably
posted here on the google group some time ago and I've not been
keeping up with the posts...) I'll grab the code from github and
maybe do a pull request later with some of my tiny little fixes.

Thanks! I will post more information if and when I actually get to
the place where I have concrete suggestions for changing the parser.

On Sep 18, 4:29 am, denstar <valliants...@gmail.com> wrote:
> Hi Arthur!
>
> On Tue, Sep 13, 2011 at 6:23 PM, Arthur Blake wrote:
> > I would like to take a stab at fixing this issue which I previously
> > complained about on the cfeclipse users list:
>
> >http://groups.google.com/group/cfeclipse-users/browse_thread/thread/4...

Peter Boughton

Sep 19, 2011, 11:56:02 AM
to cfecli...@googlegroups.com
Oooh, someone with strong Java skills AND existing parser experience,
can we tie a rope around them so they don't escape? ;)

I have recently been poking about with the stuff I was previously
doing in this area, and considering reviving that project.
It was taking a different approach (standalone rather than an Eclipse
plugin, generating CFML/JS/Java code), so not really directly
relevant/helpful, but the ultimate goals are the same.

So yeah, just to say I'm also up for bouncing ideas against and
discussing stuff and whatever else. :)
