I was trying to use racc the other day, and couldn't get even the most
basic sample started. Does anybody know of any fully working sample
grammar files for it?
The calculator that is mentioned in the docs would be fine, if it had
the next_token, on_error and parse implemented. My main problem is that
I'm not sure what legal values next_token can return.
It would be great if there was some fully-working sample code along with
the docs, instead of just the example snippets. I even tried
understanding some of the Japanese results that Google found, but to no
avail.
Ben
I've just been using LittleLexer, and it's very simple to understand. Maybe
you'll have more luck with that.
http://rubyforge.org/projects/littlelexer/
--
spooq
On Tuesday 15 March 2005 15:03, Ben Giddings wrote:
> Hey all,
>
> I was trying to use racc the other day, and couldn't get even the most
> basic sample started. Does anybody know of any fully working sample
> grammar files for it?
>
Can I interest you in one of the Coco/R versions?
Either one that generates extensions (semantics and actions in C) (contact
me), or Ryan Davis's pure Ruby version
(http://www.zenspider.com/ZSS/Products/CocoR/index.html)
Regards,
--
-mark. (probertm at acm dot org)
There were some suggestions in the thread starting at
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/110423
(I offered a RACC parser for a subset of C++, just numerical
expressions, basically.)
I've got a grammar for a tool-specific export file format, about 560
lines. Let me know if you want to see it. It's not useful to anyone
without this (expensive) tool, but it's real working code.
Steve
See below.
>
> The calculator that is mentioned in the docs would be fine, if it had
> the next_token, on_error and parse implemented. My main problem is that
> I'm not sure what legal values next_token can return.
>
Depends on your grammar, but a token should be a two element array,
whose first element describes the type of the token (usually a symbol -
like :IDENTIFIER) and the second element is the value of the token
(usually a string - like 'x'). Any token whose type (first element) is
false signals end of input.
You should write next_token() so it always returns [false, false]
(second element could be anything) when end of input is reached. (If
next_token() returns nil you will cause RACC to raise an error.)
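To make that concrete, here is a minimal sketch of a tokenizer that follows the [type, value] convention described above. It is plain Ruby with no racc dependency, so it only shows the shape of the values; the class name CalcLexer and the :NUMBER token type are my own assumptions for illustration, and the token types have to match whatever names your grammar actually uses. In a real racc grammar file a next_token method like this would normally live in the "---- inner" section.

```ruby
require 'strscan'

# Hypothetical tokenizer for a calculator-style grammar.
# CalcLexer and :NUMBER are illustrative names, not part of racc.
class CalcLexer
  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  # Returns the next [type, value] pair, or [false, false] at end of input.
  def next_token
    @scanner.skip(/\s+/)                 # ignore whitespace between tokens
    return [false, false] if @scanner.eos?

    if (digits = @scanner.scan(/\d+/))
      [:NUMBER, digits.to_i]             # type is a symbol, value is the number
    else
      char = @scanner.getch
      [char, char]                       # operators: the character is its own type
    end
  end
end

lexer = CalcLexer.new("1 + 23 * 4")
tokens = []
loop do
  token = lexer.next_token
  tokens << token
  break unless token[0]                  # a false type marks end of input
end
p tokens
```

Note that it keeps returning [false, false] if it is called again after the input is exhausted, rather than nil.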
> It would be great if there was some fully-working sample code along with
> the docs, instead of just the example snippets. I even tried
> understanding some of the Japanese results that Google found, but to no
> avail.
>
There is a link on my website (http://cmills.freeshell.org/) to a
presentation I gave at the PDX user group on RACC. It includes a
PowerPoint presentation on RACC and a bunch of small working example
programs - the grammar for some is only a few lines long.
-Charlie
Thanks Mark (and others who replied to my request).
I'll tell you what I'm looking to do, and maybe you can tell me what
would work.
I've become annoyed with the output of "make". I've recently been
compiling a set of things that takes a really long time to do. Make
enters about 500 directories, and gcc is invoked at least 5000 times.
The output from this process is nearly useless.
I'm thinking of making something to which I can pipe the output of this
process, so that I can make sense of what happened.
To do that, I want to parse this:
gcc -D__DEBUG__ -I../include -I/foo/bar/path -funroll-loops -o foo foo.c
to this:
#<GCCInvocation:0x4029a1f4 @lang_opts=["unroll-loops"],
@includes=["../include", "/path/to/foo"], @defines=["DEBUG"],
@sources=["foo.c"], @output="foo">
In other words, I want to make an object with the parsed commandline.
Once I can do that, I could hopefully make a GUI that printed something
simple for each line of output (like a dot), but if you wanted more
details you could expand the dot, and see the information in a clearly
presented way, like maybe:
gcc building foo:
sources: foo.c
defines: DEBUG
include paths: ../include, /path/to/foo
language options: unroll-loops
I chose racc because I pretty much understand bison, but I'm willing to
use anything that gets me from A to B relatively easily (as long as it
can keep up with make's output)
Ben
> To do that, I want to parse this:
>
> gcc -D__DEBUG__ -I../include -I/foo/bar/path -funroll-loops -o foo foo.c
>
> to this:
>
> #<GCCInvocation:0x4029a1f4 @lang_opts=["unroll-loops"],
> @includes=["../include", "/path/to/foo"], @defines=["DEBUG"],
> @sources=["foo.c"], @output="foo">
>
> In other words, I want to make an object with the parsed commandline.
Perhaps it is sufficient to scan the line a couple of times with some
regular expressions? e.g.:
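class GCCInvocation
  def initialize(line)
    @includes = []
    @lang_opts = []
    @output = ""
    @defines = []
    @sources = []
    # Prepend a space so the first option also matches " -X".
    line = " " + line
    # Each block records the captured value and returns "" so that
    # the matched text is removed from the line.
    line.gsub!(/ -I ?(\S+)/) { @includes << $1; "" }
    line.gsub!(/ -f ?(\S+)/) { @lang_opts << $1; "" }
    line.gsub!(/ -o ?(\S+)/) { @output = $1; "" }
    line.gsub!(/ -D ?(\S+)/) { @defines << $1; "" }
    # Anything left that does not start with "-" is treated as a source.
    line.gsub!(/ ([^- ]\S*)/) { @sources << $1; "" }
    # Whatever remains (unrecognised options) is kept for inspection.
    @remainder = line.squeeze(' ')
  end
end

while line = gets
  case line
  when /^gcc (.*)$/
    p GCCInvocation.new($1)
  end
end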
Yeah, that would work (with tweaked regexps), but I would imagine it
would be less efficient. Each regexp is essentially a complex parser,
and so in your example above, you have five parsers per gcc line. If I
could do it in one parser, it would probably be quicker. I also think
it might be easier to avoid mistakes due to complex regexps. I might
try it though, it might be enough to do it the simple way.
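For comparison, the "one pass" idea could be sketched without racc at all: split the line into words once and dispatch on each word. This is only a rough sketch under my own assumptions. The helper attached_or_next and the @others list are made up for illustration, only the -I, -D, -o and -f options from the example line are handled, and Shellwords (from the standard library) is used just to get the word splitting right for quoted arguments.

```ruby
require 'shellwords'

# Hypothetical single-pass variant: every word is looked at exactly once.
class GCCInvocation
  attr_reader :includes, :lang_opts, :defines, :sources, :output, :others

  def initialize(line)
    @includes, @lang_opts, @defines, @sources, @others = [], [], [], [], []
    @output = nil
    args = Shellwords.split(line)
    until args.empty?
      arg = args.shift
      case arg
      when /\A-I(.*)\z/ then @includes << attached_or_next($1, args)
      when /\A-D(.*)\z/ then @defines << attached_or_next($1, args)
      when /\A-o(.*)\z/ then @output = attached_or_next($1, args)
      when /\A-f(.+)\z/ then @lang_opts << $1
      when /\A-/        then @others << arg   # options this sketch does not know
      else                   @sources << arg  # anything else is treated as a source
      end
    end
  end

  private

  # "-Ifoo" carries its value attached; "-I foo" takes it from the next word.
  def attached_or_next(value, args)
    value.empty? ? args.shift : value
  end
end

line = "gcc -D__DEBUG__ -I../include -I/foo/bar/path -funroll-loops -o foo foo.c"
inv = GCCInvocation.new(line.sub(/\Agcc\s+/, ""))
p inv
```

Whether this is actually faster than the gsub! version is something I have not measured.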
Ben
> Yeah, that would work (with tweaked regexps), but I would imagine it
> would be less efficient. Each regexp is essentially a complex parser,
> and so in your example above, you have five parsers per gcc line. If I
> could do it in one parser, it would probably be quicker. I also think
> it might be easier to avoid mistakes due to complex regexps. I might
> try it though, it might be enough to do it the simple way.
If you use a regexp-based scanner as input to your parser, you may
actually end up with more regexp matching!
I disagree with the statement that a regexp is essentially a complex
parser. As I understand it, parsers like racc describe a more complicated
language than that described by regular expressions.
I've found that doing things with regular expressions in Ruby tends to be
much faster than anything involving interpreting Ruby code and creating
lots of objects.
Efficiency shouldn't be too much of a concern here anyway - the input data
rate is limited by how quickly your machine can do compilations -
certainly much slower than a few gsub!s on some very small input strings.
Good luck!
Jonathan
True enough. I wasn't planning on using a regexp based scanner, but I'm
not actually sure how some of these things are constructed internally.
> I disagree with the statement that a regexp is essentially a complex
> parser. As I understand it, parsers like racc describe a more complicated
> language than that described by regular expressions.
My understanding is that many regexp engines have complex parsers behind
the scenes. In fact, with lookahead and such, regular expression syntax
is much more complex than what bison can handle. I'm no regexp expert
though, so I may be wrong about that.
> Efficiency shouldn't be too much of a concern here anyway - the input data
> rate is limited by how quickly your machine can do compilations -
> certainly much slower than a few gsub!s on some very small input strings.
Tis true. GCC can spit out a lot of output when it's dealing with
small files, or doing dependency checks, but I imagine that's still slow
compared to Ruby's regexp parsing speed. I may be engaging in premature
optimization...
Ben
optparser = OptionParser.new
optparser.on("-D") do |arg|
gcc_inv.debug << arg
end
The above is sloppy and mostly wrong, but I don't have Ruby on this
machine to do it properly. Just an alternative idea. Especially since
you won't have to write a new regexp every time you want to add an
option to your class.
> optparser = OptionParser.new
> optparser.on("-D") do |arg|
> gcc_inv.debug << arg
> end
>
> The above is sloppy and mostly wrong, but I don't have Ruby on this
> machine to do it properly. Just an alternative idea. Especially since
> you won't have to write a new regexp every time you want to add an
> option to your class.
I suppose the problem with this approach is that optparse will raise an
exception for unspecified options. So, to get it to work correctly you'd
need to know all the possible options to gcc (and whether or not they take
an argument).
It'd be a really neat solution if this can be resolved!
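Here is a rough sketch of what the optparse approach might look like for the handful of options in the example line, and of the limitation just described. The function name parse_gcc_args and the result hash are my own invention for illustration. It only declares -D, -I, -f and -o, so any other gcc flag still raises OptionParser::InvalidOption; this does not solve that problem, it just shows it.

```ruby
require 'optparse'

# Hypothetical helper: declares only the four options from the example line.
def parse_gcc_args(args)
  result = { defines: [], includes: [], lang_opts: [], output: nil, sources: [] }
  parser = OptionParser.new
  # "-DNAME" style declares a short option with a mandatory argument.
  parser.on("-DNAME")   { |name| result[:defines] << name }
  parser.on("-IPATH")   { |path| result[:includes] << path }
  parser.on("-fOPTION") { |opt|  result[:lang_opts] << opt }
  parser.on("-oFILE")   { |file| result[:output] = file }
  # parse returns the arguments that were not consumed as options.
  result[:sources] = parser.parse(args)
  result
end

line = "gcc -D__DEBUG__ -I../include -I/foo/bar/path -funroll-loops -o foo foo.c"
parsed = parse_gcc_args(line.split.drop(1))   # drop the leading "gcc"
p parsed

begin
  parse_gcc_args(%w[-Wall foo.c])             # -Wall was never declared
rescue OptionParser::InvalidOption => e
  puts "rejected: #{e.message}"
end
```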
If you go into the sample or test directories in the source tarball for
racc there are a bunch of fully worked out examples. More so in the test
than sample though.
Charlie