The Reg RubyForge project: http://rubyforge.org/projects/reg/
The Reg Tarball:
http://rubyforge.org/frs/download.php/4199/reg-0.4.0.tar.bz2
Reg is best thought of in analogy to regular expressions; Regexps are
special data structures for matching Strings; Regs are special data
structures for matching ANY type of ruby data (Strings included, using
Regexps).
This table compares the syntax of reg and regexp for various constructs.
Keep in mind that all Regs are ordinary ruby expressions. The special
syntax is achieved by overriding ruby operators.
These abbreviations are used:
re,re1,re2 represent arbitrary regexp subexpressions,
r,r1,r2 represent arbitrary reg subexpressions
s,t represent any single character (perhaps appropriately escaped, if
the char is magical)
reg              regexp              #description
+[r1,r2,r3]      /re1re2re3/         #sequence
-[r1,r2]         (re1re2)            #subsequence
r.lit            \re                 #escaping a magical
regproc{r}       #{re}               #dynamic inclusion
r1|r2 or :OR     (re1|re2) or [st]   #alternation
~r               [^s]                #negation (for scalar r and s)
r+0              re*                 #zero or more matches
r+1              re+                 #one or more matches
r-1              re?                 #zero or one matches
r*n              re{n}               #exactly n matches
r*(n..m)         re{n,m}             #at least n, at most m matches
r+n              re{n,}              #at least n matches
r-m              re{,m}              #at most m matches
OB               .                   #a single item
OBS              .*                  #zero or more items
BR[1,2]          \1,\2               #backreference ***
r>>x or sub      sub,gsub            #search and replace ***
Here are features of reg that don't have an equivalent in regexp:
r.la                  #lookahead ***
~-[]                  #subsequence negation w/lookahead ***
& or :AND             #all alternatives match
^ or :XOR             #exactly one of alternatives matches
+{r1=>r2}             #hash matcher
-{name=>r}            #object matcher
obj.reg               #turn any ruby object into a reg that matches if obj.=== succeeds
/re/.sym              #a symbol regex
proceq(klass){rcode}  #a proc{} that responds to === by invoking the proc's call
OBS as un-anchor      #opposite of ^ and $ when placed at edges of a reg array (kinda cheesy)
name=r                #named subexpressions
recursive matches via regvariables&regconstants ***
*** = not implemented yet.
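To make the "ordinary ruby expressions" point concrete, here is a minimal sketch of how overriding operators can turn plain objects into composable matchers. It is not Reg's implementation: the Matcher class and the m helper are hypothetical names invented for this illustration, and only alternation, conjunction and negation of scalar matchers are covered.

```ruby
# Hypothetical sketch, not Reg's implementation: a tiny matcher wrapper
# whose operators build bigger matchers out of anything that answers ===.
class Matcher
  def initialize(&test)
    @test = test
  end

  # === is what case/when and Array#grep call, so a Matcher plugs into both.
  def ===(other)
    @test.call(other)
  end

  # Alternation: matches if either side matches.
  def |(other)
    Matcher.new { |x| self === x || other === x }
  end

  # Conjunction: matches only if both sides match.
  def &(other)
    Matcher.new { |x| self === x && other === x }
  end

  # Negation: matches exactly when the wrapped matcher does not.
  def ~
    Matcher.new { |x| !(self === x) }
  end
end

# Hypothetical helper playing the role that obj.reg plays in the table above.
def m(obj)
  Matcher.new { |x| obj === x }
end

number_or_name = m(Integer) | m(/\A[a-z]+\z/)
not_nil        = ~m(nil)

p number_or_name === 42         # true
p number_or_name === "reg"      # true
p number_or_name === "Reg 0.4"  # false
p not_nil === nil               # false
p [1, nil, "x"].grep(not_nil)   # [1, "x"]
```

Because Ruby parses the operator expressions itself, the matcher never needs a parser of its own; each operator call simply returns a new matcher object.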
Reg is kind of hard to wrap your mind around, so here are some
examples:
Matches array containing exactly 2 elements; 1st is another array, 2nd
is integer:
+[Array,Integer]
Like above, but 1st is array of arrays of symbols:
+[+[+[Symbol.reg+0]+0],Integer]
Matches array of at least 3 consecutive symbols and nothing else:
+[Symbol.reg+3]
Matches array with at least 3 symbols in it somewhere:
+[OBS, Symbol.reg+3, OBS]
Matches array of at most 6 strings starting with 'g':
+[/^g/-6] #no .reg necessary for regexp
Matches array of between 5 and 9 hashes containing a key :k pointing to
something non-nil:
+[ +{:k=>~nil.reg}*(5..9) ]
Matches an object with Integer instance variable @k and property (ie
method) foobar that returns a string with 'baz' somewhere in it:
-{:@k=>Integer, :foobar=>/baz/}
Matches array of 6 hashes with 6 as a value of at least one key,
followed by 18 objects with an attribute @s which is a String:
+[ +{OB=>6}*6, -{:@s=>String}*18 ]
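For comparison, here is roughly what two of the simpler patterns above ask for when written by hand in plain Ruby, with no Reg involved. The helper names match_exactly? and run_of? are hypothetical, and this sketch does no backtracking, so it covers only these two shapes.

```ruby
# Plain-Ruby sketch (no Reg required). Both helper names are hypothetical.

# Roughly what +[Array, Integer] asks for: same length, and each pattern
# accepts the item in the same position via ===.
def match_exactly?(patterns, array)
  array.is_a?(Array) &&
    array.size == patterns.size &&
    patterns.zip(array).all? { |pattern, item| pattern === item }
end

# Roughly what +[OBS, Symbol.reg+3, OBS] asks for: a run of at least
# `count` consecutive items accepted by `pattern`, anywhere in the array.
def run_of?(pattern, count, array)
  array.each_cons(count).any? { |window| window.all? { |item| pattern === item } }
end

p match_exactly?([Array, Integer], [[1, 2], 3])     # true
p match_exactly?([Array, Integer], [[1, 2], 3, 4])  # false
p run_of?(Symbol, 3, [1, :a, :b, :c, "x"])          # true
p run_of?(Symbol, 3, [:a, 1, :b, 2, :c])            # false
```

Each new pattern shape needs another hand-written helper here, which is the repetition a general matcher is meant to remove.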
Status:
Some highly nested vector reg constructions still don't work quite
right. (For examples, search on eat_unworking in regtest.rb.) A number
of features are unimplemented at this point, most notably
backreferences and substitutions.
Sorry for the shouting.
Just curious: what kind of needs led you to develop this?
Phil
In article <1114309915.9...@g14g2000cwa.googlegroups.com>,
> I would like to announce the first version, 0.4.0, of Reg, the Ruby
> Extended Grammar.
This is like too good/weird to be true.
--Peter
--
There's neither heaven nor hell, save what we grant ourselves.
There's neither fairness nor justice, save what we grant each other.
A long time ago, I wanted a better regexp than regexp. My search ended
when I found an extremely obscure language called gema (the
general-purpose matcher). I'm guessing that I'm the only person to ever
take gema seriously. For a time, I became the world's foremost expert on
gema. Gema is designed around the idea that all computation can be
modeled as pattern and replacement. Everything in gema is pattern and
replacement... essentially everything is done with regexps. I was
fascinated with the idea. This seemed to me to be a much better model
for most programming problems, which typically involve reading input,
transforming it in some way, and writing it out again. Conventional
languages (starting with fortran, and including ruby) are based around
the idea of a program being a long string of formulas. This is great
for math-heavy stuff, but most programming is really about data
manipulation, not math.
But there was trouble in paradise. Gema was wonderful, but weird. The
syntax was cranky. The author had issued one version long ago then
disappeared. Gema code was hard to read, in part because
everythingwasalljammedtogether.
Ifyouinsertspacestomakeitmorereadable,itchangesthesemanticsofyourprogram.
There were strange problems that I never tracked down or fully
characterized. The only data-type was the string. You had to be an
expert at avoiding the invisible pitfalls of the language to get
anywhere. But I did get surprisingly far. I managed to coax gema into
becoming a true parser, and parsing a toy language.
I wanted to write a compiler in gema. Yes, the whole compiler. And
parsing the toy language was already straining its capabilities. It
wasn't the data model; I actually figured out how to model all other
data types using strings. A match-and-replace language is actually much
better suited to most compiler tasks than an algol-like formula
language.
Eventually, I abandoned gema, determined to recreate its glory in a
cleaner form. It was at about this time that I discovered ruby. The
successor to gema was ruma, the ruby matcher. Ruma would be basically
just like gema, but without the problems. Whitespace allowed between
tokens. Proper quotation mechanisms, including nested quotes. And the
language used in the actions (replacements) would be full ruby, instead
of gema's inadequate and crude action language.
Ruma got maybe halfway done... quite a ways, really. As part of ruma, I
needed a ruby lexer to make sense of the actions. This turned out to be
quite a lot harder than I had anticipated; I'm still working on that
lexer.
After grinding away at the lexer for a while, dreaming of ruma in the
meantime, I had a brainstorm. Ruma, like gema, was to be a string-based
language. It only operated on strings. In gema, that was just fine
because everything was strings and you just had to live with that. But
ruby has all these other types, a real type system. Wouldn't it be nice
to have those sophisticated search capabilities for other types too?
Well, since I proved to myself that all data types can be converted to
strings, why not convert the ruby data into strings and then match that
in ruma. Of course, it would be so much nicer to just do the matching
on the data in its original form....
The breakthrough came when I realized how malleable ruby really is. I
had become accustomed to c, which I still love, but in so many ways
it's so much more limited. I didn't really have to write my own parser
and lexer; ruby could do it all for me. I just had to override a bunch
of operators.
After that, it was simple. All I do is override the right operators,
and ruby does the parsing and hands me the match expressions in
already-parsed form. Reg is amazingly small in the end. Most of the
effort and code went into the array matcher, but at least as much
functionality is to be had from the hash and object matchers, which
were trivial.
I would very much like to do this, but right now, no. I'm not sure
exactly what would be involved in having the array matcher match files
as well; it seems like you might have to rip out the guts of the
backtracking engine to support it... but maybe not. Anyway, stay tuned
for a future release.
Just having the ability to compare regexps directly against files would
be really helpful in the construction of lexers of all sorts. Java has
this; why doesn't ruby?
> Because if it does, then you've got a lexer system that is also good as
> something else than just a damn lexer.
Lexers, parsers, and pattern matching languages get too short a shrift
in my opinion. There's really a lot more they could be used for, if
only people would see... of course, it doesn't help that almost all
existing tools of this kind are string-oriented, and hard to use for
other data.
> And by making regexps unified with the rest of the language, it brings
> Ruby closer to the Icon language, isn't it?
I wouldn't know... please let me know about regexp integration in icon;
maybe there's some features I can steal.
>
> Lexers, parsers, and pattern matching languages get too short a shrift
> in my opinion. There's really a lot more they could be used for, if
> only people would see... of course, it doesn't help that almost all
> existing tools of this kind are string-oriented, and hard to use for
> other data.
>
A small piece of example code could help to open the eyes of people who
don't see what could be done with Reg (like me).
Denis
> I would like to announce the first version, 0.4.0, of Reg, the Ruby
> Extended Grammar. Reg is a library for pattern matching in ruby data
> structures. Reg provides Regexp-like match and match-and-replace for
> all data structures (particularly Arrays, Objects, and Hashes), not
> just Strings.
Nifty, nifty, nifty. I really need to have a look at that.
How does it compare to the ML style of argument matching, btw?
--
Christian Neukirchen <chneuk...@gmail.com> http://chneukirchen.org
It seems similar in spirit to JXPath for Java, which lets you use XPath
expressions to access objects, Hashes, Arrays, Maps etc., something that is
otherwise quite long-winded in Java (no snickering please).
http://jakarta.apache.org/commons/jxpath/
I have not played with it yet, so hope these questions are not off base:
- can I bind variables to (parts of) matches
- have you thought about the connection to duck typing?
- any convenient way to match "all" ... like r*(size..size)
"vikkous" <goo...@inforadical.net> wrote in message
news:1114309915.9...@g14g2000cwa.googlegroups.com...
Traditionally, parsing and pattern matching languages stop after the
parser stage of the compiler pipeline, but it seems to me that many
later compiler tasks are particularly well suited for pattern-matchers.
(They're never used because by this point, compiler data is in the form
of a parse tree,
and text-based pattern tools (most of them) can't deal with that.)
Let's take the example of a simple optimization, like strength
reduction. This is where the compiler changes multiplication by a
constant power of two into a left shift.
The problem, in other words, is to search for nodes of the syntax tree
that look like this:
[<some expr>, :*, 4]
and turn them into this:
[<some expr>, :<<, 2]
In Reg, that would be:
+[expr, :*, -{:power_of_2?=>true}].sub{ [BR[0], :<<, BR[2].log2] }
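Since substitutions are not implemented in Reg yet, here is a hand-written plain-Ruby sketch of the same strength reduction, for comparison. It assumes nodes are three-element [left, operator, right] arrays as in the example above; power_of_two? and reduce_strength are hypothetical helper names.

```ruby
# Hypothetical plain-Ruby sketch; nodes are [left, operator, right] arrays
# as in the example above, and anything else is treated as a leaf.
def power_of_two?(value)
  value.is_a?(Integer) && value > 0 && (value & (value - 1)).zero?
end

def reduce_strength(node)
  return node unless node.is_a?(Array) && node.size == 3

  left, op, right = node
  left  = reduce_strength(left)
  right = reduce_strength(right)

  if op == :* && power_of_two?(right)
    # x * 2**k becomes x << k; bit_length - 1 is log2 for a power of two.
    [left, :<<, right.bit_length - 1]
  else
    [left, op, right]
  end
end

p reduce_strength([:x, :*, 4])           # [:x, :<<, 2]
p reduce_strength([[:y, :*, 8], :+, 3])  # [[:y, :<<, 3], :+, 3]
p reduce_strength([:x, :*, 6])           # [:x, :*, 6]
```

The recursion and the pattern test are tangled together in this version; the appeal of a match-and-replace expression is that it would state only the pattern and the replacement.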
My post "Lalr(n) parsing with reg" outlines how to twist Reg to
actually be a parser.
I took a skim thru this. It seems like XPath is all 'traversal of the
object graph' and thus very much like Reg's Object and Hash matchers.
The Array matcher does that and also matches regexp-like patterns
within arrays. Does XPath have that? I didn't see anything.
I guess what you're asking for is like the functionality of
backreferences, which aren't implemented yet. Binding a variable is
somewhat different, I guess but allows you to do the same sort of
thing. It could be implemented, not necessarily easily for global vars,
and maybe for others too using Binding.of_caller.
> - any convenient way to match "all" ... like r*(size..size)
Uhh... well, OB matches any single object and OBS matches 0 or more of
any object. One great thing about reg is that you can name your own
subexpressions. If you happen to need a matcher that matches -[some long
reg that I don't want to type over and over], you can write:
foo=-[some long reg... etc]
and then use foo everywhere in your larger reg. This is a major
weakness of regexps, and was one of the great things about gema in
comparison.
For instance, modulo optimizations, the definitions of OB and OBS are:
OB=Object.reg
OBS=OB+0
Christian Neukirchen said:
> How does it compare to the ML style of argument matching, btw?
I think you're both talking about the same thing here; the ability to
dispatch to different methods depending on the types of method
parameters other than just the receiver. I have thought a good deal
about this, in fact, but not exactly in the context of Reg.
The best way to do this means extending the syntax, but here's a quick
and dirty way reg might be used to do it:
#warning! won't work yet; no substitutions in reg yet
module Scoundrels
  def Scoundrels::bill(*args) #dispatcher for all the bills
    send +[
      -[Lockpick]>>:bill_the_picklock |
      -[FakePassport,Cash,ShoePhone]>>:bill_the_spy |
      -[Gun.reg+1, Knife]>>:bill_the_murderer |
      -[Laptop,Oscilloscope.reg|CellPhone|Password.reg*5]>>:bill_the_hacker
    ].match(args.dup).first, args
  end
  #definitions of the various bills omitted
end
Then you could do:
  Scoundrels.bill(Lockpick.new) #invokes bill_the_picklock
  Scoundrels.bill(Laptop.new, CellPhone.new) #invokes bill_the_hacker
Maybe this could be made easier. The syntax takes a little getting used
to, let us say.
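Until substitutions exist, the same dispatch idea can be approximated in plain Ruby with a table of per-argument patterns checked via ===. This is only a sketch under stated assumptions: the Dispatch module, its RULES table and the stand-in classes are hypothetical, and it handles fixed-length argument lists only (nothing like Gun.reg+1).

```ruby
# Hypothetical stand-ins for the classes in the example above.
class Lockpick; end
class Laptop; end
class CellPhone; end

module Dispatch
  # Each rule pairs a list of per-argument patterns with a handler name.
  RULES = [
    [[Lockpick],          :picklock],
    [[Laptop, CellPhone], :hacker]
  ].freeze

  def self.bill(*args)
    _, handler = RULES.find do |patterns, _|
      patterns.size == args.size &&
        patterns.zip(args).all? { |pattern, arg| pattern === arg }
    end
    raise ArgumentError, "no bill accepts #{args.map(&:class).inspect}" unless handler
    send(handler, *args)
  end

  def self.picklock(_pick)
    :bill_the_picklock
  end

  def self.hacker(_laptop, _phone)
    :bill_the_hacker
  end
end

p Dispatch.bill(Lockpick.new)               # :bill_the_picklock
p Dispatch.bill(Laptop.new, CellPhone.new)  # :bill_the_hacker
```

Rules are tried in order and the first match wins, which mirrors how alternation would pick a branch.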
This is great; people are coming up with angles I never thought of.
On Tue, 26 Apr 2005, vikkous wrote:
> itsme213 said:
>> - have you thought about the connection to duck typing?
>
> Christian Neukirchen said:
>> How does it compare to the ML style of argument matching, btw?
>
> I think you're both talking about the same thing here; the ability to
> dispatch to different methods depending on the types of method
> parameters other than just the receiver. I have thought a good deal
> about this, in fact, but not exactly in the context of Reg.
I'm not sure about Christian, but I think itsme213 meant sort of the
opposite: how does your system relate to the duck-typing environment
of Ruby, where type != class (i.e., an object's capabilities and its
class's instance methods are not necessarily the same thing)? Are you
thinking of extending the system so that it could match, for example,
"an array of objects that respond to '[]'" (or something along those
lines)? One could imagine that being useful -- though on the other
hand, duck typing per se, as I understand it, really means just
requesting action from objects without a lot of preliminary querying
and measuring (whether it be is_a?, respond_to?, or whatever). So a
system like yours might be part of a fundamentally different way of
handling these things -- though respond_to?-awareness might be a nice
sort of middle ground.
At least I think that's what he meant, and even if not, it would be
interesting to hear your thoughts on it :-)
David
--
David A. Black
dbl...@wobblini.net
I am not certain how flexible your framework is, but as a sidenote
the typical pattern matching (term used earlier) in functional
languages is well represented by things like this:
length [] = 0 -- Matches an empty list
length (x:xs) = 1 + length xs -- Matches&splits nonempty lists
length [1,2,3] -- returns 3
-- [1,2,3] -> [2,3] -> [3] -> []
Sorry. I just like writing Haskell :)
>The best way to do this means extending the syntax, but here's a quick
>and dirty way reg might be used to do it:
>
>#warning! won't work yet; no substitutions in reg yet
>module Scoundrels
> def Scoundrels::bill(*args) #dispatcher for all the bills
> send +[
> -[Lockpick]>>:bill_the_picklock |
> -[FakePassport,Cash,ShoePhone]>>:bill_the_spy |
> -[Gun.reg+1, Knife]>>:bill_the_murderer |
> -[Laptop,Oscilloscope.reg|CellPhone|Password.reg*5]>>:bill_the_hacker
> ].match(args.dup).first, args
> end
>
> #definitions of the various bills omitted
>end
>
>Then you could do:
> Scoundrel.bill(LockPick.new) #invokes bill_the_picklock
> Scoundrel.bill(Laptop.new, CellPhone.new) #invokes bill_the_hacker
>
>Maybe this could be made easier. The syntax takes a little getting used
>to, let us say.
>
>This is great; people are coming up with angles I never thought of.
E
--
template<typename duck>
void quack(duck& d) { d.quack(); }
> extending the system so that it could match, for example,
> "an array of objects that respond to '[]'" (or something along those
> lines)?
I wonder if he could just support objects that implement === which would
give you Range and Module support and ruby-contract support for free.
But how is this related to your quote at all? ruby-contract offers a
Check::Quack[:message] adaptor that implements === via respond_to?() --
I wonder if it would be a good idea to define Symbol#=== which would be
used like this:
first = case obj
  when :first then
    obj.first
  when :fetch then
    obj.fetch(0)
  when :at then
    obj.at(0)
  when :[] then
    obj[0]
end
Or in combination with ruby-contract's signature():
class IO
  def pretty_output(*objs)
    objs.each do |obj|
      puts obj.pretty
    end
  end
  signature :pretty_output, :repeated => :pretty, :block => false
end
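The respond_to?-based adaptor idea can be tried without touching Symbol#=== at all. The sketch below uses a hypothetical Quacks class as a stand-in; it is not ruby-contract's actual Check::Quack, whose interface is not shown in this thread.

```ruby
# Hypothetical stand-in for a respond_to?-based adaptor; this is not
# ruby-contract's actual Check::Quack class.
class Quacks
  def initialize(message)
    @message = message
  end

  # case/when calls pattern === candidate, so this is all that is needed.
  def ===(candidate)
    candidate.respond_to?(@message)
  end
end

def first_item(obj)
  case obj
  when Quacks.new(:first) then obj.first
  when Quacks.new(:fetch) then obj.fetch(0)
  when Quacks.new(:[])    then obj[0]
  end
end

p first_item([10, 20])        # 10
p first_item({ 0 => :zero })  # [0, :zero]
p first_item("abc")           # "a"
```

Keeping the check in a separate class leaves plain Symbol comparison in case statements untouched.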
> David A. Black wrote:
>
>> extending the system so that it could match, for example,
>> "an array of objects that respond to '[]'" (or something along those
>> lines)?
>
> I wonder if he could just support objects that implement === which
> would give you Range and Module support and ruby-contract support for
> free.
>
> But how is this related to your quote at all? ruby-contract offers a
> Check::Quack[:message] adaptor that implements === via respond_to?() --
> I wonder if it would be a good idea to define Symbol#=== which would
> be used like this:
>
> first = case obj
> when :first then
> obj.first
> when :fetch then
> obj.fetch(0)
> when :at then
> obj.at(0)
> when :[] then
> obj[0]
> end
While I can see your intention, please don't do that. Often, people
want to compare Symbols and Symbols (they are perfect for that). It
is already very confusing that String !=== String (what is the most
elegant way to case compare classes against classes, btw?).
case obj.quack
when :first
when :fetch
...
end
would be nice to have, though.
Oh! Ok, well that's easy, it would be something like:
+[ proceq{|x| x.respond_to? :[] }+1 ]
Being forced to use a proceq (gawd that's a terrible name) here is a
little ugly. Eventually, I want to be able to pass arguments to
property matchers, so the above could be:
+[ -{ [:respond_to?, :[]]=>true }+1 ]
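For readers on Ruby 1.9 or later, Proc#=== invokes the proc, so a plain lambda can stand in where a proc that responds to === is needed. The sketch below is plain Ruby and does not use Reg; describe and all_indexable? are hypothetical helper names.

```ruby
# Proc#=== invokes the proc (Ruby 1.9 and later), so a lambda can sit
# directly in a `when` clause.
indexable = ->(x) { x.respond_to?(:[]) }

def describe(obj, indexable)
  case obj
  when indexable then "indexable"
  else "opaque"
  end
end

p describe([1, 2], indexable)      # "indexable"
p describe(Object.new, indexable)  # "opaque"

# "An array of one or more objects that respond to []", checked by hand.
def all_indexable?(array, indexable)
  !array.empty? && array.all? { |item| indexable === item }
end

p all_indexable?(["a", [1], { k: 1 }], indexable)  # true
p all_indexable?(["a", 3.5], indexable)            # false
```

Note that Integer also responds to [] (bit reference), which is why the failing case uses a Float.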
> duck typing per se, as I understand it, really means just
> requesting action from objects without a lot of preliminary
> querying and measuring (whether it be is_a?, respond_to?,
> or whatever).
If you're doing something complicated, you occasionally have to
explicitly request the type (whether duck- or class-) of objects you're
working with, in order to do the right thing with it. I know this isn't
polymorphic, but sometimes it is the right way...
And in fact, this is exactly what I did. The idea is that users can
easily create their own (scalar) matchers by writing a matcher class
and implementing === in it. This also gives you automatic support for
all the library classes that have ===.
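As an illustration of that convention in plain Ruby, here is a hypothetical scalar matcher class (MultipleOf is an invented name) next to a few library classes that already implement ===. Array#grep is used because it applies pattern === item to each element.

```ruby
# Hypothetical user-defined scalar matcher: all it needs is ===.
class MultipleOf
  def initialize(n)
    @n = n
  end

  def ===(candidate)
    candidate.is_a?(Integer) && (candidate % @n).zero?
  end
end

# Array#grep selects the items for which pattern === item is true.
p [1, 4, "8", 12].grep(MultipleOf.new(4))  # [4, 12]

# Library classes that already define a useful ===:
p [1, "two", :three, 4.0].grep(Numeric)    # [1, 4.0]  (Module#===)
p %w[apple fig banana].grep(/an/)          # ["banana"] (Regexp#===)
p [3, 12, 40].grep(5..20)                  # [12]      (Range#===)
```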
I'm going to have to check out this ruby-contract thing. Sounds very
interesting.
The correct operator to use for comparison is ==, not ===. Despite the
similarity of names, === is for pattern-matching, not comparison. For
simple classes, pattern-matching and comparison are the same thing, but
when the receiver is a Regexp, or a Reg, things are different.
Btw,
:sym==:sym and Class==Class
work just fine.
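The difference between the two operators is easy to check in irb. The lines below use only core Ruby classes; the comments state what each call asks.

```ruby
p String == String    # true:  equality between two classes
p String === String   # false: String is not an instance of String
p String === "text"   # true:  Module#=== asks "is this an instance of me?"
p Class === String    # true:  String is an instance of Class
p :sym == :sym        # true
p :sym === :sym       # true:  Symbol#=== falls back to equality
p(/ab/ == "cab")      # false: a Regexp is not equal to a String
p(/ab/ === "cab")     # true:  Regexp#=== means "matches"
```

A case statement evaluates pattern === value for each when clause, with the pattern as the receiver, which is why `case String when String` does not match.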
> case obj.quack
> when :first
> when :fetch
> ...
> end
>
> would be nice to have, though.
hmmm... I haven't tried this, but....
require 'reg'
class Object
  def quack
    RegOr.new *methods
  end
end
ought to do it
Well, not quite. If Florian's Symbol#=== existed it would work,
otherwise it has to be like:
require 'reg'
class Object
  def quack
    RegOr.new *methods.map{|m| proceq{|x| x.respond_to? m } }
  end
end
> It is already very confusing that String !=== String (what is the most
> elegant way to case compare classes against classes, btw?).
I don't know about elegant, but this is probably the way that requires
least typing:
def make_new(klass)
  case [klass]
  when [String] then "foo"
  when [Integer] then 27
  when [Array] then [nil, nil, true]
  end
end
make_new String
=> "foo"
Ilmari Heikkinen
--
66. The regions beyond these places are either difficult of access
because of their excessive winters and great cold, or else cannot be
sought out because, of some divine influence of the gods.
On Wed, 27 Apr 2005, vikkous wrote:
> I wrote:
>
>> I'm not sure about Christian, but I think itsme213 meant
>> sort of the opposite: how does your system relate to the
>> duck-typing environment of Ruby, where type != class
>> (i.e., an object's capabilities and its class's instance
>> methods are not necessarily the same thing)? Are you
>> thinking of extending the system so that it could match,
>> for example, "an array of objects that respond to '[]'"
>
> Oh! Ok, well that's easy, it would be something like:
>
> +[ proceq{|x| x.respond_to? :[] }+1 ]
>
> Being forced to use a proceq (gawd that's a terrible name) here is a
> little ugly. Eventually, I want to be able to pass arguments to
> property matchers, so the above could be:
>
> +[ -{ [:respond_to?, :[]]=>true }+1 ]
I wonder whether one could combine this with tests for class ancestry,
by matching a class/module name (#is_a?) and, at the same time,
verifying that the object's capabilities are a match for its ancestry.
Or something. I haven't puzzled it through very deeply.
>> duck typing per se, as I understand it, really means just
>> requesting action from objects without a lot of preliminary
>> querying and measuring (whether it be is_a?, respond_to?,
>> or whatever).
>
> If you're doing something complicated, you occasionally have to
> explicity request the type (whether duck- or class-) of objects you're
> working with, in order to do the right thing with it. I know this isn't
> polymorphic, but sometimes it is the right way...
I'd tend to say: if it's the type, it's not the class, and if you have
to explicitly request it, it's not duck-typing :-) But yes, I think
there are occasions for all these things, in various combinations.
As others have often pointed out, respond_to? != duck typing, but I do
tend to think of respond_to?-based logic as a kind of "weak duck
typing", if that isn't too arcane. It falls short of the true "Here,
do this!" spirit of true duck typing, but undeniably (in my view) has
a foot in the same camp, in that it factors in the dynamism of type.
> Christian Neukirchen wrote:
>> While I can see your intention, please don't do that. Often,
>> people want to compare Symbols and Symbols (they are
>> perfect for that). It is already very confusing that String
>> !=== String (what is the most elegant way to case
>> compare classes against classes, btw?).
>
> The correct operator to use for comparison is ==, not ===. Despite the
> similarity of names, === is for pattern-matching, not comparison. For
> simple classes, pattern-matching and comparison are the same thing, but
> when the receiver is a Regexp, or a Reg, things are different.
>
> Btw,
> :sym==:sym and Class==Class
> work just fine.
Yes, they do. But case compares with ===.
> On 26.4.2005, at 20:35, Christian Neukirchen wrote:
>
>> It is already very confusing that String !=== String (what is the most
>> elegant way to case compare classes against classes, btw?).
>
> I don't know about elegant, but this is probably the way that requires
> least typing:
>
> def make_new(klass)
> case [klass]
> when [String]: "foo"
> when [Fixnum]: 27
> when [Array]: [nil, nil, true]
> end
> end
>
> make_new String
> => "foo"
Nice, that conses a lot but is readable and understandable. Thanks!
> Ilmari Heikkinen
vikkous wrote:
> Lexers, parsers, and pattern matching languages get too short a shrift
> in my opinion.
You're right. As I said in a recent discussion, I think it's because
the people of a sufficient theoretical bent to create the tools don't
seem to be able to make them usable:-).
Talking about substitutions made me wonder whether you were familiar
with Txl (Tree Transformation Language).
When you come to doing substitutions, read up on BURGs (Bottom Up
Rewriting Grammars) if you aren't already familiar with them. They're
how optimising compilers choose an optimal sequence of code to emit.
They basically match leaf portions of an expression tree, and for each
match, accumulate the cost of the instructions that need to be emitted
to allow that sub-tree to be simplified. Within reason, all possible
paths are explored that allow the tree to be rewritten to the empty
tree, by emitting the optimal instruction sequence. It'd be excellent
if reg could deal with the kind of ambiguity this entails, choosing a
minimum-cost resolution.
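A toy version of that minimum-cost idea can be sketched in a few lines of plain Ruby. Everything here is an assumption made for illustration: the two rules, their costs and the cheapest_cover name are invented, and a real BURG generator precomputes its tables rather than recursing like this.

```ruby
# Toy sketch of cost-driven tree covering; the rules and costs below are
# invented for illustration and this is far simpler than a real BURG.
# Nodes are [operator, left, right]; anything else is a leaf that is
# assumed to be in a register already (cost 0).
def cheapest_cover(node)
  return [0, []] unless node.is_a?(Array)

  op, left, right = node
  covers = []

  # Rule: any binary node can be covered by its own instruction.
  own_cost = { add: 1, mul: 3 }.fetch(op)
  lc, li = cheapest_cover(left)
  rc, ri = cheapest_cover(right)
  covers << [own_cost + lc + rc, li + ri + [op]]

  # Rule: add(mul(a, b), c) can also be covered by one fused instruction.
  if op == :add && left.is_a?(Array) && left[0] == :mul
    ac, ai = cheapest_cover(left[1])
    bc, bi = cheapest_cover(left[2])
    covers << [2 + ac + bc + rc, ai + bi + ri + [:madd]]
  end

  # Keep the cheapest of the competing covers for this subtree.
  covers.min_by { |cost, _| cost }
end

cost, instructions = cheapest_cover([:add, [:mul, :a, :b], [:mul, :c, :d]])
p cost          # 5
p instructions  # [:mul, :madd]
```

The point of interest for a matcher is the ambiguity: two rules match the same node, and the choice between them is made by accumulated cost rather than by rule order.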
Clifford Heath.