RbYAML


Simon Chiang

Dec 28, 2008, 2:22:31 PM
to Zaml
Hey everyone, has anyone looked into this?

http://rubyforge.org/projects/rbyaml

It's apparently a pure-ruby YAML implementation, but it doesn't look
like it's been touched in a couple years. I haven't played around
with it yet.

Markus

Dec 28, 2008, 2:45:47 PM
to za...@googlegroups.com
Simon --

I believe (if this is the one I'm thinking of) we'd looked at it briefly
and found that it had the same (algorithmic) performance problems as the
C-assisted one.

I'm in the process of wrapping up a week+ merging session to bring some
of your changes & a few others into the master branch. Details shortly.

-- Markus

Markus

Dec 28, 2008, 4:54:31 PM
to za...@googlegroups.com
All --

I've just pushed a major revision of zaml to [master], incorporating
many of Simon's changes from [proposed] and fixing a number of bugs.

Overview:

* The new version is about as fast as the previous (+/-10% or so
depending on the data).
* It correctly handles a number of cases that yaml.rb does not
(such as "\r\n" and {0 => :integer, :symbol => :symbol, {}
=> :hash, [] => :array}) and some complex structures that
actually crash yaml.rb.
* Getting to this point required compromise in two areas:
* Simon's goal of matching yaml's output; I eventually
gave this up as unobtainable / undesirable, as many of
the output quirks of yaml.rb lead to incorrect results
in some (admittedly contrived) cases.
* Ragav's goal of fully readable strings, again due to
wanting accurate reloading above all. See below for
further discussion.
* Much more extensive tests
* the basic testing structure Simon set up
* all one and two character strings
* all words from the system dictionary (if present)
* selected comprehensive strings up to 5 characters (takes
several hours, so it's disabled in the test file by
default)
* More "tricky" strings such as "on" and "yes" and "1:2:3"
* Simon's benchmarks plus some with more complex data (instead of
1000's of repetitions of simple data) to show off zaml's
performance advantage.


* Some of Simon's changes I chose not to include:
* Splitting the yaml patch for exceptions off to a
separate file. Since the patch can't be used without
zaml and it makes no sense to use zaml to serialize
exceptions without the patch, there's no reason to have
the patch in a separate file.
* Indentation unnesting for arrays inside hashes. It
complicates and slightly slows the system, and makes the
output of some structures ambiguous. The two advantages
(slightly smaller output files and exact duplication of
yaml.rb's output) aren't worth the loss of
functionality.
* Changing the spelling of Jesse's name. While it would
be tempting to play with renaming friends and coworkers
in this way, the potential risk of retaliation is just
too great.
* The !!binary type for string data with high-bits. See
below.
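The exhaustive short-string tests listed above boil down to enumerating every string of a given length over some alphabet and round-tripping each one. A minimal sketch of the enumeration step, assuming a hypothetical alphabet of the 128 seven-bit characters (the real test file may use a different set); `each_string_of_length` is a hypothetical helper, not part of zaml:

```ruby
# Hypothetical alphabet: the 128 seven-bit characters.
CHARS = (0..127).map(&:chr)

# Yields every string of exactly `len` characters drawn from CHARS.
# Without a block it returns an Enumerator instead.
def each_string_of_length(len)
  return enum_for(:each_string_of_length, len) unless block_given?
  CHARS.repeated_permutation(len) { |combo| yield combo.join }
end

one_and_two = each_string_of_length(1).to_a + each_string_of_length(2).to_a
puts one_and_two.size
```

Each generated string would then go through a dump/load comparison. With this alphabet the count grows 128-fold per extra character, which is why only selected longer strings are practical.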


One area where I think further work could be beneficial is the handling
of text. Ragav's point about wanting strings to be as readable as possible is
well taken, provided neither speed nor accuracy is compromised. I'd
rank the goals as:

1. Accurate representation
2. Speed
3. Readability
4. Compactness

...in that order. We now correctly handle a great many strings that the
last version did not (and even some that yaml.rb does not) but I suspect
we could do better.

The problem is how to identify and correctly produce alternative output
for such cases without spending so much time on it that we lose our
speed advantage.

For example, the !!binary type makes sense on the basis of compactness
for (say) image data that just happens to be encoded in a string. But
going to it for any string that has any high-bit characters compromises
both readability and compactness for bodies of text that just have one
or two UTF-8 characters in them, as it makes them ~50% larger and
totally unreadable.
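For a rough sense of the size cost: standard Base64 encodes every 3 input bytes as 4 output characters (about 33% growth) and adds line breaks on top. A minimal sketch, using a made-up paragraph with a single UTF-8 character and core Ruby's `Array#pack('m')` as the encoder (zaml's own !!binary output may wrap lines differently):

```ruby
# Made-up sample: a mostly-ASCII paragraph with a single UTF-8 character.
text = ("The quick brown fox jumps over the lazy dog. " * 20) + "caf\u00e9"

raw_bytes = text.bytesize
# Array#pack('m') is core Ruby's Base64 encoder; it also inserts line breaks.
b64_bytes = [text].pack('m').bytesize
growth = 100.0 * (b64_bytes - raw_bytes) / raw_bytes

printf("raw: %d bytes, base64: %d bytes (+%.0f%%)\n", raw_bytes, b64_bytes, growth)
```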

Likewise, the -|, -|- and -|+ formats could make multi-paragraph blocks
of text much more readable than embedded "\n" notation, but aren't
reloaded properly in many cases.

I'm sure there's a way to distinguish these cases and do the right
thing, but I'm not sure it can be done without extensive analysis of the
entire string in question (which might, after all, be very large) and
thus incurring a significant performance hit. I'm also not sure how to
prove that such a test accurately captures all the edge cases.

Thoughts?

-- Markus

Simon Chiang

Dec 29, 2008, 1:43:04 AM
to za...@googlegroups.com


     * The new version is about as fast as the previous (+/-10% or so
       depending on the data).


Hey Markus,

I took a quick look at the update and saw some opportunities to speed up the String#to_zaml.  I re-forked your repository and made some updates (you may need to re-clone if you've cloned my repo before).  The speedup is fairly dramatic, at least in the 'big data' dumps.

http://github.com/bahuvrihi/zaml/tree/master

Basically what I did was reduce the number of regexp comparisons.  I also pulled them out into a constant, which I think helps, if not a ton.

    num = '[-+]?(0x)?\d+\.?\d*'
    ESCAPE_CASES = [
      /\A(true|false|yes|no|on|null|off|#{num}(:#{num})*|!|=|~)$/i,
      /\A(\n* |[-:?!#&*'"]|<<|%.+:.)/,
      /\s$/,
      /^[>|][-+\d]*\s/i,
      /[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\xFF]|[,\[\]\{\}\r\t]|:\s|\s#/
    ]
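As a quick sanity check of what these patterns are meant to catch, here is a sketch assuming the intended rule is "quote the string if any pattern matches". `needs_quoting?` is a hypothetical helper, and the fifth pattern is left out because its `\x80-\xFF` byte range is a Ruby 1.8 idiom that newer Rubies may reject in a UTF-8 source file:

```ruby
num = '[-+]?(0x)?\d+\.?\d*'
# First four patterns from the list above (the byte-range pattern is omitted).
ESCAPE_CASES = [
  /\A(true|false|yes|no|on|null|off|#{num}(:#{num})*|!|=|~)$/i,
  /\A(\n* |[-:?!#&*'"]|<<|%.+:.)/,
  /\s$/,
  /^[>|][-+\d]*\s/i
]

# Hypothetical helper: a string needs the quoted form if any pattern matches.
def needs_quoting?(s)
  ESCAPE_CASES.any? { |re| re =~ s }
end

%w[yes 0:0 1:2:3 << hello].each { |s| puts "#{s.inspect}: #{needs_quoting?(s)}" }
```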

The tests still pass, so I think everything is ok.  The one that I wanted to make a special note of is /\s$/.  In your code you test for:

                 (self =~ /\s$/) or
                 ...
                 (self[-1..-1] =~ /\s/) or

I think that /\s$/ should cover both these cases unless you're specifically using $ instead of \Z.  In that case the proper regexp might be something like /\s$/m, or /\s($|\Z)/ (I don't know for sure).  Anyhow, maybe working on the regexps more will yield more benefits.  I haven't done any profiling but I'm guessing they're a bottleneck.

Cheers!

- Simon


Jesse Hallett

Dec 29, 2008, 2:50:39 AM
to za...@googlegroups.com
Thanks Simon! I went ahead and pushed your changes in. But I took the
liberty of fiddling with the indentation of the ESCAPE_CASES
definition. Did I get it right Markus?

Cheers,
Jesse

Markus

Dec 29, 2008, 12:17:10 PM
to za...@googlegroups.com
Simon/Jesse --

I'm mostly reverting these changes.

First, I don't think the changes do what you think they do. Because a
constant array always evaluates to true (with no overhead), they've
effectively reduced the logic to:

    class String
      def to_zaml(z)
        z.first_time_only(self) {
          z.emit("\"#{escaped_for_zaml}\"")
        }
      end
    end

...which of course is faster (and essentially what we had originally),
but it totally defeats the goal voiced by Ragav (and others) to have the
multiline strings readable. When I raised this as an area for future
work, I was thinking of something a little more balanced.

If this is what we want to do, the code above would do the same thing,
be slightly faster, and a heck of a lot clearer.

> The tests still pass, so I think everything is ok.

We need better test cases; specifically, while we do round trip testing
for accuracy on a bunch of things, we don't have any tests for the
readability constraint.

> I think that /\s$/ should cover both these cases unless you're
> specifically using $ instead of \Z. In that case the proper
> regexp might be something like /\s$/m, or /\s($|\Z)/ (I don't
> know for sure).

I'll look at it. There are a number of cases where the regexps were
carefully tailored to let specific cases through but not others so (for
example) a paragraph with trailing spaces at the end of internal lines
but not on the last line would use one of the -| formats instead of
quoting and escaping the "\n"s.
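The distinction matters because Ruby's `$` (unlike `\z`) also matches just before any internal newline, and Ruby's `/m` flag only lets `.` match a newline; it does not change what `$` means. A small sketch with a made-up two-line paragraph whose first line has trailing spaces:

```ruby
para = "first line  \nsecond line"

dollar   = para =~ /\s$/        # $ also matches just before an internal "\n"
end_only = para =~ /\s\z/       # \z anchors at the very end of the string only
last_chr = para[-1..-1] =~ /\s/ # inspects only the final character
with_m   = para =~ /\s$/m       # Ruby's /m makes . match "\n"; $ is unaffected

puts [dollar, end_only, last_chr, with_m].inspect
```

So `/\s$/` fires on trailing whitespace anywhere in the text, while `self[-1..-1] =~ /\s/` and `/\s\z/` only look at the end of the string, which is the difference the block-style decision depends on.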

> Anyhow, maybe working on the regexps more will yield more benefits.
> I haven't done any profiling but I'm guessing they're a bottleneck.

Possibly. As I noted yesterday, it's a balancing act. Do we want to go
back to speed as the only constraint, and ignore readability?

Also, it looks as if I may have failed to push my full change set (which
is why --I'm guessing here--the gem wasn't building for one of you).
But I'm taking out the gem-building test because of load path
incompatibilities between test/unit and the gem spec stuff. The net
effect is that we don't need to edit the gemspec just to change
the date and version (nor do we have to maintain it in two places); just
change it in zaml.rb and the gemspec will pull it from there.

-- Markus


Markus

Dec 29, 2008, 1:01:34 PM
to za...@googlegroups.com
All --

I'm seeing at best a <10% performance improvement by pre-compiling the
regexps in String#to_zaml, or even eliminating them and always using
String#escape_for_zaml with its regexps precompiled. This is within
the run-to-run variability, so I'm not even sure if it's real.
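One possible factor (an assumption, not something measured in this thread): in Ruby, a regexp literal containing `#{...}` interpolation is rebuilt each time the expression is evaluated unless it carries the `/o` flag, while a literal without interpolation is compiled once. Simon's first pattern interpolates `num`, so hoisting it into a constant may matter more for that pattern than for the others. A sketch for comparing the two on one's own machine, with hypothetical names and sample strings:

```ruby
require 'benchmark'

NUM = '[-+]?(0x)?\d+\.?\d*'
NUMBERISH = /\A#{NUM}(:#{NUM})*$/   # built once, when the constant is assigned

# Interpolated literal without /o: the Regexp is rebuilt each time this runs.
def inline_match(s)
  s =~ /\A#{NUM}(:#{NUM})*$/
end

# Reuses the precompiled constant.
def constant_match(s)
  s =~ NUMBERISH
end

samples = %w[0:0 1:2:3 hello 0x1F yes]
n = 10_000
Benchmark.bm(9) do |bm|
  bm.report("inline")   { n.times { samples.each { |s| inline_match(s) } } }
  bm.report("constant") { n.times { samples.each { |s| constant_match(s) } } }
end
```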

What sort of speedup were you seeing?

-- Markus


Ragav Satish

Dec 29, 2008, 1:16:24 PM
to za...@googlegroups.com
I replied to your previous query without reading the rest of my
email.

Could you explain this

> Likewise, the -|, -|- and -|+ formats could make multi-paragraph blocks
> of text much more readable than embedded "\n" notation, but aren't
> reloaded properly in many cases.

I've tried this with yaml (the cases you mentioned in the previous mail)

---
hello: |-
  0:0
world: |-
  !\n
how : |-
  !

and I get the desired output

{"world"=>"!\\n", "how"=>"!", "hello"=>"0:0"}

--Cheers
--Ragav

Simon Chiang

Dec 29, 2008, 1:31:36 PM
to za...@googlegroups.com

> First, I don't think the changes do what you think they do.  Because a
> constant array always evaluates to true (with no overhead), they've
> effectively reduced the logic to:
>
>     class String
>       def to_zaml(z)
>         z.first_time_only(self) {
>           z.emit("\"#{escaped_for_zaml}\"")
>         }
>       end
>     end


You know, it's funny: after I posted I saw what I think you're talking about... I ran some tests (flawed, as I now see) that somehow made me think everything was ok.  Anyhow, this:

    case
    when self == ""
    when *ESCAPE_CASES
    when self =~ /\n/
    else
    end

Should be this:

    case self
    when ""
    when *ESCAPE_CASES
    when /\n/
    else
    end
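The difference between the two forms: without a subject, each `when` value is only tested for truthiness, and a Regexp object is always truthy, so the splatted branch swallows every non-empty string; with a subject, each value is compared using `===`, and `Regexp#===` performs a match. A sketch with hypothetical stand-in patterns and return values:

```ruby
# Hypothetical stand-ins, much shorter than the real ESCAPE_CASES.
CASES = [/\A(yes|no|on|off)\z/i, /\s\z/]

# Subjectless form: each `when` value is only checked for truthiness.
def classify_subjectless(s)
  case
  when s == "" then :empty
  when *CASES then :quoted   # a Regexp is truthy, so this always wins
  when s =~ /\n/ then :block
  else :plain
  end
end

# Subject form: each `when` value is compared with ===, i.e. a real match.
def classify_with_subject(s)
  case s
  when "" then :empty
  when *CASES then :quoted
  when /\n/ then :block
  else :plain
  end
end

["hello", "yes", "a\nb"].each do |s|
  puts "#{s.inspect}: #{classify_subjectless(s)} vs #{classify_with_subject(s)}"
end
```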

I think then everything is actually working (hopefully!).  When I comment out one of the regexps, the tests break, so I believe there is a meaningful comparison going on.  As for the speed, when I benchmark at 'c877030...'

dump time for deeply nested arrays:
      user     system      total        real
yaml  3.820000   0.050000   3.870000 (  3.930472)
zaml  1.010000   0.010000   1.020000 (  1.020790)
.
dump:
      user     system      total        real
yaml  0.030000   0.000000   0.030000 (  0.027838)
zaml  0.020000   0.000000   0.020000 (  0.012775)
.
dump time for big data:
      user     system      total        real
s = 10
yaml  0.670000   0.000000   0.670000 (  0.677598)
zaml  0.270000   0.010000   0.280000 (  0.281358)
s = 100
yaml  0.290000   0.000000   0.290000 (  0.300507)
zaml  0.150000   0.000000   0.150000 (  0.152046)
s = 1000
yaml  0.240000   0.010000   0.250000 (  0.263622)
zaml  0.140000   0.000000   0.140000 (  0.139986)
.
dump time for complex data:
      user     system      total        real
yaml  0.330000   0.000000   0.330000 (  0.339236)
zaml  0.110000   0.000000   0.110000 (  0.116310)
.
dump time for lots of back references:
      user     system      total        real
zaml  5.950000   0.140000   6.090000 (  6.126743)
.
dump time for big, tangled nest of objects:
      user     system      total        real
zaml 13.070000   0.270000  13.340000 ( 13.463683)

When I benchmark at my master '3b17c112...'

dump time for deeply nested arrays:
      user     system      total        real
yaml  3.860000   0.050000   3.910000 (  3.976390)
zaml  1.010000   0.010000   1.020000 (  1.027761)
.
dump:
      user     system      total        real
yaml  0.030000   0.000000   0.030000 (  0.026971)
zaml  0.010000   0.000000   0.010000 (  0.009442)
.
dump time for big data:
      user     system      total        real
s = 10
yaml  0.670000   0.010000   0.680000 (  0.723018)
zaml  0.180000   0.000000   0.180000 (  0.181383)
s = 100
yaml  0.280000   0.000000   0.280000 (  0.288529)
zaml  0.090000   0.000000   0.090000 (  0.100249)
s = 1000
yaml  0.230000   0.000000   0.230000 (  0.237980)
zaml  0.090000   0.000000   0.090000 (  0.092310)
.
dump time for complex data:
      user     system      total        real
yaml  0.330000   0.000000   0.330000 (  0.325796)
zaml  0.090000   0.000000   0.090000 (  0.086142)
.
dump time for lots of back references:
      user     system      total        real
zaml  5.880000   0.130000   6.010000 (  6.044775)
.
dump time for big, tangled nest of objects:
      user     system      total        real
zaml 12.900000   0.260000  13.160000 ( 13.201744)

I commented out the yaml benchmarks for 'back references' and 'big tangled nest' because they were taking very long.  With that in mind, check out the numbers on 'big data': with the escape cases array my times are roughly 2/3 of what they are without it.
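The "roughly 2/3" figure can be checked against the zaml "user" times from the two 'big data' runs above; this sketch just divides the 3b17c112 times by the c877030 times:

```ruby
# zaml "user" seconds for 'dump time for big data', copied from the runs above.
before = { "s = 10" => 0.27, "s = 100" => 0.15, "s = 1000" => 0.14 } # c877030...
after  = { "s = 10" => 0.18, "s = 100" => 0.09, "s = 1000" => 0.09 } # 3b17c112...

ratios = before.map { |size, t| [size, (after[size] / t).round(2)] }.to_h
ratios.each { |size, r| puts "#{size}: #{r}" }
```

The ratios come out between about 0.6 and 0.67.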

Sorry about the initial flub!

- Simon

Markus

Dec 29, 2008, 3:37:26 PM
to za...@googlegroups.com
Simon --

>
> I think then everything is actually working (hopefully!). When I
> comment out one of the regexps, the tests break, so I believe there is
> a meaningful comparison going on.

*smile* That sounds convincing.

Condensed:
> As for the speed...

>                   c877030...   3b17c112...
> dump:
>   zaml            0.020000     0.010000
> big data:
>   s = 10          0.270000     0.180000
>   s = 100         0.150000     0.090000
>   s = 1000        0.140000     0.090000
> complex data:
>   zaml            0.110000     0.090000

> Noting that, check out the numbers on 'big data'. My times are
> roughly 2/3 with the escape cases array than without.
>

That does look significant. I'll see if I can figure out why I wasn't
seeing the improvement.

> Sorry about the initial flub!
>

Not to worry. If we all try to be careful, and all watch each other's
changes with a skeptical eye, we should be able to catch these things
before they become problems.

-- Markus

Markus

Dec 29, 2008, 4:01:55 PM
to za...@googlegroups.com
Ragav --

> Could you explain this...

I'll try. I'll pair up the examples I gave with how they fail (since
there are multiple failure modes, and I may not have been clear about
what causes what).

irb> require 'yaml'
=> true
irb> YAML.load(YAML.dump("\n"))
=> ""

Note that YAML does not correctly load its own output in this case.

irb> print YAML.dump("0:0")
--- "0:0"
=> nil
irb> YAML.load("--- 0:0")
=> 0

This is a case where YAML quotes a string which we weren't quoting.
Note that if it isn't quoted, YAML reads it back incorrectly.

irb> print YAML.dump("\n ....\n")
--- |

 ....

=> nil
irb> YAML.dump("\n ....\n")
=> "--- |\n\n ....\n\n"
irb> YAML.load(YAML.dump("\n ....\n"))
=> "\n....\n"

Here YAML produces what looks like correct output, but fails to read it
in correctly. Just adding an "x" to the end of the string makes for
even more interesting behavior:

irb> print YAML.dump("\n ....\nx")
--- |-

 ....
x
=> nil
irb> YAML.dump("\n ....\nx")
=> "--- |-\n\n ....\nx\n"
irb> YAML.load(YAML.dump("\n ....\nx"))
ArgumentError: syntax error on line 3, col 0: `x'
from /usr/lib/ruby/1.8/yaml.rb:133:in `load'
from /usr/lib/ruby/1.8/yaml.rb:133:in `load'
from (irb):19
from /usr/lib/ruby/1.8/yaml/rubytypes.rb:120

Again, YAML produces what looks like correct output, but if you try to
read it back in it crashes.

And on and on.

While each of these cases may be relatively simple, there are a lot of
them and the patterns aren't readily apparent to me. Running tests on
all* 0,1,2,3,... character strings, I kept finding new exceptions and odd
behaviors up to 5 character strings.

I'm not saying there isn't a good resolution, just that I haven't seen
it yet.

-- Markus

*Not really all, but a huge number; see the test unit for details.


Ragav Satish

Dec 29, 2008, 5:35:32 PM
to za...@googlegroups.com
On Mon, 2008-12-29 at 13:01 -0800, Markus wrote:
> Ragav --
>
> > Could you explain this...
>
> I'll try. I'll pair up the examples I gave with how they fail (since
> there are multiple failure modes, and I may not have been clear about
> what causes what).
>
> irb> require 'yaml'
> => true
> irb> YAML.load(YAML.dump("\n"))
> => ""
>

This is a YAML bug. It's a special case that was probably never reported
to _why (syck's author). This works:

irb> YAML.load(YAML.dump("a\n"))
=> "a\n"

>
> irb> print YAML.dump("0:0")
> --- "0:0"
> => nil
> irb> YAML.load("--- 0:0")
> => 0
>
> This is a case where YAML quotes a string which we weren't quoting.

Not sure I follow this. What do you expect this to be?
irb: YAML.load(YAML.dump("0:0")) => "0:0"

> Note that if it isn't quoted, YAML reads it back incorrectly.

Which is how it should be, since it interprets the first char as a
digit if it looks like one. ZAML should quote it as well; otherwise it
won't be read back correctly.

OK, I see this now. The YAML spec for scalars is a complex mess, and if
you try to embed spaces it often gets confused with indentation, which is
why you see those errors. In a YAML file, if you want to embed spaces,
double quotes are the only option.

Remove the spaces before "...." in your examples and they should work
fine.

Perhaps a reasonable compromise is to quote when you have multiple
spaces embedded and use the block style ("|" "|-" ) otherwise?
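A sketch of that compromise as a classifier, assuming "multiple spaces" means any run of two or more spaces, plus any line that starts with a space (since that is what collides with block indentation). `choose_style` and its return values are hypothetical:

```ruby
# Hypothetical classifier for the proposed compromise.
def choose_style(s)
  if s =~ / {2,}/ || s =~ /^ /   # space runs, or a line starting with a space
    :quoted
  elsif s.include?("\n")         # other multi-line text gets a block scalar
    :block
  else
    :plain
  end
end

["para one\npara two", "\n ....\nx", "a  b", "- -"].each do |s|
  puts "#{s.inspect} -> #{choose_style(s)}"
end
```

Note that a string like "- -" contains only single spaces, so this rule by itself would leave it unquoted, yet yaml.rb reads a bare `- -` as a nested sequence.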

--Cheers
--Ragav

Markus

Dec 29, 2008, 6:34:29 PM
to za...@googlegroups.com
Ragav --

> >
> > irb> require 'yaml'
> > => true
> > irb> YAML.load(YAML.dump("\n"))
> > => ""
> >
>
> This is a YAML bug. It's a special case that why probably never had
> reported.

Well, it may be a special case, but it isn't unique. It just happens to
be the simplest of many cases that YAML doesn't handle correctly.

> >
> > irb> print YAML.dump("0:0")
> > --- "0:0"
> > => nil
> > irb> YAML.load("--- 0:0")
> > => 0
> >
> > This is a case where YAML quotes a string which we weren't quoting.
>
> Not sure I follow this. What do you expect this to be ?
> irb: YAML.load(YAML.dump("0:0")) => "0:0"
>
> > Note that if it isn't quoted, YAML reads it back incorrectly.
> Which is how it should be since it interprets the first char to be a
> digit if it looks like one. ZAML should quote it as well if not it won't
> be read back correctly.

Yes. Though if it's taking it as a number it should object, rather than
just silently discarding the trailing ":0"; also, from the docs, you
could also conclude that "--- 0:0" should be treated as a string (since
it isn't a valid number) or even as a hash ({0=>0}).

The point of this example was that our previous implementation was
producing things that YAML didn't load correctly, and thus needed to be
changed.

Maybe not the only option, but the only one I've yet been able to get my
head all the way around. For example, the second case in this series:

irb> YAML.load("--- <")
=> "<"
irb> YAML.load("--- <<")
=> #<YAML::Syck::MergeKey:0xb7a933e8>
irb> YAML.load("--- <<<")
=> "<<<"
irb> YAML.load("--- <<<<")
=> "<<<<"



> Perhaps a reasonable compromise is to quote when you have multiple
> spaces embedded and use the block style ("|" "|-" ) otherwise?

It isn't just multiple consecutive spaces (remember your "- - -" case),
and any reasonable chunk of text is going to have multiple
non-consecutive spaces.

I'd certainly be willing to entertain such a compromise (in fact, my
main reason for throwing the issue out there is to see if we can find
one) but I'm going to take some convincing before announcing that we've
succeeded.

Specifically, I'd at least like to see it pass the full short_strings
test (most of which is disabled by default in test_zaml.rb because it
takes several hours to run) and possibly even an extended version to try
out some representative strings longer than 5 characters.

-- Markus


Markus

Dec 29, 2008, 10:21:26 PM
to za...@googlegroups.com
All --

'cause it wouldn't be nice not to share the fun.

--------------------------------------------------------------------

(markus@quadcore) ~/projects/zaml/temp> irb < yaml_huh
require 'yaml'
true

def test(s)
  yd = YAML.dump(s)
  yl = YAML.load(yd)
  print "Consider #{s.inspect}\n"
  print " #{yd.split("\n").join(" ")}\n"
  print " (#{yd.inspect})\n"
  print " loads as #{yl.inspect}\n"
  print "yaml.rb handles #{s.inspect} #{(yl == s) ? 'correctly' : 'incorrectly'}\n"
  "----------"
end
nil

#
# First note that yaml.rb doesn't deal well with the string "\r"
#
test("\r")
Consider "\r"
---
("--- \r\n")
loads as nil
yaml.rb handles "\r" incorrectly
"----------"
#
# So as a first hypothesis we assume that yaml doesn't handle strings
# containing \r. But then noticing that the following three cases
# are handled correctly:
#
test("a\rb")
Consider "a\rb"
--- a
b
("--- a\rb\n")
loads as "a\rb"
yaml.rb handles "a\rb" correctly
"----------"
test("\0\r")
Consider "\000\r"
--- !binary | AA0=
("--- !binary |\nAA0=\n\n")
loads as "\000\r"
yaml.rb handles "\000\r" correctly
"----------"
test(" \r")
Consider " \r"
--- " \r"
("--- \" \\r\"\n")
loads as " \r"
yaml.rb handles " \r" correctly
"----------"
#
# ...we revise our hypothesis to state that yaml doesn't handle
# strings ending in "\r" correctly if presented as bare literals.
#
# Ah, but then we notice that it does correctly handle a bare literal
# ending in "\r":
#
test("\r~,\r")
Consider "\r~,\r"
---
~,
("--- \r~,\r\n")
loads as "\r~,\r"
yaml.rb handles "\r~,\r" correctly
"----------"
#
# So we again revise our rule to:
#
# yaml doesn't handle strings ending in "\r" correctly if they are
# presented as bare literals, unless they also start with "\r"
#
# Oh, but wait:
#
test("\rx\r")
Consider "\rx\r"
---
x
("--- \rx\r\n")
loads as "\rx"
yaml.rb handles "\rx\r" incorrectly
"----------"
#
# At this point I decided to stop & share the fun. If you're
# bored, you may want to come up with a cogent explanation of
# yaml's behavior that also accounts for the following cases:
#
test("\r\r\n|")
Consider "\r\r\n|"
--- |-

|
("--- |-\n\r\r\n|\n")
loads as "\r\n|"
yaml.rb handles "\r\r\n|" incorrectly
"----------"
test("}\r\n)")
Consider "}\r\n)"
--- |- }
)
("--- |-\n}\r\n)\n")
loads as "}\n)"
yaml.rb handles "}\r\n)" incorrectly
"----------"
test("~\t,\r")
Consider "~\t,\r"
--- ~ ,
("--- ~\t,\r\n")
loads as "~\t,\r"
yaml.rb handles "~\t,\r" correctly
"----------"
test("~\r\n&")
Consider "~\r\n&"
--- |- ~
&
("--- |-\n~\r\n&\n")
loads as "~\n&"
yaml.rb handles "~\r\n&" incorrectly
"----------"
(markus@quadcore) ~/projects/zaml/temp>

-- Markus

P.S. If you think you have it captured let me know--I've got a whole
bunch more test cases.


Ragav Satish

Dec 29, 2008, 10:30:45 PM
to za...@googlegroups.com

> Yes. Though if it's taking it as a number it should object, rather
> than
> just silently discarding the trailing ":0"; also, from the docs, you
> could also conclude that "--- 0:0" should be treated as a string
> (since
> it isn't a valid number) or even as a hash ({0=>0}).

It appears to me that the YAML parser doesn't do any lookahead, so in this
case and the other one you mention

irb> YAML.load("--- <<") =>
#<YAML::Syck::MergeKey:0xb7a933e8>

the context is set up as soon as a valid token is obtained (<< is a hash
merge token).

I think this will get better when syck is completely implemented but I'm
not hoping for that anytime soon. I suspect that there are a lot of
corner cases that YAML does not handle correctly.


> It isn't just multiple consecutive spaces (remember your "- - -"
> case),

Could you refresh my memory on this one?

--Cheers
--Ragav


Markus

Dec 29, 2008, 11:39:00 PM
to za...@googlegroups.com

> > It isn't just multiple consecutive spaces (remember your "- - -"
> > case),
>
> Could you refresh my memory on this one?

YAML.load(%q{
---
x x
})
=> "x x"

...but if you replace the "x"s with "-"s:

YAML.load(%q{
---
- -
})
=> [[nil]]


-- Markus


Markus

Dec 30, 2008, 12:34:35 AM
to za...@googlegroups.com
Ragav --

> > It isn't just multiple consecutive spaces (remember your "- - -"
> > case),
>
> Could you refresh my memory on this one?

And the reason you don't remember it is that it was Ian, not you, who
raised it.

Oops.

-- Markus


Jesse Hallett

Dec 30, 2008, 2:57:34 AM
to za...@googlegroups.com
It would be nice if the gemspec would automatically pull the version
number from ZAML. But the gem won't build if there are require lines
in the gemspec:

The gem build failed with the following error:

/usr/lib64/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in
`gem_original_require': Insecure operation – gem_original_require

And Github won't build a gem for a new version until the gemspec is
modified anyway.

Jesse Hallett

Dec 30, 2008, 2:59:26 AM
to za...@googlegroups.com
Oh yeah; I keep forgetting to mention this. Now that we have a gemspec
in the github repository, ZAML can be installed from anywhere with
this command:

sudo gem install hallettj-zaml -s http://gems.github.com/

Markus

Dec 30, 2008, 10:42:15 AM
to za...@googlegroups.com
On Mon, 2008-12-29 at 23:57 -0800, Jesse Hallett wrote:
> It would be nice if the gemspec would automatically pull the version
> number from ZAML. But the gem won't build if there are require lines
> in the gemspec:
>
> The gem build failed with the following error:
>
> /usr/lib64/ruby/site_ruby/1.8/rubygems/custom_require.rb:27:in
> `gem_original_require': Insecure operation – gem_original_require


Maybe I'm misunderstanding, but it builds fine for me. Just type:

rake gem

and it puts it in the pkg directory. Or type:

gem build zaml.gemspec

and it puts it where you are. The error you're getting comes from using
the old Rakefile, which I may have failed to push the first round.


> And Github won't build a gem for a new version until the gemspec is
> modified anyway.

Touché.

Is there an advantage to having github build the gem?

-- Markus


Jesse Hallett

Dec 30, 2008, 12:25:01 PM
to za...@googlegroups.com

I'm sorry; I should have said that github won't build the gem if the gemspec contains require lines. The error I gave came from them. But we could try it with the new Rakefile.

The advantage of having github build the gem is that we can distribute it through github. A lot of people have github in their gem sources next to rubyforge nowadays. If we can give a simple gem install line for installation instructions, it makes it really easy for people to get our code up and running. (See the latest README revision.)


Markus

Dec 30, 2008, 12:36:50 PM
to za...@googlegroups.com
Jesse --

> I'm sorry; I should have said that github won't build the gem if the
> gemspec contains require lines. The error I gave came from them. But
> we could try it with the new Rakefile.
>

Ah, I get it. Sorry I was being slow earlier (pre-coffee).

I wonder if they are using some generic Rakefile or other such system to
build the gem? If so, we may have to abandon my idea of DRYing up the
version info.

It seems very silly (security theater anyone?) for the Rakefile to use
the taint system on gem construction, since the whole point is to
package up files which you control. But that's what the standard
Rakefile does, and that's where the error is coming from. If they
aren't using our Rakefile to build our gem, then we'll have to give up
the idea or come up with another way to do it.

Hmmm. That sounds like a challenge...

-- Markus

Markus

Dec 30, 2008, 4:41:45 PM
to za...@googlegroups.com

> If so, we may have to abandon my idea of DRYing up the
> version info...or come up with another way to do it.

>
> Hmmm. That sounds like a challenge...

How about this:

mask = /VERSION = "(\d+\.\d+\.\d+)"/
if File.readlines('lib/zaml.rb').find {|l| l =~ mask} =~ mask
  s.version = $1
else
  raise "Unable to find the version number in lib/zaml.rb"
end
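The same lookup can also be written with `String#[]`, which takes a regexp and a capture index and returns the captured text (or nil). A sketch that runs against a string instead of the real lib/zaml.rb; `version_from` and the sample version number are hypothetical:

```ruby
VERSION_MASK = /VERSION = "(\d+\.\d+\.\d+)"/

# Hypothetical helper: returns the first captured version string, or raises.
def version_from(source)
  source[VERSION_MASK, 1] or raise "Unable to find the version number"
end

# Hypothetical stand-in for the contents of lib/zaml.rb.
sample = "class ZAML\n  VERSION = \"0.1.1\"\nend\n"
puts version_from(sample)
```

In the gemspec this would become `s.version = version_from(File.read('lib/zaml.rb'))`.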

-- Markus

