Optimization: Marshal serialized attributes

Stephen Celis

Oct 8, 2008, 12:07:36 PM
to Ruby on Rails: Core
Here's a ticket for a simple patch to use Marshal instead of YAML for
attribute serialization. Marshaling is significantly faster (see the
benchmarks in the link below) and fixes some YAML load issues
(including an outstanding ticket).

http://rails.lighthouseapp.com/projects/8994-ruby-on-rails/tickets/1191-marshal-serialized-attributes

Simple enough?

Stephen
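
For readers who want to reproduce the speed comparison locally, here is a minimal sketch, not the benchmark from the ticket. The `SAMPLE` value, the iteration count, and the helper names are assumptions chosen for illustration; real serialized attributes may behave differently.

```ruby
require 'benchmark'
require 'yaml'

# Hypothetical sample attribute; any plain Hash/Array structure would do.
SAMPLE = { 'name' => 'widget', 'tags' => %w[a b c], 'sizes' => [1, 2, 3] }.freeze
ITERATIONS = 1_000

# Serialize and deserialize with Marshal (binary format).
def marshal_round_trip(value)
  Marshal.load(Marshal.dump(value))
end

# Serialize and deserialize with YAML (text format).
def yaml_round_trip(value)
  YAML.load(YAML.dump(value))
end

Benchmark.bm(8) do |bm|
  bm.report('marshal') { ITERATIONS.times { marshal_round_trip(SAMPLE) } }
  bm.report('yaml')    { ITERATIONS.times { yaml_round_trip(SAMPLE) } }
end
```

Timings depend heavily on the Ruby version and on the shape of the data, so treat any single run as a rough indication only.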

Frederick Cheung

Oct 8, 2008, 12:23:42 PM
to rubyonra...@googlegroups.com

While the option is fair enough, I don't think all existing apps
would want this turned on "silently":
- if other people use your database, YAML is OK, as there are parsers
for it in many languages, whereas Marshal would be a PITA
- if your existing column is not a blob column (which it wouldn't have
had to be previously, since YAML generates plain text), the database will
throw a hissy fit (or just truncate the data) when you try to insert a
character that is not legal in the charset used.
- should you be calling string_to_binary if the column supports it?
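
The charset concern in the list above can be seen with a small sketch. The `value` hash is a hypothetical attribute; the point is only that Marshal produces a binary string that may not be valid UTF-8, while YAML produces plain text.

```ruby
require 'yaml'

# Hypothetical attribute value; the integer 255 is encoded by Marshal
# as a raw 0xFF byte, which is never legal in UTF-8.
value = { 'count' => 255 }

marshaled = Marshal.dump(value)
yamled    = YAML.dump(value)

# Marshal.dump returns a binary (ASCII-8BIT) string.
marshal_is_binary = marshaled.encoding == Encoding::BINARY

# Reinterpret both payloads as UTF-8, as a text column might.
marshal_valid_utf8 = marshaled.dup.force_encoding(Encoding::UTF_8).valid_encoding?
yaml_valid_utf8    = yamled.dup.force_encoding(Encoding::UTF_8).valid_encoding?

puts "Marshal payload: #{marshaled.inspect}"
puts "binary encoding: #{marshal_is_binary}, valid UTF-8: #{marshal_valid_utf8}"
puts "YAML payload:    #{yamled.inspect}"
puts "valid UTF-8:     #{yaml_valid_utf8}"
```

Whether a given database rejects or truncates such bytes depends on the adapter and column type, which this sketch does not attempt to model.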

Fred

Izidor Jerebic

Oct 9, 2008, 3:45:36 AM
to rubyonra...@googlegroups.com

To add an additional point for -1: Marshal uses a binary format and does
not guarantee any compatibility with anything except the Ruby on the
computer which created it. Marshal uses internal format version
numbers which do not correspond to Ruby versions. This means your
database backups are potentially non-portable to another OS, Ruby
version, or computer. This in itself makes this option a non-starter, imho.

You could add the option, but it should never become a default. And
even an option should come with a big warning.
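
The format versioning mentioned above can be inspected directly. This is a sketch using only core `Marshal` constants; the tampered payload is an artificial way to simulate data written by an incompatible format version, not something taken from the thread.

```ruby
data = Marshal.dump([1, 2, 3])

# The first two bytes of every Marshal payload are the format's
# major and minor version, independent of RUBY_VERSION.
major, minor = data.bytes.first(2)
puts "Marshal format #{major}.#{minor} written by Ruby #{RUBY_VERSION}"

# Simulate a payload from an incompatible major format version.
tampered = data.dup
tampered.setbyte(0, Marshal::MAJOR_VERSION + 1)

load_error =
  begin
    Marshal.load(tampered)
    nil
  rescue TypeError => e
    e
  end

puts "Loading a mismatched major version raised: #{load_error.class}"
```

A matching format version is necessary but not sufficient: loading also needs the same classes to be defined on the reading side.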

izidor

Stephen Celis

Oct 9, 2008, 8:17:46 AM
to Ruby on Rails: Core
Yes; after Fred's post I did a bit more research and realized it's a
big no for portable apps. Not an impossible hurdle to deal with, but
not a desirable default. I also realized that AR in its current state
has no simple interface for such an option: quoting's serialization is
abstracted away from table and model information. While MySQL handled
Marshal swimmingly in string columns, I don't think other adapters
would agree.
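
One way such an option could be shaped is a small coder object with `dump` and `load` methods. The sketch below is purely hypothetical: `TextSafeMarshal` is not an ActiveRecord API and was not proposed in this thread. It Base64-encodes the Marshal payload so the stored value stays plain ASCII in a string column, at the cost of roughly a third more storage.

```ruby
# Hypothetical coder; the name and interface are assumptions for illustration.
module TextSafeMarshal
  # Marshal the value, then Base64-encode it ('m0' = no line breaks)
  # so the result contains only ASCII characters.
  def self.dump(value)
    [Marshal.dump(value)].pack('m0')
  end

  # Reverse of dump. Only call this on data you wrote yourself:
  # Marshal.load is unsafe on untrusted input.
  def self.load(text)
    return nil if text.nil? || text.empty?

    Marshal.load(text.unpack1('m0'))
  end
end

original = { 'key' => 'value', 'n' => 255 }
encoded  = TextSafeMarshal.dump(original)
decoded  = TextSafeMarshal.load(encoded)

puts encoded
puts decoded.inspect
```

This does nothing for the cross-language and cross-version portability concerns raised earlier; it only sidesteps the charset problem.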

I've tried to stay away from serialize where possible, but am dealing
with it on a current project and noticed complaints from others
regarding the speed at which large groups of objects with serialized
attributes are instantiated from the database. YAML has been the
culprit.

Stephen



Duncan Beevers

Oct 9, 2008, 12:52:34 PM
to rubyonra...@googlegroups.com
There are also faster YAML dumpers out there.  A recent Portland code sprint produced ZAML, a work-in-progress that offers (if I recall correctly) something like a 14x speed boost over vanilla YAML.dump:
http://github.com/hallettj/zaml/tree/master/zaml.rb


YAML's slowness has been a pain point, but we don't have to sacrifice portability.

Stephen Celis

Oct 9, 2008, 2:44:01 PM
to Ruby on Rails: Core
I did a quick benchmark of ZAML and found it to be slower:

http://pastie.org/288592


On Oct 9, 11:52 am, "Duncan Beevers" <duncanbeev...@gmail.com> wrote:
> There are also faster YAML dumpers out there.  A recent Portland code sprint
> produced ZAML, a work-in-progress that offers (if I recall correctly)
> something like a 14x speed boost over vanilla YAML.dump
> http://github.com/hallettj/zaml/tree/master/zaml.rb
>
> YAML's slowness has been a pain point, but we don't have to sacrifice
> portability.
>

Frederick Cheung

Oct 10, 2008, 4:48:31 AM
to rubyonra...@googlegroups.com

On 9 Oct 2008, at 19:44, Stephen Celis wrote:

>
> I did a quick benchmark of ZAML and found it to be slower:
>
> http://pastie.org/288592
>

I swapped the IO for a StringIO and ZAML was then twice as fast:

          user     system      total        real
yaml  0.360000   0.010000   0.370000 (  0.375689)
zaml  0.150000   0.000000   0.150000 (  0.154976)

Making the dumped object a little less trivial (it seems to me that the
initial test only really tests the overhead of getting things set up)
further increases the difference:

  to_dump = ['foo', 'bar', 'baz', Time.now,
             {'key' => 'value', 'bar' => {1 => 'hello', 2 => 'world'}}] * 5

          user     system      total        real
yaml  9.260000   0.060000   9.320000 (  9.413077)
zaml  2.790000   0.010000   2.800000 (  2.827708)
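
A sketch of this kind of benchmark, dumping the less trivial object into a StringIO, might look like the following. It is not the pastie's code: ZAML is left out because its API is not shown in this thread, Marshal is included for comparison instead, and the iteration count is an arbitrary assumption.

```ruby
require 'benchmark'
require 'stringio'
require 'yaml'

# The less trivial object from the message above.
to_dump = ['foo', 'bar', 'baz', Time.now,
           { 'key' => 'value', 'bar' => { 1 => 'hello', 2 => 'world' } }] * 5

iterations = 500 # arbitrary; raise it for steadier numbers

Benchmark.bm(8) do |bm|
  # A fresh in-memory IO each time keeps disk I/O out of the measurement.
  bm.report('yaml')    { iterations.times { YAML.dump(to_dump, StringIO.new) } }
  bm.report('marshal') { iterations.times { Marshal.dump(to_dump, StringIO.new) } }
end
```

Writing to a StringIO rather than a file or terminal means the numbers reflect serialization cost rather than output cost, which is presumably why swapping the IO changed the earlier result.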
