Message from discussion
Storing ruby serialized objects as text or binary?
Received: by 10.66.78.40 with SMTP id y8mr4839687paw.9.1352479887489;
Fri, 09 Nov 2012 08:51:27 -0800 (PST)
X-BeenThere: sequel-talk@googlegroups.com
Received: by 10.68.234.37 with SMTP id ub5ls16692388pbc.6.gmail; Fri, 09 Nov
2012 08:51:25 -0800 (PST)
Received: by 10.68.232.167 with SMTP id tp7mr3852275pbc.7.1352479885463;
Fri, 09 Nov 2012 08:51:25 -0800 (PST)
Date: Fri, 9 Nov 2012 08:51:24 -0800 (PST)
From: Jeremy Evans <jeremyeva...@gmail.com>
To: sequel-talk@googlegroups.com
Message-Id: <fb4fbb7c-bb65-4af9-8bb1-9658b9ba5c85@googlegroups.com>
In-Reply-To: <69999408-f4dd-4a73-9a33-30399f78fabe@googlegroups.com>
References: <69999408-f4dd-4a73-9a33-30399f78fabe@googlegroups.com>
Subject: Re: Storing ruby serialized objects as text or binary?
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_Part_193_31153107.1352479884887"
------=_Part_193_31153107.1352479884887
Content-Type: multipart/alternative;
boundary="----=_Part_194_6926057.1352479884887"
------=_Part_194_6926057.1352479884887
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
On Friday, November 9, 2012 1:03:04 AM UTC-8, Clint Pachl wrote:
>
> I am wondering what are the dis/advantages of storing ruby serialized
> objects (i.e. Marshal.dump) directly into a `bytea` column versus Base64
> encoding the byte stream and storing into a `text` column? Also, are there
> any performance concerns with either method? I am using PostgreSQL to be
> specific.
>
> One of the advantages I found in base64 encoding first is that it doesn't
> screw up my terminal when Sequel logs to STDOUT. Maybe there is a way to
> escape or hide this binary data in the logger? However, it seems storing
> the serialized object directly, without encoding, would be more efficient.
>
> (this may make this post irrelevant)
> Finally, I was unable to reconstitute a ruby object after an
> insert/select. I keep getting the error from Marshal.dump, "data too
> short". It wasn't until I base64 encoded first that I got it to work. So
> maybe storing binary data directly doesn't work? I found a post from 2008, How
> do you insert binary data using sequel + postgresql?<https://groups.google.com/forum/#!searchin/sequel-talk/ruby$20binary/sequel-talk/4aRQGNPQ7Po/_hdJMmZ2Ki4J>
>
>
> Here's what I did:
>
> ds = DB[:core__checkout_snapshots]
> checkout_data = Marshal.dump(data)
> ds.insert(id: checkout_id, data: checkout_data)
> Marshal.load(ds[id: checkout_id][:data])
>
>
> This failed with the `data` column as type `bytea` and `text`.
>
When storing data in a bytea column, you need to mark it as a blob:
ds.insert(id: checkout_id, data: Sequel.blob(checkout_data))
That may fix your issue.
Personally, I don't think serialization of ruby objects into a database is
a good idea in most cases, but if you have to do it with Marshal, it's
probably better to store it in bytea instead of base64 encoded text.
About your logger issue, use a custom logger that escapes the output
instead of the default Logger class, that's unrelated to Sequel.
Jeremy
------=_Part_194_6926057.1352479884887
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable
On Friday, November 9, 2012 1:03:04 AM UTC-8, Clint Pachl wrote:<blockquote=
class=3D"gmail_quote" style=3D"margin: 0;margin-left: 0.8ex;border-left: 1=
px #ccc solid;padding-left: 1ex;">I am wondering what are the dis/advantage=
s of storing ruby serialized objects (i.e. Marshal.dump) directly into a `b=
ytea` column versus Base64 encoding the byte stream and storing into a `tex=
t` column? Also, are there any performance concerns with either method? I a=
m using PostgreSQL to be specific.<div><br></div><div>One of the advantages=
I found in base64 encoding first is that it doesn't screw up my terminal w=
hen Sequel logs to STDOUT. Maybe there is a way to escape or hide this bina=
ry data in the logger? However, it seems storing the serialized object dire=
ctly, without encoding, would be more efficient.</div><div><br></div><div>(=
this may make this post irrelevant)</div><div>Finally, I was unable to reco=
nstitute a ruby object after an insert/select. I keep getting the error fro=
m Marshal.dump, "data too short". It wasn't until I base64 encoded first th=
at I got it to work. So maybe storing binary data directly doesn't work? I =
found a post from 2008, <a href=3D"https://groups.google.com/forum/#!search=
in/sequel-talk/ruby$20binary/sequel-talk/4aRQGNPQ7Po/_hdJMmZ2Ki4J" target=
=3D"_blank">How do you insert binary data using sequel + postgresql?</a>&nb=
sp;</div><div><br></div><div>Here's what I did:</div><div><br></div><blockq=
uote style=3D"margin:0 0 0 40px;border:none;padding:0px"><div>ds =3D DB[:co=
re__checkout_snapshots]</div><div>checkout_data =3D Marshal.dump(data)</div=
><div>ds.insert(id: checkout_id, data: checkout_data)</div><div>Marshal.loa=
d(ds[id: checkout_id][:data])</div></blockquote><div><br></div><div>This fa=
iled with the `data` column as type `bytea` and `text`.</div></blockquote><=
div><br>When storing data in a bytea column, you need to mark it as a blob:=
<br><br> ds.insert(id: checkout_id, data: Sequel.blob(checkout_=
data))<br><br>That may fix your issue.<br><br>Personally, I don't think ser=
ialization of ruby objects into a database is a good idea in most cases, bu=
t if you have to do it with Marshal, it's probably better to store it in by=
tea instead of base64 encoded text.<br><br>About your logger issue, use a c=
ustom logger that escapes the output instead of the default Logger class, t=
hat's unrelated to Sequel.<br><br>Jeremy<br></div>
------=_Part_194_6926057.1352479884887--
------=_Part_193_31153107.1352479884887--