Error inserting data into Postgres bytea column (using JDBC/JRuby version of do_postgres)

1,018 views
Skip to first unread message

Abe Voelker

unread,
Sep 11, 2012, 10:13:51 AM9/11/12
to datam...@googlegroups.com
I am unable to insert binary values into a Postgres bytea column using the JDBC version of do_postgres (I'm using JRuby 1.6.7.2 in 1.9 mode).  I have an InterviewForm model that I am trying to create records for, which has a formImage column that is a bytea (it's actually a custom Postgres type "ImageT" that wraps bytea).  This is the model definition:

class InterviewForm
  include DataMapper::Resource
  storage_names[:default] = 'InterviewForm'

  property :interview_id, String,  :field => 'interviewId', :key => true
  property :form_seq_no,  Integer, :field => 'formSeqNo',   :key => true
  property :file_name,    String,  :field => 'fileName'
  property :size,         Integer
  property :sha_256_hash, String,  :field => 'sha256Hash',  :length => 64
  property :form_image,   Binary,  :field => 'formImage'

end

I tried using the usual DataMapper InterviewForm.create method:

require 'open-uri'

InterviewForm.create(:interview_id => 1,
                     :form_seq_no  => 1,
                     :file_name    => "foo.pdf"
                     :size         => blob_string.size,
                     :sha_256_hash => Digest::SHA256.new.hexdigest(blob_string),
                     :form_image   => ::Extlib::ByteArray.new(blob_string))

that initially gave me a validation error - "Form image must be at most 50 characters long," so I added :length => 2000000000 and :lazy => true properties to the :form_image property.  After that, I got the following error for the same statement:

DataObjects::SQLError: ERROR: column "formImage" is of type "ImageT" but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.

I tried all kinds of variations of wrapping the :form_image value in ::Extlib::ByteArray but have not been able to get around this error.  Therefore, I decided to drop down into a raw prepared statement:

require 'open-uri'

InterviewForm.repository.adapter.execute('INSERT INTO "InterviewForm" ("interviewId", "formSeqNo", "fileName", "size", "sha256Hash", "formImage") VALUES(?, ?, ?, ?, ?, ?)',
                                         1,
                                         1,
                                         "foo.pdf",
                                         blob_string.size,
                                         Digest::SHA256.new.hexdigest(blob_string),
                                         ::Extlib::ByteArray.new(blob_string))

However this also errors out.  The error is:

DataObjects::SQLError: ERROR: invalid input syntax for type bytea (code: 0, sql state: 22P02, query: INSERT INTO "InterviewForm" ("interviewId", "formSeqNo", "fileName", "size", "sha256Hash", "formImage") VALUES('1', '1', 'foo.pdf', '869245', '34abd7142491c988bd15515ee74ab5b0ef426994477363508c820634a6edc962', '<bunch of binary data>'), uri: )

This error leads me to believe that prepared statements for ByteArrays are not being escaped as I can replicate the above error by just trying to insert a single \ character:

InterviewForm.repository.adapter.execute('INSERT INTO "InterviewForm" ("interviewId", "formSeqNo", "fileName", "size", "sha256Hash", "formImage") VALUES(?, ?, ?, ?, ?, ?)',
                                         1,
                                         1,
                                         "foo.pdf",
                                         blob_string.size,
                                         Digest::SHA256.new.hexdigest(blob_string),
                                         ::Extlib::ByteArray.new("\\"))

If I use the plain prepared statements, am I responsible for escaping the data on my own?  What is the easiest way to do that - try and find the right method from the JDBC driver itself?  Postgres has some vendor-specific escaping for bytea columns so I'd rather not have to re-invent it myself if possible.

My Gemfile looks like this:

# Postgres databases
gem "pg", "~> 0.14.0", :platforms => :ruby
gem "jdbc-postgres", "~> 9.1.901", :platforms => :jruby
# Use pre-release (1.3.0.beta) versions of DM due to multi_json dependency issue
gem "data_mapper"
gem "dm-postgres-adapter", :git => 'git://github.com/datamapper/dm-postgres-adapter.git'
gem "dm-do-adapter",       :git => 'git://github.com/datamapper/dm-do-adapter.git'
gem "dm-core",             :git => 'git://github.com/datamapper/dm-core.git'
gem "dm-aggregates",       :git => 'git://github.com/datamapper/dm-aggregates.git'
gem "dm-migrations",       :git => 'git://github.com/datamapper/dm-migrations.git'
gem "dm-transactions",     :git => 'git://github.com/datamapper/dm-transactions.git'
gem "dm-serializer",       :git => 'git://github.com/datamapper/dm-serializer.git'
gem "dm-timestamps",       :git => 'git://github.com/datamapper/dm-timestamps.git'
gem "dm-validations",      :git => 'git://github.com/datamapper/dm-validations.git'
gem "dm-types",            :git => 'git://github.com/datamapper/dm-types.git'

Thanks in advance for any assistance.

Abe Voelker

unread,
Sep 11, 2012, 1:01:07 PM9/11/12
to datam...@googlegroups.com
I ran the do_postgres specs on my local Postgres 9.1 database, but didn't get any errors (as expected).  I'm not testing against the same database that is causing me trouble, so it might be something to do with the custom type ImageT column type or the slightly older Postgres version - at this point I'm not sure.

I also tried some bytea escaping code I found in the ActiveRecord driver, but doesn't really work well from within DM.  It successfully inserts, but Postgres seems to be actually storing the escaped string itself as raw data rather than interpreting it as binary before storing - when I query it back out the size is greatly inflated and I have to decode it before I can use it.  If I go that route I might as well just use a VARCHAR and base64 encode/decode.  I'm assuming that DM must do some extra escaping (quoting) of the prepared statement before sending the raw SQL to Postgres.

So my updated question is, is there a way to get to the raw database connection (i.e. the JDBC connection context) from within DM that would let me do plain unquoted SQL queries?

Abe Voelker

unread,
Sep 11, 2012, 3:22:55 PM9/11/12
to datam...@googlegroups.com
Still in the process of tearing my hair out... I added some extra specs to DataObjects to try and isolate this.  The first commit, which just inserts a PDF into a normal bytea column, passes.  The second commit, which adds a custom type that wraps bytea, fails. The error I'm getting from the second commit is

DataObjects::SQLError:
  ERROR: invalid byte sequence for encoding "UTF8": 0x00

Not sure why this error is happening as I explicitly created my test database as UTF-8 with createdb do_test -E UTF8

Any ideas, anyone?

Abe Voelker

unread,
Sep 11, 2012, 5:08:18 PM9/11/12
to datam...@googlegroups.com
There is definitely something going on due to the custom ImageT type.  I had our DBA switch the ImageT type to just a plain bytea and I was able to do an insertion using my original workaround of InterviewForm.repository.adapter.execute.  Trying to do an InterviewForm.create
fails.

Unfortunately, upon fetching the record back from Postgres, the form_image property comes back as a Postgres hex-encoded string.  I added this method to the model to decode it:

def form_image_decoded
  [form_image[2..-1]].pack("H*")
end

That's good enough to get me past this hump, for now.
Reply all
Reply to author
Forward
0 new messages