Saving file content to ByteBuffer and to column does not retrieve the same size of data

Carlos Scheidecker

Sep 22, 2014, 6:50:51 AM9/22/14
to us...@cassandra.apache.org, java-dri...@lists.datastax.com
Hello all,

I can successfully read a file into a ByteBuffer and then write it to a Cassandra blob column. However, when I retrieve the value of the column, the retrieved ByteBuffer is bigger than the original ByteBuffer the file was read into. Writing it to disk corrupts the image.

So I read a JPG:

private ByteBuffer readFile(String filename) {
    FileInputStream fIn;
    FileChannel fChan;
    long fSize;
    ByteBuffer mBuf;
    try {
        fIn = new FileInputStream(filename);
        fChan = fIn.getChannel();
        fSize = fChan.size();
        mBuf = ByteBuffer.allocate((int) fSize);
        fChan.read(mBuf);
        //mBuf.rewind();
        fChan.close();
        fIn.close();
        //mBuf.clear();
        return mBuf;
    } catch (IOException exc) {
        System.out.println("I/O Error: " + exc.getMessage());
        System.exit(1);
    }
    return null;
}

So I run readFile and save the result to the contents column below, which is a blob.

ByteBuffer file1 = readFile(filename);

Then I retrieve the object, which I call Attachment, and which maps to the following table definition:

CREATE TABLE attachment (
  attachmentid timeuuid,
  attachmentname varchar,
  mimecontenttype varchar,
  contents blob,
  PRIMARY KEY(attachmentid)
);

The ByteBuffer is saved under the contents column.
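
For context, outside the mapper the save would look roughly like this (a sketch with illustrative names, not my actual code; note the driver stores the bytes between the buffer's position and its limit, so the buffer has to be rewound before binding):

PreparedStatement insert = session.prepare(
    "INSERT INTO attachment (attachmentid, attachmentname, mimecontenttype, contents) " +
    "VALUES (?, ?, ?, ?)");
file1.rewind(); // position = 0, limit = capacity = file size
session.execute(insert.bind(
    UUIDs.timeBased(),  // timeuuid primary key
    "1.jpg",            // attachmentname
    "image/jpeg",       // mimecontenttype
    file1));            // contents blob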

Then I retrieve that ByteBuffer: ByteBuffer file2 = readAttachment.getContents();

I notice that file1 is java.nio.HeapByteBuffer[pos=99801 lim=99801 cap=99801], while the retrieved file2 is java.nio.HeapByteBuffer[pos=0 lim=99937 cap=99937].

As you can see, the original size was 99801 when it was saved, and it is now 99937. So when I take the retrieved ByteBuffer, assign it to file2, and write it to disk, I get a JPG that I cannot read because it has some extra data in front of it.

I am attaching the original JPG, 1.jpg, and 1_retrieved.jpg, which is the one with size 99937.

So there is a difference of 99937 - 99801 = 136 bytes between them.

Therefore, taking the retrieved buffer and saving it to a physical file prevents the system from displaying the JPG; it errors with: "Error interpreting JPEG image file (Not a JPEG file: starts with 0x82 0x00)"

Why is the blob retrieved from the column bigger than the one I initially saved to the same column? Is there an offset that I need to deal with? If I call buffer.arrayOffset() on the retrieved ByteBuffer, the value of the offset is 0.

Here's the function I use to write file2 to the physical file 1_retrieved.jpg: 

private void writeFile(ByteBuffer buffer, String destname) {
    try (RandomAccessFile out = new RandomAccessFile(destname, "rw")) {
        byte[] data = new byte[buffer.limit()];
        //buffer.flip();
        buffer.get(data);
        out.write(data);
        out.close();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

Thanks.
[Attachments: 1.jpg, 1_retrieved.jpg]


Olivier Michallat

Sep 22, 2014, 11:35:48 AM9/22/14
to java-dri...@lists.datastax.com
Try with a Channel. Quick and dirty example (bad exception handling):

        File outFile = new File(destName);
        FileOutputStream out = new FileOutputStream(outFile, false);
        FileChannel outChannel = out.getChannel();
        outChannel.write(buffer);
        outChannel.close();
        out.close();

Side note: make sure you close all resources even when an exception occurs; this is not the case in your readFile method. If you use Java 7, try-with-resources makes it much simpler.
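
For example, your readFile could look like this with try-with-resources (a quick sketch; note it also flips the buffer after reading, so the position is 0 when you later consume it):

        import java.io.FileInputStream;
        import java.io.IOException;
        import java.nio.ByteBuffer;
        import java.nio.channels.FileChannel;

        private ByteBuffer readFile(String filename) throws IOException {
            try (FileInputStream fIn = new FileInputStream(filename);
                 FileChannel fChan = fIn.getChannel()) {
                ByteBuffer mBuf = ByteBuffer.allocate((int) fChan.size());
                fChan.read(mBuf); // a single read; a loop is safer for large files
                mBuf.flip();      // position = 0, limit = number of bytes read
                return mBuf;
            } // fChan and fIn are closed here, even if an exception is thrown
        }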



OLIVIER MICHALLAT

Drivers & Tools Engineer | olivier....@datastax.com






Carlos Scheidecker

Sep 22, 2014, 3:19:17 PM9/22/14
to java-dri...@lists.datastax.com
Olivier,

Thanks, but I have tried that before as well, as you can see below, to the exact same effect. The file size is larger after retrieval.

private void writeFile2(ByteBuffer buf, String destname) {
    try {
        buf.clear();
        File file = new File(destname);
        // write new file
        boolean append = false;
        FileOutputStream fs = new FileOutputStream(file, append);
        FileChannel channel = fs.getChannel();
        channel.position(channel.size());
        //buf.clear();
        // Flips this buffer. The limit is set to the current position and
        // then the position is set to zero. If the mark is defined then it
        // is discarded.
        buf.clear();
        // Writes a sequence of bytes to this channel from the given buffer.
        buf.position(buf.arrayOffset());
        channel.write(buf);
        // close the objects
        channel.close();
        fs.close();
    } catch (IOException e) {
        System.out.println("I/O Error: " + e.getMessage());
        System.exit(1);
    }
}
Carlos Scheidecker

Sep 22, 2014, 5:50:13 PM9/22/14
to java-dri...@lists.datastax.com
The goal is to write a file to a blob (which I have done successfully), then read it back and write the file to disk.

Now, the big issue that is driving me crazy is that the file's ByteBuffer is saved properly, but when it is retrieved from the table's blob column it comes back with a bigger size, so writing it back to a file makes the file unreadable if it is a JPG, for instance.

Therefore, looking at the retrieved file with the debugger, I can see that there is extra information at the front of the byte array, and then comes the original content.

Looking at it with an editor, I see information like the original path of the file. So if I take all the bytes and write them back to a JPG file, the JPG will not work, of course.

Rick Bullotta

Sep 22, 2014, 6:04:29 PM9/22/14
to java-dri...@lists.datastax.com
Send the first 50 bytes of each (input and output) as hex and let's see if it is anything obvious. 
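
Something like this would print them (a rough sketch):

import java.nio.ByteBuffer;

// Dump the first n bytes of a buffer as hex, without moving its position.
static String firstBytesHex(ByteBuffer buffer, int n) {
    ByteBuffer copy = buffer.duplicate(); // independent position/limit
    copy.rewind();
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < n && copy.hasRemaining(); i++) {
        sb.append(String.format("%02x ", copy.get() & 0xFF));
    }
    return sb.toString();
}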


Carlos Scheidecker

Sep 22, 2014, 6:35:05 PM9/22/14
to java-dri...@lists.datastax.com
Guys,

It looks like if I skip the first 8 bytes and set the position to 129, then it works:

(file2 is a ByteBuffer containing the retrieved contents blob field, using the driver mapper)

file2.position(129);
this.writeFile(file2, destname);

private void writeFile(ByteBuffer buf, String destname) {
    try {
        //buf.clear();
        File file = new File(destname);
        // write new file
        boolean append = false;
        FileOutputStream fs = new FileOutputStream(file, append);
        FileChannel channel = fs.getChannel();
        channel.position(channel.size());
        //buf.clear();
        // Flips this buffer. The limit is set to the current position and
        // then the position is set to zero. If the mark is defined then it
        // is discarded.
        //buf.clear();
        // Writes a sequence of bytes to this channel from the given buffer.
        //buf.position(buf.arrayOffset());
        channel.write(buf);
        // close the objects
        channel.close();
        fs.close();
    } catch (IOException e) {
        //System.out.println("I/O Error: " + e.getMessage());
        //System.exit(1);
    }
}

Alex Popescu

Sep 22, 2014, 7:09:18 PM9/22/14
to java-dri...@lists.datastax.com
--

:- a)


Alex Popescu
Sen. Product Manager @ DataStax
@al3xandru

Carlos Scheidecker

Sep 22, 2014, 9:05:00 PM9/22/14
to java-dri...@lists.datastax.com
Alex,

Thanks, but I have seen that before.

It did not help, because they use the rows from the ResultSet instead of using the mapper like I did.

Nonetheless, the first 128 bits, or 16 bytes, are an offset, which probably includes information about the ByteBuffer object itself.

Johan Mjönes

Sep 23, 2014, 3:56:46 AM9/23/14
to java-dri...@lists.datastax.com
Carlos,

The URL Alex posted is the correct answer.

You're using the ByteBuffer in the wrong way in the first snippets of code you posted:

private void writeFile(ByteBuffer buffer, String destname) {
    try (RandomAccessFile out = new RandomAccessFile(destname, "rw")) {
        byte[] data = new byte[buffer.limit()];
        //buffer.flip();
        buffer.get(data);
        out.write(data);
        out.close();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

The best way to do the above is to use the Bytes helper provided by the driver:

        byte[] data = Bytes.getArray(buffer);

The extra bytes you're seeing are probably the driver reusing the backing byte array from the original response (note: I'm speculating). However, the ByteBuffer is set up properly for you to read your bytes.

Regards,

Johan

Carlos Scheidecker

Sep 23, 2014, 4:51:49 AM9/23/14
to java-dri...@lists.datastax.com
Thanks Johan,

Yeah, that is what I did initially. However, the size of the byte array will be limited to Integer.MAX_VALUE - 5 if I do it that way, correct?

Therefore one can only have 2,147,483,642 bytes, correct? I guess that is plenty.

Carlos Scheidecker

Sep 23, 2014, 5:01:59 AM9/23/14
to java-dri...@lists.datastax.com
Johan,

Doing that will still copy the extra bytes:

private void writeFile(ByteBuffer buffer, String destname) {
    try (RandomAccessFile out = new RandomAccessFile(destname, "rw")) {
        byte[] data = Bytes.getArray(buffer);
        out.write(data);
        out.close();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        //e.printStackTrace();
    }
}

But if I do not set the position to 129, it will still copy the extra bytes.

ByteBuffer file2 = readAttachment.getContents();
//int diff = file2.capacity() - file1.capacity();
file2.position(129);

So, thanks for showing the correct way of using it, but the extra 16 initial bytes still need to be offset.


Sylvain Lebresne

Sep 30, 2014, 3:53:38 AM9/30/14
to us...@cassandra.apache.org, java-dri...@lists.datastax.com
On Tue, Sep 30, 2014 at 2:25 AM, Robert Coli <rc...@eventbrite.com> wrote:
On Mon, Sep 22, 2014 at 3:50 AM, Carlos Scheidecker <nand...@gmail.com> wrote:
I can successfully read a file into a ByteBuffer and then write it to a Cassandra blob column. However, when I retrieve the value of the column, the retrieved ByteBuffer is bigger than the original ByteBuffer the file was read into. Writing it to disk corrupts the image.

Probably don't write binary blobs like images into a database; use a distributed filesystem?

I've very successfully stored lots of small images in Cassandra, so I have to disagree with that far too quick conclusion. Cassandra always reads blobs in their entirety, so it's definitely not very good with very large blobs, but there are many cases where images are known to be pretty small (I was personally storing thumbnails), and in those cases it is my experience that Cassandra is a very viable solution.


But I agree that this behavior sounds like a bug, I would probably file it as a JIRA on http://issues.apache.org and then tell the list the URL of the JIRA you filed.

I actually doubt it is a bug, and it's almost certainly not a Cassandra bug (so please, do *not* open a JIRA on http://issues.apache.org). I suspect a bad use of the ByteBuffer API (which is definitely a very confusing API, but that's what Java gives us). Typically, in your snippet of code above, the line:

byte[] data = new byte[buffer.limit()];

is incorrect: 'buffer.limit()' is not the number of valid bytes in the buffer; you should use 'buffer.remaining()' for that. You should also be careful about messing with 'arrayOffset'; a line like

    buf.position(buf.arrayOffset());

(also from one of your snippets above) is almost surely wrong.
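
So, a minimal corrected version of that copy would be (a sketch):

    // Copy only the valid bytes, i.e. those between position and limit.
    byte[] data = new byte[buffer.remaining()];
    buffer.get(data); // advances the position to the limit
    out.write(data);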

--
Sylvain

noah chanmala

Aug 5, 2015, 4:05:47 PM8/5/15
to DataStax Java Driver for Apache Cassandra User Mailing List, us...@cassandra.apache.org
Carlos,

I am having the same issue as you. I used the code below:

for (Row row : rs) {
    bytebuff = row.getBytes("data");
    byte[] data = Bytes.getArray(bytebuff);
    out.write(data);
}

Did you ever find the answer to this?

Thanks,

Noah

noah chanmala

Aug 5, 2015, 4:29:48 PM8/5/15
to DataStax Java Driver for Apache Cassandra User Mailing List, us...@cassandra.apache.org

OK, I fixed my issue by making sure that the total chunk I read in equaled the total size of the file. Using Bytes.getArray(bytebuff) to write out, my total file read and write are now the same size.
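
For reference, the read loop now looks roughly like this (a sketch with illustrative names):

// Keep reading until the buffer is full (or EOF is hit), so the buffer
// really holds the entire file before it is written to the blob column.
ByteBuffer buf = ByteBuffer.allocate((int) channel.size());
while (buf.hasRemaining() && channel.read(buf) != -1) {
    // loop until the whole file is in the buffer
}
buf.flip(); // position = 0, limit = file size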

Thanks,

Noah

