How does CBL 1.4 unique CBLAttachments?


Brendan Duddridge

Nov 9, 2017, 2:45:43 PM
to Couchbase Mobile
Hi,

I have a customer who has a strange problem. He's attaching PDF files to different documents, but the PDF files have the same filename. They differ only slightly in their content.

It appears that the digest internal to the CBLAttachment is the same for these different files so they end up getting mapped to the same .blob file in the attachments folder.

Does CBL 1.4 use more than just the filename to unique the digests? Or is it based solely on the filename provided when calling setAttachmentNamed?

Are there any situations where the digest can be generated with the same value for different files?

Thanks,

Brendan

Jens Alfke

Nov 9, 2017, 4:05:05 PM
to Couchbase Mobile


On Nov 9, 2017, at 11:45 AM, Brendan Duddridge <bren...@gmail.com> wrote:

> I have a customer who has a strange problem. He's attaching PDF files to different documents, but the PDF files have the same filename. They differ only slightly in their content.
>
> It appears that the digest internal to the CBLAttachment is the same for these different files so they end up getting mapped to the same .blob file in the attachments folder.

The digest is a SHA-1 digest of the attachment data. (It has nothing to do with the name.) It would be astronomically unlikely for two attachments with different data to produce the same digest.

Unless, that is, these happen to be the two specific PDFs that were created to demonstrate a SHA-1 hash collision attack earlier this year: "As a proof of the attack, we are releasing two PDFs that have identical SHA-1 hashes but different content."
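
To illustrate the point that the digest depends only on the bytes, here is a minimal Python sketch of a content-addressed blob store. It is not Couchbase Lite's code: `blob_key`, `BlobStore`, and the hex encoding of the digest are all hypothetical names and choices made for the example. The only thing taken from the explanation above is that the key is a SHA-1 of the attachment data and the filename is not an input.

```python
import hashlib


def blob_key(data: bytes) -> str:
    """Return the SHA-1 hex digest of the attachment bytes.

    The filename is deliberately not a parameter: only the content matters.
    (Hex is an arbitrary choice for this sketch.)
    """
    return hashlib.sha1(data).hexdigest()


class BlobStore:
    """Hypothetical in-memory store that keeps one blob per distinct digest."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, filename: str, data: bytes) -> str:
        # The filename is accepted only to mirror an attach-by-name call;
        # it plays no part in choosing the key.
        key = blob_key(data)
        self.blobs.setdefault(key, data)
        return key


store = BlobStore()
key_a = store.put("report.pdf", b"%PDF-1.4 version A")
key_b = store.put("report.pdf", b"%PDF-1.4 version B")  # same name, different bytes
key_c = store.put("copy.pdf", b"%PDF-1.4 version A")    # different name, same bytes
```

In this sketch `key_a` and `key_b` differ even though the filename is identical, while `key_a` and `key_c` match and share a single stored blob. So if two attachments really do end up with the same digest, the likelier explanation is that the same bytes were handed to the store twice.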

—Jens

Brendan Duddridge

Nov 9, 2017, 6:31:11 PM
to Couchbase Mobile

> The digest is a SHA-1 digest of the attachment data. (It has nothing to do with the name.) It would be astronomically unlikely for two attachments with different data to produce the same digest.
>
> Unless, that is, these happen to be the two specific PDFs that were created to demonstrate a SHA-1 hash collision attack earlier this year: "As a proof of the attack, we are releasing two PDFs that have identical SHA-1 hashes but different content."


OK, it's good to know that the digest isn't based on the filename. There must be a bug in my code causing this, then. I'll dig deeper.

Thanks!

Brendan

Brendan Duddridge

Nov 9, 2017, 11:34:10 PM
to Couchbase Mobile
Turns out it WAS a bug in my code. Oddly enough, I couldn't reproduce it because the version I was running already had the bug fixed, inadvertently. I had recently fixed a crash by reworking some code in my file attachments controller, and it turns out that fixed this uniquing problem too!

But thanks for the info about how the digest is generated.