Maintaining an Image Database

Loren M. Lang

May 21, 2011, 10:29:17 PM
to lifel...@googlegroups.com
I was reading through the Exif specification and discovered that it does
indeed support adding a unique identifier, called ImageUniqueID, to the
Exif header. ImageUniqueID is a 128-bit number, stored as a hexadecimal
string, that is added to an image once, preserved across edits, and
never modified. Certainly, some image editors may fail to preserve it,
and a program is much more likely to drop the Exif header than a JPEG
Comment header, since comments have been part of the core JPEG
specification since before the JFIF/JPEG files in use in the early to
mid '90s. For example, jpegtran's default behavior is -copy comments,
which copies JPEG Comment headers but not Application Extensions like
Exif. If this is not a concern, I can just use the feature in Exif
rather than trying to create my own using a JPEG Comment.

Quick side note: should I attempt to record other information, such as
whether an image was rotated or geo-tagged (indicating its GPS
coordinates were added after the shoot)? With other sources I don't
record this, and I consider entries original if they don't already
exist in the database.

Now when I process an image, I can add a unique ID if the image lacks one
and use that in a central database to track it. So far, I have not seen
any programs that generate the unique ID already, but even if I did, it
should not matter, since any program should be able to ensure uniqueness
with a sufficiently good random number generator.
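
A minimal sketch of generating such an ID, assuming Python and its
standard secrets module. The function names are hypothetical, and writing
the value into the Exif header would need an Exif library that is not
shown here.

```python
import secrets
import string


def new_image_unique_id() -> str:
    """Return 128 random bits as 32 lowercase hexadecimal characters."""
    return secrets.token_hex(16)  # 16 bytes = 128 bits


def is_valid_image_unique_id(value: str) -> bool:
    """Check that a value looks like a 128-bit ID in hexadecimal notation."""
    return len(value) == 32 and all(c in string.hexdigits for c in value)
```

With 128 random bits, two independently generated IDs colliding is
unlikely enough that no coordination between programs is needed.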

Once I have the unique ID, I can add an entry to the image database
and record information such as name, date taken, and GPS coordinates,
similar to how I record a Tweet or FourSquare check-in. In fact, I can
treat the unique ID just like I treat a Tweet or FourSquare ID. This
also means that I would consider the photo's GPS coordinates original
as long as its ID is not already in the database. If the photo has GPS
coordinates when I add it to the database, I will also record its GPS
position; on future scans I will detect that it exists in the database
and can determine whether it was geo-tagged after being added. If the
database is lost, but not the photos, I will no longer know if the photo
has original GPS data. To avoid this, I could record such information in
the photo inside a JPEG Comment header, but I now think this is
unnecessary and it's the user's responsibility to keep their database
backed up if this is important.
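
The scan logic described above could be sketched roughly as follows. This
assumes an SQLite database; the table layout, column names, and the
scan_image function are invented for illustration and are not an existing
schema.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the image database, creating the (hypothetical) table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " unique_id TEXT PRIMARY KEY,"
        " name TEXT, date_taken TEXT,"
        " lat REAL, lon REAL,"
        " gps_original INTEGER)"
    )
    return db


def scan_image(db, unique_id, name, date_taken, gps=None):
    """Record an image on first sight; on later scans detect added GPS data.

    Returns "new", "unchanged", or "geotagged-later".
    """
    row = db.execute(
        "SELECT lat FROM images WHERE unique_id = ?", (unique_id,)
    ).fetchone()
    if row is None:
        # First time this ID is seen: any GPS data present counts as original.
        lat, lon = gps if gps else (None, None)
        db.execute(
            "INSERT INTO images VALUES (?, ?, ?, ?, ?, ?)",
            (unique_id, name, date_taken, lat, lon, 1 if gps else 0),
        )
        db.commit()
        return "new"
    if gps and row[0] is None:
        # Known image that had no coordinates before: tagged after the shoot.
        db.execute(
            "UPDATE images SET lat = ?, lon = ?, gps_original = 0"
            " WHERE unique_id = ?",
            (gps[0], gps[1], unique_id),
        )
        db.commit()
        return "geotagged-later"
    return "unchanged"
```

The gps_original flag lives only in the database, which is why losing the
database loses the ability to tell original coordinates from later ones.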

I also feel it is less important to preserve original orientation
data, as I have no good place to put it. I do like preserving the
original filename and size, but I am not happy with the current
mechanism for doing this. Again, I could use a JPEG Comment header
with some standard, machine-readable format like this:

LifeTracker;Size:56478;Filename:IMG_5775.JPG;Orientation:3

This may be in addition to a user-configurable Exif comment header for
convenience, such as:

Original Filename: %f, original size: %s
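
As a rough sketch of how both formats might be handled in Python: the
function names are hypothetical, and reading %f as the original filename
and %s as the original size is an assumption based on the example text.

```python
import re

PREFIX = "LifeTracker"


def format_comment(size, filename, orientation):
    """Build the machine-readable comment string."""
    return f"{PREFIX};Size:{size};Filename:{filename};Orientation:{orientation}"


def parse_comment(comment):
    """Return the fields as a dict, or None if this is not a LifeTracker comment."""
    parts = comment.split(";")
    if parts[0] != PREFIX:
        return None
    fields = {}
    for part in parts[1:]:
        key, sep, value = part.partition(":")
        if sep:  # skip pieces that have no "key:value" shape
            fields[key] = value
    return fields


def expand_template(template, filename, size):
    """Fill %f and %s in a user-configurable comment template."""
    values = {"f": filename, "s": str(size)}
    # One pass, so a filename that itself contains "%s" is not expanded again.
    return re.sub(r"%([fs])", lambda m: values[m.group(1)], template)
```

One limitation of this simple format is that a filename containing a
semicolon would not survive parsing without some escaping rule.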

One issue with using a unique identifier instead of just taking the
SHA-1 sum of the image data is that any image lacking the
ImageUniqueID header must be rewritten before it can be added to the
database of images. Maybe this should be configurable, allowing SHA-1
as an alternative to an embedded identifier. The primary downsides of
SHA-1 are speed and the fact that the identifier changes whenever the
image is modified (including rotating, optimizing, or converting to
progressive).
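
For comparison, a minimal sketch of the SHA-1 alternative using Python's
hashlib. It hashes the whole file rather than only the decoded image data,
which is a simplifying assumption, and the function name is hypothetical.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Any change to the file, even a lossless rotation or a metadata edit,
produces a different digest, so the database would see it as a new image.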
