Quick side note: should I attempt to record other information, such as
whether an image was rotated or geo-tagged (indicating its GPS coordinates were added after
the shoot)? With other sources I don't record this and consider them
original if they don't already exist in the database.
Now when I process an image, I can add a unique ID if the image lacks one
and use that in a central database to track it. So far, I have not seen
any programs that already generate a unique ID, but even if I did, it
should not matter: any program can make collisions vanishingly unlikely
by drawing the ID from a sufficiently large random number space.
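
As a minimal sketch of the idea above, assuming Python and its standard
secrets module: the Exif ImageUniqueID tag stores a 128-bit value as 32
hexadecimal digits, so a random 128-bit number fits it. The function name
new_image_unique_id is hypothetical, and writing the value into the file
is left out.

```python
import secrets


def new_image_unique_id() -> str:
    """Return a random 128-bit identifier as 32 lowercase hex digits."""
    # 16 random bytes = 128 bits; token_hex encodes them as 32 hex characters.
    return secrets.token_hex(16)


if __name__ == "__main__":
    print(new_image_unique_id())
```

With 128 random bits, two independently generated IDs colliding is
unlikely enough to ignore for a personal photo collection.
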
Once I have the unique ID, I can add an entry into the image database
and record information such as name, date taken and GPS coordinates
similar to how I record a Tweet or FourSquare check-in. In fact, I can
treat the unique ID just like I treat a Tweet or FourSquare ID. This
also means that I would consider the photo's GPS coordinates as original
as long as its ID is not already in the database. If the photo has GPS
coordinates when I add it to the database, I will also record its GPS
position; on future scans I will detect that it already exists in the
database and can determine whether it was geo-tagged after the fact. If the database
is lost, but not the photos, I will no longer know if the photo has
original GPS data. To avoid this, I could record such information in
the photo inside a JPEG Comment header, but I now think this is
unnecessary and it's the user's responsibility to keep their database
backed up if this is important.
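
The scan logic described above could be sketched roughly as follows,
assuming an SQLite database. The table layout, the function names open_db
and scan_image, and the three return values are all hypothetical, chosen
only to illustrate the "original unless already in the database" rule.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the image database. The schema is an assumption."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " unique_id TEXT PRIMARY KEY,"
        " name TEXT, date_taken TEXT,"
        " lat REAL, lon REAL)"
    )
    return db


def scan_image(db, unique_id, name, date_taken, gps=None):
    """Record a new image, or classify the GPS data of a known one.

    gps is a (lat, lon) tuple or None.
    Returns "new", "unchanged", or "geo-tagged-later".
    """
    row = db.execute(
        "SELECT lat, lon FROM images WHERE unique_id = ?", (unique_id,)
    ).fetchone()
    if row is None:
        # First sighting: whatever GPS data the photo carries is treated as original.
        lat, lon = gps if gps is not None else (None, None)
        db.execute(
            "INSERT INTO images VALUES (?, ?, ?, ?, ?)",
            (unique_id, name, date_taken, lat, lon),
        )
        db.commit()
        return "new"
    had_gps = row[0] is not None
    if gps is not None and not had_gps:
        # The photo had no position when first recorded, so it was added later.
        return "geo-tagged-later"
    return "unchanged"
```

This sketch only distinguishes "had coordinates at first scan" from "gained
coordinates later"; it does not compare coordinate values.
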
I also feel it is less important to preserve the original
orientation data, as I have no good place to store it. I do like
preserving original filename and size, but I am not happy with the
current mechanism to do this. Again, I could use a JPEG Comment header
and use some standard, machine-readable format like this:

LifeTracker;Size:56478;Filename:IMG_5775.JPG;Orientation:3

This may be in addition to a user-configurable Exif comment header for
convenience, such as:

Original Filename: %f, original size: %s
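
To make the two comment formats above concrete, here is a small sketch of
how they might be produced and read back. The function names
(format_comment, parse_comment, expand_template) are hypothetical, and the
sketch assumes that values never contain a semicolon and that %f and %s
are the only template placeholders.

```python
import re

PREFIX = "LifeTracker"


def format_comment(size, filename, orientation):
    """Build the machine-readable comment string."""
    fields = {"Size": size, "Filename": filename, "Orientation": orientation}
    return ";".join([PREFIX] + [f"{key}:{value}" for key, value in fields.items()])


def parse_comment(comment):
    """Return the fields as a dict of strings, or None if the prefix is missing."""
    parts = comment.split(";")
    if parts[0] != PREFIX:
        return None
    # partition() splits on the first colon only, so values may contain colons.
    return dict(part.partition(":")[::2] for part in parts[1:])


def expand_template(template, filename, size):
    """Expand %f (filename) and %s (size) in a user-configurable template."""
    values = {"%f": filename, "%s": str(size)}
    # A single pass avoids re-expanding placeholders that appear inside a value.
    return re.sub(r"%[fs]", lambda match: values[match.group(0)], template)
```

For example, format_comment(56478, "IMG_5775.JPG", 3) yields the
LifeTracker line shown above, and parse_comment turns it back into a
dictionary whose values are strings.
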
One issue with this proposal for using a unique identifier instead of
just taking the SHA-1 sum of the image data is that it will require
re-writing any image that lacks the ImageUniqueID header before it can
be added to the database of images. Maybe this should be configurable,
allowing SHA-1 as an alternative to an embedded identifier. The
primary downsides of SHA-1 are speed and the fact that an image cannot
be modified (rotated, optimized, or converted to progressive) while
keeping the same identifier.
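
The configurable choice could look something like the sketch below. The
names sha1_of_file and image_identifier and the mode values are
hypothetical. As a simplification, this hashes the whole file rather than
only the image data, so even a metadata-only edit would change the result.

```python
import hashlib


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_identifier(path, embedded_id=None, mode="embedded"):
    """Pick the database key for an image according to the configured mode.

    mode="embedded": use the ID already stored in the image. Returns None
    when the image has none, meaning the caller must write one into the
    file before adding it to the database.
    mode="sha1": derive the key from the file contents; no rewrite is
    needed, but any later change to the file yields a different key.
    """
    if mode == "embedded":
        return embedded_id
    if mode == "sha1":
        return sha1_of_file(path)
    raise ValueError(f"unknown mode: {mode!r}")
```
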