Duplicate resources

mlg

May 11, 2008, 8:08:54 AM5/11/08
to ResourceSpace
I am testing ResourceSpace for a charitable organization that
processes 45000 photos per year. Upwards of 30 people will be
uploading photos to it. One problem I have discovered is how easy it
is to upload duplicates, and how difficult it is to find them. It
looks like you can have any number of resources with the same original
filename.

Is there any way to prevent duplicates from being uploaded, or to
easily find and delete duplicates?

mor...@propix.no

May 11, 2008, 8:17:12 AM5/11/08
to resour...@googlegroups.com
It is possible to search for the original filename.

Morten

mlg

May 11, 2008, 7:00:48 PM5/11/08
to ResourceSpace
Yes, I know that, but imagine the scenario where I forget that I had
added the resources already, and I do another batch upload of 50
pictures. Now I have 50 duplicates, and I have to delete them one at a
time. That is a lot of work.

Dan Huby

May 12, 2008, 4:00:20 AM5/12/08
to ResourceSpace


On 12 May, 00:00, mlg <mlgr...@accesscable.net> wrote:
> Yes, I know that, but imagine the scenario where I forget that I had
> added the resources already, and I do another batch upload of 50
> pictures. Now I have 50 duplicates, and I have to delete them one at a
> time. That is a lot of work.

You don't have to delete them one at a time.

You can add them to a collection then use 'delete all items in this
collection' (on the edit collection page).

On this topic: the original filename probably isn't the right way to
track down duplicates, since it's quite possible for two different
files to have the same filename on upload. I think what's needed is a
checksum created from the file contents.

This would also help with 'staticsync.php', as it could be used to
recognise that a file has been moved to a new location rather than
seeing it as a delete / new file.
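To sketch how checksums could let a sync pass tell a moved file from a delete plus a new file (again purely illustrative; staticsync.php does not work this way as of this thread, and the data structures here are hypothetical):

```python
def classify_changes(old_index, new_index):
    """old_index and new_index each map checksum -> file path
    (e.g. from a previous and a current filesystem scan).
    Returns (moved, added, deleted) dicts keyed by checksum."""
    # Same checksum present in both scans but at a different path: a move.
    moved = {c: (old_index[c], new_index[c])
             for c in old_index.keys() & new_index.keys()
             if old_index[c] != new_index[c]}
    # Checksum only in the new scan: a genuinely new file.
    added = {c: new_index[c] for c in new_index.keys() - old_index.keys()}
    # Checksum only in the old scan: a deleted file.
    deleted = {c: old_index[c] for c in old_index.keys() - new_index.keys()}
    return moved, added, deleted
```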

Dan

Guy Tozzi

May 12, 2008, 1:16:10 PM5/12/08
to resour...@googlegroups.com
Again a great topic here. Detection of duplicates is, I think, a "must
have" in a DAM system.

For each uploaded file, we could create an md5 signature and store it
in a new column of the resource table.
I looked into this recently, and the md5 operation seems to be pretty
fast (not too time-consuming).
Of course, we would also need to implement a "find duplicates" feature
somewhere in the Team Center.
I'm very busy and won't be able to work on this for a while. Maybe one
of the RS devs would be interested?
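Once an md5 signature is stored per resource, a "find duplicates" pass reduces to grouping resources by signature. A minimal sketch of that grouping (hypothetical function and data shape, not an existing ResourceSpace feature):

```python
from collections import defaultdict

def find_duplicates(resources):
    """resources: iterable of (resource_id, md5) pairs, e.g. read from
    the resource table. Returns lists of resource ids whose files share
    the same checksum, i.e. candidate duplicates."""
    groups = defaultdict(list)
    for resource_id, md5 in resources:
        groups[md5].append(resource_id)
    # Only checksums shared by more than one resource are duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]
```

The same grouping could equally be done in SQL with a GROUP BY on the signature column having a count greater than one.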

-----
Guy

On 12 May 08, at 10:00, Dan Huby wrote:
