Check duplicates from the datastore


Livia Barazzetti

Apr 26, 2016, 3:09:44 PM
to sumatra-users
Hi all,
I am thinking of using the Sumatra database to check for duplicate files (files with the same hash but different paths).
What I tried is:

project = load_project()
models = project.record_store._get_models()
datakeymanager = models.DataKey.objects.using(project.record_store._db_label)
query = datakeymanager.filter(path__contains=PATH, digest__contains=DIGEST)

Am I moving in the right direction? Or is something similar already available?
Thanks
Livia 

thomas...@boostheat.com

Apr 27, 2016, 9:57:27 AM
to sumatra-users
Hello,
I think you are searching for duplicates among DataKeys, right?

A DataStore records where the data are stored.
A DataKey identifies the actual data used by a record.

What I use to select DataKeys is something similar to:

records = Record.objects.filter(label__in=records_to_process, project__id=project)
for record in records:
    keys = DataKey.objects.filter(input_to_records=record,
                                  path__contains=PATH, digest__contains=DIGEST)

It might be faster to select the keys directly:
    keys = DataKey.objects.filter(path__contains=PATH, digest__contains=DIGEST)

I did not test it.
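Either way, once the matching keys are fetched, the duplicate check itself (same digest, different paths) can be done in plain Python. A minimal sketch, with a hypothetical `find_duplicates` helper shown on bare `(path, digest)` tuples rather than real DataKey objects:

```python
from collections import defaultdict

def find_duplicates(keys):
    """Group (path, digest) pairs by digest and return only those
    digests that are stored under more than one distinct path."""
    by_digest = defaultdict(set)
    for path, digest in keys:
        by_digest[digest].add(path)
    return {digest: sorted(paths)
            for digest, paths in by_digest.items()
            if len(paths) > 1}

# Hypothetical example: two files share the digest "abc1"
keys = [("run1/out.dat", "abc1"),
        ("run2/out.dat", "abc1"),
        ("run1/log.txt", "ff02")]
print(find_duplicates(keys))  # {'abc1': ['run1/out.dat', 'run2/out.dat']}
```

With real DataKey objects you would feed in `(key.path, key.digest)` pairs instead of the literal tuples above.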
Thomas