How can I check the consistency between metadata and uploaded files?

Michel Bamouni

Aug 10, 2017, 10:10:12 AM
to Dataverse Users Community
Hi,

I want to set up Dataverse in production, so I would like to know whether Dataverse offers a way to check the consistency between the uploaded files and the metadata in the database.

Regards,

Michel

danny...@g.harvard.edu

Aug 10, 2017, 10:46:37 AM
to Dataverse Users Community
Hey Michel,

I'm not sure I understand the question as it pertains to metadata, but we have UNFs (http://guides.dataverse.org/en/latest/developers/unf/index.html#unf) at the dataset and file level for tabular data, and also MD5/SHA1 checksums for file fixity (http://guides.dataverse.org/en/latest/installation/config.html?highlight=md5#filefixitychecksumalgorithm). 

Check out the documentation and let me know if either of these answers your question. If not, I'm happy to dig a little more with people in the community who may be more experienced.

Thanks,

Danny

Pete Meyer

Aug 10, 2017, 8:28:56 PM
to Dataverse Users Community
Hi Michel,

If you're talking about file checksums, I don't believe that there's anything built in. But all the parts needed to do this should be available through the various APIs (list public datasets, query storage identifiers and checksums, etc.).

That said, this is likely to be a fairly CPU-intensive (and potentially IO-intensive) process, so it probably makes sense to run it on a system that's not simultaneously driving the user interface.

Best,
Pete
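
The checksum comparison described above can be sketched roughly as follows. This is only an illustration, not a built-in Dataverse feature: it assumes you have already collected, through whichever API call or database query suits your installation, a mapping from each file's path under the storage directory to its recorded checksum. The names `file_checksum`, `verify_files`, and the `expected` mapping are hypothetical.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="md5", chunk_size=1024 * 1024):
    """Hash a file in chunks so large files are never read fully into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(storage_root, expected, algorithm="md5"):
    """Compare files on disk against recorded checksums.

    `expected` is an assumed input: a dict mapping a path relative to
    `storage_root` to the checksum recorded in the metadata.
    Returns a dict of relative path -> problem description (empty if all match).
    """
    problems = {}
    root = Path(storage_root)
    for relative_path, recorded in expected.items():
        candidate = root / relative_path
        if not candidate.is_file():
            problems[relative_path] = "missing"
        elif file_checksum(candidate, algorithm) != recorded.lower():
            problems[relative_path] = "checksum mismatch"
    return problems
```

Pass `algorithm="sha1"` if the installation records SHA-1 rather than MD5. Because every file is read in full, a check like this is best scheduled off-peak or on a separate machine.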

Michel Bamouni

Aug 11, 2017, 5:13:44 AM
to Dataverse Users Community
Hi,

First of all, thanks for the answers.

The problem I want to solve is this:
I have set up a load balancer for my Dataverse, and I want to synchronise the data on the two servers by copying the database and the uploaded files.
So, after copying the data, how can I verify that the uploaded files for a dataset were copied successfully?

Regards,

Michel
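
One generic way to check that a file copy succeeded, independent of Dataverse, is to build a checksum manifest of the storage directory on each server and compare the two. The sketch below assumes both directory trees can be read from wherever the script runs (in practice you would more likely build a manifest on each server and transfer it); `build_manifest` and `compare_manifests` are hypothetical names, and the check covers only the files, not the database.

```python
import hashlib
import os


def build_manifest(root, algorithm="sha1"):
    """Map each file's path (relative to `root`) to its checksum."""
    manifest = {}
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(directory, name)
            digest = hashlib.new(algorithm)
            with open(full_path, "rb") as handle:
                # Read in 1 MiB chunks to keep memory use flat for large files.
                for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full_path, root)] = digest.hexdigest()
    return manifest


def compare_manifests(source, copy):
    """Report files that are missing from, extra in, or different in the copy."""
    shared = set(source) & set(copy)
    return {
        "missing": sorted(set(source) - set(copy)),
        "extra": sorted(set(copy) - set(source)),
        "changed": sorted(path for path in shared if source[path] != copy[path]),
    }
```

An empty result in all three lists means every file in the source tree is present in the copy with identical content.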

Pete Meyer

Aug 11, 2017, 11:03:56 AM
to Dataverse Users Community
Hi Michel,

> The problem I want to solve is this:
> I have set up a load balancer for my Dataverse, and I want to synchronise the data on the two servers by copying the database and the uploaded files.
> So, after copying the data, how can I verify that the uploaded files for a dataset were copied successfully?

If the intention for load balancing is to have users making use of both at the same time, then I think that you'd be better off with shared file storage and a shared database for both application servers. Trying to keep two filestores and databases in sync when both are being changed seems to me like something that could run into problems.

Best,
Pete
 

Michel Bamouni

Aug 11, 2017, 11:50:25 AM
to Dataverse Users Community
Hi Peter,

My intention is not to have end users use the two servers at the same time.
My goal is to switch over when one server encounters a problem, and to synchronise the data overnight by copying it.