Problem generating comprehensive breakdown reports on large datasets


BN

Jan 9, 2023, 9:54:03 AM
to droid-list
I'm experiencing problems trying to generate a comprehensive breakdown report for large datasets. The generation process freezes at approximately 60% completion.

I'm not sure if it's the size of the dataset or a particular problem with a file or folder, but I'm experiencing the exact same problem on two different computers, on different networks and with different datasets.

I haven't had this problem before and it does not actually occur on smaller datasets.

I'm running Droid from my local computer, which is running Windows and Java 19. Removing the .droid6 folder didn't fix the problem either.

Don't hesitate to ask for further details if that would help.

Hope this problem can be easily fixed. :-)

Best Regards,

Bieke



Matt Palmer

Jan 10, 2023, 4:52:53 AM
to droid-list
I have seen this too many times. I believe it is a limitation of the XSLT technology that is used to generate reports. Large data sets grind it to a halt.

I think the fix would be to rewrite the report generation code without using XSLT.  It was chosen originally because it offered flexibility in producing many different kinds of report from the same data, by specifying different transformations.  And XML was the technology to rule them all back then.  In reality, the reports which were created over 10 years ago have not been changed or rewritten or added to, so the flexibility argument doesn't really stand up any more, IMHO.

As a workaround, you could try running lots of smaller reports (not comprehensive), or just export the data and create your own reports with some other technology.
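
A minimal sketch of the second workaround, assuming the profile has already been exported from DROID as a CSV file and that the export has columns named PUID, SIZE and TYPE. Those column names, the function names and the file name droid-export.csv are assumptions for illustration, so check the header row of your own export before relying on them. It simply counts files and total bytes per format identifier, which is a small part of what the comprehensive report shows:

```python
import csv
import sys
from collections import Counter, defaultdict


def breakdown_by_puid(rows):
    """Return (puid, file_count, total_bytes) tuples, most frequent first.

    `rows` is any iterable of dicts keyed by column name, for example a
    csv.DictReader over an exported CSV. Column names are assumptions.
    """
    counts = Counter()
    sizes = defaultdict(int)
    for row in rows:
        # Folder rows carry no format identification, so skip them.
        if (row.get("TYPE") or "").strip().lower() == "folder":
            continue
        puid = (row.get("PUID") or "").strip() or "(unidentified)"
        counts[puid] += 1
        size = (row.get("SIZE") or "").strip()
        if size.isdigit():
            sizes[puid] += int(size)
    return [(puid, n, sizes[puid]) for puid, n in counts.most_common()]


def breakdown_from_csv(path):
    """Read an exported CSV file and summarise it by PUID."""
    with open(path, newline="", encoding="utf-8") as f:
        return breakdown_by_puid(csv.DictReader(f))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python breakdown.py droid-export.csv
    for puid, n, total in breakdown_from_csv(sys.argv[1]):
        print(f"{puid}\t{n}\t{total}")
```

Because it streams the CSV one row at a time, a script like this should cope with exports far larger than the ones discussed here, though it is no substitute for the full report.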

Regards,

Matt

Matt Palmer

Jan 10, 2023, 4:56:18 AM
to droid-list
Note - it is also possible that it is something to do with filtering large data sets, which the comprehensive reports might specify.  I haven't proven it is XSLT that is to blame.  It could be the database queries.

BN

Jan 12, 2023, 3:05:25 AM
to droid-list
Hi Matt,

Thanks a lot for your reply.
I'm not sufficiently up to speed with the technology to know if the XSLT could be the problem, but I hope this is it and it's fixable.
We're about to send off a survey asking people to send us their Droid reports (among other things). I don't think they'll be willing to share the details of their data or generate several reports after completing our quite long survey. :-)

BR,

Bieke
On Tuesday, January 10, 2023 at 10:56:18 UTC+1, matt...@gmail.com wrote:

Matt Palmer

Jan 12, 2023, 3:28:17 AM
to droid...@googlegroups.com
I should point out that I do not work at the National Archives any more, so please don't take my reply as a brush off!  I helped build DROID 6 many years ago, so I'm familiar with the technology and some of the issues.

Matt 


Andrea Hricikova

Jan 24, 2023, 11:17:27 AM
to droid-list
Dear Bieke, 

Thank you for alerting us to this issue and apologies for the delay in our response. 

We noticed this issue some time ago, and some of our users have reported that it works better in the newest version of DROID (see "BUG: comprehensive report never completes", Issue #400 in digital-preservation/droid on github.com). Version 6.6 of DROID has not been officially released yet, but you can find the release candidate here: Central Repository: uk/gov/nationalarchives/droid-binary/6.6.0-rc2 (maven.org). If you are able to download this newest version of DROID (droid-binary-6.6.0-rc2-bin-win32-with-jre.zip [Java embedded] or droid-binary-6.6.0-rc2-bin.zip [without Java]), please scan the 'large datasets' that you mentioned and let us know whether the issue with creating the comprehensive report still occurs or has improved.

If it still occurs, it would be useful for us to know which version of DROID you are using, and whether it is the version with Java embedded for Windows or the one without Java. It would also be helpful to know roughly what constitutes a "large dataset": how many files are your users scanning to create a comprehensive report? If the issue persists, our team will investigate it further for future releases of DROID.

Best regards
Andrea

Andrea Hricíková
File Format Analyst
The National Archives, UK

Bieke Nouws

Feb 8, 2023, 11:23:13 AM
to droid...@googlegroups.com

Dear Andrea,

 

Thank you for taking the time to respond to my message, and my apologies in turn for not getting back to you sooner.

I found out that the breakdown report does complete in the end, if you wait long enough and ignore the frozen progress bar. This is splendid news. The difficulty in our case will be to convince external parties to wait for the result after they have already completed our survey. :-)

 

I’m not seeing a relevant difference between DROID 6.6 and DROID 6.5. Generating the comprehensive breakdown report took 14-15 minutes with both versions. I am using, and have been using, the version with Java included. We're talking about a dataset of around 36,000 files or 70 GB of data.

 

Anyway, since the problem has been reported before, I will just keep an eye on the existing conversation on the forum and check for updates regularly.

 

Many thanks!

 

Best regards,

 

Bieke


On Tue, Jan 24, 2023 at 17:17, 'Andrea Hricikova' via droid-list <droid...@googlegroups.com> wrote: