Processing bulk data from Companies House


Yowok YusYKLA

Jul 11, 2024, 11:20:42 AM
to Arelle-users
Hello Community,

We are working on a project that uses the daily iXBRL data files from companieshouse.gov.uk (e.g. https://download.companieshouse.gov.uk/en_accountsdata.html).

We have been using Arelle to convert each iXBRL file into an OIM xBRL-JSON file for subsequent processing, after checking that the file is free of errors.

Arelle has performed the job amazingly well. However, there are many (~30,000) iXBRL data files daily, so the job takes hours to complete: about 4 seconds to load each iXBRL file and another 3 seconds to validate it.

Below is the command that we have been using.

"%ARELLE%" --internetConnectivity=offline --validate ^
  --logFile=D:\Log\Log_Prod223_3732_SC764133_20240405.json ^
  --packages="D:\Taxonomy\The_2023_Taxonomy_suite_v1.0.1.zip|D:\Taxonomy\FRC-2022-Taxonomy.zip|D:\Taxonomy\FRC-2024-Taxonomy-v1.0.0_GJp67Do.zip" ^
  --plugins saveLoadableOIM ^
  --saveLoadableOIM D:\OIM_JSON\Prod223_3732_SC764133_20240405.json ^
  --file "D:\2024\Accounts_Bulk_Data-2024-07-02\Prod223_3732_SC764133_20240405.html"

Is there anything we could do to improve the performance of this job?

Thank you.

Regards,
Yowok

Austin Matherne

Jul 11, 2024, 1:19:03 PM
to Arelle-users
Hi Yowok,

The performance metrics you've provided are in line with what I would expect from Arelle. Users who need to handle large-scale XBRL processing typically either run a work queue with a cluster of Arelle services that can dynamically scale up and down with the required load, or use an XBRL processor from one of the many vendors that have an optimized implementation written in C++/Java/C#/etc. (although in this case you would likely still need multiple instances of the processor).
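[Editor's note: a single-machine approximation of the work-queue approach is to run several Arelle CLI invocations in parallel with a process pool, since each file is independent. The sketch below is not Arelle's own API — `ARELLE_CMD`, the worker count, and the output directory layout are assumptions mirroring Yowok's command; adjust them to your environment.]

```python
"""Sketch: parallel Arelle conversions via a local process pool (assumptions noted above)."""
import subprocess
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path, PureWindowsPath

ARELLE_CMD = r"C:\Arelle\arelleCmdLine.exe"  # assumed launcher path; set to your %ARELLE%
PACKAGES = "|".join([
    r"D:\Taxonomy\The_2023_Taxonomy_suite_v1.0.1.zip",
    r"D:\Taxonomy\FRC-2022-Taxonomy.zip",
    r"D:\Taxonomy\FRC-2024-Taxonomy-v1.0.0_GJp67Do.zip",
])

def build_command(html_file: str) -> list[str]:
    """Build the same flags as the original one-file command."""
    stem = PureWindowsPath(html_file).stem
    return [
        ARELLE_CMD,
        "--internetConnectivity=offline",
        "--validate",
        f"--logFile=D:\\Log\\Log_{stem}.json",
        f"--packages={PACKAGES}",
        "--plugins", "saveLoadableOIM",
        "--saveLoadableOIM", f"D:\\OIM_JSON\\{stem}.json",
        "--file", html_file,
    ]

def convert(html_file: str) -> int:
    """Run one Arelle conversion in a subprocess; return its exit code."""
    return subprocess.run(build_command(html_file)).returncode

if __name__ == "__main__":
    files = [str(p) for p in Path(r"D:\2024\Accounts_Bulk_Data-2024-07-02").glob("*.html")]
    # One worker per core is a reasonable starting point; each worker pays
    # the taxonomy-package load cost independently, so tune max_workers.
    with ProcessPoolExecutor(max_workers=8) as pool:
        for f, rc in zip(files, pool.map(convert, files)):
            if rc != 0:
                print(f"FAILED ({rc}): {f}")
```

With 7 seconds per file, 8 parallel workers would bring ~30,000 files from roughly 58 hours down to the order of 7, assuming the machine has the cores and memory to sustain that many Arelle processes.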

Kind regards,
Austin Matherne