Recommended Hardware Spec for Arelle validation?


Mike D

Jun 11, 2025, 9:28:52 AM
to Arelle-users
Hi, folks,

Thanks for all the help to date on Arelle.  One thing that I've experienced and seen mentioned is that Arelle is reasonably computation-intensive, and it can take hours to run even on reasonably powerful hardware.  I am mostly interested in validating DORA registers of information xBRL-CSV files.

That said, I am trying to understand what a "recommended" spec would be for an Arelle processing machine - what params make a difference.

I'm running locally on my M1 MacBook Pro with 32GB of RAM (not super powerful, I know). Right now it's taking 12-24 hours to validate a not-terribly-complicated DORA package.

I see the Arelle GUI running at 100% of 1 CPU core and pretty much exactly 1GB of memory when processing, and up to 2-3GB of memory when working on views.  The Arelle CLI seems to top out at 100% of 1 CPU and ~2-3GB of memory (even when more is available).  This is a little concerning, since it seems to indicate Arelle won't max out the resources available on the machine, but I'm sure better hardware would help.  It's also possible the Mac version is less able to scale than on other OSes.

I don't really need the GUI, so would probably run from the CLI. Looking for a recommendation on:
* Which OS does Arelle run best on, if it matters?  I can get a Linux or Windows VM very easily.
* How many processor cores can Arelle scale to use?  Don't want to request 16 and have 15 sit idle
* How much memory can Arelle make use of for processing tasks?  
* Are there any config settings in the CLI or GUI that I should be aware of that would allow Arelle to use more resources?  There's nothing I can see that's obvious in the CLI docs.

Mike

Gregorio Mongelli

Jun 12, 2025, 7:29:51 AM
to Arelle-users
Hi Mike,

Here are some hints on what you are trying to do.

1. Arelle relies on Python (currently Python 3.9-Python 3.13) and runs on any platform where Python runs. In our experience, for validation purposes, the OS does not really matter. We almost exclusively use the Linux x64 version in production environments.
2. Arelle only uses a single core. This is why we mostly use a queuing system (e.g. RabbitMQ) where we call Arelle on demand. For some projects (with a fixed taxonomy) the queuing system calls Arelle in Web Service mode.
3. Arelle uses all the memory that it can use. We have some use cases where Arelle uses many gigabytes :-(
4. As far as I know, there are no specific settings to use more computation resources. In the past, there was a mode where it was possible to use a custom web server that was able to serve multiple requests in parallel in multiple OS processes. I think it is deprecated and should no longer be used. When I last tried it several years ago, it simply failed to do the expected validations.
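
Since each Arelle run stays on one core (point 2), extra cores only pay off when several reports are validated at the same time, each in its own process. Here is a minimal sketch of that idea in Python. It is not the RabbitMQ setup described above. It assumes the arelleCmdLine executable is on the PATH and uses only the --file and --validate options; the function names (build_arelle_command, run_one, run_many) are made up for illustration.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def build_arelle_command(report_path):
    # Assumption: arelleCmdLine is on PATH. Only --file and --validate are used.
    return ["arelleCmdLine", "--file", report_path, "--validate"]


def run_one(report_path, build_command=build_arelle_command):
    # Every call starts a separate OS process, so each validation can
    # occupy its own core even though a single Arelle run uses only one.
    result = subprocess.run(
        build_command(report_path), capture_output=True, text=True
    )
    return report_path, result.returncode


def run_many(report_paths, max_workers=4, build_command=build_arelle_command):
    # The threads only wait on child processes; the heavy work happens
    # in the children, so plain threads are enough here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: run_one(p, build_command), report_paths))
```

Calling run_many(["report_a.zip", "report_b.zip"], max_workers=2) would start two validations side by side (the file names are placeholders). This does not make a single report any faster, and every process loads its own copy of the taxonomy, so memory use grows with max_workers.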

In the hope that these empirical observations could be of some use for you,

Greg  

Real Human

Jun 16, 2025, 7:02:58 AM
to Arelle-users
Hi,

Just a question: why does it make sense to use Arelle in Web Service mode when the taxonomy is fixed? What does "fixed" mean?

Best regards
RH

Mongelli, Gregorio

Jun 16, 2025, 8:25:35 AM
to arelle...@googlegroups.com
Hi Real Human,

By "fixed taxonomy", I meant a taxonomy that is immediately usable and for which it is forbidden to build extension taxonomies on top of it.

Each taxonomy is made of "entry points": initial starting points having the form of URLs that define the comprehensive subset of concepts defined in the taxonomy that are allowed to be used in a given XBRL instance. 

Taxonomies can get very large. When a validating XBRL processor reads an XBRL instance, it must first read the "Discoverable Taxonomy Set" (DTS) derived from a taxonomy entry point, i.e. the exhaustive list of files that make up the taxonomy. Only then does it know how to interpret the XML elements from the XBRL instance.

For an entry point in a "fixed taxonomy", Arelle in web server mode can be tweaked to load the DTS only once and to transform/validate multiple XBRL instances that obey the same DTS.

The idea is to keep the same DTS in memory for multiple XBRL instances and thus to optimise the number of times a taxonomy file is read.

Please note that by default Arelle does not do that. You have to program it!
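
To illustrate the "load the DTS once, validate many instances" idea, here is a minimal sketch in Python. It deliberately does not call Arelle's API: load_dts and validate are hypothetical callables that you would have to write against Arelle yourself, and DtsCache and validate_batch are made-up names. It only shows the caching pattern.

```python
class DtsCache:
    """Keeps one loaded DTS per entry point for the life of the process."""

    def __init__(self, load_dts):
        # load_dts is a hypothetical callable: entry point URL -> loaded DTS.
        self._load_dts = load_dts
        self._loaded = {}

    def get(self, entry_point):
        # The expensive taxonomy load happens only on the first request
        # for a given entry point; later requests reuse the same object.
        if entry_point not in self._loaded:
            self._loaded[entry_point] = self._load_dts(entry_point)
        return self._loaded[entry_point]


def validate_batch(instances, cache, validate):
    # instances: iterable of (entry_point, instance_path) pairs.
    # validate is a hypothetical callable: (dts, instance_path) -> result.
    return [validate(cache.get(entry_point), path) for entry_point, path in instances]
```

With a fixed taxonomy the set of entry points is small and known in advance, so a long-running process (such as a web service) can keep the cache warm across requests. With extension taxonomies every filing may bring its own DTS, and the cache would rarely hit.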

The EBA taxonomy, the one containing the entry point for DORA reporting, is one of those "fixed taxonomies".

Best regards,

Greg


