Need to Unify Arelle JSON/CSV fields with XBRL terms/designations

Sean Smith

Sep 12, 2025, 3:37:05 PM

to Arelle-users

I'm very new to Arelle. I'm using it to extract and transform data for use with AI LLMs.

I'm vibe coding with GPT 5 to get source SEC filings extracted to XBRL-JSON and CSV, then consolidating the CSVs to reduce the table count.

While doing so, I've encountered some problems that were frustrating and took a while to work around.

But one that just cropped up has me thinking I need to post here.

I got all XBRL facts output to a single master CSV fine. I wanted to add more data from the linkbase files, to provide more contextual datapoints for the AI to have access to.

Apparently Arelle outputs different column names/terms than what is standard in XBRL. Given the role that AI LLMs are going to have, I think the Arelle team should change Arelle to output exact XBRL terminology, so that when an AI LLM reviews Arelle's JSON/CSV output, no semantic interpretation is required. In my experience, AI LLMs presented with tabular data hallucinate more frequently on tasks involving semantic interpretation; exact terms with exact matches would improve AI LLM data retrieval and analysis, in my opinion.

Here's an explanation from GPT 5 about this issue:

What that means in practice

Concepts CSV: You’ll see headers like Name, Type, Period Type, Balance, Abstract, Label, Preferred Label, Id.
These correspond to the XBRL concept QName, dataType, periodType, balance, abstract, labels, etc. Arelle just uses human-friendly column names (e.g., “Period Type” instead of periodType).
Relationships (arcs) CSV (from the viewArcrole exports): headers like From, To, Link Role, Arcrole, Order, Weight.
These map to XBRL linkbase relationships: from-concept, to-concept, linkrole URI, arcrole URI, order, weight.
Dimensions view: Axis, Member, Domain, Hypercube, which map to XBRL dimensional constructs (dimension, domain-member, hypercube).

So: Arelle is naming columns for readability, not enforcing spec names; the underlying data is standard XBRL, but header strings aren’t standardized by XBRL or by the SEC.

Again, I'm a complete newbie to all of this, so if I'm off base, then okay. However, if I'm not, then I think the Arelle project should consider making an effort to correct this problem and make JSON/CSV outputs strictly adhere to the existing XBRL terms/definitions.

Austin Matherne

Sep 15, 2025, 10:27:25 AM

to Arelle-users

Hi,

Thanks for sharing your experience. It’s always helpful to hear how new users approach Arelle, especially with AI/LLM use cases.

You’re right that Arelle’s CSV/JSON export uses human-readable column names rather than strictly the XBRL spec terminology. Accountants, auditors, and other non-developer users are often confused by the use of strict technical spec terms. Arelle uses terms that are more likely to be understood by the majority of users. The underlying data is standard XBRL, but the headers aren’t prescribed by the spec, so Arelle chose readability. That said, your point about minimizing semantic interpretation for LLMs is a good one.

There’s an active effort in the XBRL International Open Information Model Working Group to define a taxonomy format that’s specifically designed to be digestible by LLMs and other machine-learning systems. Once the specification is finalized, Arelle will support it. Paul Warren gave a preview of this work at the Digital Reporting in Europe 2025 conference earlier this year. You can find the slides and video here: under “Next Steps for the XBRL Standard: A preview of the OIM Taxonomy.”

So in the future, you should have exactly the kind of strict, spec-aligned, LLM-friendly formats you’re looking for. In the meantime, if you need machine-strict naming, you can often adjust the column headers yourself in post-processing, since the mapping to spec concepts (periodType, balance, etc.) is consistent.
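As a rough illustration of that post-processing step, here is a minimal Python sketch that renames the header row of a concepts CSV. The header strings and target names are taken from the mapping quoted earlier in this thread, not from Arelle's documentation, so treat HEADER_MAP and the sample row as assumptions and check them against the headers your own export actually produces. The rename_headers helper is hypothetical, not part of Arelle.

```python
import csv
import io

# Assumed mapping from human-readable export headers to spec-style names.
# Taken from the explanation quoted earlier in the thread; verify against
# the headers in your own export before relying on it.
HEADER_MAP = {
    "Name": "qname",
    "Type": "dataType",
    "Period Type": "periodType",
    "Balance": "balance",
    "Abstract": "abstract",
}


def rename_headers(src, dst, header_map=HEADER_MAP):
    """Copy CSV rows from src to dst, renaming only the header row.

    Headers that are not in header_map are passed through unchanged.
    Returns the renamed header list.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: nothing to write
        return []
    renamed = [header_map.get(col.strip(), col) for col in header]
    writer.writerow(renamed)
    writer.writerows(reader)  # data rows are copied as-is
    return renamed


# Illustrative sample data, not real Arelle output.
sample = io.StringIO(
    "Name,Type,Period Type,Balance,Abstract,Label\n"
    "us-gaap:Assets,xbrli:monetaryItemType,instant,debit,false,Assets\n"
)
out = io.StringIO()
new_header = rename_headers(sample, out)
print(out.getvalue())
```

For real files, the same function works with file objects opened with newline="" in place of the StringIO buffers. Keeping the mapping in one dictionary means it can be extended or corrected in a single place if your export uses different header strings.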

Thanks again for raising this. Your use case is in line with the issues that the standards community is working to solve.