I'm very new to Arelle. I'm using it to extract and transform data for use with AI LLMs.
I'm vibe coding with GPT 5 to get source SEC filings extracted to XBRL-JSON and CSV, then consolidating the CSVs to reduce the table count.
While doing so, I've encountered some problems which were frustrating and took a while to work around.
But one that just cropped up has me thinking I need to post here.
I got all XBRL facts output to a single master CSV fine. I wanted to add more data from the linkbase files, to provide more contextual datapoints for the AI to have access to.
Apparently Arelle outputs column names/terms that differ from the standard XBRL ones. Given the role that AI LLMs are going to have, I think the Arelle team should change Arelle to output exact XBRL terminology, so that when an AI LLM reviews Arelle's JSON/CSV output, no semantic interpretation is needed. In my experience, AI LLMs presented with tabular data hallucinate more frequently on tasks involving semantic interpretation; exact terms with exact matches would improve AI LLM data retrieval and analysis, in my opinion.
Here's an explanation from GPT 5 about this issue:
What that means in practice
- Concepts CSV: You’ll see headers like Name, Type, Period Type, Balance, Abstract, Label, Preferred Label, Id.
  These correspond to the XBRL concept QName, dataType, periodType, balance, abstract, labels, etc. Arelle just uses human-friendly column names (e.g., “Period Type” instead of periodType).
- Relationships (arcs) CSV (from the viewArcrole exports): headers like From, To, Link Role, Arcrole, Order, Weight.
  These map to XBRL linkbase relationships: from-concept, to-concept, linkrole URI, arcrole URI, order, weight.
- Dimensions view: Axis, Member, Domain, Hypercube, which map to XBRL dimensional constructs (dimension, domain-member, hypercube).
So: Arelle is naming columns for readability, not enforcing spec names; the underlying data is standard XBRL, but header strings aren’t standardized by XBRL or by the SEC.
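In the meantime, a workaround I could probably live with is renaming the headers myself after export. Here's a minimal sketch of that idea, using only Python's standard `csv` module. To be clear about what is assumed: the left-hand header strings are just the ones quoted in the explanation above (I haven't verified them against every Arelle view), the right-hand names are my own pick of XBRL attribute names, and `HEADER_MAP` and `rename_headers` are hypothetical names, not anything Arelle provides.

```python
import csv
import io

# Assumed mapping: keys are the header strings quoted in the explanation
# above; values are XBRL attribute names chosen for this sketch.
HEADER_MAP = {
    "Name": "name",
    "Type": "type",
    "Period Type": "periodType",
    "Balance": "balance",
    "Abstract": "abstract",
    "Label": "label",
    "Preferred Label": "preferredLabel",
    "Id": "id",
    "From": "from",
    "To": "to",
    "Link Role": "role",
    "Arcrole": "arcrole",
    "Order": "order",
    "Weight": "weight",
}


def rename_headers(src, dst, header_map=HEADER_MAP):
    """Copy CSV rows from src to dst, renaming only the header row.

    Headers missing from header_map are passed through unchanged.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to write
    writer.writerow([header_map.get(h.strip(), h) for h in header])
    writer.writerows(reader)


if __name__ == "__main__":
    sample = io.StringIO("Name,Period Type,Balance\nus-gaap:Assets,instant,debit\n")
    out = io.StringIO()
    rename_headers(sample, out)
    print(out.getvalue(), end="")
```

For real files, `src` and `dst` would be file objects opened with `newline=""`. This only touches the header row, so the data itself stays exactly as Arelle wrote it.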
Again, I'm a complete newbie to all of this, so if I'm off base, then okay. However, if I'm not, then I think the Arelle project should consider making an effort to correct this problem and make JSON/CSV outputs strictly adhere to the existing XBRL terms/definitions.