How can I programmatically process iXBRL in bulk?


Paul Tibbetts

Jul 17, 2016, 10:38:47 AM
to Arelle-users
Hey,

I'm trying to extract financial information for companies from the daily files that Companies House (UK) provides at http://download.companieshouse.gov.uk/en_accountsdata.html

My original plan was to download the daily files to an Ubuntu server and then call the web API to extract the relevant information. I've been able to install Arelle on an Ubuntu server and use its web server to process these files. However, on a server with 1 GB of RAM each file was taking ~7 seconds to load (which would take about 30 days to process one day's worth of files), and I'd like to automate the processing of these files (around 6,000 are released each day) so that each day's batch finishes within a single day and I can stay up to date.

I've read that I can use Arelle in CGI mode behind a web server, which would enable me to process multiple files at once. I also found that Arelle has the XBRL-DB plugin; however, I could not understand how, or whether, I could use it for my purpose.
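One way to process several files at once without CGI is to start multiple `arelleCmdLine` processes from a small Python driver. The following is a minimal sketch, not a tested recipe: it assumes `arelleCmdLine` is on the PATH and that the `--facts` option (which, to my knowledge, writes a fact list to a file) behaves as `arelleCmdLine --help` describes. The function names and directory layout are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_facts_command(ixbrl_path, out_dir, arelle_cmd="arelleCmdLine"):
    """Build one Arelle command that writes a document's fact list to a CSV file."""
    # Assumption: --facts writes the fact list to the given file.
    out_csv = Path(out_dir) / (Path(ixbrl_path).stem + ".csv")
    return [arelle_cmd, "-f", str(ixbrl_path), "--facts", str(out_csv)]


def run_one(cmd):
    """Run one command and return (input file, exit code)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return cmd[2], result.returncode


def run_batch(paths, out_dir, workers=4, runner=run_one):
    """Process files concurrently; each worker thread waits on its own Arelle process."""
    commands = [build_facts_command(p, out_dir) for p in paths]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(runner, commands))
```

Calling `run_batch(sorted(Path("daily").glob("*.html")), "facts_csv")` would process a hypothetical `daily` directory. Each process loads its own copy of the taxonomy, so on a 1 GB server the worker count is more likely to be limited by memory than by CPU.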

  1. Could someone please explain the role of XBRL-DB? I am ultimately trying to get financial data into a database. Could it be used to process the files in bulk and move the extracted facts into a database for me? I tried to get this working but could not get Arelle to keep the XBRL-DB plugin active: it reported that the plugin had been activated, but when I listed the active plugins the response was empty, so I was unable to test this.
  2. Should I continue trying to run Arelle in CGI mode behind a web server? Has anyone done this before? (I tried following the documentation, but I have not dealt with CGI before and could not get it to work.)
  3. Am I better off using Arelle's Python code instead? I have been able to return a document's facts using some Python code, but I haven't yet worked out how to extract single facts (and their context). The website is currently down, and the documentation simply lists Python modules. (Python is also pretty new to me!)
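For pulling out single facts together with their context, a rough illustration using only the Python standard library (not Arelle's API) is shown below. It relies on the fact that iXBRL is XHTML in which facts are tagged as `ix:nonFraction` and `ix:nonNumeric` elements carrying `name` and `contextRef` attributes. It assumes the file is well-formed XML and returns the displayed text only; it does not apply the `scale`, `sign` or `format` attributes, which Arelle handles. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Inline XBRL 1.0 and 1.1 namespaces; a filing may use either one.
IX_NAMESPACES = (
    "http://www.xbrl.org/2008/inlineXBRL",
    "http://www.xbrl.org/2013/inlineXBRL",
)
XBRLI = "http://www.xbrl.org/2003/instance"


def extract_facts(xhtml_text):
    """Return a list of dicts, one per tagged fact, with its context period."""
    root = ET.fromstring(xhtml_text)

    # Map each context id to its period: an instant, or "startDate/endDate".
    periods = {}
    for ctx in root.iter(f"{{{XBRLI}}}context"):
        period = ctx.find(f"{{{XBRLI}}}period")
        dates = [] if period is None else [el.text.strip() for el in period if el.text]
        periods[ctx.get("id")] = "/".join(dates)

    # Facts are grouped by namespace and tag, not returned in document order.
    facts = []
    for ns in IX_NAMESPACES:
        for tag in ("nonFraction", "nonNumeric"):
            for el in root.iter(f"{{{ns}}}{tag}"):
                facts.append({
                    "name": el.get("name"),
                    "contextRef": el.get("contextRef"),
                    "period": periods.get(el.get("contextRef")),
                    "unitRef": el.get("unitRef"),
                    "text": "".join(el.itertext()).strip(),
                })
    return facts
```

A single fact can then be selected by its concept name, for example `[f for f in extract_facts(text) if f["name"] == wanted_name]`. Files that use HTML entities not defined in XML will fail to parse with this approach.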

Thanks in advance for any help or guidance. I'm new to XBRL, and whilst I've found various parsers for documents filed using US-GAAP, I have been unsuccessful in getting relevant information out of iXBRL using UK-GAAP.


- Paul

Dave Cook

Jan 2, 2017, 12:54:16 PM
to Arelle-users
Hey Paul:

I'm about to head down this road as well. Have you made any further progress? I'm attempting to shove the Edgar XBRL data into a database. 

Cheers,
Dave

lasand...@yahoo.com

Mar 8, 2017, 1:19:42 PM
to Arelle-users
Hi Paul

Have you figured out XBRL-DB? Please share if so.

Thanks.




John Blough

Mar 16, 2017, 2:57:55 PM
to Arelle-users
The SEC.gov site does offer some advice, but I haven't even figured that out. I would be interested in seeing the Python code that extracts some of the info you mentioned. Does every company provide an XBRL document, or is this some new initiative?

Any collaboration here in this group would be awesome.



Dave Cook

Apr 27, 2017, 4:45:45 PM
to Arelle-users
Hi Guys:

If you're still interested I have this going now. I've been loading for the last month or so. I do get the odd "Service not available" from the Edgar site. So what I did was:

1) Install the latest Postgres for Windows on my Windows 10 laptop (1 TB HD, 8 GB RAM).
2) Load the .ddl schema, xbrlPublicPostgresDB.ddl, into Postgres (from https://github.com/Arelle/Arelle/tree/master/arelle/plugin/xbrlDB):
  - start psql shell
  - create database edgar;
  - \c edgar
  - \i 'c:\path\to\ddl\file' (the output fills many screens; keep pressing the space bar until the schema load finishes)
3) Install the latest Arelle - http://arelle.org/download/
4) Open a command prompt
5) cd c:\Program Files\Arelle
6) arelleCmdLine -f "https://www.sec.gov/Archives/edgar/monthly/xbrlrss-2017-04" -v --store-to-XBRL-DB "localhost,5432,postgres,mypassword,edgar,90,postgres" --plugins xbrlDB

This will load all filings for April 2017 into Postgres. Just rinse and repeat for all the other xbrlrss files. I'm finding that RSS files over 16 MB in size can choke towards the end; I'm not sure if this is me or if the SEC is cutting me off. In those cases I've gone to loading each filing by hand using the Arelle GUI. Bottom line: it's an arduous process.
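The "rinse and repeat" step could be scripted roughly as follows. This is only a sketch under stated assumptions: the feed URL pattern and command flags are copied from the command in step 6 (with the `.xml` suffix that later replies in this thread report is needed), the helper names are hypothetical, and the retry logic assumes a non-zero exit code signals a failed load, which may not hold for every Arelle error.

```python
import subprocess
import time

# URL pattern taken from the command above, with the .xml suffix added.
FEED_URL = "https://www.sec.gov/Archives/edgar/monthly/xbrlrss-{year:04d}-{month:02d}.xml"


def monthly_feed_urls(start, end):
    """Yield one feed URL per month from start to end inclusive; both are (year, month)."""
    year, month = start
    while (year, month) <= end:
        yield FEED_URL.format(year=year, month=month)
        month += 1
        if month > 12:
            year, month = year + 1, 1


def build_load_command(feed_url, db_connection, arelle_cmd="arelleCmdLine"):
    """Build the same command as step 6 for a given feed URL."""
    return [arelle_cmd, "-f", feed_url, "-v",
            "--store-to-XBRL-DB", db_connection, "--plugins", "xbrlDB"]


def load_with_retry(cmd, attempts=3, wait_seconds=60, run=subprocess.run, sleep=time.sleep):
    """Run the command, retrying after a pause; return True once a run exits with 0."""
    for attempt in range(1, attempts + 1):
        # Assumption: exit code 0 means the load succeeded.
        if run(cmd).returncode == 0:
            return True
        if attempt < attempts:
            sleep(wait_seconds)
    return False
```

Looping over `monthly_feed_urls((2016, 1), (2017, 4))` and passing each URL through `build_load_command` and `load_with_retry` would cover a range of months; checking the Arelle log for each run is probably a more reliable success test than the exit code alone.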

Once I have a good amount of data I'd like to set up APIs to access the data and let it loose on the world...

Cheers,
Dave

Dave Cook

Apr 27, 2017, 4:51:41 PM
to Arelle-users
Hi:

Sorry for the extra post but I also found a great little project for downloading SEC filings and plucking out facts - https://github.com/chrisspen/django-sec. It works quite well.

Cheers,
Dave



JOHN CHAN

Jun 7, 2019, 11:23:29 PM
to Arelle-users
This works!

Just add .xml to the end of the HTTP link.

arden liu

Apr 3, 2020, 11:12:42 PM
to Arelle-users
I did not know arelleCmdLine could load an xbrlrss link directly. I wrote some Java code to download those SEC files and used arelleCmdLine to load them into the database one by one.

Herm Fischer

Apr 7, 2020, 8:42:15 PM
to arelle...@googlegroups.com
Here is a command line that one of our users runs to load RSS files directly (with some parameter substitution):

python3 arelleCmdLine.py -f http://www.sec.gov/Archives/edgar/monthly/xbrlrss-${yymm}.xml --xdgConfigHome /home/arelle/cache/ --disclosureSystem efm-strict-all-years -v --plugins 'validate/EFM|validate/DQC_US_Rules_v4|xbrlDB/ext/edgar.py|validate/USBestPractices.py' --store-to-XBRL-DB '{host},8084,{userid},{password},open_db,1200,pgOpenDB,skipLoadedFilings' --noCertificateCheck >> /home/arelle/DBdev/log/open-pg-${yymm}.log 2>&1 &
