Extracting XBRL data

Martin Hegedus

Jul 18, 2014, 2:22:22 PM

to arelle...@googlegroups.com

Hi,

This is my first time dealing with XBRL, and I am a complete newbie with it. I am familiar with Python and know a little XPath.

My objective is to determine averages and standard deviations for various financial values such as revenue and net income from EDGAR data.

For example, I've downloaded KLA Tencor's (KLAC) latest 10-Q, which includes the .xml, .xsd, _cal, _def, _lab, and _pre files. I started the Arelle GUI (I'm running under SUSE and downloaded the source code with git) and imported the KLAC files to get a basic idea of their contents. That worked very nicely. The QNames for the items I'm interested in are us-gaap:SalesRevenueNet and us-gaap:NetIncomeLoss, under the us-gaap_IncomeStatementAbstract "table".

So how do I use Arelle's Python API, or the command line, to retrieve KLAC's revenue and net income from the 10-Q/K, along with the date and the period type (quarter or year) associated with each value? I would like the process to be automated to the extent possible.

Thanks,

Martin

Herm Fischer

Jul 28, 2014, 4:54:36 AM

to arelle...@googlegroups.com

There are a couple of ways to retrieve this from Python or the command line, though neither is likely to be easy to work with for a large volume of data. The command line can save fact lists in different formats (such as CSV), which some people use for exact matches on fact QNames. The API can do the same (I believe there are examples of loading instances and retrieving facts by QName from the instance).
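As a rough illustration of the API route, here is a minimal sketch, not an official Arelle example. It assumes Arelle is importable as the `arelle` package and relies on `Cntlr.Cntlr`, `modelManager.load`, `modelXbrl.facts`, and the context attributes `isStartEndPeriod`, `startDatetime`, `endDatetime`, and `qnameDims`. The function names `classify_period` and `extract_facts`, and the day thresholds, are hypothetical choices made for this sketch.

```python
from datetime import datetime


def classify_period(start, end):
    """Label a duration 'Q' (about a quarter), 'FY' (about a year), or 'other'.

    The day ranges are assumptions; fiscal quarters and years vary by filer.
    """
    days = (end - start).days
    if 80 <= days <= 100:
        return "Q"
    if 350 <= days <= 380:
        return "FY"
    return "other"


def extract_facts(instance_path, wanted):
    """Return (name, value, start, end, period label) for facts named in `wanted`."""
    # Imported lazily so classify_period can be used without Arelle installed.
    from arelle import Cntlr

    cntlr = Cntlr.Cntlr(logFileName="logToPrint")
    model_xbrl = cntlr.modelManager.load(instance_path)
    rows = []
    for fact in model_xbrl.facts:
        ctx = fact.context
        # Keep only duration facts without dimensions (no segment breakdowns).
        if ctx is None or ctx.qnameDims or not ctx.isStartEndPeriod:
            continue
        # Assumption: the instance uses the conventional "us-gaap" prefix;
        # matching on the namespace URI would be more robust.
        name = "{}:{}".format(fact.qname.prefix, fact.qname.localName)
        if name in wanted:
            rows.append((name, fact.value, ctx.startDatetime, ctx.endDatetime,
                         classify_period(ctx.startDatetime, ctx.endDatetime)))
    cntlr.modelManager.close()
    return rows
```

It could be called as `extract_facts("path/to/instance.xml", {"us-gaap:SalesRevenueNet", "us-gaap:NetIncomeLoss"})`, where the path is a placeholder for the downloaded instance document. A 10-Q typically reports both quarterly and year-to-date durations, so the period label helps tell them apart.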

However, I'd think using the XBRL database might be another way to access the entire set of filings. There's an XBRL database documentation page, along with downloads of pre-loaded database dumps, and that's an alternate way of directly accessing the facts in filings.

In any case, we're currently involved in several efforts to normalize data. Individual filers have quite a lot of choice in how they tag facts, and it may take an understanding of the presentation structure to (a) find how SalesRevenueNet and NetIncomeLoss got tagged, and (b) determine which schedules might have reported the values needed.
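One simple way to cope with the tagging variation, short of full normalization, is to try a list of candidate concepts in priority order. The sketch below is only an illustration: the candidate lists and their ordering are assumptions rather than an authoritative mapping, the helper `first_tagged` is hypothetical, and the sample values are made up.

```python
# Candidate concepts in priority order (an assumed, incomplete list).
REVENUE_CANDIDATES = [
    "us-gaap:SalesRevenueNet",
    "us-gaap:Revenues",
    "us-gaap:SalesRevenueGoodsNet",
]
NET_INCOME_CANDIDATES = [
    "us-gaap:NetIncomeLoss",
    "us-gaap:ProfitLoss",
]


def first_tagged(facts, candidates):
    """Return (qname, value) for the first candidate present in `facts`, else None."""
    for qname in candidates:
        if qname in facts:
            return qname, facts[qname]
    return None


# Hypothetical filing that tags revenue as Revenues rather than SalesRevenueNet.
filing = {"us-gaap:Revenues": 1000.0, "us-gaap:NetIncomeLoss": 150.0}
revenue = first_tagged(filing, REVENUE_CANDIDATES)
net_income = first_tagged(filing, NET_INCOME_CANDIDATES)
```

This does not handle company-defined extension elements; those would still need the presentation structure to identify.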

Herm

Martin Hegedus

Aug 1, 2014, 3:51:08 PM

to arelle...@googlegroups.com

Hi Herm,

Thanks for your response. My current approach is to use the ViewFileFactTable method as a template for my own data extraction method, and I've got that working. I tried to figure out how XPath and Sphinx could be used to extract the data, but was unsuccessful.

Yes, as you mentioned, figuring out how a company tags something is challenging, since many tags can be used for the same concept. Some of the tags are even user-defined. I'm not sure how to deal with that, other than getting as much working as I can.

Thanks
Martin

arlu...@gmail.com

Aug 8, 2014, 12:03:58 AM

to arelle...@googlegroups.com

Hi Herm

I uploaded the SEC filings using the RSS feed. How do I extract, from the XBRL database, the XML IDs of all the income statements in the uploaded 10-K forms?

Thanks
