Merge Compustat Fundamentals Annual and Quartely

910 views
Skip to first unread message

diego

unread,
Jul 18, 2017, 2:48:16 PM7/18/17
to wrdssas
Hi all,

I want to construct a data set using firm-level balance sheet data from Compustat. The dataset should be at the quarterly frequency. However, some variables I need (such as the number of employees) are only available in the Fundamentals annual database, which is why I want to merge data from Compustat Fundamentals Quarterly with data from Compustat Fundamentals Annual. In principle, such a merge should be straightforward as the two databases feature the same identifier and time variable.

Here is what I did: I downloaded the data using SAS Studio. To produce one record per gvkey-datadate pair, I screen on indfmt = 'INDL', datafmt ='STD', popsrc='D', and consol='C'. Furthermore, I take the fiscal view (i.e. I drop observations where 'datafqtr' is missing) when downloading the quarterly data. Strangely, this does not produce unique gvkey-datadate pairs in case of the quarterly data: I get a number of duplicates. This is confusing as on the WRDS query page, it is stated that these screening values should produce one record per gvkey-datadate pair. Only when I drop observations for which 'datacqtr' is missing as well, I am able to get a dataset free of duplicates in terms of gvkey-datadate. In a next step, I merge the quarterly data with the annual data using an 1:1 merge on gvkey and datadate.

Does this sound like the correct way of merging Fundamentals Annual and Quarterly data from Compustat or is there a better way to merge the two? Why is screening on 'datafqtr' not sufficient (fiscal view) and what is the interpretation of screening over both 'datafqtr' and datacqtr'?

Many thanks for your help!

Best,

Diego

diego

unread,
Aug 3, 2017, 2:53:41 PM8/3/17
to wrdssas
After consulting with the S&P support I did the following:

For the quarterly data, I screen on indfmt = 'INDL', datafmt ='STD', popsrc='D', and consol='C' and take the fiscal view. The observations in the quarterly data are then identified by gvkey-datadate-fyr. For the annual data I screen on indfmt = 'INDL', datafmt ='STD', popsrc='D', and consol='C'. The observations are then identified by gvkey-datadate (and thus also gvkey-datadate-fyr). Finally, I perform an 1:1 merge on gvkey-datadate-fyr. The merge seems to work fine. There are some observations from the annual (using) dataset that could not be merged. This is because some companies feature longer history in the annual table or only have values in the annual table.

So I think this is fine as far as the merge is concerned. What is still
unclear to me is the interpretation of the quarterly data when I drop missings for both datacqtr and datafqtr to obtain a dataset that is uniquely identified by gvkey-datadate instead of gvkey-datadate-fyr. In the documentation it is just stated that we are taking the 'fiscal view' if we drop missings for 'datafqtr' and we take the calendar view if we drop missings for 'datacqtr'. Does anyone know whether it is legitimate to do both? Or how can I obtain a dataset that is uniquely identified by gvkey-datadate?

Many thanks for the help!

Best,

Diego
Reply all
Reply to author
Forward
0 new messages