Open Data vs. opensecrets.org industry amount mismatch

132 views
Skip to first unread message

greg michalec

unread,
Nov 27, 2009, 10:41:50 PM11/27/09
to OpenSecrets Open Data
Hi -
We're working on updating the numbers on oilmoney.priceofoil.org with
the Open Data data, and are having some problems making our numbers
match those posted on the opensecrets website. What we are trying to
get is an accurate representation of all contributions made to
congress members by the Oil and Gas industry, but when we tally up our
transaction amounts, a significant number of them differ from what's
on the CRP website.

Here's an example: For Ron Packard (N00007009 / H2CA43047), in 2000,
we have 7 contributions made to him
(0490782,0468041,0596189,0665787,0307230,0278526,0315402) for a total
of $6198 (all pac money). If we cross-reference this against the CRP
website (http://www.opensecrets.org/politicians/industries.php?
cid=N00007009&cycle=2000&type=I&newMem=N&recs=0), we see him listed
with a total of $7198.

Another example:
Terry Everett (N00003083 / H2AL02067), again in 2000. From the queries
we ran against the open data, we have a total of 12 transactions
(0876955,1737702,1737701,0723894,0723893,0234180,0234179,0481614,0100386,0565461,1524755,2081732)
for a total of $6500 ($5000 in 6 pac contribs, $1500 in 6 individual
contribs). The site (http://www.opensecrets.org/politicians/
industries.php?cid=N00003083&cycle=2000&type=I&newMem=N&recs=0) shows
him at $7500 Oil & Gas contribs - $6500 pac and $1000 individual.

I can only assume that this discrepency is due to a) our queries
against the open data being incorrect, b) the data used for the
website totals is somehow different than what is available for
download, or c) a combination of the two. My money is on our queries
being wrong, so if anyone could offer us any guidance, we would
greatly appreciate it.

Here are the queries that we are using to flag the contributions we
are interested in:
1) Individual contributions to Candidate Committees
select distinct i.FECTransID as fec_id, Orgname as company_name,
Party, i.Date as date, Amount, i.RecipID as recipient_id, feccandid
as candidateid, PacShort as recipientname, i.cycle as cycle, industry
as indshort , catname as catshort, i.microfilm as mflocations, if
(i.source = 'gen', 'g', 'i') as type, concat(i.FECTransID, '_',
substring(i.cycle, 3, 2)), i.realcode FROM indivs00 i left join
cmtes00 c on i.recipid = c.CmteID join categories b on i.realcode =
b.catcode where i.recipid like 'c%' and (primcode not like 'z4%' or
primcode is null) and i.recipcode not like 'p%' and i.type in (10, 11,
15, '15E', '15J', '22Y') group by FECTransID");

2) Individual contributions to Candidates
select distinct i.FECTransID, Orgname, Party, i.Date, Amount,
i.RecipID, FECCandID, FirstLastP, i.cycle, industry, catname,
i.microfilm, if(i.source = 'gen', 'g', 'i') as type, concat
(i.FECTransID, '_', substring(i.cycle, 3, 2)), i.realcode FROM
indivs00 i left join cands00 c on i.recipid = c.cid join categories b
on i.realcode = b.catcode where i.recipid like 'n%' and i.recipcode
not like 'p%' and i.type in (10, 11, 15, '15E', '15J', '22Y') group by
FECTransID");

3) PAC contributions to Candidates
select distinct i.FECRecNo as fec_id, e.pacshort as company_name,
c.Party, i.Date as date, Amount, i.cid as recipient_id, c.feccandid,
FirstLastP as recipientname, i.cycle as cycle, industry as indshort ,
catname as catshort, null as mflocations, 'c' as type, concat
(i.FECRecNo, '_', substring(i.cycle, 3, 2)), i.realcode FROM pacs00 i
join cmtes00 e on pacid = e.cmteid left join cands00 c on i.cid =
c.cid join categories b on i.realcode = b.catcode where i.cid like 'n
%' group by FECRecNo

We are assuming that we do not need to query against the pacs_other00
table, since it seems that any contribution from a PAC to a candidate
committee is reported by the candidates in pacs_00.

Again, any help in figuring this out would be appreciated. Thanks!

greg michalec

unread,
Nov 27, 2009, 10:47:39 PM11/27/09
to OpenSecrets Open Data
Sorry - I forgot to add the industry restriction to those queries: we
are using " and realcode like 'e11%'" on the the results of those
queries to get the transactions specific to the oil and gas industry.

Susi Alger

unread,
Nov 30, 2009, 11:50:13 AM11/30/09
to opensecret...@googlegroups.com
Hi Greg,

Interesting questions that you pose on the calculation of money to specific candidates from the oil & gas industry.  The primary reason that the numbers are not matching in the examples that you gave are due to the fact that the older cycle industry profile data on our web site (2004 and earlier), was last updated on 7/30/2009.  The 2000 OpenData dataset was generated on 9/1/2009.  I have verified that the numbers you gave me for Everett and Packard are what we have in our 2000 data now from Oil & Gas.  We expect to schedule a complete all-cycle update of our industry profile data this month.

Additionally, I should note that we do include Pac_Other data in our industry totals to candidates.  There is not a lot of money here, and this does not affect either of your examples, but use the money from Pac_Other where type in ('11', '24K', '22Z', '24R', '24Z') and recipid like N%

Lastly, I'd like to turn to your queries.  It's difficult for me to say if they are precisely correct for a number of reasons.  Is your candidate table unique on CID, for example?  Additionally, SQL syntax can be interpreted in different ways by different engines or interpreters.  The select field statement "if (i.source = 'gen', 'g', 'i') as type" for example, isn't standard SQL and can in some interpretations preclude records with a NULL source.  The condition ".type in (10, 11, 15, '15E', '15J', '22Y')" which includes both numeric and text items in the list could also throw an error or produce unexpected results.  Using "select distinct" against a list of fields while simultaneously grouping by one field is not something I've seen before.  All that said, the results you are getting seem to match what I find for your examples, so perhaps all of this works correctly with your implementation.

The critieria we use for PACs to candidates for the industry profiles is just DI = D and a non-blank cid. The criteria for Indivs to candidates is just type in ('11', '15', '15E', '15J', '22Y') and recipid like N%.  There's no need to exclude realcodes like Z9 or to worry about double counting because we're already limiting the realcode and we're not using any totals that could double count.  We have discussed including money to candidates' Leadership PACs in this web display, but that is not in the works yet.

Thanks again, Greg, and good luck with the project!

Susi


--

You received this message because you are subscribed to the Google Groups "OpenSecrets Open Data" group.
To post to this group, send email to opensecret...@googlegroups.com.
To unsubscribe from this group, send email to opensecrets-open...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/opensecrets-open-data?hl=en.



greg michalec

unread,
Dec 8, 2009, 1:24:17 AM12/8/09
to OpenSecrets Open Data
Hi Susi -
Thanks so much for your reply - the info you gave was very helpful.
Unfortunately, we're still having some issues making our numbers match
what is on the CRP website. For example, the website lists $11,324,508
in individual Oil & Gas contribs in 2000 (http://www.opensecrets.org/
industries/indus.php?cycle=2000&ind=E01). If we look at the OpenData
dataset using the query
select sum(amount) from indivs00 where RecipID like 'n%' and realcode
like 'e11%' and type in ('10', '11', '15', '15E', '15J', '22Y')
we only get $6,972,182.

If we look at a specific candidate, Tom Tancredo, the OpenData data
has him at $28,667 in 2000 cycle individual Oil & Gas contributions:
select sum(amount) from indivs00 where RecipID = 'N00006103' and
realcode like 'e11%' and type in ('10', '11', '15', '15E', '15J',
'22Y')
however, the website has him at $34,417 individual contributions
(http://www.opensecrets.org/politicians/industries.php?
cycle=2000&cid=N00006103&type=I).
I realize there may be a bit of a gap due to the generation time
between the two datasets, but a few months does not seem to justify
missing $5750 in contributions to a campaign over nine years ago.

We are seeing similar discrepancies in the pac data as well. Is it
possible that the website uses an expanded set of categories for the
'Oil & Gas' sector?

Thanks again,

--greg michalec
> > opensecrets-open...@googlegroups.com<opensecrets-open-data%2Bunsu...@googlegroups.com>
> > .

Susi Alger

unread,
Dec 8, 2009, 11:05:14 AM12/8/09
to opensecret...@googlegroups.com, jihan, sheila
Hello Greg,

No, we are not using an expanded definition of the Oil and Gas industry -- it is defined as catorder = E01.  As it happens, all categories that have a catorder of E01 also have a catcode of E11*.  So, either condition should work.  However, in researching this, I did learn something interesting that we hadn't really considered.  The industry profiles data are generated regularly -- every 3 to 4 weeks for 2006, 2008 and 2010.  A couple of times a year for the 1990 - 2004 data.  However, the 2000 member profile top industries have not been regenerated since 2002.  Seems like we need to get those older cycle updates onto our maintenance schedule soon.

Susi

To unsubscribe from this group, send email to opensecrets-open...@googlegroups.com.

greg michalec

unread,
Dec 8, 2009, 2:35:04 PM12/8/09
to OpenSecrets Open Data
Hi Susi -
Ok - that explains why 2000 data may be off - but if I look at 2006, I
see similar issues. The website lists the oil & gas individuals total
as $11,889,104, but I only find $6,100,321 in the OpenData. For
individual examples, I have N00026043 at $39,000 versus $45,000 on the
website. Conversely, I see N00005656 at $63,400 versus $58,200 on the
website.
Thanks for your help in tracking this down - we just want to make sre
our queries and data are correct before we publish anything.

--greg michalec

On Dec 8, 11:05 am, Susi Alger <susi.al...@gmail.com> wrote:
> Hello Greg,
>
> No, we are not using an expanded definition of the Oil and Gas industry --
> it is defined as catorder = E01.  As it happens, all categories that have a
> catorder of E01 also have a catcode of E11*.  So, either condition should
> work.  However, in researching this, I did learn something interesting that
> we hadn't really considered.  The industry profiles data are generated
> regularly -- every 3 to 4 weeks for 2006, 2008 and 2010.  A couple of times
> a year for the 1990 - 2004 data.  However, the 2000 member profile top
> industries have not been regenerated since 2002.  Seems like we need to get
> those older cycle updates onto our maintenance schedule soon.
>
> Susi
>
> On Tue, Dec 8, 2009 at 12:24 AM, greg michalec <gregmicha...@gmail.com>wrote:
>
> > Hi Susi -
> > Thanks so much for your reply - the info you gave was very helpful.
> > Unfortunately, we're still having some issues making our numbers match
> > what is on the CRP website. For example, the website lists $11,324,508
> > in individual Oil & Gas contribs in 2000 (http://www.opensecrets.org/
> > industries/indus.php?cycle=2000&ind=E01<http://www.opensecrets.org/%0Aindustries/indus.php?cycle=2000&ind=E01>).
> ...
>
> read more »
Reply all
Reply to author
Forward
0 new messages