RE: Open Data vs. opensecrets.org industry amount mismatch

132 views
Skip to first unread message

Skye Bender-deMoll

unread,
Mar 24, 2010, 8:29:05 AM3/24/10
to opensecret...@googlegroups.com
Hi Folks,

I'm curious if anyone else has been able to reproduce the industry
totals shown on the CRP site. I feel like I must be missing something
important on how these are calculated. Seems like our Oil & Gas numbers
using the most recent open data dump are only *half* of what is shown on
the CRP site.

open data dump total for indidual contribs in Oil & Gas in 2006:
$6,143,646 (was $6,100,321 on Dec 8)

vs

CRP site Oil and & Gas category, individual contributions:
$12,083,747 (was $11,889,104 on Dec 8 )

query is

select sum(amount) from indivs06 where RecipID like 'n%' and realcode
like 'e11%' and type in ('10', '11', '15', '15E', '15J', '22Y')

For greg's single candidate examples 2006:

N00026043
data dump: $24,500
CRP site: $39,000

N00005656
data dump: $63,400
CRP site: $70,900

but wait! in this case the totals on the website change when the
"leadership pac" flag is changed
it includes $7,500 in leadership pac funds.

Do the industry totals in the "Contributions from Individuals" column
include individual contributions to leadership pacs? Or is there some
other category of contributions included that we are missing?

http://www.opensecrets.org/industries/indus.php?cycle=2006&ind=E01

as always, thanks for your help and the amazing dataset,
-skye

> Hi Susi -
> Ok - that explains why 2000 data may be off - but if I look at 2006, I
> see similar issues. The website lists the oil & gas individuals total
> as $11,889,104, but I only find $6,100,321 in the OpenData. For
> individual examples, I have N00026043 at $39,000 versus $45,000 on the
> website. Conversely, I see N00005656 at $63,400 versus $58,200 on the
> website.
> Thanks for your help in tracking this down - we just want to make sre
> our queries and data are correct before we publish anything.
>
> --greg michalec

Susi Alger

unread,
Mar 25, 2010, 8:38:36 PM3/25/10
to opensecret...@googlegroups.com
Hi Skye,
 
First of all, I apologize for the delay in responding, but travel has interferred some with the time I have available online. 
 
I believe that your problem is the recipid like 'n%' criteria.  These totals are not just money to candidates, but they also include money to federal party committees.  Also, type 10 money is only included for the 2002 and earlier cycles.  After that, type 10 is Levin funds, and we do not include it in the industry profiles.
 
Let me know if this doesn't solve the problem.  I look forward to continuing the discussion at Transparency Camp this weekend.
 
Susi

--
You received this message because you are subscribed to the Google Groups "OpenSecrets Open Data" group.
To post to this group, send email to opensecret...@googlegroups.com.
To unsubscribe from this group, send email to opensecrets-open...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/opensecrets-open-data?hl=en.


Skye Bender-deMoll

unread,
Apr 2, 2010, 5:49:36 PM4/2/10
to opensecret...@googlegroups.com
Hi Susi

> I believe that your problem is the recipid like 'n%' criteria. These totals
> are not just money to candidates, but they also include money to federal
> party committees.

Ok, so you are saying that the "Contributions from Individuals" column
of the crp website industry totals includes committee money? If I don't
include the n% restriction, the total would include lots records with
"c" recipids, which appear to be mostly contributions from individuals
to committees not associated with candidates. Meaning that
contributions from individuals to oil-related pacs, and oil-related
contributions to non-oil PACs are also be included in the website
industry totals?

In that case I get

select sum(amount) from indivs06 a where realcode
like 'e11%' and type in ( '11', '15', '15E', '15J', '22Y')

which gives $16,962,262 for bulk data, compared with $12,083,747 for the
site. Which is better closer, but off by above instead of below.

If I include some of the other criteria mentioned elsewhere in the docs

select sum(amount) from indivs06 a left join cmtes06 b on a.recipid =
b.recipid where realcode
like 'e11%' and type in ( '11', '15', '15E', '15J', '22Y') and primcode
not like "Z4%" and a.recipcode not like "P%"

then I get $13,187,800 which is getting pretty close!

So this would exclude contribs to party-affiliated committees, but what
does Z4 exclude? Is this right?

> Also, type 10 money is only included for the 2002 and
> earlier cycles. After that, type 10 is Levin funds, and we do not include
> it in the industry profiles.

Thanks, good to know.

Maybe it makes sense to look at this the other direction. Can you tell
me the query in MS SQL within your schema you would use to find the
total of individual contributions for a an industry for a given cycle?
And also industry contributions to a single candidate?

best,
-skye

Susi Alger

unread,
Apr 2, 2010, 8:06:49 PM4/2/10
to opensecret...@googlegroups.com
let's try to make this easier to follow -- can you give me a URL that you'd like to match?

Susi

skyebend

unread,
Apr 5, 2010, 7:24:52 PM4/5/10
to OpenSecrets Open Data
HI,

http://www.opensecrets.org/industries/indus.php?cycle=2006&ind=E01
We are trying to reproduce the industry totals in the "Contributions
from Individuals" column as a check that we are summing contributions
correctly.

And also the "Oil & Gas" row of the 'indivs" column of
http://www.opensecrets.org/politicians/industries.php?cycle=2006&cid=N00026043&type=I

best,
-skye

Susi Alger

unread,
Apr 5, 2010, 8:53:22 PM4/5/10
to opensecret...@googlegroups.com
OK, Skye, here goes, and let me know if I've gotten too technical:

The first URL goes to the industry profile numbers, where we do everything by cycle and we also combine individual and PAC information.  Doing things by cycle makes it easier as we don't have to worry about 6 year data for Senators.  Combining PAC and individual data makes things more complicated, because we need to make corrections so that we don't double count things -- we don't want to add together the money that individuals give to PACs and that the PACs then turn around and give to candidates or we'd be double counting.  So, the individual contribution data found at
http://www.opensecrets.org/industries/indus.php?cycle=2006&ind=E01
is generated from indivs linked to categories to get the catorder = E01.  Then you also want to exclude indivs.recipid like P_ (Indivs to PACs) and left join indivs to cmtes on indivs.recipid = cmtes.cmteid to exclude any joint fundraising committees (primcode like Z4%).    You need to left join especially in the most current cycle as sometimes the PACs in the cmtes table lags behind the cmtes listed in the recipid field of the indivs table.

Within the member profiles ( http://www.opensecrets.org/politicians/industries.php?cycle=2006&cid=N00026043&type=I ) you are limited to the member money and double counting is not longer an issue -- however, you do need to exclude money to any presidential committee for the candidate and add in the previous two cycles for which a Senate seat was sought if the member is a Senator.

I hope this helps, and keep the questions coming.

Susi

skyebend

unread,
Apr 6, 2010, 3:08:12 PM4/6/10
to OpenSecrets Open Data
Thanks, exactly the details I was looking for.

On Apr 5, 5:53 pm, Susi Alger <susi.al...@gmail.com> wrote:
> OK, Skye, here goes, and let me know if I've gotten too technical:
>
> The first URL goes to the industry profile numbers, where we do everything
> by cycle and we also combine individual and PAC information.  Doing things
> by cycle makes it easier as we don't have to worry about 6 year data for
> Senators.  Combining PAC and individual data makes things more complicated,
> because we need to make corrections so that we don't double count things --
> we don't want to add together the money that individuals give to PACs and
> that the PACs then turn around and give to candidates or we'd be double

> counting.  So, the individual contribution data found athttp://www.opensecrets.org/industries/indus.php?cycle=2006&ind=E01


> is generated from indivs linked to categories to get the catorder = E01.
> Then you also want to exclude indivs.recipid like P_ (Indivs to PACs) and
> left join indivs to cmtes on indivs.recipid = cmtes.cmteid to exclude any
> joint fundraising committees (primcode like Z4%).    You need to left join
> especially in the most current cycle as sometimes the PACs in the cmtes
> table lags behind the cmtes listed in the recipid field of the indivs table.

Should the join criteria be : "indivs.recipid = cmtes.recipid"
instead of "indivs.recipid = cmtes.cmteid" ?
The first gives $5,857,043, the 2nd $13,187,800

select sum(amount) from indivs06 a left join cmtes06 b on a.recipid =
b.recipid where realcode
like 'e11%' and type in ( '11', '15', '15E', '15J', '22Y') and
primcode not like "Z4%" and a.recipcode not like "P%"

which gives

$13,187,800 in open data dump compared to $12,083,747 on the website.
Do you think that the difference is because the 2006 data is still
being coded/matched, and the website hasn't caught up yet?


>
> Within the member profiles
> (http://www.opensecrets.org/politicians/industries.php?cycle=2006&cid=...)


> you are limited to the member money and double counting is not longer
> an
> issue -- however, you do need to exclude money to any presidential committee
> for the candidate and add in the previous two cycles for which a Senate seat
> was sought if the member is a Senator.

select sum(amount) from indivs06 where RecipID = 'N00026043' and


realcode like 'e11%' and type in ( '11', '15', '15E', '15J',
'22Y')

gives $39,000 which matches with the website! Hallelujah!
Thanks! ;-)

>
> I hope this helps, and keep the questions coming.
>
> Susi
>

> On Mon, Apr 5, 2010 at 6:24 PM, skyebend <skyeb...@skyeome.net> wrote:
> > HI,
>
> >http://www.opensecrets.org/industries/indus.php?cycle=2006&ind=E01
> > We are trying to reproduce the industry totals in the "Contributions
> > from Individuals" column as a check that we are summing contributions
> > correctly.
>
> > And also the "Oil & Gas" row of the 'indivs" column of
>

> >http://www.opensecrets.org/politicians/industries.php?cycle=2006&cid=...


>
> > best,
> >  -skye
>
> > On Apr 2, 5:06 pm, Susi Alger <susi.al...@gmail.com> wrote:
> > > let's try to make this easier to follow -- can you give me a URL that
> > you'd
> > > like to match?
>
> > > Susi
>
> > --
> > You received this message because you are subscribed to the Google Groups
> > "OpenSecrets Open Data" group.
> > To post to this group, send email to
> > opensecret...@googlegroups.com.
> > To unsubscribe from this group, send email to

> > opensecrets-open...@googlegroups.com<opensecrets-open-data%2Bunsu...@googlegroups.com>

Reply all
Reply to author
Forward
0 new messages