How to import

635 views
Skip to first unread message

Joel Gwynn

unread,
Nov 24, 2017, 11:16:13 AM11/24/17
to OpenSecrets Open Data
I'm trying to import the bulk data into PostgreSQL and I'm having a hell of a time with the file format.  Is there a program people use to parse these files?

TIA,
Joel

Ron Zucker

unread,
Nov 24, 2017, 12:26:58 PM11/24/17
to opensecret...@googlegroups.com
Are you on a Mac/Linux or PC? I can dig around. I used to have a set of scripts for Mac/LINUX. The problem, if you're in those OS's, is the CR/LF at the end of a row. You can use sed to replace them. If you're on a PC, it should import easily.


Ron Zucker

Sent from a cell phone. Please give me the benefit of the doubt and assume that I know how to spell that word or know that rule of grammar, but that it was to (sic) difficult to go back and change it. Thanks.

--
You received this message because you are subscribed to the Google Groups "OpenSecrets Open Data" group.
To unsubscribe from this group and stop receiving emails from it, send an email to opensecrets-open-data+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Joel Gwynn

unread,
Nov 24, 2017, 12:33:30 PM11/24/17
to opensecret...@googlegroups.com
Thanks. The problem isn't with the row delimiters, it's with the
column delimiters. AFAICT the default delimiter is "|,|" except when
there are multiple blank columns; then the pipes are collapsed like
so: "|,,,|", and if the data is numeric then the pipes are also
collapsed, but only the one adjacent to the number. It's really like
nothing I've ever seen before. It has defeated all of my efforts to
do a simple find/replace. I can write a parser if I have to, but I
figured somebody has a way to handle this format.
>> email to opensecrets-open...@googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "OpenSecrets Open Data" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/opensecrets-open-data/VBGy-dcqYL8/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> opensecrets-open...@googlegroups.com.

Joel Gwynn

unread,
Dec 1, 2017, 2:27:09 PM12/1/17
to OpenSecrets Open Data
To answer my own question: the SQL Server import/export utility has no problem with this.  To get it into Postgresql, I used the built-in python csv library, which also had no problem.
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "OpenSecrets Open Data" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/opensecrets-open-data/VBGy-dcqYL8/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to

SB

unread,
Dec 7, 2017, 4:03:37 PM12/7/17
to OpenSecrets Open Data
Hi Joel,

I am having trouble importing lob_lobbying.txt file. I tried using python's csv sniffer but I am still unable to import the data. I would appreciate any suggestions or advice you could share with me.

Thank you in advance.

Best,
Shradha

Joel Gwynn

unread,
Dec 7, 2017, 4:18:10 PM12/7/17
to opensecret...@googlegroups.com
What error are you getting? I had to read the files line-by-line with
the csv library, and insert the data. I can share some code if you
like.
>>> >> email to opensecrets-open...@googlegroups.com.
>>> >> For more options, visit https://groups.google.com/d/optout.
>>> >
>>> > --
>>> > You received this message because you are subscribed to a topic in the
>>> > Google Groups "OpenSecrets Open Data" group.
>>> > To unsubscribe from this topic, visit
>>> >
>>> > https://groups.google.com/d/topic/opensecrets-open-data/VBGy-dcqYL8/unsubscribe.
>>> > To unsubscribe from this group and all its topics, send an email to
>>> > opensecrets-open...@googlegroups.com.
>>> > For more options, visit https://groups.google.com/d/optout.
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "OpenSecrets Open Data" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/opensecrets-open-data/VBGy-dcqYL8/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> opensecrets-open...@googlegroups.com.

SB

unread,
Dec 7, 2017, 6:34:26 PM12/7/17
to OpenSecrets Open Data
Thank you for the prompt response, Joel.  I was trying to use the csv sniffer module with no luck. I am working on a line by line loop but given the unstandardized columns, I am having trouble to get the data in shape. If you  could share some of your code, it would be extremely helpful.

Best,
Shradha

>>> >> For more options, visit https://groups.google.com/d/optout.
>>> >
>>> > --
>>> > You received this message because you are subscribed to a topic in the
>>> > Google Groups "OpenSecrets Open Data" group.
>>> > To unsubscribe from this topic, visit
>>> >
>>> > https://groups.google.com/d/topic/opensecrets-open-data/VBGy-dcqYL8/unsubscribe.
>>> > To unsubscribe from this group and all its topics, send an email to
>>> > For more options, visit https://groups.google.com/d/optout.
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "OpenSecrets Open Data" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/opensecrets-open-data/VBGy-dcqYL8/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to

SB

unread,
Dec 7, 2017, 7:47:27 PM12/7/17
to OpenSecrets Open Data
I figured it out. Thank you for your help Joel.

Best,
Shradha

Joel Gwynn

unread,
Dec 7, 2017, 8:12:25 PM12/7/17
to opensecret...@googlegroups.com
You're welcome. Bravo!
>>> >>> >> email to opensecrets-open...@googlegroups.com.
>>> >>> >> For more options, visit https://groups.google.com/d/optout.
>>> >>> >
>>> >>> > --
>>> >>> > You received this message because you are subscribed to a topic in
>>> >>> > the
>>> >>> > Google Groups "OpenSecrets Open Data" group.
>>> >>> > To unsubscribe from this topic, visit
>>> >>> >
>>> >>> >
>>> >>> > https://groups.google.com/d/topic/opensecrets-open-data/VBGy-dcqYL8/unsubscribe.
>>> >>> > To unsubscribe from this group and all its topics, send an email to
>>> >>> > opensecrets-open...@googlegroups.com.
>>> >>> > For more options, visit https://groups.google.com/d/optout.
>>> >
>>> > --
>>> > You received this message because you are subscribed to a topic in the
>>> > Google Groups "OpenSecrets Open Data" group.
>>> > To unsubscribe from this topic, visit
>>> >
>>> > https://groups.google.com/d/topic/opensecrets-open-data/VBGy-dcqYL8/unsubscribe.
>>> > To unsubscribe from this group and all its topics, send an email to
>>> > opensecrets-open...@googlegroups.com.
>>> > For more options, visit https://groups.google.com/d/optout.
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "OpenSecrets Open Data" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/opensecrets-open-data/VBGy-dcqYL8/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> opensecrets-open...@googlegroups.com.

Zhilin Xue

unread,
Jun 27, 2024, 5:41:22 AM6/27/24
to OpenSecrets Open Data
Dear Joel,

Hope this email finds you very well.
This is a request for help from 2024.
I have struggled to import this lob_lobbying.txt file because of its unstandardized columns. I am trying to use csv library but it does not work. Could you please share some suggestions?

Thank you very much.

Best wishes,
Zhilin

Adam S

unread,
Jun 27, 2024, 10:08:44 AM6/27/24
to opensecret...@googlegroups.com
I've found this R code works well for dealing with the delimiters in the OpenSecrets data files. Here, I'm exporting to a Stata file, but you could export to a csv or excel using write.csv. You will have to add your own column headers once you've exported the data.

install.packages("readr")
install.packages("foreign")
library(readr)
library(foreign)
lobbying<-read_delim("lob_lobbying.txt",delim = ",",  col_names = FALSE,  quote = "|")
saveRDS(lobbying, file="lobbytest", ascii = FALSE, version = NULL, compress = TRUE, refhook = NULL)
write.dta(lobbying, "lobbying.dta", version=10L)


Reply all
Reply to author
Forward
0 new messages