JSON import to Postgres

Brandon Hilkert

Nov 2, 2012, 11:51:26 AM
to phil...@googlegroups.com
I have a 3GB JSON dump that I need to import into a Rails app. Writing a script that uses ActiveRecord has proved to take too long. Outside of dropping down to the SQL level, does anyone have experience/suggestions doing this sort of thing in a timely manner?

Thanks,
Brandon

------------------------------------
http://brandonhilkert.com

Walter Lee Davis

Nov 2, 2012, 11:56:51 AM
to phil...@googlegroups.com
I used a Gem to store/load a whole lot of ActiveRecord as YAML, there may be something similar that uses JSON. Is that one huge-ass file, or can it be broken into bite-size chunks?

http://rubydoc.info/gems/yaml_db/0.2.3/frames

Walter
> --
> You received this message because you are subscribed to the Google Groups "Philly.rb" group.
> To post to this group, send email to phil...@googlegroups.com.
> To unsubscribe from this group, send email to phillyrb+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/phillyrb?hl=en.

Randy Schmidt

Nov 2, 2012, 11:57:07 AM
to phil...@googlegroups.com
On Fri, Nov 2, 2012 at 11:51 AM, Brandon Hilkert
<brandon...@gmail.com> wrote:
> I have a 3GB json dump that I need to import into a Rails App. Writing a
> script that uses ActiveRecord has proved to take too long. Outside of
> dropping down to the SQL level, anyone have experience/suggestions doing
> this sort of thing in a timely manner?

Can you convert it to a CSV or multiple CSVs easily? Maybe use raw
Ruby to convert it from JSON to CSV? I know MySQL can import stuff
from a CSV VERY fast... postgres probably can too.

Actually, it looks like:

COPY zip_codes FROM '/path/to/csv/whatevs.txt' DELIMITER ',' CSV;

is how you do it in postgres.
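If you go the CSV route, the JSON-to-CSV conversion can stay in plain Ruby stdlib. A minimal sketch, assuming the dump is one JSON document per line (mongoexport's default) and the fields are flat; the field names and paths here are made up:

```ruby
require "json"
require "csv"

# Stream the dump line by line so the 3GB file never has to fit in memory.
# `fields` is hypothetical -- pick the keys your documents actually have.
def json_to_csv(json_path, csv_path, fields)
  CSV.open(csv_path, "w") do |csv|
    csv << fields # header row, so COPY can use the HEADER option
    File.foreach(json_path) do |line|
      doc = JSON.parse(line)
      csv << fields.map { |f| doc[f] }
    end
  end
end
```

Then point the COPY above at the resulting file.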

--
Randy Schmidt

Angel Pizarro

Nov 2, 2012, 11:57:17 AM
to phil...@googlegroups.com
Even using batch transactions with the activerecord-extensions gem? -a

Brandon Hilkert

Nov 2, 2012, 12:03:23 PM
to phil...@googlegroups.com
Thanks for the suggestions guys. The file is one big dump from MongoDB. Unfortunately there's a bunch of embedded documents in the dump, so I can't just drop it into the DB without first parsing the embedded things and putting them in their appropriate places.

Sounds like this is going to be a pain...


------------------------------------
http://brandonhilkert.com

Walter Lee Davis

Nov 2, 2012, 12:06:35 PM
to phil...@googlegroups.com
Could you bring it up again in a different Mongo, then iterate over all of the records and build up a new ActiveRecord (with nested children where needed) for each one?

Walter

Brandon Hilkert

Nov 2, 2012, 12:10:50 PM
to phil...@googlegroups.com
Walter - The Rails app is currently backed by MongoDB. So the app is up and connected. I started doing what you suggested and at the rate it was going, it would've taken 10 hours or so. 

I also tried doing the same thing but pushing the conversion of each record to Sidekiq, to try to take advantage of the multi-threading, but it took even longer to queue the jobs and they weren't processing as fast as I expected. Doing it inline was actually faster (that surprised me...). It probably has something to do with the power of the DB engines on my local machine, but it's not like we're using anything much more powerful in production.

So I thought about just ditching AR altogether, which I'm guessing will speed things up, but be a big ball of messy SQL. So I began to explore any benefits that the JSON import could offer.

And...that's where I'm at.

So in short, yeah I have access to the MongoDB instance with the data in it.


------------------------------------
http://brandonhilkert.com

Steve Eichert

Nov 2, 2012, 12:27:21 PM
to phil...@googlegroups.com
+1 to Randy's suggestion. I'd write a script that iterates over the documents and spits them out to CSV file(s), then import those CSV files into MySQL. During import you can either load directly into the destination tables or, if there's some hierarchy you need to maintain (or other trickery), load them into a few temporary tables and then use SQL to copy the rows to the right destination tables (with the necessary JOINs and such). In my experience the trickier bit is keeping around the identifiers you need to maintain the right relationships in the MySQL database, but it's usually not too hard to figure out a solution (add a column to the DB that you can later drop, etc.).
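A sketch of what that staging-table pass might look like, generated here from Ruby. Every table and column name is hypothetical; the `mongo_id` columns are the "column you can later drop" trick for preserving relationships:

```ruby
# Hypothetical schema: a staging table mirroring the CSV, a COPY into it,
# then INSERT ... SELECT to fan rows out to the real tables, joining on
# the old Mongo identifiers that were kept around as extra columns.
def staging_import_sql(csv_path)
  <<~SQL
    CREATE TEMP TABLE staging_orders (
      mongo_id          text,
      customer_mongo_id text,
      total             numeric
    );

    COPY staging_orders FROM '#{csv_path}' DELIMITER ',' CSV HEADER;

    INSERT INTO orders (customer_id, total)
    SELECT c.id, s.total
    FROM staging_orders s
    JOIN customers c ON c.mongo_id = s.customer_mongo_id;
  SQL
end
```

Once everything is copied over, the `mongo_id` columns can be dropped.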

Cheers,
Steve

Steve Eichert

Nov 2, 2012, 12:28:06 PM
to phil...@googlegroups.com
Sorry, replace all instances of MySQL with Postgres in my response :-)

Angel Pizarro

Nov 2, 2012, 12:55:09 PM
to phil...@googlegroups.com
Looks like ar-extensions isn't supported on Rails 3. I was pointed to this instead

Mat Schaffer

Nov 2, 2012, 1:00:05 PM
to phil...@googlegroups.com
Would that library Chris Le presented a couple months back be useful here? I apparently didn't add it to the event page, so I can't remember what it was called.

-Mat

Chris Le

Nov 2, 2012, 1:07:42 PM
to phil...@googlegroups.com
Yep. I did a presentation on it. It's not the BEST library I've seen, but it managed to handle everything I threw at it, and I pull from all sorts of messy data sources.

Brandon Hilkert

Nov 2, 2012, 1:08:49 PM
to phil...@googlegroups.com
Chris - Did you open source it?


------------------------------------
http://brandonhilkert.com

Chris Le

Nov 2, 2012, 1:17:58 PM
to phil...@googlegroups.com
I didn't make any of those gems.  Here are the links:

Xiaoyi Lu

Aug 4, 2014, 10:49:11 AM
to phil...@googlegroups.com
Unfortunately, I have to do something similar to you. What's worse is that we use mongoid-history, which means there's messy history data to clean up on top of the whole complexity. Which approach did you end up going with?

Dwyer, Mike

Aug 4, 2014, 1:33:14 PM
to phillyrb
Out of curiosity, why the switch? Is that something you guys can share?

I'm looking into using a NoSQL database as part of one of my projects....





--
Thanks,

Mike

Chad Ostrowski

Aug 4, 2014, 3:55:45 PM
to phil...@googlegroups.com
Mike: I obviously can't speak for Xiaoyi, and I haven't implemented an app with a NoSQL store, but Sarah Mei's article Why You Should Never Use MongoDB has good thoughts on the subject.

Xiaoyi Lu

Aug 4, 2014, 11:52:05 PM
to phil...@googlegroups.com
I don't think MongoDB is suitable for enterprise solutions, especially if you need lots of reporting. There are no joins in MongoDB (you can hack them in, but you're not supposed to, and it's dead slow).

Xiaoyi Lu

Aug 5, 2014, 12:31:24 AM
to phil...@googlegroups.com
If the project needs lots of reporting, which means you need lots of joins, then DO NOT use MongoDB.


Justin Campbell

Aug 5, 2014, 7:48:57 AM
to phil...@googlegroups.com
Brandon, I've had great results using https://github.com/zdennis/activerecord-import to speed up bulk data imports.
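For reference, the shape of an activerecord-import bulk load. The model name and field names below are hypothetical; only the pure row-building part runs outside Rails:

```ruby
require "json"

# Turn one-JSON-document-per-line input into rows of column values,
# ready to hand to Model.import.
def rows_for_import(json_lines, fields)
  json_lines.map do |line|
    doc = JSON.parse(line)
    fields.map { |f| doc[f] }
  end
end

# With the gem and a Rails model loaded, something like (untested sketch):
#
#   columns = [:name, :email]
#   File.foreach("dump.json").each_slice(10_000) do |batch|
#     User.import columns, rows_for_import(batch, %w[name email]), validate: false
#   end
#
# Model.import issues one multi-row INSERT per batch instead of one
# INSERT per record, which is where the speedup comes from.
```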

-Justin

Sent from my iPhone

Walter Lee Davis

Aug 5, 2014, 8:02:32 AM
to phil...@googlegroups.com
If your goal is simply to move a database from here to there, then https://github.com/ludicast/yaml_db is my go-to. Very simple, very reliable. I've used it to move a test site whose database suddenly gained "useful" data over to a new production server, and to move from SQLite to MySQL and from MySQL to PostgreSQL.

Walter

Brandon Hilkert

Aug 5, 2014, 8:04:15 AM
to phil...@googlegroups.com
Hey guys,

Thanks for all the suggestions. I must have posted this about 2 years ago, and consequently haven't used Mongo for quite awhile. But hopefully anyone that still does got something out of the recent comments.

Thanks!


Dwyer, Mike

Aug 5, 2014, 11:19:23 AM
to phillyrb
I appreciate it too... I've been considering using a NoSQL store to speed up one of my apps, using it mostly as a "cache" to add/remove records for fast access, with a mechanism for pushing updates, etc. to the main store in MySQL. It would also let the user "play" with the data until they were ready to commit it fully to MySQL.

Most of my attention has been on Redis, but I have concerns about scalability and heavy multiuser access simply because I'm not as familiar with these types of databases.

Anybody have thoughts on this?



Chad Ostrowski

Aug 5, 2014, 12:02:53 PM
to phil...@googlegroups.com
There was a great talk about Event Store at Philly ETE. It's one of those append-only, never-delete-anything databases. You can think of it as basically a big log, which is highly optimized for writing. Then you can project that data into a relational store, a document store, a graph store, and more. All projected databases are essentially caches, which are highly optimized for reading.

I love the idea of this.

I'm not sure if there are good libraries for working with Event Store in Ruby/Rails. Just figured I would mention it, because it seems like a really smart way of dealing with data.

Maurício Linhares

Aug 8, 2014, 11:04:22 AM
to phil...@googlegroups.com
Redis is good if you don't care about the data, can easily re-create it, and don't need multi-master replication. If any of those are concerns for you, then it's probably not what you're looking for.

I'm finding Bill Howe's list of databases and features really interesting; it might help you figure out what to do and how to pick the best one.

From my personal experience, MongoDB is good for a read-mostly use case; if you need lots of reads and writes you'll probably have trouble with it.

-
Maurício Linhares
http://mauricio.github.io/ - http://twitter.com/#!/mauriciojr
Attachment: options.pdf