Re: Data Migration from MySQL to MongoDB

126 views
Skip to first unread message

Ronald Stalder

unread,
Mar 25, 2013, 4:35:49 PM3/25/13
to mongod...@googlegroups.com
Hi Vineet

when migrating from and RDBMS to MongoDB you have to keep in mind that you'll enter a completely new paradigm. MIgrating a SQL database 1:1 to MongoDB does not make sense.


One of the challenges that comes with moving to MongoDB is figuring how to best model your data. While most developers have internalized the rules of thumb for designing schemas for RDBMSs, these rules don't always apply to MongoDB. The simple fact that documents can represent rich, schema-free data structures means that we have a lot of viable alternatives to the standard, normalized, relational model. Not only that, MongoDB has several unique features, such as atomic updates and indexed array keys, that greatly influence the kinds of schemas that make sense. Understandably, this begets good questions:

    • Are foreign keys permissible, or is it better to represent one-to-many relations withing a single document?
    • Are join tables necessary, or is there another technique for building out many-to-many relationships?
    • What level of denormalization is appropriate?
    • How do my data modeling decisions affect the efficiency of updates and queries?

In this session, we'll answer these questions and more, provide a number of data modeling rules of thumb, and discuss the tradeoffs of various data modeling strategies.

Cheers

Ronald

tst...@sacbee.com

unread,
Mar 25, 2013, 11:18:50 PM3/25/13
to mongod...@googlegroups.com
Seconding Ronald's response and expanding upon it a bit... expect a redesign of your data and how you access it. Once that's done, you'll undoubtedly have to write your own program to do the conversion. By choosing mongodb you're sacrificing certain organizationally efficient things to adapt some entirely different organizationally efficient things. Some sort of 1:1 translation means terrible performance at the very least. You will have to re-adapt everything to make that transition appropriate... unless you've been using MySQL in a very mongodb way :).

Tips (all of these tips are intricately related, so pay attention to how one might impact the other):
- getmore's can be a killer. Where this comes into play when porting from mysql is this. Let's say you have some collection where you've indexed the time. Then you pull a collection of the documents which fall within a certain time. Let's say it returns 10k records. This is an insignificant amount, but the problem is that getmores are interleaved with all other requests. What you'll find is this: while MySQL is fast to return those records all at once, but can have concurrency issues (tenancy to block even when concurrent access should be assured), mongodb can suddenly pause for a long duration of time while the request is timesliced (this issue is massively compounded by updates... see below). MySQL will consistently look/feel faster on such queries. 
- Understanding memory mapping is extremely important. 1->1 lookups have a tendency to be (much) faster than MySQL if the mmap cache isn't being trashed. How you should keep this in mind in your schema design is: you need to understand that all data is memory mapped. There is no managed cached. Furthermore, those memory pages are not "smart". They're as smart as your OS is and OS's are not database engines, and therefore they're really quite stupid (top-stack sorting on access and such... very stupid when it comes to databases). For an extremely efficient schema design this fact is actually a blessing, but coming from a MySQL world where InnoDB caches are extremely efficiently manged, it's a shock. Keep in mind this: every database file is memory mapped and memory mapping is managed by your OS's kernel (not mongodb). Furthermore, mongodb memory maps on the database level rather than the collection/table level (because the data is split into files like <database>.1 and <database>.2). Furthermore (again), the way mongodb's binary data files are separated does not include indexing. This means that indexes are switched in-and-out on page faults a lot like data is. So to be efficient, here's what you do (very different form MySQL)... do not trash those memory page caches. If you have a collection that has a lot of data but doesn't necessarily contribute to your application (except maybe rarely)... move it elsewhere. Push large chunks of data elsewhere... lets say you have a blog with a collection of articles. When you stop an think about how your actual blog app might function, you'll realize that the article content doesn't have to be anywhere near the article meta data. Just an example, adjust appropriately. Just trying to drive a conceptual point home (with bad examples, but make sense?)
- You're simply shifting the responsibility of joins. This is an important point. You must realize that the cost of joins in a relational system is built-in to the database system. In mongodb, you'll find yourself invariably responsible for at least some joins. If you aren't, then you're completely relationshipless and simply shifting the responsibility off onto your ram/drive capacity. Let's say you have some relationships (presumably you will/actually should in a nice and balanced schema layout), you must first find what you want from your primary collection and then match it to it's related content. Where a relational database is designed for this, your are now required to make this happen in your app. This can actually be an extremely efficient way of doing things, or a horrible way... it entirely depends.
- Inserts/updates can be incredibly fast, but database-level locks can cripple you in a production system (particularly if you're used to innodb, but even if you use myisam usually even thought it's effectively a table lock). Delete's are the biggest killer of them all... avoid them if at all possible. Just mark documents as "deleted" and purge once in a while when the time is right (if possible; follow up with a resync once in a while if you can).
- If it's a sparse index (see http://docs.mongodb.org/manual/core/indexes/#sparse-indexes), use a sparse index. I/O can be the biggest killer in ANY db system, and sparse indexing is a beautiful thing.
- Lot's more, but I hope this has helped you understand why/when/how you should migrate. MongoDB has some incredibly great points, but it's important to utilize the technology appropriately. 

On Sunday, March 3, 2013 11:09:10 PM UTC-8, vineet khanna wrote:
Hi,

What are the various ways to migrate data from MySQL to MongoDB.

Is there any tool available like Sqoop for Hbase.

~Vineet

Rajeshwar Reddy

unread,
Apr 21, 2014, 11:53:33 AM4/21/14
to mongod...@googlegroups.com
Hi Vineet,

There is tool called "Pelica Migrator" which can migrate from MySQL to MongoDB in a single click. See if the tool can help: http://www.techgene.com/pelica-migrator/
Reply all
Reply to author
Forward
0 new messages