Seeking advice on using MySQL together with MongoDB

Artie

Dec 5, 2014, 12:56:57 PM
to django...@googlegroups.com
Good day, Community,

I'm looking for advice on MySQL & MongoDB usage, so please let me take some of your time to describe the situation.

I have come to work on an e-commerce shop selling electrical components, with about 10 million products. Right now all of it is stored in an awful structure of tables in MySQL. Products are stored in 10 small tables, each of which represents an individual manufacturer, plus 1 generalized table where all 10 million products are stored. That big table feeds Sphinx, which implements search on the site.
All these products are crawled from several APIs and sites on the web, so this shop is a kind of authorized reseller in our region.

The problem is that we have to update all products daily, and parsing and updating the products takes a very long time.

I have an idea to start using MongoDB to update and store products, as I think it might take less time than doing the same in MySQL. First question: am I correct in this assumption?

Browsing the web, I've found that some people recommend using pymongo directly, bypassing Django, rather than a MongoDB backend for Django. In your opinion, is this recommendation correct?

It would also be highly appreciated if you could share your personal experience with MongoDB and any information you think might be useful.

Thank you in advance

Russell Keith-Magee

Dec 6, 2014, 12:27:18 AM
to Django Users
Hi Artie,

Can I make an alternate suggestion? Get a real database.

In all honesty, I've never heard anyone in the Django community express a deep love of MySQL or MongoDB. I know people who use MySQL, but when they admit that, they say "Yeah, I know, but the customer required it" or "Yeah, I know, but at the time we started it was the only thing Amazon supported". As for MongoDB, the sentiment is usually "... and that was our first mistake.".

Personally - I have very little time for MySQL. It gets a number of key design decisions wrong (for example, InnoDB's implementation of row referential integrity is *demonstrably* incorrect). It has some default behaviours that beggar belief (e.g., on MyISAM, by default, if a query with a "WHERE field IS NULL" clause matches no rows and the previous statement was an insert, the query will not return an empty result, but the primary key of the last row inserted. By design [1][2]). And MySQL's usage of indexes is woefully naïve - to the point where "performance optimising for MySQL" often means "Roll out the results of an inner query and pass them in as arguments, rather than just using a subquery".
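
To make the "roll out the inner query" workaround concrete, here is a minimal sketch of the two query shapes. It uses Python's built-in sqlite3 module as a stand-in so that it runs anywhere; the tables, columns, and function names are made up for illustration, and it says nothing about how either form actually performs on MySQL or PostgreSQL.

```python
import sqlite3

# Hypothetical schema, used only to show the two query shapes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE manufacturer (id INTEGER PRIMARY KEY, region TEXT NOT NULL);
    CREATE TABLE product (
        id              INTEGER PRIMARY KEY,
        manufacturer_id INTEGER NOT NULL,
        title           TEXT NOT NULL
    );
    INSERT INTO manufacturer VALUES (1, 'EU'), (2, 'US'), (3, 'EU');
    INSERT INTO product VALUES
        (1, 1, 'Resistor'), (2, 2, 'Capacitor'), (3, 3, 'Relay');
""")


def titles_with_subquery(region):
    """One statement: the database resolves the inner query itself."""
    rows = conn.execute(
        "SELECT title FROM product WHERE manufacturer_id IN "
        "(SELECT id FROM manufacturer WHERE region = ?) ORDER BY id",
        (region,),
    )
    return [title for (title,) in rows]


def titles_rolled_out(region):
    """Two statements: fetch the inner result, then pass the ids as arguments."""
    ids = [
        manufacturer_id
        for (manufacturer_id,) in conn.execute(
            "SELECT id FROM manufacturer WHERE region = ?", (region,)
        )
    ]
    if not ids:
        return []  # "IN ()" is not valid SQL, so stop early
    placeholders = ", ".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT title FROM product WHERE manufacturer_id IN ({placeholders}) "
        "ORDER BY id",
        ids,
    )
    return [title for (title,) in rows]
```

Both functions return the same titles; the second one simply moves work that the query planner could do into application code.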


And then you have MongoDB - a database that exists, as far as I can make out, to overcome the deficiencies in MySQL. If you'd just started with a real database, you wouldn't have hit the problems with MySQL, and you wouldn't have to go looking for an exotic solution to overcome those problems.

If you're looking for performance, you're going to get much better performance out of PostgreSQL, for the same price you paid for MySQL, with the added benefit that PostgreSQL developers appear to have actually consulted the SQL specification when they implemented their database. They also have a query planner that will actually *use* indexes, instead of just keeping them for decoration like MySQL does.

If you're looking for a "schemaless" data store - well, PostgreSQL hstore fields [3] have you covered. To the extent that people have actually developed "MongoDB in PostgreSQL" [4]. And those stores outperform MongoDB [5].


So - in all honesty, I'd start by reconsidering your initial assumptions. 

Yours,
Russ Magee %-)

Collin Anderson

Dec 6, 2014, 6:45:36 PM
to django...@googlegroups.com
Hi,

What do you mean by "parsing with updating of products"?

I'd personally store everything authoritatively in that 1 generalized table and get rid of the 10 smaller tables. Have a "manufacturer" CharField or ForeignKey to determine what fields are available and to drive other custom logic. As you mention, the one table is very helpful for search especially. You don't need to copy data between tables.
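
A rough sketch of that single-table layout, assuming a separate manufacturer table referenced by a foreign key. It uses Python's built-in sqlite3 module rather than MySQL or the Django ORM so that it is self-contained; every table, column, and function name here is hypothetical, and the batched refresh is just one possible way to apply a daily feed.

```python
import sqlite3

# Hypothetical schema: every table, column, and function name is illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE manufacturer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE product (
        id              INTEGER PRIMARY KEY,
        manufacturer_id INTEGER NOT NULL REFERENCES manufacturer(id),
        sku             TEXT NOT NULL,
        title           TEXT NOT NULL,
        price_cents     INTEGER NOT NULL,
        UNIQUE (manufacturer_id, sku)
    );
    INSERT INTO manufacturer (id, name) VALUES (1, 'Acme'), (2, 'Globex');
""")


def refresh_products(rows):
    """Insert new products and overwrite existing ones in one transaction.

    Each row is (manufacturer_id, sku, title, price_cents). INSERT OR REPLACE
    deletes a row that collides on (manufacturer_id, sku) and inserts the new
    one, so a replaced product gets a fresh id.
    """
    with conn:  # commit on success, roll back if an exception is raised
        conn.executemany(
            "INSERT OR REPLACE INTO product "
            "(manufacturer_id, sku, title, price_cents) VALUES (?, ?, ?, ?)",
            rows,
        )


def products_for(manufacturer_name):
    """Return (sku, title, price_cents) for one manufacturer, ordered by sku."""
    return conn.execute(
        "SELECT p.sku, p.title, p.price_cents FROM product p "
        "JOIN manufacturer m ON m.id = p.manufacturer_id "
        "WHERE m.name = ? ORDER BY p.sku",
        (manufacturer_name,),
    ).fetchall()
```

Calling refresh_products() with the day's parsed rows updates the one authoritative table, and per-manufacturer views become a WHERE clause instead of a separate table.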

But I totally agree with Russ:
If performance is your goal, check out PostgreSQL.
If schema-less is what you are hoping for, try PostgreSQL hstore.

I also tried MongoDB with Django many years ago and it didn't work well, but we're working on making Django play better with non-SQL databases.

Collin

Cal Leeming

Dec 6, 2014, 9:18:06 PM
to django...@googlegroups.com
Hi Artie,

First, I would strongly recommend reading some of the work by David Mytton at Server Density; he and his team have been using MongoDB extensively for many years and they have shared a lot of their insight [1]. It's also worth mentioning that Postgres supports a JSON field type [2], which satisfies many of the use cases for a document store, though I haven't finished my own experiments yet, so I can't comment on comparative performance/functionality.

It's industry knowledge that MySQL is bleeding out, rapidly [3] [4]. At this point, I introduce you to Monty and his creation MariaDB [5]. Although I would agree with Russell on many of the points he's made about MySQL, it's hard not to have respect for the work that Monty and his team have done previously on MySQL, and the vision that is now MariaDB. I would strongly recommend you spend some time looking into this yourself, read as many comparison articles as your eyes will allow, and come to your own conclusions. 

I have built numerous "large scale" systems with a variety of technologies: Sphinx, ElasticSearch, MongoDB, MySQL, CouchBase, Redis. I've also spent many years battling with MongoEngine and have learnt to hate it, as much as I now hate the Django ORM. So, unless you are edging towards terabytes of data, and assuming you are using SSDs and high memory nodes, the impact of your choice will probably be negligible. And if you are using the Django ORM, then you are even less likely to reap any of these benefits out of the box [6]. There are many different reasons for choosing one over the other, and you should assess this based on your own use case/needs/skills, rather than religious bias.

Determine your use case and test all viable options, otherwise you could be avoiding something for the wrong reasons. You can build *beautiful* things with all these different technologies, and it's actually the mindset which matters the most [7]. Don't just settle for what others are telling you, try for yourself and come to your own conclusions, only then can you be sure. It's also a good way to gain in-depth knowledge about how these technologies work, which can be invaluable.

Cal



Mario Gudelj

Dec 7, 2014, 3:23:37 AM
to django...@googlegroups.com

Django ORM is the best thing since sliced bread!

Artie

Dec 10, 2014, 7:15:19 PM
to django...@googlegroups.com
I would like to thank you very much for such detailed answers; I really didn't expect to get so much information in response to my question.
That's really great and inspiring. So thank you very much again.
I guess I now need to start learning PostgreSQL, because this is not the first time I've heard that Python and Postgres rock.


On Friday, December 5, 2014, at 14:56:57 UTC+2, Artie wrote:

Artie

Dec 10, 2014, 7:17:09 PM
to django...@googlegroups.com, c...@iops.io
Can you please explain why you hate the Django ORM, and what should be considered instead of it?

On Saturday, December 6, 2014, at 23:18:06 UTC+2, Cal Leeming wrote:

Cal Leeming

Dec 10, 2014, 8:21:07 PM
to Artie, django...@googlegroups.com, Cal Leeming
If you're using Django, then there isn't really an alternative, as the ORM is a key selling point of Django.

Some people recommend SQLAlchemy, but I dislike that even more for its over-engineered complexity.

Peewee [1] shows some good potential, but I haven't tried it for myself yet.

As for why I dislike the Django ORM, many of the technical reasons are already explained (in detail) in that thread [2] I mentioned before, but a lot of it comes down to personal taste and opinion of what matters most about an ORM etc.

Cal

Collin Anderson

Dec 12, 2014, 5:42:14 PM
to django...@googlegroups.com, giliar...@gmail.com, c...@iops.io
Hi,

Re: the hacker news thread, (sorry for getting a little off topic,) I just wanted to mention a few places where we've tried to improve some of those things recently:
- The ORM is still slow, but we've added prefetch_related() and improved select_related(), which allow you to use fewer queries.
- We've pulled out some components into separate projects: django-localflavor, django-contrib-comments, django-formtools, and I imagine more in the future.
- You can now use custom user models.
- We're working on an official public API for the model._meta internals and working towards components only using that. It also means it will be easier to use non-django models in the admin.
- We're working on de-coupling django templates, allowing them to be used on their own, and making it much easier to use other template engines like Jinja2.
- There are also quite a lot of people using django-rest-framework for creating REST APIs, which was one thing the thread said Django was not good at.

Collin

Carl Meyer

Dec 12, 2014, 5:54:02 PM
to django...@googlegroups.com
On 12/12/2014 10:42 AM, Collin Anderson wrote:
> Re: the hacker news thread, (sorry for getting a little off topic,) I just
> wanted to mention a few places where we've tried to improve some of those
> things recently:
> - The ORM is still slow, but we've added prefetch_related() and improved
> select_related() which allows you to use fewer queries.

I'm not sure I'd make a general statement that "the ORM is slow" without
seeing further evidence. It'd be interesting to see benchmarks against
something like SQLAlchemy, but I haven't seen that (and a Google search
didn't turn anything up).

As is often the case, how fast the Django ORM is in practice has a lot
more to do with the skill and experience of the person using it than
anything else. You can make any ORM plenty slow if you carelessly code a
bunch of N+1 query situations.
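
The N+1 pattern is easy to see in a small, framework-free sketch. The example below uses Python's built-in sqlite3 module instead of the Django ORM so that it runs on its own; the schema, the counter, and the function names are all hypothetical. In Django, select_related() on a ForeignKey is the usual way to get the joined form.

```python
import sqlite3

# Hypothetical schema and data, used only to count queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE manufacturer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product (
        id              INTEGER PRIMARY KEY,
        manufacturer_id INTEGER NOT NULL REFERENCES manufacturer(id),
        title           TEXT NOT NULL
    );
    INSERT INTO manufacturer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO product VALUES
        (1, 1, 'Resistor'), (2, 1, 'Capacitor'), (3, 2, 'Relay');
""")

counter = {"queries": 0}  # incremented once per statement sent to the database


def run(sql, params=()):
    """Execute one query, count it, and return all rows."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


def titles_n_plus_one():
    """1 query for the products, then 1 more per product for its manufacturer."""
    result = []
    for title, manufacturer_id in run(
        "SELECT title, manufacturer_id FROM product ORDER BY id"
    ):
        name = run(
            "SELECT name FROM manufacturer WHERE id = ?", (manufacturer_id,)
        )[0][0]
        result.append((title, name))
    return result


def titles_joined():
    """The same data in a single query, using a JOIN."""
    return run(
        "SELECT p.title, m.name FROM product p "
        "JOIN manufacturer m ON m.id = p.manufacturer_id ORDER BY p.id"
    )
```

With N products, the first function issues N+1 queries and the second issues one, regardless of which ORM or driver sits on top.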

> - We've pulled out some components into separate projects:
> django-localflavor, django-contrib-comments, django-formtools, and I
> imagine more in the future.
> - You can now use custom user models.
> - We're working on an official public API for the model._meta internals and
> working towards components only using that. It also means it will be easier
> to use non-django models in the admin.
> - We're working on de-coupling django templates, allowing them to be used
> on their own, and making it much easier to use other template engines like
> Jinja2.
> - There are also quite a lot of people using django-rest-framework for
> creating REST apis, which was one thing the thread said django was not good
> at.

Good list. I think Josh Smeaton's recent expressions refactor in master
should definitely make that list too; it goes a long way towards
addressing Alex's complaints in this talk (linked from the HN thread):
https://speakerdeck.com/alex/why-i-hate-the-django-orm

Also Aymeric's work in 1.6 rebuilding the transaction system from
scratch: Personally, I think Django 1.6+ has a better and clearer
transaction model than SQLAlchemy.

A few years back, I'd have said SQLAlchemy was clearly superior to the
Django ORM. Today, I would not make that claim; having used both a fair
bit, I think they have different strengths.

Carl

Cal Leeming

Dec 12, 2014, 6:05:02 PM
to Collin Anderson, django...@googlegroups.com, Артём Мутерко, Cal Leeming
Hi Collin, 

Just a few comments;

On Fri, Dec 12, 2014 at 5:42 PM, Collin Anderson <cmawe...@gmail.com> wrote:
> Hi,
>
> Re: the hacker news thread, (sorry for getting a little off topic,) I just wanted to mention a few places where we've tried to improve some of those things recently:
> - The ORM is still slow, but we've added prefetch_related() and improved select_related() which allows you to use fewer queries.
> - We've pulled out some components into separate projects: django-localflavor, django-contrib-comments, django-formtools, and I imagine more in the future.
> - You can now use custom user models.

Custom user models are (imho) still unnecessarily complex; something as simple as not requiring a username field requires a lot of work. There have been several instances where migrations had to be scrapped due to weird edge cases with custom user models.
 
> - We're working on an official public API for the model._meta internals and working towards components only using that. It also means it will be easier to use non-django models in the admin.
> - We're working on de-coupling django templates, allowing them to be used on their own, and making it much easier to use other template engines like Jinja2.
> - There are also quite a lot of people using django-rest-framework for creating REST apis, which was one thing the thread said django was not good at.

DRF has its own problems and (again imho) does not solve the problems it was originally designed to fix. Attempting to build anything beyond "out of the box CRUD" requires hacky/unclean workarounds, with most of your time spent fighting against the shortcomings of DRF rather than working with it.

Building a clean RESTful API in Django is possible, but it requires a lot of custom libs/hacks to make it work properly. For example, you have to create your own method dispatcher/router, which in turn breaks class/dispatch decorators due to the way those are loaded. Or you could throw it all into a single class and split out the functionality using get/post/put, but this is really unclean because each HTTP method is essentially its own view. Another way is to not use decorators, of course. Then you have partial form handling, which I already touched on in another thread [1], so I won't repeat it here.

These are just a few examples of annoyances which hinder productivity. That being said, Django is still quite good at a lot of things, and again it comes down to "the right tool for the job". It's all about knowing the strengths/weaknesses of your options, and making an appropriate choice based on your use case, making sacrifices dependent on your needs etc.

Collin Anderson

Dec 12, 2014, 6:31:34 PM
to django...@googlegroups.com, cmawe...@gmail.com, giliar...@gmail.com, c...@iops.io
Hi Cal,

Thanks. I'm always looking for more pain points about Django that could be improved.

Collin

Tom Christie

Dec 13, 2014, 11:12:28 AM
to django...@googlegroups.com, cmawe...@gmail.com, giliar...@gmail.com, c...@iops.io
Hi Cal,

> DRF has its own problems and (again imho) does not solve the problems it was originally designed to fix. Attempting to build anything beyond "out of the box CRUD" requires hacky/unclean workarounds, with most of your time spent fighting against the shortcomings of DRF rather than working with it.

I'm slightly at a loss to understand what set of pain points could have left you with that impression, but it'd be helpful to get a better idea.

Perhaps some of the awkwardnesses in 2.x serializers that have been addressed with the 3.0 release?
Perhaps a misunderstanding that viewsets/routers are intended to be the canonical way to use REST framework?

The 'build anything beyond "out of the box CRUD"' comment is particularly surprising to me because of all the work I do in REST framework *none* of it is CRUD, and the design decisions in REST framework are very explicitly towards it being an agnostic Web API toolkit, that just *happens* to also have easy support if basic CRUD stuff *does* happen to be what you want. Eg:

* Validation that's cleanly decoupled from Django's ORM, but that can also work seamlessly with it if that's what you need.
* Works just fine with regular views, but has a minimal set of pre-provided generic views if simple CRUD operations are what you need.
* Viewsets and routers for projects that fit well with a very standard URL style, or drop down to views and explicit URL conf otherwise.

I don't want to jump on and just criticize the view point you've presented - whatever it is that's left you with that impression is clearly a useful data point for me.

What other API toolkits would you look towards as more mature, well supported, nicely designed solutions?
Which shortcomings were you fighting against, and what do you think the project should be doing differently?

Cheers,

  Tom

Cal Leeming

Dec 16, 2014, 11:34:08 AM
to Tom Christie, django...@googlegroups.com, Collin Anderson, Артём Мутерко, Cal Leeming
Hi Tom,

Apologies for the slow answer, give me a week or so and I'll spend some time putting a proper detailed reply/breakdown together in a new thread.

Cal

Tom Christie

Dec 16, 2014, 12:23:53 PM
to django...@googlegroups.com
Thanks Cal. If you do get the time that'd be fab, positive criticism is always useful. Equally please don't feel obligated - sure you weren't expecting to have to give a blow-by-blow breakdown of pain points you found. :p

May be worth taking any further convo over to the django-rest-framework list, but either forum works for me.

