Scaling Django


Joshua Pokotilow

Feb 3, 2016, 10:30:05 AM
to Django users
At the startup where I work, we've written a lot of our server code in Django. So far, we've adopted a "build it fast" mentality, so we invested very little time in optimizing our code. A small amount of load testing has revealed our codebase / infrastructure as it stands today needs to run faster and support more users.

We recently hired some new engineers who are extremely skeptical that we should optimize our existing code. Their main concerns are:

- We need to move to a service-oriented infrastructure because Django is too monolithic (monolithic = technology lock-in & difficult to troubleshoot)
- It's too easy to write slow queries using the Django ORM
- It's hard to hire Django engineers
- While Instagram and DISQUS use Django to service large numbers of people, they don't use it for any serious backend work

After having worked with Django for the last 3 years, I'm a big believer in it, and I believe it would scale. To defend my position, I've pointed out to my colleagues that it's easy to identify bottlenecks with tools like the Django Debug Toolbar and Yet Another Django Profiler. With my colleagues present, I've isolated and fixed significant speed problems inside of a few hours. I don't believe the Django ORM is inherently bad, although I do think that coders who use it should Know What They're Doing. Finally, I've referenced blog entries that talk about how Instagram and Disqus use Django on the backend for backend-y tasks.

Despite my best efforts, my colleagues are still pushing to have us rewrite large portions of our infrastructure as separate services before we try to fix them. For example, we have one slow REST endpoint that returns a boatload of user data, and so there's talk about using a new microservice for users in lieu of our existing Django models. Even if we are able to fix bottlenecks we encounter in a timely fashion, my colleagues fear that Django won't scale with the business.

I'm writing this post to garner additional evidence that Django will scale. Anything compelling (and preferably not obvious) that would help shed some light on Django's ability to scale would be *greatly* appreciated, as it's very difficult for me to defend my position that Django is a viable long-term solution without solid evidence to back up my claims. It certainly doesn't help that I don't have any experience scaling Django myself!

Thank you.

Avraham Serour

Feb 3, 2016, 10:46:25 AM
to django-users
What do you mean by slow? Can you measure it in ms?

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/83968c41-d415-4189-b33b-9f99b10b1c41%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Rafael E. Ferrero

Feb 3, 2016, 10:47:20 AM
to django...@googlegroups.com

Bill Blanchard

Feb 3, 2016, 10:49:51 AM
to django...@googlegroups.com
Let's try to address some of their concerns:

- We need to move to a service-oriented infrastructure because Django is too monolithic

It depends on what your application does and what you're planning to do with it in the future. People are quick to prescribe SOA as the end-all way to scale, but they tend to ignore the added complexity that comes with building out and integrating smaller services.

- It's too easy to write slow queries using the Django ORM

It's just as easy (arguably easier) to write slow queries using pure SQL or any other ORM.  The ORM makes a lot of good decisions for mediocre programmers (I'd put myself in that category).  If you're a great programmer and have great programmers who really understand SQL, then you're just as likely to get your ORM queries right as you are straight SQL. 

- It's hard to hire Django engineers

Compared to what? .NET or Java engineers? Probably. Harder than engineers for the newest shiny JavaScript framework? Probably not. Django has about as robust an engineering population as Ruby/Rails does. I don't know what you'd convert to in order to make hiring easier. All engineers (especially good ones) are really hard to come by these days. If you're looking at outsourcing to Southwest Asia, then yes, the Django population isn't as high as .NET/Java/PHP. However, hiring challenges are most typically defined by your location and your ability and/or willingness to explore remote workers.

- While Instagram and DISQUS use Django to service large numbers of people, they don't use it for any serious backend work

Reddit is also a large Python user, although it is built on Pylons rather than Django. All engineering decisions should be made around what your particular needs are and what skills your team possesses or is able to acquire. Needs of an organization evolve over time and the organizations adjust as they need to.

Many organizations start with a Python/Django or Ruby/Rails application to build a product quickly which is what those stacks excel at.   A mantra typically heard in the community is "don't optimize prematurely".  If you're saying "man, we're going to hit a wall at 100,000 users", well you need to get to 90,000 users first before worrying about 100,000.  Getting the 90k users is the real hard part.

All this being said, your colleagues could be right to want to move off Django.  We don't know much about your particular circumstances.

For more information on optimizing Django for scale, check out this book.

Best of luck.

Bill





Sergiy Khohlov

Feb 3, 2016, 11:01:43 AM
to django-users
Hello,
Your first sentences already contain the answer: fast coding always produces performance problems. This is expected. It looks like the few new engineers use a different technology and would rather not use Django; that is the reason for their criticism. Low performance is mostly related to DB performance. I prefer to avoid ManyToMany fields because of their high resource usage at the DB level. Writing correct models and DB functions helps in most cases. I have no idea about the proposed solution, but I am quite sure that the code produces the bottleneck, not the programming language. Ruby on Rails has the same problem as Django, for example. Have you seen good performance with Java + Tomcat + Apache? I would like to see this high-performance ASP.
Half a year ago I rewrote a GTS service from C# to Python. As a result, CPU usage dropped from 45% to 7-9% and memory usage from 1.5Gb to 300kb.
A wrong solution and mistakes at the project requirements stage produce more problems than the choice of programming language.

Many thanks,

Serge


+380 636150445
skype: skhohlov


Joshua Pokotilow

Feb 3, 2016, 11:02:38 AM
to Django users
The service uses the Django REST Framework and takes multiple seconds to return a response. The response is a JSON array with thousands of dictionaries. We haven't yet investigated why it's slow, nor have we tried to cache / memoize anything to speed it up.

Remco Gerlich

Feb 3, 2016, 11:05:11 AM
to django...@googlegroups.com
There is a book (ebook) named "High Performance Django" that has many useful tips.

Also, new software developers are _always_ skeptical when they start a new job. They have an old way of doing things from their previous job, "that's not how we work here!" reflexes also from that old job, and they take a while to adjust to the new tools.

First let them get productive with what exists, _then_ let them gather real statistics about real problems when they want to change something.

You can have scalability issues and solve them with whatever framework, Django included.

Remco Gerlich



Avraham Serour

Feb 3, 2016, 11:11:04 AM
to django-users
If your problem is the DB, the network, or an underpowered processor, rewriting the application won't help.
The first step is to investigate the problem; then you can find a solution. Sometimes people have a solution and then look for a problem. In your case, they want to leave Python and Django and are looking for problems with them.

More than the language, libraries, and tech stack you use, the system design has a large impact on overall speed, among other things. I've worked on systems with too many indirections, for example, where performance was not affected much by which libraries we were or weren't using.

Sometimes rewriting can be a good thing because of this: it gives you a chance to make a new design.

Avraham

Larry Martell

Feb 3, 2016, 11:14:47 AM
to django...@googlegroups.com
I don't think there is a silver bullet that will fix all issues, nor
any one technology stack that will. I have a fairly good size django
app I built, and I did not consider performance all that much during
initial development. As the user base and dataset started to grow I
did see performance issues and I dealt with each one as they arose by
profiling the issue and seeing where the bottlenecks were. Often it
was not at all where I thought it would be. Each case had a different
solution, e.g.: database tuning, adding memory, writing custom queries
using temp tables, minimizing js code, optimizing jQuery code and DOM
manipulation, and so on.
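
A minimal sketch of the "profile first" approach described above, using only the standard library's cProfile and pstats. The functions handle_request, build_rows, and serialize_rows are hypothetical stand-ins for a real view, not anything from the poster's app; in a Django project one would more likely reach for the Debug Toolbar or a profiling middleware.

```python
import cProfile
import io
import pstats


def build_rows(n):
    """Stand-in for a cheap step, e.g. a simple query."""
    return [{"id": i, "name": f"user{i}"} for i in range(n)]


def serialize_rows(rows):
    """Stand-in for a wasteful step: sorting every row and concatenating strings."""
    out = ""
    for row in rows:
        out += str(sorted(row.items()))
    return out


def handle_request():
    """Hypothetical request handler that calls both steps."""
    rows = build_rows(5000)
    return serialize_rows(rows)


def profile(func):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    _, report = profile(handle_request)
    print(report)
```

The report lists each function with its call count and cumulative time, which is usually enough to see whether the time goes where you assumed it would.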

Joshua Pokotilow

Feb 3, 2016, 11:16:14 AM
to Django users
Thanks Bill. This was helpful. I understand that it's difficult to offer advice without too many specifics, so I'm hoping to get some high-level advice / examples of Django at scale, which you provided.

Thank you!

Joshua Pokotilow

Feb 3, 2016, 11:31:13 AM
to Django users
Thank you Sergiy! I agree that the code needs to be fixed.

We don't have a Tomcat endpoint to compare with, although I did scare my coworkers a bit when we profiled a Django endpoint that took 300 - 400ms to return a response due (ostensibly) in large part to form object instantiation. Specifically, the bottleneck seemed to be that forms deep-copy their fields on instantiation.
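
To make the deep-copy cost concrete, here is a rough sketch in plain Python. Field and SketchForm are hypothetical classes that only mimic the copy-on-instantiation behaviour described above; they are not Django's form classes, and the timing you get depends entirely on your machine and on how many fields you declare.

```python
import copy
import timeit


class Field:
    """Tiny stand-in for a form field with some nested state."""

    def __init__(self, label):
        self.label = label
        self.validators = ["required", "max_length"]
        self.attrs = {"class": "input", "maxlength": 100}


class SketchForm:
    """Hypothetical form that deep-copies its declared fields per instance."""

    base_fields = {f"field_{i}": Field(f"Field {i}") for i in range(50)}

    def __init__(self):
        # Each instance gets its own copies, so changing a field on one form
        # cannot leak into another form built from the same class.
        self.fields = copy.deepcopy(self.base_fields)


def seconds_per_form(repeats=200):
    """Average wall-clock time to instantiate one SketchForm."""
    return timeit.timeit(SketchForm, number=repeats) / repeats


if __name__ == "__main__":
    print(f"{seconds_per_form() * 1000:.3f} ms per form instance")
```

The copy buys isolation between instances; the price is paid on every instantiation, which adds up when a page builds many forms.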

Joshua Pokotilow

Feb 3, 2016, 11:42:23 AM
to Django users
Thanks Remco. I'll look at High Performance Django.

Joshua Pokotilow

Feb 3, 2016, 11:54:33 AM
to Django users
Thank you! I agree that we need to investigate before coming up with a solution.

Fred Stluka

Feb 3, 2016, 2:19:34 PM
to django...@googlegroups.com
Joshua,

My team is producing a Django app with a small number of
users, so we haven't worried too much about performance yet,
but we know we may have to some day, so I've accumulated a
list of ways to improve performance in a JIRA ticket for if/when
it becomes a priority. 

We've done some of them already, with a quick and easy
thousand-fold increase in speed.

Here's a cut/paste of our ideas.  I hope you find it useful.

Optimize the site for speed

  • CPU speed
    • Do as much as possible in JS on the local device, not in Python/Django on the shared server
  • Disk speed
    • Use the AWS Console to change the disk drives of the server from magnetic
      disks to SSDs.
    • Set up a RAID volume
  • DB speed
    • Do all selects/joins/filtering in the DB. Do not query lots of data
      from the DB and then iterate over it or otherwise filter it in Python.
      Especially, do not query lots of data from the DB, pass it all to the
      browser, and iterate over it or otherwise filter it in JavaScript.
    • Create DB indexes for all fields used frequently in SELECTs
      • But don't over-index
    • Move the DB server to a different Linux server than the
      Web server, to split the load between 2 servers, if the DB
      is the bottleneck. Use Amazon RDS
    • http://www.revsys.com/12days/finding-sources-of-slowness/
      • EXPLAIN PLAN
      • EXPLAIN ANALYZE
  • Ajax
    • Use Ajax partial page loads instead of full page loads
    • Use Ajax to load rest of page while the user already has something to look at.
    • Don't get too chatty
      • Going back to the server on each keystroke for autocomplete is a heavy load. Might be better to wait a second or so after each keystroke to see if the user is still typing, using the same kind of algorithm Fred has used in the past, and that Google uses for auto-suggestions at Google Search.
        • 12/10/2012: Changed to use a 300 ms delay for now.
        • Should perhaps also delay search till 3 chars typed, instead of current
          2 chars
  • Django server-side caching
    • Prime candidate #1. May be costing us at least a couple seconds.
    • Django Template Caching
      • Prime candidate #1a. May be costing us at least a couple seconds.
      • Status: Done. Dramatic improvement.
    • Django page caching
    • Also, "Varnish" as an alternative to standard Django caching. See:
    • Also caching of template pages via template tag {% cache %}
      • Or is this just part of standard Django caching?
    • Could also re-order the INSTALLED_APPS list so that the apps with
      the most frequently loaded templates come first. That would make
      'django.template.loaders.app_directories.Loader' more efficient.
      • Not likely to matter since we're already doing template file
        caching. But if we ever stop using that...
    • Could also give up on the less efficient
      'django.template.loaders.app_directories.Loader', and just use
      'django.template.loaders.filesystem.Loader' which only looks in
      one location, not one per app. Especially since we include the
      app name in the reference to each template already. Could just
      create a single templates folder with app names as subfolders.
      • Not likely to matter since we're already doing template file
        caching. But if we ever stop using that...
  • Web Server speed
    • Apache httpd server
      • Optimal number of processes, threads, connections, open files
    • nginx instead of Apache
  • Caching on the phone
    • We may need to consider some caching of data on the phones
    • Have to be prepared for sluggish performance because some phones will be in areas with poor cell phone reception.
    • Especially for repetitive actions like Event Checkout.
  • Re-write specific pages to be faster
  • Scaling (vertical and horizontal):
    • Prime candidate #5. We really should not need to do this, but it's
      a cheap and easy way to get a quick fix, if none of the other
      prime candidates work well enough.
    • Vertically -- larger servers
    • Horizontally -- more servers
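
The "DB speed" items in the list above can be sketched with the standard library's sqlite3 module. This is only an illustration under assumptions: the users table, its columns, and the index name are made up, and SQLite's EXPLAIN QUERY PLAN stands in for the EXPLAIN PLAN / EXPLAIN ANALYZE commands of other databases.

```python
import sqlite3


def make_db(rows=10_000):
    """Create an in-memory table of hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, name TEXT)"
    )
    countries = ["US", "CR", "DK", "UA", "NL"]
    conn.executemany(
        "INSERT INTO users (country, name) VALUES (?, ?)",
        ((countries[i % len(countries)], f"user{i}") for i in range(rows)),
    )
    conn.commit()
    return conn


def count_in_python(conn, country):
    """Anti-pattern: fetch every row, then filter in application code."""
    all_rows = conn.execute("SELECT country FROM users").fetchall()
    return sum(1 for (value,) in all_rows if value == country)


def count_in_db(conn, country):
    """Preferred: let the database filter and return a single number."""
    row = conn.execute(
        "SELECT COUNT(*) FROM users WHERE country = ?", (country,)
    ).fetchone()
    return row[0]


def query_plan(conn, country):
    """Return SQLite's plan description for the filtered query."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM users WHERE country = ?",
        (country,),
    ).fetchall()
    # The last column of each plan row is a human-readable detail string.
    return " | ".join(row[-1] for row in rows)


if __name__ == "__main__":
    conn = make_db()
    print(count_in_python(conn, "CR"), count_in_db(conn, "CR"))
    print("before index:", query_plan(conn, "CR"))
    conn.execute("CREATE INDEX idx_users_country ON users (country)")
    print("after index: ", query_plan(conn, "CR"))
```

Both counting functions return the same number, but only the second keeps the work in the database. The plan text changes from a table scan to an index search once the index exists, which is the kind of evidence worth collecting before and after adding any index.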
Enjoy!
--Fred
Fred Stluka -- mailto:fr...@bristle.com -- http://bristle.com/~fred/
Bristle Software, Inc -- http://bristle.com -- Glad to be of service!
Open Source: Without walls and fences, we need no Windows or Gates.

orzodk

Feb 3, 2016, 3:17:31 PM
to django...@googlegroups.com

While optimizing the code will bring you improvements and you shouldn't
stop doing this, for the most part (as noted from Rafael's resources)
you should update your architecture to support Django in scaling.

As you mentioned, instead of hitting the DB for every multi-second API
call you scale it by caching results so they aren't recalculated on demand.
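
As a rough illustration of caching results instead of recalculating them, here is a small time-limited cache in plain Python. The names ttl_cache and expensive_user_report are hypothetical; in a Django project the built-in cache framework would normally play this role, backed by something like Memcached or Redis rather than an in-process dict.

```python
import functools
import time


def ttl_cache(seconds, clock=time.monotonic):
    """Cache a function's results per argument tuple for `seconds`."""

    def decorator(func):
        store = {}  # maps args -> (expiry time, cached value)

        @functools.wraps(func)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # still fresh: skip the expensive call
            value = func(*args)
            store[args] = (now + seconds, value)
            return value

        return wrapper

    return decorator


calls = {"count": 0}  # counts how often the expensive body actually runs


@ttl_cache(seconds=60)
def expensive_user_report(team_id):
    """Hypothetical slow computation standing in for a multi-second API call."""
    calls["count"] += 1
    return [{"team": team_id, "user": i} for i in range(3)]
```

Calling expensive_user_report(1) twice within a minute runs the body once. The trade-off is staleness: callers may see data up to 60 seconds old, so the right lifetime depends on how fresh the response has to be.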


Microservices are something you should work toward by refactoring rather
than rewriting as that is a really great way to kill a start up.

http://steveblank.com/2011/01/25/startup-suicide-%E2%80%93-rewriting-the-code/

Daniel Chimeno

Feb 3, 2016, 6:28:53 PM
to Django users
Since you said the project uses DRF for its API, a blog post I read about it came to mind.
I'm sure that with a few little tricks (which shouldn't really be tricks after all) you'll get past that situation.
As others said, first look at the problem, then search for the solution.

In the specific case where you are getting thousands of results from the database, you can dig further into:
- SQL
- Caching
- Serialization
- Pagination
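
Of the items above, pagination is the easiest to sketch. The paginate function below is a hypothetical helper in plain Python, not DRF's API (DRF ships its own pagination support); it only shows the idea of returning one bounded page plus enough metadata for the client to ask for the next one.

```python
import math


def paginate(items, page, page_size=100):
    """Return one page of `items` plus the metadata a client needs to continue."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    total = len(items)
    start = (page - 1) * page_size
    results = items[start:start + page_size]
    total_pages = math.ceil(total / page_size)
    return {
        "count": total,
        "page": page,
        "total_pages": total_pages,
        # None tells the client there is nothing more to fetch.
        "next_page": page + 1 if page < total_pages else None,
        "results": results,
    }
```

With a real database the slice would be pushed into the query (LIMIT/OFFSET or a keyset condition) so that the rows for other pages are never loaded at all.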

Hope it helps.

Russell Keith-Magee

Feb 3, 2016, 8:39:34 PM
to Django Users

My immediate reaction, knowing nothing about the site or its codebase - 

1) There’s nothing they’re proposing that excludes Django from the mix.
2) From an engineering management perspective, the solution they’re proposing is much more concerning than the problems you’re describing.

My suggestion for convincing management:
 
Tell them that you can write Microservices in Django. Because you can. Build a minimal Django stack - something that just returns a static Hello World - and do some load testing. This will prove that Django can serve high load - or at least as much load as whatever technology they’re proposing. 

Tell them that Microservices is just a new word for something software engineers have been calling “High cohesion, low coupling” since the 1960s. The only difference is that this time, instead of using the low latency, high speed interface of a function call, we’re using the slow, unreliable transfer of HTTP. If you’re actually focussing on performance, it’s trivial to build a high cohesion, low coupling stack in *any* technology. All that Webservices do is enforce this by making the inter-module barrier obvious.

Tell them that Microservices aren’t magic fairy sauce for speed. If the issue with your existing codebase is the speed of database queries, that problem isn’t going to go away by putting your code behind microservices. You’re just going to add the cost of inter-service HTTP transfer to the overhead of making a query. And if you’re putting something essential - like the user database - behind a service, then you’d better be prepared to add the round-trip time of a HTTP lookup onto Every. Single. Page. (Tell me again how this is good for performance?)
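
A back-of-the-envelope sketch of the round-trip point above. The numbers are invented purely for illustration (a 40 ms query, 15 ms per HTTP round trip, three service calls per page), and the model assumes the calls happen one after another rather than in parallel.

```python
def page_latency_ms(query_ms, service_calls, round_trip_ms):
    """Estimate page latency when each service call adds one HTTP round trip."""
    return query_ms + service_calls * round_trip_ms


# Hypothetical figures, not measurements from any real system.
in_process = page_latency_ms(query_ms=40, service_calls=0, round_trip_ms=15)
behind_services = page_latency_ms(query_ms=40, service_calls=3, round_trip_ms=15)

if __name__ == "__main__":
    print(f"in-process: {in_process} ms, behind services: {behind_services} ms")
```

Under these assumptions the slow query still costs 40 ms either way; the service boundary only adds to it.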

Teach them about Second Systems Syndrome [1] [2].


Tell them that while Django engineers might be hard to hire, they’re also relatively easy to grow from scratch. DjangoGirls proves you can take people with no experience in programming and make them competent Django developers. Take someone with a history in *any* programming language, and you can teach them Django; hire one or two Django experts to provide an internal knowledge and review, and you’re set.

Lastly, tell them that despite their protestations, your site isn’t Instagram, Disqus, or anything like it. 99% of web sites are not in the top 1% of websites by traffic. Your website is *not* in the top 1%. It might be one day. But it isn’t now. And if you’re *ever* in a position where you might end up in the top 1% - I can *guarantee* that it will be accompanied by a metric buttload of engineers and money who will have a lot more experience in scaling large scale services than any of the people who are proposing microservices as a silver bullet.

Now - I’m saying all this without having actually seen your code or knowing about your problem. It’s entirely possible that moving some of your code behind an SOA barrier would make good engineering sense. However, that doesn’t mean you need a re-write - it means you need a refactor. And, again, there’s nothing that precludes using Django for the services that get split out. Django is just a mechanism for mapping a HTTP request to a function that returns a HTTP response - and it’s one that has been repeatedly proven at *very* high scale.

Yours,
Russ Magee %-)

Erik Cederstrand

Feb 3, 2016, 10:16:03 PM
to Django Users

> On Feb 3, 2016, at 22:30, Joshua Pokotilow <jpoko...@gmail.com> wrote:
>
> At the startup where I work, we've written a lot of our server code in Django. So far, we've adopted a "build it fast" mentality, so we invested very little time in optimizing our code. A small amount of load testing has revealed our codebase / infrastructure as it stands today needs to run faster and support more users.
>
> We recently hired some new engineers who are extremely skeptical that we should optimize our existing code.

I was in a startup like that. We *had* a working solution, and we *had* customers. Not enough to pay our salaries, but enough to keep us and our investors hopeful.

Someone decided we needed a rewrite, because Django, because blog posts, because WebScale(TM), because in one year we might have 1000-fold users if our wildest startup dreams came true. So we started to rewrite. It was supposed to take one month. Our working, legacy solution started to deteriorate because we were busy rewriting. Two months. Bug reports piled up in the tracker, but we didn't care because the rewrite was just around the corner and would solve everything, and we couldn't possibly work on two systems at the same time. Some customers left, but it was okay because our WebScale solution would make us filthy rich. Three months. Everyone was overworked, tired and the WebScale solution was still just around the corner... Four months, and our investors decided we were not part of the 1%.

In short, don't rewrite. Refactor. And know *exactly* why you are refactoring. As in, "We have profiled, discussed architecture, hardware, algorithms, etc, etc. The GIL is killing us and Guido doesn't care", not "Django doesn't scale". The year "monolithic" is an argument in itself is the year of the HURD desktop.

Except if you're programming in VBScript. Then by all means, rewrite.

Erik

bobhaugen

Feb 4, 2016, 5:42:06 PM
to Django users
This is a sidelight to the OP, but he did mention django forms in one message. They are a dog. I have profiled a couple of slow pages with a lot of small forms and that's where all the time was spent (rendering forms on the server). We're moving those to DRF-serving-json to a javascript client-side framework. Not done yet, but the same data from DRF is way faster.

I would still be interested in some tips for speeding up django forms, though, because they are really great for speed of development, and they do work.

Luis Zárate

Feb 4, 2016, 8:49:07 PM
to django...@googlegroups.com
" It's hard to hire Django engineers "

I don't think this is a problem, because a good software developer can learn Django faster than other frameworks. For example, I have a Costa Rican startup that develops in Django. As a small company in a small country, we don't have an investor that would allow us to hire a big team, so when we need to build a solution that is bigger than our capacity, we hire temporary developers without Django knowledge and train them. In a few days they are developing basic functions that help the project and help them gain experience.

I think Costa Rica is an excellent place to invest in training people, because for a US minimum salary you can pay one developer and the training of another here, so in a few months you will get highly qualified engineers working remotely with only one hour of time difference.

So I suggest you make a training plan for your future developers, with a hackathon included :)




--
"La utopía sirve para caminar" Fernando Birri



Dexter T.

Feb 6, 2016, 2:20:26 PM
to Django users
Lots of great replies already.
I also want to add a few (random) things ...

- have you clearly defined and isolated what issue(s) you are facing?
- you mentioned using DRF in a service, with a large JSON response taking seconds to finish; how did you troubleshoot/profile this? Seconds to process server-side? Seconds to download client-side? Where specifically? If the answer is that you don't know, then find out!
- your system will have many moving parts; have you made an effort to instrument, measure, and isolate which parts are slow and why?
- you mentioned using the debug toolbar; have you proven that your database schema is optimal? Any queries in your slow query log? Are indexes used, and are they adequate and optimal? For your workload, can read caching help? Would DB replicas be of help?
- how are your server resources utilized? Are you sure you are not bottlenecked by thrashing disk I/O? Overcommitted CPU? Low memory/swapping? File descriptor count?
- have you checked whether the clients are the bottleneck? An Ajax call that downloads a complex nested JSON object is costly both CPU-wise (to serialize) and bandwidth-wise. Gzip can help here, if applicable.
- for more context, can you share some numbers, like HTTP and DB level req/sec, and row counts for the most heavily used tables? How about server infrastructure specs?
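
To put a number on the Gzip suggestion in the list above, here is a small standard-library sketch that compresses a JSON payload and compares sizes. The record shape is made up; in a Django deployment compression would normally be handled by the web server or by Django's GZipMiddleware rather than by hand.

```python
import gzip
import json


def encode_payload(records):
    """Serialize records to JSON and return (raw bytes, gzip-compressed bytes)."""
    raw = json.dumps(records).encode("utf-8")
    return raw, gzip.compress(raw)


def decode_payload(compressed):
    """Reverse the compression and parsing done by encode_payload."""
    return json.loads(gzip.decompress(compressed).decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical payload: thousands of similar dictionaries.
    records = [
        {"id": i, "name": f"user{i}", "active": True, "tags": ["a", "b"]}
        for i in range(2000)
    ]
    raw, packed = encode_payload(records)
    print(f"raw: {len(raw)} bytes, gzipped: {len(packed)} bytes")
```

Repetitive JSON like this tends to compress well, but compression costs CPU on the server, so it is worth measuring both the size and the time on a real response.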

Note that these are basic questions and basic problem-solving steps; I'm assuming your team is aware of them and taking steps like these already.

In one project of mine, we're running a 100 GB MySQL DB, with some tables above 100 million records and growing rapidly. Properly indexed and optimized, it works OK on a lowly single VPS instance with 8 GB of RAM. The workload is clearly OLTP: we're throwing more sustained writes (100s/sec) at it than reads. All queries were scrutinized; almost all use the ORM, some are handwritten SQL, and other complex queries were rewritten to be done at the application level. Joins are harder at this scale and therefore preferably avoided (a major architectural decision we anticipated). But we can still easily throw hardware at it if needed.

For us, scaling is a continuous commitment to measure and refactor.

And one very important learning for me in my years of writing software: rewriting is very very very very costly.

These new engineers/other colleagues coming in, are they familiar with the problem domain, the existing codebase, and the scale at which you operate now and expect to operate in the future? Are they experienced in doing similar scaling before? And even if you think you can throw away your old work, now that you think you know better, be very careful of the Second-System Effect.

I hope you succeed.


Sergiy Khohlov

Feb 6, 2016, 5:03:11 PM
to django-users
Print the database structure.
Check the possibility of DB normalization.
100 GB (my "record" is 452 GB) is not that large, but this size requires some attention. (It looks like your MySQL uses only one DB file: try setting one file per table. Check the index size, and verify that the indexes are working correctly.)
Review your project:
 try to avoid ManyToMany fields
Is it possible to switch from hardcoded SQL to stored functions and procedures?
It looks like this issue is not connected to Django only.


Many thanks,

Serge


+380 636150445
skype: skhohlov


Dexter T.

Feb 6, 2016, 10:37:05 PM
to Django users
Hi Sergiy, are you referring to my post or to the OP?


On Sunday, February 7, 2016 at 6:03:11 AM UTC+8, Sergiy Khohlov wrote:
> Print the database structure.
> Check the possibility of DB normalization.

You might have meant "denormalization" here (?), especially when operating at such scale. We do use denormalization for some of our larger tables.

> 100 GB (my "record" is 452 GB) is not that large, but this size requires some attention. (It looks like your MySQL uses only one DB file: try setting one file per table. Check the index size, and verify that the indexes are working correctly.)

We are using innodb_file_per_table. But note that I mentioned that all of this 100 GB of data fits on a lowly VM with 8 GB of RAM, 50% of which is allocated to InnoDB buffers. With so few resources, but with intimate knowledge of your database workload, it is still possible to handle a DB of this size. And yes, our indexes are used well, as most queries were EXPLAINed and optimized accordingly.

What hardware are you running your 452 GB DB on?

> Review your project:
>  try to avoid ManyToMany fields
> Is it possible to switch from hardcoded SQL to stored functions and procedures?

See my comment above about denormalization. And stored procedures are arguably even harder to manage, both code-wise and deployment-wise.

> It looks like this issue is not connected to Django only.

Again, if you are referring to my post, I am not the OP. Not that our system is perfect, but we're not the ones with scaling problems.
I was in fact sharing the scaling practices that worked for us. See the OP's post for the problems they're facing (organizational / political / methodological).

Cheers!

Sergiy Khohlov

Feb 7, 2016, 5:50:26 AM
to django-users
Normalization is something like this:

http://www.studytonight.com/dbms/database-normalization.php


The hardware for this MySQL was:
serg@anomehost:~$ free -m
             total       used       free     shared    buffers     cached
Mem:          4049       3920        129          0        338       2016
-/+ buffers/cache:       1565       2484
Swap:         2863        516       2347

serg@anomehost:~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 15
model name      : Intel(R) Xeon(R) CPU           E5310  @ 1.60GHz
stepping        : 11
microcode       : 0xb7
cpu MHz         : 1596.170



MySQL has a problem:
the DB file grows steadily, and running a vacuum is really hard. Only partitioning helps in this case.

Many thanks,

Serge


+380 636150445
skype: skhohlov
