Django app crashes with out-of-memory error

Горобец Дмитрий

Oct 7, 2016, 4:21:48 AM
to Django users
Hello.

I have a VPS with 2 GB of RAM running Django, Gunicorn, MySQL, nginx, and Redis.

My app crashes with an out-of-memory error when I run a Django management command over a model with approximately 7 million records. The command takes each Premise object and updates one field by matching its value against regular expressions. Please help me optimize this command.

from django.core.management.base import BaseCommand

from myapp.models import Premise  # hypothetical path; adjust to your app


class Command(BaseCommand):
    help = 'Updates premise.number_order'

    def handle(self, *args, **options):
        # .iterator() avoids queryset caching, but the DB driver may
        # still fetch the whole result set into memory.
        for premise in Premise.objects.iterator():
            premise.number_order = premise.set_number_order()
            premise.save()

        self.stdout.write('Finished')


import re
from decimal import Decimal


# Method of the Premise model
def set_number_order(self):
    # Maps a trailing Cyrillic (or Latin 'A') letter to a decimal suffix
    # used for ordering.
    tr = {
        'А': '.10',
        'A': '.10',
        'Б': '.20',
        'В': '.30',
        'Г': '.40',
        'Д': '.50',
        'Е': '.60',
        'Ж': '.70',
        'З': '.80',
        'И': '.90',
    }

    only_digit = re.compile(r'^(?P<number>[0-9]{1,9})$')
    # Separator is '-' or '/'; a '|' inside a character class would match
    # a literal pipe, so the class is [-/].
    digit_with_separator = re.compile(r'^(?P<number>[0-9]{1,9})(?P<separator>[-/])(?P<rest>\w+)$')
    digit_with_letter = re.compile(r'^(?P<number>[0-9]{1,9})(?P<letter>[А-Яа-я]+)')
    result = 0
    title = self.title.strip().upper()

    # Run each regex once and reuse the match object instead of
    # re-matching for every group() call.
    only_digit_match = only_digit.match(title)
    separator_match = digit_with_separator.match(title)
    letter_match = digit_with_letter.match(title)

    if only_digit_match:
        result = only_digit_match.group('number') + '.00'

    elif separator_match:
        number = separator_match.group('number')
        rest = separator_match.group('rest')
        if rest[0].isalpha():
            result = number + tr.get(rest[0], '.90')
        elif rest[0].isdigit():
            if len(rest) >= 2 and rest[1].isdigit():
                result = number + '.{}'.format(rest[:2])
            else:
                result = number + '.0{}'.format(rest[0])

    elif letter_match:
        number = letter_match.group('number')
        letter = letter_match.group('letter')[0]
        result = number + tr.get(letter, '.90')

    return Decimal(result)

M Hashmi

Oct 7, 2016, 12:00:00 PM
to django...@googlegroups.com
These are all memory-hungry applications. Please provide some detail: how many worker processes does Gunicorn initialize, and how do you bind it to nginx? Your nginx configuration is also needed. Use "free -m" to see how much memory is allocated across your services, and "df -h" to see available disk space. Finally, if you have a swap file, what is its size?

You can create a swap file with "sudo fallocate -l 4G /swapfile", though after hitting this error it might not be a good idea to do it now. Alternatively, if you feel that swapping might solve the issue, you can back up the code, re-create your droplet, and set up swap from the start. I'd guess 4 GB of swap in total will be enough.

If I were you, I would comment out Redis and see if that solves the problem. I'm not sure what other applications you may have, but remember that Python consumes a lot of memory, and Redis is a really good tool for cache management. In some cases, though, depending on your settings, it needs considerable processing power and memory to manage the cache. That is why it suits large-scale projects, whereas for mid-size projects the cache can be managed within Django.

I cannot pinpoint the solution for you right away, because in this scenario there could be multiple causes. You can also configure nginx for browser-side caching on the user end to minimize load.

As for the piece of code you've provided, it looks fine as far as my limited understanding goes. I have had issues with PHP-based CMSs, but never with Django/Bottle/Pyramid.

I hope something here points you in the right direction, but for now that's all I can say with the information provided. If the solution turns out to be different from the reasons above, please do share it with the group. It is difficult to recreate the error with your exact scenario, so it is hard to come up with an exact solution.

Good Luck!

Regards,
Mudassar

Горобец Дмитрий

Oct 10, 2016, 7:14:30 AM
to Django users
Hello. 

Thanks for the advice.

The crash occurs when I run the Django command via python manage.py my_command_name.

It isn't related to nginx or even Gunicorn because, again, it's a Django management command that I run from the shell.

I switched off Redis caching (I use cacheops), but that didn't help either.

On Friday, October 7, 2016 at 13:21:48 UTC+5, Горобец Дмитрий wrote:

Constantine Covtushenko

Oct 10, 2016, 1:02:12 PM
to django...@googlegroups.com
Hi Dmitry,

Please check the documentation page for iterator().
As you can see, iterator() still loads the entire result set into memory and only then returns an iterator.

Try fetching records in batches of, say, 50-100 items.
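
A minimal sketch of that approach, chunking by primary key so only one batch is in memory at a time (the chunk size and the import path for Premise are assumptions to adapt):

from django.core.management.base import BaseCommand

from myapp.models import Premise  # hypothetical path; adjust to your app

CHUNK_SIZE = 100  # an assumption; tune upward while watching memory


class Command(BaseCommand):
    help = 'Updates premise.number_order in bounded chunks'

    def handle(self, *args, **options):
        last_pk = 0
        while True:
            # Each query fetches at most CHUNK_SIZE rows, keyed by pk,
            # so memory use stays flat regardless of table size.
            chunk = list(
                Premise.objects.filter(pk__gt=last_pk)
                               .order_by('pk')[:CHUNK_SIZE]
            )
            if not chunk:
                break
            for premise in chunk:
                premise.number_order = premise.set_number_order()
                premise.save()
            last_pk = chunk[-1].pk

        self.stdout.write('Finished')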

Hope that helps.

James Schneider

Oct 10, 2016, 1:23:57 PM
to django...@googlegroups.com
On Mon, Oct 10, 2016 at 10:01 AM, Constantine Covtushenko <constantine...@gmail.com> wrote:
Hi Dmitry,

Please check the documentation page for iterator().
As you can see, iterator() still loads the entire result set into memory and only then returns an iterator.

Try fetching records in batches of, say, 50-100 items.

Hope that helps.


That's correct. The queryset is evaluated (which means the entire result is loaded into memory); the iterator() call only optimizes access to the result, not the result itself. Batches of 50-100 items may be a bit conservative for the size of machine you have, but it all depends on the size of the objects and how heavily the server is utilized. There is likely a point of diminishing returns somewhere higher (probably in the thousands), but only experimentation will help you find it.

I also wanted to draw attention to the documentation note about database caching, which may put extra load on your server for large queryset results (although I can't imagine a DB would try to cache a result that large without specifically being tuned to do so).

Is there a reason this is run as a batch job rather than being performed during the initial save() or update() of the data? The operation you are performing seems static enough that it could be calculated as the data is created and/or updated, which might save you from having to deal with this type of bulk operation at all.
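
A sketch of that idea, overriding save() so the field is recomputed whenever a row is written (the field definitions here are assumptions, and note that bulk operations like queryset.update() bypass save() and would still need separate handling):

from django.db import models


class Premise(models.Model):
    # Field definitions are assumed for illustration.
    title = models.CharField(max_length=255)
    number_order = models.DecimalField(max_digits=11, decimal_places=2, default=0)

    def save(self, *args, **kwargs):
        # Recompute the ordering value on every write, so no batch
        # command is needed to keep it in sync with title.
        self.number_order = self.set_number_order()
        super(Premise, self).save(*args, **kwargs)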

-James