object.create speed, creating 3 objects takes 1 sec in sqlite


Anton

Aug 3, 2012, 12:05:47 PM8/3/12
to django...@googlegroups.com
Hi I am new to Django.

I worked through the tutorial.

I created a small test app on:
- Win7 64bit
- python 2.7.3 (32bit)
- Django 1.4.1
- using the development server


Now I wanted to populate the (sqlite backend) database
with a python script.

My Problem:

I insert 3 objects (like MyModel.objects.create(name="..", comment="..")).

They consist of only 2 strings.

Inserting these 3 items takes roughly 1 sec.
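
For reference, here is roughly what my population script looks like (the timing is just for illustration, and "myapp" stands in for my actual app name):

# rough sketch of my standalone population script (Python 2.7 / Django 1.4)
import os
import time

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'settings')

from myapp.models import MyModel  # "myapp" is a placeholder

start = time.time()
for i in range(3):
    MyModel.objects.create(name="name %d" % i, comment="comment %d" % i)
print "3 objects.create() calls took %.2f sec" % (time.time() - start)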

In settings.py I set DEBUG = False, but this doesn't help much
(I read something about this on the net).

I retried with the mysql backend:
- mysql 5.5.27 (64bit)
- using sockets (named pipe on windows)

Now the time varies between 0.2 sec and 0.8 sec every time
I insert such a 3-object triplet.


My Problem: I will populate it with 50000 items.

Does Django really have such bad performance for data insertion?

I can't believe it, so ... can somebody give me a hint?
Is there a doc page dedicated to speed/performance issues??

Otherwise I have to look for another system.

Thanks

Anton





Kurtis Mullins

Aug 3, 2012, 4:17:01 PM8/3/12
to django...@googlegroups.com
If you plan to create a batch of objects, I wouldn't recommend going that route. Instead, check this out:
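
Short of that, even just wrapping the loop in a single transaction helps a lot on SQLite, since by default Django commits after every insert. A rough sketch, assuming Django 1.4 and the MyModel example from the original post:

from django.db import transaction
from myapp.models import MyModel  # placeholder app name, model from the original post

# one commit for the whole batch instead of one commit per .create() call
# (commit_on_success can be used as a context manager since Django 1.3)
with transaction.commit_on_success():
    for name, comment in rows:  # 'rows' is whatever iterable you are loading from
        MyModel.objects.create(name=name, comment=comment)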


Does Django really have such bad performance for data insertion?

The Object Relational Mapper does quite a bit more than simply write SQL insert queries. I won't go into the details, simply because I don't want to steer you in the wrong direction, but if you're interested in that topic, I'm sure someone else can fill you in.

However, the main bottleneck that I see is in the structure. If you expect 5,000 queries at once on a regular basis, then you may want to structure your application around that, or at least the part that performs this batch insertion. Even a relational database management system will most likely have issues performing that type of duty directly (without the Django layer) unless it's optimized and tuned correctly, sitting on extremely fast IO hardware, etc.

Judging from your times with SQLite, I would guess that part of your problem is IO performance (the hard drive), since SQLite stores its data directly in a file.

Not sure if this answers your question but hopefully it might lead you in the right direction :) Good luck!

creecode

Aug 6, 2012, 11:14:28 AM8/6/12
to django...@googlegroups.com, ant...@gmx.de
Hello Anton,

On Friday, August 3, 2012 9:05:47 AM UTC-7, Anton wrote:
 
My Problem: I will populate it with 50000 items.

Does Django really have such bad performance for data insertion?

I can't believe it, so ... can somebody give me a hint?
Is there a doc page dedicated to speed/performance issues??

Otherwise I have to look for another system.

Before you go looking for another system, you may want to do some research on inserting lots of rows with Django.  Cal Leeming on this list has had some experience dealing with lots of rows, and there are several threads on the topic.  Check the Google interface for this group and use the search feature.  Also try Googling for inserting lots of data with Django; several folks have written about their experiences inserting lots of rows.

Toodle-loooooooooooooo...................
creecode

Cal Leeming [Simplicity Media Ltd]

Aug 8, 2012, 12:18:31 AM8/8/12
to django...@googlegroups.com
Hi Anton,

In short, attempting to do any sort of bulk import "out of the box" with the ORM will always end in bad performance.

No matter how fast your disks are (SSDs with 4000 IOPS in RAID 1, for example), you'll still only get around 0.1s per insert via the ORM on a single thread, and if you multi-thread the import on a MySQL backend you'll end up hitting the notorious high-throughput race condition ( https://code.djangoproject.com/ticket/18557 )

The method used to speed up your bulk imports depends entirely on the actual data itself.

Depending on whether or not you absolutely need all the functionality of the ORM during import (signals, hooks, etc.), you could also look at bulk_create(), which comes with Django 1.4.
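
A rough sketch of that, reusing the model from the original post (note that bulk_create() skips the model's save() method and the pre/post_save signals):

from myapp.models import MyModel  # placeholder app name, model from the original post

# build the instances in memory, then insert one chunk per query
# instead of one query per object (bulk_create() is new in Django 1.4)
objs = [MyModel(name="name %d" % i, comment="comment %d" % i)
        for i in xrange(50000)]

CHUNK = 500  # keeps SQLite under its bound-variable limit
for i in xrange(0, len(objs), CHUNK):
    MyModel.objects.bulk_create(objs[i:i + CHUNK])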

I actually did a webcast about a year ago on how to do bulk updates to achieve massively increased throughput (from 30 rows/sec to 8000 rows/sec), and this method was eventually integrated into the DSE plugin ( http://pypi.python.org/pypi/dse/3.3.0 - look for bulk_update ).

As your table and indexes grow, both read and write performance will slowly start to get worse once you pass the 1 million row point - so make sure to put some thought into the initial model structure (using ints to define choices where possible, no indexes on varchar/char columns if possible, making good use of composite/compound indexes, etc.).
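
To make that concrete, a hypothetical sketch of the kind of structure described above (the model and field names are made up):

from django.db import models

class LogEntry(models.Model):
    # small ints for choices instead of a char column
    STATUS_NEW, STATUS_DONE = 1, 2
    STATUS_CHOICES = ((STATUS_NEW, 'new'), (STATUS_DONE, 'done'))

    status = models.PositiveSmallIntegerField(choices=STATUS_CHOICES,
                                              default=STATUS_NEW)
    comment = models.CharField(max_length=255)  # no db_index on the free-text column
    created = models.DateTimeField()

    class Meta:
        # composite index on the columns you actually filter/sort by
        # (index_together needs Django 1.5+; on 1.4 you'd add it with raw SQL)
        index_together = [('status', 'created')]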

Failing that, if you want to drag every last microsecond of performance out of this, you could perform the bulk imports as raw SQL.
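
A minimal sketch of that approach, assuming the two-column model from the original post and Django's default <app>_<model> table naming:

from django.db import connection, transaction

# bypass the ORM completely: one executemany() call, one commit
# (the table name below is just the default <app>_<model> convention and is a guess)
cursor = connection.cursor()
cursor.executemany(
    "INSERT INTO myapp_mymodel (name, comment) VALUES (%s, %s)",
    [("name %d" % i, "comment %d" % i) for i in xrange(50000)],
)
transaction.commit_unless_managed()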

Hope this helps!

Cal




Cal Leeming [Simplicity Media Ltd]

Aug 8, 2012, 8:26:17 AM8/8/12
to django...@googlegroups.com
Just to chime in again here: there are many alternatives, but don't give up on MySQL straight away.

Solutions such as Cassandra, MongoDB, Hadoop clustering, etc. are also lovely, but they all come at a cost (whether that be additional risk, development time, performance/feature trade-offs, the "unknown", etc.).

Here is an interesting article by David Mytton over at BoxedIce, which is just one example of some of the unexpected problems that can come up:


In my own experience, I've seen MySQL handle 400+ million rows without much of a read/write performance hit for a specific data structure, but I've also seen it suffer terribly in both read and write performance with 900+ million rows for a different data structure, on the same hardware.

As always, you need to identify what your data is and how you intend to use it - before you can decide both how it needs to be stored, and how it will be extracted.

Hope this helps - I should also note that I am by no means an expert on the subject, I'm just speaking from my own production experience :)

Cal


On Wed, Aug 8, 2012 at 11:41 AM, Ivo Marcelo Leonardi Zaniolo <imarc...@gmail.com> wrote:

Some years ago I had a similar problem using PostgreSQL. To improve the bulk performance I chose to disable all keys and indexes, and rebuild them after bulk-loading the data.
On Postgres this solved my problem, but if you need more performance and have more than one computer, I recommend that you take a look at Cassandra; it has great insert performance and could be used to hold that much data.
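
For illustration, roughly what that looks like on PostgreSQL (there is no real "disable index" switch, so in practice it means dropping the secondary indexes, loading, and recreating them; the index and table names below are just made-up examples):

from django.db import connection

cursor = connection.cursor()
# drop the secondary index first, so each insert doesn't have to maintain it
cursor.execute("DROP INDEX IF EXISTS myapp_mymodel_name_idx")

# ... bulk-load the data here (COPY, executemany(), bulk_create(), etc.) ...

# rebuild the index once, after all the rows are in
cursor.execute("CREATE INDEX myapp_mymodel_name_idx ON myapp_mymodel (name)")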

Take a look at this post; I wrote it in Portuguese, but Google can translate it:

Http://imarcelolz.blogspot.com/2011/11/cassandra-hadoopy-performance-tunning.html

Ivo Marcelo Leonardi Zaniolo
+55 71 9302 3400
imarc...@gmail.com
www.informatizzare.com.br
imarcelolz.blogspot.com.br




--

Cal Leeming
Technical Support | Simplicity Media Ltd
US 310-362-7070 | UK 02476 100401 | Direct 02476 100402

Available 24 hours a day, 7 days a week.

