Sure, I'd attend.
--
Michał (Saviq) Sawicz <mic...@sawicz.net>
Out of curiosity: when inserting lots of data, how do you do it? Using
the ORM? Have you looked at http://pypi.python.org/pypi/dse/2.1.0 ? I
wrote DSE to solve inserting/updating huge sets of data, but if
there's a better way to do it, that would be especially interesting to
hear more about (and sorry for the self-promotion).
Regards,
Thomas
--
Mvh/Best regards,
Thomas Weholt
http://www.weholt.org
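For readers following along: the baseline DSE is built to avoid is the one-INSERT-per-row ORM loop. A minimal sketch, with a hypothetical model name; later Django releases also added Model.objects.bulk_create for exactly this case:

    # The naive pattern: one INSERT and one round-trip per row.
    # LogEntry is a hypothetical model, not one from this thread.
    for row in rows:
        LogEntry.objects.create(**row)

    # The batched alternative in later Django versions: a handful of
    # multi-row INSERT statements instead of len(rows) single-row ones.
    LogEntry.objects.bulk_create(
        [LogEntry(**row) for row in rows], batch_size=1000
    )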
Hmmm, what do you mean by "bulk using update() + F()"? Something like
"update sometable set somefield1 = somevalue1, somefield2 = somevalue2
where id in (1,2,3 .....)"? Does "avg 13.8 mins/million" mean you
processed 13.8 million rows per minute? What kind of hardware did you
use?
Thomas
Actually, I started working on something similar, but tried to find
sets of fields, instead of just updating one field per update, but
didn't finish it because the actual grouping of the fields seemed to
take a lot of time/cpu/memory. Perhaps if I focused on updating one
field at a time it would be simpler. Might look at it again for DSE
3.0 ;-)
Thomas
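A rough sketch of the "sets of fields" idea Thomas describes; this is a hypothetical helper, not DSE code. Rows whose updated fields share identical values collapse into one multi-column UPDATE:

    from collections import defaultdict

    def group_updates(rows, fields):
        # rows: iterable of dicts, each with an "id" plus the field values.
        groups = defaultdict(list)
        for row in rows:
            key = tuple(row[f] for f in fields)
            groups[key].append(row["id"])
        return groups

    # Each group then becomes one statement, e.g.:
    #   Record.objects.filter(id__in=ids).update(**dict(zip(fields, key)))
    # Building the keys is O(rows x fields), which is consistent with the
    # time/memory cost Thomas mentions when value combinations rarely repeat.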
Hello, Cal
First of all, congrats on the newborn! The Django community will surely benefit from having yet another success story, especially considering how big this project sounds. Is there any chance you could open-source some of your custom-made improvements so that they could eventually be merged into trunk?
I definitely noticed you mentioning large DBs over the past few months. I, along with many others I assume, would surely like to attend the webcast, with the only impediment being my schedule/timezone.
I recently asked about working with temporary tables for filtering/grouping data from uploads and then inserting the results from that temporary table into a permanent table. To make matters worse, I wanted to make this as flexible as possible (i.e. dynamic models) so that everything could be managed from a web app. Do you have any experience you could share about any of these use cases? As far as I know, there's nothing in the ORM that replicates PostgreSQL's CREATE TEMPORARY TABLE. My experience with SQL is rather limited, but from asking around, it seems like my project could indeed benefit from such a feature. If I had to guess, I would assume other DBMSs offer something similar, but being limited to Postgres is okay for me, for now anyway.
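A minimal sketch of the workaround Andre is asking about: dropping to raw SQL for a Postgres temporary table through Django's connection, since the ORM has no equivalent. Table and column names below are invented for illustration:

    from django.db import connection, transaction

    def stage_and_promote(upload_rows):
        # upload_rows: list of (id, payload) tuples from the upload.
        with transaction.atomic():
            with connection.cursor() as cursor:
                # Session-local table; ON COMMIT DROP cleans it up when the
                # surrounding transaction commits (Postgres-specific).
                cursor.execute(
                    "CREATE TEMPORARY TABLE staging "
                    "(id integer, payload text) ON COMMIT DROP"
                )
                cursor.executemany(
                    "INSERT INTO staging (id, payload) VALUES (%s, %s)",
                    upload_rows,
                )
                # Filter/group in SQL, then promote the survivors
                # into the permanent table.
                cursor.execute(
                    "INSERT INTO permanent_table (id, payload) "
                    "SELECT DISTINCT id, payload FROM staging "
                    "WHERE payload <> ''"
                )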
On Wed, Jun 22, 2011 at 3:25 PM, Andre Terra <andre...@gmail.com> wrote:

> Hello, Cal
>
> First of all, congrats on the newborn! The Django community will surely benefit from having yet another success story, especially considering how big this project sounds. Is there any chance you could open-source some of your custom-made improvements so that they could eventually be merged into trunk?

Thank you! Yeah, the plan is to release as many of the improvements as open source as possible, although I'd rely heavily on the community to make them 'patch worthy' for the core, as the amount of spare time I have is somewhat limited. The improvements list is growing by the day, and I usually try and post as many snippets as I can, and/or tickets etc. It sounds like Thomas's DSE might be the perfect place for the bulk update code too.

> I definitely noticed you mentioning large DBs over the past few months. I, along with many others I assume, would surely like to attend the webcast, with the only impediment being my schedule/timezone.

Once we've got a list of all the people who want to attend, I'll send out a mail asking for everyone's timezone and availability, so we can figure out what is best for everyone.

> I recently asked about working with temporary tables for filtering/grouping data from uploads and then inserting the results from that temporary table into a permanent table. [...] As far as I know, there's nothing in the ORM that replicates PostgreSQL's CREATE TEMPORARY TABLE. [...]
I haven't had any exposure to Postgres, but my experience with temporary tables hasn't been a nice one (in regards to MySQL at least). MySQL has many gotchas when it comes to temporary tables and indexing, and on more than one occasion I found it was actually quicker to analyse/mangle/re-insert the data via Python code than it was to attempt the modifications within MySQL using a temporary table. It really does depend on what your data is and what you want to do with it, which can make planning ahead somewhat tedious lol.

For our stuff, when we need to do bulk modifications, we have a filtering rules list which is run every hour against new rows (with is_checked=1 set on rows which have been checked). We then use bulk queries of 50k (id >= 0 AND id < 50000), rather than using LIMIT/OFFSET (because LIMIT/OFFSET gets slower and slower the larger the result set). Those queries are analysed/mangled within a transaction, and bulk updated using the method mentioned in the reply to Thomas.

Sadly though, I can't say if the methods we use would be suitable for you, as we haven't tried them against Postgres, and we've only tested them against our own data set + requirements. This is what I mean by trial and error; it's a pain in the ass :)
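A sketch of that id-range batching (model name assumed): fixed windows on the primary key keep each query's cost roughly flat, whereas OFFSET forces the database to walk past every earlier row before returning anything.

    from django.db.models import Max

    from myapp.models import Record  # hypothetical model

    BATCH = 50000

    def iter_id_ranges(batch=BATCH):
        max_id = Record.objects.aggregate(m=Max("id"))["m"]
        if max_id is None:
            return  # empty table
        for start in range(0, max_id + 1, batch):
            # id >= start AND id < start + batch, as in the post above.
            yield Record.objects.filter(id__gte=start, id__lt=start + batch)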
Also, the 13.8 minutes per million is basically a benchmark derived from the number of db writes and the total amount of time it took to execute, which was 51s (at that rate, the run wrote on the order of 60k rows in those 51 seconds, which scales to roughly 13.8 minutes per million).

Please also note, this code is doing a *heavy* amount of content analysis, but if you were to strip that out, the only overheads would be the map/filter/lambda, the time it takes to transmit to MySQL, and the time it takes for MySQL to perform the writes.
The database hardware spec is:

- 1x X3440 quad core (2 cores assigned to MySQL)
- 12GB memory (4GB assigned to MySQL)
- /var/lib/mysql mapped to 2x Intel M3 SSD drives in RAID 1

Cal

On Wed, Jun 22, 2011 at 2:52 PM, Cal Leeming [Simplicity Media Ltd] <cal.l...@simplicitymedialtd.co.uk> wrote:
> Sorry, let me explain a little better.
Hello SleepyCal,
On Wednesday, June 22, 2011 6:15:48 AM UTC-7, SleepyCal wrote:

> If you're interested, please reply on-list so others can see.
+1
Also, if the webcast could be stored for later viewing, that would be grand.
Toodle-loooooooooooooo..........
creecode
Yup, I'm planning on recording in 1080p and posting on YouTube shortly afterwards.
FYI: Inspired by this discussion, I've already started on a similar
feature (although somewhat simplified) for DSE v2.2.0, and you're
right; the speed increase is huge using the method described here,
even compared to my current solution (using cursor.executemany),
which is considerably faster than the Django ORM already. My testing
so far has been using PostgreSQL; not sure how MySQL will perform. I
expect to release DSE v2.2.0 with this feature in the next few days.
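For illustration, one common reason such batching beats cursor.executemany, especially on PostgreSQL, where psycopg2's executemany historically executed one statement per row: collapsing the whole batch into a single multi-row INSERT. Table and column names below are invented:

    from django.db import connection

    def bulk_insert(rows):
        # rows: list of (col_a, col_b) tuples.
        with connection.cursor() as cursor:
            placeholders = ", ".join(["(%s, %s)"] * len(rows))
            params = [value for row in rows for value in row]
            # One statement and one round-trip for the whole batch,
            # instead of one per row.
            cursor.execute(
                "INSERT INTO mytable (col_a, col_b) VALUES " + placeholders,
                params,
            )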
Hi all,

Some of you may have noticed, in the last few months I've done quite a few posts/snippets about handling large data sets in Django. At the end of this month (after what seems like a lifetime of trial and error), we're finally going to be releasing a new site which holds around 40mil+ rows of data, grows by about 300-500k rows each day, handles 5GB of uploads per day, and can handle around 1024 requests per second under stress test on a moderately spec'd server.

As the entire thing is written in Django (and a bunch of other open source products), I'd really like to give something back to the community. (The stack includes Celery/RabbitMQ/Sphinx SE/PyQuery/Percona MySQL/NGINX/supervisord/Debian etc.)

Therefore, I'd like to see if there would be any interest in a webcast in which I would explain how we handle such large amounts of data, the trial and error processes we went through, some really neat tricks we've done to avoid bottlenecks, our own approach to smart content filtering, and some of the valuable lessons we have learned. The webcast would be completely free of charge, last a couple of hours (with a short break) and anyone can attend. I'd also offer up a Q&A session at the end.
If you're interested, please reply on-list so others can see.
Thanks,
Cal
Cal,
That sounds awesome. I wish you could present it at DjangoCon US too. :o/
Shawn
Please, count me in.
I'm interested; I've visited the URL above already.
Time isn't going to be that big of an issue for me as I'm only 1 timezone away :)
Cheers,
Benedict
Here you go:
Count me in, and thanks for sharing your experience with others.
> If you're interested, please reply on-list so others can see.
Graziella
Can you please... please... please... please record this session!?
-V