Handling millions of rows + large bulk processing (now 700+ mil rows)


Cal Leeming [Simplicity Media Ltd]

Jun 30, 2012, 11:10:27 AM
to django...@googlegroups.com
Hi all,

As some of you know, I did a live webcast last year (July 2011) on our LLG project, which explained how we overcame some of the problems associated with large data processing.

After reviewing the video, I found that the sound quality was very poor, the slides weren't very well structured, and some of the information is now out of date (at the time it was 40mil rows, now we're dealing with 700+mil rows).

Therefore, I'm considering doing another live webcast (except this time it'll be recorded and posted the next day, the stream will be available in 1080p, it'll be far better structured, and it will only last 50 minutes).

The topics I'd like to cover are:

* Bulk data processing where bulk_create() is still not viable (we went from 30 rows/sec to 8000 rows/sec on bulk data processing, whilst still using the ORM - no raw SQL here!!)
* Applying a faux child/parent relationship when the standard ORM is too expensive (allows an ORM approach without the cost)
* Applying a faux read-only ORM structure to legacy applications (allows ORM usage on schemas that weren't properly designed and cannot be changed - for example, vendor software with no source code).
* New Relic is beautiful, but expensive. Hear more about our plans to make an open source version.
* Appropriate use cases for IaaS vs colo with SSDs.
* Percona is amazing: some of the tips/tricks we've learned over time.
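The topics above are only named in this thread, not explained. As a rough illustration of the general idea behind the first one (not Cal's actual method, which the thread never shows), here is a minimal Python sketch of batching: rows are grouped into fixed-size chunks so that each chunk can be written in one go, for example inside a single database transaction, instead of committing row by row. The `chunked` helper, the batch size, and the `MyModel`/`source_rows` names in the comments are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from `rows`.

    Hypothetical helper: it pulls items lazily from any iterable, so a
    lazily evaluated source never has to be loaded into memory all at once.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Hypothetical Django usage (not executed here; `MyModel` and `source_rows`
# are placeholders). Each batch is saved through the ORM inside one
# transaction, so there is one commit per batch instead of one per row.
#
# from django.db import transaction
#
# for batch in chunked(source_rows, 1000):
#     with transaction.atomic():
#         for row in batch:
#             MyModel(**row).save()
```

One reason to batch ordinary `save()` calls rather than use `bulk_create()` is that `bulk_create()` does not call each model's `save()` method and does not send the `pre_save`/`post_save` signals, so any logic hooked there would be skipped.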
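For the second topic, one common way to avoid the cost of per-object related lookups (the classic "N+1 queries" problem) is to fetch parents and children in two separate queries and join them in Python. The sketch below shows only that in-memory join step; `Parent`, `Child` and `attach_children` are hypothetical stand-ins for model instances and helper code, and this is not necessarily the faux relationship Cal had in mind. Django's own `prefetch_related()` follows a similar strategy of separate lookups joined in Python.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Parent:
    # Hypothetical stand-in for a parent model instance.
    id: int
    name: str
    children: List["Child"] = field(default_factory=list)


@dataclass
class Child:
    # Hypothetical stand-in for a child model instance with a parent FK.
    id: int
    parent_id: int
    value: str


def attach_children(parents: Iterable[Parent], children: Iterable[Child]) -> List[Parent]:
    """Group children by parent_id in one pass, then attach each group.

    Children whose parent_id matches no parent are ignored; parents with
    no children get an empty list.
    """
    by_parent: Dict[int, List[Child]] = defaultdict(list)
    for child in children:
        by_parent[child.parent_id].append(child)
    result = []
    for parent in parents:
        # .get() does not insert missing keys into the defaultdict.
        parent.children = by_parent.get(parent.id, [])
        result.append(parent)
    return result
```

In a Django project the two inputs might come from something like `Parent.objects.all()` and `Child.objects.filter(parent__in=parents)`; that query shape is an assumption about the schema, but either way the cost is two queries rather than one per parent.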

If you'd like to see this happen, please leave a reply in the thread - if enough people want this, we'll hold a public vote for the scheduled date.

Cheers

Cal

Rafael Durán Castañeda

Jun 30, 2012, 8:36:12 PM
to django...@googlegroups.com
On 30/06/12 17:10, Cal Leeming [Simplicity Media Ltd] wrote:
> --
> You received this message because you are subscribed to the Google
> Groups "Django users" group.
> To post to this group, send email to django...@googlegroups.com.
> To unsubscribe from this group, send email to
> django-users...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/django-users?hl=en.
That would be great!

Setiaman Lee

Jun 30, 2012, 8:40:17 PM
to django...@googlegroups.com

Hi Cal,
Interesting topic. Count me in.


Sunny Nanda

Jul 1, 2012, 12:40:38 AM
to django...@googlegroups.com
Count me in please.

Mario Gudelj

Jul 1, 2012, 1:24:33 AM
to django...@googlegroups.com
+1

s

Jul 1, 2012, 1:54:14 AM
to django...@googlegroups.com

Great


Àlex Pérez

Jul 1, 2012, 3:42:26 AM
to django...@googlegroups.com
+1


Nikhil Verma

Jul 1, 2012, 3:46:44 AM
to django...@googlegroups.com
Great. +1
Regards
Nikhil Verma

ionic drive

Jul 1, 2012, 3:52:08 AM
to django...@googlegroups.com
+1 great!

Alec Taylor

Jul 1, 2012, 4:06:26 AM
to django...@googlegroups.com
+1 :]

Davinir F Campos Jr

Jul 1, 2012, 8:05:29 AM
to django...@googlegroups.com

+1

James

Jul 1, 2012, 9:55:03 AM
to django...@googlegroups.com
Sounds great! 

william ratcliff

Jul 1, 2012, 10:02:33 AM
to django...@googlegroups.com
Sounds fun!

Cal Leeming [Simplicity Media Ltd]

Jul 1, 2012, 11:09:05 AM
to django...@googlegroups.com
Wow - glad to see there are people interested in this!

Here is the schedule; could everyone please select which days/times they are available (enter more than one if possible)?


I'll leave the schedule open until 14th July, whichever slot gets the most votes wins.

Given our awful experiences with conferencing software, we'll probably be using Livestream, plus a backup stream from one of our own servers - both have a maximum capacity of 50 users at 720p.

Cal

Alec Taylor

Jul 2, 2012, 12:35:04 AM
to django...@googlegroups.com
Sounds good, but have you considered using Google+ Hangouts?


Derek

Jul 2, 2012, 2:41:33 AM
to django...@googlegroups.com
Cal

Can you please confirm which timezone these times are in (I assume UTC, but wasn't sure...)

Thanks
Derek



Raphael

Jul 2, 2012, 3:13:08 AM
to django...@googlegroups.com
Hello Derek,

They (Doodle) don't mention which time zone they use, but I guess it's UTC.
In the upper right corner of the Doodle poll, your location is detected via GeoIP - you can change this preselection manually.


--
Raphael
http://develissimo.com



Cal Leeming [Simplicity Media Ltd]

Jul 2, 2012, 4:58:00 AM
to django...@googlegroups.com
Oops, sorry forgot to mention that it's in UTC :)

Cal Leeming [Simplicity Media Ltd]

Jul 2, 2012, 4:59:30 AM
to django...@googlegroups.com
Curious - I'll have to test the quality, but we might use Hangouts for the backup stream instead; it'd certainly be a lot easier if the quality were good!

Cheers

Cal

Cal Leeming [Simplicity Media Ltd]

Jul 2, 2012, 7:35:23 PM
to django...@googlegroups.com
Just in case anyone missed the URL, you can book your slot here:


Voting open until 14th July.

Cal

Timothy Makobu

Jul 4, 2012, 2:30:14 AM
to django...@googlegroups.com
I'm in.

Babatunde Akinyanmi

Jul 5, 2012, 6:18:31 AM
to django...@googlegroups.com
For the knowledge of it, me too.

Cal Leeming [Simplicity Media Ltd]

Jul 12, 2012, 3:37:54 PM
to django...@googlegroups.com
Just a reminder that the poll will be closing in two days' time - if you haven't already booked, please do so now!

Javier Guerra Giraldez

Jul 12, 2012, 11:38:25 PM
to django...@googlegroups.com
On Thu, Jul 12, 2012 at 2:37 PM, Cal Leeming [Simplicity Media Ltd]
<cal.l...@simplicitymedialtd.co.uk> wrote:
> Just a reminder that the poll will be closing in two days time, if you
> haven't already booked, please do so now!

I thought somebody asked this but can't find it now... is there any
recording of the first one available anywhere? (the 40mil one)

--
Javier

Cal Leeming [Simplicity Media Ltd]

Jul 13, 2012, 7:25:46 AM
to django...@googlegroups.com
Hi Javier,

Sadly, the original is not available due to bad sound quality, poor planning and out of date information.

That is partially the reason why I'm doing it again :)

Cal

Javier Guerra Giraldez

Jul 13, 2012, 11:50:42 AM
to django...@googlegroups.com
On Fri, Jul 13, 2012 at 6:25 AM, Cal Leeming [Simplicity Media Ltd]
<cal.l...@simplicitymedialtd.co.uk> wrote:
> Sadly, the original is not available due to bad sound quality, poor planning
> and out of date information.

then count me in!

(not that I prefer to see it recorded; I just wanted to check a 'preview')

--
Javier

Cal Leeming [Simplicity Media Ltd]

Jul 16, 2012, 7:23:00 AM
to django...@googlegroups.com
Hi guys,

This has now been confirmed for the following date:

 - Thursday 9th August 2012 - 7:00 PM (UTC)

Webcast details will be sent 48 hours before.

Cheers

Cal

Fadi Samara

Jul 16, 2012, 3:07:03 PM
to django...@googlegroups.com
Interested.

b1-

Jul 21, 2012, 2:02:14 AM
to django...@googlegroups.com
+1

And use some service like twich with auto VOD writing.
I think it will be interesting not only for us.

Javi Romero

Jul 21, 2012, 10:37:48 AM
to django...@googlegroups.com
I'm in :D

Cal Leeming [Simplicity Media Ltd]

Jul 22, 2012, 10:05:45 AM
to django...@googlegroups.com
Could you clarify what you mean by "twich" and "auto VOD writing"..? Google failed me :/


Babatunde Akinyanmi

Jul 22, 2012, 10:27:59 AM
to django...@googlegroups.com
Google failed me too


Sithembewena Lloyd Dube

Jul 22, 2012, 6:34:20 PM
to django...@googlegroups.com
Hi Cal,

I know it's confirmed, but I'm in like Flynn as well ...

While I've never had to sort things out on that scale, it'd be interesting to tap into that knowledge.
Regards,
Sithembewena Lloyd Dube

Alex

Jul 23, 2012, 2:54:28 AM
to django...@googlegroups.com
Hello!

Will you make the video available after the webcast?


Thanks.

Cal Leeming [Simplicity Media Ltd]

Jul 23, 2012, 8:21:03 AM
to django...@googlegroups.com
Hi Alex,

Yup - the video will be made available, although I may ask if anyone here with a Vimeo/YouTube Director account would mind uploading it instead. Otherwise the video will need splitting :/

Cal


Cal Leeming [Simplicity Media Ltd]

Aug 6, 2012, 9:40:14 PM
to django...@googlegroups.com
Hi everyone.

Really sorry guys - but due to last minute work commitments I'm going to have to delay this webcast for a short while.

Although we could have still gone ahead - we didn't have any slides or tidy example code ready, and I really didn't want the whole thing to be rushed like the last time.

A few people have suggested that we do the webcast offline, post it up on Vimeo, then accept questions afterwards in several Google Hangouts sessions, so we might end up doing this instead to avoid delaying any further.

Again, my apologies to everyone for the last minute cancellation on this - we'll get there in the end!

Cal


ApogeeGMail

Aug 21, 2012, 6:24:39 PM
to django...@googlegroups.com
I would be VERY interested.

Thanks

Richard

Richard Smith


