DB Design question


jordi collell

Aug 1, 2015, 2:59:38 AM
to Django users
Hi all!

I have to store spreadsheet-like info in a DB: a kind of quote where the user can store distinct prices for every zone.

After modeling the data, I have a Cell-like row (related to a zone, and with a quantity field), something like:

zone1  1  100
zone1  2  99
zone1  3  98

Every zone is a FK. The problem is that the data grows quickly, because some quotes can have up to 65 unit fields with 70 to 100 zones.

Currently I have an auto PK field, but I'm not sure it will scale. If I have 5,000 users and every user makes 100 quotations (7,000 rows each in the worst case), that is 3,500,000,000 rows, which is easy enough to overflow a standard auto-increment field.

I'm thinking of getting rid of the PK field and using unique_together on (zone, units). Do you think that's a good approach with the ORM? I saw that it's not possible to make a composite primary key (with grouped fields), but perhaps instead of declaring an int field I could have a char field storing
'%s_%s_%s' % (quote_id, zone_id, units).
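
Roughly, what I have in mind (model and field names here are just placeholders, not my real code):

from django.db import models

class Cell(models.Model):
    quote = models.ForeignKey('Quote', on_delete=models.CASCADE)
    zone = models.ForeignKey('Zone', on_delete=models.CASCADE)
    units = models.IntegerField()
    price = models.DecimalField(max_digits=12, decimal_places=2)

    class Meta:
        # one row per (quote, zone, units) combination
        unique_together = ('quote', 'zone', 'units')

    # The char-key alternative would be something like:
    #   key = models.CharField(max_length=64, primary_key=True)
    # filled with '%s_%s_%s' % (quote_id, zone_id, units) before saving.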

Do you think that last option could be a good approach?

Also, if I have to shard the data in the future, I could do it using this kind of key. Data will be queried by zone (for making comparisons of quotes).

I would appreciate any help on the matter.


Stephen J. Butler

Aug 1, 2015, 4:57:22 AM
to django...@googlegroups.com
Why not use a BigIntegerField?
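
Something like this, for example (untested sketch; the zone/units/price fields are just guesses at your schema). Note that with a plain BigIntegerField as the PK you have to assign the values yourself, e.g. from a DB-level sequence; newer Django versions also add BigAutoField, which auto-increments like the default AutoField but is 64-bit:

from django.db import models

class Cell(models.Model):
    # 64-bit key: max value is about 9.2 x 10^18, far beyond the
    # 3.5 billion rows you're worried about.
    id = models.BigIntegerField(primary_key=True)

    zone = models.ForeignKey('Zone', on_delete=models.CASCADE)
    units = models.IntegerField()
    price = models.DecimalField(max_digits=12, decimal_places=2)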


James Schneider

Aug 1, 2015, 5:53:52 AM
to django...@googlegroups.com
If you are talking about potentially having enough rows to extend past the AutoPK limits, you should consider instead using a UUIDField as the PK.


The example in the docs uses uuid4. They index nicely, and there are 2^128 (~3.4 x 10^38) UUIDs available, the same as the total number of IPv6 addresses. You'll run out of database resources (HDD, RAM, CPU), and more importantly lifespan, before running out of UUIDs. I've seen some larger DB applications use UUIDs exclusively as the PKs for all of their tables.
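
Something along these lines (adapted from the UUIDField example in the docs; the non-PK fields are just guesses at your schema):

import uuid

from django.db import models

class Cell(models.Model):
    # uuid4 value generated in Python when the object is created;
    # editable=False keeps forms/admin from trying to change it.
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)

    zone = models.ForeignKey('Zone', on_delete=models.CASCADE)
    units = models.IntegerField()
    price = models.DecimalField(max_digits=12, decimal_places=2)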

Trying to merge fields to create uniqueness is difficult to do, wastes CPU, can be buggy if fields are missing, etc., and probably doesn't index as well.

If possible, I would also start thinking about a pruning/archiving strategy to keep your main tables as lean as possible. With a couple billion rows, you may see a bit of a stutter on queries. ;-D

-James


Javier Guerra Giraldez

Aug 1, 2015, 10:32:07 AM
to django...@googlegroups.com
On Sat, Aug 1, 2015 at 4:53 AM, James Schneider <jrschn...@gmail.com> wrote:
> If you are talking about potentially having enough rows to extend past the
> AutoPK limits, you should consider instead using a UUID field as the PK:

Note that this is only good advice if your DB handles UUIDs natively. If not, Django will use a CharField to store it and constantly encode/decode it. If you're using PostgreSQL, then yes, UUID is great; on MySQL it's far more efficient to use a BigInteger. 2^64 records is still far more than you can fit on the biggest storage you can get.
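
If you want to check what your backend actually does, look at the SQL Django generates for the migration (the app label and migration name below are just placeholders):

python manage.py sqlmigrate yourapp 0001

On PostgreSQL a UUIDField comes out as a native uuid column; on backends without a native type (MySQL included) it's stored as a char(32), which is where the constant encode/decode overhead comes from.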


--
Javier