Avoiding huge IDs after deleting data

29 views
Skip to first unread message

Sean Whalen

unread,
Aug 5, 2013, 5:26:00 PM8/5/13
to django...@googlegroups.com
I have a Django/PostgreSQL application to analyze data from tweets. The dataset increases by thousands of records with each request. I am using the database primarily as a relational cache, so I had planned to delete all records every 24 hours to permit new requests, without needlessly increasing the size of the database.

However, because the Django ORM uses the SERIAL data type to store the IDs in the DB; the IDs get larger, even when all existing records have been deleted. Eventually, I will run out of key space. What can I do to make Django produce smaller ID values?

Javier Guerra Giraldez

unread,
Aug 5, 2013, 5:36:39 PM8/5/13
to django...@googlegroups.com
On Mon, Aug 5, 2013 at 4:26 PM, Sean Whalen <whale...@gmail.com> wrote:
> However, because the Django ORM uses the SERIAL data type to store the IDs
> in the DB; the IDs get larger, even when all existing records have been
> deleted. Eventually, I will run out of key space. What can I do to make
> Django produce smaller ID values?


instead of "producing smaller ID values" (which could potentially
collide if you decide to keep a few "old" records), try switching to
bigserial. even if you get a million requests per day, producing a
billion new records, you won't exhaust the keyspace before several
billion days (almost 50 million years from now)

--
Javier

Sean Whalen

unread,
Aug 5, 2013, 5:52:08 PM8/5/13
to django...@googlegroups.com
How can the models be configured to use bigserial? I know I could convert the tables myself, but that is not helpful for distributing the app.

It looks like someone requested that feature three years ago.

https://code.djangoproject.com/ticket/14286



--
You received this message because you are subscribed to a topic in the Google Groups "Django users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/django-users/Ro7TOBuBEQg/unsubscribe.
To unsubscribe from this group and all its topics, send an email to django-users...@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-users.
For more options, visit https://groups.google.com/groups/opt_out.



Drew Ferguson

unread,
Aug 5, 2013, 5:56:16 PM8/5/13
to django...@googlegroups.com
On Mon, 5 Aug 2013 14:26:00 -0700 (PDT)
Sean Whalen <whale...@gmail.com> wrote:

> I have a Django/PostgreSQL application to analyze data from tweets. The
> dataset increases by thousands of records with each request. I am using
> the database primarily as a relational cache, so I had planned to delete
> all records every 24 hours to permit new requests, without needlessly
> increasing the size of the database.
>
> However, because the Django ORM uses the
> SERIAL<http://www.postgresql.org/docs/current/interactive/datatype-numeric.html#DATATYPE-SERIAL>data
> type to store the IDs in the DB; the IDs get larger, even when all
> existing records have been deleted. Eventually, I will run out of key
> space. What can I do to make Django produce smaller ID values?
>

Rather than deleting records could you truncate the table instead? Then
you can reset the sequence with this...

TRUNCATE mytable RESTART IDENTITY

http://www.postgresql.org/docs/9.1/interactive/sql-truncate.html

--
Drew Ferguson

Javier Guerra Giraldez

unread,
Aug 5, 2013, 6:37:37 PM8/5/13
to django...@googlegroups.com
On Mon, Aug 5, 2013 at 4:52 PM, Sean Whalen <whale...@gmail.com> wrote:
> How can the models be configured to use bigserial? I know I could convert
> the tables myself, but that is not helpful for distributing the app.

https://docs.djangoproject.com/en/1.5/ref/django-admin/#sqlcustom-appname-appname

you can provide custom SQL commands that are executed right after
table creation by syncdb. in short, just add
"<appname>/sql/<modelname>.sql" or
"<appname>/sql/<modelname>.<dbbackend>.sql" files.


i guess South also provides some way to express the same change as a migration.

--
Javier

Nigel Legg

unread,
Aug 6, 2013, 2:41:25 AM8/6/13
to django...@googlegroups.com
Two thoughts:
1.  If you are analysing tweets, why are you deleting on a daily basis? Wouldn't a longer term view give you a better picture?
2.  Why not use the status_id field from the tweet as your id? The twitter have the problem of dealing with it's size, and you will (should, or at least could) be getting it in the JSON you download.

--
Javier

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages