
MAbeeTT

May 24, 2016, 4:42:35 PM
to web...@googlegroups.com
I have a table with a huge number of records: approximately 3*10^6
rows and counting.

I need a scheduler task to delete the oldest ones, but the criterion
for an "old" record is a datetime field:

db.define_table('foo',
    Field('foo_file', 'upload', required=True, notnull=True,
          autodelete=True, uploadseparate=True),
    Field('foo_date', 'datetime', notnull=True))

There is no relation between id and date (the user/system
could insert data older than the current date).

There is a controller which shows the last 10 records by that
criterion, but the query alone takes too much time, more than 7
seconds. I also have a periodic task archiving the older records.

rows = db(db.foo).select(db.foo.ALL, orderby=~db.foo.foo_date, limitby=(0, 10))


I need a way in the DAL to indicate that the foo_date datetime field
should be indexed, or something similar, in order to get faster
queries! :/ I am using MySQL.

Thanks in advance.


--
.::MAbeeTT::.

mabeett [at] gmail [dot] com

Niphlod

May 24, 2016, 6:03:13 PM
to web2py-users, MAb...@gmail.com
Index support is on the way, but for the time being just create the index manually.
CREATE INDEX ix_foo_foo_date ON foo (foo_date)
should suffice.
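For reference, here is a self-contained sketch of that index at work, using the Python stdlib sqlite3 module instead of MySQL (table and column names taken from the thread; the row contents are made up). In web2py the DDL above could be issued once via db.executesql.

```python
import sqlite3

# In-memory stand-in for the MySQL table described in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, foo_date TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO foo (foo_date) VALUES (?)",
    [("2016-05-%02d 12:00:00" % (d % 28 + 1),) for d in range(1000)],
)

# Same DDL as above: index the datetime column used for ordering/archiving.
conn.execute("CREATE INDEX ix_foo_foo_date ON foo (foo_date)")

# The planner can now satisfy ORDER BY foo_date DESC LIMIT 10 from the
# index instead of sorting the whole table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM foo ORDER BY foo_date DESC LIMIT 10"
).fetchall()
print(plan)
```

The same principle applies to MySQL: ORDER BY foo_date DESC LIMIT 10 becomes a short index scan rather than a full-table sort.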


On further iterations, you should consider how exact the "last 10 records" view really needs to be: you can't expect to scale (index or no index) if every hit to the page sends a query for the last 10 records on a multi-million-row table.
The first thing to consider would be caching the damn select ^_^. 
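web2py's select accepts a cache argument for exactly this (e.g. cache=(cache.ram, 300) to keep the result set in RAM for 5 minutes). The idea in plain Python, as a minimal sketch where the hypothetical fetch_latest function stands in for the expensive select:

```python
import time

def make_ttl_cache(fetch, ttl_seconds):
    """Wrap a zero-argument fetch function so its result is reused for ttl_seconds."""
    state = {"value": None, "expires": 0.0}
    def cached():
        now = time.time()
        if now >= state["expires"]:
            state["value"] = fetch()          # hit the database only when stale
            state["expires"] = now + ttl_seconds
        return state["value"]
    return cached

# Hypothetical stand-in for the expensive "last 10 records" select.
calls = {"n": 0}
def fetch_latest():
    calls["n"] += 1
    return ["row-%d" % i for i in range(10)]

latest = make_ttl_cache(fetch_latest, ttl_seconds=300)
a = latest()
b = latest()   # served from cache; fetch_latest runs only once
```

With a 5-minute TTL the database sees one query every 300 seconds regardless of page traffic; the trade-off is that the view can be up to 5 minutes stale.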
The second would be to use a smaller table that is periodically truncated and refreshed (say, every 5 minutes) in the background by a scheduler task that fetches the latest records from the "big one" and inserts them into the smaller one.
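That refresh step might look like this, again sketched with stdlib sqlite3 (foo_latest is a hypothetical name for the small table; in web2py this would live in a scheduler task and go through the DAL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, foo_date TEXT NOT NULL)")
conn.execute("CREATE TABLE foo_latest (id INTEGER, foo_date TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO foo (foo_date) VALUES (?)",
    [("2016-05-%02d 12:00:00" % (d + 1),) for d in range(28)],
)

def refresh_latest(conn, limit=10):
    # Truncate and repopulate the small table in one transaction, so
    # readers never see it half-filled.
    with conn:
        conn.execute("DELETE FROM foo_latest")
        conn.execute(
            "INSERT INTO foo_latest "
            "SELECT id, foo_date FROM foo ORDER BY foo_date DESC LIMIT ?",
            (limit,),
        )

refresh_latest(conn)
rows = conn.execute(
    "SELECT foo_date FROM foo_latest ORDER BY foo_date DESC"
).fetchall()
```

The controller then selects from foo_latest only, which stays at 10 rows no matter how large foo grows.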