feedback on commands to CAP relational table sizes

1 view
Skip to first unread message

Jak Sprats

unread,
Jan 16, 2011, 6:02:02 AM1/16/11
to redisql-dev
Hi All,

I am going to implement some size caps in AlchemyDB's relational
tables and need opinions on whether the proposed syntax makes sense.

The following command would limit the number of rows a particular FK
had in a table to 100, using the PK to determine the eviction sorting
order (which in auto-increment means get rid of the oldest)
ALTER TABLE ADD CONSTRAINT LIMITROWS 100 KEY (FK) ORDER BY(PK)

The following will evict rows older than one day w/ num_views less
than 100
ALTER TABLE ADD CONSTRAINT EXPIRE(timestamp,86400) AND
MINVAL(num_views,100)

Does this make sense?
Can the syntax be improved?
Are there other ways a relational table is commonly capped?

Any feedback would be appreciated.

- Jak

Didier Spezia

unread,
Jan 23, 2011, 6:08:57 AM1/23/11
to redisql-dev
Hi Jak,

one use case I had several times was to implement an history table
for a given kind of objects in a database. In this situation,
the primary key of the history table is typically composed of the
foreign key identifying the object plus some autoincrement key
(or some timestamp).

So IMO the FK is very often a subset of the PK.
I think your first proposal is more generic and should cover also
this use case ...

My opinion is ordering by PK is enough, because I guess most
of the situations where another order is needed are precisely due
to the lack of capping features.

I remember implementing an history in SQL where the primary key
was something like id,pos with pos being an integer modulo n.
Instead of inserting a new history line and deleting the last obsolete
line, I only had to update id,pos modulo n (optimizing history
maintenance over history retrieval). In this situation, the order of
the
PK does not represent anymore the order of the history, but such
tricks are not required if a proper capping mechanism is provided.

Your second proposal (constraint expire) looks difficult to implement
efficiently because it will probably involve two indexes (one for
timestamp,
one for num_views). IMO, the complexity is the same as a join: you
need a range scan on an index, and for each item, a lookup in the
other
one, or two range scans a la merge join. With a normal disk-based
database,
it would probably be better implemented using a daily batch to delete
expired rows. Now, for a memory-based database such as alchemy,
perhaps it makes sense to check it at each insert.

Generally, I tend to think that triggers and constraints do overlap.
If you have a good trigger implementation, constraints are not so
useful
(and they are less generic than triggers).

Just my 0.02 euros ...

Regards,
Didier.

Jak Sprats

unread,
Jan 24, 2011, 12:45:04 PM1/24/11
to redisql-dev
Hi Didier,

thanks for the feedback, it helps, I am still working on this idea and
the syntax. It is a very important product decision, and I am taking
my time, letting it evolve slowly (so I do it right :).

>> So IMO the FK is very often a subset of the PK. I think your first proposal is more generic and should cover also this use case ...

Right I am looking to implement flexible, yet sane controls.

>> My opinion is ordering by PK is enough

This is a strange point, the "ORDER BY (PK)" is technically not
needed, because a foreign key (in AlchemyDB) stores sorted PKs, so the
ordering is implicit, and this directive is more-or-less to make the
command more sensible. I do have to think about if opening this up
will add complexity (seems like it would), I dont want to support
people doing:
"ALTER TABLE ADD CONSTRAINT LIMITROWS 100 KEY (FK) ORDER BY(FK2) "
because this would require an at-trigger-time sort, which is not
performant .... so possibly just documenting that the order is
naturally by PK is the right solution, I dont see any reason for
another type of ordering in these CAP constraints (or none worth
implementing) and every other ordering is not in the Indexes natural
ordering, so it is a bad idea ... so that part of the command is
gone :)

>> primary key was something like id,pos with pos being an integer modulo n

This is basically an implementation of these CAPs in application
logic, so this feature would provide a simple way to do this server-
side. And good to hear that people hit this use-case. The thinking on
these CAP features is that the danger of an InRAMDB is over time, data
that never gets accessed will take up the same resources (RAM) that
really hot data will, and this is just an unacceptable usage of RAM.

>> Your second proposal (constraint expire) looks difficult to implement efficiently because it will probably involve two indexes (one for timestamp,
one for num_views).

It will be difficult to implement, but it will not have multiple-index
or join-like complexity, but not making it have those will be the hard
thing. Capping to both expiration-date AND minval(col) is an extreme
use-case, I put it in just to push this topic to an extreme.
Nevertheless, this can be done by putting a single index on multiple
columns. In the example:
ALTER TABLE ADD CONSTRAINT EXPIRE(timestamp,86400) AND
MINVAL(num_views,100)
The index would be on (timestamp, num_views), so the first part of the
index-lookup would find all rows older than 86400 seconds, and the
second part of the index-lookup would search w/in THOSE rows to find
rows w/ "num_Views less than 100". The AND in the statement means it
can be done w/ a single index that indexes multiple columns. (I did a
poor job explaining this, but if you know multiple column indexes, you
will get it)
Still this is tough/painful to implement and can definitely wait :)

>> daily batch to delete expired rows

I think I will have to have trigger caps, and also cron caps ... this
seems inescapable.

>>triggers and constraints do overlap

this is very true. I initially implemented the LIMITROWS(100) CAP in
Lua here: https://gist.github.com/780217
the problem is for CAPs like this, running from LUA takes a minimum 2X
more time, and in this case 3X more.
Single threaded servers need to process all OPs AQAP because each OP
blocks ALL others, so this seemed like a use-case that should be
optimised for performance and also standardised, because most people
are gonna be suspicious of triggers calling LUA functions :)

Still this was another example where having LUA as awesome, because I
implemented this stuff in 2 hours, and then decided it needed to be
quicker, so it will be redone in C w/ a simple syntax.

Just FYI (for further discussion and to have it documented in group)
This is how Oracle TimesTen declares their CAPs
1.) LRU Aging run when table reaches maxMB, every reap_interval secs,
stop when table below minMB
ttAgingLRUConfig(minMB,maxMB,reap_interval)
2.) ADD AGING USE ColumnName LIFETIME 86400

So this topic still needs to be thought on .... implementation will
not start for minimum 2-3 weeks.

- Jak
Reply all
Reply to author
Forward
0 new messages