Is there some aggregation ability built-in in the SQLAlchemy?
I want some simple functions like counting rows and determining the last
one.
I know it's something simple to implement with Mapper Extensions, but may be
there are readily available solutions?
And is there interest if I'll make some code on that?
--
Paul.
Hmmm, do you mean joining relations against a subrelation that uses
an aggregate like MAX ? i'd like to see what you have in mind for this.
It looks quite like you're saying, but I want that aggregations
to be stored (cached?) in the table.
Suppose you have, a table of articles and comments on them.
I want to have article.last_comment and article.number_of_comments
to be stored in columns and to be updated when I'm saving Comment
object (which e.g. has a foreign key to article).
And I really want that to be atomic updates (update set cnt=cnt+1)
instead of updating counter on Article instance and saving it.
we've just added the atomic update thing to 0.4 (note the uppercase
Article which produces a column expression):
article.comment_count = Article.comment_count + 1
session.flush()
i wonder though if theres some way that could get "out of sync" with
the actual number. you'd have to be careful to establish this
update. if you wanted to use a mapper extension to do it, then youd
have to issue the UPDATE directly, the change to the "comment_count"
attribute wont get picked up if its established within the flush itself.
Well, while writing this e-mail I've realized that there can
be some problems with deletes and updates that probably
can make value out of sync.
I'll just try and give a feedback after that :)
--
Paul.
Hi,
Thank you Michael for help, and for new great docs!
I've finally done first POC implementation of this feature.
Basic usage looks like:
import aggregator as a
mapper(Line, lines,
extension=a.Quick(a.Count(blocks.c.lines),
a.Max(blocks.c.lastline, lines.c.id)))
(You also need foreign keys)
I've implemented two interfaces one
that counts like "cnt = cnt+1" (called Quick)
second does query "SELECT count(*) FROM ...."
at each insert/update/delete (called Accurate)
Well, Quick should be accurate as long as you either don't
update foreign keys involved in process, or have only one
Count aggregation on changable key (*), or use transactions.
So it covers most usecases I think. If you like to recalc you can
use Accurate which can be better in some edge cases.
A bit slower on updates but still fast on reads.
(*) - aggregations on update is not implemented yet, but it's
only a matter of time
For more comprehensive description look here:
http://www.mr-pc.kiev.ua/projects/SQLAlchemyAggregation
I have also two issues.
1. func.if_(...) - tries to sql function "if_(...)",
but func._if(...) - "_i(...)"
I believe is typo?
2. If there a way, to handle functions in cross-database manner?
I've need to write something like this, for max function which
exists on sqlite but absent in mysql (and if() visa-versa):
if aggregator.mapper.local_table.\
metadata.bind.url.drivername == 'mysql':
return if_func(A < B, B, A)
else:
return func.max(A, B)
Is there a better way?
(Well, I will decrease number of dots it no better way exists :) )
--
Paul.
On Aug 7, 1:20 pm, Paul Colomiets <paul.colomi...@gmail.com> wrote:
>
> 1. func.if_(...) - tries to sql function "if_(...)",
> but func._if(...) - "_i(...)"
> I believe is typo?
er, probably. func is trying to sidestep various "_" underscore
attributes i think. do you need to say "_if(...)" ?
> 2. If there a way, to handle functions in cross-database manner?
this is ticket 615 which seems to be becoming very important, so i
think ill move it to 0.4xx milestone and increase priority.
hi, i have similar idea/need within dbcook, although on a somewhat
higher level:
<pre>
cache_results/: (dbcook/SA) add-on for automaticaly-updated database
denormalisation caches of intermediate results, each one depending on
particular pattern of usage. Wishful syntax:
class SomeCacheKlas( Base):
fieldname = cache_aggregator( klas.field, AggrFilterCriteria)
#e.g.
#class Cache4averagePerson( Base):
#age = cache_agregators.Average( Person.age, Filter1 )
#salary = cache_agregators.Sum( Person.salary, Filter2 )
</pre>
i was thinking on using triggers and/or sql-functions but then this is
just one way to do it - and the idea is to hide the implementation.
Can your implementation be extended to use:
a) more complex aggregator expressions (e.g. average(), that is
sum() / count(), and similar dependencies)
b) more complex filters - e.g. not max() on all rows, but on some
subset/select
think of generic report calculations/aggregations, and putting those
into some cacheing table.
Eventualy getting the report being built on-the-run - distributing the
big wait over the atomary updates.
> http://www.mr-pc.kiev.ua/projects/SQLAlchemyAggregation
this gives me 404
No I need to say "IF" but I can't say func.if() currently I use
getattr('func','if') but It's ugly.
It strips last char when sees first underscore. But should ether strip
first char, or check end of string for an underscore.
Patch attached :)
avg = property(lambda self: self.sum / self.count)
I currently have no idea on how to make it better (less writing?)
Also documenting extension interface is in to do list.
> b) more complex filters - e.g. not max() on all rows, but on some
> subset/select
>
Yes. I'm looking for syntax for that. Functionality seems
very similar for `relation()` so may be
`a.Max(...,primaryjoin=...)` would do.
> > Can your implementation be extended to use:
> > a) more complex aggregator expressions (e.g. average(), that is
> > sum() / count(), and similar dependencies)
>
> Definetly can. Now you can just use property for that:
> avg = property(lambda self: self.sum / self.count)
naaah, sorry, that was too easy. i mean more complex aggregation
functions... i guess it can, if u can do sum=sum+x, then u could do
sq=sq+x*x/2
> > b) more complex filters - e.g. not max() on all rows, but on
> > some subset/select
>
> Yes. I'm looking for syntax for that. Functionality seems
> very similar for `relation()` so may be
> `a.Max(...,primaryjoin=...)` would do.
hmmm... relation... yess, it is like a plural relation but getting a
singular result out of it, and never ever loading the items.
Are u using something along PropertyLoader?
ciao
svil
> 2. If there a way, to handle functions in cross-database manner?
> I've need to write something like this, for max function which
> exists on sqlite but absent in mysql (and if() visa-versa):
- u can at least get that 'if...mysql' out of the actual code, just
define a global if_func( usualargs, aggregator=None) and do swicthing
inside there. Or, better, define the if_func() as method on the
aggregator itself (that is, the mapperExtension?) thus separating
somewhat implementation from usage.
- Max.onupdate will recurse forever in the else:
- the min/max has to be redone not to use maxint (or any other
machine/value limit) - one 'if' more..
- why returning a dict if it never contains more than 1 value?
it clutters your code a lot... maybe 'return something' or None (or
raise something) would be better.. eventualy making some generic
updater like:
def do_updates( self, instance, table, fields, method, condition):
updates = {}
for f in fields:
r = getattr( f, method)( self, instance)
if r is not None: updates[ f.target.name] = r
table.update( condition, values=updates).execute()
ah, nevermind.