pyramid_tm custom data managers for a transaction

Srikanth Bemineni

Jan 23, 2017, 1:11:24 PM
to pylons-discuss
Hi,

Are there any tutorials on writing custom data managers? I am thinking of writing a custom data manager for an Elasticsearch object, so that it is only indexed if the database commit succeeds. Is this a good approach to solve this problem?

What is the difference between a data manager and a resource manager? http://transaction.readthedocs.io/en/latest/api.html

Srikanth Bemineni

Michael Merickel

Jan 23, 2017, 1:26:08 PM
to Pylons
The best resource I know is http://zodb.readthedocs.io/en/latest/transactions.html along with staring at various implementations of data managers. A good one to look at would be zope.sqlalchemy which is the data manager implementation for sqlalchemy sessions. There is also repoze.sendmail with a fairly tricky one for sending emails.
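
For orientation, the interface those packages implement boils down to something like this. This is a sketch only, not a production implementation: the method names follow transaction.interfaces.IDataManager, while ESDataManager, the es_client argument, and the sortKey value are illustrative names, and real error handling is omitted.

```python
class ESDataManager:
    """Buffers document writes and flushes them during two-phase commit."""

    def __init__(self, es_client):
        self.es = es_client      # assumed to expose .index(doc)
        self.pending = []

    def index(self, doc):
        self.pending.append(doc)

    # --- IDataManager protocol -------------------------------------
    def abort(self, txn):
        # transaction.abort() was called before commit started
        self.pending = []

    def tpc_begin(self, txn):
        # start of two-phase commit
        pass

    def commit(self, txn):
        # first phase: perform the buffered writes
        for doc in self.pending:
            self.es.index(doc)

    def tpc_vote(self, txn):
        # last chance to veto: raising here aborts every joined manager
        pass

    def tpc_finish(self, txn):
        # second phase: the whole transaction committed
        self.pending = []

    def tpc_abort(self, txn):
        # second phase: another manager vetoed; note that writes already
        # made in commit() cannot truly be rolled back here
        self.pending = []

    def sortKey(self):
        # ordering among joined managers; "~" sorts after most RDBMS keys
        return "~es:datamanager"
```

A manager like this would be joined to the current transaction with transaction.get().join(dm).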

The main reason to make a data manager for a non-transactional endpoint is if you want the transaction to abort when the server is inaccessible. Other than that, you can actually get really far by just using `tm.addAfterCommitHook()` to do the work after a successful commit. For example, if writing to elasticsearch is not mission critical you could use the hook, allow it to fail but still commit data to your rdbms, and have a background job sync the failed attempt later.

- Michael

--
You received this message because you are subscribed to the Google Groups "pylons-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pylons-discuss+unsubscribe@googlegroups.com.
To post to this group, send email to pylons-discuss@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pylons-discuss/14052a61-ebbf-47ed-b8d0-199f61af68a4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Kai Groner

Jan 23, 2017, 2:19:38 PM
to pylons-...@googlegroups.com
I found the data manager interface somewhat convoluted, so I wrote some adapters that let you write data managers as generator functions (similar to contextlib.contextmanager).

@datamanager
def transactionally_do_something():
    try:
        # BEGIN
        yield
        # VOTE
        yield
    except Exception:
        # ABORT
        raise
    else:
        # FINISH
        pass
There's also a @commitmanager version which skips the BEGIN phase if you have nothing to put there, and there are inline flavors of both for attaching a one-off datamanager to the current transaction.

Thierry Florac

Jan 23, 2017, 3:52:28 PM
to pylons-...@googlegroups.com
Did you have a look at the "pyramid_es" package?
It provides a custom data manager for Elasticsearch...

Regards,
Thierry

Srikanth Bemineni

Jan 23, 2017, 6:33:12 PM
to pylons-discuss
Hi,

I will definitely take a look at this.

Jonathan Vanasco

Jan 24, 2017, 11:31:12 AM
to pylons-discuss


On Monday, January 23, 2017 at 1:26:08 PM UTC-5, Michael Merickel wrote:
The best resource I know is http://zodb.readthedocs.io/en/latest/transactions.html along with staring at various implementations of data managers. A good one to look at would be zope.sqlalchemy which is the data manager implementation for sqlalchemy sessions. There is also repoze.sendmail with a fairly tricky one for sending emails.

zope.sqlalchemy is the gold standard; I would just focus on that. It handles two-phase commit and savepoints perfectly. I ported the savepoint support from it to repoze.sendmail years ago, and there are still edge cases popping up on it.


Other than that, you can actually get really far by just using `tm.addAfterCommitHook()` to do the work after a successful commit. For example, if writing to elasticsearch is not mission critical you could use the hook, allow it to fail but still commit data to your rdbms, and have a background job sync the failed attempt later.

Another reason to focus on zope.sqlalchemy is that a lot of Python transaction plugins would be better off written against addAfterCommitHook. Many of the plugins out there basically implement this in a fancy way: they silently fail on errors, and/or the service they control doesn't offer a two-phase commit that can roll back.
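
To make the hook pattern concrete, here is a sketch. index_document is a stand-in for a real Elasticsearch write; the registration shown in the trailing comment uses the transaction package's addAfterCommitHook API, as available via request.tm under pyramid_tm.

```python
failed_writes = []
indexed = []

def index_document(doc_id, body):
    # placeholder for a real es_client.index(...) call
    indexed.append((doc_id, body))

def index_after_commit(status, doc_id, body):
    # The transaction machinery calls this after commit;
    # `status` is True if the commit succeeded.
    if not status:
        return  # transaction aborted: don't index anything
    try:
        index_document(doc_id, body)
    except Exception:
        # Not mission critical: remember the failure so a
        # background job can sync it later.
        failed_writes.append((doc_id, body))

# Registration (inside a view, with pyramid_tm active):
#   request.tm.get().addAfterCommitHook(
#       index_after_commit, args=(doc_id, body))
```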

Mike Orr

Jan 28, 2017, 2:06:11 PM
to pylons-...@googlegroups.com
On Tue, Jan 24, 2017 at 8:31 AM, Jonathan Vanasco <jvan...@gmail.com> wrote:
>
> The zope.sqlalchemy is the gold standard, I would just focus on that. It
> handles the two-phase and savepoints perfectly. I ported the savepoints
> from it to repoze.sendmail years ago, and there are still edge-cases popping
> up on it.

I've been using pyramid_mailer for six months without problem.

How are people synchronizing something transactional with something
non-transactional, especially a database record with a filesystem
file? I have a record representing a file attachment, and the file
itself. The files in aggregate are 30 GB so they're cumbersome to put
into the database and include in a database dump/restore/rsync. Kotti
is using 'filedepot' but I'm hesitant to trust a foreign
directory/filename structure when I've already got one working. So I
create or delete the file before the commit, but that runs the small
risk of orphan files or missing files. Conversely the file create or
delete can fail if the filesystem permissions are messed up, and it
would be desirable for the transaction to abort then. The first case
argues for doing it after the db transaction has successfully
committed, the second case argues for doing it before the commit, but
those are contradictory. It makes me wish the filesystem were
transactional.

Jonathan Vanasco

Jan 28, 2017, 3:43:28 PM
to pylons-discuss

On Saturday, January 28, 2017 at 2:06:11 PM UTC-5, Mike Orr wrote:

How are people synchronizing something transactional with something
non-transactional, especially a database record with a filesystem
file?

I deal with this a lot in the area of S3 file archiving.

I abort the transaction if the S3 upload fails. Usually I can delete the file via an except/finally clause, but there are still edge cases where I could end up with a stray file because the cleanup activity can fail.

To handle that, I use a second SQLAlchemy connection in autocommit mode for bookkeeping, then run periodic tasks to reconcile issues later on.

From memory, it looks something like this (pseudocode!):

try:
    # transactional record
    objectRecord = {"id":id, "filename": filename}
    dbSessionTransaction.add(objectRecord)
    dbSessionTransaction.flush()

    # autocommit log
    objectLog = {"id": id, "status": "prep"}
    dbSessionAutocommit.add(objectLog)
    dbSessionAutocommit.flush()

    # mark a partial upload
    objectLog.status = 'uploading'
    dbSessionAutocommit.flush()

    boto.upload_file(filename)

    # mark uploaded
    objectLog.status = 'uploaded'
    dbSessionAutocommit.flush()

    # mark the file
    objectRecord.status = 'uploaded'
    objectRecord.timestamp_upload = datetime()
    dbSessionTransaction.commit()

except Exception:
    # mark the autocommit log as "deletion-start"
    # try to delete the file
    # mark the autocommit log as "deleted"
    raise

I periodically look through the autocommit logs to see if anything is stuck and make sure deleted items are not in S3.

It's definitely overkill, but it has so far caught all the (rare) edge cases.

Mike Orr

Jan 29, 2017, 12:44:57 PM
to pylons-...@googlegroups.com
I sometimes use a separate database connection with autocommit for
logging, but I do it in a request finalizer. I also do it in a
separate database because the database becomes large and other
applications use it too.

--
Mike Orr <slugg...@gmail.com>

Jonathan Vanasco

Jan 30, 2017, 2:20:04 PM
to pylons-discuss


On Sunday, January 29, 2017 at 12:44:57 PM UTC-5, Mike Orr wrote:
I sometimes use a separate database connection with autocommit for
logging, but I do it in a request finalizer. I also do it in a
separate database because the database becomes large and other
applications use it too.

This has to be done in real time to work correctly. My logging tables for this type of stuff are on a separate backup strategy and are quickly cleared out; there's no need for them after a reconciliation is done.

Mike Orr

Jan 31, 2017, 1:04:25 PM
to pylons-...@googlegroups.com
Your logs must be for a different purpose than mine. My logs are for
future queries: how much is a section used, what are the most popular
records and search terms, what are the most common failed searches
(misspellings we may want to do something about), etc. You don't know
what future needs will be. We have to strike a balance between the
usefulness of the dataset and the volume of records. So "access log"
records we keep for 60-90 days and make monthly counts from. Searches
are kept until the next version is released, and then the previous one
is archived as "top 100 version X.X search terms".

As for the "orphan file/missing file" problem I mentioned with a
non-transactional filesystem, we handle that with a routine that looks
for such inconsistencies and makes a report.

Jonathan Vanasco

Jan 31, 2017, 4:27:23 PM
to pylons-discuss


On Tuesday, January 31, 2017 at 1:04:25 PM UTC-5, Mike Orr wrote:
 
Your logs must be for a different purpose than mine.

Not really. I log most of the same things as you, and usually aggregate at the end too. The exception is that we run statsd as well, and that logs in real time. But every log/table/etc has its own data retention policy.

Our autocommit S3 logging is done to a dedicated table with a dedicated connection, and that particular table's retention policy is that it can be cleared out after a reconciliation program runs. This particular usage needs to run in real time, because uncaught exceptions and unclean shutdowns can result in missed logging actions.

Srikanth Bemineni

May 13, 2017, 9:21:14 AM
to pylons-discuss
Hi,

Here is my first shot at creating a data manager for Elasticsearch. This perfectly fits my project requirements. 



Thank you