IMHO it's not a micro-optimization. An RDBMS will often take a performance hit on a COMMIT vs. a ROLLBACK when there are multiple simultaneous transactions, and it can cause issues on clustered/replicated systems.
I often forget about this too. The techniques that have worked for me:
* Using SQLAlchemy events to issue a `mark_changed` on the `before_execute` event. Sometimes I'll have the event handler parse the statement sent to the backend, looking for an INSERT/UPDATE/etc.
* Never using `session.execute` directly. Instead I use a helper function that takes the session and the execute params, does the work, and then applies `mark_changed`.
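
A rough sketch of the helper approach, with the statement sniffing from the first bullet folded in. This assumes `mark_changed` is the one from `zope.sqlalchemy`; the names `execute_and_mark`, `looks_like_write`, and the `mark` parameter are made up for illustration, and the keyword list is only a guess at what counts as a write (it will miss things like a write hidden behind a leading `WITH`).

```python
import re

# Hypothetical pattern: leading SQL keywords treated as writes.
_WRITE_RE = re.compile(r"^\s*(insert|update|delete|replace|merge)\b", re.IGNORECASE)


def looks_like_write(statement):
    """Return True if the statement's SQL text starts with a write keyword."""
    return bool(_WRITE_RE.match(str(statement)))


def execute_and_mark(session, statement, params=None, mark=None):
    """Run `statement` on `session`, then flag the session as changed if it wrote.

    `mark` is an illustrative hook for swapping in a different marker (e.g. in
    tests); by default it is assumed to be zope.sqlalchemy's mark_changed.
    """
    result = session.execute(statement, params)
    if looks_like_write(statement):
        if mark is None:
            # Imported lazily so the helper can be exercised without the package.
            from zope.sqlalchemy import mark_changed as mark
        mark(session)
    return result
```

Call sites then use `execute_and_mark(session, stmt, params)` wherever they would have called `session.execute(stmt, params)`. The same `looks_like_write` check could be called from a `before_execute` listener instead.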
I've done a few other things too, but those are the ones I remember best.