transaction orm slow with large list


David Sedeño

Oct 1, 2015, 11:42:58 AM
to Lucee
Hi,

We have some tasks that apply to thousands of records. We are migrating from plain SQL to ORM.

We need these updates to fire the ORM model's 'postUpdate' event, so I have to apply the transactions individually (I can't use ormExecuteQuery), but these transactions are really slow compared to a plain SQL query (about 10x slower).

 for (id in list)
 {
     obj = EntityLoadByPk('myclass', id);
     transaction {
         obj.setProperty(X);   // the change is flushed when the transaction commits, firing postUpdate
     }
 }

We split the thousands of items into lists of 100 elements, so the list variable in the code only holds 100 items.

I think that opening and closing a transaction for every record is very costly, but I don't know of another way to do it that still fires the event.

Is there a more efficient way to do this?

Best regards,
---
David Sedeño
CTO at todocoleccion.net




Jon Clausen

Oct 1, 2015, 12:17:29 PM
to lu...@googlegroups.com
Updating thousands of records via iterator is going to be incredibly expensive and you should never put that kind of load on your server for something you could more effectively do in one transaction.  In your example below, you’re running thousands of SELECT queries and then thousands of UPDATE queries, each with its own transaction. 

You’re better off running a single update query (ORMExecuteQuery()) using an IN (list) statement. You can still wrap the bulk update in a transaction. ORM is a valuable tool in your toolbox, but you’ll still need SQL to handle issues like the one you’ve outlined below.
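
Something along these lines, for example (just a sketch; 'myclass', 'property', x, and the ids array are placeholders for your own names):

  // one HQL bulk update inside one transaction, instead of thousands of
  // per-entity transactions; entity and property names are placeholders
  transaction {
      ormExecuteQuery(
          "UPDATE myclass SET property = :newValue WHERE id IN (:ids)",
          { newValue = x, ids = ids }
      );
  }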

David Sedeño

Oct 1, 2015, 12:42:35 PM
to lu...@googlegroups.com
Yes, I know it's a lot of load, but ORMExecuteQuery doesn't fire the ORM events. That's my problem... I would need some hack to perform the postInsert actions.




--
David Sedeño Fernández
Zoconet SL
Tel. 657 384 329

Jon Clausen

Oct 1, 2015, 2:56:22 PM
to lu...@googlegroups.com
IMHO that kind of massive operation is worth writing some extra SQL to handle any of those post-inserts. Sure, there’s some duplication of functionality, but it’s probably a necessary evil.  Can you give me an example of a postInsert that needs to be done that couldn’t be done in the multi-update query?

David Sedeño

Oct 2, 2015, 9:38:46 AM
to lu...@googlegroups.com
Well, for example, we need to update Elasticsearch with the new data and annotate the changes (this is not the culprit of the slowness; the transactions are slow even if we don't do that).

regards,





Jon Clausen

Oct 2, 2015, 10:04:28 AM
to lu...@googlegroups.com
OK. So based on your example, you already have the list of IDs to work with. My suggestion would be:

- fire the bulk update query
- use your IDs to pull a new query or ORM array and loop it to:
  - thread each iteration
  - fire off your Elasticsearch update and annotation methods
- re-join all threads and wrap up

In the above scenario, you make two trips to the database instead of thousands, plus you take advantage of threading to asynchronously handle the API calls to Elasticsearch and your annotations.

Yes, you lose the single-record ORM postInsert functionality, but you dramatically reduce the overall time and resources it takes to handle the bulk update.
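
Roughly, in script (a sketch only; the ids array, updateElasticsearch(), and annotateChange() stand in for your own data and methods, and the bulk update from my earlier reply has already run):

  // fan out the post-update work for the batch, one thread per id
  threadNames = [];
  for (id in ids) {
      threadName = "postUpdate#id#";
      arrayAppend(threadNames, threadName);
      thread name=threadName entityId=id {
          updateElasticsearch(attributes.entityId);   // placeholder Elasticsearch call
          annotateChange(attributes.entityId);        // placeholder annotation logic
      }
  }
  // re-join all threads before wrapping up
  thread action="join" name=arrayToList(threadNames);

Depending on how many IDs are in each batch, you may want to cap how many threads you spawn at once.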

Chris Blackwell

Oct 3, 2015, 12:54:10 PM
to lucee

FWIW David, we had the same issue linking Elasticsearch to ORM/MySQL, and in the end we rolled our own solution.

We use a combination of triggers, a MySQL HTTP UDF, RabbitMQ with a RESTful HTTP proxy written in Go, and Java workers to process the data into Elasticsearch.

It sounds complicated, but it's been reliable even when we hit thousands of writes per second.

David Sedeño

Oct 5, 2015, 2:12:44 AM
to Lucee
Hi Chris,

We also have an in-house solution with triggers and Mule for the integration with Elasticsearch. I was looking for a cleaner solution using the ORM events, with less logic in the DB.

Jochem van Dieten

Oct 5, 2015, 8:14:53 AM
to lu...@googlegroups.com

On Oct 1, 2015 6:42 PM, "David Sedeño" wrote:
> We need these updates to fire the ORM model's 'postUpdate' event, so I have to apply the transactions individually

Are you saying that if you put 2, 10, or more updates in one transaction it doesn't work? If so, wouldn't that be a bug that needs fixing? If not, what is the performance of that?

Jochem

David Sedeño

Oct 6, 2015, 2:34:27 AM
to lu...@googlegroups.com
Multiple updates in one transaction work, but I have read somewhere that the events in an ORM model don't fire with ormExecuteQuery, only with individual transactions. And when doing them individually in a loop of thousands, the performance degrades over time. Maybe this could be improved.

A quick test:

  ids = queryExecute("select top 1000 id from mytable");
  for (row in ids) {
      obj = EntityLoadByPk('myclass', row.id);
      first = getTickCount();
      transaction {
          obj.setProperty(1);
      }
      second = getTickCount();
      echo((second - first) & "ms<br>");
  }

The first 30 iterations take around 15ms each, while the last iterations take around 200ms.

Could there be a bug somewhere?

--
David Sedeño Fernández
CTO at Zoconet SL

Jochem van Dieten

Oct 6, 2015, 6:09:42 AM
to lu...@googlegroups.com


On Oct 6, 2015 9:34 AM, "David Sedeño" wrote:


> 2015-10-05 14:14 GMT+02:00 Jochem van Dieten:
>> On Oct 1, 2015 6:42 PM, "David Sedeño" wrote:
>> > We need these updates to fire the ORM model's 'postUpdate' event, so I have to apply the transactions individually
>>
>> Are you saying that if you put 2, 10, or more updates in one transaction it doesn't work? If so, wouldn't that be a bug that needs fixing? If not, what is the performance of that?
>
> Multiple updates in one transaction work, but I have read somewhere that the events in an ORM model don't fire with ormExecuteQuery, only with individual transactions.

So put more updates in one transaction without ormexecutequery.
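
Something like this rough sketch (the batches array, 'myclass', 'property', and x are placeholders), so the per-entity events still fire on flush but you only pay the commit cost once per batch:

  for (batch in batches) {            // e.g. the lists of 100 ids you already build
      transaction {
          for (id in batch) {
              obj = EntityLoadByPk('myclass', id);
              obj.setProperty(x);     // postUpdate still fires for each entity when the session flushes
          }
      }
  }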

> And when doing them individually in a loop of thousands, the performance degrades over time. Maybe this could be improved.

Maybe. But I would say the first part of that work is proving that this is:
- caused by Lucee and not by an external system like the database or whatever index you are updating in the event handler;
- not caused by a coding pattern that does excessive commits, causing I/O overload.

Jochem
