What is the best way to do a high number of web service calls?

Ivan Rotrekl

Mar 9, 2016, 11:01:43 AM
to Lucee
Hi,

I am rewriting an old application which checks, via a web service, whether a person or a company from a list is in the registry of debtors, and retrieves some additional info from there.

(In the past I had to use a web service which required incrementally duplicating the whole national database of debtors on my server. It was quite troublesome, as I needed to download completely unrelated data and constantly react to db structure changes. Now I can finally use a new web service which checks each entity separately and provides only the related data.)

But I have more than 4000 entities (so far) which I need to check every day. So I am wondering what would be the best way to accomplish it?

Each check includes:

- a cfhttp call to the web service
- parsing a quite complex, namespaced SOAP response
- and saving the data to multiple tables
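
For illustration, one check looks roughly like this (a sketch only: the endpoint, the SOAP body, and the element names below are made up):

<!--- one check; registry.example.com and the XML element names are hypothetical --->
<cfset personid = 123 />
<cfset soapBody = '<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <CheckDebtor xmlns="http://example.com/registry">
      <personId>' & personid & '</personId>
    </CheckDebtor>
  </soap:Body>
</soap:Envelope>' />

<cfhttp url="https://registry.example.com/service" method="post" result="httpResult">
    <cfhttpparam type="header" name="Content-Type" value="text/xml; charset=utf-8" />
    <cfhttpparam type="body" value="#soapBody#" />
</cfhttp>

<cfset doc = xmlParse(httpResult.fileContent) />
<!--- local-name() in the XPath sidesteps the response's namespace prefixes --->
<cfset records = xmlSearch(doc, "//*[local-name() = 'DebtorRecord']") />
<cfloop array="#records#" index="rec">
    <!--- read values from rec and save them to the tables here --->
</cfloop>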

I would definitely  like to do it via scheduled task at night.

1) I could simply make all the cfhttp calls in one cfloop. But I don't like the idea of one giant request which will take ages to finish. It will hog resources, time out, or cause some other trouble.

2) I thought a simple cfm page like this, run as a scheduled task, would break it down into more reasonable chunks:


<cfscript>
    param name="url.personid" default="0";
</cfscript>

<cfif url.personid>
    <!--- do the processing here: call the web service, parse the response, save the data ... --->
</cfif>

<!--- get the next personID (order by is needed, otherwise top 1 is not deterministic) --->
<cfquery name="qNextPersonID" datasource="mydsn">
    select top 1 personid
    from persons
    where personid > <cfqueryparam cfsqltype="cf_sql_int" value="#url.personid#" />
    order by personid
</cfquery>

<!--- run the page again for the next person --->
<cfif qNextPersonID.recordCount>
    <!--- note: cfhttp blocks, so this request waits for the whole chain to finish --->
    <cfhttp method="get" url="http://#cgi.http_host#/-tasks/test.cfm?personid=#qNextPersonID.personid#" />
</cfif>

But this actually still behaves like one giant request: cfhttp is synchronous, so each page waits for the next one in the chain, and it took > 50 seconds to finish without any actual processing. So I am not sure it is any better than the "giant loop" approach.

3) I could come up with some JavaScript/AJAX-based solution. But I would like to do it on the server, so I don't need any client running.

Would anyone have some idea or experience with something similar?

Regards

Ivan

Mark Drew

Mar 9, 2016, 11:19:18 AM
to lucee
Make a queue. 

So in one db table, make a list of the IDs you are going to process and whether they have been processed. Then make the scheduled task loop through them, and once an item is done, you tick it off, so each time you process something, one item gets taken out of the queue.

You can then scale this, as you can in effect soft-lock a record by giving it a status of “processing”.


So your queue table might look like this:

id  personID processingStartedDateTime processingEndedDateTime


So you go and find the first personID that has a null processingStartedDateTime, enter the date into the queue row (you have started processing), do your processing, and then set the processingEndedDateTime.

At any point you can query this table to see what is being processed, and you can make multiple calls and have several requests processing the queue (they would each take a different personID).

Of course, you should wrap those updates to the queue table in a transaction.
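
A minimal sketch of claiming and finishing one item that way (assuming SQL Server, as in your example, a datasource named "mydsn", and the column names above):

<!--- claim the next unprocessed person inside a transaction (the soft lock) --->
<cftransaction isolation="serializable">
    <cfquery name="qNext" datasource="mydsn">
        select top 1 id, personID
        from PersonQueue
        where processingStartedDateTime is null
        order by id
    </cfquery>
    <cfif qNext.recordCount>
        <cfquery datasource="mydsn">
            update PersonQueue
            set processingStartedDateTime = <cfqueryparam cfsqltype="cf_sql_timestamp" value="#now()#" />
            where id = <cfqueryparam cfsqltype="cf_sql_int" value="#qNext.id#" />
              and processingStartedDateTime is null
        </cfquery>
    </cfif>
</cftransaction>

<cfif qNext.recordCount>
    <!--- ... do the actual work for qNext.personID here ... --->
    <cfquery datasource="mydsn">
        update PersonQueue
        set processingEndedDateTime = <cfqueryparam cfsqltype="cf_sql_timestamp" value="#now()#" />
        where id = <cfqueryparam cfsqltype="cf_sql_int" value="#qNext.id#" />
    </cfquery>
</cfif>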

Hope that makes sense?

MD

Ivan Rotrekl

Mar 9, 2016, 12:32:10 PM
to Lucee
Thanks a lot for such a quick reply!

If I understand this correctly, these are great suggestions for how to monitor and track the progress of the processing.

But I am actually more confused about how to execute the processing for several thousand rows (ideally via a scheduled task).

Let's say I have a controller function which processes one person, process(personid), and the corresponding FW/1 URL /?action=person.process&personid={personid}.

1) I could write another controller function, processAll(), which would simply loop over a query of 4000 rows and invoke process(personid). I could then call it via a scheduled task, increase the request timeout, and hope for the best. But that seems a bit blunt.

2) Or I could have a scheduled task invoke a page or function which would call itself via cfhttp repeatedly (as I wrote in my original post). But I am not sure that would be any better than the loop.

I could imagine client-side code which would retrieve the next personid to be processed from the server, invoke the processing, wait for it to finish, and then repeat until all done.

But I am not sure how to do it all server-side, ideally one by one. (I would rather make 4000 relatively short-lived requests than one giant one.)



Regards

Ivan

Mark Drew

Mar 10, 2016, 6:38:16 AM
to lucee
What I would do is set an “optimal” batch size (you can vary this).

The scheduled task would call a page that processes 400, for example. This runs every 30 seconds, or every minute, or whatever (a short period of time).

You can then make it self-healing, so part of your query is “OK, get me the next 400 that haven't been processed”.

Part of the query of the queue can see how many are processing (start time but no end time yet), and if that is pretty high, you just exit.

Your script can have individual processing for each person etc., but the point here is that your system tracks its work and “self-heals”.

You could also add onError code in there and whatnot, but that is another story.

So, to recap: you have a PersonQueue table with:

id personID processingStartedDateTime processingEndedDateTime failureCount

You also have a scheduled task that runs very frequently. Your script will handle the throttling by aborting early if conditions are not met (there is a sketch of the whole thing after the steps below):

1) It does a query to find out how many items are being processed (WHERE processingStartedDateTime IS NOT NULL AND processingEndedDateTime IS NULL).
2) If there are loads being processed (say 400?), quit; you are done. Other requests are doing the work.
3) If there are a lot of OLD rows WHERE processingEndedDateTime IS NULL AND processingStartedDateTime is older than the request timeout… then we had better fix these.
4) Select them, mark them as failed once (UPDATE PersonQueue SET failureCount = failureCount + 1) and clear the processingStartedDateTime (this request can now abort).
5) The next request comes in (from the scheduled task) and sees that the conditions are met (there aren't too many people being processed and there aren't a bunch that need to be cleaned up), so we can then:
6) Get 400 people to process, set their processingStartedDateTime, and get to it!
7) When you have processed one person, set their processingEndedDateTime.
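
Pulled together, the scheduled-task page might look roughly like this. It is a sketch, not drop-in code: it assumes SQL Server syntax (as in your earlier example), a datasource named "mydsn", and that ten minutes is comfortably longer than your request timeout:

<cfset batchSize = 400 />
<cfset staleMinutes = 10 />   <!--- roughly your request timeout --->
<cfset batchStart = now() />  <!--- one timestamp for the whole run; see the note below --->

<!--- 1) how many items are mid-flight? --->
<cfquery name="qInFlight" datasource="mydsn">
    select count(*) as inFlight
    from PersonQueue
    where processingStartedDateTime is not null
      and processingEndedDateTime is null
</cfquery>

<!--- 2) enough in flight already: other requests are doing the work --->
<cfif qInFlight.inFlight gte batchSize>
    <cfabort />
</cfif>

<!--- 3 + 4) self-heal: count a failure on stale rows, release them, then abort --->
<cfquery datasource="mydsn" result="rHealed">
    update PersonQueue
    set failureCount = failureCount + 1,
        processingStartedDateTime = null
    where processingEndedDateTime is null
      and processingStartedDateTime < <cfqueryparam cfsqltype="cf_sql_timestamp" value="#dateAdd('n', -staleMinutes, batchStart)#" />
</cfquery>
<cfif rHealed.recordCount>
    <cfabort /> <!--- the next scheduled run will pick the released rows up --->
</cfif>

<!--- 5 + 6) conditions are met: claim the next batch and mark it as started --->
<cfquery name="qBatch" datasource="mydsn">
    select top #batchSize# id, personID
    from PersonQueue
    where processingStartedDateTime is null
    order by id
</cfquery>
<cfif not qBatch.recordCount>
    <cfabort /> <!--- nothing left to do --->
</cfif>
<cfquery datasource="mydsn">
    update PersonQueue
    set processingStartedDateTime = <cfqueryparam cfsqltype="cf_sql_timestamp" value="#batchStart#" />
    where id in (<cfqueryparam cfsqltype="cf_sql_int" value="#valueList(qBatch.id)#" list="true" />)
</cfquery>

<cfloop query="qBatch">
    <!--- ... call the web service, parse the response, save the data for qBatch.personID ... --->

    <!--- 7) done with this person --->
    <cfquery datasource="mydsn">
        update PersonQueue
        set processingEndedDateTime = <cfqueryparam cfsqltype="cf_sql_timestamp" value="#now()#" />
        where id = <cfqueryparam cfsqltype="cf_sql_int" value="#qBatch.id#" />
    </cfquery>
</cfloop>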

As a quick note, I would not do:

UPDATE PersonQueue SET processingStartedDateTime = #now()#

(This was a tip from Cameron Childress, actually.)

But rather I would set, at the top of your processing, something like request.nowtime = now() and then do:

UPDATE PersonQueue SET processingStartedDateTime = #request.nowtime#

This makes it easier to query the items afterwards, as everything claimed in the same request will share the same start time (and likewise end time).
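
With cfqueryparam, that pattern looks like this (a sketch; personID here stands in for whatever variable holds the person you are claiming):

<!--- once, at the top of the request --->
<cfset request.nowtime = now() />

<!--- every row claimed by this request then shares the same timestamp --->
<cfquery datasource="mydsn">
    update PersonQueue
    set processingStartedDateTime = <cfqueryparam cfsqltype="cf_sql_timestamp" value="#request.nowtime#" />
    where personID = <cfqueryparam cfsqltype="cf_sql_int" value="#personID#" />
</cfquery>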

Does this process make more sense?

MD

Ivan Rotrekl

Mar 11, 2016, 12:55:27 PM
to Lucee
Thank you very much for this detailed explanation. Yes, it truly does make much more sense now. I really appreciate the time and effort you've spent on this.

Best Regards

Ivan