Handling multiple tasks at the same time


Hemanth Kumar

Mar 18, 2016, 8:46:59 AM
to Google App Engine

Hi all,

I urgently need some help.

I am seeing very slow writes to BigQuery.

I have 5,000 tasks, and each task calls a REST API that returns a JSON response. I parse the JSON response and have to write the same data to BigQuery.

Each JSON response contains a JSON array of around 3,000 items, so the total number of records to be inserted into BigQuery is around 15,000,000 (5,000 × 3,000).
Inserting 47,000 records takes 1 hour. When I have to insert lakhs of records (hundreds of thousands), performance suffers badly.

Please suggest how I can reduce the total run time of this job using Google App Engine.

Nick (Cloud Platform Support)

Mar 18, 2016, 3:26:50 PM
to Google App Engine
Hey Hemanth,

Your post leaves some questions about how you've implemented your system. At 47,000 inserts per hour, the rate is approximately 13 per second. That seems very low: the maximum rate for streaming inserts is 100,000 rows per second per table and 1,000,000 rows per second per project.

Correct me if I'm wrong, but this starts to look like this number (47,000 in 1 hour) represents the overall time of the entire pipeline from API call to sending the BigQuery insert(), not simply the rate at which inserts could theoretically take place.
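
As a quick check on the arithmetic above, here is a minimal Java sketch. The class and method names are illustrative and not from the thread; the inputs (47,000 rows in one hour, 5,000 tasks of about 3,000 records each) come from the original post.

```java
import java.util.Locale;

class InsertRate {
    // Average rows per second for a given row count and elapsed time.
    static double rowsPerSecond(long rows, long elapsedSeconds) {
        return (double) rows / elapsedSeconds;
    }

    // Hours needed to insert totalRows at a steady rate of rowsPerSecond.
    static double hoursFor(long totalRows, double rowsPerSecond) {
        return totalRows / rowsPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        double observed = rowsPerSecond(47_000, 3_600); // 47,000 rows in 1 hour
        long totalRows = 5_000L * 3_000L;               // 5,000 tasks x ~3,000 records

        System.out.println(String.format(Locale.ROOT,
                "observed rate: %.1f rows/s", observed));
        System.out.println(String.format(Locale.ROOT,
                "projected time for %d rows: %.0f hours",
                totalRows, hoursFor(totalRows, observed)));
    }
}
```

At the observed rate of about 13 rows per second, the full 15,000,000 records would take roughly 319 hours. That is far below the streaming limits quoted above, which is why the pipeline as a whole, rather than BigQuery itself, looks like the bottleneck.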

Ultimately, the best way to insert BigQuery rows is not with isolated HTTP requests, which have a lot of overhead costs, but with batch inserts. It might be worth looking into ways that you could aggregate the records in a layer after the task pipeline but before BigQuery which would allow you to send big batched inserts.
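
One way to picture the aggregation layer described above is a small buffer that collects rows from many task results and hands them to a sender only when a batch is full. The sketch below is an assumption-laden illustration, not the poster's code: `BatchBuffer`, `Sink`, and the batch size of 5,000 are hypothetical, and the sink stands in for whatever call actually sends rows to BigQuery. It is also not thread-safe.

```java
import java.util.ArrayList;
import java.util.List;

class BatchBuffer<T> {
    // Hypothetical stand-in for the code that sends one batch to BigQuery.
    interface Sink<T> {
        void send(List<T> batch);
    }

    private final int maxBatchSize;
    private final Sink<T> sink;
    private final List<T> pending = new ArrayList<>();

    BatchBuffer(int maxBatchSize, Sink<T> sink) {
        this.maxBatchSize = maxBatchSize;
        this.sink = sink;
    }

    // Buffer one row; send a batch as soon as the buffer is full.
    void add(T row) {
        pending.add(row);
        if (pending.size() >= maxBatchSize) {
            flush();
        }
    }

    // Buffer all rows produced by one task.
    void addAll(List<T> rows) {
        for (T row : rows) {
            add(row);
        }
    }

    // Send whatever is still buffered (call once at the end of the run).
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.send(new ArrayList<>(pending)); // copy so the sink owns its batch
        pending.clear();
    }

    public static void main(String[] args) {
        List<Integer> sentSizes = new ArrayList<>();
        BatchBuffer<String> buffer =
                new BatchBuffer<>(5_000, batch -> sentSizes.add(batch.size()));

        // Simulate four task results of 3,000 rows each.
        for (int task = 0; task < 4; task++) {
            List<String> rows = new ArrayList<>();
            for (int i = 0; i < 3_000; i++) {
                rows.add("task" + task + "-row" + i);
            }
            buffer.addAll(rows);
        }
        buffer.flush();
        System.out.println(sentSizes);
    }
}
```

In this run the 12,000 rows leave the buffer as three batches of 5,000, 5,000, and 2,000 rows instead of one request per record. The right batch size depends on BigQuery's per-request limits, which should be checked against the current documentation.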

Let me know your thoughts on this, and best wishes,

Nick
Cloud Platform Community Support 

Hemanth Kumar

Mar 21, 2016, 12:59:38 AM
to Google App Engine
Hi Nick,

Yes, that figure includes calling the API, getting the response, and inserting into BigQuery. For the batch insert I am using TableRow only, but I can only form a batch of 3,000 records at a time before sending it to BigQuery. As I wrote earlier, we have split our whole logic into a set of tasks. There are around 5,000 tasks in total, and each task gives me 3,000 records. My biggest challenge is how to reduce the time needed to complete all the tasks.

Earlier, before breaking the job into multiple tasks, I created only one task, but it took a long time to run because we have to hit the REST API almost 5,000 times to get the responses. That caused a timeout exception, so to overcome it we split the whole job into 5,000 tasks. Now it no longer throws the timeout exception, but executing all these tasks takes a long time.

Please suggest how to overcome this problem. It would be very useful for me.

Nickolas Daskalou

Mar 21, 2016, 1:21:38 AM
to Google App Engine
Hi Hemanth,

Is your task queue set up to allow enough concurrency and/or execution rate?

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengi...@googlegroups.com.
To post to this group, send email to google-a...@googlegroups.com.
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/8475603f-a249-407a-b180-2e5f4b51ed54%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Hemanth Kumar

Mar 21, 2016, 1:40:43 AM
to Google App Engine
Hi Nick,

My queue configuration is as follows:

<queue>
  <name>appannie-queue</name>
  <rate>60/s</rate>
  <bucket-size>100</bucket-size>
  <max-concurrent-requests>50</max-concurrent-requests>
  <retry-parameters>
    <task-retry-limit>0</task-retry-limit>
  </retry-parameters>
</queue>

With this configuration, however, most of the tasks fail, and far fewer records reach BigQuery than expected: about 300,000 instead of 2,400,000. I am not able to figure out how to resolve this.
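
To relate the queue settings above to total run time, here is a rough back-of-the-envelope estimate in Java. It assumes the queue processes tasks at whichever is lower: the configured rate, or the concurrency limit divided by the average task duration. The average task durations are hypothetical (the thread does not state them), the class name is illustrative, and bucket size, instance scaling, and failures are ignored.

```java
import java.util.Locale;

class QueueDrainEstimate {
    // Tasks per second the queue can sustain: capped by the configured rate
    // and by how many tasks can run at once given their average duration.
    static double effectiveRate(double ratePerSecond, int maxConcurrent,
                                double avgTaskSeconds) {
        return Math.min(ratePerSecond, maxConcurrent / avgTaskSeconds);
    }

    // Seconds needed to work through all tasks at the effective rate.
    static double drainSeconds(int tasks, double ratePerSecond, int maxConcurrent,
                               double avgTaskSeconds) {
        return tasks / effectiveRate(ratePerSecond, maxConcurrent, avgTaskSeconds);
    }

    public static void main(String[] args) {
        int tasks = 5_000;        // from the thread
        double rate = 60.0;       // <rate>60/s</rate>
        int maxConcurrent = 50;   // <max-concurrent-requests>50</max-concurrent-requests>

        // Hypothetical average task durations, in seconds.
        double[] avgTaskSeconds = {0.5, 10.0, 60.0};
        for (double avg : avgTaskSeconds) {
            System.out.println(String.format(Locale.ROOT,
                    "avg task %.1f s -> about %.0f s to drain %d tasks",
                    avg, drainSeconds(tasks, rate, maxConcurrent, avg), tasks));
        }
    }
}
```

Under these assumptions, 5,000 tasks averaging 10 seconds each would drain in about 1,000 seconds, so the queue limits alone would not explain multi-hour runs. As far as the queue.xml documentation describes it, a `task-retry-limit` of 0 means a failed task is not retried, so each failed task would translate directly into missing rows.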

Nickolas Daskalou

Mar 21, 2016, 2:13:18 AM
to Google App Engine
Why are most of the tasks failing? What is the most common error when a task fails?


Hemanth Kumar

Mar 21, 2016, 2:44:21 AM
to google-a...@googlegroups.com
The error says something like the process is already taken by some other request.


Nick (Cloud Platform Support)

Mar 21, 2016, 11:02:04 AM
to Google App Engine
Hey Hemanth,

I suggest that you attempt to batch requests, as mentioned in my previous comment, as this could show some performance improvements without the network overhead of one request per record. Additionally, you should post the actual error text which the other user requested, as this is highly relevant to diagnosing task failure.

Regards,


Nick
Cloud Platform Community Support
