Architectural advice needed: background report generation

Karoly Kantor

Jun 10, 2017, 5:01:52 AM

to Google App Engine

I have a rather complex database structure on which I need to let users define various reports and run them in the background, since the generation time might exceed what is acceptable for a real-time request. I want users to be able to launch the generation of these reports and then get a notification when the report is done and ready for download. I am on Google App Engine Python / Cloud SQL.

1. What are my architectural options to achieve this? What is the recommended setup?

2. How can I ensure that background report generation runs with lower priority than real-time page requests, so that background report generation jobs do not degrade the primary user experience?

Thank you.

Alex Martelli

Jun 10, 2017, 3:00:27 PM

to google-a...@googlegroups.com

On Sat, Jun 10, 2017 at 2:01 AM, Karoly Kantor <kar...@kantor.hu> wrote:

I have a rather complex data base structure on which I need to enable users to define various reports and run them in the background. (As the generation time might exceed what is acceptable real-time.) I want to enable users to launch the generation of these reports and then get a notification when the report is done and ready for download. I am on Google App Engine python / Cloud SQL.

1. What are my architectural options to achieve this? What is the recommended setup?

To allow report generation to proceed without time limits, it must take place in a service (once called a "module") that is not auto-scaled; the available scaling types are therefore manual and basic. For more on scaling types and instance classes, see https://cloud.google.com/appengine/docs/standard/python/an-overview-of-app-engine#scaling_types_and_instance_classes . In the past, auto-scaled services were known as front-ends and the other kinds as back-ends; that terminology is no longer in use, but you'll notice that the instance classes for auto-scaled services have names starting with "F" while those for non-auto-scaled services have names starting with "B" -- the last remnant of that terminology!-)

I would recommend basic scaling, which, as the referenced URL says, "is ideal for work that is intermittent or driven by user activity"; unlike with manual scaling, you don't have to worry about starting and stopping instances -- rather, the service "will create an instance when the application receives a request. The instance will be turned down when the app becomes idle".

The natural way to assign some unit of work ("task") to a basic-scaling service is via App Engine task queues, see https://cloud.google.com/appengine/docs/standard/python/taskqueue/ . Specifically, your use case seems to be a good match for the "push" kind of task queue.
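As a rough illustration of the push-queue approach, here is a minimal sketch of how the front-end service might hand a report job to the worker service. The queue name, service name, handler path, and both function names are hypothetical placeholders, not anything from your app; the only App Engine call used is taskqueue.add() with its queue_name, target, url, and params arguments. The import is done lazily so that the helper that builds the arguments can be exercised outside the App Engine runtime.

```python
# Sketch only: enqueue one report job on a push queue, routed to a
# separate (basic-scaling) service. All three names are assumptions.
QUEUE_NAME = "reports"                  # assumed to be declared in queue.yaml
WORKER_SERVICE = "report-worker"        # assumed name of the worker service
WORKER_URL = "/tasks/generate-report"   # assumed handler path in that service


def build_report_task(report_id, user_email):
    """Return the keyword arguments for taskqueue.add() for one report job."""
    return {
        "queue_name": QUEUE_NAME,
        "target": WORKER_SERVICE,
        "url": WORKER_URL,
        # Task params are sent as form fields, so keep them as strings.
        "params": {"report_id": str(report_id), "user_email": user_email},
    }


def enqueue_report(report_id, user_email):
    """Enqueue the job; this part only works inside App Engine standard."""
    # Imported here so the module can still be imported elsewhere.
    from google.appengine.api import taskqueue
    return taskqueue.add(**build_report_task(report_id, user_email))
```

The request handler that receives the user's "generate report" click would call enqueue_report() and return immediately; the worker service then receives an HTTP POST at the assumed handler path and does the slow work there.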

The natural way for the service to alert the user when the report is ready is to send the user an e-mail at that time. Unfortunately, the quotas on the number of emails and recipients are nowadays very low; a new mail service is under study, but unless and until it is launched, the recommendation is to use third-party providers like SendGrid, see https://cloud.google.com/appengine/docs/standard/python/mail/sendgrid .

You could send the report as an attachment to the email, if you're sure it will never be too large for that; alternatively, you could include in the email only a link from which the report can be downloaded -- to avoid any size limit, put the report in Google Cloud Storage (GCS) and have your email link to the GCS URL of the report. For more on GCS, see https://cloud.google.com/storage/docs/ .
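For the link-in-email variant, a minimal sketch of composing the notification might look like the following. The bucket name, object naming, and function names are hypothetical; the URL uses the https://storage.googleapis.com/BUCKET/OBJECT form, which assumes the recipient is actually allowed to read the object (for example via a public-read ACL or a signed URL, neither of which is shown here). Actually sending the message is left to whichever mail provider you choose.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2.7

BUCKET = "my-app-reports"  # hypothetical bucket name


def report_download_url(bucket, object_name):
    """Build the URL of a GCS object; quote() leaves '/' unescaped."""
    return "https://storage.googleapis.com/{}/{}".format(
        bucket, quote(object_name))


def build_notification(user_email, report_name, object_name, bucket=BUCKET):
    """Return the fields of a 'report ready' email as a plain dict."""
    url = report_download_url(bucket, object_name)
    return {
        "to": user_email,
        "subject": "Your report '{}' is ready".format(report_name),
        "body": "Your report is ready for download:\n{}\n".format(url),
    }
```

The worker would call build_notification() once the report has been written to GCS and pass the resulting fields to the mail provider's client.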

2. How can I ensure that background report generation runs with lower priority than real-time page requests, so that background report generation jobs do not degrade the primary user experience?

If you follow my architectural recommendations above, and have the report generation done by a service that is separate from the default (auto-scaled) one handling the "primary user experience", then there is no worry on this score: the two kinds of work will run on separate instances, each with its own amount of CPU and RAM (defined by the instance classes you choose for each service). The only issues may have to do with resources that are inevitably shared by both services, which should be limited to the database (Cloud SQL, in your case). If you provision your Cloud SQL component adequately, that should not prove to be a problem in practice; I do not believe (though I'll gladly accept correction here!) that Cloud SQL has any concept of different priority for different connections.

Alex

Karoly Kantor

Jun 10, 2017, 5:03:44 PM

to Google App Engine

Thank you, most helpful!
