Understanding incremental upload in gcloud app deploy


Nikhil Tibrewal

Apr 20, 2020, 8:25:29 AM
to Google App Engine

Hi,

A newbie to App Engine here. I started an App Engine standard app using Django (Python 3.7). Everything worked well in the first few deploys using `gcloud app deploy app.yaml`: the number of files being uploaded to Google Cloud Storage matched roughly what I have in my directory (about 150 including static files).

Sometimes the deploy would upload fewer files, and it seems like a nice feature that files that haven't changed don't need to be uploaded again (saving space, network traffic, etc).

Around this time I switched to using GitHub, and added triggers to deploy to gcloud using Cloud Build. One of the steps is to run collectstatic to collect all static files into a directory so they can be uploaded. But then I found that the number of files to upload shot up into the thousands, likely because I was installing some libraries and those were being uploaded too.

Separately, I read on this page that there are some additional storage buckets: staging.<project>.appspot.com, <project>.appspot.com and us.artifacts.<project>.appspot.com. There's no information on these buckets, but it seems the staging bucket is harmless to delete once a deployment has happened, per the page linked earlier in this paragraph. There are also tons of temporary containers that show up in GCR, so I proceeded to empty those out along with the staging bucket in GCS.

Now I'm running into a scenario where no matter what, `app deploy` always uploads all the files, instead of incrementally uploading them, and skipping where appropriate.

Questions:
1) Are there any docs on understanding these extraneous storage points (GCS buckets, GCR)? These take up space, and that's not ideal for a free trial limit (Docker images in GCR are around 450MB each! The staging bucket seems to be only 22MB, which is nice, though).
2) Why would my files always be uploaded vs being skipped if they're not changed?
3) After deleting files in the buckets and GCR, I tried to upload the libraries once, thinking the app needs them to be there at least once. But when I didn't include them in the next deploy, my app failed to find the libraries (like Django), which means they weren't installed. So how was Django installed the very first time I ran `gcloud app deploy`? And what did I do that broke this for me and now requires me to upload all library files on each deploy?
4) What is the best way to run collectstatic in a cloudbuild trigger if not installing the libraries in requirements.txt? Because this is when 3000+ files were uploaded the first time.

I realize it's a bit confusing to follow all that, but let's start there and I can provide more information as necessary. There's a real lack of docs on these internals of App Engine.

David (Cloud Platform Support)

Apr 23, 2020, 10:41:20 AM
to Google App Engine

Hello,


Here is some information about the default and staging buckets. As for the artifact, this is a bucket for the build cache which holds intermediary build output. The staging and artifact buckets hold temporary files which are part of the deployment process. It is not recommended to delete any of these buckets but it won’t break your project if you do.


Manually running the command "gcloud app deploy" should check for any file changes and only upload those files that were updated. The behavior may be different if done from a Cloud Build step. In this case, it would need to use previous builds in order to speed up the build time. For more information about making your builds faster, please see our documentation on best practices for speeding up builds.
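
To illustrate the kind of "only upload what changed" check described above, here is a minimal Python sketch. It assumes a content-addressed scheme in which each file is stored under the SHA-1 hash of its contents and anything already present in the bucket is skipped. The names `sha1_of_file`, `plan_upload` and `existing_objects` are hypothetical, and this is not the actual gcloud implementation.

```python
import hashlib
from pathlib import Path
from typing import Dict, Set


def sha1_of_file(path: Path) -> str:
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_upload(source_dir: Path, existing_objects: Set[str]) -> Dict[str, Path]:
    """Map content hash -> path for files whose content is not stored yet.

    `existing_objects` stands in for the object names already present in
    the staging bucket (an assumption made for this sketch).
    """
    to_upload: Dict[str, Path] = {}
    for path in sorted(source_dir.rglob("*")):
        if path.is_file():
            file_hash = sha1_of_file(path)
            if file_hash not in existing_objects:
                to_upload[file_hash] = path
    return to_upload
```

Under this model, emptying the staging bucket leaves `existing_objects` empty, so the next deploy plans every file for upload again. Likewise, a build step that installs libraries into the source directory adds every library file to the set being hashed and compared.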


Please review this document about running Django on GAE standard, which provides information about Django's collectstatic. In terms of build steps:

 

args: ['-m', 'pip', 'install', '-t', '.', '-r', 'requirements.txt']

args: ['./manage.py', 'collectstatic', '--noinput']

Nikhil Tibrewal

Apr 23, 2020, 10:08:10 PM
to Google App Engine
Thanks David. I had to read a lot of Google Cloud SDK code to figure out how GAE, Cloud Build, GCR and GCS are used to orchestrate a deployment onto GAE.

I'm doing all this because the files in GCS and GCR cost money; that storage is not free. So I'd like to understand it. My questions have narrowed down to the following:

1) During `app deploy`, Cloud Build fetches source files from GCS, builds images (or uses cached ones if any in GCR), and pushes the latest image to GCR under the `ttl-2h` folder. I see that this TTL is respected for most images there, except the ones from the 2 most recent builds. Why is this?
2) Following point 1, it seems all images are also stored in Cloud Storage (maybe under us.artifacts?), but that's not free: https://cloud.google.com/container-registry/pricing. So can you point to any documentation that explains the us.artifacts bucket and what temp files are in there?
3) The link you shared for the default and staging buckets only mentions pricing on the default bucket (which, by the way, is always empty). Can you share any information on the pricing of the staging and us.artifacts buckets? Are those free since they're used for intermediate steps which I have no visibility into?

Thanks in advance!

barrado

Apr 24, 2020, 7:39:08 AM
to Google App Engine
Hi,

I believe the cost of GCS and GCR is almost negligible. Container Registry images are stored in Standard class buckets, whose price is $0.026 per GB per month. Also, the first 5 GB in the App Engine default bucket are free. In any case, I would refer to the Cloud Storage pricing documentation for a detailed description of the charges for this product.
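
As a rough back-of-the-envelope check on that figure, the sketch below multiplies stored gigabytes by the $0.026 per GB per month rate quoted above. The function name and the optional `free_gb` parameter are illustrative assumptions, and actual billing depends on region, storage class and network usage.

```python
def monthly_storage_cost(stored_gb: float,
                         price_per_gb: float = 0.026,
                         free_gb: float = 0.0) -> float:
    """Estimate the monthly storage charge in USD.

    price_per_gb defaults to the Standard class rate quoted in this thread;
    free_gb is a hypothetical free allowance (0 unless the bucket has one).
    """
    billable_gb = max(stored_gb - free_gb, 0.0)
    return billable_gb * price_per_gb


# Example: an artifacts bucket holding about 1 GB of images.
print(f"${monthly_storage_cost(1.0):.3f} per month")
```

At this rate, each stored gigabyte of images costs a few cents per month if no free allowance applies to the bucket.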

Nikhil Tibrewal

Apr 24, 2020, 9:18:40 AM
to google-a...@googlegroups.com
My question isn’t about the default bucket though. That bucket is always empty.

I understand the cost is negligible, but I feel I should know what files are in that bucket and how it relates to GCR. I have 2 services, so 2 images in GCR that don’t get deleted even under the ttl-2h directory. They add up to 1GB. The size of my artifacts bucket is also 1GB. Without a proper explanation this is puzzling, since my app as a whole has fewer than 20 files and is a test app not doing anything significant.

For a proper production app, I imagine the 5GB would be used up very quickly.

Best,
Nikhil

Katayoon (Cloud Platform Support)

Apr 28, 2020, 1:22:55 PM
to Google App Engine
Hi Nikhil,

I have forwarded your request to the Cloud App Engine documentation team, asking them to provide documentation that clearly explains the uploaded files in the related GCS buckets and the image folders in GCR. However, there is no ETA for it at this time. I have created a public link for you so that you may star it to receive updates on this request.