GCP Life Sciences API Pipeline "dockerCacheImages"

Glen Beane

Jun 17, 2021, 10:12:54 AM
to GCP Life Sciences Discuss
I'm wondering if anyone out there is using the "dockerCacheImages" option as part of the virtualMachine configuration when submitting a pipeline.

I have a Docker image that takes ~20 minutes to pull, so I'm trying to speed up the startup time of my Life Sciences pipeline by using the cache.

Here is what I've tried so far:

1) create a new VM
2) run docker pull on the VM to cache my container
3) create a new disk and attach to VM
4) mount and format disk as ext4
5) copy /var/lib/docker/image and /var/lib/docker/overlay2 to the attached disk
6) unmount & detach disk
7) make image
8) specify disk image as "dockerCacheImages" parameter when submitting pipeline
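
In rough command form, that sequence looks something like this (the VM, disk, and zone names are placeholders, and the gcloud commands and on-VM commands are interleaved for readability):

gcloud compute instances create cache-builder --zone=us-central1-f   # step 1

# Step 2, on the VM: pull the image so it lands in /var/lib/docker
docker pull gcr.io/jax-jmcrs-behavior-sb-01/pose-est-test1

# Steps 3-4: create and attach a disk, then format and mount it
gcloud compute disks create docker-cache-disk --size=50GB --zone=us-central1-f
gcloud compute instances attach-disk cache-builder \
    --disk=docker-cache-disk --zone=us-central1-f
sudo mkfs.ext4 -F /dev/sdb          # assumes the new disk shows up as /dev/sdb
sudo mkdir -p /mnt/cache && sudo mount /dev/sdb /mnt/cache

# Step 5: copy the Docker storage directories to the root of the disk
sudo cp -rp /var/lib/docker/image /var/lib/docker/overlay2 /mnt/cache/

# Steps 6-7: unmount, detach, and turn the disk into an image
# (step 8 happens in the pipeline request)
sudo umount /mnt/cache
gcloud compute instances detach-disk cache-builder \
    --disk=docker-cache-disk --zone=us-central1-f
gcloud compute images create docker-cache \
    --source-disk=docker-cache-disk --source-disk-zone=us-central1-f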

When I do this, my pipeline fails with this error:

failed: Execution failed: generic::not_found: preparing Docker cache: reading overlays: open /var/lib/pipelines/google/docker-cache-disks/google-docker-cache0/overlay2: no such file or directory

I don't understand this, since if I create a disk from the image and attach it to a VM manually, the root of the disk does have an "overlay2" directory. According to the documentation, I just need to copy the "image" and "overlay2" directories onto the disk:

The Compute Engine Disk Images to use as a Docker cache. The disks will be mounted into the Docker folder in a way that the images present in the cache will not need to be pulled. The digests of the cached images must match those of the tags used or the latest version will still be pulled. The root directory of the ext4 image must contain image and overlay2 directories copied from the Docker directory of a VM where the desired Docker images have already been pulled. Any images pulled that are not cached will be stored on the first cache disk instead of the boot disk. Only a single image is supported.
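
For context, the resulting image is what I pass under virtualMachine in the pipeline request:

virtualMachine:
  dockerCacheImages:
  - projects/jax-jmcrs-behavior-sb-01/global/images/docker-cache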

Paul Grosu

Jun 17, 2021, 8:17:13 PM
to GCP Life Sciences Discuss
Hi Glen,

So what happens if you also copy the following directory with all its subdirectories:

   /var/lib/pipelines/google

Does that fix the issue?

Thanks,
~p

Glen Beane

Jun 18, 2021, 12:40:03 AM
to GCP Life Sciences Discuss
Hi Paul, 

It doesn't seem like that would help.

Most of the contents of /var/lib/pipelines/google will be provided by the disk image used to create the pipeline worker instances.

What looks like it should be happening is that the service mounts a disk created from my image at /var/lib/pipelines/google/docker-cache-disks/google-docker-cache0, so the contents of my disk appear there. That means that if I were somehow able to copy /var/lib/pipelines/google from the pipeline worker image into my disk image, it would end up at /var/lib/pipelines/google/docker-cache-disks/google-docker-cache0/var/lib/pipelines/google on the running worker instance.

What appears to be happening (though I don't know for certain) is that although a disk is being attached to the worker compute instance, it's not being mounted at /var/lib/pipelines/google/docker-cache-disks/google-docker-cache0, so the overlay2 directory is not found.

Per the documentation, the root directory of my image should contain the image and overlay2 directories copied from the Docker directory. If it were properly mounted, then /var/lib/pipelines/google/docker-cache-disks/google-docker-cache0/overlay2 would exist.
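
For what it's worth, this is roughly how I verified the image contents (the throwaway disk and VM names are placeholders):

gcloud compute disks create cache-check --image=docker-cache --zone=us-central1-f
gcloud compute instances attach-disk test-vm --disk=cache-check --zone=us-central1-f
# then, on test-vm:
sudo mount -o ro /dev/sdb /mnt && ls /mnt
# lists: image  overlay2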

Tim Jennison

Jun 18, 2021, 9:39:19 AM
to Glen Beane, GCP Life Sciences Discuss
Hi Glen,
I don't see anything obviously wrong with what you're doing. In particular, creating a disk from the image and manually attaching it to a VM seems like a good way to verify the directory is there. Can you send either the name or the 'gcloud beta lifesciences operations describe' output of an operation that failed? Also, here's the interesting part of the code I've used to generate a startup script that populates the disk, which you could try to see if it helps (running on Container-Optimized OS 89):

# ${SCRIPT} (defined elsewhere) is the output path for the generated startup
# script. ${DATA} and "$1" (a comma-separated image list) expand now, while
# the \$-escaped variables expand when the startup script runs on the VM.
DATA=/mnt/disks/data
cat <<EOF >${SCRIPT}
set -o errexit -o xtrace

# Format and mount the cache disk (attached as /dev/sdb).
sudo mkfs.ext4 -m 0 -F -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb
sudo mkdir -p "${DATA}"
sudo mount -o discard,defaults /dev/sdb "${DATA}"
sudo chmod a+w "${DATA}"

sleep 10

# Seed the disk with the existing Docker state, then bind-mount it over
# /var/lib/docker so the pulls below land directly on the cache disk.
cp -r /var/lib/docker/* "${DATA}"
mount --bind "${DATA}" /var/lib/docker/

# Pull each requested image; the list is baked in at generation time.
IFS=","
IMAGES="$1"
for IMAGE in \$IMAGES; do
  docker pull "\${IMAGE}"
done

# Shut the builder VM down once the cache is populated.
/sbin/poweroff
EOF
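
(For context, the surrounding code that's omitted here writes the heredoc out to ${SCRIPT} and then boots a builder VM with it as the startup script and a blank cache disk attached; roughly something like this, with placeholder names:)

SCRIPT=$(mktemp)
# ... heredoc above writes the generated startup script into ${SCRIPT} ...
gcloud compute instances create cache-builder \
    --zone=us-central1-f \
    --image-family=cos-89-lts --image-project=cos-cloud \
    --create-disk=name=docker-cache-disk,size=50GB,auto-delete=no \
    --metadata-from-file=startup-script="${SCRIPT}"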


Thanks
Tim


Glen Beane

Jun 18, 2021, 10:34:48 AM
to GCP Life Sciences Discuss
Hi Tim,

Right -- I did create a disk from the image and then manually attach it to a VM to verify that the directory is there, so I'm very confused by the "no such file or directory" error.
I'll try something similar to your script to see if that helps.

Here is the output from `gcloud beta lifesciences operations describe`:

gcloud beta lifesciences operations describe 3164378125878051963

done: true
error:
  code: 5
  message: 'Execution failed: generic::not_found: preparing Docker cache: reading
    overlays: open /var/lib/pipelines/google/docker-cache-disks/google-docker-cache0/overlay2:
    no such file or directory'
metadata:
  '@type': type.googleapis.com/google.cloud.lifesciences.v2beta.Metadata
  createTime: '2021-06-17T13:44:42.648820Z'
  endTime: '2021-06-17T13:46:53.810636428Z'
  events:
  - description: Worker released
    timestamp: '2021-06-17T13:46:53.810636428Z'
    workerReleased:
      instance: google-pipelines-worker-37ac3783fab7115b4fdfcc6a93ddc320
      zone: us-central1-f
  - description: 'Execution failed: generic::not_found: preparing Docker cache: reading
      overlays: open /var/lib/pipelines/google/docker-cache-disks/google-docker-cache0/overlay2:
      no such file or directory'
    failed:
      cause: 'Execution failed: generic::not_found: preparing Docker cache: reading
        overlays: open /var/lib/pipelines/google/docker-cache-disks/google-docker-cache0/overlay2:
        no such file or directory'
      code: NOT_FOUND
    timestamp: '2021-06-17T13:46:52.996106944Z'
  - description: Worker "google-pipelines-worker-37ac3783fab7115b4fdfcc6a93ddc320"
      assigned in "us-central1-f" on a "n1-standard-8" machine
    timestamp: '2021-06-17T13:44:53.953871424Z'
    workerAssigned:
      instance: google-pipelines-worker-37ac3783fab7115b4fdfcc6a93ddc320
      machineType: n1-standard-8
      zone: us-central1-f
  pipeline:
    actions:
    - commands:
      - /bin/sh
      - -c
      - gsutil cp gs://jax-compsci-gbeane-pipeline-test/short.avi /data/video.avi
      imageUri: google/cloud-sdk:slim
      mounts:
      - disk: data
        path: /data
    - commands:
      - -c
      - python /deep-hres-net/tools/infermultimousepose.py --max-embed-sep-within-instances
        0.3 --min-embed-sep-between-instances 0.2 --min-pose-heatmap-val 1.0 --max-inst-dist-px
        75 --pose-smoothing /multimousepose.pth /multimousepose-conf.yaml /data/video.avi
        /data/out.h5 1>/data/stdout.log 2>/data/stderr.log
      entrypoint: bash
      environment:
        PYTHONPATH: /deep-hres-net
      imageUri: gcr.io/jax-jmcrs-behavior-sb-01/pose-est-test1
      mounts:
      - disk: data
        path: /data
    - alwaysRun: true
      commands:
      - /bin/sh
      - -c
      - gsutil cp /data/stderr.log gs://jax-compsci-gbeane-pipeline-test/short_stderr.log
      imageUri: google/cloud-sdk:slim
      mounts:
      - disk: data
        path: /data
    - commands:
      - /bin/sh
      - -c
      - gsutil cp /data/stdout.log gs://jax-compsci-gbeane-pipeline-test/short_stdout.log
      imageUri: google/cloud-sdk:slim
      mounts:
      - disk: data
        path: /data
    - commands:
      - /bin/sh
      - -c
      - gsutil cp /data/out.h5 gs://jax-compsci-gbeane-pipeline-test/short_pose_est_v3.h5
      imageUri: google/cloud-sdk:slim
      mounts:
      - disk: data
        path: /data
    resources:
      regions:
      - us-central1
      virtualMachine:
        accelerators:
        - count: '1'
          type: nvidia-tesla-t4
        bootDiskSizeGb: 40
        bootImage: projects/cloud-lifesciences/global/images/w20210604-0900-rc0060ba4e36-0000-21d7-92e1-089e0828ec78stable
        dockerCacheImages:
        - projects/jax-jmcrs-behavior-sb-01/global/images/docker-cache
        labels:
          goog-pipelines-worker: 'true'
        machineType: n1-standard-8
        nvidiaDriverVersion: 450.51.06
        preemptible: true
        serviceAccount:
          email: default
          scopes:
          - https://www.googleapis.com/auth/cloud-platform
        volumes:
        - persistentDisk:
            sizeGb: 100
          volume: data
    timeout: 604800s
  startTime: '2021-06-17T13:44:53.953871424Z'
name: projects/262596842792/locations/us-central1/operations/3164378125878051963

Glen Beane

Jun 18, 2021, 2:45:43 PM
to GCP Life Sciences Discuss
I just tried this:

  1. create drive and attach to VM
  2. format drive & mount to /mnt/drives/data
  3. cp -rp /var/lib/docker/* /mnt/drives/data/ 
  4. mount --bind /mnt/drives/data /var/lib/docker
  5. docker pull <my image>
  6. unmount & detach drive
  7. create new image from drive
  8. submit pipeline specifying new image as dockerCacheImages
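
In command form (on the VM, with the drive showing up as /dev/sdb):

sudo mkfs.ext4 -F /dev/sdb
sudo mkdir -p /mnt/drives/data
sudo mount /dev/sdb /mnt/drives/data
sudo cp -rp /var/lib/docker/* /mnt/drives/data/
sudo mount --bind /mnt/drives/data /var/lib/docker
docker pull gcr.io/jax-jmcrs-behavior-sb-01/pose-est-test1
sudo umount /var/lib/docker && sudo umount /mnt/drives/data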

This still failed:
Operation [projects/262596842792/locations/us-central1/operations/9546371734292450683] failed: Execution failed: generic::not_found: preparing Docker cache: reading overlays: open /var/lib/pipelines/google/docker-cache-disks/google-docker-cache0/overlay2: no such file or directory


Tim Jennison

Jun 21, 2021, 2:36:44 PM
to Glen Beane, GCP Life Sciences Discuss
Hi Glen,
We've discovered a bug when using Docker cache disks in combination with GPUs. We have a fix in the works, but unfortunately it won't be live until Friday of next week. Sorry for the inconvenience, and thanks for your assistance in tracking it down.

Thanks
Tim

Glen Beane

Jul 8, 2021, 12:35:15 PM
to GCP Life Sciences Discuss
Hi Tim,

I just tried using the Docker cache disk in combination with a GPU again, and hit the same error:

error:
  code: 5
  message: 'Execution failed: generic::not_found: preparing Docker cache: reading
    overlays: open /var/lib/pipelines/google/docker-cache-disks/google-docker-cache0/overlay2:
    no such file or directory'

If I don't use a GPU, I don't get this error.
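
For reference, the failing requests are the ones that include an accelerator alongside the cache image, e.g.:

virtualMachine:
  accelerators:        # removing this block makes the cache disk work
  - count: '1'
    type: nvidia-tesla-t4
  dockerCacheImages:
  - projects/jax-jmcrs-behavior-sb-01/global/images/docker-cache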

Aaron Golden

Jul 8, 2021, 10:22:06 PM
to Glen Beane, GCP Life Sciences Discuss
Hi Glen,

Due to some issues last week and this week, the fix Tim mentioned above was not rolled out on the usual schedule. We expect that fix to be available by Friday the 16th. Sorry for the delay!

--Aaron
