Persistent SSL errors when using Snakemake with the Google Life Sciences executor


Mick Watson

May 11, 2021, 7:53:06 AM
to GCP Life Sciences Discuss
Hey

I am using Snakemake to submit jobs to the Google Life Sciences executor. The first few jobs go through and run, but then I get an SSL error. This happens every time. I have tried submitting from three machines: Google Cloud Shell, my university cluster (which runs Scientific Linux), and a Google VM running Ubuntu, so the problem occurs across operating systems.


The most common error message is: ssl.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2627)

The entire output is pasted below.

Any help appreciated

Thanks
Mick

$ snakemake  --google-lifesciences --default-remote-prefix dog_microbiome_updated_env  --preemption-default 1 --use-conda --jobs 20 -r --preemptible-rules download_and_trim=0 combine=0 megahit=0
Preemptible instances are only available for the Google Life Sciences Executor.
['Bandit_p1', 'Belle_p2', 'Bandit_p2']
Building DAG of jobs...
Using shell: /bin/bash
Job counts:
        count   jobs
        1       all
        3       combine
        26      download_and_trim
        3       megahit
        33
Select jobs to execute...

[Tue May 11 10:20:17 2021]
rule download_and_trim:
    output: dog_microbiome_updated_env/trimmed/ERR1914784_1.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914784_2.t.fastq.gz
    jobid: 55
    reason: Missing output files: dog_microbiome_updated_env/trimmed/ERR1914784_2.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914784_1.t.fastq.gz
    wildcards: id=ERR1914784
    threads: 2
    resources: mem_mb=16000, disk_mb=18000

Get status with:
gcloud config set project cloud-practice-metagenomics
gcloud beta lifesciences operations describe projects/cloud-practice-metagenomics/locations/us-central1/operations/18226784425250149977
gcloud beta lifesciences operations list
Logs will be saved to: dog_microbiome_updated_env/dog_microbiome_updated_env/google-lifesciences-logs


[Tue May 11 10:20:19 2021]
rule download_and_trim:
    output: dog_microbiome_updated_env/trimmed/ERR1914157_1.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914157_2.t.fastq.gz
    jobid: 24
    reason: Missing output files: dog_microbiome_updated_env/trimmed/ERR1914157_2.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914157_1.t.fastq.gz
    wildcards: id=ERR1914157
    threads: 2
    resources: mem_mb=16000, disk_mb=18000

Get status with:
gcloud config set project cloud-practice-metagenomics
gcloud beta lifesciences operations describe projects/cloud-practice-metagenomics/locations/us-central1/operations/10326049937255736491
gcloud beta lifesciences operations list
Logs will be saved to: dog_microbiome_updated_env/dog_microbiome_updated_env/google-lifesciences-logs


[Tue May 11 10:20:21 2021]
rule download_and_trim:
    output: dog_microbiome_updated_env/trimmed/ERR1914151_1.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914151_2.t.fastq.gz
    jobid: 18
    reason: Missing output files: dog_microbiome_updated_env/trimmed/ERR1914151_2.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914151_1.t.fastq.gz
    wildcards: id=ERR1914151
    threads: 2
    resources: mem_mb=16000, disk_mb=18000

Get status with:
gcloud config set project cloud-practice-metagenomics
gcloud beta lifesciences operations describe projects/cloud-practice-metagenomics/locations/us-central1/operations/11537486689270747502
gcloud beta lifesciences operations list
Logs will be saved to: dog_microbiome_updated_env/dog_microbiome_updated_env/google-lifesciences-logs


[Tue May 11 10:20:22 2021]
rule download_and_trim:
    output: dog_microbiome_updated_env/trimmed/ERR1914807_1.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914807_2.t.fastq.gz
    jobid: 78
    reason: Missing output files: dog_microbiome_updated_env/trimmed/ERR1914807_1.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914807_2.t.fastq.gz
    wildcards: id=ERR1914807
    threads: 2
    resources: mem_mb=16000, disk_mb=18000

Get status with:
gcloud config set project cloud-practice-metagenomics
gcloud beta lifesciences operations describe projects/cloud-practice-metagenomics/locations/us-central1/operations/2811166438802341500
gcloud beta lifesciences operations list
Logs will be saved to: dog_microbiome_updated_env/dog_microbiome_updated_env/google-lifesciences-logs


[Tue May 11 10:20:24 2021]
rule download_and_trim:
    output: dog_microbiome_updated_env/trimmed/ERR1914799_1.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914799_2.t.fastq.gz
    jobid: 70
    reason: Missing output files: dog_microbiome_updated_env/trimmed/ERR1914799_2.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914799_1.t.fastq.gz
    wildcards: id=ERR1914799
    threads: 2
    resources: mem_mb=16000, disk_mb=18000

Get status with:
gcloud config set project cloud-practice-metagenomics
gcloud beta lifesciences operations describe projects/cloud-practice-metagenomics/locations/us-central1/operations/11925268550454341352
gcloud beta lifesciences operations list
Logs will be saved to: dog_microbiome_updated_env/dog_microbiome_updated_env/google-lifesciences-logs


[Tue May 11 10:20:26 2021]
rule download_and_trim:
    output: dog_microbiome_updated_env/trimmed/ERR1914137_1.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914137_2.t.fastq.gz
    jobid: 4
    reason: Missing output files: dog_microbiome_updated_env/trimmed/ERR1914137_2.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914137_1.t.fastq.gz
    wildcards: id=ERR1914137
    threads: 2
    resources: mem_mb=16000, disk_mb=18000

Get status with:
gcloud config set project cloud-practice-metagenomics
gcloud beta lifesciences operations describe projects/cloud-practice-metagenomics/locations/us-central1/operations/1634142244858023504
gcloud beta lifesciences operations list
Logs will be saved to: dog_microbiome_updated_env/dog_microbiome_updated_env/google-lifesciences-logs


[Tue May 11 10:20:28 2021]
rule download_and_trim:
    output: dog_microbiome_updated_env/trimmed/ERR1914793_1.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914793_2.t.fastq.gz
    jobid: 64
    reason: Missing output files: dog_microbiome_updated_env/trimmed/ERR1914793_2.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914793_1.t.fastq.gz
    wildcards: id=ERR1914793
    threads: 2
    resources: mem_mb=16000, disk_mb=18000

Get status with:
gcloud config set project cloud-practice-metagenomics
gcloud beta lifesciences operations describe projects/cloud-practice-metagenomics/locations/us-central1/operations/9206629699432946097
gcloud beta lifesciences operations list
Logs will be saved to: dog_microbiome_updated_env/dog_microbiome_updated_env/google-lifesciences-logs


[Tue May 11 10:20:30 2021]
rule download_and_trim:
    output: dog_microbiome_updated_env/trimmed/ERR1914800_1.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914800_2.t.fastq.gz
    jobid: 71
    reason: Missing output files: dog_microbiome_updated_env/trimmed/ERR1914800_2.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914800_1.t.fastq.gz
    wildcards: id=ERR1914800
    threads: 2
    resources: mem_mb=16000, disk_mb=18000

Get status with:
gcloud config set project cloud-practice-metagenomics
gcloud beta lifesciences operations describe projects/cloud-practice-metagenomics/locations/us-central1/operations/384354668917063172
gcloud beta lifesciences operations list
Logs will be saved to: dog_microbiome_updated_env/dog_microbiome_updated_env/google-lifesciences-logs


[Tue May 11 10:20:32 2021]
rule download_and_trim:
    output: dog_microbiome_updated_env/trimmed/ERR1914138_1.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914138_2.t.fastq.gz
    jobid: 5
    reason: Missing output files: dog_microbiome_updated_env/trimmed/ERR1914138_1.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914138_2.t.fastq.gz
    wildcards: id=ERR1914138
    threads: 2
    resources: mem_mb=16000, disk_mb=18000

Get status with:
gcloud config set project cloud-practice-metagenomics
gcloud beta lifesciences operations describe projects/cloud-practice-metagenomics/locations/us-central1/operations/8249848565164966783
gcloud beta lifesciences operations list
Logs will be saved to: dog_microbiome_updated_env/dog_microbiome_updated_env/google-lifesciences-logs


[Tue May 11 10:20:33 2021]
rule download_and_trim:
    output: dog_microbiome_updated_env/trimmed/ERR1914139_1.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914139_2.t.fastq.gz
    jobid: 6
    reason: Missing output files: dog_microbiome_updated_env/trimmed/ERR1914139_2.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914139_1.t.fastq.gz
    wildcards: id=ERR1914139
    threads: 2
    resources: mem_mb=16000, disk_mb=18000

Get status with:
gcloud config set project cloud-practice-metagenomics
gcloud beta lifesciences operations describe projects/cloud-practice-metagenomics/locations/us-central1/operations/15868307048721116973
gcloud beta lifesciences operations list
Logs will be saved to: dog_microbiome_updated_env/dog_microbiome_updated_env/google-lifesciences-logs


[Tue May 11 10:20:35 2021]
rule download_and_trim:
    output: dog_microbiome_updated_env/trimmed/ERR1915830_1.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1915830_2.t.fastq.gz
    jobid: 43
    reason: Missing output files: dog_microbiome_updated_env/trimmed/ERR1915830_2.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1915830_1.t.fastq.gz
    wildcards: id=ERR1915830
    threads: 2
    resources: mem_mb=16000, disk_mb=18000

Get status with:
gcloud config set project cloud-practice-metagenomics
gcloud beta lifesciences operations describe projects/cloud-practice-metagenomics/locations/us-central1/operations/2811166438802341500
gcloud beta lifesciences operations list
Logs will be saved to: dog_microbiome_updated_env/dog_microbiome_updated_env/google-lifesciences-logs


[Tue May 11 10:20:37 2021]
rule download_and_trim:
    output: dog_microbiome_updated_env/trimmed/ERR1914787_1.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914787_2.t.fastq.gz
    jobid: 58
    reason: Missing output files: dog_microbiome_updated_env/trimmed/ERR1914787_1.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914787_2.t.fastq.gz
    wildcards: id=ERR1914787
    threads: 2
    resources: mem_mb=16000, disk_mb=18000

Get status with:
gcloud config set project cloud-practice-metagenomics
gcloud beta lifesciences operations describe projects/cloud-practice-metagenomics/locations/us-central1/operations/9206629699432946097
gcloud beta lifesciences operations list
Logs will be saved to: dog_microbiome_updated_env/dog_microbiome_updated_env/google-lifesciences-logs


[Tue May 11 10:20:39 2021]
rule download_and_trim:
    output: dog_microbiome_updated_env/trimmed/ERR1914795_1.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914795_2.t.fastq.gz
    jobid: 66
    reason: Missing output files: dog_microbiome_updated_env/trimmed/ERR1914795_2.t.fastq.gz, dog_microbiome_updated_env/trimmed/ERR1914795_1.t.fastq.gz
    wildcards: id=ERR1914795
    threads: 2
    resources: mem_mb=16000, disk_mb=18000

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/executors/google_lifesciences.py", line 917, in _wait_for_jobs
    status = self._retry_request(request)
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/executors/google_lifesciences.py", line 886, in _retry_request
    raise ex
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/executors/google_lifesciences.py", line 875, in _retry_request
    return request.execute()
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/googleapiclient/http.py", line 920, in execute
    resp, content = _retry_request(
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/googleapiclient/http.py", line 222, in _retry_request
    raise exception
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/googleapiclient/http.py", line 191, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/oauth2client/transport.py", line 173, in new_request
    resp, content = request(orig_request_method, uri, method, body,
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/oauth2client/transport.py", line 280, in request
    return http_callable(uri, method=method, body=body, headers=headers,
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/httplib2/__init__.py", line 1708, in request
    (response, content) = self._request(
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/httplib2/__init__.py", line 1424, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/httplib2/__init__.py", line 1376, in _conn_request
    response = conn.getresponse()
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/http/client.py", line 1345, in getresponse
    response.begin()
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
ssl.SSLError: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:2627)
Traceback (most recent call last):
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/__init__.py", line 695, in snakemake
    success = workflow.execute(
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/workflow.py", line 1017, in execute
    success = scheduler.schedule()
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/scheduler.py", line 489, in schedule
    self.run(runjobs)
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/scheduler.py", line 500, in run
    executor.run_jobs(
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/executors/__init__.py", line 136, in run_jobs
    self.run(
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/executors/google_lifesciences.py", line 804, in run
    result = self._retry_request(operation)
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/executors/google_lifesciences.py", line 886, in _retry_request
    raise ex
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/snakemake/executors/google_lifesciences.py", line 875, in _retry_request
    return request.execute()
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/googleapiclient/http.py", line 920, in execute
    resp, content = _retry_request(
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/googleapiclient/http.py", line 222, in _retry_request
    raise exception
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/googleapiclient/http.py", line 191, in _retry_request
    resp, content = http.request(uri, method, *args, **kwargs)
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/oauth2client/transport.py", line 173, in new_request
    resp, content = request(orig_request_method, uri, method, body,
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/oauth2client/transport.py", line 280, in request
    return http_callable(uri, method=method, body=body, headers=headers,
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/httplib2/__init__.py", line 1708, in request
    (response, content) = self._request(
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/httplib2/__init__.py", line 1424, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/site-packages/httplib2/__init__.py", line 1376, in _conn_request
    response = conn.getresponse()
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/http/client.py", line 1345, in getresponse
    response.begin()
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/socket.py", line 704, in readinto
    return self._sock.recv_into(b)
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/home/mwatson9/miniconda3/envs/snakemake/lib/python3.9/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
ssl.SSLError: [SSL: WRONG_VERSION_NUMBER] wrong version number (_ssl.c:2627)

Mick Watson

May 12, 2021, 7:36:01 AM
to GCP Life Sciences Discuss
I created a test case that reproduces this error.


I would appreciate knowing if others see this error.

Paul Grosu

May 12, 2021, 9:10:05 AM
to GCP Life Sciences Discuss
Hi Mick,

I suspect it is an issue with snakemake's retry strategy. In the snakemake codebase you will notice, at the following retry section, that the retry is scheduled via the status_rate_limiter parameter:

  Section:


  Code:

    with self.status_rate_limiter:
        ...
        try:
            status = self._retry_request(request)
        except googleapiclient.errors.HttpError as ex:

The status_rate_limiter is defined in the following code section:

  Section:


  Code:

    self.max_status_checks_per_second = max_status_checks_per_second

    self.status_rate_limiter = RateLimiter(
        max_calls=self.max_status_checks_per_second, period=1
    )

Based on the RateLimiter documentation at the following link, your period is 1 second, during which a maximum of max_status_checks_per_second calls are performed:


Most services guard against such simple repetitive requests, which could overload them. Instead, most clients schedule retry requests using exponential backoff logic, which is what Google recommends here:
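To make the recommendation concrete, here is a minimal sketch of exponential backoff with jitter. The helper name `retry_with_backoff` and its parameters are hypothetical, not from snakemake or the Google client libraries:

```python
import random
import time

def retry_with_backoff(request_fn, max_retries=5, base_delay=1.0, max_delay=60.0):
    """Retry a zero-argument callable, doubling the delay between attempts.

    `request_fn` is assumed to perform one API request and raise on a
    transient failure. The final failure is re-raised to the caller.
    """
    for attempt in range(max_retries):
        try:
            return request_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Delay doubles each attempt, capped at max_delay, plus
            # random jitter so concurrent clients do not retry in lockstep.
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(delay + random.uniform(0, delay))
```

The jitter matters for exactly the situation in this thread: many threads retrying on a fixed 1-second cadence will keep colliding, while randomized backoff spreads them out.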


Hope it helps,
Paul

Mick Watson

May 12, 2021, 10:28:21 AM
to GCP Life Sciences Discuss
Hi Paul

Thanks, that is a very helpful suggestion.

I have been playing around with two snakemake parameters:

[--max-jobs-per-second MAX_JOBS_PER_SECOND]
[--max-status-checks-per-second MAX_STATUS_CHECKS_PER_SECOND]

The first time I tried, I encountered a C-level memory error (*** Error in `/home/mwatson9/miniconda3/envs/snakemake/bin/python3.9': munmap_chunk(): invalid pointer: 0x00007f8ae0011c70 ***), but hopefully that was a one-off gremlin, because I don't fancy debugging it.

I have set the above parameters to 0.3 but still encounter the bug. I will keep experimenting with these to see if it helps.

Thanks
Mick

Paul Grosu

May 12, 2021, 10:49:47 AM
to GCP Life Sciences Discuss
Hi Mick,

That is a different coding issue. The invalid pointer means the function (munmap_chunk) is trying to free a memory address that is most likely unallocated or not on the heap -- there are more complex scenarios, but for brevity I won't cover them. It is probably a race condition: the code had valid access to the memory at one time, but something else operated on it before this function call.

Those two parameters don't change the linear, periodic request logic, as the period still appears to be a constant 1 second, judging by the "_PER_SECOND" suffix.

Hope it helps,
Paul

Paul Grosu

May 12, 2021, 3:09:07 PM
to GCP Life Sciences Discuss
Hi Mick,

One other thing: I understand what you are trying to do with fractional values, but the code suggests the following.

1)  For max_jobs_per_second:

Section: 


Code: 

  max_jobs_per_second (int):  maximal number of cluster/drmaa jobs per second, None to impose no limit (default None)

2)  For max_status_checks_per_second:

Section:



Code:

    class RateLimiter(object):

        """Provides rate limiting for an operation with a configurable number of
        requests for a time period.
        """

        def __init__(self, max_calls, period=1.0, callback=None):
            """Initialize a RateLimiter object which enforces as much as max_calls
            operations on period (eventually floating) number of seconds.
            """
            ...

        def __enter__(self):
            with self._lock:
                # We want to ensure that no more than max_calls were run in the allowed
                # period. For this, we store the last timestamps of each call and run
                # the rate verification upon each __enter__ call.
                if len(self.calls) >= self.max_calls:


For the first (max_jobs_per_second), the documentation suggests an integer (not a fraction), and for the second (max_status_checks_per_second), the check is based on the length of a list, so a fractional value would have no effect. One could play with the period instead, as was done for the GA4GH TES executor, but the rest of the code would need to be changed for that to work for Google submissions too. The fractional values are probably why the exception is being thrown.
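The list-length behaviour can be demonstrated with a simplified model of the check quoted above. This `MiniRateLimiter` is an illustrative sketch (not the actual ratelimiter package): because `len(self.calls)` is always an integer, any fractional `max_calls` between 0 and 1 behaves exactly like `max_calls=1`, so values like 0.3 cannot slow the rate below one call per period:

```python
import time
from collections import deque

class MiniRateLimiter:
    """Simplified model of the `len(self.calls) >= self.max_calls` check
    (illustrative sketch only, not the real ratelimiter package)."""

    def __init__(self, max_calls, period=1.0):
        self.max_calls = max_calls
        self.period = period
        self.calls = deque()

    def _timespan(self):
        return self.calls[-1] - self.calls[0]

    def __enter__(self):
        # A fractional max_calls such as 0.3 still compares against an
        # integer list length, so it acts exactly like max_calls=1.
        if len(self.calls) >= self.max_calls:
            sleeptime = self.period - self._timespan()
            if sleeptime > 0:
                time.sleep(sleeptime)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.calls.append(time.monotonic())
        while self._timespan() >= self.period:
            self.calls.popleft()
```

Running three calls with `max_calls=0.3` produces the same pacing as `max_calls=1`: roughly one full period of waiting between consecutive calls, never one call every 1/0.3 seconds.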

Hope it helps,
Paul

WATSON Mick

May 12, 2021, 4:46:50 PM
to GCP Life Sciences Discuss, Paul Grosu
Thanks Paul

I was looking at the documentation

It states that both parameters accept fractions. I assumed these could be used to slow down the rate of requests, i.e. 0.1 requests per second would mean 1 request every 10 seconds.

Admittedly, I haven't looked too deeply at the code.

Thanks
Mick



Paul Grosu

May 12, 2021, 5:12:57 PM
to GCP Life Sciences Discuss
Hi Mick,

You are right: the argument parser allows floats, based on this section of the code -- though the parameter is also listed as an integer in the snakemake() function's argument documentation (which I showed previously):


Code:

    group_behavior.add_argument(
        "--max-jobs-per-second",
        default=10,
        type=float,
        help="Maximal number of cluster/drmaa jobs per second, default is 10, "
        "fractions allowed.",
    )
    group_behavior.add_argument(
        "--max-status-checks-per-second",
        default=10,
        type=float,
        help="Maximal number of job status checks per second, default is 10, "
        "fractions allowed.",
    )

Based on my previous analysis, the Google Life Sciences executor uses the status_rate_limiter and treats it as a discrete (integer) rate, given that the period is 1:

        self.status_rate_limiter = RateLimiter(
            max_calls=self.max_status_checks_per_second, period=1
        )

Other parts of the code interact with the rate limiter using float-based logic. For example, the scheduler comes closer to handling floats properly, as follows -- as does the GA4GH TES code from the previous email:


Code:

    if self.max_jobs_per_second and not self.dryrun:
        max_jobs_frac = Fraction(self.max_jobs_per_second).limit_denominator()
        self.rate_limiter = RateLimiter(
            max_calls=max_jobs_frac.numerator, period=max_jobs_frac.denominator
        )
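For a concrete illustration of what the Fraction conversion does, this stdlib-only snippet (nothing snakemake-specific) shows how a fractional rate like 0.3 becomes an integer max_calls/period pair:

```python
from fractions import Fraction

# A rate of 0.3 jobs per second converts to 3 calls per 10-second period,
# which a RateLimiter-style class with integer max_calls can enforce.
frac = Fraction(0.3).limit_denominator()
print(frac.numerator, frac.denominator)  # 3 10
```

This is why the scheduler's Fraction-based path handles values like 0.3 correctly, while a path that passes the raw float as max_calls with period=1 does not.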

Not all parts of the snakemake codebase are consistent if one looks carefully. I usually try to double-check bioinformatics codebases after reading the documentation, to verify that they behave as intended, especially open-source ones. This is a clear example of why that is necessary. It was a nice team effort :)

Thanks,
Paul

Mick Watson

May 14, 2021, 8:03:30 AM
to GCP Life Sciences Discuss

Basically, whenever Snakemake calls the _retry_request function in google_lifesciences.py and two calls land less than 1 ms apart, Snakemake crashes.

Unfortunately, it looks like Snakemake has multiple threads running simultaneously that call this function, which means the likelihood of calls less than 1 ms apart is quite high, and the problem is also quite hard to fix.
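One possible mitigation for calls landing too close together can be sketched as follows. This `SerializedRequester` is a hypothetical helper, not snakemake code: it funnels all request calls through one lock and enforces a minimum gap between consecutive requests, so no two threads can issue calls less than the interval apart:

```python
import threading
import time

class SerializedRequester:
    """Serialize requests from multiple threads and enforce a minimum
    interval between consecutive calls (hypothetical helper)."""

    def __init__(self, min_interval=0.001):
        self._lock = threading.Lock()
        self._min_interval = min_interval
        self._last_call = 0.0

    def request(self, fn):
        # Only one thread may issue a request at a time; if the previous
        # request finished too recently, wait out the remaining interval.
        with self._lock:
            wait = self._min_interval - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)
            try:
                return fn()
            finally:
                self._last_call = time.monotonic()
```

Wrapping each `_retry_request`-style call in `requester.request(...)` would space out calls from all threads, at the cost of serializing them.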

Paul Grosu

May 14, 2021, 11:20:34 AM
to GCP Life Sciences Discuss
Hi Mick,

Nice analysis! As I initially suspected, it probably was a race condition. By the way, if it is only two threads, and the order is always run -> wait as I see it in the code section below (which is inherited by the Google executor), just add a time.sleep(sec) call with a parameter of around 1 second just before the self.wait_thread = ... line. If that does not work, use a thread-safe condition variable, or better yet make it all concurrent and serialize the requests to get away from threading entirely:

Section: https://github.com/snakemake/snakemake/blob/main/snakemake/executors/__init__.py#L694-L696

Code:

    self.wait = True
    self.wait_thread = threading.Thread(target=self._wait_for_jobs)
    self.wait_thread.daemon = True

Hope it helps,
Paul
