Demultiplexing 2.5 errors


Laura Domenech

Feb 23, 2024, 3:40:10 PM
to Cumulus Support
Hello,

I'm trying to run demultiplexing (both souporcell and demuxlet, with reference genotypes and default parameters) in Terra, but so far without success. I'm testing with just one pool containing 2 samples.

I'm using the latest WDL (https://dockstore.org/workflows/github.com/lilab-bcb/cumulus/Demultiplexing:2.5.0?tab=info) and I'm encountering the following issues:

- For demuxlet:
I'm getting warnings like these for the last couple of hours:
NOTICE [2024/02/23 20:01:40] - WARNING: Cannot find AF field from INFO field in VCF file, now calculate AF from AC/AN
- For Souporcell:
It's been running for 4 hours now, there's no error message, nothing in stderr, but the log file is "stuck" in the following step for the last 3 1/2 hours:
2024-02-23 16:39:19,410 - pegasusio.qc_utils - INFO - After filtration, 55300 out of 2218837 cell barcodes are kept in UnimodalData object GRCh38-rna.

I also tried version 2.4.1 for souporcell and I get this error (but not the previous one), although the bam.bai file is in the same folder as the bam file:
[E::idx_find_and_load] Could not retrieve index file for '/cromwell_root/fc-5126aebf-29fc-4e2b-96d3-3a55ac249606/Pool-10X-001-GEX.bam'
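For reference, here is a minimal sketch of how one could double-check that an index really sits next to the BAM. It assumes the common samtools naming conventions (`<name>.bam.bai`, `<name>.bai`, `<name>.bam.csi`); the helper names `candidate_index_paths` and `find_bam_index` are hypothetical and not part of Cumulus or souporcell.

```python
from pathlib import Path
from typing import List, Optional


def candidate_index_paths(bam_path: str) -> List[Path]:
    """List the index file names commonly looked for next to a BAM file."""
    bam = Path(bam_path)
    return [
        bam.with_name(bam.name + ".bai"),  # e.g. sample.bam.bai
        bam.with_suffix(".bai"),           # e.g. sample.bai
        bam.with_name(bam.name + ".csi"),  # e.g. sample.bam.csi
    ]


def find_bam_index(bam_path: str) -> Optional[Path]:
    """Return the first candidate index that exists on disk, or None."""
    for candidate in candidate_index_paths(bam_path):
        if candidate.is_file():
            return candidate
    return None
```

Running `find_bam_index` on the localized BAM path inside the task would show whether the index was actually copied alongside the BAM, which is not guaranteed just because both files share a bucket folder.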

Can you help me with these errors? I can provide inputs/parameters.

Many thanks,

Laura

Yiming Yang

Feb 23, 2024, 8:01:53 PM
to Laura Domenech, Cumulus Support
Hello Laura,

Thank you for your interest in our workflows.

The genetic-pooling jobs usually take a very long time to finish; I've seen runs of ~12 hours. For your souporcell 2.5.0 WDL job, you may not always see up-to-date stdout or stderr, as Google Cloud usually caches the on-screen output.

If you don't mind, could you share with me the workflow inputs you set? Also, please let me know whether your demuxlet and souporcell jobs eventually succeed or fail.

For your 2.4.1 souporcell run, if I remember correctly, that error message appears in many of my jobs but usually doesn't stop them from continuing. If possible, could you also share the log file of that job? It could be either stdout and stderr, or a file ending with ".log" in your execution folder.


Sincerely,
Yiming

--
γνῶθι σεαυτόν.

Laura Domenech

Feb 25, 2024, 8:42:47 PM
to Yiming Yang, Cumulus Support
Hi Yiming,

Thank you very much for your response. Both of my souporcell jobs failed: the one using the latest Demultiplexing 2.5.0 WDL and the one using version 2.4.1.

Souporcell demultiplexing 2.5.0 WDL:

I think the error appeared while trying to run vartrix:

2024-02-25 16:09:16,031 - pegasusio.qc_utils - INFO - After filtration, 55300 out of 2218837 cell barcodes are kept in UnimodalData object GRCh38-rna.

***** WARNING: File result/depth_merged.bed has inconsistent naming convention for record:
KI270728.1      97812   97818

***** WARNING: File result/depth_merged.bed has inconsistent naming convention for record:
KI270728.1      97812   97818

checking modules
imports done
checking bam for expected tags
checking fasta
restarting pipeline in existing directory result
using known genotypes
32
running vartrix
Traceback (most recent call last):
  File "/opt/souporcell/souporcell_pipeline.py", line 589, in <module>
    vartrix(args, final_vcf, bam)
  File "/opt/souporcell/souporcell_pipeline.py", line 512, in vartrix
    subprocess.check_call(cmd, stdout = out, stderr = err)
  File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['vartrix', '--mapq', '30', '-b', '/cromwell_root/.../possorted_genome_bam.bam', '-c', 'result/Pool-10X-001.barcodes.tsv', '--scoring-method', 'coverage', '--threads', '32', '--ref-matrix', 'result/ref.mtx', '--out-matrix', 'result/alt.mtx', '-v', 'result/common_variants_covered.vcf', '--fasta', 'genome_ref/fasta/genome.fa', '--umi']' returned non-zero exit status 1.
souporcell_pipeline.py -i /cromwell_root/.../possorted_genome_bam.bam -b result/Pool-10X-001.barcodes.tsv -f genome_ref/fasta/genome.fa -t 32 -o result -k 2 --known_genotypes ref_genotypes.vcf --skip_remap True
Traceback (most recent call last):
  File "<stdin>", line 34, in <module>
  File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['souporcell_pipeline.py', '-i', '/cromwell_root/.../possorted_genome_bam.bam', '-b', 'result/Pool-10X-001.barcodes.tsv', '-f', 'genome_ref/fasta/genome.fa', '-t', '32', '-o', 'result', '-k', '2', '--known_genotypes', 'ref_genotypes.vcf', '--skip_remap', 'True']' returned non-zero exit status 1.
2024/02/25 16:24:24 Starting delocalization.




These were the inputs:


input_sample_sheet.csv following the same format as in https://cumulus.readthedocs.io/en/latest/demultiplexing.html#prepare-input-data-and-import-workflow:

OUTNAME,RNA,TagFile,TYPE,Genotype
Pool1,gs://.../raw_feature_bc_matrix.h5_h5.h5,gs://.../possorted_genome_bam.bam,genetic-pooling,gs://.../ref_genotypes.vcf.gz
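In case it helps to rule out a formatting problem, this is a rough sketch of a sanity check for a sample sheet like the one above. The required column names are taken from the header shown here; treating `Genotype` as optional and expecting `TagFile` to end in `.bam` for `genetic-pooling` rows are my assumptions, not rules confirmed by the Cumulus documentation. The function name `check_sample_sheet` is hypothetical.

```python
import csv
import io
from typing import List

# Columns taken from the sample sheet header in this thread.
REQUIRED_COLUMNS = ["OUTNAME", "RNA", "TagFile", "TYPE"]


def check_sample_sheet(text: str) -> List[str]:
    """Return a list of human-readable problems found in the sheet text."""
    problems: List[str] = []
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    for col in REQUIRED_COLUMNS:
        if col not in header:
            problems.append(f"missing column: {col}")
    if problems:
        return problems  # row checks are meaningless without the columns

    # Row 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for col in REQUIRED_COLUMNS:
            if not (row.get(col) or "").strip():
                problems.append(f"line {line_no}: empty value for {col}")
        tag_file = (row.get("TagFile") or "").strip()
        # Assumption: genetic-pooling rows point TagFile at a BAM file.
        if row.get("TYPE") == "genetic-pooling" and tag_file and not tag_file.endswith(".bam"):
            problems.append(f"line {line_no}: TagFile is not a .bam file")
    return problems
```

An empty list means the sheet passed these basic checks; it says nothing about whether the files themselves are readable.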

demultiplexing_algorithm: "souporcell"
docker_registry: "quay.io/cumulus"
genome: "GRCh38"
min_num_genes: 100
preemptible: 2
souporcell_de_novo_mode: false
souporcell_num_clusters: 2 *as I have 2 samples per pool (is this correct?)
souporcell_skip_remap: true *as I'm providing ref genotypes

I'm not providing common variants

I used default values for the rest of the parameters.

Is there anything I'm missing?


Souporcell demultiplexing 2.4.1 WDL:

Error in log file:

running vartrix
running souporcell clustering
/opt/souporcell/souporcell/target/release/souporcell -k 2 -a result/alt.mtx -r result/ref.mtx --restarts 100 -b result/Pool-10X-001.barcodes.tsv --min_ref 10 --min_alt 10 --threads 32 --known_genotypes result/common_variants_covered.vcf --known_genotypes_sample_names RP-1361_GB1199B_1_v1_WGS_GCP RP-1361_GB4682B_1_v3_WGS_GCP
running souporcell doublet detection
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/opt/souporcell/troublet/target/release/troublet', '--alts', 'result/alt.mtx', '--refs', 'result/ref.mtx', '--clusters', 'result/clusters_tmp.tsv']' returned non-zero exit status 101.
Traceback (most recent call last):
  File "<stdin>", line 34, in <module>
  File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
souporcell_pipeline.py -i /cromwell_root/fc-5126aebf-29fc-4e2b-96d3-3a55ac249606/Pool-10X-001-GEX.bam -b result/Pool-10X-001.barcodes.tsv -f genome_ref/fasta/genome.fa -t 32 -o result -k 2 --known_genotypes ref_genotypes.vcf
subprocess.CalledProcessError: Command '['souporcell_pipeline.py', '-i', '/cromwell_root/fc-5126aebf-29fc-4e2b-96d3-3a55ac249606/Pool-10X-001.bam', '-b', 'result/Pool-10X-001.barcodes.tsv', '-f', 'genome_ref/fasta/genome.fa', '-t', '32', '-o', 'result', '-k', '2', '--known_genotypes', 'ref_genotypes.vcf']' returned non-zero exit status 1.
2024/02/24 03:48:55 Starting delocalization.


I think I found a possible error for demuxlet, so I fixed it and am trying again. I'll keep you posted about that.

Many thanks for your help!

Best,

Laura

Laura Domenech

Feb 26, 2024, 12:05:14 PM
to Yiming Yang, Cumulus Support
Hello,

Just a demuxlet update: it's still running with one pool of 2 samples (running time 22 h), and I'm still getting all those warnings (WARNING: Cannot find AF field from INFO field in VCF file, now calculate AF from AC/AN).
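As I understand the warning, when a VCF record has no AF entry in its INFO column, the allele frequency is derived from the allele count (AC) and the total allele number (AN) instead. Below is a small sketch of that fallback on a single INFO string. It only illustrates the arithmetic and is not demuxlet's actual code; the function names and the choice to return 0.0 when AN is 0 are my own assumptions.

```python
from typing import Dict, List


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string such as 'AC=2;AN=4;DB' into a dict."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        fields[key] = value  # flag entries like 'DB' get an empty value
    return fields


def allele_frequencies(info: str) -> List[float]:
    """Use AF if present; otherwise compute AC / AN per alternate allele."""
    fields = parse_info(info)
    if fields.get("AF"):
        return [float(x) for x in fields["AF"].split(",")]
    allele_number = int(fields["AN"])
    counts = [int(x) for x in fields["AC"].split(",")]
    if allele_number == 0:
        # Assumption: report 0.0 rather than fail when no alleles were called.
        return [0.0 for _ in counts]
    return [count / allele_number for count in counts]


print(allele_frequencies("AC=1,3;AN=8;DB"))  # prints [0.125, 0.375]
```

If that reading is right, the warning itself should be harmless as long as AC and AN are present in my reference VCF.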

Thanks,

Laura