Canis familiaris: Errors Building a CTAT Genome Library


jennyl....@gmail.com

Mar 16, 2021, 4:08:59 PM
to STAR-Fusion
Hello,

I am trying to build a CTAT genome library to run STAR-Fusion on samples from Canis lupus familiaris. I have run the `prep_genome_lib.pl` script both locally on my HPC and with the Docker image, to make sure the error isn't coming from my environment, but I keep receiving the same error from `dfamscan.pl`: "Error: Failed to open binary auxfiles for Dfam.hmm: use hmmpress first".

If I run `hmmpress Dfam.hmm`, it creates the auxiliary files properly. However, on running `prep_genome_lib.pl` afterwards, I get the error "Error: GA bit thresholds unavailable on model DR0006964". This seems to be a known issue with `dfamscan.pl`. From the documentation: "Models for "raw" families (accessions starting with DR) do not undergo the same threshold calculation process as curated families. [...] Trying to use the precalculated threshold options will produce an error such as: "Error: GA bit thresholds unavailable on model _____"." (https://dfam.org/help/tools)
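
To see how many models in a given HMM file would trigger the GA error, one option is to scan the file for models that have no `GA` line. The following is only a sketch: the function name `models_without_ga` is made up for illustration, and it assumes the usual HMMER3 ASCII layout in which each model carries `NAME` (and usually `ACC`) header lines and ends with a `//` line.

```shell
# Print the accession (or the name, when no ACC line exists) of every model
# in an HMMER3 ASCII file that has no GA threshold line.
# Hypothetical helper, not part of Dfam or the CTAT tooling.
models_without_ga() {
    awk '
        $1 == "NAME" { name = $2 }
        $1 == "ACC"  { acc = $2 }
        $1 == "GA"   { has_ga = 1 }
        $1 == "//"   {
            # End of one model: report it if no GA line was seen.
            if (!has_ga) print (acc != "" ? acc : name)
            name = ""; acc = ""; has_ga = 0
        }
    ' "$1"
}

# Example call (commented out so the snippet has no side effects):
# models_without_ga Dfam.hmm | head
```

If the list is non-empty, `nhmmscan --cut_ga` would be expected to stop on the first such model, which matches the error quoted above.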

My question is: are there any options I need to pass to `prep_genome_lib.pl` to avoid the errors in the `dfamscan.pl` step? I can also provide log files.


Thanks,

Jenny


"""
set -eou pipefail
echo \$STAR_FUSION_HOME
GTF=Canis_lupus_familiaris.CanFam3.1.103.gtf #from ensembl
GENOME=Canis_lupus_familiaris.CanFam3.1.dna.toplevel.fa #from ensembl


\$STAR_FUSION_HOME/ctat-genome-lib-builder/prep_genome_lib.pl \
                       --genome_fa $GENOME \
                       --gtf $GTF \
                       --dfam_db $DFAM \
                       --pfam_db $PFAM \
                       --output_dir \$PWD \
                       --CPU 16

"""



Brian Haas

Mar 16, 2021, 8:39:11 PM
to jennyl....@gmail.com, STAR-Fusion
Hi Jenny,

I'm not sure about that error, but you might try using the mouse dfam.hmm file instead of the full Dfam one, or maybe combine the human and mouse dfam.hmms to build a more comprehensive one.  The full Dfam is super slow to search and I'm not sure how much you're going to find in dog that isn't already represented in human or mouse.  I'm no Dfam expert though...  I just know it's super slow to search the full one.
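
For the combined human plus mouse option, a plain `cat` of the two files may leave the same family in the result twice, and `hmmpress` may refuse to index duplicated model names. A rough sketch of a merge that keeps only the first copy of each `NAME` follows. The function name `merge_hmms_unique` and the file names in the usage comment are placeholders, and the sketch assumes HMMER3 ASCII files in which each model ends with a `//` line.

```shell
# Concatenate HMMER3 ASCII files, keeping only the first model seen
# for each NAME. Hypothetical helper, not part of the CTAT tooling.
merge_hmms_unique() {
    awk '
        { rec = rec $0 "\n" }            # buffer the current model
        $1 == "NAME" { name = $2 }
        $1 == "//" {
            # End of model: emit it only if this name is new.
            if (!(name in seen)) { printf "%s", rec; seen[name] = 1 }
            rec = ""; name = ""
        }
    ' "$@"
}

# Example call (placeholder file names, commented out):
# merge_hmms_unique human_dfam.hmm mouse_dfam.hmm > human_mouse_dfam.hmm
# hmmpress human_mouse_dfam.hmm
```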

--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

Jenny Smith

Mar 19, 2021, 12:49:43 PM
to Brian Haas, STAR-Fusion
Hi Brian,

I tried running `prep_genome_lib.pl` with the dfam.hmm for Mus musculus instead, but I am running into the same error at the `dfamscan.pl` step in the pipeline. Do you have any further suggestions on how to resolve this error?


Thanks so much for your help,

Jenny



From the Logs:
```
* Running CMD: /app/software/STAR-Fusion/1.9.1-foss-2020b-Perl-5.32.0/ctat-genome-lib-builder/util/dfam_repeat_masker.pl --dfam_hmm mus_musculus_dfam.hmm --target_fa ref_annot.cdsplus.fa --out_masked ref_annot.cdsplus.dfam_masked.fa --CPU 4
* Running CMD: dfamscan.pl -fastafile ref_annot.cdsplus.fa -hmmfile mus_musculus_dfam.hmm -dfam_outfile __dfam_ref_annot.cdsplus.fa/dfam.out --masking_thresh --cpu 4

Error: Failed to open binary auxfiles for mus_musculus_dfam.hmm: use hmmpress first

Error running command:
nhmmscan --noali --cut_ga --dfamtblout /tmp/XrEgyj7ZX8 --cpu=4 mus_musculus_dfam.hmm ref_annot.cdsplus.fa
Error, cmd: dfamscan.pl -fastafile ref_annot.cdsplus.fa -hmmfile mus_musculus_dfam.hmm -dfam_outfile __dfam_ref_annot.cdsplus.fa/dfam.out --masking_thresh --cpu 4 died with ret 6400 No such file or directory at /app/software/STAR-Fusion/1.9.1-foss-2020b-Perl-5.32.0/ctat-genome-lib-builder/util/../lib/Pipeliner.pm line 186.
Pipeliner::run(Pipeliner=HASH(0x127a1c0)) called at /app/software/STAR-Fusion/1.9.1-foss-2020b-Perl-5.32.0/ctat-genome-lib-builder/util/dfam_repeat_masker.pl line 84
Error, cmd: /app/software/STAR-Fusion/1.9.1-foss-2020b-Perl-5.32.0/ctat-genome-lib-builder/util/dfam_repeat_masker.pl --dfam_hmm mus_musculus_dfam.hmm --target_fa ref_annot.cdsplus.fa --out_masked ref_annot.cdsplus.dfam_masked.fa --CPU 4 died with ret 512 No such file or directory at /app/software/STAR-Fusion/1.9.1-foss-2020b-Perl-5.32.0/ctat-genome-lib-builder/lib/Pipeliner.pm line 186.
Pipeliner::run(Pipeliner=HASH(0xb032e8)) called at /app/software/STAR-Fusion/1.9.1-foss-2020b-Perl-5.32.0/ctat-genome-lib-builder/prep_genome_lib.pl line 460
```

Code:
```
GTF=Canis_lupus_familiaris.CanFam3.1.103.gtf #from ensembl
GENOME=Canis_lupus_familiaris.CanFam3.1.dna.toplevel.fa #from ensembl


\$STAR_FUSION_HOME/ctat-genome-lib-builder/prep_genome_lib.pl \
                       --genome_fa $GENOME \
                       --gtf $GTF \
                       --dfam_db $DFAM \
                       --pfam_db $PFAM \
                       --output_dir \$PWD \
                       --CPU 4

```

Brian Haas

Mar 19, 2021, 2:20:25 PM
to Jenny Smith, STAR-Fusion
Hi Jenny,

I think it's missing a step wrt running 'hmmpress' on the dfam library.  If you specify 'mouse' as the parameter for --dfam_db, then it should automatically pull it down and build it for you as part of the process.

best,

~brian
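
When a local HMM file is passed instead of the 'mouse' keyword, the missing step can be checked for before launching the build. This is a minimal sketch, assuming `hmmpress` writes its four index files (`.h3f`, `.h3i`, `.h3m`, `.h3p`) next to the HMM file. The function name `hmm_is_pressed` is invented for the example.

```shell
# Return 0 if all four hmmpress index files exist (and are non-empty)
# next to the given HMM file, 1 otherwise.
# Hypothetical helper, not part of HMMER or the CTAT tooling.
hmm_is_pressed() {
    hmm_path=$1
    for ext in h3f h3i h3m h3p; do
        [ -s "${hmm_path}.${ext}" ] || return 1
    done
    return 0
}

# Example use before calling prep_genome_lib.pl (commented out):
# if ! hmm_is_pressed "$DFAM"; then
#     hmmpress "$DFAM"
# fi
#
# Or let the builder fetch and prepare the mouse library itself:
# prep_genome_lib.pl --genome_fa "$GENOME" --gtf "$GTF" \
#     --dfam_db mouse --pfam_db "$PFAM" --CPU 4
```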

Jenny Smith

Apr 7, 2021, 2:31:02 PM
to Brian Haas, STAR-Fusion
Hi Brian,

Thanks so much for your assistance, and my apologies for the late reply. I am still getting errors when trying to run `prep_genome_lib.pl`. The error now comes from the pipeline's copy command: "cp: ... are the same file". I get this same error locally on my HPC and on AWS Batch using the pre-built Docker image "trinityctat/starfusion:1.10.0" in a Nextflow workflow. Please let me know if you would like any log files or anything else.


Thanks,

Jenny


Error on Ubuntu Linux HPC: 
cp: 'ref_annot.cdna.fa' and '/fh/scratch/delete90/meshinchi_s/jlsmith3/CSU_Canine_AML/genome_refs/ref_annot.cdna.fa' are the same file
Error, cmd: cp ref_annot.cdna.fa /fh/scratch/delete90/meshinchi_s/jlsmith3/CSU_Canine_AML/genome_refs/ref_annot.cdna.fa died with ret 256 No such file or directory at /app/software/STAR-Fusion/1.9.1-foss-2020b-Perl-5.32.0/ctat-genome-lib-builder/lib/Pipeliner.pm line 186.
Pipeliner::run(Pipeliner=HASH(0x1cb7430)) called at /app/software/STAR-Fusion/1.9.1-foss-2020b-Perl-5.32.0/ctat-genome-lib-builder/prep_genome_lib.pl line 460



Error on AWS EC2 Instance:
  cp: 'ref_annot.cdna.fa' and '/tmp/nxf.feZl8hQruA/ref_annot.cdna.fa' are the same file
  Error, cmd: cp ref_annot.cdna.fa /tmp/nxf.feZl8hQruA/ref_annot.cdna.fa died with ret 256 No such file or directory at /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/lib/Pipeliner.pm line 186.
  	Pipeliner::run(Pipeliner=HASH(0x559bcc8aafb8)) called at /usr/local/src/STAR-Fusion/ctat-genome-lib-builder/prep_genome_lib.pl line 460

Brian Haas

Apr 7, 2021, 3:08:57 PM
to Jenny Smith, STAR-Fusion
Hi Jenny,

When you're running the prep_genome_lib.pl, are you setting the --output_dir to some other directory than the one containing your inputs?
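
If the output directory does turn out to be the working directory (the posted scripts pass `--output_dir $PWD`), that would explain `cp` complaining that source and destination are the same file, although the thread does not confirm this. A small guard along the following lines could catch it early. It is only a sketch: `same_dir` is a made-up helper, and the `ctat_genome_lib_build_dir` subdirectory in the usage comment is just an example name.

```shell
# Return 0 when two existing directories resolve to the same physical path
# (symlinks and trailing "/." are resolved by `pwd -P`).
# Hypothetical helper, not part of the CTAT tooling.
same_dir() {
    [ "$(cd "$1" && pwd -P)" = "$(cd "$2" && pwd -P)" ]
}

# Example use (commented out): build into a separate subdirectory.
# OUT="$PWD/ctat_genome_lib_build_dir"
# mkdir -p "$OUT"
# if same_dir "$PWD" "$OUT"; then
#     echo "output dir is the working directory; choose another" >&2
#     exit 1
# fi
# prep_genome_lib.pl --genome_fa "$GENOME" --gtf "$GTF" \
#     --dfam_db mouse --pfam_db "$PFAM" --output_dir "$OUT" --CPU 4
```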