fail to open splits/*_R1*.fastq.gz


Ankita Nand

Jun 18, 2017, 12:51:55 PM
to 3D Genomics

Hi,


I am getting the following error while running the Juicer mapping pipeline on an LSF cluster:


[M::bwa_idx_load_from_disk] read 0 ALT contigs

[E::main_mem] fail to open file `/scare/hpgh/an27t/fly/juicer-run/splits/*_R1*.fastq.gz'.


In my fastq folder (/scare/hpgh/an27t/fly/juicer-run/fastq) I have files like:


HiC-flyplus_1_S2_L002_R1_001.fastq.gz

HiC-flyplus_1_S2_L002_R2_001.fastq.gz

HiC-flyplus_1_S9_L002_R1_001.fastq.gz

HiC-flyplus_1_S9_L002_R2_001.fastq.gz

..

...


In total I have 46 fastq.gz files with similar names.


I am not sure why it is not expanding the * into the actual fastq file names.


Please let me know what could be causing this error.


Thanks,

Ankita

Neva Durand

Jun 18, 2017, 1:19:43 PM
to Ankita Nand, 3D Genomics
Hello Ankita,

I've not seen this with the LSF pipeline before, but sometimes there's a problem with the soft links created in splits/. What I would do is the following:

1. Remove the splits directory: `rm -r /scare/hpgh/an27t/fly/juicer-run/splits`
2. Manually create it and soft-link the files:
``` 
mkdir /scare/hpgh/an27t/fly/juicer-run/splits 
cd /scare/hpgh/an27t/fly/juicer-run/splits
ln -s /scare/hpgh/an27t/fly/juicer-run/fastq/*.fastq.gz .
```
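Before rerunning, it may be worth confirming that the links actually resolve. A minimal sketch, using a throwaway directory and an illustrative filename in place of the real juicer-run paths:

```shell
# Illustrative check that symlinks in splits/ resolve to readable files.
# A throwaway directory stands in for the real juicer-run tree.
workdir=$(mktemp -d)
mkdir -p "$workdir/fastq" "$workdir/splits"

# A tiny stand-in fastq.gz (one read) so the glob has something to match:
printf '@read1\nACGT\n+\nIIII\n' | gzip > "$workdir/fastq/sample_R1_001.fastq.gz"

cd "$workdir/splits"
ln -s "$workdir"/fastq/*.fastq.gz .

# Each link should point at a readable target:
for f in *_R1*.fastq.gz; do
    [ -r "$f" ] && echo "OK: $f" || echo "BROKEN: $f"
done
```

A BROKEN line means the link target is missing or unreadable, which would also lead to a "fail to open file" error from bwa.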
Then try running Juicer again with whatever flags you sent it before.

Best
Neva

--
You received this message because you are subscribed to the Google Groups "3D Genomics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to 3d-genomics+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/3d-genomics/a59d64d1-64ae-4d39-b5da-d97eeffa9983%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Neva Cherniavsky Durand, Ph.D.
Staff Scientist, Aiden Lab

Ankita Nand

Jun 21, 2017, 10:26:54 PM
to Neva Durand, 3D Genomics
Hi Neva,

Thank you so much, this worked! I have one more question, about your 3D de novo assembly (3D-DNA) pipeline. I am trying to run it with the GEO datasets GSE95797_Hs1.fasta and GSE95797_Hs1.mnd.txt. I put the input files in the same folder as run-pipeline.sh and am running it as described in the documentation, but it ends in the following error:


./run-pipeline.sh –m haploid –t 15000 –s 2 –c 23 GSE95797_Hs1.fasta GSE95797_Hs1.mnd.txt

Not sure how to parse your input: files not listed or not found at expected locations. Exiting!

*****************************************************

3D de novo assembly: version 170123

USAGE: ./run-asm-pipeline.sh [options] <path_to_input_fasta> <path_to_input_mnd> 

Could you please help me understand the reason for this error and how I could resolve it?

Thanks a lot!

Ankita

Olga Dudchenko

Jun 27, 2017, 1:13:50 PM
to 3D Genomics, ne...@broadinstitute.org
Ankita,

Were the files unzipped? If yes, I wonder if you are running into some sort of permissions issue. Can you read the files? E.g., do you see output when you run:

```
head GSE95797_Hs1.fasta
head GSE95797_Hs1.mnd.txt
```

Olga

Ankita Nand

Jul 20, 2017, 7:40:06 PM
to Olga Dudchenko, Sanjit Singh Batra, 3D Genomics, Neva Durand
Hi Olga,

Thank you so much for your reply. Yes, I do have read access to the fasta files. I found that if I run it without any arguments it works fine, i.e.:

./run-pipeline.sh GSE95797_Hs1.fasta GSE95797_Hs1.mnd.txt

I just changed the arguments inside the run-pipeline.sh file instead. But with the LSF run time set to 48 or 72 hours, it fails with the following error. I don't think it is failing because of the run-time limit: I tried both 48 and 72 hours and got the same error both times, so I think something else is causing it. Or am I missing a module I should load on my cluster? If you could please have a look and help me resolve it, that would be great!


------------------------------------------------------------

# LSBATCH: User input

#!/bin/bash

module load gnu_parallel/20160522

./run-pipeline.sh GSE95797_Hs1.fasta GSE95797_Hs1.mnd.txt

------------------------------------------------------------

TERM_RUNLIMIT: job killed after reaching LSF run time limit.

Exited with exit code 140.

Resource usage summary:

    CPU time :                                   165565.34 sec.

    Max Memory :                                 51064 MB

    Average Memory :                             39858.61 MB

    Total Requested Memory :                     80960.00 MB

    Delta Memory :                               29896.00 MB

    Max Processes :                              11

    Max Threads :                                77

    Run time :                                   172804 sec.

    Turnaround time :                            172808 sec.

The output (if any) follows:

gnu_parallel 20160522 is located under /share/pkg/gnu_parallel/20160522

:) -p flag was triggered. Running LIGer with GNU Parallel support parameter set to true.

:) -s flag was triggered, starting calculations with 1000 threshold starting contig/scaffold size

:) -q flag was triggered, starting calculations with 1 threshold mapping quality

...Using cprops file: GSE95797_Hs1.0.cprops

...Using merged_nodups file: GSE95797_Hs1.mnd.0.txt

...Scaffolding all scaffolds and contigs greater or equal to 1000 bp.

...Starting iteration # 1

sort: unrecognized option '--parallel=24'

Try `sort --help' for more information.

:) DONE!

:) -p flag was triggered. Running with GNU Parallel support parameter set to true.

:) -q flag was triggered, starting calculations for 1 threshold mapping quality

...Remapping contact data from the original contig set to assembly

...Building track files

...Building the hic file

Not including fragment map

Start preprocess

Writing header

Writing body

.User defined signal 2

#

# A fatal error has been detected by the Java Runtime Environment:

#

#  SIGSEGV (0xb) at pc=0x00002ac6337af15e, pid=15532, tid=47030746217248

#

# JRE version: Java(TM) SE Runtime Environment (8.0_31-b13) (build 1.8.0_31-b13)

# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.31-b07 mixed mode linux-amd64 )

# Problematic frame:

# V  [libjvm.so+0x8e415e]  SR_handler(int, siginfo*, ucontext*)+0x3e

#

# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

#

# An error report file with more information is saved as:

# /tmp/hs_err_pid15532.log

#

# If you would like to submit a bug report, please visit:

#   http://bugreport.java.com/bugreport/crash.jsp

#

If you could please give any insight to this that will be great!

Thanks,
Ankita


Olga Dudchenko

Jul 20, 2017, 8:37:42 PM
to 3D Genomics, odudc...@icloud.com, sanjit...@gmail.com, ne...@broadinstitute.org
Hi Ankita,

The error suggests that you have an older version of sort than the pipeline requires, one that does not support parallelization. The pipeline requires GNU coreutils sort >= 8.11; you can check your version by typing "sort --version".
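For reference, a quick way to check both the version and the flag itself (assuming a system with GNU coreutils installed):

```shell
# Print the installed sort version; 3D-DNA needs GNU coreutils >= 8.11,
# the release that introduced --parallel.
sort --version | head -n 1

# Functional check: this fails with "unrecognized option" on older sorts.
printf '3\n1\n2\n' | sort --parallel=2
```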

Best,
Olga

Ankita Nand

Jul 27, 2017, 10:50:04 AM
to Olga Dudchenko, 3D Genomics, Sanjit Singh Batra

Hi Olga,

Thank you so much for your reply. Upgrading sort fixed the previous issue! The pipeline is now running, but I am stuck on something else; I would appreciate your suggestions:

In my run-pipeline.sh I have defined MAX_STEP=10, but it keeps writing

h.scores.step.*.txt, h.scaffolds.step.*.txt, h.scaffolds.original.notation.step.*.txt

files; currently it is on the files for step 29. I am confused, since in my params I set it to 10. Or is MAX_STEP=10 unrelated to how many *.step.*.txt files it writes? The pipeline has now been running for 5 days, and since I gave it a run-time limit of 5 days, it will soon be killed.

Do you suggest I give it a longer time limit? Or is there something I am doing wrong that prevents it from exiting the loop?


Please let me know.

Thanks,
Ankita






Olga Dudchenko

Jul 27, 2017, 11:11:03 AM
to 3D Genomics, odudc...@icloud.com, sanjit...@gmail.com
Hi Ankita,


This should probably be renamed for ease of reference, but the "step" files you are seeing are not steps of iterative error correction and scaffolding; they are temp files from iterations within the scaffolder (the supplement to the 2017 paper includes pseudocode that gives a top-level view of what happens in each of those iterations). Once the scaffolding is done, these temporary files will be deleted. Given how long things are running for you, I am guessing that you are not using GNU Parallel. We highly recommend GNU Parallel >= 20150322 to increase performance.
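A quick check for whether GNU Parallel is on the PATH at all (the version threshold is the recommended one):

```shell
# Report GNU Parallel availability; 3D-DNA recommends >= 20150322.
if command -v parallel >/dev/null 2>&1; then
    parallel --version | head -n 1
else
    echo "GNU Parallel not found; install version >= 20150322"
fi
```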


Best,
Olga

Pavla Navratilova

May 16, 2021, 1:43:22 PM
to 3D Genomics
Dear Neva,
I am getting the same problem with the unresolved links to the fastq files in the splits directory.
However, I am running the job on a grid whose machines are controlled by the Torque batch scheduler, so I cannot really stop and restart the job in the middle. I tried copying the fastq files directly into splits and deleting the ln -s command, but that was not correct and led to other issues.
How can I work around this, and what is causing it? (I did not hit the problem when running interactively.)

Thank you,

Pavla



Neva Durand

May 16, 2021, 2:36:34 PM
to Pavla Navratilova, 3D Genomics
Just cancel your jobs. If the soft links aren't working, copy the fastqs to the splits directory instead. Then run Juicer.
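A sketch of the copy-instead-of-link approach, shown with a throwaway directory and an illustrative filename in place of the real Juicer run directory:

```shell
# Copy fastqs into splits/ instead of soft-linking them.
# A throwaway directory stands in for the real juicer-run tree.
run=$(mktemp -d)
mkdir -p "$run/fastq" "$run/splits"

# Empty stand-in fastq.gz so the glob matches something:
: | gzip > "$run/fastq/lib_S1_L001_R1_001.fastq.gz"

cp "$run"/fastq/*.fastq.gz "$run/splits/"
ls "$run/splits"
```

Copying doubles the disk usage relative to links, but sidesteps any symlink-resolution problems on the grid nodes.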
--
Neva Cherniavsky Durand, Ph.D. | she, her, hers
Assistant Professor |  Molecular and Human Genetics
Aiden Lab | Baylor College of Medicine