Bug when running the UCSC doBlastzChainNet.pl pipeline

刘子玮

Jan 3, 2022, 4:57:07 PM
to genome
First of all, thanks for making all these scripts for generating chain files. I tried to use them to generate a chain file from a self-made de novo genome assembly.

I have run into some issues while following this tutorial: http://genomewiki.ucsc.edu/index.php/DoBlastzChainNet.pl#The_new_streamline_pairLastz_script
The problem occurs at the step of running doBlastzChainNet.pl. (I used localhost instead of other host names.)

I had previously started a Parasol job system successfully:

Then I ran this command line:

It ran for a while and produced a do.log:

nohup: ignoring input

/GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/genomes/DEF looks OK!

        tDb=ptri

        qDb=pcla

        s1d=/GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/genomes/ptri.2bit

        isSelf=

Building in /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/genomes

profile db not found in sqlProfileToMyCnf() -- failed for file /GPUFS/sysu_mhwang_1/.hgsql.cnf-m2VqGT failed with errno 2

profile db not found in sqlProfileToMyCnf() -- failed for file /GPUFS/sysu_mhwang_1/.hgsql.cnf-C7Hy3D failed with errno 2

HgStepManager: executing from step 'partition' through step 'syntenicNet'.

HgStepManager: executing step 'partition' Wed Dec 29 09:02:38 2021.

sort: write failed: 'standard output': Broken pipe

sort: write error

# chmod a+x /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/genomes/run.blastz/doPartition.bash

# ssh -x -o 'StrictHostKeyChecking = no' -o 'BatchMode = yes' localhost nice /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/genomes/run.blastz/doPartition.bash

+ cd /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/genomes/run.blastz

+ /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/scripts/partitionSequence.pl 32100000 10000 /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/genomes/ptri.2bit /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/genomes/ptri.chrom.sizes -xdir xdir.sh -rawDir ../psl 2000 -lstDir tParts

++ wc -l

+ export L1=45

+ L1=45

+ /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/scripts/partitionSequence.pl 10000000 0 /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/genomes/pcla.2bit /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/genomes/pcla.chrom.sizes 2000 -lstDir qParts

++ wc -l

+ export L2=323

+ L2=323

++ echo 45 323

++ awk '{print $1*$2}'

+ export L=14535

+ L=14535

+ echo 'cluster batch jobList size: 14535 = 45 * 323'

cluster batch jobList size: 14535 = 45 * 323

+ '[' -d tParts ']'

constructing tParts/*.2bit files

+ echo 'constructing tParts/*.2bit files'

+ sed -e 's#tParts/##; s#.lst##;'

+ read tPart

+ ls tParts/part000.lst tParts/part001.lst tParts/part002.lst tParts/part003.lst tParts/part004.lst tParts/part005.lst tParts/part006.lst tParts/part007.lst tParts/part008.lst tParts/part009.lst tParts/part010.lst tParts/part011.lst tParts/part012.lst tParts/part013.lst tParts/part014.lst tParts/part015.lst tParts/part016.lst tParts/part017.lst tParts/part018.lst tParts/part019.lst tParts/part020.lst tParts/part021.lst tParts/part022.lst tParts/part023.lst tParts/part024.lst tParts/part025.lst tParts/part026.lst tParts/part027.lst tParts/part028.lst tParts/part029.lst tParts/part030.lst

+ sed -e 's#.*.2bit:##;' tParts/part000.lst

+ /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/bin/twoBitToFa -seqList=stdin /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/genomes/ptri.2bit stdout

+ /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/bin/faToTwoBit stdin tParts/part000.2bit

...

...

...

+ /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/bin/faToTwoBit stdin qParts/part089.2bit

+ read qPart

+ sed -e 's#.*.2bit:##;' qParts/part090.lst

+ /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/bin/twoBitToFa -seqList=stdin /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/genomes/pcla.2bit stdout

+ /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/bin/faToTwoBit stdin qParts/part090.2bit

+ read qPart

# ssh -x -o 'StrictHostKeyChecking = no' -o 'BatchMode = yes' localhost '(cd /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/genomes/run.blastz;  csh -ef xdir.sh)'

HgStepManager: executing step 'blastz' Wed Dec 29 09:03:04 2021.

# chmod a+x /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/genomes/run.blastz/doClusterRun.csh

# ssh -x -o 'StrictHostKeyChecking = no' -o 'BatchMode = yes' localhost nice /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/genomes/run.blastz/doClusterRun.csh

cd /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/genomes/run.blastz

gensub2 ptri.lst pcla.lst gsub jobList

para make jobList

Checking input files

14535 jobs written to /GPUFS/sysu_mhwang_1/sysu_mhwang_1/zwei_liuziwei/02_parasol/data/genomes/run.blastz/batch

14535 jobs in batch

0 jobs (including everybody's) in Parasol queue or running.

Checking finished jobs

updated job database on disk

Pushed Jobs: 14535

total sick machines: 1 failures: 5

================

Checking job status 0 minutes after launch

14535 jobs in batch

0 jobs (including everybody's) in Parasol queue or running.

Sick Batch: consecutive crashes (36) >= sick batch threshold (25)

Checking finished jobs

updated job database on disk

total sick machines: 1 failures: 36

Sick batch! will sleep 10 minutes, clear sick nodes and retry

Told hub to clear sick nodes

================

Checking job status 10 minutes after launch

14535 jobs in batch

0 jobs (including everybody's) in Parasol queue or running.

Checking finished jobs

updated job database on disk

Pushed Jobs: 14535

Retried jobs: 14535

================

Checking job status 11 minutes after launch

14535 jobs in batch

0 jobs (including everybody's) in Parasol queue or running.

Sick Batch: consecutive crashes (36) >= sick batch threshold (25)

Checking finished jobs

updated job database on disk

total sick machines: 1 failures: 36

Sick batch! will sleep 10 minutes, clear sick nodes and retry
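
The log above mixes several different messages (the sqlProfileToMyCnf() warnings, the sort write errors, and the sick-batch reports). As a rough aid for triage, the sketch below counts each distinct warning-like line in a log such as do.log. The function name `summarize_log` and the choice of patterns are assumptions made for illustration; nothing in this thread establishes which of these messages is the actual cause.

```shell
#!/usr/bin/env bash
# summarize_log LOGFILE  (hypothetical helper, not part of the UCSC tools)
# Prints each distinct warning-like line once, prefixed by how often it occurs.
summarize_log() {
  local log=$1
  # The pattern list is an assumption: it covers the messages seen in this thread.
  grep -E 'Sick [Bb]atch|timed out|failed with errno|write (failed|error)' "$log" \
    | sed -E 's/\.hgsql\.cnf-[A-Za-z0-9]+/.hgsql.cnf-XXXXXX/' \
    | sort | uniq -c | sort -rn
  # The sed step masks the random temp-file suffix so repeated warnings group together.
}
```

Running `summarize_log do.log` in the build directory would then give a short list of message types to look into one at a time.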


Then I checked my Parasol log files, which are as follows:

24808E92@5DF27E36.CC6DCD61.png
22A64A49@8CE2E21A.CC6DCD61.png
25F9C007@72113B54.CC6DCD61

jingwei yuan

Oct 13, 2023, 1:21:48 PM
to UCSC Genome Browser Public Support, 刘子玮
Hi, I got the same issue, 'profile db not found in sqlProfileToMyCnf()', when making the over.chain file. Did you manage to debug it?

Luis Nassar

Oct 31, 2023, 8:05:02 PM
to jingwei yuan, UCSC Genome Browser Public Support, 刘子玮
Hello,

Are you able to isolate the single line from your jobList that brings about this error?

As it stands, the issue is difficult to disentangle because the failure occurs across a whole batch of jobs. If you can run the jobs individually, we may be able to isolate and fix the problem.
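
One way to pull out and run a single job is sketched below. It is a minimal illustration, not part of the UCSC tooling: the helper names `strip_para_checks` and `run_nth_job` are hypothetical, and it assumes the jobList lines carry Parasol-style `{check out exists FILE}` clauses that must be reduced to the bare file name before the line can be run in a plain shell.

```shell
#!/usr/bin/env bash
# strip_para_checks: read job lines on stdin and replace each
# "{check in|out <test> FILE}" clause with just FILE.
# Assumption: this is the only Parasol-specific syntax in the job line.
strip_para_checks() {
  sed -E 's/\{check[[:space:]]+(in|out)[[:space:]]+[^[:space:]]+[[:space:]]+([^}[:space:]]+)[[:space:]]*\}/\2/g'
}

# run_nth_job JOBLIST [N]: run line N (default 1) of JOBLIST in the foreground
# and return the job's exit status, so its error output can be read directly.
run_nth_job() {
  local job_list=$1 n=${2:-1} cmd
  cmd=$(sed -n "${n}p" "$job_list" | strip_para_checks)
  if [ -z "$cmd" ]; then
    echo "no job on line $n of $job_list" >&2
    return 2
  fi
  echo "running: $cmd" >&2
  bash -c "$cmd"
}
```

For example, from the run.blastz directory, `run_nth_job jobList 1; echo "exit status: $?"` would run the first job by hand and show whatever error it prints.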

I hope this is helpful. Please include gen...@soe.ucsc.edu in any replies to ensure visibility by the team. All messages sent to that address are archived on our public forum. If your question includes sensitive information, you may send it instead to genom...@soe.ucsc.edu.

Lou Nassar
UCSC Genomics Institute


Lin Ou

Sep 12, 2025, 2:05:08 PM
to UCSC Genome Browser Public Support, Luis Nassar, 刘子玮, jingwei yuan
Hello, I recently ran into a timeout problem similar to the one Ziwei described:
29052 jobs in batch

0 jobs (including everybody's) in Parasol queue or running.

Checking finished jobs

updated job database on disk

Pushed Jobs: 29052

================

Checking job status 0 minutes after launch

29052 jobs in batch

0 jobs (including everybody's) in Parasol queue or running.

Sick Batch: consecutive crashes (45) >= sick batch threshold (25)

Checking finished jobs

updated job database on disk

total sick machines: 1 failures: 45

Sick batch! will sleep 10 minutes, clear sick nodes and retry

rudpSend timed out

pmSendString timed out!

pmSendString: will sleep 60 seconds and retry

Told hub to clear sick nodes

================

Checking job status 11 minutes after launch

29052 jobs in batch

0 jobs (including everybody's) in Parasol queue or running.

Checking finished jobs

updated job database on disk

Pushed Jobs: 29052

Retried jobs: 29052

Could you please tell me how to solve this problem? Thank you very much!