Dear UCSC staff,
I'm a Ph.D. candidate at Tsinghua University, and I am currently trying to map genomic coordinates from the tree shrew genome (TS_3.0 genome annotation, http://www.treeshrewdb.org/download.html) to the human genome (hg38), and from the rabbit genome (genome assembly OryCun3.0, https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_013371645.1/) to the human genome (hg38).
First, I used your LiftOver program (thank you for making it available for so many species!); however, the available versions of the tree shrew and rabbit genomes are not the ones I am using. Hence, I am attempting to build the necessary chain files following the directions provided here:
http://genomewiki.ucsc.edu/index.php/DoBlastzChainNet.pl
However, I ran into a "timed out" problem. I installed the Parasol job control system following this procedure, and the installation succeeded:
http://genomewiki.ucsc.edu/index.php/Parasol_job_control_system
But when I went back to building the chain files, the problem persisted, as shown below:
29052 jobs in batch
0 jobs (including everybody's) in Parasol queue or running.
Checking finished jobs
updated job database on disk
Pushed Jobs: 29052
================
Checking job status 0 minutes after launch
29052 jobs in batch
0 jobs (including everybody's) in Parasol queue or running.
Sick Batch: consecutive crashes (45) >= sick batch threshold (25)
Checking finished jobs
updated job database on disk
total sick machines: 1 failures: 45
Sick batch! will sleep 10 minutes, clear sick nodes and retry
rudpSend timed out
pmSendString timed out!
pmSendString: will sleep 60 seconds and retry
Told hub to clear sick nodes
================
Checking job status 11 minutes after launch
29052 jobs in batch
0 jobs (including everybody's) in Parasol queue or running.
Checking finished jobs
updated job database on disk
Pushed Jobs: 29052
Retried jobs: 29052
Have you encountered this problem before, and do you know how to solve it? Is there perhaps an update, or other documentation for chain-file generation besides the link above? If possible, would you please provide us with chain files for these newer assembly versions?
I have been trying to make these chain files for several weeks, and I am sorry to ask because your team has already provided detailed guidance. Also, I was running these procedures in a single pod built from a cluster, and I don't know whether my configuration is suitable for this task. Any further suggestions would be very helpful.
Thanks very much again for providing this valuable resource for the community. Wishing you all the best!
Best regards,
Lin Ou
Hello, Lin.
That documentation is the best we have; it's a technical pipeline originally designed for our system, so it is not uncommon for issues to come up when others run it.
We can generate the chain files you are looking for. GCA_013371645.1 to hg38 is not a problem, but we do require that the genome have an NCBI accession, e.g., GCA or GCF. Is TS_3.0 not in NCBI GenBank?
We see version 2 (https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_033439345.1/), but that may not be suitable for you; let us know.
If you would like to try and debug the chain pipeline, one of our engineers shares the following:
This means that communication from the node process to the hub process is failing with a timeout, so it is not connecting. This could be because the paraHub daemon is not running. Or perhaps a firewall interferes, although that is much less common on internal machines that are not acting as web servers and so do not need as much security.
What shell is the user running commands in? Internally we use the bash shell; I suggest the user run it in their bash shell too.
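As a first check of the "daemon is not running" possibility, something like the following minimal Python sketch could be used. It only looks for processes named paraHub and paraNode in the local process list. It assumes a procps-style `ps` that accepts `-eo comm=`, and the helper names (`running_daemons`, `list_process_names`, `report`) are made up for illustration; they are not part of Parasol. It does not test firewall or network connectivity.

```python
import subprocess

# Parasol daemon names: paraHub (scheduler) and paraNode (worker).
DAEMONS = ("paraHub", "paraNode")


def running_daemons(ps_output, names=DAEMONS):
    """Return {name: bool}: whether each name appears in `ps -eo comm=` output."""
    # Keep only the executable base name in case ps prints a full path.
    seen = {
        line.strip().split("/")[-1]
        for line in ps_output.splitlines()
        if line.strip()
    }
    return {name: name in seen for name in names}


def list_process_names():
    """Run ps and return one command name per line (assumes procps-style ps)."""
    result = subprocess.run(
        ["ps", "-eo", "comm="], capture_output=True, text=True, check=True
    )
    return result.stdout


def report():
    """Print whether each Parasol daemon is visible on this machine."""
    for name, ok in running_daemons(list_process_names()).items():
        print(f"{name}: {'running' if ok else 'NOT running'}")
```

Calling `report()` inside the pod where the batch was launched prints one line per daemon. If paraHub shows as not running there, that would be consistent with the `pmSendString timed out!` messages in the log above.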