Query regarding making a chain file

tanushree tiwari

Sep 18, 2018, 5:25:50 PM
to gen...@soe.ucsc.edu
Hello,

I have genomes from the same species, 235 MB in size, that I want to map onto each other using the Picard liftover program. There is no chain file for them at UCSC, so, following the instructions at http://genomewiki.ucsc.edu/index.php/DoSameSpeciesLiftOver.pl, I have made the .ooc file, the .chrom.sizes files, and the .2bit files for the new and old assemblies.
But now, when I try to run doSameSpeciesLiftOver.pl, it gives no output and no error; it just shows me the parameters available for this script.

I looked for help and support but could not find any. Any kind of help or support is appreciated.

Thank You,
With Regards,
Tanushree

Hiram Clawson

Sep 18, 2018, 5:35:05 PM
to tanushree tiwari, gen...@soe.ucsc.edu
Good Afternoon Tanushree:

Please note the example operation of the script with parameters:

http://genomewiki.ucsc.edu/index.php/DoSameSpeciesLiftOver.pl#doSameSpeciesLiftOver.pl

You can run the script with the '-debug' option along with the other parameters.
With '-debug' the script will write the necessary shell scripts to perform each
step, but not perform the steps. You can then use those small shell scripts
to perform each step of the process.

This script assumes you have an instance of the 'parasol' cluster management
system in operation:
http://genomewiki.ucsc.edu/index.php/Parasol_job_control_system

You don't need to have a cluster computer, but you do need to have 'parasol' running.

--Hiram

tanushree tiwari

Sep 19, 2018, 11:11:27 AM
to hi...@soe.ucsc.edu, gen...@soe.ucsc.edu
Hello,
Thank you so much for your quick response and time. 
I was missing the 'parasol' cluster management system. I tried setting it up using the link you provided in your earlier email, but I am getting these errors:
/data2/side_analysis/liftover/parasol/nodeInfo/nodeReport.sh: line 4: /usr/sbin/ip: No such file or directory
/data2/side_analysis/liftover/parasol/nodeInfo/nodeReport.sh: line 13: /data/parasol/nodeInfo/.ms: Not a directory
Kindly suggest where and what I am doing incorrectly. Sorry if you find these questions irrelevant, but I have never used anything like parasol before.

With Regards,
Tanushree

tanushree tiwari

Sep 19, 2018, 4:32:23 PM
to Hiram Clawson, gen...@soe.ucsc.edu
Hello,

Please ignore my earlier email; I have resolved those errors.
I have been trying hard to install parasol on my server, but without success, and I need the chain file to proceed with my research.
I am attaching the outputs and other files. Kindly have a look at them and suggest where I am going wrong and how I can proceed. I am not able to get the name of the parasol hub.

Thank you in advance for your help and support.

With Regards,
Tanushree

info.txt

Galt Barber

Sep 19, 2018, 7:35:52 PM
to tanushree tiwari, Hiram Clawson, gen...@soe.ucsc.edu

Are you planning to run all the hub and nodes on a single machine or not?

The .ms file seems to indicate that you have just one node machine defined with 46 CPUs.
Are you also going to run paraHub on that machine?

The scripts are set up so that you can have multiple nodes
and a separate machine for running the hub and para commands.

SSH is used to log into each node machine and start the paraNode daemon.

The initParasol script uses the following command to discover the IP address of the machine
where you are running the hub and the initParasol script.

What does it output on that machine for you?
/usr/sbin/ip addr show

The scripts also start the paraHub daemon for accepting requests
and passing jobs out to various nodes.

If you have previously tried to run the commands, they may have left paraHub
or paraNode daemons running. That could block ports, so check if they are there,
and then stop or kill them before running something else.
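Galt's suggestion to look for leftover daemons before retrying can be scripted. The helper below is a hypothetical sketch, not part of the parasol distribution; it assumes a `ps` that accepts `-e -o pid= -o comm=` (Linux procps and the BSDs do) and simply lists processes named paraHub or paraNode.

```sh
#!/bin/sh
# Sketch: list leftover paraHub/paraNode daemons before starting parasol again.

# Reads "PID COMMAND" lines on stdin; strips any directory part from COMMAND
# (some ps implementations print the full path) and keeps only the daemons.
filter_para() {
    awk '{ name = $2; sub(/.*\//, "", name)
           if (name == "paraHub" || name == "paraNode") print $1, name }'
}

# All processes as "PID COMMAND"; the trailing "=" suppresses the header line.
list_para_daemons() {
    ps -e -o pid= -o comm= | filter_para
}

if [ "${1:-}" = "--list" ]; then
    list_para_daemons
fi
```

Run it as `sh listParaDaemons.sh --list` (a made-up file name); any PID it prints can then be stopped with `kill PID` before starting parasol again.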

-Galt


--

---
You received this message because you are subscribed to the Google Groups "UCSC Genome Browser Public Support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to genome+un...@soe.ucsc.edu.
To post to this group, send email to gen...@soe.ucsc.edu.
Visit this group at https://groups.google.com/a/soe.ucsc.edu/group/genome/.
To view this discussion on the web visit https://groups.google.com/a/soe.ucsc.edu/d/msgid/genome/CAM5N%2B5RuCYwZX1RZLS7nZEeqhgyrLXgR2ei4PrOD%3DXFEN5aD4Q%40mail.gmail.com.

Hiram Clawson

Sep 19, 2018, 9:18:53 PM
to tanushree tiwari, gen...@soe.ucsc.edu
Good Afternoon Tanushree:

Please examine the code in nodeReport.sh script.
It is merely trying to collect together some numbers
to place into nodeInfo/<ip address>.ms

You can do this collection yourself:
1. the IP address of the computer you want parasol to use
2. the number of CPUs to use in parasol
3. the amount of memory available on the machine
4. the amount of space for temporary storage in /dev/shm/

For the IP address, you can use the 'ifconfig' command to find
the IP address of your machine:
/usr/sbin/ifconfig -a | grep inet
will show several addresses; use the one that belongs to the
subnet for your computer.

This command will show you the number of CPUs/cores on your computer:
grep processor /proc/cpuinfo | wc -l
If there are more than 2, reduce that number by 2 to reserve 2 CPUs
for the operating system and the parasol processes.

The amount of memory on your computer can be found with:
grep -w MemTotal /proc/meminfo
That most likely reports kilobytes; to convert to megabytes (rounded up to a whole gigabyte):
grep -w MemTotal /proc/meminfo | awk '{print 1024*(1+int($2/(1024*1024)))}'

The space to use for temporary storage in /dev/shm can be calculated with:
df -k /dev/shm | grep dev/shm | awk '{printf "%d\n", 512*(1+int($2/(1024*1024)))}'

This allocates half of the space of /dev/shm for this use.

Then, these numbers are placed into the <ip address>.ms file
where the columns have these meanings:

# can see this info from the paraHub usage message
# name - Network name
# cpus - Number of CPUs we can use
# ramSize - Megabytes of memory on the machine
# tempDir - Location of (local) temp dir
# localDir - Location of local data dir
# localSize - Megabytes of local disk
# switchName - Name of switch this is on (not used, can be anything)
# note: the default ram available to a single job will be:
# ramSize / cpus
# in this case: 20480/20 == 1024 MB == 1 GB
# the 32768 for localSize == 32 GB in /dev/shm
123.123.123.123 20 20480 /dev/shm /dev/shm 32768 bsw
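The four measurements above can be combined into the .ms line with a small script. This is a sketch under assumptions: the file name makeMs.sh and the helper names are hypothetical, the IP address is passed in by hand (found with ifconfig as described above), and the arithmetic copies the awk expressions given above rather than anything read from the parasol sources.

```sh
#!/bin/sh
# Sketch: write nodeInfo/<ip address>.ms for a single parasol node.

# MemTotal in kB -> MB via 1024 * (1 + int(kB / 1048576)), as in the awk command above.
mem_mb() { awk -v kb="$1" 'BEGIN { print 1024 * (1 + int(kb / (1024 * 1024))) }'; }

# /dev/shm size in 1K-blocks -> about half of it in MB via
# 512 * (1 + int(kB / 1048576)), as in the df command above.
shm_half_mb() { awk -v kb="$1" 'BEGIN { printf "%d\n", 512 * (1 + int(kb / (1024 * 1024))) }'; }

# Reserve 2 CPUs for the operating system and parasol when more than 2 exist.
usable_cpus() { if [ "$1" -gt 2 ]; then echo $(( $1 - 2 )); else echo "$1"; fi; }

# Columns: name cpus ramSize tempDir localDir localSize switchName
ms_line() { echo "$1 $2 $3 /dev/shm /dev/shm $4 bsw"; }

if [ $# -ge 1 ]; then
    ip=$1
    cpus=$(usable_cpus "$(grep -c '^processor' /proc/cpuinfo)")
    ram=$(mem_mb "$(awk '$1 == "MemTotal:" { print $2 }' /proc/meminfo)")
    shm=$(shm_half_mb "$(df -Pk /dev/shm | awk 'NR == 2 { print $2 }')")
    mkdir -p nodeInfo
    ms_line "$ip" "$cpus" "$ram" "$shm" > "nodeInfo/$ip.ms"
    cat "nodeInfo/$ip.ms"
fi
```

Run from the parasol directory, `sh makeMs.sh 130.63.76.26` would write nodeInfo/130.63.76.26.ms, the location the initParasol script looks in. With the numbers that appear later in this thread (129024 MB and 46 CPUs), the ramSize / cpus rule would give each job about 2805 MB by default.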

The parasol programs do not need to be installed in '/data/parasol/';
that is merely an example. You can place them anywhere, and they do not need
to be owned by the 'root' user. Add the bin/ directory containing all the kent
commands to your shell PATH setting.

Adjust the path names in the 'initParasol' script for your installed location;
the script expects to find the <ip address>.ms file in a subdirectory: ./nodeInfo/*.ms

tanushree tiwari

Sep 20, 2018, 12:51:18 PM
to Hiram Clawson, gen...@soe.ucsc.edu
Hello,

Thank you for your detailed explanation, but I have already tried this and I still get the same errors. The errors are attached.
Is there any other way to start the parasol server, or to run the script to make the chain file?

info1.txt

Hiram Clawson

Sep 20, 2018, 12:58:14 PM
to tanushree tiwari, gen...@soe.ucsc.edu
Good Morning Tanushree:

The information you supplied seems to indicate you are
running different commands as different users?
You can run everything as your user identity, you do
not need to run anything as the 'root' user.

What does your <ip address>.ms file look like after you
constructed it? It appears to have the string 'date'
in the first column where the IP address should be.

You can run the chain script with the '-debug' option
which will cause it to do nothing, but it will write
out the necessary shell scripts to perform each step.
You could then work through the commands in each shell
script. You would need to figure out how to run the thousands
of commands that would be generated for the parasol run
if you cannot get parasol to run.
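If parasol cannot be made to work, the commands from a parasol job list can still be run on one machine with `xargs`. The sketch below assumes a job list with one shell command per line, and that any parasol file-check markup has the `{check in|out ... file}` form, which it reduces to the bare file name; the file name runJobList.sh and the helper names are hypothetical. `xargs -0` and `-P` are available in GNU and BSD xargs but are not strict POSIX.

```sh
#!/bin/sh
# Sketch: run a parasol-style job list on one machine without parasol.

# Replace markup such as "{check out line+ some/file}" with "some/file".
strip_checks() {
    sed -e 's/{check [a-z]* [a-z+]* \([^}]*\)}/\1/g' "$@"
}

# Run each line of job list $1 as its own "sh -c" command, $2 at a time.
run_joblist() {
    strip_checks "$1" | tr '\n' '\000' | xargs -0 -n 1 -P "${2:-2}" sh -c
}

if [ $# -ge 1 ]; then
    run_joblist "$1" "${2:-2}"
fi
```

For example, `sh runJobList.sh jobList 44` would run up to 44 commands at once. Unlike parasol, this does no output checking or job bookkeeping; xargs only returns a non-zero exit status if some command failed, so each step's outputs still need to be checked.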

--Hiram

tanushree tiwari

Sep 20, 2018, 12:59:40 PM
to ga...@soe.ucsc.edu, Hiram Clawson, gen...@soe.ucsc.edu
Hello,

Yes, I plan to run all the hub and nodes on a single machine.
The 'ip addr show' command gives me the following addresses:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 14:18:77:3b:97:0b brd ff:ff:ff:ff:ff:ff
    inet 130.63.76.26/24 brd 130.63.76.255 scope global em1
       valid_lft forever preferred_lft forever
    inet6 fe80::1618:77ff:fe3b:970b/64 scope link
       valid_lft forever preferred_lft forever
3: em2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 14:18:77:3b:97:0c brd ff:ff:ff:ff:ff:ff
4: em3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 14:18:77:3b:97:0d brd ff:ff:ff:ff:ff:ff
5: em4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 14:18:77:3b:97:0e brd ff:ff:ff:ff:ff:ff
6: openstack0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 96:b2:42:47:e9:7c brd ff:ff:ff:ff:ff:ff
    inet 10.99.0.1/24 scope global openstack0
       valid_lft forever preferred_lft forever
    inet6 fe80::94b2:42ff:fe47:e97c/64 scope link
       valid_lft forever preferred_lft forever
I have been carefully monitoring and killing the leftover processes.
I have renamed my .ms file to 130.63.76.26.ms and put the IP address in it too, but I still get the same errors.

Any help or suggestions would be appreciated.

Thank You 
With Regards,
Tanushree

Galt Barber

Sep 20, 2018, 4:55:57 PM
to Hiram Clawson, tanushree tiwari, gen...@soe.ucsc.edu
The initParasol script is specifically looking for a line
in the "ip addr show" output that says
"inet" followed by anything, ending with "eth0".

 myIp=`/usr/sbin/ip addr show | egrep "inet.*eth0" | awk '{print $2}' | sed -e 's#/.*##;'`

Yours has this line instead:

inet 130.63.76.26/24 brd 130.63.76.255 scope global em1

Try changing the initParasol script to search for em1 instead of eth0.
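The same extraction can be written so that the interface name is a parameter instead of being hard-coded. This is a hypothetical helper (the function name ipv4_for_iface and the file name are made up) that reads `ip addr show` text and prints the IPv4 address of the named interface, mirroring the egrep/awk/sed chain above.

```sh
#!/bin/sh
# Sketch: print the IPv4 address of one named interface from "ip addr show" text.

ipv4_for_iface() {
    # Keep "inet" (not "inet6") lines whose last field is exactly the interface
    # name, take the address/prefix field, drop the "/24" part, stop at the first.
    awk -v ifc="$1" '$1 == "inet" && $NF == ifc { sub(/\/.*/, "", $2); print $2; exit }'
}

if [ $# -ge 1 ]; then
    ip addr show | ipv4_for_iface "$1"
fi
```

On the machine in this thread, `sh ipv4ForIface.sh em1` would be expected to print 130.63.76.26. Matching the last field exactly, rather than "inet.*em1", also avoids picking up a differently named interface that merely contains the same text.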

-Galt


tanushree tiwari

Sep 21, 2018, 11:41:40 AM
to Galt Barber, Hiram Clawson, gen...@soe.ucsc.edu
Hello,

I made the suggested change, but it did not help.

With Regards,
Tanushree

tanushree tiwari

Sep 21, 2018, 11:46:05 AM
to Galt Barber, Hiram Clawson, gen...@soe.ucsc.edu
Hello,

This is the message I am getting in my .log file when I initialize and start parasol:
2018/09/21 11:30:28: info: starting paraHub on koschei
2018/09/21 11:30:28: error: mustOpen: Can't open parasol.jid to write: Permission denied
2018/09/21 11:30:28: error: paraHub aborted
2018/09/21 11:30:52: info: starting paraHub on koschei
2018/09/21 11:30:52: error: mustOpen: Can't open parasol.jid to write: Permission denied
2018/09/21 11:30:52: error: paraHub aborted

Where do you think there is a permission problem?

With Regards,
Tanushree

Hiram Clawson

Sep 21, 2018, 12:24:35 PM
to tanushree tiwari, Galt Barber, gen...@soe.ucsc.edu
Does './initParasol initialize' operate properly?

Are you in the parasol directory containing the initParasol script
when you run './initParasol start'?

Before you go there: what directory are you in when you
receive this permission error?
$ pwd -P

And what are the read/write permissions of the directory
you are in:

$ ls -ld .

Galt Barber

Sep 21, 2018, 8:16:07 PM
to Hiram Clawson, tanushree tiwari, gen...@soe.ucsc.edu

The parasol.jid file is made by parasol so that it can try to preserve the auto-incrementing job id sequence
between runs of paraHub. It is saved in the current directory at the time that paraHub is launched.
It is periodically updated as paraHub runs and processes new jobs.

char *jobIdFileName = "parasol.jid";    /* File name where jobId file is. */

So, you will need to change to a directory in which you have write access
before launching parasol.
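A quick pre-flight check along these lines can confirm that the directory paraHub will start in is writable, and print the same `pwd -P` and `ls -ld` diagnostics Hiram asked for when it is not. The function name jid_dir_ok and the file name are hypothetical. Note that `[ -w ]` reports on the user running the check, so it should be run as the same user who will start parasol.

```sh
#!/bin/sh
# Sketch: check that a directory can hold parasol.jid before paraHub starts there.

jid_dir_ok() {
    dir=${1:-.}
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        return 0
    fi
    # Diagnostics: the real path (if it exists) and its permissions.
    echo "cannot create parasol.jid in: $dir" >&2
    (cd "$dir" 2>/dev/null && pwd -P) >&2
    ls -ld "$dir" >&2
    return 1
}

if [ $# -ge 1 ]; then
    jid_dir_ok "$1" && echo "ok: parasol.jid can be written in $(cd "$1" && pwd -P)"
fi
```

For example, change to the parasol directory and run `sh checkJidDir.sh .` before `./initParasol start`.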

-Galt

tanushree tiwari

Sep 24, 2018, 2:05:03 PM
to Galt Barber, Hiram Clawson, gen...@soe.ucsc.edu
Hello,

I am a little confused now about which changes or steps I should make and run.

cat nodeInfo/*.ms
130.63.76.26    46      129024  /dev/shm        /dev/shm        32256   bsw

After I made the suggested change:
export parasolHub=`ip addr show | egrep "inet.*eth0|inet.*em1" | head -1 | awk '{print $2}' | sed -e 's#/.*##;'`

/data2/side_analysis/liftover/parasol/initParasol initialize
Permission denied (publickey,password).
# initialized ssh to node: 130.63.76.26

Starting parasol:/home/tanu/.ssh 130.63.76.26 /data2/side_analysis/liftover/bin/paraNode start -cpu=46 log=/data2/side_analysis/liftover/parasol/130.63.76.26.2018-09-24T13:52.log hub=130.63.76.26 umask=002 sysPath=. userPath=bin
sh: 1: /home/tanu/.ssh: Permission denied
Done.

I have write permission in the parasol folder. It creates parasol.jid in the parasol folder, but the file contains only a few characters:

^@^@^@^@

Any help and suggestions would be appreciated.


With Regards,
Tanushree

Hiram Clawson

Sep 24, 2018, 2:35:13 PM
to tanushree tiwari, Galt Barber, gen...@soe.ucsc.edu
Good Morning Tanushree:

Are your ssh keys functioning? It appears they are not:
> Permission denied (publickey,password).

Can you run an ssh command such as the following without any
questions being asked at any time:

$ ssh localhost date

That command should print out the date. No questions asked.
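The "no questions asked" requirement can be tested directly with ssh's BatchMode option, which makes ssh fail rather than prompt. The sketch below also includes one common way to set up key-based login to your own account on the same machine; it assumes OpenSSH and the default key file ~/.ssh/id_rsa, and the script and function names are hypothetical.

```sh
#!/bin/sh
# Sketch: confirm ssh to a host works with no prompts at all.

# BatchMode=yes makes ssh fail instead of asking for a password, passphrase,
# or host-key confirmation, so success means "no questions asked".
ssh_ok() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null
}

# One common way to allow key-based login to your own account on this machine.
setup_local_key() {
    mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
    [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    # Append the public key only if it is not already authorized.
    grep -qxF "$(cat "$HOME/.ssh/id_rsa.pub")" "$HOME/.ssh/authorized_keys" 2>/dev/null ||
        cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
    chmod 600 "$HOME/.ssh/authorized_keys"
}

case "${1:-}" in
    --setup) setup_local_key ;;
    "") ;;
    *) if ssh_ok "$1"; then echo "ok: $1"; else echo "ssh to $1 still prompts or fails" >&2; exit 1; fi ;;
esac
```

BatchMode also refuses to accept an unknown host key, so after `sh sshCheck.sh --setup`, connect once interactively to each name parasol will use (here `localhost` and 130.63.76.26) and answer yes, then re-run `sh sshCheck.sh localhost` and `sh sshCheck.sh 130.63.76.26`.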

Brian Lee

Oct 2, 2018, 3:08:05 PM
to Hiram Clawson, tiwari.t...@gmail.com, Galt Barber, UCSC Genome Browser Mailing List
Dear Tanushree,

I hope things are going well. I wanted to check in on your situation. Our older wiki page (http://genomewiki.ucsc.edu/index.php/LiftOver_Howto), which lists the manual steps, may be a better approach for you to pursue, as it seems we could not resolve what is going on with your system.

This archived mailing list answer helps outline the wiki steps:

The outline of the process on that page is:

1. Generate PSL alignments (e.g., with BLAT or lastz).
2. Turn those alignments into chains with axtChain.
3. Merge the short chains using chainMergeSort, chainSplit, and chainSort.
4. You may wish to filter your chains at this point with chainPreNet, to remove chains that don't have a chance of being part of the final file.
5. Create a net from the chains using the chainNet program, pass that to netSyntenic to add synteny information, use netChainSubset to create a liftOver file, and finally (optionally) join chain fragments with chainStitchId (this is skipped on the wiki page).
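Steps 2 to 5 can be laid out as a command list, in the spirit of the '-debug' option mentioned earlier: print the commands, review them, then run them. This is a sketch, not the wiki page's exact recipe. It assumes the step 1 PSL files already exist with the old assembly as target and the new assembly as query, uses `-linearGap=medium` (the closer of axtChain's two built-in presets) as an assumption, keeps the whole genome in one chain file instead of splitting it per target sequence with chainSplit, and uses hypothetical file names throughout.

```sh
#!/bin/sh
# Sketch: print the commands for steps 2-5, starting from existing PSL files.
# Arguments: target.2bit query.2bit target.sizes query.sizes pslDir outPrefix

emit_steps() {
    tBit=$1 qBit=$2 tSizes=$3 qSizes=$4 pslDir=$5 out=$6
    for psl in "$pslDir"/*.psl; do
        [ -e "$psl" ] || continue          # glob matched nothing: no step 2 lines
        base=${psl%.psl}
        # Step 2: chain one PSL file, then sort that chain file by score.
        echo "axtChain -linearGap=medium -psl $psl $tBit $qBit $base.chain"
        echo "chainSort $base.chain $base.sorted.chain"
    done
    # Step 3: merge the sorted chain files (chainMergeSort writes to stdout).
    echo "chainMergeSort $pslDir/*.sorted.chain > $out.all.chain"
    # Step 4 (optional): drop chains that cannot become part of the net.
    echo "chainPreNet $out.all.chain $tSizes $qSizes $out.preNet.chain"
    # Step 5: net (query-side net discarded), add synteny, extract the
    # liftOver chains, then join chain fragments.
    echo "chainNet $out.preNet.chain $tSizes $qSizes $out.net /dev/null"
    echo "netSyntenic $out.net $out.syn.net"
    echo "netChainSubset $out.syn.net $out.preNet.chain $out.subset.chain"
    echo "chainStitchId $out.subset.chain $out.over.chain"
}

if [ $# -eq 6 ]; then
    emit_steps "$@"
fi
```

For example, `sh emitSteps.sh old.2bit new.2bit old.chrom.sizes new.chrom.sizes psl old.new > steps.sh`, then `sh -e steps.sh`; the last command would write old.new.over.chain.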

Thank you again for your inquiry and using the UCSC Genome Browser. If you have any further public questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

All the best,

Brian Lee
UC Santa Cruz Genomics Institute
See us FREE @ ASHG on Wed Oct 17th 7:15 am: http://bit.ly/ucscAshg2018

tanushree tiwari

Oct 2, 2018, 4:29:07 PM
to bria...@soe.ucsc.edu, Hiram Clawson, Galt Barber, gen...@soe.ucsc.edu
Hello,
Thank you for the information. I am curious to know up to what size a genome is considered a small genome. My genome is 235 MB; is that doable with the steps on the wiki page you linked?
I visited this page, and from there I went to same_species_liftover.


With Regards,
Tanushree

Brian Lee

Oct 2, 2018, 5:55:22 PM
to tiwari.t...@gmail.com, Hiram Clawson, Galt Barber, UCSC Genome Browser Mailing List
Hi Tanushree,

The 235 MB size should be fine, provided the sequences are repeat-masked first (i.e. lower-cased sequence for repetitive elements and simple tandem repeats).
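Whether the assemblies are soft-masked can be checked roughly by measuring the fraction of lower-case bases. The sketch below is a hypothetical helper that reads FASTA and prints that fraction; the thread does not give a threshold, so it only reports the number.

```sh
#!/bin/sh
# Sketch: print the fraction of sequence letters that are lower case (soft-masked).

masked_fraction() {
    awk '!/^>/ {
             line = $0
             total += length(line)
             lower += gsub(/[a-z]/, "", line)   # gsub returns the number removed
         }
         END { if (total) printf "%.3f\n", lower / total; else print "0.000" }' "$@"
}

if [ $# -ge 1 ]; then
    masked_fraction "$@"
fi
```

For a .2bit file, something like `twoBitToFa old.2bit stdout | sh maskedFraction.sh -` would do; twoBitToFa keeps lower-case masking unless given -noMask.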


All the best,

Brian Lee
UC Santa Cruz Genomics Institute