Re: umap Mappability for rn6

Mehran Karimzadeh

May 31, 2021, 10:42:04 AM

to von Stromberg, Konstantin, Ubismap

Hi Konstantin,

Attached are the Umap uint8 files for the rat genome (Rnor_6.0, rn6), covering 75-mer and 150-mer unique mappability.

I hope to make a version with more k-mers available in the future on bismap.hoffmanlab.org.

Best,

Mehran

On Wed, May 26, 2021 at 10:17 AM Mehran Karimzadeh <mehran.k...@gmail.com> wrote:

Hi Konstantin,

Just wanted to let you know that I did start the process. I ran into several issues because my PhD institute's server no longer has SGE, so I needed to make changes to use Slurm.
In addition, there are other issues with submitting a large number of jobs in array format, as Umap requires.
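
For anyone hitting the same SGE versus Slurm problem: a minimal sketch of how a per-task script could read its array index from either scheduler, or from a manually supplied ID when no scheduler is available. This is not Umap's own code. The function name `resolve_task_id` and the `manual_id` parameter are made up for illustration; `SGE_TASK_ID` and `SLURM_ARRAY_TASK_ID` are the environment variables the two schedulers set for array tasks.

```python
import os

# Environment variables that SGE and Slurm set for array-job tasks.
TASK_ID_VARS = ("SGE_TASK_ID", "SLURM_ARRAY_TASK_ID")


def resolve_task_id(environ=None, manual_id=None):
    """Return the array task index as an int.

    A manually supplied ID wins; otherwise the first scheduler variable
    that holds a plain integer is used. (Hypothetical helper, not part
    of Umap.)
    """
    if manual_id is not None:
        return int(manual_id)
    env = os.environ if environ is None else environ
    for var in TASK_ID_VARS:
        value = env.get(var, "")
        # SGE sets SGE_TASK_ID to "undefined" outside array jobs,
        # so only accept values made of digits.
        if value.isdigit():
            return int(value)
    raise RuntimeError("no array task ID found; pass manual_id explicitly")


if __name__ == "__main__":
    # Without a scheduler, loop over the IDs by hand.
    for task in range(1, 4):
        print(resolve_task_id(manual_id=task))
```

Whether the index should be treated as 0-based or 1-based depends on the script that consumes it, so check that before looping over a range by hand.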

Two other things I had to change which may help you with getting umap running:
1. You need to change the chromosome headers to start with ">chr". You can do this with: sed 's/^>/>chr/' genome.fa > genome_2.fa
2. You need a chrsize.tsv file: cut -f 1,2 genome_2.fa.fai > chrsize.tsv (the .fai index is created by samtools faidx genome_2.fa)
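
The two preparation steps above can also be sketched in Python, in case sed or samtools is not at hand. This is only an illustration under a few assumptions: the function names `add_chr_prefix` and `write_chrsize` are made up here, the header is cut down to its first word (which is also what a FASTA index uses as the sequence name), and headers that already start with "chr" are not detected.

```python
import io


def add_chr_prefix(fasta_in, fasta_out, prefix="chr"):
    """Copy a FASTA stream, prefixing each sequence name.

    Returns a dict mapping the new sequence names to their lengths.
    (Hypothetical helper, not part of Umap.)
    """
    sizes = {}
    name = None
    for line in fasta_in:
        if line.startswith(">"):
            # Header line: keep only the first word as the name.
            name = prefix + line[1:].split()[0]
            sizes[name] = 0
            fasta_out.write(">" + name + "\n")
        else:
            # Sequence line: count bases, pass the line through unchanged.
            if name is not None:
                sizes[name] += len(line.strip())
            fasta_out.write(line)
    return sizes


def write_chrsize(sizes, tsv_out):
    """Write one 'name<TAB>length' row per sequence."""
    for name, length in sizes.items():
        tsv_out.write(f"{name}\t{length}\n")


if __name__ == "__main__":
    # Tiny in-memory demo; real use would open genome.fa and genome_2.fa.
    demo = io.StringIO(">1 dna:chromosome\nACGTN\nAC\n>X\nGGG\n")
    renamed = io.StringIO()
    chrom_sizes = add_chr_prefix(demo, renamed)
    table = io.StringIO()
    write_chrsize(chrom_sizes, table)
    print(renamed.getvalue(), end="")
    print(table.getvalue(), end="")
```

For a full genome, pass file handles opened with `open("genome.fa")` and `open("genome_2.fa", "w")` instead of the in-memory buffers.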

You need to make sure you are using the correct python and bowtie.

Can you try these and let me know if that resolves your issues?

Mehran
On Tue, May 25, 2021 at 12:04 PM Mehran Karimzadeh <mehran.k...@gmail.com> wrote:
I see. Yes, in the paper we show that the percentage of the genome that is uniquely mappable plateaus as the k-mer size increases toward 100.
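
To see that plateau on your own data, here is a rough sketch of how the uniquely mappable fraction could be computed for several read lengths. It rests on an assumption about the file contents that you should verify against the Umap documentation: that each position in a uint8 array holds the smallest k-mer length for which the k-mer starting there is unique, and 0 if there is none. The function name `mappable_fraction` is made up, and counting k-mer start positions may differ from how the paper defines the genome-wide percentage.

```python
def mappable_fraction(min_unique_k, k):
    """Fraction of positions whose stored value is in the range 1..k.

    min_unique_k: a sequence of small integers (for example a bytes
    object), ASSUMED to hold the smallest unique k-mer length per
    position, with 0 meaning "never unique".
    """
    if len(min_unique_k) == 0:
        return 0.0
    hits = sum(1 for value in min_unique_k if 0 < value <= k)
    return hits / len(min_unique_k)


if __name__ == "__main__":
    # Toy values only, not real rn6 data.
    toy = bytes([0, 24, 24, 50, 100, 0, 36, 100])
    for k in (24, 50, 100):
        print(k, mappable_fraction(toy, k))
```

On a real chromosome the array would come from one of the uint8 files, and the loop would run over the k-mer sizes that were actually computed.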
Sure, I'll let you know when the files are ready.

Please remind me in a week if you don't hear from me.

Best,
Mehran
On Tue, May 25, 2021 at 11:51 AM von Stromberg, Konstantin <konstantin...@leibniz-hpi.de> wrote:
Dear Mehran,

thank you very much for your reply & effort! I used the human blacklists (https://www.nature.com/articles/s41598-019-45839-z), which were generated using mappability data from your Umap tool, so I figured it would be advisable to do the same here!

I am using single-read sequencing with 75 bp reads, but why would the type of sequencing have any impact on this? Do you mean that long reads (>50 bp for paired-end and >=100 bp for single-read) are long enough to be mapped outside repeat regions without problems, as long as the low-quality reads are discarded? Then my 75 bp single-end reads are probably not long enough!

Best and thanks,

Konstantin

From: Mehran Karimzadeh <mehran.k...@gmail.com>
Sent: Tuesday, May 25, 2021 16:30
To: von Stromberg, Konstantin <konstantin...@leibniz-hpi.de>
Subject: Re: umap Mappability for rn6

Dear Konstantin,

I hope all is well.

What is the read-length of the sequencing protocol you are using?

If you are using any type of paired-end sequencing with each read > 50 bp, or an unpaired protocol with read length >= 100 bp, I'd say the Umap results wouldn't do you any favour.

Just make sure to filter for mapping quality.
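
As an illustration of what filtering for mapping quality means: in SAM format, the fifth tab-separated column of each alignment line is MAPQ. The sketch below keeps header lines and drops alignments below a cutoff. The cutoff of 30 is only an assumed example value, not a recommendation from this thread, and the function name `filter_sam_by_mapq` is made up.

```python
def filter_sam_by_mapq(lines, min_mapq=30):
    """Yield SAM header lines and alignments with MAPQ >= min_mapq.

    min_mapq=30 is an assumed example threshold; choose one that
    suits your aligner and experiment.
    """
    for line in lines:
        if line.startswith("@"):
            # Header lines carry no MAPQ; pass them through.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # Not a complete alignment record (SAM has 11 mandatory fields).
            continue
        if int(fields[4]) >= min_mapq:
            yield line


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6\n",
        "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\n",
        "r2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\n",
    ]
    for kept in filter_sam_by_mapq(sam):
        print(kept, end="")
```

On real BAM files the same filter is available as the -q option of samtools view, which is the more practical route.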

Regardless, I will try to make rat mappability available and get back to you.

It may take some time though.

Best,

Mehran

--

Mehran Karimzadeh, M.Sc., Ph.D.

Post-doctoral Fellow at Vector Institute and UCSF
On Fri, May 21, 2021 at 3:07 AM von Stromberg, Konstantin <konstantin...@leibniz-hpi.de> wrote:
Dear Dr. Karimzadeh,

I hope it's OK to write you an e-mail here! It's noted on your GitHub page that requests should be sent to you by e-mail.

Now to my problem: I am a total beginner when it comes to bioinformatics and Linux, and I am working on ChIP-seq with rat cells. I have also worked on human cells and used blacklists generated with the help of the human genome mappability data from your Umap tool.

I would really like to use such blacklists for the rn6 genome as well, but I have failed time and time again to create the mappability uint8 files for it. I don't have a Sun Grid Engine cluster and have to set job IDs manually in a for loop, but even with minor adjustments there are still problems that I can't work around.

For example, all my k-mers start with ">x" for the chromosome, the unify_bowtie job skips over the chr1.1.8.kmer file regardless of what I do, etc.
Unfortunately, I am running out of time and don’t think I can manage to create this myself. I therefore would kindly ask if you could create these mappability files for me for the rn6 genome? The raw fasta file is available at http://ftp.ensembl.org/pub/release-104/fasta/rattus_norvegicus/dna/ with the Rattus_norvegicus.Rnor_6.0.dna.toplevel.fa.gz fasta file being the genome of choice, I suppose. 
 
I am looking forward to your reply!
 
Thank you very much
Best,
 
M.Sc. Konstantin von Stromberg

PhD student

Viral Transformation

Heinrich-Pette-Institut,

Leibniz-Institut für Experimentelle Virologie

Martinistraße 52

20251 Hamburg

Tel.: +49 (040) 480 51-304

Mail: Konstantin...@leibniz-hpi.de

HPI on the internet and social media:

www.hpi-hamburg.de * www.facebook.com/hpi.hamburg * twitter.com/HeinrichPette