Re: [EXTERNAL] Three questions about the CTK-iCLIP pipeline usage. Thank Dr.Zhang so much!

Chaolin Zhang

Aug 30, 2020, 10:43:21 PM
to JIANG, CHONGMING, CTK User Group
Hi,

You will have to run bwa samse one file at a time.  Also, our standard pipeline does not deal with paired end reads.

Hope this helps!

Chaolin


On Aug 28, 2020, at 5:56 PM, JIANG, CHONGMING <CHONGMI...@bcm.edu> wrote:


Dear Dr. Zhang,

I am a postdoctoral researcher at Baylor College of Medicine and new to iCLIP-seq analysis. I ran into a problem while using your CTK iCLIP pipeline for miCLIP-seq analysis.

My miCLIP sequencing data are paired-end (PE) data (data_1.fq.gz, data_2.fq.gz).

My problems occur at the mapping step. After running fastq2collapse.pl and stripBarcode.pl, I got data_1.c.tag.fq.gz and data_2.c.tag.fq.gz, respectively. Then I used bwa to produce the SAM file.
perl ./fastq2collapse.pl data_1.fq.gz - | gzip -c > data_1.trim.c.fq.gz
perl ./fastq2collapse.pl data_2.fq.gz - | gzip -c > data_2.trim.c.fq.gz

perl ./stripBarcode.pl -format fastq -len 9 data_1.trim.c.fq.gz - | gzip -c > data_1.c.tag.fq.gz
perl ./stripBarcode.pl -format fastq -len 9 data_2.trim.c.fq.gz - | gzip -c > data_2.c.tag.fq.gz

bwa aln -t 8 -n 0.06 -q 35 ./bwa/mm10 ./data_1.c.tag.fq.gz >data_1.sai
bwa aln -t 8 -n 0.06 -q 35 ./bwa/mm10 ./data_2.c.tag.fq.gz >data_2.sai

# The error happens here:
bwa samse ./bwa/mm10  data_1.sai data_2.sai ./data_1.c.tag.fq.gz ./data_2.c.tag.fq.gz > data.sam

Question 1:
The "bwa samse" step produces an empty data.sam file. It finishes in only 0.381 sec, and data.sam contains no data.
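
Following the reply that bwa samse has to be run one file at a time, here is a minimal sketch of what that could look like for the two files in this question. The function name samse_each is hypothetical, and the file naming (<sample>.sai, <sample>.c.tag.fq.gz) is assumed from the commands above; each mate ends up aligned as an independent single-end library.

```shell
# Sketch only: run bwa samse once per input, as single-end data.
# Assumes `bwa aln` has already written <sample>.sai for every sample.
# samse_each is a hypothetical helper, not part of CTK or bwa.
samse_each() {
    index=$1
    shift
    for sample in "$@"; do
        # bwa samse takes exactly one .sai file and one FASTQ file per call
        bwa samse "$index" "${sample}.sai" "${sample}.c.tag.fq.gz" > "${sample}.sam"
    done
}
```

Calling `samse_each ./bwa/mm10 data_1 data_2` would write data_1.sam and data_2.sam instead of a single data.sam.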

Then I used bwa mem for the mapping step:
bwa mem -t 8 -v 2 -M ./bwa/mm10 ./data_1.c.tag.fq.gz ./data_2.c.tag.fq.gz > data.sam

Question 2:
I get the following error output:

[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (112, 120, 146)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (44, 214)
[M::mem_pestat] mean and std.dev: (119.48, 20.83)
[M::mem_pestat] low and high boundaries for proper pairs: (10, 248)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (4, 6, 12)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 28)
[M::mem_pestat] mean and std.dev: (7.50, 5.21)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 36)
[mem_sam_pe] [mem_sam_pe] paired reads have different names: "ST-E00310:584:HJ7KJCCXY:5:1223:28798:1836#2#AAAAAAAAA", "ST-E00310:584:HJ7KJCCXY:5:1113:26606:17799#2#AAAAAAAAA"

paired reads have different names: "ST-E00310:584:HJ7KJCCXY:5:2106:3985:72561#1#AAAAAAAAA", "ST-E00310:584:HJ7KJCCXY:5:1106:16559:72614#1#AAAAAAAAA"

Am I doing something wrong in some of the steps for PE data?
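
The "paired reads have different names" message means bwa mem found different read names at the same position in the two files. A likely cause here, stated as an assumption, is that collapsing each mate file independently leaves the two files with different reads or a different read order. The sketch below is one way to check this; count_name_mismatches is a hypothetical helper, and it assumes gzip-compressed FASTQ with four lines per record and mate names that are meant to be identical (no /1 or /2 suffix), as in the log above.

```shell
# Sketch only: count record positions where the two mate files disagree on read name.
# count_name_mismatches is a hypothetical helper, not part of CTK or bwa.
count_name_mismatches() {
    names1=$(mktemp)
    names2=$(mktemp)
    # Keep header lines only (line 1 of every 4-line record), first field
    gzip -dc "$1" | awk 'NR % 4 == 1 { print $1 }' > "$names1"
    gzip -dc "$2" | awk 'NR % 4 == 1 { print $1 }' > "$names2"
    # paste pairs the names line by line; a missing mate appears as an empty field
    paste "$names1" "$names2" | awk -F '\t' '$1 != $2 { n++ } END { print n + 0 }'
    rm -f "$names1" "$names2"
}
```

Running `count_name_mismatches data_1.c.tag.fq.gz data_2.c.tag.fq.gz` prints the number of mismatched positions; any value above 0 means the files are not in sync for paired-end mapping.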
 
Question 3:
At the "Peak calling" step, could I use $f.tag.norRNA.bed in place of $f.tag.uniq.bed to call peaks directly? Errors keep appearing in the intermediate processing steps.

I very much appreciate your kind help. (^_^)

Take care.
Best,

Chongming Jiang

JIANG, CHONGMING

Aug 31, 2020, 1:02:45 AM
to Chaolin Zhang, CTK User Group

Thank you, Dr. Zhang, for your kind response.

I am new to iCLIP-seq data analysis. Is there any suitable software for analyzing paired-end iCLIP-seq data? I would very much appreciate any suggestions.

Take care.

Best wishes and regards,

Chongming

From: Chaolin Zhang <zhangc...@gmail.com>
Sent: Sunday, August 30, 2020 9:42 PM
To: JIANG, CHONGMING <CHONGMI...@bcm.edu>
Cc: CTK User Group <ctk-use...@googlegroups.com>
Subject: Re: [EXTERNAL] Three questions about the CTK-iCLIP pipeline usage. Thank Dr.Zhang so much!
 