Dear Dr. Zhang,
I am a postdoctoral researcher at Baylor College of Medicine and am new to iCLIP-seq analysis. I ran into a problem when using your CTK iCLIP pipeline for miCLIP-seq analysis.
My miCLIP sequencing data are paired-end (PE) data (data_1.fq.gz, data_2.fq.gz).
My problems occur at the mapping step. After running fastq2collapse.pl and stripBarcode.pl, I got data_1.c.tag.fq.gz and data_2.c.tag.fq.gz, respectively. I then used bwa to produce the SAM file:
perl ./stripBarcode.pl -format fastq -len 9 data_1.trim.c.fq.gz - | gzip -c > data_1.c.tag.fq.gz
perl ./stripBarcode.pl -format fastq -len 9 data_2.trim.c.fq.gz - | gzip -c > data_2.c.tag.fq.gz
bwa aln -t 8 -n 0.06 -q 35 ./bwa/mm10 ./data_1.c.tag.fq.gz >data_1.sai
bwa aln -t 8 -n 0.06 -q 35 ./bwa/mm10 ./data_2.c.tag.fq.gz >data_2.sai
# The error happens here:
bwa samse ./bwa/mm10 data_1.sai data_2.sai ./data_1.c.tag.fq.gz ./data_2.c.tag.fq.gz > data.sam
Question 1:
The "bwa samse" command produces an empty data.sam file. This step takes only 0.381 sec, and there is no data in data.sam.
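For reference, the bwa usage message lists samse as taking one .sai file and one FASTQ file, and sampe as the paired-end counterpart taking two of each. A minimal sketch of the two call shapes follows. The wrapper names map_se and map_pe are hypothetical (not part of bwa or CTK), and whether sampe is the right choice for this pipeline is an assumption I have not confirmed:

```shell
# Hypothetical wrappers, shown only to contrast the argument lists.

# Single-end: index prefix, one .sai file, one FASTQ file.
map_se() {
    bwa samse "$1" "$2" "$3"
}

# Paired-end: index prefix, two .sai files, then the two FASTQ files
# in the same order as the .sai files.
map_pe() {
    bwa sampe "$1" "$2" "$3" "$4" "$5"
}
```

With my file names, the paired-end call would look like: map_pe ./bwa/mm10 data_1.sai data_2.sai ./data_1.c.tag.fq.gz ./data_2.c.tag.fq.gz > data.sam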
I then used bwa mem for the mapping step:
bwa mem -t 8 -v 2 -M ./bwa/mm10 ./data_1.c.tag.fq.gz ./data_2.c.tag.fq.gz > data.sam
Question 2:
I get the following error output:
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (112, 120, 146)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (44, 214)
[M::mem_pestat] mean and std.dev: (119.48, 20.83)
[M::mem_pestat] low and high boundaries for proper pairs: (10, 248)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (4, 6, 12)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 28)
[M::mem_pestat] mean and std.dev: (7.50, 5.21)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 36)
[mem_sam_pe] paired reads have different names: "ST-E00310:584:HJ7KJCCXY:5:1223:28798:1836#2#AAAAAAAAA", "ST-E00310:584:HJ7KJCCXY:5:1113:26606:17799#2#AAAAAAAAA"
[mem_sam_pe] paired reads have different names: "ST-E00310:584:HJ7KJCCXY:5:2106:3985:72561#1#AAAAAAAAA", "ST-E00310:584:HJ7KJCCXY:5:1106:16559:72614#1#AAAAAAAAA"
Am I doing something wrong in some of the steps for PE data?
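In paired-end mode, bwa mem treats the i-th read of the first file and the i-th read of the second file as mates, so the "different names" message suggests the two files are no longer in the same order. A minimal sketch of how this could be checked is below. The function names read_names and count_name_mismatches are hypothetical helpers, not part of bwa or CTK, and the sketch assumes standard four-line FASTQ records:

```shell
# Print the read name of every record in a gzipped FASTQ file:
# the first field of each header line, without the leading "@"
# and without a trailing /1 or /2 mate suffix.
read_names() {
    gzip -dc "$1" | awk 'NR % 4 == 1 { sub(/^@/, "", $1); sub(/\/[12]$/, "", $1); print $1 }'
}

# Count the record positions at which the two files disagree on the
# read name. A missing record in the shorter file counts as a mismatch.
count_name_mismatches() {
    tmp=$(mktemp) || return 1
    read_names "$2" > "$tmp"
    read_names "$1" | paste - "$tmp" | awk -F '\t' '$1 != $2 { n++ } END { print n + 0 }'
    rm -f "$tmp"
}
```

Running count_name_mismatches data_1.c.tag.fq.gz data_2.c.tag.fq.gz would print 0 if the two files list the same reads in the same order, and a positive count otherwise.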
Question 3:
In the "Peak calling" step, could I use $f.tag.norRNA.bed in place of $f.tag.uniq.bed to call peaks directly? Some errors always appear in the intermediate processing steps.
I would greatly appreciate your kind help. (^_^)
Take care.
Best,
Chongming Jiang