Regarding the Mapping output

368 views
Skip to first unread message

Vikash Yadav

unread,
Oct 23, 2017, 2:49:40 PM10/23/17
to HiC-Pro
Hi,

I am using the HiC pro to process my data. If we see the output of HiC pro it produced different mapping files. The .bam files contain the uniquely mapped read in the pair end formate. Correspond to this it also has the .sam file in the HiC result folder which contains only the uniquely mapped reads. Beside this, it also has the .bam file in the bowtie folder which contains the R1 and R2 mapped file separately which correspond to all reads (mapped, unmapped, multiple maps). For my purpose, I want R1 and R2 only for the uniquely mapped reads. But when I am trying to extract from this file R1 and R2 are not equal in number which is expected. I want R1 and R2 to be equal in the R1 and R2 mapped file which contains only the uniquely mapped reads.  
One alternative way to do this if I could split my sam/bam file (paired end data) into R1 and R2 sam/bam file. I tried this using this command

samtools view -F 0x40 -bS Viks1.bwt2pairs.bam > R1.bam
or
samtools view -F 0x40 -h Viks1.bwt2pairs.bam > R1.sam

But it seems it could not separate the R1.bam and R2.bam (R1.sam and R2.sam).

So is there any way by which using HiC pro I can get R1 and R2 bam/sam separately for uniquely mapped reads or I can split my uniquely paired-end mapping file into R1 and R2.

Thanks in advance

Best
Vikash


nservant

unread,
Oct 24, 2017, 4:58:53 AM10/24/17
to HiC-Pro
Hi Vikash

That's strange. It works well on my side on the test dataset.
>>samtools view -c SRR400264_00_hg19.bwt2pairs.bam ## all reads
326932
>>samtools view -F 0x40 -c SRR400264_00_hg19.bwt2pairs.bam ## R1
163466
>>samtools view -F 0x80 -c SRR400264_00_hg19.bwt2pairs.bam ## R2
163466

I'm using samtools v1.1.

N

vicky

unread,
Oct 24, 2017, 3:18:00 PM10/24/17
to HiC-Pro
Yes, I agree with you that when you count the number you will get the exact half the number which is expected. But if you check the flagstat status of files then it reveals some abnormalities. T=
The first is the flagstat details of the Pair-end mapped bam file
The second is the  flagstat details of the single end mapped bam file (generated through the HiC pro)
Third is the flagstat details of the R1.bam file extracted though command "samtools view -F 0x40 -h/bS file.bam ## R1"

when you just see the output of 2 and 3 you can see the extracted bam does not look like the single end bam file and it has information for mate pair mapping

(1)
samtools flagstat pairend.bam
59184172 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
59184172 + 0 mapped (100.00% : N/A)
59184172 + 0 paired in sequencing
29592086 + 0 read1
29592086 + 0 read2
59184172 + 0 properly paired (100.00% : N/A)
59184172 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
7395194 + 0 with mate mapped to a different chr
6561384 + 0 with mate mapped to a different chr (mapQ>=5)

(2)
samtools flagstat signleendmaped.bam
44667530 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
40832459 + 0 mapped (91.41% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

(3)
samtools flagstat extractedR2.bam                            
29592086 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
29592086 + 0 mapped (100.00% : N/A)
29592086 + 0 paired in sequencing
29592086 + 0 read1
0 + 0 read2
29592086 + 0 properly paired (100.00% : N/A)
29592086 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
3697597 + 0 with mate mapped to a different chr
3290634 + 0 with mate mapped to a different chr (mapQ>=5)

nservant

unread,
Oct 25, 2017, 12:37:57 PM10/25/17
to HiC-Pro
Hi Vick
That's strange.
Could you share with me your BAM file ??

Here is what I have using the files available here and samtools 1.3
https://zerkalo.curie.fr/partage/HiC-Pro/HiCPro_results/HiC_Pro_v2.7.9_test_data/bowtie_results/bwt2/dixon_2M/

>>samtools view -F 0x40 -h SRR400264_00_hg19.bwt2pairs.bam > R1.bam
>>samtools view -F 0x80 -h SRR400264_00_hg19.bwt2pairs.bam > R2.bam
>>samtools flagstat R1.bam
163466 + 0 in total (QC-passed reads + QC-failed reads)

0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
163466 + 0 mapped (100.00% : N/A)
163466 + 0 paired in sequencing
0 + 0 read1
163466 + 0 read2
163466 + 0 properly paired (100.00% : N/A)
163466 + 0 with itself and mate mapped

0 + 0 singletons (0.00% : N/A)
36612 + 0 with mate mapped to a different chr
34925 + 0 with mate mapped to a different chr (mapQ>=5)

>>samtools flagstat R2.bam
163466 + 0 in total (QC-passed reads + QC-failed reads)

0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
163466 + 0 mapped (100.00% : N/A)
163466 + 0 paired in sequencing
163466 + 0 read1
0 + 0 read2
163466 + 0 properly paired (100.00% : N/A)
163466 + 0 with itself and mate mapped

0 + 0 singletons (0.00% : N/A)
36612 + 0 with mate mapped to a different chr
34965 + 0 with mate mapped to a different chr (mapQ>=5)

>>samtools flagstat SRR400264_00_hg19.bwt2pairs.bam
326932 + 0 in total (QC-passed reads + QC-failed reads)

0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
326932 + 0 mapped (100.00% : N/A)
326932 + 0 paired in sequencing
163466 + 0 read1
163466 + 0 read2
326932 + 0 properly paired (100.00% : N/A)
326932 + 0 with itself and mate mapped

0 + 0 singletons (0.00% : N/A)
73224 + 0 with mate mapped to a different chr
69890 + 0 with mate mapped to a different chr (mapQ>=5)

vicky

unread,
Oct 25, 2017, 2:04:32 PM10/25/17
to HiC-Pro
Hi Nicolas,

Yes, I can share my bam file but its a big file. Please let me know how I can provide you.

your results are the same what I am getting. If you see the result of your "samtools flagstat SRR400264_00_hg19.bwt2pairs.bam" its pattern is similar to the R1.bam R2.bam extracted through the the mentioned command. Although the SRR400264_00_hg19.bwt2pairs.bam is pair end mapping file.
>>samtools view -0x40 -h SRR400264_00_hg19.bwt2pairs.bam > R1.bam
>>samtools view -0x80 -h SRR400264_00_hg19.bwt2pairs.bam > R2.bam

But if you map the fastq file in single end mapping pattern and then check the flagstat

samtools flagstat signleendmaped.bam
44667530 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
40832459 + 0 mapped (91.41% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

This pattern indicates that it does not have any mate-pair information which i am expecting from the extracted R1/R2.bam files. Is this make sense?.

nservant

unread,
Oct 26, 2017, 4:29:03 AM10/26/17
to HiC-Pro
Not sure to understand what you mean by ;


But if you map the fastq file in single end mapping pattern and then check the flagstat

How do you generate your "singleendmaped.bam" file ? Is the output of the samtools -F 0x4 / -F 0x8 comand ?
The point that I do not understand, is that if you run the flagstat on single-end BAM ... why do you expect to have pairs information ??
Sorry if I miss something !
N

vicky

unread,
Oct 26, 2017, 6:08:36 AM10/26/17
to HiC-Pro
Hi,

Sorry for making it complex.
As you said that ^if you run the flagstat on single-end BAM ... why do you expect to have pairs information ??

This is what I am saying when I am running the flagstat for the BAM file generated through the samtools -F 0x4 / -F 0x8 command (I assume this file have information for the only R1 or R2 reads), I should not have any information for the pairs. But somehow it has information for that

>>samtools flagstat R2.bam
163466 + 0 in total (QC-passed reads + QC-failed reads)

0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
163466 + 0 mapped (100.00% : N/A)
163466 + 0 paired in sequencing
163466 + 0 read1
0 + 0 read2
163466 + 0 properly paired (100.00% : N/A)
163466 + 0 with itself and mate mapped

0 + 0 singletons (0.00% : N/A)
36612 + 0 with mate mapped to a different chr
34965 + 0 with mate mapped to a different chr (mapQ>=5)

Is this make sense.!!!!!

Best
Vikash

nservant

unread,
Oct 27, 2017, 3:37:46 AM10/27/17
to HiC-Pro
ahhh ok I see !
This is related to samtools, not really to HiC-pro.
You should use the samtools fixmate command to solve this issues.

>>samtools flagstat R1.bam
163466 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
163466 + 0 mapped (100.00% : N/A)
163466 + 0 paired in sequencing
0 + 0 read1
163466 + 0 read2

163466 + 0 properly paired (100.00% : N/A)
163466 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
36612 + 0 with mate mapped to a different chr
34925 + 0 with mate mapped to a different chr (mapQ>=5)

>>samtools fixmate R1.bam R1_fm.bam

>>samtools flagstat R1_fm.bam
163466 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
163466 + 0 mapped (100.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2

0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Cheers. N
Message has been deleted

vicky

unread,
Oct 30, 2017, 2:49:39 PM10/30/17
to HiC-Pro
Hi Nicolas,

Ohh yes, This is what I was looking for. Thank you very much for this help. You are doing great work by answering everyone query and helping them in all the possible way. 

Cheers

vicky

unread,
Oct 30, 2017, 2:49:55 PM10/30/17
to HiC-Pro
Hi Nicolas,

Ohh yes, This is what I was looking for. Thank you very much for this help. You are doing great work by answering everyone query and helping them in all the possible way. 

Cheers


On Friday, October 27, 2017 at 9:37:46 AM UTC+2, nservant wrote:

vicky

unread,
Oct 31, 2017, 7:22:59 AM10/31/17
to HiC-Pro
Hi Nicolas,

I have one query regarding the HiC pro
In config file 
Hi-C PROCESSING
[MIN_CIS_DIST]Filter short range contact below the specified distance. Mainly useful for DNase Hi-C. Example: 1000
GET_ALL_INTERACTION_CLASSESCreate output files with all classes of 3C products. Default: 0
GET_PROCESS_BAMCreate a BAM file with all aligned reads flagged according to their classifaction and mapping category. Default: 0


What will be the chane in the output if i put
GET_ALL_INTERACTION_CLASSES= 1 
GET_PROCESS_BAM= 1

Best
Vikash

nservant

unread,
Oct 31, 2017, 12:00:12 PM10/31/17
to HiC-Pro
Hi Vick

GET_ALL_INTERACTION_CLASSES= 1
will output all discarded classes for interaction products, i.e. instead of having only the .validPairs information, you will also have the dangling-end, self-circle, etc. in distinct files.

GET_PROCESS_BAM= 1
will generate a SAM file (I need to change it into BAM), with all mapped reads with a TAG (XX) specifying if the read pair was classify as "valid", "dangling-end", etc.
Basically, there is no real interest in generating thoses files. This is more a way to double check that everything is fine !
Cheers

vicky

unread,
Oct 31, 2017, 3:16:01 PM10/31/17
to HiC-Pro
Hi Nicolas,

Ok, got it. Thank you.
I am working on that new thread regarding which I spoke to you. Soon I will post on the group for discussion.

Thanks a lot
Best
V

Zhibo M.

unread,
Oct 26, 2018, 2:59:06 PM10/26/18
to HiC-Pro
I also have some confusion on this parameter. Because the default config-hicpro.txt and the config_test_latest.txt are different on this parameter.
I am still not sure what the "interaction products" being outputed and how these output info being used. Do you mean that, if set to 1, the info other than the validPairs (like self-ligate, dangling-ends, religation...)will be used in the subsequent steps to build matrix and final contact maps?
If so, I am not sure why did you put this option here because I think they (danling-end and self-circle) should be simply be filtered out from the analysis.
Could you help clarify if I am understanding it correctly?
Thanks
Zhibo

nservant

unread,
Oct 29, 2018, 3:51:42 AM10/29/18
to HiC-Pro
GET_ALL_INTERACTION_CLASSES will simply output all classes of interaction in different output file.
In any case, only valid interaction are used, so this option does not have any impact on the final results. This is just to double check the pairs which are discarded.

GET_PROCESS_BAM will generate a BAM file of aligned reads, and flag them according to their classification in valid pairs, dangling end, self circle, etc.
As previously, no impact on the final contact maps, but just a way to visualize the data using IGV for instance, and to check that everything looks good.
Best

Zhibo M.

unread,
Oct 29, 2018, 4:14:58 AM10/29/18
to HiC-Pro, nservant
Great! All clear now!
Thank you!
Zhibo


From: hic...@googlegroups.com <hic...@googlegroups.com> on behalf of nservant <nicolas.se...@gmail.com>
Sent: Monday, October 29, 2018 3:51:42 AM
To: HiC-Pro
Subject: [hic-pro] Re: Regarding the Mapping output
 
--
You received this message because you are subscribed to a topic in the Google Groups "HiC-Pro" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/hic-pro/BZqCp4juHrQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to hic-pro+u...@googlegroups.com.
To post to this group, send email to hic...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/hic-pro/6ecea2af-96e3-4bb3-b917-8530f666fd58%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages