Error: the chromosome combination 1_3 appears in multiple blocks

Marine Bergot

Aug 5, 2022, 3:24:48 PM
to 3D Genomics
Hi,

I'm trying to use juicer pre with:
java -jar hic_tools_3.22.02.jar pre

I tried to follow the steps as indicated here. I used bwa mem for R1 and R2, then I merged the two BAM files with samtools merge.

After that I used:
samtools view /work/gad/shared/analyse/HiC/data/dijhic006/dijhic006_merge_hg38.bam | awk 'BEGIN {FS="\t"; OFS="\t"} {name1=$1; str1=and($2,16); chr1=$3; pos1=$4; mapq1=$5; getline; name2=$1; str2=and($2,16); chr2=$3; pos2=$4; mapq2=$5; print name1, str1, chr1, pos1, 0, str2, chr2, pos2 ,1, mapq1, mapq2}' > dijhic006_merge_hg38.input.txt

and:
sort -k2,2d -k6,6d dijhic006_merge_hg38.input.txt > dijhic006_merge_hg38.input.sorted.txt

My file:
A00417:65:HMMNNDMXX:2:1101:10004:12618  0       chr5_KI270796v1_alt     65888   0       0       chr5    125339997       1       0       0
A00417:65:HMMNNDMXX:2:1101:10004:15499  0       chr6    136532936       0       0       chr6    44678249        1       60      60
A00417:65:HMMNNDMXX:2:1101:10004:1626   0       chr5    180785324       0       0       chr19   926262  1       60      60
A00417:65:HMMNNDMXX:2:1101:10004:17002  0       chrX    116314714       0       0       chrY    20053390        1       60      60
A00417:65:HMMNNDMXX:2:1101:10004:17190  0       chr10   63908899        0       0       chr3    107182900       1       60      60
A00417:65:HMMNNDMXX:2:1101:10004:22514  0       chr8_KI270822v1_alt     588042  0       0       chr1    117696185       1       0       60
A00417:65:HMMNNDMXX:2:1101:10004:22858  0       chr3    7995924 0       0       chr1    79865250        1       60      58
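
One assumption in the awk command above may be worth checking: it pairs each SAM line with the following one through getline, so the contacts are only correct if the two mates of a read sit on adjacent lines. bwa mem can also emit supplementary alignments, which would shift the pairing. A minimal sketch of a sanity check follows; the function name count_mismatched_mates is made up for illustration, and the input is assumed to be header-less SAM text as printed by samtools view.

```shell
# Hypothetical helper: reads header-less SAM lines on stdin two at a time
# and reports how many of those line pairs do not share a read name.
count_mismatched_mates() {
  awk -F '\t' '
    {
      name1 = $1
      if ((getline) <= 0) { odd = 1; exit }  # last line has no partner
      if ($1 != name1) mismatched++
    }
    END {
      printf "mismatched line pairs: %d", mismatched + 0
      if (odd) printf " (plus one unpaired last line)"
      printf "\n"
    }'
}
```

It could be run as `samtools view /work/gad/shared/analyse/HiC/data/dijhic006/dijhic006_merge_hg38.bam | count_mismatched_mates`. A count other than 0 would suggest that the BAM needs to be name-sorted, or filtered for secondary and supplementary alignments, before the input file is built.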

And when I run juicer pre:
java -jar hic_tools_3.22.02.jar pre dijhic006_merge_hg38.input.sorted.txt test_dijhic006.hic hg38
WARN [2022-08-03T14:05:13,218]  [Globals.java:138] [main]  Development mode is enabled
Using 1 CPU thread(s) for primary task
Using 10 CPU thread(s) for secondary task
Start preprocess
Writing header
Writing body
.........Error: the chromosome combination 1_3 appears in multiple blocks

Did I miss a step?
Thanks for your help!
Marine

Moshe Olshansky

Aug 7, 2022, 6:55:17 AM
to 3D Genomics
Hi Marine.

Since your records start with the read name, the chromosome names are in fields 3 and 7, so you need to sort on those fields. The sorting command should be:
sort -k3,3d -k7,7d dijhic006_R1_hg38.input.txt > dijhic006_merge_hg38.input.sorted.txt
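
To illustrate why the key columns matter, here is a small sketch on made-up records (read names, positions, and the helper names make_toy and sorted_pairs are invented for illustration). With the read name in column 1, columns 2 and 6 hold the strands, so sorting on them leaves the chromosome pairs interleaved; sorting on columns 3 and 7 groups them.

```shell
# Toy records in the 11-column layout used in this thread (values made up).
# Columns: name str1 chr1 pos1 frag1 str2 chr2 pos2 frag2 mapq1 mapq2
make_toy() {
  printf 'r1\t0\tchr2\t100\t0\t0\tchr3\t500\t1\t60\t60\n'
  printf 'r2\t0\tchr1\t200\t0\t16\tchr3\t700\t1\t60\t60\n'
  printf 'r3\t16\tchr2\t300\t0\t0\tchr3\t900\t1\t60\t60\n'
}

# Sort the toy records with the given sort keys and print only the
# chromosome pair of each record, one per line.
sorted_pairs() {
  make_toy | LC_ALL=C sort "$@" | awk '{ print $3 "_" $7 }'
}

# Keys 2 and 6 (strands): chr2_chr3 ends up in two separate blocks.
sorted_pairs -k2,2d -k6,6d
# Keys 3 and 7 (chromosomes): each pair forms one block.
sorted_pairs -k3,3d -k7,7d
```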

Best regards,
Moshe.

Marine Bergot

Aug 8, 2022, 11:58:37 AM
to 3D Genomics
Hi!

Thanks for your answer. I tried what you suggested, but I get almost the same error:

java -jar hic_tools_3.22.02.jar pre dijhic006_merge_hg38.input.sorted.txt test_dijhic006.hic hg38
WARN [2022-08-08T10:21:55,355]  [Globals.java:138] [main]  Development mode is enabled

Using 1 CPU thread(s) for primary task
Using 10 CPU thread(s) for secondary task
Start preprocess
Writing header
Writing body
..........................Error: the chromosome combination 1_10 appears in multiple blocks


And my file:
A00417:65:HMMNNDMXX:2:1101:1009:7529    0       *       0       0       0       *       0       1       0       0
A00417:65:HMMNNDMXX:2:1101:10158:32863  0       *       0       0       0       *       0       1       0       0
A00417:65:HMMNNDMXX:2:1101:10176:1861   0       *       0       0       0       *       0       1       0       0
A00417:65:HMMNNDMXX:2:1101:10176:1861   0       *       0       0       0       *       0       1       0       0
A00417:65:HMMNNDMXX:2:1101:10366:24455  0       *       0       0       0       *       0       1       0       0
A00417:65:HMMNNDMXX:2:1101:10682:17926  0       *       0       0       0       *       0       1       0       0
A00417:65:HMMNNDMXX:2:1101:10682:17926  0       *       0       0       0       *       0       1       0       0
A00417:65:HMMNNDMXX:2:1101:10682:29888  0       *       0       0       0       *       0       1       0       0
A00417:65:HMMNNDMXX:2:1101:10863:6903   0       *       0       0       0       *       0       1       0       0
A00417:65:HMMNNDMXX:2:1101:11035:16125  0       *       0       0       0       *       0       1       0       0
A00417:65:HMMNNDMXX:2:1101:11089:32847  0       *       0       0       0       *       0       1       0       0
A00417:65:HMMNNDMXX:2:1101:11153:27383  0       *       0       0       0       *       0       1       0       0
A00417:65:HMMNNDMXX:2:1101:11153:27383  0       *       0       0       0       *       0       1       0       0
A00417:65:HMMNNDMXX:2:1101:11252:6793   0       *       0       0       0       *       0       1       0       0
A00417:65:HMMNNDMXX:2:1101:11397:11929  0       *       0       0       0       *       0       1       0       0
A00417:65:HMMNNDMXX:2:1101:12237:2644   0       *       0       0       0       *       0       1       0       0


I guess I am still missing a step?
Thanks for your help.


Best,
Marine

Moshe Olshansky

Aug 8, 2022, 8:31:12 PM
to 3D Genomics
Hi Marine,

It is strange. How big is your dijhic006_merge_hg38.input.sorted.txt file? If it is not too big, can you (g)zip it and send it to me (or let me download it)?

Moshe.

--
You received this message because you are subscribed to a topic in the Google Groups "3D Genomics" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/3d-genomics/sGBhhng8xtw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to 3d-genomics...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/3d-genomics/71e08652-4925-4ac2-a8d1-81abee737387n%40googlegroups.com.

Marine Bergot

Aug 10, 2022, 11:14:02 AM
to 3D Genomics
Hi Moshe,

Thanks for your help :)

Here is a link to download the file (the password is LEZCpCdxZM).


Marine

Moshe Olshansky

Aug 10, 2022, 9:50:40 PM
to 3D Genomics
Hi Marine,

I downloaded your file. Please see below:

awk '{if ($3 == "chr1" && $7 == "chr10") {print NR,$0; exit}}' dijhic006_merge_hg38.input.sorted.txt
4691370 A00417:65:HMMNNDMXX:2:1101:10104:6715    0    chr1    24928772    0    0    chr10    75633923    1    60    47
awk '{if ($3 == "chr10" && $7 == "chr1") {print NR,$0; exit}}' dijhic006_merge_hg38.input.sorted.txt
12783104 A00417:65:HMMNNDMXX:2:1101:10004:2284    16    chr10    70907616    0    16    chr1    12404324    1    60    60
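
The two lookups above test one chromosome pair in each direction. As a sketch, the same idea can be generalised to list every pair that occurs in both orientations. The function name mixed_orientation_pairs is made up, and the input is assumed to be the tab-separated 11-column file with the chromosomes in fields 3 and 7.

```shell
# Hypothetical helper: reads the 11-column records on stdin and prints
# each chromosome pair that occurs in both orientations, e.g. both
# chr1/chr10 and chr10/chr1.
mixed_orientation_pairs() {
  awk -F '\t' '
    $3 != $7 { seen[$3, $7] = 1 }
    END {
      for (k in seen) {
        split(k, c, SUBSEP)
        # Report each offending pair once, from its "smaller first" side.
        # Appending "" forces a string comparison.
        if ((c[1] "") < (c[2] "") && ((c[2], c[1]) in seen))
          print c[1], c[2]
      }
    }' | LC_ALL=C sort
}
```

For example, `mixed_orientation_pairs < dijhic006_merge_hg38.input.sorted.txt` should print nothing once the chromosome order is consistent in every record.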

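So you have records where chr1 comes before, say, chr10, and others where it is the other way around. After sorting on fields 3 and 7, these two groups end up in different places in the file, and pre apparently counts both as the same combination (1_10), hence the "multiple blocks" error. This never happens if you use the juicer pipeline, but it can happen with other pipelines. So you first need to make sure that the two chromosomes of each pair always appear in the same order. I would do the following: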

awk '{if ($3 <= $7) print $0; else print $1,$6,$7,$8,$9,$2,$3,$4,$5,$11,$10}' dijhic006_merge_hg38.input.sorted.txt > correct_dijhic006_merge_hg38.input.txt
sort -k3,3d -k7,7d correct_dijhic006_merge_hg38.input.txt > correct_dijhic006_merge_hg38.input.sorted.txt
and then run pre on correct_dijhic006_merge_hg38.input.sorted.txt
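
A variant of the two commands above, as a sketch under stated assumptions. The awk command above prints swapped records with awk's default output separator (a space) while untouched records keep their tabs; whether pre minds is not stated in this thread, but setting OFS keeps the file uniformly tab-separated. The second function checks that every chromosome combination forms one contiguous block. Both function names are made up for illustration.

```shell
# Hypothetical helper: put the "smaller" chromosome name first in every
# 11-column record, keeping tabs as the separator throughout.
normalize_pairs() {
  awk 'BEGIN { FS = OFS = "\t" }
       ($3 "") <= ($7 "") { print; next }
       { print $1, $6, $7, $8, $9, $2, $3, $4, $5, $11, $10 }'
}

# Hypothetical helper: exit status 0 if each chr1/chr2 combination forms
# a single contiguous block; otherwise print the split ones and exit 1.
single_block_per_pair() {
  awk -F '\t' '
    { key = $3 "_" $7 }
    key != prev {
      if (key in seen) { print "split: " key; bad = 1 }
      seen[key] = 1
      prev = key
    }
    END { exit bad + 0 }'
}

# Possible usage (file names taken from the commands above):
#   normalize_pairs < dijhic006_merge_hg38.input.sorted.txt \
#     | LC_ALL=C sort -k3,3 -k7,7 > correct_dijhic006_merge_hg38.input.sorted.txt
#   single_block_per_pair < correct_dijhic006_merge_hg38.input.sorted.txt
```

The usage comment sorts byte-wise with LC_ALL=C and without the d flag, so that two different chromosome names can never compare as equal; that is a design choice of this sketch, not something the thread prescribes.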

Hope this helps,
Moshe.

AM

Apr 15, 2023, 8:52:26 PM
to 3D Genomics
Hi Moshe,

How can I deal with this problem if I have a pairs.gz file (porec_test.concatemers.pairs.gz)?

I am running the command below, followed by some information on the porec_test.concatemers.pairs.gz file:

(base) Data1$ java -Xmx48000m  -Djava.awt.headless=true -jar juicer_tools_1.22.01.jar pre --threads 16 /Data1/wf-pore-c-master/output/pairs/porec_test.concatemers.pairs.gz test_contact_map.hic /Data1/wf-pore-c-master/test_data/porec_test.fasta_chrom.sizes
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
WARN [2023-04-06T16:05:03,267]  [Globals.java:138] [main]  Development mode is enabled
Using 16 CPU thread(s)
Not including fragment map
Start preprocess
Writing header
Writing body
...Error: the chromosome combination 1_2 appears in multiple blocks


(base) Data1$ gzip -dc wf-pore-c-master/output/pairs/porec_test.concatemers.pairs.gz | head -n 5
## pairs format v1.0.0
#shape: whole matrix
#genome_assembly: unknown
#chromsize: chr1 3577
#chromsize: chr2 7551


(base) Data1$ gzip -dc wf-pore-c-master/output/pairs/porec_test.concatemers.pairs.gz | grep -v "#" | head -n 5
CONCAT0 chr2 5443 chr1 3003 + - UU 1 R1 32 60 5443 3003 5512 1104 70M 1900M 70 1900 70 1900 70 1900 70 1900 0 0 0 0 1 540 5445 2 1106 3006
CONCAT0 chr1 1104 chr1 1103 + - UU 2 R1 60 60 1104 1103 3003 602 1900M 502M 1900 502 1900 502 1900 502 1900 502 0 0 0 0 1 604 1106 1 604 1106
CONCAT0 chr1 602 chr2 6455 + - UU 3 R1 60 60 602 6455 1103 5530 502M 926M 502 926 502 926 502 926 502 926 0 0 0 0 0 0 604 4 5532 6458
CONCAT0 chr2 5530 ! 0 + + UN 4 R1 60 0 5530 0 6455 0 926M None 926 0 926 0 926 0 926 5515 5532 -1 0 0
CONCAT0 ! 0 chr2 6538 - - NU 5 R1 0 51 0 6538 0 6456 None 83M 0 83 0 83 0 83 0 83 0 0 0 0 -1 0 0 5 6458 6541


(base) test_data$ cat porec_test.fasta_chrom.sizes
chr1 3577
chr2 7551
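
The header shown above declares `#shape: whole matrix`, and the first data rows contain both chr2/chr1 and chr1/chr2, which looks like the same mixed chromosome order diagnosed earlier in this thread. A minimal sketch of the same fix applied to a pairs file follows. It assumes the first seven columns are readID, chr1, pos1, chr2, pos2, strand1, strand2, as in the pairs v1.0 layout; the function name flip_pairs is made up. It drops the header and the extra columns, and whether pre accepts the result as is has not been confirmed in this thread.

```shell
# Hypothetical helper: reads a decompressed .pairs file on stdin, skips
# "#" header lines and records with "!" as a chromosome (unmapped side),
# and prints the first seven columns with the "smaller" chromosome name
# first, sorted by both chromosome columns.
flip_pairs() {
  awk 'BEGIN { OFS = "\t" }
    /^#/ { next }
    $2 == "!" || $4 == "!" { next }
    {
      if (($2 "") <= ($4 "")) print $1, $2, $3, $4, $5, $6, $7
      else                    print $1, $4, $5, $2, $3, $7, $6
    }' | LC_ALL=C sort -k2,2 -k4,4
}

# Possible usage:
#   gzip -dc porec_test.concatemers.pairs.gz | flip_pairs > flipped.pairs.txt
```

Only the chromosome order is normalised here; positions within the same chromosome are left as they are.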



Moshe Olshansky

Apr 17, 2023, 1:15:10 AM
to 3d-ge...@googlegroups.com
Hi Marine,

I am not familiar with the pairs format. Is it one record per contact? Does it include mapping quality (and if so, in which column)?

Regards,
Moshe.

Nastya Gridasova

Apr 17, 2023, 10:12:37 AM
to 3d-ge...@googlegroups.com
Hello,

The sequencing was done with Oxford Nanopore. I am trying to visualize Oxford Nanopore Pore-C test data.
The .pairs file (attached) was obtained by running:

sudo /home/ubuntu/nextflow run epi2me-labs/wf-pore-c --ubam test_data/porec_test.concatemers.bam --ref test_data/porec_test.fasta --coverage --pairs --cutter NlaIII


The commands I used to try to visualize the pairs file with juicer are attached.

Thank you very much for looking into it!
Best,
Anastasia

porec_test.concatemers.pairs.txt
juicer.rtf