awk '{$4=$4!="+"; $7=$7!="+"; n1=split($9, frag1, "_"); n2=split($10, frag2, "_"); } $2<=$5{print $1, $4, $2, $3, frag1[n1], $7, $5, $6, frag2[n2], $11, $12 }$5<$2{ print $1, $7, $5, $6, frag2[n2], $4, $2, $3, frag1[n1], $12, $11}' $VALIDPAIRS | sort -k3,3d -k7,7d -S 90 > ${TEMP}/$$_allValidPairs.pre_juicebox_sorted
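For reference, here is the awk part of that command on its own, wrapped in a hypothetical `to_juicer_short` shell function so it can be tried on a handful of lines before the sort is involved. It is a sketch of what the one-liner does, not HiC-Pro's own code: strands become 0 for `+` and 1 otherwise, the fragment index is whatever follows the last `_` in names like `HIC_chr1_6`, and each pair is written with the chromosome that compares smaller as a string first.

```shell
# Sketch (hypothetical helper): the awk step from the command above.
# Input : allValidPairs lines, 12 whitespace-separated columns.
# Output: 11 columns per line:
#   readname str1 chr1 pos1 frag1 str2 chr2 pos2 frag2 mapq1 mapq2
to_juicer_short() {
  awk '{
    # strand: "+" becomes 0, anything else becomes 1
    $4 = ($4 != "+"); $7 = ($7 != "+")
    # fragment index: the text after the last "_" (HIC_chr1_6 -> 6)
    n1 = split($9, frag1, "_"); n2 = split($10, frag2, "_")
  }
  # names compared as strings: keep the order if the first end sorts first
  $2 <= $5 { print $1, $4, $2, $3, frag1[n1], $7, $5, $6, frag2[n2], $11, $12 }
  # otherwise swap the two ends, MAPQ columns included
  $5 < $2  { print $1, $7, $5, $6, frag2[n2], $4, $2, $3, frag1[n1], $12, $11 }' "$@"
}
```

For example, `head -n 1000 "$VALIDPAIRS" | to_juicer_short` shows whether the converted lines have the expected 11 columns without waiting on the full file.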
And the .pre_juicebox_sorted file is still empty (0 KB). It seems that making changes (1) and (2) you suggested above still gives the same result. Could something be wrong with the format of my allValidPairs output that is causing the problem?
Here is the head of my .allValidPairs file:
SRR1658673.259994198 chr1 10776 + chr7 131533799 + 570 HIC_chr1_1 HIC_chr7_319776 0 42
SRR1658673.72579757 chr1 12259 + chr15 93563685 - 341 HIC_chr1_2 HIC_chr15_188640 2 40
SRR1658676.125252832 chr1 13028 + chr4 134982856 + 426 HIC_chr1_6 HIC_chr4_308064 30 42
SRR1658676.120863062 chr1 13028 - chr15 102348304 - 454 HIC_chr1_6 HIC_chr15_209860 31 42
SRR1658676.152863214 chr1 13028 - chr18 77468066 - 361 HIC_chr1_6 HIC_chr18_177304 30 42
SRR1658673.90423160 chr1 13028 - chr19 560618 + 439 HIC_chr1_6 HIC_chr19_1329 0 2
SRR1658673.201320596 chr1 13028 - chr19 16679875 + 297 HIC_chr1_6 HIC_chr19_54907 31 42
SRR1658676.103541592 chr1 13029 + chr3 78354693 - 456 HIC_chr1_6 HIC_chr3_194482 6 23
SRR1658676.219111220 chr1 13029 - chr11 111314436 - 393 HIC_chr1_6 HIC_chr11_265725 31 42
SRR1658676.81193276 chr1 13031 + chr15 86012106 + 365 HIC_chr1_6 HIC_chr15_168511 11 42
The whole file is 70 GB.
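One way to test the format question directly is a single awk pass that flags any line not matching the 12-column layout shown above. The rules (field count, `+`/`-` strands, integer positions and MAPQs, fragment names ending in `_<number>`) are assumptions read off these ten lines rather than taken from the HiC-Pro specification, and `check_validpairs` is a hypothetical helper name.

```shell
# Sketch (hypothetical helper): flag allValidPairs lines that do not match the
# 12-column layout seen in the head of the file. The rules are assumptions
# based on those example lines. Up to 5 offending lines are reported on
# stderr; the total number of bad lines is printed on stdout.
check_validpairs() {
  awk '
    function bad(msg) {
      nbad++
      if (nbad <= 5) printf "line %d: %s\n", NR, msg > "/dev/stderr"
    }
    NF != 12                               { bad("expected 12 fields, found " NF); next }
    $4 !~ /^[+-]$/ || $7 !~ /^[+-]$/       { bad("strand is not + or -"); next }
    $3 !~ /^[0-9]+$/ || $6 !~ /^[0-9]+$/   { bad("position is not an integer"); next }
    $9 !~ /_[0-9]+$/ || $10 !~ /_[0-9]+$/  { bad("fragment name does not end in _<number>"); next }
    $11 !~ /^[0-9]+$/ || $12 !~ /^[0-9]+$/ { bad("MAPQ is not an integer"); next }
    END { print nbad + 0 }' "$@"
}
```

A result of `0` from `check_validpairs "$VALIDPAIRS"` means every line passed these particular checks; it is one linear pass, so on 70 GB it takes a while but needs almost no memory.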
Problem with creating fragment-delimited maps, NullPointerException.
This could be due to a null fragment map or to a mismatch in the chromosome name in the fragment map vis-a-vis the input file or chrom.sizes file.
Exiting.
done !
Does that point to a formatting error in my file?
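The error message itself points at chromosome names, so a quick check is to list names that appear in columns 2 and 5 of the pairs file but not in the first column of the chrom.sizes file. This sketch assumes a whitespace-separated, non-empty chrom.sizes with the name in column 1; `missing_chroms` is a hypothetical helper, and the same comparison can be pointed at the fragment file if its chromosome names are also in the first column.

```shell
# Sketch (hypothetical helper): list chromosome names used in columns 2 and 5
# of an allValidPairs file that are missing from column 1 of a second file
# (chrom.sizes, or another whitespace-separated file with the name first).
# Assumes the second file is not empty; NR == FNR marks its lines.
# Usage: missing_chroms PAIRS_FILE SIZES_FILE
missing_chroms() {
  awk 'NR == FNR { known[$1] = 1; next }
       !($2 in known) { miss[$2] = 1 }
       !($5 in known) { miss[$5] = 1 }
       END { for (c in miss) print c }' "$2" "$1" | sort
}
```

Empty output means every chromosome name in the pairs file is also present in the reference file; any name it prints (for example a `chr` prefix mismatch) is a candidate for the NullPointerException.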
${TEMP}/$$_allValidPairs.pre_juicebox_sorted is in ./tmp, where . is the directory I ran the script from. Is it odd that, when I include the sort step after the awk command, the script shows no sign of crashing (it doesn't quit and return to the command prompt), yet the pre_juicebox_sorted file stays at 0 KB the whole time? Or is that what we should expect to see if the sort never finished and the program crashed?
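On the 0 KB question: sort cannot write its first output line until it has read all of its input, because the last input line might belong at the top, and the shell redirection creates the output file empty right away. So a 0-byte output while the job is still running is expected and does not by itself mean a crash. Separately, GNU sort reads `-S 90` as 90 KiB (K is the default unit), which on a 70 GB input likely means a very large number of temporary chunks to write and merge. The sketch below is a hypothetical `sort_pairs` wrapper that asks for a share of memory instead and names the temporary directory explicitly; the `50%` value and the directory are assumptions to adjust for your machine.

```shell
# Sketch (hypothetical helper): the sort step with an explicit buffer size
# and temporary directory (GNU sort options).
#   -S 50%  : use up to half of physical memory for the in-memory buffer
#             (plain "-S 90" would mean 90 KiB)
#   -T DIR  : where sort writes its temporary chunks; it may need free
#             space comparable to the size of the input
# OUT is created empty by the redirection at once; sort only starts writing
# to it after all of IN has been read.
# Usage: sort_pairs IN OUT [TMPDIR]
sort_pairs() {
  sort -k3,3d -k7,7d -S 50% -T "${3:-${TMPDIR:-/tmp}}" "$1" > "$2"
}
```

While it runs, watching the temporary directory (for example with `ls -la` or `du -sh`) shows whether sort is still making progress even though the output file is still empty.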
Thank you so much for your help with running the script and the whole pipeline :)
awk '{$4=$4!="+"; $7=$7!="+"; n1=split($9, frag1, "_"); n2=split($10, frag2, "_"); } $2<=$5{print $1, $4, $2, $3, frag1[n1], $7, $5, $6, frag2[n2], $11, $12 }$5<$2{ print $1, $7, $5, $6, frag2[n2], $4, $2, $3, frag1[n1], $12, $11}' $VALIDPAIRS > ./tmp/test.out
##sort -k3,3d -k7,7d -S 90 ./tmp/test.out > ${TEMP}/$$_allValidPairs.pre_juicebox_sorted
##awk '{$4=$4!="+"; $7=$7!="+"; n1=split($9, frag1, "_"); n2=split($10, frag2, "_"); } $2<=$5{print $1, $4, $2, $3, frag1[n1], $7, $5, $6, frag2[n2], $11, $12 }$5<$2{ print $1, $7, $5, $6, frag2[n2], $4, $2, $3, frag1[n1], $12, $11}' $VALIDPAIRS > ./tmp/test.out
##sort -k3,3d -k7,7d ./tmp/test.out > ${TEMP}/$$_allValidPairs.pre_juicebox_sorted
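When `awk ... | sort ...` runs as a single pipeline, the shell reports only sort's exit status, so a failing awk can go unnoticed (in bash, `set -o pipefail` changes that). Running the steps separately, as in the script above, makes each one checkable. This hypothetical `run_step` helper runs one command into a file and stops with a message if the command fails or leaves the file empty; it is a debugging sketch, not part of HiC-Pro.

```shell
# Sketch (hypothetical helper): run one command with stdout sent to OUT,
# then fail if the command returned non-zero or OUT ended up empty.
# Usage: run_step OUT command [args...]
run_step() {
  out=$1; shift
  "$@" > "$out"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "step failed with exit status $status: $*" >&2
    return "$status"
  fi
  if [ ! -s "$out" ]; then
    echo "step finished but $out is empty: $*" >&2
    return 1
  fi
}
```

For example, `run_step "${TEMP}/$$_allValidPairs.pre_juicebox_sorted" sort -k3,3d -k7,7d ./tmp/test.out` either leaves a non-empty sorted file or prints which of the two conditions went wrong.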