bash $GOBY_HOME/goby 16g stc \
-i $1 \
-o $2 \
-g $3 \
--preserve-all-mapped-qualities \
--preserve-all-tags \
--preserve-soft-clips \
--preserve-read-names \
-x AlignmentWriterImpl:permutate-query-indices=false \
-x SAMToCompactMode:ignore-read-origin=false \
-x MessageChunksWriter:codec=hybrid-1 \
-x AlignmentCollectionHandler:enable-domain-optimizations=true \
-x MessageChunksWriter:compressing-codec=true
Where:
$1 - input BAM file
$2 - output Goby basename
$3 - genome, after running build-sequence-cache
The conversion back to BAM uses bam-to-compact. samtools is used to convert BAM to SAM. The BAM file has about 100K records. The stats for the Goby entries are attached.
The first 100 lines of the input and output BAMs (in SAM format) are also attached. I have sorted the attributes by name to make them easier to compare.
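For anyone who wants to reproduce the comparison, the attribute sorting can be sketched roughly as follows. This is only a minimal sketch, not necessarily how the attached files were produced: the function name `normalize_sam` is made up for illustration, and it assumes tab-separated SAM text on standard input with optional tags starting at column 12.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): sort the optional TAG:TYPE:VALUE fields
# (columns 12 and up) of each SAM record by tag name. Header lines and the
# 11 mandatory columns are passed through unchanged.
normalize_sam() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^@/ { print; next }            # header lines: print as-is
       {
         n = 0
         for (i = 12; i <= NF; i++) tags[++n] = $i
         # insertion sort with forced string comparison; a record only
         # carries a handful of tags, so this is cheap enough
         for (i = 2; i <= n; i++) {
           t = tags[i]
           for (j = i - 1; j >= 1 && (tags[j] "") > (t ""); j--)
             tags[j + 1] = tags[j]
           tags[j + 1] = t
         }
         line = $1
         for (i = 2; i <= 11 && i <= NF; i++) line = line OFS $i
         for (i = 1; i <= n; i++) line = line OFS tags[i]
         print line
       }'
}
```

A possible way to use it, with placeholder file names, would be `samtools view input.bam | head -n 100 | normalize_sam > input.sorted.sam`, then the same for the round-tripped BAM, followed by `diff` on the two results.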
As you can see, they are not the same. If you look at the first line, you will notice that the first difference is in the RNEXT & PNEXT columns (8,9).
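To pin down exactly which mandatory fields change in the round trip, something along these lines could be run on the two SAM files. It is a rough sketch under assumptions: `diff_sam_columns` is a hypothetical helper name, both files are expected to hold the same records in the same order, and the column names are listed in the order given by the SAM specification.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): compare two SAM files record by record and
# report every mandatory column whose value differs.
# Usage: diff_sam_columns original.sam roundtrip.sam
diff_sam_columns() {
  awk -F'\t' '
    BEGIN {
      # mandatory SAM columns, in specification order (1..11)
      split("QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL", name, " ")
    }
    NR == FNR {                        # first file: remember alignment records
      if ($0 !~ /^@/) a[++na] = $0
      next
    }
    /^@/ { next }                      # second file: skip header lines
    {
      nb++
      split(a[nb], f, "\t")
      for (i = 1; i <= 11; i++)
        if ((f[i] "") != ($i ""))      # force string comparison
          printf "record %d: %s differs (%s vs %s)\n", nb, name[i], f[i], $i
    }' "$1" "$2"
}
```

The output is one line per differing field, which makes it easier to see whether the mismatch is limited to the mate fields or also affects other columns.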
Could you please comment on these differences and suggest how to get to a lossless bam -> compact -> bam process?
Thank you again!