Incompatible read groups in gstacks 2.68


sedg...@gmail.com

May 9, 2025, 5:24:40 PM
to Stacks
Hi,

I'm working with three sets of samples that I'm trying to combine in a single analysis. There are 384 samples in total, and I've selected 81 for inclusion in the catalog. In several attempts at gstacks, it appears that samples that aren't in the catalog are all labelled with ID 82:

```
Reading BAM headers...
Error: Incompatible read groups (ID:82, SM:BWA13) and (ID:82, SM:BWA1).
Error: In BAM files './stacks25/BWA13.matches.bam' and './stacks25/BWA1.matches.bam.
```

I wonder if this is a regression of the bug that was fixed in version 2.67, or if I've done something wrong? The only thing I can think of is that I used process_radtags 2.67 output with the rest of the pipeline running 2.68. Could that be my mistake?

I'll paste the steps I've done below. I'd really like to avoid using all 384 samples in the catalog, as this is taking a lot of time, and will keep growing as I add new samples each year.

Thanks,

Tyler

I used v2.67 of process_radtags with the following arguments:
process_radtags -1 <SAMPLE>.R1.fastq.gz \
                  -2 <SAMPLE>.R2.fastq.gz \
                  -b <BARCODES> -o <OUTDIR> \
                  --renz-1 nsiI --renz-2 mspI --inline-null -c -q -r --threads 16

From here onward I used v2.68:

ustacks -t gzfastq -f ${READS_DIR}/${SAMPLE}.1.fq.gz \
          -o ${OUT_DIR} -m 3 --name ${SAMPLE} -M $M -p ${THREADS}

cat25Popmap.tsv contains 81 samples:

cstacks -n 4 -P stacks25/ -M cat25Popmap.tsv -p 48

sstacks -c ${DIR} -s ${DIR}/${SAMPLE} -p $THREADS

tsv2bam -P stacks25 -s $SAMPLE -R ./prort/all_files/ -t $THREADS

gstacks -P ./${DIR}/ -M gs25Popmap.tsv -t 32                         
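
Before rerunning gstacks, it may help to see which samples share a read-group ID. The error message suggests each BAM header carries an `@RG` line with `ID` and `SM` tags. The sketch below reads header lines on stdin and prints every ID that is attached to more than one sample name. `find_dup_rg_ids` is a hypothetical helper written for this thread, not part of Stacks, and the usage line assumes `samtools` is installed.

```shell
# Hypothetical helper (not part of Stacks): report read-group IDs that are
# shared by more than one sample name.
# Input on stdin: BAM header lines; only tab-separated @RG lines are used.
# Output: one line per colliding ID, as "ID<TAB>SM1,SM2,...".
find_dup_rg_ids() {
    awk -F'\t' '
        $1 == "@RG" {
            id = ""; sm = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^ID:/) id = substr($i, 4)
                if ($i ~ /^SM:/) sm = substr($i, 4)
            }
            if (id == "" || sm == "") next
            key = id SUBSEP sm
            if (!(key in seen)) {
                seen[key] = 1
                count[id]++
                if (id in names) names[id] = names[id] "," sm
                else names[id] = sm
            }
        }
        END {
            for (id in count)
                if (count[id] > 1) print id "\t" names[id]
        }
    '
}

# Possible usage, assuming samtools is available and the BAM files are in
# ./stacks25/ as in the error message:
#   for f in ./stacks25/*.matches.bam; do samtools view -H "$f"; done | find_dup_rg_ids
```

If the explanation in the reply below applies, every sample outside the catalog would be expected to show up under the same ID.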

Catchen, Julian

May 12, 2025, 5:44:17 PM
to stacks...@googlegroups.com

Hi Tyler,

 

Using an older version of process_radtags should not be related to this problem. I think you are running sstacks repeatedly in a shell loop, once per sample? Instead, I think you need to give sstacks a popmap containing all the samples, so that sstacks runs only one time and loops itself over all the samples supplied in the popmap. If you had all samples in the catalog, running it per sample would be fine. Since you have a subset, sstacks checks the catalog and increments the ID from the last one in the catalog, and each separate run starts from that same point. That should explain why you get the same ID repeated multiple times. The bug fixed in 2.67 would have misnumbered two of the samples, but all the rest would have incremented as expected.
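
A minimal sketch of that suggestion, for illustration only. The directory, popmap, and raw-read paths are taken from the original post, and it assumes `gs25Popmap.tsv` lists every sample to be genotyped. The `run` wrapper and the `DRY_RUN` switch are hypothetical additions, so the commands are only printed by default. tsv2bam is rerun because the existing BAM files already carry the repeated ID.

```shell
# Sketch only. Paths come from the original post; THREADS, DRY_RUN and run()
# are assumptions added for illustration.
DIR=stacks25
POPMAP=gs25Popmap.tsv      # assumed to list every sample to be genotyped
THREADS=${THREADS:-16}
DRY_RUN=${DRY_RUN:-1}      # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# One sstacks invocation that loops over all samples in the popmap,
# instead of one invocation per sample from a shell loop.
run sstacks -P "$DIR" -M "$POPMAP" -p "$THREADS"

# Rebuild the BAM files from the new matches, then genotype.
run tsv2bam -P "$DIR" -M "$POPMAP" -R ./prort/all_files/ -t "$THREADS"
run gstacks -P "$DIR" -M "$POPMAP" -t "$THREADS"
```

Setting `DRY_RUN=0` would execute the three steps. The catalog built by cstacks from the 81-sample popmap is left untouched.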

 

Best,

 

Julian
