STAR failing to index A to G converted human genome with --sjdbGTFfile

Harry Fischl

Oct 16, 2023, 11:50:21 AM
to rna-star
Hi,

I am trying to generate a STAR index for the human genome (GRCh38) in which all As have been converted to Gs, together with an Ensembl annotation GTF file.
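
For readers trying to reproduce this, here is a minimal sketch of how such an A-to-G converted FASTA might be produced. The post does not say how the converted file was actually made, so the awk one-liner and the file names example.fa and example_AtoGconverted.fa are assumptions for illustration only. The point it shows is that only sequence lines are converted, while header lines starting with ">" are left untouched so that chromosome names still match the GTF.

```shell
# Hypothetical tiny FASTA, used only to illustrate the conversion
printf '>chrA test\nACGTAACC\nacgtnn\n' > example.fa

# Convert A->G (and a->g, in case the input is soft-masked) on sequence lines only.
# Header lines (starting with ">") are printed unchanged.
awk '/^>/ { print; next } { gsub(/A/, "G"); gsub(/a/, "g"); print }' \
    example.fa > example_AtoGconverted.fa

cat example_AtoGconverted.fa
# >chrA test
# GCGTGGCC
# gcgtnn
```

Checking that the sequence names in the converted file are identical to those in the original (for example by comparing the output of grep '^>' on both files) is a cheap way to rule out a header mismatch with the annotation.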

The command I use is as follows:
STAR --runMode genomeGenerate --runThreadN 48 --genomeDir "STARIndex" --genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly_AtoGconverted.fa --sjdbGTFfile Homo_sapiens.GRCh38.108.gtf --sjdbOverhang 100

The last line in the log file is "Finished inserting junction indices". After this, the run gets stuck (it makes no further progress even after more than 12 hours).

The following command works fine on the original, unconverted genome FASTA:
STAR --runMode genomeGenerate --runThreadN 48 --genomeDir "STARIndex" --genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly.fa --sjdbGTFfile Homo_sapiens.GRCh38.108.gtf --sjdbOverhang 100

This completes in about 1 hour.

The following command, on the converted genome but without the GTF file, also works fine:
STAR --runMode genomeGenerate --runThreadN 48 --genomeDir "STARIndex" --genomeFastaFiles Homo_sapiens.GRCh38.dna.primary_assembly_AtoGconverted.fa

Any suggestions as to why this might not work?

Thanks,

Harry

Alexander Dobin

Oct 17, 2023, 10:36:37 AM
to rna-star
Hi Harry,

This could be caused by the reduced-alphabet genome being much more repetitive, which increases the time needed to sort long suffixes.
Please try --genomeSuffixLengthMax 300. This value has to be greater than the largest read length you expect to map.
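
As a sketch of what this suggestion looks like in practice, the index command from the original post can be rerun with the extra option appended. The file names and other options are copied from that post; the shell variable names (GENOME_FA, GTF, SUFFIX_MAX, STAR_CMD) and the guard that only runs STAR when the binary and inputs are present are illustrative assumptions, not part of STAR.

```shell
# Inputs as named in the original post
GENOME_FA="Homo_sapiens.GRCh38.dna.primary_assembly_AtoGconverted.fa"
GTF="Homo_sapiens.GRCh38.108.gtf"

# Must be greater than the longest read length you expect to map
SUFFIX_MAX=300

# Original genomeGenerate command plus the suggested option
STAR_CMD="STAR --runMode genomeGenerate --runThreadN 48 --genomeDir STARIndex --genomeFastaFiles $GENOME_FA --sjdbGTFfile $GTF --sjdbOverhang 100 --genomeSuffixLengthMax $SUFFIX_MAX"

# Print the command so it can be inspected before running
echo "$STAR_CMD"

# Run only if STAR is installed and both input files exist
if command -v STAR >/dev/null 2>&1 && [ -f "$GENOME_FA" ] && [ -f "$GTF" ]; then
    mkdir -p STARIndex
    $STAR_CMD
fi
```

If reads longer than 300 bases are expected, SUFFIX_MAX would need to be raised accordingly.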

Harry Fischl

Oct 19, 2023, 3:46:05 AM
to rna-star
Hi Alex,

Thanks for your reply. With the --genomeSuffixLengthMax 300 option added, the run still gets stuck at the same stage. Is there anything else you would recommend?

Thanks,

Harry
