STARsolo 2.7.0e - A problem in reading the barcode file


Miri

Mar 15, 2019, 8:58:04 AM
to rna-star
Hi all
I'm trying to use the latest version (2.7.0e) of STARsolo, but for some reason, it can't read the barcode file.
This is the error I get:
EXITING because of FATAL ERROR in input read file: the total length of barcode sequence is 0 not equal to expected 26
Read ID=@A00187:37:HFM7VDMXX:2:1101:1687:1000 1 N 0   Sequence=
SOLUTION: make sure that the barcode read is the second in --readFilesIn and check that is has the correct formatting
If UMI+CB length is not equal to the barcode read length, specify barcode read length with --soloBarcodeReadLength

And this is the head of the file:

@A00187:37:HFM7VDMXX:1:1101:1723:1000 1:N:0:TCAGCCGT
TTCGGTCAGCTAACAATACATCCTTG
+
FFF:FFFFFFFFFFFF:FFFFFFF:F
@A00187:37:HFM7VDMXX:1:1101:1958:1000 1:N:0:TCAGCCGT
GACGTGCGTGAGGGTTCCATAGGTCC
+
FFFFFFFFFFFFFFFFFFFFFFFFFF
@A00187:37:HFM7VDMXX:1:1101:2446:1000 1:N:0:TCAGCCGT
TTGACTTAGAGTCTGGACTACAGTTC
+
FFFFFFFFFFFFFFFFFFFFFFFFFF

Any suggestion can help here.

Thanks!

Miri.


Alexander Dobin

Mar 15, 2019, 3:38:21 PM
to rna-star
Hi Miri,

please make sure that the barcode file is 2nd in --readFilesIn, i.e.
--readFilesIn cDNAfragment.fastq.gz Barcode.fastq.gz
Note that for 10X protocol this means that Read2 goes first, i.e.
--readFilesIn Read2.fq.gz Read1.fq.gz

Cheers
Alex
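
To confirm which file actually holds the barcode read, it can help to measure the read length before launching STAR. The sketch below is only an illustration: `first_read_len` is a hypothetical helper (not part of STAR), the demo file is a one-record stand-in built from the head shown above, and the 16 + 10 split is an assumption that matches the "expected 26" in the error message.

```shell
# Hypothetical helper (not part of STAR): print the sequence length of the
# first record in a gzipped FASTQ file.
first_read_len() {
    # Line 2 of a FASTQ record is the sequence
    gzip -cd "$1" | awk 'NR == 2 { print length($0); exit }'
}

# Demo on a one-record stand-in file built from the head posted above
tmp=$(mktemp -d)
printf '%s\n' \
    '@A00187:37:HFM7VDMXX:1:1101:1723:1000 1:N:0:TCAGCCGT' \
    'TTCGGTCAGCTAACAATACATCCTTG' \
    '+' \
    'FFF:FFFFFFFFFFFF:FFFFFFF:F' | gzip -c > "$tmp/Barcode.fastq.gz"

CB_LEN=16    # assumed cell barcode length
UMI_LEN=10   # assumed UMI length
len=$(first_read_len "$tmp/Barcode.fastq.gz")
if [ "$len" -eq $((CB_LEN + UMI_LEN)) ]; then
    echo "barcode read length $len matches CB+UMI"
else
    echo "barcode read length $len differs from CB+UMI"
fi
```

The file whose reads have the CB+UMI length is the one that goes second in --readFilesIn.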

Miri

Mar 16, 2019, 6:19:21 PM
to rna-star
Now I realize that I put a comma between the file names instead of a space. 
Now it's running.
Thanks for the support and your excellent aligner! 
Miri. 
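
For later readers: in --readFilesIn a space separates the mates, while a comma joins several files of the same mate (for example, multiple lanes). With a comma between the two names, STAR sees a single mate and no barcode read, which is consistent with the "length ... is 0" error above. The snippet below is a small sketch of that splitting rule; `count_mates` is a hypothetical helper, not a STAR feature.

```shell
# Hypothetical helper: count how many mate groups a --readFilesIn value has.
# Spaces separate mates; commas join several files belonging to the same mate.
count_mates() {
    # Unquoted $* is split on whitespace only, so "a,b" stays a single word
    set -- $*
    echo "$#"
}

count_mates "Read2.fq.gz,Read1.fq.gz"   # comma: one mate group, no barcode read
count_mates "Read2.fq.gz Read1.fq.gz"   # space: cDNA read first, barcode read second
```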

Kathie Mihindukulasuriya

Jun 12, 2019, 8:14:00 PM
to rna-star
Hi Alex,
I am trying to run STARsolo (2.7.1a) but it does not recognize my 10X barcode file:

 /home/mihinduk/STAR-2.7.1a/bin/Linux_x86_64/STAR --runThreadN 10 --runMode alignReads --soloType Droplet --soloCBwhitelist /usr/local/genome/cellranger-3.0.2/cellranger-cs/3.0.2/lib/python/cellranger/barcodes/3M-february-2018.txt.gz  --readFilesIn TWEK-blood_pool-blood_pool-lib1_S5_L003_R2_001.fastq.gz TWEK-blood_pool-blood_pool-lib1_S5_L003_R1_001.fastq.gz --readFilesCommand zcat --soloFeatures GeneFull --genomeDir /home/public/kathie/GRCh38-3.0.0.premrna --outFileNamePrefix TWEK-blood_ --outStd BAM_SortedByCoordinate --outReadsUnmapped Fastx

EXITING because of FATAL ERROR in input CB whitelist file: /usr/local/genome/cellranger-3.0.2/cellranger-cs/3.0.2/lib/python/cellranger/barcodes/3M-february-2018.txt.gz the total length of barcode sequence is 85 not equal to expected 26
SOLUTION: make sure that the barcode read is the second in --readFilesIn and check that is has the correct formatting

gzip -cd /usr/local/genome/cellranger-3.0.2/cellranger-cs/3.0.2/lib/python/cellranger/barcodes/3M-february-2018.txt.gz | head
AAACCCAAGAAACACT
AAACCCAAGAAACCAT
AAACCCAAGAAACCCA
AAACCCAAGAAACCCG
AAACCCAAGAAACCTG
AAACCCAAGAAACGAA
AAACCCAAGAAACGTC
AAACCCAAGAAACTAC
AAACCCAAGAAACTCA
AAACCCAAGAAACTGC

gzip -cd TWEK-blood_pool-blood_pool-lib1_S5_L003_R1_001.fastq.gz | head
@A00580:81:HKLV3DSXX:3:1101:2329:1016 1:N:0:TTGTTGAT
TCAATCTAGTAACCTCGCCCTATCTCGC
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00580:81:HKLV3DSXX:3:1101:2383:1016 1:N:0:TTGTTGAT
GTGGAGAAGAAGGGATGTTTATATTGAG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00580:81:HKLV3DSXX:3:1101:2672:1016 1:N:0:TTGTTGAT
CCTCCTCCAGACTCTATATGGGATTCGC

I tried adding --soloBarcodeReadLength 85

But then I get this:

EXITING because of FATAL ERROR in input CB whitelist file: /usr/local/genome/cellranger-3.0.2/cellranger-cs/3.0.2/lib/python/cellranger/barcodes/3M-february-2018.txt.gz the total length of barcode sequence is 85 not equal to expected 85
SOLUTION: make sure that the barcode read is the second in --readFilesIn and check that is has the correct formatting

Help!

Kathie

Alexander Dobin

Jun 13, 2019, 9:53:32 AM
to rna-star
Hi Kathie,

please unzip the whitelist file 3M-february-2018.txt.gz; it has to be a plain text file.

Cheers
Alex
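
A minimal sketch of that step, assuming the whitelist holds one barcode per line: decompress it next to the original and check that every line has the expected length before passing the plain-text file to --soloCBwhitelist. `unpack_whitelist` is a hypothetical helper, and the demo runs on a two-line stand-in file rather than the real 3M-february-2018.txt.gz.

```shell
# Hypothetical helper: decompress a gzipped whitelist next to the original
# and confirm that every barcode line has the expected length.
unpack_whitelist() {
    gz="$1"; len="$2"
    out="${gz%.gz}"
    gzip -cd "$gz" > "$out" || return 1
    # awk exits non-zero at the first line whose length differs from len
    awk -v n="$len" 'length($0) != n { exit 1 }' "$out" || return 1
    echo "$out"
}

# Demo on a two-line stand-in whitelist
tmp=$(mktemp -d)
printf 'AAACCCAAGAAACACT\nAAACCCAAGAAACCAT\n' | gzip -c > "$tmp/whitelist.txt.gz"
unpack_whitelist "$tmp/whitelist.txt.gz" 16
```

The printed path is the plain-text file to give to --soloCBwhitelist.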

Kathie Mihindukulasuriya

Jun 13, 2019, 10:04:23 AM
to rna-star
That fixed the problem.  Thanks

Kathie Mihindukulasuriya

Jun 13, 2019, 10:20:37 AM
to rna-star
Hi Alex,

We are trying STARsolo for single-nucleus RNA-seq. We have been using a pre-mRNA reference and GTF with CellRanger. Is STARsolo creating this under the hood from the genomic reference and GTF, or should we be using the pre-mRNA reference, like we did with CellRanger?

Thank you for your help,
Kathie

Alexander Dobin

Jun 14, 2019, 10:05:04 AM
to rna-star
Hi Kathie,

you can run STARsolo with the --soloFeatures GeneFull option; it will generate the gene-cell count matrix for pre-mRNA, i.e. it counts reads that overlap the full gene (exons + introns).
You can also run --soloFeatures Gene GeneFull, which will create both the normal (mature RNA) exonic counts and the pre-mRNA counts.

Cheers
Alex
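
As a rough sketch of what such a call could look like, the dry run below only prints the command instead of executing it. It uses flags that already appear in this thread; READ2, READ1, GENOME and WHITELIST are placeholder paths, and `solo_cmd` is a hypothetical wrapper.

```shell
# Dry run: print a STAR command rather than executing it.
# All four paths below are placeholders, not real files.
READ2=cDNA_R2.fastq.gz        # cDNA read goes first
READ1=barcode_R1.fastq.gz     # barcode (CB+UMI) read goes second
GENOME=/path/to/genomeDir     # standard genomic index
WHITELIST=whitelist.txt       # plain-text (unzipped) barcode whitelist

solo_cmd() {
    echo STAR --soloType Droplet \
        --soloCBwhitelist "$WHITELIST" \
        --readFilesIn "$READ2" "$READ1" \
        --readFilesCommand zcat \
        --genomeDir "$GENOME" \
        --soloFeatures Gene GeneFull
}

solo_cmd
```

Removing the echo would run the command for real, provided STAR is installed and the placeholder paths are replaced.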

Jonathan Keats

Aug 28, 2019, 3:48:02 PM
to rna-...@googlegroups.com
Hi Alex,

Is there a special setting or way you need to create the genome index to support GeneFull? I'm getting a segmentation fault with any --soloFeatures option that includes "GeneFull", where:
Gene
Gene SJ
work fine while:
GeneFull
Gene GeneFull
Gene SJ GeneFull
all create a segmentation fault after first stage mapping (tested 2.7.1a and 2.7.2a)


Aug 28 12:32:14 ..... started STAR run
Aug 28 12:32:14 ..... loading genome
Aug 28 12:32:42 ..... started 1st pass mapping
Segmentation fault (core dumped)


Example code block:
STAR \
    --runMode alignReads \
    --runThreadN 19 \
    --genomeDir /home/tgenref/homo_sapiens/grch38_hg38/hg38tgen/gene_model/ensembl_v97/tool_resources/star_2.7.1a/100bpReads \
    --genomeLoad NoSharedMemory \
    --twopassMode Basic \
    --sjdbOverhang 99 \
    --readFilesType Fastx \
    --readFilesIn \
        MC1685_0003_3_PB_MNC_C1_X5SCR_F01501_G3C84_AACCCGTT_L001_R2_001.fastq.gz \
        MC1685_0003_3_PB_MNC_C1_X5SCR_F01501_G3C84_AACCCGTT_L001_R1_001.fastq.gz \
    --readFilesCommand zcat \
    --outFileNamePrefix MC1685_0003_3_PB_MNC_C1_X5SCR_ \
    --outSAMtype BAM SortedByCoordinate \
    --outSAMmode Full \
    --outSAMunmapped None \
    --outSAMmapqUnique 255 \
    --outSAMattributes NH HI AS nM CR CY UR UY \
    --soloType Droplet \
    --soloCBwhitelist /packages/cellranger/3.0.2/cellranger-cs/3.0.2/lib/python/cellranger/barcodes/737K-august-2016.txt \
    --soloFeatures Gene SJ GeneFull \
    --soloUMIdedup 1MM_All \
    --soloCBstart 1 \
    --soloCBlen 16 \
    --soloUMIstart 17 \
    --soloUMIlen 10 \
    --soloBarcodeReadLength 0 \
    --soloStrand Reverse \
    --soloOutFileNames Solo.out/ MC1685_0003_3_PB_MNC_C1_X5SCR_genes.tsv MC1685_0003_3_PB_MNC_C1_X5SCR_barcodes.tsv MC1685_0003_3_PB_MNC_C1_X5SCR_matrix.mtx MC1685_0003_3_PB_MNC_C1_X5SCR_matrixSJ.mtx MC1685_0003_3_PB_MNC_C1_X5SCR_matrixGeneFull.mtx

Maybe it's an issue with the Ensembl v97 GTF used to build the index?

Alexander Dobin

Aug 29, 2019, 11:35:23 AM
to rna-star
Hi Jonathan,

please try the later releases 2.7.1a or 2.7.2a as some bugs have been fixed.
Or even the development branch https://github.com/alexdobin/STAR/tree/soloDevelop which will soon be released.
If it still causes seg-faults, please send me the Log.out file.

Cheers
Alex

Bryce Turner

Aug 29, 2019, 3:37:23 PM
to rna-star
Hi Alex,

I've attached the Log.out for the example in Jonathan's earlier post. Let me know if any other information is needed!

Thank you for your help,

Bryce

MC1685_0003_3_PB_MNC_C1_X5SCR_Log.out

Alexander Dobin

Aug 29, 2019, 4:20:57 PM
to rna-star
Hi Bryce,

thanks - nothing suspicious in the Log.out file...
Could you please try to run it with --readMapNumber 1000 (or higher until it seg-faults), and then cut that many reads from the fastq R1 and R2, and send them to me?

Thanks!
Alex
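
One way to cut the first N reads from each file is sketched below, assuming standard 4-line FASTQ records (no wrapped sequence lines). `first_reads` is a hypothetical helper, and the demo uses a three-record stand-in file. The same N must be used for R1 and R2 so the two files stay in sync.

```shell
# Hypothetical helper: print the first N records of a gzipped FASTQ file,
# assuming each record occupies exactly 4 lines.
first_reads() {
    n="$1"; fq="$2"
    gzip -cd "$fq" | head -n "$((n * 4))"
}

# Demo on a three-record stand-in file
tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nFFFF\n@r2\nTTGA\n+\nFFFF\n@r3\nCCAT\n+\nFFFF\n' \
    | gzip -c > "$tmp/R1.fastq.gz"

# Keep the first 2 reads and write them to a new gzipped file
first_reads 2 "$tmp/R1.fastq.gz" | gzip -c > "$tmp/R1.first2.fastq.gz"
gzip -cd "$tmp/R1.first2.fastq.gz" | wc -l
```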

Alexander Dobin

Sep 1, 2019, 2:57:55 PM
to rna-star
Hi Bryce,

thanks for the files!
I think the problem is related to the 2-pass mode, and it was fixed on the development branch:
Please try it out and let me know if the problem is fixed. I will make a 2.7.3 release out of it next week, hopefully.

Cheers
Alex