Re: Help with SMART-3Seq analysis

Joe Foley

Jan 28, 2020, 12:56:15 PM

to Lakshmi Kuttippurathu, smart...@googlegroups.com

Hi Lakshmi,

What kind of samples are you using? We see low alignability (high amount of adapter dimers) when the libraries are made from small amounts of RNA, or from degraded RNA, or especially from both (FFPE LCM). 18% alignable is not unusual for FFPE LCM samples, but it would be low if you started with a nanogram of high-quality RNA.

I've never tried aligning to only the transcriptome, but that might be responsible for your low alignability. In Figures S11B, S12, S24C, and S27B in the supplement to the Genome Research paper, we showed that more than half of the alignments tend to be at sites other than the 3' ends of annotated transcripts, including a substantial proportion that do not align to any part of an annotated transcript. So if you exclude those parts of the genome from your reference, it would make sense that all the reads from those parts appear unalignable.

Even in the data from the synthetic ERCC transcripts we found reads aligning farther than expected from the 3' end, mainly at the 5' end (unfragmented RNAs) and upstream of internal poly(A) sites (alternative priming sites). But when we tried excluding all of those and only counting alignments at the 3' end, it greatly worsened the correlation of the read counts with the known RNA concentrations (Figure 2A), so I wouldn't recommend limiting your analysis to 3' ends alone.

JWF

On 1/27/20 12:55 PM, Lakshmi Kuttippurathu wrote:

Hello Dr Foley,

I am Lakshmi, a faculty member in the Dept. of Pathology, Thomas Jefferson University. Our lab is trying to analyze data that was generated using the Smart-3SEQ protocol as described in your publication.

However, I am facing some issues with the mapping percentage and the final exon/gene counts. It would be great if you could help me with some questions.

(1) When I use STAR to align my fastq files (after read trimming using umi_homopolymer.py), I get an extremely low mapping rate. The best I have gotten so far is about 16%. I have tried changing parameters to allow more mismatches, since these are short reads.

(2) I tried to align to the transcriptome (whole, and 250 bp from the 3' end). I am not sure whether this is a good strategy. What do you think?

(3) I am also trying to analyze the data that you used for this publication to see where I am going wrong. One of the datasets gave me about 18% mapping and 5.6% exon counts. Is this what is expected? I can share what I have and the parameters I used.

We have made many attempts to improve the experimental protocol to get the best results out of it, so I wanted to make sure that I am not missing anything in the data analysis. I can share more information regarding the exact steps if needed.

Any help would be greatly appreciated.

Thank you!

Lakshmi Kuttippurathu

laks...@gmail.com

Jan 28, 2020, 4:03:13 PM

to Smart-3SEQ

Hello,

Thank you very much for your help.

We use fresh-frozen neuron samples (rat) from heart. Sample sizes range from 1 cell to 100-micron-diameter circles picked with LCM.

I tried aligning to both the genome and the transcriptome. Thank you for clarifying the transcriptome mapping issue. As you mentioned, it reduced the mapping percentage. (IGV showed that the genome-mapped reads mapped everywhere, not just at 3' ends, so that explains it.)

Is there any particular parameter that you specifically used to run STAR and verse (feature counts)? Here are the parameters I am using.

STAR (I used 30 bp as the length of the genomic sequence around the annotated junction when creating the index):

STAR --runThreadN 60 --quantMode GeneCounts --genomeDir star_index_30 --readFilesIn R247-274-4-trim.fastq --outFileNamePrefix R247-274-4- --outFilterScoreMinOverLread 0.1 --outFilterMatchNminOverLread 0.1 --outFilterMatchNmin 0 --outFilterMismatchNmax 99

Verse:

verse Rattus_norvegicus.Rnor_6.0.95_clean.gtf --singleEnd -z 1 --readExtension3 200 --ignoreDup -t 'exon;three_prime_utr;CDS;gene' test.verse R247-274-4-Aligned.out.sam

Please let me know.

Thank you!

Lakshmi

Joe Foley

Jan 28, 2020, 4:27:34 PM

to smart...@googlegroups.com

Yes, those definitely sound like low-input samples, and from LCM on fresh-frozen tissue we've seen results similar to FFPE, probably due to the amount of extra handling involved in LCM. So given the sample quality and the alignment conditions I think 18% alignable reads is not surprising.
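For anyone wanting to compare their own alignable fraction against numbers like these, here is a minimal Python sketch that reads the summary statistics from the text of STAR's Log.final.out. It assumes the usual "field name | value" layout of that file; the function names parse_star_log and percent are made up for illustration and are not part of STAR or the Smart-3SEQ scripts. Check the field names against your own log before relying on them.

```python
def parse_star_log(text):
    """Return {field name: value string} from the text of a STAR Log.final.out file.

    Assumes each statistic is on its own line as "field name | value".
    """
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" have no separator
        key, _, value = line.partition("|")
        stats[key.strip()] = value.strip()
    return stats


def percent(stats, field):
    """Convert a percentage field such as '18.00%' to a float (18.0)."""
    return float(stats[field].rstrip("%"))
```

Usage would look something like percent(parse_star_log(open(path).read()), "Uniquely mapped reads %"), where path points to the Log.final.out that STAR writes next to its other output files.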

Table S3 in the supplement shows the parameters we used for STAR and featureCounts, viz.:

STAR --outFilterMultimapNmax 1 --outFilterMismatchNmax 999 --clip3pAdapterMMp 0.2 --clip3pAdapterSeq AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

featureCounts -s 1 --read2pos 5

This is after creating the STAR reference with "--sjdbOverhang 67", because we typically use 76 nt reads but the first 8 bases are the NNNNNGGG of the 2S primer, so the effective read length is 68 and therefore the overhang length is one less. It seems to work better to let STAR remove the poly(A), so umi_homopolymer.py reports the trimmed lengths for QC but doesn't actually trim the poly(A) from the sequences that go into STAR.
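The overhang arithmetic above, and how it feeds into the two STAR calls, can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names and the file names passed to them are hypothetical, the alignment flags are the ones quoted from Table S3 above, and the index-building flags are the standard STAR genomeGenerate options. It builds argument lists only; nothing is executed.

```python
PRIMER_PREFIX = "NNNNNGGG"  # first 8 bases of each read, from the 2S primer


def sjdb_overhang(read_length, prefix_length):
    """Effective read length minus one, the value passed to --sjdbOverhang."""
    effective = read_length - prefix_length
    if effective < 2:
        raise ValueError("read is too short after removing the primer prefix")
    return effective - 1


def star_index_args(genome_dir, fasta, gtf, overhang):
    """Argument list for building the STAR reference (paths are placeholders)."""
    return [
        "STAR", "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(overhang),
    ]


def star_align_args(genome_dir, fastq, prefix):
    """Argument list for alignment using the Table S3 parameters quoted above."""
    return [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq,
        "--outFileNamePrefix", prefix,
        "--outFilterMultimapNmax", "1",
        "--outFilterMismatchNmax", "999",
        "--clip3pAdapterMMp", "0.2",
        "--clip3pAdapterSeq", "A" * 30,  # the 30-base poly(A) adapter sequence
    ]


overhang = sjdb_overhang(76, len(PRIMER_PREFIX))  # 76 - 8 - 1 = 67
```

Either list could then be handed to subprocess.run. With a different read length, only the first argument to sjdb_overhang would change.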

--
You received this message because you are subscribed to the Google Groups "Smart-3SEQ" group.
To unsubscribe from this group and stop receiving emails from it, send an email to smart-3seq+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/smart-3seq/bafc2422-631d-403a-b6e3-52a298074adb%40googlegroups.com.

laks...@gmail.com

Jan 28, 2020, 4:51:39 PM

to Smart-3SEQ

Thanks! It is good to know that the mapping percentage we get is not too low for this method.

Thank you for sharing the parameters. I do not remember using those options for verse, so let me see whether they improve my exon/gene counts. With the parameters I used, the counts are extremely low.

Lakshmi
