Finding simple sequence repeat (SSR) markers for a given disease

setar...@gmail.com

Jul 19, 2016, 1:03:20 AM
to trinityrnaseq-users

Hi all experts,


First of all, please accept my apology if this question isn't relevant here. However, I found that this forum is more active than others, so please let me consult with you.

I have Illumina sequencing reads from healthy and diseased mice. I want to compare the simple sequence repeats (SSRs) between the healthy and diseased groups and discover disease-related SSR markers. This is my first experience with this kind of analysis, so I was wondering what a suitable pipeline for this work would be. Any comments and suggestions would be highly appreciated.



Thanks in advance

Mark Chapman

Jul 19, 2016, 2:41:06 AM
to maryam moazam, trinityrnaseq-users
Hello,
Do you know the gene which is mutated and causing the trait (1)? Or are you trying to find the gene using SSRs (2)?
1. If you know the gene, you can hope there is an SSR in that gene that differs between diseased and non-diseased mice, and then use this to tell diseased and non-diseased mice apart.
2. If you don't know the gene, you'll be doing an association analysis: genotyping hundreds of markers in hundreds of mice. Is this what you're trying to do?
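
As a rough illustration of what option 2 involves, the sketch below tests a single marker for association by comparing allele counts in diseased and healthy mice with a Pearson chi-square statistic, and shows how the per-marker significance level shrinks when hundreds of markers are tested. The function names and the two-allele coding are assumptions made for the example; real SSRs are often multi-allelic and would need a larger table.

```python
def allele_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square statistic (1 d.f.) for a 2x2 allele-count table."""
    n = case_a + case_b + ctrl_a + ctrl_b
    row_case, row_ctrl = case_a + case_b, ctrl_a + ctrl_b
    col_a, col_b = case_a + ctrl_a, case_b + ctrl_b
    denom = row_case * row_ctrl * col_a * col_b
    if denom == 0:
        return 0.0  # an empty row or column makes the marker uninformative
    return n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / denom


def bonferroni_threshold(alpha, n_markers):
    """Per-marker significance level after Bonferroni correction."""
    return alpha / n_markers


# Hypothetical counts: allele A seen 30 times in diseased mice, 15 in healthy.
statistic = allele_chi_square(30, 10, 15, 25)
per_marker_alpha = bonferroni_threshold(0.05, 500)
```

The statistic would then be compared against a chi-square distribution with one degree of freedom at the corrected threshold, which is why this design needs many animals as well as many markers.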
Cheers, Mark

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at https://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.



--
Dr. Mark A. Chapman
+44 (0)2380 594396
------------------------------------
Centre for Biological Sciences
University of Southampton
Life Sciences Building 85
Highfield Campus
Southampton
SO17 1BJ

setar...@gmail.com

Jul 19, 2016, 3:35:57 AM
to trinityrnaseq-users
Hi Mark, thank you for your response.

Regarding your questions: (1) no, I have no previous knowledge of a gene or SSR. I have to find SSRs that are probably involved in the disease, in the hope of using them as SSR markers for this kind of disease.

I have three transcriptome libraries for healthy mice and three for diseased mice, generated from the mRNA fraction. As the mouse genome is available, I am thinking of generating a genome-guided transcriptome assembly for healthy and diseased mice separately. Then the two transcriptome datasets will be screened for SSRs, followed by statistical analysis to compare the motifs frequency between them. After this analysis, I could identify SSR motifs that differ between healthy and diseased mice, along with their positions in the genes (given the mouse genome annotation). Could you please let me know your opinion about this workflow and correct me wherever I'm wrong?
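
For readers unfamiliar with the SSR screening step, here is a minimal sketch of how perfect SSRs can be located in an assembled sequence with regular expressions. It is not the algorithm of any particular tool; the function name `find_ssrs` and the minimum repeat numbers per motif length are illustrative assumptions, and mononucleotide runs are deliberately ignored.

```python
import re

# Minimum number of repeat units per motif length (illustrative assumption).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are a shorter unit repeated (e.g. "AAA", "ATAT").
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // unit_len))
    return sorted(hits)


example = "TTGC" + "AAG" * 7 + "CTGT" + "CA" * 8 + "GG"
print(find_ssrs(example))
```

Start positions are 0-based. Running this over every transcript in a FASTA file gives the raw list of SSR loci that the later comparison steps would work from.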

I will be glad to have any suggestion and advice from you.

Thank you

Mark Chapman

Jul 19, 2016, 6:01:35 AM
to maryam moazam, trinityrnaseq-users
Hi,

So is the plan to use the expression variation between diseased and non-diseased mice to identify candidate genes? Or are you using the assembly, without any data on gene expression, to identify SSRs? I am also not sure what you mean by "compare the motifs frequency between them".

The first will presumably reveal several DE genes, some with and some without SSRs. The latter will reveal thousands of SSRs. I'm not really sure what you're trying to find. Are the mice isogenic and you think an SSR might have mutated between healthy and diseased mice? Or are they from different backgrounds (in which case dozens of SSRs will differ)?

Thanks, sorry not to be much help!


setar...@gmail.com

Jul 19, 2016, 8:16:28 AM
to trinityrnaseq-users
Thank you, Mark. I'm happy to discuss this with you; it's really helpful. Please correct me wherever I'm wrong, as this is my first experience with this kind of analysis and I know you are an expert scientist. Again, thank you for your help.

Regarding your question about expression variation: I also plan to do a DE analysis to identify differentially expressed genes, and I hope to find some SSR markers in these DE genes. In fact, one of our main goals is to locate putative SSR markers within the DE gene sequences. For this we need the sequence of a given DE gene in healthy and diseased mice separately, which is why I am thinking of generating two sets of genome-guided transcriptome assemblies to provide these sequences. Is that right? Do you agree?

Regarding "compare the motifs frequency between them", I meant finding and comparing the number of SSR motifs between healthy and diseased mice; for example, the "AAG" motif may occur 10 times in the healthy mouse but 20 times in the diseased mouse. The mice are not isogenic and, as you mentioned, dozens of SSRs will differ; however, the SSRs located in DE genes are the ones that matter to us.
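
The motif counting described here could be sketched as follows. One detail worth handling is that "AAG", "AGA", "GAA" and the reverse-complement forms "CTT", "TTC", "TCT" all describe the same repeat, so the sketch collapses them to one representative before counting. The function names and the input format (a plain list of motif strings per group) are assumptions for the example.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif):
    """Collapse rotations and reverse complements to one representative."""
    motif = motif.upper()
    revcomp = motif.translate(COMPLEMENT)[::-1]
    variants = []
    for m in (motif, revcomp):
        variants.extend(m[i:] + m[:i] for i in range(len(m)))
    return min(variants)


def motif_counts(motifs):
    """Count SSR loci per canonical motif."""
    return Counter(canonical_motif(m) for m in motifs)


def compare_counts(healthy, diseased):
    """Return {motif: (healthy_count, diseased_count)} for all observed motifs."""
    h, d = motif_counts(healthy), motif_counts(diseased)
    return {m: (h[m], d[m]) for m in sorted(set(h) | set(d))}


# Hypothetical motif lists, one entry per SSR locus found in each group.
table = compare_counts(["AAG", "GAA", "CA"], ["CTT", "AAG", "TTC", "TG"])
```

Raw counts taken from transcriptome assemblies also depend on which transcripts happened to be expressed and assembled in each group, so a difference in this table is not by itself evidence of a disease-related SSR.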

All the best





Mark Chapman

Jul 19, 2016, 4:56:33 PM
to maryam moazam, trinityrnaseq-users

Hello,

Ok so the DE analysis seems good and this would give you an idea of some genes that might be involved in the disease in your mice. The fact that they're not isogenic does mean that some DE could be down to random differences between the mice though. But if you get a handful of DE genes you can look at putative functions and pathways that might be involved. You don't need to do multiple assemblies, just one, and you can assemble all your data at once then map the reads individually.

From this you'll do DE analysis and you can look at alignments of specific genes to look for seq differences between WT and diseased mice. So if there's an SSR (maybe 5 or 10% of genes have SSRs?) then you might find differences (again not all SSRs are polymorphic). So I don't see why you have to find an SSR and it is unlikely you'll find a polymorphic SSR that differentiates your WT and diseased mice.

So what's the need for this SSR? What are you trying to do with it? Do you want a marker that differentiates between WT and diseased mice? IMHO this is unlikely to work. But there might be other things you can do.

Cheers, Mark



setar...@gmail.com

Jul 20, 2016, 1:08:07 AM
to trinityrnaseq-users
Hi. Thank you for your comment. 

Mark, you said "you can assemble all your data at once then map the reads individually". Do you mean making the genome-guided transcriptome assembly with all reads (instead of using the mouse reference genome for read mapping) and then mapping each sample's reads to this assembly individually, in order to get the gene sequences separately for healthy and diseased mice?

As you correctly understood, we hope to find disease-related markers to diagnose the disease and differentiate WT from diseased mice. However, you mentioned that this is unlikely to work. Many recent papers report SSR markers found in their assemblies, so how can these SSRs be useful? Please let me know if you have another idea for developing putative SSR markers for a given disease.

Many thanks



Mark Chapman

Jul 20, 2016, 3:08:35 AM
to maryam moazam, trinityrnaseq-users
Hi,

I've not done genome-guided analyses, but my understanding is that you 1. map reads to the reference genome to generate your assembly, then 2. map each individual's reads against the assembly.


Identifying disease-related markers will only work if you know the causative gene. Once you have this you can ID SSRs in the surrounding regions of the genome if you wanted, but the chance of an SSR being IN the gene of interest is slim. But if you don't know the causative gene, then where will you identify a marker (most likely a SNP, not an SSR)?

Cheers, Mark


setar...@gmail.com

Jul 20, 2016, 9:50:40 AM
to trinityrnaseq-users
Hi Mark,

To be honest, I have not done these kinds of analyses either, including genome-guided assembly. I'm not even sure why genome-guided transcriptome assembly matters; what is its benefit?

Sorry, by "map reads to the reference genome to generate your assembly" do you mean the same thing as genome-guided transcriptome assembly?

In the last paragraph of your previous response, you mentioned that identifying disease-related markers will only work if you know the causative gene, and that the chance of an SSR being IN the gene of interest is slim. After doing the expression analysis and getting the DE genes, how can we evaluate them to find informative (disease-related) markers?


All the best





Farbod Emami

Jul 23, 2016, 12:58:22 PM
to trinityrnaseq-users
Dear Setar,
As Dr. Mark (who is very expert in this field) has suggested, I think you first need some sort of assembly, then a DEG procedure (e.g. using DESeq2), then annotation of your DEGs (e.g. by BLASTing against the NCBI nr database) to check what these DEGs are, then separation of the genes that are present in one condition and absent in the other (I have asked Mark and Brian several questions about this in this group), then a .fasta file of the sequences of those genes, and finally an SSR search using a tool such as MISA. :)
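
The DEG selection step in this pipeline could look roughly like the sketch below, which reads a tab-separated results table and splits significant genes by direction of change. The `log2FoldChange` and `padj` column names follow the DESeq2 results table; the `gene_id` column name, the function name and the cut-offs are assumptions for the example.

```python
import csv
import io


def select_de_genes(tsv_text, max_padj=0.05, min_abs_lfc=1.0):
    """Split significant genes into up- and down-regulated ID lists."""
    up, down = [], []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["padj"] in ("", "NA"):
            continue  # no adjusted p-value reported for this gene
        lfc, padj = float(row["log2FoldChange"]), float(row["padj"])
        if padj <= max_padj and abs(lfc) >= min_abs_lfc:
            (up if lfc > 0 else down).append(row["gene_id"])
    return up, down


# Hypothetical results table exported as TSV.
table = (
    "gene_id\tlog2FoldChange\tpadj\n"
    "g1\t2.5\t0.001\n"
    "g2\t-1.8\t0.01\n"
    "g3\t0.3\t0.0001\n"
    "g4\t3.0\tNA\n"
    "g5\t-2.0\t0.2\n"
)
up_genes, down_genes = select_de_genes(table)
```

The two ID lists are what would then be used to pull the corresponding sequences out of the assembly for annotation and SSR searching.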

setar...@gmail.com

Jul 24, 2016, 4:18:42 AM
to trinityrnaseq-users
Thank you for your comment, Farbod. Yes, I trust Dr. Mark; he really helped me. However, I need to show the SSRs on the transcripts (say, DE transcripts), so I need the sequences of a given transcript in both healthy and diseased mice separately; that is my problem. Mark advised me to map reads to the reference genome to generate the assembly (he probably meant genome-guided transcriptome assembly), and then map each individual's reads against the assembly. To be honest, I'm confused about how two sets of sequences (healthy and diseased) will be generated by this analysis. This is my first experience with this kind of analysis, so please share any experience you have in this field.

Thank you


 

Farbod Emami

Jul 24, 2016, 4:44:40 AM
to trinityrnaseq-users
Dear Setar,
Maybe I did not get your point correctly, but obtaining the transcripts that are differentially expressed between two conditions is one of the final goals of the Trinity package and is simple.
When you run the DEG analysis (before which you must map your reads separately to your reference, which could even be a de novo transcriptome assembly), you will get some matrix files showing the transcripts with their fold change, FDR and expression values (e.g. TMM or FPKM). Some of them have a (-) sign and, for example, correspond to healthy individuals, and some have a (+) sign and correspond to sick individuals.
Then you can separate the absent/present genes and collect their sequences with some simple Linux commands (e.g. grep -A1 "your-transcript-name" Trinity.fasta, which prints each matching header plus the line after it, so it returns the whole sequence only if each sequence sits on a single line),
and then make a FASTA file for them (if there are several transcripts!) and then use a tool such as MISA.
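
Since grep matches substrings and works line by line, a small script is a safer way to collect the sequences when there are many transcripts or when sequences wrap over several lines. The sketch below is one way to do it with exact identifier matching; the function names and the transcript IDs in the example are illustrative.

```python
def read_fasta(lines):
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # The identifier is the first whitespace-delimited token after ">".
            name, chunks = (line[1:].split() or [""])[0], []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def extract_records(lines, wanted_ids):
    """Return FASTA text for records whose identifier is in wanted_ids."""
    wanted = set(wanted_ids)
    return "".join(">%s\n%s\n" % (n, s)
                   for n, s in read_fasta(lines) if n in wanted)


# Hypothetical assembly content; in practice, pass an open file handle.
fasta_lines = [
    ">TRINITY_DN1_c0_g1_i1 len=12", "ACGTAC", "GTACGT",
    ">TRINITY_DN10_c0_g1_i1 len=4", "GGGG",
]
subset = extract_records(fasta_lines, ["TRINITY_DN1_c0_g1_i1"])
```

Matching on the full identifier avoids picking up `TRINITY_DN10_...` when only `TRINITY_DN1_...` was requested, which a plain substring search would do.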
I hope this was helpful

Mark Chapman

Jul 24, 2016, 6:52:31 AM
to maryam moazam, trinityrnaseq-users

Hi, sorry if I didn't answer that part. Once you map the reads to the de novo assembly you'll output a BAM file for each individual. This can then be converted to VCF and FASTA files using samtools. You can then align the transcripts and check for polymorphism between diseased and healthy mice.
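
Once a per-individual sequence exists for a transcript, checking an SSR for polymorphism can be as simple as comparing the repeat number in each sample. The sketch below assumes one consensus sequence per sample for the same transcript; the function names and sample labels are hypothetical, and a single consensus hides heterozygous alleles, so this is only a first look.

```python
import re


def longest_run(seq, motif):
    """Return the largest number of consecutive copies of motif in seq."""
    motif = motif.upper()
    runs = re.findall(r"(?:%s)+" % re.escape(motif), seq.upper())
    return max((len(r) // len(motif) for r in runs), default=0)


def repeat_counts(consensus_by_sample, motif):
    """Map each sample name to its longest run of motif."""
    return {s: longest_run(seq, motif)
            for s, seq in consensus_by_sample.items()}


def separates_groups(counts, healthy, diseased):
    """True if no repeat count is shared between the two groups."""
    return not ({counts[s] for s in healthy} & {counts[s] for s in diseased})


# Hypothetical consensus sequences for one transcript in four mice.
seqs = {
    "H1": "GGT" + "AAG" * 5 + "CC",
    "H2": "ggt" + "aag" * 5 + "cc",
    "D1": "GGT" + "AAG" * 8 + "CC",
    "D2": "GGT" + "AAG" * 5 + "CC",
}
counts = repeat_counts(seqs, "AAG")
is_marker = separates_groups(counts, ["H1", "H2"], ["D1", "D2"])
```

In this made-up example one diseased mouse carries the same repeat number as the healthy mice, so the SSR is polymorphic but does not separate the groups.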
Best wishes, Mark



setar...@gmail.com

Jul 25, 2016, 1:10:54 AM
to trinityrnaseq-users
Hi Mark

Thank you very much for your feedback; this is the kind of answer I was looking for.

Just one thing: what is the benefit of genome-guided transcriptome assembly when the reference genome of interest is available and one can simply map RNA-seq reads to the reference genome? I searched a lot on the net but didn't find a helpful answer. Could you please explain it?


All the best




Mark Chapman

Jul 25, 2016, 5:09:41 AM
to maryam moazam, trinityrnaseq-users

Hi, assembling a transcriptome using a published genome, instead of de novo, should reduce the assembly of spurious transcripts. In your case, if you're looking for an SSR, you will be able to scan the genome near your candidate gene (e.g. a DE one) and look for an SSR motif.
Best wishes, Mark



setar...@gmail.com

Jul 25, 2016, 8:16:40 AM
to trinityrnaseq-users
Many thanks, Mark.


