Question about motif enrichment analysis
Hi there,
I want to check whether any JASPAR motifs are enriched in my sequences. One option is to follow the MEME-ChIP approach:
1) discover novel DNA-binding motifs (with MEME and DREME);
2) compare them with known binding motifs in the JASPAR database using Tomtom.
But if there are too few sequences for MEME or DREME to discover any de novo motifs, there is nothing for Tomtom to compare against the known JASPAR motifs.
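To make the comparison step concrete, here is a minimal sketch of one way to score similarity between two motifs: slide one frequency matrix along the other without gaps and sum the column-wise Pearson correlations over the overlap. Tomtom supports Pearson correlation among its column comparison functions, but this toy is not Tomtom's algorithm. It has no p-values, no reverse-complement check and no background model, and the motifs in the test are made-up examples.

```python
import math

# A motif is a list of columns; each column holds A, C, G, T frequencies.
Motif = list[list[float]]

def column_pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two 4-element frequency columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a uniform column carries no information
    return cov / math.sqrt(vx * vy)

def best_ungapped_score(query: Motif, target: Motif) -> tuple[float, int]:
    """Try every offset of query against target (forward strand, no gaps).
    Query column i is aligned to target column i + offset. Returns the best
    summed column correlation and the offset that produced it."""
    best = (-math.inf, 0)
    for offset in range(-(len(query) - 1), len(target)):
        total, overlap = 0.0, 0
        for i, col in enumerate(query):
            j = i + offset
            if 0 <= j < len(target):
                total += column_pearson(col, target[j])
                overlap += 1
        if overlap:
            best = max(best, (total, offset))
    return best
```

Summing rather than averaging keeps a single well-matched edge column from outscoring a longer, equally good overlap.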
So I think AME is the right tool for my case, since it tests whether a set of sequences contains more or better matches to the motifs than would be expected by chance.
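The idea of "more matches than expected by chance" can be illustrated with a one-sided Fisher's exact test on a 2x2 table: how many primary sequences contain a match to a motif versus how many control (for example, shuffled) sequences do. This is a generic standard-library sketch of that test, not AME's code. The function name is hypothetical, the counts in the test are invented, and deciding what counts as a "match" (a motif score threshold) is left out.

```python
from math import comb

def fisher_right_tail(hits_pos: int, n_pos: int, hits_neg: int, n_neg: int) -> float:
    """One-sided Fisher's exact test p-value for the primary set having MORE
    sequences with a motif match than the control set would suggest.

    hits_pos of n_pos primary sequences match; hits_neg of n_neg control
    sequences match. Under the null hypothesis, the matching sequences are
    split between the two sets according to a hypergeometric distribution.
    """
    total_hits = hits_pos + hits_neg
    denom = comb(n_pos + n_neg, total_hits)
    numer = 0
    # Sum over all tables at least as extreme: k >= hits_pos matches in primary.
    for k in range(hits_pos, min(n_pos, total_hits) + 1):
        numer += comb(n_pos, k) * comb(n_neg, total_hits - k)
    return numer / denom
```

For example, `fisher_right_tail(8, 10, 2, 10)` (8 of 10 primary vs 2 of 10 control sequences matching) gives 2126/184756, about 0.0115.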
However, according to its manual, AME assumes that the sequences differ in importance:
Input Sequence File
A collection of (primary) sequences in FASTA format. The sequences must be sorted by increasing value of some secondary criterion (e.g.,
expression level, peak height, fluorescence score). In this documentation, we refer to this secondary criterion as the "FASTA score". This score
can optionally be placed in the FASTA ID line. If present, the FASTA score must come immediately after the sequence ID. For example, if the
FASTA ID line is
>seq_1 0.123
then 0.123 is the FASTA score for that sequence.
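A minimal sketch of reading the optional FASTA score described in the excerpt, i.e. a number immediately following the sequence ID on the ">" line, and checking the sort requirement. The function names, returning `None` when no score is present, and treating ties as acceptable (non-decreasing order) are assumptions for illustration, not AME behaviour.

```python
from typing import Optional

def parse_fasta_scores(lines) -> list[tuple[str, Optional[float]]]:
    """Return (seq_id, score) pairs from the header lines of a FASTA file.
    score is None when the token right after the ID is missing or not a number."""
    records = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence lines are ignored here
        fields = line[1:].split()
        seq_id = fields[0] if fields else ""
        score: Optional[float] = None
        if len(fields) > 1:
            try:
                score = float(fields[1])
            except ValueError:
                score = None  # second token is a description, not a score
        records.append((seq_id, score))
    return records

def is_sorted_increasing(records) -> bool:
    """True if the scores that are present never decrease (ties allowed)."""
    scores = [s for _, s in records if s is not None]
    return all(a <= b for a, b in zip(scores, scores[1:]))
```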
My question is: if it's hard to say which sequences are more important than others, so that the FASTA file can't be sorted and no scores can be assigned, can I assume all the sequences are equally important? Can AME perform motif enrichment analysis in this situation? Should I assign the same score, such as 1, to every sequence in the FASTA file? Or is there another tool better suited to this kind of analysis?
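For the option raised above, a minimal sketch that rewrites every header so a constant placeholder score (1 by default) sits immediately after the sequence ID, keeping any remaining description after it. Whether AME produces a meaningful result when all scores are tied is exactly the open question, so this only shows the mechanics; the function name is hypothetical.

```python
def with_constant_score(lines, score=1):
    """Yield FASTA lines with each header rewritten as '>ID score [rest]'.
    The score is placed directly after the ID, as the AME manual requires;
    any original description is kept after the score."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            seq_id = fields[0] if fields else ""
            yield ">" + " ".join([seq_id, str(score)] + fields[1:])
        else:
            yield line  # sequence lines pass through unchanged
```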
Thank you very much; I look forward to your reply.