We need the Perl code for Project 5 (k-mer analysis) from the Unix and
Perl Primer for Biologists tutorial by Keith Bradnam & Ian Korf. Do you have a solution to this problem, or
a link to somewhere it is solved? We don't know Perl; we study
genetics, but we need this program for our assignment. Can you help us?
''This project will use the intron_IME_data.fasta file in the Data/Arabidopsis directory.
However, this is a multi-line FASTA file and you will first need to make a new FASTA file that
rearranges each sequence to occupy only one line (see Project 4 for how to do this).
Your program should have the following structure:
1. Provide a typical command-line interface allowing the user to choose the value for k.
2. Create two hashes to store the k-mer counts for 1st introns and other introns. You might
name these %count1 and %count2.
3. Read a definition line from your new FASTA file. Extract the intron number from the
definition line. First introns will be labeled i1, second introns i2, and so on.
4. If it is the first intron, count all of the k-mers in the intron and add the counts to the
%count1 hash. If it is a more distant intron, add the counts to %count2.
5. After all the counting is done, create two new hashes, call them %freq1 and %freq2 to
hold the frequencies for every k-mer.
6. Report the log-odds ratio of the frequency of each k-mer occurring in 1st introns vs.
other introns.''
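
The six steps above could be sketched in Perl roughly as follows. This is a minimal sketch, not the tutorial's own solution, and it rests on some assumptions: the input is already a single-line FASTA file (one definition line, then one sequence line), the definition lines carry the intron label in a form like `>AT1G68260.1_i1_204_CDS`, and the log-odds ratio is taken as log base 2 of freq1/freq2. The sub names (`count_kmers`, `intron_number`, `frequencies`, `log_odds`) are made up for illustration. K-mers that occur in only one of the two sets are skipped here, because their ratio would be zero or undefined; adding pseudocounts would be another way to handle them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 4 helper: count every overlapping k-mer in $seq into the hash
# that $count refers to (either %count1 or %count2).
sub count_kmers {
    my ($seq, $k, $count) = @_;
    $seq = uc $seq;
    for my $i (0 .. length($seq) - $k) {
        $count->{ substr($seq, $i, $k) }++;
    }
}

# Step 3 helper: pull the intron number out of a definition line.
# ASSUMPTION: the label looks like "_i1_" or "_i12_"; adjust the
# regex if your definition lines are formatted differently.
sub intron_number {
    my ($defline) = @_;
    return $defline =~ /_i(\d+)(?:_|\s|$)/ ? $1 : undef;
}

# Step 5 helper: turn raw counts into frequencies (count / total).
sub frequencies {
    my ($count) = @_;
    my $total = 0;
    $total += $_ for values %$count;
    my %freq;
    return %freq if $total == 0;
    $freq{$_} = $count->{$_} / $total for keys %$count;
    return %freq;
}

# Step 6 helper: log2(freq1 / freq2) for k-mers seen in both sets.
sub log_odds {
    my ($freq1, $freq2) = @_;
    my %lod;
    for my $kmer (keys %$freq1) {
        next unless exists $freq2->{$kmer};
        $lod{$kmer} = log($freq1->{$kmer} / $freq2->{$kmer}) / log(2);
    }
    return %lod;
}

sub main {
    my ($k, $file) = @_;
    die "k must be a positive integer\n" unless $k =~ /^\d+$/ and $k > 0;
    open(my $in, '<', $file) or die "Cannot open $file: $!\n";

    my (%count1, %count2);   # step 2: first introns vs. other introns
    my $intron;              # intron number of the current record
    while (my $line = <$in>) {
        $line =~ s/\s+$//;   # strip newline and trailing whitespace
        if ($line =~ /^>/) {
            $intron = intron_number($line);
            next;
        }
        next unless defined $intron;
        count_kmers($line, $k, $intron == 1 ? \%count1 : \%count2);
    }
    close $in;

    my %freq1 = frequencies(\%count1);
    my %freq2 = frequencies(\%count2);
    my %lod   = log_odds(\%freq1, \%freq2);

    # Report k-mers from most enriched in first introns to least.
    for my $kmer (sort { $lod{$b} <=> $lod{$a} or $a cmp $b } keys %lod) {
        printf "%s\t%.3f\n", $kmer, $lod{$kmer};
    }
}

# Step 1: command-line interface, e.g. perl kmer_logodds.pl 5 introns.fa
unless (caller) {
    if (@ARGV == 2) { main(@ARGV) }
    else            { warn "Usage: $0 <k> <single-line FASTA file>\n" }
}
```

The script name `kmer_logodds.pl` and the input name `introns.fa` are hypothetical. The first argument is the value of k, the second is the single-line FASTA file made as described in Project 4. The output is one k-mer per line with its log-odds score, tab-separated.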