Programming exercises


Steven Ahrendt

Oct 19, 2011, 4:44:49 PM
to UCR Perl Group
Hi all,

Attached are one solution to the regular expression problem set sent out last week ("regex_problems.pl") and the sequence file that it needs ("regex_seq.txt"). Note the use of file I/O, "chomp", and "join". Compare it to yours (if you have one) or to Zhigang's if you want to see how certain things can be implemented differently.
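
For anyone who has not seen file I/O, "chomp", and "join" used together, here is a minimal sketch of the pattern. It is not taken from regex_problems.pl: the sample text and variable names are made up, and an in-memory filehandle stands in for regex_seq.txt so the snippet runs on its own.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a multi-line sequence file; an in-memory
# filehandle keeps the sketch runnable without regex_seq.txt.
my $text = "ATGCGT\nTTAGGC\nCCATGA\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my @lines;
while (my $line = <$fh>) {
    chomp $line;            # remove the trailing newline
    push @lines, $line;
}
close $fh;

# join glues the list elements together with the given separator (here, none)
my $sequence = join('', @lines);
print "$sequence\n";
```

With a real file, the only change would be to pass the file name to open instead of the reference to $text.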

Below are some problems for subroutines. Email your script either to me (sahr...@ucr.edu) or to the group and we can discuss the solution during next week's workshop.

Feel free to email me or Sofia with any questions you might have.

-- Steven

Subroutine problem set:
1. Write a subroutine that joins two DNA strings.
2. Write a second subroutine that reports the GC content of this (or any) DNA sequence.
3. Write a third subroutine that counts the instances of any restriction site in any DNA sequence. (Don't worry about inserting the 'cut' character: ^)
4. Using some of these subroutines and anything else we've learned so far, create a script that reads the attached fasta file ("multi_seq.fna") and reports the following information for each sequence:
 - sequence name
 - sequence length
 - GC content 
 - counts of restriction sites: EcoRI (GAATTC), SduI (GDGCHC), and HindII (GTYRAC). (Again, don't worry about the cut character; just report the counts. See the IUPAC table for reference.)
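
One possible way to approach the problems above is sketched here; it is only an illustration, not the reference solution. The subroutine names (join_dna, gc_content, count_sites), the IUPAC lookup hash, and the two-record FASTA text are all assumptions made for this example, with an in-memory filehandle standing in for multi_seq.fna.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes mapped to regex character classes
my %IUPAC = (
    A => 'A', C => 'C', G => 'G', T => 'T',
    R => '[AG]',  Y => '[CT]',  S => '[CG]',  W => '[AT]',
    K => '[GT]',  M => '[AC]',  B => '[CGT]', D => '[AGT]',
    H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# 1. Join two DNA strings
sub join_dna {
    my ($first, $second) = @_;
    return $first . $second;
}

# 2. GC content as a fraction between 0 and 1
sub gc_content {
    my ($seq) = @_;
    return 0 unless length($seq);
    my $gc = ($seq =~ tr/GCgc//);    # tr/// returns the number of matching characters
    return $gc / length($seq);
}

# 3. Count occurrences of a restriction site (IUPAC codes allowed)
sub count_sites {
    my ($seq, $site) = @_;
    my $pattern = '';
    for my $code (split //, uc $site) {
        die "Unknown IUPAC code: $code\n" unless exists $IUPAC{$code};
        $pattern .= $IUPAC{$code};
    }
    # The lookahead lets overlapping occurrences be counted too
    my $count = () = $seq =~ /(?=$pattern)/gi;
    return $count;
}

# 4. Hypothetical two-record FASTA text standing in for multi_seq.fna
my $fasta = ">seq1 demo\nGAATTCGG\nCCGAATTC\n>seq2 demo\nGTTAACAT\nGAGCAC\n";
open(my $fh, '<', \$fasta) or die "Cannot open FASTA data: $!";

my (@names, %seq_for);
my $name;
while (my $line = <$fh>) {
    chomp $line;
    next unless length($line);
    if ($line =~ /^>(\S+)/) {        # header line: start a new record
        $name = $1;
        push @names, $name;
        $seq_for{$name} = '';
    }
    elsif (defined $name) {          # sequence line: append to current record
        $seq_for{$name} .= $line;
    }
}
close $fh;

my %site_for = (EcoRI => 'GAATTC', SduI => 'GDGCHC', HindII => 'GTYRAC');
for my $id (@names) {
    my $seq = $seq_for{$id};
    printf "%s\tlength=%d\tGC=%.2f", $id, length($seq), gc_content($seq);
    printf "\t%s=%d", $_, count_sites($seq, $site_for{$_}) for sort keys %site_for;
    print "\n";
}
```

Translating each IUPAC code into a character class means the same count_sites subroutine handles both exact sites such as EcoRI and degenerate ones such as SduI and HindII. Whether overlapping sites should be counted is a design choice; dropping the lookahead gives non-overlapping counts instead.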

====
Steven Ahrendt
Graduate Student Researcher
Genetics, Genomics, and Bioinformatics
University of California, Riverside


regex_seq.txt
multi_seq.fna
regex_problems.pl

Sofia Robb

Oct 19, 2011, 5:18:51 PM
to ucr-perl-bi...@googlegroups.com
hi steven,

what fungal species are you working on that you said was really close to the split of plants and fungi?

thanks
sofia



Steven Ahrendt

Oct 19, 2011, 5:31:51 PM
to ucr-perl-bi...@googlegroups.com
Batrachochytrium dendrobatidis, which is from a lineage that is very close to the split of animals and fungi.

The genes in the multi_seq.fna file are from this organism.

-- Steven

Sofia Robb

Oct 19, 2011, 7:53:03 PM
to ucr-perl-bi...@googlegroups.com
thank you.