Hi all,
Attached is one solution to the regular expression problem set sent out last week ("regex_problems.pl") and the sequence file it needs ("regex_seq.txt"). Note the use of file I/O, "chomp", and "join". Compare it to yours (if you have one), or to Zhigang's if you want to see how certain things can be implemented differently.
Below is a problem set on subroutines. Email your script either to me (sahr...@ucr.edu) or to the group, and we can discuss the solutions during next week's workshop.
Feel free to email me or Sofia with any questions you might have.
-- Steven
Subroutine problem set:
1. Write a subroutine that joins two DNA strings.
2. Write a second subroutine that reports the GC content of this (or any) DNA sequence.
3. Write a third subroutine that counts the occurrences of any restriction site in any DNA sequence. (Don't worry about inserting the cut character, ^.)
4. Using some of these subroutines and anything else we've learned so far, create a script that reads the attached FASTA file ("multi_seq.fna") and reports the following for each sequence:
- sequence name
- sequence length
- GC content
- counts of restriction sites: EcoRI (GAATTC), SduI (GDGCHC), and HindII (GTYRAC). Again, don't worry about the cut character; just report the counts. (See the IUPAC table for reference: D = A/G/T, H = A/C/T, Y = C/T, R = A/G.)
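If it helps to compare once you've written your own, here is one possible Perl sketch covering problems 1 to 4. It is a sketch, not the reference solution, and it rests on a few assumptions: the subroutine names (join_dna, gc_content, count_site, read_fasta) are hypothetical, each sequence name is taken as the first word of its FASTA header, GC content counts only G and C (ambiguity codes are ignored), and site counts include overlapping matches, found with a regex lookahead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Problem 1: join two DNA strings by simple concatenation.
sub join_dna {
    my ($seq1, $seq2) = @_;
    return $seq1 . $seq2;
}

# Problem 2: GC content as a fraction (0 to 1) of the sequence length.
# Only G and C are counted; ambiguity codes such as S are ignored.
sub gc_content {
    my ($seq) = @_;
    return 0 unless length($seq);         # avoid dividing by zero
    my $gc = ($seq =~ tr/GCgc//);         # tr/// returns the number of matches
    return $gc / length($seq);
}

# IUPAC nucleotide codes translated into regex character classes.
my %IUPAC = (
    A => 'A',     C => 'C',     G => 'G',     T => 'T',
    R => '[AG]',  Y => '[CT]',  S => '[CG]',  W => '[AT]',
    K => '[GT]',  M => '[AC]',  B => '[CGT]', D => '[AGT]',
    H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# Problem 3: count occurrences of a restriction site written in IUPAC
# notation (case-insensitive). The zero-width lookahead (?=...) lets
# overlapping hits be counted too.
sub count_site {
    my ($seq, $site) = @_;
    my $pattern = join '',
        map { $IUPAC{$_} or die "Unknown IUPAC code: $_\n" } split //, uc $site;
    my $count = () = uc($seq) =~ /(?=$pattern)/g;
    return $count;
}

# Problem 4 helper: read a FASTA filehandle into [name, sequence] pairs.
# The name is assumed to be the first word after '>' on each header line.
sub read_fasta {
    my ($fh) = @_;
    my (@records, $name, @chunks);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r\z//;                # tolerate Windows line endings
        next unless length($line);        # skip blank lines
        if ($line =~ /^>\s*(\S*)/) {      # a header line starts a new record
            push @records, [$name, join('', @chunks)] if defined $name;
            $name   = $1;
            @chunks = ();
        }
        else {
            push @chunks, $line;          # a sequence may span many lines
        }
    }
    push @records, [$name, join('', @chunks)] if defined $name;
    return @records;
}

# Problem 4: report name, length, GC content and site counts per sequence
# for every FASTA file named on the command line.
my %SITES = (EcoRI => 'GAATTC', SduI => 'GDGCHC', HindII => 'GTYRAC');

for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    for my $rec (read_fasta($fh)) {
        my ($name, $seq) = @$rec;
        printf "%s\tlength: %d\tGC: %.1f%%\n",
            $name, length($seq), 100 * gc_content($seq);
        for my $enzyme (sort keys %SITES) {
            printf "\t%s (%s): %d\n",
                $enzyme, $SITES{$enzyme}, count_site($seq, $SITES{$enzyme});
        }
    }
    close $fh;
}
```

To try it: perl subroutines.pl multi_seq.fna (the script name is just an example). With no file arguments it only defines the subroutines and prints nothing. All three sites in problem 4 are palindromic (each equals its own reverse complement), so scanning only the given strand is enough here.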
====
Steven Ahrendt
Graduate Student Researcher
Genetics, Genomics, and Bioinformatics
University of California, Riverside