problem set for regular expressions

Sofia Robb

Oct 12, 2011, 1:28:05 PM
to ucr-perl-bi...@googlegroups.com
Hello All,

One of the students thought it would be helpful if I supplied a "quiz" or something similar. Work through it and send it back to me when you are finished, so that I can see your progress and make suggestions.

So here it is.

Problem Set

  1. The enzyme ApoI has a restriction site: R^AATTY where R and Y are degenerate nucleotides. See the IUPAC table to identify the nucleotide possibilities for R and Y.

    Write a regular expression that will match occurrences of the site in a sequence. (hint: what are you going to do about the actual cut site, represented by the '^'?)
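
    As a warm-up that does not give away the answer, here is a minimal sketch of how a degenerate code can be turned into a character class. It uses two other IUPAC codes (W = A or T, S = C or G) and a made-up site "GWSC" that is only for illustration and does not belong to any real enzyme. The variable names are my own choices.

```perl
use strict;
use warnings;

# IUPAC codes used in this toy example: W = A or T, S = C or G.
# The site "GWSC" is hypothetical; it is not a real restriction site.
my $site_re = qr/G[AT][CG]C/;

# Try the pattern against a few four-base test strings.
my @tests = qw(GACC GTGC GCCC GAAC);
for my $seq (@tests) {
    my $verdict = $seq =~ $site_re ? "match" : "no match";
    print "$seq: $verdict\n";
}
```

    The same idea should carry over to R and Y once you have looked them up in the IUPAC table.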


  2. Use the regular expression you just wrote to find all the restriction sites in the following sequence. Be sure to think about how to handle the newlines!
    GAATTCAAGTTCTTGTGCGCACACAAATCCAATAAAAACTATTGTGCACACAGACGCGAC
    TTCGCGGTCTCGCTTGTTCTTGTTGTATTCGTATTTTCATTTCTCGTTCTGTTTCTACTT
    AACAATGTGGTGATAATATAAAAAATAAAGCAATTCAAAAGTGTATGACTTAATTAATGA
    GCGATTTTTTTTTTGAAATCAAATTTTTGGAACATTTTTTTTAAATTCAAATTTTGGCGA
    AAATTCAATATCGGTTCTACTATCCATAATATAATTCATCAGGAATACATCTTCAAAGGC
    AAACGGTGACAACAAAATTCAGGCAATTCAGGCAAATACCGAATGACCAGCTTGGTTATC
    AATTCTAGAATTTGTTTTTTGGTTTTTATTTATCATTGTAAATAAGACAAACATTTGTTC
    CTAGTAAAGAATGTAACACCAGAAGTCACGTAAAATGGTGTCCCCATTGTTTAAACGGTT
    GTTGGGACCAATGGAGTTCGTGGTAACAGTACATCTTTCCCCTTGAATTTGCCATTCAAA
    ATTTGCGGTGGAATACCTAACAAATCCAGTGAATTTAAGAATTGCGATGGGTAATTGACA
    TGAATTCCAAGGTCAAATGCTAAGAGATAGTTTAATTTATGTTTGAGACAATCAATTCCC
    CAATTTTTCTAAGACTTCAATCAATCTCTTAGAATCCGCCTCTGGAGGTGCACTCAGCCG
    CACGTCGGGCTCACCAAATATGTTGGGGTTGTCGGTGAACTCGAATAGAAATTATTGTCG
    CCTCCATCTTCATGGCCGTGAAATCGGCTCGCTGACGGGCTTCTCGCGCTGGATTTTTTC
    ACTATTTTTGAATACATCATTAACGCAATATATATATATATATATTTAT
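
    To see why the newlines matter, here is a small sketch on a toy two-line sequence (not the one above) where a site straddles the line break. It borrows the GACGT^CT pattern from the example in problem 3, and stripping whitespace with s/\s+//g is just one possible approach.

```perl
use strict;
use warnings;

# Toy two-line sequence; the site GACGTCT straddles the line break.
my $raw = "AAAAGACG\nTCTTTTT\n";

# Count matches while the newline is still in the string.
my $before = () = $raw =~ /GACGTCT/g;

# Copy the string, then strip newlines and any other whitespace from the copy.
(my $seq = $raw) =~ s/\s+//g;
my $after = () = $seq =~ /GACGTCT/g;

print "with newlines: $before\n";
print "without newlines: $after\n";
```

    A site split across two lines is missed until the lines are joined into one string.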


  3. Determine the site(s) of the cut in the above sequence. Print out the sequence with "^" at the cut site.

    Hints:

    • Use subpatterns (parentheses and $1, $2) to find the cut site within the pattern.
    • Use s///

    Example: if the pattern is GACGT^CT the following sequence
    would be cut like this:
    
    AAAAAAAAGACGT^CTTTTTTTAAAAAAAAGACGT^CTTTTTTT
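
    The example above can be reproduced with something like the following sketch. It only handles the example pattern GACGT^CT, not ApoI, and the variable names are my own.

```perl
use strict;
use warnings;

# The example sequence from above, before cutting.
my $seq = "AAAAAAAAGACGTCTTTTTTTAAAAAAAAGACGTCTTTTTTT";

# Capture the bases on each side of the cut in $1 and $2,
# then put "^" between them. The /g flag marks every site.
(my $cut = $seq) =~ s/(GACGT)(CT)/$1^$2/g;

print "$cut\n";
```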

  4. Now that you've done your restriction digest, determine the lengths of your fragments and sort them by length (in the same order they would separate on an electrophoresis gel).

    Hint: take a look at the split man page or think about storing your matches in an array. With one of these two approaches you should be able to convert this string:

       AAAAAAAAGACGT^CTTTTTTTAAAAAAAAGACGT^CTTTTTTT

    into this array:
    ("AAAAAAAAGACGT","CTTTTTTTAAAAAAAAGACGT","CTTTTTTT")
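
    Here is a minimal sketch of the split approach on the example string. Sorting longest first is an assumption on my part (largest fragments nearest the well); swap $a and $b in the sort block if you want shortest first.

```perl
use strict;
use warnings;

# The example digest from above, with "^" marking each cut.
my $digest = "AAAAAAAAGACGT^CTTTTTTTAAAAAAAAGACGT^CTTTTTTT";

# "^" is a regex metacharacter, so escape it in the split pattern.
my @fragments = split /\^/, $digest;

# Sort longest first (assumed gel order, see note above).
my @sorted = sort { length($b) <=> length($a) } @fragments;

# Print each fragment's length followed by the fragment itself.
print length($_), "\t$_\n" for @sorted;
```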
