regular expression question (escaping)


Zhigang Wu

Nov 28, 2011, 12:22:41 PM
to ucr-perl-bi...@googlegroups.com
Hi All,
I have a small question about regular expressions. What I want to do is
avoid reporting any sequence that contains a run of more than a
defined number of identical nucleotides. Below is the code I wrote to
achieve this. Specifically, the array @seqs contains three sequences
and $number is set to 10. Since both 'AAAAAAAAAAAAAAAAAAAAA' and
'TTTTTTTTTTTTTTTT' contain a run of more than 10 identical
nucleotides, I expect that they will not be printed after the
filtering step. However, the regular expression I wrote,
$nt{$number,}, is not working as I expected. Perl keeps complaining
'Global symbol "%nt" requires explicit package name', which suggests
that Perl is parsing "$nt{$number,}" as a hash element. I tried
putting double quotes around $nt and $number: "$nt"{"$number",}.
Perl did not complain about a syntax error, but the result is not
what I want. How can I get this to work? Any thoughts and comments
are welcome.


#!/usr/bin/perl
use warnings;
use strict;
my $number = 10;
my @seqs = qw ('AAAAAAAAAAAAAAAAAAAAA' 'TTTTTTTTTTTTTTTT' 'ATCGTGAGGTGTCCAAGT');
for my $seq (@seqs){
    my $flag = 0;
    for my $nt (qw / 'A' 'T' 'C' 'G'/ ){
        if ($seq =~ /$nt{$number,}/){
            $flag = 1;
        }
    }
    if (! $flag){
        print $seq,"\n";
    }
}
--

---------------------------------------------------------------------------------------------
PhD Candidate in Plant Biology
Department of Botany and Plant Sciences
University of California, Riverside

Jason Stajich

Nov 28, 2011, 12:37:01 PM
to ucr-perl-bi...@googlegroups.com, Zhigang Wu


I think you can solve your problem with this: \Q and \E tell Perl to take the value of the variable literally (any regexp metacharacters in it are escaped), and the \E also marks where the variable name ends, so the { that follows is read as part of the regexp rather than as a hash subscript. The quoting is probably not necessary if you don't also put the {} after the variable, but I think this will force the right context in the regexp.

 if ($seq =~ /\Q$nt\E{$number,}/){
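
To make the parsing issue concrete, here is a minimal sketch of two ways to end the variable name before the quantifier. The helper names has_run_quoted and has_run_grouped are made up for illustration and are not from the thread; the grouped form is an alternative to the \Q...\E line above, not a replacement for it.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $seq contains at least $min copies
# of $nt in a row. \E ends the variable name, so {$min,} is parsed
# as a quantifier instead of a hash subscript.
sub has_run_quoted {
    my ($seq, $nt, $min) = @_;
    return $seq =~ /\Q$nt\E{$min,}/ ? 1 : 0;
}

# Hypothetical helper: same check, but a non-capturing group ends
# the variable name. Unlike the \Q...\E form, the quantifier applies
# to the whole value of $nt, not only to its last character.
sub has_run_grouped {
    my ($seq, $nt, $min) = @_;
    return $seq =~ /(?:$nt){$min,}/ ? 1 : 0;
}

for my $seq (qw(AAAAAAAAAAAAAAAAAAAAA ATCGTGAGGTGTCCAAGT)) {
    printf "%s quoted=%d grouped=%d\n",
        $seq,
        has_run_quoted($seq, 'A', 10),
        has_run_grouped($seq, 'A', 10);
}
```

For single-letter nucleotides the two forms should behave the same; they only differ if $nt ever holds more than one character.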


Also see the solutions that have already been worked out for your homopolymer run detection problem.


BTW - the whole point of qw is you don't need the quotes

e.g.

my @seqs = qw(AAAA TTT ATC);

or

for my $nt (qw(A T C G)) {

}
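
A quick sketch of why the quotes matter: inside qw the quote characters are not delimiters, they become part of each word. The variable names @quoted and @plain are only for illustration.

```perl
use strict;
use warnings;

# With quotes inside qw, each element keeps its quote characters.
my @quoted = qw('A' 'T' 'C' 'G');
# Without quotes, each element is just the bare word.
my @plain  = qw(A T C G);

print "quoted: $quoted[0] (length ", length($quoted[0]), ")\n";  # quoted: 'A' (length 3)
print "plain:  $plain[0] (length ", length($plain[0]), ")\n";    # plain:  A (length 1)
```

So in the original loop $nt would hold 'A' including the quotes, and the quoted sequences would be printed with their quotes as well.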

Jason

Sofia Robb

Nov 28, 2011, 12:46:08 PM
to ucr-perl-bi...@googlegroups.com
Hi,

Perl definitely thinks your $nt{$number,} is a hash lookup.

You need to keep Perl from interpreting that part of the pattern as a hash.
Here is the first solution I came up with, but there are probably plenty more. I removed the need for a variable to denote the nucleotide that you are looking for.

#!/usr/bin/perl
use warnings;
use strict;
my $number = 10;
my @seqs = qw ('AAAAAAAAAAAAAAAAAAAAA' 'TTTTTTTTTTTTTTTT' 'ATCGTGAGGTGTCCAAGT');
for my $seq (@seqs){
   my $flag = 0;
   #for my $nt (qw / 'A' 'T' 'C' 'G'/ ){
       # one alternation covers all four nucleotides, so no $nt is needed
       if ($seq =~ /A{$number,}|T{$number,}|G{$number,}|C{$number,}/){
           $flag = 1;
       }
   #}
   if (! $flag){
       print $seq,"\n";
   }
}
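
As one of the "plenty more" possibilities, here is a hedged sketch that swaps the alternation for a backreference: capture one nucleotide, then require the same character to repeat. This is a different technique from the code above, and the helper name has_homopolymer is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $seq contains any single nucleotide
# repeated at least $min times in a row (assumes $min >= 1).
sub has_homopolymer {
    my ($seq, $min) = @_;
    # The first copy is matched by the capture group, so the
    # backreference \1 has to match $min - 1 further copies.
    my $extra = $min - 1;
    return $seq =~ /([ACGT])\1{$extra,}/ ? 1 : 0;
}

my $number = 10;
my @seqs = qw(AAAAAAAAAAAAAAAAAAAAA TTTTTTTTTTTTTTTT ATCGTGAGGTGTCCAAGT);

# Keep only sequences without a long homopolymer run.
print "$_\n" for grep { !has_homopolymer($_, $number) } @seqs;
# prints ATCGTGAGGTGTCCAAGT
```

Here \1 is followed directly by the quantifier, so there is no variable name for Perl to confuse with a hash.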

Zhigang Wu

Nov 28, 2011, 1:13:15 PM
to ucr-perl-bi...@googlegroups.com
Thanks for your quick responses.
Yes, Sofia, your solution is definitely correct.
Following Jason's hint, below is the working code (I have removed the
unnecessary quotes from the words inside qw).

#!/usr/bin/perl
use warnings;
use strict;
my $number = 10;

my @seqs = qw (AAAAAAAAAAAAAAAAAAAAA ATCGTGAGGTGTCCAAGT);

for my $seq (@seqs){
    my $flag = 0;
    for my $nt (qw /A T C G/ ){
        # \E ends the variable name, so { starts the quantifier
        if ($seq =~ /\Q$nt\E{\Q$number\E,}/){
            $flag = 1;
        }
    }
    if (! $flag){
        print $seq,"\n";
    }
}
