Project 4: The name game. My script does not report all sequences containing the name.

Marie-Line Faucillion

Feb 18, 2014, 2:38:47 PM
to unix-and-perl-...@googlegroups.com
Hi, here is my problem:

I cannot figure out why it does not report all the occurrences. Sometimes it does not even find the name. I'd appreciate a bit of help. I have left in the print statements I used while debugging, but commented them out.

Also, here is a problem I get with all my scripts. When use strict; and use warnings; are on the same line, the script does not run! When I add comments, the code no longer runs as it did before. I am working on a MacBook Pro with Mac OS X 10.7.5. My current Perl version is v5.12.3 and I am using the Fraise text editor provided in the primer.
Thanks a lot in advance for your help.



 Here is my code:

#!/usr/bin/perl
#project4_name_game.pl by Marie-Line

use strict;
use warnings;
####### stores the input data
my ($file, $name) = @ARGV;


#### die statement to check that two arguments given and it is a protein sequence

die "usage: project4_name_game.pl <protein file> <name>" unless @ARGV == 2;
die "The name cannot contain non protein letter which are:B, J, O, U, X, or Z," if ($name =~ m/[bjouxz]/i);

print "input is: $file\t $name\n";
  #############################

# open the protein file and store each line into an array @line, then close the file 
#stores the name in $name- stores its length in $length - capitalizes the name
open(my $in, "<$ARGV[0]") or die "Couldn´t open the file";
chomp(my @line = <$in>);
close $in;
my $name_length = length ($name);
$name =~ tr/a-z/A-Z/;
print "\$name_length is $name_length\n";


#print "@line\n\n";
#print "$line[0]\n"; #index 0 of array is empty
#print "$line[1]\n";
#print "$line[2]\n";
shift (@line);
# transfer the array @line to a hash %name2seq
my %name2seq = @line; #create a hash

#my @keys = keys (%name2seq);
#my @vals = values (%name2seq);
#print "keys: @keys\n";
#print "values: @vals\n";

############################ looking for a match 
#for (my $i=0 ; $i < protein sequence length; $i ++) 
#{create each possible substring of length $name_length and compare it (eq) with $name - if it matches print key - next -
my $found =0;
my $round =0;
foreach my $key_seq_name (keys %name2seq) {
$round ++;
my $sequence = $name2seq{$key_seq_name};
my $seqlength = length ($sequence);
#print "$seqlength\n";
my $search = 1;
while ($search) {
for (my $i=1; $i < ($seqlength - $name_length); $i ++) {
my $subseq = substr($sequence, $i, $name_length);
if ($subseq eq $name) {
my @short_key_seq_name = split(' ' ,$key_seq_name);
print "$short_key_seq_name[0] contains $name\n";
#print "$key_seq_name\n";
#print "$name2seq{$key_seq_name}\n";
$search = 0;
$found = 1;
last;
}
$search = 0;
last;
}
}
}
if ($found == 0) {
print "$name is not represented in this protein sequence\n";
print "This program checked $round sequences\n";
}
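
A note on why the loop above misses matches: the `$search = 0; last;` that follows the `if` block runs on the very first pass of the `for` loop, so only the window starting at position 1 is ever compared. In addition, `substr` is zero-based, so starting at `$i=1` skips a name at the very start of a sequence, and the `<` test skips the final window. A minimal corrected sketch of the same substring scan is below; the helper name `contains_name` and the test sequences are made up for illustration and are not part of the posted script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the posted script): returns 1 if $name occurs
# anywhere in $sequence, comparing every window of length($name).
sub contains_name {
    my ($sequence, $name) = @_;
    my $name_length = length $name;
    my $last_start  = length($sequence) - $name_length;

    # Start at 0 (substr is zero-based) and include the final window (<=).
    for (my $i = 0; $i <= $last_start; $i++) {
        # Leave the loop only when a window actually matches.
        return 1 if substr($sequence, $i, $name_length) eq $name;
    }
    return 0;    # every window was checked without a match
}

# Made-up sequences for illustration only.
for my $seq ('KEITHAAA', 'AAAKEITH', 'AAKEITHA', 'AAAAAAAA') {
    printf "%s: %s\n", $seq, contains_name($seq, 'KEITH') ? 'found' : 'not found';
}
```

The first three sequences place the name at the start, at the end, and in the middle, which are the cases the original loop bounds and the early `last` would miss.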

Keith Bradnam

Feb 18, 2014, 3:16:47 PM
to unix-and-perl-...@googlegroups.com
Hi Marie-Line,

First of all, thank you for buying our book! :-)

Some comments on your code.

  1. You assign the file name from @ARGV to $file, but then you continue to use $ARGV[0] when opening the file instead of $file.
  2. Your approach is to first slurp the entire file into one array. This might work for this example, but might cause problems if you used a much bigger file (depending on how much memory your computer has). It is probably more efficient to have a while loop that loops over the input file. Then you only need to deal with two lines at a time and ask 'does this current sequence in line 2 contain my target name, if so print out the ID from line 1?'
  3. Be careful with your indentation; it can be hard to follow your code if you don't indent consistently.
  4. You try to match the name by using the substr function to extract every possible substring that has the same length as the target name. This is wildly inefficient in comparison to using the matching operator to just ask 'does this sequence *match* the target name anywhere?'
  5. Switching your code to simply use a matching operator allowed me to remove nearly all of your script and leave just this:
#!/usr/bin/perl
#project4_name_game.pl by Marie-Line
use strict;
use warnings;

####### stores the input data
my ($file, $name) = @ARGV;

#### die statement to check that two arguments given and it is a protein sequence

die "usage: project4_name_game.pl <protein file> <name>" unless @ARGV == 2;
die "The name cannot contain non protein letter which are:B, J, O, U, X, or Z," if ($name =~ m/[bjouxz]/i);

print "input is: $file\t $name\n";

# open the protein file and store each line in the array @line, then close the file
open(my $in, "<$file") or die "Couldn´t open $file";
chomp(my @line = <$in>);
close $in;

# get rid of blank line from start of file
shift (@line);
# transfer the array @line to a hash %name2seq
my %name2seq = @line; #create a hash

############################ looking for a match 
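foreach my $key_seq_name (keys %name2seq) {
    # /i ignores case (the name is no longer upper-cased above);
    # \Q...\E treats $name as literal text
    if ($name2seq{$key_seq_name} =~ m/\Q$name\E/i) {
        print "$key_seq_name contains $name\n";
    }
}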

However, I would still suggest that you try solving this without resorting to storing all of the information in a hash.
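
One way the hash-free approach could look is sketched below. It reads one header line and the sequence line that follows it, tests that pair, and then moves on, so only two lines are held in memory at a time. The helper name `ids_containing`, the assumed file layout (each header line followed by exactly one sequence line, with blank lines skipped, as implied by the posted script), and the in-memory demo data are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads header/sequence line pairs from a filehandle and
# returns the IDs of the sequences that contain $name.
# Assumed layout: each header line is followed by exactly one sequence line.
sub ids_containing {
    my ($in, $name) = @_;
    my @ids;
    while (my $header = <$in>) {
        chomp $header;
        next if $header eq '';            # skip blank lines such as the first one
        my $sequence = <$in>;             # the line after a header is its sequence
        last unless defined $sequence;    # header without a sequence: stop
        chomp $sequence;
        # \Q...\E treats $name as literal text; /i ignores case.
        if ($sequence =~ m/\Q$name\E/i) {
            my ($id) = split ' ', $header;    # first word of the header line
            push @ids, $id;
        }
    }
    return @ids;
}

# Demo on made-up in-memory data instead of a real protein file.
my $data = "\nseq1 first example\nMAAKEITHLL\n"
         . "seq2 second example\nMGGGGGGG\n"
         . "seq3 third example\nKEITHMMM\n";
open(my $in, '<', \$data) or die "Couldn't open in-memory data: $!";
print "$_ contains KEITH\n" for ids_containing($in, 'KEITH');
close $in;
```

With a real input file, the same helper could be given a handle opened with `open(my $in, '<', $file)`, keeping the argument checks from the script above.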

Regards,

Keith

Keith Bradnam

Feb 18, 2014, 3:18:58 PM
to unix-and-perl-...@googlegroups.com
Can you please provide an example of any script that does not run when you include use strict and use warnings on the same line? Also, when you say 'does not run', do you see any type of error message?


On Tuesday, February 18, 2014 11:38:47 AM UTC-8, Marie-Line Faucillion wrote:

Marie-Line Faucillion

Feb 19, 2014, 2:14:31 PM
to unix-and-perl-...@googlegroups.com
Thank you so much for your quick reply. I fixed this kind of problem in my scripts by splitting the line. However, if I copy-paste "array.pl" from the primer, remove the line numbering, save it and try to run it, nothing happens: there are no error messages, and the terminal simply returns to the next prompt.

I attach a file that fails to run on my computer: array_err.pl.

Best regards,

Marie-Line
array_err.pl