Help: Replace Help

Amy Lee

May 1, 2008, 8:29:38 AM

Hello,

I'm going to process some RNA sequence files, and I made a small script
to reverse these sequences. However, I get wrong results when it runs,
because of the order in which the replacements happen.

These are my file contents.

>seq1
ACGU
>seq2
GUACCGU

And I wanna replace A with C, C with A, G with U, and U with G. So as I
see it, the reversed file should look like this.

>seq1
CAUG
>seq2
UGCAAUG

This is my code.

if (@ARGV == 1)
{
    $file = $ARGV[0];
    unless (-e $file)
    {
        print "***Error: $file does not exist.\n";
        next;
    }
    unless (open $FILE_IN, '<', $file)
    {
        print "***Error: Cannot read $file.\n";
        next;
    }
    while (<$FILE_IN>)
    {
        unless (/^>.*$/)
        {
            s/A/C/g;
            s/C/A/g;
            s/G/U/g;
            s/U/G/g;
        }
        print $_;
    }
    close $FILE_IN;
}

When the script finishes, the output looks like this.

>seq1
AAGG
>seq2
GGAAAGG

And I don't wanna use BioPerl to solve this tiny problem; I'm trying to
learn how to do it myself.

So how do I solve this kind of ordering problem? I suppose the
replacements must happen at the same time.

Thank you very much~

Regards,

Amy Lee

A. Sinan Unur

May 1, 2008, 8:46:53 AM

Amy Lee <openlin...@gmail.com> wrote in
news:pan.2008.05.01....@gmail.com:

> Re: Help: Replace Help

You have just wasted your subject line by repeating the word 'Help'.
Clearly, by posting a question here, you are asking for help. Repeating
the word 'help' does not serve any useful purpose.

> These are my file contents.
>
>>seq1
> ACGU
>>seq2
> GUACCGU
>
> And I wanna replace A with C, C with A, G with U, and U with G. So as I
> see it, the reversed file should look like this.
>
>>seq1
> CAUG
>>seq2
> UGCAAUG
>
> This is my code.

You are missing

use strict;
use warnings;

> if (@ARGV == 1)
> {
>     $file = $ARGV[0];
>     unless (-e $file)
>     {
>         print "***Error: $file does not exist.\n";
>         next;
>     }
>     unless (open $FILE_IN, '<', $file)
>     {
>         print "***Error: Cannot read $file.\n";
>         next;
>     }

I do not understand what the 'next's are for: there is no enclosing loop
for them to act on. You should not send error messages to STDOUT lest it
also contain output you would like to use further. You should show the
reason for the error in your error messages by including the $! variable.
In short, replace all of the above with:

my ($file) = @ARGV;

open my $FILE_IN, '<', $file
    or die "Cannot open '$file': $!";

Now, if you are processing files in a loop, replace die with warn.
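A minimal sketch of that warn-and-continue variant, assuming the input file names arrive in @ARGV. process_files is a hypothetical helper name, and the print stands in for whatever per-line work the real script does.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: open each named file in turn. A file that cannot
# be opened is reported on STDERR with the reason ($!) and skipped,
# instead of aborting the whole run as die would.
# Returns the number of files that were opened successfully.
sub process_files {
    my @files  = @_;
    my $opened = 0;
    for my $file (@files) {
        open my $fh, '<', $file or do {
            warn "Cannot open '$file': $!\n";
            next;    # skip to the next file name
        };
        $opened++;
        while (my $line = <$fh>) {
            print $line;    # placeholder for the real per-line processing
        }
        close $fh;
    }
    return $opened;
}

process_files(@ARGV);
```

Because the error goes to STDERR via warn, the translated sequences on STDOUT can still be redirected to a file cleanly.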

Let's suppose $_ contains ACGU

> s/A/C/g;

Now it is CCGU.

> s/C/A/g;

Now it is AAGU.

> s/G/U/g;

Now it is AAUU.

> s/U/G/g;

Now it is AAGG.

I am assuming this is not what you wanted.
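The trace above can be reproduced with a short script that applies the same four s///g operations in order and records each intermediate string; trace_sequential is a hypothetical name used only for this illustration.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Apply the four substitutions one after another, exactly as the
# original loop does, and collect the string after each step.
sub trace_sequential {
    my ($s) = @_;
    my @steps;
    for my $pair ([qw(A C)], [qw(C A)], [qw(G U)], [qw(U G)]) {
        my ($from, $to) = @$pair;
        $s =~ s/$from/$to/g;
        push @steps, $s;
    }
    return @steps;
}

print join(' -> ', 'ACGU', trace_sequential('ACGU')), "\n";
# prints: ACGU -> CCGU -> AAGU -> AAUU -> AAGG
```

Each later substitution sees the output of the earlier ones, which is why the C produced from A is immediately turned back into A.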

> And I don't wanna

s/wanna/want to/

wanna makes you sound childish.

> use BioPerl

Well, I do not know a thing about BioPerl so ...

#!/usr/bin/perl

use strict;
use warnings;

my %subst = qw( A C C A G U U G );
my @strings = qw( ACGU GUACCGU );

print "Before:\t@strings\n";

s/([ACGU])/$subst{$1}/g for @strings;

print "After\t@strings\n";

__END__
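A sketch of how the same lookup table might slot into the per-line loop from the original question, assuming (as in the sample file) that header lines start with '>'. translate_line is a hypothetical helper name, and the in-memory @lines array stands in for reading $FILE_IN.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Lookup table: each base maps to its replacement.
my %subst = qw( A C C A G U U G );

# Header lines ('>seq1') pass through unchanged. On sequence lines every
# base is replaced in a single s///g pass, so a base that has already
# been replaced is never looked at again.
sub translate_line {
    my ($line) = @_;
    return $line if $line =~ /^>/;
    $line =~ s/([ACGU])/$subst{$1}/g;
    return $line;
}

# In the real script: while (<$FILE_IN>) { print translate_line($_) }
my @lines = (">seq1\n", "ACGU\n", ">seq2\n", "GUACCGU\n");
print translate_line($_) for @lines;
```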

--
A. Sinan Unur <1u...@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/

Amy Lee

May 1, 2008, 8:50:40 AM

Sorry, I made a fundamental mistake in my post.

I meant to replace A with U, U with A, C with G, and G with C.

Regards,

Amy

Jürgen Exner

May 1, 2008, 8:50:48 AM

Amy Lee <openlin...@gmail.com> wrote:

>>seq1
>ACGU
>>seq2
>GUACCGU
>
>And I wanna replace A with C, C with A, G with U, and U with G. So as I
>see it, the reversed file should look like this.
>
>>seq1
>CAUG
>>seq2
>UGCAAUG
>
>This is my code.

[4 individual s///]

>So how do I solve this kind of ordering problem? I suppose the
>replacements must happen at the same time.

Long-winded option: replace A with some temporary value, e.g. X, then C
with A, then X with C. And then the same for G and U.

Much better option: use tr{}{}

tr {ACGU}{CAUG};
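Both options can be compared side by side in a short sketch. The placeholder X is assumed never to occur in the sequence data, and the sub names are hypothetical.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Long-winded option: park A in a temporary placeholder (X) while C
# becomes A, then turn X into C; then the same for G and U.
# Assumes X never occurs in the input.
sub swap_via_placeholder {
    my ($s) = @_;
    $s =~ s/A/X/g;  $s =~ s/C/A/g;  $s =~ s/X/C/g;
    $s =~ s/G/X/g;  $s =~ s/U/G/g;  $s =~ s/X/U/g;
    return $s;
}

# tr/// option: each character is translated exactly once, in a single
# pass, so the order in which the pairs are listed does not matter.
sub swap_via_tr {
    my ($s) = @_;
    $s =~ tr{ACGU}{CAUG};
    return $s;
}

for my $seq (qw( ACGU GUACCGU )) {
    print "$seq\t", swap_via_placeholder($seq), "\t", swap_via_tr($seq), "\n";
}
```

The tr/// version needs no placeholder and no assumption about the data, which is why it is the better fit for single-character swaps.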

jue

A. Sinan Unur

May 1, 2008, 8:55:43 AM

> Sorry, I made a fundamental mistake in my post.
>
> I meant to replace A with U, U with A, C with G, and G with C.

You can easily adapt either Jürgen's solution (better for single-character,
lookup-table-driven substitutions) or mine to work with whatever you need.
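For the corrected mapping (A to U, U to A, C to G, G to C), adapting either approach might look like the following sketch; complement_tr and complement_hash are hypothetical names.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# tr/// version: list the characters to find and their replacements
# in matching positions on the two sides.
sub complement_tr {
    my ($s) = @_;
    (my $c = $s) =~ tr/ACGU/UGCA/;
    return $c;
}

# Lookup-table version: write each pair once and derive the reverse
# mapping, so A<->U and C<->G are both covered.
my %pair  = qw( A U C G );
my %subst = (%pair, reverse %pair);

sub complement_hash {
    my ($s) = @_;
    (my $c = $s) =~ s/([ACGU])/$subst{$1}/g;
    return $c;
}

print complement_tr('GUACCGU'), "\n";    # prints: CAUGGCA
```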

Sinan

Amy Lee

May 1, 2008, 9:02:48 AM

Thank you very much. I've solved my problem. And could you tell me what
{} stands for?

Thank you again~

Amy

Ben Bullock

May 1, 2008, 9:02:21 AM

On Thu, 01 May 2008 20:29:38 +0800, Amy Lee wrote:

> So how do I solve this kind of ordering problem? I suppose the
> replacements must happen at the same time.

For single letters you can use

tr/ACGU/CAUG/;

If the strings to swap are longer than a single character,

s/A/unlikely/g;
s/C/A/g;
s/unlikely/C/g;
s/G/unlikely/g;
s/U/G/g;
s/unlikely/U/g;

where "unlikely" is a string which is unlikely to occur in your data.
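The placeholder technique generalises to multi-character strings. The sketch below uses a hypothetical sub name and an arbitrary NUL-delimited default placeholder, assumes the data and swap strings contain no NUL bytes, checks the "unlikely" assumption instead of trusting it, and quotes each string with \Q...\E so regex metacharacters are matched literally.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Swap every occurrence of $x with $y and vice versa via a temporary
# placeholder. Rather than trusting that the placeholder is "unlikely",
# refuse to run if it already occurs in the text or contains $x or $y.
# Assumes none of the strings contain NUL bytes.
sub swap_with_placeholder {
    my ($text, $x, $y, $tmp) = @_;
    $tmp = "\0SWAP\0" unless defined $tmp;    # arbitrary default
    die "placeholder occurs in the input\n" if index($text, $tmp) >= 0;
    die "placeholder contains a swap string\n"
        if index($tmp, $x) >= 0 || index($tmp, $y) >= 0;
    $text =~ s/\Q$x\E/$tmp/g;    # x -> placeholder
    $text =~ s/\Q$y\E/$x/g;      # y -> x
    $text =~ s/\Q$tmp\E/$y/g;    # placeholder -> y
    return $text;
}

print swap_with_placeholder('ACGUAC', 'AC', 'GU'), "\n";    # prints: GUACGU
```

Each additional pair still costs three full passes over the text, which is the trade-off discussed further down the thread.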

RedGrittyBrick

May 1, 2008, 9:28:24 AM

Amy Lee wrote:
> On Thu, 01 May 2008 12:50:48 +0000, Jürgen Exner wrote:
>
>> Amy Lee <openlin...@gmail.com> wrote:
>>
>>> I wanna replace A with C, C with A, G with U, and U with G.
>>
>> tr {ACGU}{CAUG};

>>
> could you tell me what {} stands for?
>

{} stands for {}

They are just used to group the characters to be replaced and their
replacements.

The following are all equivalent

tr/ACGU/CAUG/;
tr!ACGU!CAUG!;
tr-ACGU-CAUG-;
tr.ACGU.CAUG.;

tr{ACGU}{CAUG};
tr(ACGU)(CAUG);
tr[ACGU][CAUG];
tr<ACGU>(CAUG);

Perl lets you use almost any character as the delimiter that separates
the two groups of characters; alternatively, you can use a pair of
brackets or braces ((), [], {} or <>) to enclose each group.

Choose whatever characters make the code clearest to readers. The oldest
form is the first one shown above, but people may use one of the other
forms for greater clarity if, for example, they need to translate '/'
into something else.
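A quick way to confirm that the delimiter styles are interchangeable is to run several of them on the same input; the variable names are just for illustration.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Four copies of the same sequence, each translated with a different
# delimiter style.
my @out = ('ACGU') x 4;
$out[0] =~ tr/ACGU/CAUG/;
$out[1] =~ tr!ACGU!CAUG!;
$out[2] =~ tr{ACGU}{CAUG};
$out[3] =~ tr<ACGU>(CAUG);
print "@out\n";    # prints: CAUG CAUG CAUG CAUG

# With a non-slash delimiter, '/' can be translated without escaping.
(my $path = 'a/b/c') =~ tr{/}{_};
print "$path\n";   # prints: a_b_c
```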

--
RGB

Jürgen Exner

May 1, 2008, 9:36:22 AM

Amy Lee <openlin...@gmail.com> wrote:
>On Thu, 01 May 2008 12:50:48 +0000, Jürgen Exner wrote:
>> Much better option: use tr{}{}
>>
>> tr {ACGU}{CAUG};
>>
> And could you tell me what {} stands for?

Hmmmm, what do you mean? They are just curly brackets (braces), see
http://en.wikipedia.org/wiki/Brackets#Uses_of_.E2.80.9C.7B.E2.80.9D_and_.E2.80.9C.7D.E2.80.9D

You may also want to read 'perldoc perlop', section 'Quote and
Quote-like Operators'.

jue

A. Sinan Unur

May 1, 2008, 9:46:27 AM

Ben Bullock <benkasmi...@gmail.com> wrote in
news:fvcf0t$pa6$1@ml.accsnet.ne.jp:

A simple lookup table driven solution would obviate the need to make
assumptions about the unlikeliness of a given character as well as
getting rid of the multiple substitutions.

szr

May 1, 2008, 9:21:47 PM

A. Sinan Unur wrote:
[...]

> Clearly, by posting a question here, you are asking for help.

I disagree that, in general, simply posting means one is seeking help.
One could just as well be seeking a discussion, or insight into
something, rather than assistance as such. After all, this /is/ a
*discussion* group :-)

--
szr


Ben Bullock

May 1, 2008, 9:40:04 PM

A. Sinan Unur <1u...@llenroc.ude.invalid> wrote:
> Ben Bullock <benkasmi...@gmail.com> wrote

>> If the strings to swap are longer than a single character,

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


>> s/A/unlikely/g;
>> s/C/A/g;
>> s/unlikely/C/g;
>> s/G/unlikely/g;
>> s/U/G/g;
>> s/unlikely/U/g;
>>
>> where "unlikely" is a string which is unlikely to occur in your data.
>
> A simple lookup table driven solution would obviate the need to make
> assumptions about the unlikeliness of a given character as well as
> getting rid of the multiple substitutions.

And a simple tr/// based solution would obviate the need for you to
write a lookup table solution. But if the strings to swap are longer than
a single character, the lookup table solution is going to be somewhat
complex.

Here is an example of a badly-written lookup table solution:

#!/usr/bin/perl

use strict;
use warnings;

my %subst = qw( A C C A G U U G );
my @strings = qw( ACGU GUACCGU );

print "Before:\t@strings\n";

s/([ACGU])/$subst{$1}/g for @strings;

print "After\t@strings\n";

__END__

The problem here is that the writer has put the same data, the list of
stuff to swap, in three different places. Maybe that kind of clumsy
solution is OK for an example program, but for the real world it's
not. If one uses a lookup table, then the swapping data should only be
in exactly one place:

my %subst = qw/A C G U/; # Do not repeat this data anywhere!!!!!
%subst = (%subst, reverse %subst);
my $substkeys = join ('|',keys %subst); # We want to swap strings so use |


my @strings = qw( ACGU GUACCGU );

s/($substkeys)/$subst{$1}/g for @strings;

If one uses the original solution proposed above and the list of data to
swap changes (remember that the strings consist of more than one
character), bugs will occur unless the programmer is extremely careful
to update both the list of stuff to swap and the left-hand side of the
substitution.

So I don't recommend a lookup table, unless one knows what one is doing.

A. Sinan Unur

May 1, 2008, 10:16:04 PM

benkasmi...@gmail.com (Ben Bullock) wrote in
news:fvdrdk$8la$1...@ml.accsnet.ne.jp:

> A. Sinan Unur <1u...@llenroc.ude.invalid> wrote:
>> Ben Bullock <benkasmi...@gmail.com> wrote
>
>>> If the strings to swap are longer than a single character,
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> s/A/unlikely/g;
>>> s/C/A/g;
>>> s/unlikely/C/g;
>>> s/G/unlikely/g;
>>> s/U/G/g;
>>> s/unlikely/U/g;
>>>
>>> where "unlikely" is a string which is unlikely to occur in your
>>> data.
>>
>> A simple lookup table driven solution would obviate the need to make
>> assumptions about the unlikeliness of a given character as well as
>> getting rid of the multiple substitutions.
>
> And a simple tr/// based solution would obviate the need for you to
> write a lookup table solution. But if the strings to swap are longer
> than a single character, the lookup table solution is going to be
> somewhat complex.

Granted.

> Here is an example of a badly-written lookup table solution:
>

<snipped for brevity>

>
> The problem here is that the writer has put the same data, the list of
> stuff to swap, in three different places. Maybe that kind of clumsy
> solution is OK for an example program,

and that was the spirit in which those lines were written.

> but for the real world it's not. If one uses a lookup table, then the
> swapping data should only be in exactly one place:
>
> my %subst = qw/A C G U/; # Do not repeat this data anywhere!!!!!
> %subst = (%subst, reverse %subst);
> my $substkeys = join ('|',keys %subst); # We want to swap strings so use |
> my @strings = qw( ACGU GUACCGU );
> s/($substkeys)/$subst{$1}/g for @strings;
>
> If one uses the original solution proposed above and the list of data to
> swap changes (remember that the strings consist of more than one
> character), bugs will occur unless the programmer is extremely careful
> to update both the list of stuff to swap and the left-hand side of the
> substitution.
>
> So I don't recommend a lookup table, unless one knows what one is
> doing.

Well, if one uses the solution you proposed above and the list of data
to swap changes to

my %subst = qw( A|C C|A G|U U|G );

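there will be issues with the way you build the search string: '|' is a
regex metacharacter, so each key has to be quoted (e.g. with \Q...\E)
before the keys are joined into an alternation.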

So:

#!/usr/bin/perl

use strict;
use warnings;

my %replace = qw( A|C C|A G|U U|G A$A Z$Z);
%replace = (%replace, reverse %replace);

my $search = join ('|', map { "(?:\Q$_\E)" } keys %replace);
my @strings = qw( A|C G|U G|UA|CC|AG|U Z$Z A$A );

print "Before:\t@strings\n";

s/($search)/$replace{$1}/g for @strings;

print "After\t@strings\n";

__END__
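When one key can be a prefix of another, the order of the alternatives matters too, because Perl's regex engine takes the first alternative that matches at a given position rather than the longest one. A sketch that sorts the quoted keys longest first, using made-up example data and a hypothetical swap_all helper:

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical example data in which one key ('A') is a prefix of
# another ('AA'). Each pair is written once; the reverse mapping is
# derived from it.
my %pair    = ( A => 'C', AA => 'GG' );
my %replace = ( %pair, reverse %pair );

# Quote every key (quotemeta is the function behind \Q...\E) and put the
# longest keys first, so 'AA' is tried before 'A' at each position.
my $search = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) || $a cmp $b } keys %replace;

sub swap_all {
    my ($s) = @_;
    $s =~ s/($search)/$replace{$1}/g;
    return $s;
}

print swap_all('AAA'), "\n";    # prints: GGC
```

Without the length sort, an alternation that happened to list 'A' first would turn 'AAA' into 'CCC' instead.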

Ben Bullock

May 1, 2008, 10:32:15 PM

A. Sinan Unur <1u...@llenroc.ude.invalid> wrote:

> Well, if one uses the solution you proposed above and the list of data
> to swap changes to
>
> my %subst = qw( A|C C|A G|U U|G );
>
> there will be issues with the way you build the search string.

> my $search = join ('|', map { "(?:\Q$_\E)" } keys %replace);

So you agree that the lookup table driven solution isn't simple?

I think my original method of substituting in an unlikely string,
which you objected to, was fairly appropriate for this particular
question. I often use this kind of method for quick jobs.


A. Sinan Unur

May 1, 2008, 10:50:43 PM

benkasmi...@gmail.com (Ben Bullock) wrote in
news:fvduff$96d$1...@ml.accsnet.ne.jp:

Yes. That was the first thing in my response: 'Granted'.

OTOH, the number of repeated substitution operations which the 'unlikely
string' approach requires (especially as the number of
lookups/replacements grows) makes me think that the more complex
approach might end up being simpler to maintain for any 'durable'
program.

Thank you for your corrections.

Sinan
