I have a problem processing my sequence file. The file's content
looks like this:
>seq1
ACGGTC
ACTG
>seq2
CGATCC
ACCTC
>seq3
......
I would like to write each sequence to its own file. For example, the
content of file "seq1" would be
>seq1
ACGGTC
ACTG
and the content of file "seq2" would be
>seq2
CGATCC
ACCTC
and so on.
However, I'm a newbie in Perl and I don't know how to do this. Could
anyone post some sample code? I don't want to use BioPerl because,
although it's quite useful, it isn't installed on the other machines.
Thank you very much~
Regards,
Amy Lee
I know text files, binary files, random access files, sequential files,
but I've never heard of a sequence file.
>The file content
>is like this.
>
>>seq1
>ACGGTC
>ACTG
>>seq2
>CGATCC
>ACCTC
>>seq3
>......
>
>And I hope make every sequence into a single file. For example, a file
What is a sequence?
>"seq1" content is
>>seq1
>ACGGTC
>ACTG
>And a file "seq2" content is
>>seq2
>CGATCC
>ACCTC
>and so on.
How is this desired content different from the original content? They
seem to be identical to me.
>However, I'm only a newbie in perl, I don't know what to do. So could
>anyone post some sample codes to do that?
Probably not without some much improved specification.
jue
Most of my work is processing DNA, so I save DNA sequences in a format
called FASTA, as you've seen above. Say my file is called dna.fasta and its
content is
>seq1
ACGGTC
ACTG
>seq2
CGATCC
ACCTC
>seq3
......
"seq1", "seq2", "seq3" and so on ("seqx") are the names of the sequences;
you could call them markers. Under each "seqx" line is the DNA sequence
itself. My goal is quite simple: I want to extract every sequence from
dna.fasta and save each one as its own file.
Here's an example.
From dna.fasta, I would get 3 sequence files named after the markers:
seq1, seq2 and seq3. The content of file seq1 is
>seq1
ACGGTC
ACTG
The content of file seq2 is
>seq2
CGATCC
ACCTC
And so on. That way I can work with my sequences easily.
From your previous description I thought those were 3 separate files.
Obviously I was wrong.
>The "seq1" "seq2" "seq3" and "seqx" is the names of these sequences. I can
>say, it's a mark. And under "seqx" it's DNA sequences. My point is quite
>simple, I wanna extract every sequences as a file saved. I mean I can
>extract sequences for dna.fasta and make a single file for every sequences.
So you want to split the file at each ">seq*" marker.
Well, then why not just loop (while (<>)) through the input file and
whenever you encounter such a marker (m//) close() the current output
file and open() a new one?
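A minimal sketch of that loop, under some assumptions that are not from this thread: the helper name split_fasta and its $dir argument are made up for illustration, the output file is named after the first whitespace-delimited word of the header, and that word is assumed to be safe to use as a file name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# split_fasta() is a hypothetical helper name, not something from this thread.
# It reads FASTA text from the filehandle $in, writes one file per record
# into the directory $dir, and returns the list of paths it created.
sub split_fasta {
    my ($in, $dir) = @_;
    my ($out, @created);
    while (my $line = <$in>) {
        if ($line =~ /^>(\S+)/) {    # marker line such as ">seq1"
            # Close the previous record's file before starting a new one.
            if ($out) {
                close $out or die "Cannot close output file: $!";
            }
            my $path = "$dir/$1";
            open $out, '>', $path or die "Cannot open '$path': $!";
            push @created, $path;
        }
        # Lines before the first marker have no file to go to; skip them.
        print {$out} $line if $out;
    }
    if ($out) {
        close $out or die "Cannot close output file: $!";
    }
    return @created;
}

# Example call: perl split.pl dna.fasta
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open '$ARGV[0]': $!";
    print "$_\n" for split_fasta($in, '.');
    close $in;
}
```

Opening with '>' overwrites an existing file of the same name, so running the script twice does not duplicate records. If two records share a name, the later one replaces the earlier one.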
jue
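while ( <> ) {
    # A marker line such as ">seq1" starts a new record.
    if ( /^>(.+)/ ) {
        # Open (in append mode) a file named after the marker text.
        open my $OUT, '>>', $1 or die "Cannot open '$1': $!";
        # Make it the default filehandle for the print below.
        select $OUT;
    }
    # Marker and sequence lines go to the currently selected file.
    print;
}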
John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
Thank you very much~
I've solved this problem.
Regards,
Amy Lee
Anyway, could you tell me how to find out the usage of the "select" function?
Thank you.
Yes, you are right, and the code works for my task.
Thank you again~
Amy
The usage of each Perl function is described in the first line(s) of the
manual page for that function (for example, perldoc -f select). It doesn't
explicitly say "Usage" as Unix man pages do, but it has the same format:
select FILEHANDLE
Sometimes, if a function is overloaded, there may be additional usages
farther down the page, too, e.g.
select RBITS,WBITS,EBITS,TIMEOUT
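To illustrate the one-argument form, here is a small sketch. The helper name capture_with_select is made up for this example, and it writes to an in-memory filehandle rather than a real file so it leaves nothing behind.

```perl
use strict;
use warnings;

# capture_with_select() is a hypothetical name used only for this illustration.
sub capture_with_select {
    my $buffer = '';
    # Open an in-memory filehandle that writes into $buffer.
    open my $mem, '>', \$buffer or die "Cannot open in-memory handle: $!";

    my $previous = select $mem;    # select returns the previously selected handle
    print "goes to the buffer\n";  # no filehandle given, so this goes to $mem
    select $previous;              # restore the old default (normally STDOUT)

    close $mem or die "Cannot close in-memory handle: $!";
    return $buffer;
}

print "captured: ", capture_with_select();
```

Saving the return value of select and restoring it afterwards keeps later print calls going to the original default handle. The four-argument form is a different function altogether (the select system call); perlfunc shows it used as a short sleep, select(undef, undef, undef, 0.25).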
jue