I have a file that I would like to read in then do the following:
- Read in each line and remove any duplicate text with tags
- Sort the file so all tag IDs are in sequential order
- Save the results to a different file name.
Can this be done easily? If so, how? I'm really a newbie at this
stuff. Any help would be greatly appreciated.
Ed
Below is a sample of my input file and what I want the output file to
look like.
Input:
<tag id=1>Data 1</tag>
<tag id=2>Data 2</tag>
<tag id=3>Data 3</tag>
<tag id=4>Data 4</tag>
<tag id=2>Data 2</tag>
<tag id=5>Data 5</tag>
<tag id=13>Data 13</tag>
<tag id=6>Data 6</tag>
<tag id=7>Data 7</tag>
<tag id=8>Data 8</tag>
<tag id=9>Data 9</tag>
<tag id=13>Data 13</tag>
<tag id=10>Data 10</tag>
<tag id=11>Data 11</tag>
<tag id=12>Data 12</tag>
<tag id=14>Data 14</tag>
<tag id=15>Data 15</tag>
<tag id=16>Data 16</tag>
<tag id=17>Data 17</tag>
Output:
<tag id=1>Data 1</tag>
<tag id=2>Data 2</tag>
<tag id=3>Data 3</tag>
<tag id=4>Data 4</tag>
<tag id=5>Data 5</tag>
<tag id=6>Data 6</tag>
<tag id=7>Data 7</tag>
<tag id=8>Data 8</tag>
<tag id=9>Data 9</tag>
<tag id=10>Data 10</tag>
<tag id=11>Data 11</tag>
<tag id=12>Data 12</tag>
<tag id=13>Data 13</tag>
<tag id=14>Data 14</tag>
<tag id=15>Data 15</tag>
<tag id=16>Data 16</tag>
<tag id=17>Data 17</tag>
Hello,
> I have a file that I would like to read in then do the following:
>
> - Read in each line and remove any duplicate text with tags
> - Sort the file so all tag IDs are in sequential order
> - Save the results to a different file name.
>
> Can this be done easily? If so, how? I'm really a newbie at this
> stuff. Any help would be greatly appreciated.
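#!/usr/bin/perl
use warnings;
use strict;

my $file_in  = 'somefile';
my $file_out = 'differentfile';

open my $in,  '<', $file_in  or die "Cannot open $file_in: $!";
open my $out, '>', $file_out or die "Cannot open $file_out: $!";

my %seen;    # text between the tags that has already been seen

# Read the pipeline from the bottom up:
print $out
    map  $_->[ 1 ],                        # 4. recover the original line
    sort { $a->[ 0 ] <=> $b->[ 0 ] }       # 3. sort numerically by tag ID
    map  [ /<tag id=(\d+)>/, $_ ],         # 2. pair each line with its ID
    grep />([^<]+)</ && !$seen{ $1 }++,    # 1. keep the first line per text
    <$in>;

close $out or die "Cannot close $file_out: $!";

__END__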
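
If the single print statement above is hard to follow, here is a rough step-by-step sketch of the same idea, written with ordinary loops. The sub name dedupe_and_sort is made up for illustration and is not part of the script above. The sketch assumes every line looks like the sample input, and it simply skips any line that has no numeric tag ID.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper (illustration only): takes a list of lines, drops
# lines whose text between the tags was already seen, and returns the
# remaining lines sorted numerically by tag ID.
sub dedupe_and_sort {
    my @lines = @_;

    # Step 1: keep only the first line for each distinct tagged text.
    my %seen;
    my @unique;
    for my $line (@lines) {
        next unless $line =~ />([^<]+)</;    # no tagged text on this line
        next if $seen{$1}++;                 # text was seen before
        push @unique, $line;
    }

    # Step 2: pair each line with its numeric ID.
    # Assumption: lines without an ID are skipped.
    my @pairs;
    for my $line (@unique) {
        my ($id) = $line =~ /<tag id=(\d+)>/;
        next unless defined $id;
        push @pairs, [ $id, $line ];
    }

    # Step 3: sort the pairs numerically by ID (<=>, not string cmp,
    # so that 10 sorts after 9 rather than after 1).
    my @sorted = sort { $a->[0] <=> $b->[0] } @pairs;

    # Step 4: throw away the IDs and return just the lines.
    return map { $_->[1] } @sorted;
}

# Small demonstration using a few lines from the sample input.
my @input = (
    "<tag id=2>Data 2</tag>\n",
    "<tag id=13>Data 13</tag>\n",
    "<tag id=1>Data 1</tag>\n",
    "<tag id=2>Data 2</tag>\n",
    "<tag id=13>Data 13</tag>\n",
);
print dedupe_and_sort(@input);
```

Run as is, this prints the lines for IDs 1, 2 and 13, in that order, once each. The pair-sort-unpair pattern in steps 2 to 4 is commonly called a Schwartzian transform; it extracts each ID only once instead of re-running the regex on every comparison.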
John
--
use Perl;
program
fulfillment