re: Reading in external file, strip out duplicates, sort then save as ext. file


Macromedia

Sep 21, 2005, 6:27:30 PM
to begi...@perl.org

Hi,

I have a file that I would like to read in then do the following:

- Read in each line and remove any duplicate text with tags
- Sort the file so all tag IDs are in sequential order
- Save the results to a different file name.

Can this be done easily? If so, how? I'm really a newbie at this
stuff. Any help would be greatly appreciated.

Ed

Below is a sample of my input file and what I want the output file to
look like.


Input:

<tag id=1>Data 1</tag>
<tag id=2>Data 2</tag>
<tag id=3>Data 3</tag>
<tag id=4>Data 4</tag>
<tag id=2>Data 2</tag>
<tag id=5>Data 5</tag>
<tag id=13>Data 13</tag>
<tag id=6>Data 6</tag>
<tag id=7>Data 7</tag>
<tag id=8>Data 8</tag>
<tag id=9>Data 9</tag>
<tag id=13>Data 13</tag>
<tag id=10>Data 10</tag>
<tag id=11>Data 11</tag>
<tag id=12>Data 12</tag>
<tag id=14>Data 14</tag>
<tag id=15>Data 15</tag>
<tag id=16>Data 16</tag>
<tag id=17>Data 17</tag>


Output:

<tag id=1>Data 1</tag>
<tag id=2>Data 2</tag>
<tag id=3>Data 3</tag>
<tag id=4>Data 4</tag>
<tag id=5>Data 5</tag>
<tag id=6>Data 6</tag>
<tag id=7>Data 7</tag>
<tag id=8>Data 8</tag>
<tag id=9>Data 9</tag>
<tag id=10>Data 10</tag>
<tag id=11>Data 11</tag>
<tag id=12>Data 12</tag>
<tag id=13>Data 13</tag>
<tag id=14>Data 14</tag>
<tag id=15>Data 15</tag>
<tag id=16>Data 16</tag>
<tag id=17>Data 17</tag>

John W. Krahn

Sep 21, 2005, 6:58:35 PM
to Perl Beginners
macromedia wrote:
> Hi,

Hello,

> I have a file that I would like to read in then do the following:
>
> - Read in each line and remove any duplicate text with tags
> - Sort the file so all tag IDs are in sequential order
> - Save the results to a different file name.
>
> Can this be done easily? If so, how? I'm really a newbie at this
> stuff. Any help would be greatly appreciated.

#!/usr/bin/perl
use warnings;
use strict;

my $file_in = 'somefile';
my $file_out = 'differentfile';

open my $in, '<', $file_in or die "Cannot open $file_in: $!";
open my $out, '>', $file_out or die "Cannot open $file_out: $!";

my %seen;
print $out map  $_->[ 1 ],
           sort { $a->[ 0 ] <=> $b->[ 0 ] }
           map  [ /<tag id=(\d+)>/, $_ ],
           grep />([^<]+)</ && !$seen{ $1 }++,
           <$in>;

__END__

John
--
use Perl;
program
fulfillment
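
For readers new to chained map/sort/grep, the same three steps (drop duplicates, sort by tag ID, emit the lines) can be sketched as an explicit loop. This is only an illustrative rewrite, not the code from the reply above: the helper name dedupe_and_sort and the in-memory @sample list are assumptions made for the example, and unlike the one-liner it silently skips any line that does not look like <tag id=N>text</tag>.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes a list of lines and
# returns them de-duplicated by the text between the tags, sorted
# numerically by tag id.
sub dedupe_and_sort {
    my @lines = @_;

    my %seen;       # text content already encountered
    my @records;    # [ id, original line ] pairs

    for my $line (@lines) {
        # Skip lines that do not look like <tag id=N>text</tag>
        next unless $line =~ /<tag id=(\d+)>([^<]+)</;
        my ( $id, $text ) = ( $1, $2 );

        # Keep only the first line carrying a given text
        next if $seen{$text}++;

        push @records, [ $id, $line ];
    }

    # Numeric sort on the id, then hand back just the original lines
    return map { $_->[1] } sort { $a->[0] <=> $b->[0] } @records;
}

# Assumed sample data, modelled on the input shown in the question
my @sample = (
    "<tag id=2>Data 2</tag>\n",
    "<tag id=13>Data 13</tag>\n",
    "<tag id=1>Data 1</tag>\n",
    "<tag id=2>Data 2</tag>\n",
);

print dedupe_and_sort(@sample);
```

To work on real files, the sample list could be replaced by the lines read from an input filehandle and the print directed at an output filehandle, as in the reply above. The numeric comparison (<=>) matters here: a plain string sort would place id=13 before id=2.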
