#!/usr/bin/perl -w
use Data::Dumper;
my %HoH=();
while (<DATA>)
{
if ( /(\S+) (\S+\s\S+\s\S+) (.*?) (\w+) \S+ (\S+) (\S+)( \(([^)]+\)))?/ )
{
if ($6 eq 'Start')
{
%HoH = (
'start_server' => {'start_name' =>$5, 'start_date' => $2,
'start_time' => $3},
);
}
if ($6 eq 'End')
{
%HoH = (
'ende_name' => {'end_name' =>$5, 'end_date' => $2,
'end_time' => $3, 'size' => $7}
);
}
}
}
print Dumper(\%HoH);
=item
What I want is to match the right server with the right date, time and size.
The output I'm looking for is:
Server: Hercules:sm_fv_servicedata
Start date: Sun Nov 15
Start time: 00:00:03
End date: Sun Nov 15
End time: 00:00:55
End size: (53664 KB)
Same for other servers
=cut
__DATA__
dst Sun Nov 15 00:00:03 EST
galaxy.fuqua.duke.edu:fv_servicedataHercules:sm_fv_servicedata Start
dst Sun Nov 15 00:00:03 EST
galaxy.fuqua.duke.edu:fv_phdHercules:sm_fv_phd Start
dst Sun Nov 15 00:00:03 EST
galaxy.fuqua.duke.edu:fv_studentsHercules:sm_fv_students Start
dst Sun Nov 15 00:00:25 EST
galaxy.fuqua.duke.edu:fv_studentsHercules:sm_fv_students End (3368 KB)
dst Sun Nov 15 00:00:26 EST
galaxy.fuqua.duke.edu:fv_phdHercules:sm_fv_phd End (4528 KB)
dst Sun Nov 15 00:00:55 EST
galaxy.fuqua.duke.edu:fv_servicedataHercules:sm_fv_servicedata End
(53664 KB)
dst Sun Nov 15 00:01:00 EST
andromeda.fuqua.duke.edu:esx_fc_nfs2Hercules:sm_esx_fc_nfs2 Request
(Retry)
dst Sun Nov 15 00:15:04 EST
galaxy.fuqua.duke.edu:fv_facultyHercules:sm_fv_faculty Start
dst Sun Nov 15 00:15:04 EST
galaxy.fuqua.duke.edu:fv_researchdataHercules:sm_fv_researchdata Start
dst Sun Nov 15 00:15:04 EST
galaxy.fuqua.duke.edu:fv_embaHercules:sm_fv_emba Start
dst Sun Nov 15 00:15:04 EST
galaxy.fuqua.duke.edu:rootHercules:sm_galaxy_root Start
dst Sun Nov 15 00:15:13 EST
galaxy.fuqua.duke.edu:fv_embaHercules:sm_fv_emba End (1900 KB)
dst Sun Nov 15 00:15:18 EST
galaxy.fuqua.duke.edu:fv_researchdataHercules:sm_fv_researchdata End
(1820 KB)
dst Sun Nov 15 00:15:32 EST
galaxy.fuqua.duke.edu:rootHercules:sm_galaxy_root End (39128 KB)
dst Sun Nov 15 00:16:00 EST
andromeda.fuqua.duke.edu:esx_fc_nfs2Hercules:sm_esx_fc_nfs2 Request
(Retry)
You need
use warnings;
use strict;
here, and you don't need -w (use warnings is better).
> use Data::Dumper;
> my %HoH=();
Use sensible names for your variables. I know the documentation in
perllol uses names like @AoA: that's because it's just an example, and
the only interesting thing about the variables is that they hold an
array-of-arrays. That isn't the case here (you're more interested in
what the data is than in how it's stored) so you should call it
something like %servers.
> while (<DATA>)
> {
> if ( /(\S+) (\S+\s\S+\s\S+) (.*?) (\w+) \S+ (\S+) (\S+)( \(([^)]+
> \)))?/ )
You don't use $1 or $4; also relying on numbered captures makes your
code hard to read. Since your data actually looks like
dst Sun Nov 15 00:00:25 EST
galaxy.fuqua.duke.edu:fv_studentsHercules:sm_fv_students End (3368 KB)
(presumably all on one line: you need to be careful to be more clear
about that) you probably want something more like
if (
my ($date, $name, $action, $size) =
/dst (.{10}) (.{8}) \w+ [\w.:]+(Hercules:\w+) (Start|End) (\(.*?\))?/
) {
I am here assuming that the date and time are of fixed length, and that
you can safely ignore the timezone: you need to be sure this is correct.
It might be better to extract the whole date-and-time string (including
the timezone) and then parse it with Date::Parse or DateTime or
something. I presume there is a good reason for omitting the year from
the logfile: what's going to happen in a month and a half when you have
some entries from 2009 and some from 2010?
I am also assuming that the part of the 'name' string you want always
starts with 'Hercules:'. If this isn't correct you will need to work out
what the correct rules are.
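A minimal sketch of that match, assuming the one-line sample format shown above. The helper name parse_line is hypothetical, and the regex differs slightly from the one above: the space before the size is moved inside the optional group, so that a 'Start' line with nothing after it still matches.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (date, time, name, action, size) in list
# context, or an empty list if the line is not a Start/End record.
sub parse_line {
    my ($line) = @_;
    return $line =~
        /^dst (.{10}) (.{8}) \w+ [\w.:]+(Hercules:\w+) (Start|End)(?: (\(.*?\)))?/;
}

my @fields = parse_line(
    'dst Sun Nov 15 00:00:25 EST galaxy.fuqua.duke.edu:fv_studentsHercules:sm_fv_students End (3368 KB)'
);

# The size is undef on Start lines, so substitute a placeholder when printing.
print join(' | ', map { defined $_ ? $_ : '(none)' } @fields), "\n" if @fields;
```

The fixed-width date and time and the 'Hercules:' prefix are the same assumptions as stated above; if either is wrong for the real logs, the pattern needs adjusting.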
> {
> if ($6 eq 'Start')
if ($action eq "Start") {
> {
> %HoH = (
> 'start_server' => {'start_name' =>$5, 'start_date' => $2,
> 'start_time' => $3},
> );
This will set %HoH to just one entry, deleting any other entries you've
already parsed. Since you are trying to build up a set of records about
each server, you need to use the server name as a key in the hash. Since
you want two entries for each server (a 'start' and an 'end' record) you
need a further key below that:
$servers{$name}{start_server} = {
date => $date,
time => $time,
};
Notice that I didn't need to worry about creating $servers{$name} or
putting a new hashref in it: I just assumed it exists and contained one.
Perl allows you to do this, for convenience.
You need a similar assignment in the 'End' block; and, in fact, since
these two assignments are pretty-much the same, you could leave off the
'if' blocks altogether and just do
$servers{$name}{$action} = {
date => $date,
time => $time,
size => $size,
};
This will mean your 'Start' records end up with a 'size' key set to
undef, but that doesn't matter. (Notice that the fact I changed the
regex above to ignore the lines that weren't 'Start' or 'End' makes this
section of the code simpler.)
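As a rough illustration of the structure this builds, here is a sketch that feeds two already-parsed records into the hash. The @records list is made-up sample input taken from the log lines above, not part of the original program.

```perl
use strict;
use warnings;
use Data::Dumper;

# Sample records in the order (date, time, name, action, size).
my @records = (
    [ 'Sun Nov 15', '00:00:03', 'Hercules:sm_fv_phd', 'Start', undef ],
    [ 'Sun Nov 15', '00:00:26', 'Hercules:sm_fv_phd', 'End',   '(4528 KB)' ],
);

my %servers;
for my $record (@records) {
    my ($date, $time, $name, $action, $size) = @$record;

    # $servers{$name} is created automatically the first time it is used.
    $servers{$name}{$action} = {
        date => $date,
        time => $time,
        size => $size,
    };
}

$Data::Dumper::Sortkeys = 1;    # stable key order in the dump
print Dumper(\%servers);
```

Each server name ends up with a 'Start' and an 'End' entry, which is why a second Start/End pair for the same server would overwrite the first.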
> print Dumper(\%HoH);
Obviously at this point you will need to print the data out in the
correct format, rather than simply using Dumper. You want to leave the
Dumper in for now, though, so you can see what the data structure looks
like. Your printing code will need to start with
for my $server (keys %servers) {
Ben
Correct, each entry is all on one line, such as the one below:
dst Sun Nov 15 00:00:03 EST galaxy.fuqua.duke.edu:fv_phd Hercules:sm_fv_phd Start
>
> I am also assuming that the part of the 'name' string you want always
> starts with 'Hercules:'. If this isn't correct you will need to work out
> what the correct rules are.
No, 'name' does not always start with 'Hercules:'; it could be
different, as each log is different.
Also, ^dst is not always the case; it could be slk or src.
Per your recommendation I created the following, but it does not work:
#!/usr/bin/perl -w
use warnings;
use strict;
use Data::Dumper;
my %servers;
my $time;
while (<DATA>) {
if (my ($date, $name, $action, $size) = /dst (.{10}) (.{8}) \w+ [\w.:]+(Hercules:\w+) (Start|End) (\(.*?\))?/ ) {
print "[DBG]: $date, $name, $action, $size\n";
if ($action eq "Start") {
$servers{$name}{start_server} = {
'date' => $date,
'time' => $time, <-- this is not defined and it complains
};
}
$servers{$name}{$action} = {
'date ' => $date,
'time' => $time, <--- this is not defined complains
'size ' => $size,
};
}
}
#print Dumper(\%$servers);
for my $server (keys %servers) {
my $start_server= $servers->{ $server }{ start_server };
my $start_date = $servers->{ $server }{ start_server }->{ Start }->{'date' };
my $start_time = $servers->{ $server }{ start_server }->{ Start }->{ 'time' };
my $end_date = $servers->{ $server }{ start_server }->{ End }->{ 'date' };
my $end_time = $servers->{ $server }{ start_server }->{ End }->{ 'time' };
my $end_size = $servers->{ $server }{ start_server }->{ End }->{ 'size' };
#print "Start server :$start_server\nStart Date: $start_date $start_time\nEnd Date: $end_date $end_time\nSize: $end_size\n\n";
}
The "good reason" is probably that the author of the log generating
software was influenced by syslog which doesn't include the year either
(which can be a major pain in the ass when you have to process old
logfiles).
hp
^dst (.{10}) (.{8}) \w+ [\w.:]+(Hercules:\w+) (Start|End) (\(.*?\))
(I made an error here: the regex has five capture groups, so the list
should have been
my ($date, $time, $name, $action, $size)
)
> > ) {
> >
>
> Correct, all is one line such as below:
> dst Sun Nov 15 00:00:03 EST galaxy.fuqua.duke.edu:fv_phd Hercules:sm_fv_phd Start
>
> >
> > I am also assuming that the part of the 'name' string you want always
> > starts with 'Hercules:'. If this isn't correct you will need to work out
> > what the correct rules are.
>
> No, 'name' does not always start with 'Hercules:'; it could be
> different, as each log is different.
Then how do you tell where the section of the name you are interested in
starts? Your example line above suggests the two pieces are
space-separated, which your original example did not.
> Also, ^dst is not always the case; it could be slk or src.
So, modify the regex accordingly.
> Per your recommendation I created the following, but it does not work:
'Does not work' is not a useful problem description.
> #!/usr/bin/perl -w
> use warnings;
> use strict;
>
> use Data::Dumper;
> my %servers;
>
> my $time;
> while (<DATA>) {
> if (my ($date, $name, $action, $size) = /dst (.{10}) (.{8}) \w+ [\w.:]+(Hercules:\w+) (Start|End) (\(.*?\))?/ ) {
> print "[DBG]: $date, $name, $action, $size\n";
> if ($action eq "Start") {
> $servers{$name}{start_server} = {
> 'date' => $date,
> 'time' => $time, <-- this is not defined and it complains
Yes, that was the error I pointed out above. You could have worked that
out for yourself.
> };
> }
> $servers{$name}{$action} = {
> 'date ' => $date,
> 'time' => $time, <--- this is not defined complains
> 'size ' => $size,
> };
Oh, for goodness' sake! Did you actually *read* what I wrote, the words
as well as the code? This assignment (to ...{$action}) is *instead* of
the if blocks, not as well as.
> }
> }
>
> #print Dumper(\%$servers);
>
> for my $server (keys %servers) {
> my $start_server= $servers->{ $server }{ start_server };
You don't have a $servers variable. You do have a %servers variable.
> my $start_date = $servers->{ $server }{ start_server }->{ Start }->{'date' };
> my $start_time = $servers->{ $server }{ start_server }->{ Start }->{ 'time' };
> my $end_date = $servers->{ $server }{ start_server }->{ End }->{ 'date' };
> my $end_time = $servers->{ $server }{ start_server }->{ End }->{ 'time' };
> my $end_size = $servers->{ $server }{ start_server }->{ End }->{ 'size' };
No, you have completely misunderstood how this data structure is
supposed to work (and adding the 'Start' records twice, under different
names, doesn't help). Print a Dumper of the structure and then study it
until you understand what ends up where.
As a point of style, repeating essentially the same deref expression
over and over is terribly messy. Assign it to a temporary:
my $start = $servers{$server}{Start};
my $end = $servers{$server}{End};
print <<OUT;
Server: $server
Start Date: $start->{date}
...
OUT
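For illustration only, the elided heredoc might be filled in along these lines. This is a sketch that assumes the %servers layout described earlier (a 'Start' and an 'End' record per server); the sample data and the helper name format_server are hypothetical.

```perl
use strict;
use warnings;

# Sample data in the shape built by the parsing loop.
my %servers = (
    'Hercules:sm_fv_phd' => {
        Start => { date => 'Sun Nov 15', time => '00:00:03', size => undef },
        End   => { date => 'Sun Nov 15', time => '00:00:26', size => '(4528 KB)' },
    },
);

# Hypothetical helper: build the report text for one server.
sub format_server {
    my ($server, $start, $end) = @_;
    return <<"OUT";
Server: $server
Start date: $start->{date}
Start time: $start->{time}
End date: $end->{date}
End time: $end->{time}
End size: $end->{size}

OUT
}

for my $server (sort keys %servers) {
    my $start = $servers{$server}{Start};
    my $end   = $servers{$server}{End};
    next unless $start && $end;    # skip servers with only half a pair
    print format_server($server, $start, $end);
}
```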
Ben
Thank you so much, Ben. I got it to work, God bless you.
The hash seems to be overwritten if one server has more than one
'Start|End' pair. I have more than 300 records, but when I run the program it
only records the last 23. Any idea?
Please stop machine-gun posting. And please do not start new threads
when a perfectly fine one is already available to you.
Wait for some of the replies to come back before adding more and more
posts that add or change information.
Anyone who wants the full history on this now has to try to find three
unrelated threads, and read a large number of posts just to get the
picture.
If you have not already done so, please read
http://www.rehabitation.com/clpmisc.shtml
and
http://www.catb.org/~esr/faqs/smart-questions.html
if you don't want to run out of credit before you start.
Martien
--
|
Martien Verbruggen | prepBut nI vrbLike adjHungarian! qWhat's
first...@heliotrope.com.au | artThe adjBig nProblem? -- Alec Flett
|
Yes. At the level where you now have the hash ref with Start and End
keys, put an array ref. When you see a start tag, create a new array
element and put a hash ref in it which has keys Start and End, just as
the one single one you have now. When you see an End tag, add that to
the last element of that array (beware of possible dangling end lines).
Let us know if you have troubles implementing that, by showing us what
you did, and how it's failing.
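A minimal sketch of that layout, assuming the lines have already been parsed into name, action, date, time and size. The names %runs and record_event are hypothetical, and the sketch assumes that Start/End pairs for one server never overlap.

```perl
use strict;
use warnings;

# server name => array ref of { Start => {...}, End => {...} }
my %runs;

sub record_event {
    my ($name, $action, $date, $time, $size) = @_;

    if ($action eq 'Start') {
        # Each Start opens a new element at the end of the array.
        push @{ $runs{$name} }, { Start => { date => $date, time => $time } };
    }
    elsif ($action eq 'End') {
        # An End belongs to the most recent Start that has no End yet.
        my $last = $runs{$name} ? $runs{$name}[-1] : undef;
        if ($last && !$last->{End}) {
            $last->{End} = { date => $date, time => $time, size => $size };
        }
        else {
            warn "Dangling End for $name at $date $time\n";
        }
    }
}

record_event('Hercules:sm_fv_phd', 'Start', 'Sun Nov 15', '00:00:03');
record_event('Hercules:sm_fv_phd', 'End',   'Sun Nov 15', '00:00:26', '(4528 KB)');
record_event('Hercules:sm_fv_phd', 'Start', 'Sun Nov 15', '00:15:04');
record_event('Hercules:sm_fv_phd', 'End',   'Sun Nov 15', '00:15:13', '(1900 KB)');

printf "%d runs recorded\n", scalar @{ $runs{'Hercules:sm_fv_phd'} };
```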
More changing requirements. You are what Eric Raymond calls a time-sink.
Martien
--
|
Martien Verbruggen | Computers in the future may weigh no
first...@heliotrope.com.au | more than 1.5 tons. -- Popular
| Mechanics, 1949
BTW. This assumes that start-end pairs for the same 'server' cannot
overlap in any way. If they do, and there is no other information that
can be used to match start and end tag, there is no solution.
Martien
--
| Yes; Windows is great for running &
Martien Verbruggen | developing viruses, for instance. It's
first...@heliotrope.com.au | also very popular, but then again, so is
| the common cold. -- Dave Hinz
An alternative would be to move the printing logic into the 'End' block,
so that you never need to remember more than one 'Start' line per
server.
Ben
I'm sorry, I'm totally lost again here. Will you please tell me how to
do that? Please, please, I've not slept over this since yesterday. I'm not
good at this hash-of-hashes, please.
Pleading will not make anyone more inclined to help you. The basic logic
you need looks like
for each line
if it's a 'Start' line
store all the captured information under the appropriate
server name.
if it's an 'End' line
check you have a 'Start' line recorded for this server.
print the data.
delete the 'Start' entry now that you've finished with it.
Now you write the code. You have all the pieces you need already.
Ben
I have this one and it does not work..
if ($action eq "Start") {
$servers{$name}{start_server} = {
date => $date,
time => $time,
};
if ($action eq "End") {
if (exists $servers{$name}{start_server}) {
$servers{$name}{$action} = {
date => $date,
time => $time,
size => $size,
};
my $start= $servers{$name}{Start};
my $end= $servers{$name}{End};
print "Server: $servers{$name}\n";
print "Start date: $start->{date}\n";
print "Start time: $start->{time}\n";
print "End date: $end->{date}\n";
print "End time: $end->{time}\n";
print "End size: $end->{size}\n";
delete $servers{$name}{start_server};
}
}
}
}
>> I'm sorry, I'm totally lost again here. Will you please tell me how to
>> do that? Please, please,
> Pleading will not make anyone more inclined to help you.
And the Posting Guidelines warn against pleading too.
--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
Fix your indentation.
<snip>
> $servers{$name}{start_server} = {
<snip>
> if (exists $servers{$name}{start_server}) {
<snip>
> my $start= $servers{$name}{Start};
<snip>
One of these is not the same as the other two, not to mention that the
second level of hash is redundant at this point.
> print "Server: $servers{$name}\n";
> print "Start date: $start->{date}\n";
> print "Start time: $start->{time}\n";
> print "End date: $end->{date}\n";
> print "End time: $end->{time}\n";
> print "End size: $end->{size}\n";
Learn how to use heredocs.
Ben
Is this a different problem or the same problem as you posted under
"help with regex"?
It is awfully confusing; I, at least, am not able to keep the
information from both threads separate.
jue
if ($action eq "Start") {
    $servers{$name}{start_server} = {
        date => $date,
        time => $time,
    };
} elsif ($action eq "End") {
    if (exists $servers{$name}{start_server}) {
        $servers{$name}{$action} = {
            date => $date,
            time => $time,
            size => $size,
        };
        print "Server: $name\n";
        my $start= $servers{$name}{start_server};
        print "Start date: $start->{date}\n";
        print "Start time: $start->{time}\n";
        my $end= $servers{$name}{End};
        print "End date: $end->{date}\n";
        print "End time: $end->{time}\n";
        print "End size: $end->{size}\n\n";
        delete $servers{$name};
    }
}
Well, yeah, that's exactly what hashes are designed for and what they
do! I was about to point out that depending upon your data a hash may
not be the best choice for a data structure. That's what you get for
changing the requirements on the fly.
While Martien's suggestion with a HoA is certainly doable I would use a
different approach altogether: for a large amount of data dump all those
entries into a database and then go from there.
But for your tiny number of only 300 records just read them into a
two-dimensional array (AoA). A trivial split() on white space seems to
do quite nicely, although I have a suspicion that somewhere in the data
you did not show us there may be yet another hidden surprise.
Then retrieve the set of unique server names, and for each server name
filter (grep()) for all entries for that server, and then process those
one by one as needed for your output.
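A rough sketch of that approach, assuming the space-separated line format given earlier in the thread (prefix, weekday, month, day, time, timezone, server-dep, server name, action, optional size). The sample lines and the column numbers are assumptions based on that format.

```perl
use strict;
use warnings;

my @lines = (
    'dst Sun Nov 15 00:00:03 EST galaxy.fuqua.duke.edu:fv_phd Hercules:sm_fv_phd Start',
    'src Sun Nov 15 00:00:03 EST galaxy.fuqua.duke.edu:fv_students Hercules:sm_fv_students Start',
    'src Sun Nov 15 00:00:25 EST galaxy.fuqua.duke.edu:fv_students Hercules:sm_fv_students End (3368 KB)',
    'dst Sun Nov 15 00:00:26 EST galaxy.fuqua.duke.edu:fv_phd Hercules:sm_fv_phd End (4528 KB)',
);

# Array of arrays: one row per line, split on whitespace.
my @rows = map { [ split ' ', $_ ] } @lines;

# Unique server names (column 7), in order of first appearance.
my %seen;
my @names = grep { !$seen{$_}++ } map { $_->[7] } @rows;

for my $name (@names) {
    # All entries for this server, still in file order.
    my @events = grep { $_->[7] eq $name } @rows;
    for my $event (@events) {
        # The size, if present, was split into several fields; rejoin them.
        my $size = @$event > 9 ? join(' ', @{$event}[ 9 .. $#{$event} ]) : '';
        print join(' ', $name, $event->[8], @{$event}[ 1 .. 4 ], $size), "\n";
    }
}
```

This only groups the entries per server; pairing each Start with its End still depends on the open questions about ordering and overlap.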
How you want that output to be compiled is still unclear to me. You just
admitted that each server can have multiple entries in the log file. But
can the time spans overlap (time between Start and End)? Can they be
subsets, i.e. can the task that started second finish first? Are all
entries in the log file in chronological sequence (otherwise you would
have to sort them by time first)? As you can see there are still a lot
of open questions even about your input data. And please don't start
changing the requirements again halfway through the solution. That's
what people call a moving target and nobody can hit that.
jue
Nice idea for what the OP told us so far! Unfortunately I have a strong
feeling that what he told us so far is still only part of the story.
Some open questions still (the OP did not mention any details on those
and I have a suspicion that he is so bogged down in tiny details that he
doesn't see the big issues):
- your suggestion requires that the log file is in chronological order.
We do not know that.
- what about overlapping time slots, e.g. Start2 happens before End1?
- what about included time slots, e.g. End2 happens before End1?
- what about incomplete tasks or abnormal termination? What if for
whatever reason (network cable unplugged? janitor pulled power cord for
his vacuum cleaner?) there is a Start-x but no corresponding End-x?
- ...
I strongly believe the OP just jumped at trying to code the nitty-gritty
details without giving any thought to the bigger picture as is very
evident by now admitting that the same server can have many entries
which was completely missing from the sample data he was presenting to
us. Typical case of a moving target "Oh, but there is now yet another
problem that I didn't think about".
jue
Actually the number of records could go up to 10,000, depending
on the log file.
> As you can see there are still a lot of open questions even about your input data.
The input data could be something like this:
src date time EST server-dep server-name-x Start
dst date time EST server-dep server-name-y Start
slk date time EST server-dep server-name-x Request
dst date time EST server-dep server-name-z Start
src date time EST server-dep server-name-x End (size kb)
slk date time EST server-dep server-name-y Request
slk date time EST server-dep server-name-z Request
dst date time EST server-dep server-name-y End (size kb)
dst date time EST server-dep server-name-z End
.....
.....
.....
src date time EST server-dep server-name-x Start
dst date time EST server-dep server-name-y Start
slk date time EST server-dep server-name-x Request
dst date time EST server-dep server-name-z Start
src date time EST server-dep server-name-x End (size kb)
slk date time EST server-dep server-name-y Request
slk date time EST server-dep server-name-z Request
dst date time EST server-dep server-name-y End (size kb)
dst date time EST server-dep server-name-z End
> And please don't start changing the requirements again halfway through the solution. That's
> what people call a moving target and nobody can hit that.
It won't happen again, I learned my lesson...
I will go with what I have or I should say I got from good people here
since it works the way I want it.
Thank you to all of you good people out there helping people like me.
10,000 is also a tiny number.