Parsing GC Log File

AyOut

Nov 4, 2009, 9:07:39 PM
I have a GC log file with entries like this one:

2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K
(502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02
secs]

I would like to parse this to output for easy plotting using gnuplot
and would like the following output:

2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09,
0.04, 0.02

I have tried with a command like this:

awk '{if($1~/[0-9]+/ && $2=="[GC" && $3=="[PSYoungGen:")printf("%s %s %s %s %s %s\n", $1,$2,$3,$4,$5,$6)}' gc_20091104_024256_psghlc301.log \
| sed "s/[0-9][0-9]:.*GC \[PSYoungGen: /, /" | sed "s/K.*->/, /" \
| sed "s/K.*(/, /" | sed "s/K)//"

but it jumps over several fields and gives me the following output:

2.7, 70850, 6800, 502464, 0.0165440

How can I set sed to not look at the last match ( "K(" ), but trigger
on the first match?

Thanks
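
On the sed question itself: sed regular expressions are greedy (leftmost, longest), so "K.*(" starts at the first K and runs to the last "(" on the line, which is why fields go missing. POSIX sed has no non-greedy quantifier; the usual workaround is to drop the ".*" or to replace it with a negated bracket expression. A minimal sketch, assuming a trimmed copy of the sample line (the variable names are only for illustration):

```shell
# Trimmed copy of the sample log line (the [Times: ...] part is left off).
line='2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K), 0.0165440 secs]'

# Greedy: .* stretches from the first K to the LAST "(" on the line.
greedy=$(printf '%s\n' "$line" | sed 's/K.*(/, /')

# Literal: without .*, only the first "K(" is replaced.
first=$(printf '%s\n' "$line" | sed 's/K(/, /')

# Bounded: [^(]* cannot cross a "(", so the match ends at the first one.
bounded=$(printf '%s\n' "$line" | sed 's/K[^(]*(/, /')

printf '%s\n' "$greedy" "$first" "$bounded"
```

The greedy form jumps straight from 70850 to 502464, while the other two stop at 152896.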

Ed Morton

Nov 4, 2009, 11:11:44 PM
AyOut wrote:
> I have a GC log file with entries like this one:
>
> 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K
> (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02
> secs]
>
> I would like to parse this to output for easy plotting using gnuplot
> and would like the following output:
>
> 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09,
> 0.04, 0.02

Assuming the input is all on one line:

$ cat file


2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K),
0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]

$ awk '{OFS=", "; gsub(/[^[:digit:].]/," "); $1=$1}1' file


2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09, 0.04, 0.02

Ed.
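
The one-liner is dense, so here is the same idea spelled out with comments. This is only a sketch: the function name gc_to_csv is made up for illustration, [^0-9.] stands in for [^[:digit:].] for awks that lack POSIX character classes, and it still assumes each log entry sits on a single line.

```shell
# gc_to_csv: hypothetical wrapper; reads log lines on stdin, writes one row per line.
gc_to_csv() {
  awk 'BEGIN { OFS = ", " }       # separator used when awk rebuilds the record
  {
    gsub(/[^0-9.]/, " ")          # blank out everything except digits and dots
    $1 = $1                       # assigning a field forces $0 to be rebuilt with OFS
    print
  }'
}

printf '%s\n' '2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]' | gc_to_csv
```

The gsub() leaves the numbers separated by runs of blanks, awk re-splits the record on those blanks, and the $1=$1 assignment joins the fields again with ", ".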

Kaz Kylheku

Nov 5, 2009, 12:39:14 AM
On 2009-11-05, AyOut <mor...@gmail.com> wrote:
> I have a GC log file with entries like this one:
>
> 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K
> (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02
> secs]
>
> I would like to parse this to output for easy plotting using gnuplot
> and would like the following output:
>
> 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09,
> 0.04, 0.02

Kaz's txr utility to the rescue.

txr -c '@(collect)
@num: [GC [PSYoungGen: @{size1}K->@{size2}K(@{size3}K)] @{size4}K->@{size5}K (@{size6}K), @secs secs] [Times: user=@utime sys=@systime, real=@realtime secs]
@(end)
@(output)
@(repeat)
@num, @size1, @size2, @size3, @size4, @size5, @size6, @secs, @utime, @systime, @realtime
@(end)
@(end)
' logfile

www.nongnu.org/txr

AyOut

Nov 6, 2009, 2:39:55 PM

That's a beautiful solution! Now, there's a change in the log file
output. The first field is now a date and time stamp:

2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K), 0.0204170
secs]
2009-11-05T15:00:17.087-0600: 0.527: [GC 2862K->1010K(7680K),
0.0043760 secs]

and applying this command

cat ${gclogfile} \
| sed 's/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][A-Z]:*//' \
| sed 's/\.[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]:*//' \
| awk -F: '{print (NR==1||(!$1&&$1!=p)?++c:c),$0;p=$1}'

generates the following output:

15, 00, 16, 0.405, 2112, 750, 7680, 0.0204170
15, 00, 17, 0.527, 2862, 1010, 7680, 0.0043760

where the time stamp (15:00:16) shows up as 15, 00, 16. Is there a
way to have the output look like this:

15:00:16, 0.405, 2112, 750, 7680, 0.0204170
15:00:17, 0.527, 2862, 1010, 7680, 0.0043760

Thanks!

Ed Morton

Nov 6, 2009, 3:09:15 PM

"beautiful solution" discarded apparently!

> generates the following output:
>
> 15, 00, 16, 0.405, 2112, 750, 7680, 0.0204170
> 15, 00, 17, 0.527, 2862, 1010, 7680, 0.0043760
>
> where the time stamp (15:00:16) shows up as 15, 00, 16.  Is there a
> way to have the output look like this:
>
> 15:00:16, 0.405, 2112, 750, 7680, 0.0204170
> 15:00:17, 0.527, 2862, 1010, 7680, 0.0043760
>

> Thanks!

Why do you keep going back to pipelines of cat, sed, and awk? If
you're going to use awk anyway, you don't need sed or cat.

Try this:

awk '{OFS=", "; t=substr($0,12,8); $0=substr($0,30);
gsub(/[[:digit:].]/," "); $1=$1; print t,$0}' file

Ed.

AyOut

Nov 6, 2009, 3:29:55 PM

Thanks, Ed!

Well, I'm by no means a shell expert.

Running your command on the file, I get the following output:

15:00:16, :, [GC, K->, K(, K),, secs]

Ed Morton

Nov 6, 2009, 3:37:13 PM
> 15:00:16, :, [GC, K->, K(, K),, secs]

Are you sure you copy/pasted my script instead of retyping it?
Are you sure your input file is the same as you posted?

Look:

$ cat file


2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K),
0.0204170 secs]
2009-11-05T15:00:17.087-0600: 0.527: [GC 2862K->1010K(7680K),
0.0043760 secs]

$ awk '{OFS=", "; t=substr($1,12,8); $0=substr($0,30);
        gsub(/[^[:digit:].]/," "); $1=$1; print t,$0}' file


15:00:16, 0.405, 2112, 750, 7680, 0.0204170
15:00:17, 0.527, 2862, 1010, 7680, 0.0043760

Please post exactly the same commands and their output so we can see
where something's going wrong.

Ed.
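
One thing to keep in mind: substr($0,30) relies on the timestamp prefix being exactly 29 characters wide. A hedged variant that removes the first field by pattern rather than by position is sketched here. The function name gc_ts_to_csv is hypothetical, [^0-9.] is used in place of [^[:digit:].], and one log entry per line is assumed. The substr($1,12,8) part still depends on the timestamp layout shown in the thread.

```shell
# gc_ts_to_csv: hypothetical wrapper; stdin is timestamped GC log lines.
gc_ts_to_csv() {
  awk 'BEGIN { OFS = ", " }
  {
    t = substr($1, 12, 8)         # HH:MM:SS taken from the leading timestamp
    sub(/^[^ ]+ +/, "")           # drop the first field, whatever its width
    gsub(/[^0-9.]/, " ")          # keep only digits and dots
    $1 = $1                       # rebuild $0 with OFS between the fields
    print t, $0
  }'
}

printf '%s\n' \
  '2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K), 0.0204170 secs]' \
  '2009-11-05T15:00:17.087-0600: 0.527: [GC 2862K->1010K(7680K), 0.0043760 secs]' \
  | gc_ts_to_csv
```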

Ed Morton

Nov 6, 2009, 3:43:17 PM
>      Ed.

Hint: check if you mistyped the gsub() as

gsub(/[[:digit:].]/," ")

instead of what I had:

gsub(/[^[:digit:].]/," ")

Note the "^".

Regards,

Ed.
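
To see what the "^" changes: a leading "^" inside a bracket expression negates the set, so [^0-9.] matches everything except digits and dots, while [0-9.] matches exactly the digits and dots. A small sketch on the sample line with the timestamp already cut off (here [0-9] is written instead of [:digit:], and the variable names are only illustrative):

```shell
sample=' 0.405: [GC 2112K->750K(7680K), 0.0204170 secs]'

# Without ^: the numbers themselves are blanked out and the punctuation survives.
plain=$(printf '%s\n' "$sample" | awk '{OFS=", "; gsub(/[0-9.]/," "); $1=$1; print}')

# With ^: everything except the numbers is blanked out.
negated=$(printf '%s\n' "$sample" | awk '{OFS=", "; gsub(/[^0-9.]/," "); $1=$1; print}')

printf '%s\n' "$plain" "$negated"
```

The first result is the ":, [GC, K->, K(, K),, secs]" tail reported earlier in the thread; the second is "0.405, 2112, 750, 7680, 0.0204170".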

AyOut

Nov 6, 2009, 3:44:41 PM

My bad! I lost the ^ in the copy/paste.

Thanks, Ed!

Ben Bacarisse

Nov 6, 2009, 3:54:46 PM
AyOut <mor...@gmail.com> writes:

> I have a GC log file with entries like this one:
>
> 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K
> (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02
> secs]
>
> I would like to parse this to output for easy plotting using gnuplot
> and would like the following output:
>
> 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09,
> 0.04, 0.02

If you can live without the spaces:

tr -sc '0-9.' ,

or (since gnuplot won't mind):

tr -sc '0-9.' ' '

<snip>
--
Ben.
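
One caveat with the tr approach, as far as I can tell: the newline is also in the complement of '0-9.', so on a multi-line log every line gets folded into a single row. Adding \n to the set keeps one output row per input line. A sketch, with a made-up function name and the trimmed sample line simply repeated twice as input:

```shell
# numbers_per_line: hypothetical helper; stdin -> one comma-separated row per input line.
numbers_per_line() {
  # -c complements the set, -s squeezes runs of the replacement character.
  # Listing \n in the set keeps line breaks from being turned into commas.
  tr -sc '0-9.\n' ','
}

line='2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K), 0.0165440 secs]'
printf '%s\n' "$line" "$line" | numbers_per_line
```

Each row ends with a trailing comma, because the closing "secs]" is translated as well.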

stan

Nov 7, 2009, 9:13:10 PM
Ed Morton wrote:
<snip>

>> > > > > Assuming the input is all on one line:
>>
>> > > > > $ cat file
>> > > > > 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K),
>> > > > > 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]
>>
>> > > > > $ awk '{OFS=", "; gsub(/[^[:digit:].]/," "); $1=$1}1' file
>> > > > > 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09, 0.04, 0.02

This looks like the original code.

<snip>

>> > > Try this:
>>
>> > > awk '{OFS=", "; t=substr($0,12,8); $0=substr($0,30);
>> > >         gsub(/[[:digit:].]/," "); $1=$1; print t,$0}' file
>>
>> > >    Ed.

This looks like your amended code.

>>
>> > Thanks, Ed!
>>
>> > Well, I'm by no means a shell expert.
>>
>> > Running your command on the file, I get the following output:
>>
>> > 15:00:16, :, [GC, K->, K(, K),, secs]
>>

>> Are you sure you copy/pasted my script instead of retyping it?
>> Are you sure your input file is the same as you posted?
>>
>> Look:
>>
>> $ cat file
>> 2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K),
>> 0.0204170 secs]
>> 2009-11-05T15:00:17.087-0600: 0.527: [GC 2862K->1010K(7680K),
>> 0.0043760 secs]
>>
>> $ awk '{OFS=", "; t=substr($1,12,8); $0=substr($0,30);
>>        gsub(/[^[:digit:].]/," "); $1=$1; print t,$0}' file
>> 15:00:16, 0.405, 2112, 750, 7680, 0.0204170
>> 15:00:17, 0.527, 2862, 1010, 7680, 0.0043760
>>
>> Please post exactly the same commands and their output so we can see
>> where something's going wrong.

I could be wrong, but I noticed what I thought was a missing "^" in
your amended code. It could be that I only saw a munged reply in the
thread and missed the original reply.

I actually stopped for a couple of minutes when I saw your amended code
because I couldn't figure out how it worked and I typically learn
something nearly every time I read your code. Local events overcame my
studies and I never got to try it out. My point here is not to call
out others' errors; I thought my knowledge was leaking through a hole
and your response actually put a finger in the hole! I wanted to say
thanks.

As I get older I can't distinguish between senior moments and actual
ignorance. The only bright side is that I can enjoy old movies. <g>

w_a_x_man

Nov 8, 2009, 4:37:36 AM
On Nov 6, 1:39 pm, AyOut <mort...@gmail.com> wrote:

ruby -ne'puts [$_[11,8], $_[30..-1].scan(/[\d.]+/)].join(", ")' file

=== output ===

Ed Morton

Nov 8, 2009, 7:31:44 AM

You're right, looks like I did drop the "^" in one of my posts.

Ed.

Michael Paoli

Nov 8, 2009, 2:10:55 PM
On Nov 4, 6:07 pm, AyOut <mor...@gmail.com> wrote:
> I have a GC log file with entries like this one:
> 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K
> (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02
> secs]
> I would like to parse this to output for easy plotting using gnuplot
> and would like the following output:
> 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09,
> 0.04, 0.02
>
> I have tried with a command like this:
> awk '{if($1~/[0-9]+/ && $2=="[GC" && $3=="[PSYoungGen:")printf("%s %s
...

sed -e 's/[^0-9.]/ /g;s/  */ /g;s/^ //;s/ $//;s/ /, /g'

Rakesh Sharma

Nov 10, 2009, 1:13:33 AM


You could do this in one go:

perl -lne '$,=", ";print/\d+[.]?(?:\d+)?|[.]\d+/g' yourfile

perl -lpe '$"=", ";$_="@{[/\d+[.]?(?:\d+)?|[.]\d+/g]}"' yourfile

--Rakesh
