Parsing GC Log File

AyOut

Nov 4, 2009, 9:07:39 PM

I have a GC log file with entries like this one:

2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K
(502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02
secs]

I would like to parse this to output for easy plotting using gnuplot
and would like the following output:

2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09,
0.04, 0.02

I have tried with a command like this:

awk '{if($1~/[0-9]+/ && $2=="[GC" && $3=="[PSYoungGen:")printf("%s %s
%s %s %s %s\n", $1,$2,$3,$4,$5,$6)}' gc_20091104_024256_psghlc301.log
| sed "s/[0-9][0-9]:.*GC \[PSYoungGen: /, /" | sed "s/K.*->/, /" | sed
"s/K.*(/, /" | sed "s/K)//"

but it jumps over several fields and gives me the following output:

2.7, 70850, 6800, 502464, 0.0165440

How can I set sed to not look at the last match ( "K(" ), but trigger
on the first match?

Thanks
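
On the sed question as asked: sed has no switch for "first match". The `.*` in `K.*(` is greedy, so it runs from the first K to the last "(" on the line. A common workaround is a negated bracket expression, which cannot run past the next "(". The sketch below only illustrates that difference on the sample entry; the variable names are made up for the example, and it does not produce the full field list.

```sh
#!/bin/sh
# Sample entry from the post, kept on one line.
line='2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]'

# Greedy: ".*" extends to the LAST "(" on the line, swallowing the fields in between.
greedy=$(printf '%s\n' "$line" | sed 's/K.*(/, /')

# Bounded: "[^(]*" cannot cross a "(", so the match ends at the FIRST "(" after the first K.
bounded=$(printf '%s\n' "$line" | sed 's/K[^(]*(/, /')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
```

The greedy form jumps straight from 70850 to 502464, which is the skipping described above; the bounded form stops at 152896.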

Ed Morton

Nov 4, 2009, 11:11:44 PM

AyOut wrote:
> I have a GC log file with entries like this one:
>
> 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K
> (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02
> secs]
>
> I would like to parse this to output for easy plotting using gnuplot
> and would like the following output:
>
> 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09,
> 0.04, 0.02

Assuming the input is all on one line:

$ cat file
2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]

$ awk '{OFS=", "; gsub(/[^[:digit:].]/," "); $1=$1}1' file
2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09, 0.04, 0.02
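
For anyone reading along, here is roughly how the one-liner works, spelled out as a small script. This is a sketch, not a replacement: the function name `gc_to_csv` is made up for the example, and `[^0-9.]` is swapped in for `[^[:digit:].]` on the assumption that some older awks do not handle POSIX character classes.

```sh
#!/bin/sh
# Sample entry from the thread, kept on one line.
line='2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]'

# gc_to_csv (hypothetical name): reads GC log lines from stdin or from files
# given as arguments and prints the numbers on each line, comma separated.
gc_to_csv() {
    awk 'BEGIN { OFS = ", " }
    {
        gsub(/[^0-9.]/, " ")   # anything that is not a digit or a dot becomes a space
        $1 = $1                # assigning to a field makes awk rebuild $0 using OFS
        print
    }' "$@"
}

csv=$(printf '%s\n' "$line" | gc_to_csv)
printf '%s\n' "$csv"
```

One caveat: a stray "." elsewhere in a log line would come through as a field of its own, so this only suits lines shaped like the sample.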

Ed.

Kaz Kylheku

Nov 5, 2009, 12:39:14 AM

On 2009-11-05, AyOut <mor...@gmail.com> wrote:
> I have a GC log file with entries like this one:
>
> 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K
> (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02
> secs]
>
> I would like to parse this to output for easy plotting using gnuplot
> and would like the following output:
>
> 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09,
> 0.04, 0.02

Kaz's txr utility to the rescue.

txr -c '@(collect)
@num: [GC [PSYoungGen: @{size1}K->@{size2}K(@{size3}K)] @{size4}K->@{size5}K (@{size6}K), @secs secs] [Times: user=@utime sys=@systime, real=@realtime secs]
@(end)
@(output)
@(repeat)
@num, @size1, @size2, @size3, @size4, @size5, @size6, @secs, @utime, @systime, @realtime
@(end)
@(end)
' logfile

www.nongnu.org/txr

AyOut

Nov 6, 2009, 2:39:55 PM

That's a beautiful solution! Now, there's a change in the log file
output. The first field is now a date and time stamp

2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K), 0.0204170
secs]
2009-11-05T15:00:17.087-0600: 0.527: [GC 2862K->1010K(7680K),
0.0043760 secs]

and applying this command

cat ${gclogfile}|sed 's/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][A-Z]:*//'|sed 's/\.[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]:*//'|awk -F: '{print (NR==1||(!$1&&$1!=p)?++c:c),$0;p=$1}'

generates the following output:

15, 00, 16, 0.405, 2112, 750, 7680, 0.0204170
15, 00, 17, 0.527, 2862, 1010, 7680, 0.0043760

where the time stamp (15:00:16) shows up as 15, 00, 16. Is there a
way to have the output look like this:

15:00:16, 0.405, 2112, 750, 7680, 0.0204170
15:00:17, 0.527, 2862, 1010, 7680, 0.0043760

Thanks!

Ed Morton

Nov 6, 2009, 3:09:15 PM

"beautiful solution" discarded apparently!

> generates the following output:
>
> 15, 00, 16, 0.405, 2112, 750, 7680, 0.0204170
> 15, 00, 17, 0.527, 2862, 1010, 7680, 0.0043760
>
> where the time stamp (15:00:16) shows up as 15, 00, 16. Is there a
> way to have the output look like this:
>
> 15:00:16, 0.405, 2112, 750, 7680, 0.0204170
> 15:00:17, 0.527, 2862, 1010, 7680, 0.0043760
>

> Thanks!

Why do you keep going back to pipelines of cat, sed, and awk? If
you're going to use awk anyway, you don't need sed or cat.

Try this:

awk '{OFS=", "; t=substr($0,12,8); $0=substr($0,30);
gsub(/[[:digit:].]/," "); $1=$1; print t,$0}' file

Ed.

AyOut

Nov 6, 2009, 3:29:55 PM

Thanks, Ed!

Well, I'm by no means a shell expert.

Running your command on the file, I get the following output:

15:00:16, :, [GC, K->, K(, K),, secs]

Ed Morton

Nov 6, 2009, 3:37:13 PM

> 15:00:16, :, [GC, K->, K(, K),, secs]

Are you sure you copy/pasted my script instead of retyping it?
Are you sure your input file is the same as you posted?

Look:

$ cat file
2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K), 0.0204170 secs]
2009-11-05T15:00:17.087-0600: 0.527: [GC 2862K->1010K(7680K), 0.0043760 secs]

$ awk '{OFS=", "; t=substr($1,12,8); $0=substr($0,30); gsub(/[^[:digit:].]/," "); $1=$1; print t,$0}' file
15:00:16, 0.405, 2112, 750, 7680, 0.0204170
15:00:17, 0.527, 2862, 1010, 7680, 0.0043760

Please post exactly the same commands and their output so we can see
where something's going wrong.

Ed.

Ed Morton

Nov 6, 2009, 3:43:17 PM

> Ed.

Hint: check if you mistyped the gsub() as

gsub(/[[:digit:].]/," ")

instead of what I had:

gsub(/[^[:digit:].]/," ")

Note the "^".
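
To see what the "^" changes, here is a small side-by-side sketch on one of the posted log lines. The variable names `good` and `bad` are just for the example, and `[0-9]` stands in for `[[:digit:]]` on the assumption that not every awk supports POSIX character classes.

```sh
#!/bin/sh
# One of the posted log lines, kept on one line.
line='2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K), 0.0204170 secs]'

# With "^": everything EXCEPT digits and dots is blanked, so the numbers survive.
good=$(printf '%s\n' "$line" |
    awk '{OFS=", "; t=substr($1,12,8); $0=substr($0,30); gsub(/[^0-9.]/," "); $1=$1; print t,$0}')

# Without "^": the digits and dots themselves are blanked, so only the
# letters and punctuation survive.
bad=$(printf '%s\n' "$line" |
    awk '{OFS=", "; t=substr($1,12,8); $0=substr($0,30); gsub(/[0-9.]/," "); $1=$1; print t,$0}')

printf 'with ^:    %s\n' "$good"
printf 'without ^: %s\n' "$bad"
```

The second form produces the same ":, [GC, K->, ..." style of output that was reported.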

Regards,

Ed.

AyOut

Nov 6, 2009, 3:44:41 PM

My bad! I lost the ^ in the copy/paste.

Thanks, Ed!

Ben Bacarisse

Nov 6, 2009, 3:54:46 PM

AyOut <mor...@gmail.com> writes:

> I have a GC log file with entries like this one:
>
> 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K
> (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02
> secs]
>
> I would like to parse this to output for easy plotting using gnuplot
> and would like the following output:
>
> 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09,
> 0.04, 0.02

If you can live without the spaces:

tr -sc '0-9.' ,

or (since gnuplot won't mind):

tr -sc '0-9.' ' '
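
One thing to watch with the tr commands as written: the newline is also outside '0-9.', so it gets translated too and all records run together. A possible variation, sketched below, keeps "\n" in the set and trims the stray separator at the line ends with sed. The function name `gc_tr` is made up, and the sample line is simply fed in twice to show that records stay on separate lines.

```sh
#!/bin/sh
# Sample entry from the thread, kept on one line.
line='2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]'

# gc_tr (hypothetical name): turn every run of characters other than digits,
# dots and newlines into a single comma, then drop a leading or trailing comma.
gc_tr() {
    tr -sc '0-9.\n' ',' | sed 's/^,//; s/,$//'
}

out=$(printf '%s\n' "$line" "$line" | gc_tr)
printf '%s\n' "$out"
```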

<snip>
--
Ben.

stan

Nov 7, 2009, 9:13:10 PM

Ed Morton wrote:
<snip>

>> > > > > Assuming the input is all on one line:
>>
>> > > > > $ cat file
>> > > > > 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K (502464K),
>> > > > > 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02 secs]
>>
>> > > > > $ awk '{OFS=", "; gsub(/[^[:digit:].]/," "); $1=$1}1' file
>> > > > > 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09, 0.04, 0.02

This looks like the original code.

<snip>

>> > > Try this:
>>
>> > > awk '{OFS=", "; t=substr($0,12,8); $0=substr($0,30);
>> > >         gsub(/[[:digit:].]/," "); $1=$1; print t,$0}' file
>>
>> > >    Ed.

This looks like your amended code.

>>
>> > Thanks, Ed!
>>
>> > Well, I'm by no means a shell expert.
>>
>> > Running your command on the file, I get the following output:
>>
>> > 15:00:16, :, [GC, K->, K(, K),, secs]
>>

>> Are you sure you copy/pasted my script instead of retyping it?
>> Are you sure your input file is the same as you posted?
>>
>> Look:
>>
>> $ cat file
>> 2009-11-05T15:00:16.965-0600: 0.405: [GC 2112K->750K(7680K),
>> 0.0204170 secs]
>> 2009-11-05T15:00:17.087-0600: 0.527: [GC 2862K->1010K(7680K),
>> 0.0043760 secs]
>>
>> $ awk '{OFS=", "; t=substr($1,12,8); $0=substr($0,30); gsub(/[^
>> [:digit:].]/," "); $1=$1; print t,$0}' file
>> 15:00:16, 0.405, 2112, 750, 7680, 0.0204170
>> 15:00:17, 0.527, 2862, 1010, 7680, 0.0043760
>>
>> Please post exactly the same commands and their output so we can see
>> where something's going wrong.

I could be wrong, but I noticed what I thought was a missing "^" in
your amended code. It could be that I only saw a munged reply in the
thread and missed the original reply.

I actually stopped for a couple of minutes when I saw your amended code
because I couldn't figure out how it worked and I typically learn
something nearly every time I read your code. Local events overcame my
studies and I never got to try it out. My point here is not to call
out others' errors; I thought my knowledge was leaking through a hole
and your response actually put a finger in the hole! I wanted to say
thanks.

As I get older I can't distinguish between senior moments and actual
ignorance. The only bright side is that I can enjoy old movies.

w_a_x_man

Nov 8, 2009, 4:37:36 AM

On Nov 6, 1:39 pm, AyOut <mort...@gmail.com> wrote:

ruby -ne'puts [$_[11,8], $_[30..-1].scan(/[\d.]+/)].join(", ")' file

=== output ===

Ed Morton

Nov 8, 2009, 7:31:44 AM

You're right, looks like I did drop the "^" in one of my posts.

Ed.

Michael Paoli

Nov 8, 2009, 2:10:55 PM

On Nov 4, 6:07 pm, AyOut <mor...@gmail.com> wrote:
> I have a GC log file with entries like this one:
> 2.729: [GC [PSYoungGen: 70850K->6800K(152896K)] 70850K->6800K
> (502464K), 0.0165440 secs] [Times: user=0.09 sys=0.04, real=0.02
> secs]
> I would like to parse this to output for easy plotting using gnuplot
> and would like the following output:
> 2.729, 70850, 6800, 152896, 70850, 6800, 502464, 0.0165440, 0.09,
> 0.04, 0.02
>
> I have tried with a command like this:
> awk '{if($1~/[0-9]+/ && $2=="[GC" && $3=="[PSYoungGen:")printf("%s %s

...

sed -e 's/[^0-9.]/ /g;s/  */ /g;s/^ //;s/ $//;s/ /, /g'

Rakesh Sharma

Nov 10, 2009, 1:13:33 AM

You could do this in one go:

perl -lne '$,=", ";print/\d+[.]?(?:\d+)?|[.]\d+/g' yourfile

perl -lpe '$"=", ";$_="@{[/\d+[.]?(?:\d+)?|[.]\d+/g]}"' yourfile

--Rakesh