Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
Filtering and analyzing queries by year, month, hour and day with mk-query-digest
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  3 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Daniel Nichter  
View profile  
 More options Oct 22 2009, 5:47 pm
From: Daniel Nichter <dan...@percona.com>
Date: Thu, 22 Oct 2009 14:47:57 -0700 (PDT)
Local: Thurs, Oct 22 2009 5:47 pm
Subject: Filtering and analyzing queries by year, month, hour and day with mk-query-digest
A little while ago a user asked in
http://groups.google.com/group/maatkit-discuss/browse_thread/thread/2...
if it was possible to use mk-query-digest to analyze queries per
hour.  I responded with a skeleton script for use with --filter, but I
didn't actually test this.  Today, I filled out the script and tested
it and found that it works.  The script is available from trunk at:

http://maatkit.googlecode.com/svn/trunk/mk-query-digest/t/samples/fil...

The test file I'm using is available at:

http://maatkit.googlecode.com/svn/trunk/common/t/samples/binlog005.txt

The filter code does two things: it adds attributes called year,
month, day and hour to each event, and it uses environment variables
called YEAR, MONTH, DAY and HOUR to filter those newly added
attributes.  I'll show how this works later.

The filter works best with binary logs because binlogs reliably
timestamp events.  If an event does not have a timestamp (as is often
the case in a slowlog), then it gets values 0, 0, 0, 24 for year,
month, day and hour respectively.  Since 0 is a valid hour, 24 is used
to indicate that the event had no hour.

The basic usage is to group queries by hour.  Let's say you want to
see query stats for each hour.  The command line is:

mk-query-digest --type binlog binlog005.txt --filter filter-add-ymdh-
attribs.txt --group-by hour

Notice "--group-by hour".  And the result is (truncated for brevity):

#
########################################################################
# Report grouped by hour
#
########################################################################

# Item 1: 1.50 QPS, 31.01kx concurrency, ID 0x0DB5E4B97FC2AF39 at byte
450
#              pct   total     min     max     avg     95%  stddev
median
# Count         30       3
# Exec time     30  62029s  20661s  20704s  20676s  19861s       0
19861s
# Time range 2007-12-07 13:02:08 to 2007-12-07 13:02:10
# bytes         23      81      27      27      27      27
0      27
# day           25      21       7       7       7       7
0       7
# error cod      0       0       0       0       0       0
0       0
# month         27      36      12      12      12      12
0      12
# year          27      21       7       7       7       7
0       7
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s
#  10s+
################################################################
13
...
# Item 2: 0.00 QPS, 0.71x concurrency, ID 0xAA27A0C99BFF6710 at byte
301 _
#              pct   total     min     max     avg     95%  stddev
median
# Count         30       3
# Exec time     29  62000s  20661s  20675s  20667s  19861s       0
19861s
# Time range 2007-12-07 12:02:50 to 2007-12-08 12:12:12
# bytes         46     163      22      87   40.75   84.10   25.86
26.08
# day           37      30       7       8    7.50    7.70    0.36
7.70
# error cod      0       0       0       0       0       0
0       0
# month         36      48      12      12      12      12
0      12
# year          36      28       7       7       7       7
0       7
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s
#  10s+
################################################################
12
...
# Rank Query ID           Response time    Calls   R/Call     Item
# ==== ================== ================ ======= ========== ====
#    1 0x                 62029.0000 30.0%       3 20676.3333 13
#    2 0x                 62000.0000 30.0%       3 20666.6667 12
#    3 0x                 20661.0000 10.0%       1 20661.0000 23
#    4 0x                 20661.0000 10.0%       1 20661.0000 10
#    5 0x                 20661.0000 10.0%       1 20661.0000 08
#    6 0x                 20661.0000 10.0%       1 20661.0000 18

Each item corresponds to the queries for that hour.  Shown above are
hours 13 (1pm) and 12 (noon).  Then the profile gives you summarized
information about each hour.  From this fake binlog we see that 30% of
queries occurred in the noon hour.  (binlog005.txt is highly
contrived; the values are just for demonstration.)

Unless your logs are rotated daily, chances are there will be noon-
hour queries for multiple days.  If you want to see per-hour stats for
one specific day, the filter can do this, too, by using environment
variables.  Filter scripts were not originally meant to accept user
input, and having to modify values in the actual code isn't flexible,
so the solution is to use environment variables.  Here's how:

DAY=7 mk-query-digest --type binlog binlog005.txt --filter filter-add-
ymdh-attribs.txt --group-by hour

The leading "DAY=7" temporarily sets the environment variable DAY only
during the execution of mk-query-digest.  This way you don't pollute
your normal environment variables.  The result is now (truncated
again):

# Item 2: 0.00 QPS, 12.24x concurrency, ID 0xAA27A0C99BFF6710 at byte
301
# This item is included in the report because it matches --limit.
#              pct   total     min     max     avg     95%  stddev
median
# Count         28       2
# Exec time     28  41339s  20664s  20675s  20670s  20675s      8s
20670s
# Time range 2007-12-07 12:02:50 to 2007-12-07 12:59:07
# bytes         28      54      27      27      27      27
0      27
# day           28      14       7       7       7       7
0       7
# error cod      0       0       0       0       0       0
0       0
# month         28      24      12      12      12      12
0      12
# year          28      14       7       7       7       7
0       7
# Query_time distribution
#   1us
#  10us
# 100us
#   1ms
#  10ms
# 100ms
#    1s
#  10s+
################################################################
12

Notice that there are now only 2 queries in the noon hour and that the
time range is only in 2007-12-07.  Previously, there was a noon-hour
query in 2007-12-08.  Thus we know that the DAY filter worked.

In this fashion, you can group and filter your log as you please.  You
can combine multiple filters like:

DAY=7 HOUR=12 mk-query-digest --type binlog binlog005.txt --filter
filter-add-ymdh-attribs.txt --group-by hour

That will group and analyze only queries from the noon hour of the 7th
(December 7, 2007 in this log).  mk-query-digest is so flexible you
can even do this:

DAY=7 HOUR=12 mk-query-digest --type binlog binlog005.txt --filter
filter-add-ymdh-attribs.txt --group-by hour --no-report --print

That will suppress the query analysis and report and simply print all
the queries from the noon hour of the 7th in pseudo-slowlog format.

There are, of course, other ways to do this kind of per-time-unit
query aggregation, analysis and reporting, but if all you have are pre-
existing logs and mk-query-digest, then --filter can be used to
accomplish this task, too.

-Daniel


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Daniel Nichter  
View profile  
 More options Oct 22 2009, 6:06 pm
From: Daniel Nichter <dan...@percona.com>
Date: Thu, 22 Oct 2009 15:06:22 -0700 (PDT)
Local: Thurs, Oct 22 2009 6:06 pm
Subject: Re: Filtering and analyzing queries by year, month, hour and day with mk-query-digest
I should also point out that --since and --until is one of those
"other ways" to do this.

On Oct 22, 3:47 pm, Daniel Nichter <dan...@percona.com> wrote:


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
eonarts  
View profile  
 More options Oct 27 2009, 3:30 pm
From: eonarts <eona...@gmail.com>
Date: Tue, 27 Oct 2009 12:30:48 -0700 (PDT)
Local: Tues, Oct 27 2009 3:30 pm
Subject: Re: Filtering and analyzing queries by year, month, hour and day with mk-query-digest
Hey Daniel!
I love maatkit so I tried your examples but alas I could not get them
to translate into realworld ....

I use your filter file: filter-add-ymdh-attribs.txt, but I use my
binlog..... (mysql v. 5.0.51a ubuntu default)

I get this:
# 1.1s user time, 300ms system time, 338.84M rss, 364.11M vsz
Use of uninitialized value in numeric ge (>=) at /home/dba/bin/mk-
query-digest line 1368.

erin


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »