Re: Hannibal - additional metrics


Nils Kübler

Oct 28, 2013, 7:56:12 AM
to Vitaliy Morarian, hannib...@googlegroups.com
Hi Vitaliy,

Great stuff you are working on. This looks like an awesome addition; it would be nice if you plan to contribute it back to the project once you get it working :-)

Regarding the counter-based metrics, I have to disappoint you. For most metrics there are tools like Ganglia, and therefore we did not plan to support counter-based metrics. If I understand correctly, you propose storing the absolute counter values and querying the rate as in this pseudocode:

-- most recent sample at or before each point in time (timestamps in ms)
lower = SELECT value FROM record WHERE metric_id = {metric_id} AND timestamp <= (NOW - 3600000) ORDER BY timestamp DESC LIMIT 1
upper = SELECT value FROM record WHERE metric_id = {metric_id} AND timestamp <= NOW ORDER BY timestamp DESC LIMIT 1
ratePerSecond = (upper.value - lower.value) / 3600

You're right that this looks like a big overhead, as it would result in two queries per region. Maybe there is a way to combine the queries for multiple regions?
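One way to combine the lookups for all regions is a groupwise-maximum query: for every metric_id, find the latest timestamp at or before a cutoff, then join back to fetch that sample's value. Running it once for NOW and once for NOW - 1h yields every rate with two queries in total instead of two per region. A minimal sketch in Python with an in-memory SQLite database, assuming a hypothetical `record(metric_id, timestamp, value)` table with millisecond timestamps; Hannibal's real schema and database may differ.

```python
import sqlite3

# Hypothetical schema: one row per sample, timestamps in milliseconds.
SCHEMA = "CREATE TABLE record (metric_id INTEGER, timestamp INTEGER, value INTEGER)"

# For every metric, pick the most recent sample at or before the cutoff
# (groupwise maximum), so a single query covers all regions at once.
LATEST_AT = """
SELECT r.metric_id, r.timestamp, r.value
FROM record r
JOIN (SELECT metric_id, MAX(timestamp) AS ts
      FROM record
      WHERE timestamp <= ?
      GROUP BY metric_id) m
  ON r.metric_id = m.metric_id AND r.timestamp = m.ts
"""


def latest_at(conn, cutoff_ms):
    """Return {metric_id: (timestamp, value)} for the last sample <= cutoff_ms."""
    return {mid: (ts, val) for mid, ts, val in conn.execute(LATEST_AT, (cutoff_ms,))}


def rates_per_second(conn, now_ms, window_ms=3_600_000):
    """Rate for every metric that has a sample at both ends of the window."""
    lower = latest_at(conn, now_ms - window_ms)  # query 1
    upper = latest_at(conn, now_ms)              # query 2
    rates = {}
    for mid, (t0, v0) in lower.items():
        if mid in upper:
            t1, v1 = upper[mid]
            if t1 > t0:
                # Divide by the actual time between the two samples, in seconds.
                rates[mid] = (v1 - v0) / ((t1 - t0) / 1000)
    return rates
```

With an index on `(metric_id, timestamp)` both lookups should stay reasonably cheap; the main point is that the number of queries no longer grows with the number of regions.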


Best Regards

Nils Kübler



--
Nils Kübler
Software Engineer
____________________________________

YMC AG
Sonnenstr. 4
CH-8280 Kreuzlingen
Switzerland

Tel +41 (0)71 / 508 24 81
Fax +41 (0)71 / 560 53 89

Web www.ymc.ch
____________________________________ 


On 25 October 2013 at 10:12:18, Vitaliy Morarian (vmor...@gmail.com) wrote:

Hi Nils,

We have installed your tool to monitor our small cluster, and it's awesome: we have already discovered a few of the largest regions and split them manually.

However, we thought it would also be great to see two additional metrics:

15.4.3.8. readRequestsCount

Number of read requests for this RegionServer since startup. Note: this is a 32-bit integer and can roll.

15.4.3.11. writeRequestsCount

Number of write requests for this RegionServer since startup. Note: this is a 32-bit integer and can roll.
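Because both counters are 32-bit and can roll over, the increase between two samples can be computed modulo 2^32. A minimal Python sketch, assuming the counter either behaves like a Java `int` (wrapping from 2^31 - 1 to -2^31) or is read as unsigned, and that it wraps at most once between two samples. A RegionServer restart resets the counter to zero ("since startup"), which this function cannot distinguish from a wrap, so deltas spanning a restart would need to be discarded separately.

```python
MASK_32 = 0xFFFFFFFF  # 2**32 - 1


def counter_delta(prev, curr):
    """Increase of a 32-bit counter between two samples, tolerating one wrap.

    Taking the difference modulo 2**32 gives the right answer for both the
    signed (Java int) and the unsigned view of the counter.
    """
    return (curr - prev) & MASK_32
```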


We think these metrics will help us better understand the correlation between a region's size and how hot it is. So I started to investigate the Hannibal sources.
Please correct me if I'm mistaken, but I found that the current implementation doesn't support counters (which increase continuously); it only handles gauges.

So I adapted the code a bit and started storing the raw counter values in the DB. For the history screen, however, I changed the /api controller to transform the counters into gauges (by calculating deltas).
You can see how it looks in the screenshot below.
It works, but a raw counter is not very useful on the screen that displays all regions of a given table. I added the ability to sort by writes/reads, but that is not very valuable because the regions were created at different times.
So it would be great to be able to sort by rate. The open question is how to do that and which duration to use for calculating the rate (e.g. the last hour or the last day).
I could fetch the metric value stored N hours ago for each region and calculate the rate from it, but from a performance point of view that doesn't look very good.
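The per-region approach described above can be sketched in three steps: turn a stored counter series into deltas (what the history screen shows), derive a rate over a fixed window from the samples bracketing it, and sort regions by that rate. A minimal Python sketch, assuming each region's history is available as a time-sorted list of `(timestamp_ms, value)` pairs; the function names and the 24-hour default window are illustrative choices, not Hannibal's API.

```python
from bisect import bisect_right

WRAP = 1 << 32  # the counters are 32-bit and can roll over


def to_deltas(samples):
    """Turn a counter series [(ts_ms, value), ...] sorted by time into
    gauge points [(ts_ms, increase_since_previous_sample), ...]."""
    return [(t1, (v1 - v0) % WRAP) for (_, v0), (t1, v1) in zip(samples, samples[1:])]


def rate_over_window(samples, now_ms, window_ms):
    """Requests per second between the last sample at or before now_ms - window_ms
    and the last sample at or before now_ms; None if there is not enough history."""
    times = [t for t, _ in samples]
    i0 = bisect_right(times, now_ms - window_ms) - 1
    i1 = bisect_right(times, now_ms) - 1
    if i0 < 0 or i1 <= i0:
        return None
    t0, t1 = samples[i0][0], samples[i1][0]
    # Sum per-interval deltas so each rollover between samples is handled on its own.
    increase = sum(d for _, d in to_deltas(samples[i0:i1 + 1]))
    return increase / ((t1 - t0) / 1000)


def regions_by_rate(history, now_ms, window_ms=24 * 3_600_000):
    """Region names sorted by descending rate; regions without enough history go last."""
    rates = {name: rate_over_window(s, now_ms, window_ms) for name, s in history.items()}
    return sorted(rates, key=lambda n: (rates[n] is None, -(rates[n] or 0)))
```

Dividing by the actual time between the two bracketing samples, rather than by the nominal window length, keeps the rate correct when samples are not collected exactly on schedule. Regions created within the window simply have no lower sample and are listed last, which addresses the "created at different times" problem without comparing raw totals.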

So my questions are:
- Do you have any plans to add such counter metrics?
- What would be the best way to calculate rates?


This screen is sorted by counter value (but it would be great to sort by rate):

Inline image 2


This screen displays the deltas:
Inline image 1



With best regards,
Vitaliy Morarian

Vitaliy Morarian

Oct 28, 2013, 8:45:17 AM
to Nils Kübler, hannib...@googlegroups.com
Hi Nils,

Yes, I'm storing the counter values as-is. But this solution:
  -- most recent sample at or before each point in time (timestamps in ms)
  lower = SELECT value FROM record WHERE metric_id = {metric_id} AND timestamp <= (NOW - 3600000) ORDER BY timestamp DESC LIMIT 1
  upper = SELECT value FROM record WHERE metric_id = {metric_id} AND timestamp <= NOW ORDER BY timestamp DESC LIMIT 1
  ratePerSecond = (upper.value - lower.value) / 3600
didn't scale: processing Api.regions started to take more than 2 minutes in our production environment, because Hannibal had to run roughly 2500 × 2 additional queries just to calculate these rates.
So in my second revision I changed the approach: I implemented an actor that, every hour, calculates the read/write rates over the last 24 hours, and Api.regions just uses this pre-calculated map to inject the data.
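The pre-calculation idea can also be sketched without an actor framework. A minimal Python sketch, assuming a hypothetical `compute_rates()` callable (for example, the two-query approach) that returns `{region: rate}`: a daemon timer refreshes the map every hour, and request handlers only read the last published snapshot. This swaps the actor for a plain background thread and is not Hannibal's code.

```python
import threading


class RateCache:
    """Holds the most recently computed {region: rate} map.

    refresh() recomputes it; readers get the last published snapshot, so a
    request never waits for the expensive calculation.
    """

    def __init__(self, compute_rates, interval_s=3600):
        self._compute_rates = compute_rates  # hypothetical: returns {region: rate}
        self._interval_s = interval_s
        self._rates = {}
        self._lock = threading.Lock()
        self._timer = None

    def refresh(self):
        rates = dict(self._compute_rates())  # slow part runs outside the lock
        with self._lock:
            self._rates = rates  # publish the new snapshot in one step

    def rate(self, region):
        """Last computed rate for a region, or None if it is unknown."""
        with self._lock:
            return self._rates.get(region)

    def start(self):
        """Refresh now, then again every interval_s seconds on a daemon timer."""
        self.refresh()
        self._timer = threading.Timer(self._interval_s, self.start)
        self._timer.daemon = True
        self._timer.start()

    def stop(self):
        if self._timer is not None:
            self._timer.cancel()
```

An Api.regions-style handler would then call `cache.rate(region)` for each region instead of querying the database. In this sketch an exception in `compute_rates()` stops further refreshes, so a real version would want to catch and log errors inside `refresh()`.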


Inline image 1





With best regards,
Vitaliy Morarian

