TxMetricsCollector has empty implementation


eh...@gigya-inc.com

Mar 8, 2015, 1:39:32 PM
to tephr...@googlegroups.com
Hi all,

We are evaluating Tephra, and one of our concerns is how to check that the service is healthy.

You might answer that we can see in ZooKeeper that we have a connection, but that is not good enough: what if we have a connection but the process is not working well? We would like to see some metrics, and to be able to tell from the outside whether we have a problem.

I saw that you have the TxMetricsCollector class, but it does not do anything.

Are you planning to add an implementation there?

Is there another way for us to check the health of the manager server?

Thanks

Ehud

Gary Helmling

Mar 9, 2015, 8:02:51 PM
to eh...@gigya-inc.com, tephr...@googlegroups.com
Hi Ehud,

Yes, TxMetricsCollector is just a base class for metrics collector implementations.  The idea is that how metrics are collected is going to depend entirely on what system you are using for storing and processing metrics, so no single solution will work for everyone.

We could provide a base implementation that would work for some common cases, like a JMX based collector.  I'll open a JIRA for this.
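As a rough sketch of what a JMX-based collector could build on, the snippet below registers transaction counters as a standard MBean using only the JDK's `javax.management` API. Note this is illustrative, not Tephra code: the class names, the `tephra:type=...` ObjectName, and the metric names are all assumptions.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class TxJmxMetrics {

    // Standard MBean convention: JMX derives the management interface
    // from the "<ClassName>MBean" naming pattern.
    public interface TxStatsMBean {
        long getTransactionsStarted();
        long getTransactionsCommitted();
    }

    // Simple counter-backed implementation; a metrics collector subclass
    // could update these counters from its collection hooks.
    public static class TxStats implements TxStatsMBean {
        private final AtomicLong started = new AtomicLong();
        private final AtomicLong committed = new AtomicLong();

        public void recordStart()  { started.incrementAndGet(); }
        public void recordCommit() { committed.incrementAndGet(); }

        @Override public long getTransactionsStarted()   { return started.get(); }
        @Override public long getTransactionsCommitted() { return committed.get(); }
    }

    public static void main(String[] args) throws Exception {
        TxStats stats = new TxStats();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Hypothetical ObjectName; pick whatever domain/keys fit your deployment.
        ObjectName name = new ObjectName("tephra:type=TransactionManager,name=TxStats");
        server.registerMBean(stats, name);

        stats.recordStart();
        stats.recordCommit();

        // Once registered, the counters are visible to any JMX client
        // (jconsole, jmxterm, a codahale JmxReporter, ...).
        long started = (Long) server.getAttribute(name, "TransactionsStarted");
        System.out.println("TransactionsStarted = " + started);
    }
}
```

A health check could then poll these attributes over a JMX connector and alert when the commit rate stalls.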

What system are you using for metrics collection?





Gary Helmling

Mar 9, 2015, 8:05:14 PM
to eh...@gigya-inc.com, tephr...@googlegroups.com
I opened a JIRA to create a JMX based implementation: https://issues.cask.co/browse/TEPHRA-73

Ehud Lev

Mar 10, 2015, 1:16:26 PM
to Gary Helmling, tephr...@googlegroups.com
Hi Gary,

Currently we use "com.codahale.metrics" and JMX for our system. It would be nice to get a simple JMX collector out of the box, but we can also implement it ourselves; I just want to be sure I am not missing something.

And since we are doing our own POC, I would like to ask some other questions:

1. We ran a stress test on the transaction manager in an Amazon environment.
Some details: each Amazon node has 16 GB RAM and 8 CPUs; 3 HDFS/region nodes; 2 HBase masters; I installed the transaction manager on all of the nodes; replication factor 1; CDH4, not exactly your version (I needed to recompile the code).

The test just checked how many transactions the manager can handle per second, so it started and committed transactions without any reads or writes.
The results are:
processes  threads  trx amount  total trx  total time (s)  trx per second
1          50       100         5000       2               2500
1          120      1000        120000     26              4615.384615
1          200      1000        200000     45              4444.444444
1          300      1000        300000     71              4225.352113
5          100      1000        500000     162             3086.419753
5          200      1000        1000000    283             3533.568905
5          300      1000        1500000    462             3246.753247
3          200      1000        600000     141             4255.319149
3          200      10000       6000000    1413            4246.284501



So my conclusion is that the manager in this environment is limited to about 5000 transactions per second. Does this make sense?


2. I have one scenario that is unclear to me:
  1. start transaction
  2. write key1 -> value1
  3. commit
  4. start transaction A
  5. transaction A: read key1
  6. start transaction B (with another client)
  7. transaction B: write key1 -> value2
  8. transaction A: write key2 -> the value it read from key1
  9. commit transaction B
  10. commit transaction A
  11. We don't get an exception!
Is this expected, or am I doing something wrong? Since transaction A is using the value it read from key1, I would expect it to throw an exception.

Anyway, many thanks for answering so fast.

Regards

Ehud Lev







Gary Helmling

Mar 11, 2015, 3:35:24 PM
to Ehud Lev, tephr...@googlegroups.com
There are a few additional configuration parameters that you can tweak when doing your performance tests.

Configuration for the transaction server (property name, default value, description):

data.tx.server.io.threads    2     Number of threads for socket IO
data.tx.server.threads       20    Number of handler threads
Upping the number of IO threads and server handler threads may help with many concurrent clients.


And additional configuration for the transaction clients (again property name, default value, description):

data.tx.client.provider    pool    Client provider strategy: "pool" uses a pool of clients; "thread-local" a client per thread
data.tx.client.count       5       Max number of clients for "pool" provider
It's possible your clients are bottlenecking on the Thrift client pool.  You could try increasing data.tx.client.count when using a large number of threads, or alternatively try the "thread-local" provider.
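For reference, these client properties would typically go into the Hadoop-style configuration the client loads (e.g. hbase-site.xml). The values below are only illustrative, not recommendations:

```xml
<!-- Illustrative tuning values; property names are from the message above. -->
<property>
  <name>data.tx.client.provider</name>
  <value>thread-local</value>
</property>
<property>
  <name>data.tx.client.count</name>
  <value>50</value>
</property>
```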

Do you have a view of the server operating system resources while your performance test is running?  Does that provide any insights into where the constraint is -- CPU, network IO, disk IO?



> 2. I have one scenario that is unclear to me:
>   1. start transaction
>   2. write key1 -> value1
>   3. commit
>   4. start transaction A
>   5. transaction A: read key1
>   6. start transaction B (with other client)
>   7. transaction B: write key1 -> value2
>   8. transaction A: write key2 -> the value it got from key1
>   9. commit transaction B
>   10. commit transaction A
>   11. We don't get exception !!
> Is this as expected or am I doing something wrong? Since transaction A is using the value it got from key1, I would expect it to throw an exception.



Yes, the situation you describe here is a write-skew anomaly, which snapshot isolation does allow, since it only detects write-write conflicts.  Since transaction A and transaction B are not attempting to write the same key, they do not encounter a conflict.  In general, this is just a limitation of the guarantees that snapshot isolation provides.

However, once Tephra adds configurable conflict detection levels (https://issues.cask.co/browse/TEPHRA-69), you could work around this in the case that both key1 and key2 are in the same row, by using row-level conflict detection.  This would convert this special case of write skew into a write-write conflict.  Of course, there are times when you do not want writes to different columns in the same row to conflict, so this is not a universal solution.
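To make the distinction concrete, here is a small self-contained sketch (not Tephra code; the change-set model is deliberately simplified) of why cell/column-level conflict detection misses this write skew while row-level detection would turn it into a write-write conflict:

```java
import java.util.Collections;
import java.util.Set;

public class ConflictDemo {
    // Simplified write-write conflict check: a commit fails iff its change
    // set overlaps that of a concurrently committed transaction. The key
    // granularity stands in for the configured conflict detection level.
    static boolean hasConflict(Set<String> a, Set<String> b) {
        return !Collections.disjoint(a, b);
    }

    public static void main(String[] args) {
        // Column-level detection: A wrote key2, B wrote key1 -> change sets
        // are disjoint, both commits succeed, and the write skew is missed.
        Set<String> aCols = Set.of("row1:key2");
        Set<String> bCols = Set.of("row1:key1");
        System.out.println("column-level conflict: " + hasConflict(aCols, bCols));

        // Row-level detection (TEPHRA-69): both change sets collapse to the
        // row, so the same scenario becomes a write-write conflict.
        Set<String> aRows = Set.of("row1");
        Set<String> bRows = Set.of("row1");
        System.out.println("row-level conflict: " + hasConflict(aRows, bRows));
    }
}
```

As the sketch shows, the trade-off is exactly the one described above: row-level detection catches this case only because key1 and key2 share a row, and it also makes unrelated writes to different columns of that row conflict.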