TxMetricsCollector has empty implementation


eh...@gigya-inc.com

Mar 8, 2015, 1:39:32 PM
to tephr...@googlegroups.com
Hi all,

We are evaluating Tephra, and one of our concerns is how to check that the service is healthy.

You might answer that we can see in ZooKeeper that we have a connection, but that is not good enough: what if we have a connection but the process is not working well? We would like to see some metrics, and to be able to tell from the outside whether we have a problem.

I saw that you have the TxMetricsCollector class, but it does not do anything.

Are you planning to add an implementation there?

Is there another way for us to check the health of the manager server?

Thanks

Ehud

Gary Helmling

Mar 9, 2015, 8:02:51 PM
to eh...@gigya-inc.com, tephr...@googlegroups.com
Hi Ehud,

Yes, TxMetricsCollector is just a base class for metrics collector implementations.  The idea is that how metrics are collected is going to depend entirely on what system you are using for storing and processing metrics, so no single solution will work for everyone.

We could provide a base implementation that would work for some common cases, like a JMX based collector.  I'll open a JIRA for this.
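As a rough sketch of what a JMX-based collector could build on, the snippet below registers transaction counters as a standard MBean using only the JDK's `javax.management` API. Note this is illustrative, not Tephra code: the class names, the `tephra:type=...` ObjectName, and the metric names are all assumptions.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class TxJmxMetrics {

    // Standard MBean convention: JMX derives the management interface
    // from the "<ClassName>MBean" naming pattern.
    public interface TxStatsMBean {
        long getTransactionsStarted();
        long getTransactionsCommitted();
    }

    // Simple counter-backed implementation; a metrics collector subclass
    // could update these counters from its collection hooks.
    public static class TxStats implements TxStatsMBean {
        private final AtomicLong started = new AtomicLong();
        private final AtomicLong committed = new AtomicLong();

        public void recordStart()  { started.incrementAndGet(); }
        public void recordCommit() { committed.incrementAndGet(); }

        @Override public long getTransactionsStarted()   { return started.get(); }
        @Override public long getTransactionsCommitted() { return committed.get(); }
    }

    public static void main(String[] args) throws Exception {
        TxStats stats = new TxStats();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Hypothetical ObjectName; pick whatever domain/keys fit your deployment.
        ObjectName name = new ObjectName("tephra:type=TransactionManager,name=TxStats");
        server.registerMBean(stats, name);

        stats.recordStart();
        stats.recordCommit();

        // Once registered, the counters are visible to any JMX client
        // (jconsole, jmxterm, a codahale JmxReporter, ...).
        long started = (Long) server.getAttribute(name, "TransactionsStarted");
        System.out.println("TransactionsStarted = " + started);
    }
}
```

A health check could then poll these attributes over a JMX connector and alert when the commit rate stalls.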

What system are you using for metrics collection?





Gary Helmling

Mar 9, 2015, 8:05:14 PM
to eh...@gigya-inc.com, tephr...@googlegroups.com
I opened a JIRA to create a JMX based implementation: https://issues.cask.co/browse/TEPHRA-73

Ehud Lev

Mar 10, 2015, 1:16:26 PM
to Gary Helmling, tephr...@googlegroups.com
Hi Gary,

Currently we use "com.codahale.metrics" and JMX for our system. It would be nice to get a simple JMX collector out of the box, but we can also implement it ourselves; I just want to be sure I am not missing something.

And since we are doing our own POC, I would like to ask some other questions:

1. We ran a stress test on the transaction manager in an Amazon environment.
Some details: each Amazon node has 16 GB RAM and 8 CPUs; 3 HDFS/region nodes; 2 HBase masters; I installed the transaction manager on all of the nodes; replication factor 1; CDH4, not exactly your version (I needed to recompile the code).

The test just checked how many transactions the manager can handle per second, so it started and committed transactions without any reads or writes.
The results are:
processes  threads  trx amount  total trx  total time (s)  trx per second
1          50       100         5000       2               2500
1          120      1000        120000     26              4615.384615
1          200      1000        200000     45              4444.444444
1          300      1000        300000     71              4225.352113
5          100      1000        500000     162             3086.419753
5          200      1000        1000000    283             3533.568905
5          300      1000        1500000    462             3246.753247
3          200      1000        600000     141             4255.319149
3          200      10000       6000000    1413            4246.284501



So my conclusion is that the manager in this environment is limited to about 5000 transactions per second. Does this make sense?


2. I have one scenario that is unclear to me:
  1. start transaction
  2. write key1 -> value1
  3. commit
  4. start transaction A
  5. transaction A: read key1
  6. start transaction B (with another client)
  7. transaction B: write key1 -> value2
  8. transaction A: write key2 -> the value it read from key1
  9. commit transaction B
  10. commit transaction A
  11. We don't get an exception!
Is this expected, or am I doing something wrong? Since transaction A is using the value it read from key1, I would expect it to throw an exception.

Anyway, many thanks for answering so fast.

Regards

Ehud Lev







Gary Helmling

Mar 11, 2015, 3:35:24 PM
to Ehud Lev, tephr...@googlegroups.com
There are a few additional configuration parameters that you can tweak when doing your performance tests.

Configuration for the transaction server (property name, default value, description):

data.tx.server.io.threads    2     Number of threads for socket IO
data.tx.server.threads       20    Number of handler threads
Upping the number of IO threads and server handler threads may help with many concurrent clients.


And additional configuration for the transaction clients (again property name, default value, description):

data.tx.client.provider    pool    Client provider strategy: "pool" uses a pool of clients; "thread-local" a client per thread
data.tx.client.count       5       Max number of clients for "pool" provider
It's possible your clients are bottlenecking on the Thrift client pool.  You could try increasing data.tx.client.count when using a large number of threads, or alternatively try the "thread-local" provider.
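For reference, these client properties would typically go into the Hadoop-style configuration the client loads (e.g. hbase-site.xml). The values below are only illustrative, not recommendations:

```xml
<!-- Illustrative tuning values; property names are from the message above. -->
<property>
  <name>data.tx.client.provider</name>
  <value>thread-local</value>
</property>
<property>
  <name>data.tx.client.count</name>
  <value>50</value>
</property>
```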

Do you have a view of the server operating system resources while your performance test is running?  Does that provide any insights into where the constraint is -- CPU, network IO, disk IO?



> 2. I have one scenario that is unclear to me:
>   1. start transaction
>   2. write key1 -> value1
>   3. commit
>   4. start transaction A
>   5. transaction A: read key1
>   6. start transaction B (with other client)
>   7. transaction B: write key1 -> value2
>   8. transaction A: write key2 -> the value it got from key1
>   9. commit transaction B
>   10. commit transaction A
>   11. We don't get exception !!
> Is this as expected or am I doing something wrong? Since transaction A is using the value it got from key1, I would expect it to throw an exception.



Yes, the situation you describe here is a write-skew anomaly, which snapshot isolation does allow, since it only detects write-write conflicts.  Since transaction A and transaction B are not attempting to write the same key, they do not encounter a conflict.  In general, this is just a limitation of the guarantees that snapshot isolation provides.

However, once Tephra adds configurable conflict detection levels (https://issues.cask.co/browse/TEPHRA-69), you could work around this in the case that both key1 and key2 are in the same row, by using row-level conflict detection.  This would convert this special case of write skew into a write-write conflict.  Of course, there are times when you do not want writes to different columns in the same row to conflict, so this is not a universal solution.
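To make the distinction concrete, here is a small self-contained sketch (not Tephra code; the change-set model is deliberately simplified) of why cell/column-level conflict detection misses this write skew while row-level detection would turn it into a write-write conflict:

```java
import java.util.Collections;
import java.util.Set;

public class ConflictDemo {
    // Simplified write-write conflict check: a commit fails iff its change
    // set overlaps that of a concurrently committed transaction. The key
    // granularity stands in for the configured conflict detection level.
    static boolean hasConflict(Set<String> a, Set<String> b) {
        return !Collections.disjoint(a, b);
    }

    public static void main(String[] args) {
        // Column-level detection: A wrote key2, B wrote key1 -> change sets
        // are disjoint, both commits succeed, and the write skew is missed.
        Set<String> aCols = Set.of("row1:key2");
        Set<String> bCols = Set.of("row1:key1");
        System.out.println("column-level conflict: " + hasConflict(aCols, bCols));

        // Row-level detection (TEPHRA-69): both change sets collapse to the
        // row, so the same scenario becomes a write-write conflict.
        Set<String> aRows = Set.of("row1");
        Set<String> bRows = Set.of("row1");
        System.out.println("row-level conflict: " + hasConflict(aRows, bRows));
    }
}
```

As the sketch shows, the trade-off is exactly the one described above: row-level detection catches this case only because key1 and key2 share a row, and it also makes unrelated writes to different columns of that row conflict.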