We've done an initial pass to determine the overhead of transactions for write-once/append-only data through Tephra in Phoenix - see https://issues.apache.org/jira/browse/PHOENIX-1901 for details. Looks like about a 20% perf hit across the board for our concurrent query workloads. The sets of queries between the dashed lines here[1] are run in parallel. The bottom set (prefixed with "Serial") is run serially, and these look more similar perf-wise with and without transactions. I'm wondering if perhaps we're seeing a bottleneck in the transaction manager. Two questions:

1) Do the canCommit and commit RPCs still happen if the change set is empty? This would be the case in a Phoenix connection that is only querying.
2) What kind of tuning is available on the Transaction Manager? For example, how big is the thread pool for obtaining transaction ids, and can we make it bigger through a config param?

Thanks,
James
You received this message because you are subscribed to the Google Groups "Tephra Developer" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tephra-dev+...@googlegroups.com.
To post to this group, send email to tephr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tephra-dev/CAAF1Jdi6G55UHP2rKZUefFtTpt7FV%3DSrKxRM9R0K3a9MgEi2sw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
Yes, today canCommit and commit RPCs still happen if the change set is empty. We should be able to reduce the transaction RPC overhead for transactions with empty change sets. Can you please file a JIRA for this?
How many concurrent transaction RPCs happen in the test? Depending on that we need to increase thread pool size.
The other setting that might need increasing is the heap memory size allocated to Transaction Manager. The default heap size might be too low.
Thanks,
Poorna.

On Tue, Dec 22, 2015 at 3:43 PM, James Taylor <james...@apache.org> wrote:
> We've done an initial pass to determine the overhead of transactions for write-once/append-only data through Tephra in Phoenix - see https://issues.apache.org/jira/browse/PHOENIX-1901 for details. Looks like about a 20% perf hit across the board for our concurrent query workloads. The sets of queries between the dashed lines here[1] are run in parallel. The bottom set (prefixed with "Serial") are run serially and these look to be more similar perf-wise with and without transactions. I'm wondering if perhaps we're seeing a bottleneck in the transaction manager. Two questions:
>
> 1) Do the canCommit and commit RPCs still happen if the change set is empty? This would be the case in Phoenix connection that is doing only querying.
> 2) What kind of tuning is available on the Transaction Manager? For example, how big is the thread pool for obtaining transaction ids and can we make it bigger through a config param?
>
> Thanks,
> James
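For readers following the thread, here is a minimal sketch of what a reduced RPC path for empty change sets could look like on the client side. Everything in it is hypothetical: `TxService`, `RecordingTxService`, and `ShortCircuitingTxClient` are illustrative names, not Tephra's API. It assumes that the conflict check (canCommit) has nothing to do for an empty change set, while the commit call is still needed so the manager can drop the transaction from its in-progress list. Which round trips can really be dropped depends on the manager's bookkeeping.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical stand-in for a transaction service client; not Tephra's API. */
interface TxService {
    long start();
    boolean canCommit(long txId, Collection<byte[]> changeSet);
    boolean commit(long txId);
}

/** Test double that records which RPCs would have been issued. */
final class RecordingTxService implements TxService {
    final List<String> calls = new ArrayList<>();
    private long nextId = 1;

    @Override public long start() { calls.add("start"); return nextId++; }

    @Override public boolean canCommit(long txId, Collection<byte[]> changeSet) {
        calls.add("canCommit");
        return true;
    }

    @Override public boolean commit(long txId) { calls.add("commit"); return true; }
}

/** Wrapper that skips the conflict-check round trip for read-only work. */
final class ShortCircuitingTxClient {
    private final TxService service;

    ShortCircuitingTxClient(TxService service) { this.service = service; }

    long begin() { return service.start(); }

    boolean finish(long txId, Collection<byte[]> changeSet) {
        if (changeSet.isEmpty()) {
            // Assumption: nothing was written, so there is nothing to
            // conflict-check. commit is still sent so the transaction
            // can be released on the server side.
            return service.commit(txId);
        }
        return service.canCommit(txId, changeSet) && service.commit(txId);
    }
}
```

Under these assumptions a query-only transaction costs two round trips (start, commit) instead of three.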
On Tue, Dec 22, 2015 at 4:55 PM, Poorna Chandra <poo...@cask.co> wrote:
> Yes, today canCommit and commit RPCs still happen if the change set is empty. We should be able to reduce the transaction RPC overhead for transactions with empty change sets. Can you please file a JIRA for this?

Filed TEPHRA-158.

> How many concurrent transaction RPCs happen in the test? Depending on that we need to increase thread pool size.

There are 8 concurrent clients issuing queries.

> The other setting that might need increasing is the heap memory size allocated to Transaction Manager. The default heap size might be too low.

FYI, for these tests, all the data is immutable, so no conflict detection is performed. Do you still think we need to increase the default heap size?
In http://phoenix-bin.github.io/client/publish/pherf-txn.html, I see that there are 3 sections. The first section has 8 tests, the second 5, and the third 5. Do the 8 concurrent clients come from the tests of the first section running in parallel? Or does each test make 8 concurrent transaction manager calls, leading to a maximum of 64 concurrent calls? I'm trying to understand why 20 worker threads are not able to handle 8 concurrent transaction RPCs.
For each transaction we store the transaction id and a change set (currently we default to an empty set if the change set is of size 0; this could use some optimization). So depending on how many transactions are in progress at a given time, this might consume some memory. If you have enabled GC logging, this is one good thing to rule out.
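To make the memory point concrete, here is a small sketch of an in-progress table that stores one change set per transaction id and points every empty entry at a single shared immutable set instead of keeping a fresh collection per transaction. This is not how Tephra stores its state: `InProgressTable` is a hypothetical name, and change keys are simplified to `String` for the illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-progress table; illustrative only, not Tephra's data structure. */
final class InProgressTable {
    private final Map<Long, Set<String>> changeSets = new ConcurrentHashMap<>();

    /** Records a transaction, reusing one shared empty set for read-only ones. */
    void record(long txId, Set<String> changes) {
        changeSets.put(txId, changes.isEmpty() ? Collections.<String>emptySet() : changes);
    }

    Set<String> changesOf(long txId) { return changeSets.get(txId); }

    int size() { return changeSets.size(); }
}
```

With this layout the per-transaction cost of a read-only transaction is the map entry itself; the retained footprint still grows with the number of transactions in progress, which is why GC logs remain worth checking.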
At this point, it's pure speculation where the slowdown is, so it might not be bottlenecked on the tx manager at all. Another theory is that the overhead is due to the extra filtering out of in-flight transactions.
Would be good if a read-only client didn't need to add to the in flight transaction list. Another optimization might be to only add to the in flight transaction list when a client writes data to HBase, since it's not needed otherwise. This would require another RPC, so it'd be ideal to do this only optionally. In Phoenix, if a client doesn't run a query against uncommitted data, it won't be written to HBase if/when it's committed. I'll file JIRAs for these if you think these ideas have merit.
> At this point, it's pure speculation where the slowdown is, so it might not be bottlenecked on the tx manager at all. Another theory is that the overhead is due to the extra filtering out of in-flight transactions.

We can figure out whether the slowdown is due to the overhead of the 3 transaction RPC calls or to the in-flight transaction filtering by repeating the same query multiple times in a single transaction. If the issue is RPC overhead, this test will run without any slowdown; if the issue is in-flight transaction filtering, this test will still be slow. Is it possible to modify the test to do this?

> Would be good if a read-only client didn't need to add to the in flight transaction list. Another optimization might be to only add to the in flight transaction list when a client writes data to HBase, since it's not needed otherwise. This would require another RPC, so it'd be ideal to do this only optionally. In Phoenix, if a client doesn't run a query against uncommitted data, it won't be written to HBase if/when it's committed. I'll file JIRAs for these if you think these ideas have merit.

Today Tephra does not optimize for the read-only transaction use case. I'd say file one umbrella JIRA for optimizing read-only transactions, and put all the above suggestions in it. We can create sub-tasks when we are ready to implement them.
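The experiment proposed above (same query repeated inside one transaction versus one transaction per query) can be sketched roughly as follows. The names `TxSession`, `CountingSession`, and `TxOverheadExperiment` are hypothetical and are not Phoenix, Pherf, or Tephra APIs. The sketch assumes the 3 RPCs per transaction are one call to start it plus canCommit and commit at the end.

```java
/** Hypothetical transactional session; illustrative only. */
interface TxSession {
    void begin();     // assumed: 1 RPC to start the transaction
    void runQuery();  // the read path, including in-flight filtering
    void commit();    // assumed: 2 RPCs (canCommit + commit)
}

/** Test double that counts RPCs and queries instead of doing real work. */
final class CountingSession implements TxSession {
    int rpcs = 0;
    int queries = 0;

    @Override public void begin() { rpcs += 1; }
    @Override public void runQuery() { queries += 1; }
    @Override public void commit() { rpcs += 2; }
}

final class TxOverheadExperiment {
    /** Baseline: every query pays the full transaction RPC cost. */
    static long perQueryTransactions(TxSession session, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            session.begin();
            session.runQuery();
            session.commit();
        }
        return System.nanoTime() - start;
    }

    /** Variant: RPC cost is paid once, filtering cost is paid n times. */
    static long singleTransaction(TxSession session, int n) {
        long start = System.nanoTime();
        session.begin();
        for (int i = 0; i < n; i++) {
            session.runQuery();
        }
        session.commit();
        return System.nanoTime() - start;
    }
}
```

With a real session in place of `CountingSession`, comparing the two elapsed times against a non-transactional run separates the hypotheses: if the single-transaction variant closes the gap, the RPCs are the likely cost; if it stays slow, the read-path filtering is the more likely culprit.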