TSDB write throughput drops at the beginning of the hour

hy...@zenoss.com

May 13, 2014, 12:45:48 PM
to open...@googlegroups.com
I am running TSDB 2.0 on top of a 3-node HBase cluster (CDH 5.0 distribution). Under constant load, I observed a recurring pattern in write throughput. The metric injection rate is 50,000 data points/second, and the collection interval is 30 seconds for each data point. At the beginning of the hour, the TSDB write throughput drops to 32,000 data points/sec and then gradually increases. It climbs back to 50,000 data points/sec at about 20 minutes into the hour and stays there until the end of the hour. Then the same cycle starts again.

At the beginning of the hour, the following error is captured in the tsdb log file.
2014-05-12 22:16:07,827 ERROR [New I/O worker #49] CompactionQueue: Failed to delete a row to re-compact
org.hbase.async.RemoteException: org.apache.hadoop.hbase.RegionTooBusyException: Above memstore limit, regionName=tsdb,\x00\x02\x9BSk\xD40\x00\x00\x04\x00\x00\x04\x00\x00\x09\x03\xFA\xF0\x00\x00\x0A\x02\x18g\x00\x00\x0B\x00\x00.,1399672937047.03d555e4a237f2d921ffd52cf524213c., server=localhost,60201,1399669535794, memstoreSize=285459752, blockingMemStoreSize=268435456
        at org.apache.hadoop.hbase.regionserver.HRegion.checkResources(HRegion.java:2561)
        at org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:1963)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.doBatchOp(HRegionServer.java:4050)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.doNonAtomicRegionMutation(HRegionServer.java:3361)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3265)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26935)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
        at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)

        at org.hbase.async.RegionClient.makeException(RegionClient.java:1442) [asynchbase-1.5.0.jar:d543609]
        at org.hbase.async.RegionClient.decodeExceptionPair(RegionClient.java:1476) [asynchbase-1.5.0.jar:d543609]
        at org.hbase.async.MultiAction.deserialize(MultiAction.java:576) ~[asynchbase-1.5.0.jar:d543609]
        at org.hbase.async.RegionClient.decode(RegionClient.java:1309) [asynchbase-1.5.0.jar:d543609]
        at org.hbase.async.RegionClient.decode(RegionClient.java:89) [asynchbase-1.5.0.jar:d543609]
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:500) [netty-3.9.0.Final.jar:na]
        at org.jboss.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:485) [netty-3.9.0.Final.jar:na]
        at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) [netty-3.9.0.Final.jar:na]
        at org.hbase.async.RegionClient.handleUpstream(RegionClient.java:1080) [asynchbase-1.5.0.jar:d543609]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) [netty-3.9.0.Final.jar:na]
        at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559) [netty-3.9.0.Final.jar:na]
        at org.hbase.async.HBaseClient$RegionClientPipeline.sendUpstream(HBaseClient.java:2652) [asynchbase-1.5.0.jar:d543609]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268) [netty-3.9.0.Final.jar:na]
        at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255) [netty-3.9.0.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88) [netty-3.9.0.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [netty-3.9.0.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) [netty-3.9.0.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [netty-3.9.0.Final.jar:na]
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [netty-3.9.0.Final.jar:na]
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [netty-3.9.0.Final.jar:na]
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [netty-3.9.0.Final.jar:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_51]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
        at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]

Are there any tunings recommended on the TSDB or HBase side to avoid this problem and maintain a steady throughput?
Is the row written to HBase only after the data-cell compaction is done on the TSDB side? Since TSDB creates one row per metric per hour and the data-cell compaction always occurs at the beginning of the next hour, is that why the heaviest load lands at the beginning of the hour?
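
For context, OpenTSDB's row keys embed a base timestamp rounded down to the top of the hour, so every data point for a series written during that hour lands in the same row, and the TSD only re-compacts a row once its hour has rolled over. A minimal sketch of the rounding, with class and method names that are illustrative rather than OpenTSDB internals:

    // Each row covers one hour of data for one (metric, tags) series.
    public final class RowKeyExample {
      static final int MAX_TIMESPAN = 3600;  // seconds of data per row

      // Round an epoch timestamp down to the start of its hour; this base
      // time is what goes into the row key after the metric UID.
      static long baseTime(final long epochSeconds) {
        return epochSeconds - (epochSeconds % MAX_TIMESPAN);
      }

      public static void main(final String[] args) {
        final long ts = 1399932967L;        // an arbitrary example timestamp
        System.out.println(baseTime(ts));   // 1399932000, shared by the whole hour
      }
    }

Because every row from the previous hour becomes eligible for re-compaction at once, the extra write load naturally clusters right after the hour boundary.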

Thanks,
Hong
 

John A. Tamplin

May 13, 2014, 2:20:21 PM
to hy...@zenoss.com, OpenTSDB
On Tue, May 13, 2014 at 12:45 PM, <hy...@zenoss.com> wrote:
I am running TSDB 2.0 on top of a 3-node HBase cluster (CDH 5.0 distribution). Under constant load, I observed a recurring pattern in write throughput. The metric injection rate is 50,000 data points/second, and the collection interval is 30 seconds for each data point. At the beginning of the hour, the TSDB write throughput drops to 32,000 data points/sec and then gradually increases. It climbs back to 50,000 data points/sec at about 20 minutes into the hour and stays there until the end of the hour. Then the same cycle starts again.

What does Cloudera show at that time?  Is it doing a region split, major compactions, etc.?

Three nodes is tiny for an HBase cluster, and if your steady state is 50k entries/sec you are going to need headroom to handle more than that.  We run our metrics through Kafka so we get buffering, and one of the reasons is HBase performance when doing region splits, compactions, or dealing with a down node.

Our staging cluster is 3 nodes, and our steady-state is about 25k points/sec there, but bursting up to ~150k.  Our production cluster is 6 nodes and steady-state is about 72k, with bursts up to 450k (though at 450k, read performance is impacted, so we generally try and keep it to 300k even when catching up).

--
John A. Tamplin

Hong Yang

May 13, 2014, 3:10:39 PM
to John A. Tamplin, OpenTSDB
Thanks, John. 
There was a small compaction on the HBase region server. The region server log entries from the same time period are below.
My question is why the throughput drops at the beginning of the hour. Is there any tuning that can smooth it out?

2014-05-12 22:16:03,212 DEBUG [regionserver60201-smallCompactions-1399669538302] backup.HFileArchiver: Finished archiving from class org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, file:file:/var/hbase/data/default/tsdb/03d555e4a237f2d921ffd52cf524213c/t/1d9bab92d26a4c1e86eb5baaece7b29a, to file:/var/hbase/archive/data/default/tsdb/03d555e4a237f2d921ffd52cf524213c/t/1d9bab92d26a4c1e86eb5baaece7b29a
2014-05-12 22:16:03,212 INFO  [regionserver60201-smallCompactions-1399669538302] regionserver.HStore: Completed compaction of 4 file(s) in t of tsdb,\x00\x02\x9BSk\xD40\x00\x00\x04\x00\x00\x04\x00\x00\x09\x03\xFA\xF0\x00\x00\x0A\x02\x18g\x00\x00\x0B\x00\x00.,1399672937047.03d555e4a237f2d921ffd52cf524213c. into 1e6b51acac344d48a4c3f0e65f3e4933(size=337.4 M), total size for store is 6.5 G. This selection was in queue for 0sec, and took 4sec to execute.
2014-05-12 22:16:03,212 INFO  [regionserver60201-smallCompactions-1399669538302] regionserver.CompactSplitThread: Completed compaction: Request = regionName=tsdb,\x00\x02\x9BSk\xD40\x00\x00\x04\x00\x00\x04\x00\x00\x09\x03\xFA\xF0\x00\x00\x0A\x02\x18g\x00\x00\x0B\x00\x00.,1399672937047.03d555e4a237f2d921ffd52cf524213c., storeName=t, fileCount=4, fileSize=340.6 M, priority=3, time=2168930121328292; duration=4sec
2014-05-12 22:16:03,212 DEBUG [regionserver60201-smallCompactions-1399669538302] regionserver.CompactSplitThread: CompactSplitThread Status: compaction_queue=(0:0), split_queue=0, merge_queue=0
2014-05-12 22:16:07,089 DEBUG [RpcServer.handler=24,port=60201] regionserver.HRegion: Flush requested on tsdb,\x00\x02\x9BSk\xD40\x00\x00\x04\x00\x00\x04\x00\x00\x09\x03\xFA\xF0\x00\x00\x0A\x02\x18g\x00\x00\x0B\x00\x00.,1399672937047.03d555e4a237f2d921ffd52cf524213c.
2014-05-12 22:16:07,089 DEBUG [Thread-11] regionserver.HRegion: Started memstore flush for tsdb,\x00\x02\x9BSk\xD40\x00\x00\x04\x00\x00\x04\x00\x00\x09\x03\xFA\xF0\x00\x00\x0A\x02\x18g\x00\x00\x0B\x00\x00.,1399672937047.03d555e4a237f2d921ffd52cf524213c., current region memstore size 146.6 M
2014-05-12 22:16:07,200 DEBUG [regionserver60201.logRoller] regionserver.LogRoller: HLog roll requested
2014-05-12 22:16:07,206 DEBUG [regionserver60201.logRoller] wal.FSHLog: cleanupCurrentWriter  waiting for transactions to get synced  total 4885014 synced till here 4885013

Arun Gupta

May 13, 2014, 7:17:04 PM
to open...@googlegroups.com
Given the timing and error messages, this is likely related to OpenTSDB compactions. You could change the code to compact over 30 minutes or so to smooth out the load.
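
As a purely illustrative sketch of spreading that work out (this is not OpenTSDB's actual CompactionQueue code; the names and the 30-minute window are assumptions), each row could be given a random delay instead of being re-compacted right at the top of the hour:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.ThreadLocalRandom;
    import java.util.concurrent.TimeUnit;

    // Hypothetical scheduler: spread row re-compactions over a window
    // rather than flushing the whole queue at the hour boundary.
    final class SpreadOutCompactions {
      private static final long WINDOW_SECONDS = 30 * 60;
      private final ScheduledExecutorService scheduler =
          Executors.newSingleThreadScheduledExecutor();

      void scheduleRowCompaction(final Runnable compactRow) {
        final long delaySeconds = ThreadLocalRandom.current().nextLong(WINDOW_SECONDS);
        scheduler.schedule(compactRow, delaySeconds, TimeUnit.SECONDS);
      }
    }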

At Yahoo, we are experimenting with using Appends instead of Puts + compaction. Once we have proven the change end to end, we'll submit a pull request.

John A. Tamplin

May 13, 2014, 8:27:53 PM
to Arun Gupta, OpenTSDB
On Tue, May 13, 2014 at 7:17 PM, Arun Gupta <arun...@gmail.com> wrote:
Given the timing and error messages, this is likely related to OpenTSDB compactions. You could change the code to compact over 30 minutes or so to smooth out the load.

Yes, we see OpenTSDB compactions impacting read performance at the beginning of the hour (99th percentiles go up from ms to a few seconds while 75th percentiles stay low), but no impact on write performance.
 
At Yahoo, we are experimenting with using Appends instead of Puts + compaction. Once we have proven the change end to end, we'll submit a pull request.

Using append isn't going to deal with duplicate and out-of-order entries though, so you are still going to need some pass if you don't want to rely on reads cleaning them up (which is the case now - you could just stop running the compaction thread if you are satisfied with all compaction taking place on reads).
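
The cleanup pass being described is essentially a sort-and-deduplicate over the cells of a row. A rough sketch under assumed types (not OpenTSDB code):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.TreeMap;

    // Sort a row's cells by their time offset and drop duplicates,
    // keeping the last value seen for each offset.
    final class CleanupPass {
      static final class Cell {
        final int offset;     // offset within the row, decoded from the qualifier
        final byte[] value;
        Cell(final int offset, final byte[] value) {
          this.offset = offset;
          this.value = value;
        }
      }

      static List<Cell> sortAndDedupe(final List<Cell> cells) {
        final TreeMap<Integer, Cell> byOffset = new TreeMap<Integer, Cell>();
        for (final Cell cell : cells) {
          byOffset.put(cell.offset, cell);  // a later duplicate overwrites an earlier one
        }
        return new ArrayList<Cell>(byOffset.values());
      }
    }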

This will also conflict with my PR#317 which speeds up compaction by assuming existing columns are already sorted.  If this is the approach we plan to take for the future, perhaps we need a more in-depth design discussion.

If compactions are the problem, maybe we could make it smarter about scheduling them, such as throttling them during peak times and/or running them solely in off-hours.

--
John A. Tamplin

Hong Yang

May 14, 2014, 3:49:57 PM
to John A. Tamplin, Arun Gupta, OpenTSDB
Arun and John, thanks for your feedback.
There is something we can do in our code to alleviate the impact of the top-of-the-hour compaction on write performance.
But I do need some advice on how to avoid the hot region, especially during the compaction. Here is our scenario:
1. The total number of metrics is less than 1k
2. There are less than 10 tags per metric
3. The same tag can have over 100K of different values
4. The data is collected at 5-300s interval 
5. We pretty much take the default behavior of TSDB; there are no explicit compaction calls.
6. Two TSDB instances are hooked up to the HBase cluster; one serves as the writer and the other as the reader.

Any tuning opportunities?

Thanks.

ManOLamancha

May 15, 2014, 2:20:08 PM
to open...@googlegroups.com, Arun Gupta
On Tuesday, May 13, 2014 8:27:53 PM UTC-4, John A. Tamplin wrote:
On Tue, May 13, 2014 at 7:17 PM, Arun Gupta <arun...@gmail.com> wrote:
Given the timing and error messages, this is likely related to OpenTSDB compactions. You could change the code to compact over 30 minutes or so to smooth out the load.

Yes, we see OpenTSDB compactions impacting read performance at the beginning of the hour (99th percentiles go up from ms to a few seconds while 75th percentiles stay low), but no impact on write performance.
 
At Yahoo, we are experimenting with using Appends instead of Puts + compaction. Once we have proven the change end to end, we'll submit a pull request.

Using append isn't going to deal with duplicate and out-of-order entries though, so you are still going to need some pass if you don't want to rely on reads cleaning them up (which is the case now - you could just stop running the compaction thread if you are satisfied with all compaction taking place on reads).

This will also conflict with my PR#317 which speeds up compaction by assuming existing columns are already sorted.  If this is the approach we plan to take for the future, perhaps we need a more in-depth design discussion.

Yahoo's work actually appends to a different column, which is handled in the same way as Annotations, so that during reads the appended cells are injected into a RowSeq independently of the compaction path. Your compaction fix can coexist with the appends.
 

If compactions are the problem, maybe we could make it smarter about scheduling them, such as throttling them during peak times and/or running them solely in off-hours.

I definitely think this would help too. In fact I need to write a routine to go through and compact my old data as I had compactions disabled for a loooong time. 

ManOLamancha

May 15, 2014, 2:23:53 PM
to open...@googlegroups.com, John A. Tamplin, Arun Gupta
On Wednesday, May 14, 2014 3:49:57 PM UTC-4, Hong Yang wrote:
Arun and John, thanks for your feedback.
There is something we can do in our code to alleviate the impact of the top-of-the-hour compaction on write performance.
But I do need some advice on how to avoid the hot region, especially during the compaction. Here is our scenario:
1. The total number of metrics is less than 1k
2. There are less than 10 tags per metric
3. The same tag can have over 100K of different values
4. The data is collected at 5-300s interval 
5. We pretty much take the default behavior of TSDB; there are no explicit compaction calls.
6. Two TSDB instances are hooked up to the HBase cluster; one serves as the writer and the other as the reader.

Any tuning opportunities?

How many regions do you have now and which ones are being hit the hardest? You would want to look at the UIDs assigned to your metrics and try to split the busy regions so that you have an even distribution of writes across servers. And you may want to add more region servers. 

John A. Tamplin

May 15, 2014, 3:22:40 PM
to ManOLamancha, OpenTSDB, Arun Gupta
On Thu, May 15, 2014 at 2:23 PM, ManOLamancha <clars...@gmail.com> wrote:
How many regions do you have now and which ones are being hit the hardest? You would want to look at the UIDs assigned to your metrics and try to split the busy regions so that you have an even distribution of writes across servers. And you may want to add more region servers. 

Speaking of that, the issue with allocating UIDs sequentially is that it is hard to split your regions well until you have seen most of your metrics, and if you keep creating new ones you will have to re-split regions anyway.

What about making the row key be:

concat(hash(metric.uid), timestamp, [hash(tagk), hash(tagv)]*)

If the hash function is uniform and 1-to-1 on the 3-byte values (i.e., basically XORing with a fixed value and shuffling bits around), you would get a near-uniform distribution of the row prefix, and you could pre-split regions by dividing up the 2^24 address space instead of having to know how many metrics you are going to have.
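
A sketch of the kind of 1-to-1 mix function described above: an XOR with a fixed constant plus a bit rotation, both invertible, restricted to the 24-bit (3-byte) UID space. The constants are arbitrary examples:

    // Bijective "hash" on 3-byte UIDs: uniform-looking prefixes, still reversible.
    final class UidMixer {
      private static final int MASK = 0xFFFFFF;       // 24-bit UID space
      private static final int XOR_CONST = 0x5A3C96;  // arbitrary fixed value

      static int mix(final int uid) {
        final int x = (uid ^ XOR_CONST) & MASK;
        return ((x << 11) | (x >>> 13)) & MASK;       // rotate left by 11 within 24 bits
      }

      static int unmix(final int mixed) {
        final int x = ((mixed >>> 11) | (mixed << 13)) & MASK;  // rotate right by 11
        return x ^ XOR_CONST;
      }
    }

Since the mapping is a bijection, regions can be pre-split by dividing the 2^24 space evenly and every metric still maps to exactly one prefix.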

While we are talking about row keys, having the tags after the timestamp is why we had to switch to basically not using tags -- on queries, it was having to read through too much data not matching the query (and speaking of that, is there any reason we actually have to have any tags at all?  right now we create a dummy one).

--
John A. Tamplin

ManOLamancha

May 15, 2014, 4:17:22 PM
to open...@googlegroups.com, ManOLamancha, Arun Gupta
On Thursday, May 15, 2014 3:22:40 PM UTC-4, John A. Tamplin wrote:
On Thu, May 15, 2014 at 2:23 PM, ManOLamancha <clars...@gmail.com> wrote:
How many regions do you have now and which ones are being hit the hardest? You would want to look at the UIDs assigned to your metrics and try to split the busy regions so that you have an even distribution of writes across servers. And you may want to add more region servers. 

Speaking of that, the issue with allocating UIDs sequentially is that it is hard to split your regions well until you have seen most of your metrics, and if you keep creating new ones you will have to re-split regions anyway.

What about making the row key be:

concat(hash(metric.uid), timestamp, [hash(tagk), hash(tagv)]*)

If the hash function is uniform and 1-to-1 on the 3-byte values (i.e., basically XORing with a fixed value and shuffling bits around), you would get a near-uniform distribution of the row prefix, and you could pre-split regions by dividing up the 2^24 address space instead of having to know how many metrics you are going to have.

Hashing on the metric could certainly work. We wouldn't really need hashing on the tags unless users had a tiny number of metrics. Also someone came up with a patch to randomly assign UIDs so that may be an option as well.
 
While we are talking about row keys, having the tags after the timestamp is why we had to switch to basically not using tags -- on queries, it was having to read through too much data not matching the query (and speaking of that, is there any reason we actually have to have any tags at all?  right now we create a dummy one).

The tags are still useful for aggregation queries and only get in the way if you have really high cardinality but primarily query for a specific time series. I don't think there are any huge blockers to dropping the "at least one tag pair" requirement, probably just tweaks to the row key parser and some asserts.

John A. Tamplin

May 15, 2014, 4:28:02 PM
to ManOLamancha, OpenTSDB, Arun Gupta
On Thu, May 15, 2014 at 4:17 PM, ManOLamancha <clars...@gmail.com> wrote:  
Hashing on the metric could certainly work. We wouldn't really need hashing on the tags unless users had a tiny number of metrics. Also someone came up with a patch to randomly assign UIDs so that may be an option as well.

Randomizing the assignment of IDs works, though it seems harder to do concurrently without conflicts.

The big issue with changing the row key is how you switch to it without a stop-the-world upgrade, rewriting keys.
 
The tags are still useful for aggregation queries and only get in the way if you have really high cardinality but primarily query for a specific time series. I don't think there are any huge blockers to dropping the "at least one tag pair" requirement, probably just tweaks to the row key parser and some asserts.

In our case, the queries are virtually all for fully-specified metrics, and having the tags after the timestamp meant the HBase scans were taking ~5x as long as after we stopped using tags and just have different metrics for the different values (which also matches how existing metrics were being used in Graphite).

--
John A. Tamplin

Hong Yang

May 16, 2014, 10:01:07 AM
to John A. Tamplin, ManOLamancha, OpenTSDB
The discussion is really helpful. I think the issue in my case is that there are a handful of metrics with really high cardinality (150k). Since region splits are based on the row key and the metric UID is the leading bytes, it is almost impossible to avoid hot regions, especially during the TSDB row compaction. Each of these metrics is split into multiple regions across 3 different servers, but the rows being compacted and the new rows still end up in the same region.
We might need to move some of the tags before the timestamp, or make them part of the metric name, to reduce the cardinality. Is there any general guideline on metric cardinality to ensure high scalability?
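
To make the tag-to-metric idea concrete, here is what that migration could look like for a hypothetical series using OpenTSDB's telnet-style put command (the metric, tag names, and values are made up):

    before:  put disk.bytes_read 1400000000 1234 host=web01 device=sda1
    after:   put disk.sda1.bytes_read 1400000000 1234 host=web01

Folding the 150k-value tag into the metric name trades one hot metric UID for many distinct UIDs, at the cost of making aggregation across that dimension harder.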

Thanks.

John A. Tamplin

May 16, 2014, 10:40:25 AM
to Hong Yang, ManOLamancha, OpenTSDB
On Fri, May 16, 2014 at 10:01 AM, Hong Yang <hy...@zenoss.com> wrote:
The discussion is really helpful. I think the issue in my case is that there are a handful of metrics with really high cardinality (150k). Since region splits are based on the row key and the metric UID is the leading bytes, it is almost impossible to avoid hot regions, especially during the TSDB row compaction. Each of these metrics is split into multiple regions across 3 different servers, but the rows being compacted and the new rows still end up in the same region.

This is just a half-baked idea and it isn't clear how we would incrementally transition to it, but it seems like for the case of tags that are always going to be present, we do want those in the key before the timestamp, and optional tags should continue to be after the timestamp.  Maybe as part of the metadata we can define required tags with each metric and use that in building the row key.
 
We might need to move some of the tags before the timestamp, or make them part of the metric name, to reduce the cardinality. Is there any general guideline on metric cardinality to ensure high scalability?

Well, in our case we went to a cardinality of 1 to get decent read performance, particularly for queries of 10-sec interval data across 2 weeks -- those queries were taking several minutes and now take 5s (still long, but workable; to get that I had to make our Graphite gateway do downsampling, otherwise it took an additional 10s just to serialize the response).
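
(For reference, OpenTSDB 2.0 can also downsample on the server side as part of the query itself; the metric and tag below are placeholders:

    /api/query?start=2w-ago&m=sum:30s-avg:sys.cpu.user{host=web01}

Whether that is cheaper than downsampling in a gateway depends on where the serialization cost actually lies.)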

--
John A. Tamplin

ManOLamancha

May 19, 2014, 7:34:12 PM
to open...@googlegroups.com, John A. Tamplin, ManOLamancha
On Friday, May 16, 2014 10:01:07 AM UTC-4, Hong Yang wrote:
The discussion is really helpful. I think the issue in my case is that there are a handful of metrics with really high cardinality (150k). Since region splits are based on the row key and the metric UID is the leading bytes, it is almost impossible to avoid hot regions, especially during the TSDB row compaction. Each of these metrics is split into multiple regions across 3 different servers, but the rows being compacted and the new rows still end up in the same region.
We might need to move some of the tags before the timestamp, or make them part of the metric name, to reduce the cardinality. Is there any general guideline on metric cardinality to ensure high scalability?

No real guidelines regarding cardinality. It just depends on the number of region servers you have running to spread the load around. 150K is pretty high for a tag so that would certainly be a good candidate for migration to the metric name. 

ManOLamancha

May 19, 2014, 7:40:23 PM
to open...@googlegroups.com, Hong Yang, ManOLamancha
On Friday, May 16, 2014 10:40:25 AM UTC-4, John A. Tamplin wrote:
On Fri, May 16, 2014 at 10:01 AM, Hong Yang <hy...@zenoss.com> wrote:
The discussion is really helpful. I think the issue in my case is that there are a handful of metrics with really high cardinality (150k). Since region splits are based on the row key and the metric UID is the leading bytes, it is almost impossible to avoid hot regions, especially during the TSDB row compaction. Each of these metrics is split into multiple regions across 3 different servers, but the rows being compacted and the new rows still end up in the same region.

This is just a half-baked idea and it isn't clear how we would incrementally transition to it, but it seems like for the case of tags that are always going to be present, we do want those in the key before the timestamp, and optional tags should continue to be after the timestamp.  Maybe as part of the metadata we can define required tags with each metric and use that in building the row key.

Moving one or more tag pairs before the timestamp would help the distribution but would slow down write throughput, as you'd now have to look up a rule before every write to see where a tag should go. It also makes parsing a row key a bit more difficult since it's all a byte array.
 
We might need to move some of the tags before the timestamp, or make them part of the metric name, to reduce the cardinality. Is there any general guideline on metric cardinality to ensure high scalability?

Well, in our case we went to a cardinality of 1 to get decent read performance, particularly for queries of 10-sec interval data across 2 weeks -- those queries were taking several minutes and now take 5s (still long, but workable; to get that I had to make our Graphite gateway do downsampling, otherwise it took an additional 10s just to serialize the response).

Was the 5 additional seconds without downsampling due to transmission or actual serialization?  

John A. Tamplin

May 20, 2014, 1:18:36 AM
to ManOLamancha, OpenTSDB, Hong Yang
On Mon, May 19, 2014 at 7:40 PM, ManOLamancha <clars...@gmail.com> wrote:
Moving one or more tag pairs before the timestamp would help the distribution but would slow down write throughput, as you'd now have to look up a rule before every write to see where a tag should go. It also makes parsing a row key a bit more difficult since it's all a byte array.

All the metric metadata should be cached anyway, right?  So a bit of extra in-memory processing is negligible compared to the rest of what is going on.
 
Was the 5 additional seconds without downsampling due to transmission or actual serialization?  

I didn't measure it separately, just timed curl with and without downsampling to 30s (it was 5s vs 15s though, not just a 5s delta).

--
John A. Tamplin

John A. Tamplin

Jun 3, 2014, 12:34:41 PM
to Arun Gupta, OpenTSDB
On Tue, May 13, 2014 at 7:17 PM, Arun Gupta <arun...@gmail.com> wrote:
At Yahoo, we are experimenting with using Appends instead of Puts + compaction. Once we have proven the change end to end, we'll submit a pull request.

Can you share the code you are currently using even if you aren't ready to submit it for review yet?

--
John A. Tamplin

Arun Gupta

Jun 17, 2014, 4:26:32 AM
to John A. Tamplin, OpenTSDB
Sorry for the late response. We are aiming to get the pull request out in 2 weeks.

Arun Gupta

Jul 18, 2014, 10:52:18 PM
to open...@googlegroups.com, j...@jaet.org
Here is the pull request for Appends instead of Puts. It also includes random distribution of metric ids.

John A. Tamplin

Jul 19, 2014, 12:02:32 AM
to Arun Gupta, OpenTSDB

We gave up on the Append idea because performance was too low.  If you think about it, it makes sense: append requires a rewrite for every sample, while compaction just rewrites the row once.  We have currently disabled compaction and put up with the extra space, and are investigating ways to make that less painful.
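
(For anyone wanting to try the same thing, the relevant knob in the OpenTSDB 2.0 config should be the following, sketched here as a one-line opentsdb.conf entry:

    # opentsdb.conf: stop the TSD from re-compacting rows itself
    tsd.storage.enable_compaction = false

The TSD then leaves individual data points in place and simply merges them at query time.)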
