I am running a three-node cluster with OpenTSDB, and we are pushing around 15K points per second through a single TSD running on the HBase master. I have heard that a single TSD can easily handle that rate, but we are running into some issues.
After about 50 to 70 million points have been written (an hour or so, fairly stable), I start seeing this in the HBase regionserver logs:
2017-06-20 21:04:44,319 WARN [B.defaultRpcServer.handler=4,queue=1,port=16020] regionserver.MultiVersionConcurrencyControl: STUCK: MultiVersionConcurrencyControl{readPoint=18008, writePoint=18038}
The master server's CPU then starts spiking and its memory usage grows quickly. HBase shows a spike of 25K points written per second. After a couple of minutes of this, the regionserver memstores overload and writes drop to zero, with this in the HBase regionserver log:
2017-06-20 22:03:37,655 WARN [B.defaultRpcServer.handler=22,queue=1,port=16020] regionserver.MemStoreFlusher: Memstore is above high water mark and block 65020ms
I pre-split the table into about 15 regions on each regionserver, and the writes are fairly evenly distributed.
Am I at the limits of my cluster, or is something configured incorrectly?
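As a rough sanity check on the figures above, here is a minimal sketch of the arithmetic, assuming the ingest rate holds steady at 15K points per second (the function name is only illustrative). It shows that 50 to 70 million points corresponds to roughly an hour of writes, which matches when the STUCK warnings begin:

```python
def seconds_to_write(points: int, rate_per_sec: int) -> float:
    """Time needed to write `points` at a constant `rate_per_sec`."""
    return points / rate_per_sec


RATE = 15_000  # assumed steady ingest rate, points per second

for total in (50_000_000, 70_000_000):
    minutes = seconds_to_write(total, RATE) / 60
    print(f"{total:,} points at {RATE:,}/s -> {minutes:.0f} minutes")
```

This prints about 56 and 78 minutes, so the failure point lines up with the total volume written rather than with any change in load on our side.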
Any help would be appreciated.