pcpStarted: false iterations: 10000 numThreads: 8 numCounters: 1000 incrementRate(/sec): 65,146,579.80 blockedCount: 27 blockedTime: 162
pcpStarted: true iterations: 10000 numThreads: 8 numCounters: 1000 incrementRate(/sec): 2,788,136.48 blockedCount: 246664 blockedTime: 124435
Fairly heavy hit there as we'd expect. So, I was thinking about the synchronisation, and I remember you saying that we could just create a lock object per metric, BUT, can't we already use the PcpValueInfo object we're using here to synch?
@SuppressWarnings("unchecked")
protected final void updateValue(PcpValueInfo info, Object value) {
@SuppressWarnings("rawtypes")
TypeHandler rawHandler = info.getTypeHandler();
synchronized (dataFileBuffer) {
dataFileBuffer.position(rawHandler.requiresLargeStorage() ? info.getLargeValue()
.getOffset() : info.getOffset());
rawHandler.putBytes(dataFileBuffer, value);
}
}
I was thinking we can't actually create a lock per metric because right now the synch is needed to protect the dataFileBuffer positioning for the update. I think instead we need to create a ByteBuffer.slice() per PcpValueInfo which allows independent positioning (and indeed, could just be positioned once at init time). hen we don't need any synchronization at all do we?
thoughts?
Paul
I was thinking we can't actually create a lock per metric because right now the synch is needed to protect the dataFileBuffer positioning for the update. I think instead we need to create a ByteBuffer.slice() per PcpValueInfo which allows independent positioning (and indeed, could just be positioned once at init time). hen we don't need any synchronization at all do we?
thoughts?
Host: app3.syd.acx Java: 1.6.0_26
Thread Contention Supported: true, Enabled by default: false
numThreads: 32, numCounters=1000, iterations=10000
pcpStarted: false perMetricLock: false incrementRate(/sec): 131,958,762.89 blockedCount: 3 blockedTime: 36
pcpStarted: true perMetricLock: false incrementRate(/sec): 1,880,279.72 blockedCount: 0 blockedTime: 0
pcpStarted: true perMetricLock: true incrementRate(/sec): 13,451,595.28 blockedCount: 2435900 blockedTime: 107871
pcpStarted: true perMetricLock: true incrementRate(/sec): 14,482,259.23 blockedCount: 2468246 blockedTime: 110775
pcpStarted: true perMetricLock: false incrementRate(/sec): 1,962,068.48 blockedCount: 0 blockedTime: 0
pcpStarted: false perMetricLock: false incrementRate(/sec): 86,253,369.27 blockedCount: 2 blockedTime: 5
I do not know why the blockedCount & time count out 0 for the global lock run (row 2 & 2nd to last), but it is consistent when I run it. On other hosts it does report, here's our current production hardware, quite a bit older:
app1.syd (current production app server - ALTHOUGH, using Update 26)
numThreads: 8, numCounters=1000, iterations=10000
pcpStarted: false perMetricLock: false incrementRate(/sec): 64,777,327.94 blockedCount: 14 blockedTime: 22
pcpStarted: true perMetricLock: false incrementRate(/sec): 1,593,466.79 blockedCount: 9731521 blockedTime: 185599
pcpStarted: true perMetricLock: true incrementRate(/sec): 6,963,788.30 blockedCount: 239336 blockedTime: 77
pcpStarted: true perMetricLock: true incrementRate(/sec): 7,303,934.99 blockedCount: 266851 blockedTime: 113
pcpStarted: true perMetricLock: false incrementRate(/sec): 1,512,516.07 blockedCount: 10133819 blockedTime: 195594
pcpStarted: false perMetricLock: false incrementRate(/sec): 85,378,868.73 blockedCount: 2 blockedTime: 2
So I'm presuming that the rate increase in counter increments is due to vastly reduced blockedTime which makes total sense.
So, I think this is definitely a better way than before, I just need to get the code to work properly and actually write to the buffers properly. Hopefully whatever change is needed to get this working doesn't materially affect the benchmark data, but I can run it again anyway.
Cowan, can you review the recent commits and pay closer attending to the BasePcpWriter changes? Something in the buffer slicing and positioning contract I'm missing from the docs.
tah,
Paul
Somehow the slicing/positioning isn't working as expected, because when I connect pmchart to pcp and plot one of the generated counters, it displays correctly with the global lock, but I get 0's per second when it switches the test to perMetricLock.
ByteBuffer metricByteBufferSlice = dataFileBuffer.slice();metricByteBufferSlice.position(bufferPosition);
The content of the new buffer will start at this buffer's current position.
Having said that, while the position the buffer is writing too is probably incorrect, I think the results are still valid and interesting. I tested across my MBP, our load test machine, our current production hardware, and the new one about to be commisioned, and all but my MBP show vastly increased performance (snickering from the non-Mac folk).
Somehow the slicing/positioning isn't working as expected, because when I connect pmchart to pcp and plot one of the generated counters, it displays correctly with the global lock, but I get 0's per second when it switches the test to perMetricLock.It's this bit:ByteBuffer metricByteBufferSlice = dataFileBuffer.slice();metricByteBufferSlice.position(bufferPosition);From the docs for 'slice':The content of the new buffer will start at this buffer's current position.
As an added bonus (in fact, for thread-safety we REALLY should do this), also call .limit(int) to limit the writing area of the slice to the correct size for the data type - then all the slices will be non-overlapping (or you'll get an error if you try to overrun the correct data area) which gives me a lot more confidence in this. :)
Awesome work, thanks. This is much better. Happy for the per-thread lock option to only be an option temporarily - this should be the default (hell, the only supported one) eventually.
Ooh, forgot the minor scolding: until this is production-ready, should probably be done in a separate branch (not 'default'). Tsk tsk. :)
Somehow the slicing/positioning isn't working as expected, because when I connect pmchart to pcp and plot one of the generated counters, it displays correctly with the global lock, but I get 0's per second when it switches the test to perMetricLock.It's this bit:ByteBuffer metricByteBufferSlice = dataFileBuffer.slice();metricByteBufferSlice.position(bufferPosition);From the docs for 'slice':The content of the new buffer will start at this buffer's current position.
So, now for my Tales of WOH. I spent far too long on this, I hate you author-of-ByteBuffer-I-damn-you-to-all-of-eternity. What is happening is that the .slice() call indeed returns a buffer backed by the parent, but of a DIFFERENT BYTE ORDER than the parent.