Considering a project to do the following:
Given:
1. I have access to a single (fully isolated, I am the only user) 8-CPU Linux instance (128 GB RAM)
2. each of the 8 JVMs is booted with -Xms15g -Xmx15g -XX:+UseConcMarkSweepGC -XX:+UseTLAB -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycleMin=0 -XX:CMSIncrementalDutyCycle=10 -XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=128 -XX:CMSInitiatingOccupancyFraction=50
3. a long-running MapReduce task that is broken into 2 sub-tasks:
4.1 "prepare data" step = extract from an RDBMS, transform into a large CacheMap<K,V> abstraction, and distribute (load) onto my 8xJVM grid (all I/O-bound to construct the MapReduce operand)
4.2 "render result" step = use Infinispan's DistributedExecutorServiceMap API and framework to initiate the processing (outside of network loop-back hops, no other I/O-bound to affect the MapReduce operation)
The ambition is then to compare/contrast the performance of 1-3 above when processed under permutations of the following:
P0. use taskset(1) to boot all 8 JVM processes with complete affinity to CPU 0 (fully aware the 7 other CPUs are effectively unused)
P1. use taskset(1) to boot 7 JVM processes with affinity to CPU 0, 1 JVM process to CPU 1
P2. use taskset(1) to boot 6 JVM processes with affinity to CPU 0, 1 JVM process to CPU 1, 1 JVM process to CPU 2
...
P7. use taskset(1) to boot 1 JVM process with affinity to CPU 0, 1 JVM process to CPU 1, ..., 1 JVM process to CPU 7 (fully aware I may be over-distributing, with the CPUs' local physical cache(s) not being most effectively utilized)
Of course I will also want to compare/contrast the performance of 1-3 above without any taskset(1) affinity influence, leaving the default OS scheduler to handle all distribution and scheduling.
I may also consider running other variants of JVM:CPU_affinity configurations (maybe even all members of the full permutation set); a launcher sketch follows below.
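To make each permutation repeatable, something along these lines could drive the boots; the Pn-to-CPU mapping below encodes the P0..P7 list above, and GridNode stands in for whatever main class boots one grid JVM (both the class name and the harness itself are hypothetical):

    import java.util.ArrayList;
    import java.util.List;

    // Launches permutation Pn: (8 - n) JVMs pinned to CPU 0 and one JVM each
    // on CPUs 1..n, matching P0..P7 above.
    public class AffinityLauncher {
        private static final String[] JVM_OPTS = {
            "-Xms15g", "-Xmx15g", "-XX:+UseConcMarkSweepGC", "-XX:+UseTLAB",
            "-XX:+CMSIncrementalMode", "-XX:+CMSIncrementalPacing",
            "-XX:CMSIncrementalDutyCycleMin=0", "-XX:CMSIncrementalDutyCycle=10",
            "-XX:MaxTenuringThreshold=0", "-XX:SurvivorRatio=128",
            "-XX:CMSInitiatingOccupancyFraction=50"
        };

        public static void main(String[] args) throws Exception {
            int n = Integer.parseInt(args[0]); // which permutation Pn (0..7)
            for (int i = 0; i < 8; i++) {
                // JVMs 0..(7-n) land on CPU 0, the remaining n on CPUs 1..n.
                int cpu = Math.max(0, i - (7 - n));
                List<String> cmd = new ArrayList<String>();
                cmd.add("taskset");
                cmd.add("-c");
                cmd.add(String.valueOf(cpu));
                cmd.add("java");
                for (String opt : JVM_OPTS) cmd.add(opt);
                cmd.add("GridNode"); // placeholder for the grid node's main class
                new ProcessBuilder(cmd).inheritIO().start();
            }
            // For the no-affinity baseline, drop the taskset/-c/cpu prefix entirely.
        }
    }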
Independent of any actual performance metrics produced, what should be expected conceptually when considering the possible merits/deficiencies/trade-offs of any specific point on the JVM:CPU_affinity distribution curve? (I am especially interested in what trade-offs to consider regarding CPU locality and physical cache usage.)