Of course L1 cache (content and bandwidth) is shared by the logical
cores - they are /logical/ cores, not physical cores. They share almost
everything - instruction decoders, pipelines, execution units, buffers,
etc. They have separate logical sets of ISA registers (the registers
visible to the programmer), but on devices like x86 chips (where the ISA
has few registers) there are many more physical hardware registers that
are mapped at different times - and the logical cores share them too.
Cores and caches are organised as a hierarchy in multi-core devices.
The highest bandwidths are at the closest steps - physical cores to
their L1 caches, cores to cores within a core cluster (if the chip has
this level), L1 caches to their L2 caches, L2 caches to the L3 cache
(usually shared amongst all cores on the chip) - and the lowest is the
off-chip bandwidth.
Usually the off-chip bandwidth is shared amongst all cores, but for
multi-module chips like AMD's new devices, each chip in the module has
its own buses off the module.
In other words - it is complicated, depends totally on the level of
cache you are talking to, and details are specific to the device
architecture.
And as has been pointed out, it has /nothing/ to do with C++ - it is
a general architecture issue, independent of language. Unless you are
targeting a specific chip (such as fine-tuning for a particular
supercomputer model), you use the same general rules for all languages,
and all chips: Aim for locality of reference in your critical data
structures. Keep the structures small. Avoid sharing and false sharing
between threads. Use an OS that is aware of the memory architecture of
your processor, and the geometry of its logical and physical cores.
(I am replying to you here, for your interest. I have long ago seen it
as pointless trying to talk to Fir.)