Performance benefit of caching indices for NamedData in DataGroups.


Ryan Beasley

Aug 18, 2014, 4:54:36 PM
to openSurgSim
OpenSurgSim list,
We recently ran a performance test (code below) comparing name-based access against cached-index access for NamedData, the workhorse of DataGroups.

On my Win64 system, with names 10 characters long, name-based access is about 3-4 times slower than index-based access, and the time per call to NamedData::get ranges from 11 nanoseconds to 2.4 microseconds.  Specifically:
In a Debug build, 2 million calls to NamedData::get (100,000 loops over 20 entries) is taking 4.7 seconds for access by name, vs 1.7 seconds for access by index.
In Release the same code takes 81 milliseconds vs 21 milliseconds, respectively.
Note that in these tests the entries are probably being served from the CPU cache, whereas in normal code they might have been evicted between accesses.  For this and other reasons, the measured times may be lower (i.e., faster) than performance "in the field".

Based on the above numbers for the Release target, 20 calls to NamedData::get would take 0.81 microseconds by name or 0.21 microseconds by index, a difference of 0.6 microsecond.  If 1000 Hz is the target update rate then the target period is 1 millisecond, so that difference is three orders of magnitude smaller than the target period.
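
For anyone who wants to double-check that arithmetic, here is a small standalone program (not part of OpenSurgSim) that reproduces the per-call and per-update figures from the Release totals above:

#include <iostream>

int main()
{
	const double calls = 2e6;              // 100,000 loops over 20 entries
	const double byNameSeconds = 0.081;    // 81 ms total, access by name (Release)
	const double byIndexSeconds = 0.021;   // 21 ms total, access by index (Release)

	const double byNamePerCall = byNameSeconds / calls;    // ~40.5 ns per call
	const double byIndexPerCall = byIndexSeconds / calls;  // ~10.5 ns per call

	// Cost of the 20 gets a typical update would perform:
	std::cout << "20 gets by name:  " << 20.0 * byNamePerCall * 1e6 << " us\n";   // ~0.81 us
	std::cout << "20 gets by index: " << 20.0 * byIndexPerCall * 1e6 << " us\n";  // ~0.21 us
	return 0;
}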

Based on these results, caching indices from NamedData is considered a low priority and is recommended only where performance analysis indicates it.  (A sketch of the cached-index pattern follows the test code below.)

Sincerely,
Ryan Beasley

Computer Scientist
SimQuest




#include <string>
#include <vector>

#include <gtest/gtest.h>

// Header paths below are assumed from the OpenSurgSim source layout.
#include "SurgSim/DataStructures/NamedData.h"
#include "SurgSim/DataStructures/NamedDataBuilder.h"
#include "SurgSim/Framework/Timer.h"

namespace
{
// Twenty 10-character names, one per NamedData entry.
const std::string array[] =
{
	"uzgfk1f41V",
	"PbpZficBR2",
	"M3OYjZgHXX",
	"WVIEjG3QaX",
	"2o6nN2nuaW",
	"3rWXCXR2gi",
	"XO2dATxWFq",
	"M2xpjE6PAL",
	"cT1Bmj3Z1U",
	"65UtrrfXn7",
	"tNltVIcurd",
	"jFVkKzn17i",
	"0QlA0FAPc3",
	"jCXGBopngK",
	"GTwY4YyVnC",
	"0uVPK4S9Le",
	"iNdq3p6ZQb",
	"LfREPeFczN",
	"9RKdlFnm1I",
	"N2acPVhGY2"
};

const std::vector<std::string> names(std::begin(array), std::end(array));

const int numberOfLoops = 100000;
}

namespace SurgSim
{
namespace DataStructures
{

class NamedDataTest : public ::testing::Test
{
public:
	virtual void SetUp()
	{
		// Build a NamedData with one int entry per name.
		NamedDataBuilder<int> builder;
		for (const std::string& name : names)
		{
			builder.addEntry(name);
		}

		data = builder.createData();

		// Set each entry to its own index, and remember the indices.
		const int numberOfEntries = static_cast<int>(names.size());
		for (int i = 0; i < numberOfEntries; ++i)
		{
			ASSERT_TRUE(data.set(i, i));
			indices.push_back(i);
		}
	}

	virtual void TearDown()
	{
	}

	NamedData<int> data;
	std::vector<int> indices;
};

TEST_F(NamedDataTest, GetByName)
{
	int value;
	const int numberOfEntries = static_cast<int>(names.size());

	// 2 million gets by name: 100,000 loops over 20 entries.
	SurgSim::Framework::Timer timer;
	for (int i = 0; i < numberOfLoops; ++i)
	{
		for (int j = 0; j < numberOfEntries; ++j)
		{
			ASSERT_TRUE(data.get(names[j], &value));
		}
	}
	timer.endFrame();
}

TEST_F(NamedDataTest, GetByIndex)
{
	int value;
	const int numberOfEntries = static_cast<int>(indices.size());

	// 2 million gets by index: 100,000 loops over 20 entries.
	SurgSim::Framework::Timer timer;
	for (int i = 0; i < numberOfLoops; ++i)
	{
		for (int j = 0; j < numberOfEntries; ++j)
		{
			ASSERT_TRUE(data.get(indices[j], &value));
		}
	}
	timer.endFrame();
}

} // namespace DataStructures
} // namespace SurgSim
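
For completeness, here is a minimal sketch of what the cached-index pattern could look like in consumer code, should profiling ever call for it.  The class and entry names are purely illustrative, and the getIndex() lookup is an assumption; if NamedData does not provide it, the indices can be recorded when the NamedDataBuilder adds the entries, as in the fixture above.

#include <string>

#include "SurgSim/DataStructures/NamedData.h"  // header path assumed

// Illustrative consumer, not part of OpenSurgSim: it resolves the index once
// and then uses index-based get() on the high-rate path.
class ForceReader
{
public:
	explicit ForceReader(const SurgSim::DataStructures::NamedData<double>& data) :
		m_forceIndex(data.getIndex("force"))  // cache the index once; getIndex() and its
											  // -1-when-absent behavior are assumptions
	{
	}

	// Called at ~1000 Hz; uses the cached index instead of the string name.
	bool readForce(const SurgSim::DataStructures::NamedData<double>& data, double* force) const
	{
		return (m_forceIndex >= 0) && data.get(m_forceIndex, force);
	}

private:
	int m_forceIndex;
};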

Paul Novotny

Aug 19, 2014, 11:28:54 AM
to opens...@simquest.com
Ryan, I think this is a good test that demonstrates when to use
index-based access to data from devices. Name (string) based access
produces simpler, easier-to-read code and, as you demonstrated, should be
the default. Caching and using indices does add a bit of code overhead,
and is useful when a _micro_second matters.

Just to summarize Ryan's findings: you save about 30 nanoseconds per
access if you use index-based over name-based access.

-Paul