Hello,
Do the 10M edges include the multiple edges between the same nodes, or
is that the final number of "unique" edges?
What's the average number of duplicates per unique edge?
If you have only a few duplicates, you are adding the cost of a
FindEdge (which is expensive) to the NewEdge for a lot of edges. That could explain
the performance problem.
But if you have a lot of duplicates, I guess the performance
should not be that different: in that case a FindEdge/GetAttribute
is added, but the NewEdge is avoided.
To optimize your code a little, you could try these:

- Avoid doing a FindAttribute for each duplicated edge. You have the
"edge_attr_count_type" precalculated somewhere; use it in the GetAttribute too.

- You could also use the other GetAttribute method, where you pass a
Value as an argument instead of receiving a new one. That's a little better (probably not much) because you can reuse a
single Value object for all your edges. The same Value object can
be reused in the SetAttribute of a new edge too.
long found_edge = graph.FindEdge(edge_type, a_node, b_node);
if (found_edge == Objects.InvalidOID)
{
    found_edge = graph.NewEdge(edge_type, a_node, b_node);
    // Reuse the existing Value object ("val")
    graph.SetAttribute(found_edge, edge_attr_count_type, val.SetInteger(0));
}
else
{
    // Get the attribute into the existing "val" object using the precalculated attribute type
    graph.GetAttribute(found_edge, edge_attr_count_type, val);
    int get_count = val.GetInteger();
    graph.SetAttribute(found_edge, edge_attr_count_type, val.SetInteger(get_count + 1));
}
You could always do some optimizations independent of DEX. For
example, you could try to sort your edges first. Then you could just
count the equal consecutive ones and do a single NewEdge /
SetAttribute for each unique edge without having to check its existence. But I don't
know your data; that may not be possible or easy.
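As a rough sketch of that idea (independent of the DEX API; the "EdgeDedup" class and the long[]{tail, head} representation are just assumptions for illustration), you could sort the pairs and collapse consecutive equals into a count, then do one NewEdge/SetAttribute per result entry:

```java
import java.util.*;

public class EdgeDedup {
    // Sort (tail, head) pairs, then merge equal consecutive ones.
    // Returns one long[]{tail, head, count} per unique edge, so the
    // loader can do a single NewEdge + SetAttribute per entry.
    public static List<long[]> countDuplicates(List<long[]> edges) {
        List<long[]> sorted = new ArrayList<>(edges);
        sorted.sort(Comparator.<long[]>comparingLong(e -> e[0])
                              .thenComparingLong(e -> e[1]));
        List<long[]> result = new ArrayList<>();
        for (long[] e : sorted) {
            long[] last = result.isEmpty() ? null : result.get(result.size() - 1);
            if (last != null && last[0] == e[0] && last[1] == e[1]) {
                last[2]++; // consecutive duplicate: just bump the count
            } else {
                result.add(new long[] { e[0], e[1], 1 });
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<long[]> edges = Arrays.asList(
            new long[]{1, 2}, new long[]{3, 4}, new long[]{1, 2}, new long[]{1, 2});
        for (long[] e : countDuplicates(edges)) {
            System.out.println(e[0] + " -> " + e[1] + " x" + e[2]);
        }
    }
}
```

That trades the per-edge FindEdge lookups for one up-front sort, which is usually much cheaper than 10M index lookups, assuming your edges fit in memory (or can be sorted externally).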
Best regards.
On Friday, February 22, 2013 at 10:37:15 UTC+1, maxteneff wrote: