corruption? problems with properties and indexes


David Kimdon

Jun 21, 2016, 7:53:45 PM
to Aurelius
Hi,

We are experiencing a number of data problems with our graph.  We have looked through our application code and not found a way that we could be writing such data.  We suspect a problem somewhere in titan or cassandra or our cassandra/titan config/operations.  If anyone has ideas that might help us they would be much appreciated.

Thanks in advance,

David

* 1. missing user defined labels - Whenever our application creates a vertex the vertex is given a label.  We never use the label 'vertex', however, sometimes we find vertexes that have a label of 'vertex'.  I don't know where that string is coming from.

gremlin> g.V(958259245056).label()
==>vertex
gremlin> 

* 2. unindexed vertices - We initialized the graph indices before adding any vertices.  We also tried rebuilding the index in question.  Still, we have vertices that cannot be found via indexed lookups.  We can only find these via vertex id.

# When I do a lookup by properties and label I don't find it:
gremlin> g.V().hasLabel('accessPoint').has('bssid', '50602816b3a1').has('ssid', '415641432057494649')
gremlin> 

# But if I do the lookup by vertex id I do find it:
gremlin> g.V(76362871024).label()
==>accessPoint
gremlin> g.V(76362871024).values('bssid', 'ssid')
==>50602816b3a1
==>415641432057494649
gremlin>

* 3. missing properties -

I don't have an example handy for this one.  But the idea is that we know our application code always writes certain properties to vertexes of a given label. However, we sometimes find these vertices are missing the expected properties.

* 4. indexed vertices but the indexed properties are missing from the vertex - This is an odd one.  I do a lookup using properties that are indexed but then I find a vertex that doesn't have the indexed properties.

gremlin> g.V().hasLabel('accessPoint').has('bssid', '8e705a7fef00').has('ssid', '44532d4775657374').valueMap()
==>[:]
gremlin>


Stephen Mallette

Jun 27, 2016, 8:12:34 AM
to Aurelius
It's hard to say what might be at issue as you didn't provide a way to attempt to reproduce what you are seeing. Indexing weirdness seems to occur a fair bit on this list, but the missing labels/properties is more strange (perhaps related). Are those vertices with the default "vertex" label recognizable? In other words, are there other properties present that let you know what the label should have been or is it essentially an empty vertex?


David Kimdon

Jul 6, 2016, 6:36:33 PM
to Aurelius
On Mon, Jun 27, 2016 at 5:12 AM, Stephen Mallette <spmal...@gmail.com> wrote:
It's hard to say what might be at issue as you didn't provide a way to attempt to reproduce what you are seeing. Indexing weirdness seems to occur a fair bit on this list, but the missing labels/properties is more strange (perhaps related). Are those vertices with the default "vertex" label recognizable? In other words, are there other properties present that let you know what the label should have been or is it essentially an empty vertex?

They are recognizable, based on the properties (we have some properties only on specific vertices, so it is easy to know what the label is supposed to be) and also in some cases based on the index used for the lookup (when the index used only contains specific vertices).  I haven't seen empty vertices (vertices without any properties) when the label is missing.

FWIW, what we have done to address this for now is to make our application log complaints when it sees these issues.  We will repair them, and hopefully see them gradually go away (still working on this next part).  The underlying theory is that some operational issues we had early on, which have since been resolved, created the bad data and it just needs to be cleaned up.
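
A minimal sketch of what such a read-time check might look like, in Python. This is not our actual application code: the function name `check_vertex`, the `REQUIRED_PROPERTIES` map, and the idea of passing the label and properties in as plain Python values are all assumptions made for illustration. It only covers two of the symptoms above (the default 'vertex' label and missing expected properties).

```python
import logging

logger = logging.getLogger("graph_integrity")

# Hypothetical schema: properties the application always writes for each label.
REQUIRED_PROPERTIES = {"accessPoint": {"bssid", "ssid"}}

# Label observed on vertices that lost their user-defined label.
DEFAULT_LABEL = "vertex"


def check_vertex(vertex_id, label, properties):
    """Log and return a list of problems found on a single vertex."""
    present = set(properties)
    problems = []
    if label == DEFAULT_LABEL:
        # Guess the intended label from the properties that are present.
        candidates = sorted(
            name
            for name, required in REQUIRED_PROPERTIES.items()
            if required <= present
        )
        problems.append(
            "has default label 'vertex'; properties suggest: %s" % candidates
        )
    else:
        missing = sorted(REQUIRED_PROPERTIES.get(label, set()) - present)
        if missing:
            problems.append("missing expected properties: %s" % missing)
    for problem in problems:
        logger.warning("vertex %s: %s", vertex_id, problem)
    return problems
```

Calling `check_vertex(958259245056, "vertex", {"bssid": "...", "ssid": "..."})` would log one complaint and suggest `accessPoint` as the likely intended label, which matches how we recognize these vertices by their properties.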



 

Stephen Mallette

Jul 6, 2016, 6:55:56 PM
to Aurelius
That sounds like a good approach. I've found that it's good practice, especially on a large graph with billions of edges, to make sure that you have some way of validating that your graph is growing properly. Maybe you just sample the graph, or you run global Gremlin queries with Spark or whatever. You just need something that tells you on a relatively frequent basis that the graph isn't going bonkers. It's not easy to clean up large graphs once they are hosed, so it's best to find those problems as early as possible.
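
As a rough illustration of the sampling idea, here is a small Python sketch. It assumes the vertices have already been fetched into memory as `(id, label, properties)` tuples; how they are fetched from the graph is left out. The names `classify`, `audit_sample`, and `REQUIRED_PROPERTIES` are hypothetical. It tallies how many sampled vertices look healthy and how many show each symptom from this thread, so the counts can be tracked over time.

```python
import random
from collections import Counter

# Hypothetical schema: properties expected on every vertex of a given label.
REQUIRED_PROPERTIES = {"accessPoint": {"bssid", "ssid"}}


def classify(label, properties):
    """Put one vertex into a coarse health category."""
    if label == "vertex":
        # The user-defined label is gone; only the default label remains.
        return "default_label"
    if REQUIRED_PROPERTIES.get(label, set()) - set(properties):
        return "missing_properties"
    return "ok"


def audit_sample(vertices, sample_size, seed=None):
    """Classify a random sample of (id, label, properties) tuples."""
    rng = random.Random(seed)
    sample = rng.sample(vertices, min(sample_size, len(vertices)))
    return Counter(classify(label, props) for _, label, props in sample)
```

Running something like this on a schedule and alerting when the share of non-`ok` vertices rises is one way to notice problems early; a full scan (for example with Spark) would give exact counts at a higher cost.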

