using netflix graph for a multi-graph

155 views
Skip to first unread message

Frank San Miguel

unread,
Apr 17, 2014, 3:29:42 PM4/17/14
to netfli...@googlegroups.com
Hi,

I really like this library but my use case requires one of the following:

A) each edge has a list of 0 to N attributes  
  OR
B) there are multiple edges between the same two nodes and each edge has a single attribute

Has anyone consider how this might be accomplished?  I'm willing to contribute back the results if it is generally useful.

Thanks,
Frank

Drew Koszewnik

unread,
Apr 17, 2014, 4:21:55 PM4/17/14
to netfli...@googlegroups.com
Hi Frank,

I'm glad to hear you like the library!

I think this can be done in a way that is generally useful.  Tell me more about your use case.  You indicate that you want to tag each connection with 0-n properties.  Are these properties likely to contain many duplicates?  Can we assume that these properties are representable with Strings?  Can you provide some specifics about the volume of data we might be talking about?

For the tagged properties, let's assume there will be many duplicates.  In this case, I think you'll likely want to assign each unique tag value to an "ordinal value", which can be done using an OrdinalMap<String>.  Now, for each property, we need to represent a mapping ((from node, to node) -> tag set).  Because we've assigned each unique tag an ordinal value, this "tag set" is representable as (for example) a gap-encoded set of variable-length integers, or a bit set if there is a small number of total possible tags.  

If you want to keep this off to the side (for example, this might be useful if the number of connections which are tagged is low relative to the total number of connections), then this can be done fairly easily without modifying NFGraph itself (by including the tag data along with the NFGraph in the serialized artifact you transmit to consumers).

You may instead want to write this encoded set of tags in-line in the graph where the properties are being encoded.  This will save you from having to represent the ((from node, to node) -> tag set) mapping, because you would be able to read the values as you iterate with a new OrdinalIterator implementation.  If you want to go this route, it will be important to ensure that this modification has no impact on the existing use cases.  I think this can be done by requiring properties with this feature to be specified in the NFGraphSpec with a new flag (e.g. TAGGABLE v. UNTAGGED) in the same way COMPACT v. HASH, or SINGLE v. MULTIPLE is done.  The default should be UNTAGGED to retain backwards compatibility, and the encoding should remain unchanged for graphs which do not use the feature.

I look forward to hearing your thoughts.

Thanks,
Drew.

Frank San Miguel

unread,
Apr 19, 2014, 10:40:06 AM4/19/14
to netfli...@googlegroups.com
Drew,

The picture below shows what I mean by multigraph OR edges with attributes:


  • The left side shows a multigraph representation.  The right is the same thing using edge attributes.  Attribute '0' is the default and can be implicitly assumed / omittied.
  • It is a digraph with a low diameter (10 to 20) and is scale-free.  There are cuts that can result in a directed acyclic graph.
  • The graph is fairly small (less than 10 million nodes) but I want it to take as little RAM as possible.  
  • The average degree is 2 to 5. 
  • Labels (or edges) will be unique for a given set of nodes (x,v) - no duplicates.  
  • A given label may be assigned to more than one edge (e.g. label 'a').

The most important use case is a complete traversal (probably breadth-first).  Although there are plenty of cuts that allow traversal of subgraphs as well.  I think the option on the left is most efficient for that type of algorithm so that is my favorite at the moment.  

I like your idea "...encoded set of tags in-line in the graph...".  To accomplish this, I think a new API is required.  In the current API the you use an iterator to find child nodes for a given node.  Traversals are a simple matter of:
  • get node
  • iterate over children to get child node

The new API would have an iterator that returns edges.  An edge would have three properties: From, To and Attributes.  Traversals would now involve one extra step: 
  • get node
  • iterate over edges
  • get "To" node
What do you think?

Frank

Frank San Miguel

unread,
Apr 20, 2014, 10:38:42 AM4/20/14
to netfli...@googlegroups.com
I've thought about this a bit more and I think I can create edges with attributes quite easily without modifying NFGraph.  Perhaps this was what you meant by "...keep this off to the side...".  Do you see any potential problems with this approach?

public static final NFGraphSpec testSchema = new NFGraphSpec(
 
new NFNodeSpec("Node", new NFPropertySpec("edge", "Edge", MULTIPLE | COMPACT)),
 
new NFNodeSpec("Edge")
);


public class Edge implements Serializable {
 
private static final long serialVersionUID = 1L;
 
private int[] attributes;
 
private int childOrdinal;
 
 
...
}



Now a traversal goes like this:

public static void traverseDepthFirst(NFCompressedGraph graph, int currOrdinal, OrdinalMap<Edge> edgeOrdinals) {
   
OrdinalIterator iter = graph.getConnectionIterator("Node", currOrdinal, "edge");
   
int edgeOrdinal = iter.nextOrdinal();
   
while(edgeOrdinal != OrdinalIterator.NO_MORE_ORDINALS) {
       
Edge edge = edgeOrdinals.get(edgeOrdinal);
       
int childOrdinal = edge.getChildOrdinal();
       
//Do something with the child
        traverseDepthFirst
( graph, childOrdinal, edgeOrdinals);
        edgeOrdinal
= iter.nextOrdinal();
   
}
}



Reply all
Reply to author
Forward
0 new messages