With libraries like fastutil and WebGraph, JGraphT can truly stand on the shoulders of giants :)
I took a look at the dependencies for WebGraph, and in general it looks like those should be fine for the jgrapht-opt module. (As you noted, we try to keep jgrapht-core as light as possible.) However, I did notice compile-time dependencies on jung-api and jung-io, which are needed for the JungAdapter included by WebGraph. It probably wouldn't be great for jgrapht-opt to drag in Jung spuriously.
a) since you already have a Jung adapter, put the JGraphT->WebGraph adapter next to it in WebGraph (instead of inside of JGraphT)
b) move the Jung adapter out of WebGraph into a separate module, and then have the JGraphT->WebGraph adapter in jgrapht-opt depend only on a Jung-free release of WebGraph
There may be other good ways to solve this.
Regarding the license, note that we dual-license under both LGPL and EPL for maximum compatibility. We don't require any copyright assignment; all contributors retain their own copyright on the portions they contributed (same model as Linux).
And on the topic of the BigGraph proposals, if we add something like the iterators you propose, we may want to think about parallel algorithm support at the same time (like the spliterators in Java).
> On 23 Oct 2020, at 11:44, Dimitrios Michail <
dimitrio...@gmail.com> wrote:
>
> thanks a lot for your kind words. Your libraries and research efforts are also very impressive! I remember listening to a talk by Ian Munro about succinct data structures and wondering if there are any actual implementations of similar results.
Well, there's Sux (both C++ and Java), and Simon Gog's library of succinct data structures. In fact, Facebook is running (graph and index) on the Elias-Fano quasi-succinct representation of monotone sequences (
https://github.com/facebook/folly/blob/master/folly/experimental/EliasFanoCoding.h). More precisely, on the improved version with partitioning who won the best paper at one of the last SIGIR.
> Regarding the adapter classes for WebGraph, I would personally be very happy to include them in the library. Since this is not a solo decision, I am also cc-ing my colleague John Sichi whose opinion I value a lot.
Of course!
> Some questions for both of you:
> - Would this fit into the jgrapht-opt package?
Yes, certainly. You do not want all the WebGraph dependencies in jgraph-core.
> - For the "big" version is the degree only the limit or do you also have problems with the Sets we are returning? I was thinking of adding in the future one more
> `BigGraph` interface where degrees would be longs and all sets would be returned as streams or iterators, thus avoiding such issues.
You read my mind. I would like to propose (see below) a backward-compatible, non-intrusive extension of the JGraphT Graph interface that would address that issue. It is not only a problem with "big" graphs--even a small Wikipedia graph (see the example below) will give you problems if you try to call edgeSet() or run recursive algorithms.
> - We recently released the python bindings for JGraphT which compiles natively the JGraphT library (as a shared library using GraalVM). Any chance
> this would also work with the webgraph? The only restriction we have is that the graph implements Graph<Integer, Integer>.
Well, my current implementation uses <Integer, EndpointPair<Integer>>. I do not understand how an <Integer,Integer> parameterization would work--assigning a distinct integer to each edge? Like, iterators would return successors? But then how containsEdge(Integer e) would work?
> Maybe it makes sense to see some of the code first, in order to give you a bit more involved feedback.
I'm including below the standard adapter (the other one is equivalent).
> On a different note, if you feel that we can jointly work on some research oriented problems regarding graph algorithms or graph representations, do not
> hesitate to ask. I am actively looking to expand my research collaborations.
That would be great.
Ciao,
seba
----------------------------------
I would also like to propose a backward-compatible extension of the Graph interface that would make JGraphT applicable to much larger graphs.
The problem we face presently when trying to use the adapter is that JGraphT is built about the idea of materializing all edge sets (entire graph, incoming, outgoing, etc.). While this is not a big issue with dense representations, it becomes an issue with compressed or implicit representations as those materialized edge sets might be orders of magnitude larger than the graph representation. This problem should affect also your sparse representation.
As an example, I tried to compute the strongly connected components of a small Wikipedia graph (≈5M nodes, ≈140M arcs). I needed to eliminate the part of the code creating the components, as it was materializing the whole arc set, and then I needed 8G of RAM because during the recursion a large number of sets of outgoing arcs have to represented in main memory at the same time.
What I suggest is adding for each set method a corresponding Iterable method. Like, edgeSetIterable(), outgoingEdgesOfIterable() (or any other name). At that point compressed or implicit representations can materialize the edges as they're used: using this approach I was able to run strongly connected components using about a GB of RAM, that is, almost ten times less memory: the result was also computed 20% faster (better cache usage and less garbage collection), and in the same time of our WebGraph implementation.
These methods can have default implementation returning the set, which is of course an Iterable, so the change would be entirely backward-compatible. But compressed or implicit representation could provide specialized lazy implementations.
Then, algorithmic classes enumerating incoming or outgoing edges to do something would just have to replace, say, in for loops outgoingArcsOf() with outgoingArcsOfIterable() to enormously reduce their memory footprint at zero cost, and probably performing faster due to the lessened burden on the garbage collector. My guess is that the vast majority of your algorithmic classes, at their core, are based on such enumerations, rather than on accessing randomly sets of arcs.