157 views

Skip to first unread message

Dec 5, 2013, 8:56:49 AM12/5/13

to Sage-devel mailing list

Hi,

I wonder if there is a way in Sage to efficiently compute the diameter of a graph. The current example I have has 100.000 vertices and 500.000 edges, and calling diameter on it returns"ValueError: The graph backend contains more than 65535 nodes"

Dec 5, 2013, 9:57:12 AM12/5/13

to sage-...@googlegroups.com

while playing a big with Nathann's implementation http://www.sagemath.org/doc/reference/graphs/sage/graphs/distances_all_pairs.html, I also saw:

sage: len(cov)

24024

sage: %time Graph(cov)

CPU times: user 3.00 s, sys: 0.04 s, total: 3.04 s

Wall time: 2.97 s

Graph on 6006 vertices

sage: G = Graph

sage: %time G.add_edges(cov); print G

Graph on 6006 vertices

CPU times: user 0.38 s, sys: 0.01 s, total: 0.39 s

Wall time: 0.37 s

And the time difference becomes much bigger for larger graphs (I also tried one with 10^6 edges, and adding them using add_edges took a few secs, while I interrupted Graph(cov) after about 10min... Is this something that should be fixed?

Thanks, Christian

sage: len(cov)

24024

sage: %time Graph(cov)

CPU times: user 3.00 s, sys: 0.04 s, total: 3.04 s

Wall time: 2.97 s

Graph on 6006 vertices

sage: G = Graph

sage: %time G.add_edges(cov); print G

Graph on 6006 vertices

CPU times: user 0.38 s, sys: 0.01 s, total: 0.39 s

Wall time: 0.37 s

And the time difference becomes much bigger for larger graphs (I also tried one with 10^6 edges, and adding them using add_edges took a few secs, while I interrupted Graph(cov) after about 10min... Is this something that should be fixed?

Thanks, Christian

Dec 5, 2013, 4:41:15 PM12/5/13

to sage-...@googlegroups.com

Yoooooooooooooooooooooo !!

Yeah, sorry about this 65536 nodes limitation, it's meaningless (at least for the diameter). This is because the code that computes the diameter also computes larger things, and these largers things have absolutely no meaning for graphs with more than n=65536 nodes (it would take forever, and eat a lot of memory)

So yeah. This stupid problem with the diameter will go away, I swear. A couple of patches, things like that. I thought 65536 would be okay for Sage but even I need to create larger graphs sometimes. Plus it's shameful to have errors like that.

For the data structure : we have two very important basic problems with the data structure : basically except Robert Miller (who isn't exactly doing Sage developement these days), nobody feels at ease there. I don't. From time to time I add some documentation to it when I understand how it works. But basically it's hard to maintain. I have been trying to solve the problem by other means, i.e. by creating different data structures, for instance when you don't want to modify the graphs. One has been reviewed (it was needed by the combinat guys who still have , and another one is #14589, waiting for a review).

Plus we are also slowed down by the fact that we support labels on vertices. And that is *COSTLY*.

More practically :

The diameter stuff can be easily solved, I had it in mind for a while anyway. For the slow graph data structure, I can definitely try to improve things but then I have very practical problems... REVIEWS. Because with GIT and everything, it is not like patches are getting reviewed real quick and I don't mind writing patches, but I it's not very motivating if I all this work just ends up in a patch that waits forever on the server. To give you an idea, here is the list of my patches which are waiting for a review right now. There's 25 of them :-P

Soooooo well. I really, but really DO NOT MIND at all working on making the data structure faster, but it would be nice if we could work on this together, like "I write, somebody reviews" or something. I more or less stopped adding graph patches waiting for those to be reviewed, I also stopped adding designs patches waiting for those to be reviewed....

Soooo, well. I'd be glad to add more stuff, but I can't add Sage code by myself, it needs reviewers too ^^;

And sorry for this 65536 thing. That's my fault :-P

Nathann

Dec 6, 2013, 7:32:25 AM12/6/13

to sage-...@googlegroups.com

Hi Nathann,

So yeah. This stupid problem with the diameter will go away, I swear. A couple of patches, things like that. I thought 65536 would be okay for Sage but even I need to create larger graphs sometimes. Plus it's shameful to have errors like that.

I do see that you are constantly working hard on patches, and also that the switch to git is holding things off a bit (I must confess that I haven't written or reviewed a patch in the past months, so I am the last to push things forward currently...).

Anyway, do you see a simple (even hackish) workaround so I can still get the diameter for some big graphs?

(Did you see my followup question? Any idea what goes on there?)

Thanks, Christian

Dec 6, 2013, 7:51:02 AM12/6/13

to Sage devel

Yoooooooooooo !!

> I do see that you are constantly working hard on patches, and also that the

> switch to git is holding things off a bit (I must confess that I haven't

> written or reviewed a patch in the past months, so I am the last to push

> things forward currently...).

>

> Anyway, do you see a simple (even hackish) workaround so I can still get the

> diameter for some big graphs?

Two ways out :

- Try networkx :

sage: import networkx

sage: g=graphs.PetersenGraph().networkx_graph()

sage: networkx.diameter(g)

2

- You tell me that you will review the patch, and you have it in a

couple of hours at most.

Of course the second way out is the only one that actually solves the

problem :-P

> (Did you see my followup question? Any idea what goes on there?)

Well, you gave an example of code with undefined variables. Soooooooo

unless you tell me what "cov" is the only answer I could give you in

my previous email is "beware of labels" :-P

Your "cov" list is 25000 elements long. If I try the same with normal

objects (integers), I get :

sage: cov = [(randint(0,5000),randint(0,5000)) for i in range(25000)]

sage: %timeit Graph(cov)

1 loops, best of 3: 847 ms per loop

Which is still quite slow, admittedly...

Nathann

> I do see that you are constantly working hard on patches, and also that the

> switch to git is holding things off a bit (I must confess that I haven't

> written or reviewed a patch in the past months, so I am the last to push

> things forward currently...).

>

> Anyway, do you see a simple (even hackish) workaround so I can still get the

> diameter for some big graphs?

- Try networkx :

sage: import networkx

sage: g=graphs.PetersenGraph().networkx_graph()

sage: networkx.diameter(g)

2

- You tell me that you will review the patch, and you have it in a

couple of hours at most.

Of course the second way out is the only one that actually solves the

problem :-P

> (Did you see my followup question? Any idea what goes on there?)

unless you tell me what "cov" is the only answer I could give you in

my previous email is "beware of labels" :-P

Your "cov" list is 25000 elements long. If I try the same with normal

objects (integers), I get :

sage: cov = [(randint(0,5000),randint(0,5000)) for i in range(25000)]

sage: %timeit Graph(cov)

1 loops, best of 3: 847 ms per loop

Which is still quite slow, admittedly...

Nathann

Dec 6, 2013, 8:02:39 AM12/6/13

to sage-...@googlegroups.com

Hi,

- Try networkx :

sage: import networkx

sage: g=graphs.PetersenGraph().networkx_graph()

sage: networkx.diameter(g)

2

- You tell me that you will review the patch, and you have it in a

couple of hours at most.

Of course the second way out is the only one that actually solves the

problem :-P

Thanks for offering -- I can try to review it. (In the old world, I'd say I can certainly review it,) But I first have to dig through the documentation of how to review a patch. But I will try doing so then...

> (Did you see my followup question? Any idea what goes on there?)

Well, you gave an example of code with undefined variables. Soooooooo

unless you tell me what "cov" is the only answer I could give you in

my previous email is "beware of labels" :-P

Your "cov" list is 25000 elements long. If I try the same with normal

objects (integers), I get :

sage: cov = [(randint(0,5000),randint(0,5000)) for i in range(25000)]

sage: %timeit Graph(cov)

1 loops, best of 3: 847 ms per loop

Which is still quite slow, admittedly...

sage: cov

[[((1, 3, 5), (0, 2, 4)), ((2, 3, 5), (0, 1, 4))],

[((1, 3, 5), (0, 2, 4)), ((1, 4, 5), (0, 2, 3))],

[((2, 3, 5), (0, 1, 4)), ((3, 4, 5), (0, 1, 2))],

[((1, 4, 5), (0, 2, 3)), ((2, 4, 5), (0, 3, 1))],

[((2, 4, 5), (0, 3, 1)), ((3, 4, 5), (0, 1, 2))]]

so my covers are unlabelled graphs with vertices indexed by tuples (of size <10) of tuples (of size < 6) of integers.

But my impression is that "G = Graph(cov)" should not take considerably longer than "G = Graph(); G.add_edges(cov)", independent on what cov is, as long as the outcome is indeed the same.

Thx, C.

But my impression is that "G = Graph(cov)" should not take considerably longer than "G = Graph(); G.add_edges(cov)", independent on what cov is, as long as the outcome is indeed the same.

Thx, C.

Dec 6, 2013, 8:07:20 AM12/6/13

to Sage devel

Yoooooooooo !!

> Thanks for offering -- I can try to review it. (In the old world, I'd say I

> can certainly review it,) But I first have to dig through the documentation

> of how to review a patch. But I will try doing so then...

HMmmmmmmmmm... It's not exactly a "yes", nor exactly a "no"... :-P

> sage: cov

> [[((1, 3, 5), (0, 2, 4)), ((2, 3, 5), (0, 1, 4))],

> [((1, 3, 5), (0, 2, 4)), ((1, 4, 5), (0, 2, 3))],

> [((2, 3, 5), (0, 1, 4)), ((3, 4, 5), (0, 1, 2))],

> [((1, 4, 5), (0, 2, 3)), ((2, 4, 5), (0, 3, 1))],

> [((2, 4, 5), (0, 3, 1)), ((3, 4, 5), (0, 1, 2))]]

>

> so my covers are unlabelled graphs with vertices indexed by tuples (of size

> <10) of tuples (of size < 6) of integers.

You mean. Real tuples, or abstract things with parents/elements categories and everything that take forever to hash and are printed as tuples ? Are those really integers ? :-P

> But my impression is that "G = Graph(cov)" should not take considerably

> longer than "G = Graph(); G.add_edges(cov)", independent on what cov is, as

> long as the outcome is indeed the same.

Oh ! That was your problem ?

So of course "G = Graph(cov)" should not be much longer than "G = Graph(); G.add_edges(cov)", but the timeit will be quite different indeed !

So of course "G = Graph(cov)" should not be much longer than "G = Graph(); G.add_edges(cov)", but the timeit will be quite different indeed !

timeit repeats many times the same command, and sees gives you an idea of the average time. But given what you fed it with, it spends most of its iterations adding edges that already belong to the graph when you run %timeit add_edges(...)

Nathann

Dec 6, 2013, 8:12:21 AM12/6/13

to sage-...@googlegroups.com

HMmmmmmmmmm... It's not exactly a "yes", nor exactly a "no"... :-P

This is the most certain yes you can hope for from someone who doesn't know how to achieve the task bouded to the yes.

timeit repeats many times the same command, and sees gives you an idea of the average time. But given what you fed it with, it spends most of its iterations adding edges that already belong to the graph when you run %timeit add_edges(...)

As in my example, I didn't do a timeit but only a timing of a single generation of the graph, so I think my question is still valid...

Christian

Dec 6, 2013, 8:17:54 AM12/6/13

to Sage devel

Yooooooooo !!

> This is the most certain yes you can hope for from someone who doesn't know

> how to achieve the task bouded to the yes.

> how to achieve the task bouded to the yes.

Ahahaha. Okay okay. Well, then tell me when you feel sufficiently at ease with git to review patches, and you'll have that patch two hours from then :-D

> As in my example, I didn't do a timeit but only a timing of a single

> generation of the graph, so I think my question is still valid...

> generation of the graph, so I think my question is still valid...

Oh sorry, my mistake then. Then it probably comes from the fact that the graph constructor normalizes its input when it gets it (the graph constructor is an awful thing), and creates the graph fom this afterwards. O_o

I will check.

Nathann

Dec 6, 2013, 8:32:31 AM12/6/13

to Sage devel

Yep, that's the problem indeed. The input is converted to a dictionary first, then the graph is created, and that's what eats all this time ^^;

Nathann

Dec 6, 2013, 8:47:08 AM12/6/13

to sage-...@googlegroups.com

Ahahaha. Okay okay. Well, then tell me when you feel sufficiently at ease with git to review patches, and you'll have that patch two hours from then :-D

Actually, reviewing git tickets is easier than in the old world.

You need to:

You need to:

- Install sage-git: (assuming linux/macosx and that git is installed)
> git clone g...@trac.sagemath.org:sage.git sage-git

> cd sage-git && ./sage -i ccache && make - Copy an ssh key to your trac account following the git-developer-guide
- Download the ticket (as per the development guide )
`> sage -dev checkout --ticket 12270`

- Compile (this isn't mentioned in the git guide, but I'm sure it's necessary):

> sage -br - Review and make any changes to the files
- Post your review on trac.
- If you didn't chage any files then you're finished. Otherwise:
- > sage -dev commit

adding a suitable commit message and then

> sage -dev push

to push your changes to trac -- it will ask you to confirm that you want to create a new remote branch: say yes. (IN a review you are unlikely to add files but if you do then use git add <file>) - Finally, return sage-git to its pristine state with

> git checkout master

Please feel free to correct me if I've left anything out or said anything wrong.

Dec 6, 2013, 11:31:04 AM12/6/13

to Sage devel

To answer a bit more precisely, the behaviour of Graph(edges) and g=Graph; g.add_edges(edges) is not exactly the same.

For instance, if you run the second there will never be any multiple edge in your graph, for that is the default behaviour of a Graph() object, built without argument. If, on the other hand, you create Graph([(1,2), (1,2)]), the graph will contain this edge twice and support multiple edges:

sage: Graph([(1,2),(1,2)])

Multi-graph on 2 vertices

sage: g=Graph();g.add_edges([(1,2),(1,2)]);g

Graph on 2 vertices

Same with loops :

sage: Graph([(0,0)])

Looped graph on 1 vertex

sage: g=Graph();g.add_edge(0,0);g

Graph on 0 vertices

To do that (and remember that I totally hate everything that has to do with multiple edges/loops/labels), you have to know the list of all edges, and go through it, and decide whether you allow loops/multiple edges. I personally wouldn't mind saying that all Sage graphs aer loopless and do not contain any multiple edges so that the two syntaxes would be equivalent. And Graph(list_of_edges, multiple_edges=True) would return what we expect. But then, that's explicit.

That would also be sufficient to fix the bug you reported, as there would be no need to analyse the whole list of edges. Create the graph, add the edges, and that's settled. The two commands would be equivalent.

There is another difference, about the homogeneity of the data (some edges have labels, others don't)

sage: Graph([(1,2),(1,3,"a label")])

---------------------------------------------------------------------------

ValueError Traceback (most recent call last)

<ipython-input-5-5065db547344> in <module>()

----> 1 Graph([(Integer(1),Integer(2)),(Integer(1),Integer(3),"a label")])

/home/ncohen/.Sage/local/lib/python2.7/site-packages/sage/graphs/graph.pyc in __init__(self, data, pos, loops, format, boundary, weighted, implementation, data_structure, vertex_labels, name, multiedges, convert_empty_dict_labels_to_None, sparse)

1181 raise ValueError("Two different labels given for the same edge in a graph without multiple edges.")

1182 else:

-> 1183 raise ValueError("Edges input must all follow the same format.")

1184

1185

ValueError: Edges input must all follow the same format.

sage: g = Graph(); g.add_edges([(1,2),(1,3,"a label")]);g

Graph on 3 vertices

Of course, there is no problem with explicitly saying that the first edge should have no label :

sage: Graph([(1,2,None),(1,3,"a label")])

Graph on 3 vertices

Sooo, well. That's the kind of differences you can expect between the two syntaxes.

My own opinion on this is that we would be better off if the two were equivalent, as you reported. This would also make the Graph's constructor shorter and easier to understand. This would also make the "type" of the graph (i.e. does it allow multiple edges and loops) more predictable, for then the user would have to specify it clearly and live with the consequences. As they are now, *unless you specify it*, you don't know what the type of the graph you will get is going to be. Maybe we should say that "default graphs" are "simple graphs" (no loops, no multiple edges) unless explicitly said otherwise.

My problem with this is that I am not an user of these kind of graphs, and I don't feel legitimate to make decisions on features I don't use. Though if those who use these graphs feel like this would be a nice change, I'd be glad to work on it :-P

Nathann

P.S. : I hate labels. I hate loops. I hate multiple edges. Once more, that's what costs all this running time :-P

Dec 10, 2013, 9:11:24 AM12/10/13

to sage-...@googlegroups.com

Yooooooooooo !!

Ahahaha. Okay okay. Well, then tell me when you feel sufficiently at ease with git to review patches, and you'll have that patch two hours from then :-D

Okay, even though I didn't hear anything from you since, the ticket is there and ready : http://trac.sagemath.org/ticket/15507

Nathann

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu