I have a pretty big file (3 million lines) in which each line is a person-to-event relationship. Ultimately, I want to project this bipartite network onto a single-mode, weighted network and write it to a CSV file. I'm using NetworkX, and I've tested my code on a much smaller sample dataset, where it works as it should. However, when I scale up to my actual dataset, my computer maxes out on memory and spins and spins without making any progress.
I'm using an AWS EC2 machine with 32GB of memory.
After some sample testing, I'm pretty sure things are getting hung up in the final step, after the graph has been projected, when it is being written to a CSV file. I've tried breaking up the file into chunks, but then I run into problems with missing edges or with adding edge weights together correctly. I think a better solution would be to find a way to speed up writing the projected graph to CSV.
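To make the chunking problem concrete, here is a minimal sketch (not my actual code) of how per-chunk weights could be merged. It assumes the weight of a projected edge is the number of events two people share, which is what `weighted_projected_graph` computes by default, and it only gives correct totals if all rows for a given event land in the same chunk. If an event is split across chunks, pairs that span the split are never counted, which would explain missing edges. The helper `project_chunk` and the toy rows are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def project_chunk(rows):
    """Count shared events per pair of names for one chunk of (name, event) rows."""
    names_by_event = {}
    for name, event in rows:
        names_by_event.setdefault(event, set()).add(name)
    weights = Counter()
    for names in names_by_event.values():
        # every pair of people attending the same event shares one more event
        for a, b in combinations(sorted(names), 2):
            weights[(a, b)] += 1
    return weights

# Hypothetical toy data, ordered by event so that no event is split across chunks.
rows = [("ann", "e1"), ("bob", "e1"),
        ("ann", "e2"), ("bob", "e2"), ("cat", "e2")]
chunks = [rows[:2], rows[2:]]

total = Counter()
for chunk in chunks:
    # Counter.update adds counts instead of overwriting them
    total.update(project_chunk(chunk))
```

Sorting each pair before counting keeps `(ann, bob)` and `(bob, ann)` from ending up as two separate edges when the chunk results are added together.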
import datetime
import networkx as nx
from networkx.algorithms import bipartite

# startTime, Event, Name, edgelist, and name_outfile are defined earlier (not shown)
# add nodes and edges to a graph
B = nx.Graph()
B.add_nodes_from(Event, bipartite=0)
B.add_nodes_from(Name, bipartite=1)
B.add_edges_from(edgelist)
print('Bipartite graph created at: ' + str(datetime.datetime.now() - startTime))
# create bipartite projection graph
# (node sets come from the 'bipartite' attribute; the earlier bipartite.sets(B)
# call was redundant because its result was overwritten immediately)
event_nodes = set(n for n, d in B.nodes(data=True) if d['bipartite'] == 0)
name_nodes = set(B) - event_nodes
name_graph = bipartite.weighted_projected_graph(B, name_nodes)
print('Single-mode projected graph created at: ' + str(datetime.datetime.now() - startTime))
# write graph to CSV; data=['weight'] writes the bare weight instead of the attribute dict
nx.write_weighted_edgelist(name_graph, name_outfile, delimiter=',')
print('Single-mode weighted edgelist to CSV: ' + str(datetime.datetime.now() - startTime))
endTime = datetime.datetime.now()
print('Run time: ' + str(endTime - startTime))
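As an alternative to `nx.write_weighted_edgelist`, the edges could be streamed straight into the standard `csv` module. This is only a sketch: I haven't measured whether it is any faster on the full dataset, and it assumes a NetworkX version where `edges(data='weight', default=...)` is supported. The function name `write_weighted_edges` and the toy graph are made up for illustration.

```python
import csv
import io
import networkx as nx

def write_weighted_edges(graph, handle):
    """Write one 'u,v,weight' row per edge to an open text handle."""
    writer = csv.writer(handle)
    # edges(data='weight') yields (u, v, weight) triples lazily;
    # default=1 is used for any edge that has no weight attribute
    writer.writerows(graph.edges(data='weight', default=1))

# Hypothetical toy graph standing in for name_graph.
G = nx.Graph()
G.add_edge('ann', 'bob', weight=2)
G.add_edge('bob', 'cat', weight=1)

buf = io.StringIO()  # for a real run, pass an open file handle instead
write_weighted_edges(G, buf)
lines = buf.getvalue().splitlines()
```

Because the rows are produced one at a time, this does not build a second copy of the edge list in memory, but the projected graph itself still has to fit in RAM.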
Using Pandas to Write the Projected Edgelist, but Missing Edge Weight?
I've thought about using `pandas` to write `name_graph` to CSV, but the edge list I get that way doesn't include the weight. Any idea how to include the weight in the dataframe? Would this be a good option for speeding up the CSV-writing part of the process?
import pandas as pd
df = pd.DataFrame(nx.edges(name_graph))
df.to_csv('foldedNetwork.csv')
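One thing that might work for the weight column, as a sketch on a toy graph: `name_graph.edges(data='weight')` yields `(u, v, weight)` triples, which can be passed to the DataFrame constructor. The column names `source`, `target`, and `weight` are my own choice, and the toy graph stands in for the real `name_graph`. Note that `list(...)` materializes every edge in memory before pandas copies it, so this may make the memory problem worse rather than better on a large projection.

```python
import io
import networkx as nx
import pandas as pd

# Hypothetical toy graph standing in for the real projected graph.
name_graph = nx.Graph()
name_graph.add_edge('ann', 'bob', weight=2)
name_graph.add_edge('bob', 'cat', weight=1)

# Each edge becomes one row: two endpoints plus the 'weight' attribute.
df = pd.DataFrame(list(name_graph.edges(data='weight')),
                  columns=['source', 'target', 'weight'])

buf = io.StringIO()  # for a real run, pass a file path such as 'foldedNetwork.csv'
df.to_csv(buf, index=False)
```

Newer NetworkX releases also provide `nx.to_pandas_edgelist`, which builds a similar dataframe including edge attributes.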