SegFault due to nx.to_numpy_matrix, using STRING


Lorenzo

Oct 23, 2018, 6:00:48 AM
to HotNet
Hi,
I am facing a segfault error when running the code. I traced the issue to W = nx.to_numpy_matrix(G, nodelist=nodes, dtype=np.float64) in hotnet2/network.py; are you aware of such a problem?
I get the error when using STRING, so it seems clear the issue is related to the network size:
(generated output:)
* Loading PPI...
- Edges: 443750
- Nodes: 24748
* Removing self-loops, multi-edges, and restricting to largest connected component...
- Largest CC Edges: 438944
- Largest CC Nodes: 24711

Thank you for your attention.

Lorenzo

matthe...@brown.edu

Oct 23, 2018, 9:13:53 AM
to HotNet
Hi Lorenzo,

Thank you for your message.  I assume that you are running the makeNetworkFiles.py script in HotNet2, but please correct me if you are running something else.

First, I would ensure that you are using the current version of the HotNet2 repository and the current versions of the NumPy and NetworkX packages.  Next, I would check that you can run the commands in paper/paper_commands.sh successfully.  Then, I would recommend trying to run the makeNetworkFiles.py script with one core, i.e., -c 1.  If the issue persists, could you share your arguments to the makeNetworkFiles.py script and the output from the script, including the error message?

Best,
Matt

Lorenzo

Oct 24, 2018, 4:02:21 AM
to HotNet
Hi Matt,

Thank you for your quick response.

All the software I am using is up to date; I installed conda and downloaded HotNet2 just a few days ago.
Yes, from makeNetworkFiles.py I traced the problem back to the call to nx.to_numpy_matrix.
About the commands, I can confirm that I successfully ran the original scripts and data, as well as new data. The issue comes up only with the STRING network, which, by the way, was processed by the same script I used for the other new networks.
As you suggested, I ran with num_cores = 1, but it made no difference.

Here are the script arguments:
hotnet2=/home/lorenzo/software/hotnet2-master
num_cores=1
num_network_permutations=50
num_heat_permutations=500
# Create network data.
python $hotnet2/makeNetworkFiles.py \
    -e  data/string/STRING_human_genesymbol_indices.txt \
    -i  data/string/STRING_genes_index.txt \
    -nn string \
    -p  string \
    -b  0.4 \
    -o  data/string/ \
    -np $num_network_permutations \
    -c  $num_cores

And here is the output:

Creating PPR matrix for real network
--------------------------------------
* Loading PPI...
- Edges: 443750
- Nodes: 24748
* Removing self-loops, multi-edges, and restricting to largest connected component...
- Largest CC Edges: 438944
- Largest CC Nodes: 24711
* Creating HotNet2 diffusion matrix for beta=0.4...
./paper_commands.sh: line 20: 31546 Segmentation fault      python $hotnet2/makeNetworkFiles.py -e data/string/STRING_human_genesymbol_indices.txt -i data/string/STRING_genes_index.txt -nn string -p string -b 0.4 -o data/string/ -np $num_network_permutations -c $num_cores
* Loading heat scores for 77 genes
Traceback (most recent call last):
  File "/home/lorenzo/software/hotnet2-master/HotNet2.py", line 142, in <module>
    run(get_parser().parse_args(sys.argv[1:]))
  File "/home/lorenzo/software/hotnet2-master/HotNet2.py", line 69, in run
    infmat, indexToGene, G, network_name = hnio.load_network(network_file, HN2_INFMAT_NAME)
  File "/home/lorenzo/software/hotnet2-master/hotnet2/hnio.py", line 378, in load_network
    H = load_hdf5(file_path)
  File "/home/lorenzo/software/hotnet2-master/hotnet2/hnio.py", line 398, in load_hdf5
    f = h5py.File(file_path, 'r')
  File "/opt/anaconda3/envs/py2/lib/python2.7/site-packages/h5py/_hl/files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/opt/anaconda3/envs/py2/lib/python2.7/site-packages/h5py/_hl/files.py", line 142, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
IOError: Unable to open file (unable to open file: name = 'data/string/string_ppr_0.4.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

So were you able to successfully run the code on a network as large as this one?

Thank you

Lorenzo

matthe...@brown.edu

Oct 24, 2018, 9:53:29 AM
to HotNet
Hi Lorenzo,

Yes, we have successfully used larger networks with HotNet2.  Could you try the following things?
  1. You indicated that you traced the error to W = nx.to_numpy_matrix( G , nodelist=nodes, dtype=np.float64 ) in the hotnet2/network.py script.  Could you replace dtype=np.float64 with dtype=np.float32?  This change will not affect the results.
  2. Your error message indicates that you have heat scores for 77 genes, but your network has 24,748 genes.  You may be able to reduce the size of your network, e.g., keep only genes within shortest-path distance 3 of your scored genes, without significantly changing the results.
Based on your error message, the first error raised by HotNet2 occurs after the above line that you identified as the source of the error, but I hope that the above steps help.
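To make both suggestions concrete, here is a minimal sketch; the helper function below is only an illustration, not part of HotNet2:

import networkx as nx

# Suggestion 1: build the dense matrix in single precision to halve its memory.
# In hotnet2/network.py, the line you identified would become:
#     W = nx.to_numpy_matrix(G, nodelist=nodes, dtype=np.float32)

# Suggestion 2: restrict the PPI network to genes near the scored genes before
# running HotNet2 (illustrative helper, not part of HotNet2).
def restrict_to_neighborhood(G, scored_genes, radius=3):
    # Keep only nodes within `radius` hops of any scored gene.
    keep = set()
    for gene in scored_genes:
        if gene in G:
            keep.update(nx.single_source_shortest_path_length(G, gene, cutoff=radius))
    return G.subgraph(keep).copy()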

Best,
Matt

Lorenzo

Oct 25, 2018, 7:04:47 AM
to HotNet
Hi Matt,

I investigated further; the problem is a matrix inverse computation that my 16 GB of memory cannot accommodate. Thank you for the support.
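For reference, the sizes from the output above make the memory pressure clear; this is just arithmetic, not HotNet2 code:

# One dense float64 copy of the 24,711 x 24,711 diffusion matrix:
n = 24711
gb_per_copy = n * n * 8 / 1e9
print(gb_per_copy)  # ~4.9 GB
# A dense inverse typically keeps the input, a factorization workspace, and the
# output in memory at once, so roughly three copies (~15 GB) already approach
# the 16 GB limit before counting anything else the script holds.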

I have one unrelated question; since it is very basic, maybe a new conversation is not required:
Running HotNet2 many times, I have never observed significant p-values for the subnetwork sizes, and the consensus network usually shows one large component whose nodes are almost never connected, plus a few components made of a couple of unconnected nodes. Do you have any suggestions for possible sources of error? I have been using STRING, Mint, and Iref PPIs with GWAS p-values for scores (and, as you saw, only a small percentage of nodes are assigned a score different from 0 in the input file). Each experiment consists of one network and one score file, so there is actually no consensus.

Best

Lorenzo

matthe...@brown.edu

Oct 25, 2018, 3:14:55 PM
to HotNet
Hi Lorenzo,

Your observations may be because of the HotNet2 consensus: if you are performing a consensus of very different networks or very different heat scores, then the HotNet2 subnetworks may be very different, so the consensus HotNet2 subnetworks may not be very meaningful.  Also, you need to set the number of consensus permutations (the -cp/--consensus_permutations argument for the HotNet2.py script) in order to have consensus p-values; by default, the HotNet2.py script performs no consensus permutations, which is much faster, but the consensus p-values are not meaningful in this case.  I would take a look at the HotNet2 subnetworks and p-values for individual networks and sets of heat scores to see how they look.  I would also make sure to define your heat scores as h_g = -log(p_g) for each gene g.
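For example, here is a minimal sketch of converting p-values to heat scores; the two-column "gene p-value" file layout is only an assumption about your input:

import math

def pvalues_to_heat(pvalue_file, heat_file):
    # Write h_g = -log(p_g) for each gene g (natural log shown here).
    with open(pvalue_file) as fin, open(heat_file, 'w') as fout:
        for line in fin:
            gene, p = line.split()
            fout.write('%s\t%g\n' % (gene, -math.log(float(p))))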

In general, if there are few genes (< 80 in your case) with positive heat scores and these genes are generally not connected, then I would not expect the HotNet2 subnetworks to be connected (with respect to the PPI network) or statistically significant.  Our newer algorithm, Hierarchical HotNet, may be better able to identify statistically significant hot subnetworks in this case because it is more robust to different hot subnetwork sizes, including potentially small ones that you may encounter with your data.  Also, Hierarchical HotNet is better able to accommodate large networks.

Best,
Matt

Lorenzo

Oct 25, 2018, 3:33:10 PM
to HotNet
Hi Matt,
I am sorry the last part of my previous post was not very clear: my experiments always consisted of a single network and a single scoring file (yes, -log(p_g)), with no consensus involved. So it is each individual subnetwork that is poorly connected and not statistically significant in size. Still, your answer remains precise and useful, and I will delve into your new publication soon.

Side note: the issue with the segmentation fault was due to scipy's inv function. After changing it to numpy's inv function, the code works fine. I'll attach the versions of my packages.
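In case it helps anyone else, here is a minimal sketch of the swap; it assumes the diffusion matrix is computed roughly as beta * (I - (1 - beta) * W)^-1, as in hotnet2/network.py, and the function name is only illustrative:

import numpy as np
# import scipy.linalg  # scipy.linalg.inv was the call that segfaulted for me

def diffusion_matrix(W, beta):
    # beta * (I - (1 - beta) * W)^-1, using numpy's inverse instead of scipy's.
    n = W.shape[0]
    return beta * np.linalg.inv(np.eye(n) - (1.0 - beta) * W)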

Best

Lorenzo