Locating node based on attributes


Elvis Shera

Jan 19, 2023, 1:50:18 PM
to networkx...@googlegroups.com
Hi,

What is the most efficient way to locate a node based on two or more attributes?

Regards,
Elvis
--
Best regards / Kind regards,
Elvis Shera

Dan Schult

Jan 19, 2023, 2:06:58 PM
Maybe there are other ways, but this is pretty universal and flexible:

foo4nodes = [n for n, nodedata in G.nodes.data() if nodedata["attr1"] <= 4 and nodedata["attr2"] == "foo"]
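A self-contained version of the one-liner above, run on a small toy graph. The attribute names `attr1`/`attr2` and all values are made up for illustration. Note that each evaluation walks over every node once, so if the comprehension is run once per item inside another loop, the total cost multiplies accordingly.

```python
import networkx as nx

# Toy graph; attribute names and values are hypothetical
G = nx.Graph()
G.add_node(1, attr1=3, attr2="foo")
G.add_node(2, attr1=7, attr2="foo")
G.add_node(3, attr1=2, attr2="bar")
G.add_node(4, attr1=4, attr2="foo")
G.add_edges_from([(1, 2), (2, 3), (3, 4)])

# One pass over all (node, attribute-dict) pairs per query
foo4nodes = [
    n
    for n, nodedata in G.nodes.data()
    if nodedata["attr1"] <= 4 and nodedata["attr2"] == "foo"
]
print(foo4nodes)  # [1, 4]
```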


Elvis Shera

Jan 20, 2023, 3:43:42 AM
Thank you, 

I have a graph with 300K nodes and 269K edges, and with the above approach it takes around 7 minutes to complete the task. The bottleneck really seems to be locating the nodes of interest. I wonder how this could be improved by a factor of 10.

Thanks, 
Elvis

Dan Schult

Jan 20, 2023, 3:51:49 AM
Maybe you should process (and even store?) the node attributes in a different structure. Finding which nodes satisfy two criteria has nothing to do with a graph or network structure. Perhaps store the node attributes as a NumPy array with two columns, or as a pandas DataFrame. It depends on your specific data and criteria.
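A minimal sketch of the NumPy idea, assuming every node carries both attributes: extract the attributes from the graph once into aligned arrays (one array per attribute, which plays the role of the two columns), then answer each query with vectorized boolean masks instead of a Python loop. NumPy arrays can also hold strings (here a fixed-width Unicode dtype), so string criteria work the same way. Node ids and attribute names are hypothetical.

```python
import networkx as nx
import numpy as np

G = nx.Graph()
G.add_node("a", attr1=3, attr2="foo")
G.add_node("b", attr1=7, attr2="foo")
G.add_node("c", attr1=2, attr2="bar")
G.add_node("d", attr1=4, attr2="foo")

# Extract once: node ids and one array per attribute, all in the same node order
nodes = np.array(list(G.nodes))
attr1 = np.array([d["attr1"] for _, d in G.nodes(data=True)])
attr2 = np.array([d["attr2"] for _, d in G.nodes(data=True)])

# Each query is now an element-wise comparison over whole arrays
mask = (attr1 <= 4) & (attr2 == "foo")
print(nodes[mask].tolist())  # ['a', 'd']
```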

Elvis Shera

Jan 20, 2023, 4:09:19 AM
I see that some time ago there was an nx.to_pandas_df() function. It seems like a useful thing, and I wonder why it was removed; I do not see it in more recent versions. Do you have a code snippet showing how I can put a graph into a pandas DataFrame or a NumPy array? Or should the attributes be added to the DataFrame in parallel, at the time the nodes are added to the graph?
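One way to get node attributes into a DataFrame, sketched under the assumption that node data are plain key/value attributes: `G.nodes(data=True)` yields `(node, attribute_dict)` pairs, and `pd.DataFrame.from_dict(..., orient="index")` turns that mapping into one row per node and one column per attribute (attributes missing on some nodes become NaN). The attribute names are hypothetical.

```python
import networkx as nx
import pandas as pd

G = nx.Graph()
G.add_node("a", attr1=3, attr2="foo")
G.add_node("b", attr1=7, attr2="foo")
G.add_node("c", attr1=2, attr2="bar")
G.add_node("d", attr1=4, attr2="foo")

# One row per node (index = node id), one column per node attribute
df = pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient="index")

# Combine boolean masks with & (the parentheses are required)
hits = df[(df["attr1"] <= 4) & (df["attr2"] == "foo")]
print(hits.index.tolist())  # ['a', 'd']
```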

Nicolas Cadieux

Jan 20, 2023, 7:13:36 AM

Cason Konzer

Jan 20, 2023, 8:52:31 AM
How is the graph's data originally stored? It may be easiest to convert to a DataFrame from that original data. If you are looking for speed, I usually create lists holding the DataFrame columns; e.g., edges would be represented by two lists, source and target, plus one list per attribute.

The conversion to a DataFrame would then look something like:
df = pd.DataFrame({'source': source_list, 'target': target_list, 'attrib1': attrib1_list, ... })
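A runnable version of the column-list approach, assuming the edges come from an existing NetworkX graph with a single hypothetical edge attribute `weight`: fill one list per column in a single pass, then build the DataFrame once. Building the frame in one call is typically much faster than appending rows one at a time.

```python
import networkx as nx
import pandas as pd

G = nx.Graph()
G.add_edge("a", "b", weight=1.5)
G.add_edge("b", "c", weight=0.5)

# One list per DataFrame column, filled in a single pass over the edges
source_list, target_list, weight_list = [], [], []
for u, v, data in G.edges(data=True):
    source_list.append(u)
    target_list.append(v)
    weight_list.append(data.get("weight"))  # None if an edge lacks the attribute

df = pd.DataFrame({"source": source_list, "target": target_list, "weight": weight_list})
print(df)
```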




--
Love,
Cason Konzer

Elvis Shera

Jan 20, 2023, 6:58:47 PM
OK, I was looking for an export to a pandas DataFrame, not directly to NumPy arrays.

Nicolas Cadieux

Jan 20, 2023, 7:17:34 PM
Hi,

If you don’t have text data, you’re looking at the same thing. If you do have text data, then use dict_of_dicts. From there, pandas can import it.

Jarrod Millman

Jan 22, 2023, 7:10:46 AM
On this page
- https://networkx.org/documentation/stable/reference/convert.html
you will find
- https://networkx.org/documentation/stable/reference/generated/networkx.convert_matrix.to_pandas_adjacency.html
- https://networkx.org/documentation/stable/reference/generated/networkx.convert_matrix.to_pandas_edgelist.html

If you search for "Pandas" in the documentation, the top result is
- https://networkx.org/documentation/stable/reference/convert.html#pandas

To search the documentation, you should be able to click on the little
magnifying glass icon in the upper right-hand corner.
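For reference, a short sketch of the edge-list converter linked above on a toy graph. `to_pandas_edgelist` produces one row per edge, with edge attributes as extra columns; node attributes are not included, so node-attribute lookups still need the node data converted separately.

```python
import networkx as nx

G = nx.Graph()
G.add_edge("a", "b", weight=1.5)
G.add_edge("b", "c", weight=0.5)

# Columns default to "source" and "target", plus one column per edge attribute
edges = nx.to_pandas_edgelist(G)
print(edges)
```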


Elvis Shera

Jan 22, 2023, 4:11:47 PM
To give some more details: I am looking at a graph (class Graph) with 300K nodes, 269K edges and 31K communities, the biggest community having 42K nodes.

First, I exported the graph into a pandas DataFrame, which takes around 10 sec.
Then I have:

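for prt in prts:
    # leaf_name and prt_name are assumed to be derived from prt (not shown here)
    # Note: this builds two boolean masks over the whole DataFrame on every iteration
    rows = df.loc[(df['leaf_name'] == leaf_name) & (df['prt_name'] == prt_name)]
    first = rows.iloc[0]  # first matching row
    nature = first['nature']
    category = first['category']
    sdn_comment = first['mapp_comment']

    if nature:
        prt['properties'].update({'nature': {'value': nature}})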

So to get the whole task done I need ca. 350 sec, which is a 3x improvement over using nx.get_node_attributes from the NetworkX API. However, this is still not enough: although it works, the user experience is not great. I am aiming to get below 30 sec.
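Each iteration of the loop above builds two boolean masks over the entire DataFrame, so the total cost grows with (number of `prts`) × (number of rows). A minimal sketch of a common fix, assuming each `prt` is a dict carrying hypothetical `leaf_name`, `prt_name` and `properties` keys and using made-up data: build a dictionary keyed on the `(leaf_name, prt_name)` pair once, then do constant-time lookups inside the loop. `drop_duplicates(..., keep="first")` keeps the same row that `.iloc[0]` would have picked.

```python
import pandas as pd

# Hypothetical stand-in for the exported node-attribute DataFrame
df = pd.DataFrame({
    "leaf_name": ["L1", "L1", "L1", "L2"],
    "prt_name": ["P1", "P1", "P2", "P1"],
    "nature": ["n1", "n1-dup", "", "n3"],
    "category": ["c1", "c1", "c2", "c3"],
    "mapp_comment": ["m1", "m1", "m2", "m3"],
})

# Hypothetical items: each one is assumed to carry its own leaf_name and prt_name
prts = [
    {"leaf_name": "L1", "prt_name": "P1", "properties": {}},
    {"leaf_name": "L1", "prt_name": "P2", "properties": {}},
    {"leaf_name": "L2", "prt_name": "P1", "properties": {}},
]

# Build the index once; keep="first" keeps the row .iloc[0] would have returned
first_rows = df.drop_duplicates(subset=["leaf_name", "prt_name"], keep="first")
lookup = {
    (leaf, port): (nature, category, comment)
    for leaf, port, nature, category, comment in zip(
        first_rows["leaf_name"],
        first_rows["prt_name"],
        first_rows["nature"],
        first_rows["category"],
        first_rows["mapp_comment"],
    )
}

for prt in prts:
    # Constant-time dict lookup instead of a full scan of df;
    # raises KeyError if the pair is missing (.iloc[0] would raise IndexError)
    nature, category, sdn_comment = lookup[(prt["leaf_name"], prt["prt_name"])]
    if nature:
        prt["properties"].update({"nature": {"value": nature}})

print([p["properties"] for p in prts])
```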

I will try exporting to dict_of_dicts to see if this helps.

Thanks

Elvis Shera

Jan 22, 2023, 4:37:50 PM
Update:

Using dict_of_dicts did not give me an advantage, even though my attributes are text-based. However, once in a DataFrame, I dropped duplicates based on the values of 2 columns, and this gave me an additional improvement of over 10x, which is good.

I am still looking to see if this can be improved further.


Nicolas Cadieux

Jan 23, 2023, 7:39:52 AM
Hi,

I’m not sure what you’re doing exactly, but I think I see a loop and an if clause updating a DataFrame. Unless I am mistaken, all of this should be replaced by a single np.where call, which is much faster than looping over a DataFrame. np.where (or the pandas .where method) updates or filters a column based on conditions without using a for loop.
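A small sketch of the `np.where` idea described above, with hypothetical column names and data: the condition is evaluated on whole columns at once and the result becomes a new column, with no Python-level loop or `if`. It works on string columns just as well as on numeric ones.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "leaf_name": ["L1", "L1", "L2"],
    "prt_name": ["P1", "P2", "P1"],
    "nature": ["n1", "", "n3"],
})

# Condition on two string columns, evaluated element-wise over the whole frame
cond = (df["leaf_name"] == "L1") & (df["nature"] != "")
df["flag"] = np.where(cond, "keep", "skip")

# Same idea with pandas: keep values where cond holds, replace the rest
df["nature_or_none"] = df["nature"].where(cond, "none")
print(df)
```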



Elvis Shera

Jan 25, 2023, 11:46:59 AM
The conditions are as follows:

The loop selects leaves from a list; for each leaf I need to get its name and validate it.

Once I have the leaf, I look up the rows in the DataFrame that match the leaf's name and another attribute. There is no loop here.

Once I have those rows from the DF, I can get the other attributes, which are the real reason for this whole search.

A NumPy array would hold numbers, but my attributes are strings. How would I do it?

Also, going from querying the graph to querying a DF gave me a 3x improvement, while the real gain came when I dropped duplicates from the DF.

I was expecting more gains just from moving from the graph to a DF, but maybe my expectation was wrong.
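On the question of strings: the per-leaf lookups can also be done all at once with a join, assuming the `(leaf_name, prt_name)` pairs to look up are first collected into their own DataFrame (hypothetical column names and data below). `merge` matches on both key columns in one vectorized operation, and string keys work fine. Dropping duplicate key pairs first keeps each wanted pair to at most one match.

```python
import pandas as pd

df = pd.DataFrame({
    "leaf_name": ["L1", "L1", "L2"],
    "prt_name": ["P1", "P2", "P1"],
    "nature": ["n1", "n2", "n3"],
    "category": ["c1", "c2", "c3"],
})

# Hypothetical: the key pairs collected from the leaf loop
wanted = pd.DataFrame({"leaf_name": ["L2", "L1"], "prt_name": ["P1", "P2"]})

# One row per key pair, then a single left join on both string columns
unique_rows = df.drop_duplicates(subset=["leaf_name", "prt_name"])
result = wanted.merge(unique_rows, on=["leaf_name", "prt_name"], how="left")
print(result)
```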

Nicolas Cadieux

Feb 2, 2023, 5:19:13 PM

Hi,

I hope this example will help. Copy and paste it into a .py file and run it.

# ==============================

# -*- coding: utf-8 -*-
"""
Created on Thu Feb  2 15:41:10 2023

@author: Nicolas
"""

import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 55, 22],
        "Sex": ["male", "male", "female"],
        "Gen": ["unknown", "unknown", "unknown"],
    }
)

print(df)
# update a value
df.loc[df['Age'] > 50,'Gen'] = 'x'
print(df,'\n')

# change a value
df.loc[df['Age'] > 50,'Age'] = 50
print (df,'\n')

# print unique values for 'Age' (note the parentheses: unique is a method)
print(df['Age'].unique())
print('\n')

# Returns a boolean Series that is True where Age is 22. The comparison is vectorized (the loop runs inside pandas/NumPy), so it is fast.
df_bool = df['Age'] == 22
print(df_bool,'\n')

#  The wrong way to do a loop with pandas. This is slow
for x in df['Age']:
    if x == 22:
        print (True)
    else:
        print(False)


# Wrapping the condition in df[...] returns only the rows where it is True
df0 = df[df['Sex'] == 'male']
print(df0,'\n')
df1 = df[df['Age'] == 22 ]
print (df1,'\n')

# multiple conditions
df2 = df[(df['Age'] == 22) & (df['Sex'] == 'female')]
print(df2,'\n')

# ==============================
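To see the difference between the "fast" vectorized comparison and the "slow" explicit loop in the example above, here is a rough timing sketch on a synthetic column (random data made up purely for timing). Both functions return the same boolean values; the absolute timings depend on the machine, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic column of 200,000 ages (made-up data for timing only)
rng = np.random.default_rng(0)
df = pd.DataFrame({"Age": rng.integers(0, 100, size=200_000)})

def vectorized():
    # Comparison runs in compiled code over the whole column
    return df["Age"] == 22

def python_loop():
    # Same result, but one Python-level comparison per element
    return pd.Series([x == 22 for x in df["Age"]], index=df.index)

print("same result:", np.array_equal(vectorized().to_numpy(), python_loop().to_numpy()))
print("vectorized: ", timeit.timeit(vectorized, number=3))
print("python loop:", timeit.timeit(python_loop, number=3))
```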
