Locating node based on attributes

Elvis Shera

Jan 19, 2023, 1:50:18 PM

to networkx...@googlegroups.com

Hi,

What is the most performant way to locate a node based on two or more attributes?

Regards,

Elvis

--

Best Regards / Freundlichen Grüßen,

Elvis Shera

Dan Schult

Jan 19, 2023, 2:06:58 PM

to networkx...@googlegroups.com

Maybe there are other ways, but this is pretty universal and flexible:

foo4nodes = [n for n, nodedata in G.nodes.data() if nodedata["attr1"] <= 4 and nodedata["attr2"] == "foo"]
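One caveat with the comprehension above: if some nodes lack one of the attributes, nodedata["attr1"] raises a KeyError. A minimal sketch of a tolerant variant follows, assuming a small made-up graph; the node names, attribute values and the helper matches are illustrative only.

```python
import networkx as nx

# Small made-up graph; attr1 and attr2 mirror the names used in the reply above.
G = nx.Graph()
G.add_node("a", attr1=3, attr2="foo")
G.add_node("b", attr1=7, attr2="foo")
G.add_node("c", attr2="foo")           # this node has no attr1 at all
G.add_node("d", attr1=1, attr2="bar")


def matches(nodedata):
    """Return True when both criteria hold; a missing attribute never matches."""
    value = nodedata.get("attr1")
    return value is not None and value <= 4 and nodedata.get("attr2") == "foo"


foo4nodes = [n for n, nodedata in G.nodes.data() if matches(nodedata)]
print(foo4nodes)
```

This is still one pass over all nodes per query, so it does not change the cost of repeated lookups.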

--
You received this message because you are subscribed to the Google Groups "networkx-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to networkx-discu...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/networkx-discuss/CAOeUCL%2BRD8BefHOSEYaemjmAjnMam_b7Xx38HmR%3DXzamvU%3DBjw%40mail.gmail.com.

Elvis Shera

Jan 20, 2023, 3:43:42 AM

to networkx...@googlegroups.com

Thank you,

I have a graph with 300K nodes and 269K edges, and with the above approach it takes around 7 minutes to get the task done. The bottleneck really seems to be locating the nodes of interest. I wonder how this can be improved by a factor of 10...

Thanks,

Elvis
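If the same kind of lookup is repeated many times, one possible approach (not something proposed in the thread at this point) is to scan the nodes once and build a dictionary keyed by the attribute values, so that each later lookup is a single dictionary access. The sketch below uses plain Python; build_attr_index and the sample data are hypothetical, and the list of pairs stands in for G.nodes(data=True). It only covers equality criteria, not range conditions such as <=.

```python
from collections import defaultdict


def build_attr_index(node_data_pairs, keys):
    """Map each tuple of attribute values to the list of nodes carrying it."""
    index = defaultdict(list)
    for node, data in node_data_pairs:
        # Missing attributes become None in the key instead of raising.
        index[tuple(data.get(k) for k in keys)].append(node)
    return index


# Stand-in for G.nodes(data=True); with networkx you would pass that instead.
node_data = [
    ("n1", {"attr1": 4, "attr2": "foo"}),
    ("n2", {"attr1": 4, "attr2": "bar"}),
    ("n3", {"attr1": 9, "attr2": "foo"}),
    ("n4", {"attr1": 4, "attr2": "foo"}),
]

index = build_attr_index(node_data, keys=("attr1", "attr2"))

# Each lookup is now a dictionary access rather than a scan over all nodes.
hits = index.get((4, "foo"), [])
print(hits)
```

The index has to be rebuilt or updated whenever node attributes change, which is the price for the faster lookups.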

Dan Schult

Jan 20, 2023, 3:51:49 AM

to networkx...@googlegroups.com

Maybe you should process (and even store?) the node attributes in a different structure. Finding which nodes satisfy two criteria has nothing to do with a graph or network structure. Perhaps you could store the node attributes as a numpy array with two columns, or as a pandas data frame. It depends on your specific data and criteria.
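As a rough illustration of the data frame idea, the sketch below copies node attributes into a pandas DataFrame with one row per node and then filters with a boolean mask. The graph contents are made up, and attr1 and attr2 are the placeholder names from earlier in the thread.

```python
import networkx as nx
import pandas as pd

# Small made-up graph with two attributes per node.
G = nx.Graph()
G.add_node("a", attr1=3, attr2="foo")
G.add_node("b", attr1=7, attr2="foo")
G.add_node("c", attr1=1, attr2="bar")
G.add_edge("a", "b")

# One row per node, one column per attribute; the index holds the node names.
node_df = pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient="index")

# Both criteria are evaluated column-wise, without a Python-level loop.
mask = (node_df["attr1"] <= 4) & (node_df["attr2"] == "foo")
selected = node_df.index[mask].tolist()
print(selected)
```

The DataFrame is a snapshot: later changes to the graph's node attributes are not reflected in it.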

Elvis Shera

Jan 20, 2023, 4:09:19 AM

to networkx...@googlegroups.com

I see that some time ago there was an nx.to_pandas_df() function. It seems like a useful thing, and I wonder why it was removed; I do not see it in more recent versions. Do you have a code snippet showing how I can put a graph into a pandas DataFrame or a numpy array? Or should the rows be added to the DataFrame in parallel, at the time the nodes are added to the graph?

Nicolas Cadieux

Jan 20, 2023, 7:13:36 AM

to networkx...@googlegroups.com

Hi,

Looks like it’s still there,

https://networkx.org/documentation/stable/reference/generated/networkx.convert_matrix.to_numpy_array.html#networkx.convert_matrix.to_numpy_array

https://networkx.org/documentation/stable/reference/convert.html

Nicolas Cadieux

https://gitlab.com/njacadieux

On Jan 20, 2023, at 04:09, Elvis Shera <elvis...@gmail.com> wrote:

Cason Konzer

Jan 20, 2023, 8:52:31 AM

to networkx...@googlegroups.com

How is the data of the graph originally stored? It may be easiest to convert to a dataframe from this data. If you are looking for speed, I usually create lists containing the data frame columns, e.g. an edge would be represented by 2 lists, source and target, plus a list for each attribute.

To convert to a dataframe it would be of the order:

df = pd.DataFrame({'source': source_list, 'target': target_list, 'attrib1': attrib1_list, ... })

--

Love,

Cason Konzer

Elvis Shera

Jan 20, 2023, 6:58:47 PM

to networkx...@googlegroups.com

Ok, I was looking for export to pandas df... and not directly to numpy arrays.

Nicolas Cadieux

Jan 20, 2023, 7:17:34 PM

to networkx...@googlegroups.com

Hi,

If you don’t have text data, you’re looking at the same thing. If you do have text data, then use to_dict_of_dicts. From there, pandas has a way to import the result.

https://networkx.org/documentation/stable/reference/generated/networkx.convert.to_dict_of_dicts.html

Nicolas Cadieux

https://gitlab.com/njacadieux

On Jan 20, 2023, at 18:58, Elvis Shera <elvis...@gmail.com> wrote:

Jarrod Millman

Jan 22, 2023, 7:10:46 AM

to networkx...@googlegroups.com

On this page
- https://networkx.org/documentation/stable/reference/convert.html
you will find
- https://networkx.org/documentation/stable/reference/generated/networkx.convert_matrix.to_pandas_adjacency.html
- https://networkx.org/documentation/stable/reference/generated/networkx.convert_matrix.to_pandas_edgelist.html

If you search for "Pandas" in the documentation, the top result is
- https://networkx.org/documentation/stable/reference/convert.html#pandas

To search the documentation, you should be able to click on the little
magnifying glass icon in the upper right-hand corner.
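For reference, a short sketch of the edge-list conversion linked above, on a made-up three-node graph. Note that to_pandas_edgelist exports edges and their attributes, not node attributes, so it may not be what is needed for a node lookup.

```python
import networkx as nx

# Made-up graph with one edge attribute.
G = nx.Graph()
G.add_edge("a", "b", weight=1.5)
G.add_edge("b", "c", weight=2.0)

# One row per edge; columns are "source", "target" and each edge attribute.
edge_df = nx.to_pandas_edgelist(G)
print(edge_df)

# Round trip: rebuild a graph from the edge list, keeping the weight attribute.
G2 = nx.from_pandas_edgelist(edge_df, edge_attr="weight")
print(G2["a"]["b"]["weight"])
```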

On Fri, Jan 20, 2023 at 4:17 PM Nicolas Cadieux

Elvis Shera

Jan 22, 2023, 4:11:47 PM

to networkx...@googlegroups.com

To give some more details, I am looking at a graph (class Graph) of 300K nodes, 269K edges, 31K communities with the biggest community made of 42K nodes.

I first exported the graph into a pandas DataFrame. This takes around 10 sec.

Then I have:

for prt in prts:
    raw = df.loc[(df['leaf_name'] == leaf_name) & (df['prt_name'] == prt_name)]
    nature = raw.iloc[0]['nature']
    category = raw.iloc[0]['category']
    sdn_comment = raw.iloc[0]['mapp_comment']

    if nature:
        prt['properties'].update({'nature': {'value': nature}})

So to get my whole task done I need ca. 350 sec, which is a 3x improvement over using get_node_attributes from the Graph API. However, this does not seem to be enough: although useful, the user experience is not great. I am looking to get below 30 sec.

I will try exporting to dict_of_dicts to see if this helps...

Thanks

Elvis Shera

Jan 22, 2023, 4:37:50 PM

to networkx...@googlegroups.com

Update:

Using dict_of_dicts did not give me an advantage, although my attributes are text based. However, once the data was in a DataFrame, I dropped duplicates based on the values of 2 columns, and this gave me an additional performance improvement of over 10x, which is good.

I am still looking to see if this can be further improved.
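One way this might be pushed further, sketched under assumptions: after dropping duplicates on the two key columns, the rows can be turned into a plain dictionary keyed by (leaf_name, prt_name), so that each lookup inside the loop is a dictionary access instead of a boolean mask over the whole DataFrame. The column names come from the snippet earlier in the thread; the data values are invented.

```python
import pandas as pd

# Invented sample data using the column names from the earlier snippet.
df = pd.DataFrame({
    "leaf_name": ["leaf_a", "leaf_a", "leaf_a", "leaf_b"],
    "prt_name": ["p1", "p1", "p2", "p1"],
    "nature": ["input", "input", "output", ""],
    "category": ["x", "x", "y", "z"],
})

keys = ["leaf_name", "prt_name"]

# Keep the first row per key pair; to_dict(orient="index") needs unique keys.
unique_rows = df.drop_duplicates(subset=keys)

# Keys are (leaf_name, prt_name) tuples, values are {column: value} dicts.
lookup = unique_rows.set_index(keys).to_dict(orient="index")

row = lookup[("leaf_a", "p2")]
print(row["nature"], row["category"])
```

Whether keeping only the first duplicate is acceptable depends on the data; if duplicates can disagree on the other columns, this silently picks one of them.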

Nicolas Cadieux

Jan 23, 2023, 7:39:52 AM

to networkx...@googlegroups.com

Hi,

I am not sure what you’re doing exactly, but I think I see a loop and an if clause used to update a dataframe. Unless I am mistaken, this should all be replaced by a single np.where call, as this is much faster than a loop over a dataframe. np.where (or pandas DataFrame.where) will update or filter a column based on conditions without using a for loop.

https://stackoverflow.com/questions/52089558/how-to-update-numpy-column-where-column-condition-met

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.where.html
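A small sketch of what such an np.where call can look like, on invented data. It also shows that the condition and the resulting values can be strings, not only numbers; the status column is a hypothetical name added for illustration.

```python
import numpy as np
import pandas as pd

# Invented sample data with string-valued columns.
df = pd.DataFrame({
    "leaf_name": ["leaf_a", "leaf_a", "leaf_b"],
    "prt_name": ["p1", "p2", "p1"],
    "nature": ["input", "", "output"],
})

# Boolean Series: True where a non-empty nature is recorded.
has_nature = df["nature"] != ""

# Pick one of two string values per row, without an explicit for loop.
df["status"] = np.where(has_nature, "known", "missing")
print(df)
```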

Nicolas Cadieux

https://gitlab.com/njacadieux

On Jan 22, 2023, at 16:11, Elvis Shera <elvis...@gmail.com> wrote:

Elvis Shera

Jan 25, 2023, 11:46:59 AM

to networkx...@googlegroups.com

The conditions are as follows:

The loop selects leaves from a list; for each leaf I need to get the name and validate that name.

Once I have the leaf, I look for rows in the data frame matching the leaf’s name and another attribute. Here there is no loop.

When I take those rows from the DF, I can then get the other attributes, which are the real reason for all this searching.

A NumPy array would hold numbers, but my attributes are strings. How would I do it?

Also, the gain I have seen from querying the DF instead of the graph is a factor of 3x, while the real gain came when I dropped duplicates from the DF.

I was expecting more gain just from moving from the graph to the DF, but maybe I was wrong in my expectation.
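One possible way to avoid the per-leaf search altogether, offered as a sketch rather than a drop-in fix: put the (leaf_name, prt_name) pairs to look up into their own DataFrame and merge it with the attribute table in a single call. String columns work fine as merge keys. The frame called wanted and all data values are hypothetical; the column names follow the earlier snippet.

```python
import pandas as pd

# Attribute table, one row per (leaf_name, prt_name) pair; values are invented.
df = pd.DataFrame({
    "leaf_name": ["leaf_a", "leaf_a", "leaf_b"],
    "prt_name": ["p1", "p2", "p1"],
    "nature": ["input", "output", "input"],
    "category": ["x", "y", "z"],
})

# The pairs to look up, e.g. collected from the validated leaves.
wanted = pd.DataFrame({
    "leaf_name": ["leaf_b", "leaf_a", "leaf_z"],
    "prt_name": ["p1", "p2", "p9"],
})

# A left merge keeps every wanted pair, in order; unmatched pairs get NaN.
result = wanted.merge(df, on=["leaf_name", "prt_name"], how="left")
print(result)
```

If df still contains duplicate key pairs, the merge returns one output row per duplicate, so dropping duplicates first keeps the result aligned with wanted.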

Nicolas Cadieux

Feb 2, 2023, 5:19:13 PM

to networkx...@googlegroups.com, Elvis Shera

Hi,

I hope this example will help. Copy and paste it into a .py file and run it.

# ==============================

# -*- coding: utf-8 -*-
"""
Created on Thu Feb 2 15:41:10 2023

@author: Nicolas
"""

import pandas as pd

df = pd.DataFrame(
    {
        "Name": [
            "Braund, Mr. Owen Harris",
            "Allen, Mr. William Henry",
            "Bonnell, Miss. Elizabeth",
        ],
        "Age": [22, 55, 22],
        "Sex": ["male", "male", "female"],
        "Gen": ['unknow', 'unknow', 'unknow']
    }
)

print(df)
# update a value
df.loc[df['Age'] > 50,'Gen'] = 'x'
print(df,'\n')

# change a value
df.loc[df['Age'] > 50,'Age'] = 50
print (df,'\n')

# print unique values for 'Age'
print(df['Age'].unique())
print('\n')

# Returns a boolean Series that is True where Age is 22. The comparison is
# vectorized (the loop runs inside pandas), so it is fast.
df_bool = df['Age'] == 22
print(df_bool,'\n')

# The wrong way to do a loop with pandas. This is slow
for x in df['Age']:
    if x == 22:
        print (True)
    else:
        print(False)

# Adding df[] in front will return the df where the condition applies
df0 = df[df['Sex'] == 'male']
print(df0,'\n')
df1 = df[df['Age'] == 22 ]
print (df1,'\n')

# multiple conditions
df2 = df[(df['Age'] == 22) & (df['Sex'] == 'female')]
print(df2,'\n')

# ==============================

-- 
Nicolas Cadieux
https://gitlab.com/njacadieux
