problem importing (large?) dataset with labels


David Diaz

Sep 14, 2015, 8:46:29 PM
to nwcommands
Hello! Thanks in advance for the help.

I'm trying to import an edgelist from a Stata file, and after importing, all the labels for the nodes disappear. I need to join the file with some other columns later on, so losing the labels is not an option.

I noticed that with the exact same code but a smaller (mock) data sample, the problem does not occur. Maybe there is an issue with memory or a maximum number of labels allowed?

The data in the "bi_2014_ce.dta" file is in this format:

A B
58883894 3040
103807759 3040
290455044 3040
46096087 270810
167804656  270810
321518297  270810
329368958  270810
394711590  270810
... ...

Columns A and B are strings

In total there are 1274 nodes (labels)

The code goes like this:

clear
nwclear

use "bi_2014_ce.dta"

nwfromedge A B, name(mynet) undirected
nwdegree mynet, isolates
nwdegree mynet, generate(nrmdegree) standardize
nwcomponents mynet, lgc
nwclustering mynet

sort _clustering
gen id=_n

nwcloseness mynet, unconnected(max) nosym

Basically, after "nwfromedge A B, name(mynet) undirected" the labels are no longer recognized.

I hope someone can help

best regards

Thomas Grund

Sep 17, 2015, 9:50:18 AM
to David Diaz, nwcommands
Hi,

I already replied to David in depth, but some of you might be interested in this as well. The reason you might not see the node labels is that "nwload" (which is implicitly called by "nwfromedge") has an in-built check that stops it from loading larger networks (500+ nodes). You can explicitly ask for the node labels with:

. nwload, labelonly

This generates a new variable "_nodelab" (and some others). In the next release, I will change the in-built check to default to "nwload, labelonly" for larger networks.

Furthermore, note that there is a more elegant way to keep the original node labels for matching and sorting. In David's data, the original string variable that holds the node labels has a leading blank. Hence, one can drop that first character with substr() and then destring the result, like this. Afterwards, "nwsort" sorts the network in ascending order of the integer mylab.

. gen mylab = substr(_nodelab, 2,.)
. destring mylab, replace
. nwsort mynet, by(mylab)
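
If the number of leading blanks differs between labels, substr(_nodelab, 2, .) would leave some of them in place. A minimal sketch of a variant, assuming "_nodelab" exists after "nwload, labelonly" and the data in memory holds one row per node: trim() replaces the substr() line, and the result is then used as a merge key. The file name "node_attributes.dta" and its key variable "mylab" are hypothetical placeholders for whatever columns need to be joined later.

```stata
* Variant of the substr() line: strip leading and trailing blanks,
* however many there are (trim() is a built-in Stata string function)
gen mylab = trim(_nodelab)
destring mylab, replace

* Hypothetical attribute file, assumed to hold one row per node
* and a numeric key variable that is also named mylab
merge 1:1 mylab using "node_attributes.dta"
```

merge 1:1 stops with an error if mylab does not uniquely identify rows on both sides, which also works as a quick check that no labels were lost or duplicated.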

Hope this helps.

Best,
Thomas
