Junto accuracy vs label propagation using matrix multiplication

elia...@gmail.com

Feb 8, 2017, 4:51:24 AM

to The Junto Label Propagation Toolkit Open Discussion

Hello,

I implemented label propagation with matrix multiplication, following the pseudocode of Algorithm 11.1 in https://www.iro.umontreal.ca/~lisa/pointeurs/bengio_ssl.pdf. The accuracy I get differs from the accuracy I get with Junto (algo = lp_zgl, same number of iterations): my results are higher by a couple of points.

To check that I don't have a bug, I also ran scikit-learn's label propagation (http://scikit-learn.org/stable/modules/label_propagation.html) and got almost the same results as with my implementation.

Does the Junto library implement the same pseudocode?

Could you please explain the differences, if any, between the pseudocode in Algorithm 11.1 and the Junto implementation?

Thanks,

Eliav

Partha Talukdar

Feb 8, 2017, 10:20:53 AM

to junto...@googlegroups.com

Eliav: Can you share the config file used?

p

elia...@gmail.com

Feb 8, 2017, 10:58:56 AM

to The Junto Label Propagation Toolkit Open Discussion

I only modified the simple config file from the examples:

# File where the input graph is stored. Format of the data
# is specified by the "data_format" field. The most common
# format is the edge_factored way where each line specifies an
# edge in the graph. For example,
# <source_node>TAB<target_node>TAB<edge_weight>
#
graph_file = /work/eng/eliavb/junto/examples/simple/data/input_graph
data_format = edge_factored

# Specifies the seed label information to be injected into
# selected nodes at the start of the algorithm.
seed_file = /work/eng/eliavb/junto/examples/simple/data/seeds

# Gold labels of nodes (if known). This gold label information
# is used only during evaluation.
#gold_labels_file = data/gold_labels

# Nodes (along with corresponding gold label information) which
# are used during evaluation, if any. This is kept as separate
# option from the gold_labels_file, as we may not evaluate all
# nodes whose gold label information is known.
#test_file = data/gold_labels

# Number of label propagation rounds
iters = 30

verbose = false

# All nodes with degree lower than this threshold will be pruned
# away. No nodes will be pruned with a threshold of 0
prune_threshold = 0

# Choose one of the three label propagation algorithms
algo = lp_zgl
#algo = adsorption
#algo = mad

# Hyperparameters for Adsorption and MAD
mu1 = 1
mu2 = 1e-2
mu3 = 1e-2
beta = 2

# File where label propagation output is stored. Each line
# corresponds to a node and fields in the line are organized
# as follows:
#
# <node_name>TAB[<gold_label> <gold_score>]+TAB[<seed_label> <seed_score>]+TAB\
# [<estimated_label> <estimated_score>]+TAB<is_test_node>TAB<node_MRR>
#
# For nodes for which gold (or seed label) information is not known, the corresponding
# field is left empty.
#
output_file = /work/eng/eliavb/junto/examples/simple/data/label_prop_output
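
For readers comparing against their own matrix-based code, here is a minimal sketch of how a file in the edge_factored format described in the config comments (<source_node>TAB<target_node>TAB<edge_weight>) might be loaded into a dense weight matrix. The function name read_edge_factored, the symmetric flag, and the sample edges are hypothetical and not part of Junto; whether Junto itself treats each edge as undirected is an assumption here, not something stated in this thread.

```python
import io


def read_edge_factored(lines, symmetric=True):
    """Parse '<source>TAB<target>TAB<weight>' lines into (node names, weight matrix)."""
    edges = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        source, target, weight = line.split("\t")
        edges.append((source, target, float(weight)))

    # Assign each node name a row/column index in sorted order.
    nodes = sorted({name for s, t, _ in edges for name in (s, t)})
    index = {name: i for i, name in enumerate(nodes)}

    w = [[0.0] * len(nodes) for _ in nodes]
    for s, t, weight in edges:
        w[index[s]][index[t]] = weight
        if symmetric:
            # Assumption: mirror each edge so the graph is undirected.
            w[index[t]][index[s]] = weight
    return nodes, w


# Hypothetical two-edge graph standing in for the real input_graph file.
sample = io.StringIO("a\tb\t1.0\nb\tc\t0.5\n")
nodes, W = read_edge_factored(sample)
print(nodes)
print(W)
```

A dense list-of-rows matrix keeps the sketch dependency-free; for a graph of realistic size a sparse matrix would be the more practical choice.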

Partha Talukdar

Feb 8, 2017, 11:41:04 AM

to junto...@googlegroups.com

I think there is a slight difference in the normalization: the Junto implementation row-normalizes the label scores, as per step 2 of the algorithm in Section 2.2 of http://pages.cs.wisc.edu/~jerryzhu/pub/CMU-CALD-02-107.pdf

p
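
To make the normalization point concrete, the sketch below contrasts two single propagation steps on a toy graph. It is only one possible reading of the difference, under these assumptions: "degree normalization" means Y <- D^-1 W Y, and "row normalization of the label scores" means computing W Y and then rescaling each node's score row to sum to 1. Junto's actual code is not reproduced here, the function names and the 3-node graph are hypothetical, the graph is assumed to have no isolated nodes, and seed clamping is left out to keep the comparison short.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def degree_normalized_step(w, y):
    """Y <- D^-1 W Y: divide each row of W Y by that node's weighted degree."""
    wy = matmul(w, y)
    return [[v / sum(w_row) for v in wy_row] for w_row, wy_row in zip(w, wy)]


def row_normalized_step(w, y):
    """Y <- W Y, then rescale each row of label scores to sum to 1."""
    wy = matmul(w, y)
    out = []
    for row in wy:
        total = sum(row)
        # Rows that received no label mass are left as all zeros.
        out.append([v / total for v in row] if total > 0 else list(row))
    return out


# Hypothetical toy graph: a chain 0 - 1 - 2 with unit edge weights.
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 1.0],
     [0.0, 1.0, 0.0]]
# Node 0 carries label 0; nodes 1 and 2 start with no label mass.
Y0 = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

by_degree = degree_normalized_step(W, Y0)
by_row = row_normalized_step(W, Y0)
print(by_degree[1])
print(by_row[1])
```

In this toy case node 1 ends the first step with score 0.5 for label 0 under degree normalization and 1.0 under row normalization, because its unlabeled neighbor contributes no mass yet. Once every row of Y sums to 1 the two updates coincide, so under this reading any gap would come from the early iterations of a fixed-length run.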

Eliav Buchnik

Feb 8, 2017, 11:58:17 AM

to junto...@googlegroups.com

OK, thank you!

Where do I need to change the code to get the normalization based on the degree and not on the sum of weights?

Partha Talukdar

Feb 8, 2017, 12:01:58 PM

to junto...@googlegroups.com

On Wed, Feb 8, 2017 at 10:28 PM, Eliav Buchnik <elia...@gmail.com> wrote:

Where do I need to change the code to get the normalization based on the degree and not on the sum of weights?

Look for ProbUtil.Normalize in https://github.com/parthatalukdar/junto/blob/master/src/main/scala/upenn/junto/algorithm/LpZgl.scala, in particular line 115.

Jason Baldridge

Feb 8, 2017, 12:33:45 PM

to junto...@googlegroups.com

In case it helps, I've recently updated the scalanlp/junto repository to be a pared-down, Scala-only implementation (with fewer dependencies, and with arrays rather than hashmaps for storing label weights):

https://github.com/scalanlp/junto

It only has MAD in there, and there is generally a lot less code to tease through, as a whole. I'm still hoping to tighten some things up, but figured I'd throw this out there in case it's easier to work with for others for questions like this.

-Jason

swath...@gmail.com

Apr 24, 2020, 2:43:25 AM

to The Junto Label Propagation Toolkit Open Discussion

Hi,

I am doing some research on label propagation and am looking for code that implements label propagation using matrix multiplication. Could you please share your implementation?
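
The original poster's code was never shared in this thread. As a rough stand-in, here is a minimal sketch in the spirit of the update discussed above (Y <- D^-1 W Y, followed by re-clamping the labeled nodes), generalized to one score column per label. The function name label_propagation, its parameters, and the 4-node chain graph are hypothetical; the graph is assumed to be symmetric with no isolated nodes.

```python
def label_propagation(w, seeds, num_labels, iters=30):
    """Iterate Y <- D^-1 W Y, re-clamping the seed nodes after every step.

    w: symmetric weight matrix as a list of rows (no isolated nodes assumed).
    seeds: dict mapping node index -> label index for the labeled nodes.
    Returns the final score matrix Y, one row of label scores per node.
    """
    n = len(w)
    degree = [sum(row) for row in w]

    def clamp(y):
        # Reset every seed node to a one-hot row for its known label.
        for node, label in seeds.items():
            y[node] = [1.0 if k == label else 0.0 for k in range(num_labels)]
        return y

    y = clamp([[0.0] * num_labels for _ in range(n)])
    for _ in range(iters):
        # Matrix product W Y with each row divided by the node's degree.
        y = [[sum(w[i][j] * y[j][k] for j in range(n)) / degree[i]
              for k in range(num_labels)]
             for i in range(n)]
        y = clamp(y)
    return y


# Hypothetical toy graph: a chain 0 - 1 - 2 - 3 with unit edge weights.
W = [[0.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]
seeds = {0: 0, 3: 1}  # node 0 has label 0, node 3 has label 1

Y = label_propagation(W, seeds, num_labels=2, iters=200)
predicted = [max(range(2), key=lambda k: row[k]) for row in Y]
print(Y)
print(predicted)
```

Each node is assigned the label with the highest score in its row. The explicit loops are written for readability only; on a real graph the same update is a single sparse matrix product per iteration.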