Score domination of frequent labels in modified adsorption?

pvenkat...@gmail.com

Jun 18, 2016, 2:32:51 AM

to The Junto Label Propagation Toolkit Open Discussion

Hello.

I'm working on a multi-label classification problem with this toolkit, using 2 different labels.

The labeled data is only about 1% of the whole data set, and even that label distribution is skewed: the first label outnumbers the other by a factor of about 11.

The algorithm ran without any real problems, but I found that the minority label has far less influence than the majority one. If I assign labels to the unlabeled data, every unlabeled node gets the majority label.

What is the problem here? Are there any parameters I can modify to make the algorithm better suited to my problem?

Thanks in advance.

Partha Talukdar

Jun 18, 2016, 4:51:17 AM

to junto...@googlegroups.com

On Sat, Jun 18, 2016 at 12:02 PM, <pvenkat...@gmail.com> wrote:

The labeled data is only about 1% of the whole data set, and even that label distribution is skewed: the first label outnumbers the other by a factor of about 11.
The algorithm ran without any real problems, but I found that the minority label has far less influence than the majority one. If I assign labels to the unlabeled data, every unlabeled node gets the majority label.

What is the problem here? Are there any parameters I can modify to make the algorithm better suited to my problem?

There are a variety of things you can do: give an equal number of seeds from both categories, apply class mass normalization (Sec. 11.5 in http://www.iro.umontreal.ca/~lisa/pointeurs/bengio_ssl.pdf), etc.
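For readers who want to try class mass normalization as a post-processing step, here is a minimal sketch. It is not part of Junto: the function names, the input layout (one list of label scores per unlabeled node, dummy label excluded), and the choice of target proportions in `priors` are all assumptions made for illustration. Each class's scores are rescaled so that the total mass of class c across the unlabeled nodes matches the proportion you want for that class.

```python
def class_mass_normalize(scores, priors):
    """Rescale label scores so each class's total mass matches its prior.

    scores: list of rows, one per unlabeled node; row[c] is the raw
            propagated score for class c (hypothetical layout).
    priors: desired class proportions, one per class (an assumption you
            supply, e.g. uniform or an estimate of the true class balance).
    """
    num_classes = len(priors)
    # Total score mass each class received over all unlabeled nodes.
    mass = [sum(row[c] for row in scores) for c in range(num_classes)]
    return [
        [priors[c] * row[c] / mass[c] if mass[c] > 0 else 0.0
         for c in range(num_classes)]
        for row in scores
    ]


def predict(scores, priors):
    """Pick, for each node, the class with the highest normalized score."""
    normalized = class_mass_normalize(scores, priors)
    return [max(range(len(row)), key=row.__getitem__) for row in normalized]


# Toy scores: the first class wins every raw argmax.
raw = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.55, 0.45]]
labels = predict(raw, priors=[0.5, 0.5])
```

With uniform priors, the two nodes where the minority class scores relatively high are reassigned to it, while the raw argmax would give every node the majority label. Whether uniform priors are the right target depends on what you believe the true class balance to be.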

pvenkat...@gmail.com

Jun 18, 2016, 11:17:52 AM

to The Junto Label Propagation Toolkit Open Discussion

Thanks for the reference, sir. We are looking into class mass normalization.

By the way, sir, all the edges we are dealing with have weights on the order of 1e-4. The label scores seem to depend on the edge weights, so they too are of the same order. But whatever the edge weights may be, the dummy label always has a score on the order of 1e-1, so it overpowers the real labels. After going through this Google group, I found a solution and decreased the mu3 parameter in steps until I reached 1e-6. Now all the scores are of the same order.

My question is, does decreasing a parameter by that much have any effect on the accuracy of the algorithm?

Thank you.

Partha Talukdar

Jun 19, 2016, 2:12:13 PM

to junto...@googlegroups.com

On Sat, Jun 18, 2016 at 8:47 PM, <pvenkat...@gmail.com> wrote:

My question is, does decreasing a parameter by that much have any effect on the accuracy of the algorithm?

Yes, it may. mu3 controls regularization.
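For context, the role of mu3 can be read off the Modified Adsorption objective as it is usually written in the MAD paper (Talukdar and Crammer, 2009). The form below is reproduced from memory as a sketch, so check the notation against the paper before relying on it. Here Y_l holds the seed scores for label l, \hat{Y}_l the estimated scores, S selects the seed nodes, L is the graph Laplacian built from the edge weights, and R_l is the prior, which puts its mass on the dummy label.

```latex
\min_{\hat{Y}} \sum_{l} \Big[
    \mu_1 \, (Y_l - \hat{Y}_l)^{\top} S \, (Y_l - \hat{Y}_l)
  + \mu_2 \, \hat{Y}_l^{\top} L \, \hat{Y}_l
  + \mu_3 \, \lVert \hat{Y}_l - R_l \rVert_2^2
\Big]
```

Only the \mu_2 term scales with the edge weights. With weights around 1e-4, that term becomes small relative to the \mu_3 term, which would be consistent with the dummy label dominating until mu3 is reduced. Lowering mu3 weakens the pull toward the prior, so the solution is less regularized, which is why accuracy may change.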
