Observed values higher than 1


Sawischa

Dec 17, 2021, 11:27:49 AM
to PSL Users
Hello everyone,

In a dataset we are working with, we have seen that PSL allows observed truth values outside the range [0,1]. In our case the value represents the count of a certain action, signalling to the system that it should give more importance to certain rules.

Is this behavior intended?

If it is, there might be a couple of implications:
  • Łukasiewicz T-norm wasn't designed for values exceeding the range of [0,1]
  • Certain ground rules will have a higher incompatibility that might not be in proportion with the importance the system should give
Any hint on this behavior is highly appreciated. 

Best regards
Sammy

Eriq Augustine

Dec 17, 2021, 11:44:25 AM
to Sawischa, PSL Users
Hey Sammy,

I think that we may actually be talking about two different types of values.

There are truth values (either observed values given in data files, or predicted values produced by PSL's inference).
These values should always be in [0, 1].
So if you are seeing values that are not in [0, 1], it could be a bug that we need to track down.

Then there are weights attached to soft rules/constraints.
You see these in the rules file for PSL, and as you say these values "give more importance to certain rules".
These can be any positive value.
By default, these are normalized to be in [0, 1], but that behavior can be turned off by setting the "inference.normalize" option to false.

Do you know if the values you are seeing are truth values or weights?

-eriq


Sawischa

Dec 17, 2021, 12:10:11 PM
to PSL Users
Hi Eriq,

Thanks for your swift reply!

I am referring to the actual observed truth values of the UserShare ground atoms in this dataset: https://github.com/linqs/chowdhury-cikm20, more specifically this file: https://github.com/linqs/chowdhury-cikm20/blob/master/BuzzFeed/data/BuzzFeedNewsIDUserShare.txt

I suspected it could be a bug, but the chances that PSL allows this and that the model still works seemed so low to me.
I tried to track down what influence these particular truth values have (where the user share count is 2), and it seems that they introduce a "standard incompatibility" into our terms.

Let's start with this ground rule:
FAKENEWS(Fake_7) & BLOCK(Fake_7) & USERSHARE(Fake_7, 7974) >> ~USERCREDLAT(7974) ^2
-->   !FAKENEWS(Fake_7) | !BLOCK(Fake_7) | !USERSHARE(Fake_7, 7974) | !USERCREDLAT(7974)   // DNF
-->   FAKENEWS(Fake_7) & BLOCK(Fake_7) & USERSHARE(Fake_7, 7974) & USERCREDLAT(7974)       // Negating for dissatisfaction
-->   1 & 1 & 2 & USERCREDLAT(7974)                                                        // Inserting observed truth values
-->   max(1 + 1 - 1, 0) & max(2 + USERCREDLAT(7974) - 1, 0)                                // Łukasiewicz T-norm
-->   1 & (1 + USERCREDLAT(7974))
-->   1 + USERCREDLAT(7974)

So in this case the incompatibility of this ground rule will always be at least 1. I suspect that for a larger number of user shares PSL would give more importance to this ground rule than it should.
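The same steps can be checked numerically. This is only a sketch of the hand derivation above (unweighted and unsquared), not PSL's actual implementation; the function names `lukasiewicz_and` and `incompatibility` are made up for illustration.

```python
def lukasiewicz_and(a: float, b: float) -> float:
    """Lukasiewicz t-norm: max(a + b - 1, 0)."""
    return max(a + b - 1.0, 0.0)


def incompatibility(fakenews: float, block: float,
                    usershare: float, usercredlat: float) -> float:
    """Hypothetical helper mirroring the derivation above:
    (FAKENEWS & BLOCK) & (USERSHARE & USERCREDLAT)."""
    left = lukasiewicz_and(fakenews, block)
    right = lukasiewicz_and(usershare, usercredlat)
    return lukasiewicz_and(left, right)


if __name__ == "__main__":
    # Observed FAKENEWS and BLOCK are both 1; vary the share count and
    # the value of the random variable USERCREDLAT.
    for share in (1.0, 2.0, 5.0):
        for cred in (0.0, 0.5, 1.0):
            value = incompatibility(1.0, 1.0, share, cred)
            print(f"USERSHARE={share} USERCREDLAT={cred} -> {value}")
```

With a share value of 1 the incompatibility can reach 0 (at USERCREDLAT = 0), while a share value of s > 1 leaves a floor of s - 1 that no assignment of USERCREDLAT can remove.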

Kindest regards
Sammy

Sawischa

Dec 17, 2021, 12:11:52 PM
to PSL Users
I forgot to square and weight the ground rule in the example but the implication of a standard incompatibility is the same.

Eriq Augustine

Dec 17, 2021, 12:44:14 PM
to Sawischa, PSL Users
Hey Sammy,

You are right, I can confirm that a value not in [0, 1] is sneaking into PSL.
It looks like when using a Postgres database capable of bulk loading data, the intended check can be skipped:

As for the impact, have you seen a difference on your side when changing those values > 1 to just 1?
If the values make it all the way to inference, then the impact may be low, since they will always be boxed into [0, 1]:

I created an issue for this:

-eriq

Sawischa

Dec 17, 2021, 2:21:23 PM
to PSL Users
I have run an experiment where I amplified the news count from 2 to 5. Since I run a customised version of PSL, I can see the term incompatibilities as well as the random variable atoms changing after each iteration. And the answer is yes, the high observed truth values make it into inference. I will attach the different confusion matrices for both models as well as a cluster for the atom USERCREDLAT('7974') (Notice the incompatibility at the top).

I have to rerun the experiment where I set the shares > 1 to just 1, because I used old weights for that run. (However, you could see that the cluster incompatibility for USERCREDLAT('7974') was close to zero, which makes sense given the earlier transformation of the ground rule into an incompatibility.)

Let me know if you need further information (logs, etc.)

Best regards
Sammy




original_shares_7974.png
original.png.png
amplified_shares.png.png
amplified_shares_7974.png

Eriq Augustine

Dec 17, 2021, 2:46:38 PM
to Sawischa, PSL Users
Interesting, thanks for the info.

I put in a fix that is going through CI now:

Since you are using a custom version of PSL, I would probably recommend just changing the values in the data files to 1.
(If you want to cherrypick the fix, you can do that too.)
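One way to change the values in the data files is a small script like the sketch below. It assumes a tab-separated file in which every row ends with an explicit truth value; the exact layout of the file in question has not been checked here, and `cap_truth_values` is a hypothetical helper name.

```python
from typing import Iterable, Iterator, List


def cap_truth_values(lines: Iterable[str], delimiter: str = "\t") -> Iterator[List[str]]:
    """Yield each row with its last column capped into [0, 1].

    Assumption: the last column of every non-empty row is a numeric truth value.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        row = line.split(delimiter)
        value = float(row[-1])
        row[-1] = str(min(max(value, 0.0), 1.0))
        yield row


if __name__ == "__main__":
    # Made-up rows for illustration only, not taken from the real data file.
    sample = ["Fake_7\t7974\t2", "Real_1\t12\t1", "Real_2\t13\t0.5"]
    for row in cap_truth_values(sample):
        print("\t".join(row))
```

To apply this to a real file, pass an open file handle instead of the sample list and write the joined rows to a new file, so the original data stays untouched.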

I will also fix this experiment repository and try to put it in a format that is consistent with the experiments you see in the psl-examples repository:

-eriq

