Hello, I have some doubts about how to declare rules in my program. I'm using Scallop from Python.
Basically, I want to calculate the probability that an episode ends (I'm using the
Gymnasium CartPole environment). The rules for the env are:
- The cart x-position (index 0) can take values between (-4.8, 4.8), but the episode terminates if the cart leaves the (-2.4, 2.4) range.
- The pole angle (index 2) can be observed between (-.418, .418) radians (or ±24°), but the episode terminates if the pole angle is not in the range (-.2095, .2095) (or ±12°).
- Since the goal is to keep the pole upright for as long as possible, a reward of +1 for every step taken, including the termination step, is allotted.
- The threshold for rewards is 475 for v1
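To make the rules above concrete, here is a minimal plain-Python sketch of the termination check I am trying to encode. The function name is_terminated and the constant names are just my own placeholders, and the bounds are the ones quoted in the list, not values read from Gymnasium itself.

```python
# Bounds taken from the rules listed above (assumed, not read from Gymnasium).
X_LIMIT = 2.4         # cart position bound
ANGLE_LIMIT = 0.2095  # pole angle bound in radians (about 12 degrees)


def is_terminated(observation) -> bool:
    """Return True if the cart or the pole has left the allowed range.

    observation[0] is the cart x-position and observation[2] is the pole angle,
    which is what I call state0 and state2 further down.
    """
    x_pos = observation[0]
    pole_angle = observation[2]
    return abs(x_pos) > X_LIMIT or abs(pole_angle) > ANGLE_LIMIT


# Example observations in the order [x, x_velocity, angle, angular_velocity].
print(is_terminated([0.0, 0.0, 0.1, 0.0]))   # inside both ranges
print(is_terminated([2.5, 0.0, 0.0, 0.0]))   # cart too far to the right
print(is_terminated([0.0, 0.0, -0.3, 0.0]))  # pole tilted too far
```

The four terminated_* relations below are my attempt to express these same two abs() comparisons as logic rules.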
I plan to calculate the probability of terminating (or of not terminating), since the longer the cart keeps running, the more reward it accumulates.
I defined my environment like this with the scallopy module in Python:
import scallopy

ctx = scallopy.ScallopContext(provenance="topkproofs")
ctx.add_relation("terminated_xpos", int)
ctx.add_relation("terminated_xneg", int)
ctx.add_relation("terminated_angpos", int)
ctx.add_relation("terminated_angneg", int)
ctx.add_relation("state0", float)
ctx.add_relation("state2", float)
ctx.add_relation("reward", int)
ctx.add_relation("steps", int)
ctx.add_relation("next_action", int)
ctx.add_facts("state0", state0)
ctx.add_facts("state2", state2)
ctx.add_facts("terminated_xpos", terminated_ctx)
ctx.add_facts("terminated_xneg", terminated_ctx)
ctx.add_facts("terminated_angpos", terminated_ctx)
ctx.add_facts("terminated_angneg", terminated_ctx)
ctx.add_facts("steps", steps_ctx)
ctx.add_facts("next_action", actions_ctx)
ctx.add_rule("terminated_xpos(1) = state0(state0), state0 > 2.4")
ctx.add_rule("terminated_xneg(1) = state0(state0), state0 < -2.4")
ctx.add_rule("terminated_angpos(1) = state2(state2), state2 > 0.2095")
ctx.add_rule("terminated_angneg(1) = state2(state2), state2 < -0.2095")
ctx.add_rule("terminated() = terminated_xpos(1) or terminated_xneg(1) or terminated_angpos(1) or terminated_angneg(1)")
First: is there any way to "join" the terminated_xpos and terminated_xneg rules into one rule? I tried different ways but I always get an error. The rule should be something like "terminated(1) = -2.4 > state0 > 2.4" (that is, state0 is below -2.4 or above 2.4). If you could give me an example, I will be able to extend it to the other rules.
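To show what I mean by a joined rule, this is the kind of rule text I have in mind, built as a plain string so it could be handed to ctx.add_rule. The helper out_of_range_rule, the head name terminated_x and the variable v are placeholders of mine, and I have not confirmed that Scallop accepts this exact combination of "and", "or" and parentheses; it is only a sketch of the intent.

```python
def out_of_range_rule(head: str, relation: str, limit: float) -> str:
    """Build one rule covering both sides of a symmetric bound.

    Hypothetical helper: it only formats text and does not call Scallop.
    """
    return (
        f"{head}() = ({relation}(v) and v > {limit}) "
        f"or ({relation}(v) and v < -{limit})"
    )


# One rule for the cart position and one for the pole angle.
x_rule = out_of_range_rule("terminated_x", "state0", 2.4)
angle_rule = out_of_range_rule("terminated_ang", "state2", 0.2095)

print(x_rule)
print(angle_rule)
```

If something along these lines is valid, the four terminated_* rules would collapse into two, and the final terminated() rule would only need one "or".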
Second: the terminated flag is returned by Gymnasium as a boolean value (True or False). Is there any way I can use it in my rules without converting it first?
Third: if my final rule ("terminated() = terminated_xpos(1) or terminated_xneg(1) or terminated_angpos(1) or terminated_angneg(1)") is used together with a NN, will it give me the probability of the environment terminating?
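To clarify what I would expect from the third question: if each of the four terminated_* facts carried a probability (for example from a network output) and they were treated as independent, I would expect the probability of terminated() to be one minus the product of the complements. The independence and the example numbers below are assumptions of mine for illustration only; xpos and xneg cannot both hold at once, so the value actually computed under topkproofs may differ.

```python
from math import prod


def prob_any(probabilities) -> float:
    """P(at least one event holds) for independent events: 1 - prod(1 - p_i)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Made-up probabilities for xpos, xneg, angpos, angneg (illustration only).
example_probs = [0.10, 0.05, 0.20, 0.0]
p_terminated = prob_any(example_probs)

print(f"P(terminated)     = {p_terminated:.3f}")
print(f"P(not terminated) = {1.0 - p_terminated:.3f}")
```

With these made-up inputs the result is about 0.316, which is the kind of number I am hoping to read out of the final rule.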
Thank you so much in advance. I find the library pretty useful and I plan to continue using it!