Hello, it's Joaquin again. I'm trying a new approach to the CartPole problem. Instead of calculating the probability of termination given a state, I think it would be better to calculate the probability of being over or under the limits (as defined by the environment's rules) given the current state.
My current program is:
import scallopy
ctx = scallopy.ScallopContext(provenance="topkproofs")
ctx.add_relation("state0", float)
ctx.add_relation("state2", float)
ctx.add_facts("state0", state0)
ctx.add_facts("state2", state2)
ctx.add_rule("terminated_ang(state2 > 0.2095 || state2 < -0.2095) = state2(state2)")
ctx.add_rule("terminated_pos(state0 > 2.4 || state0 < -2.4) = state0(state0)")
ctx.add_rule("terminated() = terminated_pos(True) or terminated_ang(True)")
ctx.run()
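To make the intent clear, this is what I want the three rules to compute, written as a plain-Python sketch with no Scallop involved. The thresholds are the ones from the rules above; the constant names and the function signatures are only for illustration.

```python
# Plain-Python sketch of the intended termination check (illustrative names).
ANGLE_LIMIT = 0.2095    # radians, threshold used in terminated_ang
POSITION_LIMIT = 2.4    # threshold used in terminated_pos


def terminated_ang(state2: float) -> bool:
    """True when the pole angle is outside [-ANGLE_LIMIT, ANGLE_LIMIT]."""
    return state2 > ANGLE_LIMIT or state2 < -ANGLE_LIMIT


def terminated_pos(state0: float) -> bool:
    """True when the cart position is outside [-POSITION_LIMIT, POSITION_LIMIT]."""
    return state0 > POSITION_LIMIT or state0 < -POSITION_LIMIT


def terminated(state0: float, state2: float) -> bool:
    """True when either limit is exceeded."""
    return terminated_pos(state0) or terminated_ang(state2)
```

So for a state well inside both limits I would expect terminated to be false, and for a state outside either limit I would expect it to be true.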
But when I run it and read out the probability, it gives me the value of state0 or state2. Is this expected?
Also, in this case, when using the NN, how can I calculate the loss function?
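One candidate I have in mind for the loss, as a rough sketch that I have not validated: treat the probability of terminated() as a prediction and compare it with the terminated flag returned by the environment, using binary cross-entropy. The function and argument names below are hypothetical, and in the real setup this would have to be computed on differentiable tensors rather than plain floats.

```python
import math


def bce_loss(p_terminated: float, terminated_label: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a predicted probability and a 0/1 label.

    p_terminated: assumed to be the probability read out for terminated().
    terminated_label: assumed to be 1.0 if the environment reported termination, else 0.0.
    """
    # Clamp the prediction away from 0 and 1 so the logarithms stay finite.
    p = min(max(p_terminated, eps), 1.0 - eps)
    return -(terminated_label * math.log(p)
             + (1.0 - terminated_label) * math.log(1.0 - p))
```

With this, a confident correct prediction gives a small loss and a confident wrong prediction gives a large one, which seems to be the behaviour I want, but I am not sure it is the right choice here.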
Another approach I'm considering is to calculate the probability of being over or under the limits depending on the action, but I don't know how to model it. Any ideas?
I don't know whether you have read the answer to the other email; let me know what you think.
Best,
Joaquin