Fwd: [scallop-lang] A few more questions about Scallop.

Mayur Naik

Jun 3, 2023, 9:17:32 AM
to scallo...@seas.upenn.edu


---------- Forwarded message ---------
From: 'Joaquin' via scallop-lang <scallo...@seas.upenn.edu>
Date: Fri, Jun 2, 2023 at 3:39 PM
Subject: [scallop-lang] A few more questions about Scallop.
To: Scallop Lang+owners <scallop-l...@seas.upenn.edu>


Hello, it's Joaquin again. I'm working in a new environment because I was stuck on the other one. I have chosen the Mountain Car environment because I found its rules well explained and easy to implement. Its rules are:

There are 3 discrete deterministic actions:

    0: Accelerate to the left

    1: Don’t accelerate

    2: Accelerate to the right
    
Given an action, the mountain car follows the following transition dynamics:

velocity_{t+1} = velocity_t + (action - 1) * force - cos(3 * position_t) * gravity

position_{t+1} = position_t + velocity_{t+1}

where force = 0.001 and gravity = 0.0025. The collisions at either end are inelastic, with the velocity set to 0 upon collision with the wall.
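For reference, here is the transition in plain Python as I understand it (a sketch; I'm omitting the wall collisions and clipping the real environment does):

import math

FORCE, GRAVITY = 0.001, 0.0025

def step_dynamics(position, velocity, action):
    # action in {0, 1, 2}: accelerate left / don't accelerate / accelerate right
    velocity = velocity + (action - 1) * FORCE - math.cos(3 * position) * GRAVITY
    # The real environment also zeroes the velocity on wall collision;
    # omitted here for brevity.
    position = position + velocity
    return position, velocity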

The goal is to reach the flag placed on top of the right hill as quickly as possible; as such, the agent is penalised with a reward of -1 for each timestep.

The episode ends if either of the following happens:

    Termination: The position of the car is greater than or equal to 0.5 
    (the goal position on top of the right hill)

    Truncation: The length of the episode is 200.
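In code form (a sketch; the 0.5 goal position and the 200-step limit are taken straight from the rules above):

def episode_over(position, t, goal=0.5, max_steps=200):
    terminated = position >= goal  # reached the flag
    truncated = t >= max_steps     # ran out of time
    return terminated or truncated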

After thinking a bit about whether my approach was correct, I settled on the following (a rough code sketch follows below):
  • Define the rules of the system symbolically
  • Estimate the possible reward in a state for each action
  • Take the maximum of those rewards
  • RL system: have the agent take an action
  • Get the actual reward
  • Compare the maximum expected reward from the symbolic system with the obtained reward (this is the loss)
With this approach I expect to get a custom loss and the agent to learn fast, since the defined rules are presumably correct.
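Roughly, one step of what I have in mind looks like this (hypothetical names: symbolic_expected_rewards stands for the Scallop query below, and agent for whatever policy I end up using):

def training_step(agent, env, state, symbolic_expected_rewards):
    # Symbolic side: expected reward per action, from the Scallop rules.
    expected = symbolic_expected_rewards(state)  # e.g. {0: r0, 1: r1, 2: r2}
    best_expected = max(expected.values())

    # RL side: the agent acts and we observe the real reward.
    action = agent.act(state)
    next_state, reward, terminated, truncated, info = env.step(action)

    # Custom loss: the gap between the symbolic estimate and reality.
    loss = (best_expected - reward) ** 2
    agent.update(loss)
    return next_state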

So I modelled my system like this:

ctx = scallopy.ScallopContext(provenance="topkproofs")

# Relations:
ctx.add_relation("state0", float)
ctx.add_relation("state1", float)
ctx.add_relation("steps", float)
ctx.add_relation("force", float)
ctx.add_relation("gravity", float)
ctx.add_relation("state0_cos", float)
ctx.add_relation("action_list", float)

# Facts:    
ctx.add_facts("state0", state0)
ctx.add_facts("state1", state1)
ctx.add_facts("force", force)
ctx.add_facts("gravity", gravity)
ctx.add_facts("state0_cos", state0_cos)
ctx.add_facts("action_list", action_list)
ctx.add_facts("steps", steps)

# Rules:

ctx.add_rule("new_state1(state1 + (action_list * force) - (state0_cos * gravity)) = state1(state1), action_list(action_list), force(force), state0_cos(state0_cos), gravity(gravity)")

ctx.add_rule("new_state0(new_state0) = state0(state0), new_state1(new_state1), state0 + new_state1 == new_state0")

ctx.add_rule("expected_reward(expected_reward) = new_state0(new_state0), steps(steps), new_state0 - steps == expected_reward")

where action_list is a list of all the possible actions, [0, 1, 2].
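For completeness, this is the shape of the facts I'm passing in (illustrative values only; as I understand the scallopy examples, facts for a unary relation are lists of 1-tuples):

import math

state0 = [(-0.5,)]                        # current position
state1 = [(0.0,)]                         # current velocity
force = [(0.001,)]
gravity = [(0.0025,)]
state0_cos = [(math.cos(3 * -0.5),)]      # cos(3 * position), precomputed
action_list = [(0.0,), (1.0,), (2.0,)]    # one fact per possible action
steps = [(0.0,)]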

So my questions are:

  • When I do 
for prob, tup in ctx.relation("expected_reward"):
  print(prob, tup) 
print("---------------")

It only shows one reward, and it's zero. It should have a meaningful value and show 3 different rewards (one per action). I have looked at your sum tutorial in Python and my data seems okay. What am I doing wrong? I also get 0.0 (0.0,) when I try to compute new_state1 and new_state0, where it should show 3 different states.

  • The formula has an action - 1. When I tried to model it (as action_list - 1) I didn't know how to subtract 1 from the action list, so I had to do it outside the symbolic program. Is there an efficient way to do it inside Scallop? (See my attempt sketched after this list.)
  • I tried to mix int data with float data (because that's how it comes). Scallop gives me an error about mixing different data types. Is there any way to do this? So far I have solved it by converting everything to float.
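What I would have expected to work, by analogy with the arithmetic already in the head of my new_state1 rule (an untested guess on my part):

# Guess: compute (action - 1) inside a rule head, the same way new_state1
# already does arithmetic in its head. I haven't verified that this runs.
ctx.add_rule("shifted_action(a - 1.0) = action_list(a)")
ctx.add_rule("new_state1(v + s * force - c * gravity) = state1(v), shifted_action(s), force(force), state0_cos(c), gravity(gravity)")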
Once I solve this environment I plan to go further and, if I have enough time, try VizDoom.

About your last e-mail: it's perfect for me to schedule some short calls. I live in Spain, so I suppose we are 5 or 6 hours apart. I can schedule the calls at night, so that's no problem for me. Just tell me when you're available.

Let me know what you think about it.

Thanks for the time and support you're giving me.
Joaquin
