Symbolic Regression

25 views
Skip to first unread message

Hakan Kjellerstrand

unread,
Sep 14, 2022, 3:30:41 PM9/14/22
to Picat
The last weeks I've written a Symbolic Regression program in Picat: http://hakank.org/picat/symbolic_regression.pi

It's a mix of puzzles, classic symbolic regression problems, and some - more or less silly or futile - experiments .

The GitHub commit is here:

Here is a simple example; It's the "facebook_puzzle" example in symbolic_regression.pi ; see data(facebook_puzzle_) :

1 + 4 = 5
2 + 5 = 12
3 + 6 = 21
5 + 8 = ?

What should "?" be?

The "+" should not be taken too literal but instead represent some (mostly mathematical) operation(s).
Is there some formula that satisfies all these equations?

The program finds some solutions (some are redundant since they mean the same thing), all indicating that "?" should be 45:
  [program = A * (4 + A),res = 45,count = 12]
  [program = A + B * A,res = 45,count = 11]
  [program = (A + 3) * A + A,res = 45,count = 5]
  [program = (B / B + B) * A,res = 45.0,count = 5]
  [program = (4 + A) * A,res = 45,count = 5]
  [program = A + A * B,res = 45,count = 4]
  [program = (3 + A + B / B) * A,res = 45.0,count = 4]
  [program = (B + 1) * A,res = 45,count = 3]
  [program = A * (B / B + B),res = 45.0,count = 3]
  [program = A * (B + 1),res = 45,count = 3]
  [program = A * (A + 3) + A,res = 45,count = 2]
  [program = A * 1 * (4 + A),res = 45,count = 2]

The advantage of doing this in Picat (in contrast to my Java based implementation http://hakank.org/jgap/ ) is that it is simple to add new functions such as  to_num2(X,Y) =Z  which ensures that 10*X+Y = Z .
For adding a new function, one have to add it both in the list of
   arities(new_fun) = 2. % the arity of new_fun
as well as
  eval('new_fun'(X,Y) =  Val =>
     ...
 
  [program = D * (A + 5),res = 40,count = 42]
  [program = C + D + E,res = 25,count = 23]
  [program = E * D + D,res = 45,count = 23]
  [program = D * (E + 1),res = 45,count = 22]
  [program = E + (D + C),res = 25,count = 21]
  [program = (E + 1) * D,res = 45,count = 17]
  [program = E + (C + D),res = 25,count = 16]
  [program = D + D * E,res = 45,count = 15]
  [program = D + C + E,res = 25,count = 14]
  [program = E + C + D,res = 25,count = 11]
  [program = (D * 1 + 4) * D,res = 45,count = 11]
  [program = D + E * D,res = 45,count = 9]
  [program = (A + 5) * D,res = 40,count = 7]
  [program = (D + 4) * D,res = 45,count = 6]
  [program = A + 5 + (C + A),res = 23,count = 2]
  [program = 1 + A + E + C * 1,res = 24,count = 1]
  [program = D * E + D,res = 45,count = 1]
  [program = D + (C + E),res = 25,count = 1]


With another representation of the same problem (in data(facebook_puzzle2, ...) we get some completely different results, e.g. 40, 25, and 24:
The genetic operators are the traditional crossover and mutation but I've made it a little simple and only crossover/mutate the top arguments (via the conversion of a function to a list using =../2). So there's room for improvement here.

Note that the program might struggle to find optimal solutions, especially for problems with float values or if it's many examples.

(This program can be seen as an extension/improvement of my function induction program: http://www.hakank.org/picat/#symbolic_function_induction
which tries to solve about the same type of problems but it uses enumeration to find solutions (in principle all solutions) via backtracking, though it - as can be expected - struggles with problems with a large search space.
One major difference is that symbolic_function_induction.pi uses strings to represent the functions, while symbolic_regression.pi uses a Picat structure ("function").
)

If you have any great ideas to improve this program, especially for speeding it up, please contact me.

/Hakan

Reply all
Reply to author
Forward
0 new messages