Paper: Deep Symbolic Regression for Recurrent Sequences

richie

Jan 29, 2022, 5:12:25 PM
to Multi Expression Programming
Hi!

I just started watching the following video, which reminded me of MEPX:


How would you compare that method to your work on MEPX?

I tried to reproduce some series using the new time-series prediction feature in MEPX. I succeeded in getting MEPX to predict the Fibonacci series and the sample series that Yannic mentioned at the beginning of the video (1,2,1,3,1,4,...). But I failed, for example, to get "Final two digits of 2^n" (https://youtu.be/1HEdXwEYrGM?t=1255), which is interesting. I see that many expressions generated by the approach explained in the video contain the % symbol, which I guess denotes the modulus. MEPX doesn't seem to include a modulus operator: would it be possible to add it to the list of operators, or is there a reason why it was not included?
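For reference, the "final two digits of 2^n" sequence satisfies a one-step recurrence once a modulus and the constant 100 are available: a(n) = (2 * a(n-1)) mod 100. Below is a minimal C sketch using integer arithmetic only. It assumes the sequence starts at 2^0 = 1, though the indexing in the video/paper may differ. The function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: last two decimal digits of 2^n, computed with
   integers only, via a(n) = (2 * a(n-1)) % 100 and a(0) = 1. */
int last_two_digits_pow2(int n)
{
    assert(n >= 0);
    int a = 1;                 /* a(0) = 2^0 = 1 */
    for (int i = 1; i <= n; i++)
        a = (2 * a) % 100;     /* keep only the last two digits */
    return a;
}
```

Each term depends only on the previous one. So, in principle, a time-series model with a window of one previous value can express this sequence, provided it has an integer modulus and the constant 100.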

Kind regards,
Richie

Mihai Oltean

Jan 29, 2022, 9:17:27 PM
to me...@googlegroups.com
Hi,

The internal, default representation in MEPX is real numbers (double).
The % operator is defined for integers.
For real numbers, the modulus operation is not well defined (C has a function called fmod, but it does not work well in all cases).
I could try to add it, but I would have to add some warnings (that all numbers should be integers, that no operation generating real numbers should be included in the set of mathematical functions, etc.).

Another possibility is to add integers as an internal representation in MEPX and let the user choose it ... but this will take more time and work.

As for the video, I have not watched it yet ... it seems to be very long. I will watch it soon.

regards,
Mihai




Mihai Oltean

Jan 30, 2022, 4:30:41 AM
to me...@googlegroups.com
Hi Richie,

I have now added fmod (the modulus for real numbers) to MEPX.
Please download version 2022.1.30.0 from the website.

I have tested it on the problem you mentioned (the last 2 digits of 2 to the power n).
I took the data from the paper and set some common-sense parameters (functions +, -, *, /, fmod, min, max, floor, ceil). I set the problem type to time series with no constants, and ... it worked from the first run!
The first run reached 0 error, but later runs ended with an error greater than 0.
I predicted 10 more values and they are correct, which is good: it indicates that the formula is correct.

However, the formula is quite complex.
This is mainly because no constants are used. Having real constants in the context of integer data is more complicated. This is why I added the floor and ceil functions: they round values so that they are better suited to fmod.
What is definitely needed is an implementation where all data are integers, all operations generate only integers, and all constants are integers. That will definitely help.
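Below is a minimal sketch of what an integer-only function set might look like, with every operator mapping integers to integers. The operator names are hypothetical. The "protected" convention for division and modulus (return the dividend when the divisor is 0) is an assumption for illustration, not MEPX's documented behavior.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical integer-only operator set: every operator maps integers
   to integers, so no floor/ceil rounding is needed around the modulus.
   (Overflow checks for + and * are omitted in this sketch.) */
long long op_add(long long a, long long b) { return a + b; }
long long op_mul(long long a, long long b) { return a * b; }

/* Protected division, truncating toward zero. Returning a when b == 0
   (and on the LLONG_MIN / -1 overflow) is an assumed convention. */
long long op_div(long long a, long long b)
{
    if (b == 0 || (a == LLONG_MIN && b == -1))
        return a;
    return a / b;
}

/* Protected modulus with the same assumed convention; x % -1 is always 0,
   handled separately to avoid the LLONG_MIN % -1 overflow. */
long long op_mod(long long a, long long b)
{
    if (b == 0)
        return a;
    if (b == -1)
        return 0;
    return a % b;
}
```

If the integer constant 100 were available, the rule for the last two digits of 2^n would become op_mod(op_add(a, a), 100).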

If I added some fixed constants (like 100), the formula would most likely be simpler.

I have attached a project and a screenshot.

Thank you for the suggestion with modulus and best regards,
Mihai



last_2_digits_of_2_to_power_n.png
last_2_digits_of_2_to_power_n.xml

richie

Jan 30, 2022, 8:10:21 AM
to Multi Expression Programming
Hi Mihai,

Thank you for the explanations and for the updated version!

I hope the additional operator helps to further improve the generalization capabilities of MEPX. I wonder whether any other operator could be useful (one that cannot easily be derived from those already present)...

Kind regards,
R.

richie

Jan 30, 2022, 8:51:47 AM
to Multi Expression Programming
Hi Mihai,

BTW, is the "Random seed" feature still working in the latest versions of the software? I'm testing the last two versions, and I see no difference between runs started with different seeds.

Best,
Richie

Mihai Oltean

Jan 30, 2022, 9:12:02 AM
to me...@googlegroups.com
I must check that, because I made some changes there recently.

regards,

Mihai Oltean

Jan 30, 2022, 9:34:19 AM
to me...@googlegroups.com
Hi Richie,

I have tested the problem of the last 2 digits of 2 to the power n a little more.
I also added constants (5 constants in the 0..100 interval).
I obtained a perfect solution in 9 out of 10 runs, which is much better than without constants.
Also, the resulting programs are much shorter now.

For instance:

#include <math.h>

void mepx(double *x /*inputs*/, double *outputs)
{
  //constants ... (only constants[1] is used by this program)
  double constants[5];
  constants[1] = 99.020793;

  double prg[5];
  prg[0] = x[0];                  // previous term, a[n-1]
  prg[1] = prg[0] + prg[0];       // 2 * a[n-1]
  prg[2] = constants[1];
  prg[3] = fmod(prg[1], prg[2]);  // fmod
  prg[4] = floor(prg[3]);

  outputs[0] = prg[4];
}

which computes floor(fmod(2 * a[n-1], 99.020793)), i.e. the remainder of 2 * a[n-1] modulo 99.020793, rounded down.

I have attached the project.

regards,
last_2_digits_of_2_to_power_n_with_constants.xml

mihai....@gmail.com

Aug 22, 2022, 10:05:40 AM
to Multi Expression Programming
Hi Richie,

I've now added support for the integer data type to MEPX.

I've tested it on some of the problems from the paper mentioned in this thread.

I've added some examples to the projects folder of the Windows archive
and here: https://github.com/mepx/mepx-binaries/tree/master/mepx-projects (all 8 sequences from the paper are there).

Note, however, that I had to adjust the parameters a little.
For instance, the window size is a parameter that must be set by a human; it is not discovered automatically by the program.
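To illustrate what the window size controls, here is a C sketch that turns a sequence into supervised training rows. Each row holds the w previous values as inputs and the next value as the target. The helper make_windows and the oldest-first input order are assumptions, not MEPX's internal code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: builds training rows for window size w from
   seq[0..len-1]. Row r uses seq[r..r+w-1] as inputs (oldest first) and
   seq[r+w] as the target. inputs needs room for (len - w) * w values,
   targets for (len - w) values. Returns the number of rows, or 0 if
   the sequence is too short. */
size_t make_windows(const int *seq, size_t len, size_t w,
                    int *inputs, int *targets)
{
    assert(w >= 1);
    if (len <= w)
        return 0;
    size_t rows = len - w;
    for (size_t r = 0; r < rows; r++) {
        for (size_t j = 0; j < w; j++)
            inputs[r * w + j] = seq[r + j];
        targets[r] = seq[r + w];
    }
    return rows;
}
```

For 1,2,1,3,1,4 with w = 2 this yields the rows (1,2) -> 1, (2,1) -> 3, (1,3) -> 1 and (3,1) -> 4.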

best regards,
Mihai

richie

Aug 25, 2022, 9:37:56 AM
to Multi Expression Programming
Hi Mihai,

many thanks for letting me know!

> For instance, the window size is a parameter that must be set by human and is not automatically discovered by the program.

This is interesting. Is there any chance that automatic discovery of the optimal window size will be added in the future, or is it an intrinsic limitation of MEPX?
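One simple heuristic could inform such a feature. It is an assumption here, not something MEPX is stated to do. A formula a[n] = f(a[n-w], ..., a[n-1]) can exist only if no two identical windows are followed by different values. The C sketch below returns the smallest window size that passes this check. The check is necessary but not sufficient: with few terms, a large w passes trivially because all windows are distinct.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if, for window size w, every pair of identical windows
   seq[i..i+w-1] == seq[j..j+w-1] is followed by the same next value.
   Brute force, O(len^2 * w); fine for short sequences. */
int window_is_consistent(const int *seq, size_t len, size_t w)
{
    for (size_t i = 0; i + w < len; i++)
        for (size_t j = i + 1; j + w < len; j++)
            if (memcmp(seq + i, seq + j, w * sizeof *seq) == 0 &&
                seq[i + w] != seq[j + w])
                return 0;   /* same inputs, different targets */
    return 1;
}

/* Hypothetical helper: smallest w in [1, max_w] passing the check,
   or 0 if none does. */
size_t smallest_consistent_window(const int *seq, size_t len, size_t max_w)
{
    for (size_t w = 1; w <= max_w; w++)
        if (window_is_consistent(seq, len, w))
            return w;
    return 0;
}
```

For 1,2,1,3,1,4,1,5 a window of 1 fails (1 is followed by both 2 and 3), while a window of 2 passes.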

Thanks,
Richie

mihai....@gmail.com

Aug 27, 2022, 4:46:40 AM
to Multi Expression Programming
Hi,

I think that I can do that.
I'll try to implement it after I finish implementing multivariate time series.

best regards,

Mihai Oltean

Aug 27, 2022, 11:10:52 AM
to me...@googlegroups.com
Hi,

One more thing about the sequences in the paper mentioned above.

I see that some of them depend on the current index, for example:
u_n(x) = n
but n is not a parameter of the function u.

In this case, the actual formula depends on what the first number of the sequence is.
For instance, with the index starting at 1, if the sequence is 1,2,3,... then the formula can be u_n(x) = n, but if the sequence is 2,3,4,... then the actual formula is u_n(x) = n+1.
Thus, in such cases, you always have to provide all terms of the sequence from the beginning, not only the most recent ones as in time-series prediction.

MEP for time series, in its current form, discovers a function that depends on the previous values, like this:
u_n(x) = u_{n-1}(x)+1 

where n does not appear explicitly in the formula.
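The C sketch below illustrates this point; the function names are hypothetical. The closed form needs both the index n (1-based here) and the value of the first term. The recurrence needs only the previous value, so the same rule covers 1,2,3,... and 2,3,4,... alike; only its starting value differs.

```c
#include <assert.h>

/* Closed form: depends explicitly on the 1-based index n
   and on the value of the first term. */
int closed_form(int n, int first) { return n + (first - 1); }

/* Recurrence form: depends only on the previous term u_{n-1}. */
int recurrence(int prev) { return prev + 1; }

/* Term n (1-based) produced by iterating the recurrence from `first`. */
int by_recurrence(int n, int first)
{
    assert(n >= 1);
    int u = first;
    for (int i = 2; i <= n; i++)
        u = recurrence(u);
    return u;
}
```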

Because of that, MEP formulas are longer than those presented in the paper mentioned above.

If the function must depend explicitly on the position of the number in the sequence, the solution is a multivariate time series in which one column contains the index of each term.

Best regards,
Mihai
 


richie

Aug 28, 2022, 5:04:03 PM
to Multi Expression Programming
This is great news! Looking forward to it (and of course to the implementation of the multi-variate ts)!

Kind regards,
R.


mihai....@gmail.com

Feb 23, 2023, 2:10:53 PM
to Multi Expression Programming
Hi,

I just posted a new video showing how to discover a formula for the Josephus sequence.
Previously, it could not be discovered with only 1 variable.
This is why I have now added another variable: the index of the term.

Here is the video:

The formula discovered with MEPX seems to be simpler than the one found with deep learning (in the paper you mentioned):

F(n) = (F(n-1) + 1) % n + 1.
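The discovered formula can be checked against the well-known closed form of the Josephus problem in which every second person is eliminated. Writing n = 2^m + l with 0 <= l < 2^m, the survivor is 2l + 1. The C sketch below assumes F(1) = 1 and 1-based positions; the indexing used in the video may differ, and the function names are hypothetical.

```c
#include <assert.h>

/* Josephus survivor via the discovered recurrence:
   F(1) = 1, F(n) = (F(n-1) + 1) % n + 1. */
int josephus_recurrence(int n)
{
    assert(n >= 1);
    int f = 1;
    for (int i = 2; i <= n; i++)
        f = (f + 1) % i + 1;
    return f;
}

/* Closed form for every-second elimination:
   n = 2^m + l with 0 <= l < 2^m gives 2l + 1. */
int josephus_closed_form(int n)
{
    assert(n >= 1);
    int p = 1;
    while (p * 2 <= n)   /* largest power of two not exceeding n */
        p *= 2;
    return 2 * (n - p) + 1;
}
```

Note that the recurrence needs both the previous term and the index n. This matches the need for the extra index variable described above.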

Best regards,
Mihai