Consider a string of length 9, where each position is associated with three non-negative variables representing the "A-ness", "B-ness" and "C-ness" of that position. We impose the constraint that the sum of these variables is 1 for each position, since there should be a total of one character per position. So all in all there are 27 non-negative variables which sum to 1 in triples, if that makes sense.
1. Multiplying the variables:
The number of ABC's in the first three positions is 0.9 * 0.8 * 0.8 = 0.576, since there are 0.9 A's in the first position, 0.8 B's in the second position and 0.8 C's in the third position. The number of BCA's is 0.1 * 0.1 * 0.002 is wrong, sorry, 0.1 * 0.1 * 0.2 = 0.002, since there are 0.1 B's in the first position, 0.1 C's in the second position and 0.2 A's in the third position. The number of CBA's is 0 since there are 0.0 C's in the first position.
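In code, the first three positions of this example are just a small table (the entries not spelled out above, 0.1 A's in the second position and 0.0 B's in the third, are implied by the sum-to-1 constraint):

P = [
    [0.9, 0.1, 0.0],   # position 1: 0.9 A's, 0.1 B's, 0.0 C's
    [0.1, 0.8, 0.1],   # position 2: the 0.1 A's are implied by the sum-to-1 constraint
    [0.2, 0.0, 0.8],   # position 3: likewise for the 0.0 B's
]
A, B, C = 0, 1, 2
print(P[0][A] * P[1][B] * P[2][C])   # ABC's in the first window: ~0.576
print(P[0][B] * P[1][C] * P[2][A])   # BCA's: ~0.002
print(P[0][C] * P[1][B] * P[2][A])   # CBA's: 0.0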
2. Raising some number to a sum of variables:
The number of ABC's is b^(0.9 + 0.8 + 0.8) / b^3 = b^-0.5, the number of BCA's is b^(0.1 + 0.1 + 0.2) / b^3 = b^-2.6 and the number of CBA's is b^(0.0 + 0.8 + 0.2) / b^3 = b^-2.0. If we choose b = 3 for example we get 0.577 ABC's, 0.057 BCA's and 0.111 CBA's.
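The same three positions again, this time with b = 3:

b = 3.0
print(b ** (0.9 + 0.8 + 0.8) / b ** 3)   # ABC's: ~0.577
print(b ** (0.1 + 0.1 + 0.2) / b ** 3)   # BCA's: ~0.057
print(b ** (0.0 + 0.8 + 0.2) / b ** 3)   # CBA's: ~0.111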
Given these counts, the objective function can be formed in different ways, for example:
1. Maximizing (the logarithm of) the product of the permutation counts.
3. Minimizing the permutation counts' deviations from 1 in the least squares sense. This function also penalizes strings where permutation counts grow unnecessarily large.
There's a jungle of methods for solving this type of problem and I have little knowledge of the subject, but I'm aware that it's generally very difficult to find the global minimum of a non-convex function (or the global maximum of a non-concave one).
I experimented with a simple gradient descent method with logarithmic barrier functions and the results were a bit discouraging. I think I found the length 9 superpermutation in about half of the runs when I started with random values. This could probably be improved by tuning various parameters.
This was a while back and I don't remember all the details, so I'm sorry if my explanation is unclear.
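For concreteness, the general shape was something like the sketch below. This isn't my original code: the constants are arbitrary, the gradient is taken numerically just to keep it short, and the sum-to-1 constraints are handled with a simple quadratic penalty here (log barriers only cover the inequality constraints). It uses objective 3 and it won't reliably land on the length 9 superpermutation, but it shows the idea.

import itertools
import numpy as np

n, m = 3, 9                                    # alphabet size, string length
perms = list(itertools.permutations(range(n)))

def perm_counts(P):
    # Expected count of each permutation, using the product formula.
    return np.array([
        sum(np.prod([P[i + j, c] for j, c in enumerate(p)])
            for i in range(m - n + 1))
        for p in perms
    ])

def objective(x, mu):
    P = x.reshape(m, n)
    least_squares = np.sum((perm_counts(P) - 1.0) ** 2)       # objective 3 above
    barrier = -mu * np.sum(np.log(np.clip(P, 1e-12, None)))   # keep entries positive
    rows = np.sum((P.sum(axis=1) - 1.0) ** 2)                 # sum-to-1, as a soft penalty
    return least_squares + barrier + 100.0 * rows

def num_grad(f, x, eps=1e-6):
    # Central-difference gradient, purely to keep the sketch short.
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        g[i] = (f(x + d) - f(x - d)) / (2 * eps)
    return g

x = np.random.default_rng().uniform(0.2, 0.8, size=m * n)     # random starting values
mu, lr = 1e-2, 1e-2
for _ in range(2000):
    x = np.clip(x - lr * num_grad(lambda y: objective(y, mu), x), 1e-9, None)
    mu *= 0.999                                               # slowly weaken the barrier

P = x.reshape(m, n)
print("".join("ABC"[i] for i in np.argmax(P, axis=1)))        # rounded string
print(perm_counts(P).round(3))                                # how close each count is to 1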
On Mon, 8 Jul 2019 at 21:41, Vidar <vid...@gmail.com> wrote:
> I experimented with a simple gradient descent method with logarithmic barrier functions and the results were a bit discouraging.
You may know this already, but this trick is known as "one-hot encoding" in machine learning: https://en.wikipedia.org/wiki/One-hot. In statistics such variables are apparently known as "dummy variables" (https://en.wikipedia.org/wiki/Dummy_variable_(statistics)), and in linear programming I think they're called "switch variables".

A common way of building a classifier is to encode the classification of each training example using one-hot encoding, then train a neural network (or other system with continuous outputs) to reproduce these. When the trained system is then run on new data, it outputs its estimate of the probability that a datapoint belongs to each class.
> 2. Raising some number to a sum of variables:
> The number of ABC's is b^(0.9 + 0.8 + 0.8) / b^3 = b^-0.5, the number of BCA's is b^(0.1 + 0.1 + 0.2) / b^3 = b^-2.6 and the number of CBA's is b^(0.0 + 0.8 + 0.2) / b^3 = b^-2.0. If we choose b = 3 for example we get 0.577 ABC's, 0.057 BCA's and 0.111 CBA's.

Wait, what? I don't understand why you're doing this. The expected total number of ABCs is P(ABC at position 1) + P(ABC at position 2) + ... + P(ABC at position 7). Where does the parameter b come from?
> 3. Minimizing the permutation counts' deviations from 1 in the least squares sense. This function also penalizes strings where permutation counts grow unnecessarily large.

This sounds promising!
What constraints are you encoding in your barrier functions? From your first message, it sounds like
- all probabilities lie between 0 and 1
- each position's probabilities sum to 1
This should be enough to rule out the trivial solution: set each probability to (m-n+1)^(-1/n), where n is the number of possible characters and m is the length of the string, so the expected number of occurrences of each permutation is (m-n+1)*((m-n+1)^(-1/n))^n = 1.
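For n = 3 and m = 9 this is easy to check numerically:

n, m = 3, 9
p = (m - n + 1) ** (-1 / n)   # ~0.523
print((m - n + 1) * p ** n)   # expected count of every permutation: 1.0
print(n * p)                  # but one position's entries sum to ~1.57, not 1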
I think you could get some mileage out of adding additional constraints, if you're not already doing so:
- the first characters are always ABC...
- each character is different from the one before it
You could maybe also add some terms in your objective function to penalise higher-weight edges (indicated by a character occurring less than n positions apart, with smaller gaps between occurrences being penalised more heavily).
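To be concrete, I'm imagining a term roughly like the one below added to the objective, where P is the relaxed matrix of position weights (this is just one way the gap weighting could be chosen):

import numpy as np

def gap_penalty(P, n):
    # P is an (m, n) array of per-position character weights. Penalise the same
    # character recurring fewer than n positions apart, with smaller gaps
    # weighted more heavily. This is only one way of choosing the weights.
    m = P.shape[0]
    total = 0.0
    for gap in range(1, n):        # gaps of 1 .. n-1 are "too small"
        weight = n - gap           # a gap of 1 is penalised most heavily
        for i in range(m - gap):
            total += weight * np.sum(P[i] * P[i + gap])
    return total

print(gap_penalty(np.full((9, 3), 1 / 3), 3))   # e.g. on the fully uniform 9x3 matrix

It could just be added to the main objective with a coefficient to tune.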
Also, and sorry for not saying this in my first reply: hi, and welcome! It's great to have you in the group, and I for one am intrigued by your approach: I'm not aware of anyone having tried it before. I'm really interested to see your code!
In general, I am quite pessimistic about using gradient descent for combinatorial optimization - there aren't really clear gradients to navigate for this type of problem (not sure I can make this statement more rigorous).
However, I do think the idea of optimizing a relaxed version of the problem using some kind of numeric method is really promising!
I don't have time to implement either of these myself, but here are some ideas:
1) Start by minimizing a version of the objective function with no barriers/no constraints, and then gradually "warm up" the strength of the barrier function in relation to the objective.
2) Generally to get a list of probabilities, you use the softmax activation function - so the model's outputs can be unconstrained logits which are then converted into probabilities by the softmax. (Granted, I know you're not using a neural network, but the idea should still work - see the sketch after this list.)
3) There are neural network models which are specifically designed to analyze/output sequential data - LSTMs and GRUs in particular. Perhaps one of those models could be modified to output a list of characters, starting from the seed "abc" or some permutation, with the objective function being related to how close the output is to a superpermutation - for example this semi-recent paper using reinforcement learning to solve TSP. https://arxiv.org/pdf/1611.09940.pdf
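To illustrate 2) without a neural network, using the 27 variables from the n = 3 example in the first message:

import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract the row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

m, n = 9, 3
logits = np.random.randn(m, n)   # 27 unconstrained parameters
P = softmax(logits)              # each row is now a valid probability vector
print(P.sum(axis=1))             # all ones, so no barrier is needed for these constraints

With that parameterization, the optimizer only has to worry about the objective itself.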
I should have motivated this. The product formula has the disadvantage (?) that for instance ABCDDD counts as zero ABCDEF's, and the gradient of the objective function doesn't even indicate the direction toward ABCDEF. The exponential formula was just an attempt to fix this, but it certainly doesn't represent a probability.
> What constraints are you encoding in your barrier functions? From your first message, it sounds like
> - all probabilities lie between 0 and 1
> - each position's probabilities sum to 1
> This should be enough to rule out the trivial solution: set each probability to (m-n+1)^(-1/n), where n is the number of possible characters and m is the length of the string, so the expected number of occurrences of each permutation is (m-n+1)*((m-n+1)^(-1/n))^n = 1.
Yes, those are the constraints, but I'm not sure we're talking about the same probabilities. What do you mean by the trivial solution?
> You could maybe also add some terms in your objective function to penalise higher-weight edges (indicated by a character occurring less than n positions apart, with smaller gaps between occurrences being penalised more heavily).

Great suggestions! I'll try this.
> Also, and sorry for not saying this in my first reply: hi, and welcome! It's great to have you in the group, and I for one am intrigued by your approach: I'm not aware of anyone having tried it before. I'm really interested to see your code!

Thanks! :) I'll share the code after I've fixed a few things.
In the examples I have looked at, no permutation of k distinct symbols occurs more than 0! + 1! + 2! + ... + (n - k)! times in a minimal superpermutation of n symbols. Maybe this is a useful constraint.
Sorry, I was unclear: by "the trivial solution", I meant "all entries are equal, such that each permutation has one expected appearance spread evenly over the whole string". In general, this "solution" does not meet the "entries are probabilities" constraints.
Good luck! After Cory's post, I'm wondering if we could represent these penalties using a convolutional layer...
> In the examples I have looked at, no permutation of k distinct symbols occurs more than 0! + 1! + 2! + ... + (n - k)! times in a minimal superpermutation of n symbols. Maybe this is a useful constraint.

Interesting! Have you checked this against all the example superpermutations in the repo?
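Something like the sketch below would do for that check, I think; here it's only run on the n = 3 superpermutation:

from itertools import permutations
from math import factorial

def max_occurrences(s, k):
    # Largest number of times any permutation of k distinct symbols of s
    # occurs as a substring of s.
    best = 0
    for perm in permutations(sorted(set(s)), k):
        p = "".join(perm)
        best = max(best, sum(s[i:i + k] == p for i in range(len(s) - k + 1)))
    return best

def bound(n, k):
    return sum(factorial(i) for i in range(n - k + 1))   # 0! + 1! + ... + (n-k)!

s = "123121321"   # the minimal superpermutation on 3 symbols
for k in range(1, 4):
    print(k, max_occurrences(s, k), bound(3, k))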