Functions used for individuals with initCycle() not run with algorithms.eaGenerateUpdate and cma.StrategyOnePlusLambda?


Glen van den Bergen

Aug 6, 2015, 9:50:39 PM
to deap-users
This is probably a stupid question, but I can't seem to figure it out on my own.

I register an individual like so:

toolbox.register("individual", tools.initCycle, creator.Individual, funcs_seq, n=1)

The funcs_seq is a tuple of functions that each return pairs of floats for particular types of amino acids in the protein sequence e.g. glycines, prolines etc.

I also register this strategy:

strategy = cma.StrategyOnePlusLambda(parent=parent, sigma=cma_sigma, lambda_=lmbda)

And this is the algorithm used:

pop, log = algorithms.eaGenerateUpdate(toolbox, ngen=run.gen_num, stats=stats, halloffame=hof, verbose=True)

Now the problem I'm having is that when I run the optimisation it never appears to execute the code in the functions contained in funcs_seq. I tested this by putting print statements in those functions, but nothing gets printed, so I assume they're never called. What I'm trying to determine is whether this is the intended behaviour when using the 1+lambda strategy with the generate+update algorithm. I get the sense that I'm misunderstanding how these work and shouldn't expect those functions to run.

But then I have to wonder, if that's the only way I've provided to generate individuals, how is it generating individuals without ever executing those functions? Does the strategy and/or algorithm take care of it entirely?

Would appreciate some expertise with this. Cheers.

Glen van den Bergen

Aug 9, 2015, 6:43:48 PM
to deap-users
Can anyone help?

François-Michel De Rainville

Aug 10, 2015, 4:34:34 PM
to deap-...@googlegroups.com
I'm sorry for the delay. I'll provide a very short answer for the moment as I haven't had time to read your entire problem.

Instead of using initCycle, try using your own function to initialize your sequences, for example:

def init_seq(icls, arg1, arg2, ...):
    part1 = numpy.random.random(3) * arg1
    part2 = numpy.random.randint(0, arg2, size=(2,))
    ...
    # icls is the class created for the individuals
    # it is initialized with a sequence
    ind = icls([part1, part2, ...])
    return ind

toolbox.register("individual", init_seq, creator.Individual, arg1=my_arg1, arg2=my_arg2, ...)

You can look at the PSO example for a complete example. IMHO, initCycle should be banned from the library (I introduced it and I'm not very proud). It is too complicated and does stuff that is way easier to do by hand anyway.

Also note that I haven't tested the code.

Cheers,
François-Michel




Glen van den Bergen

Aug 11, 2015, 3:54:28 AM
to deap-users
Thanks for responding, François. I'll try implementing it myself and see how that goes. However, I suspect I'll still end up with the same problem, where the CMA strategy and GenerateUpdate algorithm ignore the initialisation functions for individuals.

I'll report back with results.

François-Michel De Rainville

Aug 11, 2015, 8:33:56 AM
to deap-...@googlegroups.com
Actually, you are right! The first population is initialized through the generate method (it's been a really long time since I touched that code). I don't think this qualifies as a bug, but rather an API inconsistency. If you wish to use your initCycle, you can call your individual function directly to initialize the parent.

parent = toolbox.individual()

Note that the 1+lambda strategy expects a flat array.
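For example, if your initCycle produced (phi, psi) pairs, you could flatten them before building the strategy (untested sketch; the pair values here are made up):

```python
from itertools import chain

# Hypothetical (phi, psi) pairs, as an initCycle-style initializer might produce
pairs = [(-60.0, -45.0), (75.0, 160.0)]

# Flatten into the flat array of floats that StrategyOnePlusLambda expects
flat = list(chain.from_iterable(pairs))
```

The flattened list can then be wrapped in your Individual class and passed as the parent.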

Regards,
François-Michel

Glen van den Bergen

Aug 11, 2015, 8:18:15 PM
to deap-users
Is there a way I can have a little more control over the selection of values chosen by the CMA? Each float in the array is a dihedral angle for an amino acid (i.e. phi, psi). However, for some special-case amino acids, like proline and glycine, I want the CMA to select values from a different statistical distribution.

Do you know of a way that I could achieve this, or is the CMA not well suited for this kind of problem?

François-Michel De Rainville

Aug 13, 2015, 9:50:57 AM
to deap-...@googlegroups.com
The new samples in CMA-ES are produced using a multivariate Gaussian distribution. You can remap these values to another distribution (or range) without much worry. You can do the remapping at evaluation time (without modifying the original individual). This is called a genotype-phenotype transformation, where the genotype is produced by the CMA and the phenotype is evaluated. This is pretty standard in classical evolutionary computation.
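As a sketch of such a remapping (the per-residue statistics below are made-up illustrative numbers, not real amino-acid data), a linear genotype-to-phenotype transform could look like:

```python
# Illustrative per-residue statistics (made-up numbers, not real data)
AA_STATS = {
    "PRO": {"phi_mu": -63.0, "phi_std": 6.0, "psi_mu": 150.0, "psi_std": 30.0},
    "ALA": {"phi_mu": -65.0, "phi_std": 20.0, "psi_mu": -40.0, "psi_std": 25.0},
}

def genotype_to_phenotype(ind, aa_seq):
    """Remap CMA genotype values (roughly standard-normal) into
    residue-specific dihedral angles, without modifying `ind`."""
    pheno = []
    for aa, phi, psi in zip(aa_seq, ind[0::2], ind[1::2]):
        s = AA_STATS[aa]
        # Linear shift-and-scale into the residue's own distribution;
        # CMA-ES is invariant under this kind of mapping.
        pheno.append(s["phi_mu"] + s["phi_std"] * phi)
        pheno.append(s["psi_mu"] + s["psi_std"] * psi)
    return pheno
```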

Regards,
François-Michel


Glen van den Bergen

Aug 13, 2015, 7:35:56 PM
to deap-users
Ok, thanks for the suggestion. I'm new to this and so hadn't heard of the technique before. I did some Googling and found the following from Nikolaus Hansen.


However, I couldn't find any specific examples of it being used in DEAP. I'll continue searching and use what's available, but if you know of any specific examples where it's already used in DEAP, please let me know.

Thanks for your help François-Michel.

François-Michel De Rainville

Aug 13, 2015, 8:19:54 PM
to deap-...@googlegroups.com
Actually you don't need anything special. In your evaluation function you receive an individual. Just transform this individual before computing and returning its fitness.
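For instance (a sketch only; `remap` and the objective here are placeholders, not your real functions):

```python
def remap(x):
    # Placeholder linear remap into some target angle range (assumed)
    return -180.0 + 45.0 * x

def evaluate(individual):
    """Build the phenotype first, then score it; `individual` is never modified."""
    phenotype = [remap(x) for x in individual]
    return (sum(phenotype),)  # stand-in objective; DEAP fitness values are tuples
```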

Glen van den Bergen

Aug 13, 2015, 9:16:01 PM
to deap-users
Ok, just to clarify for myself and anyone else who might read this thread.

  1. The CMA generates individuals (collectively a "population").
  2. Each individual, in turn, is passed to the evaluation function, where it is first transformed based on additional distributions that I define, e.g. independent distributions for different amino acids such as proline, glycine etc.
  3. These transformed individuals then continue through the evaluation process, where their fitness is scored.
  4. The CMA process continues as normal, with the covariance matrix updated (based on the fitness scores of the evaluated individuals).
  5. A new population is generated from the updated matrix.

Is that a correct understanding?
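In code, I picture the loop roughly like this (my own sketch of the shape of eaGenerateUpdate, not DEAP's actual implementation; stats and hall-of-fame handling omitted):

```python
def ea_generate_update_sketch(strategy, evaluate, ngen):
    """Sketch of the generate-update loop, matching the numbered steps above."""
    for _ in range(ngen):
        population = strategy.generate()   # 1. sample a new population
        for ind in population:
            ind.fitness = evaluate(ind)    # 2-3. transform + score inside evaluate
        strategy.update(population)        # 4-5. refit the covariance matrix
```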

Glen van den Bergen

Aug 13, 2015, 10:22:43 PM
to deap-users
Sorry, another point I wanted to clarify is whether the covariance matrix has any awareness of these other distributions i.e. does it 'learn' to choose more appropriate phi and psi values for particular amino acids as it evaluates more individuals that have been transformed?

François-Michel De Rainville

Aug 14, 2015, 6:55:01 AM
to deap-...@googlegroups.com
Exactly.

No, the CMA-ES does not need to know about the other distributions, since it will optimize in the genotype space. In the best case, the mapping between the genotype space and the phenotype space is linear, in which case CMA-ES is invariant. In the worst case, a non-linear transformation will add ruggedness to the genotype space, which CMA-ES can also handle (up to a certain extent, at least).

Cheers

Glen van den Bergen

Aug 19, 2015, 11:47:56 PM
to deap-users
My attempt at this doesn't appear to have worked.

The progress of the evolution can be seen below.

You will notice that the evolution progress is relatively flat, whereas previously evolution progress looked more like this.

Notice the incremental improvement towards a fitness of 1.0, particularly in the first ~50 generations.


The evaluation function is implemented like this.


def evaluate(individual, deap_aa_seq, ref_angle, hydro_index, run, trans=True):
    if trans:
        trans_ind = transform_ind(individual, deap_aa_seq)
    else:
        trans_ind = individual

    structure = build_peptide_fast(trans_ind, deap_aa_seq)

    hpmv, unit_vec, norm_hpm, amph_fit = calculate_amph(structure, deap_aa_seq, ref_angle, hydro_index, run)
    assert 0 <= amph_fit <= 1, "Amphipathic fitness isn't normalised!"

    return amph_fit,


This is where the individual is transformed.


def transform_ind(ind, aa_seq):
    phis = ind[0::2]
    psis = ind[1::2]
    trans_ind = []
    append = trans_ind.append

    for aa, phi, psi in zip(aa_seq, phis, psis):
        trans_phi, trans_psi = get_dihedrals(AA_DICT_1_to_3[aa])
        append(trans_phi)
        append(trans_psi)

    return trans_ind


This is where relevant parameters for the statistical models are gathered.


def get_dihedrals(aa, num=1):
    phi_mu = AA_STATS[aa]['phi_mu']
    phi_std = AA_STATS[aa]['phi_std']
    psi_mu = AA_STATS[aa]['psi_mu']
    psi_std = AA_STATS[aa]['psi_std']
    covar = AA_STATS[aa]['covariance']

    if aa == 'GLY':  # special case for glycine because it's bimodal
        phi, psi = get_gly_angles(num)
    else:
        phi, psi = get_aa_angles(phi_mu, phi_std, psi_mu, psi_std, covar, num)

    return phi, psi


This is the function for all the amino acids, except glycine.


def get_aa_angles(phi_mu, phi_std, psi_mu, psi_std, covar, num=1):
    phi, psi = multivariate_normal([phi_mu, psi_mu], covar, size=num).T

    return phi, psi


And this is the function for selecting glycine angles (the parameters for the GMM are pre-computed based on available glycine data).


def get_gly_angles(num):
    g = mixture.GMM(n_components=2)
    g.converged_ = True
    g.covars_ = np.array([[302.20669879, 542.40686993], [238.31054863, 360.50980976]])
    g.means_ = np.array([[-70.62229728, -27.70590334], [85.05020584, 6.95850818]])
    g.weights_ = np.array([0.39771791, 0.60228209])
    xy_samples = g.sample(num)
    phi, psi = zip(*xy_samples)

    return phi, psi


What I assume is happening is that I'm not applying the changes to the transformed individual correctly, such that the CMA-ES is unable to "learn" in the genotype space because I keep overwriting the values it produces in the phenotype.


Any advice for how I might be able to address this?

Glen van den Bergen

Aug 20, 2015, 6:53:13 PM
to deap-users
I was just re-reading earlier posts in this thread and realised my error.

You can remap these values to another distribution (or range) without much worries. You can do the remapping at evaluation time (without modifying the original individual).

This is where I've made the error. I'm directly modifying the individual, which of course causes issues because the CMA-ES update step is still occurring on the original population, not the transformed one.

I'll investigate how I can remap values to another distribution, without modifying the individual. If you have any advice on how to do this, please let me know. 

François-Michel De Rainville

Aug 21, 2015, 8:10:28 AM
to deap-...@googlegroups.com
The best way would be to make sure the individuals are copied. If you have a single-level list:

l2 = l1[:]

or

l2 = list(l1)

will do the trick. If you have nested lists, you can use the copy module:

l2 = copy.deepcopy(l1)

With a numpy array you can use:

n2 = n1.copy()

Keep us updated on whether or not this was the bug.
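Putting it together, a transform that never mutates the original might look like this (untested; the remapping itself is a stand-in for your real one):

```python
def transform(ind):
    """Build the phenotype from a copy, never by mutating `ind` in place."""
    trans = list(ind)            # shallow copy; enough for a flat list of floats
    for i, x in enumerate(trans):
        trans[i] = x * 10.0      # stand-in for the real distribution remapping
    return trans
```

After `phenotype = transform(genotype)`, the genotype list the CMA update sees is untouched.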

Regards,
François-Michel

Glen van den Bergen

Aug 21, 2015, 8:33:20 PM
to deap-users
The thing I'm confused about is that even if I copy the individual, transform it, evaluate the transformed individual and return the fitness, the "update" step of the CMA still updates the matrix based on the original population of untransformed individuals. How does the CMA strategy learn to optimise the fitness of individuals that it can't "see"?

Do I need to somehow pass the population of transformed and evaluated individuals to the "update" step?

François-Michel De Rainville

Aug 25, 2015, 10:46:50 AM
to deap-...@googlegroups.com
Since the values (from evaluation) are mapped back to the original search space, searching in it makes no difference compared to searching in the transformed search space (under certain transformation hypotheses: CMA-ES is scale and rotation invariant).