COST Action IC0702 - SoftStat

Work Group B |

From a statistical point of view, metaheuristics like evolutionary algorithms can be interpreted as iterative estimation procedures if one assumes an underlying population model and sees the fitness functions as a distribution the mode of which is to be estimated. The convergence of an evolutionary algorithm to the true or best solution can then be seen as the consistency of this estimator. By similar reasoning other statistical properties of estimators (like unbiasedness, efficiency, sufficiency etc.) can be transferred to the metaheuristics framework, which may provide results about the expected number of steps (generations) to reach a solution (of given minimum quality) or the expected solution quality after a given number of steps (generations). Although there exists some seminal work along these lines, concrete results are often restricted to highly simplified models of genetic algorithms (small populations, special fitness functions, very restricted genetic operations etc.) or provide only fairly general or somewhat vague insights into why genetic algorithms work. It is definitely desirable to achieve a better and formally grounded understanding of the search as it is carried out by an evolutionary algorithm, for which statistical techniques are very well suited, since the evolutionary search is essentially a guided random process.

Estimation of distribution algorithms are a newer addition to the range of evolutionary algorithms, which make stronger use of statistical principles, although they are still based on the classical paradigms. Instead of forming new individuals for the next generation by recombining the (genetic descriptions of) the individuals of the current generation (as it is the case in all "classical" evolutionary algorithms), estimation of distribution algorithms use the current population to estimate a distribution on the search space, which describes (in probabilistic terms) where good solutions may be located and what the quality of these solutions can be expected to be. The next generation of individuals is then obtained from this estimated distribution by random sampling, so that there is no direct connection between the (genetic description of the) individuals of two consecutive populations, thus enhancing flexibility and sometimes convergence speed, but also introducing new problems. Since distribution estimation and sampling are used, statistical techniques can be expected to solve these problems while maintaining the advantages. The investigation of the properties of estimation distribution algorithms (w.r.t., for example, expected number of populations, expected solution quality) is in a somewhat better state than for standard evolutionary algorithms, but still needs a lot of improvement.

- João Miguel da Costa Sousa (leader)
- Ayşen Apaydın
- Bernard de Baets
- Jürgen Branke
- Christakis Charalambous
- Grégoire Danoy
- Bernabé Dorronsoro
- Cristian Gatu
- Lars Hildebrand
- Raimundas Jasinevicius
- Janusz Kacprzyk
- Uzay Kaymak
- Frank Klawonn
- Erricos Kontoghiorghes
- Bernd Reusch
- Nikolaos S. Thomaidis
- Özlem Türkşen
- Bülent Tütmez
- Susana Margarida da Silva Vieira
- Peter Winker