Estimation of distribution algorithms with dependency learning (2009)

Abstract
For an Estimation of Distribution Algorithm (EDA) to succeed in optimization, it is important to define a model that approximates the fitness landscape appropriately, while at the same time simplifying the problem enough to make it easy to solve. To trade off the complexity of the distribution model in an EDA against the cost of learning it, this thesis proposes a new framework, the Estimation of Dependency and Distribution Algorithm (EDDA), which chooses an appropriate learning model automatically. Basically, EDDA partitions an individual representation into separate parts that are independent with respect to the fitness function. The independent parts of the individual representation are evolved separately, each with its own distribution model. The combination of the optima of the independent parts forms the optimum of the complete individual representation. For problems that cannot be partitioned into completely independent parts, EDDA also maintains information about the interdependencies between the separate parts and evolves those interdependencies. The complexity of a model is determined adaptively by the amount of dependency information maintained in the model.

EDDA has several advantages over standard Evolutionary Computation. First, partitioning the individual representation and evolving the independent parts separately reduces the size of the search space significantly, so the global optimum becomes easier to find than in the original space. Second, important dependency information between the separate parts is maintained while trivial dependencies are ignored, so the complexity of the model is set at an appropriate level. Third, it is easy to control the diversity and convergence of the sub-populations of the separate parts of the individual representation, because each sub-population has only a few dimensions. Fourth, compared to other EDAs, EDDA learns the distribution model from all the individuals in the population together with their fitness, and thus estimates a better approximation of a more complete fitness landscape.

Based on the EDDA framework, four algorithms have been developed for different problems.

A new Genetic Algorithm with Independent Component Analysis (GA/ICA) is proposed for unconstrained function optimization. GA/ICA uses ICA to project the original space into a new space in which the dimensions are independent of each other with respect to the fitness function. Dividing a solution into independent parts and evolving the parts separately clearly makes the problem easier than evolving in the original space. The experiments show that GA/ICA requires far fewer function evaluations to produce optimal or close-to-optimal solutions that are better than or comparable to those produced by the Orthogonal Genetic Algorithm on the benchmark problems.

In a parallel development to GA/ICA, a novel Instruction Matrix based Genetic Programming (IMGP) is designed to evolve programs for problem solving. IMGP evolves instructions separately while maintaining the interdependencies between the instructions in the form of subtrees. It can be shown that IMGP actually evolves some schemata directly, and it is therefore efficient and effective in searching for the global optimum. The experimental results verify that IMGP outperforms canonical Genetic Programming and other related algorithms on both benchmark Genetic Programming problems and classification problems.
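As a rough illustration of the GA/ICA idea described above, the following is a minimal sketch, not the thesis's implementation: it assumes scikit-learn's FastICA and a toy sphere fitness, and it approximates "independence with respect to the fitness function" by statistical independence of the population sample. The per-dimension resampling loop is a simplified stand-in for the thesis's actual GA operators.

```python
import numpy as np
from sklearn.decomposition import FastICA  # assumed here; the thesis may use a different ICA routine

def sphere(x):
    """Separable benchmark fitness (an illustrative stand-in for the thesis's benchmarks)."""
    return np.sum(x ** 2, axis=-1)

rng = np.random.default_rng(0)
pop = rng.uniform(-5.0, 5.0, size=(200, 10))         # population of candidate solutions

# Project the population into a space whose components are (approximately)
# statistically independent, so each dimension can be evolved on its own.
ica = FastICA(n_components=10, random_state=0)
proj = ica.fit_transform(pop)

for _ in range(50):
    fit = sphere(pop)
    elite = proj[np.argsort(fit)[:50]]               # keep the 50 best, in ICA coordinates
    # Evolve each independent dimension separately: resample it around the
    # elite values of that dimension alone, ignoring the other dimensions.
    new_proj = np.column_stack([
        rng.choice(elite[:, d], size=200) + rng.normal(0.0, 0.1, size=200)
        for d in range(elite.shape[1])
    ])
    pop = ica.inverse_transform(new_proj)            # map back to the original space
    proj = new_proj

print("best fitness:", sphere(pop).min())
```

The point of the sketch is the structure: selection and variation happen per dimension in the ICA space, and only the mapping back to the original space recombines the parts into a complete solution.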
EDDA is then applied to an important bioinformatics problem, computational motif discovery in DNA sequences. The Estimation of Distribution Algorithm for Motif Discovery (EDAMD) employs a Gaussian distribution to model the distribution of the motif consensuses in the population; the Gaussian distribution can capture the bi-variate linear dependencies between motif positions. A fast local search method is used to find a set of motif instances from a motif consensus sampled from the Gaussian distribution. EDAMD achieves better performance than other Genetic Algorithms on the real problems tested. A new deterministic algorithm, the Cluster Refinement algorithm for Motif Discovery (CRMD), is also designed for this problem. Rather than evolving a population of motif consensuses, CRMD clusters all the subsequences such that each cluster already maximizes part of the motif objective function. From the clusters, CRMD identifies the corresponding sets of motif instances by maximizing the objective function. On a variety of benchmark problems of differing difficulty and character, CRMD performs better than the state-of-the-art algorithms tested.
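The following toy sketch shows the sample-then-search shape of EDAMD under an assumed real-valued encoding of bases; the thesis's actual encoding, covariance update, and fast local search are not reproduced, and the greedy window scan below is only a stand-in for the latter.

```python
import numpy as np

rng = np.random.default_rng(1)
BASES = "ACGT"
W = 6  # motif width (illustrative)

# Toy model: each motif position is encoded as a real value in [0, 4) whose
# integer part indexes a base; a multivariate Gaussian over these values can
# capture pairwise (bi-variate) linear dependencies between positions.
mean = rng.uniform(0, 4, size=W)
cov = 0.25 * np.eye(W)

def sample_consensus():
    """Sample a motif consensus string from the Gaussian model."""
    vals = rng.multivariate_normal(mean, cov)
    return "".join(BASES[int(v) % 4] for v in vals)

def best_instance(seq, consensus):
    """Greedy stand-in for EDAMD's fast local search: return the window of
    seq that matches the sampled consensus in the most positions."""
    scores = [sum(a == b for a, b in zip(seq[i:i + W], consensus))
              for i in range(len(seq) - W + 1)]
    i = int(np.argmax(scores))
    return seq[i:i + W], scores[i]

seqs = ["ACGTACGTAC", "TTACGTGGCA", "GGGACGTATT"]  # toy DNA sequences
consensus = sample_consensus()
instances = [best_instance(s, consensus) for s in seqs]
print(consensus, instances)
```

In the full algorithm the recovered instances would feed back into the Gaussian model (updating the mean and the covariance, which carries the inter-position dependencies), whereas this sketch shows a single sample-and-match step.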
Keywords
motif instance, dependency learning, motif consensus, Gaussian distribution, independent part, benchmark problem, distribution algorithm, distribution model, original space, separate part, individual representation