Genetic Algorithm

From GPWiki

Files:GUITutorial_warn.gif The Game Programming Wiki has moved! Files:GUITutorial_warn.gif

The wiki is now hosted by GameDev.NET at wiki.gamedev.net. All gpwiki.org content has been moved to the new server.

However, the GPWiki forums are still active! Come say hello.

Contents

Introduction

Nature seems to have an uncanny knack for problem-solving. Life began as a handful of simple, single-celled organisms barely equipped to survive the harsh environment of planet Earth. However, in the short span of a few billion years, nature has adapted and evolved them into beings complex enough to ponder their own origins. While this is indeed amazing, the truly incredible part is that it all happened according to a simple plan--allow individuals with favorable traits to survive and reproduce, and let die all the rest. This, in short, is the basis for a genetic algorithm.

The Algorithm

The basic genetic algorithm attempts to evolve traits that are optimal for a given problem. It has a wide variety of common uses, notably for balancing weights in neural networks. Although the algorithm itself is fairly simple, those who are new to the concept may have difficulty understanding it. I strongly recommend that this section be reread until it is fully understood. The following example should help explain things as well:

You are presented with a three-by-three grid of blank boxes. Your goal is to find nine integer values between 0 and 15 (inclusive) to insert into each box so that the sums of the values in every row and column are 24.

0 <= N <= 15
N N N
N N N
N N N
= 24
= 24
= 24
= = =
24 24 24

This problem is somewhat complex, and while a human may be able to solve it quickly (due to superior intelligence), a computer would take much longer to generate enough random values to solve it. Fortunately, the genetic algorithm is the perfect choice for solving this problem quickly and efficiently.

Generation Zero

The first step in the genetic algorithm is to create an initial population, generation zero, that contains a set of randomized strings of genes. Each string of genes, illustratively called a genome or chromosome, represents a series of traits that may or may not be useful for the problem at hand. These "genes" are usually represented by either binary digits or real numbers. Consider our example.

First, the random population of genomes is created. A genome will contain the information for all nine variables in the grid. Because these variables, or traits, can hold the values 0-15, they can be easily represented as groups of four bits, or genes. Therefore, each genome contains a string of 36 bits, with 4 bits for each variable. A sample genome might look like this:

Random Genome
Bits (Genes) 0110 1100 1111 1011 0100 1010 0111 0101 1110
Values (Traits) 6 12 15 11 4 10 7 5 14

Survival of the Fittest

Every genome in the population must now be assigned a fitness score according to how well it solves the problem at hand. The process and approach to measuring a genome’s fitness will be different for every problem. Determining the fitness measure is the most important and often most difficult part of developing a genetic algorithm.

In our example, we will plug the values into the grid and see how close they come to solving it.

6 12 15
11 4 10
7 5 14
= 33
= 25
= 26
= = =
24 21 39

The resulting values are 33, 25, 26, 24, 21, and 39. Clearly, this genome did not solve the problem correctly, but it did come reasonably close. Therefore, it is necessary to assess and quantify the fitness of this particular genome. In this case, there is a fairly straightforward way of doing so. First, find out how far off each resulting value was from the target 24 by taking the absolute value of its difference from 24. Then, find the inverse of the sum of the differences.

\frac{1}{|33-24| + |25-24| + |26-24| + |24-24| + |21-24| + |39-24|}


\frac{1}{9 + 1 + 2 + 0 + 3 + 15}


\frac{1}{30}\approx0.033

Thus, the fitness score for this genome is approximately 0.033. As the resulting values get closer to the target value, the fitness score will increase. But what happens when the denominator equals zero? Well, the denominator will only be zero if all of the resulting values are 24, so if the denominator is zero, we have found a solution and can end the genetic algorithm.

The Next Generation

Once the fitness for every genome is determined, it’s time to start building the next generation of genomes based on probability and fitness. This is the main part of the genetic algorithm, where the strong survive and the weak perish. It usually consists of these three parts:

Selection

Two genomes are selected randomly from the current population (reselection allowed), with fitter genomes having a higher chance of selection. The selected genomes, which should have a relatively high fitness score, are guaranteed to pass some of their traits to the next generation. This means that the average fitness of each successive generation will tend to increase.

The best way to program the selection function is through a method creatively named roulette selection. First, a random number between zero and the sum of the population’s fitness is generated. Imagine this value as a ball landing somewhere on a pie graph of the population’s fitness. Then, each genome’s fitness, or slice of the pie graph, is added one by one to a running total. If the ball ends up in that genome’s slice, it is selected. Observe the following pseudo-code:

RouletteSelection()
{
    float ball  = rand_float_between(0.0, total_fitness);
    float slice = 0.0;
 
    for each genome in population
    {
        slice += genome.fitness;
    
        if ball < slice
            return genome;
    }
}

Crossover

The two genomes now have a good chance of crossing over with one another, meaning that they will each donate a portion of their genes to form two offspring that become part of the next generation. If they do not cross over, they simply go on to the next generation unchanged. The crossover rate determines how often the genomes will cross over, and should be in the vicinity of 65-85%.

A crossover operation on the binary genomes in our example would begin by choosing a random position at which to cross them. The first part of the father’s genes and the second part of the mother’s genes combine to form the first child, with a similar effect for the second child. The following shows a crossover operation with the crossover point at 12.

Before Crossing
Father 011110010011 001011011000111011010000
Mother 010100111110 010101111101000100010010
After Crossing
Child 1 011110010011 010101111101000100010010
Child 2 010100111110 001011011000111011010000

Mutation

Just before the genomes are placed into the next generation, they have a slight chance of mutating. A mutation is simply a small, random change to one of the genes. With binary genes, mutation means flipping the bit from 1 to 0 or 0 to 1. With real number genes, a small, random perturbation is added to the gene. The mutation rate determines the chances for each gene to undergo mutation, meaning that every individual gene should get a chance to mutate. The mutation rate should be roughly 1-5% for binary and 5-20% for real numbers.

Procedure

Now that you are familiar with the general idea of a genetic algorithm, this summary of the procedure should cement the concept nicely in your mind. If you still don't understand, just be patient and try rereading the previous sections. It will come to you eventually.

  • Create an initial population of random genomes.
  • Loop through the genetic algorithm, which produces a new generation every iteration.
    • Assess the fitness of each genome, stopping if a solution is found.
    • Evolve the next generation through natural selection and reproduction.
      • Select two random genomes based on fitness.
      • Cross the genomes or leave them unchanged.
      • Mutate genes if necessary.
    • Delete the old generation and set the new generation to the current population.
  • When a solution is found or a generation limit is exceeded, the loop breaks and the genetic algorithm is complete.

Source Code

If you think you understand genetic algorithms, I encourage you try making a simple program that solves the above problem using a genetic algorithm. If you get stuck, you can download the source for the program that I made (C++).

Genetic Algorithm Source (C++)

External Links