This part of the tutorial on using NEAT algorithm explains how genomes are crossed over in a meaningful way maintaining their topological information and how speciation (group genomes into species) can be used to protect weak genomes with new topological information from prematurely being eradicated from the gene pool before their weight space can be optimised.
The first part of this tutorial can be found here.
Tracking Gene History through Innovation Numbers
Part 1 showed two mutations, link mutate and node mutate which both added new genes to the genome. Each time a new gene is created (through a topological innovation) a global innovation number is incremented and assigned to that gene.
The global innovation number is tracking the historical origin of each gene. If two genes have the same innovation number then they must represent the same topology (although the weights may be different). This is exploited during the gene crossover.
Genome Crossover (Mating)
Genomes crossover takes two parent genomes (lets call them A and B) and creates a new genome (lets call it the child) taking the strongest genes from A and B copying any topological structures along the way.
During the crossover genes from both genomes are lined up using their innovation number. For each innovation number the gene from the most fit parent is selected and inserted into the child genome. If both parent genomes are the same fitness then the gene is randomly selected from either parent with equal probability. If the innovation number is only present in one parent then this is known as a disjoint or excess gene and represents a topological innovation, it too is inserted into the child.
The image below shows the crossover process for two genomes of the same fitness.
Speciation takes all the genomes in a given genome pool and attempts to split them into distinct groups known as species. The genomes in each species will have similar characteristics.
A way of measuring the similarity between two genomes is required, if two genomes are “similar” they are from the same species. A natural measure to use would be a weighted sum of the number of disjoint & excess genes (representing topological differences) and the difference in weights between matching genes. If the weighted sum is below some threshold then the genomes are of the same species.
The advantage of splitting the genomes into species is that during the genetic evolution step where genomes with low fitness are culled (removed entirely from the genome pool) rather than having each genome fight for it’s place against every other genome in the entire genome pool we can make it fight for it’s place against genomes of the same species. This way species that form from a new topological innovation that might not have a high fitness yet due to not having it’s weights optimised will survive the culling.
Summary of whole process
- Create a genome pool with n random genomes
- Take each genome and apply to problem / simulation and calculate the genome fitness
- Assign each genome to a species
- In each species cull the genomes removing some of the weaker genomes
- Breed each species (randomly select genomes in the species to either crossover or mutate)
- Repeat all of the above