Genetic Algorithm in R – Trend Following

This post is going to explain what genetic algorithms are, it will also present R code for performing genetic optimisation.

A genetic algo consists of three things:

  1. A gene
  2. A fitness function
  3. Methods to breed/mate genes

The Gene

The gene is typically a binary number, each bit in the binary number controls various parts of your trading strategy. The gene below contains 4 sub gene, a stock gene to select what stock to trade, a strategy gene to select what strategy to use, paramA sets a parameter used in your strategy and paramB sets another parameter to use in your strategy.

Gene = [StockGene,StrategyGene,ParamA,ParamB]

Stock Gene
00 Google
01 Facebook
10 IBM
11 LinkedIn

Strategy Gene
0 Simple Moving Average
1 Exponential Moving Average

ParamA Gene – Moving Average 1 Lookback
00 10
01 20
10 30
11 40

ParamB Gene – Moving Average 2 Lookback
00 15
01 25
10 35
11 45

So Gene = [01,1,00,11]

Would be stock=Facebook, strategy=Exponential Moving Average,paramA=10,paramB=45].

The strategy rules are simple, if the moving average(length=paramA) > moving average(length=paramB) then go long, and vice versa.

The fitness function

A gene is quantified as a good or bad gene using a fitness function. The success of a genetic trading strategy depends heavily upon your choice of fitness function and whether it makes sense with the strategies you intend to use. You will trade each of the strategies outlined by your active genes and then rank them by their fitness. A good starting point would be to use the sharp ratio as the fitness function.

You need to be careful that you apply the fitness function to statistically significant data. For example if you used a mean reverting strategy that might trade once a month (or what ever your retraining window is), then your fitness is determined by 1 or 2 datapoints!!! This will result in poor genetic optimisation (in my code i’ve commented out a mean reversion strategy test for yourself). Typically what happens is your sharpe ratio from 2 datapoints is very very high merely down to luck. You then mark this as a good gene and trade it the next month with terrible results.

Breeding Genes

With a genetic algo you need to breed genes, for the rest of this post i’ll assume you are breeding once a month. During breeding you take all of the genes in your gene pool and rank them according to the fitness function. You then select the top N genes and breed them (discard all the other genes they’re of no use).

Breeding consists of two parts:

Hybridisation – Take a gene and cut a chunk out of it, you can use whatever random number generator you want to determine the cut locations, swap this chunk with a corresponding chunk from another gene.

Eg.
Old gene: 00110010 and 11100110 (red is the randomly select bits to cut)
New gene: 00100110 and 11110010

You do this for every possible pair of genes in your top N list.

Mutation – After hybridisation go through all your genes and randomly flip the bits with an fixed probability. The mutation prevents your strategy from getting locked into an every shrinking gene pool.

For a more detailed explanation with diagrams please see:

http://blog.equametrics.com/ scroll down to Genetic Algorithms and its Application in Trading

genetic algo sharpe 1.14

Annualized Sharpe Ratio (Rf=0%) 1.15

On to the code:

?View Code RSPLUS
library("quantmod")
library("PerformanceAnalytics")
library("zoo")
 
#INPUTS
topNToSelect <- 5   #Top n genes are selected during the mating, these will be mated with each other
mutationProb <- 0.05 #A mutation can occur during the mating, this is the probability of a mutation for individual chromes
symbolLst <- c("^GDAXI","^FTSE","^GSPC","^NDX","AAPL","ARMH","JPM","GS")
#symbolLst <- c("ADN.L","ADM.L","AGK.L","AMEC.L","AAL.L","ANTO.L","ARM.L","ASHM.L","ABF.L","AZN.L","AV.L","BA.L","BARC.L","BG.L","BLT.L","BP.L","BATS.L","BLND.L","BSY.L","BNZL.L","BRBY.L","CSCG.L","CPI.L","CCL.L","CNA.L","CPG.L","CRH.L","CRDA.L","DGE.L","ENRC.L","EXPN.L","FRES.L","GFS.L","GKN.L","GSK.L","HMSO.L","HL.L","HSBA.L","IAP.L","IMI.L","IMT.L","IHG.L","IAG.L","IPR.L","ITRK.L","ITV.L","JMAT.L","KAZ.L","KGF.L","LAND.L","LGEN.L","LLOY.L","EMG.L","MKS.L","MGGT.L","MRW.L","NG.L","NXT.L","OML.L","PSON.L","PFC.L","PRU.L","RRS.L","RB.L","REL.L","RSL.L","REX.L","RIO.L","RR.L","RBS.L","RDSA.L","RSA.L","SAB.L","SGE.L","SBRY.L","SDR.L","SRP.L","SVT.L","SHP.L","SN.L","SMIN.L","SSE.L","STAN.L","SL.L","TATE.L","TSCO.L","TLW.L","ULVR.L","UU.L","VED.L","VOD.L","WEIR.L","WTB.L","WOS.L","WPP.L","XTA.L")
 
#END INPUTS
 
 
 
#Stock gene
stockGeneLength <- 3 #8stocks
#stockGeneLength<-6 #Allows 2^6 stocks (64)
 
#Strategy gene
strateyGeneLength<-2
 
#Paramter lookback gene
parameterLookbackGeneLength<-6
 
#Calculate the length of our chromozone, chromozone=[gene1,gene2,gene3...]
chromozoneLength <- stockGeneLength+strateyGeneLength+parameterLookbackGeneLength
 
#TradingStrategies
signalMACross <- function(mktdata, paramA, paramB, avgFunc=SMA){
 signal = avgFunc(mktdata,n=paramA)/avgFunc(mktdata,n=paramB)
 signal[is.na(signal)] <- 0
 signal <- (signal>1)*1 #converts bools into ints
 signal[signal==0] <- (-1)
 return (signal)
}
 
signalBollingerReversion <- function(mktdata, paramA, paramB){
  avg <- SMA(mktdata,paramB)
  std <- 1*rollapply(mktdata, paramB,sd,align="right")
  shortSignal <- (mktdata > avg+std)*-1
  longSignal <- (mktdata < avg-std)*1
  signal <- shortSignal+longSignal
  signal[is.na(signal)]<-0
  return (signal)
}
 
signalRSIOverBoughtOrSold <- function(mktdata, paramA, paramB){
  upperLim <- min(60*(1+paramB/100),90)
  lowerLim <- max(40*(1-paramB/100),10)
  rsisignal <- RSI(mktdata,paramB)
  signal <- ((rsisignal>upperLim)*-1)+((rsisignal<lowerLim)*1)
  return (signal)
}
 
 
#Gene = [StockGene,StrategyGene,ParamAGene,ParamBGene]
#The following functions extract specific parts of the gene
getStockGeneFromChromozone <- function(chrome){
     return(chrome[,seq(1,stockGeneLength)])
}
 
getStrategyGeneFromChromozone <- function(chrome){
     return(chrome[,seq(stockGeneLength+1,stockGeneLength+strateyGeneLength)])
}
 
getParameterLookbackGeneFromChromozone <- function(chrome){
     return(chrome[,seq(stockGeneLength+strateyGeneLength+1,stockGeneLength+strateyGeneLength+parameterLookbackGeneLength)])
}
 
#Once parts of the gene have been extracted they are then converted into
#lookback values, what stocks to trade, or what strategy to use
getStockDataFromChromozone<- function(chrome){
      #Basically a binary number to decimal converter
      gene <- getStockGeneFromChromozone(chrome)
      index <-sum(chrome*(2^(seq(1,length(gene),1)-1)))+1 #The +1 is to stop 0 since not a valid index
      cleanName <- sub("^","",symbolLst[index],fixed=TRUE)
     return (get(cleanName,symbolData))
}
 
getStrategyFromChromozone <- function(chrome){
      gene <- matrix(getStrategyGeneFromChromozone(chrome))
 
      if(all(gene==matrix(c(0,0)))){
       return (signalMACross)
      }
 
      if(all(gene==matrix(c(0,1)))){
       return (function(mktdata,paramA,paramB) {signalMACross(mktdata,paramA,paramB,avgFunc=EMA)})
      }
 
      if(all(gene==matrix(c(1,0)))){
       return (function(mktdata,paramA,paramB) {signalMACross(mktdata,paramA,paramB,avgFunc=ZLEMA)})
      # return (signalBollingerReversion)
      }
 
      if(all(gene==matrix(c(1,1)))){
       return (function(mktdata,paramA,paramB) {signalMACross(mktdata,paramA,paramB,avgFunc=WMA)})
      # return (signalRSIOverBoughtOrSold)
      }
      print("nothing found")
 
}
 
getLookbackAFromChromozone <- function(chrome){
    gene <- getParameterLookbackGeneFromChromozone(chrome)
    gene <- gene[,1:3]
    gene <- matrix(gene)
 
      if(all(gene==matrix(c(0,0,0)))){
        return (10)
      }
      if(all(gene==matrix(c(0,0,1)))){
        return (20)
      }
      if(all(gene==matrix(c(0,1,0)))){
        return (30)
      }
      if(all(gene==matrix(c(0,1,1)))){
        return (40)
      }
      if(all(gene==matrix(c(1,0,0)))){
        return (50)
      }
      if(all(gene==matrix(c(1,0,1)))){
        return (60)
      }
      if(all(gene==matrix(c(1,1,0)))){
        return (70)
      }
      if(all(gene==matrix(c(1,1,1)))){
        return (80)
      }
 
}
 
getLookbackBFromChromozone <- function(chrome){
    gene <- getParameterLookbackGeneFromChromozone(chrome)
    gene <- gene[,4:6]
    gene <- matrix(gene)
 
      if(all(gene==matrix(c(0,0,0)))){
        return (15)
      }
      if(all(gene==matrix(c(0,0,1)))){
        return (25)
      }
      if(all(gene==matrix(c(0,1,0)))){
        return (35)
      }
      if(all(gene==matrix(c(0,1,1)))){
        return (45)
      }
      if(all(gene==matrix(c(1,0,0)))){
        return (55)
      }
      if(all(gene==matrix(c(1,0,1)))){
        return (65)
      }
      if(all(gene==matrix(c(1,1,0)))){
        return (75)
      }
      if(all(gene==matrix(c(1,1,1)))){
        return (85)
      }
}
 
#The more positive the fitness, the better the gene
calculateGeneFitnessFromTradingReturns <- function(tradingRet){
  tradingFitness <- SharpeRatio.annualized(tradingRet)
  #tradingFitness <- SharpeRatio.annualized(tradingRet) * (1/maxDrawdown(tradingRet))
  #tradingFitness <- max(cumsum(tradingRet))/maxDrawdown(tradingRet)
  #tradingFitness <- sum((tradingRet>0)*1)/length(tradingRet) #% of trades profitable
  #tradingFitness <- -1*maxDrawdown(tradingRet)
  return(tradingFitness)
}
 
#This function performs the mating between two chromozones
genetricMating <- function(chromozoneFitness,useTopNPerformers,mutationProb){
        selectTopNPerformers <- function(chromozoneFitness,useTopNPerformers){
              #Ranks the chromozones by their fitness and select the topNPerformers
              orderedChromozones <- order(chromozoneFitness[,"Fitness"],decreasing=TRUE)
              orderedChromozones <- chromozoneFitness[orderedChromozones,]
 
              ##Often there are lots of overlapping strategies with the same fitness
              ##We should filter by unique fitness to stop the overweighting of lucky high fitness
              orderedChromozones <- subset(orderedChromozones, !duplicated(Fitness))
 
              print(orderedChromozones)
              return(orderedChromozones[seq(1,min(nrow(orderedChromozones),useTopNPerformers)),])
        }
 
        hybridize <- function(topChromozones,mutationProb){
            crossoverFunc <- function(chromeA,chromeB){
 
            chromeA <- chromeA[,!colnames(chromeA) %in% c("Fitness")]
            chromeB <- chromeB[,!colnames(chromeB) %in% c("Fitness")]
 
                  #Takes a number of chromes from B and swaps them in to A
                  nCross <- runif(min=0,max=ncol(chromeA)-1,1) #the number of individual chromes to swap
                  swapStartLocation = round(runif(min=1,max=ncol(chromeA),1))
                  swapLocations <- seq(swapStartLocation,swapStartLocation+nCross) #Can run over the end of our vector, need to wrap around back to start
                  swapLocations <- swapLocations %% ncol(chromeA)+1 #Performs the wrapping
                  chromeA[1,swapLocations] <- chromeB[1,swapLocations] #Performs the swap
                  return (chromeA)
            }
 
            mutateFunc <- function(chrome,mutationProb){
                return((round(runif(min=0,max=1,ncol(chrome))<mutationProb)+chrome) %% 2)
            }
 
            #Take each chromozone and mate it with all the others (and it's self)
            a <- topChromozones[rep(seq(1,nrow(topChromozones)),each=nrow(topChromozones)),] #Repeat each row nrow times
            b <- topChromozones[rep(seq(1,nrow(topChromozones)),nrow(topChromozones)),] #Repeat whole matrix nrow times
 
 
            #Can this be vectorised (not huge amounts of data anyway so probs not an issue)?
            res <- matrix(nrow=0,ncol=ncol(a)-1) #The minus 1 is to drop the "Fitness" column
            for(i in 1:nrow(a)){
                res <- rbind(res,mutateFunc(crossoverFunc(a[i,],b[i,]),mutationProb))
            }
            return (res)
        }
 
        topChromozones <- selectTopNPerformers(chromozoneFitness,useTopNPerformers)
        #return ((hybridize(topChromozones,mutationProb))) #You may want duplicates to give more weight to 'good' genes
        return (unique(hybridize(topChromozones,mutationProb))) #Remove duplicate genes
}
 
#This function takes a chrome/gene and does the according trades
#It takes market data and a start and an end date
#It does not take responsibility for the mating and ranking of genes
doGeneticTrading <- function(mktdata,chrome, startDate, endDate){
    signalFunc <-getStrategyFromChromozone(chrome)
    paramA <- getLookbackAFromChromozone(chrome)
    paramB <- getLookbackBFromChromozone(chrome)
 
    signal <- signalFunc(Op(mktdata),paramA,paramB)
    opClRet <- (Cl(mktdata)/Op(mktdata)) - 1
    tradingReturns = opClRet * signal
    dataWin <- (paste(startDate,"::",endDate,sep=""))
    tradingReturns <- tradingReturns[dataWin]
    colnames(tradingReturns) <- c("TradingRet")
    return(tradingReturns)
}
 
#This function mates genes every month
#It also passes those genes into the doGeneticTrading function
doTrading <- function(chromelist){
  #Function for taking a year and a month and spitting out a clean date
  cleanDate <- function(y,m){
      if(m == 13){
       m <- 1
       y <- y+1
      }
       if(m < 10){
         return(paste(y,paste("0",m,sep=""),sep="-"))
       } else {
         return(paste(y,m,sep="-"))
       }
  }
  year <- 2002
  month <- 1
  totalRet <- 0
  fitnessEvoltion <- 0
 
  dev.new()
  par(mfrow=c(2,1))
  #Loop through many years and months
  for(y in 2002:2010){
    for(m in 1:12){
        chromeFitness <-  as.data.frame(matrix(nrow=0,ncol=ncol(chromelist)))
 
        startD <- cleanDate(y-2,m) #Subtracting off 2 years to ensure we pass enough data in(should really be calculated from MA lookback)
        liveStart <- cleanDate(y,m)
        liveEnd <- cleanDate(y,m+1)
        print(paste("Start",startD,"LiveStart",liveStart,"LiveEnd",liveEnd))
        dataWin <- (paste(startD,"::",liveEnd,sep=""))
        monthReturn <- data.frame()
        #Look through all the active chromes and use them for trading
        for(cn in 1:nrow(chromelist)){
        #USE a try catch just incase there are data issue etc...
         try({
            mktdata <- getStockDataFromChromozone(chromelist[cn,])
            tradingRet <- doGeneticTrading(mktdata[dataWin],chromelist[cn,],liveStart,liveEnd)
            tradingRet <- tradingRet*(1/nrow(chromelist)) #even money given to each strategy
            tradingFitness <- calculateGeneFitnessFromTradingReturns(tradingRet)
            if(!is.nan(tradingFitness) && !is.nan(max(tradingRet)) && !is.nan(min(tradingRet))){
              if(length(monthReturn) == 0 ){
               monthReturn <- tradingRet
              } else {
               monthReturn <- cbind(monthReturn,tradingRet)
              }
 
            res <- cbind(chromelist[cn,],tradingFitness)
            colnames(res) <- c(colnames(chromelist[cn,]),"Fitness")
            chromeFitness <- rbind(chromeFitness,res)
            }
           },silent=FALSE)
        }
        print("Month return")
        #Collapse all the trades from each chromozone into a single P&L for each day in the month
        monthReturn <- apply(monthReturn,1,sum,na.rm=TRUE)
        print(monthReturn)
        currentMonthFitness <- calculateGeneFitnessFromTradingReturns(monthReturn)
        #Update the running total of P&L
        totalRet <- c(totalRet,monthReturn)
        fitnessEvoltion <- c(fitnessEvoltion,currentMonthFitness)
        plot(cumsum(totalRet))
        plot(fitnessEvoltion)
        #print(chromeFitness)
        #print(chromeFitness[,"Fitness"])
        chromelist <- genetricMating(chromeFitness,topNToSelect,mutationProb)
        print(paste("There are",nrow(chromelist), "chromes active"))
        print(paste("Min Fitness:",min(chromeFitness[,"Fitness"])))
        print(paste("Max Fitness:",max(chromeFitness[,"Fitness"])))
        print(paste("Average Fitness:",mean(chromeFitness[,"Fitness"])))
        print(paste("Current Month Fitness:",currentMonthFitness))
 
 
    }
  }
   return (totalRet)
 
}
 
 
 
#Specify dates for downloading data, training models and running simulation
startDate = as.Date("2000-01-01") #Specify what date to get the prices from
symbolData <- new.env() #Make a new environment for quantmod to store data in
getSymbols(symbolLst, env = symbolData, src = "yahoo", from = startDate)
 
 
#Create some genes at random
#Make a diag matrix so that each chrome gets activated atleast once
startingChromozones <- diag(chromozoneLength)
rownames(startingChromozones) <- apply(t(seq(1,chromozoneLength)),2,function(x) { paste("Chrome",x,sep="") } )
fitness <- matrix(runif(min=-1,max=1,nrow(startingChromozones)),nrow=nrow(startingChromozones),ncol=1)
colnames(fitness) <- c("Fitness")
startingChromozones <- as.data.frame(cbind(startingChromozones,fitness))
 
 
 
 
print("Before mating")
print(startingChromozones)
print("After mating")
startingChromozones <- genetricMating(startingChromozones,topNToSelect,mutationProb)
print(startingChromozones)
 
 
tradingReturns <- doTrading(startingChromozones)
tradingReturns <- as.data.frame((as.matrix(tradingReturns[-1])))
tradingReturns<-as.zoo(tradingReturns)
 
dev.new()
charts.PerformanceSummary(tradingReturns,main=paste("Arithmetic Genetic Trading Returns"),geometric=FALSE)
print(table.Stats(tradingReturns))
cat("Sharpe Ratio")
print(SharpeRatio.annualized(tradingReturns))

13 thoughts on “Genetic Algorithm in R – Trend Following

  1. really great post thanks.. typed through the code and read the post you linked wrt the genetic algo which is also a great site.. trying to just stick to this algo to learn more the intricacies and getting it to work well in R.
    I’m more of a hack in R and was just kicking around ideas on where you would put in drawdown % as a parameter? (trading fitness) or maybe better to use it as part of gene pool? The smaller things i’m wrestling with are getting it to run all the way to current date, and then kick out as of today.. right now it throws an error if i just run it through 2013 b/c it kicks out as of last trading day return. also was curious if you had considered running more as a daily optimisation or getting it configured for intraday.. i’m guessing that would be messing around with the cleandate function..

    at any rate, very informative post on this, thx!

    • Thanks for the kind comments.

      Modify the function calculateGeneFitnessFromTradingReturns, some of the commented out lines contain drawdown calculations.

      Modify for loop with the comment ‘loop through years and months’. Currently it will only run upto 2010.

      There are probably many great R packages that can do genetic optimisation written in a lower language/run significantly faster.

  2. Hi Gekko
    I thank you for every effort you put into this blog. It is really very good.
    I tried running the code but I am not finding the same result…the sharpe it is showing is 0.64 …Regards(please note that I did a copy paste)

    • The genetic algo is a random algorithm, the way the genes mate and mutate is random, hence you will have different results than I do.

      Perhaps run a monte Carlo and calculate the probability of getting a specific sharpe ratio

  3. really thanks. a question on the algo is what if different strategies have different set of parameters. for example, strategy 1 has four parameters (A1,A2,A3,A4) while strategy 2 has 2 parameter2 (B1,B2). now how to mating as Ai and Bi belongs to different fields.

    • Hi Ben,

      Typically you may chose to just ignore two of the genes / bits.

      E.g
      Bit’s ABCD
      Strategy A takes ABCD as a parameter and uses ABCD
      Strategy B takes ABCD as a parameter but only uses AB

      It should have minimal effect on your optimisation results.

  4. Hey,

    I think I found a problem with your algorithm. Because liveStart and liveEnd only contain years and months you get 2 months of returns in tradingRet which causes you to rank the chromozones on OOS data.

    Say you have m=1, you then generate returns for m=1,2, add these to tradingReturns and select best chromes, then go to m=2 generate returns for m=2,3. Add these to tradingReturns etc. etc.

    So you rank the chromozones on OOS data and you have returns for each month twice.

    I fixed this by using:
    cleanDate1 <- function(y,m){
    if(m == 13){
    m <- 1
    y <- y+1
    }
    if(m < 10){
    return(paste(y,paste("0",m,sep=""),1,sep="-"))
    } else {
    return(paste(y,m,1,sep="-"))
    }
    }

    cleanDate2 <- function(y,m){
    if(m == 13){
    m <- 1
    y <- y+1
    }

    if(is.element(m,c(1,3,5,7,8,10,12))){
    d = 31
    }
    else if(is.element(m,c(4,6,9,11))){
    d = 30
    }
    else if(y%%4 == 0){
    d = 29
    }
    else{
    d = 28
    }

    if(m < 10){
    return(paste(y,paste("0",m,sep=""),d,sep="-"))
    } else {
    return(paste(y,m,d,sep="-"))
    }
    }

    and
    liveStart <- cleanDate1(y,m)
    liveEnd <- cleanDate2(y,m)

  5. Is there any particular reason you programmed the genetic algorithm “manually” rather than using a ready-made (and powerful) package like rgenoud?

  6. Hi Gekko!
    Thank’s for Code.
    Can i reduce some way number of transactions?
    I got 2.5 deals per day using 6 indexes.
    Sorry for my English (I’m from Russia) ))).

  7. Thanks for the great post. But I’m wondering why your GA does not have an internal loop which is mean to generate multiple generations until fitness get improved to a certain value or the maximal iteration is reached. The function genetricMating is only called one time per month or 12 times per year. As your gene pool is updated one time per month, I guess it should be better to iterate more within one month in order to make a more suitable GA.

  8. Thanks for the post. After trying genetic programming on several assets, however, I find that the simulation results varies a lot from run to run. This randomness makes it difficult to implement in real life trading. May be good for stability test though.

Leave a Reply to Neverfox Cancel reply

Your email address will not be published. Required fields are marked *