K-Nearest Neighbour Algo : Fail

In this post I’m going to look at an implementation of the k-nearest neighbour algorithm.

The algorithm is very simple and can be split into 3 components:

1. Data Measure – Which features / observations describe the current trading day? (vol, RSI, moving averages etc.; don’t forget to normalise your measurements) (variable dataMeasure)

2. Error Measure – How to measure the similarity between data measures (just use MSE). This identifies the K most similar trading days to today (function calculateMSE)

3. Trading Signal – Convert the k-nearest neighbours to a trading signal (function calculateTradeSignalFromKNeighbours)
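The lookup in step 2 is the heart of the method. As a minimal sketch in base R only, the hypothetical helper `nearest_days` (not part of the code in this post) ranks every past row of a toy feature matrix by its MSE against the last row, which plays the role of today:

```r
# Hypothetical helper: indices of the k past rows closest to the last row,
# ranked by mean squared error across the feature columns.
nearest_days <- function(features, k) {
  n <- nrow(features)
  today <- features[n, ]
  past <- features[-n, , drop = FALSE]
  # sweep() subtracts today's feature vector from every past row
  mse <- rowMeans(sweep(past, 2, today)^2)
  head(order(mse), k)
}

# Toy feature matrix: 5 past days plus today (last row), two features each
features <- rbind(c(0.10, 0.90),
                  c(0.50, 0.50),
                  c(0.52, 0.48),
                  c(0.90, 0.10),
                  c(0.49, 0.53),
                  c(0.50, 0.50))

nearest_days(features, 2)  # rows 2 and 3 are the closest matches to today
```

Today is left out of the candidate set here, so there is no need to drop a zero-MSE self-match afterwards.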

In the data measure we look to come up with some quantitative measures that capture information about the current trading day. In the example presented below I’ve used a normalised volatility measure, vol(fast)/(vol(fast)+vol(slow)), where fast and slow indicate the window size (slow = longer window). Note that the code below puts the slow window in the numerator, which is simply one minus this ratio. The same procedure is applied to linear regression curves, and I’ve also included a fast and a slow RSI. We take this measure and compare it to the measures on all of the previous trading days, trying to identify the K most similar days in history.
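To make the normalised volatility measure concrete, here is a small base-R sketch on a synthetic price series. The helper `rolling_sd` and the random-walk prices are assumptions for illustration; the real code further down uses rollapply on downloaded closes.

```r
# Rolling standard deviation over a trailing window of n observations (base R only)
rolling_sd <- function(x, n) {
  out <- rep(NA_real_, length(x))
  for (i in seq_along(x)) {
    if (i >= n) out[i] <- sd(x[(i - n + 1):i])
  }
  out
}

set.seed(1)
prices <- 100 + cumsum(rnorm(200))  # synthetic prices, illustration only

fast_vol <- rolling_sd(prices, 30)
slow_vol <- rolling_sd(prices, 50)

# Lies strictly between 0 and 1 once both windows are filled and vol is non-zero
vol_measure <- fast_vol / (fast_vol + slow_vol)

range(vol_measure, na.rm = TRUE)
```

Values above 0.5 mean the recent (fast) window has been more volatile than the longer one.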

You should normalise your signals in some fashion, so that the MSE calculation doesn’t unexpectedly put a large weight on whichever measurement variable happens to have the largest scale.
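A quick worked example of why this matters, using made-up numbers: with RSI left on its 0-100 scale and vol on a 0-1 scale, the RSI column decides which day looks closest, and the ranking flips once RSI is rescaled.

```r
# Two candidate days described by (RSI on a 0-100 scale, normalised vol on a 0-1 scale).
# All numbers are invented for illustration.
today                     <- c(rsi = 55, vol = 0.20)
similar_rsi_different_vol <- c(rsi = 56, vol = 0.90)
different_rsi_similar_vol <- c(rsi = 65, vol = 0.21)

mse <- function(a, b) mean((a - b)^2)

# Raw scales: the RSI column dominates and the large vol gap barely registers
raw_a <- mse(today, similar_rsi_different_vol)
raw_b <- mse(today, different_rsi_similar_vol)

# After dividing RSI by 100 both features contribute on comparable terms
rescale <- function(x) x / c(100, 1)
scaled_a <- mse(rescale(today), rescale(similar_rsi_different_vol))
scaled_b <- mse(rescale(today), rescale(different_rsi_similar_vol))

c(raw_a = raw_a, raw_b = raw_b, scaled_a = scaled_a, scaled_b = scaled_b)
```

On the raw scales the first candidate looks far closer; after rescaling, the second one does.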

Now that you’ve found the set of trading days most similar to the current one, you still have to decide how to convert those days into a trading signal. In the code I take the K nearest neighbours and look at what occurred on the day after each of them. I take the open-to-close returns of those following days, calculate their Sharpe ratio, and use this as the number of contracts to buy the following day. If the K neighbours are unrelated, the returns that followed them will be erratic and the Sharpe ratio close to 0, hence we will only trade a small number of contracts.
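The idea can be sketched with a plain mean/sd ratio standing in for the annualised Sharpe ratio used in the real code. The helper `signal_from_neighbours` and the return series are hypothetical:

```r
# Hypothetical helper: un-annualised Sharpe-style ratio (mean / sd) of the
# returns seen on the day after each neighbour.
signal_from_neighbours <- function(returns, neighbour_rows) {
  next_day <- neighbour_rows + 1
  next_day <- next_day[next_day <= length(returns)]  # drop neighbours with no following day
  r <- returns[next_day]
  if (length(r) < 2 || sd(r) == 0) return(0)
  mean(r) / sd(r)
}

# Invented open-to-close returns for 8 days
returns <- c(0.01, -0.02, 0.015, 0.012, -0.01, 0.011, 0.02, -0.005)

# Neighbours whose following days all rose give a strong positive signal
consistent <- signal_from_neighbours(returns, c(2, 3, 5, 6))

# Neighbours followed by mixed outcomes give a signal much closer to zero
mixed <- signal_from_neighbours(returns, c(1, 4, 6, 7))

c(consistent = consistent, mixed = mixed)
```

Because the ratio scales with how consistently the neighbours were followed by moves in one direction, it doubles as a position size.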

This algo is potentially interesting when using vol as one of the data measures, since it naturally captures the different regimes in the market. If today is a high-vol day, it’ll be compared to the historical days that also had high vol. The hope is that today’s market still behaves in the same fashion as on a historically similar day.

Figure: GSPC traded with k-nearest neighbours

Figure: FTSE traded with k-nearest neighbours

Figure: ARM traded with k-nearest neighbours

Sadly the performance of this strategy is terrible (though that could just be poor input parameter selection or poor data measures). I suspect that there are better forms of k-nearest neighbour to use. Here I take today and compare it to single days in history. There could be significant gains to be had if I instead take 1 month of data and find the most similar month in history. This would identify patterns of similar behaviour which may be more tradeable. I will investigate this in my next post.
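As a rough sketch of what window matching could look like (an assumption about one possible approach, not the method of the follow-up post), the hypothetical helper `most_similar_window` compares the most recent w values of a single feature series against every earlier, non-overlapping window of the same length:

```r
# Hypothetical helper: start index of the past window of length w whose values
# are closest (by MSE) to the most recent w values of x.
most_similar_window <- function(x, w) {
  n <- length(x)
  stopifnot(n >= 2 * w)
  recent <- x[(n - w + 1):n]
  last_start <- n - 2 * w + 1  # latest start that does not overlap the recent window
  starts <- seq_len(last_start)
  mse <- sapply(starts, function(s) mean((x[s:(s + w - 1)] - recent)^2))
  starts[which.min(mse)]
}

# Toy series: the pattern at positions 3-5 recurs at the end
x <- c(0.9, 0.1, 0.2, 0.6, 0.3, 0.8, 0.7, 0.1, 0.2, 0.6, 0.3)

most_similar_window(x, 3)  # the window starting at position 3 matches exactly
```

With a one-month window, w would be roughly 20 trading days, and the trade signal would then be built from what followed the matched window.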

On to the code:

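library("quantmod") #getSymbols, Cl, Op, Lag; also attaches xts, zoo (rollapply) and TTR (RSI)
library("PerformanceAnalytics") #SharpeRatio.annualized, charts.PerformanceSummary
marketSymbol <- "ARM.L"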
nFastLookback <- 30 #The fast signal lookback used in linear regression curve
nSlowLookback <- 50 #The slow signal lookback used in linear regression curve
nFastVolLookback <- 30 #The fast signal lookback used to calculate the stdev
nSlowVolLookback <- 50 #The slow signal lookback used to calculate the stdev
nFastRSILookback <- 30 #The fast signal lookback used to calculate the RSI
nSlowRSILookback <- 50 #The slow signal lookback used to calculate the RSI
kNearestGroupSize <- 50 #How many neighbours to use
normalisedStrengthVolWeight <- 2 #Make some signals more important than others in the MSE
normalisedStrengthRegressionWeight <- 1
fastRSICurveWeight <- 2
slowRSICurveWeight <- 0.8
#Specify dates for downloading data, training models and running simulation
startDate = as.Date("2006-08-01") #Specify what date to get the prices from
symbolData <- new.env() #Make a new environment for quantmod to store data in
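stockCleanNameFunc <- function(name){
     #quantmod stores index symbols such as "^GSPC" without the leading caret
     return(sub("^","",name,fixed=TRUE))
}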
getSymbols(marketSymbol, env = symbolData, src = "yahoo", from = startDate)
cleanName <- stockCleanNameFunc(marketSymbol)
mktData <- get(cleanName,symbolData)
linearRegressionCurve <- function(data,n){
    regression <- function(dataBlock){
           fit <-lm(dataBlock~seq(1,length(dataBlock),1))
           return(last(fit$fitted.values)) #Value of the fitted line at the end of the window
    }
    return (rollapply(data,width=n,regression,align="right",by.column=FALSE,na.pad=TRUE))
}
volCurve <- function(data,n){
    stdev <- function(dataBlock){
         sd(dataBlock)
    }
    #Rolling standard deviation over the last n closes
    return (rollapply(data,width=n,stdev,align="right",by.column=FALSE,na.pad=TRUE))
}
fastRegression <- linearRegressionCurve(Cl(mktData),nFastLookback)
slowRegression <- linearRegressionCurve(Cl(mktData),nSlowLookback)
normalisedStrengthRegression <- slowRegression / (slowRegression+fastRegression)
fastVolCurve <- volCurve(Cl(mktData),nFastVolLookback)
slowVolCurve <- volCurve(Cl(mktData),nSlowVolLookback)
normalisedStrengthVol <- slowVolCurve / (slowVolCurve+fastVolCurve)
fastRSICurve <-RSI(Cl(mktData),nFastRSILookback)/100 #rescale it to be in the same range as the other indicators
slowRSICurve <-RSI(Cl(mktData),nSlowRSILookback)/100
      #Lets plot the signals just to see what they look like
#DataMeasure will be used to determine how similar other days are to today
#It is used later on to calculate the days which are most similar to today according to the MSE measure
dataMeasure <- cbind(normalisedStrengthVol*normalisedStrengthVolWeight,normalisedStrengthRegression*normalisedStrengthRegressionWeight,fastRSICurve*fastRSICurveWeight,slowRSICurve*slowRSICurveWeight)
colnames(dataMeasure) <- c("normalisedStrengthVol","normalisedStrengthRegression","fastRSICurve","slowRSICurve")
#Finds the nearest neighbour and calculates the trade signal
calculateNearestNeighbourTradeSignal <- function(dataMeasure,K,mktReturns){
        findKNearestNeighbours <- function(dataMeasure,K){
              calculateMSE <- function(dataMeasure){
                      calculateMSEInner <- function(dataA,dataB){
                        se <- ((as.matrix(dataA) - as.matrix(dataB))^2)
                        return (rowMeans(se)) #Mean squared error of each row
                      }
                      #Repeat the last row of dataMeasure multiple times
                      #This is so we can compare dataMeasure[today] with all the previous dates
                      lastMat <- last(dataMeasure)
                      setA <-  lastMat[rep(1, length(dataMeasure[,1])),]
                      setB <- dataMeasure
                      mse <- calculateMSEInner(setB,setA)
                      mse[is.na(mse)] <- Inf #Give it a terrible MSE if it's NA
                      colName <-  c(colnames(dataMeasure),"MSE")
                      dataMeasure <- cbind(dataMeasure,mse)
                      colnames(dataMeasure) <- colName
                      return (dataMeasure)
              }
             rowNum <- seq(1,length(dataMeasure[,1]),1)
             dataMeasureWithMse <- as.data.frame(calculateMSE(dataMeasure))
             tmp <- c("rowNum", colnames(dataMeasureWithMse))
             dataMeasureWithMse <- cbind(rowNum,dataMeasureWithMse)
             colnames(dataMeasureWithMse) <- tmp
             dataMeasureWithMse <- dataMeasureWithMse[order(dataMeasureWithMse[,"MSE"]),]
             #Starting from the 2nd item as the 1st is the current day (MSE will be 0) want to drop it
             return (dataMeasureWithMse[seq(2,min(K,length(dataMeasureWithMse[,1]))),])
        }
    calculateTradeSignalFromKNeighbours <- function(mktReturns,kNearestNeighbours){
         rowNums <- kNearestNeighbours[,"rowNum"]
         rowNums <- na.omit(rowNums)
         if(length(rowNums) <= 1) { return (0) }
         print("The kNearestNeighbours are:")
         print(rowNums)
         #So lets see what happened on the day AFTER our nearest match
         mktRet <- mktReturns[rowNums+1]
         #return (sign(sum(mktRet)))
         return (SharpeRatio.annualized(mktRet))
    }
    kNearestNeighbours <- findKNearestNeighbours(dataMeasure,K)
    tradeSignal <- calculateTradeSignalFromKNeighbours(mktReturns,kNearestNeighbours)
    return (tradeSignal)
}
ret <- (Cl(mktData)/Op(mktData))-1
signalLog <- as.data.frame(ret)
signalLog[,1] <- 0
colnames(signalLog) <- c("TradeSignal")
#Loop through all the days we have data for, and calculate a signal for them using nearest neighbour
for(i in seq(1,length(ret))){
    print (paste("Simulating trading for day",i,"out of",length(ret),"@",100*i/length(ret),"%"))
    index <- seq(1,i)
    signal <- calculateNearestNeighbourTradeSignal(dataMeasure[index,],kNearestGroupSize,ret)
    signalLog[i,1] <- signal
}
tradeRet <- Lag(signalLog[,1])*ret[,1] #Combine today's signal with tomorrow's return (no lookforward issues)
totalRet <- cbind(tradeRet,ret)
colnames(totalRet) <- c("Algo",paste(marketSymbol," Long OpCl Returns"))
charts.PerformanceSummary(totalRet,main=paste("K nearest trading algo for",marketSymbol),geometric=FALSE)