Neural Networks with R – A Simple Example

In this tutorial a neural network (or multilayer perceptron, depending on naming convention) will be built that takes a number and calculates its square root (or as close to it as possible). Later tutorials will build upon this to make forecasting / trading models. The R library ‘neuralnet’ will be used to train and build the neural network.

There is a lot of good literature on neural networks freely available on the internet. A good starting point is the neural network handout by Dr Mark Gales at the Engineering Department, Cambridge University (http://mi.eng.cam.ac.uk/~mjfg/local/I10/i10_hand4.pdf); it covers just enough to give an understanding of what a neural network is and what it can do, without being so mathematically advanced that it overwhelms the reader.

The tutorial will produce the neural network shown in the image below. It takes a single input (the number you want the square root of) and produces a single output (the square root of the input). The middle of the image contains 10 hidden neurons which will be trained.

The output of the script will look like:

   Input 	Expected Output		 Neural Net Output
      1               1     		 0.9623402772
      4               2     		 2.0083461217
      9               3     		 2.9958221776
     16               4     		 4.0009548085
     25               5     		 5.0028838579
     36               6     		 5.9975810435
     49               7    		 6.9968278722
     64               8    		 8.0070028670
     81               9    		 9.0019220736
    100              10    		 9.9222007864

As you can see the neural network does a reasonable job at finding the square root. The largest error is in finding the square root of 1, which is out by roughly 4% ((1 - 0.9623)/1 ≈ 3.8%).

Onto the code:

install.packages('neuralnet')
library("neuralnet")
 
#Going to create a neural network to perform square rooting
#Type ?neuralnet for more information on the neuralnet library
 
#Generate 50 random numbers uniformly distributed between 0 and 100
#And store them as a dataframe
traininginput <-  as.data.frame(runif(50, min=0, max=100))
trainingoutput <- sqrt(traininginput)
 
#Column bind the data into one variable
trainingdata <- cbind(traininginput,trainingoutput)
colnames(trainingdata) <- c("Input","Output")
 
#Train the neural network
#Going to have 10 neurons in a single hidden layer
#Threshold is a numeric value specifying the threshold for the partial
#derivatives of the error function as stopping criteria.
net.sqrt <- neuralnet(Output~Input,trainingdata, hidden=10, threshold=0.01)
print(net.sqrt)
 
#Plot the neural network
plot(net.sqrt)
 
#Test the neural network on some test data
testdata <- as.data.frame((1:10)^2) #Generate some squared numbers
net.results <- compute(net.sqrt, testdata) #Run them through the neural network
 
#Lets see what properties net.results has
ls(net.results)
 
#Lets see the results
print(net.results$net.result)
 
#Lets display a better version of the results
cleanoutput <- cbind(testdata,sqrt(testdata),
                         as.data.frame(net.results$net.result))
colnames(cleanoutput) <- c("Input","Expected Output","Neural Net Output")
print(cleanoutput)

Twitter Trading – Downloading Tweets Using Python (Part 2 of 2)

Previously, in Part 1, code was produced to use tweetstream and download tweets in real time; this part of the series will look at how to store these tweets for future analysis.

The tweets will be saved into an SQLite database (http://www.sqlite.org/download.html) since this is compatible with both Python and R. It is also much simpler than some versions of SQL (no daemons / services to install). Good SQLite information is available for R (Sqlite with R) and Python (Sqlite Python Docs).

The database will contain a “sources” table. This is used to track which information source the tweets are downloaded from; for now it’ll just contain one entry, and that is twitter. Just collecting real-time tweets means we’ll have to wait a long time to build a database significant enough to analyse, so in a future article tweets will be downloaded from historical archives (hence a different information source).

Setup the db:

-bash-3.2$ sqlite3 twitter.db
SQLite version 3.5.9
Enter ".help" for instructions
sqlite> CREATE TABLE sources(sourceid INTEGER,sourcedesc,PRIMARY KEY(sourceid));
sqlite> INSERT INTO sources(sourcedesc) VALUES ("twitter");
sqlite> SELECT * FROM sources;
1|twitter
sqlite> CREATE TABLE tweets(id INTEGER, sourceid REAL,username TEXT,tweet TEXT,timestamp TIMESTAMP, PRIMARY KEY(id));
sqlite> .exit
-bash-3.2$
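
Since the tweets end up in a plain SQLite file they can easily be read back out later for analysis. Below is a minimal sketch (not part of the downloader script) showing how the most recently stored rows might be pulled out of the tweets table created above:

import sqlite3 as lite

#Read back the most recently stored tweets for analysis
#Assumes the twitter.db / tweets schema created above
con = lite.connect('twitter.db')
cur = con.cursor()
cur.execute("SELECT username, tweet, timestamp FROM tweets ORDER BY timestamp DESC LIMIT 10")
for username, tweet, timestamp in cur.fetchall():
    print username, ":", tweet
con.close()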

Now onto the code. You may want to alter the try/except blocks depending on how you want the script to respond to errors; on my server a cron job launches the script every 5 minutes, so if it dies it'll get restarted.

import sqlite3 as lite
import tweetstream
import csv
import sys
from datetime import date
import time
 
twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"
 
twitterWordFilter = [] #Define the list
wordListCsv = csv.reader(open('wordstofilter.csv', 'rb'))
for row in wordListCsv:
    #Add the 0th column of the current row to the list
    twitterWordFilter.append(row[0])
 
print "Filtering the following words: ",', '.join(twitterWordFilter)
 
 
 
con = None #Ensure con is defined for the finally block even if the connect fails
try:
    #Load the data base file (or make it if not found)
    #If dont set isolation level then we need to call
    #Db commit after every execute to save the transaction
    con = lite.connect('twitter.db',isolation_level=None)
    cur = con.cursor() #Use cursor for executing queries
    #Get the sourceid (will be useful when we use multiple data sources)
    cur.execute("SELECT sourceid FROM sources where sourcedesc='twitter'")
    sourceid = cur.fetchone() #Get the source id
    sourceid = sourceid[0]
 
    with tweetstream.FilterStream(twitterUsername, twitterPassword,track=twitterWordFilter) as stream:
        for tweet in stream:
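            #Convert twitter's created_at string into a unix timestamp (UTC)
            #so it can be stored as a plain number in the database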
            tweettimestamp =  time.mktime(time.strptime(tweet['created_at'],"%a %b %d %H:%M:%S +0000 %Y")) - time.timezone
 
            print stream.count,"(",stream.rate,"tweets/sec). ",tweet['user']['screen_name'],':', tweet['text'].encode('ascii','ignore')
            #print tweet #Use for raw output            
            try:
                cur.execute("INSERT INTO tweets(sourceid,username,tweet,timestamp) VALUES(?,?,?,?)",[sourceid,tweet['user']['screen_name'],tweet['text'].encode('ascii','ignore'),tweettimestamp])
            except:
                print "SQL Insert Error: Probably some encoding issue due to foreign languages"
 
 
 
except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason
except lite.Error, e:
    print "SQLite Error:",e
except:
    print "ERROR:",sys.exc_info()    
finally:
    if con:
        con.close()

Don’t forget to update wordstofilter.csv to track the stocks that you’re interested in. It looks like American stocks respond well to the $TICKER format, since StockTwits has encouraged that convention. From a quick play around it looks like using tickers without a symbol in front just matches random foreign tweets.
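
For reference, wordstofilter.csv is just a one-column CSV with one search term per row (only the first column of each row is read by the script above). A minimal sketch for generating a hypothetical filter file follows; the tickers below are examples only:

import csv

#Write a hypothetical wordstofilter.csv, one search term per row
#Only the first column of each row is read by the downloader
exampleTerms = ["$AAPL", "$GOOG", "ftse", "stocks"]
filterFile = open('wordstofilter.csv', 'wb')
writer = csv.writer(filterFile)
for term in exampleTerms:
    writer.writerow([term])
filterFile.close()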

Attached is the complete project so far including a db (TwitterDownload.Zip).


Twitter Trading – Getting tweetstream working on GoDaddy

In http://gekkoquant.com/2012/05/17/twitter-trading-downloading-tweets-using-python-part-1-of-2/ python code was provided for downloading tweets. The python tweetstream library was used because Python is better supported than R on cheap web hosting.

The hosting provider for this site is GoDaddy; they provide python 2.4 and python 2.7 on their basic economy web hosting package. Tweetstream is only compatible with python 2.6 and upwards.

Unfortunately the GoDaddy python 2.7 wasn't built with the SSL options enabled, so communication over HTTPS, which the twitter API requires, isn't possible. Fortunately I have been able to find a solution, which is discussed in this post.

Installing python over SSH and the command line is normally a straightforward task, but since I don't have root access and there are no compilers available the problem is trickier to solve.

Fortunately it is possible to download pre-compiled python binaries and install them in a folder where I have read/write access. The prebuilt binaries can be found at http://www.activestate.com/activepython/downloads.

wget http://downloads.activestate.com/ActivePython/releases/2.7.2.5/ActivePython-2.7.2.5-linux-x86.tar.gz
tar -zxvf ActivePython-2.7.2.5-linux-x86.tar.gz
cd ActivePython-2.7.2.5-linux-x86
./install.sh
-bash-3.2$ ./install.sh
Enter directory in which to install ActivePython. Leave blank and
press 'Enter' to use the default [/opt/ActivePython-2.7].
Install directory: ~/ActivePython
()
Installing ActivePython to '/var/chroot/home/content/92/9339192/ActivePython'...
Relocating dir-dependent files...
Pre-compiling .py files in the standard library...
ActivePython has been successfully installed to:
/var/chroot/home/content/92/9339192/ActivePython
You can add the following to your .bashrc (or equivalent)
to put ActivePython on your PATH:
export PATH=/var/chroot/home/content/92/9339192/ActivePython/bin:$PATH
The documentation is available here:
/var/chroot/home/content/92/9339192/ActivePython/doc/python2.7/index.html
 web: http://docs.activestate.com/activepython/2.7
Please send us any feedback you might have or log bugs here:
activepython-feedback@ActiveState.com

http://bugs.activestate.com/ActivePython/

Thank you for using ActivePython.
export PATH=~/ActivePython/bin:$PATH
-bash-3.2$ python2.7
ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
Python 2.7.2 (default, Jun 24 2011, 09:10:09)
[GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
It should say ActivePython above if the install was successful.
>>> import socket
>>> socket.ssl
<function ssl at 0xb7f21ae4>
The above would have thrown an error if SSL was not installed.
-bash-3.2$ pip install tweetstream
Downloading/unpacking tweetstream
 Downloading tweetstream-1.1.1.tar.gz
 Running setup.py egg_info for package tweetstream
Downloading/unpacking anyjson (from tweetstream)
 Downloading anyjson-0.3.1.tar.gz
 Running setup.py egg_info for package anyjson
Installing collected packages: tweetstream, anyjson
 Running setup.py install for tweetstream
Running setup.py install for anyjson
Successfully installed tweetstream anyjson
Cleaning up...
-bash-3.2$ python twit.py
1 ( 0 tweets/sec). FinancialBin : Global stocks fall 5th day, Brent off 2 percent on euro zone fears: NEW YORK (Reuters) - World stocks fell for a... http://t.co/dfWQb3ZL
2 ( 0 tweets/sec). stocksgame : Global stocks fall 5th day, Brent off 2 percent on euro zone fears: NEW YORK (Reuters) - World stocks fell for a... http://t.co/WP92slZt
3 ( 0 tweets/sec). Daily_Finances : #News Global stocks fall 5th day, Brent off 2 percent on euro zone fears http://t.co/flCVonMF
4 ( 0 tweets/sec). AUTSmallBiz : [Sydney Morng Hrld] Stocks set to tumble http://t.co/OEP14YvL
5 ( 0 tweets/sec). BeckaDerrick : @MattoneAwang it's the same price to finance a brand new mini + get free insurance as it is to insure an old car #WhyIsEveryonePuttingMeOff!


Twitter Trading – Downloading Tweets Using Python (Part 1 of 2)

In the first of this two-part series I’ll show you how to connect to the twitter API using the tweetstream (http://pypi.python.org/pypi/tweetstream) python library. The second part will store the tweets in a database.

The twitter API FAQ (https://dev.twitter.com/docs/faq#6861) states that the free API gives us access to 1% of all the public tweets (the full stream is known as the firehose) in any streaming second. We can choose to filter the tweets before they’re downloaded, in which case the 1% of the firehose we’re given is more relevant.

It is unknown at this stage whether we want to collect general tweets or tweets specific to the financial industry. My hunch is that tweets which are finance related contain the most information about the future movement of the market.

Installing the tweetstream library

Download the library from http://pypi.python.org/pypi/tweetstream and extract to a folder say C:\tweetstream

Navigate to that folder in a console and execute the command “python setup.py install”; with any luck the library should now be installed.

Downloading a generic set of tweets

This code allows you to download and display the TweetStream

import tweetstream
 
twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"
try:
    with tweetstream.TweetStream(twitterUsername, twitterPassword) as stream:
        for tweet in stream:
            try:
                print stream.count,"(",stream.rate,"tweets/sec). ",tweet['user']['screen_name'],':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"
 
except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason

Downloading a set of tweets matching a word filter

The following code lets us filter the tweets before they are downloaded

import tweetstream
 
twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"
twitterWordFilter = ["stocks","ftse","finance"]
 
try:
    with tweetstream.FilterStream(twitterUsername, twitterPassword,track=twitterWordFilter) as stream:
        for tweet in stream:
            try:
                print stream.count,"(",stream.rate,"tweets/sec). ",tweet['user']['screen_name'],':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"
 
except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason

Loading the word filter list from a csv file

This code will let us load in our filter words from a .csv file (wordstofilter.csv)

import csv
 
twitterWordFilter = [] #Define the list
wordListCsv = csv.reader(open('wordstofilter.csv', 'rb'))
for row in wordListCsv:
    #Add the 0th column of the current row to the list
    twitterWordFilter.append(row[0])
 
print "Filtering the following words: ",', '.join(twitterWordFilter)

Putting it all together: load the filters from the CSV and filter the stream

import tweetstream
import csv
 
twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"
 
twitterWordFilter = [] #Define the list
wordListCsv = csv.reader(open('wordstofilter.csv', 'rb'))
for row in wordListCsv:
    #Add the 0th column of the current row to the list
    twitterWordFilter.append(row[0])
 
print "Filtering the following words: ",', '.join(twitterWordFilter)
 
try:
    with tweetstream.FilterStream(twitterUsername, twitterPassword,track=twitterWordFilter) as stream:
        for tweet in stream:
            try:
                print stream.count,"(",stream.rate,"tweets/sec). ",tweet['user']['screen_name'],':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"
 
except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason

Getting Started with Python + IDE

Libraries exist in both R and Python for downloading tweets, however the majority of webhosts don’t support R but they do tend to support Python. Since we want to download tweets around the clock we want the scripts to run on a webhost (the cheaper the better), hence this post on python.

I considered running a server at home, but once you factor in the electricity bill it actually works out significantly cheaper to use a webhost (saves ~£80 a year).

To get started with Python I’d recommend downloading www.pythonxy.com; it’s multi-platform, has a good community and a ton of great packages for scientific analysis.

Visit http://docs.python.org/tutorial/ for a comprehensive tutorial on how to program in python.


Trading Strategies To Exploit Blog, News & Twitter Sentiment (Paper)

Trading Strategies To Exploit Blog and News Sentiment (Paper)

I just came across this paper and wanted to document it here as something to come back to and test for myself; hopefully you will find it as interesting as I did.

The method has four parameters:

  • Sentiment Analysis Period – How many days of previous sentiment data to use?
  • Holding Period – How long to hold a trade for?
  • Market Capitalization – Do small cap and large cap respond the same?
  • Diversification – How many stocks to have in the portfolio?

Each of the trading model parameters is also analysed and its effect explained. The paper outlines a market-neutral sentiment-based trading algorithm which is backtested over a five year period (2005-2009) and produces some exceptionally impressive returns, almost 40% in certain years depending on configuration.

What I like most about the paper is that the asset to trade is selected based upon fixed criteria (i.e. is it in the top n most extreme sentiments); this stops positive bias effects whereby the author could just present profitable scenarios / cherry-pick the results.

The sentiment is based upon analysing news posts, blog posts and tweets. Since the paper’s twitter data only starts in 2009, the authors only had half a year’s worth of twitter data to analyse. The great results in this paper were achieved without twitter data, using normal news and blog sources.

The paper shows that corpus size matters; using blogs might be a cheaper method to collect a corpus (scrape lots of RSS feeds), whereas with twitter there are limitations on what data you can get for free (full datafeeds start at $3500 a month!).
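
To give an idea of how cheaply a blog / news corpus could be collected, here is a minimal sketch using the python feedparser library; the feed URL is only a placeholder and in practice you would loop over many feeds:

import feedparser

#Hypothetical example: pull headlines from a finance RSS feed
#to start building a blog / news corpus (the URL is a placeholder)
feed = feedparser.parse("http://example.com/finance/rss")
for entry in feed.entries:
    print entry.title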


PerformanceAnalytics Basics – Measure Strategy Performance and Risk

In this quick tutorial I will introduce the PerformanceAnalytics library, which lets us easily analyse the performance of our strategies. You will learn how to plot cumulative returns and drawdowns compared to an index, output a table of monthly performance metrics, use boxplots to investigate strategy outliers and finally plot histograms of returns overlaid with different statistical measures.

You will need some returns data for this tutorial; I have created a file with some returns in it to get you started: strategyperfomance.csv

The three image outputs are:

The text output is:


      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec Strategy1 Strategy2 Index
2005  1.8  0.0 -0.3  1.9  1.4  1.4  0.7  1.3  0.7  0.0  1.2  1.2      12.0       9.2   9.6
2006  0.1  0.9  0.9  1.4  1.0  1.8 -1.7  0.4  1.0  0.2  0.1  0.5       6.9       4.0   6.3
2007  1.5  0.5  0.4 -0.9  1.3  1.8 -2.9 -0.3 -0.2 -1.1 -1.3  0.5      -0.9      -4.5  -3.7
2008  1.8 -1.0  3.5  1.5  0.2 -2.4 -1.0 -1.3  1.3  2.1 -5.3  2.9       1.9       0.4  -0.9
2009 -2.0 -4.4  0.5 -0.5  2.4  3.1  2.0 -0.6 -1.9  1.6  2.2  1.7       3.8      -0.5   0.8
2010  1.4  1.1 -0.8 -3.0 -0.1 -3.3  2.6  4.1  1.4  0.2  2.8  1.5       8.0       8.3   8.6
2011  0.9 -0.2  0.7  1.0 -1.3  2.5  0.6  0.5 -0.5 -2.0 -0.5 -0.6       1.0      -1.2  -1.5

Onto the code:

install.packages("PerformanceAnalytics") #Install the PerformanceAnalytics library
library("PerformanceAnalytics") #Load the PerformanceAnalytics library
 
#Load in our strategy performance spreadsheet
#Structure is: Date,Strategy1,Strategy2,Index
#Each row contains the period returns (% terms)
 
strategies <- read.zoo("strategyperfomance.csv", sep = ",", header = TRUE, format="%Y-%m-%d")
head(strategies)
 
#List all the column names to check data loaded in correctly
#Can access columns with strategies["columnname"]
colnames(strategies)
 
#Lets see how all the strategies fared against the index
charts.PerformanceSummary(strategies,main="Performance of all Strategies")
 
#Lets see how just strategy one fared against the index
charts.PerformanceSummary(strategies[,c("Strategy1","Index")],main="Performance of Strategy 1")
 
#Lets calculate a table of monthly returns by year and strategy
table.CalendarReturns(strategies)
 
#Lets make a boxplot of the returns
chart.Boxplot(strategies)
 
#Set the plotting area to a 2 by 2 grid
layout(rbind(c(1,2),c(3,4)))
 
#Plot various histograms with different overlays added
chart.Histogram(strategies[,"Strategy1"], main = "Plain", methods = NULL)
chart.Histogram(strategies[,"Strategy1"], main = "Density", breaks=40, methods = c("add.density", "add.normal"))
chart.Histogram(strategies[,"Strategy1"], main = "Skew and Kurt", methods = c("add.centered", "add.rug"))
chart.Histogram(strategies[,"Strategy1"], main = "Risk Measures", methods = c("add.risk"))
 
#Note: The above histogram plots are taken from the example documentation
#The documentation is excellent
#http://cran.r-project.org/web/packages/PerformanceAnalytics/vignettes/PA-charts.pdf

The PerformanceAnalytics documentation (including more examples) can be found at:

http://cran.r-project.org/web/packages/PerformanceAnalytics/vignettes/PA-charts.pdf


QuantMod Basics – Stock Data Download and Manipulation

In this quick tutorial I will show you how to use the quantmod library to download historical data, plot it, add a technical indicator (Bollinger Bands) and do some basic manipulation with date ranges and intersecting data sets.

The tutorial will let you make images such as:

This is a very quick tutorial designed to get you started / make you aware of certain packages.

Onto the code:

install.packages("quantmod") #Install the quantmod library
library("quantmod")  #Load the quantmod Library
stockData <- new.env() #Make a new environment for quantmod to store data in
 
startDate = as.Date("2008-01-13") #Specify period of time we are interested in
endDate = as.Date("2012-01-12")
 
tickers <- c("ARM","CSR") #Define the tickers we are interested in
 
#Download the stock history (for all tickers)
getSymbols(tickers, env = stockData, src = "yahoo", from = startDate, to = endDate)
 
#Use head to show first six rows of matrix
head(stockData$ARM)
 
#Lets look at the just the closing prices
Cl(stockData$ARM)
 
#Lets plot the data
chartSeries(stockData$ARM)
 
#Lets add some bollinger bands to the plot (with period 50 & width 2 standard deviations)
?addBBands #Make R display the help documentation so we know what variables to pass to the function
addBBands(n=50, sd=2)
 
#Lets get the technical indicator values saved into a variable
#Note must give it a single time series (I gave it the close price in this example)
indicatorValuesBBands <- BBands(Cl(stockData$ARM),n=50, sd=2)
 
#Lets examine only a 1 month period of data
armSubset<-  window(stockData$ARM, start = as.Date("2010-02-15"), end = as.Date("2010-03-15"))
armSubset #Lets see the data
 
#Lets extract a 1 month period of data for CSR but starting midway through the arm data
csrSubset<-  window(stockData$CSR, start = as.Date("2010-02-25"), end = as.Date("2010-03-25"))
csrSubset #Lets see the data
 
#Now we want to get the intersection of the two subsets of data
#this will give us all the rows of data where the dates match
#Its important to match the date series to stop spurious analysis of non-synchronised data
#all=FALSE specifies the intersection, as in don't include all dates in the merge
armcsrIntersection <- merge(armSubset, csrSubset, all = FALSE)
subset(armcsrIntersection,select = c("ARM.Open","CSR.Open")) #Select the open columns and display

Further material can be found at:


Twitter Trading & Sentiment Analysis

A standard idea in behavioral economics is that emotions play a large part in decision making and profoundly influence an agent's behavior. This line of logic can be applied to the stock market: price moves are a function of the emotions of the agents in the market.

In 2011 a paper by Johan Bollen, Huina Mao and Xiaojun Zeng called “Twitter mood predicts the stock market” showed that by applying sentiment analysis to twitter posts (tweets) it is possible to gauge the current emotional state of agents. The paper then goes on to argue that the mood of twitter is correlated with market movements and possibly even predictive of those movements.

Since this landmark paper was first published a number of hedge funds have taken the idea and produced twitter funds; the most publicly known twitter fund is run by Derwent Capital.

http://www.efinancialnews.com/story/2011-08-15/twitter-derwent-capital-hedge-fund

I plan on investigating this idea further in this blog, but if you want to get started before me the following should be useful:
