# Neural Networks with R – A Simple Example

In this tutorial a neural network (or multilayer perceptron, depending on naming convention) will be built that takes a number and calculates its square root (or as close to it as possible). Later tutorials will build upon this to make forecasting / trading models. The R library ‘neuralnet’ will be used to train and build the neural network.

There is plenty of good literature on neural networks freely available on the internet. A good starting point is the neural network handout by Dr Mark Gales at the Engineering Department, Cambridge University (http://mi.eng.cam.ac.uk/~mjfg/local/I10/i10_hand4.pdf). It covers just enough to give an understanding of what a neural network is and what it can do, without being so mathematically advanced as to overwhelm the reader.

The tutorial will produce the neural network shown in the image below. It takes a single input (the number you want the square root of) and produces a single output (the square root of the input). In between is a single hidden layer of 10 neurons that will be trained.

The output of the script will look like:


```
Input   Expected Output   Neural Net Output
    1                 1        0.9623402772
    4                 2        2.0083461217
    9                 3        2.9958221776
   16                 4        4.0009548085
   25                 5        5.0028838579
   36                 6        5.9975810435
   49                 7        6.9968278722
   64                 8        8.0070028670
   81                 9        9.0019220736
  100                10        9.9222007864
```

As you can see, the neural network does a reasonable job of finding the square root. The largest error is in finding the square root of 1, which is out by roughly 4%.
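That error figure is easy to verify from the table above:

```python
# Relative error of the network's estimate of sqrt(1), using the
# "Neural Net Output" value from the table above
expected = 1.0
predicted = 0.9623402772

relative_error = abs(predicted - expected) / expected
print("Relative error: %.1f%%" % (relative_error * 100))  # prints: Relative error: 3.8%
```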

Onto the code:

```
install.packages('neuralnet')
library("neuralnet")

#Going to create a neural network to perform square rooting
#Type ?neuralnet for more information on the neuralnet library

#Generate 50 random numbers uniformly distributed between 0 and 100
#And store them as a dataframe
traininginput <- as.data.frame(runif(50, min=0, max=100))
trainingoutput <- sqrt(traininginput)

#Column bind the data into one variable
trainingdata <- cbind(traininginput, trainingoutput)
colnames(trainingdata) <- c("Input","Output")

#Train the neural network
#Going to have 10 neurons in a single hidden layer
#Threshold is a numeric value specifying the threshold for the partial
#derivatives of the error function as stopping criteria
net.sqrt <- neuralnet(Output~Input, trainingdata, hidden=10, threshold=0.01)
print(net.sqrt)

#Plot the neural network
plot(net.sqrt)

#Test the neural network on some test data
testdata <- as.data.frame((1:10)^2) #Generate some squared numbers
net.results <- compute(net.sqrt, testdata) #Run them through the neural network

#Lets see what properties net.results has
ls(net.results)

#Lets see the results
print(net.results$net.result)

#Lets display a better version of the results
cleanoutput <- cbind(testdata, sqrt(testdata),
                     as.data.frame(net.results$net.result))
colnames(cleanoutput) <- c("Input","Expected Output","Neural Net Output")
print(cleanoutput)
```

# Twitter Hedge Fund – Derwent Capital Interview

An interesting interview with Paul Hawtin of Derwent Capital about their Twitter fund and some of the implementation details. Key things to note are that they analyse all tweets (no filtering for just FTSE companies), and that it's not a black-box system: the mood signals are only a single component of their strategy.

Previously, in Part 1, code was produced to use tweetstream to download tweets in real time; this part of the series looks at how to store those tweets for future analysis.

The tweets will be saved into an SQLite database (http://www.sqlite.org/download.html), since this is compatible with both Python and R. It's also much simpler than a full SQL server (no daemons / services to install). Good SQLite information is available for R (SQLite with R) and Python (SQLite Python docs).

The database will contain a “sources” table, used to track which information source the tweets were downloaded from. For now it will contain just one entry: twitter. Collecting only real-time tweets means we would have to wait a long time to build a database large enough to analyse, so in a future article tweets will also be downloaded from historical archives (hence a different information source).

Set up the database:

```
-bash-3.2$ sqlite3 twitter.db
SQLite version 3.5.9
Enter ".help" for instructions
sqlite> CREATE TABLE sources(sourceid INTEGER, sourcedesc TEXT, PRIMARY KEY(sourceid));
sqlite> INSERT INTO sources(sourcedesc) VALUES ('twitter');
sqlite> SELECT * FROM sources;
sqlite> CREATE TABLE tweets(id INTEGER, sourceid INTEGER, username TEXT, tweet TEXT, timestamp TIMESTAMP, PRIMARY KEY(id));
sqlite> .exit
-bash-3.2$
```
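If you prefer to create the schema from Python rather than the sqlite3 shell, the sketch below does the same thing with the standard sqlite3 module, using an in-memory database so it can be run without touching twitter.db. It also shows the sourceid lookup that the collection script will perform:

```python
import sqlite3

# In-memory database stands in for twitter.db in this sketch
con = sqlite3.connect(':memory:')
cur = con.cursor()

# Same schema as in the sqlite3 shell session above
cur.execute("CREATE TABLE sources(sourceid INTEGER, sourcedesc TEXT,"
            " PRIMARY KEY(sourceid))")
cur.execute("INSERT INTO sources(sourcedesc) VALUES ('twitter')")
cur.execute("CREATE TABLE tweets(id INTEGER, sourceid INTEGER, username TEXT,"
            " tweet TEXT, timestamp TIMESTAMP, PRIMARY KEY(id))")

# Look up the sourceid, exactly as the collection script will do
cur.execute("SELECT sourceid FROM sources WHERE sourcedesc='twitter'")
sourceid = cur.fetchone()[0]
print(sourceid)  # 1 - SQLite auto-assigns the first rowid
con.close()
```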

Now onto the code. You may want to alter the try/except blocks depending on how you want the script to respond to errors. On my server a cron job launches the script every 5 minutes, so if it dies it gets restarted.

```
import sqlite3 as lite
import tweetstream
import csv
import sys
import time

twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"

twitterWordFilter = [] #Define the list
wordListCsv = csv.reader(open('wordstofilter.csv', 'rb'))
for row in wordListCsv:
    #Add the 0th column of the current row to the list
    twitterWordFilter.append(row[0])

print "Filtering the following words: ", ', '.join(twitterWordFilter)

con = None
try:
    #Load the database file (or make it if not found)
    #If we don't set isolation_level then we need to call
    #con.commit() after every execute to save the transaction
    con = lite.connect('twitter.db', isolation_level=None)
    cur = con.cursor() #Use cursor for executing queries

    #Get the sourceid (will be useful when we use multiple data sources)
    cur.execute("SELECT sourceid FROM sources WHERE sourcedesc='twitter'")
    sourceid = cur.fetchone()[0] #Get the source id

    with tweetstream.FilterStream(twitterUsername, twitterPassword,
                                  track=twitterWordFilter) as stream:
        for tweet in stream:
            tweettimestamp = time.mktime(time.strptime(tweet['created_at'],
                "%a %b %d %H:%M:%S +0000 %Y")) - time.timezone

            print stream.count, "(", stream.rate, "tweets/sec). ", \
                tweet['user']['screen_name'], ':', tweet['text'].encode('ascii', 'ignore')
            #print tweet #Use for raw output
            try:
                cur.execute("INSERT INTO tweets(sourceid,username,tweet,timestamp) VALUES(?,?,?,?)",
                            [sourceid, tweet['user']['screen_name'],
                             tweet['text'].encode('ascii', 'ignore'), tweettimestamp])
            except:
                print "SQL Insert Error: Probably some encoding issue due to foreign languages"

except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason
except lite.Error, e:
    print "SQLite Error:", e
except:
    print "ERROR:", sys.exc_info()
finally:
    if con:
        con.close()
```

Don’t forget to update wordstofilter.csv to track the stocks you’re interested in. American stocks seem to respond well to the $TICKER convention, because StockTwits has encouraged a consistent format. From a quick play around, using tickers without a symbol in front just matches random foreign tweets.
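The filter file itself is just one term per row. A minimal sketch of creating and reloading it (the tickers below are made-up placeholders — substitute the symbols you actually want to track):

```python
import csv

# Hypothetical filter terms - swap in your own tickers
terms = ['$AAPL', '$GOOG', 'ftse', 'stocks']

# Write one term per row, the format the collection script expects
with open('wordstofilter.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for term in terms:
        writer.writerow([term])

# Read the terms back, taking column 0 of each row (as the script does)
with open('wordstofilter.csv', newline='') as f:
    loaded = [row[0] for row in csv.reader(f)]

print(loaded)
```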

The hosting provider for this site is GoDaddy; they provide Python 2.4 and Python 2.7 on their basic economy web hosting package. Tweetstream is only compatible with Python 2.6 and upwards.

Unfortunately the GoDaddy Python 2.7 wasn’t built with the SSL options enabled, so communication over HTTPS, which the Twitter API requires, isn’t possible. Fortunately I have been able to find a solution, which is discussed in this post.

Installing Python over SSH at the command line is normally a straightforward task, but since I don’t have root access and there are no compilers available, the problem is trickier to solve.

Fortunately it is possible to download pre-compiled Python binaries and install them into a folder where I have read/write access. The prebuilt binaries can be found at http://www.activestate.com/activepython/downloads.

```
wget http://downloads.activestate.com/ActivePython/releases/2.7.2.5/ActivePython-2.7.2.5-linux-x86.tar.gz
tar -zxvf ActivePython-2.7.2.5-linux-x86.tar.gz
cd ActivePython-2.7.2.5-linux-x86
-bash-3.2$ ./install.sh
Enter directory in which to install ActivePython. Leave blank and
press 'Enter' to use the default [/opt/ActivePython-2.7].
Install directory: ~/ActivePython
Installing ActivePython to '/var/chroot/home/content/92/9339192/ActivePython'...
Relocating dir-dependent files...
Pre-compiling .py files in the standard library...
ActivePython has been successfully installed to:
    /var/chroot/home/content/92/9339192/ActivePython
To put ActivePython on your PATH:
    export PATH=/var/chroot/home/content/92/9339192/ActivePython/bin:$PATH
The documentation is available here:
    /var/chroot/home/content/92/9339192/ActivePython/doc/python2.7/index.html
    web: http://docs.activestate.com/activepython/2.7
Please send us any feedback you might have or log bugs here:
    activepython-feedback@ActiveState.com
    http://bugs.activestate.com/ActivePython/
Thank you for using ActivePython.

-bash-3.2$ export PATH=~/ActivePython/bin:$PATH
-bash-3.2$ python2.7
ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
Python 2.7.2 (default, Jun 24 2011, 09:10:09)
[GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2
# The banner should say ActivePython if the install was successful
>>> import socket
>>> socket.ssl
<function ssl at 0xb7f21ae4>
# The above would have thrown an error if SSL was not installed

-bash-3.2$ pip install tweetstream
Running setup.py egg_info for package tweetstream
Running setup.py egg_info for package anyjson
Installing collected packages: tweetstream, anyjson
Running setup.py install for tweetstream
Running setup.py install for anyjson
Successfully installed tweetstream anyjson
Cleaning up...

-bash-3.2$ python twit.py
1 ( 0 tweets/sec). FinancialBin : Global stocks fall 5th day, Brent off 2 percent on euro zone fears: NEW YORK (Reuters) - World stocks fell for a... http://t.co/dfWQb3ZL
2 ( 0 tweets/sec). stocksgame : Global stocks fall 5th day, Brent off 2 percent on euro zone fears: NEW YORK (Reuters) - World stocks fell for a... http://t.co/WP92slZt
3 ( 0 tweets/sec). Daily_Finances : #News Global stocks fall 5th day, Brent off 2 percent on euro zone fears http://t.co/flCVonMF
4 ( 0 tweets/sec). AUTSmallBiz : [Sydney Morng Hrld] Stocks set to tumble http://t.co/OEP14YvL
5 ( 0 tweets/sec). BeckaDerrick : @MattoneAwang it's the same price to finance a brand new mini + get free insurance as it is to insure an old car #WhyIsEveryonePuttingMeOff!
```
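Once the new interpreter is first on the PATH, SSL support can also be checked from a script. Note that the socket.ssl attribute used in the interactive session above is specific to older Pythons; on 2.6 and later the standard check is simply importing the ssl module:

```python
# If this import fails, the interpreter was built without SSL support
import ssl

# Shows which OpenSSL the interpreter was linked against
print(ssl.OPENSSL_VERSION)
```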

In the first of this two-part series I’ll show you how to connect to the Twitter API using the tweetstream (http://pypi.python.org/pypi/tweetstream) Python library. The second part will store the tweets in a database.

The Twitter API FAQ (https://dev.twitter.com/docs/faq#6861) states that the free API gives us access to 1% of all public tweets (the full stream is known as the firehose) in any streaming second. We can choose to filter the tweets before they’re downloaded, in which case the 1% of the firehose we’re given is more relevant.

It is unknown at this stage whether we want to collect general tweets or tweets specific to the financial industry. My hunch is that finance-related tweets contain the most information about the future movement of the market.

Installing the tweetstream library

Download the library from http://pypi.python.org/pypi/tweetstream and extract it to a folder, say C:\tweetstream.

Navigate to that folder in a console and execute the command “python setup.py install”; with any luck the library should now be installed.

```
import tweetstream

twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"

try:
    with tweetstream.TweetStream(twitterUsername, twitterPassword) as stream:
        for tweet in stream:
            try:
                print stream.count, "(", stream.rate, "tweets/sec). ", \
                    tweet['user']['screen_name'], ':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"

except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason
```

The following code lets us filter the tweets before they are downloaded

```
import tweetstream

twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"
twitterWordFilter = ["stocks","ftse","finance"]

try:
    with tweetstream.FilterStream(twitterUsername, twitterPassword,
                                  track=twitterWordFilter) as stream:
        for tweet in stream:
            try:
                print stream.count, "(", stream.rate, "tweets/sec). ", \
                    tweet['user']['screen_name'], ':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"

except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason
```

This code will load our filter words from a .csv file (wordstofilter.csv):

```
import csv

twitterWordFilter = [] #Define the list
wordListCsv = csv.reader(open('wordstofilter.csv', 'rb'))
for row in wordListCsv:
    #Add the 0th column of the current row to the list
    twitterWordFilter.append(row[0])

print "Filtering the following words: ", ', '.join(twitterWordFilter)
```

Putting it all together: load the filters from the CSV and filter the stream.

```
import tweetstream
import csv

twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"

twitterWordFilter = [] #Define the list
wordListCsv = csv.reader(open('wordstofilter.csv', 'rb'))
for row in wordListCsv:
    #Add the 0th column of the current row to the list
    twitterWordFilter.append(row[0])

print "Filtering the following words: ", ', '.join(twitterWordFilter)

try:
    with tweetstream.FilterStream(twitterUsername, twitterPassword,
                                  track=twitterWordFilter) as stream:
        for tweet in stream:
            try:
                print stream.count, "(", stream.rate, "tweets/sec). ", \
                    tweet['user']['screen_name'], ':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"

except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason
```

# Getting Started with Python + IDE

Libraries exist in both R and Python for downloading tweets; however, the majority of webhosts don’t support R but do tend to support Python. Since we want to download tweets around the clock, we want the scripts to run on a webhost (the cheaper the better), hence this post on Python.

I considered running a server at home, but once you factor in the electricity bill it actually works out significantly cheaper to use a webhost (it saves roughly £80 a year).

To get started with Python I’d recommend downloading www.pythonxy.com: it’s multi-platform, has a good community and a ton of great packages for scientific analysis.

Visit http://docs.python.org/tutorial/ for a comprehensive tutorial on how to program in python.

# Trading Strategies To Exploit Blog and News Sentiment (Paper)

I just came across this paper and wanted to document it here for something to come back to and test for myself, hopefully you will find it as interesting as I did.

The method has four parameters:

• Sentiment Analysis Period – How many days of previous sentiment data to use?
• Holding Period – How long to hold a trade for?
• Market Capitalization – Do small cap and large cap respond the same?
• Diversification – How many stocks to have in the portfolio?

The paper outlines a market-neutral sentiment-based trading algorithm which is back-tested over a five-year period (2005-2009) and produces some exceptionally impressive returns: almost 40% in certain years, depending on configuration. Each of the trading model parameters is also analysed and its effects explained.

What I like most about the paper is that the assets to trade are selected based upon fixed criteria (i.e. is the stock in the top n most extreme sentiments?). This prevents positive bias, whereby the author could present only profitable scenarios or cherry-pick the results.
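As a rough sketch of that kind of fixed selection rule (the scores, the scoring method and the value of n below are made up for illustration and are not taken from the paper), picking the n stocks with the most extreme sentiment and splitting them into long and short legs might look like:

```python
# Hypothetical sentiment scores per ticker (not from the paper)
sentiment = {'AAA': 0.8, 'BBB': -0.6, 'CCC': 0.1, 'DDD': -0.9, 'EEE': 0.5}

n = 2  # the paper's "Diversification" knob: how many stocks to hold

# Rank tickers by absolute sentiment and keep the n most extreme names
ranked = sorted(sentiment, key=lambda t: abs(sentiment[t]), reverse=True)
selected = ranked[:n]

# Market neutral: long the positive-sentiment names, short the negative ones
longs = [t for t in selected if sentiment[t] > 0]
shorts = [t for t in selected if sentiment[t] < 0]

print(selected, longs, shorts)
```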

The sentiment is based upon analysing news posts, blog posts and tweets. Since the authors’ Twitter data only began in 2009, they had just half a year’s worth of tweets to analyse; the great results in this paper were achieved without Twitter data, using normal news and blog sources.

The paper shows that corpus size matters. Using blogs might be a cheaper way to collect a corpus (scrape lots of RSS feeds), whereas with Twitter there are limitations on what data you can get for free (full datafeeds start at $3,500 a month!).

# PerformanceAnalytics Basics – Measure Strategy Performance and Risk

In this quick tutorial I will introduce the PerformanceAnalytics library, which lets us easily analyse the performance of our strategies. You will learn how to plot cumulative returns and drawdowns compared to an index, output a table of monthly performance metrics, use boxplots to investigate strategy outliers, and finally plot histograms of returns overlaid with different statistical measures.

You will need some returns data for this tutorial; I have created a file with some returns to get you started: strategyperfomance.csv

The script produces three charts (see the plotting calls in the code below). The text output is:

```
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec Strategy1 Strategy2 Index
2005  1.8  0.0 -0.3  1.9  1.4  1.4  0.7  1.3  0.7  0.0  1.2  1.2      12.0       9.2   9.6
2006  0.1  0.9  0.9  1.4  1.0  1.8 -1.7  0.4  1.0  0.2  0.1  0.5       6.9       4.0   6.3
2007  1.5  0.5  0.4 -0.9  1.3  1.8 -2.9 -0.3 -0.2 -1.1 -1.3  0.5      -0.9      -4.5  -3.7
2008  1.8 -1.0  3.5  1.5  0.2 -2.4 -1.0 -1.3  1.3  2.1 -5.3  2.9       1.9       0.4  -0.9
2009 -2.0 -4.4  0.5 -0.5  2.4  3.1  2.0 -0.6 -1.9  1.6  2.2  1.7       3.8      -0.5   0.8
2010  1.4  1.1 -0.8 -3.0 -0.1 -3.3  2.6  4.1  1.4  0.2  2.8  1.5       8.0       8.3   8.6
2011  0.9 -0.2  0.7  1.0 -1.3  2.5  0.6  0.5 -0.5 -2.0 -0.5 -0.6       1.0      -1.2  -1.5
```

Onto the code:

```
install.packages("PerformanceAnalytics") #Install the PerformanceAnalytics library
library("PerformanceAnalytics") #Load the PerformanceAnalytics library

#Load in our strategy performance spreadsheet
#Structure is: Date,Strategy1,Strategy2,Index
#Each row contains the period returns (% terms)
strategies <- read.zoo("strategyperfomance.csv", sep = ",",
                       header = TRUE, format = "%Y-%m-%d")
head(strategies)

#List all the column names to check the data loaded in correctly
#Columns can be accessed with strategies["columnname"]
colnames(strategies)

#Lets see how all the strategies fared against the index
charts.PerformanceSummary(strategies, main = "Performance of all Strategies")

#Lets see how just strategy one fared against the index
charts.PerformanceSummary(strategies[, c("Strategy1", "Index")],
                          main = "Performance of Strategy 1")

#Lets calculate a table of monthly returns by year and strategy
table.CalendarReturns(strategies)

#Lets make a boxplot of the returns
chart.Boxplot(strategies)

#Set the plotting area to a 2 by 2 grid
layout(rbind(c(1, 2), c(3, 4)))

#Plot various histograms with different overlays added
chart.Histogram(strategies[, "Strategy1"], main = "Plain", methods = NULL)
chart.Histogram(strategies[, "Strategy1"], main = "Density", breaks = 40,
                methods = c("add.density", "add.normal"))
chart.Histogram(strategies[, "Strategy1"], main = "Skew and Kurt",
                methods = c("add.centered", "add.rug"))
chart.Histogram(strategies[, "Strategy1"], main = "Risk Measures",
                methods = c("add.risk"))

#Note: The histogram plots above are taken from the example documentation
#The documentation is excellent
#http://cran.r-project.org/web/packages/PerformanceAnalytics/vignettes/PA-charts.pdf
```

The PerformanceAnalytics documentation (including more examples) can be found at:

http://cran.r-project.org/web/packages/PerformanceAnalytics/vignettes/PA-charts.pdf

In this quick tutorial I will show you how to use the quantmod library to download historical data, plot it, add a technical indicator (Bollinger Bands), and do some basic manipulation of date ranges and intersecting data sets.

The tutorial will let you produce charts such as the Bollinger Band plot shown below. This is a very quick tutorial designed to get you started and make you aware of the relevant packages.

Onto the code:

```
install.packages("quantmod") #Install the quantmod library
library("quantmod") #Load the quantmod library
stockData <- new.env() #Make a new environment for quantmod to store data in

startDate = as.Date("2008-01-13") #Specify the period of time we are interested in
endDate = as.Date("2012-01-12")

tickers <- c("ARM","CSR") #Define the tickers we are interested in

#Download the stock history (for all tickers)
getSymbols(tickers, env = stockData, src = "yahoo", from = startDate, to = endDate)

#Use head to show the first six rows of the matrix
head(stockData$ARM)

#Lets look at just the closing prices
Cl(stockData$ARM)

#Lets plot the data
chartSeries(stockData$ARM)

#Lets add some Bollinger Bands to the plot (period 50, width 2 standard deviations)
?addBBands #Display the help documentation so we know what arguments to pass
addBBands(n=50, sd=2)

#Lets get the technical indicator values saved into a variable
#Note: it must be given a single time series (the close price in this example)
indicatorValuesBBands <- BBands(Cl(stockData$ARM), n=50, sd=2)

#Lets examine only a 1 month period of data
armSubset <- window(stockData$ARM, start = as.Date("2010-02-15"),
                    end = as.Date("2010-03-15"))
armSubset #Lets see the data

#Lets extract a 1 month period of data for CSR, starting midway through the ARM data
csrSubset <- window(stockData$CSR, start = as.Date("2010-02-25"),
                    end = as.Date("2010-03-25"))
csrSubset #Lets see the data

#Now we want the intersection of the two subsets of data
#This gives us only the rows where the dates match
#It's important to match the date series to stop spurious analysis of
#non-synchronised data
#all = FALSE specifies the intersection, i.e. don't include all dates in the merge
armcsrIntersection <- merge(armSubset, csrSubset, all = FALSE)
subset(armcsrIntersection, select = c("ARM.Open","CSR.Open")) #Select and display the open columns
```

Further material can be found at: