# Neural Networks with R – A Simple Example

In this tutorial a neural network (or multilayer perceptron, depending on naming convention) will be built that takes a number and calculates its square root (or as close to it as possible). Later tutorials will build upon this to make forecasting / trading models. The R library ‘neuralnet’ will be used to train and build the neural network.

There is lots of good literature on neural networks freely available on the internet. A good starting point is the neural network handout by Dr Mark Gales at the Engineering Department, Cambridge University (http://mi.eng.cam.ac.uk/~mjfg/local/I10/i10_hand4.pdf); it covers just enough to give an understanding of what a neural network is and what it can do, without being so mathematically advanced as to overwhelm the reader.

The tutorial will produce the neural network shown in the image below. It takes a single input (the number you want to square root) and produces a single output (the square root of the input). The middle of the image shows the 10 hidden neurons that will be trained.
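Under the hood, each hidden neuron computes a weighted sum of its inputs and passes it through an activation function (neuralnet defaults to the logistic/sigmoid function), and the output neuron then combines the hidden activations. A rough sketch of that forward pass in Python, purely illustrative — the weights below are random placeholders, not trained values, and neuralnet handles all of this internally:

```python
import math
import random

def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    # Each hidden neuron: sigmoid(weight * input + bias)
    hidden = [1.0 / (1.0 + math.exp(-(w * x + b)))
              for w, b in zip(hidden_weights, hidden_biases)]
    # Output neuron: weighted sum of the hidden activations plus a bias
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

random.seed(0)
hw = [random.uniform(-1, 1) for _ in range(10)]  # one weight per hidden neuron
hb = [random.uniform(-1, 1) for _ in range(10)]  # one bias per hidden neuron
ow = [random.uniform(-1, 1) for _ in range(10)]  # hidden-to-output weights
ob = 0.0

# Untrained, so the output is meaningless until the weights are fitted
print(forward(4.0, hw, hb, ow, ob))
```

Training is then just the search for the weights and biases that make this forward pass approximate the square-root function.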

The output of the script will look like:


```
Input   Expected Output   Neural Net Output
    1                 1        0.9623402772
    4                 2        2.0083461217
    9                 3        2.9958221776
   16                 4        4.0009548085
   25                 5        5.0028838579
   36                 6        5.9975810435
   49                 7        6.9968278722
   64                 8        8.0070028670
   81                 9        9.0019220736
  100                10        9.9222007864
```

As you can see, the neural network does a reasonable job of finding the square root; the largest error is in finding the square root of 1, which is out by ~4%.
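To quantify that, the relative error for each test case can be computed directly from the table above (a quick check in Python; the values are the network outputs listed above):

```python
expected = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
predicted = [0.9623402772, 2.0083461217, 2.9958221776, 4.0009548085,
             5.0028838579, 5.9975810435, 6.9968278722, 8.0070028670,
             9.0019220736, 9.9222007864]

# Relative error as a percentage for each test case
errors = [abs(p - e) / e * 100 for e, p in zip(expected, predicted)]
for e, err in zip(expected, errors):
    print("sqrt(%d): %.2f%% error" % (e**2, err))

print("largest error: %.1f%%" % max(errors))  # prints "largest error: 3.8%"
```

The worst case is indeed sqrt(1), at roughly 3.8%; the remaining cases are all well under 1%.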

Onto the code:

```
install.packages('neuralnet')
library("neuralnet")

#Going to create a neural network to perform square rooting
#Type ?neuralnet for more information on the neuralnet library

#Generate 50 random numbers uniformly distributed between 0 and 100
#and store them as a data frame
traininginput <- as.data.frame(runif(50, min=0, max=100))
trainingoutput <- sqrt(traininginput)

#Column bind the data into one variable
trainingdata <- cbind(traininginput, trainingoutput)
colnames(trainingdata) <- c("Input", "Output")

#Train the neural network with 10 hidden neurons
#Threshold is a numeric value specifying the threshold for the partial
#derivatives of the error function as a stopping criterion
net.sqrt <- neuralnet(Output~Input, trainingdata, hidden=10, threshold=0.01)
print(net.sqrt)

#Plot the neural network
plot(net.sqrt)

#Test the neural network on some test data
testdata <- as.data.frame((1:10)^2) #Generate some squared numbers
net.results <- compute(net.sqrt, testdata) #Run them through the neural network

#See what properties net.results has
ls(net.results)

#See the raw results
print(net.results$net.result)

#Display a cleaner version of the results
cleanoutput <- cbind(testdata, sqrt(testdata),
                     as.data.frame(net.results$net.result))
colnames(cleanoutput) <- c("Input", "Expected Output", "Neural Net Output")
print(cleanoutput)
```

# Twitter Hedge Fund – Derwent Capital Interview

Interesting interview with Paul Hawtin from Derwent Capital about their Twitter fund and some of the implementation details. Key things to note are that they analyse all tweets (no filtering for just FTSE companies), and that it’s not a black-box system: the mood signals are only a single component of their strategy.

Previously, in Part 1, code was produced to use tweetstream and download tweets in real time; this part of the series looks at how to store those tweets for future analysis.

The tweets will be saved into an SQLite database (http://www.sqlite.org/download.html) since it is compatible with both Python and R. It is also much simpler than some versions of SQL (no daemons / services to install). Good SQLite information is available for R (SQLite with R) and Python (SQLite Python docs).

The database will contain a “sources” table, used to track which information source the tweets are downloaded from. For now it will contain just one entry: twitter. Collecting only real-time tweets means we would have to wait a long time to build a significant database to analyse, so in a future article tweets will be downloaded from historical archives (hence they are a different information source).

Setup the db:

```
-bash-3.2$ sqlite3 twitter.db
SQLite version 3.5.9
Enter ".help" for instructions
sqlite> CREATE TABLE sources(sourceid INTEGER,sourcedesc,PRIMARY KEY(sourceid));
sqlite> INSERT INTO sources(sourcedesc) VALUES ("twitter");
sqlite> SELECT * FROM sources;
sqlite> CREATE TABLE tweets(id INTEGER, sourceid REAL,username TEXT,tweet TEXT,timestamp TIMESTAMP, PRIMARY KEY(id));
sqlite> .exit
-bash-3.2$
```
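If you would rather have the script bootstrap its own database, the same schema can be created from Python with the standard sqlite3 module (a sketch; the table definitions mirror the shell session above, and the `IF NOT EXISTS` / count check make it safe to re-run):

```python
import sqlite3

con = sqlite3.connect('twitter.db')  # creates the file if it doesn't exist
cur = con.cursor()

# Mirror of the shell session: a sources table and a tweets table
cur.execute("CREATE TABLE IF NOT EXISTS sources("
            "sourceid INTEGER, sourcedesc, PRIMARY KEY(sourceid))")
cur.execute("SELECT COUNT(*) FROM sources WHERE sourcedesc='twitter'")
if cur.fetchone()[0] == 0:  # only insert the twitter source once
    cur.execute("INSERT INTO sources(sourcedesc) VALUES ('twitter')")
cur.execute("CREATE TABLE IF NOT EXISTS tweets("
            "id INTEGER, sourceid REAL, username TEXT, tweet TEXT, "
            "timestamp TIMESTAMP, PRIMARY KEY(id))")
con.commit()

for row in cur.execute("SELECT * FROM sources"):
    print(row)  # e.g. (1, 'twitter')
con.close()
```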

Now onto the code. You may want to alter the try/except blocks depending on how you want the script to respond to errors. On my server a cron job launches the script every 5 minutes, so if it dies it gets restarted.
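Such a watchdog can be set up with a crontab entry along these lines (the script path is hypothetical; `flock -n` stops cron from starting a second copy while one is still running):

```
*/5 * * * * flock -n /tmp/twit.lock python /path/to/twit.py
```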

```
import sqlite3 as lite
import tweetstream
import csv
import sys
import time

twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"

twitterWordFilter = [] #Define the list
wordListCsv = csv.reader(open('wordstofilter.csv', 'rb'))
for row in wordListCsv:
    #Add the 0th column of the current row to the list
    twitterWordFilter.append(row[0])

print "Filtering the following words: ", ', '.join(twitterWordFilter)

con = None
try:
    #Load the database file (or create it if not found)
    #If we don't set isolation_level we would need to call
    #con.commit() after every execute to save the transaction
    con = lite.connect('twitter.db', isolation_level=None)
    cur = con.cursor() #Use cursor for executing queries

    #Get the sourceid (will be useful when we use multiple data sources)
    cur.execute("SELECT sourceid FROM sources WHERE sourcedesc='twitter'")
    sourceid = cur.fetchone() #Get the source id
    sourceid = sourceid[0]

    with tweetstream.FilterStream(twitterUsername, twitterPassword,
                                  track=twitterWordFilter) as stream:
        for tweet in stream:
            tweettimestamp = time.mktime(time.strptime(tweet['created_at'],
                "%a %b %d %H:%M:%S +0000 %Y")) - time.timezone

            print stream.count, "(", stream.rate, "tweets/sec). ", \
                tweet['user']['screen_name'], ':', tweet['text'].encode('ascii', 'ignore')
            #print tweet #Use for raw output
            try:
                cur.execute("INSERT INTO tweets(sourceid,username,tweet,timestamp) VALUES(?,?,?,?)",
                            [sourceid, tweet['user']['screen_name'],
                             tweet['text'].encode('ascii', 'ignore'), tweettimestamp])
            except:
                print "SQL Insert Error: probably an encoding issue due to foreign languages"

except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason
except lite.Error, e:
    print "SQLite Error:", e
except:
    print "ERROR:", sys.exc_info()
finally:
    if con:
        con.close()
```

Don’t forget to update wordstofilter.csv to track the stocks that you’re interested in. American stocks seem to respond well to the $TICKER form, because StockTwits has encouraged a consistent format. From a quick play around, using tickers without a symbol in front just matches random foreign tweets.

The hosting provider for this site is GoDaddy; they provide Python 2.4 and Python 2.7 on their basic economy web hosting package. Tweetstream is only compatible with Python 2.6 and upwards.

Unfortunately the GoDaddy Python 2.7 wasn’t built with the SSL option enabled, so communication over HTTPS, which the Twitter API requires, isn’t possible. Fortunately I have been able to find a solution, which is discussed in this post.
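A quick way to check whether any Python build has SSL support is to import the standard ssl module; a build compiled without OpenSSL raises ImportError (on very old Pythons the equivalent check was `socket.ssl`, as the install session later in this post shows):

```python
try:
    import ssl  # fails with ImportError if Python was built without OpenSSL
    print("SSL support available: " + ssl.OPENSSL_VERSION)
except ImportError:
    print("This Python build has no SSL support")
```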

Installing Python over SSH on the command line is normally a straightforward task, but since I don’t have root access and there are no compilers available, the problem is trickier to solve.

Fortunately it is possible to download pre-compiled python binaries and install them in a folder where I have read/write access. The prebuilt binaries can be found at http://www.activestate.com/activepython/downloads.

```
-bash-3.2$ wget http://downloads.activestate.com/ActivePython/releases/2.7.2.5/ActivePython-2.7.2.5-linux-x86.tar.gz
-bash-3.2$ tar -zxvf ActivePython-2.7.2.5-linux-x86.tar.gz
-bash-3.2$ cd ActivePython-2.7.2.5-linux-x86
-bash-3.2$ ./install.sh
Enter directory in which to install ActivePython. Leave blank and
press 'Enter' to use the default [/opt/ActivePython-2.7].
Install directory: ~/ActivePython
()
Installing ActivePython to '/var/chroot/home/content/92/9339192/ActivePython'...
Relocating dir-dependent files...
Pre-compiling .py files in the standard library...
ActivePython has been successfully installed to:
/var/chroot/home/content/92/9339192/ActivePython
to put ActivePython on your PATH:
export PATH=/var/chroot/home/content/92/9339192/ActivePython/bin:$PATH
The documentation is available here:
/var/chroot/home/content/92/9339192/ActivePython/doc/python2.7/index.html
web: http://docs.activestate.com/activepython/2.7
Please send us any feedback you might have or log bugs here:
activepython-feedback@ActiveState.com
http://bugs.activestate.com/ActivePython/
Thank you for using ActivePython.
export PATH=~/ActivePython/bin:$PATH
-bash-3.2$ python2.7
ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
Python 2.7.2 (default, Jun 24 2011, 09:10:09)
[GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2
It should say ActivePython above if the install was successful
>>> import socket
>>> socket.ssl
<function ssl at 0xb7f21ae4>
The above would have thrown an error if SSL was not installed
-bash-3.2$ pip install tweetstream
Running setup.py egg_info for package tweetstream
Running setup.py egg_info for package anyjson
Installing collected packages: tweetstream, anyjson
Running setup.py install for tweetstream
Running setup.py install for anyjson
Successfully installed tweetstream anyjson
Cleaning up...
-bash-3.2$ python twit.py
1 ( 0 tweets/sec). FinancialBin : Global stocks fall 5th day, Brent off 2 percent on euro zone fears: NEW YORK (Reuters) - World stocks fell for a... http://t.co/dfWQb3ZL
2 ( 0 tweets/sec). stocksgame : Global stocks fall 5th day, Brent off 2 percent on euro zone fears: NEW YORK (Reuters) - World stocks fell for a... http://t.co/WP92slZt
3 ( 0 tweets/sec). Daily_Finances : #News Global stocks fall 5th day, Brent off 2 percent on euro zone fears http://t.co/flCVonMF
4 ( 0 tweets/sec). AUTSmallBiz : [Sydney Morng Hrld] Stocks set to tumble http://t.co/OEP14YvL
5 ( 0 tweets/sec). BeckaDerrick : @MattoneAwang it's the same price to finance a brand new mini + get free insurance as it is to insure an old car #WhyIsEveryonePuttingMeOff!
```

In the first of this two part series I’ll show you how to connect to the twitter API using the tweetstream (http://pypi.python.org/pypi/tweetstream) python library. The second part will store them in a database.

The Twitter API FAQ (https://dev.twitter.com/docs/faq#6861) states that the free API gives access to 1% of all public tweets in any streaming second (the full stream is known as the firehose). We can choose to filter the tweets before they are downloaded, in which case the 1% of the firehose we are given is more relevant.

It is unknown at this stage whether we want to collect general tweets or tweets specific to the financial industry. My hunch is that finance-related tweets contain the most information about the future movement of the market.

Installing the tweetstream library

Download the library from http://pypi.python.org/pypi/tweetstream and extract it to a folder, say C:\tweetstream.

Navigate to that folder in a console and execute the command “python setup.py install”; with any luck the library should now be installed.

```
import tweetstream

twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"

try:
    with tweetstream.TweetStream(twitterUsername, twitterPassword) as stream:
        for tweet in stream:
            try:
                print stream.count, "(", stream.rate, "tweets/sec). ", \
                    tweet['user']['screen_name'], ':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"

except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason
```

The following code lets us filter the tweets before they are downloaded:

```
import tweetstream

twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"
twitterWordFilter = ["stocks", "ftse", "finance"]

try:
    with tweetstream.FilterStream(twitterUsername, twitterPassword,
                                  track=twitterWordFilter) as stream:
        for tweet in stream:
            try:
                print stream.count, "(", stream.rate, "tweets/sec). ", \
                    tweet['user']['screen_name'], ':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"

except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason
```

Rather than hard-coding the filter, the word list can be loaded from a CSV file, one word per row:

```
import csv

twitterWordFilter = [] #Define the list
wordListCsv = csv.reader(open('wordstofilter.csv', 'rb'))
for row in wordListCsv:
    #Add the 0th column of the current row to the list
    twitterWordFilter.append(row[0])

print "Filtering the following words: ", ', '.join(twitterWordFilter)
```
Combining the CSV loading with the filtered stream gives the final script:

```
import tweetstream
import csv

twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"

twitterWordFilter = [] #Define the list
wordListCsv = csv.reader(open('wordstofilter.csv', 'rb'))
for row in wordListCsv:
    #Add the 0th column of the current row to the list
    twitterWordFilter.append(row[0])

print "Filtering the following words: ", ', '.join(twitterWordFilter)

try:
    with tweetstream.FilterStream(twitterUsername, twitterPassword,
                                  track=twitterWordFilter) as stream:
        for tweet in stream:
            try:
                print stream.count, "(", stream.rate, "tweets/sec). ", \
                    tweet['user']['screen_name'], ':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"

except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason
```