Neural Networks with R – A Simple Example

In this tutorial a neural network (or multilayer perceptron, depending on naming convention) will be built that takes a number and calculates its square root (or as close to it as possible). Later tutorials will build upon this to make forecasting / trading models. The R library ‘neuralnet’ will be used to train and build the neural network.

There is lots of good literature on neural networks freely available on the internet. A good starting point is the neural network handout by Dr Mark Gales at the Cambridge University Engineering Department (http://mi.eng.cam.ac.uk/~mjfg/local/I10/i10_hand4.pdf); it covers just enough to give an understanding of what a neural network is and what it can do, without being so mathematically advanced as to overwhelm the reader.

The tutorial will produce the neural network shown in the image below. It takes a single input (the number you want square rooted) and produces a single output (the square root of the input). The middle layer contains 10 hidden neurons whose weights will be trained.
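As a language-agnostic illustration of that architecture (this is not the author's R code, and the weights here are untrained random placeholders), the forward pass of a 1-10-1 network can be sketched in a few lines of NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1 input -> 10 hidden neurons -> 1 output, matching the diagram
W1 = rng.normal(size=(1, 10))   # input-to-hidden weights
b1 = rng.normal(size=10)        # hidden biases
W2 = rng.normal(size=(10, 1))   # hidden-to-output weights
b2 = rng.normal(size=1)         # output bias

def forward(x):
    # Logistic activation in the hidden layer, linear output neuron
    # (these are the neuralnet package's defaults)
    h = 1.0 / (1.0 + np.exp(-(x @ W1 + b1)))
    return h @ W2 + b2

x = np.array([[25.0]])          # one input row, e.g. the number 25
print(forward(x).shape)         # (1, 1): one output per input row
```

Training amounts to adjusting W1, b1, W2, b2 until the output matches the square root of the input, which is exactly what the neuralnet call below does for us.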

The output of the script will look like:

   Input 	Expected Output		 Neural Net Output
      1               1     		 0.9623402772
      4               2     		 2.0083461217
      9               3     		 2.9958221776
     16               4     		 4.0009548085
     25               5     		 5.0028838579
     36               6     		 5.9975810435
     49               7    		 6.9968278722
     64               8    		 8.0070028670
     81               9    		 9.0019220736
    100              10    		 9.9222007864

As you can see, the neural network does a reasonable job of finding the square root; the largest error is in finding the square root of 1, which is out by ~4%.
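The ~4% figure can be verified directly from the printed values (a quick Python check):

```python
# Values copied from the output table above
expected = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
net_out = [0.9623402772, 2.0083461217, 2.9958221776, 4.0009548085,
           5.0028838579, 5.9975810435, 6.9968278722, 8.0070028670,
           9.0019220736, 9.9222007864]

# Relative error of each network output against the true square root
rel_errors = [abs(n - e) / e for n, e in zip(net_out, expected)]
worst = max(rel_errors)
print("worst relative error: %.2f%%" % (worst * 100))  # 3.77%, at input 1
```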

Onto the code:

install.packages('neuralnet')
library("neuralnet")
 
#Going to create a neural network to perform square rooting
#Type ?neuralnet for more information on the neuralnet library
 
#Generate 50 random numbers uniformly distributed between 0 and 100
#And store them as a dataframe
traininginput <-  as.data.frame(runif(50, min=0, max=100))
trainingoutput <- sqrt(traininginput)
 
#Column bind the data into one variable
trainingdata <- cbind(traininginput,trainingoutput)
colnames(trainingdata) <- c("Input","Output")
 
#Train the neural network
#Going to have 10 neurons in a single hidden layer
#The threshold is a numeric value specifying the threshold for the partial
#derivatives of the error function, used as the stopping criterion.
net.sqrt <- neuralnet(Output~Input,trainingdata, hidden=10, threshold=0.01)
print(net.sqrt)
 
#Plot the neural network
plot(net.sqrt)
 
#Test the neural network on some test data
testdata <- as.data.frame((1:10)^2) #Generate some squared numbers
net.results <- compute(net.sqrt, testdata) #Run them through the neural network
 
#Let's see what properties net.results has
ls(net.results)
 
#Let's see the results
print(net.results$net.result)
 
#Let's display a better version of the results
cleanoutput <- cbind(testdata,sqrt(testdata),
                         as.data.frame(net.results$net.result))
colnames(cleanoutput) <- c("Input","Expected Output","Neural Net Output")
print(cleanoutput)
(Network diagram image: http://gekkoquant.com/wp-content/uploads/2012/05/SquareRootNeuralNetImage-300x300.jpg)

Twitter Trading – Downloading Tweets Using Python (Part 2 of 2)

Previously, in Part 1, code was produced that uses tweetstream to download tweets in real time; this part of the series looks at how to store those tweets for future analysis.

The tweets will be saved into an SQLite database (http://www.sqlite.org/download.html), since this is compatible with both Python and R. It is also much simpler than some flavours of SQL (no daemons / services to install). Good SQLite information is available for R (SQLite with R) and for Python (the SQLite Python docs).

The database will contain a “sources” table. This is used to track which information source the tweets were downloaded from. For now it’ll just contain one entry: twitter. Just collecting real-time tweets means we’ll have to wait a long time to get a significant database to analyse. In a future article tweets will be downloaded from historical archives (hence a different information source).

Setup the db:

-bash-3.2$ sqlite3 twitter.db
SQLite version 3.5.9
Enter ".help" for instructions
sqlite> CREATE TABLE sources(sourceid INTEGER,sourcedesc,PRIMARY KEY(sourceid));
sqlite> INSERT INTO sources(sourcedesc) VALUES ('twitter');
sqlite> SELECT * FROM sources;
1|twitter
sqlite> CREATE TABLE tweets(id INTEGER, sourceid INTEGER,username TEXT,tweet TEXT,timestamp TIMESTAMP, PRIMARY KEY(id));
sqlite> .exit
-bash-3.2$
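Equivalently, the same schema can be created from Python with the standard sqlite3 module (sketched here against an in-memory database so it is safe to run; point it at twitter.db to reproduce the session above):

```python
import sqlite3

# Use 'twitter.db' to create the real file; ':memory:' keeps this
# example self-contained
con = sqlite3.connect(":memory:")
cur = con.cursor()

cur.execute("CREATE TABLE sources(sourceid INTEGER, sourcedesc, "
            "PRIMARY KEY(sourceid))")
cur.execute("INSERT INTO sources(sourcedesc) VALUES ('twitter')")
cur.execute("CREATE TABLE tweets(id INTEGER, sourceid INTEGER, "
            "username TEXT, tweet TEXT, timestamp TIMESTAMP, "
            "PRIMARY KEY(id))")
con.commit()

# sourceid is an INTEGER PRIMARY KEY, so SQLite auto-assigns 1
cur.execute("SELECT * FROM sources")
rows = cur.fetchall()
print(rows)  # [(1, 'twitter')]
con.close()
```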

Now onto the code. You may want to alter the try/except blocks depending on how you want the script to respond to errors. On my server a cron job launches the script every 5 minutes, so if it dies it’ll get restarted.

import sqlite3 as lite
import tweetstream
import csv
import sys
from datetime import date
import time
 
twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"
 
twitterWordFilter = [] #Define the list
wordListCsv = csv.reader(open('wordstofilter.csv', 'rb'))
for row in wordListCsv:
    #Add the 0th column of the current row to the list
    twitterWordFilter.append(row[0])
 
print "Filtering the following words: ",', '.join(twitterWordFilter)
 
 
 
try:
    #Load the data base file (or make it if not found)
    #If dont set isolation level then we need to call
    #Db commit after every execute to save the transaction
    con = lite.connect('twitter.db',isolation_level=None)
    cur = con.cursor() #Use cursor for executing queries
    #Get the sourceid (will be useful when we use multiple data sources)
    cur.execute("SELECT sourceid FROM sources where sourcedesc='twitter'")
    sourceid = cur.fetchone() #Get the source id
    sourceid = sourceid[0]
 
    with tweetstream.FilterStream(twitterUsername, twitterPassword,track=twitterWordFilter) as stream:
        for tweet in stream:
            tweettimestamp =  time.mktime(time.strptime(tweet['created_at'],"%a %b %d %H:%M:%S +0000 %Y")) - time.timezone
 
            print stream.count,"(",stream.rate,"tweets/sec). ",tweet['user']['screen_name'],':', tweet['text'].encode('ascii','ignore')
            #print tweet #Use for raw output            
            try:
                cur.execute("INSERT INTO tweets(sourceid,username,tweet,timestamp) VALUES(?,?,?,?)",[sourceid,tweet['user']['screen_name'],tweet['text'].encode('ascii','ignore'),tweettimestamp])
            except:
                print "SQL Insert Error: Probably some encoding issue due to foreign languages"
 
 
 
except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason
except lite.Error, e:
    print "SQLite Error:",e
except:
    print "ERROR:",sys.exc_info()    
finally:
    if con:
        con.close()
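The `time.mktime(...) - time.timezone` expression above converts Twitter's created_at string to a UTC epoch. The same conversion can be written more directly with calendar.timegm, which interprets a struct_time as UTC without the timezone correction (Python 3 shown; the timestamp string is a made-up example in Twitter's streaming format):

```python
import calendar
import time

# Example created_at string in Twitter's streaming-API format
created_at = "Wed May 23 06:01:23 +0000 2012"

parsed = time.strptime(created_at, "%a %b %d %H:%M:%S +0000 %Y")
epoch = calendar.timegm(parsed)  # treat the struct_time as UTC

# Round-trip back to UTC to confirm nothing was shifted
print(time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(epoch)))
# 2012-05-23 06:01:23
```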

Don’t forget to update wordstofilter.csv to track the stocks that you’re interested in. It looks like American stocks respond well to $TICKER; this is because StockTwits has encouraged a nice format. From a quick play around, using tickers without symbols in front just matches random foreign tweets.
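One way to regenerate wordstofilter.csv programmatically (Python 3 here, so the file modes differ from the Python 2 downloader; the tickers are placeholder examples):

```python
import csv

# Hypothetical watch list -- one filter term per row, first column
# only, which is the layout the downloader script expects
tickers = ["$AAPL", "$GOOG", "$MSFT", "finance"]

with open("wordstofilter.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for term in tickers:
        writer.writerow([term])

# Read it back the same way the downloader does
with open("wordstofilter.csv", newline="") as f:
    loaded = [row[0] for row in csv.reader(f)]
print(loaded)  # ['$AAPL', '$GOOG', '$MSFT', 'finance']
```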

Attached is the complete project so far including a db (TwitterDownload.Zip).


Twitter Trading – Getting tweetstream working on GoDaddy

In http://gekkoquant.com/2012/05/17/twitter-trading-downloading-tweets-using-python-part-1-of-2/ Python code was provided for downloading tweets. The Python tweetstream library was used because Python is more compatible than R with cheap webhosting.

The hosting provider for this site is GoDaddy, who provide Python 2.4 and Python 2.7 on their basic economy web hosting package. Tweetstream is only compatible with Python 2.6 and upwards.

Unfortunately GoDaddy’s Python 2.7 wasn’t built with SSL enabled, so communication over HTTPS, which the Twitter API requires, isn’t possible. Fortunately I have been able to find a solution, which is discussed in this post.

Installing Python over SSH at the command line is normally a straightforward task, but since I don’t have root access and there are no compilers available, the problem is trickier to solve.

Fortunately it is possible to download pre-compiled python binaries and install them in a folder where I have read/write access. The prebuilt binaries can be found at http://www.activestate.com/activepython/downloads.

wget http://downloads.activestate.com/ActivePython/releases/2.7.2.5/ActivePython-2.7.2.5-linux-x86.tar.gz
tar -zxvf ActivePython-2.7.2.5-linux-x86.tar.gz
cd ActivePython-2.7.2.5-linux-x86
-bash-3.2$ ./install.sh
Enter directory in which to install ActivePython. Leave blank and
press 'Enter' to use the default [/opt/ActivePython-2.7].
Install directory: ~/ActivePython
()
Installing ActivePython to '/var/chroot/home/content/92/9339192/ActivePython'...
Relocating dir-dependent files...
Pre-compiling .py files in the standard library...
ActivePython has been successfully installed to:
/var/chroot/home/content/92/9339192/ActivePython
You can add the following to your .bashrc (or equivalent)
to put ActivePython on your PATH:
export PATH=/var/chroot/home/content/92/9339192/ActivePython/bin:$PATH
The documentation is available here:
/var/chroot/home/content/92/9339192/ActivePython/doc/python2.7/index.html
 web: http://docs.activestate.com/activepython/2.7
Please send us any feedback you might have or log bugs here:
activepython-feedback@ActiveState.com

http://bugs.activestate.com/ActivePython/

Thank you for using ActivePython.
-bash-3.2$ export PATH=~/ActivePython/bin:$PATH
-bash-3.2$ python2.7
ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
Python 2.7.2 (default, Jun 24 2011, 09:10:09)
[GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
It should say ActivePython above if the install was successful.
>>> import socket
>>> socket.ssl
<function ssl at 0xb7f21ae4>
The above would have thrown an error if SSL were not installed.
-bash-3.2$ pip install tweetstream
Downloading/unpacking tweetstream
 Downloading tweetstream-1.1.1.tar.gz
 Running setup.py egg_info for package tweetstream
Downloading/unpacking anyjson (from tweetstream)
 Downloading anyjson-0.3.1.tar.gz
 Running setup.py egg_info for package anyjson
Installing collected packages: tweetstream, anyjson
 Running setup.py install for tweetstream
Running setup.py install for anyjson
Successfully installed tweetstream anyjson
Cleaning up...
-bash-3.2$ python twit.py
1 ( 0 tweets/sec). FinancialBin : Global stocks fall 5th day, Brent off 2 percent on euro zone fears: NEW YORK (Reuters) - World stocks fell for a... http://t.co/dfWQb3ZL
2 ( 0 tweets/sec). stocksgame : Global stocks fall 5th day, Brent off 2 percent on euro zone fears: NEW YORK (Reuters) - World stocks fell for a... http://t.co/WP92slZt
3 ( 0 tweets/sec). Daily_Finances : #News Global stocks fall 5th day, Brent off 2 percent on euro zone fears http://t.co/flCVonMF
4 ( 0 tweets/sec). AUTSmallBiz : [Sydney Morng Hrld] Stocks set to tumble http://t.co/OEP14YvL
5 ( 0 tweets/sec). BeckaDerrick : @MattoneAwang it's the same price to finance a brand new mini + get free insurance as it is to insure an old car #WhyIsEveryonePuttingMeOff!


Twitter Trading – Downloading Tweets Using Python (Part 1 of 2)

In the first of this two-part series I’ll show you how to connect to the Twitter API using the tweetstream (http://pypi.python.org/pypi/tweetstream) Python library. The second part will store the tweets in a database.

The Twitter API FAQ (https://dev.twitter.com/docs/faq#6861) states that the free API gives us access to 1% of all public tweets (the full feed is known as the firehose) in any streaming second. We can choose to filter the tweets before they’re downloaded, in which case the 1% of the firehose we’re given is more relevant.

It is unknown at this stage whether we want to collect general tweets or tweets specific to the financial industry. My hunch is that finance-related tweets contain the most information about the future movement of the market.

Installing the tweetstream library

Download the library from http://pypi.python.org/pypi/tweetstream and extract it to a folder, say C:\tweetstream

Navigate to that folder in a console and execute the command “python setup.py install”; with any luck the library should now be installed.

Downloading a generic set of tweets

This code connects to the tweet stream and displays tweets as they arrive:

import tweetstream
 
twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"
try:
    with tweetstream.TweetStream(twitterUsername, twitterPassword) as stream:
        for tweet in stream:
            try:
                print stream.count,"(",stream.rate,"tweets/sec). ",tweet['user']['screen_name'],':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"
 
except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason

Downloading a set of tweets matching a word filter

The following code lets us filter the tweets before they are downloaded

import tweetstream
 
twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"
twitterWordFilter = ["stocks","ftse","finance"]
 
try:
    with tweetstream.FilterStream(twitterUsername, twitterPassword,track=twitterWordFilter) as stream:
        for tweet in stream:
            try:
                print stream.count,"(",stream.rate,"tweets/sec). ",tweet['user']['screen_name'],':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"
 
except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason

Loading the word filter list from a CSV file

This code will let us load our filter words from a .csv file (wordstofilter.csv):

import csv
 
twitterWordFilter = [] #Define the list
wordListCsv = csv.reader(open('wordstofilter.csv', 'rb'))
for row in wordListCsv:
    #Add the 0th column of the current row to the list
    twitterWordFilter.append(row[0])
 
print "Filtering the following words: ",', '.join(twitterWordFilter)

Putting it all together: load the filters from the CSV file and filter the stream

import tweetstream
import csv
 
twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"
 
twitterWordFilter = [] #Define the list
wordListCsv = csv.reader(open('wordstofilter.csv', 'rb'))
for row in wordListCsv:
    #Add the 0th column of the current row to the list
    twitterWordFilter.append(row[0])
 
print "Filtering the following words: ",', '.join(twitterWordFilter)
 
try:
    with tweetstream.FilterStream(twitterUsername, twitterPassword,track=twitterWordFilter) as stream:
        for tweet in stream:
            try:
                print stream.count,"(",stream.rate,"tweets/sec). ",tweet['user']['screen_name'],':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"
 
except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason