Twitter Trading – Downloading Tweets Using Python (Part 2 of 2)

In Part 1, code was produced that uses tweetstream to download tweets in real time. This part of the series looks at how to store those tweets for future analysis.

The tweets will be saved into an SQLite database (http://www.sqlite.org/download.html), since SQLite is compatible with both Python and R. It is also much simpler than server-based SQL databases (no daemons or services to install). Good SQLite information is available for R (Sqlite with R) and Python (Sqlite Python Docs).

The database will contain a “source” table, which tracks the information source each tweet was downloaded from. For now it will contain just one entry: twitter. Collecting only real-time tweets means we would have to wait a long time to build a database large enough to analyse, so in a future article tweets will also be downloaded from historical archives (which are a different information source).

Set up the database:

-bash-3.2$ sqlite3 twitter.db
SQLite version 3.5.9
Enter ".help" for instructions
sqlite> CREATE TABLE sources(sourceid INTEGER,sourcedesc TEXT,PRIMARY KEY(sourceid));
sqlite> INSERT INTO sources(sourcedesc) VALUES ('twitter');
sqlite> SELECT * FROM sources;
1|twitter
sqlite> CREATE TABLE tweets(id INTEGER, sourceid INTEGER,username TEXT,tweet TEXT,timestamp TIMESTAMP, PRIMARY KEY(id));
sqlite> .exit
-bash-3.2$
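
The same tables can also be created from Python with the standard sqlite3 module, which is handy if the script should be able to bootstrap its own database. The following is a minimal sketch, not part of the attached project: the function name create_schema is made up for illustration, sourceid is declared as INTEGER in both tables, and an in-memory database is used so the example does not touch twitter.db.

```python
import sqlite3


def create_schema(con):
    """Create the sources and tweets tables if they do not exist yet."""
    con.execute("CREATE TABLE IF NOT EXISTS sources("
                "sourceid INTEGER PRIMARY KEY, sourcedesc TEXT)")
    con.execute("CREATE TABLE IF NOT EXISTS tweets("
                "id INTEGER PRIMARY KEY, sourceid INTEGER, username TEXT, "
                "tweet TEXT, timestamp TIMESTAMP)")
    # Register the twitter source only once, so the function can be re-run safely
    row = con.execute(
        "SELECT sourceid FROM sources WHERE sourcedesc='twitter'").fetchone()
    if row is None:
        con.execute("INSERT INTO sources(sourcedesc) VALUES ('twitter')")
    con.commit()


# ":memory:" is an illustration only; pass 'twitter.db' to work on the real file
con = sqlite3.connect(":memory:")
create_schema(con)
create_schema(con)  # Second call changes nothing
print(con.execute("SELECT * FROM sources").fetchall())
```

Because every statement is guarded (IF NOT EXISTS, and a lookup before the insert), calling the function on an existing database leaves it unchanged.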

Now onto the code. You may want to alter the try/except blocks depending on how you want the script to respond to errors. On my server a cron job launches the script every 5 minutes, so if it dies it will get restarted.

import calendar
import csv
import sqlite3 as lite
import sys
import time

import tweetstream

twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"

# Load the words to track from the 0th column of each row in the CSV file
twitterWordFilter = []
with open('wordstofilter.csv', 'rb') as wordFile:
    for row in csv.reader(wordFile):
        if row:  # Skip blank lines
            twitterWordFilter.append(row[0])

print "Filtering the following words: ", ', '.join(twitterWordFilter)

con = None  # Defined up front so the finally block can always test it
try:
    # Load the database file (or make it if not found).
    # isolation_level=None puts the connection in autocommit mode;
    # without it we would need to call con.commit() after every execute.
    con = lite.connect('twitter.db', isolation_level=None)
    cur = con.cursor()  # Use a cursor for executing queries
    # Get the sourceid (will be useful when we use multiple data sources)
    cur.execute("SELECT sourceid FROM sources WHERE sourcedesc='twitter'")
    sourceid = cur.fetchone()[0]

    with tweetstream.FilterStream(twitterUsername, twitterPassword, track=twitterWordFilter) as stream:
        for tweet in stream:
            # created_at is in UTC, so convert it with calendar.timegm
            # (time.mktime would interpret it as local time)
            tweettimestamp = calendar.timegm(time.strptime(tweet['created_at'], "%a %b %d %H:%M:%S +0000 %Y"))
            text = tweet['text'].encode('ascii', 'ignore')

            print stream.count, "(", stream.rate, "tweets/sec). ", tweet['user']['screen_name'], ':', text
            #print tweet #Use for raw output
            try:
                cur.execute("INSERT INTO tweets(sourceid,username,tweet,timestamp) VALUES(?,?,?,?)",
                            [sourceid, tweet['user']['screen_name'], text, tweettimestamp])
            except lite.Error, e:
                print "SQL Insert Error:", e

except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason
except lite.Error, e:
    print "SQLite Error:", e
except:
    print "ERROR:", sys.exc_info()
finally:
    if con:
        con.close()
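
The created_at field of a tweet is a UTC string (hence the literal +0000 in the format). One way to turn it into an epoch value that does not depend on the server's time zone or daylight saving is calendar.timegm from the standard library. A minimal sketch follows; the function name and the example string are made up for illustration.

```python
import calendar
import time

TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S +0000 %Y"


def created_at_to_epoch(created_at):
    """Convert a Twitter created_at string (UTC) to seconds since the epoch."""
    parsed = time.strptime(created_at, TWITTER_TIME_FORMAT)
    # timegm treats the struct as UTC; time.mktime would treat it as local time
    return calendar.timegm(parsed)


# Made-up example value in the format the stream delivers
ts = created_at_to_epoch("Thu May 17 18:30:00 +0000 2012")
# Converting back with gmtime should reproduce the original UTC time
print(time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(ts)))
```

Storing the value as a UTC epoch keeps later comparisons with market data simple, since both can be converted to whichever time zone the analysis needs.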

Don’t forget to update wordstofilter.csv to track the stocks that you’re interested in. American stocks seem to respond well to the $TICKER form, because StockTwits has established it as a consistent format. From a quick play around, using tickers without the $ symbol in front mostly matches random foreign-language tweets.
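
Once tweets are stored, the $TICKER convention also makes it easy to work out which stocks a tweet mentions. The sketch below is an illustration only: the regular expression (a dollar sign followed by one to five letters) is an assumption about what a ticker looks like, not a rule taken from Twitter or StockTwits.

```python
import re

# Assumed pattern: "$" followed by 1-5 letters, not glued to a preceding word
CASHTAG = re.compile(r"(?<![\w$])\$([A-Za-z]{1,5})\b")


def extract_tickers(text):
    """Return the upper-cased tickers written as $TICKER in the text."""
    return [match.upper() for match in CASHTAG.findall(text)]


print(extract_tickers("Long $aapl and $MSFT, paid $100 for lunch"))
```

Dollar amounts such as $100 are skipped because the pattern requires letters after the dollar sign.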

Attached is the complete project so far including a db (TwitterDownload.Zip).


Twitter Trading – Getting tweetstream working on GoDaddy

In Part 1 (http://gekkoquant.com/2012/05/17/twitter-trading-downloading-tweets-using-python-part-1-of-2/) Python code was provided for downloading tweets. The Python tweetstream library was used because Python is easier than R to get running on cheap web hosting.

The hosting provider for this site is GoDaddy, which provides Python 2.4 and Python 2.7 on its basic economy web hosting package. Tweetstream is only compatible with Python 2.6 and upwards.

Unfortunately the GoDaddy Python 2.7 wasn’t built with SSL support enabled, so communication over HTTPS, which the Twitter API requires, isn’t possible. Fortunately I have been able to find a solution, which is discussed in this post.
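
A quick way to check whether a given interpreter has SSL support is to try importing the standard ssl module. This is a minimal sketch; the helper name has_ssl is made up for illustration.

```python
import sys


def has_ssl():
    """Return True if this interpreter can import the ssl module."""
    try:
        import ssl  # Raises ImportError when Python was built without SSL
    except ImportError:
        return False
    return True


print("Python %d.%d, SSL available: %s"
      % (sys.version_info[0], sys.version_info[1], has_ssl()))
```

Running it with each Python binary on the host shows which one, if any, can talk HTTPS.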

Installing Python over SSH from the command line is normally a straightforward task, but since I don’t have root access and there are no compilers available, the problem is tricky to solve.

Fortunately it is possible to download pre-compiled python binaries and install them in a folder where I have read/write access. The prebuilt binaries can be found at http://www.activestate.com/activepython/downloads.

wget http://downloads.activestate.com/ActivePython/releases/2.7.2.5/ActivePython-2.7.2.5-linux-x86.tar.gz
tar -zxvf ActivePython-2.7.2.5-linux-x86.tar.gz
cd ActivePython-2.7.2.5-linux-x86
./install.sh
-bash-3.2$ ./install.sh
Enter directory in which to install ActivePython. Leave blank and
press 'Enter' to use the default [/opt/ActivePython-2.7].
Install directory: ~/ActivePython
()
Installing ActivePython to '/var/chroot/home/content/92/9339192/ActivePython'...
Relocating dir-dependent files...
Pre-compiling .py files in the standard library...
ActivePython has been successfully installed to:
/var/chroot/home/content/92/9339192/ActivePython
You can add the following to your .bashrc (or equivalent)
to put ActivePython on your PATH:
export PATH=/var/chroot/home/content/92/9339192/ActivePython/bin:$PATH
The documentation is available here:
/var/chroot/home/content/92/9339192/ActivePython/doc/python2.7/index.html
 web: http://docs.activestate.com/activepython/2.7
Please send us any feedback you might have or log bugs here:
activepython-feedback@ActiveState.com

http://bugs.activestate.com/ActivePython/

Thank you for using ActivePython.
export PATH=~/ActivePython/bin:$PATH
-bash-3.2$ python2.7
ActivePython 2.7.2.5 (ActiveState Software Inc.) based on
Python 2.7.2 (default, Jun 24 2011, 09:10:09)
[GCC 4.0.2 20051125 (Red Hat 4.0.2-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.

The banner above should say ActivePython if the install was successful.

>>> import socket
>>> socket.ssl
<function ssl at 0xb7f21ae4>

The lookup above would have thrown an error if SSL were not installed.

-bash-3.2$ pip install tweetstream
Downloading/unpacking tweetstream
 Downloading tweetstream-1.1.1.tar.gz
 Running setup.py egg_info for package tweetstream
Downloading/unpacking anyjson (from tweetstream)
 Downloading anyjson-0.3.1.tar.gz
 Running setup.py egg_info for package anyjson
Installing collected packages: tweetstream, anyjson
 Running setup.py install for tweetstream
Running setup.py install for anyjson
Successfully installed tweetstream anyjson
Cleaning up...
-bash-3.2$ python twit.py
1 ( 0 tweets/sec). FinancialBin : Global stocks fall 5th day, Brent off 2 percent on euro zone fears: NEW YORK (Reuters) - World stocks fell for a... http://t.co/dfWQb3ZL
2 ( 0 tweets/sec). stocksgame : Global stocks fall 5th day, Brent off 2 percent on euro zone fears: NEW YORK (Reuters) - World stocks fell for a... http://t.co/WP92slZt
3 ( 0 tweets/sec). Daily_Finances : #News Global stocks fall 5th day, Brent off 2 percent on euro zone fears http://t.co/flCVonMF
4 ( 0 tweets/sec). AUTSmallBiz : [Sydney Morng Hrld] Stocks set to tumble http://t.co/OEP14YvL
5 ( 0 tweets/sec). BeckaDerrick : @MattoneAwang it's the same price to finance a brand new mini + get free insurance as it is to insure an old car #WhyIsEveryonePuttingMeOff!

 

 

 


Twitter Trading – Downloading Tweets Using Python (Part 1 of 2)

In the first of this two-part series I’ll show you how to connect to the Twitter API using the tweetstream (http://pypi.python.org/pypi/tweetstream) Python library. The second part will store the tweets in a database.

The Twitter API FAQ (https://dev.twitter.com/docs/faq#6861) states that the free API gives us access to 1% of all the public tweets (the full stream is known as the firehose) in any streaming second. We can choose to filter the tweets before they’re downloaded, in which case the 1% of the firehose we’re given is more relevant.

It is unknown at this stage whether we want to collect general tweets or tweets specific to the financial industry. My hunch is that finance-related tweets contain the most information about the future movement of the market.

Installing the tweetstream library

Download the library from http://pypi.python.org/pypi/tweetstream and extract it to a folder, say C:\tweetstream

Navigate to that folder in a console and execute the command “python setup.py install”. With any luck the library should now be installed.

Downloading a generic set of tweets

This code allows you to download and display the TweetStream

import tweetstream
 
twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"
try:
    with tweetstream.TweetStream(twitterUsername, twitterPassword) as stream:
        for tweet in stream:
            try:
                print stream.count,"(",stream.rate,"tweets/sec). ",tweet['user']['screen_name'],':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"
 
except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason

Downloading a set of tweets matching a word filter

The following code lets us filter the tweets before they are downloaded

import tweetstream
 
twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"
twitterWordFilter = ["stocks","ftse","finance"]
 
try:
    with tweetstream.FilterStream(twitterUsername, twitterPassword,track=twitterWordFilter) as stream:
        for tweet in stream:
            try:
                print stream.count,"(",stream.rate,"tweets/sec). ",tweet['user']['screen_name'],':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"
 
except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason

Loading the word library from a csv file

This code will let us load in our filter words from a .csv file (wordstofilter.csv)

import csv
 
twitterWordFilter = [] #Defined the list
wordListCsv = csv.reader(open('wordstofilter.csv', 'rb'))
for row in wordListCsv:
    #Add the 0th column of the current row to the list
    twitterWordFilter.append(row[0])
 
print "Filtering the following words: ",', '.join(twitterWordFilter)
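
The loader above assumes every row of the file has at least one column, so a blank line in wordstofilter.csv would raise an IndexError. A slightly more defensive variant is sketched below. It is written for Python 3 (unlike the listings in this post), the function name load_filter_words is made up, and an in-memory sample stands in for the real file.

```python
import csv
import io


def load_filter_words(lines):
    """Return the non-empty 0th-column values from an iterable of CSV lines."""
    words = []
    for row in csv.reader(lines):
        # Skip blank lines and rows whose first column is empty
        if row and row[0].strip():
            words.append(row[0].strip())
    return words


# Made-up sample content; with a real file use open('wordstofilter.csv', newline='')
sample = io.StringIO("stocks,note\n\nftse\n finance \n")
print(load_filter_words(sample))
```

Stripping whitespace also avoids tracking a word such as " finance " with stray spaces around it.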

Putting it all together: load filters from CSV and filter the stream

import tweetstream
import csv
 
twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"
 
twitterWordFilter = [] #Defined the list
wordListCsv = csv.reader(open('wordstofilter.csv', 'rb'))
for row in wordListCsv:
    #Add the 0th column of the current row to the list
    twitterWordFilter.append(row[0])
 
print "Filtering the following words: ",', '.join(twitterWordFilter)
 
try:
    with tweetstream.FilterStream(twitterUsername, twitterPassword,track=twitterWordFilter) as stream:
        for tweet in stream:
            try:
                print stream.count,"(",stream.rate,"tweets/sec). ",tweet['user']['screen_name'],':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"
 
except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason

Trading Strategies To Exploit Blog, News & Twitter Sentiment (Paper)

Trading Strategies To Exploit Blog and News Sentiment (Paper)

I just came across this paper and wanted to document it here as something to come back to and test for myself. Hopefully you will find it as interesting as I did.

The method has four parameters:

  • Sentiment Analysis Period – How many days of previous sentiment data to use?
  • Holding Period – How long to hold a trade for?
  • Market Capitalization – Do small cap and large cap respond the same?
  • Diversification – How many stocks to have in the portfolio?

The paper outlines a market-neutral, sentiment-based trading algorithm which is backtested over a five-year period (2005-2009) and produces some exceptionally impressive returns, almost 40% in certain years depending on configuration. Each of the trading model parameters is also analysed and its effect explained.

What I like most about the paper is that the assets to trade are selected using a fixed criterion (i.e. is the stock among the top n most extreme sentiments?). This guards against selection bias, whereby the author could present only the profitable scenarios or cherry-pick the results.
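
To make the “top n most extreme sentiments” idea concrete, here is a minimal sketch of such a selection rule. It is my own illustration, not the paper’s implementation: it assumes one sentiment score per stock and that the most positive names are bought while the most negative are sold short. The tickers and scores are made up.

```python
def select_portfolio(sentiment, n):
    """Return (longs, shorts): the n most positive and n most negative names."""
    ranked = sorted(sentiment, key=sentiment.get)  # Ascending by score
    shorts = ranked[:n]        # Most negative sentiment first
    longs = ranked[-n:][::-1]  # Most positive sentiment first
    return longs, shorts


# Made-up scores for illustration only
scores = {"AAA": 0.8, "BBB": -0.6, "CCC": 0.1, "DDD": -0.9, "EEE": 0.5}
longs, shorts = select_portfolio(scores, 2)
print("long: %s short: %s" % (longs, shorts))
```

Because the rule depends only on the ranking and on n, there is no room to hand-pick which stocks appear in the results. Note that with fewer than 2n stocks the two lists would overlap.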

The sentiment is based upon analysing news posts, blog posts and tweets. Since the authors’ Twitter data only starts in 2009, they had only half a year’s worth of Twitter data to analyse. The great results in this paper were achieved without Twitter data, using normal news and blog sources.

The paper shows that corpus size matters. Blogs might be a cheaper way to collect a corpus (scrape lots of RSS feeds), whereas with Twitter there are limits on what data you can get for free (full data feeds start at $3500 a month!).

 

 

 


Twitter Trading & Sentiment Analysis

A standard idea in behavioral economics is that emotions play a large part in decision making and profoundly influence an agent’s behavior. This line of logic can be applied to the stock market: price moves are a function of the emotions of the agents in the market.

In 2011 a paper by Johan Bollen, Huina Mao and Xiaojun Zeng called “Twitter mood predicts the stock market” showed that by applying sentiment analysis to Twitter posts (tweets) it is possible to gauge the current emotional state of agents. The paper then goes on to argue that the emotion of Twitter is correlated with market movements and possibly even predictive of them.

Since this landmark paper was first published, a number of hedge funds have taken the idea and launched Twitter funds. The most publicly known Twitter fund is run by Derwent Capital.

http://www.efinancialnews.com/story/2011-08-15/twitter-derwent-capital-hedge-fund

I plan on investigating this idea further in this blog, but if you want to get started before me the following should be useful:
