Twitter Trading – Downloading Tweets Using Python (Part 1 of 2)

In the first of this two-part series I’ll show you how to connect to the Twitter API using the tweetstream (http://pypi.python.org/pypi/tweetstream) Python library. The second part will store the downloaded tweets in a database.

The Twitter API FAQ (https://dev.twitter.com/docs/faq#6861) states that the free API gives us access to 1% of all public tweets (the full feed is known as the firehose) in any streaming second. We can choose to filter the tweets before they’re downloaded, in which case the 1% of the firehose we’re given is more relevant.

It is unknown at this stage whether we want to collect general tweets or tweets specific to the financial industry. My hunch is that finance-related tweets contain the most information about the future movement of the market.

Installing the tweetstream library

Download the library from http://pypi.python.org/pypi/tweetstream and extract it to a folder, say C:\tweetstream.

Navigate to that folder in a console and execute the command “python setup.py install”. With any luck the library should now be installed.

Downloading a generic set of tweets

This code downloads tweets through tweetstream’s TweetStream class and prints each one as it arrives.

import tweetstream
 
twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"
try:
    with tweetstream.TweetStream(twitterUsername, twitterPassword) as stream:
        for tweet in stream:
            try:
                print stream.count,"(",stream.rate,"tweets/sec). ",tweet['user']['screen_name'],':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except KeyError:
                #Messages that are not tweets lack the 'user' or 'text' field
                print "ERROR: Presumably missing field"
 
except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason
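
The inner try/except above exists because not every message on the stream has a 'user' and a 'text' field (the streaming API can, for example, send deletion notices). As a minimal sketch, the same check can be pulled into a small helper that is easy to test without a connection. The name format_tweet and the sample messages are my own illustrations, not part of tweetstream, and the code is written to run on both Python 2 and Python 3.

```python
from __future__ import print_function


def format_tweet(tweet):
    """Return 'screen_name: text', or None if the message is not a tweet."""
    # 'user' may be missing or None on non-tweet messages.
    user = tweet.get('user') or {}
    screen_name = user.get('screen_name')
    text = tweet.get('text')
    if screen_name is None or text is None:
        return None
    return u'{0}: {1}'.format(screen_name, text)


# Hand-made sample messages for illustration (not real API output).
sample_tweet = {'user': {'screen_name': 'example_user'},
                'text': 'FTSE opens higher'}
sample_notice = {'delete': {'status': {'id': 1}}}

print(format_tweet(sample_tweet))   # example_user: FTSE opens higher
print(format_tweet(sample_notice))  # None
```

Inside the stream loop you could call format_tweet(tweet) and skip the message whenever it returns None, instead of catching an exception.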

Downloading a set of tweets matching a word filter

The following code lets us filter the tweets before they are downloaded, using tweetstream’s FilterStream class and a list of words to track.

import tweetstream
 
twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"
twitterWordFilter = ["stocks","ftse","finance"]
 
try:
    with tweetstream.FilterStream(twitterUsername, twitterPassword,track=twitterWordFilter) as stream:
        for tweet in stream:
            try:
                print stream.count,"(",stream.rate,"tweets/sec). ",tweet['user']['screen_name'],':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except KeyError:
                #Messages that are not tweets lack the 'user' or 'text' field
                print "ERROR: Presumably missing field"
 
except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason
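
The filtering itself happens on Twitter's servers, so it is hard to try out a word list without connecting. The sketch below is a rough local approximation based on my understanding of the track parameter: each list entry is a phrase, a tweet matches if any phrase has all of its words in the tweet, and case is ignored. The function matches_track is hypothetical, and the real server-side matching may treat punctuation, hashtags and URLs differently.

```python
from __future__ import print_function
import re


def matches_track(text, track):
    """Return True if any phrase in `track` has all of its words in `text`."""
    # Split the tweet into lowercase word tokens.
    words = set(re.findall(r'\w+', text.lower()))
    for phrase in track:
        terms = phrase.lower().split()
        # Every term of a phrase must be present; any one phrase is enough.
        if terms and all(term in words for term in terms):
            return True
    return False


twitterWordFilter = ["stocks", "ftse", "finance"]
print(matches_track("FTSE closes up 1% today", twitterWordFilter))   # True
print(matches_track("Lovely weather in London", twitterWordFilter))  # False
```

This is only useful for sanity-checking a word list against a few example sentences; it does not replace the filter applied by the API.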

Loading the word library from a CSV file

This code lets us load our filter words from a .csv file (wordstofilter.csv), taking one word from the first column of each row.

import csv
 
twitterWordFilter = [] #Define the list
with open('wordstofilter.csv', 'rb') as csvFile: #'rb' is the mode the Python 2 csv module expects
    for row in csv.reader(csvFile):
        if row: #Skip blank lines
            #Add the 0th column of the current row to the list
            twitterWordFilter.append(row[0])
 
print "Filtering the following words: ",', '.join(twitterWordFilter)
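
A hand-edited word list tends to pick up blank lines, stray spaces and duplicates. The sketch below shows one way to tidy the words as they are read. The function name load_filter_words and the sample lines are my own, and lowercasing the words is an optional choice rather than something the API requires as far as I know. The code is written to run on both Python 2 and Python 3.

```python
from __future__ import print_function
import csv


def load_filter_words(lines):
    """Read the first column of each CSV row, skipping blanks and duplicates."""
    words = []
    for row in csv.reader(lines):
        if not row:
            continue  # blank line in the file
        word = row[0].strip().lower()
        if word and word not in words:
            words.append(word)
    return words


# csv.reader accepts any iterable of lines, so a list stands in for a file here.
sample_lines = ["stocks,equities\n", "\n", "FTSE\n", " finance \n", "stocks\n"]
print(load_filter_words(sample_lines))  # ['stocks', 'ftse', 'finance']
```

To use it with the real file, pass the open file object instead of the sample list.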

Putting it all together: load filters from CSV and filter the stream

import tweetstream
import csv
 
twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"
 
twitterWordFilter = [] #Define the list
with open('wordstofilter.csv', 'rb') as csvFile: #'rb' is the mode the Python 2 csv module expects
    for row in csv.reader(csvFile):
        if row: #Skip blank lines
            #Add the 0th column of the current row to the list
            twitterWordFilter.append(row[0])
 
print "Filtering the following words: ",', '.join(twitterWordFilter)
 
try:
    with tweetstream.FilterStream(twitterUsername, twitterPassword,track=twitterWordFilter) as stream:
        for tweet in stream:
            try:
                print stream.count,"(",stream.rate,"tweets/sec). ",tweet['user']['screen_name'],':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except KeyError:
                #Messages that are not tweets lack the 'user' or 'text' field
                print "ERROR: Presumably missing field"
 
except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason