In the first of this two-part series I’ll show you how to connect to the Twitter API using the tweetstream (http://pypi.python.org/pypi/tweetstream) Python library. The second part will store the downloaded tweets in a database.
The Twitter API FAQ (https://dev.twitter.com/docs/faq#6861) states that the free API gives us access to 1% of all public tweets (the full feed is known as the firehose) in any streaming second. We can choose to filter the tweets before they’re downloaded, in which case the 1% of the firehose we’re given is more relevant.
It is unknown at this stage whether we want to collect general tweets or tweets specific to the financial industry. My hunch is that finance-related tweets contain the most information about the future movement of the market.
Installing the tweetstream library
Download the library from http://pypi.python.org/pypi/tweetstream and extract it to a folder, say C:\tweetstream
Navigate to that folder in a console and execute the command “python setup.py install”. With any luck the library should now be installed.
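A quick way to confirm that the install step worked is to try importing the module from the same Python interpreter. The snippet below is only a sketch: the helper name is_installed is made up for illustration and is not part of tweetstream. It is written so that it should run unchanged on both Python 2 and Python 3.

```python
def is_installed(module_name):
    """Return True if the named top-level module can be imported.

    is_installed is a hypothetical helper for this post, not part of tweetstream.
    """
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if is_installed("tweetstream"):
        print("tweetstream is installed")
    else:
        print("tweetstream was not found; re-run the install step")
```

If the check fails, the most likely cause is that the setup command was run with a different Python installation than the one running the check.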
Downloading a generic set of tweets
The following Python 2 code downloads and displays the tweet stream:
import tweetstream

twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"

try:
    with tweetstream.TweetStream(twitterUsername, twitterPassword) as stream:
        for tweet in stream:
            try:
                print stream.count, "(", stream.rate, "tweets/sec). ", tweet['user']['screen_name'], ':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"
except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason
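The "missing field" case arises because not every message on the stream is an ordinary tweet; deletion notices, for example, carry no user or text fields. One way to handle this without a catch-all except is to build the display line in a small function that returns None for anything that doesn't look like a tweet. This is a minimal sketch: format_tweet is a hypothetical helper, and the sample dictionaries are hand-written stand-ins rather than real API output.

```python
def format_tweet(tweet):
    """Return u'screen_name: text', or None if the message lacks those fields.

    format_tweet is a hypothetical helper, not part of tweetstream.
    """
    try:
        return u"{0}: {1}".format(tweet["user"]["screen_name"], tweet["text"])
    except (KeyError, TypeError):
        # KeyError: a field is absent; TypeError: a field is present but None
        return None


if __name__ == "__main__":
    # Hand-written example messages (assumed shapes, for illustration only)
    messages = [
        {"user": {"screen_name": "alice"}, "text": "FTSE up today"},
        {"delete": {"status": {"id": 1}}},
    ]
    for message in messages:
        line = format_tweet(message)
        if line is not None:
            print(line)
```

Catching only KeyError and TypeError means that unrelated problems, such as a typo in a variable name, still surface as errors instead of being reported as a missing field.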
Downloading a set of tweets matching a word filter
The following code lets us filter the tweets before they are downloaded, so that only tweets matching one of our words are sent to us:
import tweetstream

twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"
twitterWordFilter = ["stocks","ftse","finance"]

try:
    with tweetstream.FilterStream(twitterUsername, twitterPassword,track=twitterWordFilter) as stream:
        for tweet in stream:
            try:
                print stream.count, "(", stream.rate, "tweets/sec). ", tweet['user']['screen_name'], ':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"
except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason
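To get a feel for which tweets a word list would let through, it can help to try the list against a few sample sentences offline. The sketch below is only a rough local approximation: the real matching is done by Twitter on the server and follows its own rules, and matches_filter is a made-up helper that simply checks for whole-word, case-insensitive matches.

```python
import re


def matches_filter(text, words):
    """Return True if any filter word appears as a whole word in text.

    A rough local stand-in for server-side filtering (an assumption,
    not Twitter's actual matching rules).
    """
    tokens = set(re.findall(r"\w+", text.lower()))
    return any(word.lower() in tokens for word in words)


if __name__ == "__main__":
    word_filter = ["stocks", "ftse", "finance"]
    for sample in ["FTSE closes higher", "Lovely weather today"]:
        print("{0!r} matches: {1}".format(sample, matches_filter(sample, word_filter)))
```

Trying a list this way before connecting makes it easier to spot words that are too broad or too narrow.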
Loading the word library from a csv file
This code lets us load our filter words from a .csv file (wordstofilter.csv), taking one word from the first column of each row:
import csv

twitterWordFilter = [] #Define the list
wordListCsv = csv.reader(open('wordstofilter.csv', 'rb'))
for row in wordListCsv:
    #Add the 0th column of the current row to the list
    twitterWordFilter.append(row[0])

print "Filtering the following words: ", ', '.join(twitterWordFilter)
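Hand-edited CSV files often pick up blank lines or stray spaces, so a slightly more defensive variant of the loader may be worth having. The following is a sketch under stated assumptions: load_filter_words is a hypothetical helper that accepts any iterable of text lines (an open file or a plain list), which also makes it easy to try without a file on disk.

```python
import csv


def load_filter_words(lines):
    """Return the first column of each non-empty CSV row, stripped of whitespace.

    load_filter_words is a hypothetical helper; lines may be an open
    file object or any iterable of strings.
    """
    words = []
    for row in csv.reader(lines):
        # A blank line yields an empty row; skip it rather than index into it
        if row and row[0].strip():
            words.append(row[0].strip())
    return words


if __name__ == "__main__":
    sample_lines = ["stocks,ignored second column\n", "\n", " ftse \n", "finance\n"]
    print("Filtering the following words: " + ", ".join(load_filter_words(sample_lines)))
```

To read from the real file, pass the open file object for wordstofilter.csv in place of the sample list.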
Putting it all together: load filters from CSV and filter the stream
import tweetstream
import csv

twitterUsername = "USERNAME"
twitterPassword = "PASSWORD"

twitterWordFilter = [] #Define the list
wordListCsv = csv.reader(open('wordstofilter.csv', 'rb'))
for row in wordListCsv:
    #Add the 0th column of the current row to the list
    twitterWordFilter.append(row[0])

print "Filtering the following words: ", ', '.join(twitterWordFilter)

try:
    with tweetstream.FilterStream(twitterUsername, twitterPassword,track=twitterWordFilter) as stream:
        for tweet in stream:
            try:
                print stream.count, "(", stream.rate, "tweets/sec). ", tweet['user']['screen_name'], ':', tweet['text'].encode('utf-8')
                #print tweet #Use for raw output
            except:
                print "ERROR: Presumably missing field"
except tweetstream.ConnectionError, e:
    print "Disconnected from twitter. Reason:", e.reason
Hi,
I’m trying to collect some tweets by following the steps above. However, when I enter the line:
print stream.count,"(",stream.rate,"tweets/sec). ",tweet['user']['screen_name'],':', tweet['text'].encode('utf-8')
I’m receiving the following error:
SyntaxError: invalid syntax
and the word stream in stream.count is highlighted in red.
I’m a beginner in this and would really appreciate it if I had your help.
thanks
import twitter.stream
twitterUsername = "hole"
twitterPassword = "sfdefe"
try:
    with twitter.stream.TwitterStream(twitterUsername, twitterPassword) as stream:
        for tweet in stream:
            try:
                print (twitter.stream.count,"(",stream.rate,"tweets/sec). ",tweet['user']['screen_name'],':', tweet['text'].encode('utf-8'))
                #print tweet #Use for raw output
            except:
                print ("ERROR: Presumably missing field")
except :
    print ("Disconnected from twitter. Reason:")
I modified it for Python 3.4, but it always gives the error: Disconnected from twitter. Reason: