Libraries exist in both R and Python for downloading tweets; however, most webhosts don't support R, whereas they do tend to support Python. Since we want to download tweets around the clock, the scripts need to run on a webhost (the cheaper the better), hence this post on Python.
I considered running a server at home, but once you factor in the electricity bill it actually works out significantly cheaper to use a webhost (a saving of roughly £80 a year).
To get started with Python I'd recommend downloading Python(x,y) from www.pythonxy.com; it's multi-platform, has a good community and a ton of great packages for scientific analysis.
Visit http://docs.python.org/tutorial/ for a comprehensive tutorial on how to program in Python.
Trading Strategies To Exploit Blog and News Sentiment (Paper)
I just came across this paper and wanted to document it here for something to come back to and test for myself, hopefully you will find it as interesting as I did.
The method has four parameters:
- Sentiment Analysis Period – How many days of previous sentiment data to use?
- Holding Period – How long to hold a trade for?
- Market Capitalization – Do small cap and large cap respond the same?
- Diversification – How many stocks to have in the portfolio?
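A back-test of the method would sweep combinations of these four settings. A minimal sketch in base R of building such a parameter grid (the specific values below are hypothetical, not taken from the paper):

```r
# Hypothetical parameter values, for illustration only
sentimentDays <- c(1, 3, 5)            # days of previous sentiment data
holdingDays   <- c(1, 5, 10)           # holding period in days
capBucket     <- c("small", "large")   # market-cap universe
nStocks       <- c(5, 10, 20)          # portfolio diversification

# Every combination of settings to back-test
paramGrid <- expand.grid(sentimentDays = sentimentDays,
                         holdingDays   = holdingDays,
                         capBucket     = capBucket,
                         nStocks       = nStocks)
nrow(paramGrid) # 3 * 3 * 2 * 3 = 54 configurations
```

Each row of paramGrid is one configuration whose returns could then be compared, which is how the paper is able to discuss the effect of each parameter in isolation.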
The paper outlines a market-neutral, sentiment-based trading algorithm which is back-tested over a five-year period (2005–2009) and produces some exceptionally impressive returns, almost 40% in certain years depending on the configuration. Each of the trading model parameters is analysed in turn and its effect explained.
What I like most about the paper is that the assets to trade are selected on a fixed criterion (i.e. is the stock in the top n most extreme sentiments); this prevents positive bias, whereby the author could present only profitable scenarios / cherry-pick the results.
The sentiment is based upon analysing news posts, blog posts and tweets. Since the authors' Twitter data only begins in 2009, they had just half a year's worth of tweets to analyse; the great results in this paper were achieved without Twitter data, using ordinary news and blog sources.
The paper also shows that corpus size matters. Blogs might be the cheaper way to collect a corpus (scrape lots of RSS feeds), whereas with Twitter there are limits to what data you can get for free (full data feeds start at $3,500 a month!).
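The fixed selection rule described above (go long the n most positive sentiments and short the n most negative, keeping the portfolio market neutral) can be sketched in base R. The sentiment scores here are made-up numbers, purely to show the mechanics:

```r
# Made-up sentiment scores for a small universe of stocks
sentiment <- c(AAA = 0.8, BBB = -0.6, CCC = 0.1,
               DDD = -0.9, EEE = 0.5, FFF = -0.2)
n <- 2  # portfolio size per side

ranked <- sort(sentiment, decreasing = TRUE)
longs  <- names(head(ranked, n))  # most positive sentiment -> buy
shorts <- names(tail(ranked, n))  # most negative sentiment -> sell short
longs  # "AAA" "EEE"
shorts # "BBB" "DDD"
```

Because the rule is mechanical there is no discretion in which stocks get traded, which is exactly what protects the back-test from cherry-picking.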
In this quick tutorial I will introduce the PerformanceAnalytics library, which makes it easy to analyse the performance of our strategies. You will learn how to plot cumulative returns and drawdowns against an index, output a table of monthly performance metrics, use boxplots to investigate strategy outliers, and finally plot histograms of returns overlaid with different statistical measures.
You will need some returns data for this tutorial; I have created a file with sample returns to get you started: strategyperfomance.csv
The three image outputs are:
The text output is:
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Strategy1 Strategy2 Index
2005 1.8 0.0 -0.3 1.9 1.4 1.4 0.7 1.3 0.7 0.0 1.2 1.2 12.0 9.2 9.6
2006 0.1 0.9 0.9 1.4 1.0 1.8 -1.7 0.4 1.0 0.2 0.1 0.5 6.9 4.0 6.3
2007 1.5 0.5 0.4 -0.9 1.3 1.8 -2.9 -0.3 -0.2 -1.1 -1.3 0.5 -0.9 -4.5 -3.7
2008 1.8 -1.0 3.5 1.5 0.2 -2.4 -1.0 -1.3 1.3 2.1 -5.3 2.9 1.9 0.4 -0.9
2009 -2.0 -4.4 0.5 -0.5 2.4 3.1 2.0 -0.6 -1.9 1.6 2.2 1.7 3.8 -0.5 0.8
2010 1.4 1.1 -0.8 -3.0 -0.1 -3.3 2.6 4.1 1.4 0.2 2.8 1.5 8.0 8.3 8.6
2011 0.9 -0.2 0.7 1.0 -1.3 2.5 0.6 0.5 -0.5 -2.0 -0.5 -0.6 1.0 -1.2 -1.5
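The right-hand columns of the table are annual returns compounded from the twelve monthly figures. As a reminder of how compounding percentage returns works in R (note the printed monthly numbers are rounded, so recompounding them won't reproduce the annual column exactly):

```r
# Compound a vector of monthly % returns into an annual % return
monthly <- c(1.8, 0.0, -0.3, 1.9, 1.4, 1.4, 0.7, 1.3, 0.7, 0.0, 1.2, 1.2)
annual  <- (prod(1 + monthly / 100) - 1) * 100
round(annual, 1) # about 11.9 from these rounded inputs
```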
Onto the code:
install.packages("PerformanceAnalytics") #Install the PerformanceAnalytics library
library("PerformanceAnalytics") #Load the PerformanceAnalytics library
#Load in our strategy performance spreadsheet
#Structure is: Date,Strategy1,Strategy2,Index
#Each row contains the period returns (% terms)
strategies <- read.zoo("strategyperfomance.csv", sep = ",", header = TRUE, format="%Y-%m-%d")
#List all the column names to check the data loaded in correctly
colnames(strategies)
#Columns can be accessed with strategies["columnname"]
#Lets see how all the strategies fared against the index
charts.PerformanceSummary(strategies,main="Performance of all Strategies")
#Lets see how just strategy one fared against the index
charts.PerformanceSummary(strategies[,c("Strategy1","Index")],main="Performance of Strategy 1")
#Lets calculate a table of monthly returns by year and strategy
table.CalendarReturns(strategies)
#Lets make a boxplot of the returns
chart.Boxplot(strategies)
#Set the plotting area to a 2 by 2 grid
par(mfrow = c(2, 2))
#Plot various histograms with different overlays added
chart.Histogram(strategies[,"Strategy1"], main = "Plain", methods = NULL)
chart.Histogram(strategies[,"Strategy1"], main = "Density", breaks=40, methods = c("add.density", "add.normal"))
chart.Histogram(strategies[,"Strategy1"], main = "Skew and Kurt", methods = c("add.centered", "add.rug"))
chart.Histogram(strategies[,"Strategy1"], main = "Risk Measures", methods = c("add.risk"))
#Note: The above histogram plots are taken from the example documentation
#The documentation is excellent
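charts.PerformanceSummary draws the drawdown panel for you, but it is worth knowing what it is computing: the percentage decline of the cumulative equity curve from its running peak. A base-R sketch with made-up returns:

```r
# Made-up period returns (in decimal terms)
r <- c(2, 1, -3, -2, 4, 1) / 100
equity   <- cumprod(1 + r)     # cumulative equity curve
peak     <- cummax(equity)     # running maximum so far
drawdown <- equity / peak - 1  # fraction below the running peak (<= 0)
min(drawdown)                  # the maximum drawdown
```

The most negative value of drawdown is the "maximum drawdown" reported by most performance packages.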
The PerformanceAnalytics documentation (including more examples) can be found at:
In this quick tutorial I will show you how to use the quantmod library to download historical data, plot it, add a technical indicator (Bollinger Bands), and do some basic manipulation of date ranges and intersecting data sets.
The tutorial will let you make images such as:
This is a very quick tutorial designed to get you started / make you aware of certain packages.
Onto the code:
install.packages("quantmod") #Install the quantmod library
library("quantmod") #Load the quantmod Library
stockData <- new.env() #Make a new environment for quantmod to store data in
startDate = as.Date("2008-01-13") #Specify period of time we are interested in
endDate = as.Date("2012-01-12")
tickers <- c("ARM","CSR") #Define the tickers we are interested in
#Download the stock history (for all tickers)
getSymbols(tickers, env = stockData, src = "yahoo", from = startDate, to = endDate)
#Use head to show the first six rows of the data
head(stockData$ARM)
#Lets look at just the closing prices
Cl(stockData$ARM)
#Lets plot the data
chartSeries(stockData$ARM)
#Lets add some Bollinger Bands to the plot (with period 50 & width 2 standard deviations)
?addBBands #Make R display the help documentation so we know what arguments to pass to the function
addBBands(n = 50, sd = 2)
#Lets get the technical indicator values saved into a variable
#Note: it must be given a single time series (I gave it the close price in this example)
indicatorValuesBBands <- BBands(Cl(stockData$ARM),n=50, sd=2)
#Lets examine only a 1 month period of data
armSubset <- window(stockData$ARM, start = as.Date("2010-02-15"), end = as.Date("2010-03-15"))
armSubset #Lets see the data
#Lets extract a 1 month period of data for CSR but starting midway through the arm data
csrSubset <- window(stockData$CSR, start = as.Date("2010-02-25"), end = as.Date("2010-03-25"))
csrSubset #Lets see the data
#Now we want to get the intersection of the two subsets of data
#this will give us only the rows where the dates match
#Its important to match the date series to stop spurious analysis of non-synchronised data
#all = FALSE specifies the intersection, i.e. don't include all dates in the merge
armcsrIntersection <- merge(armSubset, csrSubset, all = FALSE)
subset(armcsrIntersection,select = c("ARM.Open","CSR.Open")) #Select the open columns and display
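For intuition, the bands that addBBands/BBands compute are just a rolling mean of the price plus and minus a multiple of the rolling standard deviation. A base-R sketch with a made-up price series (window 5 rather than 50, to keep the example short):

```r
# Made-up closing prices; window n = 5, width k = 2 standard deviations
price <- c(10, 11, 12, 11, 10, 9, 10, 11, 12, 13)
n <- 5; k <- 2

# Apply a function f over a rolling window of length n
rollFun <- function(x, n, f) sapply(n:length(x), function(i) f(x[(i - n + 1):i]))

mavg  <- rollFun(price, n, mean)  # middle band (rolling mean)
sdev  <- rollFun(price, n, sd)    # rolling standard deviation
upper <- mavg + k * sdev          # upper band
lower <- mavg - k * sdev          # lower band
mavg[1] # mean of the first five prices = 10.8
```

The TTR::BBands function used earlier does the same calculation (with a choice of moving-average type) and returns the bands plus a %B column.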
Further material can be found at:
A standard idea in behavioral economics is that emotions play a large part in decision making and profoundly influence an agent's behavior. This line of logic can be applied to the stock market: price moves are a function of the emotions of the agents in the market.
In 2011, a paper by Johan Bollen, Huina Mao and Xiaojun Zeng called “Twitter mood predicts the stock market” showed that by applying sentiment analysis to Twitter posts (tweets) it is possible to gauge the current emotional state of the agents. The paper then goes on to argue that the mood of Twitter is correlated with market movements and possibly even predictive of them.
Since this landmark paper was first published, a number of hedge funds have taken the idea and launched Twitter funds; the most publicly known Twitter fund is run by Derwent Capital.
I plan on investigating this idea further in this blog, but if you want to get started before me the following should be useful: