I just came across this paper and wanted to document it here as something to come back to and test for myself. Hopefully you will find it as interesting as I did.
The method has four parameters:
- Sentiment Analysis Period – How many days of previous sentiment data to use?
- Holding Period – How long to hold a trade for?
- Market Capitalization – Do small cap and large cap respond the same?
- Diversification – How many stocks to have in the portfolio?
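To make the four parameters concrete, here is a minimal sketch of how they might be held in code for my own testing. The class name, field names, and example values are all my own placeholders, and averaging the daily scores over the sentiment period is an assumption on my part; the paper may aggregate sentiment differently.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class StrategyParams:
    # All field names are hypothetical; they mirror the four parameters above.
    sentiment_period_days: int  # days of previous sentiment data to use
    holding_period_days: int    # how long each trade is held
    min_market_cap: float       # market-cap filter, in dollars
    portfolio_size: int         # number of stocks held (diversification)


def trailing_sentiment(daily_scores, params):
    """Average the most recent `sentiment_period_days` daily scores.

    Plain averaging is an assumption, not necessarily the paper's method.
    """
    if params.sentiment_period_days <= 0:
        raise ValueError("sentiment_period_days must be positive")
    window = daily_scores[-params.sentiment_period_days:]
    if not window:
        raise ValueError("no sentiment data available")
    return mean(window)


# Example values are made up purely for illustration.
params = StrategyParams(
    sentiment_period_days=3,
    holding_period_days=5,
    min_market_cap=1e9,
    portfolio_size=10,
)
score = trailing_sentiment([0.1, -0.2, 0.4, 0.6, 0.2], params)
```

Keeping the parameters in one frozen object should make it easier to sweep over configurations later without accidentally mutating one mid-backtest.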
Each of the trading model parameters is also analysed and its effect explained. The paper outlines a market-neutral, sentiment-based trading algorithm which is backtested over a five-year period (2005-2009) and produces some exceptionally impressive returns: almost 40% in certain years, depending on configuration.
What I like most about the paper is that the asset to trade is selected based upon a fixed criterion (i.e. is it in the top n most extreme sentiments). This stops positive bias effects whereby the authors could just present profitable scenarios or cherry-pick the results.
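The fixed selection rule can be sketched roughly like this. I am assuming that the most positive sentiment scores go long and the most negative go short, with equal weights on each side so the portfolio is market neutral; the function names and tickers are made up, and the paper's exact ranking and weighting may differ.

```python
def pick_extremes(sentiment_by_ticker, n):
    """Return (longs, shorts): the n highest and n lowest sentiment tickers."""
    if n <= 0 or 2 * n > len(sentiment_by_ticker):
        raise ValueError("need n > 0 and at least 2*n tickers")
    # Sort tickers from most negative to most positive sentiment.
    ranked = sorted(sentiment_by_ticker, key=sentiment_by_ticker.get)
    shorts = ranked[:n]        # most negative first
    longs = ranked[-n:][::-1]  # most positive first
    return longs, shorts


def market_neutral_weights(longs, shorts):
    """Equal-weight each side so long and short exposure cancel out."""
    weights = {ticker: 0.5 / len(longs) for ticker in longs}
    weights.update({ticker: -0.5 / len(shorts) for ticker in shorts})
    return weights


# Hypothetical tickers and scores, purely for illustration.
scores = {"AAA": 0.8, "BBB": -0.6, "CCC": 0.1, "DDD": -0.9, "EEE": 0.5}
longs, shorts = pick_extremes(scores, 2)
weights = market_neutral_weights(longs, shorts)
```

Because the picks come straight from the ranking, there is no room to hand-select which stocks make it into the backtest.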
The sentiment is based upon analysing news posts, blog posts and tweets. Although Twitter launched in 2006, the authors only had half a year's worth of Twitter data to analyse. The great results in this paper were achieved without Twitter data, using normal news and blog sources.
The paper shows that corpus size matters. Using blogs might be a cheaper way to collect a corpus (scrape lots of RSS feeds), whereas with Twitter there are limits on what data you can get for free (full data feeds start at $3500 a month!).