Twitter Trading & Sentiment Analysis

A standard idea in behavioral economics is that emotions play a large part in decision making and profoundly influence an agent's behavior. Applied to the stock market, this logic implies that price moves are a function of the emotions of the agents in the market.

In the 2011 paper “Twitter mood predicts the stock market”, Johan Bollen, Huina Mao and Xiaojun Zeng show that applying sentiment analysis to Twitter posts (tweets) makes it possible to gauge the current emotional state of agents. The paper then goes on to argue that the mood on Twitter is correlated with market movements and possibly even predictive of them.
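To make "predictive" concrete, here is a minimal sketch of a lagged correlation: mood on day t is compared with the market return on day t + lag. This is a simpler stand-in for illustration, not the paper's method (the paper uses Granger causality analysis), and the function names and inputs are assumptions of this sketch.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(mood, returns, lag):
    """Correlate mood on day t with the return on day t + lag.

    `mood` and `returns` are hypothetical daily series of equal length,
    aligned by date. A lag of 0 is the same-day correlation.
    """
    if len(mood) != len(returns):
        raise ValueError("series must have the same length")
    if lag < 0 or lag > len(mood) - 2:
        raise ValueError("lag must leave at least two overlapping days")
    if lag == 0:
        return pearson(mood, returns)
    # Drop the last `lag` mood values and the first `lag` returns
    # so that mood[t] lines up with returns[t + lag].
    return pearson(mood[:-lag], returns[lag:])
```

A strong correlation at a positive lag would only hint at predictive power. It says nothing about causation, and it would need to be checked out of sample before trading on it.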

After this landmark paper was first published, a number of hedge funds took the idea and launched Twitter funds. The most publicly known Twitter fund is run by Derwent Capital.

http://www.efinancialnews.com/story/2011-08-15/twitter-derwent-capital-hedge-fund

I plan on investigating this idea further in this blog, but if you want to get started before me the following should be useful:

5 thoughts on “Twitter Trading & Sentiment Analysis”

  1. Hi, I’m looking forward to seeing how you tackle this. Are you planning to do it the same way as described in the paper? POMS-Bipolar mood dimensions, SOFNN training and all that? I started working on this some time ago, but then somehow lost focus and didn’t get very far…

    • Hi Vita,
      How I see the problem is:
      1. Getting and storing tweets (enough for statistical significance). Apparently the API limits the number of requests you can make for free, so I will probably have to sign up for lots of accounts. I hope to develop a library that makes this process easy for less experienced programmers and lets them skip to the more interesting side of the problem.
      2. Performing the sentiment analysis, which will largely be done by existing R libraries. I have yet to research this aspect. So yes, if SOFNN is available I will use it.
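As a rough illustration of step 2, a word-list scorer is about the simplest form sentiment analysis can take. The sketch below is in Python rather than R, and it is not the POMS or SOFNN approach from the paper. The word lists and function names are made up for this example.

```python
import re

# Hypothetical word lists, for illustration only. A real system would use
# a published mood lexicon rather than these few hand-picked words.
POSITIVE = {"happy", "calm", "confident", "great", "good"}
NEGATIVE = {"anxious", "afraid", "worried", "bad", "angry"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Return (positive - negative) / matched words, a value in [-1, 1].

    Text with no words from either list scores 0.0.
    """
    words = tokenize(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def daily_mood(tweets_by_day):
    """Average the tweet scores for each day, skipping days with no tweets."""
    return {
        day: sum(sentiment_score(t) for t in tweets) / len(tweets)
        for day, tweets in tweets_by_day.items()
        if tweets
    }
```

The output of `daily_mood` is a single mood number per day, which is the kind of series that could then be compared against daily market returns.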

      • So I did some research on collecting Twitter data.

        I had Problem 1 solved about 11 months ago, when one could access Twitter’s streaming API without limitations. I used the Python module tweetstream (http://pypi.python.org/pypi/tweetstream) to get a real-time stream of tweets, which I then filtered (English only, expressions of feeling, no URLs) and stored in a relational database. The idea was that I’d run this continually and analyze the accumulated DB whenever I needed to.

        Unfortunately, today you can get only about 1% of the total (“Firehose”) tweet volume for free. It does seem possible, though, to ask for up to 400 keywords specifically (see https://dev.twitter.com/docs/streaming-api/methods). So maybe one could set up several filters for words related to feelings and thus “fill up” the 1% volume limit only with relevant tweets. But will 1% of all tweets be enough to draw any conclusions?

        It’s also possible to get a 10% volume limit (“Gardenhose”), but it must be for “research or education purposes”: https://dev.twitter.com/discussions/2907

        Another option is to get paid data from Gnip or DataSift: https://dev.twitter.com/docs/twitter-data-providers. Very expensive 🙁
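The filter-and-store step described at the start of this comment (English only, expressions of feeling, no URLs, into a relational database) could be sketched as follows. This is not the commenter's code. The tweet dictionaries with `id`, `created_at`, `text` and `lang` keys, the feeling phrases, and the table layout are all assumptions of this sketch, and SQLite stands in for whichever database was actually used.

```python
import re
import sqlite3

# Assumed phrases that mark an expression of feeling.
FEELING = re.compile(r"\b(i feel|i am feeling|i'm feeling|makes me)\b",
                     re.IGNORECASE)
URL = re.compile(r"https?://|www\.", re.IGNORECASE)


def keep_tweet(tweet):
    """True for English tweets that express a feeling and contain no URL."""
    text = tweet.get("text", "")
    return (
        tweet.get("lang") == "en"
        and FEELING.search(text) is not None
        and URL.search(text) is None
    )


def open_store(path=":memory:"):
    """Open (or create) the SQLite database and the tweets table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, created_at TEXT NOT NULL, text TEXT NOT NULL)"
    )
    return conn


def store_tweets(conn, tweets):
    """Store the tweets that pass the filter; return how many passed.

    Tweets whose id is already in the table are silently skipped.
    """
    rows = [(t["id"], t["created_at"], t["text"])
            for t in tweets if keep_tweet(t)]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (id, created_at, text) "
            "VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)
```

Keying the table on the tweet id means the same batch can be loaded twice without creating duplicates, which helps if a collector restarts and overlaps with an earlier run.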

        • Vita, I really appreciated that post, it’s full of gems.
          There is some useful info in: https://dev.twitter.com/docs/faq#6861

          “If you’re using the filter feature of the Streaming API, you’ll be streamed Y tweets per “streaming second” that match your criteria, where Y tweets can never exceed 1% of X public tweets in the firehose during that same streaming second.”

          So yes, it appears we can filter the tweets before they’re downloaded. I would imagine that less than 1% of tweets will be finance related in any streaming second, hence the limit won’t affect us (this can be tested).

          It’s difficult to say whether we want to analyse all tweets, or just those that match a bank of financial filters. I’m inclined to say that it might be best to filter for the stocks that make up an index, i.e. filter for the FTSE 100 constituents; then we can either trade a portfolio of stocks or just the index.

          I’m torn between accessing the Twitter data using R, Python or PHP. I need to see how easy it is to get R running on this webhost (Python is already installed). From my understanding, a cron job can be set up to download a cache of tweets, so it’s not required to have a client running constantly. I’ll then put the tweets into a db for later analysis.
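The two ideas in this comment, filtering on index constituents and checking whether the 1% cap would bite, can be sketched in a few lines. The 400-keyword limit and the 1% cap are the figures quoted in this thread. The function names and the sample constituents are assumptions of this sketch, and no actual Twitter API call is shown.

```python
MAX_TRACK_TERMS = 400  # keyword limit mentioned earlier in this thread
CAP_FRACTION = 0.01    # filtered stream capped at 1% of the firehose


def build_track_terms(constituents, limit=MAX_TRACK_TERMS):
    """Build a de-duplicated, lowercase keyword list from an index.

    `constituents` maps a ticker to a company name, for example
    {"VOD": "Vodafone"}. Both the ticker and the name become keywords.
    """
    seen = set()
    terms = []
    for ticker, name in constituents.items():
        for term in (ticker, name):
            key = term.strip().lower()
            if key and key not in seen:
                seen.add(key)
                terms.append(key)
    if len(terms) > limit:
        raise ValueError(f"{len(terms)} terms exceeds the limit of {limit}")
    return terms


def is_rate_limited(matched_per_second, firehose_per_second,
                    cap=CAP_FRACTION):
    """True if the matching tweets would exceed the cap in that second."""
    return matched_per_second > cap * firehose_per_second
```

With two keywords per company, a 100-stock index needs at most 200 terms, which fits within the 400-keyword limit. Short tickers such as "bp" will also match unrelated tweets, so some extra filtering after download is likely to be needed.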

  2. My company, InfoTrie (www.infotrie.com), has developed FinSentS, a sentiment analysis and news analytics system. We process news, blogs and social media (mainly Twitter) for thousands of stocks, FX and commodities.

    We provide a Sentiment API. Feel free to have a look at portal.finsents.com
