The danger of Twitter sentiment analysis

So I was reading an article from TechCrunch entitled "Sentiment Is Split On The iPad: People Either Love It, Or Hate Others For Not Shutting Up About It". The title was funny, so I decided to read it. But when I started to look at their sources, I realized they were pretty much just using TweetFeel, a service that tries to do sentiment analysis on the short Twitter messages that mention a given keyword.

Sentiment analysis is a very hot topic lately, and there are lots of interesting results from it. However, it doesn't work that well, because most methods rely on keywords near the concept you are looking for, and language is rarely unambiguous at that local level. Twitter changes the picture in two ways:

- The good part: people can't write much, so they tend to state their sentiment right next to the subject instead of in some distant phrase.
- The bad part: there is only so much context that can be obtained from 140 characters.
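To make the keyword problem concrete, here is a minimal sketch of a lexicon-based classifier in Python. The word lists and the `tokenize`/`classify` functions are hypothetical; TweetFeel's actual method isn't described anywhere here, so this only illustrates the general keyword approach, not its implementation.

```python
import re

# Hypothetical word lists; TweetFeel's real lexicon and rules are unknown.
POSITIVE_WORDS = {"love", "great", "awesome", "cool"}
NEGATIVE_WORDS = {"hate", "wtf", "sucks", "stupid"}


def tokenize(text: str) -> list[str]:
    """Lowercase the tweet and split it into word tokens (punctuation and '#' are dropped)."""
    return re.findall(r"\w+", text.lower())


def classify(text: str) -> str:
    """Label a tweet by counting lexicon hits; a tie (including zero hits) is 'neutral'."""
    tokens = tokenize(text)
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(t in NEGATIVE_WORDS for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("I love my new ipad"))             # positive
print(classify("WTF, is the ipad out already?"))  # negative: one keyword decides it
print(classify("I don't love that people won't shut up about the ipad"))  # positive: negation is ignored
```

The last two lines show the weakness: a single word like "WTF" decides the label regardless of what the tweet is about, and the scorer has no idea that "don't love" is not praise.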

So I decided to use TweetFeel myself to see what data they were using. The article made some references to it, but I wasn't sold. TweetFeel is quite interesting: it keeps streaming tweets that mention the keyword you enter (in this case "ipad"), highlighting each one in green if it's considered positive and red if it's considered negative. It also keeps a running count of positive and negative references.

After letting it run for a minute or so, I was seeing about the same split that the TechCrunch article mentioned:

Negative: 37 (52%)
Positive: 34 (48%)

But when I started looking at what was being classified as positive and negative, I started seeing some very interesting tweets (I don't recommend clicking on the links):

Free ipad WTF Risen #thefeelingyouget #TLS 5lko
Free ipad WTF #OMGThatsSoTrue Feliz Páscoa #TheFeelingYouGet #HappyBdayKoba j59y

In other words, lots of spam accounts were stuffing popular keywords and hashtags into their tweets, hoping people would click on their links. Moreover, because these tweets contained the word "WTF", I'm guessing TweetFeel considered them negative. My 1-minute sample is not significant, but if I remove all those spam tweets, here is the new count:

Negative: 26 (43%)
Positive: 34 (57%)

(as I said, this is not statistically significant, so don't take these numbers too seriously)
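The cleanup above boils down to a filter followed by a recount. Here is a minimal Python sketch of that arithmetic. The `is_spam` rule is hypothetical, fitted only to the two spam tweets quoted above (the phrase "free ipad" plus at least two hashtags), and `sample` is a stand-in that only reproduces the counts: removing the spam took negatives from 37 to 26 while positives stayed at 34, so it assumes 11 spam tweets, all labeled negative.

```python
from collections import Counter


def is_spam(text: str) -> bool:
    """Hypothetical rule fitted to the two quoted spam tweets:
    contains 'free ipad' and carries at least two hashtags."""
    return "free ipad" in text.lower() and text.count("#") >= 2


def tally(labels: list[str]) -> dict[str, tuple[int, int]]:
    """Return {label: (count, rounded percentage of positive + negative)}."""
    counts = Counter(labels)
    total = counts["negative"] + counts["positive"]
    return {
        label: (counts[label], round(100 * counts[label] / total))
        for label in ("negative", "positive")
    }


SPAM = "Free ipad WTF Risen #thefeelingyouget #TLS 5lko"
# Stand-in sample: 11 spam tweets labeled negative, 26 other negatives, 34 positives.
sample = (
    [(SPAM, "negative")] * 11
    + [("placeholder", "negative")] * 26
    + [("placeholder", "positive")] * 34
)

print(tally([label for _, label in sample]))
# {'negative': (37, 52), 'positive': (34, 48)}
print(tally([label for text, label in sample if not is_spam(text)]))
# {'negative': (26, 43), 'positive': (34, 57)}
```

A real filter would need much more than one phrase match, but the arithmetic shows how a handful of mislabeled spam tweets is enough to flip a small sample from a negative majority to a positive one.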

Anyway, now that I'm talking about the iPad, you might be wondering if I'm planning on buying one. The answer right now is "no". If I look at how I access the information I want and how I interact with it, I don't really see a gap that is worth $500. That said, there are some apps I really wish I could use without owning one, even if just to play around with them for a while (like the Marvel app for reading comics). I just hope the trend isn't for everything to migrate to the iPad platform without also offering a web or other computer-based way of accessing it.