Arpan Ghosh will present his MSE thesis talk on Wednesday, May 1 at 3 PM in Room 301
(note room!). The members of his committee are: Mung Chiang, ELE (advisor) and
Andrea LaPaugh (reader). Everyone is invited to attend his talk. His abstract follows
below.
------------------------------------
Abstract:
Tracking the Business Pulse Using Twitter
While Twitter surged to popularity soon after its creation, it took a few years to find legitimate and profitable uses for it.
Just as it brought fans closer than ever to celebrities, people closer to real-time news and voters closer to presidential
candidates, today it is helping define a new paradigm for consumer businesses to interact with their past, present
and future customers. Twitter makes it very easy for businesses to directly reach out to the customers that follow
them, for the purposes of announcements, promotions and advertising. The return path of the communication loop,
wherein a company would like to analyze customer needs, sentiment, feedback, reviews and grievances, is much
harder for several reasons. Twitter as a data stream is very sparse and has a low signal-to-noise ratio as it continues
to be dominated by tweets that are basically about nothing: personal conversations, random observations or spam.
It is a slight exaggeration to classify tweets as semi-structured data, even with the presence of hashtags, due to the
dynamic method of hashtag creation and their longevity. Twitter also provides very sparse information about the users
authoring tweets. This work is driven by the vision of building a tool that allows businesses to monitor, analyze and
gain insights, in near real time, into the opinion, sentiment and feedback that exist about them on social networks like
Twitter, analogous to what Google Finance accomplishes for the financial aspect of businesses. To this end we make
the following contributions: 1) A quantitative analysis of Twitter usage to determine its suitability for this business-oriented
use case. 2) A classifier to filter out ‘irrelevant’ tweets for the aforementioned use case. 3) An algorithm to detect trending
(increased) activity on Twitter pertaining to a business and cluster periods of high activity into logically separable stories/events
in the business’ timeline. 4) An algorithm to classify people talking about a business on Twitter based on their
expertise/influence in the area/topic that the business specializes in.