One of the most hyped applications of big data analysis to social media is sentiment analysis (a.k.a. opinion mining). Sentiment analysis is the area of Natural Language Processing that aims to identify and extract subjective information from text. This generally includes identifying if a piece of text is subjective or objective, what sentiment (a.k.a. valence) it expresses (positive or negative), what emotion it conveys and towards which entity or aspect of the text. Companies and marketers are mostly interested in automatically inferring public opinion about products, movies or actions.
Besides voicing attitudes towards other objects, people also express their own emotions online. We decided to analyze this less popular facet: learning about the emotions of people posting subjective messages. In this post I’ll present variations in sentiment and intensity of Facebook posts and how these vary with the attributes of the people that post them. I will investigate a number of user traits such as gender, age and personality.
Data and Methods
In psychology research, the emotionality of a piece of text is usually measured by two independent scales: one measures the sentiment or valence (from negative to positive) and the other measures intensity or arousal (from low to high). This is known as Russell’s circumplex model of affect and is displayed below:
We created a training dataset of 3,120 posts, each rated by two annotators on a 1-9 scale for both sentiment and intensity. We trained two linear regression models on these posts, one for each scale. Here are a couple of examples and ratings:
- "Is the one whoz GOing to Light Up your Day!!!!!!!!!!!!": positive sentiment, high intensity
- "Blessed with a baby boy today…": positive sentiment, low intensity
- "the boring life is back …": negative sentiment, low intensity
- "IS SUPER STRESSED AND ITS JUST THE SECOND MONTH OF SCHOOL..D:": negative sentiment, high intensity
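To make the modelling step concrete, here is a minimal sketch of fitting one linear regression per scale on word features. It is not our actual pipeline: the binary bag-of-words features, the gradient-descent fitting, the function names and the numeric ratings attached to the four example posts are all assumptions made for illustration (the examples above only carry qualitative labels).

```python
import re

# Texts are the four example posts quoted above. The numeric 1-9 ratings are
# invented for illustration; the post only reports qualitative labels.
RATED_POSTS = [
    # (text, sentiment, intensity)
    ("Is the one whoz GOing to Light Up your Day!!!!!!!!!!!!", 8.0, 8.0),
    ("Blessed with a baby boy today…", 8.0, 3.0),
    ("the boring life is back …", 3.0, 2.0),
    ("IS SUPER STRESSED AND ITS JUST THE SECOND MONTH OF SCHOOL..D:", 2.0, 8.0),
]


def tokenize(text):
    """Return the set of lowercase words in a post, with '!' as its own token."""
    return set(re.findall(r"[a-z']+|!", text.lower()))


def fit_linear(texts, targets, lr=0.01, epochs=2000):
    """Least-squares linear regression on binary word features, fitted by
    batch gradient descent. Returns (weights keyed by token, bias)."""
    features = [tokenize(t) for t in texts]
    weights = {tok: 0.0 for toks in features for tok in toks}
    bias = 0.0
    for _ in range(epochs):
        # Residuals under the current parameters, computed before any update.
        errors = [bias + sum(weights[tok] for tok in toks) - y
                  for toks, y in zip(features, targets)]
        # One gradient step on the squared-error loss.
        for toks, err in zip(features, errors):
            for tok in toks:
                weights[tok] -= lr * err
            bias -= lr * err
    return weights, bias


def predict(model, text):
    """Score a post; words never seen in training contribute nothing."""
    weights, bias = model
    return bias + sum(weights.get(tok, 0.0) for tok in tokenize(text))


texts = [text for text, _, _ in RATED_POSTS]
sentiment_model = fit_linear(texts, [s for _, s, _ in RATED_POSTS])
intensity_model = fit_linear(texts, [i for _, _, i in RATED_POSTS])

new_post = "super boring day"
print(predict(sentiment_model, new_post), predict(intensity_model, new_post))
```

With only four posts the models simply memorize the training ratings, so the scores for new text are not meaningful; the point is only the shape of the procedure: one regression for sentiment, one for intensity, both applied to any new post.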
We use a set of Facebook posts from the amazing MyPersonality dataset, covering 115,312 users with their reported age, gender and self-assessed Big 5 personality. We automatically score all their posts for sentiment and intensity. Each user is then assigned the average score of their messages for both aspects.
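The per-user aggregation can be sketched as follows. The function name, the tuple layout and the three scored posts are hypothetical; the sketch only illustrates averaging post-level scores into one sentiment and one intensity score per user.

```python
from collections import defaultdict
from statistics import mean


def average_scores_per_user(scored_posts):
    """scored_posts: iterable of (user_id, sentiment, intensity) tuples,
    one per post. Returns {user_id: (mean sentiment, mean intensity)}."""
    by_user = defaultdict(lambda: ([], []))
    for user_id, sentiment, intensity in scored_posts:
        by_user[user_id][0].append(sentiment)
        by_user[user_id][1].append(intensity)
    return {user: (mean(sents), mean(ints))
            for user, (sents, ints) in by_user.items()}


# Made-up scores for two hypothetical users.
scored = [("u1", 6.0, 4.0), ("u1", 4.0, 2.0), ("u2", 7.0, 5.0)]
user_scores = average_scores_per_user(scored)
print(user_scores)
```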
This figure shows the Pearson correlations of each trait (point-biserial in the case of gender) with both sentiment and intensity:
Sentiment and intensity correlate in the same direction with every user attribute because the two dimensions are themselves substantially correlated (.526 Pearson correlation between user scores). In other words, positive messages tend to be higher in intensity, while negative messages are usually lower in intensity. This is likely caused by the annotators not being fully able to distinguish the two dimensions.
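For readers who want to reproduce this kind of analysis, here is a small sketch of the two correlation measures used in this post. The point-biserial correlation is simply the Pearson correlation with the binary variable coded as 0/1. The function names and the toy numbers are assumptions for illustration, not our data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.
    Undefined (raises ZeroDivisionError) if either sequence is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def point_biserial(flags, ys):
    """Point-biserial correlation: Pearson with the binary variable coded 0/1."""
    return pearson([1.0 if flag else 0.0 for flag in flags], ys)


# Made-up per-user values for four hypothetical users.
is_female = [True, True, False, False]
sentiment = [5.5, 5.0, 5.0, 4.5]
intensity = [3.4, 3.1, 3.0, 2.8]

print(point_biserial(is_female, sentiment))  # gender vs. sentiment
print(pearson(sentiment, intensity))         # sentiment vs. intensity
```

A positive point-biserial value under this coding means that users flagged as female have higher average scores, which is how the gender correlations in this post should be read.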
The largest correlations are those of gender (being female) with both sentiment and intensity. The average female sentiment and intensity scores are 5.24 and 3.24, while the average male scores are 5.09 and 2.96, significantly lower in both cases. This adds to a long-standing debate over whether females are higher in expression of emotions.
Both dimensions are correlated positively with age, but to a lesser extent. Researchers previously found that positive emotions increase with age, while negative emotions decrease.
Although here we study at the level of posts, one of my team’s previous studies observed this ageing positivity effect at the word level. Here is a plot from that study, where we see that negative words like ‘bored’ and ‘hate’ decrease in use with age, while some positive ones such as ‘grateful’ and ‘proud’ increase:
Figure: Sentiment and intensity Pearson correlations with user attributes such as gender (point-biserial correlation with females), age and personality.
Conscientiousness, extraversion and agreeableness are all positively correlated with sentiment and intensity (more positive and intense messages), while people who are more neurotic or higher in openness to experience post more negative and less intense messages. Psychologists describe the presence of positive affect as a distinctive trait of extraverts (e.g. higher frequency of positive affect, activity). On the other hand, neuroticism is associated with the presence of negative affect.
If you are interested in reading more about sentiment analysis, I recommend this review paper.
A few resources on sentiment analysis tools available online:
- Online sentiment analysis demos: Sentiment 140, NLTK example, Deep learning sentiment analysis demo