Sentiment, intensity and user attributes

One of the most hyped applications of big data analysis to social media is sentiment analysis (a.k.a. opinion mining). Sentiment analysis is the area of Natural Language Processing that aims to identify and extract subjective information from text. This generally includes identifying whether a piece of text is subjective or objective, what sentiment (a.k.a. valence) it expresses (positive or negative), what emotion it conveys, and which entity or aspect that emotion is directed at. Companies and marketers are mostly interested in automatically inferring public opinion about products, movies or actions.

Besides expressing attitudes towards other objects, people also express their own emotions online. We decided to analyze this less popular facet: learning about the emotions of people posting subjective messages. In this post I’ll present variations in the sentiment and intensity of Facebook posts and how these vary with the attributes of the people who post them. I will investigate a number of user traits such as gender, age and personality.

Data and Methods

In psychology research, the emotionality of a piece of text is usually measured by two independent scales: one measures the sentiment or valence (from negative to positive) and the other measures intensity or arousal (from low to high). This is known as Russell’s circumplex model of affect and is displayed below:


Russell’s Circumplex of affect. Sentiment (negative to positive) is on the X axis and intensity (low to high) is on the Y axis.

We created a training dataset of 3120 posts, rated by two annotators on a scale from 1 to 9 for both sentiment and intensity. We trained two linear regression models on these posts, one for each scale. Here are a few examples and their ratings:

Is the one whoz GOing to Light Up your Day!!!!!!!!!!!!

positive sentiment, high intensity

Blessed with a baby boy today…

positive sentiment, low intensity

the boring life is back :(

negative sentiment, low intensity

IS SUPER STRESSED AND ITS JUST THE SECOND MONTH OF SCHOOL..D:

negative sentiment, high intensity
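To make the setup concrete, here is a minimal sketch of fitting two separate linear regression models, one for sentiment and one for intensity. Everything specific in it is an assumption for illustration: the post does not say which features were used, so the sketch uses plain bag-of-words counts; the numeric ratings are invented to match the qualitative labels of the four examples above; and the helpers `tokenize`, `featurize`, `fit_linear` and `predict` are hypothetical names, fitted with simple gradient descent rather than a library.

```python
from collections import Counter

# The four example posts from above. The numeric ratings are INVENTED for
# illustration (1-9 scale); only the qualitative labels come from the post.
posts = [
    "Is the one whoz GOing to Light Up your Day!!!!!!!!!!!!",
    "Blessed with a baby boy today…",
    "the boring life is back :(",
    "IS SUPER STRESSED AND ITS JUST THE SECOND MONTH OF SCHOOL..D:",
]
sentiment_ratings = [8.0, 8.0, 3.0, 2.0]
intensity_ratings = [8.0, 3.0, 2.0, 8.0]


def tokenize(text):
    # Assumption: lowercase whitespace tokens are the features.
    return text.lower().split()


vocab = sorted({tok for post in posts for tok in tokenize(post)})


def featurize(text):
    # Bias term followed by one count per vocabulary word.
    counts = Counter(tokenize(text))
    return [1.0] + [float(counts[word]) for word in vocab]


def predict(weights, features):
    return sum(w * x for w, x in zip(weights, features))


def fit_linear(X, y, lr=0.05, epochs=500):
    # Least-squares linear regression fitted by batch gradient descent.
    weights = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for xi, yi in zip(X, y):
            err = predict(weights, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += 2.0 * err * xj / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


X = [featurize(post) for post in posts]
sentiment_model = fit_linear(X, sentiment_ratings)
intensity_model = fit_linear(X, intensity_ratings)

# Score a new message with both models.
new_post = featurize("the boring school is back")
print(predict(sentiment_model, new_post), predict(intensity_model, new_post))
```

With only four training posts the models simply memorise their ratings; the point is the shape of the pipeline (text to features, one regressor per scale), not the quality of the predictions.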

We use a set of Facebook posts from the amazing MyPersonality dataset, consisting of 115,312 users with their reported age, gender and Big 5 personality traits (self-assessed). We automatically score all their posts for sentiment and intensity. Each user is then assigned the average score of their messages on each of the two scales.
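The per-user aggregation step can be sketched as follows. The tuple layout `(user_id, sentiment, intensity)`, the function name `average_scores_per_user` and the toy values are assumptions for illustration, not the format of the MyPersonality data.

```python
from collections import defaultdict
from statistics import mean


def average_scores_per_user(scored_posts):
    """Average post-level scores into one (sentiment, intensity) pair per user.

    scored_posts: iterable of (user_id, sentiment, intensity) tuples,
    a hypothetical layout chosen for this sketch.
    """
    sentiments = defaultdict(list)
    intensities = defaultdict(list)
    for user_id, sentiment, intensity in scored_posts:
        sentiments[user_id].append(sentiment)
        intensities[user_id].append(intensity)
    return {
        user_id: (mean(sentiments[user_id]), mean(intensities[user_id]))
        for user_id in sentiments
    }


# Toy scores, invented for illustration.
scored_posts = [
    ("u1", 6.0, 3.0),
    ("u1", 4.0, 2.0),
    ("u2", 7.0, 5.0),
]
user_scores = average_scores_per_user(scored_posts)
print(user_scores)
```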

This figure shows the Pearson correlations of each trait (point-biserial in case of gender) with both sentiment and intensity:


Sentiment and intensity Pearson correlations with user attributes such as gender (point biserial correlation with females), age and personality. All correlations are significant (p<.0001).
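For readers unfamiliar with the point-biserial correlation: it is the Pearson correlation computed with one variable coded as 0/1. The sketch below illustrates this with toy numbers that are invented for the example; the function names `pearson` and `point_biserial` are my own and are not taken from any particular library.

```python
from math import sqrt


def pearson(xs, ys):
    # Standard Pearson correlation coefficient.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def point_biserial(binary, values):
    # r_pb = (M1 - M0) / s_n * sqrt(n1 * n0 / n^2),
    # where s_n is the population standard deviation of all values.
    n = len(values)
    group1 = [v for b, v in zip(binary, values) if b == 1]
    group0 = [v for b, v in zip(binary, values) if b == 0]
    m1, m0 = sum(group1) / len(group1), sum(group0) / len(group0)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / n)
    return (m1 - m0) / sd * sqrt(len(group1) * len(group0) / n ** 2)


# Toy per-user data, invented for illustration: 1 = female, 0 = male.
is_female = [1, 1, 1, 0, 0, 0]
sentiment = [5.5, 5.0, 5.4, 5.1, 4.9, 5.0]

r_pb = point_biserial(is_female, sentiment)
r = pearson(is_female, sentiment)
print(r_pb, r)  # the two values agree up to floating-point error
```

A positive value here means the group coded as 1 has the higher average score, which is how the gender correlations in the figure should be read.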

For every user attribute, sentiment and intensity are correlated in the same direction, because the two scores are themselves highly correlated (.526 Pearson correlation between user scores). This means positive sentiment messages tend to be higher in intensity, while negative sentiment messages are usually low in intensity. This is likely caused by the annotators not being fully able to distinguish the two dimensions.

Gender

The largest correlations are those of being female with both sentiment and intensity. The average female sentiment and intensity scores are 5.24 and 3.24, while the average male scores are 5.09 and 2.96, significantly lower in both cases. This adds to a long-standing debate over whether females are higher in expression of emotions.

Age

Both dimensions are positively correlated with age, but to a lesser extent. Researchers previously found that positive emotions increase with age, while negative emotions decrease.

Although here we study at the level of posts, one of my team’s previous studies noticed this ageing positivity effect at a word level. Here is a plot from their study where we see that negative words like ‘bored’ and ‘hate’ decrease in use with age, while some positive ones such as ‘grateful’ and ‘proud’ increase:


Personality

Conscientiousness, extraversion and agreeableness are all positively correlated with sentiment and intensity (more positive and intense messages), while people who are more neurotic or higher in openness to experience post more negative and less intense messages. Psychologists describe the presence of positive affect as a distinctive trait of extraverts (e.g. higher frequency of positive affect, activity). On the other hand, neuroticism is associated with the presence of negative affect.

Resources

If you are interested in reading more about sentiment analysis, I recommend reading this review paper.

A few resources on sentiment analysis tools available online:

  1. Online sentiment analysis demos: Sentiment 140, NLTK example, Deep learning sentiment analysis demo

  2. Most popular sentiment lexicons: MPQA, NRC Sentiment and Emotion, HL, WordNet Affect

  3. Practical tutorial

  4. Competitions: Twitter Sentiment Analysis, Sentiment Analysis of Figurative Language in Twitter, Aspect Based Sentiment Analysis


About Daniel Preotiuc

Daniel is a Postdoctoral researcher at the University of Pennsylvania. His research is situated at the intersection of Natural Language Processing, Machine Learning and Social Science. His current interests include spatial and temporal learning models for text, user attribute prediction from text and Gaussian Processes, using large user-generated data coming from Social Media. Prior to joining UPenn, Daniel completed his PhD in Natural Language Processing and Machine Learning at the University of Sheffield, UK and was a researcher on the Trendminer EU FP7 project.
