All posts by Daniel Preotiuc

About Daniel Preotiuc

Daniel is a postdoctoral researcher at the University of Pennsylvania. His research sits at the intersection of Natural Language Processing, Machine Learning, and Social Science. His current interests include spatial and temporal learning models for text, user attribute prediction from text, and Gaussian Processes, applied to large user-generated datasets from social media. Prior to joining UPenn, Daniel completed his PhD in Natural Language Processing and Machine Learning at the University of Sheffield, UK, and was a researcher on the Trendminer EU FP7 project.

What traits enable crowd-workers to voluntarily disclose their identity?

Here at the World Well-Being Project, we have run many crowdsourcing experiments over the past years. Oftentimes, we are interested not only in what the workers annotate, but also in the workers themselves. For example, in a short paper [1] we showed that females are better and more confident than males at guessing gender from tweets, especially when it comes to guessing females.

Similarly, many surveys are conducted over a non-random selection of participants. This is, for example, a problem in exit polls, where a non-random population agrees to share their voting preference, forcing pollsters to apply corrections a posteriori. Moreover, in online studies where users are anonymous, not all will agree to disclose self-identifying or personal information. Using data from our studies on the Amazon Mechanical Turk crowdsourcing platform, we aimed to uncover which users are more likely to voluntarily disclose their identity.

Continue reading What traits enable crowd-workers to voluntarily disclose their identity?

Personality Profiles of ‘Cat’ and ‘Dog People’ in Social Media

Owning a pet is very popular in the U.S. The most recent survey from the American Pet Products Association estimated that 65% of American households (79.9 million) include at least one pet. The most popular household pets are, unsurprisingly, dogs (44% of households) and cats (34.9% of households) [1].

Psychologists have long sought to uncover whether individual differences drive pet ownership and preference. Most have focused on comparing the so-called cat people and dog people. Perhaps driven by the natures of their respective favorite pets, cat people are stereotyped as quiet, sensitive, and unorthodox, while dog people are thought of as gregarious and energetic. One of the most comprehensive studies to date [5] analysed 4,565 participants who took the Big Five Inventory and self-identified as dog people, cat people, both, or neither. It found that dog people are higher in extraversion, agreeableness, and conscientiousness, and lower in neuroticism and openness, even when controlling for gender differences. Contrary to these findings, some studies failed to uncover differences between the two types [9] or suggested the labels do little more than offer a different way of saying masculine and feminine [11].

To shed new light on this debate, we analysed two different online behaviors using big data from social media:

  1. Mentioning animal names in Facebook posts;
  2. Using a profile picture featuring a cat or a dog on Twitter.

Continue reading Personality Profiles of ‘Cat’ and ‘Dog People’ in Social Media

Do the presidential candidates have a plan or highlight problems?

Automatically highlighting the central words of the candidates’ debate rhetoric.

As the primary election season moves on to other important contests in the U.S., we continue our data-driven analysis of the presidential candidates. Previously, we looked at the most distinctive words used by voters of each candidate and at the distinctive words of each candidate in debate speeches. This time, we were interested in the core concepts of each candidate’s rhetoric. To uncover these, we used a different algorithm that highlights the most ‘central’ words and phrases of each candidate: those which appear over and over, bridging the candidate’s distinct themes (see Technical Section for more details).
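The post does not spell out the exact algorithm here, but ranking words that ‘bridge’ themes is commonly done with a graph centrality measure in the spirit of TextRank. The sketch below is a minimal, hypothetical illustration (not necessarily the method used): words co-occurring in a sentence are linked, and a PageRank-style power iteration scores each word by how strongly it connects the rest of the vocabulary.

```python
from collections import defaultdict
from itertools import combinations

def centrality_ranking(sentences, iterations=50, damping=0.85):
    """Rank words by a PageRank-style centrality over their co-occurrence graph.

    Words that co-occur within a sentence are linked; words that bridge many
    contexts accumulate a high score (as in TextRank).
    """
    # Build an undirected co-occurrence graph.
    neighbors = defaultdict(set)
    for sent in sentences:
        for a, b in combinations(set(sent), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)

    words = list(neighbors)
    score = {w: 1.0 / len(words) for w in words}
    for _ in range(iterations):
        new = {}
        for w in words:
            # Each neighbor passes on its score, split among its own links.
            rank = sum(score[u] / len(neighbors[u]) for u in neighbors[w])
            new[w] = (1 - damping) / len(words) + damping * rank
        score = new
    return sorted(words, key=score.get, reverse=True)

# Toy 'debate' sentences: "plan" recurs across themes, so it ranks highest.
sentences = [
    ["we", "have", "a", "plan"],
    ["a", "plan", "for", "america"],
    ["the", "plan", "will", "work"],
]
print(centrality_ranking(sentences)[0])  # plan
```

A word's score grows with how many distinct contexts it appears in, which matches the intuition of a term appearing "over and over" to bridge themes.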

Continue reading Do the presidential candidates have a plan or highlight problems?

Moral Foundations in Partisan News Sources

Moral Foundation Theory

Although almost everyone agrees that some things are morally good and some things are morally bad, the specific form of these beliefs can differ throughout the population. What is egregious to one person (harming marginalized communities, banning sugary soft drinks, refusing to go to church, etc.) can be considered completely trivial, or even be endorsed, by someone else.

Moral Foundations Theory [1,2,3] was developed to model and explain these differences. Under this theory, there is a finite number of basic moral values that people can intuitively support, but not necessarily to the same extent across the population. The five moral foundations are:

  1. Care/Harm:
    The valuation of compassion, kindness, and warmth, and the derogation of malice, deliberate injury, and inflicting suffering.
  2. Fairness/Cheating:
    The endorsement of equality, reciprocity, egalitarianism, and universal rights.
  3. Ingroup loyalty/Betrayal:
    Valuing patriotism and special treatment for one’s own ingroup.
  4. Authority/Subversion:
    The valuation of extant or traditional hierarchies or social structures and leadership roles.
  5. Purity/Degradation:
    Disapproval of dirtiness, unholiness, and impurity.

Under this theory, a person who strongly endorses the value of ‘Care/Harm’ will be appalled at an action that causes suffering, while someone who endorses ‘Authority’ will support an action that supports the social hierarchy. These responses would be immediate, emotional, and intuitive.

Continue reading Moral Foundations in Partisan News Sources

Sentiment, intensity and user attributes

One of the most hyped applications of big data analysis to social media is sentiment analysis (a.k.a. opinion mining). Sentiment analysis is the area of Natural Language Processing that aims to identify and extract subjective information from text. This generally includes identifying whether a piece of text is subjective or objective, what sentiment (a.k.a. valence) it expresses (positive or negative), what emotion it conveys, and towards which entity or aspect of the text these are directed. Companies and marketers are mostly interested in automatically inferring public opinion about products, movies or actions.
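In its simplest form, sentiment analysis can be done by summing per-word valence scores from a lexicon. The sketch below uses a tiny hypothetical lexicon purely for illustration; real systems (including the analyses discussed here) are considerably more sophisticated.

```python
# Tiny hypothetical valence lexicon: +1 for positive words, -1 for negative.
LEXICON = {"love": 1, "great": 1, "happy": 1, "hate": -1, "awful": -1, "sad": -1}

def sentiment(text):
    """Return a crude valence score for a text.

    Positive means net positive sentiment, negative means net negative,
    and zero means neutral (or objective) under this toy lexicon.
    """
    return sum(LEXICON.get(tok, 0) for tok in text.lower().split())

print(sentiment("I love this great movie"))  # 2
print(sentiment("awful and sad"))            # -2
```

Lexicon-based scoring ignores negation, sarcasm, and context, which is exactly why valence and intensity are usually modelled with richer features in practice.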

In contrast to mining these attitudes towards external objects, people also express their own emotions online. We decided to analyze this less popular facet: learning about the emotions of the people posting subjective messages. In this post I’ll present variations in sentiment and intensity of Facebook posts and how these vary with the attributes of the people who post them. I will investigate a number of user traits such as gender, age, and personality.

Continue reading Sentiment, intensity and user attributes

Zodiac sign stereotypes in Twitter

In experiments on word usage in Twitter, I’ve consistently noticed some very coherent groups of hashtags and words: those belonging to astrology. Apparently, many users post horoscope information, statements, or comments and tag them with the name of the zodiac sign. So, I wondered (since I have pretty much tried to ignore astrology all my life) which traits people most distinctively use to describe each sign.

#Taurus is extremely kind and sweet..until you betray them; then death is better.

To uncover this, I planned to use a combination of Twitter data and one of my favourite statistical measures – Pointwise Mutual Information (PMI) [1,2].
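PMI measures how much more often a word co-occurs with a zodiac hashtag than we would expect if the two were independent: PMI(w, t) = log( p(w, t) / (p(w) p(t)) ). A minimal sketch over made-up (word, sign) pairs (the data below is purely illustrative, not from the actual Twitter collection):

```python
import math
from collections import Counter

def pmi(word, tag, word_tag_counts, word_counts, tag_counts, total):
    """Pointwise Mutual Information between a word and a zodiac hashtag.

    PMI(w, t) = log( p(w, t) / (p(w) * p(t)) ).  Positive values mean the
    word co-occurs with the tag more often than expected under independence.
    """
    p_joint = word_tag_counts[(word, tag)] / total
    p_word = word_counts[word] / total
    p_tag = tag_counts[tag] / total
    return math.log(p_joint / (p_word * p_tag))

# Hypothetical (word, sign) co-occurrences extracted from tweets.
pairs = [
    ("stubborn", "#taurus"), ("stubborn", "#taurus"), ("kind", "#taurus"),
    ("stubborn", "#aries"), ("bold", "#aries"), ("bold", "#aries"),
]
word_tag_counts = Counter(pairs)
word_counts = Counter(w for w, _ in pairs)
tag_counts = Counter(t for _, t in pairs)
total = len(pairs)

# "bold" only ever appears with #aries, so its PMI with #aries is high.
print(pmi("bold", "#aries", word_tag_counts, word_counts, tag_counts, total))
```

Ranking each sign's words by PMI then surfaces the traits most distinctively associated with that sign, rather than words that are merely frequent everywhere.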

Continue reading Zodiac sign stereotypes in Twitter