PUBLICATIONS

Park, G., Yaden, D. B., Schwartz, H. A., Kern, M. L., Eichstaedt, J. C., Kosinski, M., ... & Seligman, M. E. (2016). Women are Warmer but No Less Assertive than Men: Gender and Language on Facebook. PLoS ONE, 11(5), e0155885.

@article{park2016women, author={Park, G. and Yaden, D. B. and Schwartz, H. A. and Kern, M. L. and Eichstaedt, J. C. and Kosinski, M. and Stillwell, D. and Ungar, L. H. and Seligman, M. E. P.}, title={{Women are Warmer but No Less Assertive than Men: Gender and Language on Facebook}}, journal={PLoS ONE}, volume={11}, number={5}, pages={e0155885}, year={2016} }


Abstract...
Using a large social media dataset and open-vocabulary methods from computational linguistics, we explored differences in language use across gender, affiliation, and assertiveness.

Flekova L., Carpenter J., Giorgi S., Ungar L. & Preotiuc-Pietro D. (2016). Analyzing Biases in Human Perception of User Age and Gender from Text. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL.

@inproceedings{perceived2016acl, author={Flekova, Lucie and Carpenter, Jordan and Giorgi, Salvatore and Ungar, Lyle and Preo\c{t}iuc-Pietro, Daniel}, title={{Analyzing Biases in Human Perception of User Age and Gender from Text}}, booktitle={Proceedings of the 54th annual meeting of the Association for Computational Linguistics}, series={ACL}, year={2016} }


Abstract...
Human perception of others is not perfectly aligned with reality and may suffer from biases. We present the results of a large crowdsourcing experiment on rating age and gender of others from their tweets. We identify and quantify the textual cues which lead to mis-assessments of traits or make annotators more or less confident in their choice. This study demonstrates that differences between real and perceived traits are noteworthy and elucidates inaccurately used stereotypes in human perception and possible implications for NLP research.

Flekova L., Ungar L. & Preotiuc-Pietro D. (2016). Exploring Stylistic Variation with Age and Income on Twitter. Proceedings of the 54th annual meeting of the Association for Computational Linguistics, ACL.

@inproceedings{ageincome2016acl, author={Flekova, Lucie and Ungar, Lyle and Preo\c{t}iuc-Pietro, Daniel}, title={{Exploring Stylistic Variation with Age and Income on Twitter}}, booktitle={Proceedings of the 54th annual meeting of the Association for Computational Linguistics}, series={ACL}, year={2016} }


Abstract...
We explore the relation between stylistic and syntactic features and authors’ age and income. We find that writing style is predictive of income even beyond age and analyze the predictive power of writing style in a regression task on two data sets of around 5,000 Twitter users each. Additionally, we use our validated features to study daily variations in writing style of users from distinct income groups. Temporal stylistic patterns not only provide novel psychological insight into user behavior, but are useful for future research and applications in social media.

Preotiuc-Pietro, D., Schwartz, H.A., Park, G., Eichstaedt, J., Kern, M., Ungar, L., Shulman, E.P. (2016). Modelling Valence and Arousal in Facebook Posts. Proceedings of the Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA), NAACL.

@inproceedings{va16wassa, title={Modelling Valence and Arousal in Facebook Posts}, author={Preo\c{t}iuc-Pietro, Daniel and Schwartz, H. Andrew and Park, Gregory and Eichstaedt, Johannes and Kern, Margaret and Ungar, Lyle and Shulman, Elizabeth P.}, series = {NAACL}, booktitle = {Proceedings of the Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis (WASSA)}, year = {2016} }


Abstract...
Emotional expressions are widespread on social media. A model for capturing emotions is the emotional circumplex, where each emotion is expressed on two dimensions: valence, the pleasantness experienced, and arousal, the intensity of the state. We introduce a dataset of age- and gender-stratified Facebook posts consisting of 2895 messages annotated with valence and arousal on a nine-point scale by two annotators with psychology training. We develop a message-level emotion detection model based on single words which correlates highly with valence (r=.65) and arousal (r=.85). The dataset and model are publicly available to encourage new research and applications in this area.

Ranard, B.L., Werner, R.M., Antanavicius, T., Schwartz, H.A., Smith, R.J., Meisel, Z.F., Asch, D.A., Ungar, L.H. & Merchant, R.M. (2016). Yelp Reviews Of Hospital Care Can Supplement And Inform Traditional Surveys Of The Patient Experience Of Care. Health Affairs, 35(4), 697-705.

@article{ranard2016yelp, title={Yelp Reviews Of Hospital Care Can Supplement And Inform Traditional Surveys Of The Patient Experience Of Care}, author={Ranard, Benjamin L and Werner, Rachel M and Antanavicius, Tadas and Schwartz, H Andrew and Smith, Robert J and Meisel, Zachary F and Asch, David A and Ungar, Lyle H and Merchant, Raina M}, journal={Health Affairs}, volume={35}, number={4}, pages={697--705}, year={2016}, publisher={Health Affairs} }


Abstract...
Little is known about how real-time online rating platforms such as Yelp may complement the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) survey, which is the US standard for evaluating patients’ experiences after hospitalization. We compared the content of Yelp narrative reviews of hospitals to the topics in the HCAHPS survey, called domains in HCAHPS terminology. While the domains included in Yelp reviews covered the majority of HCAHPS domains, Yelp reviews covered an additional twelve domains not found in HCAHPS. The majority of Yelp topics that most strongly correlate with positive or negative reviews are not measured or reported by HCAHPS. The large collection of patient- and caregiver-centered experiences found on Yelp can be analyzed with natural language processing methods, identifying for policy makers the measures of hospital quality that matter most to patients and caregivers. The Yelp measures and analysis can also provide actionable feedback for hospitals.

Park, G., Schwartz, H.A., Sap, M., Kern, M.L., Weingarten, E., Eichstaedt, J.C., Berger, J., Stillwell, D.J., Kosinski, M., Ungar, L.H. & Seligman, M.E. (2015). Living in the Past, Present, and Future: Measuring Temporal Orientation with Language. Journal of Personality.

@article{park2015living, title={Living in the Past, Present, and Future: Measuring Temporal Orientation with Language}, author={Park, Gregory and Schwartz, H Andrew and Sap, Maarten and Kern, Margaret L and Weingarten, Evan and Eichstaedt, Johannes C and Berger, Jonah and Stillwell, David J and Kosinski, Michal and Ungar, Lyle H and others}, journal={Journal of personality}, year={2015}, publisher={Wiley Online Library} }


Abstract...
Temporal orientation refers to individual differences in the relative emphasis one places on the past, present, or future, and it is related to academic, financial, and health outcomes. We propose and evaluate a method for automatically measuring temporal orientation through language expressed on social media. Judges rated the temporal orientation of 4,302 social media messages. We trained a classifier based on these ratings, which could accurately predict the temporal orientation of new messages in a separate validation set (accuracy/mean sensitivity = .72; mean specificity = .77). We used the classifier to automatically classify 1.3 million messages written by 5,372 participants (50% female; ages 13–48). Finally, we tested whether individual differences in past, present, and future orientation differentially related to gender, age, Big Five personality, satisfaction with life, and depressive symptoms. Temporal orientations exhibit several expected correlations with age, gender, and Big Five personality. More future-oriented people were older, more likely to be female, more conscientious, less impulsive, less depressed, and more satisfied with life; present orientation showed the opposite pattern. Language-based assessments can complement and extend existing measures of temporal orientation, providing an alternative approach and additional insights into language and personality relationships.

Fulgoni, D., Carpenter, J., Ungar, L., & Preotiuc-Pietro, D. (2016). An Empirical Exploration of Moral Foundations Theory in Partisan News Sources. Proceedings of the 10th edition of the Language Resources and Evaluation Conference, LREC.

@inproceedings{mf2016lrec, title = {{An Empirical Exploration of Moral Foundations Theory in Partisan News Sources}}, author = {Fulgoni, Dean and Carpenter, Jordan and Ungar, Lyle and Preo\c{t}iuc-Pietro, Daniel}, series = {LREC}, booktitle = {{Proceedings of the 10th edition of the Language Resources and Evaluation Conference}}, year = {2016} }


Abstract...
News sources frame issues in different ways in order to appeal to or control the perception of their readers. We analyze articles from partisan news sources in the US across a wide range of issues. We highlight the words that different sources use, finding that each side prefers to combat the opposing view, and show that we can predict the partisan bias of a new article. We also explore the extent to which partisan news sources appeal to a set of five underlying morals (care, fairness, loyalty, authority and sanctity), uncovering significant differences between sides.

Liu, L., Preotiuc-Pietro, D., Samani, Z. R., Moghaddam, M. E., & Ungar, L. (2016). Analyzing Personality through Social Media Profile Picture Choice. Proceedings of the 10th International AAAI Conference on Web and Social Media, ICWSM.

@inproceedings{persimages16icwsm, title={Analyzing Personality through Social Media Profile Picture Choice}, author={Liu, Leqi and Preo\c{t}iuc-Pietro, Daniel and Riahi Samani, Zahra and Moghaddam, Mohsen E. and Ungar, Lyle}, series = {ICWSM}, booktitle = {Proceedings of the 10th International AAAI Conference on Web and Social Media}, year={2016} }


Abstract...
People choose their profile picture to be representative of their online persona. This choice is driven in part by personality. We use a large Twitter dataset to discover how personality traits influence profile picture type. We study this using interpretable features such as color choice, aesthetic quality, composition, facial presentation and emotions. We discover, for example, that people high in openness to experience prefer aesthetically pleasing images that are not conventional (of inanimate objects or grayscale) while conscientious users prefer single faces that display positive facial emotions.

Schwartz, H.A., Sap, M., Kern, M.L., Eichstaedt, J.C., Kapelner, A., Agrawal, M., Blanco, E., Dziurzynski, L., Park, G., Stillwell, D., Kosinski, M., Seligman, M.E.P. & Ungar, L.H. (2016). Predicting individual well-being through the language of social media. In Biocomputing 2016: Proceedings of the Pacific Symposium (pp. 516-527).

@article{schwartz2016predicting, author={Schwartz, H Andrew and Sap, Maarten and Kern, Margaret L and Eichstaedt, Johannes C and Kapelner, Adam and Agrawal, Megha and Blanco, Eduardo and Dziurzynski, Lukasz and Park, Gregory and Stillwell, David and Kosinski, Michal and Seligman, Martin E P and Ungar, Lyle H}, title={{Predicting Individual Well-Being Through the Language of Social Media}}, journal={Pacific Symposium on Biocomputing}, year={2016}, volume={21}, pages={516--527} }


Abstract...
We present the task of predicting individual well-being, as measured by a life satisfaction scale, through the language people use on social media. Well-being, which encompasses much more than emotion and mood, is linked with good mental and physical health. The ability to quickly and accurately assess it can supplement multi-million dollar national surveys as well as promote whole body health. Through crowd-sourced ratings of tweets and Facebook status updates, we create message-level predictive models for multiple components of well-being. However, well-being is ultimately attributed to people, so we perform an additional evaluation at the user-level, finding that a multi-level cascaded model, using both message-level predictions and user- level features, performs best and outperforms popular lexicon-based happiness models. Finally, we suggest that analyses of language go beyond prediction by identifying the language that characterizes well-being.

Ireland, M. E., Schwartz, H. A., Chen, Q., Ungar, L. H., & Albarracín, D. (2015). Future-oriented tweets predict lower county-level HIV prevalence in the United States. Health Psychology, 34(S), 1252.

@article{ireland2015future, title={Future-oriented tweets predict lower county-level HIV prevalence in the United States.}, author={Ireland, Molly E and Schwartz, H Andrew and Chen, Qijia and Ungar, Lyle H and Albarrac{\'i}n, Dolores}, journal={Health Psychology}, volume={34}, number={S}, pages={1252}, year={2015}, publisher={American Psychological Association} }


Abstract...
Future orientation promotes health and well-being at the individual level. Computerized text analysis of a dataset encompassing billions of words used across the United States on Twitter tested whether community-level rates of future-oriented messages correlated with lower human immunodeficiency virus (HIV) rates and moderated the association between behavioral risk indicators and HIV. Over 150 million tweets mapped to U.S. counties were analyzed using 2 methods of text analysis. First, county-level HIV rates (cases per 100,000) were regressed on aggregate usage of future-oriented language (e.g., will, gonna). A second data-driven method regressed HIV rates on individual words and phrases. Results showed that counties with higher rates of future tense on Twitter had fewer HIV cases, independent of strong structural predictors of HIV such as population density. Future-oriented messages also appeared to buffer health risk: Sexually transmitted infection rates and references to risky behavior on Twitter were associated with higher HIV prevalence in all counties except those with high rates of future orientation. Data-driven analyses likewise showed that words and phrases referencing the future (e.g., tomorrow, would be) correlated with lower HIV prevalence. Integrating big data approaches to text analysis and epidemiology with psychological theory may provide an inexpensive, real-time method of anticipating outbreaks of HIV and etiologically similar diseases.

Preoţiuc-Pietro, D., Xu, W., Ungar, L. (2016). Discovering User Attribute Stylistic Differences via Paraphrasing. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI.

@inproceedings{paraphrase16aaai, author = {Preo\c{t}iuc-Pietro, Daniel and Xu, Wei and Ungar, Lyle}, title = {{Discovering user attribute stylistic differences via paraphrasing}}, booktitle = {{Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence}}, series = {AAAI}, year = {2016} }


Abstract...
User attribute prediction from social media text has proven successful and useful for downstream tasks. In previous studies, differences in user trait language use have been limited primarily to the presence or absence of words that indicate topical preferences.

In this study, we aim to find linguistic style distinctions across three different user attributes: gender, age and occupational class. By combining paraphrases with a simple yet effective method, we capture a wide set of stylistic differences that are exempt from topic bias. We show their predictive power in user profiling, conformity with human perception and psycholinguistic hypotheses, and potential use in generating natural language tailored to specific user traits.


Ireland, M. E., Chen, Q., Schwartz, H. A., Ungar, L. H., & Albarracin, D. (2015). Action Tweets Linked to Reduced County-Level HIV Prevalence in the United States: Online Messages and Structural Determinants. AIDS and Behavior, 1-9.

@article{ireland2015action, title={Action Tweets Linked to Reduced County-Level HIV Prevalence in the United States: Online Messages and Structural Determinants}, author={Ireland, Molly E and Chen, Qijia and Schwartz, H Andrew and Ungar, Lyle H and Albarracin, Dolores}, journal={AIDS and Behavior}, pages={1--9}, year={2015}, publisher={Springer} }


Abstract...
HIV is uncommon in most US counties but travels quickly through vulnerable communities when it strikes. Tracking behavior through social media may provide an unobtrusive, naturalistic means of predicting HIV outbreaks and understanding the behavioral and psychological factors that increase communities’ risk. General action goals, or the motivation to engage in cognitive and motor activity, may support protective health behavior (e.g., using condoms) or encourage activity indiscriminately (e.g., risky sex), resulting in mixed health effects. We explored these opposing hypotheses by regressing county-level HIV prevalence on action language (e.g., work, plan) in over 150 million tweets mapped to US counties. Controlling for demographic and structural predictors of HIV, more active language was associated with lower HIV rates. By leveraging language used on social media to improve existing predictive models of geographic variation in HIV, future targeted HIV-prevention interventions may have a better chance of reaching high-risk communities before outbreaks occur.

Padrez, K. A., Ungar, L., Schwartz, H. A., Smith, R. J., Hill, S., Antanavicius, T., ... & Merchant, R. M. (2015). Linking social media and medical record data: a study of adults presenting to an academic, urban emergency department. BMJ Quality & Safety, bmjqs-2015.

@article{padrez2015linking, title={Linking social media and medical record data: a study of adults presenting to an academic, urban emergency department}, author={Padrez, Kevin A and Ungar, Lyle and Schwartz, Hansen Andrew and Smith, Robert J and Hill, Shawndra and Antanavicius, Tadas and Brown, Dana M and Crutchley, Patrick and Asch, David A and Merchant, Raina M}, journal={BMJ quality \& safety}, pages={bmjqs--2015}, year={2015}, publisher={BMJ Publishing Group Ltd} }


Abstract...
Background: Social media may offer insight into the relationship between an individual's health and their everyday life, as well as attitudes towards health and the perceived quality of healthcare services.

Objective: To determine the acceptability to patients and potential utility to researchers of a database linking patients’ social media content with their electronic medical record (EMR) data.

Methods: Adult Facebook/Twitter users who presented to an emergency department were queried about their willingness to share their social media data and EMR data with health researchers for the purpose of building a databank for research purposes. Shared posts were searched for select terms about health and healthcare.

Results: Of the 5256 patients approached, 2717 (52%) were Facebook and/or Twitter users. 1432 (53%) of those patients agreed to participate in the study. Of these participants, 1008 (71%) consented to share their social media data for the purposes of comparing it with their EMR. Social media data consisted of 1 395 720 posts/tweets to Facebook and Twitter. Participants sharing social media data were slightly younger (29.1±9.8 vs 31.9±10.4 years old; p<0.001), more likely to post at least once a day (42% vs 29%; p=0.003) and more likely to present to the emergency room via self-arrival mode and have private insurance. Of Facebook posts, 7.5% (95% CI 4.8% to 10.2%) were related to health. Individuals with a given diagnosis in their EMR were significantly more likely to use terms related to that diagnosis on Facebook than patients without that diagnosis in their EMR (p<0.0008).

Conclusions: Many patients are willing to share and link their social media data with EMR data. Sharing patients have several demographic and clinical differences compared with non-sharers. A database that merges social media with EMR data has the potential to provide insights about individuals’ health and health outcomes.

Preoţiuc-Pietro D, Volkova S, Lampos V, Bachrach Y, Aletras N (2015) Studying User Income through Language, Behaviour and Affect in Social Media. PLoS ONE 10(9): e0138717. doi:10.1371/journal.pone.0138717

@article{income15plos, author = {Preo\c{t}iuc-Pietro, Daniel and Volkova, Svitlana and Lampos, Vasileios and Bachrach, Yoram and Aletras, Nikolaos}, journal = {PLoS ONE}, title = {{Studying User Income through Language, Behaviour and Affect in Social Media}}, year = {2015}, month = {09}, volume = {10}, number = {9}, }


Abstract...
Automatically inferring user demographics from social media posts is useful for both social science research and a range of downstream applications in marketing and politics. This paper presents the first extensive study where user behaviour on Twitter is used to build a predictive model of income. This allows us to shed light on the factors that characterise income on Twitter and analyse their interplay with user emotions and sentiment, perceived psycho-demographics and language use expressed through the topics of their posts.

Our analysis uncovers correlations between different feature categories and income, some of which reflect common belief e.g. higher perceived education and intelligence indicates higher earnings, known differences e.g. gender and age differences, however, others show novel findings e.g. higher income users express more fear and anger, whereas lower income users express more of the time emotion and opinions.


Flekova, L., Preoţiuc-Pietro, D., Carpenter, J., Giorgi, S., & Ungar, L. (2015). Analyzing crowdsourced assessment of user traits through Twitter posts. In Proceedings of the Third AAAI Conference on Human Computation and Crowdsourcing, HCOMP.

@inproceedings{cstraits2015hcomp, author={Flekova, Lucie and Preo\c{t}iuc-Pietro, Daniel and Carpenter, Jordan and Giorgi, Salvatore and Ungar, Lyle}, title={{Analyzing crowdsourced assessment of user traits through Twitter posts}}, booktitle={Third AAAI Conference on Human Computation and Crowdsourcing}, series={HCOMP}, year={2015} }


Abstract...
This preliminary study aims to gain a better understanding of which human attributes lead to better perceptions of the true identity of others. Using a crowdsourcing experiment, we used tweets from a set of users with known gender and age and asked workers on Amazon MTurk to rate their perception of these traits. Subsequently, we analyse which of the workers' own traits influence prediction performance and confidence.

Our results show that female workers are both more confident and more accurate at predicting gender from tweets, especially in identifying female authors. Workers in their thirties are the most accurate at rating age. However, they are also the least confident in their predictions.

Our study is a first step in identifying human demographic or psychological traits that contribute to a better understanding of others. Our findings can also be applied to identify target workers that are best suited for a certain task.

Flekova, L., Ruppert, E., & Preotiuc-Pietro, D. (2015). Analysing domain suitability of a sentiment lexicon by identifying distributionally bipolar words. Proceedings of the 6th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis.

@inproceedings{lmi2015wassa, author={Flekova, Lucie and Ruppert, Eugen and Preo\c{t}iuc-Pietro, Daniel}, title={{Analysing domain suitability of a sentiment lexicon by identifying distributionally bipolar words}}, booktitle={Proceedings of the Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis}, series={EMNLP}, year={2015}, }


Abstract...
Sentiment (or valence) prediction from Twitter is of the utmost interest for research and commercial organizations. Systems usually use lexicons, where each word is positive or negative. However, word lexicons suffer from ambiguities at a contextual level: the word cold is positive in cold beer and negative in cold coffee, and likewise the word dark in dark chocolate (+) versus dark soul (-). We introduce a method which helps to identify frequent contexts in which a word switches polarity, and to reveal which words often appear in both positive and negative contexts. We show that our method matches human perception of polarity and demonstrate improvements in automated sentiment classification. Our method also helps to assess the suitability of an existing lexicon for a new platform (e.g. Twitter).

Preotiuc-Pietro, D., Lampos, V., & Aletras, N. (2015). An analysis of the user occupational class through Twitter content. Proceedings of the 53rd annual meeting of the Association for Computational Linguistics, ACL.

@inproceedings{jobs2015acl, author={Preo\c{t}iuc-Pietro, Daniel and Lampos, Vasileios and Aletras, Nikolaos}, title={{An analysis of the user occupational class through Twitter content}}, booktitle={Proceedings of the 53rd annual meeting of the Association for Computational Linguistics}, series={ACL}, year={2015}, }


Abstract...
We explore the dynamics of social media information in the task of inferring the occupational class of users. Our analysis is based on the Standard Occupational Classification from the Office of National Statistics in the UK, which encloses 9 extensive categories of occupations.

The investigated methods take advantage of the user's textual input as well as platform-oriented characteristics (interaction, impact, usage). The best performing methodology uses a neural clustering technique (spectral clustering on neural word embeddings) and a Gaussian Process model for conducting the classification. It delivers a 52.7% accuracy in predicting the user's occupational class, a very decent performance for a 9-way classification task.

Our qualitative analysis confirms the generic hypothesis of occupational class separation as indicated by the language usage for the different job categories. This can be due to a different topical focus, e.g. artists will talk about art, but also due to more generic behaviours, e.g. the lower-ranked occupational classes tend to use more elongated words, whereas higher-ranked occupations tend to discuss more about politics or higher education.

Schwartz, H. A., & Ungar, L. H. (2015). Data-Driven Content Analysis of Social Media: A Systematic Overview of Automated Methods. The ANNALS of the American Academy of Political and Social Science, 659, 78-94.

@article{schwartz2015datadriven, author={Schwartz, H Andrew and Ungar, Lyle H}, title={{Data-driven content analysis of social media: A systematic overview of automated methods}}, year={2015}, journal={The ANNALS of the American Academy of Political and Social Science}, volume={659}, pages={78-94} }


Abstract...
Researchers have long measured people’s thoughts, feelings, and personalities using carefully designed survey questions, which are often given to a relatively small number of volunteers. The proliferation of social media, such as Twitter and Facebook, offers alternative measurement approaches: automatic content coding at unprecedented scales and the statistical power to do open-vocabulary exploratory analysis. We describe a range of automatic and partially automatic content analysis techniques and illustrate how their use on social media generates insights into subjective well-being, health, gender differences, and personality.

Preotiuc-Pietro, D., Eichstaedt, J., Park, G., Sap, M., Smith, L., Tobolsky, V., Schwartz, H. A., & Ungar, L. H. (2015). The Role of Personality, Age and Gender in Tweeting about Mental Illnesses. In Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, NAACL.

@inproceedings{pers2015clpsych, author={Preo\c{t}iuc-Pietro, Daniel and Eichstaedt, Johannes and Park, Gregory and Sap, Maarten and Smith, Laura and Tobolsky, Victoria and Schwartz, H Andrew and Ungar, Lyle H}, title={{The role of personality, age and gender in tweeting about mental illnesses}}, booktitle={Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality}, series={NAACL}, year={2015}, }


Abstract...
Populations sharing similar demographics and personality traits are known to be more at risk of certain mental illnesses. Our study focuses on personality and demographic influence in users tweeting about their mental illness: depression or post-traumatic stress disorder (PTSD). We find that age is a major predictor of PTSD users, with users having this illness being older than the rest. For personality, both sets of users are more neurotic and introverted and less conscientious. However, PTSD users show more openness. Age together with the text-derived Big Five personality scores and gender show impressive predictive accuracies of around .8 AUC (area under the receiver operating characteristic curve), although lower than using all unigrams. We also study language use between populations. This analysis shows that we can recover many symptoms associated with the mental illnesses in the clinical literature. For example, depressed users disclose the presence of the two sets of core symptoms: sustained periods of low mood (dysphoria) and low interest (anhedonia).

Weeg, C., Schwartz, H. A., Hill, S., Merchant, R. M., Arango, C., & Ungar, L. (2015). Using Twitter to Measure Public Discussion of Diseases: A Case Study. JMIR Public Health Surveill, 1(1), e6. doi:10.2196/publichealth.3953

@article{weeg2015using, author={Weeg, Christopher and Schwartz, H Andrew and Hill, Shawndra and Merchant, Raina M and Arango, Catalina and Ungar, Lyle}, title={{Using Twitter to measure public discussion of diseases: A case study}}, journal={JMIR Public Health Surveillance}, year={2015}, volume={1}, number={1}, pages={e6} }


Abstract...
Word-use patterns in Twitter, Facebook, newsgroups, and Google queries have been used to investigate a wide array of health concerns. Twitter is perhaps the most popular online data source for such studies, due in part to its relative accessibility. It has been used to monitor health issues including influenza [1,2], cholera [3], H1N1 [4-6], postpartum depression [7], concussion [8], epilepsy [9], migraine [10], cancer screening [11], antibiotic use [12], medical practitioner errors [13], dental pain [14], and attitudes about vaccination [15]. Such research has demonstrated the utility of mining social media for public health applications despite potential methodological challenges, including the following: (1) Twitter users form a biased sample of the population [16-18], and (2) their word usage within tweets can be highly ambiguous. For example, focusing just on the medical domain, “stroke” has many nonmedical uses (“stroke of genius” or “back stroke”); most mentions of “heart attack” are metaphorical, not literal (just had a heart attack and died the power went out while I was in the shower); and although doctors associate “MI” with myocardial infarction, on Twitter it refers more often to the state of Michigan.

Preotiuc-Pietro, D., Sap, M., Schwartz, H. A., & Ungar, L. H. (2015). Mental Illness Detection at the World Well-Being Project for the CLPsych 2015 Shared Task. In Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, NAACL.

@inproceedings{wwbpst2015clpsych, author={Preo\c{t}iuc-Pietro, Daniel and Sap, Maarten and Schwartz, H Andrew and Ungar, Lyle H}, title={{Mental illness detection at the World Well-Being Project for the CLPsych 2015 Shared Task}}, booktitle={Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality}, series={NAACL}, year={2015} }


Abstract...
Mental illnesses are globally widespread, but still underdiagnosed due to costly treatment, social stigma associated with them and imperfect screening methods. Screening for these illnesses through Social Media behaviour can represent a viable large-scale method. The CLPsych2015 (Computational Linguistics and Clinical Psychology) Shared Task represents the first attempt to provide an apples-to-apples comparison of automatic methods to classify users having a mental illness from their Social Media language use. The dataset consists of users disclosing to have either depression or post-traumatic stress disorder (PTSD) on Twitter. Our system represented a user as probability distributions over topic usage, where a topic is a group of words sharing similar functions, either semantic or syntactic. We combined different topic representations in a linear learning algorithm. Remarkably, our method is fully automatic and not dependent on hand-crafted lists of features. The approach ranked second in all tasks on average precision and showed best results at .1 false positive rates.

Schwartz, H. A., Park, G., Sap, M., Weingarten, E., Eichstaedt, J., Kern, M., Stillwell, D., Kosinski, M., Berger, J., Seligman, M., & Ungar, L. (2015). Extracting Human Temporal Orientation from Facebook Language. NAACL-2015: Conference of the North American Chapter of the Association for Computational Linguistics. bibtex

@inproceedings{schwartz2015extracting, author={Schwartz, H Andrew and Park, Gregory and Sap, Maarten and Weingarten, Evan and Eichstaedt, Johannes and Kern, Margaret and Stillwell, David and Kosinski, Michal and Berger, Jonah and Seligman, Martin and Ungar, Lyle}, title={{Extracting human temporal orientation from Facebook language}}, booktitle={Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies}, year={2015}, series={NAACL} }


Abstract...
People vary widely in their temporal orientation—how often they emphasize the past, present, and future—and this affects their finances, health, and happiness. Traditionally, temporal orientation has been assessed by self-report questionnaires. In this paper, we develop a novel behavior-based assessment using human language on Facebook. We first create a past, present, and future message classifier, engineering features and evaluating a variety of classification techniques. Our message classifier achieves an accuracy of 71.8%, compared with 52.8% from the most frequent class and 58.6% from a model based entirely on time expression features. We quantify a user’s overall temporal orientation based on their distribution of messages and validate it against known human correlates: conscientiousness, age, and gender. We then explore social scientific questions, finding novel associations with the factors openness to experience, satisfaction with life, depression, IQ, and one’s number of friends. Further, demonstrating how one can track orientation over time, we find differences in future orientation around birthdays.

Eichstaedt, J. C., Schwartz, H. A., Kern, M. L., Park, G., Labarthe, D. R., Merchant, R. M., Jha, S., Agrawal, M., Dziurzynski, L. A., Sap, M., Weeg, C., Larson, E. E., Ungar, L. H., & Seligman, M. E. (2015). Psychological Language on Twitter Predicts County-Level Heart Disease Mortality. Psychological Science 26(2), 159-169. bibtex

@article{eichstaedt2015psychological, author={Eichstaedt, Johannes C and Schwartz, H Andrew and Kern, Margaret L and Park, Gregory and Labarthe, Darwin R and Merchant, Raina M and Jha, Sneha and Agrawal, Megha and Dziurzynski, Lukasz A and Sap, Maarten and Weeg, Christopher and Larson, Emily E and Ungar, Lyle H and Seligman, Martin EP}, title={{Psychological language on Twitter predicts county-level heart disease mortality}}, journal={Psychological Science}, year={2015}, volume={26}, issue={2}, pages={159-169} }



Abstract...
Hostility and chronic stress are known risk factors for heart disease, but they are costly to assess on a large scale. We used language expressed on Twitter to characterize community-level psychological correlates of age-adjusted mortality from atherosclerotic heart disease (AHD). Language patterns reflecting negative social relationships, disengagement, and negative emotions—especially anger—emerged as risk factors; positive emotions and psychological engagement emerged as protective factors. Most correlations remained significant after controlling for income and education. A cross-sectional regression model based only on Twitter language predicted AHD mortality significantly better than did a model that combined 10 common demographic, socioeconomic, and health risk factors, including smoking and hypertension. Capturing community psychological characteristics through social media is feasible, and these characteristics are strong markers of cardiovascular mortality at the community level.

Yaden, D. B., Eichstaedt, J. C., Schwartz, H. A., Kern, M. L., Le Nguyen, K. D., Wintering, N. A., Hood, R. W., Jr., & Newberg, A. B. (2015, July 27). The Language of Ineffability: Linguistic Analysis of Mystical Experiences. Psychology of Religion and Spirituality. Advance online publication. http://dx.doi.org/10.1037/rel0000043 bibtex

@article{yaden2015language, title={The Language of Ineffability: Linguistic Analysis of Mystical Experiences}, author={Yaden, David B and Eichstaedt, Johannes C and Schwartz, H Andrew and Kern, Margaret L and Le Nguyen, Khoa D and Wintering, Nancy A and Hood Jr, Ralph W and Newberg, Andrew B}, journal={Psychology of Religion and Spirituality}, year={2015}, publisher={Educational Publishing Foundation}, doi={10.1037/rel0000043} }


Abstract...
Mystical experiences are often described as “ineffable,” or beyond language. However, people readily speak about their mystical experiences if asked about them. How do people describe what is supposedly indescribable? In this study, we used quantitative linguistic analyses to interpret the writings of 777 participants (45.5% female, 51.0% male) who recounted their most significant spiritual or religious experience as part of an online survey. High and low scorers on a measure of mystical experiences differed in the language they used to describe their experiences. Participants who have had mystical experiences used language that was more socially and spatially inclusive (e.g., “close,” “we,” “with”) and used fewer overtly religious words (e.g., “prayed,” “Christ,” “church”) than participants without such experiences. Results indicated that people can meaningfully communicate their mystical experiences, and that quantitative language analyses provide a means for understanding aspects of such experiences.

Duckworth, AL, Eichstaedt, JC, and Ungar, LH (2015), The Mechanics of Human Achievement. Social and Personality Psychology Compass, 9, 359–369. doi: 10.1111/spc3.12178. bibtex

@article{duckworthmechanics, title={The Mechanics of Human Achievement}, journal = {Social and Personality Psychology Compass}, author={Duckworth, Angela L and Eichstaedt, Johannes C and Ungar, Lyle H}, year = {2015}, volume = {9}, issue = {7}, pages = {359-369}, doi = {10.1111/spc3.12178}, url = {http://dx.doi.org/10.1111/spc3.12178} }


Abstract...
Countless studies have addressed why some individuals achieve more than others. Nevertheless, the psychology of achievement lacks a unifying conceptual framework for synthesizing these empirical insights.

We propose organizing achievement-related traits by two possible mechanisms of action: Traits that determine the rate at which an individual learns a skill are talent variables and can be distinguished conceptually from traits that determine the effort an individual puts forth. This approach takes inspiration from Newtonian mechanics: achievement is akin to distance traveled, effort to time, skill to speed, and talent to acceleration. A novel prediction from this model is that individual differences in effort (but not talent) influence achievement (but not skill) more substantially over longer (rather than shorter) time intervals. Conceptualizing skill as the multiplicative product of talent and effort, and achievement as the multiplicative product of skill and effort, advances similar, but less formal, propositions by several important earlier thinkers.

Park, G., Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Stillwell, D. J., Kosinski, M., Ungar, L. H., & Seligman, M. E. (in press). Automatic personality assessment through social media language. Journal of Personality and Social Psychology, Nov 3, 2014. bibtex

@article{park2014automatic, author={Park, Greg and Schwartz, H Andrew and Eichstaedt, Johannes C and Kern, Margaret L and Stillwell, David J and Kosinski, Michal and Ungar, Lyle H and Seligman, Martin EP}, title={{Automatic personality assessment through social media language}}, year={2014}, journal={Journal of Personality and Social Psychology}, volume={108}, issue={6}, pages={934-952} }


Abstract...
We describe a personality assessment derived from an automated analysis of social media language. First, we build a model based on 66,000+ Facebook users and their personality traits, then we create a predictive model of personality based on their language. To test our model, we generate personality predictions for a new sample of 4,800 users. We compare predictions to (a) questionnaire assessments, (b) personality ratings from friends, and (c) outcomes related to personality (e.g., number of friends, political attitudes). We also assess the stability of predictions by making multiple predictions for single users at different time points and comparing predictions over time. We find that language-based assessments can constitute valid personality measures: they agree with questionnaires and friend ratings, they can be combined with friend ratings to improve accuracy, they have expected correlations to relevant outcomes, and they are stable over six-month intervals. This method can complement traditional assessments, and can quickly and cheaply assess many people with minimal burden.

Sap, M., Park, G., Eichstaedt, J. C., Kern, M. L., Stillwell, D. J., Kosinski, M., Ungar, L. H., & Schwartz, H. A. (2014). Developing Age and Gender Predictive Lexica over Social Media. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1146–1151. bibtex

@inproceedings{developing2014emnlp, author={Sap, Maarten and Park, Greg and Eichstaedt, Johannes C and Kern, Margaret L and Stillwell, David J and Kosinski, Michal and Ungar, Lyle H and Schwartz, H Andrew}, title={{Developing age and gender predictive lexica over social media}}, booktitle={Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing}, series={EMNLP}, year={2014}, }


Abstract...
Demographic lexica have potential for widespread use in social science, economic, and business applications. We derive predictive lexica (words and weights) for age and gender using regression and classification models from word usage in Facebook, blog, and Twitter data with associated demographic labels. The lexica, made publicly available, achieved state-of-the-art accuracy in language based age and gender prediction over Facebook and Twitter, and were evaluated for generalization across social media genres as well as in limited message situations.

Schwartz, H. A., Eichstaedt, J., Kern, M. L., Park, G., Sap, M., Stillwell, D., Kosinski, M., & Ungar, L. (2014). Towards Assessing Changes in Degree of Depression through Facebook. Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, Association for Computational Linguistics, 118-125. bibtex

@inproceedings{schwartz2014towards, author={Schwartz, H Andrew and Eichstaedt, Johannes and Kern, Margaret L and Park, Gregory and Sap, Maarten and Stillwell, David and Kosinski, Michal and Ungar, Lyle}, title={{Towards assessing changes in degree of depression through Facebook}}, booktitle={Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality}, year={2014}, pages={118-125}, series={ACL} }


Abstract...
Depression is typically diagnosed as being present or absent. However, depression severity is believed to be continuously distributed rather than dichotomous. Severity may vary for a given patient daily and seasonally as a function of many variables ranging from life events to environmental factors. Repeated population-scale assessment of depression through questionnaires is expensive. In this paper we use survey responses and status updates from 28,749 Facebook users to develop a regression model that predicts users’ degree of depression based on their Facebook status updates.

Kern, M. L., Eichstaedt, J. C., Schwartz, H. A., Dziurzynski, L., Ungar, L. H., Stillwell, D. J., Kosinski, M., Ramones, S. M., & Seligman, M. E. (2014). The Online Social Self: An Open Vocabulary Approach to Personality. Assessment, 21(2), 158-169. bibtex

@article{kern2014socialself, author={Kern, Margaret L and Eichstaedt, Johannes C and Schwartz, H Andrew and Dziurzynski, Lukasz and Ungar, Lyle H and Stillwell, David J and Kosinski, Michal and Ramones, Stephanie M and Seligman, Martin EP}, title={{The online social self: An open vocabulary approach to personality}}, journal={Assessment}, year={2014}, pages={158-169}, volume={21}, issue={2} }


Abstract...
Objective: We present a new open language analysis approach that identifies and visually summarizes the dominant naturally occurring words and phrases that most distinguished each Big Five personality trait. Method: Using millions of posts from 69,792 Facebook users, we examined the correlation of personality traits with online word usage. Our analysis method consists of feature extraction, correlational analysis, and visualization. Results: The distinguishing words and phrases were face valid and provide insight into processes that underlie the Big Five traits. Conclusion: Open-ended data driven exploration of large datasets combined with established psychological theory and measures offers new tools to further understand the human psyche.

Kern, M. L., Eichstaedt, J. C., Schwartz, H. A., Park, G., Ungar, L. H., Stillwell, D. J., Kosinski, M., Dziurzynski, L., & Seligman, M. E. (2014). From "sooo excited!!!" to "so proud": Using language to study development. Developmental Psychology, 50(1), 178-188. bibtex

@article{kern2014from, author={Kern, Margaret L and Eichstaedt, Johannes C and Schwartz, H Andrew and Park, Greg and Ungar, Lyle H and Stillwell, David J and Kosinski, Michal and Dziurzynski, Lukasz and Seligman, Martin EP}, title={{From "sooo excited!!!" to "so proud": Using language to study development}}, journal={Developmental Psychology}, year={2014}, volume={50}, issue={1}, pages={178-188} }


Abstract...
We introduce a new method, differential language analysis (DLA), for studying human development that uses computational linguistics to analyze the big data available through online social media in light of psychological theory. Our open vocabulary DLA approach finds words, phrases, and topics that distinguish groups of people based on one or more characteristics. Using a dataset of over 70,000 Facebook users, we identify how word and topic use vary as a function of age, and compile cohort specific words and phrases into visual summaries that are face valid and intuitively meaningful. We demonstrate how this methodology can be used to test developmental hypotheses, using the aging positivity effect (Carstensen & Mikels, 2005) as an example. While this study focuses primarily on common trends across age-related cohorts, the same methodology can be used to explore heterogeneity within developmental stages or to explore other characteristics that differentiate groups of people. Our comprehensive list of words and topics is available on our website for deeper exploration by the research community.

Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M., Agrawal, M., Shah, A., Kosinski, M., Stillwell, D., Seligman, M. E., & Ungar, L. H. (2013). Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach. PLOS ONE, 8(9), e73791. bibtex

@article{schwartz2013personality, author={Schwartz, H Andrew and Eichstaedt, Johannes C and Kern, Margaret L and Dziurzynski, Lukasz and Ramones, Stephanie M and Agrawal, Megha and Shah, Achal and Kosinski, Michal and Stillwell, David and Seligman, Martin EP and Ungar, Lyle H}, title={{Personality, gender, and age in the language of social media: The Open-Vocabulary approach}}, journal={PLoS ONE}, year={2013}, volume={8}, issue={9}, pages={e73791} }


Abstract...
We analyzed 700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 volunteers, who also took standard personality tests, and found striking variations in language with personality, gender, and age. In our open-vocabulary technique, the data itself drives a comprehensive exploration of language that distinguishes people, finding connections that are not captured with traditional closed-vocabulary word-category analyses. Our analyses shed new light on psychosocial processes yielding results that are face valid (e.g., subjects living in high elevations talk about the mountains), tie in with other research (e.g., neurotic people disproportionately use the phrase “sick of” and the word “depressed”), suggest new hypotheses (e.g., an active life implies emotional stability), and give detailed insights (males use the possessive “my” when mentioning their “wife” or “girlfriend” more often than females use “my” with “husband” or “boyfriend”). To date, this represents the largest study, by an order of magnitude, of language and personality.

Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Lucas, R. E., Agrawal, M., Park, G. J., Lakshmikanth, S. K., Jha, S., Seligman, M. E. P., & Ungar, L. H. (2013). Characterizing Geographic Variation in Well-Being using Tweets. Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media (ICWSM), Boston, MA. bibtex

@inproceedings{schwartz2013characterizing, author={Schwartz, H Andrew and Eichstaedt, Johannes C and Kern, Margaret L and Dziurzynski, Lukasz and Lucas, Richard E and Agrawal, Megha and Park, Gregory J and Lakshmikanth, Shrinidhi K and Jha, Sneha and Seligman, Martin E P and Ungar, Lyle H}, title={{Characterizing geographic variation in well-being using tweets}}, booktitle={Proceedings of the 7th International AAAI Conference on Weblogs and Social Media}, year={2013}, series={ICWSM} }


Abstract...
The language used in tweets from 1,300 different US counties was found to be predictive of the subjective well-being of people living in those counties as measured by representative surveys. Topics, sets of co-occurring words derived from the tweets using LDA, improved accuracy in predicting life satisfaction over and above standard demographic and socio-economic controls (age, gender, ethnicity, income, and education). The LDA topics provide a greater behavioural and conceptual resolution into life satisfaction than the broad socio-economic and demographic variables. For example, tied in with the psychological literature, words relating to outdoor activities, spiritual meaning, exercise, and good jobs correlate with increased life satisfaction, while words signifying disengagement like ‘bored’ and ‘tired’ show a negative association.

Schwartz, H. A., Eichstaedt, J. C., Dziurzynski, L., Kern, M. L., Blanco, E., Ramones, S., Seligman, M. E. P., & Ungar, L. H. (2013). Choosing the Right Words: Characterizing and Reducing Error of the Word Count Approach. Proceedings of *SEM-2013: Second Joint Conference on Lexical and Computational Semantics, Atlanta, Georgia, USA. 296-305. bibtex

@inproceedings{schwartz2013choosing, author={Schwartz, H Andrew and Eichstaedt, Johannes C and Dziurzynski, Lukasz and Kern, Margaret L and Blanco, Eduardo and Ramones, Stephanie and Seligman, Martin E P and Ungar, Lyle H}, title={{Choosing the right words: Characterizing and reducing error of the word count approach}}, booktitle={Proceedings of the 2nd Joint Conference on Lexical and Computational Semantics}, year={2013}, pages={296-305}, series={*SEM} }


Abstract...
Social scientists are increasingly using the vast amount of text available on social media to measure variation in happiness and other psychological states. Such studies count words deemed to be indicators of happiness and track how the word frequencies change across locations or time. This word count approach is simple and scalable, yet often picks up false signals, as words can appear in different contexts and take on different meanings. We characterize the types of errors that occur using the word count approach, and find lexical ambiguity to be the most prevalent. We then show that one can reduce error with a simple refinement to such lexica by automatically eliminating highly ambiguous words. The resulting refined lexica improve precision as measured by human judgments of word occurrences in Facebook posts.

Schwartz, H. A., Eichstaedt, J. C., Dziurzynski, L., Kern, M. L., Blanco, E., Kosinski, M., Stillwell, D., Seligman, M. E. P., & Ungar, L. H. (2013). Toward Personality Insights from Language Exploration in Social Media. Proceedings of the AAAI Spring Symposium Series: Analyzing Microtext, Stanford, California, USA. bibtex

@inproceedings{schwartz2013toward, author={Schwartz, H Andrew and Eichstaedt, Johannes C and Dziurzynski, Lukasz and Kern, Margaret L and Blanco, Eduardo and Kosinski, Michal and Stillwell, David and Seligman, Martin E P and Ungar, Lyle H}, title={{Toward personality insights from language exploration in social media}}, booktitle={Proceedings of the AAAI Spring Symposium Series: Analyzing Microtext}, year={2013}, }


Abstract...
Language in social media reveals a lot about people’s personality and mood as they discuss the activities and relationships that constitute their everyday lives. Although social media are widely studied, researchers in computational linguistics have mostly focused on prediction tasks such as sentiment analysis and authorship attribution. In this paper, we show how social media can also be used to gain psychological insights. We demonstrate an exploration of language use as a function of age, gender, and personality from a dataset of Facebook posts from 75,000 people who have also taken personality tests, and we suggest how more sophisticated tools could be brought to bear on such data.