Do the presidential candidates have a plan or highlight problems?

Automatically highlighting the central words of the candidates’ debate rhetoric.

As the primary election season moves on to other important contests in the U.S., we continue our data-driven analysis of the presidential candidates. Previously, we looked at the most distinctive words used by voters of each candidate and at the distinctive words of each candidate in debate speeches. This time, we were interested in the core concepts of each candidate’s rhetoric. To uncover these, we used a different algorithm that highlights the most ‘central’ words and phrases of each candidate: those which appear over and over and bridge the candidate’s distinct themes (see Technical Section for more details).

Democrats

First, here are the results for the two remaining Democratic candidates. The larger a word or phrase appears, the more central it is:


Hillary Clinton's most central debate words. Size is proportional to the centrality.


Hillary Clinton frequently refers to the accomplishments of Barack Obama, hence ‘president’ is her most central phrase and ‘Obama’ is third. Other central words relate to domestic policy, namely health care, family and gender issues (‘child’, ‘health’, ‘job’, ‘family’, ‘care’, ‘women’). Further down the list appear words related to finance (‘money’, ‘Wall Street’, ‘economy’), foreign policy (‘ISIS’, ‘world’, ‘Syria’) and education (‘college’, ‘education’), in this order. ‘Lot’, ‘way’ and ‘kind’ represent linguistic idiosyncrasies of the candidate (‘lot of children’, ‘lot of suffering’, ‘lot of time’, ‘a lot of what happened’; ‘better way’, “that’s the way”, ‘the way that’).


Bernie Sanders' most central debate words. Size is proportional to the centrality.

Bernie Sanders puts ‘America’ and ‘United States’ at the center of his discourse, usually as a way of framing the bigger picture (‘today in America’, ‘ordinary Americans’, ‘change in the United States of America’, ‘issues facing America’). Perhaps unsurprisingly, the third and fourth most central words refer to his two major campaign themes: wealth (‘Wall Street’, ‘millions’, ‘billionaires’, ‘dollars’, ‘money’, ‘wage’) and social issues (‘health’, ‘care’, ‘child’, ‘family’, ‘tax’). Other central words refer to the campaign system. His linguistic idiosyncrasies include the expressions ‘today’, ‘year’ and ‘issue’.

Both candidates’ top words include references to their counterpart, whom they mention in a wide range of contexts (‘secretary’, ‘Clinton’, ‘Senator Sanders’).

Republicans

Here are the results on the Republican side:


Ted Cruz's most central debate words. Size is proportional to the centrality.

Cruz’s most central word is ‘tax’, which is indeed an important theme of his campaign (‘flat tax’, ‘my tax plan’, ‘tax reform’, ‘Obamacare taxes’), followed by ‘president’, which in this case refers to things he will accomplish if he is elected president (‘as president’, ‘if I’m president’). Additional themes include legal issues (‘Supreme Court’, ‘law’, ‘right’, ‘court’, ‘second amendment’, ‘justice’), followed by foreign affairs (‘ISIS’, ‘islamic’, ‘terrorist’, ‘Iran’). He also mentions his Democratic opposition in a variety of contexts (‘Hillary Clinton’, ‘Obama’).


John Kasich's most central debate words. Size is proportional to the centrality.

Kasich is focused on economic issues with ‘job’ and ‘tax’ ranking as his two most central words. Very often he connects these to his home state of Ohio. Further references to economic issues follow (‘budget’, ‘plan’, ‘growth’, ‘business’, ‘program’, ‘wage’) and no other theme seems to stand out. His linguistic idiosyncrasies include ‘time’ (‘spend time’, ‘a very long time’, ‘all the time’), ‘everybody’ (‘everybody in America’, ‘suggest to everybody’) and ‘look’ (‘and look’, ‘Look, I …’, ‘we look at’).


Donald Trump's most central debate words. Size is proportional to the centrality.

Finally, Trump’s most central words are more general and abstract, with ‘way’, ‘lot’ and ‘thing’ as the top three. ‘China’ is the fourth most central word (and first in terms of frequency), which he usually ties into arguments about the economy. The economy seems to be his most central theme (‘company’, ‘deal’, ‘job’, ‘money’, ‘trade’, ‘business’), followed by immigration (‘border’, ‘Mexico’). His linguistic choices include the word ‘problem’, in contrast to his Democratic counterparts, who prefer ‘issue’. He also uses more emotive words than other candidates, for example ‘disaster’, ‘respect’, ‘reason’ or ‘friend’.

As with the Democrats, the top words for the Republicans include direct addresses to other candidates, albeit more informal ones (‘Donald’, ‘Ted’).

Conclusions

Our analysis aimed to highlight the most important concepts that the candidates refer to over and over in their debate speeches. The above word clouds show the topics that each candidate repeatedly went back to in their speeches, across different contexts. In other words, these are the terms that characterize the preoccupations, idiosyncratic interests, and verbal habits of each candidate: Kasich will bring up Ohio across a multitude of situations; Sanders will liberally connect Wall Street to other topics.

Technical Section

The algorithm we used is called TextRank and is the textual counterpart of PageRank, Google’s original website ranking algorithm. Intuitively, it builds a graph of words which are linked by the number of times they appear in the same context (here, the same sentence). Then, it finds the words that are most central in this graph, i.e. those that appear in context with many other words from separate parts of the graph. This method was originally used in text summarization.
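To make the idea concrete, here is a minimal sketch of TextRank-style scoring in plain Python. It is not the implementation used for this post: the function names, the damping factor of 0.85, the fixed number of iterations and the toy sentences are all assumptions made for illustration. It links words that share a sentence and then runs a weighted PageRank over that graph.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Link two words with a weight equal to the number of sentences they share."""
    graph = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        # Each unordered pair of distinct words in a sentence adds one to the edge weight.
        for a, b in combinations(sorted(set(sentence)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over an undirected word graph."""
    nodes = list(graph)
    if not nodes:
        return {}
    scores = {node: 1.0 / len(nodes) for node in nodes}
    # Total edge weight of each node, used to split its score among its neighbours.
    strength = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        new_scores = {}
        for node in nodes:
            incoming = sum(
                scores[nbr] * weight / strength[nbr]
                for nbr, weight in graph[node].items()
            )
            new_scores[node] = (1.0 - damping) / len(nodes) + damping * incoming
        scores = new_scores
    return scores


# Toy input (made up): each sentence is already reduced to a list of words.
toy_sentences = [
    ["tax", "plan"],
    ["tax", "job"],
    ["tax", "court"],
    ["job", "plan"],
]
scores = textrank(build_cooccurrence_graph(toy_sentences))
for word in sorted(scores, key=scores.get, reverse=True):
    print(word, round(scores[word], 3))
```

In this toy example ‘tax’ co-occurs with every other word, so it receives the highest score. On real debate transcripts the input sentences would first be reduced to the words of interest.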

To further refine this, we performed part-of-speech tagging on all the debates and took into account only nouns, as these are known to be most distinctive for summarization purposes. We also ran a chunker to identify names like ‘Wall Street’ or ‘New York’ and identified collocations such as ‘ballistic missile’ or ‘coal miner’. We removed the words ‘people’ and ‘country’, as these were among the top words for all candidates and provided little information. We then lemmatized words in order to merge words with the same lemma, such as ‘republican’ – ‘republicans’.
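The filtering steps above can be sketched roughly as follows. This is an illustration only: it assumes the text has already been tagged with Penn Treebank part-of-speech tags by some tagger, and it stands in for the chunker, the collocation finder and the lemmatizer with small hand-written tables (`PHRASES` and `LEMMAS` are hypothetical, as is the function name). It also lemmatizes before dropping the excluded words, a simplification of the order described above.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # Penn Treebank noun tags
EXCLUDED = {"people", "country"}  # words removed for all candidates
PHRASES = {("wall", "street"), ("new", "york")}  # hypothetical chunker output
LEMMAS = {"republicans": "republican", "taxes": "tax"}  # hypothetical lemma table


def preprocess(tagged_sentence):
    """Turn [(token, tag), ...] into a list of noun lemmas and merged phrases."""
    kept = []
    i = 0
    while i < len(tagged_sentence):
        word = tagged_sentence[i][0].lower()
        tag = tagged_sentence[i][1]
        has_next = i + 1 < len(tagged_sentence)
        nxt = tagged_sentence[i + 1][0].lower() if has_next else None
        # Merge known two-word names or collocations into a single unit.
        if (word, nxt) in PHRASES:
            kept.append(f"{word} {nxt}")
            i += 2
            continue
        # Keep only nouns, mapped to their lemma, unless they are excluded.
        if tag in NOUN_TAGS:
            lemma = LEMMAS.get(word, word)
            if lemma not in EXCLUDED:
                kept.append(lemma)
        i += 1
    return kept


# Made-up tagged sentence for illustration.
example = [
    ("People", "NNS"), ("on", "IN"), ("Wall", "NNP"),
    ("Street", "NNP"), ("avoid", "VBP"), ("taxes", "NNS"),
]
print(preprocess(example))  # ['wall street', 'tax']
```

Each sentence reduced this way becomes one context for building the word graph.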

The implementation of the algorithm is freely available on GitHub.


About Daniel Preotiuc

Daniel is a Postdoctoral researcher at the University of Pennsylvania. His research is situated at the intersection of Natural Language Processing, Machine Learning and Social Science. His current interests include spatial and temporal learning models for text, user attribute prediction from text and Gaussian Processes, using large user-generated data coming from Social Media. Prior to joining UPenn, Daniel completed his PhD in Natural Language Processing and Machine Learning at the University of Sheffield, UK and was a researcher on the Trendminer EU FP7 project.
