... advancing understanding of human flourishing using language analysis
Medical conditions are predictable from social media posts
Condition X Topic Loadings
200 topics and their loadings across 37 medical conditions.
Top Words in Each Topic
Top 10 words in each topic, ordered by descending conditional probability
Readme
Citation and relevant information for data files.
Facebook Topics
Top 20 words per topic:
[.csv]
[Excel file]
All words:
[.csv]
You might also use...
Conditional probabilities
[.csv]
(sparse matrix format)
Walk-through of an example
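The goal is to get the probability of a topic given the document. This is the sum, over the words in the topic, of each word's conditional probability multiplied by that word's relative frequency in the document.
For example, let's say we have topics with the following conditional probabilities for words: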
topic 1:
a: 0.01, b: 0.02, c: 0.001
topic 2:
c: 0.02, d: 0.005
and two documents with the following frequencies of words:
document 1:
a: 2, b: 10, c: 3, d: 0, e: 6, f: 4
document 2:
a: 5, b: 3, c: 8, d: 4, e: 0, f: 10
therefore the total word use in each document is:
document 1:
2 + 10 + 3 + 0 + 6 + 4 = 25
document 2:
5 + 3 + 8 + 4 + 0 + 10 = 30
document 1's use of topics is given by summing the weighted relative frequencies:
p(topic1|document1):
(2/25)*0.01 + (10/25)*0.02 + (3/25)*0.001 = 0.00892
p(topic2|document1):
(3/25)*0.02 + (0/25)*0.005 = 0.0024
while document 2's use of topics:
p(topic1|document2):
(5/30)*0.01 + (3/30)*0.02 + (8/30)*0.001 = 0.00393
p(topic2|document2):
(8/30)*0.02 + (4/30)*0.005 = 0.006
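The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the worked example, not the lab's own code: the dictionary layout and the helper name topic_usage are assumptions made here, and the numbers are copied from the toy topics and documents above rather than from the downloadable files.

```python
# Conditional probabilities per topic, copied from the toy example above.
topic_word_probs = {
    "topic1": {"a": 0.01, "b": 0.02, "c": 0.001},
    "topic2": {"c": 0.02, "d": 0.005},
}

# Word frequencies per document, copied from the toy example above.
doc_word_counts = {
    "document1": {"a": 2, "b": 10, "c": 3, "d": 0, "e": 6, "f": 4},
    "document2": {"a": 5, "b": 3, "c": 8, "d": 4, "e": 0, "f": 10},
}


def topic_usage(word_counts, topic_probs):
    """Hypothetical helper: sum the weighted relative frequencies per topic."""
    total = sum(word_counts.values())  # total word use in the document
    if total == 0:
        # An empty document uses no topic at all.
        return {topic: 0.0 for topic in topic_probs}
    return {
        topic: sum(
            (word_counts.get(word, 0) / total) * prob  # relative frequency * weight
            for word, prob in probs.items()
        )
        for topic, probs in topic_probs.items()
    }


usage = {
    doc: topic_usage(counts, topic_word_probs)
    for doc, counts in doc_word_counts.items()
}

for doc, scores in usage.items():
    for topic, score in scores.items():
        print(f"p({topic}|{doc}) = {score:.5f}")
```

Words that appear in a document but in no topic (e and f here) still count toward the document's total word use, which is why the denominators are 25 and 30 rather than the sum of topic words only.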
Link to Publication
APA Citation
BibTeX Citation