Medical conditions are predictable from social media posts
• Top 10 words in each topic, ordered by descending conditional probability
• Conditional probabilities [.csv] (sparse matrix format)

The goal is to get the probability of a topic given the document. For example, let's say we have topics with the following conditional probabilities for words:

topic 1: a: 0.01, b: 0.02, c: 0.001
topic 2: c: 0.02, d: 0.005

and two documents with the following frequencies of words:

document 1: a: 2, b: 10, c: 3, d: 0, e: 6, f: 4
document 2: a: 5, b: 3, c: 8, d: 4, e: 0, f: 10

therefore the total word counts of the documents are:

document 1: 2 + 10 + 3 + 0 + 6 + 4 = 25
document 2: 5 + 3 + 8 + 4 + 0 + 10 = 30

document 1's use of topics is given by summing the weighted relative frequencies:

p(topic1|document1): (2/25)*0.01 + (10/25)*0.02 + (3/25)*0.001 = 0.00892
p(topic2|document1): (3/25)*0.02 + (0/25)*0.005 = 0.0024

while document 2's use of topics:

p(topic1|document2): (5/30)*0.01 + (3/30)*0.02 + (8/30)*0.001 ≈ 0.00393
p(topic2|document2): (8/30)*0.02 + (4/30)*0.005 = 0.006
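
The calculation above can be sketched in a few lines of Python. This is only a minimal illustration of the arithmetic, not the code used to produce the released files: the names topic_word_probs, doc_word_counts and topic_usage are hypothetical, and the numbers are the toy values from the example. Words that have no entry for a topic (such as e and f) contribute nothing to that topic's sum, but still count toward the document's total.

```python
# Toy conditional probabilities per topic (values copied from the example above).
# The variable and function names here are illustrative assumptions.
topic_word_probs = {
    "topic1": {"a": 0.01, "b": 0.02, "c": 0.001},
    "topic2": {"c": 0.02, "d": 0.005},
}

# Toy word frequencies per document (values copied from the example above).
doc_word_counts = {
    "document1": {"a": 2, "b": 10, "c": 3, "d": 0, "e": 6, "f": 4},
    "document2": {"a": 5, "b": 3, "c": 8, "d": 4, "e": 0, "f": 10},
}


def topic_usage(word_counts, topic_word_probs):
    """Sum each topic's word probabilities weighted by relative word frequency."""
    total = sum(word_counts.values())  # total word count of the document
    if total == 0:
        # An empty document uses no topic at all.
        return {topic: 0.0 for topic in topic_word_probs}
    return {
        topic: sum(
            (word_counts.get(word, 0) / total) * prob
            for word, prob in probs.items()
        )
        for topic, probs in topic_word_probs.items()
    }


for doc, counts in doc_word_counts.items():
    for topic, score in topic_usage(counts, topic_word_probs).items():
        print(f"p({topic}|{doc}) = {score:.5f}")
```

Run on the toy data, this reproduces the four values worked out by hand above, rounded to five decimal places.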