To calculate lexicon usage, sum over all words the word's weight in that lexicon multiplied by the word's relative frequency in the document, then add the intercept value to correct for the model bias (found under '_intercept' in the lexicon CSVs).
A weighted lexicon is often applied as the sum of all weighted word relative frequencies over a document:
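In symbols, the sentence above can be written as follows. The notation is an assumption chosen for illustration: $w_{lex}(word)$ is the weight of a word in the lexicon, $freq(word, doc)$ is the number of times the word occurs in the document, and $freq(*, doc)$ is the total number of words in the document.

$$
usage_{lex}(doc) = \sum_{word \in lex} w_{lex}(word) \cdot \frac{freq(word, doc)}{freq(*, doc)}
$$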
For example, let's say a lexicon has the following weights for words a, b, and c:
and two documents with the following frequencies of words:
Therefore, the total word counts in the documents are:
The documents' lexicon usages are given by summing the weighted relative frequencies:
Once the usages have been computed, the intercept of the lexicon needs to be added to the usages:
If the lexicon represents age, the resulting values are the predicted ages for the two documents.
If it represents gender, take the sign of the result: if it is positive, the document is classified as female; otherwise it is classified as male.
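The steps of the walk-through can be sketched in Python as below. This is a minimal illustration, not the project's own code: the function names `lexicon_usage` and `gender_label`, and the weights, intercept, and document, are hypothetical values invented for the sketch and are not taken from any real lexicon. It assumes the document is already tokenized and that relative frequency is computed against the total number of tokens in the document, including words that are not in the lexicon.

```python
from collections import Counter


def lexicon_usage(tokens, weights, intercept=0.0):
    """Sum of weight * relative frequency over lexicon words, plus the intercept."""
    total = len(tokens)
    if total == 0:
        # An empty document has no word usage; only the intercept remains.
        return intercept
    counts = Counter(tokens)
    usage = sum(
        weight * counts[word] / total
        for word, weight in weights.items()
        if word in counts
    )
    return usage + intercept


def gender_label(score):
    """Apply the sign rule: positive scores map to female, everything else to male."""
    return "female" if score > 0 else "male"


# Hypothetical lexicon weights and intercept (illustrative values only).
weights = {"a": 0.5, "b": -1.0, "c": 2.0}
intercept = 20.0

# Hypothetical tokenized document: 8 tokens, "d" is not in the lexicon.
doc = ["a", "a", "b", "c", "d", "d", "d", "d"]

score = lexicon_usage(doc, weights, intercept)
print(score)  # 0.5*2/8 + (-1.0)*1/8 + 2.0*1/8 + 20.0 = 20.25
```

For an age lexicon, `score` would be read directly as the predicted age; for a gender lexicon, the intercept-corrected score would be passed to `gender_label`.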