Using Python 3.6 and the latest version of gensim, I've trained an LDA model over a corpus of ~3,000 tweets. I can print the topics and their word probabilities, but I'm stuck on generating summary statistics for the model. (Yes, I know tweets are not ideal for LDA, but they're what I'm currently working with.)

Specifically, I want to plot the percentage of docs (i.e. tweets) in which each topic appears, and to calculate each topic's average predicted probability across the docs it appears in, over the entire corpus. I've searched but can't find a function that gives me this directly. Does one exist?

If not, I assume I would write a for loop that passes each doc into the model to predict its topics and stores the results. Once I have all of the results, I would iterate over them to get the summary data I'm looking for. Any other suggestions on how to do this?
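For reference, the loop-and-aggregate approach described above could be sketched roughly as follows. This is only a sketch under assumptions: `summarize_topics` is a hypothetical helper name, and the input is assumed to be one list of `(topic_id, probability)` pairs per doc, which is the shape gensim's `LdaModel.get_document_topics` returns for a bag-of-words doc. The names `lda` and `corpus` in the comment are placeholders for the trained model and the bag-of-words corpus.

```python
from collections import defaultdict


def summarize_topics(doc_topics, num_topics):
    """Summarize per-doc topic predictions (hypothetical helper).

    doc_topics: iterable with one list of (topic_id, probability)
                pairs per doc, e.g. built (assumption) with
                [lda.get_document_topics(bow) for bow in corpus]
    num_topics: total number of topics in the model
    """
    doc_counts = defaultdict(int)    # docs in which each topic appears
    prob_sums = defaultdict(float)   # summed probability per topic
    n_docs = 0

    for topics in doc_topics:
        n_docs += 1
        for topic_id, prob in topics:
            doc_counts[topic_id] += 1
            prob_sums[topic_id] += prob

    summary = {}
    for topic_id in range(num_topics):
        count = doc_counts[topic_id]
        summary[topic_id] = {
            # share of all docs in which the topic was predicted
            "pct_docs": 100.0 * count / n_docs if n_docs else 0.0,
            # average probability over only the docs where it appears
            "mean_prob": prob_sums[topic_id] / count if count else 0.0,
        }
    return summary
```

Note that what counts as a topic "appearing" in a doc depends on the probability cutoff: `get_document_topics` accepts a `minimum_probability` argument, and topics below that threshold are omitted from the returned list, so both statistics shift with the chosen cutoff. The `pct_docs` values can then be passed to any plotting library as a bar chart.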
