All Questions

14 votes, 3 answers, 911 views

Interpreting the sum of TF-IDF scores of words across documents

First let's extract the TF-IDF scores per term per document: from gensim import corpora, models, similarities documents = ["Human machine interface for lab abc computer applications", "...
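A plain-Python sketch of the quantity being asked about. It assumes a common TF-IDF variant: raw term counts, log2 inverse document frequency, and L2-normalised document vectors. I believe this is close to gensim's `TfidfModel` defaults, but it is not gensim's code. The toy documents and the function names `tfidf_per_document` and `summed_scores` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def tfidf_per_document(docs: List[str]) -> List[Dict[str, float]]:
    """Return one {term: weight} dict per document (assumed variant, see above)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    result = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Raw count times log2 inverse document frequency.
        weights = {t: c * math.log2(n_docs / df[t]) for t, c in counts.items()}
        # L2-normalise so each document vector has unit length.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            weights = {t: w / norm for t, w in weights.items()}
        result.append(weights)
    return result


def summed_scores(per_doc: List[Dict[str, float]]) -> Dict[str, float]:
    """Sum each term's per-document TF-IDF weight across all documents."""
    totals: Dict[str, float] = defaultdict(float)
    for weights in per_doc:
        for term, weight in weights.items():
            totals[term] += weight
    return dict(totals)


docs = ["human computer interface", "computer system survey", "human system"]
per_doc = tfidf_per_document(docs)
totals = summed_scores(per_doc)
for term, score in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

Because each document vector is normalised separately, the summed score mixes two effects: how many documents contain a term and how distinctive the term is inside each of them. It is therefore better read as a rough ranking heuristic than as a quantity with a fixed statistical meaning.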
9 votes, 2 answers, 5926 views

LDA model generates different topics every time I train on the same corpus

I am using Python gensim to train a Latent Dirichlet Allocation (LDA) model from a small corpus of 231 sentences. However, each time I repeat the process, it generates different topics. Why does th...
21 votes, 5 answers, 8721 views

Update gensim word2vec model

I have a word2vec model in gensim trained over 98892 documents. For any given sentence that is not present in the sentences array (i.e. the set over which I trained the model), I need to update the mo...
58 votes, 9 answers, 39298 views

How to calculate the sentence similarity using word2vec model of gensim with python

According to the Gensim Word2Vec, I can use the word2vec model in gensim package to calculate the similarity between 2 words. e.g. trained_model.similarity('woman', 'man') 0.73723527 However, the...
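One common approach is to average the vectors of the words in each sentence and take the cosine similarity of the two averages. I believe gensim's `KeyedVectors.n_similarity(ws1, ws2)` computes essentially this. The sketch below does it in plain Python so the arithmetic is visible. The two-dimensional `toy_vectors` table is made up for illustration; with a trained model you would look words up in the model instead. Skipping out-of-vocabulary words is an assumption.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def mean_vector(tokens: Sequence[str], vectors: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of the known tokens; None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def sentence_similarity(s1: str, s2: str, vectors: Dict[str, Vector]) -> float:
    """Cosine similarity of the averaged word vectors of two sentences."""
    v1 = mean_vector(s1.lower().split(), vectors)
    v2 = mean_vector(s2.lower().split(), vectors)
    if v1 is None or v2 is None:
        return 0.0  # no overlap with the vocabulary: treat as unrelated
    return cosine(v1, v2)


# Hypothetical 2-d vectors, for illustration only.
toy_vectors = {
    "woman": [1.0, 0.0],
    "man": [0.9, 0.1],
    "queen": [0.1, 0.9],
    "king": [0.0, 1.0],
}

print(sentence_similarity("a man", "a woman", toy_vectors))
print(sentence_similarity("a man", "a queen", toy_vectors))
```

Averaging ignores word order, so "dog bites man" and "man bites dog" get identical vectors; that is the main trade-off of this simple approach.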
20 votes, 3 answers, 34244 views

How to create a word cloud from a corpus in Python?

From "Creating a subset of words from a corpus in R", the answerer can easily convert a term-document matrix into a word cloud. Is there a similar function from Python libraries that takes eithe...
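Most word-cloud renderers only need a mapping from word to weight, so the corpus side can be handled with the standard library. The sketch below builds that mapping with `collections.Counter`; the tiny stop-word set and the function name `term_frequencies` are assumptions. If the third-party `wordcloud` package is installed, I believe `WordCloud().generate_from_frequencies(freqs)` accepts such a dict, but check that package's documentation before relying on it.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional, Set

# A deliberately tiny, hypothetical stop-word list; use a real one in practice.
STOPWORDS: Set[str] = {"a", "an", "and", "for", "of", "the", "to"}


def term_frequencies(
    corpus: Iterable[str],
    stopwords: Set[str] = STOPWORDS,
    top_n: Optional[int] = None,
) -> Dict[str, int]:
    """Count lower-cased word tokens across all documents, most frequent first."""
    counts: Counter = Counter()
    for doc in corpus:
        tokens = re.findall(r"[a-z']+", doc.lower())
        counts.update(t for t in tokens if t not in stopwords)
    # most_common(None) returns every term, sorted by descending count.
    return dict(counts.most_common(top_n))


corpus = [
    "Human machine interface for lab abc computer applications",
    "A survey of user opinion of computer system response time",
]
freqs = term_frequencies(corpus)
print(freqs)
# With the optional wordcloud package (not required by this sketch):
# from wordcloud import WordCloud
# WordCloud().generate_from_frequencies(freqs).to_file("cloud.png")
```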
18 votes, 2 answers, 9098 views

Topic distribution: How do we see which document belongs to which topic after doing LDA in Python

I was able to run the LDA code from gensim and got the top 10 topics with their respective keywords. Now I would like to go a step further to see how accurate the LDA algo is by seeing which document ...
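A common way to answer "which document belongs to which topic" is to take, for each document, the topic with the highest probability in its inferred topic distribution. In gensim that distribution comes back as a list of `(topic_id, probability)` pairs (for example from `LdaModel.get_document_topics(bow)`). The sketch below starts from lists of that shape, which are made up here; the `min_prob` cut-off and the `-1` label for "no clear topic" are hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

TopicDist = Sequence[Tuple[int, float]]


def dominant_topics(
    doc_topics: Sequence[TopicDist], min_prob: float = 0.0
) -> List[Tuple[int, float]]:
    """Return (topic_id, probability) of the most likely topic per document.

    Documents with no topic at or above min_prob are labelled -1.
    """
    result = []
    for dist in doc_topics:
        if not dist:
            result.append((-1, 0.0))
            continue
        topic, prob = max(dist, key=lambda tp: tp[1])
        result.append((topic, prob) if prob >= min_prob else (-1, prob))
    return result


def group_by_topic(assignments: Sequence[Tuple[int, float]]) -> Dict[int, List[int]]:
    """Map each topic id to the indices of the documents assigned to it."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for doc_index, (topic, _) in enumerate(assignments):
        groups[topic].append(doc_index)
    return dict(groups)


# Hypothetical output for three documents and two topics.
doc_topics = [
    [(0, 0.8), (1, 0.2)],
    [(0, 0.3), (1, 0.7)],
    [(1, 0.55), (0, 0.45)],
]
assignments = dominant_topics(doc_topics)
print(assignments)
print(group_by_topic(assignments))
```

A hard assignment hides how mixed a document is: the third document above is split almost evenly, which the `min_prob` threshold can flag.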
13 votes, 2 answers, 6702 views

Document topical distribution in Gensim LDA

I've derived an LDA topic model using a toy corpus as follows: documents = ['Human machine interface for lab abc computer applications', 'A survey of user opinion of computer system respo...
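When inspecting a document's topical distribution it can help to turn the sparse `(topic_id, probability)` list into a fixed-length vector. My understanding is that gensim leaves out topics whose probability falls below the `minimum_probability` threshold of `get_document_topics`, so missing topics are filled with 0.0 here. The helper `to_dense` and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_dense(dist: Sequence[Tuple[int, float]], num_topics: int) -> List[float]:
    """Expand sparse (topic_id, probability) pairs into a list indexed by topic id."""
    dense = [0.0] * num_topics  # topics absent from the sparse list stay at 0.0
    for topic_id, prob in dist:
        if not 0 <= topic_id < num_topics:
            raise ValueError(f"topic id {topic_id} out of range for {num_topics} topics")
        dense[topic_id] = prob
    return dense


# Hypothetical sparse output for one document of a 4-topic model.
sparse = [(2, 0.6), (0, 0.4)]
print(to_dense(sparse, num_topics=4))
```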
6 votes, 1 answer, 4326 views

What is the simplest way to get tfidf with pandas dataframe?

I want to calculate tf-idf from the documents below. I'm using Python and pandas. import pandas as pd df = pd.DataFrame({'docId': [1,2,3], 'sent': ['This is the first sentence','This ...
2 votes, 2 answers, 1567 views

How to speed up Gensim Word2vec model load time?

I'm building a chatbot, so I need to vectorize the user's input using Word2Vec. I'm using a pre-trained model with 3 million words by Google (GoogleNews-vectors-negative300). So I load the model usi...
37 votes, 10 answers, 19617 views

Convert word2vec bin file to text

From the word2vec site I can download GoogleNews-vectors-negative300.bin.gz. The .bin file (about 3.4GB) is a binary format not useful to me. Tomas Mikolov assures us that "It should be fairly strai...
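If gensim is available, the usual route, as far as I know, is `KeyedVectors.load_word2vec_format(path, binary=True)` followed by `save_word2vec_format(out_path, binary=False)`. To show what that conversion involves, here is a standard-library sketch of the binary layout as I understand it from the original word2vec tool: an ASCII header line `<vocab_size> <dimensions>`, then for each word the word itself, a space, and `dimensions` raw 32-bit floats (assumed little-endian here), usually followed by a newline. The function names are hypothetical. The converter writes one record at a time, so it does not need to hold the whole 3.4GB file in memory.

```python
import io
import struct
from typing import BinaryIO, TextIO


def read_word(src: BinaryIO) -> str:
    """Read bytes up to the next space, skipping newlines that separate records."""
    chars = bytearray()
    while True:
        ch = src.read(1)
        if ch in (b"", b" "):
            break
        if ch == b"\n" and not chars:
            continue  # newline left over from the previous vector
        chars.extend(ch)
    return chars.decode("utf-8", errors="replace")


def convert_bin_to_text(src: BinaryIO, dst: TextIO) -> int:
    """Convert word2vec binary format to text format; return the vocabulary size."""
    vocab_size, dim = (int(x) for x in src.readline().decode("ascii").split())
    dst.write(f"{vocab_size} {dim}\n")
    fmt = f"<{dim}f"  # assumption: little-endian float32, as on x86 machines
    nbytes = struct.calcsize(fmt)
    for _ in range(vocab_size):
        word = read_word(src)
        values = struct.unpack(fmt, src.read(nbytes))
        dst.write(word + " " + " ".join(f"{v:.6f}" for v in values) + "\n")
    return vocab_size


# Build a tiny binary file in memory to exercise the converter.
buf = io.BytesIO()
buf.write(b"2 3\n")
buf.write(b"king " + struct.pack("<3f", 0.5, -1.0, 2.0) + b"\n")
buf.write(b"queen " + struct.pack("<3f", 0.25, 0.0, 1.5) + b"\n")
buf.seek(0)

out = io.StringIO()
convert_bin_to_text(buf, out)
print(out.getvalue())
```

For the downloaded `.bin.gz`, `gzip.open(path, "rb")` can be passed as `src` directly, and a file opened with `open(out_path, "w", encoding="utf-8")` as `dst`.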
