All Questions

14
votes
3answers
1155 views

Interpreting the sum of TF-IDF scores of words across documents

First let's extract the TF-IDF scores per term per document: from gensim import corpora, models, similarities documents = ["Human machine interface for lab abc computer applications", "...
64
votes
9answers
42758 views

How to calculate the sentence similarity using word2vec model of gensim with python

According to the Gensim Word2Vec, I can use the word2vec model in gensim package to calculate the similarity between 2 words. e.g. trained_model.similarity('woman', 'man') 0.73723527 However, the...
9
votes
2answers
6309 views

LDA model generates different topics everytime i train on the same corpus

I am using python gensim to train an Latent Dirichlet Allocation (LDA) model from a small corpus of 231 sentences. However, each time i repeat the process, it generates different topics. Why does th...
21
votes
5answers
9612 views

Update gensim word2vec model

I have a word2vec model in gensim trained over 98892 documents. For any given sentence that is not present in the sentences array (i.e. the set over which I trained the model), I need to update the mo...
2
votes
2answers
2309 views

How to speed up Gensim Word2vec model load time?

I'm building a chatbot so I need to vectorize the user's input using Word2Vec. I'm using a pre-trained model with 3 million words by Google (GoogleNews-vectors-negative300). So I load the model usi...
21
votes
3answers
36842 views

How to create a word cloud from a corpus in Python?

From Creating a subset of words from a corpus in R, the answerer can easily convert a term-document matrix into a word cloud easily. Is there a similar function from python libraries that takes eithe...
18
votes
2answers
9949 views

Topic distribution: How do we see which document belong to which topic after doing LDA in python

I am able to run the LDA code from gensim and got the top 10 topics with their respective keywords. Now I would like to go a step further to see how accurate the LDA algo is by seeing which document ...
8
votes
1answers
5170 views

What is the simplest way to get tfidf with pandas dataframe?

I want to calculate tf-idf from the documents below. I'm using python and pandas. import pandas as pd df = pd.DataFrame({'docId': [1,2,3], 'sent': ['This is the first sentence','This ...
13
votes
2answers
7311 views

Document topical distribution in Gensim LDA

I've derived a LDA topic model using a toy corpus as follows: documents = ['Human machine interface for lab abc computer applications', 'A survey of user opinion of computer system respo...
37
votes
10answers
21452 views

Convert word2vec bin file to text

From the word2vec site I can download GoogleNews-vectors-negative300.bin.gz. The .bin file (about 3.4GB) is a binary format not useful to me. Tomas Mikolov assures us that "It should be fairly strai...

Previous Next