Download

Get gensim from the Python Package Index or install directly with:

pip install gensim

Questions? Suggestions?

Subscribe to Google group

You can also open an issue on the GitHub issue tracker.

Gensim – Vector Space Modelling for Humans

What’s new?

For an overview of what you can (or cannot) do with gensim, go to the introduction.

For installation and troubleshooting, see the installation page and the gensim discussion group.

For examples of how to convert text to vectors and work with the result, try the tutorials.

When citing gensim in academic papers, use this BibTeX entry.

Quick Reference Example

>>> from gensim import corpora, models, similarities
>>>
>>> # Load corpus iterator from a Matrix Market file on disk.
>>> # See Tutorial 1 on text corpora and vectors.
>>> corpus = corpora.MmCorpus('/path/to/corpus.mm')
>>>
>>> # Initialize a transformation (Latent Semantic Indexing with 200 latent dimensions).
>>> # See Tutorial 2 on semantic models.
>>> lsi = models.LsiModel(corpus, num_topics=200)
>>>
>>> # Convert another corpus to the latent space and index it.
>>> # See Tutorial 3 on similarity queries.
>>> index = similarities.MatrixSimilarity(lsi[another_corpus])
>>>
>>> # Determine the similarity of a query document against each document in the index.
>>> # The query must be a vector in the same LSI space as the indexed corpus.
>>> sims = index[query]
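
To make the idea behind the similarity query concrete, here is a minimal plain-Python sketch of what such a query computes: documents become sparse vectors of (id, value) pairs, and the query is compared with each one by cosine similarity. This is an illustration under stated assumptions, not gensim's implementation. It skips the LSI transformation entirely and works on raw word counts, and the helper names (build_vocabulary, to_bow, cosine) and the toy documents are made up for this example.

```python
import math
from collections import Counter


def build_vocabulary(texts):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bow(text, vocab):
    """Convert text to a sparse vector: a sorted list of (token_id, count) pairs."""
    counts = Counter(vocab[t] for t in text.lower().split() if t in vocab)
    return sorted(counts.items())


def cosine(vec_a, vec_b):
    """Cosine similarity between two sparse (id, value) vectors."""
    a, b = dict(vec_a), dict(vec_b)
    dot = sum(value * b.get(token_id, 0.0) for token_id, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus, used only for illustration.
documents = [
    "human machine interface",
    "graph of trees",
    "human interface for machine systems",
]
vocab = build_vocabulary(documents)
corpus = [to_bow(doc, vocab) for doc in documents]

# Compare one query against every document, as an index lookup would.
query = to_bow("human machine", vocab)
sims = [cosine(query, doc) for doc in corpus]
```

Here sims holds one score per document, in corpus order. The first document shares both query words and scores highest, and the second shares none and scores zero. A real index applies the same comparison in a transformed space (such as LSI) and is built to handle corpora that do not fit in plain Python lists.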