Gensim – Python Framework for Vector Space Modeling

What’s new

Version 0.6 is out!

It contains several minor bug fixes and a fully online implementation of Latent Semantic Indexing (LSI)! You can now update LSI with new documents at will and use the resulting LSI transformation at each step. The training document stream may even be infinite!

For an introduction to what gensim does (or does not do), go to the introduction.

To download and install gensim, consult the install page.

For examples of how to use it, try the tutorials.

Quick Reference Example

>>> from gensim import corpora, models, similarities
>>>
>>> # load corpus iterator from a Matrix Market file on disk
>>> corpus = corpora.MmCorpus('/path/to/corpus.mm')
>>>
>>> # initialize a transformation (Latent Semantic Indexing with twenty latent dimensions)
>>> lsi = models.LsiModel(corpus, numTopics=20)
>>>
>>> # convert the same corpus to latent space and index it
>>> index = similarities.MatrixSimilarity(lsi[corpus])
>>>
>>> # perform similarity query of another vector in LSI space against the whole corpus
>>> # (`query` is not defined in this snippet; it must already be a vector in the same LSI space)
>>> sims = index[query]
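
To make the last step more concrete, here is a minimal pure-Python sketch of what a dense cosine-similarity query does conceptually: every indexed vector and the query are scaled to unit length, and the result is one dot product per indexed document. The names `unit`, `ToySimilarityIndex`, `docs`, `toy_index` and `toy_sims` are hypothetical and exist only for this illustration; this is not gensim's code, and the two-dimensional vectors are made-up stand-ins for documents in latent space.

```python
import math


def unit(vec):
    """Scale a dense vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class ToySimilarityIndex:
    """Hypothetical stand-in for a dense similarity index (illustration only)."""

    def __init__(self, vectors):
        # Normalize once at indexing time so each query needs only dot products.
        self.rows = [unit(v) for v in vectors]

    def __getitem__(self, query):
        # Cosine similarity of the query against every indexed vector.
        q = unit(query)
        return [sum(a * b for a, b in zip(row, q)) for row in self.rows]


# Made-up "documents" in a two-dimensional latent space.
docs = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
toy_index = ToySimilarityIndex(docs)

# One similarity score per indexed document, in indexing order.
toy_sims = toy_index[[2.0, 0.0]]
```

Normalizing at indexing time is a design choice of this sketch: it keeps each query down to a single pass of dot products, at the cost of storing scaled copies of the vectors.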
