What’s new?
Version 0.7 is out!
gensim now contains two algorithms for Latent Semantic Indexing.
Of course, the option of incrementally adding new documents to an existing decomposition, without the need to recompute everything from scratch, remains from the previous version.
For an overview of what gensim does (or does not do), go to the introduction.
To download and install gensim, consult the install page.
For examples of how to use it, try the tutorials.
>>> from gensim import corpora, models, similarities
>>>
>>> # load corpus iterator from a Matrix Market file on disk
>>> corpus = corpora.MmCorpus('/path/to/corpus.mm')
>>>
>>> # initialize a transformation (Latent Semantic Indexing with 200 latent dimensions)
>>> lsi = models.LsiModel(corpus, numTopics=200)
>>>
>>> # convert the same corpus to latent space and index it
>>> index = similarities.MatrixSimilarity(lsi[corpus])
>>>
>>> # perform similarity query of another vector in LSI space against the whole corpus
>>> # (`query` is not defined above; it stands for a document already converted to LSI space)
>>> sims = index[query]
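
The final query returns one similarity score per indexed document, and MatrixSimilarity scores documents by cosine similarity. The following is a minimal pure-Python sketch of that idea, not gensim's implementation: the helper names `cosine` and `query_index` and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def query_index(indexed_docs, query):
    """Return one similarity score per indexed document, in corpus order."""
    return [cosine(doc, query) for doc in indexed_docs]


# Hypothetical 2-dimensional latent vectors, made up for illustration.
toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [1.0, 0.0]
toy_sims = query_index(toy_docs, toy_query)

# Rank document positions from most to least similar to the query.
ranked = sorted(enumerate(toy_sims), key=lambda item: -item[1])
```

Sorting the scores together with their positions, as in `ranked`, is a common way to turn the result of a similarity query into a list of best-matching documents.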