This module contains math helper functions.
Wrap a term-document matrix stored on disk in Matrix Market format, and present it as an object that supports iteration over the rows (documents).
Note that the file is read into memory one document at a time, not the whole matrix at once (unlike scipy.io.mmread). This allows for representing corpora which are larger than the available RAM.
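A minimal sketch of row-at-a-time reading follows. It assumes that the coordinate entries in the file are sorted by row (document) id. The helper name iter_mm_documents is hypothetical and is not the wrapper's actual API. Only the current document's entries are held in memory, and rows without entries are yielded as empty documents.

```python
from itertools import groupby


def iter_mm_documents(fname):
    """Hypothetical helper: yield one document (list of (term_id, value)) at a time.

    Assumes a sparse (coordinate) Matrix Market file whose entries are sorted
    by row id. Indices in the file are 1-based; yielded ids are 0-based.
    """
    with open(fname) as fin:
        # Skip the banner and any comment lines (both start with '%').
        line = fin.readline()
        while line.startswith('%'):
            line = fin.readline()
        # The first non-comment line holds: rows, columns, non-zeros.
        num_docs, num_terms, num_nnz = (int(x) for x in line.split())

        def entries():
            # Stream the remaining lines lazily; never load the whole matrix.
            for raw in fin:
                if not raw.strip():
                    continue
                row, col, val = raw.split()
                yield int(row) - 1, int(col) - 1, float(val)

        previous = -1
        for docno, group in groupby(entries(), key=lambda entry: entry[0]):
            # Rows with no stored entries are empty documents.
            for _ in range(docno - previous - 1):
                yield []
            yield [(col, val) for _, col, val in group]
            previous = docno
        # Trailing empty documents after the last stored entry.
        for _ in range(num_docs - previous - 1):
            yield []
```

The generator stays inside the `with` block, so the file remains open only while iteration is in progress.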
Initialize the matrix reader.
The argument fname is the path to a file on the local filesystem, which is expected to be in sparse (coordinate) Matrix Market format. Documents are assumed to be rows of the matrix, and document features are columns.
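Initialization typically reads only the file's header. The sketch below uses a hypothetical helper, read_mm_header, that is not the reader's real interface. It checks that the banner declares the sparse coordinate layout. It then returns the matrix dimensions, which under the row-per-document convention are the document count, the feature count and the number of stored entries.

```python
def read_mm_header(fname):
    """Hypothetical helper: return (num_docs, num_terms, num_nnz) from a Matrix Market file."""
    with open(fname) as fin:
        # Banner looks like: %%MatrixMarket matrix coordinate real general
        banner = fin.readline().split()
        if (len(banner) < 3 or banner[0].lower() != '%%matrixmarket'
                or banner[2].lower() != 'coordinate'):
            raise ValueError('not a sparse (coordinate) Matrix Market file: %s' % fname)
        for line in fin:
            if line.startswith('%') or not line.strip():
                continue  # skip comments and blank lines
            # Rows are documents, columns are document features.
            num_docs, num_terms, num_nnz = (int(x) for x in line.split())
            return num_docs, num_terms, num_nnz
    raise ValueError('missing size line in %s' % fname)
```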
Store a corpus in Matrix Market format.
Save the vector space representation of an entire corpus to disk.
Note that the documents are processed one at a time, so the whole corpus is allowed to be larger than the available RAM.
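Processing documents one at a time has a consequence for the file header. The size line (rows, columns, non-zeros) comes before the data, but its values are not known until the last document has been seen. One way to handle this is to write a fixed-width blank placeholder and overwrite it at the end. The sketch below takes that approach. The function name save_corpus_mm and the 50-character width are assumptions for illustration, not the module's actual writer.

```python
def save_corpus_mm(fname, corpus):
    """Hypothetical helper: stream an iterable of sparse documents to a Matrix Market file."""
    placeholder_width = 50  # assumed large enough for three integers
    num_docs = num_terms = num_nnz = 0
    with open(fname, 'w') as fout:
        fout.write('%%MatrixMarket matrix coordinate real general\n')
        size_pos = fout.tell()  # remember where the size line starts
        fout.write(' ' * placeholder_width + '\n')
        for docno, doc in enumerate(corpus):
            # Only the current document is held in memory.
            for term_id, value in doc:
                # Matrix Market indices are 1-based.
                fout.write('%i %i %r\n' % (docno + 1, term_id + 1, float(value)))
                num_terms = max(num_terms, term_id + 1)
                num_nnz += 1
            num_docs = docno + 1
        # Overwrite the placeholder in place with the real sizes.
        fout.seek(size_pos)
        fout.write(('%i %i %i' % (num_docs, num_terms, num_nnz)).ljust(placeholder_width))
```

Seeking to a position previously returned by tell() is the supported way to reposition a text-mode file in Python, which is why the code records size_pos instead of computing a byte offset.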
Write a single sparse vector to the file.
A sparse vector is any iterable yielding (field id, field value) pairs.
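Because a sparse vector can be any iterable of pairs, the input may be a list, a generator or dict.items(), and it may be consumable only once. The sketch below materializes it a single time and writes one coordinate line per entry. The helper name write_sparse_vector, the sorting by field id and the (count, max id) return value are illustrative assumptions.

```python
import io


def write_sparse_vector(fout, docno, vector):
    """Hypothetical helper: write one sparse vector as 1-based coordinate lines.

    Returns (number of entries written, largest field id seen, or -1 if none).
    """
    # Consume the iterable exactly once; sort so columns appear in order.
    entries = sorted((int(field_id), float(value)) for field_id, value in vector)
    max_id = -1
    for field_id, value in entries:
        fout.write('%i %i %r\n' % (docno + 1, field_id + 1, value))
        max_id = max(max_id, field_id)
    return len(entries), max_id


# Usage: a dict's items() works as a sparse vector.
buf = io.StringIO()
write_sparse_vector(buf, 0, {2: 0.5, 0: 1.0}.items())
```

The returned counts are what a corpus writer would accumulate to fill in the header's column and non-zero totals.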
Scale a vector to unit length. The only exception is the zero vector, which is returned back unchanged.
If the input is sparse (list of 2-tuples), output will also be sparse. Otherwise, output will be a numpy array.
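The following sketches the normalization described above, using the Euclidean (L2) length. The name unit_vector is hypothetical. Sparse input is detected as a list whose items are all 2-tuples, so an empty list is treated as an empty sparse vector and returned unchanged. Any other input is converted to a float numpy array.

```python
import math

import numpy as np


def unit_vector(vec):
    """Hypothetical helper: scale vec to unit L2 length; zero vectors come back unchanged."""
    if isinstance(vec, list) and all(isinstance(x, tuple) and len(x) == 2 for x in vec):
        # Sparse input: list of (id, value) pairs, so keep the output sparse.
        length = math.sqrt(sum(val * val for _, val in vec))
        if length == 0.0:
            return vec
        return [(term_id, val / length) for term_id, val in vec]
    # Dense input: always return a numpy array.
    arr = np.asarray(vec, dtype=float)
    length = np.linalg.norm(arr)
    if length == 0.0:
        return arr
    return arr / length
```

For example, the sparse vector [(0, 3.0), (2, 4.0)] has length 5, so it becomes [(0, 0.6), (2, 0.8)].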