This module contains math helper functions.
Treat a dense numpy array as a sparse gensim corpus.
No data copy is made (changes to the underlying matrix imply changes in the corpus).
Wrap a term-document matrix on disk (in matrix-market format), and present it as an object which supports iteration over the rows (~documents).
Note that the file is read into memory one document at a time, not the whole matrix at once (unlike scipy.io.mmread). This allows us to process corpora which are larger than the available RAM.
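The streaming behaviour described above can be illustrated with a minimal sketch. It is not the library's reader: the function name iter_mm_documents is hypothetical, the entries are assumed to be sorted by row, and documents with no entries are simply skipped.

```python
import io

def iter_mm_documents(fileobj):
    """Yield (doc_id, [(term_id, value), ...]) one document at a time.

    Hypothetical helper; assumes coordinate entries are sorted by row.
    """
    header_seen = False
    current_id, current_doc = None, []
    for line in fileobj:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines, comments and the %%MatrixMarket banner
        if not header_seen:
            header_seen = True  # first data line holds "rows cols nnz"
            continue
        row, col, value = line.split()
        doc_id, term_id = int(row) - 1, int(col) - 1  # Matrix Market is 1-based
        if doc_id != current_id:
            if current_id is not None:
                yield current_id, current_doc  # previous document is complete
            current_id, current_doc = doc_id, []
        current_doc.append((term_id, float(value)))
    if current_id is not None:
        yield current_id, current_doc

sample = io.StringIO(
    "%%MatrixMarket matrix coordinate real general\n"
    "2 3 3\n"
    "1 1 1.0\n"
    "1 3 2.5\n"
    "2 2 4.0\n"
)
docs = list(iter_mm_documents(sample))
```

Only one document's entries are held in memory at any moment, which is what makes corpora larger than RAM workable.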
Initialize the matrix reader.
The input refers to a file on the local filesystem, which is expected to be in the sparse (coordinate) Matrix Market format. Documents are assumed to be rows of the matrix (and document features are columns).
input is either a string (file path) or a file-like object that supports seek() (e.g. gzip.GzipFile, bz2.BZ2File).
Return the document at file offset offset (in bytes).
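Offset-based access might look roughly like the sketch below. Both helpers (index_offsets and doc_at_offset) are hypothetical names, and the approach shown, recording the byte position of each document's first entry and seeking back to it later, is an assumption about how such a lookup could work rather than the actual implementation.

```python
import io

def index_offsets(fileobj):
    """Record the byte offset where each document's first entry begins."""
    offsets, previous_row, header_seen = [], None, False
    while True:
        position = fileobj.tell()
        line = fileobj.readline()
        if not line:
            break
        if line.startswith(b"%") or not line.strip():
            continue  # comment or blank line
        if not header_seen:
            header_seen = True  # skip the "rows cols nnz" line
            continue
        row = int(line.split()[0])
        if row != previous_row:
            offsets.append(position)
            previous_row = row
    return offsets

def doc_at_offset(fileobj, offset):
    """Seek to offset and read entries until the row id changes."""
    fileobj.seek(offset)
    doc, doc_row = [], None
    for line in iter(fileobj.readline, b""):
        if not line.strip():
            continue
        row, col, value = line.decode("ascii").split()
        if doc_row is None:
            doc_row = row
        elif row != doc_row:
            break  # reached the next document
        doc.append((int(col) - 1, float(value)))
    return doc

mm_file = io.BytesIO(
    b"%%MatrixMarket matrix coordinate real general\n"
    b"2 3 3\n"
    b"1 1 1.0\n"
    b"1 3 2.5\n"
    b"2 2 4.0\n"
)
offsets = index_offsets(mm_file)
second_doc = doc_at_offset(mm_file, offsets[1])
```

This is why the input must support seek(): random access to a single document avoids rescanning the file from the start.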
Store corpus in Matrix Market format.
Save the vector space representation of an entire corpus to disk.
Note that the documents are processed one at a time, so the corpus may be larger than the available RAM.
Write a single sparse vector to the file.
A sparse vector is any iterable yielding (field id, field value) pairs.
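A rough sketch of streaming a corpus to Matrix Market format follows. The names write_vector and write_corpus are illustrative only. Because the matrix dimensions are unknown until the whole corpus has been seen, this sketch reserves a blank header line and fills it in at the end; that strategy is an assumption made for the example.

```python
import io

def write_vector(out, docno, vector):
    """Write one sparse vector as coordinate lines; return (max term id, nnz)."""
    entries = sorted((int(term_id), float(weight)) for term_id, weight in vector)
    for term_id, weight in entries:
        # Matrix Market indices are 1-based
        out.write("%i %i %s\n" % (docno + 1, term_id + 1, weight))
    max_id = entries[-1][0] if entries else -1
    return max_id, len(entries)

def write_corpus(out, corpus):
    """Stream the corpus to out, one document at a time."""
    out.write("%%MatrixMarket matrix coordinate real general\n")
    header_pos = out.tell()
    out.write(" " * 50 + "\n")  # placeholder for "rows cols nnz"
    num_docs, num_terms, num_nnz = 0, 0, 0
    for docno, vector in enumerate(corpus):
        max_id, nnz = write_vector(out, docno, vector)
        num_docs = docno + 1
        num_terms = max(num_terms, max_id + 1)
        num_nnz += nnz
    out.seek(header_pos)  # go back and overwrite the placeholder
    out.write("%i %i %i" % (num_docs, num_terms, num_nnz))

buf = io.StringIO()
write_corpus(buf, [[(0, 1.0), (2, 2.5)], [(1, 4.0)]])
lines = buf.getvalue().splitlines()
```

The corpus can be any iterable, including a generator, since each document is written and then discarded.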
Convert a matrix in scipy.sparse format into a streaming gensim corpus.
Convert corpus into a sparse matrix, in scipy.sparse.csc_matrix format, with documents as columns.
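To make the documents-as-columns layout concrete, here is a small sketch of both directions using plain Python lists. The helper names are hypothetical. The three lists follow the compressed sparse column layout (data, indices, indptr) that scipy.sparse.csc_matrix accepts in its constructor; scipy itself is left out so the example has no dependencies.

```python
def corpus_to_csc_arrays(corpus):
    """Build CSC-style (data, indices, indptr) lists, one column per document."""
    data, indices, indptr = [], [], [0]
    for doc in corpus:
        for term_id, value in sorted(doc):
            indices.append(term_id)  # row index = term id
            data.append(value)
        indptr.append(len(data))     # marks where this column ends
    return data, indices, indptr

def iter_csc_columns(data, indices, indptr):
    """Stream the columns back out as sparse documents."""
    for start, end in zip(indptr[:-1], indptr[1:]):
        yield list(zip(indices[start:end], data[start:end]))

corpus = [[(0, 1.0), (2, 2.5)], [], [(1, 4.0)]]
data, indices, indptr = corpus_to_csc_arrays(corpus)
round_trip = list(iter_csc_columns(data, indices, indptr))
```

Storing documents as columns means each document occupies one contiguous slice of data and indices, so iterating over documents is cheap.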
Convert corpus into a dense numpy array (documents will be columns).
Convert a dense numpy array into the sparse corpus format (sequence of 2-tuples).
Values of magnitude < eps are treated as zero (ignored).
Like full2sparse, but only return the topn greatest elements (not all).
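The dense-to-sparse conversions can be sketched as follows. Plain Python sequences stand in for numpy arrays here, the _sketch suffixes mark these as illustrations rather than the library functions, and ranking the clipped result by absolute value is an assumption about what "greatest" means.

```python
def full2sparse_sketch(vec, eps=1e-9):
    """Keep (index, value) pairs whose magnitude is at least eps."""
    return [(i, float(v)) for i, v in enumerate(vec) if abs(v) >= eps]

def full2sparse_clipped_sketch(vec, topn, eps=1e-9):
    """Like full2sparse_sketch, but keep only the topn largest magnitudes."""
    if topn <= 0:
        return []
    nonzero = full2sparse_sketch(vec, eps)
    # assumption: "greatest" is judged by absolute value
    nonzero.sort(key=lambda item: abs(item[1]), reverse=True)
    return nonzero[:topn]

dense = [0.0, 3.0, 1e-12, -5.0, 0.5]
sparse = full2sparse_sketch(dense)
top2 = full2sparse_clipped_sketch(dense, topn=2)
```

Note that the entry 1e-12 is dropped because its magnitude falls below the default eps.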
Add additional rows/columns to a numpy.matrix mat. The new rows/columns will be initialized with zeros.
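As an illustration of zero padding, the sketch below uses nested lists in place of a numpy.matrix. The name pad_sketch and its padrow/padcol parameters are hypothetical, and placing the new rows at the bottom and the new columns on the right is an assumption.

```python
def pad_sketch(mat, padrow, padcol):
    """Return a copy of mat with padrow zero rows and padcol zero columns added."""
    width = (len(mat[0]) if mat else 0) + padcol
    # extend every existing row with zeros on the right
    padded = [list(row) + [0.0] * padcol for row in mat]
    # append fresh all-zero rows at the bottom
    padded.extend([0.0] * width for _ in range(padrow))
    return padded

padded = pad_sketch([[1.0, 2.0]], padrow=1, padcol=1)
```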
Return the QR decomposition of la[0]. The content of la is destroyed in the process.
Using this function should be less memory-intensive than calling scipy.linalg.qr(la[0]) directly, because the memory used by la[0] is reclaimed earlier.
Convert a document in sparse corpus format (sequence of 2-tuples) into a dense numpy array (of size length).
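The sparse-to-dense direction is simple enough to sketch in a few lines. A plain list stands in for the numpy array, and sparse2full_sketch is an illustrative name, not the library function.

```python
def sparse2full_sketch(doc, length):
    """Expand (index, value) pairs into a dense list of the given length."""
    dense = [0.0] * length
    for term_id, value in doc:
        dense[term_id] = value  # indices absent from doc stay at zero
    return dense

dense_doc = sparse2full_sketch([(1, 3.0), (3, -5.0)], length=5)
```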
Scale a vector to unit length. The only exception is the zero vector, which is returned unchanged.
If the input is sparse (list of 2-tuples), output will also be sparse. Otherwise, output will be a numpy array.
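A minimal sketch of this behaviour, assuming "unit length" means the Euclidean (L2) norm and using a plain list in place of a numpy array for the dense case. The name unitvec_sketch is hypothetical.

```python
import math

def unitvec_sketch(vec):
    """Scale vec to unit L2 length; sparse in gives sparse out."""
    is_sparse = bool(vec) and isinstance(vec[0], tuple)
    values = [v for _, v in vec] if is_sparse else list(vec)
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0:
        return vec  # the zero vector is returned unchanged
    if is_sparse:
        return [(i, v / norm) for i, v in vec]
    return [v / norm for v in values]

sparse_unit = unitvec_sketch([(0, 3.0), (2, 4.0)])
dense_unit = unitvec_sketch([3.0, 0.0, 4.0])
zero = unitvec_sketch([0.0, 0.0])
```

The output format mirrors the input format, so callers do not need to convert between representations before or after normalizing.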