algorithms analysis answer api collection concepts corpus design documents features framework human index infer install introduction latent dirichlet allocation model open-source paragraphs python query questions random reference representation semantic similar space sparse structure SVD text thought topic training tutorials unsupervised vector words