skhubness.neighbors.HNSW

class skhubness.neighbors.HNSW(n_candidates: int = 5, metric: str = 'euclidean', method: str = 'hnsw', post_processing: int = 2, n_jobs: int = 1, verbose: int = 0)

Wrapper for approximate nearest neighbor search with nmslib

Hierarchical navigable small-world (HNSW) graphs are data structures that allow for approximate nearest neighbor search. This class uses the implementation from nmslib.
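For contrast with approximate search, an exact search compares each query against every indexed point, which costs time linear in the index size per query. The sketch below is a pure-Python illustration of that baseline under Euclidean distance. It is not part of skhubness or nmslib, and `exact_kneighbors` is a hypothetical helper name chosen for this example.

```python
import heapq
import math


def exact_kneighbors(index_points, queries, k):
    """Exact Euclidean k-nearest-neighbor search by exhaustive comparison.

    Hypothetical helper for illustration only; not part of skhubness or nmslib.
    Returns (distances, indices), one row per query, sorted by distance.
    """
    all_dist, all_ind = [], []
    for q in queries:
        # Distance from the query to every indexed point: O(n) work per query.
        scored = [(math.dist(q, p), i) for i, p in enumerate(index_points)]
        nearest = heapq.nsmallest(k, scored)
        all_dist.append([d for d, _ in nearest])
        all_ind.append([i for _, i in nearest])
    return all_dist, all_ind


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
dist, ind = exact_kneighbors(points, [(0.1, 0.0)], k=2)
print(ind)  # indices of the two indexed points closest to the query
```

An approximate index avoids the exhaustive loop by traversing a graph built at indexing time, and in exchange it may occasionally miss a true nearest neighbor.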

Parameters
n_candidates: int, default = 5

Number of neighbors to retrieve

metric: str, default = ‘euclidean’

Distance metric. Allowed values are “angular”, “euclidean”, “manhattan”, “hamming”, and “dot”.

method: str, default = ‘hnsw’

ANN method to use. Currently, only ‘hnsw’ is supported.

post_processing: int, default = 2

Amount of post-processing applied to the index. More post-processing means longer index creation but higher retrieval accuracy.

n_jobs: int, default = 1

Number of parallel jobs

verbose: int, default = 0

Verbosity level. If verbose > 0, show tqdm progress bar on indexing and querying.

Attributes
valid_metrics:

List of valid distance metrics/measures

__init__(self, n_candidates: 'int' = 5, metric: 'str' = 'euclidean', method: 'str' = 'hnsw', post_processing: 'int' = 2, n_jobs: 'int' = 1, verbose: 'int' = 0)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(self, n_candidates, metric, method, …)

Initialize self.

fit(self, X[, y])

Set up the HNSW index from training data.

kneighbors(self, X, n_candidates, …)

Retrieve k nearest neighbors.

Attributes

valid_metrics

fit(self, X, y=None) → 'HNSW'

Set up the HNSW index from training data.

Parameters
X: np.array

Data to be indexed

y: any

Ignored

Returns
self: HNSW

An instance of HNSW with a built graph

kneighbors(self, X: 'np.ndarray' = None, n_candidates: 'int' = None, return_distance: 'bool' = True) → 'Union[Tuple[np.array, np.array], np.array]'

Retrieve k nearest neighbors.

Parameters
X: np.array or None, optional, default = None

Query objects. If None, search among the indexed objects.

n_candidates: int or None, optional, default = None

Number of neighbors to retrieve. If None, use the value passed during construction.

return_distance: bool, default = True

If True, return both the distances and the indices of the neighbors. Otherwise, return only the indices.
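The constructor, fit, and kneighbors calls documented above can be combined roughly as follows. This is a minimal sketch, not an official example: `build_and_query` is a hypothetical helper, the index class is passed in as an argument so the snippet does not itself import skhubness, and the (distances, indices) return order is assumed from the description of return_distance.

```python
def build_and_query(index_cls, X_train, X_query, k):
    """Build an index over X_train and query it with X_query.

    Hypothetical helper for illustration. `index_cls` is expected to follow
    the interface documented on this page (e.g. skhubness.neighbors.HNSW).
    """
    # n_candidates sets the default number of neighbors to retrieve.
    index = index_cls(n_candidates=k, metric="euclidean")

    # fit builds the index; y is ignored, so it is omitted here.
    index.fit(X_train)

    # With return_distance=True, a tuple is returned
    # (assumed order: distances, then indices).
    neigh_dist, neigh_ind = index.kneighbors(X_query, return_distance=True)

    # With return_distance=False, only the indices are returned.
    ind_only = index.kneighbors(X_query, return_distance=False)

    return neigh_dist, neigh_ind, ind_only
```

With skhubness and nmslib installed, this could be called as `build_and_query(HNSW, X_train, X_query, 5)` after `from skhubness.neighbors import HNSW`, where `X_train` and `X_query` are NumPy arrays with one row per object. Passing `n_candidates` to `kneighbors` overrides the constructor value for a single query.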