skhubness.neighbors.ONNG

class skhubness.neighbors.ONNG(n_candidates: int = 5, metric: str = 'euclidean', index_dir: str = 'auto', edge_size_for_creation: int = 40, edge_size_for_search: int = 10, n_jobs: int = 1, verbose: int = 0)[source]

Wrapper for ngtpy and ONNG

Parameters
n_candidates: int, default = 5

Number of neighbors to retrieve

metric: str, default = ‘euclidean’

Distance metric. Allowed values are ‘manhattan’, ‘L1’, ‘euclidean’, ‘L2’, ‘minkowski’, ‘Angle’, ‘Normalized Angle’, ‘Hamming’, ‘Jaccard’, ‘Cosine’ or ‘Normalized Cosine’.

index_dir: str, default = ‘auto’

Directory in which to store the index. If None, keep the index in main memory (the index is then NOT picklable). If index_dir is a string, it is interpreted as the directory to store the index in. If ‘auto’, create a temporary directory for the index, preferably in /dev/shm on Linux. Note: The directory and the index will NOT be deleted automatically.

n_jobs: int, default = 1

Number of parallel jobs

verbose: int, default = 0

Verbosity level. If verbose > 0, show tqdm progress bar on indexing and querying.

Notes

ONNG stores the index in the directory specified by index_dir. The index is persistent and will NOT be deleted automatically. It is the user’s responsibility to delete it when it is no longer needed.
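Because the index directory is not removed automatically, one option is to create and delete it explicitly. The following is a minimal sketch using only the Python standard library; the helper name temporary_index_dir and the prefix are hypothetical and not part of skhubness. It assumes that passing the yielded path as index_dir makes ONNG write its index inside that directory.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_index_dir(prefix="onng_index_"):
    """Yield a fresh directory path and remove it on exit (hypothetical helper)."""
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield str(path)
    finally:
        # Delete the directory and everything written into it.
        shutil.rmtree(path, ignore_errors=True)


with temporary_index_dir() as index_dir:
    # Assumption: here one would build ONNG(index_dir=index_dir),
    # call fit(X), and run all queries before the block ends.
    existed_inside = Path(index_dir).is_dir()

existed_after = Path(index_dir).exists()
```

With this pattern the index only lives as long as the with block, so any queries have to happen inside it.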

Attributes
valid_metrics:

List of valid distance metrics/measures

__init__(self, n_candidates: 'int' = 5, metric: 'str' = 'euclidean', index_dir: 'str' = 'auto', edge_size_for_creation: 'int' = 40, edge_size_for_search: 'int' = 10, n_jobs: 'int' = 1, verbose: 'int' = 0)[source]

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__(self, n_candidates, metric, …)

Initialize self.

fit(self, X[, y])

Build the ngtpy.Index and insert data from X.

get_params(self[, deep])

Get parameters for this estimator.

kneighbors(self[, X, n_candidates, …])

Retrieve k nearest neighbors.

set_params(self, **params)

Set the parameters of this estimator.

Attributes

internal_distance_type

valid_metrics

fit(self, X, y=None) → 'ONNG'[source]

Build the ngtpy.Index and insert data from X.

Parameters
X: np.array

Data to be indexed

y: any

Ignored

Returns
self: ONNG

An instance of ONNG with a built index

get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters
deep: boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: mapping of string to any

Parameter names mapped to their values.

kneighbors(self, X=None, n_candidates=None, return_distance=True) → 'Union[Tuple[np.array, np.array], np.array]'[source]

Retrieve k nearest neighbors.

Parameters
X: np.array or None, optional, default = None

Query objects. If None, search among the indexed objects.

n_candidates: int or None, optional, default = None

Number of neighbors to retrieve. If None, use the value passed during construction.

return_distance: bool, default = True

If True, return both the distances and the indices of the neighbors. Otherwise, return only the indices.
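To make the two return shapes concrete, the sketch below mimics the output layout of kneighbors with an exact brute-force search in NumPy. This is not how ONNG searches (ONNG performs approximate graph-based search through ngtpy); the function brute_force_kneighbors is a hypothetical stand-in, and the (distances, indices) ordering of the tuple is an assumption based on the scikit-learn convention.

```python
import numpy as np


def brute_force_kneighbors(data, queries, n_candidates=5, return_distance=True):
    """Exact Euclidean k-NN; a hypothetical stand-in, not ONNG's graph search."""
    data = np.asarray(data, dtype=float)
    queries = np.asarray(queries, dtype=float)
    # Pairwise Euclidean distances, shape (n_queries, n_indexed).
    diff = queries[:, np.newaxis, :] - data[np.newaxis, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Indices of the n_candidates closest indexed objects per query.
    ind = np.argsort(dist, axis=1)[:, :n_candidates]
    if return_distance:
        # Assumed ordering: distances first, then indices.
        return np.take_along_axis(dist, ind, axis=1), ind
    return ind


data = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
queries = np.array([[0.1, 0.0]])

neigh_dist, neigh_ind = brute_force_kneighbors(data, queries, n_candidates=2)
only_ind = brute_force_kneighbors(data, queries, n_candidates=2,
                                  return_distance=False)
```

Both arrays have one row per query object and one column per retrieved neighbor.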

set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns
self
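The <component>__<parameter> naming can be illustrated with a small sketch that separates an estimator's own parameters from those addressed to a nested component. This is not scikit-learn's actual implementation; the function split_nested_params and the component name ann are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def split_nested_params(params):
    """Separate plain parameters from '<component>__<parameter>' entries."""
    own = {}
    nested = defaultdict(dict)
    for key, value in params.items():
        # Split on the first double underscore only.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested[name][sub_key] = value
        else:
            own[key] = value
    return own, dict(nested)


own, nested = split_nested_params({
    "n_jobs": 2,                  # parameter of the outer object
    "ann__n_candidates": 20,      # parameter of the hypothetical component 'ann'
    "ann__metric": "manhattan",
})
```

Here own holds the parameters applied directly to the outer object, while nested groups the remaining values by component so that each one can be forwarded to that component's own set_params.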