Coverage for lingpy/algorithm/extra.py : 95%

""" Adapting specific cluster algorithms from scikit-learn to LingPy. """
except ImportError:
threshold, matrix, taxa, revert=False, min_samples=1):
"""
Compute DBSCAN cluster analysis.

Parameters
----------
threshold : float
    The threshold for clustering you want to use.
matrix : list
    The two-dimensional matrix passed as list or array.
taxa : list
    The list of taxon names. If set to "False" a fake list of taxon names
    will be created, giving a positive numerical ID in increasing order for
    each column in the matrix.
revert : bool
    If set to "False", don't return taxon names but simply the language
    identifiers and their labels as a dictionary. Otherwise returns a
    dictionary with labels as keys and list of taxon names as values.
min_samples : int (default=1)
    The minimal samples parameter of the DBSCAN method from the
    scikit-learn package.

Returns
-------
clusters : dict
    Either a dictionary of taxon identifiers and labels, or a dictionary of
    labels and taxon names.

Notes
-----
This method does not work as expected, probably because it normally
requires distances between points as input. We list it here only for
completeness, and urge users to be careful when using the code and to
check our implementation in the source code.

Requires the scikit-learn package, downloadable from
http://scikit-learn.org/.
"""
raise ValueError("The package sklearn is needed to run this analysis.")
taxa = list(range(1, len(matrix) + 1))
matrix, eps=threshold, min_samples=min_samples, metric='precomputed')
# change to our internal cluster style
# check for revert
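The two return shapes described in the docstring can be illustrated with a minimal sketch. It is not the LingPy implementation: the helper name `labels_to_clusters` is hypothetical, the use of 0-based positions as taxon identifiers is an assumption, and so is the choice to give every noise point (label `-1`, as DBSCAN reports it) its own singleton cluster.

```python
from collections import defaultdict


def labels_to_clusters(labels, taxa=None, revert=False):
    """Hypothetical helper: turn flat cluster labels into a dictionary."""
    if not taxa:
        # fake taxon names: positive numerical IDs in increasing order
        taxa = list(range(1, len(labels) + 1))

    # assumption: each noise point (negative label) becomes a singleton
    next_label = max(labels, default=-1) + 1
    normalized = []
    for label in labels:
        if label < 0:
            normalized.append(next_label)
            next_label += 1
        else:
            normalized.append(label)

    if not revert:
        # identifier (assumed here: position in the matrix) -> 1-based label
        return {idx: label + 1 for idx, label in enumerate(normalized)}

    # 1-based label -> list of taxon names
    clusters = defaultdict(list)
    for taxon, label in zip(taxa, normalized):
        clusters[label + 1].append(taxon)
    return dict(clusters)


labels = [0, 0, -1, 1]
taxa = ["German", "Dutch", "Hindi", "Russian"]
by_id = labels_to_clusters(labels, taxa)
by_label = labels_to_clusters(labels, taxa, revert=True)
```

Labels are shifted by one so that cluster numbering starts at 1; whether the original code does the same cannot be read off this listing.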
"""
Compute affinity propagation from the matrix.

Parameters
----------
threshold : float
    The threshold for clustering you want to use.
matrix : list
    The two-dimensional matrix passed as list or array.
taxa : list
    The list of taxon names. If set to "False" a fake list of taxon names
    will be created, giving a positive numerical ID in increasing order for
    each column in the matrix.
revert : bool
    If set to "False", don't return taxon names but simply the language
    identifiers and their labels as a dictionary. Otherwise returns a
    dictionary with labels as keys and list of taxon names as values.

Returns
-------
clusters : dict
    Either a dictionary of taxon identifiers and labels, or a dictionary of
    labels and taxon names.

Notes
-----
Affinity propagation is a clustering method originally proposed by
:evobib:`Frey2007`.

Requires the scikit-learn package, downloadable from
http://scikit-learn.org/.
"""
# turn distances to similarities
# iterate over matrix
else:
# change to our internal cluster style
# check for revert
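Affinity propagation works on similarities, while the input matrix holds distances, hence the conversion step named in the comments above. The sketch below shows one simple way to do this. It assumes distances normalized to the range 0 to 1 and uses `1 - distance` as the similarity; the function name `distances_to_similarities` and the optional `preference` value placed on the diagonal are hypothetical and not taken from the LingPy source.

```python
def distances_to_similarities(matrix, preference=None):
    """Hypothetical helper: convert normalized distances to similarities."""
    similarities = []
    for i, row in enumerate(matrix):
        new_row = []
        for j, distance in enumerate(row):
            if i == j and preference is not None:
                # assumption: the diagonal carries a self-preference value
                new_row.append(preference)
            else:
                new_row.append(1 - distance)
        similarities.append(new_row)
    return similarities


distances = [[0.0, 0.25], [0.25, 0.0]]
plain = distances_to_similarities(distances)
with_pref = distances_to_similarities(distances, preference=0.5)
```

The function returns a new nested list and leaves the input matrix unchanged, so the original distances remain available afterwards.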
"""
Compute the Infomap clustering analysis of the data.

Parameters
----------
threshold : float
    The threshold for clustering you want to use.
matrix : list
    The two-dimensional matrix passed as list or array.
taxa : list
    The list of taxon names. If set to "False" a fake list of taxon names
    will be created, giving a positive numerical ID in increasing order for
    each column in the matrix.
revert : bool
    If set to "False", don't return taxon names but simply the language
    identifiers and their labels as a dictionary. Otherwise returns a
    dictionary with labels as keys and list of taxon names as values.

Returns
-------
clusters : dict
    Either a dictionary of taxon identifiers and labels, or a dictionary of
    labels and taxon names.

Notes
-----
Infomap clustering is a community detection method originally proposed by
:evobib:`Rosvall2008`.

Requires the igraph package, downloadable from http://igraph.org/.
"""
raise ValueError("The package igraph is needed to run this analysis.")
# variable stores edge weights, if they are not there, the network is
# already separated by the threshold
vertex_weights=None)
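The comment about edge weights suggests that the distance matrix is turned into a network by keeping only pairs within the threshold. A minimal sketch of that step follows, under stated assumptions: the function name `threshold_edges` is hypothetical, an edge is kept when the distance is at most the threshold, and the weight is taken as `1 - distance`. None of these details can be confirmed from this listing.

```python
def threshold_edges(matrix, threshold):
    """Hypothetical helper: list weighted edges for pairs within threshold."""
    edges = []
    for i, row in enumerate(matrix):
        for j, distance in enumerate(row):
            # visit each unordered pair once (upper triangle only)
            if i < j and distance <= threshold:
                edges.append((i, j, 1 - distance))
    return edges


distances = [
    [0.0, 0.25, 0.75],
    [0.25, 0.0, 0.5],
    [0.75, 0.5, 0.0],
]
edges = threshold_edges(distances, 0.5)
isolated = threshold_edges(distances, 0.1)
```

When no pair falls within the threshold the edge list is empty, which corresponds to the case described in the comment where the network is already separated. The resulting edges and weights could then be handed to a graph library's community detection routine.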