vbvarsel

Submodules

Attributes

__version__

__authors__

Classes

Hyperparameters

Class representing the hyperparameters for the simulation.

SimulationParameters

Class representing the simulation parameters.

ExperimentValues

Functions

main(hyperparameters, simulation_parameters, Ctrick, user_data, user_labels, cols_to_skip, ...)

The main entry point to the package.

Package Contents

vbvarsel.main(hyperparameters: vbvarsel.global_parameters.Hyperparameters, simulation_parameters: vbvarsel.global_parameters.SimulationParameters = SimulationParameters(), Ctrick: bool = True, user_data: str | os.PathLike = None, user_labels: str | list[str] = None, cols_to_skip: list[str] = None, annealing_type: str = 'fixed', save_output: bool = False) -> _Results

The main entry point to the package.

Params

hyperparameters: Hyperparameters (Required)

An object of hyperparameters to apply to the simulation.

simulation_parameters: SimulationParameters (Optional) (Default: SimulationParameters())

An object of simulation parameters to apply to the simulation. Note: these parameters are required when a user does not supply their own data; the default SimulationParameters() is used if none are passed.

Ctrick: bool (Optional) (Default: True)

Flag that determines whether to apply the replica trick to the simulation.

user_data: str or os.PathLike (Optional) (Default: None)

The location of a CSV file containing the data a user wishes to test.

user_labels: str | list[str] (Optional) (Default: None)

A string or list of strings to identify labels. A string value will try to extract a column of the same name from the supplied data.

cols_to_skip: list[str] (Optional) (Default: None)

An optional list of columns to drop from the dataframe. This should be used to remove any non-numeric data from the dataframe. If a column shares the same name as a label column, the labels will be extracted before the column is dropped.

Hint: an unnamed column can be passed by using “Unnamed: [index]”, e.g. “Unnamed: 0” to drop a first column with a blank name.
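The “Unnamed: [index]” convention matches how pandas names columns with blank headers. The sketch below illustrates that naming and the extract-then-drop order described above. It assumes the package reads the CSV with pandas; the column names and data are hypothetical.

```python
import io

import pandas as pd

# Hypothetical CSV whose first header cell is blank (e.g. a saved row index).
csv_text = ",gene_a,gene_b,label\n0,1.2,3.4,x\n1,5.6,7.8,y\n"

df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))  # the blank header is named "Unnamed: 0" by pandas

# Extract the labels first, then drop the unnamed and non-numeric columns.
labels = df["label"].tolist()
cols_to_skip = ["Unnamed: 0", "label"]
numeric = df.drop(columns=cols_to_skip)
```

Only the numeric feature columns remain in `numeric`, which is the shape of input the description of cols_to_skip implies.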

annealing_type: str (Optional) (Default: “fixed”)

Optional type of annealing to apply to the simulation. Can be one of “geometric”, “harmonic” or “fixed”; the last of these does not apply any annealing.
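To give a feel for how the three options differ, here is a minimal sketch of temperature schedules. The formulas are common textbook forms chosen for illustration and are assumptions, not the package's implementation; the function name `temperature` is hypothetical, and the use of `t_max` and `max_annealed_itr` as schedule inputs is likewise assumed from the Hyperparameters field names.

```python
def temperature(step: int, t_max: float, max_annealed_itr: int,
                annealing_type: str = "fixed") -> float:
    """Return an illustrative temperature for a given iteration.

    Assumed behaviour: annealed schedules start at t_max and reach 1.0
    after max_annealed_itr steps; "fixed" stays at 1.0 throughout.
    """
    if annealing_type == "fixed" or step >= max_annealed_itr:
        return 1.0
    frac = step / max_annealed_itr
    if annealing_type == "geometric":
        # Multiplicative decay: t_max * r**step with r = t_max**(-1/max_annealed_itr).
        return t_max ** (1.0 - frac)
    if annealing_type == "harmonic":
        # Reciprocal decay, scaled so the schedule hits 1.0 at the final step.
        return t_max / (1.0 + (t_max - 1.0) * frac)
    raise ValueError(f"unknown annealing_type: {annealing_type!r}")


for kind in ("fixed", "geometric", "harmonic"):
    print(kind, [round(temperature(i, 10.0, 5, kind), 3) for i in range(6)])
```

Under these assumptions the geometric schedule cools by a constant ratio per step, while the harmonic schedule drops quickly at first and then flattens.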

save_output: bool (Optional) (Default: False)

Optional flag for users to save their output to a csv file. Data is saved in the current working directory with the file naming format “results-timestamp.csv”.

Returns

results: dataclass

An object holding the results of the clustering algorithm as a series of arrays. Some arrays may be populated with NaN values, for example when a user supplies their own data without corresponding labels. Additionally, some fields are only captured during entirely simulated runs and will therefore be NaN if a user provides their own dataset.
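Putting the parameters above together, a call on user-supplied data might look like the following sketch. It uses only the keyword arguments documented in the signature; the wrapper name `run_on_csv`, the label column name and the unnamed index column are hypothetical, and it assumes that Hyperparameters() can be constructed from its defaults.

```python
def run_on_csv(csv_path: str):
    """Cluster a user-supplied CSV and return the package's results object."""
    # Imported lazily so this sketch can be loaded without vbvarsel installed.
    from vbvarsel import Hyperparameters, main

    return main(
        hyperparameters=Hyperparameters(),  # assumes the defaults are constructible
        user_data=csv_path,
        user_labels="label",                # hypothetical label column name
        cols_to_skip=["Unnamed: 0"],        # hypothetical unnamed index column
        annealing_type="geometric",
        save_output=False,
    )
```

Because simulation_parameters is left at its default here, the run relies entirely on the supplied CSV, so simulation-only result fields should be expected to contain NaN.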

class vbvarsel.Hyperparameters

Class representing the hyperparameters for the simulation.

threshold: float = 0.1
k1: int = 5
alpha0: float
a0: int = 3
beta0: float = 0.001
d0: int = 1
t_max: int = 1
max_itr: int = 25
max_annealed_itr: int = 10
max_models: int = 10
__post_init__()
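alpha0 is listed without a default while the class defines __post_init__, which suggests a derived field. The sketch below shows that general dataclass pattern. The class name `HyperparametersSketch` is hypothetical and the rule alpha0 = 1 / k1 is an assumption made for illustration; the source does not state how the package computes alpha0.

```python
from dataclasses import dataclass, field


@dataclass
class HyperparametersSketch:
    """Illustrative stand-in, not the package's Hyperparameters class."""

    k1: int = 5
    # Excluded from __init__, so it may follow a defaulted field without one.
    alpha0: float = field(init=False)

    def __post_init__(self) -> None:
        # Assumed rule for illustration only: derive alpha0 from k1.
        self.alpha0 = 1 / self.k1


print(HyperparametersSketch())
print(HyperparametersSketch(k1=4))
```

With this pattern, callers set only the independent fields and the derived value stays consistent with them at construction time.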
class vbvarsel.SimulationParameters

Class representing the simulation parameters.

These parameters are the “settings” for the simulation experiment. For more information regarding the simulation, see [INSERT PAPER HERE]. Mixture proportions must be numbers between 0 and 1. Every value in the n_relevants array must be less than the n_variables value, as the number of relevant variables cannot exceed the total number of variables.
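The two constraints stated above can be checked before a run with a small helper such as the sketch below. The function name `validate_simulation_parameters` is hypothetical and not part of the package, and treating the bounds 0 and 1 as inclusive is an assumption.

```python
def validate_simulation_parameters(
    n_variables: int,
    n_relevants: list[int],
    mixture_proportions: list[float],
) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for p in mixture_proportions:
        # Assumption: the bounds 0 and 1 are treated as inclusive.
        if not 0 <= p <= 1:
            errors.append(f"mixture proportion {p} is not between 0 and 1")
    for r in n_relevants:
        # Each relevant-variable count must stay below the total variable count.
        if r >= n_variables:
            errors.append(f"n_relevants value {r} is not less than n_variables={n_variables}")
    return errors


print(validate_simulation_parameters(200, [10, 50], [0.5, 0.3, 0.2]))
print(validate_simulation_parameters(200, [250], [1.5]))
```

Returning a list of messages rather than raising on the first problem lets a caller report every invalid setting at once.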

n_observations: list[int]
n_variables: int = 200
n_relevants: list[int]
mixture_proportions: list[float]
means: list[int]
class vbvarsel.ExperimentValues
true_labels: list[int]
data: numpy.ndarray = None
permutations: list[int]
shuffled_data: numpy.ndarray = None
vbvarsel.__version__ = '0.0.1'
vbvarsel.__authors__ = ['Paul Kirk', 'Emma Prevot', 'Rory Toogood', 'Filippo Pagani', 'Alan Nardo']