skvai.tasks package

Submodules

skvai.tasks.classification module

class skvai.tasks.classification.Task

Bases: object

load_data(path)

set_target(col_name)

Call this before load_data() to override the default target column.

train_and_output(format='graph')
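
The ordering note on set_target() can be sketched as a small helper. This is a minimal sketch, not part of skvai: run_task is a hypothetical name, and it relies only on the three methods listed above. The file name and column name in the usage comment are placeholders.

```python
def run_task(task, csv_path, target=None, output_format="graph"):
    """Drive a Task-like object in the order the docs describe.

    `run_task` is a hypothetical helper, not part of skvai. It uses only the
    three documented methods: set_target, load_data, train_and_output.
    """
    if target is not None:
        # Documented requirement: override the target *before* loading data.
        task.set_target(target)
    task.load_data(csv_path)
    return task.train_and_output(format=output_format)


# Usage with the real class (requires skvai; file and column are placeholders):
#   from skvai.tasks.classification import Task
#   run_task(Task(), "data.csv", target="label")
```

Passing the task object in, instead of constructing it inside the helper, keeps the call order easy to check with a stand-in object.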

skvai.tasks.clustering module

Clustering task module for skvai.

skvai.tasks.clustering.cluster(data: CSVData, model: str = 'KMeans', n_clusters: int = 3, output: list = ['labels'], save_path: str = 'clusterer.pkl', labels_csv: str = 'labels.csv', random_state: int = 42)

Fit and apply a clustering model on CSVData.

Parameters:
  • data (CSVData) – Loaded dataset with attributes X and df.

  • model (str) – Which clusterer: ‘KMeans’ or ‘DBSCAN’.

  • n_clusters (int) – Number of clusters for KMeans.

  • output (list) – What to output: any of ‘metrics’, ‘plot’, ‘csv’, ‘save’. The signature default, [‘labels’], is not among these listed options.

  • save_path (str) – Path to save the clusterer (.pkl).

  • labels_csv (str) – Path to save full-data labels (.csv).

  • random_state (int) – Random seed for reproducibility.

Returns:

Dictionary with the keys ‘labels’ and ‘model’.

Return type:

dict
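
A call that spells out every documented argument might look like the sketch below. run_kmeans is a hypothetical helper, not part of skvai. The function is passed in as cluster_fn so the sketch runs without the package installed, and how the CSVData object is loaded is left to the caller because this page does not document it.

```python
from pathlib import Path


def run_kmeans(cluster_fn, data, n_clusters=3, out_dir="."):
    """Call a cluster()-style function with the documented keyword arguments.

    `run_kmeans` is a hypothetical helper. `cluster_fn` is expected to be
    skvai.tasks.clustering.cluster and `data` an already loaded CSVData object.
    """
    out = Path(out_dir)
    result = cluster_fn(
        data,
        model="KMeans",
        n_clusters=n_clusters,
        output=["metrics", "save"],  # two of the documented output options
        save_path=str(out / "clusterer.pkl"),
        labels_csv=str(out / "labels.csv"),
        random_state=42,
    )
    # The documented return value is a dict with 'labels' and 'model'.
    return result["labels"], result["model"]


# Usage with the real function (requires skvai and a loaded CSVData object):
#   from skvai.tasks.clustering import cluster
#   labels, model = run_kmeans(cluster, data, n_clusters=4, out_dir="results")
```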

skvai.tasks.regression module

skvai.tasks.regression.regress(data: CSVData, model: str = 'LinearRegression', output: list = ['metrics'], save_path: str = 'regressor.pkl', prediction_csv: str = 'predictions.csv', test_size: float = 0.2, random_state: int = 42)

Train and evaluate a regression model on CSVData.

Parameters:
  • data (CSVData) – Loaded dataset with X, y, df.

  • model (str) – ‘LinearRegression’ or ‘RandomForestRegressor’.

  • output (list) – What to output: any of ‘metrics’, ‘plot’, ‘csv’, ‘save’.

  • save_path (str) – Path to save the trained model.

  • prediction_csv (str) – Path to save the predictions.

  • test_size (float) – Fraction of the data held out for testing.

  • random_state (int) – Seed for reproducibility.

Returns:

Dictionary containing mse, r2, predictions, and model.

Return type:

dict
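
One way to use the returned metrics is to run both documented models with the same arguments and compare them. The sketch below is illustrative only: compare_models is a hypothetical helper, the function is passed in as regress_fn so the code runs without skvai installed, and the key names 'mse' and 'r2' are an assumption based on the Returns line above.

```python
def compare_models(regress_fn, data,
                   models=("LinearRegression", "RandomForestRegressor")):
    """Run a regress()-style function once per model and pick the lowest MSE.

    `compare_models` is a hypothetical helper. `regress_fn` is expected to be
    skvai.tasks.regression.regress and `data` a loaded CSVData object.
    """
    scores = {}
    for name in models:
        result = regress_fn(
            data,
            model=name,
            output=["metrics"],
            test_size=0.2,
            random_state=42,  # same seed and test_size for every model
        )
        # Assumption: the result dict uses the keys 'mse' and 'r2'.
        scores[name] = (result["mse"], result["r2"])
    best = min(scores, key=lambda n: scores[n][0])
    return best, scores


# Usage with the real function (requires skvai and a loaded CSVData object):
#   from skvai.tasks.regression import regress
#   best, scores = compare_models(regress, data)
```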

Module contents

class skvai.tasks.Task

Bases: object

load_data(path)

set_target(col_name)

Call this before load_data() to override the default target column.

train_and_output(format='graph')

skvai.tasks.cluster(data: CSVData, model: str = 'KMeans', n_clusters: int = 3, output: list = ['labels'], save_path: str = 'clusterer.pkl', labels_csv: str = 'labels.csv', random_state: int = 42)

Fit and apply a clustering model on CSVData.

Parameters:
  • data (CSVData) – Loaded dataset with attributes X and df.

  • model (str) – Which clusterer: ‘KMeans’ or ‘DBSCAN’.

  • n_clusters (int) – Number of clusters for KMeans.

  • output (list) – What to output: any of ‘metrics’, ‘plot’, ‘csv’, ‘save’. The signature default, [‘labels’], is not among these listed options.

  • save_path (str) – Path to save the clusterer (.pkl).

  • labels_csv (str) – Path to save full-data labels (.csv).

  • random_state (int) – Random seed for reproducibility.

Returns:

Dictionary with the keys ‘labels’ and ‘model’.

Return type:

dict

skvai.tasks.regress(data: CSVData, model: str = 'LinearRegression', output: list = ['metrics'], save_path: str = 'regressor.pkl', prediction_csv: str = 'predictions.csv', test_size: float = 0.2, random_state: int = 42)

Train and evaluate a regression model on CSVData.

Parameters:
  • data (CSVData) – Loaded dataset with X, y, df.

  • model (str) – ‘LinearRegression’ or ‘RandomForestRegressor’.

  • output (list) – What to output: any of ‘metrics’, ‘plot’, ‘csv’, ‘save’.

  • save_path (str) – Path to save the trained model.

  • prediction_csv (str) – Path to save the predictions.

  • test_size (float) – Fraction of the data held out for testing.

  • random_state (int) – Seed for reproducibility.

Returns:

Dictionary containing mse, r2, predictions, and model.

Return type:

dict
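
The package-level names listed under Module contents carry the same signatures as the submodule functions, but this page does not state whether they are the same objects. A quick way to check is sketched below. same_object is a hypothetical helper built on the standard library's importlib and demonstrated on a standard-library module; the skvai paths in the usage comment require the package to be installed.

```python
import importlib


def same_object(path_a, path_b):
    """Return True if two dotted paths resolve to the identical object.

    `same_object` is a hypothetical helper, not part of skvai. Each path must
    have the form "package.module.attribute".
    """
    def resolve(path):
        module_name, _, attr = path.rpartition(".")
        return getattr(importlib.import_module(module_name), attr)

    return resolve(path_a) is resolve(path_b)


# Usage against skvai (requires the package to be installed):
#   same_object("skvai.tasks.cluster", "skvai.tasks.clustering.cluster")
#   same_object("skvai.tasks.regress", "skvai.tasks.regression.regress")
```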