src.clustering

Classes

BaseClustering

Base class for clustering models used to group data based on embeddings or other features.

Functions

get_clustering_model(method_name, *args, **kwargs)

Package Contents

class src.clustering.BaseClustering(dataset_name: str = None, embedding_method: str = None, dataset_id: int = None, embedding_id: int = None, embeddings: pandas.DataFrame = None)

Base class for clustering models used to group data based on embeddings or other features.

This class provides core functionality for clustering models, including loading data and storing clustering results. It is meant to be subclassed by specific clustering algorithms, which should implement their own logic for fitting the model and predicting clusters.
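The subclassing pattern described above can be sketched as follows. Because the real class is not imported here, `BaseClusteringSketch` is a stand-in that only mirrors the documented constructor arguments and attributes. `MedianSplitClustering`, its median rule, and the `NotImplementedError` in the stand-in base are assumptions made for illustration, not part of the package.

```python
import pandas as pd


class BaseClusteringSketch:
    """Stand-in mirroring the documented constructor; not the real BaseClustering."""

    def __init__(self, dataset_name=None, embedding_method=None,
                 dataset_id=None, embedding_id=None, embeddings=None):
        self.dataset_name = dataset_name
        self.embedding_method = embedding_method
        self.dataset_id = dataset_id
        self.embedding_id = embedding_id
        self.embeddings = embeddings
        self.data = None
        self.labels = None

    def fit_predict(self):
        # Assumption: the base class leaves the clustering logic to subclasses.
        raise NotImplementedError("Subclasses implement the clustering logic.")


class MedianSplitClustering(BaseClusteringSketch):
    """Hypothetical subclass: two clusters, split at the median of the first column."""

    def fit_predict(self):
        first = self.embeddings.iloc[:, 0]
        # Rows above the median get label 1, all others get label 0.
        self.labels = (first > first.median()).astype(int).rename("cluster")
        return self.labels.to_frame()


embeddings = pd.DataFrame({"x": [0.1, 0.2, 0.9, 1.0], "y": [1.0, 0.9, 0.2, 0.1]})
model = MedianSplitClustering(dataset_name="demo", embeddings=embeddings)
result = model.fit_predict()
```

A real subclass would replace the median rule with an actual clustering algorithm while keeping the same `fit_predict()` contract: read from `self.embeddings`, store the labels, and return them as a DataFrame.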

Attributes:

data : pd.DataFrame or None

DataFrame containing the data to be clustered.

labels : pd.Series or None

Series containing the cluster labels assigned to the data.

Methods:

load_data(file_path: str) -> pd.DataFrame:

Loads a dataset from a CSV or pickle file into a pandas DataFrame.

save_labels(file_path: str):

Saves the cluster labels to a CSV or pickle file.

dataset_name
embedding_method
dataset_id
embedding_id
embeddings
data = None
labels = None
load_data() -> pandas.DataFrame

Load the data to be clustered from a CSV or pickle file.

Parameters:

file_path – Path to the data file (CSV or pickle format).

Returns:

DataFrame containing the loaded data.
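One way to load "a CSV or pickle file" into a DataFrame is to pick the pandas reader from the file extension. The sketch below is a standalone helper, not the method's actual implementation. The name `load_table`, the accepted extensions, and the `ValueError` for other formats are all assumptions.

```python
import os
import tempfile

import pandas as pd


def load_table(file_path: str) -> pd.DataFrame:
    """Hypothetical helper: read a CSV or pickle file based on its extension."""
    ext = os.path.splitext(file_path)[1].lower()
    if ext == ".csv":
        return pd.read_csv(file_path)
    if ext in (".pkl", ".pickle"):
        return pd.read_pickle(file_path)
    raise ValueError(f"Unsupported file format: {ext!r}")


# Write a small CSV to a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "data.csv")
    pd.DataFrame({"a": [1, 2], "b": [3, 4]}).to_csv(csv_path, index=False)
    loaded = load_table(csv_path)
```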

scale_data(data)
save_labels(labels)

Save the cluster labels to a CSV or pickle file.

Parameters:

file_path – Path where the cluster labels will be saved.
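Saving labels can follow the same extension-based approach in the write direction. This is a minimal sketch under stated assumptions: `save_series` is a hypothetical standalone function, and both the extension handling and the error for unknown formats are illustrative choices rather than documented behaviour.

```python
import os
import tempfile

import pandas as pd


def save_series(labels: pd.Series, file_path: str) -> None:
    """Hypothetical helper: write labels to CSV or pickle based on the extension."""
    ext = os.path.splitext(file_path)[1].lower()
    if ext == ".csv":
        labels.to_csv(file_path, index=False)
    elif ext in (".pkl", ".pickle"):
        labels.to_pickle(file_path)
    else:
        raise ValueError(f"Unsupported file format: {ext!r}")


labels = pd.Series([0, 1, 1, 0], name="cluster")

# Round trip through a pickle file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "labels.pkl")
    save_series(labels, pkl_path)
    restored = pd.read_pickle(pkl_path)
```

Pickle preserves the Series name and dtype exactly, whereas CSV is easier to inspect by hand, so the choice depends on whether the labels are consumed by code or by people.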

fit_predict()

Parameters:

self.embeddings – DataFrame containing the data to be clustered.

Returns:

DataFrame containing the cluster labels assigned to the data.

src.clustering.get_clustering_model(method_name: str, *args, **kwargs)
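The source gives no description for this function. Judging only from its name and signature, it plausibly maps a method name to a clustering class and forwards the remaining arguments to that class. The registry-based sketch below illustrates that pattern. Every name in it (`KMeansSketch`, `DBSCANSketch`, `_REGISTRY`, `get_model_sketch`) and the supported method names are hypothetical.

```python
class KMeansSketch:
    """Hypothetical placeholder for a k-means clustering class."""

    def __init__(self, n_clusters=2):
        self.n_clusters = n_clusters


class DBSCANSketch:
    """Hypothetical placeholder for a DBSCAN clustering class."""

    def __init__(self, eps=0.5):
        self.eps = eps


# Assumed mapping from method name to class; the real names are not documented.
_REGISTRY = {"kmeans": KMeansSketch, "dbscan": DBSCANSketch}


def get_model_sketch(method_name: str, *args, **kwargs):
    """Look up a class by name and construct it with the forwarded arguments."""
    try:
        model_cls = _REGISTRY[method_name.lower()]
    except KeyError:
        raise ValueError(f"Unknown clustering method: {method_name!r}") from None
    return model_cls(*args, **kwargs)


model = get_model_sketch("kmeans", n_clusters=3)
```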