src.clustering
==============

.. py:module:: src.clustering


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/src/clustering/affinity/index
   /autoapi/src/clustering/dbscan/index
   /autoapi/src/clustering/hdbscan/index
   /autoapi/src/clustering/kmeans/index
   /autoapi/src/clustering/optics/index
   /autoapi/src/clustering/spectral/index


Classes
-------

.. autoapisummary::

   src.clustering.BaseClustering


Functions
---------

.. autoapisummary::

   src.clustering.get_clustering_model


Package Contents
----------------

.. py:class:: BaseClustering(dataset_name: str = None, embedding_method: str = None, dataset_id: int = None, embedding_id: int = None, embeddings: pandas.DataFrame = None)

   Base class for clustering models used to group data based on embeddings or other features.

   This class provides core functionality for clustering models, including loading data
   and storing clustering results. It is meant to be subclassed by specific clustering
   algorithms, which should implement their own logic for fitting the model and
   predicting clusters.

   Attributes:
   -----------
   data : pd.DataFrame or None
       DataFrame containing the data to be clustered.
   labels : pd.Series or None
       Series containing the cluster labels assigned to the data.

   Methods:
   --------
   load_data(file_path: str) -> pd.DataFrame:
       Loads a dataset from a CSV or pickle file into a pandas DataFrame.
   save_labels(file_path: str):
       Saves the cluster labels to a CSV or pickle file.


   .. py:attribute:: dataset_name


   .. py:attribute:: embedding_method


   .. py:attribute:: dataset_id


   .. py:attribute:: embedding_id


   .. py:attribute:: embeddings


   .. py:attribute:: data
      :value: None


   .. py:attribute:: labels
      :value: None


   .. py:method:: load_data() -> pandas.DataFrame

      Load the data to be clustered from a CSV or pickle file.

      :param file_path: Path to the data file (CSV or pickle format).
      :return: DataFrame containing the loaded data.


   .. py:method:: scale_data(data)


   .. py:method:: save_labels(labels)

      Save the cluster labels to a CSV or pickle file.

      :param file_path: Path where the cluster labels will be saved.


   .. py:method:: fit_predict()

      :param self.embeddings: DataFrame containing the data to be clustered.
      :return: DataFrame containing the cluster labels assigned to the data.


.. py:function:: get_clustering_model(method_name: str, *args, **kwargs)
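
The subclassing contract described for ``BaseClustering`` and the name-based lookup suggested by the signature of ``get_clustering_model`` can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: ``SketchBaseClustering``, ``SignClustering``, ``_REGISTRY`` and ``get_clustering_model_sketch`` are hypothetical stand-ins, the registry-based dispatch is an assumption (only the function signature is documented here), and plain lists replace the ``pandas.DataFrame`` embeddings so the example runs with the standard library alone.

```python
# Illustrative stand-ins only: none of these names are part of src.clustering.
from typing import Dict, List, Optional, Sequence, Type


class SketchBaseClustering:
    """Stand-in for BaseClustering, reduced to the documented constructor arguments."""

    def __init__(
        self,
        dataset_name: Optional[str] = None,
        embedding_method: Optional[str] = None,
        dataset_id: Optional[int] = None,
        embedding_id: Optional[int] = None,
        embeddings: Optional[Sequence[Sequence[float]]] = None,
    ) -> None:
        self.dataset_name = dataset_name
        self.embedding_method = embedding_method
        self.dataset_id = dataset_id
        self.embedding_id = embedding_id
        # A pandas.DataFrame in the documented class; a list of rows in this sketch.
        self.embeddings = embeddings
        self.data = None
        self.labels = None

    def fit_predict(self) -> List[int]:
        # Each concrete algorithm is expected to supply its own clustering logic.
        raise NotImplementedError("subclasses implement the clustering logic")


class SignClustering(SketchBaseClustering):
    """Hypothetical toy algorithm: label 1 if the first feature is >= 0, else 0."""

    def fit_predict(self) -> List[int]:
        self.labels = [1 if row[0] >= 0 else 0 for row in self.embeddings]
        return self.labels


# Assumed dispatch mechanism: a mapping from method names to subclasses.
_REGISTRY: Dict[str, Type[SketchBaseClustering]] = {"sign": SignClustering}


def get_clustering_model_sketch(method_name: str, *args, **kwargs) -> SketchBaseClustering:
    """Look up a class by name and forward the remaining arguments to its constructor."""
    try:
        model_cls = _REGISTRY[method_name]
    except KeyError:
        raise ValueError(f"unknown clustering method: {method_name!r}") from None
    return model_cls(*args, **kwargs)


model = get_clustering_model_sketch(
    "sign", embeddings=[[0.5, 1.0], [-2.0, 0.3], [0.0, -1.0]]
)
labels = model.fit_predict()
```

Forwarding ``*args`` and ``**kwargs`` unchanged keeps the factory independent of each algorithm's constructor, which is one plausible reason for the documented signature.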