src.embeddings
Submodules
Attributes
- logger
Classes
- BaseEmbedding: Base class for embedding models used to generate embeddings from audio data.
Functions
- _split_audio_into_chunks(filename, chunk_duration[, sampling_rate]): Splits an audio file into non-overlapping chunks of specified duration.
Package Contents
- src.embeddings.logger
- src.embeddings._split_audio_into_chunks(filename: str, chunk_duration: float, sampling_rate: int or None = None) -> pandas.DataFrame
Splits an audio file into non-overlapping chunks of specified duration.
- Args:
  - filename (str): Path to the audio file.
  - chunk_duration (float): Duration of each chunk in seconds.
  - sampling_rate (int, optional): Sampling rate at which to load the audio file. If None, librosa's default sampling rate is used.
- Returns:
  - pd.DataFrame: A DataFrame with two columns:
    - ‘filename’: The chunk filename, formatted as ‘original_filename_starttime_endtime.ext’.
    - ‘audio_data’: The corresponding chunk of audio data.
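The chunking step can be sketched as below. This is a minimal sketch under stated assumptions, not the package's implementation: `split_signal_into_chunks` is a hypothetical helper that works on an already-loaded signal, so the example runs without audio files (the documented function loads the file itself through librosa). The exact time formatting in chunk names and the dropping of a trailing partial chunk are also assumptions.

```python
import os

import numpy as np
import pandas as pd


def split_signal_into_chunks(signal, sampling_rate, filename, chunk_duration):
    """Hypothetical helper: split an already-loaded 1-D signal into
    non-overlapping chunks of `chunk_duration` seconds.

    Assumption: trailing samples that do not fill a whole chunk are dropped.
    """
    samples_per_chunk = int(round(chunk_duration * sampling_rate))
    if samples_per_chunk <= 0:
        raise ValueError("chunk_duration * sampling_rate must be positive")

    root, ext = os.path.splitext(filename)  # "bird.wav" -> ("bird", ".wav")
    n_chunks = len(signal) // samples_per_chunk  # number of full chunks only

    rows = []
    for i in range(n_chunks):
        start = i * samples_per_chunk
        end = start + samples_per_chunk
        # Assumed naming: start/end times in seconds, formatted compactly ("0", "3", "1.5")
        name = f"{root}_{start / sampling_rate:g}_{end / sampling_rate:g}{ext}"
        rows.append({"filename": name, "audio_data": signal[start:end]})
    return pd.DataFrame(rows, columns=["filename", "audio_data"])


# Usage: 10 s of synthetic audio at 100 Hz, cut into 3 s chunks
signal = np.arange(10 * 100, dtype=float)
chunks = split_signal_into_chunks(signal, 100, "bird.wav", 3.0)
# chunks["filename"] holds bird_0_3.wav, bird_3_6.wav, bird_6_9.wav (the last 1 s is dropped)
```

To chunk a real file, the signal could first be loaded with `librosa.load(filename, sr=sampling_rate)`, which returns a `(signal, sampling_rate)` tuple.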
- class src.embeddings.BaseEmbedding(dataset_name: str, clip_duration: float = 3.0, model_path: str | pathlib.Path | None = None, sampling_rate: int | None = None)
Base class for embedding models used to generate embeddings from audio data.
This class provides core functionality for embedding models, including loading the model, reading and processing audio datasets, and optionally using Dask for distributed processing. It is meant to be subclassed by specific embedding models, which should implement their own model loading and processing logic.
Attributes:
- data : pd.DataFrame or None
DataFrame holding the processed audio data (e.g., file paths, audio features).
- embeddings : pd.DataFrame or None
DataFrame containing the generated embeddings for the audio dataset.
Methods:
- load_model():
Abstract method for loading the model. Must be implemented by subclasses.
:raises NotImplementedError: This method must be implemented by subclasses for model loading.
- process(audio_files):
Abstract method for processing audio files to generate embeddings. Must be implemented by subclasses.
:raises NotImplementedError: This method must be implemented by subclasses to generate embeddings.
- read_audio_dataset() -> pd.DataFrame:
Reads and processes an audio dataset, optionally using Dask for parallel processing. Returns a pandas DataFrame containing the audio file paths and other metadata.
:return: A pandas DataFrame indexed by ‘filename’, with a data column ‘audio_data’ containing the processed audio chunks.
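The subclassing contract described above can be sketched as follows. This is a self-contained illustration, not the package's code: `BaseEmbeddingSketch` is a hypothetical stand-in that mirrors only the documented constructor and the `load_model()`/`process()` hooks, and `RMSEmbedding` is a hypothetical toy subclass. The argument passed to `process()` is an assumption: here it is a DataFrame shaped like the output of `read_audio_dataset()` (indexed by `filename`, with an `audio_data` column).

```python
import abc

import numpy as np
import pandas as pd


class BaseEmbeddingSketch(abc.ABC):
    """Hypothetical stand-in mirroring the documented BaseEmbedding interface.

    Not the package's implementation: it only stores the constructor
    arguments and declares the hooks subclasses are expected to provide.
    """

    def __init__(self, dataset_name, clip_duration=3.0, model_path=None, sampling_rate=None):
        self.dataset_name = dataset_name
        self.clip_duration = clip_duration
        self.model_path = model_path
        self.sampling_rate = sampling_rate
        self.data = None        # processed audio chunks (not filled in this sketch)
        self.embeddings = None  # filled by process()

    def load_model(self):
        """Optional hook; this toy example has no model weights to load."""
        return None

    @abc.abstractmethod
    def process(self, audio_df):
        """Return a DataFrame of embeddings, one row per audio chunk."""


class RMSEmbedding(BaseEmbeddingSketch):
    """Toy subclass: embeds each chunk as [RMS energy, peak absolute amplitude]."""

    def process(self, audio_df):
        features = []
        for chunk in audio_df["audio_data"]:
            x = np.asarray(chunk, dtype=float)
            features.append([float(np.sqrt(np.mean(x ** 2))), float(np.max(np.abs(x)))])
        # Keep the 'filename' index so each embedding lines up with its chunk
        self.embeddings = pd.DataFrame(features, index=audio_df.index, columns=["rms", "peak"])
        return self.embeddings


# Usage with two hand-made chunks
chunks = pd.DataFrame(
    [
        {"filename": "a_0_3.wav", "audio_data": np.ones(4)},
        {"filename": "a_3_6.wav", "audio_data": np.array([3.0, -4.0])},
    ]
).set_index("filename")
model = RMSEmbedding("toy_dataset")
model.load_model()
emb = model.process(chunks)
# first chunk: rms = 1.0, peak = 1.0; second chunk: rms = sqrt(12.5), peak = 4.0
```

In a real subclass, `load_model()` would presumably load weights from `model_path` and `process()` would run that model on each chunk; the RMS and peak features only stand in for a learned embedding.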
- dataset_name
- model_path
- sampling_rate
- clip_duration
- data
- embeddings
- list_of_audio_files = []
- path_dataset = None
- load_model()
Placeholder method for loading the model. This should be implemented by subclasses if needed.
- abstract process()
Placeholder method for processing audio files. This should be implemented by subclasses.
- read_audio_dataset() -> pandas.DataFrame
Read the dataset of audio files and process it, using Dask for parallelization if it is available.
- Returns:
A pandas DataFrame containing audio file paths and any other relevant metadata.
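The "use Dask if available" behaviour of `read_audio_dataset()` might look roughly like the sketch below. `read_dataset_sketch`, its `use_dask` flag, and the `chunk_fn` parameter are hypothetical: `chunk_fn` stands for any per-file chunking function that returns a DataFrame with `filename` and `audio_data` columns. The sketch uses `dask.delayed` and `dask.compute` when Dask can be imported and falls back to a plain loop otherwise; how the package actually schedules this work is an assumption.

```python
import pandas as pd

try:
    import dask  # optional dependency
except ImportError:
    dask = None


def read_dataset_sketch(paths, chunk_fn, use_dask=True):
    """Hypothetical: chunk every file in `paths` and stack the results.

    Returns a DataFrame indexed by 'filename' with an 'audio_data' column.
    """
    if use_dask and dask is not None:
        # Build one lazy task per file, then execute them in parallel
        tasks = [dask.delayed(chunk_fn)(path) for path in paths]
        frames = list(dask.compute(*tasks))  # results come back in input order
    else:
        # Serial fallback when Dask is unavailable or disabled
        frames = [chunk_fn(path) for path in paths]

    if not frames:
        return pd.DataFrame(columns=["audio_data"], index=pd.Index([], name="filename"))
    return pd.concat(frames, ignore_index=True).set_index("filename")


# Usage with a fake chunker so the example needs no audio files
def fake_chunker(path):
    return pd.DataFrame({"filename": [f"{path}_0_3", f"{path}_3_6"], "audio_data": [0.1, 0.2]})


dataset = read_dataset_sketch(["a", "b"], fake_chunker)
# dataset.index holds a_0_3, a_3_6, b_0_3, b_3_6
```

Keeping the Dask import optional means the same code path works on machines without Dask installed, at the cost of processing files one at a time there.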