Dataset Base Classes
Abstract base classes for the dataset hierarchy.
Hierarchy
Dataset (ABC)
├── LabeledDataset (ABC)
└── UnlabeledDataset (ABC)
Concrete subclasses live in image_dataset.py and audio_dataset.py.
- class src.dataset.Dataset(root: str, lazy: bool = True)
Bases: ABC
Abstract base class for all datasets.
Subclasses must implement _scan_files(), _load_file(), and __getitem__().
- Attributes (private):
_root: Root folder path where data files are stored.
_lazy: Whether to load data lazily (on access) or eagerly (at init).
_file_paths: Paths discovered by _scan_files().
_data: Pre-loaded data objects when eager; None when lazy.
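The subclass contract described above can be illustrated with a minimal sketch. The real src.dataset.Dataset is not reproduced here: the Dataset class below is a simplified stand-in written from the documented attributes, and it assumes that _scan_files() returns a list of paths and that the constructor pre-loads data when lazy is False. TextDataset and demo() are hypothetical names used only for illustration.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Optional


class Dataset(ABC):
    """Simplified stand-in for src.dataset.Dataset (assumed, not the real code)."""

    def __init__(self, root: str, lazy: bool = True) -> None:
        self._root = root
        self._lazy = lazy
        # Assumption: _scan_files() returns the discovered paths.
        self._file_paths: list[str] = self._scan_files()
        # Eager mode pre-loads every file; lazy mode leaves _data as None.
        self._data: Optional[list[Any]] = (
            None if lazy else [self._load_file(p) for p in self._file_paths]
        )

    @abstractmethod
    def _scan_files(self) -> list[str]: ...

    @abstractmethod
    def _load_file(self, path: str) -> Any: ...

    @abstractmethod
    def __getitem__(self, index: int) -> Any: ...


class TextDataset(Dataset):
    """Hypothetical concrete subclass: one sample per .txt file."""

    def _scan_files(self) -> list[str]:
        # Sort so that indices are stable across runs.
        return sorted(str(p) for p in Path(self._root).glob("*.txt"))

    def _load_file(self, path: str) -> str:
        return Path(path).read_text(encoding="utf-8")

    def __getitem__(self, index: int) -> str:
        # Serve from the pre-loaded list when eager, otherwise read on access.
        if self._data is not None:
            return self._data[index]
        return self._load_file(self._file_paths[index])


def demo() -> tuple[str, str]:
    """Build a lazy and an eager dataset over two temporary files."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, text in [("a.txt", "alpha"), ("b.txt", "beta")]:
            Path(tmp, name).write_text(text, encoding="utf-8")
        lazy_ds = TextDataset(tmp, lazy=True)
        eager_ds = TextDataset(tmp, lazy=False)
        return lazy_ds[0], eager_ds[1]
```

In this sketch both loading modes return the same samples; they differ only in when the files are read.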
- split(train_ratio: float) → tuple[Dataset, Dataset]
Split the dataset into training and test subsets.
The dataset is shuffled randomly before splitting so that the distribution of examples is approximately balanced in both subsets.
- Parameters:
train_ratio – Fraction of data points to include in the training set. Must be in the open interval (0, 1).
- Returns:
A (train_dataset, test_dataset) tuple, each of the same concrete class as self.
- Raises:
TypeError – If train_ratio is not a float.
ValueError – If train_ratio is not strictly between 0 and 1.
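The documented behavior of split() (validate, shuffle, cut) can be sketched on a plain list. This is not the library's implementation: split_items is a hypothetical helper, the seed parameter is added here only to make the example repeatable, and truncating the cut index with int() is an assumption about rounding.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def split_items(
    items: Sequence[T], train_ratio: float, seed: Optional[int] = None
) -> tuple[list[T], list[T]]:
    """Shuffle a copy of items and cut it into (train, test) parts."""
    # Mirror the documented errors: the type check comes before the range check.
    if not isinstance(train_ratio, float):
        raise TypeError("train_ratio must be a float")
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")

    shuffled = list(items)  # copy, so the caller's sequence is left untouched
    random.Random(seed).shuffle(shuffled)

    cut = int(len(shuffled) * train_ratio)  # assumed rounding rule: truncate
    return shuffled[:cut], shuffled[cut:]


train, test = split_items(list(range(10)), 0.8, seed=0)
print(len(train), len(test))
```

Note that an integer such as 1 is rejected with TypeError under this check, even though it would also fail the range test.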
- property root: str
Root folder path.
- property lazy: bool
Whether the dataset uses lazy loading.
- class src.dataset.LabeledDataset(root: str, lazy: bool = True)
Bases: Dataset, ABC
Abstract base class for datasets that carry per-sample labels.
Adds a _labels list that is populated by _load_labels() and kept parallel to _file_paths.
Concrete subclasses must still implement _scan_files() and _load_file(). They should call _load_labels() after super().__init__() (which calls _scan_files()).
- property labels: list[Any]
Per-sample labels, parallel to the file-path list.
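The initialization order for labeled datasets (scan files first, then load labels) might look like the sketch below. The LabeledDataset class here is a simplified stand-in, not the real src.dataset.LabeledDataset: it omits _load_file() and __getitem__() for brevity and assumes that _load_labels() is an overridable hook that fills _labels. FolderLabeledDataset and demo_labels() are hypothetical, and deriving the label from the parent folder name is only one possible convention.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class LabeledDataset(ABC):
    """Simplified stand-in for src.dataset.LabeledDataset (assumed behavior)."""

    def __init__(self, root: str, lazy: bool = True) -> None:
        self._root = root
        self._lazy = lazy
        self._file_paths: list[str] = self._scan_files()
        self._labels: list[Any] = []

    @abstractmethod
    def _scan_files(self) -> list[str]: ...

    @abstractmethod
    def _load_labels(self) -> None: ...

    @property
    def labels(self) -> list[Any]:
        return self._labels


class FolderLabeledDataset(LabeledDataset):
    """Hypothetical subclass: the label is the name of the parent folder."""

    def __init__(self, root: str, lazy: bool = True) -> None:
        super().__init__(root, lazy)  # runs _scan_files() first
        self._load_labels()           # then fills _labels in the same order

    def _scan_files(self) -> list[str]:
        return sorted(str(p) for p in Path(self._root).glob("*/*.txt"))

    def _load_labels(self) -> None:
        # One label per path keeps _labels parallel to _file_paths.
        self._labels = [Path(p).parent.name for p in self._file_paths]


def demo_labels() -> list[str]:
    """Create a small folder-per-label tree and return the derived labels."""
    with tempfile.TemporaryDirectory() as tmp:
        for label, name in [("cat", "1.txt"), ("dog", "2.txt"), ("cat", "3.txt")]:
            folder = Path(tmp, label)
            folder.mkdir(exist_ok=True)
            Path(folder, name).write_text("x", encoding="utf-8")
        return FolderLabeledDataset(tmp).labels
```

Calling _load_labels() only after super().__init__() matters because the labels are derived from _file_paths, which does not exist until _scan_files() has run.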