Preprocessing

Preprocessing transforms and the Pipeline that chains them.

All transforms are callable classes: hyperparameters at construction time, data-only at call time.

Class hierarchy

Transform (ABC)
├── CenterCrop        image
├── RandomCrop        image
├── RandomFlip        image
├── Padding           image
├── MelSpectrogram    audio   (y, sr) → np.ndarray
├── AudioRandomCrop   audio   (y, sr) → (y, sr)
├── Resample          audio   (y, sr) → (y, sr)
├── PitchShift        audio   (y, sr) → (y, sr)
└── Pipeline          any     chains transforms sequentially

class src.preprocessing.Transform[source]

Bases: ABC

Abstract base class for all preprocessing transforms.

Every transform is callable: hyperparameters are fixed at construction, and __call__() receives only the data to transform.

class src.preprocessing.CenterCrop(height: int, width: int)[source]

Bases: Transform

Crop an image around its center to at most height × width.

A dimension is only cropped when the image is strictly larger than the target in that dimension; smaller dimensions are left unchanged.

Parameters:
  • height – Maximum output height in pixels.

  • width – Maximum output width in pixels.

Raises:
  • TypeError – If height or width is not an int.

  • ValueError – If either dimension is less than 1.

property height: int

Target height.

property width: int

Target width.
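The crop arithmetic described above can be sketched in NumPy (a standalone illustration assuming H × W(×C) array images; the function name is hypothetical, not the package's API):

```python
import numpy as np

def center_crop(img: np.ndarray, height: int, width: int) -> np.ndarray:
    """Crop img around its center to at most height x width.

    Dimensions smaller than the target are left unchanged.
    """
    h, w = img.shape[:2]
    new_h, new_w = min(h, height), min(w, width)
    top = (h - new_h) // 2
    left = (w - new_w) // 2
    return img[top:top + new_h, left:left + new_w]

img = np.zeros((100, 80, 3))
print(center_crop(img, 64, 64).shape)   # (64, 64, 3)
print(center_crop(img, 64, 200).shape)  # width already smaller: (64, 80, 3)
```

Note how the second call leaves the width untouched, matching the "strictly larger" rule above.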

class src.preprocessing.RandomCrop(height: int, width: int)[source]

Bases: Transform

Crop an image at a random position to at most height × width.

A dimension is only cropped when the image is strictly larger than the target in that dimension; smaller dimensions are left unchanged.

Parameters:
  • height – Maximum output height in pixels.

  • width – Maximum output width in pixels.

Raises:
  • TypeError – If height or width is not an int.

  • ValueError – If either dimension is less than 1.

property height: int

Target height.

property width: int

Target width.

class src.preprocessing.RandomFlip(p: float = 0.5)[source]

Bases: Transform

Randomly flip an image along its horizontal and/or vertical axis.

Each axis is flipped independently with probability p.

Parameters:

p – Probability of flipping along each axis. Must be in [0, 1].

Raises:
  • TypeError – If p is not a float.

  • ValueError – If p is outside [0, 1].

property p: float

Flip probability per axis.

class src.preprocessing.Padding(height: int, width: int, color: tuple[int, int, int] = (0, 0, 0))[source]

Bases: Transform

Pad an image to at least height × width using a solid colour.

Padding is added symmetrically. Dimensions that already meet or exceed the target are not modified.

Parameters:
  • height – Minimum output height in pixels.

  • width – Minimum output width in pixels.

  • color – RGB fill colour as an (R, G, B) int tuple. Default black.

Raises:
  • TypeError – If height or width is not an int, or color is not a tuple.

  • ValueError – If either dimension is less than 1.

property height: int

Minimum output height.

property width: int

Minimum output width.

property color: tuple[int, int, int]

RGB fill colour.
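The symmetric-padding arithmetic can be sketched as follows (an illustrative standalone function, assuming H × W × 3 arrays; odd padding puts the extra row/column on the bottom/right):

```python
import numpy as np

def pad(img: np.ndarray, height: int, width: int,
        color: tuple[int, int, int] = (0, 0, 0)) -> np.ndarray:
    """Pad img symmetrically to at least height x width with a solid colour."""
    h, w = img.shape[:2]
    pad_h = max(height - h, 0)   # dimensions already large enough get 0
    pad_w = max(width - w, 0)
    top, left = pad_h // 2, pad_w // 2
    out = np.empty((h + pad_h, w + pad_w, 3), dtype=img.dtype)
    out[:] = color               # fill everything with the colour ...
    out[top:top + h, left:left + w] = img   # ... then paste the image back
    return out

img = np.full((50, 60, 3), 7, dtype=np.uint8)
out = pad(img, 100, 100, color=(255, 0, 0))
print(out.shape)  # (100, 100, 3)
```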

class src.preprocessing.MelSpectrogram(n_mels: int = 128, n_fft: int = 2048, hop_length: int = 512)[source]

Bases: Transform

Convert a waveform to a Mel spectrogram.

This transform changes the data type: input is (y, sr), output is a 2-D np.ndarray. Audio-specific transforms (e.g. Resample) cannot be chained after this one.

Parameters:
  • n_mels – Number of Mel frequency bands.

  • n_fft – FFT window size in samples.

  • hop_length – Hop length in samples between successive frames.

Raises:

TypeError – If any argument is not an int.

property n_mels: int

Number of Mel frequency bands.

property n_fft: int

FFT window size in samples.

property hop_length: int

Hop length in samples.
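Assuming a librosa-style centred STFT (the backend is not stated here), the output shape is (n_mels, 1 + len(y) // hop_length). With the defaults and 5 s of audio at 22.05 kHz:

```python
sr = 22050
n_mels, hop_length = 128, 512
n_samples = 5 * sr                       # 5 s of audio -> 110250 samples
n_frames = 1 + n_samples // hop_length   # centred (padded) STFT framing
print((n_mels, n_frames))  # (128, 216)
```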
class src.preprocessing.AudioRandomCrop(duration: float)[source]

Bases: Transform

Randomly crop an audio track to a fixed duration.

If the track is shorter than or equal to duration seconds, the original track is returned unchanged.

Parameters:

duration – Target duration in seconds (must be positive).

Raises:
  • TypeError – If duration is not numeric.

  • ValueError – If duration is not positive.

property duration: float

Target duration in seconds.

class src.preprocessing.Resample(target_sr: int)[source]

Bases: Transform

Resample an audio track to a new sampling rate.

Parameters:

target_sr – Target sampling rate in Hz.

Raises:
  • TypeError – If target_sr is not an int.

  • ValueError – If target_sr is less than 1.

property target_sr: int

Target sampling rate in Hz.
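A naive linear-interpolation sketch of the length arithmetic (real resamplers use band-limited filtering, e.g. polyphase resampling; this function is only illustrative):

```python
import numpy as np

def resample(y: np.ndarray, sr: int, target_sr: int):
    """Resample y from sr to target_sr by linear interpolation."""
    n_out = round(len(y) * target_sr / sr)
    t_old = np.arange(len(y)) / sr          # original sample times
    t_new = np.arange(n_out) / target_sr    # target sample times
    return np.interp(t_new, t_old, y), target_sr

y = np.sin(np.linspace(0, 10, 44100))       # 1 s at 44.1 kHz
out, sr = resample(y, 44100, 22050)
print(len(out))  # 22050
```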

class src.preprocessing.PitchShift(n_steps: float)[source]

Bases: Transform

Shift the pitch of an audio track by a fixed number of semitones.

Parameters:

n_steps – Semitones to shift (positive = up, negative = down).

Raises:

TypeError – If n_steps is not numeric.

property n_steps: float

Semitone shift applied.
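The semitone convention: a shift of n_steps scales every frequency by 2 ** (n_steps / 12), so +12 steps is exactly one octave up and −12 one octave down.

```python
ratio = 2 ** (7 / 12)    # +7 semitones (a perfect fifth)
print(round(ratio, 4))   # 1.4983
octave = 2 ** (12 / 12)  # +12 semitones doubles every frequency
```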

class src.preprocessing.Pipeline(*transforms: Transform)[source]

Bases: Transform

Chain transforms and apply them sequentially.

Takes a variable number of Transform instances and applies them left to right. Order matters: some transforms change the data type (e.g. MelSpectrogram) and cannot be followed by transforms that expect the original type.

Parameters:

*transforms – Transform instances to apply in order.

Raises:

TypeError – If any positional argument is not a Transform.

Example:

pipeline = Pipeline(
    AudioRandomCrop(duration=5.0),
    Resample(target_sr=22050),
    MelSpectrogram(n_mels=128),
)
spectrogram = pipeline((y, sr))

property transforms: tuple[Transform, ...]

The ordered tuple of transforms in this pipeline.