Preprocessing

Preprocessing transforms and the Pipeline that chains them.

All transforms are callable classes: hyperparameters at construction time, data-only at call time.

Class hierarchy

Transform (ABC)
├── CenterCrop        image
├── RandomCrop        image
├── RandomFlip        image
├── Padding           image
├── MelSpectrogram    audio   (y, sr) → np.ndarray
├── AudioRandomCrop   audio   (y, sr) → (y, sr)
├── Resample          audio   (y, sr) → (y, sr)
├── PitchShift        audio   (y, sr) → (y, sr)
└── Pipeline          any     chains transforms sequentially

class src.preprocessing.Transform[source]

Bases: ABC

Abstract base class for all preprocessing transforms.

Every transform is callable: hyperparameters are fixed at construction, and __call__() receives only the data to transform.

class src.preprocessing.CenterCrop(height: int, width: int)[source]

Bases: Transform

Crop an image around its center to at most height × width.

A dimension is only cropped when the image is strictly larger than the target in that dimension; smaller dimensions are left unchanged.

Parameters:
  • height – Maximum output height in pixels.

  • width – Maximum output width in pixels.

Raises:
  • TypeError – If height or width is not an int.

  • ValueError – If either dimension is less than 1.

property height: int

Target height.

property width: int

Target width.
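The crop arithmetic described above can be sketched in NumPy (a standalone illustration assuming H × W(×C) array images; the function name is hypothetical, not the package's API):

```python
import numpy as np

def center_crop(img: np.ndarray, height: int, width: int) -> np.ndarray:
    """Crop img around its center to at most height x width.

    Dimensions smaller than the target are left unchanged.
    """
    h, w = img.shape[:2]
    new_h, new_w = min(h, height), min(w, width)
    top = (h - new_h) // 2
    left = (w - new_w) // 2
    return img[top:top + new_h, left:left + new_w]

img = np.zeros((100, 80, 3))
print(center_crop(img, 64, 64).shape)   # (64, 64, 3)
print(center_crop(img, 64, 200).shape)  # width already smaller: (64, 80, 3)
```

Note how the second call leaves the width untouched, matching the "strictly larger" rule above.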

class src.preprocessing.RandomCrop(height: int, width: int)[source]

Bases: Transform

Crop an image at a random position to at most height × width.

A dimension is only cropped when the image is strictly larger than the target in that dimension; smaller dimensions are left unchanged.

Parameters:
  • height – Maximum output height in pixels.

  • width – Maximum output width in pixels.

Raises:
  • TypeError – If height or width is not an int.

  • ValueError – If either dimension is less than 1.

property height: int

Target height.

property width: int

Target width.

class src.preprocessing.RandomFlip(p: float = 0.5)[source]

Bases: Transform

Randomly flip an image along its horizontal and/or vertical axis.

Each axis is flipped independently with probability p.

Parameters:

p – Probability of flipping along each axis. Must be in [0, 1].

Raises:
  • TypeError – If p is not a float.

  • ValueError – If p is outside [0, 1].

property p: float

Flip probability per axis.

class src.preprocessing.Padding(height: int, width: int, color: tuple[int, int, int] = (0, 0, 0))[source]

Bases: Transform

Pad an image to at least height × width using a solid colour.

Padding is added symmetrically. Dimensions that already meet or exceed the target are not modified.

Parameters:
  • height – Minimum output height in pixels.

  • width – Minimum output width in pixels.

  • color – RGB fill colour as an (R, G, B) int tuple. Default black.

Raises:
  • TypeError – If height or width is not an int, or color is not a tuple.

  • ValueError – If either dimension is less than 1.

property height: int

Minimum output height.

property width: int

Minimum output width.

property color: tuple[int, int, int]

RGB fill colour.
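The symmetric-padding arithmetic can be sketched as follows (an illustrative standalone function, assuming H × W × 3 arrays; odd padding puts the extra row/column on the bottom/right):

```python
import numpy as np

def pad(img: np.ndarray, height: int, width: int,
        color: tuple[int, int, int] = (0, 0, 0)) -> np.ndarray:
    """Pad img symmetrically to at least height x width with a solid colour."""
    h, w = img.shape[:2]
    pad_h = max(height - h, 0)   # dimensions already large enough get 0
    pad_w = max(width - w, 0)
    top, left = pad_h // 2, pad_w // 2
    out = np.empty((h + pad_h, w + pad_w, 3), dtype=img.dtype)
    out[:] = color               # fill everything with the colour ...
    out[top:top + h, left:left + w] = img   # ... then paste the image back
    return out

img = np.full((50, 60, 3), 7, dtype=np.uint8)
out = pad(img, 100, 100, color=(255, 0, 0))
print(out.shape)  # (100, 100, 3)
```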

class src.preprocessing.MelSpectrogram(n_mels: int = 128, n_fft: int = 2048, hop_length: int = 512)[source]

Bases: Transform

Convert a waveform to a Mel spectrogram.

This transform changes the data type: input is (y, sr), output is a 2-D np.ndarray. Audio-specific transforms (e.g. Resample) cannot be chained after this one.

Parameters:
  • n_mels – Number of Mel frequency bands.

  • n_fft – FFT window size in samples.

  • hop_length – Hop length in samples between successive frames.

Raises:

TypeError – If any argument is not an int.

property n_mels: int

Number of Mel frequency bands.

property n_fft: int

FFT window size in samples.

property hop_length: int

Hop length in samples.
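Assuming a librosa-style centred STFT (the backend is not stated here), the output shape is (n_mels, 1 + len(y) // hop_length). With the defaults and 5 s of audio at 22.05 kHz:

```python
sr = 22050
n_mels, hop_length = 128, 512
n_samples = 5 * sr                       # 5 s of audio -> 110250 samples
n_frames = 1 + n_samples // hop_length   # centred (padded) STFT framing
print((n_mels, n_frames))  # (128, 216)
```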
class src.preprocessing.AudioRandomCrop(duration: float)[source]

Bases: Transform

Randomly crop an audio track to a fixed duration.

If the track is shorter than or equal to duration seconds, the original track is returned unchanged.

Parameters:

duration – Target duration in seconds (must be positive).

Raises:
  • TypeError – If duration is not numeric.

  • ValueError – If duration is not positive.

property duration: float

Target duration in seconds.

class src.preprocessing.Resample(target_sr: int)[source]

Bases: Transform

Resample an audio track to a new sampling rate.

Parameters:

target_sr – Target sampling rate in Hz.

Raises:
  • TypeError – If target_sr is not an int.

  • ValueError – If target_sr is less than 1.

property target_sr: int

Target sampling rate in Hz.
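A naive linear-interpolation sketch of the length arithmetic (real resamplers use band-limited filtering, e.g. polyphase resampling; this function is only illustrative):

```python
import numpy as np

def resample(y: np.ndarray, sr: int, target_sr: int):
    """Resample y from sr to target_sr by linear interpolation."""
    n_out = round(len(y) * target_sr / sr)
    t_old = np.arange(len(y)) / sr          # original sample times
    t_new = np.arange(n_out) / target_sr    # target sample times
    return np.interp(t_new, t_old, y), target_sr

y = np.sin(np.linspace(0, 10, 44100))       # 1 s at 44.1 kHz
out, sr = resample(y, 44100, 22050)
print(len(out))  # 22050
```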

class src.preprocessing.PitchShift(n_steps: float)[source]

Bases: Transform

Shift the pitch of an audio track by a fixed number of semitones.

Parameters:

n_steps – Semitones to shift (positive = up, negative = down).

Raises:

TypeError – If n_steps is not numeric.

property n_steps: float

Semitone shift applied.
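The semitone convention: a shift of n_steps scales every frequency by 2 ** (n_steps / 12), so +12 steps is exactly one octave up and −12 one octave down.

```python
ratio = 2 ** (7 / 12)    # +7 semitones (a perfect fifth)
print(round(ratio, 4))   # 1.4983
octave = 2 ** (12 / 12)  # +12 semitones doubles every frequency
```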

class src.preprocessing.Pipeline(*transforms: Transform)[source]

Bases: Transform

Chain transforms and apply them sequentially.

Takes a variable number of Transform instances and applies them left to right. Order matters: some transforms change the data type (e.g. MelSpectrogram) and cannot be followed by transforms that expect the original type.

Parameters:

*transforms – Transform instances to apply in order.

Raises:

TypeError – If any positional argument is not a Transform.

Example:

pipeline = Pipeline(
    AudioRandomCrop(duration=5.0),
    Resample(target_sr=22050),
    MelSpectrogram(n_mels=128),
)
spectrogram = pipeline((y, sr))

property transforms: tuple[Transform, ...]

The ordered tuple of transforms in this pipeline.