Preprocessing
Preprocessing transforms and pipeline.
All transforms are callable classes: hyperparameters at construction time, data-only at call time.
Class hierarchy
Transform (ABC)
├── CenterCrop       image
├── RandomCrop       image
├── RandomFlip       image
├── Padding          image
├── MelSpectrogram   audio  (y, sr) → np.ndarray
├── AudioRandomCrop  audio  (y, sr) → (y, sr)
├── Resample         audio  (y, sr) → (y, sr)
├── PitchShift       audio  (y, sr) → (y, sr)
└── Pipeline         any    chains transforms sequentially
- class src.preprocessing.Transform[source]
Bases: ABC
Abstract base class for all preprocessing transforms.
Every transform is callable: hyperparameters are fixed at construction, and __call__() receives only the data to transform.
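The callable-class pattern described above can be sketched as follows. This is a minimal illustration, not the actual source of src.preprocessing: the Transform class here is a stand-in with an assumed shape, and Scale is a hypothetical transform invented for the example.

```python
from abc import ABC, abstractmethod


class Transform(ABC):
    """Stand-in for the documented base class (assumed shape)."""

    @abstractmethod
    def __call__(self, data):
        """Transform ``data`` and return the result."""


class Scale(Transform):
    """Hypothetical transform: multiply every value by a fixed factor."""

    def __init__(self, factor: float) -> None:
        # Hyperparameters are validated and fixed at construction time.
        if isinstance(factor, bool) or not isinstance(factor, (int, float)):
            raise TypeError("factor must be numeric")
        self._factor = float(factor)

    @property
    def factor(self) -> float:
        """Read-only view of the hyperparameter."""
        return self._factor

    def __call__(self, data):
        # Only the data is passed at call time.
        return [x * self._factor for x in data]


double = Scale(2.0)
result = double([1, 2, 3])
```

Because the configuration lives on the instance, the same object can be reused across a dataset and passed around like a plain function.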
- class src.preprocessing.CenterCrop(height: int, width: int)[source]
Bases: Transform
Crop an image around its center to at most height × width.
A dimension is only cropped when the image is strictly larger than the target in that dimension; smaller dimensions are left unchanged.
- Parameters:
height – Maximum output height in pixels.
width – Maximum output width in pixels.
- Raises:
TypeError – If height or width is not an int.
ValueError – If either dimension is less than 1.
- property height: int
Target height.
- property width: int
Target width.
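The "at most height × width" rule can be illustrated with the crop-box arithmetic alone. The helper below is hypothetical and works on plain dimensions rather than on an image object; the floor division used to place the window when the margin is odd is an assumption, and the real class may round differently.

```python
def center_crop_box(img_h: int, img_w: int, height: int, width: int):
    """Return (top, left, bottom, right) of a centered crop window.

    Hypothetical helper: a dimension is only reduced when the image is
    larger than the target in that dimension.
    """
    out_h = min(img_h, height)
    out_w = min(img_w, width)
    # Assumption: any odd leftover pixel goes to the bottom/right margin.
    top = (img_h - out_h) // 2
    left = (img_w - out_w) // 2
    return top, left, top + out_h, left + out_w


# A 100 x 60 image with a 64 x 64 target: only the height is cropped.
box = center_crop_box(100, 60, 64, 64)
```

RandomCrop follows the same size rule; only the position of the window changes, since it is drawn at random instead of being centered.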
- class src.preprocessing.RandomCrop(height: int, width: int)[source]
Bases: Transform
Crop an image at a random position to at most height × width.
A dimension is only cropped when the image is strictly larger than the target in that dimension; smaller dimensions are left unchanged.
- Parameters:
height – Maximum output height in pixels.
width – Maximum output width in pixels.
- Raises:
TypeError – If height or width is not an int.
ValueError – If either dimension is less than 1.
- property height: int
Target height.
- property width: int
Target width.
- class src.preprocessing.RandomFlip(p: float = 0.5)[source]
Bases: Transform
Randomly flip an image along its horizontal and/or vertical axis.
Each axis is flipped independently with probability p.
- Parameters:
p – Probability of flipping along each axis. Must be in [0, 1].
- Raises:
TypeError – If p is not a float.
ValueError – If p is outside [0, 1].
- property p: float
Flip probability per axis.
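Independent per-axis flipping means two separate random draws, so an image can be flipped on neither axis, one axis, or both. The function below is a hypothetical sketch on a nested list of rows; the image type used by the real class is not stated here, and the rng parameter is added only to make the example reproducible.

```python
import random


def random_flip(image, p=0.5, rng=None):
    """Flip a row-major nested-list image; each axis is drawn independently.

    Hypothetical sketch, not the library implementation.
    """
    if rng is None:
        rng = random
    if rng.random() < p:
        # Mirror left-right by reversing every row.
        image = [row[::-1] for row in image]
    if rng.random() < p:
        # Mirror top-bottom by reversing the order of the rows.
        image = image[::-1]
    return image


pixels = [[1, 2], [3, 4]]
always = random_flip(pixels, p=1.0)  # both draws succeed
never = random_flip(pixels, p=0.0)   # neither draw succeeds
```

With p = 0.5 each of the four outcomes (no flip, horizontal only, vertical only, both) occurs with probability 0.25.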
- class src.preprocessing.Padding(height: int, width: int, color: tuple[int, int, int] = (0, 0, 0))[source]
Bases: Transform
Pad an image to at least height × width using a solid colour.
Padding is added symmetrically. Dimensions that already meet or exceed the target are not modified.
- Parameters:
height – Minimum output height in pixels.
width – Minimum output width in pixels.
color – RGB fill colour as an (R, G, B) int tuple. Default black.
- Raises:
TypeError – If height or width is not an int, or color is not a tuple.
ValueError – If either dimension is less than 1.
- property height: int
Minimum output height.
- property width: int
Minimum output width.
- property color: tuple[int, int, int]
RGB fill colour.
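Symmetric padding can be reduced to computing four border widths. The helper below is hypothetical; in particular, which side receives the extra pixel when the deficit is odd is an assumption, since the description above only says the padding is symmetric.

```python
def symmetric_padding(img_h: int, img_w: int, height: int, width: int):
    """Return (top, bottom, left, right) border widths in pixels.

    Hypothetical helper: dimensions that already meet the target get no padding.
    """
    pad_h = max(height - img_h, 0)
    pad_w = max(width - img_w, 0)
    # Assumption: an odd deficit puts the extra pixel on the bottom/right.
    top = pad_h // 2
    left = pad_w // 2
    return top, pad_h - top, left, pad_w - left


# A 30 x 80 image with a 64 x 64 minimum: only the height is padded.
borders = symmetric_padding(30, 80, 64, 64)
```

Chaining Padding and a crop with the same height and width is a common way to bring images of mixed sizes to one fixed shape, because the first guarantees a minimum size and the second a maximum.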
- class src.preprocessing.MelSpectrogram(n_mels: int = 128, n_fft: int = 2048, hop_length: int = 512)[source]
Bases: Transform
Convert a waveform to a Mel spectrogram.
This transform changes the data type: input is (y, sr), output is a 2-D np.ndarray. Audio-specific transforms (e.g. Resample) cannot be chained after this one.
- Parameters:
n_mels – Number of Mel frequency bands.
n_fft – FFT window size in samples.
hop_length – Hop length in samples between successive frames.
- Raises:
TypeError – If any argument is not an int.
- property n_mels: int
- property n_fft: int
- property hop_length: int
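To get a feel for how n_mels, n_fft and hop_length determine the size of the 2-D output, the sketch below computes an expected shape. It rests on two assumptions that this page does not confirm: that the array is laid out as (n_mels, n_frames), and that frames are counted with one of the two standard short-time framing rules (centered with padding, or unpadded). The function name is hypothetical.

```python
def mel_output_shape(n_samples: int, n_mels: int = 128, n_fft: int = 2048,
                     hop_length: int = 512, centered: bool = True):
    """Estimate (n_mels, n_frames) for a waveform of ``n_samples`` samples.

    Hypothetical helper based on standard framing formulas; the actual
    transform may frame the signal differently.
    """
    if centered:
        # Signal padded so that frames are centered on multiples of hop_length.
        n_frames = 1 + n_samples // hop_length
    else:
        # No padding: the first frame starts at sample 0.
        n_frames = 1 + max(n_samples - n_fft, 0) // hop_length
    return n_mels, n_frames


# Five seconds of audio at 22050 Hz.
shape = mel_output_shape(5 * 22050)
```

A smaller hop_length yields more frames (finer time resolution) at the cost of a larger array, while n_mels sets the number of rows directly.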
- class src.preprocessing.AudioRandomCrop(duration: float)[source]
Bases: Transform
Randomly crop an audio track to a fixed duration.
If the track is shorter than or equal to duration seconds, the original track is returned unchanged.
- Parameters:
duration – Target duration in seconds (must be positive).
- Raises:
TypeError – If duration is not numeric.
ValueError – If duration is not positive.
- property duration: float
Target duration in seconds.
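The crop rule above translates into sample indices as sketched below. The function is hypothetical and operates on a plain list of samples; converting the duration to a sample count by truncation is an assumption, and the rng parameter exists only to make the example reproducible.

```python
import random


def audio_random_crop(y, sr: int, duration: float, rng=None):
    """Return a random window of ``duration`` seconds from ``(y, sr)``.

    Hypothetical sketch: tracks no longer than the target are returned as-is.
    """
    if rng is None:
        rng = random
    # Assumption: the target length in samples is truncated, not rounded.
    n_target = int(duration * sr)
    if len(y) <= n_target:
        return y, sr
    # randint is inclusive at both ends, so the window always fits.
    start = rng.randint(0, len(y) - n_target)
    return y[start:start + n_target], sr


samples = list(range(10))  # 5 seconds at a toy rate of 2 Hz
cropped, rate = audio_random_crop(samples, sr=2, duration=2.0,
                                  rng=random.Random(0))
short, _ = audio_random_crop(samples, sr=2, duration=10.0)
```

The sampling rate passes through unchanged, which is what allows the result to feed directly into Resample or PitchShift.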
- class src.preprocessing.Resample(target_sr: int)[source]
Bases: Transform
Resample an audio track to a new sampling rate.
- Parameters:
target_sr – Target sampling rate in Hz.
- Raises:
TypeError – If target_sr is not an int.
ValueError – If target_sr is less than 1.
- property target_sr: int
Target sampling rate in Hz.
- class src.preprocessing.PitchShift(n_steps: float)[source]
Bases: Transform
Shift the pitch of an audio track by a fixed number of semitones.
- Parameters:
n_steps – Semitones to shift (positive = up, negative = down).
- Raises:
TypeError – If n_steps is not numeric.
- property n_steps: float
Semitone shift applied.
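For intuition about what a value of n_steps means, the sketch below converts semitones to a frequency ratio. It assumes twelve-tone equal temperament, where each semitone multiplies frequency by 2^(1/12); the page does not state which convention the transform uses, and the function name is hypothetical.

```python
def semitone_ratio(n_steps: float) -> float:
    """Frequency ratio for a shift of ``n_steps`` semitones.

    Assumes twelve-tone equal temperament (12 semitones per octave).
    """
    return 2.0 ** (n_steps / 12.0)


octave_up = 440.0 * semitone_ratio(12)     # A4 shifted up one octave
octave_down = 440.0 * semitone_ratio(-12)  # A4 shifted down one octave
```

Fractional values are meaningful under this convention as well: n_steps=0.5 corresponds to a quarter tone.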
- class src.preprocessing.Pipeline(*transforms: Transform)[source]
Bases: Transform
Chain transforms and apply them sequentially.
Takes a variable number of Transform instances and applies them left to right. Order matters: some transforms change the data type (e.g. MelSpectrogram) and cannot be followed by transforms that expect the original type.
- Parameters:
*transforms – Transform instances to apply in order.
- Raises:
TypeError – If any positional argument is not a Transform.
Example:
pipeline = Pipeline(
    AudioRandomCrop(duration=5.0),
    Resample(target_sr=22050),
    MelSpectrogram(n_mels=128),
)
spectrogram = pipeline((y, sr))
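The left-to-right chaining and the reason order matters can be sketched with a self-contained stand-in. Nothing below is the actual source: Transform and Pipeline are minimal re-creations with an assumed shape, and Halve and Length are hypothetical transforms, the second of which changes the data type in the way MelSpectrogram does.

```python
from abc import ABC, abstractmethod


class Transform(ABC):
    """Stand-in base class (assumed shape)."""

    @abstractmethod
    def __call__(self, data):
        ...


class Pipeline(Transform):
    """Sketch of sequential chaining with construction-time validation."""

    def __init__(self, *transforms: Transform) -> None:
        for t in transforms:
            if not isinstance(t, Transform):
                raise TypeError(f"expected a Transform, got {type(t).__name__}")
        self._transforms = tuple(transforms)

    def __call__(self, data):
        # Each output becomes the next transform's input.
        for t in self._transforms:
            data = t(data)
        return data


class Halve(Transform):
    """Hypothetical audio transform: (y, sr) -> (y, sr), first half only."""

    def __call__(self, data):
        y, sr = data
        return y[: len(y) // 2], sr


class Length(Transform):
    """Hypothetical type-changing transform: (y, sr) -> int."""

    def __call__(self, data):
        y, _sr = data
        return len(y)


track = ([1, 2, 3, 4], 8000)
n_samples = Pipeline(Halve(), Length())(track)  # valid order
```

Reversing the order, Pipeline(Length(), Halve()), still constructs without error, but calling it fails because Halve receives an int instead of a (y, sr) tuple. In this sketch, type compatibility between stages is only discovered at call time.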