Transformation pipeline for the train dataset
- Read the audio waveform from the file path
    - Read the WAV file (`tf.io.read_file`)
    - Decode the WAV file (`tf.audio.decode_wav`)
    - Remove silence from the beginning and the end (`tfio.audio.trim`) (OPTIONAL)
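A framework-independent sketch of the reading step, assuming 16-bit PCM WAV input and using the standard-library `wave` module with NumPy instead of the TensorFlow ops named above. Like `tf.audio.decode_wav`, the hypothetical `read_wav` returns float samples in [-1, 1] shaped [samples, channels] plus the sample rate. `trim_silence` is a hypothetical stand-in for `tfio.audio.trim`: it keeps the span between the first and last sample louder than an assumed threshold `epsilon`.

```python
import wave

import numpy as np


def read_wav(path):
    """Read a 16-bit PCM WAV file (path or file object) into float32 samples.

    Returns (samples, sample_rate) with samples shaped [num_samples, channels]
    and scaled to [-1, 1], the same layout tf.audio.decode_wav produces.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # WAV stores little-endian signed 16-bit integers; divide by 2**15 to scale.
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    return samples.reshape(-1, channels), sample_rate


def trim_silence(audio, epsilon=0.01):
    """Keep the span between the first and last sample with |x| > epsilon.

    `audio` is a 1-D array; it is returned unchanged if no sample is that loud.
    """
    loud = np.flatnonzero(np.abs(audio) > epsilon)
    if loud.size == 0:
        return audio
    return audio[loud[0]:loud[-1] + 1]
```

For example, `audio, sr = read_wav(path)` followed by `trim_silence(audio[:, 0])` gives a trimmed mono waveform. Inside a `tf.data` pipeline, the TensorFlow ops listed above keep the work in the graph instead of in Python.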
- Limit the audio to a fixed number of seconds
    - Shorter audio -> pad the end with zeros
    - Longer audio -> random crop
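The fixed-length step can be sketched as follows, assuming a 1-D waveform and a NumPy `Generator` for randomness; `fix_length` is a hypothetical name. In TensorFlow the same logic maps onto `tf.pad` and `tf.image.random_crop`.

```python
import numpy as np


def fix_length(audio, sample_rate, seconds, rng=None):
    """Return exactly int(sample_rate * seconds) samples of a 1-D waveform.

    Shorter clips are zero-padded at the end; longer clips are randomly cropped.
    """
    rng = np.random.default_rng() if rng is None else rng
    target = int(sample_rate * seconds)
    n = audio.shape[0]
    if n < target:
        # Pad only the end so the clip keeps its original onset.
        return np.pad(audio, (0, target - n))
    # Pick a window start uniformly in [0, n - target] (inclusive).
    start = rng.integers(0, n - target + 1)
    return audio[start:start + target]
```

Random cropping means each epoch can see a different slice of a long recording, which acts as a mild augmentation in its own right.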
- Data augmentation on the audio waveform
    - Change speed
    - Pink noise
    - Gaussian noise
    - Gaussian noise at a target signal-to-noise ratio (Gaussian SNR)
    - Gain (volume adjustment)
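A NumPy sketch of the waveform augmentations listed above. All function names and parameters (`rate`, `std`, `snr_db`, `gain_db`) are hypothetical. Speed change here is naive linear-interpolation resampling, which also shifts pitch, and pink noise is produced by shaping white noise so its power falls off as 1/f. `mix_at_snr` scales any noise signal to a requested SNR, so it serves both the Gaussian SNR and pink-noise steps.

```python
import numpy as np


def change_speed(audio, rate):
    """Play the clip `rate` times faster by linear-interpolation resampling.

    rate > 1 shortens the clip, rate < 1 lengthens it; pitch shifts as well.
    """
    n_out = max(1, int(round(len(audio) / rate)))
    positions = np.linspace(0, len(audio) - 1, n_out)
    return np.interp(positions, np.arange(len(audio)), audio).astype(audio.dtype)


def add_gaussian_noise(audio, std, rng):
    """Add white Gaussian noise with a fixed standard deviation."""
    return (audio + rng.normal(0.0, std, size=audio.shape)).astype(audio.dtype)


def mix_at_snr(audio, noise, snr_db):
    """Scale `noise` so signal power / noise power = 10**(snr_db / 10), then add it."""
    signal_power = np.mean(audio ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return (audio + scale * noise).astype(audio.dtype)


def add_gaussian_snr(audio, snr_db, rng):
    """White Gaussian noise at a requested SNR in dB."""
    return mix_at_snr(audio, rng.normal(size=audio.shape), snr_db)


def pink_noise(n, rng):
    """Noise whose power spectrum falls off as 1/f, made by shaping white noise."""
    spectrum = np.fft.rfft(rng.normal(size=n))
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1] if n > 1 else 1.0  # avoid dividing by zero at DC
    spectrum /= np.sqrt(freqs)  # amplitude ~ 1/sqrt(f), so power ~ 1/f
    return np.fft.irfft(spectrum, n)


def add_pink_noise(audio, snr_db, rng):
    """Pink noise at a requested SNR in dB."""
    return mix_at_snr(audio, pink_noise(len(audio), rng), snr_db)


def apply_gain(audio, gain_db):
    """Scale the amplitude by a gain in dB (factor 10**(gain_db / 20))."""
    return audio * (10.0 ** (gain_db / 20.0))
```

Because gain in dB converts to an amplitude factor of 10^(gain_db / 20), +6 dB roughly doubles the amplitude. In training, `rate`, `snr_db`, and `gain_db` would typically be drawn at random per example.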
- Convert the audio to a mel spectrogram
    - Convert the audio to a spectrogram (`tfio.audio.spectrogram`)
    - Apply the mel scale (`tfio.audio.melscale`)
    - Apply the dB scale (`tfio.audio.dbscale`)
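The three conversion steps can be sketched in NumPy as below. This illustrates the idea under stated assumptions rather than reimplementing the `tfio` functions: it uses a symmetric Hann window (`np.hanning`), an HTK-style mel formula, and `20 * log10` of the magnitudes floored at `top_db` below the peak. The defaults (`nfft=2048`, `stride=512`, 128 mel bands) are arbitrary choices. In core TensorFlow, `tf.signal.stft` and `tf.signal.linear_to_mel_weight_matrix` cover the first two steps.

```python
import numpy as np


def stft_magnitude(audio, nfft=2048, window=2048, stride=512):
    """Magnitude spectrogram of a 1-D signal, shape [frames, nfft // 2 + 1]."""
    n_frames = 1 + (len(audio) - window) // stride  # assumes len(audio) >= window
    frames = np.stack([audio[i * stride:i * stride + window] for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames * np.hanning(window), n=nfft, axis=-1))


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sample_rate, nfft, n_mels, fmin=0.0, fmax=None):
    """Triangular mel filters as a [nfft // 2 + 1, n_mels] weight matrix."""
    fmax = sample_rate / 2 if fmax is None else fmax
    # n_mels + 2 points evenly spaced on the mel scale give each filter its
    # left edge, center and right edge.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    bin_freqs = np.fft.rfftfreq(nfft, d=1.0 / sample_rate)
    weights = np.zeros((len(bin_freqs), n_mels))
    for m in range(n_mels):
        left, center, right = hz_points[m:m + 3]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        weights[:, m] = np.maximum(0.0, np.minimum(rising, falling))
    return weights


def to_db(spec, top_db=80.0, eps=1e-10):
    """20 * log10(magnitude), floored at top_db below the loudest value."""
    db = 20.0 * np.log10(np.maximum(spec, eps))
    return np.maximum(db, db.max() - top_db)


def mel_spectrogram_db(audio, sample_rate, n_mels=128, nfft=2048, stride=512, top_db=80.0):
    spec = stft_magnitude(audio, nfft, nfft, stride)          # spectrogram
    mel = spec @ mel_filterbank(sample_rate, nfft, n_mels)    # mel scale
    return to_db(mel, top_db)                                 # dB scale
```

The result is a [frames, n_mels] array that the following steps can treat as a single-channel image.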
- Data augmentation on the mel spectrogram
    - Time warping (`tfa.image.sparse_image_warp`) (from the SpecAugment paper)
    - Time masking (`tfio.audio.time_mask`) (from the SpecAugment paper)
    - Frequency masking (`tfio.audio.freq_mask`) (from the SpecAugment paper)
    - Mixup
    - Any other image transformation (treating the spectrogram as an image)
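A NumPy sketch of time masking, frequency masking, and mixup on a `[time, freq]` spectrogram. Names and parameters are hypothetical. Masked cells are set to `fill`, which is 0 here by assumption; with dB-scaled input the spectrogram minimum may be a better fill value. Time warping is left out because it needs an image-warping routine such as the `tfa.image.sparse_image_warp` op listed above. For mixup, labels are assumed to be one-hot and the mixing weight is drawn from Beta(alpha, alpha), as in the original mixup formulation.

```python
import numpy as np


def time_mask(spec, max_width, rng, fill=0.0):
    """Set a random run of up to max_width consecutive frames (axis 0) to `fill`."""
    spec = spec.copy()  # leave the caller's array untouched
    width = rng.integers(0, min(max_width, spec.shape[0]) + 1)
    start = rng.integers(0, spec.shape[0] - width + 1)
    spec[start:start + width, :] = fill
    return spec


def freq_mask(spec, max_width, rng, fill=0.0):
    """Same as time_mask, but along the frequency axis (axis 1)."""
    return time_mask(spec.T, max_width, rng, fill).T


def mixup(x1, y1, x2, y2, alpha, rng):
    """Blend two examples and their one-hot labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

Masks can be applied several times per example; mixup needs a second example, so in a `tf.data` pipeline it is usually applied after batching.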
- Add a CoordConv coordinate channel (OPTIONAL)
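CoordConv appends channels that encode each pixel's coordinates so that convolutions can use absolute position. For a spectrogram the frequency coordinate is probably the more informative one, since the same local pattern can mean different things at different pitches. The sketch below assumes a `[time, freq]` NumPy input and coordinates rescaled linearly to [-1, 1]; `add_coord_channels` and `with_time` are hypothetical names.

```python
import numpy as np


def add_coord_channels(spec, with_time=False):
    """Stack coordinate channel(s) onto a [time, freq] spectrogram.

    Returns [time, freq, C]: channel 0 is the spectrogram, channel 1 holds the
    frequency-bin index rescaled to [-1, 1], and (optionally) channel 2 holds
    the time-frame index rescaled the same way.
    """
    n_time, n_freq = spec.shape
    freq = np.broadcast_to(np.linspace(-1.0, 1.0, n_freq), (n_time, n_freq))
    channels = [spec, freq]
    if with_time:
        time = np.broadcast_to(np.linspace(-1.0, 1.0, n_time)[:, None], (n_time, n_freq))
        channels.append(time)
    return np.stack(channels, axis=-1).astype(spec.dtype)
```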
- Normalize (standard scaling)
    - If using transfer learning, apply the mean and std that the pretrained model expects
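A sketch of both normalization options, assuming NumPy arrays. `fit_standard_scaler` computes one global mean and std from the training spectrograms only, so the same statistics can be reused unchanged for validation and test data. `for_pretrained` is a hypothetical helper for the transfer-learning case: it min-max scales the spectrogram to [0, 1], repeats it to three channels, and applies whatever per-channel mean and std the pretrained model expects. Those values should come from the pretrained model's documentation, not from this sketch.

```python
import numpy as np


def fit_standard_scaler(train_specs):
    """One global mean/std over every value of the training spectrograms."""
    stacked = np.concatenate([s.ravel() for s in train_specs])
    return stacked.mean(), stacked.std()


def standardize(spec, mean, std, eps=1e-6):
    """(x - mean) / std; eps guards against a zero std."""
    return (spec - mean) / (std + eps)


def for_pretrained(spec, channel_mean, channel_std):
    """Hypothetical: [time, freq] -> [time, freq, 3] normalized for a pretrained CNN."""
    lo, hi = spec.min(), spec.max()
    scaled = (spec - lo) / (hi - lo + 1e-6)            # min-max to [0, 1]
    rgb = np.repeat(scaled[..., None], 3, axis=-1)     # replicate to 3 channels
    return (rgb - np.asarray(channel_mean)) / np.asarray(channel_std)
```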