Transformation pipeline for the training dataset
- Read the audio waveform from its file path
  - Read the WAV file (tf.io.read_file)
  - Decode the WAV file (tf.audio.decode_wav)
  - Remove silence from the beginning and the end (tfio.audio.trim) (OPTIONAL)
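The reading step can be sketched without TensorFlow using Python's standard `wave` module. This is a minimal sketch, assuming 16-bit PCM input; `read_file`, `decode_wav`, and `trim_silence` are hypothetical helpers that mirror the roles of tf.io.read_file, tf.audio.decode_wav, and tfio.audio.trim, not their exact behavior. The silence trim is a plain amplitude threshold (`epsilon`), which is an assumption.

```python
import io
import struct
import wave


def read_file(path):
    """Return the raw bytes of a file (the role tf.io.read_file plays)."""
    with open(path, "rb") as f:
        return f.read()


def decode_wav(contents):
    """Decode 16-bit PCM WAV bytes into (samples, sample_rate).

    As with tf.audio.decode_wav, samples are floats scaled to [-1.0, 1.0).
    For simplicity this hypothetical helper keeps only the first channel.
    """
    with wave.open(io.BytesIO(contents), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wf.getnchannels()
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    # Little-endian signed 16-bit ints, interleaved channel by channel.
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [s / 32768.0 for s in ints[::n_channels]], sample_rate


def trim_silence(samples, epsilon=0.01):
    """Drop leading and trailing samples whose magnitude is <= epsilon.

    Assumption: a simple amplitude threshold stands in for tfio.audio.trim.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) > epsilon]
    if not loud:
        return []
    return samples[loud[0]:loud[-1] + 1]
```

Typical use would be `samples, sr = decode_wav(read_file(path))` followed by `trim_silence(samples)`.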
- Limit the audio to a fixed number of seconds
  - Shorter audio → pad the end with zeros
  - Longer audio → random crop
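A NumPy sketch of the fixed-length step, assuming a 1-D float array; `fix_length` is a hypothetical name, and the target length would be `seconds * sample_rate` samples.

```python
import numpy as np


def fix_length(audio, target_len, rng):
    """Return exactly target_len samples.

    Shorter clips are zero-padded at the end; longer clips are cropped
    at a uniformly random start position.
    """
    n = audio.shape[0]
    if n < target_len:
        return np.pad(audio, (0, target_len - n))  # constant zeros by default
    start = rng.integers(0, n - target_len + 1)
    return audio[start:start + target_len]
```

For example, `fix_length(audio, 5 * 16000, np.random.default_rng())` would give five seconds at 16 kHz.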
- Data augmentation over the audio waveform
  - Change speed
  - Pink noise
  - Gaussian noise
  - Gaussian SNR (Gaussian noise at a random signal-to-noise ratio)
  - Gain (volume adjustment)
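The waveform augmentations can be sketched in NumPy as below. These are illustrative stand-ins, not necessarily the implementations this pipeline uses: the speed change is naive linear-interpolation resampling (so pitch shifts too), pink noise is white noise shaped to a 1/f power spectrum, and the function names and default noise level are assumptions.

```python
import numpy as np


def change_speed(audio, rate):
    """Resample by linear interpolation; rate > 1 speeds up (shorter clip).

    This naive resampling also shifts pitch, like playing a tape faster.
    """
    n_out = int(round(len(audio) / rate))
    old_idx = np.arange(len(audio))
    new_idx = np.linspace(0, len(audio) - 1, n_out)
    return np.interp(new_idx, old_idx, audio)


def add_gaussian_noise(audio, rng, std=0.005):
    """Add white Gaussian noise with a fixed standard deviation."""
    return audio + rng.normal(0.0, std, size=audio.shape)


def add_gaussian_noise_snr(audio, rng, snr_db):
    """Add white Gaussian noise scaled so signal/noise power equals snr_db."""
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return audio + rng.normal(0.0, np.sqrt(noise_power), size=audio.shape)


def pink_noise(n, rng):
    """Unit-RMS pink (1/f power) noise made by shaping white noise in frequency."""
    spectrum = np.fft.rfft(rng.normal(size=n))
    freqs = np.arange(spectrum.shape[0]).astype(float)
    freqs[0] = 1.0  # avoid division by zero at DC
    spectrum /= np.sqrt(freqs)  # amplitude ~ 1/sqrt(f), so power ~ 1/f
    noise = np.fft.irfft(spectrum, n=n)
    return noise / np.sqrt(np.mean(noise ** 2))


def apply_gain_db(audio, gain_db):
    """Scale amplitude by a gain given in decibels."""
    return audio * 10 ** (gain_db / 20)
```

In training, each transform would usually be applied with some probability and random parameters, e.g. `audio + 0.01 * pink_noise(len(audio), rng)`.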
- Convert the audio to a Mel spectrogram
  - Compute the spectrogram (tfio.audio.spectrogram)
  - Apply the Mel scale (tfio.audio.melscale)
  - Apply the dB scale (tfio.audio.dbscale)
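The three conversion steps amount to a short-time Fourier transform, a projection onto triangular Mel filters, and a log compression. The NumPy sketch below illustrates them under stated assumptions: a Hann window, the HTK-style Mel formula, and magnitudes converted with 20·log10 and clipped to `top_db` below the peak. The tfio functions may differ in defaults and details, and the helper names here are hypothetical.

```python
import numpy as np


def spectrogram(audio, nfft=512, stride=256):
    """Magnitude STFT with a Hann window; shape [frames, nfft // 2 + 1]."""
    window = np.hanning(nfft)
    n_frames = 1 + (len(audio) - nfft) // stride
    frames = np.stack([audio[i * stride:i * stride + nfft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(rate, nfft, mels, fmin=0.0, fmax=None):
    """Triangular filters spaced evenly on the Mel scale; shape [nfft // 2 + 1, mels]."""
    fmax = rate / 2 if fmax is None else fmax
    fft_freqs = np.linspace(0, rate / 2, nfft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), mels + 2))
    lower, center, upper = edges[:-2], edges[1:-1], edges[2:]
    f = fft_freqs[:, None]
    rising = (f - lower) / (center - lower)
    falling = (upper - f) / (upper - center)
    return np.maximum(0.0, np.minimum(rising, falling))


def dbscale(mag, top_db=80.0):
    """Convert magnitudes to dB and clip everything below (peak - top_db)."""
    db = 20.0 * np.log10(np.maximum(mag, 1e-10))
    return np.maximum(db, db.max() - top_db)
```

Chained together: `dbscale(spectrogram(audio) @ mel_filterbank(16000, 512, 128))`.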
- Data augmentation over the Mel spectrogram
  - Time warping (tfa.image.sparse_image_warp) (from the SpecAugment paper)
  - Time masking (tfio.audio.time_mask) (from the SpecAugment paper)
  - Frequency masking (tfio.audio.freq_mask) (from the SpecAugment paper)
  - Mixup
  - Any other image transformation
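Time masking, frequency masking, and mixup can be sketched in NumPy as below; time warping is omitted because it relies on an image-warping routine such as tfa.image.sparse_image_warp. Following the SpecAugment paper, the mask width is drawn uniformly from [0, param]; setting masked values to zero is an assumption (the mean is another common choice). Mixup draws λ from Beta(α, α) and blends both the inputs and their one-hot labels; the function names and `alpha=0.2` are illustrative assumptions.

```python
import numpy as np


def time_mask(spec, param, rng):
    """Zero t consecutive frames, t ~ U{0..param}; spec shape [time, freq]."""
    spec = spec.copy()
    t = rng.integers(0, param + 1)
    t0 = rng.integers(0, spec.shape[0] - t + 1)
    spec[t0:t0 + t, :] = 0.0
    return spec


def freq_mask(spec, param, rng):
    """Zero f consecutive frequency bins, f ~ U{0..param}."""
    spec = spec.copy()
    f = rng.integers(0, param + 1)
    f0 = rng.integers(0, spec.shape[1] - f + 1)
    spec[:, f0:f0 + f] = 0.0
    return spec


def mixup(x1, y1, x2, y2, rng, alpha=0.2):
    """Blend two examples and their one-hot labels with lambda ~ Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

`param` is assumed not to exceed the corresponding spectrogram dimension.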
- Add the CoordConv channel (OPTIONAL)
- Normalize (standard scaling)
  - If using transfer learning, apply the mean and std expected by the pretrained model
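The last two steps can be sketched in NumPy, assuming a [time, freq] spectrogram. The hypothetical `add_coord_channel` appends a single channel holding each bin's frequency position scaled to [-1, 1], which is one reading of "the CoordConv channel" (CoordConv can also add a time coordinate). The hypothetical `standardize` uses per-example statistics unless dataset or pretrained-model statistics are passed in; for instance, ImageNet-pretrained models commonly expect per-channel mean (0.485, 0.456, 0.406) and std (0.229, 0.224, 0.225) on a three-channel input.

```python
import numpy as np


def add_coord_channel(spec):
    """Append a frequency-coordinate channel scaled to [-1, 1].

    spec: [time, freq] or [time, freq, channels]; returns [time, freq, channels + 1].
    """
    if spec.ndim == 2:
        spec = spec[..., None]
    n_time, n_freq = spec.shape[:2]
    coords = np.linspace(-1.0, 1.0, n_freq)
    coord_channel = np.broadcast_to(coords[None, :, None], (n_time, n_freq, 1))
    return np.concatenate([spec, coord_channel], axis=-1)


def standardize(x, mean=None, std=None, eps=1e-6):
    """Standard-scale x; mean/std may be scalars or per-channel arrays."""
    mean = x.mean() if mean is None else mean
    std = x.std() if std is None else std
    return (x - mean) / (std + eps)
```

Whether the coordinate channel itself should be normalized is a design choice; per-channel `mean` and `std` arrays let it be left unchanged (mean 0, std 1).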