William Peytz

2024 · Deep learning coursework

Instrument Audio Classification

CNN and LSTM models for classifying 15 musical instruments from short audio clips, reaching 96–98% accuracy after tuning.

  • PyTorch
  • CNN
  • LSTM
  • Librosa

Overview

A supervised audio-classification project: given a short clip, identify which of 15 instruments is being played. I compared a CNN operating on spectrograms with an LSTM operating on temporal feature sequences.
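To make the comparison concrete, here is a minimal PyTorch sketch of the two input layouts: the CNN sees each clip as a one-channel image of shape (n_mels, frames), while the LSTM sees a sequence of per-frame feature vectors. The class names, layer sizes, 64 mel bands, 13 MFCCs, and 128 frames are all illustrative assumptions, not the models actually trained in this project.

```python
import torch
from torch import nn

NUM_CLASSES = 15  # number of instruments, from the project description


class SpectrogramCNN(nn.Module):
    """Small 2D-CNN over a log-mel 'image'. Layer sizes are illustrative."""

    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling, so the frame count can vary
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):  # x: (batch, 1, n_mels, frames)
        return self.classifier(self.features(x).flatten(1))


class FrameLSTM(nn.Module):
    """LSTM over framewise features; classifies from the final hidden state."""

    def __init__(self, n_features=13, hidden=64, num_classes=NUM_CLASSES):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, x):  # x: (batch, frames, n_features)
        _, (h_n, _) = self.lstm(x)  # h_n: (num_layers, batch, hidden)
        return self.classifier(h_n[-1])


# Random tensors stand in for real features (assumed: 64 mel bands, 13 MFCCs).
cnn_logits = SpectrogramCNN()(torch.randn(4, 1, 64, 128))
lstm_logits = FrameLSTM()(torch.randn(4, 128, 13))
print(cnn_logits.shape, lstm_logits.shape)
```

Both models emit one logit per instrument, so the same cross-entropy loss and evaluation code can serve either architecture.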

What I did

  • Pre-processing. Standardized clip lengths, computed mel-spectrograms, and applied basic augmentation (time and frequency masking, gain jitter).
  • Two architectures. A 2D-CNN on log-mel spectrograms, and an LSTM consuming framewise MFCC sequences.
  • Tuning and validation. Used a held-out validation set to tune learning rate, augmentation strength, and model depth, and used confusion matrices to diagnose which instruments the models confused most often.
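The confusion-matrix diagnosis in the last step can be sketched in plain Python: count (true, predicted) pairs, then rank the off-diagonal cells to see which instruments get mistaken for which. The helper names, the three instrument labels, and the toy predictions below are hypothetical, chosen only for illustration; they are not the project's data.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def most_confused_pairs(matrix, labels, top_k=3):
    """Return the largest off-diagonal cells as (true, predicted, count)."""
    pairs = []
    for t, row in enumerate(matrix):
        for p, count in enumerate(row):
            if t != p and count > 0:
                pairs.append((labels[t], labels[p], count))
    pairs.sort(key=lambda item: item[2], reverse=True)
    return pairs[:top_k]


# Hypothetical labels and predictions, for illustration only.
labels = ["violin", "viola", "flute"]
y_true = [0, 0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 0, 1, 2, 2]

cm = confusion_matrix(y_true, y_pred, len(labels))
top = most_confused_pairs(cm, labels)
print(cm)   # [[1, 2, 0], [1, 2, 0], [0, 0, 2]]
print(top)  # [('violin', 'viola', 2), ('viola', 'violin', 1)]
```

Keeping the direction of each error (true, then predicted) matters: a model that calls violins violas more often than the reverse may be pointing to a class imbalance or a feature gap.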

Result

Both models landed in the 96–98% accuracy range on the held-out test split. The CNN was the stronger baseline; the LSTM caught up after I gave it a richer feature representation.