Music Information Retrieval, from the basics to state-of-the-art algorithms.
Deep Learning for Music Information Retrieval
This workshop offers a fast-paced introduction to audio and music processing with deep learning, bringing you up to speed with state-of-the-art practice in 2025. Participants will learn to build tools to analyze and manipulate digital audio signals with PyTorch. Both the theory and practice of digital audio processing will be covered through hands-on algorithm-implementation exercises, and these concepts will be applied to a range of topics in music information retrieval. Some knowledge of Python, linear algebra, and object-oriented programming is assumed.
In-person (CCRMA, Stanford) and online enrollment options are available. Students receive the same teaching materials and tutorials in either format; however, in-person students benefit from more in-depth, hands-on 1:1 discussion and feedback with the instructors.
Schedule
Day 1
• Review: Fundamentals of audio signals, key mathematical concepts (linear algebra, calculus), and common music/audio features (MFCCs, chroma, spectral contrast).
• Theory: Overview of time-frequency representations (STFT, mel-spectrogram) and feature-extraction pipelines.
• Hands-on: Audio feature extraction using Librosa and TorchAudio.
Day 2
• Review: Feedforward neural networks and the fundamentals of deep learning (backpropagation, loss functions).
• Theory: Introduction to the Transformer architecture; comparison with traditional sequence models.
• Hands-on: Training a simple Transformer for sequence classification (e.g., audio command recognition).
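A minimal sketch of the kind of model built on Day 2: a Transformer encoder over a sequence of feature frames, mean-pooled into a single classification logit vector. The class name, layer sizes, and class count are hypothetical choices for illustration:

```python
import torch
import torch.nn as nn

class AudioCommandTransformer(nn.Module):
    """Toy Transformer encoder that classifies a sequence of feature frames."""
    def __init__(self, n_features=64, d_model=128, n_heads=4, n_layers=2, n_classes=10):
        super().__init__()
        self.proj = nn.Linear(n_features, d_model)          # frames -> model dim
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                  # x: (batch, time, n_features)
        h = self.encoder(self.proj(x))
        return self.head(h.mean(dim=1))    # mean-pool over time, then classify

model = AudioCommandTransformer()
logits = model(torch.randn(8, 100, 64))   # batch of 8 spectrogram-like inputs
print(logits.shape)                       # torch.Size([8, 10])
```

Training would pair these logits with `nn.CrossEntropyLoss` and an optimizer in the usual PyTorch loop.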
Day 3
• Theory: Convolutional Neural Networks (CNNs) for audio classification; Recurrent Neural Networks (RNNs) for temporal modeling.
• Hands-on: Spectrogram-based genre or instrument classification using CNNs and/or RNNs in PyTorch.
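The Day 3 exercise might resemble this small CNN over mel-spectrogram patches; treating the spectrogram as a one-channel image is the standard trick. The architecture and class count here are illustrative, not the workshop's actual model:

```python
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    """Small CNN that classifies (1, n_mels, n_frames) spectrogram patches."""
    def __init__(self, n_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),       # collapse to (batch, 32, 1, 1)
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = SpectrogramCNN()
batch = torch.randn(4, 1, 64, 128)   # 4 mel-spectrograms, 64 bands x 128 frames
print(model(batch).shape)            # torch.Size([4, 8])
```

An RNN variant would instead treat the spectrogram as a sequence of frames, e.g. feeding it to `nn.GRU` along the time axis.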
Day 4
• Theory: Generative models for audio — Variational Autoencoders (VAEs), diffusion models, and their applications in audio/music synthesis.
• Hands-on: Musical tone generation using a pitch- or timbre-conditioned VAE; exploration of a pre-trained diffusion model for audio generation.
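The conditioned VAE from Day 4 could be sketched as follows: an encoder maps a waveform frame plus a one-hot pitch label to a latent Gaussian, the reparameterization trick samples from it, and a decoder reconstructs the frame. Every name and dimension here is a toy assumption for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalToneVAE(nn.Module):
    """Toy VAE over short waveform frames, conditioned on a one-hot pitch label."""
    def __init__(self, frame_len=256, n_pitches=12, latent_dim=16):
        super().__init__()
        self.enc = nn.Linear(frame_len + n_pitches, 128)
        self.mu = nn.Linear(128, latent_dim)
        self.logvar = nn.Linear(128, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + n_pitches, 128), nn.ReLU(),
            nn.Linear(128, frame_len),
        )

    def forward(self, x, pitch):
        h = F.relu(self.enc(torch.cat([x, pitch], dim=-1)))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        recon = self.dec(torch.cat([z, pitch], dim=-1))
        return recon, mu, logvar

vae = ConditionalToneVAE()
x = torch.randn(4, 256)
pitch = F.one_hot(torch.tensor([0, 3, 7, 11]), num_classes=12).float()
recon, mu, logvar = vae(x, pitch)
# ELBO-style loss: reconstruction error plus KL divergence to the prior.
loss = F.mse_loss(recon, x) - 0.5 * torch.mean(1 + logvar - mu**2 - logvar.exp())
print(recon.shape)
```

After training, sampling `z` from the prior with a chosen pitch label generates new tones at that pitch.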
Day 5
• Literature: Guided reading and discussion on recent papers (e.g., AudioCLIP, Jukebox, AudioLM, MusicLM, MusicGen).
• Hands-on: Group project presentations and demos (e.g., semantic audio tagging, music synthesis, or creative audio applications using models explored during the week).
About the instructors
Kitty Shi is an accordionist, pianist, bagpiper, and music technologist. She received her PhD from CCRMA in 2021 and is now a machine learning engineer at Pinterest. Her research interest is computer-assisted expressive musical performance.
Iran R. Roman is a faculty member at Queen Mary University of London, where he leads research in theoretical neuroscience and machine perception. He holds a PhD from CCRMA. Iran is a passionate instructor and mentor with extensive experience teaching AI and signal processing at institutions including Stanford University, New York University, and the National Autonomous University of Mexico. He has worked with companies such as Plantronics, Apple, Oscilloscape, Tesla, and Raytheon/BBN to build and deploy AI models. iranroman.github.io
Location
The Knoll
660 Lomita Court
Stanford, CA 94305