WaveNet
looks like a convolution (not necessary, was just a design choice)
They actually use “dilated” connections to be able to look further back
Architecture
Building makemore Part 5: Building a WaveNet - YouTube
L2 Autoregressive Models -- CS294-158 SP24 Deep Unsupervised Learning -- UC Berkeley Spring 2024 - YouTube