Offline

AI Architecture Katas: Learning by Building Small Models in Plain Python

Track: Machine Learning: Research & Applications
Type: Talk (long session)
Level: Intermediate
Duration: 45 minutes

Abstract

Deep learning is often taught through large frameworks and large models, which is great for getting real projects out the door, but not always great for learning. This talk is about a different practice: building tiny, runnable versions of modern architectures with minimal dependencies (mostly Python and NumPy) and learning the ideas by applying them.

We’ll get our feet wet by building a small Transformer end-to-end, learning about the model architecture that started the craze. Then we switch perspectives and explore other architectures, always staying small and nimble, focusing on applying the math and breathing life into formulas. We will look at multi-scale modelling (in a simplified version of Renormalizing Generative Models), state spaces, and other scary concepts, until they are not scary at all anymore.

You’ll leave with an approach for turning papers into little prototypes that stay true to the original ideas, and a starting point for your own little lab for building models yourself.

Prerequisites: a basic understanding of NumPy and a willingness to look at Greek letters. No deep learning framework knowledge required.