LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU
Transformer Architecture: Fast Attention, Rotary Positional Embeddings, and Multi-Query Attention
TensorFlow Transformer model from Scratch (Attention is all you need)
Efficient Inference of Extremely Large Transformer Models
LongNet: Scaling Transformers to 1,000,000,000 tokens: Python Code + Explanation
Coding a Transformer from scratch on PyTorch, with full explanation, training and inference.
Injecting Transformer Models with Steroids (Paper Breakdown)
Session 10: ML Foundations Course - PyTorch & TensorFlow Models with Driverless AI
Coding Stable Diffusion from scratch in PyTorch
Implement Llama 3 From Scratch - PyTorch
Matrix Multiplication Part 3 || Dealing With Tensor Shape Errors
Mistral Architecture Explained From Scratch with Sliding Window Attention, KV Caching Explanation
I implement DALLE 1 from SCRATCH on MNIST
Training a LLaMA in your Backyard: Fine-tuning Very Large... - Sourab Mangrulkar & Younes Belkada
Efficient LLM Inference (vLLM KV Cache, Flash Decoding & Lookahead Decoding)
Implement BERT From Scratch - PyTorch
BERT Architecture Implementation from Scratch
MIT 6.S191 (2023): Deep Learning New Frontiers
Llama - EXPLAINED!
Implementing GPT-2 From Scratch (Transformer Walkthrough Part 2/2)