How to Build an LLM from Scratch | An Overview

Published 2023/10/06
173,618 views
👉 Need help with AI? Book a call: https://calendly.com/shawhintalebi

This is the sixth video in a series on using large language models (LLMs) in practice. Here, I review the key aspects of developing a foundation LLM, drawing on the development of models such as GPT-3, Llama, and Falcon.

More Resources:
👉 Series Playlist: https://www.youtube.com/playlist?list=PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0
📰 Read more: https://towardsdatascience.com/how-to-build-an-llm-from-scratch-8c477768f1f9?sk=18c351c5cae9ac89df682dd14736a9f3

[1] BloombergGPT: https://arxiv.org/pdf/2303.17564.pdf
[2] Llama 2: https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/
[3] LLM Energy Costs: https://www.statista.com/statistics/1384401/energy-use-when-training-llm-models/
[4] arXiv:2005.14165 [cs.CL]
[5] Falcon 180b Blog: https://huggingface.co/blog/falcon-180b
[6] arXiv:2101.00027 [cs.CL]
[7] Alpaca Repo: https://github.com/gururise/AlpacaDataCleaned
[8] arXiv:2303.18223 [cs.CL]
[9] arXiv:2112.11446 [cs.CL]
[10] arXiv:1508.07909 [cs.CL]
[11] SentencePiece: https://github.com/google/sentencepiece/tree/master
[12] Tokenizers Doc: https://huggingface.co/docs/tokenizers/quicktour
[13] arXiv:1706.03762 [cs.CL]
[14] Andrej Karpathy Lecture: https://www.youtube.com/watch?v=kCc8FmEb1nY&t=5307s
[15] Hugging Face NLP Course: https://huggingface.co/learn/nlp-course/chapter1/7?fw=pt
[16] arXiv:1810.04805 [cs.CL]
[17] arXiv:1910.13461 [cs.CL]
[18] arXiv:1603.05027 [cs.CV]
[19] arXiv:1607.06450 [stat.ML]
[20] arXiv:1803.02155 [cs.CL]
[21] arXiv:2203.15556 [cs.CL]
[22] Trained with Mixed Precision Nvidia: https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html
[23] DeepSpeed Doc: https://www.deepspeed.ai/training/
[24] Weight Decay (Papers with Code): https://paperswithcode.com/method/weight-decay
[25] What is Gradient Clipping? (Towards Data Science): https://towardsdatascience.com/what-is-gradient-clipping-b8e815cdfb48
[26] arXiv:2001.08361 [cs.LG]
[27] arXiv:1803.05457 [cs.AI]
[28] arXiv:1905.07830 [cs.CL]
[29] arXiv:2009.03300 [cs.CY]
[30] arXiv:2109.07958 [cs.CL]
[31] https://huggingface.co/blog/evaluating-mmlu-leaderboard
[32] https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf

--
Homepage: https://shawhintalebi.com/

Socials
https://medium.com/@shawhin
https://www.linkedin.com/in/shawhintalebi/
https://twitter.com/ShawhinT
https://www.instagram.com/shawhintalebi/

The Data Entrepreneurs
🎥 YouTube: https://www.youtube.com/@TheDataEntrepreneurs
👉 Discord: https://discord.gg/RSqZbF9ygh
📰 Medium: https://medium.com/the-data-entrepreneurs
📅 Events: https://lu.ma/tde
🗞️ Newsletter: https://the-data-entrepreneurs.ck.page/profile

Support ❤️
https://www.buymeacoffee.com/shawhint

Intro - 0:00
How much does it cost? - 1:30
4 Key Steps - 3:55
Step 1: Data Curation - 4:19
1.1: Data Sources - 5:31
1.2: Data Diversity - 7:45
1.3: Data Preparation - 9:06
Step 2: Model Architecture (Transformers) - 13:17
2.1: 3 Types of Transformers - 15:13
2.2: Other Design Choices - 18:27
2.3: How big do I make it? - 22:45
Step 3: Training at Scale - 24:20
3.1: Training Stability - 26:52
3.2: Hyperparameters - 28:06
Step 4: Evaluation - 29:14
4.1: Multiple-choice Tasks - 30:22
4.2: Open-ended Tasks - 32:59
What's next? - 34:31