Mechanistic Interpretability - Stella Biderman | Stanford MLSys #70

Published on Jan 31, 2023
4,840 views
Episode 70 of the Stanford MLSys Seminar “Foundation Models Limited Series”!

Speaker: Stella Biderman

Title: Mechanistic Interpretability – Reverse Engineering Learned Algorithms from Transformers

Abstract:
Transformers are exceptionally powerful technologies that have quickly gone from smashing NLP benchmarks to being one of the premier ML technologies, if not the foremost, in a wide array of fields. Given their growing role in technological pipelines and in society writ large, understanding how and why they work is a pressing issue. In this talk I give an overview of research on Mechanistic Interpretability, a field that has had substantial success picking apart transformers and understanding the algorithms that trained models use to reason. Topics covered include: the algorithm that toy LLMs can use to perform arithmetic accurately; how real-world LLMs do object identification; and how AlphaFold learns 2D projections of structures and then inflates them over time. Time permitting, I hope to discuss recent discoveries at EleutherAI currently under review for publication.
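For context on the first topic: mechanistic interpretability work on toy models trained on modular addition (e.g., the grokking literature) has recovered a trigonometric "clock" algorithm, where inputs are encoded as rotations and the answer is read off where the rotations align. A minimal numpy sketch of that algorithm follows; the modulus and the frequencies are illustrative stand-ins, not values from the talk, and a real trained model learns its own set of key frequencies.

import numpy as np

# Illustrative "clock" algorithm for (a + b) mod p, as recovered by
# mechanistic interpretability work on toy transformers. Values below
# are hypothetical choices for demonstration only.
p = 113                # modulus (prime, as in the grokking literature)
freqs = [17, 25, 32]   # a few assumed "key frequencies"

def mod_add_logits(a: int, b: int) -> np.ndarray:
    """Score each candidate answer c by sum over w of cos(2*pi*w*(a+b-c)/p).

    Each cosine term equals exactly 1 when c == (a + b) mod p, so the
    sum peaks at the correct answer; for any other c (with p prime and
    w not divisible by p) every term is strictly below 1.
    """
    c = np.arange(p)
    logits = np.zeros(p)
    for w in freqs:
        logits += np.cos(2 * np.pi * w * (a + b - c) / p)
    return logits

a, b = 40, 99
pred = int(np.argmax(mod_add_logits(a, b)))
assert pred == (a + b) % p
print(f"({a} + {b}) mod {p} = {pred}")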

Bio:
Stella Biderman is the head of research at EleutherAI, an online research lab that has revolutionized open access to large language models. She is best known for her work on democratizing LLMs, especially the GPT-Neo-2.7B, GPT-NeoX-20B, and BLOOM-176B models, each of which was the largest publicly available GPT-3-style LLM in the world at the time of its release. Her work on publicly available datasets and evaluation frameworks has become an integral part of training foundation models in NLP. Her interest in open-sourcing NLP models is primarily driven by her passion for interpretability research, a topic she has increasingly focused on as access to LLMs has increased. She proudly does not possess a PhD.