Results: reinforcement learning in language models
11:29

Reinforcement Learning from Human Feedback (RLHF) Explained

IBM Technology
65,175 views - 1 year ago
18:02

Reinforcement Learning with Human Feedback (RLHF), Clearly Explained!!!

StatQuest with Josh Starmer
37,308 views - 5 months ago
15:34

The Unexpected Real-World Revelation of LLMs

bycloud
139,648 views - 4 months ago
59:31

Early stages of the reinforcement learning era of language models

Nathan Lambert
4,966 views - 7 months ago
23:20

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Richard Aragon
619 views - 4 months ago
40:50

Functional alignment of protein language models via reinforcement learning

ML for protein engineering seminar series
854 views - 2 months ago
4:10

Reinforcement Learning Is Terrible - Andrej Karpathy

Dwarkesh Clips, Dwarkesh Patel
50,683 views - 6 days ago
8:37

Reinforcement learning (RL) enhanced large language models (LLMs), exploring RL techniques

AI Podcast Series. Byte Goose AI.
22 views - 2 months ago
30:30

🔵 Want better RAG results? Optimize your Data

SAP Developers
193 views - Streamed 2 days ago
51:06

How to Fine-Tune a Small LM to Think for Itself and Solve Puzzles (GRPO & RL!)

Neural Breakdown with AVB
17,436 views - 3 months ago
18:17

Reinforcement Learning for Agents - Will Brown, ML Researcher at Morgan Stanley

AI Engineer
95,213 views - 7 months ago
7:58

Large Language Models explained briefly

3Blue1Brown
4,086,303 views - 11 months ago
50:53

Lecture 05 • Reinforcement Learning for Language Models

Meridian Cambridge
364 views - 5 months ago
38:24

Proximal Policy Optimization (PPO) - How to train Large Language Models

Serrano.Academy
70,270 views - 1 year ago
1:20:32

Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 16: Alignment - RL 1

Stanford Online
16,977 views - 3 months ago
23:16

DeepSeek's GRPO (Group Relative Policy Optimization) | Reinforcement Learning for LLMs

Julia Turc
27,230 views - 7 months ago
26:31

Optimizing Large Language Models with Reinforcement Learning-Based Prompts

LLMs Explained - Aggregate Intellect - AI.SCIENCE
1,632 views - 2 years ago
1:07:09

Richard Sutton – Father of RL thinks LLMs are a dead end

Dwarkesh Patel
449,395 views - 4 weeks ago
53:51

How language model post-training is done today

Interconnects AI
11,272 views - 9 months ago
22:44

LLM Training & Reinforcement Learning from Google Engineer | SFT + RLHF | PPO vs GRPO vs DPO

Martin Is A Dad
8,584 views - 7 months ago