Unexpected Real-World Revelations of LLMs
What is Retrieval Augmented Generation (RAG)? Simplified Explanation
🔵 Want better RAG results? Optimize your Data
Early stages of the reinforcement learning era of language models
Understanding LLMs for Code Generation
How to Fine-Tune a Small LM to Think for Itself and Solve Puzzles (GRPO & RL!)
How to fine-tune LLMs with Tunix
How language model post-training is done today
Reinforcement Learning with Large Datasets: Robotics, Image Generation, and LLMs
AI Practitioner Exam Bites #35: Fine-Tuning Methods for Optimized AI
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning
LLMs Explained | What Is an LLM?
HRPO: RL for Hybrid Latent Reasoning
Hands-on 10: Large Language Model Alignment with Direct Preference Optimization
Stanford CS229 I Machine Learning I Building Large Language Models (LLMs)
Can Wikipedia Help Offline Reinforcement Learning? (Paper Explained)
[UCLA RL-LLM] Chapter 0: Course outline and prologue
Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills (ICML 2024)
ReVisual-R1: Staged MLLM Reasoning