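The era of Large Reasoning Models is almost synonymous with the era of Reinforcement Learning + LLMs. But RL is a highly specialized field: at ML conferences, even researchers who focus on RL work through the math with pen and paper on the spot, so it is hard to learn. Sharing an introductory RL text that covers everything from RL fundamentals such as MDPs, through PPO, to combinations with LLMs such as RLHF, all explained accessibly.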

Published: 2025-03-20 03:30:28


Source: twitter.com

Submitted 20 hours ago by 马东锡 NLP 🇸🇪

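The era of Large Reasoning Models is almost synonymous with the era of Reinforcement Learning + LLMs.

But RL is a highly specialized field. At ML conferences, even researchers who focus on RL can be seen working through the math with pen and paper on the spot, so it is a hard subject to learn.

Here is an introductory RL text that covers everything from RL fundamentals such as MDPs, through PPO, to combinations with LLMs such as RLHF, all explained in an accessible way.

Reinforcement Learning: An Overview: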
https://t.co/rjYSpOtbJl


