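OpenAI surely knew the secret behind R1 Zero but kept quiet about it; since DeepSeek made it public, several replication projects have already appeared. This project replicates R1 using only the Math8k dataset and likewise observes a test-time RL scaling law, with results better than those in earlier papers. It really was just a thin sheet of window paper: in hindsight the idea is simple, first-principles, and reasonable. But very few people were willing to actually do it.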

Published: 2025-01-26 03:10:13

Current affairs (twitter.com)

Submitted 9 days ago by 九原客
