[Hacker News repost] Llm.c – LLM training in simple, pure C/CUDA
-
Title: Llm.c – LLM training in simple, pure C/CUDA
Text:
Url: https://github.com/karpathy/llm.c
Post by: tosh
Comments:
patrick-fitz: <a href="https://twitter.com/karpathy/status/1777427944971083809" rel="nofollow">https://twitter.com/karpathy/status/1777427944971083809</a><p>> And once this is in a bit more stable state: videos on building this in more detail and from scratch.<p>Looking forward to watching the videos.
convexstrictly: Candle is a minimalist ML framework for Rust with a focus on performance (including GPU support) and ease of use<p><a href="https://github.com/huggingface/candle">https://github.com/huggingface/candle</a>
yinser: I've seen his nano GPT implemented using JAX, now we have C/CUDA. I'd love to see if nano GPT could be doable in Mojo. I took a stab at a Mojo conversion of his Wavenet project (Andrej's zero to hero course) and I gotta say... python has so many nice features lol. Stating the obvious I know but what you see done in 6 lines of python takes so much more work in other languages.
qwertox: > direct CUDA implementation, which will be significantly faster and probably come close to PyTorch.<p>It almost hurts to read that PyTorch is faster.<p>But then again, with these GPU-RAM prices, let's see how it speeds up the CPU.<p>We really need SO-DIMM slots on the RTX series (or AMD/Intel equivalent) so that we can expand the RAM as we need to. Is there a technical problem with that?
flockonus: Question, apologize if slightly off-topic, it's something I'd like to use this project for: Is there an example of how to train GPT-2 on time series, in particular with covariates?<p>As my understanding of LLM goes at a basic level it's predicting the next token from previous tokens, which sounds directionally similar to time series (perhaps letting aside periodicity).