[Hacker News repost] Lm.rs: Minimal CPU LLM inference in Rust with no dependency

hackernews

Title: Lm.rs: Minimal CPU LLM inference in Rust with no dependency

Text:

Url: https://github.com/samuel-vitorino/lm.rs

Post by: littlestymaar

Comments:

simonw: This is impressive. I just ran the 1.2 GB llama3.2-1b-it-q80.lmrs on an M2 64GB MacBook and it felt speedy and used 1000% of CPU across 13 threads (according to Activity Monitor).
<pre><code>cd /tmp
git clone https://github.com/samuel-vitorino/lm.rs
cd lm.rs
RUSTFLAGS="-C target-cpu=native" cargo build --release --bin chat
curl -LO 'https://huggingface.co/samuel-vitorino/Llama-3.2-1B-Instruct-Q8_0-LMRS/resolve/main/tokenizer.bin?download=true'
curl -LO 'https://huggingface.co/samuel-vitorino/Llama-3.2-1B-Instruct-Q8_0-LMRS/resolve/main/llama3.2-1b-it-q80.lmrs?download=true'
./target/release/chat --model llama3.2-1b-it-q80.lmrs</code></pre>
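
A CPU reading near 1000% suggests the heavy work is being spread across cores. In transformer inference that work is mostly matrix-vector products, so here is a minimal sketch of splitting one such product across threads using only the standard library. The function name `matvec_parallel` and the row-chunking strategy are assumptions for illustration; this is not taken from lm.rs, whose kernels may be organized differently.

```rust
use std::thread;

/// Multiply a row-major `rows x cols` matrix `w` by the vector `x`,
/// splitting the output rows into contiguous chunks, one per worker thread.
/// Illustrative only; this is not lm.rs's actual kernel.
pub fn matvec_parallel(w: &[f32], x: &[f32], rows: usize, n_threads: usize) -> Vec<f32> {
    let cols = x.len();
    assert_eq!(w.len(), rows * cols, "weight matrix has the wrong size");
    let mut out = vec![0.0f32; rows];

    // Rows per worker, rounded up so every row is covered; never zero.
    let workers = n_threads.max(1);
    let chunk = ((rows + workers - 1) / workers).max(1);

    // Scoped threads may borrow `w`, `x`, and disjoint slices of `out`.
    thread::scope(|s| {
        for (i, out_chunk) in out.chunks_mut(chunk).enumerate() {
            let first_row = i * chunk;
            s.spawn(move || {
                for (r, o) in out_chunk.iter_mut().enumerate() {
                    let start = (first_row + r) * cols;
                    let row = &w[start..start + cols];
                    // Dot product of one weight row with the input vector.
                    *o = row.iter().zip(x).map(|(a, b)| a * b).sum();
                }
            });
        }
    });
    out
}
```

Each worker writes to its own slice of the output, so no locking is needed. Spawning threads per call is wasteful for small matrices; a real implementation would more likely reuse a thread pool.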

jll29: This is beautifully written, thanks for sharing.

I could see myself using some of the source code in the classroom to explain how transformers "really" work; code is more concrete/detailed than all those pictures of attention heads etc.

Two points of minor criticism/suggestions for improvement:

- Libraries should not print to stdout, as that output may destroy application output (imagine I want to use the library in a text editor to offer style checking). So best to write to a string buffer owned by a logging class instance associated with a lm.rs object.
- Is it possible to do all this without "unsafe" without twisting one's arm? I see there are uses of "unsafe" e.g. to force data alignment in the model reader.

Again, thanks and very impressive!
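
Both suggestions can be sketched in a few lines of Rust. The example below decodes little-endian `f32` weights without `unsafe` and reports progress to a sink supplied by the caller rather than printing to stdout. The name `load_weights` and the flat `f32` layout are assumptions for illustration; lm.rs's real model reader and file format are not reproduced here.

```rust
use std::io::{self, Write};

/// Decode little-endian f32 weights from raw bytes without `unsafe`,
/// reporting progress to a caller-supplied sink instead of stdout.
/// `load_weights` is a hypothetical name, not part of lm.rs.
pub fn load_weights<W: Write>(bytes: &[u8], log: &mut W) -> io::Result<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "weight data length is not a multiple of 4",
        ));
    }
    // `chunks_exact(4)` yields 4-byte slices; copying each into an array
    // sidesteps any alignment requirement on the input buffer.
    let weights: Vec<f32> = bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    // The caller decides where this goes: a Vec<u8>, a file, stderr, ...
    writeln!(log, "loaded {} weights", weights.len())?;
    Ok(weights)
}
```

A host application could pass `&mut Vec<u8>` to capture the messages in memory, or `&mut std::io::stderr()` to keep them off stdout. The trade-off is that this version copies the data, whereas reinterpreting a memory-mapped buffer in place avoids the copy but generally needs `unsafe` or a helper crate.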

nikolayasdf123: how does this compare to <a href="https://github.com/EricLBuehler/mistral.rs">https://github.com/EricLBuehler/mistral.rs</a> ?

J_Shelby_J: Neat.

FYI I have a whole bunch of Rust tools [0] for loading models and other LLM tasks. For example, auto-selecting the largest quant based on memory available, extracting a tokenizer from a GGUF, prompting, etc. You could use this to remove some of the Python dependencies you have.

Currently it is built to support llama.cpp, but this is pretty neat too. Any plans to support grammars?

[0] <a href="https://github.com/ShelbyJenkins/llm_client">https://github.com/ShelbyJenkins/llm_client</a>
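
The idea of selecting the largest quant that fits in memory can be sketched as a simple filter-and-max. The `Quant` struct, the `largest_fitting` function, and the fixed overhead reservation are all hypothetical and chosen for illustration; they are not the llm_client API, which may weigh other factors.

```rust
/// A candidate quantized model file. Names and sizes are illustrative.
#[derive(Debug, Clone, PartialEq)]
pub struct Quant {
    pub name: String,
    pub size_bytes: u64,
}

/// Pick the largest quantization whose file fits in `available_bytes`
/// after reserving `overhead_bytes` for the KV cache and activations.
/// Hypothetical helper; not the llm_client API.
pub fn largest_fitting(
    quants: &[Quant],
    available_bytes: u64,
    overhead_bytes: u64,
) -> Option<&Quant> {
    // saturating_sub avoids underflow when the overhead exceeds free memory.
    let budget = available_bytes.saturating_sub(overhead_bytes);
    quants
        .iter()
        .filter(|q| q.size_bytes <= budget)
        .max_by_key(|q| q.size_bytes)
}
```

Returning `Option` lets the caller handle the case where no quantization fits, for example by reporting an error instead of attempting a load that would exhaust memory.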

gip: Great! Did something similar some time ago [0] but the performance was underwhelming compared to C/C++ code running on CPU (which points to my lack of understanding of how to make Rust fast). Would be nice to have some benchmarks of the different Rust implementations.

Implementing LLM inference should/could really become the new "hello world!" for serious programmers out there :)

[0] <a href="https://github.com/gip/yllama.rs">https://github.com/gip/yllama.rs</a>
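
For comparing implementations, the usual figure is decode throughput in tokens per second. A minimal timing harness might look like the sketch below. The `step` closure stands in for whatever single-token decode call a given implementation exposes; both function names are hypothetical and no measured results are implied.

```rust
use std::time::{Duration, Instant};

/// Throughput in tokens per second; returns 0.0 for a zero-length interval.
pub fn tokens_per_second(n_tokens: usize, elapsed: Duration) -> f64 {
    let secs = elapsed.as_secs_f64();
    if secs > 0.0 {
        n_tokens as f64 / secs
    } else {
        0.0
    }
}

/// Time `n_tokens` calls of `step` (one call = one generated token).
/// `step` is a stand-in for an implementation's decode function.
pub fn bench_decode<F: FnMut()>(n_tokens: usize, mut step: F) -> f64 {
    let start = Instant::now();
    for _ in 0..n_tokens {
        step();
    }
    tokens_per_second(n_tokens, start.elapsed())
}
```

For a fair comparison, each implementation should run the same model and quantization, use the same prompt and thread count, and be built in release mode, with prompt processing timed separately from generation.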
