[Hacker News Repost] Jamba: Production-grade Mamba-based AI model
-
Title: Jamba: Production-grade Mamba-based AI model
Text:
Url: https://www.maginative.com/article/ai21-labs-unveils-jamba-the-first-production-grade-mamba-based-ai-model/
Note: fetching the URL with the web scraper hit a connection timeout, so a summary of the page's content is not available here.
Post by: bubblehack3r
Comments:
smusamashah: There was a recent thread on explaining Mamba: https://news.ycombinator.com/item?id=39501982 (https://www.kolaayonrinde.com/blog/2024/02/11/mamba.html). There was another one on the same thing, probably better: https://news.ycombinator.com/item?id=39482428 (https://jackcook.com/2024/02/23/mamba.html)
eigenvalue: Has anyone gotten this to work in linux using 1 or 2 4090s? I get stuck on "Loading checkpoint shards: 71%" and then it bails. But weirdly nvidia-smi shows plenty of VRAM available. My machine has 256gb of RAM so I don't think that's the problem either. Really excited to try this one.
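For anyone hitting the same wall, here is a minimal multi-GPU loading sketch. It assumes the ai21labs/Jamba-v0.1 checkpoint on Hugging Face and the standard transformers loading path; the memory caps and quantization settings are illustrative guesses rather than anything from the article, and two 24GB cards will still be a tight squeeze for a roughly 52B-total-parameter MoE model.

```python
# Sketch only: loading a large sharded checkpoint across two GPUs with transformers.
# Model id, memory caps, and quantization settings are assumptions, not from the article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "ai21labs/Jamba-v0.1"  # assumed Hugging Face repo name

# 8-bit weights so the ~52B-parameter MoE has a chance of fitting in 2x24GB;
# in full bf16 the weights alone are far larger than two 4090s can hold.
quant = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=["mamba"],        # keep the SSM blocks unquantized (assumed safe default)
    llm_int8_enable_fp32_cpu_offload=True,  # let layers that don't fit spill to system RAM
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",                                     # shard layers across visible GPUs
    max_memory={0: "22GiB", 1: "22GiB", "cpu": "200GiB"},  # leave headroom per GPU, rest to RAM
    quantization_config=quant,
)

prompt = "In a hybrid SSM/attention model, the Mamba layers"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=40)[0]))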
a_wild_dandan: To those curious about the tradeoffs between transformer and state space model layers, I highly recommend Sasha Rush's video on it: https://www.youtube.com/watch?v=dKJEpOtVgXc
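As a one-line summary of that tradeoff (this is the standard SSM formulation, not something specific to the article): a transformer layer attends over its whole prefix, so generation step $t$ costs $O(t)$ and its KV cache grows with $t$, whereas a state space layer carries a fixed-size state

$$h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t,$$

so each decoding step is $O(1)$ in the context length (in Mamba, $\bar{A}$, $\bar{B}$, and $C$ are additionally functions of the input $x_t$). Jamba reportedly interleaves both layer types, trading some of that memory saving back for attention's recall.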
Reubend: It's great to see a full production-level model using Mamba. But when it comes to long-context-window benchmarks, I'd love to see accuracy as well as throughput. I was under the impression that Mamba gets huge throughput gains at the cost of modest accuracy losses when using long contexts.
skybrian: > Jamba boasts an extensive context window of 256K tokens, equivalent to around 210 pages of text, while fitting up to 140K tokens on a single 80GB GPU.<p>I realize this is a big improvement, but it’s striking how inefficient LLM’s are, that you need 80GB of GPU memory to analyze less than 1 megabyte of data. That’s a lot of bloat! Hopefully there’s a lot of room for algorithmic improvements.
skybrian: >;Jamba拥有256K代币的广泛上下文窗口,相当于大约210页的文本,同时在单个80GB GPU上最多可容纳140K代币<p> 我意识到这是一个很大的改进,但令人惊讶的是LLM的效率有多低,你需要80GB的GPU内存来分析不到1兆字节的数据。太夸张了!希望算法还有很大的改进空间。