[Hacker News Repost] Jamba: Production-grade Mamba-based AI model
-
Title: Jamba: Production-grade Mamba-based AI model
Text:
Url: https://www.maginative.com/article/ai21-labs-unveils-jamba-the-first-production-grade-mamba-based-ai-model/
Note: fetching the URL with the web scraper hit a connection timeout, so a summary of the page's content is not available here.
Post by: bubblehack3r
Comments:
smusamashah: There was a recent thread on explaining Mamba: https://news.ycombinator.com/item?id=39501982 (https://www.kolaayonrinde.com/blog/2024/02/11/mamba.html). There was another one on the same thing, probably better: https://news.ycombinator.com/item?id=39482428 (https://jackcook.com/2024/02/23/mamba.html)
eigenvalue: Has anyone gotten this to work in linux using 1 or 2 4090s? I get stuck on "Loading checkpoint shards: 71%" and then it bails. But weirdly nvidia-smi shows plenty of VRAM available. My machine has 256gb of RAM so I don't think that's the problem either. Really excited to try this one.
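For anyone hitting the same wall, here is a minimal multi-GPU loading sketch. It assumes the ai21labs/Jamba-v0.1 checkpoint on Hugging Face and the standard transformers loading path; the memory caps and quantization settings are illustrative guesses rather than anything from the article, and two 24GB cards will still be a tight squeeze for a roughly 52B-total-parameter MoE model.

```python
# Sketch only: loading a large sharded checkpoint across two GPUs with transformers.
# Model id, memory caps, and quantization settings are assumptions, not from the article.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "ai21labs/Jamba-v0.1"  # assumed Hugging Face repo name

# 8-bit weights so the ~52B-parameter MoE has a chance of fitting in 2x24GB;
# in full bf16 the weights alone are far larger than two 4090s can hold.
quant = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=["mamba"],        # keep the SSM blocks unquantized (assumed safe default)
    llm_int8_enable_fp32_cpu_offload=True,  # let layers that don't fit spill to system RAM
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",                                     # shard layers across visible GPUs
    max_memory={0: "22GiB", 1: "22GiB", "cpu": "200GiB"},  # leave headroom per GPU, rest to RAM
    quantization_config=quant,
)

prompt = "In a hybrid SSM/attention model, the Mamba layers"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=40)[0]))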
a_wild_dandan: To those curious about the tradeoffs between transformer and state space model layers, I highly recommend Sasha Rush's video on it: https://www.youtube.com/watch?v=dKJEpOtVgXc
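As a one-line summary of that tradeoff (this is the standard SSM formulation, not something specific to the article): a transformer layer attends over its whole prefix, so generation step $t$ costs $O(t)$ and its KV cache grows with $t$, whereas a state space layer carries a fixed-size state

$$h_t = \bar{A}\,h_{t-1} + \bar{B}\,x_t, \qquad y_t = C\,h_t,$$

so each decoding step is $O(1)$ in the context length (in Mamba, $\bar{A}$, $\bar{B}$, and $C$ are additionally functions of the input $x_t$). Jamba reportedly interleaves both layer types, trading some of that memory saving back for attention's recall.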
Reubend: It's great to see a full production-level model using Mamba. But when it comes to long-context-window benchmarks, I'd love to see accuracy as well as throughput. I was under the impression that Mamba gets huge throughput gains at the cost of modest accuracy losses when using long contexts.
skybrian: > Jamba boasts an extensive context window of 256K tokens, equivalent to around 210 pages of text, while fitting up to 140K tokens on a single 80GB GPU.<p>I realize this is a big improvement, but it’s striking how inefficient LLM’s are, that you need 80GB of GPU memory to analyze less than 1 megabyte of data. That’s a lot of bloat! Hopefully there’s a lot of room for algorithmic improvements.
skybrian: >;Jamba拥有256K代币的广泛上下文窗口,相当于大约210页的文本,同时在单个80GB GPU上最多可容纳140K代币<p> 我意识到这是一个很大的改进,但令人惊讶的是LLM的效率有多低,你需要80GB的GPU内存来分析不到1兆字节的数据。太夸张了!希望算法还有很大的改进空间。