[Hacker News Digest] DBRX: A new open LLM
-
Title: DBRX: A new open LLM
Text:
Url: https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
The article introduces DBRX, a new open LLM developed by Databricks. The model does well on programming and math benchmarks, surpassing GPT-3.5 and competing with Gemini 1.0 Pro. DBRX uses a fine-grained mixture-of-experts (MoE) architecture, which improves both training and inference performance. DBRX is available to Databricks customers via an API, and customers can pretrain models of this class from scratch or continue training from the released checkpoints. The model will also be integrated into Databricks' AI products. The weights of the base model and the fine-tuned model are published on Hugging Face under an open license.
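Since the weights are published on Hugging Face, a minimal sketch of loading the base model with the transformers library might look like the following. The repo id comes from the model card linked in the comments below; the dtype, device-map, and trust_remote_code settings are assumptions based on common transformers usage, and at 16-bit precision the weights alone need on the order of 264GB of memory.

```python
# Minimal sketch: loading DBRX base weights from Hugging Face with transformers.
# Assumes access to the repo and enough memory (~264GB for 16-bit weights);
# the exact loading flags are assumptions based on common transformers usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "databricks/dbrx-base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,   # 16-bit weights; halves memory vs. float32
    device_map="auto",            # spread layers across available GPUs/CPU
    trust_remote_code=True,       # load the model code shipped with the checkpoint
)

inputs = tokenizer("Databricks was founded in", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```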
Post by: jasondavies
Comments:
djoldman: Model card for base: https://huggingface.co/databricks/dbrx-base

> The model requires ~264GB of RAM

I'm wondering when everyone will transition from tracking parameter count vs. evaluation metric to (total GPU RAM + total CPU RAM) vs. evaluation metric.

For example, a 7B parameter model using float32s will almost certainly outperform a 7B model using float4s.

Additionally, all the examples of quantizing recently released superior models to fit on one GPU don't mean the quantized model is a "win." The quantized model is a different model; you need to rerun the metrics.
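As a rough check of that point: weight memory scales with parameter count times bytes per parameter, which is where the ~264GB figure comes from if DBRX's roughly 132B total parameters are held at 16 bits. A small sketch (the parameter counts and precisions below are illustrative assumptions, and activation/KV-cache overhead is ignored):

```python
# Rough weight-memory estimate: parameters * bytes per parameter.
# Ignores activations, KV cache, and framework overhead; the 132B total
# parameter count for DBRX is taken from the announcement, the rest are
# illustrative assumptions.

def weight_memory_gb(params: float, bits_per_param: float) -> float:
    """Approximate memory needed just to hold the weights, in GB."""
    return params * bits_per_param / 8 / 1e9

for name, params, bits in [
    ("7B float32", 7e9, 32),
    ("7B float16", 7e9, 16),
    ("7B 4-bit",   7e9, 4),
    ("DBRX 132B bfloat16", 132e9, 16),
]:
    print(f"{name:20s} ~{weight_memory_gb(params, bits):6.1f} GB")
```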
hintymad: Just curious, what business benefit will Databricks get by spending potentially millions of dollars on an open LLM?
briandw: Worse than the chart crime of truncating the y axis is putting LLaMa2's Human Eval scores on there and not comparing it to Code Llama Instruct 70b. DBRX still beats Code Llama Instruct's 67.8 but not by that much.
underlines: Waiting for Mixed Quantization with HQQ and MoE Offloading [1]. With that I was able to run Mixtral 8x7B on my 10 GB VRAM RTX 3080... This should work for DBRX and should shave off a ton of the VRAM requirement.

[1] https://github.com/dvmazur/mixtral-offloading?tab=readme-ov-file
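For intuition on why expert offloading helps so much with MoE models, here is a toy VRAM budget in the spirit of the linked repo: keep the non-expert weights plus a small per-layer cache of experts on the GPU and leave the rest in CPU RAM. Every size below is an assumed, illustrative number (loosely modeled on a 4-bit Mixtral-8x7B split), not a measurement of that project or of DBRX.

```python
# Toy VRAM budget for a quantized MoE model with expert offloading:
# only non-expert weights plus a small per-layer cache of experts live on the
# GPU; the remaining experts sit in CPU RAM and are copied in on demand.
# All sizes below are illustrative assumptions, not measured values.

num_layers        = 32    # transformer blocks (Mixtral-8x7B-like, assumed)
experts_per_layer = 8
total_expert_gb   = 22.0  # all expert weights at ~4-bit (assumed)
non_expert_gb     = 2.0   # attention/embeddings/router at ~4-bit (assumed)
cached_experts    = 2     # experts kept per layer in a GPU-side cache

expert_size_gb = total_expert_gb / (num_layers * experts_per_layer)
vram_gb    = non_expert_gb + num_layers * cached_experts * expert_size_gb
cpu_ram_gb = num_layers * (experts_per_layer - cached_experts) * expert_size_gb

print(f"GPU VRAM needed            : ~{vram_gb:.1f} GB")
print(f"CPU RAM (offloaded experts): ~{cpu_ram_gb:.1f} GB")
```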
XCSme: I am planning to buy a new GPU.

If the GPU has 16GB of VRAM and the model is 70GB, can it still run well? Also, does it run considerably better than on a GPU with 12GB of VRAM?

I run Ollama locally; mixtral works well (7B, 3.4GB) on a 1080ti, but the 24.6GB version is a bit slow (still usable, but has a noticeable start-up time).
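One way to reason about the 16GB-VRAM-vs-70GB-model question is layer offloading: runners such as llama.cpp and Ollama keep as many transformer layers as fit in VRAM and run the rest on the CPU, so extra VRAM mostly buys more GPU-resident layers and higher throughput rather than a hard yes/no. A toy estimate, where the layer count, per-layer size, and reserved overhead are all assumptions for illustration:

```python
# Toy estimate of how many transformer layers of a large quantized model fit
# in a given amount of VRAM. Layer count and per-layer size are illustrative
# assumptions; real runners (llama.cpp/Ollama) pick this split automatically.

def layers_on_gpu(model_gb: float, num_layers: int, vram_gb: float,
                  overhead_gb: float = 1.5) -> int:
    """How many equally sized layers fit after reserving some VRAM overhead."""
    per_layer_gb = model_gb / num_layers
    budget = max(vram_gb - overhead_gb, 0.0)
    return min(num_layers, int(budget / per_layer_gb))

MODEL_GB, NUM_LAYERS = 70.0, 80   # e.g. a 70 GB quantized model, 80 blocks (assumed)

for vram in (12, 16, 24):
    n = layers_on_gpu(MODEL_GB, NUM_LAYERS, vram)
    print(f"{vram:2d} GB VRAM -> ~{n}/{NUM_LAYERS} layers on GPU, rest on CPU")
```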