[Hacker News Repost] AMD GPU Inference
-
Title: AMD GPU Inference
Text:
Url: https://github.com/slashml/amd_inference
GitHub link: `https://github.com/slashml/amd_inference`
Project name: `amd_inference`
Project overview: `amd_inference` appears to be a machine learning project for running inference on AMD (Advanced Micro Devices) hardware. From the repository name and link we can infer the following:
1. **Tech stack**: the project likely runs deep learning model inference on AMD GPUs or CPUs, which typically involves AMD-side technologies such as ROCm, HIP, or OpenCL (the AMD counterparts to NVIDIA's CUDA and cuDNN).
2. **Target users**: developers and researchers who want to optimize deep learning inference performance on AMD hardware; the project may provide optimized models, code, or libraries for better performance on AMD devices.
3. **Contents**: the project may include:
   - Model optimization: deep learning models tuned for AMD hardware.
   - Inference engine: an engine that can be deployed on AMD devices.
   - Performance analysis tools: for evaluating model performance on AMD hardware.
   - Documentation and examples: guides for using the project's tools and libraries.
Summary: `amd_inference` appears to be a GitHub project focused on optimizing and running deep learning model inference on AMD hardware. If you are interested in inference performance on AMD devices, the repository's code, documentation, and examples are worth a look.
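Since the repository's actual entry points are not shown here, the following is only a minimal sketch of what ROCm-based inference with upstream PyTorch usually looks like; the wheel index URL and the verification one-liner are assumptions about a typical setup, not code from amd_inference.

# install the ROCm build of PyTorch (check pytorch.org for the current index URL)
pip install torch --index-url https://download.pytorch.org/whl/rocm6.1
pip install transformers

# verify the AMD GPU is visible; ROCm builds of PyTorch expose it through the regular "cuda" API
python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"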
Post by: fazkan
Comments:
lhl: For inference, if you have a supported card (or probably a supported architecture, if you are on Linux and can use HSA_OVERRIDE_GFX_VERSION), then you can probably run anything with (upstream) PyTorch and transformers. Also, compiling llama.cpp has been pretty trouble-free for me for at least a year.

(If you are on Windows, there is usually a win-hip binary of llama.cpp in the project's releases, or if things totally refuse to work, you can use the Vulkan build as a (less performant) fallback.)

Having more options can't hurt, but ROCm 5.4.2 is almost 2 years old, and things have come a long way since then, so I'm curious about this being published freshly today, in October 2024.

BTW, I recently went through and updated my compatibility doc (focused on RDNA3) with ROCm 6.2 for those interested. A lot has changed just in the past few months (upstream bitsandbytes, upstream xformers, and Triton-based Flash Attention): https://llm-tracker.info/howto/AMD-GPUs
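As a rough illustration of the HSA_OVERRIDE_GFX_VERSION workaround described above (assuming a ROCm build of PyTorch is already installed; the value 11.0.0 targets gfx1100-class RDNA3 cards and is an example only, pick the override matching your architecture):

# override the reported GFX version before the ROCm runtime initializes,
# then confirm upstream PyTorch can see and actually use the card
export HSA_OVERRIDE_GFX_VERSION=11.0.0
python3 -c "import torch; x = torch.randn(1024, 1024, device='cuda'); print(torch.cuda.get_device_name(0), (x @ x).sum().item())"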
tcdent: The rise of generated slop ML libraries is staggering.

This library is 50% print statements. And where it does branch, it doesn't even need to.

It defines two environment variables and sets two flags on torch.
slavik81: On Ubuntu 24.04 (and Debian Unstable¹), the OS-provided packages should be able to get llama.cpp running on ROCm on just about any discrete AMD GPU from Vega onwards²³⁴. No Docker or HSA_OVERRIDE_GFX_VERSION required. The performance might not be ideal in every case⁵, but I've tested a wide variety of cards:

# install dependencies
sudo apt -y update
sudo apt -y upgrade
sudo apt -y install git wget hipcc libhipblas-dev librocblas-dev cmake build-essential

# ensure you have permissions by adding yourself to the video and render groups
sudo usermod -aG video,render $USER
# log out and then log back in to apply the group changes
# you can run `rocminfo` and look for your GPU in the output to check everything is working thus far

# download a model, build llama.cpp, and run it
wget https://huggingface.co/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/resolve/main/dolphin-2.2.1-mistral-7b.Q5_K_M.gguf?download=true -O dolphin-2.2.1-mistral-7b.Q5_K_M.gguf
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
git checkout b3267
HIPCXX=clang-17 cmake -H. -Bbuild -DGGML_HIPBLAS=ON -DCMAKE_HIP_ARCHITECTURES="gfx803;gfx900;gfx906;gfx908;gfx90a;gfx1010;gfx1030;gfx1100;gfx1101;gfx1102" -DCMAKE_BUILD_TYPE=Release
make -j16 -C build
build/bin/llama-cli -ngl 32 --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -m ../dolphin-2.2.1-mistral-7b.Q5_K_M.gguf --prompt "Once upon a time"
I'd suggest RDNA 3, MI200 and MI300 users should probably use the AMD-provided ROCm packages for improved performance. Users that need PyTorch should also use the AMD-provided ROCm packages, as PyTorch has some dependencies that are not available from the system packages. Still, you can't beat the ease of installation or the compatibility with older hardware provided by the OS packages.

¹ https://lists.debian.org/debian-ai/2024/07/msg00002.html
² Not including MI300 because that released too close to the Ubuntu 24.04 launch.
³ Pre-Vega architectures might work, but have known bugs for some applications.
⁴ Vega and RDNA 2 APUs might work with Linux 6.10+ installed. I'm in the process of testing that.
⁵ The version of rocBLAS that comes with Ubuntu 24.04 is a bit old and therefore lacks some optimizations for RDNA 3. It's also missing some MI200 optimizations.

a2128: It seems to use an old, 2-year-old version of ROCm (5.4.2), which I'm doubtful would support my RX 7900 XTX. I personally found it easiest to just use the latest `rocm/pytorch` image and run what I need from there.
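For readers who want to try the container route a2128 mentions, a rough sketch of the usual ROCm container invocation follows; the device and group flags follow AMD's standard container pattern, and the :latest tag is an assumption (pinning a specific tag is advisable).

docker run -it --rm \
  --device=/dev/kfd --device=/dev/dri \
  --group-add video --security-opt seccomp=unconfined \
  -v "$PWD":/workspace -w /workspace \
  rocm/pytorch:latest \
  python3 -c "import torch; print(torch.cuda.is_available())"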
rglullis: So, this is all I needed to add to my NixOS workstation:

hardware.graphics.enable = true;
services.ollama = {
  enable = true;
  acceleration = "rocm";
  environmentVariables = {
    ROC_ENABLE_PRE_VEGA = "1";
    HSA_OVERRIDE_GFX_VERSION = "11.0.0";
  };
};
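After a rebuild, the Ollama service should pick up ROCm acceleration. A quick way to check, sketched under the assumption that the service unit is named `ollama` and using an example model name not taken from the comment:

sudo nixos-rebuild switch
ollama run llama3.2 "Say hello"   # pulls the model on first run
journalctl -u ollama -f           # watch for log lines showing the GPU was detected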