[Hacker News Repost] Full-scale file system acceleration on GPU [pdf]
-
Title: Full-scale file system acceleration on GPU [pdf]
Text:
Url: https://dl.gi.de/server/api/core/bitstreams/7c7a8830-fd81-4e56-8507-cd4809020660/content
Post by: west0n
Comments:
magicalhippo: Given that PCIe allows data to be piped directly from one device to another without going through the host CPU [1][2], I guess it might make sense to just have the GPU read blocks straight from the NVMe (or even NVMe-oF [3]) rather than having the CPU do a lot of work.

edit: blind as a bat, says so right in the paper of course: "PMem is mapped directly to the GPU, and NVMe memory is accessed via Peer to Peer-DMA (P2PDMA)"

[1]: https://nvmexpress.org/wp-content/uploads/Enabling-the-NVMe-CMB-and-PMR-Ecosystem.pdf
[2]: https://lwn.net/Articles/767281/
[3]: https://www.nvmexpress.org/wp-content/uploads/NVMe_Over_Fabrics.pdf
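For readers unfamiliar with this path: NVIDIA's GPUDirect Storage is a shipping counterpart to the P2PDMA route the commenter describes, letting NVMe blocks DMA straight into VRAM via the cuFile API. The sketch below is a minimal illustration of that API, not the paper's GPU4FS code; the file path and sizes are placeholders, and error handling is omitted.

```cuda
// Minimal sketch: read a file from NVMe directly into GPU memory with
// NVIDIA's cuFile API (GPUDirect Storage). Error handling abbreviated.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>
#include <cuda_runtime.h>
#include <cufile.h>

int main() {
    const size_t size = 1 << 20;            // 1 MiB read; arbitrary
    cuFileDriverOpen();                     // bring up the GDS driver

    int fd = open("/mnt/nvme/data.bin", O_RDONLY | O_DIRECT);  // placeholder path
    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);      // register the fd with GDS

    void *devPtr = nullptr;
    cudaMalloc(&devPtr, size);
    cuFileBufRegister(devPtr, size, 0);     // pin the GPU buffer for DMA

    // DMA straight from the NVMe device into VRAM; the host CPU only
    // issues the request, it never touches the payload.
    ssize_t n = cuFileRead(fh, devPtr, size, /*file_offset=*/0, /*dev_offset=*/0);
    printf("read %zd bytes into GPU memory\n", n);

    cuFileBufDeregister(devPtr);
    cudaFree(devPtr);
    cuFileHandleDeregister(fh);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```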
west0n: According to this paper, GPU4FS is a file system that can run on the GPU and be accessed by applications. Since GPUs cannot make system calls, GPU4FS uses shared video memory (VRAM) and a parallel queue implementation. Applications running on the GPU can use GPU4FS after modifying their code, eliminating the need for a CPU-side file system. The experiments were done on Optane memory.

It would be interesting to know whether this approach could optimize the performance of training and inference for large models.
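The parallel queue is the key trick here: a kernel cannot trap into the OS, so file-system requests are instead posted to a queue in memory that both producer and consumer can see. Below is a hypothetical illustration of that idea using CUDA unified memory; all names (FsRequest, FsQueue, enqueue_reads) are invented for this sketch and are not taken from GPU4FS.

```cuda
// Toy version of a "syscall-free" request queue: GPU threads claim ring
// slots atomically and write their file-system requests into them.
#include <cstdio>
#include <cuda_runtime.h>

struct FsRequest {
    int op;          // e.g. 0 = read, 1 = write (invented encoding)
    long offset;     // byte offset within the file
    long len;        // request length in bytes
};

struct FsQueue {
    unsigned int tail;        // next free slot, advanced atomically
    FsRequest slots[1024];    // fixed-capacity ring
};

// Each GPU thread claims a slot with an atomic and writes its request.
__global__ void enqueue_reads(FsQueue *q, long chunk) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int slot = atomicAdd(&q->tail, 1u) % 1024u;
    q->slots[slot] = FsRequest{0, tid * chunk, chunk};
}

int main() {
    FsQueue *q;
    cudaMallocManaged(&q, sizeof(FsQueue));   // visible to GPU and CPU
    q->tail = 0;

    enqueue_reads<<<4, 256>>>(q, 4096);
    cudaDeviceSynchronize();

    // A service loop (GPU-resident in GPU4FS's design; host-side in other
    // schemes) would drain the ring here and issue the storage operations.
    printf("%u requests enqueued\n", q->tail);
    cudaFree(q);
    return 0;
}
```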
ec109685: Interesting they would discuss system call overhead of opening a file, reading from it and closing it. Seems like in almost all cases the open and close calls would be overwhelmed by the other operations.
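That claim is easy to sanity-check on the host. The sketch below times open()+close() alone against open() plus one bulk read() plus close(); the path, iteration count, and buffer size are arbitrary placeholders, and the file must exist beforehand.

```cuda
// Host-only micro-benchmark: how much of the per-file cost is open/close
// versus the actual read? Compile with g++ or nvcc.
#include <fcntl.h>
#include <unistd.h>
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const char *path = "/tmp/testfile";   // placeholder; pre-create it
    const int iters = 10000;
    std::vector<char> buf(1 << 20);       // 1 MiB read buffer
    long total = 0;

    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; i++) {     // open + close only
        int fd = open(path, O_RDONLY);
        close(fd);
    }
    auto t1 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; i++) {     // open + one bulk read + close
        int fd = open(path, O_RDONLY);
        total += read(fd, buf.data(), buf.size());
        close(fd);
    }
    auto t2 = std::chrono::steady_clock::now();

    auto us = [](auto a, auto b) {
        return std::chrono::duration_cast<std::chrono::microseconds>(b - a).count();
    };
    printf("open+close: %ld us, open+read+close: %ld us (%ld bytes read)\n",
           us(t0, t1), us(t1, t2), total);
    return 0;
}
```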