[Hacker News Digest] Garbage collection for systems programmers (2023)
-
Title: Garbage collection for systems programmers (2023)
Text:
Url: https://bitbashing.io/gc-for-systems-programmers.html
This article discusses garbage collection (GC) in operating systems, particularly as applied to systems programming. It starts from read-heavy, rarely-updated data, such as the list of currently connected USB devices, and introduces read-copy-update (RCU), a technique for lock-free data sharing: readers are never blocked by writers, while updates remain atomic. The article then examines the role of garbage collection in the kernel and rebuts common misconceptions, e.g. that GC is slow or forfeits fine-grained control; the author points out that modern collectors offer optimizations such as heap compaction that traditional memory management cannot match. It also addresses several memory-management myths, for example that programmers can decide exactly when memory management happens, or that calling free() returns memory to the OS — in systems programming we often cannot precisely control when memory management occurs. Finally, the author argues that garbage collection is a legitimate tool in the systems programmer's toolbox and encourages developers not to fear it.
Post by: ingve
Comments:
teleforce: For promising modern and parallel GC techniques please check MPL or MaPLe with its novel Automatic Management of Parallelism. It won distinguished paper award in POPL 2024 and ACM SIGPLAN dissertation award 2023 by proposing these two main things [1],[2]:<p>a) Provably efficient parallel garbage collection based on disentanglement<p>b) Provably efficient automatic granularity control<p>[1] MaPLe (MPL):<p><a href="https://github.com/MPLLang/mpl">https://github.com/MPLLang/mpl</a><p>[2] Automatic Parallelism Management:<p><a href="https://dl.acm.org/doi/10.1145/3632880" rel="nofollow">https://dl.acm.org/doi/10.1145/3632880</a>
celrod: The RCU use case is convincing, but my experience with GCs in other situations has been poor.
To me, this reads more like an argument that bespoke memory management solutions can yield the best performance (I agree!), which is a totally different case from the more general claim that static lifetimes generally outperform dynamic lifetimes (especially when a tracing step is needed to determine liveness).<p>> Lies people believe... Calling free() gives the memory back to the OS.<p>I believe calling free() gives the memory back to the allocator, which is much better than giving it to the OS; syscalls are slow. Perhaps not immediately; mimalloc only makes frees available to future mallocs periodically.<p>Trying a simple benchmark where I allocate and then immediately free 800 bytes, 1 million times, and counting the number of unique pointers I get:
glibc's malloc: 1
jemalloc: 1
mimalloc: 4
Julia's garbage collector: 62767<p>62767, at about 48 MiB, isn't that bad, but it still blows out my computer's L3 cache.
Using a GC basically guarantees every new allocation comes from RAM rather than cache. This kills the performance of any heavily allocating code; we don't care only about how fast memory management itself is, but how quickly we can work with what it gives us.
I gave a benchmark in Julia showcasing this: <a href="https://discourse.julialang.org/t/blog-post-rust-vs-julia-in-scientific-computing/101711/80?u=elrod" rel="nofollow">https://discourse.julialang.org/t/blog-post-rust-vs-julia-in...</a><p>Malloc/free gives you a chance at staying hot, if your actual working memory is small enough.<p>Allocators like mimalloc are also designed (like the compacting GC) to have successive allocations be close together. The 4 unique pointers I got from mimalloc were 896 bytes apart.<p>My opinions might be less sour if I had more experience with compacting GCs, but I think GCs are just a vastly more complicated solution to the problem of safe memory management than something like Rust's borrow checker.
Given that the complexity is foisted on the compiler and runtime developers, that's normally not so bad for users, and an acceptable tradeoff when writing code that isn't performance sensitive.
Similarly, RAII with static lifetimes is also a reasonable tradeoff for code not important enough for more bespoke approaches.
The article's example is evidently one of those deserving a more bespoke solution.
pron: Except in the special case where all memory can be easily handled in arenas, good tracing GCs have long ago surpassed manual memory management in throughput, and more recently their latency impact is more than acceptable for the vast majority of applications (OpenJDK's ZGC has typical pause times measured in double/triple-digit microseconds, and the worst case rarely exceeds 1ms for a reasonable allocation rate -- the pauses are in the same ballpark as OS-induced ones). The only real and significant tradeoff is in memory footprint, and outside of specialty niches (where arenas just work for everything and worst-case latency is in the low microseconds range) that is the only high order question: is my application running in a memory-constrained environment (or it's really worth it to sacrifice other things to keep down RAM consumption) or not?
HippoBaro: For the kind of software I write there are two cases: (1) the hot path for which I will always have custom allocators and avoid allocations and (2) everything else.<p>For (1) GC or not it doesn’t make a difference, I’ll opt-out. For (2) GC is really convenient and correct.
keybored: The article motivates RCU and then does a u-turn and starts making a general argument for general-purpose GC. Not quite a trojan horse, but a bit whiplash-inducing.