[Hacker News Repost] The Performance Impact of C++'s `final` Keyword

hackernews

Title: The Performance Impact of C++'s `final` Keyword

Text:

Url: https://16bpp.net/blog/post/the-performance-impact-of-cpp-final-keyword/

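The article looks at how C++'s `final` keyword affects performance. The author first points out that performance is an important concern in C++ programming, yet advice about speeding code up often comes without any concrete measurements behind it. The author then turns to `final`, a keyword that prevents a class from being subclassed and is said to improve performance, and notes that nobody seems to have published concrete data supporting that claim.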

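To test the claim, the author used PSRayTracing, a ray-tracing project. It has many derived classes (implementations of interfaces) whose methods are called millions of times during a normal run. The author modified the CMake files to add a build option that enables or disables `final` at compile time, then ran extensive tests across different configurations and compilers to measure the keyword's effect on performance.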
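
The summary does not show how the toggle is wired up, so here is a minimal sketch of one common way to do it, under stated assumptions: a CMake option adds a preprocessor definition, and a macro expands either to `final` or to nothing. The names `USE_FINAL`, `MAYBE_FINAL`, `Hittable`, and `Sphere` are hypothetical and are not taken from PSRayTracing.

```cpp
// Hypothetical CMake side (illustrative names, not PSRayTracing's):
//   option(USE_FINAL "Mark leaf classes as final" OFF)
//   if(USE_FINAL)
//     add_compile_definitions(USE_FINAL)
//   endif()
#include <type_traits>

// MAYBE_FINAL expands to `final` only when the build defines USE_FINAL.
#ifdef USE_FINAL
#define MAYBE_FINAL final
#else
#define MAYBE_FINAL
#endif

// An interface in the spirit of a ray tracer's "hittable" objects.
struct Hittable {
    virtual ~Hittable() = default;
    virtual bool hit(double t) const = 0;
};

// A leaf implementation; it is final only in the USE_FINAL build.
struct Sphere MAYBE_FINAL : Hittable {
    double t_min = 0.001;
    bool hit(double t) const override { return t > t_min; }
};

// Lets tests or logging report which variant was compiled.
constexpr bool final_enabled = std::is_final_v<Sphere>;
```

Configuring once with `cmake -DUSE_FINAL=ON` and once without yields two binaries that differ only in the `final` specifier, which is what a like-for-like timing comparison needs.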

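The results show that `final` does not improve performance in every case. In some cases it gives a speedup of 1% or more, but in others it makes the program slower. In particular, the author found that with Clang on x86_64 Linux, most test cases ran at least 5% slower with `final`.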

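Overall, the author concludes that whether `final` improves performance depends on the specific configuration and platform, and recommends that developers test and measure to decide whether it is worth using. The author also notes that they would not toggle `final` on and off like this in a real product, since it is probably not good practice. Finally, the author links the Jupyter notebook used to process and present the findings, along with the raw data, for readers who want to explore further.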

Post by: hasheddan

Comments:

mgaunard: What final enables is devirtualization in certain cases. The main advantage of devirtualization is that it is necessary for inlining.

Inlining has other requirements as well; LTO pretty much covers them.

The article doesn't have sufficient data to tell whether the testcase is built in such a way that any of these optimizations can happen, or whether they are beneficial.
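
To make the devirtualization point concrete, here is a minimal sketch with hypothetical `Shape` and `Square` types. Whether a given compiler actually emits a direct call, and then inlines it, depends on the optimization level and the rest of the build, so treat the comments as what the language permits and check the generated assembly for what your compiler does.

```cpp
#include <type_traits>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

// No class can derive from Square, so no class can override Square::area.
struct Square final : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

// The object behind `s` is a Square and nothing more derived, so the compiler
// is allowed to call Square::area directly and may then inline it.
double area_of(const Square& s) { return s.area(); }

// Here the dynamic type is unknown, so the call normally goes through the
// vtable unless the optimizer can prove the concrete type some other way.
double area_of(const Shape& s) { return s.area(); }

// The first overload's direct binding relies on this; keep it compiling.
static_assert(std::is_final_v<Square>, "area_of(const Square&) assumes Square is final");
```

Without `final` on `Square`, the first overload would in general also need a vtable lookup, because a `const Square&` could refer to a subclass that overrides `area`.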

tombert: I don't do much C++, but I have definitely found that engineers will just assert that something is "faster" without any evidence to back that up.

Quick example: a few years ago I got into an argument with someone who claimed that in C# a switch was better than an if(x==1) elseif(x==2)... chain because switch was "faster", and he rejected my PR. I mentioned that that didn't appear to be true, and we went back and forth until I did a compile-then-decompile of a minimal test with equality-based ifs and showed that the compiler actually converts equality-based ifs to a switch behind the scenes. The guy accepted my PR after that.

But there's tons of stuff like this in CS, and I kind of blame professors for a lot of it [1]. A large part of becoming a decent engineer [2] for me was learning to stop trusting what professors taught me in college. Most of what they said was fine, but you can't assume that; what they tell you could be out of date, or simply never correct to begin with, and as far as I can tell you have to always test these things.

It doesn't help that a lot of these "it's faster" arguments are reductive, because the thing in question is only faster in extremely minimal tests. Sometimes a microbenchmark will show that something is faster, and there's value in that, but I think it's important to remember that the code being measured can also be a small percentage of the total program; compilers are obscenely good at optimizing nowadays, it can be difficult to determine when something will be optimized, and your assertion that something is "faster" might not actually be true in a non-trivial program.

This is why I don't really like doing any kind of major optimizations before the program actually works. I try to keep the program in a reasonable Big-O and I try to minimize network calls cuz of latency, but I don't bother with any kind of micro-optimizations in the first draft. I don't mess with bitwise tricks, I don't concern myself with which version of a particular data structure is a millisecond faster, I don't focus too much on whether I can get away with a smaller float, etc. Once I know that the program is correct, I benchmark to see if any kind of micro-optimization will actually matter, and often it really doesn't.

[1] That includes me up to about a year ago.

[2] At least I like to pretend I am.

andrewla: I'm surprised that it has any impact on performance at all, and I'd love to see the codegen differences between the applications.

Mostly the final keyword serves as a compile-time assertion. The compiler (sometimes the linker) is perfectly capable of seeing that a class has no derived classes, but what final assures is that if you attempt to derive from such a class, you will get a compile-time error.

This is similar to how inline works in practice: rather than providing a useful hint to the compiler (though the compiler is free to treat it that way), it provides an assertion that if you do non-inlinable operations (e.g. non-tail recursion), the compiler can flag that.

All of this is to say that final can speed up runtimes, but it does so by forcing you to organize your code such that the guarantees apply. By using final classes in places where dynamic dispatch can be reduced to static dispatch, you force the developer to not introduce patterns that would prevent static dispatch.
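
A minimal sketch of the "compile-time assertion" view of `final`: deriving from a `final` class is rejected by the compiler, and generic code can state its own reliance on a leaf type with `std::is_final_v`. The types `Base` and `Leaf` and the helper `static_id` are hypothetical, chosen only for illustration.

```cpp
#include <type_traits>

struct Base {
    virtual ~Base() = default;
    virtual int id() const { return 0; }
};

struct Leaf final : Base {
    int id() const override { return 1; }
};

// Uncommenting the next line fails to compile, because Leaf is final:
// struct MoreDerived : Leaf {};

// Hypothetical helper that only accepts leaf types, turning the design
// assumption "T has no subclasses" into a compile-time check.
template <typename T>
int static_id(const T& obj) {
    static_assert(std::is_final_v<T>, "static_id requires a final class");
    // T is final, so this call can be bound to T::id without a vtable lookup.
    return obj.id();
}
```

Calling `static_id` with a `Base` fails at compile time with the `static_assert` message, which is the kind of guarantee the comment describes.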

mgraczyk: The main case where I use final and where I would expect benefits (not covered well by the article) is when you are using an external library with pure virtual interfaces that you implement.

For example, the AWS C++ SDK uses virtual functions for everything. When you subclass their classes, marking your classes as final allows the compiler to devirtualize your own calls to your own functions (GCC does this reliably).

I'm curious to understand better how clang is producing worse code in these cases. The code used for the blog post is a bit too complicated for me to look at, but I would love to see some microbenchmarks. My guess is that there is some kind of icache or code size problem, where inlining more produces worse code.
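
A sketch of that pattern, using a hypothetical `Uploader` interface as a stand-in for a third-party class built from pure virtual functions (it is not the AWS SDK's actual API). Because `MyUploader` is `final`, `this` inside its member functions cannot point to anything more derived, so calls such as `bucket()` are candidates for devirtualization; the comment above reports GCC doing this, but it is worth verifying with your own compiler and flags.

```cpp
#include <string>
#include <type_traits>

// Hypothetical stand-in for a library interface made of pure virtual functions.
class Uploader {
public:
    virtual ~Uploader() = default;
    virtual std::string bucket() const = 0;
    virtual std::string key_for(const std::string& name) const = 0;
};

// Our implementation of the library's interface.
class MyUploader final : public Uploader {
public:
    std::string bucket() const override { return "assets"; }

    std::string key_for(const std::string& name) const override {
        // bucket() is virtual, but `this` points to a MyUploader and the class
        // is final, so the compiler may call MyUploader::bucket directly.
        return bucket() + "/" + name;
    }
};

// Guard against a later edit dropping `final` and silently losing that property.
static_assert(std::is_final_v<MyUploader>);
```

Calls the library makes through an `Uploader&` still dispatch virtually; only calls where the static type is `MyUploader`, including the ones inside its own member functions, can be bound directly.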

ein0p: You should use final to express design intent. In fact I’d rather it were the default in C++, and there was some sort of an opposite (‘derivable’?) keyword instead, but that ship has sailed long time ago. Any measurable negative perf impact should be filed as a bug and fixed.
