【Hacker News Repost】The return of the frame pointers
-
Title: The return of the frame pointers
Text:
From: https://news.ycombinator.com/item?id=39731824
Url: https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html
This article discusses the use and importance of frame pointers in software debugging and profiling. It explains that in 2004 the GCC compiler changed to stop generating frame pointers by default, which broke debugging and profiling tools. The article stresses that omitting frame pointers was a major problem for system profilers, breaking the stacks in many off-CPU flame graphs. The good news is that Fedora and Ubuntu have shipped releases that fix this by compiling libc and more with frame pointers by default, making off-CPU flame graphs more practical and making it easier for customers to adopt continuous profilers. The article also covers other stack-walking techniques such as LBR, BTS, AET, DWARF, eBPF stack walking, ORC, SFrames, and shadow stacks.
Post by: mfiguiere
Comments:
extheat: At 8x86B, looks like the largest open model yet by far. Would be interesting to hear how many tokens it's been trained on. Especially important for higher param models in order to efficiently utilize all those parameters.
ilaksh: Has anyone outside of x.ai actually done inference with this model yet? And if so, have they provided details of the hardware? What type of AWS instance or whatever?

I think you can rent like an 8 x A100 or 8 x H100 and it's "affordable" to play around with for at least a few minutes. But you would need to know exactly how to set up the GPU cluster.

Because I doubt it's as simple as just 'python run.py' to get it going.
nasir: I'd be very curious to see how it performs, especially on inputs that are blocked by other models. Seems like Grok will differentiate itself from other OS models from a censorship and alignment perspective.
simonw: "Base model trained on a large amount of text data, not fine-tuned for any particular task."<p>Presumably the version they've been previewing on Twitter is an instruction-tuned model which behaves quite differently from these raw weights.
simonw: ";基于大量文本数据训练的基础模型,不针对任何特定任务进行微调"<p> 据推测,他们的版本;我在推特上预览了一个经过指令调整的模型,它的行为与这些原始权重截然不同。
nylonstrung: For what reason would you want to use this instead of open source alternatives like Mistral
jjcm: I think it's smart to start trying things here. This has infinite flaws with it, but from a business and learnings standpoint it's a step toward the right direction. Over time we're going to both learn and decide what is and isn't important to designate as "AI" - Google's approach here at least breaks this into rules of what "AI" things are important to label:

• Makes a real person appear to say or do something they didn't say or do
• Alters footage of a real event or place
• Generates a realistic-looking scene that didn't actually occur

At the very least this will test each of these hypotheses, which we'll learn from and iterate on. I am curious to see the legal arguments that will inevitably kick up from each of these - is color correction altering footage of a real event or place? They explicitly say it isn't in the wider description, but what about beauty filters? If I have 16 video angles, and use photogrammetry / gaussian splatting / AI to generate a 17th, is that a realistic-looking scene that didn't actually occur? Do I need to have actually captured the photons themselves if I can be 99% sure my predictions of them are accurate?

So many flaws, but all early steps have flaws. At least it is a step.
summerlight: Looks like there is a huge gray area that they need to figure out in practice. From https://support.google.com/youtube/answer/14328491 :

Examples of content creators don't have to disclose:

- Someone riding a unicorn through a fantastical world
- Green screen used to depict someone floating in space
- Color adjustment or lighting filters
- Special effects filters, like adding background blur or vintage effects
- Production assistance, like using generative AI tools to create or improve a video outline, script, thumbnail, title, or infographic
- Caption creation
- Video sharpening, upscaling or repair and voice or audio repair
- Idea generation

Examples of content creators need to disclose:

- Synthetically generating music (including music generated using Creator Music)
- Voice cloning someone else's voice to use it for voiceover
- Synthetically generating extra footage of a real place, like a video of a surfer in Maui for a promotional travel video
- Synthetically generating a realistic video of a match between two real professional tennis players
- Making it appear as if someone gave advice that they did not actually give
- Digitally altering audio to make it sound as if a popular singer missed a note in their live performance
- Showing a realistic depiction of a tornado or other weather events moving toward a real city that didn't actually happen
- Making it appear as if hospital workers turned away sick or wounded patients
- Depicting a public figure stealing something they did not steal, or admitting to stealing something when they did not make that admission
- Making it look like a real person has been arrested or imprisoned
the_duke: They don't bother to mention it, but this is actually to comply with the new EU AI act.

> Providers will also have to ensure that AI-generated content is identifiable. Besides, AI-generated text published with the purpose to inform the public on matters of public interest must be labelled as artificially generated. This also applies to audio and video content constituting deep fakes

https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai

Some discussion here: https://news.ycombinator.com/item?id=39746669
yoavz: Most interesting example to me: "Digitally altering audio to make it sound as if a popular singer missed a note in their live performance".

This seems oddly specific, almost the inverse of what happened with Alicia Keys at the recent Super Bowl. As Robert Komaniecki pointed out on X [1], Alicia Keys hit a "sour note" which was silently edited by the NFL to fix it.

[1] https://twitter.com/Komaniecki_R/status/1757074365102084464
sigmoid10: > Some examples of content that require disclosure include: [...] Generating realistic scenes: Showing a realistic depiction of fictional major events, like a tornado moving toward a real town.

This sounds like every thumbnail on youtube these days. It's good that this is not limited to AI, but it also means this will be a nightmare to police.
thrdbndndn: The emphasis here is Single Image, but can this model generate with multiple images too?

We know that a single image of an object physically can't cover all the sides of it, so it's all guesswork in AI. This is totally fine for certain scenarios, but in lots of other cases it's trivial to have multiple images of the same object, and if that offers higher fidelity, it's totally worth it.

I'm aware there are many algorithms or AI models that already do that. I'm asking about Stability's one specifically because if they have impressive single-image results, surely their multi-image results would also be much better than the state of the art?
kouteiheika: Just tried to run this using their sample script on my 4090 (which has 24GB of VRAM). It ran for a little over 1 minute and crashed with an out-of-memory error. I tried both SV3D_u and SV3D_p models.

[edit] Managed to generate by tweaking the script to generate fewer frames simultaneously. 19.5GB peak VRAM usage, 1 min 25 secs to generate at 225 watts. [/edit]
nbzso: Billions poured into technology with minimal use case application.
What is the direct implication of this tech?
Porn on demand?
Filligree: If the animations shown are representative, then the mesh output may very well be good enough to use in a 3d printer.

Looking forward to experimenting with this.
ionwake: I'm sorry for the dumb lazy question. But would the input require more than one image? Is there a demo url to test this? I think it might just be time to buy a 3d printer.

EDIT> Does "single image inputs" mean more than one image?
dsign: I remember when the omission of stack frame pointers started spreading at the beginning of the 2000s. I was in college at the time, studying computer sciences in a very poor third-world country. Our computers were old and far from powerful. So, for most course projects, we would eschew interpreters and use compilers. Mind you, what my college lacked in money it compensated for by having interesting course work. We studied and implemented low-level data structures, compilers, assembly-code numerical routines and even a device driver for Minix.

During my first two years in college, if one of our programs did something funny, I would attach gdb and see what was happening at assembly level. I got used to "walking the stack" manually, though the debugger often helped a lot. Happy times, until all of a sudden "-fomit-frame-pointer" was all the rage, and stack traces stopped making sense. Just like that, debugging that segfault or illegal instruction became exponentially harder. A short time later, I started using Python for almost everything to avoid broken debugging sessions. So, I lost an order of magnitude or two with "-fomit-frame-pointer". But learning Python served me well for other adventures.
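The manual "walking the stack" dsign describes relies on the frame-pointer chain that the article is about: with frame pointers enabled, every frame stores the caller's saved frame pointer next to the return address, forming a linked list. As a rough illustration (a minimal, hedged sketch assuming the usual layout produced by -fno-omit-frame-pointer on x86-64, not code from the article or from gdb), a program can walk its own stack like this:

```c
#include <stdint.h>
#include <stdio.h>

/* Walk the frame-pointer chain of the current thread.
 * Assumed layout with frame pointers enabled:
 *   fp[0] = caller's saved frame pointer
 *   fp[1] = return address into the caller          */
static void walk_stack(void) {
    uintptr_t *fp = (uintptr_t *)__builtin_frame_address(0);
    for (int depth = 0; fp != NULL && depth < 16; depth++) {
        uintptr_t ret = fp[1];
        if (ret == 0)
            break;
        printf("#%d  return address = %#lx\n", depth, (unsigned long)ret);
        fp = (uintptr_t *)fp[0];   /* step to the caller's frame */
    }
}

__attribute__((noinline)) static void inner(void) { walk_stack(); }
__attribute__((noinline)) static void outer(void) { inner(); }

int main(void) {
    /* Build with: gcc -O2 -fno-omit-frame-pointer walk.c */
    outer();
    return 0;
}
```

Profilers do essentially the same walk from outside the process, which is why compiling with -fomit-frame-pointer breaks their stack traces.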
rwmj: I'm glad he mentioned Fedora because it's been a tiresome battle to keep frame pointers enabled in the whole distribution (eg https://pagure.io/fesco/issue/3084).

There's a persistent myth that frame pointers have a huge overhead, because there was a single Python case that had a +10% slow down (now fixed). The actual measured overhead is under 1%, which is far outweighed by the benefits we've been able to make in certain applications.
ReleaseCandidat: That's one thing Apple did do right on ARM:

> The frame pointer register (x29) must always address a valid frame record. Some functions — such as leaf functions or tail calls — may opt not to create an entry in this list. As a result, stack traces are always meaningful, even without debug information.

https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms
adsharma: I was at Google in 2005 on the other side of the argument. My view back then was simple:

Even if $BIG_COMPANY makes a decision to compile everything with frame pointers, the rest of the community is not. So we'll be stuck fighting an unwinnable argument with a much larger community. Turns out that it was a ~20 year argument.

I ended up writing some patches to make libunwind work for gperftools and maintained libunwind for some number of years as a consequence of that work.

Having moved on to other areas of computing, I'm now a passive observer. But it's fascinating to read history from the other perspective.
titzer: Virgil doesn't use frame pointers. If you don't have dynamic stack allocation, the frame of a given function has a fixed size that can be found with a simple (binary-search) table lookup. Virgil's technique uses an additional page-indexed range that further restricts the lookup to a few comparisons on average (O(log(# retpoints per page))). It combines the unwind info with stackmaps for GC. It takes very little space.

The main driver is in https://github.com/titzer/virgil/blob/master/rt/native/NativeStackWalker.v3; the rest of the code in the directory implements the decoding of metadata.

I think frame pointers only make sense if frames are dynamically-sized (i.e. have stack allocation of data). Otherwise it seems weird to me that a dynamic mechanism is used when a static mechanism would suffice; mostly because no one agreed on an ABI for the metadata encoding, or an unwind routine.

I believe the 1-2% measurement number. That's in the same ballpark as pervasive array bounds checks. It's weird that the odd debugging and profiling task gets special pleading for a 1% cost but adding a layer of security gets the finger. Very bizarre priorities.
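As an illustration of the idea titzer describes, here is a small, hedged C sketch of frame-pointer-free unwinding with fixed frame sizes: a table sorted by return-point address maps each code location to the enclosing function's frame size, and a binary search per frame replaces following a saved frame pointer. The names, table contents, and single-step logic below are illustrative assumptions, not Virgil's actual metadata format or code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One entry of a hypothetical unwind table: the function containing
 * return point `retpoint` has a frame of `frame_size` bytes. */
typedef struct {
    uintptr_t retpoint;
    size_t    frame_size;
} unwind_entry;

/* Binary search: find the entry with the largest retpoint <= ret. */
static const unwind_entry *lookup(const unwind_entry *tab, size_t n,
                                  uintptr_t ret) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].retpoint <= ret) lo = mid + 1; else hi = mid;
    }
    return lo ? &tab[lo - 1] : NULL;
}

/* One unwind step: pop the fixed-size frame belonging to the return
 * address found in the current frame to reach the caller's frame.
 * (Where the return address lives and how the table is built are
 * simplified away here.) */
static int step(const unwind_entry *tab, size_t n,
                uintptr_t *sp, uintptr_t ret) {
    const unwind_entry *e = lookup(tab, n, ret);
    if (!e) return 0;
    *sp += e->frame_size;
    return 1;
}

int main(void) {
    /* A made-up table for three return points, sorted by address. */
    const unwind_entry table[] = {
        { 0x1000, 32 },
        { 0x1040, 64 },
        { 0x10a0, 16 },
    };
    uintptr_t sp = 0x7fff0000;
    if (step(table, 3, &sp, 0x1057))   /* falls in the 0x1040 entry */
        printf("caller frame starts at %#lx (popped 64 bytes)\n",
               (unsigned long)sp);
    return 0;
}
```

A real unwinder also has to recover callee-saved registers and cooperate with GC stack maps; the sketch only shows the core per-frame table lookup.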