[Hacker News repost] Who killed the network switch? A Hubris Bug Story
-
Title: Who killed the network switch? A Hubris Bug Story
Text:
Url: https://cliffle.com/blog/who-killed-the-network-switch/
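The article at https://cliffle.com/blog/who-killed-the-network-switch/ describes a bug in the Hubris operating system, which was originally written to handle the tasks needed to boot the large processors in the Oxide Rack. The bug arose when two features, each useful on its own, were combined. The problem concerned the way Hubris tasks use messages in the interprocess communication (IPC) scheme. Thanks to the system's fault isolation and its ability to analyze snapshots of failures, the bug was fixed in about three hours. The new code is more complex, but it is still simpler than the equivalent code in most operating systems. The article also describes the Hubris kernel, which is designed for fault isolation and security, with mechanisms such as idempotent operations and thorough input parsing to prevent bugs and security vulnerabilities. The kernel has a simple, minimal design, and a small team works closely together to identify and fix problems. The author emphasizes the benefits of simplicity and team culture in building reliable and secure software. Oxide, the company behind Hubris, is hiring more software developers.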
Post by: ingve
Comments:
orf: Hubris is really, really nice. I've spent half an hour reading some of the kernel code and it’s exceptionally clear and well written - a far cry from ifdef macro soup, two letter variable name loving, comment starved C code I’ve seen previously. A good bit of bedtime reading!

I recommend leafing through it: https://github.com/oxidecomputer/hubris/blob/b44e677fb39cde8be5b10bbf78a9f26c000f6ad6/sys/kern/src/
scottlamb: Nice read!

Nit:

    // Order the task's regions in ascending address order.
    //
    // THIS IS IMPORTANT. The kernel exploits this property to do cheaper
    // access tests.
    regions.sort_by_key(|i| region_table.get_index(*i).unwrap().1.base);

I wouldn't put this comment here. It's not just some detail of this function; it's an invariant of the field that all writers have to respect (maybe this is the only one now but still) and all readers can take advantage of. So I'd add it to the TaskDesc::regions docstring. [1]

[1] https://github.com/oxidecomputer/hubris/commit/b44e677fb39cde8be5b10bbf78a9f26c000f6ad6#diff-96e27941c050f6013357d7ad1716058d7171f2ab012e2bc5ee01a6c19701d826L45
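To make scottlamb's point concrete: the quoted comment says the kernel relies on regions being sorted by address to do cheaper access tests, but doesn't show how. The sketch below is one hypothetical way such a check could use the ordering. It is not Hubris's actual kernel code: `Region` and `can_access` are invented names, and regions are assumed to be non-empty and non-overlapping. It binary-searches for the last region starting at or below the buffer, then walks forward through back-to-back regions until the buffer is covered, stopping at the first gap.

```rust
/// Hypothetical region type: the half-open byte range [base, base + size).
#[derive(Clone, Copy, Debug)]
struct Region {
    base: u32,
    size: u32,
}

impl Region {
    /// One past the last byte, widened to u64 so it cannot overflow.
    fn end(&self) -> u64 {
        self.base as u64 + self.size as u64
    }
}

/// Returns true if [start, start + len) is fully covered by `regions`.
/// Assumes `regions` is sorted by ascending `base`, non-overlapping, and
/// has no zero-sized entries (the sort order is the invariant under discussion).
fn can_access(regions: &[Region], start: u32, len: u32) -> bool {
    let mut covered_to = start as u64; // first byte not yet known to be covered
    let end = start as u64 + len as u64; // one past the last requested byte

    // Binary search, O(log n): number of regions whose base <= start.
    // This only works because the slice is sorted by base.
    let first = regions.partition_point(|r| r.base <= start);
    if first == 0 {
        return false; // no region begins at or below `start`
    }

    // Walk forward from the last region starting at or below `start`.
    // Sorting lets us stop at the first gap instead of scanning everything.
    for r in &regions[first - 1..] {
        if r.base as u64 > covered_to || r.end() <= covered_to {
            return false; // the next needed byte is not in this region
        }
        covered_to = r.end();
        if covered_to >= end {
            return true;
        }
    }
    false // ran off the end of the table before covering the buffer
}
```

Sorting once up front keeps each check to a binary search plus a short forward walk; with unsorted regions, the same check would need to scan the whole list.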
moosingin3space: This is a fantastic in-depth look at debugging a complex problem, and the fact that the rest of the system remained stable is a testament to the quality of the engineering work that the Oxide team put into this. I'm personally quite inspired by this and plan on applying similar techniques in my day job!
mgerdts: According to [1], the prequel is found at [2].

1. https://hachyderm.io/@mjk/112157472314396711

2. https://www.mattkeeter.com/blog/2024-03-25-packing/
monocasa: FWIW, you can support more than 8 regions by treating that hardware more like a soft fill TLB.
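monocasa's suggestion can be illustrated with a small simulation. This is a hedged sketch, not Hubris code: the names (`Region`, `SoftFillMpu`, `access`) are hypothetical, the 8-slot limit is the one the comment refers to, and round-robin eviction is an arbitrary choice. On real hardware the miss path would run in the memory-management fault handler and program MPU registers before retrying the faulting instruction; here it just updates an array.

```rust
/// Hypothetical region type: the half-open byte range [base, base + size).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Region {
    base: u32,
    size: u32,
}

impl Region {
    fn contains(&self, addr: u32) -> bool {
        // Subtract instead of adding so base + size cannot overflow.
        addr >= self.base && addr - self.base < self.size
    }
}

/// Number of hardware MPU slots (the "8 regions" limit from the comment).
const HW_SLOTS: usize = 8;

/// Treats the hardware slots as a software-filled cache over a larger table.
struct SoftFillMpu {
    table: Vec<Region>,                // full per-task region list (may exceed HW_SLOTS)
    slots: [Option<Region>; HW_SLOTS], // regions currently "programmed" into hardware
    next_victim: usize,                // round-robin eviction pointer
    refills: usize,                    // simulated faults that loaded a slot
}

impl SoftFillMpu {
    fn new(table: Vec<Region>) -> Self {
        SoftFillMpu { table, slots: [None; HW_SLOTS], next_victim: 0, refills: 0 }
    }

    /// Simulates one access: Ok(()) if allowed, Err(addr) for a genuine fault.
    fn access(&mut self, addr: u32) -> Result<(), u32> {
        // Fast path: a loaded slot already covers the address (no fault).
        if self.slots.iter().flatten().any(|r| r.contains(addr)) {
            return Ok(());
        }
        // Miss: real hardware would fault here. Instead of killing the task,
        // consult the full table, like a TLB refill handler would.
        match self.table.iter().find(|r| r.contains(addr)) {
            Some(&r) => {
                self.slots[self.next_victim] = Some(r);
                self.next_victim = (self.next_victim + 1) % HW_SLOTS;
                self.refills += 1;
                Ok(()) // a real handler would return and retry the instruction
            }
            None => Err(addr), // not in any region: a true access violation
        }
    }
}
```

For example, a task with twelve regions can touch all of them through eight slots, at the cost of a refill whenever an evicted region is used again. Every miss costs a fault plus a table lookup, so this pays off only when a task's working set usually fits in the hardware slots.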