[Hacker News Repost] OpenZFS deduplication is good now and you shouldn't use it
-
Title: OpenZFS deduplication is good now and you shouldn't use it
Text:
Url: https://despairlabs.com/blog/posts/2024-10-27-openzfs-dedup-is-good-dont-use-it/
Post by: type0
Comments:
kderbe: I clicked because of the bait-y title, but ended up reading pretty much the whole post, even though I have no reason to be interested in ZFS. (I skipped most of the stuff about logs...) Everything was explained clearly, I enjoyed the writing style, and the mobile CSS theme was particularly pleasing to my eyes. (It appears to be Pixyll theme with text set to the all-important #000, although I shouldn't derail this discussion with opinions on contrast ratios...)

For less patient readers, note that the concise summary is at the bottom of the post, not the top.
UltraSane: "And this is the fundamental issue with traditional dedup: these overheads are so outrageous that you are unlikely to ever get them back except on rare and specific workloads."<p>This struck me as a very odd claim. I've worked with Pure and Dell/EMC arrays and for VMWare workloads they normally got at least 3:1 dedupe/compression savings. Only storing one copy of the base VM image works extremely well. Dedupe/compression works really well on syslog servers where I've seen 6:1 savings.<p>The effectiveness of dedupe is strongly affected by the size of the blocks being hashed, with the smaller the better. As the blocks get smaller the odds of having a matching block grow rapidly. In my experience 4KB is my preferred block size.
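The block-size effect described above can be illustrated with a rough sketch: hash a set of files at a few fixed block sizes and compare how many blocks are unique. This is a userspace approximation for intuition only, not how ZFS or an enterprise array measures savings, and the function name `estimate_dedup_ratio` is invented for this example.

```python
# Estimate the achievable dedup ratio for a set of files at a given block size
# by hashing fixed-size blocks and counting how many are unique.
# A rough illustration only, not how ZFS accounts for dedup savings.
import hashlib
import os
import sys

def estimate_dedup_ratio(paths, block_size=4096):
    total_blocks = 0
    unique_hashes = set()
    for path in paths:
        with open(path, "rb") as f:
            while True:
                block = f.read(block_size)
                if not block:
                    break
                total_blocks += 1
                unique_hashes.add(hashlib.sha256(block).digest())
    if not unique_hashes:
        return 1.0
    return total_blocks / len(unique_hashes)

if __name__ == "__main__":
    files = [p for p in sys.argv[1:] if os.path.isfile(p)]
    for bs in (4096, 16384, 131072):
        print(f"{bs:>7} B blocks -> {estimate_dedup_ratio(files, bs):.2f}:1")
```

Run against a directory of VM images or syslog files, the reported ratio should climb as the block size shrinks, at the cost of a proportionally larger table of hashes to track.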
Wowfunhappy: I want "offline" dedupe, or "lazy" dedupe that doesn't require the pool to be fully offline, but doesn't happen immediately.

Because:

> When dedup is enabled [...] every single write and free operation requires a lookup and then a write to the dedup table, regardless of whether or not the write or free proper was actually done by the pool.

To me, this is "obviously" the wrong approach in most cases. When I'm writing data, I want that write to complete as fast as possible, even at the cost of disk space. That's why I don't save files I'm actively working on in 7zip archives.

But later on, when the system is quiet, I would love for ZFS to go back and figure out which data is duplicated, and use the BRT or whatever to reclaim space. This could be part of a normal scrub operation.
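As a rough illustration of the "lazy dedup" idea above, the sketch below scans a tree during idle time, groups files by content hash, and reports how much space a later dedup pass could reclaim. It is a hypothetical userspace stand-in, not ZFS code: actually reclaiming the space would be done by the filesystem's own machinery (e.g. ZFS block cloning / BRT), not by this script.

```python
# Userspace sketch of a "find duplicates later" pass: scan a directory tree,
# group files by content hash, and report reclaimable space. Reporting only;
# reclamation would be handled by the filesystem's cloning/dedup facility.
import hashlib
import os
import sys
from collections import defaultdict

def hash_file(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()

def find_duplicate_files(root):
    by_hash = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_hash[hash_file(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    dupes = find_duplicate_files(root)
    reclaimable = sum(
        os.path.getsize(paths[0]) * (len(paths) - 1) for paths in dupes.values()
    )
    print(f"{len(dupes)} duplicate groups, ~{reclaimable / 1e6:.1f} MB reclaimable")
```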
nikisweeting: I'm so excited about fast dedup. I've been wanting to use ZFS deduping for ArchiveBox data for years, as I think fast dedup may finally make it viable to archive many millions of URLs in one collection and let the filesystem take care of compression across everything. So much of archive data is the same jquery.min.js, bootstrap.min.css, logo images, etc. repeated over and over in thousands of snapshots. Other tools compress within a crawl to create wacz or warc.gz files, but I don't think anyone has tried to do compression across the entire database of all snapshots ever taken by a tool.

Big thank you to all the people that worked on it!

BTW has anyone tried a probabilistic dedup approach using something like a bloom filter, so you don't have to store the entire dedup table of hashes verbatim? Collect groups of ~100 block hashes into a bucket each, and store a hyper-compressed representation in a bloom filter. On write, look up the hash of the block to write in the bloom filter, and if a potential dedup hit is detected, walk the 100 blocks in the matching bucket manually to look for any identical hashes. In theory you could do this with layers of bloom filters with different resolutions and dynamically swap out the heavier ones to disk when memory pressure is too high to keep the high-resolution ones in RAM. Allowing the accuracy of the bloom filter to be changed as a tunable parameter would let people choose their preference around CPU time/overhead:bytes saved ratio.
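For the bloom-filter idea, here is a minimal in-memory sketch of the two-level lookup described above: each bucket of ~100 full block hashes is fronted by its own small Bloom filter, so a lookup only walks a bucket's hashes when its filter reports a possible hit. Every name here (`BloomFilter`, `BucketedDedupIndex`) is hypothetical; this is not how OpenZFS fast dedup is implemented.

```python
# In-memory sketch of a probabilistic dedup lookup: per-bucket Bloom filters
# in RAM, full hashes walked only on a filter hit. Illustration only.
import hashlib

class BloomFilter:
    """Small Bloom filter over block hashes (double hashing, k probes)."""
    def __init__(self, size_bits=1 << 12, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little")
        return [(h1 + i * h2) % self.size for i in range(self.k)]

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        # False positives are possible; false negatives are not.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

class BucketedDedupIndex:
    """Buckets of ~100 full block hashes, each fronted by its own Bloom filter."""
    def __init__(self, bucket_size=100):
        self.bucket_size = bucket_size
        self.buckets = [[]]             # full hashes; stand-in for on-disk storage
        self.filters = [BloomFilter()]  # compact per-bucket filters kept in RAM

    def insert(self, block: bytes):
        h = hashlib.sha256(block).digest()
        if len(self.buckets[-1]) >= self.bucket_size:
            self.buckets.append([])
            self.filters.append(BloomFilter())
        self.buckets[-1].append(h)
        self.filters[-1].add(h)

    def is_duplicate(self, block: bytes) -> bool:
        h = hashlib.sha256(block).digest()
        # Cheap RAM-only check first; walk a bucket's full hashes only on a hit.
        return any(h in self.buckets[i]
                   for i, f in enumerate(self.filters) if f.might_contain(h))
```

A filter false positive only costs an extra bucket walk, never a wrong dedup decision, because the full hashes are always checked before a block is treated as a duplicate.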
klysm: I really wish we just had a completely different API as a filesystem. The filesystem API surface on every OS is a complete disaster that we are locked into via backwards compatibility.