[Hacker News repost] Why CockroachDB doesn't use EvalPlanQual

hackernews

Title: Why CockroachDB doesn't use EvalPlanQual

Text:

Url: https://www.cockroachlabs.com/blog/why-cockroachdb-doesnt-use-evalplanqual/

Summary:
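The article explains why CockroachDB, in implementing the READ COMMITTED isolation level, did not adopt EvalPlanQual, the mechanism PostgreSQL uses to prevent lost-update anomalies. EvalPlanQual is a recheck step that PostgreSQL runs under READ COMMITTED to keep updates correct, but in some cases it can cause a statement to miss rows, which then requires application-level retries. CockroachDB uses a different technique that avoids this anomaly while preserving isolation, so application-level retries are not needed.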

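Using a basketball league database as its example, the article compares how PostgreSQL and CockroachDB behave under concurrent updates. In PostgreSQL, because of how EvalPlanQual works, every individual UPDATE statement can appear to succeed while the data is nonetheless left in an inconsistent state. CockroachDB guarantees that data integrity is preserved in the same scenario.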
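
The anomaly described above can be illustrated with a small Python toy model. This is only a sketch of the general idea (select rows from a snapshot, then recheck the filter against each selected row's latest version), not PostgreSQL's actual implementation. The names `player`, `level`, `team`, 'Gophers', 'AA' and "Stonebreaker" come from the discussion thread; the function `recheck_update`, the other player names, and the 'Dolphins' team are hypothetical.

```python
# Toy model of recheck-on-latest-version behaviour; NOT PostgreSQL source code.
# Everything except 'Gophers', 'AA' and "Stonebreaker" is a hypothetical value.

def recheck_update(snapshot, latest, matches, apply):
    """Pick rows from `snapshot`, then recheck each one against `latest`."""
    updated = []
    for row_id, old_row in snapshot.items():
        if not matches(old_row):
            continue                  # never selected, so never rechecked
        current = latest[row_id]      # stand-in for locking the newest version
        if matches(current):          # filter re-evaluated on this row only
            apply(current)
            updated.append(row_id)
    return updated

# State seen by the UPDATE's snapshot: three Gophers, Stonebreaker elsewhere.
snapshot = {
    1: {"name": "Player1", "team": "Gophers", "level": "A"},
    2: {"name": "Player2", "team": "Gophers", "level": "A"},
    3: {"name": "Player3", "team": "Gophers", "level": "A"},
    4: {"name": "Stonebreaker", "team": "Dolphins", "level": "A"},
}

# A concurrent player swap commits after the snapshot was taken.
latest = {row_id: dict(row) for row_id, row in snapshot.items()}
latest[3]["team"] = "Dolphins"
latest[4]["team"] = "Gophers"

# UPDATE player SET level = 'AA' WHERE team = 'Gophers';
updated = recheck_update(
    snapshot,
    latest,
    matches=lambda row: row["team"] == "Gophers",
    apply=lambda row: row.update(level="AA"),
)

# Gophers who were not promoted even though the statement "succeeded".
left_behind = [row["name"] for row in latest.values()
               if row["team"] == "Gophers" and row["level"] != "AA"]
print(updated, left_behind)
```

In this model the player who left the Gophers is correctly skipped, but the player who joined them is never considered, because only rows selected from the original snapshot are rechecked.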

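The article also explains in detail how EvalPlanQual works in PostgreSQL, and how CockroachDB prevents lost updates through a sequence of steps: creating a savepoint, reading a snapshot, locking rows, rechecking the filter condition, and writing intents. These steps also let the statement keep making progress under heavy contention.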
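
The CockroachDB steps listed above can be sketched as a statement-level retry loop. The Python below is a single-threaded toy model under loud assumptions, not CockroachDB code: the function `retrying_update`, the `concurrent_commits` hook, and all data except 'Gophers', 'AA' and "Stonebreaker" are hypothetical. Real locks and intents are not modelled, and the recheck is approximated as "did any selected row change since the snapshot was read".

```python
import copy

# Toy model of a savepoint-and-retry UPDATE; NOT CockroachDB source code.

def retrying_update(db, matches, apply, concurrent_commits=()):
    """Retry the whole statement on a fresh snapshot if a target row changed."""
    pending = list(concurrent_commits)   # hypothetical hook to inject a race
    attempts = 0
    while True:
        attempts += 1
        snapshot = copy.deepcopy(db)     # savepoint plus a fresh read snapshot
        targets = [rid for rid, row in snapshot.items() if matches(row)]
        if pending:
            pending.pop(0)(db)           # another transaction commits first
        # "Lock" the targets and recheck them against their latest versions.
        if any(db[rid] != snapshot[rid] for rid in targets):
            continue                     # roll back to the savepoint and retry
        for rid in targets:
            apply(db[rid])               # stand-in for writing intents
        return targets, attempts

db = {
    1: {"name": "Player1", "team": "Gophers", "level": "A"},
    2: {"name": "Player2", "team": "Gophers", "level": "A"},
    3: {"name": "Player3", "team": "Gophers", "level": "A"},
    4: {"name": "Stonebreaker", "team": "Dolphins", "level": "A"},
}

def swap_players(table):
    """Concurrent transaction: trade Player3 for Stonebreaker."""
    table[3]["team"] = "Dolphins"
    table[4]["team"] = "Gophers"

# UPDATE player SET level = 'AA' WHERE team = 'Gophers';
targets, attempts = retrying_update(
    db,
    matches=lambda row: row["team"] == "Gophers",
    apply=lambda row: row.update(level="AA"),
    concurrent_commits=[swap_players],
)

all_promoted = all(row["level"] == "AA"
                   for row in db.values() if row["team"] == "Gophers")
print(targets, attempts, all_promoted)
```

Because the second attempt selects rows from a snapshot taken after the swap, the newly arrived player is included and the retry stays invisible to the caller. In a real system, the locks acquired during the first attempt would also matter for making progress under contention; this sketch leaves that out.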

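Finally, the article notes that although there appears to be a trade-off between PostgreSQL's and CockroachDB's approaches to preventing lost updates, from the application's point of view the trade-off is not absolute. An application on PostgreSQL may need to retry at the application level, whereas CockroachDB hides those retries inside the database, which makes it friendlier to applications.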

Post by: michae2

Comments:

michae2: Author here. We've spent the past year adding read committed isolation to CockroachDB.
There were many interesting design decisions, such as:
- whether to use multiple snapshots or a single snapshot per statement
- how to handle read uncertainty intervals
- how to incorporate SELECT FOR UPDATE locking into Raft
- how to handle SELECT FOR UPDATE subqueries
- how to prevent lost update anomalies between two UPDATEs
Some of the gory details are in the public RFC: https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20230122_read_committed_isolation.md
This blog post just discusses the last point, but please AMA.

foota: It's often talked about how new SQL databases offer better scalability than standard SQL databases, but I think it's maybe sometimes underappreciated how some (not all) of them are also much simpler in terms of their consistency models.
I'd speculate this is because Postgres and friends try to eke out every bit of single-node performance (which helps with single-row throughput and overall throughput, which is obviously much better for them than for NewSQL), but the scalability of new SQL databases might allow them to prefer easy consistency over single-node performance.
Possibly this is also just the passage of time benefiting newer systems.

CGamesPlay: How does the CockroachDB approach not deadlock? Surely retrying could encounter a situation where two competing UPDATEs will lock rows in different orders, and no amount of retrying will unlock the required rows, right?

erhaetherth: I'm having trouble with the example given.
If UPDATE player SET level = 'AA' WHERE team = 'Gophers'; is executed before the player swap, then why should "Stonebreaker" be upgraded to "AA"? I'd be pretty mad at my database if I sent those 2 queries in sequence and my DB decided to re-order them.
The sleep actually really complicates things here. I understand some queries run slower than others and the sleep is a useful tool to artificially slow things down, but now I don't know if I should interpret that as one command or two. If WITH sleep AS (SELECT pg_sleep(5)) UPDATE player SET level = 'AA' FROM sleep WHERE team = 'Gophers'; is atomic, then I'd expect it to put a lock on the 3 Gophers (which doesn't include Stonebreaker), wait the 5 seconds, and then complete the update. The player swap would be blocked for those 5 seconds because it touches a row that's being updated.

ngalstyan4: For similar isolation-level anomalies in real-world applications, check out this SIGMOD '17 paper:
ACIDRain: Concurrency-Related Attacks on Database-Backed Web Applications: http://www.bailis.org/papers/acidrain-sigmod2017.pdf
