[Hacker News Repost] Understanding Round Robin DNS
-
Title: Understanding Round Robin DNS
Text:
Url: https://blog.hyperknot.com/p/understanding-round-robin-dns
Round Robin DNS is a technique in which multiple IP addresses are assigned to the same domain name. It is commonly used as a simple form of load balancing, spreading traffic across several servers to improve a site's availability and response times. In outline:

1. **Multiple records**: several A records are configured for the same name, each pointing to a different IP address.
2. **Rotation**: on each query, the DNS server rotates the order of the returned address list.
3. **Client choice**: clients typically try the first address in the list, so rotating the order spreads new connections across servers.
4. **Load distribution**: because requests land on different servers, load is spread out and no single server is overwhelmed.

The approach is simple and easy to deploy. Its drawbacks are that it offers no intelligent routing based on server load or response time, and that plain DNS does not notice a failed server: queries keep returning the dead address, which can lead to service interruptions unless clients fail over on their own.
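For illustration, a minimal BIND-style zone fragment for such a setup; the name, TTL, and addresses below are hypothetical (RFC 5737 documentation ranges):

```
; three A records for the same name -- resolvers receive all three,
; typically in rotated order on successive queries
www  300  IN  A  192.0.2.10   ; server 1
www  300  IN  A  192.0.2.20   ; server 2
www  300  IN  A  192.0.2.30   ; server 3
```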
Post by: hyperknot
Comments:
jgrahamc: Hmm. I've asked the authoritative DNS team to explain what's happening here. I'll let HN know when I get an authoritative answer. It's been a few years since I looked at the code and a whole bunch of people keep changing it :-)

My suspicion is that this is to do with the fact that we want to keep affinity between the client IP and a backend server (which OP mentions in their blog). And the question is "do you break that affinity if the backend server goes down?" But I'll reply to my own comment when I know more.
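For a sense of the trade-off jgrahamc describes, here is a minimal sketch — my own illustration, not Cloudflare's actual logic — of hashing a client IP over the currently healthy backends:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickBackend hashes the client IP over the list of healthy backends, so the
// same client keeps landing on the same server while the list is stable.
// The open question from the comment: removing a dead backend shrinks the
// list and remaps some clients, i.e. failover breaks affinity.
func pickBackend(clientIP string, healthy []string) string {
	h := fnv.New32a()
	h.Write([]byte(clientIP))
	return healthy[h.Sum32()%uint32(len(healthy))]
}

func main() {
	backends := []string{"192.0.2.10", "192.0.2.20", "192.0.2.30"}
	fmt.Println(pickBackend("203.0.113.7", backends))     // same input -> same backend
	fmt.Println(pickBackend("203.0.113.7", backends[:2])) // backend removed -> may remap
}
```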
teddyh: One of the early proposed solutions for this was the SRV DNS record, which was similar to the MX record, but for every service, not just e-mail. With MX and SRV records, you can specify a list of servers with associated priority for clients to try. SRV also had an extra “weight” parameter to facilitate load balancing. However, SRV did not want the political fight of effectively hijacking every standard protocol to force all clients of every protocol to also check SRV records, so they specified that SRV should <i>only</i> be used by a client if the standard for that protocol explicitly specifies the use of SRV records. This technically prohibited HTTP clients from using SRV. Also, when the HTTP/2 (and later) HTTP standards were being written, bogus arguments from Google (and others) prevented the new HTTP protocols from specifying SRV. SRV seems to be effectively dead for new development, only used by some older standards.

The new solution for load balancing seems to be the new HTTPS and SVCB DNS records. As I understand it, they are standardized by people wanting to add extra parameters to the DNS in order to jump-start the TLS 1.3 handshake, thereby making fewer roundtrips. (The SVCB record type is the same as HTTPS, but generalized like SRV.) The HTTPS and SVCB DNS record types both have the priority parameter from the SRV and MX record types, but HTTPS/SVCB lack the weight parameter from SRV. The standards have been published, and support seems to have been done in some browsers, but not all have enabled it. We will see what browsers will actually do in the near future.
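For comparison, roughly what the two generations of records look like in zone-file syntax; all names and values below are hypothetical:

```
; SRV: priority, weight, port, target -- illustrative syntax only;
; as the comment notes, HTTP clients were never sanctioned to use SRV
_https._tcp.example.com. 300 IN SRV   10 60 443 server1.example.com.
_https._tcp.example.com. 300 IN SRV   10 40 443 server2.example.com.

; HTTPS (RFC 9460): priority and target, plus TLS bootstrap parameters,
; but no weight field
example.com.             300 IN HTTPS 1 . alpn="h2,h3" ipv4hint=192.0.2.10
```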
unilynx: > So what happens when one of the servers is offline? Say I stop the US server:<p>> service nginx stop<p>But that's not how you should test this. A client will see the connection being refused, and go on to the next IP. But in practice, a server may not respond at all, or accept the connection and then go silent.<p>Now you're dependent on client timeouts, and round robin DNS will suddenly look a whole lot less attractive to increase reliability.
unilynx: >;那么,当其中一台服务器离线时会发生什么?假设我停止了美国服务器:<p>>;service nginx停止<p>但那是;这不是你应该怎么测试的。客户端将看到连接被拒绝,并继续访问下一个IP。但在实践中,服务器可能根本没有响应,或者接受连接然后保持沉默<p> 现在,您;由于重新依赖于客户端超时,轮询DNS在提高可靠性方面的吸引力会突然大大降低。
turbobrew: DNS load balancing has some really nasty edge cases. I have had to deal with golang HTTP2 clients using RR DNS and it has caused issues.

Golang HTTP2 clients will reuse the first server they can connect to over and over, and the DNS is never re-resolved. This can lead to issues where clients will not discover new servers which are added to the pool.

A particularly pathological case is if all serving backends go down: the clients will all pin to the first serving backend which comes up, and they will not move off. As other servers come up, few clients will connect, since they are already connected to the first server which came back.

A similar issue happens with grpc-go. The grpc DNS resolver will only re-resolve when the connection to a backend is broken. Similarly, grpc clients can all gang onto a host and never move off. There are suggestions that on the server side you can set MAX_CONNECTION_AGE, which will periodically disconnect clients after a while, causing the client to re-resolve the DNS (see the sketch below).

I really wish there was a better standard solution for service discovery. I guess the best you can do is implement a request-based load balancer with a virtual IP and have the load balancer perform health checks. But you are still kicking the can down the road, as you are just pushing the problem down to the system which implements virtual IPs. I guess you assume that the routing system is relatively static compared to the backends, and that is where the benefits come in.

I'm curious how do people do this on bare metal? I know AWS/GCP/etc. have their internal load balancers, but I am kind of curious what the secret sauce is to doing this. Maybe suggestions on blog posts or white papers?
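The server-side knob turbobrew refers to is grpc-go's keepalive MaxConnectionAge. A minimal sketch, with arbitrary durations:

```go
package main

import (
	"log"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/keepalive"
)

func main() {
	// MaxConnectionAge forcibly closes client connections after ~5 minutes
	// (plus a grace period for in-flight RPCs to finish), which forces
	// clients to reconnect and, with a DNS resolver, re-resolve the
	// backend list instead of pinning to one server forever.
	srv := grpc.NewServer(grpc.KeepaliveParams(keepalive.ServerParameters{
		MaxConnectionAge:      5 * time.Minute,
		MaxConnectionAgeGrace: 30 * time.Second,
	}))

	ln, err := net.Listen("tcp", ":50051")
	if err != nil {
		log.Fatal(err)
	}
	log.Fatal(srv.Serve(ln))
}
```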
tetha: > As you can see, all clients correctly detect it and choose an alternative server.<p>This is the nasty key point. The reliability is decided client-side.<p>For example, systemd-resolved at times enacted maximum technical correctness by always returning the lowest IP address. After all, DNS-RR is not well-defined, so always returning the lowest IPs is not wrong. It got changed after some riots, but as far as I know, Debian 11 is stuck with that behavior, or was for a long time.<p>Or, I deal with many applications with shitty or no retry behavior. They go "Oh no, I have one connection refused, gotta cancel everything, shutdown, never try again". So now 20% - 30% of all requests die in a fire.<p>It's an acceptable solution if you have nothing else. As the article notices, if you have quality HTTP clients with a few retries configured on them (like browsers), DNS-RR is fine to find an actual load balancer with health checks and everything, which can provide a 100% success rate.<p>But DNS-RR is no loadbalancer and loadbalancers are better.
tetha: >;如您所见,所有客户端都能正确检测到它并选择替代服务器<p> 这是令人讨厌的关键点。可靠性由客户端决定<p> 例如,systemd有时会通过始终返回最低的IP地址来实现最大的技术正确性。毕竟,DNS-RR的定义并不明确,因此总是返回最低的IP地址并没有错。在一些骚乱之后,它发生了变化,但据我所知,Debian 11一直坚持这种行为,或者已经坚持了很长时间<p> 或者,我处理许多具有糟糕或没有重试行为的应用程序。他们走了";哦,不,我有一个连接被拒绝,必须取消一切,关机,永远不要再试。";。所以现在20%-30%的请求在火灾中死亡<p> 它;如果你别无选择,这是一个可以接受的解决方案。正如文章所述,如果你有配置了几次重试的高质量HTTP客户端(如浏览器),DNS-RR可以找到一个具有健康检查和所有功能的实际负载均衡器,这可以提供100%的成功率<p> 但是DNS-RR不是负载均衡器,负载均衡器更好。