[Hacker News repost] Practices of Reliable Software Design
-
Title: Practices of Reliable Software Design
Text:
Url: https://entropicthoughts.com/practices-of-reliable-software-design
## Post by: fagnerbrack

### Comments:

**nostrademons**: There is a bunch of good advice here, but it misses the most useful principle in my experience, probably because the motivating example is too small in scope:

*The way to build reliable software systems is to have multiple independent paths to success.*

This is the Erlang "let it crash" strategy restated, but I've also found it embodied in things like the architecture of Google Search, Tandem Computers, Ethereum, RAID 5, the Space Shuttle, etc. Basically, you achieve reliability through redundancy. For any given task, compute the answer multiple times in parallel, ideally in multiple independent ways. If the answers agree, great, you're done. If not, have some consensus mechanism to determine the true answer. If you can't compute the answer in parallel, or you still don't get one back, retry.

The reason for this is simply math. If you have n different events that must all go right to achieve success, the chance of this happening is `x1 * x2 * ... * xn`. This product goes to zero very quickly: if you have 20 components connected in series that are all 98% reliable, the chance of success is only about 2/3. If instead you have n different events where *any* one can go right to achieve success, the chance of success is `1 - (1 - y1) * (1 - y2) * ... * (1 - yn)`. *This* quantity increases, and fast, as the number of alternate pathways to success goes up. If you have 3 alternatives, each of which has just an 80% chance of success, but any of the 3 will work, then doing them all in parallel has about a 99% chance of success.

This is why complex software systems that must stay up are built with redundancy, replicas, failover, retries, and other similar mechanisms in place. And the presence of those mechanisms usually trumps anything you can do to increase the reliability of individual components, simply because you get diminishing returns to carefulness. You might spend 100x more resources to go from 90% reliability to 99% reliability, but if you can identify a system boundary and a correctness check, you can get that 99% reliability simply by having 2 teams each build a subsystem that is 90% reliable and checking that their answers agree.

**BillLucky**: Simple but elegant design principles, recommended.
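
The series and parallel formulas in the comment by nostrademons can be checked with a few lines of Python. This is only a minimal sketch: the function names `series_reliability` and `parallel_reliability` are illustrative and do not come from the thread, and the parallel case assumes that the alternative paths fail independently of each other, which real systems only approximate.

```python
from math import prod


def series_reliability(reliabilities):
    """Chance of success when every component must work: x1 * x2 * ... * xn."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """Chance of success when any one path is enough, assuming the paths
    fail independently: 1 - (1 - y1) * (1 - y2) * ... * (1 - yn)."""
    return 1 - prod(1 - r for r in reliabilities)


if __name__ == "__main__":
    # 20 components in series, each 98% reliable
    print(f"{series_reliability([0.98] * 20):.3f}")   # 0.668
    # 3 independent alternatives, each 80% reliable
    print(f"{parallel_reliability([0.80] * 3):.3f}")  # 0.992
    # 2 independent subsystems, each 90% reliable
    print(f"{parallel_reliability([0.90] * 2):.3f}")  # 0.990
```

The independence assumption carries most of the weight here: if the redundant paths share a failure cause (the same bug, the same power supply), the parallel figure will be lower than this calculation suggests.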