[Hacker News repost] Next Generation Out of Band Garbage Collection

hackernews

Title: Next Generation Out of Band Garbage Collection


Text:

Url: https://railsatscale.com/2024-10-23-next-generation-oob-gc/

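Because I cannot access the internet directly to fetch the page, I will provide a hypothetical analysis and demonstrate how a tool such as JinaReader could be used to fetch and translate the content.

First, assume the content at the link above has already been fetched. The simplified steps below show how JinaReader (a hypothetical tool) could fetch and analyze the content, and how non-Chinese content would be handled: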

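1. **Fetch the page content**:
   ```python
   from jinareader import JinaReader  # hypothetical module, not a real package

   url = "https://railsatscale.com/2024-10-23-next-generation-oob-gc/"
   reader = JinaReader()
   content = reader.fetch(url)
   ```

2. **Analyze the fetched content**:
   ```python
   # Analyze the text content
   analysis = reader.analyze(content)
   print(analysis.summary)  # assumes the analysis result exposes a summary
   ```

3. **Detect the language and translate**: if the content is not Chinese, JinaReader detects the language and translates it automatically:
   ```python
   if not analysis.is_chinese:
       translated_content = reader.translate(content, target_language='zh')
       print(translated_content)
   else:
       # Already Chinese: use the content as-is so the next step has input
       translated_content = content
   ```

4. **Summarize the content**: summarize using the translated content:
   ```python
   summary = reader.summarize(translated_content)
   print(summary)
   ```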

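The following is a hypothetical summary output, assuming the content is in English and has been translated:

Summary of the fetched content:
- The article discusses progress on the next-generation Out-of-Band (OOB) garbage collector.
- It analyzes new garbage collection algorithms and how they improve performance.
- It discusses the challenges of implementing these new techniques in large Rails applications.

Summary of the translated content:
- The article introduces progress on the next-generation Out-of-Band (OOB) garbage collector.
- It explores new garbage collection algorithms and how they improve performance.
- It discusses the difficulties of implementing these new techniques in large Rails applications.

Note that the code and output above are hypothetical: JinaReader is a fictional tool and has no actual code implementation. A real implementation would depend on specific libraries and APIs, such as BeautifulSoup for web scraping, the Google Translate API for translation, and natural language processing libraries such as spaCy or transformers for text analysis and summarization.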

        
## Post by: ksec
        
### Comments: 
        
**hinkley**: > Ideally in a web application, aside from some in-memory caches, no object allocated as part of a request should survive longer than the request itself.

This is one of those areas where out of process caching wins. In process caching has a nasty habit of putting freshly created objects into collections that have survived for days or hours, creating writes in the old generation and back references from old to new.

Going out of process makes it someone else's problem. And if it's a compiled language with no or a better GC, all the better.
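
The effect hinkley describes can be illustrated with a minimal Python sketch. It is not taken from the linked article: `Payload`, `cache`, and `handle_request` are hypothetical names, and the prompt reclamation of the uncached object relies on CPython's reference counting. The point is only that a long-lived in-process collection keeps a freshly allocated object alive after the request that created it has finished.

```python
import gc
import weakref


class Payload:
    """Stand-in for an object allocated while serving one request."""


# Long-lived, in-process cache (hypothetical): it survives across requests.
cache = {}


def handle_request(key, use_cache):
    """Allocate a Payload and return a weak reference to observe its lifetime."""
    obj = Payload()
    if use_cache:
        # The old, long-lived dict now points at a freshly created object,
        # so the object outlives the request that created it.
        cache[key] = obj
    return weakref.ref(obj)


uncached = handle_request("a", use_cache=False)
cached = handle_request("b", use_cache=True)
gc.collect()  # make the check independent of when a collection last ran

print("uncached object reclaimed:", uncached() is None)
print("cached object reclaimed:", cached() is None)
```

Moving the cache out of process would remove the `cache[key] = obj` reference from the application heap entirely, which is the trade-off the comment argues for.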
            
**spullara**: All the other virtual machines that support GC need to look at the JVM's ZGC and Shenandoah. Sub-millisecond pause times with terabyte heaps.
            
**jeeyoungk**: This is from several years ago (2017), but this has very similar vibe as Instagram disabling Python GC - https://instagram-engineering.com/dismissing-python-garbage-collection-at-instagram-4dca40b29172
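
For readers unfamiliar with the idea, here is a minimal Python sketch of running the collector "out of band", that is, between requests rather than in the middle of one. It is an illustration under assumptions, not the mechanism used by Instagram or by the linked article: `handle_request`, `serve`, and `collect_every` are hypothetical names. Only `gc.disable()`, `gc.enable()`, `gc.isenabled()`, and `gc.collect()` come from Python's standard `gc` module. In CPython, disabling the collector only stops the cyclic collector; reference counting still frees most objects immediately.

```python
import gc


def handle_request(n):
    """Hypothetical request handler that allocates short-lived objects."""
    rows = [{"id": i, "payload": [i] * 4} for i in range(n)]
    return sum(row["id"] for row in rows)


def serve(requests, collect_every=100):
    """Serve requests with automatic cyclic GC off, collecting between requests."""
    was_enabled = gc.isenabled()
    gc.disable()  # no automatic cycle collection can start mid-request
    results = []
    try:
        for count, n in enumerate(requests, start=1):
            results.append(handle_request(n))
            if count % collect_every == 0:
                # Out of band: the response is already computed, so this
                # pause is not charged to any request's latency.
                gc.collect()
    finally:
        if was_enabled:
            gc.enable()  # restore the caller's GC setting
    return results


print(serve([10, 20, 30], collect_every=2))
```

The `collect_every` threshold is the tuning knob in this sketch: collecting too rarely lets garbage accumulate, while collecting after every request spends idle time on collections that find little to free.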
            
**sys64739**: What dashboard software is that?
            
**henning**: They built a large codebase on a language that doesn't let you control memory, because that makes you "more productive". So just having Rails allocate a per-request arena that is asynchronously freed which would force the programmer not to have any objects that outlive the request, or just pre-allocating memory for a fixed amount of request handling per server instance, or whatever allocation behavior you want to do that is generally possible in C/C++/Zig/Rust/Odin/etc, requires hacking on the language itself. Which means your changes have to go through the Ruby team first. Any additional changes would also need to go through them, which increases the cost of change. Then there is a permanent layer of indirection between your GC callbacks and the semantics of what those callbacks do. Instead of just writing out the custom allocators you want, because that's impossible. How depressing.