[Hacker News repost] Pagination widows, or, Why I'm embarrassed about my eBook (2023)

hackernews

Title: Pagination widows, or, Why I'm embarrassed about my eBook (2023)

Text:

Url: https://clagnut.com/blog/2426

        
## Post by: OuterVale
        
### Comments: 
        
**userbinator**: The fact that it's a book about typography may mean the requirements are a little different, because I personally (and likely many others) don't really pay attention to such things.
            
**gorgoiler**: In the page model, a heading says it needs only one line of vertical space, so if there’s a tiny bit of space at the bottom of the page it’ll get orphaned. (Vertical box space shown as *!* and *%* for the heading and paragraph, respectively.)

      Page 1         Page 2
      ..        Paragraph..%
      ..        ..text.    %
      ..                 ..
      ..                 ..
     !Heading

When instead it should be moved to the top of the next page:

      Page 1         Page 2
      ..        Heading    !
      ..        Paragraph.. %
      ..        ..text.     %
      ..                 ..
                         ..

Rather than being honest about needing one line…

      Heading     !
      Paragraph..  %
      ..text.      %

…the heading could instead claim it needs three lines, which would ensure it would never be orphaned:

      Heading     !
                  !
                  !
      Paragraph..  %
      ..text.      %

But now you have a big gap below the heading.

If you could then shift the paragraph up from where it should be in the flow such that the vertical space of the heading and paragraph overlapped…

      Heading      !
      Paragraph..  !%
      ..text.      !%

…then you’d get a heading that would never be orphaned on one line, but which looked as if it only used one line.
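
The reservation idea in gorgoiler's comment can be tried out with a toy paginator. This is a minimal sketch under stated assumptions, not how any real ebook renderer works: pages are a fixed number of lines, paragraphs may break anywhere, and the names `paginate`, `flow`, and `heading_claim` are hypothetical. A heading always occupies one line (modelling the "overlap" step), but it is only placed on the current page if `heading_claim` lines are still free.

```python
def paginate(flow, page_height, heading_claim=1):
    """Lay out a flow of blocks onto pages of `page_height` lines.

    `flow` is a list of ("heading", text) or ("para", [line, ...]) tuples.
    A heading always occupies one line, but it is only placed on the
    current page if at least `heading_claim` lines are still free there.
    """
    pages = [[]]
    for kind, content in flow:
        if kind == "heading":
            free = page_height - len(pages[-1])
            # Not enough room for what the heading claims: start a new page
            # (unless the page is already empty, to avoid blank pages).
            if free < heading_claim and pages[-1]:
                pages.append([])
            pages[-1].append("# " + content)
        else:
            # Paragraph lines may break across pages anywhere.
            for line in content:
                if len(pages[-1]) == page_height:
                    pages.append([])
                pages[-1].append(line)
    return pages


# Hypothetical flow mirroring the diagrams: four lines of text, a heading,
# then a two-line paragraph, on five-line pages.
flow = [
    ("para", ["..", "..", "..", ".."]),
    ("heading", "Heading"),
    ("para", ["Paragraph..", "..text."]),
]

for claim in (1, 3):
    print(f"heading claims {claim} line(s):",
          paginate(flow, page_height=5, heading_claim=claim))
```

With a claim of one line the heading lands alone at the bottom of the first page; with a claim of three it moves to the top of the second page, directly above its paragraph and with no gap.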
            
**acabal**: If you think it's bad that `break-*` isn't supported in Firefox or Chrome, wait till you see what your ebook looks like in Kindle, or worse, ADE-based readers, of which there are still many in use!

Kindle, the reading device with by far the largest market share, is basically the IE6 of ereaders - too big to ignore, and at the same time dragging down the entire ebook ecosystem with its crappy renderer. Amazon has shown little interest in improving it for over a decade now, while simultaneously fragmenting its own ecosystem with a variety of different proprietary formats that support different CSS and features.

ADE, while less common in new devices, is still very common in much older devices - B&N's eink Nooks were based on ADE at least as late as a few years ago. (Perhaps they still are?) ADE is closer to IE5 in terms of CSS support!

At Standard Ebooks we're often hamstrung in our attempts to make beautiful ebooks by these big players refusing to improve their renderers. We're forced to dumb down our CSS and use outdated techniques (like occasionally having to use tables for layout!) because ebook renderers are so bad.

iBooks is the top tier renderer, because as far as I can tell it's basically a wrapper for an up-to-date Webkit; next is Kobo - also Webkit-based - along with other Webkit-based indie apps. The rest of the big players are far, far, far distant.
            
**cratermoon**: Maybe the reason we're still stuck with LaTeX and PDFs is that ebook software can't be bothered to implement decent typesetting.
            
**fragmede**: Pragmatism wins out over waiting for CSS properties to get implemented, and a div with `display: inline-block` works today in EPUBs and doesn't need to be backported to iBooks.

https://ebooks.stackexchange.com/questions/7014/how-can-i-prevent-a-widowed-orphaned-header
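
One way to apply the inline-block workaround during an ebook build is to wrap each heading and the paragraph that follows it in a single div. The sketch below is an assumption about how that might look, not a confirmed recipe from the linked answer: the helper name `keep_heading_with_paragraph`, the `width: 100%` declaration, and the use of an inline `style` attribute are all hypothetical choices, and whether a given reader keeps the box on one page has to be checked on the device.

```python
from html import escape

# Assumed styling: an inline-block box is laid out as a single unit inside a
# line, which is why renderers tend not to split it across pages.
KEEP_TOGETHER_STYLE = "display: inline-block; width: 100%;"


def keep_heading_with_paragraph(heading, first_paragraph, level=2):
    """Return markup that wraps a heading and its first paragraph in one div."""
    return (
        f'<div style="{KEEP_TOGETHER_STYLE}">\n'
        f"  <h{level}>{escape(heading)}</h{level}>\n"
        f"  <p>{escape(first_paragraph)}</p>\n"
        "</div>"
    )


snippet = keep_heading_with_paragraph("Widows & orphans", "A heading should not end a page.")
print(snippet)
```

Only the first paragraph is wrapped, since a box that cannot break has to fit on a single page.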