[Hacker News Repost] Hachette vs. Internet Archive: We're Still Fighting for Fair Use

hackernews

Title: Hachette vs. Internet Archive: We're Still Fighting for Fair Use

Text:

Url: https://www.libraryfutures.net/post/were-still-fighting-for-fair-use

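Since I cannot access external links directly, I am unable to read the page you provided. However, I can explain how to use JinaReader to fetch and analyze the page content and, if the content is not in Chinese, how to translate it into Chinese.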

First, make sure Jina and the related dependencies are installed. Here is a basic step-by-step guide:

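1. **Install Jina and the related libraries**:

```bash
# googletrans 4.0.0rc1 still exposes a synchronous Translator.translate()
pip install jina requests googletrans==4.0.0rc1
```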

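2. **Create an indexer**:
   The `jina` package does not export an `Indexer` class, so `YourIndexer` below is a placeholder for whatever index or document store you actually use.

```python
from jina import Document

# Hypothetical placeholder: substitute the index / document store you actually use
indexer = YourIndexer()
```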

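3. **Fetch the page content**:
   You can fetch the page with the `requests` library and then pass the content to the indexer as a `Document` object. Prefixing the target URL with `https://r.jina.ai/` sends the request through the Jina Reader service, which returns the page as readable text instead of raw HTML.

```python
import requests

url = "https://www.libraryfutures.net/post/were-still-fighting-for-fair-use"
# Jina Reader: prefix the target URL with https://r.jina.ai/ to get clean text
response = requests.get("https://r.jina.ai/" + url, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors
content = response.text

# Wrap the content in a Document object
doc = Document(text=content)
```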

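4. **Analyze the content**:
   Use a processor (in Jina, an Executor) to analyze the content. You may need to write one or reuse an existing one; `YourTextAnalysisProcessor` below is a placeholder, not a Jina class.

```python
# Hypothetical processor: replace with your own text-analysis component
processor = YourTextAnalysisProcessor()

# Apply the processor to the document
processor.run(doc)
```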
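
What the processor computes is up to you. As a minimal sketch, assuming a document only needs a `text` string and a `tags` dict, the hypothetical `TextAnalysisProcessor` below counts words and records the most frequent terms. `SimpleDoc`, `TextAnalysisProcessor`, and its `run(doc)` method are illustrative names, not Jina APIs; in Jina itself this logic would live inside an Executor.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a Document: a text field plus a tags dict."""
    text: str
    tags: dict = field(default_factory=dict)


class TextAnalysisProcessor:
    """Hypothetical processor: word count and most frequent terms, stored in doc.tags."""

    def __init__(self, top_k: int = 5, stopwords: frozenset = frozenset()):
        self.top_k = top_k
        self.stopwords = stopwords

    def run(self, doc) -> None:
        # Lowercase Latin-letter tokens only (assumption: English input)
        words = [w for w in re.findall(r"[a-z']+", doc.text.lower())
                 if w not in self.stopwords]
        doc.tags["word_count"] = len(words)
        doc.tags["top_terms"] = [w for w, _ in Counter(words).most_common(self.top_k)]


doc = SimpleDoc("Fair use lets libraries lend. Fair use is contested.")
TextAnalysisProcessor(top_k=2, stopwords=frozenset({"is"})).run(doc)
print(doc.tags)  # {'word_count': 8, 'top_terms': ['fair', 'use']}
```

Ties in frequency keep first-seen order, because `Counter.most_common` sorts stably.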

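5. **Translate non-Chinese content**:
   If the content is not in Chinese, you need a translation service. This example uses `googletrans`, an unofficial Python client for the free Google Translate web service (not the official Google Cloud Translation API).

```python
from googletrans import Translator

# googletrans 4.0.0rc1: translate() is synchronous and returns an object with .text
translator = Translator()
# Auto-detect the source language and translate into Simplified Chinese
translated_content = translator.translate(content, src='auto', dest='zh-cn').text
doc.text = translated_content
```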
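
The translation call runs on every document. One cheap way to apply the "only if not Chinese" condition first is to measure what share of characters falls in the CJK Unified Ideographs block (U+4E00 to U+9FFF). This is a heuristic sketch: the helper names and the 0.3 threshold are assumptions, and it cannot tell Chinese apart from Japanese kanji.

```python
def cjk_ratio(text: str) -> float:
    """Share of non-whitespace characters in the CJK Unified Ideographs block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return cjk / len(chars)


def needs_translation(text: str, threshold: float = 0.3) -> bool:
    """Heuristic (assumption): treat text as non-Chinese when few characters are CJK."""
    return cjk_ratio(text) < threshold


print(needs_translation("We're still fighting for fair use"))  # True
print(needs_translation("我们仍在为合理使用而战"))  # False
```

The translation call would then be wrapped in `if needs_translation(content):` so already-Chinese pages are left untouched.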

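6. **Summarize the content**:
   You can use a simple text-summarization method or more advanced natural language processing techniques to summarize the content.

```python
# Placeholder summary; replace it with the output of a real summarizer
summary = "This is a summary of the content."

# Store the summary as Document metadata (DocArray Documents use the `tags` dict)
doc.tags["summary"] = summary
```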
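
One common "simple text summarization method" is frequency-based extractive summarization: score each sentence by the average document-wide frequency of its words, then keep the top-scoring sentences in their original order. The sketch below assumes English input (run it before translation), since its sentence splitter and word regex only handle ASCII punctuation and Latin letters; `extractive_summary` is an illustrative name.

```python
import re
from collections import Counter


def extractive_summary(text: str, max_sentences: int = 2) -> str:
    """Frequency-based extractive summary (assumes English text)."""
    # Split on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        # Average document-wide frequency of the sentence's words
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / (len(words) or 1)

    # Rank sentence indices by score, keep the best, restore original order
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


text = "Fair use matters. Fair use protects fair lending. Cats sleep."
print(extractive_summary(text))  # Fair use matters. Fair use protects fair lending.
```

This could stand in for the placeholder string, e.g. `summary = extractive_summary(content)`.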

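7. **Save to or query the indexer**:

```python
# Add the document to the index (add/search method names depend on your indexer)
indexer.add([doc])

# Or, if needed, retrieve matching documents and use them
retrieved_docs = indexer.search('fair use')
```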
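
Because `jina` does not provide a ready-made `Indexer` class, here is a minimal in-memory stand-in with the same `add([doc])` / `search(query)` shape, assuming plain keyword matching is enough: documents are ranked by how many of their tokens match a query term. `SimpleDoc` and `KeywordIndex` are hypothetical names, not part of Jina or DocArray.

```python
import re
from dataclasses import dataclass, field


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for a Document: a text field plus a tags dict."""
    text: str
    tags: dict = field(default_factory=dict)


def _tokens(text: str) -> list:
    """Lowercase word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class KeywordIndex:
    """Hypothetical in-memory index exposing add() and search()."""

    def __init__(self):
        self.docs = []

    def add(self, docs) -> None:
        self.docs.extend(docs)

    def search(self, query: str, limit: int = 10) -> list:
        terms = set(_tokens(query))
        # Score each document by how many of its tokens are query terms
        scored = [(sum(tok in terms for tok in _tokens(d.text)), d) for d in self.docs]
        hits = [(s, d) for s, d in scored if s > 0]
        hits.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep insertion order
        return [d for _, d in hits[:limit]]


index = KeywordIndex()
index.add([SimpleDoc("Hachette v. Internet Archive turns on fair use."),
           SimpleDoc("Used bookstores resell books they bought.")])
print([d.text for d in index.search("fair use")])
# ['Hachette v. Internet Archive turns on fair use.']
```

Matching whole tokens rather than substrings keeps "used" from matching the query term "use"; a real deployment would swap this for a proper full-text or vector index.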

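Note that the code above is only an example; in real use you will need to adapt it to your situation, for example by handling errors and optimizing performance. Also note that `googletrans` is an unofficial client and needs no API key, whereas the official Google Cloud Translation API requires a Google Cloud project and credentials such as an API key.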

        
## Post by: MrVandemar
        
### Comments: 
        
**iwishiknewlisp**: To some degree I think that with the increase in use of digital media there needs to be better rights and methods for sharing like one can with physical media.

However, it's not fair use to copy material and redistribute it. Furthermore, the creator should be able to determine the format of the release of their work. If someone wants to alter their work, they must do so in a transformative manner and not pass it off as the creator's work.

Someone who makes a book with formatting specific to, say, a PDF could be unfairly reviewed or judged by readers who borrowed distributed copies that are formatted to epub, for example.
            
**ranger_danger**:

> Similar to a photocopy or resale of a book, the publisher and author were paid when their work was purchased or acquired. What do the big publishers want to come after next – used bookstores?

Do they not understand what *copy*right means? They are not allowed to make a _copy_ of a book without permission. Traditional libraries and bookstores do not do that... it is a very important distinction that they either seem completely oblivious to, or are intentionally playing dumb. Or they're somehow trying to get the actual definition of a "copy" changed.

Don't get me wrong, I'm all for IA and don't have anything against them... but in this case the court upholds that digital copies are still copies, and thus this is still copyright infringement.