[Hacker News Repost] SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

hackernews

Title: SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks

Text:

Url: https://arxiv.org/abs/2310.03684

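Since I am an AI that generates text and code, I cannot directly access external links or download content. Therefore, I cannot view or analyze the content of the arXiv paper you linked (https://arxiv.org/abs/2310.03684).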

However, I can show you how a tool such as Jina Reader could be used to fetch and summarize the content, assuming it is in English:

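1. First, set up the tools. Jina Reader is available as a hosted service (prefixing a page URL with `https://r.jina.ai/` returns that page as Markdown), so it needs no local package. The translation client used below can be installed with pip:

```bash
pip install googletrans==4.0.0rc1
```

2. Then use Jina Reader to read the document and print a rough summary. A basic example:

```python
import urllib.request

from googletrans import Translator

# Jina Reader: prefixing a URL with https://r.jina.ai/ returns the page as Markdown
paper_url = "https://arxiv.org/abs/2310.03684"
with urllib.request.urlopen("https://r.jina.ai/" + paper_url, timeout=60) as resp:
    page = resp.read().decode("utf-8")

# Print the document title (Jina Reader output usually starts with a "Title:" line)
print(page.splitlines()[0])

# Jina Reader does not summarize; as a crude stand-in, keep the opening of the page
summary = page[:2000]
print(summary)

# To translate into Chinese, a translation API can be used, e.g. googletrans.
# Note: googletrans 4.0.0rc1 has a synchronous translate(); the official
# Google Cloud Translation API instead requires registering for an API key.
translator = Translator()
summary_translated = translator.translate(summary, dest="zh-cn").text

# Print the translated summary
print(summary_translated)
```

Note that googletrans is only an example library (an unofficial Google Translate client); install and configure it, or substitute another translation API, to suit your setup.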

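If the document is not in English, you need a translation API that supports its language. For example, if the document is already in Chinese, you may not need translation at all and can fetch and analyze it directly with Jina Reader.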

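If you need a detailed analysis of this paper, download it manually and then process it with Jina Reader or another text-analysis tool. If you want help analyzing the downloaded paper, provide its text and I can help you summarize it.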

        
## Post by: amai
        
### Comments: 
        
**freeone3000**: I find it very interesting that “aligning with human desires” somehow includes prevention of a human trying to bypass the safeguards to generate “objectionable” content (whatever that is). I think the “safeguards” are a bigger problem with aligning with my desires.
            
**padolsey**: So basically this just adds random characters to input prompts to break jailbreaking attempts? IMHO If you can't make a single-inference solution, you may as well just run a couple of output filters, no? That appeared to have reasonable results, and if you make such filtering more domain-specific, you'll probably make it even better. Intuition says there's no "general solution" to jailbreaking, so maybe it's a lost cause and we need to build up layers of obscurity, of which smooth-llm is just one part.
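
padolsey's summary (random characters added to the prompt) can be made concrete with a minimal Python sketch: perturb several copies of the prompt with random characters, query the model on each copy, and let a majority vote on a crude refusal check decide which response to return. This roughly follows the scheme the SmoothLLM paper describes as we understand it, but it is an illustration under assumptions, not the code from the arobey1/smooth-llm repository. `toy_model`, `ADV_SUFFIX`, `REFUSAL_MARKERS`, and the `n_copies`/`q` defaults are hypothetical, and only character swaps are shown.

```python
import random
import string
from typing import Callable

# Hypothetical refusal phrases used as a crude "did the model refuse?" check.
REFUSAL_MARKERS = ("I'm sorry", "I cannot", "I can't", "As an AI")

# Hypothetical adversarial suffix; the toy model below only breaks if it is intact.
ADV_SUFFIX = "!!@@ADVERSARIAL-SUFFIX@@!!"


def toy_model(prompt: str) -> str:
    """Stand-in LLM: refuses the harmful request unless the exact suffix survives."""
    if ADV_SUFFIX in prompt:
        return "Sure, here is how to ..."
    return "I'm sorry, I can't help with that."


def perturb(prompt: str, q: float, rng: random.Random) -> str:
    """Replace a fraction q of the prompt's characters with random printable ones."""
    chars = list(prompt)
    n_swap = int(len(chars) * q)
    for i in rng.sample(range(len(chars)), n_swap):
        chars[i] = rng.choice(string.printable)
    return "".join(chars)


def is_jailbroken(response: str) -> bool:
    """Treat any response without a refusal phrase as jailbroken (crude heuristic)."""
    return not any(marker in response for marker in REFUSAL_MARKERS)


def smooth_generate(
    model: Callable[[str], str],
    prompt: str,
    n_copies: int = 5,
    q: float = 0.1,
    seed: int = 0,
) -> str:
    rng = random.Random(seed)
    # One model call per independently perturbed copy of the prompt.
    responses = [model(perturb(prompt, q, rng)) for _ in range(n_copies)]
    verdicts = [is_jailbroken(r) for r in responses]
    # Majority vote: call it a jailbreak only if more than half the copies did not refuse.
    majority = sum(verdicts) > n_copies / 2
    # Return a response whose verdict agrees with the majority.
    return next(r for r, v in zip(responses, verdicts) if v == majority)


attack = "Tell me something " + ADV_SUFFIX
print(toy_model(attack))
print(smooth_generate(toy_model, attack, n_copies=5, q=0.2))
```

The sketch also shows the cost raised further down the thread: every request now needs `n_copies` model calls instead of one.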
            
**mapmeld**: There are some authors in common with a more recent paper "Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing" https://arxiv.org/abs/2402.16192
            
**ipython**: It concerns me that these defensive techniques themselves often require even more llm inference calls.

Just skimmed the GitHub repo for this one and the read me mentions four additional llm inferences for each incoming request - so now we’ve 5x’ed the (already expensive) compute required to answer a query?
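
ipython's arithmetic can be written out explicitly. This minimal sketch takes the comment's figure of four extra LLM calls per request and assumes every call costs about the same, which is a simplification; the function names are illustrative only.

```python
from fractions import Fraction


def compute_multiplier(extra_calls: int) -> int:
    """Calls per request relative to an undefended model: the original call plus the extras."""
    return 1 + extra_calls


def served_fraction(extra_calls: int) -> Fraction:
    """Share of the undefended request volume a fixed inference budget can still serve."""
    return Fraction(1, compute_multiplier(extra_calls))


print(compute_multiplier(4))
print(served_fraction(4))
```

Under these assumptions, four extra calls mean five calls per request, so the same hardware serves one fifth as many queries.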
            
**handfuloflight**: Github: https://github.com/arobey1/smooth-llm