[Hacker News Repost] Detecting when LLMs are uncertain
-
Title: Detecting when LLMs are uncertain
Text:
Url: https://www.thariq.io/blog/entropix/
Sorry, but as an AI I cannot directly access external websites such as `https://www.thariq.io/blog/entropix/` to fetch their content. I can, however, outline a general approach for analyzing the linked page and then give a summary based on hypothetical content. The analysis process:

1. **Visit the link**: First open the link to view the page content.
2. **Scrape the content**: Use JinaReader or another web-content extraction tool to pull the text from the page.
3. **Analyze the content**: Work through the extracted text to understand the article's topic, arguments, and the author's intent.
4. **Translate non-Chinese content**: If the content includes non-Chinese passages, translate them into Chinese with an online translation tool or API (such as Google Translate).
5. **Summarize**: Based on the analysis, summarize the article's main points.

Below is a hypothetical summarization pass, based on hypothetical scraped content.

**Hypothetical scraped content**:
In this blog post, the author discusses the concept of Entropix, a novel approach to visualizing and understanding complex systems. The author explains that Entropix combines ideas from information theory and machine learning to provide a new way of looking at data. The post also includes a case study where Entropix is used to analyze a real-world dataset, demonstrating its effectiveness.
The author highlights the following key points:
- Entropix is a tool for visualizing entropy in complex systems.
- It uses information theory and machine learning techniques.
- A case study shows its use in analyzing a dataset from a real-world application.
The author concludes by suggesting that Entropix could be a valuable tool for researchers and professionals working with complex data.
**Hypothetical summary**: The post introduces Entropix, a novel approach to visualizing and understanding complex systems. Entropix combines information theory with machine-learning techniques to offer a new perspective on data, and a case study demonstrates its effectiveness on a real-world dataset. The author argues that Entropix could be a valuable tool for researchers and professionals who work with complex data. Note that this is a summary of hypothetical content; a summary of the actual article would depend on what is actually scraped from the link.
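The comments below repeatedly refer to the entropy of the token logits as the basic uncertainty signal. As a point of reference (this is not the Entropix implementation, which could not be fetched), here is a minimal sketch of that quantity in Python; the toy logits are made up for illustration:

```python
# Minimal sketch: Shannon entropy of a model's next-token distribution,
# computed from raw logits. High entropy means the model is spreading
# probability mass across many tokens, i.e. it is "uncertain".
import numpy as np

def token_entropy(logits: np.ndarray) -> float:
    """Shannon entropy (in nats) of the softmax distribution over logits."""
    logits = logits - logits.max()           # stabilize the softmax
    probs = np.exp(logits)
    probs /= probs.sum()
    return float(-(probs * np.log(probs + 1e-12)).sum())

# Toy example: a peaked distribution vs. a flat one over a 4-token "vocab".
confident = np.array([10.0, 0.0, 0.0, 0.0])  # model strongly prefers token 0
unsure    = np.array([1.0, 1.0, 1.0, 1.0])   # model has no preference
print(token_entropy(confident))  # ~0.0 nats
print(token_entropy(unsure))     # ~log(4) = 1.386 nats
```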
Post by: trq_
Comments:
nhlx2: On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
— Charles Babbage
cchance: When entropy is high, I feel like models should have an escape hatch to flag that the answer's overall certainty was low, and hell, add it up and score it, so at the end the user can see whether the certainty of the answer was shit during generation and the answer should be thrown out or replaced with an "I'm not sure".
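A sketch of the aggregation cchance describes, under assumed interfaces: track per-token entropy during generation and, if the score is too bad, swap the answer for an explicit abstention. The thresholds and the token/entropy lists here are hypothetical stand-ins for what a real decoding loop would record:

```python
# Sketch: aggregate per-token entropies into an answer-level certainty
# score, and abstain when either the average or the worst token is too
# uncertain. Threshold values are hypothetical.
from statistics import mean

MEAN_ENTROPY_LIMIT = 2.0   # nats; hypothetical average-entropy threshold
MAX_ENTROPY_LIMIT = 4.0    # nats; hypothetical per-token threshold

def answer_or_abstain(tokens: list[str], entropies: list[float]) -> str:
    """Return the decoded answer, or abstain if generation was too uncertain."""
    if mean(entropies) > MEAN_ENTROPY_LIMIT or max(entropies) > MAX_ENTROPY_LIMIT:
        return "I'm not sure."
    return "".join(tokens)

# Toy usage: the same answer with confident vs. near-coin-flip tokens.
print(answer_or_abstain(["Par", "is"], [0.1, 0.2]))  # "Paris"
print(answer_or_abstain(["Par", "is"], [0.1, 4.5]))  # "I'm not sure."
```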
tylerneylon: I couldn't figure out if this project is based on an academic paper or not (I mean some published technique to determine LLM uncertainty).

This recent work is highly relevant: https://learnandburn.ai/p/how-to-tell-if-an-llm-is-just-guessing

It uses an idea called semantic entropy, which is more sophisticated than the standard entropy of the token logits and is more appropriate as a statistical quantification of when an LLM is guessing or has high certainty. The original paper is in Nature, by authors from Oxford.
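For intuition, here is a heavily simplified sketch of semantic entropy as described in that linked work: sample several answers to the same question, cluster the ones that mean the same thing, and take the entropy over cluster frequencies. The actual method clusters via bidirectional entailment with an NLI model; this sketch fakes equivalence with lowercase string matching, purely for illustration:

```python
# Sketch of semantic entropy, heavily simplified: entropy over clusters
# of semantically equivalent sampled answers, rather than over tokens.
# Real implementations cluster with an entailment model; here equivalence
# is faked by case-insensitive string matching.
import math
from collections import Counter

def semantic_entropy(answers: list[str]) -> float:
    """Entropy (nats) over clusters of 'semantically equivalent' answers."""
    clusters = Counter(a.strip().lower() for a in answers)  # crude equivalence
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in clusters.values())

# The model agrees with itself -> low semantic entropy (confident):
print(semantic_entropy(["Paris", "paris", "Paris"]))     # 0.0
# The model gives conflicting answers -> high semantic entropy (guessing):
print(semantic_entropy(["Paris", "Lyon", "Marseille"]))  # ~1.10
```

The point of the construction is that rephrasings of the same fact do not inflate the entropy, which plain token-level entropy cannot distinguish from genuine disagreement.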
lasermike026: Currently LLMs do not have executive or error-detection cognitive abilities. They have no theory of self and no emotional instincts or imperatives. At the moment LLMs are just mindless statistical models.
badsandwitch: Has anyone tried to see what the output looks like if the model is never allowed to be uncertain?

For example, whenever certainty drops below a threshold, the sampler backtracks and chooses different tokens, such that at the end every single token had an above-threshold certainty.

I doubt it would entirely eliminate undesirable outputs, but it would be interesting.
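A minimal sketch of the backtracking sampler badsandwitch proposes, with a hypothetical next_token_probs(prefix) standing in for a model call that returns (token, probability) candidates sorted best-first:

```python
# Sketch: depth-first decoding where a token is only accepted if its
# probability clears a threshold; if no candidate at the current step
# qualifies, undo the previous token and try the next-best option there.
THRESHOLD = 0.5  # hypothetical minimum per-token probability

def backtracking_decode(next_token_probs, max_len: int):
    """Return a length-max_len sequence in which every chosen token had
    probability >= THRESHOLD at the moment it was picked, or None."""
    def dfs(prefix):
        if len(prefix) == max_len:
            return prefix
        for token, prob in next_token_probs(prefix):
            if prob < THRESHOLD:
                break  # candidates are sorted best-first; none later qualify
            result = dfs(prefix + [token])
            if result is not None:
                return result
        return None  # dead end: caller backtracks to its next candidate
    return dfs([])

# Toy model: after "a", the top token "b" looks confident but leads to a
# dead end, so the sampler backtracks and takes "c" instead.
TABLE = {
    (): [("a", 0.9)],
    ("a",): [("b", 0.8), ("c", 0.6)],
    ("a", "b"): [("x", 0.1)],            # no confident continuation
    ("a", "c"): [("d", 0.7)],
}

def toy_probs(prefix):                   # hypothetical model call
    return TABLE.get(tuple(prefix), [])

print(backtracking_decode(toy_probs, 3))  # ['a', 'c', 'd']
```

Because candidates come back sorted, the loop can stop at the first sub-threshold option; a real implementation would also need a step budget, since worst-case backtracking is exponential, plus a fallback (such as the "I'm not sure" escape hatch above) when no qualifying sequence exists.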