[Hacker News repost] Language models as compilers: Simulating pseudocode execution

Title: Language models as compilers: Simulating pseudocode execution

Text:

Url: https://arxiv.org/abs/2404.02575

Title: Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models
Authors: Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Seonghwan Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, Jinyoung Yeo
Submitted: 3 April 2024
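Abstract: Algorithmic reasoning is the ability to understand the complex patterns behind a problem and decompose them into a sequence of reasoning steps toward a solution. This makes algorithmic reasoning a challenge for large language models (LLMs), even though they have shown promising performance on other reasoning tasks. In this context, some recent studies use programming languages (e.g., Python) to express the logic needed to solve a given instance/question (e.g., Program-of-Thought), inspired by their strict and precise syntax. However, writing executable code that expresses the correct logic on the fly within a single inference call is non-trivial. In addition, code generated for one instance cannot be reused for other instances, even when they come from the same task and might require identical logic to solve. This paper presents Think-and-Execute, a novel framework that decomposes the reasoning process of language models into two steps. (1) In the Think step, we discover the task-level logic for solving a given task and express that logic as pseudocode; (2) in the Execute step, we further tailor the generated pseudocode to each instance and simulate the execution of the code. Extensive experiments on seven algorithmic reasoning tasks demonstrate the effectiveness of Think-and-Execute. Our approach improves LLMs' reasoning more than several strong baselines that perform instance-specific reasoning (e.g., CoT and PoT), which suggests that discovering task-level logic is helpful. We also show that, compared with natural language, pseudocode can better guide the reasoning of LLMs, even though they are trained to follow natural language instructions.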
Submission history: From Hyungjoo Chae [v1]
Wed, 3 Apr 2024 08:49:11 UTC (1,323 KB)
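
The two-step flow described in the abstract can be sketched roughly as follows. This is only an illustration of the control flow (one Think call per task, one Execute call per instance), not the authors' implementation: the `LLM` callable, the function names, and the prompt wording are all assumptions made for this sketch, and the paper's actual prompts are not reproduced here.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


def think(llm: LLM, task_description: str, examples: List[str]) -> str:
    """Think step: ask once per task for pseudocode capturing task-level logic."""
    prompt = (
        "Task: " + task_description + "\n"
        "Example instances:\n" + "\n".join(examples) + "\n"
        "Write pseudocode that solves any instance of this task."
    )
    return llm(prompt)


def execute(llm: LLM, pseudocode: str, instance: str) -> str:
    """Execute step: ask the model to simulate the pseudocode on one instance."""
    prompt = (
        "Pseudocode:\n" + pseudocode + "\n"
        "Input: " + instance + "\n"
        "Simulate the execution step by step and give the final answer."
    )
    return llm(prompt)


def think_and_execute(
    llm: LLM, task_description: str, examples: List[str], instances: List[str]
) -> List[str]:
    """Generate the pseudocode once, then reuse it for every instance."""
    pseudocode = think(llm, task_description, examples)
    return [execute(llm, pseudocode, instance) for instance in instances]
```

The point of the structure is reuse: unlike instance-specific approaches such as PoT, the pseudocode is produced once per task and shared across all instances of that task.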

Post by: milliondreams

Comments:

pkoird: Any sufficiently advanced LLM is indistinguishable from Prolog.

I half-jest but I envision the direction of LLM research to head towards a parser-oriented setup where LLMs merely extract the entities and relations and the actual logic is done by a logical engine such as Prolog.
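
(Editor's sketch, not part of the comment.) The split pkoird describes could look something like this: the model's only job is to emit facts as triples, and a separate engine applies the rules. The code below is a minimal forward-chaining loop in Python rather than actual Prolog; the `Fact` triple format, the `derive_ancestors` name, and the hard-coded ancestor rule are assumptions chosen for illustration, and the extraction step is assumed to have already happened.

```python
from typing import Iterable, Set, Tuple

# Assumed fact format: (relation, subject, object), e.g. ("parent", "alice", "bob").
Fact = Tuple[str, str, str]


def derive_ancestors(facts: Iterable[Fact]) -> Set[Fact]:
    """Forward-chain two rules until no new facts appear:
    ancestor(X, Y) :- parent(X, Y).
    ancestor(X, Z) :- ancestor(X, Y), ancestor(Y, Z).
    """
    known: Set[Fact] = set(facts)
    known |= {("ancestor", s, o) for (r, s, o) in known if r == "parent"}
    changed = True
    while changed:
        changed = False
        # Snapshot the current ancestor pairs so the set can grow during the pass.
        ancestors = [(s, o) for (r, s, o) in known if r == "ancestor"]
        for x, y in ancestors:
            for y2, z in ancestors:
                if y == y2 and ("ancestor", x, z) not in known:
                    known.add(("ancestor", x, z))
                    changed = True
    return known
```

Here the facts would come from the LLM's extraction step, while every inference is made by the deterministic loop.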

Mathnerd314: The phase 2 prompt is complete, but the phase 3 prompt's initial part ends in "When constructing the main function, ...", and no mention of random seeds, so I guess this paper is not reproducible at all.

jumploops: English is terribly imprecise, so it makes sense to use pseudo instructions to improve the bounds/outcome of a language model’s execution.

I do wonder how long hacks like this will be necessary; as it stands, many of these prompting techniques are essentially artificially expanding the input to enhance reasoning ability (increasing tokens, thus increasing chance of success).

spxneo: This seems quite promising. Using pseudo-code as an intermediary step isn't new but seems like this takes it a bit further. Will need to see some code and test it out.

inciampati: It's going to be really fascinating to see this applied instead of chain of thought and other kinds of reasoning approaches, because it's generic. It should in principle work on every kind of LLM.
