[Hacker News repost] ReALM: Reference Resolution as Language Modeling

hackernews

Title: ReALM: Reference Resolution as Language Modeling

Text:

Url: https://arxiv.org/abs/2403.20329

Title: ReALM: Reference Resolution as Language Modeling
Authors: [Submitted on 29 Mar 2024]
Text:

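Abstract: Reference resolution is an important problem, one that is essential to understand and successfully handle context of different kinds. This context includes both previous turns and context that pertains to non-conversational entities, such as entities on the user's screen or those running in the background. While LLMs have been shown to be extremely powerful for a variety of tasks, their use in reference resolution, particularly for non-conversational entities, remains underutilized. This paper demonstrates how LLMs can be used to create an extremely effective system to resolve references of various types, by showing how reference resolution can be converted into a language modeling problem, despite involving forms of entities like those on screen that are not traditionally conducive to being reduced to a text-only modality. We demonstrate large improvements over an existing system with similar functionality across different types of references, with our smallest model obtaining absolute gains of over 5% for on-screen references. We also benchmark against GPT-3.5 and GPT-4, with our smallest model achieving performance comparable to that of GPT-4, and our larger models substantially outperforming it.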
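
To make the idea of reducing on-screen entities to text concrete, here is a minimal Python sketch. It is an illustration only, not the paper's implementation: the `ScreenEntity` type, the `line_tolerance` parameter, the `[n]` tagging scheme, and the prompt wording are all assumptions made for this example. It orders entities top to bottom and left to right using their positions, tags each one, and builds a text prompt that a language model could complete with the tag of the referenced entity.

```python
from dataclasses import dataclass


@dataclass
class ScreenEntity:
    """Hypothetical on-screen entity: its text plus the top-left corner of its box."""
    text: str
    x: float
    y: float


def group_into_lines(entities, line_tolerance=10.0):
    """Group entities into visual lines, assuming similar y means same line."""
    ordered = sorted(entities, key=lambda e: (e.y, e.x))
    lines = []
    for entity in ordered:
        # Compare against the first entity of the current line.
        if lines and abs(entity.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(entity)
        else:
            lines.append([entity])
    # Within each line, read left to right.
    return [sorted(line, key=lambda e: e.x) for line in lines]


def serialize_screen(entities, line_tolerance=10.0):
    """Render the screen as text: one row per visual line, tab-separated cells."""
    rows = []
    index = 0
    for line in group_into_lines(entities, line_tolerance):
        cells = []
        for entity in line:
            cells.append(f"[{index}] {entity.text}")
            index += 1
        rows.append("\t".join(cells))
    return "\n".join(rows)


def build_prompt(query, entities):
    """Build a text-only prompt; the model call itself is out of scope here."""
    screen = serialize_screen(entities)
    return f"Screen:\n{screen}\n\nUser: {query}\nReferenced entity:"


if __name__ == "__main__":
    screen_entities = [
        ScreenEntity("555-0100", 80, 102),
        ScreenEntity("Call", 10, 100),
        ScreenEntity("Pharmacy", 10, 20),
    ]
    print(build_prompt("call that number", screen_entities))
```

The resulting prompt is plain text, so any text-only model could in principle be asked to continue it with an entity tag. How the model is trained or queried is deliberately left out of this sketch.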

Submission history: Joel Ruben Antony Moniz [v1]
Fri, 29 Mar 2024 17:59:06 UTC (7,019 KB)

Post by: mfiguiere

Comments:

ultra_nick: Did they just add 2D text position to the feature vectors?

dvt: I'm very excited about work being done in this area. In fact, I'm working on a product that does exactly this (runs in the background, local LLM, has access to screen-space entities, can take certain actions). It feels pretty magical to use (here it's running on my 3090Ti; much slower but still serviceable on my M1 MBP): https://www.youtube.com/watch?v=JH1noETdQEA

Currently using Mistral-7B-Instruct-v0.2, but working on a fine-tuning dataset which should make it work better with local application interfaces (console, browser, email client, Slack, Discord, Word, Excel, etc.).

threeseed: For context, this is the basis for Apple's Siri replacement. I wonder whether this is going to make such a difference given Apple's POI data is so poor.
