[Hacker News Repost] OmniParser for Pure Vision Based GUI Agent

Title: OmniParser for Pure Vision Based GUI Agent

Text:

Url: https://microsoft.github.io/OmniParser/

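The linked page could not be retrieved directly, so the following summary is inferred from the link and title rather than from the page contents.

OmniParser is a Microsoft project, hosted on GitHub, for parsing screenshots of graphical user interfaces into structured elements that a vision-based GUI agent can act on. Some possible information about OmniParser, in summary: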

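### OmniParser Overview
- **Purpose**: OmniParser aims to make user interface screenshots machine-readable, so that a GUI agent can work from pixels alone ("pure vision") rather than from the underlying page or application structure.
- **Function**: It parses a screenshot and extracts structured information, such as the regions of on-screen elements and descriptions of them.
- **Languages and platforms**: Because it works from screenshots, the approach is in principle not tied to a particular application or platform.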

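### Use Cases
- **GUI automation**: For agents that must operate applications through their visual interface, OmniParser can help automate the step of locating on-screen elements.
- **Information extraction**: When key information has to be read off a screen, such as button labels or visible text, OmniParser can help.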

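### Technical Characteristics
- **Multilingual support**: OmniParser may be able to handle on-screen text in multiple languages, including Chinese, depending on its text recognition component.
- **Extensibility**: The project may allow users to adapt or extend it for specific applications or interface types.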

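### How to Use
- **Installation**: Typically installed with tools such as pip.
- **Configuration**: The parser may need to be configured, for example by pointing it at downloaded model weights.
- **Invocation**: Write code that calls the parser on a screenshot and processes the extracted elements.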
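
The invocation step above can be sketched as follows. This is a minimal illustration, not OmniParser's actual API: the `ParsedElement` type, its fields, and `format_for_prompt` are hypothetical names chosen for this example, and the element list is hard-coded in place of a real parser call.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ParsedElement:
    """Hypothetical record for one on-screen element (not OmniParser's real schema)."""
    element_id: int
    kind: str            # assumed categories: "text" or "icon"
    description: str     # visible text or a short caption of the icon
    interactable: bool


def format_for_prompt(elements: List[ParsedElement]) -> str:
    """Render interactable elements as numbered lines for a language-model prompt."""
    lines = [
        f"[{e.element_id}] {e.kind}: {e.description}"
        for e in elements
        if e.interactable
    ]
    return "\n".join(lines)


# Hard-coded stand-in for parser output; a real run would obtain this from the parser.
elements = [
    ParsedElement(0, "text", "Settings", False),
    ParsedElement(1, "icon", "Search button", True),
    ParsedElement(2, "icon", "Close window", True),
]
prompt_block = format_for_prompt(elements)
print(prompt_block)
```

Giving each element a stable numeric ID lets a model refer to "element 1" in its answer, which the calling code can then map back to a screen region.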

Note that this summary is based on assumptions, and the actual content may differ. For specific instructions or details, visit the [OmniParser project page](https://microsoft.github.io/OmniParser/) directly.

Post by: fzliu

Comments:

Smaug123: To a considerable extent, we are stuck in the world we live in; but I am reminded of a quote by Guillaume Allais:

> My entire job seems to be repeating variations of "never start by forgetting the user's stated intent only to then attempt to guess it".

trq_: This is awesome, can't wait for evals against Claude Computer Use!

s3tt3mbr1n1: Has anyone gotten this to work?

Copying the repo and downloading the models through HuggingFace or manually does not seem to work, you get errors indicating missing files.

amelius: Can it detect ads and mask them out?

akshayKMR: Does it also tell the coordinates (x,y) of the annotated box w.r.t. the screenshot dimensions?
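
If a parser reports boxes relative to the screenshot dimensions, mapping them to pixel coordinates is a small calculation. The sketch below assumes, hypothetically, boxes given as normalized `(x1, y1, x2, y2)` fractions in the range 0 to 1; whether OmniParser uses this convention is not confirmed here, and `to_pixels` and `center` are names made up for the example.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]
PixelBox = Tuple[int, int, int, int]


def to_pixels(box: Box, width: int, height: int) -> PixelBox:
    """Scale a normalized (x1, y1, x2, y2) box to integer pixel coordinates."""
    x1, y1, x2, y2 = box
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))


def center(box_px: PixelBox) -> Tuple[int, int]:
    """Return the integer center of a pixel box, e.g. as a click target."""
    x1, y1, x2, y2 = box_px
    return ((x1 + x2) // 2, (y1 + y2) // 2)


# Assumed example box on a 1920x1080 screenshot.
box_px = to_pixels((0.25, 0.5, 0.75, 1.0), 1920, 1080)
print(box_px, center(box_px))
```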
