[Hacker News repost] Launch HN: Skyvern (YC S23) – open-source AI agent for browser automations
-
Title: Launch HN: Skyvern (YC S23) – open-source AI agent for browser automations
Text: Hey HN, we’re Suchintan and Shu from Skyvern (https://www.skyvern.com). We’re building an open source tool to help companies automate browser-based workflows using LLMs.

Our open source repo is at https://github.com/Skyvern-AI/Skyvern, and we're excited to share our cloud version with you (https://app.skyvern.com) :)

Skyvern allows you to define a single (or a series of) goal-based prompts to instruct an agent to complete complex tasks on websites. Here’s a quick demo of Skyvern: https://www.loom.com/share/76b231309df74a528061fcf102e1967f

We built this to solve a specific problem: building browser automations often requires companies to either hire people and scale out operations teams to do tedious manual work, or hire developers to use products like UiPath or Selenium to build automations.

Code-based solutions always run into the same problem: they’re brittle (wow this website added a new pop-up dialog and my script broke), and fail to achieve the same objective across multiple websites (how can I fill out a contact-us form on hundreds of different websites?).

We did a Show HN a few months ago (https://news.ycombinator.com/item?id=39706004), and since then, we’ve onboarded customers for a wide variety of use cases: generating insurance quotes on websites like Geico.com; applying to jobs on websites like lever.co; automating filing permits in local government portals; registering new corporations for employment identification; fetching invoices from hundreds of different portals such as hydroone.com; automating purchasing on a handful of e-commerce websites like zooplus.com; and filling out contact-us forms on a bunch of random SMB websites (such as HVAC websites).

To be able to service all of these, we’ve built and open-sourced quite a few interesting features:

(1) a fully-featured React application allowing you to see every action Skyvern is taking in real-time;

(2) livestreaming browser instances to allow our users to see what Skyvern is doing when running inside of a Docker container;

(3) authenticated sessions, integrating with Bitwarden and allowing users to specify Email + Phone + QR-code based 2FAs;

(4) “workflows” allowing users to chain multiple goal-based prompts together, which can handle tasks like invoice downloading, or automating purchasing pipelines;

(5) processing HTML elements (e.g. identifying + summarizing SVGs) and performing website interactions (e.g. iterating over dynamic autocompletes to fill in address information correctly);

(6) “cached workflows”, allowing Skyvern to memorize previous interactions (i.e. text inputs) and re-use them in future runs.

We’ve also been blessed with a few model advancements to solve some of the cost concerns the community brought up. Skyvern’s token costs went down 80% from $15 / 1M tokens (GPT-4V) to $2.50 / 1M tokens (GPT-4o).

Despite the model costs going down 80%, Skyvern is still quite expensive to run, so we give every new user $5 of credits to try it out and see if it can be useful for you.

We would be honored if you could give it a try at https://app.skyvern.com and share some feedback with us, and we look forward to any and all of your comments!
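As a quick check on the cost figures quoted in the post, the arithmetic can be sketched as below. The two per-million-token prices and the $5 credit come from the post; everything else is an assumption made for illustration, in particular that each price is a single flat rate and that a credit converts directly into raw model tokens at cost (the post does not say how Skyvern actually bills usage). Under those assumptions the drop works out to slightly more than the roughly 80% quoted.

```python
# Prices per 1M tokens as quoted in the post (USD).
GPT_4V_PRICE = 15.00
GPT_4O_PRICE = 2.50
# Free credit given to each new user, as quoted in the post (USD).
NEW_USER_CREDIT = 5.00


def percent_reduction(old_price: float, new_price: float) -> float:
    """Return the percentage drop from old_price to new_price."""
    return (old_price - new_price) / old_price * 100


def tokens_for_budget(budget_usd: float, price_per_million: float) -> int:
    """Return how many tokens a budget buys at a flat per-1M-token price.

    Assumption: the budget is spent purely on model tokens at list price.
    """
    return int(budget_usd / price_per_million * 1_000_000)


reduction = percent_reduction(GPT_4V_PRICE, GPT_4O_PRICE)
credit_tokens = tokens_for_budget(NEW_USER_CREDIT, GPT_4O_PRICE)

print(f"Price reduction: {reduction:.1f}%")
print(f"Tokens covered by the credit: {credit_tokens:,}")
```

A real bill would likely differ, since hosted services usually add their own margin and model providers price input and output tokens separately.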
Url: https://github.com/Skyvern-AI/Skyvern
Post by: suchintan
Comments:
sahmeepee: Probably not the first AI wrapper around Playwright this week, and certainly not the first this month.

I think this use case of automation in a BPA sense is more compelling than using it for test automation, because the latter is much more concerned with the precision and repeatability of the process. For the BPA task, arguably you care only about the outcome and it often doesn't matter if it gets there via some crazy route.

Part of the problem for me is that your example video shows a big wodge of prompt that had to be written to make this work and then a few kb of payload data (parameters) in a plaintext, non-csv format. If the expectation is that this replaces someone just using Playwright with codegen due to that being too technical, I'm not convinced there is a huge group of people who can manage one task but not the other.

Furthermore, you are expecting them to pass over their website login credentials and apparently their credit card details too, in plain text. You had better have a very solid idea of how to handle that sensitive data to avoid serious consequences if your users' skyvern accounts are compromised.

I think the frequency of website redesigns is oversold by people producing these LLM-driven Playwright wrappers, especially when targeting old-fashioned or government sites. As an example, we have had a suite of lengthy Playwright browser automations to interact with a government site for a few years and have had to maintain them only once, when the agency's business process changed. The prompt would also have needed to change had we used Skyvern, as would the payload, because the process was different. The difference with the Playwright automation, though, is that we could use assertions to verify steps had succeeded/failed and data had been recorded correctly, so we would know the process needed updating. I can't see that option in Skyvern which would have me worrying that process changes would be overlooked and we would unknowingly start entering the wrong data or missing steps.
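The step-verification idea raised in this comment can be sketched without a browser. The following is a minimal illustration, not Skyvern's or Playwright's API: `Step`, `StepFailed`, and `run_steps` are hypothetical names, and the "page" is a plain dictionary standing in for real browser state. In an actual Playwright script, each check would typically be an `expect(...)` assertion on a locator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Stand-in for browser/page state; a real automation would inspect the page.
State = Dict[str, str]


@dataclass
class Step:
    name: str
    action: Callable[[State], None]  # performs the step (hypothetical)
    check: Callable[[State], bool]   # postcondition that must hold afterwards


class StepFailed(Exception):
    """Raised when a step's postcondition does not hold."""


def run_steps(steps: List[Step], state: State) -> List[str]:
    """Run steps in order, stopping loudly at the first failed postcondition."""
    completed = []
    for step in steps:
        step.action(state)
        if not step.check(state):
            raise StepFailed(f"postcondition failed after step '{step.name}'")
        completed.append(step.name)
    return completed


def fill_name(state: State) -> None:
    state["name"] = "Ada"


def submit(state: State) -> None:
    state["status"] = "submitted"


steps = [
    Step("fill name", fill_name, lambda s: s.get("name") == "Ada"),
    Step("submit", submit, lambda s: s.get("status") == "submitted"),
]

completed = run_steps(steps, {})
print(completed)
```

The point of the design is that a changed business process surfaces as an explicit `StepFailed` error at the step that drifted, instead of silently producing wrong data.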
glorpsicle: Congrats on the launch! I've been keeping up with you folks since you last posted (a few months ago, I believe). How does Anthropic's recent announcement of Claude's "computer use" abilities grab you? What key differentiators does Skyvern have, at this point in time ("computer use" with Claude being relatively new)?
Workaccount2: Anyone building a start-up on 3rd party LLMs at this point has to have some big cojones. Or you need a smash-and-grab business model. Serious risk if your horizon is measured in years instead of months.

Anthropic threw their hat in this ring yesterday, and it will very likely be followed by OpenAI and Google soon. Godspeed.
sirmarksalot: As with any of these LLM workflow automation tools, it raises a few questions about each potential use case, and the likely long-term outcomes.

1. Is this working around friction due to a lack of interoperability between tools? For example, is this something that would be more efficient if the owner of the website exposed a REST service? Will the existence of this tool disincentivize companies from exposing services when it makes sense?

2. If there is a good reason for the lack of a service endpoint, perhaps for security reasons, will your automation workflow be used to bypass those security measures? Could your tool be used by malicious actors to disable major services? Are you that malicious actor yourself? Will your tool be used by scalpers to prevent consumers from buying high-demand products?

3. If this is being used to work around deferred maintenance with internal tools and processes, will the existence of these kinds of tools be used by management to justify further deferral of that maintenance? Will your tool become a critical piece of the support staff's workflow?

4. If your tool is being used in good faith to work around anti-patterns in website design, will the owner of the website be incentivized to break your workflow? Is your use case just a step in an arms race?

These are the thoughts that go through my head whenever I hear about software being laid on top of complicated processes, where instead of simplifying the underlying processes, we add another layer of complexity to sweep it under the rug. I'm sure that people will find your project useful, but I wonder what the longer-term effects will be.
mmaunder: Congrats!!! And super cool that you've open sourced it under the AGPL. Sorry if this is answered in the docs, but I did a brief search on the source and noticed you're not using LangChain but do plan to integrate it so it can be offered to that community. I'm curious whether you'd mind talking about what you did use to create the chain of thought/actions logic in Skyvern and, if you had to start work today, whether you'd consider going the LangChain/Graph route? Thanks.