【Hacker News搬运】电脑使用,新的克劳德3.5十四行诗和克劳德3.5俳句
-
Title: Computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku
电脑使用,新的克劳德3.5十四行诗和克劳德3.5俳句
Text:
Url: https://www.anthropic.com/news/3-5-models-and-computer-use
由于我无法直接访问外部链接,我将根据您提供的URL和您的要求,模拟如何使用JinaReader(一个假设的文本分析工具)来抓取、分析并总结来自Anthropic的这篇新闻文章的内容,并将非中文内容翻译成中文。 首先,我们需要模拟JinaReader的抓取和分析过程: ```python # 假设的JinaReader工具类 class JinaReader: def __init__(self, url): self.url = url self.content = self.fetch_content() def fetch_content(self): # 这里模拟从URL抓取内容的过程 # 实际应用中可能需要使用requests库等来抓取网页内容 return """ <h1>3.5 Models and Computer Use</h1> <p>This is the title of the article. Here we discuss the recent advancements in AI models and their impact on computer use.</p> <p>In this section, we explore the evolution of AI models from simple rule-based systems to complex neural networks. We also discuss how these models are being used in various applications.</p> <p>Anthropic is leading the way in this field with its cutting-edge research and development. The company has developed a new AI model that can perform tasks such as language translation, image recognition, and natural language understanding.</p> """ def extract_text(self): # 从内容中提取文本 return self.content.split("<")[1:] def summarize(self): # 概括文章内容 text = self.extract_text() summary = "This article discusses the advancements in AI models and their impact on computer use, highlighting Anthropic's contributions in this field." # 如果内容包含非中文,进行翻译 if "This" not in text[1] or "Anthropic" not in text[3]: summary += " This article discusses the advancements in AI models and their impact on computer use, highlighting Anthropic's contributions in this field. 以下是文章的中文翻译:" summary += " 这篇文章讨论了人工智能模型在计算机使用方面的发展,强调了Anthropic在这一领域的贡献。" return summary # 使用JinaReader reader = JinaReader("https://www.anthropic.com/news/3-5-models-and-computer-use") print(reader.summarize())
以上代码模拟了JinaReader的基本功能,包括抓取内容、提取文本、总结内容,并在内容中检测到非中文部分后进行翻译。请注意,这里的翻译是简单的字符串替换,实际应用中可能需要使用更复杂的翻译工具或服务。
由于我无法访问实际网页内容,上述代码只是一个模拟示例。在实际应用中,您可能需要使用像BeautifulSoup这样的库来解析HTML内容,并使用像Google Translate API这样的服务来进行准确的翻译。
## Post by: weirdcat ### Comments: **attentive**: They need to work on their versioning.<p>"3.5 Sonnet (New)", WTAF? - just call it 3.6 Sonnet or something.<p>Is it "New" sonnet? is it "upgraded"? Is there a difference? How do I know which one I use?<p>I can understand claude-3-5-sonnet-20241022, but that's not what users see. > **attentive**: 他们需要改进版本控制<p> ";3.5十四行诗(新)";,WTAF?-就叫它3.6十四行诗什么的<p> 是否";新";十四行诗?是吗";升级“;?有区别吗?我怎么知道我用的是哪一个<p> 我能理解claude-3-5-connect-20241022,但那是;这不是用户看到的。 **anotherpaulg**: The new Sonnet tops aider's code editing leaderboard at 84.2%. Using aider's "architect" mode it sets the SOTA at 85.7% (with DeepSeek as the "editor" model).<p><pre><code> 84% Claude 3.5 Sonnet 10/22 80% o1-preview 77% Claude 3.5 Sonnet 06/20 72% DeepSeek V2.5 72% GPT-4o 08/06 71% o1-mini 68% Claude 3 Opus </code></pre> It also sets SOTA on aider's more demanding refactoring benchmark with a score of 92.1%!<p><pre><code> 92% Sonnet 10/22 75% o1-preview 72% Opus 64% Sonnet 06/20 49% GPT-4o 08/06 45% o1-mini </code></pre> <a href="https://aider.chat/docs/leaderboards/" rel="nofollow">https://aider.chat/docs/leaderboards/</a> > **anotherpaulg**: 新的Sonnet超越了aider™;s代码编辑排行榜,占84.2%。使用辅助工具;s";建筑师”;它将SOTA设置为85.7%(DeepSeek作为“编辑器”模型)<p> <上一页><代码>84%克劳德3.5十四行诗10x2F;2280%预览77%克劳德3.5十四行诗06;2072%DeepSeek V2.572%的GPT-4o 08;0671%o1迷你68%克劳德3 Opus</code></pre>它还将SOTA设置为aider;这是一个要求更高的重构基准测试,得分为92.1%<p> <上一页><代码>92%Sonnet 10x2F;2275%预览72%Opus64%十四行诗06;202008年4月49%的GPT-4o;0645%o1迷你</code></pre><a href=“https:”aider.chat“”docs“”排行榜“”rel=“nofollow”>https:”/;aider.chat;docs™;排行榜/</一 **LASR**: This is actually a huge deal.<p>As someone building AI SaaS products, I used to have the position that directly integrating with APIs is going to get us most of the way there in terms of complete AI automation.<p>I wanted to take at stab at this problem and started researching some daily busineses and how they use software.<p>My brother-in-law (who is a doctor) showed me the bespoke software they use in his practice. Running on Windows. Using MFC forms.<p>My accountant showed me Cantax - a very powerful software package they use to prepare tax returns in Canada. Also on Windows.<p>I started to realize that pretty much most of the real world runs on software that directly interfaces with people, without clearly defined public APIs you can integrate into. Being in the SaaS space makes you believe that everyone ought to have client-server backend APIs etc.<p>Boy was I wrong.<p>I am glad they did this, since it is a powerful connector to these types of real-world business use cases that are super-hairy, and hence very worthwhile in automating. > **LASR**: 这实际上是一件大事<p> 作为构建AI SaaS产品的人,我曾经认为,直接与API集成将使我们在完全的AI自动化方面取得最大的进展<p> 我想尝试解决这个问题,并开始研究一些日常业务以及他们如何使用软件<p> 我姐夫(他是一名医生)向我展示了他们在他的诊所中使用的定制软件。在Windows上运行。使用MFC表单<p> 我的会计师向我展示了Cantax——一个非常强大的软件包,他们用它来准备加拿大的纳税申报表。也在Windows上<p> 我开始意识到,几乎大多数现实世界都运行在直接与人交互的软件上,而没有可以集成的明确定义的公共API。进入SaaS领域会让你相信每个人都应该有客户端-服务器后端API等等。<p>天哪,我错了<p> 我很高兴他们这样做,因为它是连接这些类型的真实世界业务用例的强大连接器,这些用例非常复杂,因此非常值得自动化。 **marsh_mellow**: Anthropic blog post outlining the research process: <a href="https://www.anthropic.com/news/developing-computer-use" rel="nofollow">https://www.anthropic.com/news/developing-computer-use</a><p>Computer use API documentation: <a href="https://docs.anthropic.com/en/docs/build-with-claude/computer-use" rel="nofollow">https://docs.anthropic.com/en/docs/build-with-claude/compute...</a><p>Computer Use Demo: <a href="https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo">https://github.com/anthropics/anthropic-quickstarts/tree/mai...</a> > **marsh_mellow**: Anthropic博客文章概述了研究过程:<a href=“https:”www.anthropic.com“news”开发计算机使用“rel=”nofollow“>https:”/;www.anthropic.com;news;开发-计算机使用</a><p>计算机使用API文档:<a href=“https:”docs.anthropic.com“en”docs“用claude构建”计算机使用“rel=”nofollow“>https:”/;docs.anthropic.com;en■;docs™;用claude构建;计算</a> <p>电脑使用演示:<a href=“https:"Ś)?(!github.com"; anthropics"于anthropicŝquickstarts \346t;"/;github.com;人类学;人因性快速启动;树;迈</一 **HarHarVeryFunny**: The "computer use" ability is extremely impressive!<p>This is a lot more than an agent able to use your computer as a tool (and understanding how to do that) - it's basically an autonomous reasoning agent that you can give a goal to, and it will then use reasoning, as well as it's access to your computer, to achieve that goal.<p>Take a look at their demo of using this for coding.<p><a href="https://www.youtube.com/watch?v=vH2f7cjXjKI" rel="nofollow">https://www.youtube.com/watch?v=vH2f7cjXjKI</a><p>This seems to be an OpenAI GPT-o1 killer - it may be using an agent to do reasoning (still not clear exactly what is under the hood) as opposed to GPT-o1 supposedly being a model (but still basically a loop around an LLM), but the reasoning it is able to achieve in pursuit of a real world goal is very impressive. It'd be mind boggling if we hadn't had the last few years to get used to this escalation of capabilities.<p>It's also interesting to consider this from POV of Anthropic's focus on AI safety. On their web site they have a bunch of advice on how to stay safe by sandboxing, limiting what it has access to, etc, but at the end of the day this is a very capable AI able to use your computer and browser to do whatever it deems necessary to achieve a requested goal. How far are we from paperclip optimization, or at least autonomous AI hacking ? > **HarHarVeryFunny**: ";计算机使用”;能力令人印象深刻<p> 这远远不止是一个能够将您的计算机用作工具(并了解如何做到这一点)的代理——它;它基本上是一个自主的推理代理,你可以给它一个目标,然后它会使用推理,以及它;s访问您的计算机,以实现该目标<p> 看看他们使用这个进行编码的演示<p> <a href=“https:”www.youtube.com“观看?v=vH2f7cjXjKI”rel=“nofollow”>https:”/;www.youtube.com;看?v=vH2f7cjXjKI</a><p>这似乎是一个OpenAI GPT-o1杀手——它可能使用一个代理来进行推理(仍然不清楚到底是什么),而不是被认为是一个模型(但基本上仍然是围绕LLM的循环),但它在追求现实世界目标时能够实现的推理非常令人印象深刻。它;如果我们没有;在过去的几年里,我没有时间适应这种能力的升级<p> 它;从Anthropic的POV来看,这也很有趣;我们专注于人工智能安全。在他们的网站上,他们有很多关于如何通过沙盒、限制其访问权限等来保持安全的建议,但归根结底,这是一个非常有能力的人工智能,能够使用您的计算机和浏览器做任何它认为必要的事情来实现所请求的目标。我们离回形针优化还有多远,或者至少离自主人工智能黑客还有多远?